1
이것은 아마도 쉬운 질문 일 수 있지만 알아낼 수는 없습니다. 내가 거기 밖으로 [email protected]
및 http://www.aachener-airport-taxi.de
을 얻으려고BeautifulSoup을 사용하여 데이터 추출
<!-- ENDE telefonnummer.jsp --></li>
<li class="email ">
<a
class="link"
href="mailto:[email protected]"
data-role="email-layer"
data-template-replacements='{
"name": "Aachener-Airport-Taxi Blum",
"subscriberId": "128027562762",
"captchaBase64": "data:image/jpg;base64,/9j/4AAQSkZJRgABAgAAAQABAAD/2wBDAAgGBgcGBQgHBwcJCQgKDBQNDAsLDBkSEw8UHRofHh0aHBwgJC4nICIsIxwcKDcpLDAxNDQ0Hyc5PTgyPC4zNDL/2wBDAQkJCQwLDBgNDRgyIRwhMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjL/wAARCAAvAG4DASIAAhEBAxEB/8QAHwAAAQUBAQEBAQEAAAAAAAAAAAECAwQFBgcICQoL/8QAtRAAAgEDAwIEAwUFBAQAAAF9AQIDAAQRBRIhMUEGE1FhByJxFDKBkaEII0KxwRVS0fAkM2JyggkKFhcYGRolJicoKSo0NTY3ODk6Q0RFRkdISUpTVFVWV1hZWmNkZWZnaGlqc3R1dnd4eXqDhIWGh4iJipKTlJWWl5iZmqKjpKWmp6ipqrKztLW2t7i5usLDxMXGx8jJytLT1NXW19jZ2uHi4+Tl5ufo6erx8vP09fb3+Pn6/8QAHwEAAwEBAQEBAQEBAQAAAAAAAAECAwQFBgcICQoL/8QAtREAAgECBAQDBAcFBAQAAQJ3AAECAxEEBSExBhJBUQdhcRMiMoEIFEKRobHBCSMzUvAVYnLRChYkNOEl8RcYGRomJygpKjU2Nzg5OkNERUZHSElKU1RVVldYWVpjZGVmZ2hpanN0dXZ3eHl6goOEhYaHiImKkpOUlZaXmJmaoqOkpaanqKmqsrO0tba3uLm6wsPExcbHyMnK0tPU1dbX2Nna4uPk5ebn6Onq8vP09fb3+Pn6/9oADAMBAAIRAxEAPwD02iivPLm58L6d4x1i4nsdLGpRNCsUcsSIFcASG5eUjEQLXCqWPJMfy72IWvyDD4d13JK90r6K/VLurb7nuzlynodcfqvxJ0PTda/se3ivtU1AMyPBp0HmsjKMkHJGTjOducbTnGK1rDw7awanJrF7FaXOsy4DXcdsI9oClQEBLEcE5JYk5xnaFVfGXTxP8JPElxqVxbQ6jaXshja8lGTOMhz8+S0bnng5BIJw+0GvRy3A4fETnHm5pJe6r8vM+uuui+TflqY1akoJO1l9567oPjKz13VJtMOnapp17HCLgQ6hbeUzx7tpZeTwDgc468ZwcM/4WD4ZOp/2bHfTTXh+7DDZzyFxt3ArtQ7lK/MCMgjkcVZ8LeLdK8X6e93pkjgxttlglAWSM9twBPBAyCCR17ggeWaHfw3fx11nU9QS53WTTrEtnbSTZKYgG5UVmxsySeBux06VVDAU6s6yqQlHkjeyet+2q2YSqOKjZp3Z61pnibSdX1C50+0uX+22yh5baeCSGRVPQ7XUEjkdPUeorz/xt8SPEPgzxWthJbaXdWUircR7UkSQxFiNpO4gN8pGcEdDjsKvhy1k8U/GOfxdpjQtpEWcu0yeYf3JhH7sEuu4gsNwXKjPXiuw1zw/Z+J9Y13S7xEIk0y0MUjLuMMm+62uORyCfUZGQeCa1jRwmDxKVVc0eVOSe8W2k1p1W/Tt5icp1Ie7o76eZu2epf294ft9R0a4hj+1RrJE80fmhPVWVXHzDlSA3BHfGKg8N3ep31pcT6jPaSbbmaCNbe3aLHlSvGSdztnO0HHGOnNeH6Dr2ufCfxPNpWqwvJYOwaaBTlXU8CaInHOB7ZxtOCPl9t8IzRXGhPPBIksUl/eukiMGVlN1KQQR1BFY5jl7wcG4tShJrllptrpf7v6uOlV9o9dGtzdooorxToCiiigCG7uo7K2e4lWZkTGRDC8rcnHCoCx69hWF4ZeHUNN1GG5s7ndNd3DTre2ckfnRvK4jz5ijePKCLjnChQccCujorWNRRpuNtW1rft/XfsJq7uclot5qem+EDY2+mTX2p6V/oiQtG1otwiSGNHV5AVOY1DnBI5xxkVn6j45vL7T57Ow8Ea/NdXC+THHqFhstyW4/eHcflwec4B6EjqO9orojiqXO5zpptu+7Xy06fc/MhwdrJnmXgjw7efDnwvqV9f2013qd5tKWdmrzfdQlEOxDtYsWBblR8vPrmfCS3k8LWWpy6vZavb3F1JGqwf2TcPhUBw25UI5LkY7bfevYKK6Z5rKrGqqsbuo1dp2+HZLRkqik1boeWfDvw7q6eOtc8UXdhNY2N95/kR3Q2THfMGGU5K4C85x1GMjmupstYgk8X3c4tdUWK5tLWCOR9MuUUusk5YEmMbQBIvJwOevBrqqKxxGO+sTlOpHdJKz2S+++w40+RJJnK+PfBsXjPQRarIkN7A3mW0zKCA2MFWOMhW4zjuFPOME+HGnXmk+A9PsL+3eC6gaZZI36g+c/5gjkEcEEEV1VFYvGVXhvqr+FO68t/wDMr2a5+fqFFFFcpYUUUUAFFFFABRRRQAUUUUAFFFFABRRRQAUUUUAFFFFAH//Z",
"captchaWidth": "110",
"captchaHeight": "47",
"captchaEncryptedAnswer": "767338fffffff8ffffffd6ffffff8d3038ffffffba1971ffffffdfffffffe3f6c9"
}'
data-wipe='{"listener":"click","name":"Detailseite E-Mail","id":"128027562762"}'
>
<i class="icon-mail"></i>
<span class="text" >[email protected]</span>
</a>
</li>
<li class="website ">
<a class="link" href="http://www.aachener-airport-taxi.de" rel="follow" target="_blank" title="http://www.aachener-airport-taxi.de"
data-wipe='{"listener":"click","name":"Detailseite Webadresse","id":"128027562762"}'>
<i class="icon-website"></i>
<span class="text">Zur Website</span>
</a>
</li>
</ul>
</div>
: 나는 문제 BeautifulSoup로와 웹 페이지의이 부분에서 이메일과 URL을 추출하는 데 문제가 있습니다. 은 컴파일러가 대괄호 안에 선언하고 싶다고 생각하기 때문에 분명히 작동하지 않습니다. soup.find(class='email')
내가 거기에있는 모든 링크를 얻으려면 for link in soup.find_all('a'): print(link.get('href'))
사용할 수 있지만, 나는이 특정 하나 싶습니다. 링크는 항상 다르므로 정규식을 사용할 수 없으므로 직접 html-body를 탐색해야합니다.
니스, 감사합니다. 첫 번째 방법은 실제로 전자 메일 대신 전화 번호를 반환했습니다. 전화 번호가 게시 된 HTML 본문 부분 위에 비슷한 구조로 중첩되어 있기 때문입니다.하지만 메일을 추출하도록 수정할 수있었습니다.)'mail = soup.find (attrs = { "class": "email"}) .a [ "href"]'를 사용하면 mailto : info @ taxi-ac.de'를 반환합니다. 결과를 string.split에 넣기 만하면됩니다. 아마도 교과서적인 접근법이 아닐지 모르지만, 그것이 효과가 있습니다. –