How do I open an HTML file?

I have an HTML file called test.html, and it contains one word: בדיקה.

I open test.html and print its contents using this block of code:

file = open("test.html", "r") 
print file.read()

But it prints ??????. Why does this happen, and how can I fix it?

BTW, when I open a plain text file it works fine.

Edit: I also tried this:

>>> import codecs 
>>> f = codecs.open("test.html",'r') 
>>> print f.read() 
?????

2014-12-02 david

Read up on Unicode and UTF-8. – vks

You need to open the file as UTF-8. http://stackoverflow.com/questions/491921/unicode-utf8-reading-and-writing-to-files-in-python –

If it still does not work, please post the page you are trying to process. – wenzul

Try something like this:

import codecs 
f=codecs.open("test.html", 'r') 
print f.read()

2014-12-02 06:34:58 vks

It doesn't work. – david

I also tried codecs.open("test.html", 'r', 'utf-8'), but I get a Unicode decode error when I print f.read()! – david

I am using a terminal!! – david

You can read the HTML page using 'urllib'.

#python 2.x 

    import urllib 

    page = urllib.urlopen("your path ").read() 
    print page

2014-12-02 06:33:50 Benjamin

I am getting ???! – david

How can I perform operations on `page`, such as reading a specific word? Can I use `page` like a string? –
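
On the question of whether `page` can be used like a string: `read()` returns raw bytes, so it helps to decode them to text before searching for words. Below is a minimal sketch in Python 3 syntax. It assumes the page is UTF-8 encoded, which the thread never confirms. The sample bytes and the helper name `contains_word` are illustrative and do not come from the original posts.

```python
# Sketch: decode the raw bytes first, then use ordinary string operations.
# Assumption: the page is UTF-8 encoded; `contains_word` is a hypothetical helper.

def contains_word(raw_bytes, word, encoding="utf-8"):
    """Decode raw page bytes and report whether `word` occurs as a whole word."""
    text = raw_bytes.decode(encoding)
    return word in text.split()

# Stand-in for what read() would return for a page holding one Hebrew word.
page = "בדיקה\n".encode("utf-8")

text = page.decode("utf-8")
print(len(page))   # length in bytes
print(len(text))   # length in characters (smaller for non-ASCII text)
print(contains_word(page, "בדיקה"))
```

Once decoded, `text` supports the usual string methods such as `split`, `find`, and `in`.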

Use codecs.open with the encoding parameter.

import codecs 
f = codecs.open("test.html", 'r', 'utf-8')

2014-12-02 07:43:56 wenzul
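
The same idea can be sketched in Python 3 syntax, where the built-in `open` accepts an `encoding` argument directly. This assumes test.html is saved as UTF-8, which the thread does not confirm. The temporary file and the function name `read_html` are illustrative only.

```python
import os
import tempfile

def read_html(path, encoding="utf-8"):
    """Return the file's contents decoded to text with the given encoding."""
    with open(path, "r", encoding=encoding) as f:
        return f.read()

# Create a throwaway UTF-8 file so the sketch runs on its own.
tmp_dir = tempfile.mkdtemp()
sample_path = os.path.join(tmp_dir, "test.html")
with open(sample_path, "w", encoding="utf-8") as f:
    f.write("בדיקה")

content = read_html(sample_path)
# Printing non-ASCII text can still fail or show "?" if the terminal's
# encoding cannot represent the characters, so print an ASCII-safe form.
print(ascii(content))
```

If the decoded text is correct but the terminal still shows question marks, the terminal's encoding, rather than the file read, may be the limiting factor.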

You can use code like the following:

from __future__ import division, unicode_literals 
import codecs 
from bs4 import BeautifulSoup 

f=codecs.open("test.html", 'r', 'utf-8') 
document= BeautifulSoup(f.read()).get_text() 
print document

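If you also want to delete all the blank lines in between and get all the words as one string (dropping special characters and numbers), then also include:

import re 
import nltk 
from nltk.corpus import stopwords 
from nltk.tokenize import word_tokenize 

# Assumption: the answer never defines `stop`; NLTK's English stop-word
# list is used here (it needs the "stopwords" corpus to be downloaded).
stop = set(stopwords.words("english")) 

st = ""  # define st as an empty string first 
docwords = word_tokenize(document) 
for line in docwords: 
    line = line.rstrip() 
    # keep only purely alphabetic (A-Z, a-z) tokens longer than one character 
    if line and re.match("^[A-Za-z]*$", line): 
        if line not in stop and len(line) > 1: 
            st = st + " " + line 
print st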

2015-12-03 11:09:09
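
For comparison, the text-extraction and word-filtering steps can be sketched with the Python 3 standard library alone. This is not the BeautifulSoup/NLTK method from the answer above. It swaps in `html.parser` and `str.isalpha()`, so non-Latin words such as בדיקה are kept instead of being dropped by the `[A-Za-z]` pattern. The class `TextExtractor`, the function `extract_words`, and the sample markup are hypothetical names made up for this illustration.

```python
import re
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect the text nodes of an HTML document, ignoring the tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Called for every run of text between tags (including whitespace).
        self.chunks.append(data)

def extract_words(html, stop_words=frozenset()):
    """Return alphabetic words longer than one character, minus stop words."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks)
    # \w+ finds runs of word characters; isalpha() then rejects tokens
    # that contain digits or underscores.
    words = re.findall(r"\w+", text)
    return [w for w in words
            if w.isalpha() and len(w) > 1 and w.lower() not in stop_words]

sample = "<html><body><p>This is a test 123</p>\n\n<p>בדיקה</p></body></html>"
print(" ".join(extract_words(sample, stop_words={"is", "this"})))
```

Note that this simple extractor also collects the contents of `script` and `style` elements as text, so a real page may need extra filtering.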

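You can use 'urllib' in Python 3 in the same way as in https://stackoverflow.com/a/27243244/4815313, with a few changes.

#python3

    # import the submodule explicitly; a plain "import urllib" does not
    # make urllib.request available
    import urllib.request

    page = urllib.request.urlopen("/path/").read()
    print(page)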

2016-02-09 13:13:04 Suresh2692

'AttributeError: 'module' object has no attribute 'request'' –

@tommy.carstensen Please take a look at [urllib python3](https://docs.python.org/3/library/urllib.request.html#module-urllib.request) – Suresh2692

Thanks. I am familiar with that documentation. Your indentation is wrong, and it should be `import urllib.request`. –
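
To illustrate the `import urllib.request` point from the comments, here is a small Python 3 sketch that reads a local file through `urlopen` using a `file://` URL, so it runs without network access. It assumes the content is UTF-8. The temporary file and its contents are made up for the example.

```python
import pathlib
import tempfile
import urllib.request  # the submodule must be imported explicitly

# Create a throwaway UTF-8 file so the sketch runs without network access.
tmp_dir = pathlib.Path(tempfile.mkdtemp())
sample_path = tmp_dir / "test.html"
sample_path.write_text("<p>בדיקה</p>", encoding="utf-8")

# urlopen needs a full URL with a scheme; as_uri() builds a file:// URL.
with urllib.request.urlopen(sample_path.as_uri()) as response:
    raw = response.read()      # bytes, not text

page = raw.decode("utf-8")     # assumption: the content is UTF-8
print(ascii(page))
```

For a real web page, the same `read()` and `decode()` steps apply, with the encoding ideally taken from the response headers or the page's own declaration rather than assumed.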
