Scrapy: how to clean up the response?
Here is my code. I am using Scrapy to scrape a website, and I want to store the data in Elasticsearch for indexing.
def parse(self, response):
    for news in response.xpath('head'):
        yield {
            'pagetype': news.xpath('//meta[@name="pagetype"]/@content').extract(),
            'description': news.xpath('//div[@class="module__content"]/*/node()/text()').extract(),
        }
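Now my problem is the value that gets saved in the 'description' field. It contains a lot of whitespace, line-break codes, and 'u' characters. How can I process it further so that it contains only the plain text, without the extra whitespace and line breaks?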
[u'\n \n ', u'"For\n many of us what we eat on Christmas day isn\'t what we would usually consume and\n that\u2019s perfectly ok," Dr said.', u'"However\n it is not uncommon for festive season celebrations to begin in November and\n continue well in to the New Year.', u'"So\n if health is on the agenda, being mindful about what we put into our bodies\n with a balanced approach, throughout the whole festive season, is important."', u"Dr\n , a lecturer at School\n Sciences, said balancing fresh, healthy food with being physically active was a\n good start.", u'"Whatever\n the celebration, try to limit processed foods, often high in fat, sugar and\n salt," she said.', u'"Taking\n time during holidays to prepare food and make the most of fresh ingredients is\n often a much healthier option than relying on convenience foods and take away.', u'"Being\n mindful about going back for seconds is important too.\xa0 We don\u2019t need to eat until we feel\n uncomfortable and eating the foods we enjoy doesn\'t necessarily mean we need to\n eat copious amounts."', u"Dr\n own healthy tips and substitutes for the Christmas season\n include:", u'But\n just because Dr is a dietitian, doesn\u2019t mean she doesn\u2019t enjoy a\n Christmas treat or two.', u'"I\n would have to say my sister in law\'s homemade rocky road is my favourite\n festive treat. She makes it every Christmas day and it gets better each year," she\n said.', u'"I\n also enjoy a summer cocktail every so often during the festive season and a\n mojito would be one of my favourites on Christmas day. 
We make it with extra\n mint from the garden which is a nice, fresh addition.', u'"Rather\n than focusing on food avoidance, moderation is the best approach.', u'"There\n are definitely some more healthy choices and some less healthy options when it\n comes to the typical Christmas day menu, but it\'s more important to be mindful\n of a healthy, balanced diet throughout the festive period, rather than avoiding\n specific foods on one day of the year."', u'\n ', u'\n \n ', u'\n ', u'\n \n ', u'\n ', u'\n ', u'\n ', u'\n ', u'\n ', u'\n ', u'\n ', u'Related News', u'\n ', u'\n ', u'\n ', u'\n ', u'\n ', u'\n ', u'Search for related news']
....
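How do I get rid of the line-break (\n) codes and the 'u' characters?
I have read that BeautifulSoup works well with Scrapy, but I cannot find an example of how to integrate BeautifulSoup with Scrapy. I am open to other approaches. Any help is greatly appreciated.
Thanks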
Related: http://stackoverflow.com/q/21839877/4063051 – glS
The 'u' only indicates that the text in the list is Unicode. If you print a single element from the list, you will see the text without the 'u'. – furas
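To illustrate the point about the 'u' prefix, here is a minimal sketch with made-up sample strings. The prefix belongs to the literal syntax and to how a list displays its elements, not to the text itself. The output in the question looks like Python 2, where a list's display shows u'...'; Python 3 still accepts the prefix but does not display it, so the sketch below only shows the difference between displaying the list and printing one element.

```python
# Hypothetical sample data; the u prefix is optional in Python 3.
items = [u'Related News', u'Dr\n said']

# The prefix does not change the string's content.
same_text = (u'Related News' == 'Related News')

# Displaying the whole list uses repr() on each element:
# quotes and escape sequences such as \n are shown literally.
as_list = repr(items)

# A single element converted with str() (what print uses) is just the text.
as_text = str(items[0])

print(as_list)
print(as_text)
```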
That much is clear, but do you just want to remove the line breaks and whitespace from those strings? – glS
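If removing the line breaks and extra whitespace is all that is needed, one possible approach is plain Python string handling, with no BeautifulSoup involved. The sketch below assumes Python 3; `clean_fragments` is a hypothetical helper name, and the sample list is an abridged excerpt of the output shown in the question.

```python
def clean_fragments(fragments):
    """Collapse whitespace runs in each fragment and drop empty fragments."""
    cleaned = []
    for fragment in fragments:
        # str.split() with no arguments splits on any run of whitespace,
        # including newlines and the non-breaking space (\xa0).
        text = ' '.join(fragment.split())
        if text:
            cleaned.append(text)
    return cleaned


# Abridged excerpt of the extracted 'description' values from the question.
raw = [
    '\n \n ',
    '"Being\n mindful about going back for seconds is important too.\xa0 '
    'We don\u2019t need to eat until we feel\n uncomfortable',
    'Related News',
    '\n ',
]

cleaned = clean_fragments(raw)
description = ' '.join(cleaned)  # one plain-text string for indexing
print(description)
```

In the spider, the list returned by `.extract()` for 'description' could be passed through such a helper before the item is yielded. Whether to keep the fragments as a list or join them into one string depends on how the field is meant to be indexed.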