2017-10-17

Solr indexing with the Nutch crawler

I am using Apache Nutch 1.13 and Solr 6.6.0. Indexing fails with the following exception:

Indexer: java.io.IOException: Job failed! 
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:865) 
    at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:147) 
    at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:230) 
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) 
    at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:239) 

Error running: 
    /Users/myedlapalli/documents/nutch-solr-3/apache-nutch-1.13/runtime/local/bin/nutch index -Dsolr.server.url=http://localhost:8983/solr/nutch TestCrawl/crawldb -linkdb TestCrawl/linkdb TestCrawl/segments/20171017090519 
Failed with exit value 255. 

I get this exception when I run the following command to crawl the content:

bin/crawl -i -D solr.server.url=http://localhost:8983/solr/nutch urls/seed.txt TestCrawl 2

The log shows:

2017-10-17 09:36:35,032 INFO solr.SolrIndexWriter - Indexing 1/1 documents 
2017-10-17 09:36:35,032 INFO solr.SolrIndexWriter - Deleting 0 documents 
2017-10-17 09:36:35,161 INFO solr.SolrIndexWriter - Indexing 1/1 documents 
2017-10-17 09:36:35,161 INFO solr.SolrIndexWriter - Deleting 0 documents 
2017-10-17 09:36:35,174 WARN mapred.LocalJobRunner - job_local193014604_0001 
java.lang.Exception: org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://localhost:8983/solr/nutch: ERROR: [doc=http://www.cmo.com/features/articles/2017/8/21/5-emerging-technologies-rewrite-the-media-and-entertainment-script-.html] unknown field 'sp_type' 
    at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462) 
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:529) 
Caused by: org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://localhost:8983/solr/nutch: ERROR: [doc=http://www.cmo.com/features/articles/2017/8/21/5-emerging-technologies-rewrite-the-media-and-entertainment-script-.html] unknown field 'sp_type' 
    at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:576) 
    at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:240) 
    at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:229) 
    at org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1219) 
    at org.apache.nutch.indexwriter.solr.SolrIndexWriter.push(SolrIndexWriter.java:210) 
    at org.apache.nutch.indexwriter.solr.SolrIndexWriter.commit(SolrIndexWriter.java:188) 
    at org.apache.nutch.indexwriter.solr.SolrIndexWriter.close(SolrIndexWriter.java:179) 
    at org.apache.nutch.indexer.IndexWriters.close(IndexWriters.java:117) 
    at org.apache.nutch.indexer.IndexerOutputFormat$1.close(IndexerOutputFormat.java:44) 
    at org.apache.hadoop.mapred.ReduceTask$OldTrackingRecordWriter.close(ReduceTask.java:502) 
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:456) 
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392) 
    at org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:319) 
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) 
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) 
    at java.lang.Thread.run(Thread.java:748) 
2017-10-17 09:36:36,109 ERROR indexer.IndexingJob - Indexer: java.io.IOException: Job failed! 
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:865) 
    at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:147) 
    at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:230) 
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) 
    at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:239) 

Can anyone please help? Thanks in advance.

Answer

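In cases like this it is usually a good idea to check the logs on the Solr side, but for this particular error you already have the answer, specifically in this part: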

ERROR: [doc=http://www.cmo.com/features/articles/2017/8/21/5-emerging-technologies-rewrite-the-media-and-entertainment-script-.html] unknown field 'sp_type' 
at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:576) 
at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:240) 

Solr is complaining that you are sending a document (with the ID http://www.cmo.com/features/articles/2017/8/21/5...) that contains a field not defined in the schema: sp_type.

You need to either check what you are sending in that field, or add the field to your Solr schema.
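If the core uses a managed schema, one way to add the field is through Solr's Schema API. The following is only a sketch: it assumes the core URL from the question, that a fieldType named "string" exists in your schema, and that a plain stored string suits sp_type. The function names are made up for illustration. If the core is configured with a classic, hand-edited schema.xml, the Schema API cannot modify it and you would add a <field> entry to that file and reload the core instead.

```python
import json
import urllib.request

# Assumption: the core named in the question's solr.server.url.
SCHEMA_URL = "http://localhost:8983/solr/nutch/schema"


def add_field_command(name, field_type="string", stored=True, indexed=True):
    """Build a Schema API 'add-field' command for a single field."""
    return {
        "add-field": {
            "name": name,
            # Must be the name of a fieldType already defined in the schema.
            "type": field_type,
            "stored": stored,
            "indexed": indexed,
        }
    }


def post_schema_command(command, url=SCHEMA_URL):
    """POST the command to Solr and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(command).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

With Solr running, calling post_schema_command(add_field_command("sp_type")) would send the request; rerun the index step afterwards to see whether the error is gone.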

If more fields are missing from the Solr schema, you will keep seeing this error, once for each undefined field. To see what Nutch would send to Solr, it is a good idea to run bin/nutch indexchecker <URL>.
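To find all the missing fields in one pass, you could compare the field names reported by indexchecker with the fields Solr knows about. This is a rough sketch, not a finished tool: it assumes you have collected the field names by hand from the indexchecker output, that the core URL is the one from the question, and that the schema is readable at the Schema API's /schema/fields endpoint. The helper names are hypothetical, and the check ignores dynamicField patterns, so a field it reports might still be accepted by Solr.

```python
import json
import urllib.request

# Assumption: the core named in the question's solr.server.url.
FIELDS_URL = "http://localhost:8983/solr/nutch/schema/fields"


def fetch_schema_fields(url=FIELDS_URL):
    """Fetch the explicitly defined fields of the core as decoded JSON."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def schema_field_names(fields_response):
    """Extract the set of field names from a /schema/fields response."""
    return {field["name"] for field in fields_response.get("fields", [])}


def undefined_fields(sent_field_names, fields_response):
    """Return, sorted, the sent field names that the schema does not define."""
    known = schema_field_names(fields_response)
    return sorted(set(sent_field_names) - known)
```

For example, undefined_fields(["id", "title", "sp_type"], fetch_schema_fields()) would list the names you still need to add to the schema or stop sending from Nutch.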