SOLR Tika XPath exception
I am trying to index HTML documents using Apache SOLR and the TikaEntityProcessor, since I believe this lets me use XPath to select specific elements of the HTML.
I followed the advanced example shown at the bottom of the TikaEntityProcessor Solr Wiki page.
03-Oct-2012 16:39:48 org.apache.solr.handler.dataimport.DataImporter doFullImport
INFO: Starting Full Import
03-Oct-2012 16:39:48 org.apache.solr.core.SolrCore execute
INFO: [htmlTest] webapp=/apache-solr-3.6.1 path=/dataimport params={command=full-import} status=0 QTime=31
03-Oct-2012 16:39:48 org.apache.solr.handler.dataimport.SimplePropertiesWriter readIndexerProperties
INFO: Read dataimport.properties
03-Oct-2012 16:39:48 org.apache.solr.update.DirectUpdateHandler2 deleteAll
INFO: [htmlTest] REMOVING ALL DOCUMENTS FROM INDEX
03-Oct-2012 16:39:48 org.apache.solr.core.SolrDeletionPolicy onInit
INFO: SolrDeletionPolicy.onInit: commits:num=1
commit{dir=C:\Program Files\Apache Tomcat\conf\apache-solr-3.5.0\htmlTest\data\index,segFN=segments_1e,version=1349187077567,generation=50,filenames=[_u.fnm, _u.nrm, _u.tis, _u.prx, _u.frq, _u.fdx, _u.fdt, _u.tii, segments_1e]
03-Oct-2012 16:39:48 org.apache.solr.core.SolrDeletionPolicy updateCommits
INFO: newest commit = 1349187077567
03-Oct-2012 16:39:48 org.apache.solr.handler.dataimport.SqlEntityProcessor initQuery
SEVERE: The query failed 'null'
java.lang.NullPointerException
at java.io.File.<init>(File.java:222)
at org.apache.solr.handler.dataimport.FileDataSource.getFile(FileDataSource.java:96)
at org.apache.solr.handler.dataimport.BinFileDataSource.getData(BinFileDataSource.java:53)
at org.apache.solr.handler.dataimport.BinFileDataSource.getData(BinFileDataSource.java:44)
at org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:59)
at org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:73)
at org.apache.solr.handler.dataimport.EntityProcessorWrapper.pullRow(EntityProcessorWrapper.java:330)
at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:296)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:683)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:709)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:619)
at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:327)
at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:225)
at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:375)
at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:445)
at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:426)
03-Oct-2012 16:39:48 org.apache.solr.common.SolrException log
SEVERE: Exception while processing: tika-test document : SolrInputDocument[{text=text(1.0)={<html>
<meta name="Content-Encoding" content="ISO-8859-1">
<meta name="Content-Type" content="text/html">
<title></title>
<body>
<h1>This is my first heading</h1>
This is some content
<h1>This is my second heading</h1>
This is some more content
</body></html>}}]:org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.NullPointerException
at org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:65)
at org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:73)
at org.apache.solr.handler.dataimport.EntityProcessorWrapper.pullRow(EntityProcessorWrapper.java:330)
at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:296)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:683)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:709)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:619)
at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:327)
at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:225)
at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:375)
at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:445)
at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:426)
Caused by: java.lang.NullPointerException
at java.io.File.<init>(File.java:222)
at org.apache.solr.handler.dataimport.FileDataSource.getFile(FileDataSource.java:96)
at org.apache.solr.handler.dataimport.BinFileDataSource.getData(BinFileDataSource.java:53)
at org.apache.solr.handler.dataimport.BinFileDataSource.getData(BinFileDataSource.java:44)
at org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:59)
... 11 more
03-Oct-2012 16:39:48 org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: {deleteByQuery=*:*} 0 31
03-Oct-2012 16:39:48 org.apache.solr.common.SolrException log
SEVERE: Full Import failed:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.NullPointerException
at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:264)
at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:375)
at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:445)
at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:426)
Caused by: java.lang.RuntimeException: org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.NullPointerException
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:621)
at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:327)
at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:225)
... 3 more
Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.NullPointerException
at org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:65)
at org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:73)
at org.apache.solr.handler.dataimport.EntityProcessorWrapper.pullRow(EntityProcessorWrapper.java:330)
at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:296)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:683)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:709)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:619)
... 5 more
Caused by: java.lang.NullPointerException
at java.io.File.<init>(File.java:222)
at org.apache.solr.handler.dataimport.FileDataSource.getFile(FileDataSource.java:96)
at org.apache.solr.handler.dataimport.BinFileDataSource.getData(BinFileDataSource.java:53)
at org.apache.solr.handler.dataimport.BinFileDataSource.getData(BinFileDataSource.java:44)
at org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:59)
... 11 more
03-Oct-2012 16:39:48 org.apache.solr.update.DirectUpdateHandler2 rollback
INFO: start rollback
03-Oct-2012 16:39:48 org.apache.solr.update.DirectUpdateHandler2 rollback
INFO: end_rollback
My data import config is as follows:
<dataConfig>
  <dataSource type="BinFileDataSource"/>
  <dataSource type="FieldReaderDataSource" name="fld"/>
  <document>
    <entity name="tika-test" processor="TikaEntityProcessor"
            url="C:/Program Files/Apache Tomcat/conf/apache-solr-3.5.0/htmlTest/data/html_basic.html" format="html">
      <field column="text"/>
      <entity type="XPathEntityProcessor" forEach="/html" dataField="text">
        <field xpath="//h1" column="date" />
      </entity>
    </entity>
  </document>
</dataConfig>
When I run the data import command, I get the error message shown above. The HTML document I am sending to SOLR to index is as follows:
<html>
  <head>
  </head>
  <body>
    <h1>This is my first heading</h1>
    <div>
      This is some content
    </div>
    <h1>This is my second heading</h1>
    <div>
      This is some more content
    </div>
  </body>
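Just to add some further information: the nested XPathEntityProcessor entity appears to default to SqlEntityProcessor (the stack trace shows SqlEntityProcessor.initQuery being called). For some reason, I don't think it is able to bind to the TikaEntityProcessor (if that is how it is supposed to work).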
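One likely explanation, consistent with the stack trace: DIH chooses an entity's processor from its `processor` attribute, and falls back to SqlEntityProcessor when that attribute is absent. The nested entity above uses `type="XPathEntityProcessor"`, so it runs as a SqlEntityProcessor on the default BinFileDataSource, passes its missing (null) query to `FileDataSource.getFile`, and fails in `java.io.File.<init>`. Below is an untested sketch of the config with that corrected. The assumptions are: the inner entity reads the parent's `text` field through the `fld` data source; `dataField` is qualified as `entityName.fieldName`; the entity name `headings` and data source name `bin` are hypothetical; and switching to `format="xml"` is needed so Tika emits well-formed XHTML.

```xml
<dataConfig>
  <!-- Named data sources so each entity can select one explicitly; "bin" is a hypothetical name -->
  <dataSource type="BinFileDataSource" name="bin"/>
  <dataSource type="FieldReaderDataSource" name="fld"/>
  <document>
    <!-- Assumption: format="xml" makes Tika produce well-formed XHTML for the XPath parser -->
    <entity name="tika-test" processor="TikaEntityProcessor" dataSource="bin"
            url="C:/Program Files/Apache Tomcat/conf/apache-solr-3.5.0/htmlTest/data/html_basic.html" format="xml">
      <field column="text"/>
      <!-- processor (not type) selects the entity processor; without it DIH falls back to SqlEntityProcessor -->
      <!-- dataSource="fld" reads the parent's text column instead of opening a file -->
      <!-- Assumption: dataField is qualified as entityName.fieldName; "headings" is a hypothetical entity name -->
      <entity name="headings" processor="XPathEntityProcessor" dataSource="fld"
              forEach="/html" dataField="tika-test.text">
        <field xpath="//h1" column="date"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```

The `format="xml"` change is a separate assumption from the NPE fix. The `format="html"` output visible in the log contains unclosed `<meta>` tags, which a strict XML parser such as the one behind XPathEntityProcessor may reject even once the right processor is in use.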