Multi-word queries in Lucene

Example: a Lucene document has a column named "description". Suppose the content of "description" is [hello foo bar]. I want the query [hello f] to hit this document, while [hello ff] or [hello b] should not hit it.

I build the Query programmatically, for example with PrefixQuery and TermQuery added to a BooleanQuery, but it does not work as expected. StandardAnalyzer is used.

Test cases:

A) PhraseQuery query = new PhraseQuery(); query.add(new Term("description", "hello f")) -> 0 hits

B) PhraseQuery query = new PhraseQuery(); query.add(new Term("description", "hello f*")) -> 0 hits

C) new PrefixQuery(new Term("description", "hello f")) -> 0 hits

Any recommendations? Thanks!

Source

2012-12-17 卢声远 Shengyuan Lu

How have you tried this so far? Could you show some code snippets? That would make the problem much easier to understand.

Have you tried using org.apache.lucene.queryParser.QueryParser to parse a query string like "description:hello AND description:f*"? – pabrantes

@pabrantes "description:hello AND description:f*" is not what I need. I need "hello" followed by "f".

Have you tried NGram or EdgeNGram tokenization at index time?

http://lucene.apache.org/core/old_versioned_docs/versions/2_9_0/api/all/org/apache/lucene/analysis/ngram/NGramTokenizer.html
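
To make the edge n-gram suggestion concrete, here is a minimal plain-Java sketch of what edge n-grams of a token look like. It does not use Lucene's NGramTokenizer or any other Lucene class; the class name EdgeNGramSketch and the method edgeNGrams are hypothetical names invented for this illustration. The idea, as I understand the suggestion, is that if every leading substring of "foo" is indexed as its own term, then "f" becomes an exact term in the index and no prefix expansion is needed at query time.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class, not part of Lucene.
class EdgeNGramSketch {

    // Returns the leading substrings of token whose length is between
    // minGram and maxGram (inclusive), capped at the token length.
    static List<String> edgeNGrams(String token, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        int upper = Math.min(maxGram, token.length());
        for (int len = minGram; len <= upper; len++) {
            grams.add(token.substring(0, len));
        }
        return grams;
    }

    public static void main(String[] args) {
        // Print the edge n-grams for each word of the example description.
        for (String token : "hello foo bar".split(" ")) {
            System.out.println(token + " -> " + edgeNGrams(token, 1, 5));
        }
    }
}
```

In this model the grams for "foo" include "f" but not "ff", which lines up with the behavior asked for in the question. Whether the word order requirement is also satisfied depends on how the grams are positioned in the real index, which this sketch does not model.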

Source

2012-12-17 09:32:28 rrsk

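It doesn't work because you are passing multiple words into a single Term object.

If you want every search word to be matched as a prefix, you need to:

Tokenize the input string with your analyzer. This splits the search text "hello f" into "hello" and "f":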

TokenStream tokenStream = analyzer.tokenStream(fieldName, new StringReader(searchText));
CharTermAttribute termAttribute = tokenStream.addAttribute(CharTermAttribute.class);

List<String> tokens = new ArrayList<String>();
tokenStream.reset(); // must be called before the first incrementToken()
while (tokenStream.incrementToken()) {
    tokens.add(termAttribute.toString());
}
tokenStream.end();
tokenStream.close();
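
Each token then needs to be put into a Term object, which in turn needs to be put into a PrefixQuery. All the PrefixQuery objects are then combined in a BooleanQuery.

Edit: for example, like this: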

BooleanQuery booleanQuery = new BooleanQuery(); 

for(String token : tokens) {   
    booleanQuery.add(new PrefixQuery(new Term(fieldName, token)), Occur.MUST); 
}
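
One caveat with combining prefix queries under Occur.MUST is that it only requires each prefix to occur somewhere in the field; it says nothing about word order or adjacency. The plain-Java sketch below models that difference against the question's example. It does not call Lucene at all: the class PrefixMatchSketch and both method names are hypothetical, and the token lists stand in for what an analyzer would produce.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration class, not part of Lucene.
class PrefixMatchSketch {

    // Models a BooleanQuery of PrefixQuery clauses with Occur.MUST:
    // every query token must be a prefix of some document token, in any position.
    static boolean allPrefixesAnywhere(List<String> docTokens, List<String> queryTokens) {
        for (String q : queryTokens) {
            boolean found = false;
            for (String d : docTokens) {
                if (d.startsWith(q)) {
                    found = true;
                    break;
                }
            }
            if (!found) {
                return false;
            }
        }
        return true;
    }

    // Models the behavior asked for in the question: the query tokens must match
    // consecutive document tokens, exactly for all but the last, by prefix for the last.
    static boolean phraseWithPrefixOnLast(List<String> docTokens, List<String> queryTokens) {
        int n = queryTokens.size();
        if (n == 0) {
            return false;
        }
        for (int start = 0; start + n <= docTokens.size(); start++) {
            boolean ok = true;
            for (int i = 0; i < n && ok; i++) {
                String d = docTokens.get(start + i);
                String q = queryTokens.get(i);
                ok = (i == n - 1) ? d.startsWith(q) : d.equals(q);
            }
            if (ok) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> doc = Arrays.asList("hello", "foo", "bar");
        for (String query : new String[] {"hello f", "hello ff", "hello b"}) {
            List<String> q = Arrays.asList(query.split(" "));
            System.out.println(query
                    + ": allPrefixesAnywhere=" + allPrefixesAnywhere(doc, q)
                    + ", phraseWithPrefixOnLast=" + phraseWithPrefixOnLast(doc, q));
        }
    }
}
```

In this model [hello b] passes the all-prefixes check, because "bar" starts with "b", but fails the ordered check, which is the result the question wants. Lucene also ships a MultiPhraseQuery class whose Javadoc describes a phrase search with a prefix-expanded last word; nobody in this thread tried it, so treat that as a pointer rather than a tested fix.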

Source

2012-12-17 10:09:48

Thanks, Adam! I already use the first approach with the analyzer. But the second approach does not behave the way I expected.

@Shengyuan Please show your current code.
