2012-07-06 4 views
0

Find two phrases in a larger sentence with minimal overlap

I have:

phrase = "will have to buy online pass from EA to play online but its in perfect condition" 

phrases = ["its", 
"perfect condition", 
"but its", 
"in perfect condition", 
"from EA", 
"buy online pass from EA", 
"to play online but its in perfect condition", 
"online", 
"online pass", 
"play online but its in perfect condition", 
"online but its", 
"EA", 
"will have to buy online pass from EA to play online but its in perfect condition", 
"have to buy online pass from EA to play online but its in perfect condition", 
"u", 
"pass", 
"to buy online pass from EA"] 

I would like to pick two phrases from this array, each between 6 and 10 words long, with minimal word-wise overlap between them. Something like:

result = ["to buy online pass from EA", "play online but its in perfect condition"] 

That would be perfect. What is the best way to do it?

+0

This was similar... is it what you meant? http://stackoverflow.com/questions/11300921/reconstruct-original-sentence-from-smaller-phrases – Stpn

+0

The two must not overlap? In any order? –

+0

Any order, yes. – Stpn

Answers

0
split_phrases = phrases.map {|phrase| phrase.split } 

# find number of words of overlap between two word vectors 
def overlap(p1,p2) 
    s1 = p1.size 
    s2 = p2.size 

    # make p1 the longer phrase 
    if s2 > s1 
        s1,s2 = s2,s1 
        p1,p2 = p2,p1 
    end 

    # check if p2 is entirely contained in p1 
    return s2 if p1.each_cons(s2).any? {|p| p == p2} 

    # otherwise measure the partial overlap at either end of p1; 
    # len == 0 always matches, so find never returns nil 
    longest_prefix = (s2-1).downto(0).find { |len| p1.first(len) == p2.last(len) } 
    longest_suffix = (s2-1).downto(0).find { |len| p2.first(len) == p1.last(len) } 

    [longest_prefix, longest_suffix].max 
end 

def best_two_phrases_with_minimal_overlap(wphrases, minlen=6, maxlen=10) 
    # reject too small or large phrases, evaluate every combination, order by word overlap 
    scored_pairs = wphrases. 
        select {|p| (minlen..maxlen).include? p.size}. 
        combination(2). 
        map { |pair| [ overlap(*pair), pair ] }. 
        sort_by { |tuple| tuple.first } 

    # nothing to return when fewer than two phrases fit the length limits 
    return [] if scored_pairs.empty? 

    # consider all pairs with least word overlap 
    least_overlap = scored_pairs.first.first 
    least_overlap_pairs = scored_pairs. 
        take_while {|tuple| tuple.first == least_overlap }. 
        map {|tuple| tuple.last } 

    # return longest pair with minimal overlap 
    least_overlap_pairs.sort_by {|pair| pair.first.size + pair.last.size }.last 
end 

puts best_two_phrases_with_minimal_overlap(split_phrases).map{|p| p.join ' '} 

# to play online but its in perfect condition 
# to buy online pass from EA 
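
To make the scoring easier to follow, here is a small sketch of what the overlap method above returns in its three cases: full containment, a partial overlap at the ends, and no overlap. The method is repeated in compact form so the snippet runs on its own; the sample word lists are made up for illustration and are not taken from the question's array.

```ruby
# Same logic as the overlap method above, condensed so this snippet is self-contained.
def overlap(p1, p2)
  p1, p2 = p2, p1 if p2.size > p1.size   # make p1 the longer phrase
  # full containment: the score is the length of the shorter phrase
  return p2.size if p1.each_cons(p2.size).any? { |window| window == p2 }

  max_len = p2.size - 1
  # start of p1 repeated at the end of p2
  head = max_len.downto(0).find { |len| p1.first(len) == p2.last(len) }
  # end of p1 repeated at the start of p2
  tail = max_len.downto(0).find { |len| p2.first(len) == p1.last(len) }
  [head, tail].max
end

# Illustrative inputs (assumptions, not from the original phrases array)
contained = overlap(%w[to play online but its], %w[online but its])
head_tail = overlap(%w[buy online pass from EA], %w[from EA to play online])
disjoint  = overlap(%w[to buy online pass], %w[in perfect condition])

puts contained # 3 ("online but its" sits inside the longer phrase)
puts head_tail # 2 ("from EA" ends one phrase and starts the other)
puts disjoint  # 0
```

Note that this score only measures contiguous overlap. Two phrases that share a word in unrelated positions, such as "online", still score 0.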
0

How about this?

# keep phrases that occur in the sentence and are 6 to 10 words long 
result = phrases.select do |p| 
    phrase.include?(p) && (6..10).include?(p.split.size) 
end 
# remove entries that are substrings of others 
# (reject builds a new array, so nothing is deleted while iterating) 
result = result.reject do |r| 
    result.any? {|v| v != r && v.include?(r)} 
end 
print result.inspect 
#["to play online but its in perfect condition", "to buy online pass from EA"] 
+0

Nice answer, but it does not account for phrases that overlap at the head and tail without one being fully contained in the other. – dbenhur
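
The case raised in this comment can be reproduced with a short sketch. The two candidate phrases below are hypothetical (they are not in the question's array), and `edge_overlap` is a helper name introduced here purely for illustration. Both candidates pass the length and substring filters from the answer, yet they share four words where one ends and the other begins.

```ruby
phrase = "will have to buy online pass from EA to play online but its in perfect condition"

# Hypothetical candidates: each is 7 words long and occurs in the sentence
a = "buy online pass from EA to play"
b = "from EA to play online but its"

# Hypothetical helper: number of words where the end of one phrase
# repeats at the start of the other (0 when the edges do not match)
def edge_overlap(x, y)
  wx, wy = x.split, y.split
  limit = [wx.size, wy.size].min
  limit.downto(0).find do |len|
    wx.last(len) == wy.first(len) || wy.last(len) == wx.first(len)
  end
end

# The same filters as in the answer: length limit, then substring removal
candidates = [a, b].select { |p| phrase.include?(p) && (6..10).include?(p.split.size) }
kept = candidates.reject { |r| candidates.any? { |v| v != r && v.include?(r) } }

substring_of_other = a.include?(b) || b.include?(a)
puts substring_of_other  # false, so neither phrase is removed
puts kept.size           # 2
puts edge_overlap(a, b)  # 4 ("from EA to play" is shared)
```

A check along these lines could be added as an extra filtering or scoring step if partially overlapping phrases should be avoided as well.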