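Finding similar words (strings) in two files

I need to check word 1 from file 1 against word 2 from file 2 for similarity. If word 1 (file 1) is the same as word 2 (file 2), file 3 should be the output, showing True or False. I have the code below. It produces no errors, but I am stuck because it gives no output. I am a Java beginner.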

import java.io.File; 
import java.io.FileNotFoundException; 
import java.io.FileWriter; 
import java.io.IOException; 
import java.util.ArrayList; 
import java.util.Scanner; 

public class test2 { 

    private static ArrayList<String> load(String f1) throws FileNotFoundException { 
     Scanner reader = new Scanner(new File(f1)); 
     ArrayList<String> out = new ArrayList<String>(); 
     while (reader.hasNext()) { 
      String temp = reader.nextLine(); 
      String[] sts = temp.split(" "); 
      for (int i = 0; i < sts.length; i++) { 
       if (sts[i].equals("") && sts[i].equals(" ") && sts[i].equals("\n")) { 
        out.add(sts[i]); 
       } 
      } 
     } 
     return out; 
    } 

    private static void write(ArrayList<String> out, String fname) throws IOException { 
     FileWriter writer = new FileWriter(new File("out_test2.txt")); 
     for (int i = 0; i < out.size(); i++) { 
      writer.write(out.get(i) + "\n"); 
     } 
     writer.close(); 
    } 

    public static void main(String[] args) throws IOException { 
     ArrayList<String> file1; 
     ArrayList<String> file2; 
     ArrayList<String> out = new ArrayList<String>(); 
     file1 = load("IbanDict.txt"); 
     file2 = load("AFF_outVal.txt"); 

     for (int i = 0; i < file1.size(); i++) { 
      String word1 = file1.get(i); 
      for (int z = 0; z < file2.size(); z++) { 
       if (word1.equalsIgnoreCase(file2.get(z))) { 
        boolean already = false; 
        for (int q = 0; q < out.size(); q++) { 
         if (out.get(q).equalsIgnoreCase(file1.get(i))) { 
          already = true; 
         } 
        } 
        if (already == false) { 
         out.add(file1.get(i)); 
        } 
       } 
      } 
     } 
     write(out, "out_test2.txt"); 
    } 

}

Source: 2011-08-15 ssaee


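Could you make your code clearer? It is unreadable as it stands. Also, have you stepped through each loop in a debugger to see exactly what each ArrayList and/or variable holds when you compare the words or write file3? – Jack

Please edit your post and indent the code to show its structure. In its current state it is unreadable. –

You might want to look into Levenshtein distance. There are existing Java libraries/algorithms for it. – Mike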
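As a rough illustration of Mike's suggestion, here is a minimal sketch of the classic dynamic-programming Levenshtein distance. It is not code from this thread: the class name `LevenshteinSketch` and the method `distance` are hypothetical, and an existing library implementation would likely be a better choice in practice.

```java
final class LevenshteinSketch {

    // Returns the minimum number of single-character insertions,
    // deletions and substitutions needed to turn a into b.
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning the empty prefix of a into b[0..j) takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning a[0..i) into the empty string takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertOrDelete = Math.min(curr[j - 1] + 1, prev[j] + 1);
                curr[j] = Math.min(insertOrDelete, prev[j - 1] + cost);
            }
            // The row just computed becomes the previous row of the next iteration.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(distance("kitten", "sitting")); // prints 3
    }
}
```

Two words could then be treated as "similar" when the distance is below some threshold. The threshold itself is a design choice that the question does not specify.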

First, Scanner tokenizes the string for you. You do not need to read a line and then use the String.split method to get the tokens. See here.

Second, there appears to be a logic error here:

for (int i = 0; i < sts.length; i++) { 
    if (sts[i].equals("") && sts[i].equals(" ") 
      && sts[i].equals("\n")) 
     out.add(sts[i]); 
}

This should be (assuming I understand what you are trying to do):

for (int i = 0; i < sts.length; i++) { 
    if (!(sts[i].equals("") && sts[i].equals(" ") && sts[i] 
      .equals("\n"))) 
     out.add(sts[i]); 
}

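That is why you see no output.

Note: this matching approach is error-prone and far from optimal (linear). You may have more success with a specialised text-processing language such as awk or Python (assuming you are not bound to Java). If you stay with Java, you could also extend the FilterReader/Writer classes, as shown here.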
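A string can never equal "", " " and "\n" at the same time, so the `&&` chain is never true, and negating the whole chain is always true. That lets words through, but it also lets empty tokens through. The following is a minimal sketch, assuming the intent is to keep every non-blank token. The names `TokenFilterSketch` and `tokens` are hypothetical and do not appear in the thread.

```java
import java.util.ArrayList;
import java.util.List;

final class TokenFilterSketch {

    // Splits a line on runs of whitespace and keeps only non-empty tokens.
    static List<String> tokens(String line) {
        List<String> out = new ArrayList<String>();
        for (String token : line.split("\\s+")) {
            // split can yield an empty first token when the line starts with whitespace
            if (!token.isEmpty()) {
                out.add(token);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokens("  I am   happy ")); // prints [I, am, happy]
    }
}
```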

Source: 2011-08-15 02:25:47 wulfgarpro

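Actually, I am trying to check whether word 1 is in the dictionary (file1). So how can I improve the if statement? – ssaee

Without seeing how your solution actually works, I suggest trying to negate the logic, i.e. 'if (!(sts[i].equals("") && sts[i].equals(" ") && sts[i].equals("\n"))) {...}' – wulfgarpro

Yes, it is empty. The if statement has to follow the format of the input file. – ssaee

I see a few problems here. One, which wulfgar.pro already pointed out, is that the checks for spaces are redundant.

Another problem is that the tokens Scanner produces include punctuation, so if file1 contains "I am happy, and sad" and file2 contains "happy", the token "happy," will not match "happy".

Since you do not care how many times a word matches, I changed the code to use Sets. After that, use for-each loops for the iteration (you are already using generics, so for-each loops should be available to you).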

private static final Pattern PUNCTUATION_PATTERN = Pattern.compile("[\\w']+"); 

private static Set<String> load(String f1) throws FileNotFoundException { 
    Scanner reader = new Scanner(new File(f1)); 
    Set<String> out = new HashSet<String>(); 
    while (reader.hasNext()) { 
     String tempLine = reader.nextLine(); 
     if (tempLine != null 
       && tempLine.trim().length() > 0) { 
      Matcher matcher = PUNCTUATION_PATTERN.matcher(tempLine); 
      while (matcher.find()) { 
       out.add(tempLine.substring(matcher.start(), matcher.end())); 
      } 
     } 
    } 
    return out; 
}

With the while loop in the load method rewritten as above, the for loops in the main method can be simplified as follows:

public static void main(String[] args) throws IOException { 
    Set<String> out = new HashSet<String>(); 
    Set<String> file1 = load("IbanDict.txt"); 
    Set<String> file2 = load("AFF_outVal.txt"); 

    for (String word1 : file1) { 
     for (String word2 : file2) { 
      if (word1.equalsIgnoreCase(word2)) { 
       boolean already = false; 
       for (String outStr : out) { 
        if (outStr.equalsIgnoreCase(word1)) { 
         already = true; 
        } 
       } 
       if (!already) { 
        out.add(word1); 
       } 
      } 
     } 
    } 
    write(out, "out_test2.txt"); 
}
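Once both word lists are Sets, the nested loops and the "already" flag are arguably unnecessary, because a Set rejects duplicates and `retainAll` computes the intersection directly. Here is a minimal sketch under the assumption that case-insensitive matching can be done by lower-casing every word first. Unlike the loop version, the result contains the lower-cased words rather than the spelling from file 1. The names `CommonWordsSketch`, `common` and `lower` are hypothetical.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;

final class CommonWordsSketch {

    // Returns the lower-cased words that occur in both inputs.
    static Set<String> common(Iterable<String> words1, Iterable<String> words2) {
        Set<String> result = lower(words1);
        result.retainAll(lower(words2)); // keep only the words also present in words2
        return result;
    }

    // Copies the words into a sorted set, lower-casing each one.
    private static Set<String> lower(Iterable<String> words) {
        Set<String> out = new TreeSet<String>();
        for (String word : words) {
            out.add(word.toLowerCase(Locale.ROOT));
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> both = common(Arrays.asList("Happy", "sad", "I"),
                                  Arrays.asList("happy", "i", "glad"));
        System.out.println(both); // prints [happy, i]
    }
}
```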

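Finally, change the write method to accept any Iterable and to use the platform line separator so that the output is OS independent.

private static void write(Iterable<String> out, String fname) throws IOException { 
    OutputStreamWriter writer = new FileWriter(new File(fname)); 
    for (String s : out) { 
        // line.separator is the platform line ending (File.separator is the path separator) 
        writer.write(s + System.getProperty("line.separator")); 
    } 
    writer.close(); 
}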

Source: 2011-08-18 17:50:41 CrackerJack9

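Basically, you want to check whether each word in file 2 is also in file 1, printing true if it is and false otherwise.

The easiest way to do this is to build a searchable data set of all the words in file 1. Then, for each word in file 2, check whether that set contains the word or not.

The code below has no effect. It builds an array, sts, of all the words on a line of the file, then checks whether a word is an empty string and a space and a newline. If so, it adds the word to the ArrayList. A word can never be all of those things at once, so no word is ever added.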

Scanner reader = new Scanner(new File(f1)); 
ArrayList<String> out = new ArrayList<String>(); 
while (reader.hasNext()) { 
    String temp = reader.nextLine(); 
    String[] sts = temp.split(" "); 
    for (int i = 0; i < sts.length; i++) { 
        if (sts[i].equals("") && sts[i].equals(" ") && sts[i].equals("\n")) { 
            out.add(sts[i]); 
        } 
    } 
}

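To get a collection of all the words, fix the loop here by iterating over every token from the Scanner and adding it to the ArrayList:

while (reader.hasNext()) { 
    out.add(reader.next()); 
}

Now that you have an ArrayList of all the words in your dictionary, you can start checking.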

To see whether a word from file 2 is in the dictionary, you can simply call contains:

dictionary.contains(file2.get(i))

contains uses the equals method on every String in the ArrayList to check whether there is a match.
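Because `ArrayList.contains` compares with `equals`, the lookup is case-sensitive and scans the whole list for every word. The following is a minimal sketch of a hash-based alternative, assuming that case-insensitive matching is wanted (the question's own code uses `equalsIgnoreCase`). The class `DictionarySketch` and its sample words are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

final class DictionarySketch {

    // Lower-cased dictionary words; a HashSet avoids scanning every entry on lookup.
    private final Set<String> words = new HashSet<String>();

    DictionarySketch(Iterable<String> entries) {
        for (String entry : entries) {
            words.add(entry.toLowerCase(Locale.ROOT));
        }
    }

    // True if the word is in the dictionary, ignoring case.
    boolean contains(String word) {
        return words.contains(word.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        DictionarySketch dictionary = new DictionarySketch(Arrays.asList("Happy", "sad"));
        System.out.println(dictionary.contains("happy")); // prints true
        System.out.println(dictionary.contains("glad"));  // prints false
    }
}
```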

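Now, if you want to process the file line by line, you should not build two data sets. Your dictionary should be a data set, but for file 2 it is easier to just use the Scanner object.

Read each line from the Scanner. You should use hasNextLine() rather than hasNext() as the loop check here, because you are reading whole lines.

line = reader.nextLine();

For each token on the line, check whether it has a match in the list, and write true plus a space if it does or false plus a space if it does not:

String[] splitLine = line.split(" "); 
for (String token : splitLine) { 
    writer.write(dictionary.contains(token) + " "); 
}

After checking each line, write a line break to the output file so that the line numbers match up.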

import java.io.File; 
import java.io.FileNotFoundException; 
import java.io.FileWriter; 
import java.io.IOException; 
import java.util.ArrayList; 
import java.util.List; 
import java.util.Scanner; 

public class Test { 

    // Reads every whitespace-separated token of the dictionary file into a list. 
    private static List<String> loadDictionary(String fileName) throws FileNotFoundException { 
        Scanner reader = new Scanner(new File(fileName)); 
        List<String> out = new ArrayList<String>(); 
        while (reader.hasNext()) { 
            out.add(reader.next()); 
        } 
        reader.close(); 
        return out; 
    } 

    public static void main(String[] args) throws IOException { 
        List<String> dictionary = loadDictionary("IbanDict.txt"); 

        Scanner reader = new Scanner(new File("AFF_outVal.txt")); 
        FileWriter writer = new FileWriter(new File("out_test2.txt")); 

        // Write one output line per input line so that the line numbers match. 
        while (reader.hasNextLine()) { 
            String line = reader.nextLine(); 
            String[] tokens = line.split(" "); 
            for (String token : tokens) { 
                // true if the token is in the dictionary, false otherwise 
                writer.write(dictionary.contains(token) + " "); 
            } 
            writer.write(System.getProperty("line.separator")); 
        } 
        writer.close(); 
        reader.close(); 
    } 
}

Source: 2011-08-19 01:28:48

You can format your code. Here is a version that addresses the problem:

import java.io.File; 
import java.io.FileNotFoundException; 
import java.io.FileWriter; 
import java.io.IOException; 
import java.util.HashMap; 
import java.util.Map; 
import java.util.Scanner; 
import java.util.regex.Matcher; 
import java.util.regex.Pattern; 

public class Test { 

    private static final Pattern WORD_PATTERN = Pattern.compile("[\\w']+"); 

    private static Map<String, Integer> load(final String f1) throws FileNotFoundException { 
    Scanner reader = new Scanner(new File(f1)); 
    Map<String, Integer> out = new HashMap<String, Integer>(); 
    while (reader.hasNext()) { 
     String tempLine = reader.nextLine(); 
     if (tempLine != null && tempLine.trim().length() > 0) { 
     Matcher matcher = WORD_PATTERN.matcher(tempLine); 
     while (matcher.find()) { 
      out.put(matcher.group().toLowerCase(), 0); 
     } 
     } 
    } 

    return out; 
    } 

    private static void write(final Map<String, Integer> out, final String fname) throws IOException { 
    FileWriter writer = new FileWriter(new File(fname)); 
    for (Map.Entry<String, Integer> word : out.entrySet()) { 
     if (word.getValue() == 1) { 
     writer.write(word.getKey() + "\n"); 
     } 
    } 
    writer.close(); 
    } 

    public static void main(final String[] args) throws IOException { 
    Map<String, Integer> file1 = load("file1.txt"); 
    Map<String, Integer> file2 = load("file2.txt"); 

    // below for loop will run just one time, so it is much faster 
    for (Map.Entry<String, Integer> file1Word : file1.entrySet()) { 
     if (file2.containsKey(file1Word.getKey())) { 
     file1.put(file1Word.getKey(), 1); 
     file2.put(file1Word.getKey(), 1); 
     } 
    } 

    write(file1, "test1.txt"); 
    write(file2, "test2.txt"); 
    } 

}

Source: 2011-08-23 16:02:56 Kowser

I like the use of the 'Pattern' class. I would suggest extending 'FilterReader/Writer' as a way to promote encapsulation of this logic. – wulfgarpro
