How to parse multiple PDF files in a folder to text in Java

I have a folder with many PDFs, and I need to convert all of these files to txt and save the resulting text files in a different folder. I would like to use Java for this.

I have this code that parses a PDF, but it only works on one file at a time, and I need to process a folder containing thousands of PDFs.

PDFTextStripper pdfStripper = null; 
PDDocument pdDoc = null; 
COSDocument cosDoc = null; 
File file = new File("C:/my.pdf"); 

try { 
    PDFParser parser = new PDFParser(new FileInputStream(file)); 
    parser.parse(); 
    cosDoc = parser.getDocument(); 
    pdfStripper = new PDFTextStripper(); 
    pdDoc = new PDDocument(cosDoc); 
    pdfStripper.setStartPage(1); 
    pdfStripper.setEndPage(20); 
    String parsedText = pdfStripper.getText(pdDoc); 
} catch (IOException e) {
    // TODO Auto-generated catch block
    e.printStackTrace();
}

Any ideas?

Source

2017-04-24 fluxing23

You could try something. –

Try using the folder name and the 'listFiles()' method instead of a single file name. –
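To make the 'listFiles()' suggestion concrete, here is a minimal sketch using only the standard library. The class name `PdfLister` and the method name `listPdfFiles` are assumptions made up for this illustration; they are not part of PDFBox or of the question's code.

```java
import java.io.File;
import java.util.Arrays;
import java.util.Locale;

// Hypothetical helper class; the names are assumptions for illustration only.
class PdfLister {

    // Returns the regular files in `folder` whose names end in ".pdf"
    // (case-insensitive), sorted with File's natural ordering.
    // Returns an empty array if the folder is missing or cannot be read.
    static File[] listPdfFiles(File folder) {
        File[] files = folder.listFiles(
                (File f) -> f.isFile()
                        && f.getName().toLowerCase(Locale.ROOT).endsWith(".pdf"));
        if (files == null) {
            return new File[0]; // listFiles() returns null rather than throwing
        }
        Arrays.sort(files); // listFiles() does not guarantee any particular order
        return files;
    }

    public static void main(String[] args) {
        // Folder path comes from the first argument, or the current directory.
        File folder = new File(args.length > 0 ? args[0] : ".");
        for (File pdf : listPdfFiles(folder)) {
            System.out.println(pdf.getPath());
        }
    }
}
```

Filtering by extension keeps subfolders and unrelated files out of the parsing step, and the null check matters because `listFiles()` returns null when the path is not a readable directory.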

You can put the code from the question in a loop that iterates over the files, like this:

File folder = new File("/yourFolder"); // put all the pdf files in a folder
File[] listOfFiles = folder.listFiles(); // all entries as an array; null if the folder cannot be read
StringBuilder parsedText = new StringBuilder(); // the text of every PDF is appended to this

if (listOfFiles != null) {
    for (File file : listOfFiles) { // cycle through this array
        if (!file.isFile() || !file.getName().toLowerCase().endsWith(".pdf")) {
            continue; // skip subfolders and anything that is not a PDF
        }
        PDDocument pdDoc = null;
        try (FileInputStream in = new FileInputStream(file)) { // do the same as for one file
            PDFParser parser = new PDFParser(in); // same constructor as in the question
            parser.parse();
            COSDocument cosDoc = parser.getDocument();
            pdDoc = new PDDocument(cosDoc);
            PDFTextStripper pdfStripper = new PDFTextStripper();
            pdfStripper.setStartPage(1);
            pdfStripper.setEndPage(pdDoc.getNumberOfPages()); // if always till the last page
            parsedText.append(pdfStripper.getText(pdDoc)).append(System.lineSeparator());
        } catch (IOException e) {
            e.printStackTrace(); // report the failure and continue with the next file
        } finally {
            if (pdDoc != null) {
                try {
                    pdDoc.close(); // release the document before opening the next one
                } catch (IOException ignored) {
                    // nothing more can be done for this file
                }
            }
        }
    }
}

Source

2017-04-24 15:21:18 Yahya

Thank you so much!! As a follow-up, I'm wondering whether there is a way to save each new text separately instead of in one big text file. – fluxing23

Glad I could help :) Instead of appending to "parsedText", you can save the text to a file on every loop iteration. – Yahya
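A rough sketch of that per-file approach, assuming the PDF extraction itself is supplied from outside. The class `PerFileTextWriter`, the interface `TextExtractor`, and the methods `textFileName` and `convertFolder` are hypothetical names invented for this example; only standard-library calls are used, and the extractor stands in for the PDFBox code from the answer.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Locale;

// Hypothetical helper; every name in this class is an assumption for illustration.
class PerFileTextWriter {

    // Stand-in for the PDFBox extraction code from the answer.
    interface TextExtractor {
        String extract(File pdf) throws IOException;
    }

    // "report.pdf" -> "report.txt"; other names simply get ".txt" appended.
    static String textFileName(String pdfName) {
        if (pdfName.toLowerCase(Locale.ROOT).endsWith(".pdf")) {
            return pdfName.substring(0, pdfName.length() - 4) + ".txt";
        }
        return pdfName + ".txt";
    }

    // Writes one .txt file into outputFolder for each PDF in inputFolder
    // and returns the number of text files written.
    static int convertFolder(File inputFolder, File outputFolder, TextExtractor extractor)
            throws IOException {
        Files.createDirectories(outputFolder.toPath());
        File[] files = inputFolder.listFiles();
        if (files == null) {
            throw new IOException("Cannot list folder: " + inputFolder);
        }
        int written = 0;
        for (File file : files) {
            if (!file.isFile()
                    || !file.getName().toLowerCase(Locale.ROOT).endsWith(".pdf")) {
                continue; // skip subfolders and non-PDF files
            }
            try {
                String text = extractor.extract(file);
                Path target = outputFolder.toPath().resolve(textFileName(file.getName()));
                Files.write(target, text.getBytes(StandardCharsets.UTF_8));
                written++;
            } catch (IOException e) {
                // One unreadable PDF should not stop the whole batch.
                System.err.println("Skipping " + file + ": " + e.getMessage());
            }
        }
        return written;
    }
}
```

If the parsing code from the answer were wrapped in a method such as `extractText(File)` (a hypothetical name), the call might look like `PerFileTextWriter.convertFolder(new File("/yourFolder"), new File("/yourTextFolder"), f -> extractText(f))`. Writing each result immediately also avoids holding the text of thousands of PDFs in memory at once.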

Thanks! I'll give it a try. – fluxing23
