Searching for a sequence of bytes in a binary file with Java

I have a sequence of bytes that I need to search for in a set of binary files using Java.

Example: I'm looking for the byte sequence DEADBEEF (in hex) in a binary file. How would I go about doing this in Java? Is there a built-in method, like String.contains(), for binary files?

Source

2009-10-02 Bassam

No, there is no built-in method for this. However, the following is copied directly from HERE (with fixes applied to the original code):

/**
 * Knuth-Morris-Pratt Algorithm for Pattern Matching
 */
class KMPMatch {
    /**
     * Finds the first occurrence of the pattern in the text.
     */
    public int indexOf(byte[] data, byte[] pattern) {
        int[] failure = computeFailure(pattern);

        int j = 0;
        if (data.length == 0) return -1;

        for (int i = 0; i < data.length; i++) {
            while (j > 0 && pattern[j] != data[i]) {
                j = failure[j - 1];
            }
            if (pattern[j] == data[i]) { j++; }
            if (j == pattern.length) {
                return i - pattern.length + 1;
            }
        }
        return -1;
    }

    /**
     * Computes the failure function using a boot-strapping process,
     * where the pattern is matched against itself.
     */
    private int[] computeFailure(byte[] pattern) {
        int[] failure = new int[pattern.length];

        int j = 0;
        for (int i = 1; i < pattern.length; i++) {
            while (j > 0 && pattern[j] != pattern[i]) {
                j = failure[j - 1];
            }
            if (pattern[j] == pattern[i]) {
                j++;
            }
            failure[i] = j;
        }

        return failure;
    }
}

Source

2009-10-02 05:11:13 janko

I love Stack Overflow :) This can handle big arrays of bytes. Thanks! – Teekin

A small optimization: if data.length is 0, there is no need to compute the failure function of the pattern, so the data.length zero check can be moved to the first line of the function. – dexametason
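Since the question is about files rather than in-memory arrays, the same KMP loop can be fed from a stream one byte at a time. The following is only a sketch under stated assumptions: the class name StreamKmpSearch is hypothetical, an empty pattern is treated as "not found", and, in the spirit of the comment above, the cheap check runs before the failure table is built.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

/** Hypothetical helper: KMP search over a stream instead of an in-memory array. */
final class StreamKmpSearch {

    /** Returns the offset of the first match, or -1 if the pattern does not occur. */
    static long indexOf(InputStream in, byte[] pattern) throws IOException {
        // Cheap check first, before the failure table is built.
        if (pattern.length == 0) {
            return -1; // assumption: an empty pattern is treated as "not found"
        }
        int[] failure = computeFailure(pattern);

        int j = 0;    // number of pattern bytes matched so far
        long pos = 0; // offset of the byte currently being examined
        int b;
        while ((b = in.read()) != -1) {
            byte current = (byte) b;
            while (j > 0 && pattern[j] != current) {
                j = failure[j - 1];
            }
            if (pattern[j] == current) {
                j++;
            }
            if (j == pattern.length) {
                return pos - pattern.length + 1;
            }
            pos++;
        }
        return -1;
    }

    /** Same failure function as in the answer above. */
    private static int[] computeFailure(byte[] pattern) {
        int[] failure = new int[pattern.length];
        int j = 0;
        for (int i = 1; i < pattern.length; i++) {
            while (j > 0 && pattern[j] != pattern[i]) {
                j = failure[j - 1];
            }
            if (pattern[j] == pattern[i]) {
                j++;
            }
            failure[i] = j;
        }
        return failure;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java StreamKmpSearch <file>");
            return;
        }
        // The DEADBEEF sequence from the question.
        byte[] pattern = {(byte) 0xDE, (byte) 0xAD, (byte) 0xBE, (byte) 0xEF};
        try (InputStream in = new BufferedInputStream(Files.newInputStream(Paths.get(args[0])))) {
            System.out.println("offset = " + indexOf(in, pattern));
        }
    }
}
```

Because KMP never needs to step backwards in the input, the file does not have to fit in memory. The BufferedInputStream keeps the per-byte read() calls cheap.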

private int bytesIndexOf(byte[] source, byte[] search, int fromIndex) {
    boolean find = false;
    int i;
    // Use <= so that a match ending on the last byte is found
    // (the version as originally posted used <, which missed it).
    for (i = fromIndex; i <= (source.length - search.length); i++) {
        if (source[i] == search[0]) {
            find = true;
            for (int j = 0; j < search.length; j++) {
                if (source[i + j] != search[j]) {
                    find = false;
                    break; // mismatch: stop comparing at this offset
                }
            }
        }
        if (find) {
            break;
        }
    }
    if (!find) {
        return -1;
    }
    return i;
}

Source

2011-01-12 04:18:36 joseluisbz

This does not work on the last bytes of the string. –

Source

2015-03-17 14:43:04

Which part is supposed to enforce the 1024-byte limit on the pattern that the unused MAX_PATTERN_LENGTH member suggests? – user1767316

You can find a library for this on GitHub.

Here is the library, with examples, on GitHub: https://github.com/riversun/bigdoc

package org.example; 

import java.io.File; 
import java.util.List; 

import org.riversun.bigdoc.bin.BigFileSearcher; 

public class Example { 

    public static void main(String[] args) throws Exception {

        byte[] searchBytes = "hello world.".getBytes("UTF-8");

        File file = new File("/var/tmp/yourBigfile.bin");

        BigFileSearcher searcher = new BigFileSearcher();

        List<Long> findList = searcher.searchBigFile(file, searchBytes);

        System.out.println("positions = " + findList);
    }
}

If you want to search in memory, check this out. Here are examples on GitHub: https://github.com/riversun/finbin

import java.util.List; 

import org.riversun.finbin.BigBinarySearcher; 

public class Example { 

    public static void main(String[] args) throws Exception {

        BigBinarySearcher bbs = new BigBinarySearcher();

        byte[] iamBigSrcBytes = "Hello world.It's a small world.".getBytes("utf-8");

        byte[] searchBytes = "world".getBytes("utf-8");

        List<Integer> indexList = bbs.searchBytes(iamBigSrcBytes, searchBytes);

        System.out.println("indexList=" + indexList);
    }
}

This returns all matching positions in the array of bytes.

Source

2015-07-09 05:00:50 riversun
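For readers who would rather not add a dependency, collecting every match position in memory can also be sketched in plain Java. This is not how finbin works internally; it is a simple brute-force scan. The class name AllMatches and the choice to return an empty list for an empty pattern are assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, unrelated to the finbin implementation. */
final class AllMatches {

    /** Returns the start offset of every match, including overlapping ones. */
    static List<Integer> indexesOf(byte[] source, byte[] search) {
        List<Integer> hits = new ArrayList<>();
        if (search.length == 0 || search.length > source.length) {
            return hits; // assumption: an empty pattern yields no matches
        }
        // <= so that a match ending on the final byte is included.
        for (int i = 0; i <= source.length - search.length; i++) {
            if (matchesAt(source, search, i)) {
                hits.add(i);
            }
        }
        return hits;
    }

    /** True if search occurs in source starting exactly at offset. */
    private static boolean matchesAt(byte[] source, byte[] search, int offset) {
        for (int j = 0; j < search.length; j++) {
            if (source[offset + j] != search[j]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] source = "Hello world.It's a small world.".getBytes(StandardCharsets.UTF_8);
        byte[] search = "world".getBytes(StandardCharsets.UTF_8);
        System.out.println("indexes = " + indexesOf(source, search)); // prints "indexes = [6, 25]"
    }
}
```

The scan is O(n*m) in the worst case, which is usually fine for short patterns. For long or highly repetitive patterns, the KMP approach from the earlier answer avoids the repeated comparisons.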
