Regex to extract the Content-Type header lines

How can I extract the lines that carry the Content-Type information? For some mails this header can span 2, 3, or even 4 lines, depending on how the mail was sent.

Content-Type: text/plain; 
    charset="us-ascii" 
Content-Transfer-Encoding: 7bit 

Lorem ipsum dolor sit amet, consectetur adipisicing elit, 
sed do eiusmod tempor incididunt ut labore et dolore magna 
aliqua. Ut enim ad minim veniam, quis nostrud exercitation 
ullamco laboris nisi ut aliquip ex ea commodo consequat. 
Duis aute irure dolor in reprehenderit in voluptate velit 
esse cillum dolore eu fugiat nulla pariatur. Excepteur sint 
occaecat cupidatat non proident, sunt in culpa qui officia 
deserunt mollit anim id est laborum.

That is one example. I tried this regex: ^(Content-.*:(.|\n)*)* but it grabs everything. How should I phrase my regex in Java so that it returns only this part?

Content-Type: text/plain; 
    charset="us-ascii" 
Content-Transfer-Encoding: 7bit

Source

2011-10-28 Carven

Try this regex:

Pattern regex = Pattern.compile("Content-Type.*?(?=^\\s*\n?\r?$)", 
           Pattern.DOTALL | Pattern.MULTILINE);

Source

2011-10-28 03:26:37

I tried this, but 'find()' returns false. It does not find that part, and I'm not sure why it returns false. – Carven

@xEnOn, I've updated the regex. Try it now and let me know whether it works. –

@xEnOn, here it shows a match: http://regexr.com?2v20l –

Pattern regex = Pattern.compile("^Content-Type(?:.|\\s)*?(?=\n\\s+\n)");

This matches everything starting at Content-Type up to the first completely empty line.

Source

2011-10-28 02:33:57 FailedDev

Thanks! But why do I get a 'StackOverflowError' when I use it this way? 'mailContent.replaceFirst("^Content-Type(?:.|\\s)*?(?=\n\\s+\n)", "");' – Carven

@xEnOn Honestly, I don't know. Can you post a sample on ideone.com? – FailedDev

I'm not sure which part of the code I should paste as a sample, lol. Everything else works fine, but as soon as I switch that one regex to the one you suggested, I get a StackOverflowError. So the only problem is the 'replaceAll' line. It's strange, because the regex worked when I put it into a regex tester. I don't know why Java throws that error. – Carven
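A likely explanation for the StackOverflowError, offered as an assumption because no stack trace was posted: java.util.regex evaluates a repeated group that contains an alternation, such as (?:.|\\s)*?, recursively, so a long mail can exhaust the call stack even though the same pattern works in an online tester. A minimal sketch that avoids the alternation by letting '.' match line breaks through Pattern.DOTALL is shown below. The class name HeaderStrip, the method stripContentType, and the [ \\t]* blank-line check are illustrative choices, not part of the original answer.

```java
import java.util.regex.Pattern;

class HeaderStrip {
    // DOTALL makes '.' match line terminators, so no (?:.|\s) alternation
    // is needed. MULTILINE lets '^' match at the start of any line.
    // The lookahead stops the match before the line break that precedes
    // the first blank (or space/tab-only) line; "\r" is optional.
    static final Pattern HEADER = Pattern.compile(
            "^Content-Type.*?(?=\\r?\\n[ \\t]*\\r?\\n)",
            Pattern.DOTALL | Pattern.MULTILINE);

    // Hypothetical helper: removes the first match from the mail text.
    static String stripContentType(String mailContent) {
        return HEADER.matcher(mailContent).replaceFirst("");
    }

    public static void main(String[] args) {
        String mail =
                "Content-Type: text/plain;\r\n" +
                "    charset=\"us-ascii\"\r\n" +
                "Content-Transfer-Encoding: 7bit\r\n" +
                "\r\n" +
                "Lorem ipsum dolor sit amet\r\n";
        // Prints what is left once the header lines have been removed.
        System.out.print(stripContentType(mail));
    }
}
```

Like the regexes in this thread, the sketch removes everything from Content-Type to the end of the header section, so any Content-Transfer-Encoding line that follows is removed as well.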

^Content-(.|\n)*\n\n matches up to the blank line.
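One caveat worth checking before relying on this pattern: the quantifier is greedy, so if the body itself contains blank lines, the match runs to the last blank line rather than the first. A small sketch comparing the greedy form with a lazy variant follows. The class GreedyVsLazy, the helper firstMatch, and the sample mail are made up for illustration, and the repeated (.|\n) group can still overflow the stack on long input.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyVsLazy {
    // Returns the first match of regex in text, or null if there is none.
    static String firstMatch(String regex, String text) {
        Matcher m = Pattern.compile(regex, Pattern.MULTILINE).matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // The body contains a second blank line between two paragraphs.
        String mail =
                "Content-Type: text/plain\n" +
                "Content-Transfer-Encoding: 7bit\n" +
                "\n" +
                "First paragraph.\n" +
                "\n" +
                "Second paragraph.\n";

        // Greedy: consumes as much as possible, then backs up to the LAST "\n\n".
        String greedy = firstMatch("^Content-(.|\\n)*\\n\\n", mail);
        // Lazy: stops at the FIRST "\n\n", i.e. the end of the header section.
        String lazy = firstMatch("^Content-(.|\\n)*?\\n\\n", mail);

        System.out.println("greedy length: " + greedy.length());
        System.out.println("lazy length:   " + lazy.length());
    }
}
```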

Source

2011-10-28 03:22:51 hllau

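You could check the relevant RFC for the exact definition of a header. IIRC, a line break followed by one or more whitespace characters (e.g. space, non-breaking space, tab) essentially has to be treated as part of the same header line. I also think you are supposed to collapse the line break and the whitespace into a single whitespace element. (Note: there may be more complicated rules, so check the RFC.)

Only when a new line starts directly with a non-whitespace character does the next header begin, and when another line break follows immediately, the header section ends and the body section begins.

BTW: why not use JavaMail instead of reinventing the wheel?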
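The folding rules described above can be sketched without a single large regex by walking the header section line by line. This is only an illustration under simplifying assumptions, not a full RFC-compliant parser: the class HeaderUnfolder and the method parseHeaders are hypothetical names, only space and tab count as continuation whitespace, a whitespace-only line is treated as the blank separator, and a repeated header name overwrites the earlier value.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class HeaderUnfolder {
    // Reads headers up to the first blank line and unfolds continuation
    // lines (lines starting with space or tab) into the preceding header.
    static Map<String, String> parseHeaders(String message) {
        Map<String, String> headers = new LinkedHashMap<>();
        String currentName = null;
        StringBuilder currentValue = new StringBuilder();

        // Accept both "\r\n" and bare "\n" line endings.
        for (String line : message.split("\\r?\\n", -1)) {
            if (line.trim().isEmpty()) {
                break; // blank line: header section ends, body begins
            }
            char first = line.charAt(0);
            if ((first == ' ' || first == '\t') && currentName != null) {
                // Continuation: collapse line break + indentation to one space.
                currentValue.append(' ').append(line.trim());
            } else {
                if (currentName != null) {
                    headers.put(currentName, currentValue.toString());
                }
                int colon = line.indexOf(':');
                if (colon < 0) {
                    currentName = null; // malformed line: skipped in this sketch
                    continue;
                }
                currentName = line.substring(0, colon).trim();
                currentValue = new StringBuilder(line.substring(colon + 1).trim());
            }
        }
        if (currentName != null) {
            headers.put(currentName, currentValue.toString());
        }
        return headers;
    }

    public static void main(String[] args) {
        String mail =
                "Content-Type: text/plain;\r\n" +
                "    charset=\"us-ascii\"\r\n" +
                "Content-Transfer-Encoding: 7bit\r\n" +
                "\r\n" +
                "Lorem ipsum dolor sit amet\r\n";
        System.out.println(parseHeaders(mail).get("Content-Type"));
    }
}
```

Header names are matched case-sensitively here. A real parser would normalize them, which is one more reason to consider an existing mail library.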

Source

2011-10-28 07:59:06

This tested script works for me. It handles both \r\n line terminations (the valid form) and Unix-style \n line terminations (invalid, but commonly used in the wild):

import java.util.regex.*;

public class TEST
{
    public static void main(String[] args)
    {
        // Sample message: folded Content-Type header, blank line, then body.
        String subjectString =
            "Content-Type: text/plain;\r\n" +
            " charset=\"us-ascii\"\r\n" +
            "Content-Transfer-Encoding: 7bit\r\n" +
            "\r\n" +
            "Lorem ipsum dolor sit amet, consectetur adipisicing elit,\r\n" +
            "sed do eiusmod tempor incididunt ut labore et dolore magna\r\n" +
            "aliqua. Ut enim ad minim veniam, quis nostrud exercitation\r\n" +
            "ullamco laboris nisi ut aliquip ex ea commodo consequat.\r\n" +
            "Duis aute irure dolor in reprehenderit in voluptate velit\r\n" +
            "esse cillum dolore eu fugiat nulla pariatur. Excepteur sint\r\n" +
            "occaecat cupidatat non proident, sunt in culpa qui officia\r\n" +
            "deserunt mollit anim id est laborum.\r\n";
        String resultString = null;
        // Match lazily from "Content-Type" up to (but not including) the
        // line break that precedes the first blank line; "\r" is optional.
        Pattern regexPattern = Pattern.compile(
            "^Content-Type.*?(?=\\r?\\n\\s*\\n)",
            Pattern.DOTALL | Pattern.CASE_INSENSITIVE |
            Pattern.UNICODE_CASE | Pattern.MULTILINE);
        Matcher regexMatcher = regexPattern.matcher(subjectString);
        if (regexMatcher.find()) {
            resultString = regexMatcher.group();
        }
        // Prints the header lines, or "null" if nothing matched.
        System.out.println(resultString);
    }
}

It works for text with either kind of line termination.

Source

2011-10-28 15:44:53 ridgerunner
