antlr4 gives token recognition errors for a lexer island grammar

I need antlr4 to parse some simple HTML files. I split the grammar into a parser grammar and a lexer grammar so that I can use an island grammar for the content inside tags (between < and >), as described in "The Definitive ANTLR 4 Reference". antlr4 repeatedly reports "token recognition error".

Parser grammar:

grammar Rule; 

options { 
    tokenVocab = HTMLLexer; 
    language = Java; 
} 

/* Parser Rules */ 
doc : type? html ; 
type : '<!DOCTYPE HTML>' ; 
html : shtml head body ehtml ; 

head : shead meta* ehead ; 
meta : smeta ; 

body : sbody ebody ; 

shtml : '<' 'html' attr* '>' ; 
ehtml : '<' '/html' '>' ; 
shead : '<' 'head' attr* '>' ; 
ehead : '<' '/head' '>' ; 
smeta : '<' 'meta' attr+ '>' ; 

sbody : '<' 'body' attr* '>' ; 
ebody : '<' '/body' '>' ; 

attr : NAME '=' VALUE ;

Lexer grammar:

lexer grammar HTMLLexer; 

COMMENT : '<!--' .*? '-->' -> skip ; 
CDATA : '<![CDATA[' .*? ']]>' ; 

OPEN  : '<' -> pushMode(INSIDE) ; 
SPEC_OPEN : '<!' -> pushMode(INSIDE) ; 

TEXT : (ENTITY | ~[<&])+ ; 
fragment ENTITY 
    : '&' [a-zA-Z]+ ';' 
    | '&#' [0-9]+ ';' 
    | '&#x' [0-9A-Za-z]+ ';' ; 

mode INSIDE; 
CLOSE  : '>' -> popMode ; 
SLASH_CLOSE : '/>' -> popMode ; 

StHTML : 'html' ; 
EnHTML : '/html' ; 

StHead : 'head' ; 
EnHead : '/head' ; 
StMeta : 'meta' ; 

StBody : 'body' ; 
EnBody : '/body' ; 

NAME : 'class' 
    | 'content' 
    | 'http-equiv' 
    | 'id' 
    | 'lang' 
    | 'name' 
    | 'style' 
    | 'type' 
    ; 

EQUALS : '=' ; 

VALUE : ('"' ~["<>\r\n]+ '"') 
    | ('\'' ~['<>\r\n]+ '\'') 
    | ~["'<>= \t\r\n]+ 
    ; 

WS : [ \t\r\n]+ -> skip ;

Sample HTML file:

<html> 
<head> 
<meta http-equiv=Content-Type content="text/html; charset=windows-1252"> 
<meta name=Generator content="Microsoft Word 14 (filtered)"> 
</head> 

<body lang=EN-US style='text-justify-trim:punctuation'> 
</body> 
</html>

Output:

line 1:6 token recognition error at: '\n' 
line 2:6 token recognition error at: '\n' 
line 3:5 token recognition error at: ' ' 
line 3:6 token recognition error at: 'htt' 
line 3:9 token recognition error at: 'p' 
... 
[@0,0:0='<',<7>,1:0] 
[@1,1:4='html',<10>,1:1] 
[@2,5:5='>',<1>,1:5] 
[@3,7:7='<',<7>,2:0] 
[@4,8:11='head',<6>,2:1] 
[@5,12:12='>',<1>,2:5] 
[@6,14:14='<',<7>,3:0] 
[@7,15:18='meta',<2>,3:1] 
[@8,30:30='=',<9>,3:16] 
[@9,51:51='=',<9>,3:37] 
[@10,57:61='/html',<4>,3:43] 
[@11,71:71='=',<9>,3:57] 
[@12,85:85='>',<1>,3:71] 
[@13,87:87='<',<7>,4:0] 
[@14,88:91='meta',<2>,4:1] 
[@15,115:115='=',<9>,4:28] 
[@16,146:146='>',<1>,4:59] 
[@17,148:148='<',<7>,5:0] 
[@18,149:153='/head',<8>,5:1] 
[@19,154:154='>',<1>,5:6] 
[@20,157:157='<',<7>,7:0] 
[@21,158:161='body',<5>,7:1] 
[@22,167:167='=',<9>,7:10] 
[@23,179:179='=',<9>,7:22] 
[@24,211:211='>',<1>,7:54] 
[@25,213:213='<',<7>,8:0] 
[@26,214:218='/body',<11>,8:1] 
[@27,219:219='>',<1>,8:6] 
[@28,221:221='<',<7>,9:0] 
[@29,222:226='/html',<4>,9:1] 
[@30,227:227='>',<1>,9:6] 
[@31,229:228='<EOF>',<-1>,10:0] 
line 3:16 mismatched input '=' expecting NAME 
line 4:28 mismatched input '=' expecting NAME 
line 7:10 mismatched input '=' expecting {'>', NAME}

Source

2014-02-13 Barzee

First, you need to change the declaration of your parser from grammar Rule; to parser grammar Rule;. I don't see any problem in your lexer that would produce those particular error messages.
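To make the suggested change concrete, here is a minimal sketch of the parser as a pure parser grammar. It rests on a few assumptions that are not stated in the question. With grammar Rule; (a combined grammar), ANTLR appears to build an implicit lexer from the quoted literals ('<', 'html', '/html', ...), and that lexer has no modes and no whitespace rule. The token dump is consistent with this: '/html' is picked out of the middle of text/html at 3:43. A parser grammar cannot create implicit tokens, so the sketch refers to the HTMLLexer token names directly. The DOCTYPE rule is left out because the lexer shown has no token matching '<!DOCTYPE HTML>'. The TEXT? placements are my own addition: in the default mode the lexer emits whitespace between tags as TEXT (WS is only skipped in mode INSIDE), so the parser has to accept it somewhere.

```antlr
parser grammar Rule;

options {
    tokenVocab = HTMLLexer;   // token types come from HTMLLexer.tokens
}

/* Parser Rules */
// TEXT? absorbs the whitespace between tags, which the default
// lexer mode emits as TEXT (assumption: no other text appears there).
doc   : TEXT? html TEXT? EOF ;
html  : shtml TEXT? head TEXT? body TEXT? ehtml ;

head  : shead TEXT? (meta TEXT?)* ehead ;
meta  : smeta ;

body  : sbody TEXT? ebody ;

// Token names instead of literals: a parser grammar cannot define
// implicit tokens for quoted strings.
shtml : OPEN StHTML attr* CLOSE ;
ehtml : OPEN EnHTML CLOSE ;
shead : OPEN StHead attr* CLOSE ;
ehead : OPEN EnHead CLOSE ;
smeta : OPEN StMeta attr+ CLOSE ;

sbody : OPEN StBody attr* CLOSE ;
ebody : OPEN EnBody CLOSE ;

attr  : NAME EQUALS VALUE ;
```

With this layout, generate HTMLLexer first so that HTMLLexer.tokens exists, then generate the parser, and feed the parser a token stream created from HTMLLexer rather than from a lexer generated out of the combined grammar.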

Source

2014-02-13 00:59:54
