How to parse XML using the SAX parser

I'm following this tutorial.

It works great, but I would like it to return an array with all the strings instead of a single string holding only the last element.

Any ideas how to do this?

2011-01-28 Johan

Could you please post an abstract depiction of your XML structure? –

http://dearfriends.se/category/blog/feed/rss/ -> view source – Johan

So you want to build an XML parser to parse an RSS feed like this one.

<rss version="0.92"> 
<channel> 
    <title>MyTitle</title> 
    <link>http://myurl.com</link> 
    <description>MyDescription</description> 
    <lastBuildDate>SomeDate</lastBuildDate> 
    <docs>http://someurl.com</docs> 
    <language>SomeLanguage</language> 

    <item> 
     <title>TitleOne</title> 
     <description><![CDATA[Some text.]]></description> 
     <link>http://linktoarticle.com</link> 
    </item> 

    <item> 
     <title>TitleTwo</title> 
     <description><![CDATA[Some other text.]]></description> 
     <link>http://linktoanotherarticle.com</link> 
    </item> 

</channel> 
</rss>

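Now you basically have two SAX implementations you can work with: either the org.xml.sax or the android.sax implementation. I'm going to explain the pros and cons of both after posting a short example.

android.sax implementation

Let's start with the android.sax implementation.

First you have to define the XML structure using the RootElement and Element objects.

In any case I would work with POJOs (Plain Old Java Objects) to hold your data. Here are the POJOs needed.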

Channel.java

import java.io.Serializable;

public class Channel implements Serializable {

    private Items items; 
    private String title; 
    private String link; 
    private String description; 
    private String lastBuildDate; 
    private String docs; 
    private String language; 

    public Channel() { 
     setItems(null); 
     setTitle(null); 
     // set every field to null in the constructor 
    } 

    public void setItems(Items items) { 
     this.items = items; 
    } 

    public Items getItems() { 
     return items; 
    } 

    public void setTitle(String title) { 
     this.title = title; 
    } 

    public String getTitle() { 
     return title; 
    } 
    // rest of the class looks similar so just setters and getters 
}

This class implements the Serializable interface so you can put it into a Bundle and do something with it.

Now we need a class to hold our items. In this case I'm just going to extend the ArrayList class.

Items.java

import java.util.ArrayList;

public class Items extends ArrayList<Item> {

    public Items() { 
     super(); 
    } 

}

That's it for our items container. We now need a class to hold the data of every single item.

Item.java

import java.io.Serializable;

public class Item implements Serializable {

    private String title; 
    private String description; 
    private String link; 

    public Item() { 
     setTitle(null); 
     setDescription(null); 
     setLink(null); 
    } 

    public void setTitle(String title) { 
     this.title = title; 
    } 

    public String getTitle() { 
     return title; 
    } 

    // same as above. 

}

Example.java

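import java.io.IOException;
import java.io.InputStream;

import android.sax.Element;
import android.sax.EndElementListener;
import android.sax.EndTextElementListener;
import android.sax.RootElement;
import android.sax.StartElementListener;
import android.util.Xml;

import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class Example extends DefaultHandler {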

    private Channel channel; 
    private Items items; 
    private Item item; 

    public Example() { 
     items = new Items(); 
    } 

    public Channel parse(InputStream is) { 
     RootElement root = new RootElement("rss"); 
     Element chanElement = root.getChild("channel"); 
     Element chanTitle = chanElement.getChild("title"); 
     Element chanLink = chanElement.getChild("link"); 
     Element chanDescription = chanElement.getChild("description"); 
     Element chanLastBuildDate = chanElement.getChild("lastBuildDate"); 
     Element chanDocs = chanElement.getChild("docs"); 
     Element chanLanguage = chanElement.getChild("language"); 

     Element chanItem = chanElement.getChild("item"); 
     Element itemTitle = chanItem.getChild("title"); 
     Element itemDescription = chanItem.getChild("description"); 
     Element itemLink = chanItem.getChild("link"); 

     chanElement.setStartElementListener(new StartElementListener() { 
      public void start(Attributes attributes) { 
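       channel = new Channel();
       // Attach the shared Items container; otherwise the returned
       // Channel would never expose the parsed items.
       channel.setItems(items);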
      } 
     }); 

     // Listen for the end of a text element and set the text as our 
     // channel's title. 
     chanTitle.setEndTextElementListener(new EndTextElementListener() { 
      public void end(String body) { 
       channel.setTitle(body); 
      } 
     }); 

     // Same thing happens for the other elements of channel ex. 

     // On every <item> tag occurrence we create a new Item object. 
     chanItem.setStartElementListener(new StartElementListener() { 
      public void start(Attributes attributes) { 
       item = new Item(); 
      } 
     }); 

     // On every </item> tag occurrence we add the current Item object 
     // to the Items container. 
     chanItem.setEndElementListener(new EndElementListener() { 
      public void end() { 
       items.add(item); 
      } 
     }); 

     itemTitle.setEndTextElementListener(new EndTextElementListener() { 
      public void end(String body) { 
       item.setTitle(body); 
      } 
     }); 

     // and so on 

     // here we actually parse the InputStream and return the resulting 
     // Channel object. 
     try { 
      Xml.parse(is, Xml.Encoding.UTF_8, root.getContentHandler()); 
      return channel; 
     } catch (SAXException e) { 
      // handle the exception 
     } catch (IOException e) { 
      // handle the exception 
     } 

     return null; 
    } 

}

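Now that was a very quick example, as you can see. The major advantage of the android.sax SAX implementation is that you can define the structure of the XML you have to parse and then just add event listeners to the appropriate elements. The disadvantage is that the code gets quite repetitive and bloated.

org.xml.sax implementation

The org.xml.sax SAX handler implementation is a bit different.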

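Here you don't specify or declare your XML structure; you just listen for events. The most widely used events are the following:

Document Start
Document End
Element Start
Element End
Characters between Element Start and Element End

An example handler implementation using the Channel object above looks like this.

Example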

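import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class ExampleHandler extends DefaultHandler {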

    private Channel channel; 
    private Items items; 
    private Item item; 
    private boolean inItem = false; 

    private StringBuilder content; 

    public ExampleHandler() { 
     items = new Items(); 
     content = new StringBuilder(); 
    } 

    public void startElement(String uri, String localName, String qName, 
      Attributes atts) throws SAXException { 
     content = new StringBuilder(); 
     if(localName.equalsIgnoreCase("channel")) { 
      channel = new Channel(); 
     } else if(localName.equalsIgnoreCase("item")) { 
      inItem = true; 
      item = new Item(); 
     } 
    } 

    public void endElement(String uri, String localName, String qName) 
      throws SAXException { 
     if(localName.equalsIgnoreCase("title")) { 
      if(inItem) { 
       item.setTitle(content.toString()); 
      } else { 
       channel.setTitle(content.toString()); 
      } 
     } else if(localName.equalsIgnoreCase("link")) { 
      if(inItem) { 
       item.setLink(content.toString()); 
      } else { 
       channel.setLink(content.toString()); 
      } 
     } else if(localName.equalsIgnoreCase("description")) { 
      if(inItem) { 
       item.setDescription(content.toString()); 
      } else { 
       channel.setDescription(content.toString()); 
      } 
     } else if(localName.equalsIgnoreCase("lastBuildDate")) { 
      channel.setLastBuildDate(content.toString()); 
     } else if(localName.equalsIgnoreCase("docs")) { 
      channel.setDocs(content.toString()); 
     } else if(localName.equalsIgnoreCase("language")) { 
      channel.setLanguage(content.toString()); 
     } else if(localName.equalsIgnoreCase("item")) { 
      inItem = false; 
      items.add(item); 
     } else if(localName.equalsIgnoreCase("channel")) { 
      channel.setItems(items); 
     } 
    } 

    public void characters(char[] ch, int start, int length) 
      throws SAXException { 
     content.append(ch, start, length); 
    } 

    public void endDocument() throws SAXException { 
     // you can do something here for example send 
     // the Channel object somewhere or whatever. 
    } 

}
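
To try a handler of this kind outside Android, and to get back a list of all strings rather than only the last one (as the question asks), something like the following sketch may help. The class name ItemTitleCollector and the method parseTitles are made up for illustration and are not part of the answer above. It uses the JDK's SAXParserFactory, which is not namespace-aware by default, so the sketch matches on qName; in that mode localName may be reported as an empty string.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: collects the text of every <title> that sits inside an <item>.
class ItemTitleCollector extends DefaultHandler {

    private final List<String> titles = new ArrayList<>();
    private final StringBuilder content = new StringBuilder();
    private boolean inItem = false;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        content.setLength(0); // start buffering text for the new element
        if (qName.equalsIgnoreCase("item")) {
            inItem = true;
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equalsIgnoreCase("title") && inItem) {
            titles.add(content.toString()); // keep every item title, not just the last
        } else if (qName.equalsIgnoreCase("item")) {
            inItem = false;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        content.append(ch, start, length);
    }

    // Parses an XML string and returns all item titles in document order.
    static List<String> parseTitles(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        ItemTitleCollector handler = new ItemTitleCollector();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.titles;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<rss version=\"0.92\"><channel><title>MyTitle</title>"
                + "<item><title>TitleOne</title></item>"
                + "<item><title>TitleTwo</title></item>"
                + "</channel></rss>";
        System.out.println(parseTitles(xml)); // prints [TitleOne, TitleTwo]
    }
}
```

The channel-level title is skipped because inItem is false when its end tag arrives.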

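Now, to be honest, I can't really tell you any real advantage of this handler implementation over the android.sax one. I can, however, tell you the disadvantage, which should be pretty obvious by now. Take a look at the else if statement in the startElement method. Because the tags <title>, <link> and <description> occur both in the channel and in each item, we have to track where in the XML structure we are at the moment. That is, if we encounter an <item> starting tag we set the inItem flag to true to ensure that we map the correct data to the correct object, and in the endElement method we set that flag to false when we encounter an </item> tag, to signal that we are done with that item.

In this example that is pretty easy to manage, but parsing a more complex structure with repeating tags at different levels becomes tricky. There you would either have to use, for example, an Enum to hold your current state and a lot of switch/case statements to check where you are, or, as a more elegant solution, some kind of tag tracker using a tag stack.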
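
One way such a tag tracker could look is sketched below. This is only an illustration under assumptions: the class PathTrackingHandler, its fields, and the "/"-joined path strings are invented here and do not come from the answer. The handler pushes each opening tag onto a stack, pops it on the closing tag, and dispatches on the full path instead of on boolean flags.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical tag tracker: the stack of open tags replaces the inItem flag.
class PathTrackingHandler extends DefaultHandler {

    private final Deque<String> openTags = new ArrayDeque<>();
    private final StringBuilder content = new StringBuilder();

    String channelTitle;
    final List<String> itemTitles = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        openTags.addLast(qName); // the root ends up first, the current tag last
        content.setLength(0);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        content.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // Joining the stack from root to current tag gives e.g. "rss/channel/item/title".
        String path = String.join("/", openTags);
        switch (path) {
            case "rss/channel/title":
                channelTitle = content.toString();
                break;
            case "rss/channel/item/title":
                itemTitles.add(content.toString());
                break;
            default:
                break;
        }
        openTags.removeLast();
    }

    static PathTrackingHandler parse(String xml) throws Exception {
        PathTrackingHandler handler = new PathTrackingHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler;
    }

    public static void main(String[] args) throws Exception {
        PathTrackingHandler h = parse("<rss><channel><title>MyTitle</title>"
                + "<item><title>TitleOne</title></item>"
                + "<item><title>TitleTwo</title></item></channel></rss>");
        System.out.println(h.channelTitle + " " + h.itemTitles);
    }
}
```

Because the same tag name at a different depth yields a different path, no extra flag is needed per nesting level.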

2011-01-28 13:30:47

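@Adinia It's fine to use both implementations together. As long as you know why you are doing it, there is nothing wrong with that. –

@octavian-damiean My code was working, but I didn't know why I had written every line. As I come to understand how each of them works, I'm now trying to clean it up a bit. Good to know the two can be used together. – Adinia

@Adinia I see. You're welcome. If you have more questions about it, you can also join us in the awesome [Android chat room](http://chat.stackoverflow.com/rooms/15/android). –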

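Many tasks require working with different kinds of XML files for different purposes. I won't try to cover that whole vast subject; I'll just describe, from my own experience, what I needed it for.

Java is perhaps my favorite programming language. That affection is only strengthened by the fact that you can solve almost any problem with it without reinventing the wheel.

So, I needed to build a client-server pair around a database, so that the client could remotely create entries in the database on the server. I won't go into input validation and the like, since that is not the point here.

As the working principle I chose, without hesitation, to transfer the information as an XML file of the following form: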

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<doc>
<id>3</id>
<fam>Ivanov</fam>
<name>Ivan</name>
<otc>I.</otc>
<dateb>10-03-2005</dateb>
<datep>10-03-2005</datep>
<datev>10-03-2005</datev>
<datebegin>09-06-2009</datebegin>
<dateend>10-03-2005</dateend>
<vdolid>1</vdolid>
<specid>1</specid>
<klavid>1</klavid>
<stav>2.0</stav>
<progid>1</progid>
</doc>

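I'll only say that this is information about the doctors of an institution: last name, first name, unique id, and so on. In general, a data record. The file arrives safely on the server side, and then parsing begins.

Of the two parsing options (SAX vs DOM), I chose SAX because it is more lightweight, and because it was the first one that fell into my hands :)

So. As you know, to work with the parser successfully we need to override the necessary DefaultHandler methods. To begin, import the required packages.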

import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.*;

Now we can start writing our parser:

public class SAXPars extends DefaultHandler { 
   ... 
}

Let's start with the startDocument() method. As its name implies, it reacts to the start of the document. Here you can hook in various actions such as allocating memory or resetting values, but our example is quite simple, so we just mark the start of work with an appropriate message:

@Override
public void startDocument() throws SAXException {
   System.out.println("Start parse XML ...");
}

Next. The parser moves through the document and meets an element of its structure. The startElement() method fires. Its actual signature is startElement(String namespaceURI, String localName, String qName, Attributes atts). Here namespaceURI is the namespace, localName is the local name of the element, qName is the qualified name (the namespace prefix and the local name, separated by a colon), and atts holds the attributes of this element. In our case everything is simple: we just take qName and store it in a helper string, thisElement. That way we mark which element we are currently inside.

@Override
public void startElement(String namespaceURI, String localName, String qName, Attributes atts) throws SAXException {
   thisElement = qName;
}
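
To see how these three name parameters relate, a small experiment along the following lines can help. It is a sketch with invented names (NameDemo, the urn:example namespace, the a: prefix), not part of the article. With a namespace-aware parser the URI and local name are filled in; without namespace awareness, which is the SAXParserFactory default, they may be empty and only qName is reliable.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo: records "namespaceURI|localName|qName" for every start tag.
class NameDemo extends DefaultHandler {

    final List<String> seen = new ArrayList<>();

    @Override
    public void startElement(String namespaceURI, String localName, String qName, Attributes atts) {
        seen.add(namespaceURI + "|" + localName + "|" + qName);
    }

    static List<String> names(String xml, boolean namespaceAware) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(namespaceAware);
        NameDemo handler = new NameDemo();
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        return handler.seen;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<a:doc xmlns:a=\"urn:example\"><a:id>3</a:id></a:doc>";
        // Namespace-aware: [urn:example|doc|a:doc, urn:example|id|a:id]
        System.out.println(names(xml, true));
        // Not namespace-aware: compare the output to see which fields your parser fills in.
        System.out.println(names(xml, false));
    }
}
```

For a document without prefixes, such as the one in this article, qName is simply the tag name, which is why storing qName in thisElement works here.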

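Next we get to the element's value. This is where the characters() method comes in. It has the form characters(char[] ch, int start, int length). Everything is clear here: ch is the character array that contains the text inside the element, and start and length are helper numbers giving the starting position and the length of that text within the array.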

@Override
public void characters(char[] ch, int start, int length) throws SAXException {
   if (thisElement.equals("id")) {
      doc.setId(Integer.valueOf(new String(ch, start, length)));
   }
   if (thisElement.equals("fam")) {
      doc.setFam(new String(ch, start, length));
   }
   if (thisElement.equals("name")) {
      doc.setName(new String(ch, start, length));
   }
   if (thisElement.equals("otc")) {
      doc.setOtc(new String(ch, start, length));
   }
   if (thisElement.equals("dateb")) {
      doc.setDateb(new String(ch, start, length));
   }
   if (thisElement.equals("datep")) {
      doc.setDatep(new String(ch, start, length));
   }
   if (thisElement.equals("datev")) {
      doc.setDatev(new String(ch, start, length));
   }
   if (thisElement.equals("datebegin")) {
      doc.setDatebegin(new String(ch, start, length));
   }
   if (thisElement.equals("dateend")) {
      doc.setDateend(new String(ch, start, length));
   }
   if (thisElement.equals("vdolid")) {
      doc.setVdolid(Integer.valueOf(new String(ch, start, length)));
   }
   if (thisElement.equals("specid")) {
      doc.setSpecid(Integer.valueOf(new String(ch, start, length)));
   }
   if (thisElement.equals("klavid")) {
      doc.setKlavid(Integer.valueOf(new String(ch, start, length)));
   }
   if (thisElement.equals("stav")) {
      doc.setStav(Float.valueOf(new String(ch, start, length)));
   }
   if (thisElement.equals("progid")) {
      doc.setProgid(Integer.valueOf(new String(ch, start, length)));
   }
}
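
One caveat about characters(): a SAX parser is allowed to deliver the text of a single element in several calls, so assigning the value directly inside characters() can keep only a fragment. A common alternative is to buffer the text and assign it in endElement(). The sketch below illustrates the idea with an invented class (BufferingHandler) that stores values in a map instead of the Doctors object, and it simulates a split delivery by calling the handler methods by hand.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: buffers text per element and stores it when the element ends.
class BufferingHandler extends DefaultHandler {

    final Map<String, String> values = new LinkedHashMap<>();
    private final StringBuilder text = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        text.setLength(0); // a new element starts: drop any previously buffered text
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length); // may be called more than once per element
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        values.put(qName, text.toString().trim()); // the complete value is known only here
        text.setLength(0);
    }

    // Simulates a parser that delivers "Ivanov" in two chunks.
    static BufferingHandler feedSplitText() {
        BufferingHandler h = new BufferingHandler();
        char[] first = "Iva".toCharArray();
        char[] second = "nov".toCharArray();
        h.startElement("", "", "fam", new AttributesImpl());
        h.characters(first, 0, first.length);
        h.characters(second, 0, second.length);
        h.endElement("", "", "fam");
        return h;
    }

    public static void main(String[] args) {
        System.out.println(feedSplitText().values); // prints {fam=Ivanov}
    }
}
```

With direct assignment in characters(), the same two calls would leave only "nov" in the field.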

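Ah, yes. I almost forgot. The object into which the parsed data is collected is of type Doctors. This class is already defined and has all the necessary setters and getters.

The next obvious step: the element ends, and the next one follows. endElement() fires. It signals that the element has ended and that you can do anything you need at this point. We'll keep it simple and clear thisElement.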

@Override
public void endElement(String namespaceURI, String localName, String qName) throws SAXException {
   thisElement = "";
}

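Having gone through the whole document, we reach the end of the file, and endDocument() fires. In it we can free memory, do some diagnostic printing, and so on. In our case, we simply report that parsing has finished.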

@Override
public void endDocument() {
   System.out.println("Stop parse XML ...");
}

So we now have a class that parses our XML format. Here is its full text:

import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.*;

public class SAXPars extends DefaultHandler {

   Doctors doc = new Doctors();
   String thisElement = "";

   public Doctors getResult() {
      return doc;
   }

   @Override
   public void startDocument() throws SAXException {
      System.out.println("Start parse XML ...");
   }

   @Override
   public void startElement(String namespaceURI, String localName, String qName, Attributes atts) throws SAXException {
      thisElement = qName;
   }

   @Override
   public void endElement(String namespaceURI, String localName, String qName) throws SAXException {
      thisElement = "";
   }

   @Override
   public void characters(char[] ch, int start, int length) throws SAXException {
      if (thisElement.equals("id")) {
         doc.setId(Integer.valueOf(new String(ch, start, length)));
      }
      if (thisElement.equals("fam")) {
         doc.setFam(new String(ch, start, length));
      }
      if (thisElement.equals("name")) {
         doc.setName(new String(ch, start, length));
      }
      if (thisElement.equals("otc")) {
         doc.setOtc(new String(ch, start, length));
      }
      if (thisElement.equals("dateb")) {
         doc.setDateb(new String(ch, start, length));
      }
      if (thisElement.equals("datep")) {
         doc.setDatep(new String(ch, start, length));
      }
      if (thisElement.equals("datev")) {
         doc.setDatev(new String(ch, start, length));
      }
      if (thisElement.equals("datebegin")) {
         doc.setDatebegin(new String(ch, start, length));
      }
      if (thisElement.equals("dateend")) {
         doc.setDateend(new String(ch, start, length));
      }
      if (thisElement.equals("vdolid")) {
         doc.setVdolid(Integer.valueOf(new String(ch, start, length)));
      }
      if (thisElement.equals("specid")) {
         doc.setSpecid(Integer.valueOf(new String(ch, start, length)));
      }
      if (thisElement.equals("klavid")) {
         doc.setKlavid(Integer.valueOf(new String(ch, start, length)));
      }
      if (thisElement.equals("stav")) {
         doc.setStav(Float.valueOf(new String(ch, start, length)));
      }
      if (thisElement.equals("progid")) {
         doc.setProgid(Integer.valueOf(new String(ch, start, length)));
      }
   }

   @Override
   public void endDocument() {
      System.out.println("Stop parse XML ...");
   }
}

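I hope this topic helped present the essence of the SAX parser in a simple way.

Please don't judge my first article too harshly. I hope it was useful to at least someone.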

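UPD: To run this parser you can use this code:

import java.io.File;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

SAXParserFactory factory = SAXParserFactory.newInstance();
SAXParser parser = factory.newSAXParser();
SAXPars saxp = new SAXPars();

parser.parse(new File("..."), saxp);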

2014-10-24 13:07:04
