2017-02-18

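I am trying to use this XML to prepare an XSD and then process the rows further so the data can be inserted into a database. To prepare the XSD, I use XSLT to transform the structure into the desired format.

How can I remove an XML node if the node value contains a URL?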

<linked-hash-map> 
    <entry> 
    <string>_type</string> 
    <string>News</string> 
    </entry> 
    <entry> 
    <string>value</string> 
    <list> 
     <linked-hash-map> 
     <entry> 
      <string>name</string> 
      <string> 
      Virat Kohli 
      </string> 
     </entry> 
     <entry> 
      <string>url</string> 
      <string> 
      http://www.bing.com/cr?IG=3DA864FA197A4D5DAD062780C15E3A16&CID=09E4F1057ADB64720330FB2E7BC96547&rd=1&h=nw8K4uNRgs-nvsuz2GyXpqMxdRmzWK8Xbm3W_1IlO24&v=1&r=http%3a%2f%2fmovies.ndtv.com%2fbollywood%2fvirat-kohli-hearts-anushka-sharma-a-timeline-of-their-romance-1659877&p=DevEx,5026.1 
      </string> 
     </entry> 
     <entry> 
      <string>image</string> 
      <linked-hash-map> 
      <entry> 
       <string>thumbnail</string> 
       <linked-hash-map> 
       <entry> 
        <string>contentUrl</string> 
        <string> 
        https://www.bing.com/th?id=ON.EE674002EC235BD5795D34695EABF504&pid=News 
        </string> 
       </entry> 
       <entry> 
        <string>width</string> 
        <int>640</int> 
       </entry> 
       </linked-hash-map> 
      </entry> 
      </linked-hash-map> 
     </entry> 
     <entry> 
      <string>description</string> 
      <string> 
      On Wednesday, cricketer Virat Kohli 
      </string> 
     </entry> 
     <entry> 
      <string>datePublished</string> 
      <string>2017-02-16T05:39:00</string> 
     </entry> 
     <entry> 
      <string>category</string> 
      <string>Entertainment</string> 
     </entry> 
     </linked-hash-map> 
     <linked-hash-map> 
     <entry> 
      <string>name</string> 
      <string> 
      Shah Rukh Khan’s TV show 
      </string> 
     </entry> 
     <entry> 
      <string>url</string> 
      <string> 
      http://www.bing.com/cr?IG=3DA864FA197A4D5DAD062780C15E3A16&CID=09E4F1057ADB64720330FB2E7BC96547&rd=1&h=4CnQhOg9Nm7pmIu9OvDl6x9WtYtSuXblCSR_WQz1VoA&v=1&r=http%3a%2f%2fwww.hindustantimes.com%2ftv%2fshah-rukh-khan-s-tv-show-circus-is-back-on-small-screen%2fstory-OjQUQIWi6ogxj5eF1hivTI.html&p=DevEx,5040.1 
      </string> 
     </entry> 
     <entry> 
      <string>image</string> 
      <linked-hash-map> 
      <entry> 
       <string>thumbnail</string> 
       <linked-hash-map> 
       <entry> 
        <string>contentUrl</string> 
        <string> 
        https://www.bing.com/th?id=ON.2974262BB8317FA4D4BCE4A61CA9488E&pid=News 
        </string> 
       </entry> 
       <entry> 
        <string>width</string> 
        <int>700</int> 
       </entry> 
       </linked-hash-map> 
      </entry> 
      </linked-hash-map> 
     </entry> 
     <entry> 
      <string>description</string> 
      <string> 
      Here’s some wonderful news 
      </string> 
     </entry> 
     <entry> 
      <string>datePublished</string> 
      <string>2017-02-16T05:36:00</string> 
     </entry> 
     <entry> 
      <string>category</string> 
      <string>Entertainment</string> 
     </entry> 
     </linked-hash-map> 
    </list> 
    </entry> 
</linked-hash-map> 

The URLs here contain query strings. How can I remove the URL, or alternatively encode the URL together with its query string?

The desired output is below:

<?xml version="1.0" encoding="utf-8"?> 
<linked-hash-map> 
    <entry> 
    <linked-hash-map> 
     <_type>News</_type> 
     <datarow> 
     <name> Virat Kohli</name> 
     <url>http://www.bing.com/cr?IG=3DA864FA197A4D5DAD062780C15E3A16&CID=09E4F1057ADB64720330FB2E7BC96547&rd=1&h=nw8K4uNRgs-nvsuz2GyXpqMxdRmzWK8Xbm3W_1IlO24&v=1&r=http%3a%2f%2fmovies.ndtv.com%2fbollywood%2fvirat-kohli-hearts-anushka-sharma-a-timeline-of-their-romance-1659877&p=DevEx,5026.1</url> 
     <contentUrl> https://www.bing.com/th?id=ON.EE674002EC235BD5795D34695EABF504&pid=News </contentUrl> 
     <width>640</width> 
     <description> On Wednesday, cricketer Virat Kohli</description> 
     <readLink> https://api.cognitive.microsoft.com/api/v5/entities/b8ef6b82-02be-1e24-584c-f8283b7bdaeb </readLink> 
     <datePublished>2017-02-16T05:39:00</datePublished> 
     <category>Entertainment</category>  
     </datarow> 
     <datarow> 
     <name> Shah Rukh Khan’s TV show</name> 
     <url> http://www.bing.com/cr?IG=3DA864FA197A4D5DAD062780C15E3A16&CID=09E4F1057ADB64720330FB2E7BC96547&rd=1&h=4CnQhOg9Nm7pmIu9OvDl6x9WtYtSuXblCSR_WQz1VoA&v=1&r=http%3a%2f%2fwww.hindustantimes.com%2ftv%2fshah-rukh-khan-s-tv-show-circus-is-back-on-small-screen%2fstory-OjQUQIWi6ogxj5eF1hivTI.html&p=DevEx,5040.1 </url> 
     <contentUrl> https://www.bing.com/th?id=ON.EE674002EC235BD5795D34695EABF504&pid=News </contentUrl> 
     <width>640</width> 
     <description> Here’s some wonderful news </description> 
     <readLink> https://api.cognitive.microsoft.com/api/v5/entities/b8ef6b82-02be-1e24-584c-f8283b7bdaeb </readLink> 
     <datePublished>2017-02-16T05:39:00</datePublished> 
     <category>Entertainment</category> 
     </datarow> 
    </linked-hash-map> 
    </entry> 
</linked-hash-map> 

Below is the script I use to transform this structure. The ampersands used in the URLs have to be replaced with their XML entity reference, i.e. &amp;

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> 
    <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/> 
    <xsl:strip-space elements="*"/> 

    <xsl:template match="node()|@*"> 
    <xsl:copy> 
     <xsl:apply-templates select="node()|@*"/> 
    </xsl:copy> 
    </xsl:template> 

    <xsl:template match="/linked-hash-map"> 
    <xsl:element name="{local-name()}"> 
     <xsl:for-each select="entry"> 
     <xsl:choose> 
      <xsl:when test="list/linked-hash-map"> 
      <xsl:for-each select="list/linked-hash-map"> 
       <datarow> 
       <xsl:for-each select="entry"> 
        <xsl:if test="not(node()[1]='image' or node()[1]='about' or node()[1]='clusteredArticles' or node()[1]='mentions' or node()[1]='provider' or node()[1]='url' or node()[1]='description' or node()[1]='name')"> 
        <xsl:text disable-output-escaping="yes">&lt;</xsl:text> 
        <xsl:value-of select="*[1]"/> 
        <xsl:text disable-output-escaping="yes">&gt;</xsl:text> 
        <xsl:value-of select="*[2]"/> 
        <xsl:text disable-output-escaping="yes">&lt;/</xsl:text> 
        <xsl:value-of select="*[1]"/> 
        <xsl:text disable-output-escaping="yes">&gt;</xsl:text> 
        </xsl:if> 
       </xsl:for-each> 
       </datarow> 
      </xsl:for-each> 
      </xsl:when> 
      <xsl:otherwise> 
      <xsl:text disable-output-escaping="yes">&lt;</xsl:text> 
      <xsl:value-of select="*[1]"/> 
      <xsl:text disable-output-escaping="yes">&gt;</xsl:text> 
      <xsl:value-of select="*[2]"/> 
      <xsl:text disable-output-escaping="yes">&lt;/</xsl:text> 
      <xsl:value-of select="*[1]"/> 
      <xsl:text disable-output-escaping="yes">&gt;</xsl:text> 
      </xsl:otherwise> 
     </xsl:choose> 
     </xsl:for-each> 
    </xsl:element> 
    </xsl:template> 
    <xsl:template match="/"> 
    <xsl:copy> 
     <linked-hash-map> 
     <entry> 
      <xsl:apply-templates/> 
     </entry> 
     </linked-hash-map> 
    </xsl:copy> 
    </xsl:template> 

</xsl:stylesheet> 

Where is the script you tried? Are there any errors or undesired results? – Parfait


At first the script itself failed when it ran. To move forward, I manipulate the & symbols through Java code and replace them with spaces. I have updated the post; please see above. – user3187932

Answers


Currently, your original XML is not well-formed: the ampersands in the URL query strings are not escaped as &amp;.
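To make the problem concrete, here is a minimal Python sketch. The fragment and its shortened URL are hypothetical, and the regular expression is only a stopgap assumed for illustration, not something from the original post. It shows a parser rejecting bare ampersands, then escapes every & that does not already start an entity or character reference:

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical fragment with a shortened URL; the "&" characters are not escaped.
raw = (
    "<entry><string>url</string>"
    "<string>http://www.bing.com/cr?IG=3DA8&CID=09E4&rd=1</string></entry>"
)

# A conforming parser refuses the fragment because "&CID=..." is not a valid reference.
try:
    ET.fromstring(raw)
    well_formed = True
except ET.ParseError:
    well_formed = False
print(well_formed)

# Stopgap: escape each "&" that is not already the start of an entity
# or a numeric character reference.
BARE_AMP = re.compile(r"&(?!(?:amp|lt|gt|quot|apos|#[0-9]+|#x[0-9A-Fa-f]+);)")
repaired = BARE_AMP.sub("&amp;", raw)

# After the repair the fragment parses, and the parser hands back the real URL.
url = ET.fromstring(repaired).findall("string")[1].text
print(url)
```

Unlike replacing & with a space, this keeps the query string intact. It is still a patch over the text, so fixing the code that writes the XML remains the better option.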

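Check carefully how the original XML is rendered, because XML should not be built as a text file of concatenated strings (which may be how this markup was created). Unfortunately, that is a common practice in general-purpose programming. XML documents should be built with W3C-compliant DOM libraries (e.g., Java's javax.xml, Python's xml.etree, PHP's DOMDocument, .NET's XmlDocument) using their createElement, appendChild, setAttribute, or counterpart methods.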
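As a rough illustration of that advice, the sketch below uses Python's xml.etree.ElementTree to build one entry node by node. The URL is a shortened, hypothetical stand-in for the ones in the question. The serializer escapes the ampersands itself, so no manual replacement is needed:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample value: a shortened URL with a query string.
url = "http://www.bing.com/cr?IG=3DA8&CID=09E4&rd=1"

# Build the <entry> node by node instead of concatenating strings.
entry = ET.Element("entry")
ET.SubElement(entry, "string").text = "url"
ET.SubElement(entry, "string").text = url

# Serializing escapes "&" as "&amp;" automatically.
xml_text = ET.tostring(entry, encoding="unicode")
print(xml_text)

# Parsing the result back returns the original, unescaped URL.
round_trip = ET.fromstring(xml_text).findall("string")[1].text
print(round_trip == url)
```

The same idea applies to the Java, PHP and .NET libraries named above: set the text through the API and let the library handle escaping.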

Once valid XML is rendered, consider a more generalized XSLT.

Input (character entities adjusted)

<?xml version="1.0" encoding="UTF-8" standalone="yes"?> 
<linked-hash-map> 
    <entry> 
    <string>_type</string> 
    <string>News</string> 
    </entry> 
    <entry> 
    <string>value</string> 
    <list> 
     <linked-hash-map> 
     <entry> 
      <string>name</string> 
      <string> 
      Virat Kohli 
      </string> 
     </entry> 
     <entry> 
      <string>url</string> 
      <string> 
      http://www.bing.com/cr?IG=3DA864FA197A4D5DAD062780C15E3A16&amp;CID=09E4F1057ADB64720330FB2E7BC96547&amp;rd=1&amp;h=nw8K4uNRgs-nvsuz2GyXpqMxdRmzWK8Xbm3W_1IlO24&amp;v=1&amp;r=http%3a%2f%2fmovies.ndtv.com%2fbollywood%2fvirat-kohli-hearts-anushka-sharma-a-timeline-of-their-romance-1659877&amp;p=DevEx,5026.1 
      </string> 
     </entry> 
     <entry> 
      <string>image</string> 
      <linked-hash-map> 
      <entry> 
       <string>thumbnail</string> 
       <linked-hash-map> 
       <entry> 
        <string>contentUrl</string> 
        <string> 
        https://www.bing.com/th?id=ON.EE674002EC235BD5795D34695EABF504&amp;pid=News 
        </string> 
       </entry> 
       <entry> 
        <string>width</string> 
        <int>640</int> 
       </entry> 
       </linked-hash-map> 
      </entry> 
      </linked-hash-map> 
     </entry> 
     <entry> 
      <string>description</string> 
      <string> 
      On Wednesday, cricketer Virat Kohli 
      </string> 
     </entry> 
     <entry> 
      <string>datePublished</string> 
      <string>2017-02-16T05:39:00</string> 
     </entry> 
     <entry> 
      <string>category</string> 
      <string>Entertainment</string> 
     </entry> 
     </linked-hash-map> 
     <linked-hash-map> 
     <entry> 
      <string>name</string> 
      <string> 
      Shah Rukh Khan's TV show 
      </string> 
     </entry> 
     <entry> 
      <string>url</string> 
      <string> 
      http://www.bing.com/cr?IG=3DA864FA197A4D5DAD062780C15E3A16&amp;CID=09E4F1057ADB64720330FB2E7BC96547&amp;rd=1&amp;h=4CnQhOg9Nm7pmIu9OvDl6x9WtYtSuXblCSR_WQz1VoA&amp;v=1&amp;r=http%3a%2f%2fwww.hindustantimes.com%2ftv%2fshah-rukh-khan-s-tv-show-circus-is-back-on-small-screen%2fstory-OjQUQIWi6ogxj5eF1hivTI.html&amp;p=DevEx,5040.1 
      </string> 
     </entry> 
     <entry> 
      <string>image</string> 
      <linked-hash-map> 
      <entry> 
       <string>thumbnail</string> 
       <linked-hash-map> 
       <entry> 
        <string>contentUrl</string> 
        <string> 
        https://www.bing.com/th?id=ON.2974262BB8317FA4D4BCE4A61CA9488E&amp;pid=News 
        </string> 
       </entry> 
       <entry> 
        <string>width</string> 
        <int>700</int> 
       </entry> 
       </linked-hash-map> 
      </entry> 
      </linked-hash-map> 
     </entry> 
     <entry> 
      <string>description</string> 
      <string> 
      Here's some wonderful news 
      </string> 
     </entry> 
     <entry> 
      <string>datePublished</string> 
      <string>2017-02-16T05:36:00</string> 
     </entry> 
     <entry> 
      <string>category</string> 
      <string>Entertainment</string> 
     </entry> 
     </linked-hash-map> 
    </list> 
    </entry> 
</linked-hash-map> 

XSLT (see inline comments)

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> 
    <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/> 
    <xsl:strip-space elements="*"/> 

    <!-- APPLY ONLY SECOND ENTRY OFF ROOT --> 
    <xsl:template match="/linked-hash-map"> 
    <xsl:copy>  
     <xsl:apply-templates select="entry[2]"/>  
    </xsl:copy> 
    </xsl:template> 

    <xsl:template match="entry[2]"> 
    <xsl:copy> 
     <!-- RETRIEVE FIRST ENTRY CONTENT --> 
     <xsl:element name="{preceding-sibling::entry/string[1]}"> 
     <xsl:value-of select="preceding-sibling::entry/string[2]"/> 
     </xsl:element> 
     <!-- APPLY GRANDCHILD LINKED HASH MAP --> 
     <linked-hash-map><xsl:apply-templates select="list/linked-hash-map"/></linked-hash-map> 
    </xsl:copy> 
    </xsl:template> 

    <!-- GENERALIZE FOR ALL DESCENDANT ENTRY NODES (W/O LINKED HASH MAP CHILD) --> 
    <xsl:template match="linked-hash-map">  
    <datarow> 
     <xsl:for-each select="descendant::entry[local-name(*[2])!='linked-hash-map']">   
      <xsl:element name="{string[1]}"> 
      <xsl:value-of select="normalize-space(string[2]|int)"/> 
      </xsl:element> 
     </xsl:for-each> 
     <!-- ADDED NODE (NOT PART OF ORIGINAL) --> 
     <readLink>https://api.cognitive.microsoft.com/api/v5/entities/b8ef6b82-02be-1e24-584c-f8283b7bdaeb</readLink> 
    </datarow>  
    </xsl:template> 

</xsl:stylesheet> 

Output

<?xml version="1.0" encoding="UTF-8"?> 
<linked-hash-map> 
    <entry> 
     <_type>News</_type> 
     <linked-hash-map> 
     <datarow> 
      <name>Virat Kohli</name> 
      <url>http://www.bing.com/cr?IG=3DA864FA197A4D5DAD062780C15E3A16&amp;CID=09E4F1057ADB64720330FB2E7BC96547&amp;rd=1&amp;h=nw8K4uNRgs-nvsuz2GyXpqMxdRmzWK8Xbm3W_1IlO24&amp;v=1&amp;r=http%3a%2f%2fmovies.ndtv.com%2fbollywood%2fvirat-kohli-hearts-anushka-sharma-a-timeline-of-their-romance-1659877&amp;p=DevEx,5026.1</url> 
      <contentUrl>https://www.bing.com/th?id=ON.EE674002EC235BD5795D34695EABF504&amp;pid=News</contentUrl> 
      <width>640</width> 
      <description>On Wednesday, cricketer Virat Kohli</description> 
      <datePublished>2017-02-16T05:39:00</datePublished> 
      <category>Entertainment</category> 
      <readLink>https://api.cognitive.microsoft.com/api/v5/entities/b8ef6b82-02be-1e24-584c-f8283b7bdaeb</readLink> 
     </datarow> 
     <datarow> 
      <name>Shah Rukh Khan's TV show</name> 
      <url>http://www.bing.com/cr?IG=3DA864FA197A4D5DAD062780C15E3A16&amp;CID=09E4F1057ADB64720330FB2E7BC96547&amp;rd=1&amp;h=4CnQhOg9Nm7pmIu9OvDl6x9WtYtSuXblCSR_WQz1VoA&amp;v=1&amp;r=http%3a%2f%2fwww.hindustantimes.com%2ftv%2fshah-rukh-khan-s-tv-show-circus-is-back-on-small-screen%2fstory-OjQUQIWi6ogxj5eF1hivTI.html&amp;p=DevEx,5040.1</url> 
      <contentUrl>https://www.bing.com/th?id=ON.2974262BB8317FA4D4BCE4A61CA9488E&amp;pid=News</contentUrl> 
      <width>700</width> 
      <description>Here's some wonderful news</description> 
      <datePublished>2017-02-16T05:36:00</datePublished> 
      <category>Entertainment</category> 
      <readLink>https://api.cognitive.microsoft.com/api/v5/entities/b8ef6b82-02be-1e24-584c-f8283b7bdaeb</readLink> 
     </datarow> 
     </linked-hash-map> 
    </entry> 
</linked-hash-map> 
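To check the flattening logic outside an XSLT processor, the datarow template can be approximated in Python. This is only a sketch under assumptions: it uses xml.etree.ElementTree instead of XSLT, the input is a shortened, hypothetical version of the document above, the helper name flatten is made up for illustration, and the added readLink node is left out.

```python
import xml.etree.ElementTree as ET

# Shortened, hypothetical input in the same shape as the XML above.
SOURCE = """<linked-hash-map>
  <entry><string>_type</string><string>News</string></entry>
  <entry>
    <string>value</string>
    <list>
      <linked-hash-map>
        <entry><string>name</string><string>
          Virat Kohli
        </string></entry>
        <entry><string>image</string><linked-hash-map>
          <entry><string>thumbnail</string><linked-hash-map>
            <entry><string>contentUrl</string><string>https://www.bing.com/th?id=ON.EE67&amp;pid=News</string></entry>
            <entry><string>width</string><int>640</int></entry>
          </linked-hash-map></entry>
        </linked-hash-map></entry>
      </linked-hash-map>
    </list>
  </entry>
</linked-hash-map>"""


def flatten(item):
    """Turn one list item into a <datarow> of leaf key/value elements."""
    row = ET.Element("datarow")
    for entry in item.iter("entry"):
        key, value = entry[0], entry[1]
        if value.tag == "linked-hash-map":
            continue  # container entry (image, thumbnail): only its leaves are kept
        # " ".join(text.split()) mirrors XSLT's normalize-space()
        ET.SubElement(row, key.text.strip()).text = " ".join((value.text or "").split())
    return row


root = ET.fromstring(SOURCE)
rows = [flatten(item) for item in root.findall("entry/list/linked-hash-map")]
print(ET.tostring(rows[0], encoding="unicode"))
```

As in the stylesheet, entries whose value is a nested linked-hash-map are skipped, and their descendant leaf entries are lifted into the row in document order.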

Hi Parfait. However, when I read the XML file in Java, I get the error message "Invalid byte 1 of 1-byte UTF-8 sequence". – user3187932


Which XML file? The original one or the transformed one? I ran the XSLT on Java 1.8 without any problems, using both Apache Xalan and an external Saxon. Your source may differ from my input. Note the entity issue I mentioned. – Parfait


Hi Parfait, I have made sure the XML entity references are handled properly. But my name and description fields contain characters like the following: banayai ha "dear" friend – user3187932
