Parsing XML attributes in Python: splitting on ' and " characters

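I'm working with the NVD XML feed and trying to parse and split the XML so that I can eventually insert it into a DB. The problem I'm running into is that the parsed XML attributes have either " or ' around their values, and I can't split these strings. I've included the code and the entry that is currently failing. It fails on this attribute dictionary: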

{'vendor': "america's_first_federal_credit_union", 'name': "america's_first_fcu_mobile_banking"}

The XML of the entry being parsed:

<entry type="CVE" name="CVE-2017-5916" seq="2017-5916" published="2017-05-05" modified="2017-05-16" severity="Medium" CVSS_version="2.0" CVSS_score="4.3" CVSS_base_score="4.3" CVSS_impact_subscore="2.9" CVSS_exploit_subscore="8.6" CVSS_vector="(AV:N/AC:M/Au:N/C:P/I:N/A:N)"> 
<desc> 
    <descript source="cve">The America's First Federal Credit Union (FCU) Mobile Banking app 3.1.0 for iOS does not verify X.509 certificates from SSL servers, which allows man-in-the-middle attackers to spoof servers and obtain sensitive information via a crafted certificate.</descript> 
</desc> 
<loss_types> 
    <conf/> 
</loss_types> 
<range> 
    <network/> 
</range> 
<refs> 
    <ref source="MISC" url="https://medium.com/@chronic_9612/follow-up-76-popular-apps-confirmed-vulnerable-to-silent-interception-of-tls-protected-data-64185035029f" adv="1">https://medium.com/@chronic_9612/follow-up-76-popular-apps-confirmed-vulnerable-to-silent-interception-of-tls-protected-data-64185035029f</ref> 
</refs> 
<vuln_soft> 
    <prod name="america's_first_fcu_mobile_banking" vendor="america's_first_federal_credit_union"> 
    <vers num="3.1.0" prev="1" edition=":~~~iphone_os~~"/> 
    </prod> 
</vuln_soft>

The expected output is:

product,america's_first_federal_credit_union,america's_first_fcu_mobile_banking

Code

#!/usr/bin/env python 
import os 
import sys 
import time 
from subprocess import call 
import xml.etree.ElementTree 
import re 

range_from = 2017 
range_to = 2017 

def process_entry(entry): 
    cve = entry.attrib.get("name") 
    print cve 
    cpes = get_cpes_affected(entry) 


def get_cpes_affected(entry): 
    child = [] 
    for e in entry.iter(): 
     if "}prod" in e.tag: 
      print e.attrib 
      print unichr(34) 
      if unichr(34) in e.attrib: 
       print "hey yo" 
       child.append("product," + str(e.attrib).split('"')[1] + "," + str(e.attrib).split('"')[3]) 
      else: 
       child.append("product," + str(e.attrib).split("'")[3] + "," + str(e.attrib).split("'")[7]) 
      #print e.tag, e.attrib 
     if "'prev'" in e.attrib: 
      child.append("version," + str(e.attrib).split("'")[7] + "," + str(e.attrib).split("'")[3]) 
     if "}vers" in e.tag and "'prev'" not in e.attrib: 
      child.append("version," + str(e.attrib).split("'")[3] + ",") 
      #print e.tag, e.attrib 
    for derp in child: 
     print derp 

for i in range(range_from, range_to+1): 
    os.system("wget -O tmp.zip https://nvd.nist.gov/download/nvdcve-%i.xml.zip" % i) 
    os.system("unzip -o tmp.zip") 
    e = xml.etree.ElementTree.parse('nvdcve-%i.xml' % i).getroot() 

    for entry in e: 
     process_entry(entry)

And here is an example of a string that splits without any problem:

{'vendor': 'emirates_nbd_bank_p.j.s.c', 'name': 'emirates_nbd_ksa'}

Sorry, I forgot to include the error:

Traceback (most recent call last):
  File "prev-version-load.py", line 49, in <module>
    process_entry(entry)
  File "prev-version-load.py", line 18, in process_entry
    cpes = get_cpes_affected(entry)
  File "prev-version-load.py", line 33, in get_cpes_affected
    child.append("product," + str(e.attrib).split("'")[3] + "," + str(e.attrib).split("'")[7])
IndexError: list index out of range


2017-10-05 Adthrawn

And the error you're getting is...? –

Are you using lxml? –

And what output are you trying to get? Calling str() on a dict and then trying to parse the result is almost never what you want to do... –

This isn't really about parsing XML; it's about how you format the output.

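Unlike shell scripting, where almost everything is a string and string operations are all you have to produce the output you want, Python is an object-oriented language and its objects have types. In particular, e.attrib is a dictionary, so look the values up by key (e.g. e.attrib['vendor']) instead of running string operations on its string form.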
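A minimal Python 3.7+ sketch of why the split-based code breaks, using the attribute values from the failing entry (the dict literal is a stand-in for what e.attrib returns). Python's repr() wraps a string that contains an apostrophe in double quotes, so str(e.attrib).split("'") yields only 7 pieces and index 7 raises the IndexError seen in the traceback, while the apostrophe-free entry splits into the expected 9 pieces. Reading the values by key needs no quote handling at all.

```python
# Stand-in for e.attrib on the failing <prod> element (values copied from the question).
attrib = {'vendor': "america's_first_federal_credit_union",
          'name': "america's_first_fcu_mobile_banking"}

# repr() puts double quotes around values that contain a single quote,
# so the pieces after split("'") no longer sit at fixed positions.
as_text = str(attrib)
pieces = as_text.split("'")
print(as_text)
print(len(pieces))

try:
    pieces[7]  # the index the question's code uses for the product name
    failed = False
except IndexError:
    failed = True
print(failed)

# For comparison, the entry without apostrophes splits into 9 pieces,
# with the vendor at index 3 and the name at index 7.
ok_pieces = str({'vendor': 'emirates_nbd_bank_p.j.s.c', 'name': 'emirates_nbd_ksa'}).split("'")
print(ok_pieces[3], ok_pieces[7])

# Reading the values by key avoids the quoting problem entirely.
line = "product," + attrib['vendor'] + "," + attrib['name']
print(line)
```

The last line prints exactly the expected output from the question, without any knowledge of which quote character repr() happened to choose.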

Instead of what you were trying to do, I'd recommend using ElementTree's findall() method. For example, I think this is what you're really trying to do:

#!/usr/bin/env python
from xml.etree import ElementTree as ET

range_from = 2017
range_to = 2017

def process_entry(entry):
    cve = entry.attrib.get("name")
    print cve
    cpes = get_cpes_affected(entry)

def get_cpes_affected(entry):
    prods = entry.findall('nvd:vuln_soft/nvd:prod', namespaces=namespaces)
    for prod in prods:
        print prod.attrib
        print '"'
    for prod in prods:
        print "product,{},{}".format(prod.attrib['vendor'], prod.attrib['name'])
        for vers in prod.findall('nvd:vers', namespaces=namespaces):
            if vers.get('edition'):
                print "version,{},".format(vers.attrib['edition'])
            elif vers.get('prev') == '1':
                print "version,{},".format(vers.attrib['prev'])
            else:
                print "version,{},".format(vers.attrib['num'])

namespaces = {'nvd': 'http://nvd.nist.gov/feeds/cve/1.2'}
# OPTIONAL: registering namespace is useful for outputting XML with ET.tostring()/ET.dump()
#for prefix, ns in namespaces.items():
#    ET.register_namespace(prefix, ns)

for i in range(range_from, range_to+1):
    e = ET.parse('nvdcve-%i.xml' % i).getroot()
    for entry in e:
        process_entry(entry)
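A self-contained Python 3 sketch of the same findall() approach that runs without downloading the feed. It assumes a trimmed copy of the question's entry wrapped in an `<nvd>` root element that declares the namespace URI used in the answer (assumed to match the real 1.2 feed). The `cpe_lines` helper and the `version,<num>,<prev>` line layout are hypothetical, chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace URI taken from the answer; assumed to match the NVD 1.2 feed.
NS = {'nvd': 'http://nvd.nist.gov/feeds/cve/1.2'}

# Trimmed copy of the question's entry inside a root that declares the namespace.
SAMPLE = """<nvd xmlns="http://nvd.nist.gov/feeds/cve/1.2">
  <entry type="CVE" name="CVE-2017-5916">
    <vuln_soft>
      <prod name="america's_first_fcu_mobile_banking" vendor="america's_first_federal_credit_union">
        <vers num="3.1.0" prev="1" edition=":~~~iphone_os~~"/>
      </prod>
    </vuln_soft>
  </entry>
</nvd>"""


def cpe_lines(entry):
    """Hypothetical helper: product/version lines for one <entry>, read by key."""
    lines = []
    for prod in entry.findall('nvd:vuln_soft/nvd:prod', NS):
        # get() returns the raw attribute string; apostrophes need no special handling
        lines.append("product,{},{}".format(prod.get('vendor'), prod.get('name')))
        for vers in prod.findall('nvd:vers', NS):
            # Hypothetical layout: "version,<num>,<prev or empty>"
            lines.append("version,{},{}".format(vers.get('num'), vers.get('prev', '')))
    return lines


root = ET.fromstring(SAMPLE)
for entry in root.findall('nvd:entry', NS):
    print(entry.get('name'))
    for line in cpe_lines(entry):
        print(line)
```

For the sample this prints the CVE name, then `product,america's_first_federal_credit_union,america's_first_fcu_mobile_banking`, then `version,3.1.0,1`. Against the real file you would use `ET.parse(...).getroot()` instead of `ET.fromstring(SAMPLE)`.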


2017-10-05 17:46:01

Yes, this is what I tried at first, and it failed. Then I went down a strange workaround path that, oddly, doesn't work either. – Adthrawn

Consider replacing ...

if "}prod" in e.tag: 
    print unichr(34) 
    if unichr(34) in e.attrib: 
     print "hey yo" 
     child.append("product," + str(e.attrib).split('"')[1] + "," + str(e.attrib).split('"')[3]) 
    else: 
     child.append("product," + str(e.attrib).split("'")[3] + "," + str(e.attrib).split("'")[7]) 
    #print e.tag, e.attrib 
if "'prev'" in e.attrib: 
    child.append("version," + str(e.attrib).split("'")[7] + "," + str(e.attrib).split("'")[3]) 
if "}vers" in e.tag and "'prev'" not in e.attrib: 
    child.append("version," + str(e.attrib).split("'")[3] + ",")

With ...

reg=r"\"|'(?=[^\"]*')|'(?=\W*\")" 
if "prod" in e.tag: 
    #print(re.split(reg,str(e.attrib))) 
    child.append("product," + re.split(reg,str(e.attrib))[3] + "," + re.split(reg,str(e.attrib))[7]) 
    #print e.tag, e.attrib 
if "prev" in e.attrib: 
    child.append("version," + re.split(reg,str(e.attrib))[7] + "," + re.split(reg,str(e.attrib))[3]) 
if "vers" in e.tag and "prev" not in e.attrib: 
    child.append("version," + re.split(reg,str(e.attrib))[3] + ",")

Let me know if this works and I'll explain it.

UPDATE

A better solution for your given XML is the following:

if "prod" in e.tag:
    #print(e.attrib)
    # e.attrib is a dict: read the values by key, vendor first to match the expected output
    child.append("product," + e.attrib['vendor'] + "," + e.attrib['name'])
if "prev" in e.attrib:
    child.append("version," + e.attrib['prev'] + "," + e.attrib['num'])
if "vers" in e.tag and "prev" not in e.attrib:
    child.append("version," + e.attrib['num'] + ",")
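If you want to keep the entry.iter() loop from the question, here is a minimal Python 3 sketch of the key-lookup version. It swaps the substring tests (`"prod" in e.tag`, which would also match any tag that merely contains "prod") for an exact comparison on the tag's local name, and only checks for `prev` on `<vers>` elements. The `local_name` helper is hypothetical, and the sample XML is a trimmed copy of the question's entry, assuming the namespace URI from the first answer.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's entry; namespace URI assumed from the first answer.
SAMPLE = """<nvd xmlns="http://nvd.nist.gov/feeds/cve/1.2">
  <entry type="CVE" name="CVE-2017-5916">
    <vuln_soft>
      <prod name="america's_first_fcu_mobile_banking" vendor="america's_first_federal_credit_union">
        <vers num="3.1.0" prev="1" edition=":~~~iphone_os~~"/>
      </prod>
    </vuln_soft>
  </entry>
</nvd>"""


def local_name(tag):
    """Hypothetical helper: strip ElementTree's '{uri}' prefix, '{uri}prod' -> 'prod'."""
    return tag.rsplit('}', 1)[-1]


def get_cpes_affected(entry):
    """Collect product/version lines, comparing exact tag names instead of substrings."""
    child = []
    for e in entry.iter():
        name = local_name(e.tag)
        if name == 'prod':
            child.append("product," + e.attrib['vendor'] + "," + e.attrib['name'])
        elif name == 'vers':
            # Same two version branches as the updated answer
            if 'prev' in e.attrib:
                child.append("version," + e.attrib['prev'] + "," + e.attrib['num'])
            else:
                child.append("version," + e.attrib['num'] + ",")
    return child


root = ET.fromstring(SAMPLE)
for entry in root:
    for line in get_cpes_affected(entry):
        print(line)
```

For the sample entry this yields the expected product line followed by `version,1,3.1.0`, the same layout the answer's `prev` branch produces.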

A working example of all three cases (your code, my original solution, and the updated solution) is here.


2017-10-05 16:50:40 kaza

Ah, the second solution is much better than the rushed approach I was trying. I was trying to use XPath to pull the fields without the ifs, but it kept failing. – Adthrawn

@Adthrawn: the stdlib's xml.etree supports only a limited subset of XPath (the path syntax accepted by find() and findall()). For full XPath you need to use the [etree](http://lxml.de/tutorial.html) from [lxml](https://pypi.python.org/pypi/lxml). –
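For reference, ElementTree's findall() does accept a limited XPath subset, including attribute predicates such as `[@attr='value']` and `[@attr]`, but its path syntax has no step for selecting attribute values themselves, so the values still have to be read from the matched elements. A minimal Python 3 sketch, assuming the same trimmed entry and namespace URI as the first answer:

```python
import xml.etree.ElementTree as ET

# Namespace URI assumed from the first answer; sample trimmed from the question.
NS = {'nvd': 'http://nvd.nist.gov/feeds/cve/1.2'}
SAMPLE = """<nvd xmlns="http://nvd.nist.gov/feeds/cve/1.2">
  <entry type="CVE" name="CVE-2017-5916">
    <vuln_soft>
      <prod name="america's_first_fcu_mobile_banking" vendor="america's_first_federal_credit_union">
        <vers num="3.1.0" prev="1" edition=":~~~iphone_os~~"/>
      </prod>
    </vuln_soft>
  </entry>
</nvd>"""

root = ET.fromstring(SAMPLE)

# The stdlib XPath subset can filter elements by an attribute's value ...
prev_vers = root.findall(".//nvd:vers[@prev='1']", NS)

# ... or by the attribute's presence, e.g. every <prod> that has a vendor.
prods = root.findall(".//nvd:prod[@vendor]", NS)

# There is no "/@vendor" step, so attribute values are read with .get().
nums = [v.get('num') for v in prev_vers]
pairs = [(p.get('vendor'), p.get('name')) for p in prods]
print(nums)
print(pairs)
```

With lxml, an expression that returns attribute values directly would be possible; with the stdlib, the filter-then-`.get()` pattern above is the closest equivalent.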
