Original link: http://wyb0.com/posts/python-read-and-write-xml/
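0x00 Ways to parse XML
SAX (Simple API for XML)
The Python standard library includes a SAX parser. SAX uses an event-driven model: while parsing the XML it fires a series of events and calls user-defined callback functions to process the document.
DOM (Document Object Model)
Parses the XML data into a tree held in memory; you work with the XML by operating on that tree.
ElementTree
ElementTree is like a lightweight DOM with a convenient, friendly API. The code is easy to work with, it is fast, and it uses little memory.
- I use ElementTree here.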
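For contrast with ElementTree, here is a minimal sketch of the event-driven SAX model described above. The handler class BookHandler and the SAMPLE document are made up for this illustration; only xml.sax.parseString and xml.sax.ContentHandler come from the standard library.

```python
import xml.sax


class BookHandler(xml.sax.ContentHandler):
    """Hypothetical handler that collects the text of every <book> element."""

    def __init__(self):
        xml.sax.ContentHandler.__init__(self)
        self.titles = []
        self._in_book = False
        self._buffer = []

    def startElement(self, name, attrs):
        # Fired when the parser meets an opening tag.
        if name == "book":
            self._in_book = True
            self._buffer = []

    def characters(self, content):
        # May fire more than once for a single text node, so accumulate.
        if self._in_book:
            self._buffer.append(content)

    def endElement(self, name):
        # Fired when the parser meets a closing tag.
        if name == "book":
            self.titles.append("".join(self._buffer))
            self._in_book = False


# Sample input, invented for this sketch.
SAMPLE = b"<books><book id='37476'>aaaa</book><book id='83727'>bbbb</book></books>"

handler = BookHandler()
xml.sax.parseString(SAMPLE, handler)
print(handler.titles)
```

The parser never builds a tree: the handler only sees one event at a time, which is why it has to track its own state (here the _in_book flag).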
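0x01 Attributes of an Element object
Every Element object has the following attributes:
- tag: a string identifying the kind of data the element represents
- attrib: a dictionary holding the element's attributes
- text: a string holding the element's content
- tail: a string holding the text that follows the element's closing tag
- any number of child elements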
>>> from xml.etree import ElementTree as ET
>>> xml = """<books>
... <book id='37476'>aaaa</book>
... <book id='83727'>bbbb</book>
... </books>"""
>>> root = ET.fromstring(xml)
>>> root.tag
'books'
>>> child = list(root)  # getchildren() was removed in Python 3.9
>>> child
[<Element 'book' at 0x106f59410>, <Element 'book' at 0x106f59450>]
>>> child[0].tag
'book'
>>> child[0].attrib
{'id': '37476'}
>>> child[0].text
'aaaa'
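The session above does not touch tail or indexing into child elements, so here is a small sketch of both. The mixed-content snippet is invented for this example.

```python
from xml.etree import ElementTree as ET

# Mixed content: text, a child element, more text, another child.
root = ET.fromstring("<p>Hello <b>bold</b> world<i>!</i></p>")

bold = root[0]            # child elements can be accessed by index
print(repr(root.text))    # text before the first child
print(repr(bold.text))    # text inside <b>
print(repr(bold.tail))    # text after </b>, up to the next tag
print(len(root))          # number of direct children
print(repr(root[1].tail)) # nothing follows </i>, so tail is None
```

In other words, text that comes after a closing tag belongs to that element's tail, not to the parent's text.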
0x02 File contents
<?xml version='1.0' encoding='UTF-8'?>
<books>
  <book>
    <name>Python黑帽子</name>
    <date>2015</date>
    <price>37¥</price>
    <description>用python寫一些程序</description>
  </book>
  <book>
    <name>Web安全深度剖析</name>
    <date>2014</date>
    <price>39¥</price>
    <description>講述web滲透的基礎知識</description>
  </book>
  <book>
    <name>白帽子講web安全</name>
    <date>2013</date>
    <price>44¥</price>
    <description>道哥力作</description>
  </book>
</books>
0x03 Reading XML nodes
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from xml.etree import ElementTree as ET

tree = ET.parse('test.xml')
root = tree.getroot()
# root = ET.fromstring(country_data_as_string)  # load from a string; returns the root element directly
childs = list(root)  # the <book> elements; getchildren() was removed in Python 3.9
books = []
for child0 in childs:
    book = {}
    for child00 in child0:  # iterate over the fields of one <book>
        # print(child00.tag)  # tag name, i.e. name, date, price, description
        # print(child00.text)
        book[child00.tag] = child00.text
    books.append(book)
print(books)
"""
books = [
    {'name': 'Python黑帽子','date': '2015','price': '37¥','description': '用python寫一些程序'},
    {'name': 'Web安全深度剖析','date': '2014','price': '39¥','description': '講述web滲透的基礎知識'},
    {'name': '白帽子講web安全','date': '2013','price': '44¥','description': '道哥力作'}
]
"""
0x04 Writing an XML file
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from xml.etree.ElementTree import Element, ElementTree

books = [
    {
        'name': u'Python黑帽子',
        'date': '2015',
        'price': u'37¥',
        'description': u'用python寫一些程序'
    },
    {
        'name': u'Web安全深度剖析',
        'date': '2014',
        'price': u'39¥',
        'description': u'講述web滲透的基礎知識'
    },
    {
        'name': u'白帽子講web安全',
        'date': '2013',
        'price': u'44¥',
        'description': u'道哥力作'
    }
]
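
def indent(elem, level=0):
    """Pretty-print the content written to the file."""
    i = "\n" + level*" "
    if len(elem):
        # Element with children: put the first child on a new, deeper line.
        if not elem.text or not elem.text.strip():
            elem.text = i + " "
        if not elem.tail or not elem.tail.strip():
            elem.tail = i
        for child in elem:
            indent(child, level+1)
        # The last child is followed by the parent's closing tag,
        # so its tail drops back to the parent's indentation.
        if not child.tail or not child.tail.strip():
            child.tail = i
    else:
        # Leaf element: only the whitespace after it needs setting.
        if level and (not elem.tail or not elem.tail.strip()):
            elem.tail = i
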
root = Element('books')
tree = ElementTree(root)
for book in books:
    child0 = Element('book')
    root.append(child0)
    for k, v in book.items():
        child00 = Element(k)
        child00.text = v
        child0.append(child00)
indent(root, 0)
# Python 3 omits the XML declaration for UTF-8 unless it is requested explicitly.
tree.write('aa.xml', encoding='UTF-8', xml_declaration=True)
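A shorter variant of the same idea, as a sketch: SubElement creates and appends a child in one call, and on Python 3.9 or later the standard library's ET.indent can stand in for the hand-written indent helper. The book data below is placeholder data invented for this example, and the result is serialized to bytes rather than written to a file so that it can be parsed back and checked.

```python
from xml.etree import ElementTree as ET

# Placeholder data, invented for this sketch.
books = [
    {"name": "Book A", "date": "2015"},
    {"name": "Book B", "date": "2014"},
]

root = ET.Element("books")
for book in books:
    book_elem = ET.SubElement(root, "book")  # create and append in one step
    for key, value in book.items():
        ET.SubElement(book_elem, key).text = value

# ET.indent was added in Python 3.9; skip it on older interpreters.
if hasattr(ET, "indent"):
    ET.indent(root)

xml_bytes = ET.tostring(root, encoding="UTF-8")

# Round trip: parse the serialized bytes and rebuild the list of dicts.
parsed = [
    {field.tag: field.text for field in book_elem}
    for book_elem in ET.fromstring(xml_bytes)
]
print(parsed == books)
```

Indentation only changes whitespace between elements, so the text of the leaf elements survives the round trip unchanged.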