Working with minidom
To start out, we'll need some actual XML to parse. Take a look at the following short example:
```xml
<?xml version="1.0" ?>
<zAppointments reminder="15">
    <appointment>
        <begin>1181251680</begin>
        <uid>040000008200E000</uid>
        <alarmTime>1181572063</alarmTime>
        <state></state>
        <location></location>
        <duration>1800</duration>
        <subject>Bring pizza home</subject>
    </appointment>
</zAppointments>
```
This is fairly typical XML, and it's pretty intuitive to read. Some of the XML you'll run into in the wild is far messier than this. Save the XML above to a file named appt.xml.
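If you'd rather create the file from Python than with a text editor, here is a minimal sketch. It writes an indented copy of the XML above to `appt.xml` in the current working directory; the constant name `APPT_XML` is just an illustrative choice. Whitespace between elements does not change the text of the leaf elements such as `<begin>` or `<subject>`, so the parser that follows sees the same values either way.

```python
from pathlib import Path

# The appointment XML from above. The declaration must be the very first
# thing in the file, so the string starts immediately with "<?xml".
APPT_XML = """<?xml version="1.0" ?>
<zAppointments reminder="15">
    <appointment>
        <begin>1181251680</begin>
        <uid>040000008200E000</uid>
        <alarmTime>1181572063</alarmTime>
        <state></state>
        <location></location>
        <duration>1800</duration>
        <subject>Bring pizza home</subject>
    </appointment>
</zAppointments>
"""

# Write it to appt.xml in the current working directory.
Path("appt.xml").write_text(APPT_XML, encoding="utf-8")
```

Run it from the same directory you will run the parser from, since `ApptParser("appt.xml")` uses a relative path.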
Let’s spend some time getting acquainted with how to parse this file using Python’s minidom module. This is a fairly long piece of code, so prepare yourself.
```python
import xml.dom.minidom
import urllib.request


class ApptParser(object):
    def __init__(self, url, flag='url'):
        self.list = []        # fields of the appointment currently being parsed
        self.appt_list = []   # one list of fields per appointment
        self.flag = flag      # 'file' also collects <state> and <alarmTime>
        self.rem_value = 0
        root = self.getXml(url)
        self.handleXml(root)

    def getXml(self, url):
        # Try to open the argument as a URL; if that fails,
        # treat it as a local file path instead.
        try:
            print(url)
            f = urllib.request.urlopen(url)
        except Exception:
            f = url
        doc = xml.dom.minidom.parse(f)
        node = doc.documentElement  # the root <zAppointments> element
        if node.nodeType == xml.dom.Node.ELEMENT_NODE:
            print('Element name: %s' % node.nodeName)
            for (name, value) in node.attributes.items():
                print('    Attr -- Name: %s  Value: %s' % (name, value))
        return node

    def handleXml(self, root):
        rem = root.getElementsByTagName('zAppointments')  # not used below
        appointments = root.getElementsByTagName("appointment")
        self.handleAppts(appointments)

    def getElement(self, element):
        return self.getText(element.childNodes)

    def handleAppts(self, appts):
        for appt in appts:
            self.handleAppt(appt)
            self.list = []  # start a fresh list for the next appointment

    def handleAppt(self, appt):
        begin = self.getElement(appt.getElementsByTagName("begin")[0])
        duration = self.getElement(appt.getElementsByTagName("duration")[0])
        subject = self.getElement(appt.getElementsByTagName("subject")[0])
        location = self.getElement(appt.getElementsByTagName("location")[0])
        uid = self.getElement(appt.getElementsByTagName("uid")[0])

        self.list.append(begin)
        self.list.append(duration)
        self.list.append(subject)
        self.list.append(location)
        self.list.append(uid)
        if self.flag == 'file':
            try:
                state = self.getElement(appt.getElementsByTagName("state")[0])
                self.list.append(state)
                alarm = self.getElement(appt.getElementsByTagName("alarmTime")[0])
                self.list.append(alarm)
            except Exception as e:
                print(e)
        self.appt_list.append(self.list)

    def getText(self, nodelist):
        # Concatenate the data of all text nodes among the given children.
        rc = ""
        for node in nodelist:
            if node.nodeType == node.TEXT_NODE:
                rc = rc + node.data
        return rc


if __name__ == "__main__":
    appt = ApptParser("appt.xml")
    print(appt.appt_list)
```
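The class above does several things at once. Stripped down, the minidom calls it relies on are `parse` (or `parseString`), `documentElement`, `getElementsByTagName`, and walking `childNodes` to collect text nodes. The sketch below applies those calls to the same XML, parsed from a string so it runs without `appt.xml`. The helper name `element_text` and the dict-per-appointment layout are illustrative choices, not part of minidom or of the original code.

```python
import xml.dom.minidom

APPT_XML = """<?xml version="1.0" ?>
<zAppointments reminder="15">
    <appointment>
        <begin>1181251680</begin>
        <uid>040000008200E000</uid>
        <alarmTime>1181572063</alarmTime>
        <state></state>
        <location></location>
        <duration>1800</duration>
        <subject>Bring pizza home</subject>
    </appointment>
</zAppointments>
"""


def element_text(element):
    """Join the text-node children of an element (same idea as getText)."""
    return "".join(
        node.data for node in element.childNodes
        if node.nodeType == node.TEXT_NODE
    )


doc = xml.dom.minidom.parseString(APPT_XML)
root = doc.documentElement  # the <zAppointments> element

# getAttribute() returns attribute values as strings.
reminder = root.getAttribute("reminder")

appointments = []
for appt in root.getElementsByTagName("appointment"):
    # Map each child element's tag name to its text content.
    record = {}
    for child in appt.childNodes:
        if child.nodeType == child.ELEMENT_NODE:
            record[child.tagName] = element_text(child)
    appointments.append(record)

print(reminder)                    # prints 15
print(appointments[0]["subject"])  # prints Bring pizza home
```

Keying each record by tag name keeps the fields labelled, whereas `ApptParser` stores them positionally in a list; which form is more convenient depends on how the data is used afterwards. Empty elements such as `<state></state>` have no child nodes, so they come back as empty strings.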
This code is loosely based on an example from the Python documentation, and I have to admit that I think my mutation of ...