XML: SAX and XInclude

[python 2.5, Linux Mandriva 2007.1]


Does anybody know of an XML parser that works with the SAX API (xml.sax) and supports XInclude (directly or via hacks)?

When I specify nothing (the default parser), it simply passes the xinclude elements through unchanged. (I tried calling the parser on the included file, but it seems to expect a complete document with a root element.)

I tried libxml2, but it failed with an error (see the end of this message).

I tried xmlproc, but it also simply passes the xinclude elements through.
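One stdlib-only workaround, sketched here for Python 3 and not taken from the thread: let xml.etree.ElementInclude expand the xi:include elements first, then replay the resulting tree as SAX events into an ordinary ContentHandler. Assumptions: ElementInclude handles only the XInclude 2001 namespace with href and parse="xml"/"text" (no xpointer); namespaced tags reach the handler in ElementTree's "{uri}name" form rather than as prefixed qnames; the helper names replay_as_sax, parse_with_xinclude and load_from are made up for this example, and the in-memory loader stands in for reading files from disk.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude
from xml.sax.handler import ContentHandler
from xml.sax.xmlreader import AttributesImpl


def replay_as_sax(elem, handler):
    """Emit startElement/characters/endElement events for elem and its subtree."""
    handler.startElement(elem.tag, AttributesImpl(dict(elem.attrib)))
    if elem.text:
        handler.characters(elem.text)
    for child in elem:
        replay_as_sax(child, handler)
        if child.tail:
            handler.characters(child.tail)
    handler.endElement(elem.tag)


def parse_with_xinclude(xml_text, handler, loader=None):
    """Parse xml_text, resolve XInclude, then drive a SAX ContentHandler."""
    root = ET.fromstring(xml_text)
    # Replaces each xi:include element in place; loader=None reads files from disk.
    ElementInclude.include(root, loader=loader)
    handler.startDocument()
    replay_as_sax(root, handler)
    handler.endDocument()


class TagLogger(ContentHandler):
    """Records element names and character data, just to show the events."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.text = []

    def startElement(self, name, attrs):
        self.tags.append(name)

    def characters(self, content):
        self.text.append(content)


# Hypothetical in-memory "files" so the example runs without touching disk.
FILES = {"part.xml": "<section>Hello</section>"}


def load_from(href, parse, encoding=None):
    if parse == "xml":
        return ET.fromstring(FILES[href])
    return FILES[href]


MAIN = ('<doc xmlns:xi="http://www.w3.org/2001/XInclude">'
        '<title>T</title><xi:include href="part.xml"/></doc>')

logger = TagLogger()
parse_with_xinclude(MAIN, logger, loader=load_from)
print(logger.tags)  # ['doc', 'title', 'section']
```

The SAX handler never sees the include element itself, only the included content, which is what the default parser and xmlproc were not doing. The trade-off is that the whole document is built in memory before any events fire.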

Posted On: Monday 5th of November 2012 02:32:49 AM

Related Messages:

Mysterious xml.sax Encoding Exception
I have a module that uses xml.sax and feeds it a string of XML, as in xml.sax.parseString(dictfile, handler). The XML is always encoded in UTF-16, and the XML string always starts with an XML declaration. This almost always works fine, but two users of this module get an exception whatever input they use it on. (The actual XML is generated by an API in our application that returns an XML version of the metadata associated with the application's data.) The exception is xml.sax._exceptions.SAXParseException: :1:30: encoding specified in XML declaration is incorrect. In both of these cases there are only plain 7-bit ASCII characters in the XML, and it really is valid UTF-16 as far as I can tell. Now here is the hard part: this never happens to me, and when I got the actual XML content from one of the users and fed it to the parser, I didn't get the exception. What could be going on? We are all on Python 2.5 (and all on an English locale). Any suggestions would be appreciated. -Jon Peck
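One way to take the declared encoding out of the equation, sketched in Python 3 and not a diagnosis from the thread: decode the bytes yourself, drop the XML declaration, and hand expat plain UTF-8. The idea that the failing users receive bytes that don't match the utf-16 declaration is an assumption, and normalize_xml with its BOM/NUL-byte sniffing is a hypothetical helper, not part of xml.sax.

```python
import codecs
import re
import xml.sax
from xml.sax.handler import ContentHandler

# Matches a leading <?xml ...?> declaration (but not <?xml-stylesheet ...?>).
_XML_DECL = re.compile(r'^\s*<\?xml\s[^>]*\?>')


def normalize_xml(data):
    """Return UTF-8 bytes without an XML declaration (hypothetical helper)."""
    if isinstance(data, bytes):
        if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
            text = data.decode('utf-16')       # BOM decides the byte order
        elif data.startswith(b'<\x00'):
            text = data.decode('utf-16-le')    # no BOM, little-endian
        elif data.startswith(b'\x00<'):
            text = data.decode('utf-16-be')    # no BOM, big-endian
        else:
            text = data.decode('utf-8')        # bytes are really 8-bit
    else:
        text = data                            # already a str
    text = _XML_DECL.sub('', text.lstrip('\ufeff'), count=1)
    return text.encode('utf-8')                # no declaration means UTF-8


class RootName(ContentHandler):
    """Remembers the name of the first element, as a minimal check."""

    def __init__(self):
        super().__init__()
        self.root = None

    def startElement(self, name, attrs):
        if self.root is None:
            self.root = name


doc = '<?xml version="1.0" encoding="utf-16"?><metadata a="1">hi</metadata>'
# Real UTF-16 bytes, mismatched ASCII bytes, and a str all parse the same way.
for raw in (doc.encode('utf-16'), doc.encode('ascii'), doc):
    handler = RootName()
    xml.sax.parseString(normalize_xml(raw), handler)
    print(handler.root)  # metadata
```

Logging repr(raw[:4]) on the affected machines before normalizing would show whether they really receive UTF-16 bytes.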
xml.dom.minidom weirdness: bug?
Hi. I was writing an xmltv parser in Python when I ran into some weirdness that I couldn't explain. I read an XML file, create another DOM object, and copy the elements from one to the other. At no time do I modify the original DOM object, yet it gets modified. Unless I missed something, it sounds like a bug to me. The XML file is simply: full name, which I store under the name test.xmltv. Here is the code; I've removed everything that isn't relevant to my description, and I can't make it any simpler, I'm afraid:

    from xml.dom.minidom import Document
    import xml.dom.minidom

    def adjusttimezone(docxml, timezone):
        doc = Document()

        # Create the base element
        tv_xml = doc.createElement("tv")
        doc.appendChild(tv_xml)

        # Create the channel list
        channellist = docxml.getElementsByTagName('channel')
        for x in channellist:
            # Copy the original attributes
            elem = doc.createElement("channel")
            for y in x.attributes.keys():
                name = x.attributes[y].name
                value = x.attributes[y].value
                elem.setAttribute(name, value)
            for y in x.getElementsByTagName('display-name'):
                elem.appendChild(y)
            tv_xml.appendChild(elem)
        return doc

    if __name__ == '__main__':
        handle = open('test.xmltv', 'r')
        docxml = xml.dom.minidom.parse(handle)
        print 'step1'
        print docxml.toprettyxml(indent=" ", encoding="utf-8")
        doc = adjusttimezone(docxml, 1000)
        print 'step2'
        print docxml.toprettyxml(indent=" ", encoding="utf-8")

At "step 1" the content of the DOM object is displayed and, quite naturally, it shows: full name. After the call to adjusttimezone, however, "step 2" shows: That's it! Note that at no time do I modify the content of docxml, yet it gets modified. The weirdness disappears if I change the line

    channellist = docxml.getElementsByTagName('channel')

to

    channellist = copy.deepcopy(docxml.getElementsByTagName('channel'))

However, my understanding is that this shouldn't be necessary. Any thoughts on this weirdness?
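This is consistent with DOM semantics rather than a minidom bug: a node can have only one parent, so appendChild(y) detaches y from docxml before attaching it to the new document. A minimal Python 3 sketch of the copy loop using Document.importNode(node, deep), which returns a copy owned by the new document (y.cloneNode(True) would also work); the XML literal is a made-up stand-in for test.xmltv and copy_channels is a hypothetical name.

```python
from xml.dom.minidom import Document, parseString

# Hypothetical stand-in for test.xmltv.
SOURCE = ('<tv><channel id="c1">'
          '<display-name>full name</display-name>'
          '</channel></tv>')


def copy_channels(docxml):
    doc = Document()
    tv_xml = doc.createElement('tv')
    doc.appendChild(tv_xml)
    for channel in docxml.getElementsByTagName('channel'):
        elem = doc.createElement('channel')
        # Copy the original attributes.
        for name, value in channel.attributes.items():
            elem.setAttribute(name, value)
        for display in channel.getElementsByTagName('display-name'):
            # importNode() makes a deep copy owned by doc; a plain
            # appendChild(display) would move the node out of docxml.
            elem.appendChild(doc.importNode(display, True))
        tv_xml.appendChild(elem)
    return doc


docxml = parseString(SOURCE)
before = docxml.toxml()
doc = copy_channels(docxml)
print(docxml.toxml() == before)  # True: the source tree is untouched
print(doc.documentElement.toxml())
```

Unlike copy.deepcopy of the whole node list, importNode copies only the nodes that are actually moved into the new document.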
Re: [newbie] using ElementTree, how to add doctype and xml pi
On Wed, 19 Mar 2008 20:53:49 -0300, dave berk wrote:
> I have an svg file I'm creating on the fly. How do I add the doctype and
> xml pi?

They're not elements per se, and there is no function to add them. The easiest way (but perhaps not so pure) is to just write those lines by hand before the document itself; see a recent post about "Inserting DTD statement to XML".
--
Gabriel Genellina
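The "write those lines by hand" approach, sketched in Python 3: emit the XML declaration and the SVG 1.1 DOCTYPE as plain text, then append ElementTree's serialization of the root. write_svg is a hypothetical helper; ET.register_namespace('', ...) is used so the SVG namespace becomes the default namespace instead of an "ns0:" prefix (note that it changes module-wide state).

```python
import io
import xml.etree.ElementTree as ET

SVG_NS = 'http://www.w3.org/2000/svg'
SVG_DOCTYPE = ('<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" '
               '"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">')

# Serialize SVG elements with a default namespace instead of "ns0:".
ET.register_namespace('', SVG_NS)


def write_svg(root, out):
    """Write the XML declaration and DOCTYPE by hand, then the element tree."""
    out.write('<?xml version="1.0" encoding="utf-8"?>\n')
    out.write(SVG_DOCTYPE + '\n')
    out.write(ET.tostring(root, encoding='unicode'))
    out.write('\n')


svg = ET.Element('{%s}svg' % SVG_NS, width='100', height='100')
ET.SubElement(svg, '{%s}circle' % SVG_NS, cx='50', cy='50', r='40')

buf = io.StringIO()
write_svg(svg, buf)
print(buf.getvalue())
```

For a real file, open it with open(path, 'w', encoding='utf-8') so the bytes on disk match the encoding named in the declaration.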
Re: Write ooxml .ods (spreadsheet) from python?
Rolf van de Krol wrote:
> Neal Becker wrote:
>> I'd like to output some data directly in .ods format. This format appears
>> to be quite complex. Is there any Python software available to do this?
>> I did look at pyuno briefly. It looks pretty complicated too, and it
>> looks like it uses its own private version of Python, which would not
>> help me.
>
> Google is your friend. For example this:
>
> It seems like that guy found the way to go for your problem.
>
> Rolf

I don't think he's my friend today. I looked at this lib, but it starts with:

    # OOo's libraries
    import uno

IIUC, this is the same problem. This uno module is tied to an old Python (2.2) that ships with OO. Is it available standalone, so I can build it for Python 2.5?
Re: xpath and current python xml libraries
On Dec 11, 2:03 am, wrote:
> PyXML seems to be long gone. Is lxml the way to go if I want to have
> XPath support?

The libxml2dom package (which I maintain) also supports XPath and is also based on libxml2. If you want to migrate code from PyXML without too much effort, it might be a solution. See here for details:

Paul
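For comparison only, and not a substitute for full XPath in lxml or libxml2dom: the standard library's xml.etree.ElementTree understands a limited XPath subset in find()/findall(), including descendant paths, [@attr='value'] predicates and 1-based positional indexes. A small Python 3 sketch over a made-up document:

```python
import xml.etree.ElementTree as ET

# Hypothetical document for the example.
DOC = """<library>
  <book lang="en"><title>First</title></book>
  <book lang="fr"><title>Second</title></book>
  <book lang="en"><title>Third</title></book>
</library>"""

root = ET.fromstring(DOC)

# Descendant search plus an attribute predicate.
english = [b.findtext('title') for b in root.findall(".//book[@lang='en']")]
print(english)  # ['First', 'Third']

# Positional predicate (1-based, as in XPath).
print(root.find('book[2]/title').text)  # Second
```

Anything beyond this subset (axes such as parent:: or following-sibling::, functions, arbitrary boolean expressions) still calls for one of the libxml2-based packages.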
Lxml on mac
What should one do to use lxml (index.html) on a Mac? Best regards
Parsing xml file in python
I am a newbie in Python. I am trying to parse an XML file and write its content to a txt file, but the txt file contains null elements. Any idea what I'm doing wrong here? Here is the code that I wrote:

    import sys, os
    import xml.sax
    import xml.sax.handler
    from xml.sax.handler import ContentHandler
    from xml.sax import make_parser

    class gmondxmlparse(ContentHandler):
        def __init__(self, searchTerm):
            self.searchTerm = searchTerm

        def startElement(self, name, attrs):
            if name == "HOST":
                self.hostname = attrs.get('NAME', "")
                self.IP = attrs.get('IP', "")
            elif name == "METRIC":
                self.metricname = attrs.get('NAME', "")
                self.metricvalue = attrs.get('VAL', "")
                self.metrictype = attrs.get('TYPE', "")
                self.metricunit = attrs.get('UNITS', "")
            return

        def endElement(self, name):
            if name == "HOST" and self.searchTerm == self.hostname:
                try:
                    fh = open('/root/yhpc-2.0/ganglia.txt', 'w')
                except:
                    print "File /root/yhpc-2.0/ganglia.txt can not be open"
                    sys.exit(1)
                fh.write("This is a test for xml parsing with python with chris and amjad \n")
                fh.write("the host name is", self.hostname, "\n")
                fh.write("the ip address is", self.IP, "\n")
                fh.close()

    searchTerm = "HOST"
    parser = make_parser()
    curHandler = gmondxmlparse(searchTerm)
    parser.setContentHandler(curHandler)
    parser.parse(open("/root/yhpc-2.0/gmond.xml"))

Here is a sample of the XML file, called gmond.xml:
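A corrected sketch in Python 3, under assumptions about gmond.xml since the sample did not survive. It compares each HOST's NAME with the host actually wanted (searchTerm = "HOST" only matches a host whose NAME is literally "HOST"), writes to a stream that is opened once instead of reopening the file with 'w' for every matching host, and passes write() a single formatted string (file.write() takes exactly one argument, so fh.write("the host name is", self.hostname, "\n") raises TypeError). GmondHostWriter, the host names and the element layout in SAMPLE are hypothetical.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler


class GmondHostWriter(ContentHandler):
    """Write the name and IP of every HOST whose NAME equals wanted_host."""

    def __init__(self, wanted_host, out):
        super().__init__()
        self.wanted_host = wanted_host
        self.out = out          # an already-open text stream
        self.hostname = None
        self.ip = None

    def startElement(self, name, attrs):
        if name == 'HOST':
            self.hostname = attrs.get('NAME', '')
            self.ip = attrs.get('IP', '')

    def endElement(self, name):
        if name == 'HOST' and self.hostname == self.wanted_host:
            # write() takes one string, so format each line first.
            self.out.write('the host name is %s\n' % self.hostname)
            self.out.write('the ip address is %s\n' % self.ip)


# Hypothetical gmond-style input; the real file layout may differ.
SAMPLE = b"""<GANGLIA_XML>
  <CLUSTER NAME="demo">
    <HOST NAME="node1" IP="10.0.0.1">
      <METRIC NAME="load_one" VAL="0.10" TYPE="float" UNITS=""/>
    </HOST>
    <HOST NAME="node2" IP="10.0.0.2"/>
  </CLUSTER>
</GANGLIA_XML>"""

out = io.StringIO()    # with a real file: open('ganglia.txt', 'w') once, outside the handler
xml.sax.parseString(SAMPLE, GmondHostWriter('node1', out))
print(out.getvalue())
```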
ElementTree find with xmlns
I'm having problems parsing a file:

    >>> tree = ElementTree.fromstring(""" world """)
    >>> print tree.find('body')
    None

The above works fine when the first element is a plain element without namespaces, but not when I have all the xmlns's.
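Once the root carries an xmlns declaration, ElementTree stores every tag as {namespace-uri}localname, so find('body') no longer matches. A Python 3 sketch, assuming an XHTML document since the original markup did not survive: either spell out the qualified name, or pass a prefix-to-URI mapping as find()'s namespaces argument (the prefix h is arbitrary and need not appear in the document).

```python
import xml.etree.ElementTree as ET

XHTML = 'http://www.w3.org/1999/xhtml'
doc = ET.fromstring(
    '<html xmlns="http://www.w3.org/1999/xhtml"><body>world</body></html>')

print(doc.tag)                                # {http://www.w3.org/1999/xhtml}html
print(doc.find('body'))                       # None: unqualified name
print(doc.find('{%s}body' % XHTML).text)      # world
print(doc.find('h:body', {'h': XHTML}).text)  # world
```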
Remove namespace declaration from ElementTree in lxml
I want to remove an unused namespace declaration from the root element of an ElementTree in lxml. There doesn't seem to be any documented way to do this, so at the moment I'm reduced to passing the output through str.replace(), which is somewhat inelegant. Is there a better way?
-[]z.
xml-filter with XMLFilterBase() and XMLGenerator() shuffles attributes
I've made a trivial XML filter to modify some attributes on the fly:

    ....
    from __future__ import with_statement
    import os
    import sys
    from xml import sax
    from xml.sax import saxutils

    class ReIdFilter(saxutils.XMLFilterBase):
        def __init__(self, upstream, downstream):
            saxutils.XMLFilterBase.__init__(self, upstream)
            self.__downstream = downstream
            return

        def startElement(self, name, attrs):
            self.__downstream.startElement(name, attrs)
            return

        def startElementNS(self, name, qname, attrs):
            self.__downstream.startElementNS(name, qname, attrs)
            return

        def endElement(self, name):
            self.__downstream.endElement(name)
            return

        def endElementNS(self, name, qname):
            self.__downstream.endElementNS(name, qname)
            return

        def processingInstruction(self, target, body):
            self.__downstream.processingInstruction(target, body)
            return

        def comment(self, body):
            self.__downstream.comment(body)
            return

        def characters(self, text):
            self.__downstream.characters(text)
            return

        def ignorableWhitespace(self, ws):
            self.__downstream.ignorableWhitespace(ws)
            return
    ....
    with open(some_file_path, 'w') as f:
        parser = sax.make_parser()
        downstream_handler = saxutils.XMLGenerator(f, 'cp1251')
        filter_handler = ReIdFilter(parser, downstream_handler)
        filter_handler.parse(file_path)

I want to prevent it from shuffling attributes, i.e. preserve the original file's attribute order. Is there any ContentHandler.features* responsible for that?
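On Python 2 the SAX attributes travel in a plain dict, which has no defined order, so XMLGenerator writes them in whatever order the dict yields. On Python 3.7 and later dicts keep insertion order and expat reports attributes in document order, so a pass-through filter keeps the original order. A Python 3 sketch that relies on XMLFilterBase's default forwarding (it already passes every event on to its own content handler) instead of a separate downstream object; the "re-" prefix on id values is a hypothetical stand-in for the real modification, and refilter is a made-up helper name.

```python
import io
from xml import sax
from xml.sax import saxutils
from xml.sax.xmlreader import AttributesImpl


class ReIdFilter(saxutils.XMLFilterBase):
    """Rewrite id attributes; every other event is forwarded unchanged."""

    def startElement(self, name, attrs):
        # Rebuild the attributes in the order the parser reported them.
        new_attrs = {}
        for key, value in attrs.items():
            new_attrs[key] = 're-' + value if key == 'id' else value
        super().startElement(name, AttributesImpl(new_attrs))


def refilter(xml_bytes, out, encoding='utf-8'):
    reader = ReIdFilter(sax.make_parser())
    reader.setContentHandler(saxutils.XMLGenerator(out, encoding))
    reader.parse(io.BytesIO(xml_bytes))


out = io.StringIO()
refilter(b'<root z="1" id="x" a="2"><child b="3" id="y"/></root>', out)
print(out.getvalue())
```

Because XMLFilterBase forwards startDocument too, the generator also writes the XML declaration, which the hand-written filter above does not do.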
xml modifications
Hi there... I'm a beginner-level user and I've stumbled upon a problem a bit beyond my knowledge. I hope somebody will be able to help me with it. The problem is: I'm transforming an Access database to XML with some adjustments. Basically, I have one main table containing my main data and keys to other tables containing other relevant data. After exporting my main table to XML, some tags contain only keys to other tables, and now I want to change those keys into meaningful data. For instance, after exporting I have a line like this: 1, and I want to change it into just an opening tag for something else. The data that must be written in attr1 and attr2 depends on the number inside the tag (tagZ), in this case 1.
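A minimal ElementTree sketch (Python 3) of the key-to-data substitution: each tagZ element whose text is a known key is swapped for a new empty element carrying attr1 and attr2 from a lookup table. The replacement tag name something, the row/name layout and the contents of LOOKUP are hypothetical; in practice the table would be built from the other Access tables.

```python
import xml.etree.ElementTree as ET

# Hypothetical lookup built from the secondary tables: key -> attributes.
LOOKUP = {'1': {'attr1': 'red', 'attr2': 'large'},
          '2': {'attr1': 'blue', 'attr2': 'small'}}


def expand_keys(root, lookup, key_tag='tagZ', new_tag='something'):
    """Replace <tagZ>key</tagZ> with <something attr1=... attr2=.../>."""
    for parent in list(root.iter()):           # snapshot: we edit while looping
        for index, child in enumerate(list(parent)):
            key = (child.text or '').strip()
            if child.tag == key_tag and key in lookup:
                replacement = ET.Element(new_tag, lookup[key])
                replacement.tail = child.tail  # keep surrounding whitespace
                parent[index] = replacement


root = ET.fromstring('<rows><row><name>A</name><tagZ>1</tagZ></row>'
                     '<row><name>B</name><tagZ>2</tagZ></row></rows>')
expand_keys(root, LOOKUP)
print(ET.tostring(root, encoding='unicode'))
```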
Advice for editing xml file using ElementTree and wxPython
I'm a computational chemist who frequently dabbles in Python. A collaborator sent me a huge XML file that at one point was evidently modified by a now-defunct Java application. A sample of this file looks something like: Test File Name fileName Name of the input file water Number of Atoms natoms Number of atoms in the molecule 3

I've been playing around with parsing that file using the ElementTree functions, and writing little functions to walk the tree and print stuff out. I'd like to build a little wxPython program to modify the values graphically, maybe using something like a TreeCtrl widget. I'm pretty sure I can figure out how to get the data into the widget:

    - Struct
    - File Name: water
    - Number of Atoms: 3
    etc.

What's confusing me is what to do when I shut down the GUI and save the data back to a file. What I would like to do is update the values in the ElementTree itself and use the ElementTree's .write(file) function to write out the file, since that ends up printing something pretty much identical to the original XML file. If I want to do this, it seems like I need to keep a connection between each GUI element and the original value in the ElementTree, so I can update it. But I'm having a hard time visualizing exactly how this works. Can someone help me out here a bit?

If this is impossible, or too difficult, I can certainly figure out a way to dump the XML directly from the GUI itself, but I worry that I'll mangle the XML in the process, which ElementTree doesn't do (i.e. the null operation, parsing a file with ElementTree and writing it out again, doesn't change anything). This seems like something that's probably pretty common, modifying a data structure using a GUI, so I'm hoping that someone has thought about this and has some good advice about best practices here.
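One way to keep the GUI and the file in sync, sketched in Python 3 without any wxPython code: give each editable widget item a reference to the Element it came from (for instance through whatever per-item data slot the tree control offers), write edits straight into element.text, and call tree.write() when the program closes. Element objects are mutable and shared by reference, so no separate mapping back to the file is needed. The Struct/Field/Description/Value layout and the label attribute are a hypothetical reconstruction, since the real sample lost its tags; editable_items is a made-up helper.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical structure; the real file's tag names may differ.
SAMPLE = b"""<Struct name="Test">
  <Field label="File Name" name="fileName">
    <Description>Name of the input file</Description>
    <Value>water</Value>
  </Field>
  <Field label="Number of Atoms" name="natoms">
    <Description>Number of atoms in the molecule</Description>
    <Value>3</Value>
  </Field>
</Struct>"""


def editable_items(tree):
    """Yield (label, value_element) pairs to load into the GUI widget."""
    for field in tree.getroot().iter('Field'):
        yield field.get('label'), field.find('Value')


tree = ET.parse(io.BytesIO(SAMPLE))

# In the GUI, each tree item would keep its Element as item data.
items = dict(editable_items(tree))
for label, elem in items.items():
    print('%s: %s' % (label, elem.text))

# An edit in the GUI becomes a plain assignment on the Element...
items['Number of Atoms'].text = '4'

# ...and saving is just tree.write() on the same ElementTree.
out = io.BytesIO()
tree.write(out, encoding='utf-8', xml_declaration=True)
print(out.getvalue().decode('utf-8'))
```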
lxml + mod_python: cannot unmarshal code objects in restricted execution mode
Dmitri Fedoruk wrote:
> def extApplyXslt(xslt, data, logger):
>     try:
>         strXslt = urllib2.urlopen(xslt).read()
>         # i have to read the xslt url to the python string
>     except urllib2.HTTPError, e:
>         .......
>     except urllib2.URLError, e:
>         .............
>     try:
>         xslt_parser = etree.XMLParser()
>         xslt_parser.resolvers.add( PrefixResolver("XSLT") )
>
>         # and now I have to use the string; a more elegant solution,
>         # anyone?

Sure, lxml.etree can parse from file-like objects. Just hand in the result of urlopen().

Apart from that, I saw that you found your way to the lxml mailing list; I'll respond over there.

Stefan

On Sep 14, 3:04 am, Graham Dumpleton wrote:
> Try forcing mod_python to run your code in the first interpreter
> instance created by Python.
>
> PythonInterpreter main_interpreter

Thank you very much, that solved the problem! A more detailed discussion can also be found on the lxml-dev mailing list.

Dmitri
xml question
Just curious whether there are any Python XML parsing tools built into the Mac (OS X 10.4.10 Tiger). If so, could anyone share some simple code (or point me to a website) showing how to parse XML data from a file? For example, if I had a file that contained this: Black Light Blue Green I'd want to look in the 'colors' XML element, then look at each 'color' element inside it and pull any that have values (which would be Black, Light Blue and Green). I'd like to know how to pull 'attributes' too.
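A short sketch with ElementTree, which is in the standard library from Python 2.5 onward (older Pythons have xml.dom.minidom); the code itself is written for Python 3. Since the sample markup was lost, the colors/color layout and the name attribute are assumptions:

```python
import xml.etree.ElementTree as ET

# Hypothetical file contents; for a real file use ET.parse('colors.xml').getroot().
DATA = """<colors>
  <color name="dark">Black</color>
  <color name="sky">Light Blue</color>
  <color name="unused"/>
  <color name="leaf">Green</color>
</colors>"""

colors = ET.fromstring(DATA)

# Text of every <color> element that actually has a value.
values = [c.text.strip() for c in colors.findall('color')
          if c.text and c.text.strip()]
print(values)  # ['Black', 'Light Blue', 'Green']

# Attributes behave like a dict on each element; get() returns None if missing.
names = [c.get('name') for c in colors.findall('color')]
print(names)  # ['dark', 'sky', 'unused', 'leaf']
```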
replacing xml elements with other elements using lxml
Hi, I'm attempting to generate a random story using XML as the document and lxml as the parser. I want the document to be simplified before processing it further, and I am very close to accomplishing my goal. Below is what I have so far. Any ideas on how to move forward?

The goal: read and edit an XML file, replacing random elements with randomly picked content from within them.

Completed:
[x] read xml
[x] access first random tag
[x] pick random content within random item
[o] need to replace tag with picked contents

XML sample: Here is some content. Here is some random content. Here is some more random content. Here is some content.

Python code:

    from lxml import etree
    from StringIO import StringIO
    import random

    theXml = "Here is some content.Here is some random content.Here is some more random content.Here is some content."
    f = StringIO(theXml)
    tree = etree.parse(f)
    r = tree.xpath('//random')

    if len(r) > 0:
        randInt = random.randint(0, (len(r[0]) - 1))
        randContents = r[0][randInt][0]
        # replace parent random tag with picked content here

Now that I have the randomly chosen contents tag, how do I delete the parent tag and replace it so the result looks like this:

Final XML sample (goal): Here is some content. Here is some random content. Here is some content.

Any idea how to do this? So close!
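A stdlib sketch (Python 3) of the missing replacement step. The random/item/p nesting is an assumption based on the r[0][randInt][0] indexing, because the sample markup was lost. xml.etree.ElementTree has no getparent(), so a child-to-parent map is built first; with lxml, rand.getparent().replace(rand, picked) would do the same job. resolve_random is a hypothetical helper name.

```python
import random
import xml.etree.ElementTree as ET

# Hypothetical story markup.
STORY = ('<story><p>Here is some content.</p>'
         '<random>'
         '<item><p>Here is some random content.</p></item>'
         '<item><p>Here is some more random content.</p></item>'
         '</random>'
         '<p>Here is some content.</p></story>')


def resolve_random(root, rng=random):
    """Replace each <random> with the first child of one randomly chosen <item>."""
    parent_of = {child: parent for parent in root.iter() for child in parent}
    for rand in root.findall('.//random'):
        parent = parent_of[rand]
        picked = rng.choice(list(rand))[0]   # first child of the chosen item
        picked.tail = rand.tail              # keep text that followed <random>
        parent[list(parent).index(rand)] = picked


root = ET.fromstring(STORY)
resolve_random(root, random.Random(42))
print(ET.tostring(root, encoding='unicode'))
```

Passing a seeded random.Random makes a given story reproducible while testing; the default uses the module-level generator.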
Re: [lxml-dev] Python script to optimize XML text
If your XML is well-formed, an XSLT stylesheet is probably your best choice. I believe even the most trivial 'pass-through' example might produce the output you expect here.
--
Sidnei da Silva
Enfold Systems
Fax +1 832 201 8856
Office +1 713 942 2377 Ext 214
lxml 1.3.6 released
Hi all,

lxml 1.3.6 is up on PyPI. This is a bug fix release for the stable 1.3 series. It features two important fixes for crash bugs. Updating is recommended.

Install it with:

    $ easy_install lxml==1.3.6

What is lxml?
"""
In short: lxml is the most feature-rich and easy-to-use library for working with XML and HTML in the Python language. lxml is a Pythonic binding for the libxml2 and libxslt libraries. It is unique in that it combines the speed and feature completeness of these libraries with the simplicity of a native Python API.
"""

Have fun,
Stefan

1.3.6 (2007-10-29)
==================

Bugs fixed
----------

* Backported decref crash fix from 2.0
* Well-hidden free-while-in-use crash bug in ObjectPath

Other changes
-------------

* The test suites now run ``gc.collect()`` in the ``tearDown()`` methods. While this makes them take a lot longer to run, it also makes it easier to link a specific test to garbage collection problems that would otherwise appear in later tests.
toprettyxml messes up with whitespaces
Hi all,

I parse an XML file, replace a node with a new one (like updating a cache) and write it back. On every write, new spaces are added. For example:
First read-update-write cycle: My First App
Second cycle: My First App
Third cycle: My First App

And this goes on. The node in question is not touched in the XML; it is simply written back after reading. I have the same problem with empty space between the nodes; I managed to compensate for that by stripping the lines. I would like to use toprettyxml to keep the file user-editable and readable, but this is really weird. How can I circumvent this behaviour?

Regards,
- Jorgen
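toprettyxml() adds indentation and newlines around the text nodes already in the tree, including the whitespace-only nodes it wrote on the previous save, so the padding grows with every cycle. A Python 3 sketch of one common workaround: trim text nodes (and drop the ones that become empty) right after parsing, so each save starts from a clean tree. This assumes leading and trailing whitespace inside text content is not meaningful in this file; strip_text_nodes and save_cycle are hypothetical helper names.

```python
from xml.dom.minidom import parseString


def strip_text_nodes(node):
    """Trim text nodes in place and remove the ones that end up empty."""
    for child in list(node.childNodes):
        if child.nodeType == child.TEXT_NODE:
            child.data = child.data.strip()
            if not child.data:
                node.removeChild(child)
        elif child.hasChildNodes():
            strip_text_nodes(child)


def save_cycle(xml_text):
    """One read-update-write cycle, with the update step left out."""
    doc = parseString(xml_text)
    strip_text_nodes(doc.documentElement)
    return doc.toprettyxml(indent='  ')


original = '<apps><app><name>My First App</name></app></apps>'
first = save_cycle(original)
second = save_cycle(first)
print(first == second)  # True: no extra whitespace on the second save
print(first)
```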
Error when trying to write unicode xml to zipfile
I get the error below when trying to write unicode XML to a zipfile:

    zip.writestr('content.xml', content.toxml())
      File "/usr/lib/python2.4/", line 460, in writestr
        zinfo.CRC = binascii.crc32(bytes)  # CRC-32 checksum
    UnicodeEncodeError: 'ascii' codec can't encode character u'\u25cf' in position 2848: ordinal not in range(128)

Any ideas?
Martin
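In Python 2, writestr() computes the CRC over a byte string, so handing it the unicode object returned by toxml() triggers an implicit ASCII encode, which fails on u'\u25cf'. Asking minidom for encoded output avoids that: toxml(encoding) returns bytes with a matching XML declaration. A Python 3 sketch (where writestr() would encode a str as UTF-8 anyway, but explicit bytes keep the encoding obvious); the document content is made up:

```python
import io
import zipfile
from xml.dom.minidom import parseString

content = parseString('<doc><p>\u25cf bullet</p></doc>')

buf = io.BytesIO()
with zipfile.ZipFile(buf, 'w', zipfile.ZIP_DEFLATED) as zf:
    # toxml('utf-8') returns UTF-8 bytes with a matching XML declaration.
    zf.writestr('content.xml', content.toxml('utf-8'))

with zipfile.ZipFile(buf) as zf:
    data = zf.read('content.xml')
print(data.decode('utf-8'))
```

On the Python 2.4 setup from the post, the equivalent call would be zip.writestr('content.xml', content.toxml('utf-8')).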
xml yml and dependency hell
YAML, with its indentation-based structure, is quite Pythonic. In comparison, XML is cumbersome and laborious. Strangely, Ruby supports YAML out of the box, but Python requires a third-party package, PyYAML. This may not seem like a big deal for us (installing PyYAML takes all of one minute), but it may not be so for others, as I recently learned. I conducted a Python training for a corporate client in which I showed, among other things, that YAML is much neater than XML. They agreed that it was neat but were reluctant to consider it because it adds dependency headaches later on with their customers. So is it likely that YAML will make it into the standard Python library at some point?