root/trunk/RBFoundation/RBFoundation/XMLObjectify.py

Revision 434, 22.0 kB (checked in by sholloway, 6 years ago)

Changed namespace_map in XMLBuilder to use XMLNamespaceMap class. Should provide better functionality and more uniform behavior.

1 #!/usr/bin/env python
2 ##~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
3 ##~ License
4 ##~
5 ##- The RuneBlade Foundation library is intended to ease some
6 ##- aspects of writing intricate Jabber, XML, and User Interface (wxPython, etc.)
7 ##- applications, while providing the flexibility to modularly change the
8 ##- architecture. Enjoy.
9 ##~
10 ##~ Copyright (C) 2002  TechGame Networks, LLC.
11 ##~
12 ##~ This library is free software; you can redistribute it and/or
13 ##~ modify it under the terms of the BSD style License as found in the
14 ##~ LICENSE file included with this distribution.
15 ##~
16 ##~ TechGame Networks, LLC can be reached at:
17 ##~ 3578 E. Hartsel Drive #211
18 ##~ Colorado Springs, Colorado, USA, 80920
19 ##~
20 ##~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
21
22 """Builds a Python object tree from an XML stream.
23
24 Functions:
25
26     Objectify
27     ObjectifyFile
28
29 Classes:
30
31     ObjectifiedXML
32     Objectifier
33
34 Example:
35
36     from RBFoundation.XMLObjectify import Objectify
37    
38     obj = Objectify('''<stream:stream xmlns:stream="Test Stream Namespace">
39             <comment>Some generic xml</comment>
40             <comment author="Shane Holloway">Fun with xml!</comment>
41         </stream:stream>''')
42
43     # iteration
44     for each in obj.comment:
45         print 'Author: %-20s' % getattr(each, 'author', '?unknown?'),
46         print each()
47
48     # direct access
49     print obj.comment[1].author
50
51 History:
52
53     XMLObjectify found its conceptual roots in a similar tool created by David Mertz <mertz@gnosis.cx>
54     named objectify.py, which is still extremely useful and can still be found at http://gnosis.cx .
55     Further history of his module can be found there.  I say "conceptual roots" because the
56     present state of XMLObjectify represents a second rewrite of the code, maintaining only the
57     founding ideas, in order to serve new requirements.
58
59     One new requirement is the ability to "build" a python object tree by instantiating python classes
60     from modules found on disk, and subsequently raising an exception if a suitable module is not found.
61     (Raising that exception is one of the essential reasons for the 2nd rewrite, as I did not factor it
62     in the 1st.)  This requirement provides the foundation of the Skinning framework found in the
63     RuneBlade distribution, as well as the XMLClassBuilder module.
64
65     A second requirement was "the need for speed".  In both Mertz's objectify code and my first rewrite,
66     a python object tree can take quite a while to build (2.2 seconds for a 31k skin file).  This is
67     due in part to the DOM model provided with python, and in part to creating new python classes
68     on the fly.  This second rewrite takes note of that, and uses a prebuilt class (ObjectifiedXML) with
69     different data elements to avoid having to create new classes at run time.  This requirement is the
70     cornerstone of the RuneBlade Jabber package's XML socket stream.
71
72     However, note that the first and second requirements are pretty much in direct opposition, as
73     dynamic imports will always be slower than using a predefined class that is already in scope. 
74     A conflict such as this points us in the direction of refactoring the functionality, allowing
75     for either speedy building of python object trees, or the great flexibility of dynamically building
76     that object tree from modules on disk.  Hence, a new (pseudo) hierarchy is formed for this:
77        
78         + XMLBuilder (Abstract)
79         |
80         +-- XMLObjectify
81         |
82         +-+ XMLClassBuilder
83         | |
84         | +-- Skinning.XMLSkinner
85         |
86         +-- Jabber.Base
87 """
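The History notes contrast building a fresh class for every element with reusing one prebuilt class whose instances differ only in their data. A rough standalone sketch of the two patterns, using hypothetical names (`PrebuiltNode`, `build_dynamic`) that do not exist in RBFoundation and written for current Python rather than the Python 2 of this file:

```python
class PrebuiltNode(object):
    """One shared class; the node name is just instance data."""

    def __init__(self, node, attributes):
        self.node = node
        self.attributes = dict(attributes)


def build_dynamic(node, attributes):
    # Creates a brand-new class per element, the pattern the notes
    # above describe as the slower one.
    klass = type(str(node), (object,), {})
    obj = klass()
    obj.attributes = dict(attributes)
    return obj


fast = PrebuiltNode('comment', {'author': 'Shane Holloway'})
slow = build_dynamic('comment', {'author': 'Shane Holloway'})
```

Both objects carry the same data; the difference is only whether a class object is created per element.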
88
89 #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
90 #~ Imports                                           
91 #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
92
93 from __future__ import generators
94 import XMLBuilder
95 from xml.sax.saxutils import escape, quoteattr
96
97 #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
98 #~ Classes                                           
99 #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
100
101 class BaseObjectifiedXML(XMLBuilder.XMLBuilderObjectBase):
102     """Represents an objectified XML node and its attributes with links to subnodes and contained PCData.
103
104     Example:
105    
106         from RBFoundation.XMLObjectify import Objectify
107        
108         obj = Objectify('''<stream:stream xmlns:stream="Test Stream Namespace">
109                 <comment>Some generic xml</comment>
110                 <comment author="Shane Holloway">
111                     Fun with xml!
112                     Lots of fun!
113                 </comment>
114             </stream:stream>''')
115
116         assert isinstance(obj, ObjectifiedXML)
117
118         # Element Access (note: there are two comments in the example XML)
119         assert isinstance(obj.comment, list)
120         assert isinstance(obj.comment[1], ObjectifiedXML)
121
122         # PCData Access
123         assert isinstance(obj.comment[0](''), str)
124         assert isinstance(obj.comment[0](None), list)
125
126         # Attribute Access
127         assert isinstance(obj.comment[1].author, str)
128        
129         # Create new element
130         obj._addNewElement(None, 'comment', author='William')
131         assert obj.comment[2].author == 'William'
132
133         # Create new data
134         obj.comment[2]._addData("Some new data")
135
136         # Modify / create new attribute
137         obj.fun = "with XML!"
138
139         # Remove an attribute
140         del obj.fun
141
142         # Remove PCData
143         obj.comment[1]._removeElement(obj.comment[1](None)[-1])
144
145         # Remove All PCData
146         obj.comment[0]._clearData()
147
148         # Remove an element
149         obj._removeElement(obj.comment[0])
150
151         # Remove multiple elements
152         del obj.comment
153     """
154
155     #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
156     #~ Constants / Variables / Etc.                     
157     #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
158
159     _default_attributes = {}
160     _attributes_casts = {}
161  
162     #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
163     #~ Special                                           
164     #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
165
166     def __init__(self, owner, parent, node, attributes, namespacemap):
167         self.__namespace__, self.__node__ =  node
168         self.__namespace_map__ = namespacemap
169         self._attributes = self._default_attributes.copy()
170         self._attributes.update(attributes)
171
172         for key, cast in self._attributes_casts.iteritems():
173             try: self._attributes[key] = cast(self._attributes[key])
174             except KeyError: pass
175             except ValueError: pass
176  
177         self._elements = []
178
179     def __call__(self, joinstr=''):
180         """One way to get the PC data out of the node.  See __str__, __int__, __float__, and __complex__.
181         The joinstr argument adjusts how the different lines of PCData are joined, and passing None will
182         signal that the list should not be joined"""
183         return self._getData(joinstr)
184    
185     def __str__(self):
186         """Returns the PCData of the node in str format"""
187         return self._getData('')
188    
189     def __int__(self):
190         """Returns the PCData of the node in int format.  Be prepared to catch exceptions in the face of incorrect data.  (Or incorrect assumptions ;) )"""
191         return int(self._getData(''))
192
193     def __long__(self):
194         """Returns the PCData of the node in long format.  Be prepared to catch exceptions in the face of incorrect data.  (Or incorrect assumptions ;) )
195         Note: the separate long type was removed in Python 3, where int is unbounded."""
196         return long(self._getData(''))
197
198     def __float__(self):
199         """Returns the PCData of the node in float format.  Be prepared to catch exceptions in the face of incorrect data.  (Or incorrect assumptions ;) )"""
200         return float(self._getData(''))
201
202     def __complex__(self):
203         """Returns the PCData of the node in complex format.  Be prepared to catch exceptions in the face of incorrect data.  (Or incorrect assumptions ;) )"""
204         return complex(self._getData(''))
205
206     def __repr__(self):
207         """A slightly different repr"""
208         result = XMLBuilder.XMLBuilderObjectBase.__repr__(self)
209         return '<%s %r %r>' % (result[1:-1], self.__namespace__, self.__node__)
210        
211     def __cmp__(self, other):
212         """Compare two objects.  If other object is objectified, compare simplest elements first"""
213         if isinstance(other, ObjectifiedXML):
214             result = cmp(self.__namespace__, other.__namespace__)
215             if result: return result
216             result = cmp(self.__node__, other.__node__)
217             if result: return result
218             result = cmp(self._attributes, other._attributes)
219             if result: return result
220             result = cmp(self._elements, other._elements)
221             return result
222         else:
223             return cmp(other.__class__(self._getData('')), other)
224
225     #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
226     #~ Protected Methods
227     #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
228
229     def _iterAllElements(self, andData=0):
230         """Yields all elements, including PCData when andData is true."""
231         for x in self._elements:
232             if andData or x[0][-1]:
233                 yield x[-1]
234
235     def _iterElements(self, namespace=None, node=None, andData=0):
236         """Yields elements, optionally filtered by namespace and node."""
237         for x in self._elements:
238             if not andData and not x[0][-1]: continue
239             if namespace and namespace != x[0][0]: continue
240             if node and node != x[0][-1]: continue
241             yield x[-1]
242
243     def _getAllElements(self, andData=0):
244         """Returns a list of all elements, including PCData when andData is true."""
245         if andData: return [x[-1] for x in self._elements]
246         else: return [x[-1] for x in self._elements if x[0][-1]]
247
248     def _getElements(self, namespace=None, node=None, andData=0):
249         """Returns all elements matching node."""
250         lst = self._elements
251         if not andData:
252             lst = [x for x in lst if x[0][-1]]
253         if node is not None:
254             lst = [x for x in lst if x[0][-1] == node]
255         if namespace is not None:
256             lst = [x for x in lst if x[0][0] == namespace]
257         return [x[-1] for x in lst]
258    
259     def _delAllElements(self, andData=0):
260         """Removes all elements, and if andData, all PCData as well."""
261         if andData: self._elements = []
262         else: self._elements = [x for x in self._elements if x[0][-1] == '']
263
264     def _delElements(self, namespace=None, node=None):
265         """Removes all elements matching node."""
266         lst = self._elements
267         if namespace is None:
268             if node is not None:
269                 lst = [x for x in lst if x[0][-1] != node]
270         elif node is None:
271             if namespace is not None:
272                 lst = [x for x in lst if x[0][0] != namespace]
273         else:
274             lst = [x for x in lst if x[0] != (namespace, node)]
275
276         if len(lst) != len(self._elements):
277             self._elements = lst
278             return 1
279         return 0
280
281     def _addElement(self, node, obj):
282         """Adds a subnode obj under the key node, a (namespace, name) tuple.  Obj is not necessarily an ObjectifiedXML class, but is required to implement _toXML."""
283         self._elements.append((node, obj))
284         return self._elements[-1]
285
286     def _addObjectifiedElement(self, obj):
287         """Same as _addElement, but assumes object has attributes __namespace__ and __node__.  A little more convenient for adding subnodes in code."""
288         return self._addElement((obj.__namespace__, obj.__node__), obj)
289
290     def _addNewElement(self, namespace, node, klass=None, **attributes):
291         """Creates and adds a new element in namespace, with name node, having attributes as given.  Uses klass for creating the element instance, or self.__class__ if klass is not given."""
292         namespace = namespace or self.__namespace__
293         klass = klass or self.__class__
294         namespacemap = self.__namespace_map__.newchain()
295         return self._addElement((namespace, node), klass(self, self, (namespace, node), attributes, namespacemap))
296        
297     def _removeElement(self, element):
298         """Removes every subnode that compares equal to element.  (Note: Can be a PCData element)  Returns the number removed."""
299         elements = [x for x in self._elements if x[-1] != element]
300         delta = len(self._elements) - len(elements)
301         if delta: self._elements = elements
302         return delta
303
304     def _getElementIndex(self, element):
305         idx = 0
306         for each in  self._elements:
307             if each[-1] is element:
308                 return idx
309             idx += 1
310
311     def _clearData(self):
312         """Removes all PCData from the element node."""
313         return self._delElements(node='')
314
315     __addDataHadNewLine = 1
316     def _addData(self, data):
317         """Adds PCData to the element node."""
318         HadNewline, self.__addDataHadNewLine = self.__addDataHadNewLine, data and data[-1] in '\n\r'
319         if not HadNewline and self._elements and self._elements[-1][0][-1] == '':
320             data = self._elements.pop()[1] + data
321         self._elements.append((('', ''), data))
322
323     def _setData(self, data):
324         """Clears PCData, then appends new PCData to the element node."""
325         self._clearData()
326         self._addData(data)
327
328     def _getData(self, joinstr=''):
329         result = [x[-1] for x in self._elements if not x[0][-1]]
330         if joinstr is not None:
331             return joinstr.join(result)
332         return result
333
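The protected methods above all operate on one flat list of `((namespace, node), obj)` pairs, with PCData stored under the empty key `('', '')`. A minimal standalone sketch of that storage model, assuming nothing from RBFoundation (the `TinyNode` class and its method names are hypothetical, and it is written to run on current Python):

```python
class TinyNode(object):
    """Hypothetical stand-in that mimics only the _elements bookkeeping."""

    def __init__(self):
        # Each entry is ((namespace, node), obj); PCData uses ('', '').
        self._elements = []

    def add_element(self, namespace, node, obj):
        self._elements.append(((namespace, node), obj))

    def add_data(self, data):
        self._elements.append((('', ''), data))

    def get_elements(self, node=None):
        # Skip PCData entries (empty node name), then filter by name.
        found = [x for x in self._elements if x[0][-1]]
        if node is not None:
            found = [x for x in found if x[0][-1] == node]
        return [x[-1] for x in found]

    def get_data(self, joinstr=''):
        # Collect PCData only; joinstr=None returns the raw list.
        result = [x[-1] for x in self._elements if not x[0][-1]]
        if joinstr is not None:
            return joinstr.join(result)
        return result

    def remove_element(self, element):
        # Keep whole (key, obj) pairs so later lookups still work.
        kept = [x for x in self._elements if x[-1] != element]
        delta = len(self._elements) - len(kept)
        self._elements = kept
        return delta


node = TinyNode()
node.add_data('Fun with ')
node.add_element('ns', 'b', 'bold-child')
node.add_data('xml!')
```

Because elements and text share one list, document order is preserved, and telling them apart is only a test on the node name.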
334     #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
335
336     def _toXML(self, strSplit='', nsOuter={}, bHeaderOnly=0):
337         """Converts the python object back into XML. 
338        
339             - If strSplit is None, the result is a nested list structure; otherwise, strSplit is used to join those lists into a string. 
340                 (default is '')
341            
342             - If self.__namespace__ is a key of nsOuter, the namespace will not be included in the XML again.  nsOuter may be a dict mapping namespaces to prefixes, a str naming the outer default namespace, or None for self.__namespace__.
343                 (default is {})
344
345             - If bHeaderOnly is true, then only node, namespace, and attributes are included in the XML.  If bHeaderOnly is greater than 1,
346                 the header tag will also be closed.  If bHeaderOnly is false, then the child elements are included.
347                 (default is 0)
348         """
349         if isinstance(nsOuter, str): nsOuter = {nsOuter: ''}
350         elif nsOuter is None: nsOuter = {self.__namespace__: ''}
351         nsOuter = nsOuter.copy()  # never mutate the caller's dict or the shared default
352         if self.__namespace_map__:
353             for key, value in self.__namespace_map__.iterxmlns():
354                 nsOuter[key] = value
355
356         if self.__namespace__ in nsOuter:
357             nodePrefix = nsOuter[self.__namespace__]
358         else:
359             # Remove old default namespace
360             nodePrefix = ''
361             for key, value in nsOuter.items():
362                 if value == nodePrefix: del nsOuter[key]
363             if self.__namespace__:
364                 # Add current namespace as "default"
365                 nsOuter[self.__namespace__] = nodePrefix
366                 self.__namespace_map__.setxmlns('', self.__namespace__)
367             else:
368                 # Remove the silly default namespace
369                 try: del self.__namespace_map__['']
370                 except KeyError: pass
371
372         # Node start
373         if nodePrefix:
374             nodename = '%s:%s' % (nodePrefix, self.__node__)
375         else: nodename = self.__node__
376         result = '<%s' % nodename
377
378         # Namespaces
379         result += self.__namespace_map__.xmlstr()
380
381         # Attributes
382         for attrname, attrvalue in self._attributes.iteritems():
383             if isinstance(attrname, tuple):
384                 if attrname[1] not in self._attributes:
385                     prefix = nsOuter[attrname[0]]
386                     if prefix: result += ' %s:%s=%s' % (prefix, attrname[1], quoteattr(attrvalue))
387                     else: result += ' %s=%s' % (attrname[1], quoteattr(attrvalue))
388                 #else: will be handled in not extended name form
389             elif nodePrefix: result += ' %s:%s=%s' % (nodePrefix, attrname, quoteattr(attrvalue))
390             else: result += ' %s=%s' % (attrname, quoteattr(attrvalue))
391
392         # Result construction
393         if bHeaderOnly:
394             if bHeaderOnly > 1:
395                 result += '/>'
396             else: result += '>'
397             result = [result]
398         elif self._elements:
399             result += '>'
400             result = [result]
401             result.append(self._childrenToXML(strSplit, nsOuter))
402             result.append('</%s>' % nodename)
403         else:
404             result += '/>'
405             result = [result]
406         if strSplit is not None:
407             return strSplit.join(result)
408         else: return result
409     _toPrettyXML = _toXML
410
411     def _childrenToXML(self, strSplit='', nsOuter={}):
412         """Converts child python objects back into XML.
413             - If strSplit is None, the result is a nested list structure; otherwise, strSplit is used to join those lists into a string. 
414                 (default is '')
415            
416             - If nsOuter matches subnode.__namespace__, the namespace will not be included in the XML again. 
417                 (default is {})
418         """
419         if isinstance(nsOuter, str): nsOuter = {nsOuter:''}
420         elif nsOuter is None: nsOuter = {self.__namespace__: ''}
421         result = []
422         for tupleNSNode, each in self._elements:
423             if not tupleNSNode[-1]:
424                 result.append(escape(each))
425             else:
426                 result.append(each._toXML(strSplit, nsOuter))
427
428         result = filter(None, result)
429         if strSplit is not None:
430             return strSplit.join(result)
431         else: return result
432     _childrenToPrettyXML = _childrenToXML
433    
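Serialization above relies on two helpers from `xml.sax.saxutils`: `escape` for PCData and `quoteattr` for attribute values, which also supplies the surrounding quotes. A minimal sketch of how a tag and its text could be assembled with them, assuming a hypothetical `render` helper that ignores namespaces entirely and is not part of this module:

```python
from xml.sax.saxutils import escape, quoteattr


def render(node, attributes, text=''):
    """Hypothetical helper: build one element, self-closing when empty."""
    result = '<%s' % node
    for name, value in sorted(attributes.items()):
        # quoteattr escapes the value and wraps it in quotes.
        result += ' %s=%s' % (name, quoteattr(value))
    if text:
        return '%s>%s</%s>' % (result, escape(text), node)
    return result + '/>'


xml = render('comment', {'author': 'Shane Holloway'}, 'Fun & games')
```

As in `_toXML`, an element with no children is emitted in the self-closing form.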
434 #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
435    
436 class AttributedObjectifiedXML(object):
437     def __getattribute__(self, name):
438         """Allows for the node.attribute or node.subnode semantics"""
439         if '_' != name[0]:
440             _attributes = self._attributes
441             if name in _attributes: return _attributes[name]
442             xmlName = name.replace('_', '-')
443             if '-' == xmlName[-1]: xmlName = xmlName[:-1]
444             if xmlName in _attributes:
445                 return _attributes[xmlName]
446             result = self._getElements(node=xmlName)
447             if result:
448                 return result
449         return XMLBuilder.XMLBuilderObjectBase.__getattribute__(self, name)
450
451     def __setattr__(self, name, value):
452         """Allows user to create new attributes, or change values using node.attribute semantics"""
453         if '_' != name[0:1]:
454             if name in self._attributes:
455                 self._attributes[name] = value
456             else:
457                 xmlName = name.replace('_', '-')
458                 if '-' == xmlName[-1]: xmlName = xmlName[:-1]
459                 self._attributes[xmlName] = value
460         else:
461             return BaseObjectifiedXML.__setattr__(self, name, value)
462        
463     def __delattr__(self, name):
464         """Allows for deletion of attributes or subnodes through node.attribute or node.subnode semantics"""
465         if '_' != name[0:1]:
466             if name in self._attributes:
467                 del self._attributes[name]
468                 return
469
470             xmlName = name.replace('_', '-')
471             if '-' == xmlName[-1]: xmlName = xmlName[:-1]
472             if xmlName in self._attributes:
473                 del self._attributes[xmlName]
474                 return
475             else:
476                 if self._delElements(node=xmlName):
477                     return
478
479         BaseObjectifiedXML.__delattr__(self, name)
480         return
481
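The attribute hooks above translate a Python identifier into an XML name by swapping underscores for hyphens and dropping one trailing hyphen, so `http_equiv` reaches `http-equiv` and `class_` reaches `class`. A small sketch of just that translation, where the helper name `xml_name` is hypothetical and not part of this module:

```python
def xml_name(name):
    """Map a Python attribute name to the XML name the lookup would try."""
    xml = name.replace('_', '-')
    # A trailing underscore (e.g. class_) becomes a trailing hyphen; drop it.
    if xml.endswith('-'):
        xml = xml[:-1]
    return xml


attributes = {'http-equiv': 'refresh', 'class': 'note'}
```

The trailing-underscore rule gives a way to reach XML names that collide with Python keywords.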
482 #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
483
484 class ObjectifiedXML(BaseObjectifiedXML, AttributedObjectifiedXML):
485     pass
486
487 #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
488
489 class Objectifier(XMLBuilder.XMLBuilder):
490     """An implementation of XMLBuilder that creates ObjectifiedXML nodes from an XML stream.
491     See Objectify and ObjectifyFile for usage."""
492     objectified_class = ObjectifiedXML
493     Objectify = XMLBuilder.XMLBuilder.Parse
494     ObjectifyFile = XMLBuilder.XMLBuilder.ParseFile
495
496     def _GetElementFactory(self, owner, parent, node, attributes, namespacemap):
497         """Signals that we always want to create self.objectified_class, which defaults to the ObjectifiedXML class."""
498         return self.objectified_class
499
500 #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
501 #~ Functional Definitions                           
502 #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
503
504 def Objectify(*args, **kw):
505     """Simple access to Objectifier.Objectify (XMLBuilder.Parse).
506     Note: Creates a new Objectifier instance on every call."""
507     return Objectifier().Objectify(*args, **kw)
508
509 def ObjectifyFile(*args, **kw):
510     """Simple access to Objectifier.ObjectifyFile (XMLBuilder.ParseFile).
511     Note: Creates a new Objectifier instance on every call."""
512     return Objectifier().ObjectifyFile(*args, **kw)
513
514 #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
515 #~ Testing                                           
516 #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
517
518 def _Test_XMLObjectify():
519     from pprint import pprint
520     xmlfile = open('test_objectify.xml', 'r')
521     obj = ObjectifyFile(xmlfile)
522     print repr(obj)
523     print ' ~ ' * 20
524     print (obj._toXML())
525     print ' ~ ' * 20
526     print (obj.message[0]._toXML())
527
528     print
529     print ' ~ ' * 20
530     print
531
532     obj = Objectify('''<test xmlns='OuterNamespace'><inner data='1'>content is fun!</inner></test>''')
533     print obj.inner[0]._toXML(nsOuter=None)
534
535     print
536     print ' ~ ' * 20
537     print
538
539     #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
540
541     builder = Objectifier()
542     builder.objectified_class = BaseObjectifiedXML
543
544     xmlfile = open('test_objectify.xml', 'r')
545     obj = builder.ObjectifyFile(xmlfile)
546
547     print repr(obj)
548     print ' ~ ' * 20
549     print (obj._toXML())
550     print ' ~ ' * 20
551
552     from Aspects import Aspect
553     class MyAspect(Aspect, AttributedObjectifiedXML): pass
554     MyAspect.InsertAspect(obj)
555     print (obj.message[0]._toXML())
556
557     print
558     print ' ~ ' * 20
559     print
560
561     obj = builder.Objectify('''<test xmlns='OuterNamespace'><inner data='1'>content is fun!</inner></test>''')
562     MyAspect.InsertAspect(obj)
563     print obj.inner[0]._toXML(nsOuter=None)
564
565 #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
566
567 if __name__=='__main__':
568     print "Testing..."
569     _Test_XMLObjectify()
570     print "Test complete."
571
572