IDEAS.txt

Things to try out when life permits
===================================

* zlib-based parsing/serialising of compressed in-memory data

  * requires a libxml2 I/O OutputBuffer with appropriate I/O functions
    that call into the zlib compression routines
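Until such an I/O layer exists, the same effect can be approximated at the Python level. The following is only a sketch under assumptions: it uses the standard library's ``xml.etree.ElementTree`` rather than the proposed libxml2 output buffer, and the function names ``parse_compressed`` and ``serialise_compressed`` are made up for illustration. It decompresses the data chunk by chunk and feeds each chunk to a feed parser, so the uncompressed document never has to exist as one string.

```python
import zlib
import xml.etree.ElementTree as ET


def serialise_compressed(root, level=6):
    """Serialise an element tree and compress the resulting bytes with zlib."""
    return zlib.compress(ET.tostring(root), level)


def parse_compressed(data, chunk_size=4096):
    """Parse zlib-compressed XML bytes, decompressing one chunk at a time."""
    decompressor = zlib.decompressobj()
    parser = ET.XMLParser()
    for start in range(0, len(data), chunk_size):
        chunk = decompressor.decompress(data[start:start + chunk_size])
        if chunk:
            parser.feed(chunk)
    # Hand over whatever the decompressor still holds back.
    tail = decompressor.flush()
    if tail:
        parser.feed(tail)
    return parser.close()
```

A real implementation inside libxml2's I/O layer would avoid the intermediate Python byte strings altogether; this version only shows the data flow.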

* lzma-based parsing/serialising of compressed in-memory data

  * requires a libxml2 I/O OutputBuffer with appropriate I/O functions
    that call into the lzma compression routines

  * advantage over zlib: probably better compression, although LZMA is
    usually slower to compress

  * maybe embed the lzma C sources in the distribution
    http://www.7-zip.org/sdk.html
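Whether lzma actually pays off depends on the documents involved, so it is worth measuring first. A rough sketch, assuming Python's standard ``lzma`` and ``zlib`` modules and the standard library's ElementTree as a stand-in for the real tree (``compressed_sizes`` and ``lzma_roundtrip`` are hypothetical helper names):

```python
import lzma
import zlib
import xml.etree.ElementTree as ET


def compressed_sizes(root):
    """Return the serialised size of a tree, raw and under both compressors."""
    raw = ET.tostring(root)
    return {
        "raw": len(raw),
        "zlib": len(zlib.compress(raw)),
        "lzma": len(lzma.compress(raw)),
    }


def lzma_roundtrip(root):
    """Serialise, compress, decompress and re-parse a tree."""
    blob = lzma.compress(ET.tostring(root))
    return ET.fromstring(lzma.decompress(blob))
```

For small documents the container overhead of lzma can outweigh its better compression ratio, so the numbers should be taken from representative data.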

* generating XML using the ``with`` statement

  http://comments.gmane.org/gmane.comp.python.general/579950?set_lines=100000
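One possible shape for such an API, as a minimal sketch: nested ``with`` blocks mirror the nesting of the elements. The ``TreeWriter`` class and its ``element()`` method are hypothetical names invented for this example, and the sketch builds on the standard library's ElementTree instead of the real tree implementation.

```python
import xml.etree.ElementTree as ET
from contextlib import contextmanager


class TreeWriter:
    """Hypothetical helper: each ``with`` block opens one child element."""

    def __init__(self, root_tag, **attrib):
        self.root = ET.Element(root_tag, attrib)
        self._stack = [self.root]

    @contextmanager
    def element(self, tag, text=None, **attrib):
        # New elements are appended to whichever element is currently open.
        elem = ET.SubElement(self._stack[-1], tag, attrib)
        elem.text = text
        self._stack.append(elem)
        try:
            yield elem
        finally:
            self._stack.pop()


w = TreeWriter("html")
with w.element("body"):
    with w.element("p", "hello", id="greeting"):
        pass
    with w.element("p", "world"):
        pass
```

Popping in a ``finally`` clause keeps the open-element stack consistent even if the body of a ``with`` block raises.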

* parse-time validation against a user-provided DTD

  * parse-time validation currently only works for XML Schema

* integrate the RelaxNG compact syntax (.rnc, as opposed to the XML
  syntax, .rng)

  * currently not supported by libxml2 (patch exists)

* support subclassing XSLTAccessControl to provide custom per-URL
  access check methods

  * maybe custom resolvers are enough, or can be combined with this?

* reimplement iterparse() using the libxml2 xmlReader API

  * Advantage: the implementation can be made safer than the current
    SAX implementation, as the parser would not interact with the
    Python-level tree.

  * Disadvantage: the tree has to be built manually. In the current
    SAX-based implementation, libxml2 does it for us.
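To give an idea of what "built manually" means, here is a small sketch that pulls events from a reader and assembles the tree itself. It is an illustration under assumptions only: the standard library's ``xml.dom.pulldom`` stands in for the libxml2 xmlReader API, ElementTree stands in for the real tree, and ``build_tree`` is a hypothetical name. Comments, processing instructions and namespace handling are ignored.

```python
import xml.etree.ElementTree as ET
from xml.dom import pulldom


def build_tree(xml_text):
    """Build an ElementTree by hand from pull-parser events."""
    root = None
    stack = []  # chain of currently open elements
    for event, node in pulldom.parseString(xml_text):
        if event == pulldom.START_ELEMENT:
            attrib = dict(node.attributes.items())
            if stack:
                elem = ET.SubElement(stack[-1], node.tagName, attrib)
            else:
                elem = root = ET.Element(node.tagName, attrib)
            stack.append(elem)
        elif event == pulldom.END_ELEMENT:
            stack.pop()
        elif event == pulldom.CHARACTERS and stack:
            parent = stack[-1]
            if len(parent):
                # Text after a child element becomes that child's tail.
                last = parent[-1]
                last.tail = (last.tail or "") + node.data
            else:
                parent.text = (parent.text or "") + node.data
    return root
```

The bookkeeping for text versus tail is exactly the kind of work that libxml2 currently does on our behalf in the SAX-based implementation.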

* rewrite iterparse() to accept a parser as an argument instead of
  being one

  * disadvantage: iterparse() can't deal with all parser options

* provide an HTMLParser wrapper that handles broken encodings in broken
  HTML better, e.g. using BeautifulSoup's "unicode dammit" analyser
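The basic idea behind such a wrapper can be sketched as a decoding step that runs before the parser sees the data. The version below is a deliberately naive assumption, not BeautifulSoup's actual algorithm: it tries a fixed list of candidate encodings in order and falls back to Latin-1, which can decode any byte sequence. The name ``decode_with_fallback`` and the candidate list are made up for this example.

```python
def decode_with_fallback(data, candidates=("utf-8", "cp1252")):
    """Return (text, encoding) for the first candidate that decodes cleanly."""
    for encoding in candidates:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    # Latin-1 maps every byte to a code point, so this never fails.
    return data.decode("latin-1"), "latin-1"
```

The resulting text could then be handed to the HTML parser as a Unicode string, side-stepping a wrong or missing encoding declaration in the document.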

* expose namespace prefixes through the QName class