Brain Flush

May 19, 2009

The Force Unleashed: XML+XPath on Android using dom4j and Jaxen

Filed under: Linux & Open Source, Mobile Devices, Software Development & Programming | Matthias @ 1:15 pm

*UPDATE* This post has become obsolete. Google has bundled the Java XPath APIs with the release of FroYo (Android 2.2 / level 8).

*UPDATE* The source code is now on GitHub. Feel free to fork ‘n fix. Here’s the JAR: http://github.com/kaeppler/dom4j-1.6.1-harmony/downloads

I have been very disappointed with Android’s XML parsing support from day one. It is simply too low level, inconvenient to use, and lacks important features (I was especially disappointed with the decision to exclude JAXP XPath support from Android, since it has become an integral part of Java SE).

This is not only about cosmetics. Parsing XML documents of only medium complexity already turned out to be error prone and very tedious on Android (white space normalization problems, broken Unicode entity ref expansion, etc.) and we would’ve had to rewrite stuff which existing Java XML libraries already do in a graceful and stable manner.

Since I have always been a big fan of dom4j, I fixed an issue in the latest source tree that prevented dom4j’s QName caching from working with Android’s Java implementation (or more precisely, with Apache Harmony’s SAX implementation; the Android Java implementation is based on a pre-release version of Apache Harmony).

I haven’t committed that change back to dom4j yet, because development seems to have stalled on that project, but if anyone is interested, I can host the source code and a working JAR somewhere (please drop a short line in the comments section, otherwise I won’t bother sharing it).

dom4j also works very well in conjunction with Jaxen (a free XPath implementation)!

Some example code to whet your appetite:

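import java.util.List;
import org.dom4j.Document;
import org.dom4j.Element;
import org.dom4j.io.SAXReader;

SAXReader reader = new SAXReader(); // dom4j SAXReader
Document document = reader.read(xmlInputStream); // dom4j Document

// select all link nodes with href "http://example.com"
// (dom4j 1.6.1 returns a raw List, so each entry must be cast)
List linkNodes = document.selectNodes("//link[@href='http://example.com']");

// read an attribute value from the first match
String val = ((Element) linkNodes.get(0)).attributeValue("href");

// read the trimmed text of a child element of the root element
String value = document.getRootElement().elementTextTrim("childNode");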

And so on.
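For comparison, here is roughly what the same kind of query looks like with the standard javax.xml.xpath API that, per the update at the top, ships with Android from FroYo on. This is only a minimal sketch under stated assumptions: the class name XPathSketch, the helper methods, and the sample feed document are all made up for illustration and are not part of dom4j, Jaxen, or the original project.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, for illustration only.
final class XPathSketch {

    // Hypothetical sample document; not taken from any real feed.
    static final String SAMPLE_XML =
            "<feed>"
            + "<title>  Brain Flush  </title>"
            + "<link href='http://example.com'/>"
            + "<link href='http://example.org'/>"
            + "</feed>";

    // Returns the href attribute of every link element whose href equals the given value.
    // The value is spliced into the expression, so it must not contain a single quote.
    static List<String> matchingHrefs(String xml, String href) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        String expression = "//link[@href='" + href + "']";
        try {
            NodeList nodes = (NodeList) xpath.evaluate(
                    expression, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            List<String> result = new ArrayList<String>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(((Element) nodes.item(i)).getAttribute("href"));
            }
            return result;
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("XPath evaluation failed: " + expression, e);
        }
    }

    // Returns the text of the first node matched by the path, with leading and
    // trailing whitespace removed and inner whitespace runs collapsed
    // (XPath normalize-space).
    static String trimmedText(String xml, String path) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        String expression = "normalize-space(" + path + ")";
        try {
            return xpath.evaluate(expression, new InputSource(new StringReader(xml)));
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("XPath evaluation failed: " + expression, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(matchingHrefs(SAMPLE_XML, "http://example.com"));
        System.out.println(trimmedText(SAMPLE_XML, "/feed/title"));
    }
}
```

Note that this version re-parses the XML on every call to keep the sketch short; real code would parse once into a DOM document and evaluate several expressions against it.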

Simple, powerful, straightforward, and performance is also decent (it’s pretty slow in debug mode, but reasonably fast otherwise).
