Brain Flush

The Force Unleashed: XML+XPath on Android using dom4j and Jaxen

Posted in Linux & Open Source, Mobile Devices, Software Development & Programming by Matthias Käppler on May 19, 2009

*UPDATE* The source code is now on GitHub. Feel free to fork ‘n fix.

*UPDATE* I have put up the fixed dom4j JAR for download. I didn’t touch the Jaxen JAR in that archive; it’s the same one you would download from their website.

I have been very disappointed with Android’s XML parsing support from day one: it’s simply too low-level, inconvenient to use, and lacking important features. I was especially disappointed by the decision to exclude JAXP XPath support from Android, since it has become an integral part of Java SE.

This is not only about cosmetics. Parsing XML documents of even medium complexity turned out to be error-prone and very tedious on Android (whitespace normalization problems, broken Unicode entity reference expansion, etc.), and we would have had to rewrite things that existing Java XML libraries already do gracefully and reliably.

Since I have always been a big fan of dom4j, I fixed an issue in the latest source tree that prevented dom4j’s QName caching from working with Android’s Java implementation (or more precisely, with Apache Harmony’s SAX implementation; the Android Java implementation is based on a pre-release version of Apache Harmony).

I haven’t committed that change back to dom4j yet, because development seems to have stalled on that project, but if anyone is interested, I can host the source code and a working JAR somewhere (please drop a short line in the comments section, otherwise I won’t bother sharing it).

dom4j also works very well in conjunction with Jaxen (a free XPath implementation)!

Some example code to whet your appetite:

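import java.util.List;
import org.dom4j.Document;
import org.dom4j.Element;
import org.dom4j.io.SAXReader;

SAXReader reader = new SAXReader(); // dom4j SAXReader
Document document = reader.read(xmlInputStream); // dom4j Document; read() throws DocumentException

// select all link nodes with href "http://example.com"
List<Element> linkNodes = document.selectNodes("//link[@href='http://example.com']");

// read an attribute value from the first match
String val = linkNodes.get(0).attributeValue("href");

// read the trimmed text of a child element (elementTextTrim is defined on Element, not Document)
String value = document.getRootElement().elementTextTrim("childNode");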

And so on.

Simple, powerful, straightforward, and performance is also decent (it’s pretty slow in debug mode, but reasonably fast otherwise).


21 Responses


  1. Alistair said, on May 19, 2009 at 1:54 pm

    I would be interested in having this available. XMLPullParser is a pain in the arse to use.

    Al.

    • Matthias Käppler said, on May 19, 2009 at 1:58 pm

      I wholeheartedly agree :-) Okay then, I’ll upload the modified sources to Google Code later (don’t have time right now), so others can work on fixes as well, in case they find that more things are broken. Meanwhile, I’ll send you the JAR via email so you can get crackin on your stuff.

      Cheers

  2. Sagar said, on May 20, 2009 at 5:24 am

    Hi… I would love to have it. Parsing XML on Android is very tedious. I need to parse big XML files of approx. 512 KB. I have developed my own parser, but I want to try dom4j. Will it perform well? Please mail me the JAR file. Thanks a lot.

    • Matthias Käppler said, on May 20, 2009 at 7:58 am

      512 KB is rather big already, so I’m not sure about the performance here (dom4j will load the complete document into memory), but you can set up a test app in no time and see for yourself.

      Please, no more questions about performance, guys… it’s simply impossible for me to tell whether it’ll be fast enough for the apps you’re developing. We use it to parse small XML documents coming from a web service. For this scenario, performance is fine.

      You may also see a slight slowdown when accessing a document using XPath for the first time, because this will trigger the classloader to load the Jaxen classes.
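As a rough illustration of the memory point in this reply: a tree-based reader such as dom4j’s SAXReader builds the whole document in memory, whereas an event-based SAX handler only sees one element at a time. The sketch below uses the standard `javax.xml.parsers` SAX API rather than dom4j. The class name `StreamingCountSketch` and the method `countElements` are made up for this example, and it says nothing about how fast either approach is on a particular device.

```java
import java.io.IOException;
import java.io.InputStream;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: counts elements with a given name without building a tree.
final class StreamingCountSketch {

    static int countElements(InputStream in, final String elementName) {
        final int[] count = {0};

        // The handler is called once per start tag; nothing is retained between calls.
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (qName.equals(elementName)) {
                    count[0]++;
                }
            }
        };

        try {
            SAXParserFactory.newInstance().newSAXParser().parse(in, handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse XML stream", e);
        }
        return count[0];
    }
}
```

Whether the extra bookkeeping of an event-based parser is worth it depends on the document size and on how much of the document the app actually needs; for small web service responses the tree-based approach described in the post is likely the more convenient one.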

  3. Matthias Käppler said, on May 20, 2009 at 8:41 am

    Actually I just realized that I cannot put the sources on Google Code, because dom4j is not released under a common open source license. Any other suggestions?

    • Martin Österlund said, on September 22, 2009 at 3:53 pm

      dom4j is released under a BSD license, one of the most common open source licenses…

      • Matthias Käppler said, on September 22, 2009 at 5:51 pm

        You’re right, it’s indeed the BSD license. What got me turned off was this:

        3. The name “DOM4J” must not be used to endorse or promote
        products derived from this Software without prior written
        permission of MetaStuff, Ltd. For written permission,
        please contact dom4j-info@metastuff.com.

        4. Products derived from this Software may not be called “DOM4J”
        nor may “DOM4J” appear in their names without prior written
        permission of MetaStuff, Ltd. DOM4J is a registered
        trademark of MetaStuff, Ltd.

        That doesn’t sound very liberal after all, since I couldn’t give a derived project a name like dom4j-android. Any name not containing dom4j would completely obscure my intention: fix dom4j to run on Harmony.

  4. Ilya Volodarsky said, on July 31, 2009 at 7:00 am

    Hi! I would love to have your updated dom4j JAR with the Android QName fix. I spent about 2 hours implementing my XML generator/parser code and then 3 more futzing around trying to find the bug. I thought it was handling whitespace wrong (and I couldn’t figure out why it worked in a Java app but not on Android), and then I found your post. Would really appreciate it!

    Thanks,

    Ilya

  5. Jim said, on August 25, 2009 at 2:56 pm

    Hi!

    I’m trying to import the dom4j .jar file into my Android project, but it doesn’t seem to find the classes. E.g. I can’t import org.dom4j.Document.

    I imported the .jar file in my /src folder in Eclipse. When I open the white dom4j folders it only shows a .html file.

    Do you know what I’m doing wrong?

    • Matthias Käppler said, on August 25, 2009 at 3:07 pm

      The src/ folder’s purpose is to hold source code, not libraries. You may want to create a lib/ folder instead and drop the JAR in there. You then add the library as you would do with any other Java based project by adding it to the build path via the project properties. That worked fine for me.

      Cheers
      Matthias

      • Jim said, on August 31, 2009 at 4:00 pm

        Yes, thanks my friend I got it working.

        I’m new to Java, so I didn’t know how to include JAR files in my application at all. But now I’ve got it working :)

  6. JG said, on August 27, 2009 at 12:59 pm

    Comparing Sun’s XPath usage with the dom4j/Jaxen approach, it seems that I cannot create a dom4j/Jaxen XPath object without an expression parameter, while in Sun’s approach I can. This seems rather strange, as I might want to use that XPath object with different expressions, and that doesn’t seem to be possible in dom4j/Jaxen. E.g., in Sun’s evaluate method the first parameter is the expression, while in dom4j/Jaxen there is no expression parameter. Is this correct?

  7. JG said, on August 27, 2009 at 3:10 pm

    I’m not sure if I follow. I was trying to use the dom4j/Jaxen evaluate method the way Sun’s method is used: Object res = evaluate(String expression, Object item, QName returnType).
    In dom4j, I’m creating a new XPath object with the expression as a parameter, then calling evaluate(item) and ignoring the returnType. It might just be the return type setting that I’m missing.
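For reference, the JAXP call JG describes looks roughly like the sketch below: one `XPath` object, with the expression and the return type passed on every `evaluate` call. It uses only `javax.xml.xpath` from Java SE (the API the post says is missing on Android), not dom4j or Jaxen. The wrapper class `JaxpXPathSketch` and its methods are invented for illustration. On the dom4j side, the post’s own example also passes the expression per call, via `document.selectNodes(expression)`.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical wrapper: one parsed document, one XPath object, many expressions.
final class JaxpXPathSketch {
    private final Document doc;
    private final XPath xpath = XPathFactory.newInstance().newXPath();

    JaxpXPathSketch(String xml) {
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    // Evaluates the expression as a node set and returns how many nodes matched.
    int count(String expression) {
        try {
            NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("bad XPath: " + expression, e);
        }
    }

    // Evaluates the expression as a string; the return type is chosen per call.
    String text(String expression) {
        try {
            return (String) xpath.evaluate(expression, doc, XPathConstants.STRING);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("bad XPath: " + expression, e);
        }
    }
}
```

A call such as `new JaxpXPathSketch(xml).count("//link[@href='http://example.com']")` reuses the same `XPath` object for every expression; note that JAXP does not promise that `XPath` objects are thread-safe, so this sketch assumes single-threaded use.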

  8. Joel Rives said, on September 17, 2009 at 7:43 pm

    Thanks for making the updated jar file available. I just ran into this problem as I switched from struggling to get JDom working on Android to Dom4J. I agree, the problem seems to be with the Harmony SAX parser. It is also causing problems for JDom. Only, with JDom it is more fatal. Anyway, thanks again.

  9. Eric Mill said, on September 29, 2009 at 3:36 pm

    Thanks for posting this. As you can tell by the continued comments, a good, simple XML parser in Android is still definitely a community need.

    • Matthias Käppler said, on September 29, 2009 at 4:10 pm

      Hey Eric,

      I also have a small XML parser abstraction for Android in the pipe (I plan to release this as part of my DroidFu utility library for Android) which is purely based on the XML Pull Parser implementation shipped with Android. Nothing special really, but it takes care of wrapping boilerplate code (such as skipping whitespace between nodes and normalizing whitespace in text nodes) and works with minimal configuration effort, so you can focus on the actual task.

      It uses reflection so that it can be used like this:


      class MyModelParser extends XmlModelParser {

          // currentItem is a MyModel

          public void onFooTag(String content, Map attributes, String parentNode) {
              this.currentItem.setFoo(new Foo());
          }

          public void onFooText(String text) {
              this.currentItem.getFoo().setValue(text);
          }
          ...
      }

      MyModelParser parser = new MyModelParser();
      List results = parser.parse(someInputStream);

      When instantiated, the parser takes note of which callback methods are defined and invokes them automatically when a matching tag or text node is spotted in the stream.

      It’s a simple thing, but it works well enough for small documents. I still have to test whether or not it’s affected by the Unicode expansion problem.
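The callback discovery described in this comment could be sketched along the following lines. The real parser is built on Android’s XML Pull Parser; this sketch swaps in the standard SAX API so that it runs on plain Java, and it simplifies the callback signatures to `onXxxTag(Map<String, String>)` and `onXxxText(String)`. All class names here are hypothetical; this is not the DroidFu code.

```java
import java.io.StringReader;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical base class: finds on<Name>Tag / on<Name>Text methods via reflection.
abstract class CallbackParserSketch extends DefaultHandler {
    private final Map<String, Method> callbacks = new HashMap<>();
    private final StringBuilder text = new StringBuilder();

    protected CallbackParserSketch() {
        // Take note of the single-argument callbacks the subclass defines.
        for (Method m : getClass().getMethods()) {
            String name = m.getName();
            boolean isCallback = name.startsWith("on")
                    && (name.endsWith("Tag") || name.endsWith("Text"));
            if (isCallback && m.getParameterCount() == 1) {
                m.setAccessible(true);
                callbacks.put(name, m);
            }
        }
    }

    void parse(String xml) {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), this);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        text.setLength(0); // drop whitespace that preceded this tag
        Map<String, String> attributes = new HashMap<>();
        for (int i = 0; i < atts.getLength(); i++) {
            attributes.put(atts.getQName(i), atts.getValue(i));
        }
        invoke("on" + capitalize(qName) + "Tag", attributes);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // Normalize whitespace in the collected text before handing it over.
        String normalized = text.toString().trim().replaceAll("\\s+", " ");
        text.setLength(0);
        if (!normalized.isEmpty()) {
            invoke("on" + capitalize(qName) + "Text", normalized);
        }
    }

    private void invoke(String methodName, Object argument) {
        Method m = callbacks.get(methodName);
        if (m == null) {
            return; // no callback defined for this tag
        }
        try {
            m.invoke(this, argument);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("callback failed: " + methodName, e);
        }
    }

    private static String capitalize(String s) {
        return Character.toUpperCase(s.charAt(0)) + s.substring(1);
    }
}

// Example subclass: only <foo> elements trigger callbacks.
class FooCallbacks extends CallbackParserSketch {
    final List<String> events = new ArrayList<>();

    public void onFooTag(Map<String, String> attributes) {
        events.add("tag:" + attributes.get("id"));
    }

    public void onFooText(String text) {
        events.add("text:" + text);
    }
}
```

The sketch only reports text for leaf elements and ignores namespaces, which keeps it short; a production version would need to decide how to handle mixed content and tag names that are not valid Java identifiers.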

  10. Jan Berkel said, on October 19, 2009 at 11:57 pm

    hi matthias,

    thanks for your work on this, i’ve been trying to get xml working on android for quite a while now. how about svn importing dom4j into github and then have a separate branch containing your harmony fixes? i’d be happy to set it up if you send me the diff.

  11. Jan Berkel said, on October 20, 2009 at 2:01 pm

    just found another option: there’s a sax driver for android’s pull parser, which you can use like so:


    XmlPullParser parser = org.xmlpull.v1.XmlPullParserFactory.newInstance().newPullParser();
    Driver driver = new org.xmlpull.v1.sax2.Driver(parser);
    driver.parse(new InputSource(...));

