Brain Flush

May 19, 2009

The Force Unleashed: XML+XPath on Android using dom4j and Jaxen

Filed under: Linux & Open Source, Mobile Devices, Software Development & Programming | Matthias @ 1:15 pm

*UPDATE* This post has become obsolete. Google has bundled the Java XPath APIs (javax.xml.xpath) with the release of FroYo (Android 2.2, API level 8).

*UPDATE* The source code is now on GitHub. Feel free to fork ‘n fix. Here’s the JAR: http://github.com/kaeppler/dom4j-1.6.1-harmony/downloads

I have been very disappointed with Android’s XML parsing support from day one. It’s simply too low-level, inconvenient to use, and lacking important features (I was especially disappointed by the decision to exclude JAXP’s XPath support from Android, even though it has become an integral part of Java SE).

This is not only about cosmetics. Parsing XML documents of even moderate complexity turned out to be error-prone and very tedious on Android (whitespace normalization problems, broken Unicode entity reference expansion, etc.), and we would’ve had to rewrite things that existing Java XML libraries already handle gracefully and reliably.

Since I have always been a big fan of dom4j, I fixed an issue in the latest source tree that prevented dom4j’s QName caching from working with Android’s Java implementation (or, more precisely, with Apache Harmony’s SAX implementation; Android’s Java implementation is based on a pre-release version of Apache Harmony).

I haven’t committed that change back to dom4j yet, because development seems to have stalled on that project, but if anyone is interested, I can host the source code and a working JAR somewhere (please drop a short line in the comments section, otherwise I won’t bother sharing it).

dom4j also works very well in conjunction with Jaxen (a free XPath implementation)!

Some example code to whet your appetite:

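import java.util.List;
import org.dom4j.Document;
import org.dom4j.Element;
import org.dom4j.io.SAXReader;

SAXReader reader = new SAXReader(); // dom4j SAXReader
Document document = reader.read(xmlInputStream); // dom4j Document; read() throws DocumentException

// select all link nodes with href "http://example.com"
List linkNodes = document.selectNodes("//link[@href='http://example.com']");

// select an attribute value (selectNodes returns a raw List of Nodes, so cast to Element)
Element firstLink = (Element) linkNodes.get(0);
String val = firstLink.attributeValue("href");

// select the trimmed text of a child element of the root element
String value = document.getRootElement().elementTextTrim("childNode");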

And so on.

Simple, powerful, straightforward, and the performance is decent, too (it’s pretty slow in debug mode, but reasonably fast otherwise).
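Since FroYo (Android 2.2, API level 8) the standard javax.xml.xpath API is bundled, so the same three queries can also be written without dom4j or Jaxen. The following is a minimal plain-JDK sketch, not code from the original post, under a few assumptions: the input is parsed into a W3C DOM with DocumentBuilderFactory, the sample XML is made up, and the XPath function normalize-space() stands in for elementTextTrim (it trims and also collapses inner whitespace runs to single spaces).

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class DomXPathQueries {
    // XPath objects are not thread-safe; sharing one is fine for this single-threaded sketch
    static final XPath XPATH = XPathFactory.newInstance().newXPath();

    // Builds an in-memory W3C DOM (like dom4j, the whole document is loaded)
    static Document parse(InputStream xmlInputStream) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(xmlInputStream);
    }

    // select all link nodes with href "http://example.com"
    static NodeList exampleLinks(Document document) throws Exception {
        return (NodeList) XPATH.evaluate(
                "//link[@href='http://example.com']", document, XPathConstants.NODESET);
    }

    // text of <childNode> under the root element, trimmed, inner whitespace collapsed
    static String childNodeText(Document document) throws Exception {
        return XPATH.evaluate("normalize-space(/*/childNode)", document);
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sample document standing in for the real input stream
        String xml = "<root><link href='http://example.com'/>"
                + "<link href='http://other.example'/>"
                + "<childNode>  some text  </childNode></root>";
        Document document = parse(new ByteArrayInputStream(xml.getBytes("UTF-8")));

        NodeList linkNodes = exampleLinks(document);
        // read an attribute value from the first match
        String val = ((Element) linkNodes.item(0)).getAttribute("href");

        System.out.println(linkNodes.getLength());    // 1
        System.out.println(val);                      // http://example.com
        System.out.println(childNodeText(document));  // some text
    }
}
```

The API shape differs from dom4j (results come back as a NodeList and the return type is chosen per call), but the XPath expressions themselves carry over unchanged.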


42 Comments

  1. I would be interested in having this available. XmlPullParser is a pain in the arse to use.

    Al.

    Comment by Alistair — May 19, 2009 @ 1:54 pm

    • I wholeheartedly agree :-) Okay then, I’ll upload the modified sources to Google Code later (don’t have time right now), so others can work on fixes as well, in case they find that more things are broken. Meanwhile, I’ll send you the JAR via email so you can get crackin on your stuff.

      Cheers

      Comment by Matthias Käppler — May 19, 2009 @ 1:58 pm

  2. Hi… I would love to have it. It is very tedious parsing XML in Android. I need to parse big XML files of approx. 512 KB. I have developed my own parser, but I want to try dom4j. Will its performance be good? Please mail me the JAR file. Thanks a lot.

    Comment by Sagar — May 20, 2009 @ 5:24 am

    • 512 KB is rather big already, so I’m not sure about the performance here (dom4j loads the complete document into memory), but you can set up a test app in no time and see for yourself.

      Please no more questions about performance guys… it’s simply impossible for me to tell whether it’ll be fast enough for the apps you’re developing. We use it to parse small xml documents coming from a web service. For this scenario, performance is fine.

      You may also see a slight slowdown when accessing a doc using xpath for the first time, because this will trigger the classloader to load the jaxen stuff.

      Comment by Matthias Käppler — May 20, 2009 @ 7:58 am
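If holding a whole ~512 KB document in memory is the concern, one option (not something the post itself proposes) is plain SAX via javax.xml.parsers, which ships with both Android and the JDK: the handler sees elements as they stream past and buffers only what it keeps. In this minimal sketch the element name `title`, the class names, and the sample feed are assumptions for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class StreamingTitles {
    // Collects the text of every <title> element. Only the title currently being read is
    // buffered; the rest of the document is never held in memory as a tree.
    static class TitleHandler extends DefaultHandler {
        final List<String> titles = new ArrayList<String>();
        private StringBuilder current; // non-null only while inside <title>

        // localName is empty unless the parser is namespace-aware, so fall back to qName
        private static String name(String localName, String qName) {
            return (localName != null && localName.length() > 0) ? localName : qName;
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attrs) {
            if ("title".equals(name(localName, qName))) current = new StringBuilder();
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // SAX may deliver one text node in several chunks, so accumulate them
            if (current != null) current.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if ("title".equals(name(localName, qName)) && current != null) {
                titles.add(current.toString().trim());
                current = null;
            }
        }
    }

    static List<String> readTitles(InputStream in) throws Exception {
        TitleHandler handler = new TitleHandler();
        SAXParserFactory.newInstance().newSAXParser().parse(in, handler);
        return handler.titles;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<feed><item><title> First </title></item>"
                + "<item><title>Second</title></item></feed>";
        System.out.println(readTitles(new ByteArrayInputStream(xml.getBytes("UTF-8"))));
        // prints [First, Second]
    }
}
```

The trade-off is the usual one: there is no random access or XPath, so anything beyond simple extraction means tracking more state in the handler.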

  3. Actually I just realized that I cannot put the sources on Google Code, because dom4j is not released under a common open source license. Any other suggestions?

    Comment by Matthias Käppler — May 20, 2009 @ 8:41 am

    • dom4j is released under a BSD license, one of the most common open source licenses…

      Comment by Martin Österlund — September 22, 2009 @ 3:53 pm

      • You’re right, it’s indeed the BSD license. What got me turned off was this:

        3. The name “DOM4J” must not be used to endorse or promote
        products derived from this Software without prior written
        permission of MetaStuff, Ltd. For written permission,
        please contact dom4j-info@metastuff.com.

        4. Products derived from this Software may not be called “DOM4J”
        nor may “DOM4J” appear in their names without prior written
        permission of MetaStuff, Ltd. DOM4J is a registered
        trademark of MetaStuff, Ltd.

        That doesn’t sound very liberal after all, since I couldn’t give a derived project a name like dom4j-android. Any name not containing dom4j would completely obscure my intention: fix dom4j to run on Harmony.

        Comment by Matthias Käppler — September 22, 2009 @ 5:51 pm

  4. Hi! I would love to have your updated dom4j JAR with the Android QName fix. I spent about 2 hours implementing my XML generator/parser code and then 3 more futzing around trying to find the bug. I thought it was handling whitespace wrong (and I couldn’t figure out why it worked in a Java app and not in Android), and then I found your post. Would really appreciate it!

    Thanks,

    Ilya

    Comment by Ilya Volodarsky — July 31, 2009 @ 7:00 am

  5. Hi!

    I’m trying to import the dom4j .jar file in my Android project, but it doesn’t seem to find the classes. E.g. I can’t import org.dom4j.Document.

    I imported the .jar file in my /src folder in Eclipse. When I open the white dom4j folders it only shows a .html file.

    Do you know what I’m doing wrong?

    Comment by Jim — August 25, 2009 @ 2:56 pm

    • The src/ folder’s purpose is to hold source code, not libraries. You may want to create a lib/ folder instead and drop the JAR in there. You then add the library as you would do with any other Java based project by adding it to the build path via the project properties. That worked fine for me.

      Cheers
      Matthias

      Comment by Matthias Käppler — August 25, 2009 @ 3:07 pm

      • Yes, thanks my friend I got it working.

        I’m new to Java, so I didn’t know how to include .JAR files in my application at all. But now I got it working :)

        Comment by Jim — August 31, 2009 @ 4:00 pm

  6. Comparing Sun’s XPath usage with the dom4j/Jaxen approach, it seems that I cannot create a dom4j/Jaxen XPath object without an expression parameter, while in Sun’s approach I can. This seems rather strange, as I might want to use that XPath object with different expressions, and that doesn’t seem to be possible in dom4j/Jaxen. E.g., in Sun’s evaluate method the first parameter is the expression, while in dom4j/Jaxen there is no expression parameter. Is this correct?

    Comment by JG — August 27, 2009 @ 12:59 pm

  7. I’m not sure I follow. I was trying to use the dom4j/Jaxen evaluate method the way Sun’s method is used: Object res = evaluate(String expression, Object item, QName returnType).
    In dom4j, I’m creating a new XPath object with the expression as a parameter and then calling evaluate(item), ignoring the returnType. It might just be the return type setting that I’m missing.

    Comment by JG — August 27, 2009 @ 3:10 pm
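For comparison, here is a sketch of the Sun-style (javax.xml.xpath) usage JG describes: one XPath object, not tied to any expression, evaluating different expressions with the result type chosen per call via XPathConstants, plus XPath.compile for an expression that is reused many times. The order document and class name are made up for illustration. On the dom4j side, Node.selectNodes(String) and Node.valueOf(String) likewise take the expression per call, without creating an XPath object first.

```java
import java.io.StringReader;
import java.util.Arrays;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class OneXPathManyExpressions {
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Same XPath object, different expressions; the QName argument picks the result type
    static String summarize(XPath xpath, Document doc) throws Exception {
        NodeList items = (NodeList) xpath.evaluate("/order/item", doc, XPathConstants.NODESET);
        Double total = (Double) xpath.evaluate("sum(/order/item/@price)", doc, XPathConstants.NUMBER);
        String first = (String) xpath.evaluate("/order/item[1]", doc, XPathConstants.STRING);
        return items.getLength() + " items, total " + total + ", first " + first;
    }

    // An expression used many times can be compiled once and evaluated against each node
    static double[] prices(XPath xpath, Document doc) throws Exception {
        NodeList items = (NodeList) xpath.evaluate("/order/item", doc, XPathConstants.NODESET);
        XPathExpression priceOf = xpath.compile("number(@price)");
        double[] result = new double[items.getLength()];
        for (int i = 0; i < result.length; i++) {
            result[i] = (Double) priceOf.evaluate(items.item(i), XPathConstants.NUMBER);
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<order><item price='2.5'>pen</item><item price='4'>pad</item></order>");
        XPath xpath = XPathFactory.newInstance().newXPath(); // not bound to any expression
        System.out.println(summarize(xpath, doc));                // 2 items, total 6.5, first pen
        System.out.println(Arrays.toString(prices(xpath, doc)));  // [2.5, 4.0]
    }
}
```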

  8. Thanks for making the updated jar file available. I just ran into this problem as I switched from struggling to get JDom working on Android to Dom4J. I agree, the problem seems to be with the Harmony SAX parser. It is also causing problems for JDom. Only, with JDom it is more fatal. Anyway, thanks again.

    Comment by Joel Rives — September 17, 2009 @ 7:43 pm

  9. Thanks for posting this. As you can tell by the continued comments, a good, simple XML parser in Android is still definitely a community need.

    Comment by Eric Mill — September 29, 2009 @ 3:36 pm

    • Hey Eric,

      I also have a small XML parser abstraction for Android in the pipeline (I plan to release it as part of my DroidFu utility library for Android), which is purely based on the XML Pull Parser implementation shipped with Android. Nothing special really, but it takes care of wrapping boilerplate code (such as skipping whitespace between nodes and normalizing whitespace in text nodes) and works with minimal configuration effort, so you can focus on the actual task.

      It uses reflection to allow being used like this:


      class MyModelParser extends XmlModelParser {

          // currentItem is a MyModel

          public void onFooTag(String content, Map attributes, String parentNode) {
              this.currentItem.setFoo(new Foo());
          }

          public void onFooText(String text) {
              this.currentItem.getFoo().setValue(text);
          }

          // ...
      }

      MyModelParser parser = new MyModelParser();
      List results = parser.parse(someInputStream);

      When instantiated, the parser will take note of which callback methods are defined and invoke them automatically when a matching tag or text node is spotted in the stream.

      It’s a simple thing, but it works well enough for small documents. I still have to test whether or not it’s affected by the Unicode expansion problem.

      Comment by Matthias Käppler — September 29, 2009 @ 4:10 pm
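XmlModelParser itself was built on Android’s XmlPullParser and its code isn’t shown here, so the following is only a hypothetical sketch of the reflection-dispatch idea, swapped onto JDK SAX (javax.xml.parsers) so it runs anywhere. The class name CallbackSaxParser and the simplified callback signatures (onXxxTag(Map) and onXxxText(String), rather than the three-argument onFooTag in the comment) are assumptions. Whitespace between nodes is skipped and text is normalized, as the comment describes.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.lang.reflect.Method;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical base class: subclasses declare public onXxxTag(Map) / onXxxText(String)
// methods, where Xxx is the element name with its first letter upper-cased.
public abstract class CallbackSaxParser extends DefaultHandler {
    private final Map<String, Method> callbacks = new HashMap<String, Method>();
    private final StringBuilder text = new StringBuilder();

    protected CallbackSaxParser() {
        // Take note of the callbacks the subclass defines, once, at construction time
        for (Method m : getClass().getMethods()) {
            if (m.getName().startsWith("on") && m.getParameterTypes().length == 1) {
                m.setAccessible(true); // subclasses may be anonymous or package-private
                callbacks.put(m.getName(), m);
            }
        }
    }

    public void parse(InputStream in) throws Exception {
        SAXParserFactory.newInstance().newSAXParser().parse(in, this);
    }

    private static String capitalize(String name) {
        return Character.toUpperCase(name.charAt(0)) + name.substring(1);
    }

    private void dispatch(String methodName, Object arg) throws SAXException {
        Method m = callbacks.get(methodName);
        if (m == null) return; // no callback defined for this element: skip it
        try {
            m.invoke(this, arg);
        } catch (Exception e) {
            throw new SAXException(e);
        }
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs)
            throws SAXException {
        text.setLength(0); // drop whitespace (or mixed content) seen before this tag
        Map<String, String> attributes = new HashMap<String, String>();
        for (int i = 0; i < attrs.getLength(); i++) {
            attributes.put(attrs.getQName(i), attrs.getValue(i));
        }
        dispatch("on" + capitalize(qName) + "Tag", attributes);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length); // SAX may split one text node into several chunks
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        // Normalize whitespace: trim, then collapse inner runs to a single space
        String normalized = text.toString().trim().replaceAll("\\s+", " ");
        text.setLength(0);
        if (normalized.length() > 0) dispatch("on" + capitalize(qName) + "Text", normalized);
    }

    public static void main(String[] args) throws Exception {
        final StringBuilder out = new StringBuilder();
        CallbackSaxParser parser = new CallbackSaxParser() {
            public void onItemTag(Map<String, String> attributes) {
                out.append("item ").append(attributes.get("id")).append("; ");
            }
            public void onTitleText(String title) {
                out.append("title ").append(title);
            }
        };
        parser.parse(new ByteArrayInputStream(
                "<feed><item id='1'><title>  Hello \n world </title></item></feed>".getBytes("UTF-8")));
        System.out.println(out); // item 1; title Hello world
    }
}
```

Reflection runs once per parser instance to build the lookup map. After that, each element costs a HashMap lookup, plus a Method.invoke when a callback exists.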

  10. hi matthias,

    thanks for your work on this, i’ve been trying to get xml working on android for quite a while now. how about svn importing dom4j into github and then have a separate branch containing your harmony fixes? i’d be happy to set it up if you send me the diff.

    Comment by Jan Berkel — October 19, 2009 @ 11:57 pm

  11. Just found another option: there’s a SAX driver for Android’s pull parser, which you can use like so:


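    org.xmlpull.v1.XmlPullParser parser = org.xmlpull.v1.XmlPullParserFactory.newInstance().newPullParser();
    org.xmlpull.v1.sax2.Driver driver = new org.xmlpull.v1.sax2.Driver(parser); // a SAX2 XMLReader on top of the pull parser
    driver.setContentHandler(handler); // handler: your own org.xml.sax.ContentHandler
    driver.parse(new org.xml.sax.InputSource(...));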

    Comment by Jan Berkel — October 20, 2009 @ 2:01 pm

  12. vtd-xml has the industry leading parsing and xpath performance

    vtd-xml

    Comment by Anonymous — November 25, 2009 @ 9:36 am

    • That’s a rather interesting concept. Do you have benchmarks for mobile devices, in particular, Android? Also, I was really missing a comparison to a pure SAX parser. What are the dependencies of the Java implementation?

      Comment by Matthias Käppler — November 25, 2009 @ 9:49 am

      • The benchmark is applicable to any device and processor type. A comparison with SAX is somewhat difficult because SAX is very low-level and doesn’t support random access, yet VTD typically outperforms SAX by a factor of 2, and still supports XPath and random access while keeping its signature ease of use…

        Comment by anoymous — December 1, 2009 @ 12:07 am

  13. When I build DOM4J.jar and include it in my Android project, I receive a lot of ‘Ignoring InnerClasses attribute for an anonymous inner class that doesn’t come with an associated EnclosingMethod attribute. (This class was probably produced by a broken compiler.)’ warnings.
    Do you have an idea?

    Comment by Philippe — January 19, 2010 @ 5:20 pm

    • That’s nothing to worry about. This happens for JARs that have been built with older Java compilers. If this annoys you, you could try rebuilding the JAR using the build file that comes with dom4j, but I think it requires Maven 1.

      Comment by Matthias Käppler — January 19, 2010 @ 5:35 pm

  14. Hi,

    I would really like to know what I am doing wrong… I built an Android application in NetBeans and want to use the XPath library of dom4j. Does anyone have an idea what’s wrong with the lines below?

    // ...
    URL url = new URL(Url);
    SAXReader reader = new SAXReader(); // Here debugging stops/goes to Valhalla without an exception
    Document xmlDoc = reader.read(url);
    // ...

    Thanks in advance.
    Sebastian.

    Comment by Sebastian — February 13, 2010 @ 9:17 pm

  15. Hi Matthias Käppler, could you please send me a copy of this: http://infodump.de/dom4j+jaxen-android.zip? I couldn’t open the link :(

    Comment by Andy — June 27, 2010 @ 4:29 pm

  16. I wrote a very fast XPath analyzer for SAX (non-random access). It can analyze the same XML file with many XPath expressions at the same time.
    It’s written specifically for Android.
    Everything is here: http://code.google.com/p/xpath4sax/

    Comment by Philippe Prados — June 28, 2010 @ 9:43 am

  17. [...] the meantime, there is a partial solution. Matthias Käppler has released a port of dom4j for Android which works with the SDK’s SAX parser to create object [...]

    Pingback by Consuming XML in Android: where’s XPath? | Adam Foltzer's Blog — January 9, 2011 @ 8:33 pm

  18. I am interested in your dom4j adapted for Android 2.1 or earlier.
    I am even willing to pay $20 for it!

    Let me know how I can get it, and whether it works on 2.1 and earlier versions.

    Comment by Timo — January 22, 2011 @ 9:53 pm

    • Haha, cheers, but you can have it for free :-) sources are on my GitHub: http://www.github.com/kaeppler — note that I haven’t tested it with recent Android versions, but the fix I applied was trivial so I’m positive it still works.

      Comment by Matthias Käppler — January 22, 2011 @ 10:54 pm

      • I downloaded the sources and am trying to put them together.
        I am getting this error

        The project was not built since its build path is incomplete. Cannot find the class file for javax.xml.transform.Result.

        Also couple of methods in class
        org.dom4j.io.STAXEventWriter
        do not want to compile.

        I am trying to make dom4j work with Android 2.1.
        Thanks,
        Tim.

        Comment by Timo — January 22, 2011 @ 11:54 pm

      • dom4j is really, really old. Like, from 2004 or something. The build script is a Maven1 build script (Maven has hit version 3 meanwhile). TBH, I don’t exactly remember how I managed to build it. Truth be told, I wouldn’t even recommend using it anymore, since back when I made that change, there was no XPath support in Android. As of 2.2, there now is: http://developer.android.com/reference/javax/xml/xpath/package-summary.html

        You may want to re-consider whether dom4j is the way to go. It’s legacy software.

        Comment by Matthias Käppler — January 23, 2011 @ 12:58 pm

    • see my reply to Tim on where you can download the fixed JAR that I built.

      Comment by Matthias Käppler — January 24, 2011 @ 10:04 am

  19. We are using another library that depends on dom4j (version 1.6, to be precise). It is very important for us to make it work, as we now have an Android app that only works on 2.2 and not on lower versions, so we are probably covering only 10% of Android users now. :(

    I even put up a freelance project for $100 to solve this.

    Comment by Tim — January 23, 2011 @ 8:10 pm

  20. The compiled dom4j.jar works, but just like the original, it returns some garbage along with the parameters.
    It seems that some deserialization or parameter parsing is going wrong somewhere.
    In Android 2.2 it does work. I am not sure how much changed between 2.1 and 2.2.
    Is it only XPath support?

    Comment by Tim — January 24, 2011 @ 10:57 pm

  21. From what I have found, dom4j doesn’t work on Android 2.1 but works on 2.2. The first thing I want to do with dom4j is convert my W3C Document to a dom4j Document using this:

    org.dom4j.Document result = reader.read(source); where source is my W3C Document

    It fails with a NullPointerException. The root cause is that the W3C Document implementation differs between 2.1 and 2.2. Any way to circumvent this?

    Comment by Guillaume — December 13, 2011 @ 11:02 am

  22. Hi… recently I had to use dom4j because I need to port POI (OOXML) to Android 2.3 and above, but I got a DocumentException from dom4j because Android’s SAX2 driver passes a null URI to dom4j. This is driving me mad… do you have any idea?

    Comment by NapoleonLiu@neusoft.com — January 10, 2012 @ 12:49 pm

