Touring the Commons - part 2

by Jason Menard (jason@javaranch.com)

This is the second in a three-part series discussing elements of the Jakarta Commons project. The first part discussed the Commons BeanUtils package, the second part discusses the Commons Digester package, and the final part will demonstrate using these packages together to build a framework for dynamically generating data transfer objects.

Introduction

Last month we took a look at using the Jakarta Commons BeanUtils package to dynamically create and manipulate JavaBeans. This month we take a look at another very useful component of the Commons project - Digester. Digester is a component that facilitates the easy mapping of XML documents to Java objects.

The reasons why you might want to map an XML document to a set of Java objects are nearly limitless. A primary reason you might do this is to extract configuration information stored in an XML file to use in your own classes. Maybe you need this capability simply because "that's where the data is". XML is a popular format for storing a large variety of information, and having the ability to easily map this data to your own set of application-specific objects is nothing to sneeze at. Regardless of what your motivation is, it's very likely that Digester is just what you've been looking for.

The Problem: A Collection of Book Reviews

We have an XML file containing the book reviews here at JavaRanch, along with information about the books being reviewed. We want to write a Swing application to add new reviews and edit existing reviews. It is obvious to us that we need a way to import the XML data and have it mapped to a set of Java classes which we will use within our application. We decide to use Digester to handle the object mapping for us.

First, let's take a look at a snippet of our XML. See reviews.xml for the entire file.

<books>
  <book>
    <title>Design Patterns</title>
    <category>Design Patterns, UML, and Refactoring</category>
    <edition number="1">
      <author>
        <lastName>Gamma</lastName>
        <firstName>Erich</firstName>
      </author>
      <author>
        <lastName>Helm</lastName>
        <firstName>Richard</firstName>
      </author>
      <author>
        <lastName>Johnson</lastName>
        <firstName>Ralph</firstName>
      </author>
      <author>
        <lastName>Vlissides</lastName>
        <firstName>John</firstName>
      </author>
      <isbn>0201633612</isbn>
      <review>
        <rating>8</rating>
        <content>The most popular computer science book of all time...</content>
        <reviewer>Paul Wheaton</reviewer>
        <ranchCategory>trailboss</ranchCategory>
        <reviewDate>
          <month>1</month>
          <year>2000</year>
        </reviewDate>
      </review>
    </edition>
  </book>
</books>

This looks straightforward enough. We have a collection of books, each of which has a title, a category for the type of book it is, and one or more editions. We see that each edition has one or more authors (each having first and last names), an ISBN number, and one or more reviews. Taking a look at the reviews, we can see that each review is given a rating, has the month and year the review was done, the text of the review, and the name and JavaRanch category of the person who did the review. Based on this information we come up with Books.java, Book.java, Edition.java, Author.java, and Review.java. Since the review month and review year are one-to-one mappings with Review, we made them properties of Review since there was no need for a separate class just to store those two pieces of information.

Let's take a look at our Book class.

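import java.util.ArrayList;
import java.util.List;

public class Book {
    private String title;
    // Editions are collected one at a time through addEdition().
    private final List<Edition> editions = new ArrayList<>();

    public Book() {
    }

    public String getTitle() {
        return title;
    }

    public void setTitle(String title) {
        this.title = title;
    }

    public List<Edition> getEditions() {
        return editions;
    }

    // Digester will call this method to attach each Edition to its Book.
    public void addEdition(Edition edition) {
        editions.add(edition);
    }
}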

The only thing to really note here is that there is an addEdition() method for adding an Edition object to an instance of Book. This is done throughout the classes we wrote where a class is maintaining a collection of objects, just like Book has a collection of Editions. We will need to have methods like this when we start working with Digester.

Now that we have some XML and some classes to map the XML to, let's take a look at using Digester.

Inside Digester

Digester uses three main concepts that must be understood in order to effectively make use of it: the object stack, element matching patterns, and processing rules.

The Object Stack

The Digester class maintains an internal stack of objects. Typically objects are pushed onto the stack or popped off of the stack according to processing rules. The Digester class contains the typical operations for manipulating the stack: clear(), peek(), push(), and pop().

Just to make sure we are all clear on how a stack works, let's walk through a quick example. I have a stack 'S' which is currently empty. I will represent it like so:

===
 S

Next let's push an Object 'A' onto the stack using the push() method. Our stack now looks like the following:

 A
===
 S

After pushing two more objects on the stack, 'B' and 'C' in that order, the stack is in the following state:

 C
 B
 A
===
 S

A call to peek() would return a reference to 'C', but the stack would remain in its current state. If we were to now pop() the stack, we would get a reference to 'C' and our stack would be left in the following state:

 B
 A
===
 S

Calling clear() would naturally remove all objects from the stack:

===
 S
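
The same walk-through can be replayed in a few lines of plain Java. This is only a sketch of stack behavior: it uses java.util.ArrayDeque as a stand-in rather than Digester's own stack, and the class name StackWalkthrough is made up for the illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Stand-alone illustration of the stack walk-through; not Digester code.
final class StackWalkthrough {

    // Replays the push/peek/pop/clear sequence and records what each step yields.
    static String run() {
        Deque<String> stack = new ArrayDeque<>();
        StringBuilder log = new StringBuilder();

        stack.push("A");
        stack.push("B");
        stack.push("C");

        log.append("peek=").append(stack.peek());    // top of the stack, stack unchanged
        log.append(" size=").append(stack.size());
        log.append(" pop=").append(stack.pop());     // removes and returns the top
        log.append(" top=").append(stack.peek());    // 'B' is now exposed
        stack.clear();                               // removes everything
        log.append(" empty=").append(stack.isEmpty());
        return log.toString();
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Running it prints peek=C size=3 pop=C top=B empty=true, which matches the diagrams above.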

Element Matching Patterns

Digester uses element matching patterns to describe the location of elements of your XML document relative to each other. You can think of it as similar to a directory hierarchy. Looking back at our XML document we can see that the first element in the document is the <books> element. The element matching pattern for the <books> element is simply:

books

Getting back to the analogy of our XML document as a directory tree, if we can imagine our XML document as being the "directory" that we are currently in, then referencing the <books> element with the element matching pattern that is simply "books" seems to make sense.

Well how about the <book> element? Our XML document lays it out like this:

<books>
  <book>

The element matching pattern for this structure is:

books/book

Once again, if we could imagine our XML document as a directory tree structure, this makes perfect sense. Let's try one more. What would the element matching pattern be for the author's last name? If you said...

books/book/edition/author/lastName

...then you are absolutely correct. Nothing too fancy here. It's a fairly straightforward concept.
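
If you would like to see every pattern a document contains, the sketch below derives them with the SAX parser that ships with the JDK. It is not part of Digester, and the names PatternLister and patternsOf() are made up for this illustration. It simply tracks how deeply nested the parser currently is and joins the element names with slashes.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Illustration only: lists the slash-separated path of every element in a document.
final class PatternLister {

    static List<String> patternsOf(String xml) {
        final List<String> patterns = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private String current = "";   // path of the element being parsed

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                // Descend one level, like changing into a subdirectory.
                current = current.isEmpty() ? qName : current + "/" + qName;
                patterns.add(current);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                // Climb back up one level.
                int slash = current.lastIndexOf('/');
                current = slash < 0 ? "" : current.substring(0, slash);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse the document", e);
        }
        return patterns;
    }
}
```

Feeding it a cut-down version of our reviews document yields books, books/book, books/book/edition, and so on, in document order.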

Processing Rules

Processing rules are used by Digester to determine what actions need to be taken when the parser encounters a given element matching pattern. Processing rules typically are used to manipulate the stack (for example, popping an object off of the stack), create objects, and manipulate objects on the stack. Digester's processing rules are classes that extend the abstract Rule class.

Processing rules are fired when the parser matches the rule's element matching pattern. In other words, when the parser encounters the specified element in the XML document, the rule is triggered. Just like an XML element has a beginning tag, body content, and a closing tag, a Rule has a begin() method, a body() method, and an end() method. The begin() method of the Rule is called when the beginning of the XML element is encountered, the body() method is called when the body of the XML element is encountered, and the end() method is called when the end of the XML element is encountered. When parsing is complete, the finish() method of the rule is called, in order to take care of any resources which might need to be cleaned up.
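
To make the order of these calls concrete, here is a toy model of the lifecycle built on the JDK's SAX parser. None of it is Digester code: ToyRule and RuleLifecycle are hypothetical names, only a single rule and an exact pattern match are supported, and body text is handled in the simplest way possible. In this model body() receives the element's accumulated text just before end() fires, which is my understanding of how Digester sequences the calls as well.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Toy model of the rule lifecycle; ToyRule and RuleLifecycle are hypothetical names.
final class RuleLifecycle {

    // Mirrors the four callbacks described above; all do nothing by default.
    abstract static class ToyRule {
        void begin(Attributes attributes) {}
        void body(String text) {}
        void end() {}
        void finish() {}
    }

    // Fires the rule for every element whose path equals the pattern.
    static void parse(String xml, final String pattern, final ToyRule rule) {
        DefaultHandler handler = new DefaultHandler() {
            private String path = "";
            private final StringBuilder text = new StringBuilder();

            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                path = path.isEmpty() ? qName : path + "/" + qName;
                text.setLength(0);                       // start collecting fresh body text
                if (path.equals(pattern)) rule.begin(atts);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                if (path.equals(pattern)) {
                    rule.body(text.toString().trim());   // body first ...
                    rule.end();                          // ... then end
                }
                int slash = path.lastIndexOf('/');
                path = slash < 0 ? "" : path.substring(0, slash);
            }

            @Override
            public void endDocument() {
                rule.finish();                           // once, after the whole parse
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse the document", e);
        }
    }

    // Records the order in which the callbacks fire for books/book/title.
    static List<String> trace(String xml) {
        final List<String> events = new ArrayList<>();
        parse(xml, "books/book/title", new ToyRule() {
            @Override void begin(Attributes attributes) { events.add("begin"); }
            @Override void body(String text) { events.add("body:" + text); }
            @Override void end() { events.add("end"); }
            @Override void finish() { events.add("finish"); }
        });
        return events;
    }
}
```

For a document containing a single <title>Design Patterns</title> element, trace() returns begin, body:Design Patterns, end, finish.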

While you may write your own rules, and in fact may often need to do so, the Digester package comes with several common Rule implementations. The Digester class has methods to register these processing rules. We'll take a look at a few of the more common processing rules and the Digester methods which register them.

ObjectCreateRule - This rule is used to create an instance of an object. The object is created by calling its no-argument constructor and pushed onto the stack when the begin() method is called. When the end() method is called, the object is popped off of the stack. To register an ObjectCreateRule that creates a new Book object whenever the <book> element is encountered, we make a call to Digester's addObjectCreate() method:

Digester digester = new Digester();
digester.addObjectCreate("books/book", Book.class);

We can see that addObjectCreate() takes an element matching pattern and the class of the Java object to be created. In fact, every Digester method we'll look at that creates a Rule will take an element matching pattern as its first parameter.

BeanPropertySetterRule - Sets a property of the bean that is the top object in the stack. By default, the property that is set has the same name as the current XML element being examined. The value set is the body content of the XML element.

digester.addBeanPropertySetter("books/book/title");

In the above example, setTitle() will be called on the bean on top of the object stack, setting that property with the body content of the <title> tag.

SetPropertiesRule - Sets properties of the bean that is the top object in the stack from the attributes of the XML element. The form shown here maps one named attribute to one named property.

digester.addSetProperties("books/book/edition", "number", "number");

This example registers a SetPropertiesRule that sets the number property (the third argument of addSetProperties()) of the object on top of the stack by calling setNumber(), using the value of the number attribute (the second argument of addSetProperties()) of the <edition> element.

SetPropertyRule - Sets a property of the bean on top of the stack to a value of an XML attribute. The property of the bean to be set is specified by another XML attribute.

For example, given the XML element <set-property property="name" value="jason"/>...

digester.addSetProperty("some/pattern/set-property", "property", "value");

...would call the setName() method of the object on top of the stack, passing the value "jason" as the argument.

CallMethodRule and CallParamRule - Used in conjunction with each other, these two rules allow the execution of arbitrary methods of the top bean on the stack. A CallMethodRule specifies the name of the method to call, the number of arguments that the method expects, and of course when it should be called, as indicated by the element matching pattern. The method is called when the CallMethodRule's end() method executes. A CallParamRule indicates the values to be passed to the method specified by the CallMethodRule. Values for the CallParamRule may be taken from either the character data of an XML element, or from an attribute of an XML element.

digester.addCallMethod("books/book/edition/isbn", "setIsbn", 1);
digester.addCallParam("books/book/edition/isbn", 0);

The first line in the above code specifies that the setIsbn() method should be called on the object on top of the stack, and that it takes one argument. The second line states that the character data of the <isbn> element should be passed as the first parameter (parameters are indexed zero-relative, so the first parameter has an index of 0) to the setIsbn() method. If we wanted to pass a parameter value from an XML element's attribute instead of its character data, our addCallParam() method might look something like this:

digester.addCallParam("books/book/edition", 0, "number");

In this case, the value passed will be taken from the number attribute of the <edition> element.

Looking back at the addCallMethod() method of Digester, one might think that to call a method that takes no arguments we would use "0" as the value of the third argument. This is not the case, however.

digester.addCallMethod("books/book/edition/isbn", "setIsbn", 0);

This example is a shorthand for the previous example where we used both addCallMethod() and addCallParam(). What this says is that we will call setIsbn() and pass the character data of the <isbn> element as the value of the argument to the method.

If these examples look like they accomplish essentially the same thing as a similar call to addBeanPropertySetter(), that is absolutely correct! The combination of addCallMethod() and addCallParam() does, however, give us the flexibility to invoke arbitrary methods, which addBeanPropertySetter() does not provide.

But what if we want to invoke a method that takes no arguments? In that case we can use the signature of addCallMethod() which takes only two arguments - the element matching pattern and the name of the method, as in the following example:

digester.addCallMethod("some/pattern", "someMethod");
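
For the curious, "calling a method by name" boils down to reflection. The sketch below shows the idea in plain Java. It is not how CallMethodRule is actually implemented: CallMethodSketch and call() are made-up names, and only String parameters are handled, with no type conversion of any kind.

```java
import java.lang.reflect.Method;
import java.util.Arrays;

// Illustration of invoking a method by name; not Digester's implementation.
final class CallMethodSketch {

    // Minimal stand-in for the article's Edition class.
    public static final class Edition {
        private String isbn;
        public void setIsbn(String isbn) { this.isbn = isbn; }
        public String getIsbn() { return isbn; }
    }

    // Finds a public method with the given name taking only Strings, then invokes it.
    static void call(Object target, String methodName, String... params) {
        Class<?>[] types = new Class<?>[params.length];
        Arrays.fill(types, String.class);        // simplification: every parameter is a String
        try {
            Method method = target.getClass().getMethod(methodName, types);
            method.invoke(target, (Object[]) params);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Could not call " + methodName, e);
        }
    }

    public static void main(String[] args) {
        Edition edition = new Edition();
        call(edition, "setIsbn", "0201633612");  // the value would come from the <isbn> body
        System.out.println(edition.getIsbn());
    }
}
```

Passing no parameters at all, as in call(target, "someMethod"), corresponds to the two-argument form of addCallMethod() shown above.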

SetNextRule - This rule calls a method on the object that is next from the top of the stack, passing the object that is at the top of the stack as the argument. The call is made when the rule's end() method executes, while the child object is still on top of the stack. Typically a SetNextRule will be used to add one or more child objects to a parent object.

digester.addSetNext("books/book/edition/review", "addReview");

The example code shown here will call the addReview() method of the object that is next from the top of the stack, passing the object that is at the top of the stack as an argument. It is important to keep in mind that SetNextRule itself does not pop the stack. The child object is removed afterwards by the end() method of the ObjectCreateRule that pushed it. Since end() methods fire in the reverse of the order in which the rules were registered, the ObjectCreateRule for a pattern should be registered before its SetNextRule.
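
The interplay between creating a child object and handing it to its parent is easier to follow with the stack in view. The following is a hand-rolled model of what happens around a single <review> element, written against java.util.ArrayDeque rather than Digester itself. The class SetNextWalkthrough and its stripped-down Edition and Review classes are stand-ins invented for the illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

// Hand-rolled model of the stack activity around a <review> element; not Digester code.
final class SetNextWalkthrough {

    static final class Review {}

    static final class Edition {
        private final List<Review> reviews = new ArrayList<>();
        void addReview(Review review) { reviews.add(review); }
        List<Review> getReviews() { return reviews; }
    }

    // Expects the parent Edition on top of the stack; returns the depth afterwards.
    static int attachReview(Deque<Object> stack) {
        stack.push(new Review());                     // element start: create the child and push it

        Iterator<Object> fromTop = stack.iterator();  // element end, set-next step:
        Review child = (Review) fromTop.next();       //   look at the top object
        Edition parent = (Edition) fromTop.next();    //   and at the one beneath it
        parent.addReview(child);                      //   then hand the child to the parent

        stack.pop();                                  // element end, final step: remove the child
        return stack.size();
    }

    public static void main(String[] args) {
        Deque<Object> stack = new ArrayDeque<>();
        Edition edition = new Edition();
        stack.push(edition);
        System.out.println("depth after review: " + attachReview(stack));
        System.out.println("reviews attached: " + edition.getReviews().size());
    }
}
```

After the call the Edition is once again the top object, ready to receive the next review.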

Using Digester

Once you understand the concepts outlined in the preceding sections, actually using Digester is really rather easy. All we need to do is follow a few simple steps and we are on our way.

The first thing we must do is create a new instance of Digester.

Digester digester = new Digester();

Now that we have our instance of Digester, we need to set any configuration properties that we might need. Configuration options include whether or not the XML document should be validated against a DTD, the class loader to use for loading objects created by an ObjectCreateRule, and whether or not namespace-aware parsing is enabled. There are a host of other configuration options, and I would urge you to read the API documents for further information.

digester.setValidating(false);

Next we must register our processing rules.

digester.addObjectCreate("books", Books.class);
digester.addObjectCreate("books/book", Book.class);
digester.addBeanPropertySetter("books/book/title");
digester.addObjectCreate("books/book/edition", Edition.class);
digester.addSetProperties("books/book/edition", "number", "number");
digester.addBeanPropertySetter("books/book/edition/isbn");
digester.addObjectCreate("books/book/edition/author", Author.class);
digester.addBeanPropertySetter("books/book/edition/author/lastName");
digester.addBeanPropertySetter("books/book/edition/author/firstName");
digester.addSetNext("books/book/edition/author", "addAuthor");
digester.addObjectCreate("books/book/edition/review", Review.class);
digester.addBeanPropertySetter("books/book/edition/review/rating");
digester.addBeanPropertySetter("books/book/edition/review/content");
digester.addBeanPropertySetter("books/book/edition/review/reviewer");
digester.addBeanPropertySetter("books/book/edition/review/reviewDate/month");
digester.addBeanPropertySetter("books/book/edition/review/reviewDate/year");
digester.addSetNext("books/book/edition/review", "addReview");
digester.addSetNext("books/book/edition", "addEdition");
digester.addSetNext("books/book", "addBook");

Once we register our processing rules we are almost home. In order to proceed we must identify our input XML document. In this example we are specifying a File, but the API specifies several other suitable input sources.

File input = new File(REVIEWS_XML);

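If we need any objects on the stack before parsing begins, we push them now. Finally, all we need to do is call parse(), which returns the root object of the stack. Note that parse() can throw IOException and SAXException, so the calling code must handle or declare them.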

Books books = (Books)digester.parse(input);

That's all there is to it! See ReviewDigester.java for the complete source code.

Conclusion

In this article we've seen just how easy it can be to create a mapping between an XML document and a hierarchy of Java objects. We focused on using the Rule implementations that Digester provides for us. Next month we will learn how to write our own Rule implementations, as well as what to do when you want to create an object using a constructor that takes arguments.

Digester has a few other tricks up its sleeve that we won't have time to look at. These include specifying our processing rules in XML and pluggable rules processing, among others. Take a look at the API documentation for more information.

Resources

Please feel free to email me with any questions or comments concerning this article.