Use Custom Tags to Aggregate RSS Feeds into JSP-Based Web Apps

by Simon Brown

With the abundance of news and blog (Weblog) sites growing continuously, keeping track of what’s going on can be a daunting task. Fortunately, standards such as RSS (Really Simple Syndication) provide an easy way to grab content from a particular site and aggregate it into a news reader application. This means that, rather than looking for news yourself, your news reader monitors sites that you’re interested in and downloads new content as it’s published.

This is a great model, and many people have taken the concept to the Web, either aggregating other people's content on their own Websites or offering aggregation services online. In this article, I’ll show you how to use JSP custom tags to implement this type of functionality as a reusable component within your own JSP-based Web applications. Although I assume that you’re comfortable with Java and have some working knowledge of building Web applications with JavaServer Pages (JSP), I won’t assume any knowledge of JSP custom tags. Before we get started, download the code that accompanies this article.

RSS and News Aggregators - a Brief History

The RSS standard has been around for the last few years, but only recently has it really started to catch on. One reason is that software like Movable Type and Radio UserLand has made blogging available to the masses in an affordable package. Previously, the news sections of most Websites offered occasional snapshots of a company or individual and were rarely updated. These tools take a more dynamic approach, providing an easy mechanism for adding new items to a site.

Thanks to these tools, and the proliferation of news published on the Internet, keeping up to date is harder than ever. For this reason, a standard format was defined to allow news to be syndicated and aggregated through desktop applications called news readers.

That format is RSS, Really Simple Syndication. In essence, an RSS feed is simply an XML document that describes the content available on a given Website. Typically, “content” means news items, but RSS is also used to summarise articles, short stories, and so on. A good example of an RSS feed is the UK news from the BBC. Having a standard format made the aggregation of content much easier than before.

Reading RSS Feeds from Java

Since RSS feeds are nothing more than standardised XML documents, reading and processing RSS is fairly easy in any language that supports XML. Now that J2SE 1.4 includes XML parsing as standard (through the JAXP APIs), it’s simply a matter of using the appropriate classes to read in the XML document. Once the document has been read, presenting it to the user in a desktop or Web-based application is trivial.
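JAXP offers two parser styles. As a rough sketch of the event-driven (SAX) style, the handler below collects the title of every item as the parser streams through the document. The class name ItemTitleHandler and its parse() helper are hypothetical. The sketch assumes the common RSS layout, in which each item element has a title child.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical SAX handler that collects the <title> of every <item> in an RSS document.
class ItemTitleHandler extends DefaultHandler {
  private final List<String> titles = new ArrayList<>();
  private boolean inItem;
  private StringBuilder text; // non-null only while inside <item><title>

  @Override
  public void startElement(String uri, String localName, String qName, Attributes attrs) {
    if (qName.equals("item")) {
      inItem = true;
    } else if (inItem && qName.equals("title")) {
      text = new StringBuilder();
    }
  }

  @Override
  public void characters(char[] ch, int start, int length) {
    if (text != null) {
      text.append(ch, start, length); // may be called several times per element
    }
  }

  @Override
  public void endElement(String uri, String localName, String qName) {
    if (qName.equals("title") && text != null) {
      titles.add(text.toString().trim());
      text = null;
    } else if (qName.equals("item")) {
      inItem = false;
    }
  }

  List<String> getTitles() {
    return titles;
  }

  // Parses an RSS document held in a string and returns its item titles in order.
  static List<String> parse(String xml) {
    try {
      ItemTitleHandler handler = new ItemTitleHandler();
      SAXParserFactory.newInstance().newSAXParser().parse(
          new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
      return handler.getTitles();
    } catch (ParserConfigurationException | SAXException | IOException e) {
      throw new IllegalArgumentException("Could not parse RSS", e);
    }
  }
}
```

Because SAX never builds the whole document tree in memory, it suits large documents. For feeds of a few kilobytes, the DOM API is often simpler to work with.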

For the purpose of this article, I’m going to show how to aggregate content from an RSS feed into your own Web applications. This technique can be applied across a wide range of applications, from corporate intranet sites aggregating content from various departments, to personal Websites aggregating content from friends and family.

Before we talk more about custom tags, let's quickly look at how we will read in the RSS feeds using Java code. Manipulating XML documents in Java can be tedious, so rather than code against the raw XML, I've chosen to build a very simple object representation of the RSS feed. The first class, RssFeed, represents a given RSS feed that contains a number of items.

package rss;

import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;

public class RssFeed {

 private Collection items = new ArrayList();

 public void addItem(RssItem item) {
   items.add(item);
 }

 public Collection getItems() {
   return Collections.unmodifiableCollection(items);
 }

}
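Because getItems() returns Collections.unmodifiableCollection(items), callers such as the JSP page can iterate over the items but cannot add or remove them. Only addItem() changes the feed. The quick check below shows this behaviour. It uses a trimmed-down, hypothetical copy of the class called ReadOnlyFeed, with String items standing in for RssItem objects.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;

// Trimmed-down copy of RssFeed using Strings in place of RssItem objects.
class ReadOnlyFeed {
  private final Collection<String> items = new ArrayList<>();

  public void addItem(String item) {
    items.add(item);
  }

  // Returns a read-only view: mutating calls on it throw UnsupportedOperationException.
  public Collection<String> getItems() {
    return Collections.unmodifiableCollection(items);
  }
}
```

The returned collection is a live view rather than a copy. Items added later through addItem() show up in a view obtained earlier.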

In reality, RSS feeds have other characteristics, but the collection of items is enough for our purposes. Next up is the Java class that represents an item in an RSS feed. Typically, several pieces of information are presented for each item in the feed, and I've chosen to wrap these within an object. The information that’s relevant for each item in this example is its title and a link (URL) back to the full story on the Web.

package rss;

public class RssItem {

 private String title;
 private String link;

 public String getTitle() {
   return title;
 }

 public void setTitle(String title) {
   this.title = title;
 }

 public String getLink() {
   return link;
 }

 public void setLink(String link) {
   this.link = link;
 }

}

The final piece of the puzzle is the class that will actually read the XML document and transform it into our object representation: the RssReader class. To avoid getting bogged down in the syntax and semantics of reading XML files, I've omitted the implementation of this class. Essentially, all that the read() method does is access the RSS feed at the specified URL, convert each item contained within the feed into an RssItem object, and add it to the RssFeed that it returns.

package rss;

public class RssReader {

 /**
  * Reads the RSS feed at the specified URL and returns an RssFeed instance
  * representing it.
  */
 public RssFeed read(String url) {
   // ... body of method omitted ...
 }

}
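If you want something runnable in place of the omitted body, here is one possible way to fill it in using the JDK's built-in DOM parser. It is a sketch, not the implementation from the downloadable code. It assumes an RSS 2.0-style layout, in which each item element has title and link children. It also adds a hypothetical read(InputStream) overload so the parsing can be exercised without a network connection. To make it compile on its own, it includes pared-down copies of RssItem and RssFeed, with the package declaration dropped.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Pared-down copies of the article's RssItem and RssFeed classes.
class RssItem {
  private String title;
  private String link;
  public String getTitle() { return title; }
  public void setTitle(String title) { this.title = title; }
  public String getLink() { return link; }
  public void setLink(String link) { this.link = link; }
}

class RssFeed {
  private final Collection<RssItem> items = new ArrayList<>();
  public void addItem(RssItem item) { items.add(item); }
  public Collection<RssItem> getItems() { return Collections.unmodifiableCollection(items); }
}

class RssReader {

  /** Reads the RSS feed at the specified URL; opens a network connection on every call. */
  public RssFeed read(String url) {
    try (InputStream in = URI.create(url).toURL().openStream()) {
      return read(in);
    } catch (IOException e) {
      throw new IllegalStateException("Could not fetch RSS feed from " + url, e);
    }
  }

  /** Hypothetical overload: parses RSS from any stream, such as a file or test fixture. */
  public RssFeed read(InputStream in) {
    try {
      Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in);
      RssFeed feed = new RssFeed();
      NodeList items = doc.getElementsByTagName("item");
      for (int i = 0; i < items.getLength(); i++) {
        Element element = (Element) items.item(i);
        RssItem item = new RssItem();
        item.setTitle(childText(element, "title"));
        item.setLink(childText(element, "link"));
        feed.addItem(item);
      }
      return feed;
    } catch (ParserConfigurationException | SAXException | IOException e) {
      throw new IllegalStateException("Could not parse RSS feed", e);
    }
  }

  // Returns the trimmed text of the first descendant with the given name, or null.
  private static String childText(Element parent, String name) {
    NodeList nodes = parent.getElementsByTagName(name);
    return nodes.getLength() > 0 ? nodes.item(0).getTextContent().trim() : null;
  }
}
```

Because the article's read() declares no checked exceptions, this sketch reports network and parsing failures as unchecked IllegalStateExceptions.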

This is all the logic we need to read RSS feeds. Let's now take a look at how to hook this into a JSP page.

Reading RSS Feeds from JSP

For the purposes of this article, I’m going to assume that you’ve chosen to build your Web application using Java Web technologies and, specifically, that you’re going to use JavaServer Pages (JSP). One of the great things about this technology is that it makes the injection of dynamic behaviour into your pages easy -- all you need to do is add a little Java here and there. For example, we could use the classes we've just built in a JSP page:

<%@ page import="java.util.Iterator, rss.RssFeed, rss.RssItem, rss.RssReader" %>
<%
 RssReader reader = new RssReader();
 RssFeed rssFeed = reader.read("http://www.acme.com/rss.xml");
 Iterator it = rssFeed.getItems().iterator();
 while (it.hasNext()) {
   RssItem rssItem = (RssItem)it.next();
%>
   <a href="<%= rssItem.getLink() %>"><%= rssItem.getTitle() %></a>
   <br />
<%
 }
%>

This code grabs the RSS feed from the specified URL and, using the standard java.util.Iterator class, loops over each item displaying a hyperlink back to the full story.

While embedding Java code as a scriptlet is useful, it can soon become problematic, particularly if you need to reuse that code on other pages within your Web applications. There is no easy way to reuse a scriptlet. Copying and pasting it throughout your application will eventually lead to maintainability issues, because every change has to be replicated in every copy of that script. Ideally, we want to reuse code at the component level, taking a given component and using it wherever necessary. JSP custom tags are the answer.

Wrapping the RSS Reader in a JSP Custom Tag

JSP custom tags are a means to wrap common and recurring logic into a form that’s reusable across the pages of your Web applications. In the current version of the JSP specification (1.2), the behaviour of a custom tag is implemented as a Java class that implements a specific interface, in much the same way you’d implement a servlet, for example. This Java class is generally called the tag handler.

The key difference between custom tags and regular Java classes lies in the way they’re used within JSP pages. Whereas Java code is embedded directly in the page as a scriptlet, custom tags are used with an XML syntax. For example, here’s how the custom tag will be used when we’ve finished building it. Notice that here we have a starting tag, some body content, then the ending tag.

<rss:rssFeed url="http://www.acme.com/rss.xml">
 <a href="<%= rssItem.getLink() %>"><%= rssItem.getTitle() %></a>
 <br />
</rss:rssFeed>

This will achieve exactly the same result as the scriptlet code we saw before -- the tag provides the iteration over each item in the feed. Essentially, the tag is now the looping construct and the body content of the tag is evaluated for each iteration, in the same way that the content of a while loop is evaluated for each iteration. Custom tags provide us with a way to build cleaner, more concise JSP pages that benefit from the advantages of reusing components -- increased maintainability, quality, reliability, and so on.

Building a Tag Handler Class

Now that you have an understanding of why custom tags are useful, let’s take a look at how to build one. As I said before, the behaviour is wrapped up as a regular Java class that implements a specific interface. In the case of custom tags, the interface that must be implemented (at a minimum) is Tag, from the javax.servlet.jsp.tagext package. This interface is fairly straightforward and provides a number of callback methods that are executed when the tag is used on a JSP page. At runtime, the JSP page obtains an instance of the tag handler class for the custom tag and calls its callback methods in turn. This might sound complicated at first, but it becomes fairly straightforward once you’ve built a couple of tags for yourself.

For convenience, the JSP specification also provides a basic implementation of this interface, called TagSupport, which is an ideal starting point from which to build your own tags. The following code snippet shows the start of the source code for the tag handler, including all the necessary imports. It also shows a couple of attributes -- we’ll see where these are used later on.

package tagext;

import java.util.Collection;
import java.util.Iterator;

import javax.servlet.jsp.JspException;
import javax.servlet.jsp.PageContext;
import javax.servlet.jsp.tagext.TagSupport;

import rss.RssFeed;
import rss.RssReader;

public class RssFeedTag extends TagSupport {

 private String url;
 private Iterator iterator;

In the tag usage example I presented earlier, we saw that there was a starting tag, some body content, and the ending tag. These different aspects of the tag are important because they define when the JSP page will fire the callback methods that we write in our tag handler class. The functionality we need to implement is identical to that shown before; we need to read in an RSS feed from a URL, then iterate over each item in the feed so that a hyperlink can be generated. With the TagSupport class, the three callback methods that are available to us are doStartTag(), doAfterBody() and doEndTag(). Let’s look at each of these in turn.

The doStartTag() method is called when the starting tag is encountered on the JSP page, and is called only once for any given custom tag.

 public int doStartTag() throws JspException {
   RssReader reader = new RssReader();
   RssFeed feed = reader.read(url);
   iterator = feed.getItems().iterator();

   if (iterator.hasNext()) {
     pageContext.setAttribute("rssItem", iterator.next());
     return EVAL_BODY_INCLUDE;
   } else {
     return SKIP_BODY;
   }
 }

Since this code is only called once per tag, and is the first of the callback methods to be fired, it’s here that we can read the RSS feed and set up an iterator to loop over the collection of items. If you think of this method as the start of the while loop, the first step is to check that there are items in the collection. If there are, we want to make the first item available for use in the JSP page, so that we can utilise it within the body content of the tag. To do this, we set an attribute on the JSP PageContext to refer to the first item in the collection. Next, we must tell the JSP page that the body content of the tag should be evaluated, which is achieved by returning a constant value of EVAL_BODY_INCLUDE. If, however, there aren’t any items in the collection, we tell the JSP page that it should skip evaluating the body content by returning the constant value of SKIP_BODY.

The next callback method of interest is the doAfterBody() method, which is called after the body content of the tag has been evaluated.

 public int doAfterBody() throws JspException {
   if (iterator.hasNext()) {
     pageContext.setAttribute("rssItem", iterator.next());
     return EVAL_BODY_AGAIN;
   } else {
     return SKIP_BODY;
   }
 }

Once the body content has been evaluated, the next step is to see whether there are any more items in the collection. If there are, we make the new item available to the JSP page and indicate that the body content should be re-evaluated by returning a constant value of EVAL_BODY_AGAIN. The body is then evaluated once more, after which doAfterBody() is called again to see whether yet another evaluation is required. This sequence repeats until there are no more items in the collection, at which point a constant value of SKIP_BODY is returned.

When all the evaluations have been performed, the final callback method is executed.

 public int doEndTag() throws JspException {
   return EVAL_PAGE;
 }

This implementation simply tells the JSP page that the remainder of the page should be processed as normal. In fact, TagSupport already provides an identical implementation of this method, so we don’t actually need to override it. However, I’ve shown it here for completeness.
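To make the interplay between these return values concrete, here is a rough, self-contained simulation of the loop a JSP container runs around an iteration tag. It deliberately avoids the javax.servlet.jsp API. The SimpleIterationTag interface and its constants, the FeedLikeTag and TagLifecycle classes, and the Map standing in for the page context are all hypothetical stand-ins. The constant values are not meant to match the real ones.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the Tag/IterationTag contract (not the real JSP API).
interface SimpleIterationTag {
  int SKIP_BODY = 0;
  int EVAL_BODY_INCLUDE = 1;
  int EVAL_BODY_AGAIN = 2;
  int EVAL_PAGE = 3;

  int doStartTag();
  int doAfterBody();
  int doEndTag();
}

// Mirrors the logic of RssFeedTag, iterating over plain strings instead of RssItems.
class FeedLikeTag implements SimpleIterationTag {
  private final Map<String, Object> pageContext;
  private final Iterator<String> iterator;

  FeedLikeTag(Map<String, Object> pageContext, List<String> items) {
    this.pageContext = pageContext;
    this.iterator = items.iterator();
  }

  public int doStartTag() { return advance(EVAL_BODY_INCLUDE); }
  public int doAfterBody() { return advance(EVAL_BODY_AGAIN); }
  public int doEndTag() { return EVAL_PAGE; }

  // Publishes the next item under "rssItem" and asks for (another) body evaluation.
  private int advance(int evalCode) {
    if (iterator.hasNext()) {
      pageContext.put("rssItem", iterator.next());
      return evalCode;
    }
    return SKIP_BODY;
  }
}

class TagLifecycle {
  // Roughly what the generated page code does: evaluate the body while the tag asks for it.
  static List<String> render(SimpleIterationTag tag, Map<String, Object> pageContext) {
    List<String> output = new ArrayList<>();
    if (tag.doStartTag() != SimpleIterationTag.SKIP_BODY) {
      do {
        // The "body": reads the current item, as the rssItem scripting variable would.
        output.add("<a>" + pageContext.get("rssItem") + "</a>");
      } while (tag.doAfterBody() == SimpleIterationTag.EVAL_BODY_AGAIN);
    }
    tag.doEndTag();
    return output;
  }
}
```

With three items, doStartTag() publishes the first item and each doAfterBody() call publishes the next one, so the body runs exactly three times. An empty list makes doStartTag() return SKIP_BODY, and the body never runs.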

The final method we need to implement is a setter for the URL. If you look back to the example of how the tag will be used on the page, you’ll notice that the URL of the RSS feed is specified as an attribute of the custom tag. To make this information available to the tag handler class, we need to write a setter method in the same way we’d write setter methods for JavaBean properties. Any setter methods that correspond to attributes of a custom tag are called before the callback methods, so that their values are available for use within those callback methods.

 public void setUrl(String url) {
   this.url = url;
 }

}
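The link between the url attribute and setUrl() follows the JavaBeans naming conventions. One way to see that mapping is to ask the JDK's java.beans.Introspector which writable properties a stripped-down, hypothetical tag class exposes. This only illustrates the naming convention. JSP containers typically generate direct calls to the setter rather than introspecting at request time.

```java
import java.beans.BeanInfo;
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for RssFeedTag, keeping only the attribute setter.
class UrlAttributeTag {
  private String url;

  public void setUrl(String url) {
    this.url = url;
  }
}

class AttributeNames {
  // Lists the names of writable JavaBeans properties declared below java.lang.Object.
  static List<String> writableProperties(Class<?> type) {
    try {
      BeanInfo info = Introspector.getBeanInfo(type, Object.class);
      List<String> names = new ArrayList<>();
      for (PropertyDescriptor pd : info.getPropertyDescriptors()) {
        if (pd.getWriteMethod() != null) {
          names.add(pd.getName());
        }
      }
      return names;
    } catch (IntrospectionException e) {
      throw new IllegalStateException(e);
    }
  }
}
```

The setter setUrl() yields a property named url, which matches the attribute name used on the page and in the TLD.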

Compiling the Tag Handler Class

To compile the tag handler class, make sure you have the Servlet and JSP classes in your classpath. If you’re using Tomcat 4, these can be found in the $TOMCAT_HOME/common/lib/servlet.jar file. The resulting class file should be placed under the WEB-INF/classes directory of your Web application, in the appropriate directory/package structure.

Describing the Tag

The next step is to describe the custom tag using an XML file called a tag library descriptor, or TLD file. This step is necessary because it allows us to define how the custom tag will be used on the page, the names of its attributes, and so on. Starting at the top, we have all the usual XML header information:

<?xml version="1.0" encoding="ISO-8859-1" ?>
<!DOCTYPE taglib
 PUBLIC "-//Sun Microsystems, Inc.//DTD JSP Tag Library 1.2//EN"
 "http://java.sun.com/dtd/web-jsptaglibrary_1_2.dtd">

Next, we have the start of the tag library definition. Although custom tags are reusable, they must be defined within the context of a tag library -- a collection of one or more tags that are typically related in some way. This block of XML allows us to define the version of our tag library, the required version of JSP, a short name, and a description of the tag library.

<taglib>

 <tlib-version>1.0</tlib-version>
 <jsp-version>1.2</jsp-version>
 <short-name>rss</short-name>
 <description>
   Tags used to present RSS information.
 </description>

Following this is the definition of the tags in the tag library. Here we have only one, which defines the name of the tag as it will be used on a JSP page, the name of the tag handler class, the body content type and, again, a short description.

 <tag>
   <name>rssFeed</name>
   <tag-class>tagext.RssFeedTag</tag-class>
   <body-content>JSP</body-content>
   <description>
     A tag to present the headlines/titles of items from an RSS feed.
   </description>

Of these, the body content probably needs some more explanation. The JSP specification provides several body content types that can be defined for custom tags, with the two most useful being empty and JSP. A body content type of empty indicates that the custom tag will be used without body content on the page, which is handy when you simply want the tag to perform some sort of action. On the other hand, a body content type of JSP indicates that there will be regular JSP constructs used between the start and end tags. This is what we’re using in this example, because we’d like the body content to be evaluated for each item in the RSS feed.

The next part of the XML file describes the scripting variable that will be introduced into the page within the body content of the tag. In the tag handler code, we get the next RSS item from the collection, then place the reference to that object in the page context under the name rssItem. One of the things custom tags can do is make these attributes available as scripting variables on the JSP page, so they can be accessed with the request-time expression syntax of <%= … %>. Here, we specify the name and type of the variable, along with a scope of NESTED to indicate that the variable should only be accessible between the starting and ending tags.

   <variable>
     <name-given>rssItem</name-given>
     <variable-class>rss.RssItem</variable-class>
     <scope>NESTED</scope>
   </variable>

The final aspect of the tag to describe is its attributes. In this example there is only a single attribute, called url, which is used to indicate the source of the RSS feed. To ensure that the tag works as expected, we've stated that this attribute must be supplied when the tag is used. Setting the rtexprvalue element to false says that the value of the attribute must be statically defined in the JSP page. In other words, the value of the attribute can't be the result of a request-time expression.

   <attribute>
     <name>url</name>
     <required>true</required>
     <rtexprvalue>false</rtexprvalue>
   </attribute>
 </tag>

</taglib>

Using the Tag

For the purposes of this example, let's assume that the TLD file has been saved as rss.tld under the WEB-INF directory of your Web application. To use a custom tag, you first need to tell the JSP page where to find the description of that tag. This is achieved through the taglib directive, with the uri attribute pointing to the TLD file that represents the tag library, and the prefix attribute stating how the tags in that tag library will be identified and used on the page. Then, using the same syntax as before, we can use the tag to read the RSS feed provided by any Website, and generate a set of hyperlinks to the current news stories on that site.

<%@ taglib uri="/WEB-INF/rss.tld" prefix="rss" %>

<rss:rssFeed url="http://www.sitepoint.com/rss.php">
 <a href="<%= rssItem.getLink() %>"><%= rssItem.getTitle() %></a>
 <br />
</rss:rssFeed>

Future Enhancements

The tag presented here is fairly simple in its implementation, and there are many enhancements that could be made. For example, every time the JSP page is requested, the tag opens an HTTP connection to retrieve the contents of the RSS feed. While this is okay for a low-traffic site, a better solution would be to cache the feed and refresh it at regular intervals. This would avoid the performance penalty of opening a network connection for every page request.
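One way to approach this is a small time-based cache that keeps each feed for a fixed period before fetching it again. The sketch below is generic, so it could sit in front of RssReader.read(). The class name TimedCache, the time-to-live parameter and the injectable clock (used so that expiry can be tested) are assumptions, not part of the article's code.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.LongSupplier;

// Hypothetical cache: keeps each loaded value for ttlMillis before reloading it.
class TimedCache<V> {
  private static final class Entry<T> {
    final T value;
    final long loadedAt;
    Entry(T value, long loadedAt) { this.value = value; this.loadedAt = loadedAt; }
  }

  private final Map<String, Entry<V>> entries = new ConcurrentHashMap<>();
  private final Function<String, V> loader;
  private final long ttlMillis;
  private final LongSupplier clock;

  TimedCache(Function<String, V> loader, long ttlMillis, LongSupplier clock) {
    this.loader = loader;
    this.ttlMillis = ttlMillis;
    this.clock = clock;
  }

  V get(String key) {
    long now = clock.getAsLong();
    Entry<V> entry = entries.get(key);
    if (entry == null || now - entry.loadedAt >= ttlMillis) {
      // Missing or expired: fetch a fresh copy. Two concurrent callers may both reload;
      // that only costs an extra fetch, because the last write simply wins.
      entry = new Entry<>(loader.apply(key), now);
      entries.put(key, entry);
    }
    return entry.value;
  }
}
```

In a Web application you might hold a single instance in application scope. For example, new TimedCache<>(url -> new RssReader().read(url), 15 * 60 * 1000L, System::currentTimeMillis) would keep each feed for 15 minutes. The tag would then call get(url) instead of reading the feed directly.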

Also, the tag doesn't take into account what happens if a network error occurs. For example, the Website might be down or may not be functioning correctly. In any case, you would probably want to add some error handling, perhaps to display a message to alert users that the feed isn't currently available.
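A simple pattern is to catch the failure where the feed is loaded and hand the page either the feed or a message to display. The FeedResult class and its loadSafely() method below are hypothetical names used for illustration. The sketch assumes the reader reports failures as unchecked exceptions, as it must, since read() declares none. In the tag, doStartTag() could write result.message() through pageContext.getOut() and return SKIP_BODY when no feed is available.

```java
import java.util.function.Supplier;

// Hypothetical holder for either a successfully loaded feed or a user-facing message.
final class FeedResult<T> {
  private static final String UNAVAILABLE = "This feed is currently unavailable.";

  private final T feed;
  private final String message;

  private FeedResult(T feed, String message) {
    this.feed = feed;
    this.message = message;
  }

  boolean isAvailable() { return feed != null; }
  T feed() { return feed; }
  String message() { return message; }

  // Runs the loader and turns any runtime failure (site down, bad XML) into a message.
  static <T> FeedResult<T> loadSafely(Supplier<T> loader) {
    try {
      T feed = loader.get();
      return feed != null ? new FeedResult<>(feed, null) : new FeedResult<>(null, UNAVAILABLE);
    } catch (RuntimeException e) {
      return new FeedResult<>(null, UNAVAILABLE);
    }
  }
}
```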

Summary

In this article we've looked at what RSS is, how to read RSS feeds, and how to integrate this functionality into a JSP-based Web application. Although we could have built this functionality directly into the JSP page using Java code scriptlets, developing a JSP custom tag has allowed us to build a more maintainable component with the added advantage that it’s reusable, too.

Building the tag handler class and writing the TLD file does involve slightly more work than embedding Java code in the page. However, I believe that the benefits in maintainability and reusability easily justify the additional effort involved.