Advanced Ant Techniques, Part I
by Ajith Kallambella
Part I of a series

Introduction

This January, Ant celebrated its sixth birthday. In January 2000, the Ant tool, which James Duncan Davidson created to build his Apache project Tomcat, was moved out of the Tomcat source and into a separate Apache project. Since then it has brought the words "build" and "manageability" closer than ever. Ant gained popularity because of its simplicity, extensibility and cross-platform support, and within a short span of time it has become the build tool of choice for a large number of commercial and open source projects alike.

Although Ant has been extremely successful as an enterprise build management tool, most Ant implementations do not use some of the advanced features provided by the tool. In this article, we'll look at a few advanced techniques that can unleash the true powers of Ant, and turn this mere build tool into an indispensable enterprise asset. Examples are provided where applicable, but the lack of detailed code is intentional. It is important to focus on the concepts, not so much on the implementation. Software is a form of writing: ideas expressed in software, just like those in art, music, plays and other media, can be expressed in many unique ways. Familiarity with Ant and a spirit of experimentation are all you need to turn these techniques into real working code.

Technique: Divide and Conquer, Chain and Fail

As projects grow in size, Ant scripts follow suit, and over a period of time become monolithic and unmanageable. As large efforts are broken into multiple parallel development efforts, multiple projects with a fair degree of dependency are born. The stake in a stable, reliable build tool across the enterprise becomes even more important. In this installment of "Advanced Ant Techniques," we will look at some methods that help simplify the build scripts, and at strategies that allow your software development shop to grow without compromising manageability.

The theme is "divide and conquer": break up that single build file into multiple logical chunks before it becomes a maintenance nightmare. Once you have created these smaller build modules, the theme is to find ways to tie them together so that they work in synergy. Dividing and chaining are simple and powerful techniques that help keep the maintenance overhead down.

Design by contract using <import>

When you start to look across your build files, you'll invariably find some common structures: the ubiquitous <init> target and the property setup targets are good examples. Move them into a common file and very soon you will end up with a build system consisting of multiple independent components, with some shared foundational code. With this, the need for the <import> task surfaces.

Most people tend to overlook the additional features provided by the <import> task beyond enabling text includes. In particular, I'm referring to the concept of target overrides. When a file is <import>-ed into another file, the importing file can see all the targets in the imported file as if they were local. When the importing file redefines some of these targets, it is called target overriding.

The combination of target overriding and defining common targets in a shared build file can be used to design build modules for controlled reuse. The strategy lends itself to the principles of design by contract, with the shared file establishing the contract and the consumers of such shared files fulfilling the contract. Before it becomes too esoteric, let's look at an example.

Imagine a shared build file commontasks.xml:
<project name="CommonTasks">
    <target name="init" depends="cleanup, init-properties">
      <mkdir dir="${build.dir}/classes"/>
      <mkdir dir="${build.dir}/lib"/>
    </target>
    <target name="compile" depends="init">
      <javac srcdir="${src.dir}" destdir="${build.dir}/classes">
        <classpath refid="classpath"/>
      </javac>
    </target>
    <target name="jar" depends="compile" if="jar.name">
      <jar destfile="${build.dir}/lib/${jar.name}" basedir="${build.dir}/classes"/>
    </target>
    <target name="deploy" depends="jar"/>
</project>

If you observe carefully, this build file defines multiple abstract entities, each one serving as a contract for the main file, i.e., the importing build file. First, the abstract targets <cleanup> and <init-properties> must be implemented by the importing file. As the name suggests, <init-properties> is expected to initialize the various properties used throughout this file.

Next, the contract is verified by the <jar> target using the conditional if construct. The target will not be executed if the property jar.name is left undefined. Hopefully the importing file will have done this in the <init-properties> contract.

Finally, the no-op <deploy> target acts as a stubbed-out contract. To get any work done by this build file, this target must be overridden by the main file.

The main file can benefit from all the work done by the imported file, just by fulfilling a few contracts. When all contracts are fulfilled, the main file looks something like this:

<project name="MainProject" default="deploy">
      <import file="commontasks.xml"/>
      <target name="init-properties">
            <property name="src.dir"   value="src"/>
            <property name="build.dir" value="build"/>
            <property name="jar.name"  value="myapp.jar"/>
      </target>
      <target name="cleanup">
            ...
      </target>
      <target name="deploy" depends="jar">
            ...
      </target>
</project>
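Overriding does not have to mean replacing. Ant keeps an overridden target reachable under the imported project's name, so the main file can extend an imported target instead of rewriting it. The fragment below is a minimal sketch, assuming it is added to the MainProject file above; copying .properties files next to the classes is a hypothetical extra step, not part of the original example.

```xml
<!-- Fragment for the main build file, placed after <import file="commontasks.xml"/>. -->
<!-- It overrides the imported compile target, but still runs the original first
     through its prefixed name: CommonTasks is the project name of the imported file. -->
<target name="compile" depends="CommonTasks.compile">
  <!-- Hypothetical extra step: ship resource files alongside the compiled classes. -->
  <copy todir="${build.dir}/classes">
    <fileset dir="${src.dir}" includes="**/*.properties"/>
  </copy>
</target>
```

Because the imported <jar> target depends on compile by name, it picks up the overriding version without any change to commontasks.xml.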

What have we learned? The combination of covariants (abstract targets) and invariants (concrete targets) lets you easily design reusable build modules. Generic targets can be built using deferred target definition and verification strategies to ensure that a particular contract has been fulfilled.
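The if attribute on the <jar> target skips the target silently when jar.name is missing. If you would rather have a broken contract stop the build, the shared file can check it explicitly. This is only a sketch: the check-contract target name and the messages are hypothetical, and <init> in commontasks.xml would need to list check-contract in its depends attribute after init-properties.

```xml
<!-- Fragment for commontasks.xml: a hypothetical guard target. -->
<!-- It runs after the importing file's init-properties and fails fast
     if any property promised by the contract is still undefined. -->
<target name="check-contract" depends="init-properties">
  <fail unless="src.dir"   message="Contract not fulfilled: init-properties must set src.dir."/>
  <fail unless="build.dir" message="Contract not fulfilled: init-properties must set build.dir."/>
  <fail unless="jar.name"  message="Contract not fulfilled: init-properties must set jar.name."/>
</target>
```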

Refactor using <macrodef>

It is not uncommon to find the same task being invoked repeatedly with a small variance. In many cases, if you parameterize the variance, such tasks can be refactored and invoked repeatedly with different arguments, just like a Java method. Prior to Ant 1.6, the <ant> and <antcall> tasks came in handy. For the most part these two core tasks are similar, except that <ant> allows invoking targets in another build file whereas <antcall> restricts the invoked targets to the local file.

<target name="buildAllJars">
  <antcall target="buildJar">
    <param name="build.dir" value="tools"/>
  </antcall>
  <antcall target="buildJar">
    <param name="build.dir" value="moduleA"/>
  </antcall>
  <antcall target="buildJar">
    <param name="build.dir" value="moduleB"/>
  </antcall>
</target>
<target name="buildJar">
  <jar destfile="lib/${build.dir}.jar" basedir="${build.dir}/classfiles"/>
</target>

The problem is, <antcall> re-parses the build file and re-runs the targets, even though all that changes from one call to another is, in most cases, just the parameter values. Since Ant 1.6 a smarter alternative is available: the <macrodef> task. It offers all the benefits of <antcall> without the overhead of reparsing. It offers additional features such as ordered execution of nested tasks using the <sequential> construct. Optional attributes can be supplied with default values, and the caller may omit them.

<target name="buildAllJars">
  <buildJar build.dir="tools"/>
  <buildJar build.dir="moduleA"/>
  <buildJar build.dir="moduleB"/>
</target>

<macrodef name="buildJar">
  <attribute name="build.dir"/>
  <sequential>
    <jar destfile="lib/@{build.dir}.jar" basedir="@{build.dir}/classfiles"/>
  </sequential>
</macrodef>

We have now defined a macro and used it repeatedly within the same build file execution. Note that the body of a macro is wrapped in <sequential>, and that its attributes are referenced as @{build.dir} rather than ${build.dir}. On top of that, it is more readable than the <antcall> approach. From what I have seen, most if not all uses of <antcall> in build files can be replaced by <macrodef>.
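To illustrate the optional attributes mentioned earlier, here is a small sketch of the same macro with one defaulted attribute. The classes.subdir attribute and the bin directory are assumptions made up for this example; inside the macro, attribute values are read with the @{name} notation and the body sits inside <sequential>.

```xml
<project name="MacroDefaults" default="buildAllJars">

  <macrodef name="buildJar">
    <!-- Required: callers must always say which module to package. -->
    <attribute name="build.dir"/>
    <!-- Optional (hypothetical): falls back to "classfiles" when omitted. -->
    <attribute name="classes.subdir" default="classfiles"/>
    <sequential>
      <jar destfile="lib/@{build.dir}.jar"
           basedir="@{build.dir}/@{classes.subdir}"/>
    </sequential>
  </macrodef>

  <target name="buildAllJars">
    <!-- Uses the default: packages tools/classfiles. -->
    <buildJar build.dir="tools"/>
    <!-- Overrides the default: packages moduleA/bin. -->
    <buildJar build.dir="moduleA" classes.subdir="bin"/>
  </target>

</project>
```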

Chain and auto discover using <subant>

When you have multiple projects with a fair amount of dependency, and when new projects are initiated on a regular basis, extensibility becomes an important requirement.

An ideal build system should allow painless growth. By doing a little extra work, new build participants should be able to benefit from the existing foundation modules. A robust build system is like an assembly line in a software factory: it should lend itself to new components and dependencies without breaking existing functionality.

In reality, chaining works fine until it needs a change. Let's consider an example scenario. A large financial organization has multiple departments: banking, loans, insurance, customer service, etc., and each department initiates its own software development efforts.

They create separate enterprise applications which need to be vaguely aware of each other. Each software group writes and owns its build scripts, test scripts and so on. We can easily conceive of an enterprise build system that builds "the application," chaining all individual projects, and then runs integration tests. Everything works like a well-oiled machine until the banking group starts a new project. How do we include the new project in the master build? Well, we can edit the chain and include another <antcall>. Like me, you may see a problem here if you have to do this over and over again, every time a new project appears.

This is where the concept of auto discovery plays into the theme of chaining, and where the <subant> task becomes useful. It allows pluggable build files without changing the master build script. The master script can auto discover new build components and execute them at build time. Almost magically!

<subant> comes in two flavors: one executes the same build file against different base directories, and the other executes the same target defined in multiple build files.

The first flavor uses the concept of a generic Ant file: a single build file, named by the genericantfile attribute, that is run once for every project directory. This is similar to executing <antcall> on a target, but once per directory in a list, automatically setting the project's base directory each time. Because genericantfile applies only to directories, the nested collection should name directories, for example with a <dirset>, rather than files.

The following build takes the generic build.xml that sits next to the master script and runs its <main> target once for each immediate subdirectory of the current directory, with that subdirectory as the base directory.

    <project name="MasterBuild" default="buildAllSubProjects">
        <target name="buildAllSubProjects">
            <subant target="main" genericantfile="build.xml">
              <dirset dir="." includes="*"/>
            </subant>
        </target>
    </project>

The second mode of <subant> lets you omit the genericantfile attribute and instead supply the list of build files to iterate over, calling a specific target in each build file. This mode works as if an <ant> task were invoked within a loop.
    <project name="MasterBuild" default="buildAllSubProjects">
        <target name="buildAllSubProjects">
            <subant target="main">
              <fileset dir="." includes="scripts/project-*build.xml"/>
            </subant>
        </target>
    </project>

This script will automatically discover the build files for all the modules, as long as they are named per the convention "project-*build.xml" and are located in a scripts directory. With the capability to automatically discover build files, all you need to do when you add a new project is follow a simple convention for naming, structuring and locating the build files in a specific directory. As may now be apparent to you, the use of <subant> indirectly enforces a consistent project directory layout and consistent build file targets across multiple projects.
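If each project keeps its script at its own top level rather than in a shared scripts directory, a directory-based variant is possible. The sketch below assumes that every immediate subdirectory is a project containing its own build.xml with a main target; <subant> then looks for the file named by the antfile attribute (build.xml is also the default) inside each directory of the <dirset>.

```xml
<project name="MasterBuild" default="buildAllSubProjects">
  <target name="buildAllSubProjects">
    <!-- For each directory in the dirset, run <dir>/build.xml, target "main". -->
    <subant target="main" antfile="build.xml">
      <!-- Assumption: every immediate subdirectory is a buildable project. -->
      <dirset dir="." includes="*"/>
    </subant>
  </target>
</project>
```

A subdirectory without a build.xml would make this build fail, so narrow the includes pattern if the directory also holds non-project folders.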

Having been in the maelstrom of many large software development efforts, I have come to the realization that clean dependency is an oxymoron. In reality, the dependency tree across multiple projects looks more like a spaghetti ball than a tree. This is where the concept of horizontal dependency comes into play.

What is a horizontal dependency? If projectA needs the classfiles of projectB to compile, and projectC needs the classfiles of both projectA and projectB, then projects A, B and C are horizontally dependent on the <compile> target. Although <subant> is typically used to build subprojects in their entirety, it becomes an invaluable tool for addressing horizontal dependencies. Rather than executing everything in one <subant> task, in the order:

      projectB.<buildAll>, projectA.<buildAll>, projectC.<buildAll> ...

we could write two <subant> tasks. The first executes the horizontal <compile>:

      projectB.<compile>, projectA.<compile>, projectC.<compile> ...

And the second one would build them:

      projectB.<buildAll>, projectA.<buildAll>, projectC.<buildAll> ...
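A <fileset> makes no promise about iteration order, so when order matters, as it does here, an explicitly ordered <filelist> is a safer input. The following is a minimal sketch, assuming each project keeps a build.xml in a directory named after it and defines compile and buildAll targets; the target names and the id used in the master file are hypothetical.

```xml
<project name="MasterBuild" default="buildAll">

  <!-- Build files listed in dependency order: B first, then A, then C. -->
  <filelist id="projects.in.order" dir="."
            files="projectB/build.xml,projectA/build.xml,projectC/build.xml"/>

  <!-- Pass 1: satisfy the horizontal dependency by compiling every project. -->
  <target name="compileAll">
    <subant target="compile">
      <filelist refid="projects.in.order"/>
    </subant>
  </target>

  <!-- Pass 2: with all classfiles in place, run each project's full build. -->
  <target name="buildAll" depends="compileAll">
    <subant target="buildAll">
      <filelist refid="projects.in.order"/>
    </subant>
  </target>

</project>
```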

Failing

The discussion on chaining is not complete without talking about failing.

The technique of divide and conquer often results in a chained build system with dependencies manifesting as contracts, property definitions, build time artifacts (classfile dependencies as described above in the horizontal targets example), third party tools etc.

A strategy for error detection and error handling becomes essential to stop something bad from happening before the negative effects are passed through the chain. This is no different from catching and throwing exceptions in Java.

The failonerror attribute, which is set on a task such as <subant> rather than on a target, stops execution when a build exception is detected. It is prudent to make use of this attribute anywhere and everywhere possible in a chained structure.

        <target name="buildAllSubProjects">
            <subant target="main" failonerror="true">
              <fileset dir="." includes="*/build.xml"/>
            </subant>
        </target>

The <fail> task provides ways to flag errors. Preconditions, predicates and assertions can be implemented using the conditional constructs provided by Ant, namely the <condition> element and the if and unless attributes.
  <fail message="Logging tools missing. Cannot continue build process">
     <condition>
       <not>
         <available classname="org.mycompany.tools.CustomLogger"/>
       </not>
     </condition>
  </fail>
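As a further sketch of such preconditions, the target below guards a deployment with two checks. The target name, the host and the port are assumptions for illustration; the property names are borrowed from the earlier commontasks.xml example. The <socket> condition is true when something accepts connections on the given host and port.

```xml
<!-- Hypothetical guard target; a deploy target could list it in depends. -->
<target name="check-deploy-preconditions">
  <!-- Fail early if the jar to deploy was never built. -->
  <fail message="Nothing to deploy: ${build.dir}/lib/${jar.name} does not exist.">
    <condition>
      <not>
        <available file="${build.dir}/lib/${jar.name}"/>
      </not>
    </condition>
  </fail>
  <!-- Fail if something still answers on the (assumed) server port. -->
  <fail message="Server is still listening on localhost:8080. Stop it before deploying.">
    <condition>
      <socket server="localhost" port="8080"/>
    </condition>
  </fail>
</target>
```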
Checking the availability of a build-time dependency artifact, seeing if a property has been set, and ensuring that the server has been stopped before attempting to deploy are all examples of error scenarios where early detection is preferred over late recovery. It is always good practice to attach a meaningful error message to the <fail> task so that Ant displays something more useful than "BUILD FAILED".

In the next installment of this series, we'll look at using Ant in agile development, especially in automating continuous integration processes.