Collections in Java
Part 1 - The List Interface

Prior to Java 2, there were three basic ways to hold a group of objects: arrays, Vector, and Hashtable. Since Java 2, we have a much improved group of Collection classes to fill virtually any need. There are two main interfaces that inherit from the Collection interface: List and Set. Set objects do not allow duplicates and do not maintain their objects in the order they are added. List objects allow duplicates and maintain their objects in the exact order they are added. In this article we will look at the three classes that are implementations of the List interface: Vector, ArrayList and LinkedList.

Here is a look at the List objects and their relationship to the Collection interface:


The most efficient way to store a group of objects is still an array. Arrays are efficient both in storing and in randomly retrieving entries. The array is a simple linear sequence, which is why it is very efficient. In addition, an array can keep track of type information so that you can avoid having to cast objects to their proper type. With this efficiency comes a price. The size of your array must be allocated before it is used and it is not expandable. However, if you know how many objects you will need to store, then an array is your probably your best choice.

The three implementations of the List interface are Vector, ArrayList and Linked List. Which one you should choose will depend on the requirements of your implementation. An ArrayList is very efficient for getting objects from the list. The LinkedList is more efficient at insertions into and deletions from the list. It also performed very well at iterating through the list. The Vector performed worst of all 3 List objects. The Vector class is less efficient than the ArrayList class mainly because the Vector class is synchronizedI would recommend that that you run your own tests simulating your production environment to determine which List object will perform best for you.

Testing of various List objects using 50,000 repetitions.

Type

Get

Iterate

Insert

Remove

array 297 610    
ArrayList 406 1453 281 6344
LinkedList 3328 1032 62 16
Vector 1218 3125 297 6609

The Collection interface


The Collection interface is the parent interface of the List and Set interfaces. It contains many of the most commonly required methods to process a collection. The Collection interface supports sequential processing of the Collection. The methods to support random access of a Collection are found in the List interface. Here are some of the key methods of the Collection interface:

Using an Iterator

The Collection interface does not supply any get( ) methods, so you will need to use the Iterator class to get items from the Collection object. The Iterator class makes sequential processing of a Collection very easy. The key methods of the Iterator class are:

Here is an example of using an Iterator with an ArrayList:

Iterator myIterator = myCollection.iterator( );
while (myIterator.hasNext( )) {
System.out.println(myIterator.next( ));
}


Iterator and ListIterator (which we will discuss shortly) are fail-fast. That is, after the iterator is created if the Collection is changed by any method other than a method found in the Iterator class (remove or add) then the Iterator will throw a ConcurrentModificationException. However, Sun warns that it is possible that under some circumstances the Collection may not detect the change and the Iterator could exhibit "arbitrary, non-deterministic behavior". You should be careful to insure that your Collections are synchronized if they can be updated by multiple threads.

Using the List interface

If random access of the List is required, then the Collection interface cannot be used since it does not have the methods required. The Collection interface is the parent of both the List and the Set interfaces and since random access is not meaningful in the Set classes no random access methods are supplied by the Collection interface. The List interface does have methods for randomly accessing a Collection.
Some of the additional methods supplied by the List interface are:

Using the Listlterator interface

The Listlterator is a child of the Iterator interface and adds additional functionality that the Iterator interface does not provide. The additional methods allow for reverse traversing of the collection and for inserting or replacing elements from the collection. The Listlterator includes all the methods of the Iterator interface and adds these methods:

The ArrayList class

The ArrayList class is the simplest implementation of the List interface. The implementation of the ArrayList is an array that automatically reallocates itself to a larger capacity whenever it runs out of room. That is, when the array is full a new larger array is created and all the items are copied into this larger array. The ArrayList is very efficient for locating an element in the list and for inserting objects at the end of the list. The ArrayList class has only a few methods in addition to those found in the List interface:

Allocating an ArrayList

By default, ArrayList objects are allocated to be large enough to hold only a few objects. When the limit is reached, a new ArrayList is allocated larger than the previous one and the old ArrayList is copied to the new ArrayList. With the ArrayList class the default initial capacity is 10 and the formula to increase the capacity of the ArrayList each time it reaches its limit is (oldCapacity * 3)/2 + 1. Assuming that we use Collection c = new ArrayList( ) for an ArrayList which will hold 100 elements, we will guarantee that our ArrayList will be allocated 7 times (10 - 16 - 25 - 38 - 58 - 88 - 133). By allocating the ArrayList using Collection c = new ArrayList(100) we can avoid the overhead of reallocating and recopying our ArrayList.

The LinkedList class

The implementation of a LinkedList is a doubly-linked list. That is, when the list needs to expand, it allocates a new block and adds a link from the end of the original block to the new block and from the new block to the original block. In this way the list can be traversed in either direction. The LinkedList is very efficient at insertions into and deletions from the list. It also performs very well when iterating through the list.

The LinkedList class provides the required methods to simulate queues, stacks, and deques. The additional methods of the LinkedList are:

The additional methods are designed to allow you to use method names that will simulate queues, stacks, and deques. In actual use, there should be little difference between getFirst( ) and get(0) although the comments in the code suggest that the stack/queue/deque operations may run slightly faster.

The Vector Class


As we already mentioned, the Vector class is the only one of the three List objects which is synchronized. The Vector class was included in Java prior to the development of the Collection and List interfaces. It has been retrofitted into the List interface without removing the methods which were originally in the class. This is the reason that the Vector includes both an add method and an addElement method. When you use the Vector you should use only the methods found in the List interface so that it will be easy to switch to another List object if it becomes desirable.

Conclusion


We have taken a quick look at the main components of the List interface. As you can see, which List object you wish to use will depend greatly upon how you intend on using it in your production environment. You may wish to do performance checking using the various List objects to determine which is most suited for your needs.

Next month we will take a tour of the Set interface and explore the classes implementing this other main child of the Collection interface.