JavaRanch Newsletter Articles in this issue :
Collections in Java Part 2 - The Set InterfaceThomas Paul Printable Version
Cindy's Segment - a Whimsical View of the WorldCindy Glass Printable Version
Cattle Drive UpdateJason Adam
Pauline McNamara
Printable Version
Book Review of the MonthCindy Glass
Madhav Lakkapragada
Printable Version
July GiveawaysCarl Trusiak Printable Version
July NewsCarl Trusiak Printable Version

Collections in Java
Part 2 - The Set Interface

by Thomas Paul

Last month we discussed the basics of the Collection interface and reviewed one of the children of this interface, the List interface This month we will look at the second child interface, the Set interface. There are three non-abstract implementations of the Set interface: TreeSet, HashSet, and LinkedHashSet (new to J2SE 1.4). Set objects do not allow duplicates and are not required to maintain their objects in the order they are added. HashSet will maintain items in an order designed for fast lookup. LinkedHashSet will maintain the items in the order they are added while also maintaining a hash for fast lookup. TreeSet will maintain items in a sorted order. A look at the performance difference of the three shows that HashSet is generally the most efficient.

Testing of Set objects using 50,000 repetitions.

Type

Contains

Iterate

Insert

Remove

TreeSet 6469 594 438 47
HashSet 5188 640 329 15
LinkedHashSet 5297 375 343 32


Figure 1: relationships between the classes and interfaces of the Set interface

The Set interface

The Set interface does not add any methods to the Collection interface which is its parent. Any class implementing the Set interface is not permitted to have duplicate entries. The formal definition of the Set interface requires that no two elements in the Set are equal [such that e1.equals(e2) is never true]. Since a Set can not have duplicates, the Set can contain no more than one null entry. A Set is not required to maintain the elements of the Set in the order they are added although the LinkedHashSet does maintain order.

Here are some of the key methods of the Set interface:

  • boolean add(Object) - Ensures the Set holds the argument. The Object is added only if it isn't already in the Set. Returns false if the Object was not added to the Set.
  • void clear() - Removes all elements from the Set.
  • boolean contains(Object) - Returns true if the Set contains the argument.
  • boolean isEmpty() - Returns true if the Set contains no elements.
  • Iterator iterator() - Returns an Iterator object which can be used to traverse through the Set. (See last month's article for information on the Iterator interface.)
  • boolean remove(Object) - Removes the argument from the Set. Returns true if the argument was removed.
  • int size() - Returns the number of elements in the Set.
  • Object[] toArray() - Returns an Object array containing all the elements in the Set.

The HashSet class

The HashSet class is the simplest implementation of the Set interface. The HashSet does not add any additional methods beyond those found in the Set interface. The HashSet achieves good performance by using a hash to store the Object in the Set. The hash allows fast lookup. Any Object placed in a HashSet must implement the hashCode( ) and equals( ) methods. We will discuss more of this in a little while.

The HashSet does not guarantee the order of the items in the Set and allows one null element. Duplicates are not permitted. The add(Object o) will return false if you attempt to add an item already in the HashSet. The HashSet offers "constant time" performance. This means that adding items to the Set will not cause significant performance degradation. The performance of basic functions such as add, remove, etc is based on two factors which can be specified in the constructor of the HashSet, initial capacity and load factor.

Choosing an initial capacity that is too high can waste space as well as effect performance. (The constant time performance of iterating through a HashSet is based is based on the total of the number of entries plus the capacity.) The default initial capacity is 16. The capacity should never be smaller than you expect entries in the HashSet and should be at least large enough so that the load factor does not cause the HashSet to increase in size. Sun recommends that a HashSet be allocated with a capacity of twice the predicted number of entries.

Load factor is a measure of how full the HashSet is allowed to get before its capacity is increased. When the number of entries exceeds the product of the load factor and the current capacity, the capacity is doubled. As an example, if the initial capacity is 100 and the load factor is .75 then the HashSet will double in size to a capacity of 200 when the HashSet reaches 75 entries. Generally, the default load factor (.75) offers a good trade-off between time and space costs. Higher values decrease the space overhead, but increase the time it takes to look up an entry. Load factor is always specified as a number from 0.0 to 1.0.

As I mentioned above, classes that are to be stored in a HashSet must implement the hashCode() and equals() method. Here is an example of this implementation. We will use this same class for the examples that follow.

public class Employee {
	private String socSecNum;
	private String empNum;
	private String name;
	private int department;

	public String getName( ) {
		return name;
	}
	public String getEmpNum( ) {
		return empNum;
	}
	public int getDepartment( ) {
		return department;
	}
	public String getSocSecNum( ) {
		return socSecNum;
	}
	public boolean equals(Object obj) {
 		boolean retValue = false;
    	if (obj instanceof Employee) {
			Employee e  = (Employee)obj;
			retValue =  empNum.equals(e.getEmpNum( ));
		}
		return retValue;
	}
	public int hashCode( ) {
		return empNum.hashCode( );
	}
}


The LinkedHashSet Class

New to J2SE 1.4, the LinkedHashSet class is used in exactly the same way as the HashSet class. It has no additional methods and has constructors that take identical parameters as the HashSet. The only difference is that the LinkedHashSet maintains the order of the items added to the Set. It does this by maintaining a doubly linked list containing the hash and the original order of the items. According to Sun, the LinkedHashSet should run nearly as fast as the HashSet.

The TreeSet class

The TreeSet class guarantees that the Objects in the Set will be sorted in ascending order by either the Objects natural order or by a Comparator provided to the constructor of the TreeSet. Searching for an item in a TreeSet will be slower than in a HashSet because the hashing algorithm gives better performance than the compareTo() method which is used to locate items in a TreeSet.

TreeSet does add several new methods to the Set interface.

  • Object first() - Gets the first element from the sorted set.
  • Object last() - Gets the last element from the sorted set.
  • SortedSet headSet(Object toElement) - Returns a view of the portion of this set whose elements are less than toElement.
  • SortedSet tailSet(Object fromElement) - Returns a view of the portion of this set whose elements are greater than or equal to fromElement.
  • SortedSet subSet(Object fromElement, Object toElement) - Returns a view of the portion of this set whose starting element is greater than or equal to fromElement and whose ending element is less than toElement.

Sorting Collections

You have two choices for determining how to sort items in the TreeSet. Whichever option you choose must guarantee that items found to be equal are really equal. In one of our examples we will be adding items to the TreeSet based on name. This will only work if none of our employees have the same name! Remember, the TreeSet does not accept duplicate entries.

The first option is to only use Objects that implement the Comparable interface in the TreeSet. For your own classes, you will need to implement the Comparable interface if you wish to store them in a TreeSet. The Comparable interface has a single method:

int compareTo(Object o)

The compareTo() method will return a negative number, zero, or positive number depending on whether the Object is less than, equal to, or greater than the Object argument. It is recommended that Objects that implement the compareTo() should ensure that if the equals() method of the Object return true then the same comparison using compareTo( ) should return zero.

(x.compareTo(y)==0) == (x.equals(y))

An example of a class using the Comparable interface we would make these changes to our Employee class:

public class Employee implements Comparable {
	// The comparison can be done on any field we choose.
	// For this example we have decided to sort by empNum.
	public int compareTo(Object obj) {
		Employee e  = (Employee)obj;
		return empNum.compareTo(e.getEmpNum( ));
	}
}

The second option is provide a Comparator for the TreeSet to use. The advantage of the Comparator is that we can create multiple Comparators that can be used to sort our TreeSet in multiple ways. Using the example above, we could add two Comparators; one for name and a different one for socSecNum.

public class Employee implements Comparable {
	public static final Comparator SOC_SEC_ORDER  = new Comparator( ) {
		public int compare(Object o1, Object o2) {
			Employee emp1 = (Employee)o1;
			Employee emp2 = (Employee)o2;
			return emp1.getSocSecNum( ).compareTo(emp2. getSocSecNum( );
		}
	};
	public static final Comparator NAME_ORDER  = new Comparator( ) {
		public int compare(Object o1, Object o2) {
			Employee emp1 = (Employee)o1;
			Employee emp2 = (Employee)o2;
			return emp1.getName( ).compareTo(emp2. getName( );
		}
	};
}

Now in a class that uses a TreeSet with Employee objects I can use this constructor for my TreeSet to create a TreeSet that will keep my Employees in name order:

TreeSet ts = new TreeSet(Employee.NAME_ORDER);

The same options used for TreeSet can be used for any Collection. A Collection can be sorted by using the static Sort method of the Collections class. (Notice the extra "s" at then end of "Collections".) So to sort an ArrayList of Employee objects by Social Security number, we would do the following:

Collections.sort(empArrayList, Employee.SOCIAL_SECURITY_ORDER);

Since Employee implements the Comparable interface, to sort by employee number we can simply do:

Collections.sort(empArrayList);

Conclusion

We have taken a quick look at the main components of the Set interface and also looked at how we can sort items in a TreeSet or in any Collection. As we have seen, which Set object you wish to use will depend greatly upon how you intend on using it in your production environment. Performance is less a concern in choosing among Set objects. The main concern is the functionality you require. If your only requirement is fast lookup, then you can use HashSet. If you need to be able to retrieve your items in the same order that you added them to the Set, then you would use the LinkedHashSet. And finally, if you need the items in your Set to be maintained in a specific sort order, then you would use the TreeSet.

Next month we will take a tour of the Map interface.


Return to Top

Cindy's Segment - a Whimsical View of the World

Segment 5:
Second Class Citizens
Or -
Living Life by the String Pool


Tenements! Let?s talk tenements. You think that life is all comfy here in JVMLand? You have visions of ClassFile Castles, with Royal Static Variables and rows of orderly Object Houses out there on the heap? Well, we have already talked about how "Not all Variables are created equal", but in truth, not even all OBJECTS are created equal!!

Life in the Constant Pool

In the courtyard of every ClassFile Castle (that would be the Castle built from the ClassFile plans) there is a Method area, where there is kept the actual documentation of how each Method is expected to behave. Carved out of this Method Area is a thing called the ?Runtime Constant Pool?, and a fine thing that it is, too. It is a gathering of all of those things that are going to be used by the Class and are known to be constant for the entire run of the JVM. This includes numeric values that are referenced in final variables. It includes the names of the methods of the class, the names of the fields of the class, the names of any interfaces of the Class, the name of the Class itself is a constant in the Constant Pool. Every ClassFile Castle has a map called the ?Constant Pool Table? that defines how this Runtime Constant Pool is laid out.

And why is this so? Well, if things are KNOWN to be constant, then they can just be stuck in a table that can be pre-indexed for speed and efficiency, and things just go whizzing along. This way you do not need to spell out a method name every time you want to invoke it. The compiler just uses a reference to where the name is in the Constant Pool Table and saves all that space. Then the Runtime Constant Pool holds the reference to where the method itself can be found in the Method area.

All of which would be great if things stopped there. But NO, some Gosling guy got the great idea of sticking String literals in there also. Sort of like letting the postman just find your box in the post office, instead of having to walk the sidewalks of the suburbs to find you.

So now you have String literals mapped in the Constant Pool Table in the ClassFile, with an actual reference in the Runtime Constant Pool in the Method area. And does all this just point to some instance out on the heap??? NO!! Instead of letting the actual instance of the String live out on the heap with all of the other objects, they are forced into a concentration camp called the String Pool. The actual location of this String Pool is top secret. Some say that it is in the Method Area. Some say that it is in a special area of the heap. Only the JVM knows for sure.

So ? well ? it doesn?t sound SO bad living in a Pool.

Branded ?8?

Now let me tell you how it REALLY is. First of all, in the Constant Pool Table you are forced to be marked with an ?8? so everyone can easily see that you are a String. Sort of like sewing a ?Scarlet Letter? on your sleeve. Other constants get other labels. Integer constants are marked ?3?, Field name references are marked ?9? etc. Every different constant type gets its own special label. But for heavens sakes, THOSE OTHER GUYS are not objects like Strings are.

In addition, you are forced to do the work of many objects. Once you are stacked in this String Pool, any other occasions where a class wants to use the same String value, they just point to you and make you do that also.

It is just miserable being a String. There is NO room for you to store other variables as part of your state. You are way overworked. But the worst part is that being a String you are NEVER allowed to change your value. And once you are slammed into that String Pool ? you are stuck FOREVER. (Well . . . maybe until your Class is unloaded and you go . . . ahhhh ? no need to get into religious issues).

Strings on the Heap

Now in all fairness, not ALL Strings are crowded into this concentration camp. If you happen to be lucky enough to have been created using the ?new? keyword instead of with literal syntax, then you DO get your own home in the suburbs. But even so, you are considered Second Class Citizens, and as such are not allowed to change your value EVER. At any moment you could be rounded up by the Gestapo and forced into the String Pool. All it takes is ONE person pointing at you and saying ?intern? and off you go to spend the rest of your existence in the tenements.

Kissing cousins ? StringBuffers

Needless to say, sometimes folks get a little irritated by this thing about you not getting to ever change your value. Actually it gets to them enough that even good friends sometimes desert you. And do you know who they end up with? Invariably they end up hanging out with the lucky branch of the family, the StringBuffers.

StringBuffers are never treated as shabbily as Strings are. They are never stuck in the String Pool. They can change their value as often as they want. And do you know WHY? Because behind every StringBuffer is an array doing the dirty work for him. And StringBuffers are ALLOWED to tinker with the values contained in the array. And the StringBuffers just discard the array and set up a new one if the need arises. Sort of like paying for your freedom at the price of the bodies of other objects.

Movement for String Rights

It really is sad how badly this one Class is treated. All the OTHER Classes get their own Object Homes on the heap. All the OTHER Classes allow the objects to change their values. None of the OTHER Classes re-use the same object over and over and over. Heck, even the pitiful arrays get to be treated as real objects, and THEY don?t even have a ClassFile Castle of their own. They live at the mercy of the JVM itself.

What we need here is someone to stand up for String Rights. Someone who will crusade for equality. After all, who CARES about efficiency when personal rights are at stake. What does performance matter when the lives of so many objects are impacted. Would YOU want to be one of the downtrodden, fodder in the gears of bureaucracy?

What we need is a spokesperson with charisma and candor. A force for right and justice. A Martin Luther String Jr. to lead us into a better world ? or JVM as the case may be.

Will YOU do it? Will YOU be the voice for String Rights. Come on. Join the multitudes of caring souls who yearn for a world where all objects are created equal.

Thank You for caring.

*******************************************************************

editorial comment from Jim Yingst:

Actually, it sounds like the String literals have a serious persecution complex where they think they're the only ones suffering. All String instances labor under many of the same restrictions, but you don't hear them whining about it. In fact the String literals seem oblivious to the security benefits of their nice little gated community, safe from roving gangs of GC thugs found elsewhere. It's actually a rather elitist place to live, where you can't get in unless you pass the intern() method's check for uniqueness. You'd think these literal Strings would show some compassion for their oppressed brethren out in the heap, but NOOOOO...

*******************************************************************

Copyright ? 2002. Cindy Glass. All rights reserved.


Graphics by Pauline McNamara and Stephanie Grasson

Next month - "Maintaining your Structural Integrity"- or - "Corned Beef Hash. . . . Codes"

?

?


Return to Top
Movin' them doggies on the Cattle Drive

It's where you come to learn Java, and just like the cattle drivers of the old west, you're expected to pull your weight along the way.

The Cattle Drive forum is where the drivers get together to complain, uh rather, discuss their assignments and encourage each other. Thanks to the enthusiastic initiative of Johannes de Jong, you can keep track of your progress on the drive with the Assignment Log. If you're tough enough to get through the nitpicking, you'll see your name on the Graduation Log.

Gettin' them doggies...
This month the number of Cattle Drivers wrastlin' Java code has hovered around 20. The majority are still in the OOP and Servlet corrals, but this month we got a fresh batch of Drivers out there takin' on Java Basics, and they ain't easy either! Activity in the JDBC corral is picking up too...

Fresh riders on the Drive...
Got a bunch o' new Cattle Drivers signed up on the assignment log, all jumpy and chompin' at the bit to drive them doggies. Big welcome to our latest batch of fresh riders: Craig Lewis, Julia Koelsch, Juliane Gross, Peter Berquist and Rob Ross. Enjoy the ride and hold on tight, it can get bumpy! A belated welcome to our own David O'Meara, hang in there mate, them doggies can be tougher than a pack o' alligators!

And a shiny spur goes to...
Once again, we got a pack o' spurs to hand out to those hardy souls who made it through a section of the Cattle Drive. Elouise Kivineva picked up her first silver spur for finishing Java Basics. Way to go, Elouise, have fun ropin' the rest!

Three tenacious Drivers roped up enough clean code to drive those OOP doggies: Joel Cochran, Josu? Cede?o and Ronald Schindler can be mighty proud of their second silver spurs. Nice ridin' guys!

A shiny golden spur for a Cattle Drive regular who just keeps on codin', Peter Gragert, who has been ridin' an' ropin' real steady. Way to ride 'em, Peter! Peter's real active on the Drive and has already tied up a JDBC assignment or two.

Just last month the dust was kicked up in the JDBC corral, and it looks to be stayin' pretty wild in there. Special congrats to Daniel Olson and Lance Finney, who've been in the saddle quite a while ropin' that JDBC code. Well they made it out o' there and now it's sarsaparilla time in the saloon! Don't be thinking about leaving town at sundown, y'all are welcome to hang out and give a hand to the dusty ones still on the trail - after the celebration of course!

Back from being out on the range...
Sometimes long lost Cattle Drivers untangle themselves from the sagebrush and actually make it back to the Drive. This month some familiar faces, with maybe an extra scratch or two, made it back to pitch in and code again. A hardy welcome back to Gerald Dettling, Rick Prevett and Ronald Schindler, who returned to wrastle some more assignments!

Saddle sore...
So close and yet so far, that's how it feels when you're just about to graduate from one of the Cattle Drive "schools." Greg Harris knows the feeling, he's working on the last of the Servlets. Hang in there Greg, you'll get to sit in a real chair soon!

Following the good example of Daniel and Lance, Jason Adam has been itchin' to get out of that JDBC corral. Keep yer spurs in tight, Jason, you can ride it out, fer sure!

Takin' a break to look after the critters...
Sure 'nough hate to have to say it: our bartender Jason Adam, who's been right busy at JavaRanch and on the Cattle Drive, is going to take a break to work on the homestead and plenty of other stuff keeping him busy outside JavaRanch. Thanks for all the hard work and dedication on the Drive, Jason. We'll be watching the horizon and listening for hoofbeats, hoping whole bunches that you'll be riding this way again real soon!

Takin' orders at the bar...
Matthew Phillips, a regular at the Cattle Drive, has picked up bartending duties to fill in the gap left by Jason. Appreciate the help Matt, those ranchhands can get rowdy sometimes!

Nitpicking is hard work too...
We know they're workin' reeeeeally hard, after all we've seen what those assignments look like once the nitpickers have combed through 'em. Hats off to Marilyn deQueiroz and Jason Adam for their dedication and patience with the pesky vermin that always manage to make their way into those assignments. Newbie nitpicker Pauline McNamara has started collecting nits too.

Tips for the Trail...
Working on an assignment with a bunch of methods in it? (Now which one ;-) could that be?) It's still all about readability: remember to declare a method before you call it from another one. Happy codin'!

Written by Pauline McNamara, Jason Adam and Marilyn deQueiroz


Return to Top
Book Review of the Month

Applying UML and Patterns - An Introduction to Object-Oriented Analysis and Design and the Unified Process (2nd Edition)
by Craig Larman
Craig Larman has outdone himself. This edition, with its many changes and new sections, is really almost a rewrite rather than an enhancement.

This edition uses the Unified Process (UP) as a sample iterative process, replacing the first edition?s generic set of Recommended Process and Methods (RPM). Many of the changes in the text and diagrams revolve around this shift. If you don?t use UP, don?t worry: the material is still very relevant since the basic ideas can apply to any iterative development process.

Java programmers will also feel right at home since the code examples are in Java. If you use another object-oriented language, you should still be able to follow the discussion. Larman expands his discussion of object-oriented design principles and the General Responsibility Assignment Software Patterns (GRASP) introduced in the first edition, particularly the "Don?t Talk to Strangers" pattern which has been incorporated into the more general "Protected Variations" pattern.

Other changes include updating use cases to follow the approach popularized by Alistair Cockburn and including a third iteration in the POS System case study. My favorite addition was the new "You Know You Didn?t Understand..." sections. They listed common misconceptions about various UP concepts-" a useful aid to quickly review your understanding of the preceding sections.

If you read the first edition, you?ll probably want to read this one, too. Highly recommended for anybody who wants to learn the basics of UML and iterative OOAD.

(Junilu Lacar - bartender - June 2002)

More info at Amazon.com More info at Amazon.co.uk

Chosen by Cindy Glass and Madhav Lakkapragada

Return to Top
July Giveaways

Starting Date Book Author(s) Publisher JavaRanch Forum
July 9 J2EE and XML Development Kurt Gabrick Manning Publications Co. XML, XSL, DOM and SAX
July 16 SCWCD Exam Study Kit Hanumant Deshmukh
Jignesh Malavia
Paul Anil
Manning Publications Co. SCWCD
July 23 Java for the Web with Servlets, JSP, and EJB: A Developer's Guide to J2EE Solutions Budi Kurniawan New Riders Publishing J2EE and EJB
July 30 JSTL in Action Shawn Bayern Manning Publications Co. JSP

For more information on JavaRanch's Giveaways see Book Promotion Page.

Return to Top
July News

Hey, we've added a Tack Room. No one should be out on the range without their vital JavaRanch supplies. T-shirts, caps, lunch bags, and mugs will keep you ready whether you are out on a cattle drive or just dropping in to the saloon.

As always JavaRanch is dedicated to providing you with the best source of information on Java Programming and Engineering.

by Carl Trusiak


Return to Top

Managing Editor: Carl Trusiak

Comments or suggestions for JavaRanch's NewsLetter can be sent to the NewsLetter Staff
For advertising opportunities contact NewsLetter Advertising Staff