Thursday 7 August 2008

Hashcode and Equals

The hashcode method is inherited from the Object class. The signature of hashcode is
public int hashCode()

Generally a hashcode value will be the memory address of the object. You can demonstrate this to yourself quite easily with some trivial code such as.

public class ShowHash{
public static void main(String argv[]){
ShowHash sh = new ShowHash();
System.out.println(sh.hashCode());
}
}

When I compiled and ran this code I got an output of : 7474923
Which is a representation of the memory address of that class on that run of the program.

equals and hashCode

Every object has access to an equals method because it is inherited from the Object. However this default object does not always do anything useful as by default it simply compares the memory address of the object. To make the objects equal on your semantics, you need to override this method. Let us assume that If the String class did not implement its own version of the equals method comparing two Strings would compare the memory address rather than the character sequence. This is rarely what you would want, and for this reason the String class implements it's own version of the equals method that makes a character by character comparison.

Here is another of the points from the API documentation
If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.

This principle is illustrated with the following code,

public class CompStrings{
public static void main(String argv[]){
String s1 = new String("Hello");
String s2 = new String("Hello");
System.out.println(s1.hashCode());
System.out.println(s2.hashCode());
Integer i1 = new Integer(10);
Integer i2 = new Integer(10);
System.out.println(i1.hashCode());
System.out.println(i2.hashCode());
}
}

This code will print out the same hashCode value for s1 and s2 and i1 and i2 on each run of the program. Objects that are equal according to the equals method must return the same hashCode value.
What if I ask otherwise . if objects have same hashcode are they equal ?

Not necessarily... inverse is not true. If two objects are not equal according to equals, they are not required to return different hashCode values.

"The summary is if two things are equal (according to equal()), they must have the same hashCode".

One more thing you must keep in mind is if you override anyone of these you should be overridding the other also.

The hashcode is of particular use with the hash based Collection classes like HashTable, HashSet, HashMap etc ... The nature of hash based collections is to store keys and values. The key is what you use to look up a value.

Illusteration : How Hashtable works.

  • Hash table implementations uses the hash code value to quickly get to the memory location where an object is stored as a key in a hash table. It is not necessary that hashCode() return a unique integer for an object. If two objects return the same hash code, equals() method is used to pick the one the hash table is looking for.

  • So, what does happen when two unequal objects return the same hash code? When Java goes to put objectA as a key and some other object as its value in a hash table using myHashtable.put(objectA, someValue), it first goes to the location indicated by objectA's hash code. If there is already an object there - may be because objectB has been stored there as a key previously, Java checks to see if objectA.equals(objectB). If yes, it replaces the value of key objectB with "soemValue". If no, Java puts objectA and someValue at a different location and creates a link from objectB to objectA. Next time Java needs to get the value of a key, say objectA, from the hash table, it goes to the location indicated by the hash code of the key object. Finding two objects there, it checks if the first object (objectB) equals the object in hand (objectA). If yes, it returns the value for the first object. If no, it goes down the link and checks if the second object (objectA) equals the object in hand (objectA). If yes, it returns the value of the second object. And so on.

  • When two unequal objects return the same hash code, it is called a "collision". It sounds dreadful and you would think some damage occurred. But no, nothing bad happened and you don't need any collision insurance. It just slows down the process of storing and retrieving those objects a bit. From JDK documentation: "Note that the hash table is open: in the case a "hash collision", a single bucket stores multiple entries, which must be searched sequentially." It will be nice if all objects returned unique hash codes. But in real life, it is not so. After all, a hash code can return a maximum of 2 billion or so combinations (maximum Java integer value) where as you can have infinite number of objects. Some of them will necessarily return the same hash code. Collisions are inevitable.

  • From JDK documentation: "An instance of HashMap has two parameters that affect its performance: initial capacity and load factor. The capacity is the number of buckets in the hash table, and the initial capacity is simply the capacity at the time the hash table is created. The load factor is a measure of how full the hash table is allowed to get before its capacity is automatically increased. When the number of entries in the hash table exceeds the product of the load factor and the current capacity, the capacity is roughly doubled by calling the rehash method."

    "As a general rule, the default load factor (.75) offers a good tradeoff between time and space costs. Higher values decrease the space overhead but increase the lookup cost (reflected in most of the operations of the HashMap class, including get and put). The expected number of entries in the map and its load factor should be taken into account when setting its initial capacity, so as to minimize the number of rehash operations. If the initial capacity is greater than the maximum number of entries divided by the load factor, no rehash operations will ever occur."
  •