Tuesday, December 07, 2004

Writing good hashcodes and equals

How would one write a good hashcode? Here is one I've seen in real life:

class Thing
{
    List itemlist;

    public int hashCode()
    {
        return toString().hashCode();
    }

    public boolean equals(Object o)
    {
        return o.toString().equals(toString());
    }

    public String toString()
    {
        return "Items = " + itemlist.toString() ;
    }
}
For reasons of brevity, I've drastically reduced the size of the original toString() function, but suffice to say, it printed the all elements of the entire complex Object with appropriate keywords. And the member itemlist was one that could grow indefinitely long. It was obvious that the dude who wrote it does not know what a hashcode is at all. A hashcode serves as a value for quick comparison between Objects. It is essentially used to quickly locate which bucket the item is stored in HashMaps and other data structures that provide near constant access time. If the hashCode computation takes longer than the equals comparison, then you need to brush up on your fundamentals. What could be done differently? A simple way to define a good hashCode is given below:
class A
{
    Integer a, b;
    String c;
    Set d;

    public int hashCode()
    {
        // Sum of (prime number * hashcode) for selected members
        return 7 * a.hashCode() + 11 * b.hashCode();
    }

    public boolean equals(Object o)
    {
        if(o == null || !getClass().equals(o.getClass()) || o.hashCode() != hashCode())
            return false;
        A other = (A)o;
        return other.a == a && other.b == b
            && c.equals(other.c) && d.equals(other.d);
    }
}
As you can see, while equal Objects imply equal hashCodes, the converse is not necessarily true. Most of the time, the check for class and hashCode will eliminate most matches. Among the few that do match, you just need to compare the itemlists without converting them to String first. This has the added benefit that the comparison can terminate early for example if the lengths are different. If you do not have any other members or you believe that they do not sufficiently provide a proper distribution of the hashcodes, you can define the functions as below:
class Thing
{
  private List itemlist;

    public int hashCode()
    {
        if( itemlist != null && itemlist.size() > 0 )
            return itemlist.size() * 7 + itemlist.get(0).hashCode();
        else
            return 0;
    }

    public boolean equals(Object o)
    {
        if (o == null || !getClass().equals(o.getClass()) || o.hashCode() != hashCode())
            return false;
        Thing thing = (Thing)o;
        return thing.itemlist == null && itemlist == null ||
            thing.itemlist.equals(itemlist);
    }

    public String toString()
    {
        return "Items = " + itemlist.toString() ;
    }
}
Of course, you have to ask yourself what business you have in returning toString() of classes that can be very large. If unavoidable, you should at least cache the results of toString() and hashCode() and update them on changes. This may need synchronization, so generally keeping hashcodes and toString functions simple is the better choice.

No comments: