Friday, December 31, 2004

Is BASIC good for anything?

Has BASIC become the red haired stepchild that nobody wants?

Some folks look down upon BASIC programmers; in fact they even avoid thier company. I'm not trying to say that all programmers are equal in the eyes of the CPU or anything, but remember that Bill Gates is basically a BASIC programmer and that BASIC was once the language of choice for many music related application in the 80's. BASIC's PLAY keyword makes it simple to program musical notes and I'm sure the language has a couple of other redeeming features as well (I just can't remember what they are at the moment).

BASIC's lack of the need for formal structure such as class declarations or include statements plus its interpreted nature makes it a popular language to learn programming even in places like India where everyone is allegedly born partially a programmer.

Speaking of India, here is a song written entirely in BASIC.

10 '+------------------------------------------------------------------------+
20 '|                      The Indian National Anthem                        |
90 '+------------------------------------------------------------------------+
110  PLAY "L8C#D#FFFFFFL4FL8FFD#FL4F#"  '| Jana gana mana adhinayaka jaya hai|
120  PLAY "L4FL8FFL4D#L8D#D#CD#L2C#"    '| Bharatha bhagya vidhatha          |
130  PLAY "L4D#G#L8G#L4G#L5G#"          '| Punjaba Sinda                     |
140  PLAY "L8G#L4G#L8G#G#G#L8A#L4G#"    '| Gujaratha maratha                 |
150  PLAY "L4F#L8F#F#L4FL8FFD#F#L1F"    '| Dravida uthkala vanga             |
160  PLAY "L4FL8FFl4Fl8fd#g#g#l4g#f#f#" '| Vindhya Himachala Yamuna Ganga    |
170  PLAY "L4fl8ffd#d#d#l8cl4d#l1c#"    '| Uthkala jaladi taranga            |
180  PLAY "L8ffffl4fl4fl8d#fl1f#"       '| Tava shubha name jage             |
190  PLAY "L8ff#g#g#l4g#l8f#fd#l8f#l1f" '| Tava shubha ashisa mage           |
200  PLAY "l4ffl8d#d#d#d#cd#l1c#"       '| Gahe Tava jaya gatha              |
210  PLAY "l8g#g#g#g#l4g#l8g#g#"        '| Jana gana mangala                 |
220  PLAY "l4g#l8g#g#g#l8a#l4g#"        '| Dayaka jaya hai                   |
230  PLAY "l4f#l8f#f#l4fl8ffl8d#f#l1f"  '| Bharatha bhagya vidhata           |
240  PLAY "o5l8ccl1c#l8co4a#l1o5c"      '| Jaya hai jaya hai                 |
250  PLAY "l8o4g#g#o5l1o4a#"            '| Jaya hai                          |
260  PLAY "l8c#d#fff#f#d#fl1f#"         '| Jaya jaya jaya jaya hai           |
270  END                                '+-----------------------------------+

Choosing the best suited language

But its got to be done in Jaavaa! (or .NET or VB or whatever)

No, Dr. Dre, you need to choose what's the most suitable language to solve the problem. Believe it or not, the complexity of the application changes dramatically with a language that thinks differently from the problem class at hand.

For example, consider a simple app that converts strips non-ASCII characters by converting them to their ASCII counterparts.

Lets say your manager loves BASIC and writes it as below


OPEN "unstripped" FOR INPUT AS #1
OPEN "stripped" FOR OUTPUT AS #2
WHILE NOT EOF(1)
     a$ = INPUT$(1, 1)
     n = ASC(a$)
     IF n > 127 THEN
           PRINT #2, CHR$(ASC(a$) - 128);
     ELSE
           PRINT #2, a$;
     END IF
WEND
CLOSE
END
That is quite reasonable, isn't it? You could probably write something equivalent in C, Java or related languages.
Yes, but look how much simpler it would be if Perl was chosen as the language of choice.

perl -pe 'y/\x80-\xff/\x00-\x7f/'
Its a one liner you type on the command line. For text processing, Perl tends to beat any other language.

Tuesday, December 07, 2004

Writing good hashcodes and equals

How would one write a good hashcode? Here is one I've seen in real life:

class Thing
{
    List itemlist;

    public int hashCode()
    {
        return toString().hashCode();
    }

    public boolean equals(Object o)
    {
        return o.toString().equals(toString());
    }

    public String toString()
    {
        return "Items = " + itemlist.toString() ;
    }
}
For reasons of brevity, I've drastically reduced the size of the original toString() function, but suffice to say, it printed the all elements of the entire complex Object with appropriate keywords. And the member itemlist was one that could grow indefinitely long. It was obvious that the dude who wrote it does not know what a hashcode is at all. A hashcode serves as a value for quick comparison between Objects. It is essentially used to quickly locate which bucket the item is stored in HashMaps and other data structures that provide near constant access time. If the hashCode computation takes longer than the equals comparison, then you need to brush up on your fundamentals. What could be done differently? A simple way to define a good hashCode is given below:
class A
{
    Integer a, b;
    String c;
    Set d;

    public int hashCode()
    {
        // Sum of (prime number * hashcode) for selected members
        return 7 * a.hashCode() + 11 * b.hashCode();
    }

    public boolean equals(Object o)
    {
        if(o == null || !getClass().equals(o.getClass()) || o.hashCode() != hashCode())
            return false;
        A other = (A)o;
        return other.a == a && other.b == b
            && c.equals(other.c) && d.equals(other.d);
    }
}
As you can see, while equal Objects imply equal hashCodes, the converse is not necessarily true. Most of the time, the check for class and hashCode will eliminate most matches. Among the few that do match, you just need to compare the itemlists without converting them to String first. This has the added benefit that the comparison can terminate early for example if the lengths are different. If you do not have any other members or you believe that they do not sufficiently provide a proper distribution of the hashcodes, you can define the functions as below:
class Thing
{
  private List itemlist;

    public int hashCode()
    {
        if( itemlist != null && itemlist.size() > 0 )
            return itemlist.size() * 7 + itemlist.get(0).hashCode();
        else
            return 0;
    }

    public boolean equals(Object o)
    {
        if (o == null || !getClass().equals(o.getClass()) || o.hashCode() != hashCode())
            return false;
        Thing thing = (Thing)o;
        return thing.itemlist == null && itemlist == null ||
            thing.itemlist.equals(itemlist);
    }

    public String toString()
    {
        return "Items = " + itemlist.toString() ;
    }
}
Of course, you have to ask yourself what business you have in returning toString() of classes that can be very large. If unavoidable, you should at least cache the results of toString() and hashCode() and update them on changes. This may need synchronization, so generally keeping hashcodes and toString functions simple is the better choice.

Thursday, September 16, 2004

Does Java pass Objects by reference?

You often hear that Java passes primitives by value and Objects by reference. Unfortunately, this is misleading. A reference has a specific meaning in programming terminology. A reference is the variable itself. Its not a pointer, its not a copy, its the actual variable. This means that if a variable is passed by reference to a function and the function assigns another value to it, the changes are visible outside the function. C++ implements references faithful to the original meaning while Java does not.
#include <iostream.h>

class Ref
{
        public:
        int one(int& it) { it = 1; } // Reassign variable
        void two(int& it) { it++; }  // Change variable
};

int main()
{
        Ref ref;

        int item = 0;
        ref.one(item);
        cout << "Item is " << item << endl;
        ref.two(item);
        cout << "Item is " << item << endl;

        int* i = new int(0);
        ref.one(*i);
        cout << "I is " << *i << endl;
}
Prints:
Item is 1
Item is 2
I is 1
Java on the other hand, passes a copy of the reference. This means that any changes to the variable are preserved, but you cannot reassign the variable to point to something else within the traditional meaning of "reference."
import java.util.Date;

public class Ref
{
        public void one(Date item) { item = new Date(); }

        public void two(Date item) { item.setTime(365*24*60*60*1000L); }

        public static void main(String[] args)
        {
                Date item = new Date();
                item.setTime(0);
                Ref ref = new Ref();
                ref.one(item); // Change not reflected here
                System.out.println("Item is "+item);
                ref.two(item); // Change is reflected here
                System.out.println("Item is "+item);
        }
}
Prints:
Item is Wed Dec 31 19:00:00 EST 1969
Item is Thu Dec 31 19:00:00 EST 1970
The net effect of this is that a Java function can change the Object passed to it, but any reassignment to something else is lost. Many argue that this is indeed safer and C++ funtionality can be achieved by having the function return the new value and let the caller assign it to the actual variable. It also protects the programmer from having to overload the = operator when calling outside code to ensure that the reference is not reassigned. Click here for good reference on References

Saturday, July 31, 2004

Tricks with Comments

Trick one: The code switch Sometimes it is essential to switch between two blocks of code. On systems that support C style comments, you could use the following construct:
//* Two slashes for version A, one slash for version B

int main() // version A
{
   printf("Version A\n");
}

/*/

int main() // version B
{
   printf("Version B\n");
}

//*/
Works on C/C++/Java/C#. For SQL:
--/*
SELECT * FROM A
/*/
SELECT * FROM B
--*/
Remove the -- on the first line to switch to the second block. Now, if anyone knows how to do it in other languages...

Trick two: Nested comments Some compilers support nested comments (actually, only Borland!). This piece of code helps you detect if the compiler supports nested comments:

#define NESTED /* /* /*/ 0 */*/*/1

main()
{
   puts(NESTED ? "Supported" : "Unsupported");
}

Thursday, July 29, 2004

Bad algorithms and caching

The fibonacci function is traditionally defined recursively:

fib(n) = fib(n-1) + fib(n-2); fib(1) = fib(0) = 1

This would typically be implemented in the venerable C language as:

unsigned long long fib(int n)
{
        return n == 0 || n == 1 ? 1 : fib(n-1) + fib(n-2);
}
The problem with this implementation is that the performance is atrocious.
$ time ./fib -r 40
165580141
 
real    0m18.263s
user    0m18.262s
sys     0m0.002s
The reason is that each higher number needs geometrically increasing number of recursive invocations:
 
 fib(0) = 1 call
 fib(1) = 1 call
 fib(2) = 3 calls
 fib(3) = 5 calls
 fib(4) = 9 calls
 fib(5) = 15 calls
 fib(6) = 25 calls
 fib(7) = 41 calls
 fib(8) = 67 calls
 fib(9) = 109 calls
 fib(10)=177 calls
 ....
 fib(40) = 331160281 calls
Curiously, the number of calls for fib(n) can be defined as:
 fib_calls(n) = 1 + fib_calls(n-1) + fib_calls(n-2); fib_calls(1) = fib_calls(0) = 1 
 
The performance problem is addressed by rewriting the function to work iteratively:
unsigned long long iterative_fib(int n)
{
        register unsigned long long n1 = 0, n2 = 1, tmp;
 
        int i = 0;
        for(i = 0 ; i < n; i++)
        {
                tmp = n2;
                n2 = n2 + n1;
                n1 = tmp;
        }
        return n2;
}
This gives a remarkable boost in performance:
$ time ./fib -i 40
165580141
 
real    0m0.003s
user    0m0.001s
sys     0m0.002s
Another way to get around the performance problem is to use caching with the recursive algorithm:
static unsigned long long cache[100];
 
unsigned long long cached_fib(int n)
{
        if(cache[n])
                return cache[n];

        return n == 0 || n == 1 ? 1 : (cache[n] = cached_fib(n-1)+cached_fib(n-2));
}
Performance is similar to that of the iterative version:
$ time ./fib -c 40
165580141
 
real    0m0.003s
user    0m0.000s
sys     0m0.003s
However, in a program that calls the fibonacci function numerous times, the cached recursive version would work on an average, faster than the iterative verion. The flip side is that caching uses up memory. After a certain threshold, the system starts swapping to disk and performance degrades rapidly. So you really want to select good algorithms wherever possible and save caching for operations that simply have to be cached such as those involving disk or network IO.

Click here to download the complete source

Thursday, February 19, 2004

Simplicity, the first principle

Simple programs evolve, complex programs die. 99 percent of features are under utilized in any program. Simplicity and elegance is way more successful than feature load. So is adherence to existing standards.