Comments on codebytez: Gates' Law

Anonymous (2008-09-05 22:17):

People want to delete the Wikipedia article "Gates' Law".

The article: http://en.wikipedia.org/wiki/Gates%27_law
The article before it was messed with: http://en.wikipedia.org/w/index.php?title=Gates%27_law&oldid=236567039

I would like there to be contributors either for or against; I don't think there is a balanced vote right now. You can vote here: http://en.wikipedia.org/wiki/Wikipedia:Articles_for_deletion/Gates%27_law

Thanks again.

cadaver (2006-03-26 22:20):

Thanks, everyone, for the feedback.

Scott, in case you are unaware, there ARE garbage collectors for C/C++ (http://www.hpl.hp.co.uk/personal/Hans_Boehm/gc/), and some of them definitely do some defragmenting. Also, you seem not to have followed my statement on virtual memory. Some applications run on huge-RAM (64 GB+) systems with paging disabled to guarantee nanosecond access performance. Unless you have one of those, please don't believe that page faults never occur as long as you allocate less than the available RAM; if enough applications are running, you WILL encounter page faults. Go ahead, try it.

As for mov vs. xor, only those old enough to have programmed a real application for an 8086 know what I'm talking about. We have spent hours changing applications to save that one byte (embedded/BIOS programmers) or that one millisecond of execution time (device drivers). The fact of the matter is that mov IS slower on an 8086. You can find this idiom all over Peter Norton's books (Guide to the IBM PC, etc.) from the early '80s.

Anonymous (2006-03-24 15:55):

@noocyte: "And parsing strings in Java could very easily end up being just as costly as creating new Scanner objects, since strings are immutable... Hence you get a new string each time you change it."

True, but when you use Scanner you also create Strings (in addition to the Scanner objects). Either way (manual parsing or Scanner), you'll have the original Strings and the Strings you need; the difference was really the Scanner objects. (It was a very simple parsing routine.)

"I can't speak for Java, but in .NET it really is cheap to allocate small objects."

"Temporary" objects? Maybe the .NET compiler allocates those on the stack (or maybe even in regions). That is what escape analysis can give you (and it is going to be in Mustang, according to the IBM article).

@anonymous: do you mean swap out of the cache, or swap out to disk?
I don't think swapping out to disk should be considered relevant anymore.

"No analysis of the cost of allocation means anything without including the full-cycle cost -- that is, of freeing too, and checking if it's free, and of pushing other useful stuff out to swap to make room."

This is certainly true, and something I was trying to point out.

"All that can be said confidently, given all the uncertainties, is that (1) where performance matters, dynamic allocation (off-stack) loses; and (2) where dynamic allocation is unavoidable, getting reliably tolerable performance from garbage-collected allocation is way, way harder than from malloc()/free()."

I totally agree with (2). I don't agree with (1), though; it's not always true. If you consider the full-cycle cost, a copying collector can be faster than malloc/free. It's very hard to make general statements here: no single collector gives the best performance in all cases, and you can get better performance by tuning a collector for a specific application.

Anonymous (2006-03-24 09:05):

We have a lot more software today because more people write it. But today everybody seems to believe they are a programmer, so it's inevitable that a lot of software has poor performance and quality. There is also a concept called "time to market", which unfortunately takes priority over performance and sometimes over quality. Just don't blame the languages for the mistakes programmers make.

Scott Lamb (2006-03-23 17:13):

"Scott doesn't tell the half of it"

Indeed, I did not. My point was not to explain all the intricacies of Java garbage collection but to point out that it (along with everything else) is not as the original poster described.

For those who want to start learning more about generational GC and escape analysis, this article is a decent starting point: http://www-128.ibm.com/developerworks/java/library/j-jtp09275.html. You will never really understand the performance of a system (especially one varying version to version as much as Java) without your own experiments (there are good tools for that: http://del.icio.us/slamb/performance+tools), but it at least has some more depth of analysis.

-- Scott Lamb (http://www.slamb.org/)

Anonymous (2006-03-23 12:45):

This article is living proof of why Java is better than you think: people who think they know, but don't. This was all discussed a long time ago: http://www-128.ibm.com/developerworks/java/library/j-jtp01274.html

And his whole "Java could put it on the stack" thing. From the article:

The JIT compiler can perform additional optimizations that can reduce the cost of object allocation to zero. Consider the code in Listing 2, where the getPosition() method creates a temporary object to hold the coordinates of a point, and the calling method uses the Point object briefly and then discards it. The JIT will likely inline the call to getPosition() and, using a technique called escape analysis, can recognize that no reference to the Point object leaves the doSomething() method.
Knowing this, the JIT can then allocate the object on the stack instead of the heap or, even better, optimize the allocation away completely and simply hoist the fields of the Point into registers.

This is why we have bad code: people like the writer. People like the writer are why we need Java -- to protect us against unmitigated arrogance.

Anonymous (2006-03-23 11:34):

> xor ax,ax
> inc ax

versus

> mov ax,1

xor reg,reg takes 2 (286), 2 (386), 1 (486 and above) clocks.
inc reg takes 2 (286), 2 (386), 1 (486 and above) clocks.
mov reg,immed takes 2 (286), 2 (386), 1 (486 and above) clocks.

So:
on a 286: 2+2 clocks vs. 2 clocks
on a 386: 2+2 clocks vs. 2 clocks
on a 486: 1+1 clocks vs. 1 clock

Anonymous (2006-03-23 11:31):

To noocyte. Anonymous said: "As an experiment try to compile the same code in Visual C++ 4 or 6 versus a new release of Visual Studio; I guarantee you that the code compiled with the old release will be faster AND much smaller in size."

So it's not about the newer compilers being slower; it's about newer compilers producing larger and slower executables than the old compilers did.

Anonymous (2006-03-23 07:23):

One other thing: on AMD processors, both code sequences -- like almost any other example of that ilk -- run at exactly the same speed.

Anonymous (2006-03-23 07:20):

Scott doesn't tell the half of it, and his anonymous interlocutor doesn't either. No analysis of the cost of allocation means anything without including the full-cycle cost -- that is, of freeing too, and checking if it's free, and of pushing other useful stuff out to swap to make room. If all memory were the same speed, and nothing got swapped out, and no other programs were competing for memory, that measurement might not be so hard.

In the Real World (TM), cache locality makes memory items not swapped out accessible in a cycle, or three, or a hundred, where items swapped out take millions or even billions of cycles to reach. Unfortunately, garbage collection, particularly when more than one GC program is running, makes it overwhelmingly more likely for items to be swapped out.

As a result, artificial benchmarks of GC performance are guaranteed to be drastically misleading. At the same time, malloc() allocation performance varies radically from one platform to the next; implementations that keep free lists that must be searched remain distressingly common.

All that can be said confidently, given all the uncertainties, is that (1) where performance matters, dynamic allocation (off-stack) loses; and (2) where dynamic allocation is unavoidable, getting reliably tolerable performance from garbage-collected allocation is way, way harder than from malloc()/free().

(A final note: assigning the result of C++ "new" to a raw pointer is almost always a mistake; use some sort of smart-pointer object. Any program with a "delete" statement is suspect.)

Anonymous (2006-03-23 06:27):

Despite what people here in the comments say, the creation of objects in Java is *not* cheap. It is true that most JVMs have a compacting (actually a copying) garbage collector, which in most cases is also generational. It does mean that allocating memory on the heap is not expensive; it comes down to incrementing a pointer. That does not mean the creation of an object is cheap. Object creation is really expensive, probably the most expensive operation in most Java programs. A created object takes up space on your heap, and even though the deallocation of such an object once it is no longer used is in fact free with copying collectors (they copy the "live" objects and never touch "dead" ones), the more objects you create, the more often you're going to trigger a garbage collection. And believe me, that is expensive.

A couple of days ago I reviewed a small bit of Java code: a little routine for parsing a very simple text format. The problem was that it needed over 15 minutes to parse a 3 MB file (I never let it finish, though). For each parsed line, 3 Scanner objects were created and discarded almost immediately. This is bad. I replaced it with simple manual parsing of a String for each line, avoiding the creation of numerous objects. Loading time went down to 7 seconds. This is an extreme case, but it shows that if you carelessly create objects, you'll pay a price.

@Scott Lamb: Technically, Java's memory allocation is not faster because of "generational" garbage collection but because of "compacting" garbage collection, which would be either mark-slide or copying collection.
Generational garbage collection is implemented on top of that to make collection in general more efficient; it can also reduce pause times. I assume you mean that allocation is faster in Java in the sense that it has a contiguous free region, while a C/C++ implementation will probably use a free list. I don't think that's going to make such a big difference, though.

Anonymous (2006-03-23 05:44):

You should all aim for Motorola processors. Since these have stack allocation flipped upside-down compared to Intel's, you might all see something much different :)). Wouldn't that be amazing??? Have a good day and happy programming.

(This was intended as a joke. I will now make room for smarter guys to share their thoughts.)

Anonymous (2006-03-23 04:20):

True or false, it isn't the general case. From the developer's point of view, compiling a program certainly takes much less time on a newer processor, and mathematics, computation, and graphics rendering have seen insane speed improvements on newer processors.

Another issue: when using Java, your aim is to reduce development effort, and the cost is the insane performance you could get from good assembly.

Scott Lamb (2006-03-22 20:51):

This article is unbelievably wrong.

Others have mentioned that the mov instruction is not quite as you've described.

I'm not sure what you mean by "defragmenting the heap". There's no way in C or C++ to move things around, since it's not possible to know whether a given memory address holds a pointer to another one (there isn't enough introspection for that). The only thing remotely close to this is the coalescing of contiguous empty regions within free().

You definitely don't understand virtual memory. There's a mapping layer; a large virtual memory space does not mean you get many page faults. You only get many page faults if you have a lot more memory allocated than you have physical memory and your accesses are non-local.

Java's heap allocation is also considerably different from (and faster than) C++'s in the common case, due to the memory layout allowed by generational garbage collection. The short version is that it can just keep a pointer to the unused region, go until it runs out, then run the garbage collector, which compacts things (the process I just described as impossible in C or C++).

The theoretical Java stack optimization you've referred to does not render the compiler prohibitively slow or incorrect. It's called "escape analysis", and it's in beta now.

-- Scott Lamb (http://www.slamb.org/)

Anonymous (2006-03-22 20:15):

This article makes no sense. Remember that "new" in Java isn't a direct mapping to malloc either; the amount of memory used by the JVM doesn't change much unless the program needs more than the initial size.

Bad coding can happen in memory-managed systems too, but that is because people hang onto objects for too long.

Most objects that don't survive until the next garbage collection are destroyed by it virtually for free. I think you need to read up on how a VM works before jumping to such conclusions.

Anonymous (2006-03-22 20:04):

This hasn't only to do with bad programming practices; it also has to do with "slower" compilers.

As an experiment, try to compile the same code in Visual C++ 4 or 6 versus a new release of Visual Studio; I guarantee you that the code compiled with the old release will be faster AND much smaller in size.

I suspect this has to do with Microsoft's contracts with Intel.

cadaver (2006-03-22 18:29):

Actually, on an 8086, for the instruction "mov ax, 1" the processor needs to fetch the immediate word from the location at CS:IP and move it to AX. So yes, there is a memory access. Irrespective of the actual implementation specifics, the point was that the programmer knew that "xor ax,ax; inc ax" was faster than "mov ax,1". Yes, with newer Pentiums it was the other way around, but by then hardly anyone was programming in assembler.

Anonymous (2006-03-22 18:28):

Hmm, wasn't memory allocation in Java supposed to be dirt cheap? From what I've heard, it's just a pointer increment.

And the fragmentation issue is rather elegantly solved too.
The most important technique in Java is to not keep your objects hanging around needlessly.

Anonymous (2006-03-22 15:18):

On x86, the inc instruction updates the flags register and therefore creates a dependency that can cause the hardware to stall.

Anonymous (2006-03-22 15:14):

It's all about bloat, man.

The absolute worst? Adobe. The CS2 apps are slow to start up and are memory hogs; even switching to Photoshop from another program is slow, and this on a 3 GHz P4 with 1 GB of RAM.

And indeed they stuffed it with all kinds of stuff; some of it even runs as a service...

Anonymous (2006-03-22 14:37):

> xor ax,ax
> inc ax

> rather than:

> mov ax,1

> The reason was simple. The programmer knew the latter alternative required access to memory

Actually, even an average programmer should know that there is no memory access involved. Yes, if I remember correctly, the first variant is one byte shorter, but that is all.

On the other hand, starting with the Pentium, the simple variant (mov ax,1) will run faster in most cases, as the other one introduces dependencies and cannot be parallelized by a superscalar machine.

Maybe it is rather this "smart approach" that makes the software run slow? :)

Anonymous (2006-03-22 14:07):

There's actually a law called Wirth's law: software gets slower faster than hardware gets faster.

Ref: http://en.wikipedia.org/wiki/Wirth%27s_law

Anonymous (2006-03-22 13:59):

Sadly, so true.
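The parsing anecdote in the thread (three Scanner objects created per line; load time dropping from over 15 minutes to 7 seconds after switching to manual String parsing) describes an allocation pattern that can be sketched as follows. The line format, class name, and method names here are hypothetical; only the contrast between the two variants reflects the comment:

```java
import java.util.Scanner;

public class ParseDemo {
    // Allocation-heavy variant: a fresh Scanner per line, as in the
    // anecdote. Each call allocates a Scanner (plus its internal buffers)
    // that is discarded immediately afterwards.
    static int valueWithScanner(String line) {
        Scanner s = new Scanner(line);   // one short-lived object per line
        s.useDelimiter(":");
        s.next();                        // skip the key
        return Integer.parseInt(s.next().trim());
    }

    // Manual variant: indexOf/substring only, avoiding the per-line
    // helper objects entirely.
    static int valueManually(String line) {
        int colon = line.indexOf(':');
        return Integer.parseInt(line.substring(colon + 1).trim());
    }

    public static void main(String[] args) {
        String line = "count: 42";
        System.out.println(valueWithScanner(line)); // 42
        System.out.println(valueManually(line));    // 42
    }
}
```

Both variants produce the same value; the difference is purely in how many temporary objects each iteration creates, which is exactly the cost the commenter measured.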
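Scott Lamb's description of allocation under a compacting collector ("keep a pointer to the unused region, go until it runs out, then run the garbage collector, which compacts things") can be simulated with a toy bump-pointer allocator. Everything below, the class, the fixed-size heap, and the collect() stand-in for a real copying collection, is invented purely for illustration:

```java
public class BumpAllocator {
    private final int capacity;
    private int top = 0;   // first free offset; allocation is one add + compare

    BumpAllocator(int capacity) { this.capacity = capacity; }

    // Allocate `size` units by bumping the pointer. Returning -1 stands in
    // for "heap exhausted; a collection would run here". A real copying
    // collector would evacuate live objects to a fresh region at this point.
    int allocate(int size) {
        if (top + size > capacity) return -1;
        int addr = top;
        top += size;
        return addr;
    }

    // After a copying collection the survivors occupy a compact prefix,
    // so allocation simply resumes from the end of the live data.
    void collect(int liveUnits) { top = liveUnits; }

    public static void main(String[] args) {
        BumpAllocator heap = new BumpAllocator(100);
        System.out.println(heap.allocate(40)); // 0
        System.out.println(heap.allocate(40)); // 40
        System.out.println(heap.allocate(40)); // -1: would trigger a collection
        heap.collect(30);                      // pretend 30 units survived
        System.out.println(heap.allocate(40)); // 30
    }
}
```

This also shows why a free-list malloc() differs: here the free region is always contiguous, so there is nothing to search, which is the allocation-speed point made in the thread.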
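The escape-analysis excerpt quoted from the IBM article (a getPosition() method creating a temporary Point that never leaves doSomething()) can be reconstructed roughly as below. This is a guess at the shape of the article's Listing 2, not a copy; the field names and the arithmetic are made up:

```java
public class EscapeDemo {
    static final class Point {
        final int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    // Returns a short-lived temporary object.
    static Point getPosition() { return new Point(3, 4); }

    // The Point reference never leaves this method, so after inlining
    // getPosition(), escape analysis can prove the allocation is local and
    // replace it with stack slots or registers (scalar replacement).
    static int doSomething() {
        Point p = getPosition();
        return p.x * p.x + p.y * p.y;   // use briefly, then discard
    }

    public static void main(String[] args) {
        System.out.println(doSomething()); // 25
    }
}
```

Whether the JIT actually performs the optimization depends on the JVM version and flags, which is part of why the commenters disagree about how cheap such temporaries really are.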