Arg! Twice as slow as Pentium...

rharder

Do not read this sign.
Argh! This is frustrating, and I wonder how much of it has to do with the way Objective-C messages are passed around in Cocoa:

I have a random number generator (a good one with 10^57 cycle length!), and I timed how long it took to generate one billion random numbers between zero and one.

On my work's 733 MHz Pentium III with Yellow Box for Windows NT it took 6.8 minutes.

On my PowerBook G4 with Mac OS X 10.0.4 it took 12.4 minutes.

Where's all the extra slow-down coming from? Even if the chips' architectures were not major players, you'd only expect a 5/7 relationship in performance.

No AltiVec can be used with the RNG's double and long calculations.

Very odd.

-Rob

 
Is the program a command-line one, or is it a GUI? Do you think Aqua has anything to do with the performance hit?

Maybe when 10.1 is released it will score better.
 
It's GUI. It consists of a button that says Go. Then there's a for loop that loops one billion times and generates random numbers. They're just generated--I don't even do anything with them.

-Rob
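For anyone who wants to reproduce this kind of test, here is a rough sketch of the timing loop in plain C. Rob's generator isn't posted in the thread, so `demo_next` below is a hypothetical stand-in (a simple 64-bit LCG), and `time_draws` is just an illustrative name. The last draw is handed back so the compiler has a reason to keep the loop.

```c
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in generator: a 64-bit LCG, NOT the RNG from this thread. */
typedef struct { uint64_t state; } demo_rng;

static double demo_next(demo_rng *r)
{
    r->state = r->state * 6364136223846793005ULL + 1442695040888963407ULL;
    /* Use the top 32 bits and scale by 2^32 to get a value in [0, 1). */
    return (double)(r->state >> 32) / 4294967296.0;
}

/* Generate `draws` numbers and return the CPU time used, in seconds.
   The final draw is stored in *last so the work is observable. */
static double time_draws(demo_rng *r, unsigned long draws, double *last)
{
    double x = 0.0;
    clock_t start = clock();
    for (unsigned long i = 0; i < draws; i++)
        x = demo_next(r);
    clock_t end = clock();
    *last = x;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Calling `time_draws(&rng, 1000000000UL, &last)` from a command-line `main` would take the GUI and the button out of the picture entirely.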
 
This is an absolute stab in the dark. I'm not sure how the NT box would be generating random numbers, but at the very least my OpenBSD box reseeds an "entropy pool" with presumably erratic data from network latency, serial inputs, system loads, etc. As OS X has some BSD roots (although I believed that was mostly in the network arena), is it possible that OS X is being slowed down because it's constantly refilling this entropy pool, while the NT box is just chucking out the numbers?

Even if this were the case, you can agree this is a fairly narrow performance niche to fill hehehe.

I can think of a thousand facts that would make this seem like a ridiculous idea, so I'll hush now. It was just a thought.
-stephen
 
You know, that leads me to another thought...

Might be that if you ran the program as root (different priority level) it would perform better.

Have you benchmarked it on your desktop machine? Maybe it is a PowerBook issue (slower clocks if using battery or something like that).
 
I'm not using the built-in RNG. It's a new one that's got much better properties and a much longer cycle length. Its processing consists of a bunch of addition and multiplication of doubles.

The two programs are identical, both running under ProjectBuilder, one on Windows NT, one on Mac OS X.

I could try running as root. Aside from the rumor that 'nice' and thread priorities aren't implemented, I'd be pretty upset to learn that I have to run code as root in order to get good performance! This is probably not it, though I'll give it a shot for troubleshooting's sake.

It seems as if the Objective-C message-passing has way too much overhead on OS X. This is surprising, seeing that Cocoa has been around under other names for at least a decade (and has no problems as Yellow Box). These kinks should have been ironed out years ago, if in fact these "kinks" are what's causing the terrible performance.

Still, one good lesson to take from this (which we all should have known anyway):

    Don't use Objective-C messages for performance-critical code.

Sure, use it to construct your app and be a wrapper, but for heavy math and such, use C/C++ where the lower overhead is the more important trait.

Cheers.

-Rob
 
I decided to run a test myself using PHP rather than C [which is WAY slower by the way]. The problem with my test is that the 4 test machines are not all running the same versions of PHP. When I upgrade PHP on my G4, I will post the execution time decrease if any.

Reatta: dual 733MHz PIII, 512MB RAM, PHP 4.0.6: Did nothing in 1.0672010183334 seconds

Delorean: single 733MHz PIII, 256MB RAM, PHP 4.0.6: Did nothing in 1.0971649885178 seconds

[PHP must NOT be multi threaded??]

Immaculata: single 400MHz PII, 256MB RAM, PHP 4.0.5: Did nothing in 1.5302710533142 seconds

Darklotus: Dual 500MHz G4, 1GB RAM, PHP 4.0.4pl1: Did nothing in 2.8723210096359 seconds


Here is the code in case you would like to try it out:
Code:
<?php

function getmicrotime(){ 
    list($usec, $sec) = explode(" ",microtime()); 
    return ((float)$usec + (float)$sec); 
} 

$time_start = getmicrotime();
    
for ($i=0; $i < 1000000; $i++){
    //do nothing, 1000000 times
}

$time_end = getmicrotime();
$time = $time_end - $time_start;

echo "Did nothing in $time seconds";

?>
 
rharder:

You mentioned that you did nothing with the random numbers that you were generating. If you are going to wait minutes for the results, why don't you just sum and average the random numbers?

At the end of the loop, assuming you are generating numbers between 0 and 1, you should get an average of "0.5".

Please post your results so we can see.
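A minimal sketch of that sanity check in C, assuming the standard library's `rand()` as a stand-in (it is not the generator being benchmarked, and `mean_of_draws` is a made-up name). Summing the draws also means the loop's work gets used, which a do-nothing loop doesn't guarantee.

```c
#include <stdlib.h>

/* Average `n` draws from [0, 1). rand() is only a stand-in generator here;
   swap in the real one to test it. Returns 0.0 when n is 0. */
static double mean_of_draws(unsigned long n)
{
    double sum = 0.0;
    if (n == 0)
        return 0.0;
    for (unsigned long i = 0; i < n; i++)
        sum += rand() / ((double)RAND_MAX + 1.0);  /* scale into [0, 1) */
    return sum / (double)n;
}
```

For a uniform generator the result should land close to 0.5, and closer as `n` grows.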
 
Ummm...yep. I get 0.499735 on one stream and 0.500080 on another after averaging one million draws from a U[0,1) distribution. This is on YellowBox. I don't have OS X here.

-Rob
 
hey rharder,

can you write a benchmarking app that would test the following...

1) random numbers like you have to test CPU.

2) memory access
a) alloc 32mb of data and do something
b) alloc 64mb of data and do something
c) alloc 128mb of data and do something
d) alloc 256mb of data and do something

3) max out all available real memory, to force virtual memory access. Then random access VM.

4) Random read HD.
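As a starting point for item 2, something like the following could time one pass over a buffer of a given size. This is only a sketch: `time_buffer_pass` is a hypothetical name, it measures CPU time with `clock()`, and it says nothing yet about virtual memory or disk access.

```c
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Allocate `bytes`, write every byte, then read every byte back.
   Returns CPU seconds for the write+read pass, or -1.0 if malloc fails.
   The byte sum goes to *checksum so the reads can't be skipped. */
static double time_buffer_pass(size_t bytes, unsigned long *checksum)
{
    unsigned char *buf = malloc(bytes);
    if (buf == NULL)
        return -1.0;

    clock_t start = clock();
    memset(buf, 1, bytes);               /* touch every page by writing */
    unsigned long sum = 0;
    for (size_t i = 0; i < bytes; i++)   /* then read it all back */
        sum += buf[i];
    clock_t end = clock();

    free(buf);
    *checksum = sum;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Running it with 32, 64, 128 and 256 MB (for example `time_buffer_pass((size_t)32 << 20, &sum)`) would give the four data points from the list.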


You would have benchmark data to compare between different versions of OS X, and different computers. You would also be able to test different customization techniques (like which HD to put the VM on). Make it user friendly, and make recommendations to users, like: "if you added 128mb more memory, you would gain an average of 10% increased performance..."

All of us here can supply benchmark data. I have an AGP 400mhz G4 with 256mb and an iMac 233mhz 128mb to throw into the mix.
 
Yeah, rharder, while you're at it, write a full-featured word processor that can read Word documents too :p
 
No problem. Here it is. [href]Oops[/href]

Hmm. That would be nice to try out though.

Here's another test I did: I tossed the Objective-C overhead and just performed millions of additions and multiplications on doubles. What took 3.5 minutes on my 500 MHz G4 took 2.1 minutes on a 733 MHz PIII.

So there you have it: this test suggests that with doubles, the G4's no better per-MHz than a Pentium. Pity.

I wonder if it's as bad with floats...

-Rob
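To put a number on the per-MHz comparison: minutes times clock speed is proportional to the clock cycles each machine spent, so the ratio of those products compares them clock for clock. A small sketch (the function name `cycles_ratio` is made up for illustration):

```c
/* Ratio of clock cycles spent by machine A to machine B on the same job.
   time x clock speed is proportional to cycles, so the units cancel.
   A result above 1.0 means A needed more cycles than B. */
static double cycles_ratio(double minutes_a, double mhz_a,
                           double minutes_b, double mhz_b)
{
    return (minutes_a * mhz_a) / (minutes_b * mhz_b);
}
```

With the figures from the doubles test, `cycles_ratio(3.5, 500.0, 2.1, 733.0)` is 1750 / 1539.3, or roughly 1.14, so the G4 needed about 14% more cycles than the PIII. If the PowerBook in the original Cocoa test is also 500 MHz (an assumption, the first post doesn't say), `cycles_ratio(12.4, 500.0, 6.8, 733.0)` comes out near 1.24.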
 
Converting all doubles to floats (and invalidating the RNG in the process), the 733 MHz PIII took 1.8 minutes to do the same task as above. Y'all are going to have to wait until I get home tonight to test the 500 MHz G4 against the same modification.

-Rob
 
One thought would be that Yellow Box for NT is cleaner (and better seasoned) because it is not that different from OpenStep Enterprise for NT (which has been around since at least NT 3.51). I would love to see you compare NT with Yellow Box, Rhapsody for Intel and Mac OS X Server 1.x on PPC (especially at the speeds of your systems; my Rhapsody for Intel system is at 133 MHz). Plus consider the overhead that has been added to Mac OS X during the Developer Preview versions. I knew people who could run Mac OS X Server 1.0 on unaccelerated 8600 and 9600 series systems; that would be out of the question with 10.0.4.

My guess is that we are going to get some of the missing speed back, but the rest has been aimed at enhancing the "user experience" with Aqua. Apple has good reason to want to do this now because it would be pretty bad if Web Objects (which uses many of the same tools as Yellow Box) ran better on NT/2000 systems than on Mac OS X systems (especially at the prices they charge for WO).
 
I don't know about OS X, but Linux has a random number generator integrated into the system, and it saves random seeds and such so that random numbers are rarely repeated. If OS X has that, maybe it accessed it every cycle.

Also, how about something other than random numbers? Maybe the user inputs two numbers and they are divided a million times or something. And, not to give you the shopping list, but have you thought about trying something like that in OS 9 (obviously different in every aspect)?



PS: Someone here has a computer named the same thing as my car.
And will someone please pick up more OJ at the store?
 
I guess I could try it in OS 9. I couldn't use the Objective-C stuff, obviously, but the meat of the RNG (as with all RNGs not based on things like random temperature fluctuations) is just a bunch of additions and multiplications of doubles, so I ought to be able to run it in 9.

BTW, sorry I haven't tried converting to floats and rerunning on OS X yet.

-Rob
 
I have recently been studying bitwise and binary manipulation. Instead of working with doubles, why not write a formula that works with binary, and only displays the result as a standard double?

For example, you have a page of a novel as your seed-text. Then you sample the time. Say it is: 4:32:59.483. Take the last three numbers (milliseconds) as your "start-at-char". You would have a set length of say 20 chars (for just over 15 digits of precision). The first number of the seconds would determine what type of operation you would perform: inverse, bitwise left-shift, bitwise right-shift, bitwise left-rotation, bitwise right-rotation, or unchanged. The last number of the seconds (9) would be used for the shift and rotation operations. All operations would be performed on the entire string as one big bit-block.

When you save the random number as a double, allocate it, then force the numbers into your predetermined spaces, e.g. 1.545017535281741e-1.

It might be a bit of a memory hog (because of the seed-string), but it should perform very fast operations.
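The six operations could look roughly like this in C, simplified to a single 64-bit word rather than the whole seed-string. The mapping of digits 0 to 5 onto operations is an assumption (the post doesn't give an order), "inverse" is read here as bitwise NOT, and the function names are made up.

```c
#include <stdint.h>

/* Rotate left/right by k bits; k is reduced mod 64 so k == 0 is safe. */
static uint64_t rotl64(uint64_t x, unsigned k)
{
    k &= 63;
    return (x << k) | (x >> ((64 - k) & 63));
}

static uint64_t rotr64(uint64_t x, unsigned k)
{
    k &= 63;
    return (x >> k) | (x << ((64 - k) & 63));
}

/* `op` is the first digit of the seconds (0-5), `amount` the last digit.
   The digit-to-operation order below is an assumed mapping. */
static uint64_t apply_op(uint64_t x, unsigned op, unsigned amount)
{
    switch (op % 6) {
    case 0:  return ~x;                    /* "inverse", read as bitwise NOT */
    case 1:  return x << (amount & 63);    /* left shift */
    case 2:  return x >> (amount & 63);    /* right shift */
    case 3:  return rotl64(x, amount);     /* left rotation */
    case 4:  return rotr64(x, amount);     /* right rotation */
    default: return x;                     /* unchanged */
    }
}
```

For the 4:32:59 example the call would be `apply_op(block, 5, 9)`, which under this mapping leaves the block unchanged. Note that the shifts throw bits away while the rotations and NOT don't.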
 
Interesting idea.

This particular RNG uses doubles because that guarantees 53 bits of precision in the mantissa.

-Rob
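As background on the 53-bit point: an IEEE 754 double has a 53-bit significand, so 53 random bits map exactly onto a double in [0, 1). One common way to do that conversion is sketched below. This is not how Rob's RNG works internally (that isn't shown in the thread); `unit_double_from_bits` is an illustrative name.

```c
#include <stdint.h>

/* Map 64 random bits to a double in [0, 1) by keeping the top 53 bits
   and scaling by 2^-53. Every result is exactly representable. */
static double unit_double_from_bits(uint64_t bits)
{
    return (double)(bits >> 11) * (1.0 / 9007199254740992.0);  /* 2^53 */
}
```

A float only carries 24 bits of significand, which is one reason the float conversion mentioned earlier would invalidate a generator designed around doubles.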
 
Really, this is very similar to the test I did with fib(40).
It might not have been 40, I don't remember. All tests on the same machine, dual G4 450, 320 MB RAM. Command line, calculated the 40th Fibonacci number in the most grotesque possible recursive way.

9 seconds under OS 9 metrowerks codewarrior
12.5 seconds under java on X - projectbuilder
17 seconds under C on X - projectbuilder

Any chance you could javafy your RNG and see how that compares?

the second processor was irrelevant in all tests.
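For reference, the "grotesque" recursive version presumably looked something like this in C. This is a guess at the shape of the test, not the poster's actual code.

```c
/* Naive doubly-recursive Fibonacci: exponential time, which is what makes
   it a handy function-call benchmark. fib(0) = 0, fib(1) = 1. */
static unsigned long fib(unsigned n)
{
    return n < 2 ? n : fib(n - 1) + fib(n - 2);
}
```

It is single-threaded by nature, which fits the observation that the second processor made no difference.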
 