# OS X vs. Linux (x86) speed?



## bbloke (Sep 18, 2003)

A friend of mine who is very experienced with Linux and various forms of UNIX asked me to run a speed test via the Terminal app.  He said that his laptop (running Linux) executed the following command in 48 seconds:

```
time awk 'BEGIN {for(i=0;i<10000;i++)for(j=0;j<10000;j++);}'
```

On my QuickSilver DP 1 GHz machine, it took...  60 seconds!  I asked how fast his laptop was and was disappointed to find it was only a 1.13 GHz (Pentium) machine.     

I'm not out to start any G4 vs. Pentium debates, but I wondered why his laptop would astonish me by performing so much better, relative to my G4, than I expected!  It is also particularly odd, as it does not reflect my "feeling" of the relative speeds in the real world (I use a 1.5 GHz P4 at work, alas).  A different friend of mine ran some C-based code on his Linux machine (600 MHz x86) and was surprised to find it was quicker than when compiled on what was supposed to be a much faster UNIX workstation.

I'm not a programmer, and I'm not an expert on hardware or coding.  But I have been left wondering what the reasons could be.  Does Linux have any particular advantage in this area?  Does this test bring out particular strengths/weaknesses of the different architectures?


----------



## hulkaros (Sep 18, 2003)

Too many things to explain... Move along, there is nothing to understand here other than another reason to flame each other... 

Still, you may want to know that Linux for the Mac exists, and in many distributions too!


----------



## bbloke (Sep 18, 2003)

Erm, this was *definitely* not an attempt to start a flame war!  I'm genuinely interested to get to the bottom of this, as the results really surprised me and I wondered if any UNIX/Linux users here would know of any particular biases in the test.

I am well aware that Linux exists in different incarnations for the Mac, too.  

Honestly, this is not about starting an argument, I'm genuinely after information!


----------



## Racer D (Sep 18, 2003)

Hmm, dual processors don't help you much here, I'd say. And Linux is probably a bit faster than OS X too.

1 min 31 s on a 700 MHz G4 iMac
(Safari, iTunes, Finder, X-Chat Aqua, Konfabulator running in the bg)

1 min 42 s on a 600 MHz Celeron / Red Hat 9
(mldonkey, BitchX in the bg, no window server, console only, and me ssh-ing in)


----------



## Lycander (Sep 18, 2003)

It's quite simple actually:

FPU = floating point unit (FP math operations)

ALU = err... Arithmetic L-something Unit. So it's integer math.

Pentium: Strong ALU, weak FPU.

G4: Strong FPU, weak ALU.

Software optimized for Pentium processors has to use SSE/SSE2 instructions for floating point math to increase performance.

And if you look at the OS X developer docs, especially the API reference, Apple designed the Cocoa framework to use the FPU for even simple arithmetic, by way of using floating point data types where regular integers would be sufficient.

The code your friend asked you to run simply counts two variables (i and j) from 0 to 10,000. But it does not explicitly say what data type they are, so they probably default to int (integer). Not only does your friend's laptop have more MHz than yours, it also has stronger integer performance.

If you were to redo the test but find a way to ensure the data type used is floating point and not integer, the results may be better in the G4's favor.
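A sketch of what that variant might look like, with one hedge on the theory above: POSIX awk actually stores *every* number as a C double internally, so even the "integer" loop may already be exercising the FPU. (Loop counts are cut to 1000x1000 here so it finishes instantly; restore 10000 and prefix with `time` for a real measurement.)

```shell
# Original-style loop, then one that forces repeated floating-point adds.
awk 'BEGIN {for(i=0;i<1000;i++)for(j=0;j<1000;j++); print "int-style i =", i}'
awk 'BEGIN {s=0; for(i=0;i<1000;i++)for(j=0;j<1000;j++) s+=0.5; print "fp-style s =", s}'
```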


----------



## lurk (Sep 18, 2003)

Another important thing is that awk is a dinky little scripting language for processing text files.  The program you entered was not compiled; it was interpreted by the awk interpreter.  Under Linux, awk is gawk (the GNU reimplementation); on OS X it is the original awk.

So what you are really doing is comparing the performance of two totally unrelated implementations of interpreters for the awk language, running on totally different machines and OSes.  As a general rule, GNU stuff is faster, as they have been able to learn from the mistakes of their predecessors, so, everything else being equal, gawk should be faster.

I hope it is clear that this really is an apples vs. oranges kind of comparison...


----------



## wiz (Sep 20, 2003)

I agree with lurk. Compile a simple C program and test it.


----------



## hulkaros (Sep 21, 2003)

And I will repeat:
"Too many things to explain... Move along..."

Flame thread or not! 

If ANYONE seriously believes, on the basis of the above method, that a G3/G4 CPU is slower than a Wintel/AMD chip running at the same MHz, hmmmmm


----------



## Lycander (Sep 21, 2003)

Well, the above method doesn't prove anything hardware-wise, but it does prove that OS X could have used better open-source tools. Scripting/interpreted environments are obviously slower than native code, but it does illustrate the weaker integer math of the PPC platform, in spite of having better branch prediction.


----------



## mkwan (Sep 21, 2003)

Let's all embrace Unix/linux


----------



## btoneill (Sep 22, 2003)

> _Originally posted by Lycander _
> *Well, the above method doesn't prove anything hardware-wise, but it does prove that OS X could have used better open-source tools. Scripting/interpreted environments are obviously slower than native code, but it does illustrate the weaker integer math of the PPC platform, in spite of having better branch prediction. *



Actually, it doesn't say anything about OS X not having better open-source tools. gawk is actually slower than the original awk (1977 spec), as well as the newer awk (1985 spec).

This was done on my Sun Ultra60 with 2x450Mhz processors.

```
$ for x in awk nawk gawk; do
> echo "doing $x"
> time $x 'BEGIN {for(i=0;i<10000;i++)for(j=0;j<10000;j++);}' < /tmp/awkawk
> done
doing awk

real     2:00.1
user     1:49.2
sys         0.0
doing nawk

real     1:55.6
user     1:45.2
sys         0.0
doing gawk

real     2:04.2
user     1:53.7
sys         0.0
```

awk uses 109 s of actual CPU, nawk uses 105 s, while gawk uses 113 s. On Solaris, awk is by default the 1977-spec awk; nawk is the 1985-spec awk, which is what most other systems ship as awk. The redirection from /tmp/awkawk is just an empty file and is needed because the original awk expects some sort of input.

As to the original question, a lot of things can cause differences. Many of them have already been mentioned, but there are others, such as some x86/Linux-only optimizations that can be done with gawk to increase performance, different optimization levels used when compiling the software, etc. A much better test would be something that actually taxes the system and does a wide range of things, instead of just a for loop. A good candidate is the MySQL benchmark suite, which tests a wide range of things; it's located in the sql-bench directory of a standard MySQL install.

Brian


----------



## Lycander (Sep 23, 2003)

> _Originally posted by btoneill _
> *Actually, it doesn't say anything about OS X not having better open-source tools. gawk is actually slower than the original awk (1977 spec), as well as the newer awk (1985 spec).*


Someone above had said gawk was faster than awk, so I was going with that lead. I don't know myself; I've never used (g)awk.


----------



## wiz (Sep 23, 2003)

Use C, compile it natively, and test test test test test.

Then use ints and doubles and test test test test test.

Then use the interpreters, but forget testing it.


----------



## lurk (Sep 29, 2003)

> _Originally posted by Lycander _
> *Someone above had said gawk was faster than awk, so I was going with that lead. I don't know myself; I've never used (g)awk. *



Don't put much credence in my random spouting off in the presence of real data   I also did not say that it was faster, just that it most likely would be.  Looking across the set of GNU vs. UNIX tools, as a general rule the GNU stuff is faster, for the reasons I cited.  But that is only in general; any specific instance may go the other direction.

One thing that both the SPARC and the PPC have in common is that they do not natively do byte-level addressing, which the x86 does.  So applications that munge character strings, like awk, may actually have a little advantage on x86, because the other processors have to do some funky magic to work with 8-bit data.

Take that last point with a grain of salt, though, because my old Apple ][ also had byte-level addressing.

-Eric


----------



## jjmac (Oct 28, 2003)

Try that test in single-user mode (Cmd+S at startup) and tell us what happens.


----------



## Arden (Oct 28, 2003)

There shouldn't be a difference; it just goes by your processor speed.  (Not "just," but there's little if any interference caused by overhead like Aqua.)


----------



## jjmac (Oct 31, 2003)

1 min 15 s on an iBook 800 (12", CD, 384 MB) with Entourage, Preview, and IE running in the bg


----------



## michaelsanford (Nov 1, 2003)

...Arithmetic Logic Unit...


----------



## cogito (Nov 4, 2003)

Side note... pop open your CPU monitor... notice only one of the processors is being used.  My dualie 1 GHz ran it in 48 seconds with iTunes, Safari, Photoshop, Dreamweaver, and iChat open.  No point being made at all.


----------



## theed (Nov 9, 2003)

Oh boy.  That test is sooooooo not about OSes.  It's loosely about hardware performance on a type of manipulation that doesn't do any work.  It's the type of operation that is helped by MHz and MHz alone; the fatness and FPU coolness of the PPC do nothing here.  Java interpretation suffers a similar fate, though, so it's probably roughly indicative of how fast an interpreter could run on those hardware platforms.

Since it's a loop, it's probably not a good test of OS, memory allocation, stability, or actually getting work done.  At least he didn't try printing 10,000 asterisks and use that data to claim that OS X was slow.

A lot of research could be done on how to benchmark a system, and this would be a classic demo of how to make a nearly worthless benchmark.  It's probably fairly closely related to SPECint for the processors it's running on, which is also a nearly worthless benchmark for a whole computer, but is useful for predicting how fast a CPU will munge through a loop without doing any work.  How do you like that circular logic?


----------



## Ripcord (Nov 10, 2003)

That's very, very interesting.

On my DP 2.0 GHz G5, the command took 24.3 s.  On my single-processor 1.25 GHz G4, it took 4m11.984s, and around that consistently.

No, I didn't mistype - in fact, to be sure, I cut and pasted from above.  It was 10,000 iterations each for i and j.  Before starting, CPU usage hovered around 4%; during the test it stayed at a steady 100%.

Reducing to i<1000 brought the time down to 24.9 s (on the G4).

My G4 doesn't seem particularly slow otherwise - should this worry me?


----------



## Ripcord (Nov 10, 2003)

Never mind - it looks like it was the GNU awk installed by fink that took so long - if I run the awk in /usr/bin, it completes in 49.626 s.  Weird - I wonder why the fink awk (GNU Awk 3.1.2) is so slow?


----------



## theed (Nov 11, 2003)

it's probably bloated with silly features.  You know those gnu guys.  Have you even seen emacs?  ;-)

Seriously though, I don't know why the difference.


----------



## Ripcord (Nov 11, 2003)

In the back of my mind I was hoping that my machine was running 4 times slower than it should (and I just hadn't noticed), which meant that there was probably something I could do to make it run 4 times faster =)


----------



## ericl (Nov 14, 2003)

Well, you are comparing Apples 7 Oranges!!  Just kidding.

You didn't say how much memory each host had (installed physical memory)

You didn't say how much free memory each host had before you hit the <return> key.  On a UNIX host that has been running for a few hours (#uptime command), free memory as reported by the #vmstat command tends to be low.

Since the execution time on each host was over 30 seconds, it is safe to assume that the process that writes dirty filesystem pages to disk (fsflush on SunOS) ran at least once.

You didn't say what priority the process ran with on each host (#nice command)

You didn't say how many processes were running on each host (#/usr/ucb/ps -aux|wc)

You didn't say how many times you ran the test (at least 10 times, then average the results)
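For that last point, a rough sketch of a repeat-and-average harness (plain sh; `date +%s` only has 1-second resolution, so it is only meaningful for runs that take a while - the loop counts below are cut down just to keep the demo short):

```shell
# Run the benchmark several times and report the average elapsed time.
runs=3
total=0
n=1
while [ "$n" -le "$runs" ]; do
    start=$(date +%s)
    awk 'BEGIN {for(i=0;i<1000;i++)for(j=0;j<10000;j++);}'
    end=$(date +%s)
    total=$((total + end - start))
    n=$((n + 1))
done
echo "average elapsed over $runs runs: $((total / runs))s"
```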


----------



## theed (Nov 14, 2003)

ummmm, he also didn't say what the phase of the moon was.  The code is a loop.  That's it, and as such it tends to sequentially use the basic math functions of the CPU without doing parallel work, thus typically reflecting very roughly what the MHz rating of the chip is.  The code doesn't consume RAM.  It's a weak if not worthless benchmark and it largely reflects the hardware, not so much the OS.

Your knowledge of unix commands is fairly impressive.  Your following of this thread is not so impressive.

Sorry to jump all over you there new guy, but you kinda came down on us without having any sort of real point (that I can see anyway) and it just comes off as obnoxious.  That and you said Apples 7 oranges ... Aside from that, welcome to the forums.  

And how should I pronounce your name?  Kinda like oracle?


----------



## Viro (Nov 16, 2003)

Well, another purely-CPU but not too useless benchmark is the distributed.net client. The client tries to break RC5 encryption, and it provides a handy benchmark utility (pass -bench to the client).

Once you run that, I dare you to find an x86-based CPU that can beat even my lowly PowerBook G4 867 MHz. Because distributed.net is optimized for AltiVec, it totally blows the competition away. My P4 1.8 GHz does roughly 1.9 million keys a second. On a good day, my G4 867 tops 9 million keys a second, over 4 times as fast. 

While it's just a CPU benchmark, it does show that the G4 and G5 aren't pushovers once code has been properly optimized to run on them.


----------



## theed (Nov 16, 2003)

Yeah, and that's the point.  The PPC chips haven't been real movers in terms of clock speed, but they have a serious ability to get work done in each clock cycle.  However, the distributed.net client is about as artificial a benchmark as the one described at the beginning, unless you happen to be doing exactly that kind of work: parallel instructions.  That happens a good bit in science, high-end math, and video encoding/decoding. 

However, for speeding up Quake, running java, ripping through script languages, compiling, doing AI work ... the other artificial benchmark is probably more telling.  Those tasks require very little work per clock cycle, but they need the results of the last computation to do the next one.  They don't go parallel very well.

So while I am going to say that the distributed.net client and the awk loop are both nearly worthless and artificial, they each represent something.  And if you put them together and average the scores, you start to get something that has general meaning.  Actually, you should probably weight the loop about 4 times as heavily as the distributed.net client, because it represents the more common case of the processor being nearly loadless and looping.

Meanwhile, combine the PPC's ability to get lots of work done per clock cycle with the crazy clock speed of the G5, and there is serious power there.  Intel is countering by trying to add the ability to get more work done per clock cycle to their x86 line (hyperthreading), and/or by moving to Itanium, or whatever it's called this week.  It's an interesting game.


----------



## Viro (Nov 16, 2003)

> However, for speeding up Quake, running java, ripping through script languages, compiling, doing AI work ... the other artificial benchmark is probably more telling. Those tasks require very little work per clock cycle, but they need the results of the last computation to do the next one. They don't go parallel very well.



There are some things I would disagree with. In a 3D game, once you move an object you have to update its coordinates, and calculating the new position for each vertex in a 3D scene is something that can most definitely be done in parallel. The reason most games on the Mac don't use AltiVec is that the data structures used aren't designed for it. But that's really the fault of developers for not targeting the Mac market and doing ports only as an afterthought.

As for AI work, it really depends on what you mean by AI. Genetic algorithms can be heavily parallelized; same with cellular automata and neural networks. 

If Java automatically vectorized code or generated AltiVec code, you'd get a speed increase. Sun's JVM currently emits SSE instructions, and lo, certain programs (heavily reliant on maths) run just as fast as native C code, if not faster.

But yeah, until something happens to automagically make programs use AltiVec, things will always be MHz-bound.


----------



## ericl (Nov 17, 2003)

I just meant to help and stimulate thought.  I'm an unemployed UNIX person trying to see if I still like computers.  I'm trying to learn more about this UNIX, especially the filesystems it uses.  Your observations are correct: lots of UNIX experience, zero experience at posting, etc.  I type like hell, and my favorite word processor is the vi editor; it's everywhere you wanna be.

As for my point about memory, I have used Suns since 1985, doing lots of performance-related stuff.  The worst memory problem I encountered was on a system that would take 15 minutes just to log in.  I've noticed that this 576 MB system pages to disk a lot.  Back in the day (my 19 y/o daughter says that a lot, and being 53 this month I assume you are closer to her age than mine), UNIX systems could actually deadlock when they could not page out fast enough to open new processes.

So I am making an assumption that the paging algorithms used by this OS are not as good as those of current industrial-strength UNIXes.  FYI, I don't believe Linux is industrial strength either.

So this benchmark may not be memory intensive, but it does require memory to run in.  #ps -aux, looking at the RSS field, will give you an idea.  Hey, do you know what the page size on Macs is?

Anyway, on a UNIX system that has been running under load, free memory as measured by #vmstat (where is this command on the Mac?) will be fairly low.  The kernel uses variables with names like minfree and desfree (minimum free memory, desperation free memory) to determine when to start paging/swapping out processes to free up memory so that the system will not deadlock.

The values of minfree, desfree, and all the other paging-related variables are determined by taking the variable physmem and dividing by an integer that is a power of 2. 

Sun used to publish a monthly e-magazine named SunWorld Online.

Adrian Cockcroft (who authored a Sun tuning book with a Porsche on the front cover) wrote a monthly tuning article.  I am sure this stuff is still on Sun's web site somewhere.  The paging algorithm is covered very well there as well.

Although Sun now uses System V UNIX and the Mac is BSD, many things are very similar.

docs.sun.com is a GREAT place to get UNIX documentation.

Anyway, thanks for the welcome, and sorry I was rude to you guys - I've been a computing professional since 1978, dealing with all the social misfits that aren't in prison, so I have developed a thick skin and did not mean anything between the lines.

Anyway, I'm in Seattle, it's November & the sun is shining!!

See ya man, Eric


----------



## theed (Nov 17, 2003)

Alright then, ericl, I too hope I didn't come off too harshly, but if your skin is as thick as you say, then we're probably just fine.  My age: yeah, closer to 19 than 53, by about 16 years.

Sun seems to generally be good about documentation, and the javadocs make me so happy sometimes I could cry.  In a happy way.  Regarding the paging on a Mac: ummm, it's OK.  I'm not sure what the page size is; I just kinda figured it was 4k, but I have no way to back that up at all.

The metrics to measure it by aren't the same, though.  If you look at it from a server perspective - lots of little file swaps, many parallel tasks - it's not going to be that great, but the paging system was meant to deal with realtime audio needs and large data moves on a workstation without losing realtime capabilities.  (I figure that means its performance is a little weak, it likes to read ahead, and it doesn't block interrupts.)  Compared to any other VM system, judging by the metrics that OS X was designed for, no other VM system is as good...  I read that in some programmer propaganda at one point.

vmstat is vm_stat.  I don't know why, I just go with it.  I usually just hang out looking at various modes of top, though; it seems to give me more legible, meaningful output than a lot of the other utilities.
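For example (getconf exists on both systems and answers the earlier page-size question; one of the two stat commands below is expected to fail quietly depending on which OS you're on):

```shell
# Page size in bytes (e.g. 4096 on most x86 and PPC systems).
getconf PAGESIZE
# OS X spells it vm_stat; Linux/Solaris spell it vmstat.
vm_stat 2>/dev/null || vmstat 2>/dev/null || true
```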

Well, I think that about wraps up this thread.  Hope to bump into you in various other sections of the forums ericl.

And Viro ... fine points aside, I think we're agreeing on the matter of the effective usefulness of parallel processing (specifically SIMD) on the desktop.


----------



## bing (Dec 4, 2003)

Just for the hell of it:

iBook 700 MHz G3 - 83 s
1.4 GHz Xeon     - 39 s

but the Xeon has no GUI and wasn't doing anything else bar Apache


----------



## theed (Dec 4, 2003)

Well hey, look at that: an almost exact correlation to clock speed on those two.


----------

