OS X vs. Linux (x86) speed?

That's very, very interesting.

On my DP 2.0 GHz G5 the command took 24.3s. On my single-processor 1.25 GHz G4 it took 4m11.984s, and around that consistently.

No, I didn't mistype - in fact, to be sure, I cut and pasted from above. It was 10,000 iterations of i and j. Before starting, CPU usage hovered around 4%; during the test it stayed at a steady 100%.

Reducing to i<1000 brought the time down to 24.9s (on the G4).
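For reference, the loop being timed was presumably something along these lines - this is a reconstruction; the exact bounds and loop body are my guess from the "10,000 iterations of i and j" description:

```shell
# Hypothetical reconstruction of the benchmark: an empty-ish nested awk
# loop, timed from the shell. Bounds and body are assumptions.
time awk 'BEGIN { for (i = 0; i < 10000; i++) for (j = 0; j < 10000; j++) x = i + j }'
```

A loop like this never touches RAM or disk, so it mostly measures raw integer throughput of whatever awk binary you happen to run.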

My G4 doesn't seem particularly slow otherwise - should this worry me?
 
Never mind - it looks like it was the GNU Awk installed by Fink that took so long. If I run the awk in /usr/bin, it completes in 49.626s this time. Weird - I wonder why the Fink awk (GNU Awk 3.1.2) is so slow?
 
it's probably bloated with silly features. You know those GNU guys. Have you even seen emacs? ;-)

Seriously though, I don't know why the difference.
 
In the back of my mind I was hoping that my machine was running 4 times slower than it should (and I just hadn't noticed), which meant that there was probably something I could do to make it run 4 times faster =)
 
Well, you are comparing Apples 7 Oranges!! Just kidding.

You didn't say how much memory each host had (installed physical memory).

You didn't say how much free memory each host had before you hit the <return> key. On a unix host that has been running for a few hours (#uptime command), free memory as reported by the #vmstat command tends to be low.

Since the execution time on each host was over 30 seconds, it is safe to assume that the process that writes dirty filesystem pages to disk (fsflush on SunOS) ran at least once.

You didn't say what priority the process ran with on each host (#nice command)

You didn't say how many processes were running on each host (#/usr/ucb/ps -aux|wc)

You didn't say how many times you ran the test (at least 10 times, then average the results)
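The checks in that list boil down to a handful of commands. A rough sketch - command names and flags vary between SunOS, Linux, and OS X, so treat these as approximations:

```shell
# Rough health check before running a benchmark; exact spellings differ
# by platform (e.g. vmstat is vm_stat on OS X, ps flags vary).
uptime                     # how long the box has been up, load averages
vmstat || vm_stat          # free memory and paging activity, whichever exists
ps aux | wc -l             # rough count of running processes
nice awk 'BEGIN { }'       # run the workload at default/lowered priority
```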
 
ummmm, he also didn't say what the phase of the moon was. The code is a loop. That's it, and as such it tends to sequentially use the basic math functions of the CPU without doing parallel work, thus typically reflecting very roughly what the MHz rating of the chip is. The code doesn't consume RAM. It's a weak if not worthless benchmark and it largely reflects the hardware, not so much the OS.

Your knowledge of unix commands is fairly impressive. Your following of this thread is not so impressive.

Sorry to jump all over you there new guy, but you kinda came down on us without having any sort of real point (that I can see anyway) and it just comes off as obnoxious. That and you said Apples 7 oranges ... Aside from that, welcome to the forums. :)

And how should I pronounce your name? Kinda like oracle?
 
Well, another purely CPU-bound but not entirely useless benchmark is the distributed.net client. The client tries to break RC5 encryption, and it provides a handy benchmark utility (pass -bench to the client).

Once you run that, I dare you to find an x86-based CPU that can beat even my lowly PowerBook G4 867 MHz :). Because distributed.net is optimized for Altivec, it totally blows the competition away. My P4 1.8 GHz does roughly 1.9 million keys a second. On a good day, my G4 867 tops 9 million keys a second, over 4 times as fast.
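For what it's worth, the speedup those quoted figures imply works out like this (a quick sanity check of the numbers above, nothing more):

```shell
# 9 Mkeys/s on an 867 MHz G4 vs 1.9 Mkeys/s on a 1.8 GHz P4:
# overall ratio, and the ratio per MHz of clock speed.
awk 'BEGIN { printf "%.1fx faster overall, %.1fx faster per MHz\n", 9 / 1.9, (9 / 867) / (1.9 / 1800) }'
```

So on this particular workload the G4 is doing nearly ten times the work per clock tick.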

While it's just a CPU benchmark, it does show that the G4 and G5 aren't pushovers once code has been properly optimized to run on them.
 
Yeah, and that's the point. The PPC chips haven't been real movers in terms of clock speed, but they have serious ability to get work done in each clock cycle. However, the distributed.net client is about as artificial a benchmark as the one described at the beginning, unless you happen to be doing exactly that kind of work, parallel instructions. This happens a good bit in science, high end math, and video encoding/decoding.

However, for speeding up Quake, running java, ripping through script languages, compiling, doing AI work ... the other artificial benchmark is probably more telling. Those tasks require very little work per clock cycle, but they need the results of the last computation to do the next one. They don't go parallel very well.

So while I am going to say that both the distributed.net client and the awk loop are nearly worthless and artificial, they both represent something. And if you put them together and average the scores, you start to get something that has general meaning. Actually you should probably weight the loop about 4 times as heavy as the distributed.net client, because it represents the more common case of the processor being nearly loadless and looping.
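The 4:1 weighting suggested above would combine the two normalized scores something like this - the input values here are made-up placeholders, not real measurements:

```shell
# Weighted combination: loop benchmark counts 4x the distributed.net score.
# Inputs are hypothetical normalized scores (1.0 = some reference machine).
awk -v loop=1.2 -v rc5=4.7 'BEGIN { printf "combined score: %.2f\n", (4 * loop + rc5) / 5 }'
```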

Meanwhile, combine the ability of the PPC to get lots of work done per clock cycle with the crazy clock speed of the G5, and there is serious power there. Intel is countering by trying to add the ability to get more work done per clock cycle to their x86 line (hyperthreading), and/or by moving to Itanium, or whatever it's called this week. It's an interesting game.
 
However, for speeding up Quake, running java, ripping through script languages, compiling, doing AI work ... the other artificial benchmark is probably more telling. Those tasks require very little work per clock cycle, but they need the results of the last computation to do the next one. They don't go parallel very well.

There are some things I would disagree with. In a 3D game, once you move an object you have to update its coordinates. Calculating the new position for each vertex in a 3D scene is something that can most definitely be done in parallel. The reason most games on the Mac don't use Altivec is that the data structures used aren't designed for use with Altivec. But that's really the fault of developers for not targeting the Mac market and doing ports only as an afterthought.

As for AI work, it really depends on what you mean by AI. Genetic algorithms can be heavily parallelized, same with cellular automata and neural networks.

If Java automatically vectorized code or generated Altivec code, you'd get a speed increase. Sun's JVM currently emits SSE instructions, and certain programs (heavily reliant on maths) run just as fast as native C code, if not faster.

But yeah, until something happens to automagically make programs use Altivec, things will always be MHz-bound.
 
I just meant to help, to stimulate thought. I'm an unemployed UNIX person trying to see if I still like computers. I'm trying to learn more about this UNIX, especially the filesystems it uses. Your observations are correct: lots of UNIX experience, zero experience at posting, etc. I type like hell & my favorite word processor is the vi editor; it's everywhere you wanna be.

As for my point about memory, I have used Suns since 1985, doing lots of performance-related stuff. The worst memory problem I encountered was on a system that would take 15 minutes just to log in. I've noticed that this 576MB system pages to disk a lot. Back in the day (my 19 y/o daughter says that a lot, and being 53 this month I assume you are closer to her age than mine), UNIX systems could actually deadlock when they could not page out fast enough to open new processes.

So I am making an assumption that the paging algorithms used by this OS are not as good as those in current industrial-strength UNIXes. FYI, I don't believe Linux is industrial strength either.

So this benchmark may not be memory intensive, but it does require memory to run in. #ps -aux, looking at the RSS field, will give you an idea. Hey, do you know what the page size on Macs is?
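On the page-size question, there's a portable way to check it yourself (getconf is standard on OS X and most UNIXes; this sketch assumes one of the two commands is available):

```shell
# Report the VM page size in bytes; 4096 is typical on both x86 and PPC.
getconf PAGESIZE 2>/dev/null || pagesize
```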

Anyway, on a UNIX system that has been running under load, free memory as measured by #vmstat (where is this command on the Mac?) will be fairly low. The kernel uses variables with names like minfree and desfree (minimum free memory, desperation free memory) to determine when to start paging/swapping out processes to free up memory so that the system will not deadlock.

The value of minfree, desfree & all the other paging-related variables is determined by taking the variable physmem & dividing by an integer that is a power of 2.
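If memory serves, on older Solaris releases the relationships were roughly lotsfree = physmem/64, desfree = lotsfree/2, minfree = desfree/2, all counted in pages. A sketch with a hypothetical machine (these exact ratios are from memory, so treat them as illustrative):

```shell
# Hedged sketch of old Solaris paging thresholds derived from physmem.
# lotsfree = physmem/64, desfree = lotsfree/2, minfree = desfree/2 (pages).
awk 'BEGIN {
    physmem  = 131072          # hypothetical: 512 MB worth of 4 KB pages
    lotsfree = physmem / 64
    desfree  = lotsfree / 2
    minfree  = desfree / 2
    printf "lotsfree=%d desfree=%d minfree=%d\n", lotsfree, desfree, minfree
}'
```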

Sun used to publish a monthly e-magazine named SunWorld Online.

Adrian Cockcroft (who authored a Sun tuning book with a Porsche on the front cover) wrote a monthly tuning article. I am sure this stuff is still on Sun's web site somewhere. The paging algorithm is covered very well there as well.

Although Sun now uses System V UNIX & the Mac is BSD, many things are very similar.

docs.sun.com is a GREAT place to get UNIX documentation

Anyway, thanks for the welcome & sorry I was rude to you guys - I've been a computing professional since 1978, dealing with all the social misfits that aren't in prison, so I have developed a thick skin & did not mean anything between the lines.

Anyway, I'm in Seattle, it's November & the sun is shining!!

See ya man, Eric
 
alright then ericl, I too hope I didn't come off too harshly, but if your skin is as thick as you say then we're probably just fine. My age, yeah, closer to 19 than 53, by about 16 years.

Sun seems to generally be good about documentation, and the javadocs make me so happy sometimes I could cry. In a happy way. Regarding the paging on a Mac, ummm, it's OK. I'm not sure what the page size is; I just kinda figured it was 4k, but I have no way to back that up at all.

The metrics to measure it aren't the same though. If you look at it from a server perspective - lots of little file swaps, many parallel tasks - it's not going to be that great, but the paging system was meant to deal with realtime audio needs and large data moves on a workstation without losing realtime capabilities. (I figure that means its performance is a little weak, it likes to read ahead, and it doesn't block interrupts.) Compared to any other VM system, judging by the metrics that OS X was designed for, no other VM system is as good... I read that in some programmer propaganda at one point.

vmstat is vm_stat. I don't know why, I just go with it. I usually just hang out looking at various modes of top though, it seems to give me more legible, meaningful output than a lot of the other utilities.
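For anyone following along, the spelling difference in practice looks like this (a sketch assuming at least one of the two commands is installed):

```shell
# vm_stat on OS X, vmstat on most other UNIXes; both report free memory.
if command -v vm_stat >/dev/null 2>&1; then
    vm_stat            # OS X spelling; counts are in pages
elif command -v vmstat >/dev/null 2>&1; then
    vmstat             # the usual UNIX spelling
else
    echo "neither vm_stat nor vmstat found"
fi
```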

Well, I think that about wraps up this thread. Hope to bump into you in various other sections of the forums ericl.

And Viro ... fine points aside, I think we're agreeing on the matter of the effective usefulness of parallel processing (specifically SIMD) on the desktop.
 
Just for the hell of it

iBook 700 MHz G3 - 83s
1.4 GHz Xeon - 39s

but the Xeon has no GUI and wasn't doing anything else bar Apache
 