It's all about the way the software is written, and how the hardware talks to itself.
For example (just throwing out numbers here, nothing official): the speed at which a 1k instruction passes from the CPU to the video card and back on a Mac is, say, 10ms over a 133MHz bus. On an SGI that same trip is 5ms over a 266MHz bus. But that's only one way to do it, and it may not be how SGI did it. (It would run hotter than you'd care to know.)
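If you want to see the shape of that math, here's a toy calculation (everything in it is hypothetical, including the 64-bit bus width I assumed; a real round trip is dominated by a lot more than raw transfer time):

    #include <stdio.h>

    /* Back-of-envelope: raw time to move 1KB over a bus at two
       different clocks. All numbers are made up, same as above. */
    int main(void)
    {
        double payload_bytes = 1024.0;
        double bus_width_bytes = 8.0;          /* assumed 64-bit bus */
        double clocks_hz[] = { 133e6, 266e6 };

        for (int i = 0; i < 2; i++) {
            double secs = payload_bytes / (bus_width_bytes * clocks_hz[i]);
            printf("%.0fMHz bus: %.2f microseconds for 1KB\n",
                   clocks_hz[i] / 1e6, secs * 1e6);
        }
        return 0;
    }

The point is just the ratio: double the clock, halve the time, which is the 10ms-vs-5ms shape of it.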
The other, much more efficient way is to tell the CPU to do things in a different manner, i.e., "Don't make a complicated GUI for me, just connect coordinates 129,118,558,199 and shade them. Don't worry about anything else." (Roughly what that looks like is sketched below.) So now you've got a computer that is using maybe 5% of its CPU for the GUI, and has 95% left for the program's processes.
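For the curious, here's roughly what that kind of "shade this rectangle and nothing else" request looks like in old immediate-mode OpenGL. This is just a sketch (GLUT boilerplate and all), not SGI's actual GUI code; glRecti fills the rectangle between the two corner points and that's it:

    #include <GL/glut.h>

    /* Draw one shaded rectangle between (129,118) and (558,199),
       like the hypothetical GUI request above -- nothing else. */
    static void display(void)
    {
        glClear(GL_COLOR_BUFFER_BIT);
        glColor3f(0.5f, 0.5f, 0.5f);   /* flat grey shade */
        glRecti(129, 118, 558, 199);   /* connect the corners, fill */
        glFlush();
    }

    int main(int argc, char **argv)
    {
        glutInit(&argc, argv);
        glutInitDisplayMode(GLUT_SINGLE | GLUT_RGB);
        glutInitWindowSize(640, 480);
        glutCreateWindow("cheap gui");

        /* Map GL coordinates straight to window pixels. */
        glMatrixMode(GL_PROJECTION);
        glLoadIdentity();
        gluOrtho2D(0, 640, 0, 480);

        glutDisplayFunc(display);
        glutMainLoop();
        return 0;
    }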
Now this could be all wrong, but it's how it was explained to me when I was working on O2s with 300-something MHz chips and 128MB of RAM.
BTW, you want to talk about f-a-s-t OpenGL rendering! Quake ran on that thing at 1024x768 without a stutter. My PII-class PC with 256MB of RAM and a Voodoo card with 4MB could barely handle 800x600. It's all about how the software talks to the hardware.