Cache Questions

mindbend

So, Apple's hyping this whole L2/L3 cache thing as though it makes up for the huge MHz gap. I want some real world examples of how these caches will speed up performance. They talk about caching application code for instant access. Is that really that big a deal? I'm not convinced. Granted, I don't really understand what it's doing, so I don't know, but I remember the old 2 meg L2 cache hype on my PowerTower and I was hard pressed to notice any big deal about it.

What exactly is being cached? Say like in Photoshop, is it caching menu options, undo data, image data, what?

For the record, I'm getting one [DP1gig] either way, I just wanted to learn a bit more about how much L2/L3 cache REALLY helps. My gut still tells me not much; it's clearly an attempt to distract us from the chasm known as the MHz gap.
 
Think of cache as memory. The L2 and L3 cache are little bits of memory running at superfast speed, so the processor can write little bits of information that it uses a lot to the cache, and when it needs them again, it can read them back at blazingly fast speeds instead of going over the slow 133 MHz system bus. The L2 cache runs at the speed of the processor, and the L3 cache is basically DDR RAM dedicated to the processor.

Real-world performance tests show that L2 and L3 cache can help quite a bit. Real DDR RAM and a HyperTransport system bus would help quite a bit, too. Those are the two areas in which Apple is sorely lacking.
 
Originally posted by mindbend
My gut still tells me not much; it's clearly an attempt to distract us from the chasm known as the MHz gap.

To talk about processor cache, let's step away from the Mac/PowerPC arena for a minute. I think a good real world example would be my SGI Indy running with an R4600SC (secondary cache, in this case 512k) at 133 MHz (hey, you know this must be a really slow system looking at the MHz, strange that its main use for me is video capture). I had a friend who saw and liked my Indy so much that he bought one that he thought was the same because it was a 133 MHz system. He couldn't understand why some of the things that seemed quite fast on my system were not as fast on his. As it turned out, he had gotten an R4600PC (primary cache only) system, which runs 15-30% slower than mine. At another point in time I thought about getting an R5000 upgrade (same type of processor used in the early O2 systems), and the R5000PC/150 MHz looked good until I saw that it wasn't really any faster than my system. Then I saw that the R4400SC/200MHz was selling at about half the price of the R5000SC/180MHz (which was because the R5000SC/180 was quite a bit faster than the R4400SC/200, again with the wonderful MHz not matching real system speed).

Let's now take a look at the second generation of PowerPC processors for a minute (seeing as you brought them up). The 603/604 series were not designed to use cache efficiently. Add to that the fact that the L2 cache was on the motherboard and used the system bus, and the performance boost was not worth the effort.

Then came the G3. Have you ever asked yourself "why was the G3 so much faster than the 603/604 processors at any given MHz?" The answer to that question is how it takes advantage of L2 cache and how a dedicated cache-only bus was created between the processor and the L2 memory (usually running at some fraction of the actual processor speed). The only problem with this was that you couldn't get more than one processor's cache to play nicely within a system... until the G4, that is. The next big step was having a large (256k) on-board, at-speed L2 cache (the data bus is now inside the processor) and having a larger L3 cache on a dedicated bus with the processor(s).

So what difference does this make? Let's look at the Pentium series of processors for a moment. The Pentium II is (MHz for MHz) faster than the Pentium III. Why? Because in order to move forward in MHz they had to take performance shortcuts (which the higher MHz mostly made up for). This happened again in the transition from the Pentium III to the Pentium 4. Now let's look at the PowerPC line. One branch moves along from the 601 to the 603 to the G3, and the other from the 601 to the 604 to the G4. The move from the 601 to the 603 was a step backwards for performance, but a step forwards for MHz (and heat, which is why that line ended up in early PowerPC laptops). The move from the 603 to the G3 was a big jump forwards in both performance (a 233 MHz G3 was much faster than a 300 MHz 603e) and heat (the first time in history that a processor debuted in both desktop and laptop systems). In the jumps from 601 to 604 to G4, we got both a performance increase and a MHz increase (though as you noted, the MHz numbers didn't jump up as fast as in the Pentium/x86 lines).

So where does this "gap" show up again? Why, in Intel's new Itanium processor (running at 800 MHz, compared to 2.2 GHz for the cheaper Pentium 4). The fact of the matter is, once you get past the consumer market (where people don't know any better), MHz doesn't tell you much about a system, but a system's use of cache can be very telling (some SGIs have been at 4 MB for quite some time now; we should have matched that mark by now). Mind you, believe what you like, but you really should check these types of things out for yourself. And remember that there are more than just Macs and PCs in this world to take into account when studying relative performance between processors.
 
I'm sure I'm oversimplifying here, but the L2 cache can be regarded as the second fastest info processing location after the CPU itself (if the L2 is optimally configured). In descending order of speed, it goes: CPU, L2 cache, RAM, hard disk.
 
I appreciate everyone's tech talk, really I do, but my original request has gone unanswered, which is REAL WORLD examples of exactly how cache increases performance.

e.g. games, video rendering, screen redraws, switching between apps, sherlock finds, etc. Nobody has provided an example of how any of these or other things are affected.

You say cache makes a difference. I believe you, but where?
 
Originally posted by mindbend
I appreciate everyone's tech talk, really I do, but my original request has gone unanswered, which is REAL WORLD examples of exactly how cache increases performance.

e.g. games, video rendering, screen redraws, switching between apps, sherlock finds, etc. Nobody has provided an example of how any of these or other things are affected.

You say cache makes a difference. I believe you, but where?


You are thinking about the cache thing all wrong. It won't provide the cut-and-dried performance increases you are looking for.

More L2 cache will keep the processor from waiting for instructions and data. It reduces how often the processor has to ask (and then wait) for them to come from RAM or the hard disk.

More level 2 cache will increase your performance in CPU intensive games (not GPU intensive games), and it will speed up video rendering, screen redraws, and switching between apps. The Sherlock find is more likely limited by your hard disk speed.

You are looking for L2 benchmarks in terms of a Quake 3 FPS boost. I seriously doubt that you will find that. L2 cache will provide an OVERALL performance boost. And RacerX gave some very good real life examples of the overall system boost.

FaRuvius
 
Actually, a cache is any intermediate storage for quick retrieval. There isn't a specific type of data stored in the cache. Anything that was used recently has a chance of being stored in the cache.
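To make "anything used recently has a chance of being stored" concrete, here is a toy sketch in Python. It assumes a simple least-recently-used (LRU) replacement rule, which real processor caches only approximate, and the function name and the access pattern are made up for illustration.

```python
from collections import OrderedDict


def lru_hit_rate(accesses, capacity):
    """Return the fraction of accesses served from a toy LRU cache."""
    cache = OrderedDict()  # keys ordered from least to most recently used
    hits = 0
    for key in accesses:
        if key in cache:
            hits += 1
            cache.move_to_end(key)  # mark as most recently used
        else:
            cache[key] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used key
    return hits / len(accesses)


# Hypothetical workload: a loop that touches the same 8 items 100 times.
loop = list(range(8)) * 100

too_small = lru_hit_rate(loop, capacity=4)   # working set does not fit
big_enough = lru_hit_rate(loop, capacity=8)  # working set fits
print(too_small, big_enough)
```

When the cache holds the whole loop, only the first pass misses. When it is too small for the loop, this particular pattern keeps evicting each item just before it is needed again, so the cache barely helps. That is why the benefit depends so much on what you run.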

The hierarchy goes:
processor
L1 cache (on processor, at processor speed, 32K instruction 32K data)
L2 cache (dedicated bus, half processor speed usually, 512K to 2MB)
L3 cache (if they want ANOTHER layer of cache, 2 Meg maybe, faster than system)
RAM
HD for use as VM.
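A rough way to see why each layer matters is to work out the average time per memory access. The sketch below is only an illustration: the hit rates and latencies are invented round numbers, not measurements of any real G4, and the function name is hypothetical.

```python
def average_access_ns(levels, memory_ns):
    """Average access time for a cache hierarchy.

    levels: list of (hit_rate, latency_ns) tuples, fastest level first.
    memory_ns: latency of main memory, paid by accesses that miss every level.
    """
    total = 0.0
    reach = 1.0  # fraction of accesses that get as far as this level
    for hit_rate, latency_ns in levels:
        total += reach * latency_ns  # everything reaching a level pays its latency
        reach *= 1.0 - hit_rate      # only the misses continue downward
    return total + reach * memory_ns


# Assumed, made-up numbers: (hit rate, latency in ns) for L1, L2, L3.
l1, l2, l3 = (0.90, 1.0), (0.80, 2.0), (0.50, 10.0)
ram_ns = 60.0

l1_only = average_access_ns([l1], ram_ns)
with_l2_l3 = average_access_ns([l1, l2, l3], ram_ns)
print(l1_only, with_l2_l3)
```

With these assumed figures the average drops from about 7 ns to about 2 ns per access once L2 and L3 are added, even though most accesses already hit in L1. The few that miss are expensive enough to dominate.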

In modern implementations, you could view HD as the memory and RAM as the cache, it'd make no difference in action, just terminology. If you understand VM, you understand cache in an upside down way.

If you really want, you could buy yourself a processor upgrade card with software controls for the cache, and then you could run benchmark after benchmark determining what effect L2 cache had. Truth is, it's a fuzzy general number due to statistics and averages. Everything will be faster. How much depends on what you do. You're free to pry that puppy off of your motherboard and let the rest of us know what it feels like. I'd guess 15% on games, 30% on rendering and compile times.
 
On a G3 upgrade card, I ran MacBench several times under various configurations. Here's the skinny. Base system is a G3 300. (B&W I guess)

G3 340 card 512k cache @ 270 int:86% float:112%
G3 340 card cache disabled int:37% float:89%

I didn't run other tests, but I'd expect them to differ less than these did, or maybe not at all.
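For anyone who wants the ratio rather than the raw scores, here is the arithmetic as a quick Python sketch. It assumes the two rows above are the same card with the cache on and off, with scores given as a percentage of the base G3 300.

```python
# MacBench scores from the two rows above: (cache enabled, cache disabled),
# read as percentages of the base G3 300 (an assumption about the table).
scores = {"int": (86, 37), "float": (112, 89)}

# Speedup from turning the cache on = enabled score / disabled score.
speedup = {test: on / off for test, (on, off) in scores.items()}

for test, ratio in speedup.items():
    print(f"{test}: {ratio:.2f}x faster with the cache on")
```

So on these numbers the integer test runs about 2.3 times as fast with the cache enabled, and the floating point test about 1.26 times as fast.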
 
OK, thanks for the examples. I'm convinced. Here's some more dumb questions.

Where does general RAM play into this, and why isn't the general system RAM in effect a cache?

Consequently, why isn't cache RAM huge, like 16, 64, or 512 MB? I know cache RAM isn't SIMM/DIMM chips (right?), so I guess expense is part of it, but would big giant amounts of cache RAM help to a proportional degree?

Sorry for the continued dumb questions.
 
Actually they aren't dumb questions.

First, for a long time you have been able to create a cache in real RAM (see the Memory control panel in the Classic OS). It was limited by the motherboard bus speed and had to deal with traffic from other apps.

The size issue has to do with efficiency. I like the browser cache as an example here. For Netscape you can decide just how much hard drive space you want for your browser cache. You would think that more is better, but what happens is that after a certain amount (mine is set to 12 MB) the browser is spending as much or more time searching for cached information as it would have taken to just download the information again. Processor cache is the same way: if you cannot make effective use of the larger cache, it could kill any speed increase you would think you could get.
 
to RacerX : common misconception. The RAM cache isn't cache for the RAM, that requires special, faster, dedicated hardware; the RAM cache is cache for the Hard Drive, so that when you read from the HD a big chunk of data is held in RAM, in case you happen to want the next couple of bytes in the near future, it's in RAM instead of on the HD. RAM responds in 6 to 60 ns depending on machine, and HD responds in 7 to 12 ms, similar numbers, different orders of magnitude, but L2/L3 cache and RAM cache (control panel) are not related.

I'm mentioning this here, because it may be easier for some to see the advantage of cache if we talk about RAM and hard drives, because these are more tangible for many people. Same concept, different location, and L3 cache is meant to respond in .2ns rather than 6ns.

A 1 GHz cycle takes 1 ns
1 ms = 1,000,000 ns
1 sec = 1000 ms (in case you needed this)
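To put the RAM and hard drive figures quoted above on the same scale, here is a small Python sketch that turns a latency into the number of clock cycles a 1 GHz processor would sit through. The helper names are made up for this example, and it ignores everything a real machine does to hide latency.

```python
NS_PER_MS = 1_000_000  # 1 ms = 1,000 microseconds = 1,000,000 ns


def cycle_ns(clock_hz):
    """Length of one clock cycle in nanoseconds."""
    return 1e9 / clock_hz


def cycles_waited(latency_ns, clock_hz):
    """How many clock cycles pass while waiting latency_ns."""
    return latency_ns / cycle_ns(clock_hz)


ONE_GHZ = 1e9

ram_slow = cycles_waited(60, ONE_GHZ)            # slow end of the RAM range
hd_fast = cycles_waited(7 * NS_PER_MS, ONE_GHZ)  # fast end of the HD range
print(cycle_ns(ONE_GHZ), ram_slow, hd_fast)
```

At 1 GHz one cycle is 1 ns, so a 60 ns trip to RAM costs 60 cycles, while even a quick 7 ms trip to the hard drive costs 7 million cycles.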
 
posted by theed
to RacerX : common misconception. The RAM cache isn't cache for the RAM, that requires special, faster, dedicated hardware; the RAM cache is cache for the Hard Drive, so that when you read from the HD a big chunk of data is held in RAM, in case you happen to want the next couple of bytes in the near future, it's in RAM instead of on the HD. RAM responds in 6 to 60 ns depending on machine, and HD responds in 7 to 12 ms, similar numbers, different orders of magnitude, but L2/L3 cache and RAM cache (control panel) are not related.

First, it is called disk cache (not RAM cache), and I only brought it up to illustrate an example of in-RAM cache techniques, and never said that it was related to the L1/L2/L3 instruction cache (other than the obvious use of the term cache). But its main use is storing frequently used information in RAM.

Oh, and theed, just for the record, I don't think that browser cache is related to L1/L2/L3 cache either (wouldn't want you to misconstrue that example as well). :D

And if we are going to be picky, I would point out that in your definition of L2, you missed the fact that Apple's current L2 is at processor speed (which is why the G4's 256k L2 cache is so impressive). But that is just if we are getting picky. :p
 
Didn't mean to diss you, my brotha. And yeah, looking at the specs, at-clock L2 is pretty sweet. I kinda switched explanations halfway through, thinking about classic 2 level cache implementations and then moving to Apple's current 3 layer caching setup.

Out of curiosity though, when your L1 and L2 cache are both at processor speed, is there really any hierarchy to them? Do they become one big cache in effect? Is L2 8 way associative where L1 is flat? I'm baffled by 1GHz L2, it seems to defy the whole concept of L2 ... it's just too fast. :)
 
I was giving you a hard time ;) .

Your definition of L1 stays the same even in the case of the new G4s. But in the case of the L2, instead of half speed we get full speed which is why that 256k is better than 512k in the normal L2 setting. The problem is that 256K at speed is still not equal to 2 MB at half... but you have that space that was left from when the L2 was internalized, so Apple/Motorola added an extra L3 where you would normally see an L2 before. My guess is that the 256K L2 and the 2 MB L3 give the same boost that a normal (half-speed) 3+ MB L2 would give.

I would point out that in some upgrades (and earlier versions of Rhapsody/Mac OS X Server) the L2 cache of the G3 was not active without a software patch. That would lead me to believe that the L1 is a function of the inner workings of the processor, but the L2+ is utilized (organized) by a set of software instructions. That is just a guess though.
 