Aqua & UI speed theories

Try doing some programming sometime... and the PPC can't automagically switch modes; it creates way too many issues.

The bus really only mimics what the chips throw across it. If the CPU is big endian, data will be expected on the bus in a big endian manner. If the CPU is little endian, data will be expected in a little endian manner.

Endian-ness is not bit order, it is byte order. On a big endian system, the high 8 bits sit at the lowest address of the word (32 bits on 32-bit processors), and on a little endian system, the low 8 bits do. Text is a non-issue since each char is only a byte long.
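To make the byte-order point concrete, here is a minimal C sketch that dumps how a 32-bit value is laid out in memory. The function names are made up for illustration and are not from any real driver.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy the in-memory bytes of a 32-bit value
   into out[0..3], lowest address first. */
static void bytes_in_memory(uint32_t value, uint8_t out[4])
{
    memcpy(out, &value, sizeof value);
}

/* Hypothetical helper: returns 1 if the host keeps the most
   significant byte at the lowest address (big endian), else 0. */
static int host_is_big_endian(void)
{
    uint8_t b[4];
    bytes_in_memory(0x11223344u, b);
    return b[0] == 0x11;
}
```

On a big endian host the bytes of 0x11223344 come out as 11 22 33 44; on a little endian host they come out as 44 33 22 11. The bits inside each byte are untouched either way.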

However, rewiring pins/etc doesn't really make the manufacturing of hardware cheap when you can do it in software. For example, the tdfx driver for Linux has issues on PPC because it isn't endian safe. Also, Apple's IOKit docs explicitly tell developers to make sure the endian-ness of their data is correct when dealing with PCI cards and the like.

It is an issue, and a rather big one. Instead of just pushing data across, you are adding another layer of processing to every pixel, and every other bit of data, moving across the bus. As I said, I am programming drivers for Voodoo cards to get at least some support under OS X, and I am dealing with the issues this thread covers every step of the way.
 
And endianness is bit order within a number; if the number is multiple bytes, then it may mess up byte order as well.

Since the cards are custom fabbed anyway for these chips, and since the whole point of having a video card is to NOT do it in software, I'd think that ATI already has this issue covered. With a truly cross platform card, it could be an issue, if it isn't dealt with in ROM or something.

I've programmed FPGAs; it's real easy to switch bit order in HW with a single logical switch. Byte order would take a little more logic, and may be done just about as effectively in SW.

3dfx never made a mac card, so using it on a mac would likely require software faking of what could have been done in HW.

Good luck with your driver.
 
Okay, now I am calling your lie... 3Dfx produced Mac Voodoo 4s and 5s, and even provided a flasher utility and drivers so that a PC PCI Voodoo 3 could be used in a Mac. Hell, I am using both under MacOS 9 right now :rolleyes:

When referring to programming, endian-ness is dealt with when it concerns byte order. You are right about bit order not being an issue, but when you shove a 4-byte integer over a bus, you need to make sure it is in the same byte order as the receiving device expects (RAM doesn't have endian-ness, and isn't an issue). PPC is big-endian for byte order, x86 is little-endian. ATi and 3Dfx sold Mac cards which were just flashed with a different ROM and shipped with different drivers... little-endian cards. Endian swapping is very common when talking to generic PCI devices that are designed mainly for PCs, but also work in a Mac.

Open Firmware-compliant PCI/AGP device ROMs are essentially NDRVs with fields describing the device. An NDRV is a code fragment that OS 9 and earlier used to handle PCI video cards before on-disk drivers could be loaded. These NDRVs were executed by the CPU and did the endian swapping for the cards. OS X still uses them for ATi cards for raw framebuffer access. (IONDRVFramebuffer in IOKit)

On the comment about swapping endianness in software (which is how it is always done...), look at my trick question a couple of posts back.
 
I thought they had kinda after the fact written mac drivers for previously PC only cards. I didn't know there was a ROM upgrade, and I simply don't recall an actual retail Mac 3dfx offering. Unless I find otherwise I'll regard myself as mistaken.

I would be kinda disappointed if the PPC instruction set didn't include some sort of byte swapping endianness call. There's probably a way to be kind of quick using AltiVec, but still.

But I'm really disappointed if what you say is true of ATI's bit swapping in software still. It's simply ridiculous. I mean really, I could draw the schematic freehand right now for how to wire in bit swapping on their totally custom chip. Leave one bit in ROM for whether bit swapping should occur on this platform.
 
I never really said anything about bit swapping... I am referring to byte swapping. As you stated, and I agree, the bit order doesn't really matter; however, the byte order does. The good news is that this only affects multi-byte values being moved around the bus, but the bad news is that practically every value sent to the video card and back is 2-byte or 4-byte. A 32-bit color will have the colors rather weird looking, and 16-bit color will just plain look awful if you can even see anything.
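A rough C sketch of why the colors go wrong. The pixel layouts here (32-bit 0xAARRGGBB and 16-bit RGB565) are assumptions chosen for illustration, not something taken from a particular card's documentation, and swap32/swap16 are hypothetical helper names.

```c
#include <stdint.h>

/* Reverse the four bytes of a 32-bit value. */
static uint32_t swap32(uint32_t v)
{
    return (v >> 24)
         | ((v >> 8) & 0x0000FF00u)
         | ((v << 8) & 0x00FF0000u)
         | (v << 24);
}

/* Reverse the two bytes of a 16-bit value. */
static uint16_t swap16(uint16_t v)
{
    return (uint16_t)((v >> 8) | (v << 8));
}
```

With those assumed layouts, opaque red 0xFFFF0000 read in the wrong byte order becomes 0x0000FFFF: alpha 0, red 0, green and blue full. In 32-bit color whole channels trade places, which is the weird-looking case. In RGB565 the channels do not sit on byte boundaries, so pure red 0xF800 becomes 0x00F8, and the bits of one channel get smeared into the others, which is the plain awful case.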
 
I didn't catch the part where you agreed with me on the bits and we were concentrating only on bytes. My bad.

Indeed, I can see how that'd be left to software.

As for the trick question, I'd be guessing that the one with fewer instructions would be the fastest. But I'll be the first to admit I don't know the intricacies of how superscalar chips decide which instructions to run in parallel.
 
The odd thing is, the shifting/etc is about as fast as the load/store. Why?

Doing it the load-4-times, store-4-times way makes the processor wait for 8 free cycles on the bus... the larger the gap between the bus clock and the CPU clock, the more CPU cycles are wasted (especially in this manner... the ability to execute multiple instructions at once is useless in this case).

However, loading the whole thing in one go, and then letting registers do the work allows it to do the work by the time 4 bus cycles have passed (depends on the exact machine, but this is true in general).

Despite the fact that it takes fewer instructions to do the loadx4/storex4 method (which needs 8 free bus cycles), the dependence on the bus makes it as slow as, if not slower than, the load/shift/and/or/store method (which only needs 2 free bus cycles). One of the things programmers have to keep in mind these days, oddly enough.
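For anyone who wants to see the two approaches side by side, here is a minimal C sketch. The function names are made up, and a modern compiler may well turn both into the same machine code, so treat it as an illustration of the memory access pattern rather than a benchmark.

```c
#include <stdint.h>

/* Method A: four byte loads and four byte stores,
   i.e. eight separate memory accesses. */
static void swap_copy_bytewise(const uint8_t *src, uint8_t *dst)
{
    dst[0] = src[3];
    dst[1] = src[2];
    dst[2] = src[1];
    dst[3] = src[0];
}

/* Method B: one word load, shifts/ands/ors in registers,
   one word store, i.e. two memory accesses. */
static void swap_copy_wordwise(const uint32_t *src, uint32_t *dst)
{
    uint32_t v = *src;
    *dst = (v >> 24)
         | ((v >> 8) & 0x0000FF00u)
         | ((v << 8) & 0x00FF0000u)
         | (v << 24);
}
```

Both produce the same swapped word. The difference is only in how many times the code has to go out over the bus, which is the part that stalls the CPU.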
 