Random Thoughts: Video Cards / Components
Friday, 1. February 2008, 09:15:15
This article is about a weird foray into the internals of an old video card. If you want to talk about leaky abstractions, this is probably a good example, though it is most likely useless for anything else. There's also a short topic on Project V components later on.
Just a couple items of note. First, I've been playing around with some video card internals. Different video cards have different capabilities and I'm currently playing around with an older video card that has Pixel Shader 1.4. I was getting some strange results and decided to check the properties of the hardware registers that hold the values inside the pixel shader itself. I found some interesting side-effects that are really annoying to say the least.
This card is an old ATI 9200SE. I might check out some other cards later on, but I'll probably move on to stuff that is more directly related to Project V. On a side note, these tests produced some nice code that I can use for Project V, and I know enough OpenGL to do what I want now. Anyway, registers on this video card can only be accessed as floating point numbers. The weird thing is that I don't think they are floating point numbers at all. I think they are 16-bit fixed point registers. A fixed point number is one where the binary point (the base-2 equivalent of the decimal point) is assumed to sit at a fixed location between two bits. In this case, it's a 4.12 format as best I could tell: 4 bits for the whole part and 12 bits for the fraction.
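To make the 4.12 idea concrete, here is a minimal sketch in Python (run on the CPU, not on the card). The names `FRAC_BITS`, `to_fixed` and `from_fixed` are made up for illustration, and the sign handling of the real registers is ignored.

```python
FRAC_BITS = 12            # assumed: 12 fractional bits, the "12" in 4.12
SCALE = 1 << FRAC_BITS    # 4096 raw steps per 1.0


def to_fixed(x: float) -> int:
    """Encode a non-negative real number as a raw 4.12 integer."""
    return round(x * SCALE)


def from_fixed(raw: int) -> float:
    """Decode a raw 4.12 integer back into a real number."""
    return raw / SCALE


print(hex(to_fixed(1.0)))   # 1.0 sits at bit 12
print(from_fixed(0x0800))   # half of 1.0
print(from_fixed(0x0001))   # smallest step, 1/4096
```

The smallest step is 1/4096, so anything finer than that cannot be stored.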
The annoying part is that the numbers in a texture (0-255) get mapped onto the range 0.0 to 1.0, and no perfect conversion is possible. 255 is not a power of two, so a fraction like n/255 generally won't fit exactly in a binary register, floating point or not. Before going further, I decided to check whether the registers were floating point or not. This is easy to do. If you keep dividing a fixed point number by 2, you lose one bit of information each time and the number eventually reaches zero. With a floating point number, you can divide many times before anything bad happens, because it's the exponent that gets modified, not the mantissa. With an 8bit exponent, you could divide roughly 128 times before anything happened. A fixed point register would need about 128 bits to survive that, and obviously you're not going to have one when the largest it could be is 24bit (what is advertised on the web for these cards). Also, floating point numbers go bad all at once, while fixed point numbers lose one bit at a time. So you can tell the difference there too.
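The idea behind that test can be sketched on the CPU in Python. This is only a simulation of the principle, not the shader code used on the card, and the function names are hypothetical. It divides by a power of two and multiplies back: a fixed point value loses its low bits, a floating point value comes back unchanged.

```python
def fixed_round_trip(raw: int, shift: int) -> int:
    """Divide a raw fixed point value by 2**shift, then multiply back."""
    return (raw >> shift) << shift


def float_round_trip(x: float, shift: int) -> float:
    """Same operation on a float: only the exponent changes."""
    return (x / 2**shift) * 2**shift


raw = 0x0ABD  # arbitrary 12-bit fraction used as test input
for shift in (1, 4, 8, 12):
    survived_fixed = fixed_round_trip(raw, shift) == raw
    survived_float = float_round_trip(0.67, shift) == 0.67
    print(shift, hex(fixed_round_trip(raw, shift)), survived_fixed, survived_float)
```

The fixed point column degrades one bit per halving, which is the signature to look for.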
What I found was that I could divide by 16 (and multiply by 16) and still get the same result back. But dividing by 32, I'd lose the least significant bit when displaying an 8bit result on the screen. I'd get all even numbers. All except for the number 255. That would always stay at 255 unless I shifted by 13 bits. The specs say that values can only range from -8 to +8. So we know there are 4 bits for the whole part (maybe 5, I haven't checked). If 255 stays as 255 no matter what, then we can assume it is represented as 1.0 internally, as expected. But what of the other values? This would mean that 255 values are mapped onto 256 values. One would be vacant. I suspected it would be 128 or 129, or some value near the middle, where two inputs would both map to the same value when displayed on screen. No such luck.
I wrote some code to try and represent a 16bit value using two 8bit color values. What I found was very strange. Splitting up the numbers was no problem because I could clip values wherever I wanted just by dividing. 255 was a problem, but I left that as a side issue. All other values clipped fine to get the high part. Once I have the high part, I can delete it from the original value, shift left and output the value. No problem.
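As a rough illustration of the split described here, in plain integer arithmetic in Python rather than shader instructions (the helper names `split16` and `join16` are hypothetical):

```python
def split16(value: int):
    """Split a 16-bit value into (high, low) 8-bit parts."""
    high = value >> 8           # "clip" by dividing by 256
    low = value - (high << 8)   # remove the high part from the original
    return high, low


def join16(high: int, low: int) -> int:
    """Put the two 8-bit parts back together."""
    return (high << 8) | low


high, low = split16(0xABCD)
print(hex(high), hex(low), hex(join16(high, low)))
```

With exact integers the round trip is lossless for every 16-bit value, so any error seen on the card has to come from how the values are converted on the way in or out.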
When I put them back together (into 12bit values, not 16bit because they won't fit), I got weird results. I could only see the high 8bits displayed on screen (this is normal), but even then there was something off. My test simply consisted of a texture gradient going from black to white. I should have been able to reconstruct it perfectly, but most values were slightly off. If I clipped the high order values, then I could reproduce the correct result. What this meant is that 8bit texture values were being converted to some weird values when put into these 16bit registers. It was adding some extra bits in the lowest 4 bits. I decided to find out what they were. This was simple to do. Take the original value, clip it as I did before and subtract it from the original. That'll leave only the lowest four bits. Then I multiply this by 256 and output it to the screen. Here's what I got from a gradient that simply has pixels going from 0 on the left to 255 on the right in greyscale.

What's going on here? The values range from 0x00 to 0xF0 and the last band is 0x00 again. So the lowest four bits go from 0 to 15 and then suddenly drop back to 0. The second band starts at pixel 8, and from there on every band is 16 pixels wide. Only the first and last bands are 8 pixels wide, both with a value of zero. So this is how your pixels get converted. A pixel value of 0x08 gets converted internally to 0x081. A pixel value of 0xF0 gets converted to 0xF0F.
Ok, so that's how they scale 255 values onto 256. They use an extra 4 bits of fractional representation. Fine. But that last band bothered me. Why is it black? Should the progression not continue? Then it hit me. What if it does continue? The next value after 15 is 16. But 16 takes 5 bits, so it carries out of the lowest four bits (leaving them at zero) and into the bits above. What if that was included in the result? This would mean that 248 is not represented internally as 248 (0xF80), but rather as 248+16/16, which makes 249 (0xF90). So 248 is the value that's skipped over (not 128 as I had thought). I haven't checked what it maps to when displayed. In any case, this would explain how values from 248 to 255 are internally represented as one higher than they really are. It would also explain how 255 is internally represented as 256 (or 1.0 in fixed point notation).
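One formula that reproduces every value quoted above is sketched below in Python. It is a guess that fits the observations, not a confirmed description of the hardware, and `texel_to_register` is a made-up name. It shifts the 8-bit value up by four bits and adds a correction that grows by one every 16 input values, starting at input 8. This turns out to be the same as rounding v/255 to the nearest 1/4096.

```python
def texel_to_register(v: int) -> int:
    """Assumed model: 8-bit texel (0..255) to a raw 12-bit fraction."""
    return (v << 4) + ((v + 8) >> 4)


for v in (0x00, 0x07, 0x08, 0xF0, 0xF8, 0xFF):
    print(f"{v:#04x} -> {texel_to_register(v):#05x}")

# The model agrees with plain rounding of v/255 onto 4096 steps.
print(all(texel_to_register(v) == round(v * 4096 / 255) for v in range(256)))
```

In this model the correction reaches 16 for inputs 248 to 255, which clears the low four bits and bumps the upper bits by one, matching the black band at the end of the gradient.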
So even though I correctly split up 16bit values, I still get problems if these values come from two different textures. If I use them BEFORE I output them, I'm fine. But then, why would you split them up in the first place? You simply don't. You split them because you don't have a choice. But then you hit a snag like this that causes problems.
This doesn't just cause problems for pixel values. You use the same operations for both pixel values and texture coordinates when doing dependent reads, so these problems crop up when doing texture sampling too. Also of note is that you can store constants for use in your pixel shaders. The problem is that I found out, using a similar process as above, that constants are internally stored as 8bit values. So what you think is 0.5 is actually 128/255, which is internally represented as 0x808, or 0.501953125. Not a big difference, but this is what the designers decided to accept.
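The arithmetic for the 0.5 constant can be checked with a small Python sketch. Both steps are assumptions: that the constant is rounded to the nearest 8-bit value, and that the 8-bit value is then expanded into the register the same way texture values appear to be. The function names are hypothetical.

```python
FRAC_BITS = 12  # assumed fractional bits of the register


def quantize_constant(x: float) -> int:
    """Assumed: a constant in [0, 1] is stored as the nearest 8-bit value."""
    return int(x * 255 + 0.5)


def texel_to_register(v: int) -> int:
    """Assumed model: 8-bit value to a raw 12-bit fraction."""
    return (v << 4) + ((v + 8) >> 4)


stored = quantize_constant(0.5)
raw = texel_to_register(stored)
print(stored, hex(raw), raw / (1 << FRAC_BITS))  # 128 0x808 0.501953125
```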
With newer cards, the internals use floating point numbers. Yet the usage of 255 as 1.0 remains. There is no fractional part, even though there is an instruction available that will let you retrieve it. There should really have been one set of instructions for pixels and one for everything else. At the very least, have a way to convert between the two. Or better yet, have everything use base 2. Leave 255 as 255. If you really want to scale your values to span the entire 256 value range, you can scale your number by 256/255 and then multiply by the inverse (255/256) when you're done.
Next item on my random thoughts is about components. I need a better way to develop and test them. Right now, I can "subclass" a component and change only what I need changed. Everything else is taken from the base components. I left this in for development purposes. But I don't really like this. It makes compilation a nightmare having to merge all these different things together. I suppose I could convert them all to self-contained versions and then use that. When you ship a component, it's supposed to be locked and then become self-contained. The problem is that I don't see many developers working that way. Reminds me too much of COM and that was a huge mess.
If anyone has any ideas on how to keep track of different versions of a component during development, let me know. It's really getting to me. Then again, I may simply convert all components to self-contained ones and then work with that. Right now, I'm converting every single time I need to compare component types, even if they've been visited before. I was thinking of caching, but now I'm thinking I should just create an entirely new network based on the original one.
Oh, and if I'm not posting as much as before, it's because I'm really motivated to get things done these days. My other projects (as well as Project V) are moving forward and will see results, so things are moving ahead.
By spc476, # 1. February 2008, 20:46:16
By Vorlath, # 3. February 2008, 14:18:24