A Blag About Graphics and Programming and Stuff

*gasp* Linux...

I'm a Microsoft fanboy: I use Visual Studio, DirectX, Windows Media Player, I like Vista and I think that C# and .NET are great.
But there's a geek in me who wants to see what the latest and greatest Linux distributions look like. I have a 10GB partition just for that. I install Linux, play around with it for a couple of hours and never boot it up again for the next half a year, until a newer and greater version comes out.

My latest evaluation checklist looks something like this:
  • How difficult is it to access my NTFS partition;
  • Connecting to my home wireless network;
  • Setting up proper (1280x800) screen resolution;
  • Does the headphone jack work;
  • How hard is it to play "evil" (mp3, etc) multimedia files;
  • Hardware graphics acceleration;
Naturally I want all of this "out of the box" (or at least with a couple of clicks of a button), without having to google stuff and type alien mumbojumbo in the console.
And it just so happens that this checklist is nearly impossible to pass (at least on my laptop)!

Old (7.10) Ubuntu/Kubuntu versions used to fail miserably: the only things that worked were reading NTFS and, sometimes, wireless. Yes, I couldn't even change the screen resolution to 1280x800!
Mandriva One 2008 and Linux Mint 4.0 looked half decent: the headphone jack and hardware acceleration were the only things that didn't work. The default resolution was messed up, but it was quite simple to change it.

Mandriva Linux 2008.1
The new version came out a couple of weeks ago. I boot up the live CD and as it is loading, I sense that something is different: zomg! The resolution is 1280x800! I couldn't believe my eyes, and then the most incredible thing happened: a window popped up and asked if I wanted to enable Compiz! Whoa, it supports hardware acceleration for my cutting edge, super new HD2600 card! And the magic didn't end there- sound was coming out of the headphones, and no sound was coming out of the built-in speakers while the headphones were plugged-in.
The only problem was "glsl shaders not supported". But at least I can run Tux Racer at 50fps...

That was a historic day: not only did a Linux distribution pass my entire checklist, it did it straight "out of the box".
Mandriva Linux 2008.1- certified by me.

I expect the next version to support multiple displays and maybe even the volume control thingy on the side of my laptop.

[rant]

Ubuntu 8.04 (<adjective> <noun>)
Yes, this is the one that came out today. The most beloved and popular Linux distribution. It should give Mandriva a run for its money!
No, not really.
No hardware graphics acceleration, no compiz, no sound from the headphones (uses the built-in speaker even when the headphones are plugged-in), half of teh internets is broken because "requires additional plugin" or "your Flash version is out of date", no mp3 playback without doing "stuff". And the fun thing: "your (graphics) hardware doesn't require proprietary drivers". OH RLY? Then how come it doesn't work?

I can understand not playing mp3 files, not including Flash and other hippie "free" software nonsense. I can understand when they don't support shaders or a webcam or an sd card reader, because, hey, writing drivers is hard.
But the thing that gets me, is when the dirty, tree loving hippies behind Ubuntu and other "free" software choose not to support my hardware just because they have personal issues with ATI or some other hardware vendor, I take that as a personal "fuck you".
"Humanity towards others" my ass.

And this brings me to my main point: the way I see it, there are two types of Linux:
  1. Corporate Linux: Mandriva, SuSe, Fedora. This is the good kind, because they have a corporation behind them and that means that they actually care about quality (and money);
  2. Everything else: it sucks. An exception is Linux Mint. I have no idea how this happened. Will have to investigate more.

[/rant]

Geomipmapping

The basic idea is that you split your terrain into smaller patches and render each patch in lower detail as it gets further away from the camera. Pretty much the same as texture mipmapping, hence the name: geomipmapping.

The annoying part is fixing the gaps between patches of different LOD (T-junctions). I already did this a couple of years ago; back then I used a static index buffer for the center of the patch and a dynamic one for the edges.

That worked quite OK. The best part was that one patch could be in the highest detail and its neighbor in the lowest, and both of them would still connect perfectly.

Instead of choosing which LOD to use based on distance, I used to test how big of a "pop" a lower detail patch would cause; if it was acceptable (just a couple of pixels), I used the lower LOD.
Determining which LOD to use this way had two great effects:
  • Patches that were further away used lower LOD (just like determining based on distance).
  • Patches that were flat or very smooth used lower LOD even if they were close to the camera. For example, if a camera is in the middle of a patch that is just a flat plane, it would be rendered in lowest detail because a plane never causes any popping. On the other hand, if you use distance based LOD, the same patch would be rendered in highest detail because it is very close to the camera.
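
A rough sketch of that "pop" test, in Python rather than the original code. The post doesn't say how the pop was measured, so everything here is an assumption: each LOD level is assumed to carry a precomputed worst-case height error (in world units, relative to full detail), and that error is projected to screen pixels with a plain perspective scale. The names `projected_error_px`, `select_lod` and `max_pop_px` are made up for illustration.

```python
import math

def projected_error_px(world_error, distance, fov_y_rad, viewport_height_px):
    """Approximate on-screen size (in pixels) of a world-space error."""
    # Pixels covered by one world unit at distance 1 (simple perspective scale).
    scale = viewport_height_px / (2.0 * math.tan(fov_y_rad / 2.0))
    return world_error * scale / max(distance, 1e-6)

def select_lod(lod_errors, distance, fov_y_rad, viewport_height_px, max_pop_px=2.0):
    """Pick the coarsest LOD whose pop stays under max_pop_px.

    lod_errors[k] is the assumed worst-case height error of LOD k
    (LOD 0 = full detail, error 0), non-decreasing with k.
    """
    best = 0
    for lod, err in enumerate(lod_errors):
        if projected_error_px(err, distance, fov_y_rad, viewport_height_px) <= max_pop_px:
            best = lod
    return best

FOV = math.radians(90.0)
flat_patch = [0.0, 0.0, 0.0, 0.0]    # a plane never pops
bumpy_patch = [0.0, 0.5, 1.5, 4.0]   # made-up error values

lod_flat_near = select_lod(flat_patch, 1.0, FOV, 800)      # coarsest LOD, even up close
lod_bumpy_near = select_lod(bumpy_patch, 10.0, FOV, 800)   # full detail
lod_bumpy_far = select_lod(bumpy_patch, 1000.0, FOV, 800)  # coarsest LOD
```

With these made-up numbers the flat patch gets the lowest detail even with the camera sitting on top of it, while the bumpy patch only drops detail once it is far away, which is the behavior described in the two points above.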

Ok, back to the present.
Currently I'm using simple distance based LOD, limiting neighboring patches to only one LOD level of difference and precomputing every possible LOD/edge variation. So every LOD has 9 permutations of different edges (I guess 16 if you're not doing any culling, 12 if you're not doing a certain hack).
I'm going to use geomorphing to solve the popping problem.
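
The distance based scheme with the one-level limit can be sketched roughly like this (Python, not the actual implementation). The assumptions: a higher LOD number means less detail, a patch that is too coarse gets refined until it is within one level of its neighbors, and the finer patch is the one that adapts its edge. `distance_lod`, `limit_neighbor_difference`, `edge_mask` and `lod_step` are hypothetical names.

```python
def distance_lod(distance, lod_step, max_lod):
    """LOD from distance: one level coarser every lod_step world units."""
    return min(int(distance // lod_step), max_lod)

NEIGHBORS = ((-1, 0), (0, 1), (1, 0), (0, -1))  # top, right, bottom, left

def limit_neighbor_difference(lods):
    """Return a copy of the LOD grid where adjacent patches differ by at most 1."""
    rows, cols = len(lods), len(lods[0])
    out = [row[:] for row in lods]
    changed = True
    while changed:
        changed = False
        for r in range(rows):
            for c in range(cols):
                for dr, dc in NEIGHBORS:
                    nr, nc = r + dr, c + dc
                    if 0 <= nr < rows and 0 <= nc < cols and out[r][c] > out[nr][nc] + 1:
                        out[r][c] = out[nr][nc] + 1  # refine the patch that is too coarse
                        changed = True
    return out

def edge_mask(lods, r, c):
    """4-bit mask, one bit per edge, set where the neighbor is coarser."""
    mask = 0
    for bit, (dr, dc) in enumerate(NEIGHBORS):
        nr, nc = r + dr, c + dc
        if 0 <= nr < len(lods) and 0 <= nc < len(lods[0]) and lods[nr][nc] > lods[r][c]:
            mask |= 1 << bit
    return mask

limited = limit_neighbor_difference([[0, 3, 3]])  # [[0, 1, 2]]
mask_first = edge_mask(limited, 0, 0)             # only the right edge needs stitching
```

A 4-bit mask gives 16 possible edge variations per LOD, so each one can index a precomputed index buffer. Getting from 16 down to 12 or 9 depends on the culling and the hack mentioned above, which this sketch does not attempt to reproduce.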

Terrain with the height map as the texture:

Wireframe (LOD+ culling)
Size: 513x513; Patch size: 33;
720fps

Wireframe (no LOD, no culling)
300fps

Depth of Field is Fun



SlimDX

As you may know, Microsoft no longer supports Managed DirectX and wants you to use XNA instead. What if (for whatever reason) you don't want to use XNA? SlimDX to the rescue!
Basically, it's just a managed wrapper on top of pure DirectX with several additional classes like BoundingBox/Sphere/Frustum, Ray etc. Yay for not having to learn a new API!

Most of the stuff is straightforward but since there isn't any proper documentation there are several "interesting" things you have to figure out yourself. My favorite one was that D3DXPlaneTransform() requires an inverse transpose of the transformation matrix, while SlimDX.Plane.Transform() does not. Oh the fun I had "debugging" that one...
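
To see why the inverse transpose matters at all, here is the underlying math in plain Python. It does not call D3DX or SlimDX, and it says nothing about what either library does internally; it only assumes the Direct3D-style row-vector convention (point' = point * M) and a plane stored as (a, b, c, d) with ax + by + cz + d = 0. All helper names are made up.

```python
def translation(tx, ty, tz):
    """Row-vector style translation matrix: [x y z 1] * M moves the point."""
    return [[1.0, 0.0, 0.0, 0.0],
            [0.0, 1.0, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [tx, ty, tz, 1.0]]

def transpose(m):
    return [list(col) for col in zip(*m)]

def row_times_matrix(v, m):
    return [sum(v[i] * m[i][j] for i in range(4)) for j in range(4)]

ground = [0.0, 1.0, 0.0, 0.0]             # the plane y = 0
move_up = translation(0.0, 5.0, 0.0)      # move everything 5 units up
move_up_inverse = translation(0.0, -5.0, 0.0)

# Treating the plane like a point and multiplying by M leaves it unchanged (wrong).
naive = row_times_matrix(ground, move_up)

# Multiplying by the inverse transpose gives y - 5 = 0, i.e. the plane y = 5.
correct = row_times_matrix(ground, transpose(move_up_inverse))
```

So an API that expects the inverse transpose wants the second form passed in by the caller, while one that takes the plain transformation matrix has to get to the same result on its own.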

And now, the important part: DO NOT download the ZIP (currently "SlimDX (Nov 07).zip"). It's very out of date and very useless. Just get it off the SVN and compile it yourself. It will save you a lot of headache.


p.s. Added a little poll. Let's see what kind of hardware you have... I can't believe that currently SM4.x is at 100% :rolleyes:

Map Your Texels to Pixels!

Directly Mapping Texels to Pixels.
The main problem is that when texels are not aligned to pixels, linear filtering blurs the texture and it ends up looking ugly (a bit more info than the MSDN article).

Another problem is that the image is slightly offset. By 0.5 texels, to be exact. This isn't really a problem if the resolution is high. But what happens if you want to stretch a low res texture over the entire screen?
Bloom is a perfect example: render something to a small texture (128x128), blur it vertically (render to another 128x128 texture), blur the result horizontally (render to a 128x128 texture), stretch the result over the screen.

If you aren't aligning texels to pixels, the image is "slightly" offset three times. But surely that can't add up to anything noticeable?!
O RLY..? ("And don't call me Shirley" :wink:)
Without alignment:
With alignment:
(Yes, the red circle thingy is blooming green; it's just to make the problem more visible.)

As the articles state, all you have to do, is offset the vertices (or texture coordinates) by half a texel:
Pos-= 1.0/ textureSize* 0.5;

So if your render target is 800x600 and your texture is 128x128 you need to do
Pos-= 1.0/ 128.0* 0.5;
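
A tiny CPU-side illustration of why the half texel matters (Python, one-dimensional, not shader code). It assumes the usual convention that texel i of an N-texel texture has its center at (i + 0.5) / N and that linear filtering blends the two nearest texel centers. `sample_linear_1d` is a made-up stand-in for the hardware sampler, with clamp addressing.

```python
import math

def half_texel_offset(texture_size):
    """The offset from the formula above, in texture-coordinate units."""
    return 1.0 / texture_size * 0.5

def sample_linear_1d(texels, u):
    """Linear filtering of a 1D texture at coordinate u in [0, 1]."""
    n = len(texels)
    x = u * n - 0.5                       # position measured in texel centers
    i0 = math.floor(x)
    frac = x - i0
    a = texels[min(max(i0, 0), n - 1)]    # clamp addressing at the borders
    b = texels[min(max(i0 + 1, 0), n - 1)]
    return a + (b - a) * frac

stripes = [0.0, 1.0, 0.0, 1.0]
n = len(stripes)

# Sampling exactly at texel centers returns the texels untouched.
aligned = [sample_linear_1d(stripes, (i + 0.5) / n) for i in range(n)]

# Sampling half a texel off blends neighbors: the stripes turn to mush.
misaligned = [sample_linear_1d(stripes, i / n) for i in range(n)]

offset_128 = half_texel_offset(128)       # 0.00390625
```

The aligned samples come back as the original stripes, the misaligned ones as mostly 0.5 gray, which is the blur described above in miniature.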


Have a nice day.

Demo: GPU Fractals



Introduction.

The demo comes in four flavors (techniques):
  1. An unrolled loop (the loop has the "[unroll]" attribute). The pixel shader compiles to "approximately 2247 instructions". Runs at ~18 fps;
  2. A "special" unrolled loop (more about this later);
  3. A loop with a "[loop]" attribute (let's call it a static loop). Compiles to "approximately 19 instructions". Runs at ~18fps (identical to the unrolled loop);
  4. Dynamic loop. The loop breaks when the distance to origin becomes greater than 2. Compiles to "approximately 26 instructions". Runs at anywhere between ~6 and ~380fps, depending on what is on screen.

The "Special" Technique
When generating the Mandelbrot set you need to calculate Z= Z^2+ C, where Z and C are complex numbers. In HLSL it looks like this:
tmp= (Z.x* Z.x)- (Z.y*Z.y);
Z.y= 2*Z.x*Z.y;
Z.x= tmp;
Z+= C;
This is what the "normal" techniques use, but the special one does this:
tmp= (Z.x* Z.x)- (Z.y*Z.y) + C.x;
Z.y= 2*Z.x*Z.y + C.y;
Z.x= tmp;
(Notice how C is added)

The result should be identical, and I guess the first version should be faster since it has one instruction less.
And here comes the wtf moment: when used in an unrolled loop, both versions compile to the same number of instructions (2247), but the "special" one runs much faster (~28fps vs. ~18fps) and the resulting image is slightly different (you can see it when you zoom-in somewhere).

I have no idea why the instruction count is the same when the loop is unrolled (static and dynamic loops compile as expected), nor why it runs so much faster. Tried comparing the ASM code but couldn't work anything out.
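
As a CPU-side sanity check of the "the result should be identical" claim, here are both update orders in Python. This only shows that with ordinary IEEE double arithmetic, evaluated in the order written, the two forms produce bit-identical values; it says nothing about what the shader compiler or the GPU actually does. The function names are made up.

```python
def step_normal(zx, zy, cx, cy):
    """Z = Z^2, then Z += C (the "normal" techniques)."""
    tmp = (zx * zx) - (zy * zy)
    zy = 2 * zx * zy
    zx = tmp
    return zx + cx, zy + cy

def step_special(zx, zy, cx, cy):
    """C folded into the same expressions (the "special" technique)."""
    tmp = (zx * zx) - (zy * zy) + cx
    zy = 2 * zx * zy + cy
    zx = tmp
    return zx, zy

def trajectories_match(cx, cy, steps=100):
    """Iterate both versions side by side and compare every step exactly."""
    a = b = (0.0, 0.0)
    for _ in range(steps):
        a = step_normal(a[0], a[1], cx, cy)
        b = step_special(b[0], b[1], cx, cy)
        if a != b:
            return False
        if a[0] * a[0] + a[1] * a[1] > 4.0:   # escaped, stop like the dynamic loop
            break
    return True

same_inside = trajectories_match(-0.5, 0.5)
same_outside = trajectories_match(0.4, 0.4)
```

So whatever makes the image differ on the GPU, it is not the math as written.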

Demo
Because of the crazy instruction count and/or looping, you need hardware that supports shader model 3.0 (gf6 or radeon x1X00). So, for those who don't have it, I made a little video (1:18, ~9 MB).

Download (source + binary, 40KB)

Fractals are Even More Fun!

The algorithm for generating the Mandelbrot set looks like this:

for each pixel
{
  C= (x,y);
  Z= (0,0);
  i= 0;
  while i<MAX_NUMBER_OF_ITERATIONS and Z distance to origin< 2 
  { 
     Z= Z^2 + C;
     i++;
  }
  Color= ColorTable[i];
  DrawPixel(x,y, Color);
}
Where C and Z are complex numbers.
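
The pseudocode above translates almost line for line into Python. This is a minimal sketch, not the original C# program: the mapping of pixels onto the region x in [-2, 1], y in [-1, 1] is an assumption, the "distance to origin < 2" test is done on the squared distance to avoid a square root, and `escape_iterations` and `render` are made-up names.

```python
def escape_iterations(cx, cy, max_iter=256):
    """Number of Z = Z^2 + C steps before Z leaves the radius-2 circle."""
    zx = zy = 0.0
    i = 0
    while i < max_iter and zx * zx + zy * zy < 4.0:
        zx, zy = zx * zx - zy * zy + cx, 2.0 * zx * zy + cy
        i += 1
    return i

def render(width, height, max_iter=256):
    """Iteration count for every pixel, as a list of rows."""
    image = []
    for py in range(height):
        cy = -1.0 + 2.0 * py / (height - 1)
        row = []
        for px in range(width):
            cx = -2.0 + 3.0 * px / (width - 1)
            row.append(escape_iterations(cx, cy, max_iter))
        image.append(row)
    return image

tiny = render(16, 8, 32)
```

One detail worth noting: i can end up equal to the maximum iteration count (for points that never escape), so a color table indexed by i needs one more entry than the iteration limit, or the index has to be clamped.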

When MAX_NUMBER_OF_ITERATIONS = 256, generating this (1280x1024) image takes about 20 seconds.

But, as you can see, the algorithm is very easy to parallelize: the pixels are independent of each other, there's no writing to the same memory locations etc.
So, for example, you could run it with two threads (one thread generates one half of the image, the second generates the other half). On a dual core machine it should run two times faster (on quad core, four times, etc.).
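
The split itself can be sketched like this in Python, assuming the image is divided into horizontal bands, one per worker. Big caveat: in standard CPython, threads doing pure Python math do not run in parallel because of the global interpreter lock, so this shows the partitioning, not the speedup. The pixel-to-plane mapping and all function names are made up.

```python
from concurrent.futures import ThreadPoolExecutor

def escape_iterations(cx, cy, max_iter):
    zx = zy = 0.0
    i = 0
    while i < max_iter and zx * zx + zy * zy < 4.0:
        zx, zy = zx * zx - zy * zy + cx, 2.0 * zx * zy + cy
        i += 1
    return i

def render_rows(y_start, y_end, width, height, max_iter):
    """Render rows [y_start, y_end); touches no shared state."""
    rows = []
    for py in range(y_start, y_end):
        cy = -1.0 + 2.0 * py / (height - 1)
        rows.append([escape_iterations(-2.0 + 3.0 * px / (width - 1), cy, max_iter)
                     for px in range(width)])
    return rows

def render_parallel(width, height, max_iter, workers=2):
    """Give each worker its own band of rows, then glue the bands back together."""
    bands = [(height * k // workers, height * (k + 1) // workers) for k in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(lambda b: render_rows(b[0], b[1], width, height, max_iter), bands)
        return [row for part in parts for row in part]

serial = render_rows(0, 8, 16, 8, 32)
parallel = render_parallel(16, 8, 32, workers=2)
```

Because every pixel depends only on its own coordinates, the banded result is identical to the serial one no matter how many workers are used.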

Now if only I had access to some kind of processor that had more than two cores, much more. A processor that would be built to run parallel code. Maybe those cores could be called, I don't know, "stream processing units"? And there could be 100, no, 120 of them!

Oh... wait... :wink:

Wrote a little program that generates the same image (1280x1024, 256 iterations, the color table is in a 256x1 texture) in the pixel shader.
It runs at 17fps. That is 20 seconds on the CPU and 0.06 seconds on the GPU! w00t?

The shader is a perfect test to see how dynamic and static branching/looping performs in different situations.

I should be posting a demo in a couple of days.

P.S. Updated a very old post with info on how to use the 3ds Max SDK's IGame to export animations.

Fractals are Fun

One of my assignments for the C# course at the university is fractals.
Nothing fancy, simply "fractals" so here we go:

Mandelbrot set:

Same as above, just zoomed in somewhere:

Julia set (C = -0.835 - 0.2321i):

Here's a good website if you're interested in how it works and don't want your head to explode from reading a lot of smart words.

Fractals are fun because you can (in theory) zoom in as much as you want, but what happens when you run out of floating point precision? This:
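
What running out of precision means in practice: once the distance between two neighboring pixels is smaller than what the float can resolve around the current position, several pixels end up with exactly the same coordinate and therefore the same color. A minimal sketch in Python, whose floats are 64-bit doubles (which precision the original program used is not stated); `distinct_pixel_coords` is a made-up helper.

```python
def distinct_pixel_coords(center, pixel_size, count=8):
    """How many different coordinates do `count` neighboring pixels get?"""
    return len({center + k * pixel_size for k in range(count)})

normal_zoom = distinct_pixel_coords(-0.75, 1e-3)    # every pixel gets its own coordinate
too_deep = distinct_pixel_coords(-0.75, 1e-18)      # all pixels collapse onto one value
```

At the deeper zoom all eight pixels get the exact same coordinate, so they all get the same color and the image falls apart into blocks.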

C# and .NET are great, but working with System.Drawing.Color is terrible. There are a lot of predefined values (wtf is "BlanchedAlmond" or "MediumTurquoise" ?!) and a FromArgb(int, int, int, int) method.
What the hell? If a linear interpolation method is too much to ask, then how about addition, subtraction, and multiplication operators, or at least FromArgb(float, float, float, float)?
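
For reference, the missing helper is only a few lines. Here it is sketched in Python on plain (A, R, G, B) integer tuples instead of System.Drawing.Color; `lerp_argb` is a made-up name, and the same per-channel math would sit in front of a FromArgb(int, int, int, int) call.

```python
def lerp_argb(c0, c1, t):
    """Linear interpolation between two (A, R, G, B) tuples with 0-255 channels."""
    t = min(max(t, 0.0), 1.0)                 # keep t inside [0, 1]
    # Interpolate each channel and round to the nearest integer.
    return tuple(int(a + (b - a) * t + 0.5) for a, b in zip(c0, c1))

black = (255, 0, 0, 0)
white = (255, 255, 255, 255)
gray = lerp_argb(black, white, 0.5)           # (255, 128, 128, 128)
```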

Content Pipeline

Happy New Year.
Better late than never :rolleyes:

The exams at the university are finally over and I have a chance to get some proper work done on the thing that I work on.

Without going into too much detail, the "\data" directory of the last "official build" was 80 MB and 145 files. Currently it's cranking at 319 files and 418 MB (but there's some duplicated junk).

All of the data has to pass the "content pipeline" (slice up, export the models, resize, convert the textures to a DXT format, pack them, etc.)
The problem is that my "content pipeline" looks something like this:
  1. Take data;
  2. wtf?!
  3. Load it;
  4. Spot that something is terribly wrong;
  5. Goto 2;
Finally the content is at a stable state (read: doesn't change frequently). So the amount of me contemplating stabbing myself in the eye with a corkscrew has gone down dramatically. Still, I wish I could figure out how to optimize the second step...

The moral is: kids, remember, if you are planning to make anything more complex than a pacman clone, think about all the shit that will have to be moved around.

That is all.

What, the... End?

Somehow managed to get virtual depth cube maps to work. No fetch4, no fancy filtering, simply render to a D3DFMT_D24X8 texture and automagically get the shadowing term with a tex2Dproj().
The results:
Simple D3DFMT_R32F cubemap: 64fps;
Virtual D3DFMT_R32F cubemap: 62fps;
Virtual depth cube map: 69fps;

I think I'm doing the double-speed Z rendering, but on the other hand the final pixel shader is more complicated. I guess that's why the performance increase is so small.
One interesting thing: my ATI HD2600 does perform some kind of shadow filtering when linear filtering is enabled for the depth map. So you get a bit of speed and a bit better image quality.
The annoying thing is that even though you are rendering only to the depth buffer, you still need a color buffer which is the same size as the depth buffer (in my case- 1536x1024).


And here's the performance on my gf6200:
Simple D3DFMT_R32F cubemap: 17fps;
Virtual D3DFMT_R32F cubemap: 16fps;
Virtual depth cube map: 16fps;
Impressive :wink:
