
Rants & Ramblings


Posts tagged with "Sun"

Better late than never


I've been bitching about the lack of hardware documentation ( especially on graphics hardware ) for a long time, and sometimes the right people hear me whine and drop programming manuals on my (virtual) doorstep. A few days ago this happened again. This time I got:
  • official cg6 documentation, including the geometry unit
  • official SX documentation ( the graphics processor built into the SS10SX's and SS20's memory controllers ) - complete with instruction set description. Claims to be preliminary and predates the SS20 ( it's dated 1990 ) but most information in there appears to be accurate.
  • a firmware loader and some code for the cg12 - maybe we can get some acceleration out of that one as well


We pretty much know how to deal with a cg6. The matrix / geometry unit is interesting, but it's not something that's useful for basic 2D acceleration.
The SX manual more or less confirms what we knew and/or guessed about the SX - it's a vector processor with plenty of internal registers ( 128 of them, 32bit wide, with the first 8 having special purposes ). It can access all physical memory ( well, it is built into the memory controller ) but apparently no SBus space. The CPU is supposed to feed it instructions, one or two at a time. There are two sets of mappable registers - one for kernel use, one for userland with parts read only ( things like boundary checking, otherwise userland could use the SX to access arbitrary memory ). The manual documents most of the register bits. I found some that are set by SX but which the manual claims are unused. That's not all that surprising though, since the manual is a few years older than ( and therefore a few revisions behind ) the hardware I'm using.
SX turns out to be a vector processor, not some sort of SIMD unit as we initially suspected. The good thing about that is that most instructions take a count of how many times to repeat the operation on successive registers and/or memory locations. That way we can read or write up to 32 registers or do up to 16 other operations with a single instruction, which limits the number of instructions the CPU has to send. Now these operations don't all run at the same time - SX has two ALUs, so there is some parallelism but not a whole lot. On the other hand, MBus CPUs aren't exactly fast by today's standards either, so whatever we can offload to SX will probably help.
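To put numbers on the repeat counts, here is a rough sketch in C of how many instructions the CPU would have to send for a given amount of work. The two limits ( 32 and 16 ) come from the text above. The function and macro names are made up for illustration and have nothing to do with the real SX instruction encoding.

```c
#include <assert.h>

/* Limits taken from the text: one instruction can read or write up to
 * 32 registers, or repeat any other operation up to 16 times. */
#define SX_MAX_LDST_COUNT 32u
#define SX_MAX_ALU_COUNT  16u

/* Hypothetical helper: number of instructions needed to cover `n` items
 * when a single instruction handles at most `max_per_insn` of them. */
static unsigned sx_insn_count(unsigned n, unsigned max_per_insn)
{
    assert(max_per_insn > 0);
    return (n + max_per_insn - 1) / max_per_insn;   /* round up */
}
```

Loading 64 registers from memory would take 2 instructions, while running an ALU operation over the same 64 registers would take 4.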

Cruel & unusual graphics hardware


I started hacking on Sun's CG6 / GX family of graphics controllers again, mostly because a SPARCstation LX showed up on my doorstep and I found a relatively reasonably priced source for video memory modules for it. With that it's capable of running video resolutions up to 1600x1280, which makes it useful as a debug head. Sure, a 50MHz MicroSPARC is painfully slow by today's standards but it's still more than enough to run a bunch of xterms, text editors and ssh sessions.
Without the additional video memory it would top out at 1152x900 which, even though most modern TFTs support it, doesn't match up with their native resolution resulting in ugly stretching artifacts. Panels that use 1280x1024 or 1600x1200 are far more common and with the VRAM upgrade the LX can run them at native resolution.
The good news is that the LX's video output circuitry produces a nice, crisp picture even at high resolutions ( unlike, for example, shark or a whole bunch of contemporary consumer grade graphics hardware ). The bad news is that the LX's onboard CG6 is rather slow even compared to other CG6. I did most of my development work on various Turbo GX and XGX variants found on SBus cards, all are quite a bit faster. On top of that there are small differences in the actual graphics processor, and one bit me on the LX:
All CG6 have a bit in the status register which is set whenever the blitter or the drawing engine is busy. On some variants ( by coincidence all the SBus cards I have fall into this category ) there's another bit in the same register indicating that the pipeline is full, so in order to send another command my drivers would wait for that bit to clear. Now it turns out that the LX's onboard CG6 doesn't support this second bit, so we have to wait until the blitter is done before sending more commands, resulting in more waiting and less parallelism between CPU and graphics controller.
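The difference between the two kinds of CG6 can be sketched roughly like this. It is not the actual cgsix driver code: the bit values and all names below are made up for illustration, and only the logic ( wait for 'pipeline not full' where available, otherwise wait for 'not busy' ) follows the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* HYPOTHETICAL bit positions, for illustration only. */
#define GX_STATUS_BUSY 0x1u  /* blitter or drawing engine is busy */
#define GX_STATUS_FULL 0x2u  /* command pipeline is full ( not on all variants ) */

/* Can we send another command, given a status register value? */
static bool gx_can_send(uint32_t status, bool has_full_bit)
{
    /* With a 'full' bit we may queue commands while the engine is still
     * busy. Without it we have to wait for the engine to go idle. */
    uint32_t mask = has_full_bit ? GX_STATUS_FULL : GX_STATUS_BUSY;
    return (status & mask) == 0;
}

/* Spin on the ( memory mapped ) status register until we can send. */
static void gx_wait(volatile const uint32_t *status_reg, bool has_full_bit)
{
    assert(status_reg != NULL);
    while (!gx_can_send(*status_reg, has_full_bit))
        continue;
}
```

On a chip with the second bit a busy but not full engine still accepts commands. On the LX the same status value means waiting.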
While there I also added support for anti-aliased fonts to the cgsix driver, which is quite usable, despite the slow CPU, as long as there is video memory available to cache glyphs in. Of course the alpha blending has to be done by software since the CG6 doesn't even know about the concept.

Alpha-blending vs. FFB


Sun's Creator / Creator3D / Elite3D family of graphics boards is a strange bunch, compared to what you'd find in PCs. Their distinguishing design choice is their use of 3dRAM, which is marketing speak for video memory with built-in ALUs. The idea is to conserve video RAM bandwidth by eliminating or at least greatly reducing read-modify-write cycles. For this purpose the chip supports many different views on its memory:
  • Five 'dumb' apertures, which bypass the ALUs and access memory directly. One 32bit per pixel one and four 8bit views, one for each component ( red, green, blue, X - depending on context that's either WID or alpha )
  • Six 'smart' apertures which access memory through the on-chip ALUs and that way may have all sorts of side effects like bit operations, alpha, depth cueing, z-buffering etc. applied. One for 32bit per pixel, one for each channel and a 64bit view through which both pixel and Z-buffer data are visible.

On the 3D models each pixel consists of 96 bits of information - front and back, Z and stencil buffer. The non-3D models only have 32bit per pixel - just one framebuffer, no double or Z buffering available.
So, in order to draw anti-aliased fonts in the ffb driver my first idea was to program the ALUs for a * fg + (1 - a) * bg alpha blending and set the colour source to constant, so fg in the formula comes from the foreground colour register instead of pixel data written to the framebuffer. Unfortunately there is no mode to have the background colour come from a register as well, so in order to draw a character we first have to fill the character cell with the background colour. This isn't too bad - if we're drawing a space we can stop right there and skip the whole alpha blending business. We have to wait for the drawing engine to finish anyway since changes in ALU programming only take effect when the engine is idle. Now we should be able to just memcpy the alpha map for the character into the 8bit smart aperture corresponding to the X channel, in this case alpha. Unfortunately this doesn't work: if I do this I end up with colour data from the pixel I write to being fed to the ALU. If I write only the alpha value into the 32bit smart aperture it uses the colour from the foreground register. Ah well, still 32bit writes per pixel but at least I don't have to combine alpha and foreground colour like Xorg's sunffb driver does.
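For reference, this is what the a * fg + (1 - a) * bg formula looks like when done in software, as a minimal sketch in C. It assumes 8bit alpha ( 0 = background only, 255 = foreground only ) and a 0x00RRGGBB pixel layout. Both are assumptions for illustration, not a description of what the ffb hardware or any driver actually does internally.

```c
#include <stdint.h>

/* Blend one 8bit channel: a * fg + (1 - a) * bg, with alpha scaled to
 * 0..255 and rounding to the nearest integer. */
static uint8_t blend_channel(uint8_t fg, uint8_t bg, uint8_t a)
{
    return (uint8_t)((a * fg + (255 - a) * bg + 127) / 255);
}

/* Blend two pixels, assuming a 0x00RRGGBB layout. */
static uint32_t blend_pixel(uint32_t fg, uint32_t bg, uint8_t a)
{
    uint32_t r = blend_channel((uint8_t)(fg >> 16), (uint8_t)(bg >> 16), a);
    uint32_t g = blend_channel((uint8_t)(fg >> 8), (uint8_t)(bg >> 8), a);
    uint32_t b = blend_channel((uint8_t)fg, (uint8_t)bg, a);

    return (r << 16) | (g << 8) | b;
}
```

The point of letting the 3dRAM ALUs do this is that the CPU never has to read bg back from video memory, which is exactly the read-modify-write cycle the software version can't avoid.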

New year, new graphics hardware


Some Kind Soul(tm) sent me a Sun XVR-1000 board, complete with daughter card for additional outputs. This thing occupies two slots ( one UPA and whatever is next to it - the specs claim it needs one UPA and one PCI slot which is nonsense, it only needs additional space, not connectivity ) and has four outputs - traditional 13W3, S-Video, VGA and DVI. The latter two are on a daughter card with one additional DAC so they're likely not independent ( that, and the big, fat heatsink with attached wind tunnel accounts for the need for a 2nd slot )
Documentation for this card is not available but supporting it as a dumb framebuffer was trivial - the firmware tells you where the card's memory regions are, what video mode it's in so getting it going required just minimal poking around. For a dumb framebuffer the thing is quite fast though - in my U60 it beats some PCI graphics boards with acceleration, I guess that's where the UPA connection pays off.
So, now we support it as both console ( gfb at mainbus ) and in X ( with the wsfb driver ) - both unaccelerated for now but usable.
This hardly does the card any justice, but without documentation there's not much I can do. I found the website of one of the engineers involved with its design, tried to contact him but didn't get an answer so far. Since he also worked on other Sun graphics hardware I'm sure I won't run out of questions for a while if he ever answers.

Wrote a couple drivers


Since I got an S24 for the promise of writing drivers I had to follow through - tcx at sbus now uses the 'blitter' for scrolling and the stipple 'engine' for filling rectangles and drawing characters. This card is seriously weird.
First the non weird characteristics - it's got 4MB video memory, each pixel is 26bit with 2 bits control and either 8 bit palette or 24bit colour with or without gamma correction. It plugs into the SPARCStation 5's AFX slot which is basically the MicroSPARC's graphics bus which is 64bit wide and apparently works at more or less the same speed as the memory interface. To that add a DAC which understands the control bits, supports a hardware sprite etc, and different views on the graphics memory, either as 24bit without control, 24bit with control bits or 8 bit. So far so standard. The weird thing is how this card does graphics acceleration. It doesn't have a graphics processor, instead it has a 'blit space' and a 'stipple space'. Both respond to 64bit writes where the address you write to defines a target pixel in video memory and what you write into it defines a command which can be 'copy up to 32 pixels from one location to another' or 'write this 32 pixel pattern in this colour'. Stipple space supports only transparent patterns and they have to be aligned to 32 pixel boundaries. Blit space can copy 1 to 32 pixels without any alignment requirements. Both exist in two versions, either with access to the control bits or without ( although copies without control bits are kind of useless ).
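To illustrate what the alignment rules mean for a driver, here is a rough sketch in C of how a horizontal span would be split up. It only covers the arithmetic. The actual command encoding isn't described here, so nothing below writes to hardware, all names are made up, and the bit order ( leftmost pixel in the most significant bit ) is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Stipple pattern for the 32 pixel aligned chunk starting at chunk_x,
 * with a bit set for every pixel inside the span [x0, x1).
 * ASSUMPTION: the leftmost pixel is the most significant bit. */
static uint32_t span_mask(uint32_t chunk_x, uint32_t x0, uint32_t x1)
{
    assert((chunk_x & 31) == 0);
    uint32_t mask = 0;
    for (uint32_t i = 0; i < 32; i++) {
        uint32_t px = chunk_x + i;
        if (px >= x0 && px < x1)
            mask |= 0x80000000u >> i;
    }
    return mask;
}

/* Stipple writes needed to fill [x0, x1): one per aligned chunk touched. */
static unsigned stipple_writes_for_span(uint32_t x0, uint32_t x1)
{
    if (x1 <= x0)
        return 0;
    uint32_t first = x0 & ~31u;
    uint32_t last = (x1 - 1) & ~31u;
    return (last - first) / 32 + 1;
}

/* Blit writes needed to copy `width` pixels: no alignment requirement,
 * up to 32 pixels per command. */
static unsigned blit_writes_for_span(uint32_t width)
{
    return (width + 31) / 32;
}
```

A 10 pixel wide fill starting at x = 30 straddles a 32 pixel boundary and needs two stipple writes, while copying the same 10 pixels takes a single blit command.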
I also added acceleration support to Xorg's suntcx driver, it uses EXA and supports block copies and rectangle fills.

After that I finally got around to writing a console driver for the Sun PGX32 / TechSource Raptor GFX 8P. It's nothing special, just a Permedia 2 with 8MB SGRAM. For now pm2fb at pci uses the blitter for scrolling and rectangle fills. For some reason I couldn't get the drawing engine to actually draw anything, therefore characters are drawn by software and rectangle fills use SGRAM-specific fast fill operations. I must be missing something here since Xorg's glint driver clearly manages to get it to draw stuff. The whole thing is a mess - a dozen subunits, each with enable bits in a dozen different registers. I probably forgot to enable one crucial subunit, which keeps drawing operations from doing anything other than zeroing rectangles out. Ah well, at least copy operations work right.

Finally, last week hell froze over and I had another look at my rev. 5 Shark's graphics chip. Finally cleaned up the igsfb hacks that have been sitting in my source tree for years, committed the mode setting code, made sure it doesn't behave any different than before on Krups, added support for Sun-like video mode specifiers in OpenFirmware's output-device variable.
So, on a rev. 5 Shark you can now
setenv output-device screen:r1280x1024x60

And igsfb will switch to 1280x1024 at 60Hz. Any mode defined in src/sys/dev/videomodes/ should work as long as the graphics chip can support it. There is no support for TV output yet.
The firmware doesn't support modesetting but it will happily ignore the mode specifier so we can use it and users can now treat the shark more or less like a weird Sun ;)
That said, I also finally got around to writing an Xorg driver. The xf86-video-igs driver so far only supports the VLB CyberPro 2010 but adding support for PCI variants and the CyberPro 2000 should be trivial - I don't have the hardware though. The driver is still quite immature: it uses wscons ioctl()s for the hardware cursor, doesn't contain any modesetting support ( it uses whatever it finds, that's why I added the code described above ;) ), it only accelerates rectangle fills and block copies and there is no support for switching colour depth either. That will be added to the kernel driver though - make it switch to whatever is the highest colour depth possible in the given mode when switching to graphics mode.
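Picking such a specifier apart is simple enough. The following is only a sketch in C, not the code that went into igsfb. The struct and function names are made up, and the format ( device name, colon, then r<width>x<height>x<refresh> ) is inferred from the one example above.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical container for a parsed mode specifier. */
struct mode_spec {
    int width;
    int height;
    int refresh;   /* in Hz */
};

/* Parse something like "screen:r1280x1024x60". Returns false if there is
 * no mode specifier or if it doesn't look like r<w>x<h>x<refresh>. */
static bool parse_output_device(const char *value, struct mode_spec *out)
{
    const char *colon = strchr(value, ':');
    int w, h, r;
    char trailing;

    if (colon == NULL)
        return false;          /* plain "screen", no mode given */

    /* The extra %c only matches if garbage follows the refresh rate. */
    if (sscanf(colon + 1, "r%dx%dx%d%c", &w, &h, &r, &trailing) != 3)
        return false;
    if (w <= 0 || h <= 0 || r <= 0)
        return false;

    out->width = w;
    out->height = h;
    out->refresh = r;
    return true;
}
```

A caller would then look up a matching entry among the known video modes and fall back to whatever the firmware set up if there is none.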

PGX32 / Raptor GFX 8P support


Yeah, we support it now.
Turned out genfb would have worked out of the box if the firmware wasn't buggy - the OpenFirmware calls to change palette registers don't work and in 32bit colour the 'linebytes' property incorrectly contains the same value as 'width'. So I added a workaround: if 'linebytes' is smaller than bytes_per_pixel*width then we know 'linebytes' is bogus and use bytes_per_pixel*width instead. Yeah, it's slow but it works.
The PGX32 isn't much more than an off-the-shelf Permedia2 with Sun firmware and a few jumpers to turn off things like fixed VGA PCI resources. XFree86's glint driver supports the chip and would have worked out of the box if it had ever been tested on a big endian machine with an operating system that enforces PCI mapping restrictions - NetBSD doesn't allow you to mmap PCI space that doesn't belong to any device. The driver on the other hand was written a little bit sloppily - there's a register block, 128kB, which contains the same set of registers twice, once big endian and once little endian. The little endian block comes first. Now the problem is that the driver always tries to mmap 128kB even when it wants the 2nd part, which results in attempting to map 64kB of unoccupied PCI space. Trivial to fix but still annoying.
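The 'linebytes' sanity check boils down to something like the following sketch in C. The function name is made up and this is not the literal genfb code, just the check described above.

```c
#include <stdint.h>

/* Return a usable stride in bytes. If the firmware's 'linebytes' value is
 * smaller than one line of pixels can possibly be, it is bogus and we
 * fall back to bytes_per_pixel * width. */
static uint32_t fb_stride(uint32_t linebytes, uint32_t width, uint32_t depth_bits)
{
    uint32_t bytes_per_pixel = (depth_bits + 7) / 8;
    uint32_t min_stride = bytes_per_pixel * width;

    return (linebytes < min_stride) ? min_stride : linebytes;
}
```

With the buggy firmware a 1280 pixel wide 32bit mode reports linebytes = 1280, which gets corrected to 5120. A larger-than-minimum value is left alone since padded lines are perfectly legal.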
Other than that it Just Works(tm).
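The register mapping problem is easier to see with numbers. Here is a sketch in C of the range calculation, using the sizes from the description above ( 128kB block, little endian half first ). The names are made up and this is not the glint driver's code.

```c
#include <stdbool.h>
#include <stdint.h>

#define PM2_REG_BLOCK_SIZE 0x20000u  /* 128kB: both register sets */
#define PM2_REG_HALF_SIZE  0x10000u  /* 64kB: one register set */

/* Hypothetical description of a range inside the register block. */
struct map_range {
    uint32_t offset;
    uint32_t size;
};

/* The range to map: little endian registers first, big endian second,
 * and in both cases only one half's worth of space. */
static struct map_range pm2_reg_range(bool big_endian)
{
    struct map_range r;

    r.offset = big_endian ? PM2_REG_HALF_SIZE : 0;
    r.size = PM2_REG_HALF_SIZE;
    return r;
}

/* Does a range stay inside the space the device actually owns? */
static bool range_fits(struct map_range r)
{
    return r.offset + r.size <= PM2_REG_BLOCK_SIZE;
}
```

The original behaviour corresponds to offset 0x10000 with size 0x20000, which runs 64kB past the end of the block. That's the part NetBSD refuses to map.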

X.org and NetBSD/sparc64


Since Core decided to switch to X.org I've been gradually importing sources into NetBSD's cvs repository, namely modular X.org 7.0, freetype2 and Mesa. The latter two were part of the monolithic releases.
As a first step to get something usable I added the missing bits and pieces to make it work on NetBSD/sparc64, binaries can be found here. This is by no means final and doesn't include most of our additions, like all the sunffb bugfixes ( only some ) or CG6 acceleration. I tested it on an ffb2+ and the onboard Rage Pro found in my Ultra 10 - works just fine, seems reasonably stable after the last round of bugfixes.

Opera on a SPARCbook?


Yesterday I went insane or something and tried to run Opera 7.54 under NetBSD's COMPAT_SVR4 on my 3GX ( I used 7.54 because I had it around and was just too lazy to download something newer at that point ). As expected it barfed over a bad system call. Since this particular error didn't happen on sparc64 I had a look - turns out the 32bit emulation maps this particular call to an empty function returning 0 so I changed COMPAT_SVR4 accordingly and the error went away. And Opera just started. No garbled GUI like on sparc64, everything looked fine.
Poking around a bit more I found that DNS lookups don't work in Opera. Poking around even more I found that DNS lookups don't work with any Solaris binary ( like, telnet <some_IP> works, but telnet some.host.name doesn't if it requires a DNS lookup ).
I'm using libraries snatched from a machine running Solaris 9 and apparently Sun added a shedload of new sockio()s and the DNS resolver barfs if sockio(SIOCGLIFNUM) fails. Christos added it over night and now Opera works fine with 'Synchronous DNS' enabled.
The reason why I find this noteworthy is - by today's standards this machine is slow ( just a 110MHz MicroSPARC II ) and doesn't have much RAM ( 64MB ). Opera is fast enough to be useful. It's certainly faster than the HTML renderer in KDE1 and running something gecko-based with only 64MB RAM is a joke. In fact only Dillo is faster but it doesn't support any CSS and of course lacks just about everything Opera supports.
Just for kicks ( and to get rid of the ad banner which is really annoying on a small 800x600 screen ) I installed Opera 8.52 - works just fine. Not a bit slower than 7.54, quite the opposite actually.
So big thumbs up to Opera Software for making the only modern browser that's usable on this kind of hardware.

NetBSD on Sun hardware


So 3.0 is about to be released Really Soon Now(tm) - why should anyone care? Here's why:

First, the infamous sleep forever bug that made various UltraSPARC boxes lock up randomly has been fixed. Finally.
Second, NetBSD/sparc64 switched to the wscons console driver which allows nice things like virtual consoles, different terminal emulations, fonts and so on. We have accelerated drivers for most Sun-labeled graphics devices you're likely to find in any supported Ultra, namely the CG6 family ( GX, GX+, TGX, TGX+ ), ffb ( Creator, Creator3D ), afb ( Elite3D ) and mach64 ( PGX24, graphics chips found on Ultra 5 and Ultra 10 mainboards, probably others )
With 3.0 you can also run XFree86 on all of these, with full acceleration. The acceleration part for the cg6-driver had to be written from scratch and there were a few bugs to squish in the ffb/afb driver and the Xserver itself but now things work nicely.
To sum it up - we're almost on par with i386 feature-wise now. For 4.0 I'll hopefully omit 'almost' ;)

The cg6 isn't exclusively found in sparc64 machines, there are probably more 32bit Suns equipped with this kind of graphics board and of course both the console and the XFree driver work there too. However, since we still don't have wscons-compliant console drivers for all framebuffers commonly found in 32bit Suns these won't be included with 3.0

What's to come after 3.0:
  • A new ffb driver that uses XAA and supports hardware-accelerated alpha-blending ( this greatly speeds up drawing of anti-aliased text ) - works in -current, too new for 3.0.
  • support for switching virtual consoles with X running. Still needs some bugs fixed.
  • wscons and XFree on NetBSD/sparc. Right now we have drivers for Weitek P9100 and CG6 ( both X and console ), cg14 and ZX/leo are being worked on.
  • greatly improved support for Tadpole SPARCbook 3GX and similar laptops. We have drivers for the audio chip ( still somewhat experimental but good enough to play MP3s ), PCMCIA ( stable ), console (stable, but not active since NetBSD/sparc didn't switch to wscons yet ), XFree86 ( currently 8bit only but with some acceleration ), CPU power saving, more feedback on the built-in status-LCD and so on.
  • XFree86 now works on the JavaStation Krups. Unaccelerated and 8bit only but at least it uses a hardware cursor, it's definitely usable for light work ( and you won't run anything heavy on a 100MHz MicroSPARC IIep anyway )

