Rants & Ramblings

PowerPC abandonment syndrome

I have been bitching about Opera discontinuing their version for MacOS X / PowerPC before, especially since they didn't bother to keep it going until it reached a state of reasonable stability and functionality. Instead they just dropped it in the middle of something that looked more like a public, unannounced beta phase. Ever since, I've been looking for a suitable replacement because I'm not going to get rid of my PowerPC hardware any time soon.
In the original bitch&whine post I said I would drop MacOS X altogether ( and did so on some machines - they're NetBSD-only now ), but I can't really do that on the G5 yet.
Of course there is Safari which, even though Apple abandoned the PowerPC version not long after Opera did, is in considerably better shape than Opera 10.63, especially the version that comes with OSX 10.5.
Finally there is - or rather, was - Firefox. I never liked the OSX version and now they stopped providing PowerPC-builds for everything newer than 3.6. Also in much better shape than the final Opera version but it lacks stuff like HTML5.
The solution I found is a branch of Firefox called TenFourFox. It's more or less current Firefox with PowerPC bits dusted off and optimized. As the name suggests it supports OSX 10.4, it works like a charm on my G5 and unlike the official Firefox I've been unable to crash it so far.

Old Mac vs. alpha blending

A while ago Someone™ sent me a Performa 6360 - it's got a 160MHz 603ev CPU, not exactly high end even in 1996. Installation was quite painful since the firmware neither supports the onboard video nor booting anything other than MacOS from CDROM. Since I wanted to upgrade the harddisk anyway I just prepared the 'new' disk in another Mac, and since I wanted to do some voodoofb hackery I put a Voodoo3 in the single PCI slot which does have the right firmware goo to serve as OpenFirmware console. This particular Performa came with a standard 10MBit/s Apple Ethernet board, no modem, no TV module and the standard 256kB cache module. I would have tried the G3 accelerator from my PowerMac 4400 since it uses the same cache slot but the graphics card is in the way. Either way, I found two suitable 64MB modules, now RAM is maxed out at a whopping 136MB.
Now for the other reason why I've been playing with this machine. It's quite slow and therefore a nice test bed for CPU-intensive tasks like alpha blending. The Voodoo3's 2D engine doesn't support alpha blending and the 3D engine is 16bit only even though the rest of the card will happily do 24bit colour. So the first step was to add support for anti-aliased fonts to voodoofb, for now only in 8 bit. As usual, rendering is by software but actual drawing of the characters uses host blits so we can use the pipeline instead of having to wait for the engine every time we want to draw something. This is already pretty fast, in order to make it faster I added a simple cacheing scheme which stores commonly used characters ( as in, everything that uses the default attribute ) in video memory when they're drawn the first time and if they're needed again we simply blit them in place from off-screen memory. That made it even faster.
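The bookkeeping behind such a glyph cache can be sketched roughly as follows. This is not the actual voodoofb code, just a minimal illustration under a few assumptions: one slot per character code, only glyphs drawn with the default attribute are cached, and all names ( glyph_cache, glyph_cache_lookup() and so on ) are made up for this example. The returned value is an offset into off-screen video memory which the caller would use as the source of a screen-to-screen blit.

```c
#include <stdint.h>
#include <string.h>

#define GLYPH_SLOTS 256		/* one slot per character code */

struct glyph_cache {
	uint32_t default_attr;	/* only glyphs with this attribute are cached */
	uint32_t vram_base;	/* start of the off-screen cache area */
	uint32_t glyph_bytes;	/* size of one rendered glyph in video memory */
	uint8_t	 valid[GLYPH_SLOTS];
};

void
glyph_cache_init(struct glyph_cache *gc, uint32_t default_attr,
    uint32_t vram_base, uint32_t glyph_bytes)
{
	memset(gc, 0, sizeof(*gc));
	gc->default_attr = default_attr;
	gc->vram_base = vram_base;
	gc->glyph_bytes = glyph_bytes;
}

/* Offset of the cached glyph, or -1 if it has to be rendered first. */
int64_t
glyph_cache_lookup(const struct glyph_cache *gc, unsigned c, uint32_t attr)
{
	if (c >= GLYPH_SLOTS || attr != gc->default_attr || !gc->valid[c])
		return -1;
	return (int64_t)gc->vram_base + (int64_t)c * gc->glyph_bytes;
}

/*
 * Mark a glyph as cached after it has been drawn for the first time.
 * Returns the offset to copy it to, or -1 if it is not cacheable.
 */
int64_t
glyph_cache_add(struct glyph_cache *gc, unsigned c, uint32_t attr)
{
	if (c >= GLYPH_SLOTS || attr != gc->default_attr)
		return -1;
	gc->valid[c] = 1;
	return (int64_t)gc->vram_base + (int64_t)c * gc->glyph_bytes;
}
```

The cache would need to be flushed whenever the font or the default attribute changes, which this sketch leaves out.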
While there I finally added DDC2 support ( mode switching has been in the driver for years although mostly unused ) which works nicely up to 1680x1200 ( my TV's 1920x1080 didn't work for some reason so these modes are disabled for now until I find out what's (not) going on ).
The other other reason for reviving this machine was the unsupported onboard video. In OF it shows up as /valkyrie, the 'screen' devalias points to it by default, so apparently it was intended to be the console at some point.
Of course the only documentation ( if you can call it that ) is the Linux driver which was apparently reverse engineered from MacOS. The hardware is rather primitive - there is an i2c-controlled PLL which generates the pixel clock, 1MB framebuffer memory, a simple RGB DAC and a handful registers to program video modes, colour depth and interrupts. Video mode programming is weird - there's a single 8bit register and the upper two bits are used to turn off video output and sync signals. The lower 4 or 5 bits apparently correspond to video mode numbers used by MacOS, so there is no way to program arbitrary modes although we can use whatever pixel clock we want. My driver is therefore split into two - one for the PLL so it can attach to CUDA's i2c bus and it might be useful for other, similar video hardware which may use the same way to program the pixel clock, and the actual framebuffer driver. It switches video modes by matching the requested mode against a list of suitable MacOS modes and then programming the PLL with the right pixel clock. Works alright so far. Since there is no drawing engine whatsoever everything is drawn in software which brings us to the next point, namely anti-aliased fonts on dumb framebuffers.
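The mode matching step could look something like the sketch below. The table is a placeholder, not the real list of Valkyrie modes: the resolutions and pixel clocks are ordinary VESA timings and the mode_id values are made up, since the actual register values aren't given here. The struct and function names are hypothetical as well.

```c
#include <stddef.h>

struct fixed_mode {
	int width, height, refresh;
	unsigned mode_id;	/* value for the mode register (placeholder) */
	unsigned pixclock_khz;	/* what the PLL driver gets programmed with */
};

/* Placeholder table, NOT the real Valkyrie mode list. */
static const struct fixed_mode modes[] = {
	{  640, 480, 60, 1, 25175 },
	{  800, 600, 60, 2, 40000 },
	{ 1024, 768, 60, 3, 65000 },
};

/* Find the fixed hardware mode matching a request, or NULL if none. */
const struct fixed_mode *
find_mode(int width, int height, int refresh)
{
	for (size_t i = 0; i < sizeof(modes) / sizeof(modes[0]); i++) {
		if (modes[i].width == width && modes[i].height == height &&
		    modes[i].refresh == refresh)
			return &modes[i];
	}
	return NULL;
}
```

On a match the framebuffer driver would write mode_id into the mode register and hand pixclock_khz to the PLL driver, anything else gets rejected.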
As it turned out, even on a low latency bus with a relatively slow CPU, the time it takes to draw an anti-aliased character is dominated by the time it takes to shove the pixels into the framebuffer, not the actual calculations. The fact that my first implementation of the drawing method was quite inefficient didn't help either.
In order to speed things up I now let it render each scanline into a buffer in main memory and then use memcpy to move it into video memory, instead of writing each pixel separately. That gave a nice boost. Then I discovered that the 'fast path' for blank characters which I copied from an existing putchar() method was even worse - it drew every pixel separately and every time it checked for a shadow framebuffer in order to update that as well, pixel by pixel. Replacing that with memset() gave another big boost.
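In rough terms the two tricks look like this. It's a simplified sketch, not the driver code: it assumes a single 8 bit intensity channel to keep things short, fb stands for a pointer into mapped video memory, stride is the length of one framebuffer line in bytes, and the function names and MAX_CELL_WIDTH are made up.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_CELL_WIDTH 64	/* hypothetical upper limit for a glyph */

/*
 * Render each scanline of the glyph into a buffer in main memory,
 * then move it into the framebuffer with one memcpy() per line
 * instead of one write per pixel.
 */
void
draw_glyph_buffered(uint8_t *fb, size_t stride, const uint8_t *alpha,
    int w, int h, uint8_t fg, uint8_t bg)
{
	uint8_t line[MAX_CELL_WIDTH];

	if (w <= 0 || w > MAX_CELL_WIDTH)
		return;
	for (int y = 0; y < h; y++) {
		for (int x = 0; x < w; x++) {
			int a = alpha[y * w + x];
			line[x] = (uint8_t)((a * fg + (255 - a) * bg) >> 8);
		}
		memcpy(fb + y * stride, line, (size_t)w);
	}
}

/* Fast path for blanks: no blending at all, one memset() per line. */
void
draw_blank(uint8_t *fb, size_t stride, int w, int h, uint8_t bg)
{
	if (w <= 0)
		return;
	for (int y = 0; y < h; y++)
		memset(fb + y * stride, bg, (size_t)w);
}
```

A shadow framebuffer, if present, could be updated the same way, one line at a time instead of pixel by pixel.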
The benchmark I used was to scroll a bunch of text ( always the same file of course ) and measure how long it takes.
In its first incarnation valkyriefb took about 56 seconds. The memcpy() trick reduced it to 50 seconds. Using memset() to draw blanks got it down to 32 seconds, and cacheing glyphs in main memory reduced it to 27 seconds.
Note that scrolling the text on a dumb framebuffer redraws the entire page instead of reading from video memory. According to the same benchmark, scrolling by copying video memory is even slower than the original, inefficient putchar() implementation.
So, out of 32 seconds of constant drawing of anti-aliased characters, the actual calculations took a mere 5 seconds. The rest is almost all writing to video memory. I also experimented with mapping video memory cacheable or with relaxed ordering restrictions but when using memcpy() and memset() neither one made a measurable difference.
For comparison, the same benchmark on voodoofb took an average of ~2.15 seconds without cacheing in video memory, and ~1.3 with cacheing. The difference is that on voodoofb scrolling is done by the blitter so it doesn't draw nearly as many characters, which is why the calculations don't amount to the same 5 seconds.
The same optimizations yielded a visible speedup on a 2GHz Athlon 64 with PCIe graphics running as a dumb framebuffer. You'd think a CPU like that with a link to video memory that's way faster than the Performa's CPU bus would render circles around the 603e / Voodoo3 combo. Nope, it doesn't. It's barely faster than valkyriefb ( the same benchmark took 29 seconds without glyph cacheing ) and doesn't come anywhere near the Voodoo3. I'll have to figure out how to do host blits on modern Radeons.
Lessons learned:
  • video memory is slow, likely slower than you think it is
  • video memory reads are to be avoided at almost all cost, if you think you're at the point where the cost is too high you're probably wrong
  • forget your intuition, your CPU is probably faster than your video memory. It's always a good idea to measure instead of going with probably ill-supported assumptions.
  • fast methods to write video memory may compensate even for a vastly faster CPU
  • PCIe is ridiculously fast with big burst transfers, it really, really hates it when you transfer small chunks


CPU frequency scaling on Gdium

I went through with the plan described earlier - use one of the SM502's PWMs to generate a 100Hz timer interrupt, change clock speed only in the timer interrupt handler and that way compensate for the effect on the MIPS cycle counter ( as in, we have a global counter that updates every timer interrupt and time counters just measure cycles since the last timer interrupt adjusted for CPU clock ). This was committed a while ago, along with changes to pkgsrc/sysutils/estd.
The good news - at a lower clock speed the machine gets significantly less hot.
The bad news - the fan will still spin up every now and then, just not as often as it used to.

Alpha-blending in low colour depths

The main reason to add support for alpha blending to wsdisplay was to give us relatively easy access to TrueType fonts - all anti-aliased fonts currently present in the NetBSD source tree were generated from TTF fonts found in pkgsrc with licenses that appear to allow redistribution of specific renderings. While freetype2 ( which is what the conversion utility uses ) can output monochrome bitmaps suitable for wsdisplay, the results look much better with anti-aliasing enabled.
Now we support some graphics hardware that doesn't support more than 8 bit colour, but which might be found in machines which are easily fast enough to do the alpha blending calculations by software, or graphics hardware that is just too slow when run in more than 8 bit, so we should be able to use the new fonts in 8 bit as well. There's prior art too, for example RISC OS supports anti-aliasing in as few as 16 colours ( that's 16 colours, not 16 bit ).
The solution is to simply use a fake 'true' colour map. In 8 bit we can use 3 bits for red, 3 bits for green and two for blue, which should be enough to make anti-aliased fonts look halfway decent. Rendering in 24 bit still looks an order of magnitude better but r3g3b2 doesn't look horrible either.
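As an illustration, packing and the matching palette could look like the sketch below. The bit order ( red in the top three bits, blue in the bottom two ) is an assumption made for this example, and the function names are made up - this is not the rasops code.

```c
#include <stdint.h>

/* Pack 8 bit R, G and B into one r3g3b2 pixel, laid out as rrrgggbb. */
uint8_t
rgb_to_r3g3b2(uint8_t r, uint8_t g, uint8_t b)
{
	return (uint8_t)((r & 0xe0) | ((g & 0xe0) >> 3) | (b >> 6));
}

/*
 * Fill a 256 entry hardware palette so that each index, read as an
 * r3g3b2 pixel, maps to the colour it encodes, scaled to 0..255.
 */
void
fill_r3g3b2_palette(uint8_t pal[256][3])
{
	for (int i = 0; i < 256; i++) {
		pal[i][0] = (uint8_t)(((i >> 5) & 7) * 255 / 7);
		pal[i][1] = (uint8_t)(((i >> 2) & 7) * 255 / 7);
		pal[i][2] = (uint8_t)((i & 3) * 255 / 3);
	}
}
```

With a palette like this the blending code can work on separate red, green and blue components just like in true colour and only pack the result at the very end.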
To get this to work rasops needs to know that the hardware colour map is r3g3b2 instead of ANSI colours in the first 16 palette entries - I added a flag which drivers can set to get the right devcmap. In order to test this out I added alpha blending support in 8bit colour to the r128fb driver, mostly because the hardware is relatively common in PowerMacs, the firmware always hands us 8 bit colour, and my Pismo was just sitting there waiting for some hackery. It's doing all calculations by software since we don't have any docs on the 3D engine, the 2D engine doesn't support alpha blending in any way and even if it did it probably wouldn't support it in 8 bit. Just like voyagerfb it still uses the blitter to draw characters so we don't have to scribble into video memory and stall the drawing engine.
The good news - it's pretty fast and looks better than bitmap fonts.
The bad news - you can immediately tell the difference to the same font rendered in 24 bit. Black on white looks nice but some other colour combinations not so much. Still lets us use the new fonts with usable results, which was the whole point of the exercise.

Alpha-blending vs. FFB

Sun's Creator / Creator3D / Elite3D family of graphics boards is a strange bunch, compared to what you'd find in PCs. Their distinguishing design choice is their use of 3dRAM, which is marketing speak for video memory with built-in ALUs. The idea is to conserve video RAM bandwidth by eliminating or at least greatly reducing read-modify-write cycles. For this purpose the chip supports many different views on its memory:
  • Five 'dumb' apertures, which bypass the ALUs and access memory directly. One 32bit per pixel one and four 8bit views, one for each component ( red, green, blue, X - depending on context that's either WID or alpha )
  • Six 'smart' apertures which access memory through the on-chip ALUs and that way may have all sorts of side effects like bit operations, alpha, depth cueing, z-buffering etc. applied. One for 32bit per pixel, one for each channel and a 64bit view through which both pixel and Z-buffer data are visible.

Each pixel consists of 96 bits of information on the 3D models - front and back buffer, Z and stencil buffer. The non-3D models only have 32bit per pixel - just one framebuffer, no double or Z buffering available.
So, in order to draw anti-aliased fonts in the ffb driver my first idea was to program the ALUs for a * fg + (1 - a) * bg alpha blending, set the colour source to constant so fg in the formula will come from the foreground colour register instead of pixel data written to the framebuffer. Unfortunately there is no mode to have the background colour come from a register as well so in order to draw a character we first have to fill the character cell with the background colour. This isn't too bad, if we're drawing a space we can stop right there and skip the whole alpha blending business. We have to wait for the drawing engine to finish anyway since changes in ALU programming only take effect when the engine is idle. Now we should be able to just memcpy the alpha map for the character into the 8bit smart aperture corresponding to the X channel, in this case alpha. Unfortunately this doesn't work, if I do this I end up with colour data from the pixel I write to being fed to the ALU, if I write only the alpha value into the 32bit smart aperture it uses the colour from the foreground register. Ah well, still 32bit writes per pixel but at least I don't have to combine alpha and foreground colour like Xorg's sunffb driver does.

Anti-aliased fonts in wsdisplay

After a little exchange on IRC about having more usable fonts for wsdisplay, anti-aliasing and using things like TrueType fonts in the console I went to work with freetype2, mostly because it's already part of NetBSD's build process.
As it turned out having freetype spit out 8bit alpha maps for individual glyphs is almost trivial, what's not so trivial is to find the right character cell size. The problem is that you give freetype a font height in pixels but this height does not include space for diacritics that may or may not appear above and below characters and all metrics are relative to the base line with no obvious indication where in the character cell to put it. My naive approach works like this - for a given character cell height request a type with 90% of that height, then measure the height of the capital W to find the base line within the requested height, leave the extra 10% for diacritics. The reason is this - for a console font we don't want too much empty space between lines and diacritics are rare so we accept the occasional cut off for the sake of something closer resembling a traditional terminal font. For every glyph truetype also gives us a distance to advance in X direction for drawing the next character, with a monospace font this should be the same for all glyphs which is exactly what we need for wsdisplay.
My ttf2wsfont utility currently takes a font, does the measurements above for a given cell height, then allocates memory and renders all ISO1 characters as alpha maps and spits the result out as a C header file which resembles the existing wsfont files except that font data are 8 bit instead of monochrome.
Now to the kernel part. First, since rendering alpha maps isn't really feasible with colour-indexed video modes we need to make sure there is always a fallback to a monochrome font and we need to make sure never to feed an alpha map font to a monochrome only rendering routine. For now I just keep alpha and mono fonts in separate lists and explicitly check the alpha list when we know we can handle them.
That leaves the actual rendering which is surprisingly trivial. For each pixel in the alpha map the result is simply alpha * foreground_colour + (1 - alpha) * background_colour. Or, since we're dealing with 8 bit values here, each component is (alpha * foreground_component + (255 - alpha) * background_component) >> 8. I only implemented this for rasops32, adding it to 15 and 16 is trivial.
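Written out in C the per-component formula is a one-liner. The sketch below is not the rasops32 code, just the arithmetic from above applied to a pixel which is assumed to be laid out as 0x00RRGGBB, with made-up function names.

```c
#include <stdint.h>

/* One 8 bit colour component, alpha in 0..255. */
uint8_t
blend_component(uint8_t fg, uint8_t bg, uint8_t alpha)
{
	return (uint8_t)((alpha * fg + (255 - alpha) * bg) >> 8);
}

/* A whole pixel, assuming a 0x00RRGGBB layout. */
uint32_t
blend_pixel32(uint32_t fg, uint32_t bg, uint8_t alpha)
{
	uint32_t r = blend_component((uint8_t)(fg >> 16), (uint8_t)(bg >> 16),
	    alpha);
	uint32_t g = blend_component((uint8_t)(fg >> 8), (uint8_t)(bg >> 8),
	    alpha);
	uint32_t b = blend_component((uint8_t)fg, (uint8_t)bg, alpha);

	return (r << 16) | (g << 8) | b;
}
```

One thing to keep in mind is that shifting by 8 divides by 256 instead of 255, so an alpha of 255 on a foreground component of 255 comes out as 254. An implementation might want to special-case alpha values of 0 and 255 and write the background or foreground colour unchanged, which also saves the multiplications for the most common cases.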
The result looks pretty, even on a laptop where the VESA BIOS is too braindead to initialize a video mode that matches the panel's native resolution.
The next step is to implement the alpha blending in hardware with at least a few drivers - ffb and crmfb come to mind, mostly because it's ridiculously easy to do on the corresponding hardware. No need for messing with texture mapping, ffb has RAMs with built-in ALUs that support alpha blending so all we need to do is to set a few registers and then scribble the alpha map into the 8bit smart aperture as is. With crmfb it's just as easy - the drawing engine supports alpha for all operations, including bitblts, so we can keep the font in video memory and drawing a character is a few register writes with no actual data uploads.
Since I've been hacking on Gdium for the last couple weeks of course I had to add support there too, unfortunately the hardware's alpha blending support is useless here - all it can do is to combine two images with a constant alpha value, it has no concept of per pixel alpha. The only thing we can do here is to use host blits to draw characters instead of scribbling into video memory, that way at least all operations go through the pipeline and we don't have to sync the drawing engine every time we need to draw something.

Gdium surgery

A while ago my Gdium's power connector got bad - it would only make contact when pulled in a certain way and even then it was flaky at best so I finally decided to open the thing and see if I can fix it.
Laptops, especially small ones, are notoriously difficult to take apart and often it's hard to tell if whatever is holding parts together is a screw you didn't find or some plastic tab you need to ( really, really carefully ) pry open - in other words, whether to apply force or not, and if so how much force - therefore it is always a good idea to check if someone else had the problem before you did. Three seconds with google brought up this. It's straightforward, the only difference is that in my Gdium there was no sticky tape to hold the keyboard in place, just plastic tabs.
To my surprise the Gdium - as far as laptops go - is fairly easy to do surgery on. Not a lot of tape, glue and plastic tabs. Many screws, not too many different ones and it's easy to tell which goes where. All screws have metal counterparts embedded in the case, not a single one of those wood screw like things screwed directly into the plastic which serve only one purpose - to wear out on first contact. Manufacturing quality is a lot better than I expected and there are no funky tricks like that tiny but strong magnet that holds the keyboard down in the iBook G4. The only thing that's difficult is the fact that some cables are very thin ( like the ones that go to the buttons and the USB camera in the lid ) so you have to be careful there.
But back to the power connector problem. Turns out the connector was fine, just a bad soldering pad which was easy enough to fix. The connector is a nondescript barrel type, mounted SMD-style instead of having the pins poke through the mainboard, and there is nothing else holding it in place so be careful if you want to avoid having to open your Gdium.

Misguided attempts on saving money and How Not To Do Things

I finally found out how to control the Gdium's fan.
As the title suggests, it's quite bizarre. The temperature monitor chip they used is an LM75 which does exactly one thing - measure its own temperature and optionally notify the CPU if it gets too hot. On Gdium, this signal is abused to 'control' a fan. This approach has a few distinct disadvantages over using an actual fan controller:
  • No speed control. The thing is either loud or it doesn't spin.
  • No fan monitoring. There is no way to check if the fan is actually spinning.
  • Most fan controllers support at least one external sensor to be tacked on the CPU or whatever else gets especially hot ( usually the graphics chip ) in addition to an internal sensor. Gdium's CPU has no built-in sensor, and neither does the graphics chip.

I'm more and more getting the impression that whoever designed the Gdium didn't really think things through. First there is no CPU clock independent high resolution timer. Battery and temperature monitoring chips as well as some of the buttons have to be polled. And now the thing is loud by design.
Someone's been cutting the wrong corners.

Some more Gdium support

Backlight control now works as expected - not just on and off, and the hotkeys work everywhere, not just in X.
Controlling the backlight level is kinda funky - on other graphics chips there is usually a register somewhere near the other flat panel interface registers into which you poke an 8 bit value, and the resulting brightness may or may not be linear ( Rage 128 is like that for example ). On Gdium it isn't quite that easy. The SM502 has a bunch of GPIO pins which can be used as plain old on-and-off GPIOs or other things, like serial interfaces, flat panel interfaces, i2c buses and so on. Three of them can be run as Pulse Width Modulation outputs, and one of those controls the backlight level. This has the following advantages:
  • on/off is simple - run the output as GPIO and simply turn it on and off with a write to the appropriate GPIO_DATA register.
  • no dedicated backlight control hardware ( don't need it if you don't use a flat panel )

It makes backlight control a little bit more complicated though. It works like this: there's a 96MHz clock, a power-of-two divider and two counters which define how many clock cycles the output should remain high and low respectively. A side effect is that you can't get all on or all off that way - it has to be at least one cycle on and one cycle off, so these have to be special cased and handled by switching to GPIO mode. Otherwise, brightness is controlled by feeding the PWM output to the backlight with a capacitor to smooth it out. You get any given level by programming the PWM to output ~20kHz and adjusting the duty cycle according to the level you want.
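The arithmetic for picking divider and counter values can be sketched like this. The 96MHz clock and the ~20kHz target are from the description above, everything else is an assumption for the sake of the example: the counter limit of 4096, the 0..255 brightness range and all names are made up, and no actual register layout is implied.

```c
#include <stdint.h>

#define PWM_CLOCK_HZ	96000000u
#define PWM_TARGET_HZ	20000u
#define PWM_COUNTER_MAX	4096u	/* assumption: limit for high + low */

struct pwm_setting {
	unsigned divider_log2;	/* clock is divided by 1 << divider_log2 */
	unsigned high;		/* divided clock cycles the output is high */
	unsigned low;		/* divided clock cycles the output is low */
	int use_gpio;		/* all on / all off, switch to GPIO mode */
};

/* Compute PWM parameters for a brightness level of 0..255. */
struct pwm_setting
pwm_for_level(unsigned level)
{
	struct pwm_setting s = { 0, 0, 0, 0 };
	unsigned period = PWM_CLOCK_HZ / PWM_TARGET_HZ;	/* 4800 cycles */

	if (level > 255)
		level = 255;
	/* use the smallest divider that makes the period fit */
	while (period > PWM_COUNTER_MAX) {
		period >>= 1;
		s.divider_log2++;
	}
	s.high = period * level / 255;
	s.low = period - s.high;
	/* the PWM can't do zero cycles high or low */
	if (s.high == 0 || s.low == 0)
		s.use_gpio = 1;
	return s;
}
```

With these numbers the period ends up as 2400 cycles of a 48MHz clock, which is exactly 20kHz, and only levels 0 and 255 fall back to plain GPIO.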
With this, backlight control works properly.
The hotkeys required more hackery to work though - the Fn key is not handled by the keyboard controller at all, it's not even reported as a modifier - it's Just Another Key. OpenBSD just added a special keycode translation hook which seems redundant given that we already have code in place to handle the Fn key on some newer Apple laptops. What I ended up doing is to recycle as much code as possible from the Apple keyboard hack, catch the Fn key before translation and use a special translation table if Fn is down. While there I also added code to translate USB keycodes directly to PMF events, that way the hotkeys work everywhere. I had to make the code optional since there is no easy way to detect a Gdium keyboard from the ukbd driver's point of view - the keyboard controller is a generic Cypress part, and just going by the CPU class we're running on seems wrong since there is nothing which prevents other MIPS or even Loongson boxes from using the same keyboard controller.

Gdium support

Loongson support finally works, thanks to Manuel Bouyer's work on porting OpenBSD's code, with this I finally managed to get my Gdium to boot multiuser.
Since some of the device support already existed I didn't port any of OpenBSD's Gdium-specific drivers and instead used NetBSD's existing ones and code I wrote for our initial effort on getting NetBSD to work. The hardware includes:
  • a Silicon Motion SM502 'Multimedia Companion Controller' - for graphics, audio, timers. Also contains a USB controller which non-prototype Gdiums don't seem to use. I wrote a driver for the graphics portion in 2009, added support for stuff like i2c and a base device for other drivers to attach to.
  • an M41T8x real time clock. Already supported by the strtc driver.
  • an LM75 temperature sensor. Already supported by the lmtemp driver.
  • a generic ehci/ohci USB2 controller
  • a Realtek 8139 fast ethernet controller
  • a Ralink RT2561C wlan controller
  • an ST7 microcontroller to manage things like power buttons and battery charge. This thing is strange - we talk to it over i2c and it doesn't seem to have any way to directly alert the CPU on anything, we actually have to poll the thing in order to figure out if the battery is low or someone pressed the power button. Wrote my own driver since OpenBSD's doesn't really do much at all.

Another problem with these machines is the braindead firmware. It's PMON2000, which is intended for embedded controllers and evaluation boards, and that shows. It can boot over network, contains all sorts of debugging facilities but there is no way to get information like the current video mode, vram location and geometry out of it. Or any information at all that's not basic configuration variable stuff. This means there is no way to have a machine independent, simple early console - drivers either need early attachment hooks, which is ugly, or we need hacks to retrieve or guess the necessary parameters. On Gdium we can safely assume that the display is 1024x600 in 16 bit and finding the framebuffer is just one BAR read, but on other machines it's not that easy. There is no device tree either so devices that can't be probed ( like i2c devices for example ) need to be guessed based on the model.
Finally, there is only one high resolution timer in the entire system and that's the CPU's cycle counter. The problem with that is, the counter's frequency changes with the CPU's clock frequency so we need to compensate for that if we ever want to support frequency scaling ( and since Gdium is a little laptop which gets fairly warm we most definitely do ). My current plan is to use one of the SM502's pulse width modulation units to generate a periodic interrupt at 100Hz, only modify the CPU clock rate in the timer interrupt handler, save the adjusted cycle counter on each interrupt and when querying the counter use the cycles passed since the last timer interrupt, adjust for frequency scaling, add the count saved in the last interrupt and return that. With this the counter's frequency should appear uniform no matter what clock the CPU actually runs on, we can guarantee it's uniform between timer interrupts and we only lose resolution when lowering the clock rate.
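Boiled down, the plan looks like the sketch below. It's a model of the idea, not kernel code: the raw cycle counter value is passed in as a parameter instead of being read from the CPU, the clock ratio is expressed as a fraction mul/div of full speed, and all names are made up.

```c
#include <stdint.h>

struct scaled_counter {
	uint64_t base;		/* full-speed cycles up to the last tick */
	uint32_t last_cycles;	/* raw cycle counter at the last tick */
	uint32_t mul, div;	/* CPU currently runs at mul/div of full speed */
};

/* Convert raw cycles at the current clock into full-speed cycles. */
static uint64_t
sc_scale(const struct scaled_counter *sc, uint32_t cycles)
{
	return (uint64_t)cycles * sc->div / sc->mul;
}

void
sc_init(struct scaled_counter *sc, uint32_t now)
{
	sc->base = 0;
	sc->last_cycles = now;
	sc->mul = sc->div = 1;
}

/*
 * Called from the periodic timer interrupt, the only place where the
 * CPU clock is allowed to change. Account for the cycles since the
 * last tick at the old clock, then switch to the new ratio.
 */
void
sc_tick(struct scaled_counter *sc, uint32_t now, uint32_t new_mul,
    uint32_t new_div)
{
	sc->base += sc_scale(sc, now - sc->last_cycles);
	sc->last_cycles = now;
	if (new_mul != 0 && new_div != 0) {
		sc->mul = new_mul;
		sc->div = new_div;
	}
}

/* Read the counter - it appears to run at full speed all the time. */
uint64_t
sc_read(const struct scaled_counter *sc, uint32_t now)
{
	return sc->base + sc_scale(sc, now - sc->last_cycles);
}
```

The unsigned subtraction takes care of the 32 bit cycle counter wrapping around, as long as it doesn't wrap more than once between two timer interrupts. At half speed each raw cycle counts as two, which is where the loss of resolution comes from.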
Why oh why didn't they put a cycle counter in the SM502? Just something that increments on every PWM cycle? Or add a counter to the CPU that works like PowerPC's time base, with its own clock, independent from the main CPU clock. Guess it shows which CPUs were designed with laptops in mind and which weren't.
Finally, since the kernel runs in 64bit while the userland is N32 I keep running into ioctl()s that need to be translated by the compat/netbsd32 code. The problem occurs only with ioctl()s which pass pointers between userland and the kernel - obviously they're different sizes which changes the data structures passed, which needs to be compensated for. On NetBSD, instead of having separate ioctl() handlers for 32bit and 64bit calls in every driver, we have code which translates based on the ioctl() number and the data structures passed, that way everything that uses for example a struct plistref can use the same translator.
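To illustrate what such a translator has to do, here is a sketch with made-up structures. struct ref32 and struct ref64 are stand-ins for the 32bit and 64bit views of something like a pointer-plus-length argument, they are not the actual netbsd32 definitions.

```c
#include <stddef.h>
#include <stdint.h>

/* What the 64 bit kernel expects. */
struct ref64 {
	void	*buf;
	size_t	 len;
};

/* What a 32 bit userland actually passes in. */
struct ref32 {
	uint32_t buf;	/* 32 bit user address */
	uint32_t len;
};

/* On the way in: widen the 32 bit fields to the native ones. */
void
ref32_to_ref64(const struct ref32 *in, struct ref64 *out)
{
	out->buf = (void *)(uintptr_t)in->buf;
	out->len = in->len;
}

/* On the way out: narrow the results for the 32 bit caller. */
void
ref64_to_ref32(const struct ref64 *in, struct ref32 *out)
{
	out->buf = (uint32_t)(uintptr_t)in->buf;
	out->len = (uint32_t)in->len;
}
```

Since the structures differ in size the ioctl() numbers, which encode the argument size, differ as well, so the translation layer can tell which conversion to apply before calling the regular handler.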
As it is now, Gdium goes multiuser, ethernet, graphics, USB, real time clock etc. all work. X works with the wsfb driver only so far ( there's a problem loading modules which depend on other modules, like Xorg drivers that use XAA, EXA etc. - not sure if it's a bug in the run time linker or binutils or whatever. Wsfb works because it doesn't depend on anything. ) There is no audio support yet and wlan support doesn't work right ( It manages to associate with my router but stops doing anything after answering a few pings. Not sure if it's the ral driver or something else. )
There is basic powerd and envsys support - pushing the power button initiates a shutdown, closing the lid turns the backlight off, envstat gives temperatures and power status.

PowerBook surgery

After losing another laptop harddisk to child induced blunt force trauma I got me an ATA-to-CompactFlash adaptor and an 8GB CF card. The adaptor is made to replace a 2.5" harddisk - it has the right connector in the right place and threaded screw holes that match up as well. Since the disk in my PowerBook was kinda flaky I replaced it with the combination above just to see how it would work out.
The card I got shows up as an ATA66 device:
wd0: <SanDisk SDCFH-008G>
wd0: drive supports 1-sector PIO transfers, LBA addressing
wd0: 7641 MB, 15525 cyl, 16 head, 63 sec, 512 bytes/sect x 15649200 sectors
wd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 4 (Ultra/66)

A bit smaller than the harddisk it replaces but more than enough for NetBSD, X11, KDE and whatever else I need, music and such are accessible over the network anyway so no need to carry yet another copy around.
Since flash memory only survives a certain number of writes I took a few precautions:
  • mount everything with the noatime option to disable access time updates. Without it every read of a file would generate a write to record a time stamp.
  • turn off web browser disk caches - the PowerBook has 1GB RAM and the internet connection is pretty fast, no need to waste write cycles with this.
  • don't waste much room for swap space. The machine shouldn't need any during normal operation anyway.
  • put /tmp and /var/tmp in a ramdisk - no need to waste write cycles for sockets and caches

With all this startup time improved noticeably although it wasn't exactly slow to begin with. The laptop is now completely silent unless the fan spins ( which only kicks in when the CPU gets some serious load ) or something mucks with the DVD drive, it also runs slightly cooler.

Targus sucks

So the SPARCbook's power supply died and I had to hunt for a replacement. It's nothing exotic - 12V, 50W, unremarkable barrel connector. Nothing fancy at all. After some poking around I found out that Targus makes 'universal' laptop chargers that supposedly work with any laptop. Since the SPARCbook's requirements were nothing extraordinary I bought one.
As it turns out these things are far less universal than they want you to believe. The 'universality' is achieved by using coded tips that tell the power supply what voltage to use and then wire it to a connector, they come with 10 tips for supposedly the most common models. So far so good. I found a connector that fit the SPARCbook but unfortunately it gave me 16V so I went to Targus' website in order to find a tip that works and that's where the trouble began.
Their website gives no technical ( or rather, useful ) information whatsoever. All you get is either pictures of the tips with no information or a searchable list of laptop models, devices etc., if your laptop isn't listed you're out of luck. After some fruitless searching I finally sent a message to their technical support - after all they should know what they're selling and surely there's another laptop out there that needs the same connector and voltage, it's not like Tadpole didn't use standard parts wherever possible.
Well, the answer I got is responsible for this post's headline - quite possibly the most useless response I ever got from any 'technical' support ever. They told me I'm out of luck if my laptop isn't listed and they wouldn't recommend plugging it in with 16V. No attempt at solving the problem at all, just ha-ha we got your money now bugger off.
Well, they're not going to get any further business from me, even if some day I need a power supply for a laptop that is on their list.

Powerbook support

Most bits & pieces for running NetBSD on a Pismo were already in place ( after all, the thing has a lot of similarities with AGP PowerMac G4s and later iBooks ) and I managed to fix a few nits in the last few days:
  • the r128fb driver now knows how to set backlight levels both via wsconsctl and via PMF hotkey events
  • volume control via PMF / hotkeys works now
  • lid open / close events are now forwarded to powerd ( the switch broke in my iBook before I could make it work so it had to wait for working hardware, and it's Something Completely Different in the PowerBook 3400c )
  • powerd's lid_change script will now turn off the backlight ( after saving the level ) on close and restore it on open
  • the smartbat driver will now mark all relevant sensors as invalid if there is no battery in the respective slot

Other than that, even hotswapping the drive bay works fine, cardbus support Just Works(tm) ( no need for a hack like in the PB3400c ). The Pismo has the usual two i2c buses found in most ( all? ) UniNorth Macs but according to OF there is nothing useful hooked up to them ( well, except some modem control ) and there is no obvious way to talk to the fan controller for stuff like reading temperature sensors ( luckily the CPU's built-in sensor seems to work right ), there might be another i2c bus connected to the PMU. I need to dig up a nice high resolution picture of a Pismo mainboard, maybe the fan controller is another Analog Devices part which we can talk to using one of these i2c buses.

New hardware

Recently a nice, new looking G3 PowerBook showed up on my doorstep. It's a Pismo, got a 500MHz CPU with 1MB L2 cache and 128MB RAM with one SO-DIMM slot empty. 512MB SO-DIMMs go for $20 on ebuy so upgrading it to 1GB shouldn't be a problem. The laptop apparently sat in a closet since 2001 so it can't have been in use for more than a year.
That said, the last MacOS X version that's officially supported is 10.4.11 which I'm not going to bother with, NetBSD runs just fine and only needs some minor fixes:
  • the hotkey driver works fine but the audio driver needed a minor fix to play nice with it and the video driver needs support for backlight control
  • the smart battery driver shows bogus values for empty battery slots
  • the Xorg driver can't detect the display size and doesn't support Xrender acceleration

Another one bites the dust

Now that Opera has discontinued support for anything PowerPC I'll have to go look for a new browser sooner or later - I'm certainly not going to fork over some hard earned money just to get an Intel Mac, at least not as long as I have perfectly adequate PowerPC boxes.
Either way, Safari grew support for extensions lately and guess what's there - session saving and ad blocking, a selective flash blocker and a whole lot of other things. For mouse gestures there's CocoaGestures. Looks like Safari became a suitable replacement just in time - now let's see how long Apple can be arsed to produce PowerPC binaries. Should be at least till 10.7 comes out.
Otherwise, I'm getting rid of MacOS X altogether on the machines which Apple decided can't have 10.5 - NetBSD works well enough on those and I won't depend on Apple or Opera for anything.

Another SBus monster

Some Kind Soul(tm) sent me a Sun CG12 a few weeks ago, also known as GS or Matrox SG3. Yes, the board was designed and manufactured by Matrox, contains a bunch of Matrox-designed custom chips which don't show up in any part supplier's inventory list and a handful of better known ones.
So, let me describe the board. It's quite huge - if you've ever seen a Leo or an AG-10e, it's like that, just one more slot. It occupies three SBus slots so it only fits in a handful of machines like the SS5, SS1(+) and maybe the SS20 where it would hog one CPU slot. I put it in my SS5 where it blocks all SBus slots so there's no room for a fast ethernet board. Bummer.
The board I got turned out to be a CG12+, with 8MB framebuffer, 2MB z-buffer, 128KB RAM for the DSP. The only documentation available is a header file from SMI which contains a rudimentary register and address space layout, existing drivers in OpenBSD and Linux are both based on this file, and by necessity, so is mine.
If you read this header file and then look at documentation for older Matrox chips you will see a lot of familiar terms, apparently the entire Millennium I family and its ancestry ( like Athena ) are more or less distant relatives of the CG12. This should at least give us some ideas how the drawing engine works.
Of course I got the board for the promise to make it work in NetBSD. Unfortunately the firmware lies about the graphics mode set up at boot time so genfb at sbus produces garbage - the address property points at the monochrome overlay plane while the depth property contains 32. Sure, the board has a 32bit framebuffer but that's not what the firmware console uses. Getting it to work as a monochrome wsdisplay was easy enough but when I tried to use a shadow framebuffer to speed things up I found out that NetBSD's machine independent drawing routines for colour depths less than 8 hadn't been adapted to shadow framebuffer usage, so I did that and now the card works fine as a not all that slow but still monochrome console.
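To see why trusting the depth property produces garbage, it helps to compare how a pixel is located under each interpretation. The following is a small Python sketch, not driver code; the 1152x900 geometry, the unpadded rows and the MSB-first bit order are all assumptions made for illustration.

```python
def offset_32bpp(x, y, width):
    """Byte offset of pixel (x, y) if the buffer really held 32 bits per pixel."""
    return (y * width + x) * 4


def offset_1bpp(x, y, width):
    """Byte offset and bit mask of pixel (x, y) in a 1 bit per pixel plane.

    Assumes rows are packed without padding and that the leftmost pixel
    is the most significant bit of each byte.
    """
    bit_index = y * width + x
    return bit_index // 8, 0x80 >> (bit_index % 8)


WIDTH, HEIGHT = 1152, 900  # assumed console geometry, illustration only

mono_plane_bytes = WIDTH * HEIGHT // 8  # what the address property points at
believed_bytes = WIDTH * HEIGHT * 4     # what depth=32 makes a generic driver expect
```

Under these assumptions a driver that believes the depth property expects 32 times more memory than the monochrome plane holds, and every byte it writes changes eight pixels at once, which is consistent with the garbage described above.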
Given the similarities between the CG12 and older, more or less well documented Matrox graphics chips it may be possible to get the blitter going. I did some experiments but didn't get very far - all older Matrox graphics chips start commands by ORing 0x100 to the address of a register write. This doesn't match up with the CG12 - the engine registers occupy two successive blocks of 0x100 bytes, so if they use the same method they must use a different value to OR to the address. Also, the engine appears to be in a reset state of some sort, writes to any drawing related registers are ignored, or at least don't show up when reading registers. All other registers appear to contain something sensible and respond to writes, like the DACs ( there are three Brooktree 8 bit DACs, apparently working in lockstep ), various registers that control which view of the framebuffer you see ( be it 8 or 24 bit, overlay or enable planes etc. ) and what appears to be video mode setup.
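The address problem can be shown with a few lines of Python. This is only a sketch of the arithmetic: the register offsets are hypothetical and relative to the start of the engine's register space, they are not taken from the CG12 header file.

```python
EXEC_BIT = 0x100  # value older Matrox chips OR into a register offset to start a command


def exec_alias(reg_offset):
    """Offset to write to so that the register write also starts the command."""
    return reg_offset | EXEC_BIT


def alias_is_distinct(reg_offset, block_base, block_size):
    """True if the 'go' alias lands outside the block of plain registers."""
    alias = exec_alias(reg_offset)
    return not (block_base <= alias < block_base + block_size)


# One 0x100 byte register block: the alias is a separate address.
single_block_ok = alias_is_distinct(0x40, 0x000, 0x100)

# Two successive 0x100 byte blocks, as on the CG12: the alias of a register
# in the first block collides with a register in the second block, and for
# a register in the second block the OR changes nothing at all.
double_block_ok = alias_is_distinct(0x40, 0x000, 0x200)
second_block_noop = exec_alias(0x140) == 0x140
```

So if the CG12 engine uses the same trick, the value ORed in would have to be at least 0x200, or the trigger would have to work in some other way entirely.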
It may well be that the board requires a firmware image to be uploaded somewhere ( maybe the DSP's private memory ), on the other hand the device properties give a firmware version. It is unclear whether this refers to the ROM content in general or the DSP's firmware. It is also unclear to what extent the drawing engine depends on the DSP, if at all. I don't think we will be able to use the DSP ( there are registers which seem to be for communication with the DSP but no indication whatsoever how they are supposed to work ) but I don't think we have to - the drawing engine will probably work just fine without. Also, the drawing engine seems to be pretty much 2D only, z-buffer operations seem to be the only 3d-support present - maybe the DSP is supposed to do some higher level drawing operations.
What we know about the board is this:
  • it's got 8MB dual-ported framebuffer memory, 2MB z-buffer and 128kB DSP memory - quite a lot for a graphics board from 1990
  • there's a monochrome overlay plane and an enable plane for the overlay
  • there's an 8 bit WID plane and what appears to be a 256 entry WID LUT
  • the framebuffer can be used as 8bit, 24 bit single buffer or 2x 12bit with double buffering, the overlay/enable planes seem to be completely independent
  • apparently there is no support for a hardware sprite - I guess we're supposed to draw a cursor into the enable/overlay planes, that way at least it wouldn't interfere with the regular framebuffer
  • there's a DSP, we don't know what exactly it's supposed to do
  • the DACs seem to be regular 256 entry greyscale DACs working in lockstep to provide gamma table / palette support.
  • the drawing engine seems to be similar enough to older, documented Matrox designs - there might be a chance to get it going.

Even though the card is quite monstrous its heat output seems to be moderate at worst, it's certainly nowhere near as hot as a Leo or even an AG-10e.

Another kernel graphics driver

I got me a new toy - a Blade 2500, 2x 1.28GHz US-IIIi, 4GB RAM and two so far unsupported graphics boards, namely an XVR-500 and an XVR-1200. Of course I had to do something about the graphics boards.
As it turns out both use different variants of 3Dlabs Wildcat OpenGL accelerators to which not a shred of documentation is publicly available. Linux and OpenBSD support them to some degree, OpenBSD even has some acceleration for some of them, but all very hacky and with loads of unanswered questions.
The problem with these boards is that we get the hardware in a quite insane state - there are two 8bit framebuffers, two 32bit framebuffers, and somewhere there's a control plane which selects for each pixel from which framebuffer the colour information comes. Unfortunately this control plane contains garbage ( pixels appear randomly selected from the two 8bit framebuffers ) and we have no idea how to access it, so what both OpenBSD and Linux do is to do all drawing operations in both framebuffers to make sure the pixels show up.
This approach has a few distinct disadvantages - it's slow, and when you do scrolling operations you can see which pixel belongs to what framebuffer since they scroll one after the other ( of course the latter point is moot on boards where OpenBSD supports acceleration ).
To get around this my driver goes a different route - it uses a shadow framebuffer in main memory. Although wsdisplay already supports shadow framebuffers I couldn't use it since I still need to update two framebuffers, so I had to write my own. All operations are done on the shadow buffer first and then copied into both framebuffers. This has the following advantages:
  • all framebuffer reads over the PCI bus are eliminated
  • all drawing operations that involve reading from the framebuffer are done in cached memory
  • copying data from the shadow buffer to the graphics board happens from cache
  • all drawing operations are done only once, into cached memory, and copying the results to the graphics board is much faster than repeating the drawing operation
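The idea can be sketched in a few lines of Python. This is a toy model and not the wcfb code: the class and method names are made up, and plain bytearrays stand in for cached main memory and for the two 8 bit framebuffers on the board.

```python
class ShadowedDisplay:
    """Toy model: draw once into a shadow buffer, then copy to both framebuffers."""

    def __init__(self, width, height):
        self.width = width
        self.height = height
        size = width * height
        self.shadow = bytearray(size)                  # stands in for cached main memory
        self.fb = [bytearray(size), bytearray(size)]   # stand-ins for the two 8 bit framebuffers

    def _flush_rows(self, y, rows):
        """Copy the given rows from the shadow buffer to both framebuffers (writes only)."""
        start = y * self.width
        end = (y + rows) * self.width
        for fb in self.fb:
            fb[start:end] = self.shadow[start:end]

    def fill_rect(self, x, y, w, h, colour):
        """Fill a rectangle in the shadow buffer, then push the touched rows out."""
        for row in range(y, y + h):
            start = row * self.width + x
            self.shadow[start:start + w] = bytes([colour]) * w
        self._flush_rows(y, h)

    def scroll_up(self, rows, fill=0):
        """Scroll by whole rows; the read side of the copy never touches the board."""
        if not 0 < rows < self.height:
            raise ValueError("rows must be between 1 and height - 1")
        n = rows * self.width
        self.shadow[:-n] = self.shadow[n:]
        self.shadow[-n:] = bytes([fill]) * n
        self._flush_rows(0, self.height)
```

Note that the framebuffers are only ever written, never read, and since both receive the same data it no longer matters which one the control plane picks for a given pixel.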

The result is speed which can rival some older accelerators, at least if you stick the board into a 64bit/66MHz slot.
That said, the wcfb driver has been tested with XVR-500 and XVR-1200 boards in a Blade 2500 and an Ultra 60, it should work fine in other machines ( as long as the firmware sets up a usable graphics mode ) and it will probably work with other Wildcat-based graphics boards like the Expert3D series, XVR-600 etc.

A simple guide to Sun graphics hardware on NetBSD

NetBSD/sparc and sparc64 support quite a few different graphics cards and onboard chips these days, obviously they all have rather different strengths and weaknesses. Let's start with SBus cards:
  • CG3 - a dumb framebuffer with an 8 bit DAC that doesn't even support a hardware cursor. Slow. Avoid unless there really is no alternative. Firmware obeys mode specifiers on newer variants, older boards can switch video modes only by jumper or not at all. No significant heat output.
  • CG6/GX/Lego - a family of accelerated 8 bit graphics boards and onboard chips. Can have 1MB or 2MB of usable video memory, sometimes twice the amount to allow double buffering. The blitter is quite fast for its age, it can certainly keep up with any mach64, especially the Turbo variants. Boards with more than 1MB RAM support quite high resolutions as well. Firmware obeys mode specifiers on most boards. No significant heat output.
  • CG14/SX - VRAM module and onboard rendering engine found in SPARCstation 20 and some SPARCstation 10 models. CG14 is an oversized memory module with a DAC bolted to its side which fits into one of two special memory slots. It's available with 4MB and 8MB VRAM, acceleration relies on the SX chip on the mainboard which is unsupported in NetBSD for lack of documentation. Supports 8bit and 24bit output with a hardware cursor. Since it's sitting on the memory bus it's faster than any unaccelerated SBus board and thus actually usable. Firmware obeys mode specifiers. No significant heat output.
  • ZX/Leo - a monstrosity designed for 3D graphics. Supports up to 1280x1024 in 24 bit. These boards get very hot, especially the TurboZX. It's not very fast as a console, even with acceleration, and X in 24bit isn't supported yet for lack of documentation ( relevant bits about the DAC are missing in available docs ) - beats dumb framebuffers but not by a big margin. Firmware ignores mode specifiers, only tools shipping with Solaris can switch modes.
  • Fujitsu AG-10e - same size as ZX/Leo but a very different beast. Has separate graphics chips for 8bit and 24bit planes and WIDs. Decent speed in both X and the console, gets warm but nowhere near as hot as the ZX/Leo. Currently the only way to get accelerated X in 24bit with an SBus-only machine. Firmware ignores mode specifiers. It's supposed to support DDC2 but I see no evidence of that.

So, these are your options on 32bit Suns. If you want a reasonably fast console and by X you mean 'bunch of xterms' get a CG6. If you need more than 1152x900 get a GX+, TurboGX+, XGX+ etc. - they can go up to 1600x1200. Newer variants all occupy a single SBus slot. The Turbo prefix indicates a model with higher clock speed, the Plus indicates more than 1MB VRAM. Older variants may occupy two slots.
If you need 24bit and have two free SBus slots go find an AG-10e. The problem with this is that it won't obey mode specifiers and we lack documentation to switch video modes ourselves so it's 1152x900 at 66Hz, even though the board itself should support significantly higher resolutions. If you need 24bit in high resolution you'll need a CG14 with an 8MB VRAM module. It will happily switch to pretty much whatever mode you want ( as long as it fits into 8MB VRAM ) - speed isn't great but not too bad.
ZX/Leo, for now at least, should be avoided. As a console it's slower than a CG6 and burns a lot more electricity which is all converted into heat. And it's a two slot beast.

Now to UPA boards:
  • Creator/Creator3D - family of graphics boards, all have 5MB VRAM, the 3D variants have two 5MB buffers and a Z-buffer. These boards can also support higher resolutions by combining their two buffers. Newer boards are significantly faster than old ones ( UPA clock speed ranges from 66MHz up to 120MHz ). All boards support 1280x1024 in 24 bit, Creator3D can go up to 1920x1200.
  • Elite3D - more or less a Creator3D with geometry processors. Cannot combine buffers for higher resolutions and for our purposes it's not significantly faster than a last generation Creator3D. It produces a lot more heat, though nowhere near ZX/Leo levels.
  • XVR-1000 - next generation Creator3D, with lots more VRAM, its own CPU, loads of texture memory. We support it only as a dumb framebuffer but thanks to its fast interface and even faster VRAM it beats some accelerated PCI graphics boards in X even without any kind of acceleration. Supports whatever your monitor needs and then some. Although it has a huge heat sink it doesn't seem to get all that warm, probably because we run it as a dumb framebuffer.
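The resolution limits of the Creator family follow from simple buffer arithmetic. Here is a quick Python check, assuming each 24 bit pixel is stored in a 32 bit word ( 4 bytes per pixel is my assumption, not something taken from Sun documentation ) and that nothing else needs to live in the same buffer:

```python
MB = 1024 * 1024


def mode_bytes(width, height, bytes_per_pixel=4):
    """Memory needed for one screen; 4 bytes per pixel is an assumption."""
    return width * height * bytes_per_pixel


def fits(width, height, vram_bytes, bytes_per_pixel=4):
    """True if one screen at this resolution fits into the given amount of VRAM."""
    return mode_bytes(width, height, bytes_per_pixel) <= vram_bytes


single_buffer = 5 * MB      # one Creator buffer
combined_buffers = 10 * MB  # both Creator3D buffers used as one

exact_fit = mode_bytes(1280, 1024) == single_buffer
needs_both = (not fits(1920, 1200, single_buffer)) and fits(1920, 1200, combined_buffers)
```

Under that assumption 1280x1024 fills a single 5MB buffer exactly, while 1920x1200 needs about 8.8MB and so only works when both buffers are combined.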

If you want a fast console get a Creator(3D). If you already have an Elite3D keep it unless you need more than 1280x1024. There is no useful documentation available on the XVR-1000 so we probably won't be able to support acceleration any time soon.
All these boards obey firmware mode specifiers and DDC2.

PCI boards:
  • PGX - a 2MB ATI Rage II, also found on older Ultra 5/10 mainboards. Works alright as an accelerated console but X in 24 bit is limited by video memory.
  • PGX24 - a 4MB ATI Rage Pro, also found on newer Ultra 5/10 mainboards. Faster than a Rage II, more suitable for 24bit thanks to more VRAM, happily runs 1152x900 in 24bit at 75Hz or higher resolutions in 8 bit.
  • PGX64 - an ATI Rage XL, also found on Blade 1x0 mainboards. Should work fine, should be faster than a Rage Pro, but I don't have the hardware. Consider it untested.
  • PGX32 - an 8MB Permedia2. Made by TechSource as Raptor GFX 8P. Firmware is buggy but we have workarounds. Performance as console and in X is decent, image quality in high resolutions may not be all that great though.
  • XVR-100 - a 32MB Radeon RV100. By far the fastest of the bunch as a console, things aren't that clear in X though. Image quality at high resolutions is good, it has a DVI port too.

This one's easy - get an XVR-100. Supported well as a console and in X - it's a bog standard Radeon. All should obey video mode specifiers and support DDC2.

Now there are Suns which have both SBus and UPA or both PCI and UPA.
The Ultra2 can take an AG-10e but the U1E can't - no two SBus slots directly next to each other. If you don't need X in 24bit a TurboGX(+) isn't significantly slower than an old Creator, if you already have a Creator keep it. AG-10e vs. old Creator is more complicated. Creator supports hardware accelerated alpha blending, AG-10e does not ( well, in theory the chip does but only via DMA from main memory which is unsupported ). Creator also has a faster link to the mainboard. AG-10e however has more usable video memory and doesn't use the CPU for image copy operations. Your mileage may vary. In real life there's probably not that much of a noticeable difference.

PCI vs. UPA:
XVR-100 as a console is fastest. In X however, a last generation Creator3D can give it a run for the money, especially in a well equipped Ultra 60. UPA offers much more bandwidth than PCI, even if you stick the XVR-100 into a 66MHz slot ( as you should, the board supports it ) - UPA runs at 120MHz though ( in a U60 at least ), and is 64bit wide. If you need DVI the choice is easy - there's no way to add DVI to any Creator.
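To put rough numbers on the bandwidth difference, here is a back-of-the-envelope calculation in Python. It assumes one transfer per clock and no protocol overhead, so these are theoretical peaks rather than measurements, and since I haven't stated the XVR-100's bus width both PCI widths are shown.

```python
def peak_mb_per_s(bus_bits, clock_mhz):
    """Theoretical peak in MB/s (10^6 bytes): bytes per transfer times clock in MHz."""
    return bus_bits // 8 * clock_mhz


upa = peak_mb_per_s(64, 120)    # 960
pci_32 = peak_mb_per_s(32, 66)  # 264
pci_64 = peak_mb_per_s(64, 66)  # 528
```

Even against a 64bit/66MHz PCI slot, UPA at 120MHz comes out at nearly twice the peak figure, which fits the observation that the Creator3D wins at transfers between main memory and VRAM.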
In real life you will probably notice differences between the XVR-100 and a Creator3D - sometimes the Creator will be faster, sometimes the XVR-100. If you need many PCI cards you'll probably want a Creator or two ( the U60 has two UPA slots after all ). It's difficult to pick a clear favourite here - Creator3D boards are much faster at image transfers between main memory and VRAM, they're also much faster in alpha blending operations ( think anti-aliased font rendering ) although this is broken in Xorg 1.6 ( works fine in 1.4 though ). The XVR-100 has much more usable off screen memory, VRAM-to-VRAM blits are faster and don't use the CPU, unlike Creator it also supports video overlays. Heat output is no concern with either of them.

In -current, NetBSD will let you use pretty much any combination of graphics boards in X ( exceptions are the unaccelerated ones which will work only as console / primary head ).

New year, new graphics hardware

Some Kind Soul(tm) sent me a Sun XVR-1000 board, complete with daughter card for additional outputs. This thing occupies two slots ( one UPA and whatever is next to it - the specs claim it needs one UPA and one PCI slot which is nonsense, it only needs additional space, not connectivity ) and has four outputs - traditional 13W3, S-Video, VGA and DVI. The latter two are on a daughter card with one additional DAC so they're likely not independent ( that, and the big, fat heatsink with attached wind tunnel accounts for the need for a 2nd slot )
Documentation for this card is not available but supporting it as a dumb framebuffer was trivial - the firmware tells you where the card's memory regions are, what video mode it's in so getting it going required just minimal poking around. For a dumb framebuffer the thing is quite fast though - in my U60 it beats some PCI graphics boards with acceleration, I guess that's where the UPA connection pays off.
So, now we support it as both console ( gfb at mainbus ) and in X ( with the wsfb driver ) - both unaccelerated for now but usable.
This hardly does the card any justice, but without documentation there's not much I can do. I found the website of one of the engineers involved with its design, tried to contact him but didn't get an answer so far. Since he also worked on other Sun graphics hardware I'm sure I won't run out of questions for a while if he ever answers.

Wrote a couple drivers

Since I got an S24 for the promise of writing drivers I had to follow through - tcx at sbus now uses the 'blitter' for scrolling and the stipple 'engine' for filling rectangles and drawing characters. This card is seriously weird.
First the non weird characteristics - it's got 4MB video memory, each pixel is 26bit with 2 bits control and either 8 bit palette or 24bit colour with or without gamma correction. It plugs into the SPARCStation 5's AFX slot which is basically the MicroSPARC's graphics bus which is 64bit wide and apparently works at more or less the same speed as the memory interface. To that add a DAC which understands the control bits, supports a hardware sprite etc, and different views on the graphics memory, either as 24bit without control, 24bit with control bits or 8 bit. So far so standard. The weird thing is how this card does graphics acceleration. It doesn't have a graphics processor, instead it has a 'blit space' and a 'stipple space'. Both respond to 64bit writes where the address you write to defines a target pixel in video memory and what you write into it defines a command which can be 'copy up to 32 pixels from one location to another' or 'write this 32 pixel pattern in this colour'. Stipple space supports only transparent patterns and they have to be aligned to 32 pixel boundaries. Blit space can copy 1 to 32 pixels without any alignment requirements. Both exist in two versions, either with access to the control bits or without ( although copies without control bits are kind of useless ).
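The alignment rules translate into a bit of bookkeeping for the driver. The Python sketch below only shows how a horizontal run of pixels might be split up under those rules; it is not the tcx driver, the function names are made up, and the mask bit order ( most significant bit is the leftmost pixel ) is an assumption. How the results are encoded into the 64bit writes is deliberately left out.

```python
STIPPLE_WIDTH = 32  # stipple writes cover 32 pixels and must start on a 32 pixel boundary
MAX_BLIT = 32       # a single blit write copies between 1 and 32 pixels


def stipple_spans(x, width):
    """Split a run of `width` pixels starting at `x` into aligned stipple writes.

    Returns a list of (aligned_x, mask) pairs; a set bit means 'draw this pixel',
    with the most significant bit taken to be the leftmost pixel (assumption).
    """
    spans = []
    end = x + width
    pos = x - (x % STIPPLE_WIDTH)
    while pos < end:
        first = max(x, pos) - pos                   # first pixel to draw in this word
        last = min(end, pos + STIPPLE_WIDTH) - pos  # one past the last pixel to draw
        count = last - first
        mask = ((1 << count) - 1) << (STIPPLE_WIDTH - last)
        spans.append((pos, mask))
        pos += STIPPLE_WIDTH
    return spans


def blit_chunks(src_x, dst_x, width):
    """Split a horizontal copy into (src_x, dst_x, count) pieces of at most 32 pixels.

    Copies are issued left to right; overlapping source and destination
    would need the order reversed, which this sketch does not handle.
    """
    chunks = []
    done = 0
    while done < width:
        n = min(MAX_BLIT, width - done)
        chunks.append((src_x + done, dst_x + done, n))
        done += n
    return chunks
```

For example, a four pixel wide fill starting at x = 30 straddles a boundary and needs two stipple writes, while a 70 pixel copy needs three blit writes no matter where it starts.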
I also added acceleration support to Xorg's suntcx driver, it uses EXA and supports block copies and rectangle fills.

After that I finally got around to writing a console driver for the Sun PGX32 / TechSource Raptor GFX 8P. It's nothing special, just a Permedia 2 with 8MB SGRAM. For now pm2fb at pci uses the blitter for scrolling and rectangle fills. For some reason I couldn't get the drawing engine to actually draw anything, therefore characters are drawn by software and rectangle fills use SGRAM-specific fast fill operations. I must be missing something here since Xorg's glint driver clearly manages to get it to draw stuff. The whole thing is a mess - a dozen subunits, each with enable bits in a dozen different registers. I probably forgot to enable one crucial subunit which keeps drawing operations from doing anything other than zeroing rectangles out. Ah well, at least copy operations work right.

Finally, last week hell froze over and I had another look at my rev. 5 Shark's graphics chip. Finally cleaned up the igsfb hacks that have been sitting in my source tree for years, committed the mode setting code, made sure it doesn't behave any differently than before on Krups, added support for Sun-like video mode specifiers in OpenFirmware's output-device variable.
So, on a rev. 5 Shark you can now
setenv output-device screen:r1280x1024x60

And igsfb will switch to 1280x1024 in 60Hz. Any mode defined in src/sys/dev/videomodes/ should work as long as the graphics chip can support it. There is no support for TV output yet.
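For illustration, this is roughly how such a specifier can be picked apart, written in Python rather than kernel C. It is not the igsfb code; the function name is made up, and treating the refresh rate as optional is my assumption.

```python
import re

# 'r' followed by width x height and, optionally, x refresh rate
_MODE_RE = re.compile(r"^r(\d+)x(\d+)(?:x(\d+))?$")


def parse_output_device(value):
    """Split an output-device value into (device, mode).

    mode is (width, height, refresh) with refresh set to None if absent,
    or None if there is no specifier or it cannot be parsed.
    """
    device, _, spec = value.partition(":")
    if not spec:
        return device, None
    match = _MODE_RE.match(spec)
    if match is None:
        return device, None
    width, height, refresh = match.groups()
    return device, (int(width), int(height), int(refresh) if refresh else None)
```

With the example above, parse_output_device("screen:r1280x1024x60") gives ("screen", (1280, 1024, 60)), and a plain "screen" comes back with no mode so the driver can keep whatever it found.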
The firmware doesn't support modesetting but it will happily ignore the mode specifier so we can use it and users can now treat the Shark more or less like a weird Sun ;)
That said, I also finally got around to writing an Xorg driver. The xf86-video-igs driver so far only supports the VLB CyberPro 2010 but adding support for PCI variants and the CyberPro 2000 should be trivial - I don't have the hardware though. The driver is still quite immature, it uses wscons ioctl()s for the hardware cursor, doesn't contain any modesetting support ( it uses whatever it finds, that's why I added the code described above ;) ), it only accelerates rectangle fills and block copies and there is no support to switch colour depth either. This will be added to the kernel driver though, make it switch to whatever is the highest colour depth possible in the given mode when switching to graphics mode.