Thursday, February 7, 2013 2:49:41 PM
wsdisplay, NetBSD, SX, graphics
...
I've been bitching about lack of hardware documentation ( especially on graphics hardware ) for a long time, and sometimes the right people hear me whine and drop programming manuals on my (virtual) doorstep. A few days ago this happened again, this time I got:
- official cg6 documentation, including the geometry unit
- official SX documentation ( the graphics processor built into the SS10SX's and SS20's memory controllers ) - complete with instruction set description. Claims to be preliminary and predates the SS20 ( it's dated 1990 ) but most information in there appears to be accurate.
- a firmware loader and some code for the cg12 - maybe we can get some acceleration out of that one as well
We pretty much know how to deal with a cg6. The matrix / geometry unit is interesting, but it's not something that's useful for basic 2D acceleration.
The SX manual more or less confirms what we knew and/or guessed about the SX - it's a vector processor with plenty of internal registers ( 128 of them, 32bit wide, with the first 8 having special purposes ), it can access all physical memory ( well, it is built into the memory controller ) but apparently no SBus space. The CPU is supposed to feed it instructions, one or two at a time, there are two sets of mappable registers - one for kernel use, one for userland with parts read only ( things like boundary checking, otherwise userland could use the SX to access arbitrary memory ). The manual documents most of the register bits ( I found some that are set by SX but which the manual claims are unused, that's not all that surprising though, the manual is a few years older than ( and therefore a few revisions behind ) the hardware I'm using ).
SX turns out to be a vector processor, not some sort of SIMD unit as we initially suspected. The good thing about that is that most instructions take a count of how many times to repeat the operation on successive registers and/or memory locations. That way we can read or write up to 32 registers or do up to 16 other operations with a single instruction, which limits the number of instructions the CPU has to send. Now these operations don't all run at the same time - SX has two ALUs, so there is some parallelism but not a whole lot. On the other hand, MBus CPUs aren't exactly fast by today's standards either, so whatever we can offload to SX will probably help.
Thursday, February 7, 2013 1:59:42 PM
Xorg, wsdisplay, NetBSD, graphics
...
I've been working on omapfb and friends in the last few weeks, some of the results are:
- Proper console selection via firmware variable - omapfb will no longer steal the console unconditionally. Output on the serial port remains active until omapfb takes over.
- I added a driver for the on-chip DMA controller ( and in a bout of creativity named it omapdma ) - omapfb will use it for basic acceleration ( things like scrolling and drawing rectangles ). No putchar() acceleration, no anti-aliasing support (yet). Other drivers will need it as well, for example the audio part doesn't have its own DMA facility.
- For hardware where we use DMA buffers as video memory I added a flag to tell bus_dmamem_mmap() to map memory cache-inhibited but with things like write combining, write buffering, relaxed ordering etc. enabled. Added support for it to ARM's bus_dma implementation and made omapfb use it.
- omapfb now supports the WSDISPLAYIO_GVIDEO and WSDISPLAYIO_SVIDEO ioctl()s so screen blanking in X will turn off video output and that way allow both the monitor and the display controller to go to sleep.
- wsdisplay finally got a new ioctl() to get all relevant framebuffer geometry info in one go, including things like pixel format and the actual amount of video memory. Now wsfb no longer needs to guess based on ioctl(WSDISPLAYIO_GTYPE). Added a generic implementation that just takes data from rasops_info. This also includes a flag to tell userland that the video memory it's about to map is normal main memory ( as opposed to memory sitting behind a potentially slow and/or laggy bus ) - wsfb uses this to turn off shadow, which gives a nice speedup in these cases. Made omapfb and Raspberry Pi's genfb backend support it.
Tuesday, October 23, 2012 12:17:08 PM
wsdisplay, NetBSD, graphics, sparc
...
I added anti-aliasing support to the pnozz and agten drivers. Both are found in 32bit SBus systems which means there's not a whole lot of CPU power to be burned by rendering fonts. This isn't much of a problem though, since both come with a fair amount of off-screen memory, which we can ( and will ) use to cache glyphs once they're rendered the first time.
The agten driver got a more general overhaul:
- the i128 blitter routines no longer wait for completion, instead there are two waiting functions now - one waits for completion, the other for room in the pipeline
- putchar() got a wrapper to allow glyph caching and to avoid syncing the drawing engine unless absolutely necessary
- and of course the palette is now R3G3B2
With this the driver is about 15%-20% faster than before, speed is in the same range as a Rage Pro / PGX24 on sparc64.
Thursday, September 13, 2012 11:59:43 AM
wsdisplay, PGX32, NetBSD, 3Dlabs
...
As I wrote last time, pm2fb now supports anti-aliased fonts and mode setting. Like all the other drivers there is no userland interface (yet) - the driver will just talk to the monitor via DDC2, find a usable mode and run with it.
This has been tested only on a TechSource Raptor GFX 8P / Sun PGX32 so far, an 8MB Permedia 2V with Sun firmware, which seems to be one of the more common low end PCI graphics cards found in Sun hardware. Support for Permedia 2 is disabled for now because it's got a different DAC than the 2V and I don't have the hardware so I can't test it.
The mode setting part was fairly easy - just program the DAC's PLL with the right parameters for whatever pixel clock we need ( or rather, half of it since we run it in 64bit mode ), tell the graphics chip to output 64bit chunks of data and the usual display, sync, blank etc. parameters, program pixel size and colour format - done. No need to mess with the memory controller or other more esoteric bits like on certain other hardware.
The rendering engine on the other hand is a little more - umm - special. First of all, it needs a stride which is a multiple of 32 pixels. And it wants it in the form of three partial products, as in a sum of three instances of (n == 0) ? 0 : 32 << (n - 1), not just a mere value in a register like more sane engines would use. Then, the whole thing is a 3D drawing engine which can be tricked into doing 2D operations if you ask nicely, unlike other chips which have separate engines for 2D and 3D. The result is that 2D drawing ops are as slow and complicated as 3D ones. It works as a long pipeline which passes data through various sub-units ( doing stuff like colour format conversion, texture mapping and the like ) one pixel at a time.
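To illustrate the partial product business, here is a minimal sketch of how a stride could be decomposed. It is not the actual pm2fb code. The function names are made up for this example, and the limit of 7 on each code assumes that every partial product is a 3-bit field.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: each partial product code is a 3-bit field, so n is 0..7. */
#define PP_MAX_N 7

/* Value of one partial product: 0 for n == 0, otherwise 32 << (n - 1). */
uint32_t
pp_value(unsigned n)
{
	return (n == 0) ? 0 : (32u << (n - 1));
}

/*
 * Find three codes whose partial products add up to the stride in pixels.
 * Returns false if the stride can't be expressed that way, in which case
 * the caller has to pick a bigger stride that can.
 */
bool
stride_to_pp(uint32_t stride, unsigned n[3])
{
	for (unsigned a = 0; a <= PP_MAX_N; a++)
		for (unsigned b = 0; b <= a; b++)
			for (unsigned c = 0; c <= b; c++)
				if (pp_value(a) + pp_value(b) +
				    pp_value(c) == stride) {
					n[0] = a;
					n[1] = b;
					n[2] = c;
					return true;
				}
	return false;
}
```

With these assumptions a stride of 1152 or 1280 pixels works, while 1920 does not, since it would need four partial products.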
Unfortunately, simple 2D operations like filling or copying rectangles work like that too - one pixel at a time. To speed things up one can enable a fast fill mode for drawing rectangles, and for copies in less than 32bit there is a 'packed mode' which allows the engine to move data in 32bit chunks - four pixels at a time in 8bit. The former is straightforward, just set a bit in a register; the latter requires more preparation. In order to use packed mode we have to trick part of the engine into thinking it's working on 32bit pixels, clip off whatever pixels may overhang on the left and right borders and adjust for source and destination not being aligned. That still wouldn't be too bad if it wasn't for the fact that the manual is rather vague on how exactly this is supposed to work. It took a lot of poking around and staring at various other drivers to get that right.
Either way, in 8bit colour with the blitter running in packed mode the whole thing is a lot faster than in 32bit ( the firmware sets up 32bit by default - it can be changed but that's undocumented and requires mucking around with OpenFirmware ), it ends up somewhere between a PGX24 ( a Rage Pro ) and a PGX64 ( a Rage XL ) - not too bad, fairly usable. For now the driver will always switch to 8bit R3G3B2 colour for speed, use anti-aliased fonts when available and cache glyphs in off-screen memory. Adding support for higher colour depths isn't hard but since it results in a serious slow down and X does its own mode setting anyway it's probably not worth it.
Saturday, September 8, 2012 2:56:15 PM
wsdisplay, NetBSD, graphics, alpha blending
Just a quick status report on anti-aliasing support. The following drivers can use anti-aliased fonts:
- cgsix - 8bit only
- ffb - 32bit only ( 8bit would be just one channel, no speedup to be gained there and we'd lose hardware alpha blending ), alpha blending done by hardware, no glyph caching
- genfb - 8bit and 32bit, no glyph caching yet
- gfb - 32bit only, no glyph caching yet
- machfb - 8bit only
- r128fb - 8bit only
- radeonfb - 8bit and 32bit colour
- valkyriefb - 8bit only, no glyph caching yet
- voodoofb - 8bit only
- voyagerfb - 32bit only
If not mentioned otherwise, drivers support caching glyphs in video memory and alpha blending is done by software. That said, pm2fb and chipsfb will grow support in the near future, and so will crmfb and maybe newport ( the latter two probably with hardware alpha blending since the hardware supports it, even in r3g3b2 as far as I can tell ).
Adding support for hardware that's essentially a dumb framebuffer is trivial - set up the right palette and tell rasops that you want alpha fonts ( rasops supports it in 8bit and 32bit ). Genfb, gfb and valkyriefb work like that and there is no reason why for example cgthree and cgfourteen can't do the same with little to no effort. For these two we'd want a generic scheme for caching glyphs in main memory though, since the machines these are typically found in tend to have rather slow CPUs.
Wednesday, July 18, 2012 12:56:20 AM
NetBSD, CG6, graphics, sparc
...
I started hacking on Sun's CG6 / GX family of graphics controllers again, mostly because a SPARCstation LX showed up on my doorstep and I found a relatively reasonably priced source for video memory modules for it. With that it's capable of running video resolutions up to 1600x1280, which makes it useful as a debug head. Sure, a 50MHz MicroSPARC is painfully slow by today's standards but it's still more than enough to run a bunch of xterms, text editors and ssh sessions.
Without the additional video memory it would top out at 1152x900 which, even though most modern TFTs support it, doesn't match up with their native resolution resulting in ugly stretching artifacts. Panels that use 1280x1024 or 1600x1200 are far more common and with the VRAM upgrade the LX can run them at native resolution.
The good news is that the LX's video output circuitry produces a nice, crisp picture even at high resolutions ( unlike, for example, shark or a whole bunch of contemporary consumer grade graphics hardware ). The bad news is that the LX's onboard CG6 is rather slow even compared to other CG6. I did most of my development work on various Turbo GX and XGX variants found on SBus cards, all are quite a bit faster. On top of that there are small differences in the actual graphics processor, and one bit me on the LX:
All CG6 have a bit in the status register which is set whenever the blitter or the drawing engine are busy. On some variants ( by coincidence all the SBus cards I have fall into this category ) there's another bit in the same register indicating that the pipeline is full, so in order to send another command my drivers would wait for that bit to clear. Now it turns out that the LX's onboard CG6 doesn't support this second bit so we have to wait until the blitter is done before sending more commands, resulting in more waiting and less parallelism between CPU and graphics controller.
While there I also added support for anti-aliased fonts to the cgsix driver, which is quite usable, despite the slow CPU, as long as there is video memory available to cache glyphs in. Of course the alpha blending has to be done by software since the CG6 doesn't even know about the concept.
Friday, February 3, 2012 2:10:43 AM
PowerPC, Firefox, Mac OS X, Opera
I have been bitching about Opera discontinuing their version for MacOS X / PowerPC before, especially since they didn't bother to keep it going until it reached a state of reasonable stability and functionality, instead they just dropped it in the middle of something that looked more like a public, unannounced beta phase. Ever since I've been looking for a suitable replacement since I'm not going to get rid of my PowerPC hardware any time soon.
Of course, in the original bitch&whine post I said I would drop MacOS X altogether ( and did so on some machines - they're NetBSD-only now ), but I can't really do that on the G5 yet.
Of course there is Safari which, even though Apple abandoned the PowerPC version not long after Opera did, is in considerably better shape than Opera 10.63, especially the version that comes with OSX 10.5.
Finally there is - or rather, was - Firefox. I never liked the OSX version and now they stopped providing PowerPC-builds for everything newer than 3.6. Also in much better shape than the final Opera version but it lacks stuff like HTML5.
The solution I found is a branch of Firefox called TenFourFox. It's more or less current Firefox with PowerPC bits dusted off and optimized. As the name suggests it supports OSX 10.4, it works like a charm on my G5 and unlike the official Firefox I've been unable to crash it so far.
Saturday, January 28, 2012 1:46:39 AM
PowerPC, NetBSD, graphics, alpha blending
...
A while ago Someone™ sent me a Performa 6360 - it's got a 160MHz 603ev CPU, not exactly high end even in 1996. Installation was quite painful since the firmware neither supports the onboard video nor booting anything other than MacOS from CDROM. Since I wanted to upgrade the harddisk anyway I just prepared the 'new' disk in another Mac, and since I wanted to do some voodoofb hackery I put a Voodoo3 in the single PCI slot which does have the right firmware goo to serve as OpenFirmware console. This particular Performa came with a standard 10MBit/s Apple Ethernet board, no modem, no TV module and the standard 256kB cache module. I would have tried the G3 accelerator from my PowerMac 4400 since it uses the same cache slot but the graphics card is in the way. Either way, I found two suitable 64MB modules, now RAM is maxed out at a whopping 136MB.
Now for the other reason why I've been playing with this machine. It's quite slow and therefore a nice test bed for CPU-intensive tasks like alpha blending. The Voodoo3's 2D engine doesn't support alpha blending and the 3D engine is 16bit only even though the rest of the card will happily do 24bit colour. So the first step was to add support for anti-aliased fonts to voodoofb, for now only in 8 bit. As usual, rendering is by software but actual drawing of the characters uses host blits so we can use the pipeline instead of having to wait for the engine every time we want to draw something. This is already pretty fast, in order to make it faster I added a simple cacheing scheme which stores commonly used characters ( as in, everything that uses the default attribute ) in video memory when they're drawn the first time and if they're needed again we simply blit them in place from off-screen memory. That made it even faster.
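The bookkeeping for such a cache can be sketched roughly like this. This is not the voodoofb code: the structure, the names and the layout of one fixed off-screen slot per character code are all assumptions made for the example, and the actual rendering and blitting are left to the caller.

```c
#include <stdbool.h>
#include <stdint.h>

#define NGLYPHS 256

/* Hypothetical cache state: one fixed off-screen slot per character. */
struct glyph_cache {
	int cell_w, cell_h;	/* character cell size in pixels */
	int base_y;		/* first off-screen scanline */
	int cells_per_row;	/* cells that fit across video memory */
	int rows;		/* rows of cells that fit off-screen */
	uint32_t default_attr;	/* only glyphs drawn with this are cached */
	bool valid[NGLYPHS];
};

enum cache_result {
	CACHE_BYPASS,	/* not cacheable - render directly to the screen */
	CACHE_HIT,	/* blit from ( *x, *y ) to the screen */
	CACHE_FILL	/* render, then copy the cell to ( *x, *y ) */
};

enum cache_result
glyph_cache_lookup(struct glyph_cache *gc, unsigned c, uint32_t attr,
    int *x, int *y)
{
	/* Anything with a non-default attribute or no slot is not cached. */
	if (attr != gc->default_attr || c >= NGLYPHS ||
	    c >= (unsigned)(gc->cells_per_row * gc->rows))
		return CACHE_BYPASS;

	*x = (int)(c % (unsigned)gc->cells_per_row) * gc->cell_w;
	*y = gc->base_y + (int)(c / (unsigned)gc->cells_per_row) * gc->cell_h;

	if (gc->valid[c])
		return CACHE_HIT;
	gc->valid[c] = true;
	return CACHE_FILL;
}
```

A putchar() wrapper would call this first and only fall back to software rendering on CACHE_BYPASS or CACHE_FILL.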
While there I finally added DDC2 support ( mode switching has been in the driver for years although mostly unused ) which works nicely up to 1680x1200 ( my TV's 1920x1080 didn't work for some reason so these modes are disabled for now until I find out what's (not) going on ).
The other other reason for reviving this machine was the unsupported onboard video. In OF it shows up as /valkyrie, the 'screen' devalias points to it by default, so apparently it was intended to be the console at some point.
Of course the only documentation ( if you can call it that ) is the Linux driver which was apparently reverse engineered from MacOS. The hardware is rather primitive - there is an i2c-controlled PLL which generates the pixel clock, 1MB framebuffer memory, a simple RGB DAC and a handful of registers to program video modes, colour depth and interrupts. Video mode programming is weird - there's a single 8bit register and the upper two bits are used to turn off video output and sync signals. The lower 4 or 5 bits apparently correspond to video mode numbers used by MacOS, so there is no way to program arbitrary modes although we can use whatever pixel clock we want. My driver is therefore split into two - one for the PLL so it can attach to CUDA's i2c bus and it might be useful for other, similar video hardware which may use the same way to program the pixel clock, and the actual framebuffer driver. It switches video modes by matching the requested mode against a list of suitable MacOS modes and then programming the PLL with the right pixel clock. Works alright so far. Since there is no drawing engine whatsoever everything is drawn in software which brings us to the next point, namely anti-aliased fonts on dumb framebuffers.
As it turned out, even on a low latency bus with a relatively slow CPU, the time it takes to draw an anti-aliased character is dominated by the time it takes to shove the pixels into the framebuffer, not the actual calculations. The fact that my first implementation of the drawing method was quite inefficient didn't help either.
In order to speed things up I now let it render each scanline into a buffer in main memory and then use memcpy() to move it into video memory, instead of writing each pixel separately. That gave a nice boost. Then I discovered that the 'fast path' for blank characters which I copied from an existing putchar() method was even worse - it drew every pixel separately, and every time it checked for a shadow framebuffer in order to update that as well, pixel by pixel. Replacing that with memset() gave another big boost.
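The two tricks look roughly like the sketch below. It is not the valkyriefb code: the function names and the MAX_CELL_WIDTH limit are made up, and to keep the blend a one-liner the pixels are treated as plain 8bit intensity values rather than palette entries.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_CELL_WIDTH 64	/* assumption: no font is wider than this */

/*
 * Draw a glyph given as an 8bit alpha map ( width bytes per row ) into
 * an 8bit framebuffer. Each scanline is blended in main memory first and
 * then moved into video memory with a single memcpy().
 */
void
draw_glyph(uint8_t *fb, size_t stride, const uint8_t *glyph,
    int width, int height, uint8_t fg, uint8_t bg)
{
	uint8_t line[MAX_CELL_WIDTH];

	if (width <= 0 || width > MAX_CELL_WIDTH)
		return;

	for (int y = 0; y < height; y++) {
		const uint8_t *src = glyph + (size_t)y * (size_t)width;

		for (int x = 0; x < width; x++) {
			unsigned a = src[x];

			line[x] = (uint8_t)((a * fg + (255 - a) * bg) >> 8);
		}
		memcpy(fb + (size_t)y * stride, line, (size_t)width);
	}
}

/* Fast path for blank characters: one memset() per scanline. */
void
draw_blank(uint8_t *fb, size_t stride, int width, int height, uint8_t bg)
{
	if (width <= 0)
		return;
	for (int y = 0; y < height; y++)
		memset(fb + (size_t)y * stride, bg, (size_t)width);
}
```

The point is that video memory only ever sees one burst write per scanline, never single pixel accesses.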
The benchmark I used was to scroll a bunch of text ( always the same file of course ) and measure how long it takes.
In its first incarnation valkyriefb took about 56 seconds. The memcpy() trick reduced it to 50 seconds. Using memset() to draw blanks got it down to 32 seconds, and cacheing glyphs in main memory reduced it to 27 seconds.
Keep in mind that scrolling the text on a dumb framebuffer means redrawing the entire page instead of reading from video memory - according to the same benchmark, scrolling by copying video memory is even slower than the original, inefficient putchar() implementation.
So, out of 32 seconds of constant drawing of anti-aliased characters, the actual calculations took a mere 5 seconds ( the difference between running with and without the glyph cache ). The rest is almost all writing to video memory. I also experimented with mapping video memory cacheable or with relaxed ordering restrictions but when using memcpy() and memset() neither one made a measurable difference.
For comparison, the same benchmark on voodoofb took an average of ~2.15 seconds without cacheing in video memory, and ~1.3 with cacheing. The difference is that on voodoofb scrolling is done by the blitter so it doesn't draw nearly as many characters, which is why the calculations don't amount to the same 5 seconds.
The same optimizations yielded a visible speedup on a 2GHz Athlon 64 with PCIe graphics running as a dumb framebuffer. You'd think a CPU like that with a link to video memory that's way faster than the Performa's CPU bus would render circles around the 603e / Voodoo3 combo. Nope, it doesn't. It's barely faster than valkyriefb ( the same benchmark took 29 seconds without glyph cacheing ) and doesn't come anywhere near the Voodoo3. I'll have to figure out how to do host blits on modern Radeons.
Lessons learned:
- video memory is slow, likely slower than you think it is
- video memory reads are to be avoided at almost all cost, if you think you're at the point where the cost is too high you're probably wrong
- forget your intuition, your CPU is probably faster than your video memory. It's always a good idea to measure instead of going with probably ill-supported assumptions.
- fast methods to write video memory may compensate even for a vastly faster CPU
- PCIe is ridiculously fast with big burst transfers, it really, really hates it when you transfer small chunks
Sunday, January 8, 2012 2:37:12 PM
NetBSD, Loongson, MIPS, Gdium
I went through with the plan described earlier - use one of the SM502's PWMs to generate a 100Hz timer interrupt, change clock speed only in the timer interrupt handler and that way compensate for the effect on the MIPS cycle counter ( as in, we have a global counter that updates every timer interrupt and time counters just measure cycles since the last timer interrupt adjusted for CPU clock ). This has been committed a while ago, along with changes to pkgsrc/sysutils/estd.
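The idea can be sketched as follows. This is not the committed code and it doesn't use NetBSD's timecounter API - the structure and function names are invented for the example, and the cycle counter is assumed to be a free running 32bit counter.

```c
#include <stdint.h>

#define TIMER_HZ	100
#define NS_PER_TICK	(1000000000ull / TIMER_HZ)

/* Hypothetical clock state, updated only from the timer interrupt. */
struct clock_state {
	uint64_t ticks;			/* timer interrupts so far */
	uint32_t last_count;		/* cycle counter at last interrupt */
	uint32_t cycles_per_tick;	/* at the current CPU clock */
	uint32_t pending_cpt;		/* requested new speed, 0 if none */
};

/* Called at 100Hz. Speed changes only ever take effect here. */
void
timer_interrupt(struct clock_state *cs, uint32_t now_count)
{
	cs->ticks++;
	cs->last_count = now_count;
	if (cs->pending_cpt != 0) {
		/* real code would reprogram the CPU clock at this point */
		cs->cycles_per_tick = cs->pending_cpt;
		cs->pending_cpt = 0;
	}
}

/* Time in ns: whole ticks plus cycles since the last tick, scaled. */
uint64_t
get_time_ns(const struct clock_state *cs, uint32_t now_count)
{
	uint32_t delta = now_count - cs->last_count;	/* wraps safely */
	uint64_t frac = (uint64_t)delta * NS_PER_TICK / cs->cycles_per_tick;

	if (frac >= NS_PER_TICK)	/* interrupt is late, don't overshoot */
		frac = NS_PER_TICK - 1;
	return cs->ticks * NS_PER_TICK + frac;
}
```

Since the clock speed never changes between two timer interrupts, the cycle count since the last interrupt can always be converted with a single, known factor.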
The good news - at a lower clock speed the machine gets significantly less hot.
The bad news - the fan will still spin up every now and then, just not as often as it used to.
Wednesday, December 28, 2011 9:09:06 AM
NetBSD, Pismo, graphics, alpha blending
...
The main reason to add support for alpha blending to wsdisplay was to give us relatively easy access to TrueType fonts - all anti-aliased fonts currently present in the NetBSD source tree were generated from TTF fonts found in pkgsrc with licenses that appear to allow redistribution of specific renderings. While freetype2 ( which is what the conversion utility uses ) can output monochrome bitmaps suitable for wsdisplay, the results look much better with anti-aliasing enabled.
Now we support some graphics hardware that doesn't support more than 8 bit colour, but which might be found in machines which are easily fast enough to do the alpha blending calculations by software, or graphics hardware that is just too slow when run in more than 8 bit, so we should be able to use the new fonts in 8 bit as well. There's prior art too, for example RISC OS supports anti-aliasing in as low as 16 colours ( that's 16 colours, not 16 bit ).
The solution is to simply use a fake 'true' colour map. In 8 bit we can use 3 bits for red, 3 bits for green and two for blue, which should be enough to make anti-aliased fonts look halfway decent. Rendering in 24 bit still looks an order of magnitude better but r3g3b2 doesn't look horrible either.
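A minimal sketch of such a colour map is shown below. The function names are made up, and the bit layout ( red in the top three bits, blue in the bottom two ) is an assumption for the example - what matters is that the palette and the pixel packing agree.

```c
#include <stdint.h>

/* Pack 8bit per channel RGB into one r3g3b2 pixel, laid out rrrgggbb. */
uint8_t
rgb_to_r3g3b2(uint8_t r, uint8_t g, uint8_t b)
{
	return (uint8_t)((r & 0xe0) | ((g & 0xe0) >> 3) | (b >> 6));
}

/*
 * Fill a 256 entry palette so that palette index i displays the colour
 * encoded in i, with each channel stretched to the full 0..255 range.
 */
void
make_r3g3b2_palette(uint8_t red[256], uint8_t green[256], uint8_t blue[256])
{
	for (int i = 0; i < 256; i++) {
		unsigned r3 = ((unsigned)i >> 5) & 7;
		unsigned g3 = ((unsigned)i >> 2) & 7;
		unsigned b2 = (unsigned)i & 3;

		red[i] = (uint8_t)(r3 * 255 / 7);
		green[i] = (uint8_t)(g3 * 255 / 7);
		blue[i] = (uint8_t)(b2 * 255 / 3);
	}
}
```

With the palette set up like this, blended colours can be calculated per channel and packed into a pixel without any palette lookups.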
To get this to work rasops needs to know that the hardware colour map is r3g3b2 instead of ANSI colours in the first 16 palette entries - I added a flag which drivers can set to get the right devcmap. In order to test this out I added alpha blending support in 8bit colour to the r128fb driver, mostly because the hardware is relatively common in PowerMacs, the firmware always hands us 8 bit colour, and my Pismo was just sitting there waiting for some hackery. It's doing all calculations by software since we don't have any docs on the 3D engine, the 2D engine doesn't support alpha blending in any way and even if it did it probably wouldn't support it in 8 bit. Just like voyagerfb it still uses the blitter to draw characters so we don't have to scribble into video memory and stall the drawing engine.
The good news - it's pretty fast and looks better than bitmap fonts.
The bad news - you can immediately tell the difference to the same font rendered in 24 bit. Black on white looks nice but some other colour combinations not so much. Still lets us use the new fonts with usable results, which was the whole point of the exercise.
Sunday, December 25, 2011 6:50:28 AM
creator, graphics, ffb, alpha blending
...
Sun's Creator / Creator3D / Elite3D family of graphics boards is a strange bunch, compared to what you'd find in PCs. Their distinguishing design choice is their use of 3dRAM, which is marketing speak for video memory with built-in ALUs. The idea is to conserve video RAM bandwidth by eliminating or at least greatly reducing read-modify-write cycles. For this purpose the chip supports many different views on its memory:
- Five 'dumb' apertures, which bypass the ALUs and access memory directly. One 32bit per pixel one and four 8bit views, one for each component ( red, green, blue, X - depending on context that's either WID or alpha )
- Six 'smart' apertures which access memory through the on-chip ALUs and that way may have all sorts of side effects like bit operations, alpha, depth cueing, z-buffering etc. applied. One for 32bit per pixel, one for each channel and a 64bit view through which both pixel and Z-buffer data are visible.
On the 3D models each pixel consists of 96 bits of information - front and back buffer, Z and stencil buffer. The non-3D models only have 32bit per pixel - just one framebuffer, no double or Z buffering available.
So, in order to draw anti-aliased fonts in the ffb driver my first idea was to program the ALUs for a * fg + (1 - a) * bg alpha blending, set the colour source to constant so fg in the formula will come from the foreground colour register instead of pixel data written to the framebuffer. Unfortunately there is no mode to have the background colour come from a register as well so in order to draw a character we first have to fill the character cell with the background colour. This isn't too bad, if we're drawing a space we can stop right there and skip the whole alpha blending business. We have to wait for the drawing engine to finish anyway since changes in ALU programming only take effect when the engine is idle. Now we should be able to just memcpy the alpha map for the character into the 8bit smart aperture corresponding to the X channel, in this case alpha. Unfortunately this doesn't work, if I do this I end up with colour data from the pixel I write to being fed to the ALU, if I write only the alpha value into the 32bit smart aperture it uses the colour from the foreground register. Ah well, still 32bit writes per pixel but at least I don't have to combine alpha and foreground colour like Xorg's sunffb driver does.
Tuesday, December 20, 2011 4:19:16 PM
wsdisplay, NetBSD, graphics, alpha blending
...
After a little exchange on IRC about having more usable fonts for wsdisplay, anti-aliasing and using things like TrueType fonts in the console I went to work with freetype2, mostly because it's already part of NetBSD's build process.
As it turned out having freetype spit out 8bit alpha maps for individual glyphs is almost trivial, what's not so trivial is to find the right character cell size. The problem is that you give freetype a font height in pixels but this height does not include space for diacritics that may or may not appear above and below characters and all metrics are relative to the base line with no obvious indication where in the character cell to put it. My naive approach works like this - for a given character cell height request a type with 90% of that height, then measure the height of the capital W to find the base line within the requested height, leave the extra 10% for diacritics. The reason is this - for a console font we don't want too much empty space between lines and diacritics are rare so we accept the occasional cut off for the sake of something closer resembling a traditional terminal font. For every glyph truetype also gives us a distance to advance in X direction for drawing the next character, with a monospace font this should be the same for all glyphs which is exactly what we need for wsdisplay.
My ttf2wsfont utility currently takes a font, does the measurements above for a given cell height, then allocates memory and renders all ISO1 characters as alpha maps and spits the result out as a C header file which resembles the existing wsfont files except that font data are 8 bit instead of monochrome.
Now to the kernel part. First, since rendering alpha maps isn't really feasible with colour-indexed video modes we need to make sure there is always a fallback to a monochrome font and we need to make sure never to feed an alpha map font to a monochrome only rendering routine. For now I just keep alpha and mono fonts in separate lists and explicitly check the alpha list when we know we can handle them.
That leaves the actual rendering which is surprisingly trivial. For each pixel in the alpha map the result is simply alpha * foreground_colour + (1 - alpha) * background_colour. Or, since we're dealing with 8 bit values here, each component is (alpha * foreground_component + (255 - alpha) * background_component) >> 8. I only implemented this for rasops32, adding it to 15 and 16 is trivial.
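Written out in C the formula looks like the sketch below. This is not the rasops32 code - the function names are invented and the 0x00RRGGBB pixel layout is an assumption for the example.

```c
#include <stdint.h>

/*
 * Blend one 8bit colour component. Shifting by 8 divides by 256 instead
 * of 255, so the result is slightly dark - full alpha on 255 gives 254.
 */
uint8_t
blend_component(uint8_t alpha, uint8_t fg, uint8_t bg)
{
	return (uint8_t)((alpha * fg + (255 - alpha) * bg) >> 8);
}

/* Blend a whole pixel, assuming a 0x00RRGGBB layout. */
uint32_t
blend_pixel(uint8_t alpha, uint32_t fg, uint32_t bg)
{
	uint32_t r = blend_component(alpha, (fg >> 16) & 0xff,
	    (bg >> 16) & 0xff);
	uint32_t g = blend_component(alpha, (fg >> 8) & 0xff,
	    (bg >> 8) & 0xff);
	uint32_t b = blend_component(alpha, fg & 0xff, bg & 0xff);

	return (r << 16) | (g << 8) | b;
}
```

Because of the rounding, code that cares about exact colours would probably special-case alpha values of 0 and 255 and write the background or foreground colour unchanged.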
The result looks pretty, even on a laptop where the VESA BIOS is too braindead to initialize a video mode that matches the panel's native resolution.
The next step is to implement the alpha blending in hardware with at least a few drivers - ffb and crmfb come to mind, mostly because it's ridiculously easy to do on the corresponding hardware. No need for messing with texture mapping, ffb has RAMs with built-in ALUs that support alpha blending so all we need to do is to set a few registers and then scribble the alpha map into the 8bit smart aperture as is. With crmfb it's just as easy - the drawing engine supports alpha for all operations, including bitblts, so we can keep the font in video memory and drawing a character is a few register writes with no actual data uploads.
Since I've been hacking on Gdium for the last couple weeks of course I had to add support there too, unfortunately the hardware's alpha blending support is useless here - all it can do is to combine two images with a constant alpha value, it has no concept of per pixel alpha. The only thing we can do here is to use host blits to draw characters instead of scribbling into video memory, that way at least all operations go through the pipeline and we don't have to sync the drawing engine every time we need to draw something.
Thursday, December 8, 2011 4:08:40 PM
soldering, power supply, connector, Gdium
A while ago my Gdium's power connector got bad - it would only make contact when pulled in a certain way and even then it was flaky at best so I finally decided to open the thing and see if I can fix it.
Laptops, especially small ones, are notoriously difficult to take apart and often it's hard to tell if whatever is holding parts together is a screw you didn't find or some plastic tab you need to ( really, really carefully ) pry open - in other words, whether to apply force or not, if so how much force - therefore it is always a good idea to check if someone else had the problem before you did. Three seconds with google brought up this. It's straightforward, the only difference is that in my Gdium there was no sticky tape to hold the keyboard in place, just plastic tabs.
To my surprise the Gdium - as far as laptops go - is fairly easy to do surgery on. Not a lot of tape, glue and plastic tabs. Many screws, not too many different ones and it's easy to tell which goes where. All screws have metal counterparts embedded in the case, not a single one of those wood screw like things screwed directly into the plastic which serve only one purpose - to wear out on first contact. Manufacturing quality is a lot better than I expected and there are no funky tricks like that tiny but strong magnet that holds the keyboard down in the iBook G4. The only thing that's difficult is the fact that some cables are very thin ( like the ones that go to the buttons and the USB camera in the lid ) so you have to be careful there.
But back to the power connector problem. Turns out the connector was fine, just a bad soldering pad which was easy enough to fix. The connector is a nondescript barrel type, mounted SMD-style instead of having the pins poke through the mainboard, and there is nothing else holding it in place so be careful if you want to avoid having to open your Gdium.
Wednesday, November 9, 2011 6:06:21 AM
NetBSD, MIPS, Gdium
I finally found out how to control the Gdium's fan.
As the title suggests, it's quite bizarre. The temperature monitor chip they used is an LM75 which does exactly one thing - measure its own temperature and optionally notify the CPU if it gets too hot. On Gdium, this signal is abused to 'control' a fan. This approach has a few distinct disadvantages over using an actual fan controller:
- No speed control. The thing is either loud or it doesn't spin.
- No fan monitoring. There is no way to check if the fan is actually spinning.
- Most fan controllers support at least one external sensor to be tacked on the CPU or whatever else gets especially hot ( usually the graphics chip ) in addition to an internal sensor. Gdium's CPU has no built-in sensor, and neither does the graphics chip.
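To make the on/off behaviour concrete, here is a rough sketch of how an LM75-style overtemperature output behaves in comparator mode: it goes active above one threshold and only goes inactive again below a second, lower one. The register encoding ( 9 bit two's complement, 0.5 degree steps, left-justified in 16 bits ) follows the LM75 datasheet; the struct, the function names and the idea that the fan simply follows that output are assumptions for illustration, not the actual driver code.

```c
#include <stdint.h>
#include <stdbool.h>

/*
 * Encode a temperature given in half degrees C into the LM75 register
 * format: 9 bit two's complement, left-justified in a 16 bit word.
 */
uint16_t lm75_encode(int half_degrees)
{
    return (uint16_t)(((unsigned)half_degrees & 0x1ffu) << 7);
}

/* Hypothetical model of a fan wired to the overtemperature output. */
struct lm75_fan {
    int  t_os;    /* switch-on threshold, in half degrees C */
    int  t_hyst;  /* switch-off threshold, in half degrees C */
    bool fan_on;  /* current state of the output */
};

/*
 * Comparator mode: output goes active when the temperature exceeds t_os
 * and stays active until it drops below t_hyst. No speed control, no
 * feedback - the fan is either on or off.
 */
bool lm75_fan_update(struct lm75_fan *f, int temp_half_degrees)
{
    if (temp_half_degrees > f->t_os)
        f->fan_on = true;
    else if (temp_half_degrees < f->t_hyst)
        f->fan_on = false;
    return f->fan_on;
}
```

With the two thresholds at 80 and 75 degrees, the fan would start above 80 and keep running until the chip has cooled below 75, so the only tunables are when the noise starts and when it stops.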
I'm more and more getting the impression that whoever designed the Gdium didn't really think things through. First there is no high resolution timer that's independent of the CPU clock. Battery and temperature monitoring chips as well as some of the buttons have to be polled. And now the thing is loud by design.
Someone's been cutting the wrong corners.
Friday, November 4, 2011 9:28:39 PM
NetBSD, Gdium
Backlight control now works as expected - not just on and off, and the hotkeys work everywhere, not just in X.
Controlling the backlight level is kinda funky - on other graphics chips there is usually a register somewhere near the other flat panel interface registers into which you poke an 8 bit value, which may or may not be linear ( Rage 128 is like that for example ). On Gdium it isn't quite that easy. The SM502 has a bunch of GPIO pins which can be used as plain old on-and-off GPIOs or other things, like serial interfaces, flat panel interfaces, i2c buses and so on. Three of them can be run as Pulse Width Modulation outputs, and one of those controls the backlight level. This has the following advantages:
- on/off is simple - run the output as GPIO and simply turn it on and off with a write to the appropriate GPIO_DATA register.
- no dedicated backlight control hardware ( don't need it if you don't use a flat panel )
On the other hand, it makes backlight control a little bit more complicated. It works like this: there's a 96MHz clock, a power-of-two divider and two counters which define how many clock cycles the output should remain high and low respectively. A side effect is that you can't get all on or all off that way - it has to be at least one cycle on and one cycle off, so these two cases have to be special cased and handled by switching to GPIO mode. Otherwise, brightness is controlled by feeding the PWM output to the backlight with a capacitor to smooth it out; you get any given level by programming the PWM to output ~20kHz and adjusting the duty cycle according to the level you want.
With this, backlight control works properly.
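The arithmetic can be sketched roughly like this. The 96MHz clock, the ~20kHz target and the GPIO special cases come from the description above; the counter range, the maximum divider and all names are assumptions made up for the example ( check the SM502 manual for the real limits ), and no actual register access is shown.

```c
#include <stdint.h>
#include <stdbool.h>

#define PWM_CLOCK_HZ   96000000u  /* PWM input clock, from the text */
#define PWM_TARGET_HZ  20000u     /* ~20kHz output, from the text */
#define PWM_COUNT_MAX  4096u      /* ASSUMPTION: longest period we allow */
#define PWM_DIV_MAX    15u        /* ASSUMPTION: largest divider exponent */

struct pwm_setting {
    bool     use_gpio;   /* level is all off or all on: bypass the PWM */
    bool     gpio_on;    /* GPIO state, only meaningful if use_gpio */
    uint32_t div_log2;   /* clock is divided by 1 << div_log2 */
    uint32_t high;       /* divided clock cycles the output stays high */
    uint32_t low;        /* divided clock cycles the output stays low */
};

/* Map an 8 bit brightness level to a ( hypothetical ) PWM setting. */
struct pwm_setting pwm_for_level(uint8_t level)
{
    struct pwm_setting s = { false, false, 0, 0, 0 };
    uint32_t period;

    /* the PWM can't do 0% or 100%, so these go through GPIO mode */
    if (level == 0 || level == 255) {
        s.use_gpio = true;
        s.gpio_on = (level == 255);
        return s;
    }

    /* pick the smallest divider for which one period fits the counters */
    period = PWM_CLOCK_HZ / PWM_TARGET_HZ;
    while (period > PWM_COUNT_MAX && s.div_log2 < PWM_DIV_MAX) {
        s.div_log2++;
        period = (PWM_CLOCK_HZ >> s.div_log2) / PWM_TARGET_HZ;
    }

    /* split the period by duty cycle, at least one cycle each way */
    s.high = period * level / 255;
    if (s.high < 1)
        s.high = 1;
    if (s.high > period - 1)
        s.high = period - 1;
    s.low = period - s.high;
    return s;
}
```

Under these assumptions a level of 128 ends up with a divider of 2 and a 2400 cycle period, split roughly in half between high and low.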
The hotkeys required more hackery to work though - the Fn key is not handled by the keyboard controller at all, it's not even reported as a modifier - it's Just Another Key. OpenBSD just added a special keycode translation hook which seems redundant given that we already have code in place to handle the Fn key on some newer Apple laptops. What I ended up doing is to recycle as much code as possible from the Apple keyboard hack: catch the Fn key before translation and use a special translation table if Fn is down. While there I also added code to translate USB keycodes directly to PMF events, that way the hotkeys work everywhere. I had to make the code optional since there is no easy way to detect a Gdium keyboard from the ukbd driver's point of view - the keyboard controller is a generic Cypress part, and just going by the CPU class we're running on seems wrong since there is nothing which prevents other MIPS or even Loongson boxes from using the same keyboard controller.
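The general shape of such a filter, as a standalone sketch: track the Fn key before any translation happens and, while it is down, look keys up in a separate table that yields hotkey events instead of characters. The F1..F4 codes are the standard USB HID usage values; the Fn keycode, the key-to-event assignments and all names are hypothetical and not what ukbd actually uses.

```c
#include <stdint.h>
#include <stdbool.h>
#include <stddef.h>

#define KEY_FN_HYPOTHETICAL 0xf0  /* ASSUMPTION: placeholder Fn keycode */

enum hotkey {
    HK_NONE,
    HK_BRIGHTNESS_DOWN,
    HK_BRIGHTNESS_UP,
    HK_VOLUME_DOWN,
    HK_VOLUME_UP
};

struct kbd_state {
    bool fn_down;  /* tracked here since nothing else reports it */
};

/* Fn translation table; the assignments are made up for the example. */
static const struct {
    uint8_t     code;
    enum hotkey event;
} fn_table[] = {
    { 0x3a, HK_BRIGHTNESS_DOWN },  /* F1 */
    { 0x3b, HK_BRIGHTNESS_UP },    /* F2 */
    { 0x3c, HK_VOLUME_DOWN },      /* F3 */
    { 0x3d, HK_VOLUME_UP },        /* F4 */
};

/*
 * Run on every key transition before normal translation. Returns true
 * if the key was consumed ( Fn itself or a Fn combination ), false if it
 * should go through the regular keymap. *event is set on hotkey presses.
 */
bool fn_filter(struct kbd_state *st, uint8_t code, bool pressed,
    enum hotkey *event)
{
    size_t i;

    *event = HK_NONE;
    if (code == KEY_FN_HYPOTHETICAL) {
        st->fn_down = pressed;
        return true;
    }
    if (!st->fn_down)
        return false;
    for (i = 0; i < sizeof(fn_table) / sizeof(fn_table[0]); i++) {
        if (fn_table[i].code == code) {
            if (pressed)
                *event = fn_table[i].event;
            return true;
        }
    }
    return false;
}
```

Since the event comes out of the keyboard layer rather than X, whatever consumes it ( PMF in the real driver ) sees it no matter what is running on the console.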
Thursday, October 6, 2011 3:04:38 PM
Loongson, NetBSD, MIPS, Gdium
Loongson support finally works, thanks to Manuel Bouyer's work on porting OpenBSD's code. With this I finally managed to get my Gdium to boot multiuser.
Since some of the device support already existed I didn't port any of OpenBSD's Gdium-specific drivers and instead used NetBSD's existing ones and code I wrote for our initial effort on getting NetBSD to work. The hardware includes:
- a Silicon Motion SM502 'Multimedia Companion Controller' - for graphics, audio, timers. Also contains a USB controller which non-prototype Gdiums don't seem to use. I wrote a driver for the graphics portion in 2009, added support for stuff like i2c and a base device for other drivers to attach to.
- an M41T8x real time clock. Already supported by the strtc driver.
- an LM75 temperature sensor. Already supported by the lmtemp driver.
- a generic ehci/ohci USB2 controller
- a Realtek 8139 fast ethernet controller
- a Realtek RT2561C wlan controller
- an ST7 microcontroller to manage things like power buttons and battery charge. This thing is strange - we talk to it over i2c and it doesn't seem to have any way to directly alert the CPU on anything, we actually have to poll the thing in order to figure out if the battery is low or someone pressed the power button. Wrote my own driver since OpenBSD's doesn't really do much at all.
Another problem with these machines is the braindead firmware. It's PMON2000, which is intended for embedded controllers and evaluation boards, and that shows. It can boot over network, contains all sorts of debugging facilities but there is no way to get information like the current video mode, vram location and geometry out of it. Or any information at all that's not basic configuration variable stuff. This means there is no way to have a machine independent, simple early console - drivers either need early attachment hooks, which is ugly, or we need hacks to retrieve or guess the necessary parameters. On Gdium we can safely assume that the display is 1024x600 in 16 bit and finding the framebuffer is just one BAR read, but on other machines it's not that easy. There is no device tree either so devices that can't be probed ( like i2c devices for example ) need to be guessed based on the model.
Finally, there is only one high resolution timer in the entire system and that's the CPU's cycle counter. The problem with that is, the counter's frequency changes with the CPU's clock frequency so we need to compensate for that if we ever want to support frequency scaling ( and since Gdium is a little laptop which gets fairly warm we most definitely do ). My current plan is to use one of the SM502's pulse width modulation units to generate a periodic interrupt at 100Hz, only modify the CPU clock rate in the timer interrupt handler, and save the adjusted cycle counter on each interrupt. When querying the counter we take the cycles passed since the last timer interrupt, adjust for frequency scaling, add the count saved in the last interrupt and return that. With this the counter's frequency should appear uniform no matter what clock the CPU actually runs on, we can guarantee it's uniform between timer interrupts and we only lose resolution when lowering the clock rate.
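A minimal sketch of that plan, as plain C without any kernel glue. It assumes a 32 bit raw cycle counter and that the current clock can be expressed as a fraction num/den of the full clock; the struct, the function names and the fraction representation are made up for the example and none of this is existing NetBSD code.

```c
#include <stdint.h>

/* Hypothetical state for a virtual, constant-rate cycle counter. */
struct vcc {
    uint64_t base;      /* adjusted count saved at the last timer interrupt */
    uint32_t last_raw;  /* raw cycle counter at the last timer interrupt */
    uint32_t num, den;  /* current CPU clock = full clock * num / den */
};

/* Convert raw cycles at the current clock to cycles at the full clock. */
static uint64_t vcc_scale(const struct vcc *v, uint32_t raw_delta)
{
    return (uint64_t)raw_delta * v->den / v->num;
}

/*
 * Query: cycles since the last interrupt, adjusted for scaling, plus the
 * count saved there. Unsigned subtraction copes with counter wraparound.
 */
uint64_t vcc_read(const struct vcc *v, uint32_t raw_now)
{
    return v->base + vcc_scale(v, raw_now - v->last_raw);
}

/*
 * Timer interrupt: save the adjusted count, then ( and only here ) switch
 * to a new clock rate, so the rate is constant between interrupts.
 */
void vcc_tick(struct vcc *v, uint32_t raw_now, uint32_t new_num)
{
    v->base = vcc_read(v, raw_now);
    v->last_raw = raw_now;
    v->num = new_num;
}
```

At half speed every raw cycle counts as two virtual ones, which is where the lost resolution at lower clock rates comes from.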
Why oh why didn't they put a cycle counter in the SM502? Just something that increments on every PWM cycle? Or add a counter to the CPU that works like PowerPC's time base, with its own clock, independent from the main CPU clock. Guess it shows which CPUs were designed with laptops in mind and which weren't.
Finally, since the kernel runs in 64bit while the userland is N32 I keep running into ioctl()s that need to be translated by the compat/netbsd32 code. The problem occurs only with ioctl()s which pass pointers between userland and the kernel - obviously they're different sizes, which changes the layout of the data structures passed, and that needs to be compensated for. On NetBSD, instead of having separate ioctl() handlers for 32bit and 64bit calls in every driver, we have code which translates based on the ioctl() number and the data structures passed, that way everything that uses for example a struct plistref can use the same translator.
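To illustrate why the pointer size matters, here is a sketch with a made-up pointer-plus-length structure. The struct and function names are hypothetical and only loosely modelled on what such a translator has to do; this is not the actual compat/netbsd32 code, and plain zero extension of the 32 bit address is an assumption.

```c
#include <stdint.h>
#include <stddef.h>

/* What a 64 bit kernel expects: pointer and size are 8 bytes each. */
struct buf_ref {
    void   *addr;
    size_t  len;
};

/* What 32 bit userland actually passes: both fields are 4 bytes. */
struct buf_ref32 {
    uint32_t addr;
    uint32_t len;
};

/* Widen the 32 bit layout before handing it to the native handler. */
void buf_ref32_to_native(const struct buf_ref32 *in, struct buf_ref *out)
{
    out->addr = (void *)(uintptr_t)in->addr;
    out->len = in->len;
}

/* Narrow the native layout again for the copy back to userland. */
void buf_ref_native_to_32(const struct buf_ref *in, struct buf_ref32 *out)
{
    out->addr = (uint32_t)(uintptr_t)in->addr;
    out->len = (uint32_t)in->len;
}
```

The driver's own ioctl() handler only ever sees the native layout, so one translator per structure type covers every ioctl() that passes it.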
As it is now, Gdium goes multiuser, ethernet, graphics, USB, real time clock etc. all work. X works with the wsfb driver only so far ( there's a problem loading modules which depend on other modules, like Xorg drivers that use XAA, EXA etc. - not sure if it's a bug in the runtime linker or binutils or whatever. Wsfb works because it doesn't depend on anything. ) There is no audio support yet and wlan support doesn't work right ( It manages to associate with my router but stops doing anything after answering a few pings. Not sure if it's the ral driver or something else. )
There is basic powerd and envsys support - pushing the power button initiates a shutdown, closing the lid turns the backlight off, envstat gives temperatures and power status.
Tuesday, February 15, 2011 6:39:38 AM
CompactFlash, NetBSD, Solid State Disk, PowerBook
After losing another laptop harddisk to child induced blunt force trauma I got me an ATA-to-CompactFlash adaptor and an 8GB CF card. The adaptor is made to replace a 2.5" harddisk - it has the right connector in the right place and threaded screw holes that match up as well. Since the disk in my PowerBook was kinda flaky I replaced it with the combination above just to see how it would work out.
The card I got shows up as an ATA66 device:
wd0: <SanDisk SDCFH-008G>
wd0: drive supports 1-sector PIO transfers, LBA addressing
wd0: 7641 MB, 15525 cyl, 16 head, 63 sec, 512 bytes/sect x 15649200 sectors
wd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 4 (Ultra/66)
A bit smaller than the harddisk it replaces but more than enough for NetBSD, X11, KDE and whatever else I need, music and such are accessible over the network anyway so no need to carry yet another copy around.
Since flash memory only survives a certain number of writes I took a few precautions:
- mount everything with the noatime option to disable access time logging. Without it every open() of a file would generate a write to record a time stamp.
- turn off web browser disk caches - the PowerBook has 1GB RAM and the internet connection is pretty fast, no need to waste write cycles with this.
- don't waste much room for swap space. The machine shouldn't need any during normal operation anyway.
- put /tmp and /var/tmp in a ramdisk - no need to waste write cycles for sockets and caches
With all this startup time improved noticeably although it wasn't exactly slow to begin with. The laptop is now completely silent unless the fan spins ( which only kicks in when the CPU gets some serious load ) or something mucks with the DVD drive, it also runs slightly cooler.
Tuesday, February 8, 2011 10:39:05 AM
customer disservice, charger, targus, SPARCbook
...
So the SPARCbook's power supply died and I had to hunt for a replacement. It's nothing exotic - 12V, 50W, unremarkable barrel connector. Nothing fancy at all. After some poking around I found out that Targus makes 'universal' laptop chargers that supposedly work with any laptop. Since the SPARCbook's requirements were nothing extraordinary I bought one.
As it turns out these things are far less universal than they want you to believe. The 'universality' is achieved by using coded tips that tell the power supply what voltage to use and then wire it to a connector, they come with 10 tips for supposedly the most common models. So far so good. I found a connector that fit the SPARCbook but unfortunately it gave me 16V so I went to Targus' website in order to find a tip that works and that's where the trouble began.
Their website gives no technical ( or rather, useful ) information whatsoever. All you get is either pictures of the tips with no information or a searchable list of laptop models, devices etc., if your laptop isn't listed you're out of luck. After some fruitless searching I finally sent a message to their technical support - after all they should know what they're selling and surely there's another laptop out there that needs the same connector and voltage, it's not like Tadpole didn't use standard parts wherever possible.
Well, the answer I got is responsible for this post's headline - quite possibly the most useless response I ever got from any 'technical' support. They told me I'm out of luck if my laptop isn't listed and they wouldn't recommend plugging it in with 16V. No attempt at solving the problem at all, just ha-ha we got your money now bugger off.
Well, they're not going to get any further business from me, even if some day I need a power supply for a laptop that is on their list.
Sunday, September 19, 2010 1:28:29 AM
NetBSD, Pismo, macppc, Apple
...
Most bits & pieces for running NetBSD on a Pismo were already in place ( after all, the thing has a lot of similarities with AGP PowerMac G4s and later iBooks ) and I managed to fix a few nits in the last few days:
- the r128fb driver now knows how to set backlight levels both via wsconsctl and via PMF hotkey events
- volume control via PMF / hotkeys works now
- lid open / close events are now forwarded to powerd ( the switch broke in my iBook before I could make it work so it had to wait for working hardware, and it's Something Completely Different in the PowerBook 3400c )
- powerd's lid_change script will now turn off the backlight ( after saving the level ) on close and restore it on open
- the smartbat driver will now mark all relevant sensors as invalid if there is no battery in the respective slot
Other than that, even hotswapping the drive bay works fine, cardbus support Just Works(tm) ( no need for a hack like in the PB3400c ). The Pismo has the usual two i2c buses found in most ( all? ) UniNorth Macs but according to OF there is nothing useful hooked up to them ( well, except some modem control ) and there is no obvious way to talk to the fan controller for stuff like reading temperature sensors ( luckily the CPU's built-in sensor seems to work right ), there might be another i2c bus connected to the PMU. I need to dig up a nice high resolution picture of a Pismo mainboard, maybe the fan controller is another Analog Devices part which we can talk to using one of these i2c buses.
Wednesday, September 8, 2010 9:49:51 PM
NetBSD, Pismo, PowerBook
Recently a nice, new looking G3 PowerBook showed up on my doorstep. It's a Pismo, got a 500MHz CPU with 1MB L2 cache and 128MB RAM with one SO-DIMM slot empty. 512MB SO-DIMMs go for $20 on ebuy so upgrading it to 1GB shouldn't be a problem. The laptop apparently sat in a closet since 2001 so it can't have been in use for more than a year.
That said, the last MacOS X version that's officially supported is 10.4.11 which I'm not going to bother with, NetBSD runs just fine and only needs some minor fixes:
- the hotkey driver works fine but the audio driver needed a minor fix to play nice with it and the video driver needs support for backlight control
- the smart battery driver shows bogus values for empty battery slots
- the Xorg driver can't detect the display size and doesn't support Xrender acceleration