Leaded Solder (leadedsolder_feed) wrote 2025-08-05 12:00 am

A SPARC makes a little fire

Way back in May of 2018, I was unable to get the SparcStation 1+ to stop returning “Illegal Instruction” errors for any attempt at booting. This made absolutely no sense to anyone I asked about it, and they suggested replacing the PROM battery, because at least then we’d have fewer known-broken parts in the computer. I ignored this advice, and just stuck the computer in a corner with the other broken machines for a while so it could think about what it did.

You know what you did, you bastard

A few weekends later, I decided to go back and give the little Sun another chance. Perhaps it was the cute pizzabox appearance, or maybe it was that I needed to clear the bench space for something else.

To begin diagnosis, I yanked the hard drive out, and discovered the Quantum ProDrive in it was in fact an Apple factory drive. Here’s the badge, indicating that the drive was shipped with an Apple ROM.

Apple ROM sticker on the hard drive

Looks like it has some pins for a hard drive access LED. Considering it’s a full-height drive, I wonder which Mac this would have come out of. Maybe an early SE or II? Definitely not an LC.

Quantum LED pins

A lot of early Apple hard drives are hardcoded to SCSI ID 0, and while they have pins for setting the ID, they don’t come with any jumpers installed (I realized this is because the three jumpers set the SCSI ID in binary, which should have been obvious to me before - no jumpers = all zeroes = ID 0). As far as I can tell, this one is the same way.
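
Spelling out the arithmetic, since it clearly didn’t occur to me at the time: three jumpers, three binary weights, and an empty header decodes to zero. A throwaway sketch, not anything out of Quantum’s documentation:

    /* Illustrative only: the three ID-select jumpers are binary weights,
       so "no jumpers installed" means SCSI ID 0. */
    #include <stdio.h>

    static int scsi_id(int a0, int a1, int a2) /* 1 = jumper installed */
    {
        return (a2 << 2) | (a1 << 1) | a0;
    }

    int main(void)
    {
        printf("no jumpers -> ID %d\n", scsi_id(0, 0, 0)); /* 0 */
        printf("A0 and A1  -> ID %d\n", scsi_id(1, 1, 0)); /* 3 */
        return 0;
    }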

I decided to boot it without a hard drive installed at all. Maybe that would give me a cooler error than “Illegal Instruction!”

It did - the computer decided to try and boot off the loopback device instead, and when it failed, it didn’t say “illegal instruction.” What’s more, I could now type boot sd(0,0,0)vmunix without getting Illegal Instruction either!

I immediately slammed the drive back in and tried to boot. This time, I just got a SCSI device is not responding error. Assuming the drive was dead, I began my usual Quantum ProDrive diagnostic method of:

  1. Feeling the drive for vibrations that indicate it is working,
  2. Tapping the drive gently with the handle of a screwdriver,
  3. Tapping the drive less gently with the handle of a screwdriver,
  4. Repeatedly trying to start the drive until I hear it finally come to life.

Quantum ProDrive HD being smug

Eventually, the drive started making noise and vibrating, so I assumed it was fully alive. This was not the case, as I would find out later, but it didn’t matter.

I looked online for information about the “not responding” error, and found one Usenet post from 1995. In it, the protagonist had a similar machine (an IPX) that wouldn’t boot, but was able to boot it using boot sd(0,3,0)vmunix instead of the default ID of 0.

That’s weird, since probe-scsi was pretty adamant the drive was target 0, device 0. Some more searching revealed that, yeah, the SparcStation firmware reroutes ID 0 to ID 3 for some reason only understood by Sun engineers and/or the legally insane (pay attention - this comes up again later.) OK, whatever… let’s try it.

Now I got an even better error message - bad magic number in disk label. I assume that this is telling me that the firmware doesn’t understand the partition map of the disk.

That error message was followed by a complaint that I had pointed the computer at a non-executable file, which makes sense, since not being able to read the partition map generally makes it difficult to find any files.
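
Out of curiosity, I later looked up what the firmware is actually checking when it says that. The Sun label lives in the first 512-byte sector of the disk and ends with a magic word and an XOR checksum, so the test amounts to something like this (my sketch of the dk_label layout as I understand it, not Sun’s code):

    #include <stdint.h>

    /* Rough check for a Sun disk label in sector 0: the magic 0xDABE sits at
       offset 508, and the XOR of all 256 big-endian 16-bit words (the last of
       which is the checksum itself) must come out to zero. */
    static int sun_label_ok(const uint8_t s[512])
    {
        uint16_t magic = (uint16_t)((s[508] << 8) | s[509]);
        if (magic != 0xDABE)
            return 0; /* "bad magic number in disk label" */

        uint16_t sum = 0;
        for (int i = 0; i < 512; i += 2)
            sum ^= (uint16_t)((s[i] << 8) | s[i + 1]);
        return sum == 0;
    }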

More disks! More parts! More problems!

To check if I could magically make this thing start working by adding more variables to the problem, I did a little bit of disk swapping. In went a second SCSI disk from my collection, a 2.2GB Quantum Viking that I found in a greasy shoebox. While this drive is supposedly a little big for the older firmware to support, it should at least be detected by probe-scsi as a SCSI device.

The following table describes my findings:

                        | Viking in Left Bay | ProDrive in Left Bay | Nothing in Left Bay
  Viking in Right Bay   | N/A                | “SCSI bus hung”      | “Bad magic number”
  ProDrive in Right Bay | Hangs after banner | N/A                  | “Bad magic number”
  Nothing in Right Bay  | Hangs after banner | “Bad magic number”   | Hangs after banner

Hanging the SCSI bus wasn’t too big of a surprise in retrospect, since both drives are set to ID 0 from the factory and would of course conflict. I was not thinking awfully clearly back in 2018 when I was doing this, apparently.

What surprised me is that putting the Viking in the left bay locked up the machine, and so did not having any disks at all. I thought this was supposed to be a diskless-capable workstation.

Nothing was adding up.

Floppy drives exist, right?

There is one more thing the SparcStation 1+ has up its sleeve: an auto-eject floppy drive. As on the Macintosh, ejecting the floppy is driven by software, a feature that sounds really boring but is actually cool in person. Much like this entire machine. At least a Macintosh got to eject floppy disks in outer space.

In theory, I could make a Linux or BSD boot floppy and boot off it. Then I could use the booted-from operating system’s disk tools to look at the drive, figure out what (if anything) is on it, and maybe get a better clue as to why this computer doesn’t want to boot. Worst case, I could at least format the drive and install a new OS.

In practice, not so much. My first indication that something was up came when I tried to run boot fd() on a blank floppy. The computer ran the drive for a few seconds, and then complained that it could not read the first sector, and the floppy self-test failed.

The label of the original Sony MP-F17W-P1 floppy drive. Why would you betray me, Sony? What's in it for you??

After a few more attempts with different boot disks, I determined that nothing really worked on this floppy drive except for the eject command, which, it seemed, would be the only real functionality I could squeeze out of this computer. At least it makes a cool noise.

After thinking about it, I decided to remove the floppy drive and take a look inside. Everything looked great, except that the drive door’s spring was no longer able to hold it closed, and that there was a tiny piece of plastic - probably from the drive door - lying in the back of the drive away from the mechanism. I removed the plastic, lubed up the jackshaft and eject gears with a stingy amount of lithium grease, and reinstalled the drive.

No change.

I removed the floppy cable from the motherboard. For some reason, a previous owner had filed off the alignment tab from the cable, and then installed it backward. Could this be the problem?

It wasn’t - with the motherboard cable flipped around, the floppy drive displayed the normal PC-standard “LED on all the time, nothing else works” symptoms. I got a self-test error for “expected track 2, got track 0” and eject no longer worked. At least now I know it’s probably 34-pin Shugart, and also that the cable was in the right way around the first time, since eject worked in that orientation.

Fine, be that way. Who needs a floppy drive anyway?

Nice drive

After a few more power cycles of the SparcStation, the ProDrive HDD, which was already pretty loud when booting, made an even louder noise and then started to spin at the approximate volume of a jet engine. I am pretty sure the spindle was still a little rusted during the previous attempts. Now it’s finally free to spin again!

However, the drive still near-instantly returned the “bad magic number” for ID = 3 (really ID = 0, more on this later.) Which is weird, because if the drive wasn’t spinning before, how did the computer know what was on the disk? And why can’t I hear the drive being accessed when I try this?

Something very weird was going on with this machine’s SCSI setup. And the floppy drive. And without an AUI adapter, I also couldn’t expect to net-boot the machine.

Frustrated, I decided to shelve the computer for a few months and come back to it later. And by “a few months,” I mean, like, seven years.

Several Years Later…

Picking the project up in 2025, I decided I would use my BlueSCSI in Initiator Mode to dump the disk. I had been dumping a whole lot of other SCSI disks lying around the house, and the mystery of just what was on this Apple ROM disk had been gnawing at me.

Unfortunately, as it had been a few years in the interim, the drive had seized up again. Curse you, entropy.

No amount of hammering seemed to be able to get the stuck spindle free this time, and I stripped one of the screws badly trying to remove the lid. After a few sessions of no progress, I was getting angry messing with it, so the hard drive went in the freezer for a bit to see if that would shrink the horrible rubber and loosen its impervious hold. Surprise, the thing that makes no sense didn’t work either.

After several attempts at freeing up the drive, I put it on the shelf. We’ll drill out this screw, free up the bumper, and dump it some other day. Who knows what mysterious (Mac?) treasures could be living on this hard drive?

Even though this part of the project was a failure, it got the SparcStation uncovered, and opened. So why not take another whack at fixing it?

Even if the Quantum had spun up and worked, I decided it would be best to start over with a known-good drive. I put a BlueSCSI v2 SCSI emulator inside the SparcStation, with a known-good install of SunOS. And what better way to do it than to build a fresh disk image in the very same emulator that I once used to play Time Killers?

MAME-ing myself

I knew that the sun4 machines were supported in MAME, so I went looking to see if anyone would hold my hand through the process. That was a bit too much to ask, but I did get basically 90% of the way there from reading this wiki page on installing SunOS into MAME[1].

After starting MAME with a valid SparcStation BIOS ROM and configuration, I had to boot cdrom, then tell the partitioner the type of disk I was running[2]. I didn’t know what “random 2.0GB compressed file” was in Sun-land, so I picked “Sun 1.3G” and then proceeded to partitioning.

The partitioning utility lets you pick a pre-defined layout based on the aforementioned type, which had two options for the 1.3 gig drive. I picked the “standard” one, because I don’t drive an automatic. After writing out the disk label (now I know what that is!) and rebooting, I was able to run suninstall and install the rest of the operating system.

Of course, it then punted me out to OpenBoot without any idea what to type to boot into my new operating system. After some stumbling, I determined it was boot disk3. At one point, I had to set the date and time, and the installer really did not like the idea of it being 2025. Me neither, bud. We’re all trying to get through this together. I lied to it, and said it was 1997, which has the same days of the week as 2025.

I got all the way into OpenWindows before I realized I didn’t know how to run shutdown from a regular account…

I have switched into root with the 'su' command and then done 'shutdown -h now', which tells the system it is going to shut down now.

When in doubt, pop a root shell. No idea how you’re supposed to safely turn off your expensive workstation if you’re a lowly engineer without root access, but such plebeian problems are beneath me.

So now I had a two-gig CHD file containing a working SunOS installation. How was I gonna get it to the SparcStation so I could enjoy the fruits of all this labour?

chdman extracthd looked like the best course of action. That dumped out a roughly 2-gigabyte flat file. I offered that up onto the BlueSCSI’s SD card, along with the CDROM image.

I crossed my fingers that I had gotten all the filenames right, but that was going to be the least of my issues…

Big Sparc, Little Smoke

After many years, it was time to introduce power into this machine once again. I had to root around my workshop to find the Sun 13W3 video adapter, but thankfully the keyboard and mouse were big enough that they were hard to lose. Of course, it had been wedged into a corner where it was not especially usable, along with a pile of other dead and half-dead machines that had been piled up in the area since the little Sun’s pseudo-accidental decommissioning.

While inspecting the machine again, I noticed that the black plastic part of the keyboard port was falling out. I shoved that back in, and made a mental note to glue it up later.

The BlueSCSI is inserted into the SparcStation, sitting in the little easy-remove tray.

First, I plugged the BlueSCSI in without any power cables. I figured that the SS1+ was new enough to provide termination power on the SCSI connector, and that this would be enough to power the BlueSCSI without having to run any power cables.

To my surprise, I was wrong. The BlueSCSI didn’t light up, and the SparcStation hung after showing the sign-on banner. Stop+A didn’t work to break into OpenBoot. That sure sounds like what happened back in 2018 when I didn’t have any disks installed at all, doesn’t it?

I went rummaging through my cables for a Molex-to-Berg power adapter. Despite knowing that I had some good ones downstairs, I decided I would instead save myself the trip and get some from my ancient bin of 90s cables. In a SATA-to-IDE adapter board box, I found one.

I plugged it in, and the BlueSCSI’s power LED lit up.

The voltage regulator on the Pico lit up, too.

And smoked. Huh.

There is a hole in the voltage regulator for the BlueSCSI now. I guess you could say it "blue" up.

After quickly shutting everything off, I unplugged the BlueSCSI and checked the voltages coming out of the SparcStation. +12V was coming out of the red wire on the adapter. Someone a long time ago had wired the Molex end backward in this adapter, while wiring the Berg end correctly. I didn’t notice, because the cables coming out of the SparcStation are all black[3]. If I had seen a red/yellow mixup between the motherboard and adapter, I would have never turned it on.

The Pico on my BlueSCSI was now dead, having been force-fed +12V in a place that it expected to receive only +5V.

The adapter is wrong. Pin 4 is supposed to be +5V, red, but it is instead yellow.

The microSD card that was in the slot at the time also refused to mount or even identify later on my modern computers, so the damage was quite comprehensive.

In the future, I’m going to swap out the Pico and see if the BlueSCSI comes back to life. I will probably also see if the stricken Pico works when powered directly from 3.3V with the voltage regulator disabled. I’m not confident that the board will be in good shape, especially the level shifters, as they would have also gotten whacked with the twelve volts that killed the SD card. It will be more of a “for fun” project.

Luckily, the SparcStation itself seemed to be fine; it booted to the same useless “just the banner” screen as before I installed any hard drives. Still, it would be hard to tell for sure until I got another hard drive in here.

Note to self: don’t feed your five-volt peripherals twelve volts.

Land of Con-Fuse-ion

After I was done being extremely angry, I got to thinking. If only termination power had worked, none of this would ever have happened.

And, you know, it’s kind of weird that the SparcStation 1+ doesn’t have termination power. It’s a pretty modern computer. Although I wasn’t initially able to find conclusive documentation either way due to the collapse of actual useful search engines, the “Sun-Managers’ Old System and Software Frequently Asked Questions” FAQ says this:

On newer machines (sparcstations and later), many people have done this [hot swap SCSI devices] regularly without problems. Halt the machine (sync;L1-A), remove or add the device, then continue. However, it is possible to blow the SCSI termination power fuse on the motherboard. If your machine hangs immediately on powerup unless the SCSI bus is externally terminated, this fuse (2A Littelfuse) may need to be replaced. Caveat Emptor.

The sunhelp hardware reference FAQ concurs, and further says that a SparcStation 1+ does indeed have this fuse:

18) My SPARCstation 1/1+ says “The SCSI bus is hung. Perhaps an external device is turned off.” when I try to boot, or it locks up completely after displaying the banner. What do I do?

Check the SCSI termination fuse, located on the motherboard near the external SCSI connector. The fuse looks like a small cylinder that is usually clear or totally black with a black top and white writing. It is in a socket and is easy to remove. If adding an external device that powers its own terminator makes the machine work, the problem is definitely the termination fuse.

You know, this system is hanging on startup with no SCSI drive inserted (see above, in “hangs after banner.”) And before that, it complained that the SCSI bus was hung. Is the fuse blown?

I pulled out the TurboGX card and looked around the vicinity of the external SCSI connector. To my surprise, I found two fuses, along with a third one close to the keyboard port.

Two socketed round plastic fuses are positioned next to the AMD AM7990 Ethernet controller of the computer.

Checking them in order, I found that the one marked “U2” was open; the other two fuses were intact. Part III of the Sun hardware FAQ confirmed that U2 is the SCSI termination fuse on the 4/60 (SparcStation 1) and 4/65 (SparcStation 1+) motherboards, Sun part number 150-1174.

The blown U2 fuse, removed from its socket, sitting against the top of the power supply.

Blown fuse! Time to get another. On the top is written “125V 2A LF,” which confirms that it’s a Littelfuse part. However, a couple years have passed since it was made, and Littelfuse has changed hands quite a few times, and what’s left of Sun probably isn’t about to sell me one. Would I still be able to find whatever fuse this was?

Digi-Key reports that they have zero of the Micro 273 in 2A in stock, and also it costs $11.23.

After some searching, I was able to figure out that this is called a Micro 273 fuse, part number 0273002.H, and they are still in production. Unfortunately, Digi-Key wanted $11.23 Canadian for one fuse, and they were also out of stock.

My luck finally began to turn when I saw that Littelfuse had a samples program that included the 0273002.H, so I ordered two. Hey, I might need another depending on how this goes.

At the time of writing, they haven’t shipped these sample fuses yet. When that changes, I’ll change this sentence.

In the meantime, I swapped the intact Ethernet fuse to the SCSI position and fired up the machine with no drives in it. Sure enough, it got past the banner immediately and proceeded to the self-test. After a couple minutes, it kicked me out to the boot prompt. This is what I had been expecting it to do back in 2018, so at least we’re making some progress.

Now that the SCSI termination fuse has been replaced, the SparcStation is going past the initial sign-on banner, proceeding to a sound and memory test.

Not having bus termination would sure explain some of the weird stuff going on with this computer.

Fuses don’t just blow on their own. Who or what blew the fuse? Considering the shenanigans with the SCSI cables and the mismatched Quantum hard drive, I’m guessing the previous owner, or even the recycler that sold it to me. The FAQs blame hot-swapping SCSI disks for blowing these fuses, but I can’t imagine a lot of people using a Sun workstation would ever think that’s okay to do.

It’s too bad I didn’t figure this out before I turned a BlueSCSI into very expensive smoke, but that miswired cable would have eventually killed something anyway. I’m glad it wasn’t the entire computer.

Another emulator

All of my BlueSCSIs were being used for other purposes, so I needed another disk emulator to keep momentum going on this project.

Drake was out of BlueSCSIs, so I ordered a ZuluSCSI Compact RP2040 with mounting brace from DECromancer. I’d been meaning to try out the Zulu for a little while, and this seemed like the perfect excuse.

The mounting brace is not exactly what I thought it would be. I figured it would be an adapter plate so that the ZuluSCSI would easily screw into the Sun’s plastic standard hard-drive retainer like any other internal hard drive, but it seems to be designed for use as a front-panel for a computer with 3.5” bays.

The ZuluSCSI and its adapter plastic, ziptied into the Sun adapter sled.

I took the opportunity to redo my disk images, as well. Having the mismatch between the 2-gig file and the Sun1.3G layout was bothering me, so I decided to create a whole new image of the correct size.

The BlueSCSI platforms page for the SparcStation tells me that a Sun1.3G drive is 1,369,661,440 bytes long. I went ahead and used chdman to produce a hard drive image of that size, and then repeated the SunOS 4.1.4 install as described in the previous section.

This time, I picked the “full install,” instead of opting only for a developer-style workstation. We’ve got one point three gigabytes of storage, man. We’ll never run out! And if we do, I’m sure I can figure out how to use up the rest of this 8GB card on a second “drive” that gets mounted into the tree.

I plugged the SD card into the ZuluSCSI, and it was time to party. Unfortunately, nothing came up on probe-scsi, so obviously booting would have failed. The ZuluSCSI blinked five times rapidly at startup, which indicates “SD card not detected.”

I noticed a suspicious-looking solder bridge on one of the buffers, so I took a picture and sent it to the ZuluSCSI support email written on the board. Alex got back to me within a half hour (on a weekend, even) and explained that it was normal and not to worry about it. After some more back-and-forth, he asked if it was running the latest firmware.

I had no idea what firmware this board was running, so I plugged it into a USB cable and got ready to flash a new Pico UF2 file on it. But at that moment… the LED blinked only once. When it was plugged into USB power, the ZuluSCSI had absolutely no problem seeing the SD card!

I went back to the SparcStation and plugged it back in. Five blinks. I measured the +5V power at the unpopulated Berg connector on the top of the ZuluSCSI, which should be reflecting how much power the RP2040 microcontroller is seeing from the SCSI termination line.

It was only showing 2.4 to 2.6 volts, which is certainly not enough for a 3.3V system, and definitely not enough for a 3.3V system that expects to be given a 5V power source! Termination power, it seems, was still not in great shape on this SparcStation after all.

After some grumbling and anger at the universe, I decided to externally power the ZuluSCSI.

USB powered

It was getting late, so to temporarily test this theory, I ran a too-short micro-USB cable to an old phone charger and powered the ZuluSCSI inside the SparcStation. Now, probe-scsi would actually return some devices!

After some fumbling, I managed to get the SparcStation to attempt boot by typing the command boot sd(0,0,0)vmunix, which got it as far as trying to mount the /usr filesystem (at /dev/sd3g) and failing.

The SparcStation complains that the ZuluSCSI is an unrecognized brand of CD-ROM before attempting to mount /usr, which fails because sd3g isn't a device, and then tries to run fsck, which fails because it's probably in /usr somewhere.

We’ll get to that complaint about the ZuluSCSI-brand CD-ROM drive later.

I think this is because I built the image while telling MAME it was at ID 0, but the firmware does a little swapping for historical reasons, and maps ID 3 into ID 0 and vice versa. So that “3” in /dev/sd3g probably means the SunOS install was done thinking it was at SCSI ID 3, which is really at SCSI ID 0, which means that telling the ZuluSCSI to mount it at SCSI ID 3 makes the computer think it’s at SCSI ID 0 which means it should be getting it from /dev/sd0g but is not. Confused yet?
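
If it helps (it probably doesn’t), the remapping boils down to two IDs trading places, with everything else passing straight through. My reading of it, as a sketch:

    /* SCSI target <-> SunOS sd unit, as described above: targets 0 and 3
       swap, and every other target maps straight across. */
    static int sd_unit_for_target(int target)
    {
        if (target == 0) return 3;
        if (target == 3) return 0;
        return target;
    }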

I renamed the HD3_512.img hard drive image to HD0_512.img and then started the system again. After typing boot sd(0,3,0)vmunix I was soon inside SunOS. After login, it started OpenWindows and the OpenLook window manager. In colour.

A stippled grey screen, with a mouse cursor on it. This would have been the nice teal OpenWindows splash screen, but I was too slow to catch it.

Unfortunately, at this point I wasn’t sure how to get out of it and safely shut the computer off. I hadn’t plugged in a mouse, I wasn’t sure if the mice could be hot-plugged, and I didn’t have the right mouse pad for the (optical) mouse, either. After a little fumbling and key-pressing to try and get a terminal to appear that I could type into, I gave up and switched the SparcStation off at the switch on the back.

The console complains that there is an XNeWS network security violation and that it rejected a connection from blort, the same machine this is running on.

I did see a mysterious “network security violation” error on the open console complaining about refusing a connection from itself (the hostname blort), which I didn’t see in MAME. It seems like that would have kept me from getting a usable terminal even if I had a mouse installed. I wonder what that’s all about?

Conclusion

There’s a lot more to do with this thing before it can become a fully operational SparcStation:

  • Find a usable mouse;
  • Find a fuse for the Ethernet, and get it working;
  • Figure out what exactly is going on with SCSI termination power on this system, and what regulates it;
  • In the meantime, put a Berg connector on the ZuluSCSI so I’m not using a phone charger to power it;
  • Bodge a battery onto the NVRAM module so we can have permanent settings and not wait minutes for a memory test every time;
  • Glue the loose plastic part of the keyboard port back in, if that’s even possible;
  • Upgrade the ROM to the same version I used in MAME, and see if that supports the other video card I bought;
  • Figure out how to work OpenLook well enough to write some code.

Phew! Making a working computer out of someone’s dumpster liner is hard work.

For now, though, this is the most progress made in the last, oh, eight years? I’ll take it.

Thank you to everyone who gave me advice on how to misuse this once-expensive machine as an e-waste toy. And thank you to Sun, for making this system very wide and very flat, so I could stack other broken computers on top of it while I was waiting to fix a problem that (in retrospect) was very obvious.

Thank you also to you for reading.

If you’ve got a broken old computer you haven’t looked at for a while, you should dust it off and give it a try. What’s the worst that could happen, letting the magic smoke out?

Repair Summary

  Fault:   A perfectly good SCSI emulator seems to be smoking.
  Remedy:  Don’t mix up +5V and +12V next time.

  Fault:   SCSI initialization appears to hang forever after banner.
  Remedy:  Replace blown SCSI fuse at U2.
  Caveats: It is still unknown why the fuse blew.

  Fault:   ZuluSCSI refuses to identify SD card.
  Remedy:  Add additional power.
  Caveats: I still don’t know why termination power is so weak on this SparcStation; it should “just work.”

  1. What even is the difference between SunOS and Solaris? SunOS >= 5 became “Solaris 2,” but it’s not just a naming exercise. Along with several other Unix vendors at the same time, Sun wholly changed the source base of the operating system from BSD to a “unified” Unix System V, after several years of slowly altering SunOS to be more compliant with the latter standard. 

  2. Based on the questions I got asked when I picked “custom,” this relates to the geometry of the disk. 

  3. The cables also look badly soldered, and not crimped, so I wonder if these are someone’s homemade cables. This poor SparcStation has seen a lot of action. 

Matthew Garrett (mjg59) wrote 2025-08-03 08:10 pm

Cordoomceps - replacing an Amiga's brain with Doom

There's a lovely device called a pistorm, an adapter board that glues a Raspberry Pi GPIO bus to a Motorola 68000 bus. The intended use case is that you plug it into a 68000 device and then run an emulator that reads instructions from hardware (ROM or RAM) and emulates them. You're still limited by the ~7MHz bus that the hardware is running at, but you can run the instructions as fast as you want.

These days you're supposed to run a custom built OS on the Pi that just does 68000 emulation, but initially it ran Linux on the Pi and a userland 68000 emulator process. And, well, that got me thinking. The emulator takes 68000 instructions, emulates them, and then talks to the hardware to implement the effects of those instructions. What if we, well, just don't? What if we just run all of our code in Linux on an ARM core and then talk to the Amiga hardware?

We're going to ignore x86 here, because it's weird - but most hardware that wants software to be able to communicate with it maps itself into the same address space that RAM is in. You can write to a byte of RAM, or you can write to a piece of hardware that's effectively pretending to be RAM[1]. The Amiga wasn't unusual in this respect in the 80s, and to talk to the graphics hardware you speak to a special address range that gets sent to that hardware instead of to RAM. The CPU knows nothing about this. It just indicates it wants to write to an address, and then sends the data.

So, if we are the CPU, we can just indicate that we want to write to an address, and provide the data. And those addresses can correspond to the hardware. So, we can write to the RAM that belongs to the Amiga, and we can write to the hardware that isn't RAM but pretends to be. And that means we can run whatever we want on the Pi and then access Amiga hardware.
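
In code, “being the CPU” looks about as boring as you’d hope. Something like this, where amiga_write16() is a stand-in for however the bridge actually drives the 68000 bus - a sketch, not the pistorm API:

    #include <stdint.h>

    #define CHIP_RAM_BASE 0x000000u              /* ordinary Amiga chip RAM */
    #define CUSTOM_BASE   0xDFF000u              /* custom chipset register block */
    #define COLOR00       (CUSTOM_BASE + 0x180)  /* background colour register */

    /* Stand-in: a real implementation would put this address and data on the
       68000 bus through the adapter's GPIO interface. */
    static void amiga_write16(uint32_t addr, uint16_t value)
    {
        (void)addr;
        (void)value;
    }

    int main(void)
    {
        amiga_write16(CHIP_RAM_BASE + 0x1000, 0xBEEF); /* a write that lands in Amiga RAM */
        amiga_write16(COLOR00, 0x0F00);                /* a write that lands in hardware
                                                          pretending to be RAM: background
                                                          turns red (4:4:4 RGB) */
        return 0;
    }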

And, obviously, the thing we want to run is Doom, because that's what everyone runs in fucked up hardware situations.

Doom was Amiga kryptonite. Its entire graphical model was based on memory directly representing the contents of your display, and being able to modify that by just moving pixels around. This worked because at the time VGA displays supported having a memory layout where each pixel on your screen was represented by a byte in memory containing an 8 bit value that corresponded to a lookup table containing the RGB value for that pixel.

The Amiga was, well, not good at this. Back in the 80s, when the Amiga hardware was developed, memory was expensive. Dedicating that much RAM to the video hardware was unthinkable - the Amiga 1000 initially shipped with only 256K of RAM, and you could fill all of that with a sufficiently colourful picture. So instead of having the idea of each pixel being associated with a specific area of memory, the Amiga used bitplanes. A bitplane is an area of memory that represents the screen, but only represents one bit of the colour depth. If you have a black and white display, you only need one bitplane. If you want to display four colours, you need two. More colours, more bitplanes. And each bitplane is stored in an independent area of RAM. You never use more memory than you need to display the number of colours you want to.
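
The arithmetic backs this up. For Doom’s 320x200, each bitplane is 8,000 bytes, so even all six planes needed for 64 colours come to less than a byte-per-pixel chunky buffer would:

    #include <stdio.h>

    int main(void)
    {
        int w = 320, h = 200;

        for (int planes = 1; planes <= 6; planes++)
            printf("%d bitplane(s): %2d colours, %5d bytes\n",
                   planes, 1 << planes, (w / 8) * h * planes);

        printf("chunky, one byte per pixel: %d bytes\n", w * h); /* 64000 */
        return 0;
    }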

But that means that each bitplane contains packed information - every byte of data in a bitplane contains the bit value for 8 different pixels, because each bitplane contains one bit of information per pixel. To update one pixel on screen, you need to read from every bitplane, update one bit, and write it back, and that's a lot of additional memory accesses. Doom, but on the Amiga, was slow not just because the CPU was slow, but because there was a lot of manipulation of data to turn it into the format the Amiga wanted and then push that over a fairly slow memory bus to have it displayed.

The CDTV was an aesthetically pleasing piece of hardware that absolutely sucked. It was an Amiga 500 in a hi-fi box with a caddy-loading CD drive, and it ran software that was just awful. There's no path to remediation here. No compelling apps were ever released. It's a terrible device. I love it. I bought one in 1996 because a local computer store had one and I pointed out that the company selling it had gone bankrupt some years earlier and literally nobody in my farming town was ever going to have any interest in buying a CD player that made a whirring noise when you turned it on because it had a fan and eventually they just sold it to me for not much money, and ever since then I wanted to have a CD player that ran Linux and well spoiler 30 years later I'm nearly there. That CDTV is going to be our test subject. We're going to try to get Doom running on it without executing any 68000 instructions.

We're facing two main problems here. The first is that all Amigas have a firmware ROM called Kickstart that runs at powerup. No matter how little you care about using any OS functionality, you can't start running your code until Kickstart has run. This means even documentation describing bare metal Amiga programming assumes that the hardware is already in the state that Kickstart left it in. This will become important later. The second is that we're going to need to actually write the code to use the Amiga hardware.

First, let's talk about Amiga graphics. We've already covered bitmaps, but for anyone used to modern hardware that's not the weirdest thing about what we're dealing with here. The CDTV's chipset supports a maximum of 64 colours in a mode called "Extra Half-Brite", or EHB, where you have 32 colours arbitrarily chosen from a palette and then 32 more colours that are identical but with half the intensity. For 64 colours we need 6 bitplanes, each of which can be located arbitrarily in the region of RAM accessible to the chipset ("chip RAM", distinguished from "fast ram" that's only accessible to the CPU). We tell the chipset where our bitplanes are and it displays them. Or, well, it does for a frame - after that the registers that pointed at our bitplanes no longer do, because when the hardware was DMAing through the bitplanes to display them it was incrementing those registers to point at the next address to DMA from. Which means that every frame we need to set those registers back.

Making sure you have code that's called every frame just to make your graphics work sounds intensely irritating, so Commodore gave us a way to avoid doing that. The chipset includes a coprocessor called "copper". Copper doesn't have a large set of features - in fact, it only has three. The first is that it can program chipset registers. The second is that it can wait for a specific point in screen scanout. The third (which we don't care about here) is that it can optionally skip an instruction if a certain point in screen scanout has already been reached. We can write a program (a "copper list") for the copper that tells it to program the chipset registers with the locations of our bitplanes and then wait until the end of the frame, at which point it will repeat the process. Now our bitplane pointers are always valid at the start of a frame.
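
In case the shape of that isn't obvious, a copper list for this ends up looking roughly like the following - a sketch using the standard OCS register offsets rather than my actual list, and the list itself has to live in chip RAM so the copper can fetch it:

    #include <stdint.h>

    #define BPL1PTH 0x0E0  /* bitplane 1 pointer, high word; each plane is +4 */

    /* Emit MOVEs that point the chipset at six bitplanes, then a WAIT for a
       beam position that never arrives - the usual "end of list" idiom.
       Returns the number of 16-bit words written. The hardware restarts the
       list from COP1LC at every vertical blank. */
    static int build_copper_list(uint16_t *cl, const uint32_t plane[6])
    {
        int n = 0;

        for (int p = 0; p < 6; p++) {
            uint16_t reg = (uint16_t)(BPL1PTH + p * 4);
            cl[n++] = reg;                            /* MOVE: destination register... */
            cl[n++] = (uint16_t)(plane[p] >> 16);     /* ...then the value */
            cl[n++] = (uint16_t)(reg + 2);
            cl[n++] = (uint16_t)(plane[p] & 0xFFFF);
        }

        cl[n++] = 0xFFFF;                             /* WAIT for "never" */
        cl[n++] = 0xFFFE;
        return n;
    }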

Ok! We know how to display stuff. Now we just need to deal with not having 256 colours, and the whole "Doom expects pixels" thing. For the first of these, I stole code from ADoom, the only Amiga doom port I could easily find source for. This looks at the 256 colour palette loaded by Doom and calculates the closest approximation it can within the constraints of EHB. ADoom also includes a bunch of CPU-specific assembly optimisation for converting the "chunky" Doom graphic buffer into the "planar" Amiga bitplanes, none of which I used because (a) it's all for 68000 series CPUs and we're running on ARM, and (b) I have a quad core CPU running at 1.4GHz and I'm going to be pushing all the graphics over a 7.14MHz bus, the graphics mode conversion is not going to be the bottleneck here. Instead I just wrote a series of nested for loops that iterate through each pixel and update each bitplane and called it a day. The set of bitplanes I'm operating on here is allocated on the Linux side so I can read and write to them without being restricted by the speed of the Amiga bus (remember, each byte in each bitplane is going to be updated 8 times per frame, because it holds bits associated with 8 pixels), and then copied over to the Amiga's RAM once the frame is complete.
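
For the record, the nested loops are about as dumb as they sound - roughly this shape, with the names and layout made up for the sketch (the real thing also does the EHB palette mapping first):

    #include <stdint.h>
    #include <string.h>

    #define WIDTH    320
    #define HEIGHT   200
    #define PLANES   6
    #define ROWBYTES (WIDTH / 8)

    /* Scatter an 8-bit "chunky" frame (already reduced to 64 EHB colours) into
       six Linux-side bitplanes; the finished planes are what gets copied into
       chip RAM over the slow bus once the frame is done. */
    static void chunky_to_planar(const uint8_t *chunky,
                                 uint8_t planes[PLANES][ROWBYTES * HEIGHT])
    {
        memset(planes, 0, PLANES * ROWBYTES * HEIGHT);

        for (int y = 0; y < HEIGHT; y++) {
            for (int x = 0; x < WIDTH; x++) {
                uint8_t c = chunky[y * WIDTH + x] & 0x3F;  /* 6-bit colour index */
                int offset = y * ROWBYTES + (x >> 3);
                uint8_t mask = (uint8_t)(0x80 >> (x & 7)); /* leftmost pixel = high bit */

                for (int p = 0; p < PLANES; p++)
                    if (c & (1 << p))
                        planes[p][offset] |= mask;         /* plane p holds bit p */
            }
        }
    }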

And, kind of astonishingly, this works! Once I'd figured out where I was going wrong with RGB ordering and which order the bitplanes go in, I had a recognisable copy of Doom running. Unfortunately there were weird graphical glitches - sometimes blocks would be entirely the wrong colour. It took me a while to figure out what was going on and then I felt stupid. Recording the screen and watching in slow motion revealed that the glitches often showed parts of two frames displaying at once. The Amiga hardware is taking responsibility for scanning out the frames, and the code on the Linux side isn't synchronised with it at all. That means I could update the bitplanes while the Amiga was scanning them out, resulting in a mashup of planes from two different Doom frames being used as one Amiga frame. One approach to avoid this would be to tie the Doom event loop to the Amiga, blocking my writes until the end of scanout. The other is to use double-buffering - have two sets of bitplanes, one being displayed and the other being written to. This consumes more RAM but since I'm not using the Amiga RAM for anything else that's not a problem. With this approach I have two copper lists, one for each set of bitplanes, and switch between them on each frame. This improved things a lot but not entirely, and there's still glitches when the palette is being updated (because there's only one set of colour registers), something Doom does rather a lot, so I'm going to need to implement proper synchronisation.
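
The flip itself is just repointing the copper: draw into the back set of planes, then aim COP1LC at that set's copper list so the hardware picks it up at the next vertical blank. A sketch, with amiga_write16() again standing in for the bridge write:

    #include <stdint.h>

    #define CUSTOM  0xDFF000u
    #define COP1LCH (CUSTOM + 0x080)  /* copper list pointer, high word */
    #define COP1LCL (CUSTOM + 0x082)  /* copper list pointer, low word */

    extern void amiga_write16(uint32_t addr, uint16_t value); /* bridge write (assumed) */

    static uint32_t copper_list[2];   /* chip RAM addresses of the two lists */
    static int back;                  /* which buffer is currently being drawn into */

    static void present_frame(void)
    {
        amiga_write16(COP1LCH, (uint16_t)(copper_list[back] >> 16));
        amiga_write16(COP1LCL, (uint16_t)(copper_list[back] & 0xFFFF));
        back ^= 1;                    /* next frame draws into the other set */
    }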

Except. This was only working if I ran a 68K emulator first in order to run Kickstart. If I tried accessing the hardware without doing that, things were in a weird state. I could update the colour registers, but accessing RAM didn't work - I could read stuff out, but anything I wrote vanished. Some more digging cleared that up. When you turn on a CPU it needs to start executing code from somewhere. On modern x86 systems it starts from a hardcoded address of 0xFFFFFFF0, which was traditionally a long way from any RAM. The 68000 family instead reads its start address from address 0x00000004, which overlaps with where the Amiga chip RAM is. We can't write anything to RAM until we're executing code, and we can't execute code until we tell the CPU where the code is, which seems like a problem. This is solved on the Amiga by powering up in a state where the Kickstart ROM is "overlayed" onto address 0. The CPU reads the start address from the ROM, which causes it to jump into the ROM and start executing code there. Early on, the code tells the hardware to stop overlaying the ROM onto the low addresses, and now the RAM is available. This is poorly documented because it's not something you need to care about if you execute Kickstart, which every actual Amiga does, and I'm only in this position because I've made poor life choices, but ok that explained things. To turn off the overlay you write to a register in one of the Complex Interface Adaptor (CIA) chips, and things start working like you'd expect.

Except, they don't. Writing to that register did nothing for me. I assumed that there was some other register I needed to write to first, and went to the extent of tracing every register access that occurred when running the emulator and replaying those in my code. Nope, still broken. What I finally discovered is that you need to pulse the reset line on the board before some of the hardware starts working - powering it up doesn't put you in a well defined state, but resetting it does.
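
For anyone else making the same poor life choices: the overlay is controlled by bit 0 of CIA-A's port A, so the whole dance is "make the pin an output, drive it low" - which, as above, only sticks once the reset line has been pulsed. Addresses are from the standard memory map, and amiga_write8() is another stand-in for the bridge:

    #include <stdint.h>

    #define CIAAPRA  0xBFE001u  /* CIA-A peripheral data register A: bit 0 = overlay, bit 1 = LED */
    #define CIAADDRA 0xBFE201u  /* CIA-A data direction register A */

    extern void amiga_write8(uint32_t addr, uint8_t value); /* bridge write (assumed) */

    static void disable_kickstart_overlay(void)
    {
        amiga_write8(CIAADDRA, 0x03); /* PA0 and PA1 become outputs */
        amiga_write8(CIAAPRA,  0x02); /* clear the overlay bit: chip RAM now appears at address 0 */
    }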

So, I now have a slightly graphically glitchy copy of Doom running without any sound, displaying on an Amiga whose brain has been replaced with a parasitic Linux. Further updates will likely make things even worse. Code is, of course, available.

[1] This is why we had trouble with late era 32 bit systems and 4GB of RAM - a bunch of your hardware wanted to be in the same address space and so you couldn't put RAM there so you ended up with less than 4GB of RAM
Matthew Garrett (mjg59) wrote 2025-07-30 06:48 pm

Secure boot certificate rollover is real but probably won't hurt you

LWN wrote an article which opens with the assertion "Linux users who have Secure Boot enabled on their systems knowingly or unknowingly rely on a key from Microsoft that is set to expire in September". This is, depending on interpretation, either misleading or just plain wrong, but also there's not a good source of truth here, so.

First, how does secure boot signing work? Every system that supports UEFI secure boot ships with a set of trusted certificates in a database called "db". Any binary signed with a chain of certificates that chains to a root in db is trusted, unless either the binary (via hash) or an intermediate certificate is added to "dbx", a separate database of things whose trust has been revoked[1]. But, in general, the firmware doesn't care about the intermediate or the number of intermediates or whatever - as long as there's a valid chain back to a certificate that's in db, it's going to be happy.
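
If it's easier to see as pseudocode, the decision amounts to something like this - a sketch of the model, emphatically not the actual firmware implementation:

    #include <stdbool.h>
    #include <stddef.h>

    struct cert  { const struct cert *issuer; };  /* NULL issuer = self-signed root */
    struct image { const void *hash; const struct cert *chain; /* leaf first */ };
    struct keydb { bool (*has_hash)(const void *h); bool (*has_cert)(const struct cert *c); };

    static bool allowed_to_boot(const struct image *img,
                                const struct keydb *db, const struct keydb *dbx)
    {
        if (dbx->has_hash(img->hash))
            return false;                         /* this specific binary was revoked */

        for (const struct cert *c = img->chain; c != NULL; c = c->issuer) {
            if (dbx->has_cert(c))
                return false;                     /* something in its chain was revoked */
            if (db->has_cert(c))
                return true;                      /* chains to an entry in db; depth doesn't matter */
        }
        return false;                             /* no trust anchor found */
    }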

That's the conceptual version. What about the real world one? Most x86 systems that implement UEFI secure boot have at least two root certificates in db - one called "Microsoft Windows Production PCA 2011", and one called "Microsoft Corporation UEFI CA 2011". The former is the root of a chain used to sign the Windows bootloader, and the latter is the root used to sign, well, everything else.

What is "everything else"? For people in the Linux ecosystem, the most obvious thing is the Shim bootloader that's used to bridge between the Microsoft root of trust and a given Linux distribution's root of trust[2]. But that's not the only third party code executed in the UEFI environment. Graphics cards, network cards, RAID and iSCSI cards and so on all tend to have their own unique initialisation process, and need board-specific drivers. Even if you added support for everything on the market to your system firmware, a system built last year wouldn't know how to drive a graphics card released this year. Cards need to provide their own drivers, and these drivers are stored in flash on the card so they can be updated. But since UEFI doesn't have any sandboxing environment, those drivers could do pretty much anything they wanted to. Someone could compromise the UEFI secure boot chain by just plugging in a card with a malicious driver on it, and have that hotpatch the bootloader and introduce a backdoor into your kernel.

This is avoided by enforcing secure boot for these drivers as well. Every plug-in card that carries its own driver has it signed by Microsoft, and up until now that's been a certificate chain going back to the same "Microsoft Corporation UEFI CA 2011" certificate used in signing Shim. This is important for reasons we'll get to.

The "Microsoft Windows Production PCA 2011" certificate expires in October 2026, and the "Microsoft Corporation UEFI CA 2011" one in June 2026. These dates are not that far in the future! Most of you have probably at some point tried to visit a website and got an error message telling you that the site's certificate had expired and that it's no longer trusted, and so it's natural to assume that the outcome of time's arrow marching past those expiry dates would be that systems will stop booting. Thankfully, that's not what's going to happen.

First up: if you grab a copy of the Shim currently shipped in Fedora and extract the certificates from it, you'll learn it's not directly signed with the "Microsoft Corporation UEFI CA 2011" certificate. Instead, it's signed with a "Microsoft Windows UEFI Driver Publisher" certificate that chains to the "Microsoft Corporation UEFI CA 2011" certificate. That's not unusual, intermediates are commonly used and rotated. But if we look more closely at that certificate, we learn that it was issued in 2023 and expired in 2024. Older versions of Shim were signed with older intermediates. A very large number of Linux systems are already booting binaries signed with certificates that have expired, and yet things keep working. Why?

Let's talk about time. In the ways we care about in this discussion, time is a social construct rather than a meaningful reality. There's no way for a computer to observe the state of the universe and know what time it is - it needs to be told. It has no idea whether that time is accurate or an elaborate fiction, and so it can't with any degree of certainty declare that a certificate is valid from an external frame of reference. The failure modes of getting this wrong are also extremely bad! If a system has a GPU that relies on an option ROM, and if you stop trusting the option ROM because either its certificate has genuinely expired or because your clock is wrong, you can't display any graphical output[3] and the user can't fix the clock and, well, crap.

The upshot is that nobody actually enforces these expiry dates - here's the reference code that disables it. In a year's time we'll have gone past the expiration date for "Microsoft Windows UEFI Driver Publisher" and everything will still be working, and a few months later "Microsoft Windows Production PCA 2011" will also expire and systems will keep booting Windows despite being signed with a now-expired certificate. This isn't a Y2K scenario where everything keeps working because people have done a huge amount of work - it's a situation where everything keeps working even if nobody does any work.

So, uh, what's the story here? Why is there any engineering effort going on at all? What's all this talk of new certificates? Why are there sensationalist pieces about how Linux is going to stop working on old computers or new computers or maybe all computers?

Microsoft will shortly start signing things with a new certificate that chains to a new root, and most systems don't trust that new root. System vendors are supplying updates[4] to their systems to add the new root to the set of trusted keys, and Microsoft has supplied a fallback that can be applied to all systems even without vendor support[5]. If something is signed purely with the new certificate then it won't boot on something that only trusts the old certificate (which shouldn't be a realistic scenario due to the above), but if something is signed purely with the old certificate then it won't boot on something that only trusts the new certificate.

How meaningful a risk is this? We don't have an explicit statement from Microsoft as yet as to what's going to happen here, but we expect that there'll be at least a period of time where Microsoft signs binaries with both the old and the new certificate, and in that case those objects should work just fine on both old and new computers. The problem arises if Microsoft stops signing things with the old certificate, at which point new releases will stop booting on systems that don't trust the new key (which, again, shouldn't happen). But even if that does turn out to be a problem, nothing is going to force Linux distributions to stop using existing Shims signed with the old certificate, and having a Shim signed with an old certificate does nothing to stop distributions signing new versions of grub and kernels. In an ideal world we have no reason to ever update Shim[6] and so we just keep on shipping one signed with two certs.

If there's a point in the future where Microsoft only signs with the new key, and if we were to somehow end up in a world where systems only trust the old key and not the new key[7], then those systems wouldn't boot with new graphics cards, wouldn't be able to run new versions of Windows, wouldn't be able to run any Linux distros that ship with a Shim signed only with the new certificate. That would be bad, but we have a mechanism to avoid it. On the other hand, systems that only trust the new certificate and not the old one would refuse to boot older Linux, wouldn't support old graphics cards, and also wouldn't boot old versions of Windows. Nobody wants that, and for the foreseeable future we're going to see new systems continue trusting the old certificate and old systems have updates that add the new certificate, and everything will just continue working exactly as it does now.

Conclusion: Outside some corner cases, the worst case is you might need to boot an old Linux to update your trusted keys to be able to install a new Linux, and no computer currently running Linux will break in any way whatsoever.

[1] (there's also a separate revocation mechanism called SBAT which I wrote about here, but it's not relevant in this scenario)

[2] Microsoft won't sign GPLed code for reasons I think are unreasonable, so having them sign grub was a non-starter, but also the point of Shim was to allow distributions to have something that doesn't change often and be able to sign their own bootloaders and kernels and so on without having to have Microsoft involved, which means grub and the kernel can be updated without having to ask Microsoft to sign anything and updates can be pushed without any additional delays

[3] It's been a long time since graphics cards booted directly into a state that provided any well-defined programming interface. Even back in the 90s, cards didn't present VGA-compatible registers until card-specific code had been executed (hence DEC Alphas having an x86 emulator in their firmware to run the driver on the card). No driver? No video output.

[4] There's a UEFI-defined mechanism for updating the keys that doesn't require a full firmware update, and it'll work on all devices that use the same keys rather than being per-device

[5] Using the generic update without a vendor-specific update means it wouldn't be possible to issue further updates for the next key rollover, or any additional revocation updates, but I'm hoping to be retired by then and I hope all these computers will also be retired by then

[6] I said this in 2012 and it turned out to be wrong then so it's probably wrong now sorry, but at least SBAT means we can revoke vulnerable grubs without having to revoke Shim

[7] Which shouldn't happen! There's an update to add the new key that should work on all PCs, but there's always the chance of firmware bugs