Matthew Cowen - The origins of Rosetta(2) probably lie in a little-known technology from 1996 called FX!32

Unless you’re a hermit or in no way linked to the tech industry, you’ll be aware that Apple has released its in-house designed processors to replace the Intel-supplied ones used in the low-end line of its computers: the MacBook Air, the 13” MacBook Pro and the Mac Mini.

Since launch, people have been benchmarking these processors, with nothing short of stunning results. They are that good, it seems. Everything is faster, from switching resolutions (which is, by the way, instantaneous, with no blanking or delay) to running M1-optimised tasks at over three times the speed for some functions, compared with even the fastest of the Intel family.

But I’ve been most interested in this transition to RISC¹ from CISC², or to put it differently, from Intel to Apple’s Arm-based processors, for one reason. Rosetta.³ Apple officially calls it Rosetta, but we all know it as Rosetta 2 because its original outing was in 2006. Back then Apple was embarking on its first major transition from the PowerPC line of processors to Intel’s x86 line. Rosetta, at that time, provided the bridge between the older PowerPC applications and the newer operating system that was running entirely on the Intel instruction set.⁴ Rosetta was emulation software that took PowerPC instructions and turned them into equivalent Intel instructions, allowing the application to run, albeit slowly. Running under emulation carries an overhead that is not negligible. At an elementary level, the processor has to do at least twice the work it would do for an application running natively.

Rosetta 2 does things a little differently, and as a result, substantially reduces the time required to run the translated applications. The word ‘translate’ is the key to understanding Rosetta 2.

Back in 1996, during the precipitous decline of Digital, a major computer company from Maynard, Massachusetts, the company had already designed, built and shipped a RISC-based processor architecture called Alpha. The move to RISC was seen as the way forward and was projected (rightly so, if what we’re seeing today from Apple is any indication) to be the future of processor design.

At the time, there was a belief that RISC-based microprocessors were likely to replace x86-based microprocessors, due to a more efficient and simplified implementation that could reach higher clock frequencies.

(FX!32 - Wikipedia. [en.wikipedia.org/wiki/FX!3...](https://en.wikipedia.org/wiki/FX!32))

There was, however, one snag: application compatibility with the growing x86 application base that had taken hold at the time, through PCs running various flavours of Windows. One interesting version, NT or New Technology, had been on the market for a few years and was quickly outdoing the established Unix workstation operating systems, like Digital’s own Digital UNIX (formerly OSF/1) on Alpha AXP.

To remove this sticking point, Raymond J. Hookway and Mark A. Herdeg led a team of engineers in developing a much better solution to the CISC ➔ RISC problem than simple emulation. Released in 1996 and discussed in detail in this 1997 Digital Technical Journal article, DIGITAL FX!32 provided the means for the binaries to be “translated” from x86 to Alpha. FX!32 took native x86 binaries and created Alpha DLLs, or Dynamic Link Libraries, and ensured that these ran in place of the original x86 binaries.

FX!32 allowed two things to happen. First, FX!32 let non-native x86 code run on Alpha processors with a much smaller speed penalty than emulation. Version 1.0 reportedly ran at 40-50% of the speed of native Alpha code. That was far faster than emulator software, which typically ran at a tenth of native speed or slower. Subsequent versions and other optimisations allowed the code to run at over 70% of native Alpha speed. Because the Alpha was the fastest processor on the market at the time, this allowed complex applications like Microsoft Office to run at very usable speeds on Alpha workstations running NT 3.51.

Secondly, the work done to translate the binary was not lost and re-expended every time the application was run, as it is with emulation. FX!32 optimised the binaries in the background and stored the translated libraries on disk, which made the translation overhead virtually unnoticeable from the second run onwards. The background translation ran without user interaction and let the system choose the best possible optimisations for the computational resources available, so the user could start the application and get to work after a short delay. Modules not yet used in the application were optimised in the background, so they were fast and responsive the first time they ran.
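To make the difference concrete, here is a minimal sketch in Python of emulation versus translate-once-and-cache. It is a toy, not FX!32's or Rosetta 2's actual design: the three-opcode "guest" instruction set, the function names and the in-memory cache are all assumptions made up for illustration (FX!32 kept its translations on disk).

```python
import hashlib

# Hypothetical toy "guest" program: (opcode, operand) pairs acting on one accumulator.
PROGRAM = [("load", 2), ("add", 3), ("mul", 4)]


def emulate(program):
    """Pure emulation: decode and execute every instruction on every run."""
    acc = 0
    for op, arg in program:
        if op == "load":
            acc = arg
        elif op == "add":
            acc += arg
        elif op == "mul":
            acc *= arg
        else:
            raise ValueError(f"unknown opcode: {op}")
    return acc


def translate(program):
    """Translation: decode once and emit host (Python) code with no decode loop."""
    templates = {"load": "acc = {}", "add": "acc += {}", "mul": "acc *= {}"}
    lines = ["def run():", "    acc = 0"]
    for op, arg in program:
        if op not in templates:
            raise ValueError(f"unknown opcode: {op}")
        lines.append("    " + templates[op].format(int(arg)))
    lines.append("    return acc")
    namespace = {}
    exec("\n".join(lines), namespace)  # build the host-native function
    return namespace["run"]


# In-memory stand-in for an on-disk store of translated code (an assumption).
_cache = {}
stats = {"translations": 0}


def run_translated(program):
    """Translate on first use, then reuse the stored translation on later runs."""
    key = hashlib.sha256(repr(program).encode()).hexdigest()
    if key not in _cache:
        _cache[key] = translate(program)
        stats["translations"] += 1
    return _cache[key]()
```

Calling `run_translated(PROGRAM)` twice gives the same result as `emulate(PROGRAM)`, but the decoding work is paid only once; every later run goes straight to the cached host function.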

The primary goals of the project were to provide 1) transparent execution of x86 applications and 2) to achieve approximately the same performance as a high-end x86 platform. FX!32 achieved both these goals.

That brings us to Apple’s Rosetta 2 technology. Wikipedia’s entry for Rosetta 2 is two sentences:

Rosetta 2 is included starting with macOS Big Sur to aid in the Mac transition to Apple Silicon from Intel processors. In addition to the just-in-time (JIT) translation support available in Rosetta, Rosetta 2 includes support for translating an application at installation time.

Technical information is scarce, as Apple typically keeps this kind of technical documentation to itself. The page dedicated to Rosetta on developer.apple.com is scant in technical detail too. But I suspect the origins of the technology lie in FX!32, updated to handle 64-bit x86 instructions. The difference between now and then is that Apple’s M1 is so fast that even with the 20-30% speed hit, these computers run Intel code faster than Intel itself can (on the line of processors Apple is replacing).

Just. Stunning.

20 November 2020 — French West Indies

1. Reduced Instruction Set Computer ↩
2. Complex Instruction Set Computer ↩
3. Taken from the Rosetta Stone, which enabled historians and scientists to decipher Egyptian hieroglyphs because the stone carries the same text in three scripts: Greek, Demotic and Hieroglyphic ↩
4. The instruction set determines how the processor executes the code it is fed. Both RISC and CISC have their advantages and disadvantages ↩