VLIW, Trace Scheduling, and the Rise and Fall of Multiflow

 39 min video

YouTube video ID: J7157XB8rxc

Source: YouTube video by Asianometry

Think of a CPU as a busy kitchen in which raw data are the ingredients and the program's output is the plated dish. The instruction set lists the tools and techniques the chef may use, and the fetch-decode-execute cycle is the routine of gathering ingredients, preparing them, and serving the result. Instruction-level parallelism (ILP) lets the kitchen staff work on several steps at once. Data dependencies and conditional branches limit this, much as a recipe cannot continue until the sauce has simmered. An instruction that needs another's result must wait for it, and the processor cannot be sure which instructions come next until a branch is resolved.
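The two kinds of waiting can be made concrete. The following Python is a sketch under stated assumptions, not something taken from the video. Each instruction names the register it writes and the registers it reads (the `Instr` type and the register names are hypothetical). `hazard` reports which ordering constraint, if any, the second instruction must respect relative to the first. Only read-after-write is a true data dependency. The other two are name conflicts that register renaming can remove.

```python
from typing import NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    """A hypothetical three-address instruction: one destination, any number of sources."""
    text: str
    dest: str              # register written
    srcs: Tuple[str, ...]  # registers read


def hazard(first: Instr, second: Instr) -> Optional[str]:
    """Name the constraint `second` must respect when it follows `first`, or None."""
    if first.dest in second.srcs:
        return "read-after-write"   # true dependency: second needs first's result
    if second.dest in first.srcs:
        return "write-after-read"   # second must not overwrite an input first still reads
    if first.dest == second.dest:
        return "write-after-write"  # the later write must land last
    return None                     # independent: free to reorder or issue together


load_x = Instr("r1 = load x", "r1", ())
load_y = Instr("r2 = load y", "r2", ())
add = Instr("r3 = r1 + r2", "r3", ("r1", "r2"))
mul = Instr("r1 = r2 * 2", "r1", ("r2",))

print(hazard(load_x, load_y))  # None: both loads can run at once
print(hazard(load_x, add))     # read-after-write: the add waits for r1
print(hazard(add, mul))        # write-after-read: mul must not clobber r1 before add reads it
print(hazard(load_x, mul))     # write-after-write: both write r1
```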

The Radical Idea: Trace Scheduling

Traditional compilers of the era scheduled instructions only within basic blocks, which are straight-line runs of code that averaged about six instructions, because every branch can send execution down a different path. Trace scheduling removes that restriction. Using profile data or heuristics, the compiler picks the most likely execution path through many basic blocks (the “trace”). It schedules that path as if it were one large block, moving instructions across branch boundaries to fill each Very Long Instruction Word (VLIW). It then repeats the process on the next most likely path. When execution follows the trace, the compiler can pack dozens of operations into one word, with promised speedups of 10–30×. When execution leaves the trace, “compensation code” that the compiler placed on the off-trace paths restores the state the original program would have produced, so the results stay correct. As one commentator put it, “The compiler practically has to be a time‑traveling Mary Sue.”
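Trace selection itself is simple to sketch. The Python below is a simplified illustration, not Multiflow's or Bulldog's code. It assumes a hypothetical control-flow graph whose edges carry branch probabilities from a profiling run. It grows the trace forward from a seed block by always taking the most probable successor; published trace-selection heuristics also grow traces backward from the seed. It then lists the side exits and side entrances, which are the off-trace edges where compensation code may be needed once instructions move across them.

```python
from typing import Dict, List, Tuple

# Hypothetical CFG: block -> [(successor, probability from a profiling run), ...]
CFG = Dict[str, List[Tuple[str, float]]]
Edge = Tuple[str, str]


def select_trace(cfg: CFG, seed: str) -> List[str]:
    """Grow a trace forward from `seed`, always following the most probable
    successor. Stops at a block with no successors, or when the next block is
    already on the trace, which keeps the trace free of loops."""
    trace = [seed]
    current = seed
    while cfg.get(current):
        nxt, _prob = max(cfg[current], key=lambda edge: edge[1])
        if nxt in trace:
            break
        trace.append(nxt)
        current = nxt
    return trace


def side_edges(cfg: CFG, trace: List[str]) -> Tuple[List[Edge], List[Edge]]:
    """Return (side_exits, side_entrances): edges that leave the trace before its
    last block, or enter it after its first block. Moving instructions across a
    split or a join on the trace may require compensation code on these edges."""
    on_trace = set(zip(trace, trace[1:]))
    interior_sources = set(trace[:-1])  # exits from the last block are the normal end
    interior_targets = set(trace[1:])   # entries to the first block are the normal start
    exits: List[Edge] = []
    entrances: List[Edge] = []
    for src, succs in cfg.items():
        for dst, _prob in succs:
            if (src, dst) in on_trace:
                continue
            if src in interior_sources:
                exits.append((src, dst))
            if dst in interior_targets:
                entrances.append((src, dst))
    return exits, entrances


cfg: CFG = {
    "B1": [("B2", 0.9), ("B3", 0.1)],  # branch usually goes to B2
    "B2": [("B4", 1.0)],
    "B3": [("B4", 1.0)],               # the rare path rejoins at B4
    "B4": [("B5", 0.8), ("B6", 0.2)],
    "B5": [],
    "B6": [],
}

trace = select_trace(cfg, "B1")
print(trace)                   # ['B1', 'B2', 'B4', 'B5']
print(side_edges(cfg, trace))  # ([('B1', 'B3'), ('B4', 'B6')], [('B3', 'B4')])
```

In this toy graph the rare branches B1→B3 and B4→B6 leave the trace, and B3→B4 rejoins it. Those three edges are where a trace scheduler would have to check whether compensation copies are required.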

The VLIW Architecture

VLIW turns the traditional hardware-centric model on its head. Instead of building complex scheduling logic into the processor, VLIW keeps the hardware simple and pushes all scheduling work to the compiler. The ELI-512, a design that was never built, was the first VLIW architecture and was intended to execute 10–30 RISC-level operations per cycle. Its Bulldog compiler generated the wide instruction words, and the hardware would execute them exactly as written, with no runtime arbitration. Critics like Bob Colwell warned that compiler overhead and code bloat could erase any performance gains. Even so, the promise that “the hardware just does whatever the compiler tells it to do” kept the idea alive.
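A toy version of the compiler's packing job shows both the appeal and the code-bloat concern. The sketch makes several assumptions. The machine is hypothetical, with a fixed number of identical operation slots per word and one-cycle latency. Registers are single-assignment, so only read-after-write dependencies matter. The function uses greedy list scheduling, a standard technique, and is not a reconstruction of Bulldog. Every unused slot becomes an explicit NOP, because the hardware issues every slot of every word without runtime arbitration.

```python
from typing import Dict, List

NOP = "nop"


def pack_vliw(ops: List[str], deps: Dict[str, List[str]], width: int) -> List[List[str]]:
    """Greedy list scheduling into VLIW words.

    ops   : operations in a valid sequential (topological) order
    deps  : op -> ops whose results it needs (one-cycle latency assumed)
    width : operation slots per instruction word (hypothetical machine)
    """
    cycle_of: Dict[str, int] = {}
    words: List[List[str]] = []
    for op in ops:
        # Earliest cycle after every producer has finished.
        cycle = max((cycle_of[d] + 1 for d in deps.get(op, [])), default=0)
        # Slide later until a word with a free slot is found.
        while cycle < len(words) and len(words[cycle]) >= width:
            cycle += 1
        while len(words) <= cycle:
            words.append([])
        words[cycle].append(op)
        cycle_of[op] = cycle
    # Pad every word to full width: empty slots still occupy instruction bits.
    return [word + [NOP] * (width - len(word)) for word in words]


ops = ["a", "b", "c", "d", "e"]
deps = {"c": ["a", "b"], "d": ["a"], "e": ["c", "d"]}
for word in pack_vliw(ops, deps, width=4):
    print(word)
# ['a', 'b', 'nop', 'nop']
# ['c', 'd', 'nop', 'nop']
# ['e', 'nop', 'nop', 'nop']
```

Five operations finish in three cycles instead of five, but 7 of the 12 slots in this naive encoding are NOPs. That is the kind of code bloat the critics had in mind.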

The Multiflow Saga

In 1984, Josh Fisher, John Ruttenberg, and John O'Donnell founded Multiflow to turn VLIW theory into a commercial product. The first TRACE model, the 7/200, used a 256‑bit instruction word that could hold seven operations. Later models such as the 28/200 widened the word to 1024 bits, holding twenty-eight operations per cycle. Building these machines was a physical ordeal. With thousands of pins to seat, assembly required a rubber mallet, affectionately dubbed “the Persuader.” The compiler, the true heart of the system, could take up to three days to finish a single compilation, but it delivered the promised parallelism. Multiflow raised $33 million in venture capital and grew to 160 employees before its downfall.

The Collapse

The mid‑1980s minisupercomputer market was a crowded kitchen, with about 20 vendors competing for a $350 million pie. RISC-based “Killer Micros” from Sun, IBM, and MIPS then arrived with better price-performance, riding Moore's Law to ever cheaper, faster chips. Multiflow's hope of a strategic acquisition by DEC evaporated in 1990, and without that lifeline the company entered voluntary liquidation. It left behind a community of “Multifloids” who still speak of the technology with something like awe. As one observer put it, “To many of us, what the Multiflow people told us it could do seemed like black magic.”

  Takeaways

  • Trace scheduling selects the most likely execution path through many basic blocks and schedules it as a single block, enabling large VLIW instruction bundles.
  • VLIW architecture shifts scheduling complexity from hardware to the compiler, allowing simple processors to execute many operations per cycle.
  • Multiflow's TRACE machines used 256‑bit to 1024‑bit instruction words, but their hardware required thousands of pins and a rubber mallet for assembly.
  • The minisupercomputer market collapsed under pressure from RISC‑based workstations that offered superior price‑performance and benefited from Moore's Law.
  • Multiflow raised $33 million, grew to 160 employees, and failed to secure a DEC acquisition, leading to its voluntary liquidation in 1990.

Frequently Asked Questions

How does trace scheduling enable the large speedups promised by VLIW?

Trace scheduling picks the most likely execution path through many basic blocks and moves instructions across branch boundaries so that each Very Long Instruction Word is filled with independent operations. When execution follows the predicted path, the processor can execute dozens of operations per cycle, which is the source of the promised 10–30× speedups. When execution leaves that path, compensation code inserted by the compiler keeps the results correct.
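A small simulation shows why compensation code is needed. The example is hypothetical and does not come from the video. The trace B1→B2→B4 is assumed to be the likely path. The compiler hoists `y = b * 2` from the join block B4 into B2 so that it can share an instruction word with `x = a + 1`. Python runs each version sequentially, so comments mark which statements would share a word; the point is only which paths compute the right answer.

```python
def original(cond: bool, a: int, b: int) -> int:
    if cond:          # B1: profiling says cond is usually True
        x = a + 1     # B2: on the trace
    else:
        x = a - 1     # B3: off the trace
    y = b * 2         # B4: join block, reached from both B2 and B3
    return x + y


def hoisted_without_compensation(cond: bool, a: int, b: int) -> int:
    y = 0             # stands in for whatever stale value the register held
    if cond:
        x = a + 1     # these two would share one VLIW instruction word
        y = b * 2     # hoisted from B4 into B2, above the join
    else:
        x = a - 1     # the off-trace path never computes y
    return x + y


def hoisted_with_compensation(cond: bool, a: int, b: int) -> int:
    if cond:
        x = a + 1
        y = b * 2     # hoisted copy on the trace
    else:
        x = a - 1
        y = b * 2     # compensation copy on the side entrance B3 -> B4
    return x + y


for cond in (True, False):
    print(cond, original(cond, 5, 3),
          hoisted_without_compensation(cond, 5, 3),
          hoisted_with_compensation(cond, 5, 3))
# True 12 12 12
# False 10 4 10
```

On the likely path all three versions agree. On the rare path, only the version with the compensation copy matches the original program.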
