The assumption that every technology will continue to get faster and better has become second nature, thanks in large part to Moore’s Law. As Moore’s Law predicted, the number of transistors on an integrated circuit has doubled at a steady pace since 1965 (initially every year, later adjusted to roughly every two), bringing exponential increases in computing power. For more than half a century Moore’s Law held true with stunning precision, helping turn brick-sized telephones into handheld supercomputers.
For the tech industry, Moore’s Law set the trajectory of progress: smaller, cheaper, faster, repeat. It offered a roadmap. CPUs sat at the center of data architectures, and system hardware makers could set clear targets for development. Programmers could focus on new features, knowing that more powerful computers would make their applications run faster.
Everything continued on its merry way because Moore’s Law prevailed. Until it didn’t.
Moore’s Wall
Moore’s Law has run up against a more formidable set of laws: the laws of physics. Transistors have become so small that their features are now just a few nanometers wide, only dozens of atoms across. Challenges of power and heat have made the performance gains of recent years marginal, while shrinking transistors any further will take heroic engineering that is increasingly complex and audaciously expensive.
That circles back to the essence of Moore’s Law, which held that the more transistors were added to a chip, the cheaper each one became. It was as much about economics as it was about computing muscle.
With that engine of technological and economic growth running out of steam, computing will have to find radically different ways to progress and rethink some of its conventions.
Circumventing the CPU
“Conventional thinking has been that compute dominates the data architecture,” said Dr. Siva Sivaram, the president of technology and strategy at Western Digital, in a recent keynote at Flash Memory Summit. But in the post-Moore’s Law era, this idea has to fundamentally change. Dr. Sivaram brought together three of the company’s executives to talk about the technologies that will be the agents of that change.
One of the speakers was Dr. Richard New, the vice president of research at Western Digital. His teams explore future concepts such as neuromorphic computing, spintronics, DNA storage, and exotic materials. Yet their focus is also on the urgency of changing architectures now, and on the technologies that will drive the coming decade of progress.
“One of the significant trends in compute during the last decade has been this idea of moving compute from a central processing unit out to some other device that can do the compute more efficiently,” said Dr. New.
Instead of turning the crank on general-purpose CPUs, custom devices are designed to accelerate a specific application. GPGPUs, for example, can crunch many operations at the same time. But those tasks have to be identical. This type of parallelism is perfect for running millions of mathematical operations for AI, but it’s not the best fit for the type of basic computations needed on a laptop.
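The distinction is easy to see in miniature. The sketch below is a toy illustration, not GPU code: it applies the same multiply-add to every element of an array, first one element at a time and then as a single data-parallel operation. A GPGPU exploits the same pattern across thousands of hardware lanes at once.

```python
import numpy as np

# One million independent, identical operations: y = a*x + b for every element.
x = np.random.rand(1_000_000)
a, b = 2.0, 1.0

# Scalar view: a general-purpose core walks the elements one by one.
y_loop = np.empty_like(x)
for i in range(x.size):
    y_loop[i] = a * x[i] + b

# Data-parallel view: one operation applied to the whole array at once.
# A GPU runs this same pattern across thousands of lanes simultaneously.
y_vec = a * x + b

assert np.allclose(y_loop, y_vec)
```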
CPUs won’t be abandoned, but they’ll be joined by a sea of heterogeneous computational devices. Even storage-based ones.
More than Moore
“Computational storage extends that idea [of distributed computing], and the idea there is to take a compute function that would be in the CPU and move it down to the storage device,” said Dr. New. It is a concept that exposes some of the underlying deficits in today’s data architectures.
The aggregate bandwidth of the NAND and SSDs inside a storage enclosure is much greater than the bandwidth of the network that connects to it. Workloads requiring high throughput, like scanning a large genomics dataset for a particular DNA sequence, could run far more efficiently if the data didn’t have to squeeze through that bottleneck to be processed elsewhere.
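A toy model makes the bandwidth argument concrete. The function names and data below are hypothetical, not any real device API; the point is simply how many bytes must cross the network in each approach.

```python
# Toy model of a DNA-sequence scan, comparing host-side and in-storage filtering.
# All names and sizes here are illustrative assumptions, not a real device API.

GENOME_DB = [b"ACGTACGTTTGACC", b"GGCATTACGTAA"]  # imagine terabytes of records

def host_side_scan(pattern: bytes) -> list[int]:
    """Conventional path: every record crosses the network to the CPU."""
    bytes_moved = sum(len(r) for r in GENOME_DB)   # the whole dataset moves
    hits = [i for i, r in enumerate(GENOME_DB) if pattern in r]
    print(f"host-side scan moved {bytes_moved} bytes over the network")
    return hits

def in_storage_scan(pattern: bytes) -> list[int]:
    """Computational-storage path: the filter runs next to the NAND,
    and only the matching record IDs cross the network."""
    hits = [i for i, r in enumerate(GENOME_DB) if pattern in r]
    bytes_moved = len(hits) * 8                    # just the result set moves
    print(f"in-storage scan moved {bytes_moved} bytes over the network")
    return hits

host_side_scan(b"TTGACC")
in_storage_scan(b"TTGACC")
```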
“Conventional data architectures bring all the data from storage and move it to the central location of CPUs, GPUs, [etc.] to get processed. But this moving takes lots of energy and latency. It takes time,” said Dr. Yan Li, vice president of design engineering in memory technology at Western Digital. With over 200 patents under her belt, Dr. Li is known for challenging conventions and her healthy dissatisfaction with the status quo.
Dr. Li and her team are working on a groundbreaking concept of computational storage, one that could exist within the building blocks of flash storage itself: NAND memory.
While Moore’s Law is slowing for CPUs and DRAM, NAND has only scratched the surface of its scaling potential. Unlike CPUs, which scale by making transistors smaller, 3D NAND scales in multiple directions at once: vertical (layers), lateral (cells per layer) and logical (bits per cell).
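Back-of-the-envelope arithmetic shows why those three directions multiply rather than add. The figures below are round, illustrative assumptions, not specifications of any actual device.

```python
# Illustrative (not actual) 3D NAND scaling arithmetic.
# Capacity grows multiplicatively across all three scaling directions.

def die_bits(layers: int, cells_per_layer: int, bits_per_cell: int) -> int:
    return layers * cells_per_layer * bits_per_cell

baseline = die_bits(layers=64,  cells_per_layer=1_000_000_000, bits_per_cell=3)  # TLC
scaled   = die_bits(layers=128, cells_per_layer=1_200_000_000, bits_per_cell=4)  # QLC

print(f"capacity gain: {scaled / baseline:.1f}x")
# Doubling layers, adding 20% lateral density, and moving from 3 to 4 bits per
# cell compounds to roughly a 3.2x jump, with no transistor shrink required.
```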
Dr. Li and her team are continuously pushing boundaries to solve the physical and cost challenges associated with “smallness.” But her rebellious thinking is exploring another remarkable idea.
“There is a potential in the future [that] we will actually build intelligence inside the NAND. So NAND [can] not only store the data, it can also compute, encrypt for security, [run] ECC for data integrity, even add AI functions,” she said.
The concept is so revolutionary that it would completely alter the architecture and functionality of both NAND and storage.
Shifting the Center of Architectures
As long as Moore’s Law maintained its course, there was little incentive to change things. For hardware, that meant the CPU sat in the driver’s seat and dictated much of its surroundings.
Many people may not realize that in today’s server architectures, the number of available “memory sticks” (DIMM slots), the maximum amount of memory that can be allocated per processor, and even per system, are all determined by the CPU. And any specialized computing device, such as a GPU, DPU or FPGA, can only access that memory by going through the CPU.
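A rough sketch of that ceiling, using round hypothetical numbers rather than any specific processor’s specifications:

```python
# Hypothetical CPU-centric memory ceiling (round numbers, not a real SKU).
channels_per_cpu  = 8      # memory channels wired into the CPU's controller
dimms_per_channel = 2      # slots ("memory sticks") each channel can drive
dimm_capacity_gb  = 64
cpus_per_server   = 2

max_memory_gb = channels_per_cpu * dimms_per_channel * dimm_capacity_gb * cpus_per_server
print(f"max memory: {max_memory_gb} GB")   # 2 TB, fixed by the CPU, not the workload
```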
Dr. Sivaram sees a future where memory is unchained from the CPU, free to scale to terabytes or even petabytes and to be shared across computing devices. This memory-centric type of architecture is coming within reach. Dr. Sivaram’s team has developed a fabric called OmniXtend that breaks down this hierarchy and gives equal memory access to all computing devices in an environment, while preserving cache coherency.
Cache coherency is one of the biggest challenges for heterogeneous computing. Multiple processors need a synchronized view of data, yet each may hold its own cached copy of that data at the same time.
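A toy model shows why that matters. Real coherence protocols live in hardware; this sketch only illustrates the stale-copy problem they exist to solve.

```python
# Toy model: two compute devices each cache a copy of the same memory location.
memory = {"x": 0}

class Cache:
    def __init__(self):
        self.lines = {}
    def read(self, key):
        if key not in self.lines:              # cache miss: fetch from memory
            self.lines[key] = memory[key]
        return self.lines[key]
    def write(self, key, value):
        self.lines[key] = value                # update the local copy...
        memory[key] = value                    # ...and write it back to memory

cpu_cache, gpu_cache = Cache(), Cache()
cpu_cache.read("x")        # CPU caches x = 0
gpu_cache.read("x")        # GPU caches x = 0
cpu_cache.write("x", 42)   # CPU updates x

print(gpu_cache.read("x")) # 0 -- stale! Without a coherence protocol, nothing
                           # invalidates the GPU's old copy when the CPU writes.
```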
OmniXtend is the first cache-coherent memory fabric based on open-standard interfaces, leveraging low-cost Ethernet. Any device in the industry could potentially adopt it. Dr. Sivaram’s recent RISC-V Summit keynote was a call to action for the open-source community to take the next step in the memory-centric revolution and create a unified open coherency bus.
Software Will Supercharge Scale
On the software end of things, Moore’s Law made applications perpetually faster. Programmers focused on features, disregarding inefficiencies and overhead. But now that CPUs are reaching their limits, software will have to pick up the slack.
The good news is that programming languages are advancing and programmers are developing new ones. Some suggest that a performance improvement by a factor of 1,000 over Python may not be a stretch.
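That kind of gap is easy to demonstrate. The minimal comparison below pits a naive pure-Python matrix multiply against the same computation through an optimized, BLAS-backed library call; on typical hardware the spread is several orders of magnitude, though exact numbers will vary by machine.

```python
import time
import numpy as np

n = 256
a = np.random.rand(n, n)
b = np.random.rand(n, n)

# Naive pure-Python triple loop.
t0 = time.perf_counter()
c = [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]
t_python = time.perf_counter() - t0

# The same computation through an optimized BLAS-backed library call.
t0 = time.perf_counter()
c_np = a @ b
t_numpy = time.perf_counter() - t0

print(f"pure Python: {t_python:.3f}s, NumPy: {t_numpy:.5f}s, "
      f"speedup ~{t_python / t_numpy:.0f}x")
```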
But it’s more than that.
If the architectural shift is toward workload-specific hardware, then the greatest gains will come when devices have a deep understanding of end applications, and vice versa. Increasing efficiency and performance will depend on marrying hardware and software in a far more intimate and deliberate way.
For Dr. New that philosophy is driving the next generation of solutions. “It’s easy to make a device that performs a specific hardware function, but you really have to be able to support that on the software,” he said. His team’s work on Zoned Storage is a fundamentally different way of thinking about the relationship between storage devices and the software stack.
“As you get into the data center, scale really starts to matter a lot and all these inefficiencies really start to add up,” said Ihab Hamadi, a Western Digital Fellow who leads a systems architecture team. He sees no shortage of places to optimize across the storage stack, from applications, software libraries, and middleware down to operating systems, file systems, and drivers.
Hamadi explained how, in some cases, multiple layers repeat the same fundamental operations, wasting space and bogging things down. But changing the status quo means creating new ways for the host software to interact with a device. “For high scale, many data center services will need to look beyond the familiar and easy-to-use block-level interfaces to Zoned Namespaces (ZNS). It’s a heavy lift, and our focus is on making that software transition easier.”
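The zone abstraction itself is simple to sketch. The toy class below models the core contract of a zone (sequential writes at a write pointer, explicit reset) rather than the actual NVMe ZNS command set.

```python
# Toy model of a single zone: writes must land sequentially at the write pointer.
class Zone:
    def __init__(self, capacity_blocks: int):
        self.capacity = capacity_blocks
        self.write_pointer = 0                 # next block that may be written
        self.blocks = [None] * capacity_blocks

    def append(self, data) -> int:
        """Host appends data; the device reports where it landed."""
        if self.write_pointer >= self.capacity:
            raise IOError("zone full -- host must reset or open another zone")
        addr = self.write_pointer
        self.blocks[addr] = data
        self.write_pointer += 1
        return addr

    def reset(self):
        """Whole-zone erase: the only way to reclaim space, no in-place overwrite."""
        self.write_pointer = 0
        self.blocks = [None] * self.capacity

zone = Zone(capacity_blocks=4)
print(zone.append("log-entry-1"))   # 0
print(zone.append("log-entry-2"))   # 1
# Overwriting zone.blocks[0] in place is exactly what the zoned model forbids;
# the host software stack has to be restructured around append-and-reset instead.
```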
Uncharted Territories Ahead
For Dr. Sivaram, the industry has already entered a data-centric era, where the goal isn’t just about performance gains but also about getting the maximum value out of the proliferation of data. He sees this change as a profound responsibility for Western Digital. “This amount of data is [only] useful if it is stored. Stored literally means that, at the end of it, a bit has to flip somewhere,” he said.
Moore’s Law has been a glorious journey. While its ending has been more of a soft wane than a fall off a cliff, its implications are anything but soft. Technology innovation won’t stop, but it’s going to be radically different from what the industry has known. Some see it as a call to arms for creative disruption, others as a new freedom to innovate.