Carnegie Mellon University


ASICs for Hard Computing Problems

As Moore's Law reaches its limits, further transistor miniaturization faces significant challenges, restricting improvements in computational speed and energy efficiency. Classical accelerators such as GPUs and TPUs, optimized for deterministic algorithms, struggle with probabilistic workloads such as Monte Carlo and Markov Chain Monte Carlo (MCMC) algorithms, which are essential for AI, combinatorial optimization, and probabilistic sampling. These algorithms run inefficiently on current hardware because of their serial nature and the high power consumption of random number generation. Quantum computing offers potential solutions but faces scalability barriers such as cryogenic requirements and high error rates, making it impractical for many applications today.

To address these challenges, probabilistic computers built from probabilistic bits (p-bits) are emerging. A p-bit is a classical bit that fluctuates probabilistically between 0 and 1 at room temperature. Current p-bit implementations include analog systems (e.g., stochastic magnetic tunnel junctions (MTJs), oscillators), digital annealers, and FPGA-based demonstrations, and each approach has its own limitations. Analog systems, while promising, often face speed and scalability constraints, whereas FPGA implementations suffer from redundant logic that hampers energy efficiency and speed. We build domain-specific probabilistic computing hardware accelerators that address the scaling challenges of today's probabilistic computing hardware and target some of the hardest problems that quantum computers are poised to solve.
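The behavior of a single p-bit can be illustrated with a small software model. This is only a sketch under assumptions that the text above does not state: the sigmoid response to an input signal is a common modeling choice, and the names `pbit_sample` and `estimate_p_one` are hypothetical and do not describe the hardware described here.

```python
import math
import random


def pbit_sample(input_signal: float, rng: random.Random) -> int:
    """Return 0 or 1, with P(1) = sigmoid(input_signal).

    Assumed model: an input of 0 gives an unbiased 50/50 bit, while large
    positive or negative inputs pin the bit to 1 or 0 respectively.
    """
    # 0.5 * (1 + tanh(x / 2)) equals the logistic sigmoid and avoids overflow.
    p_one = 0.5 * (1.0 + math.tanh(input_signal / 2.0))
    return 1 if rng.random() < p_one else 0


def estimate_p_one(input_signal: float, n_samples: int = 20_000, seed: int = 0) -> float:
    """Estimate P(1) empirically by drawing many samples from one p-bit."""
    rng = random.Random(seed)
    ones = sum(pbit_sample(input_signal, rng) for _ in range(n_samples))
    return ones / n_samples


if __name__ == "__main__":
    for signal in (-2.0, 0.0, 2.0):
        print(f"input={signal:+.1f}  estimated P(1)={estimate_p_one(signal):.3f}")
```

In this model the input signal plays the role of a tunable bias: the bit still fluctuates, but the fraction of time it spends at 1 follows the input.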

Chimera accelerator chip layout and block diagram

Area-optimal p-computing accelerators

This chip is a reconfigurable, ASIC-based probabilistic computing accelerator fabricated in TSMC’s 28 nm technology node and designed to efficiently solve large-scale combinatorial optimization problems. Leveraging a novel Chimera graph topology, digital p-bit architectures, and on-chip SRAM, the chip supports up to 16,384 spins with 10-bit coefficient precision in a core area of just 0.5 mm², and achieves a time-to-solution of 61.6 μs for 100 Monte Carlo iterations. The system includes 32 p-bit processing elements and reconfigurable address generation, and consumes 17.09 μW per spin.
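The headline figures above imply a few derived quantities, worked out in the short sketch below. The inputs are taken directly from the text. The derived values rest on assumptions the text does not confirm: that the per-spin power scales linearly to the full 16,384-spin configuration, and that spins are divided evenly among the 32 processing elements.

```python
# Figures quoted in the text.
SPINS = 16_384
CORE_AREA_MM2 = 0.5
TIME_TO_SOLUTION_US = 61.6
MC_ITERATIONS = 100
POWER_PER_SPIN_UW = 17.09
PROCESSING_ELEMENTS = 32

# Average time spent on one Monte Carlo iteration.
time_per_iteration_us = TIME_TO_SOLUTION_US / MC_ITERATIONS

# Core area available per supported spin (1 mm^2 = 1e6 um^2).
area_per_spin_um2 = CORE_AREA_MM2 * 1e6 / SPINS

# Assumption: per-spin power scales linearly to all supported spins.
total_power_mw = POWER_PER_SPIN_UW * SPINS / 1e3

# Assumption: spins are split evenly across the processing elements.
spins_per_pe = SPINS // PROCESSING_ELEMENTS

if __name__ == "__main__":
    print(f"time per iteration : {time_per_iteration_us:.3f} us")
    print(f"area per spin      : {area_per_spin_um2:.1f} um^2")
    print(f"total power        : {total_power_mw:.1f} mW")
    print(f"spins per PE       : {spins_per_pe}")
```

Under these assumptions, one iteration takes about 0.616 μs, each spin occupies roughly 30.5 μm² of core area, the full configuration draws about 280 mW, and each processing element handles 512 spins.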

Pegasus More technical details

Scalable p-computing accelerator

This is a scalable and reconfigurable digital Ising machine, implemented in TSMC 28 nm CMOS, that solves combinatorial optimization problems using probabilistic bits (p-bits) and Gibbs sampling. Our design is one of the first to implement the Pegasus graph topology in digital silicon CMOS, supporting dense connectivity and multi-chip scalability via asynchronous communication and on-chip memory-mapped weights. Each chip hosts 6,192 time-multiplexed nodes running at 200 MHz, with robust verification and physical design enabling large-scale deployment. This architecture provides a CMOS-compatible, energy-efficient platform for tackling NP-hard problems beyond the capability of traditional Ising machine topologies.
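To make the combination of p-bits and Gibbs sampling concrete, the following is a minimal software sketch of Gibbs sampling on a small Ising model. It is not the chip's implementation: the 8-spin ring graph (rather than Pegasus), the inverse-temperature schedule, and the names `ising_energy`, `gibbs_sweep`, and `anneal` are all assumptions chosen for illustration.

```python
import math
import random


def ising_energy(spins, couplings, biases):
    """E = -sum_(i,j) J_ij * s_i * s_j - sum_i h_i * s_i, with s_i in {-1, +1}."""
    energy = -sum(h * s for h, s in zip(biases, spins))
    for (i, j), weight in couplings.items():
        energy -= weight * spins[i] * spins[j]
    return energy


def gibbs_sweep(spins, neighbors, biases, beta, rng):
    """Update every spin once, in order, from its conditional distribution."""
    for i in range(len(spins)):
        local_field = biases[i] + sum(w * spins[j] for j, w in neighbors[i])
        # Conditional probability of s_i = +1 given all other spins.
        p_up = 0.5 * (1.0 + math.tanh(beta * local_field))
        spins[i] = 1 if rng.random() < p_up else -1


def anneal(n_spins, couplings, biases, betas, seed=0):
    """Run one Gibbs sweep per inverse temperature in `betas`."""
    rng = random.Random(seed)
    neighbors = [[] for _ in range(n_spins)]
    for (i, j), weight in couplings.items():
        neighbors[i].append((j, weight))
        neighbors[j].append((i, weight))
    spins = [rng.choice((-1, 1)) for _ in range(n_spins)]
    for beta in betas:
        gibbs_sweep(spins, neighbors, biases, beta, rng)
    return spins, ising_energy(spins, couplings, biases)


if __name__ == "__main__":
    n = 8
    # Hypothetical test problem: ferromagnetic ring, every coupling J = +1.
    ring = {(i, (i + 1) % n): 1.0 for i in range(n)}
    schedule = [0.1 + 0.03 * k for k in range(100)]  # slowly increasing beta
    final_spins, final_energy = anneal(n, ring, [0.0] * n, schedule)
    print(final_spins, final_energy)
```

Each spin update depends only on a weighted sum of its neighbors' states, which is why the number of connections per node in the graph topology determines how much work one update requires.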