Technology–Architecture Co-design
We are developing efficient frameworks for the end-to-end co-design of technologies and architectures, spanning a wide variety of applications and levels of abstraction. These frameworks have two key goals: (a) quickly estimating the performance of an application (written in a high-level language) for given technology parameters across logic, memory, and on-chip/inter-chip connectivity, and (b) deriving technology targets for future, yet-to-be-discovered technologies from application performance requirements such as energy and execution time.
- Architecture-aware technology optimization
- Rapid evaluation of emerging technologies and design tradeoffs
- Application-specific guidance for technology pathfinding
Symbolic modeling of hardware accelerators
We are developing symbolic co-design methodologies that capture the intricate dependencies among applications, architectures, and technologies for hardware accelerators. The workflow begins with a high-level application description and propagates through a forward–inverse optimization loop to produce optimized technology parameters under architectural and physical constraints.
The forward pass optimizes architectural design points for a fixed technology and a given application, while the inverse pass derives target technology parameters for a given application performance objective.
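The forward pass can be sketched as a design-space sweep under a deliberately simplified analytic cost model. All constants, the reuse heuristic, and the buffer access-energy scaling below are illustrative assumptions, not the framework's actual models:

```python
# Forward-pass sketch: for a fixed technology and workload, sweep
# architectural design points and rank them by energy-delay product (EDP).
# Every constant here is an illustrative placeholder.
import math
from itertools import product

E_MAC = 0.2    # pJ per multiply-accumulate at the fixed node (assumed)
E_SRAM0 = 0.5  # pJ per byte for a 32 KB buffer (assumed)
F_CLK = 1e9    # clock frequency, Hz (assumed)

MACS = 1e9     # MACs in the workload (assumed)
BYTES = 4e8    # buffer traffic in bytes (assumed)

def evaluate(num_pes, buffer_kb):
    """Return (delay_s, energy_pj) for one architectural design point."""
    delay = MACS / (num_pes * F_CLK)                  # ideal parallel delay
    reuse = min(8.0, buffer_kb / 32.0)                # reuse grows, then saturates
    e_access = E_SRAM0 * math.sqrt(buffer_kb / 32.0)  # larger buffers cost more per access
    energy = MACS * E_MAC + (BYTES / reuse) * e_access
    return delay, energy

# Sweep PE counts and buffer sizes; pick the design with minimum EDP.
best = min(product([256, 1024, 4096], [32, 128, 512]),
           key=lambda p: math.prod(evaluate(*p)))
print(best)
```

In this toy model, more PEs always shorten delay, while buffer sizing trades reuse against per-access energy, so the sweep picks the largest PE count and the smallest buffer that saturates the reuse benefit.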
As an example, consider AI applications (e.g., CNNs, Transformers) written in PyTorch. Our workflow begins by integrating PyTorch models with high-level synthesis (HLS) toolchains to generate application dataflow graphs and corresponding operator netlists. This is followed by block-level physical design to determine interconnect parasitics for energy–delay modeling, completing the forward pass.
For the inverse pass, this framework incorporates a symbolic simulator that expresses workload performance (energy and delay) metrics as functions of technology parameters, enabling nonlinear optimization to derive technology parameter targets directly from workload performance objectives.
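The inverse pass might look like the sketch below: a toy workload energy model is written symbolically in the technology parameters, then a constrained nonlinear optimizer finds parameter targets that meet an energy budget. The two parameters (per-bit access energy, wire capacitance per mm), the cost model, and all constants are hypothetical stand-ins, not the framework's actual simulator:

```python
# Inverse-pass sketch: symbolic energy model -> numeric technology targets.
# All models and constants are illustrative, not measured values.
import sympy as sp
from scipy.optimize import minimize

# Hypothetical technology parameters: per-bit access energy (pJ) and
# wire capacitance per mm (fF/mm).
e_bit, c_wire = sp.symbols("e_bit c_wire", positive=True)

n_acc = 1e6   # memory accesses in the workload (assumed)
l_wire = 2.0  # average wire length per access, mm (assumed)
v_dd = 0.8    # supply voltage, V (assumed)

# Total workload energy in pJ: access energy plus interconnect switching
# energy (0.5 * C * V^2, with fF * mm -> pJ via the 1e-3 factor).
energy = n_acc * (e_bit + 0.5 * c_wire * 1e-3 * l_wire * v_dd**2)
f_energy = sp.lambdify((e_bit, c_wire), energy, "numpy")

target = 3e5  # workload energy budget in pJ (assumed)

# Derive targets closest to today's assumed values (0.5 pJ/bit, 200 fF/mm)
# that satisfy the budget, within physically plausible bounds.
def cost(x):
    return (x[0] - 0.5) ** 2 + ((x[1] - 200.0) / 200.0) ** 2

res = minimize(cost, x0=[0.5, 200.0],
               constraints=[{"type": "ineq",
                             "fun": lambda x: target - f_energy(x[0], x[1])}],
               bounds=[(0.01, 1.0), (10.0, 400.0)])
print(res.x)  # derived technology parameter targets
```

Keeping the model symbolic means the same expression can be differentiated, composed across workloads, or handed to different optimizers without re-deriving the cost function.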

Ultra-dense 3D interconnects
Modern AI systems are increasingly constrained by two fundamental bottlenecks: the memory wall, where data transfer between compute and memory dominates energy and latency, and the miniaturization wall, where further scaling of 2D layouts is hindered by physical and manufacturing limits. 3D integration has emerged as a promising response to these challenges, enabling logic and memory to be stacked vertically, which shortens interconnect distances and boosts on-chip density.

However, as model sizes and throughput demands continue to rise, vertical bandwidth has surfaced as a new performance limiter in 3D architectures, necessitating interconnect topologies tailored for ultra-dense 3D integrated systems. We explore a progression from conventional 2D crossbar arrays to a folded 2D network-on-chip (NoC), and ultimately to 3D-optimized designs that leverage ultra-dense 3D inter-layer vias (<100 nm) for enhanced bandwidth. Physical design experiments confirm the feasibility and performance benefits of this approach.

These improvements are fed into a PyTorch-based system-level simulator that models the mapping of real neural networks onto multi-layer processing arrays. By accounting for energy–delay tradeoffs and dynamic PE allocation, our framework demonstrates how architectural and physical co-design can enable scalable, high-performance AI systems.
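The system-level mapping step might be sketched as follows, in plain Python rather than the actual PyTorch-based simulator. Tier counts, bandwidths, and per-layer statistics are hypothetical placeholders:

```python
# Sketch: map network layers onto a stacked multi-tier PE array with a
# vertical-bandwidth limit. All numbers are illustrative assumptions.

TIERS = 4            # logic tiers in the 3D stack (assumed)
PES_PER_TIER = 1024  # processing elements per tier (assumed)
F_CLK = 1e9          # PE clock frequency, Hz (assumed)
BW_VERT = 100e9      # aggregate inter-tier bandwidth, bytes/s (assumed)

# (MACs, bytes crossing tier boundaries) per layer of a small network (assumed).
layers = [(2e8, 1e6), (4e8, 2e6), (1e8, 5e5)]

def layer_delay(macs, xfer_bytes, pes):
    compute = macs / (pes * F_CLK)   # delay if perfectly parallelized over PEs
    transfer = xfer_bytes / BW_VERT  # delay to move activations between tiers
    return max(compute, transfer)    # assume compute and transfer overlap

# Dynamic PE allocation: PEs in proportion to each layer's MAC count, so
# pipeline stages are balanced when the mapping is compute-bound.
total_macs = sum(m for m, _ in layers)
total_pes = TIERS * PES_PER_TIER
delays = [layer_delay(m, b, max(1, round(total_pes * m / total_macs)))
          for m, b in layers]

stage = max(delays)  # pipeline throughput is set by the slowest stage
print(stage)
```

A model of this shape makes the vertical-bandwidth limiter visible directly: raising `BW_VERT` (as denser inter-layer vias would) only helps layers whose transfer delay exceeds their compute delay.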