Carnegie Mellon University

Neocortex

Unlocking Interactive AI Development for Rapidly Evolving Research

Neocortex is an innovative resource that aims to accelerate AI-powered scientific discovery by shortening the time required for deep learning training, fostering greater integration of deep learning with scientific workflows, and providing new hardware for developing more efficient algorithms for artificial intelligence and graph analytics.

Neocortex democratizes access to compute power otherwise available only to tech giants, opening it to students, postdocs, faculty, and others who need faster turnaround on training to analyze data and integrate AI with simulation. It also encourages the research community to scale up AI-based research and integrate AI advances into research workflows.

With Neocortex, users can apply more accurate models and larger training datasets, scale model parallelism to unprecedented levels, and avoid the need for expensive and time-consuming hyperparameter optimization. The platform also enables the development of new algorithms in machine learning and graph analytics.

Neocortex System Specifications

Neocortex features two Cerebras CS-2 systems and an HPE Superdome Flex HPC server robustly provisioned to drive the CS-2 systems simultaneously at maximum speed and support the complementary requirements of AI and HPDA workflows.

Neocortex is federated with Bridges-2, which provides:

  • Access to the Bridges-2 filesystem for management of persistent data
  • General-purpose computing for complementary data wrangling and preprocessing
  • High-bandwidth connectivity to other XSEDE sites, campus, labs, and clouds

The configuration of each specialized system is described below:

Cerebras CS-2

Each CS-2 features a Cerebras WSE-2 (Wafer Scale Engine 2), the largest chip ever built.

AI Processor

Cerebras Wafer Scale Engine 2 (WSE-2)
  • 850,000 Sparse Linear Algebra Compute (SLAC) Cores
  • 2.6 trillion transistors
  • 46,225 mm² of silicon
  • 40 GB SRAM on-chip memory
  • 20 PB/s aggregate memory bandwidth
  • 220 Pb/s interconnect bandwidth

System I/O

1.2 Tb/s (12 × 100 GbE ports)

HPE Superdome Flex

Processors

32 x Intel Xeon Platinum 8280L, 28 cores, 56 threads each, 2.70-4.00 GHz, 38.5 MB cache
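The per-socket figures above imply system-wide totals that the page does not state directly. The following is a minimal sketch of that arithmetic; the variable names are illustrative, and the two-threads-per-core value is derived from the quoted 56 threads on 28 cores.

```python
# Per-socket figures quoted in the Processors entry.
SOCKETS = 32
CORES_PER_SOCKET = 28
THREADS_PER_SOCKET = 56
CACHE_MB_PER_SOCKET = 38.5

# System-wide totals implied by those figures.
total_cores = SOCKETS * CORES_PER_SOCKET
total_threads = SOCKETS * THREADS_PER_SOCKET
total_cache_mb = SOCKETS * CACHE_MB_PER_SOCKET

print(f"cores:   {total_cores}")        # 896
print(f"threads: {total_threads}")      # 1792
print(f"cache:   {total_cache_mb} MB")  # 1232.0 MB
```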

Memory

24 TiB RAM, aggregate memory bandwidth of 4.5 TB/s

Local Disk

32 x 6.4 TB NVMe SSDs
  • 204.6 TB aggregate
  • 150 GB/s read bandwidth
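
As a rough sanity check on the local disk figures, the sketch below computes the nominal capacity of the drives and the time a full sequential read would take at the quoted bandwidth. It assumes decimal units (1 TB = 1000 GB) and sustained peak read bandwidth, which real workloads may not reach. The nominal product, 204.8 TB, is slightly above the quoted 204.6 TB aggregate; the page does not explain the difference, so the reason is left open here.

```python
# Figures quoted in the Local Disk entry.
DRIVES = 32
DRIVE_TB = 6.4
QUOTED_AGGREGATE_TB = 204.6
READ_GB_PER_S = 150

# Nominal capacity: simple product of drive count and drive size.
nominal_tb = DRIVES * DRIVE_TB

# Time to read the quoted aggregate once at peak bandwidth,
# assuming decimal units (1 TB = 1000 GB).
seconds_to_read_all = QUOTED_AGGREGATE_TB * 1000 / READ_GB_PER_S

print(f"nominal capacity: {nominal_tb:.1f} TB")
print(f"full read at peak: {seconds_to_read_all:.0f} s "
      f"({seconds_to_read_all / 60:.1f} min)")
```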

Network to CS-2 systems

24 x 100 GbE interfaces
  • 1.2 Tb/s (150 GB/s) to each Cerebras CS-2 system
  • 2.4 Tb/s aggregate
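
The bandwidth figures above mix bits and bytes, so a short conversion sketch may help. It assumes the 24 interfaces are split evenly between the two Cerebras systems (12 each, matching the 12 × 100 GbE ports listed under System I/O) and uses decimal prefixes; the function names are illustrative.

```python
GBPS_PER_PORT = 100  # one 100 GbE interface, in gigabits per second


def aggregate_gbps(ports: int) -> int:
    """Aggregate line rate in Gb/s for a number of 100 GbE ports."""
    return ports * GBPS_PER_PORT


def gbps_to_gbytes_per_s(gbps: float) -> float:
    """Convert gigabits per second to gigabytes per second (8 bits per byte)."""
    return gbps / 8


per_system_gbps = aggregate_gbps(12)  # assumed even split: 12 ports per system
total_gbps = aggregate_gbps(24)

print(f"per system: {per_system_gbps / 1000} Tb/s "
      f"= {gbps_to_gbytes_per_s(per_system_gbps)} GB/s")
print(f"aggregate:  {total_gbps / 1000} Tb/s "
      f"= {gbps_to_gbytes_per_s(total_gbps)} GB/s")
```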

Interconnect to Bridges-2

16 x Mellanox HDR-100 InfiniBand adapters
  • 1.6 Tb/s aggregate

OS

Red Hat Enterprise Linux


This material is based upon work supported by the National Science Foundation under Grant Number 2005597.

Any opinions, findings, conclusions, or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

Acknowledgment in Publications

Please use the following citation when acknowledging the use of computational time on Neocortex:

Buitrago P.A., Nystrom N.A. (2021) Neocortex and Bridges-2: A High Performance AI+HPC Ecosystem for Science, Discovery, and Societal Good. In: Nesmachnow S., Castro H., Tchernykh A. (eds) High Performance Computing. CARLA 2020. Communications in Computer and Information Science, vol 1327. Springer, Cham. https://doi.org/10.1007/978-3-030-68035-0_15