Carnegie Mellon University

Webinar - Neocortex Overview and Upcoming Call for Proposals

Presented on Monday, October 4, 2021, 2:00 - 3:00 pm (ET), by Paola Buitrago, Director of Artificial Intelligence and Big Data at the Pittsburgh Supercomputing Center (PSC), and Natalia Vassilieva, Ph.D. (Cerebras Systems Inc.).

This webinar gives an overview of Neocortex, a deployed NSF-funded AI supercomputer at PSC. Neocortex, which incorporates groundbreaking new hardware technologies, is designed to accelerate AI research in pursuit of science, discovery, and societal good. Join us to learn more about this exciting new system and how to be part of the next group of users. Neocortex was deployed at PSC in early 2021 and currently supports research in drug discovery, genomics, molecular dynamics, climate research, computational fluid dynamics, signal processing, and medical imaging analysis. For more information about Neocortex, please visit https://www.cmu.edu/psc/aibd/neocortex/


Table of Contents

00:00 - Welcome
01:45 - Code of Conduct
02:24 - Introduction
03:16 - The Neocortex System: Context
08:31 - The Neocortex System: Motivation
12:46 - The Neocortex System: Hardware Description
18:28 - Early User Program and Exemplar Use Cases
22:23 - Call for Proposals (CFP)
25:51 - To Learn More and Participate
26:46 - Cerebras CS-1: Introduction
35:24 - The Wafer-Scale Engine (WSE)
40:10 - Software and Programming
48:30 - Focus areas for the upcoming CFP
50:20 - Q&A Session

Q&A

If you attended PEARC21, you can watch the talk on the conference platform; otherwise, you can read a great summary of the panel on HPCwire.
This question has been answered live: [50:55]
The proposal acceptance rate in our first round was close to 40%.
At PSC, we support Airflow on other systems and projects. We can explore enabling support for it on Neocortex specifically.
You will need to leverage custom kernels or the SDK for this.
The proposal has been designed to be very lightweight in order to minimize the burden on potential users. On Neocortex, the only way to access the system and try it out is through this CFP. We encourage you to apply and allow our team to work with you so you can try the CS-1 and Superdome Flex servers, while making sure you follow best practices and get the most out of the system.
It depends on what your TensorFlow (TF) code looks like. If you use the Estimator API, the changes will be minor; for example, you replace the TF Estimator with the Cerebras wrapper and estimator. If your code uses its own wrapper around TF, you may have to adapt it further.
For deep learning, you treat the system as one big machine. We support batch sizes ranging from small to extremely large, and the compiler takes care of this. There are tools to help with programming for larger batches.

Yes, those proposals are also welcome. We already have a couple of projects that fall under this category. Keep in mind that a project of this nature requires a very close and involved collaboration between PSC, vendors, and the project members. For this reason, the number of projects of this kind that we can support may be limited.

This question has been answered live: [54:30]
We have much higher memory bandwidth. A traditional GPU or CPU core sits behind a memory hierarchy; we do not have one, so we can run workloads at higher utilization. We also have a dataflow architecture, and the internal representation of the data is sparse.
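The answer above notes that data is represented sparsely inside the system. As a rough illustration of why that can raise utilization, here is a minimal Python sketch (not Cerebras's implementation or API) comparing a dense matrix-vector product with one in which only nonzero inputs trigger multiply-accumulate (MAC) work. The function names and the (index, value) sparse representation are hypothetical choices made for illustration only.

```python
from typing import List, Tuple

Matrix = List[List[float]]
SparseVec = List[Tuple[int, float]]  # hypothetical (index, value) pairs

def to_sparse(vec: List[float]) -> SparseVec:
    """Keep only the (index, value) pairs of nonzero entries."""
    return [(i, v) for i, v in enumerate(vec) if v != 0.0]

def dense_matvec(matrix: Matrix, vec: List[float]) -> Tuple[List[float], int]:
    """Multiply every weight by every input, zeros included; count the MACs."""
    out = [0.0] * len(matrix)
    macs = 0
    for r, row in enumerate(matrix):
        for c, x in enumerate(vec):
            out[r] += row[c] * x
            macs += 1
    return out, macs

def sparse_matvec(matrix: Matrix, sparse_vec: SparseVec) -> Tuple[List[float], int]:
    """Only nonzero inputs generate work, mimicking data-triggered computation."""
    out = [0.0] * len(matrix)
    macs = 0
    for c, x in sparse_vec:  # each nonzero input "arrives" and fans out once
        for r, row in enumerate(matrix):
            out[r] += row[c] * x
            macs += 1
    return out, macs

# Example: activations after a ReLU often contain many zeros.
weights = [[1.0, 2.0, 3.0, 4.0],
           [5.0, 6.0, 7.0, 8.0]]
activations = [0.0, 1.5, 0.0, 2.0]

dense_out, dense_macs = dense_matvec(weights, activations)
sparse_out, sparse_macs = sparse_matvec(weights, to_sparse(activations))
print(dense_out, dense_macs)    # [11.0, 25.0] 8
print(sparse_out, sparse_macs)  # [11.0, 25.0] 4
```

Both versions produce the same result, but the sparse version performs half the MACs here because half the inputs are zero. Real hardware gains depend on the sparsity pattern and on how the architecture schedules work, which this toy model does not capture.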
This question has been answered live: [55:34]
No
This depends on the workload. You do not need the Superdome Flex (SDF) for most jobs, but there are some cases where the model is not compute-intensive yet you need the full SDF.
This question has been answered live: [58:26]
While we are currently focused mostly on AI-based applications, we do expect to start making the system available to other types of HPC applications. We encourage non-AI HPC proposals.
Each project is granted an allocation with a specific amount of resources on Neocortex and Bridges-2, which can be expanded as needed. We use Slurm to support both batch and interactive jobs. Each Slurm job is allocated an entire CS-1 (all the cores on a WSE).