Carnegie Mellon University

Webinar - Neocortex Spring 2023 Call for Proposals and System Overview

Presented on Tuesday, February 28, 2023, 2:00 - 3:00 pm (ET), by Paola A. Buitrago, Neocortex Principal Investigator and Project Director, and Director of AI and Big Data at the Pittsburgh Supercomputing Center; Claire Zhang, Machine Learning Solutions Engineer at Cerebras Systems Inc.; Dr. Leighton Wilson, HPC Solutions Engineer at Cerebras Systems Inc.; and Dr. Dirk Van Essendelft, HPC, AI, and Data Scientist at the National Energy Technology Laboratory.

This webinar presents the upcoming Spring 2023 Call for Proposals (NeocortexSpring2023CFP) and gives a system overview of Neocortex, an AI-specialized NSF-funded supercomputer deployed at PSC/CMU.

For more information about Neocortex, please visit the Neocortex project website.

View Slides

Table of Contents

00:00 - Welcome
02:20 - Intro
04:41 - Speakers
05:48 - The Neocortex Program
11:16 - Neocortex System Overview
14:17 - Applications Supported by Neocortex - as of February 2023
19:46 - Spring 2023 Call for Proposals
23:16 - To Learn More and Participate
23:48 - Cerebras CS-2: the AI Compute Engine for Neocortex, Overview
26:26 - Developer Resources
27:29 - CS-2 for Deep Learning
29:11 - ML Software Key Features
30:12 - Topics of interest for ML applications
32:14 - CS-2 for HPC Using the SDK
34:59 - Cerebras SDK
39:40 - Topics of interest for HPC applications
41:02 - Cerebras Recap
42:51 - Using NETL's WFA for Scientific Computing on the WSE2
44:02 - What is the WFA?
47:27 - Near Real Time Scientific Modeling
50:33 - Seeking Beta Testers for Scientific Computing
54:26 - Open Q&A


Data are stored and processed at PSC and protected like all data on PSC-operated systems. Please contact us for further details.
All the data and work stay at PSC; none goes to Cerebras. Please contact us if you need details about PSC policies.
DFT should be possible in the WFA, but we do not have kernels built for it yet. It is on the long-term development list for the WFA.
The recording of the webinar and the slides will be made available. The link to the CFP webpage is provided on this page.
We validate all of the models released in the Model Zoo. Sometimes several different implementations of a model exist; typically we mirror one of them. Please pay attention to the model configurations in the yaml files. Our README files also provide details about each model's implementation.
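For illustration, Model Zoo-style yaml configurations group model, optimizer, and run settings into separate sections. The keys below are hypothetical placeholders, not the exact schema; always check the yaml files and README shipped with the specific model you intend to run.

```yaml
# Hypothetical Model Zoo-style run configuration (illustrative only;
# consult the model's own yaml and README for the real keys and values).
model:
  hidden_size: 768
  num_hidden_layers: 12
optimizer:
  optimizer_type: AdamW
  learning_rate: 1.0e-4
runconfig:
  max_steps: 10000
  checkpoint_steps: 1000
```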
The SDK and WSE presentations will discuss possible non-ML applications.
It would require some porting in any case, and the feasibility of porting depends on the types of models implemented in TorchANI. If the models require kernels outside of the existing kernel set (typical components of MLPs and transformer-style models), then it won't be possible to port them today.
We have a mainstream Slurm configuration, and most Slurm features are available on the Neocortex system. There might be some slight differences, but if something doesn't work as expected, we will be happy to work with you to resolve any issues.
It's up to you to map the physical problem to the hardware. You have to do that in the SDK or in the lower-level tools in the WFA. Problem mapping is one of the most important parts of programming the WSE, and one thing that is very different from traditional distributed computing.
Yes. Each PE has 48 KB of SRAM.
The recording of the webinar and the slides will be made available on this webinar page.

For the SDK, the *host code* must currently be written in Python, but C++ support is coming. All code running on the wafer itself, i.e., the compute kernels, must be written in CSL.

This is for the SDK. For ML, you would write code in Python and leverage PyTorch or TF.
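As a sketch of that ML path, here is a minimal PyTorch training loop of the kind you would adapt for the CS-2. The model, data, and hyperparameters are toy placeholders, not Neocortex-specific; in practice you would start from a Model Zoo reference implementation rather than an ad hoc model.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # reproducible toy run

# Toy two-layer MLP; a stand-in for a Model Zoo reference model.
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 2))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(8, 16)         # batch of 8 random feature vectors
y = torch.randint(0, 2, (8,))  # random binary class labels

for _ in range(5):             # a few standard training steps
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()

print(float(loss))
```

The point is that the training loop is ordinary framework code; the Cerebras software stack handles compiling and running it on the wafer.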

That depends on what you mean by "better".
Communication happens between PEs. Each PE can access its own SRAM, and you can program a PE to read data from its SRAM and send it to a target PE via the on-chip fabric.
Not yet.
You could also use the SDK, and in fact, I think many DFT kernels would be a great fit. However, using the SDK will take a significant amount of effort since it’s quite low level.

Technically, we actually already support C++ host code, but it’s undocumented. By the time we provide system access to the next round of proposals, we will have some documented examples.

We have yet to test integrating with larger, established C++ code bases that have other dependencies, so we unfortunately can't guarantee that existing C++ code bases can integrate CSL/SDK kernels without running into compilation issues.

This question was answered live at [54:52].
This question was answered live at [57:55].
This question was answered live at [58:45].
This question was answered live at [01:00:32].
This question was answered live at [01:02:37].