Webinar - Neocortex: CS-2 Overview
Presented on Tuesday, March 29, 2022, 2:30 - 3:30 pm (ET), by Dr. Natalia Vassilieva, Director of Product, Machine Learning at Cerebras Systems Inc.
This webinar gives an overview of the recent upgrade to the Neocortex system, an NSF-funded AI supercomputer deployed at PSC that now features two Cerebras CS-2 systems. To help researchers better understand the benefits of the new servers and the changes to the system, we would like to invite you to participate in a virtual overview presentation by Dr. Natalia Vassilieva from Cerebras.
For more information about Neocortex, please visit https://www.cmu.edu/psc/aibd/neocortex/.
View slides (Will be available soon)
Table of Contents
00:08 - Welcome
01:50 - Code of Conduct
03:17 - CS-2 Overview
06:01 - Cerebras Wafer-Scale Engine 2
07:45 - Cerebras CS-1 and CS-2: Cluster-scale Performance in a Single System
08:48 - The Cerebras Software Platform
10:17 - Execution Mode on CS-1 for DNNs
11:43 - Execution Modes on CS-2 for DNNs
14:15 - Comparing Execution Modes
17:26 - CS-2 advantages for Pipelined
18:08 - Can fit larger models. How much larger?
25:46 - Can fit larger inputs. How much larger?
27:41 - Faster training. How much faster?
32:44 - CS-2 and Weight Streaming advantages
36:45 - Wafer Memory Management
38:56 - No layer partitioning
41:13 - Summary
42:26 - Q&A Session
Please find the recording on the Neocortex Portal.
Q&A
How do we request additional disk storage on the new CS-2 machine, and how do we identify whether the system is a CS-1 or a CS-2?
Neocortex is now CS-2 only. The storage is on the SDFlex front-end, as before.
Does CS-2 enable significantly shorter allocation wait times (due to the availability of more cores, etc.)?
If a problem of the same size can be decomposed onto more processing elements, it will run faster. However, the larger size of the CS-2 may also allow larger models to be run that could not be run before. We do not yet know how usage will change, so we cannot predict the effect on wait times with any certainty.
So the ability to stream weights is due to new software and more cores, not fundamental changes to the hardware?
Yes, that is right. The software stack handles how the model is mapped, and the availability of more cores and bandwidth allows us to do this with bigger models.
Are the weights/gradients synchronized in the multi-replica setting per batch (i.e., all-reduce)?
Yes, that is right.
Not sure if I understand correctly, but for multi-replica, you need to aggregate gradients and update weights iteratively, correct? If so, how often?
In a single-replica setting, updates happen every step (one pass through a batch). In multi-replica, one batch is distributed across all the replicas, and each replica processes its samples sequentially.
This question has been answered live: [43:03]
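The two answers above describe per-batch synchronization: one batch is split across replicas, and the gradients are combined once per batch. The following is a minimal sketch of that idea in plain Python, not Cerebras code. It assumes a toy linear model with squared-error loss, a strided split of the batch, and a sum-style all-reduce followed by a single weight update; all function names are hypothetical and chosen only for illustration.

```python
def grad_sum(w, xs, ys):
    """Sum of per-sample gradients of 0.5 * (w . x - y)**2 with respect to w."""
    total = [0.0] * len(w)
    for x, y in zip(xs, ys):          # samples are processed one after another
        residual = sum(wi * xi for wi, xi in zip(w, x)) - y
        for i, xi in enumerate(x):
            total[i] += residual * xi
    return total


def apply_update(w, grad_total, batch_size, lr):
    """One SGD step using the batch-averaged gradient."""
    return [wi - lr * g / batch_size for wi, g in zip(w, grad_total)]


def single_replica_step(w, xs, ys, lr):
    """One update per batch on a single replica."""
    return apply_update(w, grad_sum(w, xs, ys), len(xs), lr)


def multi_replica_step(w, xs, ys, lr, num_replicas):
    """Split one batch across replicas, then combine gradients once per batch."""
    # Assumed sharding: replica r takes samples r, r + num_replicas, ...
    local = [
        grad_sum(w, xs[r::num_replicas], ys[r::num_replicas])
        for r in range(num_replicas)
    ]
    # Stand-in for an all-reduce: element-wise sum of the local gradients.
    reduced = [sum(parts) for parts in zip(*local)]
    return apply_update(w, reduced, len(xs), lr)


# Toy batch of four samples with two features each (made-up numbers).
XS = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0], [-2.0, 1.5]]
YS = [1.0, 0.0, 2.0, -1.0]

w_single = single_replica_step([0.0, 0.0], XS, YS, lr=0.1)
w_multi = multi_replica_step([0.0, 0.0], XS, YS, lr=0.1, num_replicas=2)
print(all(abs(a - b) < 1e-12 for a, b in zip(w_single, w_multi)))
```

Under these assumptions, the multi-replica step yields the same weights as the single-replica step on the same batch, which is why the update frequency (once per batch) does not change between the two settings.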
How many weights does the U-Net have here?
Around 31 million weights.
We mentioned 3D volumes here. Are we going to support more operations on these data types (video, dynamic images, etc.)?
This question has been answered live: [45:01]
Is the weight streaming mode available with PyTorch code? Can I just import my model, and ask the CS-2 to run in weight-streaming mode?
This question has been answered live: [45:51]
Why proportional to batch size? You are streaming the data in also, right?
This question has been answered live: [47:25]
How fast can weights stream onto the CS-2 chip from the external memory?
This question has been answered live: [48:25]
Is there a demo codebase and documentation we can get to utilize CS-2s?
This question has been answered live: [49:50]
Is there a way to request that certain types of models (computer vision-related) be included in the releases? I have a specific model in mind that could benefit from weight streaming.
This question has been answered live: [51:28]
Are you considering interfacing CS-2 to a quantum computer for hybrid quantum-classical processing for algorithms like Variational Quantum Eigensolver to find the ground energy state of small molecules?
This question has been answered live: [52:07]
If a model works in pipelined mode, is it likely to work with weight streaming? That way I can check whether all the operations are supported by the CS compiler.
This question has been answered live: [54:40]