Cerebras Systems sets record for largest AI models ever trained on one device
The LLM/multimodal AI community reached a significant hardware milestone today. According to a VentureBeat article:
The Cerebras CS-2 system can train multibillion-parameter natural language processing (NLP) models including GPT-3XL 1.3 billion models, as well as GPT-J 6B, GPT-3 13B and GPT-NeoX 20B. Cerebras said that for the first time ever, a single CS-2 system with one Cerebras wafer can train models with up to 20 billion parameters — a feat not possible on any other single device. One of the CS-2 systems fits inside a standard datacenter rack and it’s about 26 inches tall.
This is an important hardware achievement for our space because it significantly reduces the complexity of training large language models. In the past, training a large language model in parallel across many devices required people in several technical roles with deep infrastructure expertise. Not only was this costly and tedious, it wasn't even the most efficient way to train a model. With Cerebras' announcement today, we see that we can train a large language model with billions of parameters on a single device. One node! And a lot faster, too! I've been told that in some cases the time to train a model is dropping from weeks to less than a day, thanks to the hardware efficiency of the Cerebras CS-2.
Training on a single device is exciting because it means we can not only train models faster and more easily, but also actually iterate on them. We may discover more efficient methods of training AI models that simply weren't practical before because of the massive infrastructure they required. This system also makes it easier to develop more domain-specific language and multimodal models, such as models for the biological or physical sciences. Most importantly, I believe CS-2 will let more people jump in and participate in research and commercial development, which will help us understand much more about language models and what's truly possible with them.
Cerebras has worked hard to make their technology convenient and accessible, so that it fits well into an existing developer's or ML researcher's workflow. From what I can tell, it's shockingly easy to use. It supports ML libraries you already use, like PyTorch and TensorFlow, which is impressive considering all the heavy lifting it does for you behind the scenes.
Computer hardware underpins everything in our modern world, and advances in hardware help create a more progressive and more equitable world for everyone. I had a great chat yesterday with Cerebras' VP of Product, Andy Hock, and their underlying goal is to put high-performance computing and large AI models into more people's hands. It's exciting to imagine a world where anyone can spin up custom large language or multimodal models, advance our research understanding, and take part in key AI research and commercial development alongside the biggest players in the world.
CS-2 represents a key milestone along a much greater exponential curve. I’m convinced a lot more, particularly in terms of LLM scale, will be possible from here on out.
If you’re interested in learning more about CS-2, I recommend you check out their official blog post:
If you're a developer or a user of large language models like GPT-NeoX, I really believe you should keep Cerebras on your radar. You can also rent time on a CS-2 at an hourly rate through their cloud partner, Cirrascale, and try training a model yourself:
If you’re interested in advancing the LLM/multimodal space through hardware achievements, they’re hiring:
https://www.cerebras.net/join-us/
If you're a startup or a larger organization, it may be worth reaching out to their sales team. They can explain the benefits of CS-2 far better than I can, and they can work with you to narrow down which option is best for you (training a custom model, fine-tuning, or starting from a pretrained model):
https://www.cerebras.net/get-demo/
The underlying cost and affordability have certainly improved with CS-2, so it may well be within your organization's budget. Your time to train a model will also be dramatically reduced, perhaps from weeks to less than a day. I encourage you to connect with them and see what's possible.
Please note: I have no commercial affiliation with Cerebras, but I have been following their work for a few years now, and I'm excited to keep seeing what their team comes up with. As a community, we should support Cerebras more. Unlike other hardware providers, they focus exclusively on building custom hardware for the specific needs of our space, so the more we support them, the better it is for the whole industry. In short, I encourage you to follow along with their journey as well!