7 Comments

I'll be curious to see costs on H100 when they're more available. A100 is getting a bit old.


Seems like you really enjoyed NeurIPS! I have enjoyed your many tweets. Interesting aside - both Jasper and Cerebras are Foundation Capital companies.

author

You around, Tommy? Let's chat. I don't think Jasper makes much sense, but that's exactly why I want to have the argument.


I responded by Twitter DM - next Wednesday afternoon would be great for a live conversation. Wrapping up semester-end stuff on my end.


This post spurred me to look into Cerebras, and its approach is quite interesting. Its chips get full performance regardless of batch size, which would make them ideal for deep RL experiments, unlike GPUs, which are really only suited to supervised learning. But with GPUs you can at least buy or rent them if you have the money, while a Cerebras chip costs around $5 million according to what I've read on the ML sub.

So buying one is out for me and most other people, but I wonder if it would be possible to rent a portion of one for small-scale research, costing no more than $1k per month?

My guess is that it wouldn't, but does anybody know for sure?
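For anyone curious why batch size matters so much here, below is a minimal timing sketch (my own illustration, not from the post; the layer sizes and batch sizes are arbitrary) showing how per-sample GPU throughput collapses at small batches, which is the regime much deep RL work lives in:

```python
# Illustrative only: measure how GPU throughput per sample changes with batch size.
# Small batches leave most of the GPU idle, which is the pain point for deep RL.
import time
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model = torch.nn.Sequential(
    torch.nn.Linear(1024, 4096),
    torch.nn.ReLU(),
    torch.nn.Linear(4096, 1024),
).to(device)

with torch.no_grad():
    for batch_size in (1, 8, 64, 512):
        x = torch.randn(batch_size, 1024, device=device)
        # Warm-up so kernel launch / allocation cost isn't counted.
        for _ in range(10):
            model(x)
        if device == "cuda":
            torch.cuda.synchronize()
        start = time.time()
        iters = 100
        for _ in range(iters):
            model(x)
        if device == "cuda":
            torch.cuda.synchronize()
        elapsed = time.time() - start
        print(f"batch {batch_size:4d}: {batch_size * iters / elapsed:,.0f} samples/sec")
```

On most GPUs the samples/sec figure grows almost linearly with batch size until the device saturates, which is why small-batch workloads leave so much of the silicon idle.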

author

Cerebras has their cloud that they are trying to get people to play with. I'd reach out. I'm going to the office today; I'll push them to add something like this.


That's pretty cool of you. I emailed them yesterday, but I've yet to get a response.

A gripe I have with most of these AI companies is that they seem to be going too hard after the big fish. Take Graphcore, for instance. I checked the costs for its cloud, and the cheapest version is something like $20/hr, but based on how the data was presented, I can only guess that it rents its machines on a weekly or monthly basis. If something costs $9k a month, basically nobody is going to come in, play around, and create software in their spare time; you can only use such machines to train large workloads non-stop, otherwise it is money wasted (rough math sketched at the end of this comment).

I think Tenstorrent is the only company I've looked at that intends to sell cards an average user could put into a home rig. And this is good from a strategic standpoint: Tenstorrent will have third parties writing both commercial and open-source software for its hardware, which will in turn increase the value of its products, while companies like Graphcore will be forced to do a lot more in-house in order to show their value.

At this moment in time, backprop is extremely dominant, but it doesn't work well for reinforcement learning. Yet it is hard to get anything else to work on GPUs; backprop and GPUs are a match made in heaven. If better algorithms could be found, it would greatly raise the value of these companies' products, and they are offering exactly the hardware needed to search for such algorithms. But the people training LLMs generally won't be doing that sort of research.
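On the pricing point above, here is the rough back-of-envelope math in Python; the $20/hr figure is the ballpark quoted in the comment and the rest are illustrative assumptions, not any vendor's actual pricing:

```python
# Rough back-of-envelope rental math (illustrative only; $20/hr is the ballpark
# quoted above, not confirmed Graphcore pricing).
hourly_rate = 20.0            # $/hour for the cheapest cloud tier (assumed)
hours_per_month = 24 * 30     # a machine billed around the clock for a month

always_on_monthly = hourly_rate * hours_per_month
print(f"24/7 for a month at ${hourly_rate}/hr: ${always_on_monthly:,.0f}")  # $14,400

# A hobbyist budget of ~$1k/month (the figure mentioned earlier in the thread)
# buys only a small slice of that machine's time.
hobby_budget = 1_000.0
print(f"Hours per month a $1k budget covers: {hobby_budget / hourly_rate:.0f}")  # 50 hours
```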
