10 Comments

Thanks Dylan. How does sharing across "many" CPUs work? Do you get multiple Leo P devices that connect to each other? I got the impression that one Leo P x16 will handle 2 CPUs (or maybe more with fewer lanes each, but that would be slower?).

author
Sep 3, 2022 · edited Sep 3, 2022

The first generation is pooling. Future generations will involve memory sharing. For pooling, each Leo P connects to 2 CPUs, but each CPU can connect to a number of Leos, so you get a topology where each Leo is connected to various CPUs and the pools are managed in that manner. For memory sharing applications, people will start to stick these behind a switch. Technically you can do that in CXL 2.0 with a flat hierarchy, but with 3.0 it will be more ubiquitous. Then it doesn't matter how many lanes the pooling devices have, because they can connect to everything.
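To make the pooling topology described above concrete, here is a minimal Python sketch. It assumes only what the comment states (each pooling device attaches to at most 2 CPUs, and a CPU may attach to several devices). The names `build_topology` and `reachable_capacity`, the ring layout, and the 512 GB capacity per device are hypothetical, chosen for illustration and not taken from any product documentation.

```python
from collections import defaultdict

# Per the comment: each Leo P connects to 2 CPUs.
PORTS_PER_DEVICE = 2


def build_topology(links):
    """Build both views of a CPU <-> pooling-device graph.

    links: iterable of (cpu, device) pairs.
    Returns (hosts_of, devices_of) as dicts of sets.
    """
    hosts_of = defaultdict(set)    # device -> CPUs attached to it
    devices_of = defaultdict(set)  # CPU -> devices it can reach
    for cpu, dev in links:
        hosts_of[dev].add(cpu)
        devices_of[cpu].add(dev)
    # Reject layouts that need more host ports than a device has.
    for dev, hosts in hosts_of.items():
        if len(hosts) > PORTS_PER_DEVICE:
            raise ValueError(f"{dev}: {len(hosts)} hosts > {PORTS_PER_DEVICE} ports")
    return hosts_of, devices_of


def reachable_capacity(devices_of, capacity_gb):
    """Total pooled capacity (GB) each CPU can draw from."""
    return {cpu: sum(capacity_gb[d] for d in devs)
            for cpu, devs in devices_of.items()}


# Hypothetical ring: 4 CPUs, 4 devices, each device shared by 2 neighbors.
links = [
    ("cpu0", "leo0"), ("cpu1", "leo0"),
    ("cpu1", "leo1"), ("cpu2", "leo1"),
    ("cpu2", "leo2"), ("cpu3", "leo2"),
    ("cpu3", "leo3"), ("cpu0", "leo3"),
]
capacity_gb = {f"leo{i}": 512 for i in range(4)}  # illustrative sizes

hosts_of, devices_of = build_topology(links)
reach = reachable_capacity(devices_of, capacity_gb)
```

In this layout every CPU can borrow from two pools and every pool serves two CPUs, so no single device needs more than two host ports. Putting the devices behind a switch, as described for memory sharing, would remove that per-device port limit from the model.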

Thanks. I was looking up memory sharing/pooling and hit genzconsortium. Are they utilizing CXL?

author

Gen Z gave up and joined CXL.

Side note: Montage Technologies actually delivered the first ASIC memory expander (Gen5 CXL.io with a DDR4/DDR5 combo controller) back in April ’22. This is a fully functional part, not an FPGA prototype. Their first SKU is focused on the memory expansion module market.

author

The title is about pooling though, which is the killer app. Samsung sent samples of their CXL memory expander early this year as well. I saw it working in person in February, for example.

Montage Technologies is way behind on a memory pooling device.

Is CXL (memory expansion/pooling) a capability that could be incorporated directly into CPUs and potentially other chips? Or is the required silicon too large to be incorporated, so that it requires an external chip?

author

Not sure what you mean. We can attach memory directly to a memory controller on a CPU, but that limits flexibility and requires a ton more pins on the CPU. CXL-attached memory increases flexibility, lets you pool across many CPUs, and uses fewer pins for similar capacity and bandwidth. That CXL-attached memory needs a memory controller on the memory side.
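A rough back-of-the-envelope version of the pins-versus-bandwidth point, as a Python sketch. The peak rates follow from the published signaling rates (DDR5-4800 on a 64-bit channel, PCIe 5.0 at 32 GT/s with 128b/130b encoding, which is the physical layer CXL 2.0 runs on). The PCIe signal-pin count is 4 per lane (one TX pair and one RX pair). The DDR5 signal-pin count is an assumed placeholder and not a datasheet figure, so substitute the real number for a specific CPU. CXL protocol overhead, power and ground pins, and sideband signals are ignored.

```python
def ddr_channel_gb_s(mt_per_s, bus_bits=64):
    """Peak bandwidth of one DDR channel in GB/s (shared read/write bus)."""
    return mt_per_s * 1e6 * bus_bits / 8 / 1e9


def pcie_gb_s_per_direction(gt_per_s, lanes, encoding=128 / 130):
    """Peak PCIe bandwidth in GB/s, one direction only."""
    return gt_per_s * lanes * encoding / 8


ddr5_bw = ddr_channel_gb_s(4800)           # DDR5-4800, one 64-bit channel
cxl_bw = pcie_gb_s_per_direction(32, 16)   # PCIe 5.0 x16, per direction

PCIE_SIGNAL_PINS = 16 * 4      # 16 lanes x (TX+, TX-, RX+, RX-)
DDR5_SIGNAL_PINS_ASSUMED = 120  # ASSUMPTION: illustrative placeholder only

ddr5_per_pin = ddr5_bw / DDR5_SIGNAL_PINS_ASSUMED
cxl_per_pin = cxl_bw / PCIE_SIGNAL_PINS

print(f"DDR5-4800 channel: {ddr5_bw:.1f} GB/s, {ddr5_per_pin:.2f} GB/s per pin")
print(f"PCIe 5.0 x16:      {cxl_bw:.1f} GB/s per direction, "
      f"{cxl_per_pin:.2f} GB/s per pin")
```

Under these assumptions the serial link carries more bandwidth per signal pin, and it does so in both directions at once, while the DDR figure is shared between reads and writes. The exact ratio depends entirely on the pin count you plug in.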

OK, I see. The requirement is on the memory side, not on the CPU/GPU/etc. Makes sense: the memory chips themselves are oblivious to the "pooling", so that's why you need the intermediary. Just some small confusion on my part.

author

Here's a good link about the tradeoff of serial vs parallel attached DRAM.

http://ww1.microchip.com/downloads/en/DeviceDoc/Serial-Memory-Technology-White-Paper-00003192B.pdf