Thanks Dylan. How does sharing across "many" CPUs work? Do you get multiple Leo-Ps that connect to each other? I got the impression that one Leo P x16 will handle 2 CPUs (or maybe more with fewer lanes, but that would be slower?)
First generation is pooling. Future generations would involve memory sharing. For pooling, each Leo P would connect to 2 CPUs, but each CPU could connect to a number of Leos, so there would be a topology where each Leo is connected to various CPUs and the pools are managed in that manner. For memory sharing applications, people will start to stick these behind a switch. Technically you can do that in CXL 2.0 with a flat hierarchy, but with 3.0 it will be more ubiquitous. Then it doesn't matter how many lanes the pooling devices have, because they can connect to everything.
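To make that first-generation topology concrete, here is a minimal sketch (hypothetical names and interface, not Astera Labs' actual management software): each Leo-class device attaches to two hosts, each host fans out to several devices, and an orchestrator grows a host's memory by carving capacity out of whichever attached device has headroom.

```python
from dataclasses import dataclass, field

@dataclass
class PoolDevice:
    """Illustrative model of a first-gen pooling device (2 directly attached hosts)."""
    name: str
    capacity_gb: int
    hosts: list = field(default_factory=list)          # directly attached CPUs
    allocated_gb: dict = field(default_factory=dict)   # host -> GB carved out

    def attach(self, host: str) -> None:
        if len(self.hosts) >= 2:
            raise ValueError(f"{self.name}: a first-gen pooling device serves 2 hosts")
        self.hosts.append(host)

    def allocate(self, host: str, gb: int) -> None:
        if host not in self.hosts:
            raise ValueError(f"{host} is not attached to {self.name}")
        free = self.capacity_gb - sum(self.allocated_gb.values())
        if gb > free:
            raise ValueError(f"{self.name}: only {free} GB free")
        self.allocated_gb[host] = self.allocated_gb.get(host, 0) + gb

# Each CPU reaches several devices, so a host can draw capacity from any
# device it is attached to; the "pool" is managed across the topology.
leo1, leo2 = PoolDevice("leo1", 512), PoolDevice("leo2", 512)
for dev in (leo1, leo2):
    dev.attach("cpu0")
    dev.attach("cpu1")

leo1.allocate("cpu0", 256)   # cpu0: 256 GB from leo1
leo2.allocate("cpu0", 128)   # cpu0: another 128 GB from leo2
leo2.allocate("cpu1", 256)   # cpu1 draws from the same set of devices
```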
Thanks, I was looking up memory sharing/pooling and hit the Gen-Z Consortium. Are they utilizing CXL?
Gen-Z gave up and joined CXL.
Side note: Montage Technologies actually delivered the first ASIC memory expander (Gen5 CXL.io with a DDR4/DDR5 combo controller) back in April '22. This is a fully functional part, not an FPGA prototype. Their first SKU is focused on the memory expansion module market.
The title is about pooling though, the killer app. Samsung sent samples of their CXL memory expander early this year as well; I saw it working in person in February, for example.
Montage Technologies is way behind on a memory pooling device.
Is CXL (memory expansion/pooling) a capability that could be incorporated directly into CPUs and potentially other chips? Or is the required silicon too large to incorporate, so it requires an external chip?
Not sure what you mean. We can attach memory directly to a memory controller on a CPU, but that limits flexibility and requires a ton more pins on the CPU. CXL-attached memory increases flexibility, lets you pool across many CPUs, and uses fewer pins for similar capacity and bandwidth. That CXL-attached memory needs a memory controller on the memory side.
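To put rough numbers on the pin/bandwidth tradeoff (my own back-of-envelope figures, not anything from the thread, and the pin counts are connector totals including power/ground):

```python
# Compare a CXL/PCIe Gen5 x16 link against a single DDR5-4800 channel,
# purely as an order-of-magnitude illustration.

cxl_gbs   = 32 * 16 / 8    # 32 GT/s per lane * 16 lanes ~= 64 GB/s per direction
cxl_pins  = 82             # x16 edge connector pin count (incl. power/ground)
ddr5_gbs  = 4.8 * 8        # 4800 MT/s * 8 bytes (64-bit channel) = 38.4 GB/s
ddr5_pins = 288            # DDR5 DIMM connector pins (incl. power/ground)

print(f"CXL x16  : {cxl_gbs:.1f} GB/s over ~{cxl_pins} pins "
      f"({cxl_gbs / cxl_pins:.2f} GB/s per pin)")
print(f"DDR5 ch. : {ddr5_gbs:.1f} GB/s over ~{ddr5_pins} pins "
      f"({ddr5_gbs / ddr5_pins:.2f} GB/s per pin)")
```

Even with hand-wavy pin counts, the serial link delivers several times the bandwidth per pin, which is the flexibility-for-pins argument above.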
Ok, I see. The requirement is on the memory side, not on the CPU/GPU/etc. Makes sense: the memory chips themselves are oblivious to the "pooling", so that's why you need the intermediary. Just some small confusion on my part.
Here's a good link about the tradeoff of serial vs parallel attached DRAM.
http://ww1.microchip.com/downloads/en/DeviceDoc/Serial-Memory-Technology-White-Paper-00003192B.pdf