StarNUMA: Mitigating NUMA Challenges with Memory Pooling
Published in the 57th IEEE/ACM International Symposium on Microarchitecture (MICRO 2024), 2024
Albert Cho, Alexandros Daglis
Large multi-socket machines suffer from NUMA effects, where remote memory access latency can be up to 4X that of local memory. StarNUMA introduces a CXL memory device pool that every socket can access directly in a single hop; the pool houses shared pages and so reduces long-latency memory accesses. StarNUMA reduces the average memory access time of a 16-socket system by 35%, yielding a performance improvement of 1.13X on average and up to 1.29X.
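To make the effect on average memory access time concrete, the sketch below models it as a weighted average over three access classes: local, remote, and pool. It is only a toy model. The latencies and access fractions are hypothetical and are not taken from the paper, apart from the 4X remote-to-local ratio mentioned above. The names `amat` and `LATENCY_NS` are likewise illustrative.

```python
def amat(fractions, latencies):
    """Return the average memory access time as a weighted sum.

    fractions: share of accesses per class (must sum to 1).
    latencies: access latency per class, in nanoseconds.
    """
    if abs(sum(fractions.values()) - 1.0) > 1e-9:
        raise ValueError("access fractions must sum to 1")
    return sum(fractions[k] * latencies[k] for k in fractions)


# Hypothetical latencies in ns (assumptions, not measured values).
# Remote is set to 4x local to mirror the ratio quoted in the abstract;
# the pool latency is a made-up single-hop figure.
LATENCY_NS = {"local": 100.0, "remote": 400.0, "pool": 200.0}

# Baseline: no pool, half of all accesses go to remote sockets (assumed).
baseline = amat({"local": 0.5, "remote": 0.5, "pool": 0.0}, LATENCY_NS)

# With a pool: assume shared pages moved to the pool absorb 60% of the
# formerly remote accesses.
pooled = amat({"local": 0.5, "remote": 0.2, "pool": 0.3}, LATENCY_NS)

reduction = 1.0 - pooled / baseline
print(f"baseline={baseline:.0f} ns, pooled={pooled:.0f} ns, "
      f"reduction={reduction:.0%}")
```

With these made-up inputs the model gives 250 ns without the pool and 190 ns with it, a 24% reduction. The figure illustrates the mechanism only and says nothing about the 35% result reported for the real 16-socket system.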