Disaggregated Cloud Memory with Elastic Block Management
With the growing importance of in-memory data processing, cloud service providers have launched large-memory virtual machine services to accommodate memory-intensive workloads. Such large-memory services, built on low-volume scaled-up machines, are far less cost-efficient than scaled-out services consisting of high-volume commodity servers. By exploiting memory usage imbalance across cloud nodes, disaggregated memory can scale up the memory capacity of a virtual machine in a cost-effective way. Disaggregated memory allows available memory in remote nodes to be used by a virtual machine that requires more memory than is locally available. It delivers high performance through the faster direct memory while satisfying the memory capacity demand with the slower remote memory. This paper proposes a new hypervisor-integrated disaggregated memory system for cloud computing. The hypervisor-integrated design makes several new contributions in the design and implementation of disaggregated memory. First, with tight hypervisor integration, it investigates a new page management mechanism and policy tuned for disaggregated memory in virtualized systems. Second, it restructures the memory management procedures and relieves the scalability concern of supporting large virtual machines. Third, exploiting page access records available to the hypervisor, it supports application-aware elastic block sizes for fetching remote memory pages at different granularities. Depending on the degree of spatial locality in different regions of a virtual machine's memory, the optimal block size for each memory region is dynamically selected. The experimental results with the implementation integrated into the KVM hypervisor show that the disaggregated memory incurs on average 6% performance degradation compared to the ideal local-memory-only machine, even though the direct memory capacity is only 50% of the total memory footprint.
The idea of using remote memory has been proposed and implemented for more than two decades. In recent years, network speeds have increased dramatically, and such high-speed networks have become available in common server clusters. With such readily available high-bandwidth networks, remote memory is drawing more attention from multiple communities than ever. High-bandwidth, low-latency interconnects constitute the basis of memory disaggregation and rack-scale computing. The remote memory is accessed through the RDMA controller provided by modern interconnects. RDMA supports zero-copy and one-sided control of data movement. Zero-copy prevents the data from being copied to or from a kernel buffer for transmission, and one-sided control means that no CPU involvement is needed on the remote system for data transfers, unlike the SEND/RECV model. In addition to the high performance of RDMA, one-sided control makes networked systems more robust, because RDMA data connections survive software failures on the remote server, including OS kernel crashes. This improved robustness comes from separating the RDMA data route from the CPU domain.
• The heterogeneity of workloads commonly causes imbalanced memory usage across nodes.
• The variance in memory usage can cause memory shortages in some machines.
• The inherent memory imbalance opens a new opportunity to provide a large-memory virtual machine (VM) cost-efficiently by combining the free memory in multiple servers into a single unified memory.
In the proposed hypervisor-based design, the disaggregated memory support is directly integrated into the page management of the KVM hypervisor. Unlike the prior study, which exposes the remote memory as a block device and uses the existing storage-based swap mechanism, the proposed integrated design can provide fine-grained adjustment of memory eviction and high scalability through the hypervisor integration. This paper also proposes a dynamic block size adjustment technique, called elastic block, to find the optimal block size for fetching remote pages. The proposed mechanism can assign the optimal block size not only to each VM but also to different memory regions within a VM, as it tracks the spatial locality in the hypervisor-managed memory map for each VM.
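The per-region adaptation idea behind the elastic block can be illustrated with a minimal sketch. It is not the paper's algorithm: the utilization heuristic, the thresholds, the block size bounds, and all names (`RegionState`, `record_fetch`, `adapt_block_size`) are assumptions made here for illustration only. The sketch assumes that, for each memory region, the hypervisor can count how many pages were fetched from remote memory as part of a block and how many of those pages the guest later accessed. High utilization suggests strong spatial locality and a larger block; low utilization suggests a smaller one.

```python
from dataclasses import dataclass

# Hypothetical bounds on the fetch granularity, in pages (assumed values).
MIN_BLOCK_PAGES = 1
MAX_BLOCK_PAGES = 64


@dataclass
class RegionState:
    """Per-region bookkeeping for one epoch (illustrative only)."""
    block_pages: int = 8   # current fetch granularity, in pages
    fetched: int = 0       # pages brought in by block fetches this epoch
    touched: int = 0       # of those, pages the guest actually accessed


def record_fetch(region: RegionState, pages_fetched: int, pages_touched: int) -> None:
    """Accumulate the outcome of one remote block fetch for this region."""
    region.fetched += pages_fetched
    region.touched += pages_touched


def adapt_block_size(region: RegionState,
                     grow_at: float = 0.75,
                     shrink_at: float = 0.25) -> int:
    """Adjust the region's block size at the end of an epoch.

    Doubles the block size when most fetched pages were used, halves it
    when few were used, and leaves it unchanged otherwise. The thresholds
    are assumed values, not taken from the paper.
    """
    if region.fetched > 0:
        utilization = region.touched / region.fetched
        if utilization >= grow_at:
            region.block_pages = min(region.block_pages * 2, MAX_BLOCK_PAGES)
        elif utilization <= shrink_at:
            region.block_pages = max(region.block_pages // 2, MIN_BLOCK_PAGES)
    # Start a fresh epoch so the size follows changing access patterns.
    region.fetched = 0
    region.touched = 0
    return region.block_pages


if __name__ == "__main__":
    sequential = RegionState()   # e.g. a region scanned linearly
    random_like = RegionState()  # e.g. a region with scattered accesses
    record_fetch(sequential, pages_fetched=8, pages_touched=8)
    record_fetch(random_like, pages_fetched=8, pages_touched=1)
    print("sequential region block size:", adapt_block_size(sequential))
    print("random-like region block size:", adapt_block_size(random_like))
```

Keeping one `RegionState` per memory region, rather than one per VM, is what lets two regions of the same VM settle on different block sizes in this sketch.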
This paper proposed a new memory disaggregation system backed by RDMA-supported high-bandwidth networks. The proposed hypervisor-based design for disaggregated memory extends memory to remote nodes transparently to guest operating systems and applications. The design introduced a new replacement scheme, overlapped memory reclaim and network transfer, and scalability support through per-vCPU data structures and lockless writeback operations. In addition, the elastic block maximizes the performance benefit of exploiting spatial locality, as it dynamically adapts to changing access patterns. The experimental results showed that the disaggregated memory incurs on average 6% performance degradation compared to the ideal local-memory-only machine, even though the direct memory capacity is only 50% of the total memory footprint. In addition, the proposed design provides scalable performance with increasing numbers of vCPUs. With the advent of high-bandwidth non-volatile memory technologies, the proposed disaggregated memory will be able to expand its support to general hierarchical memory systems, such as those combining conventional DRAM and new non-volatile memory. To demonstrate its design flexibility, the paper also presented a preliminary performance evaluation with the new Optane SSD as the indirect memory.