A few days ago, the Google AI team introduced Snap, a microkernel-inspired approach to host networking, at the 27th ACM Symposium on Operating Systems Principles (SOSP ’19). Snap is a userspace networking system with flexible modules that implement a range of network functions, including edge packet switching, virtualization for Google’s cloud platform, traffic shaping policy enforcement, and a high-performance reliable messaging and RDMA-like service.
The Google AI team says, “Snap has been running in production for over three years, supporting the extensible communication needs of several large and critical systems.”
Prior to Snap, the Google AI team says it was limited in its ability to develop and deploy new network functionality and performance optimizations in several ways. First, developing kernel code was slow and drew on a smaller pool of software engineers. Second, releasing features through kernel module reloads covered only a subset of functionality and often required disconnecting applications, while the more common case of requiring a machine reboot necessitated draining the machine of running applications.
Unlike prior microkernel systems, Snap benefits from multi-core hardware for fast IPC and does not require the entire system to adopt the approach wholesale, as it runs as a userspace process alongside Google’s standard Linux distribution and kernel.
Image source: the Snap research paper
Using Snap, the Google researchers also created a new communication stack called Pony Express that implements a custom reliable transport and communications API. Pony Express provides significant communication efficiency and latency advantages to Google applications, supporting use cases ranging from web search to storage.
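The Pony Express API itself is not public, but the paper describes applications enqueuing operations and polling completions rather than issuing per-message system calls. Below is a minimal C sketch of what such an interface could look like; all of the pe_* names and the stub behavior are invented for illustration and are not Google’s actual API.

/* Hypothetical sketch of a Pony Express-style messaging interface.
 * The pe_* names are invented; the real transport runs inside the
 * separate Snap process, and the application only posts operations
 * to a command queue and polls a completion queue. */
#include <stdint.h>
#include <stdio.h>

typedef struct { uint64_t op_id; int status; } pe_completion;

/* Stub "flow" handle standing in for a reliable connection
 * managed by the Snap process. */
typedef struct { int fd; } pe_flow;

/* Post a send operation on the command queue (stubbed here). */
static int pe_post_send(pe_flow *flow, const void *buf, size_t len,
                        uint64_t op_id) {
    (void)flow; (void)buf;
    printf("queued op %llu: %zu bytes\n", (unsigned long long)op_id, len);
    return 0;
}

/* Poll the completion queue; stubbed to complete immediately. */
static int pe_poll(pe_flow *flow, pe_completion *c) {
    (void)flow;
    c->op_id = 1;
    c->status = 0;
    return 1; /* one completion available */
}

int main(void) {
    pe_flow flow = { .fd = -1 };
    const char msg[] = "hello";

    /* The application does not block in the kernel per message:
     * it enqueues the operation and later polls its completion. */
    pe_post_send(&flow, msg, sizeof msg, 1);

    pe_completion c;
    while (pe_poll(&flow, &c) == 0)
        ; /* spin-poll; a real app might also use a wakeup mechanism */
    printf("op %llu completed, status %d\n",
           (unsigned long long)c.op_id, c.status);
    return 0;
}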
Snap’s architecture combines recent ideas in userspace networking, in-service upgrades, centralized resource accounting, programmable packet processing, kernel-bypass RDMA functionality, and optimized co-design of transport, congestion control, and routing. Together, these let Snap iterate on networking features rapidly while delivering high performance.
To dynamically scale CPU resources, Snap works in conjunction with a new lightweight kernel scheduling class called MicroQuanta. It provides a flexible way to share cores between latency-sensitive Snap engine tasks and other tasks, limiting the CPU share of latency-sensitive tasks and maintaining low scheduling latency at the same time.
A MicroQuanta thread runs for a configurable runtime out of every period time units, with the remaining CPU time available to other CFS-scheduled tasks, using a variation of a fair queuing algorithm for high- and low-priority tasks (rather than more traditional fixed time slots).
MicroQuanta is a robust way for Snap to get priority on cores runnable by CFS tasks while avoiding starvation of critical per-core kernel threads. While other Linux real-time scheduling classes use both per-CPU tick-based and global high-resolution timers for bandwidth control, MicroQuanta uses only per-CPU high-resolution timers, which allows scalable time-slicing at microsecond granularity.
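MicroQuanta is a custom kernel scheduling class and is not part of mainline Linux, so there is no public API to call. As a rough userspace illustration of the runtime-out-of-every-period idea, the C sketch below busy-works for a runtime budget and then sleeps until the next absolute period boundary; the 800 µs runtime and 1 ms period are arbitrary example values, not figures from the paper.

/* Userspace illustration of MicroQuanta's runtime/period semantics.
 * This only mimics the budgeting idea: run for `runtime` out of
 * every `period`, yielding the remainder of each period to other
 * (e.g. CFS-scheduled) tasks. */
#include <stdio.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000LL

static long long now_ns(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * NSEC_PER_SEC + ts.tv_nsec;
}

int main(void) {
    const long long period_ns  = 1000000;  /* 1 ms period, example value   */
    const long long runtime_ns = 800000;   /* 800 us budget, example value */

    long long period_start = now_ns();
    for (int i = 0; i < 5; i++) {
        /* Busy "engine work" until this period's budget is spent. */
        long long spins = 0;
        while (now_ns() - period_start < runtime_ns)
            spins++;

        /* Sleep to the absolute start of the next period, leaving the
         * remaining CPU time in the period to other tasks. */
        period_start += period_ns;
        struct timespec next = {
            .tv_sec  = period_start / NSEC_PER_SEC,
            .tv_nsec = period_start % NSEC_PER_SEC,
        };
        clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);
        printf("period %d: %lld spins within budget\n", i, spins);
    }
    return 0;
}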
Snap is being received positively by many in the community.
https://twitter.com/copyconstruct/status/1188514635940421632
To learn more about Snap in detail, you can read the complete research paper.