Draco Upstreamed to Linux Kernel

A Collaborative Journey from Architecture Research to Operating System Security Innovation with Real-world Impact

by Tianyin Xu, Hubertus Franke

A team of researchers from the University of Illinois at Urbana-Champaign, Carnegie Mellon University, IBM, and Red Hat, affiliated with the IBM-Illinois C3SR center, has upstreamed its work on operating system (OS) security to the Linux kernel. As reported by Phoronix (one of the largest open-source news sites), the feature, named constant-action bitmaps, is yielding “a very nice speedup” for system-call security, a cornerstone for protecting shared OS kernels.

System-call security restricts how untrusted applications interact with the OS kernel (system calls are the main interface exposed by OS kernels to the userland). The idea is to defend against exploits of kernel vulnerabilities by only allowing well-specified system calls and rejecting the others. System-call security is a key building block of modern virtualization technologies (e.g., Docker, LXC/LXD, Google gVisor, Amazon Firecracker, and Mesos Containerizer), sandboxing technologies (e.g., Google Sandboxed API and Firejail), web browsers (Chrome and Firefox), Android mobile apps, and many other important systems and applications (e.g., OpenSSH and systemd). With the new feature that the researchers have released in Linux v5.11, the performance overhead of system call checks on Linux will be remarkably reduced. Linux v5.11 supports all major CPU architectures.

“The released feature is exciting! It not only improves application performance by minimizing the security checking overhead, but also enables practitioners who take security seriously to implement more thorough security policies. With containers being a main cloud environment that empowers millions of applications today, the impact is significant,” said Hubertus Franke, a Distinguished Research Staff Member at IBM Research and a core member of the research team.

The Origin: The Draco Project

Constant-action bitmap is a part of the Draco research project started by Saburo Muroga Professor Josep Torrellas and Assistant Professor Tianyin Xu from the Department of Computer Science at the University of Illinois at Urbana-Champaign, together with Dimitrios Skarlatos, a former PhD student of Professor Torrellas and now an Assistant Professor of Computer Science at Carnegie Mellon University. The key insight of Draco is that the patterns of system calls in real-world applications have locality, i.e., an application typically issues the same system calls with the same sets of arguments repeatedly. Therefore, Draco caches system call IDs and argument values in a special cache, after they have been checked and validated. With Draco, on subsequent system calls, the cache is first looked up and, on a hit, the checks can be skipped, eliminating any checking overhead. The research paper that describes Draco was presented at the 53rd ACM/IEEE Symposium on Microarchitecture held in October 2020.
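The caching idea can be sketched in a few lines of Python. This is a minimal software illustration under stated assumptions, not the Draco hardware design or the kernel code: the class `CheckCache`, the policy function `slow_policy`, and the system call numbers used here are hypothetical names and values chosen for the example.

```python
from typing import Callable, Sequence, Set, Tuple

# A cache key is the system call ID plus the exact argument values.
SyscallKey = Tuple[int, Tuple[int, ...]]


class CheckCache:
    """Remembers (ID, arguments) pairs that already passed the full check."""

    def __init__(self, slow_check: Callable[[int, Sequence[int]], bool]) -> None:
        self._slow_check = slow_check          # the expensive policy check
        self._validated: Set[SyscallKey] = set()
        self.slow_checks = 0                   # counts how often the full check ran

    def allowed(self, syscall_id: int, args: Sequence[int]) -> bool:
        key = (syscall_id, tuple(args))
        if key in self._validated:
            return True                        # cache hit: skip the check entirely
        self.slow_checks += 1
        ok = self._slow_check(syscall_id, args)
        if ok:
            self._validated.add(key)           # only validated calls are cached
        return ok


def slow_policy(syscall_id: int, args: Sequence[int]) -> bool:
    # Hypothetical policy: allow syscall 1 only when its first argument is 1 or 2.
    return syscall_id == 1 and len(args) > 0 and args[0] in (1, 2)
```

Because applications tend to repeat the same calls with the same arguments, most lookups in such a cache would be hits, and the full check would run only on the first occurrence of each distinct call.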

The project name was inspired by Draco (in Greek, Δράκων), the Athenian legislator in Ancient Greece who was the first to replace oral law with a written code.

“Computing systems are undergoing a radical shift, propelled by stern security requirements and by an unprecedented growth in data and users. In this new era of computing, it is urgent to rethink the synergy between the operating system and the hardware layers in order to provide lightweight but strong security guarantees to the users, like Draco,” said Skarlatos.

The Journey: Upstreaming Draco to the Linux Kernel

While research may show feasibility and promise, transforming research ideas into a practical OS innovation that can directly benefit all Linux-based containers and applications is a long journey. The journey started at a technical meetup organized by the IBM-Illinois C3SR center, where Xu met Franke and the two discussed the Draco project in depth.

“It was very clear to me that the Draco project was a very practical solution to this important problem of system call security checking overhead we have also observed at IBM,” said Franke. “However, there needed to be a significant effort to turn the architecture research into an operating system innovation, and new challenges to be overcome for a practical implementation.”

The potential impact greatly excited Skarlatos, Torrellas, and Xu, who teamed up with Franke and his colleague at IBM, Tobin Feldman-Fitzthum. Franke also invited Andrea Arcangeli, a well-known Linux kernel developer at Red Hat, to join the team. Arcangeli is the creator of Seccomp, the Linux kernel component for system call security checks on top of which the team built Draco. Xu invited YiFei Zhu, a University of Illinois undergraduate student in his course CS 423 (Operating System Design), to lead the Draco implementation and upstreaming efforts. The C3SR center supported the work, and Zhu was also supported by both NSF and the Office of Undergraduate Research (OUR) of the University of Illinois.

“I’m very grateful to the C3SR Center for generously supporting the Draco upstream effort and helping assemble the all-star team. Upstreaming research to open-source projects like the Linux kernel is highly impactful and I’m very glad that the center values such impact,” said Xu.

As predicted by Franke, implementing a practical Draco cache on top of Seccomp for Linux proved non-trivial, and the team had to address a number of technical challenges along the way. One of the key challenges was to automatically populate the bitmap cache, which requires understanding the check logic of the check program (known as a Seccomp filter) written in the BPF language. In fact, this challenge had blocked a concurrent, independent endeavor with similar goals. That other approach required executing the Seccomp filter at load time for every supported system call and watching what happens; however, it incurred prohibitive performance overhead, especially given that it required “memory-management trickery” involving TLB flushes. Zhu innovatively addressed the challenge by statically emulating the execution of the Seccomp filter to extract the desired information automatically from the filter program with negligible overhead and no side effects.
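The static emulation described above can be illustrated with a toy model. The sketch below is not the kernel implementation and does not use real BPF encoding: the three-instruction format (`"ld"`, `"jeq"`, `"ret"`), the field names, the `FILTER` program, and the functions `is_const_allow` and `build_bitmap` are all assumptions made for this example. It assumes jumps only go forward, so every run terminates. The emulator walks the filter with a known system call ID and architecture, and gives up as soon as the filter reads anything else (such as an argument), because the outcome is then no longer constant.

```python
from typing import Sequence, Tuple

# Toy instruction forms (hypothetical, much simpler than real BPF):
#   ("ld", field)        load a field of the syscall data into the accumulator
#   ("jeq", k, jt, jf)   skip jt instructions if accumulator == k, else skip jf
#   ("ret", action)      finish with "allow" or "kill"
Insn = Tuple


def is_const_allow(prog: Sequence[Insn], nr: int, arch: int) -> bool:
    """True only if the filter allows syscall nr without looking at anything
    other than the syscall ID and the architecture."""
    acc = 0
    pc = 0
    while pc < len(prog):
        insn = prog[pc]
        op = insn[0]
        if op == "ld":
            if insn[1] == "nr":
                acc = nr
            elif insn[1] == "arch":
                acc = arch
            else:
                return False      # depends on arguments: not a constant action
            pc += 1
        elif op == "jeq":
            _, k, jt, jf = insn
            pc += 1 + (jt if acc == k else jf)   # forward-only relative jump
        elif op == "ret":
            return insn[1] == "allow"
        else:
            return False          # unknown instruction: be conservative
    return False


def build_bitmap(prog: Sequence[Insn], arch: int, nr_syscalls: int) -> int:
    """Set bit nr for every syscall the filter always allows."""
    bitmap = 0
    for nr in range(nr_syscalls):
        if is_const_allow(prog, nr, arch):
            bitmap |= 1 << nr
    return bitmap


ARCH = 0  # placeholder architecture value for the example

# Syscall 0 is always allowed; syscall 1 is allowed only if arg0 == 1;
# everything else is killed.
FILTER = [
    ("ld", "nr"),
    ("jeq", 0, 0, 1),
    ("ret", "allow"),
    ("jeq", 1, 0, 3),
    ("ld", "arg0"),
    ("jeq", 1, 0, 1),
    ("ret", "allow"),
    ("ret", "kill"),
]
```

In this example only bit 0 ends up set: syscall 0 is always allowed, syscall 1 depends on an argument and therefore still needs the full filter, and syscall 2 and above are rejected. At system call time, a set bit means the filter can be skipped for that call.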

As a side story, Zhu’s innovative emulation solution also unblocked the other team’s proposal, which resulted in two alternative ways to implement the feature. Arcangeli shepherded the upstream process, and Zhu kept improving the implementation from the first version to the fifth, which was ultimately merged into the Linux kernel. The feature was released with Linux kernel v5.11 on Valentine’s Day 2021.

“This is a collaborative team effort. Each member of the Draco team has a unique, complementary expertise, which is key to the success in creating a solution that passes the high bar of adoption by Linux,” said Torrellas.

The Future: More Powerful and More Efficient Security Checks

The upstreamed constant-action bitmap cache is only one part of the Draco project, as it applies only to system call IDs. It is the first step in upstreaming the Draco research results. The original research paper discusses more advanced cache designs that support caching system call arguments. Caching arguments requires more complex mechanisms, as different system calls can have different numbers of arguments, and each argument can take different values. How to implement and upstream the argument cache on top of Seccomp for Linux remains a challenge the team is working on. Meanwhile, the Draco team has also been investigating new techniques that enable more powerful and efficient security checks to significantly improve the security of modern computer systems in cloud computing environments.

“Security is a key concern of modern computing paradigms with extensive sharing. The principle of the Draco project is to rethink today’s security mechanisms and redesign them to accommodate emerging computing environments and workloads,” said Torrellas.

“I very much enjoy working with the team. I think the collaborative culture and expertise in architecture and operating systems are impressive. I particularly enjoy the emphasis on practicality and usefulness, which is a key to impact,” said Arcangeli.

“I’m very happy that the team didn’t stop and I’m very excited about the new directions on more powerful and efficient system call security,” said Feldman-Fitzthum.

“I’d like to give most credits to our amazing students, YiFei Zhu, Qingrong Chen, and Jack Chen who did all the hard work. With this exceptionally strong and collaborative team at the C3SR center, I can’t be more confident to foresee more great research to come,” said Xu.