Computer scientists have invented a highly effective—yet incredibly simple—algorithm to decide which items to toss from a web cache to make room for new ones. Known as SIEVE, the new open-source algorithm holds the potential to transform the management of web traffic on a large scale.
SIEVE is a joint project of computer scientists at Emory University, Carnegie Mellon University and the Pelikan Foundation. The team's paper on SIEVE will be presented at the 21st USENIX Symposium on Networked Systems Design and Implementation (NSDI) in Santa Clara, California in April.
A preprint of the paper is already making waves.
"SIEVE is bigger and greater than just us," says Yazhuo Zhang, an Emory Ph.D. student and co-first author of the paper. "It is already performing well but we are getting a lot of good suggestions to make it even better. That's the beauty of the open-source world."
Zhang shares first authorship of the paper with Juncheng (Jason) Yang, who received his master's degree in computer science at Emory and is now a Ph.D. candidate at Carnegie Mellon.
"SIEVE is an easy improvement of a tried-and-true cache-eviction algorithm that's been in use for decades—which is literally like centuries in the world of computing," says Ymir Vigfusson, associate professor in Emory's Department of Computer Science.
Vigfusson is co-senior author of the paper, along with Rashmi Vinayak, an associate professor in Carnegie Mellon's computer science department. Yao Yue, a computer engineer at the Pelikan Foundation, is also a co-author.
In addition to its speed and effectiveness, a key factor sparking interest in SIEVE is its simplicity, lending it scalability.
"Simplicity is the ultimate sophistication," Vigfusson says. "The simpler the pieces are within a system designed to serve billions of people within a fraction of a second, the easier it is to efficiently implement and maintain that system."
Keeping 'hot objects' handy
Many people understand the value of regularly reorganizing their clothing closet. Items that are never used can be tossed and those that are rarely used can be moved to the attic or some other remote location. That leaves the items most commonly worn within easy reach so they can be found quickly, without rummaging around.
A cache is like a well-organized closet for computer data. The cache is filled with copies of the most popular objects requested by users, or "hot objects" in IT terminology. The cache maintains this small collection of hot objects separately from a computer network's main database, which is like a vast warehouse filled with all the information that could be served by the system.
Caching hot objects allows a networked system to run more efficiently, rapidly responding to requests from users. A web application can effectively handle more traffic by popping into a handy closet to grab most of the objects users want rather than traveling down to the warehouse and searching through a massive database for each request.
"Caching is everywhere," Zhang says. "It's important to every company, big or small, that is using web applications. Every website needs a cache system."
And yet, caching is relatively understudied in the computer science field.
How caching works
While caching can be thought of as a well-organized closet for a computer, it is difficult to know what should go into that closet when millions of people, with constantly changing needs, are using it.
The fast memory of the cache is expensive to run yet critical to a good experience for web users. The goal is to keep the most useful future information within the cache. Other objects must be continuously winnowed out, or "evicted" in tech terminology, to make room for the changing array of hot objects.
Cache-eviction algorithms determine what objects to toss and when to do so.
FIFO, or "first-in, first-out," is a classic eviction algorithm developed in the 1960s. Imagine objects lined up on a conveyor belt. Newly requested objects enter on the left and the oldest objects get evicted when they reach the end of the line on the right.
In the LRU ("least recently used") algorithm, the objects also move along the line towards eviction at the end. However, if an object is requested again while it moves down the conveyor belt, it gets moved back to the head of the line.
Hundreds of variations of eviction algorithms exist but they have tended to take on greater complexity to gain efficiency. That generally means they are opaque to reason about and require high maintenance, especially when dealing with massive workloads.
"If an algorithm is very complicated, it tends to have more bugs, and all of those bugs need to be fixed," Zhang explains.
Citation: Computer scientists invent simple method to speed cache sifting (2024, January 24) retrieved 24 January 2024 from https://techxplore.com/news/2024-01-scientists-simple-method-cache-sifting.html
This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no part may be reproduced without the written permission. The content is provided for information purposes only.