Momento: Caching at Scale and More, Without All the Hassle

Former DynamoDB team, which set out to simplify caching for developers, adds Topics pub/sub messaging pipeline.

Jul 22nd, 2024 11:23am by Susan Hall


The Momento co-founders, part of the engineering team that built DynamoDB at AWS, wanted to make adding a dynamic cache to your application as simple as creating an S3 bucket. More recently, they’ve added Topics, a low-latency pub/sub feature, and are turning their attention to storage as part of the platform.

“Sitting in Amazon, working with a lot of customers, [they] were having the same set of outages inside AWS as well as outside external AWS customers around caching,” said Khawaja Shams, Momento CEO, who built the company with co-founder Daniela Miao.

“The reason why we felt the customers were having outages was because they were managing each cache as an individual fleet. And in a world where enterprises have hundreds of microservices, each microservice having its own caching cluster can get to be a really, really intractable problem. It’s painful for the platform team to pay attention to the capacity, to the security, to the availability on every single one of them. So what we built was a multitenanted caching fleet.”

“Our job is to minimize the number of boxes in your architecture diagram. … If you look at a Redis cluster, there are a lot of boxes: shards, replicas, zones, VPCs [virtual private clouds], all that. And then you add the web servers to it, and then you add the authentication to it. So we kind of collapse that all into one,” he said.

“We wanted to give people the experience of caching that S3 offers, you know, you just create the bucket, and you just start writing data into it. And you don’t care about any of the nodes or instance types, and so forth.”

The Seattle-based company turned to serverless technology to do so.

Intelligent Routing

If you look at companies managing some of the largest caches in the world — Twitter (now called X), Facebook, Netflix — they have dedicated caching teams, Shams said. Meanwhile, everyone else is trying to cobble together something on a shoestring budget.

But if you look at the architecture that the Netflixes, the Facebooks and the Twitters use, “They have a routing layer that sits in front. That routing layer is dealing with authentication, and it’s dealing with finding where the data lives …,” he said.

Momento offers such an abstraction in the routing layer.

“When you have that stateless routing layer … you can take a lot of liberty in terms of moving data around on the storage side. You don’t have to inform the clients of any changes that are happening. Now, in contrast, if you look at how customers were using the cache in a system like ElastiCache or Redis Enterprise and so forth, they have a very leaky abstraction. The client, the SDK, is fully aware on the server-side topology: How many shards do I have? How many replicas do I have? Which shard has which key, and all their clients have to agree upon this particular knowledge. This is why those systems have maintenance windows. Those systems can’t scale in and out as effectively. And maintenance windows are hard. And it’s like an hour a week that you got to schedule for the cache to get upgraded. You don’t have that with services like S3 or DynamoDB.”
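The idea Shams describes can be sketched in a few lines of Python. This is a toy illustration, not Momento's implementation: the `StorageNode` and `Router` classes and the `add_node` method are hypothetical names, and the simple modulo placement stands in for whatever placement scheme a production system would use (typically something like consistent hashing, which moves far fewer keys). The point is only that clients call `get` and `set` and never learn how many nodes sit behind the router.

```python
import hashlib


class StorageNode:
    """Hypothetical in-memory storage node."""

    def __init__(self, name):
        self.name = name
        self.data = {}


class Router:
    """Hypothetical stateless routing layer.

    Clients only call get/set; shard count and placement stay server-side.
    """

    def __init__(self, nodes):
        self._nodes = list(nodes)

    def _node_for(self, key):
        # Naive modulo placement, chosen for brevity (an assumption,
        # not how any real service is known to place keys).
        digest = hashlib.sha256(key.encode()).digest()
        index = int.from_bytes(digest[:8], "big") % len(self._nodes)
        return self._nodes[index]

    def set(self, key, value):
        self._node_for(key).data[key] = value

    def get(self, key):
        return self._node_for(key).data.get(key)

    def add_node(self, node):
        """Scale out: re-place every key without informing any client."""
        items = [(k, v) for n in self._nodes for k, v in n.data.items()]
        for n in self._nodes:
            n.data.clear()
        self._nodes.append(node)
        for k, v in items:
            self._node_for(k).data[k] = v


router = Router([StorageNode("a"), StorageNode("b")])
for i in range(100):
    router.set(f"user:{i}", i)

# Topology changes behind the router; client code is untouched.
router.add_node(StorageNode("c"))
all_found = all(router.get(f"user:{i}") == i for i in range(100))
print(all_found)
```

Because the client never holds a shard map, nothing on the client side has to be redeployed or reconfigured when the node count changes, which is the property the quote contrasts with topology-aware SDKs.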

Momento architecture

He calls the routing layer the heart of the system, the storage layer the lungs and the intelligent control plane the brain.

The control plane is the layer that manages the health of the fleet. Does the entire fleet have enough capacity? How are the individual nodes doing? Which nodes need a software update? And how do I orchestrate that dance so that the clients don’t even know that we have upgraded the software? How do you … have a smooth handoff between multiple storage nodes in a completely seamless manner? How do you do placement? How do you decide which cache has how many partitions and which specific nodes those partitions live on?

“We’ve got this intelligent control plane layer that’s orchestrating behind the scenes, maintaining the warm capacity. It’s adding replicas into the caches that are needed and just doing a lot of the work that a platform engineer would have to be manually architecting around and doing deployments to support when they’re scaling in and out. This intelligent control plane layer just handles all of that work for the end users and makes the system completely seamless from scaling in and out from a deployment perspective, from a security perspective, and patches and so forth,” he said.

The Momento Gateway layer handles fan-out for services across millions of simultaneous client connections. It includes features such as authentication, access control, multiple transport protocols and more.

Momento can also accept connections directly from client devices. Wyze Labs, for instance, can send images from its security cameras directly into Momento. Momento’s client SDKs manage integrations, including support for gRPC, WebSockets and HTTP, among others. These SDKs also incorporate optimizations and best practices, such as binary serialization and Zstandard compression.

Consultant Alex DeBrie has called Momento “a cache that’s specifically designed for the unique properties of the cloud rather than for instance-based, self-hosted infrastructure.”

Founded in 2021, the company counts CBS, Wyze Labs, German TV station ProSieben, Japanese mobile phone operator NTT Docomo, Paramount and Taco Bell among its customers. It recently announced a $15 million Series A round of funding.

Serverless Pub/Sub System

The just-announced Momento Topics is a serverless publish-subscribe messaging pipeline for event-driven architectures. Similar to AWS EventBridge, Azure Event Grid and GCP Eventarc, it’s designed to immediately deliver published messages to all current subscribers of a topic and then discard them, keeping traffic costs down.
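A minimal in-memory sketch shows what "deliver to current subscribers, then discard" means in practice. The `Topics` class below is a hypothetical stand-in written for illustration; it is not the Momento SDK and makes no claim about how the service is built.

```python
from collections import defaultdict


class Topics:
    """Hypothetical in-memory broker with fire-and-forget delivery."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        # Fan out to whoever is subscribed right now. The message is
        # never stored, so later subscribers cannot see it.
        delivered = 0
        for callback in self._subscribers[topic]:
            callback(message)
            delivered += 1
        return delivered


broker = Topics()

early = []
broker.subscribe("orders", early.append)
broker.publish("orders", "order-1")

late = []
broker.subscribe("orders", late.append)  # joins after order-1 was sent
broker.publish("orders", "order-2")

print(early)
print(late)
```

The early subscriber sees both messages, while the late one sees only the second. Nothing is retained between publishes, which is what keeps such a design cheap and also why subscribers need a way to notice what they missed.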

“One of the biggest problems with many existing pub/sub implementations is that any messages that get sent while a subscriber isn’t listening get dropped on the floor. Our messages have sequence numbers built-in, allowing you to detect if they missed a message,” states a blog post.

Those sequence numbers also let subscribers check for messages starting from a specific sequence number.
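On the subscriber side, gap detection with sequence numbers can be as simple as remembering the last number seen. The sketch below assumes sequence numbers increase by one per message on a topic; the `GapDetectingSubscriber` class and its `on_message` signature are hypothetical and not taken from Momento's SDK.

```python
class GapDetectingSubscriber:
    """Hypothetical subscriber that records ranges of missed messages."""

    def __init__(self):
        self.last_seq = None
        self.gaps = []       # inclusive (first_missing, last_missing) pairs
        self.received = []

    def on_message(self, seq, payload):
        if self.last_seq is not None:
            if seq <= self.last_seq:
                return  # duplicate or stale delivery; ignore it
            if seq > self.last_seq + 1:
                # One or more messages were skipped between the two.
                self.gaps.append((self.last_seq + 1, seq - 1))
        self.last_seq = seq
        self.received.append(payload)


sub = GapDetectingSubscriber()
# Messages 3 and 4 arrive while the subscriber is not listening.
for seq, payload in [(1, "a"), (2, "b"), (5, "e"), (6, "f")]:
    sub.on_message(seq, payload)

print(sub.gaps)
```

What an application does with a detected gap (re-fetching state, alerting, or ignoring it) is a design decision that depends on how much a lost message matters.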

Momento has announced plans to add two features: Executors, which would allow Lambda functions to be directly invoked as subscribers to topics, and Triggers, which would allow subscriptions to be triggered based on changes to data stored in Momento Cache.

It has a free tier of 50 GB of traffic.

Momento Storage implements an integrated gateway and tiered caching strategy as a buffer to reduce pressure on the underlying persistence layer.
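The general pattern of using a cache as a buffer in front of a persistence layer can be illustrated with a single-tier read-through cache. This is a simplified sketch of the pattern, not a description of Momento Storage: the `ReadThroughCache` class, the `slow_store` function and the 60-second TTL are all assumptions made for the example.

```python
import time


class ReadThroughCache:
    """Hypothetical read-through cache with a per-entry TTL."""

    def __init__(self, load_from_store, ttl_seconds=60.0, clock=time.monotonic):
        self._load = load_from_store
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (value, expires_at)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]  # cache hit: the store is not touched
        # Miss or expired entry: read from the store and remember the value.
        value = self._load(key)
        self._entries[key] = (value, now + self._ttl)
        return value


store_reads = []


def slow_store(key):
    """Stand-in for the underlying persistence layer."""
    store_reads.append(key)
    return f"value-for-{key}"


cache = ReadThroughCache(slow_store, ttl_seconds=60.0)
results = [cache.get("profile:42") for _ in range(5)]

print(len(store_reads))
```

Five reads reach the cache but only the first reaches the store. A tiered design adds more layers of the same idea, with each layer absorbing requests before they reach the one beneath it.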

Its plans for storage center on deeper integrations with existing vendors.

“Our mission is just to make the developer more effective and we believe we can do that. We see a lot of nuances when developers are trying to wrangle the cache and the storage as separate from it. And we think we can actually integrate them better with the best practices baked in without having to reinvent our own database or storage service,” Shams said.
