Stormpath.com

Stormpath in Planet Cassandra: 50k Accounts Imported in Under 200ms

2013-12-02

This interview originally appeared on Planet Cassandra and is reposted here with permission.

Les Hazlewood: CTO and Co-Founder at Stormpath

Matt Pfeil: Founder at DataStax

TL;DR: Stormpath is a user management API for developers that handles identity management, user management, and security for applications.

Stormpath supports millions of accounts, and all of the statistics, data and analytics around those accounts created a need for a data store that could handle extreme load and scale. Alongside scale, Les required high availability, as Stormpath could “never, ever go down”, and for that they deployed Cassandra.

Stormpath shifted off of MySQL to Cassandra, cutting import time for their customers from 5 days to merely hours. Their deployment is entirely in the Amazon cloud across multiple datacenters. Depending on what they’re doing, they have a replication factor of three to five, with a minimum of five nodes deployed at all times for their Cassandra cluster.

Hello, Planet Cassandra listeners. This is Matt Pfeil. Today I’m joined by Les Hazlewood from Stormpath. Les, thanks for taking some time today to talk about your Apache Cassandra use case.

Why don’t we start things off by telling everyone a little bit about yourself and what Stormpath does.
Sure. Again, my name is Les Hazlewood. I’m the CTO and Co-Founder of Stormpath. Stormpath is a user management API for developers. We’re fundamentally a REST+JSON API hosted in the cloud, and we handle identity management, user management, and security for applications.

Very cool. What’s the use case for Cassandra?
For us, we have to support hundreds of thousands, millions of accounts across multiple different directories from different customers around the world. So, as you get into millions of accounts, millions of records, and all of the statistics, data and analytics around those records, we needed a data store that could handle the extreme load and scale, and could linearly scale as we grow as a startup.

Cassandra was a perfect choice for us because it has no single point of failure. There’s no master and it’s a pure distributed replicated database. Because of the nature of our business, we process authentication attempts for hundreds of thousands, if not millions, of accounts around the world - we can never, ever go down. One of our primary concerns is high availability, and Cassandra’s fault-tolerant distributed architecture helps us guarantee that for our customers.

Since you’re talking about how important uptime is, can you talk a little bit about what your infrastructure looks like? Are you running in the cloud, across multiple data centers?
Yeah, so we are across multiple data centers. We’re hosted on Amazon, and we’re 100% Amazon-backed, currently. We’ll probably expand into other data centers, like Rackspace and others, soon enough. We’re across all the east coast zones, and depending on what we’re doing, we have a replication factor of three to five, and so we have a minimum of five nodes deployed at all times for our Cassandra cluster. We’re running m2.4xl instances to handle the horsepower.

Once we warrant or justify the load to move up to the SSD-based machines, we will. So far, we’re perfectly fine with those five machines at the moment.
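
Editorial aside: the relationship between a replication factor and the number of replica failures a cluster can absorb can be worked out with a little arithmetic. The sketch below assumes requests use Cassandra's QUORUM consistency level, which the interview does not state, and the function names are illustrative only.

```python
def quorum(replication_factor: int) -> int:
    """Replicas that must respond for a QUORUM read or write."""
    if replication_factor < 1:
        raise ValueError("replication_factor must be at least 1")
    return replication_factor // 2 + 1


def tolerated_replica_failures(replication_factor: int) -> int:
    """Replicas that can be down while QUORUM requests still succeed."""
    return replication_factor - quorum(replication_factor)


# Replication factors mentioned in the interview: three to five.
for rf in (3, 4, 5):
    print(f"RF={rf}: quorum={quorum(rf)}, "
          f"tolerated failures={tolerated_replica_failures(rf)}")
```

Under that assumption, a replication factor of three tolerates one unavailable replica per piece of data, and a replication factor of five tolerates two.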

So, obviously uptime is of high importance. Is your dataset large, or is it primarily the ability to have the data in many locations close to the end user that’s more of a driver for you?
Both, actually. We need to be able to tolerate a significant number of machines dying and still be able to process an authentication attempt. We have to make sure that the data is always available to us. That’s really important, but in addition to that, we’re rolling out features all the time that require a lot of data and put a lot of load on the servers. For example, time series data on what authentications succeed, what authentications fail over time, how many users are using a particular application at any point in time, events that users take while they’re using applications. This is all very heavily time series-based data, so we can report and give charts and analytics based on user actions and user behavior to our customers.

The quantity of data for us is significantly increasing. Additionally, we also have a very interesting use case for our own product in that we have an on-premise agent that interacts with Active Directory and LDAP installations, and then mirrors that data securely into the cloud. For certain LDAP or AD installations, there could be multiple hundreds of thousands, if not millions, of account records that need to be transferred to us in the cloud.

Doing that efficiently and processing that information quickly is very hard with, say, relational database technologies. We can actually pump all that data into Cassandra as soon as it comes into our infrastructure. Then we can use Cassandra techniques, like pagination and storing a certain number of results per Cassandra row. That allows us to chunk up the data very efficiently, and we can process it very, very quickly.

Our recent tests had us pumping in 50,000 accounts in under 200 milliseconds, which is ridiculous compared to other technologies out there. I think there are other platforms, say Google, that have similar technology. Their import might take customers four or five days to a week, whereas because of Cassandra, we could probably do that on the order of a couple hours.
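
Editorial aside: the chunking idea described in this answer can be illustrated without any database at all. The sketch below is not Stormpath's implementation; it only shows one way to split an incoming stream of account records into fixed-size chunks, with each chunk standing in for the results stored in one row. The names `chunked` and `import_accounts`, the chunk size of 1,000, and the synthetic account records are all assumptions made for illustration.

```python
from itertools import islice
from typing import Dict, Iterable, Iterator, List

Account = Dict[str, str]


def chunked(accounts: Iterable[Account], chunk_size: int) -> Iterator[List[Account]]:
    """Yield successive lists of at most chunk_size accounts."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    iterator = iter(accounts)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


def import_accounts(accounts: Iterable[Account],
                    chunk_size: int = 1000) -> Dict[int, List[Account]]:
    """Group accounts into numbered chunks (a stand-in for one stored row per chunk)."""
    return {index: chunk for index, chunk in enumerate(chunked(accounts, chunk_size))}


# Hypothetical input: 50,000 synthetic account records, consumed lazily.
accounts = ({"username": f"user{i}"} for i in range(50_000))
rows = import_accounts(accounts, chunk_size=1000)
print(len(rows), "chunks of", len(rows[0]), "accounts")
```

Because the input is consumed lazily, each chunk can be handed off as soon as it fills, and a later reader can page through the import one chunk index at a time.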

That’s amazing. Talking about other technologies, did you start out on Cassandra? Or did you migrate off of another solution?
We migrated off. There are certain things that we need to support that do require ACID transactions, and our team was basically a traditional Spring and Hibernate shop, running on an HA MySQL pair. We had a relational database, and that’s what the product initially started out with. We had fairly large instances for replication, but we knew that as we grew as a startup and started getting much bigger customers, that wouldn’t scale. So we shifted over to Cassandra recently, and we still have transactional things that are running on our MySQL cluster, but all of the new functionality we’re rolling out is based on Cassandra.

Great. So one last question for you: What’s your favorite feature that’s come out in Cassandra over the iterations?
I’m really, really looking forward to the lightweight transactions. We haven’t really been able to leverage those just yet. We also think CQL has been a nice feature. There were a couple things that it lacked in the earlier days, but the DataStax team has really done a great job in filling those gaps, so it’s really nice and mature now. That helps people who are moving from a relational database world into the Cassandra world; it’s a little easier for them to migrate. That’s been beneficial. We also like Thrift, so it’s nice to be able to choose one or the other depending on needs.

We also really appreciate Cassandra’s virtual node capability. We have basically one systems engineer who maintains and manages our Cassandra clusters, and he does it with no problems. He’s got the stuff automated via Chef and using virtual nodes. The fact that we only need one guy to do this speaks a lot to Cassandra’s scale and capability from a hands-off perspective. It’s been really good for us from an ops side as well.

Les, I want to thank you for your time today. Is there anything else you’d like to share about the future of Stormpath, or anything with the community?
One of the things that our customers have been screaming for is the ability to supply ad hoc data to Stormpath. Whenever somebody creates a group or an account within Stormpath for their application, they want to be able to attach any data that they want that’s specific to their application.

We’re rolling that out right now, and we couldn’t have done that without Cassandra because we needed a schema-less data store that could scale with huge quantities of data. We feel that Cassandra was the best option for us to roll that new feature out, and we’re seeing ad hoc data supplied directly by end users being persisted at scale with Cassandra. We don’t think there would have been an easy way to roll that out otherwise. Our most requested feature by customers is now directly backed by Cassandra, and it has been a great experience for us.

Stormpath is hiring!
http://www.jobscore.com/jobs/stormpath/big-data-engineer-cassandra/bkigCQJdKr4O-OiGakhP3Q