Eric: Ladies and gentlemen, hello and welcome back once again to The Briefing Room. Yes indeed, my name is Eric Kavanagh and I will be your host and moderator for the show that is designed to get right down to the nitty-gritty of what is going on in the world of enterprise software. There are some really fascinating developments going on all over the data management space, and we're going to talk about that today under the title "The Crown Jewels: Is Enterprise Data Ready for the Cloud?"
Listen along to The Briefing Room with database legend Jim Starkey and Dr. Robin Bloor
The mission here in The Briefing Room, as many of you know, is to reveal the essential characteristics of enterprise software. We do that by taking live analyst briefings. We'll bring in an analyst (oftentimes it's our very own Dr. Bloor, as it is today), but we work with many different independent analysts all around the industry. The reason is pretty simple: 1) they share their honest opinions; they don't work for big analyst firms, so they don't have any pressure from management to withhold their candid opinion about things; and 2) there are many of them out there who have a lot of good real-world experience. We want to feature people who have built solutions, gotten into the weeds, actually worked things out and figured out how to make all this stuff integrate seamlessly into your enterprise. We're trying to give you the ability to learn from these experts and figure out what you could do to improve the situation in your company.
We talk about different topics each month; this month it's the Cloud. Big topic obviously; Big Data is also a very big topic, and in May it's going to be Database. We do that because we want you to get a good apples-to-apples comparison between key vendors in a particular space. As many of you know, when you are purchasing new enterprise technology it's usually not terribly cheap, so you want to make sure you get exactly what you need, and frankly that is the reason we invented this entire show.
Let's talk about what's happening in the cloud with respect to the database. I've been tracking the space for quite a few years now, and really, so many of the old database technologies, even though they had great stuff, were never designed for the cloud. Why? Because the cloud didn't exist. I mean, you could argue the semantics have been around for 30 or 40 years. There used to be something called the mainframe, obviously, and we're almost going back to that era in a slightly different way. There's a lot of stuff going on up there, but it's a very heterogeneous environment in the enterprise these days. There's a lot of data that's on premise, and now we're all starting to explore the cloud, and doing so in various ways.
Really a handful of disruptive forces have come along, for example, salesforce.com, but many others with lots of different applications in the cloud that companies use. It’s interesting; I was at a conference just a couple of days ago, they talked about Shadow IT. I always love that name, Shadow IT, it sounds so exciting; and the cool thing about Shadow IT is that it delivers some kind of business value. Almost inevitably what happens is people in a company get tired of dealing with some bottleneck, some issue or hurdle, someone standing in their way, not letting them do what they want to do and so they go and develop their own system.
Well these days, a lot of that shadow IT is in the cloud, because there are all these solutions for contact management or marketing or sales automation. Whatever the case may be, there are tons and tons of cloud offerings; there are whole conglomerates forming coalitions around provisioning cloud services. There's a lot of stuff happening there. The one thing that has really been a bit of a hurdle for cloud deployment has been: which database do you use up in the cloud to manage this strange environment?
All these APIs, all your data: how much security do you need, how much governance do you need for all that stuff? There are all these different requirements, and frankly they are very, very complex, and that's why we need innovation in the database space. And boy are we lucky today to have none other than Jim Starkey along with our very own Robin Bloor.
You can see NuoDB is the vendor in the Briefing Room today; it's a NewSQL distributed database solution, architected to scale elastically on the cloud. This is really interesting stuff. As you can see, it leverages a peer-to-peer distributed architecture and is ACID compliant; those of you who went to college in the '60s will love that. And there's our man Jim Starkey. This guy has been around for a long time (he looks good though, as you can see), and really, ask any questions you want, because he can probably answer any one of them.
With that I’m going to hand you over to Jim Starkey.
Jim: Okay, thank you very much. I'm sitting in for Barry Morris, who got called away to a very important meeting, which happens when you are a new startup. You'll just have to put up with me, I guess. Actually, I'm retired from the company: I started the company, I developed the architecture and the software, and I've been around since the ARPAnet, so it's time for me to go off and do something else. But I think I can represent the technology and the company quite well.
NuoDB is a next-generation distributed database. It's a database system that is a total break, a 100% break, from what's been done in the past. It's relational, with standard interfaces; it has a JDBC interface as its primary interface. But the technology under the SQL engine is completely different. It's not disk based; it's based on what we call atoms, which are distributed objects. There can be any number of instances of an atom, they replicate to each other on a peer-to-peer basis, and they are completely dynamic. It's a radically different architecture from anything that I've done in the past and anything else that exists that I'm aware of on the planet.
It's designed to be completely elastic, which means you can at any time add another node and it goes faster, with higher capacity. It's certainly designed for the cloud, for the unusual characteristics of the cloud, and for database administrators and everything in between. We call it active-active. Let me describe very briefly what the architecture is like. In a single NuoDB database there are really two different types of nodes that have to be present: 1) transaction managers, which process SQL, handle user requests and run transactions; and 2) storage managers, which keep persistent copies of all of the data so the database can be shut down and brought back up if necessary.
A database system can have any number of transaction nodes, whatever is necessary, and any number of storage managers. They all communicate together, they all participate in replication, and depending on the demands of the application, you can balance what you need for storage, where you want the storage to be, and how much transaction processing you need. There is great depth in the company and the investors. I have been around literally since the ARPAnet; I like to say I was on the internet before it was the internet, though it was actually the ARPAnet, the precursor, doing distributed data management.
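The shape Jim describes (any number of each node type, added and dropped while running) can be sketched in a few lines. This is an illustrative model only; the class and method names are invented for this sketch and are not NuoDB's actual API.

```python
# Illustrative sketch only: these classes are hypothetical, not NuoDB's API.
class TransactionEngine:
    """Runs SQL and user transactions; holds hot data in memory."""
    def __init__(self, name):
        self.name = name

class StorageManager:
    """Keeps a persistent copy of the data on disk."""
    def __init__(self, name):
        self.name = name

class Database:
    """A database here is just the set of currently running nodes."""
    def __init__(self):
        self.transaction_engines = []
        self.storage_managers = []

    def add_node(self, node):
        # Adding a node while running is how capacity is increased.
        group = (self.transaction_engines if isinstance(node, TransactionEngine)
                 else self.storage_managers)
        group.append(node)

    def drop_node(self, node):
        # Dropping a node shrinks capacity; the database keeps running.
        for group in (self.transaction_engines, self.storage_managers):
            if node in group:
                group.remove(node)

db = Database()
db.add_node(StorageManager("sm-1"))       # at least one persistent copy
db.add_node(TransactionEngine("te-1"))    # start with one SQL engine
db.add_node(TransactionEngine("te-2"))    # scale out under load
db.drop_node(db.transaction_engines[0])   # scale back in later
```

The point of the sketch is that the node population is a dynamic set, not a fixed configuration: scaling in either direction is just membership change.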
I've done 4 or 5 relational database management systems; I designed the DEC Rdb product architecture. Barry Morris, who's our CEO, was CEO of StreamBase, a real-time stream processing system for high-volume data. Also on our board we have Mitchell Kertzman, who was at one point CEO of Sybase, and Gary Morgenthaler, who was the founding CEO of the original commercial Ingres company and also the CEO of Illustra.
Back when I was on the board, four of the five members had been database company CEOs; that's a great deal of experience. Our company is located in Cambridge, MA.
Recently NuoDB announced a very interesting business deal with Dassault Systèmes. For those of you who follow the CAD world, Dassault has a system called CATIA; it's a 3D modeling system. If you've been on an airplane in the last month, it was probably designed with CATIA; they pretty much own that market. Dassault not only has a business arrangement with NuoDB, they came in as an investor, the major investor in fact.
A company makes an investment in a vendor for really one of two reasons. One is that they want to base their product on a technology and they want to make sure that it is being done properly. The other is that they see a hot market and want to jump on the bandwagon. We hope that in this case it is absolutely both. It's a very demanding application, and it's going to push the technology for both companies. I'm very excited about that.
Roughly, what NuoDB is about is this: if you look at cloud-style applications, we're talking web applications, mobile applications. People have figured out how to make Web servers scale out. You have load balancers; you have as many Web servers as you need to handle demand, and that distributes very nicely. Same with app servers: you have whatever you need to actually do the computing behind the Web servers. Down at the bottom you've got storage servers, and everyone knows how to put lots and lots of storage on computers.
But the bottleneck has always been the DBMS servers, which historically have not scaled. The reason they don't scale is that generally they either run on one computer, where the only way you get more capacity is to get a faster and more expensive computer, or in a shared-storage system, where you have a small number of CPUs essentially emulating a single-processor system while sharing the same disk.
A cloud database has to do better than that. It has to treat a processor, a CPU, not as the central part but just as a piece of the computing environment. Elasticity is everything: you need to be able to make something faster by plugging something in, rather than taking it down, reconfiguring it and bringing it back up. This picture here is real data: you have an application that's running on a single node; it's running nicely, producing a fair number of transactions per second. You plug in a second node, there's a short delay, and suddenly you've doubled the capacity. NuoDB actually has this characteristic.
Let me talk about the capabilities of the system; essentially everything is about elastic scalability. All of the characteristics that make NuoDB different are direct results of elastic scalability, which is the ability of the system to plug in another transaction node, fire it up and get additional capacity. Also, you can take a node that's in the database system and drop it out to reduce capacity; if you don't need it, you can turn it off, and you can spin it down if you're running in the public cloud.
The ability to add and drop nodes is kind of symmetrical, because obviously it lets you scale up in terms of capacity: as the number of users goes up you can add more. But it also means you can move a running database around the world, or from hardware to hardware, if you need to. You can start it up on another set of machines and drop out the original ones, without ever taking the database system down. You could move the physical database, say, from Cambridge to California. You could change the hardware. There's no reason to ever shut the system down; it just continues to run.
Because of that, NuoDB is able to implement rolling upgrades. With all of the other database systems that I'm aware of, when you do a software upgrade you have to shut down the database, do the upgrade, and then bring it up. NuoDB is quite different. We have an automatic process that does this, but what goes on under the covers is: start up an extra node, shut down one, upgrade the one that has been shut down, bring it back up so it rejoins the database, and then go on to the next node. When the last node running the old version of the software goes away, each server kicks over to the new version of the software and starts using the enhanced protocols and new message formats.
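The upgrade loop just described can be written down as a small sketch. The node representation and the function are invented for illustration; this is not NuoDB's actual upgrade tooling.

```python
# Hypothetical sketch of the rolling-upgrade loop; not NuoDB's real tooling.
def rolling_upgrade(nodes, new_version):
    """Upgrade each node in turn; at every step, all but one node is serving.
    (The process described also starts a spare node first, so total
    capacity stays level while each node is cycled.)"""
    for node in nodes:
        node["online"] = False            # take just this one node down
        node["version"] = new_version     # upgrade it while it is offline
        node["online"] = True             # bring it back; it rejoins the database
    # Only when the last old-version node is gone do the servers switch to
    # the new message formats and enhanced protocols.
    return all(n["version"] == new_version for n in nodes)

cluster = [{"version": "2.0", "online": True} for _ in range(3)]
rolling_upgrade(cluster, "2.1")
```

The database as a whole never has a moment with zero nodes online, which is the whole trick.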
There's no reason you would ever have to take a NuoDB database down. An interesting characteristic, which has been attracting a lot of attention, is geo-distribution. You can have a database system that spans data centers: one in New York and one in Tokyo. By the nature of replication in NuoDB, you can make that architecture work without needing full bandwidth to continuously move all of the data between New York and Tokyo; a lot of it can stay local.
The way this works is that in NuoDB every server has a set of atoms, which are distributed objects, and each atom knows what other servers have copies of that atom, so they know how to replicate between servers. If a server needs access to an atom it doesn't have, it knows who has it through a catalogue mechanism. And each server has been pinging everybody else once a second, so they have some idea of relative responsiveness. They don't really care whether the other guy is on a different continent, or is really busy, or is a slow CPU; it doesn't make any difference.
If somebody is responsive they are good; if they are not responsive they are bad. When a node needs a piece of data, when it needs an atom, it will always go to the most responsive node that has it, and if nobody has it in memory it will go to a storage manager. This means that the large data requests are all being done locally. If you have, for example, an ATM network half in New York and half in Tokyo, the customers, banks and ATMs in New York are mostly touching New York data, so that traffic is all going to be local, and the same in Tokyo. There's not a large amount of data going back and forth. There is necessarily some replication traffic, because somebody from New York might show up in Tokyo and their data does have to be shipped there, but most of the communication happens locally on one side or the other.
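The fetch decision Jim describes (most responsive in-memory holder first, storage manager as fallback) is simple enough to sketch. The function and data structures here are invented for illustration; real NuoDB internals are not public in this form.

```python
# Illustrative only: a node choosing where to fetch an atom from.
def fetch_atom(atom_id, catalogue, responsiveness, storage_managers):
    """Pick the most responsive peer that has the atom in memory;
    fall back to the most responsive storage manager if nobody does."""
    holders = catalogue.get(atom_id, [])                  # who has a copy
    in_memory = [p for p in holders if p not in storage_managers]
    if in_memory:
        # Responsiveness (learned from once-a-second pings) is all that
        # matters: a peer on another continent simply looks less responsive.
        return max(in_memory, key=lambda p: responsiveness[p])
    return max(storage_managers, key=lambda p: responsiveness[p])

catalogue = {"a1": ["ny-1", "tokyo-1"]}                   # atom -> holders
responsiveness = {"ny-1": 0.9, "tokyo-1": 0.2, "sm-1": 0.5}
fetch_atom("a1", catalogue, responsiveness, ["sm-1"])     # prefers "ny-1"
```

Note there is no notion of "local" versus "remote" in the rule itself; locality falls out of the responsiveness numbers, which is why the New York traffic stays in New York.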
Obviously it's application specific, and exactly what it's going to cost in terms of latency and network traffic is very, very hard to predict; it's really not application neutral. But it's a very interesting characteristic. Continuous availability: I talked about this a little bit earlier. The constant ability to add nodes, to add capacity if a node fails, and to do rolling upgrades when there's a software update means that the database system never has to go down.
It can go down if you choose to shut it down, or if there's an unexpected power failure you can restart it, but that's not something that normally happens. Let's get to multi-tenancy and administration (I hate system administration personally; any time I'm doing system administration, it detracts from my lifespan). I designed this thing to be self-tuning. It's also so dynamic that humans couldn't keep up with it anyway. It keeps track of the responsiveness of nodes and the communication paths within the system, and it does whatever it needs to adapt to changing circumstances.
The tuning, the care and feeding, is basically taken out of the hands of humans and left in the hands of computers, which have a better idea of how computers are supposed to work. The essential thing is that elastic scale-out is great, and scale-in is good too. This is what makes it work well in the cloud: give it the resources it needs to handle the current load and let it go. If the load picks up, which everybody wants, then you add more resources and the system responds to the load. Add to that continuous availability, geo-distribution and no-knobs administration.
That, in a nutshell, without going too deep into the technology, is what NuoDB is about, and I look forward to Robin's questions.
Eric: Okay good; and with that let me go ahead and hand the keys over to Robin.
Robin: This is actually extraordinary, I have to say, because this is the most surprising technology, the most surprising database technology, I've encountered in the last decade. I mean, there really is a claim here (and I believe it to be true because of the customers out there) to be able to do things that databases simply could not do before.
What I intend to present are just a few questions. What do we really mean by distributed database? I'm coming at it possibly from a similar direction to where Jim is coming from, but in a different way. True database distribution has always been a holy grail. To a certain extent, database engineers have always wanted the ability to distribute a database. The reason for it, above and beyond anything else, is that it means it scales.
If you go back to the beginning of the world of IT, to the original databases on the mainframe, there was at first a naïve period when databases came into being: the belief that it would be possible to put all of an organization's business data into the database. You would have one database that would replace the whole idea of the file system; it would just sit there, and whatever business applications you wanted to build would just plug into this database. That idea was shot to smithereens very quickly.
We ended up with the idea of a database as a local thing, in the sense that you build an application and you need a place to put the data, and the database is a much better place than sticking it in files: the data is reusable, the data is managed for you, and so on, so there are lots of things you don't have to do. Good things, but you never had the idea that the database would be able to just get larger and larger, in terms of the amount of data that it stored but also the various tables you wanted to drop into it.
What we're talking about with NuoDB (and I'm sure Jim will correct me if I'm wrong about any of this) is a database that actually seems to be incredibly expansible, and it's also an OLTP database. There are certain things that you can scale out, in terms of large query loads, that you simply cannot so easily do with OLTP.
Anyway, we start off with: what is a database? Well, a database is software that presides over a heap of data. It implements a data model, manages multiple concurrent requests for data, implements a security model, and is ACID compliant (I presume everybody listening knows what that means). But there's a question mark there now, because various databases that have come into existence in the past five or six years have, in the effort to scale out, actually loosened up the ACID rules: they don't promise immediate consistency, they promise eventual consistency.
They've done that in order to be able to scale out for particular workloads, but normally one expects ACID compliance, and we certainly expect it from an OLTP database. We also expect the database to be resilient. We expect that if anything fails (a software failure or a hardware failure), we will be able to get the database back up and it will not be damaged in any unrecoverable way. That's what we expect of a database.
The problem is that distribution is a difficult problem to solve. What I've drawn out here is two cloud data centers and two data centers that we control, and I've just spread eight nodes of a database across all that. Now I'll ask what happens when a person has a query. By some piece of magic I'm saying that it goes straight over the Internet, but the application request will go to some kind of broker that will distribute it out to the various pieces of the database.
If you look at exactly what would have to be done with the query in order to implement something like this, it's incredibly complex. Imagine, for instance, that the data has been shared out between all these places, and in order for the query to be answered the database actually has to know which part of the query to send to which place. Let's say it goes first to data center 1, and that then sends out the rest of the query to all the other places. Then each particular location goes and gets the data that's wanted.
It resolves at the level of each location (I'm talking about geo-distribution here), and then the full answer actually has to be resolved across all locations. If it is a simple OLTP query, it might actually resolve at just one location, or perhaps two locations, depending on what was linked to what. If it's a larger query than that, you also have the problem that in order to get the full answer you're going to have to master the answer somewhere, and mastering the answer means that you're going to have to throw some data about. If you choose the wrong place to master the answer, you may throw a lot of data about. On top of all that, we've got the network latency involved in all of this.
You look at this and you think, "I don't think that geo-distribution is actually possible, because the amount of latency that we're going to incur by trying to do all of this is going to just make it not work." The interesting thing about NuoDB is that it has an architecture which gets around this. I would say that it kind of cheats, by putting all the data in every location so that any location is able to take over the whole thing if it absolutely had to. Then it caches the active data in memory, and it goes between the various sites to determine which is the best place to respond to anything.
But of course, NuoDB hasn't been built to do large query traffic; it's been built more for OLTP at this point in time, and that will be a question for Jim later on: what would happen if you get a big query? That's the problem we're trying to deal with here, and it looks very intractable if you look at it in the simplistic way that I did. Usually with a database, you would want to scale up on a single node so you don't have any networks involved, since the network is slower than everything on the single node. It would be best to pretty much saturate a single node before you go to another node.
The first step in scaling out is to go to a well-engineered cluster, so that there's no other traffic going on in the network and therefore the networking between nodes is as fast as possibly achievable. The original databases that became the standard products people implemented, the Oracle database for example, were scaled out onto a tight cluster, with the software running in parallel across that cluster.
Then the next move, scaling out onto a more loosely coupled format, is where you suddenly run into problems, for a lot of reasons. First of all, if you are on a very large grid of computers and every single node in the grid has to know what's going on on all of the other nodes, you've got a lot of messaging going on. That messaging itself has to be managed. You've got all sorts of potential latencies or waits occurring. You can also have problems arising from locking; there's just an immense number of problems.
At some point, the loosely coupled scale-out approach runs into bottlenecks. Exactly where depends on what loads it's trying to handle. It's going to occur much sooner with OLTP workloads, because with OLTP workloads you have the possibility of two different nodes wanting to update something at the same time. You have to resolve the updates, and you can get into all sorts of Mexican standoffs depending on how you try to do it. This is going to make it difficult.
Databases are hard to distribute. In terms of database architecture, we've had parallelism in a big way since Intel started to go multiprocessor, and there has been a lot of work on scaling out, but databases simply do not distribute very well, and it's very hard to make them distribute well. If you look at actual attempts at distribution, geo-distribution isn't really any different from distributing within the same data center; it's just that you've got much bigger latency issues, because you've got to go through all sorts of protocols to get onto a wide area network and throw things across the network.
To that extent, geo-distribution in the way that Jim was talking about is something that's not really been achieved before. One of the things distribution has relied on is simple replication: master and slave. When you've got one of the nodes as a master, the other, replicating data off it, is a slave. You just don't let any updates go to the slave, because they are going to get sent back to the master anyway.
You can get away with replicating data in this way to a certain extent, but as soon as you've got intensive updates, you're going to run into problems, unless the slave is simply a replica that has no relationship to the master and everything is really happening on the master. In which case you haven't scaled out at all; all you've got is a replica that's taking some query traffic every now and then.
You've then got multi-master replication, which is peer replication. This, if I understand it correctly, is what NuoDB is doing, but there are other attempts out there, and they don't work in the way that NuoDB works, as far as I understand it. When you get multi-master replication, you've really got the problem that each node needs to know what's happening in all the other nodes, and working that out (there's the messaging traffic itself, and there's the natural latency while you wait for other nodes to tell you what's happened) becomes really difficult. The more nodes that you have multi-mastered, the worse the situation gets. If I understand correctly, NuoDB is a kind of multi-master replication, because any one of the nodes can actually master a given transaction and replicate it to the rest automatically. It's just that there isn't any declaration as to what data must live where.
One other question I wanted to ask, just for clarity: what parameters can the user set? I know you're talking about no-knobs administration, and I understand why a lot of this is absolutely automatic, but it can't be the case that the user can't set any parameters. They presumably will want to know when the nodes that are available are starting to run out of resource. What can the user do in order to monitor the environment? What do they need to do?
Jim: Okay. Let me get back to that question in a minute. Because the first thing I want to do is comment slightly on what you were talking about before.
That is, the essence of NuoDB is really what I would say is a theoretical breakthrough. Or rather, it isn't a theoretical breakthrough; it's the discovery of a major blunder in computer science. When the first database systems were invented and transactions were invented, the only technology available for managing transactions was two-phase locking and the lock manager, where you had to lock everything, and if you locked things in the right order nobody could get at something that was about to be updated until the transaction committed.
The test of whether a system based on two-phase locking works, whether it's consistent, is that transactions have to be serializable, because if they are not serializable, that means one transaction has tripped over another and the locking scheme hasn't worked. From that came the placid assumption that serializability was a necessary condition for consistency. It isn't necessary. It's a sufficient condition, but it's not necessary. What's kind of magic about NuoDB is that there is no single definitive global database state. In a serializable database system, at any given time there's a single definitive state, and that essentially makes it impossible to distribute efficiently.
NuoDB is purely transactional: you can only view the database through a transaction. Each transaction is itself consistent. A transaction sees the transactions that were reported as committed on that node when the transaction started. It's not global; it's what was reported. We detect when one active transaction tries to interfere with another one, and we handle that case. We can guarantee consistency without being serializable. That's what sets NuoDB apart from every other database system on the planet. I gave a talk at MIT a long time ago, an IEEE/ACM talk, entitled "Einstein's special theory of relativity and the problem of database scalability."
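The visibility rule just described (each transaction sees exactly the commits its node had been told about when the transaction started, with no global state) can be sketched roughly like this. The classes are invented for illustration and are not NuoDB source.

```python
# Illustrative sketch of transaction-local consistency; not NuoDB source code.
class Node:
    """A node only knows which commits have been reported to it so far."""
    def __init__(self):
        self.reported_committed = set()

    def report_commit(self, txn_id):
        self.reported_committed.add(txn_id)

class Transaction:
    """Takes a snapshot of its node's view at start time; never a global view."""
    def __init__(self, node):
        self.visible = set(node.reported_committed)

    def can_see(self, writing_txn_id):
        return writing_txn_id in self.visible

node = Node()
node.report_commit("t1")
t_a = Transaction(node)       # starts after t1's commit was reported here
node.report_commit("t2")      # t2's commit arrives afterwards
t_b = Transaction(node)

assert t_a.can_see("t1") and not t_a.can_see("t2")   # t_a's view is frozen
assert t_b.can_see("t1") and t_b.can_see("t2")
```

Two transactions on different nodes can legitimately see different sets of commits; each view is internally consistent, which is the "relative to the viewer" point, and conflicting writers are detected and handled separately.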
If you pick the position of Newton that there’s a center of the universe and everything is in it, then you’re stuck. You can’t distribute things efficiently. If you take an Einsteinian view that it’s relative to a viewer’s perspective, then the stuff works. Now, back to your question.
Robin: Wow! Let me just comment on that. That's just smart. I really do hope that the audience actually gets that; in order for them to get it, they have to have a reasonably good appreciation of relativity. Okay, let's get back to what a user can actually do in terms of configuration parameters.
Jim: First of all, there are three types of people that interact with a NuoDB system. There are system managers, who don't have anything to do with individual databases; they manage what nodes are running. When they start up a transaction node or a storage manager, they have to tell it how much physical memory it can use. This is really a garbage-collection parameter: we have so much memory on the system and I'm distributing it among a bunch of processes. That's just to avoid page faulting.
The second parameter they have to set is a maximum disk bandwidth for storage managers, so a storage manager doesn't overload the system. It's well understood that if you take a Linux system and you consistently write disk data faster than it can absorb it, really bad things happen. So you manage the disk bandwidth on a storage manager and the physical memory that a database process can use, and that's pretty much it.
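So the system manager's whole configuration surface, as described, is just two limits per node process. A sketch of that, with made-up option names (these are not NuoDB's real configuration keys):

```python
# Hypothetical names for the only two knobs the system manager sets when
# starting a node process, per the description above.
node_config = {
    "max_physical_memory_mb": 8192,    # garbage-collection bound; avoids paging
    "max_disk_write_mb_per_s": 200,    # storage managers only: don't outrun disk
}

def validate(config):
    """Both limits must be positive; everything else is self-tuning."""
    return all(value > 0 for value in config.values())
```

Everything beyond these two limits is left to the system's own adaptive behavior.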
The second class of users is database administrators. Those are the guys that we know: they set up the data model, they figure out what the records are supposed to look like, and they figure out what's supposed to be indexed. They have all the classical things: declared secondary indexes, primary and foreign keys, all of that. They have almost nothing to do with tuning other than defining what indexes there are, and I will say in passing that the engine is a little bit smarter there, because it can use multiple indexes for a single retrieval, which makes it unusual, but good.
Then finally we have the application developer, or somebody who's using a tool to do database stuff. He doesn't have anything he can do to tune it; it just happens. Mostly the database system is tuning itself internally: it knows where the data is, it knows that it should get it from somebody who's got it in memory, and if it notes that a couple of nodes have it in memory, it goes to whoever will be most responsive, which is usually the nearest, fastest node.
I hate tuning. I don’t think humans should tell computers how computers should work.
Robin: I understand that perspective entirely. When you get into some kind of monolithic database that has all of these levers with which you're supposed to make performance happen, it takes an awfully long time to properly appreciate what an engine like that actually does when you start messing about with parameters.
I've got two questions in terms of latency. It just has to be the case that the more nodes you add, you will start to build up some kind of potential latencies or bottlenecks. My understanding is you've tested it with 100 server nodes and it seems to function okay. Is there a latency build-up that happens once you start to run on that many nodes?
Jim: Not really. There's an additional cost per node, and it mostly comes when it's time to commit a transaction: it has to send a message to every other node saying this transaction is committed. That's assuming the transaction has done an update; if it hasn't done an update, nobody else has to hear about it. The more nodes it has to send to, the more packets have to go out over the network. On the other hand, it batches stuff: if it's in the process of sending one message to everybody and another three or four commits come in, they'll go out in the same packet. It really isn't so bad. The answer is no, there really isn't much of a latency hit at 100 nodes.
Robin: Okay. The next thing is: what is the latency for geo-distribution? Because if nothing else there's the protocol delay and the actual speed-of-light delay getting from New York to California, let's say. Have we got any kind of feel for what kind of latency gets in there?
Jim: Can I say no?
Robin: Yeah, that's an honest answer. I would presume that you must certainly be getting into the millisecond area once you actually get thousands of miles between data centers.
Jim: Well, let me back up and talk a little bit about commit protocols. I can't see anybody in the audience, but I hope the eyes aren't going to glaze over before I finish. It's kind of a philosophical policy question: what does it mean for a transaction to commit? In NuoDB, minimally, it means that the transaction has been through all of its updates and everybody who has any say about it agrees that it's okay, and that it has sent a message to every other node in the database saying that it's committed. At the lowest level you can tell the application it's committed once you've notified everybody else.
Obviously there is a point where you send out the messages, you tell the application it's committed, and then somebody hits the node with an axe, or a mudslide hits it, or some disaster occurs, or somebody trips over the power cord. What happens in that case is that the other nodes say, "Hey, Fred over there seems to have gone away." One will say, "Here are all the messages I got from Fred since the last commit," and another will say, "Okay, I've got a couple more than you got." They do message reconciliation so they all get to the same state and decide whether Fred committed or not. If he committed to even one guy, he's committed.
Now, that works, but it's not classical enough for some people. A lot of times people say, "Well, okay, I really want to have the storage manager come back and say that he's actually received the commit before I go ahead." So what options does the administrator have? He can say how many storage managers have to confirm the commit before it's reported as committed. If you take that option, you've probably got a higher degree of safety, but you also pay the latency of the round trip to the storage managers and back. If it's geo-dispersed, then you're going to pay the round trip from New York to Tokyo and back. There's no way to avoid that if that's what your policy is going to be.
The next level up: there are customers out there with corporate policies saying, "We can't consider it committed unless everything having to do with that transaction is sitting on disk, on rotating memory, on at least two different disks." What we do in that case, and in fact this is always going on, is that NuoDB storage managers log all incoming replication messages, which includes the commit message. They're written using non-buffered I/O, so they go straight to the disk. If you take the option that says I've got to have two guys in every region actually write all the messages to disk before I get the commit message back, and I need to hear from all the regions before I can tell the user that it's committed, then you do have a significant latency.
What I'm saying, basically, is that based on your corporate policy philosophy you can choose your commit mode based on what your requirements are. Google has well established that the human attention span is somewhere between 250 and 300 milliseconds. Can you actually do this round trip and get back to somebody before they hit the refresh button or click on another link? That's a good question, and I think it's highly application specific. If you're standing at an ATM, I don't think it's a question. What I'm doing, in many words, is ducking the question.
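The spectrum of commit policies Jim lays out, from "messages sent" through "N storage managers acknowledged" up to "durably logged in every region," can be expressed as a small decision function. The policy names and signature below are hypothetical, a sketch of the trade-off rather than NuoDB's actual configuration interface.

```python
def is_reportable(policy, broadcast_done, sm_acks, durable_regions, all_regions):
    """Decide when to tell the application 'committed', per policy.

    policy          -- "broadcast", "sm:<N>", or "durable-all-regions"
    broadcast_done  -- commit notices have gone out to every node
    sm_acks         -- number of storage managers that confirmed receipt
    durable_regions -- set of regions that logged the commit to disk
    all_regions     -- set of all regions hosting the database
    """
    if policy == "broadcast":            # lowest latency: notices are out
        return broadcast_done
    if policy.startswith("sm:"):         # wait for N storage-manager acks
        needed = int(policy.split(":")[1])
        return broadcast_done and sm_acks >= needed
    if policy == "durable-all-regions":  # disk write confirmed everywhere
        return broadcast_done and all_regions <= durable_regions
    raise ValueError(policy)

# Lowest-latency mode: committed as soon as the notices are out.
print(is_reportable("broadcast", True, 0, set(), {"us", "eu"}))             # True
# Stricter: need two storage managers to confirm first.
print(is_reportable("sm:2", True, 1, set(), {"us", "eu"}))                  # False
# Strictest: every region must have logged the commit to disk.
print(is_reportable("durable-all-regions", True, 3, {"us"}, {"us", "eu"}))  # False
```

Each step down the list buys durability at the cost of the round trips Jim mentions, up to the full New York–Tokyo round trip in the geo-dispersed case.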
Robin: You've given us at least a sense of the kind of timescale we're talking about. I've got two things I just wanted to ask about. One thing you never mentioned, but I think it's a very interesting thing, is that you have the capability within NuoDB of putting data in a given location. Because of various laws, various countries insist that certain data shouldn't leave the country. That's a very useful capability. I just wanted to ask: how does that work, and what is the extent of it?
Jim: Unless they've done something that they haven't told me about, I don't think NuoDB does that.
Robin: In which case it must be something that was talked about for the future.
Jim: We're very aware of the legislation in the EU, in Canada and in the United States. It's why all of the public Cloud providers have to have sub-Clouds in different countries and on different continents, with different rules. It's very sad that the world has come to this. Right now every storage manager has a full copy of the database; part of this is for safety, part of it is for performance, and part of it is to minimize messaging. Right now there's no way that you can say, "I want this data only there and not there."
Robin: The other question before I hand you over to the audience is really the query question. You can clearly pose queries to the database even though it's distributed the way you've distributed it. The question is: at what point does the architecture itself create any problems for NuoDB when you start running large queries against the data? Or does it not actually matter too much?
Jim: What happens if you throw a large query at NuoDB is that it goes to a transaction engine and starts grinding away. That transaction engine is very busy, so when somebody else pings it, it takes a while to get around to responding to the ping. The other node says, "I guess he's busy; I'll stop asking him for stuff and go someplace else." We have more elegant ways of handling it, too. When an application process attaches to a database, it goes through a broker, and there are things it can say to the broker. It can give it what's called a connection key. The broker will hash that and use it to decide where to direct the query, and this gives you performance characteristics where you have nodes that start specializing in specific atoms.
But the same mechanism can be used with a little creativity for somebody to say, "Okay, I'm going to do something that's likely to run long. I want to send it to a transaction node that's going to execute it, and I don't want anybody else trying to update that guy; he's going to be busy doing my stuff." Ideally the brokers will handle this, and I'm not sure we have all the mechanisms in place for doing that, but logically that's the way it would work. It's just a question of putting the tweaks into the broker to handle that situation. What NuoDB does not do right now is execute queries in parallel on different nodes. Execution is all on a single node, which is pretty much necessary with the messaging and transaction model at the moment.
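The connection-key routing Jim describes, where the broker hashes a key to pick a transaction engine so that engine's cache specializes in the atoms a workload touches, can be sketched like this. Function and engine names are made up for illustration; this is not NuoDB's broker code.

```python
import hashlib

def pick_transaction_engine(connection_key, engines):
    """Deterministically map a connection key onto one transaction engine.

    Because the hash is stable, every connection presenting the same key
    lands on the same engine, which lets that engine's in-memory atoms
    specialize in that workload's data.
    """
    digest = hashlib.sha256(connection_key.encode("utf-8")).hexdigest()
    return engines[int(digest, 16) % len(engines)]

engines = ["te-1", "te-2", "te-3"]

# The same key always routes to the same engine, so a long-running
# reporting workload can be isolated on one node.
assert pick_transaction_engine("reporting", engines) == \
       pick_transaction_engine("reporting", engines)
```

The same trick gives the isolation Jim mentions for long queries: hand all heavyweight work a dedicated key, and it consistently lands on one engine while updates go elsewhere.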
Robin: Okay, I understand that. Eric, I presume the audience has some questions.
Eric: We have a ton of questions. I'm understanding what you've designed and how it fits very well in these Cloud environments, which, as we were saying, are very heterogeneous and very demanding. I think one of the keys to keep in mind with the Cloud is that you'll get these spikes of activity that can frankly be overwhelming for older, more traditional systems. If I understand it correctly, you've taken this atomic approach where you've got all these atoms that can talk to each other and do various things.
We've heard of other companies taking a similar approach, though not exactly like this and not for a database. I'm guessing that sort of multi-atomic environment or strategy is what lends itself so well to Cloud-based computing.
Jim: Actually there are a couple of characteristics that make it nice.
A couple of things. One is that the real benefit of the public Cloud is you pay for the resources that you use. If you don't need a particular resource, you spool down that server and you don't have to pay for it. If you own the servers, you pay for them whether they're working or not, and you're probably paying for somebody to sit there and watch an idle server, which is even more painful than owning an idle server. To make this work you have to be able to spool your database up and down. If you have a heavy US daytime load and everybody goes to sleep at night, presumably sooner or later you can throttle back, drop nodes out of the database, shut down those servers, and not have to pay for them.
That's the economics. Another little thing about it is that it's very network aware. When you're running on Amazon, internal connections are really cheap and very fast, but if you go outside and come back, it's slow and really expensive. NuoDB knows about that sort of thing, which is very useful. The other big thing is the lifecycle of an application: it starts in the development shop, running on NuoDB on a cheap box in the corner. Then, as it starts to be rolled out, it moves to a corporate data center. When the load builds up, you can move it to a public Cloud and expand it out there with elasticity, and when that gets to be very expensive because you've got a large continuous load, you can bring it back in to real physical machines.
The magic thing about NuoDB is that the database system never has to go down during any of those transitions. There are times when local machines make sense, there are times when the Cloud makes sense, and there are times when that's going to change and you want to move the focus of where your database system is. NuoDB kind of gives you a win-win-win. Everything is synced together.
Eric: Right. That's interesting stuff. It reminds me of something from years ago, and it's a bit of a metaphor here, but it reminds me of the difference between Apple and PCs. Apple would let you upgrade the operating system without having to back up and decouple your file system manually, whereas on the PC you couldn't just go ahead and put a new operating system over your existing file system; it didn't like that at all. Interesting.
We have a whole bunch from folks right here. Here's one I'll pick out of the blue. One attendee asks: I believe your slide depicted NuoDB as continuously available, but it presumably runs on things that fail, servers, Linux, disks, memory, etc. How do you run in the event of specific failures? You kind of talked about that earlier, but maybe you could expound a bit.
Jim: Sure. That's why we support multiple storage managers. If a storage manager goes down, that's fine; the rest of the guys carry the load. When you fix it, you bring it back up, and assuming there's anything left of the disk, it will resynchronize with the other storage managers and just sit there fetching stuff from the other guys until it's completely up to speed. Then it will turn itself on and say, "Okay, I'm now a storage manager." Alternatively, if you really need to, you can shut down the storage manager and copy the database, or, if you're running CFS, do it online: clone the actual database atoms on disk, restart that server, and start a second server on the clone. There are a lot of ways to handle it.
Eric: Good. What about incremental changes and different data models, how do you deal with all that?
Jim: From the beginning the system was designed to be able to change anything online: adding columns to tables, changing data types, adding indexes, dropping indexes, you name it. It all works online, even while running hot, multi-user, pumping transactions. Just as an aside, it has a characteristic that I really like: it uses an interesting data encoding where the encoding is based on the actual data rather than the declared type. Internally it has a concept of string. It doesn't care how big the string is; it's just a string. And a number is a number is a number.
You never have to do things like, "Well, I've got 40 characters for a last name, and now I've got somebody second-generation with four hyphens in his name, so I've got to make it 80 characters." That doesn't happen; everything is very soft. When you change a data type in NuoDB it doesn't rewrite anything on disk or in the atoms. It just starts storing new records in new formats and keeps track of what the older ones look like.
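The "a number is a number" idea, encoding by what a value actually is rather than by a declared column width, can be sketched with a toy serializer. The tag bytes and layout here are invented for illustration; NuoDB's real compressed format is not public in this transcript.

```python
def encode(value):
    """Toy value-based encoding: size follows the data, not the schema."""
    if isinstance(value, int):
        # Byte count follows the actual magnitude, not a declared INT/BIGINT.
        length = max(1, (value.bit_length() + 8) // 8)
        return b"N" + value.to_bytes(length, "big", signed=True)
    if isinstance(value, str):
        # No fixed CHAR(40) padding: a string is just its bytes plus a length.
        data = value.encode("utf-8")
        return b"S" + len(data).to_bytes(4, "big") + data
    raise TypeError(type(value))

print(len(encode(7)))            # tiny number, tiny encoding: 2 bytes
print(len(encode("Smith")))      # 1 tag + 4 length + 5 bytes = 10
print(len(encode("A-B-C-D-E")))  # a longer name simply costs more bytes
```

Because storage depends only on the value, widening a "column" never requires rewriting existing records, which is the property Jim is describing.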
It’s kind of magic, a lot of mirrors in there.
Eric: Here’s another very specific question. Does NuoDB support the XA protocol?
Jim: That’s an easy one. No.
Eric: That's the open standard for two-phase commit, is that right?
Jim: It's a two-phase commit protocol, yes. I've put hooks in there to support it, but no one has ever asked for it.
Eric: Interesting. You could handle it but you just haven’t seen that being asked before.
Jim: For the record, there was a project I did back in 1983 that was the first commercial database system to support two-phase commit. So I've done the first database system to support two-phase commit and the first database system that has no reason to do one.
Eric: We get it. You’ve talked about this in length too, but we have some people asking about it, so maybe just a bit more detail. Could you talk about locking again? What kinds of situations cause locks and how is locking resolution resolved basically?
Jim: This is great: there is no lock manager; there are no locks. This is where we got Mitchell Prutzman interested. We were giving a pitch to his VC firm and he was sitting there dozing off, and somebody asked about lock management. When I said "We don't have a lock manager," he sat up straight and said "What!" It's multi-version concurrency control. When you update a record it doesn't replace the existing record; it creates a new version of the record stamped with the transaction ID of the guy who made it. Every transaction knows which versions it should look at and which versions it should ignore. Under the covers it sees all of the versions. This is how we both give a transaction a consistent view of the database and detect if two transactions are trying to update the same guy.
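The multi-version scheme Jim describes, where an update appends a new stamped version and each transaction filters versions through its own view, can be sketched in a few lines. The class names and snapshot representation are simplifications for illustration, not NuoDB internals.

```python
from dataclasses import dataclass, field

@dataclass
class RecordVersion:
    value: str
    txn_id: int              # transaction that created this version

@dataclass
class Record:
    versions: list = field(default_factory=list)   # newest version last

    def update(self, txn_id, value):
        # An update never overwrites: it appends a new stamped version.
        self.versions.append(RecordVersion(value, txn_id))

    def read(self, snapshot):
        # A transaction sees the newest version created by a transaction
        # in its snapshot, and ignores every version stamped outside it.
        for v in reversed(self.versions):
            if v.txn_id in snapshot:
                return v.value
        return None

rec = Record()
rec.update(txn_id=10, value="alpha")
rec.update(txn_id=20, value="beta")   # concurrent update by another txn

# A transaction whose snapshot includes only txn 10 still sees "alpha";
# the version stamped by txn 20 is invisible to it.
print(rec.read(snapshot={10}))      # alpha
print(rec.read(snapshot={10, 20}))  # beta
```

Write-write conflict detection falls out of the same structure: if two transactions both try to append a version on top of the same predecessor, the system can see the collision without any lock manager.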
Eric: That’s fantastic. Talk about great for compliance issues. You’ve got your built-in audit trail and your built-in list of controls or transactions that had some impact on the data that’s being read.
Jim: I wish I could say yes, but that's one of the future features we didn't get around to implementing, and it's not going to be in the next six weeks either. It's the concept of time travel, where you can say, "I want to execute this query as of last Thursday." We don't have that now.
Eric: I’m just curious to know …
Jim: There's collection of old versions going on for efficiency, so it doesn't keep stuff a long time. It could, though.
Eric: I got you. Good. Well, that’s good detail. Let’s see. We’ve got two other good questions in here. How do you prevent NuoDB from eating all the network quota, upload and download while replicating data between data centers?
Jim: Excuse me. Did I say we kept it from eating the quota?
Eric: I have no idea. We don’t do anything …
Robin: Somebody else’s problem.
Jim: We try and minimize messaging obviously, because that’s directly performance. What we have to do we have to do.
Eric: I understand. Good. We've got a few more questions. An attendee asks: could you speak to NuoDB and integration with Django? I'm not sure if that's something worth talking about at length; you can jump on it if it is. Here's an interesting one on data types: do strings containing digits get morphed into zeros, the infamous Microsoft Excel scenario? Is the client interface defined in terms of strings? Those are good questions.
Jim: No. The interface is JDBC; in fact, the SQL engine has JDBC semantics even though it's not Java. We support all of the Java data types, all the classical data types, even if they map into the same thing internally. When it's in memory we know what data type it is. It's when we serialize it for transmission over the network, or serialize it for storage on disk, that we switch to a highly compressed format. Data storage is not necessarily the way that it's declared.
Eric: I understand. This is really fascinating stuff. Robin, I appreciate you getting all excited about it too, because what you guys have done here really stretches the capacity of database management in very fundamental ways. And as you suggested, you have a number of interesting things on the roadmap as well.
Jim: I’m very boastful about talking about my baby, you can see.
Eric: That is fantastic, folks. We're going to wrap up. I'm sure we'll be hearing from Jim sometime in the future. With that, folks, thank you very much for your time and attention. We will archive this event; we archive all these events. Hop online to Insideanalysis.com to find links to all that. On the right-hand side of the home page you'll see a link that says Recent Episodes. Click on that and you can go to a page with all the episodes of The Briefing Room in year number five.