alvinashcraft shared this story from Envato Tuts+ Code.
SimpleDB is a remote database offered by Amazon Web Services (AWS). The world of data stores is usually divided into SQL and NoSQL, based on the use (or non-use) of the SQL language. NoSQL data stores are usually based on a simpler key/value setup. SimpleDB straddles this line: it is a key/value store that can also use a variant of SQL for retrieval. Most SQL databases rely on a schema that lays out the rows and columns of the data, but SimpleDB is schema-less, which makes for a very flexible data store.
In the SimpleDB database model, you have items, attributes and values. Each row in the database is an item and can be identified by a unique and assignable item name. Each item can have up to 256 pairs of attributes and values. An unexpected aspect of SimpleDB is that the same attribute can appear in more than one pair per item. I find it easiest to picture SimpleDB as a spreadsheet in which each column/row intersection holds an array of values rather than a single value.
This chart represents two items stored in a SimpleDB domain. The term domain is analogous to a “table” in other databases.
The first column is the item name—this is the only column where you can have only a single value, and you can think of it as a unique index column.
The other four columns (pets, cars, furniture, and phones) represent attributes that are currently in this domain—you aren’t limited to this, so every item can have an entirely unique set of attributes. In this data, the attribute pets on the item personInventory1 has three pairs; expressed in JSON, it’ll look something like this:
On the other hand, the item personInventory2 has only one pair:
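As a rough sketch of the two cases just described, here are the pets pairs for both items written as JavaScript literals. The chart itself is not reproduced here, so the concrete values (dog, cat, fish) are purely illustrative assumptions; only the shape, one Name/Value pair per value, is the point.

```javascript
// Hypothetical values: three pairs that all share the attribute name "pets".
const personInventory1Pets = [
  { Name: 'pets', Value: 'dog' },
  { Name: 'pets', Value: 'cat' },
  { Name: 'pets', Value: 'fish' }
];

// The second item has a single pair for the same attribute.
const personInventory2Pets = [
  { Name: 'pets', Value: 'cat' }
];
```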
While you don’t have to supply the same attributes for each item, you do need to supply at least one pair. This means that you cannot have an ‘empty’ item. Each attribute value can be up to 1 KB in size, so each item is functionally limited to 256 KB (256 pairs of at most 1 KB each).
SimpleDB is distributed, which has some distinct traits that you need to understand and keep in mind as you design your app. Being a distributed database means a whole group of machines will respond to your requests and your data will be replicated throughout these servers. This distribution will be completely transparent to your program, but it does introduce the possibility of consistency issues—your data cannot be guaranteed to be present on all servers initially.
Don’t panic: it’s not as bad as it sounds, for a few reasons. With SimpleDB, consistency isn’t promised, but in my experience it is usually pretty good and writes reach all nodes quickly. Designing around this also isn’t so hard: normally you try to avoid immediately reading a record you just wrote. Finally, SimpleDB has the option to perform consistent reads, but they are slower and may consume more resources. If your app requires consistent reading every time, you might want to reconsider using SimpleDB as your data store, but many applications can be designed around this limitation or don’t need to worry about it at all.
On the upside, the distributed nature also affords SimpleDB a few advantages that mesh nicely with the Node.js environment. Since you don’t have a single server responding to your requests, you don’t need to worry about saturating the service, and you can achieve good performance by making many parallel requests to SimpleDB. Parallel and asynchronous requests are something that Node.js can handle easily.
Unlike many AWS services, there isn’t an Amazon-delivered console for management of SimpleDB. Luckily, there is a nice in-browser management console in the form of a Google Chrome plugin, SdbNavigator. In the SdbNavigator you can add or delete domains, insert, update and delete items, modify attributes, and perform queries.
AWS SDK
Now that we’ve gotten to know the SimpleDB service, let’s start writing our REST server. First, we’ll need to install the AWS SDK. This SDK handles not just SimpleDB but all the AWS services, so you may already be including it in your package.json file. To install the SDK, run npm install aws-sdk from the command line.
To use SimpleDB, you’ll also need to get your AWS credentials, which include an Access Key and a Secret Key. SimpleDB is a pay-as-you-go service, but AWS currently includes a generous free allowance for SimpleDB.
Word of warning: As with any pay-as-you-go service, be aware that it’s possible to write code that can rack up big bills, so you’re going to want to keep an eye on your usage and keep your credentials private and safe.
Once you get the AWS SDK installed and have acquired your credentials, you’ll need to set up SimpleDB in your code. In this example, we'll use AWS credentials stored in a JSON file in your home directory. First, you’ll need to include the SDK module, create an AWS object, and finally set up your SimpleDB interface.
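A minimal sketch of that setup follows, assuming version 2 of the aws-sdk package. The file name aws.credentials.json is an assumption (the article only says the file lives in your home directory), and the file is expected to hold accessKeyId and secretAccessKey fields. The SDK module is passed in as a parameter purely so the function can be exercised without the package installed.

```javascript
const os = require('os');
const path = require('path');

// Assumed file name; adjust to wherever your credentials JSON actually lives.
const credentialsPath = path.join(os.homedir(), 'aws.credentials.json');

// `aws` is the module returned by require('aws-sdk') (SDK version 2).
function createSimpleDb(aws) {
  // Load accessKeyId / secretAccessKey from the JSON file.
  aws.config.loadFromPath(credentialsPath);
  // Pin the client to one region and its SimpleDB endpoint.
  return new aws.SimpleDB({
    region: 'us-east-1',
    endpoint: 'https://sdb.amazonaws.com'
  });
}

// Typical use:
// const simpledb = createSimpleDb(require('aws-sdk'));
```

Here us-east-1 is the API name for the Northern Virginia region.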
Notice that we are using a specific endpoint and region. Each datacenter is entirely independent, so if you create a domain named “mysuperawesomedata” in Northern Virginia, it will be neither replicated to nor present in the São Paulo datacenter, for example.
The SimpleDB object that you’ve created with new aws.SimpleDB is where all your methods for interacting with SimpleDB will be based. The AWS SDK for SimpleDB has only a few methods:
Batch Operations: batchDeleteAttributes, batchPutAttributes
Domain Management & Information: createDomain, deleteDomain, domainMetadata, listDomains
Item/Attribute Manipulation: deleteAttributes, getAttributes, putAttributes
Querying: select
In this tutorial, we will only be dealing with Item/Attribute Manipulation and Querying; while the other categories are useful, many applications will not have any use for them.
Test Data
Using SdbNavigator, enter your access and secret keys into the tool, select ‘US-East’, and click connect.
Once you’ve successfully connected, let’s create a domain for testing. Click Add domain.
Then enter the domain name ‘sdb-rest-tut’ and click OK.
Now that you’ve created a domain, let’s enter some test data. Click Add property and add a property named “colors”. As a convention, I usually name properties in plural form to reflect the multi-value nature of SimpleDB.
Finally, we’ll click Add record to create our first SimpleDB item. In the ItemName() column, enter your unique item name. A quirk of SdbNavigator is that, by default, it accepts only a single value per property, which obscures the fact that a property can contain multiple values. To enter multiple values, click the S along the right edge of the property column.
In the new box, select Array to enter multiple values. In the Value column, enter “red”, and then click Add value and enter “blue”.
Finally, click Update to save the changes to this row.
Now that we’ve entered some test data, let’s make our first SimpleDB request from Node. We’ll just get everything in the Domain, which, at this point, will be just a single row.
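A sketch of that first request might look like the following, assuming `simpledb` is a client created with new aws.SimpleDB as described earlier. The helper name selectAll and the limit of 100 are illustrative choices, not something SimpleDB requires.

```javascript
// `simpledb` is an AWS.SimpleDB client (aws-sdk v2) created elsewhere.
function selectAll(simpledb, callback) {
  simpledb.select({
    // The domain name contains hyphens, so it is quoted with backticks.
    SelectExpression: 'select * from `sdb-rest-tut` limit 100'
  }, callback);
}

// Usage sketch: log the raw response, or the error if the call failed.
// selectAll(simpledb, (err, resp) => {
//   if (err) return console.error(err);
//   console.log(JSON.stringify(resp, null, 2));
// });
```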
The response will be logged to the console. Here is the response, annotated for explanation:
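With version 2 of the SDK, the response for the single test row has roughly the shape sketched below. The item name and the metadata strings are placeholders, so treat this as an illustration of the structure rather than captured output.

```javascript
const exampleResponse = {
  ResponseMetadata: {
    RequestId: '<request id>', // identifier AWS assigns to this request
    BoxUsage: '<box usage>'    // measure of the resources the request consumed
  },
  Items: [                     // one entry per item matched by the query
    {
      Name: 'myFirstItem',     // the item name (hypothetical value)
      Attributes: [            // one Name/Value pair per stored value
        { Name: 'colors', Value: 'red' },
        { Name: 'colors', Value: 'blue' }
      ]
    }
  ]
};
```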
A REST Server
Since we’ll be building a REST Server that stores data in SimpleDB, it’s important to understand what a REST server does. REST stands for REpresentational State Transfer. A REST server is really just a server that uses HTTP standard mechanisms as an interface for your data. Often, REST is used for server-to-server communications, but you can use REST servers with the client through JavaScript libraries such as jQuery or Angular. Generally, however, an end-user won’t interact directly with a REST server.
Interestingly, the AWS SDK itself talks to SimpleDB over a REST API, so it may seem odd to put one REST server in front of another. You wouldn’t want clients to call the SimpleDB REST API directly, though, because every request must be authenticated with your AWS credentials, and handing those out would put the security of your AWS account at risk. Also, by writing a server, you’ll be able to add a layer of both abstraction and validation to your data storage that will make the rest of your application much easier to deal with.
In this tutorial we will be building the basic CRUD+L functions, that is Create, Read, Update, Delete and List. If you think about it, you can break down most applications into CRUD+L. With REST, you will use a limited number of paths and several HTTP methods or verbs to create an intuitive API. Most developers are familiar with a few of the HTTP verbs, namely GET and POST, as they are used most often in web applications, but there are several others.
| Operation | HTTP Verb |
| --- | --- |
| Create | POST |
| Read | GET |
| Update | PUT |
| Delete | DELETE |
| List | GET |
Notice that Read and List both use the same verb; we will be using slightly different paths to differentiate between the two. We’re using POST to represent Create because creating is not considered idempotent. Idempotent means that making the same call several times leaves your data in the same state as making it once, so an update (aka PUT) would be considered idempotent.
As our example, we’ll build a personal inventory server—a database to save whatever you own. Here is how the paths will look:
| Operation | HTTP Verb | Path |
| --- | --- | --- |
| Create | POST | /inventory |
| Read | GET | /inventory/1234 |
| Update | PUT | /inventory/1234 |
| Delete | DELETE | /inventory/1234 |
| List | GET | /inventory |
1234 is a placeholder for the person identifier (ID). Note that ‘create’ and ‘list’ do not have an ID: in the case of create, the ID will be generated, and with list, we’ll be getting all the items, so we don’t need a specific ID.
Building the Server
To get started, let’s install Express, a Node.js HTTP server framework, by running npm install express from the command line.
Express manages most of the minutiae in setting up a server, but it doesn’t include any facility for handling the HTTP request body, so we’ll need to install another module, body-parser (npm install body-parser), to enable us to read the request body.
Body-parser has a few different options for parsing the body of the HTTP request. We’ll use the json() method for readability, but switching to another method is just a matter of swapping out the method on the bodyParser object. We only need body parsing on the create and update routes, so we can include the middleware in just those particular routes.
Create
Since each SimpleDB itemName needs to be unique, we can auto-generate a new itemName for each newly created item. We’re going to use the cuid module (installed with npm install cuid), which is a lightweight way to generate unique identifiers.
SimpleDB expects attributes to be in the attribute name/value pair format:
Your server could certainly just accept and pass the values in this format directly to SimpleDB, but it is counter-intuitive to how data is often structured, and it’s a difficult concept to work with. We’ll use a more intuitive data structure, an object in which each attribute name maps to an array of values:
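To make the contrast concrete, here is the same hypothetical inventory in both shapes: first as the name/value pairs SimpleDB expects, then as the friendlier object our server will accept. The sample values are illustrative only.

```javascript
// What SimpleDB expects: one { Name, Value } pair per stored value.
const simpleDbPairs = [
  { Name: 'pets', Value: 'dog' },
  { Name: 'pets', Value: 'cat' },
  { Name: 'cars', Value: 'sedan' }
];

// What our REST server will accept: attribute name -> array of values.
const friendlyObject = {
  pets: ['dog', 'cat'],
  cars: ['sedan']
};
```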
Here is a basic Express-based server with the create operation:
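The sketch below shows one way the create route could be written. It is not the article’s original listing: the factory name makeCreateHandler, the itemName response field, the `simpledb` and `newId` parameters, and port 3000 in the wiring comment are all assumptions made for illustration. Passing the client and ID generator in as arguments keeps the handler easy to try out with stand-ins.

```javascript
// `simpledb` is an AWS.SimpleDB client; `newId` is an ID generator such as
// the function exported by the cuid module.
function makeCreateHandler(simpledb, newId) {
  return function create(req, res, next) {
    const attributes = [];
    // Flatten { pets: ['dog', 'cat'] } into [{ Name: 'pets', Value: 'dog' }, ...].
    // Every value in the body is assumed to be an array.
    Object.keys(req.body).forEach((name) => {
      req.body[name].forEach((value) => {
        attributes.push({ Name: name, Value: value });
      });
    });

    const itemName = newId();
    simpledb.putAttributes({
      DomainName: 'sdb-rest-tut',
      ItemName: itemName,
      Attributes: attributes
    }, (err) => {
      if (err) return next(err);
      res.send({ itemName: itemName });
    });
  };
}

// Wiring sketch (assumes express, body-parser and cuid are installed):
// const app = require('express')();
// const bodyParser = require('body-parser');
// app.post('/inventory', bodyParser.json(),
//   makeCreateHandler(simpledb, require('cuid')));
// app.listen(3000);
```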
Let's start up your server and try it out. A great way to interact with a REST server is to use the cURL tool. This tool allows you to make an HTTP request with any verb right from the command line. To try out creating an item with our REST server, we'll need to activate a few extra options:
| Option | Purpose |
| --- | --- |
| -H | Add a header line to the HTTP request |
| -X | Define which verb will be used |
| -d | Data to be sent in the HTTP request body |
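Putting those three options together, a create request could look like this. The host and port (localhost:3000) and the sample inventory are assumptions; adjust them to match however you started your server.

```shell
curl -H "Content-Type: application/json" \
     -X POST \
     -d '{"pets":["dog","cat"],"cars":["sedan"]}' \
     http://localhost:3000/inventory
```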
After running the command, you'll see a JSON response with your newly created itemName or ID. If you switch over to SdbNavigator, you should see the new data when you query all the items.
Read
Now let’s build a basic function to read an item from SimpleDB. For this, we don’t need to perform a query since we’ll be getting the itemName or ID from the path of the request. We can perform a getAttributes request with that itemName or ID.
If we stopped here, we would have a functional but not very friendly form of our data. Let’s transform the Name/Value array into the same form we’re using to accept data (attribute : array of values). To accomplish this, we will need to go through each name/value pair and add it to a new array for each unique name.
Finally, let’s add the itemName and return the results.
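The transformation described above can be sketched as a small pure function. The name attributesToObject and the `id` property used in the trailing comment are illustrative choices rather than anything SimpleDB dictates.

```javascript
// Turn [{ Name, Value }, ...] into { attributeName: [values...] }.
function attributesToObject(pairs) {
  const result = {};
  pairs.forEach((pair) => {
    // Start a new array the first time we meet an attribute name.
    if (!Object.prototype.hasOwnProperty.call(result, pair.Name)) {
      result[pair.Name] = [];
    }
    result[pair.Name].push(pair.Value);
  });
  return result;
}

// Inside the getAttributes callback, assuming `awsResp` is the response and
// `req.params.itemID` holds the ID taken from the request path:
//   const item = attributesToObject(awsResp.Attributes);
//   item.id = req.params.itemID;
//   res.send(item);
```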
To test this, we need to use curl again. Try replacing [cuid] with the itemName or ID returned from our example of creating an item earlier in this tutorial.
Notice that we’re using the -D- option. This dumps the HTTP response headers to standard output so we can see the response code.
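A read request along those lines might look like the following. As before, localhost:3000 is an assumed address, and [cuid] is a placeholder that must be replaced with a real ID before running the command.

```shell
curl -D- "http://localhost:3000/inventory/[cuid]"
```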
Another aspect of REST is to use your response codes meaningfully. In the current example, if you supply a non-existent ID to curl, the above server will crash because you’re trying to forEach a non-existent array. We need to account for this and return a meaningful HTTP response code indicating that the item was not found.
To prevent the error, we should test for the existence of the variable awsResp.Attributes. If it doesn’t exist, let’s set the status code to 404 and end the http request. If it exists, then we can serve the response with attributes.
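One way to write the guarded read route is sketched below. The factory name makeReadHandler, the route parameter itemID, and the `id` property are assumptions for illustration; the important part is the check on awsResp.Attributes before iterating.

```javascript
// `simpledb` is an AWS.SimpleDB client (aws-sdk v2) created elsewhere.
function makeReadHandler(simpledb) {
  return function read(req, res, next) {
    simpledb.getAttributes({
      DomainName: 'sdb-rest-tut',
      ItemName: req.params.itemID
    }, (err, awsResp) => {
      if (err) return next(err);
      if (!awsResp.Attributes) {
        // No attributes came back, so the item does not exist.
        return res.status(404).end();
      }
      const item = {};
      awsResp.Attributes.forEach((pair) => {
        (item[pair.Name] = item[pair.Name] || []).push(pair.Value);
      });
      item.id = req.params.itemID;
      res.send(item);
    });
  };
}

// Wiring sketch: app.get('/inventory/:itemID', makeReadHandler(simpledb));
```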
Try it out with the new code and a non-existent ID and you'll see that the server returns a 404.
Now that we know how to use status to set the response code, we should also update how we are responding to a POST/create. While the 200 response is technically correct as it means ‘OK’, a more insightful response code would be 201, which indicates ‘created’. To make this change, we’ll call the status method before sending.
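In code, the change is a one-liner in the create callback. The helper below is a hypothetical wrapper, used here only so the line can be shown in isolation; `res` is the Express response object, and the itemName field name is an assumption.

```javascript
// Respond to a successful create with 201 Created instead of the default 200.
function sendCreated(res, itemName) {
  res.status(201).send({ itemName: itemName });
}
```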
Update
Update is usually the most difficult operation for any system, and this REST server is no exception.
The nature of SimpleDB makes this operation a little more challenging as well. In a REST server, an update replaces the entire piece of stored data; SimpleDB, on the other hand, stores individual attribute/value pairs under an itemName.
To allow for an update to represent a single piece of data rather than a collection of name/value pairs, we need to define a schema for the purposes of our code (even though SimpleDB doesn’t need one). Don’t worry if this is unclear right now—keep reading and I’ll illustrate the requirement.
Compared to many other database systems, our schema will be very simple: just a defined array of attributes. For our example, we have four fields we are concerned with: pets, cars, furniture, and phones:
With SimpleDB you can’t store an empty attribute/value pair, nor does SimpleDB have any concept of individual items, so we’ll assume that if SimpleDB doesn’t return a value, it doesn’t exist. Similarly, if we try to update a SimpleDB item with an empty attribute/value pair, it will ignore that data. Take, for example, this data:
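As a concrete sketch, here is the four-field schema from the text alongside an example update body of the kind discussed next: two pets, an explicitly empty cars list, and no mention of furniture or phones. The variable names are illustrative.

```javascript
// The schema: just the list of attribute names our server knows about.
const schema = ['pets', 'cars', 'furniture', 'phones'];

// An example update body sent by a client.
const updateBody = {
  pets: ['cat', 'dog'],
  cars: []
};
```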
Logically, we know that cars, being an empty array, should have no values, and pets should have two values, but what about phones and furniture? What do you do to those? Here is how we translate this update request to work with SimpleDB:
Put an attribute pets with the value cat.
Put an attribute pets with the value dog.
Delete attributes for cars.
Delete attributes for phones.
Delete attributes for furniture.
Without some form of schema that at least defines the attributes, we wouldn’t know that phones and furniture needed to be deleted. Luckily, we can consolidate this update operation into two SimpleDB requests instead of five: one to put the attributes, and one to delete the attributes. This is a good time to pull out the code from the post/create function that transforms the attribute/array of values object into the attribute/value pair array.
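The translation above can be sketched as two small functions: the shared converter pulled out of create, and a planner that walks the schema and sorts each attribute into "put" or "delete". The function names are hypothetical. Setting Replace to true asks SimpleDB to overwrite the existing values of that attribute rather than append to them.

```javascript
const schema = ['pets', 'cars', 'furniture', 'phones'];

// Shared by create and update:
// { pets: ['cat'] } -> [{ Name: 'pets', Value: 'cat', Replace: replace }]
function attributeObjectToPairs(attrObj, replace) {
  const pairs = [];
  Object.keys(attrObj).forEach((name) => {
    attrObj[name].forEach((value) => {
      pairs.push({ Name: name, Value: value, Replace: replace });
    });
  });
  return pairs;
}

// Split an update body into one list to put and one list to delete.
function planUpdate(body) {
  const toPut = {};
  const toDelete = [];
  schema.forEach((name) => {
    if (Array.isArray(body[name]) && body[name].length > 0) {
      toPut[name] = body[name];
    } else {
      // Missing or empty in the body: remove it from the stored item.
      toDelete.push({ Name: name });
    }
  });
  return { put: attributeObjectToPairs(toPut, true), remove: toDelete };
}
```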
We’re going to make an important alteration to the create function as well. We’ll be adding a new attribute/value to all items. This attribute will not be added to the schema and is effectively read-only.
We will add an attribute called created and set the value to 1. With SimpleDB, there is limited ability to check if an item exists prior to adding attributes and values. On every putAttributes request you can check for the value and existence of a single attribute; in our case, we’ll use created and check for a value of 1. While this may seem like a strange workaround, it provides a very important safeguard that prevents the update operation from creating new items with an arbitrary ID.
Since we’ll be doing a couple of asynchronous HTTP requests, let’s install the async module (npm install async) to ease the handling of those callbacks.
Remember, since SimpleDB is distributed, there is no reason to sequentially put our attributes and then delete. We’ll use the function async.parallel to run these two operations and get a callback when both have completed. The AWS responses to putAttributes and deleteAttributes do not provide important information, so we will just send an empty response with a status code 200 if there are no errors.
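A sketch of how the created marker could appear in the request parameters: create appends it to the attributes, and update uses the Expected parameter of putAttributes to require that it already exists. The helper names are hypothetical; the parameter names are those of the SDK’s putAttributes call.

```javascript
const DOMAIN = 'sdb-rest-tut';

// Parameters for create: add the read-only marker alongside the real data.
function createParams(itemName, pairs) {
  return {
    DomainName: DOMAIN,
    ItemName: itemName,
    Attributes: pairs.concat([{ Name: 'created', Value: '1' }])
  };
}

// Parameters for update: only proceed if the marker is already present,
// so an update can never bring a brand-new item into existence.
function updateParams(itemName, pairs) {
  return {
    DomainName: DOMAIN,
    ItemName: itemName,
    Attributes: pairs,
    Expected: { Name: 'created', Value: '1', Exists: true }
  };
}
```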
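The update route could be sketched as follows. This is an illustration under stated assumptions, not the article’s original listing: the factory name, the itemID route parameter, and passing the async module in as `asyncLib` are all choices made here. The sketch also skips a request that would carry no attributes, because a deleteAttributes call without an attribute list removes the whole item.

```javascript
const schema = ['pets', 'cars', 'furniture', 'phones'];
const expected = { Name: 'created', Value: '1', Exists: true };

// `simpledb` is an AWS.SimpleDB client; `asyncLib` is require('async').
function makeUpdateHandler(simpledb, asyncLib) {
  return function update(req, res, next) {
    const puts = [];
    const deletes = [];
    schema.forEach((name) => {
      const values = Array.isArray(req.body[name]) ? req.body[name] : [];
      if (values.length === 0) deletes.push({ Name: name });
      values.forEach((value) => {
        puts.push({ Name: name, Value: value, Replace: true });
      });
    });

    const base = {
      DomainName: 'sdb-rest-tut',
      ItemName: req.params.itemID,
      Expected: expected // refuse to touch items that were never created
    };
    const tasks = [];
    if (puts.length) {
      tasks.push((cb) => simpledb.putAttributes(
        Object.assign({ Attributes: puts }, base), cb));
    }
    if (deletes.length) {
      tasks.push((cb) => simpledb.deleteAttributes(
        Object.assign({ Attributes: deletes }, base), cb));
    }

    // Run both requests at once; respond when both have finished.
    asyncLib.parallel(tasks, (err) => {
      if (err) return next(err);
      res.status(200).end();
    });
  };
}

// Wiring sketch:
// app.put('/inventory/:itemID', bodyParser.json(),
//   makeUpdateHandler(simpledb, require('async')));
```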
To take this for a spin, let's update a previously created entry. This time, we will make the inventory only include a "dog", removing all other items. Again, with cURL, run the command, substituting [cuid] with one of your item IDs.
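An update request of that kind might look like this. The address localhost:3000 is an assumption, and [cuid] is a placeholder to replace with one of your own item IDs before running the command.

```shell
curl -H "Content-Type: application/json" \
     -X PUT \
     -d '{"pets":["dog"]}' \
     "http://localhost:3000/inventory/[cuid]"
```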
Delete
SimpleDB has no concept of an item deletion, but it can delete attributes, as mentioned above. To delete an item, we’ll need to delete all the attributes and the ‘item’ will cease to be.
Since we’ve defined a list of attributes in our schema, we’ll use the deleteAttributes call to remove all of those attributes as well as the created attribute. As per our plan, this operation will be at the same path as Update, but using the verb DELETE.
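A minimal sketch of the delete route follows, assuming the same four-field schema and the created marker described earlier. The factory name and the itemID route parameter are illustrative.

```javascript
const schema = ['pets', 'cars', 'furniture', 'phones'];

// `simpledb` is an AWS.SimpleDB client (aws-sdk v2) created elsewhere.
function makeDeleteHandler(simpledb) {
  return function remove(req, res, next) {
    // Every schema attribute plus the read-only marker.
    const attributes = schema.concat(['created'])
      .map((name) => ({ Name: name }));

    simpledb.deleteAttributes({
      DomainName: 'sdb-rest-tut',
      ItemName: req.params.itemID,
      Attributes: attributes
    }, (err) => {
      if (err) return next(err);
      res.status(200).end();
    });
  };
}

// Wiring sketch: app.delete('/inventory/:itemID', makeDeleteHandler(simpledb));
```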
List
Rounding out our REST operations is list. To achieve the list operation, we’re going to use the select command and the SQL-like query language. Our list function will be barebones, but will serve as a good basis for more complex retrieval later on. We’re going to make a very simple query:
As we ran into with the get/read operation, the response from SimpleDB isn’t very useful as it is focused on the attribute/value pairs. To avoid repeating ourselves, we’ll refactor the part of the get/read operation into a separate function and use it here. While we are at it, we’ll also filter out the created attribute (as it will show up in the get operation).
With a select operation, SimpleDB returns the values in the Items array. Each item is represented by an object that contains the itemName (as simply Name) and the attribute/value pairs.
To simplify this response, let’s return everything in a single object. First, we’ll convert the attribute/value pairs into an attribute/value array as we did in the read/get operation, and then we can add the itemName as the property ID.
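Here is one way the list route and the shared conversion function could fit together. The names, the `id` property, and the limit of 100 in the query are assumptions for illustration; the conversion skips the created marker as described above.

```javascript
// Shared by read and list: [{ Name, Value }] -> { name: [values] },
// leaving out the internal `created` marker.
function attributesToObject(pairs) {
  const result = {};
  pairs.forEach((pair) => {
    if (pair.Name === 'created') return;
    (result[pair.Name] = result[pair.Name] || []).push(pair.Value);
  });
  return result;
}

// `simpledb` is an AWS.SimpleDB client (aws-sdk v2) created elsewhere.
function makeListHandler(simpledb) {
  return function list(req, res, next) {
    simpledb.select({
      SelectExpression: 'select * from `sdb-rest-tut` limit 100'
    }, (err, awsResp) => {
      if (err) return next(err);
      // Treat a response without an Items array as an empty list.
      const items = (awsResp.Items || []).map((item) => {
        const out = attributesToObject(item.Attributes);
        out.id = item.Name; // the itemName doubles as the ID
        return out;
      });
      res.send(items);
    });
  };
}

// Wiring sketch: app.get('/inventory', makeListHandler(simpledb));
```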
To see our results, we can use curl:
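For example, assuming the server is listening on localhost:3000:

```shell
curl -D- http://localhost:3000/inventory
```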
Validation
Validation is a whole subject of its own, but with the code we’ve already written, we have a start for a simple validation system.
For now, all we want to make sure is that a user can’t submit anything but what is in the schema. Looking back at the code that was written for update/put, forEaching over the schema will prevent any unauthorized attributes from being added, so we really just need to apply something similar to our create/post operation. In this case, we will filter the attribute/value pairs, eliminating any non-schema attributes.
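The filtering step can be sketched as a single function applied to the attribute/value pairs before they are sent to SimpleDB. The function name is hypothetical, and the schema is the four-field list used throughout.

```javascript
const schema = ['pets', 'cars', 'furniture', 'phones'];

// Keep only the pairs whose attribute name appears in the schema.
function filterToSchema(pairs) {
  return pairs.filter((pair) => schema.indexOf(pair.Name) !== -1);
}
```

In the create route, this would run on the converted pairs just before the putAttributes call.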
In your production code, you will likely want a more robust validation system. I would suggest integrating a JSON schema validator like ajv and building a middleware that sits between bodyParser and your route function on create and update operations.
Next Steps
With the code outlined in this article, you have all the operations needed to store, read and modify data, but this is only the start of your journey. In most cases, you’ll need to start thinking about the following topics:
Authentication
Pagination
Complex list/query operations
Additional output formats (xml, csv, etc.)
This basis for a REST server powered by SimpleDB allows you to add middleware and additional logic to build a backbone for your application.
The finalized server code is available at simpledb-rest-api on GitHub.