2014-09-09

Installing a Cassandra 2.0 cluster on Ubuntu Trusty

Assuming you’ve already gone through http://www.iknownothing.com/devops/initial-server-setup-trusty/…

Prerequisites

Java (Oracle 1.7)

For Cassandra, we need Oracle’s JRE, as many things wouldn’t work if we used OpenJDK.

Luckily, the process is pretty straightforward:

Let’s add JAVA_HOME for the current session, and make it persistent:

Add:
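As a rough sketch of this step: the path below is an assumption (it is where the common oracle-java7-installer package puts the JRE; check with ls /usr/lib/jvm), and persist_java_home is a hypothetical helper name, not something from the Java or Cassandra packages.

```shell
# Assumed Oracle Java 7 location; adjust to whatever `ls /usr/lib/jvm` shows.
JAVA_HOME_PATH=/usr/lib/jvm/java-7-oracle

# Set JAVA_HOME for the current session.
export JAVA_HOME="$JAVA_HOME_PATH"

# Hypothetical helper: append JAVA_HOME to an environment file,
# but only if the file does not define it already.
persist_java_home() {
    env_file=$1
    grep -q '^JAVA_HOME=' "$env_file" 2>/dev/null ||
        echo "JAVA_HOME=\"$JAVA_HOME_PATH\"" >> "$env_file"
}

# Usage (as root): persist_java_home /etc/environment
```

Running the helper twice leaves a single JAVA_HOME line in the file.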

JNA

Cassandra also requires JNA. Install it with the following command:

Installation

To figure out how to install Cassandra 2.0, I followed the DataStax docs, which are very thorough:

http://www.datastax.com/documentation/cassandra/2.0/cassandra/install/installDeb_t.html.

Installation is pretty easy:

Then:

Configuration

Configuration is more time-consuming.

I’m using the instructions on this page as a starting point: http://www.datastax.com/documentation/cassandra/2.0/cassandra/configuration/configTOC.html

Reset config

Stop Cassandra and remove the default configuration:
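A minimal sketch, assuming this refers to the step in the DataStax install docs where the automatically started service is stopped and the system keyspace data it created is cleared. The default path /var/lib/cassandra is the package default as far as I know, and clear_system_data is a hypothetical helper name.

```shell
# Hypothetical helper: delete the system keyspace data created by the
# package's automatic first start. The data root defaults to the
# Debian package location (an assumption; check data_file_directories).
clear_system_data() {
    data_root=${1:-/var/lib/cassandra}
    # ${data_root:?} aborts instead of expanding to "/data/system/*"
    # if the variable is ever empty.
    rm -rf "${data_root:?}/data/system/"*
}

# Usage (as root):
#   service cassandra stop
#   clear_system_data
```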

Timezone

Before using Cassandra, double-check that you have the correct time set on all nodes.

Cassandra uses timestamps to write columns, and it wouldn’t be good if nodes are set to different timezones:
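On a single node, date shows the current time and timezone. To compare several nodes at once, something along these lines might help. It is only a sketch: node_clock and all_same are hypothetical helpers, and the host names in the usage comment are placeholders.

```shell
# Print this node's UTC offset (e.g. +0000) followed by its local time.
node_clock() { date '+%z %Y-%m-%d %H:%M:%S'; }

# Succeed only if every argument is identical.
all_same() {
    first=$1
    for value in "$@"; do
        [ "$value" = "$first" ] || return 1
    done
    return 0
}

# Example (node1..node3 are placeholder host names):
#   offsets=$(for h in node1 node2 node3; do ssh "$h" date +%z; done)
#   all_same $offsets && echo "timezones match"
```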

If needed, run this, pick a timezone, and set the same one on all nodes (I personally use UTC):

To keep it synchronized, you can use NTP—that should be enough:

Make sure the service is running. If you’re in a VM, you might have to do this on the host OS.

Main settings (cassandra.yaml)

This is the main configuration file.

I’m using the instructions on this page as a starting point: http://www.datastax.com/documentation/cassandra/2.0/cassandra/configuration/configCassandra_yaml_r.html.

To get started:

Secure installation

Out of the box, Cassandra has authentication disabled (i.e. it lets everyone in) and gives every user full access to the system:

Let’s change it to:

There is also the authorizer setting, which you might want to set to CassandraAuthorizer (see http://www.datastax.com/documentation/cassandra/2.0/cassandra/security/secure_config_native_authorize_t.html):
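Assuming the stock cassandra.yaml, where these two settings are top-level keys named authenticator and authorizer, the change could be scripted roughly like this. enable_auth is a hypothetical helper name; editing the file by hand works just as well.

```shell
# Hypothetical helper: switch a cassandra.yaml file from the
# allow-everything defaults to password authentication and
# role-based authorization. A .bak copy of the file is kept.
enable_auth() {
    file=$1
    sed -i.bak \
        -e 's|^authenticator:.*|authenticator: PasswordAuthenticator|' \
        -e 's|^authorizer:.*|authorizer: CassandraAuthorizer|' \
        "$file"
}

# Usage (as root): enable_auth /etc/cassandra/cassandra.yaml
```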

Basic settings

Here are some basic settings, as explained in the DataStax documentation.

cluster_name

This setting prevents nodes in one logical cluster from joining another. All nodes in a cluster must have the same value in all datacenters.

listen_address

This is the IP address of your machines, which is used by Cassandra for listening to other Cassandra nodes.

For a single-node configuration, you can put localhost or leave empty.

For a cluster config, if your node is configured properly (host name, name resolution, etc.) you can leave this empty, and Cassandra will use Java’s InetAddress.getLocalHost() to get the local IP address automatically. You can probably check whether that would work using ifconfig: if you see your public IP address there, it should.

There are cases where you can’t leave this empty. For instance, Java might be unable to figure out the address if you’re on a virtual machine.

To hardcode an address:

seed_provider

I specified which machines are seeds (in my case I have a 3-node cluster, so I only have one seed, which is the first server I set up):

Snitch

For multi-node production deployment, the recommended snitch seems to be GossipingPropertyFileSnitch:

rpc_address

For rpc_address, I just put 0.0.0.0:
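The basic settings above could be applied with a small script instead of an editor. This is a sketch under assumptions: the helpers set_yaml_key and set_seeds are hypothetical, they only handle the simple one-line form these keys have in the stock cassandra.yaml, and the cluster name and addresses in the usage comment are placeholders.

```shell
# Hypothetical helper: replace a top-level "key: value" line.
# Values must not contain '|' or '&' (both are special to sed here).
set_yaml_key() {
    file=$1; key=$2; value=$3
    sed -i.bak "s|^${key}:.*|${key}: ${value}|" "$file"
}

# Hypothetical helper: replace the indented "- seeds:" line under
# seed_provider, keeping its original indentation.
set_seeds() {
    file=$1; seeds=$2
    sed -i.bak "s|^\( *- seeds:\).*|\1 \"${seeds}\"|" "$file"
}

# Usage (as root; names and addresses are placeholders):
#   conf=/etc/cassandra/cassandra.yaml
#   set_yaml_key "$conf" cluster_name "'MyCluster'"
#   set_yaml_key "$conf" listen_address 10.0.0.2
#   set_yaml_key "$conf" endpoint_snitch GossipingPropertyFileSnitch
#   set_yaml_key "$conf" rpc_address 0.0.0.0
#   set_seeds    "$conf" 10.0.0.1
```

Several seeds go in one comma-separated string, e.g. set_seeds "$conf" "10.0.0.1,10.0.0.4".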

Datacenter/rack settings

Now, we need to set the datacenter/rack settings (see http://www.datastax.com/documentation/cassandra/2.0/cassandra/initialize/initializeSingleDS.html for more info):

And you’ll see something like:
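With GossipingPropertyFileSnitch, these settings live in cassandra-rackdc.properties as a dc line and a rack line. A minimal sketch for writing that file follows; write_rackdc is a hypothetical helper, and DC1/RAC1 are just example names.

```shell
# Hypothetical helper: write the datacenter and rack names for this
# node, overwriting whatever the file contained before.
write_rackdc() {
    file=$1; dc=$2; rack=$3
    printf 'dc=%s\nrack=%s\n' "$dc" "$rack" > "$file"
}

# Usage (as root; DC1 and RAC1 are example names):
#   write_rackdc /etc/cassandra/cassandra-rackdc.properties DC1 RAC1
```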

These settings are up to you, i.e. it doesn’t have to be an actual datacenter. You can pick whatever you want, but you cannot change it later, so give it a good name :-)

cassandra-env.sh

Especially if you’re getting a “Failed to connect to '127.0.0.1:7199': Connection refused” error, you might have to change -Djava.rmi.server.hostname.

Just uncomment and put your IP address/domain name where it says <public name> (towards the end of the file):
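If you would rather script it, here is a sketch that assumes the stock file still contains the commented-out line with the <public name> placeholder. set_rmi_hostname is a hypothetical helper, and the address in the usage comment is a placeholder.

```shell
# Hypothetical helper: uncomment the java.rmi.server.hostname line in
# cassandra-env.sh and replace <public name> with the given host.
# Does nothing if the commented-out line is not present.
set_rmi_hostname() {
    file=$1; host=$2
    sed -i.bak \
        's|^# *\(JVM_OPTS="[$]JVM_OPTS -Djava.rmi.server.hostname=\)<public name>"|\1'"$host"'"|' \
        "$file"
}

# Usage (as root; the address is a placeholder):
#   set_rmi_hostname /etc/cassandra/cassandra-env.sh 10.0.0.2
```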

Reboot

IMPORTANT: after making any changes in the cassandra.yaml file, you must restart the node for the changes to take effect:

Start Cassandra

After rebooting, start Cassandra with:

Then, to check if it’s working run:

If it shows you a list of nodes (only 1 for now), everything is OK. Otherwise, something is wrong; see Troubleshooting below.

IMPORTANT: create admin user

You will be able to connect to the database using username ‘cassandra’, password ‘cassandra’, but obviously we’ll want to change this.

Follow these instructions: http://www.datastax.com/documentation/cassandra/2.0/cassandra/security/security_config_native_authenticate_t.html

Troubleshooting

If you run into problems, make sure Cassandra is running:

You can try stopping Cassandra, running the following command, and rebooting. Then start Cassandra again:

* could not access pidfile for Cassandra

I got this error a couple of times. The solution for me was to delete /run/cassandra/ and start it again, but with the instructions above it shouldn’t happen.

Rinse and repeat

Number of nodes

The cluster can be any number of machines. Some of them will be seeds, meaning that other machines will check with them to get info about the cluster. There should be more than one seed, so that if one goes down it’s not a big deal.

Ideally, I believe you want to have at least 6 nodes spread across 2 datacenters, 2 regular nodes and 1 seed per datacenter.

Configuration

You have to do all of the above for all the machines you want to use.

Double-check the timezone (see above)!

The seed is the same for every node and I’m using VMs, so I took a snapshot of what I had so far and restored the other 2 blank machines from it. Then I ran the following to refresh settings/token, changed the IP-specific settings, and rebooted:

IP-specific settings are:

listen_address in cassandra.yaml

hostname in cassandra-env.sh (if you’re setting up another seed node, you also have to add its address to “seeds” on all nodes)

Rock & roll

Getting the cluster up

Start the seed(s), then the other machines, and run:

You should see all machines in the cluster.
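To check this from a script rather than by eye, one option is to count the nodes reported as up and normal. This sketch assumes the nodetool status layout where each node line starts with a two-letter state such as UN; count_up_nodes is a hypothetical helper.

```shell
# Hypothetical helper: read `nodetool status` output on stdin and
# print how many nodes are in state UN (Up/Normal).
count_up_nodes() {
    awk '$1 == "UN" { n++ } END { print n + 0 }'
}

# Usage:
#   nodetool status | count_up_nodes
```

For the 3-node cluster described here, the expected count would be 3 once every node has joined.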

It works!

To test, try to create records as explained at the bottom of this page: https://wiki.apache.org/cassandra/GettingStarted.

It worked for me: I added and removed records and they were immediately synchronized across all machines, so that I could operate on the DB as if it were one.

Pretty cool.

Further reading

http://www.tomas.cat/blog/en/cassandra-frequent-mistakes/

The post Installing a Cassandra 2.0 cluster on Ubuntu Trusty appeared first on nbrogi.com.
