2015-04-15

Introduction.

This article explores the process of installing HA OpenNebula with Ceph as the datastore on three nodes (disks: 6x 240 GB SSD, backend network: IPoIB, OS: CentOS 7), plus one additional node used for backup.

Scheme of the equipment:

[diagram: three kosmo-virt virtualization nodes and the kosmo-arch backup node]

We use this solution to virtualize our imagery processing servers.

Preparing.

All actions should be performed on all nodes; on kosmo-arch, perform everything except bridge-utils and the FrontEnd network configuration.

FrontEnd network.

Configure bond0 (mode 0) and run the script below to create the frontend bridge interface for the VMs (OpenNebula).
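The original script is not shown here; a minimal sketch of the idea using CentOS 7 network-scripts, assuming the bridge name br0 and a frontend address in the 192.168.14.0 network (both placeholders):

# /etc/sysconfig/network-scripts/ifcfg-bond0 (no IP of its own, enslaved to the bridge)
DEVICE=bond0
TYPE=Bond
BONDING_OPTS="mode=0 miimon=100"
ONBOOT=yes
BOOTPROTO=none
BRIDGE=br0

# /etc/sysconfig/network-scripts/ifcfg-br0 (frontend bridge used by the VMs)
DEVICE=br0
TYPE=Bridge
ONBOOT=yes
BOOTPROTO=none
IPADDR=192.168.14.X    # X = node number (assumption)
PREFIX=24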

BackEnd network. Configuration of IPoIB:

Enable IPoIB and switch InfiniBand to connected mode. This link describes the differences between connected and datagram modes.
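For example (a sketch; interface names ib0/ib1 are assumed):

# once the ibX interfaces are up, switch them to connected mode and raise the MTU
echo connected > /sys/class/net/ib0/mode
echo connected > /sys/class/net/ib1/mode
ip link set ib0 mtu 65520
ip link set ib1 mtu 65520
# to make it persistent, set CONNECTED_MODE=yes and MTU=65520 in the ifcfg-ibX files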

Start the InfiniBand services.

Check that everything is working:
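Typical checks (a sketch; device names vary):

ibstat                                        # port state should be Active, physical state LinkUp
cat /sys/class/infiniband/*/ports/*/state     # expect "4: ACTIVE"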

Set up bond1 (mode 1) over the two IB interfaces and assign the IP 172.19.254.X, where X is the node number. Example below:
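A sketch of the ifcfg files (everything except the 172.19.254.X address is an assumption):

# /etc/sysconfig/network-scripts/ifcfg-bond1
DEVICE=bond1
TYPE=Bond
BONDING_OPTS="mode=1 miimon=100"
ONBOOT=yes
BOOTPROTO=none
IPADDR=172.19.254.X
PREFIX=24
MTU=65520

# /etc/sysconfig/network-scripts/ifcfg-ib0 (same for ib1)
DEVICE=ib0
ONBOOT=yes
BOOTPROTO=none
MASTER=bond1
SLAVE=yes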

Disable the firewall:
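On CentOS 7 that means:

systemctl stop firewalld
systemctl disable firewalld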

Tuning sysctl.

Installing Ceph.

Preparation

Configure passwordless SSH access between the nodes for the root user. The key should be created on one node and then copied to /root/.ssh/ on the others.
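For example (host names as in /etc/hosts below):

ssh-keygen -t rsa -N '' -f /root/.ssh/id_rsa        # on kosmo-virt1
for h in kosmo-virt2 kosmo-virt3 kosmo-arch; do
    ssh-copy-id root@$h
    scp /root/.ssh/id_rsa /root/.ssh/id_rsa.pub root@$h:/root/.ssh/
done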

Disable SELinux on all nodes:

setenforce 0
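And make it persistent across reboots:

sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config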

Add max open files limits to /etc/security/limits.conf on all nodes (adjust to your requirements):

* hard nofile 1000000

* soft nofile 1000000

Setup /etc/hosts on all nodes:
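Something like the following, using the backend addresses (the kosmo-arch address is an assumption):

172.19.254.1  kosmo-virt1
172.19.254.2  kosmo-virt2
172.19.254.3  kosmo-virt3
172.19.254.4  kosmo-arch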

Installing

Install a kernel newer than 3.15 on all nodes (needed for the CephFS kernel client).

Set up new kernel for booting.
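One way to do this is the kernel-ml package from ELRepo (a sketch; the exact elrepo-release RPM version is an assumption):

rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
rpm -Uvh http://www.elrepo.org/elrepo-release-7.0-2.el7.elrepo.noarch.rpm
yum --enablerepo=elrepo-kernel install -y kernel-ml
grub2-set-default 0                        # boot the newest installed kernel
grub2-mkconfig -o /boot/grub2/grub.cfg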

Reboot.

Set up repository: (on all nodes)

Import the GPG key: (on all nodes)
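A sketch of the repository and key steps, assuming the upstream Giant repository (the URLs reflect how ceph.com was laid out at the time and are an assumption):

cat > /etc/yum.repos.d/ceph.repo <<EOF
[ceph]
name=Ceph packages
baseurl=http://ceph.com/rpm-giant/el7/x86_64/
enabled=1
gpgcheck=1
gpgkey=https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc
EOF

rpm --import 'https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc'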

Setup ntpd. (on all nodes)

Edit /etc/ntp.conf and start ntpd. (on all nodes)
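For example:

yum -y install ntp
# point /etc/ntp.conf at your preferred servers, e.g.
#   server 0.centos.pool.ntp.org iburst
systemctl enable ntpd
systemctl start ntpd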

Install: (on all nodes)
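Assuming the ceph-deploy workflow used in the deployment sketches below:

yum -y install ceph ceph-deploy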

Deploying.

MON deploying: (on kosmo-virt1)
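If ceph-deploy is used, a minimal sketch would be:

cd /etc/ceph
ceph-deploy new kosmo-virt1 kosmo-virt2 kosmo-virt3   # generates ceph.conf with the initial monitors
ceph-deploy mon create-initial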

OSD deploying:

(on kosmo-virt1)

(on kosmo-virt2)

(on kosmo-virt3)

where sd[b-g] – SSD disks.
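With ceph-deploy, the per-node commands would look roughly like this (shown for kosmo-virt1; repeat with the other hostnames):

for d in sdb sdc sdd sde sdf sdg; do
    ceph-deploy disk zap kosmo-virt1:$d
    ceph-deploy osd create kosmo-virt1:$d
done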

MDS deploying:

The new Giant release of Ceph does not create the default data and metadata OSD pools.

Use ceph osd lspools to check.

Check pool id of data and metadata with

Configure FS

where 4 is the ID of the metadata pool and 3 is the ID of the data pool.
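A sketch of these steps (pool names, PG counts and the resulting IDs are environment-specific):

ceph osd pool create data 512
ceph osd pool create metadata 512
ceph osd lspools                           # note the IDs, e.g. "... 3 data,4 metadata,"
ceph mds newfs 4 3 --yes-i-really-mean-it  # <metadata pool id> <data pool id>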

Configure MDS

(on kosmo-virt1)

(on kosmo-virt2)

(on all nodes)
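With ceph-deploy this could be (a sketch):

ceph-deploy mds create kosmo-virt1 kosmo-virt2
# then restart Ceph on all nodes, e.g.
service ceph restart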

Configure kosmo-arch.

Copy /etc/ceph/ceph.conf and /etc/ceph/ceph.client.admin.keyring from any of the kosmo-virt nodes to kosmo-arch.

Preparing Ceph for OpenNebula.

Create pool:

Set up authorization for the 'one' pool:

Get key from keyring:

Checking:
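A sketch of these four steps (the pool name 'one', the user 'oneadmin' and a capability set along these lines follow the OpenNebula Ceph datastore guide; the PG count is an assumption):

ceph osd pool create one 128
ceph auth get-or-create client.oneadmin mon 'allow r' osd 'allow rwx pool=one' \
    -o /etc/ceph/ceph.client.oneadmin.keyring
ceph auth get-key client.oneadmin > /etc/ceph/oneadmin.key
rbd ls -p one --id oneadmin                # should return without errors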

Copy /etc/ceph/ceph.client.oneadmin.keyring and /etc/ceph/oneadmin.key to the second node.

Preparing for OpenNebula HA

Configuring MariaDB cluster

Configure MariaDB cluster on all nodes except kosmo-arch

Setup repo:

Install:

Start the service:

Prepare for the cluster:
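A consolidated sketch of the four steps above, assuming the MariaDB 10.0 Galera packages from yum.mariadb.org:

cat > /etc/yum.repos.d/mariadb.repo <<EOF
[mariadb]
name = MariaDB
baseurl = http://yum.mariadb.org/10.0/centos7-amd64
gpgkey = https://yum.mariadb.org/RPM-GPG-KEY-MariaDB
gpgcheck = 1
EOF

yum -y install MariaDB-Galera-server MariaDB-client galera rsync
service mysql start
mysql_secure_installation        # set the root password, drop the test DB, etc.
service mysql stop               # stop before adding the wsrep configuration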

configuring cluster: (for kosmo-virt1)

(for kosmo-virt2)

(for kosmo-virt3)

(on kosmo-virt1)

(on kosmo-virt2)

(on kosmo-virt3)
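A sketch of the Galera configuration (in /etc/my.cnf.d/server.cnf) and of the start order; the cluster name and SST method are assumptions, the addresses are the backend IPs from above:

[galera]
wsrep_provider=/usr/lib64/galera/libgalera_smm.so
wsrep_cluster_name="kosmo"
wsrep_cluster_address="gcomm://172.19.254.1,172.19.254.2,172.19.254.3"
wsrep_node_address=172.19.254.X     # X = node number
wsrep_sst_method=rsync
binlog_format=ROW
default_storage_engine=InnoDB
innodb_autoinc_lock_mode=2
bind-address=0.0.0.0

Then bootstrap the first node and join the others:

/etc/init.d/mysql start --wsrep-new-cluster   # on kosmo-virt1 only
service mysql start                           # on kosmo-virt2 and kosmo-virt3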

check on all nodes:
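For example:

mysql -u root -p -e "SHOW STATUS LIKE 'wsrep%';"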

| Variable_name | Value |
| wsrep_local_state_uuid | 739895d5-d6de-11e4-87f6-3a3244f26574 |
| wsrep_protocol_version | 7 |
| wsrep_last_committed | 0 |
| wsrep_replicated | 0 |
| wsrep_replicated_bytes | 0 |
| wsrep_repl_keys | 0 |
| wsrep_repl_keys_bytes | 0 |
| wsrep_repl_data_bytes | 0 |
| wsrep_repl_other_bytes | 0 |
| wsrep_received | 6 |
| wsrep_received_bytes | 425 |
| wsrep_local_commits | 0 |
| wsrep_local_cert_failures | 0 |
| wsrep_local_replays | 0 |
| wsrep_local_send_queue | 0 |
| wsrep_local_send_queue_max | 1 |
| wsrep_local_send_queue_min | 0 |
| wsrep_local_send_queue_avg | 0.000000 |
| wsrep_local_recv_queue | 0 |
| wsrep_local_recv_queue_max | 1 |
| wsrep_local_recv_queue_min | 0 |
| wsrep_local_recv_queue_avg | 0.000000 |
| wsrep_local_cached_downto | 18446744073709551615 |
| wsrep_flow_control_paused_ns | 0 |
| wsrep_flow_control_paused | 0.000000 |
| wsrep_flow_control_sent | 0 |
| wsrep_flow_control_recv | 0 |
| wsrep_cert_deps_distance | 0.000000 |
| wsrep_apply_oooe | 0.000000 |
| wsrep_apply_oool | 0.000000 |
| wsrep_apply_window | 0.000000 |
| wsrep_commit_oooe | 0.000000 |
| wsrep_commit_oool | 0.000000 |
| wsrep_commit_window | 0.000000 |
| wsrep_local_state | 4 |
| wsrep_local_state_comment | Synced |
| wsrep_cert_index_size | 0 |
| wsrep_causal_reads | 0 |
| wsrep_cert_interval | 0.000000 |
| wsrep_incoming_addresses | 172.19.254.1:3306,172.19.254.3:3306,172.19.254.2:3306 |
| wsrep_evs_delayed | |
| wsrep_evs_evict_list | |
| wsrep_evs_repl_latency | 0/0/0/0/0 |
| wsrep_evs_state | OPERATIONAL |
| wsrep_gcomm_uuid | 7397d6d6-d6de-11e4-a515-d3302a8c2342 |
| wsrep_cluster_conf_id | 2 |
| wsrep_cluster_size | 2 |
| wsrep_cluster_state_uuid | 739895d5-d6de-11e4-87f6-3a3244f26574 |
| wsrep_cluster_status | Primary |
| wsrep_connected | ON |
| wsrep_local_bf_aborts | 0 |
| wsrep_local_index | 0 |
| wsrep_provider_name | Galera |
| wsrep_provider_vendor | Codership Oy <info@codership.com> |
| wsrep_provider_version | 25.3.9(r3387) |
| wsrep_ready | ON |
| wsrep_thread_count | 2 |

Creating user and database:
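For example (the database and user names are assumptions that must match oned.conf later; pick your own password):

mysql -u root -p
CREATE DATABASE opennebula;
GRANT ALL PRIVILEGES ON opennebula.* TO 'oneadmin'@'%' IDENTIFIED BY 'oneadmin_db_password';
FLUSH PRIVILEGES;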

Remember: if all nodes go down, the most up-to-date node must be started with /etc/init.d/mysql start --wsrep-new-cluster, so you have to find that node first. If you bootstrap from a node with an outdated view, the other nodes will fail with an error like this in their logs: [ERROR] WSREP: gcs/src/gcs_group.cpp:void group_post_state_exchange(gcs_group_t*)():319: Reversing history: 0 → 0, this member has applied 140536161751824 more events than the primary component. Data loss is possible. Aborting.

Configuring HA cluster

Unfortunately, the pcs cluster tooling conflicts with the OpenNebula server. That's why we will go with pacemaker, corosync and crmsh.

Installing HA

Set up repo on all nodes except kosmo-arch:

Install on all nodes except kosmo-arch:
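A sketch, assuming crmsh comes from the openSUSE network:ha-clustering:Stable repository (the usual source for crmsh on CentOS 7 at the time; the URL is an assumption):

wget -O /etc/yum.repos.d/ha-clustering.repo \
  http://download.opensuse.org/repositories/network:/ha-clustering:/Stable/CentOS_CentOS-7/network:ha-clustering:Stable.repo
yum -y install pacemaker corosync crmsh resource-agents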

On kosmo-virt1 create configuration

and create authkey on kosmo-virt1
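A minimal /etc/corosync/corosync.conf sketch (udpu transport over the backend network is an assumption), plus the key generation:

totem {
    version: 2
    cluster_name: kosmo
    transport: udpu
}
nodelist {
    node { ring0_addr: 172.19.254.1 }
    node { ring0_addr: 172.19.254.2 }
    node { ring0_addr: 172.19.254.3 }
}
quorum {
    provider: corosync_votequorum
}
logging {
    to_syslog: yes
}

corosync-keygen        # creates /etc/corosync/authkey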

Copy corosync and authkey to kosmo-virt2 and kosmo-virt3

Enabling (on all nodes except kosmo-arch):

Starting (on all nodes except kosmo-arch):

Checking:
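On CentOS 7 these three steps boil down to:

systemctl enable corosync pacemaker    # enabling
systemctl start corosync pacemaker     # starting
crm status                             # checking: all three nodes should be online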

Add properties:
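For example (disabling STONITH is an assumption commonly made in setups without fencing hardware):

crm configure property stonith-enabled=false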

Installing OpenNebula

Installing

Setup repo on all nodes except kosmo-arch:

Installing (on all nodes except kosmo-arch):
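A sketch of the repo and install steps, assuming the 4.12 branch that was current when this was written:

cat > /etc/yum.repos.d/opennebula.repo <<EOF
[opennebula]
name=opennebula
baseurl=http://downloads.opennebula.org/repo/4.12/CentOS/7/x86_64/
enabled=1
gpgcheck=0
EOF

yum -y install opennebula-server opennebula-sunstone opennebula-node-kvm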

Ruby Runtime Installation:
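The opennebula-server package ships a helper script for this:

/usr/share/one/install_gems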

Change the oneadmin password:
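The OpenNebula credentials live in oneadmin's one_auth file; a sketch (choose your own password and keep it identical on all nodes):

su - oneadmin -c 'echo "oneadmin:your_password" > /var/lib/one/.one/one_auth'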

Create passwordless SSH access for oneadmin (on kosmo-virt1):
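For example:

su - oneadmin
ssh-keygen -t rsa -N '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys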

Copy to other nodes (remember that oneadmin home directory is /var/lib/one).

Change the listen address for sunstone-server (on all nodes):
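A sketch; the default config binds Sunstone to 127.0.0.1 only:

sed -i 's/:host: 127.0.0.1/:host: 0.0.0.0/' /etc/one/sunstone-server.conf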

on kosmo-virt1:

copy all /var/lib/one/.one/*.auth and one.key files to OTHER_NODES:/var/lib/one/.one/

Start and then stop the services on kosmo-virt1:
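A sketch using the service names from the CentOS packages; they are started once to initialise everything and will later be handed over to pacemaker:

systemctl start opennebula opennebula-sunstone opennebula-novnc
# ... check the logs and the web interface (next steps), then:
systemctl stop opennebula opennebula-sunstone opennebula-novnc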

Try to connect to http://node:9869.

Check logs for errors (/var/log/one/oned.log /var/log/one/sched.log /var/log/one/sunstone.log).

If no errors:

Add ceph support for qemu-kvm for all nodes except kosmo-arch

If there is no rbd support, then you have to compile and install it:

Download:

Compiling.

Change %define rhev 0 to %define rhev 1.

Installing (for all nodes except kosmo-arch).

Check for ceph support.
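For example:

qemu-img --help | grep rbd     # "rbd" should appear in the list of supported formats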

Try to write image (for all nodes except kosmo-arch):

where N is the node number.
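A hypothetical test (the image name is arbitrary; this assumes the admin keyring on the node is readable):

qemu-img create -f rbd rbd:one/test-N 1G
rbd -p one ls
rbd -p one rm test-N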

Add ceph support for libvirt

On all nodes:

On kosmo-virt1 create uuid:

Create secret.xml

where AQDp1aqz+JPAJhAAIcKf/Of0JfpJRQvfPLqn9Q== is the content of /etc/ceph/oneadmin.key (cat /etc/ceph/oneadmin.key).
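A sketch covering both steps; the UUID is whatever uuidgen returns and must be reused everywhere below:

uuidgen                                   # e.g. prints the UUID to use
cat > secret.xml <<EOF
<secret ephemeral='no' private='no'>
  <uuid>PASTE-THE-UUID-HERE</uuid>
  <usage type='ceph'>
    <name>client.oneadmin secret</name>
  </usage>
</secret>
EOF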

Copy secret.xml to other nodes.

Add key to libvirt (for all nodes except kosmo-arch)
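For example:

virsh secret-define --file secret.xml
virsh secret-set-value --secret PASTE-THE-UUID-HERE --base64 $(cat /etc/ceph/oneadmin.key)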

check

Restart libvirtd:

Converting the database to MySQL:

Downloading script:

Converting:

Change the DB section of /etc/one/oned.conf from the sqlite backend to mysql:
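The stock stanza and a mysql example (credentials must match the database created earlier; adjust to your own values):

# from:
DB = [ backend = "sqlite" ]

# to:
DB = [ backend = "mysql",
       server  = "localhost",
       port    = 0,
       user    = "oneadmin",
       passwd  = "oneadmin_db_password",
       db_name = "opennebula" ]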

Copy oned.conf to other nodes as root except kosmo-arch.

Check kosmo-virt2 and kosmo-virt3 nodes in turn:

check logs for errors (/var/log/one/oned.log /var/log/one/sched.log /var/log/one/sunstone.log)

Creating HA resources

On all nodes except kosmo-arch:

From any of the nodes except kosmo-arch:
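A sketch of possible resources with crmsh (the VIP on the frontend network is an assumption; the official OpenNebula HA guide defines equivalent resources with pcs):

crm configure primitive Cluster_VIP ocf:heartbeat:IPaddr2 \
    params ip=192.168.14.100 cidr_netmask=24 op monitor interval=30s
crm configure primitive opennebula_p systemd:opennebula op monitor interval=60s
crm configure primitive sunstone_p systemd:opennebula-sunstone op monitor interval=60s
crm configure group opennebula_cluster Cluster_VIP opennebula_p sunstone_p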

Check

Configuring OpenNebula

http://active_node:9869 – web management.

With the web interface: 1. Create a cluster. 2. Add the hosts (using the 192.168.14.0 network).

Console management.

3. Add net. (su oneadmin)

4. Create image rbd datastore. (su oneadmin)
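A sketch of an image datastore template for the Ceph driver (CEPH_SECRET is the libvirt secret UUID created above; the names are placeholders):

cat > ceph_image_ds.txt <<EOF
NAME        = "ceph_img"
DS_MAD      = ceph
TM_MAD      = ceph
DISK_TYPE   = RBD
POOL_NAME   = one
CEPH_USER   = oneadmin
CEPH_SECRET = "PASTE-THE-UUID-HERE"
CEPH_HOST   = "kosmo-virt1:6789 kosmo-virt2:6789 kosmo-virt3:6789"
BRIDGE_LIST = "kosmo-virt1 kosmo-virt2 kosmo-virt3"
EOF
onedatastore create ceph_image_ds.txt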

5. Create system ceph datastore.

Check the last datastore ID number – N.

On all nodes, create the directory and mount CephFS:

where K is the IP of the current node.
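A sketch (N is the datastore ID from the previous step; looking up the admin key this way is one option for passing the secret):

mkdir -p /var/lib/one/datastores/N
mount -t ceph 172.19.254.K:6789:/ /var/lib/one/datastores/N \
      -o name=admin,secret=$(ceph auth get-key client.admin)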

From one node, change the permissions:

Create system ceph datastore (su oneadmin):
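A sketch of the system datastore template (shared TM driver, since the datastore directory is the CephFS mount):

cat > ceph_system_ds.txt <<EOF
NAME    = "ceph_system"
TM_MAD  = shared
TYPE    = SYSTEM_DS
EOF
onedatastore create ceph_system_ds.txt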

6. Add the nodes, vnets and datastores to the created cluster in the web interface.

HA VM

Here is official doc.

One comment, though: I use the migrate command instead of recreate.

BACKUP

Some words about backup.

Use the persistent image type for this backup scheme.

For backup, a single Linux server, kosmo-arch (a Ceph client), with ZFS on Linux installed was used. Deduplication was enabled on the zpool. (Remember that deduplication requires about 2 GB of memory per 1 TB of storage space.)

An example of a simple script started from cron:
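The original script is not reproduced here; a hypothetical minimal version that snapshots a persistent image's RBD volume and exports it to the ZFS pool could look like this (pool, image name and target path are placeholders):

#!/bin/bash
POOL=one
IMAGE=one-10                         # RBD volume of the persistent image (see onevm/oneimage)
SNAP=backup-$(date +%F)
DEST=/backup/${IMAGE}-${SNAP}.img    # directory on the ZFS pool

rbd snap create ${POOL}/${IMAGE}@${SNAP}
rbd export ${POOL}/${IMAGE}@${SNAP} ${DEST}
rbd snap rm ${POOL}/${IMAGE}@${SNAP}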

Use the onevm utility or the web interface (see the VM template) to find out which image is assigned to a VM.

PS

Don't forget to change the VM storage driver to virtio (vda device; Windows guests need the virtio drivers). Without that you will face low I/O performance (no more than 100 MB/s).

With the virtio drivers I saw 415 MB/s.

Links.

1. Official OpenNebula documentation

2. Official Ceph documentation

3. Converting SQLite to MySQL

4. Converting the OpenNebula DB to MySQL

5. HA on RHEL 7

6. Cluster with crmsh
