Mysqlperformanceblog.com

Webinar Q/A: MySQL High Availability with Percona XtraDB Cluster 5.7

2016-11-22

In this blog I will provide answers to the questions and queries that some of you have raised during the webinar on Nov 17th.

I would like to say thank you to all of the audience who attended the talk on November 17, 2016. You can also check the recording and slides here.

Q. How is storage distribution done across the node?

A. Each node has independent storage and other resources. There is no sharing of resource. Only the write-sets are replicated.

Q. If write-set propagation fails in some manner is there any retry mechanism?

A. write-set are written to group channel and originating node waits for ack from all the nodes of the cluster. If some nodes fails to respond back then it may be loose its cluster membership. Each node needs to consume all write-sets and in given order only.

Q. Normally, we point only to one write node, can we point in Percona XtraDB Cluster 5.7 to two writing nodes balanced ? Or should the solution be ProxySQL ?

A. Percona XtraDB Cluster (PXC) being multi-master you can execute writes on multiple-nodes. (This is possible even with 5.6). ProxySQL will help you load-balance your traffic but facility to write to any node is inherent to PXC.

Q. Which service call does a joining node have to be to get cluster membership? Is there some kind of registry service?

A. There is no special registry service. This is transparent to the end-user and is handled as part of gcomm communication layer.

Q. Would it be possible to get more information about setting up proxy-sql as we are currently using haproxy but would like a more aware balancer.

A. These articles should help:

https://www.percona.com/doc/percona-xtradb-cluster/LATEST/howtos/proxysql.html

https://www.percona.com/blog/2016/09/15/proxysql-percona-cluster-galera-integration/

Q. Is there a recommended setup for Cluster (White Paper)? I did hear a lot of conflict issues between nodes. So I would like to see if there is a recommended setup.

A. There is not a single way to do this but there are a lot of blogs based on your use-case. Simplest one is 3 node cluster in LAN. Conflicts generally happens if user tend to update same data through multiple nodes. Dis-joint workload distribution will help avoid conflict. Said that if conflicts are inherent part of application or workload Percona XtraDB Cluster (PXC) is well armed to handle it.

Q. What is best way to figure out timeouts for geo clusters?

A. Studying latency and ensuring timeout > latency.

Q. Lets say we are running Percona XtraDB Cluster 5.6 version with 2 cluster. Can i join new node with latest version of Percona XtraDB Cluster 5.7?

A. This scenario is possible as part of Percona XtraDB Cluster (PXC) support rolling upgrade a new node demanding SST from 5.6 node will surely not work. Also, this should be a temporary solution with plan for full upgrade not something you want to continue working with.

Q. Currently i am using Percona XtraDB Cluster 5.6. Mostly i am facing a deadlock situation. When insert query is running on big table. Then Percona trys to synch with another node. At that time ant dml query won’t be executed. So at that time i need to shutdown another node. Then query execution will be fine. Then i need to start another node one by node. I even changed may Gelera/percona wrep_xx configuration, but it did not work. So is this kind of issue solved in Percona XtraDB Cluster 5.7?

A. I am not sure I understood the complete setup but let me try to summarize my understanding. You have DML running on node-1 that is replication to node-2 and node-2 workload is trying to touch the same big-table that is getting replicated write-set. Local transaction may face a abort as replicated transaction always take priority over local running transaction. There shouldn’t be a need to shutdown any of the node. If you still face this problem you can file the detailed report on lp or forum. We can discuss what is going wrong.

Q. I need to make DR platform. which replication will be suitable for this. Do i need to upgrade with Percona XtraDB Cluster 5.7 at DR side or Replication manager requires?

A. For DR you can either use extended cluster so that DR site get instant write-set or setup a new cluster and enable cluster-cluster replication using MySQL MASTER-MASTER async replication. (Given DR one way MASTER-SLAVE should also work). You don’t need to upgrade it but it is better to use consistent and updated version for all node especially mix-match of MASTER-SLAVE may have compatibility issue.

Q. What are the major differences/benefits between Percona XtraDB Cluster 5.7 and MariaDB Cluster with Galera ?

A. Percona XtraDB Cluster (PXC) is 5.7 GA. MariaDB 10.2 is proposed to be GA by Dec 2016. Besides this PXC is fully PS compatible that uses XtraDB engine and there are some small functional/usage difference and stability difference.

Q. How much time a node can be out of a cluster and still can rejoin applying writesets ? How is managed writesets retention ?

A. Time node can be offline without need for SST depends on 2 factors: rate of replicating transaction (including size) and size of galera-cache that caches these write-sets. If you think you need longer offline time and then you should set galera cache accordingly.

Q. Can we have a sample config file for geo-clusters?

A. We will try to come up with one in due-course through an upcoming blog. In the meantime, you can look at existing blogs on the Percona Database Performance blog.

Q. Whats is the limit for max_rows and max_tnx_size in Percona XtraDB Cluster (PXC) 5.7..specially for batch datalaods across multi-region cluster nodes

A. wsrep_max_ws_rows (DEFAULT 0: no limit, max: 1048576). wsrep_max_ws_size (DEFAULT: 2G, range: 1024, 2G)

Q: Does Percona XtraDB Cluster (PXC) support MySQL’s GTIDs?

A. Yes. But for Percona XtraDB Cluster (PXC) replication it uses its own GTID. This blog will help clear confusion.

Q. How does Percona XtraDB Cluster (PXC) compare to MySQL’s Group Replication?

A. Both are trying to solve the same problem, except Percona XtraDB Cluster (PXC) is matured and has been in market for quite sometime. GR is being built.

Q. Does Percona XtraDB Cluster (PXC) have a size limitations? I recently tried to setup a 2TB PXC cluster, however, during load tests there were a few instances where one node got out of sync. The server did a full copy of the data, but could not complete because the load tests kept filling up the gcache.

A. There is no such known limitation. Generally if the node received queue fills up then it will emit a FLOW CONTROL signal. Generally you will receive a queue that is small enough not to fill up gcache. If you still have log files you can share them through LP or forum. We will try to look at them.

Q. How do you perform a major version upgrade. Per MySQL’s documentation, you can not replicate from a major version to the last major version. But it is fine to replicate from one major version to the next. So how would you do this in the cluster?

A. As per MySQL you may face issues if you try to replicate from lower version (master in 5.6) to higher version slave (slave in 5.7) but it is not blocked. Some of the semantics may be different. Percona XtraDB Cluster (PXC) write-sets are different though as it shares binlog events and this write-set format has not changed in 5.7.

Q. Does Galera set a max number of nodes that can be part of the cluster?

A. No such realistic limitation.

Q. Are there docker images with this configured? Dockerhub or something?

A. This should help.

Q. What is the maximum latency that would be supported on the LAN before you would say that running a Percona XtraDB Cluster is not a good idea?

A. I guess this is configurable based on timeout. So there is no such recommended latency threshold for LAN. Lesser the better.

Q. When you start a cluster and bootstrap Node 1, then start Node 2 and Node 3. If you restart Node 1, it will rejoin the cluster but not has a bootstrap state, but it does not matter because it will join a live cluster. If my understanding is correct Bootstrap only matter for the first node starting Is that correct ? What would happens if node 1 restart with bootstrap option, will it force the other node to sync against it ? will it join the running cluster?

A. When you start node-1 for the first time it will create a new cluster and node-2 and node-3 will join the existing cluster. Depending on how node-1 is restarted it can join the existing cluster or create one more independent cluster. Recommended way is to use a valid value of wsrep_cluster_address for all nodes and just pass following extra param –wsrep_new_cluster to the bootstrap node. If you happen to restart this node avoid passing this param. The node will try to join the existing cluster.

Q. What is the overhead of running Percona Monitoring and Management (PMM)

A. Percona Monitoring and Management (PMM) installs an agent on the node to collect a lot of other statistics. From Percona XtraDB Cluster (PXC) perspective it will only run to show a status, so pretty lightweight for PXC.

Q. Is it easy (any procedure) to move from codership galera to Percona XtraDB Cluster (PXC)?

A. I don’t think there is blog about it but they are fully compatible so moving should be easy. I will findout if there is set process for this.

Q. Where is the documentation for Cluster Safe Mode and other new features discussed here?

A. pxc_strict_mode. for PFS you can check this out. ProxySQL and Percona Monitoring and Management (PMM) has blog too.

Q. Is there some integrity issues that a client believes a node is up while this one has lost the cluster ?

A. No known issue.

Q. Is there any limit of running a huge number of databases ? Say several millions ?

A. No known issue.

Q. How are the performance of proxy sql compared with ha proxy?

A. You can check this out.

Q. We use Nagios for monitoring, will a plug-in be added for monitoring the cluster, or will it be only Percona Monitoring and Management (PMM)?

A. Check this out.

Q. “Cross data center replication”. We have two data centers that have a ping latency of 2ms (consistent) and I would like to replicate between the two for DR (disaster recovery) purposes.

A. 2 ms latency between 2 DC and consistent network sounds pretty good. Just tune timeout and things will work.

Q. Do you guys have a sample config files for a quick spin off of a 3 node cluster?

A. This should help.

Q. i see that there is added features like pam authentication,thraed pool which is given fro free in percona can you elobrate on it

A. Percona XtraDB Cluster (PXC) is PS compatible. So any feature that is present in PS will be part of Percona XtraDB Cluster (PXC).

Q. In the example that you showed, where you had a 6 node cluster , where 3 was in Site A and 3 was in Site B. If the WAN link goes down, how does the cluster determine what data set is the master set, once the wan link comes back up after a few hours?

A. In the example I have used 2 DCs. Recommended is to use 3 DCs to avoid split-brain. If you have 6 nodes in 2 DCs and WAN link goes off it will create split-brain and no node will accept workload unless user set weight to form quorum or re-bootstrap primary.

I hope I was able to answer most of the questions/queries. If you happen to have follow-up questions please post them on forum.