# Deploying a high availability solution with Group Replication

This document provides step-by-step instructions on how to deploy a high availability solution with Group Replication.

## Preconditions

We will use the following elements:

* 1 Virtual IP for ProxySQL failover - 192.168.4.194
* 2 ProxySQL nodes
    - Proxy1 192.168.4.191
    - Proxy2 192.168.4.192
* 4 MySQL nodes in Single Primary mode
    * Gr1 192.168.4.81 - Initial Primary
    * Gr2 192.168.4.82 - Replica / failover
    * Gr3 192.168.4.83 - Replica / failover
    * Gr4 192.168.4.84 - Replica / Backup

All of the following ports must be open if a firewall is in place, or if any other restriction such as AppArmor or SELinux is active:

* ProxySQL:
    * 6033
    * 6032
    * 3306
* MySQL - Group Replication:
    * 3306
    * 33060
    * 33061

## Nodes configuration

### Preparation

1. Install the Percona Server-based variant of Percona Distribution for MySQL on each MySQL node (Gr1-Gr4).

2. Make sure that all the nodes use the same time zone and time:

    ```{.bash data-prompt="$"}
    $ date
    Tue Aug 18 08:22:12 EDT 2020
    ```

3. Also check that the `ntpd` service is present and enabled.

4. Make sure that each node resolves the other nodes by name:

    ```sh
    for i in 1 2 3 4 ; do ping -c 1 gr$i > /dev/null;echo $?; done
    ```

    If the nodes are not able to resolve each other by name, add the entries to the `/etc/hosts` file.

5. After the instances are up and running, check the *Percona Server for MySQL* version on each node:

    ```{.bash data-prompt="mysql>"}
    mysql> \s
    --------------
    /opt/mysql_templates/PS-8P/bin/mysql  Ver 8.0.20-11 for Linux on x86_64 (Percona Server (GPL), Release 11, Revision 159f0eb)
    ```

### Step 1. Create an administration user

1. Create a user for administration. We will use the user `dba` in our setup:

    ```sql
    CREATE user dba@localhost identified by 'dbapw';
    CREATE user dba@'192.168.%' identified by 'dbapw';

    GRANT ALL on *.* to dba@localhost with grant option;
    GRANT ALL on *.* to dba@'192.168.%' with grant option;
    ```

    Log out from the client as the root user and log in as the `dba` user.

2. Make sure that each node has a unique `server_id` value:

    ```{.bash data-prompt="mysql>"}
    mysql> show global variables like 'server_id';
    +---------------+-------+
    | Variable_name | Value |
    +---------------+-------+
    | server_id     |     1 |
    +---------------+-------+
    1 row in set (0.01 sec)
    ```

    The `server_id` value must be unique on each node.

### Step 2. Add Group Replication settings

1. Stop all the nodes:

    ```{.bash data-prompt="$"}
    $ service mysql stop
    ```

2. In the `my.cnf` configuration file, add the following:

    ```
    #####################
    # Replication + binlog settings
    #####################
    auto-increment-increment               =1
    auto-increment-offset                  =1
    log-bin                                =/binlog
    log-bin-index                          =binlog.index
    binlog-checksum                        =NONE
    binlog-format                          =ROW
    binlog-row-image                       =FULL
    log-slave-updates                      =1
    binlog-transaction-dependency-tracking =WRITESET_SESSION
    enforce-gtid-consistency               =TRUE
    gtid-mode                              =ON
    master-info-file                       =master.info
    master-info-repository                 =TABLE
    relay_log_info_repository              =TABLE
    relay-log                              =/relay
    sync-binlog                            =1

    ### SLAVE SECTION
    skip-slave-start
    slave-parallel-type                    =LOGICAL_CLOCK
    slave-parallel-workers                 =4
    slave-preserve-commit-order            =1

    ######################################
    # Group Replication
    ######################################
    plugin_load_add                        ='group_replication.so'
    plugin-load-add                        ='mysql_clone.so'
    group_replication_group_name           ="aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa" # <-- Not a good value. Use something that helps you identify the GR transactions and where they come from, e.g. "dc1euz1-aaaa-aaaa-aaaa-aaaaaaaaaaaa"
    group_replication_start_on_boot        =off
    group_replication_local_address        ="192.168.4.81/2/3/4:33061" # <-- Change this to match each node's local IP
    group_replication_group_seeds          ="192.168.4.81:33061,192.168.4.82:33061,192.168.4.83:33061,192.168.4.84:33061"
    group_replication_bootstrap_group      =off
    transaction-write-set-extraction       =XXHASH64
    ```

3. Restart all nodes:

    ```{.bash data-prompt="$"}
    $ service mysql start
    ```

4. Connect to the nodes and confirm that the Group Replication and clone plugins are loaded, as shown in the check below.
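After the restart, it is worth confirming on each node that the two plugins were actually loaded. This check is not part of the original procedure; it simply reads the standard `information_schema.plugins` table:

```sql
SELECT PLUGIN_NAME, PLUGIN_STATUS
FROM information_schema.plugins
WHERE PLUGIN_NAME IN ('group_replication', 'clone');
```

Both plugins should report the `ACTIVE` status.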
### Step 3. Create a replication user

1. On every node, create a user for replication:

    ```sql
    SET SQL_LOG_BIN=0;
    CREATE USER replica@'192.168.4.%' IDENTIFIED BY 'replicapw'; #<--- Note that the IP filter here is more restrictive
    GRANT REPLICATION SLAVE ON *.* TO replica@'192.168.4.%';
    FLUSH PRIVILEGES;
    SET SQL_LOG_BIN=1;
    ```

2. Link the nodes with the replication channel:

    ```sql
    CHANGE MASTER TO MASTER_USER='replica', MASTER_PASSWORD='replicapw' FOR CHANNEL 'group_replication_recovery';
    ```

    Run this command on all nodes.

3. Check the current status:

    ```sql
    (dba@node1) [(none)]>\u performance_schema
    (dba@node1) [performance_schema]>show tables like '%repl%';
    +--------------------------------------------+
    | Tables_in_performance_schema (%repl%)      |
    +--------------------------------------------+
    | replication_applier_configuration          |
    | replication_applier_filters                |
    | replication_applier_global_filters         |
    | replication_applier_status                 |
    | replication_applier_status_by_coordinator  |
    | replication_applier_status_by_worker       |
    | replication_connection_configuration       |
    | replication_connection_status              |
    | replication_group_member_stats             |
    | replication_group_members                  | <------------------------
    +--------------------------------------------+

    (dba@node1) [performance_schema]>select * from replication_group_members\G
    *************************** 1. row ***************************
      CHANNEL_NAME: group_replication_applier
         MEMBER_ID:
       MEMBER_HOST:
       MEMBER_PORT:
      MEMBER_STATE: OFFLINE
       MEMBER_ROLE:
    MEMBER_VERSION:
    1 row in set (0.00 sec)
    ```

    At this stage, you should be able to start the first (Primary) cluster node.

4. Start the Primary node (Gr1) and enable Group Replication:

    ```sql
    (dba@node1) [none]>SET GLOBAL group_replication_bootstrap_group=ON;
    (dba@node1) [none]>START GROUP_REPLICATION;
    (dba@node1) [none]>SET GLOBAL group_replication_bootstrap_group=OFF;
    ```

5. Check if the node registered correctly:

    ```sql
    (dba@node1) [none]>select * from performance_schema.replication_group_members\G
    *************************** 1. row ***************************
      CHANNEL_NAME: group_replication_applier
         MEMBER_ID: 90a353b8-e6dc-11ea-98fa-08002734ed50
       MEMBER_HOST: gr1
       MEMBER_PORT: 3306
      MEMBER_STATE: ONLINE
       MEMBER_ROLE: PRIMARY
    MEMBER_VERSION: 8.0.20
    ```

6. Once the Primary node is running, connect to the secondary node (Gr2) and enable Group Replication:

    ```sql
    (dba@node2) [none]>START GROUP_REPLICATION;
    Query OK, 0 rows affected (4.60 sec)
    ```

7. Check if the secondary node registered correctly:

    ```sql
    (dba@node2) [performance_schema]>select * from replication_group_members\G
    *************************** 1. row ***************************
      CHANNEL_NAME: group_replication_applier
         MEMBER_ID: 58ffd118-e6dc-11ea-8af8-08002734ed50
       MEMBER_HOST: gr2
       MEMBER_PORT: 3306
      MEMBER_STATE: ONLINE
       MEMBER_ROLE: SECONDARY
    MEMBER_VERSION: 8.0.20
    *************************** 2. row ***************************
      CHANNEL_NAME: group_replication_applier
         MEMBER_ID: 90a353b8-e6dc-11ea-98fa-08002734ed50
       MEMBER_HOST: gr1
       MEMBER_PORT: 3306
      MEMBER_STATE: ONLINE
       MEMBER_ROLE: PRIMARY
    MEMBER_VERSION: 8.0.20
    ```

8. Test the replication:

    * On the Primary node, run the following commands:

        ```sql
        (dba@node1) [performance_schema]>create schema test;
        Query OK, 1 row affected (0.76 sec)

        (dba@node1) [performance_schema]>\u test
        Database changed

        (dba@node1) [test]>create table test1 (`id` int auto_increment primary key);
        Query OK, 0 rows affected (0.32 sec)

        (dba@node1) [test]>insert into test1 values(null);
        Query OK, 1 row affected (0.34 sec)
        ```

    * On the secondary node:

        ```sql
        (dba@node2) [performance_schema]>\u test
        Database changed

        (dba@node2) [test]>select * from test1;
        +----+
        | id |
        +----+
        |  1 |
        +----+
        1 row in set (0.00 sec)
        ```

9. Start Group Replication on the remaining nodes:

    ```sql
    (dba@node3) [performance_schema]>START GROUP_REPLICATION;
    (dba@node4) [performance_schema]>START GROUP_REPLICATION;
    ```
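Before moving on to the proxy layer, you can verify that all four members joined the group and are healthy. This check is not part of the original procedure; it reads the same `performance_schema` table used in the previous steps:

```sql
SELECT MEMBER_HOST, MEMBER_STATE, MEMBER_ROLE
FROM performance_schema.replication_group_members
ORDER BY MEMBER_HOST;
```

In a healthy single-primary setup, you would expect gr1 to be reported as PRIMARY and gr2-gr4 as SECONDARY, all in the ONLINE state.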
## Proxy setup

### Step 1. Installation

1. [Install ProxySQL](https://www.percona.com/doc/proxysql/proxysql-v2.html#installing-proxysql-v2). [In our example](#preconditions), we install ProxySQL on the Proxy1 (192.168.4.191) and Proxy2 (192.168.4.192) nodes.

2. Create the monitoring user on the MySQL Group Replication nodes:

    ```sql
    create user monitor@'192.168.4.%' identified by 'monitor';
    grant usage on *.* to 'monitor'@'192.168.4.%';
    grant select on sys.* to 'monitor'@'192.168.4.%';
    ```

3. Define the basic variables. These statements, like the rest of the ProxySQL configuration, are run in the ProxySQL admin interface (see the connection example after this list):

    ```sql
    update global_variables set Variable_Value='admin:admin;cluster1:clusterpass' where Variable_name='admin-admin_credentials';
    update global_variables set variable_value='cluster1' where variable_name='admin-cluster_username';
    update global_variables set variable_value='clusterpass' where variable_name='admin-cluster_password';
    update global_variables set Variable_Value=0 where Variable_name='mysql-hostgroup_manager_verbose';
    update global_variables set Variable_Value='true' where Variable_name='mysql-query_digests_normalize_digest_text';
    update global_variables set Variable_Value='8.0.20' where Variable_name='mysql-server_version';
    update global_variables set Variable_Value='utf8' where Variable_name='mysql-default_charset';
    update global_variables set Variable_Value=300 where Variable_name='mysql-tcp_keepalive_time';
    update global_variables set Variable_Value='true' where Variable_name='mysql-use_tcp_keepalive';
    update global_variables set Variable_Value='true' where Variable_name='mysql-verbose_query_error';
    update global_variables set Variable_Value='true' where Variable_name='mysql-show_processlist_extended';
    update global_variables set Variable_Value=50000 where Variable_name='mysql-max_stmts_cache';
    update global_variables set Variable_Value='false' where Variable_name='admin-web_enabled';
    update global_variables set Variable_Value='0' where Variable_name='mysql-set_query_lock_on_hostgroup';

    load admin variables to run;save admin variables to disk;
    load mysql variables to run;save mysql variables to disk;
    ```

    !!! note

        The user name and password need to reflect your standards. The ones used above are just an example.

4. Set up the nodes as a cluster:

    ```sql
    INSERT INTO proxysql_servers (hostname,port,weight,comment) VALUES('192.168.4.191',6032,100,'PRIMARY');
    INSERT INTO proxysql_servers (hostname,port,weight,comment) VALUES('192.168.4.192',6032,100,'SECONDARY');
    load proxysql servers to run;save proxysql servers to disk;
    ```
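The ProxySQL configuration statements in steps 3 and 4 above, as well as those in the following steps, are issued through the ProxySQL admin interface on port 6032. A minimal way to connect, assuming the `mysql` client is available on the ProxySQL node and the default `admin:admin` credentials set above are still in place:

```{.bash data-prompt="$"}
$ mysql -u admin -padmin -h 127.0.0.1 -P 6032 --prompt='ProxySQL Admin> '
```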
### Step 2. Define users, servers and query rules for read / write split

1. Create one or more valid users and define them in ProxySQL. For example, if you have a user named `app_gr` with the password `test` that has access to your Group Replication cluster, the command to define the user is the following:

    ```sql
    insert into mysql_users (username,password,active,default_hostgroup,default_schema,transaction_persistent,comment) values ('app_gr','test',1,400,'mysql',1,'application test user GR');
    LOAD MYSQL USERS TO RUNTIME;SAVE MYSQL USERS TO DISK;
    ```

2. Define the servers:

    ```sql
    INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_connections,comment) VALUES ('192.168.4.81',400,3306,10000,2000,'GR1');
    INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_connections,comment) VALUES ('192.168.4.81',401,3306,100,2000,'GR1');
    INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_connections,comment) VALUES ('192.168.4.82',401,3306,10000,2000,'GR2');
    INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_connections,comment) VALUES ('192.168.4.83',401,3306,10000,2000,'GR3');
    INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_connections,comment) VALUES ('192.168.4.84',401,3306,1,2000,'GR4');
    LOAD MYSQL SERVERS TO RUNTIME; SAVE MYSQL SERVERS TO DISK;
    ```

3. Define the query rules to get the read / write split:

    ```sql
    INSERT INTO mysql_query_rules (rule_id,proxy_port,username,destination_hostgroup,active,retries,match_digest,apply) values(4040,6033,'app_gr',400,1,3,'^SELECT.*FOR UPDATE',1);
    INSERT INTO mysql_query_rules (rule_id,proxy_port,username,destination_hostgroup,active,retries,match_digest,multiplex,apply) values(4042,6033,'app_gr',401,1,3,'^SELECT.*$',1,1);
    LOAD MYSQL QUERY RULES TO RUN;SAVE MYSQL QUERY RULES TO DISK;
    ```

### Step 3. Create a view in SYS schema

Once all the configuration is ready, we need to create a special view in the `sys` schema on the Percona Server for MySQL nodes. Find the view that works for server version 8 and above [here](https://github.com/Percona-Lab/group_replication_tools/blob/master/GR_sys_view_forProxysql_v1.sql).

Run that SQL script on the **Primary** node of the Group Replication cluster.

### Step 4. Activate support for Group Replication in ProxySQL

To activate the native support for Group Replication in ProxySQL, we will use the following group definition:

```
Writer HG-> 400
Reader HG-> 401
BackupW HG-> 402
Offline HG-> 9401
```

```sql
INSERT INTO mysql_group_replication_hostgroups (writer_hostgroup,backup_writer_hostgroup,reader_hostgroup, offline_hostgroup,active,max_writers,writer_is_also_reader,max_transactions_behind) values (400,402,401,9401,1,1,1,100);
LOAD MYSQL SERVERS TO RUNTIME; SAVE MYSQL SERVERS TO DISK;
```

#### Comments about parameters

To obtain the most reliable results, we recommend always setting the number of writers to 1 and setting `writer_is_also_reader` to 1 as well:

```
max_writers: 1
writer_is_also_reader: 1
```

The `max_transactions_behind` parameter is a subjective one that you should calculate on the basis of your needs. If, for instance, you cannot tolerate stale reads, it is safer to set this value to a low number (for example, 50) and to set the following on all Group Replication nodes:

```sql
set global group_replication_consistency=AFTER;
```

If, instead, you have no strict requirements about stale reads, you can relax the parameter and ignore the `group_replication_consistency` setting. Our recommended setting is `group_replication_consistency=AFTER` and `max_transactions_behind: 100`.

!!! admonition "See also"

    ProxySQL Documentation: [mysql_group_replication_hostgroups](https://proxysql.com/documentation/main-runtime/#mysql_group_replication_hostgroups)
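Once the servers and the Group Replication hostgroup definition have been loaded to runtime, you can check how ProxySQL has placed the nodes. The following query against the `runtime_mysql_servers` table of the admin interface is only a suggested verification step, not part of the original procedure:

```sql
SELECT hostgroup_id, hostname, port, status
FROM runtime_mysql_servers
ORDER BY hostgroup_id, hostname;
```

With a healthy single-primary cluster, you would expect gr1 in the writer hostgroup (400) and the secondaries in the reader hostgroup (401); because `writer_is_also_reader` is set to 1, gr1 should appear in both.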
admonition "See also" ProxySQL Documentation: [mysql_group_replication_hostgroups](https://proxysql.com/documentation/main-runtime/#mysql_group_replication_hostgroups) ### Step 5. Enable high availability for ProxySQL `keepalived` will be used to enable High Availability for ProxySQL. 1. Install `keepalived` on each ProxySQL node using the package manager of your operating system: === "on Debian/Ubuntu" ```{.bash data-prompt="$"} $ sudo apt install -y keepalived ``` === "On RHEL/derivatives" ```{.bash data-prompt="$"} $ sudo yum install -y keepalived ``` 2. Modify the `/etc/keepalived/keepalived.conf` file accordingly to your setup. In our case: * Proxy1 192.168.4.0/24 dev enp0s9 proto kernel scope link src 192.168.4.191 * Proxy2 192.168.4.0/24 dev enp0s9 proto kernel scope link src 192.168.4.192 * VIP 192.168.4.194 Let’s say Proxy1 is the primary node while Proxy2 is the secondary node. Given that, the config file looks as follows: ```init global_defs { # Keepalived process identifier router_id proxy_HA } # Script used to check if Proxy is running vrrp_script check_proxy { script "killall -0 proxysql" interval 2 weight 2 } # Virtual interface # The priority specifies the order in which the assigned interface to take over in a failover vrrp_instance VI_01 { state MASTER interface enp0s9 virtual_router_id 51 priority 100 <----- This needs to be different for each ProxySQL node, like 100/99 # The virtual ip address shared between the two load balancers virtual_ipaddress { 192.168.4.194 dev enp0s9 } track_script { check_proxy } } ``` 3. Start the `keepalived` service. From now on, the VIP will be associated with the Proxy1 unless the service is down. ## Disaster recovery implementation The implementation of a DR (Disaster Recovery) site will follow the same direction provided for the main site. There are only some generic rules to follow: * A DR site should be located in a different geographic location than the main site (several hundred kilometers/miles away). * The connection link between the main site and the DR site can only be established using *asynchronous replication* (standard MySQL [replication setup](https://dev.mysql.com/doc/refman/8.0/en/replication-gtids.html) ). ## Monitoring ### Using Percona Management and Monitoring (PMM) 1. Use [this quickstart](https://www.percona.com/software/pmm/quickstart) to install Percona Monitoring and Management (PMM). 2. Specify the `replication_set` flag when registering the *Percona Server for MySQL* node or the MySQL node in PMM: ```sh pmm-admin add mysql --username=pmm --password=pmm --query-source=perfschema --replication-set=gr_test_lab group_rep4 127.0.0.1:3306 ``` Then you can use the Group Replication Dashboard and monitor your cluster with a lot of details. The dashboard sections are the following: 1. Overview: ![image](../_images/pmm_for_gr_overiew.png) 2. Replication delay details ![image](../_images/pmm_for_gr.png) 3. Transactions ![image](../_images/pmm_for_gr_trx.png) 4. 
## Monitoring

### Using Percona Monitoring and Management (PMM)

1. Use [this quickstart](https://www.percona.com/software/pmm/quickstart) to install Percona Monitoring and Management (PMM).

2. Specify the `replication_set` flag when registering the *Percona Server for MySQL* node or the MySQL node in PMM:

    ```sh
    pmm-admin add mysql --username=pmm --password=pmm --query-source=perfschema --replication-set=gr_test_lab group_rep4 127.0.0.1:3306
    ```

Then you can use the Group Replication Dashboard and monitor your cluster in detail. The dashboard sections are the following:

1. Overview:

    ![image](../_images/pmm_for_gr_overiew.png)

2. Replication delay details:

    ![image](../_images/pmm_for_gr.png)

3. Transactions:

    ![image](../_images/pmm_for_gr_trx.png)

4. Conflicts:

    ![image](../_images/pmm_for_gr_conflicts.png)

### Using the command line

From the command line, you need to manually query the replication tables in `performance_schema`:

```
+--------------------------------------------+
| replication_applier_configuration          |
| replication_applier_filters                |
| replication_applier_global_filters         |
| replication_applier_status                 |
| replication_applier_status_by_coordinator  |
| replication_applier_status_by_worker       |
| replication_connection_configuration       |
| replication_connection_status              |
| replication_group_member_stats             |
| replication_group_members                  |
+--------------------------------------------+
```

For example, use this command to get the lag in number of transactions on a node:

```sql
select @last_exec:=SUBSTRING_INDEX(SUBSTRING_INDEX( @@global.GTID_EXECUTED,':',-1),'-',-1) last_executed;

select @last_rec:=SUBSTRING_INDEX(SUBSTRING_INDEX(Received_transaction_set,':',-1),'-',-1) last_received
FROM performance_schema.replication_connection_status WHERE Channel_name = 'group_replication_applier';

select (@last_rec - @last_exec) as real_lag;

+---------------+
| last_executed |
+---------------+
|        125624 |
+---------------+
1 row in set, 1 warning (0.03 sec)

+---------------+
| last_received |
+---------------+
|        125624 |
+---------------+
1 row in set, 1 warning (0.00 sec)

+----------+
| real_lag |
+----------+
|        0 |
+----------+
1 row in set (0.00 sec)
```

You can use a more composite query to get information about each applier:

```sql
SELECT conn_status.channel_name as channel_name,
       conn_status.service_state as IO_thread,
       applier_status.service_state as SQL_thread,
       conn_status.LAST_QUEUED_TRANSACTION as last_queued_transaction,
       applier_status.LAST_APPLIED_TRANSACTION as last_applied_transaction,
       LAST_APPLIED_TRANSACTION_END_APPLY_TIMESTAMP - LAST_APPLIED_TRANSACTION_ORIGINAL_COMMIT_TIMESTAMP 'rep delay (sec)',
       LAST_QUEUED_TRANSACTION_START_QUEUE_TIMESTAMP - LAST_QUEUED_TRANSACTION_ORIGINAL_COMMIT_TIMESTAMP 'transport time',
       LAST_QUEUED_TRANSACTION_END_QUEUE_TIMESTAMP - LAST_QUEUED_TRANSACTION_START_QUEUE_TIMESTAMP 'time RL',
       LAST_APPLIED_TRANSACTION_END_APPLY_TIMESTAMP - LAST_APPLIED_TRANSACTION_START_APPLY_TIMESTAMP 'apply time',
       if(GTID_SUBTRACT(LAST_QUEUED_TRANSACTION, LAST_APPLIED_TRANSACTION) = "","0",
          abs(time_to_sec(if(time_to_sec(APPLYING_TRANSACTION_ORIGINAL_COMMIT_TIMESTAMP)=0,0,
          timediff(APPLYING_TRANSACTION_ORIGINAL_COMMIT_TIMESTAMP,now()))))) `lag_in_sec`
FROM performance_schema.replication_connection_status AS conn_status
JOIN performance_schema.replication_applier_status_by_worker AS applier_status
    ON applier_status.channel_name = conn_status.channel_name
ORDER BY lag_in_sec desc\G
```

Output:

```sql
*************************** 1. row ***************************
            channel_name: group_replication_applier
               IO_thread: ON
              SQL_thread: ON
 last_queued_transaction: aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:125624
last_applied_transaction: aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:125621
         rep delay (sec): 3.153038
          transport time: 0.061327
                 time RL: 0.001005
              apply time: 0.388680
              lag_in_sec: 0
```

!!! admonition "Based on the material from Percona Database Performance Blog"

    This document is based on the blog post [Percona Distribution for MySQL: High Availability with Group Replication Solution](https://www.percona.com/blog/2021/04/14/percona-distribution-for-mysql-high-availability-with-group-replication-solution/) by *Marco Tusa*.