3.4 2020-05-20T05:55:39Z Clickhouse servers {Clickhouse:ch_params[Uptime].nodata(3m)} 0 {HOST.HOST} clickhouse-server monitoring has no data; clickhouse-server may be down. Check `systemctl status clickhouse-server`, zbx_clickhouse_monitor.sh, and `systemctl status zabbix-agent` 0 0 4 0 1 {Clickhouse:ch_params[Uptime].last()} <= 600 0 {HOST.HOST} clickhouse-server recently restarted 0 0 2 0 1 {Clickhouse:ch_params[DNSError].last()}>0 or {Clickhouse:ch_params[NetworkErrors].last()}>0 0 {HOST.HOST} clickhouse DNS errors occurred 0 0 2 Please check DNS settings and the remote_servers section of /etc/clickhouse-server/ https://clickhouse.tech/docs/en/operations/server-configuration-parameters/settings/#server-settings-remote-servers https://clickhouse.tech/docs/en/operations/server-configuration-parameters/settings/#server-settings-disable-internal-dns-cache https://clickhouse.tech/docs/en/query_language/system/#query_language-system-drop-dns-cache 0 1 {Clickhouse:ch_params[Revision].change()}=1 0 {HOST.HOST} clickhouse version changed 0 0 2 0 0 {Clickhouse:ch_params[DistributedConnectionFailAtAll].last()}>0 or {Clickhouse:ch_params[DistributedConnectionFailTry].last()}>0 or {Clickhouse:ch_params[DistributedFilesToInsert].last()}>={$MAX_DELAYED_FILES_TO_DISTRIBUTED_INSERT} 0 {HOST.HOST} distributed connection exceptions occurred 0 https://clickhouse.tech/docs/en/operations/table_engines/distributed/ 0 3 please check connectivity between ClickHouse servers and the <remote_servers> section in config.xml https://clickhouse.tech/docs/en/operations/table_engines/distributed/ https://clickhouse.tech/docs/en/sql-reference/statements/system/#query-language-system-distributed https://clickhouse.tech/docs/en/operations/server-configuration-parameters/settings/#server-settings-remote-servers When you insert data into a Distributed table, it is written to the target *MergeTree tables asynchronously. On INSERT, the data block is first written to the local file system. 
The data is sent to the remote servers in the background as soon as possible. The sending period is controlled by the distributed_directory_monitor_sleep_time_ms and distributed_directory_monitor_max_sleep_time_ms settings. The Distributed engine sends each file with inserted data separately, but you can enable batch sending of files with the distributed_directory_monitor_batch_inserts setting. 0 0 {Clickhouse:ch_params[DelayedInserts].last()} > 0 0 {HOST.HOST} has INSERT queries throttled due to a high number of active data parts per partition in a MergeTree; please decrease INSERT frequency 0 https://clickhouse.tech/docs/en/development/architecture/#merge-tree 0 5 INSERT queries are throttled due to a high number of active data parts per partition in a MergeTree table. 0 0 {Clickhouse:ch_params[LongestRunningQuery].last()} >= {$MAX_QUERY_TIME} 0 {HOST.HOST} has queries running for more than {$MAX_QUERY_TIME} sec 0 0 2 0 1 {Clickhouse:ch_params[ReadonlyReplica].last()} > 0 0 {HOST.HOST} has read-only replicated tables; check ZooKeeper state 0 https://clickhouse.tech/docs/en/operations/table_engines/replication/#recovery-after-failures 0 5 Number of Replicated tables that are currently in a read-only state due to re-initialization after a ZooKeeper session loss, or due to startup without ZooKeeper configured. 
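The distributed-connection trigger above combines several counters into one alert condition. As a rough illustration only (the function and its argument names are hypothetical; the thresholds come from the trigger expression), the logic it evaluates looks like this in Python:

```python
def distributed_alert(fail_at_all: int, fail_try: int,
                      files_to_insert: int,
                      max_delayed_files: int) -> bool:
    """Fire when any distributed-connection failure counter is non-zero,
    or when the backlog of unsent files reaches the configured maximum
    (the {$MAX_DELAYED_FILES_TO_DISTRIBUTED_INSERT} macro in the template)."""
    return (fail_at_all > 0
            or fail_try > 0
            or files_to_insert >= max_delayed_files)

# Healthy server: no failures, small send backlog.
print(distributed_alert(0, 0, 3, max_delayed_files=600))    # False
# Growing backlog of .bin files in the Distributed table directory.
print(distributed_alert(0, 0, 700, max_delayed_files=600))  # True
```

In the real template, the three inputs correspond to the DistributedConnectionFailAtAll, DistributedConnectionFailTry, and DistributedFilesToInsert metrics polled by the agent.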
0 0 {Clickhouse:ch_params[ReplicasMaxAbsoluteDelay].last(3m)} >= {$MAX_REPLICA_DELAY_DISTRIBUTED_QUERIES} 0 {HOST.HOST} has replication lag greater than {$MAX_REPLICA_DELAY_DISTRIBUTED_QUERIES} sec 0 https://clickhouse.tech/docs/en/operations/settings/settings/#settings-max_replica_delay_for_distributed_queries 0 4 When a replica lags too far behind, it can be silently skipped by Distributed SELECT queries, producing incomplete results. Check disks and network on the monitored ClickHouse servers. 0 1 {Clickhouse:ch_params[HTTPConnection].last()} >= {$MAX_HTTP_CONNECTIONS} 0 {HOST.HOST} HTTP connections >= {$MAX_HTTP_CONNECTIONS} 0 https://clickhouse.tech/docs/en/operations/server_settings/settings/#max-concurrent-queries 0 2 ClickHouse is designed for a limited number of parallel requests. Not every HTTP connection means a running SQL query, but a large number of open TCP connections can cause a sudden spike in SQL queries, resulting in performance degradation. 0 0 {Clickhouse:ch_params[MaxPartCountForPartition].last()} >= {$MAX_PARTS_PER_PARTITION} * 0.9 0 {HOST.HOST} MergeTree parts at 90% of {$MAX_PARTS_PER_PARTITION}; please decrease INSERT query frequency 0 0 4 The ClickHouse MergeTree table engine splits each INSERT into partitions (by the PARTITION BY expression) and adds one or more parts per INSERT inside each partition. Background merges then run; when a partition accumulates too many unmerged parts, SELECT query performance can degrade significantly, so ClickHouse tries to delay the insert, or aborts it. 0 0 {Clickhouse:ch_params[MySQLConnection].last()} >= {$MAX_MYSQL_CONNECTIONS} 0 {HOST.HOST} MySQL connections >= {$MAX_MYSQL_CONNECTIONS} 0 https://clickhouse.tech/docs/en/operations/server_settings/settings/#max-concurrent-queries 0 2 ClickHouse is designed for a limited number of parallel requests. Not every MySQL connection means a running SQL query, but a large number of open TCP connections can cause a sudden 
spike in SQL queries, resulting in performance degradation. 0 0 {Clickhouse:ch_params[InsertQuery].last()}>0 and ( {Clickhouse:ch_params[InsertedRows].last()} / {Clickhouse:ch_params[InsertQuery].last()} ) <= {$MIN_INSERTED_ROWS_PER_QUERY} 0 {HOST.HOST} please increase inserted rows per INSERT query 0 https://clickhouse.tech/docs/en/introduction/performance/#performance-when-inserting-data 0 4 The ClickHouse team recommends inserting data in batches of at least 1000 rows, or no more than a single request per second. Consider using a Buffer table https://clickhouse.tech/docs/en/operations/table_engines/buffer/ or https://github.com/nikepan/clickhouse-bulk or https://github.com/VKCOM/kittenhouse 0 0 {Clickhouse:ch_params[Query].last(3m)} >= 0.9 * {$MAX_CONCURRENT_QUERIES} 0 {HOST.HOST} running queries at 90% of {$MAX_CONCURRENT_QUERIES} 0 https://clickhouse.tech/docs/en/operations/server_settings/settings/#max-concurrent-queries 0 4 Each concurrent SELECT query uses memory for JOINs, uses CPU to run aggregate functions, and can read a lot of data from disk when scanning parts in partitions, utilizing disk I/O. Each concurrent INSERT query allocates around 1MB per column of the inserted table and can also utilize disk I/O. See the following documentation: https://clickhouse.tech/docs/en/operations/settings/query_complexity/ https://clickhouse.tech/docs/en/operations/quotas/ 0 1 {Clickhouse:ch_params[TCPConnection].last()} >= {$MAX_TCP_CONNECTIONS} 0 {HOST.HOST} TCP connections >= {$MAX_TCP_CONNECTIONS} 0 https://clickhouse.tech/docs/en/operations/server_settings/settings/#max-connections 0 2 ClickHouse is designed for a limited number of parallel requests. Not every TCP connection means a running SQL query, but a large number of open TCP connections can cause a sudden spike in SQL queries, resulting in performance degradation. 
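The batching recommendation above (at least 1000 rows per INSERT) can be enforced client-side. A minimal sketch, assuming rows arrive one at a time from some producer; the chunking helper is plain Python, and the commented `client.execute` call refers to the third-party clickhouse-driver package as one possible way to send each batch:

```python
from itertools import islice

def batches(rows, batch_size=1000):
    """Group an iterable of rows into lists of at most batch_size,
    so each list can be sent as a single INSERT statement."""
    it = iter(rows)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch

# Each batch would be written with one INSERT, e.g. with clickhouse-driver:
#   client.execute('INSERT INTO t (x) VALUES', batch)
sizes = [len(b) for b in batches(range(2500), batch_size=1000)]
print(sizes)  # [1000, 1000, 500]
```

This turns 2500 single-row inserts into three INSERT queries, which is what keeps the InsertedRows/InsertQuery ratio monitored by the trigger above the {$MIN_INSERTED_ROWS_PER_QUERY} threshold.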
0 0 Concurrent running queries 600 200 0.0000 100.0000 0 1 0 1 0 0.0000 0.0000 1 0 0 0 0 0 DDDD00 0 4 0 Clickhouse ch_params[Write] 1 0 00BB00 0 4 0 Clickhouse ch_params[Read] 2 0 BB0000 0 4 0 Clickhouse ch_params[Query] 3 0 A54F10 1 2 0 Clickhouse ch_params[LongestRunningQuery] Connections 600 200 0.0000 100.0000 1 1 0 1 0 0.0000 0.0000 1 0 0 0 0 0 1A7C11 0 2 0 Clickhouse ch_params[TCPConnection] 1 0 F63100 0 2 0 Clickhouse ch_params[HTTPConnection] 2 0 CCCC00 0 2 0 Clickhouse ch_params[MySQLConnection] 3 0 A54F10 0 2 0 Clickhouse ch_params[DistributedSend] Database size 600 200 0.0000 100.0000 0 1 0 1 0 0.0000 0.0000 1 0 0 0 0 0 AA0000 0 2 0 Clickhouse ch_params[DiskUsage] Distributed 900 200 0.0000 100.0000 1 1 0 1 0 0.0000 0.0000 1 0 0 0 0 0 CC0000 0 2 0 Clickhouse ch_params[DistributedConnectionFailAtAll] 1 0 CCCC00 0 2 0 Clickhouse ch_params[DistributedConnectionFailTry] 2 0 00BB00 1 2 0 Clickhouse ch_params[DistributedFilesToInsert] Finished Queries 600 200 0.0000 100.0000 0 1 1 1 0 0.0000 0.0000 1 0 0 0 0 0 4CAF50 0 2 0 Clickhouse ch_params[SelectQuery] 1 0 DDDD00 0 2 0 Clickhouse ch_params[InsertQuery] Insert / Merge rows/sec 600 200 0.0000 100.0000 0 1 0 1 0 0.0000 0.0000 1 0 0 0 0 0 DDDD00 0 2 0 Clickhouse ch_params[InsertedRows] 1 0 CC0000 0 2 0 Clickhouse ch_params[MergedRows] Memory Usage 600 200 0.0000 100.0000 0 1 0 1 0 0.0000 0.0000 1 0 0 0 0 0 F63100 0 2 0 Clickhouse ch_params[MemoryTracking] 1 0 FFFF33 0 2 0 Clickhouse ch_params[MemoryTrackingForMerges] 2 0 AAAA00 0 2 0 Clickhouse ch_params[MemoryTrackingInBackgroundProcessingPool] 3 0 000099 0 2 0 Clickhouse ch_params[MemoryTrackingInBackgroundMoveProcessingPool] 4 0 00DDDD 0 2 0 Clickhouse ch_params[MemoryTrackingInBackgroundSchedulePool] Replication 600 200 0.0000 100.0000 1 1 0 1 0 0.0000 0.0000 1 0 0 0 0 0 EE0000 0 2 0 Clickhouse ch_params[ReadonlyReplica] 1 0 DDDD00 0 2 0 Clickhouse ch_params[ReplicaPartialShutdown] 2 0 2774A4 1 2 0 Clickhouse ch_params[ReplicasMaxAbsoluteDelay] 3 0 A54F10 
0 2 0 Clickhouse ch_params[ReplicasSumQueueSize] Write / Merge bytes/sec 600 200 0.0000 100.0000 0 1 0 1 0 0.0000 0.0000 1 0 0 0 0 0 CCCC00 0 2 0 Clickhouse ch_params[InsertedBytes] 1 0 BB0000 0 2 0 Clickhouse ch_params[MergedUncompressedBytes] Zookeeper 600 200 0.0000 100.0000 0 1 0 1 0 0.0000 0.0000 1 0 0 0 0 0 FF3333 0 2 0 Clickhouse ch_params[ZooKeeperHardwareExceptions] 1 0 2774A4 1 2 0 Clickhouse ch_params[ZooKeeperWatch] 2 0 CC0000 0 2 0 Clickhouse ch_params[ZooKeeperOtherExceptions] 3 0 CC0000 0 2 0 Clickhouse ch_params[ZooKeeperUserExceptions]