= Circuit Breakers
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.

Solr's circuit breaker infrastructure cancels requests when selected system metrics show high load.
The circuit breakers should be configured to trigger before the node runs out of resources.
The purpose of circuit breakers is to provide good response times for the portion of the load that can be handled and to quickly reject ("fail fast") the portion of the load which is beyond the capacity of the node.
This avoids congested overload conditions which can make all requests slow and lead to node failure or a site-wide slowdown.

Circuit breakers only interrupt search requests (`SearchHandler`).
They are not checked for update requests, admin requests, etc.
They are checked for distributed search requests, so they may result in partial failures for multi-shard requests.

Circuit breakers are checked once, early in the request evaluation, before significant work is done.
Long-running requests will not be interrupted.

Servers rejecting traffic with a 503 code may mislead a load balancer into thinking that they are broken, when they are actually intelligently handling overload.
This could cause Solr hosts to be dropped from a load balancer, causing a cascading overload on the remaining hosts.
Make sure the load balancer is configured to allow the servers to shed excess load with 503 responses.

== When To Use Circuit Breakers

Circuit breakers should be used when the user wishes to trade request throughput for higher Solr stability.
If circuit breakers are enabled, requests may be rejected under high node load with an appropriate HTTP error code (typically 503).

It is up to the client to handle this error and potentially build retry logic, as this should ideally be a transient situation.

== Circuit Breaker Configurations

Circuit breakers can be configured for each Solr core.
All circuit breaker configurations are listed in the `circuitBreaker` tags in `solrconfig.xml` as shown below:

[source,xml]
----
----

The `enabled` attribute controls the activation and deactivation of all circuit breakers for a core.
If this flag is disabled, all circuit breakers in the core will be disabled.
Each specific circuit breaker must also be individually enabled.

`CircuitBreakerManager` is the default manager for all circuit breakers and should be defined in the tag unless the user wishes to use a custom implementation.

== Currently Supported Circuit Breakers

=== JVM Heap Usage Circuit Breaker

This circuit breaker tracks JVM heap memory usage and rejects incoming search requests with a 503 error code if the heap usage exceeds a configured percentage of the maximum heap allocated to the JVM (`-Xmx`).
The configuration for this circuit breaker is the threshold percentage at which the breaker will trip.

Heap memory usage is measured with the JMX metric `MemoryMXBean.getHeapMemoryUsage().getUsed()` as a percentage of `getMax()`.
That gives the current percentage of heap used for object allocation, including both live objects and any garbage objects that have not yet been collected.
This does not measure off-heap memory usage by the JVM.
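
The heap measurement described above can be approximated outside Solr with the standard `java.lang.management` API.
The following is a minimal sketch, not Solr's implementation: the class `HeapUsageSketch` and the methods `heapUsedPercent` and `wouldTrip` are hypothetical names, and the "equal to or greater than" comparison is an assumption about how a threshold might be applied.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

// Hypothetical helper class for illustration only; not part of Solr.
class HeapUsageSketch {

  /** Returns used heap as a percentage of max heap, or -1 if the maximum is undefined. */
  public static double heapUsedPercent(long usedBytes, long maxBytes) {
    if (maxBytes <= 0) {
      return -1.0; // MemoryUsage.getMax() returns -1 when no maximum is defined
    }
    return 100.0 * usedBytes / maxBytes;
  }

  /** Assumed trip rule: usage at or above the threshold percentage trips the breaker. */
  public static boolean wouldTrip(double usedPercent, double thresholdPercent) {
    return usedPercent >= thresholdPercent;
  }

  public static void main(String[] args) {
    MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
    MemoryUsage heap = memory.getHeapMemoryUsage();
    // getUsed() counts live objects plus garbage that has not been collected yet.
    double pct = heapUsedPercent(heap.getUsed(), heap.getMax());
    // The 75.0 threshold is an arbitrary value chosen for this example.
    System.out.printf("heap used: %.1f%% (at or above 75%%: %b)%n", pct, wouldTrip(pct, 75.0));
  }
}
```

Because uncollected garbage counts as used heap, the reported percentage can drop sharply right after a garbage collection, so a single reading is only a snapshot.
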

Configuration for JVM heap usage circuit breaker:

[source,xml]
----
true
----

Note that the global circuit breaker flag overrides this configuration: if circuit breakers are disabled globally, this flag has no effect.

The triggering threshold is defined as a percentage of the max heap allocated to the JVM.
A value of "0" maps to 0% usage and a value of "100" maps to 100% usage.
This circuit breaker will trip when the heap usage is equal to or greater than 75%:

[source,xml]
----
75
----

It does not logically make sense to have a threshold below 50% or above 95% of the max heap allocated to the JVM.
Hence, the range of valid values for this parameter is [50, 95], both inclusive.

Consider the following example:

The JVM has been allocated a maximum heap of 5GB (`-Xmx5g`) and `memoryCircuitBreakerThresholdPct` is set to 75.
In this scenario, the heap usage at which the circuit breaker will trip is 3.75GB.

=== System CPU Usage Circuit Breaker

This circuit breaker tracks system CPU usage and triggers if the recent CPU usage exceeds a configurable threshold.

This is tracked with the JMX metric `OperatingSystemMXBean.getSystemCpuLoad()`.
That measures the recent CPU usage for the whole system.
A value of 0.0 means that all CPUs were idle, while a value of 1.0 means that all CPUs were actively running 100% of the time.

This metric is provided by the `com.sun.management` package, which is not implemented on all JVMs.
If the metric is not available, the circuit breaker will fail with the warning "Unable to get CPU usage".

For the circuit breaker configuration, a value of "0" maps to 0% usage and a value of "100" maps to 100% usage.

Configuration for CPU usage circuit breaker:

[source,xml]
----
true
----

Note that the global circuit breaker flag overrides this configuration: if circuit breakers are disabled globally, this flag has no effect.

The triggering threshold is defined in percent CPU usage.
A value of "0" maps to 0% usage and a value of "100" maps to 100% usage.
This circuit breaker will trip when the CPU usage is equal to or greater than 75%:

[source,xml]
----
75
----

=== System Load Average Circuit Breaker

This circuit breaker tracks system load average and triggers if the recent load average exceeds a configurable threshold.

This is tracked with the JMX metric `OperatingSystemMXBean.getSystemLoadAverage()`.
That measures the recent load average for the whole system.
A "load average" is the number of processes using or waiting for a CPU, usually averaged over one minute.
Some systems include processes waiting on IO in the load average.
Check the documentation for your system and JVM to understand this metric.
For more information, see the https://en.wikipedia.org/wiki/Load_(computing)[Wikipedia page for Load].

Configuration for load average circuit breaker:

[source,xml]
----
true
----

Note that the global circuit breaker flag overrides this configuration: if circuit breakers are disabled globally, this flag has no effect.

The triggering threshold is a floating point number matching load average.
This circuit breaker will trip when the load average is equal to or greater than 8.0:

[source,xml]
----
8.0
----

== Performance Considerations

While JVM or CPU circuit breakers do not add any noticeable overhead per query, having too many circuit breakers checked for a single request can cause a performance overhead.

In addition, it is a good practice for clients to exponentially back off when retrying requests that return 503 or other busy errors.
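
As an illustration of client-side exponential backoff, here is a minimal sketch that doubles the wait after each 503 response up to a cap.
Everything in it is an assumption made for the example: the class `BackoffSketch`, the methods `backoffMillis` and `sendWithBackoff`, the delay values, and the use of an `IntSupplier` as a stand-in for whatever actually sends the search request and returns the HTTP status code.

```java
import java.util.function.IntSupplier;

// Hypothetical client-side helper for illustration only; not part of Solr or SolrJ.
class BackoffSketch {

  /** Delay before retry number `attempt` (0-based): base * 2^attempt, capped at maxMillis. */
  public static long backoffMillis(int attempt, long baseMillis, long maxMillis) {
    // Clamp the exponent so that realistic base delays cannot overflow a long.
    int shift = Math.min(Math.max(attempt, 0), 20);
    long delay = baseMillis * (1L << shift);
    return Math.min(delay, maxMillis);
  }

  /**
   * Calls `request` until it returns a status other than 503 or `maxAttempts`
   * calls have been made. Returns the last status seen.
   */
  public static int sendWithBackoff(IntSupplier request, int maxAttempts,
                                    long baseMillis, long maxMillis) {
    int status = request.getAsInt();
    for (int attempt = 0; status == 503 && attempt < maxAttempts - 1; attempt++) {
      try {
        Thread.sleep(backoffMillis(attempt, baseMillis, maxMillis));
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the interrupt and stop retrying
        return status;
      }
      status = request.getAsInt();
    }
    return status;
  }

  public static void main(String[] args) {
    // Simulated server: busy twice, then OK. A real client would send the search request here.
    int[] responses = {503, 503, 200};
    int[] calls = {0};
    int status = sendWithBackoff(() -> responses[Math.min(calls[0]++, 2)], 5, 10, 1000);
    System.out.println("final status " + status + " after " + calls[0] + " calls");
  }
}
```

A real client would typically also add random jitter to each delay so that many clients rejected at the same moment do not all retry in lockstep.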