---
description: "Deploy Conductor as a self-hosted workflow engine in production: architecture overview, horizontal scaling, database, queue, indexing, and lock configuration, workflow monitoring, and recommended production deployment settings for this open source workflow orchestration platform."
---

# Self-hosted deployment guide

Conductor is a self-hosted, open source workflow engine that you deploy on your own infrastructure. This production deployment guide covers everything you need to run Conductor at scale: architecture, backend configuration, horizontal scaling, workflow monitoring, and tuning.

## Architecture overview

A Conductor deployment consists of these components:

![Conductor Architecture](../architecture/conductor-architecture.png)

**What each component does:**

| Component | Role |
|:--|:--|
| **API Server** | Exposes REST and gRPC endpoints for workflow and task operations. |
| **Decider** | The core state machine. Evaluates workflow state and schedules the next set of tasks. |
| **Sweeper** | Background process that polls for running workflows and triggers the decider to evaluate them. Required for progress on long-running workflows. |
| **System Task Workers** | Execute built-in task types (HTTP, Event, Wait, Inline, JSON_JQ, etc.) within the server JVM. |
| **Event Processor** | Listens to configured event buses and triggers workflows or completes tasks based on incoming events. |
| **Database** | Persists workflow definitions, execution state, task state, and poll data. |
| **Queue** | Manages task scheduling: pending tasks, delayed tasks, and the sweeper's own work queue. |
| **Index** | Powers workflow and task search in the UI and via the search API. |
| **Lock** | Distributed lock that prevents concurrent decider evaluations of the same workflow. **Required in production.** |

---

## Quick start with Docker Compose

For local development and evaluation:

```shell
git clone https://github.com/conductor-oss/conductor
cd conductor
docker compose -f docker/docker-compose.yaml up
```

This starts Conductor with Redis (database + queue), Elasticsearch (indexing), and the server with UI on port **8080**.

| URL | Description |
|:----|:---|
| `http://localhost:8080` | Conductor UI |
| `http://localhost:8080/swagger-ui/index.html` | REST API docs |
| `http://localhost:8080/api/` | API base URL |

Pre-built compose files for other backend combinations:

| Compose file | Database | Queue | Index |
|:--|:--|:--|:--|
| `docker-compose.yaml` | Redis | Redis | Elasticsearch 7 |
| `docker-compose-es8.yaml` | Redis | Redis | Elasticsearch 8 |
| `docker-compose-postgres.yaml` | PostgreSQL | PostgreSQL | PostgreSQL |
| `docker-compose-postgres-es7.yaml` | PostgreSQL | PostgreSQL | Elasticsearch 7 |
| `docker-compose-mysql.yaml` | MySQL | Redis | Elasticsearch 7 |
| `docker-compose-redis-os2.yaml` | Redis | Redis | OpenSearch 2 |
| `docker-compose-redis-os3.yaml` | Redis | Redis | OpenSearch 3 |

```shell
# Example: PostgreSQL for everything
docker compose -f docker/docker-compose-postgres.yaml up

# Example: Redis + Elasticsearch 8
docker compose -f docker/docker-compose-es8.yaml up

# Example: Redis + OpenSearch 3
docker compose -f docker/docker-compose-redis-os3.yaml up
```

For Elasticsearch 8, set `conductor.indexing.type=elasticsearch8` and use `config-redis-es8.properties` or an equivalent custom config.

---

## Production configuration

All configuration is done via Spring Boot properties in `application.properties` or environment variables. Properties can also be mounted as a Docker volume.

### Database

The database stores workflow definitions, execution state, task state, and event handler definitions.
```properties
conductor.db.type=postgres
```

**Supported database backends:**

| Backend | Property value | When to use | Notes |
|:--|:--|:--|:--|
| PostgreSQL | `postgres` | **Recommended for production.** ACID, battle-tested, supports indexing too. | Requires `spring.datasource.*` config. |
| MySQL | `mysql` | Production alternative if your team already runs MySQL. | Requires `spring.datasource.*` config. Needs separate queue backend (Redis). |
| Redis | `redis_standalone` | Fast, simple. Good for moderate scale. | Requires `conductor.redis.*` config. |
| Cassandra | `cassandra` | High write throughput, multi-region. | Requires `conductor.cassandra.*` config. |
| SQLite | `sqlite` | **Local development only.** Single-file, zero config. | Default. Not for production. |

#### PostgreSQL

```properties
conductor.db.type=postgres
conductor.external-payload-storage.type=postgres

spring.datasource.url=jdbc:postgresql://db-host:5432/conductor
spring.datasource.username=conductor
spring.datasource.password=

# Optional tuning
conductor.postgres.deadlockRetryMax=3
conductor.postgres.taskDefCacheRefreshInterval=60s
conductor.postgres.asyncMaxPoolSize=12
conductor.postgres.asyncWorkerQueueSize=100
```

#### MySQL

```properties
conductor.db.type=mysql

spring.datasource.url=jdbc:mysql://db-host:3306/conductor
spring.datasource.username=conductor
spring.datasource.password=

# Optional tuning
conductor.mysql.deadlockRetryMax=3
conductor.mysql.taskDefCacheRefreshInterval=60s
```

#### Redis

```properties
conductor.db.type=redis_standalone

# Format: host:port:rack (semicolon-separated for multiple hosts)
conductor.redis.hosts=redis-host:6379:us-east-1c
conductor.redis.workflowNamespacePrefix=conductor
conductor.redis.queueNamespacePrefix=conductor_queues
conductor.redis.taskDefCacheRefreshInterval=1s

# Connection pool
conductor.redis.maxIdleConnections=8
conductor.redis.minIdleConnections=5

# SSL
conductor.redis.ssl=false

# Auth (password is taken from the first host entry: host:port:rack:password)
# Or set conductor.redis.username and conductor.redis.password directly
```

---

### Queue

The queue backend manages task scheduling: it tracks which tasks are pending, delayed, or ready for execution. The sweeper and system task workers both depend on it.

```properties
conductor.queue.type=postgres
```

**Supported queue backends:**

| Backend | Property value | When to use |
|:--|:--|:--|
| PostgreSQL | `postgres` | Use when database is also PostgreSQL. Simplest stack. |
| Redis | `redis_standalone` | Use when database is Redis or MySQL. Fast, low-latency. |
| SQLite | `sqlite` | Local development only. |

!!! tip "Match your queue backend to your database"
    PostgreSQL database + PostgreSQL queue is the simplest production stack, with one fewer dependency. If you use MySQL for the database, pair it with Redis for the queue.

---

### Indexing

The indexing backend powers workflow and task search in the UI and via the `/api/workflow/search` and `/api/tasks/search` endpoints.

```properties
conductor.indexing.enabled=true
conductor.indexing.type=postgres
```

**Supported indexing backends:**

| Backend | Property value | When to use | Notes |
|:--|:--|:--|:--|
| PostgreSQL | `postgres` | Simplest stack when database is also PostgreSQL. | Set `conductor.elasticsearch.version=0` to disable ES client. |
| Elasticsearch 7 | `elasticsearch` | Best search performance at scale. Full-text search. | Set `conductor.elasticsearch.version=7`. |
| Elasticsearch 8 | `elasticsearch8` | Use when running the ES8 persistence module. | Set `conductor.elasticsearch.version=8`. |
| OpenSearch 2 | `opensearch2` | Open-source ES alternative. | Compatible with ES 7 queries. |
| OpenSearch 3 | `opensearch3` | Latest OpenSearch. | |
| SQLite | `sqlite` | Local development only. | |
| Disabled | N/A | Set `conductor.indexing.enabled=false`. UI search won't work. | |

#### PostgreSQL indexing

```properties
conductor.indexing.enabled=true
conductor.indexing.type=postgres

# Disable Elasticsearch client
conductor.elasticsearch.version=0
```

#### Elasticsearch 7

```properties
conductor.indexing.enabled=true
conductor.elasticsearch.url=http://es-host:9200
conductor.elasticsearch.version=7
conductor.elasticsearch.indexName=conductor
conductor.elasticsearch.clusterHealthColor=yellow

# Performance tuning
conductor.elasticsearch.indexBatchSize=1
conductor.elasticsearch.asyncMaxPoolSize=12
conductor.elasticsearch.asyncWorkerQueueSize=100
conductor.elasticsearch.asyncBufferFlushTimeout=10s
conductor.elasticsearch.indexShardCount=5
conductor.elasticsearch.indexReplicasCount=1

# Auth (if using security)
conductor.elasticsearch.username=elastic
conductor.elasticsearch.password=
```

#### Elasticsearch 8

```properties
conductor.indexing.enabled=true
conductor.indexing.type=elasticsearch8
conductor.elasticsearch.url=http://es-host:9200
conductor.elasticsearch.version=8
conductor.elasticsearch.indexName=conductor
conductor.elasticsearch.clusterHealthColor=yellow
```

#### OpenSearch

```properties
conductor.indexing.enabled=true
# Use opensearch3 for OpenSearch 3
conductor.indexing.type=opensearch2

conductor.opensearch.url=http://os-host:9200
conductor.opensearch.indexPrefix=conductor
conductor.opensearch.clusterHealthColor=yellow
conductor.opensearch.indexReplicasCount=0
```

#### Async indexing

For high-throughput deployments, enable async indexing to decouple the indexing path from the workflow execution path:

```properties
conductor.app.asyncIndexingEnabled=true
conductor.app.asyncUpdateShortRunningWorkflowDuration=30s
conductor.app.asyncUpdateDelay=60s
```

#### Indexing toggles

Control what gets indexed:

```properties
conductor.app.taskIndexingEnabled=true
conductor.app.taskExecLogIndexingEnabled=true
conductor.app.eventMessageIndexingEnabled=true
conductor.app.eventExecutionIndexingEnabled=true
```

---

### Locking

!!! warning "Required for production"
    Distributed locking prevents race conditions when multiple server instances evaluate the same workflow concurrently. **Always enable locking in production with a distributed lock provider** (Redis or Zookeeper).

```properties
conductor.workflow-execution-lock.type=redis
conductor.app.workflowExecutionLockEnabled=true
```

**Supported lock providers:**

| Provider | Property value | When to use |
|:--|:--|:--|
| Redis | `redis` | **Recommended.** Use when Redis is already in the stack. |
| Zookeeper | `zookeeper` | Use when Zookeeper is available (e.g. Kafka deployments). |
| Local | `local_only` | Single-instance development only. **Not safe for multi-instance.** |

#### Redis lock

```properties
conductor.workflow-execution-lock.type=redis
conductor.app.workflowExecutionLockEnabled=true

# Lock is held for at most 60s (value in ms)
conductor.app.lockLeaseTime=60000
# Wait up to 500ms to acquire the lock
conductor.app.lockTimeToTry=500

# Server type: SINGLE, CLUSTER, or SENTINEL
conductor.redis-lock.serverType=SINGLE
conductor.redis-lock.serverAddress=redis://redis-host:6379
# conductor.redis-lock.serverPassword=
# conductor.redis-lock.serverMasterName=master   # for Sentinel
# conductor.redis-lock.namespace=conductor       # key prefix
conductor.redis-lock.ignoreLockingExceptions=false
```

#### Zookeeper lock

```properties
conductor.workflow-execution-lock.type=zookeeper
conductor.app.workflowExecutionLockEnabled=true
conductor.app.lockLeaseTime=60000
conductor.app.lockTimeToTry=500

conductor.zookeeper-lock.connectionString=zk1:2181,zk2:2181,zk3:2181
# conductor.zookeeper-lock.sessionTimeoutMs=60000
# conductor.zookeeper-lock.connectionTimeoutMs=15000
# conductor.zookeeper-lock.namespace=conductor
```

---

### Sweeper

The sweeper is a background process that monitors running workflows. It polls the queue for workflows that need evaluation and triggers the decider. Without the sweeper, long-running workflows will not make progress.
The sweeper runs automatically as part of the Conductor server. Tune the thread count based on your workflow volume:

```properties
# Number of sweeper threads (default: availableProcessors * 2)
conductor.app.sweeperThreadCount=8

# How long to wait when polling the sweep queue (default: 2000ms)
conductor.app.sweeperWorkflowPollTimeout=2000

# Batch size per sweep poll (default: 2)
conductor.app.sweeper.sweepBatchSize=2

# Queue pop timeout in ms (default: 100)
conductor.app.sweeper.queuePopTimeout=100
```

!!! tip "Sweeper sizing"
    Start with `sweeperThreadCount = 2 * CPU cores`. If you see workflows stuck in RUNNING state, increase it. If CPU usage is high while the server is idle, decrease it.

---

### System task workers

System task workers execute built-in task types (HTTP, Event, Wait, Inline, JSON_JQ_TRANSFORM, etc.) inside the Conductor server JVM. They poll internal queues for scheduled system tasks and execute them.

```properties
# Number of system task worker threads (default: availableProcessors * 2)
conductor.app.systemTaskWorkerThreadCount=20

# Max number of tasks to poll at once (default: same as thread count)
conductor.app.systemTaskMaxPollCount=20

# Poll interval (default: 50ms)
conductor.app.systemTaskWorkerPollInterval=50ms

# Callback duration: how often to re-check async system tasks (default: 30s)
conductor.app.systemTaskWorkerCallbackDuration=30s

# Queue pop timeout (default: 100ms)
conductor.app.systemTaskQueuePopTimeout=100ms
```

#### Running system task workers separately

In large deployments, you may want to run system task workers on dedicated instances, separate from the API server. Use the **execution namespace** to isolate which instance handles system tasks:

```properties
# On API-only instances: set a namespace that no system task worker listens on
conductor.app.systemTaskWorkerExecutionNamespace=api-only
conductor.app.systemTaskWorkerThreadCount=0

# On dedicated system task worker instances: match the namespace
conductor.app.systemTaskWorkerExecutionNamespace=worker-pool-1
conductor.app.systemTaskWorkerThreadCount=40
conductor.app.systemTaskMaxPollCount=40
```

#### Isolated system task workers

For task domain isolation (routing specific tasks to specific worker groups):

```properties
# Threads per isolation group (default: 1)
conductor.app.isolatedSystemTaskWorkerThreadCount=4
```

#### Postpone threshold

When a system task has been polled many times without completing (e.g. a Join waiting for branches), Conductor progressively delays re-evaluation to avoid busy-polling:

```properties
# After this many polls, begin exponential backoff (default: 200)
conductor.app.systemTaskPostponeThreshold=200
```

---

### Event processing

The event processor listens to configured event buses and triggers workflows or completes tasks based on incoming events.

```properties
# Thread count for event processing (default: 2)
conductor.app.eventProcessorThreadCount=4

# Event queue polling
# Scheduler poll threads (default: CPU cores)
conductor.app.eventQueueSchedulerPollThreadCount=4
conductor.app.eventQueuePollInterval=100ms
conductor.app.eventQueuePollCount=10
conductor.app.eventQueueLongPollTimeout=1000ms
```

See the [Event-driven recipes](../cookbook/event-driven.md) for configuring Kafka, NATS, AMQP, and SQS event queues.

---

### Payload size limits

Conductor enforces payload size limits to prevent oversized data from degrading performance. When a payload exceeds the threshold, it is automatically stored in external payload storage (S3, PostgreSQL, or Azure Blob).
```properties
# Workflow input/output: threshold to move to external storage (default: 5120 KB)
conductor.app.workflowInputPayloadSizeThreshold=5120KB
conductor.app.workflowOutputPayloadSizeThreshold=5120KB

# Workflow input/output: hard limit, fails the workflow (default: 10240 KB)
conductor.app.maxWorkflowInputPayloadSizeThreshold=10240KB
conductor.app.maxWorkflowOutputPayloadSizeThreshold=10240KB

# Task input/output: threshold to move to external storage (default: 3072 KB)
conductor.app.taskInputPayloadSizeThreshold=3072KB
conductor.app.taskOutputPayloadSizeThreshold=3072KB

# Task input/output: hard limit, fails the task (default: 10240 KB)
conductor.app.maxTaskInputPayloadSizeThreshold=10240KB
conductor.app.maxTaskOutputPayloadSizeThreshold=10240KB

# Workflow variables: hard limit (default: 256 KB)
conductor.app.maxWorkflowVariablesPayloadSizeThreshold=256KB
```

For external payload storage configuration, see [External Payload Storage](../../documentation/advanced/externalpayloadstorage.md).

---

### Workflow monitoring and observability

Conductor exposes Prometheus-compatible metrics out of the box for workflow monitoring and observability:

```properties
conductor.metrics-prometheus.enabled=true
management.endpoints.web.exposure.include=health,info,prometheus
management.metrics.web.server.request.autotime.percentiles=0.50,0.75,0.90,0.95,0.99
management.endpoint.health.show-details=always
```

Scrape `http://<conductor-host>:8080/actuator/prometheus` with Prometheus. For details on available metrics, see [Server Metrics](../../documentation/metrics/server.md) and [Client Metrics](../../documentation/metrics/client.md).

---

## Recommended production configurations

### PostgreSQL stack (simplest)

PostgreSQL handles the database, queue, and indexing, so the only other dependency is Redis for locking. This is the stack with the fewest moving parts.
```properties
# Database
conductor.db.type=postgres
conductor.queue.type=postgres
conductor.external-payload-storage.type=postgres

spring.datasource.url=jdbc:postgresql://db-host:5432/conductor
spring.datasource.username=conductor
spring.datasource.password=

# Indexing (use PostgreSQL, no Elasticsearch needed)
conductor.indexing.enabled=true
conductor.indexing.type=postgres
conductor.elasticsearch.version=0

# Locking (use Redis: lightweight, fast)
conductor.workflow-execution-lock.type=redis
conductor.app.workflowExecutionLockEnabled=true
conductor.redis-lock.serverAddress=redis://redis-host:6379

# Sweeper
conductor.app.sweeperThreadCount=8

# System task workers
conductor.app.systemTaskWorkerThreadCount=20
conductor.app.systemTaskMaxPollCount=20

# Metrics
conductor.metrics-prometheus.enabled=true
management.endpoints.web.exposure.include=health,info,prometheus
```

### Redis + Elasticsearch stack (high throughput)

Best search performance and lowest latency for queue operations.

```properties
# Database + Queue
conductor.db.type=redis_standalone
conductor.queue.type=redis_standalone

conductor.redis.hosts=redis-host:6379:us-east-1c
conductor.redis.workflowNamespacePrefix=conductor
conductor.redis.queueNamespacePrefix=conductor_queues

# Indexing
conductor.indexing.enabled=true
conductor.elasticsearch.url=http://es-host:9200
conductor.elasticsearch.version=7
conductor.elasticsearch.indexName=conductor
conductor.elasticsearch.clusterHealthColor=yellow
conductor.app.asyncIndexingEnabled=true

# Locking
conductor.workflow-execution-lock.type=redis
conductor.app.workflowExecutionLockEnabled=true
conductor.redis-lock.serverAddress=redis://redis-host:6379

# Sweeper
conductor.app.sweeperThreadCount=16

# System task workers
conductor.app.systemTaskWorkerThreadCount=40
conductor.app.systemTaskMaxPollCount=40

# Metrics
conductor.metrics-prometheus.enabled=true
management.endpoints.web.exposure.include=health,info,prometheus
```

---

## Running with Docker

### Using Docker Compose
```shell
git clone https://github.com/conductor-oss/conductor
cd conductor
docker compose -f docker/docker-compose.yaml up
```

To use a different backend, swap the compose file:

```shell
docker compose -f docker/docker-compose-postgres.yaml up
```

### Using the standalone image

```shell
docker run -p 8080:8080 conductoross/conductor:latest
```

### Custom configuration via volume mount

Mount your own properties file to override the defaults without rebuilding the image:

```shell
docker run -p 8080:8080 \
  -v /path/to/my-config.properties:/app/config/config.properties \
  conductoross/conductor:latest
```

### Accessing Conductor

| URL | Description |
|:----|:---|
| `http://localhost:8080` | Conductor UI |
| `http://localhost:8080/swagger-ui/index.html` | REST API docs |

### Shutting down

```shell
# Ctrl+C to stop, then:
docker compose down
```

---

## Multi-instance deployment and horizontal scaling

For high availability and horizontal scaling, run multiple Conductor server instances behind a load balancer. All instances share the same database, queue, index, and lock backends. This architecture enables workflow engine scalability to millions of concurrent executions.

**Requirements:**

- **Distributed locking must be enabled** (`redis` or `zookeeper`). Without it, concurrent decider evaluations on the same workflow will cause race conditions.
- All instances must point to the same database, queue, and indexing backends.
- The load balancer should use round-robin or least-connections routing.
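The locking requirement above can be checked before an instance joins the pool. The following is a minimal sketch, assuming the instance is configured from a single properties file; `prop_value` and `check_locking` are hypothetical helper names, not part of Conductor, and the check does not see values overridden through environment variables.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check: prop_value and check_locking are illustrative
# helper names, not part of Conductor.

# Print the last value assigned to a key in a .properties file.
prop_value() {
  awk -v key="$2" '
    /^[ \t]*#/ || index($0, "=") == 0 { next }   # skip comments and lines without "="
    {
      k = substr($0, 1, index($0, "=") - 1)
      v = substr($0, index($0, "=") + 1)
      gsub(/^[ \t]+|[ \t]+$/, "", k)
      gsub(/^[ \t]+|[ \t]+$/, "", v)
      if (k == key) val = v
    }
    END { print val }
  ' "$1"
}

# Succeed only if the file enables a distributed lock provider.
check_locking() {
  local file="$1" type enabled
  type="$(prop_value "$file" conductor.workflow-execution-lock.type)"
  enabled="$(prop_value "$file" conductor.app.workflowExecutionLockEnabled)"
  case "$type" in
    redis|zookeeper) ;;
    *) echo "FAIL: lock type '${type:-unset}' is not a distributed provider"; return 1 ;;
  esac
  if [ "$enabled" != "true" ]; then
    echo "FAIL: workflowExecutionLockEnabled is '${enabled:-unset}'"
    return 1
  fi
  echo "OK: distributed locking via $type"
}
```

For example, `check_locking /path/to/my-config.properties` could run in a deployment pipeline and block the rollout on a non-zero exit status.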
**Optional: separate API and worker instances**

```
┌──────────────────┐     ┌──────────────────┐
│  API Instance 1  │     │  API Instance 2  │  ← handle REST/gRPC, low system task threads
│  (systemTask=0)  │     │  (systemTask=0)  │
└────────┬─────────┘     └────────┬─────────┘
         │                        │
    ┌────┴────────────────────────┴────┐
    │          Load Balancer           │
    └────┬────────────────────────┬────┘
         │                        │
┌────────┴──────────┐     ┌───────┴───────────┐
│  Worker Instance  │     │  Worker Instance  │  ← high system task threads, sweeper
│  (systemTask=40)  │     │  (systemTask=40)  │
└───────────────────┘     └───────────────────┘
```

---

## Troubleshooting

| Issue | Fix |
|:--|:--|
| Out of memory or slow performance | Check JVM heap usage and adjust `-Xms` / `-Xmx` as necessary. Monitor with `jstat` or the JVM memory metrics exposed at `/actuator/prometheus`. |
| Elasticsearch stuck in yellow health | Set `conductor.elasticsearch.clusterHealthColor=yellow` or add more ES nodes for green. |
| Workflows stuck in RUNNING | Check sweeper is running and `sweeperThreadCount > 0`. Check lock provider is reachable. |
| System tasks not executing | Verify `systemTaskWorkerThreadCount > 0` and the queue backend is reachable. |
| Config changes not taking effect | Properties are baked into the Docker image at build time. Mount a volume instead of rebuilding. |
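
Several of the fixes above start with confirming that the server and its backends are reachable. As a rough first check, the sketch below extracts the overall status from the Spring Boot health endpoint exposed by the monitoring configuration in this guide. `health_status` is a hypothetical helper name, and the sketch assumes the top-level `status` is the first `status` field in the JSON response.

```shell
#!/usr/bin/env bash
# Hypothetical helper: health_status is an illustrative name, not a Conductor command.

# Read an Actuator health JSON document on stdin and print the overall status.
# Assumption: the top-level "status" is the first "status" field in the document.
health_status() {
  grep -oE '"status"[[:space:]]*:[[:space:]]*"[A-Z_]+"' | head -n 1 | cut -d'"' -f4
}
```

With a local server, `curl -fsS http://localhost:8080/actuator/health | health_status` should print `UP`. Any other value, such as `DOWN` or `OUT_OF_SERVICE`, suggests inspecting the full response for the failing component.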