# Go Standards - Multi-Tenant > **Module:** multi-tenant.md | **Sections:** §27 | **Parent:** [index.md](index.md) This module covers multi-tenant patterns with Tenant Manager. --- ## Table of Contents | # | Section | Description | |---|---------|-------------| | 1 | [Multi-Tenant Patterns (CONDITIONAL)](#multi-tenant-patterns-conditional) | Configuration, Tenant Manager API, middleware, context injection, repository adaptation | | 1a | [Generic TenantMiddleware (Standard Pattern)](#generic-tenantmiddleware-standard-pattern) | Single-module services (CRM, plugins, reporter) | | 1b | [Multi-module middleware (MultiPoolMiddleware)](#multi-module-middleware-multipoolmiddleware) | Multi-module unified services (midaz ledger) | | 2 | [Tenant Isolation Verification (⚠️ CONDITIONAL)](#tenant-isolation-verification-conditional) | Database-per-tenant verification, context-based connection checks | | 3 | [Route-Level Auth-Before-Tenant Ordering (MANDATORY)](#route-level-auth-before-tenant-ordering-mandatory) | Auth MUST validate JWT before tenant middleware calls Tenant Manager API | | 24 | [Multi-Tenant Message Queue Consumers (Lazy Mode)](#multi-tenant-message-queue-consumers-lazy-mode) | Lazy consumer initialization, on-demand connection, exponential backoff | | 25 | [M2M Credentials via Secret Manager (Plugin-Only)](#m2m-credentials-via-secret-manager-plugin-only) | AWS Secrets Manager integration for plugin-to-product authentication per tenant | | 26 | [Service Authentication (MANDATORY)](#service-authentication-mandatory) | API key authentication for tenant-manager /settings endpoint via X-API-Key header | --- ## Multi-Tenant Patterns (CONDITIONAL) **CONDITIONAL:** Only implement if `MULTI_TENANT_ENABLED=true` is required for your service. ### HARD GATE: Canonical Model Compliance **Existence ≠ Compliance.** A service that has "some multi-tenant code" is NOT considered multi-tenant unless every component matches the canonical patterns defined in this document exactly. MUST replace multi-tenant implementations that use custom middleware, manual DB switching, non-standard env var names, or any mechanism other than the lib-commons v3 tenant-manager sub-packages — they are **non-compliant**. Not patched, not adapted, **replaced**. The only valid multi-tenant implementation uses: - `tenantId` from JWT via `TenantMiddleware` or `MultiPoolMiddleware` (from `lib-commons/v3/commons/tenant-manager/middleware`), registered per-route using a local `WhenEnabled` helper - `core.ResolvePostgres` / `core.ResolveMongo` / `core.ResolveModuleDB` for database resolution (from `lib-commons/v3/commons/tenant-manager/core`) - `valkey.GetKeyFromContext` for Redis key prefixing (from `lib-commons/v3/commons/tenant-manager/valkey`) - `s3.GetObjectStorageKeyForTenant` for S3 key prefixing (from `lib-commons/v3/commons/tenant-manager/s3`) - `tmrabbitmq.Manager` for RabbitMQ vhost isolation (from `lib-commons/v3/commons/tenant-manager/rabbitmq`) - The 8 canonical `MULTI_TENANT_*` environment variables with correct names and defaults - `client.WithCircuitBreaker` on the Tenant Manager HTTP client - `client.WithServiceAPIKey` on the Tenant Manager HTTP client for `/settings` endpoint authentication MUST correct any deviation from these patterns before the service can be considered multi-tenant. **Any file outside this canonical set that claims to handle multi-tenant logic (custom tenant resolvers, manual pool managers, wrapper middleware, etc.) is non-compliant and MUST NOT be considered part of the multi-tenant implementation.** Only files following the patterns below are valid. ### Canonical File Map These are the only files that require multi-tenant changes. The exact paths follow the standard Go project layout used across Lerian services. Files not listed here MUST NOT contain multi-tenant logic. **Always modified (every service):** | File | Gate | What Changes | |------|------|-------------| | `go.mod` | 2 | lib-commons v3, lib-auth v2 | | `internal/bootstrap/config.go` | 3 | 8 canonical `MULTI_TENANT_*` env vars in Config struct | | `internal/bootstrap/service.go` (or equivalent init file) | 4 | Conditional initialization: Tenant Manager client, connection managers, middleware creation. Branch on `cfg.MultiTenantEnabled` | | `internal/bootstrap/routes.go` (or equivalent router file) | 4 | Per-route composition via `WhenEnabled(ttHandler)` — auth validates JWT before tenant resolves DB. Each project implements the `WhenEnabled` helper locally. See [Route-Level Auth-Before-Tenant Ordering](#route-level-auth-before-tenant-ordering-mandatory) | **Per detected database/storage (Gate 5):** | File Pattern | Stack | What Changes | |-------------|-------|-------------| | `internal/adapters/postgres/**/*.postgresql.go` | PostgreSQL | `r.connection.GetDB()` → `core.ResolvePostgres(ctx, r.connection)` or `core.ResolveModuleDB(ctx, module, r.connection)` (multi-module) | | `internal/adapters/mongodb/**/*.mongodb.go` | MongoDB | Static mongo connection → `core.ResolveMongo(ctx, r.connection, r.dbName)` | | `internal/adapters/redis/**/*.redis.go` | Redis | Every key operation → `valkey.GetKeyFromContext(ctx, key)` (including Lua script `KEYS[]` and `ARGV[]`) | | `internal/adapters/storage/**/*.go` (or S3 adapter) | S3 | Every object key → `s3.GetObjectStorageKeyForTenant(ctx, key)` | **Conditional — plugin only (Gate 5.5):** | File | Condition | What Changes | |------|-----------|-------------| | `internal/adapters/product/client.go` (or equivalent product API client) | Plugin that calls product APIs | M2M authenticator with per-tenant credential caching via `secretsmanager.GetM2MCredentials` | | `internal/bootstrap/service.go` | Plugin | Conditional M2M wiring: `if cfg.MultiTenantEnabled` → AWS Secrets Manager client + M2M provider | **Conditional — RabbitMQ only (Gate 6):** | File Pattern | What Changes | |-------------|-------------| | `internal/adapters/rabbitmq/producer*.go` | Dual constructor: single-tenant (direct connection) + multi-tenant (`tmrabbitmq.Manager.GetChannel`). `X-Tenant-ID` header injection | | `internal/adapters/rabbitmq/consumer*.go` (or `internal/bootstrap/`) | `tmconsumer.MultiTenantConsumer` with lazy initialization. `X-Tenant-ID` header extraction | **Tests (Gate 7-8):** | File Pattern | What Tests | |-------------|------------| | `internal/bootstrap/*_test.go` | `TestMultiTenant_BackwardCompatibility` — validates single-tenant mode works unchanged | | `internal/adapters/**/*_test.go` | Unit tests with mock tenant context, tenant isolation tests (two tenants, data separation) | | `internal/service/*_test.go` (or integration test dir) | Integration tests with two distinct tenants verifying cross-tenant isolation | **Output artifacts (Gate 11):** | File | What | |------|------| | `docs/multi-tenant-guide.md` | Activation guide: env vars, how to enable/disable, verification steps | | `docs/multi-tenant-preview.html` | Visual implementation preview (generated at Gate 1.5, kept for reference) | **HARD GATE: Files outside this map that contain multi-tenant logic are non-compliant.** If a service has custom files like `internal/tenant/resolver.go`, `internal/middleware/tenant_middleware.go`, `pkg/multitenancy/pool.go` or similar — these MUST be removed and replaced with the canonical lib-commons v3 tenant-manager sub-packages wired through the files listed above. ### Required lib-commons Version Multi-tenant support requires **lib-commons v3** (`github.com/LerianStudio/lib-commons/v3`). The `tenant-manager` package does not exist in v2. | lib-commons version | Multi-tenant support | Package path | |--------------------|-----------------------|-------------| | **v2** (`lib-commons/v2`) | Not available | N/A — no `tenant-manager` package | | **v3** (`lib-commons/v3`) | Full support | `github.com/LerianStudio/lib-commons/v3/commons/tenant-manager/...` (sub-packages: `core`, `client`, `postgres`, `mongo`, `middleware`, `rabbitmq`, `consumer`, `valkey`, `s3`). The `middleware` sub-package contains both `TenantMiddleware` (single-module) and `MultiPoolMiddleware` (multi-module). Route-level composition uses a local `WhenEnabled` helper (not from lib-commons). | **Migration from v2 to v3:** Services currently on lib-commons v2 MUST upgrade to v3 before implementing multi-tenant. The upgrade involves: 1. Update `go.mod`: `github.com/LerianStudio/lib-commons/v2` -> `github.com/LerianStudio/lib-commons/v3` 2. Update all import paths: `lib-commons/v2/commons/...` -> `lib-commons/v3/commons/...` 3. Add the `tenant-manager` package imports where needed ```bash # Update go.mod go get github.com/LerianStudio/lib-commons/v3@latest # Update import paths across the codebase (portable — works on macOS and Linux) find . -name "*.go" -exec perl -pi -e 's|lib-commons/v2|lib-commons/v3|g' {} + # Verify build go build ./... ``` ### When to Use Multi-Tenant Mode | Scenario | Mode | Configuration | |----------|------|---------------| | Single customer deployment | Single-tenant | `MULTI_TENANT_ENABLED=false` (default) | | SaaS with shared infrastructure | Multi-tenant | `MULTI_TENANT_ENABLED=true` | | Multiple isolated databases per customer | Multi-tenant | Requires Tenant Manager | ### Environment Variables | Env Var | Description | Default | Required | |---------|-------------|---------|----------| | `APPLICATION_NAME` | Service name for Tenant Manager API (`/tenants/{id}/services/{service}/settings`) | - | Yes | | `MULTI_TENANT_ENABLED` | Enable multi-tenant mode | `false` | Yes | | `MULTI_TENANT_URL` | Tenant Manager service URL | - | If multi-tenant | | `MULTI_TENANT_ENVIRONMENT` | Deployment environment for cache key segmentation (lazy consumer tenant discovery) | `staging` | Only if RabbitMQ | | `MULTI_TENANT_MAX_TENANT_POOLS` | Soft limit for tenant connection pools (LRU eviction) | `100` | No | | `MULTI_TENANT_IDLE_TIMEOUT_SEC` | Seconds before idle tenant connection is eviction-eligible | `300` | No | | `MULTI_TENANT_CIRCUIT_BREAKER_THRESHOLD` | Consecutive failures before circuit breaker opens | `5` | Yes | | `MULTI_TENANT_CIRCUIT_BREAKER_TIMEOUT_SEC` | Seconds before circuit breaker resets (half-open) | `30` | Yes | | `MULTI_TENANT_SERVICE_API_KEY` | API key for authenticating with tenant-manager `/settings` endpoint. Generated via service catalog. | - | If multi-tenant | **Example `.env` for multi-tenant:** ```bash MULTI_TENANT_ENABLED=true MULTI_TENANT_URL=http://tenant-manager:4003 MULTI_TENANT_ENVIRONMENT=production MULTI_TENANT_MAX_TENANT_POOLS=100 MULTI_TENANT_IDLE_TIMEOUT_SEC=300 MULTI_TENANT_CIRCUIT_BREAKER_THRESHOLD=5 MULTI_TENANT_CIRCUIT_BREAKER_TIMEOUT_SEC=30 MULTI_TENANT_SERVICE_API_KEY=your-service-api-key-here ``` ### Configuration ```go // internal/bootstrap/config.go type Config struct { ApplicationName string `env:"APPLICATION_NAME"` // Multi-Tenant Configuration MultiTenantEnabled bool `env:"MULTI_TENANT_ENABLED" default:"false"` MultiTenantURL string `env:"MULTI_TENANT_URL"` MultiTenantEnvironment string `env:"MULTI_TENANT_ENVIRONMENT" default:"staging"` MultiTenantMaxTenantPools int `env:"MULTI_TENANT_MAX_TENANT_POOLS" default:"100"` MultiTenantIdleTimeoutSec int `env:"MULTI_TENANT_IDLE_TIMEOUT_SEC" default:"300"` MultiTenantCircuitBreakerThreshold int `env:"MULTI_TENANT_CIRCUIT_BREAKER_THRESHOLD" default:"5"` MultiTenantCircuitBreakerTimeoutSec int `env:"MULTI_TENANT_CIRCUIT_BREAKER_TIMEOUT_SEC" default:"30"` MultiTenantServiceAPIKey string `env:"MULTI_TENANT_SERVICE_API_KEY"` // PostgreSQL Primary (used as default connection in single-tenant mode) PrimaryDBHost string `env:"DB_HOST"` PrimaryDBUser string `env:"DB_USER"` PrimaryDBPassword string `env:"DB_PASSWORD"` PrimaryDBName string `env:"DB_NAME"` PrimaryDBPort string `env:"DB_PORT"` PrimaryDBSSLMode string `env:"DB_SSLMODE"` } ``` ### Service Name Resolution The `service` parameter in `NewManager` maps to the Tenant Manager API path: `/tenants/{id}/services/{service}/settings`. Use `cfg.ApplicationName` (env `APPLICATION_NAME`): ```go pgMgr := tmpostgres.NewManager(tmClient, cfg.ApplicationName, tmpostgres.WithModule(ApplicationName), // module = component name constant tmpostgres.WithLogger(logger), ) ``` | Parameter | Source | Purpose | Example | |-----------|--------|---------|---------| | `service` (2nd arg) | `cfg.ApplicationName` (env `APPLICATION_NAME`) | Tenant Manager API path | `"ledger"`, `"reporter"` | | `module` (WithModule) | Component constant `ApplicationName` | Key in `TenantConfig.Databases[module]` | `"onboarding"`, `"transaction"`, `"manager"` | ### Manager Wiring **TenantMiddleware (single-module):** Managers are passed directly to the middleware: ```go mongoManager := tmmongo.NewManager(tmClient, cfg.ApplicationName, ...) ttMid := tmmiddleware.NewTenantMiddleware( tmmiddleware.WithMongoManager(mongoManager), ) ``` **MultiPoolMiddleware (multi-module):** Managers MUST be assigned to the Service struct and exposed via getters: ```go type Service struct { pgManager interface{} mongoManager interface{} } func (s *Service) GetPGManager() interface{} { return s.pgManager } func (s *Service) GetMongoManager() interface{} { return s.mongoManager } // At construction, MUST assign managers service := &Service{ pgManager: pg.pgManager, mongoManager: mgo.mongoManager, } ``` ### Tenant Manager Service API The Tenant Manager is an external service that stores database credentials per tenant. All connection managers in lib-commons call this API to resolve tenant-specific connections. **Endpoints:** | Method | Path | Returns | Purpose | |--------|------|---------|---------| | `GET` | `/tenants/{tenantID}/services/{service}/settings` | `TenantConfig` | Full tenant configuration with DB credentials | | `GET` | `/tenants/active?service={service}` | `[]*TenantSummary` | List of active tenants (fallback for discovery) | **Tenant Discovery (for lazy consumer mode):** 1. Primary: Redis `SMEMBERS "tenant-manager:tenants:active"` (fast, <1ms) 2. Fallback: HTTP `GET /tenants/active?service={service}` (slower, network call) ### TenantConfig Data Model The Tenant Manager returns this structure for each tenant. The `Databases` map is keyed by **module name** (e.g., `"onboarding"`, `"transaction"`). ```go type TenantConfig struct { ID string // Tenant UUID TenantSlug string // Human-readable slug IsolationMode string // "isolated" (default) or "schema" Databases map[string]DatabaseConfig // module -> config Messaging *MessagingConfig // RabbitMQ config (optional) } type DatabaseConfig struct { PostgreSQL *PostgreSQLConfig PostgreSQLReplica *PostgreSQLConfig // Read replica (optional) MongoDB *MongoDBConfig ConnectionSettings *ConnectionSettings // Per-tenant pool overrides (optional) } // ConnectionSettings holds per-tenant database connection pool settings. // When present, these values override the global defaults on PostgresManager/MongoManager. // If nil (e.g., older tenant associations), global defaults apply. type ConnectionSettings struct { MaxOpenConns int `json:"maxOpenConns"` MaxIdleConns int `json:"maxIdleConns"` } ``` **Isolation Modes:** | Mode | Database | Schema | Connection String | When to Use | |------|----------|--------|-------------------|-------------| | `isolated` (default) | Separate database per tenant | Default `public` schema | Standard connection | Strong isolation, recommended | | `schema` | Shared database | Schema per tenant | Adds `options=-csearch_path="{schema}"` | Cost optimization, weaker isolation | ### Connection Pool Management All connection managers (PostgreSQL, MongoDB, RabbitMQ) use **LRU eviction with soft limits**: - **Soft limit** (`WithMaxTenantPools`): When the pool reaches this size and a new tenant needs a connection, only connections idle longer than the timeout are evicted. If all connections are active, the pool grows beyond the limit. - **Idle timeout** (`WithIdleTimeout`): Connections not accessed within this window become eligible for eviction. Default: 5 minutes. - **Connection health**: Cached connections are pinged before reuse (3s timeout). Stale connections are recreated transparently. The Tenant Manager HTTP client MUST enable the **circuit breaker** (`WithCircuitBreaker`): - After N consecutive failures, the circuit opens and requests fail fast with `ErrCircuitBreakerOpen` - After the timeout, the circuit enters half-open state and allows one request through - On success, the circuit resets to closed ### JWT Tenant Extraction **Claim key:** `tenantId` (camelCase, hardcoded) **⛔ CRITICAL: `tenantId` from JWT is the ONLY multi-tenant mechanism.** The `tenantId` identifies the client/customer. The lib-commons `TenantMiddleware` extracts it from the JWT, resolves the tenant-specific database connection via Tenant Manager API, and stores it in context. Each tenant has its own database — tenant A cannot query tenant B's database. **`organization_id` is NOT part of multi-tenant isolation.** It is a separate concern (entity within a domain). Adding `organization_id` filters to queries does NOT provide tenant isolation. Multi-tenant isolation comes exclusively from `tenantId` → `TenantConnectionManager` → database-per-tenant. **Anti-Rationalization:** | Rationalization | Why It's WRONG | Required Action | |-----------------|----------------|-----------------| | "Adding organization_id filters = multi-tenant" | organization_id does NOT route to different databases. All data still in ONE database. | **MUST implement tenantId → TenantConnectionManager** | | "The codebase already has organization_id wiring" | organization_id is irrelevant for multi-tenant. tenantId from JWT is the mechanism. | **Implement TenantMiddleware with JWT tenantId extraction** | | "Midaz uses organization_id for tenant isolation" | WRONG. Midaz has ZERO organization_id in WHERE clauses. It uses tenantId → core.ResolveModuleDB(ctx, module, fallback). | **Follow the actual pattern: tenantId → context → database routing** | ```go // internal/bootstrap/middleware.go func extractTenantIDFromToken(c *fiber.Ctx) (string, error) { // Use lib-commons helper for token extraction accessToken := libHTTP.ExtractTokenFromHeader(c) if accessToken == "" { return "", errors.New("no authorization token provided") } // Parse without validation (validation done by auth middleware) token, _, err := new(jwt.Parser).ParseUnverified(accessToken, jwt.MapClaims{}) if err != nil { return "", err } claims, ok := token.Claims.(jwt.MapClaims) if !ok { return "", errors.New("invalid token claims format") } // Extract tenantId (camelCase only - no fallbacks) tenantID, ok := claims["tenantId"].(string) if !ok || tenantID == "" { return "", errors.New("tenantId claim not found in token") } return tenantID, nil } ``` ### Generic TenantMiddleware (Standard Pattern) **This is the standard pattern for all services.** The lib-commons `TenantMiddleware` handles JWT extraction, tenant resolution, and context injection automatically. ```go // internal/bootstrap/config.go import ( "github.com/LerianStudio/lib-commons/v3/commons/tenant-manager/client" tmpostgres "github.com/LerianStudio/lib-commons/v3/commons/tenant-manager/postgres" tmmongo "github.com/LerianStudio/lib-commons/v3/commons/tenant-manager/mongo" "github.com/LerianStudio/lib-commons/v3/commons/tenant-manager/middleware" ) func initService(cfg *Config) { // 1. Create Tenant Manager HTTP client (with circuit breaker — MANDATORY) var clientOpts []client.ClientOption if cfg.MultiTenantCircuitBreakerThreshold > 0 { clientOpts = append(clientOpts, client.WithCircuitBreaker( cfg.MultiTenantCircuitBreakerThreshold, time.Duration(cfg.MultiTenantCircuitBreakerTimeoutSec)*time.Second, ), ) } clientOpts = append(clientOpts, client.WithServiceAPIKey(cfg.MultiTenantServiceAPIKey), ) tmClient, err := client.NewClient(cfg.MultiTenantURL, logger, clientOpts...) if err != nil { return fmt.Errorf("creating tenant-manager client: %w", err) } idleTimeout := time.Duration(cfg.MultiTenantIdleTimeoutSec) * time.Second // 2. Create PostgreSQL manager (one per service or per module) pgManager := tmpostgres.NewManager(tmClient, "my-service", tmpostgres.WithModule("my-module"), tmpostgres.WithLogger(logger), tmpostgres.WithMaxTenantPools(cfg.MultiTenantMaxTenantPools), tmpostgres.WithIdleTimeout(idleTimeout), ) // 3. Create MongoDB manager (optional) mongoManager := tmmongo.NewManager(tmClient, "my-service", tmmongo.WithModule("my-module"), tmmongo.WithLogger(logger), tmmongo.WithMaxTenantPools(cfg.MultiTenantMaxTenantPools), tmmongo.WithIdleTimeout(idleTimeout), ) // 4. Create middleware (do NOT register globally — use per-route with WhenEnabled) ttMid := middleware.NewTenantMiddleware( middleware.WithPostgresManager(pgManager), middleware.WithMongoManager(mongoManager), // optional ) // Pass ttMid.WithTenantDB as the ttHandler to routes.go. // In routes.go, register per-route using WhenEnabled(ttHandler). // When MULTI_TENANT_ENABLED=false, pass nil instead — WhenEnabled handles it. // See "Route-Level Auth-Before-Tenant Ordering" section } ``` **What the middleware does internally:** 1. Extracts `Authorization: Bearer {token}` header 2. Parses JWT (unverified — auth middleware already validated it) 3. Extracts `tenantId` claim 4. Calls `PostgresManager.GetConnection(ctx, tenantID)` to resolve tenant-specific DB 5. Stores tenant ID and DB connection in context 6. If MongoDB manager is set, resolves and stores MongoDB connection 7. Calls `c.Next()` **In repositories, use context-based getters:** ```go import ( "github.com/LerianStudio/lib-commons/v3/commons/tenant-manager/core" "github.com/LerianStudio/lib-commons/v3/commons/tenant-manager/valkey" ) // Single-module service: use generic getter db, err := core.ResolvePostgres(ctx, r.connection) // MongoDB mongoDB, err := core.ResolveMongo(ctx, r.connection, r.dbName) // Redis key prefixing key := valkey.GetKeyFromContext(ctx, "cache-key") // -> "tenant:{tenantId}:cache-key" // Get tenant ID directly tenantID := core.GetTenantID(ctx) ``` ### Multi-module middleware (MultiPoolMiddleware) **When to use:** Services that serve multiple modules on a single port with different databases per module. For example, midaz ledger serves onboarding and transaction modules in a single process, each with its own PostgreSQL and MongoDB pools. **Most services do NOT need this.** If your service has a single database (CRM, plugin-auth, reporter, etc.), use the standard `TenantMiddleware` above. Only reach for `MultiPoolMiddleware` when you have path-based routing to separate database pools. **Import:** ```go import ( tmmiddleware "github.com/LerianStudio/lib-commons/v3/commons/tenant-manager/middleware" ) ``` **Key types:** | Type | Purpose | |------|---------| | `MultiPoolMiddleware` | Routes requests to module-specific tenant pools based on URL path matching | | `MultiPoolOption` | Functional option for configuring `MultiPoolMiddleware` | | `ConsumerTrigger` | Interface for lazy consumer spawning (defined in middleware package) | | `ErrorMapper` | Function type for custom HTTP error responses | **Available options:** | Option | Purpose | Required | |--------|---------|----------| | `WithRoute(paths, module, pgPool, mongoPool)` | Map URL path prefixes to a module's database pools | At least one route or default | | `WithDefaultRoute(module, pgPool, mongoPool)` | Fallback route when no path-based route matches | At least one route or default | | `WithPublicPaths(paths...)` | URL prefixes that bypass tenant resolution entirely | No | | `WithConsumerTrigger(ct)` | Enable lazy consumer spawning after tenant ID extraction | No (only if using lazy RabbitMQ mode) | | `WithCrossModuleInjection()` | Resolve PG connections for all registered modules, not just the matched one | No (only if handlers need cross-module DB access) | | `WithErrorMapper(fn)` | Custom function to convert tenant-manager errors into HTTP responses | No (built-in mapper is used by default) | | `WithMultiPoolLogger(logger)` | Set logger for the middleware (otherwise extracted from context) | No | #### Multi-module service example ```go // config.go - Multi-module service (e.g., unified ledger with onboarding + transaction) import ( tmmiddleware "github.com/LerianStudio/lib-commons/v3/commons/tenant-manager/middleware" ) transactionPaths := []string{"/transactions", "/operations", "/balances", "/asset-rates"} multiMid := tmmiddleware.NewMultiPoolMiddleware( tmmiddleware.WithRoute(transactionPaths, "transaction", txPGPool, txMongoPool), tmmiddleware.WithDefaultRoute("onboarding", onbPGPool, onbMongoPool), tmmiddleware.WithPublicPaths("/health", "/version", "/swagger"), tmmiddleware.WithConsumerTrigger(consumerTrigger), // optional, for lazy RabbitMQ mode tmmiddleware.WithCrossModuleInjection(), // enables cross-module PG connections tmmiddleware.WithErrorMapper(customErrorMapper), // optional, for custom HTTP error responses tmmiddleware.WithMultiPoolLogger(logger), ) // Pass multiMid.Handle to routes.go — register per-route using WhenEnabled: // See "Route-Level Auth-Before-Tenant Ordering" section ``` **What the middleware does internally:** 1. Checks if the request path matches a public path — if so, calls `c.Next()` immediately 2. Matches the request path against registered routes (first match wins, falls back to default route) 3. Checks if the matched route's PG pool is multi-tenant — if not, calls `c.Next()` 4. Extracts `Authorization: Bearer {token}` header and parses JWT 5. Extracts `tenantId` claim from JWT 6. Calls `ConsumerTrigger.EnsureConsumerStarted(ctx, tenantID)` if configured 7. Resolves PG connection via `route.pgPool.GetConnection(ctx, tenantID)` and stores it using module-scoped context keys 8. If `WithCrossModuleInjection()` is enabled, resolves PG connections for all other routes too 9. Resolves MongoDB connection if the route has a mongo pool 10. Calls `c.Next()` **In repositories for multi-module services, use module-scoped getters:** ```go import "github.com/LerianStudio/lib-commons/v3/commons/tenant-manager/core" // Multi-module: use module-specific getter db, err := core.ResolveModuleDB(ctx, "transaction", r.connection) db, err := core.ResolveModuleDB(ctx, "onboarding", r.connection) ``` #### Simple single-module service example ```go // config.go - Single-module service (e.g., CRM, plugin, reporter) // Just use TenantMiddleware directly — no need for MultiPoolMiddleware import ( tmmiddleware "github.com/LerianStudio/lib-commons/v3/commons/tenant-manager/middleware" ) ttMid := tmmiddleware.NewTenantMiddleware( tmmiddleware.WithPostgresManager(pgManager), tmmiddleware.WithMongoManager(mongoManager), // optional ) // Pass ttMid.WithTenantDB to routes.go — register per-route using WhenEnabled: // See "Route-Level Auth-Before-Tenant Ordering" section ``` #### Choosing between TenantMiddleware and MultiPoolMiddleware | Feature | TenantMiddleware | MultiPoolMiddleware | |---------|-----------------|-------------------| | Single PG pool | Yes | Yes (via `WithDefaultRoute`) | | Multiple PG pools (path-based) | No | Yes (via `WithRoute`) | | MongoDB support | Yes | Yes | | Cross-module injection | No | Yes (via `WithCrossModuleInjection`) | | Consumer trigger (lazy mode) | No | Yes (via `WithConsumerTrigger`) | | Custom error mapping | No | Yes (via `WithErrorMapper`) | | Public path bypass | No (handled externally) | Yes (via `WithPublicPaths`) | | When to use | Single-module services | Multi-module unified services | **Rule of thumb:** If your service has one database module, use `TenantMiddleware`. If your service combines multiple modules with different databases behind one HTTP port, use `MultiPoolMiddleware`. #### ConsumerTrigger interface The `ConsumerTrigger` interface is defined in the lib-commons middleware package. Services that process messages in multi-tenant mode implement this interface and pass it to the middleware for lazy consumer activation. ```go // Defined in github.com/LerianStudio/lib-commons/v3/commons/tenant-manager/middleware type ConsumerTrigger interface { EnsureConsumerStarted(ctx context.Context, tenantID string) } ``` The middleware calls `EnsureConsumerStarted` after extracting the tenant ID. The first request per tenant spawns the consumer (~500ms). Subsequent requests return immediately (<1ms). Import this interface from `tmmiddleware`, not from service-specific packages. #### ErrorMapper The `ErrorMapper` type lets you customize how tenant-manager errors are converted into HTTP responses. When not set, the built-in default mapper handles standard cases (401, 403, 404, 503). ```go // Defined in github.com/LerianStudio/lib-commons/v3/commons/tenant-manager/middleware type ErrorMapper func(c *fiber.Ctx, err error, tenantID string) error ``` Example custom mapper: ```go customErrorMapper := func(c *fiber.Ctx, err error, tenantID string) error { if errors.Is(err, core.ErrTenantNotFound) { return c.Status(404).JSON(fiber.Map{ "code": "TENANT_NOT_FOUND", "message": fmt.Sprintf("tenant %s not found", tenantID), }) } // Fall through to default behavior for other errors return c.Status(500).JSON(fiber.Map{ "code": "INTERNAL_ERROR", "message": err.Error(), }) } ``` ### Database Connection in Repositories Repositories use context-based getters to retrieve tenant connections: ```go // internal/adapters/postgres/entity/entity.postgresql.go func (r *EntityPostgreSQLRepository) Create(ctx context.Context, entity *mmodel.Entity) (*mmodel.Entity, error) { logger, tracer, _, _ := libCommons.NewTrackingFromContext(ctx) ctx, span := tracer.Start(ctx, "postgres.create_entity") defer span.End() // Get tenant-specific connection from context db, err := core.ResolvePostgres(ctx, r.connection) if err != nil { libOpentelemetry.HandleSpanError(&span, "Failed to get database connection", err) logger.Errorf("Failed to get database connection: %v", err) return nil, err } record := &EntityPostgreSQLModel{} record.FromEntity(entity) // Use db for queries - automatically scoped to tenant's database // ... } ``` ### Redis Key Prefixing ```go // internal/adapters/redis/repository.go func (r *RedisRepository) Set(ctx context.Context, key, value string, ttl time.Duration) error { logger, tracer, _, _ := libCommons.NewTrackingFromContext(ctx) ctx, span := tracer.Start(ctx, "redis.set") defer span.End() // Tenant-aware key prefixing (adds tenant:{tenantId}: prefix if multi-tenant) key = valkey.GetKeyFromContext(ctx, key) rds, err := r.conn.GetConnection(ctx) if err != nil { return err } return rds.Set(ctx, key, value, ttl).Err() } ``` ### S3/Object Storage Key Prefixing Services that store files in S3 MUST prefix object keys with the tenant ID for tenant isolation. The bucket is configured per service via environment variable. Tenant separation is by directory within the bucket. ```go // In any service/adapter that uploads, downloads, or deletes files from S3: func (r *StorageRepository) Upload(ctx context.Context, originalKey, contentType string, data io.Reader) error { // Tenant-aware key prefixing: {tenantId}/{originalKey} in multi-tenant, {originalKey} in single-tenant key := s3.GetObjectStorageKeyForTenant(ctx, originalKey) return r.s3Client.Upload(ctx, key, data, contentType) } func (r *StorageRepository) Download(ctx context.Context, originalKey string) (io.ReadCloser, error) { // MUST use the same prefixed key for reads and writes key := s3.GetObjectStorageKeyForTenant(ctx, originalKey) return r.s3Client.Download(ctx, key) } ``` **Storage structure:** ``` Bucket: {service-name} (env var: OBJECT_STORAGE_BUCKET) └── {tenantId}/ └── {resource}/{path} ``` **Backward compatibility:** When no tenant is in context (single-tenant mode), the key is returned unchanged — no prefix added. ### RabbitMQ Multi-Tenant: Two-Layer Isolation Model RabbitMQ multi-tenant requires **two complementary layers** — both are mandatory: | Layer | Mechanism | Purpose | |-------|-----------|---------| | **1. Vhost Isolation** | `tmrabbitmq.Manager` → `GetChannel(ctx, tenantID)` | **Isolation.** Each tenant gets its own RabbitMQ vhost. Queues, exchanges, and connections are fully separated. | | **2. X-Tenant-ID Header** | `headers["X-Tenant-ID"] = tenantID` | **Audit + context propagation.** Enables distributed tracing, log correlation, and downstream tenant resolution. Does NOT provide isolation. | **⛔ Layer 2 alone is NOT multi-tenant compliant.** A shared connection with `X-Tenant-ID` headers provides traceability but zero isolation — a poison message or traffic spike from one tenant affects all tenants. **⛔ Layer 1 alone is incomplete.** Vhosts isolate but the `X-Tenant-ID` header is needed for log correlation, distributed tracing, and downstream context propagation across services. ### RabbitMQ Multi-Tenant Producer ```go // internal/adapters/rabbitmq/producer.go type ProducerRepository struct { conn *libRabbitmq.RabbitMQConnection rabbitMQManager *tmrabbitmq.Manager multiTenantMode bool } // Single-tenant constructor func NewProducer(conn *libRabbitmq.RabbitMQConnection) *ProducerRepository { return &ProducerRepository{ conn: conn, multiTenantMode: false, } } // Multi-tenant constructor func NewProducerMultiTenant(pool *tmrabbitmq.Manager) *ProducerRepository { return &ProducerRepository{ rabbitMQManager: pool, multiTenantMode: true, } } func (p *ProducerRepository) Publish(ctx context.Context, exchange, key string, message []byte) error { // Inject tenant ID header tenantID := core.GetTenantID(ctx) headers := amqp.Table{} if tenantID != "" { headers["X-Tenant-ID"] = tenantID } if p.multiTenantMode { if tenantID == "" { return fmt.Errorf("tenant ID is required in multi-tenant mode") } // Get tenant-specific channel from pool channel, err := p.rabbitMQManager.GetChannel(ctx, tenantID) if err != nil { return err } return channel.PublishWithContext(ctx, exchange, key, false, false, amqp.Publishing{ ContentType: "application/json", DeliveryMode: amqp.Persistent, Headers: headers, Body: message, }) } // Single-tenant: use static connection return p.conn.Channel.Publish(exchange, key, false, false, amqp.Publishing{Body: message, Headers: headers}) } ``` ### MongoDB Multi-Tenant Repository ```go // internal/adapters/mongodb/metadata.go type MetadataMongoDBRepository struct { connection *libMongo.MongoConnection dbName string } func NewMetadataMongoDBRepository(conn *libMongo.MongoConnection, dbName string) *MetadataMongoDBRepository { return &MetadataMongoDBRepository{connection: conn, dbName: dbName} } func (r *MetadataMongoDBRepository) Create(ctx context.Context, collection string, metadata *Metadata) error { logger, tracer, _, _ := libCommons.NewTrackingFromContext(ctx) ctx, span := tracer.Start(ctx, "mongodb.create_metadata") defer span.End() // Get tenant-specific database from context tenantDB, err := core.ResolveMongo(ctx, r.connection, r.dbName) if err != nil { libOpentelemetry.HandleSpanError(&span, "Failed to get database connection", err) return err } // Use tenant's database for operations coll := tenantDB.Collection(strings.ToLower(collection)) record := &MetadataMongoDBModel{} if err := record.FromEntity(metadata); err != nil { return err } _, err = coll.InsertOne(ctx, record) if err != nil { libOpentelemetry.HandleSpanError(&span, "Failed to insert metadata", err) return err } return nil } ``` ### Conditional Initialization The initialization path depends on whether the service runs a single module or combines multiple modules: ```go // internal/bootstrap/service.go func InitService(cfg *Config) (*Service, error) { // ttHandler starts as nil — WhenEnabled(nil) is a no-op (single-tenant passthrough) var ttHandler fiber.Handler if cfg.MultiTenantEnabled && cfg.MultiTenantURL != "" { if isUnifiedService { // Multi-module: use MultiPoolMiddleware // See "Multi-module middleware (MultiPoolMiddleware)" section above multiMid := initMultiTenantMiddleware(cfg, logger, consumerTrigger) ttHandler = multiMid.Handle } else { // Single-module: use TenantMiddleware // See "Generic TenantMiddleware (Standard Pattern)" section above ttMid := tmmiddleware.NewTenantMiddleware( tmmiddleware.WithPostgresManager(pgManager), ) ttHandler = ttMid.WithTenantDB } // Do NOT register globally with app.Use() — register per-route in routes.go // using WhenEnabled(ttHandler). See "Route-Level Auth-Before-Tenant Ordering" section. logger.Infof("Multi-tenant mode enabled with Tenant Manager URL: %s", cfg.MultiTenantURL) } else { logger.Info("Running in SINGLE-TENANT MODE") // ttHandler remains nil — WhenEnabled(nil) calls c.Next() immediately } // Pass ttHandler to NewRoutes — routes use WhenEnabled(ttHandler) per-route // ... } ``` **Most services follow the single-module path.** Only unified services like midaz ledger need the multi-module path with `MultiPoolMiddleware`. ### Testing Multi-Tenant Code #### Unit Tests with Mock Tenant Context ```go // internal/service/user_service_test.go func TestUserService_Create_WithTenantContext(t *testing.T) { // Setup tenant context tenantID := "tenant-123" ctx := core.ContextWithTenantID(context.Background(), tenantID) // Mock database connection mockDB := setupMockDB(t) ctx = core.ContextWithTenantPGConnection(ctx, mockDB) // Create service with mock dependencies repo := repository.NewUserRepository() service := service.NewUserService(repo, logger) // Execute input := &CreateUserInput{Name: "John", Email: "john@example.com"} result, err := service.Create(ctx, input) // Assert require.NoError(t, err) assert.Equal(t, "John", result.Name) } ``` #### Testing Tenant Isolation ```go func TestRepository_Create_TenantIsolation(t *testing.T) { tests := []struct { name string tenantID string input *Entity wantErr bool }{ { name: "tenant-1 creates entity", tenantID: "tenant-1", input: &Entity{Name: "Entity A"}, wantErr: false, }, { name: "tenant-2 creates same entity (isolated)", tenantID: "tenant-2", input: &Entity{Name: "Entity A"}, wantErr: false, // Different tenant = different database = allowed }, } for _, tt := range tests { t.Run(tt.name, func(t *testing.T) { // Inject tenant-specific context ctx := core.ContextWithTenantID(context.Background(), tt.tenantID) ctx = core.ContextWithTenantPGConnection(ctx, mockDB) _, err := repo.Create(ctx, tt.input) if tt.wantErr { require.Error(t, err) } else { require.NoError(t, err) } }) } } ``` #### Integration Tests with Tenant Isolation ```go // tests/integration/multi_tenant_test.go func TestMultiTenant_TenantIsolation(t *testing.T) { if !h.IsMultiTenantEnabled() { t.Skip("Multi-tenant mode is not enabled") } env := h.LoadEnvironment() ctx := context.Background() client := h.NewHTTPClient(env.ServerURL, env.HTTPTimeout) // Define two distinct tenants tenantA := "tenant-a-" + h.RandString(6) tenantB := "tenant-b-" + h.RandString(6) headersTenantA := h.TenantAuthHeaders(h.RandHex(8), tenantA) headersTenantB := h.TenantAuthHeaders(h.RandHex(8), tenantB) // Step 1: Tenant A creates organization codeA, bodyA, _ := client.Request(ctx, "POST", "/v1/organizations", headersTenantA, orgPayload) require.Equal(t, 201, codeA) var orgA struct{ ID string `json:"id"` } json.Unmarshal(bodyA, &orgA) // Step 2: Tenant B creates organization codeB, bodyB, _ := client.Request(ctx, "POST", "/v1/organizations", headersTenantB, orgPayload) require.Equal(t, 201, codeB) var orgB struct{ ID string `json:"id"` } json.Unmarshal(bodyB, &orgB) // Step 3: Verify Tenant A cannot see Tenant B's data code, body, _ := client.Request(ctx, "GET", "/v1/organizations", headersTenantA, nil) require.Equal(t, 200, code) var list struct{ Items []struct{ ID string `json:"id"` } `json:"items"` } json.Unmarshal(body, &list) for _, item := range list.Items { assert.NotEqual(t, orgB.ID, item.ID, "ISOLATION VIOLATION: Tenant A can see Tenant B's data") } } ``` #### Testing Error Cases ```go func TestMiddleware_WithTenantDB_ErrorCases(t *testing.T) { tests := []struct { name string setupContext func(*fiber.Ctx) expectedStatus int expectedCode string }{ { name: "missing JWT token", setupContext: func(c *fiber.Ctx) { // No Authorization header }, expectedStatus: 401, expectedCode: "TENANT_ID_REQUIRED", }, { name: "JWT without tenantId claim", setupContext: func(c *fiber.Ctx) { token := createJWTWithoutTenantClaim() c.Request().Header.Set("Authorization", "Bearer "+token) }, expectedStatus: 401, expectedCode: "TENANT_ID_REQUIRED", }, { name: "tenant not found in Tenant Manager", setupContext: func(c *fiber.Ctx) { token := createJWT(map[string]interface{}{"tenantId": "unknown-tenant"}) c.Request().Header.Set("Authorization", "Bearer "+token) }, expectedStatus: 404, expectedCode: "TENANT_NOT_FOUND", }, } for _, tt := range tests { t.Run(tt.name, func(t *testing.T) { app := fiber.New() ctx := app.AcquireCtx(&fasthttp.RequestCtx{}) defer app.ReleaseCtx(ctx) tt.setupContext(ctx) err := middleware.WithTenantDB(ctx) require.Error(t, err) fiberErr, ok := err.(*fiber.Error) require.True(t, ok) assert.Equal(t, tt.expectedStatus, fiberErr.Code) }) } } ``` #### Testing RabbitMQ Multi-Tenant Consumer ```go func TestRabbitMQConsumer_MultiTenant(t *testing.T) { t.Run("injects tenant context from X-Tenant-ID header", func(t *testing.T) { tenantID := "tenant-456" // Create message with tenant header message := amqp.Delivery{ Headers: amqp.Table{ "X-Tenant-ID": tenantID, }, Body: []byte(`{"action": "create"}`), } // Setup consumer consumer := NewMultiTenantConsumer(pool, logger) // Process message ctx, err := consumer.injectTenantDBConnections( context.Background(), tenantID, logger, ) // Assert tenant context injected require.NoError(t, err) extractedTenant := core.GetTenantID(ctx) assert.Equal(t, tenantID, extractedTenant) }) } ``` #### Testing Redis Key Prefixing ```go func TestRedisRepository_MultiTenant_KeyPrefixing(t *testing.T) { t.Run("prefixes keys with tenant ID", func(t *testing.T) { tenantID := "tenant-789" ctx := core.ContextWithTenantID(context.Background(), tenantID) repo := NewRedisRepository(redisConn) err := repo.Set(ctx, "user:session", "value123", 3600) require.NoError(t, err) // Verify key was prefixed key := valkey.GetKeyFromContext(ctx, "user:session") assert.Equal(t, "tenant:tenant-789:user:session", key) }) t.Run("single-tenant mode does not prefix keys", func(t *testing.T) { ctx := context.Background() key := valkey.GetKeyFromContext(ctx, "user:session") assert.Equal(t, "user:session", key) }) } ``` ### Error Handling | Error | HTTP Status | Code | When | |-------|-------------|------|------| | Missing tenantId claim | 401 | `TENANT_ID_REQUIRED` | JWT doesn't have tenantId | | Tenant not found | 404 | `TENANT_NOT_FOUND` | Tenant not registered in Tenant Manager | | Tenant not provisioned | 422 | `TENANT_NOT_PROVISIONED` | Database schema not initialized (SQLSTATE 42P01) | | Tenant suspended | 403 | service-specific | Tenant status is suspended or purged (use `errors.As(err, &core.TenantSuspendedError{})`) | | Service not configured | 503 | service-specific | Tenant exists but has no config for this service/module (`core.ErrServiceNotConfigured`) | | Schema mode error | 422 | service-specific | Invalid schema configuration for tenant database | | Connection error | 503 | service-specific | Failed to get or establish tenant connection | | Manager closed | 503 | service-specific | Connection manager has been shut down (`core.ErrManagerClosed`) | | Circuit breaker open | 503 | service-specific | Tenant Manager client tripped after consecutive failures (`core.ErrCircuitBreakerOpen`) | | Tenant config rate limited | 503 | service-specific | Too many concurrent requests for the same tenant config — retry after brief delay | ### Tenant Isolation Verification (⚠️ CONDITIONAL) Multi-tenant applications MUST verify tenant isolation to prevent data leakage between tenants. **⛔ CONDITIONAL:** This section applies ONLY if `MULTI_TENANT_ENABLED=true`. If single-tenant, mark as N/A. **Detection Question:** Is this a multi-tenant service? ```bash # Check if multi-tenant mode is enabled grep -rn "MULTI_TENANT_ENABLED\|MultiTenantEnabled" internal/ --include="*.go" # If 0 matches OR always set to false: Mark N/A # If found AND can be true: Apply this section ``` #### Isolation Architecture Multi-tenant isolation uses a **database-per-tenant** model. The `tenantId` from JWT determines which database the request connects to. Each tenant has its own database — tenant A cannot query tenant B's database. | Mechanism | How It Works | Protection | |-----------|-------------|------------| | **JWT `tenantId` extraction** | `TenantMiddleware` extracts `tenantId` claim from JWT | Identifies the tenant | | **Database routing** | `TenantConnectionManager` resolves tenant-specific DB connection | Tenant A → Database A, Tenant B → Database B | | **Context injection** | Connection stored in request context | Repositories use `core.ResolvePostgres(ctx, fallback)` / `core.ResolveMongo(ctx, fallback, dbName)` | | **Single-tenant passthrough** | `IsMultiTenant() == false` → `c.Next()` immediately | Backward compatibility | #### Why Tenant Isolation Verification Is MANDATORY | Attack | Without Verification | With Verification | |--------|----------------------|-------------------| | Cross-tenant data access | Tenant A accesses Tenant B's database | Connection-level isolation prevents it | | Data exfiltration | Cross-tenant data leakage | Separate databases per tenant | #### Detection Commands (MANDATORY) ```bash # MANDATORY: Run before every PR in multi-tenant services # Verify all repositories use context-based connections (not static) grep -rn "ResolvePostgres\|ResolveModuleDB\|ResolveMongo" internal/adapters/ --include="*.go" # Verify no repositories use static/hardcoded connections when multi-tenant is enabled # Excludes tenant-aware variables (tenantDB, tenantmanager) to avoid false positives grep -rn "\.DB\.\|\.Database\." internal/adapters/ --include="*.go" | grep -v "_test.go" | grep -v "tenantmanager\|tenantDB" # Expected: All repositories should use tenant-manager context getters (core, valkey, s3 packages) ``` #### Anti-Rationalization Table | Rationalization | Why It's WRONG | Required Action | |-----------------|----------------|-----------------| | "Static connection works fine" | Static connection goes to ONE database. All tenants share it. No isolation. | **Use core.ResolvePostgres(ctx, fallback) / core.ResolveMongo(ctx, fallback, dbName)** | | "We only have one customer" | Requirements change. Multi-tenant is easy to add now, hard later. | **Design for multi-tenant, deploy as single** | | "organization_id filtering = tenant isolation" | organization_id does NOT route to different databases. It is NOT multi-tenant. | **Use tenantId from JWT → TenantConnectionManager** | ### Context Functions lib-commons provides two sets of context functions. They use **separate, isolated context keys** with no fallback between them. | Function | Context Key | Use When | |----------|-------------|----------| | `core.ResolvePostgres(ctx, fallback)` | `tenantPGConnection` | Standard — one database per tenant | | `core.ResolveModuleDB(ctx, module, fallback)` | `tenantPGConnection:{module}` | When service has multiple database modules | ```go // Standard usage db, err := core.ResolvePostgres(ctx, r.connection) if err != nil { return err } // If service has multiple database modules db, err := core.ResolveModuleDB(ctx, "my-module", r.connection) ``` **Context setters (used by middleware, not by service code):** - `core.ContextWithTenantID(ctx, tenantID)` — stores tenant ID - `core.ContextWithTenantPGConnection(ctx, db)` — generic PG connection (set by TenantMiddleware) - `core.ContextWithModulePGConnection(ctx, module, db)` — module-specific PG (when service has multiple DB modules) - `core.ContextWithTenantMongo(ctx, mongoDB)` — MongoDB connection ### Tenant ID Validation lib-commons validates tenant IDs to prevent path traversal and injection: ```go const maxTenantIDLength = 256 var validTenantIDPattern = regexp.MustCompile(`^[a-zA-Z0-9][a-zA-Z0-9_-]*$`) ``` Rules: - Must start with alphanumeric character - Only alphanumeric, underscore, and hyphen allowed - Maximum 256 characters - Empty strings rejected ### Redis Key Prefixing for Lua Scripts Beyond simple `Set/Get` operations, Redis Lua scripts require special attention. ALL keys passed as `KEYS[]` and `ARGV[]` to Lua scripts MUST be pre-prefixed in Go **before** the script execution: ```go // ✅ CORRECT: Prefix keys in Go before Lua execution prefixedBackupQueue := valkey.GetKeyFromContext(ctx, TransactionBackupQueue) prefixedTransactionKey := valkey.GetKeyFromContext(ctx, transactionKey) prefixedBalanceSyncKey := valkey.GetKeyFromContext(ctx, utils.BalanceSyncScheduleKey) // Also prefix ARGV values that are used as keys inside the Lua script prefixedInternalKey := valkey.GetKeyFromContext(ctx, blcs.InternalKey) result, err := script.Run(ctx, rds, []string{prefixedBackupQueue, prefixedTransactionKey, prefixedBalanceSyncKey}, finalArgs...).Result() ``` ```go // ❌ FORBIDDEN: Hardcoded keys inside Lua script -- Lua script must NEVER reference keys by name local key = "balance:lock" -- WRONG: not tenant-prefixed // ✅ CORRECT: Lua script uses only KEYS[] and ARGV[] local backupQueue = KEYS[1] -- Already prefixed by Go caller local txKey = KEYS[2] -- Already prefixed by Go caller ``` This pattern also ensures Redis Cluster compatibility (all keys in `KEYS[]` must be in the same hash slot for atomic operations). ### Multi-Tenant Metrics Services implementing multi-tenant MUST expose these metrics: | Metric | Type | Description | |--------|------|-------------| | `tenant_connections_total` | Counter | Total tenant connections created | | `tenant_connection_errors_total` | Counter | Connection failures per tenant | | `tenant_consumers_active` | Gauge | Active message consumers | | `tenant_messages_processed_total` | Counter | Messages processed per tenant | ### Responsibility Split: lib-commons vs Service | Responsibility | lib-commons handles | Service MUST implement | |---------------|--------------------|-----------------------| | **Connection pooling** | Cache per tenant, double-check locking | - | | **Credential fetching** | HTTP call to Tenant Manager API | - | | **JWT parsing** | Extract `tenantId` from token (both middlewares) | - | | **Tenant discovery** | Redis -> API fallback, sync loop | - | | **Lazy consumer lifecycle** | On-demand spawn, backoff, degraded tracking | - | | **Path-based pool routing** | `MultiPoolMiddleware` matches URL to module pools | - | | **Cross-module connection injection** | `MultiPoolMiddleware` resolves PG for all modules when enabled | - | | **Error mapping** | Default error mapper in both middlewares; customizable via `ErrorMapper` in `MultiPoolMiddleware` | - | | **Middleware registration** | - | Register `TenantMiddleware` or `MultiPoolMiddleware` on routes | | **Repository adaptation** | - | Use `core.ResolvePostgres(ctx, fallback)` or `core.ResolveModuleDB(ctx, module, fallback)` instead of global DB | | **Redis key prefixing** | - | Call `valkey.GetKeyFromContext(ctx, key)` for every Redis operation | | **S3 key prefixing** | Tenant-aware key prefix (`s3.GetObjectStorageKeyForTenant`) | Call `s3.GetObjectStorageKeyForTenant(ctx, key)` for every S3 operation | | **Consumer setup** | - | Register handlers, call `consumer.Run(ctx)` at startup | | **Consumer trigger** | - | Implement `ConsumerTrigger` interface (imported from `tmmiddleware`) and wire to middleware | | **Error handling** | Return sentinel errors | Map errors to HTTP status codes (or provide custom `ErrorMapper`) | ### ConsumerTrigger Wiring For `ConsumerTrigger` interface definition and usage, see [ConsumerTrigger interface](#consumertrigger-interface) in the Multi-module middleware section above. Import from `tmmiddleware`, not from service-specific packages. For `TenantMiddleware`, wire it externally. For `MultiPoolMiddleware`, pass it via `WithConsumerTrigger(ct)`. ### Anti-Rationalization Table (General) | Rationalization | Why It's WRONG | Required Action | |-----------------|----------------|-----------------| | "Service already has multi-tenant code" | Existence ≠ compliance. Code that doesn't match the Ring canonical model (lib-commons v3 tenant-manager sub-packages) is non-compliant and MUST be replaced entirely. | **STOP. Run compliance audit against this document. Replace every non-compliant component.** | | "Our custom multi-tenant approach works" | Working ≠ compliant. Custom implementations create drift, block lib-commons upgrades, prevent standardized tooling, and cannot be validated by automated compliance checks. | **STOP. Replace with canonical lib-commons v3 implementation.** | | "Just need to adapt/patch the existing code" | Non-standard implementations cannot be patched into compliance. The patterns are structurally different (context-based resolution vs static connections, lib-commons middleware vs custom middleware). | **STOP. Replace, do not patch.** | | "We only have one customer" | Requirements change. Multi-tenant is easy to add now, hard later. | **Design for multi-tenant, deploy as single** | | "Tenant Manager adds complexity" | Complexity is in connection management anyway. Tenant Manager standardizes it. | **Use Tenant Manager for multi-tenant** | | "JWT parsing is expensive" | Parse once in middleware, use from context everywhere. | **Extract tenant once, propagate via context** | | "We'll add tenant isolation later" | Retrofitting tenant isolation is a rewrite. | **Build tenant-aware from the start** | ### Final Step: Single-Tenant Backward Compatibility Validation (MANDATORY) **HARD GATE: This is the LAST step of every multi-tenant implementation.** CANNOT merge or deploy without completing this validation. If the service was already running as single-tenant before multi-tenant was added, it MUST continue working unchanged with `MULTI_TENANT_ENABLED=false` (the default). Multi-tenant is opt-in. Single-tenant is the baseline. Breaking it is a production incident for all existing deployments. #### How the Backward Compatibility Mechanism Works The middleware checks `pool.IsMultiTenant()` at the start of every request. When `MULTI_TENANT_ENABLED=false`: - `IsMultiTenant()` returns `false` - Middleware returns `c.Next()` immediately — zero tenant logic applied - No JWT parsing, no Tenant Manager calls, no pool routing - The service uses its default database connection as before ```go // Single-tenant mode: pass through if pool is not multi-tenant if !pool.IsMultiTenant() { return c.Next() // No tenant logic applied — service works as before } ``` #### Validation Steps (execute in order) **Step 1 — Remove all multi-tenant env vars and start the service:** ```bash # Unset ALL multi-tenant variables unset MULTI_TENANT_ENABLED MULTI_TENANT_URL MULTI_TENANT_ENVIRONMENT unset MULTI_TENANT_MAX_TENANT_POOLS MULTI_TENANT_IDLE_TIMEOUT_SEC unset MULTI_TENANT_CIRCUIT_BREAKER_THRESHOLD MULTI_TENANT_CIRCUIT_BREAKER_TIMEOUT_SEC # Start the service — MUST start without errors, without Tenant Manager running go run cmd/app/main.go # Expected log: "Running in SINGLE-TENANT MODE" ``` **Step 2 — Run the full existing test suite (single-tenant):** ```bash # ALL pre-existing tests MUST pass with multi-tenant disabled MULTI_TENANT_ENABLED=false go test ./... ``` **Step 3 — Run backward compatibility integration test:** ```go func TestMultiTenant_BackwardCompatibility(t *testing.T) { // MUST skip when multi-tenant is enabled — this test validates single-tenant only if h.IsMultiTenantEnabled() { t.Skip("Skipping backward compatibility test - multi-tenant mode is enabled") } // Create resources WITHOUT tenant context — MUST work in single-tenant mode code, body, err := client.Request(ctx, "POST", "/v1/organizations", headers, orgPayload) require.Equal(t, 201, code, "single-tenant CRUD must work without tenant context") // List resources — MUST return data normally code, body, err = client.Request(ctx, "GET", "/v1/organizations", headers, nil) require.Equal(t, 200, code, "single-tenant list must work without tenant context") // Health endpoints — MUST work without any auth or tenant context code, _, _ = client.Request(ctx, "GET", "/health", nil, nil) require.Equal(t, 200, code, "health endpoint must work in single-tenant mode") } ``` **Step 4 — Validate against this checklist:** | # | Check | How to Verify | Pass Criteria | |---|-------|--------------|---------------| | 1 | Service starts without `MULTI_TENANT_*` vars | Remove all vars, start service | Starts normally, logs "SINGLE-TENANT MODE" | | 2 | Service starts without Tenant Manager | Don't run Tenant Manager, start service | No connection errors, no panics | | 3 | All existing CRUD operations work | Run pre-existing integration tests | All pass with same behavior as before | | 4 | Health/version/swagger endpoints work | `GET /health`, `GET /version` | 200 OK without any auth headers | | 5 | Default DB connection is used | Check DB queries go to the configured `DB_HOST` | Queries hit single-tenant database | | 6 | No new required env vars break startup | Start with only the env vars the service had before | Service starts without errors | **Step 5 — Run multi-tenant test suite (both modes work):** ```bash # Confirm multi-tenant mode also works MULTI_TENANT_ENABLED=true MULTI_TENANT_URL=http://tenant-manager:4003 go test ./... -run "MultiTenant" ``` #### Anti-Rationalization | Rationalization | Why It's WRONG | Required Action | |-----------------|----------------|-----------------| | "Nobody uses single-tenant anymore" | Existing deployments depend on it. Breaking them is a production incident for every customer running self-hosted. | **STOP. Run the validation steps.** | | "We tested multi-tenant, that's enough" | Multi-tenant tests exercise a DIFFERENT code path (`IsMultiTenant()=true`). Single-tenant (`IsMultiTenant()=false`) is a separate path that needs separate verification. | **STOP. Run both test suites.** | | "The passthrough is trivial, it can't break" | Config struct changes, new required env vars, middleware ordering changes, import side effects — all of these can silently break the passthrough path. | **STOP. Verify with actual requests.** | | "I'll test single-tenant later" | Later never comes. Once merged, the damage is done. Existing CI may not catch it if tests assume multi-tenant. | **STOP. Test now, before merge.** | ### Checklist **Environment Variables:** - [ ] `MULTI_TENANT_ENABLED` in config struct (default: `false`) - [ ] `MULTI_TENANT_URL` in config struct (required if multi-tenant) - [ ] `MULTI_TENANT_ENVIRONMENT` in config struct (default: `staging`, only if RabbitMQ) - [ ] `MULTI_TENANT_MAX_TENANT_POOLS` in config struct (default: `100`) - [ ] `MULTI_TENANT_IDLE_TIMEOUT_SEC` in config struct (default: `300`) - [ ] `MULTI_TENANT_CIRCUIT_BREAKER_THRESHOLD` in config struct (default: `5`) - [ ] `MULTI_TENANT_CIRCUIT_BREAKER_TIMEOUT_SEC` in config struct (default: `30`) - [ ] `MULTI_TENANT_SERVICE_API_KEY` in config struct (required) **Architecture:** - [ ] `client.NewClient(url, logger, opts...)` returns `(*Client, error)` — handle error for fail-fast - [ ] `client.WithServiceAPIKey(cfg.MultiTenantServiceAPIKey)` always passed (lib-commons validates internally) - [ ] `tmpostgres.NewManager(client, service, WithModule(...), WithLogger(...), WithMaxTenantPools(...), WithIdleTimeout(...))` for PostgreSQL pool - [ ] Each manager has `Stats()`, `IsMultiTenant()`, and `ApplyConnectionSettings()` methods **Middleware — choose one:** - [ ] For single-module services: `tmmiddleware.NewTenantMiddleware(WithPostgresManager(...))` registered in routes via `WhenEnabled` - [ ] For multi-module services: `tmmiddleware.NewMultiPoolMiddleware(WithRoute(...), WithDefaultRoute(...))` registered in routes via `WhenEnabled` - [ ] `WhenEnabled` helper implemented locally in the routes file (nil check → `c.Next()`) - [ ] Tenant middleware passed as nil when `MULTI_TENANT_ENABLED=false` (single-tenant passthrough via `WhenEnabled` nil check) **Middleware & Context:** - [ ] JWT tenant extraction (claim key: `tenantId`) - [ ] `core.ContextWithTenantID()` in middleware - [ ] Public endpoints (`/health`, `/version`, `/swagger`) bypass tenant middleware - [ ] `core.ErrTenantNotFound` → 404, `core.ErrManagerClosed` → 503, `core.ErrServiceNotConfigured` → 503 - [ ] `core.IsTenantNotProvisionedError()` in error handler → 422 - [ ] `core.ErrTenantContextRequired` handled in repositories - [ ] `ConsumerTrigger` imported from `tmmiddleware` (not from midaz pkg or service-specific packages) - [ ] `ConsumerTrigger.EnsureConsumerStarted()` called after tenant ID extraction (if using lazy mode) **Repositories:** - [ ] `core.ResolvePostgres(ctx, fallback)` in PostgreSQL repositories (single-module services) - [ ] `core.ResolveModuleDB(ctx, module, fallback)` in PostgreSQL repositories (multi-module services) - [ ] `valkey.GetKeyFromContext(ctx, key)` for ALL Redis keys (including Lua script KEYS[] and ARGV[]) - [ ] `core.ResolveMongo(ctx, fallback, dbName)` in MongoDB repositories (if using MongoDB) - [ ] `s3.GetObjectStorageKeyForTenant(ctx, key)` for ALL S3 operations (if using S3/object storage) **Async Processing:** - [ ] Tenant ID header (`X-Tenant-ID`) in RabbitMQ messages - [ ] `consumer.NewMultiTenantConsumer` with `consumer.WithPostgresManager` and `consumer.WithMongoManager` - [ ] `consumer.Register(queueName, handler)` for each queue - [ ] `consumer.Run(ctx)` at startup (non-blocking, <1s) - [ ] `ConsumerTrigger` interface implemented and wired to middleware **Testing:** - [ ] Unit tests with mock tenant context - [ ] Tenant isolation tests (verify data separation between tenants) - [ ] Error case tests (missing tenant, invalid tenant, tenant not found) - [ ] Integration tests with two distinct tenants verifying cross-tenant isolation - [ ] RabbitMQ consumer tests (X-Tenant-ID header extraction) - [ ] Redis key prefixing tests (verify tenant prefix applied, including Lua scripts) **Single-Tenant Backward Compatibility (MANDATORY):** - [ ] All existing tests pass with `MULTI_TENANT_ENABLED=false` (default) - [ ] Service starts without any `MULTI_TENANT_*` environment variables - [ ] Service starts without Tenant Manager running - [ ] All CRUD operations work in single-tenant mode - [ ] Backward compatibility integration test exists (`TestMultiTenant_BackwardCompatibility`) - [ ] Health/version endpoints work without tenant context --- ## Route-Level Auth-Before-Tenant Ordering (MANDATORY) **MANDATORY:** When using multi-tenant middleware, auth MUST validate the JWT **before** tenant middleware resolves the database connection. This ordering is a security requirement, not a performance optimization. ### Why This Matters | Concern | Impact Without Auth-Before-Tenant | |---------|-----------------------------------| | **SECURITY** | Forged or expired JWTs trigger Tenant Manager API calls before token signature validation. Any request with a `tenantId` claim — valid or not — causes a network round-trip to resolve tenant DB credentials. | | **PERFORMANCE** | Unauthenticated requests trigger unnecessary Tenant Manager API round-trips (~50ms+ each). At scale, this adds significant latency and load to the Tenant Manager service. | | **DoS VECTOR** | Attackers can flood the Tenant Manager API with crafted tokens containing valid-looking `tenantId` claims. Since tenant resolution happens before auth rejects the token, every malicious request costs a TM API call. | ### The WRONG Pattern (Anti-Pattern) ```go // ❌ WRONG: Tenant middleware runs before auth on ALL routes app.Use(tenantMid.WithTenantDB) // Runs first — calls TM API before auth validates JWT app.Post("/v1/resources", auth.Authorize("app", "resource", "post"), handler.Create) ``` In this pattern, `WithTenantDB` executes for **every request** before `auth.Authorize` validates the JWT. A request with a forged JWT containing `tenantId: "victim-tenant"` triggers a full Tenant Manager resolution — fetching credentials, opening connections — before auth rejects it. ### The CORRECT Pattern: WhenEnabled **MUST use `WhenEnabled` — a simple helper function that each project implements locally — to conditionally apply tenant middleware per-route.** Auth is listed before `WhenEnabled` in the handler chain, guaranteeing auth runs first. Tenant resolution runs only for authenticated requests. **`WhenEnabled` implementation (each project implements this locally):** ```go // WhenEnabled is a helper that conditionally applies a middleware if it's not nil. // When multi-tenant is disabled, the middleware passed is nil and WhenEnabled calls c.Next(). func WhenEnabled(middleware fiber.Handler) fiber.Handler { return func(c *fiber.Ctx) error { if middleware == nil { return c.Next() } return middleware(c) } } ``` **Route registration:** ```go // ✅ CORRECT: Auth validates JWT FIRST, then tenant resolves DB // ttHandler is nil when MULTI_TENANT_ENABLED=false (single-tenant passthrough) f.Post("/v1/resources", auth.Authorize("app", "resource", "post"), WhenEnabled(ttHandler), handler.Create) f.Get("/v1/resources", auth.Authorize("app", "resource", "get"), WhenEnabled(ttHandler), handler.GetAll) f.Get("/v1/resources/:id", auth.Authorize("app", "resource", "get"), WhenEnabled(ttHandler), handler.GetByID) ``` **How it works:** 1. `auth.Authorize(...)` is the first handler — validates JWT before anything else 2. `WhenEnabled(ttHandler)` runs second — if `ttHandler` is nil (single-tenant mode), it calls `c.Next()` immediately; if non-nil, it executes the tenant middleware 3. The business handler runs last 4. If auth rejects the request, tenant middleware never runs — no TM API call 5. If `MULTI_TENANT_ENABLED=false`, `ttHandler` is nil and `WhenEnabled` is a no-op — zero overhead ### Detection Commands (MANDATORY) ```bash # MANDATORY: Run before every PR in multi-tenant services # Check for global tenant middleware registration (anti-pattern) grep -rn "app\.Use(.*WithTenantDB\|app\.Use(.*tenantMid" internal/ --include="*.go" # Expected: 0 matches. Tenant middleware MUST NOT be registered globally. # Check for correct per-route composition: auth.Authorize BEFORE WhenEnabled on same route grep -rnE '^\s*(app|f)\.(Get|Post|Put|Patch|Delete)\(.*auth\.Authorize\(.*WhenEnabled\(' internal/ --include="*.go" # Expected: 1+ matches in routes.go — auth appears before WhenEnabled on protected routes. ``` ### Anti-Rationalization Table | Rationalization | Why It's WRONG | Required Action | |-----------------|----------------|-----------------| | "Auth middleware is already global, so order doesn't matter" | Global middleware ordering is implicit and fragile — a refactor can silently break it. Different routes may need different auth handlers. | **MUST use explicit per-route composition: `auth, WhenEnabled(tenant), handler`** | | "Tenant resolution is fast, no harm running it first" | TM API calls are network round-trips (~50ms+). At scale, unauthorized traffic amplifies cost. Every unauthenticated request wastes a TM API call. | **MUST authenticate before any TM API call** | | "We'll just register auth middleware before tenant in app.Use()" | Global ordering provides no guarantee per-route. Different routes may need different auth handlers. A single `app.Use()` reorder silently breaks security for all routes. | **MUST compose auth+tenant per-route using WhenEnabled** | | "Only internal services call this endpoint, no DoS risk" | Internal networks are not trusted by default. Compromised services, misconfigured proxies, or lateral movement can generate unauthorized traffic. | **MUST enforce auth-before-tenant regardless of network topology** | | "We already validate tokens at the API gateway" | Defense in depth. Gateway validation can be bypassed, misconfigured, or removed. Service-level auth is the last line of defense. | **MUST validate auth at service level before tenant resolution** | --- ## Multi-Tenant Message Queue Consumers (Lazy Mode) ### When to Use **CONDITIONAL:** Only applies if your service: - Processes messages from a message broker (RabbitMQ, SQS, etc.) - Uses per-tenant message isolation (dedicated vhosts or queues per tenant) - Has 10+ tenants where startup time is a concern **If single-tenant or <10 tenants:** Multi-tenant mode overhead may be unnecessary. Consider single-tenant architecture. --- ### Problem: Startup Time Scales with Tenant Count In multi-tenant services consuming messages, connecting to ALL tenant vhosts at startup causes: ``` Startup Time = N tenants × 500ms per connection 10 tenants = 5 seconds 100 tenants = 50 seconds ← Unacceptable for autoscaling/deployments ``` **Symptoms:** - Slow deployments (rolling updates wait for each pod to connect) - Poor autoscaling responsiveness (new pods take 30-60s to become ready) - Wasted resources (connections to inactive tenants) --- ### Solution: Lazy Consumer Initialization **Pattern:** Decouple tenant discovery from consumer connection. ``` Startup: 1. Discover tenant list (Redis/API) - lightweight, <1s 2. Track discovered tenants in memory (knownTenants map) 3. Do NOT start consumers yet 4. Return immediately (startup complete) On First Request per Tenant: 1. HTTP middleware extracts tenant ID 2. Middleware calls consumer.EnsureConsumerStarted(ctx, tenantID) 3. Consumer spawns on-demand (first time: ~500ms) 4. Connection cached for reuse 5. Subsequent requests: fast path (<1ms) ``` **Result:** Startup time O(1) regardless of tenant count, resources scale with active tenants only. --- ### Architecture Components #### 1. Tenant Discovery Service **Responsibility:** Provide list of active tenants without connection overhead. **Implementation:** - **Primary:** Cache (Redis SET: `tenant-manager:tenants:active`) - **Fallback:** HTTP API (`GET /tenants/active?service={serviceName}`) **Response format (API):** ```json [ {"id": "tenant-001", "name": "Tenant A", "status": "active"}, {"id": "tenant-002", "name": "Tenant B", "status": "active"} ] ``` **Endpoint characteristics:** - Returns minimal info (id, name, status only) - Supports optional filtering by service (query param: `?service={serviceName}`) #### 2. Consumer Manager (lib-commons) **Responsibility:** Manage lifecycle of message consumers across tenants with lazy initialization. **Key methods:** ```go type MultiTenantConsumer struct { mu sync.RWMutex // Protects knownTenants and activeTenants maps knownTenants map[string]bool // Discovered tenants activeTenants map[string]CancelFunc // Running consumers consumerLocks sync.Map // Per-tenant mutexes retryState sync.Map // Failure tracking } // Discover tenants without connecting (non-blocking, <1s) func (c *MultiTenantConsumer) Run(ctx context.Context) error // Ensure consumer is active for tenant (idempotent, thread-safe, fire-and-forget) func (c *MultiTenantConsumer) EnsureConsumerStarted(ctx context.Context, tenantID string) // Check if tenant has failed repeatedly func (c *MultiTenantConsumer) IsDegraded(tenantID string) bool // Get runtime statistics func (c *MultiTenantConsumer) Stats() ConsumerStats ``` #### 3. HTTP Middleware Trigger **Responsibility:** Trigger lazy consumer spawn when requests arrive. **Implementation pattern:** ```go func TenantMiddleware(consumer ConsumerTrigger) fiber.Handler { return func(c *fiber.Ctx) error { // Extract tenant ID from header or JWT tenantID := c.Get("x-tenant-id") if tenantID == "" { return fiber.NewError(400, "x-tenant-id required") } // Lazy mode trigger: ensure consumer is active // First time: spawns consumer (~500ms) // Subsequent: fast path (<1ms) if consumer != nil { // Nil-safe for single-tenant mode ctx := c.UserContext() consumer.EnsureConsumerStarted(ctx, tenantID) // Fire-and-forget: consumer retries via background sync if spawn fails } // Continue with request processing return c.Next() } } ``` **Placement:** After tenant ID extraction, before database connection resolution. --- ### Implementation Steps #### Step 1: Update Shared Library Implement lazy mode in your message queue consumer library: 1. **Add state tracking:** ```go knownTenants map[string]bool // Discovered via API/cache activeTenants map[string]CancelFunc // Actually running consumerLocks sync.Map // Prevent duplicate spawns ``` 2. **Non-blocking discovery:** ```go func (c *Consumer) Run(ctx context.Context) error { // Discover tenants (timeout: 500ms, soft failure) c.discoverTenants(ctx) // Start background sync loop (for tenant add/remove) go c.runSyncLoop(ctx) return nil // Return immediately } ``` 3. **On-demand spawning with double-check locking:** ```go func (c *Consumer) EnsureConsumerStarted(ctx context.Context, tenantID string) { // Fast path: check if already active (read lock) c.mu.RLock() if _, exists := c.activeTenants[tenantID]; exists { c.mu.RUnlock() return // Already running } c.mu.RUnlock() // Slow path: acquire per-tenant mutex (prevent thundering herd) mutex := c.getPerTenantMutex(tenantID) mutex.Lock() defer mutex.Unlock() // Double-check after acquiring lock c.mu.RLock() if _, exists := c.activeTenants[tenantID]; exists { c.mu.RUnlock() return // Another goroutine created it } c.mu.RUnlock() // Spawn consumer (fire-and-forget, errors logged internally) c.startTenantConsumer(ctx, tenantID) } ``` #### Step 2: Add Tenant Discovery Endpoint In your tenant management service: ```go // GET /tenants/active?service={serviceName} func GetActiveTenants(c *fiber.Ctx) error { service := c.Query("service") // Query active tenants (filter by service if provided) tenants, err := repo.ListActiveTenants(service) if err != nil { return libHTTP.InternalServerError(c, "TENANT_LIST_FAILED", err) } // Return minimal info (no credentials) response := []TenantSummary{} for _, t := range tenants { response = append(response, TenantSummary{ ID: t.ID, Name: t.Name, Status: t.Status, }) } return libHTTP.OK(c, response) } // Register as PUBLIC endpoint (no auth) app.Get("/tenants/active", GetActiveTenants) ``` #### Step 3: Wire Trigger in Service Middleware In each service consuming messages: ```go // In bootstrap/config.go func InitServers() { // Create multi-tenant consumer tmConsumer := consumer.NewMultiTenantConsumer(...) if err := tmConsumer.Run(ctx); err != nil { logger.Errorf("tenant manager startup failed: %v", err) // Note: startup continues - consumer will retry tenant discovery in background } // Create middleware with consumer trigger svc.ttHandler = NewTenantMiddleware(tmConsumer) // Register per-route in routes.go using WhenEnabled // See "Route-Level Auth-Before-Tenant Ordering" section } // In middleware file type TenantMiddleware struct { consumer ConsumerTrigger // Interface with EnsureConsumerStarted method } func (m *TenantMiddleware) Handle(c *fiber.Ctx) error { tenantID := extractTenantID(c) // From header or JWT // Lazy mode trigger (fire-and-forget, errors logged internally) if m.consumer != nil { ctx := c.UserContext() m.consumer.EnsureConsumerStarted(ctx, tenantID) } return c.Next() } ``` --- ### Failure Resilience Pattern **Exponential Backoff:** ```go func backoffDelay(retryCount int) time.Duration { delays := []time.Duration{5*time.Second, 10*time.Second, 20*time.Second, 40*time.Second} if retryCount >= len(delays) { return delays[len(delays)-1] // Cap at 40s } return delays[retryCount] } ``` **Per-Tenant Retry State:** ```go type retryState struct { count int degraded bool // True after 3 failures } retryState sync.Map // Key: tenantID, Value: *retryState ``` **Degraded Tenant Handling:** ```go if consumer.IsDegraded(tenantID) { logger.Errorf("Tenant %s is degraded (3+ connection failures)", tenantID) return errors.New("tenant degraded") } ``` --- ### Observability Pattern **Enhanced Stats API:** ```go type ConsumerStats struct { ConnectionMode string `json:"connectionMode"` // "lazy" ActiveTenants int `json:"activeTenants"` // Connected KnownTenants int `json:"knownTenants"` // Discovered PendingTenants []string `json:"pendingTenants"` // Known but not active DegradedTenants []string `json:"degradedTenants"` // Failed 3+ times } ``` **Structured Logs:** - `connection_mode=lazy` at startup - `on-demand consumer start for tenant: {id}` when spawning - `connecting to vhost: tenant={id}` when connecting - `tenant {id} marked degraded` after max retries --- ### Testing Strategy **Unit Tests:** - Startup completes in <1s (0, 100, 500 tenants) - Concurrent EnsureConsumerStarted spawns exactly 1 consumer - Exponential backoff sequence (5s, 10s, 20s, 40s) - Degraded tenant detection after 3 failures **Integration Tests:** - Discovery fallback (Redis → API) - On-demand connection with testcontainers - Tenant removal cleanup (<30s) --- ### When to Use Multi-Tenant Lazy Consumer | Scenario | Recommended | Rationale | |----------|-------------|-----------| | **10+ tenants** | ✅ Yes | Startup time becomes significant with many tenants | | **<10 tenants** | ⚠️ Consider | Overhead may not justify complexity | | **Most tenants inactive** | ✅ Yes | Resources scale with active count only | | **All tenants active** | ✅ Yes | Still faster startup, resources scale proportionally | | **Frequent deployments** | ✅ Yes | Fast startup critical for CI/CD velocity | | **Latency-sensitive** | ✅ Yes* | *First request per tenant: +500ms (acceptable trade-off) | --- ### Common Pitfalls **❌ Don't:** Start consumers in discovery loop (defeats lazy purpose) **✅ Do:** Populate knownTenants only, defer connection to trigger **❌ Don't:** Use global mutex for all tenants (contention) **✅ Do:** Per-tenant mutex via sync.Map (fine-grained locking) **❌ Don't:** Fail HTTP request if consumer spawn fails **✅ Do:** Log warning, let background sync retry **❌ Don't:** Forget to cleanup on tenant removal **✅ Do:** Remove from knownTenants, activeTenants, consumerLocks **❌ Don't:** Retry indefinitely on connection failure **✅ Do:** Mark degraded after 3 failures, stop retrying --- ### Single-Tenant vs Multi-Tenant Mode ```go // Support both single-tenant and multi-tenant deployments if !config.MultiTenantEnabled { // Single-tenant: static RabbitMQ connection (no tenant isolation) consumer = initSingleTenantConsumer(...) } else { // Multi-tenant: lazy mode with per-tenant vhosts consumer = initMultiTenantConsumer(...) } ``` **Middleware trigger (multi-tenant only):** ```go if m.consumerTrigger != nil { // Nil in single-tenant mode m.consumerTrigger.EnsureConsumerStarted(ctx, tenantID) } ``` **Note:** Single-tenant uses a different consumer implementation without tenant isolation. Multi-tenant consumers ALWAYS use lazy mode for optimal startup performance. --- ## M2M Credentials via Secret Manager (Plugin-Only) **CONDITIONAL:** Only implement if the service is a **plugin** that needs to authenticate with a **product** (e.g., ledger, midaz, CRM). Products do NOT need this — only plugins that call product APIs. ### When This Applies | Service Type | `MULTI_TENANT_ENABLED` | Needs Secret Manager? | Reason | |-------------|------------------------|----------------------|--------| | **Plugin** (plugin-pix, plugin-crm, etc.) | `true` | ✅ YES | Plugin must authenticate with product APIs per tenant | | **Plugin** (plugin-pix, plugin-crm, etc.) | `false` | ❌ NO | Single-tenant — plugin uses existing static auth, no Secrets Manager calls | | **Product** (midaz, ledger, CRM) | any | ❌ NO | Products don't call other products via M2M | | **Infrastructure** (tenant-manager, reporter) | any | ❌ NO | Infrastructure services use internal auth | **⛔ Backward Compatibility:** When `MULTI_TENANT_ENABLED=false` (the default), the plugin MUST continue working with its existing authentication mechanism — no AWS Secrets Manager calls, no M2M credential fetching. The Secret Manager path is activated **only** when multi-tenant mode is enabled. This follows the same conditional pattern as all other tenant-manager resources (PostgreSQL, MongoDB, Redis, S3, RabbitMQ). ### How It Works When `MULTI_TENANT_ENABLED=true`, each tenant has its own M2M credentials stored in **AWS Secrets Manager**. When a plugin needs to call a product API (e.g., ledger), it must: 1. Extract `tenantOrgID` from the JWT context (already available via tenant middleware) 2. Call `secretsmanager.GetM2MCredentials()` to fetch `clientId` + `clientSecret` for that tenant 3. Pass the credentials to the existing Plugin Access Manager integration (which handles JWT acquisition) **Note:** The plugin already handles JWT token acquisition via Plugin Access Manager. This section only covers **how to retrieve M2M credentials** from AWS Secrets Manager — not how to exchange them for tokens. ### Required lib-commons Package ```go import ( secretsmanager "github.com/LerianStudio/lib-commons/v3/commons/secretsmanager" ) ``` ### Secret Path Convention Credentials are stored in AWS Secrets Manager following this path: ``` tenants/{env}/{tenantOrgID}/{applicationName}/m2m/{targetService}/credentials ``` | Segment | Source | Example | |---------|--------|---------| | `env` | `MULTI_TENANT_ENVIRONMENT` env var | `staging`, `production` | | `tenantOrgID` | JWT `owner` claim via `auth.GetTenantID(ctx)` | `org_01KHVKQQP6D2N4RDJK0ADEKQX1` | | `applicationName` | Plugin's own service name constant | `plugin-pix`, `plugin-crm` | | `targetService` | The product being called | `ledger`, `midaz` | ### Environment Variables (Plugin-Only) In addition to the 8 canonical multi-tenant env vars, plugins MUST add: | Env Var | Description | Default | Required | |---------|-------------|---------|----------| | `AWS_REGION` | AWS region for Secrets Manager | - | Yes (for plugins) | | `M2M_TARGET_SERVICE` | Target product service name | - | Yes (for plugins) | | `M2M_CREDENTIAL_CACHE_TTL_SEC` | Local cache TTL for credentials | `300` | No | ### Implementation Pattern #### 1. Fetching M2M Credentials ```go package m2m import ( "context" "fmt" awsconfig "github.com/aws/aws-sdk-go-v2/config" awssm "github.com/aws/aws-sdk-go-v2/service/secretsmanager" secretsmanager "github.com/LerianStudio/lib-commons/v3/commons/secretsmanager" ) // FetchCredentials retrieves M2M credentials from AWS Secrets Manager for a specific tenant. func FetchCredentials(ctx context.Context, env, tenantOrgID, applicationName, targetService string) (*secretsmanager.M2MCredentials, error) { cfg, err := awsconfig.LoadDefaultConfig(ctx) if err != nil { return nil, fmt.Errorf("loading AWS config: %w", err) } client := awssm.NewFromConfig(cfg) creds, err := secretsmanager.GetM2MCredentials(ctx, client, env, tenantOrgID, applicationName, targetService) if err != nil { return nil, fmt.Errorf("fetching M2M credentials for tenant %s: %w", tenantOrgID, err) } return creds, nil } ``` #### 2. Credential Caching (MANDATORY) **MUST cache credentials locally.** Hitting AWS Secrets Manager on every request is expensive and adds latency. ```go package m2m import ( "context" "fmt" "sync" "time" secretsmanager "github.com/LerianStudio/lib-commons/v3/commons/secretsmanager" ) type cachedCredentials struct { creds *secretsmanager.M2MCredentials expiresAt time.Time } // M2MCredentialProvider handles per-tenant M2M credential retrieval with caching. // Token acquisition is handled by Plugin Access Manager — this only provides credentials. type M2MCredentialProvider struct { smClient secretsmanager.SecretsManagerClient env string applicationName string targetService string credCacheTTL time.Duration credCache sync.Map // map[tenantOrgID]*cachedCredentials } // NewM2MCredentialProvider creates a credential provider with configurable cache TTL. func NewM2MCredentialProvider( smClient secretsmanager.SecretsManagerClient, env, applicationName, targetService string, credCacheTTL time.Duration, ) *M2MCredentialProvider { return &M2MCredentialProvider{ smClient: smClient, env: env, applicationName: applicationName, targetService: targetService, credCacheTTL: credCacheTTL, } } // GetCredentials returns M2M credentials for the given tenant, using cache when possible. // The caller (Plugin Access Manager integration) handles token acquisition. func (p *M2MCredentialProvider) GetCredentials(ctx context.Context, tenantOrgID string) (*secretsmanager.M2MCredentials, error) { if cached, ok := p.credCache.Load(tenantOrgID); ok { cc := cached.(*cachedCredentials) if time.Now().Before(cc.expiresAt) { return cc.creds, nil } } creds, err := secretsmanager.GetM2MCredentials(ctx, p.smClient, p.env, tenantOrgID, p.applicationName, p.targetService) if err != nil { return nil, fmt.Errorf("fetching M2M credentials for tenant %s: %w", tenantOrgID, err) } p.credCache.Store(tenantOrgID, &cachedCredentials{ creds: creds, expiresAt: time.Now().Add(p.credCacheTTL), }) return creds, nil } ``` #### 3. Single-Tenant vs Multi-Tenant: Conditional Flow This is the core pattern. The plugin MUST work in both modes — the `M2MCredentialProvider` is **nil** in single-tenant mode. **Few-shot 1 — Bootstrap wiring (picks the right path at startup):** ```go // In bootstrap/config.go or bootstrap/dependencies.go var m2mProvider *m2m.M2MCredentialProvider // nil = single-tenant mode if cfg.MultiTenantEnabled { // MULTI-TENANT: create credential provider that fetches from AWS Secrets Manager awsCfg, err := awsconfig.LoadDefaultConfig(ctx) if err != nil { logger.Fatalf("Failed to load AWS config for M2M: %v", err) } smClient := awssm.NewFromConfig(awsCfg) m2mProvider = m2m.NewM2MCredentialProvider( smClient, cfg.MultiTenantEnvironment, constant.ApplicationName, cfg.M2MTargetService, time.Duration(cfg.M2MCredentialCacheTTLSec) * time.Second, ) } // SINGLE-TENANT: m2mProvider stays nil — no AWS calls, no Secret Manager // Both modes use the same client — it checks internally if m2mProvider is nil productClient := product.NewClient(cfg.ProductURL, m2mProvider) ``` **Few-shot 2 — Product client (handles both modes transparently):** ```go // internal/adapters/product/client.go type Client struct { baseURL string m2mProvider *m2m.M2MCredentialProvider // nil in single-tenant mode httpClient *http.Client } func NewClient(baseURL string, m2mProvider *m2m.M2MCredentialProvider) *Client { return &Client{ baseURL: baseURL, m2mProvider: m2mProvider, httpClient: &http.Client{Timeout: 30 * time.Second}, } } func (c *Client) CreateTransaction(ctx context.Context, input TransactionInput) (*TransactionOutput, error) { req, err := http.NewRequestWithContext(ctx, http.MethodPost, c.baseURL+"/v1/transactions", marshal(input)) if err != nil { return nil, err } if c.m2mProvider != nil { // MULTI-TENANT: fetch per-tenant credentials from Secret Manager tenantOrgID := auth.GetTenantID(ctx) creds, err := c.m2mProvider.GetCredentials(ctx, tenantOrgID) if err != nil { return nil, fmt.Errorf("fetching M2M credentials for tenant %s: %w", tenantOrgID, err) } req.SetBasicAuth(creds.ClientID, creds.ClientSecret) } // SINGLE-TENANT: no credentials injected — plugin uses existing auth // (e.g., static token from env var, already set in headers by middleware, etc.) resp, err := c.httpClient.Do(req) if err != nil { return nil, fmt.Errorf("calling product API: %w", err) } defer resp.Body.Close() // ... handle response } ``` **Few-shot 3 — Service layer (no branching needed, client handles it):** ```go func (uc *ProcessPaymentUseCase) Execute(ctx context.Context, input PaymentInput) error { // Works in both modes: // - Single-tenant: client calls product API with existing static auth // - Multi-tenant: client fetches per-tenant creds from Secret Manager first resp, err := uc.ledgerClient.CreateTransaction(ctx, input.Transaction) if err != nil { return fmt.Errorf("creating transaction in ledger: %w", err) } return nil } ``` **The pattern:** The conditional logic lives in the **client/adapter layer**, not in the service/use-case layer. The service layer calls the same method regardless of mode — it doesn't know or care whether credentials came from Secret Manager or static config. ### Error Handling The `secretsmanager` package provides sentinel errors for precise error handling: ```go import ( "errors" secretsmanager "github.com/LerianStudio/lib-commons/v3/commons/secretsmanager" ) creds, err := secretsmanager.GetM2MCredentials(ctx, client, env, tenantOrgID, appName, target) if err != nil { switch { case errors.Is(err, secretsmanager.ErrM2MCredentialsNotFound): // Tenant not provisioned yet — return 503 or queue for retry case errors.Is(err, secretsmanager.ErrM2MVaultAccessDenied): // IAM permissions missing or token expired — alert ops case errors.Is(err, secretsmanager.ErrM2MInvalidCredentials): // Secret exists but clientId/clientSecret missing — alert ops default: // Infrastructure error — retry with backoff } } ``` ### AWS IAM Permissions The plugin's IAM role (or ECS task role / EKS service account) MUST have permission to read the tenant secrets. Minimal policy: ```json { "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Action": "secretsmanager:GetSecretValue", "Resource": "arn:aws:secretsmanager:*:*:secret:tenants/*/m2m/*/credentials-*" } ] } ``` For tighter scoping, replace wildcards with specific values: | Wildcard | Scoped Example | |----------|---------------| | First `*` (env) | `production` or `staging` | | Second `*` (target) | `ledger` or `midaz` | | Trailing `-*` | Required — AWS appends a random suffix to secret ARNs | **MUST NOT** grant `secretsmanager:*` — only `GetSecretValue` is needed. ### Testing Guidance The `secretsmanager.SecretsManagerClient` is an interface, so it can be mocked in unit tests without hitting AWS. #### Mocking the client ```go type mockSMClient struct { getSecretValueFunc func(ctx context.Context, input *awssm.GetSecretValueInput, opts ...func(*awssm.Options)) (*awssm.GetSecretValueOutput, error) } func (m *mockSMClient) GetSecretValue(ctx context.Context, input *awssm.GetSecretValueInput, opts ...func(*awssm.Options)) (*awssm.GetSecretValueOutput, error) { return m.getSecretValueFunc(ctx, input, opts...) } ``` #### Testing credential cache TTL ```go func TestCredentialCacheExpiry(t *testing.T) { callCount := 0 mock := &mockSMClient{ getSecretValueFunc: func(_ context.Context, _ *awssm.GetSecretValueInput, _ ...func(*awssm.Options)) (*awssm.GetSecretValueOutput, error) { callCount++ secret := `{"clientId":"id","clientSecret":"secret"}` return &awssm.GetSecretValueOutput{SecretString: &secret}, nil }, } provider := NewM2MCredentialProvider(mock, "test", "plugin-pix", "ledger", 1*time.Second) // First call — fetches from AWS _, err := provider.GetCredentials(context.Background(), "tenant-1") require.NoError(t, err) assert.Equal(t, 1, callCount) // Second call — served from cache _, err = provider.GetCredentials(context.Background(), "tenant-1") require.NoError(t, err) assert.Equal(t, 1, callCount) // still 1 // Wait for cache expiry time.Sleep(1100 * time.Millisecond) // Third call — cache expired, fetches again _, err = provider.GetCredentials(context.Background(), "tenant-1") require.NoError(t, err) assert.Equal(t, 2, callCount) } ``` #### Testing error scenarios Test each sentinel error path (`ErrM2MCredentialsNotFound`, `ErrM2MVaultAccessDenied`, `ErrM2MInvalidCredentials`) by returning the corresponding error from the mock. ### Observability & Metrics Instrument `M2MCredentialProvider` to track credential retrieval health. Recommended counters and histogram: | Metric | Type | Where to Increment | Description | |--------|------|-------------------|-------------| | `m2m_credential_cache_hits` | Counter | `GetCredentials` — cache hit path | Credential served from local cache | | `m2m_credential_cache_misses` | Counter | `GetCredentials` — cache miss path | Cache miss, fetching from AWS | | `m2m_credential_fetch_errors` | Counter | `GetCredentials` — error return | AWS Secrets Manager call failed | | `m2m_credential_fetch_duration_seconds` | Histogram | `GetCredentials` — around the `GetM2MCredentials` call | Latency of AWS Secrets Manager requests | Labels: `tenant_org_id`, `target_service`, `environment`. **MUST NOT** include `clientId` or `clientSecret` in metric labels or log fields. ### Security Considerations 1. **MUST NOT log credentials** — never log `clientId` or `clientSecret` values 2. **MUST NOT store credentials in environment variables** — always fetch from Secrets Manager at runtime 3. **MUST cache locally** — avoid per-request AWS API calls (latency + cost) 4. **MUST handle credential rotation** — cache TTL ensures stale credentials are refreshed automatically ### Anti-Rationalization | Rationalization | Why It's WRONG | Required Action | |-----------------|----------------|-----------------| | "Product service doesn't need M2M" | Correct. Only plugins need M2M. | **Skip this gate for products** | | "We can hardcode credentials per tenant" | Hardcoded creds don't scale and are a security risk. | **MUST use Secrets Manager** | | "Caching is optional, we'll add it later" | Every request hitting AWS adds ~50-100ms latency + cost. | **MUST implement caching from day one** | | "We'll use env vars for client credentials" | Env vars are shared across tenants. M2M is per-tenant. | **MUST use Secrets Manager per tenant** | | "Single-tenant plugins don't need this" | Correct if MULTI_TENANT_ENABLED=false. | **Skip when single-tenant** | --- ## Service Authentication (MANDATORY) Consumer services that call the Tenant Manager `/settings` endpoint MUST authenticate using an API key sent via the `X-API-Key` HTTP header. Without this header, the Tenant Manager rejects requests to protected endpoints. ### How It Works 1. **Key generation:** API keys are generated per-service via the service catalog endpoint `POST /services/:name/api-keys`. 2. **Key limit:** Maximum 2 keys per environment per service, enabling zero-downtime rotation (create new key, roll out, revoke old key). 3. **Header injection:** The lib-commons Tenant Manager HTTP client sends the `X-API-Key` header automatically when configured with `client.WithServiceAPIKey()`. 4. **Consumer configuration:** Consumer services set the `MULTI_TENANT_SERVICE_API_KEY` environment variable. The bootstrap code passes it to the client via `WithServiceAPIKey`. ### Configuration Add `MULTI_TENANT_SERVICE_API_KEY` to the Config struct (see [Environment Variables](#environment-variables)): ```go MultiTenantServiceAPIKey string `env:"MULTI_TENANT_SERVICE_API_KEY"` ``` Wire it when creating the Tenant Manager HTTP client: ```go clientOpts = append(clientOpts, client.WithServiceAPIKey(cfg.MultiTenantServiceAPIKey), ) tmClient, err := client.NewClient(cfg.MultiTenantURL, logger, clientOpts...) if err != nil { return fmt.Errorf("creating tenant-manager client: %w", err) } ``` > **Fail-fast:** `NewClient` returns `core.ErrServiceAPIKeyRequired` if the API key is empty. The service refuses to start with a clear error instead of failing with runtime 401 errors on `/settings` calls. ### Key Rotation Zero-downtime rotation flow: 1. Generate a new API key: `POST /services/:name/api-keys` 2. Deploy the new key to the consumer service (`MULTI_TENANT_SERVICE_API_KEY`) 3. Verify the service authenticates successfully with the new key 4. Revoke the old key: `DELETE /services/:name/api-keys/:keyId` The service catalog enforces a maximum of 2 active keys per environment, so both old and new keys work simultaneously during the rollout window. ### Anti-Rationalization | Rationalization | Why It's WRONG | Required Action | |-----------------|----------------|-----------------| | "We don't need API key auth for internal services" | The `/settings` endpoint returns database credentials. Unauthenticated access is a security risk. | **MUST configure `WithServiceAPIKey`** | | "We'll add the API key later" | Without authentication, the Tenant Manager rejects `/settings` requests. The service cannot resolve tenant connections. | **MUST configure before enabling multi-tenant** | | "We can use a shared API key across services" | Each service MUST have its own API key for audit trail and independent revocation. | **MUST generate per-service keys via service catalog** |