---
title: Configuring this pattern
weight: 20
aliases: /rag-llm-cpu/configure/
---

# **Configuring this pattern**

This guide covers common customizations, such as changing the default large language model (LLM), adding new models, and configuring retrieval-augmented generation (RAG) data sources.

This guide assumes that you have already completed the [Getting started](/rag-llm-cpu/getting-started/) guide.

## **Configuration overview**

ArgoCD manages this pattern by using GitOps. All application configurations are defined in the `values-prod.yaml` file.

To customize a component, complete the following steps:

1. **Enable an override:** In the `values-prod.yaml` file, locate the application that you want to change, such as `llm-inference-service`, and add an `extraValueFiles:` entry that points to a new override file, such as `$patternref/overrides/llm-inference-service.yaml`.
2. **Create the override file:** Create the new `.yaml` file in the `/overrides` directory.
3. **Add settings:** Add the specific values that you want to change to the new file.
4. **Commit and synchronize:** Commit your changes and allow ArgoCD to synchronize the application.

## **Task: Changing the default LLM**

By default, the pattern deploys the `mistral-7b-instruct-v0.2.Q5_0.gguf` model. You can change this to a different model, such as a different quantization, or adjust the resource usage.

To change the default LLM, create an override file for the existing `llm-inference-service` application.

1. **Enable the override:** In the `values-prod.yaml` file, update the `llm-inference-service` application to use an override file:

   ```yaml
   clusterGroup:
     # ...
     applications:
       # ...
       llm-inference-service:
         name: llm-inference-service
         namespace: rag-llm-cpu
         chart: llm-inference-service
         chartVersion: 0.3.*
         extraValueFiles: # <-- ADD THIS BLOCK
           - $patternref/overrides/llm-inference-service.yaml
   ```

2. **Create the override file:** Create a new file named `overrides/llm-inference-service.yaml`. The following example switches to a different model file (Q8_0) and increases the CPU and memory requests:

   ```yaml
   inferenceService:
     resources: # <-- Increased allocated resources
       requests:
         cpu: "8"
         memory: 12Gi
       limits:
         cpu: "12"
         memory: 24Gi
   servingRuntime:
     args:
       - --model
       - /models/mistral-7b-instruct-v0.2.Q8_0.gguf # <-- Changed model file
   model:
     repository: TheBloke/Mistral-7B-Instruct-v0.2-GGUF
     files:
       - mistral-7b-instruct-v0.2.Q8_0.gguf # <-- Changed file to download
   ```
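After you commit the change and ArgoCD synchronizes the application, you can confirm that the new model file is being served. The following commands are a minimal sketch: the `cpu-inference-service` name is inferred from the predictor URL used later in this guide, and the KServe label selector is an assumption, so verify both against your deployment:

```sh
# Confirm that the InferenceService reports READY after the sync
oc get inferenceservice -n rag-llm-cpu

# Check the predictor logs for the new Q8_0 model file
# (service name and label selector are assumptions; adjust to your deployment)
oc logs -n rag-llm-cpu -l serving.kserve.io/inferenceservice=cpu-inference-service | grep Q8_0
```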
## **Task: Adding a second LLM**

You can deploy an additional LLM and add it to the demonstration user interface (UI). The following example deploys the HuggingFace TGI runtime instead of `llama.cpp`.

This process requires two steps: deploying the new LLM and configuring the frontend UI.

### **Step 1: Deploying the new LLM service**

1. **Define the new application:** In the `values-prod.yaml` file, add a new application named `another-llm-inference-service` to the applications list.

   ```yaml
   clusterGroup:
     # ...
     applications:
       # ...
       another-llm-inference-service: # <-- ADD THIS NEW APPLICATION
         name: another-llm-inference-service
         namespace: rag-llm-cpu
         chart: llm-inference-service
         chartVersion: 0.3.*
         extraValueFiles:
           - $patternref/overrides/another-llm-inference-service.yaml
   ```

2. **Create the override file:** Create a new file named `overrides/another-llm-inference-service.yaml`. This file defines the new model and disables the creation of resources, such as secrets, that the first LLM already created.

   ```yaml
   dsc:
     initialize: false
   externalSecret:
     create: false

   # Define the new InferenceService
   inferenceService:
     name: hf-inference-service # <-- New service name
     minReplicas: 1
     maxReplicas: 1
     resources:
       requests:
         cpu: "8"
         memory: 32Gi
       limits:
         cpu: "12"
         memory: 32Gi

   # Define the new runtime (HuggingFace TGI)
   servingRuntime:
     name: hf-runtime
     port: 8080
     image: docker.io/kserve/huggingfaceserver:latest
     modelFormat: huggingface
     args:
       - --model_dir
       - /models
       - --model_name
       - /models/Mistral-7B-Instruct-v0.3
       - --http_port
       - "8080"

   # Define the new model to download
   model:
     repository: mistralai/Mistral-7B-Instruct-v0.3
     files:
       - generation_config.json
       - config.json
       - model.safetensors.index.json
       - model-00001-of-00003.safetensors
       - model-00002-of-00003.safetensors
       - model-00003-of-00003.safetensors
       - tokenizer.model
       - tokenizer.json
       - tokenizer_config.json
   ```

> **IMPORTANT:** A known issue in the model-downloading container requires that you explicitly list all files that you want to download from the HuggingFace repository. Ensure that you list every file required for the model to run.

### **Step 2: Adding the new LLM to the demonstration UI**

Configure the frontend to recognize the new LLM.

1. **Edit the frontend overrides:** Open the `overrides/rag-llm-frontend-values.yaml` file.
2. **Update LLM_URLS:** Add the URL of the new service to the `LLM_URLS` environment variable. The URL uses the format `http://<inference-service-name>-predictor/v1`, or `http://<inference-service-name>-predictor/openai/v1` for the HuggingFace runtime.

In the `overrides/rag-llm-frontend-values.yaml` file:

```yaml
env:
  # ...
  - name: LLM_URLS
    value: '["http://cpu-inference-service-predictor/v1","http://hf-inference-service-predictor/openai/v1"]'
```

## **Task: Customizing RAG data sources**

By default, the pattern ingests data from the Validated Patterns documentation. You can change this to point to public Git repositories or web pages.

1. **Edit the vector database overrides:** Open the `overrides/vector-db-values.yaml` file.
2. **Update sources:** Modify the `repoSources` and `webSources` keys. You can add any publicly available Git repository or public web URL. The job also processes PDF files from `webSources`.

In the `overrides/vector-db-values.yaml` file:

```yaml
providers:
  qdrant:
    enabled: true
  mssql:
    enabled: true

vectorEmbedJob:
  repoSources:
    - repo: https://github.com/your-org/your-docs.git # <-- Your repo
      globs:
        - "**/*.md"
  webSources:
    - https://your-company.com/product-manual.pdf # <-- Your PDF
  chunking:
    size: 4096
```

## **Task: Adding a new RAG database provider**

By default, the pattern enables `qdrant` and `mssql`. You can also enable `redis`, `pgvector`, or `elastic`.

This process requires three steps: adding secrets, enabling the database, and configuring the UI.

### **Step 1: Updating the secrets file**

1. If the new database requires credentials, add them to the main secrets file:

   ```sh
   vim ~/values-secret-rag-llm-cpu.yaml
   ```

2. Add the necessary credentials. For example:

   ```yaml
   secrets:
     # ...
     - name: pgvector
       fields:
         - name: user
           value: user # <-- Update the user
         - name: password
           value: password # <-- Update the password
         - name: db
           value: db # <-- Update the db
   ```

> **NOTE:** For information about the expected values, see the [`values-secret.yaml.template`](https://github.com/validatedpatterns-sandbox/rag-llm-cpu/blob/main/values-secret.yaml.template) file.
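After you update the secrets file, load the new values into the cluster so that the corresponding Kubernetes secrets can be created. Assuming that you deploy with the pattern's utility script, as in the Getting started guide, the standard Validated Patterns target is:

```sh
# Reload the updated secrets into the pattern's secret store (for example, Vault)
./pattern.sh make load-secrets
```

If your deployment uses a different secret backend, substitute the equivalent step for your environment.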
### **Step 2: Enabling the provider in the vector database chart**

Edit the `overrides/vector-db-values.yaml` file and set `enabled: true` for the providers that you want to add.

In the `overrides/vector-db-values.yaml` file:

```yaml
providers:
  qdrant:
    enabled: true
  mssql:
    enabled: true
  pgvector: # <-- ADD THIS
    enabled: true
  elastic: # <-- OR THIS
    enabled: true
```

### **Step 3: Adding the provider to the demonstration UI**

Edit the `overrides/rag-llm-frontend-values.yaml` file to configure the UI:

1. Add the secrets for the new provider to the `dbProvidersSecret.vault` list.
2. Add the connection details for the new provider to the `dbProvidersSecret.providers` list.

The following example shows the configuration for non-default RAG database providers.

In the `overrides/rag-llm-frontend-values.yaml` file:

```yaml
dbProvidersSecret:
  vault:
    - key: mssql
      field: sapassword
    - key: pgvector # <-- Add this block
      field: user
    - key: pgvector
      field: password
    - key: pgvector
      field: db
    - key: elastic # <-- Add this block
      field: user
    - key: elastic
      field: password
  providers:
    - type: qdrant # <-- Example for Qdrant
      collection: docs
      url: http://qdrant-service:6333
      embedding_model: sentence-transformers/all-mpnet-base-v2
    - type: mssql # <-- Example for MSSQL
      table: docs
      connection_string: >-
        Driver={ODBC Driver 18 for SQL Server};
        Server=mssql-service,1433;
        Database=embeddings;
        UID=sa;
        PWD={{ .mssql_sapassword }};
        TrustServerCertificate=yes;
        Encrypt=no;
      embedding_model: sentence-transformers/all-mpnet-base-v2
    - type: redis # <-- Example for Redis
      index: docs
      url: redis://redis-service:6379
      embedding_model: sentence-transformers/all-mpnet-base-v2
    - type: elastic # <-- Example for Elastic
      index: docs
      url: http://elastic-service:9200
      user: "{{ .elastic_user }}"
      password: "{{ .elastic_password }}"
      embedding_model: sentence-transformers/all-mpnet-base-v2
    - type: pgvector # <-- Example for PGVector
      collection: docs
      url: >-
        postgresql+psycopg://{{ .pgvector_user }}:{{ .pgvector_password }}@pgvector-service:5432/{{ .pgvector_db }}
      embedding_model: sentence-transformers/all-mpnet-base-v2
```
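After you finish editing the override files, commit and push your changes so that ArgoCD can synchronize the affected applications. The following sketch assumes a default OpenShift GitOps installation; the commit message is illustrative:

```sh
# Commit the override and values changes; ArgoCD applies them on its next sync
git add values-prod.yaml overrides/
git commit -m "Enable pgvector and elastic RAG providers"
git push

# Optionally confirm that the applications report Synced and Healthy
# (listing across all namespaces avoids assuming the ArgoCD namespace)
oc get applications.argoproj.io -A
```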