[Main Harbor Boost documentation](./5.2.-Harbor-Boost)

---

https://github.com/user-attachments/assets/7acc496f-a190-45ea-b957-2024bfb113b6

## Loading Custom Modules

`boost` looks for modules to load in the locations specified by the `HARBOR_BOOST_MODULE_FOLDERS` config option, which is a semicolon-separated list of paths relative to the module root folder. By default, two folders are scanned: `modules` (built-in modules) and `custom_modules` (the default location for custom modules).

After modules are registered, `boost` decides which of them are made available via the `/v1/models` endpoint based on the [specified config](./5.2.-Harbor-Boost#configuration).

If running `boost` with Harbor, you can quickly open the folder containing the custom modules by running:

```bash
# Open in the default File Manager
open $(h home)/boost/src/custom_modules

# Open in VS Code (when CLI is configured)
code $(h home)/boost/src/custom_modules
```

If running boost standalone, you will need to mount a volume to the `custom_modules` folder in the container:

```bash
docker run ... \
  -v /path/to/custom_modules:/app/custom_modules \
  ...
```

When `boost` starts, it logs information about the loaded folders and modules so that you can verify everything was loaded correctly.

```bash
DEBUG - Loading modules from 'modules'
DEBUG - Registering 'rcn'...
DEBUG - Registering 'klmbr'...
DEBUG - Registering 'mcts'...
DEBUG - Registering 'g1'...
DEBUG - Loading modules from 'custom_modules'
DEBUG - Registering 'example'...
INFO - Loaded 5 modules: rcn, klmbr, mcts, g1, example
```

> [!NOTE]
> Use `HARBOR_BOOST_MODULES` to specify which models are added to the `/v1/models` API endpoint. You can still use registered modules even if they are not listed in the API.

For example:

```bash
# Files
modules/
  rcn.py
  klmbr.py
  mcts.py
  g1.py
custom_modules/
  example.py

# env/config
HARBOR_BOOST_MODULE_FOLDERS="modules;custom_modules"
HARBOR_BOOST_MODULES="klmbr"

# Only "klmbr" models are served
curl http://localhost:34131/v1/models

# Other registered modules can still be used directly
curl -X POST http://localhost:34131/v1/chat/completions -d '{"model": "rcn-model", ... }'
```

## Hot reloading

After adding a new custom module, you'll have to restart `boost` manually. Once the module is loaded, any change to it restarts the service automatically, so you can continue development without further manual restarts.

## Module interface

In order to be loaded and run, a module has to have two exports: `ID_PREFIX` and `apply`. `ID_PREFIX` is a string that identifies the module in the configuration. `apply` is a function that is called with the current chat and an `llm` instance representing the downstream model.

Here's an example of the simplest module:

```python
ID_PREFIX = 'example'

def apply(chat, llm):
  llm.emit_message('Hello, boost!')
```

### Parameterizing module behavior

For the `POST /v1/chat/completions` request, `boost` accepts additional custom parameters prefixed with `@boost_`. These are stored and made available on the `llm` instance within the module.

```python
ID_PREFIX = 'example'

# POST /v1/chat/completions
# {
#   "model": "example-model",
#   "messages": [ ... ],
#   "max_tokens": 512,
#   "@boost_param1": "value1",
#   "@boost_param2": "value2"
# }

def apply(chat, llm):
  print(llm.boost_params) # { "param1": "value1", "param2": "value2" }
```
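As a client-side illustration, here is a minimal sketch of sending such parameters. It assumes boost is reachable on the default port `34131` used in the examples above and reuses the `example-model` name from the snippet; adjust both to your setup.

```python
# Minimal sketch of a client request passing "@boost_" parameters.
# The URL and model name are assumptions based on the examples above.
import requests

response = requests.post(
  "http://localhost:34131/v1/chat/completions",
  json={
    "model": "example-model",
    "messages": [{"role": "user", "content": "Hello!"}],
    # Exposed to the module as llm.boost_params (without the prefix)
    "@boost_param1": "value1",
    "@boost_param2": "value2",
  },
)

print(response.json())
```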
## Sample Modules

### Echo

This module echoes the message back to the chat.

```python
ID_PREFIX = 'echo'

def apply(chat, llm):
  llm.emit_message(chat.message)
```

### Date

This module responds with the current date, no matter what the input is.

```python
ID_PREFIX = 'date'

from datetime import datetime

async def apply(chat, llm):
  await llm.emit_status(datetime.now().strftime('%Y-%m-%d'))
```

### Sure

Appends "Sure, here is a response to your message:" before triggering the completion.

```python
ID_PREFIX = 'sure'

async def apply(chat, llm):
  chat.assistant('Sure, here is a response to your message:')
  await llm.stream_final_completion()
```

### Custom System

Injects a custom system message into the chat.

```python
import chat as ch

ID_PREFIX = 'custom_system'

async def apply(chat, llm):
  # Chat is a linked list of ChatNodes,
  # the "tail" is the last message in the chain
  chat.tail.add_parent(
    ch.ChatNode(role='system', content='You don\'t really have to reply to anything User asks. Relax!')
  )

  await llm.stream_final_completion()
```

### Block wrapper

Always wraps the underlying LLM output in a code block.

````python
ID_PREFIX = 'code_wrapper'

async def apply(chat, llm):
  await llm.emit_message(f"\n```\n")
  await llm.stream_completion()
  await llm.emit_message(f"\n```\n")
````

### Reading URLs

This example fetches any URLs from the user's message and injects their content into the chat. Note that in a real-world implementation you'd also want to pre-process the content to make it friendlier for the LLM, as well as use a crawler rather than direct requests.

```python
import re
import requests

url_regex = r"(?i)\b((?:https?://|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'\".,<>?«»“”‘’]))"

prompt = """
Your task is to fulfill the user's request by discussing provided content.

{content}

{request}
""".strip()

ID_PREFIX = "discussurl"

async def apply(chat, llm):
  text = chat.text()
  urls = re.findall(url_regex, text)

  # No URLs - proceed as usual
  if len(urls) == 0:
    return await llm.stream_final_completion()

  # URLs found - read them
  content = ""

  for url in urls:
    await llm.emit_status(f"Reading {url[0]}...")
    content += requests.get(url[0]).text

  await llm.stream_final_completion(
    prompt=prompt,
    content=content,
    request=chat.tail.content,
  )
```

### Unstable personality

In this example, the assistant randomly changes personality every time it's called.

![sample conversation with unstable llama3.1](./boost-unstable.png)

```python
ID_PREFIX = "unstable"

import chat as ch
import llm
import random

extreme_traits = [
  "Eccentric", "Obsessive", "Impulsive", "Paranoid", "Narcissistic",
  "Perfectionist", "Overly Sensitive", "Extremely Independent",
  "Manipulative", "Aloof"
]

temperaments = ["Choleric", "Melancholic", "Phlegmatic", "Sanguine"]

reply_styles = [
  "Blunt", "Sarcastic", "Overly Polite", "Evading", "Confrontational"
]

# Function to generate a random personality description
def random_personality():
  selected_traits = random.sample(extreme_traits, 3)
  selected_temperament = random.choice(temperaments)
  selected_reply_style = random.choice(reply_styles)

  description = (
    f"You are {', '.join(selected_traits)}. "
    f"You are known for your {selected_temperament} temperament. "
    f"You tend to write your replies in a {selected_reply_style} manner. "
    f"Ensure that you reply to the User accordingly."
  )

  return description

async def apply(chat: 'ch.Chat', llm: 'llm.LLM'):
  personality = random_personality()

  chat.tail.ancestor().add_parent(
    ch.ChatNode(role="system", content=personality)
  )

  await llm.stream_final_completion()
```

### Artifacts

For clients that support the Artifacts feature, such as Open WebUI, you can use the `emit_artifact` method to send artifacts to the client:

```python
async def apply(chat: 'ch.Chat', llm: 'llm.LLM'):
  await llm.emit_artifact("""
Hello, Boost!
""")
```

If the artifact contains an interactive component, be aware that it'll take an unknown amount of time for the client to download and render it.

### Completion Events

Boost can broadcast the pending completion workflow via the `/events/:listener_id` endpoint. The broadcast starts when the completion begins being processed and ends alongside the last sent chunk. This can be used together with artifacts to create an interactive experience for the user.

```python
async def apply(chat: 'ch.Chat', llm: 'llm.LLM'):
  # llm.id is the identifier for the event stream
  # to listen for this module's events
  listener_id = llm.id

  await llm.emit_artifact("""
// Script that listens for events on the
// /events/:listener_id endpoint
""")

  # /events/:listener_id listeners will receive both the event and
  # the message below and can handle them accordingly
  await llm.emit_listener_event("event-type", { "custom": True })
  await llm.emit_message("This is a message")
```

### Complete example

This example aims to show all the various ways to provide output to the chat.

Here's a rendered version:

![Screenshot of a complete example](./boost-custom-example.png)

And the source code (also available in the repository):

````python
from pydantic import BaseModel, Field

import llm
import log
import chat as ch

ID_PREFIX = 'example'
logger = log.setup_logger(ID_PREFIX)

# Example for the structured outputs below
class ChoiceResponse(BaseModel):
  explanation: str = Field(description="3-5 words explaining your reasoning")
  choice: str = Field(description="Chosen option")

async def apply(chat: 'ch.Chat', llm: 'llm.LLM'):
  """
  1. Working with chat and chat node instances

  This is where you can create some content programmatically
  that can later be used for retrieving completions from the
  downstream model.
  """
  logger.debug(f"Example chat: {chat}")

  # Add new messages to the chat (no completions at this stage)
  chat.user("Hello!")
  chat.assistant("Hi! Would you like to learn more about Harbor Boost?")
  chat.add_message(
    role="harbor",
    content="Harbor Boost is an optimising LLM proxy with lots of cool features"
  )

  logger.debug(
    f"Chat history is a plain array of messages, from the tail: {chat.history()}"
  )
  logger.debug(
    f"Chat plain is a list of chat nodes, from the tail: {chat.plain()}"
  )

  # Tail is where the chat currently ends.
  # In this instance, that's a message from the
  # "harbor" role above
  tail = chat.tail

  logger.debug(
    f'Get all parents leading to a specific chat node: {tail.parents()}'
  )
  logger.debug(f'Get one immediate parent: {tail.parent}')

  # We can modify the chat from the tail node directly
  new_tail = tail.add_child(
    ch.ChatNode(role="harbor", content="Chat nodes are everywhere!")
  )

  # However, such modifications are not reflected in the parent
  # chat instance:
  logger.debug(chat.tail == tail) # True
  logger.debug(chat.tail == new_tail) # False

  # You can set a new tail for the chat, though
  chat.tail = new_tail

  # However, it's much easier to just work from the chat itself
  chat.user('Alright, I think that is mostly it for now. Thanks!')

  # You can create new chat instances as needed
  life_chat = ch.Chat.from_conversation(
    [
      {
        "role": "user",
        "content": "What is the meaning of life? Answer with a tongue twister."
      }
    ]
  )

  """
  2. Working with structured outputs

  You can pass pydantic models to the llm instance for structured outputs.
  If the "resolve" flag is set, the output will also be resolved to an
  actual dict (otherwise it will be a JSON string).
""" await llm.emit_status('Structured output examples') choice = await llm.chat_completion( prompt="""What is the best topping for a pizza?""", schema=ChoiceResponse, resolve=True ) logger.debug(f"Choice: {choice}") """ 3.1 Programmatic messages and statuses programmatic "public" messages that are streamed back to the client as they are emitted here (no way to "undo" or rewrite them) """ # You can tweak how status messages are delivered # via the BOOST_STATUS_STYLE config option. await llm.emit_status('Status and message examples') await llm.emit_message("We can emit text at any time. ") await llm.emit_message( "\n_Note that you are responsible for correct formatting._\n" ) """ 3.2. Internal LLM completions "llm" is a representation of the downstream model that is being boosted. It comes with a few helper methods that tie up the module workflow together and is pre-configured to hit the downstream API with expected parameters. The completions below are "internal", they are not streamed back to the client by default. Read further for "streamed" or "public" completions. """ await llm.emit_status('Collecting internal completions...') word = "Roses" results = [ # You can retrieve completion for some plain text await llm.chat_completion(prompt="Hi!", resolve=True), # You can include key/value pairs to be formatted in the prompt await llm.chat_completion( prompt="Tell me about {word} in ONE SHORT SENTENCE.", word=word, resolve=True, ), # You can also provide a list of messages # in the OpenAI-compatible format await llm.chat_completion( messages=[ { "role": "user", "content": "Tell me about roses" }, { "role": "assistant", "content": "Sure, I can reply in three words! Here they are:" } ], resolve=True ), # You can also provide a chat instance, # Note that "resolve" is not set - the result # will be in raw API format f"\n```json\n{await llm.chat_completion(chat=life_chat)}\n```\n" ] # Results will now appear in the user's message await llm.emit_status('Displaying collected results') for i, result in enumerate(results): await llm.emit_message(f"\nResult {i}: {result}\n") """ 3.3. Public/Streamed LLM completions You can decide to stream responses from the downstream LLM as they are being generated, for example when there's a long chunk that needs to be retained in the global response. """ await llm.emit_status('Response streaming examples') # Same signatures as chat_completion streamed_results = [ # You can retrieve completion for some plain text await llm.stream_chat_completion(prompt="Hi!"), # You can include key/value pairs to be formatted in the prompt await llm.stream_chat_completion( prompt="Tell me about {word} in ONE SHORT SENTENCE.", word=word, ), # You can also provide a list of messages # in the OpenAI-compatible format await llm.stream_chat_completion( messages=[ { "role": "user", "content": "Tell me about roses" }, { "role": "assistant", "content": "Sure, I can reply in three words! Here they are:" } ], ), # You can also provide a chat instance await llm.stream_chat_completion(chat=life_chat) ] # Streamed results are still buffered and available # for you to use (plain text). logger.debug(f"Streamed results: {streamed_results}") # Note that it's on you to apply formatting that will make # sense in the context of the global message stream. await llm.emit_message("\nThose are all results so far.\n") """ 4. Final completion Note that none of the above will actually reach the Client if the BOOST_INTERMEDIATE_OUTPUT is set to "false". 
The "final" completion below, however, will *always* be streamed back. It accepts all the same inputs as "chat_completion" and "stream_chat_completion" above. You don't have to call it, but the output will be completely empty if the "final" completion is not called and intermediate outputs are disabled. Think of this as a way to wrap up the module execution and present the user with the final result. """ await llm.emit_status('Final completion') await llm.stream_final_completion(prompt="Wish me a good luck!") ````