# LangChain and CrateDB

## About LangChain

[LangChain] is an open-source framework for developing applications powered by
language models. It provides a complete set of powerful and flexible components
for building context-aware, reasoning applications. [LangGraph] is a low-level
orchestration framework for building, managing, and deploying long-running,
stateful agents.

Please refer to the [LangChain documentation] and the
[Building Ambient Agents with LangGraph] academy material for further information.

Common end-to-end use cases are:

- Analyzing structured data
- Chatbots and friends
- Document question answering
- Text-to-SQL (talk to your data)

LangChain provides standard, extendable interfaces and external integrations
for the following modules, listed from least to most complex:

- [Model I/O][Model I/O]: Interface with language models
- [Retrieval][Retrieval]: Interface with application-specific data
- [Chains][Chains]: Construct sequences of calls
- [Agents][Agents]: Let chains choose which tools to use, given high-level directives
- [Memory][Memory]: Persist application state between runs of a chain
- [Callbacks][Callbacks]: Log and stream intermediate steps of any chain

## What's inside

[![Made with Jupyter](https://img.shields.io/badge/Made%20with-Jupyter-orange?logo=Jupyter)](https://jupyter.org/try)
[![Made with Markdown](https://img.shields.io/badge/Made%20with-Markdown-1f425f.svg?logo=Markdown)](https://commonmark.org)

This folder provides guidelines and runnable code to get started with
[LangChain] and [CrateDB].

- [readme.md](readme.md): The file you are currently reading. It contains a
  walkthrough about how to get started with the LangChain framework and
  CrateDB, and guides you to the corresponding example programs which
  demonstrate how to use the different subsystems.

- [requirements.txt](requirements.txt): Pulls in a patched version of
  LangChain, as well as the CrateDB client driver and the `crash`
  command-line interface.
- `vector_search.ipynb` [![Open on GitHub](https://img.shields.io/badge/Open%20on-GitHub-lightgray?logo=GitHub)](vector_search.ipynb) [![Launch Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/crate/cratedb-examples/main?labpath=topic%2Fmachine-learning%2Flangchain%2Fvector_search.ipynb) [![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/crate/cratedb-examples/blob/main/topic/machine-learning/langchain/vector_search.ipynb)

  This notebook explores CrateDB's [`FLOAT_VECTOR`] and [`KNN_MATCH`]
  functionalities for storing and retrieving embeddings, and for conducting
  similarity searches.

- `document_loader.ipynb` [![Open on GitHub](https://img.shields.io/badge/Open%20on-GitHub-lightgray?logo=GitHub)](document_loader.ipynb) [![Launch Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/crate/cratedb-examples/main?labpath=topic%2Fmachine-learning%2Flangchain%2Fdocument_loader.ipynb) [![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/crate/cratedb-examples/blob/main/topic/machine-learning/langchain/document_loader.ipynb)

  This notebook demonstrates how to query a database table in CrateDB and use
  it as a source provider for LangChain documents.

- `conversational_memory.ipynb` [![Open on GitHub](https://img.shields.io/badge/Open%20on-GitHub-lightgray?logo=GitHub)](conversational_memory.ipynb) [![Launch Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/crate/cratedb-examples/main?labpath=topic%2Fmachine-learning%2Flangchain%2Fconversational_memory.ipynb) [![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/crate/cratedb-examples/blob/main/topic/machine-learning/langchain/conversational_memory.ipynb)

  LangChain also supports managing conversation history in SQL databases.
  This notebook exercises how that works with CrateDB.

- `cratedb-vectorstore-rag-openai-sql.ipynb` [![Open on GitHub](https://img.shields.io/badge/Open%20on-GitHub-lightgray?logo=GitHub)](cratedb-vectorstore-rag-openai-sql.ipynb) [![Launch Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/crate/cratedb-examples/main?labpath=topic%2Fmachine-learning%2Flangchain%2Fcratedb-vectorstore-rag-openai-sql.ipynb) [![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/crate/cratedb-examples/blob/main/topic/machine-learning/langchain/cratedb-vectorstore-rag-openai-sql.ipynb)

  This example deliberately shows how to use the CrateDB vector store through
  plain SQL. There may be cases where the default parameters of the LangChain
  integration are not sufficient, or where you need to use more advanced SQL
  queries. The example still uses LangChain components to split a PDF file
  into chunks, and leverages OpenAI to calculate embeddings and to execute
  requests against an LLM.

- Accompanying the Jupyter Notebook files, there are also basic variants of
  the corresponding examples: [vector_search.py](vector_search.py),
  [document_loader.py](document_loader.py), and
  [conversational_memory.py](conversational_memory.py).

- `cratedb_rag_customer_support_langchain.ipynb` [![Open on GitHub](https://img.shields.io/badge/Open%20on-GitHub-lightgray?logo=GitHub)](cratedb_rag_customer_support_langchain.ipynb) [![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/crate/cratedb-examples/blob/main/topic/machine-learning/langchain/cratedb_rag_customer_support_langchain.ipynb)

  This example illustrates a RAG implementation of a customer support
  scenario. The dataset used in this example is based on a collection of
  customer support interactions on Twitter, related to Microsoft products
  or services.
  The example shows how to use the CrateDB vector store functionality to
  create a retrieval augmented generation (RAG) pipeline. To implement RAG,
  it uses the Python client driver for CrateDB and the vector store support
  in LangChain.

- `cratedb_rag_customer_support_vertexai.ipynb` [![Open on GitHub](https://img.shields.io/badge/Open%20on-GitHub-lightgray?logo=GitHub)](cratedb_rag_customer_support_vertexai.ipynb) [![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/crate/cratedb-examples/blob/main/topic/machine-learning/langchain/cratedb_rag_customer_support_vertexai.ipynb)

  This example illustrates a RAG implementation of a customer support
  scenario. It is based on the previous notebook, and illustrates how to use
  the Vertex AI platform on Google Cloud for the RAG pipeline.

- `agent_with_mcp.py` [![Open on GitHub](https://img.shields.io/badge/Open%20on-GitHub-lightgray?logo=GitHub)](agent_with_mcp.py)

  This example illustrates how to use LangGraph and the
  `langchain-mcp-adapters` package to implement an LLM agent which connects
  to the CrateDB MCP server. The demo program performs Text-to-SQL on time
  series data stored in a CrateDB table.

## Install

To set up a sandbox environment for exploring the example notebooks and
programs, it is advised to create a Python virtualenv, and install the
dependencies into it. This way, it is easy to wipe your virtualenv and start
from scratch anytime.

```shell
python3 -m venv .venv
source .venv/bin/activate
pip install -U -r requirements.txt
```

## Setup

The upcoming commands expect that you are working on a terminal with an
activated virtualenv.

```shell
source .venv/bin/activate
```

### CrateDB on localhost

In order to spin up a CrateDB instance without further ado, you can use
Docker or Podman.
```shell
docker run --rm -it \
  --name=cratedb --publish=4200:4200 --publish=5432:5432 \
  --env=CRATE_HEAP_SIZE=4g crate -Cdiscovery.type=single-node
```

### CrateDB Cloud

Sign up or log in to [CrateDB Cloud], and create a free tier cluster. Within
just a few minutes, a cloud-based development environment is up and running.
As soon as your project scales, you can easily move to a different cluster
tier or scale horizontally.

### MCP

Provision the database.

```bash
crash < init.sql
```

Spin up the [CrateDB MCP server], connecting it to CrateDB on localhost.

```bash
export CRATEDB_CLUSTER_URL=http://crate:crate@localhost:4200/
export CRATEDB_MCP_TRANSPORT=streamable-http
uvx cratedb-mcp serve
```

Run the code using the OpenAI API.

```bash
export OPENAI_API_KEY=
python agent_with_mcp.py
```

Expected output:

```text
Query was: What is the average value for sensor 1?
Answer was: The average value for sensor 1 is approximately 17.03. If you need more details or a different calculation, let me know!
```

## Testing

Run all tests.

```shell
pytest
```

Run tests selectively.

```shell
pytest -k document_loader
pytest -k "notebook and loader"
```

To force regeneration of Jupyter notebooks, use the `--nb-force-regen` option.
```shell
pytest -k document_loader --nb-force-regen
```

[Agents]: https://python.langchain.com/docs/modules/agents/
[Building Ambient Agents with LangGraph]: https://academy.langchain.com/courses/ambient-agents/
[Callbacks]: https://python.langchain.com/docs/modules/callbacks/
[Chains]: https://python.langchain.com/docs/modules/chains/
[CrateDB]: https://github.com/crate/crate
[CrateDB Cloud]: https://console.cratedb.cloud
[CrateDB MCP server]: https://cratedb.com/docs/guide/integrate/mcp/cratedb-mcp.html
[`FLOAT_VECTOR`]: https://crate.io/docs/crate/reference/en/master/general/ddl/data-types.html#float-vector
[`KNN_MATCH`]: https://crate.io/docs/crate/reference/en/master/general/builtins/scalar-functions.html#scalar-knn-match
[LangChain]: https://www.langchain.com/
[LangChain documentation]: https://python.langchain.com/
[LangGraph]: https://langchain-ai.github.io/langgraph/
[Memory]: https://python.langchain.com/docs/modules/memory/
[Model I/O]: https://python.langchain.com/docs/modules/model_io/
[Retrieval]: https://python.langchain.com/docs/modules/data_connection/
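To build intuition for the similarity search exercised in the vector store
notebook, here is a small, self-contained Python sketch of a brute-force
k-nearest-neighbour lookup over embeddings. The toy vectors and names are made
up for illustration; in CrateDB, the equivalent lookup runs index-assisted over
a `FLOAT_VECTOR` column via `KNN_MATCH(column, query_vector, k)`.

```python
import math

# Toy 3-dimensional embeddings, standing in for rows with a FLOAT_VECTOR(3) column.
documents = {
    "apple": [0.1, 0.2, 0.3],
    "banana": [0.15, 0.22, 0.28],
    "car": [0.9, 0.1, 0.05],
}

def knn(query, corpus, k):
    """Return the names of the k entries closest to `query`,
    ranked by Euclidean distance (smallest distance first)."""
    ranked = sorted(corpus.items(), key=lambda item: math.dist(query, item[1]))
    return [name for name, _ in ranked[:k]]

print(knn([0.1, 0.2, 0.3], documents, 2))  # the two fruit vectors rank first
```

The notebooks delegate this ranking to the database, which also returns a
`_score` per matched row instead of a raw distance.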