# Elasticsearch Vector Store


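Elasticsearch is a distributed, RESTful search and analytics engine built on Apache Lucene. It offers several retrieval options, including dense vector retrieval, sparse vector retrieval, keyword search, and hybrid search.

[Sign up](https://cloud.elastic.co/registration?utm_source=llama-index&utm_content=documentation) for a free trial of Elastic Cloud, or run a local server as described below.

Requires Elasticsearch 8.9.0 or higher and AIOHTTP.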


In [None]:
%pip install -qU llama-index-vector-stores-elasticsearch llama-index openai

In [None]:
import getpass
import os

import openai

os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key:")

openai.api_key = os.environ["OPENAI_API_KEY"]

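## Running and connecting to Elasticsearch
There are two ways to set up an Elasticsearch instance:

### Elastic Cloud
Elastic Cloud is a managed Elasticsearch service. [Sign up](https://cloud.elastic.co/registration?utm_source=llama-index&utm_content=documentation) for a free trial.

### Locally
Get started with Elasticsearch by running it locally. The easiest way is to use the official Elasticsearch Docker image. See the Elasticsearch Docker documentation for more information.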

```bash
docker run -p 9200:9200 \
 -e "discovery.type=single-node" \
 -e "xpack.security.enabled=false" \
 -e "xpack.license.self_generated.type=trial" \
 docker.elastic.co/elasticsearch/elasticsearch:8.13.2
```
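
Once the container is running, it can be worth confirming that the server meets the 8.9.0 minimum stated at the top of this guide. The following is a minimal sketch using only the standard library. It assumes the unsecured local setup above, where a GET on the root URL returns JSON containing a `version.number` field. The helper names `parse_version`, `meets_minimum`, and `fetch_version` are hypothetical and not part of any Elasticsearch client.

```python
import json
import urllib.request

MINIMUM_VERSION = (8, 9, 0)  # minimum stated in this guide


def parse_version(version: str) -> tuple[int, ...]:
    """Turn '8.13.2' (or '8.13.2-SNAPSHOT') into (8, 13, 2)."""
    core = version.split("-", 1)[0]
    return tuple(int(part) for part in core.split("."))


def meets_minimum(
    version: str, minimum: tuple[int, ...] = MINIMUM_VERSION
) -> bool:
    """Compare versions numerically, so '8.13.2' counts as newer than '8.9.0'."""
    return parse_version(version) >= minimum


def fetch_version(es_url: str = "http://localhost:9200") -> str:
    """Read the version string from the root endpoint (needs a running server)."""
    with urllib.request.urlopen(es_url, timeout=5) as response:
        return json.load(response)["version"]["number"]


print(meets_minimum("8.13.2"))  # True
print(meets_minimum("8.8.1"))  # False
```

With the container from the command above running, `meets_minimum(fetch_version())` would perform the check against the live server.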

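## Configuring ElasticsearchStore
The `ElasticsearchStore` class is used to connect to an Elasticsearch instance. It accepts the following parameters:

 - index_name: Name of the Elasticsearch index. Required.
 - es_client: Optional. A pre-existing Elasticsearch client.
 - es_url: Optional. Elasticsearch URL.
 - es_cloud_id: Optional. Elasticsearch cloud ID.
 - es_api_key: Optional. Elasticsearch API key.
 - es_user: Optional. Elasticsearch username.
 - es_password: Optional. Elasticsearch password.
 - text_field: Optional. Name of the Elasticsearch field that stores the text.
 - vector_field: Optional. Name of the Elasticsearch field that stores the embedding.
 - batch_size: Optional. Batch size for bulk indexing. Defaults to 200.
 - distance_strategy: Optional. Distance strategy used for similarity search. Defaults to "COSINE".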
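
The connection examples in this guide combine these parameters in three ways: a plain URL, a cloud ID with username and password, or a cloud ID with an API key. As a rough sketch of how that choice could be driven by configuration, the hypothetical helper below builds the keyword arguments from environment variables. The helper name and the variable names (`ES_CLOUD_ID`, `ES_API_KEY`, `ES_USER`, `ES_PASSWORD`, `ES_URL`) are assumptions made for illustration, not something the library defines.

```python
import os
from typing import Mapping, Optional


def connection_kwargs(
    index_name: str, env: Optional[Mapping[str, str]] = None
) -> dict:
    """Build keyword arguments for ElasticsearchStore from environment variables.

    The variable names used here are assumptions made for this sketch.
    """
    env = os.environ if env is None else env
    kwargs = {"index_name": index_name}  # the only required parameter

    cloud_id = env.get("ES_CLOUD_ID")
    if cloud_id:
        kwargs["es_cloud_id"] = cloud_id
        if env.get("ES_API_KEY"):
            kwargs["es_api_key"] = env["ES_API_KEY"]
        else:
            # Raises KeyError if no password is configured either.
            kwargs["es_user"] = env.get("ES_USER", "elastic")
            kwargs["es_password"] = env["ES_PASSWORD"]
    else:
        # Fall back to a local, unsecured server.
        kwargs["es_url"] = env.get("ES_URL", "http://localhost:9200")
    return kwargs


print(connection_kwargs("my_index", env={}))
# {'index_name': 'my_index', 'es_url': 'http://localhost:9200'}
```

The result could then be unpacked into the constructor, as in `ElasticsearchStore(**connection_kwargs("my_index"))`.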

### Example: connecting locally
```python
from llama_index.vector_stores.elasticsearch import ElasticsearchStore

es = ElasticsearchStore(
    index_name="my_index",
    es_url="http://localhost:9200",
)
```

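### Example: connecting to Elastic Cloud with username and password

```python
from llama_index.vector_stores.elasticsearch import ElasticsearchStore

es = ElasticsearchStore(
    index_name="my_index",
    es_cloud_id="",  # found on the deployment page
    es_user="elastic",
    es_password="",  # provided when the deployment is created; it can also be reset
)
```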

### Example: connecting to Elastic Cloud with an API key

```python
from llama_index.vector_stores.elasticsearch import ElasticsearchStore

es = ElasticsearchStore(
    index_name="my_index",
    es_cloud_id="",  # found on the deployment page
    es_api_key="",  # create an API key in Kibana (Security -> API Keys)
)
```


#### Example data


In [None]:
from llama_index.core.schema import TextNode

movies = [
    TextNode(
        text="The lives of two mob hitmen, a boxer, a gangster and his wife, and a pair of diner bandits intertwine in four tales of violence and redemption.",
        metadata={"title": "Pulp Fiction"},
    ),
    TextNode(
        text="When the menace known as the Joker wreaks havoc and chaos on the people of Gotham, Batman must accept one of the greatest psychological and physical tests of his ability to fight injustice.",
        metadata={"title": "The Dark Knight"},
    ),
    TextNode(
        text="An insomniac office worker and a devil-may-care soapmaker form an underground fight club that evolves into something much, much more.",
        metadata={"title": "Fight Club"},
    ),
    TextNode(
        text="A thief who steals corporate secrets through the use of dream-sharing technology is given the inverse task of planting an idea into the mind of a C.E.O.",
        metadata={"title": "Inception"},
    ),
    TextNode(
        text="A computer hacker learns from mysterious rebels about the true nature of his reality and his role in the war against its controllers.",
        metadata={"title": "The Matrix"},
    ),
    TextNode(
        text="Two detectives, a rookie and a veteran, hunt a serial killer who uses the seven deadly sins as his motives.",
        metadata={"title": "Se7en"},
    ),
    TextNode(
        text="An organized crime dynasty's aging patriarch transfers control of his clandestine empire to his reluctant son.",
        metadata={"title": "The Godfather", "theme": "Mafia"},
    ),
]

## Retrieval examples

This section shows the different retrieval options available through `ElasticsearchStore` and how to use them via a VectorStoreIndex.


In [None]:
from llama_index.core import StorageContext, VectorStoreIndex
from llama_index.vector_stores.elasticsearch import ElasticsearchStore

First, we define helper functions that retrieve and print the results for a user query:


In [None]:
def print_results(results):
    for rank, result in enumerate(results, 1):
        print(
            f"{rank}. title={result.metadata['title']} score={result.get_score()} text={result.get_text()}"
        )


def search(
    vector_store: ElasticsearchStore, nodes: list[TextNode], query: str
):
    storage_context = StorageContext.from_defaults(vector_store=vector_store)
    index = VectorStoreIndex(nodes, storage_context=storage_context)

    print(">>> Documents:")
    retriever = index.as_retriever()
    results = retriever.retrieve(query)
    print_results(results)

    print("\n>>> Answer:")
    query_engine = index.as_query_engine()
    response = query_engine.query(query)
    print(response)

### Dense retrieval

Here we use embeddings from OpenAI to search.


In [None]:
from llama_index.vector_stores.elasticsearch import AsyncDenseVectorStrategy

dense_vector_store = ElasticsearchStore(
    es_url="http://localhost:9200",
    index_name="movies_dense",
    retrieval_strategy=AsyncDenseVectorStrategy(),
)

search(dense_vector_store, movies, "Which movie involves dreaming?")

>>> Documents:
1. title=Inception score=1.0 text=A thief who steals corporate secrets through the use of dream-sharing technology is given the inverse task of planting an idea into the mind of a C.E.O.

>>> Answer:
Inception
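
Dense retrieval ranks documents by how close their embedding is to the query embedding, and the default `distance_strategy` listed earlier is `"COSINE"`. The sketch below shows cosine similarity on toy three-dimensional vectors. The numbers are invented for illustration (real embeddings have far more dimensions), and the function is a plain-Python stand-in, not the computation Elasticsearch performs internally.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy 3-dimensional "embeddings"; the numbers are invented for illustration.
query = [1.0, 0.0, 1.0]
documents = {
    "Inception": [0.9, 0.1, 0.8],
    "Se7en": [0.0, 1.0, 0.1],
}

# Rank titles from most to least similar to the query vector.
ranked = sorted(
    documents,
    key=lambda title: cosine_similarity(query, documents[title]),
    reverse=True,
)
print(ranked)  # ['Inception', 'Se7en']
```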


The dense vector strategy is also the default retrieval strategy:


In [None]:
default_store = ElasticsearchStore(
    es_url="http://localhost:9200",  # for Elastic Cloud authentication, see above
    index_name="movies_default",
)

search(default_store, movies, "Which movie involves dreaming?")

>>> Documents:
1. title=Inception score=1.0 text=A thief who steals corporate secrets through the use of dream-sharing technology is given the inverse task of planting an idea into the mind of a C.E.O.

>>> Answer:
Inception


### Sparse retrieval

For this example, you first need to [deploy version two of the ELSER model](https://www.elastic.co/guide/en/machine-learning/current/ml-nlp-elser.html) in your Elasticsearch deployment.


In [None]:
from llama_index.vector_stores.elasticsearch import AsyncSparseVectorStrategy

sparse_vector_store = ElasticsearchStore(
    es_url="http://localhost:9200",  # for Elastic Cloud authentication, see above
    index_name="movies_sparse",
    retrieval_strategy=AsyncSparseVectorStrategy(model_id=".elser_model_2"),
)

search(sparse_vector_store, movies, "Which movie involves dreaming?")

>>> Documents:
1. title=Inception score=1.0 text=A thief who steals corporate secrets through the use of dream-sharing technology is given the inverse task of planting an idea into the mind of a C.E.O.

>>> Answer:
Inception


### Keyword retrieval

To use classic full-text search, use the BM25 strategy.


In [None]:
from llama_index.vector_stores.elasticsearch import AsyncBM25Strategy

bm25_store = ElasticsearchStore(
    es_url="http://localhost:9200",
    index_name="movies_bm25",
    retrieval_strategy=AsyncBM25Strategy(),
)

search(bm25_store, movies, "joker")

>>> Documents:
1. title=The Dark Knight score=1.0 text=When the menace known as the Joker wreaks havoc and chaos on the people of Gotham, Batman must accept one of the greatest psychological and physical tests of his ability to fight injustice.

>>> Answer:
The Joker is a menacing character who wreaks havoc and chaos on the people of Gotham, posing a significant challenge for Batman to combat injustice.
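
BM25 scores a document by how often the query terms occur in it, how rare those terms are across the collection, and how long the document is relative to the average. The following is a minimal sketch of the textbook formula with the commonly cited parameters `k1=1.2` and `b=0.75` and a naive whitespace tokenizer. These are assumptions for illustration: Elasticsearch's analyzers and exact scoring differ in detail, and the two sample documents are shortened stand-ins.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and split on whitespace, stripping basic punctuation."""
    return [word.strip(".,!?").lower() for word in text.split()]


def bm25_scores(
    query: str, docs: list[str], k1: float = 1.2, b: float = 0.75
) -> list[float]:
    """Score each document against the query with the textbook BM25 formula."""
    tokenized = [tokenize(doc) for doc in docs]
    avg_len = sum(len(tokens) for tokens in tokenized) / len(tokenized)
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            # Number of documents that contain the term at least once.
            matching_docs = sum(1 for other in tokenized if term in other)
            idf = math.log(
                1 + (len(docs) - matching_docs + 0.5) / (matching_docs + 0.5)
            )
            tf = counts[term]
            # Length normalization: longer-than-average documents are penalized.
            norm = k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf * (k1 + 1) / (tf + norm)
        scores.append(score)
    return scores


docs = [
    "The Joker wreaks havoc on the people of Gotham.",
    "A computer hacker learns about the true nature of his reality.",
]
scores = bm25_scores("joker", docs)
print(scores[0] > scores[1])  # True
```

Only the first document contains the query term, so it is the only one with a non-zero score.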


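### Hybrid retrieval

Setting the `hybrid=True` flag on the dense vector strategy combines dense retrieval and keyword search into hybrid retrieval.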
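
One widely used way to merge a dense ranking with a keyword ranking is reciprocal rank fusion (RRF). The sketch below illustrates the idea only. It is not taken from the `ElasticsearchStore` implementation, and the constant `k=60` and the two example rankings are assumptions chosen for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists: each document earns 1 / (k + rank) per list it appears in."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)


# Invented rankings for illustration.
dense_ranking = ["Inception", "The Matrix", "Fight Club"]
keyword_ranking = ["Inception", "Se7en", "The Matrix"]

fused = reciprocal_rank_fusion([dense_ranking, keyword_ranking])
print(fused)  # ['Inception', 'The Matrix', 'Se7en', 'Fight Club']
```

A document that appears in both lists, even at a modest rank, tends to outrank one that appears in only one of them.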


In [None]:
from llama_index.vector_stores.elasticsearch import AsyncDenseVectorStrategy

hybrid_store = ElasticsearchStore(
    es_url="http://localhost:9200",  # for Elastic Cloud authentication, see above
    index_name="movies_hybrid",
    retrieval_strategy=AsyncDenseVectorStrategy(hybrid=True),
)

search(hybrid_store, movies, "Which movie involves dreaming?")

>>> Documents:
1. title=Inception score=0.36787944117144233 text=A thief who steals corporate secrets through the use of dream-sharing technology is given the inverse task of planting an idea into the mind of a C.E.O.

>>> Answer:
"Inception" is the movie that involves dreaming.


### Metadata filters

We can also apply filters to the query engine based on the metadata of our documents.


In [None]:
from llama_index.core.vector_stores import ExactMatchFilter, MetadataFilters

metadata_store = ElasticsearchStore(
    es_url="http://localhost:9200",  # for Elastic Cloud authentication, see above
    index_name="movies_metadata",
)
storage_context = StorageContext.from_defaults(vector_store=metadata_store)
index = VectorStoreIndex(movies, storage_context=storage_context)

# Metadata filter
filters = MetadataFilters(
    filters=[ExactMatchFilter(key="theme", value="Mafia")]
)
retriever = index.as_retriever(filters=filters)

results = retriever.retrieve("What is Inception about?")
print_results(results)

1. title=The Godfather score=1.0 text=An organized crime dynasty's aging patriarch transfers control of his clandestine empire to his reluctant son.


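## Custom filters and overriding the query
Currently, the Elasticsearch implementation only supports the ExactMatchFilters provided by LlamaIndex. Elasticsearch itself supports a wide range of filters, including range filters, geo filters, and more. To use them, pass them as a list of dictionaries to the `es_filter` parameter.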
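
Filters of this kind are written as plain Elasticsearch query DSL dictionaries. As a small sketch, the hypothetical helper `range_filter` below builds a `range` clause and combines it with a `match` clause into a list suitable for `es_filter`. The field name `metadata.year` and its bounds are assumptions for illustration, since the sample movies in this guide carry no year metadata.

```python
import json


def range_filter(field: str, gte=None, lte=None) -> dict:
    """Build an Elasticsearch `range` clause, omitting unset bounds."""
    bounds = {}
    if gte is not None:
        bounds["gte"] = gte
    if lte is not None:
        bounds["lte"] = lte
    return {"range": {field: bounds}}


# Hypothetical field name: the sample movies carry no "year" metadata.
es_filter = [
    {"match": {"title": "matrix"}},
    range_filter("metadata.year", gte=1990, lte=1999),
]

print(json.dumps(es_filter, indent=2))
```

A list like this would be passed through `vector_store_kwargs={"es_filter": es_filter}` when creating the query engine.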


In [None]:
def custom_query(query, query_str):
    print("custom query", query)
    return query


query_engine = index.as_query_engine(
    vector_store_kwargs={
        "es_filter": [{"match": {"title": "matrix"}}],
        "custom_query": custom_query,
    }
)
query_engine.query("what is this movie about?")

custom query {'knn': {'filter': [{'match': {'title': 'matrix'}}], 'field': 'embedding', 'k': 2, 'num_candidates': 20, 'query_vector': [0.00446691969409585, -0.038953110575675964, -0.023963095620274544, -0.024891795590519905, -0.016729693859815598, 0.017200583592057228, -0.002360992832109332, -0.012622482143342495, -0.009980263188481331, -0.026108263060450554, 0.02950914017856121, 0.018626336008310318, -0.016154160723090172, -0.012099270708858967, 0.03777588531374931, 0.006209868937730789, 0.03539527207612991, -0.011746102944016457, 0.0029888467397540808, -0.022066453471779823, -0.02290359139442444, -0.011752642691135406, -0.018744058907032013, -0.015251620672643185, 0.0034074161667376757, 0.00014756205200683326, 0.022955913096666336, -0.02264198660850525, 0.002032350515946746, -0.021778685972094536, 0.012164671905338764, -0.015055416151881218, 0.006543416064232588, -0.009509372524917126, -0.008632993325591087, -0.006814832333475351, 0.011765723116695881, -0.01788076013326645, 0.0016669