<img src="docs/images/DSPy8.png" alt="DSPy7 Image" height="150"/>

# DSPy: Tutorial @ SkyCamp


This notebook contains the **DSPy tutorial** for **SkyCamp 2023**.

Let's begin by setting things up. The snippet below will install **DSPy** if it is not already installed.

In [1]:
%load_ext autoreload
%autoreload 2

import sys
import os

try: # When on Google Colab, clone the repository so the cache gets downloaded.
    import google.colab
    repo_path = 'dspy'
    !git -C $repo_path pull origin || git clone https://github.com/stanfordnlp/dspy $repo_path
except:
    repo_path = '.'

if repo_path not in sys.path:
    sys.path.append(repo_path)

# Set up the cache for this notebook
os.environ["DSP_NOTEBOOK_CACHEDIR"] = os.path.join(repo_path, 'cache')

# import pkg_resources # Install the package if it's not installed
# if not "dspy-ai" in {pkg.key for pkg in pkg_resources.working_set}:
#     !pip install -U pip
#     # !pip install dspy-ai
#     !pip install -e $repo_path

import dspy

In [2]:
import dspy  # Core DSPy library
from dspy.evaluate import Evaluate  # Evaluation utility
from dspy.datasets.hotpotqa import HotPotQA  # HotPotQA dataset loader
from dspy.teleprompt import BootstrapFewShot, BootstrapFewShotWithRandomSearch, BootstrapFinetune  # Teleprompters (prompt optimizers)

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload
./cache/compiler


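### 1) Configure the default LM and retriever

We'll start by setting up the language model (LM) and the retrieval model (RM). **DSPy** supports multiple APIs and local models.

In this notebook, we'll work with `Llama2-13b-chat`, served through the HuggingFace TGI serving infrastructure. In principle, you can run this model on your own local GPUs, but for this tutorial all examples are pre-cached, so you don't need to worry about cost.

We will use the retriever `ColBERTv2`. To keep things simple, we've set up a ColBERTv2 server hosting a Wikipedia 2017 "abstracts" search index (i.e., containing the first paragraph of each article from this [2017 dump](https://hotpotqa.github.io/wiki-readme.html)), so you don't need to set one up yourself. It's free.

**Note:** _If you run this notebook as instructed, you don't need an API key. All examples are already cached internally so you can inspect them!_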

In [3]:
llama = dspy.HFClientTGI(model="meta-llama/Llama-2-13b-chat-hf", port=[7140, 7141, 7142, 7143], max_tokens=150)
colbertv2 = dspy.ColBERTv2(url='http://20.102.90.50:2017/wiki17_abstracts')

# # NOTE: After you finish this notebook, you can use GPT-3.5 like this if you like.
# turbo = dspy.OpenAI(model='gpt-3.5-turbo-instruct')
# # In that case, make sure to configure lm=turbo below if you choose to do that.

dspy.settings.configure(rm=colbertv2, lm=llama)

### 2) Create a few question-answer pairs for our task

In [4]:
# Define a small training set of (question, answer) pairs
train = [('Who was the director of the 2009 movie featuring Peter Outerbridge as William Easton?', 'Kevin Greutert'),
         ('The heir to the Du Pont family fortune sponsored what wrestling team?', 'Foxcatcher'),
         ('In what year was the star of To Hell and Back born?', '1925'),
         ('Which award did the first book of Gary Zukav receive?', 'U.S. National Book Award'),
         ('What documentary about the Gilgo Beach Killer debuted on A&E?', 'The Killing Season'),
         ('Which author is English: John Braine or Studs Terkel?', 'John Braine'),
         ('Who produced the album that included a re-recording of "Lithium"?', 'Butch Vig')]

# Wrap each pair in a dspy.Example and mark 'question' as the input field
train = [dspy.Example(question=question, answer=answer).with_inputs('question') for question, answer in train]

In [5]:
# Define a small development set of (question, answer) pairs
dev = [('Who has a broader scope of profession: E. L. Doctorow or Julia Peterkin?', 'E. L. Doctorow'),
       ('Right Back At It Again contains lyrics co-written by the singer born in what city?', 'Gainesville, Florida'),
       ('What year was the party of the winner of the 1971 San Francisco mayoral election founded?', '1828'),
       ('Anthony Dirrell is the brother of which super middleweight title holder?', 'Andre Dirrell'),
       ('The sports nutrition business established by Oliver Cookson is based in which county in the UK?', 'Cheshire'),
       ('Find the birth date of the actor who played roles in First Wives Club and Searching for the Elephant.', 'February 13, 1980'),
       ('Kyle Moran was born in the town on what river?', 'Castletown River'),
       ("The actress who played the niece in the Priest film was born in what city, country?", 'Surrey, England'),
       ('Name the movie in which the daughter of Noel Harrison plays Violet Trefusis.', 'Portrait of a Marriage'),
       ('What year was the father of the Princes in the Tower born?', '1442'),
       ('What river is near the Crichton Collegiate Church?', 'the River Tyne'),
       ('Who purchased the team Michael Schumacher raced for in the 1995 Monaco Grand Prix in 2000?', 'Renault'),
       ('André Zucca was a French photographer who worked with a German propaganda magazine published by what Nazi organization?', 'the Wehrmacht')]

# Convert each pair to a dspy.Example and mark 'question' as the input field
dev = [dspy.Example(question=question, answer=answer).with_inputs('question') for question, answer in dev]
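
The `.with_inputs('question')` call marks which fields are fed to the program; the remaining fields (here `answer`) serve as labels for the metric. The plain-Python sketch below mimics that split with a hypothetical `split_example` helper. It is only an illustration of the idea and not how `dspy.Example` is implemented.

```python
# Hypothetical stand-in for the input/label split that `with_inputs` expresses.
def split_example(fields: dict, input_keys: set) -> tuple:
    """Return (inputs, labels): inputs go to the program, labels to the metric."""
    inputs = {k: v for k, v in fields.items() if k in input_keys}
    labels = {k: v for k, v in fields.items() if k not in input_keys}
    return inputs, labels

example = {
    "question": "In what year was the star of To Hell and Back born?",
    "answer": "1925",
}
inputs, labels = split_example(example, {"question"})
print(inputs)  # only the question is visible to the program
print(labels)  # the answer is kept aside for scoring
```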

### 3) Key Concepts: Signatures & Modules

In [6]:
# Define a dspy.Predict module with the signature `question -> answer` (i.e., takes a question and outputs an answer).
predict = dspy.Predict('question -> answer')

# Use the module!
predict(question="What is the capital of Germany?")

Prediction(
    answer='Berlin'
)

In the example above, we used the `dspy.Predict` module **zero-shot**, i.e., without compiling it on any examples.
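
To make the signature string concrete, here is a small sketch of how a string such as `'question -> answer'` separates input fields from output fields. The `parse_signature` function is a hypothetical helper written for this illustration; DSPy's own parsing may differ.

```python
def parse_signature(signature: str) -> tuple:
    """Split 'a, b -> c' into (['a', 'b'], ['c'])."""
    inputs_part, outputs_part = signature.split("->")
    inputs = [name.strip() for name in inputs_part.split(",") if name.strip()]
    outputs = [name.strip() for name in outputs_part.split(",") if name.strip()]
    return inputs, outputs

print(parse_signature("question -> answer"))
print(parse_signature("context, question -> search_query"))
```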

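Now let's build a slightly more advanced program. Our program will use the `dspy.ChainOfThought` module, which asks the LM to think step by step.

We will call this program `CoT`.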

In [7]:
class CoT(dspy.Module):  # Define a new module
    def __init__(self):
        super().__init__()

        # Declare the chain-of-thought sub-module so it can be compiled later (e.g., taught a prompt)
        self.generate_answer = dspy.ChainOfThought('question -> answer')
    
    def forward(self, question):
        return self.generate_answer(question=question)  # Use the module here

Now let's compile this using our seven `train` examples. We will use the very simple `BootstrapFewShot` teleprompter in DSPy.

In [8]:
# Use exact match as the evaluation metric
metric_EM = dspy.evaluate.answer_exact_match

# Create a BootstrapFewShot teleprompter with the exact-match metric and at most 2 bootstrapped demonstrations
teleprompter = BootstrapFewShot(metric=metric_EM, max_bootstrapped_demos=2)

# Compile the CoT program on the training set
cot_compiled = teleprompter.compile(CoT(), trainset=train)

100%|██████████| 7/7 [00:00<00:00, 29.36it/s]

Bootstrapped 1 full traces after 7 examples in round 0.
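
Conceptually, bootstrapping runs the program on the training examples and keeps only the traces whose prediction passes the metric, up to `max_bootstrapped_demos`. The sketch below illustrates that idea under simplifying assumptions: `bootstrap_demos` and `toy_program` are hypothetical, and the "LM" is a lookup table rather than a real model. It is not the `BootstrapFewShot` implementation.

```python
def exact_match(gold: str, pred: str) -> bool:
    return gold.strip().lower() == pred.strip().lower()

def bootstrap_demos(program, trainset, metric, max_bootstrapped_demos=2):
    """Collect up to `max_bootstrapped_demos` traces whose prediction passes `metric`."""
    demos = []
    for question, gold in trainset:
        if len(demos) >= max_bootstrapped_demos:
            break
        rationale, pred = program(question)
        if metric(gold, pred):
            demos.append({"question": question, "rationale": rationale, "answer": pred})
    return demos

# Toy "LM": it knows a single fact, so most training questions fail the metric.
KNOWN = {"Which author is English: John Braine or Studs Terkel?": "John Braine"}

def toy_program(question):
    answer = KNOWN.get(question, "unknown")
    return "(rationale produced by the model)", answer

trainset = [
    ("In what year was the star of To Hell and Back born?", "1925"),
    ("Which author is English: John Braine or Studs Terkel?", "John Braine"),
]
demos = bootstrap_demos(toy_program, trainset, exact_match)
print(len(demos), demos[0]["answer"])
```

Only the example the toy program answers correctly survives as a demonstration, which mirrors the log line above reporting a single full trace after all seven examples.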





Let's ask this new program a question.

In [9]:
# Call the compiled program with the question "What is the capital of Germany?"
cot_compiled("What is the capital of Germany?")

Prediction(
    rationale='determine the capital of Germany. We know that the capital of Germany is Berlin, so the answer is Berlin.',
    answer='Berlin'
)

You might be curious what's going on under the hood. Let's inspect the last call to our Llama LM to see the prompt and the output.

In [10]:
# Show the most recent LM call (n=1) recorded in llama's history
llama.inspect_history(n=1)





Given the fields `question`, produce the fields `answer`.

---

Question: Who was the director of the 2009 movie featuring Peter Outerbridge as William Easton?
Answer: Kevin Greutert

Question: Which award did the first book of Gary Zukav receive?
Answer: U.S. National Book Award

Question: What documentary about the Gilgo Beach Killer debuted on A&E?
Answer: The Killing Season

Question: In what year was the star of To Hell and Back born?
Answer: 1925

Question: The heir to the Du Pont family fortune sponsored what wrestling team?
Answer: Foxcatcher

Question: Who produced the album that included a re-recording of "Lithium"?
Answer: Butch Vig

---

Follow the following format.

Question: ${question}
Reasoning: Let's think step by step in order to ${produce the answer}. We ...
Answer: ${answer}

---

Question: Which author is English: John Braine or Studs Terkel?
Reasoning: Let's think step by step in order to determine which author is English. We know that John Braine is English, 

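Notice how the prompt ends with the question we asked ("What is the capital of Germany?"), but before that it includes few-shot examples.

The final example in the prompt contains a rationale (step-by-step reasoning) that the LM generated itself for use as a demonstration, for the training question "Which author is English: John Braine or Studs Terkel?".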
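
The prompt layout described above can be sketched in plain Python: an instruction, label-only demonstrations, a format description, demonstrations with reasoning, and finally the new question. The `build_prompt` function is a hypothetical helper that only imitates the printed layout, and the rationale string is a placeholder rather than real model output.

```python
REASONING_PREFIX = "Reasoning: Let's think step by step in order to"

def build_prompt(instruction, labeled_demos, bootstrapped_demos, question):
    """Join the prompt sections with '---' separators, mirroring the printed history."""
    plain = [f"Question: {q}\nAnswer: {a}" for q, a in labeled_demos]
    fmt = (
        "Follow the following format.\n\n"
        "Question: ${question}\n"
        + REASONING_PREFIX + " ${produce the answer}. We ...\n"
        "Answer: ${answer}"
    )
    worked = [
        f"Question: {q}\n{REASONING_PREFIX} {r}\nAnswer: {a}"
        for q, r, a in bootstrapped_demos
    ]
    # The prompt stops right after the reasoning prefix so the LM continues from there.
    query = f"Question: {question}\n{REASONING_PREFIX}"
    sections = [instruction, "\n\n".join(plain), fmt, *worked, query]
    return "\n\n---\n\n".join(sections)

prompt = build_prompt(
    "Given the fields `question`, produce the fields `answer`.",
    [("In what year was the star of To Hell and Back born?", "1925")],
    [("Which author is English: John Braine or Studs Terkel?",
      "(rationale generated by the LM)",
      "John Braine")],
    "What is the capital of Germany?",
)
print(prompt)
```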

Now, let's evaluate on the development set.

In [11]:
NUM_THREADS = 32
# Create an evaluator over the dev set with the exact-match metric; show progress and display up to 15 table rows
evaluate_hotpot = Evaluate(devset=dev, metric=metric_EM, num_threads=NUM_THREADS, display_progress=True, display_table=15)

First, let's evaluate the compiled `CoT` program with Llama. Feel free to replace `cot_compiled` below with `CoT()` (note the parentheses) to test the zero-shot version of CoT.

In [12]:
# Evaluate the compiled CoT program on the dev set
evaluate_hotpot(cot_compiled)

Average Metric: 3 / 13  (23.1): 100%|██████████| 13/13 [00:00<00:00, 117.05it/s]


Average Metric: 3 / 13  (23.1%)


Unnamed: 0,question,example_answer,rationale,pred_answer,answer_exact_match
0,Who has a broader scope of profession: E. L. Doctorow or Julia Peterkin?,E. L. Doctorow,"determine who has a broader scope of profession. We know that E. L. Doctorow was a novelist, but Julia Peterkin was a journalist and a...",Julia Peterkin,❌ [False]
1,Right Back At It Again contains lyrics co-written by the singer born in what city?,"Gainesville, Florida","determine the answer. We know that the singer was born in Minneapolis, so the answer is Minneapolis.",Minneapolis,❌ [False]
2,What year was the party of the winner of the 1971 San Francisco mayoral election founded?,1828,determine the year the party of the winner of the 1971 San Francisco mayoral election was founded. We know that the party was founded in...,1971,❌ [False]
3,Anthony Dirrell is the brother of which super middleweight title holder?,Andre Dirrell,"determine which super middleweight title holder Anthony Dirrell is the brother of. We know that Anthony Dirrell is a professional boxer, and after researching, we...",Andre Dirrell,✔️ [True]
4,The sports nutrition business established by Oliver Cookson is based in which county in the UK?,Cheshire,"determine the county in the UK where Oliver Cookson's sports nutrition business is based. We know that Oliver Cookson is a British entrepreneur, so we...",Surrey,❌ [False]
5,Find the birth date of the actor who played roles in First Wives Club and Searching for the Elephant.,"February 13, 1980","determine the birth date of the actor. We know that the actor was born in the 1950s, so we need to narrow down the possible...","August 12, 1955",❌ [False]
6,Kyle Moran was born in the town on what river?,Castletown River,determine where Kyle Moran was born. We know that Kyle Moran was born in the town on the Delaware River.,Delaware River,❌ [False]
7,"The actress who played the niece in the Priest film was born in what city, country?","Surrey, England","determine the answer. We know that the actress was born in a city, so we need to determine the country. After researching, we found that...","Los Angeles, California",❌ [False]
8,Name the movie in which the daughter of Noel Harrison plays Violet Trefusis.,Portrait of a Marriage,"determine the name of the movie. We know that the daughter of Noel Harrison plays Violet Trefusis, so we need to determine the name of...",The Remains of the Day,❌ [False]
9,What year was the father of the Princes in the Tower born?,1442,"determine the answer. We know that the father of the Princes in the Tower was born before 1483, so we need to find the correct...",1452,❌ [False]


23.08
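
The score above is simply the share of dev examples that pass the metric: 3 of 13 is about 23.08%. The sketch below shows one way such an evaluation loop could look, with a thread pool standing in for `num_threads`. The `evaluate` function and the toy program are assumptions for illustration, not the `Evaluate` class itself.

```python
from concurrent.futures import ThreadPoolExecutor

def evaluate(program, devset, metric, num_threads=4):
    """Score `program` on every (question, gold) pair in parallel; return a percentage."""
    def score(item):
        question, gold = item
        return bool(metric(gold, program(question)))

    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        results = list(pool.map(score, devset))
    correct = sum(results)
    print(f"Average Metric: {correct} / {len(devset)}  ({100 * correct / len(devset):.1f}%)")
    return round(100 * correct / len(devset), 2)

# Toy setup: 13 questions and a "program" that gets exactly 3 of them right.
devset = [(f"q{i}", f"a{i}") for i in range(13)]

def toy_program(question):
    return question.replace("q", "a") if question in {"q0", "q1", "q2"} else "wrong"

score = evaluate(toy_program, devset, lambda gold, pred: gold == pred)
print(score)  # 23.08
```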

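### 4) Bonus 1: RAG with query generation

As a bonus, let's define a more sophisticated program called `RAG`. This program will:

- Use the LM to generate a search query based on the input question
- Retrieve three passages using our retriever
- Use the LM to generate a final answer using these passages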

In [13]:
class RAG(dspy.Module):
    def __init__(self, num_passages=3):
        super().__init__()

        # Declare three modules: the retriever, a query generator, and an answer generator
        self.retrieve = dspy.Retrieve(k=num_passages)
        self.generate_query = dspy.ChainOfThought("question -> search_query")
        self.generate_answer = dspy.ChainOfThought("context, question -> answer")
    
    def forward(self, question):
        # Generate a search query from the question, and use it to retrieve passages
        search_query = self.generate_query(question=question).search_query
        passages = self.retrieve(search_query).passages

        # Generate an answer from the passages and the question
        return self.generate_answer(context=passages, question=question)

Out of curiosity, we can evaluate the **uncompiled** (or **zero-shot**) version of this program.

In [14]:
# RAG is defined in the cell above, so no import is needed.
# Evaluate an uncompiled RAG() instance; display_table=0 suppresses the results table.
evaluate_hotpot(RAG(), display_table=0)

Average Metric: 3 / 13  (23.1): 100%|██████████| 13/13 [00:00<00:00, 45.09it/s]

Average Metric: 3 / 13  (23.1%)





23.08

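Let's now compile this RAG program. This time we'll use a slightly more advanced teleprompter (automatic prompt optimizer), which relies on random search.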

In [15]:
# Create a BootstrapFewShotWithRandomSearch teleprompter: exact-match metric, at most 2 bootstrapped demos, 8 candidate programs, NUM_THREADS threads
teleprompter2 = BootstrapFewShotWithRandomSearch(metric=metric_EM, max_bootstrapped_demos=2, num_candidate_programs=8, num_threads=NUM_THREADS)

# Compile the RAG program with teleprompter2, using train as the training set and dev as the validation set
rag_compiled = teleprompter2.compile(RAG(), trainset=train, valset=dev)

Going to sample between 1 and 2 traces per predictor.
Will attempt to train 8 candidate sets.


Average Metric: 3 / 13  (23.1): 100%|██████████| 13/13 [00:00<00:00, 155.65it/s]


Average Metric: 3 / 13  (23.1%)
Score: 23.08 for set: [0, 0]
New best score: 23.08 for seed -3
Scores so far: [23.08]
Best score: 23.08


Average Metric: 3 / 13  (23.1): 100%|██████████| 13/13 [00:00<00:00, 72.77it/s]


Average Metric: 3 / 13  (23.1%)
Score: 23.08 for set: [7, 7]
Scores so far: [23.08, 23.08]
Best score: 23.08


 86%|████████▌ | 6/7 [00:00<00:00, 13.07it/s]


Bootstrapped 2 full traces after 7 examples in round 0.


Average Metric: 5 / 13  (38.5): 100%|██████████| 13/13 [00:00<00:00, 45.43it/s]


Average Metric: 5 / 13  (38.5%)
Score: 38.46 for set: [7, 7]
New best score: 38.46 for seed -1
Scores so far: [23.08, 23.08, 38.46]
Best score: 38.46
Average of max per entry across top 1 scores: 0.38461538461538464
Average of max per entry across top 2 scores: 0.46153846153846156
Average of max per entry across top 3 scores: 0.46153846153846156
Average of max per entry across top 5 scores: 0.46153846153846156
Average of max per entry across top 8 scores: 0.46153846153846156
Average of max per entry across top 9999 scores: 0.46153846153846156


100%|██████████| 7/7 [00:00<00:00, 19.01it/s]


Bootstrapped 1 full traces after 7 examples in round 0.


Average Metric: 6 / 13  (46.2): 100%|██████████| 13/13 [00:00<00:00, 42.01it/s]


Average Metric: 6 / 13  (46.2%)
Score: 46.15 for set: [7, 7]
New best score: 46.15 for seed 0
Scores so far: [23.08, 23.08, 38.46, 46.15]
Best score: 46.15
Average of max per entry across top 1 scores: 0.46153846153846156
Average of max per entry across top 2 scores: 0.5384615384615384
Average of max per entry across top 3 scores: 0.5384615384615384
Average of max per entry across top 5 scores: 0.5384615384615384
Average of max per entry across top 8 scores: 0.5384615384615384
Average of max per entry across top 9999 scores: 0.5384615384615384


 14%|█▍        | 1/7 [00:00<00:00, 21.10it/s]


Bootstrapped 1 full traces after 2 examples in round 0.


Average Metric: 4 / 13  (30.8): 100%|██████████| 13/13 [00:00<00:00, 68.72it/s]


Average Metric: 4 / 13  (30.8%)
Score: 30.77 for set: [7, 7]
Scores so far: [23.08, 23.08, 38.46, 46.15, 30.77]
Best score: 46.15
Average of max per entry across top 1 scores: 0.46153846153846156
Average of max per entry across top 2 scores: 0.5384615384615384
Average of max per entry across top 3 scores: 0.5384615384615384
Average of max per entry across top 5 scores: 0.5384615384615384
Average of max per entry across top 8 scores: 0.5384615384615384
Average of max per entry across top 9999 scores: 0.5384615384615384


 29%|██▊       | 2/7 [00:00<00:00, 21.89it/s]


Bootstrapped 1 full traces after 3 examples in round 0.


Average Metric: 4 / 13  (30.8): 100%|██████████| 13/13 [00:00<00:00, 67.99it/s]


Average Metric: 4 / 13  (30.8%)
Score: 30.77 for set: [7, 7]
Scores so far: [23.08, 23.08, 38.46, 46.15, 30.77, 30.77]
Best score: 46.15
Average of max per entry across top 1 scores: 0.46153846153846156
Average of max per entry across top 2 scores: 0.5384615384615384
Average of max per entry across top 3 scores: 0.5384615384615384
Average of max per entry across top 5 scores: 0.5384615384615384
Average of max per entry across top 8 scores: 0.5384615384615384
Average of max per entry across top 9999 scores: 0.5384615384615384


 43%|████▎     | 3/7 [00:00<00:00, 21.61it/s]


Bootstrapped 1 full traces after 4 examples in round 0.


Average Metric: 3 / 13  (23.1): 100%|██████████| 13/13 [00:00<00:00, 61.97it/s]


Average Metric: 3 / 13  (23.1%)
Score: 23.08 for set: [7, 7]
Scores so far: [23.08, 23.08, 38.46, 46.15, 30.77, 30.77, 23.08]
Best score: 46.15
Average of max per entry across top 1 scores: 0.46153846153846156
Average of max per entry across top 2 scores: 0.5384615384615384
Average of max per entry across top 3 scores: 0.5384615384615384
Average of max per entry across top 5 scores: 0.5384615384615384
Average of max per entry across top 8 scores: 0.5384615384615384
Average of max per entry across top 9999 scores: 0.5384615384615384


 14%|█▍        | 1/7 [00:00<00:00, 21.15it/s]


Bootstrapped 1 full traces after 2 examples in round 0.


Average Metric: 5 / 13  (38.5): 100%|██████████| 13/13 [00:00<00:00, 44.95it/s]


Average Metric: 5 / 13  (38.5%)
Score: 38.46 for set: [7, 7]
Scores so far: [23.08, 23.08, 38.46, 46.15, 30.77, 30.77, 23.08, 38.46]
Best score: 46.15
Average of max per entry across top 1 scores: 0.46153846153846156
Average of max per entry across top 2 scores: 0.5384615384615384
Average of max per entry across top 3 scores: 0.5384615384615384
Average of max per entry across top 5 scores: 0.5384615384615384
Average of max per entry across top 8 scores: 0.5384615384615384
Average of max per entry across top 9999 scores: 0.5384615384615384


100%|██████████| 7/7 [00:00<00:00, 22.62it/s]


Bootstrapped 1 full traces after 7 examples in round 0.


Average Metric: 4 / 13  (30.8): 100%|██████████| 13/13 [00:00<00:00, 66.59it/s]


Average Metric: 4 / 13  (30.8%)
Score: 30.77 for set: [7, 7]
Scores so far: [23.08, 23.08, 38.46, 46.15, 30.77, 30.77, 23.08, 38.46, 30.77]
Best score: 46.15
Average of max per entry across top 1 scores: 0.46153846153846156
Average of max per entry across top 2 scores: 0.5384615384615384
Average of max per entry across top 3 scores: 0.5384615384615384
Average of max per entry across top 5 scores: 0.5384615384615384
Average of max per entry across top 8 scores: 0.5384615384615384
Average of max per entry across top 9999 scores: 0.5384615384615384


 57%|█████▋    | 4/7 [00:00<00:00, 23.46it/s]


Bootstrapped 1 full traces after 5 examples in round 0.


Average Metric: 4 / 13  (30.8): 100%|██████████| 13/13 [00:00<00:00, 68.29it/s]


Average Metric: 4 / 13  (30.8%)
Score: 30.77 for set: [7, 7]
Scores so far: [23.08, 23.08, 38.46, 46.15, 30.77, 30.77, 23.08, 38.46, 30.77, 30.77]
Best score: 46.15
Average of max per entry across top 1 scores: 0.46153846153846156
Average of max per entry across top 2 scores: 0.5384615384615384
Average of max per entry across top 3 scores: 0.5384615384615384
Average of max per entry across top 5 scores: 0.5384615384615384
Average of max per entry across top 8 scores: 0.5384615384615384
Average of max per entry across top 9999 scores: 0.5384615384615384


100%|██████████| 7/7 [00:00<00:00, 20.87it/s]


Bootstrapped 1 full traces after 7 examples in round 0.


Average Metric: 4 / 13  (30.8): 100%|██████████| 13/13 [00:00<00:00, 70.76it/s]

Average Metric: 4 / 13  (30.8%)
Score: 30.77 for set: [7, 7]
Scores so far: [23.08, 23.08, 38.46, 46.15, 30.77, 30.77, 23.08, 38.46, 30.77, 30.77, 30.77]
Best score: 46.15
Average of max per entry across top 1 scores: 0.46153846153846156
Average of max per entry across top 2 scores: 0.5384615384615384
Average of max per entry across top 3 scores: 0.5384615384615384
Average of max per entry across top 5 scores: 0.5384615384615384
Average of max per entry across top 8 scores: 0.5384615384615384
Average of max per entry across top 9999 scores: 0.5384615384615384
11 candidate programs found.
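
The log above shows the general pattern of the search: build a candidate program, score it on the validation set, and keep the best one seen so far. A minimal sketch of that loop follows, assuming each candidate is built from a differently shuffled training set. `random_search`, `build_candidate`, and `score_fn` are hypothetical names, and the sketch does not model the extra baseline candidates that appear in the log (11 scores for `num_candidate_programs=8`).

```python
import random

def random_search(trainset, valset, build_candidate, score_fn, num_candidates=8, seed=0):
    """Try several shuffled training orders; keep the candidate with the best validation score."""
    rng = random.Random(seed)
    best_score, best_candidate, scores = float("-inf"), None, []
    for _ in range(num_candidates):
        shuffled = trainset[:]
        rng.shuffle(shuffled)
        candidate = build_candidate(shuffled)
        score = score_fn(candidate, valset)
        scores.append(score)
        if score > best_score:  # strict '>' keeps the earliest of tied candidates
            best_score, best_candidate = score, candidate
    return best_candidate, best_score, scores

# Toy problem: a "candidate" is the first two shuffled training items,
# and its score is the percentage of validation items it happens to contain.
trainset = list(range(7))
valset = [0, 1, 2]

def build_candidate(shuffled):
    return tuple(shuffled[:2])

def score_fn(candidate, valset):
    return round(100 * len(set(candidate) & set(valset)) / len(valset), 2)

best, best_score, scores = random_search(trainset, valset, build_candidate, score_fn)
print("Scores so far:", scores)
print("Best score:", best_score, "for candidate", best)
```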





Let's now evaluate this compiled version of RAG.

In [16]:
# Evaluate the compiled RAG program on the dev set
evaluate_hotpot(rag_compiled)

Average Metric: 6 / 13  (46.2): 100%|██████████| 13/13 [00:00<00:00, 137.18it/s]

Average Metric: 6 / 13  (46.2%)





Unnamed: 0,question,example_answer,rationale,pred_answer,answer_exact_match
0,Who has a broader scope of profession: E. L. Doctorow or Julia Peterkin?,E. L. Doctorow,"answer this question. We know that E. L. Doctorow and Julia Peterkin are both authors, but we also know that E. L. Doctorow is known...",E. L. Doctorow.,✔️ [True]
1,Right Back At It Again contains lyrics co-written by the singer born in what city?,"Gainesville, Florida","answer this question. We know that Beyoncé is the singer who co-wrote the lyrics of ""Right Back At It Again"". We also know that Beyoncé...",Houston.,❌ [False]
2,What year was the party of the winner of the 1971 San Francisco mayoral election founded?,1828,answer this question. We know that the winner of the 1971 San Francisco mayoral election was a member of the Democratic Party. We also know...,1828.,✔️ [True]
3,Anthony Dirrell is the brother of which super middleweight title holder?,Andre Dirrell,answer this question. We know that Anthony Dirrell is a professional boxer. We also know that he held the WBC super middleweight title from 2014...,Andre Dirrell.,✔️ [True]
4,The sports nutrition business established by Oliver Cookson is based in which county in the UK?,Cheshire,"answer this question. We know that Oliver Cookson established Myprotein, a sports nutrition business. We also know that Myprotein was sold for a reported £58...",Cheshire.,✔️ [True]
5,Find the birth date of the actor who played roles in First Wives Club and Searching for the Elephant.,"February 13, 1980","answer this question. We know that the actor played roles in ""First Wives Club"" and ""Searching for the Elephant"". We also know that the actor's...","October 17, 1976.",❌ [False]
6,Kyle Moran was born in the town on what river?,Castletown River,answer this question. We know that Kyle Moran is an actor who was born in Livingston. We also know that Livingston is a town in...,River Forth.,❌ [False]
7,"The actress who played the niece in the Priest film was born in what city, country?","Surrey, England",answer this question. We know that Lily Collins is an actress and the daughter of Phil Collins. We also know that she was born in...,"Surrey, England.",✔️ [True]
8,Name the movie in which the daughter of Noel Harrison plays Violet Trefusis.,Portrait of a Marriage,answer this question. We know that Noel Harrison is the father of Dhani Harrison. We also know that Dhani Harrison is a member of the...,"The daughter of Noel Harrison plays Violet Trefusis in the movie ""The Killing Season"".",❌ [False]
9,What year was the father of the Princes in the Tower born?,1442,answer this question. We know that the father of the Princes in the Tower was King Richard III of England. We also know that he...,1452.,❌ [False]


46.15
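
In the table above, predictions such as `E. L. Doctorow.` count as matches despite the trailing period, so the metric evidently normalizes text before comparing. The sketch below assumes SQuAD-style normalization (lowercasing, removing punctuation and English articles, collapsing whitespace). The `normalize` and `exact_match` functions are illustrative and are not `dspy.evaluate.answer_exact_match` itself.

```python
import re
import string

def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(pred: str, gold: str) -> bool:
    return normalize(pred) == normalize(gold)

print(exact_match("E. L. Doctorow.", "E. L. Doctorow"))  # True: punctuation is ignored
print(exact_match("1452.", "1442"))                      # False: the years differ
print(exact_match("The River Tyne", "the River Tyne"))   # True: case and articles are ignored
```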

Let's inspect one of the LM calls. Focus in particular on the structure of the last few input/output examples in the prompt.

In [17]:
# Ask the compiled RAG program one of the dev questions
rag_compiled("What year was the party of the winner of the 1971 San Francisco mayoral election founded?")

# Show the most recent LM call (n=1) recorded in llama's history
llama.inspect_history(n=1)





Given the fields `context`, `question`, produce the fields `answer`.

---

Question: Which author is English: John Braine or Studs Terkel?
Answer: John Braine

Question: The heir to the Du Pont family fortune sponsored what wrestling team?
Answer: Foxcatcher

Question: Who produced the album that included a re-recording of "Lithium"?
Answer: Butch Vig

Question: In what year was the star of To Hell and Back born?
Answer: 1925

Question: What documentary about the Gilgo Beach Killer debuted on A&E?
Answer: The Killing Season

Question: Who was the director of the 2009 movie featuring Peter Outerbridge as William Easton?
Answer: Kevin Greutert

---

Follow the following format.

Context: ${context}

Question: ${question}

Reasoning: Let's think step by step in order to ${produce the answer}. We ...

Answer: ${answer}

---

Context:
[1] «The Dancing Wu Li Masters | The Dancing Wu Li Masters is a 1979 book by Gary Zukav, a popular science work exploring modern physics, and quantum phen

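### 5) Bonus 2: Multi-Hop Retrieval and Reasoning

Let's now build a simple multi-hop program, which will interleave calls to the LM and the retriever.

Please follow the **TODO** instructions below to implement this.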

In [18]:
from dsp.utils.utils import deduplicate

class MultiHop(dspy.Module):
    def __init__(self, num_passages=3):
        super().__init__()

        self.retrieve = dspy.Retrieve(k=num_passages)
        self.generate_query = dspy.ChainOfThought("question -> search_query")

        # TODO: Use a dspy.ChainOfThought module with the signature 'context, question -> search_query'.
        self.generate_query_from_context = dspy.ChainOfThought("context, question -> search_query")

        self.generate_answer = dspy.ChainOfThought("context, question -> answer")
    
    def forward(self, question):
        passages = []
        
        search_query = self.generate_query(question=question).search_query
        passages += self.retrieve(search_query).passages

        # TODO: Generate a search query using self.generate_query_from_context.
        # Note: modules require named keyword arguments (e.g., context=..., question=...).
        search_query = self.generate_query_from_context(context=passages, question=question).search_query

        # TODO: Retrieve passages with self.retrieve and append them to the list `passages`.
        passages += self.retrieve(search_query).passages

        return self.generate_answer(context=deduplicate(passages), question=question)
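
Because the two hops can return overlapping passages, the combined list is deduplicated before it is passed as context. The sketch below shows an order-preserving deduplication in plain Python. `deduplicate_sketch` is a hypothetical stand-in written for illustration and may differ from the imported `deduplicate` helper.

```python
def deduplicate_sketch(passages):
    """Drop repeated passages while keeping first-seen order."""
    seen = set()
    unique = []
    for passage in passages:
        if passage not in seen:
            seen.add(passage)
            unique.append(passage)
    return unique

hop1 = ["passage A", "passage B", "passage C"]
hop2 = ["passage B", "passage D", "passage A"]
print(deduplicate_sketch(hop1 + hop2))
# ['passage A', 'passage B', 'passage C', 'passage D']
```

Keeping the first-seen order means passages from the first hop stay ahead of those found only in the second hop.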

In [19]:
# Compile the MultiHop program with teleprompter2, using train as the training set and dev as the validation set
multihop_compiled = teleprompter2.compile(MultiHop(), trainset=train, valset=dev)

Average Metric: 3 / 13  (23.1): 100%|██████████| 13/13 [00:00<00:00, 40.91it/s]


Average Metric: 3 / 13  (23.1%)
Score: 23.08 for set: [0, 0, 0]
New best score: 23.08 for seed -3
Scores so far: [23.08]
Best score: 23.08


Average Metric: 3 / 13  (23.1): 100%|██████████| 13/13 [00:00<00:00, 53.59it/s]


Average Metric: 3 / 13  (23.1%)
Score: 23.08 for set: [7, 7, 7]
Scores so far: [23.08, 23.08]
Best score: 23.08


 57%|█████▋    | 4/7 [00:00<00:00, 11.10it/s]


Bootstrapped 2 full traces after 5 examples in round 0.


Average Metric: 6 / 13  (46.2): 100%|██████████| 13/13 [00:00<00:00, 27.22it/s]


Average Metric: 6 / 13  (46.2%)
Score: 46.15 for set: [7, 7, 7]
New best score: 46.15 for seed -1
Scores so far: [23.08, 23.08, 46.15]
Best score: 46.15
Average of max per entry across top 1 scores: 0.46153846153846156
Average of max per entry across top 2 scores: 0.46153846153846156
Average of max per entry across top 3 scores: 0.46153846153846156
Average of max per entry across top 5 scores: 0.46153846153846156
Average of max per entry across top 8 scores: 0.46153846153846156
Average of max per entry across top 9999 scores: 0.46153846153846156


 43%|████▎     | 3/7 [00:00<00:00, 15.41it/s]

Bootstrapped 2 full traces after 4 examples in round 0.



Average Metric: 5 / 13  (38.5): 100%|██████████| 13/13 [00:00<00:00, 27.45it/s]


Average Metric: 5 / 13  (38.5%)
Score: 38.46 for set: [7, 7, 7]
Scores so far: [23.08, 23.08, 46.15, 38.46]
Best score: 46.15
Average of max per entry across top 1 scores: 0.46153846153846156
Average of max per entry across top 2 scores: 0.46153846153846156
Average of max per entry across top 3 scores: 0.46153846153846156
Average of max per entry across top 5 scores: 0.46153846153846156
Average of max per entry across top 8 scores: 0.46153846153846156
Average of max per entry across top 9999 scores: 0.46153846153846156


 14%|█▍        | 1/7 [00:00<00:00, 16.50it/s]


Bootstrapped 1 full traces after 2 examples in round 0.


Average Metric: 7 / 13  (53.8): 100%|██████████| 13/13 [00:00<00:00, 37.79it/s]


Average Metric: 7 / 13  (53.8%)
Score: 53.85 for set: [7, 7, 7]
New best score: 53.85 for seed 1
Scores so far: [23.08, 23.08, 46.15, 38.46, 53.85]
Best score: 53.85
Average of max per entry across top 1 scores: 0.5384615384615384
Average of max per entry across top 2 scores: 0.6153846153846154
Average of max per entry across top 3 scores: 0.6153846153846154
Average of max per entry across top 5 scores: 0.6153846153846154
Average of max per entry across top 8 scores: 0.6153846153846154
Average of max per entry across top 9999 scores: 0.6153846153846154


 29%|██▊       | 2/7 [00:00<00:00, 17.20it/s]


Bootstrapped 1 full traces after 3 examples in round 0.


Average Metric: 8 / 13  (61.5): 100%|██████████| 13/13 [00:00<00:00, 55.08it/s]


Average Metric: 8 / 13  (61.5%)
Score: 61.54 for set: [7, 7, 7]
New best score: 61.54 for seed 2
Scores so far: [23.08, 23.08, 46.15, 38.46, 53.85, 61.54]
Best score: 61.54
Average of max per entry across top 1 scores: 0.6153846153846154
Average of max per entry across top 2 scores: 0.6153846153846154
Average of max per entry across top 3 scores: 0.6923076923076923
Average of max per entry across top 5 scores: 0.6923076923076923
Average of max per entry across top 8 scores: 0.6923076923076923
Average of max per entry across top 9999 scores: 0.6923076923076923


 43%|████▎     | 3/7 [00:00<00:00, 17.15it/s]


Bootstrapped 1 full traces after 4 examples in round 0.


Average Metric: 8 / 13  (61.5): 100%|██████████| 13/13 [00:00<00:00, 50.97it/s]


Average Metric: 8 / 13  (61.5%)
Score: 61.54 for set: [7, 7, 7]
Scores so far: [23.08, 23.08, 46.15, 38.46, 53.85, 61.54, 61.54]
Best score: 61.54
Average of max per entry across top 1 scores: 0.6153846153846154
Average of max per entry across top 2 scores: 0.6153846153846154
Average of max per entry across top 3 scores: 0.6153846153846154
Average of max per entry across top 5 scores: 0.6923076923076923
Average of max per entry across top 8 scores: 0.6923076923076923
Average of max per entry across top 9999 scores: 0.6923076923076923


 14%|█▍        | 1/7 [00:00<00:00, 11.73it/s]


Bootstrapped 1 full traces after 2 examples in round 0.


Average Metric: 5 / 13  (38.5): 100%|██████████| 13/13 [00:00<00:00, 38.16it/s]


Average Metric: 5 / 13  (38.5%)
Score: 38.46 for set: [7, 7, 7]
Scores so far: [23.08, 23.08, 46.15, 38.46, 53.85, 61.54, 61.54, 38.46]
Best score: 61.54
Average of max per entry across top 1 scores: 0.6153846153846154
Average of max per entry across top 2 scores: 0.6153846153846154
Average of max per entry across top 3 scores: 0.6153846153846154
Average of max per entry across top 5 scores: 0.6923076923076923
Average of max per entry across top 8 scores: 0.6923076923076923
Average of max per entry across top 9999 scores: 0.6923076923076923


 71%|███████▏  | 5/7 [00:00<00:00, 17.44it/s]


Bootstrapped 2 full traces after 6 examples in round 0.


Average Metric: 5 / 13  (38.5): 100%|██████████| 13/13 [00:00<00:00, 34.45it/s]


Average Metric: 5 / 13  (38.5%)
Score: 38.46 for set: [7, 7, 7]
Scores so far: [23.08, 23.08, 46.15, 38.46, 53.85, 61.54, 61.54, 38.46, 38.46]
Best score: 61.54
Average of max per entry across top 1 scores: 0.6153846153846154
Average of max per entry across top 2 scores: 0.6153846153846154
Average of max per entry across top 3 scores: 0.6153846153846154
Average of max per entry across top 5 scores: 0.6923076923076923
Average of max per entry across top 8 scores: 0.6923076923076923
Average of max per entry across top 9999 scores: 0.6923076923076923


 29%|██▊       | 2/7 [00:00<00:00, 20.30it/s]


Bootstrapped 1 full traces after 3 examples in round 0.


Average Metric: 5 / 13  (38.5): 100%|██████████| 13/13 [00:00<00:00, 56.06it/s]


Average Metric: 5 / 13  (38.5%)
Score: 38.46 for set: [7, 7, 7]
Scores so far: [23.08, 23.08, 46.15, 38.46, 53.85, 61.54, 61.54, 38.46, 38.46, 38.46]
Best score: 61.54
Average of max per entry across top 1 scores: 0.6153846153846154
Average of max per entry across top 2 scores: 0.6153846153846154
Average of max per entry across top 3 scores: 0.6153846153846154
Average of max per entry across top 5 scores: 0.6923076923076923
Average of max per entry across top 8 scores: 0.6923076923076923
Average of max per entry across top 9999 scores: 0.6923076923076923


 43%|████▎     | 3/7 [00:00<00:00, 20.01it/s]


Bootstrapped 2 full traces after 4 examples in round 0.


Average Metric: 7 / 13  (53.8): 100%|██████████| 13/13 [00:00<00:00, 26.05it/s]

Average Metric: 7 / 13  (53.8%)
Score: 53.85 for set: [7, 7, 7]
Scores so far: [23.08, 23.08, 46.15, 38.46, 53.85, 61.54, 61.54, 38.46, 38.46, 38.46, 53.85]
Best score: 61.54
Average of max per entry across top 1 scores: 0.6153846153846154
Average of max per entry across top 2 scores: 0.6153846153846154
Average of max per entry across top 3 scores: 0.6153846153846154
Average of max per entry across top 5 scores: 0.8461538461538461
Average of max per entry across top 8 scores: 0.8461538461538461
Average of max per entry across top 9999 scores: 0.8461538461538461
11 candidate programs found.
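
The "Average of max per entry across top k scores" lines can be read as an ensemble-style upper bound: for each validation example, take the best result among the k highest-scoring candidates, then average over examples. The sketch below illustrates that reading with made-up per-example scores; `avg_of_max_per_entry` is a hypothetical helper, and the numbers are not taken from the run above.

```python
def avg_of_max_per_entry(candidates, k):
    """candidates: list of (score, per_example_scores); use the k best by score."""
    top = sorted(candidates, key=lambda c: c[0], reverse=True)[:k]
    # zip(*...) walks the per-example scores column by column across the top-k candidates.
    per_example_best = [max(column) for column in zip(*(subscores for _, subscores in top))]
    return sum(per_example_best) / len(per_example_best)

# Made-up candidates: (overall score, pass/fail per validation example).
candidates = [
    (75.0, [1, 1, 1, 0]),
    (50.0, [0, 0, 1, 1]),
    (25.0, [1, 0, 0, 0]),
]
print(avg_of_max_per_entry(candidates, 1))  # 0.75: the best candidate alone
print(avg_of_max_per_entry(candidates, 2))  # 1.0: the second candidate covers the missing example
```

A value that rises with k, as in the last block of the log, indicates that different candidates solve different examples.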





In [20]:
# Evaluate the compiled MultiHop program on the dev set
evaluate_hotpot(multihop_compiled, devset=dev)

Average Metric: 8 / 13  (61.5): 100%|██████████| 13/13 [00:00<00:00, 92.27it/s]

Average Metric: 8 / 13  (61.5%)





| | question | example_answer | rationale | pred_answer | answer_exact_match |
|---|---|---|---|---|---|
| 0 | Who has a broader scope of profession: E. L. Doctorow or Julia Peterkin? | E. L. Doctorow | answer this question. We know that E. L. Doctorow is an American novelist, editor, and professor, and he has been described as one of the... | E. L. Doctorow. | ✔️ [True] |
| 1 | Right Back At It Again contains lyrics co-written by the singer born in what city? | Gainesville, Florida | answer this question. We know that Beyoncé is an American singer, songwriter, dancer, and actress, and she was born in Houston, Texas. Her album "Beyoncé"... | Houston. | ❌ [False] |
| 2 | What year was the party of the winner of the 1971 San Francisco mayoral election founded? | 1828 | answer this question. We know that the Democratic Party is one of the two major contemporary political parties in the United States, and it was... | 1828. | ✔️ [True] |
| 3 | Anthony Dirrell is the brother of which super middleweight title holder? | Andre Dirrell | answer this question. We know that Anthony Dirrell is a professional boxer who held the WBC super middleweight title from 2014 to 2015. We also... | Andre Dirrell. | ✔️ [True] |
| 4 | The sports nutrition business established by Oliver Cookson is based in which county in the UK? | Cheshire | answer this question. We know that Oliver Cookson is a UK entrepreneur who established the sports nutrition business Myprotein. We also know that Myprotein was... | Cheshire. | ✔️ [True] |
| 5 | Find the birth date of the actor who played roles in First Wives Club and Searching for the Elephant. | February 13, 1980 | answer this question. We know that the actor's name is Jo Dong-hyuk, and he was born on December 11, 1977. Therefore, the answer is December... | December 11, 1977. | ❌ [False] |
| 6 | Kyle Moran was born in the town on what river? | Castletown River | answer this question. We know that Kyle Moran is an Irish footballer who plays as a forward for Perth SC in the NPL Western Australia.... | River Dundalk. | ❌ [False] |
| 7 | The actress who played the niece in the Priest film was born in what city, country? | Surrey, England | answer this question. We know that Lily Collins is an actress, and she was born in Surrey, England. Therefore, the answer is Surrey, England. | Surrey, England. | ✔️ [True] |
| 8 | Name the movie in which the daughter of Noel Harrison plays Violet Trefusis. | Portrait of a Marriage | answer this question. We know that Cathryn Harrison is the daughter of Noel Harrison, and she is an English actress. One of her roles was... | First Daughter. | ❌ [False] |
| 9 | What year was the father of the Princes in the Tower born? | 1442 | answer this question. We know that the father of the Princes in the Tower was King Richard III of England, and he was born on... | 1452. | ❌ [False] |


61.54
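
In the table above, predictions such as `E. L. Doctorow.` and `Surrey, England.` are marked correct even though they end with a period that the gold answer lacks. That is consistent with an exact-match metric that normalizes both strings before comparing them. The sketch below shows a SQuAD-style normalization (lowercasing, stripping punctuation and English articles, collapsing whitespace). It is an assumption about how the metric behaves, not DSPy's actual implementation, and the helper names `normalize_text` and `exact_match` are hypothetical.

```python
import re
import string


def normalize_text(text):
    """SQuAD-style normalization (an assumed approximation of the metric's behavior)."""
    text = text.lower()
    # Drop ASCII punctuation such as the trailing period in "1828.".
    text = "".join(ch for ch in text if ch not in string.punctuation)
    # Remove English articles, then collapse runs of whitespace.
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, gold):
    """True when prediction and gold answer agree after normalization."""
    return normalize_text(prediction) == normalize_text(gold)


# (pred_answer, example_answer) pairs copied from the table above.
rows = [
    ("E. L. Doctorow.", "E. L. Doctorow"),
    ("Houston.", "Gainesville, Florida"),
    ("1828.", "1828"),
    ("River Dundalk.", "Castletown River"),
    ("1452.", "1442"),
]
matches = [exact_match(pred, gold) for pred, gold in rows]
print(matches)

# The reported score is the share of correct dev examples: 8 of 13.
score = round(100 * 8 / 13, 2)
print(score)
```

Exact match stays strict after normalization: `1452.` against `1442` and `River Dundalk.` against `Castletown River` still count as misses, which matches the ❌ rows in the table.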

Let's now inspect the prompt used to generate the second-hop search query for one question.

In [21]:
# Run the compiled multi-hop program on a single question.
multihop_compiled(question="Who purchased the team Michael Schumacher raced for in the 1995 Monaco Grand Prix in 2000?")
# Print one prompt from the LM's call history (n=1), skipping the two most recent prompts (skip=2).
llama.inspect_history(n=1, skip=2)





Given the fields `context`, `question`, produce the fields `search_query`.

---

Follow the following format.

Context: ${context}

Question: ${question}

Reasoning: Let's think step by step in order to ${produce the search_query}. We ...

Search Query: ${search_query}

---

Context:
[1] «The Dancing Wu Li Masters | The Dancing Wu Li Masters is a 1979 book by Gary Zukav, a popular science work exploring modern physics, and quantum phenomena in particular. It was awarded a 1980 U.S. National Book Award in category of Science. Although it explores empirical topics in modern physics research, "The Dancing Wu Li Masters" gained attention for leveraging metaphors taken from eastern spiritual movements, in particular the Huayen school of Buddhism with the monk Fazang's treatise on The Golden Lion, to explain quantum phenomena and has been regarded by some reviewers as a New Age work, although the book is mostly concerned with the work of pioneers in western physics down through the ages.