{"cells":[{"cell_type":"markdown","metadata":{},"source":["# DSPy: Tutorial @ SkyCamp\n"]},{"cell_type":"markdown","metadata":{},"source":["This notebook contains the **DSPy tutorial** for **SkyCamp 2023**.\n","\n","Let's begin by setting things up. The snippet below will install **DSPy** if needed."]},{"cell_type":"code","execution_count":1,"metadata":{},"outputs":[],"source":["%load_ext autoreload\n","%autoreload 2\n","\n","import sys\n","import os\n","\n","try: # When on Google Colab, clone the repository so that the cache is downloaded.\n","    import google.colab\n","    repo_path = 'dspy'\n","    !git -C $repo_path pull origin || git clone https://github.com/stanfordnlp/dspy $repo_path\n","except:\n","    repo_path = '.'\n","\n","if repo_path not in sys.path:\n","    sys.path.append(repo_path)\n","\n","# Set up the cache for this notebook\n","os.environ[\"DSP_NOTEBOOK_CACHEDIR\"] = os.path.join(repo_path, 'cache')\n","\n","# import pkg_resources # Install the package if it is not installed\n","# if not \"dspy-ai\" in {pkg.key for pkg in pkg_resources.working_set}:\n","#     !pip install -U pip\n","#     # !pip install dspy-ai\n","#     !pip install -e $repo_path\n","\n","import dspy"]},{"cell_type":"code","execution_count":2,"metadata":{},"outputs":[{"name":"stdout","output_type":"stream","text":["The autoreload extension is already loaded. 
To reload it, use:\n"," %reload_ext autoreload\n","./cache/compiler\n"]}],"source":["import dspy\n","from dspy.evaluate import Evaluate # evaluation harness\n","from dspy.datasets.hotpotqa import HotPotQA # HotPotQA dataset loader\n","from dspy.teleprompt import BootstrapFewShot, BootstrapFewShotWithRandomSearch, BootstrapFinetune # teleprompters (prompt optimizers)"]},{"cell_type":"markdown","metadata":{},"source":["### 1) Configure the default LM and retriever\n","\n","We'll start by setting up the language model (LM) and the retrieval model (RM). **DSPy** supports multiple API-based and local models.\n","\n","In this notebook, we will use `Llama2-13b-chat`, served through the HuggingFace TGI serving infrastructure. In principle, you can run this model on your own local GPUs, but for this tutorial all examples are pre-cached, so you don't need to worry about cost.\n","\n","We will use the retriever `ColBERTv2`. To make things easy, we've set up a ColBERTv2 server hosting a Wikipedia 2017 \"abstracts\" search index (i.e., containing the first paragraph of each article from this [2017 dump](https://hotpotqa.github.io/wiki-readme.html)), so you don't need to worry about setting one up! It's free.\n","\n","**Note:** _If you run this notebook as instructed, you don't need an API key. 
All examples are already cached internally so you can inspect them!_"]},{"cell_type":"code","execution_count":3,"metadata":{},"outputs":[],"source":["llama = dspy.HFClientTGI(model=\"meta-llama/Llama-2-13b-chat-hf\", port=[7140, 7141, 7142, 7143], max_tokens=150)\n","colbertv2 = dspy.ColBERTv2(url='http://20.102.90.50:2017/wiki17_abstracts')\n","\n","# # NOTE: After you finish this notebook, you can use GPT-3.5 like this if you like.\n","# turbo = dspy.OpenAI(model='gpt-3.5-turbo-instruct')\n","# # In that case, make sure to configure lm=turbo below if you choose to do that.\n","\n","dspy.settings.configure(rm=colbertv2, lm=llama)"]},{"cell_type":"markdown","metadata":{},"source":["### 2) Create a few question-answer pairs for our task"]},{"cell_type":"code","execution_count":4,"metadata":{},"outputs":[],"source":["# Define a small training set of (question, answer) pairs\n","train = [('Who was the director of the 2009 movie featuring Peter Outerbridge as William Easton?', 'Kevin Greutert'),\n","         ('The heir to the Du Pont family fortune sponsored what wrestling team?', 'Foxcatcher'),\n","         ('In what year was the star of To Hell and Back born?', '1925'),\n","         ('Which award did the first book of Gary Zukav receive?', 'U.S. 
National Book Award'),\n","         ('What documentary about the Gilgo Beach Killer debuted on A&E?', 'The Killing Season'),\n","         ('Which author is English: John Braine or Studs Terkel?', 'John Braine'),\n","         ('Who produced the album that included a re-recording of \"Lithium\"?', 'Butch Vig')]\n","\n","# Wrap each pair in a dspy.Example and mark 'question' as the input field\n","train = [dspy.Example(question=question, answer=answer).with_inputs('question') for question, answer in train]"]},{"cell_type":"code","execution_count":5,"metadata":{},"outputs":[],"source":["# Define a small development set of (question, answer) pairs\n","dev = [('Who has a broader scope of profession: E. L. Doctorow or Julia Peterkin?', 'E. L. Doctorow'),\n","       ('Right Back At It Again contains lyrics co-written by the singer born in what city?', 'Gainesville, Florida'),\n","       ('What year was the party of the winner of the 1971 San Francisco mayoral election founded?', '1828'),\n","       ('Anthony Dirrell is the brother of which super middleweight title holder?', 'Andre Dirrell'),\n","       ('The sports nutrition business established by Oliver Cookson is based in which county in the UK?', 'Cheshire'),\n","       ('Find the birth date of the actor who played roles in First Wives Club and Searching for the Elephant.', 'February 13, 1980'),\n","       ('Kyle Moran was born in the town on what river?', 'Castletown River'),\n","       (\"The actress who played the niece in the Priest film was born in what city, country?\", 'Surrey, England'),\n","       ('Name the movie in which the daughter of Noel Harrison plays Violet Trefusis.', 'Portrait of a Marriage'),\n","       ('What year was the father of the Princes in the Tower born?', '1442'),\n","       ('What river is near the Crichton Collegiate Church?', 'the River Tyne'),\n","       ('Who purchased the team Michael Schumacher raced for in the 1995 Monaco Grand Prix in 2000?', 'Renault'),\n","       ('André Zucca was a French photographer who worked with a German propaganda magazine published by what Nazi organization?', 'the Wehrmacht')]\n","\n","# 
Wrap each pair in a dspy.Example and mark 'question' as the input field\n","dev = [dspy.Example(question=question, answer=answer).with_inputs('question') for question, answer in dev]"]},{"cell_type":"markdown","metadata":{},"source":["### 3) Key concepts: Signatures & Modules"]},{"cell_type":"code","execution_count":6,"metadata":{},"outputs":[{"data":{"text/plain":["Prediction(\n"," answer='Berlin'\n",")"]},"execution_count":6,"metadata":{},"output_type":"execute_result"}],"source":["# Define a dspy.Predict module with the signature `question -> answer` (i.e., it takes a question and outputs an answer).\n","predict = dspy.Predict('question -> answer')\n","\n","# Use the module!\n","predict(question=\"What is the capital of Germany?\")"]},{"cell_type":"markdown","metadata":{},"source":["In the example above, we used the `dspy.Predict` module **zero-shot**, i.e., without compiling it on any examples.\n","\n","Now let's build a slightly more advanced program. Our program will use the `dspy.ChainOfThought` module, which asks the LM to think step by step.\n","\n","We will call this program `CoT`."]},{"cell_type":"code","execution_count":7,"metadata":{},"outputs":[],"source":["class CoT(dspy.Module): # define a new module\n","    def __init__(self):\n","        super().__init__()\n","\n","        # Declare the chain-of-thought sub-module so that it can be compiled later (e.g., taught a prompt)\n","        self.generate_answer = dspy.ChainOfThought('question -> answer')\n","\n","    def forward(self, question):\n","        return self.generate_answer(question=question) # use the module here"]},{"cell_type":"markdown","metadata":{},"source":["Now let's compile this using our seven `train` examples. 
We will use the very simple `BootstrapFewShot` teleprompter in DSPy."]},{"cell_type":"code","execution_count":8,"metadata":{},"outputs":[{"name":"stderr","output_type":"stream","text":["100%|██████████| 7/7 [00:00<00:00, 29.36it/s]"]},{"name":"stdout","output_type":"stream","text":["Bootstrapped 1 full traces after 7 examples in round 0.\n"]},{"name":"stderr","output_type":"stream","text":["\n"]}],"source":["# Use exact match as the evaluation metric\n","metric_EM = dspy.evaluate.answer_exact_match\n","\n","# Create a BootstrapFewShot teleprompter that scores with exact match and bootstraps at most 2 demonstrations\n","teleprompter = BootstrapFewShot(metric=metric_EM, max_bootstrapped_demos=2)\n","\n","# Compile the CoT program on the training set\n","cot_compiled = teleprompter.compile(CoT(), 
trainset=train)"]},{"cell_type":"markdown","metadata":{},"source":["Let's ask this new program a question."]},{"cell_type":"code","execution_count":9,"metadata":{},"outputs":[{"data":{"text/plain":["Prediction(\n"," rationale='determine the capital of Germany. We know that the capital of Germany is Berlin, so the answer is Berlin.',\n"," answer='Berlin'\n",")"]},"execution_count":9,"metadata":{},"output_type":"execute_result"}],"source":["# Call the compiled program with a new question\n","cot_compiled(\"What is the capital of Germany?\")"]},{"cell_type":"markdown","metadata":{},"source":["You might be curious what's going on. Let's inspect the last call to our Llama LM to see the prompt and the output."]},{"cell_type":"code","execution_count":10,"metadata":{},"outputs":[{"name":"stdout","output_type":"stream","text":["\n","\n","\n","\n","Given the fields `question`, produce the fields `answer`.\n","\n","---\n","\n","Question: Who was the director of the 2009 movie featuring Peter Outerbridge as William Easton?\n","Answer: Kevin Greutert\n","\n","Question: Which award did the first book of Gary Zukav receive?\n","Answer: U.S. National Book Award\n","\n","Question: What documentary about the Gilgo Beach Killer debuted on A&E?\n","Answer: The Killing Season\n","\n","Question: In what year was the star of To Hell and Back born?\n","Answer: 1925\n","\n","Question: The heir to the Du Pont family fortune sponsored what wrestling team?\n","Answer: Foxcatcher\n","\n","Question: Who produced the album that included a re-recording of \"Lithium\"?\n","Answer: Butch Vig\n","\n","---\n","\n","Follow the following format.\n","\n","Question: ${question}\n","Reasoning: Let's think step by step in order to ${produce the answer}. We ...\n","Answer: ${answer}\n","\n","---\n","\n","Question: Which author is English: John Braine or Studs Terkel?\n","Reasoning: Let's think step by step in order to determine which author is English. We know that John Braine is English, so we need to determine if Studs Terkel is English. 
After researching, we found that Studs Terkel was an American author, so the answer is John Braine.\n","Answer: John Braine\n","\n","---\n","\n","Question: What is the capital of Germany?\n","Reasoning: Let's think step by step in order to determine the capital of Germany. We know that the capital of Germany is Berlin, so the answer is Berlin.\n","Answer:\u001b[32m Berlin\n","\u001b[0m\n","\n","\n","\n"]}],"source":["# Show the most recent LM call (n=1): the full prompt and the completion\n","llama.inspect_history(n=1)"]},{"cell_type":"markdown","metadata":{},"source":["Notice how the prompt ends with the question we asked (\"What is the capital of Germany?\"), but before that it includes few-shot examples.\n","\n","The final example in the prompt contains a rationale (step-by-step reasoning) that the LM generated itself, for use as a demonstration, for the training question \"Which author is English: John Braine or Studs Terkel?\".\n","\n","Now, let's evaluate on our development set."]},{"cell_type":"code","execution_count":11,"metadata":{},"outputs":[],"source":["NUM_THREADS = 32\n","# Set up an evaluator over the dev set: exact-match metric, NUM_THREADS threads, a progress bar, and a results table of up to 15 rows\n","evaluate_hotpot = Evaluate(devset=dev, metric=metric_EM, num_threads=NUM_THREADS, display_progress=True, display_table=15)"]},{"cell_type":"markdown","metadata":{},"source":["First, let's evaluate the compiled `CoT` program with Llama. 
Feel free to replace `cot_compiled` below with `CoT()` (notice the parentheses) to test the zero-shot version of CoT."]},{"cell_type":"code","execution_count":12,"metadata":{},"outputs":[{"name":"stderr","output_type":"stream","text":["Average Metric: 3 / 13 (23.1): 100%|██████████| 13/13 [00:00<00:00, 117.05it/s]\n"]},{"name":"stdout","output_type":"stream","text":["Average Metric: 3 / 13 (23.1%)\n"]},{"data":{"text/html":["
|   | question | example_answer | rationale | pred_answer | answer_exact_match |
|---|---|---|---|---|---|
| 0 | Who has a broader scope of profession: E. L. Doctorow or Julia Peterkin? | E. L. Doctorow | determine who has a broader scope of profession. We know that E. L. Doctorow was a novelist, but Julia Peterkin was a journalist and a... | Julia Peterkin | ❌ [False] |
| 1 | Right Back At It Again contains lyrics co-written by the singer born in what city? | Gainesville, Florida | determine the answer. We know that the singer was born in Minneapolis, so the answer is Minneapolis. | Minneapolis | ❌ [False] |
| 2 | What year was the party of the winner of the 1971 San Francisco mayoral election founded? | 1828 | determine the year the party of the winner of the 1971 San Francisco mayoral election was founded. We know that the party was founded in... | 1971 | ❌ [False] |
| 3 | Anthony Dirrell is the brother of which super middleweight title holder? | Andre Dirrell | determine which super middleweight title holder Anthony Dirrell is the brother of. We know that Anthony Dirrell is a professional boxer, and after researching, we... | Andre Dirrell | ✔️ [True] |
| 4 | The sports nutrition business established by Oliver Cookson is based in which county in the UK? | Cheshire | determine the county in the UK where Oliver Cookson's sports nutrition business is based. We know that Oliver Cookson is a British entrepreneur, so we... | Surrey | ❌ [False] |
| 5 | Find the birth date of the actor who played roles in First Wives Club and Searching for the Elephant. | February 13, 1980 | determine the birth date of the actor. We know that the actor was born in the 1950s, so we need to narrow down the possible... | August 12, 1955 | ❌ [False] |
| 6 | Kyle Moran was born in the town on what river? | Castletown River | determine where Kyle Moran was born. We know that Kyle Moran was born in the town on the Delaware River. | Delaware River | ❌ [False] |
| 7 | The actress who played the niece in the Priest film was born in what city, country? | Surrey, England | determine the answer. We know that the actress was born in a city, so we need to determine the country. After researching, we found that... | Los Angeles, California | ❌ [False] |
| 8 | Name the movie in which the daughter of Noel Harrison plays Violet Trefusis. | Portrait of a Marriage | determine the name of the movie. We know that the daughter of Noel Harrison plays Violet Trefusis, so we need to determine the name of... | The Remains of the Day | ❌ [False] |
| 9 | What year was the father of the Princes in the Tower born? | 1442 | determine the answer. We know that the father of the Princes in the Tower was born before 1483, so we need to find the correct... | 1452 | ❌ [False] |
| 10 | What river is near the Crichton Collegiate Church? | the River Tyne | determine what river is near the Crichton Collegiate Church. We know that the church is located in Scotland, so we need to determine which river... | River Tyne | ✔️ [True] |
| 11 | Who purchased the team Michael Schumacher raced for in the 1995 Monaco Grand Prix in 2000? | Renault | determine who purchased the team. We know that Michael Schumacher raced for the Benetton team in the 1995 Monaco Grand Prix, so we need to... | Renault | ✔️ [True] |
| 12 | André Zucca was a French photographer who worked with a German propaganda magazine published by what Nazi organization? | the Wehrmacht | determine which Nazi organization André Zucca worked with. We know that he worked with a German propaganda magazine, so we need to determine which Nazi... | SS | ❌ [False] |
\n"],"text/plain":[""]},"metadata":{},"output_type":"display_data"},{"data":{"text/plain":["23.08"]},"execution_count":12,"metadata":{},"output_type":"execute_result"}],"source":["# Evaluate the compiled CoT program on the HotPotQA dev set\n","evaluate_hotpot(cot_compiled)"]},{"cell_type":"markdown","metadata":{},"source":["### 4) Bonus 1: RAG with query generation\n","\n","As a bonus, let's define a more sophisticated program called `RAG`. This program will:\n","\n","- Use the LM to generate a search query based on the input question\n","- Retrieve three passages using our retriever\n","- Use the LM to generate a final answer using these passages"]},{"cell_type":"code","execution_count":13,"metadata":{},"outputs":[],"source":["class RAG(dspy.Module):\n","    def __init__(self, num_passages=3):\n","        super().__init__()\n","\n","        # Declare three modules: the retriever, a query generator, and an answer generator\n","        self.retrieve = dspy.Retrieve(k=num_passages)\n","        self.generate_query = dspy.ChainOfThought(\"question -> search_query\")\n","        self.generate_answer = dspy.ChainOfThought(\"context, question -> answer\")\n","\n","    def forward(self, question):\n","        # Generate a search query from the question, and use it to retrieve passages\n","        search_query = self.generate_query(question=question).search_query\n","        passages = self.retrieve(search_query).passages\n","\n","        # Generate an answer from the passages and the question\n","        return self.generate_answer(context=passages, question=question)"]},{"cell_type":"markdown","metadata":{},"source":["Just out of curiosity, we can evaluate the **uncompiled** (or **zero-shot**) version of this program."]},{"cell_type":"code","execution_count":14,"metadata":{},"outputs":[{"name":"stderr","output_type":"stream","text":["Average Metric: 3 / 13 (23.1): 100%|██████████| 13/13 [00:00<00:00, 45.09it/s]"]},{"name":"stdout","output_type":"stream","text":["Average Metric: 3 / 13 (23.1%)\n"]},{"name":"stderr","output_type":"stream","text":["\n"]},{"data":{"text/plain":["23.08"]},"execution_count":14,"metadata":{},"output_type":"execute_result"}],"source":["# Evaluate the uncompiled RAG program defined above; display_table=0 suppresses the results table\n","evaluate_hotpot(RAG(), 
display_table=0)"]},{"cell_type":"markdown","metadata":{},"source":["Let's now compile this RAG program. This time we'll use a slightly more advanced teleprompter (automatic prompt optimizer), which relies on random search."]},{"cell_type":"code","execution_count":15,"metadata":{},"outputs":[{"name":"stdout","output_type":"stream","text":["Going to sample between 1 and 2 traces per predictor.\n","Will attempt to train 8 candidate sets.\n"]},{"name":"stderr","output_type":"stream","text":["Average Metric: 3 / 13 (23.1): 100%|██████████| 13/13 [00:00<00:00, 155.65it/s]\n"]},{"name":"stdout","output_type":"stream","text":["Average Metric: 3 / 13 (23.1%)\n","Score: 23.08 for set: [0, 0]\n","New best score: 23.08 for seed -3\n","Scores so far: [23.08]\n","Best score: 23.08\n"]},{"name":"stderr","output_type":"stream","text":["Average Metric: 3 / 13 (23.1): 100%|██████████| 13/13 [00:00<00:00, 72.77it/s]\n"]},{"name":"stdout","output_type":"stream","text":["Average Metric: 3 / 13 (23.1%)\n","Score: 23.08 for set: [7, 7]\n","Scores so far: [23.08, 23.08]\n","Best score: 23.08\n"]},{"name":"stderr","output_type":"stream","text":[" 86%|████████▌ | 6/7 [00:00<00:00, 13.07it/s]\n"]},{"name":"stdout","output_type":"stream","text":["Bootstrapped 2 full traces after 7 examples in round 0.\n"]},{"name":"stderr","output_type":"stream","text":["Average Metric: 5 / 13 (38.5): 100%|██████████| 13/13 [00:00<00:00, 45.43it/s]\n"]},{"name":"stdout","output_type":"stream","text":["Average Metric: 5 / 13 (38.5%)\n","Score: 38.46 for set: [7, 7]\n","New best score: 38.46 for seed -1\n","Scores so far: [23.08, 23.08, 38.46]\n","Best score: 38.46\n","Average of max per entry across top 1 scores: 0.38461538461538464\n","Average of max per entry across top 2 scores: 0.46153846153846156\n","Average of max per entry across top 3 scores: 0.46153846153846156\n","Average of max per entry across top 5 scores: 0.46153846153846156\n","Average of max per entry across top 8 scores: 0.46153846153846156\n","Average of max per entry across top 9999 scores: 
0.46153846153846156\n"]},{"name":"stderr","output_type":"stream","text":["100%|██████████| 7/7 [00:00<00:00, 19.01it/s]\n"]},{"name":"stdout","output_type":"stream","text":["Bootstrapped 1 full traces after 7 examples in round 0.\n"]},{"name":"stderr","output_type":"stream","text":["Average Metric: 6 / 13 (46.2): 100%|██████████| 13/13 [00:00<00:00, 42.01it/s]\n"]},{"name":"stdout","output_type":"stream","text":["Average Metric: 6 / 13 (46.2%)\n","Score: 46.15 for set: [7, 7]\n","New best score: 46.15 for seed 0\n","Scores so far: [23.08, 23.08, 38.46, 46.15]\n","Best score: 46.15\n","Average of max per entry across top 1 scores: 0.46153846153846156\n","Average of max per entry across top 2 scores: 0.5384615384615384\n","Average of max per entry across top 3 scores: 0.5384615384615384\n","Average of max per entry across top 5 scores: 0.5384615384615384\n","Average of max per entry across top 8 scores: 0.5384615384615384\n","Average of max per entry across top 9999 scores: 0.5384615384615384\n"]},{"name":"stderr","output_type":"stream","text":[" 14%|█▍ | 1/7 [00:00<00:00, 21.10it/s]\n"]},{"name":"stdout","output_type":"stream","text":["Bootstrapped 1 full traces after 2 examples in round 0.\n"]},{"name":"stderr","output_type":"stream","text":["Average Metric: 4 / 13 (30.8): 100%|██████████| 13/13 [00:00<00:00, 68.72it/s]\n"]},{"name":"stdout","output_type":"stream","text":["Average Metric: 4 / 13 (30.8%)\n","Score: 30.77 for set: [7, 7]\n","Scores so far: [23.08, 23.08, 38.46, 46.15, 30.77]\n","Best score: 46.15\n","Average of max per entry across top 1 scores: 0.46153846153846156\n","Average of max per entry across top 2 scores: 0.5384615384615384\n","Average of max per entry across top 3 scores: 0.5384615384615384\n","Average of max per entry across top 5 scores: 0.5384615384615384\n","Average of max per entry across top 8 scores: 0.5384615384615384\n","Average of max per entry across top 9999 scores: 
0.5384615384615384\n"]},{"name":"stderr","output_type":"stream","text":[" 29%|██▊ | 2/7 [00:00<00:00, 21.89it/s]\n"]},{"name":"stdout","output_type":"stream","text":["Bootstrapped 1 full traces after 3 examples in round 0.\n"]},{"name":"stderr","output_type":"stream","text":["Average Metric: 4 / 13 (30.8): 100%|██████████| 13/13 [00:00<00:00, 67.99it/s]\n"]},{"name":"stdout","output_type":"stream","text":["Average Metric: 4 / 13 (30.8%)\n","Score: 30.77 for set: [7, 7]\n","Scores so far: [23.08, 23.08, 38.46, 46.15, 30.77, 30.77]\n","Best score: 46.15\n","Average of max per entry across top 1 scores: 0.46153846153846156\n","Average of max per entry across top 2 scores: 0.5384615384615384\n","Average of max per entry across top 3 scores: 0.5384615384615384\n","Average of max per entry across top 5 scores: 0.5384615384615384\n","Average of max per entry across top 8 scores: 0.5384615384615384\n","Average of max per entry across top 9999 scores: 0.5384615384615384\n"]},{"name":"stderr","output_type":"stream","text":[" 43%|████▎ | 3/7 [00:00<00:00, 21.61it/s]\n"]},{"name":"stdout","output_type":"stream","text":["Bootstrapped 1 full traces after 4 examples in round 0.\n"]},{"name":"stderr","output_type":"stream","text":["Average Metric: 3 / 13 (23.1): 100%|██████████| 13/13 [00:00<00:00, 61.97it/s]\n"]},{"name":"stdout","output_type":"stream","text":["Average Metric: 3 / 13 (23.1%)\n","Score: 23.08 for set: [7, 7]\n","Scores so far: [23.08, 23.08, 38.46, 46.15, 30.77, 30.77, 23.08]\n","Best score: 46.15\n","Average of max per entry across top 1 scores: 0.46153846153846156\n","Average of max per entry across top 2 scores: 0.5384615384615384\n","Average of max per entry across top 3 scores: 0.5384615384615384\n","Average of max per entry across top 5 scores: 0.5384615384615384\n","Average of max per entry across top 8 scores: 0.5384615384615384\n","Average of max per entry across top 9999 scores: 0.5384615384615384\n"]},{"name":"stderr","output_type":"stream","text":[" 
14%|█▍ | 1/7 [00:00<00:00, 21.15it/s]\n"]},{"name":"stdout","output_type":"stream","text":["Bootstrapped 1 full traces after 2 examples in round 0.\n"]},{"name":"stderr","output_type":"stream","text":["Average Metric: 5 / 13 (38.5): 100%|██████████| 13/13 [00:00<00:00, 44.95it/s]\n"]},{"name":"stdout","output_type":"stream","text":["Average Metric: 5 / 13 (38.5%)\n","Score: 38.46 for set: [7, 7]\n","Scores so far: [23.08, 23.08, 38.46, 46.15, 30.77, 30.77, 23.08, 38.46]\n","Best score: 46.15\n","Average of max per entry across top 1 scores: 0.46153846153846156\n","Average of max per entry across top 2 scores: 0.5384615384615384\n","Average of max per entry across top 3 scores: 0.5384615384615384\n","Average of max per entry across top 5 scores: 0.5384615384615384\n","Average of max per entry across top 8 scores: 0.5384615384615384\n","Average of max per entry across top 9999 scores: 0.5384615384615384\n"]},{"name":"stderr","output_type":"stream","text":["100%|██████████| 7/7 [00:00<00:00, 22.62it/s]\n"]},{"name":"stdout","output_type":"stream","text":["Bootstrapped 1 full traces after 7 examples in round 0.\n"]},{"name":"stderr","output_type":"stream","text":["Average Metric: 4 / 13 (30.8): 100%|██████████| 13/13 [00:00<00:00, 66.59it/s]\n"]},{"name":"stdout","output_type":"stream","text":["Average Metric: 4 / 13 (30.8%)\n","Score: 30.77 for set: [7, 7]\n","Scores so far: [23.08, 23.08, 38.46, 46.15, 30.77, 30.77, 23.08, 38.46, 30.77]\n","Best score: 46.15\n","Average of max per entry across top 1 scores: 0.46153846153846156\n","Average of max per entry across top 2 scores: 0.5384615384615384\n","Average of max per entry across top 3 scores: 0.5384615384615384\n","Average of max per entry across top 5 scores: 0.5384615384615384\n","Average of max per entry across top 8 scores: 0.5384615384615384\n","Average of max per entry across top 9999 scores: 0.5384615384615384\n"]},{"name":"stderr","output_type":"stream","text":[" 57%|█████▋ | 4/7 [00:00<00:00, 
23.46it/s]\n"]},{"name":"stdout","output_type":"stream","text":["Bootstrapped 1 full traces after 5 examples in round 0.\n"]},{"name":"stderr","output_type":"stream","text":["Average Metric: 4 / 13 (30.8): 100%|██████████| 13/13 [00:00<00:00, 68.29it/s]\n"]},{"name":"stdout","output_type":"stream","text":["Average Metric: 4 / 13 (30.8%)\n","Score: 30.77 for set: [7, 7]\n","Scores so far: [23.08, 23.08, 38.46, 46.15, 30.77, 30.77, 23.08, 38.46, 30.77, 30.77]\n","Best score: 46.15\n","Average of max per entry across top 1 scores: 0.46153846153846156\n","Average of max per entry across top 2 scores: 0.5384615384615384\n","Average of max per entry across top 3 scores: 0.5384615384615384\n","Average of max per entry across top 5 scores: 0.5384615384615384\n","Average of max per entry across top 8 scores: 0.5384615384615384\n","Average of max per entry across top 9999 scores: 0.5384615384615384\n"]},{"name":"stderr","output_type":"stream","text":["100%|██████████| 7/7 [00:00<00:00, 20.87it/s]\n"]},{"name":"stdout","output_type":"stream","text":["Bootstrapped 1 full traces after 7 examples in round 0.\n"]},{"name":"stderr","output_type":"stream","text":["Average Metric: 4 / 13 (30.8): 100%|██████████| 13/13 [00:00<00:00, 70.76it/s]"]},{"name":"stdout","output_type":"stream","text":["Average Metric: 4 / 13 (30.8%)\n","Score: 30.77 for set: [7, 7]\n","Scores so far: [23.08, 23.08, 38.46, 46.15, 30.77, 30.77, 23.08, 38.46, 30.77, 30.77, 30.77]\n","Best score: 46.15\n","Average of max per entry across top 1 scores: 0.46153846153846156\n","Average of max per entry across top 2 scores: 0.5384615384615384\n","Average of max per entry across top 3 scores: 0.5384615384615384\n","Average of max per entry across top 5 scores: 0.5384615384615384\n","Average of max per entry across top 8 scores: 0.5384615384615384\n","Average of max per entry across top 9999 scores: 0.5384615384615384\n","11 candidate programs 
found.\n"]},{"name":"stderr","output_type":"stream","text":["\n"]}],"source":["# Create a BootstrapFewShotWithRandomSearch teleprompter: exact-match metric, at most 2 bootstrapped demos, 8 candidate programs, NUM_THREADS threads\n","teleprompter2 = BootstrapFewShotWithRandomSearch(metric=metric_EM, max_bootstrapped_demos=2, num_candidate_programs=8, num_threads=NUM_THREADS)\n","\n","# Compile the RAG program with teleprompter2, using train as the training set and dev as the validation set\n","rag_compiled = teleprompter2.compile(RAG(), trainset=train, valset=dev)"]},{"cell_type":"markdown","metadata":{},"source":["Let's now evaluate this compiled version of RAG."]},{"cell_type":"code","execution_count":16,"metadata":{},"outputs":[{"name":"stderr","output_type":"stream","text":["Average Metric: 6 / 13 (46.2): 100%|██████████| 13/13 [00:00<00:00, 137.18it/s]"]},{"name":"stdout","output_type":"stream","text":["Average Metric: 6 / 13 (46.2%)\n"]},{"name":"stderr","output_type":"stream","text":["\n"]},{"data":{"text/html":["
|   | question | example_answer | rationale | pred_answer | answer_exact_match |
|---|---|---|---|---|---|
| 0 | Who has a broader scope of profession: E. L. Doctorow or Julia Peterkin? | E. L. Doctorow | answer this question. We know that E. L. Doctorow and Julia Peterkin are both authors, but we also know that E. L. Doctorow is known... | E. L. Doctorow. | ✔️ [True] |
| 1 | Right Back At It Again contains lyrics co-written by the singer born in what city? | Gainesville, Florida | answer this question. We know that Beyoncé is the singer who co-wrote the lyrics of \"Right Back At It Again\". We also know that Beyoncé... | Houston. | ❌ [False] |
| 2 | What year was the party of the winner of the 1971 San Francisco mayoral election founded? | 1828 | answer this question. We know that the winner of the 1971 San Francisco mayoral election was a member of the Democratic Party. We also know... | 1828. | ✔️ [True] |
| 3 | Anthony Dirrell is the brother of which super middleweight title holder? | Andre Dirrell | answer this question. We know that Anthony Dirrell is a professional boxer. We also know that he held the WBC super middleweight title from 2014... | Andre Dirrell. | ✔️ [True] |
| 4 | The sports nutrition business established by Oliver Cookson is based in which county in the UK? | Cheshire | answer this question. We know that Oliver Cookson established Myprotein, a sports nutrition business. We also know that Myprotein was sold for a reported £58... | Cheshire. | ✔️ [True] |
| 5 | Find the birth date of the actor who played roles in First Wives Club and Searching for the Elephant. | February 13, 1980 | answer this question. We know that the actor played roles in \"First Wives Club\" and \"Searching for the Elephant\". We also know that the actor's... | October 17, 1976. | ❌ [False] |
| 6 | Kyle Moran was born in the town on what river? | Castletown River | answer this question. We know that Kyle Moran is an actor who was born in Livingston. We also know that Livingston is a town in... | River Forth. | ❌ [False] |
| 7 | The actress who played the niece in the Priest film was born in what city, country? | Surrey, England | answer this question. We know that Lily Collins is an actress and the daughter of Phil Collins. We also know that she was born in... | Surrey, England. | ✔️ [True] |
| 8 | Name the movie in which the daughter of Noel Harrison plays Violet Trefusis. | Portrait of a Marriage | answer this question. We know that Noel Harrison is the father of Dhani Harrison. We also know that Dhani Harrison is a member of the... | The daughter of Noel Harrison plays Violet Trefusis in the movie \"The Killing Season\". | ❌ [False] |
| 9 | What year was the father of the Princes in the Tower born? | 1442 | answer this question. We know that the father of the Princes in the Tower was King Richard III of England. We also know that he... | 1452. | ❌ [False] |
| 10 | What river is near the Crichton Collegiate Church? | the River Tyne | answer this question. We know that Crichton Collegiate Church is situated in Midlothian, Scotland. We also know that the church is near the hamlet of... | River Tyne. | ✔️ [True] |
| 11 | Who purchased the team Michael Schumacher raced for in the 1995 Monaco Grand Prix in 2000? | Renault | answer this question. We know that Michael Schumacher raced for the Benetton team in the 1995 Monaco Grand Prix. We also know that Gilberto Benetton... | Gilberto Benetton. | ❌ [False] |
| 12 | André Zucca was a French photographer who worked with a German propaganda magazine published by what Nazi organization? | the Wehrmacht | answer this question. We know that André Zucca was a French photographer who worked with a German propaganda magazine. We also know that the magazine... | Nazi organization. | ❌ [False] |
\n"],"text/plain":[""]},"metadata":{},"output_type":"display_data"},{"data":{"text/plain":["46.15"]},"execution_count":16,"metadata":{},"output_type":"execute_result"}],"source":["# Evaluate the compiled RAG program on the dev set\n","evaluate_hotpot(rag_compiled)"]},{"cell_type":"markdown","metadata":{},"source":["Let's inspect one of the LM calls for this program. Focus in particular on the structure of the last few input/output examples in the prompt."]},{"cell_type":"code","execution_count":17,"metadata":{},"outputs":[{"name":"stdout","output_type":"stream","text":["\n","\n","\n","\n","Given the fields `context`, `question`, produce the fields `answer`.\n","\n","---\n","\n","Question: Which author is English: John Braine or Studs Terkel?\n","Answer: John Braine\n","\n","Question: The heir to the Du Pont family fortune sponsored what wrestling team?\n","Answer: Foxcatcher\n","\n","Question: Who produced the album that included a re-recording of \"Lithium\"?\n","Answer: Butch Vig\n","\n","Question: In what year was the star of To Hell and Back born?\n","Answer: 1925\n","\n","Question: What documentary about the Gilgo Beach Killer debuted on A&E?\n","Answer: The Killing Season\n","\n","Question: Who was the director of the 2009 movie featuring Peter Outerbridge as William Easton?\n","Answer: Kevin Greutert\n","\n","---\n","\n","Follow the following format.\n","\n","Context: ${context}\n","\n","Question: ${question}\n","\n","Reasoning: Let's think step by step in order to ${produce the answer}. We ...\n","\n","Answer: ${answer}\n","\n","---\n","\n","Context:\n","[1] «The Dancing Wu Li Masters | The Dancing Wu Li Masters is a 1979 book by Gary Zukav, a popular science work exploring modern physics, and quantum phenomena in particular. It was awarded a 1980 U.S. National Book Award in category of Science. 
Although it explores empirical topics in modern physics research, \"The Dancing Wu Li Masters\" gained attention for leveraging metaphors taken from eastern spiritual movements, in particular the Huayen school of Buddhism with the monk Fazang's treatise on The Golden Lion, to explain quantum phenomena and has been regarded by some reviewers as a New Age work, although the book is mostly concerned with the work of pioneers in western physics down through the ages.»\n","[2] «Gary Zukav | Gary Zukav (born October 17, 1942) is an American spiritual teacher and the author of four consecutive New York Times Best Sellers. Beginning in 1998, he appeared more than 30 times on \"The Oprah Winfrey Show\" to discuss transformation in human consciousness concepts presented in his book \"The Seat of the Soul\". His first book, \"The Dancing Wu Li Masters\" (1979), won a U.S. National Book Award.»\n","[3] «Li Junfeng | Master Li Junfeng (born October 13, 1938 in Gaocheng, Hebei) is a qigong master, the founder of Sheng Zhen Qigong, and a world-renowned wushu coach. He has also starred-in and choreographed several Chinese martial arts films.»\n","\n","Question: Which award did the first book of Gary Zukav receive?\n","\n","Reasoning: Let's think step by step in order to answer this question. We know that Gary Zukav is the author of \"The Dancing Wu Li Masters\". We also know that this book was awarded a U.S. National Book Award in category of Science. Therefore, the answer is the U.S. National Book Award.\n","\n","Answer: U.S. National Book Award.\n","\n","---\n","\n","Context:\n","[1] «Democratic Party (United States) | The Democratic Party is one of the two major contemporary political parties in the United States, along with the Republican Party. 
Tracing its heritage back to Thomas Jefferson and James Madison's Democratic-Republican Party, the modern-day Democratic Party was founded around 1828 by supporters of Andrew Jackson, making it the world's oldest political party.»\n","[2] «Democratic Party (South Korea, 2008) | The Democratic Party (Hangul: 민주당 hanja: 民主黨 ) was a liberal political party in South Korea. Since its foundation in 2008, it was the main opposition party in the 18th Assembly. In late 2011, it merged into the Democratic United Party.»\n","[3] «Democrat Party (Turkey, current) | The Democratic Party (Turkish: \"Demokrat Parti\" ), abbreviated to DP, is a centre-right, conservative Turkish political party, established by Süleyman Demirel in 1983 as the True Path Party (Turkish: \"Doğru Yol Partisi\" or DYP). It succeeded the historical Democratic Party and the Justice Party, two parties with similar ideologies.»\n","\n","Question: What year was the party of the winner of the 1971 San Francisco mayoral election founded?\n","\n","Reasoning: Let's think step by step in order to answer this question. We know that the winner of the 1971 San Francisco mayoral election was a member of the Democratic Party. We also know that the Democratic Party was founded around 1828. 
Therefore, the answer is 1828.\n","\n","Answer:\u001b[32m 1828.\n","\u001b[0m\n","\n","\n","\n"]}],"source":["# Call the compiled RAG program (rag_compiled) on the question below\n","rag_compiled(\"What year was the party of the winner of the 1971 San Francisco mayoral election founded?\")\n","\n","# Call llama.inspect_history to view the most recent LM call (n=1)\n","llama.inspect_history(n=1)"]},{"cell_type":"markdown","metadata":{},"source":["### 4) Bonus 2: Multi-Hop Retrieval and Reasoning"]},{"cell_type":"markdown","metadata":{},"source":["Let's now build a simple multi-hop program that interleaves calls to the language model (LM) and the retriever.\n","\n","Follow the **TODO** instructions below to implement it."]},{"cell_type":"code","execution_count":18,"metadata":{},"outputs":[],"source":["from dsp.utils.utils import deduplicate\n","\n","class MultiHop(dspy.Module):\n","    def __init__(self, num_passages=3):\n","        super().__init__()\n","\n","        self.retrieve = dspy.Retrieve(k=num_passages)\n","        self.generate_query = dspy.ChainOfThought(\"question -> search_query\")\n","\n","        # TODO: Define a dspy.ChainOfThought module with the signature 'context, question -> search_query'.\n","        self.generate_query_from_context = dspy.ChainOfThought(\"context, question -> search_query\")\n","\n","        self.generate_answer = dspy.ChainOfThought(\"context, question -> answer\")\n","\n","    def forward(self, question):\n","        passages = []\n","\n","        search_query = self.generate_query(question=question).search_query\n","        passages += self.retrieve(search_query).passages\n","\n","        # TODO: Use self.generate_query_from_context to generate a search query.\n","        # Note: Modules require named keyword arguments (e.g., context=..., question=...).\n","        search_query = self.generate_query_from_context(context=passages, question=question).search_query\n","\n","        # TODO: Use self.retrieve to retrieve passages. Append them to the list `passages`.\n","        passages += self.retrieve(search_query).passages\n","\n","        return self.generate_answer(context=deduplicate(passages), question=question)"]},{"cell_type":"code","execution_count":19,"metadata":{},"outputs":[{"name":"stderr","output_type":"stream","text":["Average Metric: 3 / 13 (23.1): 100%|██████████| 13/13 [00:00<00:00, 
40.91it/s]\n"]},{"name":"stdout","output_type":"stream","text":["Average Metric: 3 / 13 (23.1%)\n","Score: 23.08 for set: [0, 0, 0]\n","New best score: 23.08 for seed -3\n","Scores so far: [23.08]\n","Best score: 23.08\n"]},{"name":"stderr","output_type":"stream","text":["Average Metric: 3 / 13 (23.1): 100%|██████████| 13/13 [00:00<00:00, 53.59it/s]\n"]},{"name":"stdout","output_type":"stream","text":["Average Metric: 3 / 13 (23.1%)\n","Score: 23.08 for set: [7, 7, 7]\n","Scores so far: [23.08, 23.08]\n","Best score: 23.08\n"]},{"name":"stderr","output_type":"stream","text":[" 57%|█████▋ | 4/7 [00:00<00:00, 11.10it/s]\n"]},{"name":"stdout","output_type":"stream","text":["Bootstrapped 2 full traces after 5 examples in round 0.\n"]},{"name":"stderr","output_type":"stream","text":["Average Metric: 6 / 13 (46.2): 100%|██████████| 13/13 [00:00<00:00, 27.22it/s]\n"]},{"name":"stdout","output_type":"stream","text":["Average Metric: 6 / 13 (46.2%)\n","Score: 46.15 for set: [7, 7, 7]\n","New best score: 46.15 for seed -1\n","Scores so far: [23.08, 23.08, 46.15]\n","Best score: 46.15\n","Average of max per entry across top 1 scores: 0.46153846153846156\n","Average of max per entry across top 2 scores: 0.46153846153846156\n","Average of max per entry across top 3 scores: 0.46153846153846156\n","Average of max per entry across top 5 scores: 0.46153846153846156\n","Average of max per entry across top 8 scores: 0.46153846153846156\n","Average of max per entry across top 9999 scores: 0.46153846153846156\n"]},{"name":"stderr","output_type":"stream","text":[" 43%|████▎ | 3/7 [00:00<00:00, 15.41it/s]"]},{"name":"stdout","output_type":"stream","text":["Bootstrapped 2 full traces after 4 examples in round 0.\n"]},{"name":"stderr","output_type":"stream","text":["\n","Average Metric: 5 / 13 (38.5): 100%|██████████| 13/13 [00:00<00:00, 27.45it/s]\n"]},{"name":"stdout","output_type":"stream","text":["Average Metric: 5 / 13 (38.5%)\n","Score: 38.46 for set: [7, 7, 7]\n","Scores so far: 
[23.08, 23.08, 46.15, 38.46]\n","Best score: 46.15\n","Average of max per entry across top 1 scores: 0.46153846153846156\n","Average of max per entry across top 2 scores: 0.46153846153846156\n","Average of max per entry across top 3 scores: 0.46153846153846156\n","Average of max per entry across top 5 scores: 0.46153846153846156\n","Average of max per entry across top 8 scores: 0.46153846153846156\n","Average of max per entry across top 9999 scores: 0.46153846153846156\n"]},{"name":"stderr","output_type":"stream","text":[" 14%|█▍ | 1/7 [00:00<00:00, 16.50it/s]\n"]},{"name":"stdout","output_type":"stream","text":["Bootstrapped 1 full traces after 2 examples in round 0.\n"]},{"name":"stderr","output_type":"stream","text":["Average Metric: 7 / 13 (53.8): 100%|██████████| 13/13 [00:00<00:00, 37.79it/s]\n"]},{"name":"stdout","output_type":"stream","text":["Average Metric: 7 / 13 (53.8%)\n","Score: 53.85 for set: [7, 7, 7]\n","New best score: 53.85 for seed 1\n","Scores so far: [23.08, 23.08, 46.15, 38.46, 53.85]\n","Best score: 53.85\n","Average of max per entry across top 1 scores: 0.5384615384615384\n","Average of max per entry across top 2 scores: 0.6153846153846154\n","Average of max per entry across top 3 scores: 0.6153846153846154\n","Average of max per entry across top 5 scores: 0.6153846153846154\n","Average of max per entry across top 8 scores: 0.6153846153846154\n","Average of max per entry across top 9999 scores: 0.6153846153846154\n"]},{"name":"stderr","output_type":"stream","text":[" 29%|██▊ | 2/7 [00:00<00:00, 17.20it/s]\n"]},{"name":"stdout","output_type":"stream","text":["Bootstrapped 1 full traces after 3 examples in round 0.\n"]},{"name":"stderr","output_type":"stream","text":["Average Metric: 8 / 13 (61.5): 100%|██████████| 13/13 [00:00<00:00, 55.08it/s]\n"]},{"name":"stdout","output_type":"stream","text":["Average Metric: 8 / 13 (61.5%)\n","Score: 61.54 for set: [7, 7, 7]\n","New best score: 61.54 for seed 2\n","Scores so far: [23.08, 23.08, 46.15, 
38.46, 53.85, 61.54]\n","Best score: 61.54\n","Average of max per entry across top 1 scores: 0.6153846153846154\n","Average of max per entry across top 2 scores: 0.6153846153846154\n","Average of max per entry across top 3 scores: 0.6923076923076923\n","Average of max per entry across top 5 scores: 0.6923076923076923\n","Average of max per entry across top 8 scores: 0.6923076923076923\n","Average of max per entry across top 9999 scores: 0.6923076923076923\n"]},{"name":"stderr","output_type":"stream","text":[" 43%|████▎ | 3/7 [00:00<00:00, 17.15it/s]\n"]},{"name":"stdout","output_type":"stream","text":["Bootstrapped 1 full traces after 4 examples in round 0.\n"]},{"name":"stderr","output_type":"stream","text":["Average Metric: 8 / 13 (61.5): 100%|██████████| 13/13 [00:00<00:00, 50.97it/s]\n"]},{"name":"stdout","output_type":"stream","text":["Average Metric: 8 / 13 (61.5%)\n","Score: 61.54 for set: [7, 7, 7]\n","Scores so far: [23.08, 23.08, 46.15, 38.46, 53.85, 61.54, 61.54]\n","Best score: 61.54\n","Average of max per entry across top 1 scores: 0.6153846153846154\n","Average of max per entry across top 2 scores: 0.6153846153846154\n","Average of max per entry across top 3 scores: 0.6153846153846154\n","Average of max per entry across top 5 scores: 0.6923076923076923\n","Average of max per entry across top 8 scores: 0.6923076923076923\n","Average of max per entry across top 9999 scores: 0.6923076923076923\n"]},{"name":"stderr","output_type":"stream","text":[" 14%|█▍ | 1/7 [00:00<00:00, 11.73it/s]\n"]},{"name":"stdout","output_type":"stream","text":["Bootstrapped 1 full traces after 2 examples in round 0.\n"]},{"name":"stderr","output_type":"stream","text":["Average Metric: 5 / 13 (38.5): 100%|██████████| 13/13 [00:00<00:00, 38.16it/s]\n"]},{"name":"stdout","output_type":"stream","text":["Average Metric: 5 / 13 (38.5%)\n","Score: 38.46 for set: [7, 7, 7]\n","Scores so far: [23.08, 23.08, 46.15, 38.46, 53.85, 61.54, 61.54, 38.46]\n","Best score: 61.54\n","Average of 
max per entry across top 1 scores: 0.6153846153846154\n","Average of max per entry across top 2 scores: 0.6153846153846154\n","Average of max per entry across top 3 scores: 0.6153846153846154\n","Average of max per entry across top 5 scores: 0.6923076923076923\n","Average of max per entry across top 8 scores: 0.6923076923076923\n","Average of max per entry across top 9999 scores: 0.6923076923076923\n"]},{"name":"stderr","output_type":"stream","text":[" 71%|███████▏ | 5/7 [00:00<00:00, 17.44it/s]\n"]},{"name":"stdout","output_type":"stream","text":["Bootstrapped 2 full traces after 6 examples in round 0.\n"]},{"name":"stderr","output_type":"stream","text":["Average Metric: 5 / 13 (38.5): 100%|██████████| 13/13 [00:00<00:00, 34.45it/s]\n"]},{"name":"stdout","output_type":"stream","text":["Average Metric: 5 / 13 (38.5%)\n","Score: 38.46 for set: [7, 7, 7]\n","Scores so far: [23.08, 23.08, 46.15, 38.46, 53.85, 61.54, 61.54, 38.46, 38.46]\n","Best score: 61.54\n","Average of max per entry across top 1 scores: 0.6153846153846154\n","Average of max per entry across top 2 scores: 0.6153846153846154\n","Average of max per entry across top 3 scores: 0.6153846153846154\n","Average of max per entry across top 5 scores: 0.6923076923076923\n","Average of max per entry across top 8 scores: 0.6923076923076923\n","Average of max per entry across top 9999 scores: 0.6923076923076923\n"]},{"name":"stderr","output_type":"stream","text":[" 29%|██▊ | 2/7 [00:00<00:00, 20.30it/s]\n"]},{"name":"stdout","output_type":"stream","text":["Bootstrapped 1 full traces after 3 examples in round 0.\n"]},{"name":"stderr","output_type":"stream","text":["Average Metric: 5 / 13 (38.5): 100%|██████████| 13/13 [00:00<00:00, 56.06it/s]\n"]},{"name":"stdout","output_type":"stream","text":["Average Metric: 5 / 13 (38.5%)\n","Score: 38.46 for set: [7, 7, 7]\n","Scores so far: [23.08, 23.08, 46.15, 38.46, 53.85, 61.54, 61.54, 38.46, 38.46, 38.46]\n","Best score: 61.54\n","Average of max per entry across top 1 
scores: 0.6153846153846154\n","Average of max per entry across top 2 scores: 0.6153846153846154\n","Average of max per entry across top 3 scores: 0.6153846153846154\n","Average of max per entry across top 5 scores: 0.6923076923076923\n","Average of max per entry across top 8 scores: 0.6923076923076923\n","Average of max per entry across top 9999 scores: 0.6923076923076923\n"]},{"name":"stderr","output_type":"stream","text":[" 43%|████▎ | 3/7 [00:00<00:00, 20.01it/s]\n"]},{"name":"stdout","output_type":"stream","text":["Bootstrapped 2 full traces after 4 examples in round 0.\n"]},{"name":"stderr","output_type":"stream","text":["Average Metric: 7 / 13 (53.8): 100%|██████████| 13/13 [00:00<00:00, 26.05it/s]"]},{"name":"stdout","output_type":"stream","text":["Average Metric: 7 / 13 (53.8%)\n","Score: 53.85 for set: [7, 7, 7]\n","Scores so far: [23.08, 23.08, 46.15, 38.46, 53.85, 61.54, 61.54, 38.46, 38.46, 38.46, 53.85]\n","Best score: 61.54\n","Average of max per entry across top 1 scores: 0.6153846153846154\n","Average of max per entry across top 2 scores: 0.6153846153846154\n","Average of max per entry across top 3 scores: 0.6153846153846154\n","Average of max per entry across top 5 scores: 0.8461538461538461\n","Average of max per entry across top 8 scores: 0.8461538461538461\n","Average of max per entry across top 9999 scores: 0.8461538461538461\n","11 candidate programs found.\n"]},{"name":"stderr","output_type":"stream","text":["\n"]}],"source":["# Compile the MultiHop program with the teleprompter2 optimizer, using train as the training set and dev as the validation set\n","multihop_compiled = teleprompter2.compile(MultiHop(), trainset=train, valset=dev)"]},{"cell_type":"code","execution_count":20,"metadata":{},"outputs":[{"name":"stderr","output_type":"stream","text":["Average Metric: 8 / 13 (61.5): 100%|██████████| 13/13 [00:00<00:00, 92.27it/s]"]},{"name":"stdout","output_type":"stream","text":["Average Metric: 8 / 13 (61.5%)\n"]},{"name":"stderr","output_type":"stream","text":["\n"]},{"data":{"text/html":["\n","\n"," 
| | question | example_answer | rationale | pred_answer | answer_exact_match |
|---|---|---|---|---|---|
| 0 | Who has a broader scope of profession: E. L. Doctorow or Julia Peterkin? | E. L. Doctorow | answer this question. We know that E. L. Doctorow is an American novelist, editor, and professor, and he has been described as one of the... | E. L. Doctorow. | ✔️ [True] |
| 1 | Right Back At It Again contains lyrics co-written by the singer born in what city? | Gainesville, Florida | answer this question. We know that Beyoncé is an American singer, songwriter, dancer, and actress, and she was born in Houston, Texas. Her album \"Beyoncé\"... | Houston. | ❌ [False] |
| 2 | What year was the party of the winner of the 1971 San Francisco mayoral election founded? | 1828 | answer this question. We know that the Democratic Party is one of the two major contemporary political parties in the United States, and it was... | 1828. | ✔️ [True] |
| 3 | Anthony Dirrell is the brother of which super middleweight title holder? | Andre Dirrell | answer this question. We know that Anthony Dirrell is a professional boxer who held the WBC super middleweight title from 2014 to 2015. We also... | Andre Dirrell. | ✔️ [True] |
| 4 | The sports nutrition business established by Oliver Cookson is based in which county in the UK? | Cheshire | answer this question. We know that Oliver Cookson is a UK entrepreneur who established the sports nutrition business Myprotein. We also know that Myprotein was... | Cheshire. | ✔️ [True] |
| 5 | Find the birth date of the actor who played roles in First Wives Club and Searching for the Elephant. | February 13, 1980 | answer this question. We know that the actor's name is Jo Dong-hyuk, and he was born on December 11, 1977. Therefore, the answer is December... | December 11, 1977. | ❌ [False] |
| 6 | Kyle Moran was born in the town on what river? | Castletown River | answer this question. We know that Kyle Moran is an Irish footballer who plays as a forward for Perth SC in the NPL Western Australia.... | River Dundalk. | ❌ [False] |
| 7 | The actress who played the niece in the Priest film was born in what city, country? | Surrey, England | answer this question. We know that Lily Collins is an actress, and she was born in Surrey, England. Therefore, the answer is Surrey, England. | Surrey, England. | ✔️ [True] |
| 8 | Name the movie in which the daughter of Noel Harrison plays Violet Trefusis. | Portrait of a Marriage | answer this question. We know that Cathryn Harrison is the daughter of Noel Harrison, and she is an English actress. One of her roles was... | First Daughter. | ❌ [False] |
| 9 | What year was the father of the Princes in the Tower born? | 1442 | answer this question. We know that the father of the Princes in the Tower was King Richard III of England, and he was born on... | 1452. | ❌ [False] |
| 10 | What river is near the Crichton Collegiate Church? | the River Tyne | answer this question. We know that Crichton Collegiate Church is situated about 0.6 mi south west of the hamlet of Crichton in Midlothian, Scotland. We... | River Tyne. | ✔️ [True] |
| 11 | Who purchased the team Michael Schumacher raced for in the 1995 Monaco Grand Prix in 2000? | Renault | answer this question. We know that Michael Schumacher raced for the Benetton team in the 1995 Monaco Grand Prix. In 2000, the Benetton team was... | Renault. | ✔️ [True] |
| 12 | André Zucca was a French photographer who worked with a German propaganda magazine published by what Nazi organization? | the Wehrmacht | answer this question. We know that André Zucca was a French photographer who worked with a German propaganda magazine called \"Signal\". Therefore, the answer is... | Wehrmacht. | ✔️ [True] |
\n"],"text/plain":[""]},"metadata":{},"output_type":"display_data"},{"data":{"text/plain":["61.54"]},"execution_count":20,"metadata":{},"output_type":"execute_result"}],"source":["# Call evaluate_hotpot on the compiled multi-hop program (multihop_compiled), using dev as the evaluation set\n","evaluate_hotpot(multihop_compiled, devset=dev)"]},{"cell_type":"markdown","metadata":{},"source":["Let's now inspect the prompt for the second-hop search query for one of the questions."]},{"cell_type":"code","execution_count":21,"metadata":{},"outputs":[{"name":"stdout","output_type":"stream","text":["\n","\n","\n","\n","Given the fields `context`, `question`, produce the fields `search_query`.\n","\n","---\n","\n","Follow the following format.\n","\n","Context: ${context}\n","\n","Question: ${question}\n","\n","Reasoning: Let's think step by step in order to ${produce the search_query}. We ...\n","\n","Search Query: ${search_query}\n","\n","---\n","\n","Context:\n","[1] «The Dancing Wu Li Masters | The Dancing Wu Li Masters is a 1979 book by Gary Zukav, a popular science work exploring modern physics, and quantum phenomena in particular. It was awarded a 1980 U.S. National Book Award in category of Science. Although it explores empirical topics in modern physics research, \"The Dancing Wu Li Masters\" gained attention for leveraging metaphors taken from eastern spiritual movements, in particular the Huayen school of Buddhism with the monk Fazang's treatise on The Golden Lion, to explain quantum phenomena and has been regarded by some reviewers as a New Age work, although the book is mostly concerned with the work of pioneers in western physics down through the ages.»\n","[2] «Gary Zukav | Gary Zukav (born October 17, 1942) is an American spiritual teacher and the author of four consecutive New York Times Best Sellers. Beginning in 1998, he appeared more than 30 times on \"The Oprah Winfrey Show\" to discuss transformation in human consciousness concepts presented in his book \"The Seat of the Soul\". His first book, \"The Dancing Wu Li Masters\" (1979), won a U.S. 
National Book Award.»\n","[3] «Li Junfeng | Master Li Junfeng (born October 13, 1938 in Gaocheng, Hebei) is a qigong master, the founder of Sheng Zhen Qigong, and a world-renowned wushu coach. He has also starred-in and choreographed several Chinese martial arts films.»\n","[4] «The Dancing Wu Li Masters | The Dancing Wu Li Masters is a 1979 book by Gary Zukav, a popular science work exploring modern physics, and quantum phenomena in particular. It was awarded a 1980 U.S. National Book Award in category of Science. Although it explores empirical topics in modern physics research, \"The Dancing Wu Li Masters\" gained attention for leveraging metaphors taken from eastern spiritual movements, in particular the Huayen school of Buddhism with the monk Fazang's treatise on The Golden Lion, to explain quantum phenomena and has been regarded by some reviewers as a New Age work, although the book is mostly concerned with the work of pioneers in western physics down through the ages.»\n","[5] «Gary Zukav | Gary Zukav (born October 17, 1942) is an American spiritual teacher and the author of four consecutive New York Times Best Sellers. Beginning in 1998, he appeared more than 30 times on \"The Oprah Winfrey Show\" to discuss transformation in human consciousness concepts presented in his book \"The Seat of the Soul\". His first book, \"The Dancing Wu Li Masters\" (1979), won a U.S. National Book Award.»\n","[6] «Wu Pao-chun | Wu Pao-chun (, born 5 September 1970), is a Taiwanese baker best known for winning the title of Master Baker in the bread category of the 2010 Bakery Masters competition held in Paris. Wu is also known for a rose-lychee bread he created which includes Taiwanese ingredients such as millet wine, rose petals and dried lychees.»\n","\n","Question: Which award did the first book of Gary Zukav receive?\n","\n","Reasoning: Let's think step by step in order to produce the search query. We know that the first book of Gary Zukav is \"The Dancing Wu Li Masters\". 
We also know that this book received an award. Therefore, we can start by searching for the name of the award that the book received.\n","\n","Search Query: \"The Dancing Wu Li Masters\" award\n","\n","---\n","\n","Context:\n","[1] «Benetton Group | Benetton Group S.r.l. (correct ] ; often mispronounced ] or ] ) is a global fashion brand, based in Ponzano Veneto, Italy. The name comes from the Benetton family who founded the company in 1965.»\n","[2] «Benetton Rugby | Benetton Rugby (] or ] ) are an Italian professional rugby union team based in Treviso, Veneto competing in the Pro14 and the European Rugby Champions Cup.»\n","[3] «Gilberto Benetton | Gilberto Benetton (born 19 June 1941) is an Italian billionaire businessman, one of the co-founders of Benetton Group, the Italian fashion brand.»\n","\n","Question: Who purchased the team Michael Schumacher raced for in the 1995 Monaco Grand Prix in 2000?\n","\n","Reasoning: Let's think step by step in order to produce the search query. We know that Michael Schumacher raced for a team in the 1995 Monaco Grand Prix. We also know that the team was purchased by someone in 2000. 
Therefore, we can start by searching for the name of the team that Michael Schumacher raced for in the 1995 Monaco Grand Prix.\n","\n","Search Query:\u001b[32m Michael Schumacher team Monaco Grand Prix 1995\n","\u001b[0m\n","\n","\n","\n"]}],"source":["# Call the compiled multi-hop program (multihop_compiled) on a question\n","multihop_compiled(question=\"Who purchased the team Michael Schumacher raced for in the 1995 Monaco Grand Prix in 2000?\")\n","# Call llama.inspect_history with n=1 and skip=2\n","llama.inspect_history(n=1, skip=2)"]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":[]}],"metadata":{"kernelspec":{"display_name":"py39_aug2023_dspy","language":"python","name":"python3"},"language_info":{"codemirror_mode":{"name":"ipython","version":3},"file_extension":".py","mimetype":"text/x-python","name":"python","nbconvert_exporter":"python","pygments_lexer":"ipython3","version":"3.9.17"},"orig_nbformat":4},"nbformat":4,"nbformat_minor":2}