{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Notebook [1]: First steps with cdQA" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This notebook shows how to use the `cdQA` pipeline to perform question answering on a custom dataset." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "***Note:*** *If you are using colab, you will need to install `cdQA` by executing `!pip install cdqa` in a cell.*" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "ExecuteTime": { "end_time": "2019-07-20T13:32:09.138284Z", "start_time": "2019-07-20T13:32:01.868622Z" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/Users/andre.farias/python3.7.0/lib/python3.7/site-packages/tqdm/autonotebook/__init__.py:18: TqdmExperimentalWarning: Using `tqdm.autonotebook.tqdm` in notebook mode. Use `tqdm.tqdm` instead to force console mode (e.g. in jupyter console)\n", " \" (e.g. in jupyter console)\", TqdmExperimentalWarning)\n" ] } ], "source": [ "import os\n", "import pandas as pd\n", "from ast import literal_eval\n", "\n", "from cdqa.utils.filters import filter_paragraphs\n", "from cdqa.pipeline import QAPipeline" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Download pre-trained reader model and example dataset" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "ExecuteTime": { "end_time": "2019-07-20T13:33:36.002880Z", "start_time": "2019-07-20T13:32:10.618797Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "Downloading BNP data...\n", "\n", "Downloading trained model...\n" ] } ], "source": [ "from cdqa.utils.download import download_model, download_bnpp_data\n", "\n", "download_bnpp_data(dir='./data/bnpp_newsroom_v1.1/')\n", "download_model(model='bert-squad_1.1', dir='./models')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Visualize the dataset" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { 
"ExecuteTime": { "end_time": "2019-07-20T13:35:00.377971Z", "start_time": "2019-07-20T13:34:59.764491Z" } }, "outputs": [ { "data": { "text/html": [ "
|   | date | title | category | link | abstract | paragraphs |
|---|---|---|---|---|---|---|
| 0 | 13.05.2019 | The banking jobs : Assistant Vice President – ... | Careers | https://group.bnpparibas/en/news/banking-jobs-... | Within the Group’s Corporate and Institutional... | [I manage a team in charge of designing and im... |
| 1 | 13.05.2019 | BNP Paribas at #VivaTech : discover the progra... | Innovation | https://group.bnpparibas/en/news/bnp-paribas-v... | From Thursday 16 to Saturday 18 May 2019, join... | [With François Hollande, Chairman of French fo... |
| 2 | 13.05.2019 | \"The bank with an IT budget of more than EUR6 ... | Group | https://group.bnpparibas/en/news/the-bank-budg... | Interview with Jean-Laurent Bonnafé, Director ... | [We did the groundwork between 2012 and 2016, ... |
| 3 | 10.05.2019 | BNP Paribas at #VivaTech : discover the progra... | Innovation | https://group.bnpparibas/en/news/bnp-paribas-v... | From Thursday 16 to Saturday 18 May 2019, join... | [As part of the ‘United Tech of Europe’ theme,... |
| 4 | 10.05.2019 | When Artificial Intelligence participates in r... | Careers | https://group.bnpparibas/en/news/artificial-in... | As the competition to attract talent intensifi... | [Online recruitment is already the norm. Accor... |