{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "u2AQa8mSutMi"
   },
   "source": [
    "# Que faire quand vous obtenez une erreur\n",
    "\n",
    "Ce chapitre portant sur le débogage, la langue nous importe peu ici. Nous nous intéressons surtout à la logique du code pour comprendre d'où provient l'erreur."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "P2ni9cGhutMj"
   },
   "source": [
    "Installez les bibliothèques 🤗 Transformers et 🤗 Datasets pour exécuter ce *notebook*."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "X6cCDJcYutMk"
   },
   "outputs": [],
   "source": [
    "!pip install datasets transformers[sentencepiece]\n",
    "!apt install git-lfs"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "IxfNJAyputMm"
   },
   "source": [
    "Vous aurez besoin de configurer git, adaptez votre email et votre nom dans la cellule suivante."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "0jGA8gKSutMn"
   },
   "outputs": [],
   "source": [
    "!git config --global user.email \"you@example.com\"\n",
    "!git config --global user.name \"Your Name\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "AKfiKg4_utMn"
   },
   "source": [
    "Vous devrez également être connecté au *Hub* d'Hugging Face. Exécutez ce qui suit et entrez vos informations d'identification."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "tzQgJTfrutMo"
   },
   "outputs": [],
   "source": [
    "from huggingface_hub import notebook_login\n",
    "\n",
    "notebook_login()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "eE4HjzUiutMp"
   },
   "outputs": [],
   "source": [
    "from distutils.dir_util import copy_tree\n",
    "from huggingface_hub import Repository, snapshot_download, create_repo, get_full_repo_name\n",
    "\n",
    "\n",
    "def copy_repository_template():\n",
    "    # Cloner le dépôt et extraire le chemin local\n",
    "    template_repo_id = \"lewtun/distilbert-base-uncased-finetuned-squad-d5716d28\"\n",
    "    commit_hash = \"be3eaffc28669d7932492681cd5f3e8905e358b4\"\n",
    "    template_repo_dir = snapshot_download(template_repo_id, revision=commit_hash)\n",
    "    # Créer un dépôt vide sur le Hub\n",
    "    model_name = template_repo_id.split(\"/\")[1]\n",
    "    create_repo(model_name, exist_ok=True)\n",
    "    # Clonez le dépôt vide\n",
    "    new_repo_id = get_full_repo_name(model_name)\n",
    "    new_repo_dir = model_name\n",
    "    repo = Repository(local_dir=new_repo_dir, clone_from=new_repo_id)\n",
    "    # Copier les fichiers\n",
    "    copy_tree(template_repo_dir, new_repo_dir)\n",
    "    # Pousser sur le Hub\n",
    "    repo.push_to_hub()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "yCAxD5q2utMp",
    "outputId": "af9064c9-2d5e-4344-8925-0444cdf5dbad"
   },
   "outputs": [],
   "source": [
    "from transformers import pipeline\n",
    "\n",
    "model_checkpoint = get_full_repo_name(\"distillbert-base-uncased-finetuned-squad-d5716d28\")\n",
    "reader = pipeline(\"question-answering\", model=model_checkpoint)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "n9CaRxrqutMq",
    "outputId": "ab9e1b9c-3c06-4df2-fbeb-0cce8a9feebe"
   },
   "outputs": [],
   "source": [
    "model_checkpoint = get_full_repo_name(\"distilbert-base-uncased-finetuned-squad-d5716d28\")\n",
    "reader = pipeline(\"question-answering\", model=model_checkpoint)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "pRlCOKgzutMq",
    "outputId": "a1875547-98ad-445d-acf8-a40f538e16e1"
   },
   "outputs": [],
   "source": [
    "from huggingface_hub import list_repo_files\n",
    "\n",
    "list_repo_files(repo_id=model_checkpoint)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "244m28l_utMs"
   },
   "outputs": [],
   "source": [
    "from transformers import AutoConfig\n",
    "\n",
    "pretrained_checkpoint = \"distilbert-base-uncased\"\n",
    "config = AutoConfig.from_pretrained(pretrained_checkpoint)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "fGLU4XLjutMs"
   },
   "outputs": [],
   "source": [
    "config.push_to_hub(model_checkpoint, commit_message=\"Add config.json\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "_weqzIC7utMs",
    "outputId": "3780c995-7148-48e7-9886-f772e4f1f9f9"
   },
   "outputs": [],
   "source": [
    "reader = pipeline(\"question-answering\", model=model_checkpoint, revision=\"main\")\n",
    "\n",
    "context = r\"\"\"\n",
    "Extractive Question Answering is the task of extracting an answer from a text\n",
    "given a question. An example of a question answering dataset is the SQuAD\n",
    "dataset, which is entirely based on that task. If you would like to fine-tune a\n",
    "model on a SQuAD task, you may leverage the\n",
    "examples/pytorch/question-answering/run_squad.py script.\n",
    "\n",
    "🤗 Transformers is interoperable with the PyTorch, TensorFlow, and JAX\n",
    "frameworks, so you can use your favourite tools for a wide variety of tasks!\n",
    "\"\"\"\n",
    "\n",
    "question = \"What is extractive question answering?\"\n",
    "reader(question=question, context=context)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "s99eTrMJutMt"
   },
   "outputs": [],
   "source": [
    "tokenizer = reader.tokenizer\n",
    "model = reader.model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "-8Dv8M3RutMt"
   },
   "outputs": [],
   "source": [
    "question = \"Which frameworks can I use?\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "zO_yZE_SutMu",
    "outputId": "bb218c79-77de-4e55-e3bb-48b9efd0d14f"
   },
   "outputs": [],
   "source": [
    "import torch\n",
    "\n",
    "inputs = tokenizer(question, context, add_special_tokens=True)\n",
    "input_ids = inputs[\"input_ids\"][0]\n",
    "outputs = model(**inputs)\n",
    "answer_start_scores = outputs.start_logits\n",
    "answer_end_scores = outputs.end_logits\n",
    "# Obtenir le début de réponse le plus probable avec l'argmax du score\n",
    "answer_start = torch.argmax(answer_start_scores)\n",
    "# Obtenir la fin de réponse la plus probable avec l'argmax du score\n",
    "answer_end = torch.argmax(answer_end_scores) + 1\n",
    "answer = tokenizer.convert_tokens_to_string(\n",
    "    tokenizer.convert_ids_to_tokens(input_ids[answer_start:answer_end])\n",
    ")\n",
    "print(f\"Question: {question}\")\n",
    "print(f\"Answer: {answer}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "FiT9Y5bCutMu",
    "outputId": "e737e934-9611-4120-aa1c-97f498e19696"
   },
   "outputs": [],
   "source": [
    "inputs[\"input_ids\"][:5]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "rUuyiTdLutMv",
    "outputId": "a5099876-f9da-452d-b607-cdb262db6eff"
   },
   "outputs": [],
   "source": [
    "type(inputs[\"input_ids\"])"
   ]
  }
 ],
 "metadata": {
  "colab": {
   "collapsed_sections": [],
   "provenance": []
  },
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}