{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "u2AQa8mSutMi" }, "source": [ "# What to do when you get an error\n", "\n", "Because this chapter is about debugging, the language of the examples matters little here. We are mainly interested in the logic of the code, in order to understand where the error comes from." ] }, { "cell_type": "markdown", "metadata": { "id": "P2ni9cGhutMj" }, "source": [ "Install the 🤗 Transformers and 🤗 Datasets libraries to run this notebook." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "X6cCDJcYutMk" }, "outputs": [], "source": [ "!pip install datasets transformers[sentencepiece]\n", "!apt install git-lfs" ] }, { "cell_type": "markdown", "metadata": { "id": "IxfNJAyputMm" }, "source": [ "You will need to configure git; set your own email and name in the following cell." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "0jGA8gKSutMn" }, "outputs": [], "source": [ "!git config --global user.email \"you@example.com\"\n", "!git config --global user.name \"Your Name\"" ] }, { "cell_type": "markdown", "metadata": { "id": "AKfiKg4_utMn" }, "source": [ "You will also need to be logged in to the Hugging Face Hub. Run the following cell and enter your credentials." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "tzQgJTfrutMo" }, "outputs": [], "source": [ "from huggingface_hub import notebook_login\n", "\n", "notebook_login()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "eE4HjzUiutMp" }, "outputs": [], "source": [ "from shutil import copytree\n", "from huggingface_hub import Repository, snapshot_download, create_repo, get_full_repo_name\n", "\n", "\n", "def copy_repository_template():\n", "    # Download the template repository at a pinned commit and get its local path\n", "    template_repo_id = \"lewtun/distilbert-base-uncased-finetuned-squad-d5716d28\"\n", "    commit_hash = \"be3eaffc28669d7932492681cd5f3e8905e358b4\"\n", "    template_repo_dir = snapshot_download(template_repo_id, revision=commit_hash)\n", "    # Create an empty repository on the Hub\n", "    model_name = template_repo_id.split(\"/\")[1]\n", "    create_repo(model_name, exist_ok=True)\n", "    # Clone the empty repository\n", "    new_repo_id = get_full_repo_name(model_name)\n", "    new_repo_dir = model_name\n", "    repo = Repository(local_dir=new_repo_dir, clone_from=new_repo_id)\n", "    # Copy the template files into the local clone\n", "    copytree(template_repo_dir, new_repo_dir, dirs_exist_ok=True)\n", "    # Push to the Hub\n", "    repo.push_to_hub()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "yCAxD5q2utMp", "outputId": "af9064c9-2d5e-4344-8925-0444cdf5dbad" }, "outputs": [], "source": [ "from transformers import pipeline\n", "\n", "# Note the misspelled name ('distillbert'): this call is expected to fail\n", "model_checkpoint = get_full_repo_name(\"distillbert-base-uncased-finetuned-squad-d5716d28\")\n", "reader = pipeline(\"question-answering\", model=model_checkpoint)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "n9CaRxrqutMq", "outputId": "ab9e1b9c-3c06-4df2-fbeb-0cce8a9feebe" }, "outputs": [], "source": [ "# Same call with the checkpoint name spelled correctly ('distilbert')\n", "model_checkpoint = get_full_repo_name(\"distilbert-base-uncased-finetuned-squad-d5716d28\")\n", "reader = pipeline(\"question-answering\", model=model_checkpoint)" ] }, { "cell_type": "code", "execution_count": 
null, "metadata": { "id": "pRlCOKgzutMq", "outputId": "a1875547-98ad-445d-acf8-a40f538e16e1" }, "outputs": [], "source": [ "from huggingface_hub import list_repo_files\n", "\n", "list_repo_files(repo_id=model_checkpoint)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "244m28l_utMs" }, "outputs": [], "source": [ "from transformers import AutoConfig\n", "\n", "pretrained_checkpoint = \"distilbert-base-uncased\"\n", "config = AutoConfig.from_pretrained(pretrained_checkpoint)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "fGLU4XLjutMs" }, "outputs": [], "source": [ "config.push_to_hub(model_checkpoint, commit_message=\"Add config.json\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "_weqzIC7utMs", "outputId": "3780c995-7148-48e7-9886-f772e4f1f9f9" }, "outputs": [], "source": [ "reader = pipeline(\"question-answering\", model=model_checkpoint, revision=\"main\")\n", "\n", "context = r\"\"\"\n", "Extractive Question Answering is the task of extracting an answer from a text\n", "given a question. An example of a question answering dataset is the SQuAD\n", "dataset, which is entirely based on that task. 
If you would like to fine-tune a\n", "model on a SQuAD task, you may leverage the\n", "examples/pytorch/question-answering/run_squad.py script.\n", "\n", "🤗 Transformers is interoperable with the PyTorch, TensorFlow, and JAX\n", "frameworks, so you can use your favourite tools for a wide variety of tasks!\n", "\"\"\"\n", "\n", "question = \"What is extractive question answering?\"\n", "reader(question=question, context=context)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "s99eTrMJutMt" }, "outputs": [], "source": [ "tokenizer = reader.tokenizer\n", "model = reader.model" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "-8Dv8M3RutMt" }, "outputs": [], "source": [ "question = \"Which frameworks can I use?\"" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "zO_yZE_SutMu", "outputId": "bb218c79-77de-4e55-e3bb-48b9efd0d14f" }, "outputs": [], "source": [ "import torch\n", "\n", "# No return_tensors argument is passed, so the tokenizer returns Python lists\n", "inputs = tokenizer(question, context, add_special_tokens=True)\n", "input_ids = inputs[\"input_ids\"][0]\n", "outputs = model(**inputs)\n", "answer_start_scores = outputs.start_logits\n", "answer_end_scores = outputs.end_logits\n", "# Get the most likely start of the answer with the argmax of the score\n", "answer_start = torch.argmax(answer_start_scores)\n", "# Get the most likely end of the answer with the argmax of the score\n", "answer_end = torch.argmax(answer_end_scores) + 1\n", "answer = tokenizer.convert_tokens_to_string(\n", "    tokenizer.convert_ids_to_tokens(input_ids[answer_start:answer_end])\n", ")\n", "print(f\"Question: {question}\")\n", "print(f\"Answer: {answer}\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "FiT9Y5bCutMu", "outputId": "e737e934-9611-4120-aa1c-97f498e19696" }, "outputs": [], "source": [ "inputs[\"input_ids\"][:5]" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "rUuyiTdLutMv", "outputId": "a5099876-f9da-452d-b607-cdb262db6eff" }, 
"outputs": [], "source": [ "type(inputs[\"input_ids\"])" ] } ], "metadata": { "colab": { "collapsed_sections": [], "provenance": [] }, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.5" } }, "nbformat": 4, "nbformat_minor": 1 }