{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "5241d11d-38e4-471b-8023-aa6ccbe53104",
   "metadata": {},
   "source": [
    "<figure>\n",
    "<img src=\"../Imagenes/logo-final-ap.png\" width=\"80\" height=\"80\" align=\"left\"/>\n",
    "</figure>\n",
    "\n",
    "# <span style=\"color:#4361EE\">Deep Learning</span>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7f790db7-dbbc-4024-a0ec-7c1a8cff35a5",
   "metadata": {
    "tags": []
   },
   "source": [
    "# <span style=\"color:red\"><center>Diploma in Artificial Intelligence and Deep Learning</center></span>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b2ad70c4-c7a1-455b-8ff3-9d98fd84661b",
   "metadata": {
    "tags": []
   },
   "source": [
    "# <span style=\"color:green\"><center>Introduction to spaCy for NLP</center></span>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9203b92c-78e7-48ba-b64c-1a4acd78dc47",
   "metadata": {},
   "source": [
    "<figure>\n",
    "<center>\n",
    "<img src=\"https://spacy.io/images/architecture.svg\" width=\"35%\" align=\"center\"/>\n",
    "</center>\n",
    "</figure>\n",
    "\n",
    "<center>Source: <a href=\"https://spacy.io/api\">spaCy library architecture</a></center>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "16ce3fb1-bfe5-4a1c-9e0e-1e86bdacfca4",
   "metadata": {},
   "source": [
    "## <span style=\"color:#4361EE\">Instructors</span>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f8907b73-4b4c-412c-9f15-e72b72b240ab",
   "metadata": {},
   "source": [
    "1. Alvaro Montenegro, PhD, ammontenegrod@unal.edu.co\n",
    "1. Camilo José Torres Jiménez, MSc, cjtorresj@unal.edu.co\n",
    "1. Daniel Montenegro, MSc, dextronomo@gmail.com"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f6bed094-dbe3-4c54-8983-7324a2ad1c5c",
   "metadata": {},
   "source": [
    "## <span style=\"color:#4361EE\">Media and Digital Marketing Advisors</span>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ce257431-c9e3-4679-8258-c9195aaf3cdb",
   "metadata": {},
   "source": [
    "4. Maria del Pilar Montenegro, pmontenegro88@gmail.com\n",
    "5. Jessica López Mejía, jelopezme@unal.edu.co\n",
    "6. Venus Puertas, vpuertasg@unal.edu.co"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9cd9e0df-9088-4e70-b0cc-960b4b44c3d8",
   "metadata": {},
   "source": [
    "## <span style=\"color:#4361EE\">Head of Legal</span>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c944fb3f-707c-44bd-8f6e-dbbc0cfb3c92",
   "metadata": {},
   "source": [
    "7. Paula Andrea Guzmán, guzmancruz.paula@gmail.com"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a50b138d-1ca6-40d4-8ae4-317318876de3",
   "metadata": {},
   "source": [
    "## <span style=\"color:#4361EE\">Legal Coordinator</span>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "287175c1-fcfb-491b-b963-2744010720a3",
   "metadata": {},
   "source": [
    "8. David Fuentes, fuentesd065@gmail.com"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ccb24a86-284b-4292-ad38-f76a59c89c62",
   "metadata": {},
   "source": [
    "## <span style=\"color:#4361EE\">Lead Developers</span>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "38eed8e6-cd61-4a0e-b5c0-ab9275b52183",
   "metadata": {},
   "source": [
    "9. Dairo Moreno, damoralesj@unal.edu.co\n",
    "10. Joan Castro, jocastroc@unal.edu.co\n",
    "11. Bryan Riveros, briveros@unal.edu.co\n",
    "12. Rosmer Vargas, rovargasc@unal.edu.co"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f448aac2-368f-425c-81c3-c07139340bbf",
   "metadata": {},
   "source": [
    "## <span style=\"color:#4361EE\">Database Experts</span>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0c4a7ffc-6bc9-49c1-86db-02d4fa3cf80b",
   "metadata": {},
   "source": [
    "13. Giovvani Barrera, udgiovanni@gmail.com\n",
    "14. Camilo Chitivo, cchitivo@unal.edu.co"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7905f969-c5f5-420d-bd0c-c4a12ecbda70",
   "metadata": {},
   "source": [
    "## <span style=\"color:#4361EE\">Sources and references</span>\n",
    "\n",
    "This notebook is a free adaptation and translation of the guides available on the official [spaCy](https://spacy.io/) website."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "29aac8a5-4e5c-4878-9b9b-1c5446cfab79",
   "metadata": {},
   "source": [
    "## <span style=\"color:#4361EE\">Introduction</span>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f567537c-b192-4db9-bf83-28f00d47cbf3",
   "metadata": {
    "tags": []
   },
   "source": [
    "### <span style=\"color:#4361EE\">Install the spaCy package</span>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "db808048-4039-41ac-987b-efb94d5d1545",
   "metadata": {},
   "outputs": [],
   "source": [
    "#!pip install --quiet spacy\n",
    "#!conda install -c conda-forge spacy"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "298da6e9-600d-4de7-aa35-8352065a8f32",
   "metadata": {},
   "source": [
    "### <span style=\"color:#4361EE\">Download a trained pipeline</span>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e1b7523d-bb95-4c48-abbc-06eac1c50b0d",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Download the medium English pipeline, which includes word vectors\n",
    "!python -m spacy download en_core_web_md"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a66fe5d0-1e03-49a7-bf33-c681ad613de7",
   "metadata": {},
   "source": [
    "### <span style=\"color:#4361EE\">Import the package and load the pipeline</span>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "011544f9-3864-4795-a814-3455cf8b3f9b",
   "metadata": {},
   "outputs": [],
   "source": [
    "import spacy\n",
    "from spacy import displacy\n",
    "\n",
    "nlp = spacy.load(\"en_core_web_md\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d3ab7dbb-a174-4146-b585-46f4be15b9bd",
   "metadata": {},
   "source": [
    "### <span style=\"color:#4361EE\">Documents</span>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "74b3c72d-4414-48d6-b2de-8bbe878a0d4b",
   "metadata": {},
   "outputs": [],
   "source": [
    "doc = nlp(\"Apple is looking at buying U.K. startup for $1 billion\")\n",
    "\n",
    "text = \"\"\"In ancient Rome, some neighbors live in three adjacent houses. In the center is the house of Senex, who lives there with wife Domina, son Hero, and several slaves, including head slave Hysterium and the musical's main character Pseudolus. A slave belonging to Hero, Pseudolus wishes to buy, win, or steal his freedom. One of the neighboring houses is owned by Marcus Lycus, who is a buyer and seller of beautiful women; the other belongs to the ancient Erronius, who is abroad searching for his long-lost children (stolen in infancy by pirates). One day, Senex and Domina go on a trip and leave Pseudolus in charge of Hero. Hero confides in Pseudolus that he is in love with the lovely Philia, one of the courtesans in the House of Lycus (albeit still a virgin).\"\"\"\n",
    "longer_doc = nlp(text)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fea3ea49-9b49-40a0-bee3-5e67ae3fefad",
   "metadata": {
    "tags": []
   },
   "source": [
    "## <span style=\"color:#4361EE\">Linguistic annotations</span>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "477d0cd6-d728-4dca-ad66-a61567cb97c4",
   "metadata": {},
   "source": [
    "### <span style=\"color:#4361EE\">Sentences</span>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f1554929-9db2-41cd-9ce1-e26ffb2e8d58",
   "metadata": {},
   "outputs": [],
   "source": [
    "sentence_spans = list(longer_doc.sents)\n",
    "print(sentence_spans)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ff532122-ef30-4bf3-b280-4eeeabcf33cc",
   "metadata": {},
   "source": [
    "### <span style=\"color:#4361EE\">Tokenization</span>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "304c0d1c-61b4-47fc-8423-d21f0fdc4bee",
   "metadata": {},
   "outputs": [],
   "source": [
    "for token in doc:\n",
    "    print(token.text)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e029aee3-3c05-41f0-99e1-53e724862fa7",
   "metadata": {},
   "source": [
    "### <span style=\"color:#4361EE\">Part-of-speech tags and dependencies</span>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cf24def3-92f0-4b9e-9d44-9ca625f8e12e",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Header row: one column per token attribute printed below\n",
    "print(f'{\"text\":15s}{\"lemma_\":15s}{\"pos_\":15s}{\"tag_\":15s}{\"dep_\":15s}{\"shape_\":15s}is_alpha, is_stop')\n",
    "print(\"-\"*110)\n",
    "# One row per token with its linguistic annotations\n",
    "for token in doc:\n",
    "    print(f'{token.text:15s}{token.lemma_:15s}{token.pos_:15s}{token.tag_:15s}{token.dep_:15s}{token.shape_:15s}{token.is_alpha}, {token.is_stop}')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1614bdb8-4f3e-4d94-ab08-61cd7234d680",
   "metadata": {},
   "source": [
    "### <span style=\"color:#4361EE\">Morphological features</span>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "849eff92-40e4-48e8-9ce8-8076f645e1c9",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Print each token alongside its morphological features\n",
    "for token in doc:\n",
    "    print(f'{token.text:15s}{token.morph}')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fc13486f-76a7-4b7f-86d7-f6efafad8533",
   "metadata": {},
   "source": [
    "### <span style=\"color:#4361EE\">Visualization</span>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1d3ed09e-b3d1-4895-887a-23a837421bf0",
   "metadata": {},
   "outputs": [],
   "source": [
    "displacy.render(doc, style=\"dep\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "29da69d5-7cea-4739-a935-e38c78d5ef49",
   "metadata": {},
   "outputs": [],
   "source": [
    "displacy.render(sentence_spans, style=\"dep\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "41852e3f-516b-4c0a-b09d-a707b10cc36f",
   "metadata": {},
   "source": [
    "## <span style=\"color:#4361EE\">Named entities</span>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5ec5beb3-cecf-425c-9dc5-82b9889293d7",
   "metadata": {},
   "outputs": [],
   "source": [
    "for ent in doc.ents:\n",
    "    print(f'{ent.text:15s}{ent.label_:10}{ent.start_char:5d}{ent.end_char:5d}')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3a9610a4-484d-45a7-a4b1-9fb14fc7a3a1",
   "metadata": {},
   "source": [
    "### <span style=\"color:#4361EE\">Visualization</span>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "69aadcc8-18cd-48fb-832e-647c55670ed4",
   "metadata": {},
   "outputs": [],
   "source": [
    "displacy.render(doc, style=\"ent\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "edf8986c-6e0f-4014-ac89-eda9c2906a80",
   "metadata": {},
   "outputs": [],
   "source": [
    "displacy.render(longer_doc, style=\"ent\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3bc73d6d-361a-4abb-8df0-fe074cc458a0",
   "metadata": {},
   "source": [
    "## <span style=\"color:#4361EE\">Word vectors and similarity</span>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3f133b03-cb7f-4e70-99f9-541c3c4a6fd9",
   "metadata": {},
   "outputs": [],
   "source": [
    "for token in doc:\n",
    "    print(f'{token.text:15s}{token.vector_norm:10f} {token.has_vector} {token.is_oov}')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b83ea5c1-fea9-4613-b9d9-9b7db05d02f7",
   "metadata": {},
   "outputs": [],
   "source": [
    "doc1 = nlp(\"I like salty fries and hamburgers.\")\n",
    "doc2 = nlp(\"Fast food tastes very good.\")\n",
    "# Similarity of two documents\n",
    "print(doc1, \"<->\", doc2, \":\", doc1.similarity(doc2))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b0cad04e-98a8-4828-9a43-98595b0d520f",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Similarity of a span and a token\n",
    "salty_fries = doc1[2:4]  # Span: \"salty fries\"\n",
    "burgers = doc1[5]  # Token: \"hamburgers\"\n",
    "print(salty_fries, \"<->\", burgers, \":\", salty_fries.similarity(burgers))"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}