{ "cells": [ { "cell_type": "markdown", "metadata": { "ExecuteTime": { "end_time": "2019-09-23T18:50:19.036357Z", "start_time": "2019-09-23T18:50:19.031896Z" } }, "source": [ "# SQL\n", "\n", "This notebook demonstrates how to [convert](https://nexus-forge.readthedocs.io/en/latest/interaction.html#converting) data in RDBMS to Resources and vice-versa using SQL and pandas Dataframe." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "! pip install flask_sqlalchemy" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "ExecuteTime": { "end_time": "2019-09-23T18:50:20.068658Z", "start_time": "2019-09-23T18:50:19.054054Z" } }, "outputs": [], "source": [ "from kgforge.core import KnowledgeGraphForge" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A configuration file is needed in order to create a KnowledgeGraphForge session. A configuration can be generated using the notebook [00-Initialization.ipynb](00%20-%20Initialization.ipynb)." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "forge = KnowledgeGraphForge(\"../../configurations/forge.yml\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Imports" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "ExecuteTime": { "end_time": "2019-09-23T18:50:20.127987Z", "start_time": "2019-09-23T18:50:20.119390Z" } }, "outputs": [], "source": [ "from kgforge.core import Resource" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "import pandas as pd" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "from sqlalchemy import create_engine" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## SQL Database initialization" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "engine = create_engine(\"sqlite:///:memory:\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Resources to SQL" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "jane = Resource(name=\"Jane Doe\", email=\"jane.doe@epfl.ch\")" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "john = Resource(name=\"John Smith\", email=\"john.smith@epfl.ch\")" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "other = Resource(name=\"John Doe\", email=\"john.doe@unige.ch\")" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "persons = [jane, john, other]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Converting Resources to DataFrame" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "df = forge.as_dataframe(persons)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Persisting Resources in the SQL Database.\n", "\n", "Note: Another way would be to have a Store implementation for the SQL Database." ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "df.to_sql(\"Persons\", engine, index=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## SQL to Resources" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Getting the table as a DataFrame" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
emailname
0jane.doe@epfl.chJane Doe
1john.smith@epfl.chJohn Smith
2john.doe@unige.chJohn Doe
\n", "
" ], "text/plain": [ " email name\n", "0 jane.doe@epfl.ch Jane Doe\n", "1 john.smith@epfl.ch John Smith\n", "2 john.doe@unige.ch John Doe" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.read_sql(\"SELECT * FROM Persons\", engine)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Filtering on the SQL Database to keep only persons with an EPFL email" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [], "source": [ "df = pd.read_sql(\"SELECT * FROM Persons WHERE email LIKE '%@epfl.ch'\", engine)" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
emailname
0jane.doe@epfl.chJane Doe
1john.smith@epfl.chJohn Smith
\n", "
" ], "text/plain": [ " email name\n", "0 jane.doe@epfl.ch Jane Doe\n", "1 john.smith@epfl.ch John Smith" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Converting DataFrame to Resources" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [], "source": [ "persons = forge.from_dataframe(df)" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{\n", " email: jane.doe@epfl.ch\n", " name: Jane Doe\n", "}\n", "{\n", " email: john.smith@epfl.ch\n", " name: John Smith\n", "}\n" ] } ], "source": [ "print(*persons, sep=\"\\n\")" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3.7 (nexusforgelatest)", "language": "python", "name": "nexusforgelatest" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.10" } }, "nbformat": 4, "nbformat_minor": 4 }