{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"ExecuteTime": {
"end_time": "2019-09-23T18:50:19.036357Z",
"start_time": "2019-09-23T18:50:19.031896Z"
}
},
"source": [
"# SQL\n",
"\n",
"This notebook demonstrates how to [convert](https://nexus-forge.readthedocs.io/en/latest/interaction.html#converting) data in RDBMS to Resources and vice-versa using SQL and pandas Dataframe."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"! pip install flask_sqlalchemy"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"ExecuteTime": {
"end_time": "2019-09-23T18:50:20.068658Z",
"start_time": "2019-09-23T18:50:19.054054Z"
}
},
"outputs": [],
"source": [
"from kgforge.core import KnowledgeGraphForge"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A configuration file is needed in order to create a KnowledgeGraphForge session. A configuration can be generated using the notebook [00-Initialization.ipynb](00%20-%20Initialization.ipynb)."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"forge = KnowledgeGraphForge(\"../../configurations/forge.yml\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Imports"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"ExecuteTime": {
"end_time": "2019-09-23T18:50:20.127987Z",
"start_time": "2019-09-23T18:50:20.119390Z"
}
},
"outputs": [],
"source": [
"from kgforge.core import Resource"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"from sqlalchemy import create_engine"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## SQL Database initialization"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"engine = create_engine(\"sqlite:///:memory:\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Resources to SQL"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
"jane = Resource(name=\"Jane Doe\", email=\"jane.doe@epfl.ch\")"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"john = Resource(name=\"John Smith\", email=\"john.smith@epfl.ch\")"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [],
"source": [
"other = Resource(name=\"John Doe\", email=\"john.doe@unige.ch\")"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [],
"source": [
"persons = [jane, john, other]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Converting Resources to DataFrame"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [],
"source": [
"df = forge.as_dataframe(persons)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Persisting Resources in the SQL Database.\n",
"\n",
"Note: Another way would be to have a Store implementation for the SQL Database."
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [],
"source": [
"df.to_sql(\"Persons\", engine, index=False)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## SQL to Resources"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Getting the table as a DataFrame"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" email | \n",
" name | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" jane.doe@epfl.ch | \n",
" Jane Doe | \n",
"
\n",
" \n",
" 1 | \n",
" john.smith@epfl.ch | \n",
" John Smith | \n",
"
\n",
" \n",
" 2 | \n",
" john.doe@unige.ch | \n",
" John Doe | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" email name\n",
"0 jane.doe@epfl.ch Jane Doe\n",
"1 john.smith@epfl.ch John Smith\n",
"2 john.doe@unige.ch John Doe"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pd.read_sql(\"SELECT * FROM Persons\", engine)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Filtering on the SQL Database to keep only persons with an EPFL email"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [],
"source": [
"df = pd.read_sql(\"SELECT * FROM Persons WHERE email LIKE '%@epfl.ch'\", engine)"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" email | \n",
" name | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" jane.doe@epfl.ch | \n",
" Jane Doe | \n",
"
\n",
" \n",
" 1 | \n",
" john.smith@epfl.ch | \n",
" John Smith | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" email name\n",
"0 jane.doe@epfl.ch Jane Doe\n",
"1 john.smith@epfl.ch John Smith"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Converting DataFrame to Resources"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [],
"source": [
"persons = forge.from_dataframe(df)"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{\n",
" email: jane.doe@epfl.ch\n",
" name: Jane Doe\n",
"}\n",
"{\n",
" email: john.smith@epfl.ch\n",
" name: John Smith\n",
"}\n"
]
}
],
"source": [
"print(*persons, sep=\"\\n\")"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3.7 (nexusforgelatest)",
"language": "python",
"name": "nexusforgelatest"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.10"
}
},
"nbformat": 4,
"nbformat_minor": 4
}