{ "cells": [ { "cell_type": "markdown", "id": "22d93371-d72a-498a-b25e-affb11659f0d", "metadata": {}, "source": [ "## NeoSchema library - Tutorial 2 : set up a simple Schema (Classes, Properties) \n", "## and perform a data import (Data Nodes and relationships among them)\n", "\n", "If you're new to Graph Databases, you can think of \"Classes\" and \"Properties\" along the lines of, respectively, \"Table names\" and \"Table field lists\".\n", "\n", "If you need to first clear out your test database, one of the cells below (currently commented out) will conveniently let you do it\n", "\n", "#### [Background Article on Schema in Graph Databases](https://julianspolymathexplorations.blogspot.com/2022/11/schema-graph-databases-neo4j.html)" ] }, { "cell_type": "code", "execution_count": 1, "id": "910c294a-eb6b-43d7-9369-980f20974e12", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Added 'D:\\Docs\\- MY CODE\\Brain Annex\\BA-Win7' to sys.path\n" ] } ], "source": [ "import set_path # Importing this module will add the project's home directory to sys.path" ] }, { "cell_type": "code", "execution_count": 2, "id": "e00686a6-c019-414e-92be-a44d32cfe138", "metadata": {}, "outputs": [], "source": [ "import os\n", "import sys\n", "import getpass\n", "import pandas as pd\n", "\n", "from neoaccess import NeoAccess\n", "from BrainAnnex.modules.neo_schema.neo_schema import NeoSchema" ] }, { "cell_type": "code", "execution_count": null, "id": "62138c8f-f546-482a-a2b6-fe6d3623619c", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "be1fb174-5bb9-4dee-a920-0ac5dcfb74a5", "metadata": {}, "source": [ "# Connect to the database, \n", "#### using the `NeoAccess` library\n", "#### You can use a free local install of the Neo4j database, or a remote one on a virtual machine under your control, or a hosted solution, or simply the FREE \"Sandbox\" : [instructions here](https://julianspolymathexplorations.blogspot.com/2023/03/neo4j-sandbox-tutorial-cypher.html)\n", "NOTE: This tutorial is tested on version 4.4 of the Neo4j database, but will probably also work on the new version 5" ] }, { "cell_type": "code", "execution_count": 3, "id": "6d6452fe-06a6-486a-b4f1-1568e6bde8d5", "metadata": { "tags": [] }, "outputs": [], "source": [ "# Save your credentials here (and skip the next cell!) - or use the prompts given by the next cell\n", "#host = \"\" # EXAMPLES: bolt://123.456.789.012 OR neo4j://localhost\n", "#password = \"\"" ] }, { "cell_type": "code", "execution_count": 2, "id": "0b92a63a-5c1c-462e-9d60-68a21d1f9fb1", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "To create a database connection, enter the host IP, but leave out the port number: (EXAMPLES: bolt://1.2.3.4 OR neo4j://localhost )\n", "\n" ] }, { "name": "stdin", "output_type": "stream", "text": [ "Enter host IP WITHOUT the port number. EXAMPLE: bolt://123.456.789.012 bolt://123.456.789.012\n", "Enter the database password: ········\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "=> Will be using: host='bolt://123.456.789.012:7687', username='neo4j', password=**********\n" ] } ], "source": [ "print(\"To create a database connection, enter the host IP, but leave out the port number: (EXAMPLES: bolt://1.2.3.4 OR neo4j://localhost )\\n\")\n", "\n", "host = input(\"Enter host IP WITHOUT the port number. EXAMPLE: bolt://123.456.789.012 \")\n", "host += \":7687\" # EXAMPLE of host value: \"bolt://123.456.789.012:7687\"\n", "\n", "password = getpass.getpass(\"Enter the database password:\")\n", "\n", "print(f\"\\n=> Will be using: host='{host}', username='neo4j', password=**********\")" ] }, { "cell_type": "code", "execution_count": null, "id": "c3965805-0d5e-44d8-8d4d-6aaecde6497c", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": 4, "id": "7247f139-9f06-41d1-98e8-410ff7c9f177", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Connection to Neo4j database established.\n" ] } ], "source": [ "db = NeoAccess(host=host,\n", " credentials=(\"neo4j\", password), debug=False) # Notice the debug option being OFF" ] }, { "cell_type": "code", "execution_count": 5, "id": "c96ece03-2b07-4a4d-ad6e-e41ccbf67251", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Version of the Neo4j driver: 4.4.11\n" ] } ], "source": [ "print(\"Version of the Neo4j driver: \", db.version())" ] }, { "cell_type": "code", "execution_count": null, "id": "cbbf9037-a8aa-4435-b708-cd2257702e3d", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "4ca98da0-f267-4efa-8302-624efbd1a744", "metadata": {}, "source": [ "# Import of data from a Pandas data frame using the `NeoSchema` library" ] }, { "cell_type": "markdown", "id": "f5d03ea0-0282-4d95-9f4a-852d04cc8f57", "metadata": {}, "source": [ "### Initial setup" ] }, { "cell_type": "code", "execution_count": 6, "id": "c3e8506e-b904-48c1-a22a-2818807814cf", "metadata": {}, "outputs": [], "source": [ "#db.empty_dbase() # ****** Recommended for use with test databases. WARNING: USE WITH CAUTION!!! ******" ] }, { "cell_type": "code", "execution_count": 7, "id": "48e9ff90-6698-48a1-b8d2-e715dc7f191b", "metadata": { "tags": [] }, "outputs": [], "source": [ "NeoSchema.set_database(db)" ] }, { "cell_type": "markdown", "id": "2410dc5c-f720-4462-a0e6-32423d1083cb", "metadata": {}, "source": [ "### Create the Schema" ] }, { "cell_type": "code", "execution_count": 8, "id": "52bafa2c-1957-4285-ac70-44aab9d10dc1", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/plain": [ "(530, 1)" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Create a \"City\" Class node - together with its Properties, based on the data to import\n", "\n", "NeoSchema.create_class_with_properties(name=\"City\", property_list=[\"City ID\", \"name\"])" ] }, { "cell_type": "code", "execution_count": 9, "id": "fdfd3222-fc19-41c1-bf74-d14e708ff1ba", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(533, 4)" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Likewise for a \"State\" Class node - together with its Properties, based on the data to import\n", "\n", "NeoSchema.create_class_with_properties(name=\"State\", property_list=[\"State ID\", \"name\", \"2-letter abbr\"]) " ] }, { "cell_type": "code", "execution_count": 10, "id": "6828a03d-83b0-415f-b1e9-001dfdadf5ea", "metadata": { "tags": [] }, "outputs": [], "source": [ "# Now add a relationship named \"IS_IN\", from the \"City\" Class to the \"State\" Class\n", "\n", "NeoSchema.create_class_relationship(from_class=\"City\", to_class=\"State\", rel_name=\"IS_IN\")" ] }, { "cell_type": "code", "execution_count": null, "id": "5063a4ae-57c1-4262-b8a6-4d2dff101da5", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "98ef1f4f-ddad-48a0-8717-1b0d3d2a9dff", "metadata": {}, "source": [ "### Now import some data\n", "We'll pass our data as Pandas data frames; those could easily be read in from CSV files, for example" ] }, { "cell_type": "code", "execution_count": 11, "id": "c6b117d0-c1c0-408b-b741-f2d771563e82", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
City IDname
01Berkeley
12Chicago
23San Francisco
34New York City
\n", "
" ], "text/plain": [ " City ID name\n", "0 1 Berkeley\n", "1 2 Chicago\n", "2 3 San Francisco\n", "3 4 New York City" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "city_df = pd.DataFrame({\"City ID\": [1, 2, 3, 4], \"name\": [\"Berkeley\", \"Chicago\", \"San Francisco\", \"New York City\"]})\n", "city_df" ] }, { "cell_type": "code", "execution_count": 12, "id": "65c8b7cd-5143-48f5-8805-46932f6b77c8", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
State IDname2-letter abbr
01CaliforniaCA
12IllinoisIL
23New YorkNY
\n", "
" ], "text/plain": [ " State ID name 2-letter abbr\n", "0 1 California CA\n", "1 2 Illinois IL\n", "2 3 New York NY" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "state_df = pd.DataFrame({\"State ID\": [1, 2, 3], \"name\": [\"California\", \"Illinois\", \"New York\"], \"2-letter abbr\": [\"CA\", \"IL\", \"NY\"]})\n", "state_df" ] }, { "cell_type": "code", "execution_count": 13, "id": "e6d8c9b5-1365-4124-aded-ea654cd05347", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
State IDCity ID
011
113
222
334
\n", "
" ], "text/plain": [ " State ID City ID\n", "0 1 1\n", "1 1 3\n", "2 2 2\n", "3 3 4" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# In this example, we assume a separate table (\"join table\") with the data about the relationships;\n", "# this would always be the case for many-to-many relationships; \n", "# 1-to-many relationships, like we have here, could also be stored differently\n", "state_city_links_df = pd.DataFrame({\"State ID\": [1, 1, 2, 3], \"City ID\": [1, 3, 2, 4]})\n", "state_city_links_df" ] }, { "cell_type": "markdown", "id": "f5605a74-a98f-47ea-9754-cfdcd948bd92", "metadata": {}, "source": [ "#### Note: those dataframes would often be read in from CSV files, with instruction such as\n", "#### city_df = pd.read_csv('D:\\my_path\\city_data_file.csv', encoding = \"ISO-8859-1\")" ] }, { "cell_type": "markdown", "id": "3b1f5975-d704-4ee0-88ef-b6e46dd68856", "metadata": {}, "source": [ "# Ingesting the Data Frames into the graph database is quite easy:" ] }, { "cell_type": "code", "execution_count": 14, "id": "a720e0be-71f2-45d2-b0f3-34ed18078b1e", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Getting ready to import 4 records...\n", " FINISHED importing a total of 4 records\n" ] }, { "data": { "text/plain": [ "[537, 538, 539, 540]" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "NeoSchema.import_pandas_nodes(df=city_df, class_node=\"City\")" ] }, { "cell_type": "code", "execution_count": 15, "id": "cf745a2b-762e-418f-aac3-e236aa809d7c", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Getting ready to import 3 records...\n", " FINISHED importing a total of 3 records\n" ] }, { "data": { "text/plain": [ "[541, 542, 543]" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "NeoSchema.import_pandas_nodes(df=state_df, class_node=\"State\")" ] }, { "cell_type": "code", "execution_count": 16, "id": "166fba7e-3ceb-4814-b8f9-dcc992617521", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Getting ready to import 4 links...\n", " FINISHED importing a total of 4 links\n" ] }, { "data": { "text/plain": [ "[647, 648, 649, 650]" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "NeoSchema.import_pandas_links(df=state_city_links_df,\n", " col_from=\"City ID\", col_to=\"State ID\", \n", " link_name=\"IS_IN\")" ] }, { "cell_type": "code", "execution_count": null, "id": "8c1218a2-1486-4155-be7b-4d14a2d616db", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "74a81403-5125-48a7-b82d-57abdb1b0227", "metadata": {}, "source": [ "_This is what we have created with our import:_" ] }, { "cell_type": "markdown", "id": "aa94393a-4217-4c21-bdac-d16a112eaac7", "metadata": {}, "source": [ "![Schema](../BrainAnnex/docs/schema_tutorial_2_import.jpg)" ] }, { "cell_type": "markdown", "id": "e7d01fd7-2b2f-4795-943a-46eb68e636b3", "metadata": {}, "source": [ "#### Notice that the data from that \"join table\" (with \"State ID\" and \"City ID\") that was used for the import, to link up states and cities, is now stored as RELATIONSHIPS - nativety represented in the graph database" ] }, { "cell_type": "code", "execution_count": null, "id": "a0a42625-ce49-473b-b12d-fd5422fd1275", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.10" } }, "nbformat": 4, "nbformat_minor": 5 }