{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## Objectives\n", "\n", "- install [Nexus Forge](https://nexus-forge.readthedocs.io/en/latest/),\n", "- configure a Knowledge Graph forge,\n", "- download the [MovieLens data](http://files.grouplens.org/datasets/movielens/)\n", "- transform data,\n", "- load the transformed data into the project,\n", "- search for data using a SPARQL query." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Prerequisites\n", "- This notebook requires you to have set up a project space in the [Blue Brain Nexus Sandbox](https://sandbox.bluebrainnexus.io/)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Installing Nexus Forge" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "!pip install nexusforge" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Import Libraries" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import getpass\n", "import yaml\n", "import pandas as pd\n", "import numpy as np\n", "import nexussdk as nxs\n", "from kgforge.core import KnowledgeGraphForge" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Setup\n", "\n", "Then you need to define the Nexus Sandbox API endpoint, as well as the organization and project configured in the first part of the tutorial.\n", "Please remember to change to the appropriate organization and project in the code below." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "ORGANIZATION = \"tutorialnexus\"\n", "PROJECT = \"mytutorial\" # Provide your project label here\n", "DEPLOYMENT = \"https://sandbox.bluebrainnexus.io/v1\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In order to authenticate yourself, go to the [Nexus Sandbox](https://sandbox.bluebrainnexus.io/) and copy your token. You can then run the following line of code and input the token:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "TOKEN = getpass.getpass() # Provide your Blue Brain Nexus token. It can be obtained after login into the sandbox: https://sandbox.bluebrainnexus.io" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As we will be using the MovieLens data, it's useful to describe the context for the entities that we want to import in the knowledge graph. Here's an example of how to define the context." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "context = {\n", " \"@id\": \"https://context.org\",\n", " \"@context\": {\n", " \"@vocab\": \"https://sandbox.bluebrainnexus.io/v1/vocabs/\",\n", " \"schema\": \"http://schema.org/\",\n", " \"Movie\": {\n", " \"@id\": \"schema:Movie\"\n", " },\n", " \"Rating\": {\n", " \"@id\": \"schema:Rating\"\n", " }\n", " }\n", "}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The next step is thus to push that context as a resource to your project (i.e. knowledge graph):" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "nxs.config.set_environment(DEPLOYMENT)\n", "nxs.config.set_token(TOKEN)\n", "nxs.resources.create(ORGANIZATION, PROJECT, context)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To let Nexus Forge work with the Nexus Delta store, we need a bit of configuration. Let's write the configuration." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "config = {\n", " \"Model\": {\n", " \"name\": \"RdfModel\",\n", " \"origin\": \"store\",\n", " \"source\": \"BlueBrainNexus\",\n", " \"context\": {\n", " \"iri\": \"https://context.org\",\n", " \"bucket\": f\"{ORGANIZATION}/{PROJECT}\"\n", " }\n", " },\n", " \"Store\": {\n", " \"name\": \"BlueBrainNexus\",\n", " \"endpoint\": DEPLOYMENT,\n", " \"versioned_id_template\": \"{x.id}?rev={x._store_metadata._rev}\",\n", " \"file_resource_mapping\": \"../../configurations/nexus-store/file-to-resource-mapping.hjson\",\n", " },\n", " \"Formatters\": {\n", " \"identifier\": \"https://movielens.org/{}/{}\"\n", " }\n", "}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now setup the Forge:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "forge = KnowledgeGraphForge(config, token=TOKEN, bucket=f\"{ORGANIZATION}/{PROJECT}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## MovieLens Data\n", "\n", "Let's download the MovieLens datasets and load the data in Python:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "# Download the data using curl and unzipping the file\n", "# Please note that the prefix '!' is meant to execute a shell command from inside a notebook, you will have to remove it if you do it from a terminal.\n", "!curl -s -O http://files.grouplens.org/datasets/movielens/ml-latest-small.zip && unzip -qq ml-latest-small.zip && cd ml-latest-small && ls \n", "\n", "directory = \"./ml-latest-small\" # Location of the files (from unzipping)\n", "\n", "movies_df = pd.read_csv(f\"{directory}/movies.csv\")\n", "ratings_df = pd.read_csv(f\"{directory}/ratings.csv\", dtype={\"movieId\":\"string\"})\n", "tags_df = pd.read_csv(f\"{directory}/tags.csv\", dtype={\"movieId\":\"string\"})\n", "links_df = pd.read_csv(f\"{directory}/links.csv\")\n", "\n", "movies_links_df = pd.merge(movies_df, links_df, on='movieId') # Merge movies and links" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Resources in Nexus Forge\n", "\n", "Let's define the types of our data frames:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "movies_links_df[\"type\"] = \"Movie\"\n", "ratings_df[\"type\"] = \"Rating\"\n", "tags_df[\"type\"] = \"Tag\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also apply some data transformations. We split the `genres`, and format the `Id`:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "movies_links_df[\"id\"] = movies_links_df[\"movieId\"].apply(lambda x: forge.format(\"identifier\", \"movies\", x))\n", "movies_links_df[\"genres\"] = movies_links_df[\"genres\"].apply(lambda x: x.split(\"|\"))\n", "ratings_df[\"movieId.id\"] = movies_links_df[\"movieId\"].apply(lambda x: forge.format(\"identifier\", \"movies\", x))\n", "tags_df[\"movieId.id\"] = tags_df[\"movieId\"].apply(lambda x: forge.format(\"identifier\", \"movies\", x))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, let's register these data frames as Forge resources:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "movies_resources = forge.from_dataframe(movies_links_df, np.nan, \".\")\n", "ratings_resources = forge.from_dataframe(ratings_df, np.nan, \".\")\n", "tags_resources = forge.from_dataframe(tags_df, np.nan, \".\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Visualize the results:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(movies_resources[0])\n", "print(ratings_resources[0])\n", "print(tags_resources[629])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Register Resources into Nexus Delta\n", "\n", "Now that we have the resources, let's push them to our Sandbox deployment:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "forge.register(movies_resources)\n", "forge.register(ratings_resources)\n", "forge.register(tags_resources)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "That's it! You can check your project in the web interface to see the newly created resources.\n", "\n", "## Query Resources with Nexus Forge from Nexus Delta\n", "\n", "As the resources are being indexed in the elasticsearch and blazegraph indices, that means that we can soon query those resources using a SPARQL query.\n", "\n", "> If you are new to SPARQL, that's ok, you can watch this [introduction video](https://www.youtube.com/watch?v=FvGndkpa4K0) before moving further.\n", "\n", "Let's list \"thought-provoking\" movies. First we write the query:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "query = \"\"\"\n", " PREFIX vocab: \n", " PREFIX nxv: \n", " SELECT DISTINCT ?id ?title\n", " WHERE {\n", " ?id nxv:self ?self ;\n", " nxv:deprecated false ;\n", " vocab:title ?title ;\n", " ^vocab:movieId / vocab:tag \"thought-provoking\" .\n", " }\n", "\"\"\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can then use the Forge to query the endpoint:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "resources = forge.sparql(query, limit=100, debug=True)\n", "set(forge.as_dataframe(resources).title)\n", "movie = forge.retrieve(resources[0].id)\n", "print(movie)\n", "movie._store_metadata" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you want to try some other examples of Nexus Forge, you can use these [notebooks](https://mybinder.org/v2/gh/BlueBrain/nexus-forge/master?filepath=examples%2Fnotebooks%2Fgetting-started).\n", "\n", "The next step is to use this query to create a Studio view in Nexus Fusion." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3.8.5 64-bit", "language": "python", "name": "python_defaultSpec_1597329146582" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.10" } }, "nbformat": 4, "nbformat_minor": 4 }