{ "cells": [ { "cell_type": "markdown", "id": "df6dbfb0-d649-481b-8e36-ca143c30e33d", "metadata": {}, "source": [ "# DISK narratives" ] }, { "cell_type": "markdown", "id": "e558df4e-77c1-4dfe-aa4e-e1d6961a086b", "metadata": {}, "source": [ "This notebook contains LLM experiments using the DISK data." ] }, { "cell_type": "markdown", "id": "4250d337-a39b-4723-8b4a-958f89089bc6", "metadata": {}, "source": [ "## RDF -> Neo4J" ] }, { "cell_type": "code", "execution_count": 1, "id": "9bf5a73b-b633-48c5-9788-919370016257", "metadata": {}, "outputs": [], "source": [ "from dotenv import load_dotenv\n", "import os\n", "import re\n", "from time import sleep\n", "from SPARQLWrapper import SPARQLWrapper, JSON\n", "\n", "# LangChain\n", "from langchain_community.graphs import Neo4jGraph\n", "from langchain_community.vectorstores import Neo4jVector\n", "from langchain_openai import OpenAIEmbeddings\n", "from langchain.chains import RetrievalQAWithSourcesChain\n", "from langchain_openai import ChatOpenAI" ] }, { "cell_type": "code", "execution_count": 2, "id": "a696b67b-db8a-41f9-ab53-6c64b70bc8b1", "metadata": {}, "outputs": [], "source": [ "# Load config for Fuseki and Neo4J\n", "load_dotenv('.env', override=True)\n", "FUSEKI_URI = os.getenv('FUSEKI_URI')\n", "FUSEKI_USERNAME = os.getenv('FUSEKI_USERNAME')\n", "FUSEKI_PASSWORD = os.getenv('FUSEKI_PASSWORD')\n", "NEO4J_URI = os.getenv('NEO4J_URI')\n", "NEO4J_USERNAME = os.getenv('NEO4J_USERNAME')\n", "NEO4J_PASSWORD = os.getenv('NEO4J_PASSWORD')\n", "NEO4J_DATABASE = os.getenv('NEO4J_DATABASE')\n", "OPENAI_API_KEY = os.getenv('OPENAI_API_KEY')" ] }, { "cell_type": "code", "execution_count": 3, "id": "e4d0e7e9-2c7d-4657-bccb-9fbe903ffc2b", "metadata": {}, "outputs": [], "source": [ "# Create SPARQL wrapper\n", "sparql = SPARQLWrapper(FUSEKI_URI)\n", "if FUSEKI_USERNAME and FUSEKI_PASSWORD:\n", "    sparql.setCredentials(FUSEKI_USERNAME, FUSEKI_PASSWORD)\n", 
"sparql.setReturnFormat(JSON)" ] }, { "cell_type": "code", "execution_count": 4, "id": "36f083a1-615b-459b-a16d-f962e07ff233", "metadata": {}, "outputs": [], "source": [ "# Create Neo4J wrapper\n", "neo4j = Neo4jGraph(\n", "    url=NEO4J_URI, username=NEO4J_USERNAME, password=NEO4J_PASSWORD, database=NEO4J_DATABASE\n", ")" ] }, { "cell_type": "code", "execution_count": 5, "id": "969b86e8-b122-4ea1-97b9-e52e33341f16", "metadata": {}, "outputs": [], "source": [ "# Some constants\n", "PREFIXES = \"\"\"PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>\n", "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\"\"\"" ] }, { "cell_type": "markdown", "id": "765403d2-4384-4305-82ca-aa8d7dc8cece", "metadata": {}, "source": [ "### Question templates" ] }, { "cell_type": "code", "execution_count": 6, "id": "31d1f326-497e-490c-b5db-afcb304c2e89", "metadata": {}, "outputs": [], "source": [ "sparql.setQuery(PREFIXES + \"\"\"\n", "SELECT DISTINCT * FROM WHERE {\n", " ?QuestionId a ;\n", " \trdfs:label ?QuestionName ;\n", " \t ?questionTemplate ;\n", "}\"\"\")\n", "results = sparql.query().convert()" ] }, { "cell_type": "code", "execution_count": 7, "id": "3b3d897a-4b12-42b2-ba04-31df898de3b0", "metadata": {}, "outputs": [], "source": [ "# Create a dict to store question templates.\n", "questions = {}\n", "for item in results['results']['bindings']:\n", "    questions[item['QuestionId']['value']] = {}\n", "    questions[item['QuestionId']['value']]['id'] = item['QuestionId']['value']\n", "    questions[item['QuestionId']['value']]['name'] = item['QuestionName']['value']\n", "    questions[item['QuestionId']['value']]['template'] = item['questionTemplate']['value']" ] }, { "cell_type": "code", "execution_count": 8, "id": "11e70c6f-2469-4afa-bdc2-3e30c5278158", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[{'question.name': 'Is the effect size of ?Genotype in ?Brain Imaging Trait associated with ?Demographic Attribute?'}]\n", "[{'question.name': 'Is ?Brain Characteristic associated with ?Neurological Disorder in comparison to 
healthy controls?'}]\n", "[{'question.name': 'Is the effect size of ?Genotype on ?Brain Imaging Trait of ?Region associated with ?Demographic Attribute?'}]\n", "[{'question.name': 'What is the effect size of ?Genotype on ?Region ?Brain Imaging Trait?'}]\n", "[{'question.name': 'Is the effect size of ?Genotype on ?Brain Imaging Trait of ?Region associated with ?Demographic Attribute for cohorts groups filtered by ?Criterion for ?Value?'}]\n", "[{'question.name': 'What is the effect size of ?Genotype on ?Region ?Brain Imaging Trait for cohorts groups filtered by ?Criterion for ?Value?'}]\n" ] }, { "data": { "text/plain": [ "[]" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "add_query = \"\"\"\n", "MERGE(question:Question {id: $question.id})\n", " ON CREATE SET \n", " question.name = $question.name,\n", " question.template = $question.template\n", "RETURN question.name\"\"\"\n", "\n", "for c in questions:\n", " r = neo4j.query(add_query, params={'question':questions[c]})\n", " print(r)\n", "\n", "neo4j.query(\"\"\"\n", "CREATE CONSTRAINT unique_question IF NOT EXISTS \n", " FOR (q:Question) REQUIRE q.id IS UNIQUE\n", "\"\"\")" ] }, { "cell_type": "code", "execution_count": 9, "id": "e0e383af-31fb-4207-92f6-56f1f5023ea7", "metadata": {}, "outputs": [], "source": [ "# DELETE:\n", "#neo4j.query(\"\"\"MATCH (q:Question) DELETE q\"\"\")" ] }, { "cell_type": "markdown", "id": "f115eb9f-30b5-4070-9b35-6079c45cdf56", "metadata": {}, "source": [ "### Goals" ] }, { "cell_type": "code", "execution_count": 10, "id": "99c373bd-6cee-4cf3-b664-be85ea7cc7d0", "metadata": {}, "outputs": [], "source": [ "# Goals\n", "sparql.setQuery(PREFIXES + \"\"\"\n", "SELECT DISTINCT * FROM WHERE {\n", " ?GoalId a ;\n", "\t\t ?GoalName ;\n", " \t\t ?Description ;\n", "\t\t ?Created ;\n", " ?QuestionId .\n", " optional { ?GoalId ?Modified }\n", "}\n", "\"\"\")\n", "goal_results = sparql.query().convert()" ] }, { "cell_type": "code", "execution_count": 
11, "id": "9aee8d48-3dc1-4add-b6e2-9360367c7dd1", "metadata": {}, "outputs": [], "source": [ "# Store goals; each goal is linked to its question by question_id\n", "goals = {}\n", "for item in goal_results['results']['bindings']:\n", "    goals[item['GoalId']['value']] = {}\n", "    goals[item['GoalId']['value']]['id'] = item['GoalId']['value']\n", "    goals[item['GoalId']['value']]['name'] = item['GoalName']['value']\n", "    goals[item['GoalId']['value']]['description'] = item['Description']['value']\n", "    goals[item['GoalId']['value']]['date_created'] = item['Created']['value']\n", "    goals[item['GoalId']['value']]['question_id'] = item['QuestionId']['value']\n", "    # Unbound OPTIONAL variables are omitted from SPARQL JSON results, so check the variable name\n", "    if 'Modified' in item:\n", "        goals[item['GoalId']['value']]['date_modified'] = item['Modified']['value']" ] }, { "cell_type": "code", "execution_count": 12, "id": "116cfd96-a234-4459-8113-17c1ea3a62fc", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[{'goal.name': \"Is the effect size of the association between SNP rs1080066 and the Surface Area of the Precentral gyrus associated with a cohort's mean age in cohorts of European Ancestry?\"}]\n", "[{'relationship': ({}, 'hasQuestion', {})}]\n", "[{'goal.name': 'What is the Effect Size of rs1080066 on Precental Cortex Surface Area for cohorts groups of European ancestry'}]\n", "[{'relationship': ({}, 'hasQuestion', {})}]\n" ] }, { "data": { "text/plain": [ "[]" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "add_query = \"\"\"\n", "MERGE(goal:Goal {id: $goal.id})\n", " ON CREATE SET \n", " goal.name = $goal.name,\n", " goal.description = $goal.description,\n", " goal.date_created = $goal.date_created,\n", " goal.date_modified = $goal.date_modified\n", "RETURN goal.name\"\"\"\n", "\n", "merge_query = \"\"\"\n", "MATCH (q:Question {id: $question.id}), (g:Goal {id: $goal.id})\n", "MERGE (g)-[relationship:hasQuestion]->(q)\n", "RETURN relationship\"\"\"\n", "\n", "for g in goals:\n", " q = 
questions[goals[g]['question_id']]\n", " r = neo4j.query(add_query, params={'goal':goals[g]})\n", " print(r)\n", " r = neo4j.query(merge_query, params={'question':q, 'goal':goals[g]})\n", " print(r)\n", "\n", "neo4j.query(\"\"\"CREATE CONSTRAINT unique_goal IF NOT EXISTS FOR (g:Goal) REQUIRE g.id IS UNIQUE\"\"\")" ] }, { "cell_type": "code", "execution_count": 13, "id": "e199d625-06d0-44a3-a87f-74d0ec937a2e", "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[{'b.id': 'http://localhost:8080/disk-project-server/goals/Goal-XBkQmDYmJAn0/bindings/genotype-Ky8zLkhrWpVO'}]\n", "[{'relationship': ({}, 'hasQuestionBinding', {})}]\n", "[{'b.id': 'http://localhost:8080/disk-project-server/goals/Goal-XBkQmDYmJAn0/bindings/brainImagingTrait-SUjM0MeseHxB'}]\n", "[{'relationship': ({}, 'hasQuestionBinding', {})}]\n", "[{'b.id': 'http://localhost:8080/disk-project-server/goals/Goal-XBkQmDYmJAn0/bindings/CriterionValue-Y13ptyhmjJd6'}]\n", "[{'relationship': ({}, 'hasQuestionBinding', {})}]\n", "[{'b.id': 'http://localhost:8080/disk-project-server/goals/Goal-XBkQmDYmJAn0/bindings/demographicAttribute-Y5l3ZuoYITux'}]\n", "[{'relationship': ({}, 'hasQuestionBinding', {})}]\n", "[{'b.id': 'http://localhost:8080/disk-project-server/goals/Goal-XBkQmDYmJAn0/bindings/brainRegion-AUfzTg0TjlL4'}]\n", "[{'relationship': ({}, 'hasQuestionBinding', {})}]\n", "[{'b.id': 'http://localhost:8080/disk-project-server/goals/Goal-XBkQmDYmJAn0/bindings/Criterion-1GtBXBML6uw3'}]\n", "[{'relationship': ({}, 'hasQuestionBinding', {})}]\n", "[{'b.id': 'http://localhost:8080/disk-project-server/goals/Goal-7l4AS1WcMhyh/bindings/genotype-jqrCiLBJ2zji'}]\n", "[{'relationship': ({}, 'hasQuestionBinding', {})}]\n", "[{'b.id': 'http://localhost:8080/disk-project-server/goals/Goal-7l4AS1WcMhyh/bindings/brainRegion-jVSd824Nlovz'}]\n", "[{'relationship': ({}, 'hasQuestionBinding', {})}]\n", "[{'b.id': 
'http://localhost:8080/disk-project-server/goals/Goal-7l4AS1WcMhyh/bindings/brainImagingTrait-TrONS5Jvmpgv'}]\n", "[{'relationship': ({}, 'hasQuestionBinding', {})}]\n", "[{'b.id': 'http://localhost:8080/disk-project-server/goals/Goal-7l4AS1WcMhyh/bindings/Criterion-Qvoq1zG8ZWr3'}]\n", "[{'relationship': ({}, 'hasQuestionBinding', {})}]\n", "[{'b.id': 'http://localhost:8080/disk-project-server/goals/Goal-7l4AS1WcMhyh/bindings/CriterionValue-qQr8gCuTtMA8'}]\n", "[{'relationship': ({}, 'hasQuestionBinding', {})}]\n" ] }, { "data": { "text/plain": [ "[]" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Load the variable bindings of each goal's question\n", "add_binding_query = \"\"\"\n", "MERGE(b:Binding {id: $binding.id})\n", " ON CREATE SET \n", " b.variable = $binding.variable,\n", " b.value = $binding.value,\n", " b.type = $binding.type\n", "RETURN b.id\"\"\"\n", "\n", "merge_goal_binding_query = \"\"\"\n", "MATCH (goal:Goal {id: $goal.id}), (binding:Binding {id: $binding.id})\n", "MERGE (goal)-[relationship:hasQuestionBinding]->(binding)\n", "RETURN relationship\"\"\"\n", "\n", "for g in goals:\n", "    goal = goals[g]\n", "    sparql.setQuery(PREFIXES + \"\"\"\n", "SELECT DISTINCT ?Binding ?Variable ?Value ?Type FROM WHERE {\n", " ?Goal ?Binding .\n", " ?Binding \t ?Variable ;\n", " \t\t\t ?Value ;\n", " \t\t ?Type .\n", " VALUES ?Goal { <\"\"\" + goal['id'] + \"> }}\")\n", "    r = sparql.query().convert()\n", "    for item in r['results']['bindings']:\n", "        binding = {}\n", "        binding['id'] = item['Binding']['value']\n", "        binding['variable'] = item['Variable']['value']\n", "        binding['value'] = item['Value']['value']\n", "        binding['type'] = item['Type']['value']\n", "        out = neo4j.query(add_binding_query, params={'binding':binding})\n", "        print(out)\n", "        out = neo4j.query(merge_goal_binding_query, params={'binding':binding, 'goal':goal})\n", "        print(out)\n", "\n", "neo4j.query(\"\"\"CREATE CONSTRAINT unique_binding IF NOT EXISTS FOR 
(l:Binding) REQUIRE l.id IS UNIQUE\"\"\")" ] }, { "cell_type": "code", "execution_count": 14, "id": "f3fdd73f-31b5-4e1a-b93c-18b60fc933be", "metadata": {}, "outputs": [], "source": [ "# DELETE (DETACH DELETE also removes the goals' relationships):\n", "#neo4j.query(\"\"\"MATCH (g:Goal) DETACH DELETE g\"\"\")" ] }, { "cell_type": "markdown", "id": "7963d2a5-f77b-4a30-a8cd-0a19972c3d3c", "metadata": {}, "source": [ "### Lines of Inquiry" ] }, { "cell_type": "code", "execution_count": 15, "id": "22d3ebd3-6d6e-4a4d-ae6b-923cab41899c", "metadata": {}, "outputs": [], "source": [ "# Lines of Inquiry\n", "sparql.setQuery(PREFIXES + \"\"\"\n", "SELECT DISTINCT * FROM WHERE {\n", " ?LOI a ;\n", " ?Name ;\n", " ?Description ;\n", " ?QuestionId ;\n", " ?Created ;\n", " ?Seed ;\n", " ?DataQuery .\n", " ?DataQuery ?QueryTemplate . \n", " optional { ?LOI ?Modified }\n", "}\"\"\")\n", "loi_results = sparql.query().convert()" ] }, { "cell_type": "code", "execution_count": 16, "id": "2b6d2721-c2a6-47fb-8a3a-6eff95a63f1a", "metadata": {}, "outputs": [], "source": [ "# Store LOIs; each LOI is linked to its question by question_id\n", "lois = {}\n", "for item in loi_results['results']['bindings']:\n", "    lois[item['LOI']['value']] = {}\n", "    lois[item['LOI']['value']]['id'] = item['LOI']['value']\n", "    lois[item['LOI']['value']]['name'] = item['Name']['value']\n", "    lois[item['LOI']['value']]['description'] = item['Description']['value']\n", "    lois[item['LOI']['value']]['date_created'] = item['Created']['value']\n", "    lois[item['LOI']['value']]['question_id'] = item['QuestionId']['value']\n", "    lois[item['LOI']['value']]['query_template'] = item['QueryTemplate']['value']\n", "    lois[item['LOI']['value']]['seed_id'] = item['Seed']['value'] # seed details are loaded in the Workflow seeds section\n", "    if 'Modified' in item:\n", "        lois[item['LOI']['value']]['date_modified'] = item['Modified']['value']" ] }, { "cell_type": "code", "execution_count": 17, "id": 
"c1705540-dc9d-4edd-a21e-4696a6d0664e", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[{'loi.name': 'Meta regression with a filter'}]\n", "[{'relationship': ({}, 'hasQuestion', {})}]\n", "[{'loi.name': 'Meta analysis'}]\n", "[{'relationship': ({}, 'hasQuestion', {})}]\n" ] }, { "data": { "text/plain": [ "[]" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "add_loi_query = \"\"\"\n", "MERGE(loi:LineOfInquiry {id: $loi.id})\n", " ON CREATE SET \n", " loi.name = $loi.name,\n", " loi.description = $loi.description,\n", " loi.date_created = $loi.date_created,\n", " loi.date_modified = $loi.date_modified,\n", " loi.query_template = $loi.query_template\n", "RETURN loi.name\"\"\"\n", "\n", "merge_loi_query = \"\"\"\n", "MATCH (q:Question {id: $question.id}), (loi:LineOfInquiry {id: $loi.id})\n", "MERGE (loi)-[relationship:hasQuestion]->(q)\n", "RETURN relationship\"\"\"\n", "\n", "for loi in lois:\n", "    q = questions[lois[loi]['question_id']]\n", "    r = neo4j.query(add_loi_query, params={'loi':lois[loi]})\n", "    print(r)\n", "    r = neo4j.query(merge_loi_query, params={'question':q, 'loi':lois[loi]})\n", "    print(r)\n", "\n", "neo4j.query(\"\"\"CREATE CONSTRAINT unique_loi IF NOT EXISTS FOR (l:LineOfInquiry) REQUIRE l.id IS UNIQUE\"\"\")" ] }, { "cell_type": "markdown", "id": "b5169489-2a11-4514-9cfe-ababaf31a21b", "metadata": {}, "source": [ "### Workflow seeds" ] }, { "cell_type": "code", "execution_count": 18, "id": "d234d910-0898-4f93-9f4e-5198a2a9ef9c", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[{'wfs.id': 'http://localhost:8080/disk-project-server/lois/LOI-Q7zw0HsrUwwD/seeds/-wbAoiLMRuJ3f'}]\n", "[{'relationship': ({}, 'hasWorkflowSeed', {})}]\n", "[{'wfs.id': 'http://localhost:8080/disk-project-server/lois/LOI-pEJwIhcWNTDS/seeds/-cTAZIDs8XfzM'}]\n", "[{'relationship': ({}, 'hasWorkflowSeed', {})}]\n" ] }, { "data": { "text/plain": [ "[]" ] }, 
"execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Load Workflow seeds for LOIs\n", "add_seed_query = \"\"\"\n", "MERGE(wfs:WorkflowSeed {id: $seed.id})\n", " ON CREATE SET \n", " wfs.name = $seed.name,\n", " wfs.description = $seed.description\n", "RETURN wfs.id\"\"\"\n", "\n", "merge_loi_seed_query = \"\"\"\n", "MATCH (loi:LineOfInquiry {id: $loi.id}), (seed:WorkflowSeed {id: $seed.id})\n", "MERGE (loi)-[relationship:hasWorkflowSeed]->(seed)\n", "RETURN relationship\"\"\"\n", "\n", "seeds = {}\n", "for i in lois:\n", "    loi = lois[i]\n", "    seed_id = loi['seed_id']\n", "    sparql.setQuery(PREFIXES + \"\"\"\n", "SELECT DISTINCT * FROM WHERE {\n", " ?Seed ?Name ;\n", " ?Description ;\n", " VALUES ?Seed { <\"\"\" + seed_id + \"> }}\")\n", "    r = sparql.query().convert()\n", "    if len(r['results']['bindings']) > 0:\n", "        item = r['results']['bindings'][0]\n", "        seed = {}\n", "        seed['id'] = seed_id\n", "        seed['name'] = item['Name']['value']\n", "        seed['description'] = item['Description']['value']\n", "        seeds[seed_id] = seed\n", "        out = neo4j.query(add_seed_query, params={'seed':seed})\n", "        print(out)\n", "        out = neo4j.query(merge_loi_seed_query, params={'seed':seed, 'loi':loi})\n", "        print(out)\n", "\n", "neo4j.query(\"\"\"CREATE CONSTRAINT unique_seed IF NOT EXISTS FOR (l:WorkflowSeed) REQUIRE l.id IS UNIQUE\"\"\")" ] }, { "cell_type": "code", "execution_count": 19, "id": "f0c00115-86bc-448a-b874-27d3f33c0cfc", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[{'b.id': 'http://localhost:8080/disk-project-server/lois/LOI-Q7zw0HsrUwwD/seeds/-wbAoiLMRuJ3f/bindings/area-S2u4Gd3XADjg'}]\n", "[{'relationship': ({}, 'hasParameter', {})}]\n", "[{'b.id': 'http://localhost:8080/disk-project-server/lois/LOI-Q7zw0HsrUwwD/seeds/-wbAoiLMRuJ3f/bindings/demographic-XfwdBufvw0y2'}]\n", "[{'relationship': ({}, 'hasParameter', {})}]\n", "[{'b.id': 
'http://localhost:8080/disk-project-server/lois/LOI-Q7zw0HsrUwwD/seeds/-wbAoiLMRuJ3f/bindings/demographic_value-HncGxEoq2MIs'}]\n", "[{'relationship': ({}, 'hasParameter', {})}]\n", "[{'b.id': 'http://localhost:8080/disk-project-server/lois/LOI-Q7zw0HsrUwwD/seeds/-wbAoiLMRuJ3f/bindings/snp-YCVVj9h2OlNr'}]\n", "[{'relationship': ({}, 'hasParameter', {})}]\n", "[{'b.id': 'http://localhost:8080/disk-project-server/lois/LOI-Q7zw0HsrUwwD/seeds/-wbAoiLMRuJ3f/bindings/trait-myGIDu70W593'}]\n", "[{'relationship': ({}, 'hasParameter', {})}]\n", "[{'b.id': 'http://localhost:8080/disk-project-server/lois/LOI-Q7zw0HsrUwwD/seeds/-wbAoiLMRuJ3f/bindings/cohortData-uaGNGjoUbclR'}]\n", "[{'relationship': ({}, 'hasInput', {})}]\n", "[{'b.id': 'http://localhost:8080/disk-project-server/lois/LOI-pEJwIhcWNTDS/seeds/-cTAZIDs8XfzM/bindings/area-O8o2iLLy8xAa'}]\n", "[{'relationship': ({}, 'hasParameter', {})}]\n", "[{'b.id': 'http://localhost:8080/disk-project-server/lois/LOI-pEJwIhcWNTDS/seeds/-cTAZIDs8XfzM/bindings/snp-sOLb1IxtU9CP'}]\n", "[{'relationship': ({}, 'hasParameter', {})}]\n", "[{'b.id': 'http://localhost:8080/disk-project-server/lois/LOI-pEJwIhcWNTDS/seeds/-cTAZIDs8XfzM/bindings/trait-EoNSTe2xROPj'}]\n", "[{'relationship': ({}, 'hasParameter', {})}]\n", "[{'b.id': 'http://localhost:8080/disk-project-server/lois/LOI-pEJwIhcWNTDS/seeds/-cTAZIDs8XfzM/bindings/demographic_value-E7sbHmUfAS54'}]\n", "[{'relationship': ({}, 'hasParameter', {})}]\n", "[{'b.id': 'http://localhost:8080/disk-project-server/lois/LOI-pEJwIhcWNTDS/seeds/-cTAZIDs8XfzM/bindings/cohortData-yBhLXX1ggMJW'}]\n", "[{'relationship': ({}, 'hasInput', {})}]\n" ] } ], "source": [ "# Load Bindings for wf seeds\n", "merge_seed_parameter_query = \"\"\"\n", "MATCH (seed:WorkflowSeed {id: $seed.id}), (binding:Binding {id: $binding.id})\n", "MERGE (seed)-[relationship:hasParameter]->(binding)\n", "RETURN relationship\"\"\"\n", "merge_seed_input_query = \"\"\"\n", "MATCH (seed:WorkflowSeed {id: $seed.id}), (binding:Binding 
{id: $binding.id})\n", "MERGE (seed)-[relationship:hasInput]->(binding)\n", "RETURN relationship\"\"\"\n", "\n", "for i in seeds:\n", "    seed = seeds[i]\n", "    seed_id = seed['id']\n", "    for t in ['hasParameter', 'hasInput']:\n", "        sparql_query = PREFIXES + \"\"\"\n", "SELECT DISTINCT * FROM WHERE {\n", " ?Seed ?Binding .\n", " ?Binding ?Variable ;\n", " ?Value ;\n", " ?Type .\n", " VALUES ?Seed { <\"\"\" + seed_id + \"> }}\"\n", "        sparql.setQuery(sparql_query)\n", "        r = sparql.query().convert()\n", "        for item in r['results']['bindings']:\n", "            binding = {}\n", "            binding['id'] = item['Binding']['value']\n", "            binding['variable'] = item['Variable']['value']\n", "            binding['value'] = item['Value']['value']\n", "            binding['type'] = item['Type']['value']\n", "            out = neo4j.query(add_binding_query, params={'binding':binding})\n", "            print(out)\n", "            if t == 'hasParameter':\n", "                out = neo4j.query(merge_seed_parameter_query, params={'binding':binding, 'seed':seed})\n", "            else:\n", "                out = neo4j.query(merge_seed_input_query, params={'binding':binding, 'seed':seed})\n", "            print(out)" ] }, { "cell_type": "markdown", "id": "a7d18389-3f13-49e0-8e5c-0ca07601a906", "metadata": {}, "source": [ "### Triggered lines of inquiry" ] }, { "cell_type": "code", "execution_count": 20, "id": "312ab734-6b96-456b-8565-5a649b2187e8", "metadata": {}, "outputs": [], "source": [ "# Triggered Lines of Inquiry\n", "sparql.setQuery(PREFIXES + \"\"\"\n", "SELECT DISTINCT * FROM WHERE {\n", " ?TLOI a ;\n", " ?Name ;\n", " ?Description ;\n", " ?QuestionId ;\n", "\t ?LoiId ;\n", " ?GoalId ;\n", " ?Created ;\n", " ?Inst ;\n", "\t ?Status;\n", " ?QueryResults .\n", " ?QueryResults ?Query ;\n", " ?QueryTemplate ;\n", " ?Result\n", " optional { ?TLOI ?Modified }\n", "}\"\"\")\n", "results = sparql.query().convert()" ] }, { "cell_type": "code", "execution_count": 21, "id": "c2af6cf9-1f79-4bb3-982a-dfa54f7a7c17", "metadata": {}, "outputs": [], "source": [ "# Store TLOIs; each TLOI is linked to a goal and a line of inquiry.\n", "tlois = 
{}\n", "for item in results['results']['bindings']:\n", "    tlois[item['TLOI']['value']] = {}\n", "    tlois[item['TLOI']['value']]['id'] = item['TLOI']['value']\n", "    tlois[item['TLOI']['value']]['name'] = item['Name']['value']\n", "    tlois[item['TLOI']['value']]['description'] = item['Description']['value']\n", "    tlois[item['TLOI']['value']]['date_created'] = item['Created']['value']\n", "    tlois[item['TLOI']['value']]['question_id'] = item['QuestionId']['value']\n", "    tlois[item['TLOI']['value']]['loi_id'] = item['LoiId']['value']\n", "    tlois[item['TLOI']['value']]['goal_id'] = item['GoalId']['value']\n", "    tlois[item['TLOI']['value']]['status'] = item['Status']['value']\n", "    tlois[item['TLOI']['value']]['inst'] = item['Inst']['value']\n", "    tlois[item['TLOI']['value']]['query_template'] = item['QueryTemplate']['value']\n", "    tlois[item['TLOI']['value']]['query'] = item['Query']['value']\n", "    tlois[item['TLOI']['value']]['query_response'] = item['Result']['value']\n", "    if 'Modified' in item:\n", "        tlois[item['TLOI']['value']]['date_modified'] = item['Modified']['value']" ] }, { "cell_type": "code", "execution_count": 22, "id": "593f1510-031a-440e-bb0c-3b307d16f6ff", "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[{'tloi.name': 'Triggered: Meta regression with ancestry'}]\n", "[{'relationship': ({}, 'hasQuestion', {})}]\n", "[{'relationship': ({}, 'hasLineOfInquiry', {})}]\n", "[{'relationship': ({}, 'hasGoal', {})}]\n", "[{'tloi.name': 'Triggered: Meta regression with ancestry'}]\n", "[{'relationship': ({}, 'hasQuestion', {})}]\n", "[{'relationship': ({}, 'hasLineOfInquiry', {})}]\n", "[{'relationship': ({}, 'hasGoal', {})}]\n", "[{'tloi.name': 'Triggered: Meta regression with ancestry'}]\n", "[{'relationship': ({}, 'hasQuestion', {})}]\n", "[{'relationship': ({}, 'hasLineOfInquiry', {})}]\n", "[{'relationship': ({}, 'hasGoal', {})}]\n", "[{'tloi.name': 'Triggered: Meta regression with a filter'}]\n", 
"[{'relationship': ({}, 'hasQuestion', {})}]\n", "[{'relationship': ({}, 'hasLineOfInquiry', {})}]\n", "[{'relationship': ({}, 'hasGoal', {})}]\n", "[{'tloi.name': 'Triggered: Meta regression with a filter'}]\n", "[{'relationship': ({}, 'hasQuestion', {})}]\n", "[{'relationship': ({}, 'hasLineOfInquiry', {})}]\n", "[{'relationship': ({}, 'hasGoal', {})}]\n", "[{'tloi.name': 'Triggered: Meta regression with a filter'}]\n", "[{'relationship': ({}, 'hasQuestion', {})}]\n", "[{'relationship': ({}, 'hasLineOfInquiry', {})}]\n", "[{'relationship': ({}, 'hasGoal', {})}]\n", "[{'tloi.name': 'Triggered: Meta regression with a filter'}]\n", "[{'relationship': ({}, 'hasQuestion', {})}]\n", "[{'relationship': ({}, 'hasLineOfInquiry', {})}]\n", "[{'relationship': ({}, 'hasGoal', {})}]\n" ] }, { "ename": "KeyError", "evalue": "'http://localhost:8080/disk-project-server/goals/Goal-DVcUWW5xZFXX'", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mKeyError\u001b[0m Traceback (most recent call last)", "Cell \u001b[0;32mIn[22], line 33\u001b[0m\n\u001b[1;32m 31\u001b[0m q \u001b[38;5;241m=\u001b[39m questions[tlois[tloi][\u001b[38;5;124m'\u001b[39m\u001b[38;5;124mquestion_id\u001b[39m\u001b[38;5;124m'\u001b[39m]]\n\u001b[1;32m 32\u001b[0m loi \u001b[38;5;241m=\u001b[39m lois[tlois[tloi][\u001b[38;5;124m'\u001b[39m\u001b[38;5;124mloi_id\u001b[39m\u001b[38;5;124m'\u001b[39m]]\n\u001b[0;32m---> 33\u001b[0m goal \u001b[38;5;241m=\u001b[39m \u001b[43mgoals\u001b[49m\u001b[43m[\u001b[49m\u001b[43mtlois\u001b[49m\u001b[43m[\u001b[49m\u001b[43mtloi\u001b[49m\u001b[43m]\u001b[49m\u001b[43m[\u001b[49m\u001b[38;5;124;43m'\u001b[39;49m\u001b[38;5;124;43mgoal_id\u001b[39;49m\u001b[38;5;124;43m'\u001b[39;49m\u001b[43m]\u001b[49m\u001b[43m]\u001b[49m\n\u001b[1;32m 34\u001b[0m r \u001b[38;5;241m=\u001b[39m neo4j\u001b[38;5;241m.\u001b[39mquery(add_tloi_query, 
params\u001b[38;5;241m=\u001b[39m{\u001b[38;5;124m'\u001b[39m\u001b[38;5;124mtloi\u001b[39m\u001b[38;5;124m'\u001b[39m:tlois[tloi]})\n\u001b[1;32m 35\u001b[0m \u001b[38;5;28mprint\u001b[39m(r)\n", "\u001b[0;31mKeyError\u001b[0m: 'http://localhost:8080/disk-project-server/goals/Goal-DVcUWW5xZFXX'" ] } ], "source": [ "add_tloi_query = \"\"\"\n", "MERGE(tloi:TriggeredLineOfInquiry {id: $tloi.id})\n", " ON CREATE SET \n", " tloi.name = $tloi.name,\n", " tloi.description = $tloi.description,\n", " tloi.source = $tloi.id,\n", " tloi.status = $tloi.status,\n", " tloi.date_created = $tloi.date_created,\n", " tloi.date_modified = $tloi.date_modified,\n", " tloi.query = $tloi.query,\n", " tloi.query_template = $tloi.query_template,\n", " tloi.query_response = $tloi.query_response\n", "RETURN tloi.name\"\"\"\n", "\n", "merge_tloi_question_query = \"\"\"\n", "MATCH (q:Question {id: $question.id}), (tloi:TriggeredLineOfInquiry {id: $tloi.id})\n", "MERGE (tloi)-[relationship:hasQuestion]->(q)\n", "RETURN relationship\"\"\"\n", "\n", "merge_tloi_loi_query = \"\"\"\n", "MATCH (loi:LineOfInquiry {id: $loi.id}), (tloi:TriggeredLineOfInquiry {id: $tloi.id})\n", "MERGE (tloi)-[relationship:hasLineOfInquiry]->(loi)\n", "RETURN relationship\"\"\"\n", "\n", "merge_tloi_goal_query = \"\"\"\n", "MATCH (g:Goal {id: $goal.id}), (tloi:TriggeredLineOfInquiry {id: $tloi.id})\n", "MERGE (tloi)-[relationship:hasGoal]->(g)\n", "RETURN relationship\"\"\"\n", "\n", "for tloi in tlois:\n", "    q = questions[tlois[tloi]['question_id']]\n", "    loi = lois[tlois[tloi]['loi_id']]\n", "    # A TLOI may reference a goal that is missing from `goals`; skip that link instead of raising KeyError\n", "    goal = goals.get(tlois[tloi]['goal_id'])\n", "    r = neo4j.query(add_tloi_query, params={'tloi':tlois[tloi]})\n", "    print(r)\n", "    r = neo4j.query(merge_tloi_question_query, params={'question':q, 'tloi':tlois[tloi]})\n", "    print(r)\n", "    r = neo4j.query(merge_tloi_loi_query, params={'loi':loi, 'tloi':tlois[tloi]})\n", "    print(r)\n", "    if goal is None:\n", "        print(\"Goal not found for \" + tlois[tloi]['goal_id'])\n", "    else:\n", "        r = neo4j.query(merge_tloi_goal_query, params={'goal':goal, 'tloi':tlois[tloi]})\n", "        print(r)\n", 
"\n", "neo4j.query(\"\"\"CREATE CONSTRAINT unique_tloi IF NOT EXISTS FOR (l:TriggeredLineOfInquiry) REQUIRE l.id IS UNIQUE\"\"\")" ] }, { "cell_type": "markdown", "id": "d27d63b1-b051-4808-b8cf-180662472778", "metadata": {}, "source": [ "### Workflow instantiations" ] }, { "cell_type": "code", "execution_count": 23, "id": "7d4b3ff0-058f-49fc-98a2-5ae6edad5e43", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[{'inst.name': 'Meta-Regression'}]\n", "[{'relationship': ({}, 'hasWorkflowInstantiation', {})}]\n", "[{'inst.name': 'Meta-Regression'}]\n", "[{'relationship': ({}, 'hasWorkflowInstantiation', {})}]\n", "[{'inst.name': 'Meta-Regression'}]\n", "[{'relationship': ({}, 'hasWorkflowInstantiation', {})}]\n", "[{'inst.name': 'Meta-Regression'}]\n", "[{'relationship': ({}, 'hasWorkflowInstantiation', {})}]\n", "[{'inst.name': 'Meta-Regression'}]\n", "[{'relationship': ({}, 'hasWorkflowInstantiation', {})}]\n", "[{'inst.name': 'Meta-Regression'}]\n", "[{'relationship': ({}, 'hasWorkflowInstantiation', {})}]\n", "[{'inst.name': 'Meta-Regression'}]\n", "[{'relationship': ({}, 'hasWorkflowInstantiation', {})}]\n", "[{'inst.name': 'Meta-Regression'}]\n", "[]\n" ] }, { "data": { "text/plain": [ "[]" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "add_inst_query = \"\"\"\n", "MERGE(inst:WorkflowInstantiation {id: $inst.id})\n", " ON CREATE SET \n", " inst.name = $inst.name,\n", " inst.description = $inst.description,\n", " inst.status = $inst.status,\n", " inst.workflow_link = $inst.workflow_link\n", "RETURN inst.name\"\"\"\n", "\n", "merge_tloi_inst_query = \"\"\"\n", "MATCH (wfc:WorkflowInstantiation {id: $inst.id}), (tloi:TriggeredLineOfInquiry {id: $tloi.id})\n", "MERGE (tloi)-[relationship:hasWorkflowInstantiation]->(wfc)\n", "RETURN relationship\"\"\"\n", "\n", "# Load the workflow instantiations of the loaded TLOIs\n", "instantiations = {}\n", "for tid in tlois:\n", "    tloi = tlois[tid]\n", "    inst_id = 
tloi['inst']\n", "    sparql.setQuery(PREFIXES + \"\"\"\n", " SELECT DISTINCT * FROM WHERE {\n", " ?I ?Name ;\n", " ?Description ;\n", " ?WorkflowLink;\n", " ?Execution .\n", " OPTIONAL {?I ?Status}\n", " VALUES ?I {<\"\"\" + inst_id + \">}}\")\n", "    r = sparql.query().convert()\n", "    if len(r['results']['bindings']) > 0:\n", "        item = r['results']['bindings'][0]\n", "        instantiation = {}\n", "        instantiation['id'] = inst_id\n", "        instantiation['name'] = item['Name']['value']\n", "        instantiation['description'] = item['Description']['value']\n", "        instantiation['workflow_link'] = item['WorkflowLink']['value']\n", "        instantiation['execution'] = item['Execution']['value']\n", "        if 'Status' in item:\n", "            instantiation['status'] = item['Status']['value']\n", "        instantiations[inst_id] = instantiation\n", "        out = neo4j.query(add_inst_query, params={'inst':instantiation})\n", "        print(out)\n", "        out = neo4j.query(merge_tloi_inst_query, params={'inst':instantiation, 'tloi':tloi})\n", "        print(out)\n", "    else:\n", "        print(\"Workflow instantiation not found for \" + inst_id)\n", "\n", "neo4j.query(\"\"\"CREATE CONSTRAINT unique_inst IF NOT EXISTS FOR (l:WorkflowInstantiation) REQUIRE l.id IS UNIQUE\"\"\")" ] }, { "cell_type": "code", "execution_count": 22, "id": "63f83c4f-8ddd-4cd8-b173-8ee671185c60", "metadata": { "scrolled": true }, "outputs": [], "source": [ "# Load bindings for workflow instantiations\n", "#merge_inst_parameter_query = \"\"\"\n", "#MATCH (inst:WorkflowInstantiation {id: $inst.id}), (binding:Binding {id: $binding.id})\n", "#MERGE (inst)-[relationship:hasParameter]->(binding)\n", "#RETURN relationship\"\"\"\n", "#merge_inst_input_query = \"\"\"\n", "#MATCH (inst:WorkflowInstantiation {id: $inst.id}), (binding:Binding {id: $binding.id})\n", "#MERGE (inst)-[relationship:hasInput]->(binding)\n", "#RETURN relationship\"\"\"\n", "#merge_inst_output_query = \"\"\"\n", "#MATCH (inst:WorkflowInstantiation {id: $inst.id}), (binding:Binding {id: $binding.id})\n", "#MERGE 
(inst)-[relationship:hasOutput]->(binding)\n", "#RETURN relationship\"\"\"\n", "#\n", "#for inst_id in instantiations:\n", "# inst = instantiations[inst_id]\n", "# for t in ['hasParameter', 'hasInput', 'hasOutput']:\n", "# sparql_query = PREFIXES + \"\"\"\n", "#SELECT DISTINCT * FROM WHERE {\n", "# ?Inst ?Binding .\n", "# ?Binding ?Variable ;\n", "# ?Value ;\n", "# ?Type .\n", "# VALUES ?Inst { <\"\"\" + inst_id + \"> }}\"\n", "# sparql.setQuery(sparql_query)\n", "# r = sparql.query().convert()\n", "# for item in r['results']['bindings']:\n", "# binding = {}\n", "# binding['id'] = item['Binding']['value']\n", "# binding['variable'] = item['Variable']['value']\n", "# binding['value'] = item['Value']['value']\n", "# binding['type'] = item['Type']['value']\n", "# out = neo4j.query(add_binding_query, params={'binding':binding})\n", "# print(out)\n", "# if t == 'hasParameter':\n", "# out = neo4j.query(merge_inst_parameter_query, params={'binding':binding, 'inst':inst})\n", "# elif t == 'hasInput':\n", "# out = neo4j.query(merge_inst_input_query, params={'binding':binding, 'inst':inst})\n", "# else:\n", "# out = neo4j.query(merge_inst_output_query, params={'binding':binding, 'inst':inst})\n", "# print(out)" ] }, { "cell_type": "markdown", "id": "2ae2d5ec-bea3-494e-97c5-9792d4cc0f4c", "metadata": {}, "source": [ "### Executions" ] }, { "cell_type": "code", "execution_count": 24, "id": "2253b2ec-2224-436b-a4fa-f1d52e4cf4c4", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[{'exec.confidence_value': '0.263176970715283e0'}]\n", "[{'relationship': ({}, 'hasExecution', {})}]\n", "[{'exec.confidence_value': '0.044376539244712e0'}]\n", "[{'relationship': ({}, 'hasExecution', {})}]\n", "[{'exec.confidence_value': '0.37155457478486e0'}]\n", "[{'relationship': ({}, 'hasExecution', {})}]\n", "[{'exec.confidence_value': '0.500060579258888e0'}]\n", "[{'relationship': ({}, 'hasExecution', {})}]\n", "[{'exec.confidence_value': 
'0.020618776795934e0'}]\n", "[{'relationship': ({}, 'hasExecution', {})}]\n", "[{'exec.confidence_value': '0.0828042022392925e0'}]\n", "[{'relationship': ({}, 'hasExecution', {})}]\n", "[{'exec.confidence_value': '0.018198423339433e0'}]\n", "[{'relationship': ({}, 'hasExecution', {})}]\n" ] }, { "data": { "text/plain": [ "[]" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Executions\n", "add_exec_query = \"\"\"\n", "MERGE(exec:Execution {id: $exec.id})\n", " ON CREATE SET \n", " exec.confidence_type = $exec.confidence_type,\n", " exec.confidence_value = toFloat($exec.confidence_value),\n", " exec.date_start = $exec.start_date,\n", " exec.date_end = $exec.end_date\n", "RETURN exec.confidence_value\"\"\"\n", "\n", "merge_exec_inst_query = \"\"\"\n", "MATCH (wfc:WorkflowInstantiation {id: $inst.id}), (exec:Execution {id: $exec.id})\n", "MERGE (wfc)-[relationship:hasExecution]->(exec)\n", "RETURN relationship\"\"\"\n", "\n", "executions = {}\n", "for i in instantiations:\n", " inst = instantiations[i];\n", " exec_id = inst['execution']\n", " sparql.setQuery(PREFIXES + \"\"\"\n", " SELECT DISTINCT * FROM WHERE {\n", " ?Execution ?StartDate ;\n", " ?EndDate ;\n", " ?Result .\n", " \t\t?Result ?ConfidenceType .\n", " \t\t?Result ?ConfidenceValue\n", " VALUES ?Execution {<\"\"\" + exec_id + \">}}\")\n", " r = sparql.query().convert()\n", " if len(r['results']['bindings']) > 0:\n", " item = r['results']['bindings'][0]\n", " execution = {}\n", " execution['id'] = exec_id\n", " execution['start_date'] = item['StartDate']['value']\n", " execution['end_date'] = item['EndDate']['value']\n", " execution['confidence_type'] = item['ConfidenceType']['value']\n", " execution['confidence_value'] = item['ConfidenceValue']['value']\n", " executions[exec_id] = execution\n", " out = neo4j.query(add_exec_query, params={'exec':execution})\n", " print(out)\n", " out = neo4j.query(merge_exec_inst_query, params={'inst':inst, 
'exec':execution})\n", " print(out)\n", "\n", "neo4j.query(\"\"\"CREATE CONSTRAINT unique_exec IF NOT EXISTS FOR (l:Execution) REQUIRE l.id IS UNIQUE\"\"\")" ] }, { "cell_type": "markdown", "id": "c89f5cab-85ed-465a-9176-c5a78b5c8b0a", "metadata": {}, "source": [ "### input/output" ] }, { "cell_type": "code", "execution_count": 26, "id": "e898d35c-9f7c-4306-80ff-f3073a6ae401", "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[{'b.id': 'http://localhost:8080/disk-project-server/tlois/TriggeredLOI-bfjxauj6EcYQ/instantiations/-LbRnRP91gcXu/bindings/snp-9KsH6OCKGJ2A'}]\n", "[{'relationship': ({}, 'hasInput', {})}]\n", "[{'b.id': 'http://localhost:8080/disk-project-server/tlois/TriggeredLOI-bfjxauj6EcYQ/instantiations/-LbRnRP91gcXu/bindings/demographic_min-6JIJEK1lGoDh'}]\n", "[{'relationship': ({}, 'hasInput', {})}]\n", "[{'b.id': 'http://localhost:8080/disk-project-server/tlois/TriggeredLOI-bfjxauj6EcYQ/instantiations/-LbRnRP91gcXu/bindings/trait-feW04T9gR5jg'}]\n", "[{'relationship': ({}, 'hasInput', {})}]\n", "[{'b.id': 'http://localhost:8080/disk-project-server/tlois/TriggeredLOI-bfjxauj6EcYQ/instantiations/-LbRnRP91gcXu/bindings/demographic_value-hF9GLnTTKMZM'}]\n", "[{'relationship': ({}, 'hasInput', {})}]\n", "[{'b.id': 'http://localhost:8080/disk-project-server/tlois/TriggeredLOI-bfjxauj6EcYQ/instantiations/-LbRnRP91gcXu/bindings/demographic-xDaSSRJlqbS1'}]\n", "[{'relationship': ({}, 'hasInput', {})}]\n", "[{'b.id': 'http://localhost:8080/disk-project-server/tlois/TriggeredLOI-bfjxauj6EcYQ/instantiations/-LbRnRP91gcXu/bindings/area-6T51xYmHFFxR'}]\n", "[{'relationship': ({}, 'hasInput', {})}]\n", "[{'b.id': 'http://localhost:8080/disk-project-server/tlois/TriggeredLOI-bfjxauj6EcYQ/instantiations/-LbRnRP91gcXu/bindings/demographic_max-iT9J5wYjRn3o'}]\n", "[{'relationship': ({}, 'hasInput', {})}]\n", "[{'b.id': 
'http://localhost:8080/disk-project-server/tlois/TriggeredLOI-bfjxauj6EcYQ/instantiations/-LbRnRP91gcXu/bindings/cohortData-YQ8q3TKzaI7I'}]\n", "[{'relationship': ({}, 'hasInput', {})}]\n", "[{'b.id': 'http://localhost:8080/disk-project-server/tlois/TriggeredLOI-bfjxauj6EcYQ/instantiations/-LbRnRP91gcXu/bindings/log-s94xrVsoOigf'}]\n", "[{'relationship': ({}, 'hasOutput', {})}]\n", "[{'b.id': 'http://localhost:8080/disk-project-server/tlois/TriggeredLOI-bfjxauj6EcYQ/instantiations/-LbRnRP91gcXu/bindings/p_value-wZu4AXbGKEYB'}]\n", "[{'relationship': ({}, 'hasOutput', {})}]\n", "[{'b.id': 'http://localhost:8080/disk-project-server/tlois/TriggeredLOI-bfjxauj6EcYQ/instantiations/-LbRnRP91gcXu/bindings/brain_visualization-zOghWK6hUkKJ'}]\n", "[{'relationship': ({}, 'hasOutput', {})}]\n", "[{'b.id': 'http://localhost:8080/disk-project-server/tlois/TriggeredLOI-bfjxauj6EcYQ/instantiations/-LbRnRP91gcXu/bindings/scatter-MDjiGB8P4yjF'}]\n", "[{'relationship': ({}, 'hasOutput', {})}]\n", "[{'b.id': 'http://localhost:8080/disk-project-server/tlois/TriggeredLOI-E8PbUbCdZB4K/instantiations/-wnFyzaHf3k0J/bindings/demographic_min-oxOStdKAt1dE'}]\n", "[{'relationship': ({}, 'hasInput', {})}]\n", "[{'b.id': 'http://localhost:8080/disk-project-server/tlois/TriggeredLOI-E8PbUbCdZB4K/instantiations/-wnFyzaHf3k0J/bindings/demographic_value-LdhTYw8xDNBY'}]\n", "[{'relationship': ({}, 'hasInput', {})}]\n", "[{'b.id': 'http://localhost:8080/disk-project-server/tlois/TriggeredLOI-E8PbUbCdZB4K/instantiations/-wnFyzaHf3k0J/bindings/trait-M7E9bN3vYEFv'}]\n", "[{'relationship': ({}, 'hasInput', {})}]\n", "[{'b.id': 'http://localhost:8080/disk-project-server/tlois/TriggeredLOI-E8PbUbCdZB4K/instantiations/-wnFyzaHf3k0J/bindings/demographic_max-s3L4rbFeez0i'}]\n", "[{'relationship': ({}, 'hasInput', {})}]\n", "[{'b.id': 'http://localhost:8080/disk-project-server/tlois/TriggeredLOI-E8PbUbCdZB4K/instantiations/-wnFyzaHf3k0J/bindings/snp-80tnHu1faEGt'}]\n", "[{'relationship': ({}, 'hasInput', 
{})}]\n", "[{'b.id': 'http://localhost:8080/disk-project-server/tlois/TriggeredLOI-E8PbUbCdZB4K/instantiations/-wnFyzaHf3k0J/bindings/demographic-gRDSowEiqgPe'}]\n", "[{'relationship': ({}, 'hasInput', {})}]\n", "[{'b.id': 'http://localhost:8080/disk-project-server/tlois/TriggeredLOI-E8PbUbCdZB4K/instantiations/-wnFyzaHf3k0J/bindings/area-4XH0gGPY5CvH'}]\n", "[{'relationship': ({}, 'hasInput', {})}]\n", "[{'b.id': 'http://localhost:8080/disk-project-server/tlois/TriggeredLOI-E8PbUbCdZB4K/instantiations/-wnFyzaHf3k0J/bindings/cohortData-Z3R0oBSm16S1'}]\n", "[{'relationship': ({}, 'hasInput', {})}]\n", "[{'b.id': 'http://localhost:8080/disk-project-server/tlois/TriggeredLOI-E8PbUbCdZB4K/instantiations/-wnFyzaHf3k0J/bindings/p_value-pD7IWFEuqSTe'}]\n", "[{'relationship': ({}, 'hasOutput', {})}]\n", "[{'b.id': 'http://localhost:8080/disk-project-server/tlois/TriggeredLOI-E8PbUbCdZB4K/instantiations/-wnFyzaHf3k0J/bindings/brain_visualization-3VCoEDF4zFUw'}]\n", "[{'relationship': ({}, 'hasOutput', {})}]\n", "[{'b.id': 'http://localhost:8080/disk-project-server/tlois/TriggeredLOI-E8PbUbCdZB4K/instantiations/-wnFyzaHf3k0J/bindings/log-vj0VdhZzRVOu'}]\n", "[{'relationship': ({}, 'hasOutput', {})}]\n", "[{'b.id': 'http://localhost:8080/disk-project-server/tlois/TriggeredLOI-E8PbUbCdZB4K/instantiations/-wnFyzaHf3k0J/bindings/scatter-JEUNrxvpmfYP'}]\n", "[{'relationship': ({}, 'hasOutput', {})}]\n", "[{'b.id': 'http://localhost:8080/disk-project-server/tlois/TriggeredLOI-MA2p3owIlWh3/instantiations/-p4T22wUh71sC/bindings/snp-kMVnCHANGvE5'}]\n", "[{'relationship': ({}, 'hasInput', {})}]\n", "[{'b.id': 'http://localhost:8080/disk-project-server/tlois/TriggeredLOI-MA2p3owIlWh3/instantiations/-p4T22wUh71sC/bindings/demographic_max-d4kucGEBkjsr'}]\n", "[{'relationship': ({}, 'hasInput', {})}]\n", "[{'b.id': 'http://localhost:8080/disk-project-server/tlois/TriggeredLOI-MA2p3owIlWh3/instantiations/-p4T22wUh71sC/bindings/demographic-BStAnJDxMQbf'}]\n", "[{'relationship': ({}, 
'hasInput', {})}]\n", "[{'b.id': 'http://localhost:8080/disk-project-server/tlois/TriggeredLOI-MA2p3owIlWh3/instantiations/-p4T22wUh71sC/bindings/trait-i0uvie4Gtuwq'}]\n", "[{'relationship': ({}, 'hasInput', {})}]\n", "[{'b.id': 'http://localhost:8080/disk-project-server/tlois/TriggeredLOI-MA2p3owIlWh3/instantiations/-p4T22wUh71sC/bindings/demographic_value-mqx4OKfPBgZl'}]\n", "[{'relationship': ({}, 'hasInput', {})}]\n", "[{'b.id': 'http://localhost:8080/disk-project-server/tlois/TriggeredLOI-MA2p3owIlWh3/instantiations/-p4T22wUh71sC/bindings/area-zwtmMApbBJhV'}]\n", "[{'relationship': ({}, 'hasInput', {})}]\n", "[{'b.id': 'http://localhost:8080/disk-project-server/tlois/TriggeredLOI-MA2p3owIlWh3/instantiations/-p4T22wUh71sC/bindings/demographic_min-AbyjZ9Qu2tk0'}]\n", "[{'relationship': ({}, 'hasInput', {})}]\n", "[{'b.id': 'http://localhost:8080/disk-project-server/tlois/TriggeredLOI-MA2p3owIlWh3/instantiations/-p4T22wUh71sC/bindings/cohortData-UtzgYethtUEU'}]\n", "[{'relationship': ({}, 'hasInput', {})}]\n", "[{'b.id': 'http://localhost:8080/disk-project-server/tlois/TriggeredLOI-MA2p3owIlWh3/instantiations/-p4T22wUh71sC/bindings/scatter-giJqZho6kb5c'}]\n", "[{'relationship': ({}, 'hasOutput', {})}]\n", "[{'b.id': 'http://localhost:8080/disk-project-server/tlois/TriggeredLOI-MA2p3owIlWh3/instantiations/-p4T22wUh71sC/bindings/brain_visualization-X8CJAGKSurYQ'}]\n", "[{'relationship': ({}, 'hasOutput', {})}]\n", "[{'b.id': 'http://localhost:8080/disk-project-server/tlois/TriggeredLOI-MA2p3owIlWh3/instantiations/-p4T22wUh71sC/bindings/log-S5bfgKUHywEi'}]\n", "[{'relationship': ({}, 'hasOutput', {})}]\n", "[{'b.id': 'http://localhost:8080/disk-project-server/tlois/TriggeredLOI-MA2p3owIlWh3/instantiations/-p4T22wUh71sC/bindings/p_value-SzC3WykNYeLH'}]\n", "[{'relationship': ({}, 'hasOutput', {})}]\n", "[{'b.id': 'http://localhost:8080/disk-project-server/tlois/TriggeredLOI-SP3oHYmxkUrM/instantiations/-DTq9uoBEMkgm/bindings/demographic-g21Y3aVY3cdO'}]\n", 
"[{'relationship': ({}, 'hasInput', {})}]\n", "[{'b.id': 'http://localhost:8080/disk-project-server/tlois/TriggeredLOI-SP3oHYmxkUrM/instantiations/-DTq9uoBEMkgm/bindings/demographic_max-G3AjR1eeYJm0'}]\n", "[{'relationship': ({}, 'hasInput', {})}]\n", "[{'b.id': 'http://localhost:8080/disk-project-server/tlois/TriggeredLOI-SP3oHYmxkUrM/instantiations/-DTq9uoBEMkgm/bindings/area-oxmoaknioW25'}]\n", "[{'relationship': ({}, 'hasInput', {})}]\n", "[{'b.id': 'http://localhost:8080/disk-project-server/tlois/TriggeredLOI-SP3oHYmxkUrM/instantiations/-DTq9uoBEMkgm/bindings/snp-SXb5dajUhbiJ'}]\n", "[{'relationship': ({}, 'hasInput', {})}]\n", "[{'b.id': 'http://localhost:8080/disk-project-server/tlois/TriggeredLOI-SP3oHYmxkUrM/instantiations/-DTq9uoBEMkgm/bindings/demographic_min-qJF5YPuXPte7'}]\n", "[{'relationship': ({}, 'hasInput', {})}]\n", "[{'b.id': 'http://localhost:8080/disk-project-server/tlois/TriggeredLOI-SP3oHYmxkUrM/instantiations/-DTq9uoBEMkgm/bindings/trait-lfXqxGC6soHU'}]\n", "[{'relationship': ({}, 'hasInput', {})}]\n", "[{'b.id': 'http://localhost:8080/disk-project-server/tlois/TriggeredLOI-SP3oHYmxkUrM/instantiations/-DTq9uoBEMkgm/bindings/demographic_value-fJSQqBneGwNo'}]\n", "[{'relationship': ({}, 'hasInput', {})}]\n", "[{'b.id': 'http://localhost:8080/disk-project-server/tlois/TriggeredLOI-SP3oHYmxkUrM/instantiations/-DTq9uoBEMkgm/bindings/cohortData-HpJZKtSftdz8'}]\n", "[{'relationship': ({}, 'hasInput', {})}]\n", "[{'b.id': 'http://localhost:8080/disk-project-server/tlois/TriggeredLOI-SP3oHYmxkUrM/instantiations/-DTq9uoBEMkgm/bindings/brain_visualization-gGDKpHWrU7JW'}]\n", "[{'relationship': ({}, 'hasOutput', {})}]\n", "[{'b.id': 'http://localhost:8080/disk-project-server/tlois/TriggeredLOI-SP3oHYmxkUrM/instantiations/-DTq9uoBEMkgm/bindings/log-qwW4Ya7olyNi'}]\n", "[{'relationship': ({}, 'hasOutput', {})}]\n", "[{'b.id': 
'http://localhost:8080/disk-project-server/tlois/TriggeredLOI-SP3oHYmxkUrM/instantiations/-DTq9uoBEMkgm/bindings/scatter-mQTfSnrytq9J'}]\n", "[{'relationship': ({}, 'hasOutput', {})}]\n", "[{'b.id': 'http://localhost:8080/disk-project-server/tlois/TriggeredLOI-SP3oHYmxkUrM/instantiations/-DTq9uoBEMkgm/bindings/p_value-O0gmn75rRATo'}]\n", "[{'relationship': ({}, 'hasOutput', {})}]\n", "[{'b.id': 'http://localhost:8080/disk-project-server/tlois/TriggeredLOI-j5QRPbmS5u61/instantiations/-6pdbgyBxBb9w/bindings/demographic_value-0bPim8oy48hf'}]\n", "[{'relationship': ({}, 'hasInput', {})}]\n", "[{'b.id': 'http://localhost:8080/disk-project-server/tlois/TriggeredLOI-j5QRPbmS5u61/instantiations/-6pdbgyBxBb9w/bindings/snp-uTvuspMuRvjn'}]\n", "[{'relationship': ({}, 'hasInput', {})}]\n", "[{'b.id': 'http://localhost:8080/disk-project-server/tlois/TriggeredLOI-j5QRPbmS5u61/instantiations/-6pdbgyBxBb9w/bindings/demographic_min-K2n1FIlEGMBO'}]\n", "[{'relationship': ({}, 'hasInput', {})}]\n", "[{'b.id': 'http://localhost:8080/disk-project-server/tlois/TriggeredLOI-j5QRPbmS5u61/instantiations/-6pdbgyBxBb9w/bindings/demographic_max-hAAcU1285MnY'}]\n", "[{'relationship': ({}, 'hasInput', {})}]\n", "[{'b.id': 'http://localhost:8080/disk-project-server/tlois/TriggeredLOI-j5QRPbmS5u61/instantiations/-6pdbgyBxBb9w/bindings/demographic-ZRYbxoMbYuAF'}]\n", "[{'relationship': ({}, 'hasInput', {})}]\n", "[{'b.id': 'http://localhost:8080/disk-project-server/tlois/TriggeredLOI-j5QRPbmS5u61/instantiations/-6pdbgyBxBb9w/bindings/trait-XQsTSz6jTYld'}]\n", "[{'relationship': ({}, 'hasInput', {})}]\n", "[{'b.id': 'http://localhost:8080/disk-project-server/tlois/TriggeredLOI-j5QRPbmS5u61/instantiations/-6pdbgyBxBb9w/bindings/area-2YS9I22BSOkJ'}]\n", "[{'relationship': ({}, 'hasInput', {})}]\n", "[{'b.id': 'http://localhost:8080/disk-project-server/tlois/TriggeredLOI-j5QRPbmS5u61/instantiations/-6pdbgyBxBb9w/bindings/cohortData-YwSHneWNLLhV'}]\n", "[{'relationship': ({}, 'hasInput', {})}]\n", 
"[{'b.id': 'http://localhost:8080/disk-project-server/tlois/TriggeredLOI-j5QRPbmS5u61/instantiations/-6pdbgyBxBb9w/bindings/log-JdOulww16v2T'}]\n", "[{'relationship': ({}, 'hasOutput', {})}]\n", "[{'b.id': 'http://localhost:8080/disk-project-server/tlois/TriggeredLOI-j5QRPbmS5u61/instantiations/-6pdbgyBxBb9w/bindings/brain_visualization-fGKJgbXWSp72'}]\n", "[{'relationship': ({}, 'hasOutput', {})}]\n", "[{'b.id': 'http://localhost:8080/disk-project-server/tlois/TriggeredLOI-j5QRPbmS5u61/instantiations/-6pdbgyBxBb9w/bindings/p_value-C2QQwRiykfGk'}]\n", "[{'relationship': ({}, 'hasOutput', {})}]\n", "[{'b.id': 'http://localhost:8080/disk-project-server/tlois/TriggeredLOI-j5QRPbmS5u61/instantiations/-6pdbgyBxBb9w/bindings/scatter-WRjNugK2Elm8'}]\n", "[{'relationship': ({}, 'hasOutput', {})}]\n", "[{'b.id': 'http://localhost:8080/disk-project-server/tlois/TriggeredLOI-oCyPWu4b8Rez/instantiations/-Y9pKl4PTjzlu/bindings/snp-twyafQVQOisd'}]\n", "[{'relationship': ({}, 'hasInput', {})}]\n", "[{'b.id': 'http://localhost:8080/disk-project-server/tlois/TriggeredLOI-oCyPWu4b8Rez/instantiations/-Y9pKl4PTjzlu/bindings/trait-8fTONXjWs816'}]\n", "[{'relationship': ({}, 'hasInput', {})}]\n", "[{'b.id': 'http://localhost:8080/disk-project-server/tlois/TriggeredLOI-oCyPWu4b8Rez/instantiations/-Y9pKl4PTjzlu/bindings/demographic-PxoiGF4m1BYo'}]\n", "[{'relationship': ({}, 'hasInput', {})}]\n", "[{'b.id': 'http://localhost:8080/disk-project-server/tlois/TriggeredLOI-oCyPWu4b8Rez/instantiations/-Y9pKl4PTjzlu/bindings/demographic_min-EEskA1oaG2FT'}]\n", "[{'relationship': ({}, 'hasInput', {})}]\n", "[{'b.id': 'http://localhost:8080/disk-project-server/tlois/TriggeredLOI-oCyPWu4b8Rez/instantiations/-Y9pKl4PTjzlu/bindings/area-MXF48etJAwrF'}]\n", "[{'relationship': ({}, 'hasInput', {})}]\n", "[{'b.id': 'http://localhost:8080/disk-project-server/tlois/TriggeredLOI-oCyPWu4b8Rez/instantiations/-Y9pKl4PTjzlu/bindings/demographic_value-8MQ2G1piVSqI'}]\n", "[{'relationship': ({}, 'hasInput', 
{})}]\n", "[{'b.id': 'http://localhost:8080/disk-project-server/tlois/TriggeredLOI-oCyPWu4b8Rez/instantiations/-Y9pKl4PTjzlu/bindings/demographic_max-LD5P9UdfgzIC'}]\n", "[{'relationship': ({}, 'hasInput', {})}]\n", "[{'b.id': 'http://localhost:8080/disk-project-server/tlois/TriggeredLOI-oCyPWu4b8Rez/instantiations/-Y9pKl4PTjzlu/bindings/cohortData-dyhj26G7LGcN'}]\n", "[{'relationship': ({}, 'hasInput', {})}]\n", "[{'b.id': 'http://localhost:8080/disk-project-server/tlois/TriggeredLOI-oCyPWu4b8Rez/instantiations/-Y9pKl4PTjzlu/bindings/log-A5pG7zRUGypq'}]\n", "[{'relationship': ({}, 'hasOutput', {})}]\n", "[{'b.id': 'http://localhost:8080/disk-project-server/tlois/TriggeredLOI-oCyPWu4b8Rez/instantiations/-Y9pKl4PTjzlu/bindings/scatter-v6VSuxkq7e2E'}]\n", "[{'relationship': ({}, 'hasOutput', {})}]\n", "[{'b.id': 'http://localhost:8080/disk-project-server/tlois/TriggeredLOI-oCyPWu4b8Rez/instantiations/-Y9pKl4PTjzlu/bindings/brain_visualization-Wd0GZmyyN64v'}]\n", "[{'relationship': ({}, 'hasOutput', {})}]\n", "[{'b.id': 'http://localhost:8080/disk-project-server/tlois/TriggeredLOI-oCyPWu4b8Rez/instantiations/-Y9pKl4PTjzlu/bindings/p_value-hQjTZmpKxm5h'}]\n", "[{'relationship': ({}, 'hasOutput', {})}]\n", "[{'b.id': 'http://localhost:8080/disk-project-server/tlois/TriggeredLOI-fEdISYTbY6OC/instantiations/-iw3phUA6ox6U/bindings/demographic-jcEs0eT7hHi8'}]\n", "[{'relationship': ({}, 'hasInput', {})}]\n", "[{'b.id': 'http://localhost:8080/disk-project-server/tlois/TriggeredLOI-fEdISYTbY6OC/instantiations/-iw3phUA6ox6U/bindings/demographic_min-x2x5Qi2lBLHT'}]\n", "[{'relationship': ({}, 'hasInput', {})}]\n", "[{'b.id': 'http://localhost:8080/disk-project-server/tlois/TriggeredLOI-fEdISYTbY6OC/instantiations/-iw3phUA6ox6U/bindings/demographic_max-vrxOL5LYR53V'}]\n", "[{'relationship': ({}, 'hasInput', {})}]\n", "[{'b.id': 'http://localhost:8080/disk-project-server/tlois/TriggeredLOI-fEdISYTbY6OC/instantiations/-iw3phUA6ox6U/bindings/trait-aGXpFyovrwCy'}]\n", 
"[{'relationship': ({}, 'hasInput', {})}]\n", "[{'b.id': 'http://localhost:8080/disk-project-server/tlois/TriggeredLOI-fEdISYTbY6OC/instantiations/-iw3phUA6ox6U/bindings/demographic_value-RHIelBfw0XWk'}]\n", "[{'relationship': ({}, 'hasInput', {})}]\n", "[{'b.id': 'http://localhost:8080/disk-project-server/tlois/TriggeredLOI-fEdISYTbY6OC/instantiations/-iw3phUA6ox6U/bindings/area-v5vecFDgrvz0'}]\n", "[{'relationship': ({}, 'hasInput', {})}]\n", "[{'b.id': 'http://localhost:8080/disk-project-server/tlois/TriggeredLOI-fEdISYTbY6OC/instantiations/-iw3phUA6ox6U/bindings/snp-uvmZzoIsJAOf'}]\n", "[{'relationship': ({}, 'hasInput', {})}]\n", "[{'b.id': 'http://localhost:8080/disk-project-server/tlois/TriggeredLOI-fEdISYTbY6OC/instantiations/-iw3phUA6ox6U/bindings/cohortData-4MHCbnj5G1Ji'}]\n", "[{'relationship': ({}, 'hasInput', {})}]\n", "[{'b.id': 'http://localhost:8080/disk-project-server/tlois/TriggeredLOI-fEdISYTbY6OC/instantiations/-iw3phUA6ox6U/bindings/scatter-jWqHdG8hgfq1'}]\n", "[{'relationship': ({}, 'hasOutput', {})}]\n", "[{'b.id': 'http://localhost:8080/disk-project-server/tlois/TriggeredLOI-fEdISYTbY6OC/instantiations/-iw3phUA6ox6U/bindings/log-tsEeO9iqTqm2'}]\n", "[{'relationship': ({}, 'hasOutput', {})}]\n", "[{'b.id': 'http://localhost:8080/disk-project-server/tlois/TriggeredLOI-fEdISYTbY6OC/instantiations/-iw3phUA6ox6U/bindings/p_value-oSOCMfJRtaij'}]\n", "[{'relationship': ({}, 'hasOutput', {})}]\n", "[{'b.id': 'http://localhost:8080/disk-project-server/tlois/TriggeredLOI-fEdISYTbY6OC/instantiations/-iw3phUA6ox6U/bindings/brain_visualization-K4qh4r4ROTdr'}]\n", "[{'relationship': ({}, 'hasOutput', {})}]\n" ] } ], "source": [ "# Load Inputs and outputs for executions\n", "merge_exec_input_query = \"\"\"\n", "MATCH (exec:Execution {id: $exec.id}), (binding:Binding {id: $binding.id})\n", "MERGE (exec)-[relationship:hasInput]->(binding)\n", "RETURN relationship\"\"\"\n", "merge_exec_output_query = \"\"\"\n", "MATCH (exec:Execution {id: $exec.id}), 
(binding:Binding {id: $binding.id})\n", "MERGE (exec)-[relationship:hasOutput]->(binding)\n", "RETURN relationship\"\"\"\n", "\n", "for exec_id in executions:\n", " ex = executions[exec_id]\n", " for t in ['hasInputFile', 'hasOutputFile']:\n", " sparql_query = PREFIXES + \"\"\"\n", "SELECT DISTINCT * FROM WHERE {\n", " ?Exec ?Binding .\n", " ?Binding ?Variable ;\n", " ?Value ;\n", " ?Type .\n", " VALUES ?Exec { <\"\"\" + exec_id + \"> }}\"\n", " sparql.setQuery(sparql_query)\n", " r = sparql.query().convert()\n", " for item in r['results']['bindings']:\n", " binding = {}\n", " binding['id'] = item['Binding']['value']\n", " binding['variable'] = item['Variable']['value']\n", " binding['value'] = item['Value']['value']\n", " binding['type'] = item['Type']['value']\n", " out = neo4j.query(add_binding_query, params={'binding':binding})\n", " print(out)\n", " if t == 'hasInputFile':\n", " out = neo4j.query(merge_exec_input_query, params={'binding':binding, 'exec':ex})\n", " else:\n", " out = neo4j.query(merge_exec_output_query, params={'binding':binding, 'exec':ex})\n", " print(out)" ] }, { "cell_type": "code", "execution_count": 27, "id": "e2f4bf0c-5ad3-4f12-9afa-ec6c0d877a4e", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Node properties:\n", "Question {id: STRING, name: STRING, template: STRING}\n", "Goal {date_created: STRING, description: STRING, id: STRING, name: STRING}\n", "LineOfInquiry {name: STRING, date_created: STRING, description: STRING, id: STRING, query_template: STRING}\n", "TriggeredLineOfInquiry {fullText: STRING, textEmbedding: LIST, source: STRING, status: STRING, query_template: STRING, query: STRING, date_created: STRING, query_response: STRING, description: STRING, id: STRING, name: STRING}\n", "Execution {date_end: STRING, date_start: STRING, confidence_value: STRING, id: STRING, confidence_type: STRING}\n", "Binding {type: STRING, id: STRING, variable: STRING, value: STRING}\n", "WorkflowSeed {id: 
STRING}\n", "WorkflowInstantiation {name: STRING, description: STRING, id: STRING, workflow_link: STRING}\n", "Relationship properties:\n", "\n", "The relationships:\n", "(:Goal)-[:hasQuestion]->(:Question)\n", "(:Goal)-[:hasQuestionBinding]->(:Binding)\n", "(:LineOfInquiry)-[:hasQuestion]->(:Question)\n", "(:LineOfInquiry)-[:hasWorkflowSeed]->(:WorkflowSeed)\n", "(:TriggeredLineOfInquiry)-[:hasWorkflowInstantiation]->(:WorkflowInstantiation)\n", "(:TriggeredLineOfInquiry)-[:hasQuestion]->(:Question)\n", "(:TriggeredLineOfInquiry)-[:hasLineOfInquiry]->(:LineOfInquiry)\n", "(:TriggeredLineOfInquiry)-[:hasGoal]->(:Goal)\n", "(:Execution)-[:hasOutput]->(:Binding)\n", "(:Execution)-[:hasInput]->(:Binding)\n", "(:WorkflowSeed)-[:hasInput]->(:Binding)\n", "(:WorkflowSeed)-[:hasParameter]->(:Binding)\n", "(:WorkflowInstantiation)-[:hasExecution]->(:Execution)\n" ] } ], "source": [ "neo4j.refresh_schema()\n", "print(neo4j.schema)" ] }, { "cell_type": "markdown", "id": "a211564a-493d-4331-9717-54812913d2fd", "metadata": {}, "source": [ "\n", "## Create simple text representations" ] }, { "cell_type": "code", "execution_count": 115, "id": "aa31a856-063d-40a3-8421-2d662310f18e", "metadata": {}, "outputs": [], "source": [ "TEXT_CONTEXT = \"\"\"\n", "[GENERAL CONTEXT]\n", "A Question Template is a text representation of possible questions the DISK system is able to test.\n", "Question Templates contain one or more Question Variables that are denoted by the prefix “?” (e.g. ?Genotype is the Question Variable \"Genotype\").\n", "Question Variables provide multiple options retrieved from the data source. Users can select option values to customize the Question Template. \n", "\n", "A Goal is what a DISK user wants to test. Goals are identified by an ID and have a Name and a Description.\n", "Goals follow a Question Template and provide values for all of its Question Variables.\n", "\n", "A Line of Inquiry is how DISK will test a Question Template. 
Lines of Inquiry are identified by an ID and have the following properties: Name, Description, Data Query Template and Workflow Seed.\n", "Lines of Inquiry follow a Question Template and use Question Variable values to customize its Data Query Template and Workflow Seed.\n", "\n", "When the DISK system finds a Goal and a Line of Inquiry that follow the same Question Template, a Triggered Line of Inquiry is created.\n", "A Triggered Line of Inquiry is identified by an ID and has a Data Query and a Workflow Instantiation.\n", "The Triggered Line of Inquiry Data Query is created by using the Goal Question Variable Values to customize the Line of Inquiry Data Query Template. \n", "This data query is used to retrieve the inputs and parameters for the Workflow Seed. When all inputs and parameters are set, a new Execution is sent.\n", "This data query is executed periodically; when new data is found, a new Triggered Line of Inquiry is created.\n", "\n", "An Execution is a workflow run that uses the data gathered by the Triggered Line of Inquiry to customize the run of an experiment.\n", "This experiment can return a confidence value and one or several output files.\n", "\"\"\"\n", "\n", "GOAL_TEMPLATE = \"\"\"\n", "[GOAL]\n", "ID: {}\n", "Name: {}\n", "Description: {}\n", "Question Template: {}\n", "Question Variable Values: {}\n", "\"\"\"\n", "\n", "LOI_TEMPLATE = \"\"\"\n", "[Line of Inquiry]\n", "ID: {}\n", "Name: {}\n", "Description: {}\n", "Question Template: {}\n", "Data Query Template: {}\n", "\"\"\"\n", "\n", "TLOI_TEMPLATE = \"\"\"\n", "[Triggered Line of Inquiry]\n", "ID: {}\n", "Goal ID: {}\n", "Line of Inquiry ID: {}\n", "Question Template: {}\n", "Data Query: {}\n", "Workflow Name: {}\n", "Workflow Description: {}\n", "Execution Date: {}\n", "Execution confidence value: {} ({})\n", "Execution Inputs: {}\n", "Execution Outputs: {}\n", "\"\"\"" ] }, { "cell_type": "code", "execution_count": 7, "id": "0501b6fe-e3f6-4be6-83a9-84204fd5662a", "metadata": {}, "outputs": [], 
"source": [ "# fulltext for Goals\n", "goal_data = \"\"\"\n", "MATCH (g:Goal) -[:hasQuestion]-> (q:Question) \n", "WITH COLLECT {MATCH (g:Goal) -[:hasQuestionBinding]-> (b:Binding)\n", " RETURN apoc.text.replace(b.variable, \".*?/\" , \"\") + \" = \" + b.value } as bindings, g, q\n", "RETURN g.id, g.name, g.description, q.template, bindings\n", "\"\"\"\n", "\n", "goal_results = neo4j.query(goal_data)" ] }, { "cell_type": "code", "execution_count": 140, "id": "1a9fb22a-7218-4eea-a268-e44b74cfe1b6", "metadata": {}, "outputs": [], "source": [ "goal_text = {}\n", "for i in goal_results:\n", " goal_text[i[\"g.id\"]] = \"[GOAL]\\nID: {}\\nName: {}\\nDescription: {}\\nQuestion Template: {}\\nQuestion Variable Values: {}\".format(*i.values())\n", " #print(goal_text[i[\"g.id\"]].replace('http://localhost:8080/disk-project-server/goals/', ''))" ] }, { "cell_type": "code", "execution_count": 101, "id": "7774ca39-f5db-4b19-afb3-c65d7f95e22e", "metadata": {}, "outputs": [], "source": [ "loi_data = \"\"\"\n", "MATCH (loi:LineOfInquiry) -[:hasQuestion]-> (q:Question)\n", "RETURN loi.id, loi.name, loi.description, q.template, loi.query_template\n", "\"\"\"\n", "loi_results = neo4j.query(loi_data)" ] }, { "cell_type": "code", "execution_count": 139, "id": "82c77c98-6214-4757-9b85-7ccf7cf72d7d", "metadata": { "scrolled": true }, "outputs": [], "source": [ "loi_text = {}\n", "for i in loi_results:\n", " loi_text[i[\"loi.id\"]] = \"[Line of Inquiry]\\nID: {}\\nName: {}\\nDescription: {}\\nQuestion Template: {}\\nData Query Template: {}\".format(*i.values())\n", " #print(loi_text[i[\"loi.id\"]])" ] }, { "cell_type": "code", "execution_count": 10, "id": "f131fa44-31ec-4819-accc-5363ee1cb691", "metadata": { "scrolled": true }, "outputs": [], "source": [ "# We create embeddings for executions. For ENIGMA, every TLOI runs only one workflow and has one execution; this is a simplification of the DISK structure, where multiple workflows/executions are possible.\n", "tloi_data = \"\"\"\n", "MATCH 
(tloi:TriggeredLineOfInquiry) -[:hasGoal]-> (g:Goal), \n", " (tloi:TriggeredLineOfInquiry) -[:hasLineOfInquiry]-> (loi:LineOfInquiry),\n", " (loi:LineOfInquiry) -[:hasQuestion]-> (q:Question),\n", " (tloi:TriggeredLineOfInquiry) -[:hasWorkflowInstantiation]-> (inst:WorkflowInstantiation),\n", " (inst:WorkflowInstantiation) -[:hasExecution]-> (exec:Execution)\n", "WITH COLLECT {MATCH (exec:Execution) -[:hasInput]-> (ba:Binding)\n", " RETURN apoc.text.replace(ba.variable, \".*?/\" , \"\") + \" = \" + ba.value } as inputs,\n", " COLLECT {MATCH (exec:Execution) -[:hasOutput]-> (bb:Binding)\n", " RETURN apoc.text.replace(bb.variable, \".*?/\" , \"\") + \" = \" + bb.value } as outputs,\n", " tloi, g, loi, inst, exec, q\n", "RETURN tloi.id, g.id, loi.id, q.template,tloi.query, inst.name, inst.description, exec.date_start, exec.confidence_value, exec.confidence_type, apoc.text.join(inputs, \"\\n - \"), apoc.text.join(outputs, \"\\n - \")\n", "\"\"\"\n", "tloi_results = neo4j.query(tloi_data)" ] }, { "cell_type": "code", "execution_count": 138, "id": "6f476106-3d84-495a-93a0-b1efb097bbc8", "metadata": { "scrolled": true }, "outputs": [], "source": [ "tloi_text = {}\n", "for i in tloi_results:\n", " tloi_text[i[\"tloi.id\"]] = \"[Triggered Line of Inquiry]\\nID: {}\\nGoal ID: {}\\nLine of Inquiry ID: {}\\nQuestion Template: {}\\nData Query: {}\\nWorkflow Name: {}\\nWorkflow Description: {}\\nExecution Date: {}\\nExecution confidence value: {} ({})\\nExecution Inputs: \\n - {}\\nExecution Outputs: \\n - {}\".format(*i.values())\n", " #print(tloi_text[i[\"tloi.id\"]].replace('http://localhost:8080/disk-project-server/tlois/','').replace('http://localhost:8080/disk-project-server/goals/','').replace('http://localhost:8080/disk-project-server/lois/','').replace('http://localhost:8080/wings-portal/export/users/admin/Enigma/data/library.owl#',''))" ] }, { "cell_type": "code", "execution_count": 135, "id": "6c823d16-215c-4746-889e-cb2912029f53", "metadata": {}, "outputs": [], 
"source": [ "#Write text to neo4j\n", "add_text_tloi = \"\"\"\n", "MATCH (tloi:TriggeredLineOfInquiry {id: $id})\n", "SET tloi.fullText = $text\n", "RETURN tloi.fullText\"\"\"\n", "add_text_loi = \"\"\"\n", "MATCH (loi:LineOfInquiry {id: $id})\n", "SET loi.fullText = $text\n", "RETURN loi.fullText\"\"\"\n", "add_text_goal = \"\"\"\n", "MATCH (goal:Goal {id: $id})\n", "SET goal.fullText = $text\n", "RETURN goal.fullText\"\"\"" ] }, { "cell_type": "code", "execution_count": 141, "id": "3333e33d-cb0d-439b-8427-9761851f6b92", "metadata": { "scrolled": true }, "outputs": [], "source": [ "#for ID in goal_text.keys():\n", "# print(neo4j.query(add_text_goal, params={'id':ID, 'text': goal_text[ID]}))\n", "#for ID in loi_text.keys():\n", "# print(neo4j.query(add_text_loi, params={'id':ID, 'text': loi_text[ID]}))\n", "#for ID in tloi_text.keys():\n", "# print(neo4j.query(add_text_tloi, params={'id':ID, 'text': tloi_text[ID]}))" ] }, { "cell_type": "code", "execution_count": 37, "id": "a53ef134-c586-492c-9d92-1201030aad04", "metadata": {}, "outputs": [], "source": [ "# Delete all\n", "#neo4j.query(\"\"\"MATCH (n) OPTIONAL MATCH (n)-[r]-() WITH n,r LIMIT 50000 DELETE n,r RETURN count(n) as deletedNodesCount\"\"\")" ] }, { "cell_type": "markdown", "id": "0885bc0a-12d6-489c-b7cd-d0d0c2855c56", "metadata": {}, "source": [ "### Vector index" ] }, { "cell_type": "code", "execution_count": 104, "id": "1522cf14-e9b2-4281-9cb8-40ef5f7b1c26", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[]" ] }, "execution_count": 104, "metadata": {}, "output_type": "execute_result" } ], "source": [ "neo4j.query(\"\"\"\n", " CREATE VECTOR INDEX `tloi_text` IF NOT EXISTS\n", " FOR (t:TriggeredLineOfInquiry) ON (t.textEmbedding) \n", " OPTIONS { indexConfig: {\n", " `vector.dimensions`: 1536,\n", " `vector.similarity_function`: 'cosine' \n", " }}\n", "\"\"\")\n", "neo4j.query(\"\"\"\n", " CREATE VECTOR INDEX `loi_text` IF NOT EXISTS\n", " FOR (t:LineOfInquiry) ON (t.textEmbedding) \n", " 
OPTIONS { indexConfig: {\n", " `vector.dimensions`: 1536,\n", " `vector.similarity_function`: 'cosine' \n", " }}\n", "\"\"\")\n", "neo4j.query(\"\"\"\n", " CREATE VECTOR INDEX `goal_text` IF NOT EXISTS\n", " FOR (t:Goal) ON (t.textEmbedding) \n", " OPTIONS { indexConfig: {\n", " `vector.dimensions`: 1536,\n", " `vector.similarity_function`: 'cosine' \n", " }}\n", "\"\"\")" ] }, { "cell_type": "code", "execution_count": 105, "id": "7ce71b56-30c6-4a3a-bfae-5ab0afb84c5f", "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "[{'id': 21,\n", " 'name': 'goal_text',\n", " 'state': 'ONLINE',\n", " 'populationPercent': 100.0,\n", " 'type': 'VECTOR',\n", " 'entityType': 'NODE',\n", " 'labelsOrTypes': ['Goal'],\n", " 'properties': ['textEmbedding'],\n", " 'indexProvider': 'vector-2.0',\n", " 'owningConstraint': None,\n", " 'lastRead': None,\n", " 'readCount': None},\n", " {'id': 1,\n", " 'name': 'index_343aff4e',\n", " 'state': 'ONLINE',\n", " 'populationPercent': 100.0,\n", " 'type': 'LOOKUP',\n", " 'entityType': 'NODE',\n", " 'labelsOrTypes': None,\n", " 'properties': None,\n", " 'indexProvider': 'token-lookup-1.0',\n", " 'owningConstraint': None,\n", " 'lastRead': neo4j.time.DateTime(2024, 9, 25, 7, 11, 19, 167000000, tzinfo=),\n", " 'readCount': 2300},\n", " {'id': 2,\n", " 'name': 'index_f7700477',\n", " 'state': 'ONLINE',\n", " 'populationPercent': 100.0,\n", " 'type': 'LOOKUP',\n", " 'entityType': 'RELATIONSHIP',\n", " 'labelsOrTypes': None,\n", " 'properties': None,\n", " 'indexProvider': 'token-lookup-1.0',\n", " 'owningConstraint': None,\n", " 'lastRead': neo4j.time.DateTime(2024, 7, 11, 5, 15, 54, 824000000, tzinfo=),\n", " 'readCount': 1},\n", " {'id': 20,\n", " 'name': 'loi_text',\n", " 'state': 'ONLINE',\n", " 'populationPercent': 100.0,\n", " 'type': 'VECTOR',\n", " 'entityType': 'NODE',\n", " 'labelsOrTypes': ['LineOfInquiry'],\n", " 'properties': ['textEmbedding'],\n", " 'indexProvider': 'vector-2.0',\n", " 'owningConstraint': None,\n", 
" 'lastRead': None,\n", " 'readCount': None},\n", " {'id': 19,\n", " 'name': 'tloi_text',\n", " 'state': 'ONLINE',\n", " 'populationPercent': 100.0,\n", " 'type': 'VECTOR',\n", " 'entityType': 'NODE',\n", " 'labelsOrTypes': ['TriggeredLineOfInquiry'],\n", " 'properties': ['textEmbedding'],\n", " 'indexProvider': 'vector-2.0',\n", " 'owningConstraint': None,\n", " 'lastRead': neo4j.time.DateTime(2024, 8, 28, 4, 10, 41, 996000000, tzinfo=),\n", " 'readCount': 8},\n", " {'id': 15,\n", " 'name': 'unique_binding',\n", " 'state': 'ONLINE',\n", " 'populationPercent': 100.0,\n", " 'type': 'RANGE',\n", " 'entityType': 'NODE',\n", " 'labelsOrTypes': ['Binding'],\n", " 'properties': ['id'],\n", " 'indexProvider': 'range-1.0',\n", " 'owningConstraint': 'unique_binding',\n", " 'lastRead': neo4j.time.DateTime(2024, 9, 25, 4, 46, 57, 268000000, tzinfo=),\n", " 'readCount': 182321},\n", " {'id': 11,\n", " 'name': 'unique_exec',\n", " 'state': 'ONLINE',\n", " 'populationPercent': 100.0,\n", " 'type': 'RANGE',\n", " 'entityType': 'NODE',\n", " 'labelsOrTypes': ['Execution'],\n", " 'properties': ['id'],\n", " 'indexProvider': 'range-1.0',\n", " 'owningConstraint': 'unique_exec',\n", " 'lastRead': neo4j.time.DateTime(2024, 9, 25, 4, 46, 57, 268000000, tzinfo=),\n", " 'readCount': 686},\n", " {'id': 5,\n", " 'name': 'unique_goal',\n", " 'state': 'ONLINE',\n", " 'populationPercent': 100.0,\n", " 'type': 'RANGE',\n", " 'entityType': 'NODE',\n", " 'labelsOrTypes': ['Goal'],\n", " 'properties': ['id'],\n", " 'indexProvider': 'range-1.0',\n", " 'owningConstraint': 'unique_goal',\n", " 'lastRead': neo4j.time.DateTime(2024, 9, 25, 7, 24, 29, 931000000, tzinfo=),\n", " 'readCount': 358},\n", " {'id': 13,\n", " 'name': 'unique_inst',\n", " 'state': 'ONLINE',\n", " 'populationPercent': 100.0,\n", " 'type': 'RANGE',\n", " 'entityType': 'NODE',\n", " 'labelsOrTypes': ['WorkflowConfiguration'],\n", " 'properties': ['id'],\n", " 'indexProvider': 'range-1.0',\n", " 'owningConstraint': 
'unique_inst',\n", " 'lastRead': neo4j.time.DateTime(2024, 7, 12, 6, 11, 17, 400000000, tzinfo=),\n", " 'readCount': 183},\n", " {'id': 7,\n", " 'name': 'unique_loi',\n", " 'state': 'ONLINE',\n", " 'populationPercent': 100.0,\n", " 'type': 'RANGE',\n", " 'entityType': 'NODE',\n", " 'labelsOrTypes': ['LineOfInquiry'],\n", " 'properties': ['id'],\n", " 'indexProvider': 'range-1.0',\n", " 'owningConstraint': 'unique_loi',\n", " 'lastRead': neo4j.time.DateTime(2024, 9, 25, 7, 25, 29, 26000000, tzinfo=),\n", " 'readCount': 150},\n", " {'id': 3,\n", " 'name': 'unique_question',\n", " 'state': 'ONLINE',\n", " 'populationPercent': 100.0,\n", " 'type': 'RANGE',\n", " 'entityType': 'NODE',\n", " 'labelsOrTypes': ['Question'],\n", " 'properties': ['questionId'],\n", " 'indexProvider': 'range-1.0',\n", " 'owningConstraint': 'unique_question',\n", " 'lastRead': neo4j.time.DateTime(2024, 7, 11, 3, 37, 5, 228000000, tzinfo=),\n", " 'readCount': 17},\n", " {'id': 17,\n", " 'name': 'unique_seed',\n", " 'state': 'ONLINE',\n", " 'populationPercent': 100.0,\n", " 'type': 'RANGE',\n", " 'entityType': 'NODE',\n", " 'labelsOrTypes': ['WorkflowSeed'],\n", " 'properties': ['id'],\n", " 'indexProvider': 'range-1.0',\n", " 'owningConstraint': 'unique_seed',\n", " 'lastRead': neo4j.time.DateTime(2024, 9, 25, 4, 46, 39, 666000000, tzinfo=),\n", " 'readCount': 87469},\n", " {'id': 9,\n", " 'name': 'unique_tloi',\n", " 'state': 'ONLINE',\n", " 'populationPercent': 100.0,\n", " 'type': 'RANGE',\n", " 'entityType': 'NODE',\n", " 'labelsOrTypes': ['TriggeredLineOfInquiry'],\n", " 'properties': ['id'],\n", " 'indexProvider': 'range-1.0',\n", " 'owningConstraint': 'unique_tloi',\n", " 'lastRead': neo4j.time.DateTime(2024, 9, 25, 7, 25, 50, 664000000, tzinfo=),\n", " 'readCount': 500}]" ] }, "execution_count": 105, "metadata": {}, "output_type": "execute_result" } ], "source": [ "neo4j.query(\"SHOW INDEXES\")" ] }, { "cell_type": "code", "execution_count": 138, "id": 
"778db655-883c-42cf-b905-c4a5c96db8b4", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[{'tloi.id': 'http://localhost:8080/disk-project-server/tlois/TriggeredLOI-bfjxauj6EcYQ'},\n", " {'tloi.id': 'http://localhost:8080/disk-project-server/tlois/TriggeredLOI-E8PbUbCdZB4K'},\n", " {'tloi.id': 'http://localhost:8080/disk-project-server/tlois/TriggeredLOI-MA2p3owIlWh3'},\n", " {'tloi.id': 'http://localhost:8080/disk-project-server/tlois/TriggeredLOI-SP3oHYmxkUrM'},\n", " {'tloi.id': 'http://localhost:8080/disk-project-server/tlois/TriggeredLOI-j5QRPbmS5u61'},\n", " {'tloi.id': 'http://localhost:8080/disk-project-server/tlois/TriggeredLOI-oCyPWu4b8Rez'}]" ] }, "execution_count": 138, "metadata": {}, "output_type": "execute_result" } ], "source": [ "neo4j.query(\"\"\"\n", " MATCH (tloi:TriggeredLineOfInquiry) WHERE tloi.textEmbedding IS NULL\n", " return tloi.id\n", " \"\"\")" ] }, { "cell_type": "code", "execution_count": 137, "id": "722cde1b-e177-4511-afd8-c0be02e366f4", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "http://localhost:8080/disk-project-server/tlois/TriggeredLOI-fEdISYTbY6OC\n", "http://localhost:8080/disk-project-server/tlois/TriggeredLOI-oCyPWu4b8Rez\n", "http://localhost:8080/disk-project-server/tlois/TriggeredLOI-j5QRPbmS5u61\n", "http://localhost:8080/disk-project-server/tlois/TriggeredLOI-SP3oHYmxkUrM\n", "http://localhost:8080/disk-project-server/tlois/TriggeredLOI-MA2p3owIlWh3\n", "http://localhost:8080/disk-project-server/tlois/TriggeredLOI-E8PbUbCdZB4K\n", "http://localhost:8080/disk-project-server/tlois/TriggeredLOI-bfjxauj6EcYQ\n" ] } ], "source": [ "for ID in tloi_text.keys():\n", " print(ID)" ] }, { "cell_type": "code", "execution_count": 110, "id": "f952fd30-6198-488b-a4a7-376b8d9672db", "metadata": {}, "outputs": [], "source": [ "# This deletes the stored embeddings (uncomment to run)\n", "#neo4j.query(\"\"\"\n", "# MATCH (tloi:TriggeredLineOfInquiry) WHERE tloi.textEmbedding IS NOT NULL\n", "# remove 
tloi.textEmbedding\n", "# return tloi.id\n", "# \"\"\")" ] }, { "cell_type": "code", "execution_count": 139, "id": "1969003e-a81c-4b79-aae3-e67f2347c61d", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[]\n", "[]\n", "[]\n", "[]\n", "[]\n", "[]\n", "[]\n" ] } ], "source": [ "add_embedings_tloi = \"\"\"\n", " MATCH (tloi:TriggeredLineOfInquiry {id:$id}) WHERE tloi.textEmbedding IS NULL\n", " CALL apoc.ml.openai.embedding([tloi.fullText], $apiKey) yield embedding\n", " CALL db.create.setNodeVectorProperty(tloi, \"textEmbedding\", embedding)\n", "\"\"\"\n", "\n", "for ID in tloi_text.keys():\n", " print(neo4j.query(add_embedings_tloi, params={\"apiKey\":OPENAI_API_KEY, 'id':ID}))\n", " sleep(2)" ] }, { "cell_type": "code", "execution_count": 128, "id": "069ae76a-455e-4453-9812-79da75140d41", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[]\n", "[]\n" ] } ], "source": [ "add_embedings_loi = \"\"\"\n", " MATCH (loi:LineOfInquiry {id:$id}) WHERE loi.textEmbedding IS NULL\n", " CALL apoc.ml.openai.embedding([loi.fullText], $apiKey) yield embedding\n", " CALL db.create.setNodeVectorProperty(loi, \"textEmbedding\", embedding)\n", "\"\"\"\n", "\n", "for ID in loi_text.keys():\n", " print(neo4j.query(add_embedings_loi, params={\"apiKey\":OPENAI_API_KEY, 'id':ID}))\n", " sleep(2)" ] }, { "cell_type": "code", "execution_count": 140, "id": "f738cf0f-74da-439f-9c33-b187799ac03b", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[]\n", "[]\n" ] } ], "source": [ "add_embedings_goal = \"\"\"\n", " MATCH (g:Goal {id:$id}) WHERE g.textEmbedding IS NULL\n", " CALL apoc.ml.openai.embedding([g.fullText], $apiKey) yield embedding\n", " CALL db.create.setNodeVectorProperty(g, \"textEmbedding\", embedding)\n", "\"\"\"\n", "\n", "for ID in goal_text.keys():\n", " print(neo4j.query(add_embedings_goal, params={\"apiKey\":OPENAI_API_KEY, 'id':ID}))\n", " sleep(2)" ] }, { "cell_type": 
"code", "execution_count": 126, "id": "c8e13305-327f-42af-8e18-f9f8f3e588b5", "metadata": {}, "outputs": [], "source": [ "eg = neo4j.query(\"\"\"\n", " MATCH (tloi:TriggeredLineOfInquiry) \n", " WHERE tloi.textEmbedding IS NOT NULL\n", " RETURN tloi.textEmbedding\n", " LIMIT 1\n", " \"\"\"\n", ")" ] }, { "cell_type": "code", "execution_count": 127, "id": "ee776bbd-275f-4f8f-94ca-7391bb62839e", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[-0.01246707420796156,\n", " 0.01806901954114437,\n", " 0.023506201803684235,\n", " -0.022146906703710556,\n", " -0.03287023678421974,\n", " 0.03575359284877777,\n", " -0.005955499596893787,\n", " 0.005241525825113058,\n", " -0.014540343545377254,\n", " -0.05266926810145378]" ] }, "execution_count": 127, "metadata": {}, "output_type": "execute_result" } ], "source": [ "eg[0]['tloi.textEmbedding'][:10]" ] }, { "cell_type": "markdown", "id": "1a4f0ac0-8d3c-48db-9725-6aeb920a39cc", "metadata": {}, "source": [ "### Questions and Answers" ] }, { "cell_type": "code", "execution_count": 67, "id": "71313377-fd9c-42cd-bfbb-e94a80f1ca06", "metadata": {}, "outputs": [], "source": [ "# In a Neo4jVector retrieval query, the matched node is bound to `node` and its similarity to `score`.\n", "# Relationship types need a colon (`-[:hasGoal]->`); `-[hasGoal]->` only names a variable that matches any relationship.\n", "#More data for tlois:\n", "#This query returns too much data; it exceeds the max tokens\n", "#retrieval_query = \"\"\"\n", "# OPTIONAL MATCH (node) -[:hasGoal]-> (g:Goal)\n", "# OPTIONAL MATCH (node) -[:hasLineOfInquiry]-> (l:LineOfInquiry)\n", "# RETURN g.fullText + l.fullText AS text, score, {} AS metadata\n", "#\"\"\"\n", "#More data for tlois (goal text only):\n", "retrieval_query = \"\"\"\n", " OPTIONAL MATCH (node) -[:hasGoal]-> (g:Goal)\n", " RETURN g.fullText AS text, score, {} AS metadata\n", "\"\"\"" ] }, { "cell_type": "code", "execution_count": 56, "id": "c78ee2ae-b6f5-4493-989f-19a5ac1b1bba", "metadata": {}, "outputs": [], "source": [ "question = \"What is the Triggered Line of Inquiry with the lower p-value?\"" ] }, { "cell_type": "code", "execution_count": 80, "id": 
"e7eb9de9-0847-408e-b245-35eac23f017c", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Received notification from DBMS server: {severity: WARNING} {code: Neo.ClientNotification.Statement.FeatureDeprecationWarning} {category: DEPRECATION} {title: This feature is deprecated and will be removed in future versions.} {description: The query used a deprecated procedure. ('db.create.setVectorProperty' has been replaced by 'db.create.setNodeVectorProperty')} {position: line: 1, column: 84, offset: 83} for query: \"UNWIND $data AS row MATCH (n:`TriggeredLineOfInquiry`) WHERE elementId(n) = row.id CALL db.create.setVectorProperty(n, 'textEmbedding', row.embedding) YIELD node RETURN count(*)\"\n" ] } ], "source": [ "#Main retriever\n", "neo4j_vector_store_tloi = Neo4jVector.from_existing_graph(\n", " embedding=OpenAIEmbeddings(),\n", " url=NEO4J_URI,\n", " username=NEO4J_USERNAME,\n", " password=NEO4J_PASSWORD,\n", " index_name='tloi_text',\n", " node_label=\"TriggeredLineOfInquiry\",\n", " text_node_properties=[\"fullText\"],\n", " embedding_node_property=\"textEmbedding\",\n", " #retrieval_query=retrieval_query,\n", ")\n", "tloi_retriever = neo4j_vector_store_tloi.as_retriever()" ] }, { "cell_type": "code", "execution_count": 69, "id": "b693c2f4-e363-41fa-a52a-07ca7d0a4950", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Received notification from DBMS server: {severity: WARNING} {code: Neo.ClientNotification.Statement.FeatureDeprecationWarning} {category: DEPRECATION} {title: This feature is deprecated and will be removed in future versions.} {description: The query used a deprecated procedure. 
('db.create.setVectorProperty' has been replaced by 'db.create.setNodeVectorProperty')} {position: line: 1, column: 75, offset: 74} for query: \"UNWIND $data AS row MATCH (n:`LineOfInquiry`) WHERE elementId(n) = row.id CALL db.create.setVectorProperty(n, 'textEmbedding', row.embedding) YIELD node RETURN count(*)\"\n", "Received notification from DBMS server: {severity: WARNING} {code: Neo.ClientNotification.Statement.FeatureDeprecationWarning} {category: DEPRECATION} {title: This feature is deprecated and will be removed in future versions.} {description: The query used a deprecated procedure. ('db.create.setVectorProperty' has been replaced by 'db.create.setNodeVectorProperty')} {position: line: 1, column: 66, offset: 65} for query: \"UNWIND $data AS row MATCH (n:`Goal`) WHERE elementId(n) = row.id CALL db.create.setVectorProperty(n, 'textEmbedding', row.embedding) YIELD node RETURN count(*)\"\n" ] } ], "source": [ "#Other retrievers\n", "neo4j_vector_store_loi = Neo4jVector.from_existing_graph(\n", " embedding=OpenAIEmbeddings(),\n", " url=NEO4J_URI,\n", " username=NEO4J_USERNAME,\n", " password=NEO4J_PASSWORD,\n", " index_name='loi_text',\n", " node_label=\"LineOfInquiry\",\n", " text_node_properties=[\"fullText\"],\n", " embedding_node_property=\"textEmbedding\",\n", ")\n", "neo4j_vector_store_goal = Neo4jVector.from_existing_graph(\n", " embedding=OpenAIEmbeddings(),\n", " url=NEO4J_URI,\n", " username=NEO4J_USERNAME,\n", " password=NEO4J_PASSWORD,\n", " index_name='goal_text',\n", " node_label=\"Goal\",\n", " text_node_properties=[\"fullText\"],\n", " embedding_node_property=\"textEmbedding\",\n", ")\n", "\n", "loi_retriever = neo4j_vector_store_loi.as_retriever()\n", "goal_retriever = neo4j_vector_store_goal.as_retriever()" ] }, { "cell_type": "code", "execution_count": 59, "id": "58bf679f-0de4-4ff4-a09b-613a5948f009", "metadata": {}, "outputs": [], "source": [ "chain = RetrievalQAWithSourcesChain.from_chain_type(\n", " ChatOpenAI(temperature=0), \n", " 
chain_type=\"stuff\", \n", " retriever=tloi_retriever\n", ")" ] }, { "cell_type": "code", "execution_count": 82, "id": "ee4e78b4-4a55-49fc-8550-45ecc4ec73fd", "metadata": {}, "outputs": [], "source": [ "from langchain.retrievers import EnsembleRetriever\n", "ensemble_retriever = EnsembleRetriever(\n", " retrievers=[tloi_retriever, loi_retriever, goal_retriever], weights=[0.34, 0.33, 0.33]\n", ")" ] }, { "cell_type": "code", "execution_count": 84, "id": "2b534d10-40a0-4cc2-b0f0-d63f115f5a99", "metadata": { "scrolled": true }, "outputs": [], "source": [ "from langchain.chains import create_retrieval_chain\n", "from langchain.chains.combine_documents import create_stuff_documents_chain\n", "from langchain_core.prompts import ChatPromptTemplate\n", "from langchain_openai import ChatOpenAI\n", "\n", "\n", "retriever = ensemble_retriever\n", "llm = ChatOpenAI()\n", "\n", "system_prompt = (\n", " \"Use the given context to answer the question. \"\n", " \"If you don't know the answer, say you don't know. \"\n", " \"Use three sentence maximum and keep the answer concise. 
\"\n", " \"Context: {context}\"\n", ")\n", "prompt = ChatPromptTemplate.from_messages(\n", " [\n", " (\"system\", system_prompt),\n", " (\"human\", \"{input}\"),\n", " ]\n", ")\n", "question_answer_chain = create_stuff_documents_chain(llm, prompt)\n", "chain = create_retrieval_chain(retriever, question_answer_chain)" ] }, { "cell_type": "code", "execution_count": 86, "id": "f64df565-8b19-4bf5-9971-1a6a1ede3f68", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The Triggered Line of Inquiry with the lower p-value is:\n", "- ID: http://localhost:8080/disk-project-server/tlois/TriggeredLOI-oCyPWu4b8Rez\n", "- Execution confidence value: 0.37155457478486e0 (p-Value)\n" ] } ], "source": [ "resp = chain.invoke({\"input\": \"What is the Triggered Line of Inquiry with the lower p-value?\"})\n", "print(resp['answer'])" ] }, { "cell_type": "markdown", "id": "028074a4-0c9a-40c1-851e-8901b4ced9ae", "metadata": {}, "source": [ "This is not the correct answer. A p-value of 0.3716 is not the lowest one (TriggeredLOI-fEdISYTbY6OC, for example, has 0.0182). That value also belongs to TriggeredLOI-MA2p3owIlWh3, not to TriggeredLOI-oCyPWu4b8Rez." ] }, { "cell_type": "code", "execution_count": 131, "id": "0d607e60-63ef-43ec-8980-ba54976ce0a8", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The Triggered Line of Inquiry (TriggeredLOI-oCyPWu4b8Rez) is associated with the goal of running a meta-analysis for cohorts filtered by genetic ancestry. This line of inquiry focuses on investigating the effect size of a specific genetic variant (rs1080066) on the surface area of the Precentral Cortex for cohorts with European ancestry. 
The meta-regression analysis aims to determine the relationship between the effect size and the mean age of each cohort.\n" ] } ], "source": [ "resp = chain.invoke({\"input\": \"Give me a summary of TriggeredLOI-oCyPWu4b8Rez\"})\n", "print(resp['answer'])" ] }, { "cell_type": "code", "execution_count": 132, "id": "b3bf544e-0279-4250-ae20-08a63f69083b", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Triggered Line of Inquiry TriggeredLOI-oCyPWu4b8Rez is associated with Goal-XBkQmDYmJAn0 and investigates the effect size of the genotype rs1080066 on the Surface Area trait for the Precentral Cortex among cohorts of European ancestry. The execution was conducted using the Meta-Regression workflow on multiple cohort datasets, resulting in a p-value of 0.0828042022392925, indicating a moderate level of statistical confidence in the analysis.\n" ] } ], "source": [ "resp = chain.invoke({\"input\": \"Give me a summary of TriggeredLOI-oCyPWu4b8Rez, include information about the execution\"})\n", "print(resp['answer'])" ] }, { "cell_type": "code", "execution_count": 133, "id": "0fd20b45-fe62-4a9a-ad3f-7af15493f722", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "TriggeredLOI-oCyPWu4b8Rez is associated with Goal-XBkQmDYmJAn0, focusing on the effect size of rs1080066 on the Precentral Cortex Surface Area for cohorts of European ancestry. The execution date was 1970-01-20 at 16:40:55-03, with a confidence value of 0.018198423339433e0 (p-Value). 
The input files used for this analysis were scatter-28vnxquffs6hvegarrkosp7f3.png, p_value-p59rlb9wz6dyw74xqovftx1l, brain_visualization-3hwj1p2ov0jfe132m063dxez5, and log-4ys4xb4vp4vvsfmfdzteb82kr.\n" ] } ], "source": [ "resp = chain.invoke({\"input\": \"Give me a summary of TriggeredLOI-oCyPWu4b8Rez, include information about the execution and a list of the input files\"})\n", "print(resp['answer'])" ] }, { "cell_type": "markdown", "id": "58d55f2b-73b8-402d-9deb-9f395c27c000", "metadata": {}, "source": [ "**Note:** I asked about the same TLOI three times and got different information each time. The second answer reports a p-value of 0.0828, which is the value stored for this TLOI. The third reports 0.0182, which is the value of TriggeredLOI-fEdISYTbY6OC. The files listed as inputs are actually outputs.\n", "\n", "Let's try creating TLOI nodes with less text and adding a retrieval query." ] }, { "cell_type": "code", "execution_count": 147, "id": "de1c70e0-ec57-4931-b131-f15c612b2d17", "metadata": { "scrolled": true }, "outputs": [], "source": [ "#Less data now\n", "tloi_data_simpler = \"\"\"\n", "MATCH (tloi:TriggeredLineOfInquiry) -[:hasGoal]-> (g:Goal), \n", " (tloi:TriggeredLineOfInquiry) -[:hasLineOfInquiry]-> (loi:LineOfInquiry),\n", " (loi:LineOfInquiry) -[:hasQuestion]-> (q:Question),\n", " (tloi:TriggeredLineOfInquiry) -[:hasWorkflowInstantiation]-> (inst:WorkflowInstantiation),\n", " (inst:WorkflowInstantiation) -[:hasExecution]-> (exec:Execution)\n", "WITH COLLECT {MATCH (exec:Execution) -[:hasInput]-> (ba:Binding)\n", " RETURN apoc.text.replace(ba.variable, \".*?/\" , \"\") + \" = \" + ba.value } as inputs,\n", " tloi, g, loi, inst, exec, q\n", "RETURN tloi.id, g.id, loi.id, q.template, inst.name, inst.description, exec.confidence_value, exec.confidence_type, apoc.text.join(inputs, \"\\n - \")\n", "\"\"\"\n", "tloi_results_2 = neo4j.query(tloi_data_simpler)\n", "\n", "tloi_text_simpler = {}\n", "for i in tloi_results_2:\n", " tloi_text_simpler[i[\"tloi.id\"]] = \"[Triggered Line of Inquiry]\\nID: {}\\nGoal ID: {}\\nLine of Inquiry ID: {}\\nQuestion Template: {}\\nWorkflow Name: {}\\nWorkflow Description: {}\\nConfidence 
value: {} ({})\\nInputs: \\n - {}\".format(*i.values()).replace('http://localhost:8080/disk-project-server/tlois/','').replace('http://localhost:8080/disk-project-server/goals/','').replace('http://localhost:8080/disk-project-server/lois/','').replace('http://localhost:8080/wings-portal/export/users/admin/Enigma/data/library.owl#','')\n", " #print(tloi_text_simpler[i[\"tloi.id\"]])" ] }, { "cell_type": "code", "execution_count": 149, "id": "e0e57d17-3d0b-4f7e-b80a-59d03ba549f2", "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[{'tloi.text': \"[Triggered Line of Inquiry]\\nID: TriggeredLOI-fEdISYTbY6OC\\nGoal ID: Goal-XBkQmDYmJAn0\\nLine of Inquiry ID: LOI-Q7zw0HsrUwwD\\nQuestion Template: Is the effect size of ?Genotype on ?BrainImagingTrait of ?Region associated with ?DemographicAttribute for cohorts groups filtered by ?Criterion for ?Value\\nWorkflow Name: Meta-Regression\\nWorkflow Description: Meta regression is considered as an extended statistical model of meta analysis, regressing the effect size against variable(s) of interest to account for the systematic differences of the effect sizes being meta-analyzed\\nConfidence value: 0.018198423339433e0 (p-Value)\\nInputs: \\n - cohortData = 
['SHA47a5c5_ASRB_Significant_GWAS.csv','SHA4b15c5_FOR2107_Significant_GWAS.csv','SHA4b15c5_HUBIN_Significant_GWAS.csv','SHA4835c5_MCIC_Significant_GWAS.csv','SHA4825c5_MPRC_Significant_GWAS.csv','SHA4a65c5_PAFIP_Significant_GWAS.csv','SHA4865c5_TOP_Significant_GWAS.csv','SHA4335c5_UMCU_Significant_GWAS.csv','SHA4fe5c4_1000BRAINS_Significant_GWAS.csv','SHA4a85c5_ADNI2GO_Significant_GWAS.csv','SHA4c45c5_BONN_Significant_GWAS.csv','SHA4ad5c5_BrainScale_Significant_GWAS.csv','SHA4a75c5_DNS-V3_Significant_GWAS.csv','SHA4935c5_GSP_Significant_GWAS.csv','SHA4925c5_HUNT_Significant_GWAS.csv','SHA4b85c5_IMAGEN_Significant_GWAS.csv','SHA4bf5c5_ImpACT_Significant_GWAS.csv','SHA4b45c5_LBC1936_Significant_GWAS.csv','SHA48a5c5_LIBD_Significant_GWAS.csv','SHA4905c5_MooDS_Significant_GWAS.csv','SHA4945c5_MUNSTER_Significant_GWAS.csv','SHA4865c5_NCNG_Significant_GWAS.csv','SHA4835c5_NESDA_Significant_GWAS.csv','SHA4f55c5_NeuroIMAGE_Significant_GWAS.csv','SHA46d5c5_NTR_Significant_GWAS.csv','SHA4865c5_PDNZ_Significant_GWAS.csv','SHA47a5c5_PING_Significant_GWAS.csv','SHA3f15c5_PPMI_Significant_GWAS.csv','SHA4875c5_QTIM_Significant_GWAS.csv','SHA4bb5c5_SHIP_Significant_GWAS.csv','SHA4ee5c5_SHIP-Trend_Significant_GWAS.csv','SHA4a85c5_SYS_Significant_GWAS.csv','SHA4c15c5_TCD-NUIG_Significant_GWAS.csv','SHA4ae5c5_UiO2016_Significant_GWAS.csv','SHA3215e5_ABCD_Significant_GWAS.csv','SHA4815c5_ADNI1_Significant_GWAS.csv','SHA4bf5c6_ALSPACa_Significant_GWAS.csv','SHA4aa5c5_BETULA_Significant_GWAS.csv','SHA4945c5_BIG_Significant_GWAS.csv','SHA5305c4_BIG-PsychChip_Significant_GWAS.csv','SHA4ca5c5_CARDIFF_Significant_GWAS.csv','SHA4b15c5_DNS-V4_Significant_GWAS.csv','SHA4c65c5_EPIGEN_Significant_GWAS.csv','SHA4925c5_GIG_Significant_GWAS.csv','SHA4775c5_MPIP_Significant_GWAS.csv','SHA4765c5_OATS_Significant_GWAS.csv','SHA4ba5c5_SydneyMAS_Significant_GWAS.csv','SHA4945c5_TOP3T_Significant_GWAS.csv','SHA4aa5c5_UiO2017_Significant_GWAS.csv','SHA4c95c5_UKBB_Significant_GWAS.csv']\\n - snp = 
rs1080066\\n - area = PrecentralCortex\\n - demographic_value = ['38.5','34.4','41.9','33.7','37.2','28.3','35.2','33.1','67.3','72.4','38.2','10','19.7','21.4','58.9','14.6','40.8','72.7','33.2','33.6','35.8','51.6','37.5','17.1','29.4','68.2','11.8','61.7','22.4','55.8','50.4','28.3','29.9','31.8','9.96','74.8','19.6','62.4','22.6','22.5','24.8','19.9','38.4','24.2','48.3','70.5','78.4','33.2','42.1','62.8']\\n - trait = SA\\n - demographic_max = 0\\n - demographic_min = 0\\n - demographic = HasAge Mean (E)\"}]\n", "[{'tloi.text': \"[Triggered Line of Inquiry]\\nID: TriggeredLOI-oCyPWu4b8Rez\\nGoal ID: Goal-XBkQmDYmJAn0\\nLine of Inquiry ID: LOI-Q7zw0HsrUwwD\\nQuestion Template: Is the effect size of ?Genotype on ?BrainImagingTrait of ?Region associated with ?DemographicAttribute for cohorts groups filtered by ?Criterion for ?Value\\nWorkflow Name: Meta-Regression\\nWorkflow Description: Meta regression is considered as an extended statistical model of meta analysis, regressing the effect size against variable(s) of interest to account for the systematic differences of the effect sizes being meta-analyzed\\nConfidence value: 0.0828042022392925e0 (p-Value)\\nInputs: \\n - cohortData = 
['SHA47a5c5_ASRB_Significant_GWAS.csv','SHA4b15c5_FOR2107_Significant_GWAS.csv','SHA4b15c5_HUBIN_Significant_GWAS.csv','SHA4835c5_MCIC_Significant_GWAS.csv','SHA4825c5_MPRC_Significant_GWAS.csv','SHA4a65c5_PAFIP_Significant_GWAS.csv','SHA4865c5_TOP_Significant_GWAS.csv','SHA4335c5_UMCU_Significant_GWAS.csv','SHA4fe5c4_1000BRAINS_Significant_GWAS.csv','SHA4a85c5_ADNI2GO_Significant_GWAS.csv','SHA4c45c5_BONN_Significant_GWAS.csv','SHA4ad5c5_BrainScale_Significant_GWAS.csv','SHA4a75c5_DNS-V3_Significant_GWAS.csv','SHA4935c5_GSP_Significant_GWAS.csv','SHA4925c5_HUNT_Significant_GWAS.csv','SHA4b85c5_IMAGEN_Significant_GWAS.csv','SHA4bf5c5_ImpACT_Significant_GWAS.csv','SHA4b45c5_LBC1936_Significant_GWAS.csv','SHA48a5c5_LIBD_Significant_GWAS.csv','SHA4905c5_MooDS_Significant_GWAS.csv','SHA4945c5_MUNSTER_Significant_GWAS.csv','SHA4865c5_NCNG_Significant_GWAS.csv','SHA4835c5_NESDA_Significant_GWAS.csv','SHA4f55c5_NeuroIMAGE_Significant_GWAS.csv','SHA46d5c5_NTR_Significant_GWAS.csv','SHA4865c5_PDNZ_Significant_GWAS.csv','SHA47a5c5_PING_Significant_GWAS.csv','SHA3f15c5_PPMI_Significant_GWAS.csv','SHA4875c5_QTIM_Significant_GWAS.csv','SHA4bb5c5_SHIP_Significant_GWAS.csv','SHA4ee5c5_SHIP-Trend_Significant_GWAS.csv','SHA4a85c5_SYS_Significant_GWAS.csv','SHA4c15c5_TCD-NUIG_Significant_GWAS.csv','SHA4ae5c5_UiO2016_Significant_GWAS.csv','SHA4815c5_ADNI1_Significant_GWAS.csv','SHA4bf5c6_ALSPACa_Significant_GWAS.csv','SHA4aa5c5_BETULA_Significant_GWAS.csv','SHA4945c5_BIG_Significant_GWAS.csv','SHA5305c4_BIG-PsychChip_Significant_GWAS.csv','SHA4ca5c5_CARDIFF_Significant_GWAS.csv','SHA4b15c5_DNS-V4_Significant_GWAS.csv','SHA4c65c5_EPIGEN_Significant_GWAS.csv','SHA4925c5_GIG_Significant_GWAS.csv','SHA4775c5_MPIP_Significant_GWAS.csv','SHA4765c5_OATS_Significant_GWAS.csv','SHA4ba5c5_SydneyMAS_Significant_GWAS.csv','SHA4945c5_TOP3T_Significant_GWAS.csv','SHA4aa5c5_UiO2017_Significant_GWAS.csv','SHA4c95c5_UKBB_Significant_GWAS.csv']\\n - demographic_max = 0\\n - demographic_value = 
['38.5','34.4','41.9','33.7','37.2','28.3','35.2','33.1','67.3','72.4','38.2','10','19.7','21.4','58.9','14.6','40.8','72.7','33.2','33.6','35.8','51.6','37.5','17.1','29.4','68.2','11.8','61.7','22.4','55.8','50.4','28.3','29.9','31.8','74.8','19.6','62.4','22.6','22.5','24.8','19.9','38.4','24.2','48.3','70.5','78.4','33.2','42.1','62.8']\\n - area = PrecentralCortex\\n - demographic_min = 0\\n - demographic = HasAge Mean (E)\\n - trait = SA\\n - snp = rs1080066\"}]\n", "[{'tloi.text': \"[Triggered Line of Inquiry]\\nID: TriggeredLOI-j5QRPbmS5u61\\nGoal ID: Goal-XBkQmDYmJAn0\\nLine of Inquiry ID: LOI-Q7zw0HsrUwwD\\nQuestion Template: Is the effect size of ?Genotype on ?BrainImagingTrait of ?Region associated with ?DemographicAttribute for cohorts groups filtered by ?Criterion for ?Value\\nWorkflow Name: Meta-Regression\\nWorkflow Description: Meta regression is considered as an extended statistical model of meta analysis, regressing the effect size against variable(s) of interest to account for the systematic differences of the effect sizes being meta-analyzed\\nConfidence value: 0.020618776795934e0 (p-Value)\\nInputs: \\n - cohortData = 
['SHA47a5c5_ASRB_Significant_GWAS.csv','SHA4b15c5_FOR2107_Significant_GWAS.csv','SHA4b15c5_HUBIN_Significant_GWAS.csv','SHA4835c5_MCIC_Significant_GWAS.csv','SHA4825c5_MPRC_Significant_GWAS.csv','SHA4a65c5_PAFIP_Significant_GWAS.csv','SHA4865c5_TOP_Significant_GWAS.csv','SHA4335c5_UMCU_Significant_GWAS.csv','SHA4fe5c4_1000BRAINS_Significant_GWAS.csv','SHA4a85c5_ADNI2GO_Significant_GWAS.csv','SHA4c45c5_BONN_Significant_GWAS.csv','SHA4ad5c5_BrainScale_Significant_GWAS.csv','SHA4a75c5_DNS-V3_Significant_GWAS.csv','SHA4935c5_GSP_Significant_GWAS.csv','SHA4925c5_HUNT_Significant_GWAS.csv','SHA4b85c5_IMAGEN_Significant_GWAS.csv','SHA4bf5c5_ImpACT_Significant_GWAS.csv','SHA4b45c5_LBC1936_Significant_GWAS.csv','SHA48a5c5_LIBD_Significant_GWAS.csv','SHA4905c5_MooDS_Significant_GWAS.csv','SHA4945c5_MUNSTER_Significant_GWAS.csv','SHA4865c5_NCNG_Significant_GWAS.csv','SHA4835c5_NESDA_Significant_GWAS.csv','SHA4f55c5_NeuroIMAGE_Significant_GWAS.csv','SHA46d5c5_NTR_Significant_GWAS.csv','SHA4865c5_PDNZ_Significant_GWAS.csv','SHA47a5c5_PING_Significant_GWAS.csv','SHA3f15c5_PPMI_Significant_GWAS.csv','SHA4875c5_QTIM_Significant_GWAS.csv','SHA4bb5c5_SHIP_Significant_GWAS.csv','SHA4ee5c5_SHIP-Trend_Significant_GWAS.csv','SHA4a85c5_SYS_Significant_GWAS.csv','SHA4c15c5_TCD-NUIG_Significant_GWAS.csv','SHA4ae5c5_UiO2016_Significant_GWAS.csv','SHA4815c5_ADNI1_Significant_GWAS.csv','SHA4bf5c6_ALSPACa_Significant_GWAS.csv','SHA4aa5c5_BETULA_Significant_GWAS.csv','SHA4945c5_BIG_Significant_GWAS.csv','SHA5305c4_BIG-PsychChip_Significant_GWAS.csv','SHA4ca5c5_CARDIFF_Significant_GWAS.csv','SHA4b15c5_DNS-V4_Significant_GWAS.csv','SHA4c65c5_EPIGEN_Significant_GWAS.csv','SHA4925c5_GIG_Significant_GWAS.csv','SHA4775c5_MPIP_Significant_GWAS.csv','SHA4765c5_OATS_Significant_GWAS.csv','SHA4ba5c5_SydneyMAS_Significant_GWAS.csv','SHA4945c5_TOP3T_Significant_GWAS.csv','SHA4aa5c5_UiO2017_Significant_GWAS.csv']\\n - area = PrecentralCortex\\n - trait = SA\\n - demographic = HasAge Mean (E)\\n - 
demographic_max = 0\\n - demographic_min = 0\\n - snp = rs1080066\\n - demographic_value = ['38.5','34.4','41.9','33.7','37.2','28.3','35.2','33.1','67.3','72.4','38.2','10','19.7','21.4','58.9','14.6','40.8','72.7','33.2','33.6','35.8','51.6','37.5','17.1','29.4','68.2','11.8','61.7','22.4','55.8','50.4','28.3','29.9','31.8','74.8','19.6','62.4','22.6','22.5','24.8','19.9','38.4','24.2','48.3','70.5','78.4','33.2','42.1']\"}]\n", "[{'tloi.text': \"[Triggered Line of Inquiry]\\nID: TriggeredLOI-SP3oHYmxkUrM\\nGoal ID: Goal-XBkQmDYmJAn0\\nLine of Inquiry ID: LOI-Q7zw0HsrUwwD\\nQuestion Template: Is the effect size of ?Genotype on ?BrainImagingTrait of ?Region associated with ?DemographicAttribute for cohorts groups filtered by ?Criterion for ?Value\\nWorkflow Name: Meta-Regression\\nWorkflow Description: Meta regression is considered as an extended statistical model of meta analysis, regressing the effect size against variable(s) of interest to account for the systematic differences of the effect sizes being meta-analyzed\\nConfidence value: 0.500060579258888e0 (p-Value)\\nInputs: \\n - cohortData = 
['SHA47a5c5_ASRB_Significant_GWAS.csv','SHA4b15c5_FOR2107_Significant_GWAS.csv','SHA4b15c5_HUBIN_Significant_GWAS.csv','SHA4835c5_MCIC_Significant_GWAS.csv','SHA4825c5_MPRC_Significant_GWAS.csv','SHA4a65c5_PAFIP_Significant_GWAS.csv','SHA4865c5_TOP_Significant_GWAS.csv','SHA4335c5_UMCU_Significant_GWAS.csv','SHA4fe5c4_1000BRAINS_Significant_GWAS.csv','SHA4a85c5_ADNI2GO_Significant_GWAS.csv','SHA4c45c5_BONN_Significant_GWAS.csv','SHA4ad5c5_BrainScale_Significant_GWAS.csv','SHA4a75c5_DNS-V3_Significant_GWAS.csv','SHA4935c5_GSP_Significant_GWAS.csv','SHA4925c5_HUNT_Significant_GWAS.csv','SHA4b85c5_IMAGEN_Significant_GWAS.csv','SHA4bf5c5_ImpACT_Significant_GWAS.csv','SHA4b45c5_LBC1936_Significant_GWAS.csv','SHA48a5c5_LIBD_Significant_GWAS.csv','SHA4905c5_MooDS_Significant_GWAS.csv','SHA4945c5_MUNSTER_Significant_GWAS.csv','SHA4865c5_NCNG_Significant_GWAS.csv','SHA4835c5_NESDA_Significant_GWAS.csv','SHA4f55c5_NeuroIMAGE_Significant_GWAS.csv','SHA46d5c5_NTR_Significant_GWAS.csv','SHA4865c5_PDNZ_Significant_GWAS.csv','SHA47a5c5_PING_Significant_GWAS.csv','SHA3f15c5_PPMI_Significant_GWAS.csv','SHA4875c5_QTIM_Significant_GWAS.csv','SHA4bb5c5_SHIP_Significant_GWAS.csv','SHA4ee5c5_SHIP-Trend_Significant_GWAS.csv','SHA4a85c5_SYS_Significant_GWAS.csv','SHA4c15c5_TCD-NUIG_Significant_GWAS.csv','SHA4ae5c5_UiO2016_Significant_GWAS.csv','SHA4815c5_ADNI1_Significant_GWAS.csv','SHA4bf5c6_ALSPACa_Significant_GWAS.csv','SHA4aa5c5_BETULA_Significant_GWAS.csv','SHA4945c5_BIG_Significant_GWAS.csv','SHA5305c4_BIG-PsychChip_Significant_GWAS.csv','SHA4ca5c5_CARDIFF_Significant_GWAS.csv']\\n - demographic_value = ['38.5','34.4','41.9','33.7','37.2','28.3','35.2','33.1','67.3','72.4','38.2','10','19.7','21.4','58.9','14.6','40.8','72.7','33.2','33.6','35.8','51.6','37.5','17.1','29.4','68.2','11.8','61.7','22.4','55.8','50.4','28.3','29.9','31.8','74.8','19.6','62.4','22.6','22.5','24.8']\\n - trait = SA\\n - demographic_min = 0\\n - snp = rs1080066\\n - area = PrecentralCortex\\n - 
demographic_max = 0\\n - demographic = HasAge Mean (E)\"}]\n", "[{'tloi.text': \"[Triggered Line of Inquiry]\\nID: TriggeredLOI-MA2p3owIlWh3\\nGoal ID: Goal-XBkQmDYmJAn0\\nLine of Inquiry ID: LOI-Q7zw0HsrUwwD\\nQuestion Template: Is the effect size of ?Genotype on ?BrainImagingTrait of ?Region associated with ?DemographicAttribute for cohorts groups filtered by ?Criterion for ?Value\\nWorkflow Name: Meta-Regression\\nWorkflow Description: Meta regression is considered as an extended statistical model of meta analysis, regressing the effect size against variable(s) of interest to account for the systematic differences of the effect sizes being meta-analyzed\\nConfidence value: 0.37155457478486e0 (p-Value)\\nInputs: \\n - cohortData = ['SHA47a5c5_ASRB_Significant_GWAS.csv','SHA4b15c5_FOR2107_Significant_GWAS.csv','SHA4b15c5_HUBIN_Significant_GWAS.csv','SHA4835c5_MCIC_Significant_GWAS.csv','SHA4825c5_MPRC_Significant_GWAS.csv','SHA4a65c5_PAFIP_Significant_GWAS.csv','SHA4865c5_TOP_Significant_GWAS.csv','SHA4335c5_UMCU_Significant_GWAS.csv','SHA4fe5c4_1000BRAINS_Significant_GWAS.csv','SHA4a85c5_ADNI2GO_Significant_GWAS.csv','SHA4c45c5_BONN_Significant_GWAS.csv','SHA4ad5c5_BrainScale_Significant_GWAS.csv','SHA4a75c5_DNS-V3_Significant_GWAS.csv','SHA4935c5_GSP_Significant_GWAS.csv','SHA4925c5_HUNT_Significant_GWAS.csv','SHA4b85c5_IMAGEN_Significant_GWAS.csv','SHA4bf5c5_ImpACT_Significant_GWAS.csv','SHA4b45c5_LBC1936_Significant_GWAS.csv','SHA48a5c5_LIBD_Significant_GWAS.csv','SHA4905c5_MooDS_Significant_GWAS.csv','SHA4945c5_MUNSTER_Significant_GWAS.csv','SHA4865c5_NCNG_Significant_GWAS.csv','SHA4835c5_NESDA_Significant_GWAS.csv','SHA4f55c5_NeuroIMAGE_Significant_GWAS.csv','SHA46d5c5_NTR_Significant_GWAS.csv','SHA4865c5_PDNZ_Significant_GWAS.csv','SHA47a5c5_PING_Significant_GWAS.csv','SHA3f15c5_PPMI_Significant_GWAS.csv','SHA4875c5_QTIM_Significant_GWAS.csv','SHA4bb5c5_SHIP_Significant_GWAS.csv']\\n - demographic_min = 0\\n - area = PrecentralCortex\\n - demographic_value 
= ['38.5','34.4','41.9','33.7','37.2','28.3','35.2','33.1','67.3','72.4','38.2','10','19.7','21.4','58.9','14.6','40.8','72.7','33.2','33.6','35.8','51.6','37.5','17.1','29.4','68.2','11.8','61.7','22.4','55.8']\\n - trait = SA\\n - demographic = HasAge Mean (E)\\n - demographic_max = 0\\n - snp = rs1080066\"}]\n", "[{'tloi.text': \"[Triggered Line of Inquiry]\\nID: TriggeredLOI-E8PbUbCdZB4K\\nGoal ID: Goal-XBkQmDYmJAn0\\nLine of Inquiry ID: LOI-Q7zw0HsrUwwD\\nQuestion Template: Is the effect size of ?Genotype on ?BrainImagingTrait of ?Region associated with ?DemographicAttribute for cohorts groups filtered by ?Criterion for ?Value\\nWorkflow Name: Meta-Regression\\nWorkflow Description: Meta regression is considered as an extended statistical model of meta analysis, regressing the effect size against variable(s) of interest to account for the systematic differences of the effect sizes being meta-analyzed\\nConfidence value: 0.044376539244712e0 (p-Value)\\nInputs: \\n - cohortData = ['SHA47a5c5_ASRB_Significant_GWAS.csv','SHA4b15c5_FOR2107_Significant_GWAS.csv','SHA4b15c5_HUBIN_Significant_GWAS.csv','SHA4835c5_MCIC_Significant_GWAS.csv','SHA4825c5_MPRC_Significant_GWAS.csv','SHA4a65c5_PAFIP_Significant_GWAS.csv','SHA4865c5_TOP_Significant_GWAS.csv','SHA4335c5_UMCU_Significant_GWAS.csv','SHA4fe5c4_1000BRAINS_Significant_GWAS.csv','SHA4a85c5_ADNI2GO_Significant_GWAS.csv','SHA4c45c5_BONN_Significant_GWAS.csv','SHA4ad5c5_BrainScale_Significant_GWAS.csv','SHA4a75c5_DNS-V3_Significant_GWAS.csv','SHA4935c5_GSP_Significant_GWAS.csv','SHA4925c5_HUNT_Significant_GWAS.csv','SHA4b85c5_IMAGEN_Significant_GWAS.csv','SHA4bf5c5_ImpACT_Significant_GWAS.csv','SHA4b45c5_LBC1936_Significant_GWAS.csv','SHA48a5c5_LIBD_Significant_GWAS.csv','SHA4905c5_MooDS_Significant_GWAS.csv']\\n - area = PrecentralCortex\\n - demographic = HasAge Mean (E)\\n - snp = rs1080066\\n - demographic_max = 0\\n - trait = SA\\n - demographic_value = 
['38.5','34.4','41.9','33.7','37.2','28.3','35.2','33.1','67.3','72.4','38.2','10','19.7','21.4','58.9','14.6','40.8','72.7','33.2','33.6']\\n - demographic_min = 0\"}]\n", "[{'tloi.text': \"[Triggered Line of Inquiry]\\nID: TriggeredLOI-bfjxauj6EcYQ\\nGoal ID: Goal-XBkQmDYmJAn0\\nLine of Inquiry ID: LOI-Q7zw0HsrUwwD\\nQuestion Template: Is the effect size of ?Genotype on ?BrainImagingTrait of ?Region associated with ?DemographicAttribute for cohorts groups filtered by ?Criterion for ?Value\\nWorkflow Name: Meta-Regression\\nWorkflow Description: Meta regression is considered as an extended statistical model of meta analysis, regressing the effect size against variable(s) of interest to account for the systematic differences of the effect sizes being meta-analyzed\\nConfidence value: 0.263176970715283e0 (p-Value)\\nInputs: \\n - cohortData = ['SHA47a5c5_ASRB_Significant_GWAS.csv','SHA4b15c5_FOR2107_Significant_GWAS.csv','SHA4b15c5_HUBIN_Significant_GWAS.csv','SHA4835c5_MCIC_Significant_GWAS.csv','SHA4825c5_MPRC_Significant_GWAS.csv','SHA4a65c5_PAFIP_Significant_GWAS.csv','SHA4865c5_TOP_Significant_GWAS.csv','SHA4335c5_UMCU_Significant_GWAS.csv','SHA4fe5c4_1000BRAINS_Significant_GWAS.csv','SHA4a85c5_ADNI2GO_Significant_GWAS.csv']\\n - demographic_max = 0\\n - area = PrecentralCortex\\n - demographic = HasAge Mean (E)\\n - demographic_value = ['38.5','34.4','41.9','33.7','37.2','28.3','35.2','33.1','67.3','72.4']\\n - trait = SA\\n - demographic_min = 0\\n - snp = rs1080066\"}]\n" ] } ], "source": [ "add_simpler_text_tloi = \"\"\"\n", "MATCH (tloi:TriggeredLineOfInquiry {id: $id})\n", "SET tloi.text = $text\n", "RETURN tloi.text\"\"\"\n", "\n", "for ID in tloi_text_simpler.keys():\n", " print(neo4j.query(add_simpler_text_tloi, params={'id':ID, 'text': tloi_text_simpler[ID]}))" ] }, { "cell_type": "code", "execution_count": 150, "id": "27dd267a-86db-46c4-8b31-0ed84e2f2082", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[]" ] }, "execution_count": 150, 
"metadata": {}, "output_type": "execute_result" } ], "source": [ "neo4j.query(\"\"\"\n", " CREATE VECTOR INDEX `tloi_simple_text` IF NOT EXISTS\n", " FOR (t:TriggeredLineOfInquiry) ON (t.simpleTextEmbedding) \n", " OPTIONS { indexConfig: {\n", " `vector.dimensions`: 1536,\n", " `vector.similarity_function`: 'cosine' \n", " }}\n", "\"\"\")" ] }, { "cell_type": "code", "execution_count": 151, "id": "d0ea9bd8-2d74-446e-8ce7-cad4163c0f41", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[]\n", "[]\n", "[]\n", "[]\n", "[]\n", "[]\n", "[]\n" ] } ], "source": [ "add_simpler_embeddings_tloi = \"\"\"\n", " MATCH (tloi:TriggeredLineOfInquiry {id:$id}) WHERE tloi.simpleTextEmbedding IS NULL\n", " CALL apoc.ml.openai.embedding([tloi.text], $apiKey) yield embedding\n", " CALL db.create.setNodeVectorProperty(tloi, \"simpleTextEmbedding\", embedding)\n", "\"\"\"\n", "\n", "for ID in tloi_text_simpler.keys():\n", " print(neo4j.query(add_simpler_embeddings_tloi, params={\"apiKey\":OPENAI_API_KEY, 'id':ID}))\n", " sleep(2)" ] }, { "cell_type": "code", "execution_count": 170, "id": "6dda6078-c28b-4407-8ee9-cd56f54664b4", "metadata": {}, "outputs": [], "source": [ "# Retrieval query to attach more data to the retrieved TLOIs.\n", "# Second attempt: return only the goal description, as the LOI text is too long.\n", "# Neo4jVector appends this query to its vector search, which yields `node` and `score`.\n", "retrieval_query = \"\"\"\n", " OPTIONAL MATCH (node) -[hasGoal]-> (g:Goal)\n", " RETURN g.fullText AS text, score, {} AS metadata\n", "\"\"\"" ] }, { "cell_type": "code", "execution_count": 174, "id": "5077ca9f-556d-4033-bc52-8b96a1c30bb2", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Received notification from DBMS server: {severity: WARNING} {code: Neo.ClientNotification.Statement.FeatureDeprecationWarning} {category: DEPRECATION} {title: This feature is deprecated and will be removed in future versions.} {description: The query used a deprecated procedure. 
('db.create.setVectorProperty' has been replaced by 'db.create.setNodeVectorProperty')} {position: line: 1, column: 84, offset: 83} for query: \"UNWIND $data AS row MATCH (n:`TriggeredLineOfInquiry`) WHERE elementId(n) = row.id CALL db.create.setVectorProperty(n, 'simpleTextEmbedding', row.embedding) YIELD node RETURN count(*)\"\n" ] } ], "source": [ "#Simpler retriever\n", "neo4j_vector_store_tloi_s = Neo4jVector.from_existing_graph(\n", " embedding=OpenAIEmbeddings(),\n", " url=NEO4J_URI,\n", " username=NEO4J_USERNAME,\n", " password=NEO4J_PASSWORD,\n", " index_name='tloi_simple_text',\n", " node_label=\"TriggeredLineOfInquiry\",\n", " text_node_properties=[\"text\"],\n", " embedding_node_property=\"simpleTextEmbedding\",\n", " #retrieval_query=retrieval_query,\n", ")\n", "tloi_retriever_simpler = neo4j_vector_store_tloi_s.as_retriever()" ] }, { "cell_type": "code", "execution_count": 175, "id": "adfc0046-7df2-49db-9441-393ef827caac", "metadata": {}, "outputs": [], "source": [ "from langchain_core.prompts import ChatPromptTemplate\n", "from langchain.chains import create_retrieval_chain\n", "from langchain.chains.combine_documents import create_stuff_documents_chain\n", "\n", "retriever = tloi_retriever_simpler\n", "llm = ChatOpenAI()\n", "\n", "system_prompt = TEXT_CONTEXT + (\n", " \"Use the given context to answer the question. \"\n", " \"If you don't know the answer, say you don't know. \"\n", " #\"Use three sentences maximum and keep the answer concise. \"\n", " \"Context: {context}\"\n", ")\n", "prompt = ChatPromptTemplate.from_messages(\n", " [\n", " (\"system\", system_prompt),\n", " (\"human\", \"{input}\"),\n", " ]\n", ")\n", "question_answer_chain = create_stuff_documents_chain(llm, prompt)\n", "chain = create_retrieval_chain(retriever, question_answer_chain)" ] },
{ "cell_type": "code", "execution_count": 176, "id": "05232f44-ec7b-445c-98e0-3e93bab284ef", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The Triggered Line of Inquiry with the lower p-value is the one with ID TriggeredLOI-j5QRPbmS5u61, which has a confidence value of 0.020618776795934e0 (p-Value).\n" ] } ], "source": [ "# Let's try the same question:\n", "resp = chain.invoke({\"input\": \"What is the Triggered Line of Inquiry with the lower p-value?\"})\n", "print(resp['answer'])" ] }, { "cell_type": "markdown", "id": "aa783d44-aca2-4e4c-9d21-3314631b857b", "metadata": {}, "source": [ "Better, but still wrong: the minimum p-value is 0.018198423339433e0, for TriggeredLOI-fEdISYTbY6OC." ] }, { "cell_type": "code", "execution_count": 160, "id": "d93a922f-082f-43be-9485-2abe6e6f87cb", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The Triggered Line of Inquiry with the lower confidence value is TriggeredLOI-j5QRPbmS5u61 with a confidence value of 0.020618776795934e0 (p-Value).\n" ] } ], "source": [ "resp = chain.invoke({\"input\": \"What is the Triggered Line of Inquiry with the lower confidence value?\"})\n", "print(resp['answer'])" ] }, { "cell_type": "code", "execution_count": 177, "id": "d36baa60-61cb-451f-a065-0238891b0366", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "TriggeredLOI-j5QRPbmS5u61 is associated with the Goal ID: Goal-XBkQmDYmJAn0 and Line of Inquiry ID: LOI-Q7zw0HsrUwwD. 
The question template for this triggered line of inquiry is \"Is the effect size of ?Genotype on ?BrainImagingTrait of ?Region associated with ?DemographicAttribute for cohorts groups filtered by ?Criterion for ?Value\". The workflow used is Meta-Regression, which involves regressing the effect size against variable(s) of interest to account for systematic differences in effect sizes being meta-analyzed.\n", "\n", "The confidence value for this triggered line of inquiry is 0.020618776795934e0 (p-Value). \n", "\n", "The inputs used for this execution include:\n", "- cohortData: A list of significant GWAS files from various cohorts\n", "- area: PrecentralCortex\n", "- trait: SA\n", "- demographic: HasAge Mean (E)\n", "- demographic_max: 0\n", "- demographic_min: 0\n", "- snp: rs1080066\n", "- demographic_value: A list of demographic values\n", "\n", "These inputs were used to customize the data query template and workflow seed for the execution associated with TriggeredLOI-j5QRPbmS5u61.\n" ] } ], "source": [ "resp = chain.invoke({\"input\": \"Give me a summary of TriggeredLOI-j5QRPbmS5u61, include information about the execution and input files used\"})\n", "print(resp['answer'])" ] }, { "cell_type": "code", "execution_count": 178, "id": "42b120e0-3441-4e18-b5c8-a13770b37a29", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The question template with all variables replaced for values in TriggeredLOI-j5QRPbmS5u61 is:\n", "\"Is the effect size of rs1080066 on SA of PrecentralCortex associated with HasAge Mean (E) for cohorts groups filtered by 0 for ['38.5','34.4','41.9','33.7','37.2','28.3','35.2','33.1','67.3','72.4']\"\n" ] } ], "source": [ "resp = chain.invoke({\"input\": \"For TriggeredLOI-j5QRPbmS5u61, give me the question template with all variables replaced for values\"})\n", "print(resp['answer'])" ] }, { "cell_type": "markdown", "id": "37c5d919-921b-4b4c-8799-b5b5fa37fa17", "metadata": {}, "source": [
"Almost correct, but it cannot determine the ethnicity value: that information is stored in the question graph, which we have not loaded yet." ] }, { "cell_type": "markdown", "id": "1cea502d-423c-44be-866f-79fd3caccfca", "metadata": {}, "source": [ "## Approach 2: Only text" ] }, { "cell_type": "code", "execution_count": 179, "id": "d7868b72-94c8-4903-9cb5-3c3355ae77bf", "metadata": {}, "outputs": [], "source": [ "FULL_TEXT = TEXT_CONTEXT" ] }, { "cell_type": "code", "execution_count": 180, "id": "f3984701-711d-419e-93a6-b7847a0e8b4d", "metadata": {}, "outputs": [], "source": [ "for key in goal_text:\n", " part = goal_text[key].replace('http://localhost:8080/disk-project-server/goals/','')\n", " #print(part)\n", " FULL_TEXT += '\\n' + part" ] }, { "cell_type": "code", "execution_count": 181, "id": "671b2083-3064-4c5f-9ef3-3e25556f6a4d", "metadata": { "scrolled": true }, "outputs": [], "source": [ "for key in loi_text:\n", " part = loi_text[key].replace('http://localhost:8080/disk-project-server/lois/','')\n", " #print(part)\n", " FULL_TEXT += '\\n' + part" ] }, { "cell_type": "code", "execution_count": 182, "id": "21f611ab-ffda-47e2-b2c6-73c5f1d7dc11", "metadata": { "scrolled": true }, "outputs": [], "source": [ "for key in tloi_text:\n", " part = tloi_text[key].replace('http://localhost:8080/disk-project-server/tlois/','').replace('http://localhost:8080/wings-portal/export/users/admin/Enigma/data/library.owl#','')\n", " part = part.replace('http://localhost:8080/disk-project-server/goals/','').replace('http://localhost:8080/disk-project-server/lois/','')\n", " #print(part)\n", " FULL_TEXT += '\\n' + part" ] }, { "cell_type": "code", "execution_count": 125, "id": "399a0725-2eba-4944-a27a-4a12ead57d37", "metadata": {}, "outputs": [], "source": [ "from langchain.text_splitter import RecursiveCharacterTextSplitter\n", "# Global constants\n", "VECTOR_INDEX_NAME = 'disk_chunks'\n", "VECTOR_NODE_LABEL = 'Chunk'\n", "VECTOR_SOURCE_PROPERTY = 'text'\n", "VECTOR_EMBEDDING_PROPERTY = 
'textEmbedding'" ] }, { "cell_type": "code", "execution_count": 126, "id": "e578c5b2-1298-4b18-9c44-9bbc24c7880c", "metadata": {}, "outputs": [], "source": [ "text_splitter = RecursiveCharacterTextSplitter(\n", " chunk_size = 2000,\n", " chunk_overlap = 200,\n", " length_function = len,\n", " is_separator_regex = False,\n", ")" ] }, { "cell_type": "code", "execution_count": 127, "id": "2d02d90d-0501-45f8-9633-a88ce6b0437e", "metadata": {}, "outputs": [], "source": [ "text_chunks = text_splitter.split_text(FULL_TEXT)" ] }, { "cell_type": "code", "execution_count": 134, "id": "3f8a2d56-3f39-47a7-92c2-93aef2ed6166", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'[GENERAL CONTEXT]\\nA Question Template is a text representation of possible questions the DISK system is able to test.\\nQuestion Templates contains one or more Question Variables that are denoted by the prefix “?” (e.g ?Genotype is the Question Variable \"Genotype\").\\nQuestion Variables provide multiple options retrieved from the data source. Users can select option values to customize the Question Template. \\n\\nA Goal is what a DISK user wants to test. Goals are identified by an ID and have Name and Description.\\nGoals follow a Question Template and provide values for all of its Question Variables.\\n\\nA Line of Inquiry is how DISK will test a Question Template. 
Lines of inquiry are identified by ID and have the follorwing properties: Name, Description, Data Query Template and Workflow Seed.\\nLines of Inquiry follow a Question Template and use Question Variable values to customize its Data Query Template and Workflow Seed.\\n\\nWhen the DISK system finds a Goal and a Line of Inquiry that follows the same Question template, a Triggered Line of Inquiry is created.\\nA Triggered Line of Inquiry is identified by an ID, Data Query and Workflow Instantiation.\\nThe Triggered Line of Inquiry Data Query is created by using the Goal Question Variable Values to customize the Line of Inquiry Data Query Template. \\nThis data query is used to retrieve inputs and parameters to use on the Workflow Seed. When all parameters and inputs are set, a new Execution is send.\\nThis data query is executed periodically and when new data is found a new Triggered Line of Inquiry is created.\\n\\nAn Execution is a workflow run. Uses the data gathered by the Triggered Line of Inquiry to customize the run of an experiment.\\nThis experiment can return a confidence value and one or several output files.'" ] }, "execution_count": 134, "metadata": {}, "output_type": "execute_result" } ], "source": [ "text_chunks[0]" ] }, { "cell_type": "markdown", "id": "76ba272b-b9be-4368-b662-5c6082ceb952", "metadata": {}, "source": [ "Idea: generate a single node for all kinds of text information." ] }, { "cell_type": "code", "execution_count": null, "id": "3d7bc988-409d-4fa8-94c7-4f84c4f9fef4", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.6" } }, "nbformat": 4, "nbformat_minor": 5 }