{ "metadata": { "name": "" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "heading", "level": 1, "metadata": {}, "source": [ "Find my lab's experiments" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "I'd really like to get all of my lab's experiments, and I think using a dedicated query language can make such operations easier.\n", "\n", " 1. [Setup python environment](#Setup)\n", " 1. [Define JSON-LD Import code](#JSON-LD-Importer)\n", " 1. [Define Context](#Context)\n", " 1. [Create RDF Model](#Create-RDF-Model) and import experiments collection\n", " 1. [Query](#Query)\n", " 1. [Query Experiments](#Query-Experiments)\n", " 1. [Load Experiment Detail](#Load-Experiment-Detail)\n", " 1. [Query Biosamples](#Query-Biosamples)\n", " 1. [Query Nucleic Acid Term](#Query-Nucleic-Acid-Term)\n", " 1. [Query SO](#Query-SO)\n", " 1. [Search for fastqs](#Search-for-Fastqs)" ] }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Setup" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Initial setup for this notebook." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The RDF library I've been using is [Redland](http://librdf.org/). The first JSON-LD library I found on PyPI was [PyLD](https://pypi.python.org/pypi/PyLD/0.4.9); however, it returns a structure that Redland doesn't directly read, so I needed to provide an adaptor. Some of the other JSON-LD [language bindings](http://json-ld.org/#developers) may be a bit more convenient. There's also a pure Python RDF package called [RDFLib](https://github.com/RDFLib/rdflib) that has a JSON-LD parser."
] }, { "cell_type": "code", "collapsed": false, "input": [ "import collections\n", "from hashlib import md5\n", "import json\n", "import os\n", "import netrc\n", "from pprint import pformat\n", "import requests\n", "import RDF\n", "from StringIO import StringIO\n", "import sys\n", "import time\n", "import types\n", "import urllib\n", "from urlparse import urlsplit, urlunsplit, urljoin, parse_qs" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 1 }, { "cell_type": "code", "collapsed": false, "input": [ "# point to where I extracted PyLD\n", "pyld = os.path.expanduser('~/src/PyLD-0.4.9/lib')\n", "if pyld not in sys.path:\n", "    sys.path.append(pyld)\n", "from pyld import jsonld" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 2 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Store the password outside of the code being shared by using a netrc file.\n", "\n", "[Keyring](https://pypi.python.org/pypi/keyring) may be a better solution as it encrypts the password. However, it requires that you query by URL and user id, and I was having trouble getting it to work with the KDE keychain." ] }, { "cell_type": "code", "collapsed": false, "input": [ "ENCODE_HOST = 'submit.encodedcc.org'\n", "USER_ID, _, PASSWD = netrc.netrc().authenticators(ENCODE_HOST)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 3 }, { "cell_type": "markdown", "metadata": {}, "source": [ "I got tired of typing long URLs, so this tries to fix up a URL fragment with defaults intelligent enough for this document." ] }, { "cell_type": "code", "collapsed": false, "input": [ "def prepare_url(request_url, **kwargs):\n", "    '''This attempts to provide some convenience for accessing a URL\n", "\n", "    Given a url fragment it will default to:\n", "    * requests over http\n", "    * requests to submit.encodedcc.org (ENCODE_HOST from above)\n", "    * appending limit=all to the query string\n", "\n", "    This allows fairly flexible urls. 
e.g.\n", "\n", "        prepare_url('/experiments/ENCSR000AEG')\n", "        prepare_url('submit.encodedcc.org/experiments/ENCSR000AEG')\n", "        prepare_url('http://submit.encodedcc.org/experiments/ENCSR000AEG?limit=all')\n", "\n", "    should all return the same url\n", "    '''\n", "    # clean up potentially messy urls\n", "    url = urlsplit(request_url)._asdict()\n", "    if not url['scheme']:\n", "        url['scheme'] = 'http'\n", "    if not url['netloc']:\n", "        url['netloc'] = ENCODE_HOST\n", "    if url['query']:\n", "        # url is a dict here, so look the query string up by key;\n", "        # parse_qs returns lists of values, hence doseq=True below\n", "        kwargs.update(parse_qs(url['query']))\n", "    url['query'] = urllib.urlencode(kwargs, doseq=True)\n", "    url = urlunsplit(url.values())\n", "    return url\n" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 4 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Just a little bit more than the DCC's example for accessing objects at submit." ] }, { "cell_type": "code", "collapsed": false, "input": [ "def get_ENCODE(obj_id, **kwargs):\n", "    '''GET an ENCODE object as JSON and return as dict\n", "\n", "    Uses prepare_url to allow url short-cuts\n", "\n", "    This will also encode additional keyword arguments in the query string.\n", "    '''\n", "    if len(kwargs) == 0:\n", "        kwargs['limit'] = 'all'\n", "\n", "    url = prepare_url(obj_id, **kwargs)\n", "    print 'requesting:', url\n", "\n", "    # do the request\n", "    headers = {'content-type': 'application/json'}\n", "    response = requests.get(url, auth=(USER_ID, PASSWD), headers=headers)\n", "    if not response.status_code == requests.codes.ok:\n", "        print >> sys.stderr, response.text\n", "        response.raise_for_status()\n", "    return response.json()" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 5 }, { "cell_type": "code", "collapsed": false, "input": [ "test_req = get_ENCODE('http://submit.encodedcc.org/labs/barbara-wold/')" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "requesting: http://submit.encodedcc.org/labs/barbara-wold/?limit=all\n" ] } ], 
"prompt_number": 6 }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "JSON-LD Importer" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The ENCODE3 submit tool, encoded, returns a JSON structure that's been partially formatted as JSON-LD, in that many objects have `@id` and `@type` attributes. However, by using the JSON-LD Context we can clean up the structure a bit more.\n", "\n", "My entry point for importing ENCODE3 objects is `load_LD_ENCODE`." ] }, { "cell_type": "code", "collapsed": false, "input": [ "def load_LD_ENCODE(model, url, contexts, **kwargs):\n", "    \"\"\"Load an encode json object at url into our model.\n", "\n", "    This uses the provided context dictionary to improve the json to json-ld.\n", "\n", "    Additional keyword arguments can also be provided and will be added\n", "    to the request's query string\n", "    \"\"\"\n", "    resource = prepare_url(url)\n", "    print 'resource:', resource\n", "    # the encode site kept 500-ing on me, so let's throw in a quick and dirty cache\n", "    resource_cache = 'cache/' + md5(resource).hexdigest()+'.cache.json'\n", "    if os.path.exists(resource_cache):\n", "        with open(resource_cache, 'r') as instream:\n", "            data = json.loads(instream.read())\n", "    else:\n", "        data = get_ENCODE(resource, **kwargs)\n", "        # make sure the cache directory exists before writing to it\n", "        if not os.path.isdir('cache'):\n", "            os.makedirs('cache')\n", "        with open(resource_cache, 'w') as outstream:\n", "            outstream.write(json.dumps(data))\n", "    addContextToEncodedTree(data, contexts, base=resource)\n", "    loadJSONintoModel(model, data)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 7 }, { "cell_type": "code", "collapsed": false, "input": [ "def addContextToEncodedTree(tree, contexts, base=None):\n", "    \"\"\"Add contexts to various objects in the tree.\n", "\n", "    tree is a json tree returned from the DCC's encoded database.\n", "    contexts is a dictionary of dictionaries containing contexts\n", "    for the various possible encoded classes.\n", "    base, if supplied, allows setting the base url that relative\n", "    urls will be 
resolved against.\n", "    \"\"\"\n", "    tree['@context'] = contexts[None]\n", "    if base:\n", "        tree['@context']['@base'] = base\n", "    addContextToEncodedChild(tree, contexts)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 8 }, { "cell_type": "code", "collapsed": false, "input": [ "def addContextToEncodedChild(obj, contexts):\n", "    '''Add JSON-LD context to the encoded JSON.\n", "\n", "    This is recursive because some of the IDs were relative URLs\n", "    and I needed a way to properly compute the correct base URL.\n", "    '''\n", "    # pretend strings aren't iterable\n", "    if type(obj) in types.StringTypes:\n", "        return\n", "\n", "    # recurse on container types\n", "    if isinstance(obj, collections.Sequence):\n", "        # how should I update lists?\n", "        for v in obj:\n", "            addContextToEncodedChild(v, contexts)\n", "        return\n", "\n", "    if isinstance(obj, collections.Mapping):\n", "        for v in obj.values():\n", "            addContextToEncodedChild(v, contexts)\n", "\n", "    # we have an object. 
attach a context to it.\n", "    if isEncodedObject(obj):\n", "        default_base = contexts[None]['@base']\n", "        context = {'@base': urljoin(default_base, obj['@id'])}\n", "        for t in obj['@type']:\n", "            if t in contexts:\n", "                context.update(contexts[t])\n", "        if len(context) > 0:\n", "            obj.setdefault('@context', {}).update(context)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 9 }, { "cell_type": "code", "collapsed": false, "input": [ "def isEncodedObject(obj):\n", "    '''Test to see if an object is a JSON-LD object\n", "\n", "    Some of the nested dictionaries lack the @id or @type\n", "    information necessary to convert them.\n", "    '''\n", "    if not isinstance(obj, collections.Iterable):\n", "        return False\n", "\n", "    if '@id' in obj and '@type' in obj:\n", "        return True\n", "    return False" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 10 }, { "cell_type": "code", "collapsed": false, "input": [ "def loadJSONintoModel(model, json_data):\n", "    '''Given a PyLD dictionary, load its statements into our model\n", "    '''\n", "    json_graphs = jsonld.to_rdf(json_data)\n", "    for graph in json_graphs:\n", "        for triple in json_graphs[graph]:\n", "            s = pyldToNode(triple['subject'])\n", "            p = pyldToNode(triple['predicate'])\n", "            o = pyldToNode(triple['object'])\n", "            stmt = RDF.Statement(s, p, o)\n", "            model.add_statement(stmt) #, graph_context)\n" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 11 }, { "cell_type": "code", "collapsed": false, "input": [ "def pyldToNode(item):\n", "    '''Convert a PyLD node to a Redland node'''\n", "    nodetype = item['type']\n", "    value = item['value']\n", "    datatype = item.get('datatype', None)\n", "\n", "    if nodetype == 'blank node':\n", "        return RDF.Node(blank=value)\n", "    elif nodetype == 'IRI':\n", "        return RDF.Node(uri_string=str(value))\n", "    else:\n", "        return RDF.Node(literal=unicode(value).encode('utf-8'),\n", "                        datatype=RDF.Uri(datatype))" ], "language": "python", 
"metadata": {}, "outputs": [], "prompt_number": 12 }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Context" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note: `contexts[None]` holds the default context that will be attached to the root of the tree." ] }, { "cell_type": "code", "collapsed": false, "input": [ "contexts = {\n", "    # The None context will get added to the root of the tree and will\n", "    # provide common defaults.\n", "    None: {\n", "        # give this context a default descriptive url.\n", "        # I wish encoded's rendering supported fragment linking\n", "        '@vocab': 'http://submit.encodedcc.org/profiles/experiment.json#',\n", "\n", "        # terms in multiple encoded objects\n", "        \"lab\": { \"@type\": \"@id\" },\n", "        \"pi\": { \"@type\": \"@id\" },\n", "        \"description\": \"rdf:description\",\n", "        'href': { '@type': '@id' },\n", "        'url': { '@type': '@id' },\n", "    },\n", "    # Identify and markup contained classes.\n", "    # e.g. in the tree there was a sub-dictionary named 'biosample'\n", "    # That dictionary had a term 'biosample_term_id', which is the\n", "    # term that should be used as the @id.\n", "    'biosample': {\n", "        'biosample_term_id': { '@type': '@id' },\n", "    },\n", "    'experiment': {\n", "        \"assay_term_id\": { \"@type\": \"@id\" },\n", "    },\n", "    # I tried to use the JSON-LD mapping capabilities to convert the lab\n", "    # contact information into a vcard record, but the encoded model\n", "    # didn't lend itself well to the vcard schema\n", "    #'lab': {\n", "    #    \"address1\": \"vcard:street-address\",\n", "    #    \"address2\": \"vcard:street-address\",\n", "    #    \"city\": \"vcard:locality\",\n", "    #    \"state\": \"vcard:region\",\n", "    #    \"country\": \"vcard:country\"\n", "    #},\n", "    'human_donor': {\n", "        'award': { '@type': '@id' },\n", "    },\n", "    'library': {\n", "        'award': { '@type': '@id' },\n", "        'nucleic_acid_term_id': { '@type': '@id' }\n", "    }\n", "}\n", "\n", "namespaces = {\n", "    # JSON-LD lets you define namespaces so you can 
use the shortened url syntax.\n", "    # (instead of http://www.w3.org/2000/01/rdf-schema#label you can do\n", "    # rdfs:label)\n", "    \"rdf\": \"http://www.w3.org/1999/02/22-rdf-syntax-ns#\",\n", "    \"rdfs\": \"http://www.w3.org/2000/01/rdf-schema#\",\n", "    \"owl\": \"http://www.w3.org/2002/07/owl#\",\n", "    \"dc\": \"http://purl.org/dc/elements/1.1/\",\n", "    \"xsd\": \"http://www.w3.org/2001/XMLSchema#\",\n", "    \"vcard\": \"http://www.w3.org/2006/vcard/ns#\",\n", "\n", "    # for some namespaces I made a best guess for the ontology root.\n", "    \"EFO\": \"http://www.ebi.ac.uk/efo/\", # EFO ontology\n", "    \"OBO\": \"http://purl.obolibrary.org/obo/\", # OBO ontology\n", "    \"OBI\": \"http://purl.obolibrary.org/obo/OBI_\", # Ontology for Biomedical Investigations\n", "    # OBI: available from http://svn.code.sf.net/p/obi/code/releases/2012-07-01/merged/merged-obi-comments.owl\n", "    'SO': 'http://purl.obolibrary.org/obo/SO_', # Sequence ontology\n", "    # SO: available from http://www.berkeleybop.org/ontologies/so.owl\n", "\n", "    # make a fake shortening for this ontology\n", "    'encode3exp': 'http://submit.encodedcc.org/profiles/experiment.json#'\n", "}\n", "\n", "contexts[None].update(namespaces)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 13 }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Create RDF Model" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For this notebook I'm just creating a memory model; Redland supports a variety of other [storage types](http://librdf.org/docs/api/redland-storage-modules.html).\n", "\n", "I recently did some [performance testing](http://ghic.org/~diane/compare-triple-loading.html), and learned that the RDF.HashStorage is quite fast as long as you don't use named graphs. At least with Redland librdf 1.0.16, something goes terribly slowly if you enable the context graph. (E.g. 
loading the 30k triples from the experiment summary takes ~1.5 seconds if you use the default hash, and ~1500 seconds if you turn on the context graph)." ] }, { "cell_type": "code", "collapsed": false, "input": [ "storage = RDF.MemoryStorage()\n", "model = RDF.Model(storage)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 14 }, { "cell_type": "code", "collapsed": false, "input": [ "load_LD_ENCODE(model, '/experiments/', contexts)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "resource: http://submit.encodedcc.org/experiments/\n" ] } ], "prompt_number": 15 }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Query" ] }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": [ "Query Experiments" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's look for possibly useful terms in the experiment collection.\n", "\n", "First let's define a short helper function to display the result of running an RDF.SPARQLQuery." ] }, { "cell_type": "code", "collapsed": false, "input": [ "def query_model(model, query):\n", "    '''Execute a sparql query on the model.\n", "\n", "    The namespace dictionary provides default shortened urls\n", "    in the query.\n", "    '''\n", "    q = RDF.SPARQLQuery(query)\n", "    for i, row in enumerate(q.execute(model)):\n", "        if i == 0:\n", "            print '\\t'.join(row.keys())\n", "            print '\\t'.join(['-' * len(k) for k in row]) # should make a separator line\n", "        print '\\t'.join((str(row[k]) for k in row))" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 16 }, { "cell_type": "markdown", "metadata": {}, "source": [ "When exploring a new RDF model, there are a few queries I like to run to see what's available.\n", "\n", "First off is the `?s a ?type` query. 
The keyword `a` is meant to be read as `is-a`, so the following query finds all the object classes that were defined in this model.\n", "\n", "Distinct is a useful option that suppresses duplicates. The query below is also finding all the possible object IDs but isn't reporting them. However, without the distinct we'd still end up with every `?type` triple in the dataset." ] }, { "cell_type": "code", "collapsed": false, "input": [ "query_model(model, '''\n", "select distinct ?type\n", "where {\n", "    ?s a ?type\n", "}\n", "''')" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "type\n", "----\n", "http://submit.encodedcc.org/profiles/experiment.json#experiment\n", "http://submit.encodedcc.org/profiles/experiment.json#item\n", "http://submit.encodedcc.org/profiles/experiment.json#experiment_collection\n", "http://submit.encodedcc.org/profiles/experiment.json#collection\n" ] }, { "output_type": "stream", "stream": "stderr", "text": [ "/usr/lib/python2.7/dist-packages/RDF.py:1995: RedlandWarning: Variable s was bound but is unused in the query\n", "  results = Redland.librdf_query_execute(self._query,model._model)\n" ] } ], "prompt_number": 17 }, { "cell_type": "markdown", "metadata": {}, "source": [ "The next query is to see what object properties are defined."
] }, { "cell_type": "code", "collapsed": false, "input": [ "query_model(model, '''\n", "select distinct ?p\n", "where {\n", "    ?s ?p ?o\n", "}\n", "''')" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "p\n", "-\n", "http://www.w3.org/1999/02/22-rdf-syntax-ns#type\n", "http://submit.encodedcc.org/profiles/experiment.json#accession\n", "http://submit.encodedcc.org/profiles/experiment.json#assay_term_name\n", "http://submit.encodedcc.org/profiles/experiment.json#award.rfa\n", "http://submit.encodedcc.org/profiles/experiment.json#biosample_term_name\n", "http://submit.encodedcc.org/profiles/experiment.json#files.length\n", "http://submit.encodedcc.org/profiles/experiment.json#lab.title\n", "http://submit.encodedcc.org/profiles/experiment.json#replicates.length\n", "http://submit.encodedcc.org/profiles/experiment.json#target.label\n", "http://submit.encodedcc.org/profiles/experiment.json#condition\n", "http://submit.encodedcc.org/profiles/experiment.json#href\n", "http://submit.encodedcc.org/profiles/experiment.json#method\n", "http://submit.encodedcc.org/profiles/experiment.json#name\n", "http://submit.encodedcc.org/profiles/experiment.json#profile\n", "http://submit.encodedcc.org/profiles/experiment.json#title\n", "http://submit.encodedcc.org/profiles/experiment.json#actions\n", "http://submit.encodedcc.org/profiles/experiment.json#columns\n", "http://www.w3.org/1999/02/22-rdf-syntax-ns#description\n" ] }, { "output_type": "stream", "stream": "stderr", "text": [ "/usr/lib/python2.7/dist-packages/RDF.py:1995: RedlandWarning: Variable o was bound but is unused in the query\n", "  results = Redland.librdf_query_execute(self._query,model._model)\n" ] } ], "prompt_number": 18 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Since my goal was to find which experiments were associated with my lab, the `#lab.title` term looks promising, so let's examine those."
] }, { "cell_type": "code", "collapsed": false, "input": [ "query_model(model, '''\n", "select distinct ?lab\n", "where {\n", "    ?s <http://submit.encodedcc.org/profiles/experiment.json#lab.title> ?lab\n", "}\n", "order by ?lab''')" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "lab\n", "---\n", "Ali Mortazavi, UCI\n", "Barbara Wold, Caltech\n", "Bing Ren, UCSD\n", "Bradley Bernstein, Broad\n", "Brenton Graveley, UConn\n", "David Gilbert, FSU\n", "Gregory Crawford, Duke\n", "J. Michael Cherry, Stanford\n", "Jason Lieb, UNC\n", "Job Dekker, UMass\n", "John Stamatoyannopoulos, UW\n", "Kevin Struhl, HMS\n", "Kevin White, UChicago\n", "Lab\n", "Michael Snyder, Stanford\n", "Peggy Farnham, USC\n", "Piero Carninci, RIKEN\n", "Richard Myers, HAIB\n", "Ross Hardison, PennState\n", "Scott Tenenbaum, SUNY-Albany\n", "Sherman Weissman, Yale\n", "Thomas Gingeras, CSHL\n", "Vishwanath Iyer, UTA\n", "Yijun Ruan, GIS\n" ] } ], "prompt_number": 19 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we're getting somewhere; let's get all the experiment IDs for my lab. At least with the Redland query engine, I need to use a `FILTER` with the `regex` function to limit the records returned."
] }, { "cell_type": "code", "collapsed": false, "input": [ "query_model(model, '''\n", "select ?experiment ?lab\n", "where {\n", "    ?experiment <http://submit.encodedcc.org/profiles/experiment.json#lab.title> ?lab.\n", "    filter(regex(?lab, \"barbara wold\", \"i\"))\n", "}\n", "''')" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "experiment\tlab\n", "----------\t---\n", "http://submit.encodedcc.org/experiments/ENCSR000AEG/\tBarbara Wold, Caltech\n", "http://submit.encodedcc.org/experiments/ENCSR000AEH/\tBarbara Wold, Caltech\n", "http://submit.encodedcc.org/experiments/ENCSR000AEP/\tBarbara Wold, Caltech\n", "http://submit.encodedcc.org/experiments/ENCSR000AEQ/\tBarbara Wold, Caltech\n", "http://submit.encodedcc.org/experiments/ENCSR000AHL/\tBarbara Wold, Caltech\n", "http://submit.encodedcc.org/experiments/ENCSR000AHM/\tBarbara Wold, Caltech\n", "http://submit.encodedcc.org/experiments/ENCSR000AHN/\tBarbara Wold, Caltech\n", "http://submit.encodedcc.org/experiments/ENCSR000AHO/\tBarbara Wold, Caltech\n", "http://submit.encodedcc.org/experiments/ENCSR000AHP/\tBarbara Wold, Caltech\n", "http://submit.encodedcc.org/experiments/ENCSR000AHQ/\tBarbara Wold, Caltech\n", "http://submit.encodedcc.org/experiments/ENCSR000AHR/\tBarbara Wold, Caltech\n", "http://submit.encodedcc.org/experiments/ENCSR000AHS/\tBarbara Wold, Caltech\n", "http://submit.encodedcc.org/experiments/ENCSR000AHT/\tBarbara Wold, Caltech\n", "http://submit.encodedcc.org/experiments/ENCSR000AHU/\tBarbara Wold, Caltech\n", "http://submit.encodedcc.org/experiments/ENCSR000AHV/\tBarbara Wold, Caltech\n", "http://submit.encodedcc.org/experiments/ENCSR000AHW/\tBarbara Wold, Caltech\n", "http://submit.encodedcc.org/experiments/ENCSR000AHX/\tBarbara Wold, Caltech\n", "http://submit.encodedcc.org/experiments/ENCSR000AHY/\tBarbara Wold, Caltech\n", "http://submit.encodedcc.org/experiments/ENCSR000AHZ/\tBarbara Wold, Caltech\n", "http://submit.encodedcc.org/experiments/ENCSR000AIA/\tBarbara Wold, Caltech\n", 
"http://submit.encodedcc.org/experiments/ENCSR000AIB/\tBarbara Wold, Caltech\n", "http://submit.encodedcc.org/experiments/ENCSR000AIC/\tBarbara Wold, Caltech\n", "http://submit.encodedcc.org/experiments/ENCSR000AID/\tBarbara Wold, Caltech\n", "http://submit.encodedcc.org/experiments/ENCSR000AIE/\tBarbara Wold, Caltech\n", "http://submit.encodedcc.org/experiments/ENCSR000AIF/\tBarbara Wold, Caltech\n", "http://submit.encodedcc.org/experiments/ENCSR000AIG/\tBarbara Wold, Caltech\n", "http://submit.encodedcc.org/experiments/ENCSR000AIH/\tBarbara Wold, Caltech\n", "http://submit.encodedcc.org/experiments/ENCSR000AII/\tBarbara Wold, Caltech\n", "http://submit.encodedcc.org/experiments/ENCSR000AIJ/\tBarbara Wold, Caltech\n", "http://submit.encodedcc.org/experiments/ENCSR000AIK/\tBarbara Wold, Caltech\n", "http://submit.encodedcc.org/experiments/ENCSR000AIL/\tBarbara Wold, Caltech\n", "http://submit.encodedcc.org/experiments/ENCSR000AIM/\tBarbara Wold, Caltech\n", "http://submit.encodedcc.org/experiments/ENCSR000AIN/\tBarbara Wold, Caltech\n", "http://submit.encodedcc.org/experiments/ENCSR000AIO/\tBarbara Wold, Caltech\n", "http://submit.encodedcc.org/experiments/ENCSR000AIP/\tBarbara Wold, Caltech\n", "http://submit.encodedcc.org/experiments/ENCSR000AIQ/\tBarbara Wold, Caltech\n", "http://submit.encodedcc.org/experiments/ENCSR000AIR/\tBarbara Wold, Caltech\n", "http://submit.encodedcc.org/experiments/ENCSR000AIS/\tBarbara Wold, Caltech\n", "http://submit.encodedcc.org/experiments/ENCSR000AIT/\tBarbara Wold, Caltech\n", "http://submit.encodedcc.org/experiments/ENCSR000AIU/\tBarbara Wold, Caltech\n", "http://submit.encodedcc.org/experiments/ENCSR000AIV/\tBarbara Wold, Caltech\n", "http://submit.encodedcc.org/experiments/ENCSR000AIW/\tBarbara Wold, Caltech\n", "http://submit.encodedcc.org/experiments/ENCSR000AIX/\tBarbara Wold, Caltech\n", "http://submit.encodedcc.org/experiments/ENCSR000AIY/\tBarbara Wold, Caltech\n", 
"http://submit.encodedcc.org/experiments/ENCSR000AIZ/\tBarbara Wold, Caltech\n", "http://submit.encodedcc.org/experiments/ENCSR000AJA/\tBarbara Wold, Caltech\n", "http://submit.encodedcc.org/experiments/ENCSR000AJB/\tBarbara Wold, Caltech\n", "http://submit.encodedcc.org/experiments/ENCSR000AJC/\tBarbara Wold, Caltech\n", "http://submit.encodedcc.org/experiments/ENCSR000AJD/\tBarbara Wold, Caltech\n", "http://submit.encodedcc.org/experiments/ENCSR000AJE/\tBarbara Wold, Caltech\n", "http://submit.encodedcc.org/experiments/ENCSR000AJF/\tBarbara Wold, Caltech\n", "http://submit.encodedcc.org/experiments/ENCSR000AJG/\tBarbara Wold, Caltech\n", "http://submit.encodedcc.org/experiments/ENCSR000AJH/\tBarbara Wold, Caltech\n", "http://submit.encodedcc.org/experiments/ENCSR000AJN/\tBarbara Wold, Caltech\n", "http://submit.encodedcc.org/experiments/ENCSR000AJO/\tBarbara Wold, Caltech\n", "http://submit.encodedcc.org/experiments/ENCSR000AJP/\tBarbara Wold, Caltech\n", "http://submit.encodedcc.org/experiments/ENCSR000AJQ/\tBarbara Wold, Caltech\n", "http://submit.encodedcc.org/experiments/ENCSR000AJR/\tBarbara Wold, Caltech\n", "http://submit.encodedcc.org/experiments/ENCSR000AJS/\tBarbara Wold, Caltech\n", "http://submit.encodedcc.org/experiments/ENCSR000AJT/\tBarbara Wold, Caltech\n", "http://submit.encodedcc.org/experiments/ENCSR000CWK/\tBarbara Wold, Caltech\n", "http://submit.encodedcc.org/experiments/ENCSR000CWL/\tBarbara Wold, Caltech\n", "http://submit.encodedcc.org/experiments/ENCSR000CWM/\tBarbara Wold, Caltech\n", "http://submit.encodedcc.org/experiments/ENCSR000CWN/\tBarbara Wold, Caltech\n", "http://submit.encodedcc.org/experiments/ENCSR000CWO/\tBarbara Wold, Caltech\n", "http://submit.encodedcc.org/experiments/ENCSR000CWP/\tBarbara Wold, Caltech\n", "http://submit.encodedcc.org/experiments/ENCSR000CWQ/\tBarbara Wold, Caltech\n", "http://submit.encodedcc.org/experiments/ENCSR000CWR/\tBarbara Wold, Caltech\n", 
"http://submit.encodedcc.org/experiments/ENCSR000EYN/\tBarbara Wold, Caltech\n", "http://submit.encodedcc.org/experiments/ENCSR000EYO/\tBarbara Wold, Caltech\n", "http://submit.encodedcc.org/experiments/ENCSR000EYP/\tBarbara Wold, Caltech\n", "http://submit.encodedcc.org/experiments/ENCSR000EYQ/\tBarbara Wold, Caltech\n", "http://submit.encodedcc.org/experiments/ENCSR000EYR/\tBarbara Wold, Caltech\n", "http://submit.encodedcc.org/experiments/ENCSR000EYS/\tBarbara Wold, Caltech\n", "http://submit.encodedcc.org/experiments/ENCSR000EYT/\tBarbara Wold, Caltech\n" ] } ], "prompt_number": 20 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Ok so that's my labs data, lets import it.\n", "\n", "Also this is the kind of operation that makes webmasters sad, as I'm loading a bunch of large objects as fast as their webserver will hand them to me. Because when I first started doing this there was a burst of 500-errors I added a super simple cache algorithm to the loader.\n", "\n", "I wonder what a good solution for caching results and checking to see if they're still fresh is." 
] }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": [ "Load Experiment Detail" ] }, { "cell_type": "code", "collapsed": false, "input": [ "q = RDF.SPARQLQuery('''\n", "select ?experiment\n", "where {\n", "    ?experiment <http://submit.encodedcc.org/profiles/experiment.json#lab.title> ?lab.\n", "    filter(regex(?lab, \"barbara wold\", \"i\"))\n", "}''')\n", "for row in q.execute(model):\n", "    load_LD_ENCODE(model, str(row['experiment']), contexts)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "resource: http://submit.encodedcc.org/experiments/ENCSR000AEG/\n", "resource:" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " http://submit.encodedcc.org/experiments/ENCSR000AEH/\n", "resource:" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " http://submit.encodedcc.org/experiments/ENCSR000AEP/\n", "resource:" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " http://submit.encodedcc.org/experiments/ENCSR000AEQ/\n", "resource:" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " http://submit.encodedcc.org/experiments/ENCSR000AHL/\n", "resource:" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " http://submit.encodedcc.org/experiments/ENCSR000AHM/\n", "resource:" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " http://submit.encodedcc.org/experiments/ENCSR000AHN/\n", "resource:" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " http://submit.encodedcc.org/experiments/ENCSR000AHO/\n", "resource:" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " http://submit.encodedcc.org/experiments/ENCSR000AHP/\n", "resource:" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " http://submit.encodedcc.org/experiments/ENCSR000AHQ/\n", "resource:" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " http://submit.encodedcc.org/experiments/ENCSR000AHR/\n", "resource:" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " 
http://submit.encodedcc.org/experiments/ENCSR000AHS/\n", "resource:" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " http://submit.encodedcc.org/experiments/ENCSR000AHT/\n", "resource:" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " http://submit.encodedcc.org/experiments/ENCSR000AHU/\n", "resource:" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " http://submit.encodedcc.org/experiments/ENCSR000AHV/\n", "resource:" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " http://submit.encodedcc.org/experiments/ENCSR000AHW/\n", "resource:" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " http://submit.encodedcc.org/experiments/ENCSR000AHX/\n", "resource:" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " http://submit.encodedcc.org/experiments/ENCSR000AHY/\n", "resource:" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " http://submit.encodedcc.org/experiments/ENCSR000AHZ/\n", "resource:" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " http://submit.encodedcc.org/experiments/ENCSR000AIA/\n", "resource:" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " http://submit.encodedcc.org/experiments/ENCSR000AIB/\n", "resource:" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " http://submit.encodedcc.org/experiments/ENCSR000AIC/\n", "resource:" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " http://submit.encodedcc.org/experiments/ENCSR000AID/\n", "resource:" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " http://submit.encodedcc.org/experiments/ENCSR000AIE/\n", "resource:" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " http://submit.encodedcc.org/experiments/ENCSR000AIF/\n", "resource:" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " http://submit.encodedcc.org/experiments/ENCSR000AIG/\n", "resource:" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " 
http://submit.encodedcc.org/experiments/ENCSR000AIH/\n", "resource:" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " http://submit.encodedcc.org/experiments/ENCSR000AII/\n", "resource:" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " http://submit.encodedcc.org/experiments/ENCSR000AIJ/\n", "resource:" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " http://submit.encodedcc.org/experiments/ENCSR000AIK/\n", "resource:" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " http://submit.encodedcc.org/experiments/ENCSR000AIL/\n", "resource:" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " http://submit.encodedcc.org/experiments/ENCSR000AIM/\n", "resource:" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " http://submit.encodedcc.org/experiments/ENCSR000AIN/\n", "resource:" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " http://submit.encodedcc.org/experiments/ENCSR000AIO/\n", "resource:" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " http://submit.encodedcc.org/experiments/ENCSR000AIP/\n", "resource:" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " http://submit.encodedcc.org/experiments/ENCSR000AIQ/\n", "resource:" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " http://submit.encodedcc.org/experiments/ENCSR000AIR/\n", "resource:" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " http://submit.encodedcc.org/experiments/ENCSR000AIS/\n", "resource:" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " http://submit.encodedcc.org/experiments/ENCSR000AIT/\n", "resource:" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " http://submit.encodedcc.org/experiments/ENCSR000AIU/\n", "resource:" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " http://submit.encodedcc.org/experiments/ENCSR000AIV/\n", "resource:" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " 
http://submit.encodedcc.org/experiments/ENCSR000AIW/\n", "resource:" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " http://submit.encodedcc.org/experiments/ENCSR000AIX/\n", "resource:" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " http://submit.encodedcc.org/experiments/ENCSR000AIY/\n", "resource:" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " http://submit.encodedcc.org/experiments/ENCSR000AIZ/\n", "resource:" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " http://submit.encodedcc.org/experiments/ENCSR000AJA/\n", "resource:" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " http://submit.encodedcc.org/experiments/ENCSR000AJB/\n", "resource:" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " http://submit.encodedcc.org/experiments/ENCSR000AJC/\n", "resource:" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " http://submit.encodedcc.org/experiments/ENCSR000AJD/\n", "resource:" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " http://submit.encodedcc.org/experiments/ENCSR000AJE/\n", "resource:" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " http://submit.encodedcc.org/experiments/ENCSR000AJF/\n", "resource:" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " http://submit.encodedcc.org/experiments/ENCSR000AJG/\n", "resource:" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " http://submit.encodedcc.org/experiments/ENCSR000AJH/\n", "resource:" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " http://submit.encodedcc.org/experiments/ENCSR000AJN/\n", "resource:" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " http://submit.encodedcc.org/experiments/ENCSR000AJO/\n", "resource:" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " http://submit.encodedcc.org/experiments/ENCSR000AJP/\n", "resource:" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " 
http://submit.encodedcc.org/experiments/ENCSR000AJQ/\n", "resource:" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " http://submit.encodedcc.org/experiments/ENCSR000AJR/\n", "resource:" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " http://submit.encodedcc.org/experiments/ENCSR000AJS/\n", "resource:" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " http://submit.encodedcc.org/experiments/ENCSR000AJT/\n", "resource:" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " http://submit.encodedcc.org/experiments/ENCSR000CWK/\n", "resource:" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " http://submit.encodedcc.org/experiments/ENCSR000CWL/\n", "resource:" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " http://submit.encodedcc.org/experiments/ENCSR000CWM/\n", "resource:" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " http://submit.encodedcc.org/experiments/ENCSR000CWN/\n", "resource:" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " http://submit.encodedcc.org/experiments/ENCSR000CWO/\n", "resource:" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " http://submit.encodedcc.org/experiments/ENCSR000CWP/\n", "resource:" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " http://submit.encodedcc.org/experiments/ENCSR000CWQ/\n", "resource:" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " http://submit.encodedcc.org/experiments/ENCSR000CWR/\n", "resource:" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " http://submit.encodedcc.org/experiments/ENCSR000EYN/\n", "resource:" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " http://submit.encodedcc.org/experiments/ENCSR000EYO/\n", "resource:" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " http://submit.encodedcc.org/experiments/ENCSR000EYP/\n", "resource:" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " 
http://submit.encodedcc.org/experiments/ENCSR000EYQ/\n", "resource:" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " http://submit.encodedcc.org/experiments/ENCSR000EYR/\n", "resource:" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " http://submit.encodedcc.org/experiments/ENCSR000EYS/\n", "resource:" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " http://submit.encodedcc.org/experiments/ENCSR000EYT/\n" ] } ], "prompt_number": 21 }, { "cell_type": "markdown", "metadata": {}, "source": [ "So what's in our model now?\n", "\n", "Get all the classes..." ] }, { "cell_type": "code", "collapsed": false, "input": [ "query_model(model, '''\n", "select distinct ?type\n", "where {\n", " ?s a ?type\n", "}\n", "order by ?type\n", "''')" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "type\n", "----\n", "http://submit.encodedcc.org/profiles/experiment.json#antibody_lot\n", "http://submit.encodedcc.org/profiles/experiment.json#award\n", "http://submit.encodedcc.org/profiles/experiment.json#biosample\n", "http://submit.encodedcc.org/profiles/experiment.json#collection\n", "http://submit.encodedcc.org/profiles/experiment.json#document\n", "http://submit.encodedcc.org/profiles/experiment.json#donor\n", "http://submit.encodedcc.org/profiles/experiment.json#experiment\n", "http://submit.encodedcc.org/profiles/experiment.json#experiment_collection\n", "http://submit.encodedcc.org/profiles/experiment.json#file\n", "http://submit.encodedcc.org/profiles/experiment.json#human_donor\n", "http://submit.encodedcc.org/profiles/experiment.json#item\n", "http://submit.encodedcc.org/profiles/experiment.json#lab\n", "http://submit.encodedcc.org/profiles/experiment.json#library\n", "http://submit.encodedcc.org/profiles/experiment.json#mouse_donor\n", "http://submit.encodedcc.org/profiles/experiment.json#organism\n", "http://submit.encodedcc.org/profiles/experiment.json#platform\n", 
"http://submit.encodedcc.org/profiles/experiment.json#replicate\n", "http://submit.encodedcc.org/profiles/experiment.json#source\n", "http://submit.encodedcc.org/profiles/experiment.json#target\n", "http://submit.encodedcc.org/profiles/experiment.json#user\n" ] } ], "prompt_number": 22 }, { "cell_type": "markdown", "metadata": {}, "source": [ "And get all the properties" ] }, { "cell_type": "code", "collapsed": false, "input": [ "query_model(model, '''\n", "select distinct ?p\n", "where {\n", " ?s ?p ?o\n", "}\n", "order by ?p\n", "''')" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "p\n", "-\n", "http://submit.encodedcc.org/profiles/experiment.json#accession\n", "http://submit.encodedcc.org/profiles/experiment.json#actions\n", "http://submit.encodedcc.org/profiles/experiment.json#address1\n", "http://submit.encodedcc.org/profiles/experiment.json#address2\n", "http://submit.encodedcc.org/profiles/experiment.json#aliases\n", "http://submit.encodedcc.org/profiles/experiment.json#antibody\n", "http://submit.encodedcc.org/profiles/experiment.json#antigen_description\n", "http://submit.encodedcc.org/profiles/experiment.json#antigen_sequence\n", "http://submit.encodedcc.org/profiles/experiment.json#assay_term_id\n", "http://submit.encodedcc.org/profiles/experiment.json#assay_term_name\n", "http://submit.encodedcc.org/profiles/experiment.json#assembly\n", "http://submit.encodedcc.org/profiles/experiment.json#attachment\n", "http://submit.encodedcc.org/profiles/experiment.json#award\n", "http://submit.encodedcc.org/profiles/experiment.json#award.rfa\n", "http://submit.encodedcc.org/profiles/experiment.json#awards\n", "http://submit.encodedcc.org/profiles/experiment.json#biological_replicate_number\n", "http://submit.encodedcc.org/profiles/experiment.json#biosample\n", "http://submit.encodedcc.org/profiles/experiment.json#biosample_term_id\n", 
"http://submit.encodedcc.org/profiles/experiment.json#biosample_term_name\n", "http://submit.encodedcc.org/profiles/experiment.json#biosample_type\n", "http://submit.encodedcc.org/profiles/experiment.json#characterizations\n", "http://submit.encodedcc.org/profiles/experiment.json#city\n", "http://submit.encodedcc.org/profiles/experiment.json#clonality\n", "http://submit.encodedcc.org/profiles/experiment.json#columns\n", "http://submit.encodedcc.org/profiles/experiment.json#condition\n", "http://submit.encodedcc.org/profiles/experiment.json#country\n", "http://submit.encodedcc.org/profiles/experiment.json#culture_harvest_date\n", "http://submit.encodedcc.org/profiles/experiment.json#culture_start_date\n", "http://submit.encodedcc.org/profiles/experiment.json#dataset\n", "http://submit.encodedcc.org/profiles/experiment.json#dataset_type\n", "http://submit.encodedcc.org/profiles/experiment.json#date_created\n", "http://submit.encodedcc.org/profiles/experiment.json#dbxref\n", "http://submit.encodedcc.org/profiles/experiment.json#derived_from\n", "http://submit.encodedcc.org/profiles/experiment.json#document_type\n", "http://submit.encodedcc.org/profiles/experiment.json#documents\n", "http://submit.encodedcc.org/profiles/experiment.json#donor\n", "http://submit.encodedcc.org/profiles/experiment.json#download\n", "http://submit.encodedcc.org/profiles/experiment.json#download_path\n", "http://submit.encodedcc.org/profiles/experiment.json#email\n", "http://submit.encodedcc.org/profiles/experiment.json#encode2_dbxrefs\n", "http://submit.encodedcc.org/profiles/experiment.json#end_date\n", "http://submit.encodedcc.org/profiles/experiment.json#ethnicity\n", "http://submit.encodedcc.org/profiles/experiment.json#experiment\n", "http://submit.encodedcc.org/profiles/experiment.json#extraction_method\n", "http://submit.encodedcc.org/profiles/experiment.json#fax\n", "http://submit.encodedcc.org/profiles/experiment.json#file_format\n", 
"http://submit.encodedcc.org/profiles/experiment.json#files\n", "http://submit.encodedcc.org/profiles/experiment.json#files.length\n", "http://submit.encodedcc.org/profiles/experiment.json#first_name\n", "http://submit.encodedcc.org/profiles/experiment.json#flowcell\n", "http://submit.encodedcc.org/profiles/experiment.json#flowcell_details\n", "http://submit.encodedcc.org/profiles/experiment.json#fragmentation_method\n", "http://submit.encodedcc.org/profiles/experiment.json#gene_name\n", "http://submit.encodedcc.org/profiles/experiment.json#geo_dbxrefs\n", "http://submit.encodedcc.org/profiles/experiment.json#google\n", "http://submit.encodedcc.org/profiles/experiment.json#health_status\n", "http://submit.encodedcc.org/profiles/experiment.json#host_organism\n", "http://submit.encodedcc.org/profiles/experiment.json#href\n", "http://submit.encodedcc.org/profiles/experiment.json#institute_label\n", "http://submit.encodedcc.org/profiles/experiment.json#institute_name\n", "http://submit.encodedcc.org/profiles/experiment.json#isotype\n", "http://submit.encodedcc.org/profiles/experiment.json#job_title\n", "http://submit.encodedcc.org/profiles/experiment.json#lab\n", "http://submit.encodedcc.org/profiles/experiment.json#lab.title\n", "http://submit.encodedcc.org/profiles/experiment.json#label\n", "http://submit.encodedcc.org/profiles/experiment.json#lane\n", "http://submit.encodedcc.org/profiles/experiment.json#last_name\n", "http://submit.encodedcc.org/profiles/experiment.json#library\n", "http://submit.encodedcc.org/profiles/experiment.json#library_size_selection_method\n", "http://submit.encodedcc.org/profiles/experiment.json#life_stage\n", "http://submit.encodedcc.org/profiles/experiment.json#lot_id\n", "http://submit.encodedcc.org/profiles/experiment.json#lot_id_alias\n", "http://submit.encodedcc.org/profiles/experiment.json#lysis_method\n", "http://submit.encodedcc.org/profiles/experiment.json#machine\n", 
"http://submit.encodedcc.org/profiles/experiment.json#md5sum\n", "http://submit.encodedcc.org/profiles/experiment.json#method\n", "http://submit.encodedcc.org/profiles/experiment.json#name\n", "http://submit.encodedcc.org/profiles/experiment.json#note\n", "http://submit.encodedcc.org/profiles/experiment.json#nucleic_acid_term_id\n", "http://submit.encodedcc.org/profiles/experiment.json#nucleic_acid_term_name\n", "http://submit.encodedcc.org/profiles/experiment.json#organism\n", "http://submit.encodedcc.org/profiles/experiment.json#output_type\n", "http://submit.encodedcc.org/profiles/experiment.json#paired_ended\n", "http://submit.encodedcc.org/profiles/experiment.json#passage_number\n", "http://submit.encodedcc.org/profiles/experiment.json#phone1\n", "http://submit.encodedcc.org/profiles/experiment.json#phone2\n", "http://submit.encodedcc.org/profiles/experiment.json#pi\n", "http://submit.encodedcc.org/profiles/experiment.json#platform\n", "http://submit.encodedcc.org/profiles/experiment.json#possible_controls\n", "http://submit.encodedcc.org/profiles/experiment.json#postal_code\n", "http://submit.encodedcc.org/profiles/experiment.json#product_id\n", "http://submit.encodedcc.org/profiles/experiment.json#profile\n", "http://submit.encodedcc.org/profiles/experiment.json#project\n", "http://submit.encodedcc.org/profiles/experiment.json#protocol_documents\n", "http://submit.encodedcc.org/profiles/experiment.json#purifications\n", "http://submit.encodedcc.org/profiles/experiment.json#replicate\n", "http://submit.encodedcc.org/profiles/experiment.json#replicates\n", "http://submit.encodedcc.org/profiles/experiment.json#replicates.length\n", "http://submit.encodedcc.org/profiles/experiment.json#rfa\n", "http://submit.encodedcc.org/profiles/experiment.json#schema_version\n", "http://submit.encodedcc.org/profiles/experiment.json#scientific_name\n", "http://submit.encodedcc.org/profiles/experiment.json#sex\n", 
"http://submit.encodedcc.org/profiles/experiment.json#size_range\n", "http://submit.encodedcc.org/profiles/experiment.json#source\n", "http://submit.encodedcc.org/profiles/experiment.json#start_date\n", "http://submit.encodedcc.org/profiles/experiment.json#starting_amount\n", "http://submit.encodedcc.org/profiles/experiment.json#starting_amount_units\n", "http://submit.encodedcc.org/profiles/experiment.json#state\n", "http://submit.encodedcc.org/profiles/experiment.json#status\n", "http://submit.encodedcc.org/profiles/experiment.json#strain_background\n", "http://submit.encodedcc.org/profiles/experiment.json#strand_specificity\n", "http://submit.encodedcc.org/profiles/experiment.json#submits_for\n", "http://submit.encodedcc.org/profiles/experiment.json#submitted_by\n", "http://submit.encodedcc.org/profiles/experiment.json#submitted_file_name\n", "http://submit.encodedcc.org/profiles/experiment.json#target\n", "http://submit.encodedcc.org/profiles/experiment.json#target.label\n", "http://submit.encodedcc.org/profiles/experiment.json#taxon_id\n", "http://submit.encodedcc.org/profiles/experiment.json#technical_replicate_number\n", "http://submit.encodedcc.org/profiles/experiment.json#term_id\n", "http://submit.encodedcc.org/profiles/experiment.json#term_name\n", "http://submit.encodedcc.org/profiles/experiment.json#timezone\n", "http://submit.encodedcc.org/profiles/experiment.json#title\n", "http://submit.encodedcc.org/profiles/experiment.json#type\n", "http://submit.encodedcc.org/profiles/experiment.json#url\n", "http://submit.encodedcc.org/profiles/experiment.json#urls\n", "http://submit.encodedcc.org/profiles/experiment.json#uuid\n", "http://www.w3.org/1999/02/22-rdf-syntax-ns#description\n", "http://www.w3.org/1999/02/22-rdf-syntax-ns#type\n" ] } ], "prompt_number": 23 }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": [ "Query Biosamples" ] }, { "cell_type": "code", "collapsed": false, "input": [ "query_model(model, '''\n", "prefix encode: \n", 
"prefix rdfs: \n", "select ?s ?description\n", "where {\n", " ?s rdfs:description ?description ;\n", " a encode:biosample .\n", "}\n", "limit 50\n", "''')" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "s\tdescription\n", "-\t-----------\n", "http://submit.encodedcc.org/biosamples/ENCBS089RNA/\tB-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasion, Epstein-Barr Virus\n", "http://submit.encodedcc.org/biosamples/ENCBS090RNA/\tB-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasion, Epstein-Barr Virus\n", "http://submit.encodedcc.org/biosamples/ENCBS087RNA/\tThe continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises. ENCODE3 RNA-seq evaluation replicate 1." ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "http://submit.encodedcc.org/biosamples/ENCBS088RNA/\tThe continuous cell line K-562 was established by Lozzio and Lozzio from the pleural effusion of a 53-year-old female with chronic myelogenous leukemia in terminal blast crises. 
ENCODE3 RNA-seq evaluation replicate 2.\n", "http://submit.encodedcc.org/biosamples/ENCBS124ENC/\tMyoblast cell line derived from thigh muscle of C3H mice after crush injury" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "http://submit.encodedcc.org/biosamples/ENCBS127ENC/\tMyoblast cell line derived from thigh muscle of C3H mice after crush injury; differentiated from C2C12 cells for 60 hours" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "http://submit.encodedcc.org/biosamples/ENCBS125ENC/\tMyoblast cell line derived from thigh muscle of C3H mice after crush injury; differentiated from C2C12 cells for 24 hours" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "http://submit.encodedcc.org/biosamples/ENCBS034ENC/\tMultipotential cell line that can be converted by 5-azacytidine into three mesodermal stem cell lineages.\n", "http://submit.encodedcc.org/biosamples/ENCBS035ENC/\tAs a control, this multipotential cell line was treated with a differentiation protocol that will not induce these cells to differentiate into three mesodermal stem cell lineages." 
] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "http://submit.encodedcc.org/biosamples/ENCBS126ENC/\tMyoblast cell line derived from thigh muscle of C3H mice after crush injury; differentiated from C2C12 cells for 5 days" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "http://submit.encodedcc.org/biosamples/ENCBS128ENC/\tMyoblast cell line derived from thigh muscle of C3H mice after crush injury; differentiated from C2C12 cells for 7 days" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n" ] } ], "prompt_number": 24 }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": [ "Query Nucleic Acid Term" ] }, { "cell_type": "code", "collapsed": false, "input": [ "query_model(model, '''\n", "prefix encode: \n", "prefix rdfs: \n", "select distinct ?term\n", "where {\n", " ?s encode:nucleic_acid_term_id ?term .\n", "}\n", "limit 50\n", "''')" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "term\n", "----\n", "http://purl.obolibrary.org/obo/SO_0000356\n", "http://purl.obolibrary.org/obo/SO_0000871\n", "http://purl.obolibrary.org/obo/SO_0000352\n" ] } ], "prompt_number": 25 }, { "cell_type": "markdown", "metadata": {}, "source": [ "OK, so far all of that could plausibly have been done with a relational model.\n", "\n", "So, let's try something harder. Let's load another ontology into our model." ] }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": [ "Query SO" ] }, { "cell_type": "code", "collapsed": false, "input": [ "model.load('http://www.berkeleybop.org/ontologies/so.owl')" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 26, "text": [ "True" ] } ], "prompt_number": 26 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now that we've loaded so.owl, let's do a query that takes advantage of it.\n", "\n", "I manually looked at terms in so.owl and noticed that [IAO_0000115](http://purl.obolibrary.org/obo/IAO_0000115) looked useful. Looking the term up suggests it means \"Definition\"." ] }, { "cell_type": "code", "collapsed": false, "input": [ "query_model(model, '''\n", "prefix encode: \n", "prefix rdfs: \n", "prefix SO: \n", "select distinct ?term ?iao\n", "where {\n", " ?s encode:nucleic_acid_term_id ?term .\n", " ?term ?iao .\n", "}\n", "order by ?s\n", "limit 50\n", "''')" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "term\tiao\n", "----\t---\n", "http://purl.obolibrary.org/obo/SO_0000356\tAn attribute describing a sequence consisting of nucleobases bound to a repeating unit made of a D-ribose ring connected to a phosphate backbone.\n", "http://purl.obolibrary.org/obo/SO_0000871\tAn mRNA that is polyadenylated.\n", "http://purl.obolibrary.org/obo/SO_0000352\tAn attribute describing a sequence consisting of nucleobases bound to a repeating unit made of a 2-deoxy-D-ribose ring connected to a phosphate backbone.\n" ] } ], "prompt_number": 27 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's try another case.\n", "\n", "It looks like I didn't mark up all the terms correctly for them to be converted into links. Also, when I tried to load the EFO, my browser started to run my laptop out of memory. So grabbing more information from this query may be challenging, and may require a database-backed triple store." 
] }, { "cell_type": "code", "collapsed": false, "input": [ "query_model(model, '''\n", "prefix encode: \n", "prefix rdfs: \n", "\n", "select distinct ?term \n", "where {\n", " ?s encode:biosample_term_id ?term .\n", "\n", "}\n", "order by ?s\n", "limit 50\n", "''')" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "term\n", "----\n", "BTO:0003166\n", "http://www.ebi.ac.uk/efo/0002067\n", "http://www.ebi.ac.uk/efo/0002784\n", "http://www.ebi.ac.uk/efo/0001098\n", "NTR:0000710\n", "EFO:0002784\n", "EFO:0002067\n", "EFO:0001098\n", "NTR:0000710\n", "BTO:0003166\n", "EFO:0000322\n", "EFO:0002786\n", "EFO:0002824\n", "CL:0000515\n", "BTO:0005046\n", "BTO:0000093\n", "CL:0002553\n", "EFO:0003042\n", "EFO:0001185\n", "EFO:0001187\n", "CL:0002618\n", "CL:0000312\n" ] } ], "prompt_number": 28 }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": [ "Search for Fastqs" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Unfortunately, this query is rather slow in Redland. Also, I probably need to adjust the download_path context so it's an absolute URL." 
] }, { "cell_type": "code", "collapsed": false, "input": [ "query_model(model, '''\n", "prefix encode: \n", "prefix rdfs: \n", "\n", "select ?exp_accession ?file ?path\n", "where {\n", " ?s encode:files ?file ;\n", " encode:accession ?exp_accession .\n", " ?file encode:file_format ?format ;\n", " encode:download_path ?path .\n", "\n", "}\n", "order by ?exp_acession\n", "limit 10\n", "''')" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "exp_accession\tfile\tpath\n", "-------------\t----\t----\n", "ENCSR000AEG\thttp://submit.encodedcc.org/files/ENCFF001RRR/\t2013/7/22/ENCFF001RRR.fastq.gz\n", "ENCSR000AEG\thttp://submit.encodedcc.org/files/ENCFF001RRN/\t2013/7/22/ENCFF001RRN.fastq.gz\n", "ENCSR000AEH\thttp://submit.encodedcc.org/files/ENCFF001RRJ/\t2013/7/22/ENCFF001RRJ.fastq.gz\n", "ENCSR000AEH\thttp://submit.encodedcc.org/files/ENCFF001RRK/\t2013/7/22/ENCFF001RRK.fastq.gz\n", "ENCSR000AEH\thttp://submit.encodedcc.org/files/ENCFF001RRI/\t2013/7/22/ENCFF001RRI.fastq.gz\n", "ENCSR000AEH\thttp://submit.encodedcc.org/files/ENCFF001RRL/\t2013/7/22/ENCFF001RRL.fastq.gz\n", "ENCSR000AEP\thttp://submit.encodedcc.org/files/ENCFF001RQW/\t2013/7/22/ENCFF001RQW.fastq.gz\n", "ENCSR000AEP\thttp://submit.encodedcc.org/files/ENCFF001RRG/\t2013/7/22/ENCFF001RRG.fastq.gz\n", "ENCSR000AEP\thttp://submit.encodedcc.org/files/ENCFF001RQT/\t2013/7/22/ENCFF001RQT.fastq.gz\n", "ENCSR000AEP\thttp://submit.encodedcc.org/files/ENCFF001RQX/\t2013/7/22/ENCFF001RQX.fastq.gz\n" ] }, { "output_type": "stream", "stream": "stderr", "text": [ "/usr/lib/python2.7/dist-packages/RDF.py:1995: RedlandWarning: Variable format was bound but is unused in the query\n", " results = Redland.librdf_query_execute(self._query,model._model)\n", "/usr/lib/python2.7/dist-packages/RDF.py:1995: RedlandWarning: Variable exp_acession was used but is not bound in the query\n", " results = Redland.librdf_query_execute(self._query,model._model)\n" ] } ], 
"prompt_number": 29 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Examining one file shows that there's a dataset term that could have been converted to a fully qualified @id." ] }, { "cell_type": "code", "collapsed": false, "input": [ "query_model(model, '''\n", "prefix encode: \n", "prefix rdfs: \n", "\n", "select distinct ?p ?o\n", "where {\n", " ?p ?o .\n", "\n", "}\n", "order by ?s\n", "limit 10\n", "''')" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "p\to\n", "-\t-\n", "http://www.w3.org/1999/02/22-rdf-syntax-ns#type\thttp://submit.encodedcc.org/profiles/experiment.json#file\n", "http://www.w3.org/1999/02/22-rdf-syntax-ns#type\thttp://submit.encodedcc.org/profiles/experiment.json#item\n", "http://submit.encodedcc.org/profiles/experiment.json#accession\tENCFF001RRR\n", "http://submit.encodedcc.org/profiles/experiment.json#dataset\t/experiments/ENCSR000AEG/\n", "http://submit.encodedcc.org/profiles/experiment.json#date_created\t2013-07-22\n", "http://submit.encodedcc.org/profiles/experiment.json#download_path\t2013/7/22/ENCFF001RRR.fastq.gz\n", "http://submit.encodedcc.org/profiles/experiment.json#file_format\tfastq\n", "http://submit.encodedcc.org/profiles/experiment.json#md5sum\t3f44150f50da1da4ca1d7a5d7360ec07\n", "http://submit.encodedcc.org/profiles/experiment.json#output_type\tread1\n", "http://submit.encodedcc.org/profiles/experiment.json#replicate\thttp://submit.encodedcc.org/replicates/dab021e1-4e00-4c6f-8580-eb95c7160995/\n" ] }, { "output_type": "stream", "stream": "stderr", "text": [ "/usr/lib/python2.7/dist-packages/RDF.py:1995: RedlandWarning: Variable s was used but is not bound in the query\n", " results = Redland.librdf_query_execute(self._query,model._model)\n" ] } ], "prompt_number": 30 } ], "metadata": {} } ] }