{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Tutorial 15 - Simple ways to manipulate the model" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This tutorial will help you utilise the module to get details about the data model of various mines. These are additional examples, besides the ones covered [here](http://intermine.org/intermine-ws-python/intermine.html#module-intermine.model). Each class (data type) has information containing references to other objects in the data model, collections of references or attribute details." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's begin by showing all the fields a data type contains (including all Attributes, References and Collections listed alphabetically),then-" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "scrolled": true }, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": "[CDSs is a group of CDS objects, which link back to this as gene,\n UTRs is a group of UTR objects, which link back to this as gene,\n alleles is a group of Allele objects, which link back to this as gene,\n briefDescription is a String,\n childFeatures is a group of SequenceFeature objects,\n chromosome is a Chromosome,\n chromosomeLocation is a Location,\n clones is a group of CDNAClone objects, which link back to this as gene,\n crossReferences is a group of CrossReference objects, which link back to this as subject,\n cytoLocation is a String,\n dataSets is a group of DataSet objects, which link back to this as bioEntities,\n description is a String,\n diseases is a group of Disease objects, which link back to this as genes,\n downstreamIntergenicRegion is a IntergenicRegion,\n exons is a group of Exon objects, which link back to this as gene,\n flankingRegions is a group of GeneFlankingRegion objects, which link back to this as gene,\n goAnnotation is a group of GOAnnotation objects,\n homologues is a group of Homologue objects, which link back to this as gene,\n id is a Integer,\n interactions is a group of Interaction objects, which link back to this as participant1,\n introns is a group of Intron objects, which link back to this as genes,\n length is a Integer,\n locatedFeatures is a group of Location objects, which link back to this as locatedOn,\n locations is a group of Location objects, which link back to this as feature,\n mRNAExpressionResults is a group of MRNAExpressionResult objects, which link back to this as gene,\n miRNAtargets is a group of MiRNATarget objects, which link back to this as mirnagene,\n microArrayResults is a group of MicroArrayResult objects, which link back to this as genes,\n name is a String,\n ontologyAnnotations is a group of OntologyAnnotation objects, which link back to this as subject,\n organism is a Organism,\n overlappingFeatures is a group of SequenceFeature objects,\n pathways is a group of Pathway objects, which link back to this as genes,\n primaryIdentifier is a String,\n probeSets is a group of ProbeSet objects, which link back to this as genes,\n proteins is a group of Protein objects, which link back to this as genes,\n publications is a group of Publication objects, which link back to this as entities,\n regulatoryRegions is a group of RegulatoryRegion objects, which link back to this as gene,\n rnaSeqResults is a group of RNASeqResult objects, which link back to this as gene,\n rnaiResults is a group of RNAiResult objects, which link back to this as gene,\n score is a Double,\n scoreType is a String,\n secondaryIdentifier is a String,\n sequence is a Sequence,\n sequenceOntologyTerm is a SOTerm,\n strain is a Strain, which links back to this as features,\n symbol is a String,\n synonyms is a group of Synonym objects, which link back to this as subject,\n transcripts is a group of Transcript objects, which link back to this as gene,\n upstreamIntergenicRegion is a IntergenicRegion]" }, "metadata": {}, "execution_count": 1 } ], "source": [ "from intermine.webservice import Service\n", "service = Service(\"http://flymine.org/flymine/service\")\n", "model = service.model\n", "datatype = model.get_class(\"Gene\")\n", "datatype.fields" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "However suppose you have a class name, and want to know the details about a particular field it contains, then you can do the following- " ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": "genes is a group of Gene objects, which link back to this as proteins" }, "metadata": {}, "execution_count": 2 } ], "source": [ "datatype = model.get_class(\"Protein\")\n", "datatype.get_field('genes')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that if you enter a field name that does not belong to the class, then you will receive an error. " ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "output_type": "error", "ename": "ModelError", "evalue": "'There is no field called puppies in Protein'", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mModelError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mdatatype\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mget_field\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'puppies'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;32m~/opt/anaconda3/lib/python3.7/site-packages/intermine-1.12.0-py3.7.egg/intermine/model.py\u001b[0m in \u001b[0;36mget_field\u001b[0;34m(self, name)\u001b[0m\n\u001b[1;32m 330\u001b[0m \u001b[0;32melse\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 331\u001b[0m raise ModelError(\"There is no field called %s in %s\" %\n\u001b[0;32m--> 332\u001b[0;31m (name, self.name))\n\u001b[0m\u001b[1;32m 333\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 334\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0misa\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mother\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;31mModelError\u001b[0m: 'There is no field called puppies in Protein'" ] } ], "source": [ "datatype.get_field('puppies')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now, suppose you want to find out the nature of the data you retrieved earlier (i.e. whether it is an 'Attribute' or 'Collection'/'Reference'), you can easily iterate over the fields we had obtained earlier- " ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "tags": [] }, "outputs": [ { "output_type": "stream", "name": "stdout", "text": "CDSs - REFERENCE\ncanonicalProtein - REFERENCE\ncomments - REFERENCE\ncomponents - REFERENCE\ncrossReferences - REFERENCE\ndataSets - REFERENCE\necNumber - ATTRIBUTE\necNumbers - REFERENCE\nfeatures - REFERENCE\ngenbankIdentifier - ATTRIBUTE\ngenes - REFERENCE\nid - ATTRIBUTE\ninteractions - REFERENCE\nisFragment - ATTRIBUTE\nisUniprotCanonical - ATTRIBUTE\nisoforms - REFERENCE\nkeywords - REFERENCE\nlength - ATTRIBUTE\nlocatedFeatures - REFERENCE\nlocations - REFERENCE\nmd5checksum - ATTRIBUTE\nmolecularWeight - ATTRIBUTE\nname - ATTRIBUTE\nontologyAnnotations - REFERENCE\norganism - REFERENCE\npathways - REFERENCE\nprimaryAccession - ATTRIBUTE\nprimaryIdentifier - ATTRIBUTE\nproteinDomainRegions - REFERENCE\npublications - REFERENCE\nsecondaryIdentifier - ATTRIBUTE\nsequence - REFERENCE\nsymbol - ATTRIBUTE\nsynonyms - REFERENCE\ntranscripts - REFERENCE\nuniprotAccession - ATTRIBUTE\nuniprotName - ATTRIBUTE\n" } ], "source": [ "datatype = model.get_class(\"Protein\")\n", "fieldtypes=[]\n", "for field in datatype.fields:\n", "\n", " fieldtypes.append(str(field))\n", " path = model.make_path(\"Protein\"+\".\"+fieldtypes[len(fieldtypes)-1])\n", " if path.is_reference()== True:\n", " print(fieldtypes[len(fieldtypes)-1],'-', 'REFERENCE')\n", " \n", " if path.is_attribute()== True:\n", " print(fieldtypes[len(fieldtypes)-1], '-','ATTRIBUTE')\n", " " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Notice that we used make_path() in the process. This function helps us construct paths and inspect whether it is valid or not, or as in here, utilise it to get more information from the model. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now let's look at what information we can get regarding 'inheritance' of class'. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The isa(input) function helps us determine whether 'input' belongs to the ancestry(is a parent class or one of the parents of the parent class or so on) of a given class. " ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": "False" }, "metadata": {}, "execution_count": 5 } ], "source": [ "datatype = model.get_class(\"Protein\")\n", "datatype.isa('Allele')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Thus Allele is not an ancestor of of Protein." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now let's see how we can get retrieve the ancestry of a particular class- " ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "tags": [] }, "outputs": [ { "output_type": "stream", "name": "stdout", "text": "BioEntity\nAnnotatable\n" } ], "source": [ "datatype = model.get_class(\"Protein\")\n", "classes = model.to_ancestry(datatype)\n", "for i in classes:\n", " print(i.name)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This tutorial thus explained how to get information regarding a class i.e. its attributes and the inheritance it shares without having to retrieve the entire model! " ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.3-final" } }, "nbformat": 4, "nbformat_minor": 2 }