# OAK paths command

This notebook is intended as a supplement to the [main OAK CLI docs](https://incatools.github.io/ontology-access-kit/cli.html).

This notebook provides examples for the `paths` command, which finds paths between ontology terms.

## Help Option

You can get help on any OAK command using `--help`:

In [1]:
!runoak paths --help

Usage: runoak paths [OPTIONS] [TERMS]...

  List all paths between one or more start curies.

  Example:

      runoak -i sqlite:obo:go paths  -p i,p 'nuclear membrane'

  This shows all shortest paths from nuclear membrane to all ancestors

  Example:

      runoak -i sqlite:obo:go paths  -p i,p 'nuclear membrane' --target
      cytoplasm

  This shows shortest paths between two nodes

  Example:

      runoak -i sqlite:obo:go paths  -p i,p 'nuclear membrane' 'thylakoid'
      --target cytoplasm 'thylakoid membrane'

  This shows all shortest paths between 4 combinations of starts and ends

  You can also use "@" to separate start node list and end node list. Like
  most OAK commands, you can pass either explicit terms, or term queries. For
  example, if you have two files of IDs, then you can do this:

      runoak -i sqlite:obo:go paths  -p i,p .idfile START_NODES.txt @ .idfile
      END_NODES.txt

  You can also pass in weights for each predicate, used when calculating
  shortest p
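
To make the "4 combinations of starts and ends" in the help text concrete, the sketch below enumerates the start/target pairs implied by the third example. It is plain Python for illustration only, not how OAK computes anything; the variable names are made up here.

```python
from itertools import product

# Start and target terms taken from the third example in the help text.
starts = ["nuclear membrane", "thylakoid"]
targets = ["cytoplasm", "thylakoid membrane"]

# Every start is paired with every target: 2 x 2 = 4 pairs.
pairs = list(product(starts, targets))
for start, target in pairs:
    print(f"{start} -> {target}")
```

Each printed pair corresponds to one start/end combination for which shortest paths would be reported.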

## Set up an alias

For convenience, we will set up an alias for use in this notebook:

In [2]:
alias cl runoak -i sqlite:obo:cl

__Note__: to do this on your own machine, the syntax is slightly different in bash/zsh:

`alias cl="runoak -i sqlite:obo:cl"`

## Example: simple subclass ancestor path

In [3]:
cl paths --target cell interneuron

subject	subject_label	object	object_label	path	path_label
CL:0000099	interneuron	CL:0000000	cell	['CL:0000099', 'CL:0000540', 'BFO:0000040', 'CL:0011115', 'GO:0030154', 'CL:0000000']	interneuron|neuron|material entity|precursor cell|cell differentiation|cell
CL:0000099	interneuron	CL:0000000	cell	['CL:0000099', 'CL:0000540', 'BFO:0000040', 'CL:0011115', 'CL:0000003', 'CL:0000000']	interneuron|neuron|material entity|precursor cell|native cell|cell
CL:0000099	interneuron	CL:0000000	cell	['CL:0000099', 'CL:0000540', 'BFO:0000040', 'CL:0000219', 'CL:0000003', 'CL:0000000']	interneuron|neuron|material entity|motile cell|native cell|cell
CL:0000099	interneuron	CL:0000000	cell	['CL:0000099', 'CL:0000540', 'CL:0000393', 'CL:0000211', 'CL:0000003', 'CL:0000000']	interneuron|neuron|electrically responsive cell|electrically active cell|native cell|cell
CL:0000099	interneuron	CL:0000000	cell	['CL:0000099', 'CL:0000540', 'CL:0000404', 'CL:0000211', 'CL:0000003', 'CL:0000000']	interneuron|neuron|ele
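
In this wide format, the `path` column is printed as a Python-style list of CURIEs and `path_label` as a pipe-separated string. The following is a minimal sketch of pairing the two up, using the first row of the output above pasted in as a string. It assumes labels never contain a `|` character, which holds for this row but is not guaranteed in general.

```python
import ast
import csv
import io

# First row of the output above, pasted verbatim (tab-separated).
tsv = (
    "subject\tsubject_label\tobject\tobject_label\tpath\tpath_label\n"
    "CL:0000099\tinterneuron\tCL:0000000\tcell\t"
    "['CL:0000099', 'CL:0000540', 'BFO:0000040', 'CL:0011115', 'GO:0030154', 'CL:0000000']\t"
    "interneuron|neuron|material entity|precursor cell|cell differentiation|cell\n"
)

# QUOTE_NONE keeps the single quotes inside the list literal untouched.
reader = csv.DictReader(io.StringIO(tsv), delimiter="\t", quoting=csv.QUOTE_NONE)
for row in reader:
    ids = ast.literal_eval(row["path"])    # list literal -> list of CURIEs
    labels = row["path_label"].split("|")  # pipe-separated labels
    steps = list(zip(ids, labels))         # one (CURIE, label) pair per path node
    print(" -> ".join(f"{label} ({curie})" for curie, label in steps))
```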

You can see a similar structure using the `tree` command:

In [4]:
cl tree interneuron -p i 

* [] BFO:0000002 ! continuant
    * [i] BFO:0000004 ! independent continuant
        * [i] BFO:0000040 ! material entity
            * [i] CL:0000540 ! neuron
                * [i] **CL:0000099 ! interneuron**
        * [i] CL:0002319 ! neural cell
            * [i] CL:0000540 ! neuron
                * [i] **CL:0000099 ! interneuron**
* [] CL:0000000 ! cell
    * [i] CL:0000003 ! native cell
        * [i] CL:0000211 ! electrically active cell
            * [i] CL:0000393 ! electrically responsive cell
                * [i] CL:0000540 ! neuron
                    * [i] **CL:0000099 ! interneuron**
            * [i] CL:0000404 ! electrically signaling cell
                * [i] CL:0000540 ! neuron
                    * [i] **CL:0000099 ! interneuron**
        * [i] CL:0000255 ! eukaryotic cell
            * [i] CL:0000548 ! animal cell
                * [i] CL:0002319 ! neural cell
                    * [i] CL:0000540 ! neuron
                        * [i] **CL:0000

## Non-directed paths

By default, the `paths` command ignores edge direction and shows paths that go both up and down the hierarchy:

In [5]:
cl paths interneuron --target "T-cell"

subject	subject_label	object	object_label	path	path_label
CL:0000099	interneuron	CL:0000084	T cell	['CL:0000099', 'CL:0000540', 'BFO:0000040', 'CL:0000219', 'CL:0000738', 'CL:0000842', 'CL:0000542', 'CL:0000084']	interneuron|neuron|material entity|motile cell|leukocyte|mononuclear cell|lymphocyte|T cell
CL:0000099	interneuron	CL:0000084	T cell	['CL:0000099', 'CL:0000540', 'BFO:0000040', 'CL:0011115', 'CL:0011026', 'CL:0000051', 'CL:0000542', 'CL:0000084']	interneuron|neuron|material entity|precursor cell|progenitor cell|common lymphoid progenitor|lymphocyte|T cell
CL:0000099	interneuron	CL:0000084	T cell	['CL:0000099', 'CL:0000540', 'GO:0098793', 'GO:0110165', 'GO:0000785', 'GO:0000792', 'CL:0000542', 'CL:0000084']	interneuron|neuron|presynapse|cellular anatomical entity|chromatin|heterochromatin|lymphocyte|T cell
CL:0000099	interneuron	CL:0000084	T cell	['CL:0000099', 'CL:0000540', 'GO:0098793', 'GO:0110165', 'GO:0005737', 'CL:0017500', 'CL:0000542', 'CL:0000084']	interneuron|neuron|p

Specifying `--directed` forces traversal from subject to object only; in this case, there are no such paths:

In [6]:
cl paths interneuron --directed --target "T-cell"
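
The difference between the two modes can be sketched on a toy graph. The code below is not OAK's implementation: it is a plain breadth-first search over a hand-made, heavily simplified edge list (the edges and the `shortest_path` helper are hypothetical), and it returns a single shortest path rather than all of them.

```python
from collections import deque

# Hypothetical, simplified subject -> object (child -> parent) edges.
edges = [
    ("interneuron", "neuron"),
    ("neuron", "native cell"),
    ("lymphocyte", "native cell"),
    ("T cell", "lymphocyte"),
]

def shortest_path(edges, start, target, directed):
    """Return one shortest path from start to target, or None if unreachable."""
    adjacency = {}
    for subject, obj in edges:
        adjacency.setdefault(subject, []).append(obj)
        if not directed:
            # Undirected mode: the edge can also be walked object -> subject.
            adjacency.setdefault(obj, []).append(subject)
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for neighbour in adjacency.get(path[-1], []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(path + [neighbour])
    return None

undirected = shortest_path(edges, "interneuron", "T cell", directed=False)
directed = shortest_path(edges, "interneuron", "T cell", directed=True)
print(undirected)
print(directed)
```

In the undirected case the search climbs to a shared ancestor and then descends to the target. In the directed case it can only climb, so the target is never reached and the result is empty.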

## Narrow table options

The default output has one row per *path*.

You can use the `--narrow` option to make a narrow table, with one row per path *element*:

In [8]:
cl paths --narrow --target CL:4023061 interneuron

subject	subject_label	object	object_label	path_node	path_node_label
CL:0000099	interneuron	CL:4023061	hippocampal CA4 neuron	CL:0000099	interneuron
CL:0000099	interneuron	CL:4023061	hippocampal CA4 neuron	CL:0000540	neuron
CL:0000099	interneuron	CL:4023061	hippocampal CA4 neuron	CL:4023061	hippocampal CA4 neuron

The `-o` option writes the same table to a file, which can then be loaded with pandas:


In [9]:
cl paths --narrow --target CL:4023061 interneuron -o output/interneuron-CA4-path.tsv

In [10]:
import pandas as pd

In [11]:
df = pd.read_csv("output/interneuron-CA4-path.tsv", sep="\t")
df

Unnamed: 0,subject,subject_label,object,object_label,path_node,path_node_label
0,CL:0000099,interneuron,CL:4023061,hippocampal CA4 neuron,CL:0000099,interneuron
1,CL:0000099,interneuron,CL:4023061,hippocampal CA4 neuron,CL:0000540,neuron
2,CL:0000099,interneuron,CL:4023061,hippocampal CA4 neuron,CL:4023061,hippocampal CA4 neuron
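
A narrow table can be folded back into one path per subject/object pair. The sketch below does this with the standard library only, using the three rows shown above pasted in as a string. It assumes rows appear in path order and that there is a single path per pair, as in this example; with several paths between the same pair, this grouping would not be enough to tell them apart.

```python
import csv
import io
from collections import defaultdict

# The narrow output shown above, pasted verbatim (tab-separated).
narrow_tsv = (
    "subject\tsubject_label\tobject\tobject_label\tpath_node\tpath_node_label\n"
    "CL:0000099\tinterneuron\tCL:4023061\thippocampal CA4 neuron\tCL:0000099\tinterneuron\n"
    "CL:0000099\tinterneuron\tCL:4023061\thippocampal CA4 neuron\tCL:0000540\tneuron\n"
    "CL:0000099\tinterneuron\tCL:4023061\thippocampal CA4 neuron\tCL:4023061\thippocampal CA4 neuron\n"
)

# Collect path node labels per (subject, object) pair, keeping row order.
paths = defaultdict(list)
for row in csv.DictReader(io.StringIO(narrow_tsv), delimiter="\t"):
    paths[(row["subject"], row["object"])].append(row["path_node_label"])

for (subject, obj), labels in paths.items():
    print(subject, obj, "|".join(labels))
```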
