{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Examples of running the script\n", "\n", "There are several options to run `UCSC_chrom_sizes_2_circos_karyotype.py` or utilize the core function. \n", "This notebook will demonstrate the following:\n", "\n", "- [Running as script file](#Running-as-script-file)\n", "- [Running core function of the script after loading into a cell](#Running-core-function-of-the-script-after-loading-into-a-cell) via one or two alternative routes\n", "- [Running core function of the script after import](#Running-core-function-of-the-script-after-importing)\n", "\n", "(If you are having any problems at all doing any of this, this notebook was developed in the environment launchable by pressing `Launch binder` badge [here](https://github.com/fomightez/cl_sq_demo-binder). You could always launch that environment and upload this notebook there and things should work.)\n", "\n", "## Running as script file\n", "\n", "Similar to how one would run a script from the command line. (Aspects of that are reviewed in this section, too.)\n", "\n", "Upload the script to the directory where you want to run it. Or upload it to a running Jupyter environment.\n", "\n", "(For the sake of this demonstration, I am going to use `curl` to get the file from github and upload it to the 'local' environment. You of course can use whatever download and upload steps you'd like, such as using a browser and your system's graphical user interface, to place the script in the directory. 
'local' is in quotes because, if you are running this in a Jupyter interface via the Binder system, 'local' refers to the running environment.)" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " % Total % Received % Xferd Average Speed Time Time Time Current\n", " Dload Upload Total Spent Left Speed\n", "100 16358 100 16358 0 0 91898 0 --:--:-- --:--:-- --:--:-- 91385\n" ] } ], "source": [ "!curl -O https://raw.githubusercontent.com/fomightez/sequencework/master/circos-utilities/UCSC_chrom_sizes_2_circos_karyotype.py" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "That command would also work on the command line without the exclamation point. In a notebook, the exclamation point signals that the line should not be treated as Python code and should instead be sent to the command-line shell. \n", "\n", "**THEN, AFTER UPLOADING**... \n", "If running on the command line, you would enter:\n", "\n", "```\n", "python UCSC_chrom_sizes_2_circos_karyotype.py http://hgdownload.cse.ucsc.edu/goldenPath/sacCer3/bigZips/sacCer3.chrom.sizes\n", "```\n", "\n", "Or something similar, depending on your Python environment and source of data. (See more about that [here](https://github.com/fomightez/sequencework/tree/master/circos-utilities).)\n", "\n", "Similarly, you can do that in the Jupyter environment by putting either `!python` or the `%run` magic command before the script name. \n", "The `%run` magic command is demonstrated in the next cell. If you are in an active Jupyter environment, run it by clicking on the next cell and pressing `Shift-Enter`, or by clicking `Run` on the toolbar above the notebook."
] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "usage: UCSC_chrom_sizes_2_circos_karyotype.py [-h] [-sc SPECIES_CODE]\n", " URL [OUTPUT_FILE]\n", "\n", "UCSC_chrom_sizes_2_circos_karyotype.py takes a URL for a UCSC chrom.sizes file\n", "and makes a karyotype.tab file. **** Script by Wayne Decatur (fomightez @\n", "github) ***\n", "\n", "positional arguments:\n", " URL URL of chrom.sizes file at UCSC.\n", " OUTPUT_FILE **OPTIONAL**Name of file for storing the karyotype. If\n", " none is provided, the karyotype will be stored as\n", " 'karyotype.tab'.\n", "\n", "optional arguments:\n", " -h, --help show this help message and exit\n", " -sc SPECIES_CODE, --species_code SPECIES_CODE\n", " **OPTIONAL**Identifier to use in front of chromosome\n", " names. An attempt will be made to extract one if\n", " nothing is provided & that is why it's optional.\n" ] } ], "source": [ "%run UCSC_chrom_sizes_2_circos_karyotype.py --help" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The next cell is an example where actual arguments are provided, following the usage shown in the output of the cell above (produced by running the script with the `--help`/`-h` flag). See [here](https://github.com/fomightez/sequencework/blob/master/circos-utilities/README.md) for help with choosing your own parameters to pass to the script." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "\n", "The following species code will be used in the ID column in the\n", "produced karyotype file: 'dog'.\n", "\n", "The karyotype file for 41 chromosomes has been saved as a file named 'dog_karyotype.tab'."
] } ], "source": [ "%run UCSC_chrom_sizes_2_circos_karyotype.py http://hgdownload.cse.ucsc.edu/goldenPath/canFam2/bigZips/canFam2.chrom.sizes dog_karyotype.tab --species_code dog" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "-----\n", "\n", "## Running core function of the script after loading into a cell\n", "\n", "The script can be pasted into a cell or loaded from GitHub. Both approaches are demonstrated in this section of the notebook.\n", "\n", "The first part of this section covers pasting the script into a cell. \n", "The next cell contains the script. (It might not be the most up-to-date version, so it would be best to get and paste the most up-to-date version from GitHub, or to use the `%load` approach discussed below to fetch it directly into the cell.)" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "#!/usr/bin/env python\n", "# UCSC_chrom_sizes_2_circos_karyotype.py\n", "__author__ = \"Wayne Decatur\" #fomightez on GitHub\n", "__license__ = \"MIT\"\n", "__version__ = \"0.2.0\"\n", "\n", "\n", "# UCSC_chrom_sizes_2_circos_karyotype.py by Wayne Decatur\n", "# ver 0.2\n", "#\n", "#*******************************************************************************\n", "# Verified compatible with both Python 2.7 and Python 3.6; written initially in \n", "# Python 3.\n", "#\n", "# PURPOSE: Takes a URL for a UCSC `chrom.sizes` file and makes a `karyotype.tab` \n", "# file from it for use with Circos.\n", "# Note: to determine the URL, google `YOUR_ORGANISM genome UCSC chrom.sizes`, \n", "# where you replace `YOUR_ORGANISM` with your organism name and then\n", "# adapt the path you see in the best match to be something similar to \n", "# \"http://hgdownload.cse.ucsc.edu/goldenPath/sacCer3/bigZips/sacCer3.chrom.sizes\"\n", "# -or-\n", "# \"http://hgdownload.cse.ucsc.edu/goldenPath/canFam2/bigZips/canFam2.chrom.sizes\"\n", "# \n", "# IMPORTANTLY, this script is intended for organisms without cytogenetic bands, 
\n", "# such as dog, cow, yeast, etc..\n", "# Acquiring the cytogenetic bands information is described at \n", "# http://circos.ca/tutorials/lessons/ideograms/karyotypes/ , about halfway down \n", "# the page where it says, \"obtain the karyotype structure from...\". \n", "# Unfortunately, it seems the output directed to by those instructions is not\n", "# directly useful in Circos(?). Fortunately, though as described at \n", "# http://circos.ca/documentation/tutorials/quick_start/hello_world/ \n", "# ,\"Circos ships with several predefined karyotype files for common sequence \n", "# assemblies: human, mouse, rat, and drosophila. These files are located in \n", "# data/karyotype within the Circos distribution.\"\n", "#\n", "# Written to run from command line or pasted/loaded inside a Jupyter notebook \n", "# cell. \n", "#\n", "#\n", "#\n", "# This script based on work and musings developed in \n", "# `Trying to convert k75.Umap.bedGraph to bigwig file that works at SGD jbrowse.md` \n", "# (specifically use of chrom.sizes) and \n", "# `Resources in regards to plotting information on presence or absence of signal on circular chromosome circos.md` \n", "# (where was describing issues with getting karyotype) and \n", "# http://circos.ca/tutorials/course/handouts/session-4.pdf (that shows first \n", "# part of Saccharomyces cerevisiae karyptype on page 6).\n", "#\n", "# Example input from \n", "# http://hgdownload.cse.ucsc.edu/goldenPath/sacCer3/bigZips/sacCer3.chrom.sizes:\n", "'''\n", "chrIV 1531933\n", "chrXV 1091291\n", "chrVII 1090940\n", "chrXII 1078177\n", "chrXVI 948066\n", "chrXIII 924431\n", "chrII 813184\n", "chrXIV 784333\n", "chrX 745751\n", "chrXI 666816\n", "chrV 576874\n", "chrVIII 562643\n", "chrIX 439888\n", "chrIII 316620\n", "chrVI 270161\n", "chrI 230218\n", "chrM 85779\n", "'''\n", "\n", "#\n", "#Example output (tab-separated):\n", "'''\n", "chr - Sc-chrIV chrIV 0 1531933 black\n", "chr - Sc-chrXV chrXV 0 1091291 black\n", "chr - Sc-chrVII chrVII 0 
1090940 black\n", "chr - Sc-chrXII chrXII 0 1078177 black\n", "chr - Sc-chrXVI chrXVI 0 948066 black\n", "chr - Sc-chrXIII chrXIII 0 924431 black\n", "chr - Sc-chrII chrII 0 813184 black\n", "chr - Sc-chrXIV chrXIV 0 784333 black\n", "chr - Sc-chrX chrX 0 745751 black\n", "chr - Sc-chrXI chrXI 0 666816 black\n", "chr - Sc-chrV chrV 0 576874 black\n", "chr - Sc-chrVIII chrVIII 0 562643 black\n", "chr - Sc-chrIX chrIX 0 439888 black\n", "chr - Sc-chrIII chrIII 0 316620 black\n", "chr - Sc-chrVI chrVI 0 270161 black\n", "chr - Sc-chrI chrI 0 230218 black\n", "chr - Sc-chrM chrM 0 85779 black\n", "'''\n", "\n", "#\n", "#\n", "# Dependencies beyond the mostly standard libraries/modules:\n", "#\n", "#\n", "#\n", "# VERSION HISTORY:\n", "# v.0.1. basic working version\n", "# v.0.2. removed references to `http://hgdownload-test.cse.ucsc.edu/..` because \n", "# seems UCSC has removed the `-test` part so that it is now\n", "# `https://hgdownload.cse.ucsc.edu/...`\n", "#\n", "# To do:\n", "# - probably would be nice to add automated handling of ordering by increasing \n", "# chromosome number. (I've used detection of roman numerals before, see \n", "# `plot_expression_across_chromosomes.py) Because would need to be able to \n", "# store and sort, probably putting the chromosomes and lengths in a dataframe \n", "# instead would be a good route. 
Then could write a function to iterrows and \n", "# write the output lines.\n", "# - possible to do: automate making ones for ones with cytogenetic bands, or is\n", "# there not enough aside from the ones included?\n", "#\n", "#\n", "#\n", "#\n", "# TO RUN:\n", "# Examples,\n", "# Enter on the command line of your terminal, the line\n", "#-----------------------------------\n", "# python UCSC_chrom_sizes_2_circos_karyotype.py http://hgdownload.cse.ucsc.edu/goldenPath/sacCer3/bigZips/sacCer3.chrom.sizes\n", "\n", "#-OR-\n", "# python UCSC_chrom_sizes_2_circos_karyotype.py http://hgdownload.cse.ucsc.edu/goldenPath/canFam2/bigZips/canFam2.chrom.sizes dog_karyotype.tab --species_code dog\n", "#-----------------------------------\n", "# Issue `python UCSC_chrom_sizes_2_circos_karyotype.py -h` for details.\n", "# \n", "#\n", "# To use this after pasting or loading into a cell in a Jupyter notebook, in\n", "# the next cell define the URL and then call the main function similar to below:\n", "# url = \"http://hgdownload.cse.ucsc.edu/goldenPath/sacCer3/bigZips/sacCer3.chrom.sizes\"\n", "# UCSC_chrom_sizes_2_circos_karyotype(url)\n", "#\n", "#(A `species_code` string can also be passed as the second argument, and \n", "# `output_file_name` can be reassigned in a cell before calling the function.)\n", "#\n", "# Note that `url` is not needed if you are using the yeast one, because \n", "# that specific URL is hardcoded in the script as the default.\n", "# In fact, because of the hardcoded defaults, just `main()` will work \n", "# for yeast.\n", "#\n", "# \n", "#\n", "'''\n", "CURRENT ACTUAL CODE FOR RUNNING/TESTING IN A NOTEBOOK WHEN LOADED OR PASTED IN \n", "ANOTHER CELL:\n", "UCSC_chrom_sizes_2_circos_karyotype()\n", "\n", "-OR, just-\n", "\n", "main()\n", "\n", "'''\n", "#\n", "#\n", "#*******************************************************************************\n", "#\n", "\n", "\n", "\n", "\n", "\n", "#*******************************************************************************\n", 
"##################################\n", "# USER ADJUSTABLE VALUES #\n", "\n", "##################################\n", "#\n", "## default URL\n", "url = \"http://hgdownload.cse.ucsc.edu/goldenPath/sacCer3/bigZips/sacCer3.chrom.sizes\" \n", "output_file_name = \"karyotype.tab\"\n", "\n", "species_code = None # replace `None` with what you want to use,\n", "# with flanking quotes if something appropriate is not being extracted from the\n", "# provided URL to be used as the species code.\n", "\n", "\n", "\n", "\n", "\n", "#\n", "#*******************************************************************************\n", "#**********************END USER ADJUSTABLE VARIABLES****************************\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "#*******************************************************************************\n", "#*******************************************************************************\n", "###DO NOT EDIT BELOW HERE - ENTER VALUES ABOVE###\n", "\n", "import sys\n", "import os\n", "\n", "\n", "\n", "\n", "###---------------------------HELPER FUNCTIONS---------------------------------###\n", "\n", "\n", "def make_and_save_karyotype(chromosomes_and_length, species_code):\n", " '''\n", " Takes a dictionary of chromosome identifiers and length and makes a karyotype\n", " file with that information.\n", "\n", " Result will look like this at start of output file:\n", " chr - Sc-chrIV chrIV 0 1531933 black\n", " chr - Sc-chrXV chrXV 0 1091291 black\n", " ...\n", "\n", " Function returns None.\n", " '''\n", " # prepare output file for saving so it will be open and ready\n", " with open(output_file_name, 'w') as output_file:\n", " for indx,(chrom,length) in enumerate(chromosomes_and_length.items()):\n", " next_line = (\"chr\\t-\\t{species_code}-{chrom}\\t{chrom}\\t0\"\n", " \"\\t{length}\\tblack\".format(\n", " species_code=species_code,chrom=chrom, length=length))\n", " if indx < 
(len(chromosomes_and_length)-1):\n", " next_line += \"\\n\" # don't add new line character to last line\n", " # Send the built line to output\n", " output_file.write(next_line)\n", " sys.stderr.write( \"\\n\\nThe karyotype file for {} chromosomes has been saved \"\n", " \"as a file named\"\n", " \" '{}'.\".format(len(chromosomes_and_length),output_file_name))\n", "\n", "\n", "def extract_species_code_fromUCSC_URL(url):\n", " '''\n", " Take something like:\n", " https://hgdownload.cse.ucsc.edu/goldenPath/sacCer3/bigZips/sacCer3.chrom.sizes\n", "\n", " And return:\n", " sacCer\n", "\n", " Note:\n", " I decided to use `''.join([i for i in s if not i.isdigit()])`, where s is a\n", " provided string, to toss digits.\n", " '''\n", " species_code = url.split(\"goldenPath\")[1].split(\"/\")[1]\n", " return ''.join([i for i in species_code if not i.isdigit()]) # remove digits\n", "###--------------------------END OF HELPER FUNCTIONS---------------------------###\n", "###--------------------------END OF HELPER FUNCTIONS---------------------------###\n", "\n", "#*******************************************************************************\n", "###------------------------'main' function of script---------------------------##\n", "# This switch below about `species_code` is here so that the user can see \n", "# above that they can edit it under 'USER ADJUSTABLE VALUES' to make it a \n", "# string, but I want it to default to `False` when not set, to make checking \n", "# status easier.\n", "if species_code is None:\n", " species_code = False\n", "\n", "def UCSC_chrom_sizes_2_circos_karyotype(url=url, species_code=species_code):\n", " '''\n", " Main function of script. 
Will use url to get `chrom.sizes` file from UCSC \n", " and use that to make a karyotype file for use in Circos.\n", " Saves the file as tab-separated values with the extension `.tab`, by\n", " default, to be consistent with what the Circos ecosystem seems to use.\n", "\n", " The default URL is the yeast one when calling the function from inside\n", " Jupyter or IPython.\n", "\n", " Optionally a string can be provided in the call to the function to be used\n", " as species in place of the one extracted automatically. Example: \n", " `species_code = \"doggie\"`\n", "\n", " Returns: None\n", " '''\n", " # Get data from URL.\n", " chromosomes_and_length = {}\n", " # Getting html originally for just Python 3, adapted from \n", " # https://stackoverflow.com/a/17510727/8508004 and then updated to \n", " # handle Python 2 and 3 according to same link.\n", " try:\n", " # For Python 3.0 and later\n", " from urllib.request import urlopen\n", " except ImportError:\n", " # Fall back to Python 2's urllib2\n", " from urllib2 import urlopen\n", " html = urlopen(url)\n", " for line in html.read().splitlines():\n", " #chromosome, chr_len, *_ = line.strip().split()\n", " # that elegant unpack above is based on \n", " # https://stackoverflow.com/questions/11371204/unpack-the-first-two-elements-in-list-tuple\n", " # , but it won't work in Python 2. From same place, one that works in 2:\n", " chromosome, chr_len = line.strip().split()[:2]\n", " chromosomes_and_length[chromosome.decode(\n", " encoding='UTF-8')] = chr_len.decode(encoding='UTF-8')\n", "\n", "\n", "\n", " # Parse the URL for a genus/species-type identifier (if one is not provided).\n", " # Note: part of the reason for keeping the URL separate is so that I can parse \n", " # out of it the first part of the genus-species identifier. Here in the \n", " # development version that is `sacCer3`, for yeast Saccharomyces cerevisiae. 
Parsing\n", " # because of advice [here](http://circos.ca/documentation/tutorials/ideograms/karyotypes/), \n", " # \"Even when working with only one species, prefixing the chromosome with a \n", " # species code is highly recommended - this will greatly help in creating \n", " # more transparent configuration and data files.\"\n", " if species_code:\n", " species_code = species_code\n", " sys.stderr.write( \"\\nThe following \"\n", " \"species code will be used in the ID column \"\n", " \"in the\\nproduced karyotype file: '{}'.\".format(species_code))\n", " else:\n", " species_code = extract_species_code_fromUCSC_URL(url)\n", " if species_code == \"sacCer\":\n", " species_code = \"Sc\" # CUSTOMIZING; I'd prefer to use this for yeast.\n", " sys.stderr.write( \"\\nBased on the provided URL, the following \"\n", " \"species code will be used in the\\nID column \"\n", " \"in the karyotype file: '{}'.\\n\"\n", " \"If that is not suitable, you can re-run the script and \"\n", " \"provide one when calling\\nthe script using the \"\n", " \"`--species_code` flag. 
Alternatively, edit \"\n", " \"the produced file with find/replace.\".format(species_code))\n", " # With the approach in that above block, I can expose `species_code` to \n", " # setting for advanced use without it being required and without need to be \n", " # passed into the function.\n", "\n", "\n", "\n", "\n", " # Now use the data to make a karyotype file as described at \n", " # http://circos.ca/documentation/tutorials/ideograms/karyotypes/ and like \n", " # on page 6 of http://circos.ca/tutorials/course/handouts/session-4.pdf\n", " make_and_save_karyotype(chromosomes_and_length, species_code)\n", "###--------------------------END OF MAIN FUNCTION----------------------------###\n", "###--------------------------END OF MAIN FUNCTION----------------------------###\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "#*******************************************************************************\n", "###------------------------'main' section of script---------------------------##\n", "\n", "def main():\n", " \"\"\" Main entry point of the script \"\"\"\n", " # placing actual main action in a 'helper'script so can call that easily \n", " # with a distinguishing name in Jupyter notebooks, where `main()` may get\n", " # assigned multiple times depending how many scripts imported/pasted in.\n", " UCSC_chrom_sizes_2_circos_karyotype(url,species_code)\n", " \n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "if __name__ == \"__main__\" and '__file__' in globals():\n", " \"\"\" This is executed when run from the command line \"\"\"\n", " # Code with just `if __name__ == \"__main__\":` alone will be run if pasted\n", " # into a notebook. 
The addition of ` and '__file__' in globals()` is based\n", " # on https://stackoverflow.com/a/22923872/8508004\n", " # See also https://stackoverflow.com/a/22424821/8508004 for an option to \n", " # provide arguments when prototyping a full script in the notebook.\n", " ###-----------------for parsing command line arguments-----------------------###\n", " import argparse\n", " parser = argparse.ArgumentParser(prog='UCSC_chrom_sizes_2_circos_karyotype.py',\n", " description=\"UCSC_chrom_sizes_2_circos_karyotype.py takes a URL for a \\\n", " UCSC chrom.sizes file and makes a karyotype.tab file. \\\n", " **** Script by Wayne Decatur \\\n", " (fomightez @ github) ***\")\n", "\n", " parser.add_argument(\"URL\", help=\"URL of chrom.sizes file at UCSC. \\\n", " \", metavar=\"URL\")\n", " parser.add_argument('-sc', '--species_code', action='store', type=str, \n", " default= species_code, help=\"**OPTIONAL**Identifier \\\n", " to use in front of chromosome names. An attempt will be made to extract \\\n", " one if nothing is provided & that is why it's optional.\")\n", " parser.add_argument(\"output\", nargs='?', help=\"**OPTIONAL**Name of file \\\n", " for storing the karyotype. If none is provided, the karyotype will be \\\n", " stored as '\"+output_file_name+\"'.\", \n", " default=output_file_name , metavar=\"OUTPUT_FILE\")\n", "\n", " # See\n", " # https://stackoverflow.com/questions/4480075/argparse-optional-positional-arguments \n", " # and \n", " # https://docs.python.org/2/library/argparse.html#nargs for use of `nargs='?'` \n", " # to make output file name optional. 
Note that the square brackets\n", " # shown in the usage out signify optional according to \n", " # https://stackoverflow.com/questions/4480075/argparse-optional-positional-arguments#comment40460395_4480202\n", " # , but because placed under positional I added clarifying text to help \n", " # description.\n", " # IF MODIFYING THIS SCRIPT FOR USE ELSEWHERE AND DON'T NEED/WANT THE OUTPUT \n", " # FILE TO BE OPTIONAL, remove `nargs` (& default?) BUT KEEP WHERE NOT\n", " # USING `argparse.FileType` AND USING `with open` AS CONISDERED MORE PYTHONIC.\n", "\n", "\n", "\n", " #I would also like trigger help to display if no arguments provided because \n", " # need at least one for url\n", " if len(sys.argv)==1: #from http://stackoverflow.com/questions/4042452/display-help-message-with-python-argparse-when-script-is-called-without-any-argu\n", " parser.print_help()\n", " sys.exit(1)\n", " args = parser.parse_args()\n", " url= args.URL\n", " output_file_name = args.output\n", " species_code = args.species_code\n", "\n", "\n", " main()\n", "\n", "#*******************************************************************************\n", "###-***********************END MAIN PORTION OF SCRIPT***********************-###\n", "#*******************************************************************************\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we have options for calling the core function of the script. The next two cells demonstrate that." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "\n", "Based on the provided URL, the following species code will be used in the\n", "ID column in the karyotype file: 'Sc'.\n", "If that is not suitable, you can re-run the script and provide one when calling\n", "the script using the `--species_code` flag. 
Alternatively, edit the produced file with find/replace.\n", "\n", "The karyotype file for 17 chromosomes has been saved as a file named 'karyotype.tab'." ] } ], "source": [ "UCSC_chrom_sizes_2_circos_karyotype()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To provide your own settings, pass the URL (and, optionally, a species code) as arguments when calling the function. The defaults for `url` and `species_code` are fixed when the function is defined, so reassigning those variables afterwards has no effect on a call made without arguments. The output file name, however, is looked up from `output_file_name` each time the function runs, so it can simply be reassigned beforehand, like so:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "\n", "The following species code will be used in the ID column in the\n", "produced karyotype file: 'dog'.\n", "\n", "The karyotype file for 41 chromosomes has been saved as a file named 'dog_karyotype.tab'." ] } ], "source": [ "url = \"http://hgdownload.cse.ucsc.edu/goldenPath/canFam2/bigZips/canFam2.chrom.sizes\"\n", "output_file_name = \"dog_karyotype.tab\"\n", "UCSC_chrom_sizes_2_circos_karyotype(url, species_code=\"dog\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Skipping pasting by loading into a cell**\n", "\n", "To instead load the script into a cell directly from GitHub, you need the URL of the raw script; then you can use the `%load` magic command in a cell, like:\n", "```\n", "%load https://raw.githubusercontent.com/fomightez/sequencework/master/circos-utilities/UCSC_chrom_sizes_2_circos_karyotype.py\n", "```\n", "\n", "(Note that it is possible to use that method to get a specific version of the script; see [my comment here](https://stackoverflow.com/questions/40054672/how-to-save-code-file-on-github-and-run-on-jupyter-notebook/48587645#comment84259526_48587645). 
Indeed, that may be the best option if you are looking for reproducibility.)\n", "\n", "Actually doing that will result in a cell that looks like the following 
because after the contents are loaded, the `%load` command is commented out and the contents of the cell are the script:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "# %load https://raw.githubusercontent.com/fomightez/sequencework/master/circos-utilities/UCSC_chrom_sizes_2_circos_karyotype.py\n", "#!/usr/bin/env python\n", "# UCSC_chrom_sizes_2_circos_karyotype.py\n", "__author__ = \"Wayne Decatur\" #fomightez on GitHub\n", "__license__ = \"MIT\"\n", "__version__ = \"0.2.0\"\n", "\n", "\n", "# UCSC_chrom_sizes_2_circos_karyotype.py by Wayne Decatur\n", "# ver 0.2\n", "#\n", "#*******************************************************************************\n", "# Verified compatible with both Python 2.7 and Python 3.6; written initially in \n", "# Python 3.\n", "#\n", "# PURPOSE: Takes a URL for a UCSC `chrom.sizes` file and makes a `karyotype.tab` \n", "# file from it for use with Circos.\n", "# Note: to determine the URL, google `YOUR_ORGANISM genome UCSC chrom.sizes`, \n", "# where you replace `YOUR_ORGANISM` with your organism name and then\n", "# adapt the path you see in the best match to be something similar to \n", "# \"http://hgdownload.cse.ucsc.edu/goldenPath/sacCer3/bigZips/sacCer3.chrom.sizes\"\n", "# -or-\n", "# \"http://hgdownload.cse.ucsc.edu/goldenPath/canFam2/bigZips/canFam2.chrom.sizes\"\n", "# \n", "# IMPORTANTLY, this script is intended for organisms without cytogenetic bands, \n", "# such as dog, cow, yeast, etc..\n", "# Acquiring the cytogenetic bands information is described at \n", "# http://circos.ca/tutorials/lessons/ideograms/karyotypes/ , about halfway down \n", "# the page where it says, \"obtain the karyotype structure from...\". \n", "# Unfortunately, it seems the output directed to by those instructions is not\n", "# directly useful in Circos(?). 
Fortunately, though as described at \n", "# http://circos.ca/documentation/tutorials/quick_start/hello_world/ \n", "# ,\"Circos ships with several predefined karyotype files for common sequence \n", "# assemblies: human, mouse, rat, and drosophila. These files are located in \n", "# data/karyotype within the Circos distribution.\"\n", "#\n", "# Written to run from command line or pasted/loaded inside a Jupyter notebook \n", "# cell. \n", "#\n", "#\n", "#\n", "# This script based on work and musings developed in \n", "# `Trying to convert k75.Umap.bedGraph to bigwig file that works at SGD jbrowse.md` \n", "# (specifically use of chrom.sizes) and \n", "# `Resources in regards to plotting information on presence or absence of signal on circular chromosome circos.md` \n", "# (where was describing issues with getting karyotype) and \n", "# http://circos.ca/tutorials/course/handouts/session-4.pdf (that shows first \n", "# part of Saccharomyces cerevisiae karyptype on page 6).\n", "#\n", "# Example input from \n", "# http://hgdownload.cse.ucsc.edu/goldenPath/sacCer3/bigZips/sacCer3.chrom.sizes:\n", "'''\n", "chrIV 1531933\n", "chrXV 1091291\n", "chrVII 1090940\n", "chrXII 1078177\n", "chrXVI 948066\n", "chrXIII 924431\n", "chrII 813184\n", "chrXIV 784333\n", "chrX 745751\n", "chrXI 666816\n", "chrV 576874\n", "chrVIII 562643\n", "chrIX 439888\n", "chrIII 316620\n", "chrVI 270161\n", "chrI 230218\n", "chrM 85779\n", "'''\n", "\n", "#\n", "#Example output (tab-separated):\n", "'''\n", "chr - Sc-chrIV chrIV 0 1531933 black\n", "chr - Sc-chrXV chrXV 0 1091291 black\n", "chr - Sc-chrVII chrVII 0 1090940 black\n", "chr - Sc-chrXII chrXII 0 1078177 black\n", "chr - Sc-chrXVI chrXVI 0 948066 black\n", "chr - Sc-chrXIII chrXIII 0 924431 black\n", "chr - Sc-chrII chrII 0 813184 black\n", "chr - Sc-chrXIV chrXIV 0 784333 black\n", "chr - Sc-chrX chrX 0 745751 black\n", "chr - Sc-chrXI chrXI 0 666816 black\n", "chr - Sc-chrV chrV 0 576874 black\n", "chr - Sc-chrVIII chrVIII 0 562643 
black\n", "chr - Sc-chrIX chrIX 0 439888 black\n", "chr - Sc-chrIII chrIII 0 316620 black\n", "chr - Sc-chrVI chrVI 0 270161 black\n", "chr - Sc-chrI chrI 0 230218 black\n", "chr - Sc-chrM chrM 0 85779 black\n", "'''\n", "\n", "#\n", "#\n", "# Dependencies beyond the mostly standard libraries/modules:\n", "#\n", "#\n", "#\n", "# VERSION HISTORY:\n", "# v.0.1. basic working version\n", "# v.0.2. removed references to `http://hgdownload-test.cse.ucsc.edu/..` because \n", "# seems UCSC has removed the `-test` part so that it is now\n", "# `https://hgdownload.cse.ucsc.edu/...`\n", "#\n", "# To do:\n", "# - probably would be nice to add automated handling of ordering by increasing \n", "# chromosome number. (I've used detection of roman numerals before, see \n", "# `plot_expression_across_chromosomes.py) Because would need to be able to \n", "# store and sort, probably putting the chromosomes and lengths in a dataframe \n", "# instead would be a good route. Then could write a function to iterrows and \n", "# write the output lines.\n", "# - possible to do: automate making ones for ones with cytogenetic bands, or is\n", "# there not enough aside from the ones included?\n", "#\n", "#\n", "#\n", "#\n", "# TO RUN:\n", "# Examples,\n", "# Enter on the command line of your terminal, the line\n", "#-----------------------------------\n", "# python UCSC_chrom_sizes_2_circos_karyotype.py http://hgdownload.cse.ucsc.edu/goldenPath/sacCer3/bigZips/sacCer3.chrom.sizes\n", "\n", "#-OR-\n", "# python UCSC_chrom_sizes_2_circos_karyotype.py http://hgdownload.cse.ucsc.edu/goldenPath/canFam2/bigZips/canFam2.chrom.sizes dog_karyotype.tab --species_code dog\n", "#-----------------------------------\n", "# Issue `python UCSC_chrom_sizes_2_circos_karyotype.py -h` for details.\n", "# \n", "#\n", "# To use this after pasting or loading into a cell in a Jupyter notebook, in\n", "# the next cell define the URL and then call the main function similar to below:\n", "# url = 
\"http://hgdownload.cse.ucsc.edu/goldenPath/sacCer3/bigZips/sacCer3.chrom.sizes\"\n", "# UCSC_chrom_sizes_2_circos_karyotype(url)\n", "#\n", "#(A `species_code` string can also be passed as the second argument, and \n", "# `output_file_name` can be reassigned in a cell before calling the function.)\n", "#\n", "# Note that `url` is not needed if you are using the yeast one, because \n", "# that specific URL is hardcoded in the script as the default.\n", "# In fact, because of the hardcoded defaults, just `main()` will work \n", "# for yeast.\n", "#\n", "# \n", "#\n", "'''\n", "CURRENT ACTUAL CODE FOR RUNNING/TESTING IN A NOTEBOOK WHEN LOADED OR PASTED IN \n", "ANOTHER CELL:\n", "UCSC_chrom_sizes_2_circos_karyotype()\n", "\n", "-OR, just-\n", "\n", "main()\n", "\n", "'''\n", "#\n", "#\n", "#*******************************************************************************\n", "#\n", "\n", "\n", "\n", "\n", "\n", "#*******************************************************************************\n", "##################################\n", "# USER ADJUSTABLE VALUES #\n", "\n", "##################################\n", "#\n", "## default URL\n", "url = \"http://hgdownload.cse.ucsc.edu/goldenPath/sacCer3/bigZips/sacCer3.chrom.sizes\" \n", "output_file_name = \"karyotype.tab\"\n", "\n", "species_code = None # replace `None` with what you want to use,\n", "# with flanking quotes if something appropriate is not being extracted from the\n", "# provided URL to be used as the species code.\n", "\n", "\n", "\n", "\n", "\n", "#\n", "#*******************************************************************************\n", "#**********************END USER ADJUSTABLE VARIABLES****************************\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "#*******************************************************************************\n", "#*******************************************************************************\n", "###DO NOT EDIT BELOW HERE - 
ENTER VALUES ABOVE###\n", "\n", "import sys\n", "import os\n", "\n", "\n", "\n", "\n", "###---------------------------HELPER FUNCTIONS---------------------------------###\n", "\n", "\n", "def make_and_save_karyotype(chromosomes_and_length, species_code):\n", "    '''\n", "    Takes a dictionary of chromosome identifiers and lengths and makes a karyotype\n", "    file with that information.\n", "\n", "    Result will look like this at start of output file:\n", "    chr - Sc-chrIV chrIV 0 1531933 black\n", "    chr - Sc-chrXV chrXV 0 1091291 black\n", "    ...\n", "\n", "    Function returns None.\n", "    '''\n", "    # prepare output file for saving so it will be open and ready\n", "    with open(output_file_name, 'w') as output_file:\n", "        for indx,(chrom,length) in enumerate(chromosomes_and_length.items()):\n", "            next_line = (\"chr\\t-\\t{species_code}-{chrom}\\t{chrom}\\t0\"\n", "                \"\\t{length}\\tblack\".format(\n", "                species_code=species_code,chrom=chrom, length=length))\n", "            if indx < (len(chromosomes_and_length)-1):\n", "                next_line += \"\\n\" # add a newline to every line except the last\n", "            # Send the built line to output\n", "            output_file.write(next_line)\n", "    sys.stderr.write( \"\\n\\nThe karyotype file for {} chromosomes has been saved \"\n", "        \"as a file named\"\n", "        \" '{}'.\".format(len(chromosomes_and_length),output_file_name))\n", "\n", "\n", "def extract_species_code_fromUCSC_URL(url):\n", "    '''\n", "    Take something like:\n", "    https://hgdownload.cse.ucsc.edu/goldenPath/sacCer3/bigZips/sacCer3.chrom.sizes\n", "\n", "    And return:\n", "    sacCer\n", "\n", "    Note:\n", "    I decided to use `''.join([i for i in s if not i.isdigit()])`, where s is a\n", "    provided string, to toss digits.\n", "    '''\n", "    species_code = url.split(\"goldenPath\")[1].split(\"/\")[1]\n", "    return ''.join([i for i in species_code if not i.isdigit()]) # remove digits\n", "###--------------------------END OF HELPER FUNCTIONS---------------------------###\n", "###--------------------------END OF HELPER 
FUNCTIONS---------------------------###\n", "\n", "#*******************************************************************************\n", "###------------------------'main' function of script---------------------------##\n", "# This switch about `species_code` is here so that, above 'END USER ADJUSTABLE \n", "# VARIABLES', the user can see that they can edit it to make it a string, but \n", "# I want it to default to `False` when not set to make checking status easier.\n", "if species_code is None:\n", "    species_code = False\n", "\n", "def UCSC_chrom_sizes_2_circos_karyotype(url=url, species_code=species_code):\n", "    '''\n", "    Main function of script. Will use url to get `chrom.sizes` file from UCSC \n", "    and use that to make a karyotype file for use in Circos.\n", "    Saves the file as tab-separated values with the extension `.tab`, by\n", "    default, to be consistent with what the Circos ecosystem seems to use.\n", "\n", "    Default url is the yeast one if calling function from inside Jupyter or\n", "    IPython.\n", "\n", "    Optionally a string can be provided in the call to the function to be used\n", "    as species in place of the one extracted automatically. 
Example: \n", "    `species_code = \"doggie\"`\n", "\n", "    Returns: None\n", "    '''\n", "    # Get data from URL.\n", "    chromosomes_and_length = {}\n", "    # Getting html originally for just Python 3, adapted from \n", "    # https://stackoverflow.com/a/17510727/8508004 and then updated to \n", "    # handle Python 2 and 3 according to same link.\n", "    try:\n", "        # For Python 3.0 and later\n", "        from urllib.request import urlopen\n", "    except ImportError:\n", "        # Fall back to Python 2's urllib2\n", "        from urllib2 import urlopen\n", "    html = urlopen(url)\n", "    for line in html.read().splitlines():\n", "        #chromosome, chr_len, *_ = line.strip().split()\n", "        # that elegant unpack above is based on \n", "        # https://stackoverflow.com/questions/11371204/unpack-the-first-two-elements-in-list-tuple\n", "        # , but it won't work in Python 2. From same place, one that works in 2:\n", "        chromosome, chr_len = line.strip().split()[:2]\n", "        chromosomes_and_length[chromosome.decode(\n", "            encoding='UTF-8')] = chr_len.decode(encoding='UTF-8')\n", "\n", "\n", "\n", "    # Parse the URL for a genus/species-type identifier (if one is not provided).\n", "    # Part of the reason for keeping the URL separate is so that I can parse \n", "    # from it the first part of the genus-species identifier. In the \n", "    # development version that is `sacCer3`, for yeast Saccharomyces cerevisiae. 
Parsing\n", "    # because of advice [here](http://circos.ca/documentation/tutorials/ideograms/karyotypes/), \n", "    # \"Even when working with only one species, prefixing the chromosome with a \n", "    # species code is highly recommended - this will greatly help in creating \n", "    # more transparent configuration and data files.\"\n", "    if species_code:\n", "        sys.stderr.write( \"\\nThe following \"\n", "            \"species code will be used in the ID column \"\n", "            \"in the\\nproduced karyotype file: '{}'.\".format(species_code))\n", "    else:\n", "        species_code = extract_species_code_fromUCSC_URL(url)\n", "        if species_code == \"sacCer\":\n", "            species_code = \"Sc\" # CUSTOMIZING; I'd prefer to use this for yeast.\n", "        sys.stderr.write( \"\\nBased on the provided URL, the following \"\n", "            \"species code will be used in the\\nID column \"\n", "            \"in the karyotype file: '{}'.\\n\"\n", "            \"If that is not suitable, you can re-run the script and \"\n", "            \"provide one when calling\\nthe script using the \"\n", "            \"`--species_code` flag. 
Alternatively, edit \"\n", "            \"the produced file with find/replace.\".format(species_code))\n", "    # With the approach in the above block, I can expose `species_code` for \n", "    # setting in advanced use without it being required and without it needing \n", "    # to be passed into the function.\n", "\n", "\n", "\n", "\n", "    # Now use the data to make a karyotype file as described at \n", "    # http://circos.ca/documentation/tutorials/ideograms/karyotypes/ and like \n", "    # on page 6 of http://circos.ca/tutorials/course/handouts/session-4.pdf\n", "    make_and_save_karyotype(chromosomes_and_length, species_code)\n", "###--------------------------END OF MAIN FUNCTION----------------------------###\n", "###--------------------------END OF MAIN FUNCTION----------------------------###\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "#*******************************************************************************\n", "###------------------------'main' section of script---------------------------##\n", "\n", "def main():\n", "    \"\"\" Main entry point of the script \"\"\"\n", "    # placing the actual main action in a 'helper' function so it can be called\n", "    # easily with a distinguishing name in Jupyter notebooks, where `main()` may\n", "    # get assigned multiple times depending how many scripts imported/pasted in.\n", "    UCSC_chrom_sizes_2_circos_karyotype(url,species_code)\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "if __name__ == \"__main__\" and '__file__' in globals():\n", "    \"\"\" This is executed when run from the command line \"\"\"\n", "    # Code with just `if __name__ == \"__main__\":` alone will be run if pasted\n", "    # into a notebook. 
The addition of ` and '__file__' in globals()` is based\n", "    # on https://stackoverflow.com/a/22923872/8508004\n", "    # See also https://stackoverflow.com/a/22424821/8508004 for an option to \n", "    # provide arguments when prototyping a full script in the notebook.\n", "    ###-----------------for parsing command line arguments-----------------------###\n", "    import argparse\n", "    parser = argparse.ArgumentParser(prog='UCSC_chrom_sizes_2_circos_karyotype.py',\n", "        description=\"UCSC_chrom_sizes_2_circos_karyotype.py takes a URL for a \\\n", "        UCSC chrom.sizes file and makes a karyotype.tab file. \\\n", "        **** Script by Wayne Decatur \\\n", "        (fomightez @ github) ***\")\n", "\n", "    parser.add_argument(\"URL\", help=\"URL of chrom.sizes file at UCSC. \\\n", "        \", metavar=\"URL\")\n", "    parser.add_argument('-sc', '--species_code', action='store', type=str, \n", "        default= species_code, help=\"**OPTIONAL**Identifier \\\n", "        to use in front of chromosome names. An attempt will be made to extract \\\n", "        one if nothing is provided & that is why it's optional.\")\n", "    parser.add_argument(\"output\", nargs='?', help=\"**OPTIONAL**Name of file \\\n", "        for storing the karyotype. If none is provided, the karyotype will be \\\n", "        stored as '\"+output_file_name+\"'.\", \n", "        default=output_file_name , metavar=\"OUTPUT_FILE\")\n", "\n", "    # See\n", "    # https://stackoverflow.com/questions/4480075/argparse-optional-positional-arguments \n", "    # and \n", "    # https://docs.python.org/2/library/argparse.html#nargs for use of `nargs='?'` \n", "    # to make output file name optional. 
Note that the square brackets\n", "    # shown in the usage output signify optional according to \n", "    # https://stackoverflow.com/questions/4480075/argparse-optional-positional-arguments#comment40460395_4480202\n", "    # , but because it is placed under positional arguments, I added clarifying \n", "    # text to the help description.\n", "    # IF MODIFYING THIS SCRIPT FOR USE ELSEWHERE AND DON'T NEED/WANT THE OUTPUT \n", "    # FILE TO BE OPTIONAL, remove `nargs` (& default?) BUT KEEP WHERE NOT\n", "    # USING `argparse.FileType` AND USING `with open` AS CONSIDERED MORE PYTHONIC.\n", "\n", "\n", "\n", "    # I would also like to trigger help to display if no arguments are provided \n", "    # because at least one is needed for the url\n", "    if len(sys.argv)==1: #from http://stackoverflow.com/questions/4042452/display-help-message-with-python-argparse-when-script-is-called-without-any-argu\n", "        parser.print_help()\n", "        sys.exit(1)\n", "    args = parser.parse_args()\n", "    url= args.URL\n", "    output_file_name = args.output\n", "    species_code = args.species_code\n", "\n", "\n", "    main()\n", "\n", "#*******************************************************************************\n", "###-***********************END MAIN PORTION OF SCRIPT***********************-###\n", "#*******************************************************************************" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Just as when the code is pasted into a cell, once it is loaded into a cell there are options for calling the main function. Demonstrating those:" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "\n", "Based on the provided URL, the following species code will be used in the\n", "ID column in the karyotype file: 'Sc'.\n", "If that is not suitable, you can re-run the script and provide one when calling\n", "the script using the `--species_code` flag. 
Alternatively, edit the produced file with find/replace.\n", "\n", "The karyotype file for 17 chromosomes has been saved as a file named 'karyotype.tab'." ] } ], "source": [ "UCSC_chrom_sizes_2_circos_karyotype()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To provide your own settings, assign the value to a variable and pass it to the function when calling it, like so:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "\n", "Based on the provided URL, the following species code will be used in the\n", "ID column in the karyotype file: 'canFam'.\n", "If that is not suitable, you can re-run the script and provide one when calling\n", "the script using the `--species_code` flag. Alternatively, edit the produced file with find/replace.\n", "\n", "The karyotype file for 41 chromosomes has been saved as a file named 'karyotype.tab'." ] } ], "source": [ "url=\"http://hgdownload.cse.ucsc.edu/goldenPath/canFam2/bigZips/canFam2.chrom.sizes\"\n", "UCSC_chrom_sizes_2_circos_karyotype(url)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "----\n", "\n", "## Running core function of the script after importing\n", "\n", "This is similar to the last section, ['Running core function of the script after loading into a cell'](#Running-core-function-of-the-script-after-loading-into-a-cell), but here we take advantage of Python's import statement to do what we did by pasting or loading code into a cell and running it. This is the preferred way to use the main function of the script if you are using it inside a Jupyter notebook or IPython. (The above section was included only because it is easier to follow than an explanation of how `import` works.)\n", "\n", "First ensure the script is available where you are running. Running the next command will do that here. 
(You may have already done it earlier, but it is okay to run it again.)" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " % Total % Received % Xferd Average Speed Time Time Time Current\n", " Dload Upload Total Spent Left Speed\n", "100 16358 100 16358 0 0 202k 0 --:--:-- --:--:-- --:--:-- 202k\n" ] } ], "source": [ "!curl -O https://raw.githubusercontent.com/fomightez/sequencework/master/circos-utilities/UCSC_chrom_sizes_2_circos_karyotype.py" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Then import the main function of the script into the notebook's active computational environment via an import statement. " ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "from UCSC_chrom_sizes_2_circos_karyotype import UCSC_chrom_sizes_2_circos_karyotype" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "(As written above, the command looks a bit redundant. However, the first `UCSC_chrom_sizes_2_circos_karyotype`, after `from`, refers to the script file, and the second refers to the function of the same name inside it. The script is referenced without the `.py` extension because `import` takes a module name, not a file name.)\n", "\n", "With the main function imported, it is now available to be run." ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "\n", "Based on the provided URL, the following species code will be used in the\n", "ID column in the karyotype file: 'Sc'.\n", "If that is not suitable, you can re-run the script and provide one when calling\n", "the script using the `--species_code` flag. Alternatively, edit the produced file with find/replace.\n", "\n", "The karyotype file for 17 chromosomes has been saved as a file named 'karyotype.tab'."
] } ], "source": [ "UCSC_chrom_sizes_2_circos_karyotype()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To provide your own settings, assign the value to a variable and then call the function with that variable, like so:" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "\n", "Based on the provided URL, the following species code will be used in the\n", "ID column in the karyotype file: 'canFam'.\n", "If that is not suitable, you can re-run the script and provide one when calling\n", "the script using the `--species_code` flag. Alternatively, edit the produced file with find/replace.\n", "\n", "The karyotype file for 41 chromosomes has been saved as a file named 'karyotype.tab'." ] } ], "source": [ "url=\"http://hgdownload.cse.ucsc.edu/goldenPath/canFam2/bigZips/canFam2.chrom.sizes\"\n", "UCSC_chrom_sizes_2_circos_karyotype(url)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Or call it directly with the long URL. You can also specify the other allowed setting, `species_code`, at the same time, like so:" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "\n", "The following species code will be used in the ID column in the\n", "produced karyotype file: 'Dog'.\n", "\n", "The karyotype file for 41 chromosomes has been saved as a file named 'karyotype.tab'."
] } ], "source": [ "UCSC_chrom_sizes_2_circos_karyotype(\"http://hgdownload.cse.ucsc.edu/goldenPath/canFam2/bigZips/canFam2.chrom.sizes\",species_code=\"Dog\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Setting `species_code` lets you choose the identifier used for the species instead of relying on the script to extract it from the URL.\n", "\n", "\n", "----" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Save the karyotype file produced** to your local machine if you are not running this on your own machine. \n", "Enjoy!" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.2" } }, "nbformat": 4, "nbformat_minor": 2 }