{ "cells": [ { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "QUb3A2R-Pdh-" }, "source": [ "# ISB-CGC Community Notebooks\n", "Check out more notebooks at our [Community Notebooks Repository](https://github.com/isb-cgc/Community-Notebooks)!\n", "\n", "```\n", "Title: How to work with cloud storage\n", "Author: David L Gibbs\n", "Created: 2019-07-17\n", "Purpose: Demonstrate how to move files --in and out of-- GCS.\n", "```\n", "***\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## **How to work with cloud storage.**" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "H4J3VQuW488Z" }, "source": [ "### **In this notebook, we demonstrate how to work with files stored in GCS buckets.**\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Documentation at: https://cloud.google.com/storage/docs/listing-objects" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "mOm-XpDEdTyf" }, "source": [ "### Let's authenticate ourselves" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 34 }, "colab_type": "code", "id": "ozv4AqwsdIen", "outputId": "bb80b95c-4a97-429b-813e-3642a59b649a" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Your browser has been opened to visit:\n", "\n", " https://accounts.google.com/o/oauth2/auth?redirect_uri=http%3A%2F%2Flocalhost%3A8085%2F&prompt=select_account&response_type=code&client_id=32555940559.apps.googleusercontent.com&scope=https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fuserinfo.email+https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fcloud-platform+https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fappengine.admin+https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fcompute+https%3A%2F%2Fwww.googleapis.com%2Fauth%2Faccounts.reauth&access_type=offline\n", "\n", "\n", "[13678:13697:0501/104705.579283:ERROR:browser_process_sub_thread.cc(217)] Waited 3 ms for network service\n", "Opening in existing browser session.\n", "\u001b[1;33mWARNING:\u001b[0m `gcloud auth login` no longer writes application default credentials.\n", "If you need to use ADC, see:\n", " gcloud auth application-default --help\n", "\n", "You are now logged in as [dgibbs@systemsbiology.org].\n", "Your current project is [isb-cgc-02-0001]. You can change this setting by running:\n", " $ gcloud config set project PROJECT_ID\n" ] } ], "source": [ "# with gcloud, we can authenticate ourselves\n", "\n", "!gcloud auth login" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[1;31mERROR:\u001b[0m (gcloud.config.set) The project property must be set to a valid project ID, not the project name [PROJECT_ID]\n", "To set your project, run:\n", "\n", " $ gcloud config set project PROJECT_ID\n", "\n", "or to unset it, run:\n", "\n", " $ gcloud config unset project\n" ] } ], "source": [ "# and we can select our project\n", "\n", "!gcloud config set project PROJECT_ID\n" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "zajZEqKxZ3Su" }, "source": [ "### Using 'bangs'." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Using a 'bang', or the exclamaition point (!), we can run command line commands.\n", "This includes file operations like 'ls', 'mv', and 'cp'.\n", "We can also use Google's tools including 'gsutil'." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "total 2.1M\n", "drwxr-xr-x 3 davidgibbs davidgibbs 4.0K Apr 30 12:13 .\n", "drwxr-xr-x 8 davidgibbs davidgibbs 4.0K Apr 30 10:07 ..\n", "-rw-r--r-- 1 davidgibbs davidgibbs 7.9K Apr 30 10:08 'BCGSC microRNA expression.ipynb'\n", "-rw-r--r-- 1 davidgibbs davidgibbs 19K Apr 30 11:06 'BRAF-V600 study using CCLE data.ipynb'\n", "-rw-r--r-- 1 davidgibbs davidgibbs 174K Apr 30 10:08 'Copy Number segments.ipynb'\n", "-rw-r--r-- 1 davidgibbs davidgibbs 111K Apr 30 10:45 'Creating TCGA cohorts -- part 1.ipynb'\n", "-rw-r--r-- 1 davidgibbs davidgibbs 26K Apr 30 10:08 'Creating TCGA cohorts -- part 2.ipynb'\n", "-rw-r--r-- 1 davidgibbs davidgibbs 130K Apr 30 10:08 'Creating TCGA cohorts -- part 3.ipynb'\n", "-rw-r--r-- 1 davidgibbs davidgibbs 65K Apr 30 10:08 'DNA Methylation.ipynb'\n", "-rw-r--r-- 1 davidgibbs davidgibbs 15K Apr 30 12:13 how_to_move_files.ipynb\n", "drwxr-xr-x 2 davidgibbs davidgibbs 4.0K Apr 30 11:08 .ipynb_checkpoints\n", "-rw-r--r-- 1 davidgibbs davidgibbs 362K Apr 30 10:08 isb_cgc_bam_slicing_with_pysam.ipynb\n", "-rw-r--r-- 1 davidgibbs davidgibbs 362K Apr 30 10:08 ISB_cgc_bam_slicing_with_pysam.ipynb\n", "-rw-r--r-- 1 davidgibbs davidgibbs 116K Apr 30 10:08 ISB_CGC_Query_of_the_Month_November_2018.ipynb\n", "-rw-r--r-- 1 davidgibbs davidgibbs 22K Apr 30 10:08 'Protein expression.ipynb'\n", "-rw-r--r-- 1 davidgibbs davidgibbs 2.5K Apr 30 10:08 README.md\n", "-rw-r--r-- 1 davidgibbs davidgibbs 382K Apr 30 10:08 RegulomeExplorer_1_Gexpr_CNV.ipynb\n", "-rw-r--r-- 1 davidgibbs davidgibbs 28K Apr 30 10:46 renamed_test.bam\n", "-rw-r--r-- 1 davidgibbs davidgibbs 106K Apr 30 10:08 'Somatic Mutations.ipynb'\n", "-rw-r--r-- 1 davidgibbs davidgibbs 40K Apr 30 10:08 'TCGA Annotations.ipynb'\n", "-rw-r--r-- 1 davidgibbs davidgibbs 15K Apr 30 10:08 'The ISB-CGC open-access TCGA tables in BigQuery.ipynb'\n", "-rw-r--r-- 1 davidgibbs davidgibbs 51K Apr 30 10:08 'UNC HiSeq mRNAseq gene expression.ipynb'\n" ] } ], "source": [ "# here we get a list of files stored *locally*\n", "!ls -lha " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": 4, "metadata": { "colab": {}, "colab_type": "code", "id": "fy5bLplJaCch" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "gs://bam_bucket_1/renamed_test.bam\n", "gs://bam_bucket_1/test.bam\n", "gs://bam_bucket_1/test_2.bam\n", "gs://bam_bucket_1/test_3.bam\n" ] } ], "source": [ "# Now we use gsutil to list files in our bucket.\n", "!gsutil ls gs://bam_bucket_1/" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": 5, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 68 }, "colab_type": "code", "id": "J6_omldndGYS", "outputId": "a61c9190-89ec-48fd-f83e-5d1263d8aa7e" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Copying gs://bam_bucket_1/test.bam...\n", "/ [1 files][ 27.5 KiB/ 27.5 KiB] \n", "Operation completed over 1 objects/27.5 KiB. \n" ] } ], "source": [ "# then we can copy to our local env.\n", "!gsutil cp gs://bam_bucket_1/test.bam test_dl.bam" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": 6, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 68 }, "colab_type": "code", "id": "Tjdw88aMuW5H", "outputId": "0bf12829-b677-435d-c978-8a907fc5d7bf" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "-rw-r--r-- 1 davidgibbs davidgibbs 28K Apr 30 10:46 renamed_test.bam\n", "-rw-r--r-- 1 davidgibbs davidgibbs 28K May 1 10:47 test_dl.bam\n" ] } ], "source": [ "# and it made it?\n", "!ls -lha | grep test" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": 7, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 68 }, "colab_type": "code", "id": "J6_omldndGYS", "outputId": "a61c9190-89ec-48fd-f83e-5d1263d8aa7e" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Copying file://renamed_test.bam [Content-Type=application/octet-stream]...\n", "/ [1 files][ 27.5 KiB/ 27.5 KiB] \n", "Operation completed over 1 objects/27.5 KiB. \n" ] } ], "source": [ "# then we can copy it back to our bucket\n", "!mv test_dl.bam renamed_test.bam\n", "!gsutil cp renamed_test.bam gs://bam_bucket_1/renamed_test.bam" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "zajZEqKxZ3Su" }, "source": [ "### Using 'pythons'." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "# to install and import the library\n", "#!pip3 install --upgrade --user google-cloud-storage" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Your browser has been opened to visit:\n", "\n", " https://accounts.google.com/o/oauth2/auth?redirect_uri=http%3A%2F%2Flocalhost%3A8085%2F&prompt=select_account&response_type=code&client_id=764086051850-6qr4p6gpi6hn506pt8ejuq83di341hur.apps.googleusercontent.com&scope=https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fuserinfo.email+https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fcloud-platform&access_type=offline\n", "\n", "\n", "[14422:14441:0501/104735.895643:ERROR:browser_process_sub_thread.cc(217)] Waited 8 ms for network service\n", "Opening in existing browser session.\n", "\n", "Credentials saved to file: [/home/davidgibbs/.config/gcloud/application_default_credentials.json]\n", "\n", "These credentials will be used by any library that requests\n", "Application Default Credentials.\n", "\n", "To generate an access token for other uses, run:\n", " gcloud auth application-default print-access-token\n" ] } ], "source": [ "!gcloud auth application-default login\n", "#Your browser has been opened to visit:\n", "#\n", "# https://accounts.google.com/o/oauth2/auth?redirect_uri=.....\n", "#\n", "#Credentials saved to file: [/home/davidgibbs/.config/gcloud/application_default_credentials.json]\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "\n" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/home/davidgibbs/.local/lib/python3.6/site-packages/google/auth/_default.py:66: UserWarning: Your application has authenticated using end user credentials from Google Cloud SDK. We recommend that most server applications use service accounts instead. If your application continues to use end user credentials from Cloud SDK, you might receive a \"quota exceeded\" or \"API not enabled\" error. For more information about service accounts, see https://cloud.google.com/docs/authentication/\n", " warnings.warn(_CLOUD_SDK_CREDENTIALS_WARNING)\n" ] } ], "source": [ "!export GOOGLE_APPLICATION_CREDENTIALS=\"/home/davidgibbs/.config/gcloud/application_default_credentials.json\"\n", "import google.auth\n", "import google.cloud.storage as storage\n", "credentials, project = google.auth.default()" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/home/davidgibbs/.local/lib/python3.6/site-packages/google/auth/_default.py:66: UserWarning: Your application has authenticated using end user credentials from Google Cloud SDK. We recommend that most server applications use service accounts instead. If your application continues to use end user credentials from Cloud SDK, you might receive a \"quota exceeded\" or \"API not enabled\" error. For more information about service accounts, see https://cloud.google.com/docs/authentication/\n", " warnings.warn(_CLOUD_SDK_CREDENTIALS_WARNING)\n", "/home/davidgibbs/.local/lib/python3.6/site-packages/google/auth/_default.py:66: UserWarning: Your application has authenticated using end user credentials from Google Cloud SDK. We recommend that most server applications use service accounts instead. If your application continues to use end user credentials from Cloud SDK, you might receive a \"quota exceeded\" or \"API not enabled\" error. For more information about service accounts, see https://cloud.google.com/docs/authentication/\n", " warnings.warn(_CLOUD_SDK_CREDENTIALS_WARNING)\n" ] } ], "source": [ "# connection to our project\n", "storage_client = storage.Client()" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n" ] } ], "source": [ "\n", "for b in storage_client.list_buckets():\n", " print(b)" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Bucket wild_new_bucket_2000 created\n" ] } ], "source": [ "# here we'll create a bucket\n", "bucket = storage_client.create_bucket('wild_new_bucket_2000')\n", "print('Bucket {} created'.format(bucket.name))" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [], "source": [ "# and then move a file to it\n", "bucket = storage_client.get_bucket('wild_new_bucket_2000')\n", "blob = bucket.blob('test_dl_upload.bam')\n", "blob.upload_from_filename('renamed_test.bam')" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "test_dl_upload.bam\n" ] } ], "source": [ "# and check that it made it \n", "blobs = bucket.list_blobs()\n", "for blob in blobs:\n", " print(blob.name)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# end of notebook" ] } ], "metadata": { "colab": { "collapsed_sections": [], "name": "isb_cgc_bam_slicing_with_pysam.ipynb", "provenance": [], "version": "0.3.2" }, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.8" } }, "nbformat": 4, "nbformat_minor": 2 }