{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "Tce3stUlHN0L" }, "source": [ "##### Copyright 2020 The TensorFlow IO Authors." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "id": "tuOe1ymfHZPu" }, "outputs": [], "source": [ "#@title Licensed under the Apache License, Version 2.0 (the \"License\");\n", "# you may not use this file except in compliance with the License.\n", "# You may obtain a copy of the License at\n", "#\n", "# https://www.apache.org/licenses/LICENSE-2.0\n", "#\n", "# Unless required by applicable law or agreed to in writing, software\n", "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", "# See the License for the specific language governing permissions and\n", "# limitations under the License." ] }, { "cell_type": "markdown", "metadata": { "id": "qFdPvlXBOdUN" }, "source": [ "# Azure blob storage with TensorFlow" ] }, { "cell_type": "markdown", "metadata": { "id": "MfBg1C5NB3X0" }, "source": [ "\n", " \n", " \n", " \n", " \n", "
\n", " View on TensorFlow.org\n", " \n", " Run in Google Colab\n", " \n", " View source on GitHub\n", " \n", " Download notebook\n", "
" ] }, { "cell_type": "markdown", "metadata": { "id": "MsyhTesh4LSy" }, "source": [ "Caution: In addition to python packages this notebook uses `npm install --user` to install packages. Be careful when running locally. \n" ] }, { "cell_type": "markdown", "metadata": { "id": "xHxb-dlhMIzW" }, "source": [ "## Overview\n", "\n", "This tutorial shows how to use read and write files on [Azure Blob Storage](https://azure.microsoft.com/en-us/services/storage/blobs/) with TensorFlow, through TensorFlow IO's Azure file system integration.\n", "\n", "An Azure storage account is needed to read and write files on Azure Blob Storage. The Azure Storage Key should be provided through environmental variable:\n", "```\n", "os.environ['TF_AZURE_STORAGE_KEY'] = ''\n", "```\n", "\n", "The storage account name and container name are part of the filename uri:\n", "```\n", "azfs:////\n", "```\n", "\n", "In this tutorial, for demo purposes you can optionally setup [Azurite](https://github.com/Azure/Azurite) which is a Azure Storage emulator. With Azurite emulator it is possible to read and write files through Azure blob storage interface with TensorFlow." ] }, { "cell_type": "markdown", "metadata": { "id": "MUXex9ctTuDB" }, "source": [ "## Setup and usage" ] }, { "cell_type": "markdown", "metadata": { "id": "upgCc3gXybsA" }, "source": [ "### Install required packages, and restart runtime" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "id": "uUDYyMZRfkX4" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "TensorFlow 2.x selected.\n", "Collecting tensorflow-io\n", "\u001b[?25l Downloading https://files.pythonhosted.org/packages/c0/d0/c5d7adce72c6a6d7c9a59c062150f60b5404c706578a0922f7dc2835713c/tensorflow_io-0.12.0-cp36-cp36m-manylinux2010_x86_64.whl (20.1MB)\n", "\u001b[K |████████████████████████████████| 20.1MB 42.7MB/s \n", "\u001b[?25hRequirement already satisfied: tensorflow<2.2.0,>=2.1.0 in /tensorflow-2.1.0/python3.6 (from tensorflow-io) (2.1.0)\n", "Requirement already satisfied: opt-einsum>=2.3.2 in /tensorflow-2.1.0/python3.6 (from tensorflow<2.2.0,>=2.1.0->tensorflow-io) (3.1.0)\n", "Requirement already satisfied: tensorflow-estimator<2.2.0,>=2.1.0rc0 in /tensorflow-2.1.0/python3.6 (from tensorflow<2.2.0,>=2.1.0->tensorflow-io) (2.1.0)\n", "Requirement already satisfied: grpcio>=1.8.6 in /tensorflow-2.1.0/python3.6 (from tensorflow<2.2.0,>=2.1.0->tensorflow-io) (1.27.2)\n", "Requirement already satisfied: wheel>=0.26; python_version >= \"3\" in /tensorflow-2.1.0/python3.6 (from tensorflow<2.2.0,>=2.1.0->tensorflow-io) (0.34.2)\n", "Requirement already satisfied: google-pasta>=0.1.6 in /tensorflow-2.1.0/python3.6 (from tensorflow<2.2.0,>=2.1.0->tensorflow-io) (0.1.8)\n", "Requirement already satisfied: tensorboard<2.2.0,>=2.1.0 in /tensorflow-2.1.0/python3.6 (from tensorflow<2.2.0,>=2.1.0->tensorflow-io) (2.1.0)\n", "Requirement already satisfied: wrapt>=1.11.1 in /tensorflow-2.1.0/python3.6 (from tensorflow<2.2.0,>=2.1.0->tensorflow-io) (1.12.0)\n", "Requirement already satisfied: scipy==1.4.1; python_version >= \"3\" in /tensorflow-2.1.0/python3.6 (from tensorflow<2.2.0,>=2.1.0->tensorflow-io) (1.4.1)\n", "Requirement already satisfied: protobuf>=3.8.0 in /tensorflow-2.1.0/python3.6 (from tensorflow<2.2.0,>=2.1.0->tensorflow-io) (3.11.3)\n", "Requirement already satisfied: termcolor>=1.1.0 in /tensorflow-2.1.0/python3.6 (from tensorflow<2.2.0,>=2.1.0->tensorflow-io) (1.1.0)\n", "Requirement already satisfied: absl-py>=0.7.0 in /tensorflow-2.1.0/python3.6 (from tensorflow<2.2.0,>=2.1.0->tensorflow-io) (0.9.0)\n", "Requirement already satisfied: keras-applications>=1.0.8 in /tensorflow-2.1.0/python3.6 (from tensorflow<2.2.0,>=2.1.0->tensorflow-io) (1.0.8)\n", "Requirement already satisfied: astor>=0.6.0 in /tensorflow-2.1.0/python3.6 (from tensorflow<2.2.0,>=2.1.0->tensorflow-io) (0.8.1)\n", "Requirement already satisfied: gast==0.2.2 in /tensorflow-2.1.0/python3.6 (from tensorflow<2.2.0,>=2.1.0->tensorflow-io) (0.2.2)\n", "Requirement already satisfied: keras-preprocessing>=1.1.0 in /tensorflow-2.1.0/python3.6 (from tensorflow<2.2.0,>=2.1.0->tensorflow-io) (1.1.0)\n", "Requirement already satisfied: numpy<2.0,>=1.16.0 in /tensorflow-2.1.0/python3.6 (from tensorflow<2.2.0,>=2.1.0->tensorflow-io) (1.18.1)\n", "Requirement already satisfied: six>=1.12.0 in /tensorflow-2.1.0/python3.6 (from tensorflow<2.2.0,>=2.1.0->tensorflow-io) (1.14.0)\n", "Requirement already satisfied: setuptools>=41.0.0 in /tensorflow-2.1.0/python3.6 (from tensorboard<2.2.0,>=2.1.0->tensorflow<2.2.0,>=2.1.0->tensorflow-io) (45.2.0)\n", "Requirement already satisfied: markdown>=2.6.8 in /tensorflow-2.1.0/python3.6 (from tensorboard<2.2.0,>=2.1.0->tensorflow<2.2.0,>=2.1.0->tensorflow-io) (3.2.1)\n", "Requirement already satisfied: google-auth<2,>=1.6.3 in /tensorflow-2.1.0/python3.6 (from tensorboard<2.2.0,>=2.1.0->tensorflow<2.2.0,>=2.1.0->tensorflow-io) (1.11.2)\n", "Requirement already satisfied: requests<3,>=2.21.0 in /tensorflow-2.1.0/python3.6 (from tensorboard<2.2.0,>=2.1.0->tensorflow<2.2.0,>=2.1.0->tensorflow-io) (2.22.0)\n", "Requirement already satisfied: werkzeug>=0.11.15 in /tensorflow-2.1.0/python3.6 (from tensorboard<2.2.0,>=2.1.0->tensorflow<2.2.0,>=2.1.0->tensorflow-io) (1.0.0)\n", "Requirement already satisfied: google-auth-oauthlib<0.5,>=0.4.1 in /tensorflow-2.1.0/python3.6 (from tensorboard<2.2.0,>=2.1.0->tensorflow<2.2.0,>=2.1.0->tensorflow-io) (0.4.1)\n", "Requirement already satisfied: h5py in /tensorflow-2.1.0/python3.6 (from keras-applications>=1.0.8->tensorflow<2.2.0,>=2.1.0->tensorflow-io) (2.10.0)\n", "Requirement already satisfied: cachetools<5.0,>=2.0.0 in /tensorflow-2.1.0/python3.6 (from google-auth<2,>=1.6.3->tensorboard<2.2.0,>=2.1.0->tensorflow<2.2.0,>=2.1.0->tensorflow-io) (4.0.0)\n", "Requirement already satisfied: rsa<4.1,>=3.1.4 in /tensorflow-2.1.0/python3.6 (from google-auth<2,>=1.6.3->tensorboard<2.2.0,>=2.1.0->tensorflow<2.2.0,>=2.1.0->tensorflow-io) (4.0)\n", "Requirement already satisfied: pyasn1-modules>=0.2.1 in /tensorflow-2.1.0/python3.6 (from google-auth<2,>=1.6.3->tensorboard<2.2.0,>=2.1.0->tensorflow<2.2.0,>=2.1.0->tensorflow-io) (0.2.8)\n", "Requirement already satisfied: idna<2.9,>=2.5 in /tensorflow-2.1.0/python3.6 (from requests<3,>=2.21.0->tensorboard<2.2.0,>=2.1.0->tensorflow<2.2.0,>=2.1.0->tensorflow-io) (2.8)\n", "Requirement already satisfied: chardet<3.1.0,>=3.0.2 in /tensorflow-2.1.0/python3.6 (from requests<3,>=2.21.0->tensorboard<2.2.0,>=2.1.0->tensorflow<2.2.0,>=2.1.0->tensorflow-io) (3.0.4)\n", "Requirement already satisfied: certifi>=2017.4.17 in /tensorflow-2.1.0/python3.6 (from requests<3,>=2.21.0->tensorboard<2.2.0,>=2.1.0->tensorflow<2.2.0,>=2.1.0->tensorflow-io) (2019.11.28)\n", "Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /tensorflow-2.1.0/python3.6 (from requests<3,>=2.21.0->tensorboard<2.2.0,>=2.1.0->tensorflow<2.2.0,>=2.1.0->tensorflow-io) (1.25.8)\n", "Requirement already satisfied: requests-oauthlib>=0.7.0 in /tensorflow-2.1.0/python3.6 (from google-auth-oauthlib<0.5,>=0.4.1->tensorboard<2.2.0,>=2.1.0->tensorflow<2.2.0,>=2.1.0->tensorflow-io) (1.3.0)\n", "Requirement already satisfied: pyasn1>=0.1.3 in /tensorflow-2.1.0/python3.6 (from rsa<4.1,>=3.1.4->google-auth<2,>=1.6.3->tensorboard<2.2.0,>=2.1.0->tensorflow<2.2.0,>=2.1.0->tensorflow-io) (0.4.8)\n", "Requirement already satisfied: oauthlib>=3.0.0 in /tensorflow-2.1.0/python3.6 (from requests-oauthlib>=0.7.0->google-auth-oauthlib<0.5,>=0.4.1->tensorboard<2.2.0,>=2.1.0->tensorflow<2.2.0,>=2.1.0->tensorflow-io) (3.1.0)\n", "Installing collected packages: tensorflow-io\n", "Successfully installed tensorflow-io-0.12.0\n" ] } ], "source": [ "try:\n", " %tensorflow_version 2.x \n", "except Exception:\n", " pass\n", "\n", "!pip install tensorflow-io" ] }, { "cell_type": "markdown", "metadata": { "id": "yZmI7l_GykcW" }, "source": [ "### Install and setup Azurite (optional)\n", "\n", "In case an Azure Storage Account is not available, the following is needed to install and setup Azurite that emulates the Azure Storage interface:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "id": "YUj0878jPyz7" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[K\u001b[?25h\u001b[37;40mnpm\u001b[0m \u001b[0m\u001b[30;43mWARN\u001b[0m \u001b[0m\u001b[35mdeprecated\u001b[0m request@2.87.0: request has been deprecated, see https://github.com/request/request/issues/3142\n", "\u001b[K\u001b[?25h\u001b[37;40mnpm\u001b[0m \u001b[0m\u001b[30;43mWARN\u001b[0m \u001b[0m\u001b[35msaveError\u001b[0m ENOENT: no such file or directory, open '/content/package.json'\n", "\u001b[0m\u001b[37;40mnpm\u001b[0m \u001b[0m\u001b[34;40mnotice\u001b[0m\u001b[35m\u001b[0m created a lockfile as package-lock.json. You should commit this file.\n", "\u001b[0m\u001b[37;40mnpm\u001b[0m \u001b[0m\u001b[30;43mWARN\u001b[0m \u001b[0m\u001b[35menoent\u001b[0m ENOENT: no such file or directory, open '/content/package.json'\n", "\u001b[0m\u001b[37;40mnpm\u001b[0m \u001b[0m\u001b[30;43mWARN\u001b[0m\u001b[35m\u001b[0m content No description\n", "\u001b[0m\u001b[37;40mnpm\u001b[0m \u001b[0m\u001b[30;43mWARN\u001b[0m\u001b[35m\u001b[0m content No repository field.\n", "\u001b[0m\u001b[37;40mnpm\u001b[0m \u001b[0m\u001b[30;43mWARN\u001b[0m\u001b[35m\u001b[0m content No README data\n", "\u001b[0m\u001b[37;40mnpm\u001b[0m \u001b[0m\u001b[30;43mWARN\u001b[0m\u001b[35m\u001b[0m content No license field.\n", "\u001b[0m\n", "+ azurite@2.7.0\n", "added 116 packages from 141 contributors in 6.591s\n" ] } ], "source": [ "!npm install azurite@2.7.0" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "id": "KXbiNLKY4kNM" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "npm bin path: /content/node_modules/.bin\n" ] } ], "source": [ "# The path for npm might not be exposed in PATH env,\n", "# you can find it out through 'npm bin' command\n", "npm_bin_path = get_ipython().getoutput('npm bin')[0]\n", "print('npm bin path: ', npm_bin_path)\n", "\n", "# Run `azurite-blob -s` as a background process. \n", "# IPython doesn't recognize `&` in inline bash cells.\n", "get_ipython().system_raw(npm_bin_path + '/' + 'azurite-blob -s &')" ] }, { "cell_type": "markdown", "metadata": { "id": "acEST3amdyDI" }, "source": [ "### Read and write files to Azure Storage with TensorFlow\n", "\n", "The following is an example of reading and writing files to Azure Storage with TensorFlow's API.\n", "\n", "It behaves the same way as other file systems (e.g., POSIX or GCS) in TensorFlow once `tensorflow-io` package is imported, as `tensorflow-io` will automatically register `azfs` scheme for use.\n", "\n", "The Azure Storage Key should be provided through `TF_AZURE_STORAGE_KEY` environmental variable. Otherwise `TF_AZURE_USE_DEV_STORAGE` could be set to `True` to use Azurite emulator instead:\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "ZIrXoXgYlsj_" }, "outputs": [], "source": [ "import os\n", "import tensorflow as tf\n", "import tensorflow_io as tfio\n", "\n", "# Switch to False to use Azure Storage instead:\n", "use_emulator = True\n", "\n", "if use_emulator:\n", " os.environ['TF_AZURE_USE_DEV_STORAGE'] = '1'\n", " account_name = 'devstoreaccount1'\n", "else:\n", " # Replace with Azure Storage Key, and with Azure Storage Account\n", " os.environ['TF_AZURE_STORAGE_KEY'] = ''\n", " account_name = ''\n", "\n", " # Alternatively, you can use a shared access signature (SAS) to authenticate with the Azure Storage Account\n", " os.environ['TF_AZURE_STORAGE_SAS'] = ''\n", " account_name = ''" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "id": "h21RdP7meGzP" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Hello, world!\n" ] } ], "source": [ "pathname = 'az://{}/aztest'.format(account_name)\n", "tf.io.gfile.mkdir(pathname)\n", "\n", "filename = pathname + '/hello.txt'\n", "with tf.io.gfile.GFile(filename, mode='w') as w:\n", " w.write(\"Hello, world!\")\n", "\n", "with tf.io.gfile.GFile(filename, mode='r') as r:\n", " print(r.read())" ] }, { "cell_type": "markdown", "metadata": { "id": "zF8IKV7phkIU" }, "source": [ "## Configurations\n", "\n", "Configurations of Azure Blob Storage in TensorFlow are always done through environmental variables. Below is a complete list of available configurations:\n", "\n", "- `TF_AZURE_USE_DEV_STORAGE`:\n", " Set to 1 to use local development storage emulator for connections like 'az://devstoreaccount1/container/file.txt'. This will take precendence over all other settings so `unset` to use any other connection\n", "- `TF_AZURE_STORAGE_KEY`:\n", " Account key for the storage account in use\n", "- `TF_AZURE_STORAGE_USE_HTTP`:\n", " Set to any value if you don't want to use https transfer. `unset` to use default of https\n", "- `TF_AZURE_STORAGE_BLOB_ENDPOINT`:\n", " Set to the endpoint of blob storage - default is `.core.windows.net`.\n" ] } ], "metadata": { "colab": { "name": "azure.ipynb", "toc_visible": true }, "kernelspec": { "display_name": "Python 3", "name": "python3" } }, "nbformat": 4, "nbformat_minor": 0 }