{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Basic Usage"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "%load_ext autoreload\n",
    "%autoreload 2"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The central data structure provided by the library is the `BlobPath` type  \n",
    "This type would abstract away the internals of how the file is stored and works in a cloud agnostic manner  \n",
    "\n",
    "Note that you would need to install the `aws` extra to work with S3 paths:\n",
    "```bash\n",
    "pip install 'blob-path[aws]'\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "from blob_path.backends.s3 import S3BlobPath\n",
    "from pathlib import PurePath\n",
    "\n",
    "bucket_name = \"narang-public-s3\"\n",
    "object_key = PurePath(\"hello_world.txt\")\n",
    "region = \"us-east-1\"\n",
    "blob_path = S3BlobPath(bucket_name, region, object_key)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The blob path is simply a path representation, like `pathlib.Path`, its not required that the file should exist or not  \n",
    "You can check for existence using `exists`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "True"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "blob_path.exists()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The main method that `BlobPath` provides is `open`, it mimicks the builtin `open` function to some extent  \n",
    "This method is the central abstraction, many operations are handled in a generic way using this method\n",
    "\n",
    "Lets write something to the object in our bucket"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "True"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "with blob_path.open(\"w\") as f:\n",
    "    f.write(\"hello world\")\n",
    "\n",
    "# the file would exist in S3 now, you should check it out\n",
    "blob_path.exists()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "S3 and other cloud storage blob paths can be fully serialised and deserialised.  \n",
    "You can pass around these path objects across processes (and servers) and easily locate the file"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'kind': 'blob-path-aws',\n",
       " 'payload': {'bucket': 'narang-public-s3',\n",
       "  'region': 'us-east-1',\n",
       "  'object_key': ['hello_world.txt']}}"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# a single blob path can be serialised using the method `serialise`\n",
    "blob_path.serialise()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "kind=blob-path-aws bucket=narang-public-s3 region=us-east-1 object_key=hello_world.txt"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# lets deserialise them\n",
    "# deserialise is a separate function and you can pass it any kind of blob path and it would correctly deserialise it\n",
    "\n",
    "from blob_path.deserialise import deserialise\n",
    "\n",
    "deserialised_s3_blob = deserialise(\n",
    "    {\n",
    "        \"kind\": \"blob-path-aws\",\n",
    "        \"payload\": {\n",
    "            \"bucket\": \"narang-public-s3\",\n",
    "            \"region\": \"us-east-1\",\n",
    "            \"object_key\": [\"hello_world.txt\"],\n",
    "        },\n",
    "    }\n",
    ")\n",
    "\n",
    "deserialised_s3_blob"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Lets try another path backend, the `LocalRelativeBlobPath`, this path models a local FS relative path, which is always rooted at a single root directory  \n",
    "Consider you store all the application files inside a single path \"/tmp/my-apps-files\"  \n",
    "In this case, instead of using `pathlib.Path`, you could use `LocalRelativeBlobPath` (this allows you to easily switch between using a cloud storage or a local storage for your files)  "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "from blob_path.backends.local_relative import LocalRelativeBlobPath\n",
    "\n",
    "# PurePath is a simple path representation, but it does not care whether its actually a path or not in your FS\n",
    "# Its useful for logically representing various data structures, as an example, you could represent S3 object keys as `PurePaths`\n",
    "from pathlib import PurePath\n",
    "\n",
    "relpath = PurePath(\"local\") / \"storage.txt\"\n",
    "local_blob = LocalRelativeBlobPath(relpath)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "ename": "Exception",
     "evalue": "tried fetching implicit variable from environment but the var os.environ['IMPLICIT_BLOB_PATH_LOCAL_RELATIVE_BASE_DIR'] does not exist",
     "output_type": "error",
     "traceback": [
      "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
      "\u001b[0;31mException\u001b[0m                                 Traceback (most recent call last)",
      "Cell \u001b[0;32mIn[8], line 1\u001b[0m\n\u001b[0;32m----> 1\u001b[0m \u001b[43mlocal_blob\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mexists\u001b[49m\u001b[43m(\u001b[49m\u001b[43m)\u001b[49m\n",
      "File \u001b[0;32m~/Desktop/personal/blob-path/src/blob_path/backends/local_relative.py:74\u001b[0m, in \u001b[0;36mLocalRelativeBlobPath.exists\u001b[0;34m(self)\u001b[0m\n\u001b[1;32m     73\u001b[0m \u001b[38;5;28;01mdef\u001b[39;00m \u001b[38;5;21mexists\u001b[39m(\u001b[38;5;28mself\u001b[39m) \u001b[38;5;241m-\u001b[39m\u001b[38;5;241m>\u001b[39m \u001b[38;5;28mbool\u001b[39m:\n\u001b[0;32m---> 74\u001b[0m     \u001b[38;5;28;01mreturn\u001b[39;00m (\u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_p\u001b[49m\u001b[43m(\u001b[49m\u001b[43m)\u001b[49m)\u001b[38;5;241m.\u001b[39mexists()\n",
      "File \u001b[0;32m~/Desktop/personal/blob-path/src/blob_path/backends/local_relative.py:94\u001b[0m, in \u001b[0;36mLocalRelativeBlobPath._p\u001b[0;34m(self)\u001b[0m\n\u001b[1;32m     93\u001b[0m \u001b[38;5;28;01mdef\u001b[39;00m \u001b[38;5;21m_p\u001b[39m(\u001b[38;5;28mself\u001b[39m) \u001b[38;5;241m-\u001b[39m\u001b[38;5;241m>\u001b[39m Path:\n\u001b[0;32m---> 94\u001b[0m     \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[43m_get_implicit_base_path\u001b[49m\u001b[43m(\u001b[49m\u001b[43m)\u001b[49m \u001b[38;5;241m/\u001b[39m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_relpath\n",
      "File \u001b[0;32m~/Desktop/personal/blob-path/src/blob_path/backends/local_relative.py:110\u001b[0m, in \u001b[0;36m_get_implicit_base_path\u001b[0;34m()\u001b[0m\n\u001b[1;32m    109\u001b[0m \u001b[38;5;28;01mdef\u001b[39;00m \u001b[38;5;21m_get_implicit_base_path\u001b[39m() \u001b[38;5;241m-\u001b[39m\u001b[38;5;241m>\u001b[39m Path:\n\u001b[0;32m--> 110\u001b[0m     base_path \u001b[38;5;241m=\u001b[39m Path(\u001b[43mget_implicit_var\u001b[49m\u001b[43m(\u001b[49m\u001b[43mBASE_VAR\u001b[49m\u001b[43m)\u001b[49m)\n\u001b[1;32m    111\u001b[0m     base_path\u001b[38;5;241m.\u001b[39mmkdir(exist_ok\u001b[38;5;241m=\u001b[39m\u001b[38;5;28;01mTrue\u001b[39;00m, parents\u001b[38;5;241m=\u001b[39m\u001b[38;5;28;01mTrue\u001b[39;00m)\n\u001b[1;32m    112\u001b[0m     \u001b[38;5;28;01mreturn\u001b[39;00m base_path\n",
      "File \u001b[0;32m~/Desktop/personal/blob-path/src/blob_path/implicit.py:30\u001b[0m, in \u001b[0;36mget_implicit_var\u001b[0;34m(var)\u001b[0m\n\u001b[1;32m     28\u001b[0m result \u001b[38;5;241m=\u001b[39m _PROVIDER(var)\n\u001b[1;32m     29\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m result \u001b[38;5;129;01mis\u001b[39;00m \u001b[38;5;28;01mNone\u001b[39;00m:\n\u001b[0;32m---> 30\u001b[0m     \u001b[38;5;28;01mraise\u001b[39;00m \u001b[38;5;167;01mException\u001b[39;00m(\n\u001b[1;32m     31\u001b[0m         \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mtried fetching implicit variable from environment \u001b[39m\u001b[38;5;124m\"\u001b[39m\n\u001b[1;32m     32\u001b[0m         \u001b[38;5;241m+\u001b[39m \u001b[38;5;124mf\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mbut the var os.environ[\u001b[39m\u001b[38;5;124m'\u001b[39m\u001b[38;5;132;01m{\u001b[39;00mvar\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m'\u001b[39m\u001b[38;5;124m] does not exist\u001b[39m\u001b[38;5;124m\"\u001b[39m\n\u001b[1;32m     33\u001b[0m     )\n\u001b[1;32m     34\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m result\n",
      "\u001b[0;31mException\u001b[0m: tried fetching implicit variable from environment but the var os.environ['IMPLICIT_BLOB_PATH_LOCAL_RELATIVE_BASE_DIR'] does not exist"
     ]
    }
   ],
   "source": [
    "local_blob.exists()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Uh oh, we got an error, that too really early ;_;\n",
    "It says that we have not defined `IMPLICIT_BLOB_PATH_LOCAL_RELATIVE_BASE_DIR` in our environment  \n",
    "\n",
    "This environment variable stores the root directory of your relative paths"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "True"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from pathlib import Path\n",
    "import os\n",
    "\n",
    "os.environ[\"IMPLICIT_BLOB_PATH_LOCAL_RELATIVE_BASE_DIR\"] = str(\n",
    "    Path.home() / \"tmp\" / \"local_fs_root\"\n",
    ")\n",
    "\n",
    "# it passes now, and says that the file does not exist\n",
    "local_blob.exists()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "So why is `LocalRelativeBlobPath` taking the root directory as an environment variable? Could we pass it in `__init__`?  \n",
    "We could argue about this, but then the path is pretty much the same as any absolute path. Even the serialised representation of `LocalRelativeBlobPath` leaves out the root directory (its not part of the path representation)  \n",
    "\n",
    "## Implict variables\n",
    "These variables which modify the behavior of `BlobPath` are called implicit variables. They are by default, picked from the environment  \n",
    "Fetching the root directory from environment has multiple benefits\n",
    "- You could mount the same path between multiple containers at **different** mount points and still pass around the serialised representation correctly (assuming you provide the implicit variables correctly)\n",
    "- Same for servers mounted with an NFS\n",
    "- This also works well for presigned URLs, where you can simply start an nginx server and pass that server's base URL as an implicit variable to the path\n",
    "\n",
    "Implicit variables will change the behavior and location of your blobs implicitly (hah! perfect naming). Every implicit variable follows the naming convention: `IMPLICIT_BLOB_PATH_<BACKEND>_...`  \n",
    "Currently, only `LocalRelativeBlobPath` has implicit variables  "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's do a simple copy operation between an S3 path and a local path"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "hello world\n"
     ]
    }
   ],
   "source": [
    "import shutil\n",
    "\n",
    "# the long way\n",
    "with deserialised_s3_blob.open(\"r\") as fr:\n",
    "    with local_blob.open(\"w\") as fw:\n",
    "        shutil.copyfileobj(fr, fw)\n",
    "\n",
    "with local_blob.open(\"r\") as f:\n",
    "    print(f.read())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Lets use a shortcut now.  \n",
    "Whenever possible, prefer shortcuts from the library for your operations  \n",
    "Currently, they only provide ease-of-use, but we can later optimise away special cases (like copying between two S3 blobs can be triggered using a remote copy with boto3, without copying data in your local machine)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "local blob content copied from s3: hello world\n",
      "copied using shortcut: hello world\n"
     ]
    }
   ],
   "source": [
    "# delete first for the example\n",
    "local_blob.delete()\n",
    "\n",
    "deserialised_s3_blob.cp(local_blob)\n",
    "with local_blob.open(\"r\") as f:\n",
    "    print(\"local blob content copied from s3:\", f.read())\n",
    "\n",
    "\n",
    "# using a shortcut from the library\n",
    "# this shortcut provides more convenience, any of the `src` or `dest` can be `pathlib.Path` too\n",
    "# this makes it easy to deal with normal paths in your FS\n",
    "from blob_path.shortcuts import cp\n",
    "\n",
    "local_blob.delete()\n",
    "cp(deserialised_s3_blob, local_blob)\n",
    "with local_blob.open(\"r\") as f:\n",
    "    print(\"copied using shortcut:\", f.read())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Lets play a bit with an Azure path now, if you want, you can change it to any of the other paths, this to simply show that everything works with same with Azure paths  \n",
    "We will copy data from the S3 path to the Azure path now\n",
    "\n",
    "You will need to install the azure extra\n",
    "```\n",
    "pip install 'blob-path[azure]'\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [],
   "source": [
    "from blob_path.backends.azure_blob_storage import AzureBlobPath\n",
    "from pathlib import PurePath\n",
    "\n",
    "destination = AzureBlobPath(\"narang99blobstore\", \"testcontainer\", PurePath(\"copied\") / \"from\" / \"s3.txt\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "True"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "deserialised_s3_blob.cp(destination)\n",
    "destination.exists()"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": ".venv",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}