{ "cells": [ { "cell_type": "markdown", "id": "bfef84ba", "metadata": {}, "source": [ "# Using Cache (available since v21.06.00)\n", "\n", "## Need for Cache\n", "\n", "In many deep learning use cases, small image patches need to be extracted from the large image and they are fed into the neural network. \n", "\n", "If the patch size doesn't align with the underlying tile layout of TIFF image (e.g., AI model such as ResNet may accept a particular size of the image [e.g., 224x224] that is smaller than the underlying tile size [256x256]), redundant image loadings for a tile are needed (See the following two figures)\n", "\n", "![image](https://user-images.githubusercontent.com/1928522/118344267-333a4f00-b4e2-11eb-898c-8980c8725d32.png)\n", "![image](https://user-images.githubusercontent.com/1928522/118344294-5238e100-b4e2-11eb-8f3a-4772ef055658.png)\n", "\n", "Which resulted in lower performance for unaligned cases as shown in our [GTC 2021 presentation](https://www.nvidia.com/en-us/gtc/catalog/?search=cuCIM)\n", "\n", "![image](https://user-images.githubusercontent.com/1928522/118344737-c07ea300-b4e4-11eb-9c95-15c2e5022274.png)\n", "\n", "\n", "The proper use of cache improves the loading performance greatly, especially for **inference** use cases and when [accessing tiles sequentially (left to right, top to bottom) from one TIFF file](https://nbviewer.jupyter.org/github/rapidsai/cucim/blob/branch-21.06/notebooks/File-access_Experiments_on_TIFF.ipynb#1.-Accessing-tiles-sequentially-(left-to-right,-top-to-bottom)-from-one-TIFF-file).\n", "\n", "On the other hand, if the application [accesses partial tiles randomly from multiple TIFF files](https://nbviewer.jupyter.org/github/rapidsai/cucim/blob/branch-21.06/notebooks/File-access_Experiments_on_TIFF.ipynb#3.-Accessing-partial-tiles-randomly-from-multiple-TIFF-files) (this usually happens for **training** use cases), using a cache could be meaningless." ] }, { "cell_type": "markdown", "id": "e952a222", "metadata": {}, "source": [ "## Enabling cache\n", "\n", "Currently, cuCIM supports the following three strategies:\n", "\n", " - `nocache`\n", " - `per_process`\n", " - `shared_memory` (interprocess)\n", "\n", "\n", "**1) `nocache`**\n", "\n", "No cache.\n", "\n", "By default, this cache strategy is used.\n", "With this strategy, the behavior is the same as one before `v20.06.00`.\n", "\n", "**2) `per_process`**\n", "\n", "The cache memory is shared among threads.\n", "\n", "**3) `shared_memory`**\n", "\n", "The cache memory is shared among processes.\n", "\n", "### Getting cache setting\n", "\n", "`CuImage.cache()` would return an object that can control the current cache. 
"\n", "### Getting Cache Setting\n", "\n", "`CuImage.cache()` returns an object that can control the current cache. The object has the following properties:\n", "\n", "- `type`: The type (strategy) name\n", "- `memory_size`: The number of bytes used in the cache memory\n", "- `memory_capacity`: The maximum number of bytes that can be allocated (used) in the cache memory\n", "- `free_memory`: The number of bytes available in the cache memory\n", "- `size`: The number of cache items used\n", "- `capacity`: The maximum number of cache items that can be created\n", "- `hit_count`: The cache hit count\n", "- `miss_count`: The cache miss count\n", "- `config`: The configuration dictionary that was used to configure the cache\n" ] }, { "cell_type": "code", "execution_count": 1, "id": "ac9aa319", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "       type: CacheType.NoCache(0)\n", "memory_size: 0/0\n", "free_memory: 0\n", "       size: 0/0\n", "  hit_count: 0\n", " miss_count: 0\n", "     config: {'type': 'nocache', 'memory_capacity': 1024, 'capacity': 5461, 'mutex_pool_capacity': 11117, 'list_padding': 10000, 'extra_shared_memory_size': 100, 'record_stat': False}\n" ] } ], "source": [ "from cucim import CuImage\n", "\n", "cache = CuImage.cache()\n", "\n", "print(f'       type: {cache.type}({int(cache.type)})')\n", "print(f'memory_size: {cache.memory_size}/{cache.memory_capacity}')\n", "print(f'free_memory: {cache.free_memory}')\n", "print(f'       size: {cache.size}/{cache.capacity}')\n", "print(f'  hit_count: {cache.hit_count}')\n", "print(f' miss_count: {cache.miss_count}')\n", "print(f'     config: {cache.config}')\n" ] }, { "cell_type": "markdown", "id": "f057a11a", "metadata": {}, "source": [ "### Changing Cache Setting\n", "\n", "The cache configuration can be changed by passing parameters to the `cache()` method.\n", "\n", "The following parameters are available:\n", "\n", "- `type`: The type (strategy) name. Defaults to `'nocache'`.\n", "- `memory_capacity`: The maximum number of mebibytes (`MiB`, 2^20 bytes) that can be allocated (used) in the cache memory. Defaults to `1024`.\n", "- `capacity`: The maximum number of cache items that can be created. Defaults to `5461` (= (1024 x 2^20) / (256 x 256 x 3)).\n", "- `mutex_pool_capacity`: The mutex pool size. Defaults to `11117`.\n", "- `list_padding`: The number of additional items used for the internal circular queue. Defaults to `10000`.\n", "- `extra_shared_memory_size`: The size of the additional memory allocation (in MiB) for the shared-memory allocator in the `shared_memory` strategy. Defaults to `100`.\n", "- `record_stat`: Whether cache statistics should be recorded. Defaults to `False`.\n", "\n", "In most cases, only `type` (required) and `memory_capacity` are used."
] }, { "cell_type": "code", "execution_count": 2, "id": "e7d3090d", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " type: CacheType.PerProcess(1)\n", "memory_size: 0/2147483648\n", "free_memory: 2147483648\n", " size: 0/10922\n", " hit_count: 0\n", " miss_count: 0\n", " config: {'type': 'per_process', 'memory_capacity': 2048, 'capacity': 10922, 'mutex_pool_capacity': 11117, 'list_padding': 10000, 'extra_shared_memory_size': 100, 'record_stat': False}\n" ] } ], "source": [ "from cucim import CuImage\n", "\n", "cache = CuImage.cache('per_process', memory_capacity=2048)\n", "print(f' type: {cache.type}({int(cache.type)})')\n", "print(f'memory_size: {cache.memory_size}/{cache.memory_capacity}')\n", "print(f'free_memory: {cache.free_memory}')\n", "print(f' size: {cache.size}/{cache.capacity}')\n", "print(f' hit_count: {cache.hit_count}')\n", "print(f' miss_count: {cache.miss_count}')\n", "print(f' config: {cache.config}')" ] }, { "cell_type": "markdown", "id": "fbd45a9e", "metadata": {}, "source": [ "## Choosing Proper Cache Memory Size\n", "\n", "It is important to select the appropriate cache memory size (capacity). Small cache memory size results in low cache hit rates. Conversely, if the cache memory size is too large, memory is wasted.\n", "\n", "For example, if the default tile size is 256x256 and the patch size to load is 224x224, the cache memory needs to be large enough to contain at least two rows of tiles in the image to avoid deleting the required cache entries while loading patches sequentially (left to right, top to bottom) from one TIFF file.\n", "\n", "![image](https://user-images.githubusercontent.com/1928522/120760720-4cbf2d00-c4c9-11eb-875b-b070203fd8e6.png)\n", "\n", "cuCIM provide a utility method (`cucim.clara.cache.preferred_memory_capacity()`) to calculate a preferred cache memory size for the given image (image size and tile size) and the patch size.\n", "\n", "Internal logic is available at \n" ] }, { "cell_type": "code", "execution_count": 3, "id": "bfb70aa4", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "image size: [19920, 26420]\n", "tile size: (256, 256)\n", "memory_capacity : 74 MiB\n", "memory_capacity2: 74 MiB\n", "memory_capacity3: 74 MiB\n", "= Cache Info =\n", " type: CacheType.PerProcess(1)\n", "memory_size: 0/77594624\n", " size: 0/394\n" ] } ], "source": [ "from cucim import CuImage\n", "from cucim.clara.cache import preferred_memory_capacity\n", "\n", "img = CuImage('input/image.tif')\n", "\n", "image_size = img.size('XY') # same with `img.resolutions[\"level_dimensions\"][0]`\n", "tile_size = img.resolutions['level_tile_sizes'][0] # default: (256, 256)\n", "patch_size = (1024, 1024) # default: (256, 256)\n", "bytes_per_pixel = 3 # default: 3\n", "\n", "print(f'image size: {image_size}')\n", "print(f'tile size: {tile_size}')\n", "\n", "# Below three statements are the same.\n", "memory_capacity = preferred_memory_capacity(img, patch_size=patch_size)\n", "memory_capacity2 = preferred_memory_capacity(None, image_size, tile_size, patch_size, bytes_per_pixel)\n", "memory_capacity3 = preferred_memory_capacity(None, image_size, patch_size=patch_size)\n", "\n", "print(f'memory_capacity : {memory_capacity} MiB')\n", "print(f'memory_capacity2: {memory_capacity2} MiB')\n", "print(f'memory_capacity3: {memory_capacity3} MiB')\n", "\n", "cache = CuImage.cache('per_process', memory_capacity=memory_capacity) # You can also manually set capacity` (e.g., `capacity=500`)\n", "print('= Cache Info 
=')\n", "print(f' type: {cache.type}({int(cache.type)})')\n", "print(f'memory_size: {cache.memory_size}/{cache.memory_capacity}')\n", "print(f' size: {cache.size}/{cache.capacity}')\n" ] }, { "cell_type": "markdown", "id": "5898b317", "metadata": {}, "source": [ "### Reserve More Cache Memory\n", "\n", "If more cache memory capacity is needed in runtime, you can use `reserve()` method.\n" ] }, { "cell_type": "code", "execution_count": 4, "id": "61801fe7", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "memory_capacity : 30 MiB\n", "new_memory_capacity: 44 MiB\n", "\n", "= Cache Info =\n", " type: CacheType.PerProcess(1)\n", "memory_size: 0/31457280\n", " size: 0/160\n", "\n", "= Cache Info (update memory capacity) =\n", " type: CacheType.PerProcess(1)\n", "memory_size: 0/46137344\n", " size: 0/234\n", "\n", "= Cache Info (update memory capacity & capacity) =\n", " type: CacheType.PerProcess(1)\n", "memory_size: 0/46137344 # smaller `memory_capacity` value does not change this\n", " size: 0/500\n", "\n", "= Cache Info (no cache) =\n", " type: CacheType.NoCache(0)\n", "memory_size: 0/0\n", " size: 0/0\n" ] } ], "source": [ "from cucim import CuImage\n", "from cucim.clara.cache import preferred_memory_capacity\n", "\n", "img = CuImage('input/image.tif')\n", "\n", "memory_capacity = preferred_memory_capacity(img, patch_size=(256, 256))\n", "new_memory_capacity = preferred_memory_capacity(img, patch_size=(512, 512))\n", "\n", "print(f'memory_capacity : {memory_capacity} MiB')\n", "print(f'new_memory_capacity: {new_memory_capacity} MiB')\n", "print()\n", "\n", "cache = CuImage.cache('per_process', memory_capacity=memory_capacity)\n", "print('= Cache Info =')\n", "print(f' type: {cache.type}({int(cache.type)})')\n", "print(f'memory_size: {cache.memory_size}/{cache.memory_capacity}')\n", "print(f' size: {cache.size}/{cache.capacity}')\n", "print()\n", "\n", "cache.reserve(new_memory_capacity)\n", "print('= Cache Info (update memory capacity) =')\n", "print(f' type: {cache.type}({int(cache.type)})')\n", "print(f'memory_size: {cache.memory_size}/{cache.memory_capacity}')\n", "print(f' size: {cache.size}/{cache.capacity}')\n", "print()\n", "\n", "cache.reserve(memory_capacity, capacity=500)\n", "print('= Cache Info (update memory capacity & capacity) =')\n", "print(f' type: {cache.type}({int(cache.type)})')\n", "print(f'memory_size: {cache.memory_size}/{cache.memory_capacity} # smaller `memory_capacity` value does not change this')\n", "print(f' size: {cache.size}/{cache.capacity}')\n", "print()\n", "\n", "cache = CuImage.cache('no_cache')\n", "print('= Cache Info (no cache) =')\n", "print(f' type: {cache.type}({int(cache.type)})')\n", "print(f'memory_size: {cache.memory_size}/{cache.memory_capacity}')\n", "print(f' size: {cache.size}/{cache.capacity}')\n", "\n" ] }, { "cell_type": "markdown", "id": "f8ffe4cb", "metadata": {}, "source": [ "## Profiling Cache Hit/Miss\n", "\n", "If you add an argument `record_stat=True` to `CuImage.cache()` method, cache statistics is recorded.\n", "\n", "Cache hit/miss count is accessible through `hit_count`/`miss_count` property of the cache object.\n", "\n", "You can get/set/unset the recording through `record()` method.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": 5, "id": "91587c98", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "cache hit: 0, cache miss: 1\n", "cache hit: 1, cache miss: 
1\n", "cache hit: 2, cache miss: 1\n", "Is recorded: True\n", "Is recorded: False\n", "cache hit: 0, cache miss: 0\n", "\n", " type: CacheType.PerProcess(1)\n", "memory_size: 196608/31457280\n", "free_memory: 31260672\n", " size: 1/160\n", "\n", " type: CacheType.NoCache(0)\n", "memory_size: 0/0\n", "free_memory: 0\n", " size: 0/0\n" ] } ], "source": [ "from cucim import CuImage\n", "from cucim.clara.cache import preferred_memory_capacity\n", "\n", "img = CuImage('input/image.tif')\n", "memory_capacity = preferred_memory_capacity(img, patch_size=(256, 256))\n", "cache = CuImage.cache('per_process', memory_capacity=memory_capacity, record_stat=True)\n", "\n", "img.read_region((0,0), (100,100))\n", "print(f'cache hit: {cache.hit_count}, cache miss: {cache.miss_count}')\n", "\n", "region = img.read_region((0,0), (100,100))\n", "print(f'cache hit: {cache.hit_count}, cache miss: {cache.miss_count}')\n", "\n", "region = img.read_region((0,0), (100,100))\n", "print(f'cache hit: {cache.hit_count}, cache miss: {cache.miss_count}')\n", "\n", "print(f'Is recorded: {cache.record()}')\n", "\n", "cache.record(False)\n", "print(f'Is recorded: {cache.record()}')\n", "\n", "region = img.read_region((0,0), (100,100))\n", "print(f'cache hit: {cache.hit_count}, cache miss: {cache.miss_count}')\n", "print()\n", "\n", "print(f' type: {cache.type}({int(cache.type)})')\n", "print(f'memory_size: {cache.memory_size}/{cache.memory_capacity}')\n", "print(f'free_memory: {cache.free_memory}')\n", "print(f' size: {cache.size}/{cache.capacity}')\n", "print()\n", "\n", "cache = CuImage.cache('no_cache')\n", "print(f' type: {cache.type}({int(cache.type)})')\n", "print(f'memory_size: {cache.memory_size}/{cache.memory_capacity}')\n", "print(f'free_memory: {cache.free_memory}')\n", "print(f' size: {cache.size}/{cache.capacity}')\n" ] }, { "cell_type": "markdown", "id": "d4b4b55e", "metadata": {}, "source": [ "## Considerations in Multi-threading/processing Environment\n", "\n", "\n", "### `per_process` strategy\n", "\n", "#### Cache memory\n", "\n", "If used in the multi-threading environment and each thread is reading the different part of the image sequentially, please consider increasing cache memory size than the size suggested by `cucim.clara.cache.preferred_memory_capacity()` to avoid dropping necessary cache items.\n", "\n", "If used in the multi-processing environment, the cache memory size allocated can be `(# of processes) x (cache memory capacity)`. \n", "\n", "Please be careful not to oversize the memory allocated by the cache.\n", "\n", "\n", "#### Cache Statistics\n", "\n", "If used in the multi-processing environment (e.g, using `concurrent.futures.ProcessPoolExecutor()`), cache hit count (`hit_count`) and miss count (`miss_count`) wouldn't be recorded in the main process's cache object.\n", "\n", "\n", "### `shared_memory` strategy\n", "\n", "In general, `shared_memory` strategy has more overhead than `per_process` strategy. 
"\n", "\n", "### `shared_memory` strategy\n", "\n", "In general, the `shared_memory` strategy has more overhead than the `per_process` strategy. However, it is recommended if you want to use a fixed cache memory size regardless of the number of processes.\n", "\n", "Note that this strategy pre-allocates the cache memory in shared memory and allocates more memory (as specified by the `extra_shared_memory_size` parameter) than the requested cache memory size (capacity) so that the memory allocator can handle memory segments.\n", "\n", "\n", "#### Cache memory\n", "\n", "Since the cache memory is shared by multiple threads/processes, you will need to set a cache memory size large enough to avoid dropping necessary cache items.\n" ] }, { "cell_type": "markdown", "id": "f5b247db", "metadata": {}, "source": [ "## Setting Default Cache Configuration\n", "\n", "The configuration for cuCIM can be specified in a `.cucim.json` file, and users can set default cache settings there.\n", "\n", "cuCIM looks for the `.cucim.json` file in the following order:\n", "\n", "1. The current folder\n", "2. `$HOME/.cucim.json`\n", "\n", "The configuration for the cache can be specified as below.\n", "\n", "```jsonc\n", "\n", "{\n", "    // This is actually a JSONC file, so comments are available.\n", "    \"cache\": {\n", "        \"type\": \"nocache\",\n", "        \"memory_capacity\": 1024,\n", "        \"capacity\": 5461,\n", "        \"mutex_pool_capacity\": 11117,\n", "        \"list_padding\": 10000,\n", "        \"extra_shared_memory_size\": 100,\n", "        \"record_stat\": false\n", "    }\n", "}\n", "```\n", "\n", "You can write the current cache configuration to the file as follows:" ] }, { "cell_type": "code", "execution_count": 6, "id": "1dda19e6", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{\n", "    \"cache\": {\n", "        \"type\": \"nocache\",\n", "        \"memory_capacity\": 1024,\n", "        \"capacity\": 5461,\n", "        \"mutex_pool_capacity\": 11117,\n", "        \"list_padding\": 10000,\n", "        \"extra_shared_memory_size\": 100,\n", "        \"record_stat\": false\n", "    }\n", "}\n" ] } ], "source": [ "import json\n", "from cucim import CuImage\n", "\n", "cache = CuImage.cache()\n", "config_data = {'cache': cache.config}\n", "json_text = json.dumps(config_data, indent=4)\n", "print(json_text)\n", "\n", "# Save into the configuration file.\n", "with open('.cucim.json', 'w') as fp:\n", "    fp.write(json_text)" ] }, { "cell_type": "markdown", "id": "60c2d934", "metadata": {}, "source": [ "### Cache Mechanism Used in Other Libraries (OpenSlide and rasterio)\n", "\n", "Other libraries use the following cache strategies:\n", "\n", "- [OpenSlide](https://openslide.org/)\n", "    - 1024 x 1024 x 30 bytes (30 MiB) per file handle for cache ==> 160 (RGB) or 120 (ARGB) 256x256 tiles\n", "    - Not configurable\n", "- [rasterio](https://rasterio.readthedocs.io/en/latest/)\n", "    - 5% of available system memory per process by default (e.g., 32 GB of free memory => 1.6 GB of cache memory allocated).\n", "    - Configurable through the [environment module](https://rasterio.readthedocs.io/en/latest/api/rasterio.env.html) (see the sketch below)\n",
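"\n", "For example, rasterio's cache size can be adjusted through GDAL configuration options. Below is a minimal sketch (`GDAL_CACHEMAX` is a GDAL configuration option, specified in megabytes here; the file path is a placeholder):\n", "\n", "```python\n", "import rasterio\n", "\n", "# Limit GDAL's block cache to ~512 MB for reads inside this context.\n", "with rasterio.Env(GDAL_CACHEMAX=512):\n", "    with rasterio.open('input/image.tif') as src:\n", "        block = src.read(1, window=((0, 256), (0, 256)))\n", "```\n"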
 ] }, { "cell_type": "markdown", "id": "5148543f", "metadata": {}, "source": [ "## Results\n", "\n", "When the patch and tile layout are not aligned, cuCIM achieves a performance gain similar to the aligned case.\n", "\n", "We compared performance against OpenSlide and rasterio.\n", "\n", "For the cache memory size (capacity) setting, we used an approach similar to rasterio's (5% of available system memory).\n", "\n", "\n", "### System Information\n", "\n", "- OS: Ubuntu 18.04\n", "- CPU: [Intel(R) Core(TM) i7-7800X CPU @ 3.50GHz](https://www.cpubenchmark.net/cpu.php?cpu=Intel+Core+i7-7800X+%40+3.50GHz&id=3037)\n", "- Memory: 64GB (G-Skill DDR4 2133 16GB X 4)\n", "- Storage\n", "    - SATA SSD: [Samsung SSD 850 EVO 1TB](https://www.samsung.com/us/computing/memory-storage/solid-state-drives/ssd-850-evo-2-5-sata-iii-1tb-mz-75e1t0b-am/)\n", "\n", "### Experiment Setup\n", "+ Use the read_region() API to read all patches (256x256 each) of a whole slide image (.tif) at the largest resolution level (92,344 x 81,017; internal tile size is 256 x 256 with a 95% JPEG compression quality level) in a multithread/multiprocess environment.\n", "    - The original whole slide image (.svs, 1.6GB) was converted into a .tif file (3.2GB) using the OpenSlide & tifffile libraries in this experiment (image2.tif).\n", "        * The original image can be downloaded from here (https://drive.google.com/drive/u/0/folders/0B--ztKW0d17XYlBqOXppQmw0M2M , TUPAC-TR-488.svs)\n", "+ Two different job configurations\n", "    - multithreading: spread the workload across multiple threads\n", "    - multiprocessing: spread the workload across multiple processes\n", "+ Two different read configurations for each job configuration\n", "    - unaligned/nocache: (256x256)-patch reads start from (1,1). e.g., read the region (1,1)-(257,257), then read the region (257,1)-(513,257), ...\n", "    - aligned: (256x256)-patch reads start from (0,0). OpenSlide's internal cache mechanism does not affect this case.\n", "+ We took about 10 samples due to the time needed to conduct the experiment, so there could be some variation in the results.\n", "+ Note that this experiment doesn't isolate the effect of the system cache (page cache), whose effect we excluded in the C++ API benchmark[discard_cache], so the IO time itself could be short for all libraries.\n", "\n", "### Aligned Case (`per_process`, JPEG-compressed TIFF file)\n", "\n", "![image](https://user-images.githubusercontent.com/1928522/120849255-ae63b380-c52a-11eb-80c3-8411990e6c25.png)\n", "\n", "\n", "### Unaligned Case (`per_process`, JPEG-compressed TIFF file)\n", "\n", "![image](https://user-images.githubusercontent.com/1928522/120849176-92601200-c52a-11eb-9c38-92e55c3d413a.png)\n", "\n", "### Overall Performance of `per_process` Compared with `nocache` for Unaligned Case\n", "\n", "![image](https://user-images.githubusercontent.com/1928522/118345306-494b0e00-b4e8-11eb-88ca-c835c70aa037.png)\n", "\n", "\n", "The detailed data is available [here](https://docs.google.com/spreadsheets/d/1eAqs24p25p6iIzZdUlnWNlk_RsrdRfEkIOYB9Xgu67c/edit?usp=sharing).\n" ] }, { "cell_type": "markdown", "id": "d1c665c5", "metadata": {}, "source": [ "\n", "## Room for Improvement\n", "\n", "### Using a Memory Pool\n", "\n", "The `per_process` strategy performs better than the `shared_memory` strategy, and both strategies perform worse than the `nocache` strategy when the underlying tiles and patches are aligned.\n", "- The `shared_memory` strategy does some additional operations compared with the `per_process` strategy, and both strategies have some overhead from using the cache (such as memory allocation for cache items and indirect function calls).\n", "\n", "=> All three strategies (including `nocache`) could benefit if we allocated CPU/GPU memory for tiles from a fixed-size cache memory pool (using [RMM](https://docs.rapids.ai/api/rmm/stable/basics.html) and/or [PMR](https://en.cppreference.com/w/cpp/memory/synchronized_pool_resource)) instead of calling malloc() to allocate memory.\n", "\n", "### Supporting Generator (iterator)\n", "\n", "When the patches to read from an image can be determined in advance (the inference use case), we can load/prefetch the entire compressed/decompressed image data into memory and provide a Python generator (iterator) to get a series of patches efficiently.\n",
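"\n", "Such a generator could look like the minimal sketch below (`patch_gen` is a hypothetical helper that simply wraps today's `read_region()` rather than the proposed prefetching):\n", "\n", "```python\n", "from cucim import CuImage\n", "\n", "def patch_gen(inp_file, patch_size=256, level=0):\n", "    # Yield patches left to right, top to bottom (a hypothetical helper).\n", "    slide = CuImage(inp_file)\n", "    width, height = slide.size('XY')\n", "    for sy in range(0, height - patch_size + 1, patch_size):\n", "        for sx in range(0, width - patch_size + 1, patch_size):\n", "            yield slide.read_region((sx, sy), (patch_size, patch_size), level)\n", "\n", "for patch in patch_gen('input/image.tif'):\n", "    ...  # feed the patch to the inference pipeline\n", "```\n",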
"\n" ] }, { "cell_type": "markdown", "id": "bf312216", "metadata": {}, "source": [ "## Appendix\n", "\n", "### Experiment Code\n", "\n", "```python\n", "#\n", "# Copyright (c) 2021, NVIDIA CORPORATION.\n", "# Licensed under the Apache License, Version 2.0 (the \"License\");\n", "# you may not use this file except in compliance with the License.\n", "# You may obtain a copy of the License at\n", "#\n", "#     http://www.apache.org/licenses/LICENSE-2.0\n", "#\n", "# Unless required by applicable law or agreed to in writing, software\n", "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", "# See the License for the specific language governing permissions and\n", "# limitations under the License.\n", "#\n", "\n", "import concurrent.futures\n", "from contextlib import ContextDecorator\n", "from datetime import datetime\n", "from itertools import repeat\n", "from time import perf_counter\n", "\n", "import numpy as np\n", "import rasterio\n", "from cucim import CuImage\n", "from openslide import OpenSlide\n", "from rasterio.windows import Window\n", "\n", "\n", "class Timer(ContextDecorator):\n", "    def __init__(self, message):\n", "        self.message = message\n", "        self.end = None\n", "\n", "    def elapsed_time(self):\n", "        self.end = perf_counter()\n", "        return self.end - self.start\n", "\n", "    def __enter__(self):\n", "        self.start = perf_counter()\n", "        return self\n", "\n", "    def __exit__(self, exc_type, exc, exc_tb):\n", "        if not self.end:\n", "            self.elapsed_time()\n", "        print(\"{} : {}\".format(self.message, self.end - self.start))\n", "\n", "\n", "def load_tile_openslide(slide, start_loc, patch_size):\n", "    _ = slide.read_region(start_loc, 0, [patch_size, patch_size])\n", "\n", "def load_tile_openslide_chunk(inp_file, start_loc_list, patch_size):\n", "    with OpenSlide(inp_file) as slide:\n", "        for start_loc in start_loc_list:\n", "            region = slide.read_region(start_loc, 0, [patch_size, patch_size])\n", "\n", "def load_tile_cucim(slide, start_loc, patch_size):\n", "    _ = slide.read_region(start_loc, [patch_size, patch_size], 0)\n", "\n", "def load_tile_cucim_chunk(inp_file, start_loc_list, patch_size):\n", "    try:\n", "        slide = CuImage(inp_file)\n", "        for start_loc in start_loc_list:\n", "            region = slide.read_region(start_loc, [patch_size, patch_size], 0)\n", "    except Exception as e:\n", "        print(e)\n", "\n", "identity = rasterio.Affine(1, 0, 0, 0, 1, 0)\n", "def load_tile_rasterio(slide, start_loc, tile_size):\n", "    _ = np.moveaxis(slide.read([1,2,3],\n", "        window=Window.from_slices((start_loc[0], start_loc[0] + tile_size),(start_loc[1], start_loc[1] + tile_size))), 0, -1)\n", "\n", "def load_tile_rasterio_chunk(input_file, start_loc_list, patch_size):\n", "    identity = rasterio.Affine(1, 0, 0, 0, 1, 0)\n", "    slide = rasterio.open(input_file, transform = identity, num_threads=1)\n", "    for start_loc in start_loc_list:\n", "        _ = np.moveaxis(slide.read([1,2,3],\n", "            window=Window.from_slices((start_loc[0], start_loc[0] + patch_size),(start_loc[1], start_loc[1] + patch_size))), 0, -1)\n", "\n", "\n", "def load_tile_openslide_chunk_mp(inp_file, start_loc_list, patch_size):\n", "    with OpenSlide(inp_file) as slide:\n", "        for start_loc in start_loc_list:\n", "            region = slide.read_region(start_loc, 0, [patch_size, patch_size])\n",
patch_size])\n", "\n", "def load_tile_cucim_chunk_mp(inp_file, start_loc_list, patch_size):\n", " slide = CuImage(inp_file)\n", " for start_loc in start_loc_list:\n", " region = slide.read_region(start_loc, [patch_size, patch_size], 0)\n", "\n", "def load_tile_rasterio_chunk_mp(input_file, start_loc_list, patch_size):\n", " slide = rasterio.open(input_file, num_threads=1)\n", " for start_loc in start_loc_list:\n", " region = np.moveaxis(slide.read([1,2,3],\n", " window=Window.from_slices((start_loc[0], start_loc[0] + patch_size),(start_loc[1], start_loc[1] + patch_size))), 0, -1)\n", "\n", "def experiment_thread(cache_strategy, input_file, num_threads, start_location, patch_size):\n", " import psutil\n", " print(\" \", psutil.virtual_memory())\n", " for num_workers in range(1, num_threads + 1): # range(1, num_threads + 1): # (num_threads,):\n", " openslide_time = 1\n", " cucim_time = 1\n", " rasterio_time = 1\n", "\n", " with OpenSlide(input_file) as slide:\n", " width, height = slide.dimensions\n", "\n", " start_loc_data = [(sx, sy)\n", " for sy in range(start_location, height, patch_size)\n", " for sx in range(start_location, width, patch_size)]\n", " chunk_size = len(start_loc_data) // num_workers\n", " start_loc_list_iter = [start_loc_data[i:i+chunk_size] for i in range(0, len(start_loc_data), chunk_size)]\n", " with Timer(\" Thread elapsed time (OpenSlide)\") as timer:\n", " with concurrent.futures.ThreadPoolExecutor(\n", " max_workers=num_workers\n", " ) as executor:\n", " executor.map(\n", " load_tile_openslide_chunk,\n", " repeat(input_file),\n", " start_loc_list_iter,\n", " repeat(patch_size)\n", " )\n", " openslide_time = timer.elapsed_time()\n", " print(\" \", psutil.virtual_memory())\n", "\n", " cache_size = psutil.virtual_memory().available // 1024 // 1024 // 20\n", " cache = CuImage.cache(cache_strategy, memory_capacity=cache_size, record_stat=True)\n", " cucim_time = 0\n", " slide = CuImage(input_file)\n", " start_loc_data = [(sx, sy)\n", " for sy in range(start_location, height, patch_size)\n", " for sx in range(start_location, width, patch_size)]\n", " chunk_size = len(start_loc_data) // num_workers\n", " start_loc_list_iter = [start_loc_data[i:i+chunk_size] for i in range(0, len(start_loc_data), chunk_size)]\n", " with Timer(\" Thread elapsed time (cuCIM)\") as timer:\n", " with concurrent.futures.ThreadPoolExecutor(\n", " max_workers=num_workers\n", " ) as executor:\n", " executor.map(\n", " load_tile_cucim_chunk,\n", " repeat(input_file),\n", " start_loc_list_iter,\n", " repeat(patch_size)\n", " )\n", " cucim_time = timer.elapsed_time()\n", " print(f\" hit: {cache.hit_count} miss: {cache.miss_count}\")\n", " print(\" \", psutil.virtual_memory())\n", "\n", " start_loc_data = [(sx, sy)\n", " for sy in range(start_location, height, patch_size)\n", " for sx in range(start_location, width, patch_size)]\n", " chunk_size = len(start_loc_data) // num_workers\n", " start_loc_list_iter = [start_loc_data[i:i+chunk_size] for i in range(0, len(start_loc_data), chunk_size)]\n", "\n", " with Timer(\" Thread elapsed time (rasterio)\") as timer:\n", " with concurrent.futures.ThreadPoolExecutor(\n", " max_workers=num_workers\n", " ) as executor:\n", " executor.map(\n", " load_tile_rasterio_chunk,\n", " repeat(input_file),\n", " start_loc_list_iter,\n", " repeat(patch_size)\n", " )\n", " rasterio_time = timer.elapsed_time()\n", "\n", " print(\" \", psutil.virtual_memory())\n", " output_text = f\"{datetime.now().strftime('%Y-%m-%d 
"        with open(\"experiment.txt\", \"a+\") as f:\n", "            f.write(output_text)\n", "        print(output_text)\n", "\n", "def experiment_process(cache_strategy, input_file, num_processes, start_location, patch_size):\n", "    import psutil\n", "    print(\" \", psutil.virtual_memory())\n", "    for num_workers in range(1, num_processes + 1):\n", "        openslide_time = 1\n", "        cucim_time = 1\n", "        rasterio_time = 1\n", "        # (92344 x 81017)\n", "        with OpenSlide(input_file) as slide:\n", "            width, height = slide.dimensions\n", "\n", "        start_loc_data = [(sx, sy)\n", "                          for sy in range(start_location, height, patch_size)\n", "                          for sx in range(start_location, width, patch_size)]\n", "        chunk_size = len(start_loc_data) // num_workers\n", "        start_loc_list_iter = [start_loc_data[i:i+chunk_size] for i in range(0, len(start_loc_data), chunk_size)]\n", "\n", "        with Timer(\" Process elapsed time (OpenSlide)\") as timer:\n", "            with concurrent.futures.ProcessPoolExecutor(\n", "                max_workers=num_workers\n", "            ) as executor:\n", "                executor.map(\n", "                    load_tile_openslide_chunk_mp,\n", "                    repeat(input_file),\n", "                    start_loc_list_iter,\n", "                    repeat(patch_size)\n", "                )\n", "            openslide_time = timer.elapsed_time()\n", "        print(\" \", psutil.virtual_memory())\n", "\n", "        cache_size = psutil.virtual_memory().available // 1024 // 1024 // 20\n", "        if cache_strategy == \"shared_memory\":\n", "            cache_size = cache_size * num_workers\n", "        cache = CuImage.cache(cache_strategy, memory_capacity=cache_size, record_stat=True)\n", "        cucim_time = 0\n", "        slide = CuImage(input_file)\n", "        start_loc_data = [(sx, sy)\n", "                          for sy in range(start_location, height, patch_size)\n", "                          for sx in range(start_location, width, patch_size)]\n", "        chunk_size = len(start_loc_data) // num_workers\n", "        start_loc_list_iter = [start_loc_data[i:i+chunk_size] for i in range(0, len(start_loc_data), chunk_size)]\n", "\n", "        with Timer(\" Process elapsed time (cuCIM)\") as timer:\n", "            with concurrent.futures.ProcessPoolExecutor(\n", "                max_workers=num_workers\n", "            ) as executor:\n", "                executor.map(\n", "                    load_tile_cucim_chunk_mp,\n", "                    repeat(input_file),\n", "                    start_loc_list_iter,\n", "                    repeat(patch_size)\n", "                )\n", "            cucim_time = timer.elapsed_time()\n", "        print(\" \", psutil.virtual_memory())\n", "\n", "        rasterio_time = 0\n", "        start_loc_data = [(sx, sy)\n", "                          for sy in range(start_location, height, patch_size)\n", "                          for sx in range(start_location, width, patch_size)]\n", "        chunk_size = len(start_loc_data) // num_workers\n", "        start_loc_list_iter = [start_loc_data[i:i+chunk_size] for i in range(0, len(start_loc_data), chunk_size)]\n", "\n", "        with Timer(\" Process elapsed time (rasterio)\") as timer:\n", "            with concurrent.futures.ProcessPoolExecutor(\n", "                max_workers=num_workers\n", "            ) as executor:\n", "                executor.map(\n", "                    load_tile_rasterio_chunk_mp,\n", "                    repeat(input_file),\n", "                    start_loc_list_iter,\n", "                    repeat(patch_size)\n", "                )\n", "            rasterio_time = timer.elapsed_time()\n", "\n", "        print(\" \", psutil.virtual_memory())\n", "        output_text = f\"{datetime.now().strftime('%Y-%m-%d %H:%M:%S')},process,{cache_strategy},{input_file},{start_location},{patch_size},{num_workers},{openslide_time},{cucim_time},{rasterio_time},{openslide_time / cucim_time},{rasterio_time / cucim_time},{cache_size},{cache.hit_count},{cache.miss_count}\\n\"\n",
"        with open(\"experiment.txt\", \"a+\") as f:\n", "            f.write(output_text)\n", "        print(output_text)\n", "\n", "experiment_thread(\"nocache\", \"notebooks/input/image.tif\", 12, 0, 256)\n", "experiment_process(\"nocache\", \"notebooks/input/image.tif\", 12, 0, 256)\n", "experiment_thread(\"per_process\", \"notebooks/input/image.tif\", 12, 0, 256)\n", "experiment_process(\"per_process\", \"notebooks/input/image.tif\", 12, 0, 256)\n", "experiment_thread(\"shared_memory\", \"notebooks/input/image.tif\", 12, 0, 256)\n", "experiment_process(\"shared_memory\", \"notebooks/input/image.tif\", 12, 0, 256)\n", "\n", "experiment_thread(\"nocache\", \"notebooks/input/image.tif\", 12, 1, 256)\n", "experiment_process(\"nocache\", \"notebooks/input/image.tif\", 12, 1, 256)\n", "experiment_thread(\"per_process\", \"notebooks/input/image.tif\", 12, 1, 256)\n", "experiment_process(\"per_process\", \"notebooks/input/image.tif\", 12, 1, 256)\n", "experiment_thread(\"shared_memory\", \"notebooks/input/image.tif\", 12, 1, 256)\n", "experiment_process(\"shared_memory\", \"notebooks/input/image.tif\", 12, 1, 256)\n", "\n", "experiment_thread(\"nocache\", \"notebooks/input/image2.tif\", 12, 0, 256)\n", "experiment_process(\"nocache\", \"notebooks/input/image2.tif\", 12, 0, 256)\n", "experiment_thread(\"per_process\", \"notebooks/input/image2.tif\", 12, 0, 256)\n", "experiment_process(\"per_process\", \"notebooks/input/image2.tif\", 12, 0, 256)\n", "experiment_thread(\"shared_memory\", \"notebooks/input/image2.tif\", 12, 0, 256)\n", "experiment_process(\"shared_memory\", \"notebooks/input/image2.tif\", 12, 0, 256)\n", "\n", "experiment_thread(\"nocache\", \"notebooks/input/image2.tif\", 12, 1, 256)\n", "experiment_process(\"nocache\", \"notebooks/input/image2.tif\", 12, 1, 256)\n", "experiment_thread(\"per_process\", \"notebooks/input/image2.tif\", 12, 1, 256)\n", "experiment_process(\"per_process\", \"notebooks/input/image2.tif\", 12, 1, 256)\n", "experiment_thread(\"shared_memory\", \"notebooks/input/image2.tif\", 12, 1, 256)\n", "experiment_process(\"shared_memory\", \"notebooks/input/image2.tif\", 12, 1, 256)\n", "\n", "experiment_thread(\"nocache\", \"notebooks/0486052bb.tiff\", 12, 0, 1024)\n", "experiment_process(\"nocache\", \"notebooks/0486052bb.tiff\", 12, 0, 1024)\n", "experiment_thread(\"per_process\", \"notebooks/0486052bb.tiff\", 12, 0, 1024)\n", "experiment_process(\"per_process\", \"notebooks/0486052bb.tiff\", 12, 0, 1024)\n", "experiment_thread(\"shared_memory\", \"notebooks/0486052bb.tiff\", 12, 0, 1024)\n", "experiment_process(\"shared_memory\", \"notebooks/0486052bb.tiff\", 12, 0, 1024)\n", "\n", "experiment_thread(\"nocache\", \"notebooks/0486052bb.tiff\", 12, 1, 1024)\n", "experiment_process(\"nocache\", \"notebooks/0486052bb.tiff\", 12, 1, 1024)\n", "experiment_thread(\"per_process\", \"notebooks/0486052bb.tiff\", 12, 1, 1024)\n", "experiment_process(\"per_process\", \"notebooks/0486052bb.tiff\", 12, 1, 1024)\n", "experiment_thread(\"shared_memory\", \"notebooks/0486052bb.tiff\", 12, 1, 1024)\n", "experiment_process(\"shared_memory\", \"notebooks/0486052bb.tiff\", 12, 1, 1024)\n", "```" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.5" } },
"nbformat": 4, "nbformat_minor": 5 }