{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# [widget] Video Source Stabilization (part 1)\n",
    "\n",
    "The widget below illustrates how images generated using `animation_mode: Video Source` are affected by certain \"stabilization\" options. \n",
    "\n",
    "Press the **\"▷\"** icon to begin the animation. \n",
    "\n",
    "The first run with any particular set of settings will probably show an empty image because the widget is janky and downloads only what it needs on the fly. What can I say: I'm an ML engineer, not a web developer.\n",
    "\n",
    "## What is \"Video Source\" animation mode?\n",
    "\n",
    "PyTTI generates images by iterative updates. This process can be initialized in a variety of ways, and depending on how certain settings are configured, the initial state can have a very significant impact on the final result. For example, if we set the number of steps or the learning rate very low, the final result might be barely modified from the initial state. PyTTI's default behavior is to initialize this process using random noise (i.e. an image of fuzzy static). If we provide an image to use for the starting state of this process, the \"image generation\" can become more of an \"image *manipulation*\". A video is just a sequence of images, so we can use pytti as a tool for manipulating an input video sequence similar to how pytti can be used to manipulate an input image.\n",
    "\n",
    "Generating a sequence of images for an animation often comes with some additional considerations. In particular: we often want to be able to control frame-to-frame coherence. Using adjacent video frames as init images to generate adjacent frames of an animation is a good way to at least guarantee some structural coherence in terms of the image layout, but otherwise the images will be generated independently of each other. A single frame of an animation generated this way will probably look fine in isolation, but as part of an animation sequence it might create a kind of undesirable flickering as manifestations of objects in the image change without regard to what they looked like in the previous frame.\n",
    "\n",
    "To resolve this, PyTTI provides a variety of mechanisms for encouraging an image generation to conform to attributes of either the input video, previously generated animation frames, or both. \n",
    "\n",
    "The following widget uses the VQGAN image model. You can absolutely use other image models for video source animations, but generally we find this is what people are looking for. There will be some artifacts in the animations generated here as a consequence of the low output resolution used, so keep in mind that VQGAN outputs don't need to be as \"blocky\" as those illustrated here. The resolution in this experiment was kept low to generate the demonstration images faster.\n",
    "\n",
    "## Description of Settings in Widget\n",
    "\n",
    "* **`reencode_each_frame`**: Use each video frame as an init_image instead of warping each output frame into the init for the next. Cuts will still be detected and trigger a reencode.\n",
    "* **`direct_stabilization_weight`**: Use the current frame of the video as a direct image prompt.\n",
    "* **`semantic_stabilization_weight`**: Use the current frame of the video as a semantic image prompt."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Widget"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "import re\n",
    "from pathlib import Path\n",
    "\n",
    "from IPython.display import display, clear_output, Image, Video\n",
    "import matplotlib.pyplot as plt\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "import panel as pn\n",
    "\n",
    "pn.extension()\n",
    "\n",
    "#########\n",
    "\n",
    "outputs_root = Path('images_out')\n",
    "folder_prefix = 'exp_video_basic_stability_modes'\n",
    "folders = list(outputs_root.glob(f'{folder_prefix}_*'))\n",
    "\n",
    "\n",
    "def format_val(v):\n",
    "    try:\n",
    "        v = float(v)\n",
    "        if int(v) == v:\n",
    "            v = int(v)\n",
    "    except (TypeError, ValueError):  # not numeric (e.g. 'True'/'False'): keep the raw string\n",
    "        pass\n",
    "    return v\n",
    "\n",
    "def parse_folder_name(folder):\n",
    "    metadata_string = folder.name[1+len(folder_prefix):]\n",
    "    pattern = r\"_?([a-zA-Z_]+)-(True|False|[0-9.]+)\"\n",
    "    matches = re.findall(pattern, metadata_string)\n",
    "    d_ = {k:format_val(v) for k,v in matches}\n",
    "    d_['fpath'] = folder\n",
    "    d_['n_images'] = len(list(folder.glob('*.png')))\n",
    "    return d_\n",
    "\n",
    "df_meta = pd.DataFrame([parse_folder_name(f) for f in folders])\n",
    "\n",
    "variant_names = [v for v in df_meta.columns.tolist() if v not in ['fpath']]\n",
    "variant_ranges = {v:df_meta[v].unique() for v in variant_names}\n",
    "for v in variant_ranges.values():\n",
    "    v.sort()  # sort each array of option values in place\n",
    "\n",
    "\n",
    "##########################################\n",
    "\n",
    "n_imgs_per_group = 20\n",
    "\n",
    "def setting_name_shorthand(setting_name):\n",
    "    return ''.join([tok[0] for tok in setting_name.split('_')])\n",
    "\n",
    "decoded_setting_name = {\n",
    "    'ref': 'reencode_each_frame',\n",
    "    'dsw': 'direct_stabilization_weight',\n",
    "    'ssw': 'semantic_stabilization_weight',\n",
    "}\n",
    "\n",
    "kargs = {k:pn.widgets.DiscreteSlider(name=decoded_setting_name[k], options=list(v), value=v[0]) for k,v in variant_ranges.items() if k != 'n_images'}\n",
    "kargs['i'] = pn.widgets.Player(interval=300, name='step', start=1, end=n_imgs_per_group, step=1, value=1, loop_policy='reflect')\n",
    "\n",
    "\n",
    "url_prefix = \"https://raw.githubusercontent.com/dmarx/pytti-settings-test/main/images_out/\"\n",
    "image_paths = [str(p) for p in Path('images_out').glob('**/*.png')]\n",
    "d_image_urls = {im_path:im_path.replace('images_out/', url_prefix) for im_path in image_paths}\n",
    "\n",
    "##########\n",
    "\n",
    "@pn.interact(\n",
    "    **kargs\n",
    ")\n",
    "def display_images(\n",
    "    ref,\n",
    "    dsw,\n",
    "    ssw,\n",
    "    i,\n",
    "):\n",
    "    folder = df_meta[\n",
    "        (ref == df_meta['ref']) &\n",
    "        (dsw == df_meta['dsw']) &\n",
    "        (ssw == df_meta['ssw'])\n",
    "    ]['fpath'].values[0]\n",
    "    im_path = str(folder / f\"{folder.name}_{i}.png\")\n",
    "    #im_url = im_path\n",
    "    im_url = d_image_urls[im_path]\n",
    "    return pn.pane.HTML(f'<img src=\"{im_url}\" width=\"700\">', width=700, height=350, sizing_mode='fixed')\n",
    "\n",
    "pn.panel(display_images, height=1000).embed(max_opts=n_imgs_per_group, max_states=999999999)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Unmodified Source Video\n",
    "\n",
    "Via: https://archive.org/details/EvaVikstromStockFootageViewFromaTrainHebyMorgongavainAugust2006\n",
    "\n",
    "<iframe src=\"https://archive.org/embed/EvaVikstromStockFootageViewFromaTrainHebyMorgongavainAugust2006\" width=\"640\" height=\"480\" frameborder=\"0\" webkitallowfullscreen=\"true\" mozallowfullscreen=\"true\" allowfullscreen></iframe>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Settings shared across animations\n",
    "\n",
    "```\n",
    "scenes: \"a photograph of a bright and beautiful spring day, by Trey Ratcliff\"\n",
    "scene_suffix: \" | text:-1:-.9 | watermark:-1:-.9\"\n",
    "\n",
    "animation_mode: \"Video Source\"\n",
    "video_path: \"/home/dmarx/proj/pytti-book/pytti-core/src/pytti/assets/HebyMorgongava_512kb.mp4\"\n",
    "frames_per_second: 15\n",
    "backups: 3\n",
    "\n",
    "steps_per_frame: 50\n",
    "save_every: 50\n",
    "steps_per_scene: 1000\n",
    "\n",
    "image_model: \"VQGAN\"\n",
    "\n",
    "cutouts: 40\n",
    "cut_pow: 1\n",
    "\n",
    "pixel_size: 1\n",
    "height: 512\n",
    "width: 1024\n",
    "\n",
    "seed: 12345\n",
    "```\n",
    "\n",
    "### Detailed explanation of shared settings\n",
    "\n",
    "(WIP)\n",
    "\n",
    "```\n",
    "scenes: \"a photograph of a bright and beautiful spring day, by Trey Ratcliff\"\n",
    "scene_suffix: \" | text:-1:-.9 | watermark:-1:-.9\"\n",
    "```\n",
    "\n",
    "Guiding text prompts.\n",
    "\n",
    "```\n",
    "animation_mode: \"Video Source\"\n",
    "video_path: \"/home/dmarx/proj/pytti-book/pytti-core/src/pytti/assets/HebyMorgongava_512kb.mp4\"\n",
    "```\n",
    "\n",
    "It's generally a good idea to specify the path to files using an \"absolute\" path (starting from the root folder of the file system, in this case \"/\") rather than a \"relative\" path ('relative' with respect to the current folder). This is because depending on how we run pytti, it may actually change the current working directory. One of many headaches that comes with Hydra, which powers pytti's CLI and config system.\n",
    "\n",
    "```\n",
    "frames_per_second: 15\n",
    "```\n",
    "\n",
    "The video source file will be read in using ffmpeg, which will resample the video from its original frame rate to 15 FPS.\n",
    "\n",
    "```\n",
    "backups: 3\n",
    "```\n",
    "\n",
    "This is a concern that should totally be abstracted away from the user and I'm sorry I haven't taken care of it already. If you get errors saying something like pytti can't find a file named `...*.bak`, try setting backups to 0 or incrementing the number of backups until the error goes away. Let's just leave it at that for now.\n",
    "\n",
    "```\n",
    "steps_per_frame: 50\n",
    "save_every: 50\n",
    "steps_per_scene: 1000\n",
    "```\n",
    "\n",
    "PyTTI will take 50 optimization steps for each frame (i.e. image) of the animation.\n",
    "\n",
    "We have one scene, so 1000 `steps_per_scene` / 50 `steps_per_frame` = **20 frames total** will be generated.\n",
    "\n",
    "At 15 FPS, we'll be manipulating 1.3 seconds of video footage. If the input video is shorter than the output duration calculated as a function of frames (like we just computed here), the animation will end when we run out of input video frames. \n",
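    "\n",
    "As a worked check of the arithmetic above, assuming a single scene and a source video that has at least 20 frames once read at 15 FPS:\n",
    "\n",
    "$$\n",
    "\\frac{1000 \\text{ steps}}{50 \\text{ steps/frame}} = 20 \\text{ frames}, \\qquad \\frac{20 \\text{ frames}}{15 \\text{ frames/s}} \\approx 1.33 \\text{ s}\n",
    "$$\n",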
    "\n",
    "**To apply PyTTI to an entire input video: set `steps_per_scene` to an arbitrarily high value.**\n",
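    "\n",
    "A minimal sketch of that change. The value here is an arbitrary assumption, picked only so that it exceeds the number of input frames multiplied by `steps_per_frame`:\n",
    "\n",
    "```\n",
    "steps_per_scene: 100000\n",
    "```\n",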
    "\n",
    "```\n",
    "image_model: \"VQGAN\"\n",
    "```\n",
    "\n",
    "We choose the VQGAN model here because it's essentially a shortcut to photorealistic outputs.\n",
    "\n",
    "\n",
    "```\n",
    "cutouts: 40\n",
    "cut_pow: 1\n",
    "```\n",
    "\n",
    "For each optimization step, we will take 40 random crops from the image to show the perceptor. `cut_pow` controls the size of these cutouts: 1 is generally a good default, smaller values create bigger cutouts. Generally, more cutouts = nicer images. If we set `reencode_each_frame: False`, we can sort of \"accumulate\" cutout information in the VQGAN latent, which will get carried from frame-to-frame rather than being re-initialized each frame. Sometimes this will be helpful, sometimes it won't.\n",
    "\n",
    "\n",
    "```\n",
    "seed: 12345\n",
    "```\n",
    "\n",
    "If a seed is not specified, one will be generated randomly. This value is used to initialize the random number generator: specifying a seed promotes deterministic (repeatable) behavior. This is an especially useful parameter to set for comparison studies like this."
   ]
  }
 ],
 "metadata": {
  "interpreter": {
   "hash": "3eff1e1332ed0784bebe5613522d192d113df675730803c3b8984f113f4e15fd"
  },
  "kernelspec": {
   "display_name": "Python 3.9.7 ('pytti-book-l72HEyWC')",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.7"
  },
  "orig_nbformat": 4
 },
 "nbformat": 4,
 "nbformat_minor": 2
}