{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Autoencoder transformation (encode-decode)\n",
    "\n",
    "Considering a dataset with $p$ numerical attributes. \n",
    "\n",
    "The goal of the autoencoder is to reduce the dimension of $p$ to $k$, such that these $k$ attributes are enough to recompose the original $p$ attributes. However from the $k$ dimensionals the data is returned back to $p$ dimensions. The higher the quality of autoencoder the similiar is the output from the input. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Loading required package: daltoolbox\n",
      "\n",
      "Registered S3 method overwritten by 'quantmod':\n",
      "  method            from\n",
      "  as.zoo.data.frame zoo \n",
      "\n",
      "\n",
      "Attaching package: ‘daltoolbox’\n",
      "\n",
      "\n",
      "The following object is masked from ‘package:base’:\n",
      "\n",
      "    transform\n",
      "\n",
      "\n"
     ]
    }
   ],
   "source": [
    "# DAL ToolBox\n",
    "# version 1.0.777\n",
    "\n",
    "source(\"https://raw.githubusercontent.com/cefet-rj-dal/daltoolbox/main/jupyter.R\")\n",
    "\n",
    "#loading DAL\n",
    "load_library(\"daltoolbox\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### dataset for example "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table class=\"dataframe\">\n",
       "<caption>A matrix: 6 × 5 of type dbl</caption>\n",
       "<thead>\n",
       "\t<tr><th scope=col>t4</th><th scope=col>t3</th><th scope=col>t2</th><th scope=col>t1</th><th scope=col>t0</th></tr>\n",
       "</thead>\n",
       "<tbody>\n",
       "\t<tr><td>0.0000000</td><td>0.2474040</td><td>0.4794255</td><td>0.6816388</td><td>0.8414710</td></tr>\n",
       "\t<tr><td>0.2474040</td><td>0.4794255</td><td>0.6816388</td><td>0.8414710</td><td>0.9489846</td></tr>\n",
       "\t<tr><td>0.4794255</td><td>0.6816388</td><td>0.8414710</td><td>0.9489846</td><td>0.9974950</td></tr>\n",
       "\t<tr><td>0.6816388</td><td>0.8414710</td><td>0.9489846</td><td>0.9974950</td><td>0.9839859</td></tr>\n",
       "\t<tr><td>0.8414710</td><td>0.9489846</td><td>0.9974950</td><td>0.9839859</td><td>0.9092974</td></tr>\n",
       "\t<tr><td>0.9489846</td><td>0.9974950</td><td>0.9839859</td><td>0.9092974</td><td>0.7780732</td></tr>\n",
       "</tbody>\n",
       "</table>\n"
      ],
      "text/latex": [
       "A matrix: 6 × 5 of type dbl\n",
       "\\begin{tabular}{lllll}\n",
       " t4 & t3 & t2 & t1 & t0\\\\\n",
       "\\hline\n",
       "\t 0.0000000 & 0.2474040 & 0.4794255 & 0.6816388 & 0.8414710\\\\\n",
       "\t 0.2474040 & 0.4794255 & 0.6816388 & 0.8414710 & 0.9489846\\\\\n",
       "\t 0.4794255 & 0.6816388 & 0.8414710 & 0.9489846 & 0.9974950\\\\\n",
       "\t 0.6816388 & 0.8414710 & 0.9489846 & 0.9974950 & 0.9839859\\\\\n",
       "\t 0.8414710 & 0.9489846 & 0.9974950 & 0.9839859 & 0.9092974\\\\\n",
       "\t 0.9489846 & 0.9974950 & 0.9839859 & 0.9092974 & 0.7780732\\\\\n",
       "\\end{tabular}\n"
      ],
      "text/markdown": [
       "\n",
       "A matrix: 6 × 5 of type dbl\n",
       "\n",
       "| t4 | t3 | t2 | t1 | t0 |\n",
       "|---|---|---|---|---|\n",
       "| 0.0000000 | 0.2474040 | 0.4794255 | 0.6816388 | 0.8414710 |\n",
       "| 0.2474040 | 0.4794255 | 0.6816388 | 0.8414710 | 0.9489846 |\n",
       "| 0.4794255 | 0.6816388 | 0.8414710 | 0.9489846 | 0.9974950 |\n",
       "| 0.6816388 | 0.8414710 | 0.9489846 | 0.9974950 | 0.9839859 |\n",
       "| 0.8414710 | 0.9489846 | 0.9974950 | 0.9839859 | 0.9092974 |\n",
       "| 0.9489846 | 0.9974950 | 0.9839859 | 0.9092974 | 0.7780732 |\n",
       "\n"
      ],
      "text/plain": [
       "     t4        t3        t2        t1        t0       \n",
       "[1,] 0.0000000 0.2474040 0.4794255 0.6816388 0.8414710\n",
       "[2,] 0.2474040 0.4794255 0.6816388 0.8414710 0.9489846\n",
       "[3,] 0.4794255 0.6816388 0.8414710 0.9489846 0.9974950\n",
       "[4,] 0.6816388 0.8414710 0.9489846 0.9974950 0.9839859\n",
       "[5,] 0.8414710 0.9489846 0.9974950 0.9839859 0.9092974\n",
       "[6,] 0.9489846 0.9974950 0.9839859 0.9092974 0.7780732"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "data(sin_data)\n",
    "\n",
    "sw_size <- 5\n",
    "ts <- ts_data(sin_data$y, sw_size)\n",
    "\n",
    "ts_head(ts)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### applying data normalization"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table class=\"dataframe\">\n",
       "<caption>A matrix: 6 × 5 of type dbl</caption>\n",
       "<thead>\n",
       "\t<tr><th scope=col>t4</th><th scope=col>t3</th><th scope=col>t2</th><th scope=col>t1</th><th scope=col>t0</th></tr>\n",
       "</thead>\n",
       "<tbody>\n",
       "\t<tr><td>0.5004502</td><td>0.6243512</td><td>0.7405486</td><td>0.8418178</td><td>0.9218625</td></tr>\n",
       "\t<tr><td>0.6243512</td><td>0.7405486</td><td>0.8418178</td><td>0.9218625</td><td>0.9757058</td></tr>\n",
       "\t<tr><td>0.7405486</td><td>0.8418178</td><td>0.9218625</td><td>0.9757058</td><td>1.0000000</td></tr>\n",
       "\t<tr><td>0.8418178</td><td>0.9218625</td><td>0.9757058</td><td>1.0000000</td><td>0.9932346</td></tr>\n",
       "\t<tr><td>0.9218625</td><td>0.9757058</td><td>1.0000000</td><td>0.9932346</td><td>0.9558303</td></tr>\n",
       "\t<tr><td>0.9757058</td><td>1.0000000</td><td>0.9932346</td><td>0.9558303</td><td>0.8901126</td></tr>\n",
       "</tbody>\n",
       "</table>\n"
      ],
      "text/latex": [
       "A matrix: 6 × 5 of type dbl\n",
       "\\begin{tabular}{lllll}\n",
       " t4 & t3 & t2 & t1 & t0\\\\\n",
       "\\hline\n",
       "\t 0.5004502 & 0.6243512 & 0.7405486 & 0.8418178 & 0.9218625\\\\\n",
       "\t 0.6243512 & 0.7405486 & 0.8418178 & 0.9218625 & 0.9757058\\\\\n",
       "\t 0.7405486 & 0.8418178 & 0.9218625 & 0.9757058 & 1.0000000\\\\\n",
       "\t 0.8418178 & 0.9218625 & 0.9757058 & 1.0000000 & 0.9932346\\\\\n",
       "\t 0.9218625 & 0.9757058 & 1.0000000 & 0.9932346 & 0.9558303\\\\\n",
       "\t 0.9757058 & 1.0000000 & 0.9932346 & 0.9558303 & 0.8901126\\\\\n",
       "\\end{tabular}\n"
      ],
      "text/markdown": [
       "\n",
       "A matrix: 6 × 5 of type dbl\n",
       "\n",
       "| t4 | t3 | t2 | t1 | t0 |\n",
       "|---|---|---|---|---|\n",
       "| 0.5004502 | 0.6243512 | 0.7405486 | 0.8418178 | 0.9218625 |\n",
       "| 0.6243512 | 0.7405486 | 0.8418178 | 0.9218625 | 0.9757058 |\n",
       "| 0.7405486 | 0.8418178 | 0.9218625 | 0.9757058 | 1.0000000 |\n",
       "| 0.8418178 | 0.9218625 | 0.9757058 | 1.0000000 | 0.9932346 |\n",
       "| 0.9218625 | 0.9757058 | 1.0000000 | 0.9932346 | 0.9558303 |\n",
       "| 0.9757058 | 1.0000000 | 0.9932346 | 0.9558303 | 0.8901126 |\n",
       "\n"
      ],
      "text/plain": [
       "     t4        t3        t2        t1        t0       \n",
       "[1,] 0.5004502 0.6243512 0.7405486 0.8418178 0.9218625\n",
       "[2,] 0.6243512 0.7405486 0.8418178 0.9218625 0.9757058\n",
       "[3,] 0.7405486 0.8418178 0.9218625 0.9757058 1.0000000\n",
       "[4,] 0.8418178 0.9218625 0.9757058 1.0000000 0.9932346\n",
       "[5,] 0.9218625 0.9757058 1.0000000 0.9932346 0.9558303\n",
       "[6,] 0.9757058 1.0000000 0.9932346 0.9558303 0.8901126"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "preproc <- ts_norm_gminmax()\n",
    "preproc <- fit(preproc, ts)\n",
    "ts <- transform(preproc, ts)\n",
    "\n",
    "ts_head(ts)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### spliting into training and test"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "samp <- ts_sample(ts, test_size = 10)\n",
    "train <- as.data.frame(samp$train)\n",
    "test <- as.data.frame(samp$test)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### creating autoencoder\n",
    "Reduce from 5 to 3 dimensions"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "ename": "ERROR",
     "evalue": "name 'sample' is not defined",
     "output_type": "error",
     "traceback": [
      "name 'sample' is not definedTraceback:\n",
      "1. fit(auto, train)",
      "2. fit.autoenc_encode_decode(auto, train)",
      "3. autoencoder_fit(obj$model, data, num_epochs = obj$num_epochs, \n .     learning_rate = obj$learning_rate, return_loss = obj$return_loss)",
      "4. py_call_impl(callable, call_args$unnamed, call_args$named)"
     ]
    }
   ],
   "source": [
    "auto <- autoenc_encode_decode(5, 3)\n",
    "\n",
    "auto <- fit(auto, train)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### testing autoencoder\n",
    "presenting the original test set and display encoding"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(head(test))\n",
    "result <- transform(auto, test)\n",
    "print(head(result))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "R",
   "language": "R",
   "name": "ir"
  },
  "language_info": {
   "codemirror_mode": "r",
   "file_extension": ".r",
   "mimetype": "text/x-r-source",
   "name": "R",
   "pygments_lexer": "r",
   "version": "4.4.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}