{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Autoencoder transformation (encode-decode)\n",
"\n",
"Considering a dataset with $p$ numerical attributes. \n",
"\n",
"The goal of the autoencoder is to reduce the dimension of $p$ to $k$, such that these $k$ attributes are enough to recompose the original $p$ attributes. However from the $k$ dimensionals the data is returned back to $p$ dimensions. The higher the quality of autoencoder the similiar is the output from the input. "
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Loading required package: daltoolbox\n",
"\n",
"Registered S3 method overwritten by 'quantmod':\n",
" method from\n",
" as.zoo.data.frame zoo \n",
"\n",
"\n",
"Attaching package: ‘daltoolbox’\n",
"\n",
"\n",
"The following object is masked from ‘package:base’:\n",
"\n",
" transform\n",
"\n",
"\n"
]
}
],
"source": [
"# DAL ToolBox\n",
"# version 1.0.777\n",
"\n",
"source(\"https://raw.githubusercontent.com/cefet-rj-dal/daltoolbox/main/jupyter.R\")\n",
"\n",
"#loading DAL\n",
"load_library(\"daltoolbox\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### dataset for example "
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"A matrix: 6 × 5 of type dbl\n",
"\n",
"\tt4 | t3 | t2 | t1 | t0 |
\n",
"\n",
"\n",
"\t0.0000000 | 0.2474040 | 0.4794255 | 0.6816388 | 0.8414710 |
\n",
"\t0.2474040 | 0.4794255 | 0.6816388 | 0.8414710 | 0.9489846 |
\n",
"\t0.4794255 | 0.6816388 | 0.8414710 | 0.9489846 | 0.9974950 |
\n",
"\t0.6816388 | 0.8414710 | 0.9489846 | 0.9974950 | 0.9839859 |
\n",
"\t0.8414710 | 0.9489846 | 0.9974950 | 0.9839859 | 0.9092974 |
\n",
"\t0.9489846 | 0.9974950 | 0.9839859 | 0.9092974 | 0.7780732 |
\n",
"\n",
"
\n"
],
"text/latex": [
"A matrix: 6 × 5 of type dbl\n",
"\\begin{tabular}{lllll}\n",
" t4 & t3 & t2 & t1 & t0\\\\\n",
"\\hline\n",
"\t 0.0000000 & 0.2474040 & 0.4794255 & 0.6816388 & 0.8414710\\\\\n",
"\t 0.2474040 & 0.4794255 & 0.6816388 & 0.8414710 & 0.9489846\\\\\n",
"\t 0.4794255 & 0.6816388 & 0.8414710 & 0.9489846 & 0.9974950\\\\\n",
"\t 0.6816388 & 0.8414710 & 0.9489846 & 0.9974950 & 0.9839859\\\\\n",
"\t 0.8414710 & 0.9489846 & 0.9974950 & 0.9839859 & 0.9092974\\\\\n",
"\t 0.9489846 & 0.9974950 & 0.9839859 & 0.9092974 & 0.7780732\\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"\n",
"A matrix: 6 × 5 of type dbl\n",
"\n",
"| t4 | t3 | t2 | t1 | t0 |\n",
"|---|---|---|---|---|\n",
"| 0.0000000 | 0.2474040 | 0.4794255 | 0.6816388 | 0.8414710 |\n",
"| 0.2474040 | 0.4794255 | 0.6816388 | 0.8414710 | 0.9489846 |\n",
"| 0.4794255 | 0.6816388 | 0.8414710 | 0.9489846 | 0.9974950 |\n",
"| 0.6816388 | 0.8414710 | 0.9489846 | 0.9974950 | 0.9839859 |\n",
"| 0.8414710 | 0.9489846 | 0.9974950 | 0.9839859 | 0.9092974 |\n",
"| 0.9489846 | 0.9974950 | 0.9839859 | 0.9092974 | 0.7780732 |\n",
"\n"
],
"text/plain": [
" t4 t3 t2 t1 t0 \n",
"[1,] 0.0000000 0.2474040 0.4794255 0.6816388 0.8414710\n",
"[2,] 0.2474040 0.4794255 0.6816388 0.8414710 0.9489846\n",
"[3,] 0.4794255 0.6816388 0.8414710 0.9489846 0.9974950\n",
"[4,] 0.6816388 0.8414710 0.9489846 0.9974950 0.9839859\n",
"[5,] 0.8414710 0.9489846 0.9974950 0.9839859 0.9092974\n",
"[6,] 0.9489846 0.9974950 0.9839859 0.9092974 0.7780732"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"data(sin_data)\n",
"\n",
"sw_size <- 5\n",
"ts <- ts_data(sin_data$y, sw_size)\n",
"\n",
"ts_head(ts)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### applying data normalization"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"A matrix: 6 × 5 of type dbl\n",
"\n",
"\tt4 | t3 | t2 | t1 | t0 |
\n",
"\n",
"\n",
"\t0.5004502 | 0.6243512 | 0.7405486 | 0.8418178 | 0.9218625 |
\n",
"\t0.6243512 | 0.7405486 | 0.8418178 | 0.9218625 | 0.9757058 |
\n",
"\t0.7405486 | 0.8418178 | 0.9218625 | 0.9757058 | 1.0000000 |
\n",
"\t0.8418178 | 0.9218625 | 0.9757058 | 1.0000000 | 0.9932346 |
\n",
"\t0.9218625 | 0.9757058 | 1.0000000 | 0.9932346 | 0.9558303 |
\n",
"\t0.9757058 | 1.0000000 | 0.9932346 | 0.9558303 | 0.8901126 |
\n",
"\n",
"
\n"
],
"text/latex": [
"A matrix: 6 × 5 of type dbl\n",
"\\begin{tabular}{lllll}\n",
" t4 & t3 & t2 & t1 & t0\\\\\n",
"\\hline\n",
"\t 0.5004502 & 0.6243512 & 0.7405486 & 0.8418178 & 0.9218625\\\\\n",
"\t 0.6243512 & 0.7405486 & 0.8418178 & 0.9218625 & 0.9757058\\\\\n",
"\t 0.7405486 & 0.8418178 & 0.9218625 & 0.9757058 & 1.0000000\\\\\n",
"\t 0.8418178 & 0.9218625 & 0.9757058 & 1.0000000 & 0.9932346\\\\\n",
"\t 0.9218625 & 0.9757058 & 1.0000000 & 0.9932346 & 0.9558303\\\\\n",
"\t 0.9757058 & 1.0000000 & 0.9932346 & 0.9558303 & 0.8901126\\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"\n",
"A matrix: 6 × 5 of type dbl\n",
"\n",
"| t4 | t3 | t2 | t1 | t0 |\n",
"|---|---|---|---|---|\n",
"| 0.5004502 | 0.6243512 | 0.7405486 | 0.8418178 | 0.9218625 |\n",
"| 0.6243512 | 0.7405486 | 0.8418178 | 0.9218625 | 0.9757058 |\n",
"| 0.7405486 | 0.8418178 | 0.9218625 | 0.9757058 | 1.0000000 |\n",
"| 0.8418178 | 0.9218625 | 0.9757058 | 1.0000000 | 0.9932346 |\n",
"| 0.9218625 | 0.9757058 | 1.0000000 | 0.9932346 | 0.9558303 |\n",
"| 0.9757058 | 1.0000000 | 0.9932346 | 0.9558303 | 0.8901126 |\n",
"\n"
],
"text/plain": [
" t4 t3 t2 t1 t0 \n",
"[1,] 0.5004502 0.6243512 0.7405486 0.8418178 0.9218625\n",
"[2,] 0.6243512 0.7405486 0.8418178 0.9218625 0.9757058\n",
"[3,] 0.7405486 0.8418178 0.9218625 0.9757058 1.0000000\n",
"[4,] 0.8418178 0.9218625 0.9757058 1.0000000 0.9932346\n",
"[5,] 0.9218625 0.9757058 1.0000000 0.9932346 0.9558303\n",
"[6,] 0.9757058 1.0000000 0.9932346 0.9558303 0.8901126"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"preproc <- ts_norm_gminmax()\n",
"preproc <- fit(preproc, ts)\n",
"ts <- transform(preproc, ts)\n",
"\n",
"ts_head(ts)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### spliting into training and test"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"samp <- ts_sample(ts, test_size = 10)\n",
"train <- as.data.frame(samp$train)\n",
"test <- as.data.frame(samp$test)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### creating autoencoder\n",
"Reduce from 5 to 3 dimensions"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"auto <- autoenc_encode_decode(5, 3)\n",
"\n",
"auto <- fit(auto, train)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### testing autoencoder\n",
"presenting the original test set and display encoding"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" t4 t3 t2 t1 t0\n",
"1 0.7258342 0.8294719 0.9126527 0.9702046 0.9985496\n",
"2 0.8294719 0.9126527 0.9702046 0.9985496 0.9959251\n",
"3 0.9126527 0.9702046 0.9985496 0.9959251 0.9624944\n",
"4 0.9702046 0.9985496 0.9959251 0.9624944 0.9003360\n",
"5 0.9985496 0.9959251 0.9624944 0.9003360 0.8133146\n",
"6 0.9959251 0.9624944 0.9003360 0.8133146 0.7068409\n",
" [,1] [,2] [,3] [,4] [,5]\n",
"[1,] 0.7281464 0.8297843 0.9119417 0.9712389 0.9987636\n",
"[2,] 0.8298926 0.9120755 0.9692511 0.9992059 0.9964882\n",
"[3,] 0.9135329 0.9704542 0.9987994 0.9969148 0.9618803\n",
"[4,] 0.9685430 0.9973767 0.9959966 0.9621131 0.9001518\n",
"[5,] 0.9985738 0.9968489 0.9646484 0.8984091 0.8126789\n",
"[6,] 0.9928818 0.9628512 0.9030885 0.8135285 0.7084026\n"
]
}
],
"source": [
"print(head(test))\n",
"result <- transform(auto, test)\n",
"print(head(result))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "R",
"language": "R",
"name": "ir"
},
"language_info": {
"codemirror_mode": "r",
"file_extension": ".r",
"mimetype": "text/x-r-source",
"name": "R",
"pygments_lexer": "r",
"version": "4.3.3"
}
},
"nbformat": 4,
"nbformat_minor": 4
}