{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Autoencoder transformation (encode-decode)\n",
"\n",
"Consider a dataset with $p$ numerical attributes.\n",
"\n",
"The goal of the autoencoder is to reduce the dimensionality from $p$ to $k$ (with $k < p$), such that these $k$ attributes are enough to reconstruct the original $p$ attributes. In the encode-decode transformation, the data is encoded into $k$ dimensions and then decoded back to $p$ dimensions. The better the autoencoder, the closer its output is to its input."
]
},
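{
"cell_type": "markdown",
"metadata": {},
"source": [
"The encode-decode idea can be written down as a short sketch. The symbols $f$, $g$, $\\hat{x}$ and $n$ are notation introduced here for illustration, and the mean squared error is only one common choice of reconstruction loss; the loss actually used by the library is not stated in this notebook.\n",
"\n",
"$$\n",
"f: \\mathbb{R}^p \\to \\mathbb{R}^k, \\qquad g: \\mathbb{R}^k \\to \\mathbb{R}^p, \\qquad \\hat{x} = g(f(x))\n",
"$$\n",
"\n",
"$$\n",
"\\mathrm{MSE} = \\frac{1}{n\\,p} \\sum_{i=1}^{n} \\lVert x_i - \\hat{x}_i \\rVert^2\n",
"$$\n",
"\n",
"Here $f$ is the encoder, $g$ is the decoder, and $x_1, \\dots, x_n$ are the rows of the dataset. A lower error means the $k$ encoded attributes retain more of the information in the original $p$ attributes."
]
},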
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Loading required package: daltoolbox\n",
"\n",
"Registered S3 method overwritten by 'quantmod':\n",
" method from\n",
" as.zoo.data.frame zoo \n",
"\n",
"\n",
"Attaching package: ‘daltoolbox’\n",
"\n",
"\n",
"The following object is masked from ‘package:base’:\n",
"\n",
" transform\n",
"\n",
"\n"
]
}
],
"source": [
"# DAL ToolBox\n",
"# version 1.0.777\n",
"\n",
"source(\"https://raw.githubusercontent.com/cefet-rj-dal/daltoolbox/main/jupyter.R\")\n",
"\n",
"#loading DAL\n",
"load_library(\"daltoolbox\")"
]
},
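{
"cell_type": "markdown",
"metadata": {},
"source": [
"### example dataset\n",
"A sine series arranged into sliding windows of 5 observations, from `t4` (oldest) to `t0` (most recent)."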
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
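{
"data": {
"text/html": [
"<table>\n",
"<caption>A matrix: 6 × 5 of type dbl</caption>\n",
"<thead>\n",
"\t<tr><th>t4</th><th>t3</th><th>t2</th><th>t1</th><th>t0</th></tr>\n",
"</thead>\n",
"<tbody>\n",
"\t<tr><td>0.0000000</td><td>0.2474040</td><td>0.4794255</td><td>0.6816388</td><td>0.8414710</td></tr>\n",
"\t<tr><td>0.2474040</td><td>0.4794255</td><td>0.6816388</td><td>0.8414710</td><td>0.9489846</td></tr>\n",
"\t<tr><td>0.4794255</td><td>0.6816388</td><td>0.8414710</td><td>0.9489846</td><td>0.9974950</td></tr>\n",
"\t<tr><td>0.6816388</td><td>0.8414710</td><td>0.9489846</td><td>0.9974950</td><td>0.9839859</td></tr>\n",
"\t<tr><td>0.8414710</td><td>0.9489846</td><td>0.9974950</td><td>0.9839859</td><td>0.9092974</td></tr>\n",
"\t<tr><td>0.9489846</td><td>0.9974950</td><td>0.9839859</td><td>0.9092974</td><td>0.7780732</td></tr>\n",
"</tbody>\n",
"</table>\n"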
],
"text/latex": [
"A matrix: 6 × 5 of type dbl\n",
"\\begin{tabular}{lllll}\n",
" t4 & t3 & t2 & t1 & t0\\\\\n",
"\\hline\n",
"\t 0.0000000 & 0.2474040 & 0.4794255 & 0.6816388 & 0.8414710\\\\\n",
"\t 0.2474040 & 0.4794255 & 0.6816388 & 0.8414710 & 0.9489846\\\\\n",
"\t 0.4794255 & 0.6816388 & 0.8414710 & 0.9489846 & 0.9974950\\\\\n",
"\t 0.6816388 & 0.8414710 & 0.9489846 & 0.9974950 & 0.9839859\\\\\n",
"\t 0.8414710 & 0.9489846 & 0.9974950 & 0.9839859 & 0.9092974\\\\\n",
"\t 0.9489846 & 0.9974950 & 0.9839859 & 0.9092974 & 0.7780732\\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"\n",
"A matrix: 6 × 5 of type dbl\n",
"\n",
"| t4 | t3 | t2 | t1 | t0 |\n",
"|---|---|---|---|---|\n",
"| 0.0000000 | 0.2474040 | 0.4794255 | 0.6816388 | 0.8414710 |\n",
"| 0.2474040 | 0.4794255 | 0.6816388 | 0.8414710 | 0.9489846 |\n",
"| 0.4794255 | 0.6816388 | 0.8414710 | 0.9489846 | 0.9974950 |\n",
"| 0.6816388 | 0.8414710 | 0.9489846 | 0.9974950 | 0.9839859 |\n",
"| 0.8414710 | 0.9489846 | 0.9974950 | 0.9839859 | 0.9092974 |\n",
"| 0.9489846 | 0.9974950 | 0.9839859 | 0.9092974 | 0.7780732 |\n",
"\n"
],
"text/plain": [
" t4 t3 t2 t1 t0 \n",
"[1,] 0.0000000 0.2474040 0.4794255 0.6816388 0.8414710\n",
"[2,] 0.2474040 0.4794255 0.6816388 0.8414710 0.9489846\n",
"[3,] 0.4794255 0.6816388 0.8414710 0.9489846 0.9974950\n",
"[4,] 0.6816388 0.8414710 0.9489846 0.9974950 0.9839859\n",
"[5,] 0.8414710 0.9489846 0.9974950 0.9839859 0.9092974\n",
"[6,] 0.9489846 0.9974950 0.9839859 0.9092974 0.7780732"
]
},
"metadata": {},
"output_type": "display_data"
}
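],
"source": [
"# load the example sine series\n",
"data(sin_data)\n",
"\n",
"# build sliding windows of 5 consecutive observations\n",
"sw_size <- 5\n",
"ts <- ts_data(sin_data$y, sw_size)\n",
"\n",
"# show the first windows\n",
"ts_head(ts)"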
]
},
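{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of how sliding windows like these can be built with base R. It is an illustration only and not necessarily how `ts_data` is implemented; the toy series `y` and the variable `windows` are names assumed for this example.\n",
"\n",
"```r\n",
"# toy sine series sampled every 0.25 (assumed here for illustration)\n",
"y <- sin(seq(0, 2, by = 0.25))\n",
"\n",
"# embed() places the most recent value first, so reverse the columns\n",
"sw_size <- 5\n",
"windows <- embed(y, sw_size)[, sw_size:1]\n",
"colnames(windows) <- paste0('t', (sw_size - 1):0)\n",
"\n",
"print(head(windows, 3))\n",
"```\n",
"\n",
"Each row is one window, and each following row shifts the window forward by one observation."
]
},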
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### applying data normalization"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
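{
"data": {
"text/html": [
"<table>\n",
"<caption>A matrix: 6 × 5 of type dbl</caption>\n",
"<thead>\n",
"\t<tr><th>t4</th><th>t3</th><th>t2</th><th>t1</th><th>t0</th></tr>\n",
"</thead>\n",
"<tbody>\n",
"\t<tr><td>0.5004502</td><td>0.6243512</td><td>0.7405486</td><td>0.8418178</td><td>0.9218625</td></tr>\n",
"\t<tr><td>0.6243512</td><td>0.7405486</td><td>0.8418178</td><td>0.9218625</td><td>0.9757058</td></tr>\n",
"\t<tr><td>0.7405486</td><td>0.8418178</td><td>0.9218625</td><td>0.9757058</td><td>1.0000000</td></tr>\n",
"\t<tr><td>0.8418178</td><td>0.9218625</td><td>0.9757058</td><td>1.0000000</td><td>0.9932346</td></tr>\n",
"\t<tr><td>0.9218625</td><td>0.9757058</td><td>1.0000000</td><td>0.9932346</td><td>0.9558303</td></tr>\n",
"\t<tr><td>0.9757058</td><td>1.0000000</td><td>0.9932346</td><td>0.9558303</td><td>0.8901126</td></tr>\n",
"</tbody>\n",
"</table>\n"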
],
"text/latex": [
"A matrix: 6 × 5 of type dbl\n",
"\\begin{tabular}{lllll}\n",
" t4 & t3 & t2 & t1 & t0\\\\\n",
"\\hline\n",
"\t 0.5004502 & 0.6243512 & 0.7405486 & 0.8418178 & 0.9218625\\\\\n",
"\t 0.6243512 & 0.7405486 & 0.8418178 & 0.9218625 & 0.9757058\\\\\n",
"\t 0.7405486 & 0.8418178 & 0.9218625 & 0.9757058 & 1.0000000\\\\\n",
"\t 0.8418178 & 0.9218625 & 0.9757058 & 1.0000000 & 0.9932346\\\\\n",
"\t 0.9218625 & 0.9757058 & 1.0000000 & 0.9932346 & 0.9558303\\\\\n",
"\t 0.9757058 & 1.0000000 & 0.9932346 & 0.9558303 & 0.8901126\\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"\n",
"A matrix: 6 × 5 of type dbl\n",
"\n",
"| t4 | t3 | t2 | t1 | t0 |\n",
"|---|---|---|---|---|\n",
"| 0.5004502 | 0.6243512 | 0.7405486 | 0.8418178 | 0.9218625 |\n",
"| 0.6243512 | 0.7405486 | 0.8418178 | 0.9218625 | 0.9757058 |\n",
"| 0.7405486 | 0.8418178 | 0.9218625 | 0.9757058 | 1.0000000 |\n",
"| 0.8418178 | 0.9218625 | 0.9757058 | 1.0000000 | 0.9932346 |\n",
"| 0.9218625 | 0.9757058 | 1.0000000 | 0.9932346 | 0.9558303 |\n",
"| 0.9757058 | 1.0000000 | 0.9932346 | 0.9558303 | 0.8901126 |\n",
"\n"
],
"text/plain": [
" t4 t3 t2 t1 t0 \n",
"[1,] 0.5004502 0.6243512 0.7405486 0.8418178 0.9218625\n",
"[2,] 0.6243512 0.7405486 0.8418178 0.9218625 0.9757058\n",
"[3,] 0.7405486 0.8418178 0.9218625 0.9757058 1.0000000\n",
"[4,] 0.8418178 0.9218625 0.9757058 1.0000000 0.9932346\n",
"[5,] 0.9218625 0.9757058 1.0000000 0.9932346 0.9558303\n",
"[6,] 0.9757058 1.0000000 0.9932346 0.9558303 0.8901126"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# global min-max normalization: fit the scaling on the series, then apply it\n",
"preproc <- ts_norm_gminmax()\n",
"preproc <- fit(preproc, ts)\n",
"ts <- transform(preproc, ts)\n",
"\n",
"ts_head(ts)"
]
},
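{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a rough illustration of global min-max scaling, the sketch below rescales a series with a single minimum and maximum taken over all values. The helper `minmax` and the toy series `y` are hypothetical names for this example, and `ts_norm_gminmax` may differ in its details.\n",
"\n",
"```r\n",
"# rescale values to [0, 1] using one global minimum and maximum\n",
"minmax <- function(x, lo = min(x), hi = max(x)) {\n",
"  (x - lo) / (hi - lo)\n",
"}\n",
"\n",
"# toy sine series covering roughly one full period\n",
"y <- sin(seq(0, 2 * pi, by = 0.25))\n",
"y_norm <- minmax(y)\n",
"\n",
"# the smallest value maps to 0 and the largest to 1\n",
"print(range(y_norm))\n",
"```\n",
"\n",
"Because one scale is shared by all columns, the shape of each window is preserved after normalization."
]
},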
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### splitting into training and test"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"samp <- ts_sample(ts, test_size = 10)\n",
"train <- as.data.frame(samp$train)\n",
"test <- as.data.frame(samp$test)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### creating autoencoder\n",
"Reduce from 5 to 3 dimensions"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"ename": "ERROR",
"evalue": "name 'sample' is not defined",
"output_type": "error",
"traceback": [
"name 'sample' is not definedTraceback:\n",
"1. fit(auto, train)",
"2. fit.autoenc_encode_decode(auto, train)",
"3. autoencoder_fit(obj$model, data, num_epochs = obj$num_epochs, \n . learning_rate = obj$learning_rate, return_loss = obj$return_loss)",
"4. py_call_impl(callable, call_args$unnamed, call_args$named)"
]
}
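],
"source": [
"# autoencoder with 5 input attributes and a 3-dimensional encoding\n",
"auto <- autoenc_encode_decode(5, 3)\n",
"\n",
"# train the autoencoder on the training windows\n",
"auto <- fit(auto, train)"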
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### testing autoencoder\n",
"presenting the original test set and display encoding"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(head(test))\n",
"result <- transform(auto, test)\n",
"print(head(result))"
]
},
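{
"cell_type": "markdown",
"metadata": {},
"source": [
"One way to quantify how close a reconstruction is to its input is the mean squared error. The sketch below uses base R only; `reconstruction_mse` and the toy matrices are hypothetical names for this illustration and are not part of daltoolbox.\n",
"\n",
"```r\n",
"# mean squared error between original rows and their reconstruction\n",
"reconstruction_mse <- function(original, reconstructed) {\n",
"  original <- as.matrix(original)\n",
"  reconstructed <- as.matrix(reconstructed)\n",
"  stopifnot(all(dim(original) == dim(reconstructed)))\n",
"  mean((original - reconstructed)^2)\n",
"}\n",
"\n",
"# toy data standing in for the test set and the autoencoder output\n",
"original <- matrix(c(0.1, 0.2, 0.3, 0.4), nrow = 2)\n",
"perfect <- original\n",
"noisy <- original + 0.1\n",
"\n",
"print(reconstruction_mse(original, perfect))\n",
"print(reconstruction_mse(original, noisy))\n",
"```\n",
"\n",
"With a fitted autoencoder, the same helper could be called as `reconstruction_mse(test, result)`, assuming `result` has the same shape as `test`."
]
},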
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "R",
"language": "R",
"name": "ir"
},
"language_info": {
"codemirror_mode": "r",
"file_extension": ".r",
"mimetype": "text/x-r-source",
"name": "R",
"pygments_lexer": "r",
"version": "4.4.1"
}
},
"nbformat": 4,
"nbformat_minor": 4
}