{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## Autoencoder transformation (encode-decode)\n", "\n", "Consider a dataset with $p$ numerical attributes.\n", "\n", "The goal of the autoencoder is to reduce the dimension from $p$ to $k$, such that these $k$ attributes are enough to reconstruct the original $p$ attributes. In the encode-decode variant, the data is mapped from the $k$-dimensional encoding back to $p$ dimensions, so the output has the same shape as the input. The higher the quality of the autoencoder, the more similar the output is to the input." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Loading required package: daltoolbox\n", "\n", "Registered S3 method overwritten by 'quantmod':\n", " method from\n", " as.zoo.data.frame zoo \n", "\n", "\n", "Attaching package: ‘daltoolbox’\n", "\n", "\n", "The following object is masked from ‘package:base’:\n", "\n", " transform\n", "\n", "\n" ] } ], "source": [ "# DAL ToolBox\n", "# version 1.0.777\n", "\n", "source(\"https://raw.githubusercontent.com/cefet-rj-dal/daltoolbox/main/jupyter.R\")\n", "\n", "# loading DAL\n", "load_library(\"daltoolbox\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### example dataset" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\n", "
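<table>
<caption>A matrix: 6 × 5 of type dbl</caption>
<thead><tr><th>t4</th><th>t3</th><th>t2</th><th>t1</th><th>t0</th></tr></thead>
<tbody>
<tr><td>0.0000000</td><td>0.2474040</td><td>0.4794255</td><td>0.6816388</td><td>0.8414710</td></tr>
<tr><td>0.2474040</td><td>0.4794255</td><td>0.6816388</td><td>0.8414710</td><td>0.9489846</td></tr>
<tr><td>0.4794255</td><td>0.6816388</td><td>0.8414710</td><td>0.9489846</td><td>0.9974950</td></tr>
<tr><td>0.6816388</td><td>0.8414710</td><td>0.9489846</td><td>0.9974950</td><td>0.9839859</td></tr>
<tr><td>0.8414710</td><td>0.9489846</td><td>0.9974950</td><td>0.9839859</td><td>0.9092974</td></tr>
<tr><td>0.9489846</td><td>0.9974950</td><td>0.9839859</td><td>0.9092974</td><td>0.7780732</td></tr>
</tbody>
</table>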
\n" ], "text/latex": [ "A matrix: 6 × 5 of type dbl\n", "\\begin{tabular}{lllll}\n", " t4 & t3 & t2 & t1 & t0\\\\\n", "\\hline\n", "\t 0.0000000 & 0.2474040 & 0.4794255 & 0.6816388 & 0.8414710\\\\\n", "\t 0.2474040 & 0.4794255 & 0.6816388 & 0.8414710 & 0.9489846\\\\\n", "\t 0.4794255 & 0.6816388 & 0.8414710 & 0.9489846 & 0.9974950\\\\\n", "\t 0.6816388 & 0.8414710 & 0.9489846 & 0.9974950 & 0.9839859\\\\\n", "\t 0.8414710 & 0.9489846 & 0.9974950 & 0.9839859 & 0.9092974\\\\\n", "\t 0.9489846 & 0.9974950 & 0.9839859 & 0.9092974 & 0.7780732\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "A matrix: 6 × 5 of type dbl\n", "\n", "| t4 | t3 | t2 | t1 | t0 |\n", "|---|---|---|---|---|\n", "| 0.0000000 | 0.2474040 | 0.4794255 | 0.6816388 | 0.8414710 |\n", "| 0.2474040 | 0.4794255 | 0.6816388 | 0.8414710 | 0.9489846 |\n", "| 0.4794255 | 0.6816388 | 0.8414710 | 0.9489846 | 0.9974950 |\n", "| 0.6816388 | 0.8414710 | 0.9489846 | 0.9974950 | 0.9839859 |\n", "| 0.8414710 | 0.9489846 | 0.9974950 | 0.9839859 | 0.9092974 |\n", "| 0.9489846 | 0.9974950 | 0.9839859 | 0.9092974 | 0.7780732 |\n", "\n" ], "text/plain": [ " t4 t3 t2 t1 t0 \n", "[1,] 0.0000000 0.2474040 0.4794255 0.6816388 0.8414710\n", "[2,] 0.2474040 0.4794255 0.6816388 0.8414710 0.9489846\n", "[3,] 0.4794255 0.6816388 0.8414710 0.9489846 0.9974950\n", "[4,] 0.6816388 0.8414710 0.9489846 0.9974950 0.9839859\n", "[5,] 0.8414710 0.9489846 0.9974950 0.9839859 0.9092974\n", "[6,] 0.9489846 0.9974950 0.9839859 0.9092974 0.7780732" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "data(sin_data)\n", "\n", "sw_size <- 5\n", "ts <- ts_data(sin_data$y, sw_size)\n", "\n", "ts_head(ts)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### applying data normalization" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\n", "
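<table>
<caption>A matrix: 6 × 5 of type dbl</caption>
<thead><tr><th>t4</th><th>t3</th><th>t2</th><th>t1</th><th>t0</th></tr></thead>
<tbody>
<tr><td>0.5004502</td><td>0.6243512</td><td>0.7405486</td><td>0.8418178</td><td>0.9218625</td></tr>
<tr><td>0.6243512</td><td>0.7405486</td><td>0.8418178</td><td>0.9218625</td><td>0.9757058</td></tr>
<tr><td>0.7405486</td><td>0.8418178</td><td>0.9218625</td><td>0.9757058</td><td>1.0000000</td></tr>
<tr><td>0.8418178</td><td>0.9218625</td><td>0.9757058</td><td>1.0000000</td><td>0.9932346</td></tr>
<tr><td>0.9218625</td><td>0.9757058</td><td>1.0000000</td><td>0.9932346</td><td>0.9558303</td></tr>
<tr><td>0.9757058</td><td>1.0000000</td><td>0.9932346</td><td>0.9558303</td><td>0.8901126</td></tr>
</tbody>
</table>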
\n" ], "text/latex": [ "A matrix: 6 × 5 of type dbl\n", "\\begin{tabular}{lllll}\n", " t4 & t3 & t2 & t1 & t0\\\\\n", "\\hline\n", "\t 0.5004502 & 0.6243512 & 0.7405486 & 0.8418178 & 0.9218625\\\\\n", "\t 0.6243512 & 0.7405486 & 0.8418178 & 0.9218625 & 0.9757058\\\\\n", "\t 0.7405486 & 0.8418178 & 0.9218625 & 0.9757058 & 1.0000000\\\\\n", "\t 0.8418178 & 0.9218625 & 0.9757058 & 1.0000000 & 0.9932346\\\\\n", "\t 0.9218625 & 0.9757058 & 1.0000000 & 0.9932346 & 0.9558303\\\\\n", "\t 0.9757058 & 1.0000000 & 0.9932346 & 0.9558303 & 0.8901126\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "A matrix: 6 × 5 of type dbl\n", "\n", "| t4 | t3 | t2 | t1 | t0 |\n", "|---|---|---|---|---|\n", "| 0.5004502 | 0.6243512 | 0.7405486 | 0.8418178 | 0.9218625 |\n", "| 0.6243512 | 0.7405486 | 0.8418178 | 0.9218625 | 0.9757058 |\n", "| 0.7405486 | 0.8418178 | 0.9218625 | 0.9757058 | 1.0000000 |\n", "| 0.8418178 | 0.9218625 | 0.9757058 | 1.0000000 | 0.9932346 |\n", "| 0.9218625 | 0.9757058 | 1.0000000 | 0.9932346 | 0.9558303 |\n", "| 0.9757058 | 1.0000000 | 0.9932346 | 0.9558303 | 0.8901126 |\n", "\n" ], "text/plain": [ " t4 t3 t2 t1 t0 \n", "[1,] 0.5004502 0.6243512 0.7405486 0.8418178 0.9218625\n", "[2,] 0.6243512 0.7405486 0.8418178 0.9218625 0.9757058\n", "[3,] 0.7405486 0.8418178 0.9218625 0.9757058 1.0000000\n", "[4,] 0.8418178 0.9218625 0.9757058 1.0000000 0.9932346\n", "[5,] 0.9218625 0.9757058 1.0000000 0.9932346 0.9558303\n", "[6,] 0.9757058 1.0000000 0.9932346 0.9558303 0.8901126" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "preproc <- ts_norm_gminmax()\n", "preproc <- fit(preproc, ts)\n", "ts <- transform(preproc, ts)\n", "\n", "ts_head(ts)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### splitting into training and test" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "samp <- ts_sample(ts, test_size = 10)\n", "train <- as.data.frame(samp$train)\n", "test <- as.data.frame(samp$test)" 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### creating autoencoder\n", "The encoder reduces the 5 input attributes to 3 dimensions; the decoder maps them back to 5." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "ename": "ERROR", "evalue": "Error in fit.autoenc_encode_decode(auto, train): object 'return_loss' not found\n", "output_type": "error", "traceback": [ "Error in fit.autoenc_encode_decode(auto, train): object 'return_loss' not found\nTraceback:\n", "1. fit(auto, train)", "2. fit.autoenc_encode_decode(auto, train)" ] } ], "source": [ "auto <- autoenc_encode_decode(5, 3)\n", "\n", "auto <- fit(auto, train)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### testing autoencoder\n", "Presenting the original test set and displaying its reconstruction after encoding and decoding." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(head(test))\n", "result <- transform(auto, test)\n", "print(head(result))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "R", "language": "R", "name": "ir" }, "language_info": { "codemirror_mode": "r", "file_extension": ".r", "mimetype": "text/x-r-source", "name": "R", "pygments_lexer": "r", "version": "4.4.1" } }, "nbformat": 4, "nbformat_minor": 4 }