{ "cells": [ { "cell_type": "markdown", "id": "95a0e06a-017a-45b3-874f-f8eff0b9b583", "metadata": { "canvas": { "comments": [], "componentType": "CodeCell", "copiedOriginId": "157f5cc9-862c-4b03-b450-e2e7c2707153", "diskcache": false, "headerColor": "inherit", "id": "a9f73016-2561-4d67-bd27-be79ccd1f25f", "isComponent": false, "name": "", "parents": [] }, "tags": [] }, "source": [ "# **Welcome to Link!**\n", " \n", "\n", "\n", "## **How Link Helps Data Scientists**\n", " \n", "Link is a simple AI/ML modeling tool developed as an extension of JupyterLab. Although JupyterLab is the most popular development environment for data scientists, it is challenging to generate reproducible and collaborative results with JupyterLab. Link not only resolves this inconvenience but also ensures a smoother flow and a better experience throughout the model development cycle with\n", "\n", "- [Pipeline creation](https://makinarocks.gitbook.io/link/features/create-and-run-pipeline)\n", "- [Caching management](https://makinarocks.gitbook.io/link/features/caching-management)\n", "- [Remote Resources](https://makinarocks.gitbook.io/link/features/create-and-run-pipeline/remote-resources)\n", "- [Hyper-Parameter optimizer](https://makinarocks.gitbook.io/link/features/pipeline-parameter/hyperparameter-optimizer)\n", "- [Version control](https://makinarocks.gitbook.io/link/features/version-control)\n", "- [Collaboration](https://makinarocks.gitbook.io/link/features/export-import)" ] }, { "cell_type": "markdown", "id": "782e5c2e-741c-4649-84d7-c8cc118aaada", "metadata": { "canvas": { "comments": [], "componentType": "CodeCell", "copiedOriginId": null, "diskcache": false, "headerColor": "inherit", "id": "cdccaca5-4248-4e59-b742-d427875dba56", "isComponent": false, "name": "", "parents": [] } }, "source": [ "## **Getting Started**\n", "\n", "Installing Link in a new virtual environment is recommended.\n", "\n", "- Step 1. 
Install Link with the `python3 -m pip install mrx-link` command.\n", "- Step 2. After installation, start Link by launching JupyterLab with `python3 -m jupyterlab`.\n", "\n", "If you wish to install it as a Desktop Application, please download it from [here](https://link.makinarocks.ai/download.html)." ] }, { "cell_type": "markdown", "id": "289419ba-788d-4055-bcb1-800cbda96749", "metadata": { "canvas": { "comments": [], "componentType": "CodeCell", "copiedOriginId": null, "diskcache": false, "headerColor": "inherit", "id": "1422f856-73e2-48a9-ac1d-79d5e08ce930", "isComponent": false, "name": "", "parents": [] }, "jp-MarkdownHeadingCollapsed": true, "tags": [] }, "source": [ "## **Help and Support**\n", "\n", "- Official Documentation: [https://makinarocks.gitbook.io/link/](https://makinarocks.gitbook.io/link/)\n", "- Release Notes: [https://makinarocks.gitbook.io/link/version-history/release-notes](https://makinarocks.gitbook.io/link/version-history/release-notes)\n", "- Technical Support: Contact [Technical Support](https://link.makinarocks.ai/technical_support.html) for any questions or issues" ] }, { "cell_type": "markdown", "id": "c11acc63-d219-4b1c-9b0a-9871b2e9d94d", "metadata": { "canvas": { "comments": [ { "message": "max epoch value is 10. You can change this value.\n", "writer": "John Doe" } ], "componentType": "CodeCell", "copiedOriginId": null, "diskcache": false, "headerColor": "inherit", "id": "f664fbee-ada4-4c8b-bd10-0c6c71e6a363", "isComponent": false, "name": "", "parents": [] }, "tags": [] }, "source": [ "## **Examples with Link Pipelines**\n", "\n", "
\n", " \n", "In the file browser on the left, you can find the other example notebooks included in this demo.\n", "\n", "- [Deep Q-Network Reinforcement Learning for the CartPole Environment](sample-notebooks/Deep_Q-Network_Reinforcement_Learning.ipynb)\n", "- [Denoising Autoencoder with PyTorch](sample-notebooks/Denoising_Autoencoder.ipynb)\n", "- [Iris Classification](sample-notebooks/Iris_EDA_and_Modeling.ipynb)\n", "- [MNIST Image Classification with PyTorch](sample-notebooks/MNIST_Classification.ipynb)\n", "- [MNIST Image Generation with VAE](sample-notebooks/MNIST_Generation_with_VAE.ipynb)\n", "- [Spiral-Distributed Data Classification](sample-notebooks/Spiral_Classification.ipynb)\n", "- [Sequential Data Classification](sample-notebooks/Text_Classification_with_RNN.ipynb)\n", "- [Titanic Data EDA and Modeling](sample-notebooks/Titanic_EDA_and_Modeling.ipynb)\n", "- [XGBoost Example](sample-notebooks/XGBoost.ipynb)\n", "\n", "
\n", " \n", "Below is an example of a classification model that predicts the species of an iris flower from its sepal and petal measurements. If you click the Link tab on the left, you can see that every step of the workflow is laid out as a pipeline in Link. \n", " \n", "
\n", " \n", "
" ] }, { "cell_type": "markdown", "id": "d8bfaa17-dd64-470b-9a81-72d7cd5d4285", "metadata": { "canvas": { "comments": [], "componentType": "CodeCell", "copiedOriginId": "faf50dcf-5099-418e-91cf-4258ead57c7f", "diskcache": false, "headerColor": "none", "id": "2f3f1f8c-94d8-46a7-a3fe-ca3e563be4f2", "isComponent": false, "name": "", "parents": [] } }, "source": [ "### Required Python Packages\n", "- `numpy`\n", "- `pandas`\n", "- `scikit-learn`\n", "- `torch`\n", "- `pytorch-lightning`\n", "- `livelossplot`\n", "\n", "Run the following cell to install the packages. " ] }, { "cell_type": "code", "execution_count": null, "id": "4edd723a-33ce-4db5-855e-ee4743890553", "metadata": { "canvas": { "comments": [], "componentType": "CodeCell", "copiedOriginId": "a723a6b4-76fc-4188-92ba-3a0ca990a013", "diskcache": false, "headerColor": "none", "id": "ff50feab-ef5f-4a53-8912-5b6b57e65331", "isComponent": false, "name": "", "parents": [] } }, "outputs": [], "source": [ "#\n", "# Required Packages\n", "# Run this cell to install required packages.\n", "#\n", "%pip install livelossplot \"numpy>=1.19\" \"pandas>=1.1\" \"pytorch_lightning>=1.6,<1.9\" \"scikit-learn>=0.22.2\" \"torch>=1.9\" \"torchmetrics>=0.11\"" ] }, { "cell_type": "markdown", "id": "496466c9-747f-4165-b71a-08e9ae1af3bf", "metadata": { "canvas": { "comments": [], "componentType": "CodeCell", "copiedOriginId": null, "diskcache": false, "headerColor": "none", "id": "09eed00a-8990-4665-92c5-1d9654b0723d", "isComponent": false, "name": "", "parents": [] }, "tags": [] }, "source": ["### **Load iris data**"] }, { "cell_type": "code", "execution_count": null, "id": "4869d963-6165-44ad-8803-2644b350857d", "metadata": { "canvas": { "comments": [ { "message": "Load iris data with sklearn", "writer": "John Doe" } ], "componentType": "CodeCell", "copiedOriginId": null, "diskcache": true, "headerColor": "#00E3FF", "id": "eef2dc4b-fb13-419b-a5af-f0031467c624", "isComponent": true, "name": "Load iris data", "parents": [] }, "tags": 
[] }, "outputs": [], "source": [ "from sklearn.datasets import load_iris\n", "\n", "data = load_iris()" ] }, { "cell_type": "markdown", "id": "6745d3f2-199c-4402-8ceb-382f750cca1c", "metadata": { "canvas": { "comments": [], "componentType": "CodeCell", "copiedOriginId": null, "diskcache": false, "headerColor": "none", "id": "399d38a8-ddfb-40ea-a5fa-29d669a1a3c2", "isComponent": false, "name": "", "parents": [] }, "tags": [] }, "source": ["### **Create dataframe**"] }, { "cell_type": "code", "execution_count": null, "id": "dbaf0c0b-593f-4293-a9c8-409887b130c5", "metadata": { "canvas": { "comments": [], "componentType": "CodeCell", "copiedOriginId": null, "diskcache": true, "headerColor": "#00E3FF", "id": "cc60d6b2-ac7b-4963-860e-15753f04026f", "isComponent": true, "name": "Create dataframe", "parents": [ { "id": "eef2dc4b-fb13-419b-a5af-f0031467c624", "name": "Load iris data" } ] }, "tags": [] }, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd\n", "\n", "df = pd.DataFrame(data[\"data\"], columns=data[\"feature_names\"])\n", "df[\"target\"] = data[\"target\"]" ] }, { "cell_type": "markdown", "id": "3a6ebe43-883d-4d72-b5f2-792ee6973294", "metadata": { "canvas": { "comments": [], "componentType": "CodeCell", "copiedOriginId": null, "diskcache": false, "headerColor": "none", "id": "0264c406-d9dd-43c0-959b-4cb45a4a342e", "isComponent": false, "name": "", "parents": [] } }, "source": ["### **Train/valid split**"] }, { "cell_type": "code", "execution_count": null, "id": "a9f18f8c-8510-490e-807f-c7c1fc3e00e0", "metadata": { "canvas": { "comments": [], "componentType": "CodeCell", "copiedOriginId": null, "diskcache": true, "headerColor": "#00DE62", "id": "99e0b9ab-ef73-4805-8628-8ef0e9811e5e", "isComponent": true, "name": "Train valid split", "parents": [ { "id": "cc60d6b2-ac7b-4963-860e-15753f04026f", "name": "Create dataframe" } ] }, "tags": [] }, "outputs": [], "source": [ "import numpy as np\n", "\n", "train_ratio = 0.7\n", "train_len = 
int(train_ratio * len(df))\n", "indices = np.random.permutation(df.index)\n", "train_indices = indices[:train_len]\n", "valid_indices = indices[train_len:]" ] }, { "cell_type": "markdown", "id": "eb7e8f3a-fd0c-465a-84d3-778d84014026", "metadata": { "canvas": { "comments": [], "componentType": "CodeCell", "copiedOriginId": null, "diskcache": false, "headerColor": "none", "id": "d7f758b7-2de2-4add-9baa-0aa98e1f67e7", "isComponent": false, "name": "", "parents": [] } }, "source": ["### **Create dataloaders**"] }, { "cell_type": "code", "execution_count": null, "id": "43f21883-ce63-46c0-9c40-100b8fb48f60", "metadata": { "canvas": { "comments": [], "componentType": "CodeCell", "copiedOriginId": null, "diskcache": true, "headerColor": "#00DE62", "id": "90b151b8-5c3d-48a2-b145-cf20d413d411", "isComponent": true, "name": "Create dataloaders", "parents": [ { "id": "99e0b9ab-ef73-4805-8628-8ef0e9811e5e", "name": "Train valid split" } ] }, "tags": [] }, "outputs": [], "source": [ "import torch\n", "from torch.utils.data import DataLoader, TensorDataset, Subset\n", "\n", "dataset = TensorDataset(torch.from_numpy(df.loc[:, data[\"feature_names\"]].values).float(), torch.from_numpy(df.loc[:, \"target\"].values).long())\n", "train_loader = DataLoader(Subset(dataset, train_indices), batch_size=20)\n", "valid_loader = DataLoader(Subset(dataset, valid_indices), batch_size=len(valid_indices))" ] }, { "cell_type": "markdown", "id": "c0658be1-9d6e-44bc-9582-05e79e13a996", "metadata": { "canvas": { "comments": [], "componentType": "CodeCell", "copiedOriginId": null, "diskcache": false, "headerColor": "none", "id": "ce720e08-919f-434c-957f-4dd4a34c7a0f", "isComponent": false, "name": "", "parents": [] } }, "source": ["### **Define pl model**"] }, { "cell_type": "code", "execution_count": null, "id": "962dc241-3ab6-40ea-b9ac-11319f17de2e", "metadata": { "canvas": { "comments": [], "componentType": "CodeCell", "copiedOriginId": null, "diskcache": false, "headerColor": "#FF004F", "id": 
"c68dd5c0-4948-447f-83a5-51fd8856d231", "isComponent": true, "name": "Define pl model", "parents": [] }, "tags": [] }, "outputs": [], "source": [ "from typing import Any, Dict, Union\n", "import torch\n", "import torchmetrics\n", "import pytorch_lightning as pl\n", "\n", "class Classifier(pl.LightningModule):\n", "\n", "    def __init__(self):\n", "        super().__init__()\n", "\n", "        self._metrics: Dict[str, Dict[str, Any]] = {\n", "            \"train\": {\n", "                \"acc\": torchmetrics.Accuracy(task=\"multiclass\", num_classes=3),\n", "            },\n", "            \"valid\": {\n", "                \"acc\": torchmetrics.Accuracy(task=\"multiclass\", num_classes=3),\n", "            },\n", "        }\n", "        # The network outputs raw logits: CrossEntropyLoss applies log-softmax itself,\n", "        # so no Softmax layer is added at the end.\n", "        seq = [\n", "            torch.nn.Linear(4, 5),\n", "            torch.nn.ReLU(),\n", "            torch.nn.Linear(5, 8),\n", "            torch.nn.ReLU(),\n", "            torch.nn.Linear(8, 5),\n", "            torch.nn.ReLU(),\n", "            torch.nn.Linear(5, 3),\n", "        ]\n", "        self.fcn: torch.nn.Sequential = torch.nn.Sequential(*seq)\n", "\n", "    def forward(self, x: torch.Tensor) -> Any:\n", "        \"\"\"\n", "        Compute class logits for a batch of iris features.\n", "\n", "        Parameters\n", "        ----------\n", "        x: torch.Tensor\n", "            Batch of feature vectors with 4 values each.\n", "\n", "        Returns\n", "        -------\n", "        Any\n", "            Unnormalized class scores (logits) for the 3 classes.\n", "        \"\"\"\n", "        return self.fcn(x)\n", "\n", "    def training_step(self, train_batch: torch.Tensor, batch_idx: int) -> Union[torch.Tensor, Dict[str, Any]]:\n", "        batch_x, batch_y = train_batch\n", "        batch_y_hat: torch.Tensor = self(batch_x)\n", "        loss = torch.nn.CrossEntropyLoss()(batch_y_hat, batch_y)\n", "        self.log(\"train_loss\", loss, on_epoch=True)\n", "\n", "        train_metrics: Dict[str, Any] = self._metrics[\"train\"]\n", "        for name, metric in train_metrics.items():\n", "            self.log(\"Train: {0}\".format(name), metric(torch.argmax(batch_y_hat, dim=1), batch_y), on_epoch=True)\n", "        return loss\n", "\n", "    def validation_step(self, valid_batch: torch.Tensor, batch_idx: int) -> None:\n", "        batch_x, batch_y = valid_batch\n", "\n", "        batch_y_hat: torch.Tensor = self(batch_x)\n", "        loss = torch.nn.CrossEntropyLoss()(batch_y_hat, batch_y)\n", "
        self.log(\"valid_loss\", loss, on_epoch=True)\n", "        for name, metric in self._metrics[\"valid\"].items():\n", "            self.log(\"Valid: {0}\".format(name), metric(torch.argmax(batch_y_hat, dim=1), batch_y), on_epoch=True)\n", "\n", "    def configure_optimizers(self) -> Any:\n", "        optimizer = torch.optim.Adam(self.parameters(), lr=1e-3)\n", "        return optimizer" ] }, { "cell_type": "markdown", "id": "b078646c-2116-403f-afd0-5182045700ce", "metadata": { "canvas": { "comments": [], "componentType": "CodeCell", "copiedOriginId": null, "diskcache": false, "headerColor": "none", "id": "e44ffe99-4b92-48c2-ac33-cdf8051e688a", "isComponent": false, "name": "", "parents": [] } }, "source": ["### **Create pl model**"] }, { "cell_type": "code", "execution_count": null, "id": "bd6e6209-4d3f-4c68-b211-d8fde54d0fce", "metadata": { "canvas": { "comments": [], "componentType": "CodeCell", "copiedOriginId": null, "diskcache": false, "headerColor": "#FF004F", "id": "ccee2dd0-8ca1-49b1-aa96-aeb3a9cca728", "isComponent": true, "name": "Create pl model", "parents": [ { "id": "c68dd5c0-4948-447f-83a5-51fd8856d231", "name": "Define pl model" } ] }, "tags": [] }, "outputs": [], "source": ["model = Classifier()"] }, { "cell_type": "markdown", "id": "ba871cb4-5639-4282-b545-6ac84b86bd6f", "metadata": { "canvas": { "comments": [], "componentType": "CodeCell", "copiedOriginId": null, "diskcache": false, "headerColor": "none", "id": "683f94a7-2159-487e-bfd7-3579786aa2e7", "isComponent": false, "name": "", "parents": [] } }, "source": ["### **Define liveplot logger**"] }, { "cell_type": "code", "execution_count": null, "id": "e0973888-a208-4417-a751-992486e93559", "metadata": { "canvas": { "comments": [], "componentType": "CodeCell", "copiedOriginId": null, "diskcache": false, "headerColor": "#6C00FF", "id": "4499176e-1f68-4dcb-b4d3-b9271c143e3a", "isComponent": true, "name": "Define liveplot logger", "parents": [] }, "tags": [] }, "outputs": [], "source": [ "import argparse\n", "from typing import Any, 
Dict, Optional\n", "from livelossplot import PlotLosses\n", "from pytorch_lightning.loggers import LightningLoggerBase\n", "from pytorch_lightning.loggers.base import rank_zero_experiment\n", "from pytorch_lightning.utilities import rank_zero_only\n", "\n", "\n", "class PlotLossesLogger(LightningLoggerBase):\n", "    \"\"\"Lightning logger that plots losses and metrics live with livelossplot.\n", "\n", "    Args:\n", "        experiment: Value returned by the `experiment` property.\n", "        max_epoch: Passed to `to_matplotlib(max_epoch=...)`.\n", "        **kwargs: Forwarded to `livelossplot.PlotLosses`.\n", "    \"\"\"\n", "\n", "    def __init__(self, experiment=\"tmp\", max_epoch=None, **kwargs):\n", "        super().__init__()\n", "        self._plotlosses = PlotLosses(**kwargs).reset_outputs().to_matplotlib(max_epoch=max_epoch)\n", "        self._experiment = experiment\n", "        self._last_epoch = 0\n", "        self._last_metrics = {}\n", "\n", "    @property  # type: ignore\n", "    @rank_zero_experiment\n", "    def experiment(self) -> str:\n", "        return self._experiment\n", "\n", "    @rank_zero_only\n", "    def log_hyperparams(self, params: argparse.Namespace, *args: Any, **kwargs: Any) -> None:\n", "        pass\n", "\n", "    @rank_zero_only\n", "    def log_metrics(self, metrics: Dict[str, Any], step: Optional[int] = None) -> None:\n", "        if metrics[\"epoch\"] > self._last_epoch and \"valid_loss\" in metrics:\n", "            # A new epoch has started: plot the metrics accumulated for the previous one\n", "            if \"train_loss_step\" in self._last_metrics:\n", "                self._last_metrics.pop(\"train_loss_step\")\n", "            self._plotlosses.update(self._last_metrics)\n", "            self._plotlosses.send()\n", "            self._last_epoch = metrics.pop(\"epoch\")\n", "            self._last_metrics = metrics\n", "        else:\n", "            # Same epoch: merge the new values into the accumulated metrics\n", "            metrics.pop(\"epoch\")\n", "            self._last_metrics = {**self._last_metrics, **metrics}\n", "\n", "    @property\n", "    def name(self) -> str:\n", "        return \"canvas\"\n", "\n", "    @property\n", "    def version(self) -> str:\n", "        return \"prototype\"" ] }, { "cell_type": "markdown", "id": "d26630db-0bc7-4dd4-8e5b-6ac28cdc28de", "metadata": { "canvas": { "comments": [], "componentType": "CodeCell", "copiedOriginId": null, "diskcache": false, "headerColor": "none", "id": 
"a40c2a7b-2d6e-44d9-b324-14965b36e94c", "isComponent": false, "name": "", "parents": [] } }, "source": ["### **Train model**"] }, { "cell_type": "code", "execution_count": null, "id": "cb628aaf-03e9-4e3a-bc9b-0597fcbbcaf7", "metadata": { "canvas": { "comments": [], "componentType": "CodeCell", "copiedOriginId": null, "diskcache": false, "headerColor": "none", "id": "e5873610-eae0-44bb-9afe-c6a998def201", "isComponent": true, "name": "Train model", "parents": [ { "id": "ccee2dd0-8ca1-49b1-aa96-aeb3a9cca728", "name": "Create pl model" }, { "id": "4499176e-1f68-4dcb-b4d3-b9271c143e3a", "name": "Define liveplot logger" }, { "id": "90b151b8-5c3d-48a2-b145-cf20d413d411", "name": "Create dataloaders" } ] }, "tags": [] }, "outputs": [], "source": [ "max_epoch = 10\n", "logger = PlotLossesLogger(\n", " groups={\"loss\": [\"train_loss_epoch\", \"valid_loss\"], \"Accuracy\": [\"Train: acc_epoch\", \"Valid: acc\"]},\n", " max_epoch=max_epoch,\n", ")\n", "trainer = pl.Trainer(max_epochs=max_epoch, logger=[logger], enable_progress_bar=False)\n", "trainer.fit(model, train_loader, valid_loader)" ] }, { "cell_type": "code", "execution_count": null, "id": "35968c87-ccb4-4fd9-887b-736f3e66a22b", "metadata": { "canvas": { "comments": [], "componentType": "CodeCell", "copiedOriginId": null, "diskcache": false, "headerColor": "inherit", "id": "b8d8d3f6-c43a-4596-a67e-6e59c654c9cb", "isComponent": false, "name": "", "parents": [] } }, "outputs": [], "source": [] } ], "metadata": { "canvas": { "parameters": [ { "name": "a", "type": "int", "value": "12" } ], "version": "1.0" }, "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.10" } }, "nbformat": 4, "nbformat_minor": 5 }