{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "Pzd_l-d4IBap" }, "source": [ "# Quickstart Guide\n", "\n", "This guide gives a quick introduction to training TorchVision models such as VGG16, Inception V3, ResNet50, AlexNet, and many more with HugsVision. We'll start by loading in some data and defining a model, then we'll train it for a few epochs and see how well it does.\n", "\n", "**Note**: The easiest way to use this tutorial is as a Colab notebook, which allows you to dive in with no setup. We recommend you enable a free GPU with\n", "\n", "> **Runtime**   →   **Change runtime type**   →   **Hardware Accelerator: GPU**\n", "\n", "**Note**: You need at least Python 3.6 to run the scripts.\n", "\n", "## Install HugsVision\n", "\n", "First, we install HugsVision if needed." ] },
{ "cell_type": "code", "execution_count": null, "metadata": { "id": "ZYAyYoSkISKe" }, "outputs": [], "source": [ "try:\n", "    import hugsvision\n", "except ImportError:\n", "    # Install HugsVision quietly if it is not available yet\n", "    !pip install -q hugsvision\n", "    import hugsvision\n", "\n", "print(hugsvision.__version__)" ] },
{ "cell_type": "markdown", "metadata": { "id": "tht3p78fIav4" }, "source": [ "## Downloading Data\n", "\n", "First, we need to download the dataset called `Skin Cancer MNIST: HAM10000` [here](https://www.kaggle.com/kmader/skin-cancer-mnist-ham10000), which weighs around 3.0 GB." ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## Convert Data\n", "\n", "Once it has been downloaded, you need to convert the data to a directory-based format, with one sub-directory per label:" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import os\n", "import shutil\n", "import pandas as pd\n", "\n", "# Map each image_id to its metadata row; index 1 of each row is the diagnosis (\"dx\") label\n", "metadata = pd.read_csv(\"HAM10000_metadata.csv\").set_index('image_id').T.to_dict('list')\n", "\n", "for current_dir in [\"HAM10000_images_part_1\", \"HAM10000_images_part_2\"]:\n", "    for image_path in os.listdir(current_dir):\n", "        # Look up the label from the image file name (without its extension)\n", "        label = metadata[image_path.split(\".\")[0]][1]\n", "        # Copy the image into ./HAM10000/<label>/\n", "        os.makedirs(\"./HAM10000/\" + label, exist_ok=True)\n", "        shutil.copy2(current_dir + \"/\" + image_path, \"./HAM10000/\" + label + \"/\" + image_path)" ] },
{ "cell_type": "markdown", "metadata": { "id": "1n_kAX_gJ10T" }, "source": [ "## Loading Data\n", "\n", "Once it has been converted, we can start loading the data."
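] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Before that, we can quickly sanity-check the conversion by counting the images in each class sub-directory. This is a minimal sketch, assuming the `./HAM10000/` directory produced by the previous step:" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import os\n", "\n", "# Count how many images ended up in each class sub-directory\n", "for label in sorted(os.listdir(\"./HAM10000/\")):\n", "    n_images = len(os.listdir(os.path.join(\"./HAM10000\", label)))\n", "    print(label, \"->\", n_images, \"images\")"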
] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "3VAjDhVsKSkt" }, "outputs": [], "source": [ "from hugsvision.dataio.VisionDataset import VisionDataset\n", "\n", "# Split the image folder into train/test sets, balance the classes,\n", "# and build the label <-> id mappings\n", "train, test, id2label, label2id = VisionDataset.fromImageFolder(\n", "    \"./HAM10000/\",\n", "    test_ratio   = 0.10,\n", "    balanced     = True,\n", "    torch_vision = True,\n", ")" ] },
{ "cell_type": "markdown", "metadata": { "id": "2cKwvSqdKeQf" }, "source": [ "## Choose an image classifier model from TorchVision\n", "\n", "Now we can choose the base model that we will fine-tune to fit our needs.\n", "\n", "To be sure that the model is compatible with `HugsVision`, it needs to be exported in `PyTorch` and listed in the [TORCHVISION.MODELS](https://pytorch.org/vision/stable/models.html) section of the `PyTorch` documentation.\n", "\n", "You can also find the list of available models by typing directly in Python:\n", "\n", "```python\n", "import torchvision.models as models\n", "print(models.__dict__.keys())\n", "```\n", "\n", "Output:\n", "\n", "```python\n", "dict_keys(['alexnet', 'resnet18', 'resnet34', 'resnet50', 'resnet101', 'resnet152', 'resnext50_32x4d', 'resnext101_32x8d', 'wide_resnet50_2', 'wide_resnet101_2', 'vgg11', 'vgg11_bn', 'vgg13', 'vgg13_bn', 'vgg16', 'vgg16_bn', 'vgg19_bn', 'vgg19', 'squeezenet1_0', 'squeezenet1_1', 'inception_v3', 'densenet121', 'densenet169', 'densenet201', 'densenet161', 'googlenet', 'mobilenet_v2', 'mobilenet_v3_large', 'mobilenet_v3_small', 'mnasnet0_5', 'mnasnet0_75', 'mnasnet1_0', 'mnasnet1_3', 'shufflenet_v2_x0_5', 'shufflenet_v2_x1_0'])\n", "```\n", "\n", "Other models, such as `shufflenet_v2_x1_5` and `shufflenet_v2_x2_0`, haven't been implemented yet by the PyTorch team and raise a `NotImplementedError`." ] },
{ "cell_type": "markdown", "metadata": { "id": "bI1MYIzgNcL5" }, "source": [ "## Train the model\n", "\n", "Once the model is chosen, we can build the `Trainer` and start fine-tuning.\n", "\n", "**Note**: Some architectures need a transformation function applied to the training data to fit the model's requirements. Most models expect a `224x224` input resolution, but there are some exceptions for which you will need to add (with `transforms` imported from `torchvision`):\n", "\n", "```python\n", "torch_vision=True,\n", "transform=transforms.Compose([\n", "    transforms.Resize((224,224)),\n", "    transforms.ToTensor(),\n", "])\n", "```\n" ] },
{ "cell_type": "code", "execution_count": null, "metadata": { "id": "60F0IXvQNc0N" }, "outputs": [], "source": [ "from hugsvision.nnet.TorchVisionClassifierTrainer import TorchVisionClassifierTrainer\n", "\n", "trainer = TorchVisionClassifierTrainer(\n", "    output_dir = \"./out_torchvision/HAM10000/\",\n", "    model_name = 'densenet121',\n", "    train      = train,\n", "    test       = test,\n", "    batch_size = 64,\n", "    max_epochs = 100,\n", "    id2label   = id2label,\n", "    label2id   = label2id,\n", "    lr         = 1e-3,\n", ")" ] },
{ "cell_type": "markdown", "metadata": { "id": "C56x7mkEN6Rx" }, "source": [ "## Evaluate F1-Score\n", "\n", "Using the F1-score metric gives us a better picture of the predictions across all the labels and helps find out if there are any anomalies with a specific label.\n", "\n", "By default, the trainer runs an evaluation after each epoch with the latest model, and at the end of the training phase with the best model, but you can also run it manually if you want to collect the list of predictions after the `argmax` function, as sketched below."
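] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For example, here is a minimal sketch of such a manual run that also builds a confusion matrix from the collected predictions, as a complement to the per-class report shown below. It assumes `scikit-learn` is installed:" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from sklearn.metrics import confusion_matrix\n", "\n", "# Run the evaluation manually to collect predictions (hyp) and references (ref)\n", "hyp, ref = trainer.evaluate_f1_score()\n", "\n", "# Rows are reference labels, columns are predicted labels\n", "print(confusion_matrix(ref, hyp))"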
] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "hC2pA2EzQahn" }, "outputs": [], "source": [ "hyp, ref = trainer.evaluate_f1_score()" ] },
{ "cell_type": "markdown", "metadata": { "id": "qPCJviqYQdb8" }, "source": [ "```\n", "              precision    recall  f1-score   support\n", "\n", "       akiec       0.80      0.80      0.80        10\n", "         bcc       0.80      0.89      0.84         9\n", "         bkl       0.80      0.80      0.80        10\n", "          df       1.00      1.00      1.00        15\n", "         mel       0.75      0.67      0.71         9\n", "          nv       0.92      0.86      0.89        14\n", "        vasc       0.93      1.00      0.97        14\n", "\n", "    accuracy                           0.88        81\n", "   macro avg       0.86      0.86      0.86        81\n", "weighted avg       0.88      0.88      0.88        81\n", "```" ] },
{ "cell_type": "markdown", "metadata": { "id": "MDu0-6kh3nGx" }, "source": [ "## Make a prediction" ] },
{ "cell_type": "code", "execution_count": null, "metadata": { "id": "fOZ72c-i3nGy" }, "outputs": [], "source": [ "from PIL import Image\n", "from hugsvision.inference.TorchVisionClassifierInference import TorchVisionClassifierInference\n", "\n", "# Load the fine-tuned model from the trainer's output directory\n", "classifier = TorchVisionClassifierInference(model_path = \"./out_torchvision/HAM10000/\")\n", "\n", "# Predict from an image path or from an already-loaded PIL image\n", "print(\"Predicted class:\", classifier.predict(img_path=\"./HAM10000/bcc/ISIC_0024331.jpg\"))\n", "print(\"Predicted class:\", classifier.predict_image(img=Image.open(\"./HAM10000/bcc/ISIC_0024331.jpg\")))" ] } ], "metadata": { "colab": { "collapsed_sections": [], "name": "Kvasir_v2_Image_Classifier.ipynb", "provenance": [] }, "kernelspec": { "display_name": "Python 3", "name": "python3" }, "language_info": { "name": "python" } }, "nbformat": 4, "nbformat_minor": 2 }