{ "cells": [ { "cell_type": "markdown", "id": "23539678", "metadata": {}, "source": [ "[![image](https://raw.githubusercontent.com/visual-layer/visuallayer/main/imgs/vl_horizontal_logo.png)](https://www.visual-layer.com)" ] }, { "cell_type": "markdown", "id": "1b5ed76a", "metadata": {}, "source": [ "# Hugging Face Datasets\n", "This notebook shows how you can load VL Datasets from Hugging Face Datasets and train a model in PyTorch.\n", "\n", "We will load the [`vl-food101`](https://huggingface.co/datasets/visual-layer/vl-food101) dataset - a sanitized version of the original [Food-101 dataset](https://data.vision.ee.ethz.ch/cvl/datasets_extra/food-101/). Learn more [here](https://docs.visual-layer.com/docs/available-datasets#vl-food101).\n", "\n", "The `vl-food101` dataset is curated to minimize duplicates, outliers, and blurry, overly dark, or overly bright images.\n", "The following table summarizes the issues found in the original Food-101 dataset that were removed in `vl-food101`.\n", "\n",
"| Category | Percentage | Count |\n",
"|---|---|---|\n",
"| Duplicates | 0.23% | 235 |\n",
"| Outliers | 0.08% | 77 |\n",
"| Blur | 0.18% | 185 |\n",
"| Dark | 0.04% | 43 |\n",
"| Leakage | 0.086% | 87 |\n",
"| Total | 0.62% | 627 |\n",
"
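\n",
"As a minimal sketch of the loading step described above (assuming the `datasets` and `torch` packages are installed), the curated splits can be pulled from the Hugging Face Hub and formatted as PyTorch tensors:\n",
"\n",
"```python\n",
"from datasets import load_dataset\n",
"\n",
"# Download the curated vl-food101 splits from the Hugging Face Hub\n",
"dataset = load_dataset(\"visual-layer/vl-food101\")\n",
"\n",
"# Return PyTorch tensors so the split can feed a torch DataLoader\n",
"train_ds = dataset[\"train\"].with_format(\"torch\")\n",
"```\n",
"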