{ "cells": [ { "cell_type": "markdown", "id": "141d5896-bbc8-4074-b9a4-3129ddc1f690", "metadata": {}, "source": [ "# Using the Edge Impulse Python SDK to Upload and Download Data\n", "\n", "" ] }, { "cell_type": "markdown", "id": "1d958839-d8cb-469e-9924-374370fb235f", "metadata": {}, "source": [ "If you want to upload files directly to an Edge Impulse project, we recommend using the [CLI uploader tool](https://docs.edgeimpulse.com/docs/tools/edge-impulse-cli/cli-uploader). However, sometimes you cannot upload your samples directly, as you might need to convert the files to one of the accepted formats or modify the data prior to model training. Edge Impulse offers [data augmentation](https://docs.edgeimpulse.com/docs/tips-and-tricks/data-augmentation) for some types of projects, but you might want to create your own custom augmentation scheme. Or perhaps you want to [generate synthetic data](https://docs.edgeimpulse.com/docs/tutorials/ml-and-data-engineering/generate-synthetic-datasets) and script the upload process.\n", "\n", "The Python SDK offers a set of functions to help you move data into and out of your project. This can be extremely helpful when generating or augmenting your dataset. The following cells demonstrate some of these upload and download functions.\n", "\n", "You can find the API documentation for the functions found in this tutorial [here](https://edgeimpulse.github.io/python-sdk/source/edgeimpulse.data.html). \n", "\n", "> **WARNING:** This notebook will add and delete data in your Edge Impulse project, so be careful! We recommend creating a throwaway project when testing this notebook.\n", "\n", "Note that you might need to refresh the page with your Edge Impulse project to see the samples appear." 
] }, { "cell_type": "code", "execution_count": null, "id": "68f553aa-ed96-4551-bc84-84b76d8add4c", "metadata": {}, "outputs": [], "source": [ "# If you have not done so already, install the following dependencies\n", "!python -m pip install edgeimpulse" ] }, { "cell_type": "code", "execution_count": null, "id": "370bd3e1-a6ea-4346-bfe5-a569ea06d611", "metadata": {}, "outputs": [], "source": [ "import edgeimpulse as ei" ] }, { "cell_type": "markdown", "id": "461f43db-f44d-4e8b-abe2-269a611665f5", "metadata": {}, "source": [ "You will need to obtain an API key from an Edge Impulse project. Log into [edgeimpulse.com](https://edgeimpulse.com/) and create a new project. Open the project, navigate to **Dashboard** and click on the **Keys** tab to view your API keys. Double-click on the API key to highlight it, right-click, and select **Copy**.\n", "\n", "![Copy API key from Edge Impulse project](https://raw.githubusercontent.com/edgeimpulse/notebooks/main/.assets/images/python-sdk-copy-ei-api-key.png)\n", "\n", "Note that you do not actually need to use the project in the Edge Impulse Studio. We just need the API key.\n", "\n", "Paste that API key string in the `ei.API_KEY` value in the following cell:" ] }, { "cell_type": "code", "execution_count": null, "id": "8f41407f-4ee8-4ddb-9eea-ed7b6d3947a7", "metadata": {}, "outputs": [], "source": [ "# Settings\n", "ei.API_KEY = \"ei_dae2...\" # Change this to your Edge Impulse API key" ] }, { "cell_type": "markdown", "id": "97a8f05a-5ce0-4edc-a5d2-6cb1132135fe", "metadata": {}, "source": [ "## Upload directory\n", "\n", "You can upload all files in a directory using the Python SDK. Note that you can set the *category*, *label*, and *metadata* for all files with a single call. If you want to use a different label for each file, set `label=None` in the function call and name each file so that its label is the prefix before the first `.` in the filename. For example, *wave.01.csv* will have the label *wave* when uploaded. 
See [here](https://docs.edgeimpulse.com/docs/tools/edge-impulse-cli/cli-uploader#custom-labeling-and-metadata) for more information.\n", "\n", "The following file formats are allowed: *.cbor*, *.json*, *.csv*, *.wav*, *.jpg*, *.png*, *.mp4*, *.avi*." ] }, { "cell_type": "code", "execution_count": null, "id": "ecbfd327-11bf-4725-872b-e411f54aa67b", "metadata": {}, "outputs": [], "source": [ "from datetime import datetime" ] }, { "cell_type": "code", "execution_count": null, "id": "d2ef5e4b-5fb7-4dcd-b455-e55188ded7fc", "metadata": {}, "outputs": [], "source": [ "# Download image files to use as an example dataset\n", "!mkdir -p dataset\n", "!wget -P dataset -q \\\n", " https://raw.githubusercontent.com/edgeimpulse/notebooks/main/.assets/images/capacitor.01.png \\\n", " https://raw.githubusercontent.com/edgeimpulse/notebooks/main/.assets/images/capacitor.02.png" ] }, { "cell_type": "code", "execution_count": null, "id": "4f47479f-d742-474f-9e99-a98d5f0b2068", "metadata": {}, "outputs": [], "source": [ "# Upload the entire directory\n", "response = ei.experimental.data.upload_directory(\n", " directory=\"dataset\",\n", " category=\"training\",\n", " label=None, # Will use the prefix before the '.' on each filename for the label\n", " metadata={\n", " \"date\": datetime.now().strftime('%Y-%m-%d %H:%M:%S'),\n", " \"source\": \"camera\",\n", " }\n", ")\n", "\n", "# Check to make sure there were no failures\n", "assert len(response.fails) == 0, \"Could not upload some files\"\n", "\n", "# Save the sample IDs, as we will need these to retrieve file information and delete samples\n", "ids = []\n", "for sample in response.successes:\n", " ids.append(sample.sample_id)\n", "\n", "# Review the sample IDs and get the associated server-side filename\n", "# Note the lack of extension! 
Multiple samples on the server can have the same filename.\n", "for id in ids:\n", " filename = ei.experimental.data.get_filename_by_id(id)\n", " print(f\"Sample ID: {id}, filename: {filename}\")" ] }, { "cell_type": "markdown", "id": "76b875be-e3e6-4a1f-bad3-8f4f4a0f0f99", "metadata": {}, "source": [ "If you head to the *Data acquisition* page on your project, you should see images in your dataset.\n", "\n", "![Images uploaded to Edge Impulse project](https://raw.githubusercontent.com/edgeimpulse/notebooks/main/.assets/images/python-sdk-upload-download-images.png)" ] }, { "cell_type": "markdown", "id": "107abe73-d16e-4e17-8b70-bebf91adf755", "metadata": {}, "source": [ "## Download files\n", "\n", "You can download samples from your Edge Impulse project if you know the sample IDs. You can get sample IDs by calling the `ei.experimental.data.get_sample_ids()` function, which allows you to filter IDs based on filename, category, and label. " ] }, { "cell_type": "code", "execution_count": null, "id": "4322955a-efb1-4bf9-a5c7-a4042b78b2e2", "metadata": {}, "outputs": [], "source": [ "# Get sample IDs for everything in the \"training\" category\n", "infos = ei.experimental.data.get_sample_ids(category=\"training\")\n", "\n", "# The SampleInfo should match what we uploaded earlier\n", "ids = []\n", "for info in infos:\n", " print(info)\n", " ids.append(info.sample_id)" ] }, { "cell_type": "code", "execution_count": null, "id": "2ff7ff84-142e-43f0-8282-1808c217429f", "metadata": {}, "outputs": [], "source": [ "# Download samples\n", "samples = ei.experimental.data.download_samples_by_ids(ids)\n", "\n", "# Save the downloaded files\n", "for sample in samples:\n", "    with open(sample.filename, \"wb\") as file:\n", "        file.write(sample.data.read())\n", "\n", "# View sample information\n", "for sample in samples:\n", " print(\n", " f\"filename: {sample.filename}\\r\\n\"\n", " f\" sample ID: {sample.sample_id}\\r\\n\"\n", " f\" category: {sample.category}\\r\\n\"\n", " f\" label: 
{sample.label}\\r\\n\"\n", " f\" bounding boxes: {sample.bounding_boxes}\\r\\n\"\n", " f\" metadata: {sample.metadata}\"\n", " )" ] }, { "cell_type": "markdown", "id": "241e8f63-75be-4fd2-8c8c-4de1327ad6f3", "metadata": {}, "source": [ "Take a look at the files in this directory. You should see the downloaded images. They should match the images in the *dataset/* directory, which were the original images that we uploaded." ] }, { "cell_type": "markdown", "id": "9000817d-3149-442e-a82c-17ba8f770b68", "metadata": {}, "source": [ "## Delete files\n", "\n", "If you know the ID of the sample you would like to delete, you can call the `delete_sample_by_id()` function. You can also delete all the samples in your project by calling `delete_all_samples()`." ] }, { "cell_type": "code", "execution_count": null, "id": "4ad85df3-c232-4eeb-b49a-5ba6ab830e8d", "metadata": {}, "outputs": [], "source": [ "# Delete the samples from the Edge Impulse project that we uploaded earlier\n", "for id in ids:\n", " ei.experimental.data.delete_sample_by_id(id)" ] }, { "cell_type": "markdown", "id": "19ecb3e8-a371-41ce-9ee1-03044d468972", "metadata": {}, "source": [ "Take a look at the data in your project. The samples that we uploaded should be gone." ] }, { "cell_type": "markdown", "id": "0f5c1235-d084-46be-8aff-fca40be9a74d", "metadata": {}, "source": [ "## Upload folder for object detection\n", "\n", "For object detection, you can put bounding box information (following the [Edge Impulse JSON bounding box format](https://docs.edgeimpulse.com/reference/image-dataset-annotation-formats)) in a file named *info.labels* in the same directory as the images. \n", "\n", "> **Important!** The annotations file must be named exactly *info.labels*" ] }, { "cell_type": "code", "execution_count": null, "id": "84b47e73-b6c0-4ea1-8f46-6a667e42417a", "metadata": {}, "outputs": [], "source": [ "# Download images and bounding box annotations to use as an example dataset\n", "!mkdir -p dataset\n", "!rm dataset/capacitor.01.png dataset/capacitor.02.png\n", "!wget -P dataset -q \\\n", " https://raw.githubusercontent.com/edgeimpulse/notebooks/main/.assets/images/dog-ball-toy.01.png \\\n", " https://raw.githubusercontent.com/edgeimpulse/notebooks/main/.assets/images/dog-ball-toy.02.png \\\n", " https://raw.githubusercontent.com/edgeimpulse/notebooks/main/.assets/annotations/info.labels" ] }, { "cell_type": "code", "execution_count": null, "id": "69d8ee19-f85a-41c5-93a5-ae8400aa809b", "metadata": {}, "outputs": [], "source": [ "# Upload the entire directory (including the info.labels file)\n", "response = ei.experimental.data.upload_exported_dataset(\n", " directory=\"dataset\",\n", ")\n", "\n", "# Check to make sure there were no failures\n", "assert len(response.fails) == 0, \"Could not upload some files\"\n", "\n", "# Save the sample IDs, as we will need these to retrieve file information and delete samples\n", "ids = []\n", "for sample in response.successes:\n", " ids.append(sample.sample_id)" ] }, { "cell_type": "markdown", "id": "5c735b98-f9e2-473a-bab7-9e781a0c4a16", "metadata": {}, "source": [ "If you head to the *Data acquisition* page on your project, you should see images in your dataset along with the bounding box information.\n", "\n", "![Images uploaded to Edge Impulse project](https://raw.githubusercontent.com/edgeimpulse/notebooks/main/.assets/images/python-sdk-upload-download-object-detection.png)" ] }, { "cell_type": "code", "execution_count": null, "id": "a470dd45-d753-495a-9ebf-9cce5050c70a", "metadata": {}, "outputs": [], "source": [ "# Delete the samples from the Edge Impulse project that we uploaded\n", "for id in 
ids:\n", " ei.experimental.data.delete_sample_by_id(id)" ] }, { "cell_type": "markdown", "id": "425b6287-d6f6-4db6-bb21-25befeef6b25", "metadata": {}, "source": [ "## Upload individual CSV files\n", "\n", "The Edge Impulse ingestion service accepts CSV files, which we can use to upload raw data. Note that if you configure a CSV template using the [CSV Wizard](https://docs.edgeimpulse.com/docs/edge-impulse-studio/data-acquisition/csv-wizard), then the expected format of the CSV file might change. If you do not configure a CSV template, then the ingestion service expects CSV data to be in a particular format. See [here for details about the default CSV format](https://docs.edgeimpulse.com/reference/importing-csv-data)." ] }, { "cell_type": "code", "execution_count": null, "id": "303bbd9c-3c16-4790-a52c-864062a38d45", "metadata": {}, "outputs": [], "source": [ "import csv\n", "import io\n", "import os" ] }, { "cell_type": "code", "execution_count": null, "id": "bbfc41da-529b-4aba-b0db-24b69a697bb5", "metadata": {}, "outputs": [], "source": [ "# Create example CSV data\n", "sample_data = [\n", " [\n", " [\"timestamp\", \"accX\", \"accY\", \"accZ\"],\n", " [0, -9.81, 0.03, 0.21],\n", " [10, -9.83, 0.04, 0.27],\n", " [20, -9.12, 0.03, 0.23],\n", " [30, -9.14, 0.01, 0.25],\n", " ],\n", " [\n", " [\"timestamp\", \"accX\", \"accY\", \"accZ\"],\n", " [0, -9.56, 5.34, 1.21],\n", " [10, -9.43, 1.37, 1.27],\n", " [20, -9.22, -4.03, 1.23],\n", " [30, -9.50, -0.98, 1.25],\n", " ],\n", "]\n", "\n", "# Write to CSV files\n", "filenames = [\n", " \"001.csv\",\n", " \"002.csv\"\n", "]\n", "for i, filename in enumerate(filenames):\n", "    file_path = os.path.join(\"dataset\", filename)\n", "    with open(file_path, \"w\", newline=\"\") as file:\n", "        writer = csv.writer(file)\n", "        writer.writerows(sample_data[i])" ] }, { "cell_type": "code", "execution_count": null, "id": "0eb22747-24b5-4c65-bd7b-a6e1a57ba328", "metadata": {}, "outputs": [], "source": [ "# Add metadata to the CSV data\n", "my_samples = [\n", " {\n", " \"filename\": filenames[0],\n", " \"data\": open(os.path.join(\"dataset\", filenames[0]), \"rb\"),\n", " \"category\": \"training\",\n", " \"label\": \"idle\",\n", " \"metadata\": {\n", " \"source\": \"accelerometer\",\n", " \"collection site\": \"desk\",\n", " },\n", " },\n", " {\n", " \"filename\": filenames[1],\n", " \"data\": open(os.path.join(\"dataset\", filenames[1]), \"rb\"),\n", " \"category\": \"training\",\n", " \"label\": \"wave\",\n", " \"metadata\": {\n", " \"source\": \"accelerometer\",\n", " \"collection site\": \"desk\",\n", " },\n", " },\n", "]" ] }, { "cell_type": "code", "execution_count": null, "id": "dbbfdfbc-111a-49c7-bab9-29166a45c764", "metadata": {}, "outputs": [], "source": [ "# Wrap the samples in instances of the Sample class\n", "samples = [ei.experimental.data.Sample(**i) for i in my_samples]\n", "\n", "# Upload samples to your project\n", "response = ei.experimental.data.upload_samples(samples)\n", "\n", "# Check to make sure there were no failures\n", "assert len(response.fails) == 0, \"Could not upload some files\"\n", "\n", "# Save the sample IDs, as we will need these to retrieve file information and delete samples\n", "ids = []\n", "for sample in response.successes:\n", " ids.append(sample.sample_id)" ] }, { "cell_type": "markdown", "id": "f14a17cd-b08b-40d9-8ce5-bfc21c2a95e9", "metadata": {}, "source": [ "If you head to the *Data acquisition* page on your project, you should see your time series data.\n", "\n", "![Copy API key from Edge Impulse project](https://raw.githubusercontent.com/edgeimpulse/notebooks/main/.assets/images/python-sdk-upload-download-json-data.png)" ] }, { "cell_type": "code", "execution_count": null, "id": "33340c1f-9928-4727-aea4-c94595f98efb", "metadata": {}, "outputs": [], "source": [ "# Delete the samples from the Edge Impulse project\n", "for id in ids:\n", " ei.experimental.data.delete_sample_by_id(id)" ] }, { "cell_type": "markdown", "id": "7a274d71-2e6a-4a0e-ba46-49986355a9dd", "metadata": {}, "source": [ "## Upload JSON data directly\n", "\n", "Another way to upload data is to encode it in JSON format. See the [data acquisition format specification](https://docs.edgeimpulse.com/reference/data-acquisition-format#data-acquisition-format-specification) for more information on acceptable key/value pairs. Note that at this time, the `signature` value can be set to `0`.\n", "\n", "The raw data must be encoded in an IO object. We convert the dictionary objects to a `BytesIO` object, but you can also read in data from *.json* files." ] }, { "cell_type": "code", "execution_count": null, "id": "529685a5-de16-4dc2-8906-662cbdd15e9e", "metadata": {}, "outputs": [], "source": [ "import io\n", "import json" ] }, { "cell_type": "code", "execution_count": null, "id": "e1cbcbfd-b79d-4e7e-a339-b4dbf5af4269", "metadata": {}, "outputs": [], "source": [ "# Create two different example data samples\n", "sample_data_1 = {\n", " \"protected\": {\n", " \"ver\": \"v1\",\n", " \"alg\": \"none\",\n", " },\n", " \"signature\": 0,\n", " \"payload\": {\n", " \"device_name\": \"ac:87:a3:0a:2d:1b\",\n", " \"device_type\": \"DISCO-L475VG-IOT01A\",\n", " \"interval_ms\": 10,\n", " \"sensors\": [\n", " { \"name\": \"accX\", \"units\": \"m/s2\" },\n", " { \"name\": \"accY\", \"units\": \"m/s2\" },\n", " { \"name\": \"accZ\", \"units\": \"m/s2\" }\n", " ],\n", " \"values\": [\n", " [ -9.81, 0.03, 0.21 ],\n", " [ -9.83, 0.04, 0.27 ],\n", " [ -9.12, 0.03, 0.23 ],\n", " [ -9.14, 0.01, 0.25 ]\n", " ]\n", " }\n", "}\n", "sample_data_2 = {\n", " \"protected\": {\n", " \"ver\": \"v1\",\n", " \"alg\": \"none\",\n", " },\n", " \"signature\": 0,\n", " \"payload\": {\n", " \"device_name\": \"ac:87:a3:0a:2d:1b\",\n", " \"device_type\": \"DISCO-L475VG-IOT01A\",\n", " \"interval_ms\": 10,\n", " \"sensors\": [\n", " { \"name\": \"accX\", \"units\": \"m/s2\" },\n", " { \"name\": \"accY\", \"units\": \"m/s2\" },\n", " { \"name\": \"accZ\", \"units\": 
\"m/s2\" }\n", " ],\n", " \"values\": [\n", " [ -9.56, 5.34, 1.21 ],\n", " [ -9.43, 1.37, 1.27 ],\n", " [ -9.22, -4.03, 1.23 ],\n", " [ -9.50, -0.98, 1.25 ]\n", " ]\n", " }\n", "}" ] }, { "cell_type": "code", "execution_count": null, "id": "1d63e428-c14e-4dad-9def-07936974883e", "metadata": {}, "outputs": [], "source": [ "# Provide a filename, category, label, and optional metadata for each sample\n", "my_samples = [\n", " {\n", " \"filename\": \"001.json\",\n", " \"data\": io.BytesIO(json.dumps(sample_data_1).encode('utf-8')),\n", " \"category\": \"training\",\n", " \"label\": \"idle\",\n", " \"metadata\": {\n", " \"source\": \"accelerometer\",\n", " \"collection site\": \"desk\",\n", " },\n", " },\n", " {\n", " \"filename\": \"002.json\",\n", " \"data\": io.BytesIO(json.dumps(sample_data_2).encode('utf-8')),\n", " \"category\": \"training\",\n", " \"label\": \"wave\",\n", " \"metadata\": {\n", " \"source\": \"accelerometer\",\n", " \"collection site\": \"desk\",\n", " },\n", " },\n", "]" ] }, { "cell_type": "code", "execution_count": null, "id": "924bc9f0-e248-49c1-92ce-152340292a3d", "metadata": {}, "outputs": [], "source": [ "# Wrap the samples in instances of the Sample class\n", "samples = [ei.data.sample_type.Sample(**i) for i in my_samples]\n", "\n", "# Upload samples to your project\n", "response = ei.experimental.data.upload_samples(samples)\n", "\n", "# Check to make sure there were no failures\n", "assert len(response.fails) == 0, \"Could not upload some files\"\n", "\n", "# Save the sample IDs, as we will need these to retrieve file information and delete samples\n", "ids = []\n", "for sample in response.successes:\n", " ids.append(sample.sample_id)" ] }, { "cell_type": "markdown", "id": "f41e3442-d38b-4145-ab36-4203db6b066d", "metadata": {}, "source": [ "If you head to the *Data acquisition* page on your project, you should see your time series data.\n", "\n", "![Copy API key from Edge Impulse 
project](https://raw.githubusercontent.com/edgeimpulse/notebooks/main/.assets/images/python-sdk-upload-download-json-data.png)" ] }, { "cell_type": "code", "execution_count": null, "id": "3c5f50e7-d7bb-4bad-8881-5d59f94afde7", "metadata": {}, "outputs": [], "source": [ "# Delete the samples from the Edge Impulse project\n", "for id in ids:\n", " ei.experimental.data.delete_sample_by_id(id)" ] }, { "cell_type": "markdown", "id": "da6a4bdd-d2f7-4875-bd13-c1fe8280d6b1", "metadata": {}, "source": [ "## Upload NumPy arrays\n", "\n", "[NumPy](https://numpy.org/) is a powerful Python library for working with large arrays and matrices. You can upload NumPy arrays directly into your Edge Impulse project. Note that the arrays are required to be in a particular format and must be uploaded with some required metadata (such as a list of labels and the sample rate).\n", "\n", "> **Important!** NumPy arrays must be in the shape `(sample, time, sensor)`\n", "\n", "If you are working with image data in NumPy, we recommend saving those images as .png or .jpg files and using `upload_directory()`." 
] }, { "cell_type": "code", "execution_count": null, "id": "fe468e0a-95cc-42af-b5f1-0d8534917db8", "metadata": {}, "outputs": [], "source": [ "import numpy as np" ] }, { "cell_type": "code", "execution_count": null, "id": "455ce185-213a-4678-bb06-972c44d92fb4", "metadata": {}, "outputs": [], "source": [ "# Create example NumPy array with 2 time series samples\n", "sample_data = np.array(\n", " [\n", " [ # Sample 1 (\"idle\")\n", " [-9.81, 0.03, 0.21],\n", " [-9.83, 0.04, 0.27],\n", " [-9.12, 0.03, 0.23],\n", " [-9.14, 0.01, 0.25],\n", " ],\n", " [ # Sample 2 (\"wave\")\n", " [-9.56, 5.34, 1.21],\n", " [-9.43, 1.37, 1.27],\n", " [-9.22, -4.03, 1.23],\n", " [-9.50, -0.98, 1.25],\n", " ],\n", " ]\n", ")" ] }, { "cell_type": "code", "execution_count": null, "id": "80ca3f10-bba3-497d-9b08-15314dbe0266", "metadata": {}, "outputs": [], "source": [ "# Labels for each sample\n", "labels = [\"idle\", \"wave\"]\n", "\n", "# Names of the sensors and units for the 3 axes\n", "sensors = [\n", " { \"name\": \"accX\", \"units\": \"m/s2\" },\n", " { \"name\": \"accY\", \"units\": \"m/s2\" },\n", " { \"name\": \"accZ\", \"units\": \"m/s2\" },\n", "]\n", "\n", "# Optional metadata for all samples being uploaded\n", "metadata = {\n", " \"source\": \"accelerometer\",\n", " \"collection site\": \"desk\",\n", "}" ] }, { "cell_type": "code", "execution_count": null, "id": "12b610b0-66fd-40dd-8933-37807680b99a", "metadata": {}, "outputs": [], "source": [ "# Upload samples to your project\n", "response = ei.experimental.data.upload_numpy(\n", " data=sample_data,\n", " labels=labels,\n", " sensors=sensors,\n", " sample_rate_ms=10,\n", " metadata=metadata,\n", " category=\"training\",\n", ")\n", "\n", "# Check to make sure there were no failures\n", "assert len(response.fails) == 0, \"Could not upload some files\"\n", "\n", "# Save the sample IDs, as we will need these to retrieve file information and delete samples\n", "ids = []\n", "for sample in response.successes:\n", " 
ids.append(sample.sample_id)" ] }, { "cell_type": "markdown", "id": "270bbebc-3d09-4912-842f-71d7d945b64c", "metadata": {}, "source": [ "If you head to the *Data acquisition* page on your project, you should see your time series data. Note that the sample names are randomly assigned, so we recommend recording the sample IDs when you upload.\n", "\n", "![Copy API key from Edge Impulse project](https://raw.githubusercontent.com/edgeimpulse/notebooks/main/.assets/images/python-sdk-upload-download-numpy-data.png)" ] }, { "cell_type": "code", "execution_count": null, "id": "3cff1e4c-e566-4811-a215-304cef006f32", "metadata": {}, "outputs": [], "source": [ "# Delete the samples from the Edge Impulse project\n", "for id in ids:\n", " ei.experimental.data.delete_sample_by_id(id)" ] }, { "cell_type": "markdown", "id": "8fcebb66-bd21-43a2-b90a-ec17010c60f3", "metadata": {}, "source": [ "## Upload pandas (and pandas-like) dataframes\n", "\n", "[pandas](https://pandas.pydata.org/) is a popular Python library for performing data manipulation and analysis. The Edge Impulse library supports a number of ways to upload dataframes. We will go over each format.\n", "\n", "Note that several other packages exist that work as drop-in replacements for pandas. You can use these replacements so long as you import them under the name `pd`. 
For example, one of:\n", "\n", "```\n", "import pandas as pd\n", "import modin.pandas as pd\n", "import dask.dataframe as pd\n", "import polars as pd\n", "```" ] }, { "cell_type": "code", "execution_count": null, "id": "629f8683-f9d8-4342-a251-e0344e925828", "metadata": {}, "outputs": [], "source": [ "import pandas as pd" ] }, { "cell_type": "markdown", "id": "e9237172-8dfb-49ce-b2a8-386007c9d934", "metadata": {}, "source": [ "The first option is to upload one dataframe for each sample (non-time series)." ] }, { "cell_type": "code", "execution_count": null, "id": "6a244f11-18b1-4759-a65c-6e622bdb040f", "metadata": {}, "outputs": [], "source": [ "# Construct one dataframe for each sample (multidimensional, non-time series)\n", "df_1 = pd.DataFrame([[-9.81, 0.03, 0.21]], columns=[\"accX\", \"accY\", \"accZ\"])\n", "df_2 = pd.DataFrame([[-9.56, 5.34, 1.21]], columns=[\"accX\", \"accY\", \"accZ\"])\n", "\n", "# Optional metadata for all samples being uploaded\n", "metadata = {\n", " \"source\": \"accelerometer\",\n", " \"collection site\": \"desk\",\n", "}" ] }, { "cell_type": "code", "execution_count": null, "id": "48689eb7-d2ab-4aa7-8acb-1c57dbdd6de8", "metadata": {}, "outputs": [], "source": [ "# Upload the first sample\n", "ids = []\n", "response = ei.experimental.data.upload_pandas_sample(\n", " df_1,\n", " label=\"One\",\n", " filename=\"001\",\n", " metadata=metadata,\n", " category=\"training\",\n", ")\n", "assert len(response.fails) == 0, \"Could not upload some files\"\n", "for sample in response.successes:\n", " ids.append(sample.sample_id)\n", "\n", "# Upload the second sample\n", "response = ei.experimental.data.upload_pandas_sample(\n", " df_2,\n", " label=\"Two\",\n", " filename=\"002\",\n", " metadata=metadata,\n", " category=\"training\",\n", ")\n", "assert len(response.fails) == 0, \"Could not upload some files\"\n", "for sample in response.successes:\n", " ids.append(sample.sample_id)" ] }, { "cell_type": "code", "execution_count": null, "id": "ca4b55e2-4d21-49f4-95b6-792003ee56d0", "metadata": {}, "outputs": [], "source": [ "# Delete the samples from the Edge Impulse project\n", "for id in ids:\n", " ei.experimental.data.delete_sample_by_id(id)" ] }, { "cell_type": "markdown", "id": "ba2274c4-f6da-4384-a607-8a60d818511b", "metadata": {}, "source": [ "You can also upload one dataframe for each sample (time series). As with previous examples, we'll assume that the sample rate is 10 ms." ] }, { "cell_type": "code", "execution_count": null, "id": "db26f4b9-1bef-4171-afb8-a7c0f815d8f1", "metadata": {}, "outputs": [], "source": [ "# Create samples (multidimensional, time series)\n", "sample_data_1 = [ # Sample 1 (\"idle\")\n", " [-9.81, 0.03, 0.21],\n", " [-9.83, 0.04, 0.27],\n", " [-9.12, 0.03, 0.23],\n", " [-9.14, 0.01, 0.25],\n", "]\n", "sample_data_2 = [ # Sample 2 (\"wave\")\n", " [-9.56, 5.34, 1.21],\n", " [-9.43, 1.37, 1.27],\n", " [-9.22, -4.03, 1.23],\n", " [-9.50, -0.98, 1.25],\n", "]" ] }, { "cell_type": "code", "execution_count": null, "id": "8673d7bf-0c91-4a72-a247-05b2fd632727", "metadata": {}, "outputs": [], "source": [ "# Construct one dataframe for each sample\n", "df_1 = pd.DataFrame(sample_data_1, columns=[\"accX\", \"accY\", \"accZ\"])\n", "df_2 = pd.DataFrame(sample_data_2, columns=[\"accX\", \"accY\", \"accZ\"])\n", "\n", "# Optional metadata for all samples being uploaded\n", "metadata = {\n", " \"source\": \"accelerometer\",\n", " \"collection site\": \"desk\",\n", "}" ] }, { "cell_type": "code", "execution_count": null, "id": "74d20f24-216e-4a14-a2d6-df284e58583b", "metadata": {}, "outputs": [], "source": [ "# Upload the first sample\n", "ids = []\n", "response = ei.experimental.data.upload_pandas_sample(\n", " df_1,\n", " label=\"Idle\",\n", " filename=\"001\",\n", " sample_rate_ms=10,\n", " metadata=metadata,\n", " category=\"training\",\n", ")\n", "assert len(response.fails) == 0, \"Could not upload some files\"\n", "for sample in response.successes:\n", " 
ids.append(sample.sample_id)\n", "\n", "# Upload the second sample\n", "response = ei.experimental.data.upload_pandas_sample(\n", " df_2,\n", " label=\"Wave\",\n", " filename=\"002\",\n", " sample_rate_ms=10,\n", " metadata=metadata,\n", " category=\"training\",\n", ")\n", "assert len(response.fails) == 0, \"Could not upload some files\"\n", "for sample in response.successes:\n", " ids.append(sample.sample_id)" ] }, { "cell_type": "code", "execution_count": null, "id": "8b651888-909e-4e7d-b96f-d053089d1266", "metadata": {}, "outputs": [], "source": [ "# Delete the samples from the Edge Impulse project\n", "for id in ids:\n", " ei.experimental.data.delete_sample_by_id(id)" ] }, { "cell_type": "markdown", "id": "05129bfe-bc42-4940-822b-c44e7a1ddcbd", "metadata": {}, "source": [ "You can upload non-time series data where each sample is a row in the dataframe. Note that you need to provide labels in the rows." ] }, { "cell_type": "code", "execution_count": null, "id": "1c76dea7-4693-4d47-a3fb-9fd4bb2f5709", "metadata": {}, "outputs": [], "source": [ "# Construct non-time series data, where each row is a different sample\n", "data = [\n", " [\"desk\", \"training\", \"One\", -9.81, 0.03, 0.21],\n", " [\"field\", \"training\", \"Two\", -9.56, 5.34, 1.21],\n", "]\n", "columns = [\"loc\", \"category\", \"label\", \"accX\", \"accY\", \"accZ\"]\n", "\n", "# Wrap the data in a DataFrame\n", "df = pd.DataFrame(data, columns=columns)" ] }, { "cell_type": "code", "execution_count": null, "id": "38084fcd-43d8-4806-86bf-804a81d3e4a1", "metadata": {}, "outputs": [], "source": [ "# Upload non-time series DataFrame (with multiple samples) to the project\n", "ids = []\n", "response = ei.experimental.data.upload_pandas_dataframe(\n", " df,\n", " feature_cols=[\"accX\", \"accY\", \"accZ\"],\n", " label_col=\"label\",\n", " category_col=\"category\",\n", " metadata_cols=[\"loc\"],\n", ")\n", "assert len(response.fails) == 0, \"Could not upload some files\"\n", "for sample in 
response.successes:\n", " ids.append(sample.sample_id)" ] }, { "cell_type": "code", "execution_count": null, "id": "83f53aee-9b30-4697-9383-f0d15a3b29a0", "metadata": {}, "outputs": [], "source": [ "# Delete the samples from the Edge Impulse project\n", "for id in ids:\n", " ei.experimental.data.delete_sample_by_id(id)" ] }, { "cell_type": "markdown", "id": "2a742282-481a-4654-a3fd-d21891cdfe43", "metadata": {}, "source": [ "A \"wide\" dataframe is one where each column represents a value in the time series data, and the rows become individual samples. Note that you need to provide labels in the rows." ] }, { "cell_type": "code", "execution_count": null, "id": "4ad75535-eb97-4cdc-8fdb-80388cf2b77e", "metadata": {}, "outputs": [], "source": [ "# Construct time series data, where each row is a different sample\n", "data = [\n", " [\"desk\", \"training\", \"idle\", 0.8, 0.7, 0.8, 0.9, 0.8, 0.8, 0.7, 0.8],\n", " [\"field\", \"training\", \"motion\", 0.3, 0.9, 0.4, 0.6, 0.8, 0.9, 0.5, 0.4],\n", "]\n", "columns = [\"loc\", \"category\", \"label\", \"0\", \"1\", \"2\", \"3\", \"4\", \"5\", \"6\", \"7\"]\n", "\n", "# Wrap the data in a DataFrame\n", "df = pd.DataFrame(data, columns=columns)" ] }, { "cell_type": "code", "execution_count": null, "id": "a667f501-ae94-4849-99e4-db9671e3cba5", "metadata": {}, "outputs": [], "source": [ "# Upload time series DataFrame (with multiple samples) to the project\n", "ids = []\n", "response = ei.experimental.data.upload_pandas_dataframe_wide(\n", " df,\n", " label_col=\"label\",\n", " category_col=\"category\",\n", " metadata_cols=[\"loc\"],\n", " data_col_start=3,\n", " sample_rate_ms=100,\n", ")\n", "assert len(response.fails) == 0, \"Could not upload some files\"\n", "for sample in response.successes:\n", " ids.append(sample.sample_id)" ] }, { "cell_type": "code", "execution_count": null, "id": "903d2d15-de3e-4d03-b67b-b31d26c9650e", "metadata": {}, "outputs": [], "source": [ "# Delete the samples from the Edge Impulse project\n", 
"for id in ids:\n", " ei.experimental.data.delete_sample_by_id(id)" ] }, { "cell_type": "markdown", "id": "5ea30ca6-df42-4083-bef3-171abaeb27cd", "metadata": {}, "source": [ "A DataFrame can also be divided into \"groups\" so you can upload multidimensional time series data." ] }, { "cell_type": "code", "execution_count": null, "id": "a6f6fea2-86a1-4eea-919e-3c6ba82d9b04", "metadata": {}, "outputs": [], "source": [ "# Create samples\n", "sample_data = [\n", " [\"desk\", \"sample 1\", \"training\", \"idle\", 0, -9.81, 0.03, 0.21],\n", " [\"desk\", \"sample 1\", \"training\", \"idle\", 0.01, -9.83, 0.04, 0.27],\n", " [\"desk\", \"sample 1\", \"training\", \"idle\", 0.02, -9.12, 0.03, 0.23],\n", " [\"desk\", \"sample 1\", \"training\", \"idle\", 0.03, -9.14, 0.01, 0.25],\n", " [\"field\", \"sample 2\", \"training\", \"wave\", 0, -9.56, 5.34, 1.21],\n", " [\"field\", \"sample 2\", \"training\", \"wave\", 0.01, -9.43, 1.37, 1.27],\n", " [\"field\", \"sample 2\", \"training\", \"wave\", 0.02, -9.22, -4.03, 1.23],\n", " [\"field\", \"sample 2\", \"training\", \"wave\", 0.03, -9.50, -0.98, 1.25],\n", "]\n", "columns = [\"loc\", \"sample_name\", \"category\", \"label\", \"timestamp\", \"accX\", \"accY\", \"accZ\"]\n", "\n", "# Wrap the data in a DataFrame\n", "df = pd.DataFrame(sample_data, columns=columns)" ] }, { "cell_type": "code", "execution_count": null, "id": "aa7ab0ba-83b9-4d5f-a9e5-ae6b378db176", "metadata": {}, "outputs": [], "source": [ "# Upload time series DataFrame (with multiple samples and multiple dimensions) to the project\n", "ids = []\n", "response = ei.experimental.data.upload_pandas_dataframe_with_group(\n", " df,\n", " group_by=\"sample_name\",\n", " timestamp_col=\"timestamp\",\n", " feature_cols=[\"accX\", \"accY\", \"accZ\"],\n", " label_col=\"label\",\n", " category_col=\"category\",\n", " metadata_cols=[\"loc\"]\n", ")\n", "assert len(response.fails) == 0, \"Could not upload some files\"\n", "for sample in response.successes:\n", " 
ids.append(sample.sample_id)" ] }, { "cell_type": "code", "execution_count": null, "id": "57f32e6f-ce89-40ee-bc31-b4f18e3d0624", "metadata": {}, "outputs": [], "source": [ "# Delete the samples from the Edge Impulse project\n", "for id in ids:\n", " ei.experimental.data.delete_sample_by_id(id)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.16" } }, "nbformat": 4, "nbformat_minor": 5 }