{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# 3.1 Aggregation Tutorial\n", "\n", "## About\n", "\n", "This notebook contains a minimal example of running workflows on aggregates of jobs using **signac-flow**.\n", "\n", "## Author\n", "\n", "Hardik Ojha\n", "\n", "## Prerequisites\n", "\n", "This notebook requires the following packages:\n", "\n", "1. **signac-flow** >= 0.15\n", "2. numpy\n", "3. matplotlib\n", "\n", "Execute the command below to install the required packages. The quotes keep the shell from interpreting `>` as an output redirection:\n", "```bash\n", "pip install \"signac-flow>=0.15\" matplotlib numpy\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Definition\n", "\n", "Aggregation allows a **signac-flow** operation to act on multiple jobs at once, rather than on one job at a time.\n", "\n", "An aggregate is a subset of the jobs in a **signac** project. Aggregates are generated when a `flow.aggregator` object is provided to the `FlowProject.operation` decorator.\n", "\n", "Please refer to the [documentation](https://docs.signac.io/en/latest/aggregation.html) for detailed instructions on how to use aggregation." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Objective\n", "\n", "The goal of this project is to plot the daily temperature values stored in a **signac** data space, together with the average of all those temperatures." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Project Setup\n", "\n", "Before we initialize a **signac** project in the `projects/tutorial-aggregation` directory, we need to make sure that this directory does not already exist. If it does, uncomment the command in the cell below before executing it to remove the directory." 
] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "# !rm -rf projects/tutorial-aggregation" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "import datetime\n", "\n", "import matplotlib.pyplot as plt\n", "import numpy as np\n", "import signac\n", "from flow import FlowProject, aggregator\n", "\n", "# Setting default figure size\n", "plt.rcParams[\"figure.figsize\"] = (10, 4)\n", "\n", "\n", "# Initializing a signac project\n", "project = signac.init_project(\"projects/tutorial-aggregation\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Initializing the data space\n", "\n", "For the purpose of this notebook, we will create a synthetic dataset: a seasonal temperature curve with added random noise.\n", "\n", "Each **signac** job will have two state point parameters and one document value:\n", "\n", "- `job.statepoint[\"city\"]`: City for which data is being collected.\n", "- `job.statepoint[\"day\"]`: Day of the year.\n", "- `job.document[\"temperature\"]`: Average temperature for that day." 
] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "days = np.arange(365)\n", "\n", "\n", "def generate_temperatures(days, seed=None):\n", "    rng = np.random.default_rng(seed)\n", "    # Baseline temperature drawn once per dataset, between 10 and 20\n", "    avg_temperature = 10 + rng.random() * 10\n", "    # Seasonal cycle: coldest at day 0, warmest mid-year\n", "    annual_variation = -10 * np.cos(days / 365 * 2 * np.pi)\n", "    # Daily noise between 0 and 5\n", "    random_variation = 5 * rng.random(len(days))\n", "    temperatures = avg_temperature + annual_variation + random_variation\n", "    return temperatures\n", "\n", "\n", "temperatures = generate_temperatures(days, seed=123)\n", "\n", "for day, temperature in zip(days.tolist(), temperatures.tolist()):\n", "    # Create a signac job with the state point parameters 'city' and 'day',\n", "    # and store that day's temperature in the job document\n", "    statepoint = dict(city=\"Anytown\", day=day)\n", "    job = project.open_job(statepoint)\n", "    job.document[\"temperature\"] = temperature" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's look at the project schema to check the state point parameters of the jobs we just created." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "ProjectSchema(<len=2>)
{\n",
" 'city': 'str([Anytown], 1)',\n",
" 'day': 'int([0, 1, 2, ..., 363, 364], 365)',\n",
"}"
],
"text/plain": [
"ProjectSchema(