{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Toy weather data\n",
    "\n",
    "Here is an example of how to easily manipulate a toy weather dataset using\n",
    "xarray and other recommended Python libraries:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd\n",
    "import seaborn as sns\n",
    "\n",
    "import xarray as xr\n",
    "%matplotlib inline"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-01-27T15:43:36.127628Z",
     "start_time": "2020-01-27T15:43:36.081733Z"
    }
   },
   "outputs": [],
   "source": [
    "np.random.seed(123)\n",
    "\n",
    "xr.set_options(display_style=\"html\")\n",
    "\n",
    "times = pd.date_range(\"2000-01-01\", \"2001-12-31\", name=\"time\")\n",
    "annual_cycle = np.sin(2 * np.pi * (times.dayofyear.values / 365.25 - 0.28))\n",
    "\n",
    "base = 10 + 15 * annual_cycle.reshape(-1, 1)\n",
    "tmin_values = base + 3 * np.random.randn(annual_cycle.size, 3)\n",
    "tmax_values = base + 10 + 3 * np.random.randn(annual_cycle.size, 3)\n",
    "\n",
    "ds = xr.Dataset(\n",
    "    {\n",
    "        \"tmin\": ((\"time\", \"location\"), tmin_values),\n",
    "        \"tmax\": ((\"time\", \"location\"), tmax_values),\n",
    "    },\n",
    "    {\"time\": times, \"location\": [\"IA\", \"IN\", \"IL\"]},\n",
    ")\n",
    "\n",
    "ds"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Examine a dataset with pandas and seaborn"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Convert to a pandas DataFrame"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-01-27T15:47:14.160297Z",
     "start_time": "2020-01-27T15:47:14.126738Z"
    }
   },
   "outputs": [],
   "source": [
    "df = ds.to_dataframe()\n",
    "df.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-01-27T15:47:32.682065Z",
     "start_time": "2020-01-27T15:47:32.652629Z"
    }
   },
   "outputs": [],
   "source": [
    "df.describe()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Visualize using pandas"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-01-27T15:47:34.617042Z",
     "start_time": "2020-01-27T15:47:34.282605Z"
    }
   },
   "outputs": [],
   "source": [
    "ds.mean(dim=\"location\").to_dataframe().plot()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Visualize using seaborn"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-01-27T15:47:37.643175Z",
     "start_time": "2020-01-27T15:47:37.202479Z"
    }
   },
   "outputs": [],
   "source": [
    "sns.pairplot(df.reset_index(), vars=ds.data_vars)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Probability of freeze by calendar month"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-01-27T15:48:11.241224Z",
     "start_time": "2020-01-27T15:48:11.211156Z"
    }
   },
   "outputs": [],
   "source": [
    "freeze = (ds[\"tmin\"] <= 0).groupby(\"time.month\").mean(\"time\")\n",
    "freeze"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-01-27T15:48:13.131247Z",
     "start_time": "2020-01-27T15:48:12.924985Z"
    }
   },
   "outputs": [],
   "source": [
    "freeze.to_pandas().plot()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Monthly averaging"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-01-27T15:48:08.498259Z",
     "start_time": "2020-01-27T15:48:08.210890Z"
    }
   },
   "outputs": [],
   "source": [
    "monthly_avg = ds.resample(time=\"1MS\").mean()\n",
    "monthly_avg.sel(location=\"IA\").to_dataframe().plot(style=\"s-\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note that ``MS`` here refers to Month-Start; ``M`` labels Month-End (the last day of the month)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Calculate monthly anomalies"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In climatology, \"anomalies\" refer to the difference between observations and\n",
    "typical weather for a particular season. Unlike observations, anomalies should\n",
    "not show any seasonal cycle."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-01-27T15:49:34.855086Z",
     "start_time": "2020-01-27T15:49:34.406439Z"
    }
   },
   "outputs": [],
   "source": [
    "climatology = ds.groupby(\"time.month\").mean(\"time\")\n",
    "anomalies = ds.groupby(\"time.month\") - climatology\n",
    "anomalies.mean(\"location\").to_dataframe()[[\"tmin\", \"tmax\"]].plot()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Calculate standardized monthly anomalies"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can create standardized anomalies where the difference between the\n",
    "observations and the climatological monthly mean is\n",
    "divided by the climatological standard deviation."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-01-27T15:50:09.144586Z",
     "start_time": "2020-01-27T15:50:08.734682Z"
    }
   },
   "outputs": [],
   "source": [
    "climatology_mean = ds.groupby(\"time.month\").mean(\"time\")\n",
    "climatology_std = ds.groupby(\"time.month\").std(\"time\")\n",
    "stand_anomalies = xr.apply_ufunc(\n",
    "    lambda x, m, s: (x - m) / s,\n",
    "    ds.groupby(\"time.month\"),\n",
    "    climatology_mean,\n",
    "    climatology_std,\n",
    ")\n",
    "\n",
    "stand_anomalies.mean(\"location\").to_dataframe()[[\"tmin\", \"tmax\"]].plot()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Fill missing values with climatology"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-01-27T15:50:46.192491Z",
     "start_time": "2020-01-27T15:50:46.174554Z"
    }
   },
   "source": [
    "The ``fillna`` method on grouped objects lets you easily fill missing values by group:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-01-27T15:51:40.279299Z",
     "start_time": "2020-01-27T15:51:40.220342Z"
    }
   },
   "outputs": [],
   "source": [
    "# throw away the first half of every month\n",
    "some_missing = ds.tmin.sel(time=ds[\"time.day\"] > 15).reindex_like(ds)\n",
    "filled = some_missing.groupby(\"time.month\").fillna(climatology.tmin)\n",
    "both = xr.Dataset({\"some_missing\": some_missing, \"filled\": filled})\n",
    "both"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-01-27T15:52:11.815769Z",
     "start_time": "2020-01-27T15:52:11.770825Z"
    }
   },
   "outputs": [],
   "source": [
    "df = both.sel(time=\"2000\").mean(\"location\").reset_coords(drop=True).to_dataframe()\n",
    "df.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-01-27T15:52:14.867866Z",
     "start_time": "2020-01-27T15:52:14.449684Z"
    }
   },
   "outputs": [],
   "source": [
    "df[[\"filled\", \"some_missing\"]].plot()"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.3"
  },
  "toc": {
   "base_numbering": 1,
   "nav_menu": {},
   "number_sections": true,
   "sideBar": true,
   "skip_h1_title": false,
   "title_cell": "Table of Contents",
   "title_sidebar": "Contents",
   "toc_cell": true,
   "toc_position": {},
   "toc_section_display": true,
   "toc_window_display": false
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}