{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Toy weather data\n", "\n", "Here is an example of how to easily manipulate a toy weather dataset using\n", "xarray and other recommended Python libraries:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-01-27T15:43:36.127628Z", "start_time": "2020-01-27T15:43:36.081733Z" } }, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd\n", "import seaborn as sns\n", "\n", "import xarray as xr\n", "\n", "np.random.seed(123)\n", "\n", "xr.set_options(display_style=\"html\")\n", "\n", "times = pd.date_range(\"2000-01-01\", \"2001-12-31\", name=\"time\")\n", "annual_cycle = np.sin(2 * np.pi * (times.dayofyear.values / 365.25 - 0.28))\n", "\n", "base = 10 + 15 * annual_cycle.reshape(-1, 1)\n", "tmin_values = base + 3 * np.random.randn(annual_cycle.size, 3)\n", "tmax_values = base + 10 + 3 * np.random.randn(annual_cycle.size, 3)\n", "\n", "ds = xr.Dataset(\n", " {\n", " \"tmin\": ((\"time\", \"location\"), tmin_values),\n", " \"tmax\": ((\"time\", \"location\"), tmax_values),\n", " },\n", " {\"time\": times, \"location\": [\"IA\", \"IN\", \"IL\"]},\n", ")\n", "\n", "ds" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Examine a dataset with pandas and seaborn" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Convert to a pandas DataFrame" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-01-27T15:47:14.160297Z", "start_time": "2020-01-27T15:47:14.126738Z" } }, "outputs": [], "source": [ "df = ds.to_dataframe()\n", "df.head()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-01-27T15:47:32.682065Z", "start_time": "2020-01-27T15:47:32.652629Z" } }, "outputs": [], "source": [ "df.describe()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Visualize using pandas" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-01-27T15:47:34.617042Z", "start_time": "2020-01-27T15:47:34.282605Z" } }, "outputs": [], "source": [ "ds.mean(dim=\"location\").to_dataframe().plot()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Visualize using seaborn" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-01-27T15:47:37.643175Z", "start_time": "2020-01-27T15:47:37.202479Z" } }, "outputs": [], "source": [ "sns.pairplot(df.reset_index(), vars=ds.data_vars)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Probability of freeze by calendar month" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-01-27T15:48:11.241224Z", "start_time": "2020-01-27T15:48:11.211156Z" } }, "outputs": [], "source": [ "freeze = (ds[\"tmin\"] <= 0).groupby(\"time.month\").mean(\"time\")\n", "freeze" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-01-27T15:48:13.131247Z", "start_time": "2020-01-27T15:48:12.924985Z" } }, "outputs": [], "source": [ "freeze.to_pandas().plot()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Monthly averaging" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-01-27T15:48:08.498259Z", "start_time": "2020-01-27T15:48:08.210890Z" } }, "outputs": [], "source": [ "monthly_avg = ds.resample(time=\"1MS\").mean()\n", "monthly_avg.sel(location=\"IA\").to_dataframe().plot(style=\"s-\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that ``MS`` here refers to Month-Start; ``M`` labels Month-End (the last day of the month)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Calculate monthly anomalies" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In climatology, \"anomalies\" refer to the difference between observations and\n", "typical weather for a particular season. Unlike observations, anomalies should\n", "not show any seasonal cycle." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-01-27T15:49:34.855086Z", "start_time": "2020-01-27T15:49:34.406439Z" } }, "outputs": [], "source": [ "climatology = ds.groupby(\"time.month\").mean(\"time\")\n", "anomalies = ds.groupby(\"time.month\") - climatology\n", "anomalies.mean(\"location\").to_dataframe()[[\"tmin\", \"tmax\"]].plot()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Calculate standardized monthly anomalies" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can create standardized anomalies where the difference between the\n", "observations and the climatological monthly mean is\n", "divided by the climatological standard deviation." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-01-27T15:50:09.144586Z", "start_time": "2020-01-27T15:50:08.734682Z" } }, "outputs": [], "source": [ "climatology_mean = ds.groupby(\"time.month\").mean(\"time\")\n", "climatology_std = ds.groupby(\"time.month\").std(\"time\")\n", "stand_anomalies = xr.apply_ufunc(\n", " lambda x, m, s: (x - m) / s,\n", " ds.groupby(\"time.month\"),\n", " climatology_mean,\n", " climatology_std,\n", ")\n", "\n", "stand_anomalies.mean(\"location\").to_dataframe()[[\"tmin\", \"tmax\"]].plot()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Fill missing values with climatology" ] }, { "cell_type": "markdown", "metadata": { "ExecuteTime": { "end_time": "2020-01-27T15:50:46.192491Z", "start_time": "2020-01-27T15:50:46.174554Z" } }, "source": [ "The ``fillna`` method on grouped objects lets you easily fill missing values by group:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-01-27T15:51:40.279299Z", "start_time": "2020-01-27T15:51:40.220342Z" } }, "outputs": [], "source": [ "# throw away the first half of every month\n", "some_missing = ds.tmin.sel(time=ds[\"time.day\"] > 15).reindex_like(ds)\n", "filled = some_missing.groupby(\"time.month\").fillna(climatology.tmin)\n", "both = xr.Dataset({\"some_missing\": some_missing, \"filled\": filled})\n", "both" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-01-27T15:52:11.815769Z", "start_time": "2020-01-27T15:52:11.770825Z" } }, "outputs": [], "source": [ "df = both.sel(time=\"2000\").mean(\"location\").reset_coords(drop=True).to_dataframe()\n", "df.head()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-01-27T15:52:14.867866Z", "start_time": "2020-01-27T15:52:14.449684Z" } }, "outputs": [], "source": [ "df[[\"filled\", \"some_missing\"]].plot()" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.3" }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": true, "sideBar": true, "skip_h1_title": false, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": true, "toc_position": {}, "toc_section_display": true, "toc_window_display": false } }, "nbformat": 4, "nbformat_minor": 2 }