{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Ground Observatory Data - FTP\n",
    "\n",
    "> Authors: Luca Mariani, Clemens Kloss\n",
    ">\n",
    "> Abstract: Demonstrates ground observatory data by direct access to the  BGS FTP server (AUX_OBS dataset). Note that in the future there will be a VirES-based access method (work in progress)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a id=\"top\"/>\n",
    "\n",
    "## Contents\n",
    "\n",
    "- [Settings and functions](#settings)\n",
    "- [Hourly mean values](#obs)\n",
    "    - [Read data from ASCII files](#obs-read-ascii)\n",
    "    - [Read data from multiple files](#obs-multifiles)\n",
    "    - [Examples](#obs-examples)\n",
    "- [Minute and second mean values](#obsms)\n",
    "    - [Read data from CDF files](#obsms-read-cdf)\n",
    "    - [Read data from multiple files](#obsms-multifiles)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%load_ext watermark\n",
    "%watermark -i -v -p viresclient,pandas,xarray,matplotlib"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Python standard library\n",
    "import os\n",
    "import re\n",
    "from contextlib import closing\n",
    "from datetime import datetime\n",
    "from ftplib import FTP\n",
    "from pathlib import Path\n",
    "from tempfile import TemporaryFile\n",
    "from zipfile import ZipFile\n",
    "\n",
    "# Extra libraries\n",
    "import cartopy.crs as ccrs\n",
    "import cartopy.feature as cfeature\n",
    "import cdflib\n",
    "import matplotlib as mpl\n",
    "import matplotlib.pyplot as plt\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "from tqdm import tqdm\n",
    "from viresclient import SwarmRequest"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a id=\"settings\" />\n",
    "\n",
    "## Settings and functions\n",
    "\n",
    "[[TOP]](#top)\n",
    "\n",
    "First we define a number of functions to enable convenient searching, downloading and reading from the FTP server."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# FTP server\n",
    "HOST = 'ftp.nerc-murchison.ac.uk'\n",
    "\n",
    "# Local directories (update paths according to your environment)\n",
    "OBS_HOUR_LOCAL_DIR = Path('~/data/AUX_OBS/hour').expanduser()\n",
    "OBS_MINUTE_LOCAL_DIR = Path('~/data/AUX_OBS/minute').expanduser()\n",
    "OBS_SECOND_LOCAL_DIR = Path('~/data/AUX_OBS/second').expanduser()\n",
    "\n",
    "# Create directories to use\n",
    "os.makedirs(OBS_HOUR_LOCAL_DIR, exist_ok=True)\n",
    "os.makedirs(OBS_MINUTE_LOCAL_DIR, exist_ok=True)\n",
    "os.makedirs(OBS_SECOND_LOCAL_DIR, exist_ok=True)\n",
    "\n",
    "\n",
    "def search(obstype, start_date=None, end_date=None):\n",
    "    \"\"\"Search OBS data file on the FTP server.\n",
    "\n",
    "    Parameters\n",
    "    ----------\n",
    "    obstype : str\n",
    "        OBS file type: `hour`, `minute`, `second`.\n",
    "    start_date : str or numpy.datetime64\n",
    "        lower bound of the time interval (default: no time interval).\n",
    "    stop_date : str or numpy.datetime64\n",
    "        upper bound of the time interval (default: no time interval).\n",
    "    \n",
    "    Returns\n",
    "    -------\n",
    "    list(str)\n",
    "        OBS data files.\n",
    "        \n",
    "    Raises\n",
    "    ------\n",
    "    ValueError\n",
    "        if `obstype` is not valid.\n",
    "    ftplib.all_errors\n",
    "        in case of FTP errors.\n",
    "    \n",
    "    \"\"\"\n",
    "\n",
    "    OBS_HOUR_DIR = '/geomag/Swarm/AUX_OBS/hour'\n",
    "    OBS_MINUTE_DIR = '/geomag/Swarm/AUX_OBS/minute'\n",
    "    OBS_SECOND_DIR = '/geomag/Swarm/AUX_OBS/second'\n",
    "    PATTERN = re.compile(\n",
    "        r'SW_OPER_AUX_OBS[_MS]2__(?P<start>\\d{8}T\\d{6})_'\n",
    "        r'(?P<stop>\\d{8}T\\d{6})_\\d{4}\\.ZIP$'\n",
    "    )\n",
    "    MINDATE = np.datetime64('0000', 's')\n",
    "    MAXDATE = np.datetime64('9999', 's')\n",
    "    \n",
    "    def _callback(line, result, start_date, end_date):\n",
    "        if line[0] == '-':\n",
    "            match = PATTERN.match(line[56:])\n",
    "            if match:\n",
    "                start, stop = match.groupdict().values()\n",
    "                start = np.datetime64(datetime.strptime(start, '%Y%m%dT%H%M%S'))\n",
    "                stop = np.datetime64(datetime.strptime(stop, '%Y%m%dT%H%M%S'))\n",
    "                if end_date >= start and start_date <= stop:\n",
    "                    result.append(line[56:])\n",
    "\n",
    "    start_date = MINDATE if start_date is None else np.datetime64(start_date)\n",
    "    end_date = MAXDATE if end_date is None else np.datetime64(end_date)\n",
    "    paths = {\n",
    "        'hour': OBS_HOUR_DIR,\n",
    "        'minute': OBS_MINUTE_DIR,\n",
    "        'second': OBS_SECOND_DIR\n",
    "    }\n",
    "    if obstype not in paths:\n",
    "        raise ValueError(\n",
    "            f'obstype must be hour, minute or second, not {obstype}'\n",
    "        )\n",
    "\n",
    "    result = []\n",
    "    with FTP(HOST) as ftp:\n",
    "        ftp.login()\n",
    "        ftp.dir(paths[obstype], lambda line: _callback(line, result, start_date, end_date))\n",
    "    return [f'{paths[obstype]}/{name}' for name in sorted(result)]\n",
    "\n",
    "\n",
    "def loacal_search(obstype, start_date=None, end_date=None):\n",
    "    \"\"\"Search OBS data file on local filesystem.\n",
    "\n",
    "    Parameters\n",
    "    ----------\n",
    "    obstype : str\n",
    "        OBS file type: `hour`, `minute`, `second`.\n",
    "    start_date : str or numpy.datetime64\n",
    "        lower bound of the time interval (default: no time interval).\n",
    "    stop_date : str or numpy.datetime64\n",
    "        upper bound of the time interval (default: no time interval).\n",
    "    \n",
    "    Returns\n",
    "    -------\n",
    "    list(pathlib.Path)\n",
    "        OBS data files.\n",
    "        \n",
    "    Raises\n",
    "    ------\n",
    "    ValueError\n",
    "        if `obstype` is not valid.\n",
    "    \n",
    "    \"\"\"\n",
    "\n",
    "    PATTERN = re.compile(\n",
    "        r'SW_OPER_AUX_OBS[_MS]2__(?P<start>\\d{8}T\\d{6})_'\n",
    "        r'(?P<stop>\\d{8}T\\d{6})_\\d{4}\\.\\w{3}$'\n",
    "    )\n",
    "    MINDATE = np.datetime64('0000', 's')\n",
    "    MAXDATE = np.datetime64('9999', 's')\n",
    "    \n",
    "    start_date = MINDATE if start_date is None else np.datetime64(start_date)\n",
    "    end_date = MAXDATE if end_date is None else np.datetime64(end_date)\n",
    "    paths = {\n",
    "        'hour': OBS_HOUR_LOCAL_DIR,\n",
    "        'minute': OBS_MINUTE_LOCAL_DIR,\n",
    "        'second': OBS_SECOND_LOCAL_DIR\n",
    "    }\n",
    "    if obstype not in paths:\n",
    "        raise ValueError(\n",
    "            f'obstype must be hour, minute or second, not {obstype}'\n",
    "        )\n",
    "\n",
    "    result = []\n",
    "    for file in (elm for elm in paths[obstype].iterdir() if elm.is_file()):\n",
    "        match = PATTERN.match(file.name)\n",
    "        if match:\n",
    "            start, stop = match.groupdict().values()\n",
    "            start = np.datetime64(datetime.strptime(start, '%Y%m%dT%H%M%S'))\n",
    "            stop = np.datetime64(datetime.strptime(stop, '%Y%m%dT%H%M%S'))\n",
    "            if end_date >= start and start_date <= stop:\n",
    "                result.append(file)\n",
    "    return sorted(result)\n",
    "\n",
    "\n",
    "def download(files, outdir='', show_progress=True):\n",
    "    \"\"\"Download files from the FTP server.\n",
    "\n",
    "    Parameters\n",
    "    ----------\n",
    "    outdir : str or os.PathLike\n",
    "        output directory (default: current directory).\n",
    "    files : collections.abc.Iterable(str)\n",
    "        path(s) of the file(s) to be downloaded\n",
    "    \n",
    "    Returns\n",
    "    -------\n",
    "    list(pathlib.Path)\n",
    "        list of downloaded files.\n",
    "        \n",
    "    Raises\n",
    "    ------\n",
    "    ftplib.all_errors\n",
    "        in case of FTP errors.\n",
    "    \n",
    "    \"\"\"\n",
    "    def _callback(data, fh, pbar):\n",
    "        pbar.update(len(data))\n",
    "        fh.write(data)\n",
    "\n",
    "    outdir = Path(outdir)\n",
    "    downloaded = []\n",
    "    with FTP(HOST) as ftp:\n",
    "        ftp.login()\n",
    "        for file in files:\n",
    "            file = str(file)\n",
    "            basename = file.split('/')[-1]\n",
    "            with TemporaryFile(dir=outdir) as tmp:\n",
    "                with tqdm(total=ftp.size(file), unit='B',\n",
    "                          unit_scale=True, desc=basename,\n",
    "                          disable=not show_progress) as pbar:\n",
    "                    ftp.retrbinary(f'RETR {file}', callback=lambda x: _callback(x, tmp, pbar))\n",
    "                    with ZipFile(tmp) as zf:\n",
    "                        hdr = Path(basename).with_suffix('.HDR').name\n",
    "                        datafile = [elm for elm in zf.namelist()if elm != hdr][0]\n",
    "                        outfile = zf.extract(datafile, outdir)\n",
    "                        downloaded.append(Path(outfile))\n",
    "    return downloaded\n",
    "\n",
    "\n",
    "def ascii_to_pandas(file):\n",
    "    \"\"\"Convert an OBS ASCII file to a pandas DataFrame.\n",
    "    \n",
    "    Parameters\n",
    "    ----------\n",
    "    file : str or os.PathLike\n",
    "        OBS ASCII file.\n",
    "    \n",
    "    Returns\n",
    "    -------\n",
    "    pandas.DataFrame\n",
    "        data contained in the OBS ASCII file.\n",
    "\n",
    "    \"\"\"\n",
    "    df = pd.read_csv(\n",
    "        file,\n",
    "        comment='#',\n",
    "        delim_whitespace=True,\n",
    "        names = ['IAGA_code', 'Latitude', 'Longitude', 'Radius',\n",
    "                 'yyyy', 'mm', 'dd', 'UT', 'B_N', 'B_E', 'B_C'],\n",
    "        parse_dates={'Timestamp': [4, 5, 6]},\n",
    "        infer_datetime_format=True\n",
    "    )\n",
    "    df['Timestamp'] = df['Timestamp'] + pd.to_timedelta(df['UT'], 'h')\n",
    "    df.drop(columns='UT', inplace=True)\n",
    "    df.set_index('Timestamp', inplace=True)\n",
    "    return df\n",
    "\n",
    "\n",
    "def cdf_to_pandas(file):\n",
    "    \"\"\"Convert an OBS CDF file to a pandas DataFrame.\n",
    "    \n",
    "    Parameters\n",
    "    ----------\n",
    "    file : str or os.PathLike\n",
    "        OBS CDF file.\n",
    "    \n",
    "    Returns\n",
    "    -------\n",
    "    pandas.DataFrame\n",
    "        data contained in the OBS CDF file.\n",
    "\n",
    "    \"\"\"\n",
    "    with closing(cdflib.cdfread.CDF(file)) as data:\n",
    "        ts = pd.DatetimeIndex(\n",
    "            cdflib.cdfepoch.encode(data.varget('Timestamp'), iso_8601=True),\n",
    "            name='Timestamp'\n",
    "        )\n",
    "        df = pd.DataFrame(\n",
    "            {\n",
    "                'IAGA_code': data.varget('IAGA_code')[:,0,0],\n",
    "                'Latitude': data.varget('Latitude'),\n",
    "                'Longitude': data.varget('Longitude'),\n",
    "                'Radius': data.varget('Radius'),\n",
    "                'B_N': data.varget('B_NEC')[:,0],\n",
    "                'B_E': data.varget('B_NEC')[:,1],\n",
    "                'B_C': data.varget('B_NEC')[:,2]\n",
    "            },\n",
    "            index=ts\n",
    "        )\n",
    "    return df\n",
    "\n",
    "\n",
    "def download_obslist(outdir=''):\n",
    "    \"\"\"Search observatory list file on the FTP server.\n",
    "\n",
    "    Parameters\n",
    "    ----------\n",
    "    outdir : str or os.PathLike\n",
    "        output directory (default: current directory).\n",
    "\n",
    "    Returns\n",
    "    -------\n",
    "    str\n",
    "        Observatory list file.\n",
    "        \n",
    "    Raises\n",
    "    ------\n",
    "    ftplib.all_errors\n",
    "        in case of FTP errors.\n",
    "    \n",
    "    \"\"\"\n",
    "\n",
    "    OBS_HOUR_DIR = '/geomag/Swarm/AUX_OBS/hour'\n",
    "    \n",
    "    def _callback(line, result):\n",
    "        if line[0] == '-':\n",
    "            match = re.match('obslist.+_gd\\.all$', line[56:])\n",
    "            if match:\n",
    "                result.append(line[56:])\n",
    "\n",
    "    outdir = Path(outdir)\n",
    "    files = []\n",
    "    with FTP(HOST) as ftp:\n",
    "        ftp.login()\n",
    "        ftp.dir(OBS_HOUR_DIR, lambda line: _callback(line, files))\n",
    "        remote_obslist_file = f'{OBS_HOUR_DIR}/{files[0]}'\n",
    "        local_obslist_file = outdir / files[0]\n",
    "        with local_obslist_file.open('w') as fh:\n",
    "            ftp.retrlines(f'RETR {remote_obslist_file}', lambda line: print(line, file=fh))\n",
    "    return local_obslist_file\n",
    "        \n",
    "\n",
    "def read_obslist(file):\n",
    "    \"\"\"Convert observatory list  ASCII file to a pandas DataFrame.\n",
    "    \n",
    "    Parameters\n",
    "    ----------\n",
    "    file : str or os.PathLike\n",
    "        observatory list ASCII file.\n",
    "    \n",
    "    Returns\n",
    "    -------\n",
    "    pandas.DataFrame\n",
    "        data contained in the observatory list ASCII file.\n",
    "\n",
    "    \"\"\"\n",
    "    df = pd.read_csv(\n",
    "        file,\n",
    "        delim_whitespace=True,\n",
    "        names = ['IAGA_code', 'Latitude', 'Longitude', 'Altitude'],\n",
    "    )\n",
    "    return df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a id=\"obs\" />\n",
    "\n",
    "## Hourly mean values\n",
    "\n",
    "[[TOP]](#top)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Hourly means hosted at:\n",
    "- ftp://ftp.nerc-murchison.ac.uk/geomag/Swarm/AUX_OBS/hour/\n",
    "\n",
    "Processing methodology:\n",
    "- Macmillan, S., Olsen, N. Observatory data and the Swarm mission. Earth Planet Sp 65, 15 (2013). https://doi.org/10.5047/eps.2013.07.011"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a id=\"obs-read-ascii\" />\n",
    "\n",
    "### Read data from ASCII files\n",
    "\n",
    "[[TOP]](#top)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Use the `search()` function (see [Settings and functions](#settings)) to search OBS hourly data from 2018-01-01T00:00:00 to 2019-12-31T23:59:59 on the FTP server:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "result = search('hour', '2018-01-01', '2019-12-31T23:59:59')\n",
    "result"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Use the `download()` function (see [Settings and functions](#settings)) to download data:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "downloaded = download(result, outdir=OBS_HOUR_LOCAL_DIR)\n",
    "downloaded"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Select one of the AUX_OBS_2_ files (e.g. the first one):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "file1 = downloaded[0]\n",
    "file1"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Read ASCII file and convert data to a `pandas.DataFrame`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df1 = pd.read_csv(\n",
    "    file1,\n",
    "    comment='#',\n",
    "    delim_whitespace=True,\n",
    "    names = ['IAGA_code', 'Latitude', 'Longitude',\n",
    "             'Radius', 'yyyy', 'mm', 'dd', 'UT', 'B_N', 'B_E', 'B_C'],\n",
    "    parse_dates={'Timestamp': [4, 5, 6]},\n",
    "    infer_datetime_format=True\n",
    ")\n",
    "df1['Timestamp'] = df1['Timestamp'] + pd.to_timedelta(df1['UT'], 'h')\n",
    "df1.drop(columns='UT', inplace=True)\n",
    "df1.set_index('Timestamp', inplace=True)\n",
    "\n",
    "df1"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For more information on `pandas.Dataframe` see: https://pandas.pydata.org/docs/reference/frame."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The same result can be obtained with the `ascii_to_pandas()` function (see [Settings and functions](#settings))."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "new = ascii_to_pandas(file1)\n",
    "new"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Compare the two data frames:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "pd.testing.assert_frame_equal(df1, new)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Example: get minimum and maximum dates:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df1.index.min(), df1.index.max()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Example: get list of observatories (IAGA codes) stored in the files:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df1['IAGA_code'].unique()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a id=\"obs-multifiles\" />\n",
    "\n",
    "### Read data from multiple files\n",
    "\n",
    "[[TOP]](#top)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Pandas dataframes can be concatenated to represent data obtained from more than one file. E.g. read data from the next AUX_OBS_2_ file:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "file2 = downloaded[1]\n",
    "df2 = ascii_to_pandas(file2)\n",
    "df2"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The two dataframes can be concatenated using the `pandas.concat()` function (for more information see: https://pandas.pydata.org/docs/reference/api/pandas.concat.html#pandas.concat):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "concatenated = pd.concat([df1, df2])\n",
    "concatenated.sort_values(by=['IAGA_code', 'Timestamp'], inplace=True)\n",
    "\n",
    "concatenated.index.min(), concatenated.index.max()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "concatenated"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a id=\"obs-examples\"/>\n",
    "\n",
    "### Examples\n",
    "\n",
    "[[TOP]](#top)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Plot hourly mean values on a map:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df = ascii_to_pandas(file1)\n",
    "\n",
    "# Add F column\n",
    "df['F'] = np.linalg.norm(df[['B_N', 'B_E', 'B_C']], axis=1)\n",
    "\n",
    "# Select date\n",
    "date = '2018-01-01T01:30:00'\n",
    "\n",
    "fig = plt.figure(figsize=(16, 10))\n",
    "\n",
    "# Draw map\n",
    "ax = plt.subplot2grid((1, 1), (0, 0), projection=ccrs.PlateCarree())\n",
    "ax.coastlines()\n",
    "ax.add_feature(cfeature.OCEAN, facecolor='lightgrey')\n",
    "ax.gridlines()\n",
    "\n",
    "# Plot observatory measurements at date\n",
    "cm = ax.scatter(\n",
    "    df[date]['Longitude'], df[date]['Latitude'], c=df[date]['F'],\n",
    "    marker='D', transform=ccrs.PlateCarree(),\n",
    "    label=f'OBS F hourly mean value at {date}'\n",
    ")\n",
    "\n",
    "# Add IAGA codes\n",
    "for row in df[date].itertuples():\n",
    "    ax.annotate(\n",
    "        row.IAGA_code, (row.Longitude, row.Latitude),\n",
    "        xycoords=ccrs.PlateCarree()._as_mpl_transform(ax)\n",
    "    )\n",
    "\n",
    "# Set title and legendbb\n",
    "plt.title('Magnetic field intensities')\n",
    "plt.legend()\n",
    "\n",
    "# Add colorbar\n",
    "cax = fig.add_axes([0.92, 0.2, 0.02, 0.6])\n",
    "plt.colorbar(cm, cax=cax, label='F [nT]')\n",
    "\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Read list of all observatories (use the `download_obslist()` and `read_obslist()` functions defined in [Settings and functions](#settings)):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "obslist = download_obslist(outdir=OBS_HOUR_LOCAL_DIR)\n",
    "obslist"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "obs = read_obslist(obslist)\n",
    "obs"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Add the missing observatories, i.e. those not included in the observatory hourly mean values, to the plot:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df = ascii_to_pandas(file1)\n",
    "\n",
    "# Add F column\n",
    "df['F'] = np.linalg.norm(df[['B_N', 'B_E', 'B_C']], axis=1)\n",
    "\n",
    "# Select date\n",
    "date = '2018-01-01T01:30:00'\n",
    "\n",
    "fig = plt.figure(figsize=(16, 10))\n",
    "\n",
    "# Draw map\n",
    "ax = plt.subplot2grid((1, 1), (0, 0), projection=ccrs.PlateCarree())\n",
    "ax.coastlines()\n",
    "ax.add_feature(cfeature.OCEAN, facecolor='lightgrey')\n",
    "ax.gridlines()\n",
    "\n",
    "# Plot observatory measurements at date\n",
    "cm = ax.scatter(\n",
    "    df[date]['Longitude'], df[date]['Latitude'], c=df[date]['F'],\n",
    "    marker='D', transform=ccrs.PlateCarree(),\n",
    "    label=f'OBS F hourly mean value at {date}'\n",
    ")\n",
    "\n",
    "# Add IAGA codes\n",
    "for row in df[date].itertuples():\n",
    "    ax.annotate(\n",
    "        row.IAGA_code, (row.Longitude, row.Latitude),\n",
    "        xycoords=ccrs.PlateCarree()._as_mpl_transform(ax)\n",
    "    )\n",
    "\n",
    "# Add missing observatories from obslist (position only)\n",
    "missing = obs[~obs['IAGA_code'].isin(df[date]['IAGA_code'].unique())]\n",
    "cm2 = ax.scatter(missing['Longitude'], missing['Latitude'], c='black', marker='D', alpha=0.1)\n",
    "\n",
    "# Set title and legendbb\n",
    "plt.title('Magnetic field intensities')\n",
    "plt.legend()\n",
    "\n",
    "# Add colorbar\n",
    "cax = fig.add_axes([0.92, 0.2, 0.02, 0.6])\n",
    "plt.colorbar(cm, cax=cax, label='F [nT]')\n",
    "\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Add Swarm F measurements between 01:00:00 and 02:00:00 of the same day:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# using viresclient\n",
    "request = SwarmRequest()\n",
    "request.set_collection('SW_OPER_MAGA_LR_1B')\n",
    "request.set_products(measurements='F')\n",
    "\n",
    "start_date = '2018-01-01T01:00:00'\n",
    "end_date = '2018-01-01T02:00:00'\n",
    "\n",
    "data = request.get_between(start_date, end_date)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df = ascii_to_pandas(file1)\n",
    "\n",
    "# Add F column\n",
    "df['F'] = np.linalg.norm(df[['B_N', 'B_E', 'B_C']], axis=1)\n",
    "\n",
    "# Select date\n",
    "date = '2018-01-01T01:30:00'\n",
    "\n",
    "fig = plt.figure(figsize=(16, 10))\n",
    "\n",
    "# Draw map\n",
    "ax = plt.subplot2grid((1, 1), (0, 0), projection=ccrs.PlateCarree())\n",
    "ax.coastlines()\n",
    "ax.add_feature(cfeature.OCEAN, facecolor='lightgrey')\n",
    "ax.gridlines()\n",
    "\n",
    "# Plot observatory measurements at date\n",
    "cm = ax.scatter(\n",
    "    df[date]['Longitude'], df[date]['Latitude'], c=df[date]['F'],\n",
    "    marker='D', transform=ccrs.PlateCarree(),\n",
    "    label=f'OBS F hourly mean value at {date}'\n",
    ")\n",
    "\n",
    "# Add IAGA codes\n",
    "for row in df[date].itertuples():\n",
    "    ax.annotate(\n",
    "        row.IAGA_code, (row.Longitude, row.Latitude),\n",
    "        xycoords=ccrs.PlateCarree()._as_mpl_transform(ax)\n",
    "    )\n",
    "\n",
    "# Add missing observatories from obslist (position only)\n",
    "missing = obs[~obs['IAGA_code'].isin(df[date]['IAGA_code'].unique())]\n",
    "ax.scatter(missing['Longitude'], missing['Latitude'], c='black', marker='D', alpha=0.1)\n",
    "\n",
    "# Add Swarm A data\n",
    "swarm = data.as_dataframe()\n",
    "ax.scatter(\n",
    "    swarm['Longitude'], swarm['Latitude'], c=swarm['F'],\n",
    "    transform=ccrs.PlateCarree(),\n",
    "    label=f'Swarm A - F measurements between {start_date} and {end_date}'\n",
    ")\n",
    "\n",
    "# Set title and legendbb\n",
    "plt.title('Magnetic field intensities')\n",
    "plt.legend()\n",
    "\n",
    "# Add colorbar\n",
    "cax = fig.add_axes([0.92, 0.2, 0.02, 0.6])\n",
    "plt.colorbar(cm, cax=cax, label='F [nT]')\n",
    "\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a id=\"obsms\" />\n",
    "\n",
    "## Minute and second mean values\n",
    "\n",
    "[[TOP]](#top)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Files containing observatory minute and second mean values have CDF format. They can be downloade from:\n",
    "\n",
    "- ftp://ftp.nerc-murchison.ac.uk/geomag/Swarm/AUX_OBS/minute/\n",
    "- ftp://ftp.nerc-murchison.ac.uk/geomag/Swarm/AUX_OBS/second/"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a id=\"obsms-read-cdf\" />\n",
    "\n",
    "### Read data from CDF files\n",
    "\n",
    "[[TOP]](#top)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Use the `search()` function (see [Settings and functions](#settings)) to search OBS minute/second data from 2019-12-01T00:00:00 to 2019-12-31T23:59:59 on the FTP server:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "minute = search('minute', '2019-12-01', '2019-12-31T23:59:59')\n",
    "minute"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "second = search('second', '2019-12-01', '2019-12-31T23:59:59')\n",
    "second"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Use the `download()` function (see [Settings and functions](#settings)) to download data:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "dl_minute = download(minute, outdir=OBS_MINUTE_LOCAL_DIR)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "dl_second = download(second, outdir=OBS_SECOND_LOCAL_DIR)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Select one of the AUX_OBSM2_ files (e.g. the first one):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "file1 = dl_minute[0]\n",
    "file1"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Read CDF file using `cdflib` (for more information on `cdflib`, see: https://github.com/MAVENSDC/cdflib)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "data = cdflib.CDF(file1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Get info about the file as a Python dictionary:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "data.cdf_info()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can see that measurements are stored as *zVariables*:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "data.cdf_info()['zVariables']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Data can be retrieved via the `.varget()` method, e.g:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "data.varget('B_NEC')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Data is returned as a `numpy.ndarray` object (for more information on `numpy.ndarray`, see: https://docs.scipy.org/doc/numpy/reference/arrays.ndarray.html)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Variable attributes can be retrieved using the `.varattsget()` method, e.g.:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "data.varattsget('B_NEC')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Attributes are returned as a Python dictionary."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's retrieve the timestamps:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "data.varget('Timestamp')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`Timestamp` type is:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "data.varget('Timestamp').dtype"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Timestamps are represented as NumPy `float64` values. Why? Get info about `Timestamp` variable using the `.varinq()` method:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "data.varinq('Timestamp')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The returned dictionary shows that the data type is *CDF_EPOCH* consising in a floating point value representing the number of milliseconds since 01-Jan-0000 00:00:00.000. It can be converted to a more readable format (list of strings) using the `cdflib.cdfepoch.encode()` function:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "ts = cdflib.cdfepoch.encode(data.varget('Timestamp'), iso_8601=True)\n",
    "ts[:5]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Or to a numpy array of `numpy.datetime64` values:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "ts = np.array(cdflib.cdfepoch.encode(data.varget('Timestamp'), iso_8601=True), dtype='datetime64')\n",
    "ts[:5]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You may be interested also in the CDF global attributes:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "data.globalattsget()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Close the file when you have finished:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "data.close()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "AUX_OBSS2_ data contains the same variables:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "with closing(cdflib.cdfread.CDF(dl_second[0])) as data:\n",
    "    zvariables = data.cdf_info()['zVariables']\n",
    "\n",
    "zvariables"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Data can be represented as a `pandas.DataFrame` object:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "with closing(cdflib.cdfread.CDF(file1)) as data:\n",
    "    ts = pd.DatetimeIndex(\n",
    "            cdflib.cdfepoch.encode(data.varget('Timestamp'), iso_8601=True),\n",
    "            name='Timestamp'\n",
    "        )\n",
    "    df1 = pd.DataFrame(\n",
    "        {\n",
    "            'IAGA_code': data.varget('IAGA_code')[:,0,0],\n",
    "            'Latitude': data.varget('Latitude'),\n",
    "            'Longitude': data.varget('Longitude'),\n",
    "            'Radius': data.varget('Radius'),\n",
    "            'B_N': data.varget('B_NEC')[:,0],\n",
    "            'B_E': data.varget('B_NEC')[:,1],\n",
    "            'B_C': data.varget('B_NEC')[:,2]\n",
    "        },\n",
    "        index=ts\n",
    "    )\n",
    "\n",
    "df1"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For more information on `pandas.Dataframe` see: https://pandas.pydata.org/docs/reference/frame."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The same result can be obtained with the `cdf_to_pandas()` function (see [Settings and functions](#settings))."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "new = cdf_to_pandas(file1)\n",
    "\n",
    "new"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Compare the two data frames:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "pd.testing.assert_frame_equal(df1, new)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Example: get minimum and maximum dates:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df1.index.min(), df1.index.max()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Example: get list of observatories (IAGA codes) stored in the files:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df1['IAGA_code'].unique()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Example: get list of observatories (IAGA codes) included in the following ranges of coordinates:\n",
    "- $30 \\leq Latitude \\leq 70$\n",
    "- $-10 \\leq Longitude \\leq 40$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df1[(df1['Latitude'] >= 30) & (df1['Latitude'] <= 70) & (df1['Longitude'] >= -10) & (df1['Longitude'] <= 40)]['IAGA_code'].unique()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can do the same using the `.query()` method (see: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.query.html#pandas.DataFrame.query):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df1.query('(30 <= Latitude <= 70) and (-10 <= Longitude <= 40)')['IAGA_code'].unique()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a id=\"obsms-multifiles\" />\n",
    "\n",
    "### Read data from multiple files\n",
    "\n",
    "[[TOP]](#top)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Pandas dataframes can be concatenated to represent data obtained from more than one file. E.g. read data from the next AUX_OBSM2_ file:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "file2 = dl_minute[1]\n",
    "\n",
    "df2 = cdf_to_pandas(file2)\n",
    "\n",
    "df2"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The two dataframes can be concatenated using the `pandas.concat()` function (for more information see: https://pandas.pydata.org/docs/reference/api/pandas.concat.html#pandas.concat):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "concatenated = pd.concat([df1, df2])\n",
    "concatenated.sort_values(by=['IAGA_code', 'Timestamp'], inplace=True)\n",
    "\n",
    "concatenated.index.min(), concatenated.index.max()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "concatenated"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "With AUX_OBSS2_ data:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "files = dl_second[:2]\n",
    "\n",
    "files"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "concatenated = pd.concat([cdf_to_pandas(file) for file in files])\n",
    "concatenated.sort_values(by=['IAGA_code', 'Timestamp'], inplace=True)\n",
    "\n",
    "concatenated.index.min(), concatenated.index.max()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "concatenated"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}