{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "795ebd80-8a0f-479b-a888-161a5dfacbae",
   "metadata": {},
   "source": [
    "# Read Phoenix data into MTH5\n",
    "\n",
    "This example demonstrates how to read Phoenix data into an MTH5 file.  The data comes from example data in [PhoenixGeoPy](https://github.com/torresolmx/PhoenixGeoPy). Here I downloaded those data into a local folder on my computer by forking the main branch.  "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "74989d46-d13b-47e8-8189-17f515cc736a",
   "metadata": {},
   "source": [
    "## Imports"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "d5083c86-ebea-41d1-ade6-87e2de073b84",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "2022-06-15 13:58:51,048 [line 135] mth5.setup_logger - INFO: Logging file can be found C:\\Users\\peaco\\Documents\\GitHub\\mth5\\logs\\mth5_debug.log\n"
     ]
    }
   ],
   "source": [
    "from pathlib import Path\n",
    "\n",
    "from mth5.mth5 import MTH5\n",
    "from mth5 import read_file\n",
    "from mth5.io.phoenix.readers.phx_json import ReceiverMetadataJSON"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b56259e6-b397-40c7-93e4-6ac30d1cdd68",
   "metadata": {},
   "source": [
    "## Data Directory\n",
    "\n",
    "Specify the station directory.  Phoenix files place each channel in a folder under the station directory named by the channel number.  There is also a `recmeta.json` file that has metadata output by the receiver that can be useful.  In the `PhoenixGeopPy/sample_data` there are 2 folders one for native data, these are `.bin` files which are the raw data in counts sampled at 24k.  There is also a folder for segmented files, these files are calibrated to millivolts and decimate or segment the data according to the recording configuration.  Most of the time you would use the segmented files? "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "6348263b-79f2-4bae-9981-cf4268d7a48b",
   "metadata": {},
   "outputs": [],
   "source": [
    "station_dir = Path(r\"c:\\Users\\peaco\\Documents\\GitHub\\PhoenixGeoPy\\sample_data\\segmented\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "63bd8db6-f8d7-44fe-91e1-fbf4cb3d493c",
   "metadata": {},
   "source": [
    "## Receiver Metadata\n",
    "\n",
    "The data logger or receiver will output a `JSON` file that contains useful metadata that is missing from the data files.  The `recmeta.json` file can be read into an object with methods to translate to `mt_metadata` objects."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "b97d19eb-6103-4468-b2bb-20c68a778cb9",
   "metadata": {},
   "outputs": [],
   "source": [
    "receiver_metadata = ReceiverMetadataJSON(r\"c:\\Users\\peaco\\Documents\\GitHub\\PhoenixGeoPy\\sample_data\\segmented\\recmeta.json\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5c310e9a-522c-453d-b230-eb704232cdb7",
   "metadata": {},
   "source": [
    "## Initiate MTH5\n",
    "\n",
    "First initiate an MTH5 file, can use the receiver metadata to fill in some `Survey` metadata"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "b4e44d4b-2173-4a9a-a6d7-f9335282eabd",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "2022-06-15 13:58:51,298 [line 590] mth5.mth5.MTH5.open_mth5 - WARNING: mth5_from_phoenix.h5 will be overwritten in 'w' mode\n",
      "2022-06-15 13:58:51,903 [line 655] mth5.mth5.MTH5._initialize_file - INFO: Initialized MTH5 0.2.0 file mth5_from_phoenix.h5 in mode w\n"
     ]
    }
   ],
   "source": [
    "m = MTH5()\n",
    "m.open_mth5(\"mth5_from_phoenix.h5\", \"w\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4dc1b671-69b1-4c14-8cd7-e9b73be0dcb1",
   "metadata": {},
   "source": [
    "### Add Survey"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "b628dba6-9f7f-48e5-9260-7e531fe37bfc",
   "metadata": {},
   "outputs": [],
   "source": [
    "survey_metadata = receiver_metadata.survey_metadata\n",
    "survey_group = m.add_survey(survey_metadata.id)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4bbd2943-4e0f-4b7e-8555-0ea10f5e822a",
   "metadata": {},
   "source": [
    "### Add Station\n",
    "\n",
    "Add a station and station metadata"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "bc7e61fb-6a13-43ce-89cf-db34d3015775",
   "metadata": {},
   "outputs": [],
   "source": [
    "station_metadata = receiver_metadata.station_metadata\n",
    "station_group = survey_group.stations_group.add_station(station_metadata.id, station_metadata=station_metadata)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "59d5bd8d-c10f-42f5-9a85-45332d92f48b",
   "metadata": {},
   "source": [
    "## Loop through channels\n",
    "\n",
    "Here we will loop through each channel which is a folder under the station directory.  Inside the folder are files with extensions of `.td_24k` and `td_150`.  \n",
    "\n",
    "- `.td_24k` are usually bursts of a few seconds of data sampled at 24k samples per second to get high frequency information.  When these files are read in the returned object is a list of `mth5.timeseries.ChannelTS` objects that represent each burst.\n",
    "- `td_150` is data continuously sampled at 150 samples per second.  These files usually have a set length, commonly an hour. The returned object is a `mth5.timeseries.ChannelTS`."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d731d64c-cb9a-4bb2-8095-3b7f9532652a",
   "metadata": {},
   "source": [
    "### Read Continuous data\n",
    "\n",
    "We only need to open the first `.td_150` file as it will automatically read the sequence of files.  We need to do this because the header of the `.td_150` file only contains the master header which has the start time of when the recording started and not the start time of the file, that's in the file name, which is not stable."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5223f5c8-3c5a-42c1-98de-1e4897dff91a",
   "metadata": {},
   "source": [
    "#### Add a Run for continuous data\n",
    "\n",
    "Here we will add a run for the continuous data labelled `sr150_001`.  This is just a suggestion, you could name it whatever makes sense to you. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "b720a2d2-e662-4a22-a2e7-2534ad84badc",
   "metadata": {},
   "outputs": [],
   "source": [
    "run_metadata = receiver_metadata.run_metadata\n",
    "run_metadata.id = \"sr150_001\"\n",
    "run_metadata.sample_rate = 150.\n",
    "continuous_run = station_group.add_run(run_metadata.id, run_metadata=run_metadata)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "52d6d0bf-2940-4eb4-bd45-b7f89c1f512c",
   "metadata": {},
   "outputs": [],
   "source": [
    "for ch_dir in station_dir.iterdir():\n",
    "    if ch_dir.is_dir():\n",
    "        ch_metadata = receiver_metadata.get_ch_metadata(int(ch_dir.stem))\n",
    "        # need to set sample rate to 0 so it does not override existing value\n",
    "        ch_metadata.sample_rate = 0\n",
    "        ch_150 = read_file(\n",
    "            sorted(list(ch_dir.glob(\"*.td_150\")))[0],\n",
    "            **{\"channel_map\":receiver_metadata.channel_map}\n",
    "        )\n",
    "        ch_150.channel_metadata.update(ch_metadata)\n",
    "        ch_dataset = continuous_run.from_channel_ts(ch_150)\n",
    "        \n",
    "continuous_run.validate_run_metadata()\n",
    "continuous_run.write_metadata()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4aa58e11-1b01-4e5d-87bc-4c1d09bf42e2",
   "metadata": {},
   "source": [
    "### Read Segmented data\n",
    "\n",
    "Segmented data are busts of high frequency sampling, typically a few seconds every few minutes.  This will create a log of runs, so we will label the runs sequentially `sr24k_001`.  Now you may need to add digits onto the end depending on how long your sampling was.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "640fc623-9207-4250-ad52-8e67e403bb32",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "2022-06-15T13:58:54 [line 201] <class 'mth5.io.phoenix.readers.segmented.decimated_segmented_reader.DecimatedSegmentedReader'>.DecimatedSegmentedReader.read_segments - INFO: Read 12 segments\n",
      "2022-06-15T13:58:57 [line 201] <class 'mth5.io.phoenix.readers.segmented.decimated_segmented_reader.DecimatedSegmentedReader'>.DecimatedSegmentedReader.read_segments - INFO: Read 12 segments\n",
      "2022-06-15T13:59:04 [line 201] <class 'mth5.io.phoenix.readers.segmented.decimated_segmented_reader.DecimatedSegmentedReader'>.DecimatedSegmentedReader.read_segments - INFO: Read 12 segments\n",
      "2022-06-15T13:59:13 [line 201] <class 'mth5.io.phoenix.readers.segmented.decimated_segmented_reader.DecimatedSegmentedReader'>.DecimatedSegmentedReader.read_segments - INFO: Read 12 segments\n"
     ]
    }
   ],
   "source": [
    "for ch_dir in station_dir.iterdir():\n",
    "    if ch_dir.is_dir():\n",
    "        ch_metadata = receiver_metadata.get_ch_metadata(int(ch_dir.stem))\n",
    "        # need to set sample rate to 0 so it does not override existing value\n",
    "        ch_metadata.sample_rate = 0\n",
    "        ch_segments = read_file(\n",
    "            sorted(list(ch_dir.glob(\"*.td_24k\")))[0],\n",
    "            **{\"channel_map\":receiver_metadata.channel_map}\n",
    "        )\n",
    "        for ii, seg_ts in enumerate(ch_segments):\n",
    "            # update run metadata\n",
    "    \n",
    "            run_id = f\"sr24k_{ii:03}\"\n",
    "            if not run_id in station_group.groups_list:\n",
    "                run_metadata.id = run_id\n",
    "                run_metadata.sample_rate = 24000\n",
    "                run_group = station_group.add_run(run_id, run_metadata=run_metadata)\n",
    "            else:\n",
    "                run_group = station_group.get_run(run_id)\n",
    "            \n",
    "            # update channel metadata\n",
    "            seg_ts.channel_metadata.update(ch_metadata)\n",
    "            \n",
    "            # add channel\n",
    "            run_group.from_channel_ts(seg_ts)\n",
    "            \n",
    "            # update run metadata\n",
    "            run_group.validate_run_metadata()\n",
    "            run_group.write_metadata()\n",
    "            "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0ed2923c-aa99-4769-9a41-17d0c0a1f1a5",
   "metadata": {},
   "source": [
    "#### Update metadata before closing\n",
    "\n",
    "Need to update the metadata to account for added stations, runs, and channels."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "eda9318c-d595-4a21-ba86-e44b055282b6",
   "metadata": {},
   "outputs": [],
   "source": [
    "\n",
    "station_group.validate_station_metadata()\n",
    "station_group.write_metadata()\n",
    "\n",
    "survey_group.update_survey_metadata()\n",
    "survey_group.write_metadata()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "1fe6acfb-924d-4ceb-b15f-7f7deb7028be",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "2022-06-15 14:00:34,014 [line 736] mth5.mth5.MTH5.close_mth5 - INFO: Flushing and closing mth5_from_phoenix.h5\n"
     ]
    }
   ],
   "source": [
    "m.close_mth5()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "f7662814-3ce8-469a-9755-52c5f7fdfc6c",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'compression': 'gzip',\n",
       " 'compression_opts': 9,\n",
       " 'shuffle': True,\n",
       " 'fletcher32': True}"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "m.dataset_options"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "caac19b6-f773-4fbb-a908-841114709ceb",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}