{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Exploring your Home Assistant data\n",
    "\n",
    "The goal of this page is to get you familiar with the data in your Home Assistant instance. The page you're reading right now is a Jupyter Notebook. These documents contain instructions for the user and embedded Python code to generate graphs and tables of your data. It's interactive so you can at any time change the code of any example and just press the ▶️ button to update the example with your changes! \n",
    "\n",
    "To get started, let's execute all examples on this page: in the menu at the top left, click on \"Run\" -> \"Run All Cells\"."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#!pip install HASS-data-detective # Install detective"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip show HASS-data-detective"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import detective.core as detective\n",
    "import detective.functions as functions\n",
    "import pandas as pd\n",
    "\n",
    "db = detective.db_from_hass_config()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Popular entities\n",
    "\n",
    "In the following example, we're going to explore your most popular entities and break it down per period of the day (morning/afternoon/evening/night).\n",
    "\n",
    "We will do this by looking at which services are getting called and which entities they targeted. To make the results more relevant, we will filter out any service call that happened because of another service call. So if a user turns on a script which turns on a light, we only count the interaction with the script and not with the light."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from collections import Counter, OrderedDict\n",
    "import json\n",
    "import sqlalchemy\n",
    "\n",
    "from detective.time import time_category, sqlalch_datetime, localize, TIME_CATEGORIES\n",
    "\n",
    "# Prepare a dictionary to track results\n",
    "results = OrderedDict((time_cat, Counter()) for time_cat in TIME_CATEGORIES)\n",
    "\n",
    "# We keep track of contexts that we processed so that we will only process\n",
    "# the first service call in a context, and not subsequent calls.\n",
    "context_processed = set()\n",
    "\n",
    "for event in db.perform_query(sqlalchemy.text(\"SELECT context_id_bin as context_id, event_data.shared_data as shared_data, from_unixtime(time_fired_ts) as time_fired FROM events LEFT JOIN event_data ON events.data_id = event_data.data_id  WHERE event_type_id = (select event_type_id from event_types where event_type = 'call_service') ORDER BY time_fired;\")):\n",
    "    entity_ids = None\n",
    "\n",
    "    # Skip if we have already processed an event that was part of this context\n",
    "    if event.context_id in context_processed:\n",
    "        continue\n",
    "\n",
    "    try:\n",
    "        event_data = json.loads(event.shared_data)\n",
    "    except ValueError:\n",
    "        continue\n",
    "\n",
    "    # Empty event data, skipping (shouldn't happen, but to be safe)\n",
    "    if not event_data:\n",
    "        continue\n",
    "\n",
    "    service_data = event_data.get('service_data')\n",
    "\n",
    "    # No service data found, skipping\n",
    "    if not service_data:\n",
    "        continue\n",
    "\n",
    "    entity_ids = service_data.get('entity_id')\n",
    "\n",
    "    # No entitiy IDs found, skip this event\n",
    "    if entity_ids is None:\n",
    "        continue\n",
    "\n",
    "    if not isinstance(entity_ids, list):\n",
    "        entity_ids = [entity_ids]\n",
    "\n",
    "    context_processed.add(event.context_id)\n",
    "\n",
    "    period = time_category(\n",
    "        localize(sqlalch_datetime(event.time_fired)))\n",
    "\n",
    "    for entity_id in entity_ids:\n",
    "        results[period][entity_id] += 1\n",
    "\n",
    "print(\"Most popular entities to interact with:\")\n",
    "\n",
    "RESULTS_TO_SHOW = 5\n",
    "\n",
    "for period, period_results in results.items():\n",
    "    print()\n",
    "    \n",
    "    entities = [\n",
    "        ent_id for (ent_id, count)\n",
    "        in period_results.most_common(RESULTS_TO_SHOW)\n",
    "    ]\n",
    "    \n",
    "    result = ', '.join(entities) if entities else '-'\n",
    "    print(f\"{period.capitalize()}: {result}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Next up\n",
    "\n",
    "Let's now use pandas to visualise the results."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df = pd.DataFrame.from_dict(results).fillna(0)\n",
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## View states\n",
    "Detective makes it easy to view your state data as a pandas dataframe."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%time\n",
    "\n",
    "df = db.fetch_all_sensor_data()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Our data is now in a Pandas dataframe. Lets show the head of the dataframe:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "It is necessary to do some formatting of the data before we can plot it, and detective provides several functions to assist. You should familiarise yourself with these functions and create your own."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df = df[df['domain']=='sensor']\n",
    "df = functions.generate_features(df)\n",
    "df = functions.format_dataframe(df)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Notice the new feature columns added. It is straightforward to create your own features, for example to add a day_of_week column"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df['day_of_week'] = df['last_changed'].apply(lambda x : x.dayofweek)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Plot some data\n",
    "First plot using [Seaborn](https://seaborn.pydata.org/)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#!pip install seaborn # Uncomment to install if required"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import seaborn as sns\n",
    "import matplotlib.pyplot as plt\n",
    "\n",
    "fig, ax = plt.subplots(1, figsize=(20,6))\n",
    "sns.lineplot(\n",
    "    x='last_changed', \n",
    "    y='state', \n",
    "    hue='entity_id', \n",
    "    data=df[df['device_class'] == 'temperature'], \n",
    "    ax=ax);"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}