{ "cells": [ { "cell_type": "markdown", "id": "2dd23757-7cba-44b2-844a-5eff30aa513c", "metadata": {}, "source": [ "# *When* should you create an object?" ] }, { "cell_type": "markdown", "id": "23edffdb-a40e-4ef3-abf7-3c5a0bf2953b", "metadata": {}, "source": [ "## Complicated Data Structures\n", "\n", "One good example of a use for classes and objects is when you have complicated data structures. What makes a data structure complicated? Well, think about it from the user's point of view. If the user can't simply `print(data)` and understand what the data is and how to use it from the output, then it's complicated. Obviously, there are varying degrees of complicated, but the data should be simple to access and use.\n", "\n", "Let's consider an example. Imagine that you have a large number of *instruments* that record a large number of different kinds of data with irregular temporal frequencies. Imagine that the *instrument* records each variable at different times, so there is no time data common to all variables.\n", "\n", "Because the data is potentially numerous and varied, there are usually many ways of *representing* the data. One way of representing this kind of data is with nested lists, tuples, and dictionaries. Since the data record frequencies are irregular, and each variable has it's own time data, it makes some sense to group the time and variable data *together* in a `tuple`, and then store multiple time-variable `tuples` in a `list`. Then, we can store each time-variable `tuple-list` in a `dict` with the name of each variable as the key in the dictionary. This might look like the following." ] }, { "cell_type": "markdown", "id": "0519927f-d9b3-48e8-b0fe-66832cfbfb46", "metadata": {}, "source": [ "### Example Data\n", "\n", "Let's assume that the *instument* writes its data to a CSV (comma-separated-value) file with the following format." ] }, { "cell_type": "code", "execution_count": null, "id": "f2a7b785-39d1-424f-b4d8-350f2b15ea5a", "metadata": {}, "outputs": [], "source": [ "%%writefile instrument1.csv\n", "B,0.891,-0.0178\n", "A,1.132,45.741\n", "B,1.852,-0.6319\n", "A,2.376,42.178\n", "B,3.017,-2.7863\n", "A,3.861,41.389\n", "A,5.142,42.687" ] }, { "cell_type": "markdown", "id": "fa0a329f-3ad6-4e1a-8d6a-39026e82670b", "metadata": {}, "source": [ "We can read this data into our format described above with the following function:" ] }, { "cell_type": "code", "execution_count": null, "id": "7141b9fa-1ee8-449f-8343-e3a15d967518", "metadata": {}, "outputs": [], "source": [ "def read_instrument_data(filename):\n", " instrument_data = {}\n", " with open(filename) as f:\n", " for record in f:\n", " var,time,value = record.split(',')\n", " if var not in instrument_data:\n", " instrument_data[var] = []\n", " instrument_data[var].append((float(time), float(value)))\n", " return instrument_data" ] }, { "cell_type": "code", "execution_count": null, "id": "06074107-272e-4d02-a007-15710bb8f344", "metadata": {}, "outputs": [], "source": [ "instrument_data = read_instrument_data('instrument1.csv')" ] }, { "cell_type": "markdown", "id": "9cee2982-8707-40c8-91c6-f42928709162", "metadata": {}, "source": [ "Now, let's try the `print(data)` test..." ] }, { "cell_type": "code", "execution_count": null, "id": "25038437-eb25-4505-bce4-dda4c8863b30", "metadata": {}, "outputs": [], "source": [ "print(instrument_data)" ] }, { "cell_type": "markdown", "id": "40b2ab6b-7e9c-43ee-839b-e1b9800223ec", "metadata": {}, "source": [ "Can you figure out what this data represents just from the output above? If not, what kinds of problems might you expect someone to have with a data structure like this?" ] }, { "cell_type": "markdown", "id": "799b6c55-cadd-44a3-81e3-d15b823892c9", "metadata": {}, "source": [ "<div class=\"alert alert-warning\">\n", " <b>Warning:</b>\n", " <p>If a user simply looked at this output, how would they know what the numbers are? The variables are named, but there are two numbers for each variable. Perhaps they can figure out that one is time, but perhaps they cannot. This means there is information stored with the data that is <em>not explicit</em>. And that is when problems occur!</p>\n", "</div>" ] }, { "cell_type": "markdown", "id": "69fd1377-7e80-42ed-beb2-b9207a22038e", "metadata": {}, "source": [ "### Example Functions\n", "\n", "Now, what kinds of operations do we want to have for this data? Probably lots of different functions, and we won't cover all of them here. Let's just think about a couple examples. \n", "\n", "- For example, when plotting a variable's data, you probably want to extract just the `time` data into its own list and just the `variable` data into its own list, instead of having the data mixed together.\n", "\n", "- For example, you might want to compute an integral of a variable using the Trapezoid rule.\n", "\n", "Let's just consider these examples for now." ] }, { "cell_type": "code", "execution_count": null, "id": "e2f161fc-e715-427b-8712-d00c72e8aab8", "metadata": {}, "outputs": [], "source": [ "def get_time(data, var):\n", " return [tpl[0] for tpl in data[var]]\n", "\n", "def get_variable(data, var):\n", " return [tpl[1] for tpl in data[var]]" ] }, { "cell_type": "code", "execution_count": null, "id": "cd67c7b2-4117-415b-9b37-c242356ed36c", "metadata": {}, "outputs": [], "source": [ "x = get_time(instrument_data, 'A')\n", "x" ] }, { "cell_type": "code", "execution_count": null, "id": "d04b8709-1d66-43e3-ab6f-b134a1919836", "metadata": {}, "outputs": [], "source": [ "y = get_variable(instrument_data, 'A')\n", "y" ] }, { "cell_type": "markdown", "id": "792e48b3-bbfd-4f59-ba77-ca8f7ac2fc3b", "metadata": {}, "source": [ "We've encoded information about the data structure (i.e., which tuple element is `time` and which is the `variable`?) into the functions themselves. Now, the user of the data structure doesn't have to know the details of the data itself to access the data." ] }, { "cell_type": "code", "execution_count": null, "id": "919fbb12-018e-4fc5-bc55-3d59f1416b79", "metadata": {}, "outputs": [], "source": [ "def integrate_trapezoid(data, var):\n", " x, y = list(map(list, zip(*data[var])))\n", " return sum(0.5*(x[i] - x[i-1])*(y[i] + y[i-1]) for i in range(1, len(x)))" ] }, { "cell_type": "code", "execution_count": null, "id": "805d1683-8afa-4401-b3d0-3a20a16bf09e", "metadata": {}, "outputs": [], "source": [ "integrate_trapezoid(instrument_data, 'A')" ] }, { "cell_type": "markdown", "id": "424c79dd-d7a3-4c3b-bb66-4b66068d4aa0", "metadata": {}, "source": [ "Is this reasonable? How could you check if this is correct?" ] }, { "cell_type": "markdown", "id": "c5a98bcb-a501-439c-a370-74ab9a6293ab", "metadata": {}, "source": [ "## Problems and Limitations\n", "\n", "What kinds of potential problems do you see with this approach?\n", "\n", "1. What happens if someone tries to \"add data\" to the `instrument_data` structure or modify the `instrument_data` structure themselves? How do they do that? Where do they put new data? If anyone needs to modify the original data, they need to know the format for that data or all of the functions will break.\n", "\n", "2. How does someone who is looking at the functions know what the `data` argument is or how it should look? They have to read the functions to figure out the structure of the `data` argument (and infer the structure of `instrument_data`).\n", "\n", "Are there any other problems?" ] }, { "cell_type": "markdown", "id": "258a58b1-6a36-44b3-abdb-142b2906673c", "metadata": {}, "source": [ "## Key Takeaway\n", "\n", "This means that the implementation of the *functions* and the structure of the *data* are intrinsicly related to each other. And to be safe, and to avoid the problems suggested above, it is usually encouraged to group these functions and data together into a `class` and to *hide* the structure of the data from the user entirely.\n", "\n", "***How might you do that?***" ] }, { "cell_type": "markdown", "id": "ce6be040-e4c3-4fe9-8ab4-06f86d1be0e1", "metadata": {}, "source": [ "### Exercise: Try grouping the data and functions above into an object" ] }, { "cell_type": "code", "execution_count": null, "id": "d9761702-9c99-444a-9e93-2ece3765b0c4", "metadata": {}, "outputs": [], "source": [ "# Try writing a class that groups the data and functions above" ] }, { "cell_type": "markdown", "id": "9c0078c4-5f5c-4cc5-ad4b-31fb5865b817", "metadata": {}, "source": [ "| | | |\n", "| :- | -- | -: |\n", "| [[Home]](../index.ipynb) | <img width=\"100%\" height=\"1\" src=\"../images/empty.png\"/> | [« Previous](06.ipynb) \\| [Next »](08.ipynb) |" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.7" } }, "nbformat": 4, "nbformat_minor": 5 }