\n",
"\n",
"In 1977, John Tukey, one of the great statisticians and mathematicians of all time, published a book entitled *Exploratory Data Analysis*. In it, he laid out general principles on how researchers should handle their first encounters with their data, before formal statistical inference. Most of us spend a lot of time doing exploratory data analysis, or EDA, without really knowing it. Mostly, EDA involves a graphical exploration of a data set.\n",
"\n",
"We start off with a few wise words from John Tukey himself."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Useful EDA advice from John Tukey\n",
"\n",
"- \"Exploratory data analysis can never be the whole story, but nothing else can serve as a foundation stone—as the first step.\"\n",
"\n",
" \n",
"\n",
"- \"In exploratory data analysis there can be no substitute for flexibility; for adapting what is calculated—and what we hope plotted—both to the needs of the situation and the clues that the data have already provided.\"\n",
"\n",
" \n",
"\n",
"- \"There is no excuse for failing to plot and look.\"\n",
"\n",
" \n",
"\n",
"- \"There is often no substitute for the detective's microscope - - or for the enlarging graphs.\"\n",
"\n",
" \n",
"\n",
"- \"Graphs force us to note the unexpected; nothing could be more important.\"\n",
"\n",
" \n",
"\n",
"- \"'Exploratory data analysis' is an attitude, a state of flexibility, a willingness to look for those things that we believe are not there, as well as those we believe to be there.\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## The tools of EDA\n",
"\n",
"Being able to load in a data set and quickly start exploring it graphically enables you to _think_ about your data set instead being mired in the mechanics of producing a plot. In the notebooks that follow in this lesson, we will learn how to use the Python-based tools for EDA. In particular, we will learn how to use [Pandas](https://pandas.pydata.org) to keep the data set organized and accessible, and [Bokeh](https://docs.bokeh.org/en/latest/) and [HoloViews](https://holoviews.org/) to make interactive graphics.\n",
"\n",
"Along the way, we will learn key concepts of data organization and display. Importantly, we will learn about **tidy data**, **split-apply-combine**, and how to **plot all of your data**.\n",
"\n",
"Before we march on this trajectory, though, we need to learn a bit about [Numpy](http://numpy.org/) and [Scipy](http://scipy.org/), which form the foundation upon which much of these tools are built."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.5"
}
},
"nbformat": 4,
"nbformat_minor": 4
}