{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Creating a [`DataFrame`](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html#pandas.DataFrame)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are quite a few ways to create a [`DataFrame`](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html) from existing objects.\n", "\n", "Let's explore!" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "# Setup\n", "import pandas as pd" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## From a 2-dimensional object" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If your data is already in rows and columns, you can pass it straight to the constructor. Row labels and column headings are generated automatically as integer ranges." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "<table>\n", "<thead><tr><th></th><th>0</th><th>1</th><th>2</th></tr></thead>\n", "<tbody>\n", "<tr><th>0</th><td>Craig</td><td>Dennis</td><td>42.42</td></tr>\n", "<tr><th>1</th><td>Treasure</td><td>Porth</td><td>25.00</td></tr>\n", "</tbody>\n", "</table>" ], "text/plain": [ " 0 1 2\n", "0 Craig Dennis 42.42\n", "1 Treasure Porth 25.00" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "test_users_list = [\n", "    ['Craig', 'Dennis', 42.42],\n", "    ['Treasure', 'Porth', 25.00]\n", "]\n", "\n", "pd.DataFrame(test_users_list)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Notice how both the row labels and the column headings are autogenerated. You can set them explicitly with the `index` and `columns` keyword arguments." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "<table>\n", "<thead><tr><th></th><th>first_name</th><th>last_name</th><th>balance</th></tr></thead>\n", "<tbody>\n", "<tr><th>craigsdennis</th><td>Craig</td><td>Dennis</td><td>42.42</td></tr>\n", "<tr><th>treasure</th><td>Treasure</td><td>Porth</td><td>25.00</td></tr>\n", "</tbody>\n", "</table>" ], "text/plain": [ " first_name last_name balance\n", "craigsdennis Craig Dennis 42.42\n", "treasure Treasure Porth 25.00" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.DataFrame(test_users_list, index=['craigsdennis', 'treasure'],\n", "             columns=['first_name', 'last_name', 'balance'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## From a dictionary" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As with a `Series`, if you do not specify the index, it is autogenerated as an integer range starting at 0." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "<table>\n", "<thead><tr><th></th><th>first_name</th><th>last_name</th><th>balance</th></tr></thead>\n", "<tbody>\n", "<tr><th>0</th><td>Craig</td><td>Dennis</td><td>42.42</td></tr>\n", "<tr><th>1</th><td>Treasure</td><td>Porth</td><td>25.00</td></tr>\n", "</tbody>\n", "</table>" ], "text/plain": [ " first_name last_name balance\n", "0 Craig Dennis 42.42\n", "1 Treasure Porth 25.00" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# By default, the dictionary maps each column name to an ordered list of values\n", "test_user_data = {\n", "    'first_name': ['Craig', 'Treasure'],\n", "    'last_name': ['Dennis', 'Porth'],\n", "    'balance': [42.42, 25.00]\n", "}\n", "\n", "pd.DataFrame(test_user_data)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And remember that you can specify the index by supplying the `index` keyword argument." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "<table>\n", "<thead><tr><th></th><th>first_name</th><th>last_name</th><th>balance</th></tr></thead>\n", "<tbody>\n", "<tr><th>craigsdennis</th><td>Craig</td><td>Dennis</td><td>42.42</td></tr>\n", "<tr><th>treasure</th><td>Treasure</td><td>Porth</td><td>25.00</td></tr>\n", "</tbody>\n", "</table>" ], "text/plain": [ " first_name last_name balance\n", "craigsdennis Craig Dennis 42.42\n", "treasure Treasure Porth 25.00" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.DataFrame(test_user_data, index=['craigsdennis', 'treasure'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### [`DataFrame.from_dict`](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.from_dict.html#pandas.DataFrame.from_dict) adds more options" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### The `orient` keyword\n", "The `orient` keyword lets you specify whether the keys of your dictionary become the row labels (`index`) or the column titles (`columns`). Note how the keys of the nested dictionaries define the columns. You could also pass a list to the `columns` keyword argument." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "<table>\n", "<thead><tr><th></th><th>first_name</th><th>last_name</th><th>balance</th></tr></thead>\n", "<tbody>\n", "<tr><th>craigsdennis</th><td>Craig</td><td>Dennis</td><td>42.42</td></tr>\n", "<tr><th>treasure</th><td>Treasure</td><td>Porth</td><td>25.00</td></tr>\n", "</tbody>\n", "</table>" ], "text/plain": [ " first_name last_name balance\n", "craigsdennis Craig Dennis 42.42\n", "treasure Treasure Porth 25.00" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "by_username = {\n", "    'craigsdennis': {\n", "        'first_name': 'Craig',\n", "        'last_name': 'Dennis',\n", "        'balance': 42.42\n", "    },\n", "    'treasure': {\n", "        'first_name': 'Treasure',\n", "        'last_name': 'Porth',\n", "        'balance': 25.00\n", "    }\n", "}\n", "\n", "pd.DataFrame.from_dict(by_username, orient='index')" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.0" } }, "nbformat": 4, "nbformat_minor": 2 }