{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Creating a Series\n", "There are couple of ways to create a new [`Series`](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.html) from scratch. \n", "\n", "Let's explore." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "# Just like how NumPy is almost always abbreviated as np...\n", "import numpy as np\n", "# pandas is usually shortened to pd\n", "import pandas as pd" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Creating from a dictionary\n", "\n", "\n", "Let's use this sample data here we got from CashBox. They want to track the balances of their users. This is how much money each user currently has in their account. CashBox requires that users create a username.\n", "\n", "In our example, `test_balance_data` is just a standard Python dictionary the key is username, and the value is that user's current account balance. " ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "test_balance_data = {\n", " 'pasan': 20.00,\n", " 'treasure': 20.18,\n", " 'ashley': 1.05,\n", " 'craig': 42.42,\n", "}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `Series` constructor accepts any dict-like object" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "balances = pd.Series(test_balance_data)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Notice that labels have been set from the `test_balance_data.keys()` and the values are set from `test_balance_data.values()`" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "pasan 20.00\n", "treasure 20.18\n", "ashley 1.05\n", "craig 42.42\n", "dtype: float64" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "balances" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Creating from an Iterable\n", "\n", "You can pass any iterable as the first argument" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "unlabeled_balances = pd.Series([20.00, 20.18, 1.05, 42.42])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "*NOTE*: When labels are not present they're defaulted to incremental integers starting at 0" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 20.00\n", "1 20.18\n", "2 1.05\n", "3 42.42\n", "dtype: float64" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "unlabeled_balances" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can also provide the `index` argument which requires an iterable the same size as your data." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "labeled_balances = pd.Series(\n", " [20.00, 20.18, 1.05, 42.42],\n", " index=['pasan', 'treasure', 'ashley', 'craig']\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note, the order of the labels is guaranteed to match the same order of the supplied index. " ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "pasan 20.00\n", "treasure 20.18\n", "ashley 1.05\n", "craig 42.42\n", "dtype: float64" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "labeled_balances" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "One thing to remember is that a NumPy array is also iterable, so you can create a new `Series` from an `ndarray`. In fact, you'll find NumPy and Pandas get along very well together." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 20.00\n", "1 20.18\n", "2 1.05\n", "3 42.42\n", "dtype: float64" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ndbalances = np.array([20.00, 20.18, 1.05, 42.42])\n", "pd.Series(ndbalances)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Creating from a scalar and an index\n", "\n", "If you pass in a scalar, remember that is a single value, it will be broadcast to each of the keys specified in the `index` keyword argument." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "guil 20.0\n", "jay 20.0\n", "james 20.0\n", "ben 20.0\n", "nick 20.0\n", "dtype: float64" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.Series(20.00, index=[\"guil\", \"jay\", \"james\", \"ben\", \"nick\"])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In other words, each key is assigned the same scalar value for the entire `Series`." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Learn More\n", "* [Introduction to Data Structures - Series (pandas documentation)](https://pandas.pydata.org/pandas-docs/stable/dsintro.html#series)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.0" } }, "nbformat": 4, "nbformat_minor": 2 }