# [Fundamental Python Data Science Libraries: A Cheatsheet (Part 2/4)](https://hackernoon.com/fundamental-python-data-science-libraries-a-cheatsheet-part-2-4-fcf5fab9cdf1)

by [Lauren Glass](https://www.linkedin.com/in/laurenjglass/), [Hackernoon](https://hackernoon.com/), Jan. 17, 2018

## pandas

pandas is built on top of NumPy. It lets you store and manipulate data in a relational, table-like structure.

The library centers on two objects: the Series (1D) and the DataFrame (2D). Each allows you to set:

- an index, which lets you find and manipulate specific rows
- column names, which let you find and manipulate specific columns
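
As a minimal sketch of the two points above, the cell below sets an index and column names explicitly and then uses them for lookups. The fruit labels, column names, and values are made up for illustration.

```python
import pandas as pd

# A Series with an explicit index (labels are illustrative)
prices = pd.Series([1.5, 2.0, 3.25], index=['apple', 'banana', 'cherry'])

# A DataFrame with an explicit index and explicit column names
inventory = pd.DataFrame(
    [[1.5, 10], [2.0, 4], [3.25, 7]],
    index=['apple', 'banana', 'cherry'],
    columns=['price', 'stock'],
)

# Look up a Series value by its index label
banana_price = prices['banana']

# Look up a single DataFrame cell by row label and column name
cherry_stock = inventory.loc['cherry', 'stock']
```

If no index is given, pandas falls back to a default integer index starting at 0, which is what the cells below rely on.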



In [None]:
import pandas as pd
import numpy as np

In [None]:
# Series
future_array1 = [1,2,3,4,5,6]
array1 = np.array(future_array1)
s = pd.Series(array1)

In [None]:
s

In [None]:
# DataFrame
future_array2 = [2,4,6,8,10,12]
array2 = np.array(future_array2)
df = pd.DataFrame([array1, array2]) # each array becomes one row

In [None]:
df

In [None]:
# Series from a dictionary
future_series = {0: 'A', 1: 'B', 2: 'C'}
s = pd.Series(future_series)

In [None]:
s

In [None]:
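# DataFrame from a dictionary: keys become column names
# (the variable is not called `dict`, which would shadow the built-in type)
future_df = {'Normal': ['A', 'B', 'C'], 'Reverse': ['Z', 'Y', 'X']}
df = pd.DataFrame(future_df)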

In [None]:
df

In [None]:
# Load data from a file: the keyword argument index_col specifies which
# column in your CSV should become the index of the DataFrame
# uploaded_data = pd.read_csv("filename.csv", index_col=0)
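
Since the line above is commented out and needs a file on disk, here is a small runnable sketch of the same idea. It assumes an in-memory string as a stand-in for the CSV file; the contents and the names `csv_text`, `loaded`, and `loaded_default` are illustrative only.

```python
import io
import pandas as pd

# Stand-in for a CSV file on disk; the contents are made up for illustration
csv_text = """date,Column1,Column2
2016-01-01,0.1,0.2
2016-01-02,0.3,0.4
"""

# index_col=0 turns the first CSV column into the index instead of a data column
loaded = pd.read_csv(io.StringIO(csv_text), index_col=0)

# Without index_col, 'date' stays a regular column and pandas adds
# a default integer index
loaded_default = pd.read_csv(io.StringIO(csv_text))
```

Note that the index values here are plain strings. Parsing them as dates is a separate step that this sketch leaves out.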

In [None]:
# Use the Index
dates = pd.date_range("20160101", periods=6)
data = np.random.random((6,3))
column_names = ['Column1', 'Column2', 'Column3']
df = pd.DataFrame(data, index=dates, columns=column_names)

In [None]:
df

In [None]:
# Indexing a column
df['Column2'] # use the column name's string

In [None]:
# Indexing rows
df[0:2] # slice by position, like a list (the end position is excluded)

In [None]:
df['20160101':'20160102'] # slice by index label (both endpoints are included)

In [None]:
# Indexing multiple axes by label (.loc)
df.loc['20160101':'20160102',['Column1','Column3']]

In [None]:
# Indexing multiple axes by integer position (.iloc)
df.iloc[3:5, 0:2]
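
One detail that is easy to miss in the two cells above: label slices with `.loc` include both endpoints, while position slices with `.iloc` exclude the end, as in ordinary Python slicing. The sketch below rebuilds a small frame with deterministic values (an assumption, so the results are reproducible) to show the difference.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20160101", periods=6)
# Deterministic values 0..17 instead of random numbers, for reproducibility
df = pd.DataFrame(
    np.arange(18).reshape(6, 3),
    index=dates,
    columns=['Column1', 'Column2', 'Column3'],
)

# Label slice: both '20160101' and '20160102' are included (2 rows)
by_label = df.loc['20160101':'20160102']

# Position slice: position 2 is excluded, so this is also 2 rows
by_position = df.iloc[0:2]

# A single cell, addressed by label and by position
cell_by_label = df.loc[dates[0], 'Column2']
cell_by_position = df.iloc[0, 1]
```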

In [None]:
# View Your Data
df.head(2) # first 2 rows

In [None]:
df.tail(2) # last 2 rows

In [None]:
# View summary statistics
df.describe()

In [None]:
# Control Your Data
# pandas brings SQL-style operations to Python: sorting, joining, and grouping.

In [None]:
# Sort
df.sort_index(axis=0, ascending=False) # sort using the index

In [None]:
df.sort_values(by='Column2') # sort using a column
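
Both sort methods return a new DataFrame and leave the original untouched, so the result has to be assigned if you want to keep it. A minimal sketch, using a small made-up frame named `scores`:

```python
import pandas as pd

# Illustrative data; the column names mirror the ones used above
scores = pd.DataFrame({'Column1': [3, 1, 2], 'Column2': [0.5, 0.9, 0.1]})

# Sort by Column2, largest first; the result is a new DataFrame
sorted_scores = scores.sort_values(by='Column2', ascending=False)

# The original row order is unchanged, and the sorted frame keeps
# the original index labels attached to their rows
original_order = list(scores['Column2'])
sorted_order = list(sorted_scores['Column2'])
sorted_labels = list(sorted_scores.index)
```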

In [None]:
# Join
dates1 = pd.date_range("20160101", periods=6)
data1 = np.random.random((6,2))
column_names1 = ['ColumnA', 'ColumnB']
dates2 = pd.date_range("20160101", periods=7)
data2 = np.random.random((7,2))
column_names2 = ['ColumnC', 'ColumnD']
df1 = pd.DataFrame(data1, index=dates1, columns=column_names1)
df2 = pd.DataFrame(data2, index=dates2, columns=column_names2)

In [None]:
df1.join(df2) # joins on the index; a left join by default, so the 7th row of df2 is dropped
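
Because `df1` has six rows and `df2` has seven, the kind of join matters. The sketch below contrasts the default left join with an outer join, using the `how` parameter of `DataFrame.join`. The frames `left` and `right` are simplified stand-ins with one column each and deterministic values.

```python
import numpy as np
import pandas as pd

# Simplified stand-ins for df1 (6 rows) and df2 (7 rows)
left = pd.DataFrame({'ColumnA': np.arange(6)},
                    index=pd.date_range("20160101", periods=6))
right = pd.DataFrame({'ColumnC': np.arange(7)},
                     index=pd.date_range("20160101", periods=7))

# Default how='left': keeps only the index values of the left frame
left_join = left.join(right)

# how='outer': keeps the union of both indexes; ColumnA is NaN
# for the date that exists only in the right frame
outer_join = left.join(right, how='outer')

missing_in_left = int(outer_join['ColumnA'].isna().sum())
```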

In [None]:
# Group by
df3 = df1.join(df2)
# add a column to df3 to group on; the new Series uses the same index as df1 (dates1)
df3['ProfitLoss'] = pd.Series(['Profit', 'Loss', 'Profit', 'Profit', 'Profit', 'Loss'], index=dates1)

In [None]:
df3.groupby('ProfitLoss').mean()
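
The group-by above computes one statistic for every column. As a rough sketch of a common variation, the cell below selects a single column and computes several statistics at once with `agg`. The `trades` frame and its `Amount` column are hypothetical and not part of the original data.

```python
import pandas as pd

# Hypothetical data with a label column to group on
trades = pd.DataFrame({
    'ProfitLoss': ['Profit', 'Loss', 'Profit', 'Loss'],
    'Amount': [10.0, 4.0, 20.0, 6.0],
})

# Mean of one column per group: the result is a Series indexed by group label
means = trades.groupby('ProfitLoss')['Amount'].mean()

# Several aggregations at once: the result is a DataFrame with one column each
summary = trades.groupby('ProfitLoss')['Amount'].agg(['mean', 'count'])
```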

In [None]:
# Accessing Attributes

# Access the Index
df3.index

In [None]:
# Access the Values
df3.values

In [None]:
# Access the Columns
df3.columns