{ "cells": [ { "cell_type": "code", "execution_count": 1, "id": "3b938485", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'C:\\\\Users\\\\Karishma\\\\anaconda3\\\\envs\\\\ickle\\\\python.exe'" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import sys\n", "sys.executable" ] }, { "cell_type": "markdown", "id": "accurate-secretariat", "metadata": {}, "source": [ "# Ickle\n", "*Data Analysis Library for Python*\n", "
\n", "\n", "This Jupyter Notebook serves as the documentation for Ickle.\n", "\n", "You can contribute to Ickle here: https://github.com/karishmashuklaa/ickle " ] }, { "cell_type": "markdown", "id": "educational-neighbor", "metadata": {}, "source": [ "## Table Of Contents\n", "1. [Getting Started](#Getting-Started)\n", "2. [DataFrame and Visual Representation](#DataFrame-and-Visual-Representation)\n", "3. [Basic Properties](#Basic-Properties)\n", "4. [Selection of Subsets](#Selection-of-Subsets)\n", "5. [Basic and Aggregation Methods](#Basic-And-Aggregation-Methods)\n", "6. [Non-Aggregation Methods](#Non-Aggregation-Methods)\n", "7. [Other Methods](#Other-Methods)\n", "8. [Arithmetic and Comparison Operators](#Arithmetic-And-Comparison-Operators)\n", "9. [String-Only Methods](#String-Only-Methods)\n", "10. [Pivot Table](#Pivot-Table)\n", "11. [Read CSV](#Read-CSV)\n", "12. [Read SQL](#Read-SQL)\n", "13. [Read Excel](#Read-Excel)" ] }, { "cell_type": "markdown", "id": "interstate-mainstream", "metadata": {}, "source": [ "## Getting Started" ] }, { "cell_type": "markdown", "id": "upper-genre", "metadata": {}, "source": [ "### Installation\n", "\n", "Ickle can be installed via pip.\n", "\n", "`pip install ickle`" ] }, { "cell_type": "markdown", "id": "technical-climb", "metadata": {}, "source": [ "### Import\n", "\n", "`import ickle as ick`" ] }, { "cell_type": "markdown", "id": "meaning-yemen", "metadata": {}, "source": [ "## DataFrame and Visual Representation\n", "\n", "*[Go to table of contents](#Table-Of-Contents)*" ] }, { "cell_type": "markdown", "id": "6043688e", "metadata": {}, "source": [ "### DataFrame\n", "A `DataFrame` holds two dimensional heterogenous data. It accepts dictionary as input, with Numpy arrays as values and strings as column names.\n", "\n", "Parameters:\n", "- `data`: A dictionary of strings mapped to Numpy arrays. The key will become the column name." ] }, { "cell_type": "code", "execution_count": 3, "id": "b0df5039", "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import ickle as ick" ] }, { "cell_type": "code", "execution_count": 3, "id": "6c0f7f1f", "metadata": {}, "outputs": [], "source": [ "name = np.array(['John', 'Sam', 'Tina', 'Josh', 'Jack', 'Jill'])\n", "place = np.array(['Kolkata', 'Mumbai', 'Delhi', 'Mumbai', 'Mumbai', 'Mumbai'])\n", "weight = np.array([57, 70, 54, 59, 62, 70])\n", "married = np.array([True, False, True, False, False, False])\n", "\n", "data = {'name': name, 'place': place, 'weight': weight, 'married': married}\n", "df = ick.DataFrame(data)" ] }, { "cell_type": "markdown", "id": "9ab1033f", "metadata": {}, "source": [ "### Visual Representation" ] }, { "cell_type": "markdown", "id": "5fdae139", "metadata": {}, "source": [ "`DataFrame` can be displayed in the following manner" ] }, { "cell_type": "code", "execution_count": 4, "id": "90b4e248", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
name place weight married
0John Kolkata 57True
1Sam Mumbai 70False
2Tina Delhi 54True
3Josh Mumbai 59False
4Jack Mumbai 62False
5Jill Mumbai 70False
" ], "text/plain": [ "" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df" ] }, { "cell_type": "markdown", "id": "developing-webster", "metadata": {}, "source": [ "We will use the above `DataFrame` throughout the notebook" ] }, { "cell_type": "markdown", "id": "d75257e4", "metadata": {}, "source": [ "## Basic Properties\n", "1. [len](#len)\n", "2. [columns](#columns)\n", "3. [shape](#shape)\n", "4. [values](#values)\n", "5. [dtypes](#dtypes)\n", "\n", "*[Go to table of contents](#Table-Of-Contents)*" ] }, { "cell_type": "markdown", "id": "e2f62efd", "metadata": {}, "source": [ "### `len`\n", "returns: the number of rows in the `DataFrame`" ] }, { "cell_type": "code", "execution_count": 5, "id": "926a015f", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "6" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(df)" ] }, { "cell_type": "markdown", "id": "8889ccc9", "metadata": {}, "source": [ "### `columns`\n", "returns: list of column names" ] }, { "cell_type": "code", "execution_count": 6, "id": "1b833cb8", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['name', 'place', 'weight', 'married']" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.columns" ] }, { "cell_type": "markdown", "id": "deac7b62", "metadata": {}, "source": [ "### Modify exisiting column names" ] }, { "cell_type": "code", "execution_count": 7, "id": "2331a1dd", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
NAME PLACE WEIGHT MARRIED
0John Kolkata 57True
1Sam Mumbai 70False
2Tina Delhi 54True
3Josh Mumbai 59False
4Jack Mumbai 62False
5Jill Mumbai 70False
" ], "text/plain": [ "" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.columns = ['NAME', 'PLACE', 'WEIGHT', 'MARRIED']\n", "df" ] }, { "cell_type": "markdown", "id": "a73feb23", "metadata": {}, "source": [ "### `shape` \n", "returns: two-item tuple of number of rows and columns in the DataFrame" ] }, { "cell_type": "code", "execution_count": 8, "id": "854a584d", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(6, 4)" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.shape" ] }, { "cell_type": "markdown", "id": "7bb8f220", "metadata": {}, "source": [ "### `values`\n", "returns: a single 2D NumPy array of all the columns of data." ] }, { "cell_type": "code", "execution_count": 9, "id": "c88f9fbe", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([['John', 'Kolkata', 57, True],\n", " ['Sam', 'Mumbai', 70, False],\n", " ['Tina', 'Delhi', 54, True],\n", " ['Josh', 'Mumbai', 59, False],\n", " ['Jack', 'Mumbai', 62, False],\n", " ['Jill', 'Mumbai', 70, False]], dtype=object)" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.values" ] }, { "cell_type": "markdown", "id": "d845021f", "metadata": {}, "source": [ "### `dtypes`\n", "returns: a two-column `DataFrame` of column names in one column and their data type in the other" ] }, { "cell_type": "code", "execution_count": 10, "id": "384ddf93", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
Column NameData Type
0NAME string
1PLACE string
2WEIGHT int
3MARRIED bool
" ], "text/plain": [ "" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.dtypes" ] }, { "cell_type": "markdown", "id": "ad6fdb90", "metadata": {}, "source": [ "## Selection of Subsets\n", "1. [Select a single column](#Select-a-single-column)\n", "2. [Select multiple columns](#Select-multiple-columns)\n", "3. [Boolean selection](#Boolean-selection)\n", "4. [Simultaneuous selection of row and column](#Simultaneuous-selection-of-row-and-column)\n", "6. [Add new / Overwrite existing columns](#Add-new-/-Overwrite-existing-columns)\n", "\n", "*[Go to table of contents](#Table-Of-Contents)*" ] }, { "cell_type": "markdown", "id": "82b21a31", "metadata": {}, "source": [ "### Select a single column\n", "by passing the name of column as a string" ] }, { "cell_type": "code", "execution_count": 11, "id": "992355b5", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
NAME
0John
1Sam
2Tina
3Josh
4Jack
5Jill
" ], "text/plain": [ "" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['NAME']" ] }, { "cell_type": "markdown", "id": "679d01ce", "metadata": {}, "source": [ "### Select multiple columns \n", "by passing column names as a list of strings" ] }, { "cell_type": "code", "execution_count": 12, "id": "4c2b9eb9", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
NAME PLACE
0John Kolkata
1Sam Mumbai
2Tina Delhi
3Josh Mumbai
4Jack Mumbai
5Jill Mumbai
" ], "text/plain": [ "" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[['NAME', 'PLACE']]" ] }, { "cell_type": "markdown", "id": "11d85dc0", "metadata": {}, "source": [ "### Boolean Selection" ] }, { "cell_type": "code", "execution_count": 13, "id": "adb77b5d", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
NAME PLACE WEIGHT MARRIED
0Sam Mumbai 70False
1Jack Mumbai 62False
2Jill Mumbai 70False
" ], "text/plain": [ "" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "bool_sel = df['WEIGHT'] > 60\n", "\n", "df[bool_sel]" ] }, { "cell_type": "markdown", "id": "8c280a69", "metadata": {}, "source": [ "### Simultaneuous selection of row and column\n", "df[row, col]" ] }, { "cell_type": "code", "execution_count": 14, "id": "b376c0f8", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
WEIGHT
0 57
" ], "text/plain": [ "" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[0,2]" ] }, { "cell_type": "markdown", "id": "mathematical-twins", "metadata": {}, "source": [ "### Select columns as strings" ] }, { "cell_type": "code", "execution_count": 15, "id": "rotary-drunk", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
WEIGHT
0 57
" ], "text/plain": [ "" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[0, 'WEIGHT']" ] }, { "cell_type": "markdown", "id": "white-incidence", "metadata": {}, "source": [ "### Select rows as slices" ] }, { "cell_type": "code", "execution_count": 16, "id": "convenient-drilling", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
WEIGHT
0 57
" ], "text/plain": [ "" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[:1, 'WEIGHT']" ] }, { "cell_type": "markdown", "id": "faced-figure", "metadata": {}, "source": [ "### Select rows as booleans and lists" ] }, { "cell_type": "code", "execution_count": 17, "id": "facial-lucas", "metadata": {}, "outputs": [], "source": [ "bool_row = df['MARRIED']" ] }, { "cell_type": "code", "execution_count": 18, "id": "magnetic-offering", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
WEIGHT
0 57
1 54
" ], "text/plain": [ "" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[bool_row, 'WEIGHT']" ] }, { "cell_type": "markdown", "id": "4e16ad13", "metadata": {}, "source": [ "### Add new / Overwrite existing columns" ] }, { "cell_type": "code", "execution_count": 19, "id": "338eb506", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
NAME PLACE WEIGHT MARRIED AGE
0John Kolkata 57True 21
1Sam Mumbai 70False 41
2Tina Delhi 54True 22
3Josh Mumbai 59False 42
4Jack Mumbai 62False 32
5Jill Mumbai 70False 25
" ], "text/plain": [ "" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['AGE'] = np.array([21, 41, 22, 42, 32, 25])\n", "df" ] }, { "cell_type": "markdown", "id": "8dc613ac", "metadata": {}, "source": [ "## Basic And Aggregation Methods\n", "Basic Methods:\n", "1. [head()](#head)\n", "2. [tail()](#tail)\n", "\n", "Aggregation Methods:\n", "1. [min()](#min)\n", "2. [max()](#max)\n", "3. [mean()](#mean)\n", "4. [median()](#median)\n", "5. [sum()](#sum)\n", "6. [var()](#var)\n", "7. [std()](#std)\n", "8. [all()](#all)\n", "9. [any()](#any)\n", "10. [argmax()](#argmax)\n", "11. [argmin()](#argmin)\n", "\n", "\n", "*[Go to table of contents](#Table-Of-Contents)*" ] }, { "cell_type": "markdown", "id": "f9ef40b9", "metadata": {}, "source": [ "## head\n", "\n", "**head(n)**\n", "\n", "returns: the first n rows. By default n=5 " ] }, { "cell_type": "code", "execution_count": 20, "id": "708ef787", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
NAME PLACE WEIGHT MARRIED AGE
0John Kolkata 57True 21
1Sam Mumbai 70False 41
2Tina Delhi 54True 22
3Josh Mumbai 59False 42
4Jack Mumbai 62False 32
" ], "text/plain": [ "" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.head()" ] }, { "cell_type": "markdown", "id": "moving-briefing", "metadata": {}, "source": [ "## tail\n", "\n", "**tail(n)**\n", "return the last n rows.By default n=5" ] }, { "cell_type": "code", "execution_count": 21, "id": "fdebc49f", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
NAME PLACE WEIGHT MARRIED AGE
0Sam Mumbai 70False 41
1Tina Delhi 54True 22
2Josh Mumbai 59False 42
3Jack Mumbai 62False 32
4Jill Mumbai 70False 25
" ], "text/plain": [ "" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.tail()" ] }, { "cell_type": "markdown", "id": "a00bfe6b", "metadata": {}, "source": [ "## Aggregation Methods\n", "\n", "**All aggregation methods are applied only column-wise and not row-wise.**\n", "\n", "1. [min()](#min)\n", "2. [max()](#max)\n", "3. [median()](#median)\n", "4. [mean()](#mean)\n", "5. [sum()](#sum)\n", "6. [var()](#var)\n", "7. [std()](#std)\n", "8. [all()](#all)\n", "9. [any()](#any)\n", "10. [argmax()](#argmax)\n", "11. [argmin()](#argmin)" ] }, { "cell_type": "markdown", "id": "described-namibia", "metadata": {}, "source": [ "## min\n", "\n", "**min()**\n", "\n", "used to get a minimum value for each column" ] }, { "cell_type": "code", "execution_count": 22, "id": "d1a31bc0", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
NAME PLACE WEIGHT MARRIED AGE
0Jack Delhi 54False 21
" ], "text/plain": [ "" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.min()" ] }, { "cell_type": "markdown", "id": "invalid-resolution", "metadata": {}, "source": [ "## max\n", "\n", "**max()**\n", "\n", "used to get a maximum value for each column" ] }, { "cell_type": "code", "execution_count": 23, "id": "54a793d4", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
NAME PLACE WEIGHT MARRIED AGE
0Tina Mumbai 70True 42
" ], "text/plain": [ "" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.max()" ] }, { "cell_type": "markdown", "id": "covered-charter", "metadata": {}, "source": [ "## median\n", "\n", "**median()**\n", "\n", "computes median for each numeric column" ] }, { "cell_type": "code", "execution_count": 24, "id": "06e6994d", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
WEIGHT MARRIED AGE
0 60.500 0.000 28.500
" ], "text/plain": [ "" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.median()" ] }, { "cell_type": "markdown", "id": "occupational-lease", "metadata": {}, "source": [ "## mean\n", "\n", "**mean()**\n", "\n", "computes mean for each numeric column" ] }, { "cell_type": "code", "execution_count": 25, "id": "a595c55d", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
WEIGHT MARRIED AGE
0 62.000 0.333 30.500
" ], "text/plain": [ "" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.mean()" ] }, { "cell_type": "markdown", "id": "collective-limitation", "metadata": {}, "source": [ "## sum\n", "\n", "**sum()**\n", "\n", "returns the sum of all values for each column" ] }, { "cell_type": "code", "execution_count": 26, "id": "366c80e9", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
NAME PLACE WEIGHT MARRIED AGE
0JohnSamTinaJoshJackJillKolkataMumbaiDelhiMumbaiMumbaiMumbai 372 2 183
" ], "text/plain": [ "" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.sum()" ] }, { "cell_type": "markdown", "id": "pediatric-auditor", "metadata": {}, "source": [ "## var\n", "\n", "**var()**\n", "\n", "computes variance for each numeric column" ] }, { "cell_type": "code", "execution_count": 27, "id": "b94e56d0", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
WEIGHT MARRIED AGE
0 37.667 0.222 72.917
" ], "text/plain": [ "" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.var()" ] }, { "cell_type": "markdown", "id": "working-rugby", "metadata": {}, "source": [ "## std\n", "\n", "**std()**\n", "\n", "computes standard deviation for each numeric column" ] }, { "cell_type": "code", "execution_count": 28, "id": "9527eac3", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
WEIGHT MARRIED AGE
0 6.137 0.471 8.539
" ], "text/plain": [ "" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.std()" ] }, { "cell_type": "markdown", "id": "collect-modeling", "metadata": {}, "source": [ "## all\n", "\n", "**all()**\n", "\n", "it tests whether all array elements along the columns evaluate to True" ] }, { "cell_type": "code", "execution_count": 29, "id": "5269b024", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
NAME PLACE WEIGHT MARRIED AGE
0Jill Mumbai TrueFalseTrue
" ], "text/plain": [ "" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.all()" ] }, { "cell_type": "markdown", "id": "physical-beads", "metadata": {}, "source": [ "## any\n", "\n", "**any()**\n", "\n", "it tests whether any array element along the columns evaluate to True" ] }, { "cell_type": "code", "execution_count": 30, "id": "ad0f9aad", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
NAME PLACE WEIGHT MARRIED AGE
0John Kolkata TrueTrueTrue
" ], "text/plain": [ "" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.any()" ] }, { "cell_type": "markdown", "id": "palestinian-murder", "metadata": {}, "source": [ "## argmax\n", "\n", "**argmax()**\n", "\n", "returns the indices of the maximum values" ] }, { "cell_type": "code", "execution_count": 31, "id": "b165d213", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
NAME PLACE WEIGHT MARRIED AGE
0 2 1 1 0 3
" ], "text/plain": [ "" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.argmax()" ] }, { "cell_type": "markdown", "id": "spoken-animation", "metadata": {}, "source": [ "## argmin\n", "\n", "**argmin()**\n", "\n", "returns the indices of the minimum values" ] }, { "cell_type": "code", "execution_count": 32, "id": "f1671979", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
NAME PLACE WEIGHT MARRIED AGE
0 4 2 2 1 0
" ], "text/plain": [ "" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.argmin()" ] }, { "cell_type": "markdown", "id": "other-diving", "metadata": {}, "source": [ "## Other Methods \n", "1. [isna()](#isna)\n", "2. [count()](#count)\n", "3. [unique()](#unique)\n", "4. [nunique()](#nunique)\n", "5. [value_counts()](#value_counts)\n", "6. [rename()](#rename)\n", "7. [drop()](#drop)\n", "8. [diff()](#diff)\n", "9. [pct_change()](#pct_change)\n", "10. [sort_values()](#sort_values)\n", "11. [sample()](#sample)" ] }, { "cell_type": "markdown", "id": "dcdf622d", "metadata": {}, "source": [ "### isna\n", "\n", "**isna()**\n", "\n", "Determines whether each value in the DataFrame is missing or not" ] }, { "cell_type": "code", "execution_count": 33, "id": "8b47a2a3", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
NAME PLACE WEIGHT MARRIED AGE
0FalseFalseFalseFalseFalse
1FalseFalseFalseFalseFalse
2FalseFalseFalseFalseFalse
3FalseFalseFalseFalseFalse
4FalseFalseFalseFalseFalse
5FalseFalseFalseFalseFalse
" ], "text/plain": [ "" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.isna()" ] }, { "cell_type": "markdown", "id": "221c1433", "metadata": {}, "source": [ "### count\n", "\n", "**count()**\n", "\n", "Counts the number of non-missing values per column" ] }, { "cell_type": "code", "execution_count": 107, "id": "da271425", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
NAME PLACE WEIGHT MARRIED AGE
0 6 6 6 6 6
" ], "text/plain": [ "" ] }, "execution_count": 107, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.count()" ] }, { "cell_type": "markdown", "id": "3010fd76", "metadata": {}, "source": [ "### unique\n", "\n", "**unique()**\n", "\n", "Finds the unique values of each column" ] }, { "cell_type": "code", "execution_count": 35, "id": "8abf8bd4", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
MARRIED
0False
1True
" ], "text/plain": [ "" ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dfs = df.unique()\n", "dfs[3]" ] }, { "cell_type": "markdown", "id": "6ddbe1c1", "metadata": {}, "source": [ "### nunique\n", "\n", "**nunique()**\n", "\n", "Finds the **number** of unique values in each column.\n", "\n", "Identical to `unique()`" ] }, { "cell_type": "code", "execution_count": 36, "id": "ac159969", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
NAME PLACE WEIGHT MARRIED AGE
0 6 3 5 2 6
" ], "text/plain": [ "" ] }, "execution_count": 36, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.nunique()" ] }, { "cell_type": "markdown", "id": "9894920b", "metadata": {}, "source": [ "### value_counts\n", "\n", "**value_counts(normalize=False)**\n", "\n", "Returns the frequency of each unique value for each column.\n", "\n", "Parameters\n", "\n", "`normalize`: bool\n", " \n", " If True, returns the relative frequencies(percent)" ] }, { "cell_type": "code", "execution_count": 37, "id": "54b3942e", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
PLACE count
0Mumbai 4
1Delhi 1
2Kolkata 1
" ], "text/plain": [ "" ] }, "execution_count": 37, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dfs = df.value_counts()\n", "dfs[1]" ] }, { "cell_type": "markdown", "id": "acf33c05", "metadata": {}, "source": [ "### rename\n", "\n", "**rename(columns)**\n", "\n", "Renames columns in the DataFrame\n", "\n", "Parameters\n", " \n", "`columns`: dict\n", "\n", " A dictionary mapping the old column name to the new column name\n", "\n" ] }, { "cell_type": "code", "execution_count": 38, "id": "e052881c", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
NAME PLACE WEIGHT (kg)MARRIED AGE
0John Kolkata 57True 21
1Sam Mumbai 70False 41
2Tina Delhi 54True 22
3Josh Mumbai 59False 42
4Jack Mumbai 62False 32
5Jill Mumbai 70False 25
" ], "text/plain": [ "" ] }, "execution_count": 38, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.rename({'WEIGHT': 'WEIGHT (kg)'})" ] }, { "cell_type": "markdown", "id": "815e0c44", "metadata": {}, "source": [ "### drop\n", "\n", "**drop(columns)**\n", "\n", "Drops one or more columns from a DataFrame\n", "\n", "Parameters\n", "\n", "`columns`: str or list of strings" ] }, { "cell_type": "code", "execution_count": 39, "id": "9fde87bb", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
NAME PLACE WEIGHT MARRIED
0John Kolkata 57True
1Sam Mumbai 70False
2Tina Delhi 54True
3Josh Mumbai 59False
4Jack Mumbai 62False
5Jill Mumbai 70False
" ], "text/plain": [ "" ] }, "execution_count": 39, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.drop('AGE')" ] }, { "cell_type": "markdown", "id": "22cb2eb0", "metadata": {}, "source": [ "### diff\n", "\n", "**diff(n=1)**\n", "\n", "Take the difference between the current value and the nth value above it\n", "\n", "Parameters\n", "\n", "`n`: int" ] }, { "cell_type": "code", "execution_count": 111, "id": "fee1e7ae", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
NAME PLACE WEIGHT MARRIED AGE
0John Kolkata nan nan nan
1Sam Mumbai nan nan nan
2Tina Delhi -3.000 0.000 1.000
3Josh Mumbai -11.000 0.000 1.000
4Jack Mumbai 8.000 -1.000 10.000
5Jill Mumbai 11.000 0.000 -17.000
" ], "text/plain": [ "" ] }, "execution_count": 111, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.diff(n=2)" ] }, { "cell_type": "markdown", "id": "a53ae36a", "metadata": {}, "source": [ "### pct_change\n", "\n", "**pct_change(n=1)**\n", "\n", "Take the percentage difference between the current value and the nth value above it\n", "\n", "Parameters\n", "\n", "`n`: int" ] }, { "cell_type": "code", "execution_count": 104, "id": "d787b41f", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
NAME PLACE WEIGHT MARRIED AGE
0John Kolkata nan nan nan
1Sam Mumbai 0.228 -1.000 0.952
2Tina Delhi -0.229 inf -0.463
3Josh Mumbai 0.093 -1.000 0.909
4Jack Mumbai 0.051 nan -0.238
5Jill Mumbai 0.129 nan -0.219
" ], "text/plain": [ "" ] }, "execution_count": 104, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.pct_change(n=1)" ] }, { "cell_type": "markdown", "id": "d5704449", "metadata": {}, "source": [ "### sort_values\n", "\n", "**sort_values(by, asc=True)**\n", "\n", "Sort the DataFrame by one or more values\n", "\n", "Parameters\n", "\n", "`by`: str or list of column names\n", " \n", "`asc`: boolean of sorting order\n" ] }, { "cell_type": "code", "execution_count": 115, "id": "2aaaf0a7", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
NAME PLACE WEIGHT MARRIED AGE
0Josh Mumbai 59False 42
1Sam Mumbai 70False 41
2Jack Mumbai 62False 32
3Jill Mumbai 70False 25
4Tina Delhi 54True 22
5John Kolkata 57True 21
" ], "text/plain": [ "" ] }, "execution_count": 115, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.sort_values('AGE', asc=False)" ] }, { "cell_type": "markdown", "id": "d18179b1", "metadata": {}, "source": [ "### sample\n", "\n", "**sample(n=None, frac=None, replace=False, seed=None)**\n", "\n", "Randomly samples rows of the DataFrame\n", "\n", "Parameters\n", "\n", "`n`: int\n", "\n", " number of rows to return\n", " \n", "`frac`: float\n", "\n", " Proportion of the data to sample\n", " \n", "`replace`: bool\n", "\n", " Whether or not to sample with replacement\n", " \n", "`seed`: int\n", "\n", " Seed the random number generator" ] }, { "cell_type": "code", "execution_count": 116, "id": "c34ffa3b", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
NAME PLACE WEIGHT MARRIED AGE
0Jack Mumbai 62False 32
1Sam Mumbai 70False 41
2Tina Delhi 54True 22
" ], "text/plain": [ "" ] }, "execution_count": 116, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.sample(n=4, frac=0.5, replace=True)" ] }, { "cell_type": "markdown", "id": "bfd30c5a", "metadata": {}, "source": [ "## Non Aggregation Methods\n", "1. [abs()](#abs)\n", "2. [cummin()](#cummin)\n", "3. [cummax()](#cummax)\n", "4. [cumsum()](#cumsum)\n", "5. [clip()](#clip)\n", "6. [round()](#round)\n", "7. [copy()](#copy)\n", "\n", "*[Go to table of contents](#Table-Of-Contents)*" ] }, { "cell_type": "markdown", "id": "39abfea4", "metadata": {}, "source": [ "### abs\n", "\n", "**abs()**\n", "\n", "Takes the absolute value of each value in the DataFrame" ] }, { "cell_type": "code", "execution_count": 40, "id": "153a3d1a", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
NAME PLACE WEIGHT MARRIED AGE
0John Kolkata 57True 21
1Sam Mumbai 70False 41
2Tina Delhi 54True 22
3Josh Mumbai 59False 42
4Jack Mumbai 62False 32
5Jill Mumbai 70False 25
" ], "text/plain": [ "" ] }, "execution_count": 40, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.abs()" ] }, { "cell_type": "markdown", "id": "74f0a742", "metadata": {}, "source": [ "### cummin\n", "\n", "**cummin()**\n", "\n", "Finds cumulative minimum by column" ] }, { "cell_type": "code", "execution_count": 41, "id": "d7d02237", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
NAME PLACE WEIGHT MARRIED AGE
0John Kolkata 57True 21
1Sam Mumbai 57False 21
2Tina Delhi 54False 21
3Josh Mumbai 54False 21
4Jack Mumbai 54False 21
5Jill Mumbai 54False 21
" ], "text/plain": [ "" ] }, "execution_count": 41, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.cummin()" ] }, { "cell_type": "markdown", "id": "63ab86e9", "metadata": {}, "source": [ "### cummax\n", "\n", "**cummax()**\n", "\n", "Finds cumulative maximum by column" ] }, { "cell_type": "code", "execution_count": 42, "id": "966e80a5", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
NAME PLACE WEIGHT MARRIED AGE
0John Kolkata 57True 21
1Sam Mumbai 70True 41
2Tina Delhi 70True 41
3Josh Mumbai 70True 42
4Jack Mumbai 70True 42
5Jill Mumbai 70True 42
" ], "text/plain": [ "" ] }, "execution_count": 42, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.cummax()" ] }, { "cell_type": "markdown", "id": "e5d356de", "metadata": {}, "source": [ "### cumsum\n", "\n", "**cumsum()**\n", "\n", "Finds cumulative sum by column" ] }, { "cell_type": "code", "execution_count": 43, "id": "12b48747", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
NAME PLACE WEIGHT MARRIED AGE
0John Kolkata 57 1 21
1Sam Mumbai 127 1 62
2Tina Delhi 181 2 84
3Josh Mumbai 240 2 126
4Jack Mumbai 302 2 158
5Jill Mumbai 372 2 183
" ], "text/plain": [ "" ] }, "execution_count": 43, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.cumsum()" ] }, { "cell_type": "markdown", "id": "496ea5b3", "metadata": {}, "source": [ "### clip\n", "\n", "**clip(lower=None, upper=None)**\n", "\n", "All values less than `lower` will be set to `lower`\n", "\n", "All values greater than `upper` will be set to `upper`\n", "\n", "Parameters\n", "\n", "`lower`: number or None\n", " \n", "`upper`: number or None" ] }, { "cell_type": "code", "execution_count": 44, "id": "f786de6f", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
NAME PLACE WEIGHT MARRIED AGE
0John Kolkata 57 55 55
1Sam Mumbai 60 55 55
2Tina Delhi 55 55 55
3Josh Mumbai 59 55 55
4Jack Mumbai 60 55 55
5Jill Mumbai 60 55 55
" ], "text/plain": [ "" ] }, "execution_count": 44, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.clip(lower=55, upper=60)" ] }, { "cell_type": "markdown", "id": "272ca128", "metadata": {}, "source": [ "### round\n", "\n", "**round(n)**\n", "\n", "Rounds values to the nearest `n` decimals" ] }, { "cell_type": "code", "execution_count": 45, "id": "29e5b1c0", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
NAME PLACE WEIGHT MARRIED AGE
0John Kolkata 57True 21
1Sam Mumbai 70False 41
2Tina Delhi 54True 22
3Josh Mumbai 59False 42
4Jack Mumbai 62False 32
5Jill Mumbai 70False 25
" ], "text/plain": [ "" ] }, "execution_count": 45, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.round(n=1)" ] }, { "cell_type": "markdown", "id": "439ed9ad", "metadata": {}, "source": [ "### copy\n", "\n", "**copy()**\n", "\n", "Copies the DataFrame" ] }, { "cell_type": "code", "execution_count": 46, "id": "97de4076", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
NAME PLACE WEIGHT MARRIED AGE
0John Kolkata 57True 21
1Sam Mumbai 70False 41
2Tina Delhi 54True 22
3Josh Mumbai 59False 42
4Jack Mumbai 62False 32
5Jill Mumbai 70False 25
" ], "text/plain": [ "" ] }, "execution_count": 46, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.copy()" ] }, { "cell_type": "markdown", "id": "5beea9aa", "metadata": {}, "source": [ "## Arithmetic And Comparison Operators \n", "1. [Addition](#Addition)\n", "2. [Subtraction](#Subtraction)\n", "3. [Multiplication](#Multiplication)\n", "4. [Division](#Division)\n", "5. [Floor Division](#Floor-Division)\n", "6. [Power](#Power)\n", "7. [Greater than](#Greater-than)\n", "8. [Less than](#Less-than)\n", "9. [Greater than equal to](#Greater-than-equal)\n", "10. [Lesser than equal to](#Lesser-than-equal)\n", "11. [Not Equal](#Not-Equal)\n", "12. [Equal](#Equal)\n", "\n", "*Arithmetic and Comparison Operators only work with numerical columns*\n", "\n", "*[Go to table of contents](#Table-Of-Contents)*" ] }, { "cell_type": "code", "execution_count": 110, "id": "assumed-burner", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
WEIGHT
0 57
1 70
2 54
3 59
4 62
5 70
" ], "text/plain": [ "" ] }, "execution_count": 110, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_op = df['WEIGHT']\n", "df_op" ] }, { "cell_type": "markdown", "id": "hungarian-jacob", "metadata": {}, "source": [ "### Addition" ] }, { "cell_type": "code", "execution_count": 50, "id": "catholic-wiring", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
WEIGHT
0 59
1 72
2 56
3 61
4 64
5 72
" ], "text/plain": [ "" ] }, "execution_count": 50, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_op + 2" ] }, { "cell_type": "code", "execution_count": 51, "id": "imported-screen", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
WEIGHT
0 59
1 72
2 56
3 61
4 64
5 72
" ], "text/plain": [ "" ] }, "execution_count": 51, "metadata": {}, "output_type": "execute_result" } ], "source": [ "2 + df_op" ] }, { "cell_type": "markdown", "id": "individual-respect", "metadata": {}, "source": [ "### Subtraction" ] }, { "cell_type": "code", "execution_count": 52, "id": "gentle-boring", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
WEIGHT
0 55
1 68
2 52
3 57
4 60
5 68
" ], "text/plain": [ "" ] }, "execution_count": 52, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_op - 2" ] }, { "cell_type": "code", "execution_count": 53, "id": "duplicate-cancer", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
WEIGHT
0 -55
1 -68
2 -52
3 -57
4 -60
5 -68
" ], "text/plain": [ "" ] }, "execution_count": 53, "metadata": {}, "output_type": "execute_result" } ], "source": [ "2 - df_op" ] }, { "cell_type": "markdown", "id": "offensive-praise", "metadata": {}, "source": [ "### Multiplication " ] }, { "cell_type": "code", "execution_count": 54, "id": "surrounded-dealing", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
WEIGHT
0 114
1 140
2 108
3 118
4 124
5 140
" ], "text/plain": [ "" ] }, "execution_count": 54, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_op * 2" ] }, { "cell_type": "code", "execution_count": 55, "id": "devoted-warrior", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
WEIGHT
0 114
1 140
2 108
3 118
4 124
5 140
" ], "text/plain": [ "" ] }, "execution_count": 55, "metadata": {}, "output_type": "execute_result" } ], "source": [ "2 * df_op" ] }, { "cell_type": "markdown", "id": "stupid-cuisine", "metadata": {}, "source": [ "### Division " ] }, { "cell_type": "code", "execution_count": 56, "id": "express-interim", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
WEIGHT
0 28.500
1 35.000
2 27.000
3 29.500
4 31.000
5 35.000
" ], "text/plain": [ "" ] }, "execution_count": 56, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_op / 2" ] }, { "cell_type": "code", "execution_count": 57, "id": "lonely-kitty", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
WEIGHT
0 0.035
1 0.029
2 0.037
3 0.034
4 0.032
5 0.029
" ], "text/plain": [ "" ] }, "execution_count": 57, "metadata": {}, "output_type": "execute_result" } ], "source": [ "2 / df_op " ] }, { "cell_type": "markdown", "id": "activated-arrival", "metadata": {}, "source": [ "### Floor Division " ] }, { "cell_type": "code", "execution_count": 58, "id": "moderate-madison", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
WEIGHT
0 28
1 35
2 27
3 29
4 31
5 35
" ], "text/plain": [ "" ] }, "execution_count": 58, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_op // 2" ] }, { "cell_type": "code", "execution_count": 59, "id": "expensive-england", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
WEIGHT
0 0
1 0
2 0
3 0
4 0
5 0
" ], "text/plain": [ "" ] }, "execution_count": 59, "metadata": {}, "output_type": "execute_result" } ], "source": [ "2 // df_op" ] }, { "cell_type": "markdown", "id": "accepted-birthday", "metadata": {}, "source": [ "### Power " ] }, { "cell_type": "code", "execution_count": 60, "id": "instructional-happening", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
WEIGHT
0 185193
1 343000
2 157464
3 205379
4 238328
5 343000
" ], "text/plain": [ "" ] }, "execution_count": 60, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_op ** 3" ] }, { "cell_type": "code", "execution_count": 61, "id": "false-parking", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
WEIGHT
0 886634019
1-102221863
2 32838297
3-610228421
4 703701817
5-102221863
" ], "text/plain": [ "" ] }, "execution_count": 61, "metadata": {}, "output_type": "execute_result" } ], "source": [ "3 ** df_op" ] }, { "cell_type": "markdown", "id": "prompt-bacteria", "metadata": {}, "source": [ "### Greater than" ] }, { "cell_type": "code", "execution_count": 62, "id": "hungarian-image", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
WEIGHT
0True
1True
2True
3True
4True
5True
" ], "text/plain": [ "" ] }, "execution_count": 62, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_op > 50" ] }, { "cell_type": "markdown", "id": "asian-drinking", "metadata": {}, "source": [ "### Less than" ] }, { "cell_type": "code", "execution_count": 63, "id": "efficient-hearts", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
WEIGHT
0False
1False
2True
3False
4False
5False
" ], "text/plain": [ "" ] }, "execution_count": 63, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_op < 55" ] }, { "cell_type": "markdown", "id": "confirmed-railway", "metadata": {}, "source": [ "### Greater than equal " ] }, { "cell_type": "code", "execution_count": 64, "id": "funny-referral", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
WEIGHT
0False
1False
2False
3False
4False
5False
" ], "text/plain": [ "" ] }, "execution_count": 64, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_op >= 75" ] }, { "cell_type": "markdown", "id": "elementary-morgan", "metadata": {}, "source": [ "### Lesser than equal " ] }, { "cell_type": "code", "execution_count": 65, "id": "overall-friend", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
WEIGHT
0False
1False
2True
3False
4False
5False
" ], "text/plain": [ "" ] }, "execution_count": 65, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_op <= 55" ] }, { "cell_type": "markdown", "id": "considerable-volleyball", "metadata": {}, "source": [ "### Not Equal " ] }, { "cell_type": "code", "execution_count": 66, "id": "instrumental-simon", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
WEIGHT
0True
1True
2True
3True
4True
5True
" ], "text/plain": [ "" ] }, "execution_count": 66, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_op != 55" ] }, { "cell_type": "markdown", "id": "automated-memphis", "metadata": {}, "source": [ "### Equal " ] }, { "cell_type": "code", "execution_count": 67, "id": "dried-genesis", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
WEIGHT
0False
1True
2False
3False
4False
5True
" ], "text/plain": [ "" ] }, "execution_count": 67, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_op == 70" ] }, { "cell_type": "markdown", "id": "flying-tumor", "metadata": {}, "source": [ "## String Only Methods \n", "All the strings behave in the same manner as built-in string functions in Python.\n", "These methods can be used only with `columns`\n", "\n", "1. [capitalize()](#capitalize)\n", "2. [center()](#center)\n", "3. [count()](#count)\n", "4. [endswith()](#endswith)\n", "5. [startswith()](#startswith)\n", "6. [find()](#find)\n", "7. [len()](#len)\n", "8. [get()](#get)\n", "9. [index()](#index)\n", "10. [isalnum()](#isalnum)\n", "11. [isalpha()](#isalpha)\n", "12. [isdecimal()](#isdecimal)\n", "13. [islower()](#islower)\n", "14. [isnumeric()](#isnumeric)\n", "15. [isspace()](#isspace)\n", "16. [istitle()](#istitle)\n", "17. [isupper()](#isupper)\n", "18. [lstrip()](#lstrip)\n", "19. [rstrip()](#rstrip)\n", "20. [strip()](#strip)\n", "21. [replace()](#replace)\n", "22. [swapcase()](#swapcase)\n", "23. [title()](#title)\n", "24. [lower()](#lower)\n", "25. [upper()](#upper)\n", "26. [zfill()](#zfill)\n", "27. [encode()](#encode)\n", "\n", "\n", "*[Go to table of contents](#Table-Of-Contents)*" ] }, { "cell_type": "markdown", "id": "turkish-bruce", "metadata": {}, "source": [ "### capitalize\n", "\n", "**capitalize(col)**" ] }, { "cell_type": "code", "execution_count": 68, "id": "laughing-delaware", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
NAME
0John
1Sam
2Tina
3Josh
4Jack
5Jill
" ], "text/plain": [ "" ] }, "execution_count": 68, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.str.capitalize('NAME')" ] }, { "cell_type": "markdown", "id": "atmospheric-youth", "metadata": {}, "source": [ "### center\n", "\n", "**center(col, width, fillchar=None)**" ] }, { "cell_type": "code", "execution_count": 69, "id": "historic-millennium", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
NAME
0aaaJohnaaa
1aaaSamaaaa
2aaaTinaaaa
3aaaJoshaaa
4aaaJackaaa
5aaaJillaaa
" ], "text/plain": [ "" ] }, "execution_count": 69, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.str.center('NAME', 10, 'a')" ] }, { "cell_type": "markdown", "id": "moral-psychiatry", "metadata": {}, "source": [ "### count\n", "\n", "**count(col, sub, start=None, stop=None)**" ] }, { "cell_type": "code", "execution_count": 70, "id": "senior-ethiopia", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
PLACE
0 0
1 1
2 0
3 1
4 1
5 1
" ], "text/plain": [ "" ] }, "execution_count": 70, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.str.count('PLACE', 'Mumbai')" ] }, { "cell_type": "markdown", "id": "material-fashion", "metadata": {}, "source": [ "### endswith\n", "\n", "**endswith(col, suffix, start=None, stop=None)**" ] }, { "cell_type": "code", "execution_count": 71, "id": "plastic-retirement", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
NAME
0True
1False
2False
3False
4False
5False
" ], "text/plain": [ "" ] }, "execution_count": 71, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.str.endswith('NAME', 'n')" ] }, { "cell_type": "markdown", "id": "danish-property", "metadata": {}, "source": [ "### startswith\n", "\n", "**startswith(col, suffix, start=None, stop=None)**" ] }, { "cell_type": "code", "execution_count": 72, "id": "collective-decade", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
NAME
0True
1False
2False
3True
4True
5True
" ], "text/plain": [ "" ] }, "execution_count": 72, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.str.startswith('NAME', 'J')" ] }, { "cell_type": "markdown", "id": "democratic-courage", "metadata": {}, "source": [ "### find\n", "\n", "**find(col, sub, start=None, stop=None)**" ] }, { "cell_type": "code", "execution_count": 73, "id": "comfortable-humanitarian", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
NAME
0 -1
1 -1
2 0
3 -1
4 -1
5 -1
" ], "text/plain": [ "" ] }, "execution_count": 73, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.str.find('NAME', 'Tina')" ] }, { "cell_type": "markdown", "id": "limiting-developer", "metadata": {}, "source": [ "### len\n", "\n", "**len(col)**" ] }, { "cell_type": "code", "execution_count": 74, "id": "overall-nashville", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
NAME
0 4
1 3
2 4
3 4
4 4
5 4
" ], "text/plain": [ "" ] }, "execution_count": 74, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.str.len('NAME')" ] }, { "cell_type": "markdown", "id": "deadly-destiny", "metadata": {}, "source": [ "### get\n", "\n", "**get(col, item)**" ] }, { "cell_type": "code", "execution_count": 75, "id": "hundred-maine", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
NAME
0J
1S
2T
3J
4J
5J
" ], "text/plain": [ "" ] }, "execution_count": 75, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.str.get('NAME', 0)" ] }, { "cell_type": "markdown", "id": "placed-service", "metadata": {}, "source": [ "### index\n", "\n", "**index(col, sub, start=None, stop=None)**" ] }, { "cell_type": "code", "execution_count": 76, "id": "illegal-cattle", "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/html": [ "
NAME
0 0
1 0
2 0
3 0
4 0
5 0
" ], "text/plain": [ "" ] }, "execution_count": 76, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.str.index('NAME', '')" ] }, { "cell_type": "markdown", "id": "legendary-murder", "metadata": {}, "source": [ "### isalnum\n", "\n", "**isalnum(col)**" ] }, { "cell_type": "code", "execution_count": 77, "id": "completed-praise", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
NAME
0True
1True
2True
3True
4True
5True
" ], "text/plain": [ "" ] }, "execution_count": 77, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.str.isalnum('NAME')" ] }, { "cell_type": "markdown", "id": "structured-introduction", "metadata": {}, "source": [ "### isalpha\n", "\n", "**isalpha(col)**" ] }, { "cell_type": "code", "execution_count": 78, "id": "recent-globe", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
NAME
0True
1True
2True
3True
4True
5True
" ], "text/plain": [ "" ] }, "execution_count": 78, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.str.isalpha('NAME')" ] }, { "cell_type": "markdown", "id": "developed-setting", "metadata": {}, "source": [ "### isdecimal\n", "\n", "**isdecimal(col)**" ] }, { "cell_type": "code", "execution_count": 79, "id": "double-links", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
NAME
0False
1False
2False
3False
4False
5False
" ], "text/plain": [ "" ] }, "execution_count": 79, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.str.isdecimal('NAME')" ] }, { "cell_type": "markdown", "id": "mature-exploration", "metadata": {}, "source": [ "### isnumeric\n", "\n", "**isnumeric(col)**" ] }, { "cell_type": "code", "execution_count": 80, "id": "early-result", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
NAME
0False
1False
2False
3False
4False
5False
" ], "text/plain": [ "" ] }, "execution_count": 80, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.str.isnumeric('NAME')" ] }, { "cell_type": "markdown", "id": "surprised-audit", "metadata": {}, "source": [ "### isspace\n", "\n", "**isspace(col)**" ] }, { "cell_type": "code", "execution_count": 81, "id": "persistent-dialogue", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
NAME
0False
1False
2False
3False
4False
5False
" ], "text/plain": [ "" ] }, "execution_count": 81, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.str.isspace('NAME')" ] }, { "cell_type": "markdown", "id": "published-delicious", "metadata": {}, "source": [ "### istitle\n", "\n", "**istitle(col)**" ] }, { "cell_type": "code", "execution_count": 82, "id": "seven-signature", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
NAME
0True
1True
2True
3True
4True
5True
" ], "text/plain": [ "" ] }, "execution_count": 82, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.str.istitle('NAME')" ] }, { "cell_type": "markdown", "id": "brutal-shower", "metadata": {}, "source": [ "### isupper\n", "\n", "**isupper(col)**" ] }, { "cell_type": "code", "execution_count": 84, "id": "ranking-origin", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
NAME
0False
1False
2False
3False
4False
5False
" ], "text/plain": [ "" ] }, "execution_count": 84, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.str.isupper('NAME')" ] }, { "cell_type": "markdown", "id": "important-knowing", "metadata": {}, "source": [ "### islower\n", "\n", "**islower(col)**" ] }, { "cell_type": "code", "execution_count": 83, "id": "designed-memorial", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
NAME
0False
1False
2False
3False
4False
5False
" ], "text/plain": [ "" ] }, "execution_count": 83, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.str.islower('NAME')" ] }, { "cell_type": "markdown", "id": "45a61a3e", "metadata": {}, "source": [ "### lstrip\n", "\n", "**lstrip(col, chars)**" ] }, { "cell_type": "code", "execution_count": 85, "id": "crucial-texas", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
NAME
0John
1Sam
2Tina
3Josh
4Jack
5Jill
" ], "text/plain": [ "" ] }, "execution_count": 85, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.str.lstrip('NAME', 'o')" ] }, { "cell_type": "markdown", "id": "3dc0e805", "metadata": {}, "source": [ "### rstrip\n", "\n", "**rstrip(col, chars)**" ] }, { "cell_type": "code", "execution_count": 86, "id": "effective-internship", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
NAME
0John
1Sam
2Tina
3Josh
4Jack
5Jill
" ], "text/plain": [ "" ] }, "execution_count": 86, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.str.rstrip('NAME', 'o')" ] }, { "cell_type": "markdown", "id": "2c209eb3", "metadata": {}, "source": [ "### strip\n", "\n", "**strip(col, chars)**" ] }, { "cell_type": "code", "execution_count": 87, "id": "demographic-going", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
NAME
0John
1Sam
2Tina
3Josh
4Jack
5Jill
" ], "text/plain": [ "" ] }, "execution_count": 87, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.str.strip('NAME', 'o')" ] }, { "cell_type": "markdown", "id": "attractive-focus", "metadata": {}, "source": [ "### replace\n", "\n", "**replace(col, old, new, count=None)**" ] }, { "cell_type": "code", "execution_count": 88, "id": "bored-license", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
NAME
0Cena
1Sam
2Tina
3Josh
4Jack
5Jill
" ], "text/plain": [ "" ] }, "execution_count": 88, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.str.replace('NAME', 'John', 'Cena')" ] }, { "cell_type": "markdown", "id": "knowing-lodge", "metadata": {}, "source": [ "### swapcase\n", "\n", "**swapcase(col)**" ] }, { "cell_type": "code", "execution_count": 89, "id": "respected-sussex", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
NAME
0jOHN
1sAM
2tINA
3jOSH
4jACK
5jILL
" ], "text/plain": [ "" ] }, "execution_count": 89, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.str.swapcase('NAME')" ] }, { "cell_type": "markdown", "id": "disabled-photograph", "metadata": {}, "source": [ "### title\n", "\n", "**title(col)**" ] }, { "cell_type": "code", "execution_count": 90, "id": "covered-advisory", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
NAME
0John
1Sam
2Tina
3Josh
4Jack
5Jill
" ], "text/plain": [ "" ] }, "execution_count": 90, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.str.title('NAME')" ] }, { "cell_type": "markdown", "id": "entire-silly", "metadata": {}, "source": [ "### lower\n", "\n", "**lower(col)**" ] }, { "cell_type": "code", "execution_count": 91, "id": "detected-charles", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
NAME
0john
1sam
2tina
3josh
4jack
5jill
" ], "text/plain": [ "" ] }, "execution_count": 91, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.str.lower('NAME')" ] }, { "cell_type": "markdown", "id": "enclosed-philadelphia", "metadata": {}, "source": [ "### upper\n", "\n", "**upper(col)**" ] }, { "cell_type": "code", "execution_count": 92, "id": "peaceful-stage", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
NAME
0JOHN
1SAM
2TINA
3JOSH
4JACK
5JILL
" ], "text/plain": [ "" ] }, "execution_count": 92, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.str.upper('NAME')" ] }, { "cell_type": "markdown", "id": "framed-superior", "metadata": {}, "source": [ "### zfill\n", "\n", "**zfill(col, width)**" ] }, { "cell_type": "code", "execution_count": 93, "id": "increased-professor", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
NAME
0000000John
10000000Sam
2000000Tina
3000000Josh
4000000Jack
5000000Jill
" ], "text/plain": [ "" ] }, "execution_count": 93, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.str.zfill('NAME', 10)" ] }, { "cell_type": "markdown", "id": "powerful-contact", "metadata": {}, "source": [ "### encode\n", "\n", "**encode(col, encoding='utf-8', errors='strict')**" ] }, { "cell_type": "code", "execution_count": 94, "id": "beginning-converter", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
NAME
0b'John'
1b'Sam'
2b'Tina'
3b'Josh'
4b'Jack'
5b'Jill'
" ], "text/plain": [ "" ] }, "execution_count": 94, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.str.encode('NAME')" ] }, { "cell_type": "markdown", "id": "unauthorized-chinese", "metadata": {}, "source": [ "## Pivot Table\n", "Creates a pivot table from one or two 'grouping' columns\n", "\n", "Parameters\n", "\n", "`rows`: str of column name to group by (Optional)\n", "\n", "`columns`: str of column name to group by (Optional)\n", "\n", "`values`: str of column name to aggregate (Required)\n", "\n", "`aggfunc`: str of aggregation function\n", "\n", "*[Go to table of contents](#Table-Of-Contents)*" ] }, { "cell_type": "code", "execution_count": 95, "id": "promotional-worse", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
NAME Delhi Kolkata Mumbai
0Jack nan nan 62.000
1Jill nan nan 70.000
2John nan 57.000 nan
3Josh nan nan 59.000
4Sam nan nan 70.000
5Tina 54.000 nan nan
" ], "text/plain": [ "" ] }, "execution_count": 95, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.pivot_table(rows='NAME', columns='PLACE', values='WEIGHT', aggfunc='mean')" ] }, { "cell_type": "markdown", "id": "307d809e", "metadata": {}, "source": [ "## Read CSV\n", "Reads a CSV file into a dataframe\n", "\n", "Parameters\n", "\n", "`filelocation`: str of the file location\n", "\n", "`header`: int of the number of lines to skip in case of multi-header files, None in case of header-less files (Optional)\n", "\n", "**read_csv(file, header)**\n", "\n", "*[Go to table of contents](#Table-Of-Contents)*" ] }, { "cell_type": "code", "execution_count": 96, "id": "01015cbe", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
dept race gender salary
0Houston Police Department-HPDWhite Male 45279
1Houston Fire Department (HFD)White Male 63166
2Houston Police Department-HPDBlack Male 66614
3Public Works & Engineering-PWEAsian Male 71680
4Houston Airport System (HAS)White Male 42390
" ], "text/plain": [ "" ] }, "execution_count": 96, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data = ick.read_csv('./dataset/employee.csv');\n", "data.head()" ] }, { "cell_type": "code", "execution_count": 10, "id": "6e5c9b24", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
0 1 2 3
0Houston Police Department-HPDWhite Male 45279
1Houston Fire Department (HFD)White Male 63166
2Houston Police Department-HPDBlack Male 66614
3Public Works & Engineering-PWEAsian Male 71680
4Houston Airport System (HAS)White Male 42390
" ], "text/plain": [ "" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data = ick.read_csv('./dataset/employee.csv', header=None);\n", "data.head()" ] }, { "cell_type": "markdown", "id": "105f6782", "metadata": {}, "source": [ "## Read SQL\n", "Reads the data from SQL query into a dataframe\n", "\n", "Parameters\n", "\n", "`sql`: str of the sql statement to be executed\n", "\n", "`drivername`: str of driver engine of the database\n", "\n", "`username`: str of username to connect to the database\n", "\n", "`password`: str of password to connect to the database\n", "\n", "`host`: str of host to connect to the database\n", "\n", "`port`: int of port to connect to the database\n", "\n", "`database`: str of database name \n", "\n", "**read_sql(sql, drivername, username, password, host, port, database)**\n", "\n", "*[Go to table of contents](#Table-Of-Contents)*" ] }, { "cell_type": "code", "execution_count": 11, "id": "47e12bf9", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
id firstname lastname email
0 1John Doe upchh@example.com
1 2John Doe upchh@example.com
2 3John Doe upchh@example.com
3 4John Doe upchh@example.com
4 5John Doe upchh@example.com
5 6John Doe upchh@example.com
6 7John Doe upchh@example.com
7 8John Doe upchh@example.com
8 9John Doe upchh@example.com
" ], "text/plain": [ "" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "drivername=\"postgresql\"\n", "username=\"postgres\"\n", "password=\"test123\" \n", "host=\"localhost\"\n", "port=\"5432\"\n", "database=\"spotify_trends\"\n", "sql=\"SELECT id, firstname, lastname, email FROM authors\"\n", "\n", "ick.read_sql(sql,drivername,username,password,host,port,database)" ] }, { "cell_type": "markdown", "id": "c8ee926c", "metadata": {}, "source": [ "## Read Excel\n", "Reads the data from a simple Excel file into a dataframe\n", "\n", "Parameters\n", "\n", "`file_path`: str of the path to the Excel file that you want to read\n", "`sheet_name`: str of sheet name\n", "\n", "Supported file extensions - xlsx, xlsm, xltx, xltm\n", "\n", "**read_excel(file_path, sheet_name)**\n", "\n", "*[Go to table of contents](#Table-Of-Contents)*" ] }, { "cell_type": "code", "execution_count": 4, "id": "582b7c33", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
Name Age Country
0John 54 USA
1Sam 23 UK
2Max 44 Pakistan
" ], "text/plain": [ "" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ick.read_excel(file_path=\"./dataset/Book1.xlsx\", sheet_name=\"Sheet1\")" ] }, { "cell_type": "code", "execution_count": null, "id": "bf754bc7", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.7" }, "vscode": { "interpreter": { "hash": "84acd9daa1976748b0cb1aac2d1d7e831b987877e0682bbd8df981985727d27e" } } }, "nbformat": 4, "nbformat_minor": 5 }