{ "cells": [ { "cell_type": "markdown", "source": [ "# 10 minutes to Optimus\r\n", "\r\n", "**👋 Hi, are you in Binder?**\r\n", "\r\n", "In Binder you can easily run Optimus. If you're not, you may want to visit the link below:\r\n", "\r\n", "https://mybinder.org/v2/gh/hi-primus/optimus/develop-21.9?filepath=https%3A%2F%2Fraw.githubusercontent.com%2Fhi-primus%2Foptimus%2Fdevelop-21.9%2Fexamples%2F10_min_to_optimus.ipynb" ], "metadata": {} }, { "cell_type": "markdown", "source": [ "## Import Optimus and start it" ], "metadata": {} }, { "cell_type": "code", "execution_count": null, "source": [ "from optimus import Optimus\r\n", "op = Optimus(\"pandas\")" ], "outputs": [], "metadata": {} }, { "cell_type": "markdown", "source": [ "## Dataframe creation\r\n", "\r\n", "Create a dataframe by passing a list of values for each column." ], "metadata": {} }, { "cell_type": "code", "execution_count": null, "source": [ "df = op.create.dataframe({\r\n", " \"words\": [\" I like fish \", \" zombies\", \"simpsons cat lady\", None],\r\n", " \"num\": [1, 2, 2, 3],\r\n", " \"animals\": [\"dog\", \"cat\", \"frog\", \"eagle\"],\r\n", " \"thing\": [\"housé\", \"tv\", \"table\", \"glass\"],\r\n", " \"two strings\": [\"cat-car\", \"dog-tv\", \"eagle-tv-plus\", \"lion-pc\"],\r\n", " \"filter\": [\"a\", \"b\", \"1\", \"c\"],\r\n", " \"num 2\": [\"1\", \"2\", \"3\", \"4\"],\r\n", " \"col_array\": [[\"baby\", \"sorry\"], [\"baby 1\", \"sorry 1\"], [\"baby 2\", \"sorry 2\"], [\"baby 3\", \"sorry 3\"]], \r\n", " \"col_int\": [[1, 2, 3], [3, 4], [5, 6, 7], [7, 8]]\r\n", "})\r\n", "\r\n", "df.display()" ], "outputs": [], "metadata": {} }, { "cell_type": "markdown", "source": [ "Create a dataframe using (column name, data type) tuples as keys to specify each column's data type." 
], "metadata": {} }, { "cell_type": "code", "execution_count": null, "source": [ "\r\n", "df = op.create.dataframe({\r\n", " (\"words\", \"str\"): [\" I like fish \", \" zombies\", \"simpsons cat lady\", None],\r\n", " (\"num\", \"int\"): [1, 2, 2, 3],\r\n", " (\"animals\", \"str\"): [\"dog\", \"cat\", \"frog\", \"eagle\"],\r\n", " (\"thing\", \"str\"): [\"housé\", \"tv\", \"table\", \"glass\"],\r\n", " (\"two strings\", \"str\"): [\"cat-car\", \"dog-tv\", \"eagle-tv-plus\", \"lion-pc\"],\r\n", " (\"filter\", \"str\"): [\"a\", \"b\", \"1\", \"c\"],\r\n", " (\"num 2\", \"string\"): [\"1\", \"2\", \"3\", \"4\"],\r\n", " \"col_array\": [[\"baby\", \"sorry\"], [\"baby 1\", \"sorry 1\"], [\"baby 2\", \"sorry 2\"], [\"baby 3\", \"sorry 3\"]], \r\n", " \"col_int\": [[1, 2, 3], [3, 4], [5, 6, 7], [7, 8]]\r\n", "})\r\n", "\r\n", "df.display()" ], "outputs": [], "metadata": {} }, { "cell_type": "markdown", "source": [ "Create an Optimus dataframe from a pandas dataframe." ], "metadata": {} }, { "cell_type": "code", "execution_count": null, "source": [ "import pandas as pd\r\n", "\r\n", "data = [(\"bumbl#ebéé \", 17.5, \"Espionage\", 7),\r\n", " (\"Optim'us\", 28.0, \"Leader\", 10),\r\n", " (\"ironhide&\", 26.0, \"Security\", 7)]\r\n", "\r\n", "labels = [\"names\", \"height\", \"function\", \"rank\"]\r\n", "\r\n", "pdf = pd.DataFrame.from_records(data, columns=labels)\r\n", "\r\n", "df = op.create.dataframe(dfd=pdf)\r\n", "\r\n", "df.display()" ], "outputs": [], "metadata": {} }, { "cell_type": "markdown", "source": [ "## Dataframe loading" ], "metadata": {} }, { "cell_type": "code", "execution_count": null, "source": [ "df = op.load.file(\"https://raw.githubusercontent.com/hi-primus/optimus/develop-21.8/examples/data/foo.csv\")\r\n", "df.display()" ], "outputs": [], "metadata": {} }, { "cell_type": "markdown", "source": [ "## Viewing data\r\n", "Here is how to view the first 20 rows of a dataframe." ], "metadata": {} }, { "cell_type": "code", "execution_count": null, 
"source": [ "df.display(20)" ], "outputs": [], "metadata": {} }, { "cell_type": "markdown", "source": [ "Display the first 5 rows in plain text using `print`." ], "metadata": {} }, { "cell_type": "code", "execution_count": null, "source": [ "df.print(5)" ], "outputs": [], "metadata": {} }, { "cell_type": "markdown", "source": [ "## Transforming data\r\n", "To transform data you can use operations like `upper` to convert text to uppercase or `rename` to rename a column." ], "metadata": {} }, { "cell_type": "code", "execution_count": null, "source": [ "df.display()\r\n", "df.cols.rename(\"firstName\", \"name\").display(highlight=\"name\")\r\n", "df.cols.upper(\"lastName\").display(highlight=\"lastName\")" ], "outputs": [], "metadata": {} }, { "cell_type": "markdown", "source": [ "## Chaining\r\n", "\r\n", "The previous transformations were applied one at a time, but operations can also be chained into a single expression, as in the cell below." ], "metadata": {} }, { "cell_type": "code", "execution_count": null, "source": [ "df.display()\r\n", "df \\\r\n", " .cols.rename(\"billingId\", \"billing\") \\\r\n", " .cols.drop([\"id\", \"dummyCol\"]) \\\r\n", " .cols.append({\"zeros\": 0}) \\\r\n", " .cols.sort(order=\"desc\") \\\r\n", " .cols.upper(\"product\") \\\r\n", " .display()" ], "outputs": [], "metadata": {} }, { "cell_type": "markdown", "source": [ "## More examples\r\n", "\r\n", "Delete rows with repeated values in the `product` column" ], "metadata": {} }, { "cell_type": "code", "execution_count": null, "source": [ "df.rows.drop_duplicated(\"product\").display()" ], "outputs": [], "metadata": {} }, { "cell_type": "markdown", "source": [ "Replace repeated values in the `product` column with `N/A`" ], "metadata": {} }, { "cell_type": "code", "execution_count": null, "source": [ "df.set.duplicated(\"product\", \"N/A\").display(highlight=\"product\")" ], "outputs": [], "metadata": {} }, { "cell_type": "markdown", "source": [ "Profile the dataframe" ], "metadata": {} }, { "cell_type": "code", "execution_count": null, "source": [ 
"df.profile(\"*\", bins=3) # \"*\" = select all columns" ], "outputs": [], "metadata": {} } ], "metadata": { "orig_nbformat": 4, "language_info": { "name": "python" } }, "nbformat": 4, "nbformat_minor": 2 }