{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Premium Primitive Examples\n",
    "\n",
    "Examples by type\n",
    "\n",
    "* [Geospatial Primitives](#Geospatial-Primitives)\n",
    "* [NLP Primitives](#Natural-Language-Processing-Primitives)\n",
    "* [Date of Birth Primitives](#Date-of-Birth-Primitives)\n",
    "* [Time Primitives](#Time-Primitives)\n",
    "* [Phone Number Primitives](#Phone-Number-Primitives)\n",
    "* [ZIP Code Primitives](#ZIP-Code-Primitives)\n",
    "* [Numeric Primitives](#Numeric-Primitives)\n",
    "* [Miscellaneous Data Types](#Miscellaneous-Data-Types)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "from datetime import datetime\n",
    "import numpy as np\n",
    "import pandas as pd"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Geospatial Primitives\n",
    "\n",
    "`LatLong` variables store a tuple containing the Latitude and Longitude of a point on the globe.  These primitives transform that location (e.g. what country is it in) and can do comparions between multiple `LatLong` variables (e.g. distance between them).\n",
    "\n",
    "#### CityBlockDistance\n",
    "\n",
    "Calculates the distance between points in a city road grid."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    301.518836\n",
       "1    672.088624\n",
       "dtype: float64"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from featuretools.primitives import CityblockDistance\n",
    "\n",
    "cityblock_distance = CityblockDistance()\n",
    "DC = (38, -77)\n",
    "Boston = (43, -71)\n",
    "NYC = (40, -74)\n",
    "cityblock_distance([DC, DC], [NYC, Boston])  # DC -> NYC, DC -> Boston"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### PathLength\n",
    "Determines the length of a path defined by a series of coordinates."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "805.5203180792812"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from featuretools.primitives import PathLength\n",
    "\n",
    "path_length_km = PathLength(unit='kilometers')\n",
    "path_length_km([(41.881832, -87.623177), (38.6270, -90.1994), (39.0997, -94.5786)])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### LatLongToCity\n",
    "\n",
    "Determines city/town corresponding to given Latitude and Longitude coordinates."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0        Bayswater\n",
       "1           Cochin\n",
       "2    Mountain View\n",
       "3             None\n",
       "Name: results, dtype: object"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from featuretools.primitives import LatLongToCity\n",
    "\n",
    "latlong_to_city = LatLongToCity()\n",
    "latlong_to_city([(51.52, -0.17), (9.93, 76.25), (37.38, -122.08), (np.nan, np.nan)])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Natural Language Processing Primitives\n",
    "\n",
    "NLP primitives can apply various natural language processing techniques to text data.\n",
    "\n",
    "#### PolarityScore\n",
    "\n",
    "Calculates the polarity of a text on a scale from -1 (negative) to 1 (positive)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    0.677\n",
       "1   -0.649\n",
       "2    0.000\n",
       "3    0.000\n",
       "dtype: float64"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from featuretools.primitives import PolarityScore\n",
    "\n",
    "x = ['He loves dogs', 'She hates cats', 'There is a dog', '']\n",
    "polarity_score = PolarityScore()\n",
    "polarity_score(x)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### PartOfSpeechCount\n",
    "\n",
    "Calculates the occurences of each different part of speech."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0     [0.0, 0.0]\n",
       "1     [0.0, 0.0]\n",
       "2     [0.0, 0.0]\n",
       "3     [0.0, 0.0]\n",
       "4     [0.0, 0.0]\n",
       "5     [1.0, 0.0]\n",
       "6     [0.0, 0.0]\n",
       "7     [0.0, 0.0]\n",
       "8     [0.0, 0.0]\n",
       "9     [1.0, 0.0]\n",
       "10    [0.0, 0.0]\n",
       "11    [0.0, 0.0]\n",
       "12    [0.0, 0.0]\n",
       "13    [1.0, 0.0]\n",
       "14    [0.0, 0.0]\n",
       "dtype: object"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from featuretools.primitives import PartOfSpeechCount\n",
    "\n",
    "x = ['He was eating cheese', '']\n",
    "part_of_speech_count = PartOfSpeechCount()\n",
    "part_of_speech_count(x)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Date of Birth Primitives\n",
    "\n",
    "These primitives transform `DateOfBirth` type variables. They use the time of the feature calculation to extrapolate the current age of a person. This is set by using a [cutoff time](https://featuretools.featurelabs.com/automated_feature_engineering/handling_time.html#what-is-the-cutoff-time).\n",
    "\n",
    "#### Age\n",
    "\n",
    "Calculates the age in years as a floating point number given a date of birth."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    19.013699\n",
       "1    35.616438\n",
       "2    21.221918\n",
       "dtype: float64"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from featuretools.primitives import Age\n",
    "\n",
    "reference_date = pd.to_datetime(\"01-01-2019\")\n",
    "age = Age()\n",
    "input_ages = [pd.to_datetime(\"01-01-2000\"),\n",
    "              pd.to_datetime(\"05-30-1983\"),\n",
    "              pd.to_datetime(\"10-17-1997\")]\n",
    "age(input_ages, time=reference_date)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "There are also primitives to see if a birth date falls within a give age range\n",
    "\n",
    "#### AgeOver18\n",
    "\n",
    "Determines whether a person is over 18 years old given their date of birth."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    True\n",
       "1    True\n",
       "2    True\n",
       "dtype: bool"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from featuretools.primitives import AgeOver18\n",
    "\n",
    "over18 = AgeOver18()\n",
    "over18(input_ages, time=reference_date)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### AgeUnder65\n",
    "\n",
    "Determines whether a person is under 65 years old given their date of birth."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    True\n",
       "1    True\n",
       "2    True\n",
       "dtype: bool"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from featuretools.primitives import AgeUnder65\n",
    "\n",
    "under65 = AgeUnder65()\n",
    "under65(input_ages, time=reference_date)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Time Primitives\n",
    "\n",
    "`Datetime` variables store dates or timestamps.  These primitives can extract special properties from a `Datetime` field like if it is a holiday.\n",
    "\n",
    "#### DateToHoliday\n",
    "\n",
    "If there is no holiday, it returns `NaN`. Currently only works for the United States and Canada with dates between 1950 and 2100."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([\"New Year's Day\", nan, 'Memorial Day', 'Independence Day'],\n",
       "      dtype=object)"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from featuretools.primitives import DateToHoliday\n",
    "\n",
    "date_to_holiday = DateToHoliday()\n",
    "dates = pd.Series([datetime(2016, 1, 1),\n",
    "         datetime(2016, 2, 27),\n",
    "         datetime(2017, 5, 29, 10, 30, 5),\n",
    "         datetime(2018, 7, 4)])\n",
    "date_to_holiday(dates)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### NUniqueDaysOfCalendarYear\n",
    "\n",
    "Determines the number of unique calendar days."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "2"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from featuretools.primitives import NUniqueDaysOfCalendarYear\n",
    "\n",
    "n_unique_days_of_calendar_year = NUniqueDaysOfCalendarYear()\n",
    "times = [datetime(2019, 2, 1),\n",
    "         datetime(2019, 2, 1),\n",
    "         datetime(2018, 2, 1),\n",
    "         datetime(2019, 1, 1)]\n",
    "n_unique_days_of_calendar_year(times)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Phone Number Primitives\n",
    "\n",
    "The `PhoneNumber` variable stores phone numbers.  These primitives can transformations on the phone numbers to extract metadata like country or area code that are general enough for a model to use\n",
    "\n",
    "#### PhoneNumberToCountry\n",
    "\n",
    "Determines the country of a phone number."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    BR\n",
       "1    JP\n",
       "2    US\n",
       "dtype: object"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from featuretools.primitives import PhoneNumberToCountry\n",
    "\n",
    "phone_number_to_country = PhoneNumberToCountry()\n",
    "phone_number_to_country(['+55 85 5555555', '+81 55-555-5555', '+1-541-754-3010',])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## ZIP Code Primitives\n",
    "\n",
    "These primitives can bring in various metadata about a zip code (like geographic or economic information).\n",
    "\n",
    "#### ZIPCodeToState\n",
    "\n",
    "Extracts the state from a ZIPCode."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    IL\n",
       "1    CA\n",
       "2    MA\n",
       "dtype: object"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from featuretools.primitives import ZIPCodeToState\n",
    "\n",
    "zipcode_to_state = ZIPCodeToState()\n",
    "zipcode_to_state(['60622', '94120', '02111-1253'])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### ZIPCodeToHouseholdIncome\n",
    "\n",
    "Determines the median household income for a ZIP Code."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([ 59000., 103422., 103422.])"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from featuretools.primitives import ZIPCodeToHouseholdIncome\n",
    "\n",
    "zipcode_to_household_income = ZIPCodeToHouseholdIncome()\n",
    "zipcode_to_household_income([\"82838\", \"02116\", \"02116-3899\"])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Numeric Primitives\n",
    "\n",
    "The premium primitives have additional nmumeric primitives that add new mathematical transformations and aggregations that aren't present in the open-source library. They are frequently useful in time-series analysis"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### NumPeaks\n",
    "\n",
    "Determines the number of peaks in a list of numbers."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "4"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from featuretools.primitives import NumPeaks\n",
    "\n",
    "num_peaks = NumPeaks()\n",
    "num_peaks([-5, 0, 10, 0, 10, -5, -4, -5, 10, 0])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### NumZeroCrossings\n",
    "Determines the number of times a list crosses 0."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "5"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from featuretools.primitives import NumZeroCrossings\n",
    "\n",
    "num_zero_crossings = NumZeroCrossings()\n",
    "num_zero_crossings([1, -1, 2, -2, 3, -3])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Correlation\n",
    "\n",
    "Computes the correlation between two columns of values."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.9221388919541468"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from featuretools.primitives import Correlation\n",
    "\n",
    "correlation = Correlation()\n",
    "array_1 = [1, 4, 6, 7]\n",
    "array_2 = [1, 5, 9, 7]\n",
    "correlation(array_1, array_2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### CountOutsideRange\n",
    "\n",
    "Determines the number of values that fall outside a certain range."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "3"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from featuretools.primitives import CountOutsideRange\n",
    "\n",
    "count_outside_range = CountOutsideRange(lower=1.5, upper=3.6)\n",
    "count_outside_range([1, 2, 3, 4, 5])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Miscellaneous Data Types\n",
    "\n",
    "Other variable types include FullName, EmailAddress, URL, CountryCode, SubRegionCode, FilePath\n",
    "\n",
    "### FullNameToLastName\n",
    "\n",
    "Determines the first name from a person's name."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0          Spector\n",
       "1    Oliva y Ocana\n",
       "2             Ware\n",
       "3            Peter\n",
       "4            Brown\n",
       "Name: last_name, dtype: object"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from featuretools.primitives import FullNameToLastName\n",
    "\n",
    "full_name_to_last_name = FullNameToLastName()\n",
    "names = ['Woolf Spector', 'Oliva y Ocana, Dona. Fermina',\n",
    "         'Ware, Mr. Frederick', 'Peter, Michael J', 'Mr. Brown']\n",
    "full_name_to_last_name(names)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### IsFreeEmailDomain\n",
    "\n",
    "Determines if an email address is from a free email domain."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([ True, False])"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from featuretools.primitives import IsFreeEmailDomain\n",
    "\n",
    "is_free_email_domain = IsFreeEmailDomain()\n",
    "is_free_email_domain(['name@gmail.com', 'name@featuretools.com'])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### URLToDomain\n",
    "\n",
    "Determines the domain of a url."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    play.google.com\n",
       "1       google.co.in\n",
       "2       facebook.com\n",
       "dtype: object"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from featuretools.primitives import URLToDomain\n",
    "\n",
    "url_to_domain = URLToDomain()\n",
    "urls =  ['https://play.google.com', 'http://www.google.co.in', 'www.facebook.com']\n",
    "url_to_domain(urls)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### CountryCodeToIncome\n",
    "\n",
    "Transforms a 2-digit or 3-digit ISO-3166-1 country code into Gross National Income (GNI) per capita."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([58270.,  3990.,  5920.])"
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from featuretools.primitives import CountryCodeToIncome\n",
    "\n",
    "country_code_to_income = CountryCodeToIncome()\n",
    "country_code_to_income(['USA', 'AM', 'EC'])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### SubRegionCodeToMedianHouseholdIncome\n",
    "\n",
    "Determines the median household income of a US sub-region."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([51113, 63481, 63805, 83382, 57700, 62447])"
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from featuretools.primitives import SubRegionCodeToMedianHouseholdIncome\n",
    "\n",
    "sub_region_code_to_median_household_income = SubRegionCodeToMedianHouseholdIncome()\n",
    "subregions = [\"US-AL\", \"US-IA\", \"US-VT\", \"US-DC\", \"US-MI\", \"US-NY\"]\n",
    "sub_region_code_to_median_household_income(subregions)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### FileExtension\n",
    "\n",
    "Determines the extension of a filepath."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0     .txt\n",
       "1    .json\n",
       "2      NaN\n",
       "dtype: object"
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from featuretools.primitives import FileExtension\n",
    "\n",
    "file_extension = FileExtension()\n",
    "file_extension(['doc.txt', '~/documents/data.json', 'file'])"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.8"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}