{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Data clean | Dados de vendas de veículos Brasal" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[notebook \"Regressão\" -->](https://github.com/piegu/fastai-projects/blob/master/vendedor_IA_vendas_veiculos_brasal_REGRESSAO.ipynb)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- Autor: [Pierre Guillou](https://www.linkedin.com/in/pierreguillou)\n", "- Data: Maio 2019\n", "- Hackathon Brasal/PCTec-UnB 2019: http://www.brasal.com.br/inovacao/hackathon/\n", "- Datasets do Brasal: http://www.brasal.com.br/inovacao/hackathon/wp-content/uploads/2019/05/oficial.zip\n", "- Biblioteca de Deep Learning usada: [Fastai v1](https://docs.fast.ai/) (Deep Learning com PyTorch)\n", "- Inspiração do Fastai notebook \"Rossmann\": https://github.com/fastai/course-v3/blob/master/nbs/dl1/rossman_data_clean.ipynb" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Overview" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "O **Hackathon Brasal/PCTec-UnB 2019** foi uma maratona de dados (dias 9 e 10 de maio de 2019), que reuniu estudantes, profissionais e comunidade, com o desafio de em dois dias, realizaram um projeto de Bussiness Intelligence para um cliente real: [Brasal Veículos](http://vw.brasal.com.br/). Aconteceu no [CDT](http://www.cdt.unb.br/) da Universidade de Brasília (UnB) no Brasil.\n", "\n", "Nesse contexto, minha equipe desenvolveu o projeto **\"Vendedor IA\" (VIA), um conjunto de modelos de Inteligência Artificial (IA) usando o Deep Learning** cujo princípio é descrito nos parágrafos seguintes.\n", "\n", "**Vendedor IA (VIA)**\n", "\n", "O VIA é uma ferramenta de IA (um conjunto de modelos usando o Deep Learning) que **ajuda os vendedores de Brasal Veículos a atingir a suas metas de vendas**. \n", "\n", "Ele não é um avatar que vende diretamente os veículos. Graças ao treinamento com os dados de vendas do passado, **faz previsões personalisadas em função do perfil de cada cliente**. Por exemplo: estimativa do orçamento que o cliente está disposto a gastar na compra de um veículo, previsão da categoria de veículo que o cliente mais gosta, previsão da data de compra mais provável, etc.).\n", "\n", "**VIA modeliza o mundo cliente-vendedor**\n", "\n", "Graças ao treinamento com os dados de vendas do passado, **as redes neurais do VIA modelizam o mundo cliente-vendedor no contexto das vendas de veículos Brasal**. \n", "\n", "**Treinamento dos modelos de IA**\n", "\n", "Os modelos de IA são treinados a partir dos dados de vendas de veículos Brasal dos últimos anos e são atualizados todas as noites com os dados de vendas do dia. \n", "\n", "**Modelos desenvolvidos no contexto do hackathon**\n", "\n", "Muitos modelos AI podem ser desenvolvidos para o VIA (veja exemplos dados). Apenas para mostrar como, um **modelo de regressão fornecendo o orçamento que o cliente está disposto a gastar na compra de um veículo** está disponível online no github ([vendedor_IA_vendas_veiculos_brasal_REGRESSAO.ipynb](https://github.com/piegu/fastai-projects/blob/master/vendedor_IA_vendas_veiculos_brasal_REGRESSAO.ipynb)).\n", "\n", "**AI4BI (Artificial Intelligence for Business Intelligence)**\n", "\n", "O VIA é também um **projeto AI4BI**: as previsões do VIA vão para um aplicativo de BI que permite aos vendedores de Brasal Veículos de visualizá-las em gráficos interativos." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Notebooks" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "2 jupyter notebooks foram criados:\n", "1. **Data clean** ([vendas_veiculos_brasal_data_clean.ipynb](https://github.com/piegu/fastai-projects/blob/master/vendas_veiculos_brasal_data_clean.ipynb)): é o notebook de preparação da tabela de dados de vendas para treinar os modelos do VIA.\n", "2. **Regressão** ([vendedor_IA_vendas_veiculos_brasal_REGRESSAO.ipynb](https://github.com/piegu/fastai-projects/blob/master/vendedor_IA_vendas_veiculos_brasal_REGRESSAO.ipynb)): é o notebook de treinamento do modelo que fornece o orçamento que o cliente está disposto a gastar na compra de um veículo." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Initialisation" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "%reload_ext autoreload\n", "%autoreload 2" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "from fastai.basics import *" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "fastai: 1.0.52\n", "cuda: True\n" ] } ], "source": [ "import fastai\n", "print(f'fastai: {fastai.__version__}')\n", "print(f'cuda: {torch.cuda.is_available()}')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Data preparation / Feature engineering" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "PATH = Config.data_path()/'brasal/'\n", "# PATH.ls()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Para criar o modelo de probabilidades de vendas por categoria de veículo, usamos os seguintes arquivos.\n", "\n", "Nota: o arquivo `MODELOS_VEICULOS.CSV` foi criado a partir do `VEICULOS_BRAVOS.CSV` com a coluna `CLASSIFICACAO` em adição, o que dá a categoria do veículo.\n", "\n", "1. Popular\n", "2. Utilitário\n", "3. OFF ROAD\n", "4. SUV (Sport Utility Vehicle)\n", "5. Luxo" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "# list of tables\n", "table_names = ['CLIENTES_BRAVOS',\n", " 'MODELOS_VEICULOS',\n", " 'VENDAS_VEICULOS_BRAVOS', \n", " 'VENDEDOR_BRAVOS']" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "tables = [pd.read_csv(PATH/f'{fname}.csv', low_memory=False, sep=\";\", encoding='cp1252') for fname in table_names]" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "4 tabelas: clientes, veiculos, vendas, vendedores\n", "Número de vendas: 7284\n" ] } ], "source": [ "clientes, veiculos, vendas, vendedores = tables\n", "print(f'{len(tables)} tabelas: clientes, veiculos, vendas, vendedores')\n", "print(f'Número de vendas: {len(vendas)}')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Table Clientes" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Processo**:\n", "\n", "1. mantendo apenas as colunas necessárias para treinamento\n", "2. adicionando novas colunas se necessário (ex: CEP para CEP1, CEP2, CEP3)\n", "3. substituindo datas erradas e ausentes\n", "4. alterendo dtype das colunas quando necessário (ex: ESTADO_CIVIL para int32) e substituindo dados ausentes ou errados (ex: 0 quando não há dado de ESTADO_CIVIL)" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "df = clientes" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array(['CLIENTE_COD', 'NOME_CLIENTE', 'DATA_NASCIMENTO', 'SEXO', 'ESTADO_CIVIL', 'DESC_ESTADO_CIVIL',\n", " 'SITUACAO_CREDITO', 'CLIENTE_TIPO', 'CLIENTE_CATEGORIA', 'DES_CATEGORIA_CLIENTE', 'ENDERECO_BAIRRO',\n", " 'ENDERECO_MUNICIPIO', 'ENDERECO_UF', 'CEP', 'CLIENTE_DTCADASTRO', 'CLIENTE_DTULTIMOMOVIMENTO',\n", " 'DTULTIMAALTERACAO', 'DTPRIMEIRAFATURA', 'DTULTIMAFATURA', 'LIMITE_CREDITO', 'CLIENTE_VIP', 'EMPRESA_ORIGEM',\n", " 'REVENDA_ORIGEM', 'GRUPO_CLIENTE', 'CADEIRANTE', 'CLIENTE_INATIVO', 'TIPO_PESSOA'], dtype=object)" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.columns.values" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Step 1" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "# List of fields by client\n", "\n", "# client_tipo means\n", "# 1 -> Ouro 2 -> Prata 3 -> Bronze E -> Especial L -> Lista Negra N -> Normal P -> Prospecção\n", "\n", "columns_clientes = array(['CLIENTE_COD', 'DATA_NASCIMENTO', 'SEXO', \n", " 'ESTADO_CIVIL',\n", " 'SITUACAO_CREDITO', \n", " 'CEP',\n", " 'CLIENTE_TIPO', 'TIPO_PESSOA', # J / F\n", " 'CLIENTE_DTCADASTRO', 'EMPRESA_ORIGEM','REVENDA_ORIGEM', \n", " 'DTPRIMEIRAFATURA', 'DTULTIMAFATURA', 'LIMITE_CREDITO'\n", " ])" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "scrolled": false }, "outputs": [], "source": [ "df_clientes = df[columns_clientes].copy()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Step 2" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "# keep informations about client postal address by keeping the 3 first digits of CEP\n", "\n", "df_clientes['CEP'] = df_clientes['CEP'].fillna(0.0).astype(np.int32)\n", " \n", "for i in range(3):\n", " f = lambda x: int(list(str(x))[i]) if len(list(str(x)))>=3 else 0\n", " n = i+1\n", " df_clientes[f'CEP{n}'] = df_clientes['CEP'].apply(f)\n", " \n", "df_clientes = df_clientes.drop(['CEP'], axis=1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Step 3" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "# Remplace all date with error by '1900/01/01 00:00:00.000000000'\n", "\n", "# check if value is NaN\n", "def isNaN(num):\n", " return num != num\n", "\n", "def indexes_date_correction(date_fldname, df):\n", " idxs_erros = list()\n", " idxs_empty = list()\n", " for idx,dt in df[date_fldname].iteritems():\n", " if isNaN(dt) == False:\n", " y = int(dt.split(\"/\")[0])\n", " m = int(dt.split(\"/\")[1])\n", " if (y < 1900) or (y > 2019) or (m < 1) or (m > 12) or (y == 2019 and m > 4):\n", " idxs_erros.append(idx)\n", " else:\n", " idxs_empty.append(idx)\n", " return idxs_erros, idxs_empty\n", "\n", "def date_correction(idxs, date_fldname, df, new_date='1900/01/01 00:00:00.000000000', flag='error'):\n", " df.loc[idxs, date_fldname] = new_date\n", " if flag == 'error':\n", " print(f'Todas as datas de {date_fldname} têm sido corrigidas.')\n", " elif flag == 'empty':\n", " print(f'Todas as cédulas sem data de {date_fldname} têm sido preenchidas.')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### DATA_NASCIMENTO" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Número de cédulas sem datas DATA_NASCIMENTO: 69531\n", "Número de erros nas datas DATA_NASCIMENTO: 7764\n", "\n", "7 2040/12/10 00:00:00.000000000\n", "13 2045/11/08 00:00:00.000000000\n", "16 2032/03/19 00:00:00.000000000\n", "23 2049/11/18 00:00:00.000000000\n", "24 2040/08/28 00:00:00.000000000\n", "Name: DATA_NASCIMENTO, dtype: object\n", "\n", "188002 0977/11/18 00:00:00.000000000\n", "188217 0001/01/01 00:00:00.000000000\n", "188253 2031/03/25 00:00:00.000000000\n", "188405 2019/08/18 00:00:00.000000000\n", "188407 2033/01/19 00:00:00.000000000\n", "Name: DATA_NASCIMENTO, dtype: object\n" ] } ], "source": [ "df = df_clientes\n", "date_fldname = 'DATA_NASCIMENTO'\n", "idxs_erros, idxs_empty = indexes_date_correction(date_fldname,df)\n", "errors_data = len(idxs_erros)\n", "empty_data = len(idxs_empty)\n", "\n", "print(f'Número de cédulas sem datas {date_fldname}: {empty_data}')\n", "print(f'Número de erros nas datas {date_fldname}: {errors_data}')\n", "if errors_data != 0:\n", " print()\n", " print(df.loc[idxs_erros, date_fldname].head())\n", " print()\n", " print(df.loc[idxs_erros, date_fldname].tail())" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Todas as datas de DATA_NASCIMENTO têm sido corrigidas.\n", "Todas as cédulas sem data de DATA_NASCIMENTO têm sido preenchidas.\n" ] } ], "source": [ "# Correction of date errors\n", "date_correction(idxs_erros,date_fldname,df)\n", "# Fillin empty cells with '1900/01/01 00:00:00.000000000'\n", "date_correction(idxs_empty,date_fldname,df,flag='empty')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### CLIENTE_DTCADASTRO" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Número de cédulas sem datas CLIENTE_DTCADASTRO: 75946\n", "Número de erros nas datas CLIENTE_DTCADASTRO: 247\n", "\n", "94093 2038/05/23 00:00:00.000000000\n", "101627 2043/06/24 00:00:00.000000000\n", "101901 2047/12/15 00:00:00.000000000\n", "101948 2043/12/14 00:00:00.000000000\n", "101954 2049/10/20 00:00:00.000000000\n", "Name: CLIENTE_DTCADASTRO, dtype: object\n", "\n", "162595 2049/06/15 00:00:00.000000000\n", "172822 2046/06/25 00:00:00.000000000\n", "182667 2037/02/20 00:00:00.000000000\n", "182675 2046/01/28 00:00:00.000000000\n", "185408 2047/06/03 00:00:00.000000000\n", "Name: CLIENTE_DTCADASTRO, dtype: object\n" ] } ], "source": [ "df = df_clientes\n", "date_fldname = 'CLIENTE_DTCADASTRO'\n", "idxs_erros, idxs_empty = indexes_date_correction(date_fldname,df)\n", "errors_data = len(idxs_erros)\n", "empty_data = len(idxs_empty)\n", "\n", "print(f'Número de cédulas sem datas {date_fldname}: {empty_data}')\n", "print(f'Número de erros nas datas {date_fldname}: {errors_data}')\n", "if errors_data != 0:\n", " print()\n", " print(df.loc[idxs_erros, date_fldname].head())\n", " print()\n", " print(df.loc[idxs_erros, date_fldname].tail())" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Todas as datas de CLIENTE_DTCADASTRO têm sido corrigidas.\n", "Todas as cédulas sem data de CLIENTE_DTCADASTRO têm sido preenchidas.\n" ] } ], "source": [ "# Correction of date errors\n", "date_correction(idxs_erros,date_fldname,df)\n", "# Fillin empty cells with '1900/01/01 00:00:00.000000000'\n", "date_correction(idxs_empty,date_fldname,df,flag='empty')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### DTPRIMEIRAFATURA" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Número de cédulas sem datas DTPRIMEIRAFATURA: 177795\n", "Número de erros nas datas DTPRIMEIRAFATURA: 0\n" ] } ], "source": [ "df = df_clientes\n", "date_fldname = 'DTPRIMEIRAFATURA'\n", "idxs_erros, idxs_empty = indexes_date_correction(date_fldname,df)\n", "errors_data = len(idxs_erros)\n", "empty_data = len(idxs_empty)\n", "\n", "print(f'Número de cédulas sem datas {date_fldname}: {empty_data}')\n", "print(f'Número de erros nas datas {date_fldname}: {errors_data}')\n", "if errors_data != 0:\n", " print()\n", " print(df.loc[idxs_erros, date_fldname].head())\n", " print()\n", " print(df.loc[idxs_erros, date_fldname].tail())" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Todas as datas de DTPRIMEIRAFATURA têm sido corrigidas.\n", "Todas as cédulas sem data de DTPRIMEIRAFATURA têm sido preenchidas.\n" ] } ], "source": [ "# Correction of date errors\n", "date_correction(idxs_erros,date_fldname,df)\n", "# Fillin empty cells with '1900/01/01 00:00:00.000000000'\n", "date_correction(idxs_empty,date_fldname,df,flag='empty')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### DTULTIMAFATURA" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Número de cédulas sem datas DTULTIMAFATURA: 177759\n", "Número de erros nas datas DTULTIMAFATURA: 0\n" ] } ], "source": [ "df = df_clientes\n", "date_fldname = 'DTULTIMAFATURA'\n", "idxs_erros, idxs_empty = indexes_date_correction(date_fldname,df)\n", "errors_data = len(idxs_erros)\n", "empty_data = len(idxs_empty)\n", "\n", "print(f'Número de cédulas sem datas {date_fldname}: {empty_data}')\n", "print(f'Número de erros nas datas {date_fldname}: {errors_data}')\n", "if errors_data != 0:\n", " print()\n", " print(df.loc[idxs_erros, date_fldname].head())\n", " print()\n", " print(df.loc[idxs_erros, date_fldname].tail())" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Todas as datas de DTULTIMAFATURA têm sido corrigidas.\n", "Todas as cédulas sem data de DTULTIMAFATURA têm sido preenchidas.\n" ] } ], "source": [ "# Correction of date errors\n", "date_correction(idxs_erros,date_fldname,df)\n", "# Fillin empty cells with '1900/01/01 00:00:00.000000000'\n", "date_correction(idxs_empty,date_fldname,df,flag='empty')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Step 4" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "CLIENTE_COD int64\n", "DATA_NASCIMENTO object\n", "SEXO object\n", "ESTADO_CIVIL float64\n", "SITUACAO_CREDITO object\n", "CLIENTE_TIPO object\n", "TIPO_PESSOA object\n", "CLIENTE_DTCADASTRO object\n", "EMPRESA_ORIGEM int64\n", "REVENDA_ORIGEM int64\n", "DTPRIMEIRAFATURA object\n", "DTULTIMAFATURA object\n", "LIMITE_CREDITO float64\n", "CEP1 int64\n", "CEP2 int64\n", "CEP3 int64\n", "dtype: object" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_clientes.dtypes" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [], "source": [ "# Delete espace in cells\n", "f = lambda x: x.strip()\n", "df_clientes.SEXO = df_clientes.SEXO.apply(f)\n", "\n", "# Replace empty values by nan\n", "df_clientes.SEXO.replace('', np.nan, inplace=True)" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [], "source": [ "df_clientes['CLIENTE_COD'] = df_clientes['CLIENTE_COD'].astype(np.int32)\n", "df_clientes['SEXO'] = df_clientes['SEXO'].fillna('N') # N = neutral when there is no sex\n", "df_clientes['ESTADO_CIVIL'] = df_clientes['ESTADO_CIVIL'].fillna(0.0).astype(np.int32)\n", "df_clientes['EMPRESA_ORIGEM'] = df_clientes['EMPRESA_ORIGEM'].astype(np.int32)\n", "df_clientes['REVENDA_ORIGEM'] = df_clientes['REVENDA_ORIGEM'].astype(np.int32)\n", "df_clientes['LIMITE_CREDITO'] = df_clientes['LIMITE_CREDITO'].fillna(0.0).astype(np.float32)\n", "df_clientes['CEP1'] = df_clientes['CEP1'].astype(np.int32)\n", "df_clientes['CEP2'] = df_clientes['CEP2'].astype(np.int32)\n", "df_clientes['CEP3'] = df_clientes['CEP3'].astype(np.int32)" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [], "source": [ "df_clientes['DATA_NASCIMENTO'] = pd.to_datetime(df_clientes['DATA_NASCIMENTO'], infer_datetime_format=True)\n", "df_clientes['CLIENTE_DTCADASTRO'] = pd.to_datetime(df_clientes['CLIENTE_DTCADASTRO'], infer_datetime_format=True)\n", "df_clientes['DTPRIMEIRAFATURA'] = pd.to_datetime(df_clientes['DTPRIMEIRAFATURA'], infer_datetime_format=True)\n", "df_clientes['DTULTIMAFATURA'] = pd.to_datetime(df_clientes['DTULTIMAFATURA'], infer_datetime_format=True)" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "CLIENTE_COD int32\n", "DATA_NASCIMENTO datetime64[ns]\n", "SEXO object\n", "ESTADO_CIVIL int32\n", "SITUACAO_CREDITO object\n", "CLIENTE_TIPO object\n", "TIPO_PESSOA object\n", "CLIENTE_DTCADASTRO datetime64[ns]\n", "EMPRESA_ORIGEM int32\n", "REVENDA_ORIGEM int32\n", "DTPRIMEIRAFATURA datetime64[ns]\n", "DTULTIMAFATURA datetime64[ns]\n", "LIMITE_CREDITO float32\n", "CEP1 int32\n", "CEP2 int32\n", "CEP3 int32\n", "dtype: object" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_clientes.dtypes" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
01234
CLIENTE_COD170717273
DATA_NASCIMENTO1900-01-01 00:00:001966-04-24 00:00:001966-05-21 00:00:001964-09-09 00:00:001968-09-07 00:00:00
SEXONMFMM
ESTADO_CIVIL02122
SITUACAO_CREDITOAprovadoAprovadoAprovadoAprovadoAprovado
CLIENTE_TIPONNNNN
TIPO_PESSOAJFFFF
CLIENTE_DTCADASTRO2015-02-24 00:00:001900-01-01 00:00:001900-01-01 00:00:001900-01-01 00:00:001900-01-01 00:00:00
EMPRESA_ORIGEM11111
REVENDA_ORIGEM11111
DTPRIMEIRAFATURA2015-04-01 00:00:001900-01-01 00:00:001900-01-01 00:00:001900-01-01 00:00:001900-01-01 00:00:00
DTULTIMAFATURA2016-09-30 00:00:001900-01-01 00:00:001900-01-01 00:00:001900-01-01 00:00:001900-01-01 00:00:00
LIMITE_CREDITO1e+070000
CEP177777
CEP277777
CEP305000
\n", "
" ], "text/plain": [ " 0 1 \\\n", "CLIENTE_COD 1 70 \n", "DATA_NASCIMENTO 1900-01-01 00:00:00 1966-04-24 00:00:00 \n", "SEXO N M \n", "ESTADO_CIVIL 0 2 \n", "SITUACAO_CREDITO Aprovado Aprovado \n", "CLIENTE_TIPO N N \n", "TIPO_PESSOA J F \n", "CLIENTE_DTCADASTRO 2015-02-24 00:00:00 1900-01-01 00:00:00 \n", "EMPRESA_ORIGEM 1 1 \n", "REVENDA_ORIGEM 1 1 \n", "DTPRIMEIRAFATURA 2015-04-01 00:00:00 1900-01-01 00:00:00 \n", "DTULTIMAFATURA 2016-09-30 00:00:00 1900-01-01 00:00:00 \n", "LIMITE_CREDITO 1e+07 0 \n", "CEP1 7 7 \n", "CEP2 7 7 \n", "CEP3 0 5 \n", "\n", " 2 3 \\\n", "CLIENTE_COD 71 72 \n", "DATA_NASCIMENTO 1966-05-21 00:00:00 1964-09-09 00:00:00 \n", "SEXO F M \n", "ESTADO_CIVIL 1 2 \n", "SITUACAO_CREDITO Aprovado Aprovado \n", "CLIENTE_TIPO N N \n", "TIPO_PESSOA F F \n", "CLIENTE_DTCADASTRO 1900-01-01 00:00:00 1900-01-01 00:00:00 \n", "EMPRESA_ORIGEM 1 1 \n", "REVENDA_ORIGEM 1 1 \n", "DTPRIMEIRAFATURA 1900-01-01 00:00:00 1900-01-01 00:00:00 \n", "DTULTIMAFATURA 1900-01-01 00:00:00 1900-01-01 00:00:00 \n", "LIMITE_CREDITO 0 0 \n", "CEP1 7 7 \n", "CEP2 7 7 \n", "CEP3 0 0 \n", "\n", " 4 \n", "CLIENTE_COD 73 \n", "DATA_NASCIMENTO 1968-09-07 00:00:00 \n", "SEXO M \n", "ESTADO_CIVIL 2 \n", "SITUACAO_CREDITO Aprovado \n", "CLIENTE_TIPO N \n", "TIPO_PESSOA F \n", "CLIENTE_DTCADASTRO 1900-01-01 00:00:00 \n", "EMPRESA_ORIGEM 1 \n", "REVENDA_ORIGEM 1 \n", "DTPRIMEIRAFATURA 1900-01-01 00:00:00 \n", "DTULTIMAFATURA 1900-01-01 00:00:00 \n", "LIMITE_CREDITO 0 \n", "CEP1 7 \n", "CEP2 7 \n", "CEP3 0 " ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_clientes.head().T" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Número de clientes: 188432\n" ] } ], "source": [ "print(f'Número de clientes: {len(df_clientes)}')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Table Veiculos" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Processo**: mantendo apenas as colunas necessárias para treinamento." ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [], "source": [ "df = veiculos" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array(['EMPRESA', 'MODELO', 'DES_MODELO', 'FAMILIA', 'DES_FAMILIA', 'MARCA', 'CLASSIFICACAO'], dtype=object)" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.columns.values" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [], "source": [ "# List of fields by car\n", "\n", "# We did not keep the empresa that owns the car and we classify in the same category new and second-hand cars\n", "\n", "columns_veiculos = array(['MODELO', 'DES_MODELO', 'FAMILIA', 'MARCA', 'CLASSIFICACAO'])" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [], "source": [ "df_veiculos = df[columns_veiculos].copy()" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [], "source": [ "# Delete espace in cells\n", "f = lambda x: x.strip()\n", "df_veiculos.MODELO = df_veiculos.MODELO.apply(f)" ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "MODELO object\n", "DES_MODELO object\n", "FAMILIA int64\n", "MARCA object\n", "CLASSIFICACAO int64\n", "dtype: object" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_veiculos.dtypes" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [], "source": [ "df_veiculos['FAMILIA'] = df_veiculos['FAMILIA'].astype(np.int32)\n", "df_veiculos['CLASSIFICACAO'] = df_veiculos['CLASSIFICACAO'].fillna(0).astype(np.int32)" ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "MODELO object\n", "DES_MODELO object\n", "FAMILIA int32\n", "MARCA object\n", "CLASSIFICACAO int32\n", "dtype: object" ] }, "execution_count": 36, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_veiculos.dtypes" ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
01234
MODELO....1623H31624N31624U31632F2
DES_MODELOFOX CONNECT SBVIRTUS MSI 1.6VW/GOL 1.0GOL 1.6GOL 1,6
FAMILIA11107111
MARCAVWVWVWVWVW
CLASSIFICACAO13111
\n", "
" ], "text/plain": [ " 0 \\\n", "MODELO .... \n", "DES_MODELO FOX CONNECT SB \n", "FAMILIA 11 \n", "MARCA VW \n", "CLASSIFICACAO 1 \n", "\n", " 1 \\\n", "MODELO 1623H3 \n", "DES_MODELO VIRTUS MSI 1.6 \n", "FAMILIA 107 \n", "MARCA VW \n", "CLASSIFICACAO 3 \n", "\n", " 2 \\\n", "MODELO 1624N3 \n", "DES_MODELO VW/GOL 1.0 \n", "FAMILIA 1 \n", "MARCA VW \n", "CLASSIFICACAO 1 \n", "\n", " 3 \\\n", "MODELO 1624U3 \n", "DES_MODELO GOL 1.6 \n", "FAMILIA 1 \n", "MARCA VW \n", "CLASSIFICACAO 1 \n", "\n", " 4 \n", "MODELO 1632F2 \n", "DES_MODELO GOL 1,6 \n", "FAMILIA 1 \n", "MARCA VW \n", "CLASSIFICACAO 1 " ] }, "execution_count": 37, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_veiculos.head().T" ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Número de veiculos: 592\n" ] } ], "source": [ "print(f'Número de veiculos: {len(df_veiculos)}')" ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Número de família de veiculos: 37\n" ] } ], "source": [ "print(f'Número de família de veiculos: {len(df_veiculos.FAMILIA.unique())}')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Table Vendedores" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Este arquivo de vendedores deveria fornecer mais informações sobre eles como sexo, idade etc. e como o local de trabalho do vendedor é indicado na tabela de vendas, **não é necessário usar esse arquivo**." ] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [], "source": [ "df = vendedores" ] }, { "cell_type": "code", "execution_count": 41, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array(['VENDEDOR', 'NOME', 'EMPRESA', 'REVENDA', 'DEPARTAMENTO', 'FUNCAO', 'COD_FUNCAO', 'ATIVO'], dtype=object)" ] }, "execution_count": 41, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.columns.values" ] }, { "cell_type": "code", "execution_count": 42, "metadata": {}, "outputs": [], "source": [ "# List of fields by seller\n", "columns_vendedores = array(['VENDEDOR',\n", " 'EMPRESA', 'REVENDA'\n", " ])" ] }, { "cell_type": "code", "execution_count": 43, "metadata": {}, "outputs": [], "source": [ "df_vendedores = df[columns_vendedores].copy()" ] }, { "cell_type": "code", "execution_count": 44, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "VENDEDOR int64\n", "EMPRESA int64\n", "REVENDA int64\n", "dtype: object" ] }, "execution_count": 44, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_vendedores.dtypes" ] }, { "cell_type": "code", "execution_count": 45, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
01234
VENDEDOR110001110002110003110004110005
EMPRESA11111
REVENDA11111
\n", "
" ], "text/plain": [ " 0 1 2 3 4\n", "VENDEDOR 110001 110002 110003 110004 110005\n", "EMPRESA 1 1 1 1 1\n", "REVENDA 1 1 1 1 1" ] }, "execution_count": 45, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_vendedores.head().T" ] }, { "cell_type": "code", "execution_count": 46, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Número de vendedores: 447\n" ] } ], "source": [ "print(f'Número de vendedores: {len(df_vendedores)}')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Table Vendas" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Processo**:\n", "\n", "1. mantendo apenas as colunas necessárias para treinamento\n", "2. substituindo datas erradas e ausentes\n", "3. alterendo dtype das colunas quando necessário (ex: DTA_ENTRADA_SAIDA para datetime) " ] }, { "cell_type": "code", "execution_count": 47, "metadata": {}, "outputs": [], "source": [ "df = vendas" ] }, { "cell_type": "code", "execution_count": 48, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array(['EMPRESA', 'REVENDA', 'NUMERO_NOTA_FISCAL', 'SERIE_NOTA_FISCAL', 'TIPO_TRANSACAO', 'DTA_ENTRADA_SAIDA',\n", " 'DTA_DOCUMENTO', 'CLIENTE', 'DEPARTAMENTO', 'USUARIO', 'REVENDA_ORIGEM', 'QUANTIDADE', 'VAL_FRETE',\n", " 'VAL_DESCONTO', 'VAL_SEGURO', 'VAL_IPI', 'VAL_PIS', 'VAL_COFINS', 'VAL_TOTAL_REAL_ITEM', 'VAL_TOTAL_NOTA_ITEM',\n", " 'VAL_ICMS_RETIDO', 'VAL_CUSTO_CONTABIL', 'VAL_ICMS', 'VENDEDOR', 'VAL_ISS_RETIDO', 'COD_SETOR_VENDA',\n", " 'VAL_BONUS', 'MODELO'], dtype=object)" ] }, "execution_count": 48, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.columns.values" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Step 1" ] }, { "cell_type": "code", "execution_count": 49, "metadata": {}, "outputs": [], "source": [ "# List of fields by sale\n", "columns_vendas = array(['EMPRESA', 'REVENDA',\n", " 'DTA_ENTRADA_SAIDA',\n", " 'CLIENTE', \n", " 'VAL_TOTAL_REAL_ITEM',\n", " 'VAL_TOTAL_NOTA_ITEM', \n", " 'VAL_CUSTO_CONTABIL',\n", " 'VENDEDOR', \n", " 'VAL_BONUS',\n", " 'MODELO'\n", " ])" ] }, { "cell_type": "code", "execution_count": 50, "metadata": {}, "outputs": [], "source": [ "df_vendas = df[columns_vendas].copy()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Step 2" ] }, { "cell_type": "code", "execution_count": 51, "metadata": {}, "outputs": [], "source": [ "# Rename columns names to avoid problem when merging with table of sales\n", "df_vendas.rename(columns={\"EMPRESA\": 'EMPRESA_VENDAS', 'REVENDA': 'REVENDA_VENDAS'}, inplace=True)" ] }, { "cell_type": "code", "execution_count": 52, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Número de cédulas sem datas DTA_ENTRADA_SAIDA: 0\n", "Número de erros nas datas DTA_ENTRADA_SAIDA: 3\n", "\n", "6600 2201/11/16 00:00:00.000\n", "7035 2019/12/20 00:00:00.000\n", "7109 2019/12/07 00:00:00.000\n", "Name: DTA_ENTRADA_SAIDA, dtype: object\n", "\n", "6600 2201/11/16 00:00:00.000\n", "7035 2019/12/20 00:00:00.000\n", "7109 2019/12/07 00:00:00.000\n", "Name: DTA_ENTRADA_SAIDA, dtype: object\n" ] } ], "source": [ "df = df_vendas\n", "date_fldname = 'DTA_ENTRADA_SAIDA'\n", "idxs_erros, idxs_empty = indexes_date_correction(date_fldname,df)\n", "errors_data = len(idxs_erros)\n", "empty_data = len(idxs_empty)\n", "\n", "print(f'Número de cédulas sem datas {date_fldname}: {empty_data}')\n", "print(f'Número de erros nas datas {date_fldname}: {errors_data}')\n", "if errors_data != 0:\n", " print()\n", " print(df.loc[idxs_erros, date_fldname].head())\n", " print()\n", " print(df.loc[idxs_erros, date_fldname].tail())" ] }, { "cell_type": "code", "execution_count": 53, "metadata": {}, "outputs": [], "source": [ "# Correction of date errors\n", "df.loc[6600,'DTA_ENTRADA_SAIDA'] = '2001/11/16 00:00:00.000'\n", "df.loc[7035,'DTA_ENTRADA_SAIDA'] = '2009/12/20 00:00:00.000'\n", "df.loc[7109,'DTA_ENTRADA_SAIDA'] = '2009/12/07 00:00:00.000'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Step 3" ] }, { "cell_type": "code", "execution_count": 54, "metadata": {}, "outputs": [], "source": [ "df_vendas['VAL_TOTAL_REAL_ITEM'] = df_vendas['VAL_TOTAL_REAL_ITEM'].abs()\n", "df_vendas['VAL_TOTAL_NOTA_ITEM'] = df_vendas['VAL_TOTAL_NOTA_ITEM'].abs()\n", "df_vendas['VAL_CUSTO_CONTABIL'] = df_vendas['VAL_CUSTO_CONTABIL'].abs()" ] }, { "cell_type": "code", "execution_count": 55, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0" ] }, "execution_count": 55, "metadata": {}, "output_type": "execute_result" } ], "source": [ "(df_vendas.VAL_TOTAL_REAL_ITEM < 0.).sum()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Step 4" ] }, { "cell_type": "code", "execution_count": 56, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "EMPRESA_VENDAS int64\n", "REVENDA_VENDAS int64\n", "DTA_ENTRADA_SAIDA object\n", "CLIENTE int64\n", "VAL_TOTAL_REAL_ITEM float64\n", "VAL_TOTAL_NOTA_ITEM float64\n", "VAL_CUSTO_CONTABIL float64\n", "VENDEDOR int64\n", "VAL_BONUS float64\n", "MODELO object\n", "dtype: object" ] }, "execution_count": 56, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_vendas.dtypes" ] }, { "cell_type": "code", "execution_count": 57, "metadata": {}, "outputs": [], "source": [ "df_vendas['EMPRESA_VENDAS'] = df_vendas['EMPRESA_VENDAS'].astype(np.int32)\n", "df_vendas['REVENDA_VENDAS'] = df_vendas['REVENDA_VENDAS'].astype(np.int32)\n", "df_vendas['CLIENTE'] = df_vendas['CLIENTE'].astype(np.int32)\n", "df_vendas['VENDEDOR'] = df_vendas['VENDEDOR'].astype(np.int32)" ] }, { "cell_type": "code", "execution_count": 58, "metadata": {}, "outputs": [], "source": [ "df_vendas['DTA_ENTRADA_SAIDA'] = pd.to_datetime(df_vendas['DTA_ENTRADA_SAIDA'], infer_datetime_format=True)" ] }, { "cell_type": "code", "execution_count": 59, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "EMPRESA_VENDAS int32\n", "REVENDA_VENDAS int32\n", "DTA_ENTRADA_SAIDA datetime64[ns]\n", "CLIENTE int32\n", "VAL_TOTAL_REAL_ITEM float64\n", "VAL_TOTAL_NOTA_ITEM float64\n", "VAL_CUSTO_CONTABIL float64\n", "VENDEDOR int32\n", "VAL_BONUS float64\n", "MODELO object\n", "dtype: object" ] }, "execution_count": 59, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_vendas.dtypes" ] }, { "cell_type": "code", "execution_count": 60, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
01234
EMPRESA_VENDAS11151
REVENDA_VENDAS11111
DTA_ENTRADA_SAIDA2015-03-25 00:00:002015-03-31 00:00:002015-04-30 00:00:002018-09-18 00:00:002015-05-14 00:00:00
CLIENTE5230211518041
VAL_TOTAL_REAL_ITEM4880037008.137043.241869.3127944
VAL_TOTAL_NOTA_ITEM4880037008.137043.241869.3127944
VAL_CUSTO_CONTABIL47713.837008.137043.241869.3127944
VENDEDOR1100041100111100114000020110011
VAL_BONUSNaNNaNNaNNaNNaN
MODELO5U4FE46A53K45U3PN45U7TA42HBB3A
\n", "
" ], "text/plain": [ " 0 1 \\\n", "EMPRESA_VENDAS 1 1 \n", "REVENDA_VENDAS 1 1 \n", "DTA_ENTRADA_SAIDA 2015-03-25 00:00:00 2015-03-31 00:00:00 \n", "CLIENTE 52302 1 \n", "VAL_TOTAL_REAL_ITEM 48800 37008.1 \n", "VAL_TOTAL_NOTA_ITEM 48800 37008.1 \n", "VAL_CUSTO_CONTABIL 47713.8 37008.1 \n", "VENDEDOR 110004 110011 \n", "VAL_BONUS NaN NaN \n", "MODELO 5U4FE4 6A53K4 \n", "\n", " 2 3 \\\n", "EMPRESA_VENDAS 1 5 \n", "REVENDA_VENDAS 1 1 \n", "DTA_ENTRADA_SAIDA 2015-04-30 00:00:00 2018-09-18 00:00:00 \n", "CLIENTE 1 51804 \n", "VAL_TOTAL_REAL_ITEM 37043.2 41869.3 \n", "VAL_TOTAL_NOTA_ITEM 37043.2 41869.3 \n", "VAL_CUSTO_CONTABIL 37043.2 41869.3 \n", "VENDEDOR 110011 4000020 \n", "VAL_BONUS NaN NaN \n", "MODELO 5U3PN4 5U7TA4 \n", "\n", " 4 \n", "EMPRESA_VENDAS 1 \n", "REVENDA_VENDAS 1 \n", "DTA_ENTRADA_SAIDA 2015-05-14 00:00:00 \n", "CLIENTE 1 \n", "VAL_TOTAL_REAL_ITEM 127944 \n", "VAL_TOTAL_NOTA_ITEM 127944 \n", "VAL_CUSTO_CONTABIL 127944 \n", "VENDEDOR 110011 \n", "VAL_BONUS NaN \n", "MODELO 2HBB3A " ] }, "execution_count": 60, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_vendas.head().T" ] }, { "cell_type": "code", "execution_count": 61, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Número de vendas: 7284\n" ] } ], "source": [ "print(f'Número de vendas: {len(df_vendas)}')" ] }, { "cell_type": "code", "execution_count": 62, "metadata": {}, "outputs": [], "source": [ "# save tables\n", "df_tables = [df_clientes, df_veiculos, df_vendas, df_vendedores]\n", "\n", "with open(PATH/'df_tables', 'wb') as f:\n", " pickle.dump(df_tables, f)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 1 table with all informations" ] }, { "cell_type": "code", "execution_count": 63, "metadata": {}, "outputs": [], "source": [ "# load file\n", "file = open(PATH/'df_tables','rb')\n", "df_tables = pickle.load(file)\n", "df_clientes, df_veiculos, df_vendas, df_vendedores = df_tables\n", "file.close()" ] }, { "cell_type": "code", "execution_count": 64, "metadata": {}, "outputs": [], "source": [ "# function to merge 2 dataframes\n", "def join_df(left, right, left_on, right_on=None, suffix='_y'):\n", " if right_on is None: right_on = left_on\n", " return left.merge(right, how='left', left_on=left_on, right_on=right_on, \n", " suffixes=(\"\", suffix))" ] }, { "cell_type": "code", "execution_count": 65, "metadata": {}, "outputs": [], "source": [ "df_vendas = join_df(df_vendas, df_clientes, \"CLIENTE\", \"CLIENTE_COD\")\n", "df_vendas = join_df(df_vendas, df_veiculos, \"MODELO\", \"MODELO\")" ] }, { "cell_type": "code", "execution_count": 66, "metadata": {}, "outputs": [], "source": [ "df_vendas = df_vendas.drop(['CLIENTE_COD'], axis=1)" ] }, { "cell_type": "code", "execution_count": 67, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "EMPRESA_VENDAS int32\n", "REVENDA_VENDAS int32\n", "DTA_ENTRADA_SAIDA datetime64[ns]\n", "CLIENTE int32\n", "VAL_TOTAL_REAL_ITEM float64\n", "VAL_TOTAL_NOTA_ITEM float64\n", "VAL_CUSTO_CONTABIL float64\n", "VENDEDOR int32\n", "VAL_BONUS float64\n", "MODELO object\n", "DATA_NASCIMENTO datetime64[ns]\n", "SEXO object\n", "ESTADO_CIVIL int32\n", "SITUACAO_CREDITO object\n", "CLIENTE_TIPO object\n", "TIPO_PESSOA object\n", "CLIENTE_DTCADASTRO datetime64[ns]\n", "EMPRESA_ORIGEM int32\n", "REVENDA_ORIGEM int32\n", "DTPRIMEIRAFATURA datetime64[ns]\n", "DTULTIMAFATURA datetime64[ns]\n", "LIMITE_CREDITO float32\n", "CEP1 int32\n", "CEP2 int32\n", "CEP3 int32\n", "DES_MODELO object\n", "FAMILIA float64\n", "MARCA object\n", "CLASSIFICACAO float64\n", "dtype: object" ] }, "execution_count": 67, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_vendas.dtypes" ] }, { "cell_type": "code", "execution_count": 68, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
01234
EMPRESA_VENDAS11151
REVENDA_VENDAS11111
DTA_ENTRADA_SAIDA2015-03-25 00:00:002015-03-31 00:00:002015-04-30 00:00:002018-09-18 00:00:002015-05-14 00:00:00
CLIENTE5230211518041
VAL_TOTAL_REAL_ITEM4880037008.137043.241869.3127944
VAL_TOTAL_NOTA_ITEM4880037008.137043.241869.3127944
VAL_CUSTO_CONTABIL47713.837008.137043.241869.3127944
VENDEDOR1100041100111100114000020110011
VAL_BONUSNaNNaNNaNNaNNaN
MODELO5U4FE46A53K45U3PN45U7TA42HBB3A
DATA_NASCIMENTO1977-09-09 00:00:001900-01-01 00:00:001900-01-01 00:00:001900-01-01 00:00:001900-01-01 00:00:00
SEXOMNNNN
ESTADO_CIVIL20000
SITUACAO_CREDITOAprovadoAprovadoAprovadoAprovadoAprovado
CLIENTE_TIPONNNNN
TIPO_PESSOAFJJJJ
CLIENTE_DTCADASTRO1900-01-01 00:00:002015-02-24 00:00:002015-02-24 00:00:001900-01-01 00:00:002015-02-24 00:00:00
EMPRESA_ORIGEM11111
REVENDA_ORIGEM11111
DTPRIMEIRAFATURA2015-03-02 00:00:002015-04-01 00:00:002015-04-01 00:00:002015-03-25 00:00:002015-04-01 00:00:00
DTULTIMAFATURA2015-03-25 00:00:002016-09-30 00:00:002016-09-30 00:00:002019-03-07 00:00:002016-09-30 00:00:00
LIMITE_CREDITO01e+071e+0701e+07
CEP177777
CEP277777
CEP350080
DES_MODELOFRONTIERFIAT SIENAVOYAGE 1.6VW/FOX 1.6 PIME GIIPEUGEOT 207 PASSION XS A
FAMILIA111158
MARCAVWVWVWVWPEUG
CLASSIFICACAO31311
\n", "
" ], "text/plain": [ " 0 \\\n", "EMPRESA_VENDAS 1 \n", "REVENDA_VENDAS 1 \n", "DTA_ENTRADA_SAIDA 2015-03-25 00:00:00 \n", "CLIENTE 52302 \n", "VAL_TOTAL_REAL_ITEM 48800 \n", "VAL_TOTAL_NOTA_ITEM 48800 \n", "VAL_CUSTO_CONTABIL 47713.8 \n", "VENDEDOR 110004 \n", "VAL_BONUS NaN \n", "MODELO 5U4FE4 \n", "DATA_NASCIMENTO 1977-09-09 00:00:00 \n", "SEXO M \n", "ESTADO_CIVIL 2 \n", "SITUACAO_CREDITO Aprovado \n", "CLIENTE_TIPO N \n", "TIPO_PESSOA F \n", "CLIENTE_DTCADASTRO 1900-01-01 00:00:00 \n", "EMPRESA_ORIGEM 1 \n", "REVENDA_ORIGEM 1 \n", "DTPRIMEIRAFATURA 2015-03-02 00:00:00 \n", "DTULTIMAFATURA 2015-03-25 00:00:00 \n", "LIMITE_CREDITO 0 \n", "CEP1 7 \n", "CEP2 7 \n", "CEP3 5 \n", "DES_MODELO FRONTIER \n", "FAMILIA 1 \n", "MARCA VW \n", "CLASSIFICACAO 3 \n", "\n", " 1 \\\n", "EMPRESA_VENDAS 1 \n", "REVENDA_VENDAS 1 \n", "DTA_ENTRADA_SAIDA 2015-03-31 00:00:00 \n", "CLIENTE 1 \n", "VAL_TOTAL_REAL_ITEM 37008.1 \n", "VAL_TOTAL_NOTA_ITEM 37008.1 \n", "VAL_CUSTO_CONTABIL 37008.1 \n", "VENDEDOR 110011 \n", "VAL_BONUS NaN \n", "MODELO 6A53K4 \n", "DATA_NASCIMENTO 1900-01-01 00:00:00 \n", "SEXO N \n", "ESTADO_CIVIL 0 \n", "SITUACAO_CREDITO Aprovado \n", "CLIENTE_TIPO N \n", "TIPO_PESSOA J \n", "CLIENTE_DTCADASTRO 2015-02-24 00:00:00 \n", "EMPRESA_ORIGEM 1 \n", "REVENDA_ORIGEM 1 \n", "DTPRIMEIRAFATURA 2015-04-01 00:00:00 \n", "DTULTIMAFATURA 2016-09-30 00:00:00 \n", "LIMITE_CREDITO 1e+07 \n", "CEP1 7 \n", "CEP2 7 \n", "CEP3 0 \n", "DES_MODELO FIAT SIENA \n", "FAMILIA 1 \n", "MARCA VW \n", "CLASSIFICACAO 1 \n", "\n", " 2 \\\n", "EMPRESA_VENDAS 1 \n", "REVENDA_VENDAS 1 \n", "DTA_ENTRADA_SAIDA 2015-04-30 00:00:00 \n", "CLIENTE 1 \n", "VAL_TOTAL_REAL_ITEM 37043.2 \n", "VAL_TOTAL_NOTA_ITEM 37043.2 \n", "VAL_CUSTO_CONTABIL 37043.2 \n", "VENDEDOR 110011 \n", "VAL_BONUS NaN \n", "MODELO 5U3PN4 \n", "DATA_NASCIMENTO 1900-01-01 00:00:00 \n", "SEXO N \n", "ESTADO_CIVIL 0 \n", "SITUACAO_CREDITO Aprovado \n", "CLIENTE_TIPO N \n", "TIPO_PESSOA J \n", "CLIENTE_DTCADASTRO 2015-02-24 00:00:00 \n", "EMPRESA_ORIGEM 1 \n", "REVENDA_ORIGEM 1 \n", "DTPRIMEIRAFATURA 2015-04-01 00:00:00 \n", "DTULTIMAFATURA 2016-09-30 00:00:00 \n", "LIMITE_CREDITO 1e+07 \n", "CEP1 7 \n", "CEP2 7 \n", "CEP3 0 \n", "DES_MODELO VOYAGE 1.6 \n", "FAMILIA 1 \n", "MARCA VW \n", "CLASSIFICACAO 3 \n", "\n", " 3 \\\n", "EMPRESA_VENDAS 5 \n", "REVENDA_VENDAS 1 \n", "DTA_ENTRADA_SAIDA 2018-09-18 00:00:00 \n", "CLIENTE 51804 \n", "VAL_TOTAL_REAL_ITEM 41869.3 \n", "VAL_TOTAL_NOTA_ITEM 41869.3 \n", "VAL_CUSTO_CONTABIL 41869.3 \n", "VENDEDOR 4000020 \n", "VAL_BONUS NaN \n", "MODELO 5U7TA4 \n", "DATA_NASCIMENTO 1900-01-01 00:00:00 \n", "SEXO N \n", "ESTADO_CIVIL 0 \n", "SITUACAO_CREDITO Aprovado \n", "CLIENTE_TIPO N \n", "TIPO_PESSOA J \n", "CLIENTE_DTCADASTRO 1900-01-01 00:00:00 \n", "EMPRESA_ORIGEM 1 \n", "REVENDA_ORIGEM 1 \n", "DTPRIMEIRAFATURA 2015-03-25 00:00:00 \n", "DTULTIMAFATURA 2019-03-07 00:00:00 \n", "LIMITE_CREDITO 0 \n", "CEP1 7 \n", "CEP2 7 \n", "CEP3 8 \n", "DES_MODELO VW/FOX 1.6 PIME GII \n", "FAMILIA 1 \n", "MARCA VW \n", "CLASSIFICACAO 1 \n", "\n", " 4 \n", "EMPRESA_VENDAS 1 \n", "REVENDA_VENDAS 1 \n", "DTA_ENTRADA_SAIDA 2015-05-14 00:00:00 \n", "CLIENTE 1 \n", "VAL_TOTAL_REAL_ITEM 127944 \n", "VAL_TOTAL_NOTA_ITEM 127944 \n", "VAL_CUSTO_CONTABIL 127944 \n", "VENDEDOR 110011 \n", "VAL_BONUS NaN \n", "MODELO 2HBB3A \n", "DATA_NASCIMENTO 1900-01-01 00:00:00 \n", "SEXO N \n", "ESTADO_CIVIL 0 \n", "SITUACAO_CREDITO Aprovado \n", "CLIENTE_TIPO N \n", "TIPO_PESSOA J \n", "CLIENTE_DTCADASTRO 2015-02-24 00:00:00 \n", "EMPRESA_ORIGEM 1 \n", "REVENDA_ORIGEM 1 \n", "DTPRIMEIRAFATURA 2015-04-01 00:00:00 \n", "DTULTIMAFATURA 2016-09-30 00:00:00 \n", "LIMITE_CREDITO 1e+07 \n", "CEP1 7 \n", "CEP2 7 \n", "CEP3 0 \n", "DES_MODELO PEUGEOT 207 PASSION XS A \n", "FAMILIA 58 \n", "MARCA PEUG \n", "CLASSIFICACAO 1 " ] }, "execution_count": 68, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_vendas.head().T" ] }, { "cell_type": "code", "execution_count": 69, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Número de vendas: 7284\n" ] } ], "source": [ "print(f'Número de vendas: {len(df_vendas)}')" ] }, { "cell_type": "code", "execution_count": 70, "metadata": {}, "outputs": [], "source": [ "# save file\n", "df_vendas.to_pickle(PATH/'df_vendas')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Dates" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The following extracts particular date fields from a complete datetime for the purpose of constructing categoricals.\n", "\n", "You should *always* consider this feature extraction step when working with date-time. Without expanding your date-time into these additional fields, you can't capture any trend/cyclical behavior as a function of time at any of these granularities. We'll add to every table with a date field." ] }, { "cell_type": "code", "execution_count": 71, "metadata": {}, "outputs": [], "source": [ "# load file\n", "file = open(PATH/'df_vendas','rb')\n", "df_vendas = pickle.load(file)\n", "file.close()" ] }, { "cell_type": "code", "execution_count": 72, "metadata": {}, "outputs": [], "source": [ "def add_datepart(df, fldname, drop=True, time=False):\n", " \"Helper function that adds columns relevant to a date.\"\n", " fld = df[fldname]\n", " fld_dtype = fld.dtype\n", " if isinstance(fld_dtype, pd.core.dtypes.dtypes.DatetimeTZDtype):\n", " fld_dtype = np.datetime64\n", "\n", " if not np.issubdtype(fld_dtype, np.datetime64):\n", " df[fldname] = fld = pd.to_datetime(fld, infer_datetime_format=True)\n", " targ_pre = re.sub('[Dd]ate$', '', fldname)\n", " attr = ['Year', 'Month', 'Week', 'Day', 'Dayofweek', 'Dayofyear',\n", " 'Is_month_end', 'Is_month_start', 'Is_quarter_end', 'Is_quarter_start', 'Is_year_end', 'Is_year_start']\n", " if time: attr = attr + ['Hour', 'Minute', 'Second']\n", " for n in attr: df[targ_pre + '_' + n] = getattr(fld.dt, n.lower())\n", " df[targ_pre + '_' + 'Elapsed'] = fld.astype(np.int64) // 10 ** 9\n", " if drop: df.drop(fldname, axis=1, inplace=True)" ] }, { "cell_type": "code", "execution_count": 73, "metadata": {}, "outputs": [], "source": [ "add_datepart(df_vendas, \"DTA_ENTRADA_SAIDA\", drop=False)\n", "add_datepart(df_vendas, \"DATA_NASCIMENTO\", drop=False)\n", "add_datepart(df_vendas, \"CLIENTE_DTCADASTRO\", drop=False)\n", "add_datepart(df_vendas, \"DTPRIMEIRAFATURA\", drop=False)\n", "add_datepart(df_vendas, \"DTULTIMAFATURA\", drop=False)" ] }, { "cell_type": "code", "execution_count": 74, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
01234
EMPRESA_VENDAS11151
REVENDA_VENDAS11111
DTA_ENTRADA_SAIDA2015-03-25 00:00:002015-03-31 00:00:002015-04-30 00:00:002018-09-18 00:00:002015-05-14 00:00:00
CLIENTE5230211518041
VAL_TOTAL_REAL_ITEM4880037008.137043.241869.3127944
VAL_TOTAL_NOTA_ITEM4880037008.137043.241869.3127944
VAL_CUSTO_CONTABIL47713.837008.137043.241869.3127944
VENDEDOR1100041100111100114000020110011
VAL_BONUSNaNNaNNaNNaNNaN
MODELO5U4FE46A53K45U3PN45U7TA42HBB3A
DATA_NASCIMENTO1977-09-09 00:00:001900-01-01 00:00:001900-01-01 00:00:001900-01-01 00:00:001900-01-01 00:00:00
SEXOMNNNN
ESTADO_CIVIL20000
SITUACAO_CREDITOAprovadoAprovadoAprovadoAprovadoAprovado
CLIENTE_TIPONNNNN
TIPO_PESSOAFJJJJ
CLIENTE_DTCADASTRO1900-01-01 00:00:002015-02-24 00:00:002015-02-24 00:00:001900-01-01 00:00:002015-02-24 00:00:00
EMPRESA_ORIGEM11111
REVENDA_ORIGEM11111
DTPRIMEIRAFATURA2015-03-02 00:00:002015-04-01 00:00:002015-04-01 00:00:002015-03-25 00:00:002015-04-01 00:00:00
DTULTIMAFATURA2015-03-25 00:00:002016-09-30 00:00:002016-09-30 00:00:002019-03-07 00:00:002016-09-30 00:00:00
LIMITE_CREDITO01e+071e+0701e+07
CEP177777
CEP277777
CEP350080
DES_MODELOFRONTIERFIAT SIENAVOYAGE 1.6VW/FOX 1.6 PIME GIIPEUGEOT 207 PASSION XS A
FAMILIA111158
MARCAVWVWVWVWPEUG
CLASSIFICACAO31311
DTA_ENTRADA_SAIDA_Year20152015201520182015
..................
CLIENTE_DTCADASTRO_Is_quarter_startTrueFalseFalseTrueFalse
CLIENTE_DTCADASTRO_Is_year_endFalseFalseFalseFalseFalse
CLIENTE_DTCADASTRO_Is_year_startTrueFalseFalseTrueFalse
CLIENTE_DTCADASTRO_Elapsed-220898880014247360001424736000-22089888001424736000
DTPRIMEIRAFATURA_Year20152015201520152015
DTPRIMEIRAFATURA_Month34434
DTPRIMEIRAFATURA_Week1014141314
DTPRIMEIRAFATURA_Day211251
DTPRIMEIRAFATURA_Dayofweek02222
DTPRIMEIRAFATURA_Dayofyear6191918491
DTPRIMEIRAFATURA_Is_month_endFalseFalseFalseFalseFalse
DTPRIMEIRAFATURA_Is_month_startFalseTrueTrueFalseTrue
DTPRIMEIRAFATURA_Is_quarter_endFalseFalseFalseFalseFalse
DTPRIMEIRAFATURA_Is_quarter_startFalseTrueTrueFalseTrue
DTPRIMEIRAFATURA_Is_year_endFalseFalseFalseFalseFalse
DTPRIMEIRAFATURA_Is_year_startFalseFalseFalseFalseFalse
DTPRIMEIRAFATURA_Elapsed14252544001427846400142784640014272416001427846400
DTULTIMAFATURA_Year20152016201620192016
DTULTIMAFATURA_Month39939
DTULTIMAFATURA_Week1339391039
DTULTIMAFATURA_Day253030730
DTULTIMAFATURA_Dayofweek24434
DTULTIMAFATURA_Dayofyear8427427466274
DTULTIMAFATURA_Is_month_endFalseTrueTrueFalseTrue
DTULTIMAFATURA_Is_month_startFalseFalseFalseFalseFalse
DTULTIMAFATURA_Is_quarter_endFalseTrueTrueFalseTrue
DTULTIMAFATURA_Is_quarter_startFalseFalseFalseFalseFalse
DTULTIMAFATURA_Is_year_endFalseFalseFalseFalseFalse
DTULTIMAFATURA_Is_year_startFalseFalseFalseFalseFalse
DTULTIMAFATURA_Elapsed14272416001475193600147519360015519168001475193600
\n", "

94 rows × 5 columns

\n", "
" ], "text/plain": [ " 0 \\\n", "EMPRESA_VENDAS 1 \n", "REVENDA_VENDAS 1 \n", "DTA_ENTRADA_SAIDA 2015-03-25 00:00:00 \n", "CLIENTE 52302 \n", "VAL_TOTAL_REAL_ITEM 48800 \n", "VAL_TOTAL_NOTA_ITEM 48800 \n", "VAL_CUSTO_CONTABIL 47713.8 \n", "VENDEDOR 110004 \n", "VAL_BONUS NaN \n", "MODELO 5U4FE4 \n", "DATA_NASCIMENTO 1977-09-09 00:00:00 \n", "SEXO M \n", "ESTADO_CIVIL 2 \n", "SITUACAO_CREDITO Aprovado \n", "CLIENTE_TIPO N \n", "TIPO_PESSOA F \n", "CLIENTE_DTCADASTRO 1900-01-01 00:00:00 \n", "EMPRESA_ORIGEM 1 \n", "REVENDA_ORIGEM 1 \n", "DTPRIMEIRAFATURA 2015-03-02 00:00:00 \n", "DTULTIMAFATURA 2015-03-25 00:00:00 \n", "LIMITE_CREDITO 0 \n", "CEP1 7 \n", "CEP2 7 \n", "CEP3 5 \n", "DES_MODELO FRONTIER \n", "FAMILIA 1 \n", "MARCA VW \n", "CLASSIFICACAO 3 \n", "DTA_ENTRADA_SAIDA_Year 2015 \n", "... ... \n", "CLIENTE_DTCADASTRO_Is_quarter_start True \n", "CLIENTE_DTCADASTRO_Is_year_end False \n", "CLIENTE_DTCADASTRO_Is_year_start True \n", "CLIENTE_DTCADASTRO_Elapsed -2208988800 \n", "DTPRIMEIRAFATURA_Year 2015 \n", "DTPRIMEIRAFATURA_Month 3 \n", "DTPRIMEIRAFATURA_Week 10 \n", "DTPRIMEIRAFATURA_Day 2 \n", "DTPRIMEIRAFATURA_Dayofweek 0 \n", "DTPRIMEIRAFATURA_Dayofyear 61 \n", "DTPRIMEIRAFATURA_Is_month_end False \n", "DTPRIMEIRAFATURA_Is_month_start False \n", "DTPRIMEIRAFATURA_Is_quarter_end False \n", "DTPRIMEIRAFATURA_Is_quarter_start False \n", "DTPRIMEIRAFATURA_Is_year_end False \n", "DTPRIMEIRAFATURA_Is_year_start False \n", "DTPRIMEIRAFATURA_Elapsed 1425254400 \n", "DTULTIMAFATURA_Year 2015 \n", "DTULTIMAFATURA_Month 3 \n", "DTULTIMAFATURA_Week 13 \n", "DTULTIMAFATURA_Day 25 \n", "DTULTIMAFATURA_Dayofweek 2 \n", "DTULTIMAFATURA_Dayofyear 84 \n", "DTULTIMAFATURA_Is_month_end False \n", "DTULTIMAFATURA_Is_month_start False \n", "DTULTIMAFATURA_Is_quarter_end False \n", "DTULTIMAFATURA_Is_quarter_start False \n", "DTULTIMAFATURA_Is_year_end False \n", "DTULTIMAFATURA_Is_year_start False \n", "DTULTIMAFATURA_Elapsed 1427241600 \n", "\n", " 1 \\\n", "EMPRESA_VENDAS 1 \n", "REVENDA_VENDAS 1 \n", "DTA_ENTRADA_SAIDA 2015-03-31 00:00:00 \n", "CLIENTE 1 \n", "VAL_TOTAL_REAL_ITEM 37008.1 \n", "VAL_TOTAL_NOTA_ITEM 37008.1 \n", "VAL_CUSTO_CONTABIL 37008.1 \n", "VENDEDOR 110011 \n", "VAL_BONUS NaN \n", "MODELO 6A53K4 \n", "DATA_NASCIMENTO 1900-01-01 00:00:00 \n", "SEXO N \n", "ESTADO_CIVIL 0 \n", "SITUACAO_CREDITO Aprovado \n", "CLIENTE_TIPO N \n", "TIPO_PESSOA J \n", "CLIENTE_DTCADASTRO 2015-02-24 00:00:00 \n", "EMPRESA_ORIGEM 1 \n", "REVENDA_ORIGEM 1 \n", "DTPRIMEIRAFATURA 2015-04-01 00:00:00 \n", "DTULTIMAFATURA 2016-09-30 00:00:00 \n", "LIMITE_CREDITO 1e+07 \n", "CEP1 7 \n", "CEP2 7 \n", "CEP3 0 \n", "DES_MODELO FIAT SIENA \n", "FAMILIA 1 \n", "MARCA VW \n", "CLASSIFICACAO 1 \n", "DTA_ENTRADA_SAIDA_Year 2015 \n", "... ... \n", "CLIENTE_DTCADASTRO_Is_quarter_start False \n", "CLIENTE_DTCADASTRO_Is_year_end False \n", "CLIENTE_DTCADASTRO_Is_year_start False \n", "CLIENTE_DTCADASTRO_Elapsed 1424736000 \n", "DTPRIMEIRAFATURA_Year 2015 \n", "DTPRIMEIRAFATURA_Month 4 \n", "DTPRIMEIRAFATURA_Week 14 \n", "DTPRIMEIRAFATURA_Day 1 \n", "DTPRIMEIRAFATURA_Dayofweek 2 \n", "DTPRIMEIRAFATURA_Dayofyear 91 \n", "DTPRIMEIRAFATURA_Is_month_end False \n", "DTPRIMEIRAFATURA_Is_month_start True \n", "DTPRIMEIRAFATURA_Is_quarter_end False \n", "DTPRIMEIRAFATURA_Is_quarter_start True \n", "DTPRIMEIRAFATURA_Is_year_end False \n", "DTPRIMEIRAFATURA_Is_year_start False \n", "DTPRIMEIRAFATURA_Elapsed 1427846400 \n", "DTULTIMAFATURA_Year 2016 \n", "DTULTIMAFATURA_Month 9 \n", "DTULTIMAFATURA_Week 39 \n", "DTULTIMAFATURA_Day 30 \n", "DTULTIMAFATURA_Dayofweek 4 \n", "DTULTIMAFATURA_Dayofyear 274 \n", "DTULTIMAFATURA_Is_month_end True \n", "DTULTIMAFATURA_Is_month_start False \n", "DTULTIMAFATURA_Is_quarter_end True \n", "DTULTIMAFATURA_Is_quarter_start False \n", "DTULTIMAFATURA_Is_year_end False \n", "DTULTIMAFATURA_Is_year_start False \n", "DTULTIMAFATURA_Elapsed 1475193600 \n", "\n", " 2 \\\n", "EMPRESA_VENDAS 1 \n", "REVENDA_VENDAS 1 \n", "DTA_ENTRADA_SAIDA 2015-04-30 00:00:00 \n", "CLIENTE 1 \n", "VAL_TOTAL_REAL_ITEM 37043.2 \n", "VAL_TOTAL_NOTA_ITEM 37043.2 \n", "VAL_CUSTO_CONTABIL 37043.2 \n", "VENDEDOR 110011 \n", "VAL_BONUS NaN \n", "MODELO 5U3PN4 \n", "DATA_NASCIMENTO 1900-01-01 00:00:00 \n", "SEXO N \n", "ESTADO_CIVIL 0 \n", "SITUACAO_CREDITO Aprovado \n", "CLIENTE_TIPO N \n", "TIPO_PESSOA J \n", "CLIENTE_DTCADASTRO 2015-02-24 00:00:00 \n", "EMPRESA_ORIGEM 1 \n", "REVENDA_ORIGEM 1 \n", "DTPRIMEIRAFATURA 2015-04-01 00:00:00 \n", "DTULTIMAFATURA 2016-09-30 00:00:00 \n", "LIMITE_CREDITO 1e+07 \n", "CEP1 7 \n", "CEP2 7 \n", "CEP3 0 \n", "DES_MODELO VOYAGE 1.6 \n", "FAMILIA 1 \n", "MARCA VW \n", "CLASSIFICACAO 3 \n", "DTA_ENTRADA_SAIDA_Year 2015 \n", "... ... \n", "CLIENTE_DTCADASTRO_Is_quarter_start False \n", "CLIENTE_DTCADASTRO_Is_year_end False \n", "CLIENTE_DTCADASTRO_Is_year_start False \n", "CLIENTE_DTCADASTRO_Elapsed 1424736000 \n", "DTPRIMEIRAFATURA_Year 2015 \n", "DTPRIMEIRAFATURA_Month 4 \n", "DTPRIMEIRAFATURA_Week 14 \n", "DTPRIMEIRAFATURA_Day 1 \n", "DTPRIMEIRAFATURA_Dayofweek 2 \n", "DTPRIMEIRAFATURA_Dayofyear 91 \n", "DTPRIMEIRAFATURA_Is_month_end False \n", "DTPRIMEIRAFATURA_Is_month_start True \n", "DTPRIMEIRAFATURA_Is_quarter_end False \n", "DTPRIMEIRAFATURA_Is_quarter_start True \n", "DTPRIMEIRAFATURA_Is_year_end False \n", "DTPRIMEIRAFATURA_Is_year_start False \n", "DTPRIMEIRAFATURA_Elapsed 1427846400 \n", "DTULTIMAFATURA_Year 2016 \n", "DTULTIMAFATURA_Month 9 \n", "DTULTIMAFATURA_Week 39 \n", "DTULTIMAFATURA_Day 30 \n", "DTULTIMAFATURA_Dayofweek 4 \n", "DTULTIMAFATURA_Dayofyear 274 \n", "DTULTIMAFATURA_Is_month_end True \n", "DTULTIMAFATURA_Is_month_start False \n", "DTULTIMAFATURA_Is_quarter_end True \n", "DTULTIMAFATURA_Is_quarter_start False \n", "DTULTIMAFATURA_Is_year_end False \n", "DTULTIMAFATURA_Is_year_start False \n", "DTULTIMAFATURA_Elapsed 1475193600 \n", "\n", " 3 \\\n", "EMPRESA_VENDAS 5 \n", "REVENDA_VENDAS 1 \n", "DTA_ENTRADA_SAIDA 2018-09-18 00:00:00 \n", "CLIENTE 51804 \n", "VAL_TOTAL_REAL_ITEM 41869.3 \n", "VAL_TOTAL_NOTA_ITEM 41869.3 \n", "VAL_CUSTO_CONTABIL 41869.3 \n", "VENDEDOR 4000020 \n", "VAL_BONUS NaN \n", "MODELO 5U7TA4 \n", "DATA_NASCIMENTO 1900-01-01 00:00:00 \n", "SEXO N \n", "ESTADO_CIVIL 0 \n", "SITUACAO_CREDITO Aprovado \n", "CLIENTE_TIPO N \n", "TIPO_PESSOA J \n", "CLIENTE_DTCADASTRO 1900-01-01 00:00:00 \n", "EMPRESA_ORIGEM 1 \n", "REVENDA_ORIGEM 1 \n", "DTPRIMEIRAFATURA 2015-03-25 00:00:00 \n", "DTULTIMAFATURA 2019-03-07 00:00:00 \n", "LIMITE_CREDITO 0 \n", "CEP1 7 \n", "CEP2 7 \n", "CEP3 8 \n", "DES_MODELO VW/FOX 1.6 PIME GII \n", "FAMILIA 1 \n", "MARCA VW \n", "CLASSIFICACAO 1 \n", "DTA_ENTRADA_SAIDA_Year 2018 \n", "... ... \n", "CLIENTE_DTCADASTRO_Is_quarter_start True \n", "CLIENTE_DTCADASTRO_Is_year_end False \n", "CLIENTE_DTCADASTRO_Is_year_start True \n", "CLIENTE_DTCADASTRO_Elapsed -2208988800 \n", "DTPRIMEIRAFATURA_Year 2015 \n", "DTPRIMEIRAFATURA_Month 3 \n", "DTPRIMEIRAFATURA_Week 13 \n", "DTPRIMEIRAFATURA_Day 25 \n", "DTPRIMEIRAFATURA_Dayofweek 2 \n", "DTPRIMEIRAFATURA_Dayofyear 84 \n", "DTPRIMEIRAFATURA_Is_month_end False \n", "DTPRIMEIRAFATURA_Is_month_start False \n", "DTPRIMEIRAFATURA_Is_quarter_end False \n", "DTPRIMEIRAFATURA_Is_quarter_start False \n", "DTPRIMEIRAFATURA_Is_year_end False \n", "DTPRIMEIRAFATURA_Is_year_start False \n", "DTPRIMEIRAFATURA_Elapsed 1427241600 \n", "DTULTIMAFATURA_Year 2019 \n", "DTULTIMAFATURA_Month 3 \n", "DTULTIMAFATURA_Week 10 \n", "DTULTIMAFATURA_Day 7 \n", "DTULTIMAFATURA_Dayofweek 3 \n", "DTULTIMAFATURA_Dayofyear 66 \n", "DTULTIMAFATURA_Is_month_end False \n", "DTULTIMAFATURA_Is_month_start False \n", "DTULTIMAFATURA_Is_quarter_end False \n", "DTULTIMAFATURA_Is_quarter_start False \n", "DTULTIMAFATURA_Is_year_end False \n", "DTULTIMAFATURA_Is_year_start False \n", "DTULTIMAFATURA_Elapsed 1551916800 \n", "\n", " 4 \n", "EMPRESA_VENDAS 1 \n", "REVENDA_VENDAS 1 \n", "DTA_ENTRADA_SAIDA 2015-05-14 00:00:00 \n", "CLIENTE 1 \n", "VAL_TOTAL_REAL_ITEM 127944 \n", "VAL_TOTAL_NOTA_ITEM 127944 \n", "VAL_CUSTO_CONTABIL 127944 \n", "VENDEDOR 110011 \n", "VAL_BONUS NaN \n", "MODELO 2HBB3A \n", "DATA_NASCIMENTO 1900-01-01 00:00:00 \n", "SEXO N \n", "ESTADO_CIVIL 0 \n", "SITUACAO_CREDITO Aprovado \n", "CLIENTE_TIPO N \n", "TIPO_PESSOA J \n", "CLIENTE_DTCADASTRO 2015-02-24 00:00:00 \n", "EMPRESA_ORIGEM 1 \n", "REVENDA_ORIGEM 1 \n", "DTPRIMEIRAFATURA 2015-04-01 00:00:00 \n", "DTULTIMAFATURA 2016-09-30 00:00:00 \n", "LIMITE_CREDITO 1e+07 \n", "CEP1 7 \n", "CEP2 7 \n", "CEP3 0 \n", "DES_MODELO PEUGEOT 207 PASSION XS A \n", "FAMILIA 58 \n", "MARCA PEUG \n", "CLASSIFICACAO 1 \n", "DTA_ENTRADA_SAIDA_Year 2015 \n", "... ... \n", "CLIENTE_DTCADASTRO_Is_quarter_start False \n", "CLIENTE_DTCADASTRO_Is_year_end False \n", "CLIENTE_DTCADASTRO_Is_year_start False \n", "CLIENTE_DTCADASTRO_Elapsed 1424736000 \n", "DTPRIMEIRAFATURA_Year 2015 \n", "DTPRIMEIRAFATURA_Month 4 \n", "DTPRIMEIRAFATURA_Week 14 \n", "DTPRIMEIRAFATURA_Day 1 \n", "DTPRIMEIRAFATURA_Dayofweek 2 \n", "DTPRIMEIRAFATURA_Dayofyear 91 \n", "DTPRIMEIRAFATURA_Is_month_end False \n", "DTPRIMEIRAFATURA_Is_month_start True \n", "DTPRIMEIRAFATURA_Is_quarter_end False \n", "DTPRIMEIRAFATURA_Is_quarter_start True \n", "DTPRIMEIRAFATURA_Is_year_end False \n", "DTPRIMEIRAFATURA_Is_year_start False \n", "DTPRIMEIRAFATURA_Elapsed 1427846400 \n", "DTULTIMAFATURA_Year 2016 \n", "DTULTIMAFATURA_Month 9 \n", "DTULTIMAFATURA_Week 39 \n", "DTULTIMAFATURA_Day 30 \n", "DTULTIMAFATURA_Dayofweek 4 \n", "DTULTIMAFATURA_Dayofyear 274 \n", "DTULTIMAFATURA_Is_month_end True \n", "DTULTIMAFATURA_Is_month_start False \n", "DTULTIMAFATURA_Is_quarter_end True \n", "DTULTIMAFATURA_Is_quarter_start False \n", "DTULTIMAFATURA_Is_year_end False \n", "DTULTIMAFATURA_Is_year_start False \n", "DTULTIMAFATURA_Elapsed 1475193600 \n", "\n", "[94 rows x 5 columns]" ] }, "execution_count": 74, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_vendas.head().T" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Missing Values" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next we'll fill in missing values to avoid complications with `NA`'s. `NA` (not available) is how Pandas indicates missing values; many models have problems when missing values are present, so it's always important to think about how to deal with them. In these cases, we are picking an arbitrary *signal value* that doesn't otherwise appear in the data." ] }, { "cell_type": "code", "execution_count": 75, "metadata": {}, "outputs": [], "source": [ "list_col = list(df_vendas.columns.values)" ] }, { "cell_type": "code", "execution_count": 76, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "VAL_BONUS\n", "DES_MODELO\n", "FAMILIA\n", "MARCA\n", "CLASSIFICACAO\n" ] } ], "source": [ "# list columns with missing values\n", "for col in list_col:\n", " s = df_vendas[col].isna().sum()\n", "# n = (df_vendas[col] == '').sum()\n", " if s != 0 :\n", " print(col)" ] }, { "cell_type": "code", "execution_count": 77, "metadata": {}, "outputs": [], "source": [ "df_vendas['VAL_BONUS'] = df_vendas.VAL_BONUS.fillna(0.0).astype(np.float32)\n", "df_vendas['FAMILIA'] = df_vendas['FAMILIA'].fillna(1.0).astype(np.int32)\n", "df_vendas['CLASSIFICACAO'] = df_vendas['CLASSIFICACAO'].fillna(1.0).astype(np.int32)" ] }, { "cell_type": "code", "execution_count": 80, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "DES_MODELO\n", "MARCA\n" ] } ], "source": [ "# list columns with missing values\n", "for col in list_col:\n", " s = df_vendas[col].isna().sum()\n", "# n = (df_vendas[col] == '').sum()\n", " if s != 0 :\n", " print(col)" ] }, { "cell_type": "code", "execution_count": 81, "metadata": {}, "outputs": [], "source": [ "# save file\n", "df_vendas.to_pickle(PATH/'df_vendas_clean')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Nota: como não usaremos as colunas `DES_MODELO` e `MARCA`, seus processamentos não valem a pena." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Durations" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "É comum trabalhar com dados de séries temporais para extrair dados que explicam os relacionamentos entre linhas, em oposição a colunas, por exemplo:\n", "* Média de funcionamento\n", "* Tempo até o próximo evento\n", "* Tempo desde o último evento\n", "\n", "Isso geralmente é difícil de fazer com a maioria das estruturas de manipulação de tabelas, pois elas são projetadas para trabalhar com relacionamentos entre colunas. \n", "\n", "Assim, criamos uma nova coluna `DTA_VENDA_CADASTRO_DaysElapsed` que fornece o número de dias entre o dia de venda (`DTA_ENTRADA_SAIDA`) e o de cadastro (`CLIENTE_DTCADASTRO`)." ] }, { "cell_type": "code", "execution_count": 82, "metadata": {}, "outputs": [], "source": [ "# load file\n", "df_vendas = pd.read_pickle(PATH/'df_vendas_clean')" ] }, { "cell_type": "code", "execution_count": 83, "metadata": {}, "outputs": [], "source": [ "df_vendas['DTA_VENDA_CADASTRO_DaysElapsed'] = ((df_vendas.DTA_ENTRADA_SAIDA_Elapsed - df_vendas.CLIENTE_DTCADASTRO_Elapsed)/(24*60*60)).copy()\n", "\n", "f = lambda x: int(x)\n", "df_vendas['DTA_VENDA_CADASTRO_DaysElapsed'] = df_vendas['DTA_VENDA_CADASTRO_DaysElapsed'].apply(f)\n", "\n", "df_vendas['DTA_VENDA_CADASTRO_DaysElapsed'] = df_vendas['DTA_VENDA_CADASTRO_DaysElapsed'].astype(np.int32)" ] }, { "cell_type": "code", "execution_count": 85, "metadata": {}, "outputs": [], "source": [ "# save file\n", "df_vendas.to_pickle(PATH/'df_vendas_clean_DaysElapsed')" ] } ], "metadata": { "kernelspec": { "display_name": "fastai test v1", "language": "python", "name": "fastai_test_v1" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.8" } }, "nbformat": 4, "nbformat_minor": 2 }