# MPG Cars

Check out the [Cars Exercises Video Tutorial](https://www.youtube.com/watch?v=avzLRBxoguU&list=PLgJhDSE2ZLxaY_DigHeiIDC1cD09rXgJv&index=3) to watch a data scientist go through the exercises.

### Introduction:

This exercise uses the Auto MPG data from the [UC Irvine Machine Learning Repository](https://archive.ics.uci.edu/ml/datasets/Auto+MPG).

### Step 1. Import the necessary libraries

In [24]:
import pandas as pd
import numpy as np

### Step 2. Import the datasets [cars1](https://raw.githubusercontent.com/guipsamora/pandas_exercises/master/05_Merge/Auto_MPG/cars1.csv) and [cars2](https://raw.githubusercontent.com/guipsamora/pandas_exercises/master/05_Merge/Auto_MPG/cars2.csv).

### Step 3. Assign them to variables called cars1 and cars2

In [2]:
cars1 = pd.read_csv("https://raw.githubusercontent.com/guipsamora/pandas_exercises/master/05_Merge/Auto_MPG/cars1.csv")
cars2 = pd.read_csv("https://raw.githubusercontent.com/guipsamora/pandas_exercises/master/05_Merge/Auto_MPG/cars2.csv")

print(cars1.head())
print(cars2.head())

    mpg  cylinders  displacement horsepower  weight  acceleration  model  \
0  18.0          8           307        130    3504          12.0     70   
1  15.0          8           350        165    3693          11.5     70   
2  18.0          8           318        150    3436          11.0     70   
3  16.0          8           304        150    3433          12.0     70   
4  17.0          8           302        140    3449          10.5     70   

   origin                        car  Unnamed: 9  Unnamed: 10  Unnamed: 11  \
0       1  chevrolet chevelle malibu         NaN          NaN          NaN   
1       1          buick skylark 320         NaN          NaN          NaN   
2       1         plymouth satellite         NaN          NaN          NaN   
3       1              amc rebel sst         NaN          NaN          NaN   
4       1                ford torino         NaN          NaN          NaN   

   Unnamed: 12  Unnamed: 13  
0          NaN          NaN  
1          NaN
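
Columns named `Unnamed: N` usually appear when the header row of a CSV contains empty fields, for example because of trailing commas. Whether that is the exact cause in `cars1.csv` is an assumption; the sketch below reproduces the effect on a tiny in-memory CSV whose contents are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical two-row CSV: two trailing commas on every line create
# two extra fields with no header name and no values.
raw = "mpg,car,,\n18.0,chevrolet chevelle malibu,,\n15.0,buick skylark 320,,\n"

toy = pd.read_csv(io.StringIO(raw))

# pandas names the header-less fields after their position.
print(list(toy.columns))

# The extra columns hold nothing but missing values.
print(toy["Unnamed: 2"].isna().all())
```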

### Step 4. Oops, it seems our first dataset has some unnamed blank columns. Fix cars1.

In [12]:
# Keep only the columns from "mpg" through "car"; label slices include both endpoints
cars1 = cars1.loc[:, "mpg":"car"]
cars1.head()

| | mpg | cylinders | displacement | horsepower | weight | acceleration | model | origin | car |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 18.0 | 8 | 307 | 130 | 3504 | 12.0 | 70 | 1 | chevrolet chevelle malibu |
| 1 | 15.0 | 8 | 350 | 165 | 3693 | 11.5 | 70 | 1 | buick skylark 320 |
| 2 | 18.0 | 8 | 318 | 150 | 3436 | 11.0 | 70 | 1 | plymouth satellite |
| 3 | 16.0 | 8 | 304 | 150 | 3433 | 12.0 | 70 | 1 | amc rebel sst |
| 4 | 17.0 | 8 | 302 | 140 | 3449 | 10.5 | 70 | 1 | ford torino |
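
Slicing by label works here because the wanted columns sit next to each other. As a sketch of two alternatives that do not depend on column order, the example below uses a small made-up frame (`toy` and its values are hypothetical stand-ins for `cars1`).

```python
import numpy as np
import pandas as pd

# Toy stand-in for cars1: two real columns plus two empty "Unnamed" columns.
toy = pd.DataFrame({
    "mpg": [18.0, 15.0],
    "car": ["chevrolet chevelle malibu", "buick skylark 320"],
    "Unnamed: 2": [np.nan, np.nan],
    "Unnamed: 3": [np.nan, np.nan],
})

# Option 1: drop every column whose values are all missing.
by_content = toy.dropna(axis=1, how="all")

# Option 2: keep only columns whose name does not start with "Unnamed".
by_name = toy.loc[:, ~toy.columns.str.startswith("Unnamed")]

print(list(by_content.columns))
print(list(by_name.columns))
```

Option 1 would also drop a legitimately named column that happens to be entirely empty, so which variant fits depends on the data.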


### Step 5. What is the number of observations in each dataset?

In [14]:
print(cars1.shape)
print(cars2.shape)

(198, 9)
(200, 9)
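
`shape` returns a `(rows, columns)` tuple, so the first element is the number of observations: 198 and 200 here, 398 in total. A minimal sketch on a made-up frame (`toy` is hypothetical) shows equivalent ways to get the row count alone.

```python
import pandas as pd

# Hypothetical three-row frame standing in for one of the datasets.
toy = pd.DataFrame({"mpg": [18.0, 15.0, 18.0], "cylinders": [8, 8, 8]})

n_rows, n_cols = toy.shape  # unpack (rows, columns)

# len() on a DataFrame also counts rows, not columns.
print(n_rows, len(toy), len(toy.index))
print(n_cols)
```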


### Step 6. Join cars1 and cars2 into a single DataFrame called cars

In [23]:
# DataFrame.append was removed in pandas 2.0; pd.concat stacks the rows the same way
cars = pd.concat([cars1, cars2])
cars

| | mpg | cylinders | displacement | horsepower | weight | acceleration | model | origin | car |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 18.0 | 8 | 307 | 130 | 3504 | 12.0 | 70 | 1 | chevrolet chevelle malibu |
| 1 | 15.0 | 8 | 350 | 165 | 3693 | 11.5 | 70 | 1 | buick skylark 320 |
| 2 | 18.0 | 8 | 318 | 150 | 3436 | 11.0 | 70 | 1 | plymouth satellite |
| 3 | 16.0 | 8 | 304 | 150 | 3433 | 12.0 | 70 | 1 | amc rebel sst |
| 4 | 17.0 | 8 | 302 | 140 | 3449 | 10.5 | 70 | 1 | ford torino |
| 5 | 15.0 | 8 | 429 | 198 | 4341 | 10.0 | 70 | 1 | ford galaxie 500 |
| 6 | 14.0 | 8 | 454 | 220 | 4354 | 9.0 | 70 | 1 | chevrolet impala |
| 7 | 14.0 | 8 | 440 | 215 | 4312 | 8.5 | 70 | 1 | plymouth fury iii |
| 8 | 14.0 | 8 | 455 | 225 | 4425 | 10.0 | 70 | 1 | pontiac catalina |
| 9 | 15.0 | 8 | 390 | 190 | 3850 | 8.5 | 70 | 1 | amc ambassador dpl |
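
When two frames are stacked, each one keeps its own row labels by default, so the combined index repeats labels instead of running from 0 to 397. The sketch below uses two tiny made-up frames (`a` and `b` are hypothetical) to contrast that default with `ignore_index=True`, which assigns a fresh index to the result.

```python
import pandas as pd

# Two hypothetical frames standing in for cars1 and cars2.
a = pd.DataFrame({"mpg": [18.0, 15.0]})
b = pd.DataFrame({"mpg": [27.0, 44.0, 32.0]})

kept = pd.concat([a, b])                      # original labels are kept
fresh = pd.concat([a, b], ignore_index=True)  # labels are renumbered from 0

print(kept.index.tolist())
print(fresh.index.tolist())
print(kept.index.is_unique, fresh.index.is_unique)
```

A unique index is usually the safer choice if rows will later be selected or aligned by label.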


### Step 7. Oops, there is a column missing, called owners. Create a random number Series from 15,000 to 73,000.

In [33]:
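# high is exclusive, so 73001 makes 73,000 a possible value
# size=398 gives one value per row of cars (198 + 200)
nr_owners = np.random.randint(15000, high=73001, size=398, dtype='l')
nr_owners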

array([29487, 25680, 65268, 31827, 69215, 72602, 52693, 58440, 16183,
       45014, 32318, 72942, 62163, 35951, 57625, 59355, 36533, 67048,
       58159, 69743, 25146, 22755, 44966, 46792, 56553, 65013, 55908,
       69563, 22030, 59561, 15593, 52998, 54795, 16169, 24809, 35580,
       46590, 38792, 43099, 37166, 21390, 56496, 68606, 21110, 56334,
       45477, 51961, 27625, 51176, 30796, 61809, 65450, 67375, 23342,
       27499, 50585, 57302, 56191, 60281, 32865, 58605, 66374, 15315,
       31791, 28670, 38796, 69214, 41055, 32353, 31574, 65799, 42998,
       72785, 18415, 31977, 29812, 65439, 21161, 60871, 67151, 22179,
       32821, 55392, 34586, 67937, 31646, 66397, 35258, 63815, 71291,
       51130, 27684, 49648, 52691, 50681, 68185, 32635, 51553, 28970,
       19112, 26035, 67666, 55471, 51477, 62055, 53003, 41265, 18565,
       48851, 48673, 45832, 67891, 57638, 29240, 41236, 16950, 31449,
       50528, 22397, 15876, 26414, 16736, 23896, 46104, 17583, 65951,
       38538, 31443,
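
The values above change on every run because no seed is set. As a sketch of a reproducible variant, the example below uses NumPy's `Generator` API and wraps the result in a pandas Series, as the step title asks. The seed value `42` and the names `rng` and `owners` are arbitrary choices for illustration, not part of the original exercise.

```python
import numpy as np
import pandas as pd

# A seeded generator returns the same numbers on every run (42 is arbitrary).
rng = np.random.default_rng(seed=42)

# endpoint=True makes the upper bound inclusive, so 73000 itself can be drawn.
owners = pd.Series(
    rng.integers(15000, 73000, size=398, endpoint=True),
    name="owners",
)

print(len(owners))
print(owners.between(15000, 73000).all())
```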

### Step 8. Add the column owners to cars

In [34]:
cars['owners'] = nr_owners
cars.tail()

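| | mpg | cylinders | displacement | horsepower | weight | acceleration | model | origin | car | owners |
|---|---|---|---|---|---|---|---|---|---|---|
| 195 | 27.0 | 4 | 140 | 86 | 2790 | 15.6 | 82 | 1 | ford mustang gl | 21825 |
| 196 | 44.0 | 4 | 97 | 52 | 2130 | 24.6 | 82 | 2 | vw pickup | 69344 |
| 197 | 32.0 | 4 | 135 | 84 | 2295 | 11.6 | 82 | 1 | dodge rampage | 63210 |
| 198 | 28.0 | 4 | 120 | 79 | 2625 | 18.6 | 82 | 1 | ford ranger | 15982 |
| 199 | 31.0 | 4 | 119 | 82 | 2720 | 19.4 | 82 | 1 | chevy s-10 | 20259 |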
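
The assignment works as intended because `nr_owners` is a NumPy array, which pandas places row by row. A Series is matched on index labels instead, and the tail above shows that `cars` ends at label 199, so its index repeats labels. The sketch below, on a made-up four-row frame (`frame` and `values` are hypothetical), illustrates how the two kinds of input can give different columns in that situation.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with repeated row labels, like cars after stacking.
frame = pd.DataFrame({"mpg": [18.0, 15.0, 27.0, 44.0]}, index=[0, 1, 0, 1])
values = np.array([10, 20, 30, 40])

# An array is assigned by position: one value per row, in order.
frame["from_array"] = values

# A Series is aligned on labels: rows labelled 0 and 1 each reuse the
# Series entries at 0 and 1, and the entries at 2 and 3 are dropped.
frame["from_series"] = pd.Series(values)

print(frame)
```

Passing an array, or resetting the frame's index first, avoids this silent mismatch.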
