This markdown contains the example code for the blog post [How to read and write Stata files in R](https://www.marsja.se/how-to-read-and-write-stata-dta-files-in-r-with-haven/). The code is, of course, explained more thoroughly in that blog post.

Note that the difference between this example and the code in the blog post is that "RScript" has been removed from the file.path() call. This is because, when running the Jupyter notebook, the working directory is the folder containing the .ipynb file. That is, if the notebook is already in "RScript", we only need to tell R where the data files are (i.e., in "Data"). Remember to change the file path so that it points to where your .dta files are. If we run the code as a script, in RStudio for instance, we need "RScript" as the second argument, e.g., <code>dtafile &lt;- file.path(getwd(), "RScript", "Data", "FifthDayData.dta")</code>.
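Because the working directory differs between the notebook and an RStudio script, a small helper can try both folder layouts and use whichever one exists. This is a minimal sketch, not part of the original post: `find_data_file()` is a hypothetical helper, and the candidate folders ("Data" and "RScript/Data") are assumptions based on the two layouts described above.

```r
# Hypothetical helper: return the first candidate path that exists on disk.
find_data_file <- function(filename, base = getwd(),
                           subdirs = list("Data", c("RScript", "Data"))) {
  # Build one candidate path per folder layout, e.g. <base>/Data/<filename>
  candidates <- vapply(subdirs, function(parts) {
    do.call(file.path, as.list(c(base, parts, filename)))
  }, character(1))
  hit <- candidates[file.exists(candidates)]
  if (length(hit) == 0) {
    stop("Could not find ", filename, " in: ", paste(candidates, collapse = ", "))
  }
  hit[[1]]
}

# Usage (assumes FifthDayData.dta sits in one of the candidate folders):
# dtafile <- find_data_file("FifthDayData.dta")
```

Failing with an explicit list of the paths that were tried makes a wrong working directory easy to spot.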

## Install the packages if missing

First, we need the packages. The first code chunk installs any of them that are not already installed. Note that we could instead just install tidyverse, which includes all of these packages.


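```r
# Packages used in this notebook
list.of.packages <- c("haven", "readr", "readxl", "dplyr")
# Keep only the ones that are not installed yet
new.packages <- list.of.packages[!(list.of.packages %in% installed.packages()[, "Package"])]
# Install them (does nothing if everything is already installed)
if (length(new.packages)) install.packages(new.packages)
```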
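An alternative sketch checks each package with `requireNamespace()` instead of scanning `installed.packages()`, which R's own documentation notes can be slow when many packages are installed. `install_if_missing()` is a hypothetical helper name, not something from the blog post.

```r
# Hypothetical helper: install only the packages that cannot be loaded.
install_if_missing <- function(pkgs) {
  # requireNamespace() returns FALSE (quietly) when a package cannot be loaded
  available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  missing <- pkgs[!available]
  if (length(missing) > 0) install.packages(missing)
  invisible(missing)  # return the names that had to be installed
}

# Usage with the same packages as in the chunk above:
# install_if_missing(c("haven", "readr", "readxl", "dplyr"))
```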

## Path
Note that if you have cloned the Git repository (or downloaded its files) and kept the example data in the same folder structure as in the repo, you can set the data path as follows to get a running script:

```r
# The example data are in the SimData folder, one level up from the notebook
data.path <- file.path(getwd(), "..", "SimData")
```
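Before reading anything, it can help to confirm that `data.path` points at an existing folder and to see which .dta files it contains. A short sketch using base R only; `list_dta_files()` is a hypothetical helper name.

```r
# Hypothetical helper: list the Stata files in a folder, failing early if it is missing.
list_dta_files <- function(path) {
  if (!dir.exists(path)) {
    stop("Data folder not found: ", normalizePath(path, mustWork = FALSE))
  }
  # Match .dta and .DTA file endings
  list.files(path, pattern = "\\.dta$", ignore.case = TRUE)
}

# Usage with the path defined above:
# list_dta_files(data.path)
```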


## Load Haven:

```r
library(haven)
```

## Load a .dta file:

Remember to change the path to where your .dta file is.

```r
# Path to the example .dta file
dtafile <- file.path(data.path, "FifthDayData.dta")

# Load the .dta file
fifthD.df <- read_dta(dtafile)

# Show the first rows (head() returns six by default)
head(fifthD.df)
```

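| index | ID | Name | Day | Age | Response | Gender |
|---|---|---|---|---|---|---|
| `<dbl>` | `<dbl>` | `<chr>` | `<chr>` | `<dbl>` | `<dbl>` | `<dbl>` |
| 0 | 1 | John | Fifth | 23 | 0.453733 | 0 |
| 1 | 2 | Billie | Fifth | 22 | 0.2573597 | 0 |
| 2 | 3 | Robert | Fifth | 20 | 0.4433932 | 0 |
| 3 | 4 | Don | Fifth | 27 | 0.4235921 | 0 |
| 4 | 5 | Joseph | Fifth | 21 | 0.5713554 | 0 |
| 5 | 6 | James | Fifth | 25 | 0.5577922 | 0 |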
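`head()` returns six rows by default, as in the output above; pass `n = 5` to get exactly five. If you only need a preview of a large file, `read_dta()` also has an `n_max` argument that stops reading after that many rows. The sketch below writes the built-in `mtcars` data to a temporary .dta file so it runs without the example data.

```r
library(haven)

# Write a built-in data set to a temporary .dta file (stand-in for FifthDayData.dta)
tmp <- tempfile(fileext = ".dta")
write_dta(mtcars, tmp)

full <- read_dta(tmp)
first5 <- head(full, n = 5)          # subset after reading the whole file
preview <- read_dta(tmp, n_max = 5)  # read only the first five rows from disk
```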


## Load a .dta file from URL:

Here, we read a Stata file directly from a URL:

```r
url <- "http://www.principlesofeconometrics.com/stata/broiler.dta"

# read_dta() accepts a URL as well as a local path
data.df <- read_dta(url)

head(data.df)
```

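| year | q | y | pchick | pbeef | pcor | pf | cpi | qproda | pop | meatex | time |
|---|---|---|---|---|---|---|---|---|---|---|---|
| `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` |
| 1950 | 14.3 | 7863 | 69.5 | 31.2 | 59.8 |  | 24.1 | 2628500 | 151.684 |  | 41 |
| 1951 | 15.1 | 7953 | 72.9 | 36.5 | 72.1 |  | 26.0 | 2843000 | 154.287 |  | 42 |
| 1952 | 15.3 | 8071 | 73.1 | 36.2 | 71.3 |  | 26.5 | 2851200 | 156.954 |  | 43 |
| 1953 | 15.2 | 8319 | 71.3 | 28.5 | 62.7 |  | 26.7 | 2953900 | 159.565 |  | 44 |
| 1954 | 15.8 | 8276 | 64.4 | 27.4 | 63.4 |  | 26.9 | 3099700 | 162.391 |  | 45 |
| 1955 | 14.7 | 8675 | 67.0 | 27.1 | 56.1 |  | 26.8 | 2958100 | 165.275 |  | 46 |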
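Each call to `read_dta(url)` downloads the file again, and the next two chunks read the same URL. A sketch of downloading once to a temporary file and reading that local copy instead; `cache_dta()` is a hypothetical helper, and `mode = "wb"` keeps the binary file intact on Windows.

```r
library(haven)

# Hypothetical helper: download a remote .dta file and return the local path.
cache_dta <- function(url, destfile = tempfile(fileext = ".dta")) {
  # Skip the download if destfile already exists
  if (!file.exists(destfile)) {
    download.file(url, destfile, mode = "wb", quiet = TRUE)  # binary mode for .dta
  }
  destfile
}

# Usage with the broiler data from above:
# local_file <- cache_dta("http://www.principlesofeconometrics.com/stata/broiler.dta")
# data.df <- read_dta(local_file)
# data.df <- read_dta(local_file, col_select = "pbeef")
```

Reusing `local_file` in later chunks avoids repeated network requests.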


## Read a specific column


```r
# Read only the pbeef column
data.df <- read_dta(url, col_select = "pbeef")

head(data.df)
```

| pbeef |
|---|
| `<dbl>` |
| 31.2 |
| 36.5 |
| 36.2 |
| 28.5 |
| 27.4 |
| 27.1 |
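`col_select` is evaluated with tidyselect, so besides a quoted name it also accepts bare column names and selection helpers such as `starts_with()`. A sketch on a temporary copy of `mtcars`; loading dplyr, which re-exports the helpers, is an assumption made here to keep them in scope.

```r
library(haven)
library(dplyr)  # re-exports tidyselect helpers such as starts_with()

tmp <- tempfile(fileext = ".dta")
write_dta(mtcars, tmp)

one_col  <- read_dta(tmp, col_select = mpg)               # bare name
two_cols <- read_dta(tmp, col_select = c(mpg, cyl))       # several bare names
c_cols   <- read_dta(tmp, col_select = starts_with("c"))  # helper: cyl and carb
```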


## Read multiple columns


```r
# Columns to read
cols <- c("year", "pbeef", "q", "pop")

data.df <- read_dta(url, col_select = all_of(cols))

head(data.df)
```

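| year | q | pbeef | pop |
|---|---|---|---|
| `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` |
| 1950 | 14.3 | 31.2 | 151.684 |
| 1951 | 15.1 | 36.5 | 154.287 |
| 1952 | 15.3 | 36.2 | 156.954 |
| 1953 | 15.2 | 28.5 | 159.565 |
| 1954 | 15.8 | 27.4 | 162.391 |
| 1955 | 14.7 | 27.1 | 165.275 |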
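In the output above, the columns come back in the order they are stored in the file (year, q, pbeef, pop), not in the order given in `cols`. If the order matters, one option is to reorder after reading. A sketch using a temporary .dta file built from `mtcars`:

```r
library(haven)
library(dplyr)

tmp <- tempfile(fileext = ".dta")
write_dta(mtcars, tmp)

cols <- c("hp", "mpg", "wt")  # requested order differs from the file order
df <- read_dta(tmp, col_select = all_of(cols)) %>%
  select(all_of(cols))        # put the columns in the order listed in cols
```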


## Remove columns and write a .dta file

```r
library(haven)
library(dplyr)

# Path to the .dta file
dtafile <- file.path(data.path, "FifthDayData.dta")

dta.df <- read_dta(dtafile)

# Drop the index and Day columns
newdta.df <- select(dta.df, -c(index, Day))

# Write the result to a new .dta file
write_dta(newdta.df, file.path(data.path, "NewFifthDayData.dta"))
```
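`write_dta()` has a `version` argument that controls which Stata file format is written, which can matter if colleagues use an older Stata (see `?write_dta` for the supported range). A sketch that writes a version 13 file and reads it back to check that the dropped columns are gone; the small data frame is a made-up stand-in with the same column names, not the real FifthDayData.dta.

```r
library(haven)
library(dplyr)

# Made-up stand-in with the same column names as FifthDayData.dta
dta.df <- data.frame(index = 0:1, ID = c(1, 2), Name = c("A", "B"),
                     Day = "Fifth", Age = c(23, 22))
newdta.df <- select(dta.df, -c(index, Day))  # drop columns by name

out <- tempfile(fileext = ".dta")
write_dta(newdta.df, out, version = 13)       # older file format
check <- read_dta(out)                        # read back to verify
```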

## Read a CSV File and Write a .dta file

Here we use readr and read_csv to read a CSV file and then write it to a .dta file:

```r
library(readr)

csvfile <- file.path(data.path, "FirstDayData.csv")

data.df <- read_csv(csvfile)

# Save it as a .dta file
write_dta(data.df, file.path(data.path, "FirstDayData.dta"))
```

```
Parsed with column specification:
cols(
  ID = col_double(),
  Name = col_character(),
  Day = col_character(),
  Age = col_double(),
  Response = col_double(),
  Gender = col_double()
)
```
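The "Parsed with column specification" message above is readr reporting the column types it guessed. Passing `col_types` explicitly fixes the types and suppresses that message. A sketch with the same column names, using a tiny temporary CSV (made-up values) in place of FirstDayData.csv:

```r
library(readr)
library(haven)

# Tiny stand-in CSV with the columns readr reported above (values made up)
csvfile <- tempfile(fileext = ".csv")
writeLines(c("ID,Name,Day,Age,Response,Gender",
             "1,A,First,23,0.5,0"), csvfile)

# Explicit column types instead of letting readr guess
spec <- cols(ID = col_double(), Name = col_character(), Day = col_character(),
             Age = col_double(), Response = col_double(), Gender = col_double())
data.df <- read_csv(csvfile, col_types = spec)

dtafile <- tempfile(fileext = ".dta")
write_dta(data.df, dtafile)
```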


## Read an Excel File and Write a .dta file

Here we use readxl and read_excel to read an Excel file and then write it to a .dta file:

```r
library(readxl)

xlsxfile <- file.path(data.path, "example_concat.xlsx")

data.df <- read_excel(xlsxfile)

# Save it as a .dta file
write_dta(data.df, file.path(data.path, "play_data2.dta"))
```
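Spreadsheet headers often contain spaces or other characters that are not allowed in Stata variable names (letters, digits and underscores, not starting with a digit, at most 32 characters), and `write_dta()` may refuse such names. A hedged sketch of a hypothetical `stata_names()` helper that cleans the names first; the small data frame stands in for what `read_excel()` might return, and the helper does not handle Stata's reserved words.

```r
library(haven)

# Hypothetical helper: make column names acceptable to Stata
stata_names <- function(x) {
  x <- gsub("[^A-Za-z0-9_]", "_", x)                      # replace disallowed characters
  x <- ifelse(grepl("^[A-Za-z_]", x), x, paste0("v", x))  # must not start with a digit
  x <- substr(x, 1, 32)                                   # 32-character limit
  make.unique(x, sep = "_")                               # avoid duplicates after cleaning
}

# Stand-in for a spreadsheet with awkward headers (values made up)
df <- data.frame(`Response time` = c(0.4, 0.5), `2nd score` = c(1, 2),
                 check.names = FALSE)
names(df) <- stata_names(names(df))

out <- tempfile(fileext = ".dta")
write_dta(df, out)
```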