********************************************************************************
**                          BASIC COMMAND LINE USAGE                          **
********************************************************************************

* For the purpose of this introductory cookbook, we use one command only:
* summarize. This command provides basic descriptive statistics for one or more
* variables in your dataset, including the number of observations, their mean
* value and standard deviation, as well as the largest and smallest value.

* All examples are based on the Kyrgyz household survey, and particularly the
* POVERTY.dta dataset.


* USING ONE OR MORE VARIABLES
* Most commands can be applied to one or more variables. To apply the command
* to ALL variables, enter the command only:
summarize

* To apply the command to individual variables, list them after the command,
* separated by a space:

summarize toty totx

* You can use wildcards to save yourself some typing. The asterisk * stands for
* any character or characters (one or many). For example:

summarize y*

* ... is the same as listing every single variable whose name starts with a y:

summarize y1 y2 y3 y5 y6 y7 y8 y9 y10 y11 y12 y13 y41 y42 y43 y44 y45 y46

* Another wildcard is the question mark ? standing for any one character:

summarize y?3

* ... is the same as listing every single variable whose name starts with a y,
* followed by any one character and ending with a 3:

summarize y13 y43
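
* To check which variables a wildcard pattern matches before you use it, you
* can list the matching names. This is only a sketch: "ds" prints the names of
* the variables in memory that match the pattern, so the output depends on your
* dataset.

ds y?3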


* ABBREVIATING COMMAND NAMES
* Stata is all about saving you some typing. The most common commands can be
* abbreviated a lot, e.g. summarize can be abbreviated to "su". Writing:

su toty

* is the same as writing

summarize toty

* If you want to know whether (and how far) you can abbreviate a command name,
* use the help function:

help summarize

* Under Syntax, the shortest accepted abbreviation of the command name is
* underlined.


* USING ALL OBSERVATIONS OR A SUBSET
* By default, Stata will apply the command to all observations for your
* variable, possibly excluding any missing values. But sometimes you don't want
* all observations, but only a subset. For example, the following command will
* give you a summary of the variables for total household income and
* expenditure.

summarize toty totx

* But perhaps you care only about households in rural or urban areas. The
* variable b002 tells you if a household is in an urban (1) or rural (2) area.
* Adding an if condition to the command tells Stata to include only those
* observations for which the area indicator has the right value.

* Here are the summary values for urban households:

summarize totx toty if b002 == 1

* ... and here for rural households:

summarize totx toty if b002 == 2

* You will usually find that rural households have a lower average income and
* higher average expenditures. Makes sense!

* You can force Stata to take a subset of observations that fulfills multiple
* conditions at the same time (logical AND). The following command gives us rural
* households with no children under 14 living at home. We use the ampersand
* character & to say AND:

summarize totx toty if b002 == 2 & child14 == 0

* The average expenses are usually lower than the average for all rural
* households. Makes sense!

* You can also have Stata check whether at least one of several conditions is
* fulfilled (logical OR). The following command gives us all households who
* have some income from work:

summarize totx toty if y1 > 0

* And this command gives us all households who have some income from pensions:

summarize totx toty if y2 > 0

* This command here gives us all households who have income from work OR from
* pensions. We use the pipe character | to say OR:

summarize totx toty if y1 > 0 | y2 > 0

* Notice something? The number of households with some income from either
* source (4801) is less than the sum of the households who have income from
* work (4069) and from pensions (2307): 4069 + 2307 = 6376. Logically, there
* must be 6376 - 4801 = 1575 households who have some income from work and
* additionally some income from pensions!
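
* You can check this overlap directly. A minimal sketch, assuming the counts
* quoted above and no missing values in the variables involved: "count"
* reports the number of observations that satisfy a condition, so combining
* both conditions with & should report the 1575 households that have income
* from work AND from pensions.

count if y1 > 0 & y2 > 0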


* AUTO-COMPLETING VARIABLE NAMES
* Note that you can use the tab key to extend a variable name. This
* works fine if there is only one variable name that you could mean.
* Try typing "summarize xs" and then press the tab key -->|  You'll
* get:

summarize xserv 

* If you could mean many different names, Stata won't do anything.
* Try typing "summarize y1" and then press the tab key -->|  Stata
* won't do a thing. That's because you could mean y1, y10, y11, y12
* or y13 -- and Stata does not know which one you want.

* If there are multiple options, but Stata can save you some typing,
* it will go all the way to the letter where you have to make a
* choice: Try typing "summarize qui" and then press the tab key -->|
* You'll get:

summarize quintil

* That's because there are two possible variables: quintilc and
* quintilx. Stata doesn't know which one you want, but will
* auto-complete to quintil and leave the choice for the final letter
* to you.


********************************************************************************
**                              IMPORTING DATA                                **
********************************************************************************


* Stata can import data in a variety of formats. The two most common formats are
* text files (comma-separated values = csv) and Excel files. Virtually all data
* is either delivered in one of these formats or can be exported into one of
* them.


** WARNING

* Before you import new data, make sure that any existing data has been saved
* and cleared from Stata memory!
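
* A minimal sketch of this step, assuming you want to keep the data currently
* in memory under the (hypothetical) file name "mydata.dta". Note that the
* replace option overwrites an existing file of the same name.

save "mydata.dta", replace
clear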


** COMMA-SEPARATED VALUE TEXT FILES

* The "less messy" way is to import from a text file. This is a very plain file
* format that does not require a lot of disk space, and Stata is pretty smart
* when reading this file format. Usually, you can simply import with:
import delimited "filename.csv"

* Stata will treat the first row as variable names, detect the types of
* variables automatically and import all observations into memory.

* Be sure to check your data after import. Pay particular attention to the
* following problems:

*  1  Do the variable names look ok? If not, re-import or rename them.
*  2  Are numeric variables actually numeric? If not, see if you can recognize
*     the source of the problem, e.g. decimal commas when Stata expects decimal
*     points. Fix the cause, then use
      destring variablename, replace
*     to convert the variable from text to numeric. Not solved? Try
*     re-importing using the import dialog.
*  3  Check if categorical variables are displayed as numbers. If so, you may
*     want to give each value a label (see labeling.do).
*  4  Check that all observations were imported. If anything is missing, try
*     re-importing using the import dialog.
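
* The checks above can be sketched as a short sequence of commands. The option
* and variable names here are illustrative: varnames(1) tells Stata to read
* variable names from the first row, and "income" stands for any variable that
* was imported as text because it uses decimal commas. "describe" shows the
* names and storage types of all variables, "count" shows the number of
* observations, and the dpcomma option tells destring to treat commas as
* decimal separators.

import delimited "filename.csv", varnames(1) clear
describe
count
destring income, replace dpcomma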

* As you may have noticed, an easy problem-solving approach is to first import
* with "import delimited", and if problems show up, revert to the regular dialog
* you can find under File > Import > Text data delimited in the Stata menu.
* Using this dialog allows you to adjust all import options while checking a
* preview of what the imported data will look like. Once you are happy with the
* settings, the import dialog will produce the corresponding command. Make sure 
* your log is on so that you capture the correct command.


** MICROSOFT EXCEL

* The "slightly more messy" way is to import from an Excel sheet. Stata can work
* with old and new Excel sheets (.xls and .xlsx, respectively). Because Excel
* sheets can contain multiple worksheets, and because each worksheet can
* contain multiple tables, you have to specify where the table you want to
* import is located by giving the name of the worksheet and the top-left and
* bottom-right corners of the table.

* Open your workbook in Excel and take note of the name of the worksheet. Excel
* files created on an English-language Mac or PC tend to name their sheets
* "Sheet1", "Sheet2", etc. However, these labels are different if the workbook
* was created on a computer with a different language, and the user can change
* the names of the sheets as well. It's best to copy the name to the clipboard
* exactly: double click on the worksheet name (printed on the tab at the
* bottom of the Excel window). The name is now highlighted and can be copied and
* pasted into the Stata command below.

* Next, check whether your table has the names of the variables in the first
* row only. Stata only accepts variable names in the top-most row, but Excel
* has no such restrictions. If you have variable names in more than one row,
* adjust your table so that the top row has clear, unique variable names and
* every other row has individual observations.

* Finally, take note of the top-left and bottom-right cells of the table you
* want Stata to import. E.g. if you have a table that starts at the top-left of
* the sheet and has ten columns and ten rows, the top-left cell would be A1 and
* the bottom-right cell would be J10.

* Adjust the command below with the right name and range to import the table
* from Excel:
import excel "filename.xlsx", sheet("Sheet1") cellrange(A1:J10) firstrow

* If you have only observations and no variable names, you need to skip the
* "firstrow" option.
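
* If you are unsure about sheet names and cell ranges, Stata can list them for
* you without importing anything. A minimal sketch, using the same placeholder
* file name as above:

import excel "filename.xlsx", describe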

* If you have trouble importing from Excel with the above steps, use the import
* dialog by choosing File > Import > Excel spreadsheet from the Stata menu.
* The dialog allows you to select the proper sheet and enter the right cell
* range; and provides you with a preview of what Stata will import so you can
* check that the settings are correct.


** OTHER FILE TYPES
* Stata allows importing from other file types, including XML (a prominent new
* file format often used for online data), SAS (a different statistics
* package), and various text files.

* You won't need these file types in the near future, and they are comparatively
* rare. We recommend two options:

*  1  If possible, export or convert these files into Excel or CSV format. These
*     formats are sufficiently standardized and "well-behaved" that adding this
*     step often makes the overall process less painful.
*  2  Use the individual import mechanisms offered in the Stata menu under
*     File > Import. The custom dialogs will walk you through the process of
*     importing the data you need and often allow you to preview the import so
*     you can check for problems. Make sure you are already logging while
*     importing, so that the log captures the correct import command.


** IMPORTING FROM DATABASES

* Stata can connect directly to databases. This usually requires specific
* settings that specify where the database is located (usually on a server on
* the internet), any user name and password that may be needed to get access,
* and how Stata should communicate with the database.

* Should you need to connect to a database, make sure that you have the
* documentation specific for that database and use the "odbc" command to connect
* to it. You can find out more here:
help odbc
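
* As a rough sketch only: "odbc list" shows the data sources configured on your
* computer, and "odbc load" reads a table into memory. The data source name
* "MyDatabase" and the table name "households" are made up for illustration;
* replace them with the values from your database documentation.

odbc list
odbc load, table("households") dsn("MyDatabase") clear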


********************************************************************************
**                            LOGGING STATA OUTPUT                            **
********************************************************************************

* For reproducibility, it is essential to keep a complete log of your
* analyses. Stata offers logging functions that automate this task for you --
* the only thing you need to do is switch on the log at the start of each
* analysis and switch it off at the end. In between these two commands, Stata
* will write the following things to the log file:
* _1_ Every command you type or run from a do-file
* _2_ Every comment you make or include in a do-file (handy for annotations!)
* _3_ All output that your commands generate except for graphs.


** OPENING YOUR LOG FILE -- Pretty straightforward:

* This command starts logging immediately in the specified file name
log using "my log filename.log"

* Be sure to enclose the file name in quotation marks unless the file name is
* one word only.

* Stata knows two file formats for logs. We strongly recommend plain text log
* files, since you can open them in any text editor or word processor like
* Microsoft Word or LibreOffice. As long as the file name ends in .log, Stata
* will save the log in the correct format. However, you can force Stata to play
* nice by explicitly telling it to save the log as text:
log using "my log filename.log", text


* If you use the same log file for analysis in multiple sessions, Stata will
* complain if you are at risk of overwriting an existing file. This forces you
* to choose what you want to do:

* You can CONTINUE the old log file:
log using "my log filename.log", append
* All information in the log file is kept and all new information is added to
* the end of the log file. This is the better option in most circumstances,
* since it avoids any data loss.

* You can OVERWRITE the old log file:
log using "my log filename.log", replace
* You lose everything contained in the log file at this point, so you should
* only use this option if you are sure you can afford to lose this information!


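* An easy way to avoid potential data loss is to give each log file a file name
* that includes the date and time when the analysis was started. You can do this
* automatically with the following two commands. The first one builds the time
* stamp and replaces the colons in the time (e.g. 14:05:33) with hyphens,
* because colons are not allowed in Windows file names:
local stamp = subinstr("`c(current_date)' `c(current_time)'", ":", "-", .)
log using "Log `stamp'.log"

* Feel free to change the name of the log, but make sure to keep the time stamp
* -- built from `c(current_date)' and `c(current_time)' -- in the file name.
* Run both lines together, since Stata forgets the local macro once the run is
* over.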


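* We recommend giving each log file a name. The name must be a single word
* without spaces. (The first line below replaces the colons in the time with
* hyphens, so that the file name is also valid on Windows.)
local stamp = subinstr("`c(current_date)' `c(current_time)'", ":", "-", .)
log using "Log `stamp'.log", name(helpful_note)
* The name is printed at the very top of the log file and allows you to leave a
* note to your future self about what you are doing in this log file.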


* Are you doing unspeakable things to your data that you don't want to record?
* You can pause the log to exclude a part of your analysis from the log:
log off
* All done and want to switch the log back on?
log on

* This is particularly useful if you are exploring or tinkering with your data
* and don't need an actual log of what you are doing. However, there is no harm
* in keeping the log running at all times. Who knows? You might find something
* useful when playing with the data, and if the log isn't on, you don't have a
* record of it...


* At the end of your analysis, you should close your log properly:
log close _all

* This ensures that your log file is properly written to disk. Using _all here
* tells Stata to close any log, independent of the helpful name we have given
* it above.


********************************************************************************
**                       LABELING VARIABLES AND VALUES                        **
********************************************************************************

* It is essential that variables and values in your data set are carefully named
* and labeled. Good variable names reduce the amount of typing you have to do,
* good variable labels help you (in the future) to remember what information is
* contained in each variable, and correct value labels make sure that
* categorical information is correctly understood and analyzed.


** (RE-)NAMING YOUR VARIABLES -- Especially when importing data from other
* sources, you may find that the variable names are not as succinct as they
* could be.

* You can easily rename variables using:
rename oldname newname
* As long as the oldname correctly identifies an existing variable and the new
* name is an acceptable name, Stata will rename the variable for you.

* The abbreviated version of "rename" is "ren":
ren old new

* As an example, let's say that we got household data that includes the number
* of children under 5 and under 14 in a household, and the data was in an Excel
* sheet with the very clear, but verbose variable names:
*     No. of children aged <=5
*     No. of children aged <=14
* Upon importing the data into Stata, the variable names become:
*     noofchildrenaged5
*     noofchildrenaged14
* This is hard to read and cumbersome to type. We could rename these variables
* to something shorter, but still sufficiently clear:
rename noofchildrenaged5 child5
rename noofchildrenaged14 child14

* You will want to rename your variables right after importing, so that all
* future commands use the new names consistently. If you rename variables in the
* middle of your do-file, you may have commands referring to the same variable
* but using the old name before the renaming command and the new name after the
* renaming command. Because these names are different, somebody reading your
* code will have a harder time seeing that you are working with the same
* variable. Ergo: rename right after import.

* Nota bene:
* For the KIHS, the names are already pretty good, because the dataset is
* created by professional statisticians who understand the ways in which
* statistical packages tend to work. We request that you stick to the original
* variable names so that we have an easier time reading your do-file.


** LABELING YOUR VARIABLES -- In Stata, variable names are meant to be short and
* to the point, so that they are quick to type and easy to understand. However,
* variable names are usually too brief to contain all information you will need.
* For this reason, Stata offers variable labels, which can contain a lot more
* information.

* Ideally, even the simplest variables should have a label and we encourage
* proper labeling in every dataset you work on. While you may not be creating
* data sets that will be distributed to many users (yet!), you do have one all-
* important constituent who needs to understand your data: yourself in the
* future. Careful labeling helps you understand your own data set when you come
* back to it in a few weeks, months or even years. If you label your data now,
* you won't have to travel back in time to punish yourself for making your own
* life harder.

* There's an additional benefit to labeling data: writing good variable labels
* forces you to think carefully about what each variable contains. If something
* is unclear to you now (e.g. what currency is this income data in?), you can
* probably investigate and fix this problem, or you are at least aware that a
* problem exists and needs to be taken into account when interpreting the data.

* To give any variable a label, use:
label variable some_variable "Description of the variable, up to 80 characters"

* The abbreviated version is:
la var some_variable "Description of the variable, up to 80 characters"
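
* For example, the renamed variables from the section on renaming could be
* labeled like this (the label texts are only a suggestion, based on the
* original Excel column names):

label variable child5 "Number of children aged 5 or younger"
label variable child14 "Number of children aged 14 or younger"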


** LABELING VALUES OF CATEGORICAL VARIABLES -- Many datasets contain categorical
* variables which are stored as numerical codes that have an assigned meaning.
* Labeling the values is what tells Stata (and the users of the data set) which
* numerical code stands for which meaning.

* Assigning value labels proceeds in two steps. First, you create a list of the
* numerical codes (or values) and their appropriate labels:
label define name_of_this_set_of_values 1 "What does 1 stand for?" 2 "What..."
* Then, you tell Stata to use this set of labels on a particular variable
label values variable_name name_of_this_set_of_values

* We might have a marital status variable with different categories:
label define statuslabels 0 "not married" 1 "married" 2 "divorced" 3 "widowed"
label values maritalstatus statuslabels

* The abbreviated versions of these commands are "la de" and "la val":
la de genderlabels 0 "male" 1 "female"
la val gender genderlabels

* Your set of value labels can contain tens of thousands of value-label pairs.
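
* As a sketch for the survey data used in this cookbook: the area indicator
* b002 is coded 1 for urban and 2 for rural households. The label name
* "arealabels" is made up for this example. If b002 already has value labels
* in your copy of the data, you can skip this step. The tabulate command at
* the end shows the categories with their new labels.

label define arealabels 1 "urban" 2 "rural"
label values b002 arealabels
tabulate b002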


** THE CODEBOOK -- If you have received a properly labeled dataset from a source
* or if you labeled a dataset yourself, you can use the
codebook
* command to get a comprehensive overview of all variables. For each variable,
* you will get the name, the label, the type of the variable, and useful
* summary statistics such as the number of missing observations, the range,
* average, standard deviation and percentiles (for numeric information), or a
* tabulation (for categorical information).

* Especially when getting a new dataset or refreshing your memory of an existing
* dataset, this should be your go-to command.
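
* You can also restrict the codebook to selected variables, or ask for a
* one-line-per-variable overview with the "compact" option. A short sketch
* using variables from our dataset:

codebook totx b002
codebook, compact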


********************************************************************************
**                                  GRAPHING                                  **
********************************************************************************

* One of the best ways of understanding a dataset is to make graphs for its
* various variables. Stata offers a number of graphing commands, which are
* introduced below.


** GRAPHING CATEGORICAL DATA (PIE CHART) -- Pie charts are useful for showing
* how 100% of something is distributed among various categories. A drawback of
* pie charts is that visual comparison between similarly sized categories is not
* easy -- but otherwise, they are easy to interpret.

* To divide information by categories, Stata uses the over option. In our
* dataset, the variable "priz" contains information about the terrain (flat vs.
* mountainous). Here is a pie chart based on the frequencies of each terrain
* category:
graph pie, over(priz)

* By default, the pie chart comes with a legend if the variable has value labels
* assigned. You can customize the appearance -- see below in this text, or in
* the documentation under
help graph pie

* The abbreviated version of "graph pie" is "gr pie":
gr pie, over(priz)

* You can order the slices from smallest to largest (starting at 12 o'clock and
* going clockwise) by adding the "sort" option:
graph pie, over(priz) sort

* If you prefer to go from largest to smallest, also add "descending":
graph pie, over(priz) sort descending


** GRAPHING CATEGORICAL DATA (BAR CHART) -- Bar charts can fulfill the same task
* for categorical data as pie charts, and have the added advantage that the size
* of similar categories can be more easily perceived.

* The command works the same way as "graph pie":
graph bar, over(priz)

* The abbreviated version of "graph bar" is "gr bar":
gr bar, over(priz)

* If the variable has value labels, they will be used to label the bars for each
* category.

* By default, bar charts have the categories along the x-axis, with bars growing
* upward. The y-axis is labeled in percent for the relative shares of each
* category. If you would like to change the orientation, with the categories
* along the y-axis and percentage along the x-axis, you can use "hbar" instead
* of "bar":
graph hbar, over(priz)

* You can customize the appearance of the bar chart further. See below in this
* text, or in the documentation under
help graph bar


** GRAPHING QUANTITATIVE DATA (HISTOGRAM) -- Histograms are very useful for
* quantitative information because they break the entire range of values into
* individual bins and show the frequencies with which observations fall into the
* different bins.

* Our dataset has a variable "totx" which represents the total expenditures of
* households. Creating a histogram for it is easy:
histogram totx

* The abbreviated version of "histogram" is "hist":
hist totx

* The histogram will show the entire range and the label of the variable along
* its x-axis. The y-axis is labeled with densities, which are not easy to
* interpret for "normal" earthlings. We recommend switching to percentages or
* frequencies:
histogram totx, percent
histogram totx, frequency

* Later in our course, we'll find that it may be useful to see how "normal"
* a distribution is. Stata allows us to visually inspect this by adding a
* normal distribution to the histogram with the "normal" option:
histogram totx, percent normal

* You can get more documentation on histograms with:
help histogram
* ... and you can customize its appearance (see below!)
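
* Graph commands accept the same if conditions as summarize. As a sketch,
* these two histograms show expenditures for urban (b002 == 1) and rural
* (b002 == 2) households separately:

histogram totx if b002 == 1, percent
histogram totx if b002 == 2, percent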


** COMPARING QUANTITATIVE DATA BY CATEGORIES (BOX PLOT) -- Box plots are 
* summary graphs that show the distribution of a variable in comparison for
* multiple categories, with indications for the center, central 50% of
* observations, and outliers.

* Our dataset identifies whether a household is located in a rural or urban
* area ("b002"). We can use a boxplot to compare rural and urban household
* expenditures:
graph box totx, over(b002)

* The abbreviated version of "graph box" is "gr box":
gr box totx, over(b002)

* By default, the graph is labeled for each category if the categorical variable
* has value labels, and the range and label of the quantitative variable is
* provided. You can customize the appearance of the graph: see below or read
* the documentation on box plots with:
help graph box


** PLOTTING TWO QUANTITATIVE VARIABLES AGAINST EACH OTHER (SCATTER PLOT) --
* Scatter plots are useful for showing the distribution of two quantitative
* variables with respect to each other.

* Our data contains variables on total household income and expenditures. A
* scatter plot for these two variables is easily created with
graph twoway scatter totx toty

* This command can be abbreviated in many ever-shorter versions:
twoway scatter totx toty
scatter totx toty
tw sc totx toty
sc totx toty
* We like the last one best -- given how frequently we use scatter plots, it's
* fitting to have this command super short.

* By default, the variable listed first will be plotted along the y-axis, and
* the variable listed second will be plotted along the x-axis.

* Especially when you have lots of observations, relatively thick dots for each
* observation may obscure the pattern. You can opt for smaller markers with
sc totx toty, msize(small)

* Not small enough? Go for tiny:
sc totx toty, msize(tiny)

* Stata knows 12 different sizes. You can find a list here:
help markersizestyle

* You can get extensive documentation on scatter plots with:
help scatter

* Scatter plots will come in handy during Econometrics -- we'll be sure to
* revisit this subject...


** CUSTOMIZING AND LABELING YOUR CHARTS --  Stata graphs have sensible default
* options. Usually, the basic command produces a properly labeled, legible
* graph. There are situations where you want or need to fine-tune the settings:
* you may want to adjust the labels to better communicate what the chart shows,
* or you may want to correct default settings in Stata when they don't work well
* for the graph you are producing.

* Often, the most important changes apply to the labeling. We'll start with a
* simple scatter plot:
sc totx toty, msize(tiny)

* and add a title:
sc totx toty, msize(tiny) title("Household Income v Expenditure")

* We can also add subtitles and captions:
sc totx toty, msize(tiny) ///
   title("Household Income v Expenditure") ///
   subtitle("Kyrgyz Integrated Household Survey") ///
   caption("Observations: 4984 households in 2009")
* Because graph commands tend to get longer and longer, here's a trick that
* allows us to break the command over several lines for legibility: end each
* line with three forward slashes (as in the command above) until the command
* is finished. This only works in do-files, not in the command line!

* We can also re-label our axes in the graph in case the variable labels don't
* do a perfect job:
sc totx toty, msize(tiny) ///
   title("Household Income v Expenditure") ///
   subtitle("Kyrgyz Integrated Household Survey") ///
   caption("Observations: 4984 households in 2009") ///
   xtitle("Total Annual Income") ///
   ytitle("Total Annual Expenditures")
   
* Finally, we may want to change the layout and design of the graph. While it is
* possible to change colors, fonts, and positions manually, Stata has a powerful
* option called "scheme" which allows you to apply a consistent design to the
* entire graph. Two popular options are

* ... graphs that match the design used in The Economist:
sc totx toty, msize(tiny) ///
   title("Household Income v Expenditure") ///
   subtitle("Kyrgyz Integrated Household Survey") ///
   caption("Observations: 4984 households in 2009") ///
   xtitle("Total Annual Income") ///
   ytitle("Total Annual Expenditures") ///
   scheme(economist)

* ... and graphs optimized for gray-scale (black and white), low-ink printing:
sc totx toty, msize(tiny) ///
   title("Household Income v Expenditure") ///
   subtitle("Kyrgyz Integrated Household Survey") ///
   caption("Observations: 4984 households in 2009") ///
   xtitle("Total Annual Income") ///
   ytitle("Total Annual Expenditures") ///
   scheme(s1mono)
   
* Stata has 11 default schemes and can install more. Find out more at
help schemes

* All of the graphing commands have many options, far more than we want to cover
* here. Be sure to read the extensive documentation that comes with Stata for
* all of the graph commands by calling the help function:
help graph


** SAVING YOUR GRAPHS -- While you are exploring your data set, just looking at
* different graphs is enough. But as you get closer to writing the report, you
* will want to save graphs. The easiest way is to tack a "saving" option onto
* the end of your graph command
sc totx toty, saving(scatter)

* Unfortunately, this produces a graph in Stata's own file format, .gph.

* If you want to save a graph for inclusion in your written documents, it's best
* to export the graph in a useful file format. This is done with "graph export":
graph export "scatter.pdf"

* This produces a high-quality PDF version of your graph, which is great for 
* inclusion in Word, LaTeX, PowerPoint, ...

* "graph export" can produce other file types (such as png, tiff, eps). You can
* find more here:
help graph export
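
* A sketch of exporting to PNG instead. The width() option sets the image
* width in pixels, and replace overwrites an existing file of the same name;
* the value 2000 is just an example.

graph export "scatter.png", width(2000) replace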


********************************************************************************
**                       IDEAL STRUCTURE OF A DO-FILE                         **
********************************************************************************

* This is an example of the ideal start of a do-file, with comments that explain
* each step and why it is useful to do these steps in the order proposed here.
* Feel free to use this as the template for your own do-files in the future.


************************* BEFORE YOU START WORKING... **************************

** SETTING PAGINATION OF MESSAGES OFF -- When manually running commands, it's
* useful that Stata pauses after a screenful of messages so that you have time
* to read the output before it scrolls past. This is less helpful when running
* do-files that create a lot of output, since it forces you to press a key at
* the end of every screen. Switch it off with:

set more off

* By the way: if you want to make the change permanent, you can use 
set more off, permanently
* Now, you don't need to individually switch more off at the start of each
* session or the top of each do-file. Just remember to switch it back on again
* when you do some manual work in Stata.


** CLOSING ANY RUNNING LOG FILE -- Especially when you do a lot of analysis, you
* may already have a log file open. It is good practice to make sure that you
* properly close running log files when you start a new analysis. The best way
* to do this is to use the following command, which closes any open log files
* and does not complain in case no log was open:
capture log close _all

* Placing capture before log close is what prevents error messages from stopping
* your work in case there actually was not a log file open. Adding _all makes
* sure that named logs are closed as well.


** CHANGE TO YOUR WORKING DIRECTORY -- It's good practice to have one folder
* that contains your data, do-files and logs. Before switching on logging, we
* change
* to this working directory or folder. Once we do this, we don't have to bother
* with directory paths when opening data and logs. Here are examples of how to
* change directories under Windows, Mac and Linux, assuming that your user name
* is StataNinja. (Honestly, why wouldn't you want this to be your user name?)

* Windows:
cd "C:\Users\StataNinja"

* Mac:
cd "/Users/StataNinja"

* Linux:
cd "/home/stataninja/"

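* The above examples get you to your user directory. Keep only the line for
* your operating system (a cd to a folder that does not exist stops the do-file
* with an error) and change it to lead you to your specific data directory.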
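
* To double-check where you ended up, a quick sketch (not part of the original
* template): pwd prints the current working directory and dir lists the files
* in it, so you can see whether your data file is there.
pwd
dir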


** SWITCH ON LOGGING -- We recommend switching on logging before you open your
* data file or run any commands. This ensures that you always know what data
* file you were working on and captures any commands you run.

* We also recommend that you have a log file for each session that you run, and
* you label them by day and time when you did your analysis. The advantage of
* this approach is that you can easily find, say, the analysis you did last
* Tuesday afternoon, simply by looking at the name of the log file. Stata can
* automatically open a log file with date and time information:

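local logtime = subinstr("`c(current_time)'", ":", "-", .)
log using "Log `c(current_date)' `logtime'.log", name(HELPFUL_NOTE)

* The first line replaces the colons in the current time with hyphens, since
* colons are not allowed in file names on Windows. Feel free to change the name
* of the log file, but make sure to keep the date and time -- `c(current_date)'
* and `logtime' -- in the file name.

* We also recommend giving each log a name, like HELPFUL_NOTE above. The name
* should be a single word without spaces or quotes (letters, digits and
* underscores). You can use this to leave a clue to your future self about what
* you are doing in this log file.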

* Finally, if you label your log files with date and time, you usually won't
* need to use the "append" or "replace" options. Since the time stamp includes
* seconds, your log file will have a unique name unless you open another log
* file within the same second, so there is practically no risk of having a log
* file with the same name in your working directory.
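
* If you want to see what Stata fills in for the date and time, you can display
* both values (a small illustration, not part of the original template):
display "Date: `c(current_date)'"
display "Time: `c(current_time)'"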


** OPEN YOUR DATA FILE -- It's best to open your data file right at the start of
* your log. Just like naming your log, opening your data file as the very first
* command helps your future self remember what you are doing.

use "My delightful data.dta", clear

* We recommend using the clear option when opening your data file, but this
* creates the risk of data loss since any data in Stata that has not been saved
* yet will be lost. You do remember our recommendation about backing up your
* original data at least once before and after import, right?
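
* A quick sanity check after opening the data, sketched here as a suggestion:
* describe, short reports the number of observations and variables, so the log
* records which version of the data you started from.
describe, short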


************************* YOUR ANALYSIS HAPPENS HERE! **************************

* At this point, you're good to go. We recommend leaving comments to yourself
* next to each block of commands, and to separate logical units of commands
* from each other with a couple of empty lines. For example, if your first set
* of commands imports and cleans the data, followed by blocks of commands that
* describe and analyze the data, put a couple of empty lines between these
* blocks and leave comments like "import", "data cleaning", "descriptives",
* "analysis", "graphs", etc. at the start of each block.

* Of course, you don't have to be as verbose as we are here. :)
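
* A minimal sketch of what such a layout could look like. The block labels are
* only examples, and the commands are placeholders that work on any dataset:

* --- descriptives ---
describe
summarize


* --- analysis ---
* (your own commands go here)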


*************************** AT THE END OF YOUR WORK ****************************

** SAVE YOUR DATA AS NEEDED -- Be sure to save your data if you have made
* changes to it that you want to safeguard. Remember to back up your original
* data at least once before and after import. We recommend that you give
* subsequent saves a slightly different file name, like:

save "My delightful data RECODED.dta", replace

* Careful with the "replace" option. If you already have a data file with that
* name, its contents will be overwritten!

* Make sure you save your data while the log file is still open. In case you
* ever need to figure out where a particular version of a data file comes from,
* you'll have the save command in the log and can scroll up to see how you
* generated the data file.


** FINAL MESSAGES -- Just before you close your log is a good time to leave any
* final notes to yourself, like the time and date when your analysis ended:
display "Analysis ends on `c(current_date)' at `c(current_time)'"

* The abbreviated version of "display" is "di":
di "Analysis ends on `c(current_date)' at `c(current_time)'"


** CLOSE YOUR LOG -- It is best practice to close your log at the end of your
* analysis. This ensures that all your output is written to the log file on the
* hard drive before you close Stata.
log close _all

* Using _all here tells Stata to close any log, independent of the helpful name
* we have given it above.


** CLEAN UP -- We recommend clearing your data at the end of your analysis. This
* is good practice because it establishes that at the end of each orderly
* analysis, no data should be in Stata. If you ever run into a situation where
* data is still in memory when the analysis is over, you can treat it as an
* indicator that something went wrong.
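
* The template does not show a command for this step. A minimal way to do it,
* assuming that everything you want to keep has been saved, is:
clear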


** WE HOPE YOU HAVE A GOOD TIME WORKING IN STATA...