********************************************************************************
**                       LABELING VARIABLES AND VALUES                        **
********************************************************************************

* It is essential that variables and values in your data set are carefully named
* and labeled. Good variable names reduce the amount of typing you have to do,
* good variable labels help you (in the future) to remember what information is
* contained in each variable, and correct value labels make sure that
* categorical information is correctly understood and analyzed.


** (RE-)NAMING YOUR VARIABLES -- Especially when importing data from other
* sources, you may find that the variable names are not as succinct as they
* could be.

* You can easily rename variables using:
rename oldname newname
* As long as the oldname correctly identifiea an existing variable and the new
* name is an acceptable name, Stata will rename the variable for you.

* The abbreviated version of "rename" is "ren":
ren old new

* As an example, let's say that we got household data that includes the number
* of children under 5 and under 14 in a household, and the data was in an Excel
* sheet with the very clear, but verbose variable names:
*     No. of children aged <=5
*     No. of children aged <=14
* Upon importing the data into Stata, the variable names become:
*     noofchildrenaged5
*     noofchildrenaged14
* This is hard to read and cumbersome to type. We could rename these variables
* to something shorter, but still sufficiently clear:
rename noofchildrenaged5 child5
rename noofchildrenaged14 child14

* You will want to rename your variables right after importing, so that all
* future commands use the new names consistently. If you rename variables in the
* middle of your do-file, you may have commands referring to the same variable
* but using the old name before the renaming command and the new name after the
* renaming command. Because these names are different, somebody reading your
* code will have a harder time seeing that you are working with the same
* variable. Ergo: rename right after import.

* Nota bene:
* For the KIHS, the names are already pretty good, because the dataset is
* created by professional statisticians who understand the ways in which
* statistical packages tend to work. We request that you stick to the original
* variable names so that we have an easier time reading your do-file.


** LABELING YOUR VARIABLES -- In Stata, variable names are meant to be short and
* to the point, so that they quick to type and easy to understand. However, 
* variable names are usually to brief to contain all information you will need.
* For this reason, Stata offers variable labels, which can contain a lot more
* information.

* Ideally, even the simplest variables should have a label and we encourage
* proper labeling in every dataset you work on. While you may not be creating
* data sets that will be distributed to many users (yet!), you do have one all-
* important constituent who needs to understand your data: yourself in the
* future. Careful labeling helps you understand your own data set when you come
* back to it in a few weeks, months or even years. If you label your data now,
* you won't have to travel back in time to punish yourself for making your own
* life harder.

* There's an additional benefit to labeling data: writing good variable labels
* forces you to think carefully about what each variable contains. If something
* is unclear to you know (e.g. what currency is this income data in?), you can
* probably investigate and fix this problem, or you are at least aware that a
* problem exists and needs to be taken into account when interpreting the data.

* To give any variable a label, use:
label variable some_variable "Description of the variable, up to 80 characters"

* The abbreviated version is:
la var variable "Description of the variable, up to 80 characters"


** LABELING VALUES OF CATEGORICAL VARIABLES -- Many datasets contain categorical
* variables which are stored as numerical codes that have an assigned meaning.
* Labeling the values is what tells Stata (and the users of the data set) which
* numerical code stands for which meaning.

* Assigning value labels proceeds in two steps. First, you create a list of the
* numerical codes (or values) and their label appropriate label:
label define name_of_this_set_of_values 1 "What does 1 stand for?" 2 "What..."
* Then, you tell Stata to use this set of labels on a particular variable
label values variable_name name_of_this_set_of_values

* We might have a marital status variable with different categories:
label define statuslabels 0 "not married" 1 "married" 2 "divorced" 3 "widowed"
label values maritalstatus statuslabels

* The abbreviated versions of this commands are "la de" and "la val":
la de genderlabels 0 "male" 1 "female"
la val gender genderlabels

* Your set of value labels can contain tens of thousands of value-label pairs.


** THE CODEBOOK -- If you have received a properly labeled dataset from a source
* or if you labeled a dataset yourself, you can use the
codebook
* command to get a comprehensive overview of all variables. For each variable,
* you will get the name, the label, the type of the variable, and useful
* summary statistics such as the number of missing observations, the range,
* average, standard deviation and percentiles (for numeric information), or a
* tabulation (for categorical information).

* Especially when getting a new dataset or refreshing your memory of an existing
* dataset, this should be your go-to command.