# example 2.8 of section 2.2.1 
# (example 2.8 of section 2.2.1)  : Loading data into R : Working with relational databases : A production-size example 
# Title: PUMS data provenance documentation 

3-12-2013
PUMS Data set from:
  http://www.census.gov/acs/www/data_documentation/pums_data/ 	# Note: 1 
  select "2011 ACS 1-year PUMS"  	# Note: 2 
  select "2011 ACS 1-year Public Use Microdata Samples\
  (PUMS) - CSV format"
  download "United States Population Records" and
  "United States Housing Unit Records"
    http://www2.census.gov/acs2011_1yr/pums/csv_pus.zip 	# Note: 3 
    http://www2.census.gov/acs2011_1yr/pums/csv_hus.zip
  downloaded file details:
    $ ls -lh *.zip               	# Note: 4 
      239M Oct 15 13:17 csv_hus.zip
      580M Mar  4 06:31 csv_pus.zip
    $ shasum *.zip  	# Note: 5 
      cdfdfb326956e202fdb560ee34471339ac8abd6c  csv_hus.zip
      aa0f4add21e327b96d9898b850e618aeca10f6d0  csv_pus.zip

# Note 1: 
#   Where we found the data documentation. This 
#   is important to record as many data files don’t 
#   contain links back to the documentation. Census 
#   PUMS does in fact contain embedded documentation, 
#   but not every source is so careful. 

# Note 2: 
#   How we navigated from the documentation site 
#   to the actual data files. It may be necessary to 
#   record this if the data supplier requires any sort 
#   of click-through license to get to the actual 
#   data. 

# Note 3: 
#   The actual files we downloaded. 

# Note 4: 
#   The sizes of the files after we downloaded 
#   them. 

# Note 5: 
#   Cryptographic hashes of the file contents we 
#   downloaded. These are very short summaries (called 
#   hashes) that are very unlikely to have the same 
#   value for different files. These summaries can 
#   later help us determine if another researcher in 
#   our organization is using the same data 
#   distribution or not. 

