// Benjamin Skinner // LPO 9951: PhD Student Research Practicum // Fall 2015 // QUICK NOTE RE: variance estimates of weighted means // In the first lesson on sampling design, we discussed the use of inverse // probability weights to compute a more accurate estimate of a // population mean. We noted that in the class example, the estimate of // the standard error of the mean increased. Is this always the case? // Analytically, we would expect that our estimate of the standard error // of the weighted mean should increase since we use more degrees of // freedom in the process. We can also run simulations to empirically // show this to be the case. While simulations may be sensitive to the // data set and coding structure, they can be useful when investigating // questions such as these. // load fake SAT data global datadir "../data/" use ${datadir}fakesat, clear // store the population mean qui sum score scalar fullmean = `r(mean)' // three probabilities of reporting score // (1) more likely as score goes up (+) // (2) more likely as score goes down (-) // (3) random gen preport1 = score / 1000 + .1 * (score / 10000)^2 + rnormal(0, .025) gen preport2 = (1 - (score / 1000)) - .1 * (score / 10000)^2 + rnormal(0, .025) gen preport3 = runiform(0,1) // check first column of correlations cor score preport1 preport2 preport3 // generate inverse probability weights gen pw1 = 1 / preport1 gen pw2 = 1 / preport2 gen pw3 = 1 / preport3 // init blank matrix to fill; for each weight (3), storing: // unadjusted mean // unadjusted sem // weighted mean // weighted sem matrix storemat = J(100,12,.) // run simulations...will take a few minutes forvalues pw = 1/3 { // Looping through each type of pweight di "========================" di "Probability weight: pw`pw'" di "========================" // set j for matrix: needs to start at 1 and move up by 4 local j = 4 * `pw' - 3 // Monte Carlo simulations with selected weight forvalues i = 1/100 { // 100 Monte Carlo runs di "Monte Carlo run: `i'" preserve // sample 1% using selected probability of reporting quietly gsample 1 [w = preport`pw'], percent // get unadjusted mean/sem quietly mean score matrix est = r(table) matrix storemat[`i', `j'] = est[1,1] matrix storemat[`i', `j' + 1] = est[2,1] // get weighted mean/sem quietly mean score [pweight = pw`pw'] matrix est = r(table) matrix storemat[`i', `j' + 2] = est[1,1] matrix storemat[`i', `j' + 3] = est[2,1] restore } } // drop data drop score preport* pw* // label the column names of the matrix matrix colnames storemat = mean1 sem1 wmean1 wsem1 /// mean2 sem2 wmean2 wsem2 /// mean3 sem3 wmean3 wsem3 // convert the matrix to data in stata svmat storemat, names(col) // summarize all di fullmean sum *, sep(4) // COMMENTS // In all cases, we are able to estimate the true population mean // with greater accuracy when the inverse probability weights are // taken into account. Also note that in all cases, the standard // error of the mean increases when weights are used. This is due to // the variance approximation formula used by Stata. Lesson? Use weights // to estimate a more accurate mean, but know that you pay a penalty // in terms of your surety about the mean. exit