Due before class May 14th.
hw07 repository
Go here to fork the repo for homework 07.
Last week you estimated statistical learning models predicting survival and death on the Titanic. Take the model specification for your best-performing model from the Titanic problem, and estimate it using at least three machine learning algorithms in sparklyr (either MLlib or H2O). Calculate the accuracy and AUC metrics for each model. Which algorithm performs the best?
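As a reminder of what the two metrics measure, here is a minimal base R sketch. The `actual` and `prob` vectors are made-up test-set values, and `accuracy()` and `auc()` are hypothetical helper names, not sparklyr or H2O functions. The AUC is computed with the rank-based (Mann-Whitney) formula, which is one of several equivalent ways to obtain it.

```r
# Made-up test-set outcomes (1 = survived) and predicted survival probabilities
actual <- c(1, 0, 1, 1, 0, 0, 1, 0)
prob   <- c(0.9, 0.2, 0.6, 0.4, 0.5, 0.1, 0.8, 0.3)

# Accuracy: share of observations classified correctly at a given threshold
accuracy <- function(actual, prob, threshold = 0.5) {
  mean(as.integer(prob >= threshold) == actual)
}

# AUC: probability that a randomly chosen positive case is ranked above a
# randomly chosen negative case (rank() assigns average ranks to ties)
auc <- function(actual, prob) {
  r <- rank(prob)
  n_pos <- sum(actual == 1)
  n_neg <- sum(actual == 0)
  (sum(r[actual == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

accuracy(actual, prob)
auc(actual, prob)
```

Both metrics should be calculated on held-out data (a test set or cross-validation folds), not on the data used to fit the model.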
If you use an ml_() function, you need to split the data into training/test sets.
If you use an h2o.() function, you need to either split the data into training/test sets or use \(k\)-fold CV.
Build a map.
Yes.
I know. But at this point you should be able to rise to the occasion.
Think of data you’ve seen in the past that you think would make for a good geospatial visualization. That means it needs to include both a geographic component and some additional data to overlay on top of the geography.
As for drawing the geographic boundaries, that depends on what you want to map. Remember the maps package has some basic boundaries for major geographic units. Run help(package = "maps") to get a list of geographic databases contained in maps. Also remember the fiftystater package is useful for mapping state-level variables within the United States. If you need to obtain shapefiles for more specific regions, Google is a great starting point. If you need help finding a relevant shapefile, feel free to post on the issues page to get help from the instructional staff/peers.
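For instance, boundaries from the maps package can be pulled into a data frame and drawn with ggplot2. This is only a sketch: it assumes the ggplot2 and maps packages are installed, uses the "state" database as an arbitrary example, and the object names `states` and `boundary_map` are placeholders.

```r
library(ggplot2)  # map_data() requires the maps package to be installed

# Convert the "state" database from maps into a data frame of polygon vertices
states <- map_data("state")

# Draw each polygon; group keeps the vertices of separate shapes from joining
boundary_map <- ggplot(states, aes(x = long, y = lat, group = group)) +
  geom_polygon(fill = "grey90", color = "grey40") +
  coord_quickmap() +  # approximate projection that preserves the aspect ratio
  theme_void()

boundary_map
```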
Once you have your geographic boundaries data (either from an R package or imported from a shapefile), combine it with the substantive data you wish to visualize. Make sure the graph is presentable; that is, make it look like a nice map. Things to consider include (but are not limited to):
Along with the map itself, write a brief description (250-500 words) of the map. Summarize the information being depicted and explain any major visual design choices (e.g. why this color palette, why split the continuous variable into XYZ intervals rather than ABC intervals).
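To make those design choices concrete, here is a hedged sketch of a choropleth that joins boundary data to a substantive variable, bins the variable into intervals, and applies a sequential palette. It uses the built-in USArrests data purely as a stand-in for your own data. The break points and the "Reds" palette are arbitrary illustrations, not recommendations, and the object names are placeholders.

```r
library(ggplot2)  # map_data() requires the maps package to be installed

# Substantive data: USArrests stores state names as row names
arrests <- data.frame(
  region = tolower(rownames(USArrests)),  # match the lowercase names in map_data()
  murder = USArrests$Murder
)

# Boundary data joined to the substantive data by state name
states <- map_data("state")
map_df <- merge(states, arrests, by = "region", all.x = TRUE)
map_df <- map_df[order(map_df$order), ]  # merge() reorders rows; restore drawing order

# Bin the continuous variable into intervals (break points chosen arbitrarily)
map_df$murder_bin <- cut(map_df$murder, breaks = c(0, 4, 8, 12, 18))

choropleth <- ggplot(map_df, aes(x = long, y = lat, group = group, fill = murder_bin)) +
  geom_polygon(color = "white") +
  scale_fill_brewer(palette = "Reds", na.value = "grey80",
                    name = "Murder arrests\nper 100,000") +
  coord_quickmap() +
  theme_void()

choropleth
```

Regions without a match in the substantive data (here, the District of Columbia) are drawn in grey, which is itself a design choice worth mentioning in the description.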
Remember to make your assignment reproducible. If you get a shapefile from the internet, either include it in your repo or make sure your R Markdown document/R script includes a function to download it from the internet.
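One possible way to script the download is sketched below. `fetch_shapefile()` is a hypothetical helper, not part of any package, and the URL in the commented usage line is a placeholder. The sketch assumes the shapefile is distributed as a zip archive and skips the download if a .shp file is already present.

```r
# Download and unzip a shapefile archive unless it is already on disk.
# Returns the path(s) to the .shp file(s) found in dest_dir.
fetch_shapefile <- function(url, dest_dir = "data/shapefile") {
  find_shp <- function() {
    list.files(dest_dir, pattern = "\\.shp$", full.names = TRUE, recursive = TRUE)
  }
  shp <- find_shp()
  if (length(shp) == 0) {
    dir.create(dest_dir, recursive = TRUE, showWarnings = FALSE)
    zip_path <- tempfile(fileext = ".zip")
    download.file(url, destfile = zip_path, mode = "wb")  # binary mode for zip files
    unzip(zip_path, exdir = dest_dir)
    shp <- find_shp()
  }
  shp
}

# Usage (placeholder URL, replace with the real source of your shapefile):
# shp_path <- fetch_shapefile("https://example.com/boundaries.zip")
```

Checking for an existing file first means the document can be re-knit without downloading the archive again.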
Your assignment should be submitted as one or more R Markdown documents, data files, figures, etc. Follow instructions on homework workflow. As part of the pull request, you’re encouraged to reflect on what was hard/easy, problems you solved, helpful tutorials you read, etc.
Check minus: Cannot get code to run or is poorly documented. No documentation in the README file. Severe misinterpretations of the results. Overall a shoddy or incomplete assignment. Map looks amateurish or hard to interpret.
Check: Solid effort. Hits all the elements. No clear mistakes. Easy to follow (both the code and the output). Nothing spectacular, either bad or good.
Check plus: Interpretation is clear and in-depth. Accurately interprets the results, with appropriate caveats for what the technique can and cannot do. Code is reproducible. Writes a user-friendly README file. Graph looks crisp, easy-to-read, and communicates information honestly and accurately.
This work is licensed under the CC BY-NC 4.0 Creative Commons License.