Overview

Consult the general homework guidelines.

Due sometime Friday 2015-10-16. I am open to negotiation if the lateness of this posting is creating hardship.

The goal is to write one (or more) custom functions that do something useful to pieces of the Gapminder data. Then use dplyr::do() to apply to all such pieces. Then use dplyr() and or ggplot2 to explore what you got back.

Remember the sampler concept. Your homework should serve as your own personal cheatsheet in the future for how to write a function and how to scale up its application with data aggregation machinery.

Gapminder data

Work with the Gapminder excerpt. If you really, really want to, you can explore a different dataset but get permission from Jenny.

Your mission, high-level

Write a function to compute something interesting on a piece of the Gapminder data. Make it something you can’t easily do with built-in functions. Make it something that’s not trivial to do with the simple dplyr verbs.

Use dplyr::do() to apply your function to all possible pieces of the Gapminder dataset and return the combined result.

Explore the results you get back. Use all your usual tricks, so I expect to see alot of dplyr and ggplot2 here.

Make observations about what your tables/figures show and about the process.

Inspiration for what to compute

Find countries with interesting stories. Remember this one from last week? You are even better equipped to tackle it now. Here are some ideas to get you thinking. Feel free to riff on them – I don’t expect rote implementation. Some of these ideas will impact the function you write AND the follow-up exploration.

Wear your statistical hat and characterize how well/poorly the simple linear model is working. Via residual analysis. Or by fitting something more complicated (but still appropriate for \(n\) = 12!) – like a degree 2 polynomial. Retain quantities that speak to goodness-of-fit and explore that across all 142 countries.

Do anything we’ve discussed so far but for a different combination of variables. How is GDP per capita changing over time? How about the relationship between GDP per capita and life expectancy?

Exploration of the results

Once you’ve found something interesting to compute and you’ve used dplyr::do() to enact the computation broadly, it’s vital that you digest and interpret the results.

This will probably mean some sorting, filtering, etc. All your dplyr skills will come in handy. There’s probably a couple of interesting tables to make.

Whenever possible, include a companion figure that adds context to the numbers and bolsters your comments. The figure does not have to depict exactly or only what the table does – it just needs to reinforce the connection back to the underlying data.

But I want to do more!

Do your main data aggregation task with dplyr::group_by() + do() AND plyr::ddply(). Reflect on the pros/cons of the two approaches.

Explore more functions in the broom package.

Explore plyr’s capabilities to work with vectors, multi-dimensional arrays, and lists. Get outside the safe little world of data.frames.

Take a look at purrr for functional programming more generally. Here’s a blog post by one of the authors on combining purrr and dplyr.

Report your process

You’re encouraged to reflect on what was hard/easy, problems you solved, helpful tutorials you read, etc. Give credit to your sources, whether it’s a blog post, a fellow student, an online tutorial, etc.

Submit the assignment

Follow instructions on How to submit homework

Rubric

Start using our general rubric for specifics to evaluate! The form will require you to do so!

Check minus: One or more problems such as … Student’s custom function was exactly what we did in class or a very modest extension. Exploration of the data aggregation results is missing or minimal. Student missed clear opportunities to complement numbers with a figure. Technical problem(s) that are relatively easy to fix. Repository organization – or lack thereof – leaves work for the reader, in terms of finding the necessary files.

Check: Hits all the elements. No obvious mistakes. Pleasant to read. No heroic detective work required. Solid.

Check plus: Exceeded the requirements in number of dimensions. Developed novel tasks that were indeed interesting and “worked”. Impressive use of dplyr, plyr, broom, and/or ggplot2. Impeccable organization of repo and report. You learned something new from reviewing their work and you’re eager to incorporate it into your work.