---
title: "R Notebook"
output: html_notebook
author: "Martin Shepperd"
date: 03/09/2020
---

# Make some charts of in-demand Data Scientist skills

This is the R Notebook for Week 1 of CS5702 (Modern Data).

For simplicity, I use only base R rather than ggplot2.

First we need to reproduce the data, because the original article, ["What are the In-Demand Skills for Data Scientists in 2020?"](https://towardsdatascience.com/what-are-the-in-demand-skills-for-data-scientists-in-2020-a060b7f31b11), doesn't provide the raw observations. I extracted the percentages from the article's text and chart.

```{r}
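# Reproduce the data

# Percentage of job listings that mention each skill (read from the article)
percent <- c(62,40,39,21,20,20,19,16,15,13)
# Skill names, in the same order as the percentages above
skill <- c("Python","SQL","R","Spark","Cloud","AWS","Java","TensorFlow","Hadoop","SAS")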
```
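The charts below pair each value with its label purely by position, so a missing or extra entry in either vector would silently mislabel the bars. A minimal sketch of one safeguard, assuming you are happy to keep both vectors in a data frame (the name `skills` is my own choice, not something from the article):

```{r}
# Values repeated here so the chunk also runs on its own
percent <- c(62,40,39,21,20,20,19,16,15,13)
skill <- c("Python","SQL","R","Spark","Cloud","AWS","Java","TensorFlow","Hadoop","SAS")

# Stop early if the two vectors have drifted out of step
stopifnot(length(percent) == length(skill))

# Hypothetical data frame keeping each skill next to its percentage
skills <- data.frame(skill = skill, percent = percent)
skills
```

Each row now holds one skill and its percentage, which makes it easy to spot a misalignment by eye.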

Now that we have the data, let's create some graphical representations.

```{r}
# Ugly visualisation of the data

pie(percent, main="Pie chart of in-demand Data Scientist skills", labels=skill)
```

This pie chart has a number of disadvantages.  Can you think what some of them might be?
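If you want a hint, one property worth checking is what the slices add up to, since a pie chart implies that its slices are parts of a single whole. A short sketch:

```{r}
# Values repeated here so the chunk also runs on its own
percent <- c(62,40,39,21,20,20,19,16,15,13)

# Total of the percentages; a true part-of-a-whole would give 100
total <- sum(percent)
total                                # 265

# Share of the pie that Python's slice actually occupies
round(100 * percent[1] / total, 1)   # 23.4
```

The total exceeds 100 because a single job listing can mention several skills, so Python, which appears in 62% of listings, is drawn as only about 23% of the pie.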

As an alternative, I now create a bar chart.

```{r}
# Creating a bar chart with Base R

barplot(percent,
        main = "Bar chart of in-demand Data Scientist skills (Jan 2020)",
        sub = "Source: https://j.mp/2Ga8CD1",
        ylab = "Percentage of listings that mention skill",
        names.arg = skill,
        cex.names = 0.6,         # Shrinks the skill labels so they fit on the x-axis
        col = "darkolivegreen",  # Colour of the bars' shading lines
        density = c(100, 50, 100, 50, 50, 50, 50, 50, 50, 50)) # Denser shading for Python (1st) and R (3rd), so they look darker
```

Hopefully you consider this to be an improvement, but it's not perfect. Can you think of any further improvements or clarifications?
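As one possible further improvement, here is a sketch of a horizontal bar chart with the bars sorted by value. The margin width and `xlim` values are my own assumptions rather than anything from the original chart; adjust them to taste.

```{r}
# Values repeated here so the chunk also runs on its own
percent <- c(62,40,39,21,20,20,19,16,15,13)
skill <- c("Python","SQL","R","Spark","Cloud","AWS","Java","TensorFlow","Hadoop","SAS")

# Sort from smallest to largest: with horiz = TRUE the first bar is drawn
# at the bottom, so the most in-demand skill ends up at the top
ord <- order(percent)

op <- par(mar = c(5, 7, 4, 2))   # Widen the left margin for the skill names
barplot(percent[ord],
        names.arg = skill[ord],
        horiz = TRUE,            # Horizontal bars leave room for full-size labels
        las = 1,                 # Print the axis labels horizontally
        xlim = c(0, 70),         # Assumed range, just above the largest value (62)
        xlab = "Percentage of listings that mention skill",
        main = "In-demand Data Scientist skills (Jan 2020)",
        sub = "Source: https://j.mp/2Ga8CD1",
        col = "darkolivegreen")
par(op)                          # Restore the previous graphics settings
```

Horizontal bars avoid having to shrink the labels with `cex.names`, and sorting lets the reader compare neighbouring skills directly.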