---
title: "R Notebook"
output: html_notebook
author: "Martin Shepperd"
date: 03/09/2020
---
# Make some charts from Data Scientist Need to Know Skills
This is R Notebook for Week1 of CS5702 (Modern Data).
As a simplification I only use base R, rather than ggplot.
First we need to reproduce the data, since the original article ["What are the In-Demand Skills for Data Scientists in 2020?"] (https://towardsdatascience.com/what-are-the-in-demand-skills-for-data-scientists-in-2020-a060b7f31b11), doesn't provide the raw observations. I extracted the percentages from the text / chart.
```{r}
# Reproduce the data
percent <- c(62,40,39,21,20,20,19,16,15,13)
skill <- c("Python","SQL","R","Spark","Cloud","AWS","Java","TensorFlow","Hadoop","SAS")
```
Now we have the data let's try to create some graphical representations.
```{r}
# Ugly visualisation of the data
pie(percent, main="Pie chart of in demand Data Scientist skills", labels=skill)
```
This pie chart has a number of disadvantages. Can you think what some of them might be?
As an alternative I now create a bar chart.
```{r}
# Creating a bar chart with Base R
barplot(percent,
main="Bar chart of in-demand Data Scientist skills (Jan 2020)",
sub = "Source: https://j.mp/2Ga8CD1",
ylab="Percentage of listings that mention skill",
names.arg=skill,
cex.names = 0.6, # Shrinks text size so fits on x-axis
col="darkolivegreen", # Colour for bar chart boxes
density=c(100, 50, 100, 50, 50, 50, 50, 50, 50, 50)) # Shades Python and R in a darker colour
```
Hopefully you consider this to be an improvement, but it's not perfect. Can you think of any further improvements or clarifications?