# Visualising In the Spotlight Data Over Time

In this notebook we will produce some visualisations of [*In the Spotlight*](https://www.libcrowds.com/collection/playbills) performance data over time to see if we can begin to identify any trends.

As we move into more complicated territory, we won't explain every function in detail, but most readers should still be able to follow the overall approach.

We will again use pandas and plotly as our core Python libraries, both of which were introduced in previous notebooks.

In [92]:
import pandas
import plotly

## The dataset

Our input will again be the dataframe of performance data introduced in a [previous notebook](intro_to_analysing_its_data_using_python.ipynb). The dataframe is loaded in the code block below.

In [93]:
import os
import sys
module_path = os.path.abspath(os.path.join('..', 'data', 'scripts'))
if module_path not in sys.path:
    sys.path.append(module_path)
from get_its_performances import get_performances_df
df = get_performances_df()

# Sets plotly to offline mode
plotly.offline.init_notebook_mode()

As a reminder of how this dataframe looks, we can call its `head()` method.

In [94]:
df.head()

Unnamed: 0,title,date,genre,link,theatre,city,source
0,Pageantry,,,http://access.bl.uk/item/viewer/ark:/81055/vdc...,"Theatre Royal, Margate",Margate,https://api.bl.uk/metadata/iiif/ark:/81055/vdc...
1,The Hypocrite,,Comedy,http://access.bl.uk/item/viewer/ark:/81055/vdc...,"Theatre Royal, Margate",Margate,https://api.bl.uk/metadata/iiif/ark:/81055/vdc...
2,The Padlock,,Musical Farce,http://access.bl.uk/item/viewer/ark:/81055/vdc...,"Theatre Royal, Margate",Margate,https://api.bl.uk/metadata/iiif/ark:/81055/vdc...
3,The Village Lawyer,,Farce,http://access.bl.uk/item/viewer/ark:/81055/vdc...,"Theatre Royal, Margate",Margate,https://api.bl.uk/metadata/iiif/ark:/81055/vdc...
4,Death of Gen. Wolfe,,Ballet,http://access.bl.uk/item/viewer/ark:/81055/vdc...,"Theatre Royal, Margate",Margate,https://api.bl.uk/metadata/iiif/ark:/81055/vdc...


## Adding days, months and years to the dataframe

As we begin looking at our date information more closely, it will be useful to add separate columns for day, month and year to our dataframe so that we can plot other entities against these values.

We also need to remove any rows that do not contain a date, or that contain an incomplete date, as is the case for many of the playbills. The following line of code checks each value in the date column against a regular expression and keeps only the rows that match the `YYYY-MM-DD` pattern of a complete date. Missing values are treated as non-matching (`na=False`).

In [95]:
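# Raw string avoids invalid-escape warnings; .copy() lets us add columns to the result safely
df = df[df.date.str.contains(r'\d{4}-\d{2}-\d{2}', na=False)].copy()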
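
To illustrate what this filter keeps and discards, here is a minimal sketch on a small made-up dataframe. The titles and date values are hypothetical and are not taken from the playbills data.

```python
import pandas

# Hypothetical sample values, not taken from the playbills dataset
sample = pandas.DataFrame({
    'title': ['A', 'B', 'C', 'D'],
    'date': ['1829-04-30', '1829-04', None, '1830'],
})

# Keep only rows whose date contains a full YYYY-MM-DD pattern;
# na=False makes missing values count as non-matching
pattern = r'\d{4}-\d{2}-\d{2}'
complete = sample[sample['date'].str.contains(pattern, na=False)]

print(complete['title'].tolist())
```

Only the first row has a complete date, so it is the only one that survives the filter.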

The date column is then converted to a date type.

In [96]:
df['date'] = pandas.to_datetime(df['date'])
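
As an aside, a possible alternative to filtering with a regular expression first is to let `pandas.to_datetime` do the validation. The sketch below assumes the dates are in `YYYY-MM-DD` form and uses made-up values: anything that does not match the format becomes `NaT` and can then be dropped.

```python
import pandas

# Hypothetical raw values, including incomplete and missing dates
raw = pandas.Series(['1829-04-30', '1829-04', None, '1830-11-23'])

# errors='coerce' turns values that do not match the format into NaT
parsed = pandas.to_datetime(raw, format='%Y-%m-%d', errors='coerce')

# Drop the rows that could not be parsed
valid = parsed.dropna()

print(valid.dt.year.tolist())
```

This is not the approach used in this notebook, but it may be convenient when the date strings are less predictable.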

We are now ready to create our additional columns.

In [97]:
df['day'] = df['date'].dt.strftime('%d').astype('int32')
df['month'] = df['date'].dt.strftime('%m').astype('int32')
df['year'] = df['date'].dt.strftime('%Y').astype('int32')

In [98]:
df.head()

Unnamed: 0,title,date,genre,link,theatre,city,source,day,month,year
194,"Wandering Boys: Or, the Castle of Olival",1829-04-30,,http://access.bl.uk/item/viewer/ark:/81055/vdc...,Miscellaneous Plymouth theatres,Plymouth,https://api.bl.uk/metadata/iiif/ark:/81055/vdc...,30,4,1829
198,High Life Below Stairs,1828-04-10,Farce,http://access.bl.uk/item/viewer/ark:/81055/vdc...,Miscellaneous Plymouth theatres,Plymouth,https://api.bl.uk/metadata/iiif/ark:/81055/vdc...,10,4,1828
202,Jack Robinson and His Monkey,1829-01-30,,http://access.bl.uk/item/viewer/ark:/81055/vdc...,Miscellaneous Plymouth theatres,Plymouth,https://api.bl.uk/metadata/iiif/ark:/81055/vdc...,30,1,1829
205,Invincibles; Ou Les Femmes Soldats,1829-03-05,,http://access.bl.uk/item/viewer/ark:/81055/vdc...,Miscellaneous Plymouth theatres,Plymouth,https://api.bl.uk/metadata/iiif/ark:/81055/vdc...,5,3,1829
208,Devil to Pay,1830-11-23,Farce,http://access.bl.uk/item/viewer/ark:/81055/vdc...,Miscellaneous Plymouth theatres,Plymouth,https://api.bl.uk/metadata/iiif/ark:/81055/vdc...,23,11,1830
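
The same columns could probably be derived more directly with the `.dt` accessor attributes `day`, `month` and `year`, which return integers without a round trip through strings. The following sketch, using three sample dates, checks that both approaches give the same values.

```python
import pandas

# Three sample dates in the same format as the date column
dates = pandas.to_datetime(
    pandas.Series(['1829-04-30', '1828-04-10', '1830-11-23'])
)

# Approach used above: format as text, then convert to integers
via_strftime = dates.dt.strftime('%m').astype('int32')

# Alternative: read the integer component directly
via_accessor = dates.dt.month

print(via_strftime.tolist() == via_accessor.tolist())
```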


## Plotting most popular periods

We can now identify the days, months or years in which most plays were performed. The following code block plots a chart of plays performed by month of the year.

In [99]:
# Column to count by: 'day', 'month' or 'year'
date_part = 'month'
# Count performances per value, ordered by date part rather than by frequency
series = df[date_part].value_counts().sort_index()
trace = plotly.graph_objs.Scatter(x=series.index, y=series)
fig = plotly.graph_objs.Figure(data=[trace])
plotly.offline.iplot(fig)

We can see that there appears to be a trend towards fewer performances during the middle of the year. However, with a relatively small dataset we should be careful about drawing any conclusions just yet (trends will become clearer as more data is collected).

Similar charts for the day or year can be produced by modifying the `date_part` variable above.
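
One way to make this easier is to wrap the counting step in a small helper. The sketch below is only illustrative: `count_by_date_part` is a hypothetical name and the sample values are made up, standing in for the `month` and `year` columns created earlier.

```python
import pandas


def count_by_date_part(frame, date_part):
    """Count rows per value of a date-part column, ordered by that value."""
    return frame[date_part].value_counts().sort_index()


# Hypothetical sample standing in for the performances dataframe
sample = pandas.DataFrame({
    'month': [4, 4, 1, 3, 11],
    'year': [1829, 1828, 1829, 1829, 1830],
})

for part in ('month', 'year'):
    counts = count_by_date_part(sample, part)
    print(part, counts.to_dict())
```

The returned series can be passed to `plotly.graph_objs.Scatter` in the same way as in the plotting cell, with the index as `x` and the counts as `y`.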

## Summary

Work in progress!