# Addendum: how many views for how many videos? 

While writing the above article, I was wondering what the most watched videos of 33C3 were. This information is not easily accessible, so let's extract it from the individual video pages.

First, let's find the links to all the individual talks.

In [1]:
import requests

In [2]:
r = requests.get('https://media.ccc.de/c/33c3')

In [3]:
r

<Response [200]>

Let's parse the result:

In [5]:
from bs4 import BeautifulSoup

In [10]:
soup = BeautifulSoup(r.text, 'html.parser')

Now, let's build the link list:

In [24]:
links = []
for item in soup.select("h3 > a"):
    links.append("https://media.ccc.de" + item.attrs['href'])

In [25]:
len(links)

144

In [26]:
links[0]

'https://media.ccc.de/v/33c3-8428-33c3_closing_ceremony'
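
String concatenation works here because the `href` values are root-relative paths (at least the first one is, judging by the output above). `urljoin` from the standard library covers that case and also copes with absolute URLs. A quick illustration, using the path implied by the first link:

```python
from urllib.parse import urljoin

base = 'https://media.ccc.de/c/33c3'
href = '/v/33c3-8428-33c3_closing_ceremony'  # path implied by links[0]

full = urljoin(base, href)
print(full)  # https://media.ccc.de/v/33c3-8428-33c3_closing_ceremony

# An already absolute URL passes through unchanged.
print(urljoin(base, 'https://example.org/x'))  # https://example.org/x
```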

Let's download the content of each link:

In [27]:
contents = {}
for link in links:
    r = requests.get(link)
    contents[link] = r
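
The loop above sends all 144 requests back to back and without a timeout. A slightly more polite variant could look like the sketch below. The helper name `fetch_all` and its `delay` and `timeout` parameters are my own assumptions, not part of the notebook; `get` is meant to be something like `requests.Session().get`, which reuses one connection.

```python
import time

def fetch_all(links, get, delay=0.5, timeout=10):
    """Download every link with `get`, pausing `delay` seconds between requests.

    `get` is any callable that accepts a URL and a `timeout` keyword,
    e.g. `requests.Session().get` (hypothetical helper, not from the notebook).
    """
    contents = {}
    for i, link in enumerate(links):
        if i > 0:
            time.sleep(delay)  # be gentle with the server
        contents[link] = get(link, timeout=timeout)
    return contents
```

With `requests` imported, `contents = fetch_all(links, requests.Session().get)` would be a drop-in replacement for the loop.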

Let's check that each request went fine:

In [42]:
all([item.status_code == 200 for item in contents.values()])

True
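
`all(...)` only says whether something failed, not what. Had the check returned `False`, a list of the offending links would be more useful. A minimal sketch, using stand-in response objects instead of real downloads (the `failed_links` name and the sample URLs are made up for illustration):

```python
from types import SimpleNamespace

def failed_links(contents):
    """Return the links whose response status is not 200."""
    return [link for link, response in contents.items()
            if response.status_code != 200]

# Stand-ins that mimic the `status_code` attribute of a requests response.
sample = {
    'https://example.org/ok': SimpleNamespace(status_code=200),
    'https://example.org/missing': SimpleNamespace(status_code=404),
}
print(failed_links(sample))  # ['https://example.org/missing']
```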

Now, let's extract the number of views from each webpage.

In [308]:
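def metadata(content):
    """Return the metadata of a talk, parsed from its page HTML."""
    soup = BeautifulSoup(content, 'html.parser')
    # The talk details live in a <ul class="metadata"> list.
    tag = soup.find('ul', class_='metadata')
    # The list items are assumed to always appear in this order.
    meta = dict(zip(["duration", "date", "views", "description"],
                    [item.text.strip() for item in tag.find_all('li')]))
    # Keep the schedule link as raw HTML so it stays clickable in the final table.
    meta['description'] = str(tag.find('a'))
    meta['views'] = int(meta['views'])
    meta['title'] = soup.title.text.replace('\n', '')
    return meta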

In [309]:
first = list(contents.values())[0].text

In [310]:
metadata(first)

{'date': '2016-12-28',
 'description': '<a href="https://fahrplan.events.ccc.de/congress/2016/Fahrplan/events/8293.html">\nfahrplan.events.ccc.de\n</a>',
 'duration': '61 min',
 'title': 'C3TV -Netzpolitik in Österreich',
 'views': 813}
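
Note that `duration` is still a string like `'61 min'`. Should you want to sort or sum by talk length, it has to become a number first. Here is one way to do that, assuming every duration follows the `<number> min` pattern seen in the output above (`parse_minutes` is a hypothetical helper, not part of the notebook):

```python
import re

def parse_minutes(duration):
    """Turn a string such as '61 min' into the integer 61."""
    match = re.fullmatch(r'\s*(\d+)\s*min\s*', duration)
    if match is None:
        # Fail loudly rather than guess at an unknown format.
        raise ValueError('unexpected duration format: {!r}'.format(duration))
    return int(match.group(1))

print(parse_minutes('61 min'))  # 61
```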

Let's build a dataframe with this data:

In [311]:
import pandas as pd

In [312]:
all_metadata = []
for link, response in contents.items():
    meta = metadata(response.text)
    meta['video_link'] = '<a href="{}">video link</a>'.format(link)
    all_metadata.append(meta)

In [313]:
df = pd.DataFrame(all_metadata)
df = df.set_index(df.pop('title'))

In [314]:
df.columns

Index(['date', 'description', 'duration', 'video_link', 'views'], dtype='object')

In [315]:
df['description'] = df.description.apply(lambda s: s.replace('\n', ''))

In [316]:
df.head()

title,date,description,duration,video_link,views
C3TV -Netzpolitik in Österreich,2016-12-28,"<a href=""https://fahrplan.events.ccc.de/congress/2016/Fahrplan/events/8293.html"">fahrplan.events.ccc.de</a>",61 min,"<a href=""https://media.ccc.de/v/33c3-8293-netzpolitik_in_osterreich"">video link</a>",813
C3TV -Predicting and Abusing WPA2/802.11 Group Keys,2016-12-27,"<a href=""https://fahrplan.events.ccc.de/congress/2016/Fahrplan/events/8195.html"">fahrplan.events.ccc.de</a>",60 min,"<a href=""https://media.ccc.de/v/33c3-8195-predicting_and_abusing_wpa2_802_11_group_keys"">video link</a>",708
C3TV -Formal Verification of Verilog HDL with Yosys-SMTBMC,2016-12-28,"<a href=""https://fahrplan.events.ccc.de/congress/2016/Fahrplan/events/7922.html"">fahrplan.events.ccc.de</a>",52 min,"<a href=""https://media.ccc.de/v/33c3-7922-formal_verification_of_verilog_hdl_with_yosys-smtbmc"">video link</a>",235
C3TV -The Zcash anonymous cryptocurrency,2016-12-30,"<a href=""https://fahrplan.events.ccc.de/congress/2016/Fahrplan/events/8330.html"">fahrplan.events.ccc.de</a>",31 min,"<a href=""https://media.ccc.de/v/33c3-8330-the_zcash_anonymous_cryptocurrency"">video link</a>",611
C3TV -The Global Assassination Grid,2016-12-27,"<a href=""https://fahrplan.events.ccc.de/congress/2016/Fahrplan/events/8425.html"">fahrplan.events.ccc.de</a>",63 min,"<a href=""https://media.ccc.de/v/33c3-8425-the_global_assassination_grid"">video link</a>",4498


Let's now render this as an HTML table, sorted by views:

In [317]:
pd.set_option('display.max_colwidth', None)  # show full cell contents; pandas < 1.0 used -1 here

In [318]:
from IPython.display import HTML

In [319]:
HTML(df.sort_values(by='views', ascending=False).to_html(escape=False))

title,date,description,duration,video_link,views
C3TV -Console Hacking 2016,2016-12-28,fahrplan.events.ccc.de,53 min,video link,21020
C3TV -Fnord-Jahresrückblick,2016-12-29,fahrplan.events.ccc.de,99 min,video link,14297
C3TV -Shut Up and Take My Money!,2016-12-27,fahrplan.events.ccc.de,30 min,video link,12451
C3TV -SpiegelMining – Reverse Engineering von Spiegel-Online,2016-12-29,fahrplan.events.ccc.de,58 min,video link,12162
C3TV -Where in the World Is Carmen Sandiego?,2016-12-28,fahrplan.events.ccc.de,59 min,video link,8630
C3TV -How Do I Crack Satellite and Cable Pay TV?,2016-12-27,fahrplan.events.ccc.de,62 min,video link,7880
C3TV -Copywrongs 2.0,2016-12-28,fahrplan.events.ccc.de,61 min,video link,7042
C3TV -Nintendo Hacking 2016,2016-12-27,fahrplan.events.ccc.de,61 min,video link,6073
C3TV -33C3 Opening Ceremony,2016-12-27,fahrplan.events.ccc.de,15 min,video link,6005
C3TV -Die Sprache der Populisten,2016-12-28,fahrplan.events.ccc.de,59 min,video link,5884
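
If only the top of the ranking is of interest, `DataFrame.nlargest` is equivalent to sorting in descending order and taking the first rows. A small sketch on three rows copied from the table above (the `sample` frame is only an illustration, not the full dataset):

```python
import pandas as pd

# Three rows copied from the table above, deliberately out of order.
sample = pd.DataFrame(
    {'views': [6005, 21020, 14297]},
    index=['33C3 Opening Ceremony', 'Console Hacking 2016', 'Fnord-Jahresrückblick'],
)

# Keep the two most viewed talks.
top = sample.nlargest(2, 'views')
print(top)
```

On the real data, `df.nlargest(10, 'views')` should give the ten rows shown above.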


So there we have it: all talks sorted by views at the time of writing! Have fun watching!