{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Data extraction example\n", "\n", "This notebook uses a digital collection described in MARCXML files, which contain descriptive metadata from the [Moving Image Archive](https://data.nls.uk/data/metadata-collections/moving-image-archive/) catalogue of the National Library of Scotland." ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "### Importing the libraries" ] },
{ "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "# https://pypi.org/project/pymarc/\n", "import pymarc, re, csv\n", "import pandas as pd\n", "from pymarc import parse_xml_to_array" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "### Generating a CSV file with the content extracted from the original files" ] },
{ "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [
"# newline='' avoids blank rows on some platforms; UTF-8 matches the pd.read_csv default\n",
"with open('registros_marc.csv', 'w', newline='', encoding='utf-8') as csv_fichero:\n",
"    csv_salida = csv.writer(csv_fichero, delimiter = ',', quotechar = '\"', quoting = csv.QUOTE_MINIMAL)\n",
"    csv_salida.writerow(['titulo', 'autor', 'lugar_produccion', 'fecha', 'extension', 'creditos', 'materias', 'resumen', 'detalles', 'enlace'])\n",
"\n",
"    registros = parse_xml_to_array(open('Moving-Image-Archive/Moving-Image-Archive-dataset-MARC.xml'))\n",
"\n",
"    for registro in registros:\n",
"\n",
"        titulo = autor = lugar_produccion = fecha = extension = creditos = materias = resumen = detalles = enlace = ''\n",
"\n",
"        # title (245 $a, plus $b when present)\n",
"        if registro['245'] is not None:\n",
"            titulo = registro['245']['a']\n",
"            if registro['245']['b'] is not None:\n",
"                titulo = titulo + \" \" + registro['245']['b']\n",
"\n",
"        # author: first of 100, 110, 700, 710 that is present\n",
"        if registro['100'] is not None:\n",
"            autor = registro['100']['a']\n",
"        elif registro['110'] is not None:\n",
"            autor = registro['110']['a']\n",
"        elif registro['700'] is not None:\n",
"            autor = registro['700']['a']\n",
"        elif registro['710'] is not None:\n",
"            autor = registro['710']['a']\n",
"\n",
"        # place of production\n",
"        if registro['264'] is not None:\n",
"            lugar_produccion = registro['264']['a']\n",
"\n",
"        # date (264 $c, without the trailing full stop)\n",
"        for f in registro.get_fields('264'):\n",
"            fechas = f.get_subfields('c')\n",
"            if len(fechas):\n",
"                fecha = fechas[0]\n",
"                if fecha.endswith('.'): fecha = fecha[:-1]\n",
"\n",
"        # Physical Description - extent (300 $a) and other details (300 $b)\n",
"        for f in registro.get_fields('300'):\n",
"            # Keep the default '' when a subfield is missing instead of writing an empty list\n",
"            extensiones = f.get_subfields('a')\n",
"            if len(extensiones):\n",
"                extension = extensiones[0]\n",
"                # TODO cleaning\n",
"            lista_detalles = f.get_subfields('b')\n",
"            if len(lista_detalles):\n",
"                detalles = lista_detalles[0]\n",
"\n",
"        # credits\n",
"        if registro['508'] is not None:\n",
"            creditos = registro['508']['a']\n",
"\n",
"        # summary\n",
"        if registro['520'] is not None:\n",
"            resumen = registro['520']['a']\n",
"\n",
"        # subjects: join every 653 $a with ' -- '\n",
"        if registro['653'] is not None:\n",
"            materias = ''\n",
"            for f in registro.get_fields('653'):\n",
"                materias += f.get_subfields('a')[0] + ' -- '\n",
"            materias = re.sub(' -- $', '', materias)\n",
"\n",
"        # link\n",
"        if registro['856'] is not None:\n",
"            enlace = registro['856']['u']\n",
"\n",
"        csv_salida.writerow([titulo,autor,lugar_produccion,fecha,extension,creditos,materias,resumen,detalles,enlace])" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## Reading the CSV file" ] },
{ "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "# Load the contents of the file into a pandas DataFrame\n", "df = pd.read_csv('registros_marc.csv')" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## Inspecting the content" ] },
{ "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n", "<table>\n", "<thead>\n",
"<tr><th></th><th>titulo</th><th>autor</th><th>lugar_produccion</th><th>fecha</th><th>extension</th><th>creditos</th><th>materias</th><th>resumen</th><th>detalles</th><th>enlace</th></tr>\n",
"</thead>\n", "<tbody>\n",
"<tr><th>0</th><td>(GLASGOW TRAMS AND BOTANIC GARDENS).</td><td>RUSSELL, Stanley Livingstone</td><td>[Place of production not identified] :</td><td>1950.0</td><td>(2.00 mins) :</td><td>Director, [filmed by Stanley L. Russell, Thame...</td><td>Bus Stations and Depots -- Buses and Coaches, ...</td><td>The Botanic Gardens, Glasgow with shots of the...</td><td>mute, colour</td><td>http://movingimage.nls.uk/film/0001</td></tr>\n",
"<tr><th>1</th><td>(LAST DAY OF THE TRAMS, GLASGOW).</td><td>NaN</td><td>[Place of production not identified] :</td><td>1962.0</td><td>(28.00 mins) :</td><td>Director, [filmed by SAAC].</td><td>Transport -- Glasgow -- documentary -- amateur</td><td>Footage of the last trams to run in Glasgow, a...</td><td>silent, colour</td><td>http://movingimage.nls.uk/film/0002</td></tr>\n",
"<tr><th>2</th><td>INTO THE MISTS.</td><td>NaN</td><td>[Place of production not identified] :</td><td>1956.0</td><td>(10.04 mins) :</td><td>Director, [filmed by W.S. Dobson].</td><td>Ceremonies -- Emotions, Attitudes and Behaviou...</td><td>The story of the last Edinburgh tram. Shots o...</td><td>silent, colour</td><td>http://movingimage.nls.uk/film/0004</td></tr>\n",
"<tr><th>3</th><td>PASSING OF THE TRAMCAR, the.</td><td>NaN</td><td>[Place of production not identified] :</td><td>1962.0</td><td>(63.36 mins) :</td><td>NaN</td><td>Ceremonies -- Transport -- Glasgow</td><td>Footage of the last tram to run in Glasgow. Th...</td><td>silent, colour</td><td>http://movingimage.nls.uk/film/0005</td></tr>\n",
"<tr><th>4</th><td>SCOTS OF TOMORROW.</td><td>Campbell Harper Productions</td><td>[Place of production not identified] :</td><td>1959.0</td><td>(13.00 mins) :</td><td>Producer, Campbell Harper Films Ltd..</td><td>Art and Artists, general -- Education -- edu...</td><td>Scottish school pupils studying scientific and...</td><td>sound, black and white</td><td>http://movingimage.nls.uk/film/0007</td></tr>\n",
"<tr><th>...</th><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td></tr>\n",
"<tr><th>6012</th><td>CITY OF BIRMINGHAM .</td><td>NaN</td><td>[Place of production not identified] :</td><td>1948.0</td><td>(6.11 mins) :</td><td>NaN</td><td>Ceremonies -- Construction and Engineering -- ...</td><td>Built and engined by John Brown &amp; Co. Ltd. S...</td><td>silent, colour</td><td>http://movingimage.nls.uk/film/UCS0195</td></tr>\n",
"<tr><th>6013</th><td>BUILDING THE BIG DREDGE - STAGE 1.</td><td>NaN</td><td>[Place of production not identified] :</td><td>1964.0</td><td>(8min20sec) :</td><td>Producer, Stephen Group Film Unit.</td><td>Construction and Engineering -- Ships and Ship...</td><td>Shots of Indonesian Sea Dredge No. 1, under co...</td><td>silent, colour</td><td>http://movingimage.nls.uk/film/UCS0204</td></tr>\n",
"<tr><th>6014</th><td>ALEXANDER STEPHEN'S YARD.</td><td>NaN</td><td>[Place of production not identified] :</td><td>1964.0</td><td>(11.57 mins) :</td><td>Producer, .</td><td>Employment, Industry and Industrial Relations ...</td><td>Shots of the Alexander Stephen's yard, and the...</td><td>silent, colour</td><td>http://movingimage.nls.uk/film/UCS0207</td></tr>\n",
"<tr><th>6015</th><td>QUEEN ELIZABETH Ship No. 552.</td><td>NaN</td><td>[Place of production not identified] :</td><td>1940.0</td><td>(5min24sec) :</td><td>NaN</td><td>Employment, Industry and Industrial Relations ...</td><td>Built and engineered by John Brown &amp; Co. Ltd. ...</td><td>silent, black and white</td><td>http://movingimage.nls.uk/film/UCS0213</td></tr>\n",
"<tr><th>6016</th><td>RUAHINE.</td><td>NaN</td><td>[Place of production not identified] :</td><td>1951.0</td><td>(12.26 mins) :</td><td>NaN</td><td>Carriages -- Ceremonies -- Ships and Shipping ...</td><td>Footage of \"Ruahine\" ship being launched and t...</td><td>silent, black and white/colour</td><td>http://movingimage.nls.uk/film/UCS0214</td></tr>\n",
"</tbody>\n", "</table>\n", "<p>6017 rows × 10 columns</p>\n", "</div>
" ], "text/plain": [ " titulo autor \\\n", "0 (GLASGOW TRAMS AND BOTANIC GARDENS). RUSSELL, Stanley Livingstone \n", "1 (LAST DAY OF THE TRAMS, GLASGOW). NaN \n", "2 INTO THE MISTS. NaN \n", "3 PASSING OF THE TRAMCAR, the. NaN \n", "4 SCOTS OF TOMORROW. Campbell Harper Productions \n", "... ... ... \n", "6012 CITY OF BIRMINGHAM . NaN \n", "6013 BUILDING THE BIG DREDGE - STAGE 1. NaN \n", "6014 ALEXANDER STEPHEN'S YARD. NaN \n", "6015 QUEEN ELIZABETH Ship No. 552. NaN \n", "6016 RUAHINE. NaN \n", "\n", " lugar_produccion fecha extension \\\n", "0 [Place of production not identified] : 1950.0 (2.00 mins) : \n", "1 [Place of production not identified] : 1962.0 (28.00 mins) : \n", "2 [Place of production not identified] : 1956.0 (10.04 mins) : \n", "3 [Place of production not identified] : 1962.0 (63.36 mins) : \n", "4 [Place of production not identified] : 1959.0 (13.00 mins) : \n", "... ... ... ... \n", "6012 [Place of production not identified] : 1948.0 (6.11 mins) : \n", "6013 [Place of production not identified] : 1964.0 (8min20sec) : \n", "6014 [Place of production not identified] : 1964.0 (11.57 mins) : \n", "6015 [Place of production not identified] : 1940.0 (5min24sec) : \n", "6016 [Place of production not identified] : 1951.0 (12.26 mins) : \n", "\n", " creditos \\\n", "0 Director, [filmed by Stanley L. Russell, Thame... \n", "1 Director, [filmed by SAAC]. \n", "2 Director, [filmed by W.S. Dobson]. \n", "3 NaN \n", "4 Producer, Campbell Harper Films Ltd.. \n", "... ... \n", "6012 NaN \n", "6013 Producer, Stephen Group Film Unit. \n", "6014 Producer, . \n", "6015 NaN \n", "6016 NaN \n", "\n", " materias \\\n", "0 Bus Stations and Depots -- Buses and Coaches, ... \n", "1 Transport -- Glasgow -- documentary -- amateur \n", "2 Ceremonies -- Emotions, Attitudes and Behaviou... \n", "3 Ceremonies -- Transport -- Glasgow \n", "4 Art and Artists, general -- Education -- edu... \n", "... ... \n", "6012 Ceremonies -- Construction and Engineering -- ... 
\n", "6013 Construction and Engineering -- Ships and Ship... \n", "6014 Employment, Industry and Industrial Relations ... \n", "6015 Employment, Industry and Industrial Relations ... \n", "6016 Carriages -- Ceremonies -- Ships and Shipping ... \n", "\n", " resumen \\\n", "0 The Botanic Gardens, Glasgow with shots of the... \n", "1 Footage of the last trams to run in Glasgow, a... \n", "2 The story of the last Edinburgh tram. Shots o... \n", "3 Footage of the last tram to run in Glasgow. Th... \n", "4 Scottish school pupils studying scientific and... \n", "... ... \n", "6012 Built and engined by John Brown & Co. Ltd. S... \n", "6013 Shots of Indonesian Sea Dredge No. 1, under co... \n", "6014 Shots of the Alexander Stephen's yard, and the... \n", "6015 Built and engineered by John Brown & Co. Ltd. ... \n", "6016 Footage of \"Ruahine\" ship being launched and t... \n", "\n", " detalles enlace \n", "0 mute, colour http://movingimage.nls.uk/film/0001 \n", "1 silent, colour http://movingimage.nls.uk/film/0002 \n", "2 silent, colour http://movingimage.nls.uk/film/0004 \n", "3 silent, colour http://movingimage.nls.uk/film/0005 \n", "4 sound, black and white http://movingimage.nls.uk/film/0007 \n", "... ... ... 
\n", "6012 silent, colour http://movingimage.nls.uk/film/UCS0195 \n", "6013 silent, colour http://movingimage.nls.uk/film/UCS0204 \n", "6014 silent, colour http://movingimage.nls.uk/film/UCS0207 \n", "6015 silent, black and white http://movingimage.nls.uk/film/UCS0213 \n", "6016 silent, black and white/colour http://movingimage.nls.uk/film/UCS0214 \n", "\n", "[6017 rows x 10 columns]" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Inspecting the columns" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Index(['titulo', 'autor', 'lugar_produccion', 'fecha', 'extension', 'creditos',\n", " 'materias', 'resumen', 'detalles', 'enlace'],\n", " dtype='object')" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.columns" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### How many records are there?" 
] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "6017" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(df)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Exploring the subjects\n", "### We build a list of subjects and sort it alphabetically" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'Ceremonies -- Emotions, Attitudes and Behaviour -- Local Government -- Transport -- Edinburgh -- amateur'" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['materias'][2]" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 0 Bus Stations and Depots \n", " 1 Buses and Coaches, general \n", " 2 Celebrations, Traditions and Customs \n", " 3 Children and Infants \n", " 4 Leisure and Recreation \n", " ... \n", "6016 0 Carriages \n", " 1 Ceremonies \n", " 2 Ships and Shipping \n", " 3 Dunbartonshire \n", " 4 technical\n", "Length: 23742, dtype: object" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['materias'].str.split('--', expand=True).stack()" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Aberdeen\n", "Aberdeenshire\n", "advertising\n", "Agriculture\n", "Air displays and shows\n", "Air Raids\n", "Aircraft see also Helicopters\n", "Airports\n", "amateur\n", "Angus\n", "Animals\n", "animation\n", "Architecture and Buildings\n", "Argyllshire\n", "Art and Artists, general \n", "Arts and Crafts\n", "Ayrshire\n", "Banff\n", "Berwickshire\n", "biographical\n", "Birds\n", "Borders\n", "British Empire, the\n", "Broadcasting, general\n", "Buddhism\n", "Bulldozers\n", "Bus Stations and Depots\n", "Buses and Coaches, general\n", "Butchers and Butcher Shops\n", 
"Bute\n", "Cafeterias and Canteens\n", "Caithness\n", "Camping\n", "Canals\n", "Canoeing\n", "Carriages\n", "Celebrations, Traditions and Customs\n", "Celts and Celtic Culture\n", "Ceremonies\n", "Cheese and Cheese Making\n", "Children and Infants\n", "children's\n", "Christmas see also New Year\n", "cine mag\n", "Clackmannanshire\n", "comedy\n", "Construction and Engineering\n", "crime\n", "Crime, Punishment and Law Enforcement\n", "dance\n", "Dentistry\n", "Depression, the\n", "Disillusionment\n", "documentary\n", "Dumfriesshire\n", "Dunbartonshire\n", "Dundee\n", "East Lothian\n", "Easter\n", "Edinburgh\n", "Education\n", "educational\n", "Emotions, Attitudes and Behaviour\n", "Employment, Industry and Industrial Relations\n", "Environment\n", "ethnographic\n", "experimental\n", "fantasy\n", "Ferries\n", "Fife\n", "Fire Service\n", "Fish and Fishing\n", "Fish Gutting\n", "Fish Markets\n", "Fishing Boats\n", "Fishwives\n", "Food and Drink\n", "Forth River\n", "Glasgow\n", "Gorbals, the\n", "Healthcare\n", "Highland Games\n", "Highlands, the\n", "historical\n", "Hogmanay\n", "Holiday Camps\n", "Home Guard\n", "Home Life\n", "home movies and videos\n", "horror\n", "Housing and Living Conditions\n", "industrial\n", "Inner Hebrides\n", "Institutional Care\n", "instructional\n", "Invernesshire\n", "Kincardineshire\n", "Kinrosshire\n", "Kirkudbrightshire\n", "Lanarkshire\n", "Landscapes and Seascapes\n", "Leisure and Recreation\n", "Lifeboats\n", "Lobster Fishing\n", "Local Government\n", "local topical\n", "Loch Ness Monster, the\n", "Media, Communication and the Creative Industries\n", "medical\n", "Midlothian\n", "Military, the\n", "Morayshire\n", "Music\n", "music\n", "Music Hall\n", "music video\n", "Nairn\n", "newsreel\n", "Orkney Islands\n", "Outer Hebrides\n", "Paddle Steamers\n", "parody\n", "Peat and Peat Cutting\n", "Peebles- shire\n", "Perth\n", "Politics\n", "Power Resources\n", "promotional\n", "propaganda\n", "public information\n", "Religion\n", 
"religion\n", "Renfrewshire\n", "Reptiles\n", "Reservoirs\n", "Residential Homes for the Elderly\n", "Restaurants\n", "Revenge\n", "Riding of the Marches\n", "Rodents\n", "romance\n", "Ross-shire \n", "Roxburghshire \n", "Royalty\n", "Science and Technology\n", "science fiction\n", "scientific\n", "Selkirkshire\n", "Shetland Islands\n", "Ships and Shipping\n", "Special Needs Education\n", "Spinning\n", "sponsored\n", "Sporting Activities\n", "sports\n", "Spring\n", "Stained Glass\n", "Stirling\n", "Stirlingshire\n", "Sutherland\n", "technical\n", "television arts\n", "television documentary\n", "television educational\n", "television entertainment\n", "television news\n", "television sport\n", "Tourism and Travel\n", "training\n", "Transport\n", "travelogue\n", "War\n", "War Crimes\n", "Water and Waterways\n", "West Lothian\n", "Wigtownshire\n", "women film makers\n" ] } ], "source": [ "# Get the unique values\n", "materias = pd.unique(df['materias'].str.split(' -- ', expand=True).stack()).tolist()\n", "for materia in sorted(materias, key=str.lower):\n", " print(materia)" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 Bus Stations and Depots -- Buses and Coaches, ...\n", "1 Transport -- Glasgow -- documentary -- amateur\n", "2 Ceremonies -- Emotions, Attitudes and Behaviou...\n", "3 Ceremonies -- Transport -- Glasgow\n", "4 Art and Artists, general -- Education -- edu...\n", " ... 
\n", "6012 Ceremonies -- Construction and Engineering -- ...\n", "6013 Construction and Engineering -- Ships and Ship...\n", "6014 Employment, Industry and Industrial Relations ...\n", "6015 Employment, Industry and Industrial Relations ...\n", "6016 Carriages -- Ceremonies -- Ships and Shipping ...\n", "Name: materias, Length: 6017, dtype: object" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['materias']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### We can also calculate how often each subject is used" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<table>\n", "<thead><tr><th></th><th>materia</th><th>contador</th></tr></thead>\n", "<tbody>\n",
"<tr><th>0</th><td>amateur</td><td>2023</td></tr>\n",
"<tr><th>1</th><td>Leisure and Recreation</td><td>813</td></tr>\n",
"<tr><th>2</th><td>Glasgow</td><td>797</td></tr>\n",
"<tr><th>3</th><td>documentary</td><td>707</td></tr>\n",
"<tr><th>4</th><td>Transport</td><td>674</td></tr>\n",
"<tr><th>5</th><td>Employment, Industry and Industrial Relations</td><td>632</td></tr>\n",
"<tr><th>6</th><td>television news</td><td>542</td></tr>\n",
"<tr><th>7</th><td>Edinburgh</td><td>538</td></tr>\n",
"<tr><th>8</th><td>Sporting Activities</td><td>525</td></tr>\n",
"<tr><th>9</th><td>Celebrations, Traditions and Customs</td><td>453</td></tr>\n",
"<tr><th>10</th><td>Ships and Shipping</td><td>444</td></tr>\n",
"<tr><th>11</th><td>local topical</td><td>411</td></tr>\n",
"<tr><th>12</th><td>Children and Infants</td><td>407</td></tr>\n",
"<tr><th>13</th><td>Media, Communication and the Creative Industries</td><td>399</td></tr>\n",
"<tr><th>14</th><td>educational</td><td>379</td></tr>\n",
"<tr><th>15</th><td>Ceremonies</td><td>359</td></tr>\n",
"<tr><th>16</th><td>Education</td><td>356</td></tr>\n",
"<tr><th>17</th><td>Arts and Crafts</td><td>353</td></tr>\n",
"<tr><th>18</th><td>Tourism and Travel</td><td>352</td></tr>\n",
"<tr><th>19</th><td>Construction and Engineering</td><td>303</td></tr>\n",
"<tr><th>20</th><td>Agriculture</td><td>299</td></tr>\n",
"<tr><th>21</th><td>Fish and Fishing</td><td>272</td></tr>\n",
"<tr><th>22</th><td>sponsored</td><td>270</td></tr>\n",
"<tr><th>23</th><td>Emotions, Attitudes and Behaviour</td><td>267</td></tr>\n",
"<tr><th>24</th><td>promotional</td><td>264</td></tr>\n",
"<tr><th>25</th><td>Food and Drink</td><td>262</td></tr>\n",
"<tr><th>26</th><td>newsreel</td><td>256</td></tr>\n",
"<tr><th>27</th><td>Landscapes and Seascapes</td><td>245</td></tr>\n",
"<tr><th>28</th><td>Art and Artists, general </td><td>237</td></tr>\n",
"<tr><th>29</th><td>television documentary</td><td>224</td></tr>\n",
"<tr><th>30</th><td>Lanarkshire</td><td>222</td></tr>\n",
"<tr><th>31</th><td>Ayrshire</td><td>219</td></tr>\n",
"<tr><th>32</th><td>Fife</td><td>218</td></tr>\n",
"<tr><th>33</th><td>Aberdeen</td><td>214</td></tr>\n",
"<tr><th>34</th><td>home movies and videos</td><td>203</td></tr>\n",
"<tr><th>35</th><td>Military, the</td><td>198</td></tr>\n",
"<tr><th>36</th><td>War</td><td>197</td></tr>\n",
"<tr><th>37</th><td>Animals</td><td>192</td></tr>\n",
"<tr><th>38</th><td>Renfrewshire</td><td>188</td></tr>\n",
"<tr><th>39</th><td>Home Life</td><td>183</td></tr>\n",
"<tr><th>40</th><td>Politics</td><td>182</td></tr>\n",
"<tr><th>41</th><td>Power Resources</td><td>180</td></tr>\n",
"<tr><th>42</th><td>Environment</td><td>180</td></tr>\n",
"<tr><th>43</th><td>Science and Technology</td><td>180</td></tr>\n",
"<tr><th>44</th><td>Water and Waterways</td><td>176</td></tr>\n",
"<tr><th>45</th><td>Argyllshire</td><td>171</td></tr>\n",
"<tr><th>46</th><td>Architecture and Buildings</td><td>167</td></tr>\n",
"<tr><th>47</th><td>Religion</td><td>167</td></tr>\n",
"<tr><th>48</th><td>Forth River</td><td>166</td></tr>\n",
"<tr><th>49</th><td>Dunbartonshire</td><td>166</td></tr>\n",
"<tr><th>50</th><td>Aberdeenshire</td><td>166</td></tr>\n",
"<tr><th>51</th><td>Perth</td><td>161</td></tr>\n",
"<tr><th>52</th><td>Healthcare</td><td>159</td></tr>\n",
"<tr><th>53</th><td>women film makers</td><td>158</td></tr>\n",
"<tr><th>54</th><td>Birds</td><td>148</td></tr>\n",
"<tr><th>55</th><td>advertising</td><td>134</td></tr>\n",
"<tr><th>56</th><td>Fishing Boats</td><td>128</td></tr>\n",
"<tr><th>57</th><td>Housing and Living Conditions</td><td>127</td></tr>\n",
"<tr><th>58</th><td>comedy</td><td>125</td></tr>\n",
"<tr><th>59</th><td>Highlands, the</td><td>124</td></tr>\n",
"<tr><th>60</th><td>Royalty</td><td>122</td></tr>\n",
"<tr><th>61</th><td>West Lothian</td><td>119</td></tr>\n",
"<tr><th>62</th><td>Buses and Coaches, general</td><td>118</td></tr>\n",
"<tr><th>63</th><td>animation</td><td>118</td></tr>\n",
"<tr><th>64</th><td>Dundee</td><td>117</td></tr>\n",
"<tr><th>65</th><td>industrial</td><td>114</td></tr>\n",
"<tr><th>66</th><td>Invernesshire</td><td>111</td></tr>\n",
"<tr><th>67</th><td>Dumfriesshire</td><td>107</td></tr>\n",
"<tr><th>68</th><td>Music</td><td>105</td></tr>\n",
"<tr><th>69</th><td>Borders</td><td>104</td></tr>\n",
"<tr><th>70</th><td>Carriages</td><td>103</td></tr>\n",
"<tr><th>71</th><td>Inner Hebrides</td><td>102</td></tr>\n",
"<tr><th>72</th><td>Outer Hebrides</td><td>101</td></tr>\n",
"<tr><th>73</th><td>Ferries</td><td>93</td></tr>\n",
"<tr><th>74</th><td>Stirlingshire</td><td>91</td></tr>\n",
"<tr><th>75</th><td>Orkney Islands</td><td>90</td></tr>\n",
"<tr><th>76</th><td>technical</td><td>88</td></tr>\n",
"<tr><th>77</th><td>Local Government</td><td>87</td></tr>\n",
"<tr><th>78</th><td>sports</td><td>78</td></tr>\n",
"<tr><th>79</th><td>Shetland Islands</td><td>76</td></tr>\n",
"<tr><th>80</th><td>experimental</td><td>72</td></tr>\n",
"<tr><th>81</th><td>Crime, Punishment and Law Enforcement</td><td>67</td></tr>\n",
"<tr><th>82</th><td>East Lothian</td><td>65</td></tr>\n",
"<tr><th>83</th><td>travelogue</td><td>62</td></tr>\n",
"<tr><th>84</th><td>British Empire, the</td><td>61</td></tr>\n",
"<tr><th>85</th><td>Bute</td><td>61</td></tr>\n",
"<tr><th>86</th><td>Institutional Care</td><td>59</td></tr>\n",
"<tr><th>87</th><td>Ross-shire </td><td>58</td></tr>\n",
"<tr><th>88</th><td>Paddle Steamers</td><td>58</td></tr>\n",
"<tr><th>89</th><td>instructional</td><td>57</td></tr>\n",
"<tr><th>90</th><td>Stirling</td><td>57</td></tr>\n",
"<tr><th>91</th><td>Midlothian</td><td>55</td></tr>\n",
"<tr><th>92</th><td>Roxburghshire </td><td>54</td></tr>\n",
"<tr><th>93</th><td>Celts and Celtic Culture</td><td>53</td></tr>\n",
"<tr><th>94</th><td>Morayshire</td><td>47</td></tr>\n",
"<tr><th>95</th><td>Berwickshire</td><td>47</td></tr>\n",
"<tr><th>96</th><td>Peat and Peat Cutting</td><td>47</td></tr>\n",
"<tr><th>97</th><td>Caithness</td><td>47</td></tr>\n",
"<tr><th>98</th><td>Angus</td><td>41</td></tr>\n",
"<tr><th>99</th><td>Selkirkshire</td><td>41</td></tr>\n",
"<tr><th>100</th><td>Spinning</td><td>40</td></tr>\n",
"<tr><th>101</th><td>music</td><td>40</td></tr>\n",
"<tr><th>102</th><td>propaganda</td><td>37</td></tr>\n",
"<tr><th>103</th><td>television sport</td><td>37</td></tr>\n",
"<tr><th>104</th><td>Highland Games</td><td>37</td></tr>\n",
"<tr><th>105</th><td>Camping</td><td>36</td></tr>\n",
"<tr><th>106</th><td>Fish Markets</td><td>35</td></tr>\n",
"<tr><th>107</th><td>Aircraft see also Helicopters</td><td>34</td></tr>\n",
"<tr><th>108</th><td>biographical</td><td>33</td></tr>\n",
"<tr><th>109</th><td>Cafeterias and Canteens</td><td>33</td></tr>\n",
"<tr><th>110</th><td>Canals</td><td>32</td></tr>\n",
"<tr><th>111</th><td>Banff</td><td>31</td></tr>\n",
"<tr><th>112</th><td>Riding of the Marches</td><td>31</td></tr>\n",
"<tr><th>113</th><td>Fish Gutting</td><td>31</td></tr>\n",
"<tr><th>114</th><td>Christmas see also New Year</td><td>30</td></tr>\n",
"<tr><th>115</th><td>television arts</td><td>30</td></tr>\n",
"<tr><th>116</th><td>Sutherland</td><td>30</td></tr>\n",
"<tr><th>117</th><td>Bus Stations and Depots</td><td>29</td></tr>\n",
"<tr><th>118</th><td>television educational</td><td>28</td></tr>\n",
"<tr><th>119</th><td>Restaurants</td><td>28</td></tr>\n",
"<tr><th>120</th><td>Fishwives</td><td>27</td></tr>\n",
"<tr><th>121</th><td>religion</td><td>25</td></tr>\n",
"<tr><th>122</th><td>Wigtownshire</td><td>24</td></tr>\n",
"<tr><th>123</th><td>music video</td><td>23</td></tr>\n",
"<tr><th>124</th><td>children's</td><td>20</td></tr>\n",
"<tr><th>125</th><td>Canoeing</td><td>19</td></tr>\n",
"<tr><th>126</th><td>romance</td><td>19</td></tr>\n",
"<tr><th>127</th><td>Air displays and shows</td><td>19</td></tr>\n",
"<tr><th>128</th><td>medical</td><td>18</td></tr>\n",
"<tr><th>129</th><td>Peebles- shire</td><td>18</td></tr>\n",
"<tr><th>130</th><td>Gorbals, the</td><td>18</td></tr>\n",
"<tr><th>131</th><td>Butchers and Butcher Shops</td><td>18</td></tr>\n",
"<tr><th>132</th><td>Disillusionment</td><td>18</td></tr>\n",
"<tr><th>133</th><td>Reservoirs</td><td>16</td></tr>\n",
"<tr><th>134</th><td>Lobster Fishing</td><td>16</td></tr>\n",
"<tr><th>135</th><td>television entertainment</td><td>16</td></tr>\n",
"<tr><th>136</th><td>Airports</td><td>16</td></tr>\n",
"<tr><th>137</th><td>scientific</td><td>15</td></tr>\n",
"<tr><th>138</th><td>fantasy</td><td>15</td></tr>\n",
"<tr><th>139</th><td>dance</td><td>15</td></tr>\n",
"<tr><th>140</th><td>Special Needs Education</td><td>14</td></tr>\n",
"<tr><th>141</th><td>Bulldozers</td><td>13</td></tr>\n",
"<tr><th>142</th><td>public information</td><td>13</td></tr>\n",
"<tr><th>143</th><td>historical</td><td>13</td></tr>\n",
"<tr><th>144</th><td>Kincardineshire</td><td>13</td></tr>\n",
"<tr><th>145</th><td>Kirkudbrightshire</td><td>12</td></tr>\n",
"<tr><th>146</th><td>Lifeboats</td><td>12</td></tr>\n",
"<tr><th>147</th><td>crime</td><td>12</td></tr>\n",
"<tr><th>148</th><td>ethnographic</td><td>12</td></tr>\n",
"<tr><th>149</th><td>cine mag</td><td>11</td></tr>\n",
"<tr><th>150</th><td>Loch Ness Monster, the</td><td>11</td></tr>\n",
"<tr><th>151</th><td>Holiday Camps</td><td>11</td></tr>\n",
"<tr><th>152</th><td>Rodents</td><td>11</td></tr>\n",
"<tr><th>153</th><td>Home Guard</td><td>10</td></tr>\n",
"<tr><th>154</th><td>Clackmannanshire</td><td>10</td></tr>\n",
"<tr><th>155</th><td>training</td><td>10</td></tr>\n",
"<tr><th>156</th><td>horror</td><td>9</td></tr>\n",
"<tr><th>157</th><td>Residential Homes for the Elderly</td><td>8</td></tr>\n",
"<tr><th>158</th><td>science fiction</td><td>8</td></tr>\n",
"<tr><th>159</th><td>Dentistry</td><td>8</td></tr>\n",
"<tr><th>160</th><td>Nairn</td><td>8</td></tr>\n",
"<tr><th>161</th><td>Fire Service</td><td>7</td></tr>\n",
"<tr><th>162</th><td>Music Hall</td><td>7</td></tr>\n",
"<tr><th>163</th><td>parody</td><td>7</td></tr>\n",
"<tr><th>164</th><td>Revenge</td><td>5</td></tr>\n",
"<tr><th>165</th><td>Depression, the</td><td>5</td></tr>\n",
"<tr><th>166</th><td>Air Raids</td><td>5</td></tr>\n",
"<tr><th>167</th><td>Reptiles</td><td>4</td></tr>\n",
"<tr><th>168</th><td>Kinrosshire</td><td>4</td></tr>\n",
"<tr><th>169</th><td>Spring</td><td>4</td></tr>\n",
"<tr><th>170</th><td>Broadcasting, general</td><td>3</td></tr>\n",
"<tr><th>171</th><td>Hogmanay</td><td>3</td></tr>\n",
"<tr><th>172</th><td>Buddhism</td><td>2</td></tr>\n",
"<tr><th>173</th><td>Cheese and Cheese Making</td><td>2</td></tr>\n",
"<tr><th>174</th><td>War Crimes</td><td>1</td></tr>\n",
"<tr><th>175</th><td>Easter</td><td>1</td></tr>\n",
"<tr><th>176</th><td>Stained Glass</td><td>1</td></tr>\n",
"</tbody>\n", "</table>" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Split the subjects and count the number of occurrences of each one\n", "materia_contador = df['materias'].str.split(' -- ').apply(lambda x: pd.Series(x).value_counts()).sum().astype('int').sort_values(ascending=False).to_frame().reset_index(level=0)\n", "# Name the columns\n", "materia_contador.columns = ['materia', 'contador']\n", "# Display the counts with horizontal bars\n", "display(materia_contador.style.bar(subset=['contador'], color='#d65f5f').set_properties(subset=['contador'], **{'width': '300px'}))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.9" } }, "nbformat": 4, "nbformat_minor": 2 }