{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# The Bibliografía Española de Cartografía\n",
    "\n",
    "This notebook uses the Bibliografía Española de Cartografía, a publication whose main purpose is to make known the cartographic material published in Spain and deposited in the Biblioteca Nacional de España. It covers maps, plans, nautical charts, atlases, etc., in both print and electronic formats.\n",
    "\n",
    "Online publication began in 2007, and since 2010 the bibliography has also included atlases, which were previously covered by the Bibliografía Española de Monografías. It is published annually and can be consulted online.\n",
    "\n",
    "https://datos.gob.es/es/catalogo/e00123904-bibliografia-espanola-de-cartografia-2017"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Importing the libraries"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "# https://pypi.org/project/pymarc/\n",
    "import pymarc, re, csv\n",
    "import pandas as pd\n",
    "from pymarc import parse_xml_to_array\n",
    "from datapackage import Package"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Generating a CSV file with the content processed from the original files"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "# newline='' is what the csv module expects; the explicit encoding keeps accented characters portable\n",
    "with open('registros_marc_bne.csv', 'w', newline='', encoding='utf-8') as csv_fichero:\n",
    "    csv_salida = csv.writer(csv_fichero, delimiter=',', quotechar='\"', quoting=csv.QUOTE_MINIMAL)\n",
    "    csv_salida.writerow(['titulo', 'autor', 'variante_titulo', 'extension', 'distribuidor', 'materias', 'nota', 'detalles'])\n",
    "\n",
    "    registros = parse_xml_to_array(open('BNE-cartografia/BN_CARTOGRAFIA_2017-MARCXML.xml'))\n",
    "\n",
    "    for registro in registros:\n",
    "\n",
    "        titulo = autor = variante_titulo = extension = distribuidor = materias = nota = detalles = ''\n",
    "\n",
    "        # title (245 $a, plus $b when present)\n",
    "        if registro['245'] is not None:\n",
    "            titulo = registro['245']['a']\n",
    "            if registro['245']['b'] is not None:\n",
    "                titulo = titulo + \" \" + registro['245']['b']\n",
    "\n",
    "        # author: first available of 100, 110, 700, 710\n",
    "        if registro['100'] is not None:\n",
    "            autor = registro['100']['a']\n",
    "        elif registro['110'] is not None:\n",
    "            autor = registro['110']['a']\n",
    "        elif registro['700'] is not None:\n",
    "            autor = registro['700']['a']\n",
    "        elif registro['710'] is not None:\n",
    "            autor = registro['710']['a']\n",
    "\n",
    "        # variant title\n",
    "        if registro['246'] is not None:\n",
    "            variante_titulo = registro['246']['a']\n",
    "\n",
    "        # Physical Description - extent (300 $a) and other details (300 $b)\n",
    "        for f in registro.get_fields('300'):\n",
    "            subcampos_a = f.get_subfields('a')\n",
    "            if subcampos_a:\n",
    "                extension = subcampos_a[0]\n",
    "            # TODO cleaning\n",
    "            subcampos_b = f.get_subfields('b')\n",
    "            if subcampos_b:\n",
    "                detalles = subcampos_b[0]\n",
    "\n",
    "        # distributor (260 $b)\n",
    "        if registro['260'] is not None:\n",
    "            distribuidor = registro['260']['b']\n",
    "\n",
    "        # note (501 $a)\n",
    "        if registro['501'] is not None:\n",
    "            nota = registro['501']['a']\n",
    "\n",
    "        # subjects: first $a and first $v of every 650 field, joined with ' -- '\n",
    "        partes = []\n",
    "        for f in registro.get_fields('650'):\n",
    "            # slicing instead of [0] avoids an IndexError when a subfield is missing\n",
    "            partes += f.get_subfields('a')[:1] + f.get_subfields('v')[:1]\n",
    "        materias = ' -- '.join(partes)\n",
    "\n",
    "        csv_salida.writerow([titulo, autor, variante_titulo, extension, distribuidor, materias, nota, detalles])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Reading the CSV file"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Load the contents of the file into a pandas DataFrame\n",
    "df = pd.read_csv('registros_marc_bne.csv')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Inspecting the content"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "
<div>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\"><th></th><th>titulo</th><th>autor</th><th>variante_titulo</th><th>extension</th><th>distribuidor</th><th>materias</th><th>nota</th><th>detalles</th></tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr><th>0</th><td>14 routen um Manacor</td><td>Manacor</td><td>Vierzehn routen um Manacor</td><td>1 mapa</td><td>Ajuntament de Manacor, Delegació de Turisme</td><td>Excursionismo -- Mapas</td><td>Incluye fotografías</td><td>col.</td></tr>\n",
       "    <tr><th>1</th><td>14 routes in Manacor</td><td>Manacor</td><td>Fourteen routes in Manacor</td><td>1 mapa</td><td>Ajuntament de Manacor, Delegació de Turisme</td><td>Excursionismo -- Mapas</td><td>Incluye fotografías</td><td>col.</td></tr>\n",
       "    <tr><th>2</th><td>14 rutas por Manacor</td><td>Manacor</td><td>Catorce rutas por Manacor</td><td>1 mapa</td><td>Ajuntament de Manacor, Delegació de Turisme</td><td>Excursionismo -- Mapas</td><td>Incluye fotografías</td><td>col.</td></tr>\n",
       "    <tr><th>3</th><td>14 rutes per Manacor</td><td>Manacor</td><td>Quatorze rutes per Manacor</td><td>1 mapa</td><td>Ajuntament de Manacor, Delegació de Turisme</td><td>Excursionismo -- Mapas</td><td>Incluye fotografías</td><td>col.</td></tr>\n",
       "    <tr><th>4</th><td>A Coruña</td><td>Turismo de Galicia</td><td>NaN</td><td>1 plano</td><td>Turismo de Galicia</td><td>NaN</td><td>Al verso: Información y localización en el pla...</td><td>col.</td></tr>\n",
       "    <tr><th>...</th><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td></tr>\n",
       "    <tr><th>1298</th><td>[Southeastern Spain] low flying chart : Europe</td><td>España</td><td>NaN</td><td>1 h.</td><td>Ministerio de Defensa, Secretaría General Técnica</td><td>NaN</td><td>Al verso: Clave por símbolos; frecuencias de a...</td><td>col.</td></tr>\n",
       "    <tr><th>1299</th><td>[Southwestern Spain] low flying chart : Europe</td><td>España</td><td>NaN</td><td>1 h.</td><td>Ministerio de Defensa, Secretaría General Técnica</td><td>NaN</td><td>Al verso: Clave por símbolos; información aero...</td><td>col.</td></tr>\n",
       "    <tr><th>1300</th><td>[Southwestern Spain] low flying chart : Europe</td><td>España</td><td>NaN</td><td>1 h.</td><td>Ministerio de Defensa, Secretaría General Técnica</td><td>NaN</td><td>Al verso: Clave por símbolos; frecuencias de a...</td><td>col.</td></tr>\n",
       "    <tr><th>1301</th><td>[Western Spain] low flying chart : Europe</td><td>España</td><td>NaN</td><td>1 h.</td><td>Ministerio de Defensa, Secretaría General Técnica</td><td>NaN</td><td>Al verso: Clave por símbolos; información aero...</td><td>col.</td></tr>\n",
       "    <tr><th>1302</th><td>[Western Spain] low flying chart : Europe</td><td>España</td><td>NaN</td><td>1 h.</td><td>Ministerio de Defensa, Secretaría General Técnica</td><td>NaN</td><td>Al verso: Clave por símbolos; frecuencias de a...</td><td>col.</td></tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>1303 rows × 8 columns</p>\n",
       "</div>
" ], "text/plain": [ " titulo autor \\\n", "0 14 routen um Manacor Manacor \n", "1 14 routes in Manacor Manacor \n", "2 14 rutas por Manacor Manacor \n", "3 14 rutes per Manacor Manacor \n", "4 A Coruña Turismo de Galicia \n", "... ... ... \n", "1298 [Southeastern Spain] low flying chart : Europe España \n", "1299 [Southwestern Spain] low flying chart : Europe España \n", "1300 [Southwestern Spain] low flying chart : Europe España \n", "1301 [Western Spain] low flying chart : Europe España \n", "1302 [Western Spain] low flying chart : Europe España \n", "\n", " variante_titulo extension \\\n", "0 Vierzehn routen um Manacor 1 mapa \n", "1 Fourteen routes in Manacor 1 mapa \n", "2 Catorce rutas por Manacor 1 mapa \n", "3 Quatorze rutes per Manacor 1 mapa \n", "4 NaN 1 plano \n", "... ... ... \n", "1298 NaN 1 h. \n", "1299 NaN 1 h. \n", "1300 NaN 1 h. \n", "1301 NaN 1 h. \n", "1302 NaN 1 h. \n", "\n", " distribuidor \\\n", "0 Ajuntament de Manacor, Delegació de Turisme \n", "1 Ajuntament de Manacor, Delegació de Turisme \n", "2 Ajuntament de Manacor, Delegació de Turisme \n", "3 Ajuntament de Manacor, Delegació de Turisme \n", "4 Turismo de Galicia \n", "... ... \n", "1298 Ministerio de Defensa, Secretaría General Técnica \n", "1299 Ministerio de Defensa, Secretaría General Técnica \n", "1300 Ministerio de Defensa, Secretaría General Técnica \n", "1301 Ministerio de Defensa, Secretaría General Técnica \n", "1302 Ministerio de Defensa, Secretaría General Técnica \n", "\n", " materias \\\n", "0 Excursionismo -- Mapas \n", "1 Excursionismo -- Mapas \n", "2 Excursionismo -- Mapas \n", "3 Excursionismo -- Mapas \n", "4 NaN \n", "... ... \n", "1298 NaN \n", "1299 NaN \n", "1300 NaN \n", "1301 NaN \n", "1302 NaN \n", "\n", " nota detalles \n", "0 Incluye fotografías col. \n", "1 Incluye fotografías col. \n", "2 Incluye fotografías col. \n", "3 Incluye fotografías col. \n", "4 Al verso: Información y localización en el pla... col. \n", "... ... ... 
\n",
       "1298 Al verso: Clave por símbolos; frecuencias de a... col. \n",
       "1299 Al verso: Clave por símbolos; información aero... col. \n",
       "1300 Al verso: Clave por símbolos; frecuencias de a... col. \n",
       "1301 Al verso: Clave por símbolos; información aero... col. \n",
       "1302 Al verso: Clave por símbolos; frecuencias de a... col. \n",
       "\n",
       "[1303 rows x 8 columns]"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Inspecting the columns"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Index(['titulo', 'autor', 'variante_titulo', 'extension', 'distribuidor',\n",
       " 'materias', 'nota', 'detalles'],\n",
       " dtype='object')"
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.columns"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### How many records are there?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1303"
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "len(df)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Exploring the subjects\n",
    "### Building a list of subjects and sorting it alphabetically"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'Excursionismo -- Mapas'"
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df['materias'][2]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0 0 Excursionismo\n",
       " 1 Mapas\n",
       "1 0 Excursionismo\n",
       " 1 Mapas\n",
       "2 0 Excursionismo\n",
       " ... 
\n",
       "1284 1 Planos\n",
       "1285 0 Comercios\n",
       " 1 Planos\n",
       "1295 0 Senderismo\n",
       " 1 Mapas\n",
       "Length: 1198, dtype: object"
      ]
     },
     "execution_count": 25,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df['materias'].str.split(' -- ', expand=True).stack()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Aceite de oliva\n",
      "Actividades recreativas al aire libre\n",
      "Albergues juveniles\n",
      "Alojamientos turísticos\n",
      "Arquitectura\n",
      "Arquitectura modernista\n",
      "Arquitectura románica\n",
      "Arte románico\n",
      "Aves\n",
      "Bares\n",
      "Bodegas\n",
      "Buceo\n",
      "Caminos\n",
      "Carreras campo a través (Atletismo)\n",
      "Carreras ciclistas\n",
      "Carreras de automóviles\n",
      "Carreteras\n",
      "Centrales hidroeléctricas\n",
      "Centros docentes\n",
      "Ciclismo\n",
      "Ciclismo todoterreno\n",
      "Cicloturismo\n",
      "Cine\n",
      "Cines\n",
      "Comarcas\n",
      "Comercios\n",
      "Costas\n",
      "Deportes acuáticos\n",
      "Desplazamientos en bicicleta\n",
      "Enoturismo\n",
      "Escuelas infantiles\n",
      "Excursionismo\n",
      "Ferrocarriles de alta velocidad\n",
      "Ferrocarriles metropolitanos\n",
      "Geología\n",
      "Geomorfología\n",
      "Geoparques\n",
      "Hidrogeología\n",
      "Hidrología\n",
      "Historia\n",
      "Hostelería\n",
      "Jardines botánicos\n",
      "Lengua vasca\n",
      "Mantequilla\n",
      "Mapas\n",
      "Mapas \n",
      "Mitología\n",
      "Museos\n",
      "Nombres geográficos\n",
      "Orientación (Deporte)\n",
      "Paisaje\n",
      "Parques naturales\n",
      "Peregrinaciones cristianas\n",
      "Pesca de agua dulce\n",
      "Piragüismo\n",
      "Planos\n",
      "Quesos\n",
      "Radiación natural\n",
      "Recogida selectiva de residuos\n",
      "Restaurantes\n",
      "Rutas literarias\n",
      "Sector servicios\n",
      "Senderismo\n",
      "Sidra\n",
      "Suelos\n",
      "Surf\n",
      "Transportes\n",
      "Tráfico\n",
      "Usos del suelo\n",
      "Vinos\n",
      "Vías verdes\n",
      "Zonas húmedas\n",
      "Zonas industriales\n"
     ]
    }
   ],
   "source": [
    "# Get the unique subject values\n",
    "materias = pd.unique(df['materias'].str.split(' -- ', expand=True).stack()).tolist()\n",
    "for 
materia in sorted(materias, key=str.lower):\n",
    "    print(materia)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### We can also compute how often each subject is used"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table>\n",
       "  <thead>\n",
       "    <tr><th></th><th>materia</th><th>contador</th></tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr><th>0</th><td>Mapas</td><td>508</td></tr>\n",
       "    <tr><th>1</th><td>Excursionismo</td><td>130</td></tr>\n",
       "    <tr><th>2</th><td>Carreteras</td><td>95</td></tr>\n",
       "    <tr><th>3</th><td>Planos</td><td>88</td></tr>\n",
       "    <tr><th>4</th><td>Senderismo</td><td>81</td></tr>\n",
       "    <tr><th>5</th><td>Comercios</td><td>79</td></tr>\n",
       "    <tr><th>6</th><td>Geología</td><td>22</td></tr>\n",
       "    <tr><th>7</th><td>Nombres geográficos</td><td>22</td></tr>\n",
       "    <tr><th>8</th><td>Cicloturismo</td><td>17</td></tr>\n",
       "    <tr><th>9</th><td>Suelos</td><td>12</td></tr>\n",
       "    <tr><th>10</th><td>Orientación (Deporte)</td><td>10</td></tr>\n",
       "    <tr><th>11</th><td>Zonas industriales</td><td>8</td></tr>\n",
       "    <tr><th>12</th><td>Hidrogeología</td><td>7</td></tr>\n",
       "    <tr><th>13</th><td>Carreras campo a través (Atletismo)</td><td>6</td></tr>\n",
       "    <tr><th>14</th><td>Aves</td><td>5</td></tr>\n",
       "    <tr><th>15</th><td>Geomorfología</td><td>5</td></tr>\n",
       "    <tr><th>16</th><td>Vinos</td><td>5</td></tr>\n",
       "    <tr><th>17</th><td>Enoturismo</td><td>5</td></tr>\n",
       "    <tr><th>18</th><td>Transportes</td><td>4</td></tr>\n",
       "    <tr><th>19</th><td>Vías verdes</td><td>4</td></tr>\n",
       "    <tr><th>20</th><td>Museos</td><td>3</td></tr>\n",
       "    <tr><th>21</th><td>Ferrocarriles metropolitanos</td><td>3</td></tr>\n",
       "    <tr><th>22</th><td>Actividades recreativas al aire libre</td><td>3</td></tr>\n",
       "    <tr><th>23</th><td>Tráfico</td><td>3</td></tr>\n",
       "    <tr><th>24</th><td>Comarcas</td><td>3</td></tr>\n",
       "    <tr><th>25</th><td>Buceo</td><td>3</td></tr>\n",
       "    <tr><th>26</th><td>Surf</td><td>3</td></tr>\n",
       "    <tr><th>27</th><td>Centros docentes</td><td>3</td></tr>\n",
       "    <tr><th>28</th><td>Piragüismo</td><td>3</td></tr>\n",
       "    <tr><th>29</th><td>Cines</td><td>2</td></tr>\n",
       "    <tr><th>30</th><td>Cine</td><td>2</td></tr>\n",
       "    <tr><th>31</th><td>Arquitectura románica</td><td>2</td></tr>\n",
       "    <tr><th>32</th><td>Costas</td><td>2</td></tr>\n",
       "    <tr><th>33</th><td>Ciclismo todoterreno</td><td>2</td></tr>\n",
       "    <tr><th>34</th><td>Ciclismo</td><td>2</td></tr>\n",
       "    <tr><th>35</th><td>Bodegas</td><td>2</td></tr>\n",
       "    <tr><th>36</th><td>Aceite de oliva</td><td>2</td></tr>\n",
       "    <tr><th>37</th><td>Mapas </td><td>2</td></tr>\n",
       "    <tr><th>38</th><td>Recogida selectiva de residuos</td><td>2</td></tr>\n",
       "    <tr><th>39</th><td>Restaurantes</td><td>2</td></tr>\n",
       "    <tr><th>40</th><td>Peregrinaciones cristianas</td><td>2</td></tr>\n",
       "    <tr><th>41</th><td>Jardines botánicos</td><td>2</td></tr>\n",
       "    <tr><th>42</th><td>Parques naturales</td><td>2</td></tr>\n",
       "    <tr><th>43</th><td>Zonas húmedas</td><td>1</td></tr>\n",
       "    <tr><th>44</th><td>Caminos</td><td>1</td></tr>\n",
       "    <tr><th>45</th><td>Rutas literarias</td><td>1</td></tr>\n",
       "    <tr><th>46</th><td>Bares</td><td>1</td></tr>\n",
       "    <tr><th>47</th><td>Sector servicios</td><td>1</td></tr>\n",
       "    <tr><th>48</th><td>Carreras de automóviles</td><td>1</td></tr>\n",
       "    <tr><th>49</th><td>Arte románico</td><td>1</td></tr>\n",
       "    <tr><th>50</th><td>Sidra</td><td>1</td></tr>\n",
       "    <tr><th>51</th><td>Arquitectura modernista</td><td>1</td></tr>\n",
       "    <tr><th>52</th><td>Arquitectura</td><td>1</td></tr>\n",
       "    <tr><th>53</th><td>Alojamientos turísticos</td><td>1</td></tr>\n",
       "    <tr><th>54</th><td>Albergues juveniles</td><td>1</td></tr>\n",
       "    <tr><th>55</th><td>Usos del suelo</td><td>1</td></tr>\n",
       "    <tr><th>56</th><td>Carreras ciclistas</td><td>1</td></tr>\n",
       "    <tr><th>57</th><td>Centrales hidroeléctricas</td><td>1</td></tr>\n",
       "    <tr><th>58</th><td>Radiación natural</td><td>1</td></tr>\n",
       "    <tr><th>59</th><td>Hidrología</td><td>1</td></tr>\n",
       "    <tr><th>60</th><td>Quesos</td><td>1</td></tr>\n",
       "    <tr><th>61</th><td>Pesca de agua dulce</td><td>1</td></tr>\n",
       "    <tr><th>62</th><td>Paisaje</td><td>1</td></tr>\n",
       "    <tr><th>63</th><td>Mitología</td><td>1</td></tr>\n",
       "    <tr><th>64</th><td>Mantequilla</td><td>1</td></tr>\n",
       "    <tr><th>65</th><td>Lengua vasca</td><td>1</td></tr>\n",
       "    <tr><th>66</th><td>Deportes acuáticos</td><td>1</td></tr>\n",
       "    <tr><th>67</th><td>Desplazamientos en bicicleta</td><td>1</td></tr>\n",
       "    <tr><th>68</th><td>Escuelas infantiles</td><td>1</td></tr>\n",
       "    <tr><th>69</th><td>Hostelería</td><td>1</td></tr>\n",
       "    <tr><th>70</th><td>Ferrocarriles de alta velocidad</td><td>1</td></tr>\n",
       "    <tr><th>71</th><td>Historia</td><td>1</td></tr>\n",
       "    <tr><th>72</th><td>Geoparques</td><td>1</td></tr>\n",
       "  </tbody>\n",
       "</table>"
      ],
      "text/plain": [
       ""
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Split the subjects and count the number of occurrences of each one\n",
    "materia_contador = df['materias'].str.split(' -- ').apply(lambda x: pd.Series(x).value_counts()).sum().astype('int').sort_values(ascending=False).to_frame().reset_index(level=0)\n",
    "# Name the columns\n",
    "materia_contador.columns = ['materia', 'contador']\n",
    "# Display with horizontal bars\n",
    "display(materia_contador.style.bar(subset=['contador'], color='#d65f5f').set_properties(subset=['contador'], **{'width': '300px'}))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.9"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}