{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# The Bibliografía Española de Cartografía\n",
    "\n",
    "This notebook uses the Bibliografía Española de Cartografía, a publication whose main purpose is to make known the cartographic material published in Spain that is deposited in the Biblioteca Nacional de España. It covers maps, plans, nautical charts, atlases, etc., in both print and electronic formats.\n",
    "\n",
    "Online publication began in 2007, and since 2010 the bibliography has also included atlases, which were previously covered by the Bibliografía Española de Monografías. It is published annually and can be consulted online.\n",
    "\n",
    "https://datos.gob.es/es/catalogo/e00123904-bibliografia-espanola-de-cartografia-2017"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Importing the libraries"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "# https://pypi.org/project/pymarc/\n",
    "import pymarc, re, csv\n",
    "import pandas as pd\n",
    "from pymarc import parse_xml_to_array\n",
    "from datapackage import Package"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Generating a CSV file with the content extracted from the original files"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "# newline='' is what the csv module expects; UTF-8 matches the default of pd.read_csv\n",
    "with open('registros_marc_bne.csv', 'w', newline='', encoding='utf-8') as csv_fichero:\n",
    "    csv_salida = csv.writer(csv_fichero, delimiter=',', quotechar='\"', quoting=csv.QUOTE_MINIMAL)\n",
    "    csv_salida.writerow(['titulo', 'autor', 'variante_titulo', 'extension', 'distribuidor', 'materias', 'nota', 'detalles'])\n",
    "\n",
    "    # Passing the path lets the XML parser open and close the file itself\n",
    "    registros = parse_xml_to_array('BNE-cartografia/BN_CARTOGRAFIA_2017-MARCXML.xml')\n",
    "\n",
    "    for registro in registros:\n",
    "\n",
    "        titulo = autor = variante_titulo = extension = distribuidor = materias = nota = detalles = ''\n",
    "\n",
    "        # title: 245 $a, plus $b when present\n",
    "        if registro['245'] is not None:\n",
    "            titulo = registro['245']['a'] or ''\n",
    "            if registro['245']['b'] is not None:\n",
    "                titulo = titulo + ' ' + registro['245']['b']\n",
    "\n",
    "        # author: first available of 100, 110, 700, 710\n",
    "        if registro['100'] is not None:\n",
    "            autor = registro['100']['a']\n",
    "        elif registro['110'] is not None:\n",
    "            autor = registro['110']['a']\n",
    "        elif registro['700'] is not None:\n",
    "            autor = registro['700']['a']\n",
    "        elif registro['710'] is not None:\n",
    "            autor = registro['710']['a']\n",
    "\n",
    "        # variant title\n",
    "        if registro['246'] is not None:\n",
    "            variante_titulo = registro['246']['a']\n",
    "\n",
    "        # physical description: extent ($a) and other details ($b)\n",
    "        for f in registro.get_fields('300'):\n",
    "            sub_a = f.get_subfields('a')\n",
    "            extension = sub_a[0] if sub_a else ''  # avoid writing an empty list to the CSV\n",
    "            # TODO cleaning\n",
    "            sub_b = f.get_subfields('b')\n",
    "            detalles = sub_b[0] if sub_b else ''\n",
    "\n",
    "        # distributor\n",
    "        if registro['260'] is not None:\n",
    "            distribuidor = registro['260']['b']\n",
    "\n",
    "        # note\n",
    "        if registro['501'] is not None:\n",
    "            nota = registro['501']['a']\n",
    "\n",
    "        # subjects: first $a and first $v of every 650 field, joined with ' -- '\n",
    "        terminos = []\n",
    "        for f in registro.get_fields('650'):\n",
    "            # slicing instead of [0] avoids an IndexError when a subfield is missing\n",
    "            terminos += f.get_subfields('a')[:1] + f.get_subfields('v')[:1]\n",
    "        materias = ' -- '.join(terminos)\n",
    "\n",
    "        csv_salida.writerow([titulo, autor, variante_titulo, extension, distribuidor, materias, nota, detalles])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Reading the CSV file"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Load the contents of the CSV file into a pandas DataFrame\n",
    "df = pd.read_csv('registros_marc_bne.csv')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Exploring the content"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [ "
|  | titulo | autor | variante_titulo | extension | distribuidor | materias | nota | detalles |
|---|---|---|---|---|---|---|---|---|
| 0 | 14 routen um Manacor | Manacor | Vierzehn routen um Manacor | 1 mapa | Ajuntament de Manacor, Delegació de Turisme | Excursionismo -- Mapas | Incluye fotografías | col. |
| 1 | 14 routes in Manacor | Manacor | Fourteen routes in Manacor | 1 mapa | Ajuntament de Manacor, Delegació de Turisme | Excursionismo -- Mapas | Incluye fotografías | col. |
| 2 | 14 rutas por Manacor | Manacor | Catorce rutas por Manacor | 1 mapa | Ajuntament de Manacor, Delegació de Turisme | Excursionismo -- Mapas | Incluye fotografías | col. |
| 3 | 14 rutes per Manacor | Manacor | Quatorze rutes per Manacor | 1 mapa | Ajuntament de Manacor, Delegació de Turisme | Excursionismo -- Mapas | Incluye fotografías | col. |
| 4 | A Coruña | Turismo de Galicia | NaN | 1 plano | Turismo de Galicia | NaN | Al verso: Información y localización en el pla... | col. |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1298 | [Southeastern Spain] low flying chart : Europe | España | NaN | 1 h. | Ministerio de Defensa, Secretaría General Técnica | NaN | Al verso: Clave por símbolos; frecuencias de a... | col. |
| 1299 | [Southwestern Spain] low flying chart : Europe | España | NaN | 1 h. | Ministerio de Defensa, Secretaría General Técnica | NaN | Al verso: Clave por símbolos; información aero... | col. |
| 1300 | [Southwestern Spain] low flying chart : Europe | España | NaN | 1 h. | Ministerio de Defensa, Secretaría General Técnica | NaN | Al verso: Clave por símbolos; frecuencias de a... | col. |
| 1301 | [Western Spain] low flying chart : Europe | España | NaN | 1 h. | Ministerio de Defensa, Secretaría General Técnica | NaN | Al verso: Clave por símbolos; información aero... | col. |
| 1302 | [Western Spain] low flying chart : Europe | España | NaN | 1 h. | Ministerio de Defensa, Secretaría General Técnica | NaN | Al verso: Clave por símbolos; frecuencias de a... | col. |

1303 rows × 8 columns
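
The cell that produced the per-subject counts shown next is not included here. As a minimal sketch only, and assuming the counts are derived from the `materias` column of `df` (they may equally have been computed straight from the MARC records), the terms can be split on the ` -- ` separator written by the CSV step and tallied with pandas. The helper name `contar_materias` and the sample data are hypothetical.

```python
import pandas as pd


def contar_materias(df, columna="materias", separador=" -- "):
    """Count how often each subject term occurs in a delimited text column."""
    terminos = (
        df[columna]
        .dropna()              # records without 650 fields are read back as NaN
        .str.split(separador)  # "Excursionismo -- Mapas" -> ["Excursionismo", "Mapas"]
        .explode()             # one row per term
    )
    terminos = terminos[terminos != ""]  # discard empty fragments
    conteo = terminos.value_counts()     # sorted by descending frequency
    return pd.DataFrame({"materia": conteo.index, "contador": conteo.to_numpy()})


# Small made-up sample in the same shape as the `materias` column
ejemplo = pd.DataFrame(
    {"materias": ["Excursionismo -- Mapas", "Carreteras -- Mapas", None, "Mapas"]}
)
resultado = contar_materias(ejemplo)
print(resultado)
```

On the real data the call would be `contar_materias(df)`. Because the sketch compares terms exactly as written, entries that differ only in spacing or punctuation would be counted separately.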
|  | materia | contador |
|---|---|---|
| 0 | Mapas | 508 |
| 1 | Excursionismo | 130 |
| 2 | Carreteras | 95 |
| 3 | Planos | 88 |
| 4 | Senderismo | 81 |
| 5 | Comercios | 79 |
| 6 | Geología | 22 |
| 7 | Nombres geográficos | 22 |
| 8 | Cicloturismo | 17 |
| 9 | Suelos | 12 |
| 10 | Orientación (Deporte) | 10 |
| 11 | Zonas industriales | 8 |
| 12 | Hidrogeología | 7 |
| 13 | Carreras campo a través (Atletismo) | 6 |
| 14 | Aves | 5 |
| 15 | Geomorfología | 5 |
| 16 | Vinos | 5 |
| 17 | Enoturismo | 5 |
| 18 | Transportes | 4 |
| 19 | Vías verdes | 4 |
| 20 | Museos | 3 |
| 21 | Ferrocarriles metropolitanos | 3 |
| 22 | Actividades recreativas al aire libre | 3 |
| 23 | Tráfico | 3 |
| 24 | Comarcas | 3 |
| 25 | Buceo | 3 |
| 26 | Surf | 3 |
| 27 | Centros docentes | 3 |
| 28 | Piragüismo | 3 |
| 29 | Cines | 2 |
| 30 | Cine | 2 |
| 31 | Arquitectura románica | 2 |
| 32 | Costas | 2 |
| 33 | Ciclismo todoterreno | 2 |
| 34 | Ciclismo | 2 |
| 35 | Bodegas | 2 |
| 36 | Aceite de oliva | 2 |
| 37 | Mapas | 2 |
| 38 | Recogida selectiva de residuos | 2 |
| 39 | Restaurantes | 2 |
| 40 | Peregrinaciones cristianas | 2 |
| 41 | Jardines botánicos | 2 |
| 42 | Parques naturales | 2 |
| 43 | Zonas húmedas | 1 |
| 44 | Caminos | 1 |
| 45 | Rutas literarias | 1 |
| 46 | Bares | 1 |
| 47 | Sector servicios | 1 |
| 48 | Carreras de automóviles | 1 |
| 49 | Arte románico | 1 |
| 50 | Sidra | 1 |
| 51 | Arquitectura modernista | 1 |
| 52 | Arquitectura | 1 |
| 53 | Alojamientos turísticos | 1 |
| 54 | Albergues juveniles | 1 |
| 55 | Usos del suelo | 1 |
| 56 | Carreras ciclistas | 1 |
| 57 | Centrales hidroeléctricas | 1 |
| 58 | Radiación natural | 1 |
| 59 | Hidrología | 1 |
| 60 | Quesos | 1 |
| 61 | Pesca de agua dulce | 1 |
| 62 | Paisaje | 1 |
| 63 | Mitología | 1 |
| 64 | Mantequilla | 1 |
| 65 | Lengua vasca | 1 |
| 66 | Deportes acuáticos | 1 |
| 67 | Desplazamientos en bicicleta | 1 |
| 68 | Escuelas infantiles | 1 |
| 69 | Hostelería | 1 |
| 70 | Ferrocarriles de alta velocidad | 1 |
| 71 | Historia | 1 |
| 72 | Geoparques | 1 |