{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Finding non-English newspapers in Trove\n", "\n", "There are a growing number of non-English newspapers digitised in Trove. However, if you're only searching using English keywords, you might never know that they're there. I thought it would be useful to generate a list of non-English newspapers, but it wasn't quite as straightforward as I thought.\n", "\n", "## How not to do it...\n", "\n", "My first thought was I could start by searching for digitised newspapers amongst the library records in Trove. My theory was that catalogue metadata would include language information. For example, you can search for newspapers using `format:Periodical/Newspaper` in the books and libraries category (or the `article` API zone). To find those that are digitised, you can add a search for 'trove.nla.gov.au'. Here's the [sort of results](https://trove.nla.gov.au/search/category/books?keyword=%22trove.nla.gov.au%22%20format%3APeriodical%2FNewspaper) you get. Unfortunately, you only get about 826 results and there are many more newspapers than that in Trove. It seems links to digitised newspapers are not consistently recorded.\n", "\n", "My second approach was to get the list of digitised newspapers from the API, extract the ISSN, then use this to search for catalogue records. 
Here's the code snippet I used.\n", "\n", "``` python\n", "params = {\n", " 'zone': 'article',\n", " 'encoding': 'json',\n", " 'l-format': 'Periodical/Newspaper',\n", " 'reclevel': 'full',\n", " 'key': TROVE_API_KEY\n", "}\n", "newspapers = get_newspapers()\n", "for newspaper in newspapers:\n", " print(f'\\n{newspaper[\"title\"]}')\n", " issn = newspaper.get('issn')\n", " params['q'] = f'issn:{issn}'\n", " response = s.get('https://api.trove.nla.gov.au/v2/result', params=params)\n", " data = response.json()\n", " try:\n", " works = data['response']['zone'][0]['records']['work']\n", " except KeyError:\n", " print('Not found')\n", " else:\n", " for work in works:\n", " print(work.get('language'))\n", " if not response.from_cache:\n", " time.sleep(0.2)\n", "```\n", "\n", "The main problem here is that not all titles have ISSNs. You could try searching on the titles if there's no ISSN, but this would involve a fair bit of disambiguation. In any case, in running this I discovered that while there is some language information in the metadata, it's not consistently applied. So basically a metadata-only approach is not going to work. Sigh...\n", "\n", "## How I actually did it\n", "\n", "If I couldn't get language details from metadata, then I had to try and extract it from the resource itself. I spent quite a bit of time looking around for Python packages that provided reliable language detection. The first one I tried regularly identified Mandarin as Korean (it turns out this was a known issue). Another one sent me into dependency hell. Finally I found [pycld3](https://pypi.org/project/pycld3/) which installed with `pip`, and *just worked*.\n", "\n", "My plan was to get the list of newspapers via the API as before, then fire off an empty search for each one. I'd then loop through the results, running the language detector over the article text. I set the query parameters to retrieve the maximum number of results in one request – 100. 
That seemed like a reasonable sample. To try and provide a big enough amount of text for the language detector to work with, I set the number of words parameter to return articles with between 100 and 1000 words. So the query parameters I used were:\n", "\n", "``` python\n", "params = {\n", " 'zone': 'newspaper',\n", " 'encoding': 'json',\n", " 'l-word': '100 - 1000 Words',\n", " 'include': 'articletext',\n", " 'key': TROVE_API_KEY,\n", " 'q': ' ',\n", " 'n': 100,\n", "}\n", "```\n", "\n", "Because some of the newspapers had short runs and the word count filter limits the results, I found that I wasn't always getting 100 results per newspaper. To work around this I found the likely language for each article, aggregated the counts, and then calculated each language's share of the articles with a reliable detection. This gave me a number I could compare across newspapers to find the non-English titles. \n", "\n", "In general this worked pretty well, and the result was a [list of 48 newspapers](non-english-newspapers.md) (also as a [Gist](https://gist.github.com/wragge/9aa385648cff5f0de0c7d4837896df97)) that have significant amounts of non-English content. However, I had to do a fair bit of fiddling to filter out dodgy results. All the details are included below.\n", "\n", "## Problems / limitations\n", "\n", "* It's no surprise that the results of the language detection are affected by the quality of the OCR. \n", "* In filtering out what seems to be the product of dodgy OCR, it's possible that I might be excluding some non-English content. \n", "* I'm only detecting the predominant language for each article, so there might be articles containing a mix of languages that are being missed. \n", "* I'm just taking the first 100 results from a blank search in each newspaper. 
Larger, or more randomised samples might produce different results.\n", "* Some dodgy detection results remain in the list of newspapers, but the point of this exercise was to find non-English newspapers. If you wanted to accurately determine the quantity of non-English content, you'd have to do a lot more fine-grained analysis." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Import what we need" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import requests\n", "import time\n", "import requests_cache\n", "from requests.adapters import HTTPAdapter\n", "from requests.packages.urllib3.util.retry import Retry\n", "from collections import Counter\n", "import re\n", "from langdetect import detect\n", "from tqdm.auto import tqdm\n", "import pandas as pd\n", "import cld3\n", "import pycountry\n", "from language_tags import tags\n", "import altair as alt\n", "from pathlib import Path\n", "\n", "s = requests_cache.CachedSession()\n", "retries = Retry(total=5, backoff_factor=1, status_forcelist=[ 502, 503, 504 ])\n", "s.mount('https://', HTTPAdapter(max_retries=retries))\n", "s.mount('http://', HTTPAdapter(max_retries=retries))" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "TROVE_API_KEY = '[YOUR API KEY]'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Harvest the data and run language detection on articles" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "def get_newspapers():\n", " '''\n", " Get a list of newspapers in Trove.\n", " '''\n", " response = s.get('https://api.trove.nla.gov.au/v2/newspaper/titles', params={'encoding': 'json', 'key': TROVE_API_KEY})\n", " data = response.json()\n", " return data['response']['records']['newspaper']" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": 
"7e7df2aa932c478aaf2c1e6832b0f9fd", "version_major": 2, "version_minor": 0 }, "text/plain": [ "HBox(children=(FloatProgress(value=0.0, max=1622.0), HTML(value='')))" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n" ] } ], "source": [ "params = {\n", " 'zone': 'newspaper',\n", " 'encoding': 'json',\n", " #'l-category': 'Article',\n", " 'l-word': '100 - 1000 Words',\n", " 'include': 'articletext',\n", " 'key': TROVE_API_KEY,\n", " 'q': ' ',\n", " 'n': 100,\n", "}\n", "newspaper_langs = []\n", "newspapers = get_newspapers()\n", "for newspaper in tqdm(newspapers):\n", " langs = []\n", " # print(f'\\n{newspaper[\"title\"]}')\n", " params['l-title'] = newspaper['id']\n", " response = s.get('https://api.trove.nla.gov.au/v2/result', params=params)\n", " data = response.json()\n", " n = data['response']['zone'][0]['records']['n']\n", " try:\n", " articles = data['response']['zone'][0]['records']['article']\n", " except KeyError:\n", " # print('Not found')\n", " pass\n", " else:\n", " # Detect language for each article in results\n", " for article in articles:\n", " if 'articleText' in article:\n", " # Clean up OCRd text by removing takings and extra whitespace\n", " text = article['articleText']\n", " text = re.sub('<[^<]+?>', '', text)\n", " text = re.sub(\"\\s\\s+\", \" \", text)\n", " # Get the language\n", " ld = cld3.get_language(text)\n", " # If the language prediction is reliable, save it\n", " if ld.is_reliable:\n", " langs.append(ld.language)\n", " # Find the count of each language detected in the sample of articles\n", " for lang, count in dict(Counter(langs)).items():\n", " # Calculate the language count as a proportion of the total number of results\n", " prop = int(count) / len(langs)\n", " newspaper_langs.append({'id': newspaper['id'], 'title': newspaper['title'], 'language': lang, 'proportion': prop, 'number': n})\n", " if not response.from_cache:\n", " time.sleep(0.2)\n", " " ] }, { 
"cell_type": "markdown", "metadata": {}, "source": [ "Convert the results into a dataframe." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<table>\n", "<thead><tr><th></th><th>id</th><th>title</th><th>language</th><th>proportion</th><th>number</th></tr></thead>\n", "<tbody>\n",
"<tr><th>0</th><td>166</td><td>Canberra Community News (ACT : 1925 - 1927)</td><td>en</td><td>1.0</td><td>100</td></tr>\n",
"<tr><th>1</th><td>165</td><td>Canberra Illustrated: A Quarterly Magazine (AC...</td><td>en</td><td>1.0</td><td>29</td></tr>\n",
"<tr><th>2</th><td>69</td><td>Federal Capital Pioneer (Canberra, ACT : 1924 ...</td><td>en</td><td>1.0</td><td>100</td></tr>\n",
"<tr><th>3</th><td>871</td><td>Good Neighbour (ACT : 1950 - 1969)</td><td>en</td><td>1.0</td><td>100</td></tr>\n",
"<tr><th>4</th><td>665</td><td>Student Notes/Canberra University College Stud...</td><td>en</td><td>1.0</td><td>100</td></tr>\n",
"</tbody>\n", "</table>
" ], "text/plain": [ " id title language \\\n", "0 166 Canberra Community News (ACT : 1925 - 1927) en \n", "1 165 Canberra Illustrated: A Quarterly Magazine (AC... en \n", "2 69 Federal Capital Pioneer (Canberra, ACT : 1924 ... en \n", "3 871 Good Neighbour (ACT : 1950 - 1969) en \n", "4 665 Student Notes/Canberra University College Stud... en \n", "\n", " proportion number \n", "0 1.0 100 \n", "1 1.0 29 \n", "2 1.0 100 \n", "3 1.0 100 \n", "4 1.0 100 " ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = pd.DataFrame(newspaper_langs)\n", "df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Add full language names\n", "\n", "The language detector returns BCP-47-style language codes. To translate these into something that's a bit easier for humans to understand, we can use the [language-tags](https://github.com/OnroerendErfgoed/language-tags) package." ] }, { "cell_type": "code", "execution_count": 50, "metadata": {}, "outputs": [], "source": [ "def get_full_language(lc):\n", " '''\n", " Get full language names from codes\n", " '''\n", " lang = tags.description(lc)\n", " if lang:\n", " return lang[0]\n", " else:\n", " print(lc)\n", " return lc\n", "\n", "df['language_full'] = df['language'].apply(get_full_language)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Filtering the results\n", "\n", "If we just look at the numbers of languages detected we might think that Australia's cultural diversity was much greater than we expected! But the likelihood that there were ten newspapers publishing articles in Igbo (the language of the Igbo people in south-eastern Nigeria) seems small. Obviously there are a considerable number of false positives here." 
] }, { "cell_type": "code", "execution_count": 59, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "English 1565\n", "Maltese 279\n", "Catalan 53\n", "Welsh 35\n", "Japanese 31\n", "Italian 31\n", "Somali 24\n", "Norwegian 23\n", "Danish 17\n", "German 16\n", "Samoan 10\n", "Igbo 10\n", "Portuguese 9\n", "French 9\n", "Chinese 8\n", "Estonian 8\n", "Scottish Gaelic 8\n", "Luxembourgish 8\n", "Vietnamese 7\n", "Western Frisian 7\n", "Hawaiian 7\n", "Russian 6\n", "Modern Greek (1453-) 5\n", "Swedish 5\n", "Filipino 5\n", "Afrikaans 4\n", "Javanese 4\n", "Indonesian 4\n", "Polish 4\n", "Hindi 4\n", "Bulgarian 4\n", "Corsican 4\n", "Dutch 3\n", "Malagasy 3\n", "Haitian 3\n", "Latin 3\n", "Malay (macrolanguage) 3\n", "Albanian 2\n", "Spanish 2\n", "Shona 2\n", "Kurdish 2\n", "Cebuano 2\n", "Irish 2\n", "Ukrainian 2\n", "Bosnian 2\n", "Macedonian 1\n", "Slovak 1\n", "Galician 1\n", "Turkish 1\n", "Czech 1\n", "Lithuanian 1\n", "Croatian 1\n", "Slovenian 1\n", "Zulu 1\n", "Maori 1\n", "Marathi 1\n", "Name: language_full, dtype: int64" ] }, "execution_count": 59, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['language_full'].value_counts()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Remember that for each language detected in a newspaper we calculated the proportion of articles in our results set in that language. So we can, for example, just look at newspapers where 100% of the articles are in a single language. This highlights a few non-English language newspapers, but obviously we're missing a lot of others." 
] }, { "cell_type": "code", "execution_count": 70, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "English 1112\n", "Italian 3\n", "German 3\n", "Modern Greek (1453-) 1\n", "Portuguese 1\n", "Estonian 1\n", "Name: language_full, dtype: int64" ] }, "execution_count": 70, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.loc[df['proportion'] == 1]['language_full'].value_counts()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If we chart the proportions, we see them bunched up at either end of the scale. So there are lots of languages detected in only a small proportion of articles." ] }, { "cell_type": "code", "execution_count": 66, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "" ], "text/plain": [ "alt.Chart(...)" ] }, "execution_count": 66, "metadata": {}, "output_type": "execute_result" } ], "source": [ "alt.Chart(df).mark_bar().encode(\n", " x=alt.X('proportion:Q', bin=True),\n", " y='count():Q'\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If we zoom in on the proportions less than 0.1 (that's 10 articles in a sample of 100) we see that they're mostly less than 0.01 (or 1 article in 100). It seems likely that these are false positives. " ] }, { "cell_type": "code", "execution_count": 72, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "" ], "text/plain": [ "alt.Chart(...)" ] }, "execution_count": 72, "metadata": {}, "output_type": "execute_result" } ], "source": [ "alt.Chart(df.loc[df['proportion'] < 0.1]).mark_bar().encode(\n", " x=alt.X('proportion:Q', bin=True),\n", " y='count():Q'\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's be fairly conservative and filter out languages that have a proportion (per newspaper) less than 0.05. This list seems a bit more in line with what we would expect, but there are still some surprises – 48 newspapers published articles in Maltese?" ] }, { "cell_type": "code", "execution_count": 74, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "English 1559\n", "Maltese 48\n", "Italian 14\n", "German 9\n", "Chinese 8\n", "Catalan 6\n", "Somali 5\n", "Modern Greek (1453-) 4\n", "Japanese 3\n", "Portuguese 3\n", "Polish 3\n", "Western Frisian 2\n", "Dutch 2\n", "French 2\n", "Spanish 1\n", "Ukrainian 1\n", "Malay (macrolanguage) 1\n", "Welsh 1\n", "Indonesian 1\n", "Russian 1\n", "Danish 1\n", "Scottish Gaelic 1\n", "Bosnian 1\n", "Estonian 1\n", "Vietnamese 1\n", "Macedonian 1\n", "Lithuanian 1\n", "Bulgarian 1\n", "Samoan 1\n", "Name: language_full, dtype: int64" ] }, "execution_count": 74, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.loc[df['proportion'] >= 0.05]['language_full'].value_counts()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If we focus in on the newspapers that supposedly have a significant proportion of articles in Maltese, we see some very strange results. I seriously doubt that 80% of the *Mildura Irrigationist* from 1892-3 is in Maltese. So what's going on?" ] }, { "cell_type": "code", "execution_count": 76, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<table>\n", "<thead><tr><th></th><th>id</th><th>title</th><th>language</th><th>proportion</th><th>number</th><th>language_full</th></tr></thead>\n", "<tbody>\n",
"<tr><th>218</th><td>1596</td><td>L'Italo-Australiano = The Italo-Australian (Su...</td><td>mt</td><td>0.222222</td><td>100</td><td>Maltese</td></tr>\n",
"<tr><th>308</th><td>623</td><td>Sunday News (Sydney, NSW : 1919)</td><td>mt</td><td>0.219178</td><td>100</td><td>Maltese</td></tr>\n",
"<tr><th>400</th><td>224</td><td>The Castlereagh (Gilgandra, NSW : 1905 - 1907)</td><td>mt</td><td>0.105882</td><td>100</td><td>Maltese</td></tr>\n",
"<tr><th>568</th><td>500</td><td>The Richmond River Express and Casino Kyogle A...</td><td>mt</td><td>0.168675</td><td>100</td><td>Maltese</td></tr>\n",
"<tr><th>637</th><td>452</td><td>The Sydney Wool and Stock Journal (NSW : 1899 ...</td><td>mt</td><td>0.233766</td><td>100</td><td>Maltese</td></tr>\n",
"<tr><th>710</th><td>394</td><td>Twofold Bay and Maneroo Observer (NSW : 1860)</td><td>mt</td><td>0.139535</td><td>100</td><td>Maltese</td></tr>\n",
"<tr><th>719</th><td>810</td><td>Upper Hunter Courier (Murrurundi, NSW : 1871)</td><td>mt</td><td>0.142857</td><td>14</td><td>Maltese</td></tr>\n",
"<tr><th>834</th><td>1207</td><td>The Coolangatta Chronicle (Qld. : 1926)</td><td>mt</td><td>0.130435</td><td>26</td><td>Maltese</td></tr>\n",
"<tr><th>884</th><td>892</td><td>Warwick Daily News (Qld. : 1919 -1954)</td><td>mt</td><td>0.139241</td><td>100</td><td>Maltese</td></tr>\n",
"<tr><th>1028</th><td>34</td><td>The Advertiser (Adelaide, SA : 1889 - 1931)</td><td>mt</td><td>0.486111</td><td>100</td><td>Maltese</td></tr>\n",
"<tr><th>1513</th><td>384</td><td>North Melbourne Gazette (Vic. : 1894 - 1901)</td><td>mt</td><td>0.146341</td><td>100</td><td>Maltese</td></tr>\n",
"<tr><th>1577</th><td>318</td><td>Sandringham Southern Cross (Vic. : 1914 - 1918)</td><td>mt</td><td>0.312500</td><td>100</td><td>Maltese</td></tr>\n",
"<tr><th>1619</th><td>13</td><td>The Argus (Melbourne, Vic. : 1848 - 1957)</td><td>mt</td><td>0.629630</td><td>100</td><td>Maltese</td></tr>\n",
"<tr><th>1715</th><td>1583</td><td>The Mildura Irrigationist (Vic. : 1892 - 1893)</td><td>mt</td><td>0.795455</td><td>100</td><td>Maltese</td></tr>\n",
"<tr><th>1719</th><td>1581</td><td>The Mildura Irrigationist and Murray River Agr...</td><td>mt</td><td>0.750000</td><td>100</td><td>Maltese</td></tr>\n",
"<tr><th>1721</th><td>1582</td><td>The Mildura Irrigationist and Murray River Cul...</td><td>mt</td><td>0.333333</td><td>100</td><td>Maltese</td></tr>\n",
"<tr><th>1988</th><td>1543</td><td>Murchison Times and Cue-Big Bell-Reedy Advocat...</td><td>mt</td><td>0.137500</td><td>100</td><td>Maltese</td></tr>\n",
"</tbody>\n", "</table>
" ], "text/plain": [ " id title language \\\n", "218 1596 L'Italo-Australiano = The Italo-Australian (Su... mt \n", "308 623 Sunday News (Sydney, NSW : 1919) mt \n", "400 224 The Castlereagh (Gilgandra, NSW : 1905 - 1907) mt \n", "568 500 The Richmond River Express and Casino Kyogle A... mt \n", "637 452 The Sydney Wool and Stock Journal (NSW : 1899 ... mt \n", "710 394 Twofold Bay and Maneroo Observer (NSW : 1860) mt \n", "719 810 Upper Hunter Courier (Murrurundi, NSW : 1871) mt \n", "834 1207 The Coolangatta Chronicle (Qld. : 1926) mt \n", "884 892 Warwick Daily News (Qld. : 1919 -1954) mt \n", "1028 34 The Advertiser (Adelaide, SA : 1889 - 1931) mt \n", "1513 384 North Melbourne Gazette (Vic. : 1894 - 1901) mt \n", "1577 318 Sandringham Southern Cross (Vic. : 1914 - 1918) mt \n", "1619 13 The Argus (Melbourne, Vic. : 1848 - 1957) mt \n", "1715 1583 The Mildura Irrigationist (Vic. : 1892 - 1893) mt \n", "1719 1581 The Mildura Irrigationist and Murray River Agr... mt \n", "1721 1582 The Mildura Irrigationist and Murray River Cul... mt \n", "1988 1543 Murchison Times and Cue-Big Bell-Reedy Advocat... 
mt \n", "\n", " proportion number language_full \n", "218 0.222222 100 Maltese \n", "308 0.219178 100 Maltese \n", "400 0.105882 100 Maltese \n", "568 0.168675 100 Maltese \n", "637 0.233766 100 Maltese \n", "710 0.139535 100 Maltese \n", "719 0.142857 14 Maltese \n", "834 0.130435 26 Maltese \n", "884 0.139241 100 Maltese \n", "1028 0.486111 100 Maltese \n", "1513 0.146341 100 Maltese \n", "1577 0.312500 100 Maltese \n", "1619 0.629630 100 Maltese \n", "1715 0.795455 100 Maltese \n", "1719 0.750000 100 Maltese \n", "1721 0.333333 100 Maltese \n", "1988 0.137500 100 Maltese " ] }, "execution_count": 76, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.loc[(df['proportion'] > 0.1) & (df['language_full'] == 'Maltese')]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you look at results for the *Mildura Irrigationist* [in Trove](https://trove.nla.gov.au/search/advanced/category/newspapers?l-advtitle=1583&l-advWord=100%20-%201000%20Words) you'll see that many of the page images are blurry, and as a result the OCR is very, very bad. Here's a sample:\n", "\n", "> ill Tatr W lyltwililUmt aat aa«v aa MwOkaWtOPMlkMrf faiflftMMRltitlWBfMNM fmiMW^M^K IMIOHIpM^fQBMMI ft tWMmrwl tWWiltjfNMStW ffw aailwt«M wtMitiar«lH*a ifcmH af tlw ial«««l ion «M««f ffantoif wwtMaaM. tto tf h «frwringmhw torf M hr toaiy. Im*4. ar, fc> mmirf awlUW wefllaM aA. aaytMaa. l «Wa A tfc» tow waliw Macks b aaM, b wil fVfbH Ja ^IMntaam* Mm' ls tolliac. rt Tto aad nf ttoar UhKMimiw*a afM» ftjrwl ans W l OtfWOar jpaaofTwSi aJwwr la'aahS^*— attor aakwt mm rvfimMiMh* ttoai. day - Why. aa IH thrf t«fl almd yaa.\"iw. aal wwifciha m OiO all tto laM amnavaA, fawawNl I r aa4 f wa* tm enr a Mtcfc tto watrr tto wiaaal m a* a* day pfaMat. aa4 (h* ilj amintir* ilm tTtsjtvL.f**' \"\"j •fria—lhati* tow ««4M k.\" tlml t | r 4m» wtn .aa rUa* I h ha«« t ctoantaf InMM* aM*toclt ttopnaMaf II It la Mat rtgM, t jmi awl a 1 : af but d awtliqg a Mr. 
Jafc Matwa-(MMa M t «wl y gha yaar «toa anl yaar (ma as «fpai ta af t«l. i pwwiaf Mtan (tot jw. twy MwUI «*a1 a«ry ftajr «ndl tar tlw aad annaH* a*«r aarf a««r aaria. tiaa" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What happens when we feed this fragment of bad OCR to the language detector? Remarkably, the language detector is 96% sure that it's Maltese! To find out why this is the case, we'd probably have to dig into the way the language detection model was trained. But for our purposes it's enough to know that some of the languages detected seem to be the result of bad OCR." ] }, { "cell_type": "code", "execution_count": 79, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "LanguagePrediction(language='mt', probability=0.960280179977417, is_reliable=True, proportion=1.0)" ] }, "execution_count": 79, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ocr = '''ill Tatr W lyltwililUmt aat aa«v aa MwOkaWtOPMlkMrf faiflftMMRltitlWBfMNM fmiMW^M^K IMIOHIpM^fQBMMI ft tWMmrwl tWWiltjfNMStW ffw aailwt«M wtMitiar«lH*a ifcmH af tlw ial«««l ion «M««f ffantoif wwtMaaM. tto tf h «frwringmhw torf M hr toaiy. Im*4. ar, fc> mmirf awlUW wefllaM aA. aaytMaa. l «Wa A tfc» tow waliw Macks b aaM, b wil fVfbH Ja ^IMntaam* Mm' ls tolliac. rt Tto aad nf ttoar UhKMimiw*a afM» ftjrwl ans W l OtfWOar jpaaofTwSi aJwwr la'aahS^*— attor aakwt mm rvfimMiMh* ttoai. day - Why. aa IH thrf t«fl almd yaa.\"iw. aal wwifciha m OiO all tto laM amnavaA, fawawNl I r aa4 f wa* tm enr a Mtcfc tto watrr tto wiaaal m a* a* day pfaMat. aa4 (h* ilj amintir* ilm tTtsjtvL.f**' \"\"j •fria—lhati* tow ««4M k.\" tlml t | r 4m» wtn .aa rUa* I h ha«« t ctoantaf InMM* aM*toclt ttopnaMaf II It la Mat rtgM, t jmi awl a 1 : af but d awtliqg a Mr. Jafc Matwa-(MMa M t «wl y gha yaar «toa anl yaar (ma as «fpai ta af t«l. i pwwiaf Mtan (tot jw. twy MwUI «*a1 a«ry ftajr «ndl tar tlw aad annaH* a*«r aarf a««r aaria. 
tiaa'''\n", "cld3.get_language(ocr)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Of course there might actually be newspapers with articles in Maltese, so we don't want to filter them all out. So let's do some manual inspection of the newspapers that *seem* to have non-English content. First we'll filter our results to include only languages with proportions of more than 0.05, and then drop out newspapers that seem to be only in English. We end up with 105 different titles. " ] }, { "cell_type": "code", "execution_count": 89, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "105" ] }, "execution_count": 89, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# The filter on the groupby drops out newspapers that only have articles in English.\n", "filtered = df.loc[df['proportion'] >= 0.05].groupby(by=['title', 'id']).filter(lambda x: (len(x) > 1) or (len(x)== 1 and x['language'] != 'en'))\n", "papers = filtered.groupby(by=['title', 'id'])\n", "len(papers)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's list those 105 newspapers. From the list below, I think it's pretty easy to pick out the results that are likely to be the product of bad OCR." ] }, { "cell_type": "code", "execution_count": 86, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "A Voz de Timor (Dili, East Timor : 1970 - 1975) (1498)\n" ] }, { "data": { "text/html": [ "
<table>\n", "<thead><tr><th></th><th>language_full</th><th>language</th><th>proportion</th></tr></thead>\n", "<tbody>\n", "<tr><th>9</th><td>Portuguese</td><td>pt</td><td>1.0</td></tr>\n", "</tbody>\n", "</table>
" ], "text/plain": [ " language_full language proportion\n", "9 Portuguese pt 1.0" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Adelaide Chronicle and South Australian Literary Record (SA : 1840 - 1842) (986)\n" ] }, { "data": { "text/html": [ "
<table>\n", "<thead><tr><th></th><th>language_full</th><th>language</th><th>proportion</th></tr></thead>\n", "<tbody>\n", "<tr><th>894</th><td>English</td><td>en</td><td>0.929293</td></tr>\n", "<tr><th>893</th><td>Catalan</td><td>ca</td><td>0.070707</td></tr>\n", "</tbody>\n", "</table>
" ], "text/plain": [ " language_full language proportion\n", "894 English en 0.929293\n", "893 Catalan ca 0.070707" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Adelaide Independent and Cabinet of Amusement (SA : 1841) (1336)\n" ] }, { "data": { "text/html": [ "
<table>\n", "<thead><tr><th></th><th>language_full</th><th>language</th><th>proportion</th></tr></thead>\n", "<tbody>\n", "<tr><th>895</th><td>English</td><td>en</td><td>0.928571</td></tr>\n", "<tr><th>897</th><td>Catalan</td><td>ca</td><td>0.061224</td></tr>\n", "</tbody>\n", "</table>
" ], "text/plain": [ " language_full language proportion\n", "895 English en 0.928571\n", "897 Catalan ca 0.061224" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Adelaider Deutsche Zeitung (SA : 1851 - 1862) (277)\n" ] }, { "data": { "text/html": [ "
<table>\n", "<thead><tr><th></th><th>language_full</th><th>language</th><th>proportion</th></tr></thead>\n", "<tbody>\n", "<tr><th>904</th><td>German</td><td>de</td><td>1.0</td></tr>\n", "</tbody>\n", "</table>
" ], "text/plain": [ " language_full language proportion\n", "904 German de 1.0" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Auburn and District News (NSW : 1929) (1320)\n" ] }, { "data": { "text/html": [ "
<table>\n", "<thead><tr><th></th><th>language_full</th><th>language</th><th>proportion</th></tr></thead>\n", "<tbody>\n", "<tr><th>40</th><td>English</td><td>en</td><td>0.947368</td></tr>\n", "<tr><th>41</th><td>Vietnamese</td><td>vi</td><td>0.052632</td></tr>\n", "</tbody>\n", "</table>
" ], "text/plain": [ " language_full language proportion\n", "40 English en 0.947368\n", "41 Vietnamese vi 0.052632" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Australische Zeitung (Adelaide, SA : 1875 - 1916) (1150)\n" ] }, { "data": { "text/html": [ "
<table>\n", "<thead><tr><th></th><th>language_full</th><th>language</th><th>proportion</th></tr></thead>\n", "<tbody>\n", "<tr><th>908</th><td>German</td><td>de</td><td>1.0</td></tr>\n", "</tbody>\n", "</table>
" ], "text/plain": [ " language_full language proportion\n", "908 German de 1.0" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Bangkok Recorder (Thailand : 1865 - 1867) (1488)\n" ] }, { "data": { "text/html": [ "
<table>\n", "<thead><tr><th></th><th>language_full</th><th>language</th><th>proportion</th></tr></thead>\n", "<tbody>\n", "<tr><th>10</th><td>English</td><td>en</td><td>0.925532</td></tr>\n", "<tr><th>11</th><td>Maltese</td><td>mt</td><td>0.053191</td></tr>\n", "</tbody>\n", "</table>
" ], "text/plain": [ " language_full language proportion\n", "10 English en 0.925532\n", "11 Maltese mt 0.053191" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Berita Repoeblik (Djakarta, Indonesia : 1945 - 1946) (1283)\n" ] }, { "data": { "text/html": [ "
<table>\n", "<thead><tr><th></th><th>language_full</th><th>language</th><th>proportion</th></tr></thead>\n", "<tbody>\n", "<tr><th>14</th><td>Malay (macrolanguage)</td><td>ms</td><td>0.891304</td></tr>\n", "<tr><th>15</th><td>Indonesian</td><td>id</td><td>0.108696</td></tr>\n", "</tbody>\n", "</table>
" ], "text/plain": [ " language_full language proportion\n", "14 Malay (macrolanguage) ms 0.891304\n", "15 Indonesian id 0.108696" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Bulong Bulletin and Mining Register (WA : 1897 - 1898) (1400)\n" ] }, { "data": { "text/html": [ "
<table>\n", "<thead><tr><th></th><th>language_full</th><th>language</th><th>proportion</th></tr></thead>\n", "<tbody>\n", "<tr><th>1813</th><td>English</td><td>en</td><td>0.913043</td></tr>\n", "<tr><th>1814</th><td>Maltese</td><td>mt</td><td>0.086957</td></tr>\n", "</tbody>\n", "</table>
" ], "text/plain": [ " language_full language proportion\n", "1813 English en 0.913043\n", "1814 Maltese mt 0.086957" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Chinese Republic News (Sydney, NSW : 1914 - 1937) (1186)\n" ] }, { "data": { "text/html": [ "
<table>\n", "<thead><tr><th></th><th>language_full</th><th>language</th><th>proportion</th></tr></thead>\n", "<tbody>\n", "<tr><th>82</th><td>Chinese</td><td>zh</td><td>0.945652</td></tr>\n", "</tbody>\n", "</table>
" ], "text/plain": [ " language_full language proportion\n", "82 Chinese zh 0.945652" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Chinese Times (Melbourne, Vic. : 1902 - 1922) (705)\n" ] }, { "data": { "text/html": [ "
<table>\n", "<thead><tr><th></th><th>language_full</th><th>language</th><th>proportion</th></tr></thead>\n", "<tbody>\n", "<tr><th>1304</th><td>Chinese</td><td>zh</td><td>0.843373</td></tr>\n", "</tbody>\n", "</table>
" ], "text/plain": [ " language_full language proportion\n", "1304 Chinese zh 0.843373" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Chronicle and North Coast Advertiser (Qld. : 1903 - 1922) (286)\n" ] }, { "data": { "text/html": [ "
<table>\n", "<thead><tr><th></th><th>language_full</th><th>language</th><th>proportion</th></tr></thead>\n", "<tbody>\n", "<tr><th>765</th><td>English</td><td>en</td><td>0.93617</td></tr>\n", "<tr><th>766</th><td>Maltese</td><td>mt</td><td>0.06383</td></tr>\n", "</tbody>\n", "</table>
" ], "text/plain": [ " language_full language proportion\n", "765 English en 0.93617\n", "766 Maltese mt 0.06383" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Chung Wah News (Perth, WA : 1981 - 1987) (1383)\n" ] }, { "data": { "text/html": [ "
<table>\n", "<thead><tr><th></th><th>language_full</th><th>language</th><th>proportion</th></tr></thead>\n", "<tbody>\n", "<tr><th>1831</th><td>English</td><td>en</td><td>0.637363</td></tr>\n", "<tr><th>1830</th><td>Chinese</td><td>zh</td><td>0.263736</td></tr>\n", "</tbody>\n", "</table>
" ], "text/plain": [ " language_full language proportion\n", "1831 English en 0.637363\n", "1830 Chinese zh 0.263736" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Colac Reformer (Vic. : 1914 - 1918) (763)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_fulllanguageproportion
1324Englishen0.947917
1325Maltesemt0.052083
\n", "
" ], "text/plain": [ " language_full language proportion\n", "1324 English en 0.947917\n", "1325 Maltese mt 0.052083" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Daily Post (Hobart, Tas. : 1908 - 1918) (860)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_fulllanguageproportion
1114Englishen0.704545
1113Japaneseja0.125000
\n", "
" ], "text/plain": [ " language_full language proportion\n", "1114 English en 0.704545\n", "1113 Japanese ja 0.125000" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Der Australische Spiegel = The Australian Mirror (Perth, WA : 1952) (1385)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_fulllanguageproportion
1856Germande0.83
1857Englishen0.17
\n", "
" ], "text/plain": [ " language_full language proportion\n", "1856 German de 0.83\n", "1857 English en 0.17" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Deutsch-Australische Post : Wochenschrift = German-Australian Post : Weekly (Sydney, NSW : 1893 - 1906) (1600)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_fulllanguageproportion
126Germande1.0
\n", "
" ], "text/plain": [ " language_full language proportion\n", "126 German de 1.0" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Deutsche Zeitung für Sud-Australien = German Times for South Australia (Tanunda, SA : 1851) (1577)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_fulllanguageproportion
922Germande0.9
921Englishen0.1
\n", "
" ], "text/plain": [ " language_full language proportion\n", "922 German de 0.9\n", "921 English en 0.1" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Die Brucke = The Bridge (Sydney, NSW : 1934 - 1939) (1591)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_fulllanguageproportion
127Germande0.729167
128Englishen0.270833
\n", "
" ], "text/plain": [ " language_full language proportion\n", "127 German de 0.729167\n", "128 English en 0.270833" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Die Deutsche Post für die Australischen Colonien = The German Australian Post (Adelaide, SA : 1848 - 1851) (1576)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_fulllanguageproportion
923Germande0.989691
\n", "
" ], "text/plain": [ " language_full language proportion\n", "923 German de 0.989691" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Dutch Australian Weekly (Sydney, NSW : 1951 - 1993) (1044)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_fulllanguageproportion
132Dutchnl0.882979
133Englishen0.106383
\n", "
" ], "text/plain": [ " language_full language proportion\n", "132 Dutch nl 0.882979\n", "133 English en 0.106383" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Dutch Weekly (Sydney, NSW : 1993 - 2004) (1045)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_fulllanguageproportion
135Dutchnl0.924731
136Englishen0.053763
\n", "
" ], "text/plain": [ " language_full language proportion\n", "135 Dutch nl 0.924731\n", "136 English en 0.053763" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Echo : Polski Tygodnik Niezalezny (Perth, WA : 1950 - 1952) (1384)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_fulllanguageproportion
1862Polishpl0.91
1863Englishen0.09
\n", "
" ], "text/plain": [ " language_full language proportion\n", "1862 Polish pl 0.91\n", "1863 English en 0.09" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Eco Italiano (Perth, WA : 1958 - 1959) (1387)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_fulllanguageproportion
1864Italianit1.0
\n", "
" ], "text/plain": [ " language_full language proportion\n", "1864 Italian it 1.0" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Emu Bay Times and North West and West Coast Advocate (Tas. : 1897 - 1899) (116)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_fulllanguageproportion
1130Englishen0.929412
1131Maltesemt0.070588
\n", "
" ], "text/plain": [ " language_full language proportion\n", "1130 English en 0.929412\n", "1131 Maltese mt 0.070588" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Evelyn Observer, and South and East Bourke Record (Vic. : 1882 - 1902) (145)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_fulllanguageproportion
1358Englishen0.913978
1357Maltesemt0.075269
\n", "
" ], "text/plain": [ " language_full language proportion\n", "1358 English en 0.913978\n", "1357 Maltese mt 0.075269" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Geelong Advertiser (Vic. : 1840 - 1845) (292)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_fulllanguageproportion
1379Englishen0.904255
1378Samoansm0.074468
\n", "
" ], "text/plain": [ " language_full language proportion\n", "1379 English en 0.904255\n", "1378 Samoan sm 0.074468" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Geraldton Advocate and Johnstone River Guardian (Qld. : 1895 - 1896) (1103)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_fulllanguageproportion
774Englishen0.910112
775Maltesemt0.089888
\n", "
" ], "text/plain": [ " language_full language proportion\n", "774 English en 0.910112\n", "775 Maltese mt 0.089888" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Geraldton Express and Murchison Goldfields News (WA : 1894 - 1896) (1623)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_fulllanguageproportion
1875Englishen0.661538
1879Maltesemt0.076923
1876Japaneseja0.061538
\n", "
" ], "text/plain": [ " language_full language proportion\n", "1875 English en 0.661538\n", "1879 Maltese mt 0.076923\n", "1876 Japanese ja 0.061538" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Guang yi hua bao = The Chinese Australian Herald (Sydney, NSW : 1894 - 1923) (704)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_fulllanguageproportion
170Chinesezh0.803030
173Western Frisianfy0.075758
\n", "
" ], "text/plain": [ " language_full language proportion\n", "170 Chinese zh 0.803030\n", "173 Western Frisian fy 0.075758" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Hamilton Spectator and Grange District Advertiser (South Melbourne, Vic. : 1860 - 1870) (927)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_fulllanguageproportion
1410Englishen0.921348
1409Maltesemt0.078652
\n", "
" ], "text/plain": [ " language_full language proportion\n", "1410 English en 0.921348\n", "1409 Maltese mt 0.078652" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Healesville Guardian (Vic. : 1893 - 1898) (140)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_fulllanguageproportion
1415Englishen0.938144
1416Maltesemt0.051546
\n", "
" ], "text/plain": [ " language_full language proportion\n", "1415 English en 0.938144\n", "1416 Maltese mt 0.051546" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Hellenic Echo (Perth, WA : 1967 - 1968) (1389)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_fulllanguageproportion
1917Modern Greek (1453-)el1.0
\n", "
" ], "text/plain": [ " language_full language proportion\n", "1917 Modern Greek (1453-) el 1.0" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Il Canguro = The Kangaroo (Perth, WA : 1955 - 1957) (1378)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_fulllanguageproportion
1919Italianit0.97
\n", "
" ], "text/plain": [ " language_full language proportion\n", "1919 Italian it 0.97" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Il Giornale Italiano (Sydney, NSW : 1932 - 1940) (279)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_fulllanguageproportion
186Italianit0.92
187Englishen0.08
\n", "
" ], "text/plain": [ " language_full language proportion\n", "186 Italian it 0.92\n", "187 English en 0.08" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Il Risveglio = The Awakening (Sydney, NSW : 1944 - 1954) (1601)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_fulllanguageproportion
188Italianit0.777778
189Englishen0.222222
\n", "
" ], "text/plain": [ " language_full language proportion\n", "188 Italian it 0.777778\n", "189 English en 0.222222" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Inglewood Advertiser (Vic. : 1914 - 1918) (570)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_fulllanguageproportion
1435Englishen0.936842
1436Maltesemt0.063158
\n", "
" ], "text/plain": [ " language_full language proportion\n", "1435 English en 0.936842\n", "1436 Maltese mt 0.063158" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Italian Bulletin of Australia (Sydney, NSW : 1922 - 1928, 1935 - 1940) (1602)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_fulllanguageproportion
199Englishen0.840426
200Italianit0.159574
\n", "
" ], "text/plain": [ " language_full language proportion\n", "199 English en 0.840426\n", "200 Italian it 0.159574" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Italian Bulletin of Commerce (Sydney, NSW : 1929 - 1935) (1603)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_fulllanguageproportion
201Englishen0.903226
202Italianit0.096774
\n", "
" ], "text/plain": [ " language_full language proportion\n", "201 English en 0.903226\n", "202 Italian it 0.096774" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Italo-Australian (Sydney, NSW : 1927 - 1940) (1595)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_fulllanguageproportion
203Italianit0.909091
204Englishen0.090909
\n", "
" ], "text/plain": [ " language_full language proportion\n", "203 Italian it 0.909091\n", "204 English en 0.090909" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Japanese Perth Times (Subiaco, WA : 1989 - 1996) (1386)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_fulllanguageproportion
1924Japaneseja0.93617
1925Englishen0.06383
\n", "
" ], "text/plain": [ " language_full language proportion\n", "1924 Japanese ja 0.93617\n", "1925 English en 0.06383" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Katoomba Times (NSW : 1889 - 1894) (906)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_fulllanguageproportion
207Englishen0.934066
209Maltesemt0.054945
\n", "
" ], "text/plain": [ " language_full language proportion\n", "207 English en 0.934066\n", "209 Maltese mt 0.054945" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Kyabram Union (Vic. : 1886 - 1894) (196)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_fulllanguageproportion
1456Englishen0.921348
1457Maltesemt0.056180
\n", "
" ], "text/plain": [ " language_full language proportion\n", "1456 English en 0.921348\n", "1457 Maltese mt 0.056180" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "L'Italo-Australiano = The Italo-Australian (Surry Hills, NSW : 1885) (1596)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_fulllanguageproportion
217Italianit0.682540
218Maltesemt0.222222
\n", "
" ], "text/plain": [ " language_full language proportion\n", "217 Italian it 0.682540\n", "218 Maltese mt 0.222222" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "L'Italo-Australiano = The Italo-Australian (Sydney, NSW : 1905 - 1909) (1597)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_fulllanguageproportion
223Italianit0.95
\n", "
" ], "text/plain": [ " language_full language proportion\n", "223 Italian it 0.95" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "La Rondine (Perth, WA : 1969 - 1994) (1388)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_fulllanguageproportion
1942Italianit0.928571
1943Englishen0.071429
\n", "
" ], "text/plain": [ " language_full language proportion\n", "1942 Italian it 0.928571\n", "1943 English en 0.071429" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Laura Standard and Crystal Brook Courier (SA : 1917 - 1948) (926)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_fulllanguageproportion
940Englishen0.931034
941Maltesemt0.068966
\n", "
" ], "text/plain": [ " language_full language proportion\n", "940 English en 0.931034\n", "941 Maltese mt 0.068966" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Le Courrier Australien (Sydney, NSW : 1892 - 2011) (829)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_fulllanguageproportion
228Frenchfr0.816327
229Englishen0.173469
\n", "
" ], "text/plain": [ " language_full language proportion\n", "228 French fr 0.816327\n", "229 English en 0.173469" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Mediterranean Voice (Perth, WA : 1971 - 1972) (1390)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_fulllanguageproportion
1961Modern Greek (1453-)el0.375000
1955Englishen0.281250
1962Portuguesept0.104167
1956Frenchfr0.062500
1954Spanishes0.052083
\n", "
" ], "text/plain": [ " language_full language proportion\n", "1961 Modern Greek (1453-) el 0.375000\n", "1955 English en 0.281250\n", "1962 Portuguese pt 0.104167\n", "1956 French fr 0.062500\n", "1954 Spanish es 0.052083" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Meie Kodu = Our Home (Sydney, NSW : 1949 - 1956) (280)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_fulllanguageproportion
238Estonianet1.0
\n", "
" ], "text/plain": [ " language_full language proportion\n", "238 Estonian et 1.0" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Murchison Times and Cue-Big Bell-Reedy Advocate (WA : 1937 - 1942) (1543)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_fulllanguageproportion
1987Englishen0.8250
1988Maltesemt0.1375
\n", "
" ], "text/plain": [ " language_full language proportion\n", "1987 English en 0.8250\n", "1988 Maltese mt 0.1375" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Mu̇sų Pastogė = Our Haven (Sydney, NSW : 1950 - 1954) (1594)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_fulllanguageproportion
250Lithuanianlt0.95
\n", "
" ], "text/plain": [ " language_full language proportion\n", "250 Lithuanian lt 0.95" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Narandera Argus and Riverina Advertiser (NSW : 1893 - 1953) (431)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_fulllanguageproportion
254Englishen0.940476
255Maltesemt0.059524
\n", "
" ], "text/plain": [ " language_full language proportion\n", "254 English en 0.940476\n", "255 Maltese mt 0.059524" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Narromine News and Trangie Advocate (NSW : 1898 - 1955) (430)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_fulllanguageproportion
256Englishen0.946809
257Maltesemt0.053191
\n", "
" ], "text/plain": [ " language_full language proportion\n", "256 English en 0.946809\n", "257 Maltese mt 0.053191" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Nasza droga (Adelaide, SA : 1952 - 1954) (1323)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_fulllanguageproportion
947Polishpl0.9
948Englishen0.1
\n", "
" ], "text/plain": [ " language_full language proportion\n", "947 Polish pl 0.9\n", "948 English en 0.1" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Norden (Melbourne, Vic. : 1914 - 1918) (797)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_fulllanguageproportion
1505Englishen0.467391
1504Danishda0.413043
1506Maltesemt0.065217
\n", "
" ], "text/plain": [ " language_full language proportion\n", "1505 English en 0.467391\n", "1504 Danish da 0.413043\n", "1506 Maltese mt 0.065217" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "North Melbourne Gazette (Vic. : 1894 - 1901) (384)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_fulllanguageproportion
1512Englishen0.829268
1513Maltesemt0.146341
\n", "
" ], "text/plain": [ " language_full language proportion\n", "1512 English en 0.829268\n", "1513 Maltese mt 0.146341" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Oceania (Sydney, NSW : 1913 - 1915) (1598)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_fulllanguageproportion
270Englishen0.574468
269Italianit0.425532
\n", "
" ], "text/plain": [ " language_full language proportion\n", "270 English en 0.574468\n", "269 Italian it 0.425532" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Referee (Sydney, NSW : 1886 - 1939) (499)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_fulllanguageproportion
284Englishen0.924242
285Maltesemt0.075758
\n", "
" ], "text/plain": [ " language_full language proportion\n", "284 English en 0.924242\n", "285 Maltese mt 0.075758" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Reporter and Illawarra Journal (Kiama, NSW : 1887 - 1894) (389)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_fulllanguageproportion
286Englishen0.891566
288Maltesemt0.084337
\n", "
" ], "text/plain": [ " language_full language proportion\n", "286 English en 0.891566\n", "288 Maltese mt 0.084337" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Ringwood and Croydon Chronicle (Vic. : 1914 - 1918) (329)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_fulllanguageproportion
1565Englishen0.93617
1566Maltesemt0.06383
\n", "
" ], "text/plain": [ " language_full language proportion\n", "1565 English en 0.93617\n", "1566 Maltese mt 0.06383" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Rockhampton Bulletin and Central Queensland Advertiser (Qld. : 1861 - 1871) (92)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_fulllanguageproportion
814Englishen0.946237
815Maltesemt0.053763
\n", "
" ], "text/plain": [ " language_full language proportion\n", "814 English en 0.946237\n", "815 Maltese mt 0.053763" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Sandringham Southern Cross (Vic. : 1914 - 1918) (318)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_fulllanguageproportion
1576Englishen0.6500
1577Maltesemt0.3125
\n", "
" ], "text/plain": [ " language_full language proportion\n", "1576 English en 0.6500\n", "1577 Maltese mt 0.3125" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Seamen's Strike Bulletin (Melbourne, Vic. : 1919) (1043)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_fulllanguageproportion
1584Polishpl0.4
1582Western Frisianfy0.2
1583Bosnianbs0.2
1585Russianru-Latn0.2
\n", "
" ], "text/plain": [ " language_full language proportion\n", "1584 Polish pl 0.4\n", "1582 Western Frisian fy 0.2\n", "1583 Bosnian bs 0.2\n", "1585 Russian ru-Latn 0.2" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Southern Australian (Adelaide, SA : 1838 - 1844) (171)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_fulllanguageproportion
1012Englishen0.904255
1011Catalanca0.074468
\n", "
" ], "text/plain": [ " language_full language proportion\n", "1012 English en 0.904255\n", "1011 Catalan ca 0.074468" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Southern Morning Herald (Goulburn, NSW : 1920 - 1923) (418)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_fulllanguageproportion
304Englishen0.909091
306Maltesemt0.077922
\n", "
" ], "text/plain": [ " language_full language proportion\n", "304 English en 0.909091\n", "306 Maltese mt 0.077922" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Stampa Italiana = The Italian Press (Perth, WA : 1931 - 1932) (1380)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_fulllanguageproportion
2026Italianit0.97
\n", "
" ], "text/plain": [ " language_full language proportion\n", "2026 Italian it 0.97" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Suedaustralische Zeitung (Adelaide, SA : 1850 - 1851) (314)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_fulllanguageproportion
1022Germande0.888889
1023Englishen0.111111
\n", "
" ], "text/plain": [ " language_full language proportion\n", "1022 German de 0.888889\n", "1023 English en 0.111111" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Sunday News (Sydney, NSW : 1919) (623)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_fulllanguageproportion
309Englishen0.739726
308Maltesemt0.219178
\n", "
" ], "text/plain": [ " language_full language proportion\n", "309 English en 0.739726\n", "308 Maltese mt 0.219178" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Sunday Times Edizione Italiana (Perth, WA : 1958 - 1959) (1379)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_fulllanguageproportion
2031Italianit1.0
\n", "
" ], "text/plain": [ " language_full language proportion\n", "2031 Italian it 1.0" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Sydney Chronicle (NSW : 1846 - 1848) (94)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_fulllanguageproportion
313Englishen0.923077
314Maltesemt0.076923
\n", "
" ], "text/plain": [ " language_full language proportion\n", "313 English en 0.923077\n", "314 Maltese mt 0.076923" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Süd Australische Zeitung (Tanunda and Adelaide, SA : 1860 - 1874) (278)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_fulllanguageproportion
1020Germande0.989691
\n", "
" ], "text/plain": [ " language_full language proportion\n", "1020 German de 0.989691" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Tasmanian Evening Herald (Launceston, Tas. : 1878) (1265)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_fulllanguageproportion
1154Englishen0.898876
1153Maltesemt0.067416
\n", "
" ], "text/plain": [ " language_full language proportion\n", "1154 English en 0.898876\n", "1153 Maltese mt 0.067416" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "The Advertiser (Adelaide, SA : 1889 - 1931) (34)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_fulllanguageproportion
1027Englishen0.513889
1028Maltesemt0.486111
\n", "
" ], "text/plain": [ " language_full language proportion\n", "1027 English en 0.513889\n", "1028 Maltese mt 0.486111" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "The Argus (Melbourne, Vic. : 1848 - 1957) (13)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_fulllanguageproportion
1619Maltesemt0.629630
1620Englishen0.358025
\n", "
" ], "text/plain": [ " language_full language proportion\n", "1619 Maltese mt 0.629630\n", "1620 English en 0.358025" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "The Castlereagh (Gilgandra, NSW : 1905 - 1907) (224)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_fulllanguageproportion
399Englishen0.741176
401Somaliso0.152941
400Maltesemt0.105882
\n", "
" ], "text/plain": [ " language_full language proportion\n", "399 English en 0.741176\n", "401 Somali so 0.152941\n", "400 Maltese mt 0.105882" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "The Chinese Advertiser (Ballarat, Vic. : 1856) (706)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_fulllanguageproportion
1646Chinesezh0.500000
1648Englishen0.333333
1647Scottish Gaelicgd0.166667
\n", "
" ], "text/plain": [ " language_full language proportion\n", "1646 Chinese zh 0.500000\n", "1648 English en 0.333333\n", "1647 Scottish Gaelic gd 0.166667" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "The Coolangatta Chronicle (Qld. : 1926) (1207)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_fulllanguageproportion
833Englishen0.869565
834Maltesemt0.130435
\n", "
" ], "text/plain": [ " language_full language proportion\n", "833 English en 0.869565\n", "834 Maltese mt 0.130435" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "The English and Chinese Advertiser (Vic. : 1856 - 1858) (685)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_fulllanguageproportion
1664Englishen0.894737
1665Chinesezh0.052632
1666Maltesemt0.052632
\n", "
" ], "text/plain": [ " language_full language proportion\n", "1664 English en 0.894737\n", "1665 Chinese zh 0.052632\n", "1666 Maltese mt 0.052632" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "The Goldfields Observer (Kalgoorlie, WA : 1930 - 1939) (1626)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_fulllanguageproportion
2095Englishen0.909091
2097Maltesemt0.051948
\n", "
" ], "text/plain": [ " language_full language proportion\n", "2095 English en 0.909091\n", "2097 Maltese mt 0.051948" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "The Gwydir Examiner and Moree General Advertiser (NSW : 1898 - 1899) (886)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_fulllanguageproportion
466Englishen0.910112
467Maltesemt0.078652
\n", "
" ], "text/plain": [ " language_full language proportion\n", "466 English en 0.910112\n", "467 Maltese mt 0.078652" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "The Melbourne Advertiser (Vic. : 1838) (935)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_fulllanguageproportion
1696Englishen0.666667
1697Welshcy0.333333
\n", "
" ], "text/plain": [ " language_full language proportion\n", "1696 English en 0.666667\n", "1697 Welsh cy 0.333333" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "The Mildura Irrigationist (Vic. : 1892 - 1893) (1583)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_fulllanguageproportion
1715Maltesemt0.795455
1714Englishen0.113636
1716Somaliso0.090909
\n", "
" ], "text/plain": [ " language_full language proportion\n", "1715 Maltese mt 0.795455\n", "1714 English en 0.113636\n", "1716 Somali so 0.090909" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "The Mildura Irrigationist and Murray River Agricultural Times (Vic. : 1888) (1581)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_fulllanguageproportion
1719Maltesemt0.750000
1718Somaliso0.132353
1717Englishen0.117647
\n", "
" ], "text/plain": [ " language_full language proportion\n", "1719 Maltese mt 0.750000\n", "1718 Somali so 0.132353\n", "1717 English en 0.117647" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "The Mildura Irrigationist and Murray River Cultural Advocate (Vic. : 1891 - 1892) (1582)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_fulllanguageproportion
1722Englishen0.523810
1721Maltesemt0.333333
1720Somaliso0.126984
\n", "
" ], "text/plain": [ " language_full language proportion\n", "1722 English en 0.523810\n", "1721 Maltese mt 0.333333\n", "1720 Somali so 0.126984" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "The Millicent Times (SA : 1891 - 1905) (970)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_fulllanguageproportion
1048Englishen0.94898
1049Catalanca0.05102
\n", "
" ], "text/plain": [ " language_full language proportion\n", "1048 English en 0.94898\n", "1049 Catalan ca 0.05102" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "The News, Shoalhaven, Broughton Creek and Ulladulla Advertiser (NSW : 1875 - 1877) (1678)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_fulllanguageproportion
537Englishen0.913978
538Catalanca0.086022
\n", "
" ], "text/plain": [ " language_full language proportion\n", "537 English en 0.913978\n", "538 Catalan ca 0.086022" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "The Phillips River Times (Ravensthorpe, WA : 1908 - 1909) (1546)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_fulllanguageproportion
2163Englishen0.9
2164Maltesemt0.1
\n", "
" ], "text/plain": [ " language_full language proportion\n", "2163 English en 0.9\n", "2164 Maltese mt 0.1" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "The Port Phillip Patriot and Morning Advertiser (Vic. : 1845 - 1848) (937)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_fulllanguageproportion
1729Englishen0.894737
1728Maltesemt0.084211
\n", "
" ], "text/plain": [ " language_full language proportion\n", "1729 English en 0.894737\n", "1728 Maltese mt 0.084211" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "The Richmond River Express and Casino Kyogle Advertiser (NSW : 1904 - 1929) (500)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_fulllanguageproportion
570Englishen0.734940
568Maltesemt0.168675
569Somaliso0.072289
\n", "
" ], "text/plain": [ " language_full language proportion\n", "570 English en 0.734940\n", "568 Maltese mt 0.168675\n", "569 Somali so 0.072289" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "The Sydney Wool and Stock Journal (NSW : 1899 - 1917) (452)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_fulllanguageproportion
639Englishen0.727273
637Maltesemt0.233766
\n", "
" ], "text/plain": [ " language_full language proportion\n", "639 English en 0.727273\n", "637 Maltese mt 0.233766" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "The Tasmanian (Launceston, Tas. : 1871 - 1879) (946)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_fulllanguageproportion
1216Englishen0.917808
1217Maltesemt0.082192
\n", "
" ], "text/plain": [ " language_full language proportion\n", "1216 English en 0.917808\n", "1217 Maltese mt 0.082192" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "The Teetotaller and General Newspaper (Sydney, NSW : 1842 - 1843) (1036)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_fulllanguageproportion
642Englishen0.95
\n", "
" ], "text/plain": [ " language_full language proportion\n", "642 English en 0.95" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "The Voice of Freedom = Elefthera Phoni (Perth, WA : 1956 - 1957) (1381)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_fulllanguageproportion
2203Modern Greek (1453-)el0.97
\n", "
" ], "text/plain": [ " language_full language proportion\n", "2203 Modern Greek (1453-) el 0.97" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "To Ethnico Vema = Greek National Tribune (Arncliffe, NSW : 1931 - 1954) (1592)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_fulllanguageproportion
690Modern Greek (1453-)el0.989362
\n", "
" ], "text/plain": [ " language_full language proportion\n", "690 Modern Greek (1453-) el 0.989362" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Tung Wah News (Sydney, NSW : 1898 - 1902) (1185)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_fulllanguageproportion
697Chinesezh0.926316
\n", "
" ], "text/plain": [ " language_full language proportion\n", "697 Chinese zh 0.926316" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Tung Wah Times (Sydney, NSW : 1901 - 1936) (1184)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_fulllanguageproportion
704Chinesezh0.968085
\n", "
" ], "text/plain": [ " language_full language proportion\n", "704 Chinese zh 0.968085" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Twofold Bay Telegraph (NSW : 1860) (479)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_fulllanguageproportion
715Englishen0.945652
716Maltesemt0.054348
\n", "
" ], "text/plain": [ " language_full language proportion\n", "715 English en 0.945652\n", "716 Maltese mt 0.054348" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Twofold Bay and Maneroo Observer (NSW : 1860) (394)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_fulllanguageproportion
709Englishen0.825581
710Maltesemt0.139535
\n", "
" ], "text/plain": [ " language_full language proportion\n", "709 English en 0.825581\n", "710 Maltese mt 0.139535" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Uniamoci (Sydney, NSW : 1903 - 1904) (1599)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_fulllanguageproportion
717Italianit1.0
\n", "
" ], "text/plain": [ " language_full language proportion\n", "717 Italian it 1.0" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Upper Hunter Courier (Murrurundi, NSW : 1871) (810)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_fulllanguageproportion
718Englishen0.857143
719Maltesemt0.142857
\n", "
" ], "text/plain": [ " language_full language proportion\n", "718 English en 0.857143\n", "719 Maltese mt 0.142857" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Vesnik (Perth, WA : 1975 - 1994) (1382)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_fulllanguageproportion
2234Macedonianmk0.410526
2233Englishen0.357895
2235Bulgarianbg-Latn0.221053
\n", "
" ], "text/plain": [ " language_full language proportion\n", "2234 Macedonian mk 0.410526\n", "2233 English en 0.357895\n", "2235 Bulgarian bg-Latn 0.221053" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Vil'na Dumka = Free Thought (Sydney, NSW : 1949 - 1954) (1593)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_fulllanguageproportion
720Ukrainianuk0.82
721Englishen0.18
\n", "
" ], "text/plain": [ " language_full language proportion\n", "720 Ukrainian uk 0.82\n", "721 English en 0.18" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Warwick Daily News (Qld. : 1919 -1954) (892)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_fulllanguageproportion
883Englishen0.835443
884Maltesemt0.139241
\n", "
" ], "text/plain": [ " language_full language proportion\n", "883 English en 0.835443\n", "884 Maltese mt 0.139241" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Williamstown Trade Circular (Vic. : 1855 - 1856) (213)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_fulllanguageproportion
1792Englishen0.875
1793Portuguesept0.125
\n", "
" ], "text/plain": [ " language_full language proportion\n", "1792 English en 0.875\n", "1793 Portuguese pt 0.125" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "for n, l in papers:\n", " if not l.loc[(~df['language'].isin(['en'])) & (df['proportion'] >= 0.05)].empty:\n", " print(f'\\n{n[0]} ({n[1]})')\n", " display(l[['language_full', 'language', 'proportion']].loc[(l['proportion'] > 0.05)].sort_values(by='proportion', ascending=False))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "I went through the titles above and compiled a list of title identifiers that seem to be producing dodgy results. We can use this to filter these newspapers out of our results." ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [], "source": [ "# Titles where dodgy OCR causes false positives in language detection\n", "# This was manually created after scanning results\n", "dodgy = ['1036', '1043', '1103', '116', '1207', '1265', '13', '1320', '1336', '140', '1400', '145', '1488', '1543', '1546', '1581', '1582', '1583', '1623', '1626', '1678', '171', '196', '213', '224', '286', '292', '318', '329', '34', '384', '389', '394', '418', '430', '431', '452', '479', '499', '500', '570', '623', '763', '810', '860', '886', '892', '906', '92', '926', '927', '935', '937', '94', '946', '970', '986']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here we'll add the dodgy title ids into our filter. It seems that we have 48 newspapers with significant amounts of non-English content." 
] }, { "cell_type": "code", "execution_count": 90, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "48" ] }, "execution_count": 90, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# The filter removes titles that only have one language, which is English\n", "filtered = df.loc[(~df['id'].isin(dodgy)) & (df['proportion'] >= 0.05)].groupby(by=['title', 'id']).filter(lambda x: (len(x) > 1) or (len(x)== 1 and x['language'] != 'en'))\n", "papers = filtered.groupby(by=['title', 'id'])\n", "len(papers)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's list them." ] }, { "cell_type": "code", "execution_count": 92, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "A Voz de Timor (Dili, East Timor : 1970 - 1975)\n", "Adelaider Deutsche Zeitung (SA : 1851 - 1862)\n", "Australische Zeitung (Adelaide, SA : 1875 - 1916)\n", "Berita Repoeblik (Djakarta, Indonesia : 1945 - 1946)\n", "Chinese Republic News (Sydney, NSW : 1914 - 1937)\n", "Chinese Times (Melbourne, Vic. 
: 1902 - 1922)\n", "Chung Wah News (Perth, WA : 1981 - 1987)\n", "Der Australische Spiegel = The Australian Mirror (Perth, WA : 1952)\n", "Deutsch-Australische Post : Wochenschrift = German-Australian Post : Weekly (Sydney, NSW : 1893 - 1906)\n", "Deutsche Zeitung für Sud-Australien = German Times for South Australia (Tanunda, SA : 1851)\n", "Die Brucke = The Bridge (Sydney, NSW : 1934 - 1939)\n", "Die Deutsche Post für die Australischen Colonien = The German Australian Post (Adelaide, SA : 1848 - 1851)\n", "Dutch Australian Weekly (Sydney, NSW : 1951 - 1993)\n", "Dutch Weekly (Sydney, NSW : 1993 - 2004)\n", "Echo : Polski Tygodnik Niezalezny (Perth, WA : 1950 - 1952)\n", "Eco Italiano (Perth, WA : 1958 - 1959)\n", "Guang yi hua bao = The Chinese Australian Herald (Sydney, NSW : 1894 - 1923)\n", "Hellenic Echo (Perth, WA : 1967 - 1968)\n", "Il Canguro = The Kangaroo (Perth, WA : 1955 - 1957)\n", "Il Giornale Italiano (Sydney, NSW : 1932 - 1940)\n", "Il Risveglio = The Awakening (Sydney, NSW : 1944 - 1954)\n", "Italian Bulletin of Australia (Sydney, NSW : 1922 - 1928, 1935 - 1940)\n", "Italian Bulletin of Commerce (Sydney, NSW : 1929 - 1935)\n", "Italo-Australian (Sydney, NSW : 1927 - 1940)\n", "Japanese Perth Times (Subiaco, WA : 1989 - 1996)\n", "L'Italo-Australiano = The Italo-Australian (Surry Hills, NSW : 1885)\n", "L'Italo-Australiano = The Italo-Australian (Sydney, NSW : 1905 - 1909)\n", "La Rondine (Perth, WA : 1969 - 1994)\n", "Le Courrier Australien (Sydney, NSW : 1892 - 2011)\n", "Mediterranean Voice (Perth, WA : 1971 - 1972)\n", "Meie Kodu = Our Home (Sydney, NSW : 1949 - 1956)\n", "Mu̇sų Pastogė = Our Haven (Sydney, NSW : 1950 - 1954)\n", "Nasza droga (Adelaide, SA : 1952 - 1954)\n", "Norden (Melbourne, Vic. 
: 1914 - 1918)\n", "Oceania (Sydney, NSW : 1913 - 1915)\n", "Stampa Italiana = The Italian Press (Perth, WA : 1931 - 1932)\n", "Suedaustralische Zeitung (Adelaide, SA : 1850 - 1851)\n", "Sunday Times Edizione Italiana (Perth, WA : 1958 - 1959)\n", "Süd Australische Zeitung (Tanunda and Adelaide, SA : 1860 - 1874)\n", "The Chinese Advertiser (Ballarat, Vic. : 1856)\n", "The English and Chinese Advertiser (Vic. : 1856 - 1858)\n", "The Voice of Freedom = Elefthera Phoni (Perth, WA : 1956 - 1957)\n", "To Ethnico Vema = Greek National Tribune (Arncliffe, NSW : 1931 - 1954)\n", "Tung Wah News (Sydney, NSW : 1898 - 1902)\n", "Tung Wah Times (Sydney, NSW : 1901 - 1936)\n", "Uniamoci (Sydney, NSW : 1903 - 1904)\n", "Vesnik (Perth, WA : 1975 - 1994)\n", "Vil'na Dumka = Free Thought (Sydney, NSW : 1949 - 1954)\n" ] } ], "source": [ "for n, l in papers:\n", "    print(n[0])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "That's looking pretty good. Let's save the results as a Markdown file to make it easy to explore. We'll include links into Trove. Here's the [list of all 48 newspapers](non-english-newspapers.md) (also as a [Gist](https://gist.github.com/wragge/9aa385648cff5f0de0c7d4837896df97))." ] }, { "cell_type": "code", "execution_count": 97, "metadata": {}, "outputs": [], "source": [ "with open(Path('non-english-newspapers.md'), 'w') as md_file:\n", "    i = 1\n", "    for n, l in papers:\n", "        md_file.write(f'\\n### {i}. 
[{n[0]}](http://nla.gov.au/nla.news-title{n[1]})\\n\\n')\n", "        md_file.write('| Language | Language code | Proportion of sample |\\n')\n", "        md_file.write('|---|---|---|\\n')\n", "        for row in l[['language_full', 'language', 'proportion']].loc[(l['proportion'] > 0.05)].sort_values(by='proportion', ascending=False).itertuples():\n", "            md_file.write(f'| {row.language_full} | {row.language} | {row.proportion} |\\n')\n", "        i += 1" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you look at the Markdown file you'll see that there are still some dodgy results – for example, about 17% of the sample from the *Chinese Advertiser* is detected as 'Scottish Gaelic'. But the point of this exercise was to find non-English newspapers, rather than accurately detect the proportion of non-English content, so I think we can live with it for now." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "----\n", "\n", "Created by [Tim Sherratt](https://timsherratt.org/) for the [GLAM Workbench](https://glam-workbench.github.io/)." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.5" } }, "nbformat": 4, "nbformat_minor": 4 }