{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Finding non-English newspapers in Trove\n", "\n", "There are a growing number of non-English newspapers digitised in Trove. However, if you're only searching using English keywords, you might never know that they're there. I thought it would be useful to generate a list of non-English newspapers, but it wasn't quite as straightforward as I thought.\n", "\n", "## How not to do it...\n", "\n", "My first thought was I could start by searching for digitised newspapers amongst the library records in Trove. My theory was that catalogue metadata would include language information. For example, you can search for newspapers using `format:Periodical/Newspaper` in the books and libraries category (or the `article` API zone). To find those that are digitised, you can add a search for 'trove.nla.gov.au'. Here's the [sort of results](https://trove.nla.gov.au/search/category/books?keyword=%22trove.nla.gov.au%22%20format%3APeriodical%2FNewspaper) you get. Unfortunately, you only get about 826 results and there are many more newspapers than that in Trove. It seems links to digitised newspapers are not consistently recorded.\n", "\n", "My second approach was to get the list of digitised newspapers from the API, extract the ISSN, then use this to search for catalogue records. 
Here's the code snippet I used.\n", "\n", "``` python\n", "params = {\n", "    'zone': 'article',\n", "    'encoding': 'json',\n", "    'l-format': 'Periodical/Newspaper',\n", "    'reclevel': 'full',\n", "    'key': TROVE_API_KEY\n", "}\n", "newspapers = get_newspapers()\n", "for newspaper in newspapers:\n", "    print(f'\\n{newspaper[\"title\"]}')\n", "    issn = newspaper.get('issn')\n", "    params['q'] = f'issn:{issn}'\n", "    response = s.get('https://api.trove.nla.gov.au/v2/result', params=params)\n", "    data = response.json()\n", "    try:\n", "        works = data['response']['zone'][0]['records']['work']\n", "    except KeyError:\n", "        print('Not found')\n", "    else:\n", "        for work in works:\n", "            print(work.get('language'))\n", "    # Only pause after requests that actually hit the API (not the local cache)\n", "    if not response.from_cache:\n", "        time.sleep(0.2)\n", "```\n", "\n", "The main problem here is that not all titles have ISSNs. You could try searching on the titles if there's no ISSN, but this would involve a fair bit of disambiguation. In any case, running this showed that while there is some language information in the metadata, it's not consistently applied. So basically a metadata-only approach is not going to work. Sigh...\n", "\n", "## How I actually did it\n", "\n", "If I couldn't get language details from the metadata, then I had to try to extract them from the resource itself. I spent quite a bit of time looking around for Python packages that provided reliable language detection. The first one I tried regularly identified Mandarin as Korean (it turns out this was a known issue). Another one sent me into dependency hell. Finally I found [pycld3](https://pypi.org/project/pycld3/), which installed with `pip` and *just worked*.\n", "\n", "My plan was to get the list of newspapers via the API as before, then fire off an empty search for each one. I'd then loop through the results, running the language detector over the article text. I set the query parameters to retrieve the maximum number of results in one request: 100. 
That seemed like a reasonable sample. To give the language detector enough text to work with, I set the number of words parameter to return only articles with between 100 and 1000 words. So the query parameters I used were:\n", "\n", "``` python\n", "params = {\n", "    'zone': 'newspaper',\n", "    'encoding': 'json',\n", "    'l-word': '100 - 1000 Words',\n", "    'include': 'articletext',\n", "    'key': TROVE_API_KEY,\n", "    'q': ' ',\n", "    'n': 100,\n", "}\n", "```\n", "\n", "Because some of the newspapers had short runs and the word count filter limits the results, I wasn't always getting 100 results per newspaper. To work around this, I detected the likely language of each article, counted the number of articles in each language, and then divided each count by the total number of articles with a reliable language prediction. This gave me the proportion of articles in each language, a number I could compare across newspapers to find the non-English titles.\n", "\n", "In general this worked pretty well, and the result was a [list of 48 newspapers](non-english-newspapers.md) (also as a [Gist](https://gist.github.com/wragge/9aa385648cff5f0de0c7d4837896df97)) that have significant amounts of non-English content. However, I had to do a fair bit of fiddling to filter out dodgy results. All the details are included below.\n", "\n", "## Problems / limitations\n", "\n", "* It's no surprise that the results of the language detection are affected by the quality of the OCR.\n", "* In filtering out what seems to be the product of dodgy OCR, I might be excluding some non-English content.\n", "* I'm only detecting the predominant language for each article, so articles containing a mix of languages might be missed.\n", "* I'm just taking the first 100 results from a blank search in each newspaper. 
Larger or more randomised samples might produce different results.\n", "* Some dodgy detection results remain in the list of newspapers, but the point of this exercise was to find non-English newspapers. If you wanted to accurately determine the quantity of non-English content, you'd have to do a lot more fine-grained analysis." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Import what we need" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import requests\n", "import time\n", "import requests_cache\n", "from requests.adapters import HTTPAdapter\n", "from requests.packages.urllib3.util.retry import Retry\n", "from collections import Counter\n", "import re\n", "from langdetect import detect\n", "from tqdm.auto import tqdm\n", "import pandas as pd\n", "import cld3\n", "import pycountry\n", "from language_tags import tags\n", "import altair as alt\n", "from pathlib import Path\n", "\n", "# Cache API responses, and retry requests that fail with 502, 503 or 504 errors\n", "s = requests_cache.CachedSession()\n", "retries = Retry(total=5, backoff_factor=1, status_forcelist=[ 502, 503, 504 ])\n", "s.mount('https://', HTTPAdapter(max_retries=retries))\n", "s.mount('http://', HTTPAdapter(max_retries=retries))" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "TROVE_API_KEY = '[YOUR API KEY]'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Harvest the data and run language detection on articles" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "def get_newspapers():\n", "    '''\n", "    Get a list of newspapers in Trove.\n", "    '''\n", "    response = s.get('https://api.trove.nla.gov.au/v2/newspaper/titles', params={'encoding': 'json', 'key': TROVE_API_KEY})\n", "    data = response.json()\n", "    return data['response']['records']['newspaper']" ] }, { "cell_type": "code", 
"execution_count": 4, "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "076eae9c38bb4db5b750a50007434ebb", "version_major": 2, "version_minor": 0 }, "text/plain": [ " 0%| | 0/1666 [00:00', '', text)\n", " text = re.sub(\"\\s\\s+\", \" \", text)\n", " # Get the language\n", " ld = cld3.get_language(text)\n", " # If the language prediction is reliable, save it\n", " if ld.is_reliable:\n", " langs.append(ld.language)\n", " # Find the count of each language detected in the sample of articles\n", " for lang, count in dict(Counter(langs)).items():\n", " # Calculate the language count as a proportion of the total number of results\n", " prop = int(count) / len(langs)\n", " newspaper_langs.append({'id': newspaper['id'], 'title': newspaper['title'], 'language': lang, 'proportion': prop, 'number': n})\n", " if not response.from_cache:\n", " time.sleep(0.2)\n", " " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Convert the results into a dataframe." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " id title language \\\n", "0 166 Canberra Community News (ACT : 1925 - 1927) en \n", "1 165 Canberra Illustrated: A Quarterly Magazine (AC... en \n", "2 69 Federal Capital Pioneer (Canberra, ACT : 1924 ... en \n", "3 871 Good Neighbour (ACT : 1950 - 1969) en \n", "4 665 Student Notes/Canberra University College Stud... en \n", "\n", " proportion number \n", "0 1.0 100 \n", "1 1.0 29 \n", "2 1.0 100 \n", "3 1.0 100 \n", "4 1.0 100 " ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = pd.DataFrame(newspaper_langs)\n", "df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Add full language names\n", "\n", "The language detector returns BCP-47-style language codes. To translate these into something that's a bit easier for humans to understand, we can use the [language-tags](https://github.com/OnroerendErfgoed/language-tags) package." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "def get_full_language(lc):\n", "    '''\n", "    Get full language names from codes\n", "    '''\n", "    lang = tags.description(lc)\n", "    if lang:\n", "        return lang[0]\n", "    else:\n", "        # Report any code that language-tags doesn't recognise, and keep the code as the name\n", "        print(lc)\n", "        return lc\n", "\n", "df['language_full'] = df['language'].apply(get_full_language)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Filtering the results\n", "\n", "If we just look at the number of newspapers in which each language was detected, we might think that Australia's cultural diversity was much greater than we expected! But the likelihood that there were ten newspapers publishing articles in Igbo (the language of the Igbo people in south-eastern Nigeria) seems small. Obviously many of these results are false positives."
] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "English 1608\n", "Maltese 285\n", "Catalan 52\n", "Welsh 36\n", "Japanese 32\n", "Italian 31\n", "Norwegian 24\n", "Somali 24\n", "Danish 18\n", "German 17\n", "Portuguese 10\n", "Igbo 10\n", "French 10\n", "Samoan 10\n", "Chinese 8\n", "Estonian 8\n", "Luxembourgish 8\n", "Hawaiian 8\n", "Scottish Gaelic 8\n", "Western Frisian 7\n", "Vietnamese 7\n", "Corsican 6\n", "Russian 6\n", "Modern Greek (1453-) 5\n", "Filipino 5\n", "Swedish 5\n", "Bulgarian 4\n", "Afrikaans 4\n", "Polish 4\n", "Indonesian 4\n", "Javanese 4\n", "Hindi 4\n", "Malagasy 4\n", "Haitian 3\n", "Latin 3\n", "Malay (macrolanguage) 3\n", "Dutch 3\n", "Cebuano 2\n", "Kurdish 2\n", "Shona 2\n", "Hebrew 2\n", "Bosnian 2\n", "Ukrainian 2\n", "Spanish 2\n", "Yiddish 2\n", "Irish 2\n", "Albanian 2\n", "Maori 1\n", "Turkish 1\n", "Slovak 1\n", "Zulu 1\n", "Marathi 1\n", "Galician 1\n", "Czech 1\n", "Croatian 1\n", "Macedonian 1\n", "Lithuanian 1\n", "Slovenian 1\n", "Name: language_full, dtype: int64" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['language_full'].value_counts()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Remember that for each language detected in a newspaper we calculated the proportion of articles in our results set in that language. So we can, for example, just look at newspapers where 100% of the articles are in a single language. This highlights a few non-English language newspapers, but obviously we're missing a lot of others." 
] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "English 1144\n", "German 3\n", "Italian 3\n", "Modern Greek (1453-) 1\n", "Estonian 1\n", "Portuguese 1\n", "Name: language_full, dtype: int64" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.loc[df['proportion'] == 1]['language_full'].value_counts()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If we chart the proportions, we see them bunched up at either end of the scale. So there are lots of languages detected in only a small proportion of articles." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "" ], "text/plain": [ "alt.Chart(...)" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "alt.Chart(df).mark_bar().encode(\n", "    x=alt.X('proportion:Q', bin=True),\n", "    y='count():Q'\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If we zoom in on the proportions less than 0.1 (that's 10 articles in a sample of 100), we see that most of them are tiny, around 0.01 (or 1 article in 100). It seems likely that these are false positives." 
\n", "" ], "text/plain": [ "alt.Chart(...)" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "alt.Chart(df.loc[df['proportion'] < 0.1]).mark_bar().encode(\n", "    x=alt.X('proportion:Q', bin=True),\n", "    y='count():Q'\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's be fairly conservative and filter out languages that have a proportion (per newspaper) of less than 0.05. This list seems a bit more in line with what we would expect, but there are still some surprises. Did 50 newspapers really publish articles in Maltese?" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "English 1601\n", "Maltese 50\n", "Italian 14\n", "German 9\n", "Chinese 8\n", "Catalan 6\n", "Somali 5\n", "Modern Greek (1453-) 4\n", "French 3\n", "Polish 3\n", "Japanese 3\n", "Portuguese 3\n", "Western Frisian 2\n", "Yiddish 2\n", "Dutch 2\n", "Malay (macrolanguage) 1\n", "Indonesian 1\n", "Bosnian 1\n", "Russian 1\n", "Estonian 1\n", "Ukrainian 1\n", "Lithuanian 1\n", "Danish 1\n", "Spanish 1\n", "Macedonian 1\n", "Corsican 1\n", "Welsh 1\n", "Bulgarian 1\n", "Vietnamese 1\n", "Scottish Gaelic 1\n", "Samoan 1\n", "Name: language_full, dtype: int64" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.loc[df['proportion'] >= 0.05]['language_full'].value_counts()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If we focus on the newspapers that supposedly have a significant proportion of articles in Maltese, we see some very strange results. I seriously doubt that 80% of the *Mildura Irrigationist* from 1892-3 is in Maltese. So what's going on?" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " id title language \\\n", "222 1596 L'Italo-Australiano = The Italo-Australian (Su... mt \n", "314 623 Sunday News (Sydney, NSW : 1919) mt \n", "414 224 The Castlereagh (Gilgandra, NSW : 1905 - 1907) mt \n", "582 500 The Richmond River Express and Casino Kyogle A... mt \n", "652 452 The Sydney Wool and Stock Journal (NSW : 1899 ... mt \n", "725 394 Twofold Bay and Maneroo Observer (NSW : 1860) mt \n", "734 810 Upper Hunter Courier (Murrurundi, NSW : 1871) mt \n", "857 1207 The Coolangatta Chronicle (Qld. : 1926) mt \n", "907 892 Warwick Daily News (Qld. : 1919 -1954) mt \n", "1052 34 The Advertiser (Adelaide, SA : 1889 - 1931) mt \n", "1539 384 North Melbourne Gazette (Vic. : 1894 - 1901) mt \n", "1603 318 Sandringham Southern Cross (Vic. : 1914 - 1918) mt \n", "1645 13 The Argus (Melbourne, Vic. : 1848 - 1957) mt \n", "1754 1583 The Mildura Irrigationist (Vic. : 1892 - 1893) mt \n", "1758 1581 The Mildura Irrigationist and Murray River Agr... mt \n", "1760 1582 The Mildura Irrigationist and Murray River Cul... mt \n", "2027 1543 Murchison Times and Cue-Big Bell-Reedy Advocat... 
mt \n", "2119 1617 The Derby News (WA : 1887) mt \n", "\n", " proportion number language_full \n", "222 0.218750 100 Maltese \n", "314 0.219178 100 Maltese \n", "414 0.105882 100 Maltese \n", "582 0.168675 100 Maltese \n", "652 0.233766 100 Maltese \n", "725 0.139535 100 Maltese \n", "734 0.142857 14 Maltese \n", "857 0.130435 26 Maltese \n", "907 0.139241 100 Maltese \n", "1052 0.486111 100 Maltese \n", "1539 0.146341 100 Maltese \n", "1603 0.312500 100 Maltese \n", "1645 0.629630 100 Maltese \n", "1754 0.795455 100 Maltese \n", "1758 0.739130 100 Maltese \n", "1760 0.333333 100 Maltese \n", "2027 0.137500 100 Maltese \n", "2119 0.750000 5 Maltese " ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.loc[(df['proportion'] > 0.1) & (df['language_full'] == 'Maltese')]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you look at results for the *Mildura Irrigationist* [in Trove](https://trove.nla.gov.au/search/advanced/category/newspapers?l-advtitle=1583&l-advWord=100%20-%201000%20Words) you'll see that many of the page images are blurry, and as a result the OCR is very, very bad. Here's a sample:\n", "\n", "> ill Tatr W lyltwililUmt aat aa«v aa MwOkaWtOPMlkMrf faiflftMMRltitlWBfMNM fmiMW^M^K IMIOHIpM^fQBMMI ft tWMmrwl tWWiltjfNMStW ffw aailwt«M wtMitiar«lH*a ifcmH af tlw ial«««l ion «M««f ffantoif wwtMaaM. tto tf h «frwringmhw torf M hr toaiy. Im*4. ar, fc> mmirf awlUW wefllaM aA. aaytMaa. l «Wa A tfc» tow waliw Macks b aaM, b wil fVfbH Ja ^IMntaam* Mm' ls tolliac. rt Tto aad nf ttoar UhKMimiw*a afM» ftjrwl ans W l OtfWOar jpaaofTwSi aJwwr la'aahS^*— attor aakwt mm rvfimMiMh* ttoai. day - Why. aa IH thrf t«fl almd yaa.\"iw. aal wwifciha m OiO all tto laM amnavaA, fawawNl I r aa4 f wa* tm enr a Mtcfc tto watrr tto wiaaal m a* a* day pfaMat. 
aa4 (h* ilj amintir* ilm tTtsjtvL.f**' \"\"j •fria—lhati* tow ««4M k.\" tlml t | r 4m» wtn .aa rUa* I h ha«« t ctoantaf InMM* aM*toclt ttopnaMaf II It la Mat rtgM, t jmi awl a 1 : af but d awtliqg a Mr. Jafc Matwa-(MMa M t «wl y gha yaar «toa anl yaar (ma as «fpai ta af t«l. i pwwiaf Mtan (tot jw. twy MwUI «*a1 a«ry ftajr «ndl tar tlw aad annaH* a*«r aarf a««r aaria. tiaa" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What happens when we feed this fragment of bad OCR to the language detector? Remarkably, the language detector is 96% sure that it's Maltese! To find out why this is the case, we'd probably have to dig into the way the language detection model was trained. But for our purposes it's enough to know that some of the languages detected seem to be the result of bad OCR." ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "LanguagePrediction(language='mt', probability=0.960280179977417, is_reliable=True, proportion=1.0)" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ocr = '''ill Tatr W lyltwililUmt aat aa«v aa MwOkaWtOPMlkMrf faiflftMMRltitlWBfMNM fmiMW^M^K IMIOHIpM^fQBMMI ft tWMmrwl tWWiltjfNMStW ffw aailwt«M wtMitiar«lH*a ifcmH af tlw ial«««l ion «M««f ffantoif wwtMaaM. tto tf h «frwringmhw torf M hr toaiy. Im*4. ar, fc> mmirf awlUW wefllaM aA. aaytMaa. l «Wa A tfc» tow waliw Macks b aaM, b wil fVfbH Ja ^IMntaam* Mm' ls tolliac. rt Tto aad nf ttoar UhKMimiw*a afM» ftjrwl ans W l OtfWOar jpaaofTwSi aJwwr la'aahS^*— attor aakwt mm rvfimMiMh* ttoai. day - Why. aa IH thrf t«fl almd yaa.\"iw. aal wwifciha m OiO all tto laM amnavaA, fawawNl I r aa4 f wa* tm enr a Mtcfc tto watrr tto wiaaal m a* a* day pfaMat. aa4 (h* ilj amintir* ilm tTtsjtvL.f**' \"\"j •fria—lhati* tow ««4M k.\" tlml t | r 4m» wtn .aa rUa* I h ha«« t ctoantaf InMM* aM*toclt ttopnaMaf II It la Mat rtgM, t jmi awl a 1 : af but d awtliqg a Mr. 
Jafc Matwa-(MMa M t «wl y gha yaar «toa anl yaar (ma as «fpai ta af t«l. i pwwiaf Mtan (tot jw. twy MwUI «*a1 a«ry ftajr «ndl tar tlw aad annaH* a*«r aarf a««r aaria. tiaa'''\n", "cld3.get_language(ocr)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Of course there might actually be newspapers with articles in Maltese, so we don't want to filter them all out. Instead, let's do some manual inspection of the newspapers that *seem* to have non-English content. First we'll filter our results to include only languages with proportions of at least 0.05, and then drop out newspapers that seem to be only in English. We end up with 111 different titles." ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "111" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Keep only newspapers with at least one non-English language above the threshold (this drops English-only titles).\n", "filtered = df.loc[df['proportion'] >= 0.05].groupby(by=['title', 'id']).filter(lambda x: (x['language'] != 'en').any())\n", "papers = filtered.groupby(by=['title', 'id'])\n", "len(papers)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's list those 111 newspapers. From the list below, I think it's pretty easy to pick out the results that are likely to be the product of bad OCR." ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "A Voz de Timor (Dili, East Timor : 1970 - 1975) (1498)\n" ] }, { "data": { "text/html": [ "
" ], "text/plain": [ " language_full language proportion\n", "9 Portuguese pt 1.0" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Adelaide Chronicle and South Australian Literary Record (SA : 1840 - 1842) (986)\n" ] }, { "data": { "text/html": [ "
" ], "text/plain": [ " language_full language proportion\n", "917 English en 0.929293\n", "916 Catalan ca 0.070707" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Adelaide Independent and Cabinet of Amusement (SA : 1841) (1336)\n" ] }, { "data": { "text/html": [ "
" ], "text/plain": [ " language_full language proportion\n", "918 English en 0.928571\n", "920 Catalan ca 0.061224" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Adelaider Deutsche Zeitung (SA : 1851 - 1862) (277)\n" ] }, { "data": { "text/html": [ "
" ], "text/plain": [ " language_full language proportion\n", "927 German de 1.0" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Auburn and District News (NSW : 1929) (1320)\n" ] }, { "data": { "text/html": [ "
" ], "text/plain": [ " language_full language proportion\n", "40 English en 0.947368\n", "41 Vietnamese vi 0.052632" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Australische Zeitung (Adelaide, SA : 1875 - 1916) (1150)\n" ] }, { "data": { "text/html": [ "
" ], "text/plain": [ " language_full language proportion\n", "931 German de 1.0" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Bangkok Recorder (Thailand : 1865 - 1867) (1488)\n" ] }, { "data": { "text/html": [ "
" ], "text/plain": [ " language_full language proportion\n", "10 English en 0.925532\n", "11 Maltese mt 0.053191" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Berita Repoeblik (Djakarta, Indonesia : 1945 - 1946) (1283)\n" ] }, { "data": { "text/html": [ "
" ], "text/plain": [ " language_full language proportion\n", "14 Malay (macrolanguage) ms 0.891304\n", "15 Indonesian id 0.108696" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Bulong Bulletin and Mining Register (WA : 1897 - 1898) (1400)\n" ] }, { "data": { "text/html": [ "
" ], "text/plain": [ " language_full language proportion\n", "1852 English en 0.913043\n", "1853 Maltese mt 0.086957" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Chinese Republic News (Sydney, NSW : 1914 - 1937) (1186)\n" ] }, { "data": { "text/html": [ "
" ], "text/plain": [ " language_full language proportion\n", "83 Chinese zh 0.945652" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Chinese Times (Melbourne, Vic. : 1902 - 1922) (705)\n" ] }, { "data": { "text/html": [ "
" ], "text/plain": [ " language_full language proportion\n", "1330 Chinese zh 0.843373" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Chronicle and North Coast Advertiser (Qld. : 1903 - 1922) (286)\n" ] }, { "data": { "text/html": [ "
" ], "text/plain": [ " language_full language proportion\n", "780 English en 0.93617\n", "781 Maltese mt 0.06383" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Chung Wah News (Perth, WA : 1981 - 1987) (1383)\n" ] }, { "data": { "text/html": [ "
" ], "text/plain": [ " language_full language proportion\n", "1870 English en 0.637363\n", "1869 Chinese zh 0.263736" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Colac Reformer (Vic. : 1914 - 1918) (763)\n" ] }, { "data": { "text/html": [ "
" ], "text/plain": [ " language_full language proportion\n", "1350 English en 0.947917\n", "1351 Maltese mt 0.052083" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Daily Post (Hobart, Tas. : 1908 - 1918) (860)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_full language proportion
1141 English en 0.704545
1140 Japanese ja 0.125000
\n", "
" ], "text/plain": [ " language_full language proportion\n", "1141 English en 0.704545\n", "1140 Japanese ja 0.125000" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Der Australische Spiegel = The Australian Mirror (Perth, WA : 1952) (1385)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_full language proportion
1895 German de 0.83
1896 English en 0.17
\n", "
" ], "text/plain": [ " language_full language proportion\n", "1895 German de 0.83\n", "1896 English en 0.17" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Deutsch-Australische Post : Wochenschrift = German-Australian Post : Weekly (Sydney, NSW : 1893 - 1906) (1600)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_full language proportion
130 German de 1.0
\n", "
" ], "text/plain": [ " language_full language proportion\n", "130 German de 1.0" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Deutsche Zeitung für Sud-Australien = German Times for South Australia (Tanunda, SA : 1851) (1577)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_full language proportion
945 German de 0.9
944 English en 0.1
\n", "
" ], "text/plain": [ " language_full language proportion\n", "945 German de 0.9\n", "944 English en 0.1" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Die Brucke = The Bridge (Sydney, NSW : 1934 - 1939) (1591)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_full language proportion
131 German de 0.729167
132 English en 0.270833
\n", "
" ], "text/plain": [ " language_full language proportion\n", "131 German de 0.729167\n", "132 English en 0.270833" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Die Deutsche Post für die Australischen Colonien = The German Australian Post (Adelaide, SA : 1848 - 1851) (1576)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_full language proportion
946 German de 0.989691
\n", "
" ], "text/plain": [ " language_full language proportion\n", "946 German de 0.989691" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Dutch Australian Weekly (Sydney, NSW : 1951 - 1993) (1044)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_full language proportion
136 Dutch nl 0.882979
137 English en 0.106383
\n", "
" ], "text/plain": [ " language_full language proportion\n", "136 Dutch nl 0.882979\n", "137 English en 0.106383" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Dutch Weekly (Sydney, NSW : 1993 - 2004) (1045)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_full language proportion
139 Dutch nl 0.924731
140 English en 0.053763
\n", "
" ], "text/plain": [ " language_full language proportion\n", "139 Dutch nl 0.924731\n", "140 English en 0.053763" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Echo : Polski Tygodnik Niezalezny (Perth, WA : 1950 - 1952) (1384)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_full language proportion
1901 Polish pl 0.91
1902 English en 0.09
\n", "
" ], "text/plain": [ " language_full language proportion\n", "1901 Polish pl 0.91\n", "1902 English en 0.09" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Eco Italiano (Perth, WA : 1958 - 1959) (1387)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_full language proportion
1903 Italian it 1.0
\n", "
" ], "text/plain": [ " language_full language proportion\n", "1903 Italian it 1.0" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Emu Bay Times and North West and West Coast Advocate (Tas. : 1897 - 1899) (116)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_full language proportion
1157 English en 0.929412
1158 Maltese mt 0.070588
\n", "
" ], "text/plain": [ " language_full language proportion\n", "1157 English en 0.929412\n", "1158 Maltese mt 0.070588" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Evelyn Observer, and South and East Bourke Record (Vic. : 1882 - 1902) (145)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_full language proportion
1384 English en 0.913978
1383 Maltese mt 0.075269
\n", "
" ], "text/plain": [ " language_full language proportion\n", "1384 English en 0.913978\n", "1383 Maltese mt 0.075269" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Geelong Advertiser (Vic. : 1840 - 1845) (292)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_full language proportion
1405 English en 0.904255
1404 Samoan sm 0.074468
\n", "
" ], "text/plain": [ " language_full language proportion\n", "1405 English en 0.904255\n", "1404 Samoan sm 0.074468" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Geraldton Advocate and Johnstone River Guardian (Qld. : 1895 - 1896) (1103)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_full language proportion
789 English en 0.910112
790 Maltese mt 0.089888
\n", "
" ], "text/plain": [ " language_full language proportion\n", "789 English en 0.910112\n", "790 Maltese mt 0.089888" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Geraldton Express and Murchison Goldfields News (WA : 1894 - 1896) (1623)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_full language proportion
1914 English en 0.661538
1918 Maltese mt 0.076923
1915 Japanese ja 0.061538
\n", "
" ], "text/plain": [ " language_full language proportion\n", "1914 English en 0.661538\n", "1918 Maltese mt 0.076923\n", "1915 Japanese ja 0.061538" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Guang yi hua bao = The Chinese Australian Herald (Sydney, NSW : 1894 - 1923) (704)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_full language proportion
174 Chinese zh 0.803030
177 Western Frisian fy 0.075758
\n", "
" ], "text/plain": [ " language_full language proportion\n", "174 Chinese zh 0.803030\n", "177 Western Frisian fy 0.075758" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Hamilton Spectator and Grange District Advertiser (Vic. : 1860 - 1870) (927)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_full language proportion
1436 English en 0.921348
1435 Maltese mt 0.078652
\n", "
" ], "text/plain": [ " language_full language proportion\n", "1436 English en 0.921348\n", "1435 Maltese mt 0.078652" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Healesville Guardian (Vic. : 1893 - 1898) (140)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_full language proportion
1441 English en 0.938144
1442 Maltese mt 0.051546
\n", "
" ], "text/plain": [ " language_full language proportion\n", "1441 English en 0.938144\n", "1442 Maltese mt 0.051546" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Hellenic Echo (Perth, WA : 1967 - 1968) (1389)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_full language proportion
1956 Modern Greek (1453-) el 1.0
\n", "
" ], "text/plain": [ " language_full language proportion\n", "1956 Modern Greek (1453-) el 1.0" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Il Canguro = The Kangaroo (Perth, WA : 1955 - 1957) (1378)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_full language proportion
1958 Italian it 0.97
\n", "
" ], "text/plain": [ " language_full language proportion\n", "1958 Italian it 0.97" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Il Giornale Italiano (Sydney, NSW : 1932 - 1940) (279)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_full language proportion
190 Italian it 0.92
191 English en 0.08
\n", "
" ], "text/plain": [ " language_full language proportion\n", "190 Italian it 0.92\n", "191 English en 0.08" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Il Risveglio = The Awakening (Sydney, NSW : 1944 - 1954) (1601)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_full language proportion
192 Italian it 0.777778
193 English en 0.222222
\n", "
" ], "text/plain": [ " language_full language proportion\n", "192 Italian it 0.777778\n", "193 English en 0.222222" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Inglewood Advertiser (Vic. : 1914 - 1918) (570)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_full language proportion
1461 English en 0.936842
1462 Maltese mt 0.063158
\n", "
" ], "text/plain": [ " language_full language proportion\n", "1461 English en 0.936842\n", "1462 Maltese mt 0.063158" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Italian Bulletin of Australia (Sydney, NSW : 1922 - 1928, 1935 - 1940) (1602)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_full language proportion
203 English en 0.840426
204 Italian it 0.159574
\n", "
" ], "text/plain": [ " language_full language proportion\n", "203 English en 0.840426\n", "204 Italian it 0.159574" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Italian Bulletin of Commerce (Sydney, NSW : 1929 - 1935) (1603)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_full language proportion
205 English en 0.903226
206 Italian it 0.096774
\n", "
" ], "text/plain": [ " language_full language proportion\n", "205 English en 0.903226\n", "206 Italian it 0.096774" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Italo-Australian (Sydney, NSW : 1927 - 1940) (1595)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_full language proportion
207 Italian it 0.909091
208 English en 0.090909
\n", "
" ], "text/plain": [ " language_full language proportion\n", "207 Italian it 0.909091\n", "208 English en 0.090909" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Japanese Perth Times (Subiaco, WA : 1989 - 1996) (1386)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_full language proportion
1963 Japanese ja 0.93617
1964 English en 0.06383
\n", "
" ], "text/plain": [ " language_full language proportion\n", "1963 Japanese ja 0.93617\n", "1964 English en 0.06383" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Katoomba Times (NSW : 1889 - 1894) (906)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_full language proportion
211 English en 0.934066
213 Maltese mt 0.054945
\n", "
" ], "text/plain": [ " language_full language proportion\n", "211 English en 0.934066\n", "213 Maltese mt 0.054945" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Kyabram Union (Vic. : 1886 - 1894) (196)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_full language proportion
1482 English en 0.921348
1483 Maltese mt 0.056180
\n", "
" ], "text/plain": [ " language_full language proportion\n", "1482 English en 0.921348\n", "1483 Maltese mt 0.056180" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "L'Italo-Australiano = The Italo-Australian (Surry Hills, NSW : 1885) (1596)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_full language proportion
221 Italian it 0.68750
222 Maltese mt 0.21875
\n", "
" ], "text/plain": [ " language_full language proportion\n", "221 Italian it 0.68750\n", "222 Maltese mt 0.21875" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "L'Italo-Australiano = The Italo-Australian (Sydney, NSW : 1905 - 1909) (1597)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_full language proportion
227 Italian it 0.95
\n", "
" ], "text/plain": [ " language_full language proportion\n", "227 Italian it 0.95" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "La Rondine (Perth, WA : 1969 - 1994) (1388)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_full language proportion
1981 Italian it 0.928571
1982 English en 0.071429
\n", "
" ], "text/plain": [ " language_full language proportion\n", "1981 Italian it 0.928571\n", "1982 English en 0.071429" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Laura Standard and Crystal Brook Courier (SA : 1917 - 1948) (926)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_full language proportion
963 English en 0.931034
964 Maltese mt 0.068966
\n", "
" ], "text/plain": [ " language_full language proportion\n", "963 English en 0.931034\n", "964 Maltese mt 0.068966" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Le Courrier Australien (Sydney, NSW : 1892 - 2011) (829)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_full language proportion
232 French fr 0.816327
233 English en 0.173469
\n", "
" ], "text/plain": [ " language_full language proportion\n", "232 French fr 0.816327\n", "233 English en 0.173469" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Mediterranean Voice (Perth, WA : 1971 - 1972) (1390)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_full language proportion
2000 Modern Greek (1453-) el 0.375000
1994 English en 0.281250
2001 Portuguese pt 0.104167
1995 French fr 0.062500
1993 Spanish es 0.052083
\n", "
" ], "text/plain": [ " language_full language proportion\n", "2000 Modern Greek (1453-) el 0.375000\n", "1994 English en 0.281250\n", "2001 Portuguese pt 0.104167\n", "1995 French fr 0.062500\n", "1993 Spanish es 0.052083" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Meie Kodu = Our Home (Sydney, NSW : 1949 - 1956) (280)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_full language proportion
242 Estonian et 1.0
\n", "
" ], "text/plain": [ " language_full language proportion\n", "242 Estonian et 1.0" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Murchison Times and Cue-Big Bell-Reedy Advocate (WA : 1937 - 1942) (1543)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_full language proportion
2026 English en 0.8250
2027 Maltese mt 0.1375
\n", "
" ], "text/plain": [ " language_full language proportion\n", "2026 English en 0.8250\n", "2027 Maltese mt 0.1375" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Mu̇sų Pastogė = Our Haven (Sydney, NSW : 1950 - 1954) (1594)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_full language proportion
254 Lithuanian lt 0.95
\n", "
" ], "text/plain": [ " language_full language proportion\n", "254 Lithuanian lt 0.95" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Narandera Argus and Riverina Advertiser (NSW : 1893 - 1953) (431)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_full language proportion
258 English en 0.940476
259 Maltese mt 0.059524
\n", "
" ], "text/plain": [ " language_full language proportion\n", "258 English en 0.940476\n", "259 Maltese mt 0.059524" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Narromine News and Trangie Advocate (NSW : 1898 - 1955) (430)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_full language proportion
260 English en 0.946809
261 Maltese mt 0.053191
\n", "
" ], "text/plain": [ " language_full language proportion\n", "260 English en 0.946809\n", "261 Maltese mt 0.053191" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Nasza droga (Adelaide, SA : 1952 - 1954) (1323)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_full language proportion
970 Polish pl 0.9
971 English en 0.1
\n", "
" ], "text/plain": [ " language_full language proportion\n", "970 Polish pl 0.9\n", "971 English en 0.1" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Norden (Melbourne, Vic. : 1914 - 1918) (797)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_full language proportion
1531 English en 0.467391
1530 Danish da 0.413043
1532 Maltese mt 0.065217
\n", "
" ], "text/plain": [ " language_full language proportion\n", "1531 English en 0.467391\n", "1530 Danish da 0.413043\n", "1532 Maltese mt 0.065217" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "North Melbourne Gazette (Vic. : 1894 - 1901) (384)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_full language proportion
1538 English en 0.829268
1539 Maltese mt 0.146341
\n", "
" ], "text/plain": [ " language_full language proportion\n", "1538 English en 0.829268\n", "1539 Maltese mt 0.146341" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Oceania (Sydney, NSW : 1913 - 1915) (1598)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_full language proportion
274 English en 0.574468
273 Italian it 0.425532
\n", "
" ], "text/plain": [ " language_full language proportion\n", "274 English en 0.574468\n", "273 Italian it 0.425532" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Referee (Sydney, NSW : 1886 - 1939) (499)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_full language proportion
288 English en 0.924242
289 Maltese mt 0.075758
\n", "
" ], "text/plain": [ " language_full language proportion\n", "288 English en 0.924242\n", "289 Maltese mt 0.075758" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Reporter and Illawarra Journal (Kiama, NSW : 1887 - 1894) (389)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_full language proportion
290 English en 0.891566
292 Maltese mt 0.084337
\n", "
" ], "text/plain": [ " language_full language proportion\n", "290 English en 0.891566\n", "292 Maltese mt 0.084337" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Revue Australienne : Journal des Interets Francais en Australie, Nouvelle Caledonie, Nouvelle Zelande, Fiji, Tahiti, Polynesie = (1604)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_full language proportion
294 French fr 0.99
\n", "
" ], "text/plain": [ " language_full language proportion\n", "294 French fr 0.99" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Ringwood and Croydon Chronicle (Vic. : 1914 - 1918) (329)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_full language proportion
1591 English en 0.93617
1592 Maltese mt 0.06383
\n", "
" ], "text/plain": [ " language_full language proportion\n", "1591 English en 0.93617\n", "1592 Maltese mt 0.06383" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Rockhampton Bulletin and Central Queensland Advertiser (Qld. : 1861 - 1871) (92)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_full language proportion
837 English en 0.946237
838 Maltese mt 0.053763
\n", "
" ], "text/plain": [ " language_full language proportion\n", "837 English en 0.946237\n", "838 Maltese mt 0.053763" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Sandringham Southern Cross (Vic. : 1914 - 1918) (318)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_full language proportion
1602 English en 0.6500
1603 Maltese mt 0.3125
\n", "
" ], "text/plain": [ " language_full language proportion\n", "1602 English en 0.6500\n", "1603 Maltese mt 0.3125" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Seamen's Strike Bulletin (Melbourne, Vic. : 1919) (1043)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_full language proportion
1610 Polish pl 0.4
1608 Western Frisian fy 0.2
1609 Bosnian bs 0.2
1611 Russian ru-Latn 0.2
\n", "
" ], "text/plain": [ " language_full language proportion\n", "1610 Polish pl 0.4\n", "1608 Western Frisian fy 0.2\n", "1609 Bosnian bs 0.2\n", "1611 Russian ru-Latn 0.2" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Southern Australian (Adelaide, SA : 1838 - 1844) (171)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_full language proportion
1036 English en 0.904255
1035 Catalan ca 0.074468
\n", "
" ], "text/plain": [ " language_full language proportion\n", "1036 English en 0.904255\n", "1035 Catalan ca 0.074468" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Southern Morning Herald (Goulburn, NSW : 1920 - 1923) (418)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_full language proportion
310 English en 0.909091
312 Maltese mt 0.077922
\n", "
" ], "text/plain": [ " language_full language proportion\n", "310 English en 0.909091\n", "312 Maltese mt 0.077922" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Stampa Italiana = The Italian Press (Perth, WA : 1931 - 1932) (1380)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_full language proportion
2069 Italian it 0.97
\n", "
" ], "text/plain": [ " language_full language proportion\n", "2069 Italian it 0.97" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Suedaustralische Zeitung (Adelaide, SA : 1850 - 1851) (314)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_full language proportion
1046 German de 0.888889
1047 English en 0.111111
\n", "
" ], "text/plain": [ " language_full language proportion\n", "1046 German de 0.888889\n", "1047 English en 0.111111" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Sunday News (Sydney, NSW : 1919) (623)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_fulllanguageproportion
315Englishen0.739726
314Maltesemt0.219178
\n", "
" ], "text/plain": [ " language_full language proportion\n", "315 English en 0.739726\n", "314 Maltese mt 0.219178" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Sunday Times Edizione Italiana (Perth, WA : 1958 - 1959) (1379)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_fulllanguageproportion
2075Italianit1.0
\n", "
" ], "text/plain": [ " language_full language proportion\n", "2075 Italian it 1.0" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Sydney Chronicle (NSW : 1846 - 1848) (94)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_fulllanguageproportion
319Englishen0.923077
320Maltesemt0.076923
\n", "
" ], "text/plain": [ " language_full language proportion\n", "319 English en 0.923077\n", "320 Maltese mt 0.076923" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Süd Australische Zeitung (Tanunda and Adelaide, SA : 1860 - 1874) (278)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_fulllanguageproportion
1044Germande0.989691
\n", "
" ], "text/plain": [ " language_full language proportion\n", "1044 German de 0.989691" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Tasmanian Evening Herald (Launceston, Tas. : 1878) (1265)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_fulllanguageproportion
1181Englishen0.898876
1180Maltesemt0.067416
\n", "
" ], "text/plain": [ " language_full language proportion\n", "1181 English en 0.898876\n", "1180 Maltese mt 0.067416" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "The Advertiser (Adelaide, SA : 1889 - 1931) (34)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_fulllanguageproportion
1051Englishen0.513889
1052Maltesemt0.486111
\n", "
" ], "text/plain": [ " language_full language proportion\n", "1051 English en 0.513889\n", "1052 Maltese mt 0.486111" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "The Argus (Melbourne, Vic. : 1848 - 1957) (13)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_fulllanguageproportion
1645Maltesemt0.629630
1646Englishen0.358025
\n", "
" ], "text/plain": [ " language_full language proportion\n", "1645 Maltese mt 0.629630\n", "1646 English en 0.358025" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "The Australian Jewish News (Melbourne, Vic. : 1935 - 1999) (1685)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_fulllanguageproportion
1657Englishen0.894737
1659Yiddishyi0.084211
\n", "
" ], "text/plain": [ " language_full language proportion\n", "1657 English en 0.894737\n", "1659 Yiddish yi 0.084211" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "The Castlereagh (Gilgandra, NSW : 1905 - 1907) (224)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_fulllanguageproportion
413Englishen0.741176
415Somaliso0.152941
414Maltesemt0.105882
\n", "
" ], "text/plain": [ " language_full language proportion\n", "413 English en 0.741176\n", "415 Somali so 0.152941\n", "414 Maltese mt 0.105882" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "The Chinese Advertiser (Ballarat, Vic. : 1856) (706)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_fulllanguageproportion
1680Chinesezh0.500000
1682Englishen0.333333
1681Scottish Gaelicgd0.166667
\n", "
" ], "text/plain": [ " language_full language proportion\n", "1680 Chinese zh 0.500000\n", "1682 English en 0.333333\n", "1681 Scottish Gaelic gd 0.166667" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "The Coolangatta Chronicle (Qld. : 1926) (1207)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_fulllanguageproportion
856Englishen0.869565
857Maltesemt0.130435
\n", "
" ], "text/plain": [ " language_full language proportion\n", "856 English en 0.869565\n", "857 Maltese mt 0.130435" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "The Derby News (WA : 1887) (1617)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_fulllanguageproportion
2119Maltesemt0.75
2120Corsicanco0.25
\n", "
" ], "text/plain": [ " language_full language proportion\n", "2119 Maltese mt 0.75\n", "2120 Corsican co 0.25" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "The English and Chinese Advertiser (Vic. : 1856 - 1858) (685)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_fulllanguageproportion
1699Englishen0.894737
1700Chinesezh0.052632
1701Maltesemt0.052632
\n", "
" ], "text/plain": [ " language_full language proportion\n", "1699 English en 0.894737\n", "1700 Chinese zh 0.052632\n", "1701 Maltese mt 0.052632" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "The Goldfields Observer (Kalgoorlie, WA : 1930 - 1939) (1626)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_fulllanguageproportion
2143Englishen0.909091
2145Maltesemt0.051948
\n", "
" ], "text/plain": [ " language_full language proportion\n", "2143 English en 0.909091\n", "2145 Maltese mt 0.051948" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "The Gwydir Examiner and Moree General Advertiser (NSW : 1898 - 1899) (886)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_fulllanguageproportion
480Englishen0.910112
481Maltesemt0.078652
\n", "
" ], "text/plain": [ " language_full language proportion\n", "480 English en 0.910112\n", "481 Maltese mt 0.078652" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "The Jewish Weekly News (Melbourne, Vic. : 1933 - 1935) (1707)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_fulllanguageproportion
1717Englishen0.783505
1718Yiddishyi0.195876
\n", "
" ], "text/plain": [ " language_full language proportion\n", "1717 English en 0.783505\n", "1718 Yiddish yi 0.195876" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "The Melbourne Advertiser (Vic. : 1838) (935)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_fulllanguageproportion
1735Englishen0.666667
1736Welshcy0.333333
\n", "
" ], "text/plain": [ " language_full language proportion\n", "1735 English en 0.666667\n", "1736 Welsh cy 0.333333" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "The Mildura Irrigationist (Vic. : 1892 - 1893) (1583)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_fulllanguageproportion
1754Maltesemt0.795455
1753Englishen0.113636
1755Somaliso0.090909
\n", "
" ], "text/plain": [ " language_full language proportion\n", "1754 Maltese mt 0.795455\n", "1753 English en 0.113636\n", "1755 Somali so 0.090909" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "The Mildura Irrigationist and Murray River Agricultural Times (Vic. : 1888) (1581)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_fulllanguageproportion
1758Maltesemt0.739130
1756Englishen0.130435
1757Somaliso0.130435
\n", "
" ], "text/plain": [ " language_full language proportion\n", "1758 Maltese mt 0.739130\n", "1756 English en 0.130435\n", "1757 Somali so 0.130435" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "The Mildura Irrigationist and Murray River Cultural Advocate (Vic. : 1891 - 1892) (1582)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_fulllanguageproportion
1761Englishen0.523810
1760Maltesemt0.333333
1759Somaliso0.126984
\n", "
" ], "text/plain": [ " language_full language proportion\n", "1761 English en 0.523810\n", "1760 Maltese mt 0.333333\n", "1759 Somali so 0.126984" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "The Millicent Times (SA : 1891 - 1905) (970)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_fulllanguageproportion
1074Englishen0.94898
1075Catalanca0.05102
\n", "
" ], "text/plain": [ " language_full language proportion\n", "1074 English en 0.94898\n", "1075 Catalan ca 0.05102" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "The Miner's Right (Boulder, WA : 1897) (1638)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_fulllanguageproportion
2174Englishen0.909091
2176Maltesemt0.070707
\n", "
" ], "text/plain": [ " language_full language proportion\n", "2174 English en 0.909091\n", "2176 Maltese mt 0.070707" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "The News, Shoalhaven, Broughton Creek and Ulladulla Advertiser (NSW : 1875 - 1877) (1678)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_fulllanguageproportion
551Englishen0.913978
552Catalanca0.086022
\n", "
" ], "text/plain": [ " language_full language proportion\n", "551 English en 0.913978\n", "552 Catalan ca 0.086022" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "The Phillips River Times (Ravensthorpe, WA : 1908 - 1909) (1546)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_fulllanguageproportion
2220Englishen0.9
2221Maltesemt0.1
\n", "
" ], "text/plain": [ " language_full language proportion\n", "2220 English en 0.9\n", "2221 Maltese mt 0.1" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "The Port Phillip Patriot and Morning Advertiser (Vic. : 1845 - 1848) (937)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_fulllanguageproportion
1768Englishen0.894737
1767Maltesemt0.084211
\n", "
" ], "text/plain": [ " language_full language proportion\n", "1768 English en 0.894737\n", "1767 Maltese mt 0.084211" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "The Richmond River Express and Casino Kyogle Advertiser (NSW : 1904 - 1929) (500)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_fulllanguageproportion
584Englishen0.734940
582Maltesemt0.168675
583Somaliso0.072289
\n", "
" ], "text/plain": [ " language_full language proportion\n", "584 English en 0.734940\n", "582 Maltese mt 0.168675\n", "583 Somali so 0.072289" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "The Sydney Wool and Stock Journal (NSW : 1899 - 1917) (452)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_fulllanguageproportion
654Englishen0.727273
652Maltesemt0.233766
\n", "
" ], "text/plain": [ " language_full language proportion\n", "654 English en 0.727273\n", "652 Maltese mt 0.233766" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "The Tasmanian (Launceston, Tas. : 1871 - 1879) (946)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_fulllanguageproportion
1243Englishen0.917808
1244Maltesemt0.082192
\n", "
" ], "text/plain": [ " language_full language proportion\n", "1243 English en 0.917808\n", "1244 Maltese mt 0.082192" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "The Teetotaller and General Newspaper (Sydney, NSW : 1842 - 1843) (1036)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_fulllanguageproportion
657Englishen0.95
\n", "
" ], "text/plain": [ " language_full language proportion\n", "657 English en 0.95" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "The Voice of Freedom = Elefthera Phoni (Perth, WA : 1956 - 1957) (1381)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_fulllanguageproportion
2262Modern Greek (1453-)el0.97
\n", "
" ], "text/plain": [ " language_full language proportion\n", "2262 Modern Greek (1453-) el 0.97" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "To Ethnico Vema = Greek National Tribune (Arncliffe, NSW : 1931 - 1954) (1592)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_fulllanguageproportion
705Modern Greek (1453-)el0.989362
\n", "
" ], "text/plain": [ " language_full language proportion\n", "705 Modern Greek (1453-) el 0.989362" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Tung Wah News (Sydney, NSW : 1898 - 1902) (1185)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_fulllanguageproportion
712Chinesezh0.926316
\n", "
" ], "text/plain": [ " language_full language proportion\n", "712 Chinese zh 0.926316" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Tung Wah Times (Sydney, NSW : 1901 - 1936) (1184)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_fulllanguageproportion
719Chinesezh0.968085
\n", "
" ], "text/plain": [ " language_full language proportion\n", "719 Chinese zh 0.968085" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Twofold Bay Telegraph (NSW : 1860) (479)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_fulllanguageproportion
730Englishen0.945652
731Maltesemt0.054348
\n", "
" ], "text/plain": [ " language_full language proportion\n", "730 English en 0.945652\n", "731 Maltese mt 0.054348" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Twofold Bay and Maneroo Observer (NSW : 1860) (394)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_fulllanguageproportion
724Englishen0.825581
725Maltesemt0.139535
\n", "
" ], "text/plain": [ " language_full language proportion\n", "724 English en 0.825581\n", "725 Maltese mt 0.139535" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Uniamoci (Sydney, NSW : 1903 - 1904) (1599)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_fulllanguageproportion
732Italianit1.0
\n", "
" ], "text/plain": [ " language_full language proportion\n", "732 Italian it 1.0" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Upper Hunter Courier (Murrurundi, NSW : 1871) (810)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_fulllanguageproportion
733Englishen0.857143
734Maltesemt0.142857
\n", "
" ], "text/plain": [ " language_full language proportion\n", "733 English en 0.857143\n", "734 Maltese mt 0.142857" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Vesnik (Perth, WA : 1975 - 1994) (1382)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_fulllanguageproportion
2297Macedonianmk0.410526
2296Englishen0.357895
2298Bulgarianbg-Latn0.221053
\n", "
" ], "text/plain": [ " language_full language proportion\n", "2297 Macedonian mk 0.410526\n", "2296 English en 0.357895\n", "2298 Bulgarian bg-Latn 0.221053" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Vil'na Dumka = Free Thought (Sydney, NSW : 1949 - 1954) (1593)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_fulllanguageproportion
735Ukrainianuk0.82
736Englishen0.18
\n", "
" ], "text/plain": [ " language_full language proportion\n", "735 Ukrainian uk 0.82\n", "736 English en 0.18" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Warwick Daily News (Qld. : 1919 -1954) (892)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_fulllanguageproportion
906Englishen0.835443
907Maltesemt0.139241
\n", "
" ], "text/plain": [ " language_full language proportion\n", "906 English en 0.835443\n", "907 Maltese mt 0.139241" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Williamstown Trade Circular (Vic. : 1855 - 1856) (213)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
language_fulllanguageproportion
1831Englishen0.875
1832Portuguesept0.125
\n", "
" ], "text/plain": [ " language_full language proportion\n", "1831 English en 0.875\n", "1832 Portuguese pt 0.125" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "for n, l in papers:\n", " # Only show titles where a non-English language makes up at least 5% of the sample\n", " if not l.loc[(~l['language'].isin(['en'])) & (l['proportion'] >= 0.05)].empty:\n", " print(f'\\n{n[0]} ({n[1]})')\n", " display(l[['language_full', 'language', 'proportion']].loc[(l['proportion'] > 0.05)].sort_values(by='proportion', ascending=False))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "I went through the titles above and compiled a list of title identifiers that seem to be producing false positives. Most are English-language papers whose OCR errors are detected as another language, very often Maltese. We can use this list to filter these newspapers out of our results." ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [], "source": [ "# Titles where dodgy OCR causes false positives in language detection\n", "# This was manually created after scanning results\n", "dodgy = ['1036', '1043', '1103', '116', '1207', '1265', '13', '1320', '1336', '140', '1400', '145', '1488', '1543', '1546', '1581', '1582', '1583', '1617', '1623', '1626', '1638', '1675', '1678', '171', '196', '213', '224', '286', '292', '318', '329', '34', '384', '389', '394', '418', '430', '431', '452', '479', '499', '500', '570', '623', '763', '810', '860', '886', '892', '906', '92', '926', '927', '935', '937', '94', '946', '970', '986']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here we add the dodgy title ids to our filter. This leaves 51 newspapers with significant amounts of non-English content (at least 5% of the sample in a language other than English)."
] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "51" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Drop the dodgy titles and any language below 5% of a title's sample,\n", "# then remove titles whose only remaining language is English\n", "filtered = df.loc[(~df['id'].isin(dodgy)) & (df['proportion'] >= 0.05)].groupby(by=['title', 'id']).filter(lambda x: len(x) > 1 or x['language'].iloc[0] != 'en')\n", "papers = filtered.groupby(by=['title', 'id'])\n", "len(papers)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's list them." ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "A Voz de Timor (Dili, East Timor : 1970 - 1975)\n", "Adelaider Deutsche Zeitung (SA : 1851 - 1862)\n", "Australische Zeitung (Adelaide, SA : 1875 - 1916)\n", "Berita Repoeblik (Djakarta, Indonesia : 1945 - 1946)\n", "Chinese Republic News (Sydney, NSW : 1914 - 1937)\n", "Chinese Times (Melbourne, Vic.
: 1902 - 1922)\n", "Chung Wah News (Perth, WA : 1981 - 1987)\n", "Der Australische Spiegel = The Australian Mirror (Perth, WA : 1952)\n", "Deutsch-Australische Post : Wochenschrift = German-Australian Post : Weekly (Sydney, NSW : 1893 - 1906)\n", "Deutsche Zeitung für Sud-Australien = German Times for South Australia (Tanunda, SA : 1851)\n", "Die Brucke = The Bridge (Sydney, NSW : 1934 - 1939)\n", "Die Deutsche Post für die Australischen Colonien = The German Australian Post (Adelaide, SA : 1848 - 1851)\n", "Dutch Australian Weekly (Sydney, NSW : 1951 - 1993)\n", "Dutch Weekly (Sydney, NSW : 1993 - 2004)\n", "Echo : Polski Tygodnik Niezalezny (Perth, WA : 1950 - 1952)\n", "Eco Italiano (Perth, WA : 1958 - 1959)\n", "Guang yi hua bao = The Chinese Australian Herald (Sydney, NSW : 1894 - 1923)\n", "Hellenic Echo (Perth, WA : 1967 - 1968)\n", "Il Canguro = The Kangaroo (Perth, WA : 1955 - 1957)\n", "Il Giornale Italiano (Sydney, NSW : 1932 - 1940)\n", "Il Risveglio = The Awakening (Sydney, NSW : 1944 - 1954)\n", "Italian Bulletin of Australia (Sydney, NSW : 1922 - 1928, 1935 - 1940)\n", "Italian Bulletin of Commerce (Sydney, NSW : 1929 - 1935)\n", "Italo-Australian (Sydney, NSW : 1927 - 1940)\n", "Japanese Perth Times (Subiaco, WA : 1989 - 1996)\n", "L'Italo-Australiano = The Italo-Australian (Surry Hills, NSW : 1885)\n", "L'Italo-Australiano = The Italo-Australian (Sydney, NSW : 1905 - 1909)\n", "La Rondine (Perth, WA : 1969 - 1994)\n", "Le Courrier Australien (Sydney, NSW : 1892 - 2011)\n", "Mediterranean Voice (Perth, WA : 1971 - 1972)\n", "Meie Kodu = Our Home (Sydney, NSW : 1949 - 1956)\n", "Mu̇sų Pastogė = Our Haven (Sydney, NSW : 1950 - 1954)\n", "Nasza droga (Adelaide, SA : 1952 - 1954)\n", "Norden (Melbourne, Vic. 
: 1914 - 1918)\n", "Oceania (Sydney, NSW : 1913 - 1915)\n", "Revue Australienne : Journal des Interets Francais en Australie, Nouvelle Caledonie, Nouvelle Zelande, Fiji, Tahiti, Polynesie =\n", "Stampa Italiana = The Italian Press (Perth, WA : 1931 - 1932)\n", "Suedaustralische Zeitung (Adelaide, SA : 1850 - 1851)\n", "Sunday Times Edizione Italiana (Perth, WA : 1958 - 1959)\n", "Süd Australische Zeitung (Tanunda and Adelaide, SA : 1860 - 1874)\n", "The Australian Jewish News (Melbourne, Vic. : 1935 - 1999)\n", "The Chinese Advertiser (Ballarat, Vic. : 1856)\n", "The English and Chinese Advertiser (Vic. : 1856 - 1858)\n", "The Jewish Weekly News (Melbourne, Vic. : 1933 - 1935)\n", "The Voice of Freedom = Elefthera Phoni (Perth, WA : 1956 - 1957)\n", "To Ethnico Vema = Greek National Tribune (Arncliffe, NSW : 1931 - 1954)\n", "Tung Wah News (Sydney, NSW : 1898 - 1902)\n", "Tung Wah Times (Sydney, NSW : 1901 - 1936)\n", "Uniamoci (Sydney, NSW : 1903 - 1904)\n", "Vesnik (Perth, WA : 1975 - 1994)\n", "Vil'na Dumka = Free Thought (Sydney, NSW : 1949 - 1954)\n" ] } ], "source": [ "for n, l in papers:\n", " print(n[0])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "That's looking pretty good. Let's save the results as a Markdown file to make them easy to explore. We'll include links into Trove. Here's the [list of all 51 newspapers](non-english-newspapers.md) (also as a [Gist](https://gist.github.com/wragge/9aa385648cff5f0de0c7d4837896df97))." ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [], "source": [ "# Titles contain non-ASCII characters, so write the file explicitly as UTF-8\n", "with open(Path('non-english-newspapers.md'), 'w', encoding='utf-8') as md_file:\n", " i = 1\n", " for n, l in papers:\n", " md_file.write(f'\\n### {i}.
[{n[0]}](http://nla.gov.au/nla.news-title{n[1]})\\n\\n')\n", " md_file.write('| Language | Language code | Proportion of sample |\\n')\n", " md_file.write('|---|---|---|\\n')\n", " for row in l[['language_full', 'language', 'proportion']].loc[(l['proportion'] > 0.05)].sort_values(by='proportion', ascending=False).itertuples():\n", " md_file.write(f'| {row.language_full} | {row.language} | {row.proportion} |\\n')\n", " i += 1" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you look at the Markdown file you'll see that there are still some dodgy results. For example, nearly 17% of the *Chinese Advertiser* sample is detected as 'Scottish Gaelic'. But the point of this exercise was to find non-English newspapers, rather than to measure the proportion of non-English content accurately, so I think we can live with it for now." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "----\n", "\n", "Created by [Tim Sherratt](https://timsherratt.org/) for the [GLAM Workbench](https://glam-workbench.github.io/). \n", "Support this project by becoming a [GitHub sponsor](https://github.com/sponsors/wragge?o=esb)." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.5" } }, "nbformat": 4, "nbformat_minor": 4 }