{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Finding non-English newspapers in Trove\n",
"\n",
"There are a growing number of non-English newspapers digitised in Trove. However, if you're only searching using English keywords, you might never know that they're there. I thought it would be useful to generate a list of non-English newspapers, but it wasn't quite as straightforward as I thought.\n",
"\n",
"## How not to do it...\n",
"\n",
"My first thought was I could start by searching for digitised newspapers amongst the library records in Trove. My theory was that catalogue metadata would include language information. For example, you can search for newspapers using `format:Periodical/Newspaper` in the books and libraries category (or the `article` API zone). To find those that are digitised, you can add a search for 'trove.nla.gov.au'. Here's the [sort of results](https://trove.nla.gov.au/search/category/books?keyword=%22trove.nla.gov.au%22%20format%3APeriodical%2FNewspaper) you get. Unfortunately, you only get about 826 results and there are many more newspapers than that in Trove. It seems links to digitised newspapers are not consistently recorded.\n",
"\n",
"My second approach was to get the list of digitised newspapers from the API, extract the ISSN, then use this to search for catalogue records. Here's the code snippet I used.\n",
"\n",
"``` python\n",
"params = {\n",
" 'zone': 'article',\n",
" 'encoding': 'json',\n",
" 'l-format': 'Periodical/Newspaper',\n",
" 'reclevel': 'full',\n",
" 'key': TROVE_API_KEY\n",
"}\n",
"newspapers = get_newspapers()\n",
"for newspaper in newspapers:\n",
" print(f'\\n{newspaper[\"title\"]}')\n",
" issn = newspaper.get('issn')\n",
" params['q'] = f'issn:{issn}'\n",
" response = s.get('https://api.trove.nla.gov.au/v2/result', params=params)\n",
" data = response.json()\n",
" try:\n",
" works = data['response']['zone'][0]['records']['work']\n",
" except KeyError:\n",
" print('Not found')\n",
" else:\n",
" for work in works:\n",
" print(work.get('language'))\n",
" if not response.from_cache:\n",
" time.sleep(0.2)\n",
"```\n",
"\n",
"The main problem here is that not all titles have ISSNs. You could try searching on the titles is there's no ISSN, but this would involve a fair bit of disambiguation. In any case, in running this I discovered that while there is some language information in the metadata, it's not consistently applied. So basically a metadata-only approach is not going to work. Sigh...\n",
"\n",
"## How I actually did it\n",
"\n",
"If I couldn't get language details from metadata, then I had to try and extract it from the resource itself. I spent quite a bit of time looking around for Python packages that provided reliable language detection. The first one I tried regularly identified Mandarin as Korean (it turns out this was a known issue). Another one sent me into dependency hell. Finally I found [pycld3](https://pypi.org/project/pycld3/) which installed with `pip`, and *just worked*.\n",
"\n",
"My plan was to get the list of newspapers via the API as before, then fire off an empty search for each one. I'd then loop through the results, running the language detector over the article text. I set the query parameters to retrieve the maxmimum number of results in one request – 100. That seemed like a reasonable sample. To try and provide a big enough amount of text for the language detector to work with, I set the number of words parameter to return articles with between 100 and 1000 words. So the query parameters I used were:\n",
"\n",
"``` python\n",
"params = {\n",
" 'zone': 'newspaper',\n",
" 'encoding': 'json',\n",
" 'l-word': '100 - 1000 Words',\n",
" 'include': 'articletext',\n",
" 'key': TROVE_API_KEY,\n",
" 'q': ' ',\n",
" 'n': 100,\n",
"}\n",
"```\n",
"\n",
"Because some of the newspapers had short runs and the word count filter limits the results, I found that I wasn't always getting 100 results per newspaper. To work around this I found the likely language for each article, aggregated the counts, and then calculated the proportion of results for each language. This gave me the proportion of articles in each language – a number I could use across newspapers to find the non-English titles. \n",
"\n",
"In general this worked pretty well, and the result was a [list of 52 newspapers](non-english-newspapers.md) (also as a [Gist](https://gist.github.com/wragge/9aa385648cff5f0de0c7d4837896df97)) that have significant amounts of non-English content. However, I had to do a fair bit of fiddling to filter out dodgy results. All the details are included below.\n",
"\n",
"## Problems / limitations\n",
"\n",
"* It's no surprise that the results of the language detection are affected by the quality of the OCR. \n",
"* In filtering out what seems to be the product of dodgy OCR, it's possible that I might be excluding some non-English content. \n",
"* I'm only detecting the predominant language for each article, so there might be articles containing a mix of languages that are being missed. \n",
"* I'm just talking the first 100 results from a blank search in each newspaper. Larger, or more randomised samples might produce different results.\n",
"* Some dodgy detection results remain in the list of newspapers, but the point of this exercise was to find non-English newspapers. If you wanted to accurately determine the quantity of non-English content, you'd have to do a lot more fine-grained analysis."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Import what we need"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"import re\n",
"import time\n",
"from collections import Counter\n",
"from pathlib import Path\n",
"\n",
"import altair as alt\n",
"import cld3\n",
"import pandas as pd\n",
"import requests_cache\n",
"from IPython.display import display\n",
"from language_tags import tags\n",
"from requests.adapters import HTTPAdapter\n",
"from requests.packages.urllib3.util.retry import Retry\n",
"from tqdm.auto import tqdm\n",
"\n",
"s = requests_cache.CachedSession()\n",
"retries = Retry(total=5, backoff_factor=1, status_forcelist=[502, 503, 504])\n",
"s.mount(\"https://\", HTTPAdapter(max_retries=retries))\n",
"s.mount(\"http://\", HTTPAdapter(max_retries=retries))"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"%%capture\n",
"# Load variables from the .env file if it exists\n",
"# Use %%capture to suppress messages\n",
"%load_ext dotenv\n",
"%dotenv"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"# Insert your Trove API key\n",
"API_KEY = \"YOUR API KEY\"\n",
"\n",
"# Use api key value from environment variables if it is available\n",
"if os.getenv(\"TROVE_API_KEY\"):\n",
" API_KEY = os.getenv(\"TROVE_API_KEY\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Harvest the data and run language detection on articles"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"def get_newspapers():\n",
" \"\"\"\n",
" Get a list of newspapers in Trove.\n",
" \"\"\"\n",
" response = s.get(\n",
" \"https://api.trove.nla.gov.au/v2/newspaper/titles\",\n",
" params={\"encoding\": \"json\", \"key\": API_KEY},\n",
" )\n",
" data = response.json()\n",
" return data[\"response\"][\"records\"][\"newspaper\"]"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "460e5f112a1e44afaa87359a8f58e617",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
" 0%| | 0/1741 [00:00, ?it/s]"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"params = {\n",
" \"zone\": \"newspaper\",\n",
" \"encoding\": \"json\",\n",
" # 'l-category': 'Article',\n",
" \"l-word\": \"100 - 1000 Words\",\n",
" \"include\": \"articletext\",\n",
" \"key\": API_KEY,\n",
" \"q\": \" \",\n",
" \"n\": 100,\n",
"}\n",
"newspaper_langs = []\n",
"newspapers = get_newspapers()\n",
"for newspaper in tqdm(newspapers):\n",
" langs = []\n",
" # print(f'\\n{newspaper[\"title\"]}')\n",
" params[\"l-title\"] = newspaper[\"id\"]\n",
" response = s.get(\"https://api.trove.nla.gov.au/v2/result\", params=params)\n",
" data = response.json()\n",
" n = data[\"response\"][\"zone\"][0][\"records\"][\"n\"]\n",
" try:\n",
" articles = data[\"response\"][\"zone\"][0][\"records\"][\"article\"]\n",
" except KeyError:\n",
" # print('Not found')\n",
" pass\n",
" else:\n",
" # Detect language for each article in results\n",
" for article in articles:\n",
" if \"articleText\" in article:\n",
" # Clean up OCRd text by removing tags and extra whitespace\n",
" text = article[\"articleText\"]\n",
" text = re.sub(r\"<[^<]+?>\", \"\", text)\n",
" text = re.sub(r\"\\s\\s+\", \" \", text)\n",
" # Get the language\n",
" ld = cld3.get_language(text)\n",
" # If the language prediction is reliable, save it\n",
" if ld.is_reliable:\n",
" langs.append(ld.language)\n",
" # Find the count of each language detected in the sample of articles\n",
" for lang, count in dict(Counter(langs)).items():\n",
" # Calculate the language count as a proportion of the total number of results\n",
" prop = int(count) / len(langs)\n",
" newspaper_langs.append(\n",
" {\n",
" \"id\": newspaper[\"id\"],\n",
" \"title\": newspaper[\"title\"],\n",
" \"language\": lang,\n",
" \"proportion\": prop,\n",
" \"number\": n,\n",
" }\n",
" )\n",
" if not response.from_cache:\n",
" time.sleep(0.2)"
]
},
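{
"cell_type": "markdown",
"metadata": {},
"source": [
"The clean-up step inside the loop above can be tried on its own. This is only a sketch: the function name `clean_ocr_text` and the sample string are invented for illustration, and the final `strip()` is an extra step that isn't in the loop.\n",
"\n",
"``` python\n",
"import re\n",
"\n",
"\n",
"def clean_ocr_text(text):\n",
"    # Remove anything that looks like an HTML tag\n",
"    text = re.sub(r'<[^<]+?>', '', text)\n",
"    # Collapse runs of two or more whitespace characters into a single space\n",
"    text = re.sub(r'\\s\\s+', ' ', text)\n",
"    # Trimming the ends is an addition, not part of the loop above\n",
"    return text.strip()\n",
"\n",
"\n",
"sample = '<p>Ein  Brief\\n\\n aus Adelaide</p>'\n",
"print(clean_ocr_text(sample))\n",
"```\n",
"\n",
"Stripping the markup first matters because the language detector would otherwise see tag names as part of the text."
]
},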
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Convert the results into a dataframe."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" id | \n",
" title | \n",
" language | \n",
" proportion | \n",
" number | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 166 | \n",
" Canberra Community News (ACT : 1925 - 1927) | \n",
" en | \n",
" 1.0 | \n",
" 100 | \n",
"
\n",
" \n",
" 1 | \n",
" 165 | \n",
" Canberra Illustrated: A Quarterly Magazine (AC... | \n",
" en | \n",
" 1.0 | \n",
" 29 | \n",
"
\n",
" \n",
" 2 | \n",
" 69 | \n",
" Federal Capital Pioneer (Canberra, ACT : 1924 ... | \n",
" en | \n",
" 1.0 | \n",
" 100 | \n",
"
\n",
" \n",
" 3 | \n",
" 871 | \n",
" Good Neighbour (ACT : 1950 - 1969) | \n",
" en | \n",
" 1.0 | \n",
" 100 | \n",
"
\n",
" \n",
" 4 | \n",
" 665 | \n",
" Student Notes/Canberra University College Stud... | \n",
" en | \n",
" 1.0 | \n",
" 100 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" id title language \\\n",
"0 166 Canberra Community News (ACT : 1925 - 1927) en \n",
"1 165 Canberra Illustrated: A Quarterly Magazine (AC... en \n",
"2 69 Federal Capital Pioneer (Canberra, ACT : 1924 ... en \n",
"3 871 Good Neighbour (ACT : 1950 - 1969) en \n",
"4 665 Student Notes/Canberra University College Stud... en \n",
"\n",
" proportion number \n",
"0 1.0 100 \n",
"1 1.0 29 \n",
"2 1.0 100 \n",
"3 1.0 100 \n",
"4 1.0 100 "
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df = pd.DataFrame(newspaper_langs)\n",
"df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Add full language names\n",
"\n",
"The language detector returns BCP-47-style language codes. To translate these into something that's a bit easier for humans to understand, we can use the [language-tags](https://github.com/OnroerendErfgoed/language-tags) package."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"def get_full_language(lc):\n",
" \"\"\"\n",
" Get full language names from codes\n",
" \"\"\"\n",
" lang = tags.description(lc)\n",
" if lang:\n",
" return lang[0]\n",
" else:\n",
" print(lc)\n",
" return lc\n",
"\n",
"\n",
"df[\"language_full\"] = df[\"language\"].apply(get_full_language)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Filtering the results\n",
"\n",
"If we just look at the numbers of languages detected we might think that Australia's cultural diversity was much greater than we expected! But the likelihood that there were ten newspapers publishing articles in Igbo (the language of the Igbo people in south-eastern Nigeria) seems small. Obviously there are a considerable number of false positives here."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"English 1680\n",
"Maltese 177\n",
"Japanese 28\n",
"Italian 22\n",
"Somali 18\n",
"German 16\n",
"Welsh 15\n",
"Catalan 12\n",
"Portuguese 9\n",
"Norwegian 9\n",
"Chinese 8\n",
"Estonian 7\n",
"Danish 7\n",
"Hindi 6\n",
"French 6\n",
"Western Frisian 6\n",
"Corsican 6\n",
"Hawaiian 4\n",
"Bulgarian 4\n",
"Vietnamese 4\n",
"Polish 4\n",
"Igbo 4\n",
"Indonesian 4\n",
"Modern Greek (1453-) 4\n",
"Luxembourgish 3\n",
"Javanese 3\n",
"Yiddish 3\n",
"Dutch 3\n",
"Scottish Gaelic 3\n",
"Swedish 3\n",
"Czech 2\n",
"Samoan 2\n",
"Latin 2\n",
"Kurdish 2\n",
"Malagasy 2\n",
"Filipino 2\n",
"Russian 2\n",
"Malay (macrolanguage) 2\n",
"Bosnian 2\n",
"Spanish 2\n",
"Cebuano 2\n",
"Uzbek 1\n",
"Slovenian 1\n",
"Irish 1\n",
"Croatian 1\n",
"Haitian 1\n",
"Turkish 1\n",
"Hebrew 1\n",
"Maori 1\n",
"Zulu 1\n",
"Galician 1\n",
"Latvian 1\n",
"Shona 1\n",
"Ukrainian 1\n",
"Lithuanian 1\n",
"Afrikaans 1\n",
"Hausa 1\n",
"Macedonian 1\n",
"Name: language_full, dtype: int64"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df[\"language_full\"].value_counts()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Remember that for each language detected in a newspaper we calculated the proportion of articles in our results set in that language. So we can, for example, just look at newspapers where 100% of the articles are in a single language. This highlights a few non-English language newspapers, but obviously we're missing a lot of others."
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"English 1422\n",
"German 3\n",
"Italian 3\n",
"Modern Greek (1453-) 2\n",
"Estonian 1\n",
"Yiddish 1\n",
"Name: language_full, dtype: int64"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.loc[df[\"proportion\"] == 1][\"language_full\"].value_counts()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If we chart the proportions, we see them bunched up at either end of the scale. So there are lots of languages detected in only a small proportion of articles."
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
""
],
"text/plain": [
"alt.Chart(...)"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"alt.Chart(df).mark_bar().encode(x=alt.X(\"proportion:Q\", bin=True), y=\"count():Q\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If we zoom in on the proportions less than 0.1 (that's 10 articles in a sample of 100) we see that they're mostly less that 0.01 (or 1 article in 100). It seems likely that these are false positives. "
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
""
],
"text/plain": [
"alt.Chart(...)"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"alt.Chart(df.loc[df[\"proportion\"] < 0.1]).mark_bar().encode(\n",
" x=alt.X(\"proportion:Q\", bin=True), y=\"count():Q\"\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's be fairly conservative and filter out languages that have a proportion (per newspaper) less than 0.5. This list seems a bit more in line with what we would expect, but there are still some surprises – 34 newspapers published articles in Maltese?"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"English 1670\n",
"Maltese 33\n",
"Italian 15\n",
"German 9\n",
"Chinese 8\n",
"Somali 5\n",
"Modern Greek (1453-) 4\n",
"Japanese 3\n",
"Portuguese 3\n",
"Yiddish 3\n",
"French 3\n",
"Polish 3\n",
"Western Frisian 2\n",
"Dutch 2\n",
"Malay (macrolanguage) 1\n",
"Lithuanian 1\n",
"Ukrainian 1\n",
"Estonian 1\n",
"Indonesian 1\n",
"Vietnamese 1\n",
"Danish 1\n",
"Swedish 1\n",
"Bosnian 1\n",
"Russian 1\n",
"Scottish Gaelic 1\n",
"Welsh 1\n",
"Spanish 1\n",
"Corsican 1\n",
"Macedonian 1\n",
"Bulgarian 1\n",
"Name: language_full, dtype: int64"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.loc[df[\"proportion\"] >= 0.05][\"language_full\"].value_counts()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If we focus in on the newspapers that supposedly have a significant proportion of articles in Maltese, we see some very strange results. I seriously doubt that 80% of the *Mildura Irrigationist* from 1892-3 is in Maltese. So what's going on?"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" id | \n",
" title | \n",
" language | \n",
" proportion | \n",
" number | \n",
" language_full | \n",
"
\n",
" \n",
" \n",
" \n",
" 203 | \n",
" 1596 | \n",
" L'Italo-Australiano = The Italo-Australian (Su... | \n",
" mt | \n",
" 0.206349 | \n",
" 100 | \n",
" Maltese | \n",
"
\n",
" \n",
" 270 | \n",
" 389 | \n",
" Reporter and Illawarra Journal (Kiama, NSW : 1... | \n",
" mt | \n",
" 0.105882 | \n",
" 100 | \n",
" Maltese | \n",
"
\n",
" \n",
" 286 | \n",
" 418 | \n",
" Southern Morning Herald (Goulburn, NSW : 1920 ... | \n",
" mt | \n",
" 0.146667 | \n",
" 100 | \n",
" Maltese | \n",
"
\n",
" \n",
" 289 | \n",
" 623 | \n",
" Sunday News (Sydney, NSW : 1919) | \n",
" mt | \n",
" 0.181818 | \n",
" 100 | \n",
" Maltese | \n",
"
\n",
" \n",
" 530 | \n",
" 500 | \n",
" The Richmond River Express and Casino Kyogle A... | \n",
" mt | \n",
" 0.126437 | \n",
" 100 | \n",
" Maltese | \n",
"
\n",
" \n",
" 654 | \n",
" 810 | \n",
" Upper Hunter Courier (Murrurundi, NSW : 1871) | \n",
" mt | \n",
" 0.142857 | \n",
" 14 | \n",
" Maltese | \n",
"
\n",
" \n",
" 812 | \n",
" 892 | \n",
" Warwick Daily News (Qld. : 1919 -1954) | \n",
" mt | \n",
" 0.111111 | \n",
" 100 | \n",
" Maltese | \n",
"
\n",
" \n",
" 928 | \n",
" 34 | \n",
" The Advertiser (Adelaide, SA : 1889 - 1931) | \n",
" mt | \n",
" 0.486111 | \n",
" 100 | \n",
" Maltese | \n",
"
\n",
" \n",
" 1205 | \n",
" 543 | \n",
" Cobden Times (Vic. : 1918) | \n",
" mt | \n",
" 0.109890 | \n",
" 100 | \n",
" Maltese | \n",
"
\n",
" \n",
" 1375 | \n",
" 384 | \n",
" North Melbourne Gazette (Vic. : 1894 - 1901) | \n",
" mt | \n",
" 0.189873 | \n",
" 100 | \n",
" Maltese | \n",
"
\n",
" \n",
" 1431 | \n",
" 318 | \n",
" Sandringham Southern Cross (Vic. : 1914 - 1918) | \n",
" mt | \n",
" 0.243902 | \n",
" 100 | \n",
" Maltese | \n",
"
\n",
" \n",
" 1565 | \n",
" 1583 | \n",
" The Mildura Irrigationist (Vic. : 1892 - 1893) | \n",
" mt | \n",
" 0.762500 | \n",
" 100 | \n",
" Maltese | \n",
"
\n",
" \n",
" 1568 | \n",
" 1581 | \n",
" The Mildura Irrigationist and Murray River Agr... | \n",
" mt | \n",
" 0.626667 | \n",
" 100 | \n",
" Maltese | \n",
"
\n",
" \n",
" 1577 | \n",
" 1733 | \n",
" The Morwell Advocate and Boolara and Mirboo Ch... | \n",
" mt | \n",
" 0.625000 | \n",
" 21 | \n",
" Maltese | \n",
"
\n",
" \n",
" 1580 | \n",
" 1734 | \n",
" The Morwell Advocate and Narracan, Boolara and... | \n",
" mt | \n",
" 0.170732 | \n",
" 100 | \n",
" Maltese | \n",
"
\n",
" \n",
" 1927 | \n",
" 1617 | \n",
" The Derby News (WA : 1887) | \n",
" mt | \n",
" 0.750000 | \n",
" 5 | \n",
" Maltese | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" id title language \\\n",
"203 1596 L'Italo-Australiano = The Italo-Australian (Su... mt \n",
"270 389 Reporter and Illawarra Journal (Kiama, NSW : 1... mt \n",
"286 418 Southern Morning Herald (Goulburn, NSW : 1920 ... mt \n",
"289 623 Sunday News (Sydney, NSW : 1919) mt \n",
"530 500 The Richmond River Express and Casino Kyogle A... mt \n",
"654 810 Upper Hunter Courier (Murrurundi, NSW : 1871) mt \n",
"812 892 Warwick Daily News (Qld. : 1919 -1954) mt \n",
"928 34 The Advertiser (Adelaide, SA : 1889 - 1931) mt \n",
"1205 543 Cobden Times (Vic. : 1918) mt \n",
"1375 384 North Melbourne Gazette (Vic. : 1894 - 1901) mt \n",
"1431 318 Sandringham Southern Cross (Vic. : 1914 - 1918) mt \n",
"1565 1583 The Mildura Irrigationist (Vic. : 1892 - 1893) mt \n",
"1568 1581 The Mildura Irrigationist and Murray River Agr... mt \n",
"1577 1733 The Morwell Advocate and Boolara and Mirboo Ch... mt \n",
"1580 1734 The Morwell Advocate and Narracan, Boolara and... mt \n",
"1927 1617 The Derby News (WA : 1887) mt \n",
"\n",
" proportion number language_full \n",
"203 0.206349 100 Maltese \n",
"270 0.105882 100 Maltese \n",
"286 0.146667 100 Maltese \n",
"289 0.181818 100 Maltese \n",
"530 0.126437 100 Maltese \n",
"654 0.142857 14 Maltese \n",
"812 0.111111 100 Maltese \n",
"928 0.486111 100 Maltese \n",
"1205 0.109890 100 Maltese \n",
"1375 0.189873 100 Maltese \n",
"1431 0.243902 100 Maltese \n",
"1565 0.762500 100 Maltese \n",
"1568 0.626667 100 Maltese \n",
"1577 0.625000 21 Maltese \n",
"1580 0.170732 100 Maltese \n",
"1927 0.750000 5 Maltese "
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.loc[(df[\"proportion\"] > 0.1) & (df[\"language_full\"] == \"Maltese\")]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you look at results for the *Mildura Irrigationist* [in Trove](https://trove.nla.gov.au/search/advanced/category/newspapers?l-advtitle=1583&l-advWord=100%20-%201000%20Words) you'll see that many of the page images are blurry, and as a result the OCR is very, very bad. Here's a sample:\n",
"\n",
"> ill Tatr W lyltwililUmt aat aa«v aa MwOkaWtOPMlkMrf faiflftMMRltitlWBfMNM fmiMW^M^K IMIOHIpM^fQBMMI ft tWMmrwl tWWiltjfNMStW ffw aailwt«M wtMitiar«lH*a ifcmH af tlw ial«««l ion «M««f ffantoif wwtMaaM. tto tf h «frwringmhw torf M hr toaiy. Im*4. ar, fc> mmirf awlUW wefllaM aA. aaytMaa. l «Wa A tfc» tow waliw Macks b aaM, b wil fVfbH Ja ^IMntaam* Mm' ls tolliac. rt Tto aad nf ttoar UhKMimiw*a afM» ftjrwl ans W l OtfWOar jpaaofTwSi aJwwr la'aahS^*— attor aakwt mm rvfimMiMh* ttoai. day - Why. aa IH thrf t«fl almd yaa.\"iw. aal wwifciha m OiO all tto laM amnavaA, fawawNl I r aa4 f wa* tm enr a Mtcfc tto watrr tto wiaaal m a* a* day pfaMat. aa4 (h* ilj amintir* ilm tTtsjtvL.f**' \"\"j •fria—lhati* tow ««4M k.\" tlml t | r 4m» wtn .aa rUa* I h ha«« t ctoantaf InMM* aM*toclt ttopnaMaf II It la Mat rtgM, t jmi awl a 1 : af but d awtliqg a Mr. Jafc Matwa-(MMa M t «wl y gha yaar «toa anl yaar (ma as «fpai ta af t«l. i pwwiaf Mtan (tot jw. twy MwUI «*a1 a«ry ftajr «ndl tar tlw aad annaH* a*«r aarf a««r aaria. tiaa"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"What happens when we feed this fragment of bad OCR to the language detector? Remarkably, the language detector is 96% sure that it's Maltese! To find out why this is the case, we'd probably have to dig into the way the language detection model was trained. But for our purposes it's enough to know that some of the languages detected seem to be the result of bad OCR."
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"LanguagePrediction(language='mt', probability=0.960280179977417, is_reliable=True, proportion=1.0)"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"ocr = \"\"\"ill Tatr W lyltwililUmt aat aa«v aa MwOkaWtOPMlkMrf faiflftMMRltitlWBfMNM fmiMW^M^K IMIOHIpM^fQBMMI ft tWMmrwl tWWiltjfNMStW ffw aailwt«M wtMitiar«lH*a ifcmH af tlw ial«««l ion «M««f ffantoif wwtMaaM. tto tf h «frwringmhw torf M hr toaiy. Im*4. ar, fc> mmirf awlUW wefllaM aA. aaytMaa. l «Wa A tfc» tow waliw Macks b aaM, b wil fVfbH Ja ^IMntaam* Mm' ls tolliac. rt Tto aad nf ttoar UhKMimiw*a afM» ftjrwl ans W l OtfWOar jpaaofTwSi aJwwr la'aahS^*— attor aakwt mm rvfimMiMh* ttoai. day - Why. aa IH thrf t«fl almd yaa.\"iw. aal wwifciha m OiO all tto laM amnavaA, fawawNl I r aa4 f wa* tm enr a Mtcfc tto watrr tto wiaaal m a* a* day pfaMat. aa4 (h* ilj amintir* ilm tTtsjtvL.f**' \"\"j •fria—lhati* tow ««4M k.\" tlml t | r 4m» wtn .aa rUa* I h ha«« t ctoantaf InMM* aM*toclt ttopnaMaf II It la Mat rtgM, t jmi awl a 1 : af but d awtliqg a Mr. Jafc Matwa-(MMa M t «wl y gha yaar «toa anl yaar (ma as «fpai ta af t«l. i pwwiaf Mtan (tot jw. twy MwUI «*a1 a«ry ftajr «ndl tar tlw aad annaH* a*«r aarf a««r aaria. tiaa\"\"\"\n",
"cld3.get_language(ocr)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Of course there might actually be newspapers with articles in Maltese, so we don't want to filter them all out. So let's do some manual inspection of the newspapers that *seem* to have non-English content. First we'll filter our results to include only languages with proportions of more than 0.05, and then drop out newspapers that seem to be only in English. We end up with 89 different titles. "
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"89"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# The filter on the groupby drops out newspapers that only have articles in English.\n",
"filtered = (\n",
" df.loc[df[\"proportion\"] >= 0.05]\n",
" .groupby(by=[\"title\", \"id\"])\n",
" .filter(lambda x: (len(x) > 1) or (len(x) == 1 and x[\"language\"] != \"en\"))\n",
")\n",
"papers = filtered.groupby(by=[\"title\", \"id\"])\n",
"len(papers)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's list those 89 newspapers. From the list below, I think it's pretty easy to pick out the results that are likely to be the product of bad OCR."
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"A Voz de Timor (Dili, East Timor : 1970 - 1975) (1498)\n"
]
},
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" language_full | \n",
" language | \n",
" proportion | \n",
"
\n",
" \n",
" \n",
" \n",
" 8 | \n",
" Portuguese | \n",
" pt | \n",
" 0.988889 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" language_full language proportion\n",
"8 Portuguese pt 0.988889"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"Adelaider Deutsche Zeitung (SA : 1851 - 1862) (277)\n"
]
},
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" language_full | \n",
" language | \n",
" proportion | \n",
"
\n",
" \n",
" \n",
" \n",
" 828 | \n",
" German | \n",
" de | \n",
" 1.0 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" language_full language proportion\n",
"828 German de 1.0"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"Auburn and District News (NSW : 1929) (1320)\n"
]
},
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" language_full | \n",
" language | \n",
" proportion | \n",
"
\n",
" \n",
" \n",
" \n",
" 43 | \n",
" English | \n",
" en | \n",
" 0.947368 | \n",
"
\n",
" \n",
" 44 | \n",
" Vietnamese | \n",
" vi | \n",
" 0.052632 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" language_full language proportion\n",
"43 English en 0.947368\n",
"44 Vietnamese vi 0.052632"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"Australier Leben = Australian Life (Melbourne, Vic. : 1931 - 1933) (1686)\n"
]
},
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" language_full | \n",
" language | \n",
" proportion | \n",
"
\n",
" \n",
" \n",
" \n",
" 1158 | \n",
" Yiddish | \n",
" yi | \n",
" 1.0 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" language_full language proportion\n",
"1158 Yiddish yi 1.0"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"Australische Zeitung (Adelaide, SA : 1875 - 1916) (1150)\n"
]
},
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" language_full | \n",
" language | \n",
" proportion | \n",
"
\n",
" \n",
" \n",
" \n",
" 832 | \n",
" German | \n",
" de | \n",
" 1.0 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" language_full language proportion\n",
"832 German de 1.0"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"Berita Repoeblik (Djakarta, Indonesia : 1945 - 1946) (1283)\n"
]
},
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" language_full | \n",
" language | \n",
" proportion | \n",
"
\n",
" \n",
" \n",
" \n",
" 14 | \n",
" Malay (macrolanguage) | \n",
" ms | \n",
" 0.891304 | \n",
"
\n",
" \n",
" 15 | \n",
" Indonesian | \n",
" id | \n",
" 0.108696 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" language_full language proportion\n",
"14 Malay (macrolanguage) ms 0.891304\n",
"15 Indonesian id 0.108696"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"Chinese Republic News (Sydney, NSW : 1914 - 1937) (1186)\n"
]
},
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" language_full | \n",
" language | \n",
" proportion | \n",
"
\n",
" \n",
" \n",
" \n",
" 83 | \n",
" Chinese | \n",
" zh | \n",
" 0.928571 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" language_full language proportion\n",
"83 Chinese zh 0.928571"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"Chinese Times (Melbourne, Vic. : 1902 - 1922) (705)\n"
]
},
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" language_full | \n",
" language | \n",
" proportion | \n",
"
\n",
" \n",
" \n",
" \n",
" 1194 | \n",
" Chinese | \n",
" zh | \n",
" 0.918367 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" language_full language proportion\n",
"1194 Chinese zh 0.918367"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"Chronicle and North Coast Advertiser (Qld. : 1903 - 1922) (286)\n"
]
},
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" language_full | \n",
" language | \n",
" proportion | \n",
"
\n",
" \n",
" \n",
" \n",
" 695 | \n",
" English | \n",
" en | \n",
" 0.94898 | \n",
"
\n",
" \n",
" 696 | \n",
" Maltese | \n",
" mt | \n",
" 0.05102 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" language_full language proportion\n",
"695 English en 0.94898\n",
"696 Maltese mt 0.05102"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"Chung Wah News (Perth, WA : 1981 - 1987) (1383)\n"
]
},
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" language_full | \n",
" language | \n",
" proportion | \n",
"
\n",
" \n",
" \n",
" \n",
" 1694 | \n",
" English | \n",
" en | \n",
" 0.566667 | \n",
"
\n",
" \n",
" 1693 | \n",
" Chinese | \n",
" zh | \n",
" 0.388889 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" language_full language proportion\n",
"1694 English en 0.566667\n",
"1693 Chinese zh 0.388889"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"Cobden Times (Vic. : 1918) (543)\n"
]
},
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" language_full | \n",
" language | \n",
" proportion | \n",
"
\n",
" \n",
" \n",
" \n",
" 1204 | \n",
" English | \n",
" en | \n",
" 0.857143 | \n",
"
\n",
" \n",
" 1205 | \n",
" Maltese | \n",
" mt | \n",
" 0.109890 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" language_full language proportion\n",
"1204 English en 0.857143\n",
"1205 Maltese mt 0.109890"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"Colac Reformer (Vic. : 1914 - 1918) (763)\n"
]
},
{
"data": {
"text/plain": [
" language_full language proportion\n",
"1214 English en 0.947368\n",
"1215 Maltese mt 0.052632"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"Daily Post (Hobart, Tas. : 1908 - 1918) (860)\n"
]
},
{
"data": {
"text/plain": [
" language_full language proportion\n",
"1011 English en 0.719101\n",
"1012 Japanese ja 0.112360"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"Der Australische Spiegel = The Australian Mirror (Perth, WA : 1952) (1385)\n"
]
},
{
"data": {
"text/plain": [
" language_full language proportion\n",
"1716 German de 0.82\n",
"1717 English en 0.18"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"Deutsch-Australische Post : Wochenschrift = German-Australian Post : Weekly (Sydney, NSW : 1893 - 1906) (1600)\n"
]
},
{
"data": {
"text/plain": [
" language_full language proportion\n",
"125 German de 1.0"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"Deutsche Zeitung für Sud-Australien = German Times for South Australia (Tanunda, SA : 1851) (1577)\n"
]
},
{
"data": {
"text/plain": [
" language_full language proportion\n",
"844 German de 0.9\n",
"843 English en 0.1"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"Die Brucke = The Bridge (Sydney, NSW : 1934 - 1939) (1591)\n"
]
},
{
"data": {
"text/plain": [
" language_full language proportion\n",
"126 German de 0.704082\n",
"127 English en 0.295918"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"Die Deutsche Post für die Australischen Colonien = The German Australian Post (Adelaide, SA : 1848 - 1851) (1576)\n"
]
},
{
"data": {
"text/plain": [
" language_full language proportion\n",
"845 German de 0.989583"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"Dutch Australian Weekly (Sydney, NSW : 1951 - 1993) (1044)\n"
]
},
{
"data": {
"text/plain": [
" language_full language proportion\n",
"131 Dutch nl 0.969697"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"Dutch Weekly (Sydney, NSW : 1993 - 2004) (1045)\n"
]
},
{
"data": {
"text/plain": [
" language_full language proportion\n",
"134 Dutch nl 0.919192\n",
"135 English en 0.060606"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"Echo : Polski Tygodnik Niezalezny (Perth, WA : 1950 - 1952) (1384)\n"
]
},
{
"data": {
"text/plain": [
" language_full language proportion\n",
"1721 Polish pl 0.91\n",
"1722 English en 0.09"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"Eco Italiano (Perth, WA : 1958 - 1959) (1387)\n"
]
},
{
"data": {
"text/plain": [
" language_full language proportion\n",
"1723 Italian it 1.0"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"Emu Bay Times and North West and West Coast Advocate (Tas. : 1897 - 1899) (116)\n"
]
},
{
"data": {
"text/plain": [
" language_full language proportion\n",
"1027 English en 0.933333\n",
"1028 Maltese mt 0.066667"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"Evelyn Observer, and South and East Bourke Record (Vic. : 1882 - 1902) (145)\n"
]
},
{
"data": {
"text/plain": [
" language_full language proportion\n",
"1241 English en 0.913978\n",
"1240 Maltese mt 0.075269"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"Geraldton Advocate and Johnstone River Guardian (Qld. : 1895 - 1896) (1103)\n"
]
},
{
"data": {
"text/plain": [
" language_full language proportion\n",
"704 English en 0.947917\n",
"705 Maltese mt 0.052083"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"Geraldton Express and Murchison Goldfields News (WA : 1894 - 1896) (1623)\n"
]
},
{
"data": {
"text/plain": [
" language_full language proportion\n",
"1734 English en 0.643836\n",
"1735 Maltese mt 0.095890\n",
"1739 Japanese ja 0.068493"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"Guang yi hua bao = The Chinese Australian Herald (Sydney, NSW : 1894 - 1923) (704)\n"
]
},
{
"data": {
"text/plain": [
" language_full language proportion\n",
"162 Chinese zh 0.854167\n",
"165 Western Frisian fy 0.062500"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"Hamilton Spectator and Grange District Advertiser (Vic. : 1860 - 1870) (927)\n"
]
},
{
"data": {
"text/plain": [
" language_full language proportion\n",
"1282 English en 0.915789\n",
"1283 Maltese mt 0.073684"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"Hellenic Echo (Perth, WA : 1967 - 1968) (1389)\n"
]
},
{
"data": {
"text/plain": [
" language_full language proportion\n",
"1771 Modern Greek (1453-) el 1.0"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"Il Canguro = The Kangaroo (Perth, WA : 1955 - 1957) (1378)\n"
]
},
{
"data": {
"text/plain": [
" language_full language proportion\n",
"1773 Italian it 0.97"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"Il Giornale Italiano (Sydney, NSW : 1932 - 1940) (279)\n"
]
},
{
"data": {
"text/plain": [
" language_full language proportion\n",
"175 Italian it 0.91\n",
"176 English en 0.09"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"Il Risveglio = The Awakening (Sydney, NSW : 1944 - 1954) (1601)\n"
]
},
{
"data": {
"text/plain": [
" language_full language proportion\n",
"177 Italian it 0.75\n",
"178 English en 0.25"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"Italian Bulletin of Australia (Sydney, NSW : 1922 - 1928, 1935 - 1940) (1602)\n"
]
},
{
"data": {
"text/plain": [
" language_full language proportion\n",
"188 English en 0.833333\n",
"189 Italian it 0.166667"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"Italian Bulletin of Commerce (Sydney, NSW : 1929 - 1935) (1603)\n"
]
},
{
"data": {
"text/plain": [
" language_full language proportion\n",
"190 English en 0.893617\n",
"191 Italian it 0.106383"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"Italo-Australian (Sydney, NSW : 1927 - 1940) (1595)\n"
]
},
{
"data": {
"text/plain": [
" language_full language proportion\n",
"192 Italian it 0.97"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"Japanese Perth Times (Subiaco, WA : 1989 - 1996) (1386)\n"
]
},
{
"data": {
"text/plain": [
" language_full language proportion\n",
"1777 Japanese ja 0.9375"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"Kyabram Union (Vic. : 1886 - 1894) (196)\n"
]
},
{
"data": {
"text/plain": [
" language_full language proportion\n",
"1326 English en 0.931818\n",
"1327 Maltese mt 0.068182"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"L'Italo-Australiano = The Italo-Australian (Surry Hills, NSW : 1885) (1596)\n"
]
},
{
"data": {
"text/plain": [
" language_full language proportion\n",
"202 Italian it 0.698413\n",
"203 Maltese mt 0.206349"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"L'Italo-Australiano = The Italo-Australian (Sydney, NSW : 1905 - 1909) (1597)\n"
]
},
{
"data": {
"text/plain": [
" language_full language proportion\n",
"208 Italian it 0.97"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"La Rondine (Perth, WA : 1970 - 1974; 1983 - 1984) (1388)\n"
]
},
{
"data": {
"text/plain": [
" language_full language proportion\n",
"1796 Italian it 0.98"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"Le Courrier Australien (Sydney, NSW : 1892 - 2011) (829)\n"
]
},
{
"data": {
"text/plain": [
" language_full language proportion\n",
"212 French fr 0.76\n",
"213 English en 0.24"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"Mediterranean Voice (Perth, WA : 1971 - 1972) (1390)\n"
]
},
{
"data": {
"text/plain": [
" language_full language proportion\n",
"1815 Modern Greek (1453-) el 0.357143\n",
"1814 English en 0.224490\n",
"1816 Portuguese pt 0.153061\n",
"1809 French fr 0.081633\n",
"1808 Spanish es 0.061224"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"Meie Kodu = Our Home (Sydney, NSW : 1949 - 1956) (280)\n"
]
},
{
"data": {
"text/plain": [
" language_full language proportion\n",
"221 Estonian et 1.0"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"Murchison Times and Cue-Big Bell-Reedy Advocate (WA : 1937 - 1942) (1543)\n"
]
},
{
"data": {
"text/plain": [
" language_full language proportion\n",
"1838 English en 0.892857\n",
"1839 Maltese mt 0.071429"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"Musu Pastoge = Our Haven (Sydney, NSW : 1950 - 1954) (1594)\n"
]
},
{
"data": {
"text/plain": [
" language_full language proportion\n",
"233 Lithuanian lt 0.95"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"Nasza droga (Adelaide, SA : 1952 - 1954) (1323)\n"
]
},
{
"data": {
"text/plain": [
" language_full language proportion\n",
"869 Polish pl 0.89\n",
"870 English en 0.11"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"Norden (Melbourne, Vic. : 1914 - 1918) (797)\n"
]
},
{
"data": {
"text/plain": [
" language_full language proportion\n",
"1366 Danish da 0.752809\n",
"1369 Swedish sv 0.112360\n",
"1367 English en 0.067416"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"North Melbourne Gazette (Vic. : 1894 - 1901) (384)\n"
]
},
{
"data": {
"text/plain": [
" language_full language proportion\n",
"1374 English en 0.784810\n",
"1375 Maltese mt 0.189873"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"Oceania (Sydney, NSW : 1913 - 1915) (1598)\n"
]
},
{
"data": {
"text/plain": [
" language_full language proportion\n",
"254 Italian it 0.54\n",
"255 English en 0.46"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"Reporter and Illawarra Journal (Kiama, NSW : 1887 - 1894) (389)\n"
]
},
{
"data": {
"text/plain": [
" language_full language proportion\n",
"269 English en 0.894118\n",
"270 Maltese mt 0.105882"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"Revue Australienne : Journal des Interets Francais en Australie ... (Sydney, NSW : 1873 - 1874) (1604)\n"
]
},
{
"data": {
"text/plain": [
" language_full language proportion\n",
"271 French fr 0.98"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"Ringwood and Croydon Chronicle (Vic. : 1914 - 1918) (329)\n"
]
},
{
"data": {
"text/plain": [
" language_full language proportion\n",
"1422 English en 0.938144\n",
"1423 Maltese mt 0.061856"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"Sandringham Southern Cross (Vic. : 1914 - 1918) (318)\n"
]
},
{
"data": {
"text/plain": [
" language_full language proportion\n",
"1430 English en 0.731707\n",
"1431 Maltese mt 0.243902"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"Seamen's Strike Bulletin (Melbourne, Vic. : 1919) (1043)\n"
]
},
{
"data": {
"text/plain": [
" language_full language proportion\n",
"1436 Polish pl 0.4\n",
"1435 Bosnian bs 0.2\n",
"1437 Russian ru-Latn 0.2\n",
"1438 Western Frisian fy 0.2"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"Southern Morning Herald (Goulburn, NSW : 1920 - 1923) (418)\n"
]
},
{
"data": {
"text/plain": [
" language_full language proportion\n",
"285 English en 0.800000\n",
"286 Maltese mt 0.146667\n",
"287 Somali so 0.053333"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"Stampa Italiana = The Italian Press (Perth, WA : 1931 - 1932) (1380)\n"
]
},
{
"data": {
"text/plain": [
" language_full language proportion\n",
"1881 Italian it 0.97"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"Suedaustralische Zeitung (Adelaide, SA : 1850 - 1851) (314)\n"
]
},
{
"data": {
"text/plain": [
" language_full language proportion\n",
"924 German de 0.888889\n",
"925 English en 0.111111"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"Sunday News (Sydney, NSW : 1919) (623)\n"
]
},
{
"data": {
"text/plain": [
" language_full language proportion\n",
"290 English en 0.779221\n",
"289 Maltese mt 0.181818"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"Sunday Times Edizione Italiana (Perth, WA : 1958 - 1959) (1379)\n"
]
},
{
"data": {
"text/plain": [
" language_full language proportion\n",
"1888 Italian it 1.0"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"Süd Australische Zeitung (Tanunda and Adelaide, SA : 1860 - 1874) (278)\n"
]
},
{
"data": {
"text/plain": [
" language_full language proportion\n",
"922 German de 0.989691"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"The Advertiser (Adelaide, SA : 1889 - 1931) (34)\n"
]
},
{
"data": {
"text/plain": [
" language_full language proportion\n",
"927 English en 0.513889\n",
"928 Maltese mt 0.486111"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"The Australian Jewish News (Melbourne, Vic. : 1935 - 1999) (1685)\n"
]
},
{
"data": {
"text/plain": [
" language_full language proportion\n",
"1473 English en 0.810526\n",
"1475 Yiddish yi 0.157895"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"The Castlereagh (Gilgandra, NSW : 1905 - 1907) (224)\n"
]
},
{
"data": {
"text/plain": [
" language_full language proportion\n",
"384 English en 0.609195\n",
"385 Somali so 0.310345\n",
"386 Maltese mt 0.080460"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"The Chinese Advertiser (Ballarat, Vic. : 1856) (706)\n"
]
},
{
"data": {
"text/plain": [
" language_full language proportion\n",
"1504 Chinese zh 0.500000\n",
"1506 English en 0.333333\n",
"1505 Scottish Gaelic gd 0.166667"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"The Derby News (WA : 1887) (1617)\n"
]
},
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th></th><th>language_full</th><th>language</th><th>proportion</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>1927</th><td>Maltese</td><td>mt</td><td>0.75</td></tr>\n",
"    <tr><th>1928</th><td>Corsican</td><td>co</td><td>0.25</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" language_full language proportion\n",
"1927 Maltese mt 0.75\n",
"1928 Corsican co 0.25"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"The English and Chinese Advertiser (Vic. : 1856 - 1858) (685)\n"
]
},
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th></th><th>language_full</th><th>language</th><th>proportion</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>1522</th><td>English</td><td>en</td><td>0.894737</td></tr>\n",
"    <tr><th>1523</th><td>Chinese</td><td>zh</td><td>0.052632</td></tr>\n",
"    <tr><th>1524</th><td>Maltese</td><td>mt</td><td>0.052632</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" language_full language proportion\n",
"1522 English en 0.894737\n",
"1523 Chinese zh 0.052632\n",
"1524 Maltese mt 0.052632"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"The Hay Standard and Advertiser for Balranald, Wentworth, Maude...(Hay, NSW : 1871 - 1873; 1880 - 1881; 1890 - 1900) (725)\n"
]
},
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th></th><th>language_full</th><th>language</th><th>proportion</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>441</th><td>English</td><td>en</td><td>0.947368</td></tr>\n",
"    <tr><th>442</th><td>Maltese</td><td>mt</td><td>0.052632</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" language_full language proportion\n",
"441 English en 0.947368\n",
"442 Maltese mt 0.052632"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"The Herald of Tasmania (Hobart, Tas. : 1845) (1741)\n"
]
},
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th></th><th>language_full</th><th>language</th><th>proportion</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>1083</th><td>English</td><td>en</td><td>0.857143</td></tr>\n",
"    <tr><th>1085</th><td>Italian</td><td>it</td><td>0.095238</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" language_full language proportion\n",
"1083 English en 0.857143\n",
"1085 Italian it 0.095238"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"The Jewish Weekly News (Melbourne, Vic. : 1933 - 1935) (1707)\n"
]
},
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th></th><th>language_full</th><th>language</th><th>proportion</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>1535</th><td>English</td><td>en</td><td>0.81</td></tr>\n",
"    <tr><th>1536</th><td>Yiddish</td><td>yi</td><td>0.19</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" language_full language proportion\n",
"1535 English en 0.81\n",
"1536 Yiddish yi 0.19"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"The Melbourne Advertiser (Vic. : 1838) (935)\n"
]
},
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th></th><th>language_full</th><th>language</th><th>proportion</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>1550</th><td>English</td><td>en</td><td>0.666667</td></tr>\n",
"    <tr><th>1551</th><td>Welsh</td><td>cy</td><td>0.333333</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" language_full language proportion\n",
"1550 English en 0.666667\n",
"1551 Welsh cy 0.333333"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"The Mildura Irrigationist (Vic. : 1892 - 1893) (1583)\n"
]
},
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th></th><th>language_full</th><th>language</th><th>proportion</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>1565</th><td>Maltese</td><td>mt</td><td>0.7625</td></tr>\n",
"    <tr><th>1564</th><td>English</td><td>en</td><td>0.1250</td></tr>\n",
"    <tr><th>1566</th><td>Somali</td><td>so</td><td>0.1125</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" language_full language proportion\n",
"1565 Maltese mt 0.7625\n",
"1564 English en 0.1250\n",
"1566 Somali so 0.1125"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"The Mildura Irrigationist and Murray River Agricultural Times (Vic. : 1888) (1581)\n"
]
},
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th></th><th>language_full</th><th>language</th><th>proportion</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>1568</th><td>Maltese</td><td>mt</td><td>0.626667</td></tr>\n",
"    <tr><th>1569</th><td>English</td><td>en</td><td>0.240000</td></tr>\n",
"    <tr><th>1567</th><td>Somali</td><td>so</td><td>0.133333</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" language_full language proportion\n",
"1568 Maltese mt 0.626667\n",
"1569 English en 0.240000\n",
"1567 Somali so 0.133333"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"The Mildura Irrigationist and Murray River Cultural Advocate (Vic. : 1891 - 1892) (1582)\n"
]
},
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th></th><th>language_full</th><th>language</th><th>proportion</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>1570</th><td>English</td><td>en</td><td>0.746667</td></tr>\n",
"    <tr><th>1571</th><td>Somali</td><td>so</td><td>0.146667</td></tr>\n",
"    <tr><th>1572</th><td>Maltese</td><td>mt</td><td>0.093333</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" language_full language proportion\n",
"1570 English en 0.746667\n",
"1571 Somali so 0.146667\n",
"1572 Maltese mt 0.093333"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"The Miner's Right (Boulder, WA : 1897) (1638)\n"
]
},
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th></th><th>language_full</th><th>language</th><th>proportion</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>1984</th><td>English</td><td>en</td><td>0.908163</td></tr>\n",
"    <tr><th>1986</th><td>Maltese</td><td>mt</td><td>0.061224</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" language_full language proportion\n",
"1984 English en 0.908163\n",
"1986 Maltese mt 0.061224"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"The Morwell Advocate and Boolara and Mirboo Chronicle (Vic. : 1886) (1733)\n"
]
},
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th></th><th>language_full</th><th>language</th><th>proportion</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>1577</th><td>Maltese</td><td>mt</td><td>0.625</td></tr>\n",
"    <tr><th>1578</th><td>English</td><td>en</td><td>0.375</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" language_full language proportion\n",
"1577 Maltese mt 0.625\n",
"1578 English en 0.375"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"The Morwell Advocate and Narracan, Boolara and Mirboo Chronicle (Vic. : 1886) (1734)\n"
]
},
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th></th><th>language_full</th><th>language</th><th>proportion</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>1579</th><td>English</td><td>en</td><td>0.829268</td></tr>\n",
"    <tr><th>1580</th><td>Maltese</td><td>mt</td><td>0.170732</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" language_full language proportion\n",
"1579 English en 0.829268\n",
"1580 Maltese mt 0.170732"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"The Reporter (Box Hill, Vic. : 1889 - 1925) (244)\n"
]
},
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th></th><th>language_full</th><th>language</th><th>proportion</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>1594</th><td>English</td><td>en</td><td>0.904255</td></tr>\n",
"    <tr><th>1593</th><td>Maltese</td><td>mt</td><td>0.085106</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" language_full language proportion\n",
"1594 English en 0.904255\n",
"1593 Maltese mt 0.085106"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"The Richmond River Express and Casino Kyogle Advertiser (NSW : 1904 - 1929) (500)\n"
]
},
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th></th><th>language_full</th><th>language</th><th>proportion</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>532</th><td>English</td><td>en</td><td>0.827586</td></tr>\n",
"    <tr><th>530</th><td>Maltese</td><td>mt</td><td>0.126437</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" language_full language proportion\n",
"532 English en 0.827586\n",
"530 Maltese mt 0.126437"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"The Voice of Freedom = Elefthera Phoni (Perth, WA : 1956 - 1957) (1381)\n"
]
},
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th></th><th>language_full</th><th>language</th><th>proportion</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>2064</th><td>Modern Greek (1453-)</td><td>el</td><td>0.98</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" language_full language proportion\n",
"2064 Modern Greek (1453-) el 0.98"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"To Ethnico Vema = Greek National Tribune (Arncliffe, NSW : 1931 - 1954) (1592)\n"
]
},
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th></th><th>language_full</th><th>language</th><th>proportion</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>626</th><td>Modern Greek (1453-)</td><td>el</td><td>1.0</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" language_full language proportion\n",
"626 Modern Greek (1453-) el 1.0"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"Tung Wah News (Sydney, NSW : 1898 - 1902) (1185)\n"
]
},
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th></th><th>language_full</th><th>language</th><th>proportion</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>632</th><td>Chinese</td><td>zh</td><td>0.94</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" language_full language proportion\n",
"632 Chinese zh 0.94"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"Tung Wah Times (Sydney, NSW : 1901 - 1936) (1184)\n"
]
},
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th></th><th>language_full</th><th>language</th><th>proportion</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>638</th><td>Chinese</td><td>zh</td><td>0.926316</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" language_full language proportion\n",
"638 Chinese zh 0.926316"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"Twofold Bay and Maneroo Observer (NSW : 1860) (394)\n"
]
},
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th></th><th>language_full</th><th>language</th><th>proportion</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>645</th><td>English</td><td>en</td><td>0.886364</td></tr>\n",
"    <tr><th>647</th><td>Maltese</td><td>mt</td><td>0.090909</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" language_full language proportion\n",
"645 English en 0.886364\n",
"647 Maltese mt 0.090909"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"Uniamoci (Sydney, NSW : 1903 - 1904) (1599)\n"
]
},
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th></th><th>language_full</th><th>language</th><th>proportion</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>652</th><td>Italian</td><td>it</td><td>1.0</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" language_full language proportion\n",
"652 Italian it 1.0"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"Upper Hunter Courier (Murrurundi, NSW : 1871) (810)\n"
]
},
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th></th><th>language_full</th><th>language</th><th>proportion</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>653</th><td>English</td><td>en</td><td>0.857143</td></tr>\n",
"    <tr><th>654</th><td>Maltese</td><td>mt</td><td>0.142857</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" language_full language proportion\n",
"653 English en 0.857143\n",
"654 Maltese mt 0.142857"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"Vesnik (Perth, WA : 1975 - 1994) (1382)\n"
]
},
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th></th><th>language_full</th><th>language</th><th>proportion</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>2093</th><td>Macedonian</td><td>mk</td><td>0.408163</td></tr>\n",
"    <tr><th>2092</th><td>English</td><td>en</td><td>0.357143</td></tr>\n",
"    <tr><th>2094</th><td>Bulgarian</td><td>bg-Latn</td><td>0.224490</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" language_full language proportion\n",
"2093 Macedonian mk 0.408163\n",
"2092 English en 0.357143\n",
"2094 Bulgarian bg-Latn 0.224490"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"Vil'na Dumka = Free Thought (Sydney, NSW : 1949 - 1954) (1593)\n"
]
},
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th></th><th>language_full</th><th>language</th><th>proportion</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>655</th><td>Ukrainian</td><td>uk</td><td>0.82</td></tr>\n",
"    <tr><th>656</th><td>English</td><td>en</td><td>0.18</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" language_full language proportion\n",
"655 Ukrainian uk 0.82\n",
"656 English en 0.18"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"Warwick Daily News (Qld. : 1919 -1954) (892)\n"
]
},
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th></th><th>language_full</th><th>language</th><th>proportion</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>811</th><td>English</td><td>en</td><td>0.864198</td></tr>\n",
"    <tr><th>812</th><td>Maltese</td><td>mt</td><td>0.111111</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" language_full language proportion\n",
"811 English en 0.864198\n",
"812 Maltese mt 0.111111"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"Williamstown Trade Circular (Vic. : 1855 - 1856) (213)\n"
]
},
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th></th><th>language_full</th><th>language</th><th>proportion</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>1658</th><td>English</td><td>en</td><td>0.882353</td></tr>\n",
"    <tr><th>1659</th><td>Portuguese</td><td>pt</td><td>0.117647</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" language_full language proportion\n",
"1658 English en 0.882353\n",
"1659 Portuguese pt 0.117647"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"for n, l in papers:\n",
"    # Only show titles with a non-English language at or above the 5% threshold.\n",
"    # The mask is built from the group itself (l) rather than the full dataframe.\n",
"    if not l.loc[(~l[\"language\"].isin([\"en\"])) & (l[\"proportion\"] >= 0.05)].empty:\n",
"        print(f\"\\n{n[0]} ({n[1]})\")\n",
"        display(\n",
"            l[[\"language_full\", \"language\", \"proportion\"]]\n",
"            .loc[(l[\"proportion\"] > 0.05)]\n",
"            .sort_values(by=\"proportion\", ascending=False)\n",
"        )"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"I went through the titles above and compiled a list of title identifiers that seem to be producing dodgy results. We can use this to filter these newspapers out of our results."
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [],
"source": [
"# Titles where dodgy OCR causes false positives in language detection\n",
"# This was manually created after scanning results\n",
"dodgy = [\n",
" \"1036\",\n",
" \"1043\",\n",
" \"1103\",\n",
" \"116\",\n",
" \"1207\",\n",
" \"1265\",\n",
" \"13\",\n",
" \"1320\",\n",
" \"1336\",\n",
" \"140\",\n",
" \"1400\",\n",
" \"145\",\n",
" \"1488\",\n",
" \"1543\",\n",
" \"1546\",\n",
" \"1581\",\n",
" \"1582\",\n",
" \"1583\",\n",
" \"1617\",\n",
" \"1623\",\n",
" \"1626\",\n",
" \"1638\",\n",
" \"1675\",\n",
" \"1678\",\n",
" \"171\",\n",
" \"1733\",\n",
" \"1734\",\n",
" \"1741\",\n",
" \"196\",\n",
" \"213\",\n",
" \"224\",\n",
" \"244\",\n",
" \"286\",\n",
" \"292\",\n",
" \"318\",\n",
" \"329\",\n",
" \"34\",\n",
" \"384\",\n",
" \"389\",\n",
" \"394\",\n",
" \"418\",\n",
" \"430\",\n",
" \"431\",\n",
" \"452\",\n",
" \"479\",\n",
" \"499\",\n",
" \"500\",\n",
" \"543\",\n",
" \"570\",\n",
" \"623\",\n",
" \"725\",\n",
" \"763\",\n",
" \"810\",\n",
" \"860\",\n",
" \"886\",\n",
" \"892\",\n",
" \"906\",\n",
" \"92\",\n",
" \"926\",\n",
" \"927\",\n",
" \"935\",\n",
" \"937\",\n",
" \"94\",\n",
" \"946\",\n",
" \"970\",\n",
" \"986\",\n",
"]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here we'll add the dodgy title ids into our filter. It seems that we have 52 newspapers with significant amounts of non-English content."
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"52"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Drop the dodgy titles and any languages below the 5% threshold,\n",
"# then remove titles whose only remaining language is English\n",
"filtered = (\n",
"    df.loc[(~df[\"id\"].isin(dodgy)) & (df[\"proportion\"] >= 0.05)]\n",
"    .groupby(by=[\"title\", \"id\"])\n",
"    .filter(lambda x: (len(x) > 1) or (x[\"language\"].iloc[0] != \"en\"))\n",
")\n",
"papers = filtered.groupby(by=[\"title\", \"id\"])\n",
"len(papers)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's list them."
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"A Voz de Timor (Dili, East Timor : 1970 - 1975)\n",
"Adelaider Deutsche Zeitung (SA : 1851 - 1862)\n",
"Australier Leben = Australian Life (Melbourne, Vic. : 1931 - 1933)\n",
"Australische Zeitung (Adelaide, SA : 1875 - 1916)\n",
"Berita Repoeblik (Djakarta, Indonesia : 1945 - 1946)\n",
"Chinese Republic News (Sydney, NSW : 1914 - 1937)\n",
"Chinese Times (Melbourne, Vic. : 1902 - 1922)\n",
"Chung Wah News (Perth, WA : 1981 - 1987)\n",
"Der Australische Spiegel = The Australian Mirror (Perth, WA : 1952)\n",
"Deutsch-Australische Post : Wochenschrift = German-Australian Post : Weekly (Sydney, NSW : 1893 - 1906)\n",
"Deutsche Zeitung für Sud-Australien = German Times for South Australia (Tanunda, SA : 1851)\n",
"Die Brucke = The Bridge (Sydney, NSW : 1934 - 1939)\n",
"Die Deutsche Post für die Australischen Colonien = The German Australian Post (Adelaide, SA : 1848 - 1851)\n",
"Dutch Australian Weekly (Sydney, NSW : 1951 - 1993)\n",
"Dutch Weekly (Sydney, NSW : 1993 - 2004)\n",
"Echo : Polski Tygodnik Niezalezny (Perth, WA : 1950 - 1952)\n",
"Eco Italiano (Perth, WA : 1958 - 1959)\n",
"Guang yi hua bao = The Chinese Australian Herald (Sydney, NSW : 1894 - 1923)\n",
"Hellenic Echo (Perth, WA : 1967 - 1968)\n",
"Il Canguro = The Kangaroo (Perth, WA : 1955 - 1957)\n",
"Il Giornale Italiano (Sydney, NSW : 1932 - 1940)\n",
"Il Risveglio = The Awakening (Sydney, NSW : 1944 - 1954)\n",
"Italian Bulletin of Australia (Sydney, NSW : 1922 - 1928, 1935 - 1940)\n",
"Italian Bulletin of Commerce (Sydney, NSW : 1929 - 1935)\n",
"Italo-Australian (Sydney, NSW : 1927 - 1940)\n",
"Japanese Perth Times (Subiaco, WA : 1989 - 1996)\n",
"L'Italo-Australiano = The Italo-Australian (Surry Hills, NSW : 1885)\n",
"L'Italo-Australiano = The Italo-Australian (Sydney, NSW : 1905 - 1909)\n",
"La Rondine (Perth, WA : 1970 - 1974; 1983 - 1984)\n",
"Le Courrier Australien (Sydney, NSW : 1892 - 2011)\n",
"Mediterranean Voice (Perth, WA : 1971 - 1972)\n",
"Meie Kodu = Our Home (Sydney, NSW : 1949 - 1956)\n",
"Musu Pastoge = Our Haven (Sydney, NSW : 1950 - 1954)\n",
"Nasza droga (Adelaide, SA : 1952 - 1954)\n",
"Norden (Melbourne, Vic. : 1914 - 1918)\n",
"Oceania (Sydney, NSW : 1913 - 1915)\n",
"Revue Australienne : Journal des Interets Francais en Australie ... (Sydney, NSW : 1873 - 1874)\n",
"Stampa Italiana = The Italian Press (Perth, WA : 1931 - 1932)\n",
"Suedaustralische Zeitung (Adelaide, SA : 1850 - 1851)\n",
"Sunday Times Edizione Italiana (Perth, WA : 1958 - 1959)\n",
"Süd Australische Zeitung (Tanunda and Adelaide, SA : 1860 - 1874)\n",
"The Australian Jewish News (Melbourne, Vic. : 1935 - 1999)\n",
"The Chinese Advertiser (Ballarat, Vic. : 1856)\n",
"The English and Chinese Advertiser (Vic. : 1856 - 1858)\n",
"The Jewish Weekly News (Melbourne, Vic. : 1933 - 1935)\n",
"The Voice of Freedom = Elefthera Phoni (Perth, WA : 1956 - 1957)\n",
"To Ethnico Vema = Greek National Tribune (Arncliffe, NSW : 1931 - 1954)\n",
"Tung Wah News (Sydney, NSW : 1898 - 1902)\n",
"Tung Wah Times (Sydney, NSW : 1901 - 1936)\n",
"Uniamoci (Sydney, NSW : 1903 - 1904)\n",
"Vesnik (Perth, WA : 1975 - 1994)\n",
"Vil'na Dumka = Free Thought (Sydney, NSW : 1949 - 1954)\n"
]
}
],
"source": [
"for n, l in papers:\n",
"    print(n[0])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"That's looking pretty good. Let's save the results as a Markdown file to make it easy to explore. We'll include links into Trove. Here's the [list of all 52 newspapers](non-english-newspapers.md) (also as a [Gist](https://gist.github.com/wragge/9aa385648cff5f0de0c7d4837896df97))."
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {
"tags": [
"nbval-skip"
]
},
"outputs": [],
"source": [
"# Write as UTF-8 because some titles contain non-ASCII characters\n",
"with open(Path(\"non-english-newspapers.md\"), \"w\", encoding=\"utf-8\") as md_file:\n",
"    # Number the titles from 1 for the Markdown headings\n",
"    for i, (n, l) in enumerate(papers, start=1):\n",
"        md_file.write(\n",
"            f\"\\n### {i}. [{n[0]}](http://nla.gov.au/nla.news-title{n[1]})\\n\\n\"\n",
"        )\n",
"        md_file.write(\"| Language | Language code | Proportion of sample |\\n\")\n",
"        md_file.write(\"|---|---|---|\\n\")\n",
"        for row in (\n",
"            l[[\"language_full\", \"language\", \"proportion\"]]\n",
"            .loc[(l[\"proportion\"] > 0.05)]\n",
"            .sort_values(by=\"proportion\", ascending=False)\n",
"            .itertuples()\n",
"        ):\n",
"            md_file.write(\n",
"                f\"| {row.language_full} | {row.language} | {row.proportion} |\\n\"\n",
"            )"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you look at the Markdown file you'll see that there are still some dodgy results – for example, about 17% of the *Chinese Advertiser* sample is detected as 'Scottish Gaelic'. But the point of this exercise was to find non-English newspapers, rather than to accurately measure the proportion of non-English content, so I think we can live with it for now."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"----\n",
"\n",
"Created by [Tim Sherratt](https://timsherratt.org/) for the [GLAM Workbench](https://glam-workbench.github.io/). \n",
"Support this project by becoming a [GitHub sponsor](https://github.com/sponsors/wragge?o=esb)."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.12"
},
"widgets": {
"application/vnd.jupyter.widget-state+json": {
"state": {
"13abd8bf3b3a4efbbfef28059b5e2d28": {
"model_module": "@jupyter-widgets/controls",
"model_module_version": "1.5.0",
"model_name": "DescriptionStyleModel",
"state": {
"description_width": ""
}
},
"32bea42ad3994f479facca131929d4ea": {
"model_module": "@jupyter-widgets/base",
"model_module_version": "1.2.0",
"model_name": "LayoutModel",
"state": {}
},
"460e5f112a1e44afaa87359a8f58e617": {
"model_module": "@jupyter-widgets/controls",
"model_module_version": "1.5.0",
"model_name": "HBoxModel",
"state": {
"children": [
"IPY_MODEL_f2903fe835d64a108adccc3163a545b9",
"IPY_MODEL_96fc52e23bf044c98a178dd686ed9c65",
"IPY_MODEL_c6ab6032dfd6449589467453e9768ccd"
],
"layout": "IPY_MODEL_32bea42ad3994f479facca131929d4ea"
}
},
"70b163754840435288248389045aa6a2": {
"model_module": "@jupyter-widgets/base",
"model_module_version": "1.2.0",
"model_name": "LayoutModel",
"state": {}
},
"751555ed8adf4bada10bb8f1dfc9849c": {
"model_module": "@jupyter-widgets/controls",
"model_module_version": "1.5.0",
"model_name": "DescriptionStyleModel",
"state": {
"description_width": ""
}
},
"96fc52e23bf044c98a178dd686ed9c65": {
"model_module": "@jupyter-widgets/controls",
"model_module_version": "1.5.0",
"model_name": "FloatProgressModel",
"state": {
"bar_style": "success",
"layout": "IPY_MODEL_fcea84e37cff4825b3bc9229f13e7483",
"max": 1741,
"style": "IPY_MODEL_bb0991de58bc4127a7a7855496015f93",
"value": 1741
}
},
"bb0991de58bc4127a7a7855496015f93": {
"model_module": "@jupyter-widgets/controls",
"model_module_version": "1.5.0",
"model_name": "ProgressStyleModel",
"state": {
"description_width": ""
}
},
"c6ab6032dfd6449589467453e9768ccd": {
"model_module": "@jupyter-widgets/controls",
"model_module_version": "1.5.0",
"model_name": "HTMLModel",
"state": {
"layout": "IPY_MODEL_70b163754840435288248389045aa6a2",
"style": "IPY_MODEL_751555ed8adf4bada10bb8f1dfc9849c",
"value": " 1741/1741 [11:33<00:00, 1.47s/it]"
}
},
"f2903fe835d64a108adccc3163a545b9": {
"model_module": "@jupyter-widgets/controls",
"model_module_version": "1.5.0",
"model_name": "HTMLModel",
"state": {
"layout": "IPY_MODEL_fd7abe707d5c493fa9d9ab5643ddbfc9",
"style": "IPY_MODEL_13abd8bf3b3a4efbbfef28059b5e2d28",
"value": "100%"
}
},
"fcea84e37cff4825b3bc9229f13e7483": {
"model_module": "@jupyter-widgets/base",
"model_module_version": "1.2.0",
"model_name": "LayoutModel",
"state": {}
},
"fd7abe707d5c493fa9d9ab5643ddbfc9": {
"model_module": "@jupyter-widgets/base",
"model_module_version": "1.2.0",
"model_name": "LayoutModel",
"state": {}
}
},
"version_major": 2,
"version_minor": 0
}
}
},
"nbformat": 4,
"nbformat_minor": 4
}