{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Examining all files in RecordSearch with the access status of 'Closed'\n", "(Harvested on 1 January 2021)\n", "\n", "This notebook attempts some large-scale analysis of files from the National Archives of Australia's RecordSearch database that have the access status of 'closed'. For a previous attempt at this, see [Closed Access](http://closedaccess.herokuapp.com/). For more background, see my [*Inside Story* article](https://insidestory.org.au/withheld-pending-advice/) from 2018.\n", "\n", "See [this notebook](harvest_closed_files.ipynb) for the code used to harvest the data and create the CSV dataset." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import datetime\n", "\n", "import altair as alt\n", "import pandas as pd" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The harvested data has been saved as a CSV file. First we'll open it up using Pandas." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
identifierseriescontrol_symboltitlecontents_date_strcontents_start_datecontents_end_datelocationaccess_statusaccess_decision_date_straccess_decision_datereasons
065454A3735117[Home Office suspect index]1946 - 19501946-01-011950-01-01CanberraClosed26 Nov 19801980-11-2633(1)(b)
166910A4321929/394 ATTACHMENT 2Abrahams. Opinions.1927 - 19281927-01-011928-01-01CanberraClosed30 Jul 20182018-07-3033(3)(a)(i)|33(3)(b)
266911A4321929/394 ATTACHMENT 3Legal opinions expressed by O Dixon, E Gorman ...1928 - 19281928-01-011928-01-01CanberraClosed30 Jul 20182018-07-3033(3)(a)(i)|33(3)(b)
399746A47149941[THOMAS Leslie Hector (Leading Aircraftman) : ...1943 - 19431943-01-011943-01-01CanberraClosed20 May 19991999-05-2033(1)(g)
4103094A518FJ118/6Nauru Census 19521952 - 19531952-01-011953-01-01CanberraClosed24 Oct 19891989-10-2433(1)(d)|33(1)(g)
\n", "
" ], "text/plain": [ " identifier series control_symbol \\\n", "0 65454 A373 5117 \n", "1 66910 A432 1929/394 ATTACHMENT 2 \n", "2 66911 A432 1929/394 ATTACHMENT 3 \n", "3 99746 A471 49941 \n", "4 103094 A518 FJ118/6 \n", "\n", " title contents_date_str \\\n", "0 [Home Office suspect index] 1946 - 1950 \n", "1 Abrahams. Opinions. 1927 - 1928 \n", "2 Legal opinions expressed by O Dixon, E Gorman ... 1928 - 1928 \n", "3 [THOMAS Leslie Hector (Leading Aircraftman) : ... 1943 - 1943 \n", "4 Nauru Census 1952 1952 - 1953 \n", "\n", " contents_start_date contents_end_date location access_status \\\n", "0 1946-01-01 1950-01-01 Canberra Closed \n", "1 1927-01-01 1928-01-01 Canberra Closed \n", "2 1928-01-01 1928-01-01 Canberra Closed \n", "3 1943-01-01 1943-01-01 Canberra Closed \n", "4 1952-01-01 1953-01-01 Canberra Closed \n", "\n", " access_decision_date_str access_decision_date reasons \n", "0 26 Nov 1980 1980-11-26 33(1)(b) \n", "1 30 Jul 2018 2018-07-30 33(3)(a)(i)|33(3)(b) \n", "2 30 Jul 2018 2018-07-30 33(3)(a)(i)|33(3)(b) \n", "3 20 May 1999 1999-05-20 33(1)(g) \n", "4 24 Oct 1989 1989-10-24 33(1)(d)|33(1)(g) " ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df2020 = pd.read_csv(\n", " \"data/closed-20210101.csv\",\n", " parse_dates=[\"contents_start_date\", \"contents_end_date\", \"access_decision_date\"],\n", " keep_default_na=False,\n", ")\n", "df2020.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "How many closed files are there?" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "11140" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df2020.shape[0]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## What series do the 'closed' files come from?\n", "\n", "First let's see how many different series are represented in the data set." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "686" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df2020[\"series\"].unique().shape[0]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now let's look at the 25 most common series." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "K60 1671\n", "A1838 1571\n", "A13147 581\n", "A6122 322\n", "AWM54 309\n", "A1209 293\n", "A9737 232\n", "B26 196\n", "A1533 173\n", "D4082 162\n", "B73 158\n", "A6135 154\n", "E72 151\n", "F1 126\n", "AWM239 118\n", "A432 114\n", "PP946/1 112\n", "A7452 103\n", "A7324 85\n", "C4384 76\n", "C139 76\n", "A3092 73\n", "A2539 72\n", "A1200 72\n", "D1915 70\n", "Name: series, dtype: int64" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df2020[\"series\"].value_counts()[:25]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Series [A1838](http://recordsearch.naa.gov.au/scripts/AutoSearch.asp?Number=A1838) is familiar to anyone who's looked into the NAA's access examination process. It's a general correspondence series from DFAT, and requests for access tend to take a **long** time to be processed. Series [K60](http://recordsearch.naa.gov.au/scripts/AutoSearch.asp?Number=K60) contains repatriation files from the Department of Veterans' Affairs, so these will often been withheld on privacy grounds. We'll see more about both of these below.\n", "\n", "Let's chart the results." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "# This creates a compact dataset to feed to Altair for charting\n", "# We could make Altair do all the work, but that would embed a lot of data in the notebook.\n", "\n", "# Save the series counts to a new dataframe.\n", "series_counts = df2020[\"series\"].value_counts().to_frame().reset_index()\n", "series_counts.columns = [\"series\", \"count\"]" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "" ], "text/plain": [ "alt.Chart(...)" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Chart the results, sorted by number of files\n", "alt.Chart(series_counts[:50]).mark_bar().encode(\n", " x=alt.X(\"series\", sort=\"-y\"),\n", " y=alt.Y(\"count\", title=\"number of files\"),\n", " tooltip=[\"series\", \"count\"],\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is only the top 50 of 686 series, so quite obviously there's a very long tail of series that have a small number of closed files." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## What reasons are given for closing the files?\n", "\n", "[Section 33](http://www8.austlii.edu.au/cgi-bin/viewdoc/au/legis/cth/consol_act/aa198398/s33.html) of the Archives Act defines a number of 'exemptions' – these are reasons why files should not be opened to public access. These reasons are recorded in RecordSearch, so we can explore why files have been closed. It's a little complicated, however, because multiple exemptions can be applied to a single file. The CSV data file records multiple reasons as a pipe-separated string. First we can look at the most common combinations of reasons." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "33(1)(g) 4573\n", "Withheld pending adv 3398\n", "Parliament Class A 1285\n", "33(1)(a) 497\n", "33(1)(a)|33(1)(b) 260\n", "Closed period 214\n", "33(1)(d)|33(1)(g) 149\n", "33(1)(a)|33(1)(d)|33(1)(g) 120\n", "Non Cwlth-no appeal 54\n", "33(1)(a)|33(1)(b)|Withheld pending adv 53\n", "33(1)(d) 50\n", " 49\n", "33(1)(a)|Withheld pending adv 42\n", "33(1)(a)|33(1)(d)|33(1)(e)(ii)|33(1)(g) 27\n", "33(1)(e)(ii) 27\n", "33(1)(a)|33(1)(d)|33(1)(e)(i)|33(1)(g) 25\n", "33(1)(e)(ii)|33(1)(g) 25\n", "33(3)(a)(i)|33(3)(b)|33(3)(a)(ii)|33(3)(b) 24\n", "33(1)(d)|33(1)(e)(iii)|33(1)(g) 19\n", "33(2)(a)|33(2)(b) 15\n", "33(1)(b) 15\n", "NRF 14\n", "33(3)(a)(i)|33(3)(b) 13\n", "33(1)(a)|33(1)(b)|33(1)(e)(iii) 9\n", "Court records 9\n", "Name: reasons, dtype: int64" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df2020[\"reasons\"].value_counts()[:25]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It's probably more useful, however, to look at the frequency of individual reasons. So we'll split the pip-separated string and create a row for each file/reason combination." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "df2020_reasons = df2020.copy()\n", "# Split the reasons field on pipe symbol |. This turns the string into a list of values.\n", "df2020_reasons[\"reason\"] = df2020_reasons[\"reasons\"].str.split(\"|\")" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "# Now we'll explode the list into separate rows.\n", "df2020_reasons = df2020_reasons.explode(\"reason\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we can look at the frequency of individual reasons. Not, of course, that the sum of the reasons will be greater than the number of files, as some files have multiple exemptions applied to them." ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "33(1)(g) 5003\n", "Withheld pending adv 3524\n", "Parliament Class A 1286\n", "33(1)(a) 1096\n", "33(1)(d) 429\n", "33(1)(b) 362\n", "Closed period 239\n", "33(1)(e)(ii) 110\n", "33(3)(b) 66\n", "Non Cwlth-no appeal 60\n", " 49\n", "33(1)(e)(iii) 46\n", "33(3)(a)(i) 38\n", "33(1)(e)(i) 37\n", "33(1)(j) 30\n", "33(3)(a)(ii) 28\n", "33(2)(a) 24\n", "33(2)(b) 24\n", "NRF 15\n", "MAKE YOUR SELECTION 12\n", "Non Cwlth-depositor 10\n", "Court records 9\n", "33(1)(f)(i) 7\n", "33(1)(f)(ii) 5\n", "33(1)(f)(iii) 4\n", "33(1)(h) 4\n", "33(1)(c) 3\n", "Destroyed 2\n", "Name: reason, dtype: int64" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df2020_reasons[\"reason\"].value_counts()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The reasons starting with '33' are clauses in [section 33](http://www8.austlii.edu.au/cgi-bin/viewdoc/au/legis/cth/consol_act/aa198398/s33.html) of the Archives Act. You can look up the Act to find out more about them, or [look at this list](https://www.naa.gov.au/help-your-research/using-collection/access-records-under-archives-act/why-we-refuse-access) on the NAA website. Some of the reasons, such as 'Parliament Class A' refer to particular types of records that are not subject to the same public access arrangements as other government records. Others, such as 'MAKE YOUR SELECTION' seem to be products of the data entry system!\n", "\n", "Looking at the other most common reasons:\n", "\n", "* 33(1)(g) relates to privacy\n", "* 'Withheld pending adv' is applied to files that are undergoing access examination and have been referred to the relevant government agency for advice on whether they should be released to the public. This is not a final determination – these files may or may not end up being closed. But, as any researcher knows, this process can be *very* slow.\n", "* 33(1)(a) is the national security catch-all\n", "\n", "You might also notice that there's a blank line in the list above. This is because some closed files have no reasons recorded in RecordSearch. We can check this." ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "49" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "missing_reasons = df2020.loc[df2020[\"reasons\"] == \"\"]\n", "missing_reasons.shape[0]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are 46 closed files with no reason recorded. Here's a sample." ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
identifierseriescontrol_symboltitlecontents_date_strcontents_start_datecontents_end_datelocationaccess_statusaccess_decision_date_straccess_decision_datereasons
612546435A1838919/13/4 PART 31France - Disarmament - Nuclear Weapons Testing1972 - 19721972-01-011972-01-01CanberraClosed07 Oct 20162016-10-07
661548392A18383127/3/4South Korea - Labour1959 - 19681959-01-011968-01-01CanberraClosed11 Dec 20142014-12-11
862567499A1838563/2/16 PART 9Radio Australia - Technical - Foreign broadcas...1961 - 19611961-01-011961-01-01CanberraClosed29 Aug 20122012-08-29
1351733600AWM239178[RAN Medical Officers' journals] PENGUIN (1 Ap...1945 - 19451945-01-011945-01-01Australian War MemorialClosed14 Apr 20032003-04-14
1560853102AWM103R478/1/147[Headquarters, 1st Australian Task Force (HQ 1...1970 - 19701970-01-011970-01-01Australian War MemorialClosed22 Jun 20092009-06-22
\n", "
" ], "text/plain": [ " identifier series control_symbol \\\n", "612 546435 A1838 919/13/4 PART 31 \n", "661 548392 A1838 3127/3/4 \n", "862 567499 A1838 563/2/16 PART 9 \n", "1351 733600 AWM239 178 \n", "1560 853102 AWM103 R478/1/147 \n", "\n", " title contents_date_str \\\n", "612 France - Disarmament - Nuclear Weapons Testing 1972 - 1972 \n", "661 South Korea - Labour 1959 - 1968 \n", "862 Radio Australia - Technical - Foreign broadcas... 1961 - 1961 \n", "1351 [RAN Medical Officers' journals] PENGUIN (1 Ap... 1945 - 1945 \n", "1560 [Headquarters, 1st Australian Task Force (HQ 1... 1970 - 1970 \n", "\n", " contents_start_date contents_end_date location \\\n", "612 1972-01-01 1972-01-01 Canberra \n", "661 1959-01-01 1968-01-01 Canberra \n", "862 1961-01-01 1961-01-01 Canberra \n", "1351 1945-01-01 1945-01-01 Australian War Memorial \n", "1560 1970-01-01 1970-01-01 Australian War Memorial \n", "\n", " access_status access_decision_date_str access_decision_date reasons \n", "612 Closed 07 Oct 2016 2016-10-07 \n", "661 Closed 11 Dec 2014 2014-12-11 \n", "862 Closed 29 Aug 2012 2012-08-29 \n", "1351 Closed 14 Apr 2003 2003-04-14 \n", "1560 Closed 22 Jun 2009 2009-06-22 " ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "missing_reasons.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's change the missing reasons to 'None recorded' to make it easier to see what's going on." ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [], "source": [ "df2020_reasons[\"reason\"].replace(\"\", \"None recorded\", inplace=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's chart the frequency of the different reasons." ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [], "source": [ "# Once again we'll create a compact dataset for charting\n", "reason_counts = df2020_reasons[\"reason\"].value_counts().to_frame().reset_index()\n", "reason_counts.columns = [\"reason\", \"count\"]" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "" ], "text/plain": [ "alt.Chart(...)" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Make the Chart\n", "alt.Chart(reason_counts).mark_bar().encode(\n", " x=\"reason\", y=alt.Y(\"count\", title=\"number of files\"), tooltip=[\"reason\", \"count\"]\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Connecting reasons and series\n", "\n", "It would be interesting to bring together the analyses above and see how reasons are distributed across series. First we need to reshape our dataset to show combinations of series and reasons." ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [], "source": [ "# Group files by series and reason, then count the number of combinations\n", "series_reasons_counts = (\n", " df2020_reasons.groupby(by=[\"series\", \"reason\"]).size().reset_index()\n", ")\n", "# Rename columns\n", "series_reasons_counts.columns = [\"series\", \"reason\", \"count\"]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we can chart the results. Once again we'll show the number of files in the 50 most common series, but this time we'll highlight the reasons using color." ] }, { "cell_type": "code", "execution_count": 45, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "" ], "text/plain": [ "alt.Chart(...)" ] }, "execution_count": 45, "metadata": {}, "output_type": "execute_result" } ], "source": [ "alt.Chart(series_reasons_counts).transform_aggregate(\n", " count=\"sum(count)\",\n", " groupby=[\"series\", \"reason\"]\n", " # Sort by number of files\n", ").transform_window(\n", " rank=\"rank(count)\",\n", " sort=[alt.SortField(\"count\", order=\"descending\")]\n", " # Get the top 50\n", ").transform_filter(\n", " alt.datum.rank < 50\n", ").mark_bar().encode(\n", " x=alt.X(\"series\", sort=\"-y\"),\n", " y=alt.Y(\"sum(count)\", title=\"number of files\", axis=alt.Axis(grid=False)),\n", " color=\"reason\",\n", " tooltip=[\"series\", \"reason\", \"count\"],\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we can see that the distribution of reasons varies considerably across series." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## How old are these files?\n", "\n", "You would think that the sensitivity of material in closed files diminishes over time. However, there's no automatic re-assessment or time limit on 'closed' files. They stay closed until someone asks for them to be re-examined. That means that some of these files can be quite old. How old? We can use the contents end date to explore this." ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [], "source": [ "# Normalise contents end values as end of year\n", "df2020[\"contents_end_year\"] = df2020[\"contents_end_date\"].apply(\n", " lambda x: datetime.datetime(x.year, 12, 31)\n", ")" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [], "source": [ "date_counts = df2020[\"contents_end_year\"].value_counts().to_frame().reset_index()\n", "date_counts.columns = [\"end_date\", \"count\"]" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "" ], "text/plain": [ "alt.Chart(...)" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "alt.Chart(date_counts).mark_bar().encode(x=\"year(end_date):T\", y=\"count\").properties(\n", " width=700\n", ")" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "" ], "text/plain": [ "alt.Chart(...)" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "alt.Chart(date_counts.loc[date_counts[\"end_date\"] > \"1890-12-31\"]).mark_bar().encode(\n", " x=\"year(end_date):T\", y=\"count\", tooltip=\"year(end_date)\"\n", ").properties(width=700)" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [], "source": [ "df2020[\"years_old\"] = df2020[\"contents_end_year\"].apply(\n", " lambda x: round((datetime.datetime.now() - x).days / 365)\n", ")" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "" ], "text/plain": [ "alt.Chart(...)" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "age_counts = df2020[\"years_old\"].value_counts().to_frame().reset_index()\n", "age_counts.columns = [\"age\", \"count\"]\n", "alt.Chart(age_counts.loc[age_counts[\"age\"] < 130]).mark_bar().encode(\n", " x=alt.X(\"age:Q\", title=\"age in years\"),\n", " y=alt.Y(\"count\", title=\"number of files\"),\n", " tooltip=[\"age\", \"count\"],\n", ").properties(width=700)" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "count 11140.000000\n", "mean 51.081598\n", "std 19.422067\n", "min 6.000000\n", "25% 36.000000\n", "50% 49.000000\n", "75% 65.000000\n", "max 222.000000\n", "Name: years_old, dtype: float64" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df2020[\"years_old\"].describe()" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "count 1096.000000\n", "mean 63.380474\n", "std 12.805262\n", "min 22.000000\n", "25% 58.000000\n", "50% 66.000000\n", "75% 73.000000\n", "max 96.000000\n", "Name: years_old, dtype: float64" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df2020.loc[df2020[\"reasons\"].str.contains(\"33(1)(a)\", regex=False)][\n", " \"years_old\"\n", "].describe()" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[36.0, 49.0, 65.0]" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df2020[\"years_old\"].quantile([0.25, 0.5, 0.75]).to_list()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Dates of decisions" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [], "source": [ "df2020[\"year\"] = df2020[\"access_decision_date\"].dt.year\n", "year_counts = df2020[\"year\"].value_counts().to_frame().reset_index()\n", "year_counts.columns = [\"year\", \"count\"]" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "" ], "text/plain": [ "alt.Chart(...)" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "alt.Chart(year_counts).mark_bar().encode(x=\"year:O\", y=\"count\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 33(1)(a)" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
identifierseriescontrol_symboltitlecontents_date_strcontents_start_datecontents_end_datelocationaccess_statusaccess_decision_date_straccess_decision_datereasonscontents_end_yearyears_oldyear
5140757A18383034/2/2/2 PART 7Indonesia. Communism in Indonesia1960 - 19621960-01-011962-01-01CanberraClosed14 May 20122012-05-1433(1)(a)|33(1)(b)|Withheld pending adv1962-12-31602012
7170971A81641/301/195Exchange of staff between Joint Intelligence B...1953 - 19581953-01-011958-01-01CanberraClosed29 Apr 19911991-04-2933(1)(a)|33(1)(b)1958-12-31641991
8171089A81643/302/76Cryptographic Material for ASIO1951 - 19521951-01-011952-01-01CanberraClosed11 Mar 19931993-03-1133(1)(a)|33(1)(b)1952-12-31701993
9171129A81644/301/219SEATO [South East Asia Treaty Organisation] Co...1957 - 19571957-01-011957-01-01CanberraClosed01 Aug 19911991-08-0133(1)(a)|33(1)(b)1957-12-31651991
12200166A119629/501/225Evasion of Customs Regulations - RAAF Station,...1944 - 19441944-01-011944-01-01CanberraClosed09 Apr 19751975-04-0933(1)(a)|33(1)(b)1944-12-31781975
\n", "
" ], "text/plain": [ " identifier series control_symbol \\\n", "5 140757 A1838 3034/2/2/2 PART 7 \n", "7 170971 A816 41/301/195 \n", "8 171089 A816 43/302/76 \n", "9 171129 A816 44/301/219 \n", "12 200166 A1196 29/501/225 \n", "\n", " title contents_date_str \\\n", "5 Indonesia. Communism in Indonesia 1960 - 1962 \n", "7 Exchange of staff between Joint Intelligence B... 1953 - 1958 \n", "8 Cryptographic Material for ASIO 1951 - 1952 \n", "9 SEATO [South East Asia Treaty Organisation] Co... 1957 - 1957 \n", "12 Evasion of Customs Regulations - RAAF Station,... 1944 - 1944 \n", "\n", " contents_start_date contents_end_date location access_status \\\n", "5 1960-01-01 1962-01-01 Canberra Closed \n", "7 1953-01-01 1958-01-01 Canberra Closed \n", "8 1951-01-01 1952-01-01 Canberra Closed \n", "9 1957-01-01 1957-01-01 Canberra Closed \n", "12 1944-01-01 1944-01-01 Canberra Closed \n", "\n", " access_decision_date_str access_decision_date \\\n", "5 14 May 2012 2012-05-14 \n", "7 29 Apr 1991 1991-04-29 \n", "8 11 Mar 1993 1993-03-11 \n", "9 01 Aug 1991 1991-08-01 \n", "12 09 Apr 1975 1975-04-09 \n", "\n", " reasons contents_end_year years_old year \n", "5 33(1)(a)|33(1)(b)|Withheld pending adv 1962-12-31 60 2012 \n", "7 33(1)(a)|33(1)(b) 1958-12-31 64 1991 \n", "8 33(1)(a)|33(1)(b) 1952-12-31 70 1993 \n", "9 33(1)(a)|33(1)(b) 1957-12-31 65 1991 \n", "12 33(1)(a)|33(1)(b) 1944-12-31 78 1975 " ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df331a = df2020.loc[df2020[\"reasons\"].str.contains(\"33(1)(a)\", regex=False)]\n", "df331a.head()" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [], "source": [ "series_counts_331a = df331a[\"series\"].value_counts().to_frame().reset_index()\n", "series_counts_331a.columns = [\"series\", \"count\"]" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "" ], "text/plain": [ "alt.Chart(...)" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "alt.Chart(series_counts_331a[:50]).mark_bar().encode(\n", " x=alt.X(\"series\", sort=\"-y\"),\n", " y=alt.Y(\"count\", title=\"number of files\"),\n", " tooltip=[\"series\", \"count\"],\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Withheld pending advice" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
identifierseriescontrol_symboltitlecontents_date_strcontents_start_datecontents_end_datelocationaccess_statusaccess_decision_date_straccess_decision_datereasonscontents_end_yearyears_oldyear
5140757A18383034/2/2/2 PART 7Indonesia. Communism in Indonesia1960 - 19621960-01-011962-01-01CanberraClosed14 May 20122012-05-1433(1)(a)|33(1)(b)|Withheld pending adv1962-12-31602012
10171205A81648/301/131Inland tele-radio channels in Australia, Papua...1950 - 19531950-01-011953-01-01CanberraClosed16 Aug 20182018-08-16Withheld pending adv1953-12-31692018
11199284A11962/501/295Provision of Capacity for the Manufacture of n...1952 - 19551952-01-011955-01-01CanberraClosed04 Mar 20202020-03-04Withheld pending adv1955-12-31672020
13200647A119636/501/729 PART 2RAAF Component of the Strategic Reserve- Execu...1958 - 19581958-01-011958-01-01CanberraClosed02 Nov 20162016-11-02Withheld pending adv1958-12-31642016
14200648A119636/501/729 PART 3RAAF Component Strategic Reserve. (Execution o...1958 - 19591958-01-011959-01-01CanberraClosed02 Nov 20162016-11-02Withheld pending adv1959-12-31632016
\n", "
" ], "text/plain": [ " identifier series control_symbol \\\n", "5 140757 A1838 3034/2/2/2 PART 7 \n", "10 171205 A816 48/301/131 \n", "11 199284 A1196 2/501/295 \n", "13 200647 A1196 36/501/729 PART 2 \n", "14 200648 A1196 36/501/729 PART 3 \n", "\n", " title contents_date_str \\\n", "5 Indonesia. Communism in Indonesia 1960 - 1962 \n", "10 Inland tele-radio channels in Australia, Papua... 1950 - 1953 \n", "11 Provision of Capacity for the Manufacture of n... 1952 - 1955 \n", "13 RAAF Component of the Strategic Reserve- Execu... 1958 - 1958 \n", "14 RAAF Component Strategic Reserve. (Execution o... 1958 - 1959 \n", "\n", " contents_start_date contents_end_date location access_status \\\n", "5 1960-01-01 1962-01-01 Canberra Closed \n", "10 1950-01-01 1953-01-01 Canberra Closed \n", "11 1952-01-01 1955-01-01 Canberra Closed \n", "13 1958-01-01 1958-01-01 Canberra Closed \n", "14 1958-01-01 1959-01-01 Canberra Closed \n", "\n", " access_decision_date_str access_decision_date \\\n", "5 14 May 2012 2012-05-14 \n", "10 16 Aug 2018 2018-08-16 \n", "11 04 Mar 2020 2020-03-04 \n", "13 02 Nov 2016 2016-11-02 \n", "14 02 Nov 2016 2016-11-02 \n", "\n", " reasons contents_end_year years_old year \n", "5 33(1)(a)|33(1)(b)|Withheld pending adv 1962-12-31 60 2012 \n", "10 Withheld pending adv 1953-12-31 69 2018 \n", "11 Withheld pending adv 1955-12-31 67 2020 \n", "13 Withheld pending adv 1958-12-31 64 2016 \n", "14 Withheld pending adv 1959-12-31 63 2016 " ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dfwh = df2020.loc[df2020[\"reasons\"].str.contains(\"Withheld pending adv\", regex=False)]\n", "dfwh.head()" ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [], "source": [ "series_counts_wh = dfwh[\"series\"].value_counts().to_frame().reset_index()\n", "series_counts_wh.columns = [\"series\", \"count\"]" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "" ], "text/plain": [ "alt.Chart(...)" ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "alt.Chart(series_counts_wh[:50]).mark_bar().encode(\n", " x=alt.X(\"series\", sort=\"-y\"), y=\"count\"\n", ")" ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/tmp/ipykernel_596751/1245189278.py:1: SettingWithCopyWarning: \n", "A value is trying to be set on a copy of a slice from a DataFrame.\n", "Try using .loc[row_indexer,col_indexer] = value instead\n", "\n", "See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n", " dfwh[\"wait_days\"] = dfwh[\"access_decision_date\"].apply(\n" ] } ], "source": [ "dfwh[\"wait_days\"] = dfwh[\"access_decision_date\"].apply(\n", " lambda x: round((datetime.datetime.now() - x).days)\n", ")" ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/tmp/ipykernel_596751/1295627524.py:1: SettingWithCopyWarning: \n", "A value is trying to be set on a copy of a slice from a DataFrame.\n", "Try using .loc[row_indexer,col_indexer] = value instead\n", "\n", "See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n", " dfwh[\"wait_years\"] = dfwh[\"access_decision_date\"].apply(\n" ] } ], "source": [ "dfwh[\"wait_years\"] = dfwh[\"access_decision_date\"].apply(\n", " lambda x: round((datetime.datetime.now() - x).days / 365)\n", ")" ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "5 652\n", "10 499\n", "7 384\n", "6 376\n", "8 334\n", "2 325\n", "3 275\n", "4 252\n", "11 243\n", "9 114\n", "12 44\n", "13 11\n", "32 2\n", "24 2\n", "19 2\n", "33 1\n", "22 1\n", "30 1\n", "14 1\n", "27 1\n", "31 1\n", "17 1\n", "16 1\n", "15 1\n", "Name: wait_years, dtype: int64" ] }, "execution_count": 38, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dfwh[\"wait_years\"].value_counts()" ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
identifierseriescontrol_symboltitlecontents_date_strcontents_start_datecontents_end_datelocationaccess_statusaccess_decision_date_straccess_decision_datereasonscontents_end_yearyears_oldyearwait_dayswait_years
42271461A183869/1/3/7 PART 2USSR Relations with Australia Activities of Au...1963 - 19641963-01-011964-01-01CanberraClosed01 Nov 20112011-11-01Withheld pending adv1964-12-31582011391311
53273271A1838162/11/52 PART 1Congo. Relations with Other Countries - USSR1960 - 19601960-01-011960-01-01CanberraClosed29 Nov 20112011-11-29Withheld pending adv1960-12-31622011388511
54273272A1838162/11/73 PART 2Congo. Relations with Other Countries. Congo -...1960 - 19611960-01-011961-01-01CanberraClosed29 Nov 20112011-11-29Withheld pending adv1961-12-31612011388511
60277191A6980S250793Non-British European Migration from China Part 61968 - 19791968-01-011979-01-01CanberraClosed04 Aug 20102010-08-04Withheld pending adv1979-12-31432010436712
74302544A12091957/4254ANZUS Council meeting - Washington, November 19561956 - 19561956-01-011956-01-01CanberraClosed06 Apr 20102010-04-0633(1)(a)|33(1)(b)|Withheld pending adv1956-12-31662010448712
......................................................
894130091158A1238120/47/1RCIS - Prime Minister Mr Fraser 19761975 - 19761975-01-011976-01-01CanberraClosed13 Apr 20102010-04-13Withheld pending adv1976-12-31462010448012
905930713375A97371991/766 PART 1French Polynesia - nuclear testing25 Jul 1978 - 29 May 19911978-07-251991-05-29CanberraClosed16 Jan 20122012-01-16Withheld pending adv1991-12-31312012383711
907330714039A97371990/1314 PART 1French nuclear testing27 Feb 1973 - 15 Mar 19951973-02-271995-03-15CanberraClosed15 Dec 20112011-12-15Withheld pending adv1995-12-31272011386911
934031162900A462626Department - Attorney-General's18 Jul 1975 - 14 Apr 19761975-07-181976-04-14CanberraClosed29 Jun 20112011-06-29Withheld pending adv1976-12-31462011403811
934131162905A462631Department of Foreign Affairs24 Jun 1974 - 18 Feb 19761974-06-241976-02-18CanberraClosed29 Jun 20112011-06-29Withheld pending adv1976-12-31462011403811
\n", "

313 rows × 17 columns

\n", "
" ], "text/plain": [ " identifier series control_symbol \\\n", "42 271461 A1838 69/1/3/7 PART 2 \n", "53 273271 A1838 162/11/52 PART 1 \n", "54 273272 A1838 162/11/73 PART 2 \n", "60 277191 A6980 S250793 \n", "74 302544 A1209 1957/4254 \n", "... ... ... ... \n", "8941 30091158 A12381 20/47/1 \n", "9059 30713375 A9737 1991/766 PART 1 \n", "9073 30714039 A9737 1990/1314 PART 1 \n", "9340 31162900 A4626 26 \n", "9341 31162905 A4626 31 \n", "\n", " title \\\n", "42 USSR Relations with Australia Activities of Au... \n", "53 Congo. Relations with Other Countries - USSR \n", "54 Congo. Relations with Other Countries. Congo -... \n", "60 Non-British European Migration from China Part 6 \n", "74 ANZUS Council meeting - Washington, November 1956 \n", "... ... \n", "8941 RCIS - Prime Minister Mr Fraser 1976 \n", "9059 French Polynesia - nuclear testing \n", "9073 French nuclear testing \n", "9340 Department - Attorney-General's \n", "9341 Department of Foreign Affairs \n", "\n", " contents_date_str contents_start_date contents_end_date \\\n", "42 1963 - 1964 1963-01-01 1964-01-01 \n", "53 1960 - 1960 1960-01-01 1960-01-01 \n", "54 1960 - 1961 1960-01-01 1961-01-01 \n", "60 1968 - 1979 1968-01-01 1979-01-01 \n", "74 1956 - 1956 1956-01-01 1956-01-01 \n", "... ... ... ... \n", "8941 1975 - 1976 1975-01-01 1976-01-01 \n", "9059 25 Jul 1978 - 29 May 1991 1978-07-25 1991-05-29 \n", "9073 27 Feb 1973 - 15 Mar 1995 1973-02-27 1995-03-15 \n", "9340 18 Jul 1975 - 14 Apr 1976 1975-07-18 1976-04-14 \n", "9341 24 Jun 1974 - 18 Feb 1976 1974-06-24 1976-02-18 \n", "\n", " location access_status access_decision_date_str access_decision_date \\\n", "42 Canberra Closed 01 Nov 2011 2011-11-01 \n", "53 Canberra Closed 29 Nov 2011 2011-11-29 \n", "54 Canberra Closed 29 Nov 2011 2011-11-29 \n", "60 Canberra Closed 04 Aug 2010 2010-08-04 \n", "74 Canberra Closed 06 Apr 2010 2010-04-06 \n", "... ... ... ... ... \n", "8941 Canberra Closed 13 Apr 2010 2010-04-13 \n", "9059 Canberra Closed 16 Jan 2012 2012-01-16 \n", "9073 Canberra Closed 15 Dec 2011 2011-12-15 \n", "9340 Canberra Closed 29 Jun 2011 2011-06-29 \n", "9341 Canberra Closed 29 Jun 2011 2011-06-29 \n", "\n", " reasons contents_end_year years_old \\\n", "42 Withheld pending adv 1964-12-31 58 \n", "53 Withheld pending adv 1960-12-31 62 \n", "54 Withheld pending adv 1961-12-31 61 \n", "60 Withheld pending adv 1979-12-31 43 \n", "74 33(1)(a)|33(1)(b)|Withheld pending adv 1956-12-31 66 \n", "... ... ... ... \n", "8941 Withheld pending adv 1976-12-31 46 \n", "9059 Withheld pending adv 1991-12-31 31 \n", "9073 Withheld pending adv 1995-12-31 27 \n", "9340 Withheld pending adv 1976-12-31 46 \n", "9341 Withheld pending adv 1976-12-31 46 \n", "\n", " year wait_days wait_years \n", "42 2011 3913 11 \n", "53 2011 3885 11 \n", "54 2011 3885 11 \n", "60 2010 4367 12 \n", "74 2010 4487 12 \n", "... ... ... ... \n", "8941 2010 4480 12 \n", "9059 2012 3837 11 \n", "9073 2011 3869 11 \n", "9340 2011 4038 11 \n", "9341 2011 4038 11 \n", "\n", "[313 rows x 17 columns]" ] }, "execution_count": 39, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dfwh.loc[dfwh[\"wait_years\"] > 10]" ] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "count 3524.000000\n", "mean 6.547673\n", "std 3.039012\n", "min 2.000000\n", "25% 5.000000\n", "50% 6.000000\n", "75% 9.000000\n", "max 33.000000\n", "Name: wait_years, dtype: float64" ] }, "execution_count": 40, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dfwh[\"wait_years\"].describe()" ] }, { "cell_type": "code", "execution_count": 41, "metadata": {}, "outputs": [], "source": [ "dfwhs = df2020.loc[df2020[\"reasons\"] == \"Withheld pending adv\"]" ] }, { "cell_type": "code", "execution_count": 42, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/tmp/ipykernel_596751/1357762301.py:1: SettingWithCopyWarning: \n", "A value is trying to be set on a copy of a slice from a DataFrame.\n", "Try using .loc[row_indexer,col_indexer] = value instead\n", "\n", "See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n", " dfwhs[\"wait_years\"] = dfwhs[\"access_decision_date\"].apply(\n" ] } ], "source": [ "dfwhs[\"wait_years\"] = dfwhs[\"access_decision_date\"].apply(\n", " lambda x: round((datetime.datetime.now() - x).days / 365)\n", ")" ] }, { "cell_type": "code", "execution_count": 43, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "count 3398.000000\n", "mean 6.429665\n", "std 2.778587\n", "min 2.000000\n", "25% 5.000000\n", "50% 6.000000\n", "75% 8.000000\n", "max 24.000000\n", "Name: wait_years, dtype: float64" ] }, "execution_count": 43, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dfwhs[\"wait_years\"].describe()" ] }, { "cell_type": "code", "execution_count": 44, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
identifierseriescontrol_symboltitlecontents_date_strcontents_start_datecontents_end_datelocationaccess_statusaccess_decision_date_straccess_decision_datereasonscontents_end_yearyears_oldyearwait_years
42271461A183869/1/3/7 PART 2USSR Relations with Australia Activities of Au...1963 - 19641963-01-011964-01-01CanberraClosed01 Nov 20112011-11-01Withheld pending adv1964-12-3158201111
53273271A1838162/11/52 PART 1Congo. Relations with Other Countries - USSR1960 - 19601960-01-011960-01-01CanberraClosed29 Nov 20112011-11-29Withheld pending adv1960-12-3162201111
54273272A1838162/11/73 PART 2Congo. Relations with Other Countries. Congo -...1960 - 19611960-01-011961-01-01CanberraClosed29 Nov 20112011-11-29Withheld pending adv1961-12-3161201111
60277191A6980S250793Non-British European Migration from China Part 61968 - 19791968-01-011979-01-01CanberraClosed04 Aug 20102010-08-04Withheld pending adv1979-12-3143201012
622546984A1838938/17/7/1 PART 2United Nations - Law of the Sea - Pollution of...1971 - 19721971-01-011972-01-01CanberraClosed21 Sep 20112011-09-21Withheld pending adv1972-12-3150201111
...................................................
894130091158A1238120/47/1RCIS - Prime Minister Mr Fraser 19761975 - 19761975-01-011976-01-01CanberraClosed13 Apr 20102010-04-13Withheld pending adv1976-12-3146201012
905930713375A97371991/766 PART 1French Polynesia - nuclear testing25 Jul 1978 - 29 May 19911978-07-251991-05-29CanberraClosed16 Jan 20122012-01-16Withheld pending adv1991-12-3131201211
907330714039A97371990/1314 PART 1French nuclear testing27 Feb 1973 - 15 Mar 19951973-02-271995-03-15CanberraClosed15 Dec 20112011-12-15Withheld pending adv1995-12-3127201111
934031162900A462626Department - Attorney-General's18 Jul 1975 - 14 Apr 19761975-07-181976-04-14CanberraClosed29 Jun 20112011-06-29Withheld pending adv1976-12-3146201111
934131162905A462631Department of Foreign Affairs24 Jun 1974 - 18 Feb 19761974-06-241976-02-18CanberraClosed29 Jun 20112011-06-29Withheld pending adv1976-12-3146201111
\n", "

282 rows × 16 columns

\n", "
" ], "text/plain": [ " identifier series control_symbol \\\n", "42 271461 A1838 69/1/3/7 PART 2 \n", "53 273271 A1838 162/11/52 PART 1 \n", "54 273272 A1838 162/11/73 PART 2 \n", "60 277191 A6980 S250793 \n", "622 546984 A1838 938/17/7/1 PART 2 \n", "... ... ... ... \n", "8941 30091158 A12381 20/47/1 \n", "9059 30713375 A9737 1991/766 PART 1 \n", "9073 30714039 A9737 1990/1314 PART 1 \n", "9340 31162900 A4626 26 \n", "9341 31162905 A4626 31 \n", "\n", " title \\\n", "42 USSR Relations with Australia Activities of Au... \n", "53 Congo. Relations with Other Countries - USSR \n", "54 Congo. Relations with Other Countries. Congo -... \n", "60 Non-British European Migration from China Part 6 \n", "622 United Nations - Law of the Sea - Pollution of... \n", "... ... \n", "8941 RCIS - Prime Minister Mr Fraser 1976 \n", "9059 French Polynesia - nuclear testing \n", "9073 French nuclear testing \n", "9340 Department - Attorney-General's \n", "9341 Department of Foreign Affairs \n", "\n", " contents_date_str contents_start_date contents_end_date \\\n", "42 1963 - 1964 1963-01-01 1964-01-01 \n", "53 1960 - 1960 1960-01-01 1960-01-01 \n", "54 1960 - 1961 1960-01-01 1961-01-01 \n", "60 1968 - 1979 1968-01-01 1979-01-01 \n", "622 1971 - 1972 1971-01-01 1972-01-01 \n", "... ... ... ... \n", "8941 1975 - 1976 1975-01-01 1976-01-01 \n", "9059 25 Jul 1978 - 29 May 1991 1978-07-25 1991-05-29 \n", "9073 27 Feb 1973 - 15 Mar 1995 1973-02-27 1995-03-15 \n", "9340 18 Jul 1975 - 14 Apr 1976 1975-07-18 1976-04-14 \n", "9341 24 Jun 1974 - 18 Feb 1976 1974-06-24 1976-02-18 \n", "\n", " location access_status access_decision_date_str access_decision_date \\\n", "42 Canberra Closed 01 Nov 2011 2011-11-01 \n", "53 Canberra Closed 29 Nov 2011 2011-11-29 \n", "54 Canberra Closed 29 Nov 2011 2011-11-29 \n", "60 Canberra Closed 04 Aug 2010 2010-08-04 \n", "622 Canberra Closed 21 Sep 2011 2011-09-21 \n", "... ... ... ... ... \n", "8941 Canberra Closed 13 Apr 2010 2010-04-13 \n", "9059 Canberra Closed 16 Jan 2012 2012-01-16 \n", "9073 Canberra Closed 15 Dec 2011 2011-12-15 \n", "9340 Canberra Closed 29 Jun 2011 2011-06-29 \n", "9341 Canberra Closed 29 Jun 2011 2011-06-29 \n", "\n", " reasons contents_end_year years_old year wait_years \n", "42 Withheld pending adv 1964-12-31 58 2011 11 \n", "53 Withheld pending adv 1960-12-31 62 2011 11 \n", "54 Withheld pending adv 1961-12-31 61 2011 11 \n", "60 Withheld pending adv 1979-12-31 43 2010 12 \n", "622 Withheld pending adv 1972-12-31 50 2011 11 \n", "... ... ... ... ... ... \n", "8941 Withheld pending adv 1976-12-31 46 2010 12 \n", "9059 Withheld pending adv 1991-12-31 31 2012 11 \n", "9073 Withheld pending adv 1995-12-31 27 2011 11 \n", "9340 Withheld pending adv 1976-12-31 46 2011 11 \n", "9341 Withheld pending adv 1976-12-31 46 2011 11 \n", "\n", "[282 rows x 16 columns]" ] }, "execution_count": 44, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dfwhs.loc[dfwhs[\"wait_years\"] > 10]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.12" }, "widgets": { "application/vnd.jupyter.widget-state+json": { "state": {}, "version_major": 2, "version_minor": 0 } } }, "nbformat": 4, "nbformat_minor": 4 }