{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Column detection results\n", "\n", "This notebook analyses the results of running the column detection script across all of the Stock Exchange images on CloudStor.\n", "\n", "The raw results are in CSV files, one for each year. See [this notebook](find-image-sizes-and-columns-by-year.ipynb) for more details.\n", "\n", "See [this notebook](Visualise-column-detection-results.ipynb) for some visualisations of this data." ] },
{ "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import os" ] },
{ "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [], "source": [ "# We're going to combine all of the CSV files into one big dataframe\n", "\n", "# Collect one dataframe per year\n", "year_dfs = []\n", "\n", "# Loop through the range of years (1901 to 1950 inclusive)\n", "for year in range(1901, 1951):\n", "\n", "    # Open the CSV file for that year as a dataframe and keep it\n", "    year_dfs.append(pd.read_csv('{}.csv'.format(year)))\n", "\n", "# Concatenate the yearly dataframes (DataFrame.append was removed in pandas 2.0).\n", "# Each year's own row index is kept, so index values repeat across years.\n", "combined_df = pd.concat(year_dfs)" ] },
{ "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(72932, 11)" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# How many images do we have data for?\n", "combined_df.shape" ] },
{ "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " directory name \\\n", "0 AU NBAC N193-001/ N193-001_0001.tif \n", "1 AU NBAC N193-001/ N193-001_0002.tif \n", "2 AU NBAC N193-001/ N193-001_0003.tif \n", "3 AU NBAC N193-001/ N193-001_0004.tif \n", "4 AU NBAC N193-001/ N193-001_0005.tif \n", "\n", " path referenceCode \\\n", "0 Shared/ANU-Library/Sydney Stock Exchange 1901-... N193-001 \n", "1 Shared/ANU-Library/Sydney Stock Exchange 1901-... N193-001 \n", "2 Shared/ANU-Library/Sydney Stock Exchange 1901-... N193-001 \n", "3 Shared/ANU-Library/Sydney Stock Exchange 1901-... N193-001 \n", "4 Shared/ANU-Library/Sydney Stock Exchange 1901-... N193-001 \n", "\n", " startDate endDate year width height columns column_positions \n", "0 1901-01-01 1901-03-01 1901 6237 5000 3 0,1811,3222 \n", "1 1901-01-01 1901-03-01 1901 6266 5000 3 205,1840,3259 \n", "2 1901-01-01 1901-03-01 1901 6237 5000 2 286,2068 \n", "3 1901-01-01 1901-03-01 1901 6236 5000 3 9,1821,3219 \n", "4 1901-01-01 1901-03-01 1901 6236 5000 3 288,1821,3220 " ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Have a look inside\n", "combined_df.head()" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "3 41076\n", "4 26917\n", "2 4825\n", "1 19\n", "0 6\n", "Name: columns, dtype: int64" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "combined_df['columns'].value_counts()" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " directory name \\\n", "677 AU NBAC N193-055/ N193-055_0037.tif \n", "1051 AU NBAC N193-064/ N193-064_0078.tif \n", "44 AU NBAC N193-173/ N193-173_0045.tif \n", "50 AU NBAC N193-173/ N193-173_0051.tif \n", "52 AU NBAC N193-173/ N193-173_0053.tif \n", "65 AU NBAC N193-173/ N193-173_0066.tif \n", "\n", " path referenceCode \\\n", "677 Shared/ANU-Library/Sydney Stock Exchange 1901-... N193-055 \n", "1051 Shared/ANU-Library/Sydney Stock Exchange 1901-... N193-064 \n", "44 Shared/ANU-Library/Sydney Stock Exchange 1901-... N193-173 \n", "50 Shared/ANU-Library/Sydney Stock Exchange 1901-... N193-173 \n", "52 Shared/ANU-Library/Sydney Stock Exchange 1901-... N193-173 \n", "65 Shared/ANU-Library/Sydney Stock Exchange 1901-... N193-173 \n", "\n", " startDate endDate year width height columns column_positions \n", "677 1914-07-01 1914-09-01 1914 0 0 0 NaN \n", "1051 1916-10-01 1916-12-01 1916 0 0 0 NaN \n", "44 1944-01-01 1944-03-01 1944 0 0 0 NaN \n", "50 1944-01-01 1944-03-01 1944 0 0 0 NaN \n", "52 1944-01-01 1944-03-01 1944 0 0 0 NaN \n", "65 1944-01-01 1944-03-01 1944 0 0 0 NaN " ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "combined_df.loc[combined_df['width'] == 0]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Pages with 0 or 1 columns detected\n", "\n", "There are 25 pages with 0 or 1 columns detected. Let's see what's up with them..." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " directory name \\\n", "677 AU NBAC N193-055/ N193-055_0037.tif \n", "1051 AU NBAC N193-064/ N193-064_0078.tif \n", "515 AU NBAC N193-090/ N193-090_0210.tif \n", "330 AU NBAC N193-109/ N193-109_0331.tif \n", "856 AU NBAC N193-111/ N193-111_0216.tif \n", "1590 AU NBAC N193-163/ N193-163_0427.tif \n", "44 AU NBAC N193-173/ N193-173_0045.tif \n", "50 AU NBAC N193-173/ N193-173_0051.tif \n", "52 AU NBAC N193-173/ N193-173_0053.tif \n", "65 AU NBAC N193-173/ N193-173_0066.tif \n", "414 AU NBAC N193-173/ N193-173_0415.tif \n", "415 AU NBAC N193-173/ N193-173_0416.tif \n", "416 AU NBAC N193-173/ N193-173_0417.tif \n", "417 AU NBAC N193-173/ N193-173_0418.tif \n", "418 AU NBAC N193-173/ N193-173_0419.tif \n", "419 AU NBAC N193-173/ N193-173_0420.tif \n", "420 AU NBAC N193-173/ N193-173_0421.tif \n", "421 AU NBAC N193-173/ N193-173_0422.tif \n", "422 AU NBAC N193-173/ N193-173_0423.tif \n", "423 AU NBAC N193-173/ N193-173_0424.tif \n", "424 AU NBAC N193-173/ N193-173_0425.tif \n", "426 AU NBAC N193-173/ N193-173_0427.tif \n", "427 AU NBAC N193-173/ N193-173_0428.tif \n", "431 AU NBAC N193-173/ N193-173_0432.tif \n", "432 AU NBAC N193-173/ N193-173_0433.tif \n", "\n", " path referenceCode \\\n", "677 Shared/ANU-Library/Sydney Stock Exchange 1901-... N193-055 \n", "1051 Shared/ANU-Library/Sydney Stock Exchange 1901-... N193-064 \n", "515 Shared/ANU-Library/Sydney Stock Exchange 1901-... N193-090 \n", "330 Shared/ANU-Library/Sydney Stock Exchange 1901-... N193-109 \n", "856 Shared/ANU-Library/Sydney Stock Exchange 1901-... N193-111 \n", "1590 Shared/ANU-Library/Sydney Stock Exchange 1901-... N193-163 \n", "44 Shared/ANU-Library/Sydney Stock Exchange 1901-... N193-173 \n", "50 Shared/ANU-Library/Sydney Stock Exchange 1901-... N193-173 \n", "52 Shared/ANU-Library/Sydney Stock Exchange 1901-... N193-173 \n", "65 Shared/ANU-Library/Sydney Stock Exchange 1901-... N193-173 \n", "414 Shared/ANU-Library/Sydney Stock Exchange 1901-... 
N193-173 \n", "415 Shared/ANU-Library/Sydney Stock Exchange 1901-... N193-173 \n", "416 Shared/ANU-Library/Sydney Stock Exchange 1901-... N193-173 \n", "417 Shared/ANU-Library/Sydney Stock Exchange 1901-... N193-173 \n", "418 Shared/ANU-Library/Sydney Stock Exchange 1901-... N193-173 \n", "419 Shared/ANU-Library/Sydney Stock Exchange 1901-... N193-173 \n", "420 Shared/ANU-Library/Sydney Stock Exchange 1901-... N193-173 \n", "421 Shared/ANU-Library/Sydney Stock Exchange 1901-... N193-173 \n", "422 Shared/ANU-Library/Sydney Stock Exchange 1901-... N193-173 \n", "423 Shared/ANU-Library/Sydney Stock Exchange 1901-... N193-173 \n", "424 Shared/ANU-Library/Sydney Stock Exchange 1901-... N193-173 \n", "426 Shared/ANU-Library/Sydney Stock Exchange 1901-... N193-173 \n", "427 Shared/ANU-Library/Sydney Stock Exchange 1901-... N193-173 \n", "431 Shared/ANU-Library/Sydney Stock Exchange 1901-... N193-173 \n", "432 Shared/ANU-Library/Sydney Stock Exchange 1901-... N193-173 \n", "\n", " startDate endDate year width height columns column_positions \n", "677 1914-07-01 1914-09-01 1914 0 0 0 NaN \n", "1051 1916-10-01 1916-12-01 1916 0 0 0 NaN \n", "515 1923-04-01 1923-06-01 1923 4264 5000 1 355 \n", "330 1928-01-01 1928-03-01 1928 4032 5000 1 26 \n", "856 1928-07-01 1928-09-01 1928 5732 5000 1 12 \n", "1590 1941-07-01 1941-09-01 1941 3642 2464 1 0 \n", "44 1944-01-01 1944-03-01 1944 0 0 0 NaN \n", "50 1944-01-01 1944-03-01 1944 0 0 0 NaN \n", "52 1944-01-01 1944-03-01 1944 0 0 0 NaN \n", "65 1944-01-01 1944-03-01 1944 0 0 0 NaN \n", "414 1944-01-01 1944-03-01 1944 5879 4618 1 0 \n", "415 1944-01-01 1944-03-01 1944 5879 4618 1 0 \n", "416 1944-01-01 1944-03-01 1944 5879 4689 1 9 \n", "417 1944-01-01 1944-03-01 1944 5879 4618 1 0 \n", "418 1944-01-01 1944-03-01 1944 5879 4618 1 0 \n", "419 1944-01-01 1944-03-01 1944 5879 4618 1 0 \n", "420 1944-01-01 1944-03-01 1944 5879 4570 1 0 \n", "421 1944-01-01 1944-03-01 1944 5879 4570 1 0 \n", "422 1944-01-01 1944-03-01 1944 5879 4570 1 0 
\n", "423 1944-01-01 1944-03-01 1944 5879 4558 1 0 \n", "424 1944-01-01 1944-03-01 1944 5879 4558 1 0 \n", "426 1944-01-01 1944-03-01 1944 5879 4558 1 0 \n", "427 1944-01-01 1944-03-01 1944 5879 4558 1 0 \n", "431 1944-01-01 1944-03-01 1944 5879 4558 1 0 \n", "432 1944-01-01 1944-03-01 1944 5879 4558 1 0 " ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Get the problem pages\n", "problems = combined_df.loc[(combined_df['columns'] == 0) | (combined_df['columns'] == 1)]\n", "problems" ] },
{ "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "# If running locally, we need to set up a CloudStor client to download images\n", "# DON'T RUN THIS ON SWAN (or you'll get an error because webdav is not installed)\n", "import webdav.client as wc\n", "from webdav.client import RemoteResourceNotFound\n", "from PIL import Image\n", "# CLOUDSTOR_USER and CLOUDSTOR_PW are stored in a separate credentials file.\n", "from credentials import CLOUDSTOR_USER, CLOUDSTOR_PW\n", "\n", "# Set the connection options.\n", "options = {\n", "    'webdav_hostname': 'https://cloudstor.aarnet.edu.au',\n", "    'webdav_login': CLOUDSTOR_USER,\n", "    'webdav_password': CLOUDSTOR_PW,\n", "    'webdav_root': '/plus/remote.php/webdav/'\n", "}\n", "# Ok let's initiate the client.\n", "client = wc.Client(options)" ] },
{ "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Not found: N193-055_0037.tif\n" ] } ], "source": [ "def download_image(image):\n", "    # Download one image from CloudStor into the local problems/ directory\n", "    local_path = 'problems/{}'.format(image.name)\n", "    try:\n", "        client.download_sync(remote_path=image.path, local_path=local_path)\n", "    except RemoteResourceNotFound:\n", "        print('Not found: {}'.format(image.name))\n", "    else:\n", "        filename, ext = os.path.splitext(image.name)\n", "        # Only make a JPEG thumbnail if the file is bigger than about 3 MB\n", "        if os.path.getsize(local_path) > 3000000:\n", "            img = Image.open(local_path)\n", "            img.thumbnail((1000, 1000), resample=Image.LANCZOS)\n", "            img.save('problems/{}.jpg'.format(filename))\n", "        else:\n", "            print('Small: {}'.format(image.name))\n", "\n", "# Make sure the download directory exists\n", "os.makedirs('problems', exist_ok=True)\n", "\n", "# Download any problem pages we don't already have\n", "for row in problems.itertuples():\n", "    if not os.path.exists('problems/{}'.format(row.name)):\n", "        download_image(row)" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "Note that 6 of the pages have a width and height of 0. This means that the script couldn't open the images. I manually checked these:\n", "\n", "* N193-055_0037.tif – only 31 MB, seems to be compressed (it also has a `.tiff` file extension)\n", "* N193-064_0078.tif – seems ok\n", "* N193-173_0045.tif – seems ok\n", "* N193-173_0051.tif – seems ok\n", "* N193-173_0053.tif – seems ok\n", "* N193-173_0066.tif – seems ok\n", "\n", "I downloaded the 5 that seemed ok and ran the column detection script on them, and the results were as expected. So I think there must have been some temporary problem on CloudStor when the script tried to access them.\n", "\n", "I downloaded the rest, and all of them were either rotated or not in the usual page format. These were rotated:\n", "\n", "* N193-090_0210.tif\n", "* N193-109_0331.tif\n", "\n", "Others:\n", "\n", "* N193-111_0216.tif – back of page\n", "* N193-163_0427.tif – page of publication\n", "\n", "All the rest from `N193-173` are hand-written register pages.\n", "\n", "So, in summary, the column detection script seems to have worked as expected on all of these." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.3" } }, "nbformat": 4, "nbformat_minor": 4 }