{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# When Does The Brain Reach Its Peak In Art?\n", "\n", "\n", "## The Museum of Modern Art (MoMA) Collection\n", "\n", "The Museum of Modern Art (MoMA) acquired its first artworks in 1929, the year it was established. Today, the Museum’s evolving collection contains almost 200,000 works from around the world spanning the last 150 years.\n", "MoMA is committed to helping everyone understand, enjoy, and use its collection. The Museum’s website features 88,398 artworks from 26,422 artists. [This research dataset](https://github.com/MuseumofModernArt/collection/blob/master/Artworks.csv) contains 138,161 records, representing all of the works that have been accessioned into MoMA’s collection and cataloged in its database. It includes basic metadata for each work, including title, artist, date made, medium, dimensions, and date acquired by the Museum. At this time, the dataset is available in CSV format, encoded in `UTF-8`.\n", "\n", "Descriptions of the MoMA columns we are interested in:\n", "\n", "- `Title`: The title of the artwork.\n", "- `Artist`: The name of the artist who created the artwork.\n", "- `Nationality`: The nationality of the artist.\n", "- `BeginDate`: The year in which the artist was born.\n", "- `EndDate`: The year in which the artist died.\n", "- `Gender`: The gender of the artist.\n", "- `Date`: The date when the artwork was created.\n", "- `Department`: The department inside MoMA to which the artwork belongs.\n", "- `Medium`: The materials and techniques used for the artwork.\n", "- `Classification`: The type of artwork.\n", "\n", "\n", "## Introduction\n", "\n", "Scientists say that the human brain reaches its peak efficiency by the age of 30. For instance, see *Chapter 6* of [Behave: The Biology of Humans at Our Best and Worst](https://www.amazon.com/Behave-Biology-Humans-Best-Worst/dp/1594205078) by [Robert M. Sapolsky](https://en.wikipedia.org/wiki/Robert_Sapolsky). 
\n", "\n", "We discovered the MoMA dataset and wondered *whether most of the artworks were actually created by artists in their 30s*. \n", "\n", "We'll proceed from the assumption that, since the artworks have been preserved and included in the museum's collection, they represent a valuable result of human brain activity. \n", "\n", "We'll calculate the artist's age at the time the work was created as the difference between the `Date` and the `BeginDate` columns. \n", "Then we'll create a plot to see *at what age most of the valuable artworks were created*.\n", "\n", "\n", "## Reading The MoMA Dataset\n", "\n", "Let's take a look at the data." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "display.width: 80\n", "display.width: 120\n" ] }, { "data": { "text/html": [ "
" ], "text/plain": [ " Title Artist ConstituentID \\\n", "0 Ferdinandsbrücke Project, Vienna, Austria (Ele... Otto Wagner 6210 \n", "1 City of Music, National Superior Conservatory ... Christian de Portzamparc 7470 \n", "2 Villa near Vienna Project, Outside Vienna, Aus... Emil Hoppe 7605 \n", "3 The Manhattan Transcripts Project, New York, N... Bernard Tschumi 7056 \n", "4 Villa, project, outside Vienna, Austria, Exter... Emil Hoppe 7605 \n", "\n", " ArtistBio Nationality BeginDate EndDate Gender Date \\\n", "0 (Austrian, 1841–1918) (Austrian) (1841) (1918) (Male) 1896 \n", "1 (French, born 1944) (French) (1944) (0) (Male) 1987 \n", "2 (Austrian, 1876–1957) (Austrian) (1876) (1957) (Male) 1903 \n", "3 (French and Swiss, born Switzerland 1944) () (1944) (0) (Male) 1980 \n", "4 (Austrian, 1876–1957) (Austrian) (1876) (1957) (Male) 1903 \n", "\n", " Medium ... ThumbnailURL \\\n", "0 Ink and cut-and-pasted painted pages on paper ... http://www.moma.org/media/W1siZiIsIjU5NDA1Il0s... \n", "1 Paint and colored pencil on print ... http://www.moma.org/media/W1siZiIsIjk3Il0sWyJw... \n", "2 Graphite, pen, color pencil, ink, and gouache ... ... http://www.moma.org/media/W1siZiIsIjk4Il0sWyJw... \n", "3 Photographic reproduction with colored synthet... ... http://www.moma.org/media/W1siZiIsIjEyNCJdLFsi... \n", "4 Graphite, color pencil, ink, and gouache on tr... ... http://www.moma.org/media/W1siZiIsIjEyNiJdLFsi... \n", "\n", " Circumference (cm) Depth (cm) Diameter (cm) Height (cm) Length (cm) Weight (kg) Width (cm) Seat Height (cm) \\\n", "0 NaN NaN NaN 48.6000 NaN NaN 168.9000 NaN \n", "1 NaN NaN NaN 40.6401 NaN NaN 29.8451 NaN \n", "2 NaN NaN NaN 34.3000 NaN NaN 31.8000 NaN \n", "3 NaN NaN NaN 50.8000 NaN NaN 50.8000 NaN \n", "4 NaN NaN NaN 38.4000 NaN NaN 19.1000 NaN \n", "\n", " Duration (sec.) 
\n", "0 NaN \n", "1 NaN \n", "2 NaN \n", "3 NaN \n", "4 NaN \n", "\n", "[5 rows x 29 columns]" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Import libs\n", "from numpy import nan\n", "import pandas as pd\n", "import re\n", "\n", "# Set width of display\n", "print('display.width:', pd.get_option('display.width'))\n", "pd.set_option('display.width', 120)\n", "print('display.width:', pd.get_option('display.width'))\n", "\n", "# Get data\n", "moma = pd.read_csv('data/Artworks.csv')\n", "moma.head()" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "RangeIndex: 138161 entries, 0 to 138160\n", "Data columns (total 29 columns):\n", " # Column Non-Null Count Dtype \n", "--- ------ -------------- ----- \n", " 0 Title 138122 non-null object \n", " 1 Artist 136847 non-null object \n", " 2 ConstituentID 136847 non-null object \n", " 3 ArtistBio 132113 non-null object \n", " 4 Nationality 136847 non-null object \n", " 5 BeginDate 136847 non-null object \n", " 6 EndDate 136847 non-null object \n", " 7 Gender 136847 non-null object \n", " 8 Date 135936 non-null object \n", " 9 Medium 127923 non-null object \n", " 10 Dimensions 128078 non-null object \n", " 11 CreditLine 135629 non-null object \n", " 12 AccessionNumber 138161 non-null object \n", " 13 Classification 138161 non-null object \n", " 14 Department 138161 non-null object \n", " 15 DateAcquired 131029 non-null object \n", " 16 Cataloged 138161 non-null object \n", " 17 ObjectID 138161 non-null int64 \n", " 18 URL 85925 non-null object \n", " 19 ThumbnailURL 75283 non-null object \n", " 20 Circumference (cm) 10 non-null float64\n", " 21 Depth (cm) 13791 non-null float64\n", " 22 Diameter (cm) 1468 non-null float64\n", " 23 Height (cm) 120027 non-null float64\n", " 24 Length (cm) 741 non-null float64\n", " 25 Weight (kg) 289 non-null float64\n", " 26 Width (cm) 119105 non-null 
float64\n", " 27 Seat Height (cm) 0 non-null float64\n", " 28 Duration (sec.) 2233 non-null float64\n", "dtypes: float64(9), int64(1), object(19)\n", "memory usage: 30.6+ MB\n" ] } ], "source": [ "moma.info()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The dataset contains 138,161 artworks, and 24 of the 29 columns contain `null` values.\n", "\n", "\n", "## Exploring The Data\n", "\n", "Now we'll summarize the `NaN` values in the columns we are interested in." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " nan_count nan_percentage\n", "Title 39 0.028228\n", "Artist 1314 0.951064\n", "ArtistBio 6048 4.377502\n", "Nationality 1314 0.951064\n", "BeginDate 1314 0.951064\n", "EndDate 1314 0.951064\n", "Gender 1314 0.951064\n", "Date 2225 1.610440\n", "Department 0 0.000000" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Number of null values per column\n", "moma_nan_cnt = moma[['Title', 'Artist', 'ArtistBio', 'Nationality', 'BeginDate', 'EndDate',\n", " 'Gender', 'Date', 'Department']].isnull().sum()\n", "\n", "# Percentage of null values\n", "moma_nan_pct = moma_nan_cnt / moma.shape[0] * 100\n", "\n", "moma_nan = pd.DataFrame({'nan_count': moma_nan_cnt, 'nan_percentage': moma_nan_pct})\n", "moma_nan" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It looks like there is no artist defined for 1314 (0.95%) artworks (`Artist`, `Nationality`, `BeginDate`, `EndDate`, `Gender` columns). The important column `Date` contains 1.6% empty values.\n", "\n", "Let's find duplicate artworks in the dataset using the unique identifier of the object `ObjectID`." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Duplicated: 0\n" ] } ], "source": [ "# Duplicated\n", "print('Duplicated: {}'.format(moma.duplicated(subset='ObjectID').sum()))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Fortunately, there are no duplicates!\n", "\n", "Let's inspect **the `ArtistBio` column**." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Title Artist \\\n", "151 Slow House Project, North Haven, New York, Pla... Diller + Scofidio, Elizabeth Diller, Ricardo S... \n", "160 City Hall, project, North Canton, Ohio, Perspe... Venturi and Rauch, Robert Venturi, John Rauch \n", "161 House, Northern Delaware, Preliminary study of... Venturi and Rauch, Robert Venturi, John Rauch,... \n", "162 Gordon Wu Hall, Princeton University, Princeto... Venturi, Rauch and Scott Brown, Robert Venturi... \n", "163 Gordon Wu Hall, Princeton University, Princeto... Venturi, Rauch and Scott Brown, Robert Venturi... \n", "164 Lewis Thomas Laboratory for Molecular Biology,... Venturi, Rauch and Scott Brown, Robert Venturi... \n", "241 Sixth Street House project, Santa Monica, CA (... Morphosis, Santa Monica, CA, Thom Mayne, Andre... \n", "368 Slow House Project, North Haven, Long Island, ... Diller + Scofidio, Elizabeth Diller, Ricardo S... \n", "450 Eclectic House Facade, project Venturi and Rauch, John Rauch, Robert Venturi \n", "582 Charrette Submission for The Museum of Modern ... Herzog & de Meuron, Basel, Jacques Herzog, Pie... \n", "584 National Commercial Bank, Jeddah, Saudi Arabia... Skidmore Owings & Merrill, Gordon Bunshaft \n", "586 National Commercial Bank, Jeddah, Saudi Arabia... Skidmore Owings & Merrill, Gordon Bunshaft \n", "588 National Commercial Bank, Jeddah, Saudi Arabia... Skidmore Owings & Merrill, Gordon Bunshaft \n", "590 National Commercial Bank, Jeddah, Saudi Arabia... Skidmore Owings & Merrill, Gordon Bunshaft \n", "591 Charrette Submission for The Museum of Modern ... Dominique Perrault Architecture, Paris, Domini... \n", "\n", " ConstituentID ArtistBio \\\n", "151 8707, 6951, 6952 (American, established 1979) (American, born P... \n", "160 8213, 6132, 8214 (American, est. 1964–1979) (American, 1925–201... \n", "161 8213, 6132, 8214, 8216 (American, est. 1964–1979) (American, 1925–201... \n", "162 8215, 6132, 8214, 8216 (American, established 1980) (American, 1925–2... 
\n", "163 8215, 6132, 8214, 8216 (American, established 1980) (American, 1925–2... \n", "164 8215, 6132, 8214, 8216 (American, established 1980) (American, 1925–2... \n", "241 29711, 8218, 22884 (founded 1972) (American, born 1944) (American... \n", "368 8707, 6951, 6952 (American, established 1979) (American, born P... \n", "450 8213, 8214, 6132 (American, est. 1964–1979) (American, born 193... \n", "582 7567, 7421, 7422 (est. 1978) (Swiss, born 1950) (Swiss, born 1950) \n", "584 5518, 8170 (American, founded 1936) (American, 1909–1990) \n", "586 5518, 8170 (American, founded 1936) (American, 1909–1990) \n", "588 5518, 8170 (American, founded 1936) (American, 1909–1990) \n", "590 5518, 8170 (American, founded 1936) (American, 1909–1990) \n", "591 8053, 8052 (founded 1981) (French, born 1953) \n", "\n", " Nationality BeginDate EndDate \\\n", "151 (American) (American) (American) (1979) (1954) (1935) (0) (0) (0) \n", "160 (American) (American) (American) (1964) (1925) (1930) (1979) (2018) (0) \n", "161 (American) (American) (American) (American) (1964) (1925) (1930) (1931) (1979) (2018) (0) (0) \n", "162 (American) (American) (American) (American) (1980) (1925) (1930) (1931) (0) (2018) (0) (0) \n", "163 (American) (American) (American) (American) (1980) (1925) (1930) (1931) (0) (2018) (0) (0) \n", "164 (American) (American) (American) (American) (1980) (1925) (1930) (1931) (0) (2018) (0) (0) \n", "241 () (American) (American) (1972) (1944) (1958) (0) (0) (0) \n", "368 (American) (American) (American) (1979) (1954) (1935) (0) (0) (0) \n", "450 (American) (American) (American) (1964) (1930) (1925) (1979) (0) (2018) \n", "582 (Swiss) (Swiss) (Swiss) (1978) (1950) (1950) (0) (0) (0) \n", "584 (American) (American) (1936) (1909) (0) (1990) \n", "586 (American) (American) (1936) (1909) (0) (1990) \n", "588 (American) (American) (1936) (1909) (0) (1990) \n", "590 (American) (American) (1936) (1909) (0) (1990) \n", "591 (French) (French) (1981) (1953) (0) (0) \n", "\n", " 
Gender Date \n", "151 () (Female) (Male) 1989 \n", "160 () (Male) (Male) 1965 \n", "161 () (Male) (Male) (Female) 1978 \n", "162 () (Male) (Male) (Female) 1981 \n", "163 () (Male) (Male) (Female) 1981 \n", "164 () (Male) (Male) (Female) 1983 \n", "241 () (Male) (Male) 1990 \n", "368 () (Female) (Male) 1991 \n", "450 () (Male) (Male) 1977 \n", "582 () (Male) (Male) 1997 \n", "584 () (Male) 1977 \n", "586 () (Male) 1977 \n", "588 () (Male) 1977 \n", "590 () (Male) 1977 \n", "591 () (Male) 1997 " ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "artist_bio_pattern_org = r'(?:founded|established|est\\.|active|formed)'\n", "\n", "# Value examples\n", "moma.loc[((moma['ArtistBio'].notnull())\n", " & (moma['ArtistBio'].str.contains(artist_bio_pattern_org, flags=re.I))\n", " ),\n", " ['Title', 'Artist', 'ConstituentID', 'ArtistBio',\n", " 'Nationality', 'BeginDate', 'EndDate', 'Gender', 'Date'\n", " ]\n", " ].head(15)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's inspect **the `Gender` column**." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(Male) 104121\n", "(Female) 17728\n", "() 7403\n", "(Male) (Male) 1771\n", "NaN 1314\n", "(Male) (Male) (Male) 885\n", "(Male) () 819\n", "(Male) (Female) 731\n", "() (Male) 520\n", "(Female) (Male) 489\n", "(Female) (Female) 172\n", "() () 146\n", "(Female) (Male) (Male) 110\n", "() (Male) (Female) 106\n", "() (Male) (Male) 97\n", "Name: Gender, dtype: int64" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Value examples\n", "moma['Gender'].value_counts(dropna=False).head(15)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `Gender` column can contain the following:\n", "\n", "- `Male`\n", "- `Female`\n", "- empty value\n", "- combinations of the values above\n", "\n", "Let's look at the last case." 
] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Title Artist \\\n", "65 House IV Project, Falls Village, Connecticut (... Peter Eisenman, Robert Cole \n", "66 Villa dall'Ava, Paris (Saint-Cloud), France, E... Rem Koolhaas, Madelon Vriesendorp \n", "76 Regional Administrative Center, project \"Tries... Aldo Rossi, Gianni Braghieri, M. Bosshard \n", "107 Woodland Crematorium, Woodland Cemetery, Stock... Erik Gunnar Asplund, Sigurd Lewerentz \n", "110 Palais de la Découverte Project, Paris, France... Paul Nelson, Frantz Jourdain, Oscar Nitzchke \n", "... ... ... \n", "137981 Algae Geographies cup Atelier Luma/ Luma Arles, Eric Klarenbeek, Maa... \n", "137982 Algae Geographies cup Atelier Luma/ Luma Arles, Eric Klarenbeek, Maa... \n", "137983 Algae Geographies cup Atelier Luma/ Luma Arles, Eric Klarenbeek, Maa... \n", "137984 Algae Geographies cup Atelier Luma/ Luma Arles, Eric Klarenbeek, Maa... \n", "137985 Algae Geographies cup Atelier Luma/ Luma Arles, Eric Klarenbeek, Maa... \n", "\n", " ConstituentID ArtistBio \\\n", "65 6969, 8134 (American, born 1932) \n", "66 6956, 6957 (Dutch, born 1944) (Dutch, born 1945) \n", "76 7661, 8131, 8180 (Italian, 1931–1997) (Italian, born 1945) (Ita... \n", "107 27, 24452 (Swedish, 1885–1940) (Swedish) \n", "110 8102, 6703, 4312 (American, 1895–1979) (French, 1847–1935) (Ame... \n", "... ... ... \n", "137981 131930, 132165, 132166, 132113 (est. 2016) (Dutch, born 1978) (Dutch) (Dutch,... \n", "137982 131930, 132165, 132166, 132113 (est. 2016) (Dutch, born 1978) (Dutch) (Dutch,... \n", "137983 131930, 132165, 132166, 132113 (est. 2016) (Dutch, born 1978) (Dutch) (Dutch,... \n", "137984 131930, 132165, 132166, 132113 (est. 2016) (Dutch, born 1978) (Dutch) (Dutch,... \n", "137985 131930, 132165, 132166, 132113 (est. 2016) (Dutch, born 1978) (Dutch) (Dutch,... 
\n", "\n", " Nationality BeginDate EndDate Gender Date \\\n", "65 (American) () (1932) (0) (0) (0) (Male) (Male) 1975 \n", "66 (Dutch) (Dutch) (1944) (1945) (0) (0) (Male) (Female) 1987 \n", "76 (Italian) (Italian) (Italian) (1931) (1945) (0) (1997) (0) (0) (Male) (Male) (Male) 1974 \n", "107 (Swedish) (Swedish) (1885) (0) (1940) (0) (Male) (Male) 1937 \n", "110 (American) (French) (American) (1895) (1847) (1900) (1979) (1935) (1991) (Male) (Male) (Male) 1938 \n", "... ... ... ... ... ... \n", "137981 () (Dutch) (Dutch) (Dutch) (2016) (1978) (0) (2014) (0) (0) (0) (0) () (Male) (Female) () 2019 \n", "137982 () (Dutch) (Dutch) (Dutch) (2016) (1978) (0) (2014) (0) (0) (0) (0) () (Male) (Female) () 2019 \n", "137983 () (Dutch) (Dutch) (Dutch) (2016) (1978) (0) (2014) (0) (0) (0) (0) () (Male) (Female) () 2019 \n", "137984 () (Dutch) (Dutch) (Dutch) (2016) (1978) (0) (2014) (0) (0) (0) (0) () (Male) (Female) () 2019 \n", "137985 () (Dutch) (Dutch) (Dutch) (2016) (1978) (0) (2014) (0) (0) (0) (0) () (Male) (Female) () 2019 \n", "\n", " Medium ... ThumbnailURL \\\n", "65 Ink and color ink on frosted polymer sheet ... http://www.moma.org/media/W1siZiIsIjIxMDA0MSJd... \n", "66 Synthetic polymer paint and ink on paper ... http://www.moma.org/media/W1siZiIsIjYwMTEyIl0s... \n", "76 Rubbed ink and pastel on whiteprint ... http://www.moma.org/media/W1siZiIsIjY5MSJdLFsi... \n", "107 Graphite on tracing paper ... http://www.moma.org/media/W1siZiIsIjEyNjUiXSxb... \n", "110 Ink and color pencil on paper mounted on board ... http://www.moma.org/media/W1siZiIsIjEzMjMiXSxb... \n", "... ... ... ... \n", "137981 Microalgae and sugar-based biopolymer ... NaN \n", "137982 Microalgae and sugar-based biopolymer ... NaN \n", "137983 Microalgae and sugar-based biopolymer ... NaN \n", "137984 Microalgae and sugar-based biopolymer ... NaN \n", "137985 Microalgae and sugar-based biopolymer ... 
NaN \n", "\n", " Circumference (cm) Depth (cm) Diameter (cm) Height (cm) Length (cm) Weight (kg) Width (cm) Seat Height (cm) \\\n", "65 NaN NaN NaN 34.9251 NaN NaN 113.3477 NaN \n", "66 NaN NaN NaN 63.5001 NaN NaN 99.0602 NaN \n", "76 NaN NaN NaN 72.4000 NaN NaN 91.4000 NaN \n", "107 NaN NaN NaN 41.3000 NaN NaN 96.2000 NaN \n", "110 NaN 1.3 NaN 37.5000 NaN NaN 95.3000 NaN \n", "... ... ... ... ... ... ... ... ... \n", "137981 NaN 7.0 NaN 7.0000 NaN NaN 7.0000 NaN \n", "137982 NaN 7.0 NaN 7.0000 NaN NaN 7.0000 NaN \n", "137983 NaN 7.0 NaN 7.0000 NaN NaN 7.0000 NaN \n", "137984 NaN 7.0 NaN 7.0000 NaN NaN 7.0000 NaN \n", "137985 NaN 7.0 NaN 7.0000 NaN NaN 7.0000 NaN \n", "\n", " Duration (sec.) \n", "65 NaN \n", "66 NaN \n", "76 NaN \n", "107 NaN \n", "110 NaN \n", "... ... \n", "137981 NaN \n", "137982 NaN \n", "137983 NaN \n", "137984 NaN \n", "137985 NaN \n", "\n", "[7554 rows x 29 columns]" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "gender_pattern_arr = r'(?:\\((?:male|female)?\\))'\n", "\n", "# Value examples\n", "moma.loc[((moma['Gender'].notnull())\n", " & (moma['Gender'].str.count(gender_pattern_arr, flags=re.I) > 1)\n", " )\n", " ]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "By exploring the `Gender` and `ArtistBio` columns above, we found that an artist can be a company, a collective of artists, or an association of a company and artists.\n", " \n", "The data has no clear indicator of whether the artist is an individual or a team. \n", "We could identify teams by a missing value in the `Gender` column. \n", "However, a gender can also be specified for a team. Below are examples of such data:\n", "\n", "- `Robin Schwartz` is a photographer,\n", "- `General Idea` is a collective of three Canadian artists,\n", "- `Hi Red Center` is a short-lived radical art collective." 
] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
TitleArtistConstituentIDArtistBioNationalityBeginDateEndDateGenderDateMediumDimensionsCreditLineAccessionNumberClassificationDepartmentDateAcquiredCatalogedObjectIDURLThumbnailURL
101220News Flash! What Is the Communication Satellit...Hi Red Center36946(Japanese, 1963–1964)()(1963)(1964)()1964Offsetsheet: 6 7/8 x 10\" (17.4 x 25.4 cm)The Gilbert and Lila Silverman Fluxus Collecti...FC2887PrintFluxus Collection2008-10-08N136625NaNNaN
63914Mid-summer MeadowRobin Schwartz5287(American, born 1957)(American)(1957)(0)(Female)1958Woodcutcomposition: 16 1/4 x 17 1/16\" (41.3 x 43.4cm)...The Ingram-Merrill Foundation385.1958PrintDrawings & Prints1958-12-19Y68210http://www.moma.org/collection/works/68210http://www.moma.org/media/W1siZiIsIjIyMzM4NCJd...
131800Bundle of Events from Fluxus 1Hi Red Center36946(Japanese, 1963–1964)()(1963)(1964)()1964, assembled c. 1976Double-sided offsetbook: 7 1/2 × 8 1/4 × 1 15/16\" (19.1 × 21 × 5 cm)The Gilbert and Lila Silverman Fluxus Collecti...2183.2008.16Illustrated BookDrawings & Prints2008-10-08N277655NaNNaN
136750Orgasm Energy ChartGeneral Idea7474(Canadian, 1969–1994)(Canadian)(1969)(1994)(Male)1970Offset lithographcomposition (irreg.): 15 15/16 × 6 7/16\" (40.5...Anonymous gift785.2019PrintDrawings & Prints2020-04-01Y402075http://www.moma.org/collection/works/402075http://www.moma.org/media/W1siZiIsIjQ4MTc1MSJd...
\n", "
" ], "text/plain": [ " Title Artist ConstituentID ArtistBio \\\n", "101220 News Flash! What Is the Communication Satellit... Hi Red Center 36946 (Japanese, 1963–1964) \n", "63914 Mid-summer Meadow Robin Schwartz 5287 (American, born 1957) \n", "131800 Bundle of Events from Fluxus 1 Hi Red Center 36946 (Japanese, 1963–1964) \n", "136750 Orgasm Energy Chart General Idea 7474 (Canadian, 1969–1994) \n", "\n", " Nationality BeginDate EndDate Gender Date Medium \\\n", "101220 () (1963) (1964) () 1964 Offset \n", "63914 (American) (1957) (0) (Female) 1958 Woodcut \n", "131800 () (1963) (1964) () 1964, assembled c. 1976 Double-sided offset \n", "136750 (Canadian) (1969) (1994) (Male) 1970 Offset lithograph \n", "\n", " Dimensions CreditLine \\\n", "101220 sheet: 6 7/8 x 10\" (17.4 x 25.4 cm) The Gilbert and Lila Silverman Fluxus Collecti... \n", "63914 composition: 16 1/4 x 17 1/16\" (41.3 x 43.4cm)... The Ingram-Merrill Foundation \n", "131800 book: 7 1/2 × 8 1/4 × 1 15/16\" (19.1 × 21 × 5 cm) The Gilbert and Lila Silverman Fluxus Collecti... \n", "136750 composition (irreg.): 15 15/16 × 6 7/16\" (40.5... Anonymous gift \n", "\n", " AccessionNumber Classification Department DateAcquired Cataloged ObjectID \\\n", "101220 FC2887 Print Fluxus Collection 2008-10-08 N 136625 \n", "63914 385.1958 Print Drawings & Prints 1958-12-19 Y 68210 \n", "131800 2183.2008.16 Illustrated Book Drawings & Prints 2008-10-08 N 277655 \n", "136750 785.2019 Print Drawings & Prints 2020-04-01 Y 402075 \n", "\n", " URL ThumbnailURL \n", "101220 NaN NaN \n", "63914 http://www.moma.org/collection/works/68210 http://www.moma.org/media/W1siZiIsIjIyMzM4NCJd... \n", "131800 NaN NaN \n", "136750 http://www.moma.org/collection/works/402075 http://www.moma.org/media/W1siZiIsIjQ4MTc1MSJd... 
" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Value examples\n", "moma.iloc[[101220, 63914, 131800, 136750], 0:20]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Since the main goal of this project is to analyze the individual capabilities of the human brain, we'll focus on the artworks created by single authors, not teams.\n", "\n", "Let's inspect **the `date` columns** such as **`BeginDate`, `EndDate`, `Date`**." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(1995) (1966) (1965) 1\n", "(1995) (1968) (1965) 1\n", "(1996) 2\n", "(1996) (1960) (1961) 5\n", "(1996) (1963) (1965) (1963) (1966) (1965) 1\n", "(1996) (1963) (1966) (1965) (1963) (1965) 1\n", "(1997) 42\n", "(1997) (1964) (1966) 3\n", "(1997) (1988) (1938) (1968) 1\n", "(1998) 2\n", "(1998) (1972) 1\n", "(1999) 1\n", "(1999) (1979) 1\n", "(2000) 1\n", "(2000) (1969) (1972) 1\n", "(2000) (1971) (1972) 1\n", "(2001) (1955) 1\n", "(2002) (1957) (1959) (1958) (1971) (1958) (1964) (1965) (1963) (1958) (0) 1\n", "(2002) (1977) 1\n", "(2002) (1977) (1971) (1980) 3\n", "(2003) 3\n", "(2003) (0) 2\n", "(2003) (1973) (1972) (1973) 3\n", "(2004) 3\n", "(2004) (1973) (0) (0) 1\n", "(2004) (1976) 4\n", "(2005) 12\n", "(2005) (1943) (1953) (1965) (1960) (1967) (1960) (1967) (1974) (1999) (1959) (1996) (1976) (2004) (1952) (1996) (1988) 6\n", "(2006) (1974) (1976) (1976) 1\n", "(2006) (1975) (1983) 13\n", "(2007) 1\n", "(2007) (0) (1977) (1973) (0) (0) 6\n", "(2007) (1979) (1980) 3\n", "(2007) (1979) (1980) (0) (1977) 1\n", "(2009) (1980) (1983) 5\n", "(2010) 3\n", "(2012) 1\n", "(2014) 5\n", "(2016) (1978) (0) (2014) 15\n", "NaN 1314\n", "Name: BeginDate, dtype: int64\n", "Index(['(0)', '(0) (0)', '(0) (0) (0)', '(0) (0) (0) (0)', '(0) (0) (0) (0) (0)', '(0) (0) (0) (0) (0) (0)',\n", " '(0) (0) (0) (0) (0) (0) (0) (0) (0) (0) (0) (0) (0) (0)',\n", " '(0) (0) (0) (0) 
(0) (0) (0) (0) (0) (0) (0) (0) (0) (0) (0) (0) (0) (0) (0) (0) (0) (1957)',\n", " '(0) (0) (0) (0) (0) (0) (0) (0) (0) (0) (0) (0) (0) (0) (0) (1957)',\n", " '(0) (0) (0) (0) (0) (0) (0) (0) (0) (0) (0) (0) (0) (1957)', '(0) (0) (0) (0) (0) (0) (0) (0) (0) (1957)',\n", " '(0) (0) (0) (0) (0) (0) (0) (1957)',\n", " '(0) (0) (0) (0) (0) (0) (1955) (0) (0) (1956) (0) (1943) (0) (1954) (0) (0) (0) (0)',\n", " '(0) (0) (0) (0) (0) (0) (1957)', '(0) (0) (0) (0) (0) (1923) (0) (0) (0)', '(0) (0) (0) (0) (0) (2001)',\n", " '(0) (0) (0) (0) (1937) (0) (0) (1939) (0)', '(0) (0) (0) (1867) (0)', '(0) (0) (0) (1872) (0)',\n", " '(0) (0) (0) (1875) (0) (0)', '(0) (0) (0) (1897)',\n", " '(0) (0) (0) (1922) (0) (0) (1933) (1937) (1916) (1899) (1935)', '(0) (0) (1862) (1861) (0)',\n", " '(0) (0) (1863) (0) (0)', '(0) (0) (1877)', '(0) (0) (1886) (1883)',\n", " '(0) (0) (1889) (1888) (1892) (0) (0) (1875) (0) (0) (1891) (0) (1891) (0) (1861) (0) (0) (0) (0) (0)',\n", " '(0) (0) (1894) (0) (0) (0) (1898)', '(0) (0) (1894) (0) (0) (1894) (0) (0) (0) (0)', '(0) (0) (1895)',\n", " '(0) (0) (1896) (1892)', '(0) (0) (1897) (1892)', '(0) (0) (1899) (0) (1883) (1891)', '(0) (0) (1902) (0)',\n", " '(0) (0) (1903)',\n", " '(0) (0) (1922) (1922) (1930) (1938) (1925) (1927) (1936) (1916) (1928) (1928) (1926) (1930) (0) (0) (0)',\n", " '(0) (0) (1925) (1894)', '(0) (0) (1925) (1900) (1904)', '(0) (0) (1926)',\n", " '(0) (0) (1934) (0) (1927) (1938) (0) (1927) (1940) (1933) (1929) (1936) (0) (1926) (1926) (1932) (1932) (0) (1930) (1930) (1925) (1935) (1932) (1928) (0)'],\n", " dtype='object')\n" ] } ], "source": [ "# Value examples\n", "print(moma['BeginDate'].value_counts(dropna=False).sort_index().tail(40),\n", " moma['BeginDate'].value_counts(dropna=False).sort_index().head(40).index,\n", " sep='\\n'\n", " )" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(2017) (0) (0) 10\n", "(2017) (0) (0) (0) (0) 
1\n", "(2017) (0) (0) (0) (0) (0) (0) 1\n", "(2017) (0) (0) (0) (0) (0) (1988) (0) (0) (0) (0) (0) 1\n", "(2017) (0) (1970) (0) 1\n", "(2017) (1986) 1\n", "(2017) (2006) (2009) 23\n", "(2017) (2013) 1\n", "(2017) (2014) 3\n", "(2017) (2017) 1\n", "(2017) (2018) (2007) (1997) (0) (0) (0) 5\n", "(2018) 513\n", "(2018) (0) 2\n", "(2018) (0) (1978) (1993) 1\n", "(2018) (0) (2008) 2\n", "(2018) (1966) 4\n", "(2018) (1978) 2\n", "(2018) (1978) (0) 1\n", "(2018) (1987) 1\n", "(2018) (1988) 1\n", "(2018) (2006) (2009) 29\n", "(2018) (2017) (0) 1\n", "(2019) 479\n", "(2019) (0) 13\n", "(2019) (0) (0) 1\n", "(2019) (0) (0) (0) 1\n", "(2019) (0) (0) (0) (0) (0) (0) (0) 10\n", "(2019) (1958) 1\n", "(2019) (1981) 1\n", "(2019) (1998) (0) (0) (0) (0) (0) (0) (0) (0) (0) (0) 1\n", "(2020) 570\n", "(2020) (0) 6\n", "(2020) (0) (0) (0) (0) (2015) (0) (0) (2012) (2020) (0) (0) (0) (0) (0) 1\n", "(2020) (0) (1969) 1\n", "(2020) (1989) (0) 1\n", "(2020) (2006) 8\n", "(2020) (2006) (2009) 64\n", "(2020) (2010) 1\n", "(2020) (2011) 2\n", "NaN 1314\n", "Name: EndDate, dtype: int64\n", "Index(['(0)', '(0) (0)', '(0) (0) (0)', '(0) (0) (0) (0)', '(0) (0) (0) (0) (0)', '(0) (0) (0) (0) (0) (0)',\n", " '(0) (0) (0) (0) (0) (0) (0)', '(0) (0) (0) (0) (0) (0) (0) (0)', '(0) (0) (0) (0) (0) (0) (0) (0) (0)',\n", " '(0) (0) (0) (0) (0) (0) (0) (0) (0) (0)', '(0) (0) (0) (0) (0) (0) (0) (0) (0) (0) (0)',\n", " '(0) (0) (0) (0) (0) (0) (0) (0) (0) (0) (0) (0)', '(0) (0) (0) (0) (0) (0) (0) (0) (0) (0) (0) (0) (0) (0)',\n", " '(0) (0) (0) (0) (0) (0) (0) (0) (0) (0) (0) (0) (0) (0) (0)',\n", " '(0) (0) (0) (0) (0) (0) (0) (0) (0) (0) (0) (0) (0) (0) (0) (0)',\n", " '(0) (0) (0) (0) (0) (0) (0) (0) (0) (0) (0) (0) (0) (0) (0) (0) (0)',\n", " '(0) (0) (0) (0) (0) (0) (0) (0) (0) (0) (0) (0) (0) (0) (0) (0) (0) (0)',\n", " '(0) (0) (0) (0) (0) (0) (0) (0) (0) (0) (0) (0) (0) (0) (0) (0) (0) (0) (0) (0) (0) (0)',\n", " '(0) (0) (0) (0) (0) (0) (0) (0) (0) (0) (0) (0) (0) (0) (1999) (0) (0) (0) (0) (0) 
(0) (0) (0) (0) (0) (0) (1992) (0) (0)',\n", " '(0) (0) (0) (0) (0) (0) (0) (0) (0) (0) (2017) (0) (0) (0) (0) (0) (0) (0) (0) (0) (0) (0)',\n", " '(0) (0) (0) (0) (0) (0) (0) (0) (0) (2011) (1995) (0) (0) (0) (2007) (2017) (2005) (0) (2003) (0)',\n", " '(0) (0) (0) (0) (0) (0) (0) (0) (2009) (0)', '(0) (0) (0) (0) (0) (0) (1999) (0) (0) (0) (2005)',\n", " '(0) (0) (0) (0) (0) (0) (2013) (0)', '(0) (0) (0) (0) (0) (1997) (0) (0) (0)',\n", " '(0) (0) (0) (0) (0) (2004) (0) (0) (0)', '(0) (0) (0) (0) (0) (2012) (0) (0) (0) (0) (0) (0)',\n", " '(0) (0) (0) (0) (0) (2014) (0)', '(0) (0) (0) (0) (0) (2014) (0) (0) (0) (0) (0)',\n", " '(0) (0) (0) (0) (1983) (2008) (0) (0) (1987) (0) (2004) (2010) (2006) (0) (0) (1990) (2015) (2005) (2003) (0) (2019) (1990)',\n", " '(0) (0) (0) (0) (1988)', '(0) (0) (0) (0) (1991) (0)',\n", " '(0) (0) (0) (0) (1995) (1998) (0) (2006) (0) (0) (1998) (0) (0) (2017) (0) (1993) (2006) (0) (1998) (0) (1991) (0) (1998) (2013) (0)',\n", " '(0) (0) (0) (0) (1996) (0) (0) (0) (0) (0) (0) (0)', '(0) (0) (0) (0) (2009) (0)',\n", " '(0) (0) (0) (0) (2012) (0) (0) (0)', '(0) (0) (0) (1910) (0)', '(0) (0) (0) (1923) (0)', '(0) (0) (0) (1932)',\n", " '(0) (0) (0) (1936) (0)'],\n", " dtype='object')\n" ] } ], "source": [ "# Value examples\n", "print(moma['EndDate'].value_counts(dropna=False).sort_index().tail(40),\n", " moma['EndDate'].value_counts(dropna=False).sort_index().head(40).index,\n", " sep='\\n'\n", " )" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 1961 1\n", "(1858) 1\n", "(1868-69?) 1\n", "(1873) 1\n", "(1883) 1\n", " ... 
\n", "version I, 1918 1\n", "version I, 1920 (close to the marble of 1915) 1\n", "winter 1908-09 1\n", "winter 1911-12 1\n", "NaN 2225\n", "Name: Date, Length: 9384, dtype: int64\n" ] } ], "source": [ "# Value examples\n", "print(moma['Date'].value_counts(dropna=False).sort_index())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `BeginDate` and `EndDate` columns, like the `Gender` column, contain groups of values for teams:\n", "\n", "- `(2020) (2006) (2009)`\n", "- `(0) (0) (0) (0)`\n", "- `(0) (0) (0) (1936) (0)`\n", "\n", "The `Date` column contains mixed data that needs to be cleaned up. The artwork creation date can be:\n", "\n", "- a single year,\n", "- a range of years.\n", "\n", "So, we have the following plan to clean up the data:\n", "\n", "1. Drop the rows where the `BeginDate` or `Date` columns contain null values, since we won't be able to calculate the age.\n", "2. Drop the rows where the artist is not a single author.\n", "3. For the `BeginDate` and `EndDate` columns:\n", "\n", " - Extract and convert the year to a number.\n", " - Assign it to the new columns `begin_date_clean` and `end_date_clean`.\n", "\n", "\n", "4. Clean up the gender and assign it to the `gender_clean` column.\n", "5. For the `Date` column:\n", "\n", " - If the date isn't a range:\n", "\n", " * Extract and convert the value to a number.\n", "\n", " - If the date is a range:\n", "\n", " * Extract the two boundary years.\n", " * Convert them to the integer type and then average them by adding them together and dividing by two.\n", " * Use the `round()` function to round the average. Python rounds halves to the nearest even integer, so 1872.5 becomes 1872.\n", "\n", " - Assign the year to the `date_clean` column.\n", "\n", "\n", "## Clearing The Data\n", "\n", "We'll perform data cleanup iteratively, step by step. We'll use hard-coded regular expression patterns to avoid missing any values.\n", "\n", "### Drop the columns\n", "\n", "First, let's drop the unnecessary columns from the dataframe. 
\n", "The `ConstituentID`, `Medium`, `Dimensions`, `CreditLine`, `AccessionNumber`, `DateAcquired`, `Cataloged`, `ObjectID`, `URL`, `ThumbnailURL`, `Circumference (cm)`, `Depth (cm)`, `Diameter (cm)`, `Height (cm)`, `Length (cm)`, `Weight (kg)`, `Width (cm)`, `Seat Height (cm)`, `Duration (sec.)` columns don't contain any information useful for our goal." ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Before drop:\n", "Index(['Title', 'Artist', 'ConstituentID', 'ArtistBio', 'Nationality', 'BeginDate', 'EndDate', 'Gender', 'Date',\n", " 'Medium', 'Dimensions', 'CreditLine', 'AccessionNumber', 'Classification', 'Department', 'DateAcquired',\n", " 'Cataloged', 'ObjectID', 'URL', 'ThumbnailURL', 'Circumference (cm)', 'Depth (cm)', 'Diameter (cm)',\n", " 'Height (cm)', 'Length (cm)', 'Weight (kg)', 'Width (cm)', 'Seat Height (cm)', 'Duration (sec.)'],\n", " dtype='object')\n", "\n", "After drop:\n", "Index(['Title', 'Artist', 'ArtistBio', 'Nationality', 'BeginDate', 'EndDate', 'Gender', 'Date', 'Classification',\n", " 'Department'],\n", " dtype='object')\n" ] } ], "source": [ "print('Before drop:', moma.columns, sep='\\n', end='\\n\\n')\n", "\n", "# Columns to drop\n", "drop_cols = ['ConstituentID', 'Medium', 'Dimensions', 'CreditLine', 'AccessionNumber', 'DateAcquired',\n", " 'Cataloged', 'ObjectID', 'URL', 'ThumbnailURL', 'Circumference (cm)',\n", " 'Depth (cm)', 'Diameter (cm)', 'Height (cm)', 'Length (cm)', 'Weight (kg)', 'Width (cm)',\n", " 'Seat Height (cm)', 'Duration (sec.)'\n", " ]\n", "\n", "# Drop columns\n", "moma.drop(drop_cols, axis=1, inplace=True)\n", "\n", "print('After drop:', moma.columns, sep='\\n')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Rename the columns\n", "\n", "Let's convert the remaining column names to `snake_case` format." 
] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Before rename:\n", "Index(['Title', 'Artist', 'ArtistBio', 'Nationality', 'BeginDate', 'EndDate', 'Gender', 'Date', 'Classification',\n", " 'Department'],\n", " dtype='object')\n", "\n", "After rename:\n", "Index(['title', 'artist', 'artist_bio', 'nationality', 'begin_date', 'end_date', 'gender', 'date', 'classification',\n", " 'department'],\n", " dtype='object')\n" ] } ], "source": [ "print('Before rename:', moma.columns, sep='\\n', end='\\n\\n')\n", "\n", "# Convert to lower case\n", "moma.columns = moma.columns.str.lower()\n", "\n", "# Add underscores between words\n", "cols = {'artistbio':'artist_bio', 'begindate':'begin_date', 'enddate':'end_date'}\n", "moma.rename(columns=cols, inplace=True)\n", "\n", "print('After rename:', moma.columns, sep='\\n')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Clear the `ArtistBio` column\n", "\n", "The `ArtistBio` column contains some details about the artist. Years from this column are also represented in the `BeginDate` and `EndDate` columns we are interested in.\n", "\n", "As mentioned, we need to remove all companies and other author groups. For instance, the `ArtistBio` column contains values such as `(British, founded 1967)` or `(Italian, established 1969)` for organizations.\n", "\n", "The column may also give the artist's years of activity rather than the years of life. Since we need the year of birth to calculate the age, these rows are of no use to us.\n", "\n", "Let's drop all of them." ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " artist_bio artist_bio_pattern_drop\n", "0 (British, founded 1967) True\n", "1 (Italian, established 1969) True\n", "2 (est. 1933) True\n", "3 (American, active 1904–present) True\n", "\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
valid_countvalid_percentage
True136719136719
False14421442
\n", "
" ], "text/plain": [ " valid_count valid_percentage\n", "True 136719 136719\n", "False 1442 1442" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "artist_bio_pattern_drop = r'(?:founded|established|est\\.|active|formed)'\n", "\n", "# Test\n", "artist_bio_test = pd.DataFrame(['(British, founded 1967)',\n", " '(Italian, established 1969)',\n", " '(est. 1933)',\n", " '(American, active 1904–present)'\n", " ], columns=['artist_bio'])\n", "artist_bio_test['artist_bio_pattern_drop'] = artist_bio_test['artist_bio'].str.contains(artist_bio_pattern_drop, flags=re.I)\n", "print(artist_bio_test, end='\\n\\n')\n", "\n", "artist_bio_bool_drop = moma['artist_bio'].str.contains(artist_bio_pattern_drop, flags=re.I) # bool mask to drop\n", "artist_bio_bool_drop.fillna(False, inplace=True) # do not drop artist_bio with NaN\n", "\n", "# Number of valid (True) and invalid (False) rows\n", "artist_bio_cnt = (~artist_bio_bool_drop).value_counts(dropna=False)\n", "\n", "# Percentage of valid (True) and invalid (False) rows\n", "artist_bio_pct = artist_bio_cnt * 100 / moma.shape[0]\n", "artist_bio_pct = (~artist_bio_bool_drop).value_counts(dropna=False)\n", "\n", "artist_bio_stat = pd.DataFrame({'valid_count': artist_bio_cnt, 'valid_percentage': artist_bio_pct})\n", "artist_bio_stat" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Before drop:\n", "total: 138161\n", "True 136719\n", "False 1442\n", "Name: artist_bio, dtype: int64\n", "\n", "After drop:\n", "total: 136719\n", "True 136719\n", "Name: artist_bio, dtype: int64\n" ] } ], "source": [ "print('Before drop:')\n", "# Total number of rows before\n", "print('total:', moma.shape[0])\n", "# Number of valid (True) and invalid (False) rows\n", "print((~artist_bio_bool_drop).value_counts(dropna=False), end='\\n\\n')\n", "\n", "# Drop\n", "artist_bio_drop = moma[artist_bio_bool_drop].index # rows to drop\n", 
"moma.drop(index=artist_bio_drop, inplace=True)\n", "\n", "print('After drop:')\n", "# Total number of rows after\n", "print('total:', moma.shape[0])\n", "# Number of valid (True) and invalid (False) rows\n", "print((~(moma['artist_bio'].str.contains(artist_bio_pattern_drop, flags=re.I)\n", " .fillna(False)\n", " )\n", " ).value_counts(dropna=False)\n", " )" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Clear the `BeginDate` column\n", "\n", "According to the plan above, we'll remove rows with `NaN` values." ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Before dropna:\n", "total: 136719\n", "NaNs: 1314\n", "\n", "After dropna:\n", "total: 135405\n", "NaNs: 0\n" ] } ], "source": [ "print('Before dropna:')\n", "print('total:', moma.shape[0]) # print the total number of rows before\n", "print('NaNs:', moma['begin_date'].isna().sum(), end='\\n\\n') # print the number of NaNs before\n", "\n", "moma.dropna(subset=['begin_date'], axis=0, inplace=True) # drop NaNs\n", "\n", "print('After dropna:')\n", "print('total:', moma.shape[0]) # print the total number of rows after\n", "print('NaNs:', moma['begin_date'].isna().sum()) # print the number of NaNs after" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's remove rows containing a group of values from the dataset (for example, `(2020) (0) (1969)`), that is, organizations." ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " begin_date begin_date_pattern\n", "0 (0) False\n", "1 (0) (0) False\n", "2 (1885) (0) False\n", "3 (0) (1995)(1895) (1847) (1900) False\n", "4 (1880) True\n", "\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
valid_countvalid_percentage
True12045888.961264
False1494711.038736
\n", "
" ], "text/plain": [ " valid_count valid_percentage\n", "True 120458 88.961264\n", "False 14947 11.038736" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "begin_date_pattern = r'^\\(([0-2]\\d{3})\\)$'\n", "\n", "# Test\n", "begin_date_test = pd.DataFrame(['(0)',\n", " '(0) (0)',\n", " '(1885) (0)',\n", " '(0) (1995)'\n", " '(1895) (1847) (1900)',\n", " '(1880)'\n", " ], columns=['begin_date'])\n", "begin_date_test['begin_date_pattern'] = (begin_date_test['begin_date'].str.replace(r'\\s', '')\n", " .str.match(begin_date_pattern, flags=re.I)\n", " )\n", "print(begin_date_test, end='\\n\\n')\n", "\n", "# Valid rows\n", "begin_date_bool_valid = moma['begin_date'].str.replace(r'\\s', '').str.match(begin_date_pattern, flags=re.I)\n", "\n", "# Number of valid (True) and invalid (False) rows\n", "begin_date_cnt = begin_date_bool_valid.value_counts(dropna=False)\n", "\n", "# Percentage of valid (True) and invalid (False) rows\n", "begin_date_pct = begin_date_cnt * 100 / moma.shape[0]\n", "\n", "begin_date_stat = pd.DataFrame({'valid_count': begin_date_cnt, 'valid_percentage': begin_date_pct})\n", "begin_date_stat" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Although this is a large percentage (about 11%) of the total number of rows, we have to drop them." 
] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Before drop:\n", "total: 135405\n", "True 120458\n", "False 14947\n", "Name: begin_date, dtype: int64\n", "\n", "After drop:\n", "total: 120458\n", "True 120458\n", "Name: begin_date, dtype: int64\n", "\n" ] } ], "source": [ "print('Before drop:')\n", "# Total number of rows before\n", "print('total:', moma.shape[0])\n", "# Number of valid (True) and invalid (False) rows\n", "print(begin_date_bool_valid.value_counts(dropna=False), end='\\n\\n')\n", "\n", "# Drop\n", "begin_date_drop = moma[~begin_date_bool_valid].index # rows to drop\n", "moma.drop(index=begin_date_drop, inplace=True)\n", "\n", "print('After drop:')\n", "# Total number of rows after\n", "print('total:', moma.shape[0])\n", "# Number of valid (True) and invalid (False) rows\n", "print((moma['begin_date'].str.replace(r'\\s', '', regex=True)\n", " .str.match(begin_date_pattern, flags=re.I)\n", " .value_counts(dropna=False)\n", " ), end='\\n\\n'\n", " )" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now let's convert the years to the integer type and add these values to the new `begin_date_clean` column." ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
begin_datebegin_date_clean
0(1841)1841
1(1944)1944
2(1876)1876
3(1944)1944
4(1876)1876
5(1944)1944
6(1944)1944
7(1944)1944
8(1944)1944
9(1944)1944
\n", "
" ], "text/plain": [ " begin_date begin_date_clean\n", "0 (1841) 1841\n", "1 (1944) 1944\n", "2 (1876) 1876\n", "3 (1944) 1944\n", "4 (1876) 1876\n", "5 (1944) 1944\n", "6 (1944) 1944\n", "7 (1944) 1944\n", "8 (1944) 1944\n", "9 (1944) 1944" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Extract the birth year\n", "moma['begin_date_clean'] = (moma['begin_date'].str.replace(r'\\s', '')\n", " .str.extract(begin_date_pattern, flags=re.I)\n", " .astype(int)\n", " )\n", "moma[['begin_date', 'begin_date_clean']].head(10) # check the values" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Clear the `EndDate` column\n", "\n", "The year of death is not directly involved in the calculation of age. However, we'll clean up the `end_date` column to use it for validation: `date <= end_date`.\n", "\n", "Let's check the column for `NaN` values." ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "NaNs: 0\n" ] } ], "source": [ "print('NaNs:', moma['end_date'].isna().sum()) # print the number of NaNs" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are no empty values in the `end_date`. \n", "\n", "Let's take a look at the single year values." ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
valid_countvalid_percentage
True8289868.819007
False3756031.180993
\n", "
" ], "text/plain": [ " valid_count valid_percentage\n", "True 82898 68.819007\n", "False 37560 31.180993" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "end_date_pattern = r'^\\(([0-2]\\d{3})\\)$'\n", "\n", "# Valid rows\n", "end_date_bool_valid = moma['end_date'].str.replace(r'\\s', '').str.match(end_date_pattern, flags=re.I)\n", "\n", "# Number of valid (True) and invalid (False) rows\n", "end_date_cnt = end_date_bool_valid.value_counts(dropna=False)\n", "\n", "# Percentage of valid (True) and invalid (False) rows\n", "end_date_pct = end_date_cnt * 100 / moma.shape[0]\n", "\n", "end_date_stat = pd.DataFrame({'valid_count': end_date_cnt, 'valid_percentage': end_date_pct})\n", "end_date_stat" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(0) 37560\n", "Name: end_date, dtype: int64" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Inspect values\n", "moma.loc[~end_date_bool_valid, 'end_date'].value_counts()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "About 30% of the rows doesn't contain the year of death. \n", "\n", "We'll extract the years and convert them to an integer, then add those values to the `end_date_clean` column.\n", "Also we'll fill in the empty values with `0` in the `end_date_clean` column." ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "True 120458\n", "Name: end_date_clean, dtype: int64\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
end_dateend_date_clean
0(1918)1918
1(0)0
2(1957)1957
3(0)0
4(1957)1957
5(0)0
\n", "
" ], "text/plain": [ " end_date end_date_clean\n", "0 (1918) 1918\n", "1 (0) 0\n", "2 (1957) 1957\n", "3 (0) 0\n", "4 (1957) 1957\n", "5 (0) 0" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Extract the death year\n", "moma['end_date_clean'] = (moma.loc[end_date_bool_valid, 'end_date'].str.replace(r'\\s', '')\n", " .str.extract(end_date_pattern, flags=re.I)\n", " )\n", "moma['end_date_clean'].fillna(0, inplace=True)\n", "moma['end_date_clean'] = moma['end_date_clean'].astype(int)\n", "\n", "# Number of valid (True) and invalid (False) rows\n", "print(moma['end_date_clean'].notnull().value_counts())\n", "moma[['end_date', 'end_date_clean']].head(6) # check the values" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Clear the `Gender` column\n", "\n", "We'll clean up the `Gender` column to create plots for men and women.\n", "\n", "Let's check for NaN values." ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "NaNs: 0\n" ] } ], "source": [ "print('NaNs:', moma['gender'].isna().sum()) # print the number of NaNs" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are no `NaN` values in the `gender` column.\n", "\n", "Let's consider the rows where the `gender` column contains a single value." ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
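<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "  <thead>\n", "    <tr style=\"text-align: right;\">\n", "      <th></th>\n", "      <th>valid_count</th>\n", "      <th>valid_percentage</th>\n", "    </tr>\n", "  </thead>\n", "  <tbody>\n",
"    <tr>\n", "      <th>True</th>\n", "      <td>120029</td>\n", "      <td>99.643859</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>False</th>\n", "      <td>429</td>\n", "      <td>0.356141</td>\n", "    </tr>\n",
"  </tbody>\n", "</table>\n", "</div>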
" ], "text/plain": [ " valid_count valid_percentage\n", "True 120029 99.643859\n", "False 429 0.356141" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "gender_pattern = r'^\\((?P(?:male|female))\\)$'\n", "\n", "# Valid rows\n", "gender_bool_valid = moma['gender'].str.replace(r'\\s', '').str.match(gender_pattern, flags=re.I)\n", "\n", "# Number of valid (True) and invalid (False) rows\n", "gender_cnt = gender_bool_valid.value_counts(dropna=False)\n", "\n", "# Percentage of valid (True) and invalid (False) rows\n", "gender_pct = gender_cnt * 100 / moma.shape[0]\n", "\n", "gender_stat = pd.DataFrame({'valid_count': gender_cnt, 'valid_percentage': gender_pct})\n", "gender_stat" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "() 418\n", "(Non-Binary) 11\n", "Name: gender, dtype: int64" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Inspect values\n", "moma.loc[~gender_bool_valid, 'gender'].value_counts()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The gender is not defined in 429 rows.\n", "\n", "We'll extract valid gender values and assign them to the `gender_clean` column." ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "True 120029\n", "False 429\n", "Name: gender_clean, dtype: int64\n", "\n", "male 102580\n", "female 17449\n", "NaN 429\n", "Name: gender_clean, dtype: int64\n" ] }, { "data": { "text/html": [ "
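<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "  <thead>\n", "    <tr style=\"text-align: right;\">\n", "      <th></th>\n", "      <th>gender</th>\n", "      <th>gender_clean</th>\n", "    </tr>\n", "  </thead>\n", "  <tbody>\n",
"    <tr>\n", "      <th>0</th>\n", "      <td>(Male)</td>\n", "      <td>male</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>1</th>\n", "      <td>(Male)</td>\n", "      <td>male</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>2</th>\n", "      <td>(Male)</td>\n", "      <td>male</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>3</th>\n", "      <td>(Male)</td>\n", "      <td>male</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>4</th>\n", "      <td>(Male)</td>\n", "      <td>male</td>\n", "    </tr>\n",
"  </tbody>\n", "</table>\n", "</div>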
" ], "text/plain": [ " gender gender_clean\n", "0 (Male) male\n", "1 (Male) male\n", "2 (Male) male\n", "3 (Male) male\n", "4 (Male) male" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Extract the gender\n", "moma['gender_clean'] = (moma.loc[gender_bool_valid, 'gender'].str.replace(r'\\s', '')\n", " .str.extract(gender_pattern, flags=re.I)['gender']\n", " .str.lower()\n", " )\n", "# Number of valid (True) and invalid (False) rows\n", "print(moma['gender_clean'].notnull().value_counts(), end='\\n\\n')\n", "print(moma['gender_clean'].value_counts(dropna=False))\n", "moma[['gender', 'gender_clean']].head() # check the values" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Clear the `Nationality` column\n", "\n", "We'll process the this column to create plots depending on nationality.\n", "\n", "First, check for NaN values." ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "NaNs: 0\n" ] } ], "source": [ "print('NaNs:', moma['nationality'].isna().sum()) # print the number of NaNs" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are no `NaN` values in the `Nationality` column.\n", "\n", "Let's consider the rows where the `nationality` column contains a single value." ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
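<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "  <thead>\n", "    <tr style=\"text-align: right;\">\n", "      <th></th>\n", "      <th>valid_count</th>\n", "      <th>valid_percentage</th>\n", "    </tr>\n", "  </thead>\n", "  <tbody>\n",
"    <tr>\n", "      <th>True</th>\n", "      <td>120295</td>\n", "      <td>99.864683</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>False</th>\n", "      <td>163</td>\n", "      <td>0.135317</td>\n", "    </tr>\n",
"  </tbody>\n", "</table>\n", "</div>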
" ], "text/plain": [ " valid_count valid_percentage\n", "True 120295 99.864683\n", "False 163 0.135317" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "nationality_pattern = r'^\\((?P(.+))\\)$'\n", "\n", "# Valid rows\n", "nationality_bool_valid = moma['nationality'].str.replace(r'\\s', '').str.match(nationality_pattern, flags=re.I)\n", "\n", "# Number of valid (True) and invalid (False) rows\n", "nationality_cnt = nationality_bool_valid.value_counts(dropna=False)\n", "\n", "# Percentage of valid (True) and invalid (False) rows\n", "nationality_pct = nationality_cnt * 100 / moma.shape[0]\n", "\n", "nationality_stat = pd.DataFrame({'valid_count': nationality_cnt, 'valid_percentage': nationality_pct})\n", "nationality_stat" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "() 163\n", "Name: nationality, dtype: int64" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Inspect values\n", "moma.loc[~nationality_bool_valid, 'nationality'].value_counts()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The nationality is not defined in 163 rows.\n", "\n", "We'll extract valid nationality values and assign them to the `nationality_clean` column." ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "True 120295\n", "False 163\n", "Name: nationality_clean, dtype: int64\n", "\n", "american 56038\n", "french 22392\n", "german 8955\n", "british 5306\n", "spanish 3082\n", " ... \n", "kuwaiti 1\n", "ugandan 1\n", "vietnamese 1\n", "nicaraguan 1\n", "azerbaijani 1\n", "Name: nationality_clean, Length: 113, dtype: int64\n" ] }, { "data": { "text/html": [ "
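<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "  <thead>\n", "    <tr style=\"text-align: right;\">\n", "      <th></th>\n", "      <th>nationality</th>\n", "      <th>nationality_clean</th>\n", "    </tr>\n", "  </thead>\n", "  <tbody>\n",
"    <tr>\n", "      <th>0</th>\n", "      <td>(Austrian)</td>\n", "      <td>austrian</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>1</th>\n", "      <td>(French)</td>\n", "      <td>french</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>2</th>\n", "      <td>(Austrian)</td>\n", "      <td>austrian</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>3</th>\n", "      <td>()</td>\n", "      <td>NaN</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>4</th>\n", "      <td>(Austrian)</td>\n", "      <td>austrian</td>\n", "    </tr>\n",
"  </tbody>\n", "</table>\n", "</div>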
" ], "text/plain": [ " nationality nationality_clean\n", "0 (Austrian) austrian\n", "1 (French) french\n", "2 (Austrian) austrian\n", "3 () NaN\n", "4 (Austrian) austrian" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Extract the nationality\n", "moma['nationality_clean'] = (moma.loc[nationality_bool_valid, 'nationality'].str.replace(r'\\s', '')\n", " .str.extract(nationality_pattern, flags=re.I)['nationality']\n", " .str.lower()\n", " )\n", "# Number of valid (True) and invalid (False) rows\n", "print(moma['nationality_clean'].notnull().value_counts(), end='\\n\\n')\n", "print(moma['nationality_clean'].value_counts(dropna=False))\n", "moma[['nationality', 'nationality_clean']].head() # check the values" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Clear the `Date` column\n", "\n", "Now, we'll continue with clearing the `date` column.\n", "\n", "As a reminder, we decided to process the `date` column as follows:\n", "\n", "- If the date isn't a range:\n", "\n", " * Extract and convert the value to a number.\n", "\n", "- If the date is a range:\n", "\n", " * Extract two bound years.\n", " * Convert them to the integer type and then average them by adding them together and dividing by two.\n", " * Use the round() function to round the average, so values like 1872.5 become 1872.\n", "\n", "- Assign the year to the `data_clean` column.\n", "\n", "Let's drop the rows with `NaN` values." 
] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Before dropna:\n", "total: 120458\n", "NaNs: 1362\n", "\n", "After dropna:\n", "total: 119096\n", "NaNs: 0\n" ] } ], "source": [ "print('Before dropna:')\n", "print('total:', moma.shape[0]) # print the total number of rows before\n", "print('NaNs:', moma['date'].isna().sum(), end='\\n\\n') # print the number of NaNs before\n", "\n", "moma.dropna(subset=['date'], axis=0, inplace=True) # drop NaNs\n", "\n", "print('After dropna:')\n", "print('total:', moma.shape[0]) # print the total number of rows after\n", "print('NaNs:', moma['date'].isna().sum()) # print the number of NaNs after" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "First, we'll prepare the data: remove `c.`, `(`, `)`, `;` and `:`, collapse repeated whitespace, and use a hyphen as the only range character." ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " date date_pattern_char_replace\n", "0 c. 1960s 1960s\n", "1 c. 1964, printed 1992 1964, printed 1992\n", "2 c.1935-1945 1935-1945\n", "3 c. 1983, signed 2007 1983, signed 2007\n", "4 (c. 1914-20) 1914-20\n", "5 1964, assembled c.1965 1964, assembled 1965\n", "6 1927. (Print executed c. 1925-1927). 1927. Print executed 1925-1927.\n", "7 published c. 1946 published 1946\n", "8 (1960s) 1960s\n", "9 1973 (published 1974) 1973 published 1974\n", "10 Published 1944 (Prints executed 1915-1930) Published 1944 Prints executed 1915-1930\n", "11 (September 29-October 24, 1967) September 29-October 24, 1967\n", "12 1965-66, printed 1983 1965-66, printed 1983\n", "13 1968 - 1972 1968-1972\n", "14 1947–49, published 1949 1947-49, published 1949\n", "15 Dec. 
9, 1954\n", "\n", "Before replace:\n", "138151 1935 (originals executed 1933–34)\n", "138152 1935 (originals executed 1933–34)\n", "138153 1935 (originals executed 1933–34)\n", "138154 1935 (originals executed 1933–34)\n", "138155 1935 (originals executed 1933–34)\n", "138156 1935 (originals executed 1933–34)\n", "138157 1935 (originals executed 1933–34)\n", "138158 1935 (originals executed 1933–34)\n", "138159 1935 (originals executed 1933–34)\n", "138160 1935 (originals executed 1933–34)\n", "Name: date, dtype: object\n", "\n", "After replace:\n", "138151 1935 originals executed 1933-34\n", "138152 1935 originals executed 1933-34\n", "138153 1935 originals executed 1933-34\n", "138154 1935 originals executed 1933-34\n", "138155 1935 originals executed 1933-34\n", "138156 1935 originals executed 1933-34\n", "138157 1935 originals executed 1933-34\n", "138158 1935 originals executed 1933-34\n", "138159 1935 originals executed 1933-34\n", "138160 1935 originals executed 1933-34\n", "Name: date, dtype: object\n" ] } ], "source": [ "date_pattern_char_replace = {r'(?:\\bc\\.\\s?|\\(|\\)|;|:)': '', # remove special chars\n", " r'\\s+': ' ', # reduce gaps\n", " r'(?:\\–|\\/|\\s\\-\\s)': '-' # set range character as hyphen\n", " } # dictionary to replace\n", "\n", "# Test\n", "date_test = pd.DataFrame(['c. 1960s',\n", " 'c. 1964, printed 1992',\n", " 'c.1935-1945',\n", " 'c. 1983, signed 2007',\n", " '(c. 1914-20)',\n", " '1964, assembled c.1965',\n", " '1927. (Print executed c. 1925-1927).',\n", " 'published c. 1946',\n", " '(1960s)',\n", " '1973 (published 1974)',\n", " 'Published 1944 (Prints executed 1915-1930)',\n", " '(September 29-October 24, 1967)',\n", " '1965-66, printed 1983',\n", " '1968 - 1972',\n", " '1947–49, published 1949',\n", " 'Dec. 
9, 1954'\n", " ], columns=['date'])\n", "date_test['date_pattern_char_replace'] = date_test['date'].replace(regex=date_pattern_char_replace)\n", "print(date_test, end='\\n\\n')\n", "\n", "# Replace chars\n", "print('Before replace:', moma['date'].tail(10), sep='\\n', end='\\n\\n')\n", "moma['date'] = moma['date'].replace(regex=date_pattern_char_replace) # replace\n", "print('After replace:', moma['date'].tail(10), sep='\\n')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's look at the values that don't contain a four-digit year." ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "moma values:\n", "n.d. 676\n", "Unknown 123\n", "unknown 19\n", "London?, published in aid of the Comforts Fund for Women and Children of Sovie 7\n", "no date 4\n", "TBC 3\n", "New York 2\n", "TBD 2\n", "Various 1\n", "196? 1\n", "date of publicati 1\n", "newspaper published March 30 1\n", "Unkown 1\n", "n.d 1\n", "nd 1\n", "Name: date, dtype: int64\n", "\n", "Matched: 843\n" ] } ], "source": [ "date_pattern_drop_1 = r'([0-2]\\d{3})'\n", "\n", "date_bool_drop_1 = moma['date'].str.count(date_pattern_drop_1) == 0 # bool mask to drop\n", "\n", "# Inspect values\n", "print('moma values:', moma.loc[date_bool_drop_1, 'date'].value_counts(dropna=False), sep='\\n', end='\\n\\n')\n", "print('Matched: {}'.format(date_bool_drop_1.sum()))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Recovering these dates is beyond the scope of this project, so we'll drop these 843 rows." 
] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Before drop 1:\n", "total: 119096\n", "True 118253\n", "False 843\n", "Name: date, dtype: int64\n", "\n", "After drop 1:\n", "total: 118253\n", "True 118253\n", "Name: date, dtype: int64\n" ] } ], "source": [ "print('Before drop 1:')\n", "# Total number of rows before\n", "print('total:', moma.shape[0])\n", "# Number of valid (True) and invalid (False) rows\n", "print((~date_bool_drop_1).value_counts(dropna=False), end='\\n\\n')\n", "\n", "# Drop\n", "date_drop_1 = moma[date_bool_drop_1].index # rows to drop\n", "moma.drop(index=date_drop_1, inplace=True)\n", "\n", "print('After drop 1:')\n", "# Total number of rows after\n", "print('total:', moma.shape[0])\n", "# Number of valid (True) and invalid (False) rows\n", "print((moma['date'].str.count(date_pattern_drop_1) != 0).value_counts(dropna=False))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We'll discard values such as `early 1940s`, `1920s`, since this is a rather vague period and therefore has no value for our task." ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " date date_pattern_drop_2\n", "0 1915? True\n", "1 1860s? True\n", "2 1880 ? True\n", "3 1920s True\n", "4 1880s-90s True\n", "5 1960s-1970s True\n", "6 1920s or 1930s True\n", "7 late 1950s True\n", "8 early 1940s True\n", "9 Early 1970's True\n", "\n", "moma values:\n", "early 1940s 152\n", "early 1930s 97\n", "1950s 84\n", "1960s 71\n", "1930s 63\n", "1920s 58\n", "1970s 54\n", "1901? 25\n", "1960s-1970s 12\n", "late 1950s 11\n", "1896? 
11\n", "1990s 8\n", "early 1950s 8\n", "early 1960s 6\n", "1940s 6\n", "1870s 5\n", "late 1930s 4\n", "1980s 4\n", "1880s-90s 4\n", "late 1960s 4\n", "1850s 4\n", "1890s 3\n", "Late 1930s 3\n", "early 1970s 3\n", "late 1920s 3\n", "1960's 3\n", "Early 1950's 2\n", "1920s or 1930s 2\n", "1922? 2\n", "1956? 2\n", "1860s 2\n", "1840s 2\n", "1900? 2\n", "1930s or 1940s 2\n", "1910? 2\n", "1910s 2\n", "1970's 2\n", "1987? 1\n", "1880 ? 1\n", "1920's 1\n", "1908? 1\n", "1959? 1\n", "late 1850s 1\n", "1927? 1\n", "1915? 1\n", "Early 1960s 1\n", "1950s-1960s 1\n", "1860s or 70s 1\n", "1855? 1\n", "1879? 1\n", "late 1970's 1\n", "Early 1950s 1\n", "1880s 1\n", "Early 1970's 1\n", "1903? 1\n", "1969? 1\n", "early 1900s 1\n", "1860s? 1\n", "early 1980s 1\n", "1963? 1\n", "1930? 1\n", "1964? 1\n", "1929? 1\n", "late 1970s 1\n", "late 1940s 1\n", "1907? 1\n", "Name: date, dtype: int64\n", "\n", "Matched: 757\n" ] } ], "source": [ "date_pattern_drop_2 = (r'^(?:early|late)?\\s?[0-2]\\d{3}\\'?(?:s|\\s?\\?|s\\?)'\n", " '(?:(?:\\-|\\sor\\s)(?:[0-2]\\d)?\\d{2}\\'?(?:s|\\s?\\?|s\\?))?$'\n", " )\n", "\n", "# Test\n", "date_test = pd.DataFrame(['1915?',\n", " '1860s?',\n", " '1880 ?',\n", " '1920s',\n", " '1880s-90s',\n", " '1960s-1970s',\n", " '1920s or 1930s',\n", " 'late 1950s',\n", " 'early 1940s',\n", " 'Early 1970\\'s'\n", " ], columns=['date'])\n", "date_test['date_pattern_drop_2'] = date_test['date'].str.contains(date_pattern_drop_2, flags=re.I)\n", "print(date_test, end='\\n\\n')\n", "\n", "date_bool_drop_2 = moma['date'].str.contains(date_pattern_drop_2, flags=re.I) # bool mask to drop\n", "\n", "# Inspect values\n", "pd.set_option('display.max_rows', 80) # increase the number of rows to display\n", "print('moma values:', moma.loc[date_bool_drop_2, 'date'].value_counts(dropna=False), sep='\\n', end='\\n\\n')\n", "print('Matched: {}'.format(date_bool_drop_2.sum()))\n", "pd.reset_option('display.max_rows') # reset the number of rows to display to default" ] }, { "cell_type": 
"code", "execution_count": 37, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Before drop 2:\n", "total: 118253\n", "True 117496\n", "False 757\n", "Name: date, dtype: int64\n", "\n", "After drop 2:\n", "total: 117496\n", "True 117496\n", "Name: date, dtype: int64\n" ] } ], "source": [ "print('Before drop 2:')\n", "# Total number of rows before\n", "print('total:', moma.shape[0])\n", "# Number of valid (True) and invalid (False) rows\n", "print((~date_bool_drop_2).value_counts(dropna=False), end='\\n\\n')\n", "\n", "# Drop\n", "date_drop_2 = moma[date_bool_drop_2].index # rows to drop\n", "moma.drop(index=date_drop_2, inplace=True)\n", "\n", "print('After drop 2:')\n", "# Total number of rows after\n", "print('total:', moma.shape[0])\n", "# Number of valid (True) and invalid (False) rows\n", "print((~moma['date'].str.contains(date_pattern_drop_2, flags=re.I)).value_counts(dropna=False))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And such as `Before 1900`, `After 1933`." ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " date date_pattern_drop_3\n", "0 Before 1900 True\n", "1 Before 1900? True\n", "2 After 1933 True\n", "3 after 1891 True\n", "\n", "moma values:\n", "Before 1900 177\n", "After 1933 62\n", "after 1891 34\n", "After 1924 16\n", "after 1938 9\n", "After 1929 8\n", "after 1888 6\n", "Before 1952 5\n", "after 1900 5\n", "Before 1899 4\n", "Before 1941 4\n", "Before 1948 3\n", "before 1927 3\n", "after 1910 2\n", "Before 1955 2\n", "after 1890 2\n", "Before 1910 2\n", "Before 1946 2\n", "Before 1900? 
2\n", "before 1933 2\n", "Before 1975 2\n", "Before 1959 2\n", "before 1930 2\n", "After 1959 2\n", "Before 1979 1\n", "Before 1870 1\n", "Before 1931 1\n", "After 1925 1\n", "Before 1961 1\n", "before 1929 1\n", "after 1962 1\n", "After 1957 1\n", "before 1924 1\n", "Before 1929 1\n", "Before 1980 1\n", "Before 1949 1\n", "before 1887 1\n", "Before 1960 1\n", "before 1985 1\n", "Before 1935 1\n", "Before 1908 1\n", "Before 1954 1\n", "before 1943 1\n", "before 1971 1\n", "Before 1934 1\n", "after 1923 1\n", "Before 1933 1\n", "After 1954 1\n", "After 1852 1\n", "after 1895 1\n", "Before 1977 1\n", "Before 1932 1\n", "before 1928 1\n", "Name: date, dtype: int64\n", "\n", "Matched: 387\n" ] } ], "source": [ "date_pattern_drop_3 = r'^(?:before|after)\\s?[0-2]\\d{3}\\s?\\??$'\n", "\n", "# Test\n", "date_test = pd.DataFrame(['Before 1900',\n", " 'Before 1900?',\n", " 'After 1933',\n", " 'after 1891'\n", " ], columns=['date'])\n", "date_test['date_pattern_drop_3'] = date_test['date'].str.contains(date_pattern_drop_3, flags=re.I)\n", "print(date_test, end='\\n\\n')\n", "\n", "date_bool_drop_3 = moma['date'].str.contains(date_pattern_drop_3, flags=re.I) # bool mask to drop\n", "\n", "# Inspect values\n", "print('moma values:', moma.loc[date_bool_drop_3, 'date'].value_counts(dropna=False), sep='\\n', end='\\n\\n')\n", "print('Matched: {}'.format(date_bool_drop_3.sum()))" ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Before drop 3:\n", "total: 117496\n", "True 117109\n", "False 387\n", "Name: date, dtype: int64\n", "\n", "After drop 3:\n", "total: 117109\n", "True 117109\n", "Name: date, dtype: int64\n" ] } ], "source": [ "print('Before drop 3:')\n", "# Total number of rows before\n", "print('total:', moma.shape[0])\n", "# Number of valid (True) and invalid (False) rows\n", "print((~date_bool_drop_3).value_counts(dropna=False), end='\\n\\n')\n", "\n", "# Drop\n", "date_drop_3 = 
moma[date_bool_drop_3].index # rows to drop\n", "moma.drop(index=date_drop_3, inplace=True)\n", "\n", "print('After drop 3:')\n", "# Total number of rows after\n", "print('total:', moma.shape[0])\n", "# Number of valid (True) and invalid (False) rows\n", "print((~moma['date'].str.contains(date_pattern_drop_3, flags=re.I)).value_counts(dropna=False))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We'll also drop the rows with an indistinct year, such as `1898 or earlier`." ] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " date date_pattern_drop_4\n", "0 1898 or earlier True\n", "1 1898 or before? True\n", "\n", "moma values:\n", "1899 or before 25\n", "1910 or earlier 14\n", "1910 or before 6\n", "1939 or before 3\n", "1898 or earlier 2\n", "1898 or before? 2\n", "1853 or earlier 1\n", "1911 or after 1\n", "1931 or after 1\n", "1931 or earlier 1\n", "Name: date, dtype: int64\n", "\n", "Matched: 56\n" ] } ], "source": [ "date_pattern_drop_4 = r'^[0-2]\\d{3}\\sor\\s(?:before|after|earlier)\\??$'\n", "\n", "# Test\n", "date_test = pd.DataFrame(['1898 or earlier',\n", " '1898 or before?'\n", " ], columns=['date'])\n", "date_test['date_pattern_drop_4'] = date_test['date'].str.contains(date_pattern_drop_4, flags=re.I)\n", "print(date_test, end='\\n\\n')\n", "\n", "date_bool_drop_4 = moma['date'].str.contains(date_pattern_drop_4, flags=re.I) # bool mask to drop\n", "\n", "# Inspect values\n", "print('moma values:', moma.loc[date_bool_drop_4, 'date'].value_counts(dropna=False), sep='\\n', end='\\n\\n')\n", "print('Matched: {}'.format(date_bool_drop_4.sum()))" ] }, { "cell_type": "code", "execution_count": 41, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Before drop 4:\n", "total: 117109\n", "True 117053\n", "False 56\n", "Name: date, dtype: int64\n", "\n", "After drop 4:\n", "total: 117053\n", "True 117053\n", "Name: date, dtype: int64\n" ] 
} ], "source": [ "print('Before drop 4:')\n", "# Total number of rows before\n", "print('total:', moma.shape[0])\n", "# Number of valid (True) and invalid (False) rows\n", "print((~date_bool_drop_4).value_counts(dropna=False), end='\\n\\n')\n", "\n", "# Drop\n", "date_drop_4 = moma[date_bool_drop_4].index # rows to drop\n", "moma.drop(index=date_drop_4, inplace=True)\n", "\n", "print('After drop 4:')\n", "# Total number of rows after\n", "print('total:', moma.shape[0])\n", "# Number of valid (True) and invalid (False) rows\n", "print((~moma['date'].str.contains(date_pattern_drop_4, flags=re.I)).value_counts(dropna=False))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For some artworks, the years of publication are specified instead of the years of creation (`published 1965`, `published April 1898`). We'll remove these data from the dataset." ] }, { "cell_type": "code", "execution_count": 42, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " date date_pattern_drop_5\n", "0 published 1965 True\n", "1 Published 1946 True\n", "2 published April 1898 True\n", "3 newspaper published May 15-16, 1999 True\n", "4 Published 1944 Prints executed 1915-1930 False\n", "\n", "moma values:\n", "published 1942 25\n", "published April 1898 5\n", "published November 1897 4\n", "published October 1897 4\n", "published December 1897 3\n", " ..\n", "newspaper published August 23-29, 2004 1\n", "newspaper published May 5, 2000 1\n", "published January 1985 1\n", "newspaper published November 4, 2000 1\n", "newspaper published July 25, 1995 1\n", "Name: date, Length: 296, dtype: int64\n", "\n", "Matched: 350\n" ] } ], "source": [ "date_pattern_drop_5 = r'(?!.*prints executed.*)^(?:newspapers?\\s)?(?:published.*)'\n", "\n", "# Test\n", "date_test = pd.DataFrame(['published 1965',\n", " 'Published 1946',\n", " 'published April 1898',\n", " 'newspaper published May 15-16, 1999',\n", " 'Published 1944 Prints executed 1915-1930' # must be 
False (we'll explore this later)\n", " ], columns=['date'])\n", "date_test['date_pattern_drop_5'] = date_test['date'].str.contains(date_pattern_drop_5, flags=re.I)\n", "print(date_test, end='\\n\\n')\n", "\n", "date_bool_drop_5 = moma['date'].str.contains(date_pattern_drop_5, flags=re.I) # bool mask to drop\n", "\n", "# Inspect values\n", "print('moma values:', moma.loc[date_bool_drop_5, 'date'].value_counts(dropna=False), sep='\\n', end='\\n\\n')\n", "print('Matched: {}'.format(date_bool_drop_5.sum()))" ] }, { "cell_type": "code", "execution_count": 43, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Before drop 5:\n", "total: 117053\n", "True 116703\n", "False 350\n", "Name: date, dtype: int64\n", "\n", "After drop 5:\n", "total: 116703\n", "True 116703\n", "Name: date, dtype: int64\n" ] } ], "source": [ "print('Before drop 5:')\n", "# Total number of rows before\n", "print('total:', moma.shape[0])\n", "# Number of valid (True) and invalid (False) rows\n", "print((~date_bool_drop_5).value_counts(dropna=False), end='\\n\\n')\n", "\n", "# Drop\n", "date_drop_5 = moma[date_bool_drop_5].index # rows to drop\n", "moma.drop(index=date_drop_5, inplace=True)\n", "\n", "print('After drop 5:')\n", "# Total number of rows after\n", "print('total:', moma.shape[0])\n", "# Number of valid (True) and invalid (False) rows\n", "print((~moma['date'].str.contains(date_pattern_drop_5, flags=re.I)).value_counts(dropna=False))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's distinguish three groups among the remaining values:\n", "\n", "- Year or range of years followed by additional information, for instance:\n", " - year of printing `(1964, printed 1992)`,\n", " - years of assembly `(1961, assembled 1964-65)`. \n", "\n", "- Year or range of years specified after `executed`:\n", " - `(1922, executed 1920-21)`,\n", " - for prints `(Print executed 1936)`. 
\n", "\n", "- Year or range of years with detailed information such as season, month, date, place, etc., such as:\n", " - `September 29-October 24, 1967`,\n", " - `August 5, 1877-June 22, 1894`,\n", " - `Fontainebleau, summer 1921`.\n", "\n", "\n", "We'll consider each case separately.\n", "\n", "In *the first case* we'll do the following:\n", "\n", "- Replace the secondary words with placeholder `updated` for convenience.\n", "- Extract the year or years of creation.\n", "- Store them in the additional columns `year_1` and `year_2` of the `moma` dataframe." ] }, { "cell_type": "code", "execution_count": 44, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " date year_1 year_2\n", "0 1896 1896 NaN\n", "1 1941-1948 1941 1948\n", "2 1969-70 1969 70\n", "3 1965 printed 2014 1965 NaN\n", "4 1964, printed 1992 1964 NaN\n", "5 2000-01, printed 2007 2000 01\n", "6 1973 published 1974 1973 NaN\n", "7 1941, published 1943 1941 NaN\n", "8 1975 Published 1976. 1975 NaN\n", "9 1947-49, published 1949 1947 49\n", "10 1918, published 1922-1923 1918 NaN\n", "11 1961, assembled 1964-65 1961 NaN\n", "12 1969, realized 1973 1969 NaN\n", "13 1983, signed 2007 1983 NaN\n", "14 1945, reprinted 1990 1945 NaN\n", "15 1961, reconstructed 1981 1961 NaN\n", "16 1963, fabricated 1975 1963 NaN\n", "17 1985, released 1990 1985 NaN\n", "18 1944, printed in 1967 1944 NaN\n", "19 1966 repainted in 1990 1966 NaN\n", "20 1950-52 manufactured 1955 1950 52\n", "21 1950-55-1980 1950 55\n", "\n", "moma values:\n", "1963-66, published 1966 1\n", "1953-1959 1\n", "1964-73 1\n", "1936, published 1938 1\n", "1983-2002 1\n", "1976-82 1\n", "1910-1915 1\n", "1960, assembled 1976 1\n", "1956-59 1\n", "1949-50, printed 1966 1\n", "1962-80, assembled 1980 1\n", "1958-87 1\n", "1926-27 cast 1976 1\n", "2001-2008 1\n", "1854-58 1\n", "1912, published 1921 1\n", "1893-1931 1\n", "1881-86 1\n", "1960-62, assembled 1965 1\n", "1967, printed in 1968 1\n", "1963-1978 1\n", "1944 cast 
1954 1\n", "1898, published 1918 1\n", "1974-82 1\n", "1945, printed 1980 1\n", "1949, printed 1980 1\n", "1961-2010 1\n", "1986-88 1\n", "1969, published 1976 1\n", "1915 published 1918 1\n", "1902-09 1\n", "1882-84 1\n", "1897, printed 1950 1\n", "1928-50 1\n", "1966-1980 1\n", "1943-49 1\n", "1923-27 1\n", "1966 repainted in 1990 1\n", "1938-39 cast 1959 1\n", "1948 reprinted 1990 1\n", "Name: date, dtype: int64\n", "\n", "Matched: 110718\n" ] } ], "source": [ "date_updated_replace = (r'(?:published|repainted\\sin|printed\\sin|printed|assembled|'\n", " 'realized|signed|reprinted|reconstructed|fabricated|'\n", " 'released|cast|arranged|manufactured)'\n", " ) # keys to replace with placeholder\n", "date_pattern_1 = (r'^(?P<year_1>[0-2]\\d{3})'\n", " '(?:\\-(?P<year_2>(?:[0-2]\\d)?\\d{2}))?'\n", " '(?:,?(?:\\supdated|\\-)\\s?[0-2]\\d{3}(?:\\-(?:[0-2]\\d)?\\d{2})?)?s?\\.?$'\n", " )\n", "\n", "# Test\n", "date_test = pd.DataFrame(['1896',\n", " '1941-1948',\n", " '1969-70',\n", " '1965 printed 2014',\n", " '1964, printed 1992',\n", " '2000-01, printed 2007',\n", " '1973 published 1974',\n", " '1941, published 1943',\n", " '1975 Published 1976.',\n", " '1947-49, published 1949',\n", " '1918, published 1922-1923',\n", " '1961, assembled 1964-65',\n", " '1969, realized 1973',\n", " '1983, signed 2007',\n", " '1945, reprinted 1990',\n", " '1961, reconstructed 1981',\n", " '1963, fabricated 1975',\n", " '1985, released 1990',\n", " '1944, printed in 1967',\n", " '1966 repainted in 1990',\n", " '1950-52 manufactured 1955',\n", " '1950-55-1980'\n", " ], columns=['date'])\n", "date_test[['year_1', 'year_2']] = (date_test['date'].str.replace(date_updated_replace, 'updated', flags=re.I)\n", " .str.extract(date_pattern_1, flags=re.I)\n", " )\n", "print(date_test, end='\\n\\n')\n", "\n", "date_bool_1 = (moma['date'].str.replace(date_updated_replace, 'updated', flags=re.I)\n", " .str.match(date_pattern_1, flags=re.I)\n", " ) # bool mask to extract the years\n", "\n", "# Inspect values\n", 
"print('moma values:', moma.loc[date_bool_1, 'date'].value_counts(dropna=False).tail(40), sep='\\n', end='\\n\\n')\n", "print('Matched: {}'.format(date_bool_1.sum()))" ] }, { "cell_type": "code", "execution_count": 45, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Before extract 1:\n", "total: 116703\n", "True 110718\n", "False 5985\n", "Name: date, dtype: int64\n", "\n", "After extract 1:\n", " date year_1 year_2\n", "count 116703 110718 20104\n", "unique 6236 195 236\n", "\n" ] }, { "data": { "text/html": [ "
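<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "  <thead>\n", "    <tr style=\"text-align: right;\">\n", "      <th></th>\n", "      <th>date</th>\n", "      <th>year_1</th>\n", "      <th>year_2</th>\n", "    </tr>\n", "  </thead>\n", "  <tbody>\n",
"    <tr>\n", "      <th>0</th>\n", "      <td>1896</td>\n", "      <td>1896</td>\n", "      <td>NaN</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>1</th>\n", "      <td>1987</td>\n", "      <td>1987</td>\n", "      <td>NaN</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>2</th>\n", "      <td>1903</td>\n", "      <td>1903</td>\n", "      <td>NaN</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>3</th>\n", "      <td>1980</td>\n", "      <td>1980</td>\n", "      <td>NaN</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>4</th>\n", "      <td>1903</td>\n", "      <td>1903</td>\n", "      <td>NaN</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>5</th>\n", "      <td>1976-77</td>\n", "      <td>1976</td>\n", "      <td>77</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>6</th>\n", "      <td>1976-77</td>\n", "      <td>1976</td>\n", "      <td>77</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>7</th>\n", "      <td>1976-77</td>\n", "      <td>1976</td>\n", "      <td>77</td>\n", "    </tr>\n",
"  </tbody>\n", "</table>\n", "</div>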
" ], "text/plain": [ " date year_1 year_2\n", "0 1896 1896 NaN\n", "1 1987 1987 NaN\n", "2 1903 1903 NaN\n", "3 1980 1980 NaN\n", "4 1903 1903 NaN\n", "5 1976-77 1976 77\n", "6 1976-77 1976 77\n", "7 1976-77 1976 77" ] }, "execution_count": 45, "metadata": {}, "output_type": "execute_result" } ], "source": [ "print('Before extract 1:')\n", "# Total number of rows\n", "print('total:', moma.shape[0])\n", "# Number of rows matching the pattern (True) and the rest (False)\n", "print(date_bool_1.value_counts(dropna=False), end='\\n\\n')\n", "\n", "# Extract\n", "moma.loc[date_bool_1, ['year_1', 'year_2']] = (moma.loc[date_bool_1, 'date']\n", " .str.replace(date_updated_replace, 'updated', flags=re.I)\n", " .str.extract(date_pattern_1, flags=re.I)\n", " )\n", "\n", "# Inspect values\n", "print('After extract 1:', moma[['date', 'year_1', 'year_2']].describe().loc[['count', 'unique']], sep='\\n', end='\\n\\n')\n", "moma.loc[date_bool_1, ['date', 'year_1', 'year_2']].head(8)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We have extracted most of the data in the `Date` column (94.9%)!\n", "\n", "Let's move on to *the second case* (that is, `executed`)." ] }, { "cell_type": "code", "execution_count": 46, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Illustrated Book 630\n", "Print 11\n", "Periodical 2\n", "Name: classification, dtype: int64" ] }, "execution_count": 46, "metadata": {}, "output_type": "execute_result" } ], "source": [ "(moma.loc[moma['date'].str.contains(r'print executed', flags=re.I), 'classification']\n", " .value_counts()\n", ")" ] }, { "cell_type": "code", "execution_count": 47, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " title artist artist_bio nationality \\\n", "7951 Group in a Storm (Gruppe im Sturm)(plate 14) f... Ernst Barlach (German, 1870–1938) (German) \n", "7961 Peasants Strolling (Bauerngang) from the illus... Heinrich Campendonk (German, 1889–1957) (German) \n", "7962 Sick Girl (Krankes Mädchen) (plate 19) from th... Erich Heckel (German, 1883–1970) (German) \n", "7964 Woman Desired by Man (Weib vom Manne begehrt) ... Max Pechstein (German, 1881–1955) (German) \n", "7965 Woman's Head (plate 22) from the illustrated b... Karl Schmidt-Rottluff (German, 1884–1976) (German) \n", "... ... ... ... ... \n", "100039 40 dessins de Picasso en marge du Buffon Pablo Picasso (Spanish, 1881–1973) (Spanish) \n", "100040 40 dessins de Picasso en marge du Buffon Pablo Picasso (Spanish, 1881–1973) (Spanish) \n", "102304 Members of the Brücke Artists' Group (Titelvig... Ernst Ludwig Kirchner (German, 1880–1938) (German) \n", "102305 PM (Passive Members) [PM (Passive Mitglieder)]... Ernst Ludwig Kirchner (German, 1880–1938) (German) \n", "102306 PM (Passive Members) [PM (Passive Mitglieder)]... Ernst Ludwig Kirchner (German, 1880–1938) (German) \n", "\n", " begin_date end_date gender date classification \\\n", "7951 (1870) (1938) (Male) 1920 print executed 1919 Illustrated Book \n", "7961 (1889) (1957) (Male) 1920 print executed 1918 Illustrated Book \n", "7962 (1883) (1970) (Male) 1920 print executed 1913 Illustrated Book \n", "7964 (1881) (1955) (Male) 1920 print executed 1919 Illustrated Book \n", "7965 (1884) (1976) (Male) 1920 print executed in 1916 Illustrated Book \n", "... ... ... ... ... ... \n", "100039 (1881) (1973) (Male) Paris, Berggruen, 1957. Print executed 1954-1957. Illustrated Book \n", "100040 (1881) (1973) (Male) Paris, Berggruen, 1957. Print executed 1954-1957. 
Illustrated Book \n", "102304 (1880) (1938) (Male) 1910 print executed 1907 Illustrated Book \n", "102305 (1880) (1938) (Male) 1910 print executed 1907 Illustrated Book \n", "102306 (1880) (1938) (Male) 1910 print executed 1908 Illustrated Book \n", "\n", " department begin_date_clean end_date_clean gender_clean nationality_clean year_1 year_2 \n", "7951 Drawings & Prints 1870 1938 male german NaN NaN \n", "7961 Drawings & Prints 1889 1957 male german NaN NaN \n", "7962 Drawings & Prints 1883 1970 male german NaN NaN \n", "7964 Drawings & Prints 1881 1955 male german NaN NaN \n", "7965 Drawings & Prints 1884 1976 male german NaN NaN \n", "... ... ... ... ... ... ... ... \n", "100039 Drawings & Prints 1881 1973 male spanish NaN NaN \n", "100040 Drawings & Prints 1881 1973 male spanish NaN NaN \n", "102304 Drawings & Prints 1880 1938 male german NaN NaN \n", "102305 Drawings & Prints 1880 1938 male german NaN NaN \n", "102306 Drawings & Prints 1880 1938 male german NaN NaN \n", "\n", "[643 rows x 16 columns]" ] }, "execution_count": 47, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Value examples\n", "(moma.loc[moma['date'].str.contains(r'print executed', flags=re.I)])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We need to extract the year when the print was created.\n", "\n", "Now let's process the values with `originals executed` etc." ] }, { "cell_type": "code", "execution_count": 48, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " date year_1 year_2\n", "0 1921 executed 1920 1920 NaN\n", "1 1922, executed 1920-21 1920 21\n", "2 1935 originals executed 1933-34 1933 34\n", "3 1935 drawings executed 1933-34 1933 34\n", "4 1922-23 original executed in 1922 1922 NaN\n", "5 1973-1974, executed 1973 1973 NaN\n", "6 Print executed 1936 1936 NaN\n", "7 Prints executed 1956 1956 NaN\n", "8 1950, print executed 1949-50 1949 50\n", "9 1972. Print executed 1971-1972. 1971 1972\n", "10 1962. 
Print executed 1960. 1960 NaN\n", "11 1944. Print executed 1942. 1942 NaN\n", "12 1927. Print executed 1925-1927. 1925 1927\n", "13 1963 Woodcuts executed 1907 1907 NaN\n", "14 1970. Sculpture executed 1968-1970. 1968 1970\n", "\n", "moma values:\n", "1898, prints executed 1897 1\n", "Prints executed 1953 1\n", "1950, prints executed 1945-1950 1\n", "1951. Prints executed 1948-1951. 1\n", "Print executed 1945 1\n", "1950. prints executed 1949. 1\n", "1951. Print executed 1942. 1\n", "1953 original executed in 1932 1\n", "Print executed 1909-1917 1\n", "1918 prints executed 1917 1\n", "Print executed 1930-1932 1\n", "1828. Prints executed 1826-1827. 1\n", "1926 print executed 1921 1\n", "Print executed 1924-1925 1\n", "1962, prints executed 1948 1\n", "1922 prints executed 1921 1\n", "1983. Prints executed 1981-1983. 1\n", "1917 executed 1914 1\n", "1944. Prints executed 1943. 1\n", "1962 Prints executed 1959-1961. 1\n", "1963. Prints executed 1949-1963. 1\n", "1968. Prints executed 1966-1967. 1\n", "1965. Prints executed 1964. 1\n", "1921. Prints executed 1920-1921. 1\n", "1968, prints executed 1967-68 1\n", "1920 print executed in 1916 1\n", "1931. Prints executed 1930. 1\n", "1920 print executed 1916 1\n", "Print executed 1960 1\n", "1918 executed 1911 1\n", "Prints executed 1954-1955 1\n", "1948. Prints executed 1942-1948. 1\n", "1947. Prints executed 1946-1947. 1\n", "1954, woodcut executed 1948 1\n", "1920 print executed 1918 1\n", "1927. Print executed 1926-1927. 
1\n", "Print executed 1921 1\n", "Prints executed 1956-1957 1\n", "1930, prints executed 1929 1\n", "1964 Prints executed 1963 1\n", "Name: date, dtype: int64\n", "\n", "Matched: 1110\n" ] } ], "source": [ "date_char_trim = r'[\\.,]'\n", "# Named groups become the column names returned by .str.extract()\n", "date_pattern_2 = (r'^(?:[0-2]\\d{3})?(?:\\-\\d{2,4})?,?\\s?'\n", " r'(?:originals?|drawings?|prints?|woodcuts?|sculpture?)?\\s?executed\\s(?:in\\s)?'\n", " r'(?P<year_1>[0-2]\\d{3})(?:\\-(?P<year_2>(?:[0-2]\\d)?\\d{2}))?$'\n", " )\n", "\n", "# Test\n", "date_test = pd.DataFrame(['1921 executed 1920',\n", " '1922, executed 1920-21',\n", " '1935 originals executed 1933-34',\n", " '1935 drawings executed 1933-34',\n", " '1922-23 original executed in 1922',\n", " '1973-1974, executed 1973',\n", " 'Print executed 1936',\n", " 'Prints executed 1956',\n", " '1950, print executed 1949-50',\n", " '1972. Print executed 1971-1972.',\n", " '1962. Print executed 1960.',\n", " '1944. Print executed 1942.',\n", " '1927. Print executed 1925-1927.',\n", " '1963 Woodcuts executed 1907',\n", " '1970.
Sculpture executed 1968-1970.'\n", " ], columns=['date'])\n", "# regex=True is explicit because newer pandas versions default to literal replacement\n", "date_test[['year_1', 'year_2']] = (date_test['date'].str.replace(date_char_trim, '', regex=True)\n", " .str.extract(date_pattern_2, flags=re.I)\n", " )\n", "print(date_test, end='\\n\\n')\n", "\n", "date_bool_2 = (moma['date'].str.replace(date_char_trim, '', regex=True)\n", " .str.match(date_pattern_2, flags=re.I)\n", " ) # bool mask to extract the years\n", "\n", "# Inspect values\n", "print('moma values:', moma.loc[date_bool_2, 'date'].value_counts(dropna=False).tail(40), sep='\\n', end='\\n\\n')\n", "print('Matched: {}'.format(date_bool_2.sum()))" ] }, { "cell_type": "code", "execution_count": 49, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Before extract 2:\n", "total: 116703\n", "False 115593\n", "True 1110\n", "Name: date, dtype: int64\n", "\n", "After extract 2:\n", " date year_1 year_2\n", "count 116703 111828 20470\n", "unique 6236 195 238\n", "\n" ] }, { "data": { "text/html": [ "
" ], "text/plain": [ " date year_1 year_2\n", "11027 1944. Print executed 1938. 1938 NaN\n", "11029 1944. Print executed 1942. 1942 NaN\n", "11030 1944. Print executed 1942. 1942 NaN\n", "11031 1944. Print executed 1942. 1942 NaN\n", "11481 1963 Woodcuts executed 1907 1907 NaN\n", "11773 1962. Prints executed 1960-1962. 1960 1962\n", "11843 1918. Prints executed 1913. 1913 NaN\n", "12004 1921. Prints executed 1920-1921. 1920 1921\n", "12011 1981. Prints executed 1979-1981. 1979 1981\n", "12138 1919, executed 1918 1918 NaN" ] }, "execution_count": 49, "metadata": {}, "output_type": "execute_result" } ], "source": [ "print('Before extract 2:')\n", "# Total number of rows\n", "print('total:', moma.shape[0])\n", "# Number of rows matching the pattern (True) and the rest (False)\n", "print(date_bool_2.value_counts(dropna=False), end='\\n\\n')\n", "\n", "# Extract (regex=True is explicit because newer pandas versions default to literal replacement)\n", "moma.loc[date_bool_2, ['year_1', 'year_2']] = (moma.loc[date_bool_2, 'date'].str.replace(date_char_trim, '', regex=True)\n", " .str.extract(date_pattern_2, flags=re.I)\n", " )\n", "\n", "# Inspect values\n", "print('After extract 2:', moma[['date', 'year_1', 'year_2']].describe().loc[['count', 'unique']], sep='\\n', end='\\n\\n')\n", "moma.loc[date_bool_2, ['date', 'year_1', 'year_2']][120:130]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's continue with *the third case*. \n", "\n", "\n", "We'll classify this group of values as special cases. \n", "Since for our task it is better to drop data than to keep questionably cleaned data, we will only clean up the values where we can be sure which year the artwork was created.\n", "\n", "For convenience, we'll first build a dictionary of patterns we are sure of (a single regular-expression alternation). Then we'll extract the year, or range of years, in which the artwork was created from the values that match the dictionary."
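, "\n", "\n", "
As a rough, pandas-free illustration of the two-step idea above (trust only whitelisted prefixes, then pull out the year or range of years), here is a minimal sketch. The names `TRUSTED_PREFIX`, `YEARS` and `extract_years`, as well as the shortened whitelist, are assumptions made for this example only; they are not part of the notebook.

```python
import re

# Hypothetical, shortened whitelist of trusted prefixes (an assumption for
# illustration; the notebook's own list is much longer).
TRUSTED_PREFIX = re.compile(
    r'^(?:[0-9]{1,2} )?(?:spring|summer|autumn|winter|early|late|circa|'
    r'january|february|march|april|may|june|july|august|september|'
    r'october|november|december|paris|berlin)',
    flags=re.I,
)

# First usable four-digit year, optionally followed by a range end that is
# either two digits (1914-15) or a later four-digit year (1877-June 22 1894).
YEARS = re.compile(
    r'^.*?(?P<year_1>[0-2][0-9]{3})'
    r'(?:-(?:(?P<short>[0-2][0-9])|.*(?P<full>[0-2][0-9]{3})))?$'
)


def extract_years(raw):
    '''Return (year_1, year_2) as strings, or None if the value is not trusted.'''
    text = re.sub(r'[.,]', '', raw)  # drop dots and commas first
    if not TRUSTED_PREFIX.match(text):
        return None
    found = YEARS.match(text)
    if found is None:
        return None
    year_2 = found.group('short') or found.group('full')
    return found.group('year_1'), year_2
```

With this sketch, `extract_years('Paris, winter 1914-15')` gives `('1914', '15')`, while `extract_years('Seasons of 1871, 1872 and 1873')` gives `None` because its prefix is not whitelisted.
"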
] }, { "cell_type": "code", "execution_count": 50, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " date date_pattern_special year_1 year_2_2 year_2_4 year_2\n", "0 October 1977 True 1977 NaN NaN NaN\n", "1 August 15 1966 True 1966 NaN NaN NaN\n", "2 February 1, 1970 True 1970 NaN NaN NaN\n", "3 May 15, 1962. True 1962 NaN NaN NaN\n", "4 11 July 1854 True 1854 NaN NaN NaN\n", "5 May-June 1991 True 1991 NaN NaN NaN\n", "6 May 13-19, 1970 True 1970 NaN NaN NaN\n", "7 May 2-10 1969 True 1969 NaN NaN NaN\n", "8 September 29-October 24, 1967 True 1967 NaN NaN NaN\n", "9 August 5, 1877-June 22, 1894 True 1877 NaN 1894 1894\n", "10 Dec. 9, 1954 True 1954 NaN NaN NaN\n", "11 Spring 1909 True 1909 NaN NaN NaN\n", "12 Early 1969 True 1969 NaN NaN NaN\n", "13 Late 1924-1925 True 1924 NaN 1925 1925\n", "14 Mars 1926 True 1926 NaN NaN NaN\n", "15 Mars, 7 h. matin, 1925 True 1925 NaN NaN NaN\n", "16 Mai 1926 True 1926 NaN NaN NaN\n", "17 Mai, 8 h. matin, 1925 True 1925 NaN NaN NaN\n", "18 Gallifa, 1956 True 1956 NaN NaN NaN\n", "19 Juillet 1921 True 1921 NaN NaN NaN\n", "20 Fontainebleau, summer 1921 True 1921 NaN NaN NaN\n", "21 Avril, 7 h. matin, 1925 True 1925 NaN NaN NaN\n", "22 Juin, 7 h. matin, 1925 True 1925 NaN NaN NaN\n", "23 Kamakura, 1952 True 1952 NaN NaN NaN\n", "24 Août 1924 True 1924 NaN NaN NaN\n", "25 Decemer 1888 True 1888 NaN NaN NaN\n", "26 Issy-les-Moulineaux, summer 1916 True 1916 NaN NaN NaN\n", "27 Paris, early 1899 True 1899 NaN NaN NaN\n", "28 Paris, June - July 1914 True 1914 NaN NaN NaN\n", "29 Paris, winter 1914-15 True 1914 15 NaN 15\n", "30 Paris, spring 1908 True 1908 NaN NaN NaN\n", "31 Frankfurt 1920 True 1920 NaN NaN NaN\n", "32 Cannes, 1958 True 1958 NaN NaN NaN\n", "33 Mars, 8 h. 
matin, 1925 True 1925 NaN NaN NaN\n", "34 circa 1980 True 1980 NaN NaN NaN\n", "35 Begun 1938 True 1938 NaN NaN NaN\n", "36 Berlin 1926 True 1926 NaN NaN NaN\n", "37 Meudon 1932 True 1932 NaN NaN NaN\n", "38 Jupiter Island 1992 True 1992 NaN NaN NaN\n", "39 Seasons of 1871, 1872 and 1873 False 1873 NaN NaN NaN\n", "\n", "moma values:\n", "\n", "July 24-27, 1968 1\n", "April 21, 1893 1\n", "May 1968 1\n", "September 1 1965 1\n", "July 1934 1\n", "July 16, 1993 1\n", "August 11, 1893 1\n", "December 10-11, 1962 1\n", "July 27-August 5 1966 1\n", "January 19, 1986 1\n", "February 12-March 3 1966 1\n", "March 26-27 1963 1\n", "February 10, 1893 1\n", "June 2, 1964 1\n", "July 14-17, 1965 1\n", "Summer 1983 1\n", "June 14-19, 1968 1\n", "August 14-24 1961 1\n", "May 1961 1\n", "November 17, 1984 1\n", "Name: date, dtype: int64\n", "\n", "Matched: 3991\n" ] } ], "source": [ "date_char_trim = r'[\\.,]'\n", "date_pattern_special = (r'^(?:\\d{,2}\\s)?(?:issy-les-moulineaux\\ssummer|fontainebleau\\ssummer|'\n", " 'summer|spring|winter|autumn|fall|january|february|march|'\n", " 'april|may|june|july|august|september|october|november|'\n", " 'december|decemer|dec|begun|late|early|'\n", " r'mars\\s7\\sh\\smatin|mars\\s8\\sh\\smatin|mars|'\n", " r'avril\\s7\\sh\\smatin|avril|'\n", " r'paris\\sjune\\s-\\sjuly|paris\\searly|paris\\swinter|paris\\sspring|paris|'\n", " r'juin\\s7\\sh\\smatin|juin|'\n", " r'mai\\s8\\sh\\smatin|mai|'\n", " 'gallifa|juillet|kamakura|août|frankfurt|cannes|circa|hiver|bogotá|cuba|'\n", " r'berlin|meudon|jupiter\\sisland|barcelona|cavalière|arles|germany|rome|'\n", " r'horta\\sde\\ssan\\sjoan|collioure|new\\syork|saint\\srémy|issy-les-moulineaux)'\n", " ) # special cases\n", "# Named groups become the column names returned by .str.extract()\n", "date_pattern_3 = (r'^.*?(?P<year_1>[0-2]\\d{3})'\n", " r'(?:\\-(?:(?P<year_2_2>[0-2]\\d)|.*(?P<year_2_4>[0-2]\\d{3})))?$'\n", " )\n", "\n", "# Test\n", "date_test = pd.DataFrame(['October 1977',\n", " 'August 15 1966',\n", " 'February 1, 1970',\n", " 'May 15, 1962.',\n", " '11 July 1854',\n", " 'May-June 1991',\n", "
'May 13-19, 1970',\n", " 'May 2-10 1969',\n", " 'September 29-October 24, 1967',\n", " 'August 5, 1877-June 22, 1894',\n", " 'Dec. 9, 1954',\n", " 'Spring 1909',\n", " 'Early 1969',\n", " 'Late 1924-1925',\n", " 'Mars 1926',\n", " 'Mars, 7 h. matin, 1925',\n", " 'Mai 1926',\n", " 'Mai, 8 h. matin, 1925',\n", " 'Gallifa, 1956',\n", " 'Juillet 1921',\n", " 'Fontainebleau, summer 1921',\n", " 'Avril, 7 h. matin, 1925',\n", " 'Juin, 7 h. matin, 1925',\n", " 'Kamakura, 1952',\n", " 'Août 1924',\n", " 'Decemer 1888',\n", " 'Issy-les-Moulineaux, summer 1916',\n", " 'Paris, early 1899',\n", " 'Paris, June - July 1914',\n", " 'Paris, winter 1914-15',\n", " 'Paris, spring 1908',\n", " 'Frankfurt 1920',\n", " 'Cannes, 1958',\n", " 'Mars, 8 h. matin, 1925',\n", " 'circa 1980',\n", " 'Begun 1938',\n", " 'Berlin 1926',\n", " 'Meudon 1932',\n", " 'Jupiter Island 1992',\n", " 'Seasons of 1871, 1872 and 1873' # must be False (we'll explore this later)\n", " ], columns=['date'])\n", "# regex=True is explicit because newer pandas versions default to literal replacement\n", "date_test['date_pattern_special'] = (date_test['date'].str.replace(date_char_trim, '', regex=True)\n", " .str.contains(date_pattern_special, flags=re.I)\n", " )\n", "date_test[['year_1', 'year_2_2', 'year_2_4']] = (date_test['date'].str.replace(date_char_trim, '', regex=True)\n", " .str.extract(date_pattern_3, flags=re.I)\n", " )\n", "date_test['year_2'] = date_test['year_2_2'].fillna(date_test['year_2_4'])\n", "print(date_test, end='\\n\\n')\n", "\n", "date_bool_3 = (moma['date'].str.replace(date_char_trim, '', regex=True)\n", " .str.match(date_pattern_3, flags=re.I)\n", " & moma['year_1'].isnull() # among the remaining rows\n", " & moma['date'].str.replace(date_char_trim, '', regex=True)\n", " .str.contains(date_pattern_special, flags=re.I) # among the rows with special cases\n", " ) # bool mask to extract the years\n", "\n", "# Inspect values\n", "print('moma values:', moma.loc[date_bool_3, 'date'].value_counts(dropna=False).tail(20), sep='\\n\\n', end='\\n\\n')\n", "print('Matched:
{}'.format(date_bool_3.sum()))" ] }, { "cell_type": "code", "execution_count": 51, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Before extract 3:\n", "total: 116703\n", "False 112712\n", "True 3991\n", "dtype: int64\n", "\n", "After extract 3:\n", " date year_1 year_2\n", "count 116703 115819 20651\n", "unique 6236 195 238\n", "\n" ] }, { "data": { "text/html": [ "
" ], "text/plain": [ " date year_1 year_2\n", "7602 Late 1924-1925 1924 1925\n", "9749 April 1994 plate printed November 1983. 1983 NaN\n", "9758 December 1984 1984 NaN\n", "12550 December 24, 1898. 1898 NaN\n", "12577 February 5, 1898-October 22, 1898 1898 1898\n", "13216 June 16, 1898 1898 NaN\n", "13237 Paris, privately published, 1949. 1949 NaN\n", "13239 Paris, privately published, 1949. 1949 NaN" ] }, "execution_count": 51, "metadata": {}, "output_type": "execute_result" } ], "source": [ "print('Before extract 3:')\n", "# Total number of rows\n", "print('total:', moma.shape[0])\n", "# Number of rows matching the pattern (True) and the rest (False)\n", "print(date_bool_3.value_counts(dropna=False), end='\\n\\n')\n", "\n", "# Extract (regex=True is explicit because newer pandas versions default to literal replacement)\n", "moma.loc[date_bool_3, ['year_1', 'year_2_2', 'year_2_4']] = (moma.loc[date_bool_3, 'date']\n", " .str.replace(date_char_trim, '', regex=True)\n", " .str.extract(date_pattern_3, flags=re.I)\n", " )\n", "moma.loc[date_bool_3, 'year_2'] = moma.loc[date_bool_3, 'year_2_2'].fillna(moma.loc[date_bool_3, 'year_2_4'])\n", "\n", "# Inspect values\n", "print('After extract 3:', moma[['date', 'year_1', 'year_2']].describe().loc[['count', 'unique']], sep='\\n', end='\\n\\n')\n", "moma.loc[date_bool_3, ['date', 'year_1', 'year_2']].head(8)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We have processed 3,991 special cases. In total, we have now extracted a year for 115,819 values.\n", "\n", "Let's evaluate the remaining data in the `date` column." ] }, { "cell_type": "code", "execution_count": 52, "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "rest count: 884\n", "rest percentage: 0.76\n", "rest count unique: 582\n", "total: 116703\n", "\n", "moma values:\n", "1893 reproduced drawings executed 1891-93 33\n", "Seasons of 1871, 1872 and 1873 25\n", "1991 project begun 1989. 25\n", "1929. Play produced and reproduced drawings executed 1907.
24\n", "1947 reproduced drawing executed 1938 24\n", "book published, 1958 15\n", "Book commissioned, but unpublished by Vollard Print executed 1922 Book publi 14\n", "2015-ongoing 12\n", "Probably 1914-21 not later than 1925 10\n", "1999-present 9\n", "first published 1883 8\n", "Drawing on mylar executed 1967 8\n", "Commissioned, but unpublished by Vollard Print executed 1922-1927. 8\n", "Book commissioned, but unpublished by Vollard Print executed 1926 Book publi 8\n", "1919 reproduced drawings executed 1908-09 7\n", "Book commissioned, but unpublished by Vollard Print executed 1927 Book publi 7\n", "Name: date, dtype: int64\n" ] } ], "source": [ "date_bool_rest = ~(date_bool_1 \n", " | date_bool_2\n", " | date_bool_3\n", " )\n", "# Or another way\n", "# date_bool_not_year_1 = moma['year_1'].isnull()\n", "\n", "# Statistics for the remaining rows\n", "print('rest count: {}'.format(date_bool_rest.sum()))\n", "print('rest percentage: {}'.format(round(date_bool_rest.sum()*100/moma.shape[0], 2)))\n", "print('rest count unique: {}'.format(moma.loc[date_bool_rest, 'date'].value_counts(dropna=False).shape[0]))\n", "print('total: {}'.format(moma.shape[0]), end='\\n\\n')\n", "\n", "# Inspect values\n", "print('moma values:',\n", " (moma.loc[date_bool_rest, 'date']\n", " .value_counts(dropna=False)\n", " .sort_values(ascending=False)\n", " .head(16)\n", " ),\n", " sep='\\n'\n", " )" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Only 0.76% of the values in the `date` column remain questionable.\n", "\n", "We are currently looking for high-level results, so we won't spend time on the rest and will simply drop these rows."
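, "\n", "\n", "
To make the arithmetic behind the leftover share explicit, here is a small sketch in plain Python. The helper `leftover_share` and the toy masks are hypothetical and exist only for this illustration; in the notebook the same idea is expressed with pandas boolean masks.

```python
def leftover_share(*masks):
    '''Return (count, percentage) of rows that none of the masks matched.'''
    total = len(masks[0])
    # A row counts as handled if at least one mask flags it.
    handled = sum(any(flags) for flags in zip(*masks))
    rest = total - handled
    return rest, round(rest * 100 / total, 2)


# Toy masks standing in for the three extraction cases (made-up values).
case_1 = [True, True, False, False, False]
case_2 = [False, False, True, False, False]
case_3 = [False, False, False, True, False]
rest_count, rest_percent = leftover_share(case_1, case_2, case_3)
```

The same formula reproduces the figure reported above: `round(884 * 100 / 116703, 2)` evaluates to `0.76`.
"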
] }, { "cell_type": "code", "execution_count": 53, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Before drop rest:\n", "total: 116703\n", "True 115819\n", "False 884\n", "dtype: int64\n", "\n", "After drop rest:\n", "total: 115819\n", "True 115819\n", "Name: year_1, dtype: int64\n" ] } ], "source": [ "print('Before drop rest:')\n", "# Total number of rows before\n", "print('total:', moma.shape[0])\n", "# Number of valid (True) and invalid (False) rows\n", "print((~date_bool_rest).value_counts(dropna=False), end='\\n\\n')\n", "\n", "# Drop\n", "date_drop_rest = moma[date_bool_rest].index # rows to drop\n", "moma.drop(index=date_drop_rest, inplace=True)\n", "\n", "print('After drop rest:')\n", "# Total number of rows after\n", "print('total:', moma.shape[0])\n", "# Number of valid (True) and invalid (False) rows\n", "print(moma['year_1'].notnull().value_counts(dropna=False))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's calculate the year the artwork was created." ] }, { "cell_type": "code", "execution_count": 54, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " year_1 year_2\n", "138156 1933 1934\n", "138157 1933 1934\n", "138158 1933 1934\n", "138159 1933 1934\n", "138160 1933 1934\n" ] }, { "data": { "text/html": [ "
" ], "text/plain": [ " begin_date_clean date_clean age\n", "0 1841 1896 55\n", "1 1944 1987 43\n", "2 1876 1903 27\n", "3 1944 1980 36\n", "4 1876 1903 27\n", "5 1944 1976 32\n", "6 1944 1976 32\n", "7 1944 1976 32\n", "8 1944 1976 32\n", "9 1944 1976 32" ] }, "execution_count": 54, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Fill in a two-digit year to four digits\n", "year_2_bool_two = moma['year_2'].str.len() == 2\n", "\n", "moma.loc[year_2_bool_two, 'year_2'] = (moma.loc[year_2_bool_two, 'year_1'].str[0:2] \n", " + moma.loc[year_2_bool_two, 'year_2']\n", " )\n", "\n", "# Inspect values\n", "print(moma.loc[year_2_bool_two, ['year_1', 'year_2']].tail())\n", "\n", "# Fill in NaN 'year_2' with 'year_1' (assign back instead of inplace=True on a column selection)\n", "moma['year_2'] = moma['year_2'].fillna(moma['year_1'])\n", "# Cast years to int\n", "moma[['year_1', 'year_2']] = moma[['year_1', 'year_2']].astype(int)\n", "# Calculate date as average\n", "moma['date_clean'] = round((moma['year_2'] + moma['year_1']) / 2)\n", "moma['date_clean'] = moma['date_clean'].astype(int) # cast to int\n", "# Calculate age\n", "moma['age'] = moma['date_clean'] - moma['begin_date_clean']\n", "moma['age'] = moma['age'].astype(int) # cast to int\n", "# Inspect values\n", "moma[['begin_date_clean', 'date_clean', 'age']].head(10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We need to validate the data against the following rules:\n", "\n", "- `year_1 <= year_2`: the range of creation years is valid.\n", "- `begin_date_clean < year_1`: the artist was born before the artwork was created.\n", "- For the death year, either:\n", "    - `year_2 <= end_date_clean`, if `end_date_clean` is specified, or\n", "    - `end_date_clean` is not specified (stored as `0`)." ] }, { "cell_type": "code", "execution_count": 55, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "invalid count: 1146\n", "invalid percentage: 0.99\n", "total: 115819\n", "\n", "moma invalid values:\n" ] }, { "data": { "text/html": [ "
" ], "text/plain": [ " title artist artist_bio nationality \\\n", "951 Town Center, Seinajoki, Finland Alvar Aalto (Finnish, 1898–1976) (Finnish) \n", "3348 Canned Transistors SMS Card Sam Lucente (American, born 1958) (American) \n", "3430 Swiss Officers' Knife Champion (no. 5012) Karl Elsener (Swiss, 1860–1918) (Swiss) \n", "3594 Rabbit Pattern Printed Fabric (no. 23583) William Morris (British, 1834–1896) (British) \n", "3834 Danger, Don't Touch Abram Games (British, 1914–1996) (British) \n", "... ... ... ... ... \n", "135037 Untitled Felix Akinniran Olunloyo (Nigerian) (Nigerian) \n", "135038 Untitled Felix Akinniran Olunloyo (Nigerian) (Nigerian) \n", "135039 Untitled Felix Akinniran Olunloyo (Nigerian) (Nigerian) \n", "135040 Untitled Felix Akinniran Olunloyo (Nigerian) (Nigerian) \n", "135041 Untitled Felix Akinniran Olunloyo (Nigerian) (Nigerian) \n", "\n", " begin_date end_date gender date classification department begin_date_clean end_date_clean \\\n", "951 (1898) (1976) (Male) 1958-87 Architecture Architecture & Design 1898 1976 \n", "3348 (1958) (0) (Male) 1955. Design Architecture & Design 1958 0 \n", "3430 (1860) (1918) () 1968 Design Architecture & Design 1860 1918 \n", "3594 (1834) (1896) (Male) 1938 Design Architecture & Design 1834 1896 \n", "3834 (1914) (1996) (Male) 1900-1945 Design Architecture & Design 1914 1996 \n", "... ... ... ... ... ... ... ... ... 
\n", "135037 (1970) (0) (Male) 1950-70 Photograph Photography 1970 0 \n", "135038 (1970) (0) (Male) 1950-70 Photograph Photography 1970 0 \n", "135039 (1970) (0) (Male) 1950-70 Photograph Photography 1970 0 \n", "135040 (1970) (0) (Male) 1950-70 Photograph Photography 1970 0 \n", "135041 (1970) (0) (Male) 1950-70 Photograph Photography 1970 0 \n", "\n", " gender_clean nationality_clean year_1 year_2 year_2_2 year_2_4 date_clean age \n", "951 male finnish 1958 1987 NaN NaN 1972 74 \n", "3348 male american 1955 1955 NaN NaN 1955 -3 \n", "3430 NaN swiss 1968 1968 NaN NaN 1968 108 \n", "3594 male british 1938 1938 NaN NaN 1938 104 \n", "3834 male british 1900 1945 NaN NaN 1922 8 \n", "... ... ... ... ... ... ... ... ... \n", "135037 male nigerian 1950 1970 NaN NaN 1960 -10 \n", "135038 male nigerian 1950 1970 NaN NaN 1960 -10 \n", "135039 male nigerian 1950 1970 NaN NaN 1960 -10 \n", "135040 male nigerian 1950 1970 NaN NaN 1960 -10 \n", "135041 male nigerian 1950 1970 NaN NaN 1960 -10 \n", "\n", "[1146 rows x 20 columns]" ] }, "execution_count": 55, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# A row is valid when the date range is ordered, the artist was born before\n", "# the work was started, and the artist was alive when it was finished\n", "# (an end date of 0 means the artist is still living).\n", "moma_bool_invalid = ~(\n", " (moma['year_1'] <= moma['year_2'])\n", " & (moma['begin_date_clean'] < moma['year_1'])\n", " & (((moma['end_date_clean'] >= moma['year_2']) & (moma['end_date_clean'] != 0))\n", " | (moma['end_date_clean'] == 0)\n", " )\n", " )\n", "\n", "# Statistics for the invalid rows\n", "print('invalid count: {}'.format(moma_bool_invalid.sum()))\n", "print('invalid percentage: {}'.format(round(moma_bool_invalid.sum()*100/moma.shape[0], 2)))\n", "print('total:', moma.shape[0], end='\\n\\n')\n", "\n", "# Inspect values\n", "print('moma invalid values:')\n", "moma.loc[moma_bool_invalid]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We'll drop the rows that fail the validation above."
] }, { "cell_type": "code", "execution_count": 56, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Before drop invalid:\n", "total: 115819\n", "True 114673\n", "False 1146\n", "dtype: int64\n", "\n", "After drop invalid:\n", "total: 114673\n", "True 114673\n", "dtype: int64\n" ] } ], "source": [ "print('Before drop invalid:')\n", "# Total number of rows before the drop\n", "print('total:', moma.shape[0])\n", "# Number of valid (True) and invalid (False) rows\n", "print((~moma_bool_invalid).value_counts(dropna=False), end='\\n\\n')\n", "\n", "# Drop the invalid rows in place\n", "moma_drop_invalid = moma[moma_bool_invalid].index # index labels of the rows to drop\n", "moma.drop(index=moma_drop_invalid, inplace=True)\n", "\n", "print('After drop invalid:')\n", "# Total number of rows after the drop\n", "print('total:', moma.shape[0])\n", "# Re-run the validation: every remaining row should be valid (True)\n", "print((\n", " (moma['year_1'] <= moma['year_2'])\n", " & (moma['begin_date_clean'] < moma['year_1'])\n", " & (((moma['end_date_clean'] >= moma['year_2']) & (moma['end_date_clean'] != 0))\n", " | (moma['end_date_clean'] == 0)\n", " )\n", " ).value_counts(dropna=False)\n", " )" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's take a look at the `age` column." ] }, { "cell_type": "code", "execution_count": 57, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "count 114673.000000\n", "mean 44.959459\n", "std 15.104405\n", "min 1.000000\n", "25% 34.000000\n", "50% 42.000000\n", "75% 53.000000\n", "max 102.000000\n", "Name: age, dtype: float64" ] }, "execution_count": 57, "metadata": {}, "output_type": "execute_result" } ], "source": [ "moma['age'].describe()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The summary shows some suspicious outliers at both ends of the range. For instance, it's unlikely that an artist created an artwork at the age of 1. What are these outliers?"
] }, { "cell_type": "code", "execution_count": 58, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Louise Bourgeois 1184\n", "Marc Chagall 31\n", "Hans Erni 12\n", "Will Barnet 11\n", "Carmen Herrera 9\n", "Frank Lloyd Wright 8\n", "Sylvia Sleigh 1\n", "June Wayne 1\n", "Pablo Picasso 1\n", "Manoel de Oliveira 1\n", "Name: artist, dtype: int64\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " 
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", 
" \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " 
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", 
" \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " 
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", 
" \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
titleartistartist_bionationalitybegin_dateend_dategenderdateclassificationdepartmentbegin_date_cleanend_date_cleangender_cleannationality_cleanyear_1year_2year_2_2year_2_4date_cleanage
109413Three Sons, plate 14 of 24, from the series, S...Louise Bourgeois(American, born France. 1911–2010)(American)(1911)(2010)(Female)2009PrintDrawings & Prints19112010femaleamerican20092009NaNNaN200998
109406Michel, plate 7 of 24, from the series, Self P...Louise Bourgeois(American, born France. 1911–2010)(American)(1911)(2010)(Female)2009PrintDrawings & Prints19112010femaleamerican20092009NaNNaN200998
109436Untitled, plate 13 of 15, from the series, Nat...Louise Bourgeois(American, born France. 1911–2010)(American)(1911)(2010)(Female)2009PrintDrawings & Prints19112010femaleamerican20092009NaNNaN200998
107875Untitled, plate 14 of 15, from the series, Nat...Louise Bourgeois(American, born France. 1911–2010)(American)(1911)(2010)(Female)2009PrintDrawings & Prints19112010femaleamerican20092009NaNNaN200998
109438Untitled, plate 15 of 15, from the series, Nat...Louise Bourgeois(American, born France. 1911–2010)(American)(1911)(2010)(Female)2009PrintDrawings & Prints19112010femaleamerican20092009NaNNaN200998
107885Untitled, plate 14 of 15, from the series, Nat...Louise Bourgeois(American, born France. 1911–2010)(American)(1911)(2010)(Female)2009PrintDrawings & Prints19112010femaleamerican20092009NaNNaN200998
107886UntitledLouise Bourgeois(American, born France. 1911–2010)(American)(1911)(2010)(Female)2009PrintDrawings & Prints19112010femaleamerican20092009NaNNaN200998
107887UntitledLouise Bourgeois(American, born France. 1911–2010)(American)(1911)(2010)(Female)2009PrintDrawings & Prints19112010femaleamerican20092009NaNNaN200998
107888UntitledLouise Bourgeois(American, born France. 1911–2010)(American)(1911)(2010)(Female)2009PrintDrawings & Prints19112010femaleamerican20092009NaNNaN200998
100324Les FleursLouise Bourgeois(American, born France. 1911–2010)(American)(1911)(2010)(Female)2009PrintDrawings & Prints19112010femaleamerican20092009NaNNaN200998
109437Untitled, plate 14 of 15, from the series, Nat...Louise Bourgeois(American, born France. 1911–2010)(American)(1911)(2010)(Female)2009PrintDrawings & Prints19112010femaleamerican20092009NaNNaN200998
111747Ma MaisonLouise Bourgeois(American, born France. 1911–2010)(American)(1911)(2010)(Female)2009PrintDrawings & Prints19112010femaleamerican20092009NaNNaN200998
111748Ma MaisonLouise Bourgeois(American, born France. 1911–2010)(American)(1911)(2010)(Female)2009PrintDrawings & Prints19112010femaleamerican20092009NaNNaN200998
111749Where Would We Be Without Each Other?Louise Bourgeois(American, born France. 1911–2010)(American)(1911)(2010)(Female)2009PrintDrawings & Prints19112010femaleamerican20092009NaNNaN200998
107904LipsLouise Bourgeois(American, born France. 1911–2010)(American)(1911)(2010)(Female)2009PrintDrawings & Prints19112010femaleamerican20092009NaNNaN200998
111750Where Would We Be Without Each Other?Louise Bourgeois(American, born France. 1911–2010)(American)(1911)(2010)(Female)2009PrintDrawings & Prints19112010femaleamerican20092009NaNNaN200998
111751Ma FamilleLouise Bourgeois(American, born France. 1911–2010)(American)(1911)(2010)(Female)2009PrintDrawings & Prints19112010femaleamerican20092009NaNNaN200998
111752La DanseLouise Bourgeois(American, born France. 1911–2010)(American)(1911)(2010)(Female)2009PrintDrawings & Prints19112010femaleamerican20092009NaNNaN200998
111753La DanseLouise Bourgeois(American, born France. 1911–2010)(American)(1911)(2010)(Female)2009PrintDrawings & Prints19112010femaleamerican20092009NaNNaN200998
111754La DanseLouise Bourgeois(American, born France. 1911–2010)(American)(1911)(2010)(Female)2009PrintDrawings & Prints19112010femaleamerican20092009NaNNaN200998
111755The SwarmLouise Bourgeois(American, born France. 1911–2010)(American)(1911)(2010)(Female)2009PrintDrawings & Prints19112010femaleamerican20092009NaNNaN200998
111756UntitledLouise Bourgeois(American, born France. 1911–2010)(American)(1911)(2010)(Female)2009PrintDrawings & Prints19112010femaleamerican20092009NaNNaN200998
111757UntitledLouise Bourgeois(American, born France. 1911–2010)(American)(1911)(2010)(Female)2009PrintDrawings & Prints19112010femaleamerican20092009NaNNaN200998
103908Have You Saved a Soul Today?Louise Bourgeois(American, born France. 1911–2010)(American)(1911)(2010)(Female)2009PrintDrawings & Prints19112010femaleamerican20092009NaNNaN200998
108650The NestLouise Bourgeois(American, born France. 1911–2010)(American)(1911)(2010)(Female)2009PrintDrawings & Prints19112010femaleamerican20092009NaNNaN200998
111816Self-PityLouise Bourgeois(American, born France. 1911–2010)(American)(1911)(2010)(Female)2009TextileDrawings & Prints19112010femaleamerican20092009NaNNaN200998
107884Untitled, plate 14 of 15, from the series, Nat...Louise Bourgeois(American, born France. 1911–2010)(American)(1911)(2010)(Female)2009PrintDrawings & Prints19112010femaleamerican20092009NaNNaN200998
107883Untitled, plate 13 of 15, from the series, Nat...Louise Bourgeois(American, born France. 1911–2010)(American)(1911)(2010)(Female)2009PrintDrawings & Prints19112010femaleamerican20092009NaNNaN200998
109425Untitled, plate 2 of 15, from the series, Natu...Louise Bourgeois(American, born France. 1911–2010)(American)(1911)(2010)(Female)2009PrintDrawings & Prints19112010femaleamerican20092009NaNNaN200998
107881Untitled, plate 8 of 15, from the series, Natu...Louise Bourgeois(American, born France. 1911–2010)(American)(1911)(2010)(Female)2009PrintDrawings & Prints19112010femaleamerican20092009NaNNaN200998
110174Self PortraitLouise Bourgeois(American, born France. 1911–2010)(American)(1911)(2010)(Female)2009PrintDrawings & Prints19112010femaleamerican20092009NaNNaN200998
110189Nature StudyLouise Bourgeois(American, born France. 1911–2010)(American)(1911)(2010)(Female)2009PrintDrawings & Prints19112010femaleamerican20092009NaNNaN200998
107876Untitled, plate 9 of 15, from the series, Natu...Louise Bourgeois(American, born France. 1911–2010)(American)(1911)(2010)(Female)2009PrintDrawings & Prints19112010femaleamerican20092009NaNNaN200998
115482Self PortraitLouise Bourgeois(American, born France. 1911–2010)(American)(1911)(2010)(Female)2009PrintDrawings & Prints19112010femaleamerican20092009NaNNaN200998
115481Marriage, plate 5 of 24, from the series, Self...Louise Bourgeois(American, born France. 1911–2010)(American)(1911)(2010)(Female)2009PrintDrawings & Prints19112010femaleamerican20092009NaNNaN200998
115480La Nausée, plate 22 of 24, from the series, Se...Louise Bourgeois(American, born France. 1911–2010)(American)(1911)(2010)(Female)2009PrintDrawings & Prints19112010femaleamerican20092009NaNNaN200998
115479Nature StudyLouise Bourgeois(American, born France. 1911–2010)(American)(1911)(2010)(Female)2009PrintDrawings & Prints19112010femaleamerican20092009NaNNaN200998
115478Nature StudyLouise Bourgeois(American, born France. 1911–2010)(American)(1911)(2010)(Female)2009PrintDrawings & Prints19112010femaleamerican20092009NaNNaN200998
107882Untitled, plate 9 of 15, from the series, Natu...Louise Bourgeois(American, born France. 1911–2010)(American)(1911)(2010)(Female)2009PrintDrawings & Prints19112010femaleamerican20092009NaNNaN200998
115477Good Mother, plate 17 of 24, from the series, ...Louise Bourgeois(American, born France. 1911–2010)(American)(1911)(2010)(Female)2009PrintDrawings & Prints19112010femaleamerican20092009NaNNaN200998
107877Untitled, plate 10 of 15, from the series, Nat...Louise Bourgeois(American, born France. 1911–2010)(American)(1911)(2010)(Female)2009PrintDrawings & Prints19112010femaleamerican20092009NaNNaN200998
107880Untitled, plate 6 of 15, from the series, Natu...Louise Bourgeois(American, born France. 1911–2010)(American)(1911)(2010)(Female)2009PrintDrawings & Prints19112010femaleamerican20092009NaNNaN200998
107879UntitledLouise Bourgeois(American, born France. 1911–2010)(American)(1911)(2010)(Female)2009PrintDrawings & Prints19112010femaleamerican20092009NaNNaN200998
107878Untitled, plate 5 of 15, from the series, Natu...Louise Bourgeois(American, born France. 1911–2010)(American)(1911)(2010)(Female)2009PrintDrawings & Prints19112010femaleamerican20092009NaNNaN200998
107865Les FleursLouise Bourgeois(American, born France. 1911–2010)(American)(1911)(2010)(Female)2010PrintDrawings & Prints19112010femaleamerican20102010NaNNaN201099
109399To Whom It May ConcernLouise Bourgeois(American, born France. 1911–2010)(American)(1911)(2010)(Female)2009-2010Illustrated BookDrawings & Prints19112010femaleamerican20092010NaNNaN201099
107866Les FleursLouise Bourgeois(American, born France. 1911–2010)(American)(1911)(2010)(Female)2010PrintDrawings & Prints19112010femaleamerican20102010NaNNaN201099
111210Deuxième promenade (plate, page 12) from Les r...Hans Erni(Swiss, 1909–2015)(Swiss)(1909)(2015)(Male)2008Illustrated BookDrawings & Prints19092015maleswiss20082008NaNNaN200899
111212Quatrième promenade (plate, page 24) from Les ...Hans Erni(Swiss, 1909–2015)(Swiss)(1909)(2015)(Male)2008Illustrated BookDrawings & Prints19092015maleswiss20082008NaNNaN200899
111215Septième promenade (plate, page 42) from Les r...Hans Erni(Swiss, 1909–2015)(Swiss)(1909)(2015)(Male)2008Illustrated BookDrawings & Prints19092015maleswiss20082008NaNNaN200899
107864Les FleursLouise Bourgeois(American, born France. 1911–2010)(American)(1911)(2010)(Female)2010PrintDrawings & Prints19112010femaleamerican20102010NaNNaN201099
111208Frontispiece from Les rêveries du promeneur so...Hans Erni(Swiss, 1909–2015)(Swiss)(1909)(2015)(Male)2008Illustrated BookDrawings & Prints19092015maleswiss20082008NaNNaN200899
111812I DoLouise Bourgeois(American, born France. 1911–2010)(American)(1911)(2010)(Female)2010PrintDrawings & Prints19112010femaleamerican20102010NaNNaN201099
110778Les rêveries du promeneur solitaire (Extraits)Hans Erni(Swiss, 1909–2015)(Swiss)(1909)(2015)(Male)2008Illustrated BookDrawings & Prints19092015maleswiss20082008NaNNaN200899
107651UntitledLouise Bourgeois(American, born France. 1911–2010)(American)(1911)(2010)(Female)2010PrintDrawings & Prints19112010femaleamerican20102010NaNNaN201099
111211Troisième promenade (plate, page 18) from Les ...Hans Erni(Swiss, 1909–2015)(Swiss)(1909)(2015)(Male)2008Illustrated BookDrawings & Prints19092015maleswiss20082008NaNNaN200899
111213Cinquième promenade (plate, page 30) from Les ...Hans Erni(Swiss, 1909–2015)(Swiss)(1909)(2015)(Male)2008Illustrated BookDrawings & Prints19092015maleswiss20082008NaNNaN200899
| 107867 | Les Fleurs | Louise Bourgeois | (American, born France. 1911–2010) | (American) | (1911) | (2010) | (Female) | 2010 | Print | Drawings & Prints | 1911 | 2010 | female | american | 2010 | 2010 | NaN | NaN | 2010 | 99 |
| 111218 | Dixièmem promenade (plate, page 60) from Les r... | Hans Erni | (Swiss, 1909–2015) | (Swiss) | (1909) | (2015) | (Male) | 2008 | Illustrated Book | Drawings & Prints | 1909 | 2015 | male | swiss | 2008 | 2008 | NaN | NaN | 2008 | 99 |
| 113819 | Untitled, no. 12 of 12, from the illustrated b... | Louise Bourgeois | (American, born France. 1911–2010) | (American) | (1911) | (2010) | (Female) | 2010 | Illustrated Book | Drawings & Prints | 1911 | 2010 | female | american | 2010 | 2010 | NaN | NaN | 2010 | 99 |
| 113818 | Untitled, no. 11 of 12, from the illustrated b... | Louise Bourgeois | (American, born France. 1911–2010) | (American) | (1911) | (2010) | (Female) | 2010 | Illustrated Book | Drawings & Prints | 1911 | 2010 | female | american | 2010 | 2010 | NaN | NaN | 2010 | 99 |
| 111214 | Sixième promenade (plate, page 36) from Les rê... | Hans Erni | (Swiss, 1909–2015) | (Swiss) | (1909) | (2015) | (Male) | 2008 | Illustrated Book | Drawings & Prints | 1909 | 2015 | male | swiss | 2008 | 2008 | NaN | NaN | 2008 | 99 |
| 111209 | Première promenade (plate, page 6) from Les rê... | Hans Erni | (Swiss, 1909–2015) | (Swiss) | (1909) | (2015) | (Male) | 2008 | Illustrated Book | Drawings & Prints | 1909 | 2015 | male | swiss | 2008 | 2008 | NaN | NaN | 2008 | 99 |
| 113815 | Untitled, no. 8 of 12, from the illustrated bo... | Louise Bourgeois | (American, born France. 1911–2010) | (American) | (1911) | (2010) | (Female) | 2010 | Illustrated Book | Drawings & Prints | 1911 | 2010 | female | american | 2010 | 2010 | NaN | NaN | 2010 | 99 |
| 113814 | Untitled, no. 7 of 12, from the illustrated bo... | Louise Bourgeois | (American, born France. 1911–2010) | (American) | (1911) | (2010) | (Female) | 2010 | Illustrated Book | Drawings & Prints | 1911 | 2010 | female | american | 2010 | 2010 | NaN | NaN | 2010 | 99 |
| 113813 | Untitled, no. 6 of 12, from the illustrated bo... | Louise Bourgeois | (American, born France. 1911–2010) | (American) | (1911) | (2010) | (Female) | 2010 | Illustrated Book | Drawings & Prints | 1911 | 2010 | female | american | 2010 | 2010 | NaN | NaN | 2010 | 99 |
| 113817 | Untitled, no. 10 of 12, from the illustrated b... | Louise Bourgeois | (American, born France. 1911–2010) | (American) | (1911) | (2010) | (Female) | 2010 | Illustrated Book | Drawings & Prints | 1911 | 2010 | female | american | 2010 | 2010 | NaN | NaN | 2010 | 99 |
| 113816 | Untitled, no. 9 of 12, from the illustrated bo... | Louise Bourgeois | (American, born France. 1911–2010) | (American) | (1911) | (2010) | (Female) | 2010 | Illustrated Book | Drawings & Prints | 1911 | 2010 | female | american | 2010 | 2010 | NaN | NaN | 2010 | 99 |
| 113811 | Untitled, no. 4 of 12, from the illustrated bo... | Louise Bourgeois | (American, born France. 1911–2010) | (American) | (1911) | (2010) | (Female) | 2010 | Illustrated Book | Drawings & Prints | 1911 | 2010 | female | american | 2010 | 2010 | NaN | NaN | 2010 | 99 |
| 113810 | Untitled, no. 3 of 12, from the illustrated bo... | Louise Bourgeois | (American, born France. 1911–2010) | (American) | (1911) | (2010) | (Female) | 2010 | Illustrated Book | Drawings & Prints | 1911 | 2010 | female | american | 2010 | 2010 | NaN | NaN | 2010 | 99 |
| 113809 | Untitled, no. 2 of 12, from the illustrated bo... | Louise Bourgeois | (American, born France. 1911–2010) | (American) | (1911) | (2010) | (Female) | 2010 | Illustrated Book | Drawings & Prints | 1911 | 2010 | female | american | 2010 | 2010 | NaN | NaN | 2010 | 99 |
| 113808 | Untitled, no. 1 of 12, from the illustrated bo... | Louise Bourgeois | (American, born France. 1911–2010) | (American) | (1911) | (2010) | (Female) | 2010 | Illustrated Book | Drawings & Prints | 1911 | 2010 | female | american | 2010 | 2010 | NaN | NaN | 2010 | 99 |
| 111216 | Huitième promenade (plate, page 48) from Les r... | Hans Erni | (Swiss, 1909–2015) | (Swiss) | (1909) | (2015) | (Male) | 2008 | Illustrated Book | Drawings & Prints | 1909 | 2015 | male | swiss | 2008 | 2008 | NaN | NaN | 2008 | 99 |
| 111217 | Neuvième promenade (plate, page 54) from Les r... | Hans Erni | (Swiss, 1909–2015) | (Swiss) | (1909) | (2015) | (Male) | 2008 | Illustrated Book | Drawings & Prints | 1909 | 2015 | male | swiss | 2008 | 2008 | NaN | NaN | 2008 | 99 |
| 113812 | Untitled, no. 5 of 12, from the illustrated bo... | Louise Bourgeois | (American, born France. 1911–2010) | (American) | (1911) | (2010) | (Female) | 2010 | Illustrated Book | Drawings & Prints | 1911 | 2010 | female | american | 2010 | 2010 | NaN | NaN | 2010 | 99 |
| 133436 | Untitled from Verde y Negro | Carmen Herrera | (Cuban, born 1915) | (Cuban) | (1915) | (0) | (Female) | 2017 | Print | Drawings & Prints | 1915 | 0 | female | cuban | 2017 | 2017 | NaN | NaN | 2017 | 102 |
| 133435 | Untitled from Verde y Negro | Carmen Herrera | (Cuban, born 1915) | (Cuban) | (1915) | (0) | (Female) | 2017 | Print | Drawings & Prints | 1915 | 0 | female | cuban | 2017 | 2017 | NaN | NaN | 2017 | 102 |
| 133434 | Untitled from Verde y Negro | Carmen Herrera | (Cuban, born 1915) | (Cuban) | (1915) | (0) | (Female) | 2017 | Print | Drawings & Prints | 1915 | 0 | female | cuban | 2017 | 2017 | NaN | NaN | 2017 | 102 |
| 133433 | Untitled from Verde y Amarillo | Carmen Herrera | (Cuban, born 1915) | (Cuban) | (1915) | (0) | (Female) | 2017 | Print | Drawings & Prints | 1915 | 0 | female | cuban | 2017 | 2017 | NaN | NaN | 2017 | 102 |
| 115010 | The Strange Case of Angelica | Manoel de Oliveira | (Portuguese, 1908–2015) | (Portuguese) | (1908) | (2015) | () | 2010 | Film | Film | 1908 | 2015 | NaN | portuguese | 2010 | 2010 | NaN | NaN | 2010 | 102 |
| 133431 | Untitled from Verde y Amarillo | Carmen Herrera | (Cuban, born 1915) | (Cuban) | (1915) | (0) | (Female) | 2017 | Print | Drawings & Prints | 1915 | 0 | female | cuban | 2017 | 2017 | NaN | NaN | 2017 | 102 |
| 133205 | Equilibrio | Carmen Herrera | (Cuban, born 1915) | (Cuban) | (1915) | (0) | (Female) | 2017 | Print | Drawings & Prints | 1915 | 0 | female | cuban | 2017 | 2017 | NaN | NaN | 2017 | 102 |
| 133203 | Verde y Amarillo | Carmen Herrera | (Cuban, born 1915) | (Cuban) | (1915) | (0) | (Female) | 2017 | Print | Drawings & Prints | 1915 | 0 | female | cuban | 2017 | 2017 | NaN | NaN | 2017 | 102 |
| 133432 | Untitled from Verde y Amarillo | Carmen Herrera | (Cuban, born 1915) | (Cuban) | (1915) | (0) | (Female) | 2017 | Print | Drawings & Prints | 1915 | 0 | female | cuban | 2017 | 2017 | NaN | NaN | 2017 | 102 |
| 133204 | Verde y Negro | Carmen Herrera | (Cuban, born 1915) | (Cuban) | (1915) | (0) | (Female) | 2017 | Print | Drawings & Prints | 1915 | 0 | female | cuban | 2017 | 2017 | NaN | NaN | 2017 | 102 |
\n", "
" ], "text/plain": [ " title artist artist_bio \\\n", "109413 Three Sons, plate 14 of 24, from the series, S... Louise Bourgeois (American, born France. 1911–2010) \n", "109406 Michel, plate 7 of 24, from the series, Self P... Louise Bourgeois (American, born France. 1911–2010) \n", "109436 Untitled, plate 13 of 15, from the series, Nat... Louise Bourgeois (American, born France. 1911–2010) \n", "107875 Untitled, plate 14 of 15, from the series, Nat... Louise Bourgeois (American, born France. 1911–2010) \n", "109438 Untitled, plate 15 of 15, from the series, Nat... Louise Bourgeois (American, born France. 1911–2010) \n", "107885 Untitled, plate 14 of 15, from the series, Nat... Louise Bourgeois (American, born France. 1911–2010) \n", "107886 Untitled Louise Bourgeois (American, born France. 1911–2010) \n", "107887 Untitled Louise Bourgeois (American, born France. 1911–2010) \n", "107888 Untitled Louise Bourgeois (American, born France. 1911–2010) \n", "100324 Les Fleurs Louise Bourgeois (American, born France. 1911–2010) \n", "109437 Untitled, plate 14 of 15, from the series, Nat... Louise Bourgeois (American, born France. 1911–2010) \n", "111747 Ma Maison Louise Bourgeois (American, born France. 1911–2010) \n", "111748 Ma Maison Louise Bourgeois (American, born France. 1911–2010) \n", "111749 Where Would We Be Without Each Other? Louise Bourgeois (American, born France. 1911–2010) \n", "107904 Lips Louise Bourgeois (American, born France. 1911–2010) \n", "111750 Where Would We Be Without Each Other? Louise Bourgeois (American, born France. 1911–2010) \n", "111751 Ma Famille Louise Bourgeois (American, born France. 1911–2010) \n", "111752 La Danse Louise Bourgeois (American, born France. 1911–2010) \n", "111753 La Danse Louise Bourgeois (American, born France. 1911–2010) \n", "111754 La Danse Louise Bourgeois (American, born France. 1911–2010) \n", "111755 The Swarm Louise Bourgeois (American, born France. 
1911–2010) \n", "111756 Untitled Louise Bourgeois (American, born France. 1911–2010) \n", "111757 Untitled Louise Bourgeois (American, born France. 1911–2010) \n", "103908 Have You Saved a Soul Today? Louise Bourgeois (American, born France. 1911–2010) \n", "108650 The Nest Louise Bourgeois (American, born France. 1911–2010) \n", "111816 Self-Pity Louise Bourgeois (American, born France. 1911–2010) \n", "107884 Untitled, plate 14 of 15, from the series, Nat... Louise Bourgeois (American, born France. 1911–2010) \n", "107883 Untitled, plate 13 of 15, from the series, Nat... Louise Bourgeois (American, born France. 1911–2010) \n", "109425 Untitled, plate 2 of 15, from the series, Natu... Louise Bourgeois (American, born France. 1911–2010) \n", "107881 Untitled, plate 8 of 15, from the series, Natu... Louise Bourgeois (American, born France. 1911–2010) \n", "110174 Self Portrait Louise Bourgeois (American, born France. 1911–2010) \n", "110189 Nature Study Louise Bourgeois (American, born France. 1911–2010) \n", "107876 Untitled, plate 9 of 15, from the series, Natu... Louise Bourgeois (American, born France. 1911–2010) \n", "115482 Self Portrait Louise Bourgeois (American, born France. 1911–2010) \n", "115481 Marriage, plate 5 of 24, from the series, Self... Louise Bourgeois (American, born France. 1911–2010) \n", "115480 La Nausée, plate 22 of 24, from the series, Se... Louise Bourgeois (American, born France. 1911–2010) \n", "115479 Nature Study Louise Bourgeois (American, born France. 1911–2010) \n", "115478 Nature Study Louise Bourgeois (American, born France. 1911–2010) \n", "107882 Untitled, plate 9 of 15, from the series, Natu... Louise Bourgeois (American, born France. 1911–2010) \n", "115477 Good Mother, plate 17 of 24, from the series, ... Louise Bourgeois (American, born France. 1911–2010) \n", "107877 Untitled, plate 10 of 15, from the series, Nat... Louise Bourgeois (American, born France. 
1911–2010) \n", "107880 Untitled, plate 6 of 15, from the series, Natu... Louise Bourgeois (American, born France. 1911–2010) \n", "107879 Untitled Louise Bourgeois (American, born France. 1911–2010) \n", "107878 Untitled, plate 5 of 15, from the series, Natu... Louise Bourgeois (American, born France. 1911–2010) \n", "107865 Les Fleurs Louise Bourgeois (American, born France. 1911–2010) \n", "109399 To Whom It May Concern Louise Bourgeois (American, born France. 1911–2010) \n", "107866 Les Fleurs Louise Bourgeois (American, born France. 1911–2010) \n", "111210 Deuxième promenade (plate, page 12) from Les r... Hans Erni (Swiss, 1909–2015) \n", "111212 Quatrième promenade (plate, page 24) from Les ... Hans Erni (Swiss, 1909–2015) \n", "111215 Septième promenade (plate, page 42) from Les r... Hans Erni (Swiss, 1909–2015) \n", "107864 Les Fleurs Louise Bourgeois (American, born France. 1911–2010) \n", "111208 Frontispiece from Les rêveries du promeneur so... Hans Erni (Swiss, 1909–2015) \n", "111812 I Do Louise Bourgeois (American, born France. 1911–2010) \n", "110778 Les rêveries du promeneur solitaire (Extraits) Hans Erni (Swiss, 1909–2015) \n", "107651 Untitled Louise Bourgeois (American, born France. 1911–2010) \n", "111211 Troisième promenade (plate, page 18) from Les ... Hans Erni (Swiss, 1909–2015) \n", "111213 Cinquième promenade (plate, page 30) from Les ... Hans Erni (Swiss, 1909–2015) \n", "107867 Les Fleurs Louise Bourgeois (American, born France. 1911–2010) \n", "111218 Dixièmem promenade (plate, page 60) from Les r... Hans Erni (Swiss, 1909–2015) \n", "113819 Untitled, no. 12 of 12, from the illustrated b... Louise Bourgeois (American, born France. 1911–2010) \n", "113818 Untitled, no. 11 of 12, from the illustrated b... Louise Bourgeois (American, born France. 1911–2010) \n", "111214 Sixième promenade (plate, page 36) from Les rê... Hans Erni (Swiss, 1909–2015) \n", "111209 Première promenade (plate, page 6) from Les rê... 
Hans Erni (Swiss, 1909–2015) \n", "113815 Untitled, no. 8 of 12, from the illustrated bo... Louise Bourgeois (American, born France. 1911–2010) \n", "113814 Untitled, no. 7 of 12, from the illustrated bo... Louise Bourgeois (American, born France. 1911–2010) \n", "113813 Untitled, no. 6 of 12, from the illustrated bo... Louise Bourgeois (American, born France. 1911–2010) \n", "113817 Untitled, no. 10 of 12, from the illustrated b... Louise Bourgeois (American, born France. 1911–2010) \n", "113816 Untitled, no. 9 of 12, from the illustrated bo... Louise Bourgeois (American, born France. 1911–2010) \n", "113811 Untitled, no. 4 of 12, from the illustrated bo... Louise Bourgeois (American, born France. 1911–2010) \n", "113810 Untitled, no. 3 of 12, from the illustrated bo... Louise Bourgeois (American, born France. 1911–2010) \n", "113809 Untitled, no. 2 of 12, from the illustrated bo... Louise Bourgeois (American, born France. 1911–2010) \n", "113808 Untitled, no. 1 of 12, from the illustrated bo... Louise Bourgeois (American, born France. 1911–2010) \n", "111216 Huitième promenade (plate, page 48) from Les r... Hans Erni (Swiss, 1909–2015) \n", "111217 Neuvième promenade (plate, page 54) from Les r... Hans Erni (Swiss, 1909–2015) \n", "113812 Untitled, no. 5 of 12, from the illustrated bo... Louise Bourgeois (American, born France. 
1911–2010) \n", "133436 Untitled from Verde y Negro Carmen Herrera (Cuban, born 1915) \n", "133435 Untitled from Verde y Negro Carmen Herrera (Cuban, born 1915) \n", "133434 Untitled from Verde y Negro Carmen Herrera (Cuban, born 1915) \n", "133433 Untitled from Verde y Amarillo Carmen Herrera (Cuban, born 1915) \n", "115010 The Strange Case of Angelica Manoel de Oliveira (Portuguese, 1908–2015) \n", "133431 Untitled from Verde y Amarillo Carmen Herrera (Cuban, born 1915) \n", "133205 Equilibrio Carmen Herrera (Cuban, born 1915) \n", "133203 Verde y Amarillo Carmen Herrera (Cuban, born 1915) \n", "133432 Untitled from Verde y Amarillo Carmen Herrera (Cuban, born 1915) \n", "133204 Verde y Negro Carmen Herrera (Cuban, born 1915) \n", "\n", " nationality begin_date end_date gender date classification department begin_date_clean \\\n", "109413 (American) (1911) (2010) (Female) 2009 Print Drawings & Prints 1911 \n", "109406 (American) (1911) (2010) (Female) 2009 Print Drawings & Prints 1911 \n", "109436 (American) (1911) (2010) (Female) 2009 Print Drawings & Prints 1911 \n", "107875 (American) (1911) (2010) (Female) 2009 Print Drawings & Prints 1911 \n", "109438 (American) (1911) (2010) (Female) 2009 Print Drawings & Prints 1911 \n", "107885 (American) (1911) (2010) (Female) 2009 Print Drawings & Prints 1911 \n", "107886 (American) (1911) (2010) (Female) 2009 Print Drawings & Prints 1911 \n", "107887 (American) (1911) (2010) (Female) 2009 Print Drawings & Prints 1911 \n", "107888 (American) (1911) (2010) (Female) 2009 Print Drawings & Prints 1911 \n", "100324 (American) (1911) (2010) (Female) 2009 Print Drawings & Prints 1911 \n", "109437 (American) (1911) (2010) (Female) 2009 Print Drawings & Prints 1911 \n", "111747 (American) (1911) (2010) (Female) 2009 Print Drawings & Prints 1911 \n", "111748 (American) (1911) (2010) (Female) 2009 Print Drawings & Prints 1911 \n", "111749 (American) (1911) (2010) (Female) 2009 Print Drawings & Prints 1911 \n", "107904 (American) 
(1911) (2010) (Female) 2009 Print Drawings & Prints 1911 \n", "111750 (American) (1911) (2010) (Female) 2009 Print Drawings & Prints 1911 \n", "111751 (American) (1911) (2010) (Female) 2009 Print Drawings & Prints 1911 \n", "111752 (American) (1911) (2010) (Female) 2009 Print Drawings & Prints 1911 \n", "111753 (American) (1911) (2010) (Female) 2009 Print Drawings & Prints 1911 \n", "111754 (American) (1911) (2010) (Female) 2009 Print Drawings & Prints 1911 \n", "111755 (American) (1911) (2010) (Female) 2009 Print Drawings & Prints 1911 \n", "111756 (American) (1911) (2010) (Female) 2009 Print Drawings & Prints 1911 \n", "111757 (American) (1911) (2010) (Female) 2009 Print Drawings & Prints 1911 \n", "103908 (American) (1911) (2010) (Female) 2009 Print Drawings & Prints 1911 \n", "108650 (American) (1911) (2010) (Female) 2009 Print Drawings & Prints 1911 \n", "111816 (American) (1911) (2010) (Female) 2009 Textile Drawings & Prints 1911 \n", "107884 (American) (1911) (2010) (Female) 2009 Print Drawings & Prints 1911 \n", "107883 (American) (1911) (2010) (Female) 2009 Print Drawings & Prints 1911 \n", "109425 (American) (1911) (2010) (Female) 2009 Print Drawings & Prints 1911 \n", "107881 (American) (1911) (2010) (Female) 2009 Print Drawings & Prints 1911 \n", "110174 (American) (1911) (2010) (Female) 2009 Print Drawings & Prints 1911 \n", "110189 (American) (1911) (2010) (Female) 2009 Print Drawings & Prints 1911 \n", "107876 (American) (1911) (2010) (Female) 2009 Print Drawings & Prints 1911 \n", "115482 (American) (1911) (2010) (Female) 2009 Print Drawings & Prints 1911 \n", "115481 (American) (1911) (2010) (Female) 2009 Print Drawings & Prints 1911 \n", "115480 (American) (1911) (2010) (Female) 2009 Print Drawings & Prints 1911 \n", "115479 (American) (1911) (2010) (Female) 2009 Print Drawings & Prints 1911 \n", "115478 (American) (1911) (2010) (Female) 2009 Print Drawings & Prints 1911 \n", "107882 (American) (1911) (2010) (Female) 2009 Print Drawings & Prints 
1911 \n", "115477 (American) (1911) (2010) (Female) 2009 Print Drawings & Prints 1911 \n", "107877 (American) (1911) (2010) (Female) 2009 Print Drawings & Prints 1911 \n", "107880 (American) (1911) (2010) (Female) 2009 Print Drawings & Prints 1911 \n", "107879 (American) (1911) (2010) (Female) 2009 Print Drawings & Prints 1911 \n", "107878 (American) (1911) (2010) (Female) 2009 Print Drawings & Prints 1911 \n", "107865 (American) (1911) (2010) (Female) 2010 Print Drawings & Prints 1911 \n", "109399 (American) (1911) (2010) (Female) 2009-2010 Illustrated Book Drawings & Prints 1911 \n", "107866 (American) (1911) (2010) (Female) 2010 Print Drawings & Prints 1911 \n", "111210 (Swiss) (1909) (2015) (Male) 2008 Illustrated Book Drawings & Prints 1909 \n", "111212 (Swiss) (1909) (2015) (Male) 2008 Illustrated Book Drawings & Prints 1909 \n", "111215 (Swiss) (1909) (2015) (Male) 2008 Illustrated Book Drawings & Prints 1909 \n", "107864 (American) (1911) (2010) (Female) 2010 Print Drawings & Prints 1911 \n", "111208 (Swiss) (1909) (2015) (Male) 2008 Illustrated Book Drawings & Prints 1909 \n", "111812 (American) (1911) (2010) (Female) 2010 Print Drawings & Prints 1911 \n", "110778 (Swiss) (1909) (2015) (Male) 2008 Illustrated Book Drawings & Prints 1909 \n", "107651 (American) (1911) (2010) (Female) 2010 Print Drawings & Prints 1911 \n", "111211 (Swiss) (1909) (2015) (Male) 2008 Illustrated Book Drawings & Prints 1909 \n", "111213 (Swiss) (1909) (2015) (Male) 2008 Illustrated Book Drawings & Prints 1909 \n", "107867 (American) (1911) (2010) (Female) 2010 Print Drawings & Prints 1911 \n", "111218 (Swiss) (1909) (2015) (Male) 2008 Illustrated Book Drawings & Prints 1909 \n", "113819 (American) (1911) (2010) (Female) 2010 Illustrated Book Drawings & Prints 1911 \n", "113818 (American) (1911) (2010) (Female) 2010 Illustrated Book Drawings & Prints 1911 \n", "111214 (Swiss) (1909) (2015) (Male) 2008 Illustrated Book Drawings & Prints 1909 \n", "111209 (Swiss) (1909) (2015) 
(Male) 2008 Illustrated Book Drawings & Prints 1909 \n", "113815 (American) (1911) (2010) (Female) 2010 Illustrated Book Drawings & Prints 1911 \n", "113814 (American) (1911) (2010) (Female) 2010 Illustrated Book Drawings & Prints 1911 \n", "113813 (American) (1911) (2010) (Female) 2010 Illustrated Book Drawings & Prints 1911 \n", "113817 (American) (1911) (2010) (Female) 2010 Illustrated Book Drawings & Prints 1911 \n", "113816 (American) (1911) (2010) (Female) 2010 Illustrated Book Drawings & Prints 1911 \n", "113811 (American) (1911) (2010) (Female) 2010 Illustrated Book Drawings & Prints 1911 \n", "113810 (American) (1911) (2010) (Female) 2010 Illustrated Book Drawings & Prints 1911 \n", "113809 (American) (1911) (2010) (Female) 2010 Illustrated Book Drawings & Prints 1911 \n", "113808 (American) (1911) (2010) (Female) 2010 Illustrated Book Drawings & Prints 1911 \n", "111216 (Swiss) (1909) (2015) (Male) 2008 Illustrated Book Drawings & Prints 1909 \n", "111217 (Swiss) (1909) (2015) (Male) 2008 Illustrated Book Drawings & Prints 1909 \n", "113812 (American) (1911) (2010) (Female) 2010 Illustrated Book Drawings & Prints 1911 \n", "133436 (Cuban) (1915) (0) (Female) 2017 Print Drawings & Prints 1915 \n", "133435 (Cuban) (1915) (0) (Female) 2017 Print Drawings & Prints 1915 \n", "133434 (Cuban) (1915) (0) (Female) 2017 Print Drawings & Prints 1915 \n", "133433 (Cuban) (1915) (0) (Female) 2017 Print Drawings & Prints 1915 \n", "115010 (Portuguese) (1908) (2015) () 2010 Film Film 1908 \n", "133431 (Cuban) (1915) (0) (Female) 2017 Print Drawings & Prints 1915 \n", "133205 (Cuban) (1915) (0) (Female) 2017 Print Drawings & Prints 1915 \n", "133203 (Cuban) (1915) (0) (Female) 2017 Print Drawings & Prints 1915 \n", "133432 (Cuban) (1915) (0) (Female) 2017 Print Drawings & Prints 1915 \n", "133204 (Cuban) (1915) (0) (Female) 2017 Print Drawings & Prints 1915 \n", "\n", " end_date_clean gender_clean nationality_clean year_1 year_2 year_2_2 year_2_4 date_clean age \n", 
"109413 2010 female american 2009 2009 NaN NaN 2009 98 \n", "109406 2010 female american 2009 2009 NaN NaN 2009 98 \n", "109436 2010 female american 2009 2009 NaN NaN 2009 98 \n", "107875 2010 female american 2009 2009 NaN NaN 2009 98 \n", "109438 2010 female american 2009 2009 NaN NaN 2009 98 \n", "107885 2010 female american 2009 2009 NaN NaN 2009 98 \n", "107886 2010 female american 2009 2009 NaN NaN 2009 98 \n", "107887 2010 female american 2009 2009 NaN NaN 2009 98 \n", "107888 2010 female american 2009 2009 NaN NaN 2009 98 \n", "100324 2010 female american 2009 2009 NaN NaN 2009 98 \n", "109437 2010 female american 2009 2009 NaN NaN 2009 98 \n", "111747 2010 female american 2009 2009 NaN NaN 2009 98 \n", "111748 2010 female american 2009 2009 NaN NaN 2009 98 \n", "111749 2010 female american 2009 2009 NaN NaN 2009 98 \n", "107904 2010 female american 2009 2009 NaN NaN 2009 98 \n", "111750 2010 female american 2009 2009 NaN NaN 2009 98 \n", "111751 2010 female american 2009 2009 NaN NaN 2009 98 \n", "111752 2010 female american 2009 2009 NaN NaN 2009 98 \n", "111753 2010 female american 2009 2009 NaN NaN 2009 98 \n", "111754 2010 female american 2009 2009 NaN NaN 2009 98 \n", "111755 2010 female american 2009 2009 NaN NaN 2009 98 \n", "111756 2010 female american 2009 2009 NaN NaN 2009 98 \n", "111757 2010 female american 2009 2009 NaN NaN 2009 98 \n", "103908 2010 female american 2009 2009 NaN NaN 2009 98 \n", "108650 2010 female american 2009 2009 NaN NaN 2009 98 \n", "111816 2010 female american 2009 2009 NaN NaN 2009 98 \n", "107884 2010 female american 2009 2009 NaN NaN 2009 98 \n", "107883 2010 female american 2009 2009 NaN NaN 2009 98 \n", "109425 2010 female american 2009 2009 NaN NaN 2009 98 \n", "107881 2010 female american 2009 2009 NaN NaN 2009 98 \n", "110174 2010 female american 2009 2009 NaN NaN 2009 98 \n", "110189 2010 female american 2009 2009 NaN NaN 2009 98 \n", "107876 2010 female american 2009 2009 NaN NaN 2009 98 \n", "115482 2010 female 
american 2009 2009 NaN NaN 2009 98 \n", "115481 2010 female american 2009 2009 NaN NaN 2009 98 \n", "115480 2010 female american 2009 2009 NaN NaN 2009 98 \n", "115479 2010 female american 2009 2009 NaN NaN 2009 98 \n", "115478 2010 female american 2009 2009 NaN NaN 2009 98 \n", "107882 2010 female american 2009 2009 NaN NaN 2009 98 \n", "115477 2010 female american 2009 2009 NaN NaN 2009 98 \n", "107877 2010 female american 2009 2009 NaN NaN 2009 98 \n", "107880 2010 female american 2009 2009 NaN NaN 2009 98 \n", "107879 2010 female american 2009 2009 NaN NaN 2009 98 \n", "107878 2010 female american 2009 2009 NaN NaN 2009 98 \n", "107865 2010 female american 2010 2010 NaN NaN 2010 99 \n", "109399 2010 female american 2009 2010 NaN NaN 2010 99 \n", "107866 2010 female american 2010 2010 NaN NaN 2010 99 \n", "111210 2015 male swiss 2008 2008 NaN NaN 2008 99 \n", "111212 2015 male swiss 2008 2008 NaN NaN 2008 99 \n", "111215 2015 male swiss 2008 2008 NaN NaN 2008 99 \n", "107864 2010 female american 2010 2010 NaN NaN 2010 99 \n", "111208 2015 male swiss 2008 2008 NaN NaN 2008 99 \n", "111812 2010 female american 2010 2010 NaN NaN 2010 99 \n", "110778 2015 male swiss 2008 2008 NaN NaN 2008 99 \n", "107651 2010 female american 2010 2010 NaN NaN 2010 99 \n", "111211 2015 male swiss 2008 2008 NaN NaN 2008 99 \n", "111213 2015 male swiss 2008 2008 NaN NaN 2008 99 \n", "107867 2010 female american 2010 2010 NaN NaN 2010 99 \n", "111218 2015 male swiss 2008 2008 NaN NaN 2008 99 \n", "113819 2010 female american 2010 2010 NaN NaN 2010 99 \n", "113818 2010 female american 2010 2010 NaN NaN 2010 99 \n", "111214 2015 male swiss 2008 2008 NaN NaN 2008 99 \n", "111209 2015 male swiss 2008 2008 NaN NaN 2008 99 \n", "113815 2010 female american 2010 2010 NaN NaN 2010 99 \n", "113814 2010 female american 2010 2010 NaN NaN 2010 99 \n", "113813 2010 female american 2010 2010 NaN NaN 2010 99 \n", "113817 2010 female american 2010 2010 NaN NaN 2010 99 \n", "113816 2010 female american 
2010 2010 NaN NaN 2010 99 \n", "113811 2010 female american 2010 2010 NaN NaN 2010 99 \n", "113810 2010 female american 2010 2010 NaN NaN 2010 99 \n", "113809 2010 female american 2010 2010 NaN NaN 2010 99 \n", "113808 2010 female american 2010 2010 NaN NaN 2010 99 \n", "111216 2015 male swiss 2008 2008 NaN NaN 2008 99 \n", "111217 2015 male swiss 2008 2008 NaN NaN 2008 99 \n", "113812 2010 female american 2010 2010 NaN NaN 2010 99 \n", "133436 0 female cuban 2017 2017 NaN NaN 2017 102 \n", "133435 0 female cuban 2017 2017 NaN NaN 2017 102 \n", "133434 0 female cuban 2017 2017 NaN NaN 2017 102 \n", "133433 0 female cuban 2017 2017 NaN NaN 2017 102 \n", "115010 2015 NaN portuguese 2010 2010 NaN NaN 2010 102 \n", "133431 0 female cuban 2017 2017 NaN NaN 2017 102 \n", "133205 0 female cuban 2017 2017 NaN NaN 2017 102 \n", "133203 0 female cuban 2017 2017 NaN NaN 2017 102 \n", "133432 0 female cuban 2017 2017 NaN NaN 2017 102 \n", "133204 0 female cuban 2017 2017 NaN NaN 2017 102 " ] }, "execution_count": 58, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.set_option('display.max_rows', 85) # increase the number of rows to display\n", "\n", "# Inspect values\n", "print(moma.loc[moma['age'] > 90, 'artist'].value_counts())\n", "moma[moma['age'] > 90].sort_values('age').tail(85)" ] }, { "cell_type": "code", "execution_count": 59, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "General Idea 35\n", "Grapus 3\n", "Hi Red Center 2\n", "Gorgona artists group 1\n", "B. Tillmann 1\n", "Grey Organisation 1\n", "Robin Schwartz 1\n", "Richard Pare 1\n", "J.P. Sniadecki 1\n", "Banana Equipment 1\n", "Raúl Anguiano 1\n", "Atelier Martine, Paris, France 1\n", "Hermine David 1\n", "Sam Lucente 1\n", "Name: artist, dtype: int64\n" ] }, { "data": { "text/html": [ "
\n", "
" ], "text/plain": [ " title artist \\\n", "2124 Printed Fabric Atelier Martine, Paris, France \n", "136750 Orgasm Energy Chart General Idea \n", "101220 News Flash! What Is the Communication Satellit... Hi Red Center \n", "131800 Bundle of Events from Fluxus 1 Hi Red Center \n", "63914 Mid-summer Meadow Robin Schwartz \n", "133087 Demolition J.P. Sniadecki \n", "136751 Looking Ahead General Idea \n", "95212 The 1970 Miss General Idea Pageant General Idea \n", "95213 Artist's Conception: Miss General Idea 1971 General Idea \n", "136755 CIGARETTE BURN executed by Andy Warhol on Gran... General Idea \n", "136756 Light On (Double Mirror) General Idea \n", "136757 Light On (Drive-in Theatre) General Idea \n", "136758 Untitled Drawing (Bondage) General Idea \n", "136752 Gold-diggers of '84 General Idea \n", "6009 St. Prex B. Tillmann \n", "56719 (Untitled) Richard Pare \n", "136759 Luxon V.B.: The 1984 Miss General Idea Pavilli... General Idea \n", "95214 Manipulating the Self General Idea \n", "61723 The Band Stand, Menton (Menton, le kiosque à m... Hermine David \n", "136760 Art Metropole Letterhead General Idea \n", "137886 2. The 1984 Miss General Idea Pageant: Voice Over General Idea \n", "137882 1. The Search for the Spirit of Miss General I... General Idea \n", "137892 4. The 1984 Miss General Idea Pavillion: The 1... General Idea \n", "137884 2. The 1984 Miss General Idea Pageant: The 198... General Idea \n", "136753 Censor Sunglasses General Idea \n", "137885 2. The 1984 Miss General Idea Pageant: Zoom Ou... General Idea \n", "137891 3. Miss General Idea 1984: Miss General Idea 1984 General Idea \n", "137893 4. The 1984 Miss General Idea Pavillion: Voice... General Idea \n", "137883 1. The Search for the Spirit of Miss General I... General Idea \n", "121499 Album cover for De La Soul, 3 Feet High and Ri... 
Grey Organisation \n", "95218 Luxon Louvre (Ambiguity Without Contradiction) General Idea \n", "3351 Solid Logic Technology Module Sam Lucente \n", "95216 The Dr. Brute Colonnade (Dominant Imagery) General Idea \n", "95217 The Dr. Brute Colonnade and Drop Ceiling Detail General Idea \n", "95219 Miss General Idea Glove Pattern (Form Follows ... General Idea \n", "95220 Pavillion Construction Hoarding General Idea \n", "95221 Proposed Seating Arrangement (Form Follows Fic... General Idea \n", "137895 5. Frame of Reference: Frame of Reference General Idea \n", "109072 Gorgona no. 8 Gorgona artists group \n", "137894 4. The 1984 Miss General Idea Pavillion: Voice... General Idea \n", "6124 ON Y VA Grapus \n", "95222 General Idea: Search for the Spirit General Idea \n", "95223 S/HE: The 1984 Miss General Idea Pageant No. 102 General Idea \n", "133545 4-021 The 1984 Miss General Idea Pavillion: No... General Idea \n", "6642 \"VIVRE LE 14 JUILLET/ 1978\" Grapus \n", "137887 2. The 1984 Miss General Idea Pageant: The Tyr... 
General Idea \n", "6638 Allez-Y De Ma Part Grapus \n", "131283 Gore-Tex Parka Banana Equipment \n", "95224 General Idea: Ménage à Trois General Idea \n", "95225 Glamour di General Idea General Idea \n", "67412 Peasant of Puebla (Campesino de Puebla) Raúl Anguiano \n", "\n", " artist_bio nationality begin_date end_date gender date \\\n", "2124 (1911–1930) () (1911) (1930) () 1912-1913 \n", "136750 (Canadian, 1969–1994) (Canadian) (1969) (1994) (Male) 1970 \n", "101220 (Japanese, 1963–1964) () (1963) (1964) () 1964 \n", "131800 (Japanese, 1963–1964) () (1963) (1964) () 1964, assembled 1976 \n", "63914 (American, born 1957) (American) (1957) (0) (Female) 1958 \n", "133087 (American, born 1979) (American) (1979) (0) (Male) 1980 \n", "136751 (Canadian, 1969–1994) (Canadian) (1969) (1994) (Male) 1971 \n", "95212 (Canadian, 1969–1994) (Canadian) (1969) (1994) (Male) 1971 \n", "95213 (Canadian, 1969–1994) (Canadian) (1969) (1994) (Male) 1971 \n", "136755 (Canadian, 1969–1994) (Canadian) (1969) (1994) (Male) 1971 \n", "136756 (Canadian, 1969–1994) (Canadian) (1969) (1994) (Male) 1971 \n", "136757 (Canadian, 1969–1994) (Canadian) (1969) (1994) (Male) 1971 \n", "136758 (Canadian, 1969–1994) (Canadian) (1969) (1994) (Male) 1971 \n", "136752 (Canadian, 1969–1994) (Canadian) (1969) (1994) (Male) 1972 \n", "6009 (Swiss, born 1947) (Swiss) (1947) (0) () 1950 \n", "56719 (British, born 1948) (British) (1948) (0) (Male) 1952 \n", "136759 (Canadian, 1969–1994) (Canadian) (1969) (1994) (Male) 1973 \n", "95214 (Canadian, 1969–1994) (Canadian) (1969) (1994) (Male) 1973 \n", "61723 (French, 1886–1971) (French) (1886) (1971) (Female) 1890 \n", "136760 (Canadian, 1969–1994) (Canadian) (1969) (1994) (Male) 1974 \n", "137886 (Canadian, 1969–1994) (Canadian) (1969) (1994) (Male) 1975 \n", "137882 (Canadian, 1969–1994) (Canadian) (1969) (1994) (Male) 1975 \n", "137892 (Canadian, 1969–1994) (Canadian) (1969) (1994) (Male) 1975 \n", "137884 (Canadian, 1969–1994) (Canadian) (1969) (1994) (Male) 
1975 \n", "136753 (Canadian, 1969–1994) (Canadian) (1969) (1994) (Male) 1975 \n", "137885 (Canadian, 1969–1994) (Canadian) (1969) (1994) (Male) 1975 \n", "137891 (Canadian, 1969–1994) (Canadian) (1969) (1994) (Male) 1975 \n", "137893 (Canadian, 1969–1994) (Canadian) (1969) (1994) (Male) 1975 \n", "137883 (Canadian, 1969–1994) (Canadian) (1969) (1994) (Male) 1975 \n", "121499 (British, 1983–1991) (British) (1983) (1991) () 1989 \n", "95218 (Canadian, 1969–1994) (Canadian) (1969) (1994) (Male) 1975 \n", "3351 (American, born 1958) (American) (1958) (0) (Male) 1964 \n", "95216 (Canadian, 1969–1994) (Canadian) (1969) (1994) (Male) 1975 \n", "95217 (Canadian, 1969–1994) (Canadian) (1969) (1994) (Male) 1975 \n", "95219 (Canadian, 1969–1994) (Canadian) (1969) (1994) (Male) 1975 \n", "95220 (Canadian, 1969–1994) (Canadian) (1969) (1994) (Male) 1975 \n", "95221 (Canadian, 1969–1994) (Canadian) (1969) (1994) (Male) 1975 \n", "137895 (Canadian, 1969–1994) (Canadian) (1969) (1994) (Male) 1975 \n", "109072 (Zagreb, Croatian 1961–1966) (Croatian) (1959) (1966) () 1965 \n", "137894 (Canadian, 1969–1994) (Canadian) (1969) (1994) (Male) 1975 \n", "6124 (French, 1970–1991) (French) (1970) (1991) () 1977 \n", "95222 (Canadian, 1969–1994) (Canadian) (1969) (1994) (Male) 1976 \n", "95223 (Canadian, 1969–1994) (Canadian) (1969) (1994) (Male) 1977 \n", "133545 (Canadian, 1969–1994) (Canadian) (1969) (1994) (Male) 1977 \n", "6642 (French, 1970–1991) (French) (1970) (1991) () 1978 \n", "137887 (Canadian, 1969–1994) (Canadian) (1969) (1994) (Male) 1977 \n", "6638 (French, 1970–1991) (French) (1970) (1991) () 1978 \n", "131283 (American, 1972–1980) (American) (1972) (1980) () 1980 \n", "95224 (Canadian, 1969–1994) (Canadian) (1969) (1994) (Male) 1978 \n", "95225 (Canadian, 1969–1994) (Canadian) (1969) (1994) (Male) 1978 \n", "67412 (Mexican, 1915–2006) (Mexican) (1915) (2006) (Male) 1924 \n", "\n", " classification department begin_date_clean end_date_clean gender_clean nationality_clean 
\\\n", "2124 Design Architecture & Design 1911 1930 NaN NaN \n", "136750 Print Drawings & Prints 1969 1994 male canadian \n", "101220 Print Fluxus Collection 1963 1964 NaN NaN \n", "131800 Illustrated Book Drawings & Prints 1963 1964 NaN NaN \n", "63914 Print Drawings & Prints 1957 0 female american \n", "133087 Video Film 1979 0 male american \n", "136751 Print Drawings & Prints 1969 1994 male canadian \n", "95212 Illustrated Book Drawings & Prints 1969 1994 male canadian \n", "95213 Print Drawings & Prints 1969 1994 male canadian \n", "136755 Print Drawings & Prints 1969 1994 male canadian \n", "136756 Print Drawings & Prints 1969 1994 male canadian \n", "136757 Print Drawings & Prints 1969 1994 male canadian \n", "136758 Print Drawings & Prints 1969 1994 male canadian \n", "136752 Print Drawings & Prints 1969 1994 male canadian \n", "6009 Design Architecture & Design 1947 0 NaN swiss \n", "56719 Print Drawings & Prints 1948 0 male british \n", "136759 Print Drawings & Prints 1969 1994 male canadian \n", "95214 Print Drawings & Prints 1969 1994 male canadian \n", "61723 Print Drawings & Prints 1886 1971 female french \n", "136760 Print Drawings & Prints 1969 1994 male canadian \n", "137886 Print Drawings & Prints 1969 1994 male canadian \n", "137882 Print Drawings & Prints 1969 1994 male canadian \n", "137892 Print Drawings & Prints 1969 1994 male canadian \n", "137884 Print Drawings & Prints 1969 1994 male canadian \n", "136753 Print Drawings & Prints 1969 1994 male canadian \n", "137885 Print Drawings & Prints 1969 1994 male canadian \n", "137891 Print Drawings & Prints 1969 1994 male canadian \n", "137893 Print Drawings & Prints 1969 1994 male canadian \n", "137883 Print Drawings & Prints 1969 1994 male canadian \n", "121499 Design Architecture & Design 1983 1991 NaN british \n", "95218 Print Drawings & Prints 1969 1994 male canadian \n", "3351 Design Architecture & Design 1958 0 male american \n", "95216 Print Drawings & Prints 1969 1994 male canadian \n", 
"95217 Print Drawings & Prints 1969 1994 male canadian \n", "95219 Print Drawings & Prints 1969 1994 male canadian \n", "95220 Print Drawings & Prints 1969 1994 male canadian \n", "95221 Print Drawings & Prints 1969 1994 male canadian \n", "137895 Print Drawings & Prints 1969 1994 male canadian \n", "109072 Periodical Drawings & Prints 1959 1966 NaN croatian \n", "137894 Print Drawings & Prints 1969 1994 male canadian \n", "6124 Design Architecture & Design 1970 1991 NaN french \n", "95222 Illustrated Book Drawings & Prints 1969 1994 male canadian \n", "95223 Illustrated Book Drawings & Prints 1969 1994 male canadian \n", "133545 Drawing Drawings & Prints 1969 1994 male canadian \n", "6642 Design Architecture & Design 1970 1991 NaN french \n", "137887 Print Drawings & Prints 1969 1994 male canadian \n", "6638 Design Architecture & Design 1970 1991 NaN french \n", "131283 Design Architecture & Design 1972 1980 NaN american \n", "95224 Illustrated Book Drawings & Prints 1969 1994 male canadian \n", "95225 Print Drawings & Prints 1969 1994 male canadian \n", "67412 Print Drawings & Prints 1915 2006 male mexican \n", "\n", " year_1 year_2 year_2_2 year_2_4 date_clean age \n", "2124 1912 1913 NaN NaN 1912 1 \n", "136750 1970 1970 NaN NaN 1970 1 \n", "101220 1964 1964 NaN NaN 1964 1 \n", "131800 1964 1964 NaN NaN 1964 1 \n", "63914 1958 1958 NaN NaN 1958 1 \n", "133087 1980 1980 NaN NaN 1980 1 \n", "136751 1971 1971 NaN NaN 1971 2 \n", "95212 1971 1971 NaN NaN 1971 2 \n", "95213 1971 1971 NaN NaN 1971 2 \n", "136755 1971 1971 NaN NaN 1971 2 \n", "136756 1971 1971 NaN NaN 1971 2 \n", "136757 1971 1971 NaN NaN 1971 2 \n", "136758 1971 1971 NaN NaN 1971 2 \n", "136752 1972 1972 NaN NaN 1972 3 \n", "6009 1950 1950 NaN NaN 1950 3 \n", "56719 1952 1952 NaN NaN 1952 4 \n", "136759 1973 1973 NaN NaN 1973 4 \n", "95214 1973 1973 NaN NaN 1973 4 \n", "61723 1890 1890 NaN NaN 1890 4 \n", "136760 1974 1974 NaN NaN 1974 5 \n", "137886 1975 1975 NaN NaN 1975 6 \n", "137882 1975 1975 
NaN NaN 1975 6 \n", "137892 1975 1975 NaN NaN 1975 6 \n", "137884 1975 1975 NaN NaN 1975 6 \n", "136753 1975 1975 NaN NaN 1975 6 \n", "137885 1975 1975 NaN NaN 1975 6 \n", "137891 1975 1975 NaN NaN 1975 6 \n", "137893 1975 1975 NaN NaN 1975 6 \n", "137883 1975 1975 NaN NaN 1975 6 \n", "121499 1989 1989 NaN NaN 1989 6 \n", "95218 1975 1975 NaN NaN 1975 6 \n", "3351 1964 1964 NaN NaN 1964 6 \n", "95216 1975 1975 NaN NaN 1975 6 \n", "95217 1975 1975 NaN NaN 1975 6 \n", "95219 1975 1975 NaN NaN 1975 6 \n", "95220 1975 1975 NaN NaN 1975 6 \n", "95221 1975 1975 NaN NaN 1975 6 \n", "137895 1975 1975 NaN NaN 1975 6 \n", "109072 1965 1965 NaN NaN 1965 6 \n", "137894 1975 1975 NaN NaN 1975 6 \n", "6124 1977 1977 NaN NaN 1977 7 \n", "95222 1976 1976 NaN NaN 1976 7 \n", "95223 1977 1977 NaN NaN 1977 8 \n", "133545 1977 1977 NaN NaN 1977 8 \n", "6642 1978 1978 NaN NaN 1978 8 \n", "137887 1977 1977 NaN NaN 1977 8 \n", "6638 1978 1978 NaN NaN 1978 8 \n", "131283 1980 1980 NaN NaN 1980 8 \n", "95224 1978 1978 NaN NaN 1978 9 \n", "95225 1978 1978 NaN NaN 1978 9 \n", "67412 1924 1924 NaN NaN 1924 9 " ] }, "execution_count": 59, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Inspect values\n", "print(moma.loc[moma['age'] < 10, 'artist'].value_counts())\n", "moma[moma['age'] < 10].sort_values('age')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `age` values over 90 look plausible.\n", "\n", "Many of the rows with an `age` below 10, however, belong to teams and organizations rather than individual artists.\n", "\n", "As we mentioned above, there is no clear way to check whether an artist is an individual or a team. \n", "However, while exploring the outliers we found teams such as `Hi Red Center`, `General Idea`, `Gorgona artists group`, and `Grey Organisation`. Let's remove them from the dataset." 
] }, { "cell_type": "code", "execution_count": 60, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "org count: 114\n", "org percentage: 0.1\n", "total: 114673\n", "\n", "moma org values:\n" ] }, { "data": { "text/html": [ "
\n", "
\n", "

114 rows × 20 columns

\n", "
" ], "text/plain": [ " title artist artist_bio \\\n", "2124 Printed Fabric Atelier Martine, Paris, France (1911–1930) \n", "5580 Raymond Loewy dessine la Studebaker Grapus (French, 1970–1991) \n", "6124 ON Y VA Grapus (French, 1970–1991) \n", "6638 Allez-Y De Ma Part Grapus (French, 1970–1991) \n", "6642 \"VIVRE LE 14 JUILLET/ 1978\" Grapus (French, 1970–1991) \n", "... ... ... ... \n", "137891 3. Miss General Idea 1984: Miss General Idea 1984 General Idea (Canadian, 1969–1994) \n", "137892 4. The 1984 Miss General Idea Pavillion: The 1... General Idea (Canadian, 1969–1994) \n", "137893 4. The 1984 Miss General Idea Pavillion: Voice... General Idea (Canadian, 1969–1994) \n", "137894 4. The 1984 Miss General Idea Pavillion: Voice... General Idea (Canadian, 1969–1994) \n", "137895 5. Frame of Reference: Frame of Reference General Idea (Canadian, 1969–1994) \n", "\n", " nationality begin_date end_date gender date classification department begin_date_clean \\\n", "2124 () (1911) (1930) () 1912-1913 Design Architecture & Design 1911 \n", "5580 (French) (1970) (1991) () 1987 Design Architecture & Design 1970 \n", "6124 (French) (1970) (1991) () 1977 Design Architecture & Design 1970 \n", "6638 (French) (1970) (1991) () 1978 Design Architecture & Design 1970 \n", "6642 (French) (1970) (1991) () 1978 Design Architecture & Design 1970 \n", "... ... ... ... ... ... ... ... ... 
\n", "137891 (Canadian) (1969) (1994) (Male) 1975 Print Drawings & Prints 1969 \n", "137892 (Canadian) (1969) (1994) (Male) 1975 Print Drawings & Prints 1969 \n", "137893 (Canadian) (1969) (1994) (Male) 1975 Print Drawings & Prints 1969 \n", "137894 (Canadian) (1969) (1994) (Male) 1975 Print Drawings & Prints 1969 \n", "137895 (Canadian) (1969) (1994) (Male) 1975 Print Drawings & Prints 1969 \n", "\n", " end_date_clean gender_clean nationality_clean year_1 year_2 year_2_2 year_2_4 date_clean age \n", "2124 1930 NaN NaN 1912 1913 NaN NaN 1912 1 \n", "5580 1991 NaN french 1987 1987 NaN NaN 1987 17 \n", "6124 1991 NaN french 1977 1977 NaN NaN 1977 7 \n", "6638 1991 NaN french 1978 1978 NaN NaN 1978 8 \n", "6642 1991 NaN french 1978 1978 NaN NaN 1978 8 \n", "... ... ... ... ... ... ... ... ... ... \n", "137891 1994 male canadian 1975 1975 NaN NaN 1975 6 \n", "137892 1994 male canadian 1975 1975 NaN NaN 1975 6 \n", "137893 1994 male canadian 1975 1975 NaN NaN 1975 6 \n", "137894 1994 male canadian 1975 1975 NaN NaN 1975 6 \n", "137895 1994 male canadian 1975 1975 NaN NaN 1975 6 \n", "\n", "[114 rows x 20 columns]" ] }, "execution_count": 60, "metadata": {}, "output_type": "execute_result" } ], "source": [ "artist_org = ['Hi Red Center', 'General Idea', 'Gorgona artists group', \n", " 'Grey Organisation', 'Grapus', 'Banana Equipment', \n", " 'Atelier Martine, Paris, France'\n", " ]\n", "artist_bool_org = moma['artist'].isin(artist_org)\n", "\n", "# Statistics for the rows attributed to organizations\n", "print('org count:', artist_bool_org.sum())\n", "print('org percentage:', round(artist_bool_org.sum()*100/moma.shape[0], 2))\n", "print('total:', moma.shape[0], end='\\n\\n')\n", "\n", "# Inspect values\n", "print('moma org values:')\n", "moma.loc[artist_bool_org]" ] }, { "cell_type": "code", "execution_count": 61, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Before drop org:\n", "total: 114673\n", "True 114559\n", "False 114\n", "Name: artist, dtype: 
int64\n", "\n", "After drop org:\n", "total: 114559\n", "True 114559\n", "Name: artist, dtype: int64\n" ] } ], "source": [ "print('Before drop org:')\n", "# Total number of rows before\n", "print('total:', moma.shape[0])\n", "# Number of the valid (True) and invalid (False) rows\n", "print((~artist_bool_org).value_counts(dropna=False), end='\\n\\n')\n", "\n", "# Drop\n", "artist_drop_org = moma[artist_bool_org].index # rows to drop\n", "moma.drop(index=artist_drop_org, inplace=True)\n", "\n", "print('After drop org:')\n", "# Total number of rows after\n", "print('total:', moma.shape[0])\n", "# Number of the valid (True) and invalid (False) rows\n", "print((~(moma['artist'].isin(artist_org))).value_counts(dropna=False))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Conclusions\n", "\n", "Because we focused on getting quick, high-level results, some erroneous values probably remain in the cleaned data. \n", "\n", "Now let's look at the distribution of the `age` values." ] }, { "cell_type": "code", "execution_count": 62, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "33 3.412216\n", "37 3.166927\n", "35 3.165181\n", "30 3.134629\n", "32 3.121536\n", "36 3.118044\n", "40 2.978378\n", "39 2.891087\n", "43 2.888468\n", "41 2.881485\n", "38 2.819508\n", "45 2.773243\n", "34 2.741819\n", "31 2.704283\n", "46 2.690317\n", "47 2.625721\n", "49 2.509624\n", "42 2.467724\n", "29 2.417968\n", "28 2.291396\n", "Name: age, dtype: float64" ] }, "execution_count": 62, "metadata": {}, "output_type": "execute_result" } ], "source": [ "moma['age'].value_counts(dropna=False, normalize=True).head(20)*100" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Below is the share of artworks (in percent) in each five-year age group." 
] }, { "cell_type": "code", "execution_count": 63, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(30.0, 35.0] 15.145034\n", "(35.0, 40.0] 14.973944\n", "(40.0, 45.0] 13.231610\n", "(45.0, 50.0] 11.708377\n", "(25.0, 30.0] 11.234386\n", "(50.0, 55.0] 8.020321\n", "(55.0, 60.0] 6.456935\n", "(60.0, 65.0] 4.947669\n", "(20.0, 25.0] 3.635681\n", "(65.0, 70.0] 3.479430\n", "(70.0, 75.0] 1.860177\n", "(75.0, 80.0] 1.476095\n", "(80.0, 85.0] 1.112964\n", "(85.0, 90.0] 1.067572\n", "(90.0, 95.0] 0.801334\n", "(15.0, 20.0] 0.432092\n", "(95.0, 100.0] 0.288934\n", "(10.0, 15.0] 0.104750\n", "(5.0, 10.0] 0.009602\n", "(100.0, 105.0] 0.008729\n", "(-0.001, 5.0] 0.004365\n", "Name: age, dtype: float64" ] }, "execution_count": 63, "metadata": {}, "output_type": "execute_result" } ], "source": [ "bins = list(range(0, 110, 5)) # five-year age groups\n", "moma['age'].value_counts(dropna=False, bins=bins, normalize=True) * 100" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The largest share of the artworks, about 30%, was created in the fourth decade of life (ages 30 to 39). The single most common age is 33.\n", "\n", "### The overall plot\n", "\n", "Let's plot the distribution of the number of artworks by age." 
] }, { "cell_type": "code", "execution_count": 64, "metadata": {}, "outputs": [], "source": [ "# Import libs\n", "import seaborn as sns\n", "import matplotlib.pyplot as plt\n", "%matplotlib inline\n", "\n", "# Turn on svg rendering\n", "%config InlineBackend.figure_format = 'svg'\n", "\n", "# Color palette for the blog\n", "snark_palette = ['#e0675a', # red\n", " '#5ca0af', # green\n", " '#edde7e', # yellow\n", " '#211c47' # dark blue\n", " ]" ] }, { "cell_type": "code", "execution_count": 65, "metadata": {}, "outputs": [ { "data": { "image/svg+xml": [ "\r\n", "\r\n", "\r\n", "\r\n", " \r\n", " \r\n", " \r\n", " \r\n", " 2020-09-05T15:08:39.385823\r\n", " image/svg+xml\r\n", " \r\n", " \r\n", " Matplotlib v3.3.1, https://matplotlib.org/\r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " 
\r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", "\r\n" ], "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Set the figure\n", "sns.set(context='paper', style='ticks', palette=snark_palette,\n", " rc={'xtick.major.size': 4, 'ytick.left':False,\n", " 'axes.spines.left': False, 'axes.spines.bottom': True,\n", " 'axes.spines.right': False, 'axes.spines.top': False\n", " }\n", " )\n", "\n", "# Create the plot\n", "ax_age = sns.distplot(moma['age'], hist=True, rug=False)\n", "ax_age.axvline(x=33, ymin=0, ymax=0.97, marker='x', linestyle=':', color=snark_palette[-1]) # 33 boundary\n", "\n", "# Set some aesthetic params for the plot\n", "ax_age.annotate('33', [35, 0.0325], c=snark_palette[-1]) # set label for the 33 boundary\n", "ax_age.set_title('Amount of Artworks by Age', loc='right', pad=0, c=snark_palette[-1]) # set title of the plot\n", "ax_age.set_xlabel('Age', c=snark_palette[-1]) # set label of x axis\n", "ax_age.get_yaxis().set_visible(False) # hide y axis\n", "ax_age.set_xticks([i for i in range(0, 110, 10)]) # set x ticks labels\n", "ax_age.set_xlim([10, 100]) # set x axis range\n", "ax_age.tick_params(axis='x', colors=snark_palette[-1]) # color x ticks\n", "ax_age.spines['bottom'].set_color(snark_palette[-1]) # color x axis\n", "\n", "# Save and plot\n", "plt.savefig('plot.pic\\plot.age.png', dpi=150)\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It's curious that most of the works were created at the age of 33! \n", "33 is a meaningful age. For example, in Christianity, Jesus was crucified and then resurrected at the age of 33.\n", "\n", "We can assume that there is a certain time lag between the origin of the idea and its implementation, the artist had an idea a little earlier.\n", "\n", "### Plot by gender\n", "\n", "We are interested in plotting the distribution of the number of artworks by age for men and women." 
] }, { "cell_type": "code", "execution_count": 66, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(35.0, 40.0] 16.370506\n", "(30.0, 35.0] 14.983175\n", "(40.0, 45.0] 12.362005\n", "(25.0, 30.0] 10.407934\n", "(45.0, 50.0] 10.278057\n", "(50.0, 55.0] 6.033414\n", "(85.0, 90.0] 5.372218\n", "(90.0, 95.0] 5.118366\n", "(80.0, 85.0] 4.102958\n", "(60.0, 65.0] 3.099357\n", "(20.0, 25.0] 3.081646\n", "(55.0, 60.0] 2.697916\n", "(95.0, 100.0] 1.883228\n", "(75.0, 80.0] 1.611665\n", "(65.0, 70.0] 1.322392\n", "(70.0, 75.0] 0.702521\n", "(15.0, 20.0] 0.489994\n", "(100.0, 105.0] 0.053132\n", "(10.0, 15.0] 0.017711\n", "(-0.001, 5.0] 0.011807\n", "Name: age, dtype: float64" ] }, "execution_count": 66, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Women\n", "moma.loc[(moma['gender_clean'] == 'female'), 'age'].value_counts(normalize=True, bins=bins).head(20) * 100" ] }, { "cell_type": "code", "execution_count": 67, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(30.0, 35.0] 15.131045\n", "(35.0, 40.0] 14.682748\n", "(40.0, 45.0] 13.401606\n", "(45.0, 50.0] 11.961093\n", "(25.0, 30.0] 11.382214\n", "(50.0, 55.0] 8.388085\n", "(55.0, 60.0] 7.131620\n", "(60.0, 65.0] 5.284967\n", "(65.0, 70.0] 3.868102\n", "(20.0, 25.0] 3.719013\n", "(70.0, 75.0] 2.068746\n", "(75.0, 80.0] 1.457993\n", "(80.0, 85.0] 0.596358\n", "(15.0, 20.0] 0.413338\n", "(85.0, 90.0] 0.321828\n", "(10.0, 15.0] 0.120300\n", "(90.0, 95.0] 0.052438\n", "(95.0, 100.0] 0.012338\n", "(5.0, 10.0] 0.004113\n", "(-0.001, 5.0] 0.002056\n", "Name: age, dtype: float64" ] }, "execution_count": 67, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Men\n", "moma.loc[(moma['gender_clean'] == 'male'), 'age'].value_counts(normalize=True, bins=bins).head(20) * 100" ] }, { "cell_type": "code", "execution_count": 68, "metadata": {}, "outputs": [ { "data": { "image/svg+xml": [ "\r\n", "\r\n", "\r\n", "\r\n", " \r\n", " \r\n", " \r\n", " \r\n", " 
2020-09-05T15:08:40.051873\r\n", " image/svg+xml\r\n", " \r\n", " \r\n", " Matplotlib v3.3.1, https://matplotlib.org/\r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " 
\r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", "\r\n" ], "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Set the figure\n", "sns.set(context='paper', style='ticks', palette=snark_palette,\n", " rc={'xtick.major.size': 4, 'ytick.left':False,\n", " 'axes.spines.left': False, 'axes.spines.bottom': True,\n", " 'axes.spines.right': False, 'axes.spines.top': False\n", " }\n", " )\n", "\n", "# Create the plot\n", "f_ag, ax_ag = plt.subplots()\n", "sns.distplot(moma.loc[moma['gender_clean'] == 'female', 'age'], hist=False, rug=False, label='female', ax=ax_ag)\n", "sns.distplot(moma.loc[moma['gender_clean'] == 'male', 'age'], hist=False, rug=False, label='male', ax=ax_ag)\n", "\n", "ax_ag.axvline(x=33, ymin=0, ymax=0.98, marker='x', linestyle=':', color=snark_palette[-1]) # 33 boundary\n", "\n", "# Set some aesthetic params for the plot\n", "ax_ag.annotate('33', [28, 0.0323], c=snark_palette[-1]) # set label for the 33 boundary\n", "ax_ag.legend() # set legend\n", "ax_ag.set_title('Amount of Artworks by Age: gender', loc='right', c=snark_palette[-1]) # set title of the plot\n", "ax_ag.set_xlabel('Age', c=snark_palette[-1]) # set label of x axis\n", "ax_ag.get_yaxis().set_visible(False) # hide y axis\n", "ax_ag.set_xticks([i for i in range(0, 110, 10)]) # set x ticks labels\n", "ax_ag.set_xlim([10, 110]) # set x axis range\n", "ax_ag.tick_params(axis='x', colors=snark_palette[-1]) # color x ticks\n", "ax_ag.spines['bottom'].set_color(snark_palette[-1]) # color x axis\n", "\n", "# Save and plot\n", "plt.savefig('plot.pic\\plot.age.gender.png', dpi=150)\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It looks like thirty is really the most productive age.\n", "\n", "For men, the distribution of the number of works reflects the pattern of 30 years. \n", "However, the plot is ambiguous for women, perhaps because of a small portion of the data.\n", "\n", "Let's deep into the values for women." 
] }, { "cell_type": "code", "execution_count": 69, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Total by gender:\n", "male 97257\n", "female 16939\n", "Name: gender_clean, dtype: int64\n" ] }, { "data": { "text/plain": [ "46 665\n", "36 648\n", "37 622\n", "41 586\n", "38 567\n", "32 559\n", "35 557\n", "39 521\n", "33 486\n", "31 469\n", "34 467\n", "30 458\n", "45 435\n", "40 415\n", "43 395\n", "42 374\n", "28 373\n", "29 361\n", "49 353\n", "27 336\n", "Name: age, dtype: int64" ] }, "execution_count": 69, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Number of artworks by gender\n", "print('Total by gender:', moma['gender_clean'].value_counts(), sep='\\n')\n", "\n", "# Women: the 20 most common ages (peak at 46)\n", "moma.loc[(moma['gender_clean'] == 'female'), 'age'].value_counts().head(20)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For women, the single most common age is 46, with 665 artworks." ] }, { "cell_type": "code", "execution_count": 70, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Lilly Reich 253\n", "Doris Ulmann 91\n", "Ynez Johnston 35\n", "Gay Block 26\n", "Kiki Smith 23\n", "Name: artist, dtype: int64\n" ] }, { "data": { "text/html": [ "
" ], "text/plain": [ " title artist artist_bio nationality begin_date \\\n", "50001 Untitled Doris Ulmann (American, 1884–1934) (American) (1884) \n", "50002 Roll, Jordan, Roll Doris Ulmann (American, 1884–1934) (American) (1884) \n", "50003 Untitled Doris Ulmann (American, 1884–1934) (American) (1884) \n", "50004 Untitled Doris Ulmann (American, 1884–1934) (American) (1884) \n", "50005 Untitled Doris Ulmann (American, 1884–1934) (American) (1884) \n", "... ... ... ... ... ... \n", "107260 Attachment for rubber strap (Perspective sketc... Lilly Reich (German, 1885–1947) (German) (1885) \n", "107261 Bed and couch, LR 600, 610, and 620 (Elevation... Lilly Reich (German, 1885–1947) (German) (1885) \n", "107262 Bed and couch, LR 600, 610, and 620 (Elevation... Lilly Reich (German, 1885–1947) (German) (1885) \n", "107263 Bed and mattress frame, LR 600 (Plan and eleva... Lilly Reich (German, 1885–1947) (German) (1885) \n", "107264 Daybed, LR 620 (Plan and elevations) Lilly Reich (German, 1885–1947) (German) (1885) \n", "\n", " end_date gender date classification department begin_date_clean \\\n", "50001 (1934) (Female) 1929-31 Photograph Photography 1884 \n", "50002 (1934) (Female) 1929-31 Photograph Photography 1884 \n", "50003 (1934) (Female) 1929-31 Photograph Photography 1884 \n", "50004 (1934) (Female) 1929-31 Photograph Photography 1884 \n", "50005 (1934) (Female) 1929-31 Photograph Photography 1884 \n", "... ... ... ... ... ... ... 
\n", "107260 (1947) (Female) 1931 Mies van der Rohe Archive Architecture & Design 1885 \n", "107261 (1947) (Female) 1931 Mies van der Rohe Archive Architecture & Design 1885 \n", "107262 (1947) (Female) 1931 Mies van der Rohe Archive Architecture & Design 1885 \n", "107263 (1947) (Female) 1931 Mies van der Rohe Archive Architecture & Design 1885 \n", "107264 (1947) (Female) 1931 Mies van der Rohe Archive Architecture & Design 1885 \n", "\n", " end_date_clean gender_clean nationality_clean year_1 year_2 year_2_2 year_2_4 date_clean age \n", "50001 1934 female american 1929 1931 NaN NaN 1930 46 \n", "50002 1934 female american 1929 1931 NaN NaN 1930 46 \n", "50003 1934 female american 1929 1931 NaN NaN 1930 46 \n", "50004 1934 female american 1929 1931 NaN NaN 1930 46 \n", "50005 1934 female american 1929 1931 NaN NaN 1930 46 \n", "... ... ... ... ... ... ... ... ... ... \n", "107260 1947 female german 1931 1931 NaN NaN 1931 46 \n", "107261 1947 female german 1931 1931 NaN NaN 1931 46 \n", "107262 1947 female german 1931 1931 NaN NaN 1931 46 \n", "107263 1947 female german 1931 1931 NaN NaN 1931 46 \n", "107264 1947 female german 1931 1931 NaN NaN 1931 46 \n", "\n", "[428 rows x 20 columns]" ] }, "execution_count": 70, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Top 5 female artists by number of works created at age 46\n", "women_46_top5 = (moma.loc[(moma['gender_clean'] == 'female') & (moma['age'] == 46), 'artist']).value_counts().head()\n", "\n", "print(women_46_top5)\n", "moma.loc[(moma['artist'].isin(women_46_top5.index)) & (moma['age'] == 46)]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The spike at age 46 is driven by a handful of female artists, above all [Lilly Reich](https://www.moma.org/artists/8059), who accounts for 253 of the 665 works.\n", "\n", "The plot also shows a peak around age 90. Let's take a closer look."
] }, { "cell_type": "code", "execution_count": 71, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "88 335\n", "84 249\n", "89 196\n", "90 175\n", "83 148\n", "87 107\n", "81 106\n", "82 105\n", "86 97\n", "85 87\n", "80 26\n", "Name: age, dtype: int64\n", "Louise Bourgeois 334\n", "Sonia Delaunay-Terk 1\n", "Name: artist, dtype: int64\n" ] }, { "data": { "text/html": [ "
" ], "text/plain": [ " title artist artist_bio \\\n", "35799 Color Rhythm No. 1921-1973 Sonia Delaunay-Terk (French, born Ukraine. 1885–1979) \n", "55942 Mother and Child Louise Bourgeois (American, born France. 1911–2010) \n", "55943 The Angry Cat Louise Bourgeois (American, born France. 1911–2010) \n", "55944 The Angry Cat, state I of III (recto), Hanging... Louise Bourgeois (American, born France. 1911–2010) \n", "55945 Champfleurette #2 Louise Bourgeois (American, born France. 1911–2010) \n", "... ... ... ... \n", "124192 Point d'Ironie, cover Louise Bourgeois (American, born France. 1911–2010) \n", "124193 Point d'Ironie, spread 1 of 3 Louise Bourgeois (American, born France. 1911–2010) \n", "124194 Point d'Ironie, spread 2 of 3 Louise Bourgeois (American, born France. 1911–2010) \n", "124195 Point d'Ironie, spread 3 of 3 Louise Bourgeois (American, born France. 1911–2010) \n", "124196 Point d'Ironie, back cover Louise Bourgeois (American, born France. 1911–2010) \n", "\n", " nationality begin_date end_date gender date classification department begin_date_clean \\\n", "35799 (French) (1885) (1979) (Female) 1973 Drawing Drawings & Prints 1885 \n", "55942 (American) (1911) (2010) (Female) 1999 Print Drawings & Prints 1911 \n", "55943 (American) (1911) (2010) (Female) 1999 Print Drawings & Prints 1911 \n", "55944 (American) (1911) (2010) (Female) 1999 Print Drawings & Prints 1911 \n", "55945 (American) (1911) (2010) (Female) 1999 Print Drawings & Prints 1911 \n", "... ... ... ... ... ... ... ... ... 
\n", "124192 (American) (1911) (2010) (Female) 1999 Illustrated Book Drawings & Prints 1911 \n", "124193 (American) (1911) (2010) (Female) 1999 Illustrated Book Drawings & Prints 1911 \n", "124194 (American) (1911) (2010) (Female) 1999 Illustrated Book Drawings & Prints 1911 \n", "124195 (American) (1911) (2010) (Female) 1999 Illustrated Book Drawings & Prints 1911 \n", "124196 (American) (1911) (2010) (Female) 1999 Illustrated Book Drawings & Prints 1911 \n", "\n", " end_date_clean gender_clean nationality_clean year_1 year_2 year_2_2 year_2_4 date_clean age \n", "35799 1979 female french 1973 1973 NaN NaN 1973 88 \n", "55942 2010 female american 1999 1999 NaN NaN 1999 88 \n", "55943 2010 female american 1999 1999 NaN NaN 1999 88 \n", "55944 2010 female american 1999 1999 NaN NaN 1999 88 \n", "55945 2010 female american 1999 1999 NaN NaN 1999 88 \n", "... ... ... ... ... ... ... ... ... ... \n", "124192 2010 female american 1999 1999 NaN NaN 1999 88 \n", "124193 2010 female american 1999 1999 NaN NaN 1999 88 \n", "124194 2010 female american 1999 1999 NaN NaN 1999 88 \n", "124195 2010 female american 1999 1999 NaN NaN 1999 88 \n", "124196 2010 female american 1999 1999 NaN NaN 1999 88 \n", "\n", "[335 rows x 20 columns]" ] }, "execution_count": 71, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Peak around 90: number of works by women at each age from 80 to 90\n", "print(moma.loc[(moma['gender_clean'] == 'female') & (moma['age'].between(80, 90)), 'age'].value_counts().head(20))\n", "\n", "# Top 5 female artists by number of works created at age 88\n", "women_88_top5 = (moma.loc[(moma['gender_clean'] == 'female') & (moma['age'] == 88), 'artist']).value_counts().head()\n", "\n", "print(women_88_top5)\n", "moma.loc[(moma['artist'].isin(women_88_top5.index)) & (moma['age'] == 88)]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This peak is almost entirely the work of Louise Bourgeois (334 of the 335 works at age 88), an artist who paid no attention to age.\n", "\n", "You can find more about Louise Bourgeois and her creativity 
[here](https://en.wikipedia.org/wiki/Louise_Bourgeois) and [here](https://www.moma.org/artists/8059).\n", "\n", "### Plot by nationality\n", "\n", "Let's find the four nationalities with the largest number of artworks." ] }, { "cell_type": "code", "execution_count": 72, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "american 53589\n", "french 20915\n", "german 8559\n", "british 5168\n", "Name: nationality_clean, dtype: int64\n" ] } ], "source": [ "nationality_top4 = moma['nationality_clean'].value_counts(normalize=False).head(4)\n", "print(nationality_top4)" ] }, { "cell_type": "code", "execution_count": 73, "metadata": {}, "outputs": [ { "data": { "image/svg+xml": [ "\r\n", "\r\n", "\r\n", "\r\n", " \r\n", " \r\n", " \r\n", " \r\n", " 2020-09-05T15:08:40.838422\r\n", " image/svg+xml\r\n", " \r\n", " \r\n", " Matplotlib v3.3.1, https://matplotlib.org/\r\n", " \r\n", " \r\n", " \r\n", " \r\n", " \r\n", " 
\r\n" ], "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Set the figure\n", "sns.set(context='paper', style='ticks', palette=snark_palette,\n", " rc={'xtick.major.size': 4, 'ytick.left': False,\n", " 'axes.spines.left': False, 'axes.spines.bottom': True,\n", " 'axes.spines.right': False, 'axes.spines.top': False\n", " }\n", " )\n", "\n", "# Create the plot\n", "moma_nationality = moma.loc[moma['nationality_clean'].isin(nationality_top4.index), ['nationality_clean', 'age']] # data\n", "g_an = sns.FacetGrid(moma_nationality, hue='nationality_clean')\n", "g_an = g_an.map(sns.distplot, 'age', hist=False, rug=False)\n", "\n", "g_an.ax.axvline(x=33, ymin=0, ymax=0.98, marker='x', linestyle=':', color=snark_palette[-1]) # 33 boundary\n", "\n", "# Set some aesthetic params for the plot\n", "g_an.fig.set_size_inches(6, 4)\n", "g_an.ax.annotate('33', [28, 0.0415], c=snark_palette[-1]) # set label for the 33 boundary\n", "g_an.ax.legend() # set legend\n", "g_an.ax.set_title('Amount of Artworks by Age: nationality', loc='right', c=snark_palette[-1]) # set title of the plot\n", "g_an.ax.set_xlabel('Age', c=snark_palette[-1]) # set label of x axis\n", "g_an.ax.get_yaxis().set_visible(False) # hide y labels\n", "g_an.despine(left=True) # hide y axis\n", "g_an.ax.set_xticks([i for i in range(0, 110, 10)]) # set x tick positions\n", "g_an.ax.set_xlim([10, 110]) # set x axis range\n", "g_an.ax.tick_params(axis='x', colors=snark_palette[-1]) # color x ticks\n", "g_an.ax.spines['bottom'].set_color(snark_palette[-1]) # color x axis\n", "\n", "# Save and plot (forward slash keeps the path portable across operating systems)\n", "g_an.fig.subplots_adjust(bottom=0.125, top=0.88, left=0.125, right=0.9) # adjust for the post picture\n", "g_an.savefig('plot.pic/plot.age.nationality.png', dpi=150, bbox_inches=None)\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Plot by century\n", "\n", "Let's create the plot based on the centuries in which the artworks were created. 
\n", "First, we'll save the century in a separate column `century`." ] }, { "cell_type": "code", "execution_count": 74, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
 date_clean century
0 1896 19
1 1987 20
2 1903 20
3 1980 20
4 1903 20
... ... ...
138156 1934 20
138157 1934 20
138158 1934 20
138159 1934 20
138160 1934 20
\n", "

114559 rows × 2 columns

\n", "
" ], "text/plain": [ " date_clean century\n", "0 1896 19\n", "1 1987 20\n", "2 1903 20\n", "3 1980 20\n", "4 1903 20\n", "... ... ...\n", "138156 1934 20\n", "138157 1934 20\n", "138158 1934 20\n", "138159 1934 20\n", "138160 1934 20\n", "\n", "[114559 rows x 2 columns]" ] }, "execution_count": 74, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Century from the year: year // 100 + 1 (1896 -> 19, 1987 -> 20; a year ending in 00 goes to the next century)\n", "moma['century'] = ((moma['date_clean'] // 100) + 1).astype(int)\n", "\n", "# Inspect values\n", "moma[['date_clean', 'century']]" ] }, { "cell_type": "code", "execution_count": 75, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "21 12076\n", "20 97626\n", "19 4771\n", "18 86\n", "Name: century, dtype: int64" ] }, "execution_count": 75, "metadata": {}, "output_type": "execute_result" } ], "source": [ "moma['century'].value_counts().sort_index(ascending=False)" ] }, { "cell_type": "code", "execution_count": 76, "metadata": {}, "outputs": [ { "data": { "image/svg+xml": [ " 2020-09-05T15:08:41.518034\r\n", " image/svg+xml\r\n", " Matplotlib v3.3.1, https://matplotlib.org/\r\n", " 
\r\n" ], "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Set the figure\n", "sns.set(context='paper', style='ticks', palette=snark_palette,\n", " rc={'xtick.major.size': 4, 'ytick.left': False,\n", " 'axes.spines.left': False, 'axes.spines.bottom': True,\n", " 'axes.spines.right': False, 'axes.spines.top': False\n", " }\n", " )\n", "\n", "# Create the plot\n", "moma_century = moma.loc[moma['century'].isin([19, 20, 21]), ['century', 'age']] # data\n", "g_ac = sns.FacetGrid(moma_century, hue='century')\n", "g_ac = g_ac.map(sns.distplot, 'age', hist=False, rug=False)\n", "\n", "g_ac.ax.axvline(x=33, ymin=0, ymax=0.98, marker='x', linestyle=':', color=snark_palette[-1]) # 33 boundary\n", "\n", "# Set some aesthetic params for the plot\n", "g_ac.fig.set_size_inches(6, 4)\n", "g_ac.ax.annotate('33', [28, 0.041], c=snark_palette[-1]) # set label for the 33 boundary\n", "g_ac.ax.legend() # set legend\n", "g_ac.ax.set_title('Amount of Artworks by Age: century', loc='right', c=snark_palette[-1]) # set title of the plot\n", "g_ac.ax.set_xlabel('Age', c=snark_palette[-1]) # set label of x axis\n", "g_ac.ax.get_yaxis().set_visible(False) # hide y labels\n", "g_ac.despine(left=True) # hide y axis\n", "g_ac.ax.set_xticks([i for i in range(0, 110, 10)]) # set x tick positions\n", "g_ac.ax.set_xlim([10, 110]) # set x axis range\n", "g_ac.ax.tick_params(axis='x', colors=snark_palette[-1]) # color x ticks\n", "g_ac.ax.spines['bottom'].set_color(snark_palette[-1]) # color x axis\n", "\n", "# Save and plot (forward slash keeps the path portable across operating systems)\n", "g_ac.fig.subplots_adjust(bottom=0.125, top=0.88, left=0.125, right=0.9) # adjust for post picture\n", "g_ac.savefig('plot.pic/plot.age.century.png', dpi=150, bbox_inches=None)\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In summary, *we found that 33 really is a special age!*\n", "\n", "\n", "# Blog Post" ] }, { "cell_type": "code", "execution_count": 77, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "1166" ] }, 
"execution_count": 77, "metadata": {}, "output_type": "execute_result" } ], "source": [ "blog_post = r\"\"\"\n", "## WHEN IS MY BRAIN UP?🎓\n", "\n", "The human brain reaches its peak efficiency by the age of 30*. \n", "So say the scientists; let's ask the data! \n", "\n", "📌We wondered if most of the artworks in the MoMA database were actually created by artists in their 30s. \n", "\n", "Since the artworks have been included in the collection of the museum, they represent a valuable result of the human brain activity. \n", "We calculated the age when the artist created the work and plotted at what age most of the valuable artworks were created. \n", "It's curious that the most works were created at the age of 33! \n", "33 is a meaningful age. \n", "For example, in Christianity, Jesus was crucified and then resurrected at the age of 33. \n", "\n", "We can assume that there is a certain time lag between the origin of the idea and its implementation. \n", "\n", "(Interested in more details? Follow the link in bio for the entire research project!) \n", "\n", "\\* (e.g., Chapter 6 of the book \"Behave: The Biology of Humans at Our Best and Worst\" by Robert M. Sapolsky) \n", ". \n", ". \n", ". \n", "\\#funtime \\#probably \\#datascience \\#datapower \\#data_sugar_brain \\#human_brain \\#art \\#data_know_everything_and_nothing \\#linkinbio \\#datajournalism \\#python\n", "\"\"\"\n", "\n", "# Check post text length for Instagram\n", "len(blog_post)" ] }, { "cell_type": "code", "execution_count": 78, "metadata": {}, "outputs": [ { "data": { "text/markdown": [ "\n", "## WHEN IS MY BRAIN UP?🎓\n", "\n", "The human brain reaches its peak efficiency by the age of 30*. \n", "So say the scientists; let's ask the data! \n", "\n", "📌We wondered if most of the artworks in the MoMA database were actually created by artists in their 30s. 
\n", "\n", "Since the artworks have been included in the collection of the museum, they represent a valuable result of the human brain activity. \n", "We calculated the age when the artist created the work and plotted at what age most of the valuable artworks were created. \n", "It's curious that the most works were created at the age of 33! \n", "33 is a meaningful age. \n", "For example, in Christianity, Jesus was crucified and then resurrected at the age of 33. \n", "\n", "We can assume that there is a certain time lag between the origin of the idea and its implementation. \n", "\n", "(Interested in more details? Follow the link in bio for the entire research project!) \n", "\n", "\\* (e.g., Chapter 6 of the book \"Behave: The Biology of Humans at Our Best and Worst\" by Robert M. Sapolsky) \n", ". \n", ". \n", ". \n", "\\#funtime \\#probably \\#datascience \\#datapower \\#data_sugar_brain \\#human_brain \\#art \\#data_know_everything_and_nothing \\#linkinbio \\#datajournalism \\#python\n" ], "text/plain": [ "" ] }, "execution_count": 78, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from IPython.display import Markdown as md\n", "md(blog_post)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.1" } }, "nbformat": 4, "nbformat_minor": 4 }