{ "cells": [ { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Päivitetty 2023-05-03 / Aki Taanila\n" ] } ], "source": [ "from datetime import datetime\n", "print(f'Päivitetty {datetime.now().date()} / Aki Taanila')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Ajoneuvojen avoin data sisältää kaikkien liikennekäytössä olevien ajoneuvojen rekisteröinti-, hyväksyntä- ja teknisiä tietoja Traficomin ylläpitämästä liikenneasioiden rekisteristä. Datan laajuus on yli 5 miljoonaa riviä. Dataa päivitetään säännöllisesti. Data löytyy osoitteesta:\n", "\n", "https://tieto.traficom.fi/fi/tietotraficom/avoin-data?toggle=Ajoneuvojen%20avoin%20data\n", "\n", "Samasta osoitteesta löytyy myös aineistokuvaus, muuttujaluettelo Excel-muodossa ja käytetty merkistökoodaus (ISO8859-1).\n", "\n", "Data on ladattavissa zip-pakettina. Seuraavassa oletetaan, että **Ajoneuvorekisteri.zip** on tallennettu samaan kansioon tämän muistion kanssa.\n", "\n", "Poimin esimerkkinä datasta tiedot vuoden 2019 alusta eteenpäin rekisteröidyistä henkilöautoista. Mukaan otan vain osan muuttujista. Tallennan datan myöhempää käyttöä varten.\n", "\n", "Lopuksi tutkin suosituimpia merkkejä ja värejä.\n", "\n", "**Koodin suoritus kestää jonkin aikaa, koska ison datan lataaminen netistä on hidasta**." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "# Jos et ole ladannut Ajoneuvorekisteri.zip -tiedostoa omalle koneellesi\n", "# niin voit tehdä sen seuraavalla koodilla:\n", "\n", "import requests\n", "url = \"https://opendata.traficom.fi/Content/Ajoneuvorekisteri.zip\"\n", "r = requests.get(url)\n", "with open(\"Ajoneuvorekisteri.zip\",'wb') as file:\n", " file.write(r.content)" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "import pandas as pd" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "C:\\Users\\taaak\\AppData\\Local\\Temp\\ipykernel_744\\390937211.py:8: DtypeWarning: Columns (18,27) have mixed types. Specify dtype option on import or set low_memory=False.\n", " df = pd.read_csv('Ajoneuvorekisteri.zip', sep=';', encoding='ISO8859-1', usecols=cols)\n" ] } ], "source": [ "# Mukaan otettavat muuttujat\n", "cols = ['ajoneuvoluokka', 'ensirekisterointipvm', 'vari', 'korityyppi', 'omamassa', \n", " 'ajonKokPituus', 'ajonLeveys', 'ajonKorkeus', 'kayttovoima', 'suurinNettoteho', \n", " 'sahkohybridi', 'sahkohybridinluokka', 'merkkiSelvakielinen','mallimerkinta', \n", " 'vaihteisto', 'kaupallinenNimi', 'kunta', 'matkamittarilukema']\n", "\n", "# Datan avaus (pandas osaa purkaa zip-paketin)\n", "df = pd.read_csv('Ajoneuvorekisteri.zip', sep=';', encoding='ISO8859-1', usecols=cols)" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "RangeIndex: 5163486 entries, 0 to 5163485\n", "Data columns (total 18 columns):\n", " # Column Dtype \n", "--- ------ ----- \n", " 0 ajoneuvoluokka object \n", " 1 ensirekisterointipvm object \n", " 2 vari object \n", " 3 korityyppi object \n", " 4 omamassa float64\n", " 5 ajonKokPituus float64\n", " 6 ajonLeveys float64\n", " 7 ajonKorkeus float64\n", " 8 kayttovoima object \n", " 9 suurinNettoteho float64\n", " 10 sahkohybridi object \n", " 11 sahkohybridinluokka float64\n", " 12 merkkiSelvakielinen object \n", " 13 mallimerkinta object \n", " 14 vaihteisto object \n", " 15 kaupallinenNimi object \n", " 16 kunta float64\n", " 17 matkamittarilukema float64\n", "dtypes: float64(8), object(10)\n", "memory usage: 709.1+ MB\n" ] } ], "source": [ "# Tietoa muuttujista\n", "df.info()" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "# Poimitaan datasta vain henkilöautot (ajoneuvoluokat M1 ja M1G)\n", "df = df[df['ajoneuvoluokka'].isin(['M1', 'M1G'])]" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "# Varmistetaan, että päivämäärät tunnistuvat päivämääriksi\n", "df['ensirekisterointipvm'] = pd.to_datetime(df['ensirekisterointipvm'], format='%Y-%m-%d')" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "# Rajoitutaan vuoden 2019 alusta lähtien rekisteröityihin\n", "df = df[df['ensirekisterointipvm']>'2019-01-01'].sort_values('ensirekisterointipvm')" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "# Tallennus csv-muotoon\n", "df.to_csv('autot.csv', index=False)" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ajoneuvoluokkaensirekisterointipvmvarikorityyppiomamassaajonKokPituusajonLeveysajonKorkeuskayttovoimasuurinNettotehosahkohybridisahkohybridinluokkamerkkiSelvakielinenmallimerkintavaihteistokaupallinenNimikuntamatkamittarilukema
4363237M12019-01-021AC1647.04874.01820.01536.038.0110.0FalseNaNVolkswagenPASSAT Farmari (AC) 4ov 1390cm3 A2PASSAT405.0179375.0
4297425M12019-01-020AA2031.04963.01890.01437.01.0233.0True1.0VolvoS90 T8 Twin Engine Sedan (AA) 4ov 1969cm3 A2S90 T8 Twin Engine297.072330.0
4317685M12019-01-020AB1190.04158.01756.01498.01.073.5FalseNaNFordFiesta Viistoperä (AB) 4ov 998cm31Fiesta753.051033.0
4023431M12019-01-026AC1669.04525.01805.01605.02.0104.0FalseNaNKiaCARENS Farmari (AC) 5ov 1685cm3 A2CARENS91.076026.0
4351770M12019-01-026AC1408.04788.01844.01545.01.0110.6FalseNaNFordFocus Farmari (AC) 4ov 1496cm31Focus91.034696.0
.........................................................
5162089M12022-12-319AF2072.04750.01921.01624.04.0NaNFalseNaNTeslaModel Y Monikäyttöajoneuvo (AF) 5ov8Model Y49.0NaN
5158637M12022-12-310AA1919.04694.01850.01443.004NaNFalseNaNTeslaModel 3 Sedan (AA) 4ov8Model 3837.0NaN
5160973M12022-12-318AF2539.05057.01999.01680.04.0NaNFalseNaNTeslaModel X Monikäyttöajoneuvo (AF) 5ov8Model X49.0NaN
5155994M12022-12-312AC2118.04599.01852.01616.004NaNFalseNaNVolkswagen, VWID.5 PRO 150 KW Farmari (AC) 5ov A2ID.5 PRO 150 KW92.0NaN
5160968M12022-12-316AF2539.05057.01999.01680.04.0NaNFalseNaNTeslaModel X Monikäyttöajoneuvo (AF) 5ov8Model X49.0NaN
\n", "

530249 rows × 18 columns

\n", "
" ], "text/plain": [ " ajoneuvoluokka ensirekisterointipvm vari korityyppi omamassa \\\n", "4363237 M1 2019-01-02 1 AC 1647.0 \n", "4297425 M1 2019-01-02 0 AA 2031.0 \n", "4317685 M1 2019-01-02 0 AB 1190.0 \n", "4023431 M1 2019-01-02 6 AC 1669.0 \n", "4351770 M1 2019-01-02 6 AC 1408.0 \n", "... ... ... ... ... ... \n", "5162089 M1 2022-12-31 9 AF 2072.0 \n", "5158637 M1 2022-12-31 0 AA 1919.0 \n", "5160973 M1 2022-12-31 8 AF 2539.0 \n", "5155994 M1 2022-12-31 2 AC 2118.0 \n", "5160968 M1 2022-12-31 6 AF 2539.0 \n", "\n", " ajonKokPituus ajonLeveys ajonKorkeus kayttovoima suurinNettoteho \\\n", "4363237 4874.0 1820.0 1536.0 38.0 110.0 \n", "4297425 4963.0 1890.0 1437.0 1.0 233.0 \n", "4317685 4158.0 1756.0 1498.0 1.0 73.5 \n", "4023431 4525.0 1805.0 1605.0 2.0 104.0 \n", "4351770 4788.0 1844.0 1545.0 1.0 110.6 \n", "... ... ... ... ... ... \n", "5162089 4750.0 1921.0 1624.0 4.0 NaN \n", "5158637 4694.0 1850.0 1443.0 04 NaN \n", "5160973 5057.0 1999.0 1680.0 4.0 NaN \n", "5155994 4599.0 1852.0 1616.0 04 NaN \n", "5160968 5057.0 1999.0 1680.0 4.0 NaN \n", "\n", " sahkohybridi sahkohybridinluokka merkkiSelvakielinen \\\n", "4363237 False NaN Volkswagen \n", "4297425 True 1.0 Volvo \n", "4317685 False NaN Ford \n", "4023431 False NaN Kia \n", "4351770 False NaN Ford \n", "... ... ... ... \n", "5162089 False NaN Tesla \n", "5158637 False NaN Tesla \n", "5160973 False NaN Tesla \n", "5155994 False NaN Volkswagen, VW \n", "5160968 False NaN Tesla \n", "\n", " mallimerkinta vaihteisto \\\n", "4363237 PASSAT Farmari (AC) 4ov 1390cm3 A 2 \n", "4297425 S90 T8 Twin Engine Sedan (AA) 4ov 1969cm3 A 2 \n", "4317685 Fiesta Viistoperä (AB) 4ov 998cm3 1 \n", "4023431 CARENS Farmari (AC) 5ov 1685cm3 A 2 \n", "4351770 Focus Farmari (AC) 4ov 1496cm3 1 \n", "... ... ... \n", "5162089 Model Y Monikäyttöajoneuvo (AF) 5ov 8 \n", "5158637 Model 3 Sedan (AA) 4ov 8 \n", "5160973 Model X Monikäyttöajoneuvo (AF) 5ov 8 \n", "5155994 ID.5 PRO 150 KW Farmari (AC) 5ov A 2 \n", "5160968 Model X Monikäyttöajoneuvo (AF) 5ov 8 \n", "\n", " kaupallinenNimi kunta matkamittarilukema \n", "4363237 PASSAT 405.0 179375.0 \n", "4297425 S90 T8 Twin Engine 297.0 72330.0 \n", "4317685 Fiesta 753.0 51033.0 \n", "4023431 CARENS 91.0 76026.0 \n", "4351770 Focus 91.0 34696.0 \n", "... ... ... ... \n", "5162089 Model Y 49.0 NaN \n", "5158637 Model 3 837.0 NaN \n", "5160973 Model X 49.0 NaN \n", "5155994 ID.5 PRO 150 KW 92.0 NaN \n", "5160968 Model X 49.0 NaN \n", "\n", "[530249 rows x 18 columns]" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Näytetään tulosteissa kaikki sarakkeet\n", "pd.set_option.display_columns = None\n", "df" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "# Värit selviävät aineistokuvauksesta\n", "varit = ['Musta', 'Ruskea (beige)', 'Punainen', 'Oranssi','Keltainen', 'Vihreä', 'Sininen',\n", " 'Violetti', 'Harmaa', 'Valkoinen', 'Monivär.', 'Hopea', 'Turkoosi']" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
f
Valkoinen140345
Harmaa108845
Musta103928
Sininen58818
Punainen48871
Hopea30091
Ruskea (beige)19918
Vihreä6241
Oranssi4875
Keltainen3846
Violetti952
Monivär.528
Turkoosi428
\n", "
" ], "text/plain": [ " f\n", "Valkoinen 140345\n", "Harmaa 108845\n", "Musta 103928\n", "Sininen 58818\n", "Punainen 48871\n", "Hopea 30091\n", "Ruskea (beige) 19918\n", "Vihreä 6241\n", "Oranssi 4875\n", "Keltainen 3846\n", "Violetti 952\n", "Monivär. 528\n", "Turkoosi 428" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Kuinka suosittuja ovat eri värit olleet?\n", "df1 = pd.crosstab(df['vari'], 'f')\n", "df1.columns.name = ''\n", "\n", "# Sijoitan värien nimet indeksiin, jossa alunperin on värien koodit\n", "df1.index = varit\n", "\n", "# Värit suosituimmuusjärjestyksessä\n", "df1.sort_values('f', ascending = False)" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
f
merkkiSelvakielinen
Toyota66568
Volvo56346
Mercedes-Benz52108
Volkswagen, VW49881
Skoda40506
BMW36423
Kia30981
Ford25755
Audi23338
Nissan16448
Opel14451
Hyundai14250
Mitsubishi12183
Peugeot10359
Tesla9687
Renault9075
Citroen7033
Seat6810
Mazda4822
Suzuki4517
\n", "
" ], "text/plain": [ " f\n", "merkkiSelvakielinen \n", "Toyota 66568\n", "Volvo 56346\n", "Mercedes-Benz 52108\n", "Volkswagen, VW 49881\n", "Skoda 40506\n", "BMW 36423\n", "Kia 30981\n", "Ford 25755\n", "Audi 23338\n", "Nissan 16448\n", "Opel 14451\n", "Hyundai 14250\n", "Mitsubishi 12183\n", "Peugeot 10359\n", "Tesla 9687\n", "Renault 9075\n", "Citroen 7033\n", "Seat 6810\n", "Mazda 4822\n", "Suzuki 4517" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# 20 suosituinta merkkiä\n", "df2 = pd.crosstab(df['merkkiSelvakielinen'], 'f').nlargest(20, 'f' )\n", "df2.columns.name = ''\n", "df2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Edellä huomataan, että Volkswagen esiintyy kahdella eri nimellä (Volkswagen ja Volkswagen, VW). Seuraavassa yhdistän molemmat saman nimen alle." ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
f
merkkiSelvakielinen
Toyota66568
Volvo56346
Volkswagen54300
Mercedes-Benz52108
Skoda40506
BMW36423
Kia30981
Ford25755
Audi23338
Nissan16448
Opel14451
Hyundai14250
Mitsubishi12183
Peugeot10359
Tesla9687
Renault9075
Citroen7033
Seat6810
Mazda4822
Suzuki4517
\n", "
" ], "text/plain": [ " f\n", "merkkiSelvakielinen \n", "Toyota 66568\n", "Volvo 56346\n", "Volkswagen 54300\n", "Mercedes-Benz 52108\n", "Skoda 40506\n", "BMW 36423\n", "Kia 30981\n", "Ford 25755\n", "Audi 23338\n", "Nissan 16448\n", "Opel 14451\n", "Hyundai 14250\n", "Mitsubishi 12183\n", "Peugeot 10359\n", "Tesla 9687\n", "Renault 9075\n", "Citroen 7033\n", "Seat 6810\n", "Mazda 4822\n", "Suzuki 4517" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['merkkiSelvakielinen'] = df['merkkiSelvakielinen'].replace({'Volkswagen, VW':'Volkswagen'})\n", "df3 = pd.crosstab(df['merkkiSelvakielinen'], 'f').nlargest(20, 'f' )\n", "df3.columns.name = ''\n", "df3" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
 MustaRuskea (beige)PunainenOranssiKeltainenVihreäSininenViolettiHarmaaValkoinenMonivär.HopeaTurkoosi
merkkiSelvakielinen             
Audi6691469115047119394222211467262183118913
BMW1296584570825451204397810684991575111222
Citroen836264814833207613143125530283
Ford34981884477909688499885694633151923
Hyundai1454100119921266722216191293139060730170
Kia3959793447026139910184932346626829036322
Mazda5122318310123371114880701480
Mercedes-Benz178219092042848133831884713123112601023254
Mitsubishi2885869164621131159862551328312202
Nissan2092116823345924216105162803407844612721
Opel196618619561608231154302981451008881
Peugeot1314145110510616389311014208727720210
Renault6341801615827967128742171426580301
Seat900639702505311540451471187603750
Skoda663716852723100337637859310374997902763
Suzuki4271205315339974564299279805417
Tesla184076740269353139647451220
Toyota70986392793819370941656131039563181682101654
Volvo171172587301533143945226161199314050315054
\n" ], "text/plain": [ "" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Rajoitutaan vai suosituimpiin merkkeihin\n", "df_suositut = df[df['merkkiSelvakielinen'].isin(df2.index)]\n", "\n", "# Värit suosituilla merkeillä\n", "df4 = pd.crosstab(df_suositut['merkkiSelvakielinen'], df_suositut['vari'])\n", "\n", "# Sijoitan värien nimet sarakkeisiin\n", "df4.columns = varit\n", "\n", "# Värjätään kunkin merkin kohdalla suosituin väri\n", "df4.style.highlight_max(color='lightgreen', axis=1)" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
 MustaRuskea (beige)PunainenOranssiKeltainenVihreäSininenViolettiHarmaaValkoinenMonivär.HopeaTurkoosi
merkkiSelvakielinen             
Audi6691469115047119394222211467262183118913
BMW1296584570825451204397810684991575111222
Citroen836264814833207613143125530283
Ford34981884477909688499885694633151923
Hyundai1454100119921266722216191293139060730170
Kia3959793447026139910184932346626829036322
Mazda5122318310123371114880701480
Mercedes-Benz178219092042848133831884713123112601023254
Mitsubishi2885869164621131159862551328312202
Nissan2092116823345924216105162803407844612721
Opel196618619561608231154302981451008881
Peugeot1314145110510616389311014208727720210
Renault6341801615827967128742171426580301
Seat900639702505311540451471187603750
Skoda663716852723100337637859310374997902763
Suzuki4271205315339974564299279805417
Tesla184076740269353139647451220
Toyota70986392793819370941656131039563181682101654
Volvo171172587301533143945226161199314050315054
\n" ], "text/plain": [ "" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Rajoitutaan vai suosituimpiin merkkeihin\n", "df_suositut = df[df['merkkiSelvakielinen'].isin(df2.index)]\n", "\n", "# Värit suosituilla merkeillä\n", "df4 = pd.crosstab(df_suositut['merkkiSelvakielinen'], df_suositut['vari'])\n", "\n", "# Sijoitan värien nimet sarakkeisiin\n", "df4.columns = varit\n", "\n", "# Värjätään kunkin merkin kohdalla suosituin väri\n", "df4.style.highlight_max(color='lightgreen', axis=1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Lisätietoa\n", "\n", "Data-analytiikka Pythonilla https://tilastoapu.wordpress.com/python/" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.9" } }, "nbformat": 4, "nbformat_minor": 4 }