{ "cells": [ { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Updated 2023-05-02 / Aki Taanila\n" ] } ], "source": [ "from datetime import datetime\n", "print(f'Updated {datetime.now().date()} / Aki Taanila')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Decimal points to commas\n", "\n", "Python uses a point as the decimal separator. Below I present ways to display a comma as the decimal separator instead." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "import pandas as pd" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Starting with pandas version 1.3.0, decimal points can be displayed as commas with the **decimal** parameter of the **style.format** function." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
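<table>\n",
"<tr><th></th><th>nro</th><th>sukup</th><th>ikä</th><th>perhe</th><th>koulutus</th><th>palveluv</th><th>palkka</th><th>johto</th><th>työtov</th><th>työymp</th><th>palkkat</th><th>työteht</th><th>työterv</th><th>lomaosa</th><th>kuntosa</th><th>hieroja</th></tr>\n",
"<tr><th>count</th><td>82,0</td><td>82,0</td><td>82,0</td><td>82,0</td><td>81,0</td><td>80,0</td><td>82,0</td><td>82,0</td><td>81,0</td><td>82,0</td><td>82,0</td><td>82,0</td><td>47,0</td><td>20,0</td><td>9,0</td><td>22,0</td></tr>\n",
"<tr><th>mean</th><td>41,5</td><td>1,2</td><td>38,0</td><td>1,6</td><td>2,0</td><td>12,2</td><td>2563,9</td><td>3,1</td><td>4,1</td><td>3,2</td><td>2,1</td><td>3,2</td><td>1,0</td><td>1,0</td><td>1,0</td><td>1,0</td></tr>\n",
"<tr><th>std</th><td>23,8</td><td>0,4</td><td>9,8</td><td>0,5</td><td>0,8</td><td>8,8</td><td>849,4</td><td>1,1</td><td>0,8</td><td>1,2</td><td>1,1</td><td>1,0</td><td>0,0</td><td>0,0</td><td>0,0</td><td>0,0</td></tr>\n",
"<tr><th>min</th><td>1,0</td><td>1,0</td><td>20,0</td><td>1,0</td><td>1,0</td><td>0,0</td><td>1521,0</td><td>1,0</td><td>2,0</td><td>1,0</td><td>1,0</td><td>1,0</td><td>1,0</td><td>1,0</td><td>1,0</td><td>1,0</td></tr>\n",
"<tr><th>25%</th><td>21,2</td><td>1,0</td><td>31,0</td><td>1,0</td><td>1,0</td><td>3,8</td><td>2027,0</td><td>2,0</td><td>4,0</td><td>3,0</td><td>1,0</td><td>3,0</td><td>1,0</td><td>1,0</td><td>1,0</td><td>1,0</td></tr>\n",
"<tr><th>50%</th><td>41,5</td><td>1,0</td><td>37,5</td><td>2,0</td><td>2,0</td><td>12,5</td><td>2320,0</td><td>3,0</td><td>4,0</td><td>3,0</td><td>2,0</td><td>3,0</td><td>1,0</td><td>1,0</td><td>1,0</td><td>1,0</td></tr>\n",
"<tr><th>75%</th><td>61,8</td><td>1,0</td><td>44,0</td><td>2,0</td><td>3,0</td><td>18,2</td><td>2808,0</td><td>4,0</td><td>5,0</td><td>4,0</td><td>3,0</td><td>4,0</td><td>1,0</td><td>1,0</td><td>1,0</td><td>1,0</td></tr>\n",
"<tr><th>max</th><td>82,0</td><td>2,0</td><td>61,0</td><td>2,0</td><td>4,0</td><td>36,0</td><td>6278,0</td><td>5,0</td><td>5,0</td><td>5,0</td><td>5,0</td><td>5,0</td><td>1,0</td><td>1,0</td><td>1,0</td><td>1,0</td></tr>\n",
"</table>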
\n" ], "text/plain": [ "" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Open the data\n", "df = pd.read_excel('https://taanila.fi/data1.xlsx')\n", "\n", "# Compute summary statistics and show them with one decimal, using a decimal comma\n", "df.describe().style.format('{:.1f}', decimal=',')" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
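<table>\n",
"<tr><th></th><th>f</th><th>%</th></tr>\n",
"<tr><th>Peruskoulu</th><td>27</td><td>33,3 %</td></tr>\n",
"<tr><th>2. aste</th><td>30</td><td>37,0 %</td></tr>\n",
"<tr><th>Korkeakoulu</th><td>22</td><td>27,2 %</td></tr>\n",
"<tr><th>Ylempi korkeakoulu</th><td>2</td><td>2,5 %</td></tr>\n",
"</table>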
\n" ], "text/plain": [ "" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Frequency table of education (koulutus)\n", "\n", "df1 = pd.crosstab(df['koulutus'], 'f')\n", "df1['%'] = df1['f']/df1['f'].sum()*100\n", "df1.columns.name = ''\n", "df1.index = ['Peruskoulu', '2. aste', 'Korkeakoulu', 'Ylempi korkeakoulu']\n", "\n", "# Frequencies without decimals, percentages with one decimal, comma as the decimal separator\n", "df1.style.format({'f':'{:.0f}', '%':'{:.1f} %'}, decimal=',')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Older versions of pandas" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The decimal parameter of style.format is available from pandas version 1.3.0 onward. In older versions, decimal points have to be changed to commas in another way.\n", "\n", "Below, the **pilkut** function rounds (round) a number to one decimal and replaces the point with a comma in the number converted to a string (str).\n", "\n", "If x is not a number, round raises a TypeError. Here the error is caught with exception handling (try - except), so the value passes through unrounded.\n", "\n", "The **pilkutp** function additionally appends a space and a % sign." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "def pilkut(x):\n", "    # Round to one decimal; non-numeric values pass through unrounded\n", "    try:\n", "        x = round(x, 1)\n", "    except TypeError:\n", "        pass\n", "    # Convert to a string and swap the decimal point for a comma\n", "    return str(x).replace('.', ',')\n", "\n", "def pilkutp(x):\n", "    # Same as pilkut, followed by a space and a % sign\n", "    return pilkut(x) + ' %'" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
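<table>\n",
"<tr><th></th><th>nro</th><th>sukup</th><th>ikä</th><th>perhe</th><th>koulutus</th><th>palveluv</th><th>palkka</th><th>johto</th><th>työtov</th><th>työymp</th><th>palkkat</th><th>työteht</th><th>työterv</th><th>lomaosa</th><th>kuntosa</th><th>hieroja</th></tr>\n",
"<tr><th>count</th><td>82,0</td><td>82,0</td><td>82,0</td><td>82,0</td><td>81,0</td><td>80,0</td><td>82,0</td><td>82,0</td><td>81,0</td><td>82,0</td><td>82,0</td><td>82,0</td><td>47,0</td><td>20,0</td><td>9,0</td><td>22,0</td></tr>\n",
"<tr><th>mean</th><td>41,5</td><td>1,2</td><td>38,0</td><td>1,6</td><td>2,0</td><td>12,2</td><td>2563,9</td><td>3,1</td><td>4,1</td><td>3,2</td><td>2,1</td><td>3,2</td><td>1,0</td><td>1,0</td><td>1,0</td><td>1,0</td></tr>\n",
"<tr><th>std</th><td>23,8</td><td>0,4</td><td>9,8</td><td>0,5</td><td>0,8</td><td>8,8</td><td>849,4</td><td>1,1</td><td>0,8</td><td>1,2</td><td>1,1</td><td>1,0</td><td>0,0</td><td>0,0</td><td>0,0</td><td>0,0</td></tr>\n",
"<tr><th>min</th><td>1,0</td><td>1,0</td><td>20,0</td><td>1,0</td><td>1,0</td><td>0,0</td><td>1521,0</td><td>1,0</td><td>2,0</td><td>1,0</td><td>1,0</td><td>1,0</td><td>1,0</td><td>1,0</td><td>1,0</td><td>1,0</td></tr>\n",
"<tr><th>25%</th><td>21,2</td><td>1,0</td><td>31,0</td><td>1,0</td><td>1,0</td><td>3,8</td><td>2027,0</td><td>2,0</td><td>4,0</td><td>3,0</td><td>1,0</td><td>3,0</td><td>1,0</td><td>1,0</td><td>1,0</td><td>1,0</td></tr>\n",
"<tr><th>50%</th><td>41,5</td><td>1,0</td><td>37,5</td><td>2,0</td><td>2,0</td><td>12,5</td><td>2320,0</td><td>3,0</td><td>4,0</td><td>3,0</td><td>2,0</td><td>3,0</td><td>1,0</td><td>1,0</td><td>1,0</td><td>1,0</td></tr>\n",
"<tr><th>75%</th><td>61,8</td><td>1,0</td><td>44,0</td><td>2,0</td><td>3,0</td><td>18,2</td><td>2808,0</td><td>4,0</td><td>5,0</td><td>4,0</td><td>3,0</td><td>4,0</td><td>1,0</td><td>1,0</td><td>1,0</td><td>1,0</td></tr>\n",
"<tr><th>max</th><td>82,0</td><td>2,0</td><td>61,0</td><td>2,0</td><td>4,0</td><td>36,0</td><td>6278,0</td><td>5,0</td><td>5,0</td><td>5,0</td><td>5,0</td><td>5,0</td><td>1,0</td><td>1,0</td><td>1,0</td><td>1,0</td></tr>\n",
"</table>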
\n", "
" ], "text/plain": [ " nro sukup ikä perhe koulutus palveluv palkka johto työtov työymp \\\n", "count 82,0 82,0 82,0 82,0 81,0 80,0 82,0 82,0 81,0 82,0 \n", "mean 41,5 1,2 38,0 1,6 2,0 12,2 2563,9 3,1 4,1 3,2 \n", "std 23,8 0,4 9,8 0,5 0,8 8,8 849,4 1,1 0,8 1,2 \n", "min 1,0 1,0 20,0 1,0 1,0 0,0 1521,0 1,0 2,0 1,0 \n", "25% 21,2 1,0 31,0 1,0 1,0 3,8 2027,0 2,0 4,0 3,0 \n", "50% 41,5 1,0 37,5 2,0 2,0 12,5 2320,0 3,0 4,0 3,0 \n", "75% 61,8 1,0 44,0 2,0 3,0 18,2 2808,0 4,0 5,0 4,0 \n", "max 82,0 2,0 61,0 2,0 4,0 36,0 6278,0 5,0 5,0 5,0 \n", "\n", " palkkat työteht työterv lomaosa kuntosa hieroja \n", "count 82,0 82,0 47,0 20,0 9,0 22,0 \n", "mean 2,1 3,2 1,0 1,0 1,0 1,0 \n", "std 1,1 1,0 0,0 0,0 0,0 0,0 \n", "min 1,0 1,0 1,0 1,0 1,0 1,0 \n", "25% 1,0 3,0 1,0 1,0 1,0 1,0 \n", "50% 2,0 3,0 1,0 1,0 1,0 1,0 \n", "75% 3,0 4,0 1,0 1,0 1,0 1,0 \n", "max 5,0 5,0 1,0 1,0 1,0 1,0 " ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Compute summary statistics\n", "df2 = df.describe()\n", "\n", "# Replace decimal points with commas, column by column\n", "for col in df2.columns:\n", "    df2[col] = df2[col].apply(pilkut)\n", "\n", "df2" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
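<table>\n",
"<tr><th></th><th>f</th><th>%</th></tr>\n",
"<tr><th>Peruskoulu</th><td>27</td><td>33.333333</td></tr>\n",
"<tr><th>2. aste</th><td>30</td><td>37.037037</td></tr>\n",
"<tr><th>Korkeakoulu</th><td>22</td><td>27.160494</td></tr>\n",
"<tr><th>Ylempi korkeakoulu</th><td>2</td><td>2.469136</td></tr>\n",
"</table>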
\n", "
" ], "text/plain": [ " f %\n", "Peruskoulu 27 33.333333\n", "2. aste 30 37.037037\n", "Korkeakoulu 22 27.160494\n", "Ylempi korkeakoulu 2 2.469136" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# The frequency table of education created earlier\n", "df1" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
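<table>\n",
"<tr><th></th><th>f</th><th>%</th></tr>\n",
"<tr><th>Peruskoulu</th><td>27</td><td>33,3 %</td></tr>\n",
"<tr><th>2. aste</th><td>30</td><td>37,0 %</td></tr>\n",
"<tr><th>Korkeakoulu</th><td>22</td><td>27,2 %</td></tr>\n",
"<tr><th>Ylempi korkeakoulu</th><td>2</td><td>2,5 %</td></tr>\n",
"</table>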
\n", "
" ], "text/plain": [ " f %\n", "Peruskoulu 27 33,3 %\n", "2. aste 30 37,0 %\n", "Korkeakoulu 22 27,2 %\n", "Ylempi korkeakoulu 2 2,5 %" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Points to commas (df1 is changed permanently)\n", "\n", "df1['%'] = df1['%'].apply(pilkutp)\n", "df1" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## More information\n", "\n", "Data analytics with Python https://tilastoapu.wordpress.com/python/" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.9" } }, "nbformat": 4, "nbformat_minor": 2 }