{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Updated 2023-05-02 / Aki Taanila\n"
]
}
],
"source": [
"from datetime import datetime\n",
"print(f'Updated {datetime.now().date()} / Aki Taanila')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Decimal points to commas\n",
"\n",
"Python uses a point as the decimal separator. This notebook shows ways to display a comma as the decimal separator instead."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Starting with pandas version 1.3.0, decimal points can be displayed as commas with the **decimal** parameter of the **style.format** function. The formatting affects only how the values are displayed; the underlying data remains numeric."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<table>\n",
"<thead>\n",
"<tr><th>&nbsp;</th><th>nro</th><th>sukup</th><th>ikä</th><th>perhe</th><th>koulutus</th><th>palveluv</th><th>palkka</th><th>johto</th><th>työtov</th><th>työymp</th><th>palkkat</th><th>työteht</th><th>työterv</th><th>lomaosa</th><th>kuntosa</th><th>hieroja</th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>count</th><td>82,0</td><td>82,0</td><td>82,0</td><td>82,0</td><td>81,0</td><td>80,0</td><td>82,0</td><td>82,0</td><td>81,0</td><td>82,0</td><td>82,0</td><td>82,0</td><td>47,0</td><td>20,0</td><td>9,0</td><td>22,0</td></tr>\n",
"<tr><th>mean</th><td>41,5</td><td>1,2</td><td>38,0</td><td>1,6</td><td>2,0</td><td>12,2</td><td>2563,9</td><td>3,1</td><td>4,1</td><td>3,2</td><td>2,1</td><td>3,2</td><td>1,0</td><td>1,0</td><td>1,0</td><td>1,0</td></tr>\n",
"<tr><th>std</th><td>23,8</td><td>0,4</td><td>9,8</td><td>0,5</td><td>0,8</td><td>8,8</td><td>849,4</td><td>1,1</td><td>0,8</td><td>1,2</td><td>1,1</td><td>1,0</td><td>0,0</td><td>0,0</td><td>0,0</td><td>0,0</td></tr>\n",
"<tr><th>min</th><td>1,0</td><td>1,0</td><td>20,0</td><td>1,0</td><td>1,0</td><td>0,0</td><td>1521,0</td><td>1,0</td><td>2,0</td><td>1,0</td><td>1,0</td><td>1,0</td><td>1,0</td><td>1,0</td><td>1,0</td><td>1,0</td></tr>\n",
"<tr><th>25%</th><td>21,2</td><td>1,0</td><td>31,0</td><td>1,0</td><td>1,0</td><td>3,8</td><td>2027,0</td><td>2,0</td><td>4,0</td><td>3,0</td><td>1,0</td><td>3,0</td><td>1,0</td><td>1,0</td><td>1,0</td><td>1,0</td></tr>\n",
"<tr><th>50%</th><td>41,5</td><td>1,0</td><td>37,5</td><td>2,0</td><td>2,0</td><td>12,5</td><td>2320,0</td><td>3,0</td><td>4,0</td><td>3,0</td><td>2,0</td><td>3,0</td><td>1,0</td><td>1,0</td><td>1,0</td><td>1,0</td></tr>\n",
"<tr><th>75%</th><td>61,8</td><td>1,0</td><td>44,0</td><td>2,0</td><td>3,0</td><td>18,2</td><td>2808,0</td><td>4,0</td><td>5,0</td><td>4,0</td><td>3,0</td><td>4,0</td><td>1,0</td><td>1,0</td><td>1,0</td><td>1,0</td></tr>\n",
"<tr><th>max</th><td>82,0</td><td>2,0</td><td>61,0</td><td>2,0</td><td>4,0</td><td>36,0</td><td>6278,0</td><td>5,0</td><td>5,0</td><td>5,0</td><td>5,0</td><td>5,0</td><td>1,0</td><td>1,0</td><td>1,0</td><td>1,0</td></tr>\n",
"</tbody>\n",
"</table>\n"
],
"text/plain": [
""
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Open the data\n",
"df = pd.read_excel('https://taanila.fi/data1.xlsx')\n",
"\n",
"# Compute summary statistics and show them with one decimal, using a decimal comma\n",
"df.describe().style.format('{:.1f}', decimal=',')"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<table>\n",
"<thead>\n",
"<tr><th>&nbsp;</th><th>f</th><th>%</th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>Peruskoulu</th><td>27</td><td>33,3 %</td></tr>\n",
"<tr><th>2. aste</th><td>30</td><td>37,0 %</td></tr>\n",
"<tr><th>Korkeakoulu</th><td>22</td><td>27,2 %</td></tr>\n",
"<tr><th>Ylempi korkeakoulu</th><td>2</td><td>2,5 %</td></tr>\n",
"</tbody>\n",
"</table>\n"
],
"text/plain": [
""
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Frequency table of education level (koulutus)\n",
"\n",
"df1 = pd.crosstab(df['koulutus'], 'f')\n",
"df1['%'] = df1['f'] / df1['f'].sum() * 100\n",
"df1.columns.name = ''\n",
"df1.index = ['Peruskoulu', '2. aste', 'Korkeakoulu', 'Ylempi korkeakoulu']\n",
"\n",
"# Frequencies without decimals, percentages with one decimal, comma as the decimal separator\n",
"# (the format keys must match the column names 'f' and '%')\n",
"df1.style.format({'f':'{:.0f}', '%':'{:.1f} %'}, decimal=',')"
]
},
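{
"cell_type": "markdown",
"metadata": {},
"source": [
"The **decimal** parameter can also be tried without downloading the data file. The following is a minimal sketch, assuming pandas 1.3.0 or newer with jinja2 installed; the DataFrame `example` and its column names are made up for illustration and are not part of the original data.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Hypothetical example data (not from the original notebook)\n",
"example = pd.DataFrame({'price': [1234.5, 99.0], 'share': [12.25, 87.75]})\n",
"\n",
"# One decimal for 'price', two for 'share', comma as the decimal separator\n",
"styled = example.style.format({'price': '{:.1f}', 'share': '{:.2f}'}, decimal=',')\n",
"\n",
"# to_html() returns the rendered table as a string containing the formatted values\n",
"html = styled.to_html()\n",
"```\n",
"\n",
"Only the rendered text changes; the values in `example` remain ordinary floats, so further calculations are unaffected."
]
},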
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Older pandas versions"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The decimal parameter of style.format is available from pandas version 1.3.0 onward. In older versions, the decimal points have to be changed to commas in another way.\n",
"\n",
"Below, the **pilkut** function rounds (round) a number to one decimal place, converts it to a string (str), and replaces the point with a comma.\n",
"\n",
"If x is not a number, round raises an error. Here the error is caught with exception handling (try - except), so non-numeric values are converted to strings without rounding.\n",
"\n",
"The **pilkutp** function additionally appends a space and a % sign."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"def pilkut(x):\n",
"    # Round to one decimal if x is a number; leave non-numeric values unchanged\n",
"    try:\n",
"        x = round(x, 1)\n",
"    except TypeError:\n",
"        pass\n",
"    # Convert to a string and replace the decimal point with a comma\n",
"    return str(x).replace('.', ',')\n",
"\n",
"def pilkutp(x):\n",
"    # Same as pilkut, with a space and a % sign appended\n",
"    return pilkut(x) + ' %'"
]
},
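{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a variation on the two functions above, the number of decimals and the suffix can be made parameters. This is only a sketch: `format_comma` and its parameters `decimals` and `suffix` are hypothetical names, not part of the original notebook or of pandas. It uses fixed-point string formatting instead of round, so trailing zeros are kept.\n",
"\n",
"```python\n",
"def format_comma(x, decimals=1, suffix=''):\n",
"    # Hypothetical helper: format a number with a decimal comma\n",
"    try:\n",
"        # Fixed-point formatting keeps trailing zeros, e.g. 2 -> '2.00' for decimals=2\n",
"        text = f'{x:.{decimals}f}'\n",
"    except (TypeError, ValueError):\n",
"        # Not a number (for example a string label): return it as text, without the suffix\n",
"        return str(x)\n",
"    return text.replace('.', ',') + suffix\n",
"\n",
"print(format_comma(33.3333, 1, ' %'))  # 33,3 %\n",
"print(format_comma(2, 2))              # 2,00\n",
"print(format_comma('n/a'))             # n/a\n",
"```\n",
"\n",
"Unlike pilkutp, this sketch leaves non-numeric values without the suffix, which may or may not be the desired behavior."
]
},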
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"<thead>\n",
"<tr><th></th><th>nro</th><th>sukup</th><th>ikä</th><th>perhe</th><th>koulutus</th><th>palveluv</th><th>palkka</th><th>johto</th><th>työtov</th><th>työymp</th><th>palkkat</th><th>työteht</th><th>työterv</th><th>lomaosa</th><th>kuntosa</th><th>hieroja</th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>count</th><td>82,0</td><td>82,0</td><td>82,0</td><td>82,0</td><td>81,0</td><td>80,0</td><td>82,0</td><td>82,0</td><td>81,0</td><td>82,0</td><td>82,0</td><td>82,0</td><td>47,0</td><td>20,0</td><td>9,0</td><td>22,0</td></tr>\n",
"<tr><th>mean</th><td>41,5</td><td>1,2</td><td>38,0</td><td>1,6</td><td>2,0</td><td>12,2</td><td>2563,9</td><td>3,1</td><td>4,1</td><td>3,2</td><td>2,1</td><td>3,2</td><td>1,0</td><td>1,0</td><td>1,0</td><td>1,0</td></tr>\n",
"<tr><th>std</th><td>23,8</td><td>0,4</td><td>9,8</td><td>0,5</td><td>0,8</td><td>8,8</td><td>849,4</td><td>1,1</td><td>0,8</td><td>1,2</td><td>1,1</td><td>1,0</td><td>0,0</td><td>0,0</td><td>0,0</td><td>0,0</td></tr>\n",
"<tr><th>min</th><td>1,0</td><td>1,0</td><td>20,0</td><td>1,0</td><td>1,0</td><td>0,0</td><td>1521,0</td><td>1,0</td><td>2,0</td><td>1,0</td><td>1,0</td><td>1,0</td><td>1,0</td><td>1,0</td><td>1,0</td><td>1,0</td></tr>\n",
"<tr><th>25%</th><td>21,2</td><td>1,0</td><td>31,0</td><td>1,0</td><td>1,0</td><td>3,8</td><td>2027,0</td><td>2,0</td><td>4,0</td><td>3,0</td><td>1,0</td><td>3,0</td><td>1,0</td><td>1,0</td><td>1,0</td><td>1,0</td></tr>\n",
"<tr><th>50%</th><td>41,5</td><td>1,0</td><td>37,5</td><td>2,0</td><td>2,0</td><td>12,5</td><td>2320,0</td><td>3,0</td><td>4,0</td><td>3,0</td><td>2,0</td><td>3,0</td><td>1,0</td><td>1,0</td><td>1,0</td><td>1,0</td></tr>\n",
"<tr><th>75%</th><td>61,8</td><td>1,0</td><td>44,0</td><td>2,0</td><td>3,0</td><td>18,2</td><td>2808,0</td><td>4,0</td><td>5,0</td><td>4,0</td><td>3,0</td><td>4,0</td><td>1,0</td><td>1,0</td><td>1,0</td><td>1,0</td></tr>\n",
"<tr><th>max</th><td>82,0</td><td>2,0</td><td>61,0</td><td>2,0</td><td>4,0</td><td>36,0</td><td>6278,0</td><td>5,0</td><td>5,0</td><td>5,0</td><td>5,0</td><td>5,0</td><td>1,0</td><td>1,0</td><td>1,0</td><td>1,0</td></tr>\n",
"</tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" nro sukup ikä perhe koulutus palveluv palkka johto työtov työymp \\\n",
"count 82,0 82,0 82,0 82,0 81,0 80,0 82,0 82,0 81,0 82,0 \n",
"mean 41,5 1,2 38,0 1,6 2,0 12,2 2563,9 3,1 4,1 3,2 \n",
"std 23,8 0,4 9,8 0,5 0,8 8,8 849,4 1,1 0,8 1,2 \n",
"min 1,0 1,0 20,0 1,0 1,0 0,0 1521,0 1,0 2,0 1,0 \n",
"25% 21,2 1,0 31,0 1,0 1,0 3,8 2027,0 2,0 4,0 3,0 \n",
"50% 41,5 1,0 37,5 2,0 2,0 12,5 2320,0 3,0 4,0 3,0 \n",
"75% 61,8 1,0 44,0 2,0 3,0 18,2 2808,0 4,0 5,0 4,0 \n",
"max 82,0 2,0 61,0 2,0 4,0 36,0 6278,0 5,0 5,0 5,0 \n",
"\n",
" palkkat työteht työterv lomaosa kuntosa hieroja \n",
"count 82,0 82,0 47,0 20,0 9,0 22,0 \n",
"mean 2,1 3,2 1,0 1,0 1,0 1,0 \n",
"std 1,1 1,0 0,0 0,0 0,0 0,0 \n",
"min 1,0 1,0 1,0 1,0 1,0 1,0 \n",
"25% 1,0 3,0 1,0 1,0 1,0 1,0 \n",
"50% 2,0 3,0 1,0 1,0 1,0 1,0 \n",
"75% 3,0 4,0 1,0 1,0 1,0 1,0 \n",
"max 5,0 5,0 1,0 1,0 1,0 1,0 "
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Compute summary statistics\n",
"df2 = df.describe()\n",
"\n",
"# Replace decimal points with commas (the values become strings)\n",
"for col in df2.columns:\n",
"    df2[col] = df2[col].apply(pilkut)\n",
"\n",
"df2"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"<thead>\n",
"<tr><th></th><th>f</th><th>%</th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>Peruskoulu</th><td>27</td><td>33.333333</td></tr>\n",
"<tr><th>2. aste</th><td>30</td><td>37.037037</td></tr>\n",
"<tr><th>Korkeakoulu</th><td>22</td><td>27.160494</td></tr>\n",
"<tr><th>Ylempi korkeakoulu</th><td>2</td><td>2.469136</td></tr>\n",
"</tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" f %\n",
"Peruskoulu 27 33.333333\n",
"2. aste 30 37.037037\n",
"Korkeakoulu 22 27.160494\n",
"Ylempi korkeakoulu 2 2.469136"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# The frequency table of education level created earlier\n",
"df1"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"<thead>\n",
"<tr><th></th><th>f</th><th>%</th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>Peruskoulu</th><td>27</td><td>33,3 %</td></tr>\n",
"<tr><th>2. aste</th><td>30</td><td>37,0 %</td></tr>\n",
"<tr><th>Korkeakoulu</th><td>22</td><td>27,2 %</td></tr>\n",
"<tr><th>Ylempi korkeakoulu</th><td>2</td><td>2,5 %</td></tr>\n",
"</tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" f %\n",
"Peruskoulu 27 33,3 %\n",
"2. aste 30 37,0 %\n",
"Korkeakoulu 22 27,2 %\n",
"Ylempi korkeakoulu 2 2,5 %"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Points to commas (df1 is changed permanently)\n",
"\n",
"df1['%'] = df1['%'].apply(pilkutp)\n",
"df1"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## More information\n",
"\n",
"Data analytics with Python: https://tilastoapu.wordpress.com/python/"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.9"
}
},
"nbformat": 4,
"nbformat_minor": 2
}