{ "cells": [ { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Preparing Data" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## The Work Behind the Scenes\n", "- Preparing data (*data cleaning*) is often the most time-consuming part of an analysis\n", "- Data preparation methods are rarely taught in class\n", "- Yet mistakes made while preparing data can change the results of the analysis" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Loading data from a file into R" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "With the `rio` library, we can import data into R from many formats: `txt`, `csv`, `xls`, `xlsx`, `dbf`, `sav`, `dta`, `sas7bdat`. The imported data becomes an R object of type `data.frame`.\n", "\n", "Install the `rio` package first if it has not been installed before, then load it with the `library` command. " ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [], "source": [ "# install.packages('rio')  # uncomment to install rio the first time\n", "library(rio)\n", "datakab = import('https://raw.githubusercontent.com/msaidf/statek/master/content/indo-dapoer_data.csv')" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Inspecting the Data" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "Display the entire dataset by entering the name of the data object. For large data, however, this is not very helpful and can even take a long time. 
" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "datakab" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Make it a habit to first check the number of rows in the data table" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "12017" ], "text/latex": [ "12017" ], "text/markdown": [ "12017" ], "text/plain": [ "[1] 12017" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "nrow(datakab)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "
Use `head` and `tail` to display the first and last `n` rows " ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\n", "
Region NameRegion CodeSeries NameSeries Code2000 [YR2000]2001 [YR2001]2002 [YR2002]2003 [YR2003]2004 [YR2004]2005 [YR2005]2006 [YR2006]2007 [YR2007]2008 [YR2008]2009 [YR2009]2010 [YR2010]2011 [YR2011]2012 [YR2012]2013 [YR2013]2014 [YR2014]
Aceh Barat Daya, Kab. IDN_Aceh_Barat_Daya_Kab_73623 Human Development Index IDX.HDI .. .. .. .. 65.87778 66.86649 67.52173 68.36661 69.38033 .. .. 70.95 .. 72.07 ..
Aceh Barat Daya, Kab. IDN_Aceh_Barat_Daya_Kab_73623 Morbidity Rate (in %) SH.MORB.ZS .. .. .. .. 29.19532 .. 33.22042 35.90795 31.811 29.2377 30.0167 33.93033 30.5736273527145 29.909548163414 ..
Aceh Barat Daya, Kab. IDN_Aceh_Barat_Daya_Kab_73623 Net Enrollment Ratio: Primary (in %)SE.PRM.NENR.ZS .. .. .. .. 94.65 .. 94.89 94.34 96.21 96.55 98.32 86.16 90.96 95.395058 97.03
\n" ], "text/latex": [ "\\begin{tabular}{r|lllllllllllllllllll}\n", " Region Name & Region Code & Series Name & Series Code & 2000 {[}YR2000{]} & 2001 {[}YR2001{]} & 2002 {[}YR2002{]} & 2003 {[}YR2003{]} & 2004 {[}YR2004{]} & 2005 {[}YR2005{]} & 2006 {[}YR2006{]} & 2007 {[}YR2007{]} & 2008 {[}YR2008{]} & 2009 {[}YR2009{]} & 2010 {[}YR2010{]} & 2011 {[}YR2011{]} & 2012 {[}YR2012{]} & 2013 {[}YR2013{]} & 2014 {[}YR2014{]}\\\\\n", "\\hline\n", "\t Aceh Barat Daya, Kab. & IDN\\_Aceh\\_Barat\\_Daya\\_Kab\\_73623 & Human Development Index & IDX.HDI & .. & .. & .. & .. & 65.87778 & 66.86649 & 67.52173 & 68.36661 & 69.38033 & .. & .. & 70.95 & .. & 72.07 & .. \\\\\n", "\t Aceh Barat Daya, Kab. & IDN\\_Aceh\\_Barat\\_Daya\\_Kab\\_73623 & Morbidity Rate (in \\%) & SH.MORB.ZS & .. & .. & .. & .. & 29.19532 & .. & 33.22042 & 35.90795 & 31.811 & 29.2377 & 30.0167 & 33.93033 & 30.5736273527145 & 29.909548163414 & .. \\\\\n", "\t Aceh Barat Daya, Kab. & IDN\\_Aceh\\_Barat\\_Daya\\_Kab\\_73623 & Net Enrollment Ratio: Primary (in \\%) & SE.PRM.NENR.ZS & .. & .. & .. & .. & 94.65 & .. & 94.89 & 94.34 & 96.21 & 96.55 & 98.32 & 86.16 & 90.96 & 95.395058 & 97.03 \\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "Region Name | Region Code | Series Name | Series Code | 2000 [YR2000] | 2001 [YR2001] | 2002 [YR2002] | 2003 [YR2003] | 2004 [YR2004] | 2005 [YR2005] | 2006 [YR2006] | 2007 [YR2007] | 2008 [YR2008] | 2009 [YR2009] | 2010 [YR2010] | 2011 [YR2011] | 2012 [YR2012] | 2013 [YR2013] | 2014 [YR2014] | \n", "|---|---|---|\n", "| Aceh Barat Daya, Kab. | IDN_Aceh_Barat_Daya_Kab_73623 | Human Development Index | IDX.HDI | .. | .. | .. | .. | 65.87778 | 66.86649 | 67.52173 | 68.36661 | 69.38033 | .. | .. | 70.95 | .. | 72.07 | .. | \n", "| Aceh Barat Daya, Kab. | IDN_Aceh_Barat_Daya_Kab_73623 | Morbidity Rate (in %) | SH.MORB.ZS | .. | .. | .. | .. | 29.19532 | .. | 33.22042 | 35.90795 | 31.811 | 29.2377 | 30.0167 | 33.93033 | 30.5736273527145 | 29.909548163414 | .. 
| \n", "| Aceh Barat Daya, Kab. | IDN_Aceh_Barat_Daya_Kab_73623 | Net Enrollment Ratio: Primary (in %) | SE.PRM.NENR.ZS | .. | .. | .. | .. | 94.65 | .. | 94.89 | 94.34 | 96.21 | 96.55 | 98.32 | 86.16 | 90.96 | 95.395058 | 97.03 | \n", "\n", "\n" ], "text/plain": [ " Region Name Region Code \n", "1 Aceh Barat Daya, Kab. IDN_Aceh_Barat_Daya_Kab_73623\n", "2 Aceh Barat Daya, Kab. IDN_Aceh_Barat_Daya_Kab_73623\n", "3 Aceh Barat Daya, Kab. IDN_Aceh_Barat_Daya_Kab_73623\n", " Series Name Series Code 2000 [YR2000]\n", "1 Human Development Index IDX.HDI .. \n", "2 Morbidity Rate (in %) SH.MORB.ZS .. \n", "3 Net Enrollment Ratio: Primary (in %) SE.PRM.NENR.ZS .. \n", " 2001 [YR2001] 2002 [YR2002] 2003 [YR2003] 2004 [YR2004] 2005 [YR2005]\n", "1 .. .. .. 65.87778 66.86649 \n", "2 .. .. .. 29.19532 .. \n", "3 .. .. .. 94.65 .. \n", " 2006 [YR2006] 2007 [YR2007] 2008 [YR2008] 2009 [YR2009] 2010 [YR2010]\n", "1 67.52173 68.36661 69.38033 .. .. \n", "2 33.22042 35.90795 31.811 29.2377 30.0167 \n", "3 94.89 94.34 96.21 96.55 98.32 \n", " 2011 [YR2011] 2012 [YR2012] 2013 [YR2013] 2014 [YR2014]\n", "1 70.95 .. 72.07 .. \n", "2 33.93033 30.5736273527145 29.909548163414 .. \n", "3 86.16 90.96 95.395058 97.03 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "head(datakab, n = 3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
`n` is an optional argument; when it is omitted, `head` and `tail` show 6 rows" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\n", "
Region NameRegion CodeSeries NameSeries Code2000 [YR2000]2001 [YR2001]2002 [YR2002]2003 [YR2003]2004 [YR2004]2005 [YR2005]2006 [YR2006]2007 [YR2007]2008 [YR2008]2009 [YR2009]2010 [YR2010]2011 [YR2011]2012 [YR2012]2013 [YR2013]2014 [YR2014]
12012Yogyakarta, Kota IDN_Yogyakarta_Kota_17983 Villages with road: Other (in % of total villages) ROD.VILG.OTHR.ZS .. .. .. .. .. .. .. .. .. .. .. .. .. .. ..
12013
12014
12015
12016Data from database: INDO-DAPOER (Indonesia Database for Policy and Economic Research)
12017Last Updated: 05/28/2015
\n" ], "text/latex": [ "\\begin{tabular}{r|lllllllllllllllllll}\n", " & Region Name & Region Code & Series Name & Series Code & 2000 {[}YR2000{]} & 2001 {[}YR2001{]} & 2002 {[}YR2002{]} & 2003 {[}YR2003{]} & 2004 {[}YR2004{]} & 2005 {[}YR2005{]} & 2006 {[}YR2006{]} & 2007 {[}YR2007{]} & 2008 {[}YR2008{]} & 2009 {[}YR2009{]} & 2010 {[}YR2010{]} & 2011 {[}YR2011{]} & 2012 {[}YR2012{]} & 2013 {[}YR2013{]} & 2014 {[}YR2014{]}\\\\\n", "\\hline\n", "\t12012 & Yogyakarta, Kota & IDN\\_Yogyakarta\\_Kota\\_17983 & Villages with road: Other (in \\% of total villages) & ROD.VILG.OTHR.ZS & .. & .. & .. & .. & .. & .. & .. & .. & .. & .. & .. & .. & .. & .. & .. \\\\\n", "\t12013 & & & & & & & & & & & & & & & & & & & \\\\\n", "\t12014 & & & & & & & & & & & & & & & & & & & \\\\\n", "\t12015 & & & & & & & & & & & & & & & & & & & \\\\\n", "\t12016 & Data from database: INDO-DAPOER (Indonesia Database for Policy and Economic Research) & & & & & & & & & & & & & & & & & & \\\\\n", "\t12017 & Last Updated: 05/28/2015 & & & & & & & & & & & & & & & & & & \\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "| | Region Name | Region Code | Series Name | Series Code | 2000 [YR2000] | 2001 [YR2001] | 2002 [YR2002] | 2003 [YR2003] | 2004 [YR2004] | 2005 [YR2005] | 2006 [YR2006] | 2007 [YR2007] | 2008 [YR2008] | 2009 [YR2009] | 2010 [YR2010] | 2011 [YR2011] | 2012 [YR2012] | 2013 [YR2013] | 2014 [YR2014] | \n", "|---|---|---|---|---|---|\n", "| 12012 | Yogyakarta, Kota | IDN_Yogyakarta_Kota_17983 | Villages with road: Other (in % of total villages) | ROD.VILG.OTHR.ZS | .. | .. | .. | .. | .. | .. | .. | .. | .. | .. | .. | .. | .. | .. | .. 
| \n", "| 12013 | | | | | | | | | | | | | | | | | | | | \n", "| 12014 | | | | | | | | | | | | | | | | | | | | \n", "| 12015 | | | | | | | | | | | | | | | | | | | | \n", "| 12016 | Data from database: INDO-DAPOER (Indonesia Database for Policy and Economic Research) | | | | | | | | | | | | | | | | | | | \n", "| 12017 | Last Updated: 05/28/2015 | | | | | | | | | | | | | | | | | | | \n", "\n", "\n" ], "text/plain": [ " Region Name \n", "12012 Yogyakarta, Kota \n", "12013 \n", "12014 \n", "12015 \n", "12016 Data from database: INDO-DAPOER (Indonesia Database for Policy and Economic Research)\n", "12017 Last Updated: 05/28/2015 \n", " Region Code \n", "12012 IDN_Yogyakarta_Kota_17983\n", "12013 \n", "12014 \n", "12015 \n", "12016 \n", "12017 \n", " Series Name Series Code \n", "12012 Villages with road: Other (in % of total villages) ROD.VILG.OTHR.ZS\n", "12013 \n", "12014 \n", "12015 \n", "12016 \n", "12017 \n", " 2000 [YR2000] 2001 [YR2001] 2002 [YR2002] 2003 [YR2003] 2004 [YR2004]\n", "12012 .. .. .. .. .. \n", "12013 \n", "12014 \n", "12015 \n", "12016 \n", "12017 \n", " 2005 [YR2005] 2006 [YR2006] 2007 [YR2007] 2008 [YR2008] 2009 [YR2009]\n", "12012 .. .. .. .. .. \n", "12013 \n", "12014 \n", "12015 \n", "12016 \n", "12017 \n", " 2010 [YR2010] 2011 [YR2011] 2012 [YR2012] 2013 [YR2013] 2014 [YR2014]\n", "12012 .. .. .. .. .. \n", "12013 \n", "12014 \n", "12015 \n", "12016 \n", "12017 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "tail(datakab)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
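The optional `n` argument can be checked on a small table. This is a minimal sketch using a hypothetical 10-row data frame `toy`, which is not part of the dataset above; it also shows that a negative `n` means everything except that many rows.

```r
# Hypothetical 10-row table, used only for illustration
toy <- data.frame(id = 1:10, value = (1:10)^2)

first_default <- head(toy)             # n omitted: the default of 6 rows
last_three <- tail(toy, n = 3)         # the last 3 rows
all_but_last_two <- head(toy, n = -2)  # negative n: all rows except the last 2

nrow(first_default)
last_three$id
nrow(all_but_last_two)
```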
The last five rows of the table are not data, so we need to remove them. We can use a **positive** index to select the rows to **keep**." ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\n", "
Region NameRegion CodeSeries NameSeries Code2000 [YR2000]2001 [YR2001]2002 [YR2002]2003 [YR2003]2004 [YR2004]2005 [YR2005]2006 [YR2006]2007 [YR2007]2008 [YR2008]2009 [YR2009]2010 [YR2010]2011 [YR2011]2012 [YR2012]2013 [YR2013]2014 [YR2014]
12011Yogyakarta, Kota IDN_Yogyakarta_Kota_17983 Villages with road: Gravel (in % of total villages)ROD.VILG.GRAVL.ZS .. .. .. .. .. .. .. .. .. .. .. .. .. .. ..
12012Yogyakarta, Kota IDN_Yogyakarta_Kota_17983 Villages with road: Other (in % of total villages) ROD.VILG.OTHR.ZS .. .. .. .. .. .. .. .. .. .. .. .. .. .. ..
\n" ], "text/latex": [ "\\begin{tabular}{r|lllllllllllllllllll}\n", " & Region Name & Region Code & Series Name & Series Code & 2000 {[}YR2000{]} & 2001 {[}YR2001{]} & 2002 {[}YR2002{]} & 2003 {[}YR2003{]} & 2004 {[}YR2004{]} & 2005 {[}YR2005{]} & 2006 {[}YR2006{]} & 2007 {[}YR2007{]} & 2008 {[}YR2008{]} & 2009 {[}YR2009{]} & 2010 {[}YR2010{]} & 2011 {[}YR2011{]} & 2012 {[}YR2012{]} & 2013 {[}YR2013{]} & 2014 {[}YR2014{]}\\\\\n", "\\hline\n", "\t12011 & Yogyakarta, Kota & IDN\\_Yogyakarta\\_Kota\\_17983 & Villages with road: Gravel (in \\% of total villages) & ROD.VILG.GRAVL.ZS & .. & .. & .. & .. & .. & .. & .. & .. & .. & .. & .. & .. & .. & .. & .. \\\\\n", "\t12012 & Yogyakarta, Kota & IDN\\_Yogyakarta\\_Kota\\_17983 & Villages with road: Other (in \\% of total villages) & ROD.VILG.OTHR.ZS & .. & .. & .. & .. & .. & .. & .. & .. & .. & .. & .. & .. & .. & .. & .. \\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "| | Region Name | Region Code | Series Name | Series Code | 2000 [YR2000] | 2001 [YR2001] | 2002 [YR2002] | 2003 [YR2003] | 2004 [YR2004] | 2005 [YR2005] | 2006 [YR2006] | 2007 [YR2007] | 2008 [YR2008] | 2009 [YR2009] | 2010 [YR2010] | 2011 [YR2011] | 2012 [YR2012] | 2013 [YR2013] | 2014 [YR2014] | \n", "|---|---|\n", "| 12011 | Yogyakarta, Kota | IDN_Yogyakarta_Kota_17983 | Villages with road: Gravel (in % of total villages) | ROD.VILG.GRAVL.ZS | .. | .. | .. | .. | .. | .. | .. | .. | .. | .. | .. | .. | .. | .. | .. | \n", "| 12012 | Yogyakarta, Kota | IDN_Yogyakarta_Kota_17983 | Villages with road: Other (in % of total villages) | ROD.VILG.OTHR.ZS | .. | .. | .. | .. | .. | .. | .. | .. | .. | .. | .. | .. | .. | .. | .. 
| \n", "\n", "\n" ], "text/plain": [ " Region Name Region Code \n", "12011 Yogyakarta, Kota IDN_Yogyakarta_Kota_17983\n", "12012 Yogyakarta, Kota IDN_Yogyakarta_Kota_17983\n", " Series Name Series Code \n", "12011 Villages with road: Gravel (in % of total villages) ROD.VILG.GRAVL.ZS\n", "12012 Villages with road: Other (in % of total villages) ROD.VILG.OTHR.ZS \n", " 2000 [YR2000] 2001 [YR2001] 2002 [YR2002] 2003 [YR2003] 2004 [YR2004]\n", "12011 .. .. .. .. .. \n", "12012 .. .. .. .. .. \n", " 2005 [YR2005] 2006 [YR2006] 2007 [YR2007] 2008 [YR2008] 2009 [YR2009]\n", "12011 .. .. .. .. .. \n", "12012 .. .. .. .. .. \n", " 2010 [YR2010] 2011 [YR2011] 2012 [YR2012] 2013 [YR2013] 2014 [YR2014]\n", "12011 .. .. .. .. .. \n", "12012 .. .. .. .. .. " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "tail(datakab[1:12012,], 2)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "
Alternatively, use a **negative** index to select the rows to **drop**.\n", "> R evaluates nested function calls from the inside out, which makes the order of the steps hard to read and follow. For this reason, much R code now uses the *piping* operator `%>%` from the `magrittr` library. `f() %>% g()` means that the output of `f` becomes the first argument of `g`." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\n", "
Region NameRegion CodeSeries NameSeries Code2000 [YR2000]2001 [YR2001]2002 [YR2002]2003 [YR2003]2004 [YR2004]2005 [YR2005]2006 [YR2006]2007 [YR2007]2008 [YR2008]2009 [YR2009]2010 [YR2010]2011 [YR2011]2012 [YR2012]2013 [YR2013]2014 [YR2014]
12010Yogyakarta, Kota IDN_Yogyakarta_Kota_17983 Villages with road: Dirt (in % of total villages) ROD.VILG.DIRT.ZS .. .. .. .. .. .. .. .. .. .. .. .. .. .. ..
12011Yogyakarta, Kota IDN_Yogyakarta_Kota_17983 Villages with road: Gravel (in % of total villages)ROD.VILG.GRAVL.ZS .. .. .. .. .. .. .. .. .. .. .. .. .. .. ..
12012Yogyakarta, Kota IDN_Yogyakarta_Kota_17983 Villages with road: Other (in % of total villages) ROD.VILG.OTHR.ZS .. .. .. .. .. .. .. .. .. .. .. .. .. .. ..
\n" ], "text/latex": [ "\\begin{tabular}{r|lllllllllllllllllll}\n", " & Region Name & Region Code & Series Name & Series Code & 2000 {[}YR2000{]} & 2001 {[}YR2001{]} & 2002 {[}YR2002{]} & 2003 {[}YR2003{]} & 2004 {[}YR2004{]} & 2005 {[}YR2005{]} & 2006 {[}YR2006{]} & 2007 {[}YR2007{]} & 2008 {[}YR2008{]} & 2009 {[}YR2009{]} & 2010 {[}YR2010{]} & 2011 {[}YR2011{]} & 2012 {[}YR2012{]} & 2013 {[}YR2013{]} & 2014 {[}YR2014{]}\\\\\n", "\\hline\n", "\t12010 & Yogyakarta, Kota & IDN\\_Yogyakarta\\_Kota\\_17983 & Villages with road: Dirt (in \\% of total villages) & ROD.VILG.DIRT.ZS & .. & .. & .. & .. & .. & .. & .. & .. & .. & .. & .. & .. & .. & .. & .. \\\\\n", "\t12011 & Yogyakarta, Kota & IDN\\_Yogyakarta\\_Kota\\_17983 & Villages with road: Gravel (in \\% of total villages) & ROD.VILG.GRAVL.ZS & .. & .. & .. & .. & .. & .. & .. & .. & .. & .. & .. & .. & .. & .. & .. \\\\\n", "\t12012 & Yogyakarta, Kota & IDN\\_Yogyakarta\\_Kota\\_17983 & Villages with road: Other (in \\% of total villages) & ROD.VILG.OTHR.ZS & .. & .. & .. & .. & .. & .. & .. & .. & .. & .. & .. & .. & .. & .. & .. \\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "| | Region Name | Region Code | Series Name | Series Code | 2000 [YR2000] | 2001 [YR2001] | 2002 [YR2002] | 2003 [YR2003] | 2004 [YR2004] | 2005 [YR2005] | 2006 [YR2006] | 2007 [YR2007] | 2008 [YR2008] | 2009 [YR2009] | 2010 [YR2010] | 2011 [YR2011] | 2012 [YR2012] | 2013 [YR2013] | 2014 [YR2014] | \n", "|---|---|---|\n", "| 12010 | Yogyakarta, Kota | IDN_Yogyakarta_Kota_17983 | Villages with road: Dirt (in % of total villages) | ROD.VILG.DIRT.ZS | .. | .. | .. | .. | .. | .. | .. | .. | .. | .. | .. | .. | .. | .. | .. | \n", "| 12011 | Yogyakarta, Kota | IDN_Yogyakarta_Kota_17983 | Villages with road: Gravel (in % of total villages) | ROD.VILG.GRAVL.ZS | .. | .. | .. | .. | .. | .. | .. | .. | .. | .. | .. | .. | .. | .. | .. 
| \n", "| 12012 | Yogyakarta, Kota | IDN_Yogyakarta_Kota_17983 | Villages with road: Other (in % of total villages) | ROD.VILG.OTHR.ZS | .. | .. | .. | .. | .. | .. | .. | .. | .. | .. | .. | .. | .. | .. | .. | \n", "\n", "\n" ], "text/plain": [ " Region Name Region Code \n", "12010 Yogyakarta, Kota IDN_Yogyakarta_Kota_17983\n", "12011 Yogyakarta, Kota IDN_Yogyakarta_Kota_17983\n", "12012 Yogyakarta, Kota IDN_Yogyakarta_Kota_17983\n", " Series Name Series Code \n", "12010 Villages with road: Dirt (in % of total villages) ROD.VILG.DIRT.ZS \n", "12011 Villages with road: Gravel (in % of total villages) ROD.VILG.GRAVL.ZS\n", "12012 Villages with road: Other (in % of total villages) ROD.VILG.OTHR.ZS \n", " 2000 [YR2000] 2001 [YR2001] 2002 [YR2002] 2003 [YR2003] 2004 [YR2004]\n", "12010 .. .. .. .. .. \n", "12011 .. .. .. .. .. \n", "12012 .. .. .. .. .. \n", " 2005 [YR2005] 2006 [YR2006] 2007 [YR2007] 2008 [YR2008] 2009 [YR2009]\n", "12010 .. .. .. .. .. \n", "12011 .. .. .. .. .. \n", "12012 .. .. .. .. .. \n", " 2010 [YR2010] 2011 [YR2011] 2012 [YR2012] 2013 [YR2013] 2014 [YR2014]\n", "12010 .. .. .. .. .. \n", "12011 .. .. .. .. .. \n", "12012 .. .. .. .. .. " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "library(magrittr)\n", "datakab[-12013:-12017,] %>% tail(3)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "
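Positive indexing, negative indexing, and the pipe can be compared side by side. This is a minimal sketch, assuming a hypothetical table `toy` whose last two rows are footer notes rather than data; the names and values are made up for illustration.

```r
library(magrittr)

# Hypothetical table: 4 data rows followed by 2 footer rows
toy <- data.frame(
  region = c('A', 'B', 'C', 'D', 'Data from database: example', 'Last Updated: example'),
  code = c('X1', 'X2', 'X3', 'X4', '', ''),
  stringsAsFactors = FALSE
)

kept_positive <- toy[1:4, ]     # positive index: rows to keep
kept_negative <- toy[-(5:6), ]  # negative index: rows to drop

# The nested call and the piped call run the same steps
nested <- tail(toy[-(5:6), ], 2)
piped <- toy[-(5:6), ] %>% tail(2)

all.equal(kept_positive, kept_negative)
all.equal(nested, piped)
```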
A subset can also be created by filtering the data table so that it returns only the rows whose variable values meet a given criterion. We can use the `which` function to produce the indices of the rows that meet the criterion." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "> - The dot in the subset function `[.,]` tells the pipe operator that the output of the preceding function goes in as the argument at the position of the dot\n", "> - `$` selects the column/variable `Series Code`, which is a component of the `datakab` table\n", "> - a variable name needs to be wrapped in *backticks* (```) only if it is not a syntactically valid name, for example when it contains a space " ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\n", "
Region NameRegion CodeSeries NameSeries Code2000 [YR2000]2001 [YR2001]2002 [YR2002]2003 [YR2003]2004 [YR2004]2005 [YR2005]2006 [YR2006]2007 [YR2007]2008 [YR2008]2009 [YR2009]2010 [YR2010]2011 [YR2011]2012 [YR2012]2013 [YR2013]2014 [YR2014]
12011Yogyakarta, Kota IDN_Yogyakarta_Kota_17983 Villages with road: Gravel (in % of total villages)ROD.VILG.GRAVL.ZS .. .. .. .. .. .. .. .. .. .. .. .. .. .. ..
12012Yogyakarta, Kota IDN_Yogyakarta_Kota_17983 Villages with road: Other (in % of total villages) ROD.VILG.OTHR.ZS .. .. .. .. .. .. .. .. .. .. .. .. .. .. ..
\n" ], "text/latex": [ "\\begin{tabular}{r|lllllllllllllllllll}\n", " & Region Name & Region Code & Series Name & Series Code & 2000 {[}YR2000{]} & 2001 {[}YR2001{]} & 2002 {[}YR2002{]} & 2003 {[}YR2003{]} & 2004 {[}YR2004{]} & 2005 {[}YR2005{]} & 2006 {[}YR2006{]} & 2007 {[}YR2007{]} & 2008 {[}YR2008{]} & 2009 {[}YR2009{]} & 2010 {[}YR2010{]} & 2011 {[}YR2011{]} & 2012 {[}YR2012{]} & 2013 {[}YR2013{]} & 2014 {[}YR2014{]}\\\\\n", "\\hline\n", "\t12011 & Yogyakarta, Kota & IDN\\_Yogyakarta\\_Kota\\_17983 & Villages with road: Gravel (in \\% of total villages) & ROD.VILG.GRAVL.ZS & .. & .. & .. & .. & .. & .. & .. & .. & .. & .. & .. & .. & .. & .. & .. \\\\\n", "\t12012 & Yogyakarta, Kota & IDN\\_Yogyakarta\\_Kota\\_17983 & Villages with road: Other (in \\% of total villages) & ROD.VILG.OTHR.ZS & .. & .. & .. & .. & .. & .. & .. & .. & .. & .. & .. & .. & .. & .. & .. \\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "| | Region Name | Region Code | Series Name | Series Code | 2000 [YR2000] | 2001 [YR2001] | 2002 [YR2002] | 2003 [YR2003] | 2004 [YR2004] | 2005 [YR2005] | 2006 [YR2006] | 2007 [YR2007] | 2008 [YR2008] | 2009 [YR2009] | 2010 [YR2010] | 2011 [YR2011] | 2012 [YR2012] | 2013 [YR2013] | 2014 [YR2014] | \n", "|---|---|\n", "| 12011 | Yogyakarta, Kota | IDN_Yogyakarta_Kota_17983 | Villages with road: Gravel (in % of total villages) | ROD.VILG.GRAVL.ZS | .. | .. | .. | .. | .. | .. | .. | .. | .. | .. | .. | .. | .. | .. | .. | \n", "| 12012 | Yogyakarta, Kota | IDN_Yogyakarta_Kota_17983 | Villages with road: Other (in % of total villages) | ROD.VILG.OTHR.ZS | .. | .. | .. | .. | .. | .. | .. | .. | .. | .. | .. | .. | .. | .. | .. 
| \n", "\n", "\n" ], "text/plain": [ " Region Name Region Code \n", "12011 Yogyakarta, Kota IDN_Yogyakarta_Kota_17983\n", "12012 Yogyakarta, Kota IDN_Yogyakarta_Kota_17983\n", " Series Name Series Code \n", "12011 Villages with road: Gravel (in % of total villages) ROD.VILG.GRAVL.ZS\n", "12012 Villages with road: Other (in % of total villages) ROD.VILG.OTHR.ZS \n", " 2000 [YR2000] 2001 [YR2001] 2002 [YR2002] 2003 [YR2003] 2004 [YR2004]\n", "12011 .. .. .. .. .. \n", "12012 .. .. .. .. .. \n", " 2005 [YR2005] 2006 [YR2006] 2007 [YR2007] 2008 [YR2008] 2009 [YR2009]\n", "12011 .. .. .. .. .. \n", "12012 .. .. .. .. .. \n", " 2010 [YR2010] 2011 [YR2011] 2012 [YR2012] 2013 [YR2013] 2014 [YR2014]\n", "12011 .. .. .. .. .. \n", "12012 .. .. .. .. .. " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "which(datakab$`Series Code` != \"\") %>% datakab[.,] %>% tail(2)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "
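`which` turns a logical condition into row positions, and the dot tells the pipe where to place them. The sketch below uses a hypothetical table `toy` with a column name that contains a space. It also illustrates one reason to prefer `which` over a bare logical vector: rows where the condition is `NA` are skipped instead of coming back as all-`NA` rows.

```r
library(magrittr)

# Hypothetical table; check.names = FALSE keeps the space in the column name
toy <- data.frame(
  `Series Code` = c('A.1', '', 'B.2', NA),
  value = c(10, 20, 30, 40),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

idx <- which(toy$`Series Code` != '')         # positions where the condition is TRUE
by_which <- idx %>% toy[., ]                  # the dot marks where idx is placed
by_logical <- toy[toy$`Series Code` != '', ]  # an NA condition adds an all-NA row

idx
nrow(by_which)
nrow(by_logical)
```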
Another way to select rows by a criterion is the `filter` function from the `dplyr` library. \n", "> - We can use a function from a library without loading it first by using `::`\n", "> - Compare the result of the command below when it is run without the `dplyr::` prefix\n", "> - The difference arises because the `filter` function that gets used then comes from another package (`stats`)\n", "> - If more than one function has the same name, R gives priority to the function from the library that was loaded last." ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\n", "
Region NameRegion CodeSeries NameSeries Code2000 [YR2000]2001 [YR2001]2002 [YR2002]2003 [YR2003]2004 [YR2004]2005 [YR2005]2006 [YR2006]2007 [YR2007]2008 [YR2008]2009 [YR2009]2010 [YR2010]2011 [YR2011]2012 [YR2012]2013 [YR2013]2014 [YR2014]
12011Yogyakarta, Kota IDN_Yogyakarta_Kota_17983 Villages with road: Gravel (in % of total villages)ROD.VILG.GRAVL.ZS .. .. .. .. .. .. .. .. .. .. .. .. .. .. ..
12012Yogyakarta, Kota IDN_Yogyakarta_Kota_17983 Villages with road: Other (in % of total villages) ROD.VILG.OTHR.ZS .. .. .. .. .. .. .. .. .. .. .. .. .. .. ..
\n" ], "text/latex": [ "\\begin{tabular}{r|lllllllllllllllllll}\n", " & Region Name & Region Code & Series Name & Series Code & 2000 {[}YR2000{]} & 2001 {[}YR2001{]} & 2002 {[}YR2002{]} & 2003 {[}YR2003{]} & 2004 {[}YR2004{]} & 2005 {[}YR2005{]} & 2006 {[}YR2006{]} & 2007 {[}YR2007{]} & 2008 {[}YR2008{]} & 2009 {[}YR2009{]} & 2010 {[}YR2010{]} & 2011 {[}YR2011{]} & 2012 {[}YR2012{]} & 2013 {[}YR2013{]} & 2014 {[}YR2014{]}\\\\\n", "\\hline\n", "\t12011 & Yogyakarta, Kota & IDN\\_Yogyakarta\\_Kota\\_17983 & Villages with road: Gravel (in \\% of total villages) & ROD.VILG.GRAVL.ZS & .. & .. & .. & .. & .. & .. & .. & .. & .. & .. & .. & .. & .. & .. & .. \\\\\n", "\t12012 & Yogyakarta, Kota & IDN\\_Yogyakarta\\_Kota\\_17983 & Villages with road: Other (in \\% of total villages) & ROD.VILG.OTHR.ZS & .. & .. & .. & .. & .. & .. & .. & .. & .. & .. & .. & .. & .. & .. & .. \\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "| | Region Name | Region Code | Series Name | Series Code | 2000 [YR2000] | 2001 [YR2001] | 2002 [YR2002] | 2003 [YR2003] | 2004 [YR2004] | 2005 [YR2005] | 2006 [YR2006] | 2007 [YR2007] | 2008 [YR2008] | 2009 [YR2009] | 2010 [YR2010] | 2011 [YR2011] | 2012 [YR2012] | 2013 [YR2013] | 2014 [YR2014] | \n", "|---|---|\n", "| 12011 | Yogyakarta, Kota | IDN_Yogyakarta_Kota_17983 | Villages with road: Gravel (in % of total villages) | ROD.VILG.GRAVL.ZS | .. | .. | .. | .. | .. | .. | .. | .. | .. | .. | .. | .. | .. | .. | .. | \n", "| 12012 | Yogyakarta, Kota | IDN_Yogyakarta_Kota_17983 | Villages with road: Other (in % of total villages) | ROD.VILG.OTHR.ZS | .. | .. | .. | .. | .. | .. | .. | .. | .. | .. | .. | .. | .. | .. | .. 
| \n", "\n", "\n" ], "text/plain": [ " Region Name Region Code \n", "12011 Yogyakarta, Kota IDN_Yogyakarta_Kota_17983\n", "12012 Yogyakarta, Kota IDN_Yogyakarta_Kota_17983\n", " Series Name Series Code \n", "12011 Villages with road: Gravel (in % of total villages) ROD.VILG.GRAVL.ZS\n", "12012 Villages with road: Other (in % of total villages) ROD.VILG.OTHR.ZS \n", " 2000 [YR2000] 2001 [YR2001] 2002 [YR2002] 2003 [YR2003] 2004 [YR2004]\n", "12011 .. .. .. .. .. \n", "12012 .. .. .. .. .. \n", " 2005 [YR2005] 2006 [YR2006] 2007 [YR2007] 2008 [YR2008] 2009 [YR2009]\n", "12011 .. .. .. .. .. \n", "12012 .. .. .. .. .. \n", " 2010 [YR2010] 2011 [YR2011] 2012 [YR2012] 2013 [YR2013] 2014 [YR2014]\n", "12011 .. .. .. .. .. \n", "12012 .. .. .. .. .. " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "dplyr::filter(datakab, `Series Code` != \"\") %>% tail(2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
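The effect of the `dplyr::` prefix can be seen without loading any package. This is a minimal sketch on a hypothetical table `toy`, assuming `dplyr` is installed: `stats::filter`, the function a bare `filter` refers to in a fresh session, applies a linear filter to a numeric series, while `dplyr::filter` selects rows.

```r
# Hypothetical table, used only for illustration
toy <- data.frame(code = c('A', '', 'B'), value = c(1, 2, 3),
                  stringsAsFactors = FALSE)

# dplyr::filter keeps the rows that satisfy the condition
rows_kept <- dplyr::filter(toy, code != '')

# stats::filter is a different function: here a 2-term moving sum of a series
moving_sum <- stats::filter(toy$value, rep(1, 2), sides = 1)

nrow(rows_kept)
as.numeric(moving_sum)
```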
Once you are sure the index produces the desired subset, save that subset as an object. Do not include `tail`, because we want to save the whole dataset, not just the bottom rows. " ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [], "source": [ "datakab = dplyr::filter(datakab, `Series Code` != \"\") " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Selecting variables\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As with selecting rows, we can select variables using a column index placed after the comma in the subset function `[]`" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\n", "
Region NameSeries NameSeries Code2000 [YR2000]2014 [YR2014]
12009Yogyakarta, Kota Villages with road: Asphalt (in % of total villages)ROD.VILG.ASPH.ZS 100 ..
12010Yogyakarta, Kota Villages with road: Dirt (in % of total villages) ROD.VILG.DIRT.ZS .. ..
12011Yogyakarta, Kota Villages with road: Gravel (in % of total villages) ROD.VILG.GRAVL.ZS .. ..
12012Yogyakarta, Kota Villages with road: Other (in % of total villages) ROD.VILG.OTHR.ZS .. ..
\n" ], "text/latex": [ "\\begin{tabular}{r|lllll}\n", " & Region Name & Series Name & Series Code & 2000 {[}YR2000{]} & 2014 {[}YR2014{]}\\\\\n", "\\hline\n", "\t12009 & Yogyakarta, Kota & Villages with road: Asphalt (in \\% of total villages) & ROD.VILG.ASPH.ZS & 100 & .. \\\\\n", "\t12010 & Yogyakarta, Kota & Villages with road: Dirt (in \\% of total villages) & ROD.VILG.DIRT.ZS & .. & .. \\\\\n", "\t12011 & Yogyakarta, Kota & Villages with road: Gravel (in \\% of total villages) & ROD.VILG.GRAVL.ZS & .. & .. \\\\\n", "\t12012 & Yogyakarta, Kota & Villages with road: Other (in \\% of total villages) & ROD.VILG.OTHR.ZS & .. & .. \\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "| | Region Name | Series Name | Series Code | 2000 [YR2000] | 2014 [YR2014] | \n", "|---|---|---|---|\n", "| 12009 | Yogyakarta, Kota | Villages with road: Asphalt (in % of total villages) | ROD.VILG.ASPH.ZS | 100 | .. | \n", "| 12010 | Yogyakarta, Kota | Villages with road: Dirt (in % of total villages) | ROD.VILG.DIRT.ZS | .. | .. | \n", "| 12011 | Yogyakarta, Kota | Villages with road: Gravel (in % of total villages) | ROD.VILG.GRAVL.ZS | .. | .. | \n", "| 12012 | Yogyakarta, Kota | Villages with road: Other (in % of total villages) | ROD.VILG.OTHR.ZS | .. | .. | \n", "\n", "\n" ], "text/plain": [ " Region Name Series Name \n", "12009 Yogyakarta, Kota Villages with road: Asphalt (in % of total villages)\n", "12010 Yogyakarta, Kota Villages with road: Dirt (in % of total villages) \n", "12011 Yogyakarta, Kota Villages with road: Gravel (in % of total villages) \n", "12012 Yogyakarta, Kota Villages with road: Other (in % of total villages) \n", " Series Code 2000 [YR2000] 2014 [YR2014]\n", "12009 ROD.VILG.ASPH.ZS 100 .. \n", "12010 ROD.VILG.DIRT.ZS .. .. \n", "12011 ROD.VILG.GRAVL.ZS .. .. \n", "12012 ROD.VILG.OTHR.ZS .. .. " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "c(1, 3:5, ncol(datakab)) %>% datakab[, .] 
%>% tail(4)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
Numeric column indices can be replaced with a vector of variable names " ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\n", "
Region CodeSeries Name
12007IDN_Yogyakarta_Kota_17983 Total GDP based on expenditure (in IDR Million)
12008IDN_Yogyakarta_Kota_17983 Total Population (in number of people)
12009IDN_Yogyakarta_Kota_17983 Villages with road: Asphalt (in % of total villages)
12010IDN_Yogyakarta_Kota_17983 Villages with road: Dirt (in % of total villages)
12011IDN_Yogyakarta_Kota_17983 Villages with road: Gravel (in % of total villages)
12012IDN_Yogyakarta_Kota_17983 Villages with road: Other (in % of total villages)
\n" ], "text/latex": [ "\\begin{tabular}{r|ll}\n", " & Region Code & Series Name\\\\\n", "\\hline\n", "\t12007 & IDN\\_Yogyakarta\\_Kota\\_17983 & Total GDP based on expenditure (in IDR Million) \\\\\n", "\t12008 & IDN\\_Yogyakarta\\_Kota\\_17983 & Total Population (in number of people) \\\\\n", "\t12009 & IDN\\_Yogyakarta\\_Kota\\_17983 & Villages with road: Asphalt (in \\% of total villages)\\\\\n", "\t12010 & IDN\\_Yogyakarta\\_Kota\\_17983 & Villages with road: Dirt (in \\% of total villages) \\\\\n", "\t12011 & IDN\\_Yogyakarta\\_Kota\\_17983 & Villages with road: Gravel (in \\% of total villages) \\\\\n", "\t12012 & IDN\\_Yogyakarta\\_Kota\\_17983 & Villages with road: Other (in \\% of total villages) \\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "| | Region Code | Series Name | \n", "|---|---|---|---|---|---|\n", "| 12007 | IDN_Yogyakarta_Kota_17983 | Total GDP based on expenditure (in IDR Million) | \n", "| 12008 | IDN_Yogyakarta_Kota_17983 | Total Population (in number of people) | \n", "| 12009 | IDN_Yogyakarta_Kota_17983 | Villages with road: Asphalt (in % of total villages) | \n", "| 12010 | IDN_Yogyakarta_Kota_17983 | Villages with road: Dirt (in % of total villages) | \n", "| 12011 | IDN_Yogyakarta_Kota_17983 | Villages with road: Gravel (in % of total villages) | \n", "| 12012 | IDN_Yogyakarta_Kota_17983 | Villages with road: Other (in % of total villages) | \n", "\n", "\n" ], "text/plain": [ " Region Code \n", "12007 IDN_Yogyakarta_Kota_17983\n", "12008 IDN_Yogyakarta_Kota_17983\n", "12009 IDN_Yogyakarta_Kota_17983\n", "12010 IDN_Yogyakarta_Kota_17983\n", "12011 IDN_Yogyakarta_Kota_17983\n", "12012 IDN_Yogyakarta_Kota_17983\n", " Series Name \n", "12007 Total GDP based on expenditure (in IDR Million) \n", "12008 Total Population (in number of people) \n", "12009 Villages with road: Asphalt (in % of total villages)\n", "12010 Villages with road: Dirt (in % of total villages) \n", "12011 Villages with road: Gravel (in % of total villages) 
\n", "12012 Villages with road: Other (in % of total villages) " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "datakab[,c('Region Code', 'Series Name')] %>% tail" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "
Variables can also be selected with `dplyr::select`. This time it is better to load `dplyr` first so that all of its helper functions are available as well. \n", "> Loading `dplyr` prints a message listing the functions from other packages that are masked by `dplyr` functions. The `dplyr` functions themselves work fine, but a masked function must now be called with its original package prefix, for example `stats::filter`." ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "\n", "Attaching package: 'dplyr'\n", "\n", "The following objects are masked from 'package:stats':\n", "\n", " filter, lag\n", "\n", "The following objects are masked from 'package:base':\n", "\n", " intersect, setdiff, setequal, union\n", "\n" ] } ], "source": [ "library(dplyr)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Some rules for selecting variables with `select`:\n", "- Variable names that contain spaces must be wrapped in quotes \n", "- To select a run of adjacent variables, name only the two at the ends, `var_kiri:var_kanan` (leftmost variable, then rightmost variable) \n", "- Select all variables whose names share a pattern: starting with (`starts_with`), ending with (`ends_with`), or containing (`contains`) a given string \n", "- Use a minus sign (`-`) to exclude variables \n", "- The column order of the new table follows the order of the variables in `select`\n", "- Rename a selected variable with `nama_baru = nama_lama` (new name on the left, old name on the right)" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\n", "
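<table><thead><tr><th></th><th>Region Code</th><th>Series Code</th><th>region_name</th><th>2000 [YR2000]</th><th>2001 [YR2001]</th><th>2002 [YR2002]</th><th>2003 [YR2003]</th><th>2004 [YR2004]</th><th>2005 [YR2005]</th><th>2006 [YR2006]</th><th>2007 [YR2007]</th><th>2008 [YR2008]</th><th>2009 [YR2009]</th></tr></thead><tbody>
<tr><th>12009</th><td>IDN_Yogyakarta_Kota_17983</td><td>ROD.VILG.ASPH.ZS</td><td>Yogyakarta, Kota</td><td>100</td><td>..</td><td>..</td><td>100</td><td>..</td><td>100</td><td>..</td><td>..</td><td>100</td><td>..</td></tr>
<tr><th>12010</th><td>IDN_Yogyakarta_Kota_17983</td><td>ROD.VILG.DIRT.ZS</td><td>Yogyakarta, Kota</td><td>..</td><td>..</td><td>..</td><td>..</td><td>..</td><td>..</td><td>..</td><td>..</td><td>..</td><td>..</td></tr>
<tr><th>12011</th><td>IDN_Yogyakarta_Kota_17983</td><td>ROD.VILG.GRAVL.ZS</td><td>Yogyakarta, Kota</td><td>..</td><td>..</td><td>..</td><td>..</td><td>..</td><td>..</td><td>..</td><td>..</td><td>..</td><td>..</td></tr>
<tr><th>12012</th><td>IDN_Yogyakarta_Kota_17983</td><td>ROD.VILG.OTHR.ZS</td><td>Yogyakarta, Kota</td><td>..</td><td>..</td><td>..</td><td>..</td><td>..</td><td>..</td><td>..</td><td>..</td><td>..</td><td>..</td></tr></tbody></table>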
\n" ], "text/latex": [ "\\begin{tabular}{r|lllllllllllll}\n", " & Region Code & Series Code & region\\_name & 2000 {[}YR2000{]} & 2001 {[}YR2001{]} & 2002 {[}YR2002{]} & 2003 {[}YR2003{]} & 2004 {[}YR2004{]} & 2005 {[}YR2005{]} & 2006 {[}YR2006{]} & 2007 {[}YR2007{]} & 2008 {[}YR2008{]} & 2009 {[}YR2009{]}\\\\\n", "\\hline\n", "\t12009 & IDN\\_Yogyakarta\\_Kota\\_17983 & ROD.VILG.ASPH.ZS & Yogyakarta, Kota & 100 & .. & .. & 100 & .. & 100 & .. & .. & 100 & .. \\\\\n", "\t12010 & IDN\\_Yogyakarta\\_Kota\\_17983 & ROD.VILG.DIRT.ZS & Yogyakarta, Kota & .. & .. & .. & .. & .. & .. & .. & .. & .. & .. \\\\\n", "\t12011 & IDN\\_Yogyakarta\\_Kota\\_17983 & ROD.VILG.GRAVL.ZS & Yogyakarta, Kota & .. & .. & .. & .. & .. & .. & .. & .. & .. & .. \\\\\n", "\t12012 & IDN\\_Yogyakarta\\_Kota\\_17983 & ROD.VILG.OTHR.ZS & Yogyakarta, Kota & .. & .. & .. & .. & .. & .. & .. & .. & .. & .. \\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "| | Region Code | Series Code | region_name | 2000 [YR2000] | 2001 [YR2001] | 2002 [YR2002] | 2003 [YR2003] | 2004 [YR2004] | 2005 [YR2005] | 2006 [YR2006] | 2007 [YR2007] | 2008 [YR2008] | 2009 [YR2009] | \n", "|---|---|---|---|\n", "| 12009 | IDN_Yogyakarta_Kota_17983 | ROD.VILG.ASPH.ZS | Yogyakarta, Kota | 100 | .. | .. | 100 | .. | 100 | .. | .. | 100 | .. | \n", "| 12010 | IDN_Yogyakarta_Kota_17983 | ROD.VILG.DIRT.ZS | Yogyakarta, Kota | .. | .. | .. | .. | .. | .. | .. | .. | .. | .. | \n", "| 12011 | IDN_Yogyakarta_Kota_17983 | ROD.VILG.GRAVL.ZS | Yogyakarta, Kota | .. | .. | .. | .. | .. | .. | .. | .. | .. | .. | \n", "| 12012 | IDN_Yogyakarta_Kota_17983 | ROD.VILG.OTHR.ZS | Yogyakarta, Kota | .. | .. | .. | .. | .. | .. | .. | .. | .. | .. 
| \n", "\n", "\n" ], "text/plain": [ " Region Code Series Code region_name \n", "12009 IDN_Yogyakarta_Kota_17983 ROD.VILG.ASPH.ZS Yogyakarta, Kota\n", "12010 IDN_Yogyakarta_Kota_17983 ROD.VILG.DIRT.ZS Yogyakarta, Kota\n", "12011 IDN_Yogyakarta_Kota_17983 ROD.VILG.GRAVL.ZS Yogyakarta, Kota\n", "12012 IDN_Yogyakarta_Kota_17983 ROD.VILG.OTHR.ZS Yogyakarta, Kota\n", " 2000 [YR2000] 2001 [YR2001] 2002 [YR2002] 2003 [YR2003] 2004 [YR2004]\n", "12009 100 .. .. 100 .. \n", "12010 .. .. .. .. .. \n", "12011 .. .. .. .. .. \n", "12012 .. .. .. .. .. \n", " 2005 [YR2005] 2006 [YR2006] 2007 [YR2007] 2008 [YR2008] 2009 [YR2009]\n", "12009 100 .. .. 100 .. \n", "12010 .. .. .. .. .. \n", "12011 .. .. .. .. .. \n", "12012 .. .. .. .. .. " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "select(datakab, 'Region Code':'Series Code', -'Series Name', \n", " region_name = 'Region Name', contains('YR'), -contains('YR201')) %>% tail(4)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Renaming Variables" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To rename only some variables without changing the structure of the data table, use `dplyr::rename`\n", "> `rename` returns a data table, so its result can be subset directly with `[]` without *piping* through `%>%`" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\n", "
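<table><thead><tr><th></th><th>region_name</th><th>Region Code</th><th>Series Name</th><th>series_code</th><th>2000 [YR2000]</th><th>2001 [YR2001]</th><th>2002 [YR2002]</th><th>2003 [YR2003]</th><th>2004 [YR2004]</th><th>2005 [YR2005]</th><th>2006 [YR2006]</th><th>2007 [YR2007]</th><th>2008 [YR2008]</th><th>2009 [YR2009]</th><th>2010 [YR2010]</th><th>2011 [YR2011]</th><th>2012 [YR2012]</th><th>2013 [YR2013]</th><th>2014 [YR2014]</th></tr></thead><tbody>
<tr><th>100</th><td>Aceh Selatan, Kab.</td><td>IDN_Aceh_Selatan_Kab_73626</td><td>Number of schools at primary level</td><td>SE.SCHL.PRM</td><td>333</td><td>..</td><td>..</td><td>211</td><td>..</td><td>223</td><td>..</td><td>..</td><td>128</td><td>..</td><td>..</td><td>232</td><td>..</td><td>..</td><td>..</td></tr>
<tr><th>101</th><td>Aceh Selatan, Kab.</td><td>IDN_Aceh_Selatan_Kab_73626</td><td>Number of schools at Senior Secondary level</td><td>SE.SCHL.SRSEC</td><td>25</td><td>..</td><td>..</td><td>21</td><td>..</td><td>25</td><td>..</td><td>..</td><td>15</td><td>..</td><td>..</td><td>40</td><td>..</td><td>..</td><td>..</td></tr>
<tr><th>102</th><td>Aceh Selatan, Kab.</td><td>IDN_Aceh_Selatan_Kab_73626</td><td>Poverty Line (in IDR)</td><td>SI.POV.NAPL</td><td>..</td><td>..</td><td>..</td><td>..</td><td>..</td><td>171815</td><td>186227</td><td>196167</td><td>203761</td><td>236741</td><td>257640</td><td>278854</td><td>281158</td><td>283446</td><td>285301</td></tr></tbody></table>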
\n" ], "text/latex": [ "\\begin{tabular}{r|lllllllllllllllllll}\n", " & region\\_name & Region Code & Series Name & series\\_code & 2000 {[}YR2000{]} & 2001 {[}YR2001{]} & 2002 {[}YR2002{]} & 2003 {[}YR2003{]} & 2004 {[}YR2004{]} & 2005 {[}YR2005{]} & 2006 {[}YR2006{]} & 2007 {[}YR2007{]} & 2008 {[}YR2008{]} & 2009 {[}YR2009{]} & 2010 {[}YR2010{]} & 2011 {[}YR2011{]} & 2012 {[}YR2012{]} & 2013 {[}YR2013{]} & 2014 {[}YR2014{]}\\\\\n", "\\hline\n", "\t100 & Aceh Selatan, Kab. & IDN\\_Aceh\\_Selatan\\_Kab\\_73626 & Number of schools at primary level & SE.SCHL.PRM & 333 & .. & .. & 211 & .. & 223 & .. & .. & 128 & .. & .. & 232 & .. & .. & .. \\\\\n", "\t101 & Aceh Selatan, Kab. & IDN\\_Aceh\\_Selatan\\_Kab\\_73626 & Number of schools at Senior Secondary level & SE.SCHL.SRSEC & 25 & .. & .. & 21 & .. & 25 & .. & .. & 15 & .. & .. & 40 & .. & .. & .. \\\\\n", "\t102 & Aceh Selatan, Kab. & IDN\\_Aceh\\_Selatan\\_Kab\\_73626 & Poverty Line (in IDR) & SI.POV.NAPL & .. & .. & .. & .. & .. & 171815 & 186227 & 196167 & 203761 & 236741 & 257640 & 278854 & 281158 & 283446 & 285301 \\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "| | region_name | Region Code | Series Name | series_code | 2000 [YR2000] | 2001 [YR2001] | 2002 [YR2002] | 2003 [YR2003] | 2004 [YR2004] | 2005 [YR2005] | 2006 [YR2006] | 2007 [YR2007] | 2008 [YR2008] | 2009 [YR2009] | 2010 [YR2010] | 2011 [YR2011] | 2012 [YR2012] | 2013 [YR2013] | 2014 [YR2014] | \n", "|---|---|---|\n", "| 100 | Aceh Selatan, Kab. | IDN_Aceh_Selatan_Kab_73626 | Number of schools at primary level | SE.SCHL.PRM | 333 | .. | .. | 211 | .. | 223 | .. | .. | 128 | .. | .. | 232 | .. | .. | .. | \n", "| 101 | Aceh Selatan, Kab. | IDN_Aceh_Selatan_Kab_73626 | Number of schools at Senior Secondary level | SE.SCHL.SRSEC | 25 | .. | .. | 21 | .. | 25 | .. | .. | 15 | .. | .. | 40 | .. | .. | .. | \n", "| 102 | Aceh Selatan, Kab. | IDN_Aceh_Selatan_Kab_73626 | Poverty Line (in IDR) | SI.POV.NAPL | .. | .. | .. | .. | .. 
| 171815 | 186227 | 196167 | 203761 | 236741 | 257640 | 278854 | 281158 | 283446 | 285301 | \n", "\n", "\n" ], "text/plain": [ " region_name Region Code \n", "100 Aceh Selatan, Kab. IDN_Aceh_Selatan_Kab_73626\n", "101 Aceh Selatan, Kab. IDN_Aceh_Selatan_Kab_73626\n", "102 Aceh Selatan, Kab. IDN_Aceh_Selatan_Kab_73626\n", " Series Name series_code 2000 [YR2000]\n", "100 Number of schools at primary level SE.SCHL.PRM 333 \n", "101 Number of schools at Senior Secondary level SE.SCHL.SRSEC 25 \n", "102 Poverty Line (in IDR) SI.POV.NAPL .. \n", " 2001 [YR2001] 2002 [YR2002] 2003 [YR2003] 2004 [YR2004] 2005 [YR2005]\n", "100 .. .. 211 .. 223 \n", "101 .. .. 21 .. 25 \n", "102 .. .. .. .. 171815 \n", " 2006 [YR2006] 2007 [YR2007] 2008 [YR2008] 2009 [YR2009] 2010 [YR2010]\n", "100 .. .. 128 .. .. \n", "101 .. .. 15 .. .. \n", "102 186227 196167 203761 236741 257640 \n", " 2011 [YR2011] 2012 [YR2012] 2013 [YR2013] 2014 [YR2014]\n", "100 232 .. .. .. \n", "101 40 .. .. .. \n", "102 278854 281158 283446 285301 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "rename(datakab, region_name = 'Region Name', series_code = 'Series Code') [100:102,]" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "
Names can also be changed by *assigning* a vector of new names that has the same length as the set of names being replaced" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\t\n", "\n", "
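<table><tbody><tr><td>REGION NAME</td><td>REGION CODE</td><td>SERIES NAME</td><td>Series Code</td><td>2000 [YR2000]</td><td>2001 [YR2001]</td><td>2002 [YR2002]</td><td>2003 [YR2003]</td><td>2004 [YR2004]</td><td>2005 [YR2005]</td><td>2006 [YR2006]</td><td>2007 [YR2007]</td><td>2008 [YR2008]</td><td>2009 [YR2009]</td><td>2010 [YR2010]</td><td>2011 [YR2011]</td><td>2012 [YR2012]</td><td>2013 [YR2013]</td><td>2014 [YR2014]</td></tr></tbody></table>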
\n" ], "text/latex": [ "\\begin{tabular}{lllllllllllllllllll}\n", "\t REGION NAME & REGION CODE & SERIES NAME & Series Code & 2000 {[}YR2000{]} & 2001 {[}YR2001{]} & 2002 {[}YR2002{]} & 2003 {[}YR2003{]} & 2004 {[}YR2004{]} & 2005 {[}YR2005{]} & 2006 {[}YR2006{]} & 2007 {[}YR2007{]} & 2008 {[}YR2008{]} & 2009 {[}YR2009{]} & 2010 {[}YR2010{]} & 2011 {[}YR2011{]} & 2012 {[}YR2012{]} & 2013 {[}YR2013{]} & 2014 {[}YR2014{]}\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "| REGION NAME | REGION CODE | SERIES NAME | Series Code | 2000 [YR2000] | 2001 [YR2001] | 2002 [YR2002] | 2003 [YR2003] | 2004 [YR2004] | 2005 [YR2005] | 2006 [YR2006] | 2007 [YR2007] | 2008 [YR2008] | 2009 [YR2009] | 2010 [YR2010] | 2011 [YR2011] | 2012 [YR2012] | 2013 [YR2013] | 2014 [YR2014] | \n", "\n", "\n" ], "text/plain": [ " [,1] [,2] [,3] [,4] [,5] \n", "[1,] REGION NAME REGION CODE SERIES NAME Series Code 2000 [YR2000]\n", " [,6] [,7] [,8] [,9] [,10] \n", "[1,] 2001 [YR2001] 2002 [YR2002] 2003 [YR2003] 2004 [YR2004] 2005 [YR2005]\n", " [,11] [,12] [,13] [,14] [,15] \n", "[1,] 2006 [YR2006] 2007 [YR2007] 2008 [YR2008] 2009 [YR2009] 2010 [YR2010]\n", " [,16] [,17] [,18] [,19] \n", "[1,] 2011 [YR2011] 2012 [YR2012] 2013 [YR2013] 2014 [YR2014]" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "names(datakab)[1:3] = names(datakab)[1:3] %>% toupper\n", "names(datakab) %>% t" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Variable names written in the format above make later code harder to write. The `clean_names` function from the `janitor` package renames every variable in the data at once so that the names follow a widely recommended style (lowercase `snake_case` by default)." ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\t\n", "\n", "
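<table><tbody><tr><td>region_name</td><td>region_code</td><td>series_name</td><td>series_code</td><td>x2000_yr2000</td><td>x2001_yr2001</td><td>x2002_yr2002</td><td>x2003_yr2003</td><td>x2004_yr2004</td><td>x2005_yr2005</td><td>x2006_yr2006</td><td>x2007_yr2007</td><td>x2008_yr2008</td><td>x2009_yr2009</td><td>x2010_yr2010</td><td>x2011_yr2011</td><td>x2012_yr2012</td><td>x2013_yr2013</td><td>x2014_yr2014</td></tr></tbody></table>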
\n" ], "text/latex": [ "\\begin{tabular}{lllllllllllllllllll}\n", "\t region\\_name & region\\_code & series\\_name & series\\_code & x2000\\_yr2000 & x2001\\_yr2001 & x2002\\_yr2002 & x2003\\_yr2003 & x2004\\_yr2004 & x2005\\_yr2005 & x2006\\_yr2006 & x2007\\_yr2007 & x2008\\_yr2008 & x2009\\_yr2009 & x2010\\_yr2010 & x2011\\_yr2011 & x2012\\_yr2012 & x2013\\_yr2013 & x2014\\_yr2014\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "| region_name | region_code | series_name | series_code | x2000_yr2000 | x2001_yr2001 | x2002_yr2002 | x2003_yr2003 | x2004_yr2004 | x2005_yr2005 | x2006_yr2006 | x2007_yr2007 | x2008_yr2008 | x2009_yr2009 | x2010_yr2010 | x2011_yr2011 | x2012_yr2012 | x2013_yr2013 | x2014_yr2014 | \n", "\n", "\n" ], "text/plain": [ " [,1] [,2] [,3] [,4] [,5] [,6] \n", "[1,] region_name region_code series_name series_code x2000_yr2000 x2001_yr2001\n", " [,7] [,8] [,9] [,10] [,11] \n", "[1,] x2002_yr2002 x2003_yr2003 x2004_yr2004 x2005_yr2005 x2006_yr2006\n", " [,12] [,13] [,14] [,15] [,16] \n", "[1,] x2007_yr2007 x2008_yr2008 x2009_yr2009 x2010_yr2010 x2011_yr2011\n", " [,17] [,18] [,19] \n", "[1,] x2012_yr2012 x2013_yr2013 x2014_yr2014" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "datakab = janitor::clean_names(datakab)\n", "names(datakab) %>% t" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The year variable names can be simplified further by keeping only the characters after the underscore ( _ ). We use the `str_replace` function from the `stringr` library to match the character pattern we want to remove, that is, to replace it with an empty string (`''`)" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\t\n", "\n", "
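<table><tbody><tr><td>region_name</td><td>region_code</td><td>series_name</td><td>series_code</td><td>yr2000</td><td>yr2001</td><td>yr2002</td><td>yr2003</td><td>yr2004</td><td>yr2005</td><td>yr2006</td><td>yr2007</td><td>yr2008</td><td>yr2009</td><td>yr2010</td><td>yr2011</td><td>yr2012</td><td>yr2013</td><td>yr2014</td></tr></tbody></table>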
\n" ], "text/latex": [ "\\begin{tabular}{lllllllllllllllllll}\n", "\t region\\_name & region\\_code & series\\_name & series\\_code & yr2000 & yr2001 & yr2002 & yr2003 & yr2004 & yr2005 & yr2006 & yr2007 & yr2008 & yr2009 & yr2010 & yr2011 & yr2012 & yr2013 & yr2014 \\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "| region_name | region_code | series_name | series_code | yr2000 | yr2001 | yr2002 | yr2003 | yr2004 | yr2005 | yr2006 | yr2007 | yr2008 | yr2009 | yr2010 | yr2011 | yr2012 | yr2013 | yr2014 | \n", "\n", "\n" ], "text/plain": [ " [,1] [,2] [,3] [,4] [,5] [,6] [,7] \n", "[1,] region_name region_code series_name series_code yr2000 yr2001 yr2002\n", " [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15] [,16] [,17] \n", "[1,] yr2003 yr2004 yr2005 yr2006 yr2007 yr2008 yr2009 yr2010 yr2011 yr2012\n", " [,18] [,19] \n", "[1,] yr2013 yr2014" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "datakab %<>% rename_at(vars(starts_with('x20')), \n", " funs(stringr::str_replace(., 'x20[0-9][0-9]_', ''))) \n", "datakab %>% names %>% t" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Merging Data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To practise merging, `datakab` is split into two data sets whose variables differ except for the ID variables" ] }, { "cell_type": "code", "execution_count": 116, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\t\n", "\n", "
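<table><tbody><tr><td>region_name</td><td>region_code</td><td>series_name</td><td>series_code</td><td>yr2010</td><td>yr2011</td><td>yr2012</td><td>yr2013</td><td>yr2014</td></tr></tbody></table>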
\n" ], "text/latex": [ "\\begin{tabular}{lllllllll}\n", "\t region\\_name & region\\_code & series\\_name & series\\_code & yr2010 & yr2011 & yr2012 & yr2013 & yr2014 \\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "| region_name | region_code | series_name | series_code | yr2010 | yr2011 | yr2012 | yr2013 | yr2014 | \n", "\n", "\n" ], "text/plain": [ " [,1] [,2] [,3] [,4] [,5] [,6] [,7] \n", "[1,] region_name region_code series_name series_code yr2010 yr2011 yr2012\n", " [,8] [,9] \n", "[1,] yr2013 yr2014" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "data1 = datakab %>% select(region_name:series_code, contains('201'))\n", "names(data1) %>% t" ] }, { "cell_type": "code", "execution_count": 117, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\t\n", "\n", "
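<table><tbody><tr><td>region_name</td><td>region_code</td><td>series_name</td><td>series_code</td><td>yr2000</td><td>yr2001</td><td>yr2002</td><td>yr2003</td><td>yr2004</td><td>yr2005</td><td>yr2006</td><td>yr2007</td><td>yr2008</td><td>yr2009</td></tr></tbody></table>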
\n" ], "text/latex": [ "\\begin{tabular}{llllllllllllll}\n", "\t region\\_name & region\\_code & series\\_name & series\\_code & yr2000 & yr2001 & yr2002 & yr2003 & yr2004 & yr2005 & yr2006 & yr2007 & yr2008 & yr2009 \\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "| region_name | region_code | series_name | series_code | yr2000 | yr2001 | yr2002 | yr2003 | yr2004 | yr2005 | yr2006 | yr2007 | yr2008 | yr2009 | \n", "\n", "\n" ], "text/plain": [ " [,1] [,2] [,3] [,4] [,5] [,6] [,7] \n", "[1,] region_name region_code series_name series_code yr2000 yr2001 yr2002\n", " [,8] [,9] [,10] [,11] [,12] [,13] [,14] \n", "[1,] yr2003 yr2004 yr2005 yr2006 yr2007 yr2008 yr2009" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "data2 = datakab %>% select(-contains('201'))\n", "names(data2) %>% t" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
Next, merge the two data sets with `merge` from base R, or with the \n", "`*_join` family of functions from `dplyr`. Rows are matched on the ID variables given in the `by` argument. If a variable has the same name in both data sets but is not part of `by`, it appears twice in the new data, with the suffixes `.x` and `.y`" ] }, { "cell_type": "code", "execution_count": 118, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\n", "
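<table><thead><tr><th></th><th>region_code</th><th>series_code</th><th>region_name.x</th><th>series_name.x</th><th>yr2010</th><th>yr2011</th><th>yr2012</th><th>yr2013</th><th>yr2014</th><th>region_name.y</th><th>...</th><th>yr2000</th><th>yr2001</th><th>yr2002</th><th>yr2003</th><th>yr2004</th><th>yr2005</th><th>yr2006</th><th>yr2007</th><th>yr2008</th><th>yr2009</th></tr></thead><tbody>
<tr><th>12010</th><td>IDN_Yogyakarta_Kota_17983</td><td>SI.POV.NAPR.ZS</td><td>Yogyakarta, Kota</td><td>Poverty Rate (in % of population)</td><td>9.75</td><td>9.62</td><td>9.38</td><td>8.82</td><td>8.67</td><td>Yogyakarta, Kota</td><td>...</td><td>..</td><td>..</td><td>14.52</td><td>12.59</td><td>21.77</td><td>10.5</td><td>10.22</td><td>9.78</td><td>10.81</td><td>10.05</td></tr>
<tr><th>12011</th><td>IDN_Yogyakarta_Kota_17983</td><td>SI.POV.NGAP</td><td>Yogyakarta, Kota</td><td>Poverty Gap (index)</td><td>1.29</td><td>1.19</td><td>1.57</td><td>1.24</td><td>1.14</td><td>Yogyakarta, Kota</td><td>...</td><td>..</td><td>..</td><td>3.23</td><td>3.23</td><td>2.96</td><td>2.34</td><td>1.88</td><td>2.26</td><td>2.1</td><td>1.91</td></tr>
<tr><th>12012</th><td>IDN_Yogyakarta_Kota_17983</td><td>SP.POP.TOTL</td><td>Yogyakarta, Kota</td><td>Total Population (in number of people)</td><td>388627</td><td>392506</td><td>397594</td><td>402679</td><td>..</td><td>Yogyakarta, Kota</td><td>...</td><td>397398</td><td>395775</td><td>394140</td><td>392492</td><td>396238</td><td>419163.765233477</td><td>445258</td><td>451118</td><td>456915</td><td>462663</td></tr></tbody></table>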
\n" ], "text/latex": [ "\\begin{tabular}{r|lllllllllllllllllllll}\n", " & region\\_code & series\\_code & region\\_name.x & series\\_name.x & yr2010 & yr2011 & yr2012 & yr2013 & yr2014 & region\\_name.y & ... & yr2000 & yr2001 & yr2002 & yr2003 & yr2004 & yr2005 & yr2006 & yr2007 & yr2008 & yr2009\\\\\n", "\\hline\n", "\t12010 & IDN\\_Yogyakarta\\_Kota\\_17983 & SI.POV.NAPR.ZS & Yogyakarta, Kota & Poverty Rate (in \\% of population) & 9.75 & 9.62 & 9.38 & 8.82 & 8.67 & Yogyakarta, Kota & ... & .. & .. & 14.52 & 12.59 & 21.77 & 10.5 & 10.22 & 9.78 & 10.81 & 10.05 \\\\\n", "\t12011 & IDN\\_Yogyakarta\\_Kota\\_17983 & SI.POV.NGAP & Yogyakarta, Kota & Poverty Gap (index) & 1.29 & 1.19 & 1.57 & 1.24 & 1.14 & Yogyakarta, Kota & ... & .. & .. & 3.23 & 3.23 & 2.96 & 2.34 & 1.88 & 2.26 & 2.1 & 1.91 \\\\\n", "\t12012 & IDN\\_Yogyakarta\\_Kota\\_17983 & SP.POP.TOTL & Yogyakarta, Kota & Total Population (in number of people) & 388627 & 392506 & 397594 & 402679 & .. & Yogyakarta, Kota & ... & 397398 & 395775 & 394140 & 392492 & 396238 & 419163.765233477 & 445258 & 451118 & 456915 & 462663 \\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "| | region_code | series_code | region_name.x | series_name.x | yr2010 | yr2011 | yr2012 | yr2013 | yr2014 | region_name.y | ... | yr2000 | yr2001 | yr2002 | yr2003 | yr2004 | yr2005 | yr2006 | yr2007 | yr2008 | yr2009 | \n", "|---|---|---|\n", "| 12010 | IDN_Yogyakarta_Kota_17983 | SI.POV.NAPR.ZS | Yogyakarta, Kota | Poverty Rate (in % of population) | 9.75 | 9.62 | 9.38 | 8.82 | 8.67 | Yogyakarta, Kota | ... | .. | .. | 14.52 | 12.59 | 21.77 | 10.5 | 10.22 | 9.78 | 10.81 | 10.05 | \n", "| 12011 | IDN_Yogyakarta_Kota_17983 | SI.POV.NGAP | Yogyakarta, Kota | Poverty Gap (index) | 1.29 | 1.19 | 1.57 | 1.24 | 1.14 | Yogyakarta, Kota | ... | .. | .. 
| 3.23 | 3.23 | 2.96 | 2.34 | 1.88 | 2.26 | 2.1 | 1.91 | \n", "| 12012 | IDN_Yogyakarta_Kota_17983 | SP.POP.TOTL | Yogyakarta, Kota | Total Population (in number of people) | 388627 | 392506 | 397594 | 402679 | .. | Yogyakarta, Kota | ... | 397398 | 395775 | 394140 | 392492 | 396238 | 419163.765233477 | 445258 | 451118 | 456915 | 462663 | \n", "\n", "\n" ], "text/plain": [ " region_code series_code region_name.x \n", "12010 IDN_Yogyakarta_Kota_17983 SI.POV.NAPR.ZS Yogyakarta, Kota\n", "12011 IDN_Yogyakarta_Kota_17983 SI.POV.NGAP Yogyakarta, Kota\n", "12012 IDN_Yogyakarta_Kota_17983 SP.POP.TOTL Yogyakarta, Kota\n", " series_name.x yr2010 yr2011 yr2012 yr2013 yr2014\n", "12010 Poverty Rate (in % of population) 9.75 9.62 9.38 8.82 8.67 \n", "12011 Poverty Gap (index) 1.29 1.19 1.57 1.24 1.14 \n", "12012 Total Population (in number of people) 388627 392506 397594 402679 .. \n", " region_name.y ... yr2000 yr2001 yr2002 yr2003 yr2004 yr2005 \n", "12010 Yogyakarta, Kota ... .. .. 14.52 12.59 21.77 10.5 \n", "12011 Yogyakarta, Kota ... .. .. 3.23 3.23 2.96 2.34 \n", "12012 Yogyakarta, Kota ... 397398 395775 394140 392492 396238 419163.765233477\n", " yr2006 yr2007 yr2008 yr2009\n", "12010 10.22 9.78 10.81 10.05 \n", "12011 1.88 2.26 2.1 1.91 \n", "12012 445258 451118 456915 462663" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "merge(data1, data2, by = c('region_code', 'series_code')) %>% tail(3)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "
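The `.x` / `.y` suffix behaviour described above can be checked on toy data. This is only a minimal sketch: the tables `left` and `right` and their columns are made up for illustration and are not part of `datakab`.

```r
# Hypothetical toy tables: both share the key `id` and a non-key column `name`.
left  <- data.frame(id = c(1, 2, 3), name = c('a', 'b', 'c'), x = c(10, 20, 30))
right <- data.frame(id = c(2, 3, 4), name = c('b', 'c', 'd'), y = c(200, 300, 400))

# Merge on `id` only. `name` exists in both tables but is not in `by`,
# so it is returned twice, as name.x (from left) and name.y (from right).
inner <- merge(left, right, by = 'id')

names(inner)  # id, name.x, x, name.y, y
nrow(inner)   # 2, because only ids 2 and 3 occur in both tables
```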
If the ID variables have different names in the two data sets, pair them with `=` in the `by` argument of the `dplyr` join functions, in the form \n", "```\n", "by = c(\"ID1_data1\" = \"ID1_data2\", \"ID2_data1\" = \"ID2_data2\")\n", "```\n", "(base R `merge` uses the `by.x` and `by.y` arguments for this instead). Without a `by` argument, the merge uses all variables that share a name" ] }, { "cell_type": "code", "execution_count": 59, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Joining, by = c(\"region_name\", \"region_code\", \"series_name\", \"series_code\")\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\n", "
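<table><thead><tr><th></th><th>region_name</th><th>region_code</th><th>series_name</th><th>series_code</th><th>yr2010</th><th>yr2011</th><th>yr2012</th><th>yr2013</th><th>yr2014</th><th>yr2000</th><th>yr2001</th><th>yr2002</th><th>yr2003</th><th>yr2004</th><th>yr2005</th><th>yr2006</th><th>yr2007</th><th>yr2008</th><th>yr2009</th></tr></thead><tbody>
<tr><th>12010</th><td>Yogyakarta, Kota</td><td>IDN_Yogyakarta_Kota_17983</td><td>Villages with road: Dirt (in % of total villages)</td><td>ROD.VILG.DIRT.ZS</td><td>..</td><td>..</td><td>..</td><td>..</td><td>..</td><td>..</td><td>..</td><td>..</td><td>..</td><td>..</td><td>..</td><td>..</td><td>..</td><td>..</td><td>..</td></tr>
<tr><th>12011</th><td>Yogyakarta, Kota</td><td>IDN_Yogyakarta_Kota_17983</td><td>Villages with road: Gravel (in % of total villages)</td><td>ROD.VILG.GRAVL.ZS</td><td>..</td><td>..</td><td>..</td><td>..</td><td>..</td><td>..</td><td>..</td><td>..</td><td>..</td><td>..</td><td>..</td><td>..</td><td>..</td><td>..</td><td>..</td></tr>
<tr><th>12012</th><td>Yogyakarta, Kota</td><td>IDN_Yogyakarta_Kota_17983</td><td>Villages with road: Other (in % of total villages)</td><td>ROD.VILG.OTHR.ZS</td><td>..</td><td>..</td><td>..</td><td>..</td><td>..</td><td>..</td><td>..</td><td>..</td><td>..</td><td>..</td><td>..</td><td>..</td><td>..</td><td>..</td><td>..</td></tr></tbody></table>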
\n" ], "text/latex": [ "\\begin{tabular}{r|lllllllllllllllllll}\n", " & region\\_name & region\\_code & series\\_name & series\\_code & yr2010 & yr2011 & yr2012 & yr2013 & yr2014 & yr2000 & yr2001 & yr2002 & yr2003 & yr2004 & yr2005 & yr2006 & yr2007 & yr2008 & yr2009\\\\\n", "\\hline\n", "\t12010 & Yogyakarta, Kota & IDN\\_Yogyakarta\\_Kota\\_17983 & Villages with road: Dirt (in \\% of total villages) & ROD.VILG.DIRT.ZS & .. & .. & .. & .. & .. & .. & .. & .. & .. & .. & .. & .. & .. & .. & .. \\\\\n", "\t12011 & Yogyakarta, Kota & IDN\\_Yogyakarta\\_Kota\\_17983 & Villages with road: Gravel (in \\% of total villages) & ROD.VILG.GRAVL.ZS & .. & .. & .. & .. & .. & .. & .. & .. & .. & .. & .. & .. & .. & .. & .. \\\\\n", "\t12012 & Yogyakarta, Kota & IDN\\_Yogyakarta\\_Kota\\_17983 & Villages with road: Other (in \\% of total villages) & ROD.VILG.OTHR.ZS & .. & .. & .. & .. & .. & .. & .. & .. & .. & .. & .. & .. & .. & .. & .. \\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "| | region_name | region_code | series_name | series_code | yr2010 | yr2011 | yr2012 | yr2013 | yr2014 | yr2000 | yr2001 | yr2002 | yr2003 | yr2004 | yr2005 | yr2006 | yr2007 | yr2008 | yr2009 | \n", "|---|---|---|\n", "| 12010 | Yogyakarta, Kota | IDN_Yogyakarta_Kota_17983 | Villages with road: Dirt (in % of total villages) | ROD.VILG.DIRT.ZS | .. | .. | .. | .. | .. | .. | .. | .. | .. | .. | .. | .. | .. | .. | .. | \n", "| 12011 | Yogyakarta, Kota | IDN_Yogyakarta_Kota_17983 | Villages with road: Gravel (in % of total villages) | ROD.VILG.GRAVL.ZS | .. | .. | .. | .. | .. | .. | .. | .. | .. | .. | .. | .. | .. | .. | .. | \n", "| 12012 | Yogyakarta, Kota | IDN_Yogyakarta_Kota_17983 | Villages with road: Other (in % of total villages) | ROD.VILG.OTHR.ZS | .. | .. | .. | .. | .. | .. | .. | .. | .. | .. | .. | .. | .. | .. | .. 
| \n", "\n", "\n" ], "text/plain": [ " region_name region_code \n", "12010 Yogyakarta, Kota IDN_Yogyakarta_Kota_17983\n", "12011 Yogyakarta, Kota IDN_Yogyakarta_Kota_17983\n", "12012 Yogyakarta, Kota IDN_Yogyakarta_Kota_17983\n", " series_name series_code \n", "12010 Villages with road: Dirt (in % of total villages) ROD.VILG.DIRT.ZS \n", "12011 Villages with road: Gravel (in % of total villages) ROD.VILG.GRAVL.ZS\n", "12012 Villages with road: Other (in % of total villages) ROD.VILG.OTHR.ZS \n", " yr2010 yr2011 yr2012 yr2013 yr2014 yr2000 yr2001 yr2002 yr2003 yr2004\n", "12010 .. .. .. .. .. .. .. .. .. .. \n", "12011 .. .. .. .. .. .. .. .. .. .. \n", "12012 .. .. .. .. .. .. .. .. .. .. \n", " yr2005 yr2006 yr2007 yr2008 yr2009\n", "12010 .. .. .. .. .. \n", "12011 .. .. .. .. .. \n", "12012 .. .. .. .. .. " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "dplyr::inner_join(data1, data2) %>% tail(3)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "By default, `merge` keeps only rows whose ID appears in both of the data sets being merged. This default result is equivalent to that of `dplyr::inner_join`\n", "\n", "To keep every row even when its ID appears in only one of the data sets, use the argument `all = TRUE`. This is equivalent to `dplyr::full_join`\n", "\n", "To keep all rows from only one of the data sets \n", " - first data set: use `all.x = TRUE`, equivalent to `dplyr::left_join`\n", " - second data set: use `all.y = TRUE`, equivalent to `dplyr::right_join`" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Changing Values" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Data in a table can be changed by *assigning* new values to the row and column positions to be changed. The assigned value should have the same dimensions as the data being replaced (a single value is recycled to fill every selected cell). 
Below, the value in the second row and fifth column of `datakab` is changed from the string \"..\" to `NA`\n", "> `NA`, short for *Not Available*, is how R makes explicit the parts of the data that have no value (*missing values*). A cell that merely looks empty in R holds an empty string, which is treated as present data, that is, as a valid observation." ] }, { "cell_type": "code", "execution_count": 121, "metadata": {}, "outputs": [ { "data": { "text/html": [ "'..'" ], "text/latex": [ "'..'" ], "text/markdown": [ "'..'" ], "text/plain": [ "[1] \"..\"" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "datakab[2,5]" ] }, { "cell_type": "code", "execution_count": 123, "metadata": {}, "outputs": [ { "data": { "text/html": [ "<NA>" ], "text/latex": [ "" ], "text/markdown": [ "<NA>" ], "text/plain": [ "[1] NA" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "(datakab[2,5] = NA)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The summary below shows that the year variables are dominated by the string \"..\", which actually stands for *missing values*. Because we replaced one observation with `NA` in the fifth column of the data, the variable `yr2000`, that variable is no longer 100% valid. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "> `dfSummary` from the `summarytools` package displays a summary of the data's characteristics. The argument `graph.col = F` is used because the information in the histogram is already conveyed by the frequency column `Freqs`. Likewise, `na.col = F` drops the column holding the count and percentage of *missing values*, which is simply the complement of the `Valid` column. 
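
A small sketch of the difference between the `'..'` placeholder and a real `NA`, using a made-up vector `x` rather than `datakab`. It illustrates why a validity count still treats `'..'` as data:

```r
# Hypothetical toy column: real values, the '..' placeholder, and one true NA.
x <- c('..', '12.5', NA, '..', '7')

# Only the true NA is recognised as a missing value.
is.na(x)

# Number of values R considers valid (non-missing): 4 of 5.
n_valid <- sum(!is.na(x))

# Number of '..' placeholders that are still counted as valid: 2.
n_dots <- sum(x == '..', na.rm = TRUE)
```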
" ] }, { "cell_type": "code", "execution_count": 125, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Data Frame Summary \n", "N: 12012 \n", "--------------------------------------------------------------------\n", "No Variable Stats / Values Freqs (% of Valid) Valid \n", "---- ------------- ----------------- -------------------- ----------\n", "1 yr2000 1. .. 7343 (61.1%) 12011 \n", " [character] 2. 1 85 ( 0.7%) (99.99%) \n", " 3. 2 79 ( 0.7%) \n", " 4. 4 49 ( 0.4%) \n", " 5. 3 47 ( 0.4%) \n", " 6. 100 31 ( 0.3%) \n", " 7. 5 26 ( 0.2%) \n", " 8. 7 21 ( 0.2%) \n", " 9. 6 20 ( 0.2%) \n", " 10. 26 18 ( 0.1%) \n", " [ 2901 others ] 4292 (40.2%) \n", "\n", "2 yr2001 1. .. 10189 (84.8%) 12012 \n", " [character] 2. 94.52 4 ( 0.0%) (100%) \n", " 3. 90.77 3 ( 0.0%) \n", " 4. 90.92 3 ( 0.0%) \n", " 5. 92.16 3 ( 0.0%) \n", " 6. 93.44 3 ( 0.0%) \n", " 7. 93.53 3 ( 0.0%) \n", " 8. 93.58 3 ( 0.0%) \n", " 9. 93.69 3 ( 0.0%) \n", " 10. 94.03 3 ( 0.0%) \n", " [ 1724 others ] 1795 (17.9%) \n", "--------------------------------------------------------------------\n" ] } ], "source": [ "datakab %>% select(yr2000, yr2001) %>% \n", " summarytools::dfSummary(graph.col = F, na.col = F) %>% \n", " print" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Salah satu unsur penyiapan data untuk analisis adalah mengeksplisitkan seluruh *missing values* pada data menjadi `NA` agar diproses secara benar dalam analisis berikutnya. Untuk keperluan ini, kombinasi `dplyr::mutate` dan `ifelse` bisa digunakan untuk merubah seluruh nilai variabel \"..\" menjadi `NA`.\n", "> `ifelse` akan memberikan nilai argumen kedua jika ekspresi di argumen pertamanya benar, dan memberikan nilai argumen ketiga jika ekspresi tersebut salah. Uji kesamaan di R menggunakan simbol `==`, bukan `=`. Ketidaksamaan menggunakan simbol `!=`. Sementara relasi lainnya menggunakan simbol matematika seperti umumnya, `<`, `>`, `<=`, dan `>=`. 
" ] }, { "cell_type": "code", "execution_count": 126, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Data Frame Summary \n", "N: 12012 \n", "--------------------------------------------------------------------\n", "No Variable Stats / Values Freqs (% of Valid) Valid \n", "---- ------------- ----------------- -------------------- ----------\n", "1 yr2000 1. 1 85 ( 1.8%) 4668 \n", " [character] 2. 2 79 ( 1.7%) (38.86%) \n", " 3. 4 49 ( 1.0%) \n", " 4. 3 47 ( 1.0%) \n", " 5. 100 31 ( 0.7%) \n", " 6. 5 26 ( 0.6%) \n", " 7. 7 21 ( 0.4%) \n", " 8. 6 20 ( 0.4%) \n", " 9. 26 18 ( 0.4%) \n", " 10. 11 17 ( 0.4%) \n", " [ 2900 others ] 4275 (87.3%) \n", "\n", "2 yr2001 1. 94.52 4 ( 0.2%) 1823 \n", " [character] 2. 90.77 3 ( 0.2%) (15.18%) \n", " 3. 90.92 3 ( 0.2%) \n", " 4. 92.16 3 ( 0.2%) \n", " 5. 93.44 3 ( 0.2%) \n", " 6. 93.53 3 ( 0.2%) \n", " 7. 93.58 3 ( 0.2%) \n", " 8. 93.69 3 ( 0.2%) \n", " 9. 94.03 3 ( 0.2%) \n", " 10. 94.07 3 ( 0.2%) \n", " [ 1723 others ] 1792 (90.3%) \n", "\n", "3 yr2002 1. .. 9487 (79.0%) 12012 \n", " [character] 2. 4.34 5 ( 0.0%) (100%) \n", " 3. 2.19 4 ( 0.0%) \n", " 4. 2.33 4 ( 0.0%) \n", " 5. 2.34 4 ( 0.0%) \n", " 6. 2.53 4 ( 0.0%) \n", " 7. 0.65 3 ( 0.0%) \n", " 8. 0.86 3 ( 0.0%) \n", " 9. 0.97 3 ( 0.0%) \n", " 10. 0.98 3 ( 0.0%) \n", " [ 2273 others ] 2492 (24.7%) \n", "--------------------------------------------------------------------\n" ] } ], "source": [ "datakab %>% mutate(yr2000 = ifelse(yr2000 == '..', NA, yr2000),\n", " yr2001 = ifelse(yr2001 == '..', NA, yr2001)) %>% \n", " select(yr2000:yr2002) %>% \n", " summarytools::dfSummary(graph.col = F, na.col = F) %>% \n", " print" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Gunakan `mutate_at` untuk memutasi sekaligus semua variabel yang dipilih dengan fungsi yang sama, tanpa harus mengulang satu per satu. 
So that the mutation applies to every variable, the variable name is replaced with a dot (`.`) inside the `funs` argument. In dplyr 0.8 and later `funs` is deprecated; a formula such as `~ ifelse(. == '..', NA, .)` does the same job." ] }, { "cell_type": "code", "execution_count": 127, "metadata": {}, "outputs": [], "source": [ "datakab %<>% mutate_at(vars(starts_with(\"yr20\")), funs(ifelse(. == '..', NA, .))) " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The variables whose names start with `yr` are numeric in nature, so they are better converted to a numeric type with `as.numeric`." ] }, { "cell_type": "code", "execution_count": 127, "metadata": {}, "outputs": [], "source": [ "datakab %<>% mutate_at(vars(starts_with(\"yr20\")), as.numeric)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`dfSummary` can now report summary statistics for the numeric variables, not just the frequency of each unique value." ] }, { "cell_type": "code", "execution_count": 128, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Data Frame Summary \n", "N: 12012 \n", "------------------------------------------------------------------------------------------\n", "No Variable Stats / Values Freqs (% of Valid) Valid \n", "---- ----------- --------------------------------------- ---------------------- ----------\n", "1 yr2010 mean (sd) : 1360005.99 (24387637.53) 3776 distinct values 4305 \n", " [numeric] min < med < max : (35.84%) \n", " 0.19 < 59.02 < 862158976 \n", " IQR (CV) : 42791.39 (17.93) \n", "\n", "2 yr2011 mean (sd) : 329813.33 (11716890.33) 5981 distinct values 10396 \n", " [numeric] min < med < max : (86.55%) \n", " 0 < 57 < 982540032 \n", " IQR (CV) : 80.45 (35.53) \n", "\n", "3 yr2012 mean (sd) : 150075.75 (1148162.6) 3720 distinct values 4273 \n", " [numeric] min < med < max : (35.57%) \n", " 0 < 61.9 < 44643586 \n", " IQR (CV) : 78.08 (7.65) \n", "\n", "4 yr2013 mean (sd) : 137749.78 (1099946.11) 4373 distinct values 4778 \n", " [numeric] min < med < max : (39.78%) \n", " 0 < 69.12 < 45340799 \n", " 
IQR (CV) : 74.97 (7.99) \n", "\n", "5 yr2014 mean (sd) : 52560.86 (120968.25) 2582 distinct values 3180 \n", " [numeric] min < med < max : (26.47%) \n", " 0.2 < 68.97 < 657702 \n", " IQR (CV) : 85.91 (2.3) \n", "------------------------------------------------------------------------------------------\n" ] } ], "source": [ "datakab %>% select(starts_with(\"yr201\")) %>%\n", " summarytools::dfSummary(graph.col = F, na.col = F) %>% \n", " print" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Pivoting the Table" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- Years are values, not variable names, so the year columns can be gathered into a single column/variable with `tidyr::gather`" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "datakab %<>% gather(year, val, starts_with('yr')) \n", "datakab %>% head(2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `series_code` and `series_name` columns actually hold variable names. For the data to be tidy and ready for analysis, the variable names in those columns need to become column names. \n", "\n", "`tidyr::spread` can be used for this. Before applying it, one of `series_code` or `series_name` has to be dropped first, because it is redundant and would otherwise make the result differ from what is expected." ] }, { "cell_type": "code", "execution_count": 133, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\n", "
region_nameregion_codeyearHuman Development IndexMorbidity Rate (in %)Net Enrollment Ratio: Junior Secondary (in %)Net Enrollment Ratio: Primary (in %)Net Enrollment Ratio: Senior Secondary (in %)Number of DoctorsNumber of hospitals...Number of schools at Senior Secondary levelPoverty Gap (index)Poverty Line (in IDR)Poverty Rate (in % of population)Total GDP based on expenditure (in IDR Million)Total Population (in number of people)Villages with road: Asphalt (in % of total villages)Villages with road: Dirt (in % of total villages)Villages with road: Gravel (in % of total villages)Villages with road: Other (in % of total villages)
Aceh Barat Daya, Kab. IDN_Aceh_Barat_Daya_Kab_736232000 NA NA NA NA NA NA NA ... NA NA NA NA NA NA NA NA NA NA
Aceh Barat Daya, Kab. IDN_Aceh_Barat_Daya_Kab_736232001 NA NA NA NA NA NA NA ... NA NA NA NA NA NA NA NA NA NA
\n" ], "text/latex": [ "\\begin{tabular}{r|lllllllllllllllllllllllll}\n", " region\\_name & region\\_code & year & Human Development Index & Morbidity Rate (in \\%) & Net Enrollment Ratio: Junior Secondary (in \\%) & Net Enrollment Ratio: Primary (in \\%) & Net Enrollment Ratio: Senior Secondary (in \\%) & Number of Doctors & Number of hospitals & ... & Number of schools at Senior Secondary level & Poverty Gap (index) & Poverty Line (in IDR) & Poverty Rate (in \\% of population) & Total GDP based on expenditure (in IDR Million) & Total Population (in number of people) & Villages with road: Asphalt (in \\% of total villages) & Villages with road: Dirt (in \\% of total villages) & Villages with road: Gravel (in \\% of total villages) & Villages with road: Other (in \\% of total villages)\\\\\n", "\\hline\n", "\t Aceh Barat Daya, Kab. & IDN\\_Aceh\\_Barat\\_Daya\\_Kab\\_73623 & 2000 & NA & NA & NA & NA & NA & NA & NA & ... & NA & NA & NA & NA & NA & NA & NA & NA & NA & NA \\\\\n", "\t Aceh Barat Daya, Kab. & IDN\\_Aceh\\_Barat\\_Daya\\_Kab\\_73623 & 2001 & NA & NA & NA & NA & NA & NA & NA & ... & NA & NA & NA & NA & NA & NA & NA & NA & NA & NA \\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "region_name | region_code | year | Human Development Index | Morbidity Rate (in %) | Net Enrollment Ratio: Junior Secondary (in %) | Net Enrollment Ratio: Primary (in %) | Net Enrollment Ratio: Senior Secondary (in %) | Number of Doctors | Number of hospitals | ... | Number of schools at Senior Secondary level | Poverty Gap (index) | Poverty Line (in IDR) | Poverty Rate (in % of population) | Total GDP based on expenditure (in IDR Million) | Total Population (in number of people) | Villages with road: Asphalt (in % of total villages) | Villages with road: Dirt (in % of total villages) | Villages with road: Gravel (in % of total villages) | Villages with road: Other (in % of total villages) | \n", "|---|---|\n", "| Aceh Barat Daya, Kab. 
| IDN_Aceh_Barat_Daya_Kab_73623 | 2000 | NA | NA | NA | NA | NA | NA | NA | ... | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | \n", "| Aceh Barat Daya, Kab. | IDN_Aceh_Barat_Daya_Kab_73623 | 2001 | NA | NA | NA | NA | NA | NA | NA | ... | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | \n", "\n", "\n" ], "text/plain": [ " region_name region_code year\n", "1 Aceh Barat Daya, Kab. IDN_Aceh_Barat_Daya_Kab_73623 2000\n", "2 Aceh Barat Daya, Kab. IDN_Aceh_Barat_Daya_Kab_73623 2001\n", " Human Development Index Morbidity Rate (in %)\n", "1 NA NA \n", "2 NA NA \n", " Net Enrollment Ratio: Junior Secondary (in %)\n", "1 NA \n", "2 NA \n", " Net Enrollment Ratio: Primary (in %)\n", "1 NA \n", "2 NA \n", " Net Enrollment Ratio: Senior Secondary (in %) Number of Doctors\n", "1 NA NA \n", "2 NA NA \n", " Number of hospitals ... Number of schools at Senior Secondary level\n", "1 NA ... NA \n", "2 NA ... NA \n", " Poverty Gap (index) Poverty Line (in IDR) Poverty Rate (in % of population)\n", "1 NA NA NA \n", "2 NA NA NA \n", " Total GDP based on expenditure (in IDR Million)\n", "1 NA \n", "2 NA \n", " Total Population (in number of people)\n", "1 NA \n", "2 NA \n", " Villages with road: Asphalt (in % of total villages)\n", "1 NA \n", "2 NA \n", " Villages with road: Dirt (in % of total villages)\n", "1 NA \n", "2 NA \n", " Villages with road: Gravel (in % of total villages)\n", "1 NA \n", "2 NA \n", " Villages with road: Other (in % of total villages)\n", "1 NA \n", "2 NA " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "datakab %<>% mutate(year = str_replace(year, 'yr', '') %>% \n", " as.integer) %>% \n", " select(-series_code) %>% \n", " spread(key = series_name, value = val)\n", "datakab %>% head(2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "These new column names do not follow a standard naming style. `janitor::clean_names` can be used again to standardize them. 
To shorten the variable names, the part of each name that describes the unit of measurement can be removed with `str_replace_all` from the `stringr` library.\n", "> `str_replace` and `str_detect` search for character sequences matching a pattern specified in [regular expression](https://regex101.com) syntax." ] }, { "cell_type": "code", "execution_count": 155, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\t\n", "\n", "
region_name region_code year human_development_index morbidity_rate net_enrollment_ratio_junior_secondary net_enrollment_ratio_primary net_enrollment_ratio_senior_secondary number_of_doctors number_of_hospitals ... number_of_schools_at_senior_secondary_levelpoverty_gap_index poverty_line poverty_rate total_gdp_based_on_expenditure total_population villages_with_road_asphalt villages_with_road_dirt villages_with_road_gravel villages_with_road_other
\n" ], "text/latex": [ "\\begin{tabular}{lllllllllllllllllllllllll}\n", "\t region\\_name & region\\_code & year & human\\_development\\_index & morbidity\\_rate & net\\_enrollment\\_ratio\\_junior\\_secondary & net\\_enrollment\\_ratio\\_primary & net\\_enrollment\\_ratio\\_senior\\_secondary & number\\_of\\_doctors & number\\_of\\_hospitals & ... & number\\_of\\_schools\\_at\\_senior\\_secondary\\_level & poverty\\_gap\\_index & poverty\\_line & poverty\\_rate & total\\_gdp\\_based\\_on\\_expenditure & total\\_population & villages\\_with\\_road\\_asphalt & villages\\_with\\_road\\_dirt & villages\\_with\\_road\\_gravel & villages\\_with\\_road\\_other \\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "| region_name | region_code | year | human_development_index | morbidity_rate | net_enrollment_ratio_junior_secondary | net_enrollment_ratio_primary | net_enrollment_ratio_senior_secondary | number_of_doctors | number_of_hospitals | ... | number_of_schools_at_senior_secondary_level | poverty_gap_index | poverty_line | poverty_rate | total_gdp_based_on_expenditure | total_population | villages_with_road_asphalt | villages_with_road_dirt | villages_with_road_gravel | villages_with_road_other | \n", "\n", "\n" ], "text/plain": [ " [,1] [,2] [,3] [,4] [,5] \n", "[1,] region_name region_code year human_development_index morbidity_rate\n", " [,6] [,7] \n", "[1,] net_enrollment_ratio_junior_secondary net_enrollment_ratio_primary\n", " [,8] [,9] \n", "[1,] net_enrollment_ratio_senior_secondary number_of_doctors\n", " [,10] [,11] [,12] \n", "[1,] number_of_hospitals ... 
number_of_schools_at_senior_secondary_level\n", " [,13] [,14] [,15] [,16] \n", "[1,] poverty_gap_index poverty_line poverty_rate total_gdp_based_on_expenditure\n", " [,17] [,18] [,19] \n", "[1,] total_population villages_with_road_asphalt villages_with_road_dirt\n", " [,20] [,21] \n", "[1,] villages_with_road_gravel villages_with_road_other" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "datakab %<>% janitor::clean_names()\n", "names(datakab) %<>% str_replace_all('_in_.+$', '')\n", "datakab %>% names %>% t" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Saving and Removing Objects" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- save every R object in memory to a file with `save.image`\n", "- use `save` when only some objects should be saved \n", "- save the image under the name `.RData` to have it loaded into memory automatically whenever R starts from that directory" ] }, { "cell_type": "code", "execution_count": 156, "metadata": {}, "outputs": [], "source": [ "save.image(file = \".RData\")\n", "save(datakab, file = \"datakab.rda\")" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "- remove R objects from memory with `rm`" ] }, { "cell_type": "code", "execution_count": 62, "metadata": {}, "outputs": [ { "data": { "text/html": [ "'datakab'" ], "text/latex": [ "'datakab'" ], "text/markdown": [ "'datakab'" ], "text/plain": [ "[1] \"datakab\"" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "rm(data1, data2)\n", "ls()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- `ls` returns a vector with the names of all objects in memory" ] }, { "cell_type": "code", "execution_count": 138, "metadata": {}, "outputs": [ { "data": { "text/html": [], "text/latex": [], "text/markdown": [], "text/plain": [ "character(0)" ] }, "metadata": {}, "output_type": "display_data" } ], 
"source": [ "rm(list = ls())\n", "ls()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- gunakan `load` untuk memuat semua objek dalam file image ke memori" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "- saat dimuat kembali dari file image, nama objek akan sama dengan saat disimpan \n", "- untuk bisa bebas memberikan nama baru pada objek saat dimuat kembali, simpan dengan `saveRDS` dan muat dengan `readRDS`\n", "- kedua perintah ini hanya bisa menyimpan dan memuat satu objek" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "data": { "text/html": [ "'datakab'" ], "text/latex": [ "'datakab'" ], "text/markdown": [ "'datakab'" ], "text/plain": [ "[1] \"datakab\"" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "load('datakab.rda')\n", "ls()" ] }, { "cell_type": "code", "execution_count": 157, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
    \n", "\t
  1. 'datakab'
  2. \n", "\t
  3. 'indodapoer'
  4. \n", "
\n" ], "text/latex": [ "\\begin{enumerate*}\n", "\\item 'datakab'\n", "\\item 'indodapoer'\n", "\\end{enumerate*}\n" ], "text/markdown": [ "1. 'datakab'\n", "2. 'indodapoer'\n", "\n", "\n" ], "text/plain": [ "[1] \"datakab\" \"indodapoer\"" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "saveRDS(datakab, file = \"datakab.rds\")\n", "indodapoer = readRDS('datakab.rds')\n", "ls()" ] } ], "metadata": { "hide_input": false, "kernelspec": { "display_name": "R", "language": "R", "name": "ir" }, "language_info": { "codemirror_mode": "r", "file_extension": ".r", "mimetype": "text/x-r-source", "name": "R", "pygments_lexer": "r", "version": "3.5.3" }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": false, "sideBar": true, "skip_h1_title": true, "title_cell": "Contents", "title_sidebar": "Contents", "toc_cell": false, "toc_position": { "height": "calc(100% - 180px)", "left": "10px", "top": "150px", "width": "199.2px" }, "toc_section_display": true, "toc_window_display": false } }, "nbformat": 4, "nbformat_minor": 4 }