{ "cells": [ { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import requests\n", "from bs4 import BeautifulSoup\n", "import pandas as pd\n", "import zipfile\n", "import requests\n", "import io\n", "from datetime import datetime,timedelta, date\n", "import datetime " ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "def getLinkDownload(text):\n", " if text.find('/Download/BalanceposEfek') != -1:\n", " return 1\n", " else:\n", " return 0" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Description\n", "Kepemilikan saham ada yang dimiliki lokal (Local) atau asing (Foreign). Keduanya dibagi menjadi beberapa kelompok yaitu\n", "\n", "- IS – Insurance (perusahaan asuransi)\n", "- CP – Corporate (perusahaan)\n", "- PF – Pension Fund (Dana Pensiun)\n", "- IB – Institutional Banking (Bank)\n", "- ID – Individual (Investor individu)\n", "- MF – Mutual Fund (Perusahaan Reksadana)\n", "- SC – Securities (Perusahaan Efek)\n", "- FD – Foundation (Yayasan)\n", "- OT – Others (Lainnya)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 1. Find Pattern\n", "Kita akan menggunakan fasilitas download via file zip untuk mendapatkan data kepemilikan saham di website KSEI." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "URL = \"https://www.ksei.co.id/archive_download/holding_composition\"" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "page = requests.get(URL)\n", "soup = BeautifulSoup(page.content, 'html.parser')\n", "ahref = soup.find_all(\"a\",href=True) #mendapatkan semua tag a href " ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "ahref_extract = [k.get(\"href\") for k in ahref ] # extract url di dalam href" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "link_download = [k for k in ahref_extract if getLinkDownload(k)==1] #mengambil tag a href yang mengandung link donwload" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['/Download/BalanceposEfek20210930.zip',\n", " '/Download/BalanceposEfek20210831.zip',\n", " '/Download/BalanceposEfek20210730.zip',\n", " '/Download/BalanceposEfek20210630.zip',\n", " '/Download/BalanceposEfek20210531.zip',\n", " '/Download/BalanceposEfek20210430.zip',\n", " '/Download/BalanceposEfek20210331.zip',\n", " '/Download/BalanceposEfek20210226.zip',\n", " '/Download/BalanceposEfek20210129.zip']" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "link_download" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "main_url = 'https://www.ksei.co.id'\n", "full_url = [main_url + k for k in link_download ]" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['https://www.ksei.co.id/Download/BalanceposEfek20210930.zip',\n", " 'https://www.ksei.co.id/Download/BalanceposEfek20210831.zip',\n", " 'https://www.ksei.co.id/Download/BalanceposEfek20210730.zip',\n", " 'https://www.ksei.co.id/Download/BalanceposEfek20210630.zip',\n", " 'https://www.ksei.co.id/Download/BalanceposEfek20210531.zip',\n", " 'https://www.ksei.co.id/Download/BalanceposEfek20210430.zip',\n", " 'https://www.ksei.co.id/Download/BalanceposEfek20210331.zip',\n", " 'https://www.ksei.co.id/Download/BalanceposEfek20210226.zip',\n", " 'https://www.ksei.co.id/Download/BalanceposEfek20210129.zip']" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "full_url" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 2. Import Data\n", "Setelah pattern berhasil didapatkan, maka selanjutnya melakukan download dan import data" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "ls_kepemilikan = []\n", "for k in full_url:\n", " r = requests.get(k) #request\n", " z = zipfile.ZipFile(io.BytesIO(r.content)) #download dalam bentuk zip\n", " filename = z.filelist[0].filename #mendapatkan nama file utk digunakan sebagai parameter input data\n", " z.extractall() #extract file zip\n", " df = pd.read_csv(filename,sep='|') #read data csv\n", " ls_kepemilikan.append(df)" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "df_kepemilikan = pd.concat(ls_kepemilikan)" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "# kolom Total menunjukkan total local, kolom Total.1 menunjukkan total foreign\n", "df_kepemilikan.rename(columns={\"Total\":\"Total Local\",\n", " \"Total.1\":\"Total Foreign\"},inplace=True)" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
DateCodeTypeSec. NumPriceLocal ISLocal CPLocal PFLocal IBLocal ID...Foreign ISForeign CPForeign PFForeign IBForeign IDForeign MFForeign SCForeign FDForeign OTTotal Foreign
030-SEP-2021AALIEQUITY1924688333977598046648.01.282314e+072402431549700112262713...1722373450283673647061708281519668643241181126161535208822576053697182298
130-SEP-2021ABBAEQUITY275512500053542084400.01.724088e+0913905000485142947...0204000371585003684300019000040865100
230-SEP-2021ABDAEQUITY620806680695021885.04.754088e+0709847918272...042613655509740350040000062523540517
330-SEP-2021ABMMEQUITY275316500012002277000.07.030500e+061600066518400...057406700076281004116001035170000452514400528312500
430-SEP-2021ACESEQUITY171500000001270278775522.01.039961e+08195735110318630511...202208006025454671243693406464664725955700221834694617668065491790011363601915864385789
\n", "

5 rows × 25 columns

\n", "
" ], "text/plain": [ " Date Code Type Sec. Num Price Local IS Local CP \\\n", "0 30-SEP-2021 AALI EQUITY 1924688333 9775 98046648.0 1.282314e+07 \n", "1 30-SEP-2021 ABBA EQUITY 2755125000 535 42084400.0 1.724088e+09 \n", "2 30-SEP-2021 ABDA EQUITY 620806680 6950 21885.0 4.754088e+07 \n", "3 30-SEP-2021 ABMM EQUITY 2753165000 1200 2277000.0 7.030500e+06 \n", "4 30-SEP-2021 ACES EQUITY 17150000000 1270 278775522.0 1.039961e+08 \n", "\n", " Local PF Local IB Local ID ... Foreign IS Foreign CP Foreign PF \\\n", "0 24024315 49700 112262713 ... 1722373 4502836 7364706 \n", "1 1390500 0 485142947 ... 0 20400 0 \n", "2 0 98 47918272 ... 0 426136555 0 \n", "3 1600 0 66518400 ... 0 57406700 0 \n", "4 19573511 0 318630511 ... 20220800 602545467 1243693406 \n", "\n", " Foreign IB Foreign ID Foreign MF Foreign SC Foreign FD Foreign OT \\\n", "0 17082815 1966864 32411811 26161535 208822 5760536 \n", "1 37158500 3684300 0 1900 0 0 \n", "2 97403500 400 0 0 0 62 \n", "3 7628100 411600 10351700 0 0 452514400 \n", "4 464664725 955700 2218346946 176680654 917900 1136360191 \n", "\n", " Total Foreign \n", "0 97182298 \n", "1 40865100 \n", "2 523540517 \n", "3 528312500 \n", "4 5864385789 \n", "\n", "[5 rows x 25 columns]" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_kepemilikan.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 3. Cleaning Date Format" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [], "source": [ "datelist = list(set(df_kepemilikan['Date']))" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [], "source": [ "fixed_date = []\n", "for k in datelist:\n", " day = int(k.split('-')[0])\n", " month = int(datetime.datetime.strptime(str(k.split('-')[1]), \"%b\").month)\n", " year = int(k.split('-')[2])\n", " date = datetime.date(year, month, day)\n", " fixed_date.append([k,date])" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [], "source": [ "df_fixed_date = pd.DataFrame(fixed_date)\n", "df_fixed_date.columns = ['Date','Date_Format']" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [], "source": [ "df = pd.merge(df_kepemilikan,df_fixed_date,on='Date',how='inner')\n", "df['Date'] = df['Date_Format']\n", "df.drop('Date_Format',axis=1,inplace=True)" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
DateCodeTypeSec. NumPriceLocal ISLocal CPLocal PFLocal IBLocal ID...Foreign ISForeign CPForeign PFForeign IBForeign IDForeign MFForeign SCForeign FDForeign OTTotal Foreign
02021-09-30AALIEQUITY1924688333977598046648.01.282314e+072402431549700112262713...1722373450283673647061708281519668643241181126161535208822576053697182298
12021-09-30ABBAEQUITY275512500053542084400.01.724088e+0913905000485142947...0204000371585003684300019000040865100
22021-09-30ABDAEQUITY620806680695021885.04.754088e+0709847918272...042613655509740350040000062523540517
32021-09-30ABMMEQUITY275316500012002277000.07.030500e+061600066518400...057406700076281004116001035170000452514400528312500
42021-09-30ACESEQUITY171500000001270278775522.01.039961e+08195735110318630511...202208006025454671243693406464664725955700221834694617668065491790011363601915864385789
\n", "

5 rows × 25 columns

\n", "
" ], "text/plain": [ " Date Code Type Sec. Num Price Local IS Local CP \\\n", "0 2021-09-30 AALI EQUITY 1924688333 9775 98046648.0 1.282314e+07 \n", "1 2021-09-30 ABBA EQUITY 2755125000 535 42084400.0 1.724088e+09 \n", "2 2021-09-30 ABDA EQUITY 620806680 6950 21885.0 4.754088e+07 \n", "3 2021-09-30 ABMM EQUITY 2753165000 1200 2277000.0 7.030500e+06 \n", "4 2021-09-30 ACES EQUITY 17150000000 1270 278775522.0 1.039961e+08 \n", "\n", " Local PF Local IB Local ID ... Foreign IS Foreign CP Foreign PF \\\n", "0 24024315 49700 112262713 ... 1722373 4502836 7364706 \n", "1 1390500 0 485142947 ... 0 20400 0 \n", "2 0 98 47918272 ... 0 426136555 0 \n", "3 1600 0 66518400 ... 0 57406700 0 \n", "4 19573511 0 318630511 ... 20220800 602545467 1243693406 \n", "\n", " Foreign IB Foreign ID Foreign MF Foreign SC Foreign FD Foreign OT \\\n", "0 17082815 1966864 32411811 26161535 208822 5760536 \n", "1 37158500 3684300 0 1900 0 0 \n", "2 97403500 400 0 0 0 62 \n", "3 7628100 411600 10351700 0 0 452514400 \n", "4 464664725 955700 2218346946 176680654 917900 1136360191 \n", "\n", " Total Foreign \n", "0 97182298 \n", "1 40865100 \n", "2 523540517 \n", "3 528312500 \n", "4 5864385789 \n", "\n", "[5 rows x 25 columns]" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 4. Summary\n", "Pada tahapan ini kita sudah mulai melakukan ekslorasi data sesuai yang kita inginkan" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 4.1 Proporsi Lokal vs Asing\n", "Kita akan menacari berapa persentase asing dan lokal untuk masing masing emiten selama beberapa bulan" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [], "source": [ "df['Total All'] = df['Total Local'] + df['Total Foreign']" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [], "source": [ "df['persentase_lokal'] = 100*df['Total Local']/df['Total All']\n", "df['persentase_asing'] = 100*df['Total Foreign']/df['Total All']" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [], "source": [ "df1 = df[['Date','Type','Code','Price','persentase_lokal','persentase_asing']]\n", "df1 = df1[df1.Type=='EQUITY']" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [], "source": [ "desired_emiten = ['ACES','ADRO','ANTM','ASII','BRIS','CPIN','ICBP',\n", " 'INDF','JSMR','KAEF','KLBF','TLKM','UNVR','WIKA',\n", " 'BBCA','BBNI','BMRI','BUKA','EMTK']" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [], "source": [ "df_filter1 = df1[df1.Code.isin(desired_emiten)].sort_values('persentase_asing',ascending=False)\n", "df_filter1.sort_values('Date',ascending=True,inplace=True)" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
DateTypeCodePricepersentase_lokalpersentase_asing
181172021-01-29EQUITYTLKM311029.66306370.336937
159332021-02-26EQUITYTLKM349028.82000771.179993
137472021-03-31EQUITYTLKM342028.12873071.871270
115652021-04-30EQUITYTLKM320028.17696371.823037
94022021-05-31EQUITYTLKM344027.91537972.084621
72462021-06-30EQUITYTLKM315027.24448172.755519
50762021-07-30EQUITYTLKM324027.14036272.859638
29112021-08-31EQUITYTLKM340026.45708673.542914
7192021-09-30EQUITYTLKM369025.58561574.414385
\n", "
" ], "text/plain": [ " Date Type Code Price persentase_lokal persentase_asing\n", "18117 2021-01-29 EQUITY TLKM 3110 29.663063 70.336937\n", "15933 2021-02-26 EQUITY TLKM 3490 28.820007 71.179993\n", "13747 2021-03-31 EQUITY TLKM 3420 28.128730 71.871270\n", "11565 2021-04-30 EQUITY TLKM 3200 28.176963 71.823037\n", "9402 2021-05-31 EQUITY TLKM 3440 27.915379 72.084621\n", "7246 2021-06-30 EQUITY TLKM 3150 27.244481 72.755519\n", "5076 2021-07-30 EQUITY TLKM 3240 27.140362 72.859638\n", "2911 2021-08-31 EQUITY TLKM 3400 26.457086 73.542914\n", "719 2021-09-30 EQUITY TLKM 3690 25.585615 74.414385" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_filter1[df_filter1.Code=='TLKM']" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
DateTypeCodePricepersentase_lokalpersentase_asing
43722021-07-30EQUITYACES132012.91290087.087100
22052021-08-31EQUITYACES139013.28121986.718781
176082021-01-29EQUITYCPIN575013.35954586.640455
154232021-02-26EQUITYCPIN615013.74810886.251892
1872021-09-30EQUITYCPIN642514.29851985.701481
\n", "
" ], "text/plain": [ " Date Type Code Price persentase_lokal persentase_asing\n", "4372 2021-07-30 EQUITY ACES 1320 12.912900 87.087100\n", "2205 2021-08-31 EQUITY ACES 1390 13.281219 86.718781\n", "17608 2021-01-29 EQUITY CPIN 5750 13.359545 86.640455\n", "15423 2021-02-26 EQUITY CPIN 6150 13.748108 86.251892\n", "187 2021-09-30 EQUITY CPIN 6425 14.298519 85.701481" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_filter1.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Store Data" ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [], "source": [ "df.to_csv('DataKepemilikanSaham.csv',index=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Refference\n", "1. https://www.ksei.co.id/archive_download/holding_composition\n", "2. https://www.ksei.co.id/Download/Panduan_Data_Statik_Investor.pdf\n", "3. https://www.programiz.com/python-programming/datetime/strftime" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.5" } }, "nbformat": 4, "nbformat_minor": 4 }