{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# **Review of the Previous Lesson**\n", "**Python Basic Syntax and Pandas**" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "### **review Data Datum**\n", "1. Datum (number, \"string\"), Data ([list], {dict}, (tuple,))\n", "1. Modules, functions, and methods (built-in / external / user-defined)\n", "1. Using **[]** on **strings** ([index], [:slicing])\n", "1. Using **[]** on **[list]** objects ([index], [:slicing], [if condition])\n", "1. [ for / if written in list format (list comprehension) ]\n", "1. { for / if written in dict format (dict comprehension) }\n", "1. for: iteration, if: condition check\n", "1. enumerate(): when looping over a [list] with for, yields (index, value) tuples\n", "1. .items(): when looping over a {dict} with for, yields (key, value) tuples\n", "1. Web crawling ==> type conversion ==> list, dict, or pandas objects ==> visualization\n", "1. ndarray, Series, DataFrame" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "### **review Series**\n", "1. pd.Series([ data ], index=[ index ])\n", "1. Series arithmetic (+, -, *, /)\n", "1. series[ Boolean condition ]\n", "1. series.index = [ list ]\n", "1. series.isnull()\n", "1. series.drop()" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "### **review DataFrame statistics**\n", "1. .count()\n", "1. .describe()\n", "1. .min() .max()\n", "1. .idxmin() .idxmax()\n", "1. .quantile()\n", "1. .sum()\n", "1. .mean() .median()\n", "1. .var() variance, .std() standard deviation\n", "1. .cumsum() .cumprod() cumulative sum, cumulative product\n", "1. .cummin() .cummax() cumulative minimum, cumulative maximum" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "### **review Series & DataFrame missing-value handling**\n", "1. df.dropna()\n", "1. df.fillna(method='ffill', limit=2) # fill missing values forward (pandas 2.1+: df.ffill(limit=2))\n", "1. df.fillna(df.mean()['column_name']) # fills every column with that one column's mean\n", "1. Series.interpolate(method='time') # interpolate missing values, weighted by the time spacing of the index\n", "1. Series.interpolate(method='values', limit=1, limit_direction='backward') # limit_direction: 'forward', 'backward', 'both'" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "### **review TimeSeries**\n", "1. from datetime import datetime\n", "1. pandas.date_range(end='2017-07-01', periods=30, freq='BM')\n", "1. pandas.date_range('2017/8/8 09:09:09', periods=5, normalize=True)\n", "1. [str(date.date()) for date in pd.date_range('2017/01/01', '2017/01/11')]" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "
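Here is a minimal sketch of the missing-value review items above, run on a small hypothetical daily Series (the values are made up for illustration). It uses `s.ffill(limit=2)`, which pandas 2.1 and later prefer over the deprecated `fillna(method='ffill')`. The other calls match the list.

```python
import numpy as np
import pandas as pd

# Hypothetical daily values with three consecutive gaps.
s = pd.Series([1.0, np.nan, np.nan, np.nan, 5.0],
              index=pd.date_range('2018-01-01', periods=5, freq='D'))

dropped = s.dropna()                   # keep only the observed values
filled = s.ffill(limit=2)              # forward-fill at most 2 consecutive NaNs
mean_filled = s.fillna(s.mean())       # replace NaN with the mean of the observed values (3.0)
interp = s.interpolate(method='time')  # linear interpolation along the DatetimeIndex

print(dropped.tolist())          # [1.0, 5.0]
print(filled.tolist())           # [1.0, 1.0, 1.0, nan, 5.0]
print(mean_filled.tolist())      # [1.0, 3.0, 3.0, 3.0, 5.0]
print(interp.round(6).tolist())  # [1.0, 2.0, 3.0, 4.0, 5.0]
```

Because of `limit=2`, the third NaN in a row stays missing. `interpolate` can then fill what `ffill` leaves behind.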

\n", "## **Question About Pandas**\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "### **Question 1**\n", "Does Business Day (freq='B') include holidays?" ] },
{ "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "# Create a DatetimeIndex covering one year (365 days)\n", "import pandas as pd\n", "one_year = pd.date_range('2018-01-01', periods=365)" ] },
{ "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "data = [i for i in range(len(one_year))]\n", "data = pd.Series(data, index=one_year)" ] },
{ "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "# Group the daily data into business-day bins\n", "businessday = data.resample('B')" ] },
{ "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "52.0" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# 365 days - 261 business days = 104 weekend days = 52 weekends:\n", "# 'B' removes only Saturdays and Sundays, not public holidays\n", "(365 - len(businessday)) / 2" ] },
{ "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "2018-01-31 15.0\n", "2018-02-28 44.5\n", "2018-03-31 74.0\n", "2018-04-30 104.5\n", "2018-05-31 135.0\n", "2018-06-30 165.5\n", "2018-07-31 196.0\n", "2018-08-31 227.0\n", "2018-09-30 257.5\n", "2018-10-31 288.0\n", "2018-11-30 318.5\n", "2018-12-31 349.0\n", "Freq: M, dtype: float64" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Tip: monthly mean\n", "data.resample('M').mean()" ] },
{ "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": {
"text/plain": [ " open high low close\n", "2018-01-31 0 30 0 30\n", "2018-02-28 31 58 31 58\n", "2018-03-31 59 89 59 89\n", "2018-04-30 90 119 90 119\n", "2018-05-31 120 150 120 150\n", "2018-06-30 151 180 151 180\n", "2018-07-31 181 211 181 211\n", "2018-08-31 212 242 212 242\n", "2018-09-30 243 272 243 272\n", "2018-10-31 273 303 273 303\n", "2018-11-30 304 333 304 333\n", "2018-12-31 334 364 334 364" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Tip: monthly open/high/low/close\n", "data.resample('M').ohlc()" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "
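The 52.0 above shows that `'B'` removes only Saturdays and Sundays. pandas has no built-in public-holiday calendar for it. The following is a minimal sketch of supplying holidays yourself with `pd.offsets.CustomBusinessDay`. The three holiday dates are illustrative assumptions (all of them weekdays in 2018), not a complete Korean market calendar.

```python
import pandas as pd

# freq='B' skips Saturdays and Sundays only.
bdays = pd.date_range('2018-01-01', '2018-12-31', freq='B')
print(len(bdays))  # 261

# Holidays must be supplied explicitly; these dates are illustrative only.
holidays = ['2018-01-01', '2018-03-01', '2018-05-01']
cbd = pd.offsets.CustomBusinessDay(holidays=holidays)
trading_days = pd.date_range('2018-01-01', '2018-12-31', freq=cbd)
print(len(trading_days))  # 258 = 261 - 3 weekday holidays
print(pd.Timestamp('2018-03-01') in trading_days)  # False
```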
\n", "### **Question 2**\n", "For high-frequency (scalping) trading, can we also generate timestamps at one-second intervals or finer?" ] },
{ "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "DatetimeIndex(['2011-01-01 00:00:00', '2011-01-01 01:00:00',\n", " '2011-01-01 02:00:00', '2011-01-01 03:00:00',\n", " '2011-01-01 04:00:00'],\n", " dtype='datetime64[ns]', freq='H')" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# 'H': hourly\n", "pd.date_range('1/1/2011', periods=5, freq='H')" ] },
{ "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "DatetimeIndex(['2011-01-01 00:00:00', '2011-01-01 00:01:00',\n", " '2011-01-01 00:02:00', '2011-01-01 00:03:00',\n", " '2011-01-01 00:04:00'],\n", " dtype='datetime64[ns]', freq='T')" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# 'min': every minute (displayed as freq='T')\n", "pd.date_range('1/1/2011', periods=5, freq='min')" ] },
{ "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "DatetimeIndex(['2011-01-01 00:00:00', '2011-01-01 00:00:01',\n", " '2011-01-01 00:00:02', '2011-01-01 00:00:03',\n", " '2011-01-01 00:00:04'],\n", " dtype='datetime64[ns]', freq='S')" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# 's': every second (displayed as freq='S')\n", "pd.date_range('1/1/2011', periods=5, freq='s')" ] },
{ "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "DatetimeIndex([ '2011-01-01 00:00:00', '2011-01-01 00:00:00.001000',\n", " '2011-01-01 00:00:00.002000', '2011-01-01 00:00:00.003000',\n", " '2011-01-01 00:00:00.004000'],\n", " dtype='datetime64[ns]', freq='L')" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# 'ms': every millisecond (displayed as freq='L')\n", "pd.date_range('1/1/2011', periods=5, freq='ms')" ] },
{ "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "DatetimeIndex([ '2011-01-01 00:00:00', '2011-01-01 00:00:00.001000',\n", " '2011-01-01 00:00:00.002000', '2011-01-01 00:00:00.003000',\n", " '2011-01-01 00:00:00.004000'],\n", " dtype='datetime64[ns]', freq='L')" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# '1ms': the same as 'ms', with an explicit multiplier\n", "pd.date_range('1/1/2011', periods=5, freq='1ms')" ] },
{ "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "import numpy as np" ] },
{ "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "data = np.random.rand(110)" ] },
{ "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [], "source": [ "date = pd.date_range('2018-06-06', periods=110, freq='ms')" ] },
{ "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "2018-06-06 00:00:00.000 0.724359\n", "2018-06-06 00:00:00.001 0.426986\n", "2018-06-06 00:00:00.002 0.908999\n", "2018-06-06 00:00:00.003 0.247796\n", "2018-06-06 00:00:00.004 0.065260\n", "2018-06-06 00:00:00.005 0.959609\n", "2018-06-06 00:00:00.006 0.354474\n", "2018-06-06 00:00:00.007 0.635283\n", "2018-06-06 00:00:00.008 0.254580\n", "2018-06-06 00:00:00.009 0.426275\n", "2018-06-06 00:00:00.010 0.169618\n", "2018-06-06 00:00:00.011 0.088882\n", "2018-06-06 00:00:00.012 0.258088\n", "2018-06-06 00:00:00.013 0.659739\n", "2018-06-06 00:00:00.014 0.474797\n", "2018-06-06 00:00:00.015 0.909689\n", "2018-06-06 00:00:00.016 0.980873\n", "2018-06-06 00:00:00.017 0.167155\n", "2018-06-06 00:00:00.018 0.374242\n", "2018-06-06 00:00:00.019 0.057846\n", "Freq: L, dtype: float64" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.Series(data, index=date)[:20]" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "
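Tick data at millisecond resolution is usually aggregated into fixed-length bars before analysis. This sketch uses randomly generated values at 1 ms spacing. They are seeded for repeatability and are not real market data. It resamples them into 10 ms OHLC bars with the same `ohlc()` call used in Question 1. Finer aliases such as `'us'` (microseconds) also exist.

```python
import numpy as np
import pandas as pd

# Illustrative tick data: 110 random values spaced 1 millisecond apart.
rng = np.random.default_rng(0)
index = pd.date_range('2018-06-06', periods=110, freq='ms')
ticks = pd.Series(rng.random(110), index=index)

# Aggregate the ticks into 10-millisecond open/high/low/close bars.
bars = ticks.resample('10ms').ohlc()
print(len(bars))  # 11 bars: 110 ticks, 10 ticks per bar
print(bars.head(3))

# Microsecond resolution works the same way.
print(pd.date_range('2018-06-06', periods=3, freq='us'))
```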
\n", "### **Question 3**\n", "1. Time zone names: https://en.wikipedia.org/wiki/List_of_tz_database_time_zones\n", "1. How do we change the time zone of data created with pd.date_range()?" ] },
{ "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [], "source": [ "import pandas as pd" ] },
{ "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "DatetimeIndex(['2018-06-01 00:00:00', '2018-06-01 01:00:00',\n", " '2018-06-01 02:00:00', '2018-06-01 03:00:00',\n", " '2018-06-01 04:00:00'],\n", " dtype='datetime64[ns]', freq='H')" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# No tz argument: the index is tz-naive (it carries no time zone information)\n", "date_nan = pd.date_range('2018-06-01', periods=5, freq='h')\n", "date_nan" ] },
{ "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "DatetimeIndex(['2018-06-01 00:00:00+00:00', '2018-06-01 01:00:00+00:00',\n", " '2018-06-01 02:00:00+00:00', '2018-06-01 03:00:00+00:00',\n", " '2018-06-01 04:00:00+00:00'],\n", " dtype='datetime64[ns, UTC]', freq='H')" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# tz='UTC' creates a time-zone-aware index\n", "date_utc = pd.date_range('2018-06-01', periods=5, freq='h', tz='UTC')\n", "date_utc" ] },
{ "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "DatetimeIndex(['2018-01-01 00:00:00+09:00', '2018-01-02 00:00:00+09:00',\n", " '2018-01-03 00:00:00+09:00', '2018-01-04 00:00:00+09:00',\n", " '2018-01-05 00:00:00+09:00'],\n", " dtype='datetime64[ns, Asia/Tokyo]', freq='D')" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# 'Asia/Tokyo' has the same UTC+09:00 offset as Korea; Korea's own zone name is 'Asia/Seoul'\n", "date_kor = pd.date_range(start='2018-01-01', periods=5, tz='Asia/Tokyo')\n", "date_kor" ] },
{ "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "DatetimeIndex(['2017-12-31 15:00:00+00:00', '2018-01-01 15:00:00+00:00',\n", " '2018-01-02 15:00:00+00:00', '2018-01-03 15:00:00+00:00',\n",
" '2018-01-04 15:00:00+00:00'],\n", " dtype='datetime64[ns, UTC]', freq='D')" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "date_kor.tz_convert('UTC')" ] },
{ "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "DatetimeIndex(['2018-06-01 09:00:00+09:00', '2018-06-01 10:00:00+09:00',\n", " '2018-06-01 11:00:00+09:00', '2018-06-01 12:00:00+09:00',\n", " '2018-06-01 13:00:00+09:00'],\n", " dtype='datetime64[ns, Asia/Tokyo]', freq='H')" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "date_utc.tz_convert('Asia/Tokyo')" ] },
{ "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [], "source": [ "# date_nan was created without a time zone (tz-naive), so tz_convert() raises a TypeError.\n", "# Attach a zone first, e.g. date_nan.tz_localize('UTC'), and then convert.\n", "# date_nan.tz_convert(9)\n", "# date_nan.tz_convert('UTC')" ] },
{ "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "pytz.FixedOffset(0)\n" ] }, { "data": { "text/plain": [ "DatetimeIndex(['2018-06-01 00:00:09+00:00:09', '2018-06-01 01:00:09+00:00:09',\n", " '2018-06-01 02:00:09+00:00:09', '2018-06-01 03:00:09+00:00:09',\n", " '2018-06-01 04:00:09+00:00:09'],\n", " dtype='datetime64[ns, pytz.FixedOffset(0)]', freq='H')" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Caution: the integer 9 is NOT read as 9 hours; the output shows an offset of +00:00:09 (9 seconds).\n", "# For UTC+09:00, pass a zone name such as 'Asia/Seoul' instead.\n", "data_nine = date_utc.tz_convert(9)\n", "print(data_nine.tzinfo)\n", "data_nine[:10]" ] },
{ "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [], "source": [ "# To browse the available names,\n", "# see 'List of tz database time zones' on Wikipedia (linked in Question 3)" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "
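The following is a minimal sketch of the usual two-step approach for a tz-naive index such as `date_nan`. First, attach a zone with `tz_localize`. Then move between zones with `tz_convert`. The sketch assumes the naive times are Korean local time (`'Asia/Seoul'`). For a fixed UTC+09:00 offset, it passes a `datetime.timezone` object instead of the bare integer `9`.

```python
from datetime import timedelta, timezone

import pandas as pd

# A tz-naive index, like date_nan above (assumed here to be Korean local time).
naive = pd.date_range('2018-06-01', periods=3, freq='h')

# tz_convert cannot be used on naive timestamps.
try:
    naive.tz_convert('UTC')
except TypeError as err:
    print('TypeError:', err)

# Step 1: tz_localize attaches a zone without changing the wall-clock times.
seoul = naive.tz_localize('Asia/Seoul')
# Step 2: tz_convert expresses the same instants in another zone.
utc = seoul.tz_convert('UTC')
# A fixed UTC+09:00 offset, given explicitly as a timezone object.
plus9 = utc.tz_convert(timezone(timedelta(hours=9)))

print(seoul[0])  # 2018-06-01 00:00:00+09:00
print(utc[0])    # 2018-05-31 15:00:00+00:00
print(plus9[0])  # 2018-06-01 00:00:00+09:00
```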
\n", "### **Question 4**\n", "Last time, the KOSDAQ data had many empty stretches. Why?\n", "> **from** pandas_datareader **import** get_data_yahoo" ] },
{ "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [], "source": [ "from pandas_datareader import get_data_yahoo\n", "kosdaq = get_data_yahoo('031510.KQ', '2018-01-01').Close" ] },
{ "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "%matplotlib inline\n", "kosdaq.plot()" ] },
{ "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Date\n", "2018-01-19 6400\n", "2018-01-22 6350\n", "2018-01-23 6260\n", "2018-01-24 6090\n", "2018-01-25 6270\n", "2018-01-26 6230\n", "2018-01-29 6090\n", "2018-01-30 5870\n", "2018-01-31 5640\n", "2018-06-08 5050\n", "Name: Close, dtype: int64" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "kosdaq.index = pd.DatetimeIndex(kosdaq.index)\n", "kosdaq.tail(10)" ] },
{ "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [], "source": [ "# At the time of writing, the Yahoo Finance API returns this series with an empty stretch in the middle\n", "# There are no rows between 2018-01-31 and 2018-06-08 (about four months; see the tail(10) output above)\n", "# To fix this, fill the gap from a separate source, e.g. by crawling Naver Finance" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "
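One way to spot gaps like this before plotting is to check the spacing between consecutive index dates. Another is to reindex onto a business-day calendar and count the NaNs. The sketch below hard-codes the last four closes from the `tail(10)` output above. `pd.bdate_range` ignores public holidays, so its NaN count somewhat overstates the number of missing trading days.

```python
import pandas as pd

# The last four closes of 031510.KQ, copied from the tail(10) output above.
close = pd.Series(
    [6090, 5870, 5640, 5050],
    index=pd.to_datetime(['2018-01-29', '2018-01-30', '2018-01-31', '2018-06-08']),
)

# Spacing between consecutive dates; anything over 5 days is flagged as a gap.
spacing = close.index.to_series().diff()
gaps = spacing[spacing > pd.Timedelta(days=5)]
print(gaps)

# Reindex onto Mon-Fri business days: the missing days show up as NaN.
full = close.reindex(pd.bdate_range(close.index.min(), close.index.max()))
print(full.isna().sum())  # 91 business days without data
```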
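\n", "### **Question 5**\n", "Why do we convert the data with .pct_change()?\n", "#### Data preprocessing\n", "1. Standardization: rescale the data relative to its own statistics (subtract the mean, divide by the standard deviation)\n", "1. Normalization: rescale the values into the range 0 to 1" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "Q. In voice or signal analysis, which data matters?\n", "\n", "A. The raw data itself, which is standardized or normalized directly" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "Q. In financial analysis, which data matters?\n", "\n", "A. Not the raw price level (X)\n", "\n", "A. Whether a position made a profit or a loss\n", "\n", "That is why the data is first converted to **rates of change** and then to **log returns**. Log returns make analysis across periods easier, because the log returns of consecutive periods simply add up." ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "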
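Here is a small numeric check of the points above, using hypothetical prices 100, 110, 99, 108.9 (+10%, -10%, +10%). The simple returns from `pct_change()` sum to about 0.1, even though the price rose only 8.9% overall. The log returns sum to the log of the total price ratio, up to floating-point rounding. The last lines apply the z-score standardization and the 0-to-1 normalization from the preprocessing list for comparison.

```python
import numpy as np
import pandas as pd

# Hypothetical closing prices: +10%, -10%, +10%.
close = pd.Series([100.0, 110.0, 99.0, 108.9])

simple = close.pct_change()               # (P_t - P_t-1) / P_t-1
log_ret = np.log(close / close.shift(1))  # ln(P_t / P_t-1)

total = close.iloc[-1] / close.iloc[0] - 1  # actual return over the whole period (about 0.089)
print(simple.sum(), total)                  # simple returns sum to about 0.1, not 0.089
print(np.isclose(log_ret.sum(), np.log(1 + total)))  # True: log returns are additive

# Preprocessing from the list above, for comparison.
standardized = (close - close.mean()) / close.std()               # mean 0, standard deviation 1
normalized = (close - close.min()) / (close.max() - close.min())  # range 0 to 1
print(normalized.round(3).tolist())
```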
\n", "### **Question 6**\n", "When collecting Bitcoin prices, how do we clean up junk characters (such as '*') in the column names?" ] },
{ "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "data": {
"text/plain": [ " Date Open* High Low Close** Volume \\\n", "143 May 26, 2018 7486.48 7595.16 7349.12 7355.88 4051540000 \n", "144 May 27, 2018 7362.08 7381.74 7270.96 7368.22 4056520000 \n", "145 May 28, 2018 7371.31 7419.05 7100.89 7135.99 5040600000 \n", "146 May 29, 2018 7129.46 7526.42 7090.68 7472.59 5662660000 \n", "147 May 30, 2018 7469.73 7573.77 7313.60 7406.52 4922540000 \n", "\n", " Market Cap \n", "143 127682000000 \n", "144 125575000000 \n", "145 125748000000 \n", "146 121636000000 \n", "147 127454000000 " ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "prices = pd.read_html(\"https://coinmarketcap.com/currencies/bitcoin/historical-data/?start=20180103&end=20180530\")[0]\n", "# Reverse the rows so the oldest date comes first\n", "prices = prices[::-1]\n", "prices.reset_index(inplace=True, drop=True)\n", "prices.tail()" ] },
{ "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "data": {
"text/plain": [ " Open High Low Close Volume Market Cap\n", "Date \n", "2018-01-03 14978.2 15572.8 14844.5 15201.0 16871900000 251312000000\n", "2018-01-04 15270.7 15739.7 14522.2 15599.2 21783200000 256250000000\n", "2018-01-05 15477.2 17705.2 15202.8 17429.5 23840900000 259748000000" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "prices.index = pd.DatetimeIndex(prices.Date)\n", "# Remove the '*' markers from column names such as 'Open*' and 'Close**'\n", "prices.columns = [col.replace('*', '') for col in prices.columns]\n", "del prices['Date']\n", "prices.head(3)" ] },
{ "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "<class 'pandas.core.frame.DataFrame'>\n", "DatetimeIndex: 148 entries, 2018-01-03 to 2018-05-30\n", "Data columns (total 6 columns):\n", "Open 148 non-null float64\n", "High 148 non-null float64\n", "Low 148 non-null float64\n", "Close 148 non-null float64\n", "Volume 148 non-null int64\n", "Market Cap 148 non-null int64\n", "dtypes: float64(4), int64(2)\n", "memory usage: 8.1 KB\n" ] } ], "source": [ "prices.info()" ] },
{ "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "data": {
"text/plain": [ " Open High Low Close Volume Market Cap\n", "Date \n", "2018-05-26 7486.48 7595.16 7349.12 7355.88 4051540000 127682000000\n", "2018-05-27 7362.08 7381.74 7270.96 7368.22 4056520000 125575000000\n", "2018-05-28 7371.31 7419.05 7100.89 7135.99 5040600000 125748000000\n", "2018-05-29 7129.46 7526.42 7090.68 7472.59 5662660000 121636000000\n", "2018-05-30 7469.73 7573.77 7313.60 7406.52 4922540000 127454000000" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "%matplotlib inline\n", "prices.Close.plot(figsize=(16,4), grid=True)\n", "prices.tail()" ] },
{ "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "from pandas_datareader import get_data_google" ] },
{ "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/home/markbaum/Python/python/lib/python3.6/site-packages/pandas_datareader/google/daily.py:41: UnstableAPIWarning: \n", "The Google Finance API has not been stable since late 2017. Requests seem\n", "to fail at random. Failure is especially common when bulk downloading.\n", "\n", " warnings.warn(UNSTABLE_WARNING, UnstableAPIWarning)\n" ] }, { "ename": "RemoteDataError", "evalue": "Unable to read URL: https://finance.google.co.uk/bctzjpnsun/historical?q=KRX%3AKOSPI&startdate=Jan+01%2C+2010&enddate=Jun+08%2C+2018&output=csv\nResponse Text:\nb'\\n\\n \\n \\n Error 404 (Not Found)!!1\\n \\n \\n

404. That\\xe2\\x80\\x99s an error.\\n

The requested URL /bctzjpnsun/historical was not found on this server. That\\xe2\\x80\\x99s all we know.\\n'", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mRemoteDataError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mget_data_google\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'KRX:KOSPI'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;32m~/Python/python/lib/python3.6/site-packages/pandas_datareader/data.py\u001b[0m in \u001b[0;36mget_data_google\u001b[0;34m(*args, **kwargs)\u001b[0m\n\u001b[1;32m 65\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 66\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0mget_data_google\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m*\u001b[0m\u001b[0margs\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mkwargs\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 67\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0mGoogleDailyReader\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m*\u001b[0m\u001b[0margs\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mkwargs\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mread\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 68\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 69\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m~/Python/python/lib/python3.6/site-packages/pandas_datareader/base.py\u001b[0m in \u001b[0;36mread\u001b[0;34m(self)\u001b[0m\n\u001b[1;32m 208\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0misinstance\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0msymbols\u001b[0m\u001b[0;34m,\u001b[0m 
\u001b[0;34m(\u001b[0m\u001b[0mcompat\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mstring_types\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mint\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 209\u001b[0m df = self._read_one_data(self.url,\n\u001b[0;32m--> 210\u001b[0;31m params=self._get_params(self.symbols))\n\u001b[0m\u001b[1;32m 211\u001b[0m \u001b[0;31m# Or multiple symbols, (e.g., ['GOOG', 'AAPL', 'MSFT'])\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 212\u001b[0m \u001b[0;32melif\u001b[0m \u001b[0misinstance\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0msymbols\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mDataFrame\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m~/Python/python/lib/python3.6/site-packages/pandas_datareader/base.py\u001b[0m in \u001b[0;36m_read_one_data\u001b[0;34m(self, url, params)\u001b[0m\n\u001b[1;32m 82\u001b[0m \u001b[0;34m\"\"\" read one data from specified URL \"\"\"\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 83\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_format\u001b[0m \u001b[0;34m==\u001b[0m \u001b[0;34m'string'\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 84\u001b[0;31m \u001b[0mout\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_read_url_as_StringIO\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0murl\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mparams\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mparams\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 85\u001b[0m \u001b[0;32melif\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_format\u001b[0m \u001b[0;34m==\u001b[0m \u001b[0;34m'json'\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 86\u001b[0m \u001b[0mout\u001b[0m 
\u001b[0;34m=\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_get_response\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0murl\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mparams\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mparams\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mjson\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m~/Python/python/lib/python3.6/site-packages/pandas_datareader/base.py\u001b[0m in \u001b[0;36m_read_url_as_StringIO\u001b[0;34m(self, url, params)\u001b[0m\n\u001b[1;32m 93\u001b[0m \u001b[0mOpen\u001b[0m \u001b[0murl\u001b[0m \u001b[0;34m(\u001b[0m\u001b[0;32mand\u001b[0m \u001b[0mretry\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 94\u001b[0m \"\"\"\n\u001b[0;32m---> 95\u001b[0;31m \u001b[0mresponse\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_get_response\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0murl\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mparams\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mparams\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 96\u001b[0m \u001b[0mtext\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_sanitize_response\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mresponse\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 97\u001b[0m \u001b[0mout\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mStringIO\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m~/Python/python/lib/python3.6/site-packages/pandas_datareader/base.py\u001b[0m in \u001b[0;36m_get_response\u001b[0;34m(self, url, params, headers)\u001b[0m\n\u001b[1;32m 153\u001b[0m \u001b[0mmsg\u001b[0m \u001b[0;34m+=\u001b[0m \u001b[0;34m'\\nResponse 
Text:\\n{0}'\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mformat\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mlast_response_text\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 154\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 155\u001b[0;31m \u001b[0;32mraise\u001b[0m \u001b[0mRemoteDataError\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mmsg\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 156\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 157\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0m_get_crumb\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m*\u001b[0m\u001b[0margs\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;31mRemoteDataError\u001b[0m: Unable to read URL: https://finance.google.co.uk/bctzjpnsun/historical?q=KRX%3AKOSPI&startdate=Jan+01%2C+2010&enddate=Jun+08%2C+2018&output=csv\nResponse Text:\nb'\\n\\n \\n \\n Error 404 (Not Found)!!1\\n \\n \\n

404. That\\xe2\\x80\\x99s an error.\\n

The requested URL /bctzjpnsun/historical was not found on this server. That\\xe2\\x80\\x99s all we know.\\n'" ] } ], "source": [ "get_data_google('KRX:KOSPI')" ] },
{ "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "from googlefinance.get import get_data" ] },
{ "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": {
"text/plain": [ " code open high low close volume\n", "date \n", "2018-04-24 KRX:KOSPI 2478.47 2479.32 2454.44 2464.14 524280000\n", "2018-04-25 KRX:KOSPI 2444.19 2453.54 2436.51 2448.81 466095000\n", "2018-04-26 KRX:KOSPI 2460.62 2484.09 2456.20 2475.64 520917000\n", "2018-04-27 KRX:KOSPI 2497.75 2508.13 2484.19 2492.40 464957000\n", "2018-04-30 KRX:KOSPI 2502.29 2515.38 2500.22 2515.38 746291000\n", "2018-05-03 KRX:KOSPI 2506.94 2507.91 2487.25 2487.25 614102000\n", "2018-05-04 KRX:KOSPI 2486.47 2487.77 2461.38 2461.38 612024000\n", "2018-05-08 KRX:KOSPI 2468.45 2479.75 2444.08 2449.81 714734000\n", "2018-05-09 KRX:KOSPI 2450.71 2451.86 2428.79 2443.98 593365000\n", "2018-05-10 KRX:KOSPI 2458.67 2464.72 2448.01 2464.16 490004000\n", "2018-05-11 KRX:KOSPI 2469.30 2483.85 2468.41 2477.71 599900000\n", "2018-05-14 KRX:KOSPI 2482.97 2486.17 2471.91 2476.11 650707000\n", "2018-05-15 KRX:KOSPI 2476.87 2480.22 2456.20 2458.54 712619000\n", "2018-05-16 KRX:KOSPI 2446.64 2465.55 2444.67 2459.82 691813000\n", "2018-05-17 KRX:KOSPI 2468.72 2472.82 2448.43 2448.45 596771000\n", "2018-05-18 KRX:KOSPI 2459.73 2461.95 2452.34 2460.65 431376000\n", "2018-05-21 KRX:KOSPI 2464.07 2472.30 2447.69 2465.57 588088000\n", "2018-05-23 KRX:KOSPI 2462.98 2476.86 2460.07 2471.91 752675000\n", "2018-05-24 KRX:KOSPI 2477.48 2481.31 2457.84 2466.01 638125000\n", "2018-05-25 KRX:KOSPI 2452.80 2466.57 2444.77 2460.80 653592000\n", "2018-05-28 KRX:KOSPI 2465.00 2482.40 2463.14 2478.96 708362000\n", "2018-05-29 KRX:KOSPI 2476.70 2479.68 2457.18 2457.25 571836000\n", "2018-05-30 KRX:KOSPI 2446.81 2449.88 2399.58 2409.03 575922000\n", "2018-05-31 KRX:KOSPI 2428.83 2430.15 2415.50 2423.01 816930000\n", "2018-06-01 KRX:KOSPI 2419.63 2445.31 2418.11 2438.96 603023000\n", "2018-06-04 KRX:KOSPI 2444.62 2452.67 2441.25 2447.76 438929000\n", "2018-06-05 KRX:KOSPI 2450.39 2455.78 2432.81 2453.76 501363000\n", "2018-06-07 KRX:KOSPI 2468.26 2478.67 2466.01 2470.58 474146000" ] }, "execution_count": 6,
"metadata": {}, "output_type": "execute_result" } ], "source": [ "get_data('KRX:KOSPI')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.3" } }, "nbformat": 4, "nbformat_minor": 2 }