{ "nbformat": 4, "nbformat_minor": 0, "metadata": { "colab": { "name": "Example Data Research Workflow using Python.ipynb", "provenance": [], "collapsed_sections": [], "authorship_tag": "ABX9TyOVd1j0ezmG21fWp22G6fRT", "include_colab_link": true }, "kernelspec": { "name": "python3", "display_name": "Python 3" } }, "cells": [ { "cell_type": "markdown", "metadata": { "id": "view-in-github", "colab_type": "text" }, "source": [ "\"Open" ] }, { "cell_type": "markdown", "metadata": { "id": "zV1jX0Je8Hg3" }, "source": [ "#Example Data Research Workflow using Python\n", "## Source: WRDS https://wrds-www.wharton.upenn.edu/" ] }, { "cell_type": "code", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "V9vLXLxQAvPF", "outputId": "2712522c-5166-43fd-aff9-9bcec0c9b4b6" }, "source": [ "!pip install wrds" ], "execution_count": 1, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/\n", "Collecting wrds\n", " Downloading wrds-3.1.1-py3-none-any.whl (12 kB)\n", "Requirement already satisfied: numpy in /usr/local/lib/python3.7/dist-packages (from wrds) (1.21.6)\n", "Collecting psycopg2-binary\n", " Downloading psycopg2_binary-2.9.3-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.0 MB)\n", "\u001b[K |████████████████████████████████| 3.0 MB 4.5 MB/s \n", "\u001b[?25hRequirement already satisfied: pandas in /usr/local/lib/python3.7/dist-packages (from wrds) (1.3.5)\n", "Requirement already satisfied: sqlalchemy in /usr/local/lib/python3.7/dist-packages (from wrds) (1.4.36)\n", "Collecting mock\n", " Downloading mock-4.0.3-py3-none-any.whl (28 kB)\n", "Requirement already satisfied: python-dateutil>=2.7.3 in /usr/local/lib/python3.7/dist-packages (from pandas->wrds) (2.8.2)\n", "Requirement already satisfied: pytz>=2017.3 in /usr/local/lib/python3.7/dist-packages (from pandas->wrds) (2022.1)\n", "Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.7/dist-packages (from python-dateutil>=2.7.3->pandas->wrds) (1.15.0)\n", "Requirement already satisfied: greenlet!=0.4.17 in /usr/local/lib/python3.7/dist-packages (from sqlalchemy->wrds) (1.1.2)\n", "Requirement already satisfied: importlib-metadata in /usr/local/lib/python3.7/dist-packages (from sqlalchemy->wrds) (4.11.3)\n", "Requirement already satisfied: typing-extensions>=3.6.4 in /usr/local/lib/python3.7/dist-packages (from importlib-metadata->sqlalchemy->wrds) (4.2.0)\n", "Requirement already satisfied: zipp>=0.5 in /usr/local/lib/python3.7/dist-packages (from importlib-metadata->sqlalchemy->wrds) (3.8.0)\n", "Installing collected packages: psycopg2-binary, mock, wrds\n", "Successfully installed mock-4.0.3 psycopg2-binary-2.9.3 wrds-3.1.1\n" ] } ] }, { "cell_type": "markdown", "metadata": { "id": "ofJt5guG8HcX" }, "source": [ "The following series of stand-alone queries represents a basic example Python workflow using WRDS data. The commands in this workflow could be run interactively or submitted via a batch job using Python in the WRDS Cloud, or run locally from your computer using a Jupyter notebook. For this example, we'll use the CRSP Daily Stock File (```crsp.dsf```) data library\n", "\n", "In real-world Python programming, you would probably use the results of these queries to perform additional statistical analysis in your program. This document is meant as an introductory walkthrough.\n", "\n", "First, as with every Python program that intends to connect to WRDS, we must import the wrds module and make our connection:" ] }, { "cell_type": "markdown", "metadata": { "id": "STKRvgnP8jg4" }, "source": [ "We must also have set up our pgpass file as described earlier in the documentation.\n", "\n", "Note: Class accounts and IPAuth / Daypass accounts are not permitted to access WRDS in this manner and will receive an error if trying this connection. You must have your own, dedicated WRDS account in order to access WRDS from MATLAB.\n", "\n", "Let's get started. The initial queries (metadata queries) have dedicated wrds module methods to give you the results you're looking for. Later queries (data queries) will use ```raw_sql()``` exclusively.\n", "\n", "1. To determine the libraries available at WRDS:" ] }, { "cell_type": "code", "metadata": { "id": "6iNswu3f8gqe" }, "source": [ "import wrds\n", "db = wrds.Connection()" ], "execution_count": null, "outputs": [] }, { "cell_type": "code", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "nz8kNoTb8gs_", "outputId": "8a7af630-13e4-4ceb-f1eb-b415a11eabc7" }, "source": [ "db.list_libraries()" ], "execution_count": 3, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "['compbd',\n", " 'compgd',\n", " 'compnad',\n", " 'compsegd',\n", " 'crspa',\n", " 'evts',\n", " 'issm',\n", " 'nastraq',\n", " 'wrds_lib_internal',\n", " 'tass',\n", " 'aha',\n", " 'estimize',\n", " 'toyo',\n", " 'levin',\n", " 'ahasamp',\n", " 'crspm',\n", " 'tresgsmp',\n", " 'trsdcgs',\n", " 'columnar',\n", " 'trws',\n", " 'lspd',\n", " 'twoiq',\n", " 'wcai',\n", " 'csmar',\n", " 'wind',\n", " 'wqa',\n", " 'tresg',\n", " 'markit',\n", " 'msrb_all',\n", " 'risk',\n", " 'custom_jl',\n", " 'centris',\n", " 'block_all',\n", " 'zacks',\n", " 'boardex_trial',\n", " 'audit',\n", " 'comp_na_monthly_all',\n", " 'emdb',\n", " 'auditsmp',\n", " 'boardsmp',\n", " 'calcbnch',\n", " 'comp_segments_hist',\n", " 'wrdsrpts_rep_usage',\n", " 'public_all',\n", " 'pwt_all',\n", " 'msfanly',\n", " 'etfg',\n", " 'cboe_all',\n", " 'wrdsapps_backtest_plus',\n", " 'pacap',\n", " 'ciqsamp_transcripts',\n", " 'ciq',\n", " 'tr_ds',\n", " 'ppublica',\n", " 'preqin',\n", " 'crsp_a_stock',\n", " 'fisdsamp',\n", " 'etfg_samp',\n", " 'windsmp',\n", " 'rent',\n", " 'wrdsapps_evtstudy_int_ginsight',\n", " 'compmcur',\n", " 'secsamp_all',\n", " 'rpna',\n", " 'ibes',\n", " 'rpa',\n", " 'wrdssec_midas',\n", " 'sdcsamp',\n", " 'cisdmsmp',\n", " 'snl',\n", " 'sprat',\n", " 'ciqsamp',\n", " 'cboe',\n", " 'sustain',\n", " 'trucost',\n", " 'sdc',\n", " 'fjc_linking',\n", " 'otc_endofday',\n", " 'optionm',\n", " 'compa',\n", " 'bank',\n", " 'blab',\n", " 'block',\n", " 'compb',\n", " 'compg',\n", " 'taqmsec',\n", " 'comp_execucomp',\n", " 'trsamp_ds_eq',\n", " 'contrib_ceo_turnover',\n", " 'cisdm',\n", " 'clrvt',\n", " 'clrvtsmp',\n", " 'compsamp',\n", " 'trdstrm',\n", " 'compm',\n", " 'compseg',\n", " 'comph',\n", " 'compsnap',\n", " 'comscore',\n", " 'sustainalyticssamp_all',\n", " 'ff_all',\n", " 'pitchbk',\n", " 'crspsamp',\n", " 'dmef',\n", " 'eursamp',\n", " 'contrib_kpss',\n", " 'auditsmp_all',\n", " 'ciqsamp_pplintel',\n", " 'execcomp',\n", " 'taqmsamp_all',\n", " 'crspq',\n", " 'boardex',\n", " 'ftse',\n", " 'cusipm',\n", " 'eureka',\n", " 'eventus',\n", " 'ftsesamp',\n", " 'ginsight',\n", " 'ktsamp',\n", " 'factset',\n", " 'ifgrsamp',\n", " 'imssamp',\n", " 'gmi',\n", " 'fisd',\n", " 'ff',\n", " 'fssamp',\n", " 'govpxsmp',\n", " 'hbsamp',\n", " 'hbase',\n", " 'ibeskpi',\n", " 'ifgr',\n", " 'ibescorp',\n", " 'mrktsamp_msf',\n", " 'iri',\n", " 'kld',\n", " 'govpx',\n", " 'hfr',\n", " 'lspdsamp',\n", " 'ims',\n", " 'ktmine',\n", " 'taq',\n", " 'aha_sample',\n", " 'calcbench_trial',\n", " 'ppubsamp_d4d',\n", " 'iss',\n", " 'ciqsamp_common',\n", " 'comp_na_daily_all',\n", " 'compdcur',\n", " 'otc',\n", " 'mrktsamp',\n", " 'contrib_general',\n", " 'preqsamp_all',\n", " 'twoiq_samp',\n", " 'msrb',\n", " 'msrbsamp',\n", " 'mrktsamp_cdx',\n", " 'phlx_all',\n", " 'zacksamp_all',\n", " 'fjc_litigation',\n", " 'mfl',\n", " 'etfgsamp',\n", " 'sustsamp',\n", " 'phlx',\n", " 'ppubsamp',\n", " 'preqsamp',\n", " 'wrdsapps_evtstudy_lr',\n", " 'ciqsamp_keydev',\n", " 'crsp_a_ccm',\n", " 'crsp',\n", " 'tfn',\n", " 'wappsamp',\n", " 'revere',\n", " 'toyosamp',\n", " 'zacksamp',\n", " 'wrdsrpts',\n", " 'eurekahedge_sample',\n", " 'ims_obp_trial',\n", " 'comp_na_annual_all',\n", " 'compsamp_snapshot',\n", " 'rq_all',\n", " 'comp_bank_daily',\n", " 'infogroupsamp_business',\n", " 'comp_segments_hist_daily',\n", " 'infogroupsamp_residential',\n", " 'ktmine_patents_samp',\n", " 'djones_all',\n", " 'doe_all',\n", " 'factsamp_all',\n", " 'wrdsapps_eushort',\n", " 'wrdsapps_subsidiary',\n", " 'wrdsappssamp_all',\n", " 'wrdsapps_link_datastream_wscope',\n", " 'wrdsapps_patents',\n", " 'wrdsapps_link_comp_eushort',\n", " 'dmef_all',\n", " 'macrofin_comm_trade',\n", " 'mrktsamp_cds',\n", " 'pitchsmp',\n", " 'totalq_all',\n", " 'toyosamp_all',\n", " 'trsamp_dscom',\n", " 'trsamp_dsecon',\n", " 'reprisk',\n", " 'rpnasamp',\n", " 'comp',\n", " 'audit_corp_legal',\n", " 'trcstsmp',\n", " 'trsamp_dsfut',\n", " 'wrdsapps_backtest_basic',\n", " 'wrdsapps_link_supplychain',\n", " 'wrdsapps',\n", " 'wrdssec',\n", " 'audit_common',\n", " 'contrib_intangible_value',\n", " 'ciqsamp_capstrct',\n", " 'contrib_char_returns',\n", " 'tr_dealscan',\n", " 'crsp_q_indexhist',\n", " 'bvd',\n", " 'hfrsamp_hfrdb',\n", " 'factsamp_revere',\n", " 'frb_all',\n", " 'trown',\n", " 'hbase_sample',\n", " 'secsamp',\n", " 'lvnsamp_all',\n", " 'optionmsamp_europe',\n", " 'optionmsamp_us',\n", " 'reprisk_sample',\n", " 'risksamp_all',\n", " 'snlsamp_fig',\n", " 'ravenpack_trial',\n", " 'wrdsapps_finratio_ccm',\n", " 'trace_standard',\n", " 'trace_enhanced',\n", " 'bvdsamp',\n", " 'contrib',\n", " 'wrdsapps_link_crsp_bond',\n", " 'djones',\n", " 'doe',\n", " 'frb',\n", " 'hfrsamp',\n", " 'lvnsamp',\n", " 'macrofin',\n", " 'midas',\n", " 'omtrial',\n", " 'public',\n", " 'pwt',\n", " 'repsamp',\n", " 'risksamp',\n", " 'snapsamp',\n", " 'snlsamp',\n", " 'taqsamp',\n", " 'totalq',\n", " 'trdssamp',\n", " 'fjc',\n", " 'trace',\n", " 'wrdsapps_link_crsp_factset',\n", " 'wrdsapps_link_crsp_taq',\n", " 'comp_bank',\n", " 'trsamp',\n", " 'twoiqsmp',\n", " 'audit_audit_comp',\n", " 'wrdsapps_finratio',\n", " 'dealscan',\n", " 'crsp_a_indexes',\n", " 'msfinst',\n", " 'taqmsamp']" ] }, "metadata": {}, "execution_count": 3 } ] }, { "cell_type": "markdown", "metadata": { "id": "iOMIZY4b8tgQ" }, "source": [ "2. Let's work with the CRSP library. From the above results, that's crsp. What datasets are available within this library?" ] }, { "cell_type": "code", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "MWp4cHDq8whY", "outputId": "ccc0a750-c7ae-4723-dbe6-685fe287c5bf" }, "source": [ "db.list_tables('crsp')" ], "execution_count": 4, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "['acti',\n", " 'asia',\n", " 'asib',\n", " 'asic',\n", " 'asio',\n", " 'asix',\n", " 'bmdebt',\n", " 'bmheader',\n", " 'bmpaymts',\n", " 'bmquotes',\n", " 'bmyield',\n", " 'bndprt06',\n", " 'bndprt12',\n", " 'bxcalind',\n", " 'bxdlyind',\n", " 'bxmthind',\n", " 'bxquotes',\n", " 'bxyield',\n", " 'cap',\n", " 'ccm_lookup',\n", " 'ccm_qvards',\n", " 'ccmxpf_linktable',\n", " 'ccmxpf_lnkhist',\n", " 'ccmxpf_lnkrng',\n", " 'ccmxpf_lnkused',\n", " 'comphead',\n", " 'comphist',\n", " 'compmaster',\n", " 'contact_info',\n", " 'crsp_cik_map',\n", " 'crsp_daily_data',\n", " 'crsp_header',\n", " 'crsp_monthly_data',\n", " 'crsp_names',\n", " 'crsp_portno_map',\n", " 'crsp_ziman_daily_index',\n", " 'crsp_ziman_monthly_index',\n", " 'cs20yr',\n", " 'cs5yr',\n", " 'cs90d',\n", " 'cst_hist',\n", " 'daily_nav',\n", " 'daily_nav_ret',\n", " 'daily_returns',\n", " 'dividends',\n", " 'dport1',\n", " 'dport2',\n", " 'dport3',\n", " 'dport4',\n", " 'dport5',\n", " 'dport6',\n", " 'dport7',\n", " 'dport8',\n", " 'dport9',\n", " 'dsbc',\n", " 'dsbo',\n", " 'dse',\n", " 'dse62',\n", " 'dse62delist',\n", " 'dse62dist',\n", " 'dse62exchdates',\n", " 'dse62names',\n", " 'dse62nasdin',\n", " 'dse62shares',\n", " 'dseall',\n", " 'dseall62',\n", " 'dsedelist',\n", " 'dsedist',\n", " 'dseexchdates',\n", " 'dsenames',\n", " 'dsenasdin',\n", " 'dseshares',\n", " 'dsf',\n", " 'dsf62',\n", " 'dsfhdr',\n", " 'dsfhdr62',\n", " 'dsi',\n", " 'dsi62',\n", " 'dsia',\n", " 'dsib',\n", " 'dsic',\n", " 'dsio',\n", " 'dsir',\n", " 'dsix',\n", " 'dsiy',\n", " 'dsp500',\n", " 'dsp500list',\n", " 'dsp500p',\n", " 'dssc',\n", " 'dsso',\n", " 'eod_cap',\n", " 'eod_sector',\n", " 'eod_vg',\n", " 'erdport1',\n", " 'erdport2',\n", " 'erdport3',\n", " 'erdport4',\n", " 'erdport5',\n", " 'erdport6',\n", " 'erdport7',\n", " 'erdport8',\n", " 'erdport9',\n", " 'ermport1',\n", " 'ermport2',\n", " 'ermport3',\n", " 'ermport4',\n", " 'ermport5',\n", " 'fbpri',\n", " 'fbyld',\n", " 'front_load',\n", " 'front_load_det',\n", " 'front_load_grp',\n", " 'fund_fees',\n", " 'fund_flows',\n", " 'fund_hdr',\n", " 'fund_hdr_hist',\n", " 'fund_names',\n", " 'fund_style',\n", " 'fund_summary',\n", " 'fund_summary2',\n", " 'fwdask06',\n", " 'fwdask12',\n", " 'fwdave06',\n", " 'fwdave12',\n", " 'fwdbid06',\n", " 'fwdbid12',\n", " 'hldask06',\n", " 'hldask12',\n", " 'hldave06',\n", " 'hldave12',\n", " 'hldbid06',\n", " 'hldbid12',\n", " 'holdings',\n", " 'holdings_co_info',\n", " 'index_descriptions',\n", " 'index_type_map',\n", " 'mbi',\n", " 'mbmdat',\n", " 'mbmhdr',\n", " 'mbx',\n", " 'mbxid',\n", " 'mcti',\n", " 'mfdbname',\n", " 'mhista',\n", " 'mhistn',\n", " 'mhistq',\n", " 'monthly_nav',\n", " 'monthly_returns',\n", " 'monthly_tna',\n", " 'monthly_tna_ret_nav',\n", " 'mport1',\n", " 'mport2',\n", " 'mport3',\n", " 'mport4',\n", " 'mport5',\n", " 'mse',\n", " 'mse62',\n", " 'mse62delist',\n", " 'mse62dist',\n", " 'mse62exchdates',\n", " 'mse62names',\n", " 'mse62nasdin',\n", " 'mse62shares',\n", " 'mseall',\n", " 'mseall62',\n", " 'msedelist',\n", " 'msedist',\n", " 'mseexchdates',\n", " 'msenames',\n", " 'msenasdin',\n", " 'mseshares',\n", " 'msf',\n", " 'msf62',\n", " 'msfhdr',\n", " 'msfhdr62',\n", " 'msi',\n", " 'msi62',\n", " 'msia',\n", " 'msib',\n", " 'msic',\n", " 'msio',\n", " 'msir',\n", " 'msix',\n", " 'msiy',\n", " 'msp500',\n", " 'msp500list',\n", " 'msp500p',\n", " 'portnomap',\n", " 'priask06',\n", " 'priask12',\n", " 'priave06',\n", " 'priave12',\n", " 'pribid06',\n", " 'pribid12',\n", " 'price_type',\n", " 'property_type',\n", " 'qcti',\n", " 'qsia',\n", " 'qsib',\n", " 'qsic',\n", " 'qsio',\n", " 'qsix',\n", " 'rear_load',\n", " 'rear_load_det',\n", " 'rear_load_grp',\n", " 'rebala',\n", " 'rebaln',\n", " 'rebalq',\n", " 'reit_type',\n", " 'riskfree',\n", " 's6z_agg_ann',\n", " 's6z_agg_mth',\n", " 's6z_agg_qtr',\n", " 's6z_del',\n", " 's6z_dind',\n", " 's6z_dis',\n", " 's6z_dp_dly',\n", " 's6z_ds_dly',\n", " 's6z_hdr',\n", " 's6z_indhdr',\n", " 's6z_mdel',\n", " 's6z_mind',\n", " 's6z_mth',\n", " 's6z_nam',\n", " 's6z_ndi',\n", " 's6z_shr',\n", " 'saz_agg_ann',\n", " 'saz_agg_mth',\n", " 'saz_agg_qtr',\n", " 'saz_del',\n", " 'saz_dind',\n", " 'saz_dis',\n", " 'saz_dp_dly',\n", " 'saz_ds_dly',\n", " 'saz_hdr',\n", " 'saz_indhdr',\n", " 'saz_mdel',\n", " 'saz_mind',\n", " 'saz_mth',\n", " 'saz_nam',\n", " 'saz_ndi',\n", " 'saz_shr',\n", " 'sechead',\n", " 'sechist',\n", " 'sector',\n", " 'sfz_dind',\n", " 'sfz_indhdr',\n", " 'sfz_mbr',\n", " 'sfz_mind',\n", " 'sfz_portd',\n", " 'sfz_portm',\n", " 'sfz_rb',\n", " 'stock_qvards',\n", " 'stocknames',\n", " 'stocknames62',\n", " 'sub_property_type',\n", " 'tfz_dly',\n", " 'tfz_dly_cd',\n", " 'tfz_dly_cpi',\n", " 'tfz_dly_ft',\n", " 'tfz_dly_rf2',\n", " 'tfz_dly_ts2',\n", " 'tfz_idx',\n", " 'tfz_iss',\n", " 'tfz_mast',\n", " 'tfz_mth',\n", " 'tfz_mth_bp',\n", " 'tfz_mth_cd',\n", " 'tfz_mth_cpi',\n", " 'tfz_mth_fb',\n", " 'tfz_mth_ft',\n", " 'tfz_mth_rf',\n", " 'tfz_mth_rf2',\n", " 'tfz_mth_ts',\n", " 'tfz_mth_ts2',\n", " 'tfz_pay',\n", " 'vg',\n", " 'yldask06',\n", " 'yldask12',\n", " 'yldave06',\n", " 'yldave12',\n", " 'yldbid06',\n", " 'yldbid12',\n", " 'ziman_reit_info',\n", " 'zr_hdrnames']" ] }, "metadata": {}, "execution_count": 4 } ] }, { "cell_type": "markdown", "metadata": { "id": "7lh7QbOV8tnP" }, "source": [ "\n", "3. The results show that many datasets are available within the crsp library. Let's work with the Daily Stock File (```dsf```) dataset, and take a look at the list of available data variables (column names) in that dataset:\n" ] }, { "cell_type": "code", "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 694 }, "id": "JmJREX908gv0", "outputId": "6a052795-95d0-4f36-eb0f-3ec102fd9654" }, "source": [ "db.describe_table('crsp', 'dsf')" ], "execution_count": 5, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "Approximately 101070400 rows in crsp.dsf.\n" ] }, { "output_type": "execute_result", "data": { "text/plain": [ " name nullable type\n", "0 cusip True VARCHAR(8)\n", "1 permno True DOUBLE_PRECISION\n", "2 permco True DOUBLE_PRECISION\n", "3 issuno True DOUBLE_PRECISION\n", "4 hexcd True DOUBLE_PRECISION\n", "5 hsiccd True DOUBLE_PRECISION\n", "6 date True DATE\n", "7 bidlo True DOUBLE_PRECISION\n", "8 askhi True DOUBLE_PRECISION\n", "9 prc True DOUBLE_PRECISION\n", "10 vol True DOUBLE_PRECISION\n", "11 ret True DOUBLE_PRECISION\n", "12 bid True DOUBLE_PRECISION\n", "13 ask True DOUBLE_PRECISION\n", "14 shrout True DOUBLE_PRECISION\n", "15 cfacpr True DOUBLE_PRECISION\n", "16 cfacshr True DOUBLE_PRECISION\n", "17 openprc True DOUBLE_PRECISION\n", "18 numtrd True DOUBLE_PRECISION\n", "19 retx True DOUBLE_PRECISION" ], "text/html": [ "\n", "
\n", "
\n", "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
namenullabletype
0cusipTrueVARCHAR(8)
1permnoTrueDOUBLE_PRECISION
2permcoTrueDOUBLE_PRECISION
3issunoTrueDOUBLE_PRECISION
4hexcdTrueDOUBLE_PRECISION
5hsiccdTrueDOUBLE_PRECISION
6dateTrueDATE
7bidloTrueDOUBLE_PRECISION
8askhiTrueDOUBLE_PRECISION
9prcTrueDOUBLE_PRECISION
10volTrueDOUBLE_PRECISION
11retTrueDOUBLE_PRECISION
12bidTrueDOUBLE_PRECISION
13askTrueDOUBLE_PRECISION
14shroutTrueDOUBLE_PRECISION
15cfacprTrueDOUBLE_PRECISION
16cfacshrTrueDOUBLE_PRECISION
17openprcTrueDOUBLE_PRECISION
18numtrdTrueDOUBLE_PRECISION
19retxTrueDOUBLE_PRECISION
\n", "
\n", " \n", " \n", " \n", "\n", " \n", "
\n", "
\n", " " ] }, "metadata": {}, "execution_count": 5 } ] }, { "cell_type": "code", "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 412 }, "id": "QjZWRijkEE6L", "outputId": "2da3fccb-9371-4891-c0f6-63d17c4e6041" }, "source": [ "db.describe_table('crsp', 'dsi')" ], "execution_count": 6, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "Approximately 25360 rows in crsp.dsi.\n" ] }, { "output_type": "execute_result", "data": { "text/plain": [ " name nullable type\n", "0 date True DATE\n", "1 vwretd True DOUBLE_PRECISION\n", "2 vwretx True DOUBLE_PRECISION\n", "3 ewretd True DOUBLE_PRECISION\n", "4 ewretx True DOUBLE_PRECISION\n", "5 sprtrn True DOUBLE_PRECISION\n", "6 spindx True DOUBLE_PRECISION\n", "7 totval True DOUBLE_PRECISION\n", "8 totcnt True DOUBLE_PRECISION\n", "9 usdval True DOUBLE_PRECISION\n", "10 usdcnt True DOUBLE_PRECISION" ], "text/html": [ "\n", "
\n", "
\n", "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
namenullabletype
0dateTrueDATE
1vwretdTrueDOUBLE_PRECISION
2vwretxTrueDOUBLE_PRECISION
3ewretdTrueDOUBLE_PRECISION
4ewretxTrueDOUBLE_PRECISION
5sprtrnTrueDOUBLE_PRECISION
6spindxTrueDOUBLE_PRECISION
7totvalTrueDOUBLE_PRECISION
8totcntTrueDOUBLE_PRECISION
9usdvalTrueDOUBLE_PRECISION
10usdcntTrueDOUBLE_PRECISION
\n", "
\n", " \n", " \n", " \n", "\n", " \n", "
\n", "
\n", " " ] }, "metadata": {}, "execution_count": 6 } ] }, { "cell_type": "code", "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 255 }, "id": "MPT2n9j8EE9P", "outputId": "d9b433c0-9675-475e-9bd3-1f427f639c07" }, "source": [ "db.describe_table('crsp', 'crsp_header')" ], "execution_count": 7, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "Approximately 673 rows in crsp.crsp_header.\n" ] }, { "output_type": "execute_result", "data": { "text/plain": [ " name nullable type\n", "0 permno True DOUBLE_PRECISION\n", "1 permco True DOUBLE_PRECISION\n", "2 begdt True DATE\n", "3 enddt True DATE\n", "4 comnam True VARCHAR(64)\n", "5 hdlstcd True DOUBLE_PRECISION" ], "text/html": [ "\n", "
\n", "
\n", "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
namenullabletype
0permnoTrueDOUBLE_PRECISION
1permcoTrueDOUBLE_PRECISION
2begdtTrueDATE
3enddtTrueDATE
4comnamTrueVARCHAR(64)
5hdlstcdTrueDOUBLE_PRECISION
\n", "
\n", " \n", " \n", " \n", "\n", " \n", "
\n", "
\n", " " ] }, "metadata": {}, "execution_count": 7 } ] }, { "cell_type": "code", "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 1000 }, "id": "09Aq_hgZEFAa", "outputId": "602ac86d-3399-462c-d8d3-7a5aef114d43" }, "source": [ "db.describe_table('crsp', 'comphead')" ], "execution_count": 8, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "Approximately 51232 rows in crsp.comphead.\n" ] }, { "output_type": "execute_result", "data": { "text/plain": [ " name nullable type\n", "0 gvkey True VARCHAR(6)\n", "1 cik True VARCHAR(10)\n", "2 ein True VARCHAR(10)\n", "3 conm True VARCHAR(256)\n", "4 costat True VARCHAR(1)\n", "5 dlrsn True VARCHAR(8)\n", "6 priusa True VARCHAR(8)\n", "7 prican True VARCHAR(8)\n", "8 prirow True VARCHAR(8)\n", "9 idbflag True VARCHAR(1)\n", "10 fic True VARCHAR(3)\n", "11 loc True VARCHAR(3)\n", "12 incorp True VARCHAR(8)\n", "13 state True VARCHAR(8)\n", "14 county True VARCHAR(100)\n", "15 city True VARCHAR(104)\n", "16 naics True VARCHAR(8)\n", "17 conml True VARCHAR(100)\n", "18 weburl True VARCHAR(60)\n", "19 phone True VARCHAR(18)\n", "20 fax True VARCHAR(18)\n", "21 add1 True VARCHAR(65)\n", "22 add2 True VARCHAR(65)\n", "23 add3 True VARCHAR(65)\n", "24 add4 True VARCHAR(65)\n", "25 addzip True VARCHAR(24)\n", "26 busdesc True VARCHAR(2000)\n", "27 ipodate True DATE\n", "28 dldte True DATE\n", "29 stko True DOUBLE_PRECISION\n", "30 fyrc True DOUBLE_PRECISION\n", "31 sic True DOUBLE_PRECISION\n", "32 gsector True DOUBLE_PRECISION\n", "33 ggroup True DOUBLE_PRECISION\n", "34 gind True DOUBLE_PRECISION\n", "35 gsubind True DOUBLE_PRECISION\n", "36 spcindcd True DOUBLE_PRECISION\n", "37 spcseccd True DOUBLE_PRECISION" ], "text/html": [ "\n", "
\n", "
\n", "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
namenullabletype
0gvkeyTrueVARCHAR(6)
1cikTrueVARCHAR(10)
2einTrueVARCHAR(10)
3conmTrueVARCHAR(256)
4costatTrueVARCHAR(1)
5dlrsnTrueVARCHAR(8)
6priusaTrueVARCHAR(8)
7pricanTrueVARCHAR(8)
8prirowTrueVARCHAR(8)
9idbflagTrueVARCHAR(1)
10ficTrueVARCHAR(3)
11locTrueVARCHAR(3)
12incorpTrueVARCHAR(8)
13stateTrueVARCHAR(8)
14countyTrueVARCHAR(100)
15cityTrueVARCHAR(104)
16naicsTrueVARCHAR(8)
17conmlTrueVARCHAR(100)
18weburlTrueVARCHAR(60)
19phoneTrueVARCHAR(18)
20faxTrueVARCHAR(18)
21add1TrueVARCHAR(65)
22add2TrueVARCHAR(65)
23add3TrueVARCHAR(65)
24add4TrueVARCHAR(65)
25addzipTrueVARCHAR(24)
26busdescTrueVARCHAR(2000)
27ipodateTrueDATE
28dldteTrueDATE
29stkoTrueDOUBLE_PRECISION
30fyrcTrueDOUBLE_PRECISION
31sicTrueDOUBLE_PRECISION
32gsectorTrueDOUBLE_PRECISION
33ggroupTrueDOUBLE_PRECISION
34gindTrueDOUBLE_PRECISION
35gsubindTrueDOUBLE_PRECISION
36spcindcdTrueDOUBLE_PRECISION
37spcseccdTrueDOUBLE_PRECISION
\n", "
\n", " \n", " \n", " \n", "\n", " \n", "
\n", "
\n", " " ] }, "metadata": {}, "execution_count": 8 } ] }, { "cell_type": "code", "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 568 }, "id": "SFyy7h6vEnws", "outputId": "c45fce38-fae0-40e1-b33b-a30df3ce19c7" }, "source": [ "db.describe_table('crsp', 'crsp_names')" ], "execution_count": 9, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "Approximately 2642 rows in crsp.crsp_names.\n" ] }, { "output_type": "execute_result", "data": { "text/plain": [ " name nullable type\n", "0 permno True DOUBLE_PRECISION\n", "1 namebegdt True DATE\n", "2 nameenddt True DATE\n", "3 ticker True VARCHAR(8)\n", "4 shrcls True VARCHAR(4)\n", "5 comnam True VARCHAR(64)\n", "6 exchcd True DOUBLE_PRECISION\n", "7 shrcd True DOUBLE_PRECISION\n", "8 siccd True DOUBLE_PRECISION\n", "9 primexch True VARCHAR(1)\n", "10 tsymbol True VARCHAR(12)\n", "11 secstat True VARCHAR(1)\n", "12 trdstat True VARCHAR(1)\n", "13 namedesc True VARCHAR(24)\n", "14 issuercd True VARCHAR(1)\n", "15 convcd True VARCHAR(1)" ], "text/html": [ "\n", "
\n", "
\n", "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
namenullabletype
0permnoTrueDOUBLE_PRECISION
1namebegdtTrueDATE
2nameenddtTrueDATE
3tickerTrueVARCHAR(8)
4shrclsTrueVARCHAR(4)
5comnamTrueVARCHAR(64)
6exchcdTrueDOUBLE_PRECISION
7shrcdTrueDOUBLE_PRECISION
8siccdTrueDOUBLE_PRECISION
9primexchTrueVARCHAR(1)
10tsymbolTrueVARCHAR(12)
11secstatTrueVARCHAR(1)
12trdstatTrueVARCHAR(1)
13namedescTrueVARCHAR(24)
14issuercdTrueVARCHAR(1)
15convcdTrueVARCHAR(1)
\n", "
\n", " \n", " \n", " \n", "\n", " \n", "
\n", "
\n", " " ] }, "metadata": {}, "execution_count": 9 } ] }, { "cell_type": "code", "metadata": { "id": "4CYSNMhFEWlu" }, "source": [ "" ], "execution_count": null, "outputs": [] }, { "cell_type": "code", "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 568 }, "id": "y1IBHijUFFtc", "outputId": "2e7a5179-0c52-4380-b991-d8740dbacb98" }, "source": [ "db.describe_table('crsp', 'stocknames')" ], "execution_count": 10, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "Approximately 77779 rows in crsp.stocknames.\n" ] }, { "output_type": "execute_result", "data": { "text/plain": [ " name nullable type\n", "0 permno True DOUBLE_PRECISION\n", "1 namedt True DATE\n", "2 nameenddt True DATE\n", "3 shrcd True DOUBLE_PRECISION\n", "4 exchcd True DOUBLE_PRECISION\n", "5 siccd True DOUBLE_PRECISION\n", "6 ncusip True VARCHAR(8)\n", "7 ticker True VARCHAR(8)\n", "8 comnam True VARCHAR(35)\n", "9 shrcls True VARCHAR(4)\n", "10 permco True DOUBLE_PRECISION\n", "11 hexcd True DOUBLE_PRECISION\n", "12 cusip True VARCHAR(8)\n", "13 st_date True DATE\n", "14 end_date True DATE\n", "15 namedum True DOUBLE_PRECISION" ], "text/html": [ "\n", "
\n", "
\n", "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
namenullabletype
0permnoTrueDOUBLE_PRECISION
1namedtTrueDATE
2nameenddtTrueDATE
3shrcdTrueDOUBLE_PRECISION
4exchcdTrueDOUBLE_PRECISION
5siccdTrueDOUBLE_PRECISION
6ncusipTrueVARCHAR(8)
7tickerTrueVARCHAR(8)
8comnamTrueVARCHAR(35)
9shrclsTrueVARCHAR(4)
10permcoTrueDOUBLE_PRECISION
11hexcdTrueDOUBLE_PRECISION
12cusipTrueVARCHAR(8)
13st_dateTrueDATE
14end_dateTrueDATE
15namedumTrueDOUBLE_PRECISION
\n", "
\n", " \n", " \n", " \n", "\n", " \n", "
\n", "
\n", " " ] }, "metadata": {}, "execution_count": 10 } ] }, { "cell_type": "code", "metadata": { "id": "Ov1rrZeSEWoM" }, "source": [ "stocknames=db.get_table('crsp', 'stocknames')" ], "execution_count": 11, "outputs": [] }, { "cell_type": "code", "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 679 }, "id": "4GwsJaiTqnOm", "outputId": "cbdeb6a7-d1f4-433f-e7f8-101b8f25095f" }, "source": [ "stocknames" ], "execution_count": 12, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ " permno namedt nameenddt shrcd exchcd siccd ncusip \\\n", "0 10000.0 1986-01-07 1987-06-11 10.0 3.0 3990.0 68391610 \n", "1 10001.0 1986-01-09 1993-11-21 11.0 3.0 4920.0 39040610 \n", "2 10001.0 1993-11-22 2008-02-04 11.0 3.0 4920.0 29274A10 \n", "3 10001.0 2008-02-05 2009-08-03 11.0 3.0 4920.0 29274A20 \n", "4 10001.0 2009-08-04 2009-12-17 11.0 3.0 4920.0 29269V10 \n", "... ... ... ... ... ... ... ... \n", "77774 93433.0 2013-04-24 2016-12-22 11.0 3.0 9999.0 92870X30 \n", "77775 93434.0 2010-06-14 2022-03-31 11.0 3.0 9999.0 78513510 \n", "77776 93435.0 2010-06-14 2012-05-18 11.0 3.0 6163.0 82936G20 \n", "77777 93436.0 2010-06-29 2017-02-01 11.0 3.0 9999.0 88160R10 \n", "77778 93436.0 2017-02-02 2022-03-31 11.0 3.0 9999.0 88160R10 \n", "\n", " ticker comnam shrcls permco hexcd cusip \\\n", "0 OMFGA OPTIMUM MANUFACTURING INC A 7952.0 3.0 68391610 \n", "1 GFGC GREAT FALLS GAS CO None 7953.0 2.0 36720410 \n", "2 EWST ENERGY WEST INC None 7953.0 2.0 36720410 \n", "3 EWST ENERGY WEST INC None 7953.0 2.0 36720410 \n", "4 EGAS ENERGY INC None 7953.0 2.0 36720410 \n", "... ... ... ... ... ... ... \n", "77774 VLTC VOLTARI CORP None 53451.0 3.0 92870X30 \n", "77775 SANW S & W SEED CO None 53427.0 3.0 78513510 \n", "77776 SCEI SINO CLEAN ENERGY INC None 53452.0 3.0 82936G20 \n", "77777 TSLA TESLA MOTORS INC None 53453.0 3.0 88160R10 \n", "77778 TSLA TESLA INC None 53453.0 3.0 88160R10 \n", "\n", " st_date end_date namedum \n", "0 1986-01-07 1987-06-11 2.0 \n", "1 1986-01-09 2017-08-03 2.0 \n", "2 1986-01-09 2017-08-03 2.0 \n", "3 1986-01-09 2017-08-03 2.0 \n", "4 1986-01-09 2017-08-03 2.0 \n", "... ... ... ... \n", "77774 2010-06-18 2016-12-22 2.0 \n", "77775 2010-06-14 2022-03-31 2.0 \n", "77776 2010-06-14 2012-05-18 2.0 \n", "77777 2010-06-29 2022-03-31 2.0 \n", "77778 2010-06-29 2022-03-31 2.0 \n", "\n", "[77779 rows x 16 columns]" ], "text/html": [ "\n", "
\n", "
\n", "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
permnonamedtnameenddtshrcdexchcdsiccdncusiptickercomnamshrclspermcohexcdcusipst_dateend_datenamedum
010000.01986-01-071987-06-1110.03.03990.068391610OMFGAOPTIMUM MANUFACTURING INCA7952.03.0683916101986-01-071987-06-112.0
110001.01986-01-091993-11-2111.03.04920.039040610GFGCGREAT FALLS GAS CONone7953.02.0367204101986-01-092017-08-032.0
210001.01993-11-222008-02-0411.03.04920.029274A10EWSTENERGY WEST INCNone7953.02.0367204101986-01-092017-08-032.0
310001.02008-02-052009-08-0311.03.04920.029274A20EWSTENERGY WEST INCNone7953.02.0367204101986-01-092017-08-032.0
410001.02009-08-042009-12-1711.03.04920.029269V10EGASENERGY INCNone7953.02.0367204101986-01-092017-08-032.0
...................................................
7777493433.02013-04-242016-12-2211.03.09999.092870X30VLTCVOLTARI CORPNone53451.03.092870X302010-06-182016-12-222.0
7777593434.02010-06-142022-03-3111.03.09999.078513510SANWS & W SEED CONone53427.03.0785135102010-06-142022-03-312.0
7777693435.02010-06-142012-05-1811.03.06163.082936G20SCEISINO CLEAN ENERGY INCNone53452.03.082936G202010-06-142012-05-182.0
7777793436.02010-06-292017-02-0111.03.09999.088160R10TSLATESLA MOTORS INCNone53453.03.088160R102010-06-292022-03-312.0
7777893436.02017-02-022022-03-3111.03.09999.088160R10TSLATESLA INCNone53453.03.088160R102010-06-292022-03-312.0
\n", "

77779 rows × 16 columns

\n", "
\n", " \n", " \n", " \n", "\n", " \n", "
\n", "
\n", " " ] }, "metadata": {}, "execution_count": 12 } ] }, { "cell_type": "code", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "U6hDvCekqndW", "outputId": "e7e03d83-2453-4df6-a913-a0f976a60260" }, "source": [ "db" ], "execution_count": 13, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "" ] }, "metadata": {}, "execution_count": 13 } ] }, { "cell_type": "code", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "wxnrdyIOrxn0", "outputId": "10efc2e2-57a7-41d6-af75-fdff9e8f1174" }, "source": [ "stocknames.info()" ], "execution_count": 14, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "\n", "RangeIndex: 77779 entries, 0 to 77778\n", "Data columns (total 16 columns):\n", " # Column Non-Null Count Dtype \n", "--- ------ -------------- ----- \n", " 0 permno 77779 non-null float64\n", " 1 namedt 77779 non-null object \n", " 2 nameenddt 77779 non-null object \n", " 3 shrcd 77779 non-null float64\n", " 4 exchcd 77779 non-null float64\n", " 5 siccd 77779 non-null float64\n", " 6 ncusip 72335 non-null object \n", " 7 ticker 72881 non-null object \n", " 8 comnam 77779 non-null object \n", " 9 shrcls 5108 non-null object \n", " 10 permco 77779 non-null float64\n", " 11 hexcd 77779 non-null float64\n", " 12 cusip 77779 non-null object \n", " 13 st_date 77779 non-null object \n", " 14 end_date 77779 non-null object \n", " 15 namedum 77779 non-null float64\n", "dtypes: float64(7), object(9)\n", "memory usage: 9.5+ MB\n" ] } ] }, { "cell_type": "code", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "9EDz9jaAsxU4", "outputId": "9a4f0137-7848-4fca-e03d-01d56ded246a" }, "source": [ "from google.colab import drive\n", "drive.mount(\"/content/gdrive\")" ], "execution_count": null, "outputs": [ { "output_type": "stream", "text": [ "Mounted at /content/gdrive\n" ], "name": "stdout" } ] }, { "cell_type": "code", "metadata": { "id": "bpcPd86ysxeL" }, "source": [ "" ], "execution_count": null, "outputs": [] }, { "cell_type": "code", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "mI7-yC5Ua816", "outputId": "644d0807-a2c4-4508-8b65-505f23a5eb91" }, "source": [ "cd /content/gdrive/My Drive/Colab Notebooks/" ], "execution_count": null, "outputs": [ { "output_type": "stream", "text": [ "/content/gdrive/My Drive/Colab Notebooks\n" ], "name": "stdout" } ] }, { "cell_type": "code", "metadata": { "id": "SrSuHLKZsxlh" }, "source": [ "" ], "execution_count": null, "outputs": [] }, { "cell_type": "code", "metadata": { "id": "Cl5xEmZKEWrJ" }, "source": [ "stocknames.to_csv('crsp_stocknames.csv')" ], "execution_count": null, "outputs": [] }, { "cell_type": "code", "metadata": { "id": "crNiGqa_EWta" }, "source": [ "ls" ], "execution_count": null, "outputs": [] }, { "cell_type": "code", "metadata": { "id": "OArlgLOvEWvc" }, "source": [ "" ], "execution_count": null, "outputs": [] }, { "cell_type": "code", "metadata": { "id": "RQpSxijNEWyW" }, "source": [ "" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "5XF8rnnt83bj" }, "source": [ "4. Now that we've examined the structure of our data by looking through its metadata, let's begin querying the data itself. First, let's take a peek at the first 100 rows of this database to see what we're working with. Using the LIMIT keyword to only return a small sample of the data, greatly speeding up the query a we only wish to get a quick summary.\n", "\n", "Both get_table() and raw_sql() can be used for this:" ] }, { "cell_type": "code", "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 661 }, "id": "lLYHq5ra8gyB", "outputId": "8658a203-9e48-4fd1-894b-8d6ed3d0fd4a" }, "source": [ "db.get_table('crsp', 'dsf', obs=100)\n" ], "execution_count": 15, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ " cusip permno permco issuno hexcd hsiccd date bidlo \\\n", "0 75747910 88371.0 37740.0 23794.0 3.0 7379.0 2016-02-05 0.5000 \n", "1 75747910 88371.0 37740.0 23794.0 3.0 7379.0 2016-02-08 0.4250 \n", "2 75747910 88371.0 37740.0 23794.0 3.0 7379.0 2016-02-09 0.5101 \n", "3 75747910 88371.0 37740.0 23794.0 3.0 7379.0 2016-02-10 0.5419 \n", "4 75747910 88371.0 37740.0 23794.0 3.0 7379.0 2016-02-11 0.5200 \n", ".. ... ... ... ... ... ... ... ... \n", "95 91705J20 88372.0 16512.0 23775.0 3.0 4832.0 2000-07-21 18.8125 \n", "96 91705J20 88372.0 16512.0 23775.0 3.0 4832.0 2000-07-24 18.6250 \n", "97 91705J20 88372.0 16512.0 23775.0 3.0 4832.0 2000-07-25 18.1250 \n", "98 91705J20 88372.0 16512.0 23775.0 3.0 4832.0 2000-07-26 17.8750 \n", "99 91705J20 88372.0 16512.0 23775.0 3.0 4832.0 2000-07-27 17.8125 \n", "\n", " askhi prc vol ret bid ask shrout cfacpr \\\n", "0 0.6090 0.5999 9986.0 -0.045961 0.5999 0.6000 9296.0 1.0 \n", "1 0.6010 0.5700 31417.0 -0.049842 0.5699 0.5760 9296.0 1.0 \n", "2 0.5788 0.5788 9362.0 0.015439 0.5400 0.5770 9296.0 1.0 \n", "3 0.6000 0.5700 3878.0 -0.015204 0.5699 0.5700 9296.0 1.0 \n", "4 0.5600 0.5299 7930.0 -0.070351 0.5299 0.5300 9296.0 1.0 \n", ".. ... ... ... ... ... ... ... ... \n", "95 19.3125 19.0000 340136.0 0.006623 18.9375 19.1875 56689.0 1.0 \n", "96 19.2500 18.7500 173722.0 -0.013158 18.7500 18.8750 56689.0 1.0 \n", "97 18.7500 18.1875 322610.0 -0.030000 18.0000 18.1875 56689.0 1.0 \n", "98 18.2500 18.0625 357611.0 -0.006873 18.0000 18.0625 56689.0 1.0 \n", "99 18.1875 17.9375 50878.0 -0.006920 17.8125 17.9375 56689.0 1.0 \n", "\n", " cfacshr openprc numtrd retx \n", "0 1.0 0.5000 39.0 -0.045961 \n", "1 1.0 0.6000 53.0 -0.049842 \n", "2 1.0 0.5200 35.0 0.015439 \n", "3 1.0 0.6000 14.0 -0.015204 \n", "4 1.0 0.5600 21.0 -0.070351 \n", ".. ... ... ... ... \n", "95 1.0 19.0000 76.0 0.006623 \n", "96 1.0 19.0000 51.0 -0.013158 \n", "97 1.0 18.7500 98.0 -0.030000 \n", "98 1.0 18.0000 130.0 -0.006873 \n", "99 1.0 18.1875 42.0 -0.006920 \n", "\n", "[100 rows x 20 columns]" ], "text/html": [ "\n", "
\n", "
\n", "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
cusippermnopermcoissunohexcdhsiccddatebidloaskhiprcvolretbidaskshroutcfacprcfacshropenprcnumtrdretx
07574791088371.037740.023794.03.07379.02016-02-050.50000.60900.59999986.0-0.0459610.59990.60009296.01.01.00.500039.0-0.045961
17574791088371.037740.023794.03.07379.02016-02-080.42500.60100.570031417.0-0.0498420.56990.57609296.01.01.00.600053.0-0.049842
27574791088371.037740.023794.03.07379.02016-02-090.51010.57880.57889362.00.0154390.54000.57709296.01.01.00.520035.00.015439
37574791088371.037740.023794.03.07379.02016-02-100.54190.60000.57003878.0-0.0152040.56990.57009296.01.01.00.600014.0-0.015204
47574791088371.037740.023794.03.07379.02016-02-110.52000.56000.52997930.0-0.0703510.52990.53009296.01.01.00.560021.0-0.070351
...............................................................
9591705J2088372.016512.023775.03.04832.02000-07-2118.812519.312519.0000340136.00.00662318.937519.187556689.01.01.019.000076.00.006623
9691705J2088372.016512.023775.03.04832.02000-07-2418.625019.250018.7500173722.0-0.01315818.750018.875056689.01.01.019.000051.0-0.013158
9791705J2088372.016512.023775.03.04832.02000-07-2518.125018.750018.1875322610.0-0.03000018.000018.187556689.01.01.018.750098.0-0.030000
9891705J2088372.016512.023775.03.04832.02000-07-2617.875018.250018.0625357611.0-0.00687318.000018.062556689.01.01.018.0000130.0-0.006873
9991705J2088372.016512.023775.03.04832.02000-07-2717.812518.187517.937550878.0-0.00692017.812517.937556689.01.01.018.187542.0-0.006920
\n", "

100 rows × 20 columns

\n", "
\n", " \n", " \n", " \n", "\n", " \n", "
\n", "
\n", " " ] }, "metadata": {}, "execution_count": 15 } ] }, { "cell_type": "code", "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 661 }, "id": "O_tNOkwQ8g1S", "outputId": "d1b202da-afcf-46c9-eb41-63e5b4db2777" }, "source": [ "db.raw_sql('select * from crsp.dsf LIMIT 100')" ], "execution_count": 16, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ " cusip permno permco issuno hexcd hsiccd date bidlo \\\n", "0 75747910 88371.0 37740.0 23794.0 3.0 7379.0 2016-02-05 0.5000 \n", "1 75747910 88371.0 37740.0 23794.0 3.0 7379.0 2016-02-08 0.4250 \n", "2 75747910 88371.0 37740.0 23794.0 3.0 7379.0 2016-02-09 0.5101 \n", "3 75747910 88371.0 37740.0 23794.0 3.0 7379.0 2016-02-10 0.5419 \n", "4 75747910 88371.0 37740.0 23794.0 3.0 7379.0 2016-02-11 0.5200 \n", ".. ... ... ... ... ... ... ... ... \n", "95 91705J20 88372.0 16512.0 23775.0 3.0 4832.0 2000-07-21 18.8125 \n", "96 91705J20 88372.0 16512.0 23775.0 3.0 4832.0 2000-07-24 18.6250 \n", "97 91705J20 88372.0 16512.0 23775.0 3.0 4832.0 2000-07-25 18.1250 \n", "98 91705J20 88372.0 16512.0 23775.0 3.0 4832.0 2000-07-26 17.8750 \n", "99 91705J20 88372.0 16512.0 23775.0 3.0 4832.0 2000-07-27 17.8125 \n", "\n", " askhi prc vol ret bid ask shrout cfacpr \\\n", "0 0.6090 0.5999 9986.0 -0.045961 0.5999 0.6000 9296.0 1.0 \n", "1 0.6010 0.5700 31417.0 -0.049842 0.5699 0.5760 9296.0 1.0 \n", "2 0.5788 0.5788 9362.0 0.015439 0.5400 0.5770 9296.0 1.0 \n", "3 0.6000 0.5700 3878.0 -0.015204 0.5699 0.5700 9296.0 1.0 \n", "4 0.5600 0.5299 7930.0 -0.070351 0.5299 0.5300 9296.0 1.0 \n", ".. ... ... ... ... ... ... ... ... \n", "95 19.3125 19.0000 340136.0 0.006623 18.9375 19.1875 56689.0 1.0 \n", "96 19.2500 18.7500 173722.0 -0.013158 18.7500 18.8750 56689.0 1.0 \n", "97 18.7500 18.1875 322610.0 -0.030000 18.0000 18.1875 56689.0 1.0 \n", "98 18.2500 18.0625 357611.0 -0.006873 18.0000 18.0625 56689.0 1.0 \n", "99 18.1875 17.9375 50878.0 -0.006920 17.8125 17.9375 56689.0 1.0 \n", "\n", " cfacshr openprc numtrd retx \n", "0 1.0 0.5000 39.0 -0.045961 \n", "1 1.0 0.6000 53.0 -0.049842 \n", "2 1.0 0.5200 35.0 0.015439 \n", "3 1.0 0.6000 14.0 -0.015204 \n", "4 1.0 0.5600 21.0 -0.070351 \n", ".. ... ... ... ... \n", "95 1.0 19.0000 76.0 0.006623 \n", "96 1.0 19.0000 51.0 -0.013158 \n", "97 1.0 18.7500 98.0 -0.030000 \n", "98 1.0 18.0000 130.0 -0.006873 \n", "99 1.0 18.1875 42.0 -0.006920 \n", "\n", "[100 rows x 20 columns]" ], "text/html": [ "\n", "
\n", "
\n", "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
cusippermnopermcoissunohexcdhsiccddatebidloaskhiprcvolretbidaskshroutcfacprcfacshropenprcnumtrdretx
07574791088371.037740.023794.03.07379.02016-02-050.50000.60900.59999986.0-0.0459610.59990.60009296.01.01.00.500039.0-0.045961
17574791088371.037740.023794.03.07379.02016-02-080.42500.60100.570031417.0-0.0498420.56990.57609296.01.01.00.600053.0-0.049842
27574791088371.037740.023794.03.07379.02016-02-090.51010.57880.57889362.00.0154390.54000.57709296.01.01.00.520035.00.015439
37574791088371.037740.023794.03.07379.02016-02-100.54190.60000.57003878.0-0.0152040.56990.57009296.01.01.00.600014.0-0.015204
47574791088371.037740.023794.03.07379.02016-02-110.52000.56000.52997930.0-0.0703510.52990.53009296.01.01.00.560021.0-0.070351
...............................................................
9591705J2088372.016512.023775.03.04832.02000-07-2118.812519.312519.0000340136.00.00662318.937519.187556689.01.01.019.000076.00.006623
9691705J2088372.016512.023775.03.04832.02000-07-2418.625019.250018.7500173722.0-0.01315818.750018.875056689.01.01.019.000051.0-0.013158
9791705J2088372.016512.023775.03.04832.02000-07-2518.125018.750018.1875322610.0-0.03000018.000018.187556689.01.01.018.750098.0-0.030000
9891705J2088372.016512.023775.03.04832.02000-07-2617.875018.250018.0625357611.0-0.00687318.000018.062556689.01.01.018.0000130.0-0.006873
9991705J2088372.016512.023775.03.04832.02000-07-2717.812518.187517.937550878.0-0.00692017.812517.937556689.01.01.018.187542.0-0.006920
\n", "

100 rows × 20 columns

\n", "
\n", " \n", " \n", " \n", "\n", " \n", "
\n", "
\n", " " ] }, "metadata": {}, "execution_count": 16 } ] }, { "cell_type": "markdown", "metadata": { "id": "nft0eD3H9IOA" }, "source": [ "5. From our results to #3 above, let's say we've decided to only work with the cusip, permno, date, bidlo, and askhi variables. We can request only those five specific columns by modifying the above like so, again using both get_table() and raw_sql():\n" ] }, { "cell_type": "code", "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 424 }, "id": "1byJYBhu9PLX", "outputId": "08a962b4-bb87-4c12-d27c-2f8415f56bbd" }, "source": [ "db.get_table('crsp', 'dsf', columns=['cusip, permno, date, bidlo, askhi'], obs=100, )" ], "execution_count": 17, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ " cusip permno date bidlo askhi\n", "0 75747910 88371.0 2016-02-05 0.5000 0.6090\n", "1 75747910 88371.0 2016-02-08 0.4250 0.6010\n", "2 75747910 88371.0 2016-02-09 0.5101 0.5788\n", "3 75747910 88371.0 2016-02-10 0.5419 0.6000\n", "4 75747910 88371.0 2016-02-11 0.5200 0.5600\n", ".. ... ... ... ... ...\n", "95 91705J20 88372.0 2000-07-21 18.8125 19.3125\n", "96 91705J20 88372.0 2000-07-24 18.6250 19.2500\n", "97 91705J20 88372.0 2000-07-25 18.1250 18.7500\n", "98 91705J20 88372.0 2000-07-26 17.8750 18.2500\n", "99 91705J20 88372.0 2000-07-27 17.8125 18.1875\n", "\n", "[100 rows x 5 columns]" ], "text/html": [ "\n", "
\n", "
\n", "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
cusippermnodatebidloaskhi
07574791088371.02016-02-050.50000.6090
17574791088371.02016-02-080.42500.6010
27574791088371.02016-02-090.51010.5788
37574791088371.02016-02-100.54190.6000
47574791088371.02016-02-110.52000.5600
..................
9591705J2088372.02000-07-2118.812519.3125
9691705J2088372.02000-07-2418.625019.2500
9791705J2088372.02000-07-2518.125018.7500
9891705J2088372.02000-07-2617.875018.2500
9991705J2088372.02000-07-2717.812518.1875
\n", "

100 rows × 5 columns

\n", "
\n", " \n", " \n", " \n", "\n", " \n", "
\n", "
\n", " " ] }, "metadata": {}, "execution_count": 17 } ] }, { "cell_type": "code", "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 424 }, "id": "LEuK04z99QOv", "outputId": "0eac177a-2cd6-4bed-8043-b15bf6ebee57" }, "source": [ "db.raw_sql('select cusip, permno, date, bidlo, askhi from crsp.dsf LIMIT 100')" ], "execution_count": 18, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ " cusip permno date bidlo askhi\n", "0 75747910 88371.0 2016-02-05 0.5000 0.6090\n", "1 75747910 88371.0 2016-02-08 0.4250 0.6010\n", "2 75747910 88371.0 2016-02-09 0.5101 0.5788\n", "3 75747910 88371.0 2016-02-10 0.5419 0.6000\n", "4 75747910 88371.0 2016-02-11 0.5200 0.5600\n", ".. ... ... ... ... ...\n", "95 91705J20 88372.0 2000-07-21 18.8125 19.3125\n", "96 91705J20 88372.0 2000-07-24 18.6250 19.2500\n", "97 91705J20 88372.0 2000-07-25 18.1250 18.7500\n", "98 91705J20 88372.0 2000-07-26 17.8750 18.2500\n", "99 91705J20 88372.0 2000-07-27 17.8125 18.1875\n", "\n", "[100 rows x 5 columns]" ], "text/html": [ "\n", "
\n", "
\n", "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
cusippermnodatebidloaskhi
07574791088371.02016-02-050.50000.6090
17574791088371.02016-02-080.42500.6010
27574791088371.02016-02-090.51010.5788
37574791088371.02016-02-100.54190.6000
47574791088371.02016-02-110.52000.5600
..................
9591705J2088372.02000-07-2118.812519.3125
9691705J2088372.02000-07-2418.625019.2500
9791705J2088372.02000-07-2518.125018.7500
9891705J2088372.02000-07-2617.875018.2500
9991705J2088372.02000-07-2717.812518.1875
\n", "

100 rows × 5 columns

\n", "
\n", " \n", " \n", " \n", "\n", " \n", "
\n", "
\n", " " ] }, "metadata": {}, "execution_count": 18 } ] }, { "cell_type": "markdown", "metadata": { "id": "iKbzMfob9Sxo" }, "source": [ "\n", "6. Run the query again, but filter the query for 'permno', limiting results to a single day:\n", "\n", "Since you are now filtering by specific values for variables, use raw_sql() exclusively. All future examples will be shown using raw_sql().\n", "\n", "NOTE: Use single quotation marks around the date value and double quotation marks elsewhere as shown below.\n" ] }, { "cell_type": "code", "metadata": { "id": "V0kVRYie9R3S", "colab": { "base_uri": "https://localhost:8080/", "height": 175 }, "outputId": "2bed0e65-c2ee-40a6-f36a-1cebff83a660" }, "source": [ "db.raw_sql(\"select cusip, permno, date, bidlo, askhi \"\n", " \"from crsp.dsf \"\n", " \"where permno in (14593, 90319, 12490, 17778) \"\n", " \"and date='2013-01-04'\")" ], "execution_count": 19, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ " cusip permno date bidlo askhi\n", "0 45920010 12490.0 2013-01-04 192.779999 194.460007\n", "1 03783310 14593.0 2013-01-04 525.828613 538.629883\n", "2 08467010 17778.0 2013-01-04 140047.000000 141003.796875\n", "3 02079K30 90319.0 2013-01-04 727.680115 741.469971" ], "text/html": [ "\n", "
\n", "
\n", "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
cusippermnodatebidloaskhi
04592001012490.02013-01-04192.779999194.460007
10378331014593.02013-01-04525.828613538.629883
20846701017778.02013-01-04140047.000000141003.796875
302079K3090319.02013-01-04727.680115741.469971
\n", "
\n", " \n", " \n", " \n", "\n", " \n", "
\n", "
\n", " " ] }, "metadata": {}, "execution_count": 19 } ] }, { "cell_type": "markdown", "metadata": { "id": "-TiH_xuh9ZCp" }, "source": [ "6. Determine high 'askhi' values by running a query to get a list of dates where 'permno' values posted an Ask Price over $2,000 between the years 2010 and 2013, no longer limiting the number of returned rows (as this is a pretty specific query):\n" ] }, { "cell_type": "code", "metadata": { "id": "DCfzAFgs9R6P", "colab": { "base_uri": "https://localhost:8080/", "height": 424 }, "outputId": "3d5bfc62-037e-4a53-e9e3-b7c3c2dc7d4b" }, "source": [ "db.raw_sql(\"select cusip,permno,date,bidlo,askhi from crsp.dsf where permno in (14593, 90319, 12490, 17778) and date between '2010-01-01' and '2013-12-31' and askhi > 2000\")\n" ], "execution_count": 22, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ " cusip permno date bidlo askhi\n", "0 08467010 17778.0 2010-01-04 99201.00000 99910.0\n", "1 08467010 17778.0 2010-01-05 99550.00000 100001.0\n", "2 08467010 17778.0 2010-01-06 99500.00000 100000.0\n", "3 08467010 17778.0 2010-01-07 99594.00000 100000.0\n", "4 08467010 17778.0 2010-01-08 99700.00000 100300.0\n", "... ... ... ... ... ...\n", "1001 08467010 17778.0 2013-12-24 175555.09375 176063.0\n", "1002 08467010 17778.0 2013-12-26 175610.00000 176900.0\n", "1003 08467010 17778.0 2013-12-27 176678.00000 177320.0\n", "1004 08467010 17778.0 2013-12-30 176655.00000 177685.0\n", "1005 08467010 17778.0 2013-12-31 176793.00000 177950.0\n", "\n", "[1006 rows x 5 columns]" ], "text/html": [ "\n", "
\n", "
\n", "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
cusippermnodatebidloaskhi
00846701017778.02010-01-0499201.0000099910.0
10846701017778.02010-01-0599550.00000100001.0
20846701017778.02010-01-0699500.00000100000.0
30846701017778.02010-01-0799594.00000100000.0
40846701017778.02010-01-0899700.00000100300.0
..................
10010846701017778.02013-12-24175555.09375176063.0
10020846701017778.02013-12-26175610.00000176900.0
10030846701017778.02013-12-27176678.00000177320.0
10040846701017778.02013-12-30176655.00000177685.0
10050846701017778.02013-12-31176793.00000177950.0
\n", "

1006 rows × 5 columns

\n", "
\n", " \n", " \n", " \n", "\n", " \n", "
\n", "
\n", " " ] }, "metadata": {}, "execution_count": 22 } ] }, { "cell_type": "markdown", "metadata": { "id": "BlyzFht69efQ" }, "source": [ "7. Only a single 'permno' posted a high Ask price during this date range. Open the search to all permnos that have ever posted an Ask Price over $2,000 in any date range (use distinct to return only one entry per matching permno). Since the query is against all permnos and the entire date range of the dataset, this query may take a little longer:\n" ] }, { "cell_type": "code", "metadata": { "id": "yK3Tkvq89fDw", "colab": { "base_uri": "https://localhost:8080/", "height": 614 }, "outputId": "0be7c77b-663d-425a-d9af-87295cbbbdf0" }, "source": [ "db.raw_sql('select distinct permno from crsp.dsf where askhi > 2000')" ], "execution_count": 21, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ " permno\n", "0 14542.0\n", "1 14752.0\n", "2 15395.0\n", "3 16280.0\n", "4 17486.0\n", "5 17778.0\n", "6 17881.0\n", "7 21709.0\n", "8 36281.0\n", "9 46068.0\n", "10 76605.0\n", "11 79785.0\n", "12 83443.0\n", "13 84788.0\n", "14 86783.0\n", "15 90319.0\n", "16 92221.0\n", "17 93436.0" ], "text/html": [ "\n", "
\n", "
\n", "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
permno
014542.0
114752.0
215395.0
316280.0
417486.0
517778.0
617881.0
721709.0
836281.0
946068.0
1076605.0
1179785.0
1283443.0
1384788.0
1486783.0
1590319.0
1692221.0
1793436.0
\n", "
\n", " \n", " \n", " \n", "\n", " \n", "
\n", "
\n", " " ] }, "metadata": {}, "execution_count": 21 } ] }, { "cell_type": "markdown", "metadata": { "id": "tDsv6YWT9fK2" }, "source": [ "8. Retrieve all dates for which an Ask Price over $2000 was posted, along with the permnos that posted them. This will give a list of dates that match, with an additional entry for that date if additional permnos match as well. By 2011, there are two results:\n" ] }, { "cell_type": "code", "metadata": { "id": "9-r_K9hv9fR4", "colab": { "base_uri": "https://localhost:8080/", "height": 424 }, "outputId": "5dad2269-978f-4a7a-ba21-70faec1949e3" }, "source": [ "db.raw_sql(\"select distinct date,permno from crsp.dsf where askhi > 2000 order by date\")\n" ], "execution_count": 23, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ " date permno\n", "0 1926-12-23 14752.0\n", "1 1926-12-24 14752.0\n", "2 1927-01-07 14752.0\n", "3 1927-01-08 14752.0\n", "4 1948-05-15 16280.0\n", "... ... ...\n", "17986 2022-03-31 76605.0\n", "17987 2022-03-31 79785.0\n", "17988 2022-03-31 84788.0\n", "17989 2022-03-31 86783.0\n", "17990 2022-03-31 90319.0\n", "\n", "[17991 rows x 2 columns]" ], "text/html": [ "\n", "
\n", "
\n", "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
datepermno
01926-12-2314752.0
11926-12-2414752.0
21927-01-0714752.0
31927-01-0814752.0
41948-05-1516280.0
.........
179862022-03-3176605.0
179872022-03-3179785.0
179882022-03-3184788.0
179892022-03-3186783.0
179902022-03-3190319.0
\n", "

17991 rows × 2 columns

\n", "
\n", " \n", " \n", " \n", "\n", " \n", "
\n", "
\n", " " ] }, "metadata": {}, "execution_count": 23 } ] }, { "cell_type": "markdown", "metadata": { "id": "ut3YEa1K9fXr" }, "source": [ "\n", "9. Query for the highest Ask ever posted (searching only through Asks over $2000), on what date it posted, and which permno posted it. Use limit 1 to speed up the search since only the top value is desired:\n" ] }, { "cell_type": "code", "metadata": { "id": "blJA--OY9fcy", "colab": { "base_uri": "https://localhost:8080/", "height": 81 }, "outputId": "59f30d33-95a1-43c2-fab5-234987be953e" }, "source": [ "db.raw_sql('select permno,askhi,date from crsp.dsf where askhi > 2000 order by askhi desc LIMIT 1')\n" ], "execution_count": 24, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ " permno askhi date\n", "0 17778.0 544389.25 2022-03-29" ], "text/html": [ "\n", "
\n", "
\n", "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
permnoaskhidate
017778.0544389.252022-03-29
\n", "
\n", " \n", " \n", " \n", "\n", " \n", "
\n", "
\n", " " ] }, "metadata": {}, "execution_count": 24 } ] }, { "cell_type": "markdown", "metadata": { "id": "1wu1hY2X9fh4" }, "source": [ "This is one example of how you might approach an analytical task using Python. It begins by gathering metadata information using list_libraries() and list_tables() to learn more about the data structure available, and then using that information to run meaningful queries from the data itself using get_table() and raw_sql()..\n", "\n", "A common next step would be to write a batch program that uses the above one-off queries together. An example might be a program that uses a loop to iterate over each permno that has ever posted an Ask Price over $2000 and to calculate how long the date range was able to maintain that height. Or perhaps certain dates were more prolific than others - tallying the number of high asks per date might be informative." ] }, { "cell_type": "code", "metadata": { "id": "_6P3sJAa8Sv7" }, "source": [ "" ], "execution_count": null, "outputs": [] } ] }