{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# PyStore Tutorial\n", "\n", "\n", "[PyStore](https://github.com/ranaroussi/pystore) is a utility that provides developers with fast data storage for Pandas dataframes. It's built on top of Pandas, Numpy and Dask and stores the data in the Parquet file format (via Fastparquet) in a hierarchical directory structure. Files are compressed using Snappy, a fast and efficient compression/decompression library from Google.\n", "\n", "The end result is a fast, powerful, and pythonic datastore for Pandas dataframes that can **easily query millions of rows at sub-second speeds**.\n", "\n", "PyStore was designed with storing time-series data in mind. It provides namespaced collections of data. These collections allow bucketing data by source, user or some other metric (for example frequency: End-Of-Day; Minute Bars; etc.). Each collection (or namespace) maps to a directory containing partitioned parquet files for each item (e.g. symbol)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Let's get started" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We'll start by importing the necessary libraries." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import pystore\n", "import quandl" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, let's get some market data to work with. We'll use Quandl's API to download 37+ years' worth of historical data for Apple's stock." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Open High Low Close Volume Ex-Dividend Split Ratio Adj. Open Adj. High Adj. Low Adj. Close Adj. Volume\n", "Date \n", "1980-12-12 28.75 28.87 28.75 28.75 2093900.0 0.0 1.0 0.422706 0.424470 0.422706 0.422706 117258400.0\n", "1980-12-15 27.38 27.38 27.25 27.25 785200.0 0.0 1.0 0.402563 0.402563 0.400652 0.400652 43971200.0\n", "1980-12-16 25.37 25.37 25.25 25.25 472000.0 0.0 1.0 0.373010 0.373010 0.371246 0.371246 26432000.0\n", "1980-12-17 25.87 26.00 25.87 25.87 385900.0 0.0 1.0 0.380362 0.382273 0.380362 0.380362 21610400.0\n", "1980-12-18 26.63 26.75 26.63 26.63 327900.0 0.0 1.0 0.391536 0.393300 0.391536 0.391536 18362400.0" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "aapl = quandl.get('WIKI/AAPL')\n", "aapl.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can take a look at the storage path:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'~/pystore'" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pystore.get_path()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This path can be changed by calling the ``set_path()`` method:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'./pystore_demo'" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Set storage path\n", "pystore.set_path('./pystore_demo')\n", "\n", "# Show the new storage path\n", "pystore.get_path()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can get a list of datastores found in this location. Since we're just getting started, all we'll get is an empty list."
] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "./pystore_demo\n" ] }, { "data": { "text/plain": [ "[]" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# List stores\n", "pystore.list_stores()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Creating / connecting to our datastore\n", "\n", "When connecting to a datastore, if it doesn't exist, it will be automatically created." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "PyStore.datastore <./pystore_demo/mydatastore>" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "store = pystore.store('mydatastore')\n", "store" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now, when we call ``pystore.list_stores()``, we'll get a list that includes our new datastore." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "./pystore_demo\n" ] }, { "data": { "text/plain": [ "['mydatastore']" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pystore.list_stores()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Creating / connecting to a Collection\n", "\n", "Before we can save our AAPL time-series data, we need to create a **Collection**. As mentioned earlier, each collection (or namespace) maps to a directory containing partitioned parquet files for each item (e.g. symbol).\n", "\n", "When connecting to a collection, if it doesn't exist, it will be automatically created."
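To make the "collection maps to a directory" idea concrete, here is roughly what the storage path will look like on disk once a couple of items have been written. This is a sketch: the exact file and partition names inside an item directory are an implementation detail of PyStore/Fastparquet, and the MSFT item is hypothetical.

```
./pystore_demo/                  # storage path
└── mydatastore/                 # datastore
    └── NASDAQ.EOD/              # collection (namespace)
        ├── AAPL/                # item: partitioned parquet files + metadata
        └── MSFT/                # another (hypothetical) item
```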
] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "PyStore.collection <NASDAQ.EOD>" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Access a collection (create it if it doesn't exist)\n", "collection = store.collection('NASDAQ.EOD')\n", "collection" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now, when we list all collections in the datastore, we can see our newly created collection:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['NASDAQ.EOD']" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "store.list_collections()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Working with collection items" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Before saving our data, let's see if there are any existing items in the collection:" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[]" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "collection.list_items()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we're ready to store our data. For demo purposes, we won't be storing the last row, which will be appended later. We'll also attach some metadata indicating the data source."
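The row-splitting used here is plain Pandas slicing, nothing PyStore-specific: ``aapl[:-1]`` selects every row except the last, while ``aapl[-1:]`` (used for the append later) selects only the last row, kept as a dataframe rather than a series. A minimal sketch with a toy frame (the ``toy`` frame is illustrative, not part of the tutorial data):

```python
import pandas as pd

# A tiny stand-in for the AAPL dataframe
toy = pd.DataFrame({'Close': [1.0, 2.0, 3.0]},
                   index=pd.date_range('2018-01-01', periods=3))

history = toy[:-1]  # every row except the last
latest = toy[-1:]   # only the last row, still a DataFrame

assert len(history) == 2
assert len(latest) == 1
```

Note that ``toy[-1:]`` preserves the dataframe shape, matching how the tutorial later passes ``aapl[-1:]`` to ``collection.append()``; ``toy.iloc[-1]`` would return a Series instead.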
] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "collection.write('AAPL', aapl[:-1], metadata={'source': 'Quandl'})" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now, when we list all items in the collection, we can see our newly created item:" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['AAPL']" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "collection.list_items()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now let's read the item from the datastore's collection:" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "PyStore.item <AAPL>" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Read the item's data\n", "item = collection.item('AAPL')\n", "item" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The item object has two main properties: **data**, which returns a Dask dataframe, and **metadata**, which returns the metadata we attached to our item, along with an \"updated\" timestamp. To learn more about Dask dataframes and their capabilities, visit http://dask.pydata.org/en/latest/dataframe.html." ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "Dask DataFrame Structure:\n", " Open High Low Close Volume Ex-Dividend Split Ratio Adj. Open Adj. High Adj. Low Adj. Close Adj. Volume\n", "npartitions=1 \n", "1980-12-12 float64 float64 float64 float64 float64 float64 float64 float64 float64 float64 float64 float64\n", "2018-03-26 ... ... ... ... ... ... ... ... ... ... ... ...\n", "Dask Name: read-parquet, 1 tasks" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data = item.data\n", "data" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'source': 'Quandl', '_updated': '2018-06-06 13:01:07.746784'}" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "item.metadata" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To load the item's data as a Pandas dataframe, we call the ``to_pandas()`` method." ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Open High Low Close Volume Ex-Dividend Split Ratio Adj. Open Adj. High Adj. Low Adj. Close Adj. Volume\n", "Date \n", "2018-03-20 175.24 176.80 174.94 175.240 19314039.0 0.0 1.0 175.24 176.80 174.94 175.240 19314039.0\n", "2018-03-21 175.04 175.09 171.26 171.270 35247358.0 0.0 1.0 175.04 175.09 171.26 171.270 35247358.0\n", "2018-03-22 170.00 172.68 168.60 168.845 41051076.0 0.0 1.0 170.00 172.68 168.60 168.845 41051076.0\n", "2018-03-23 168.39 169.92 164.94 164.940 40248954.0 0.0 1.0 168.39 169.92 164.94 164.940 40248954.0\n", "2018-03-26 168.07 173.10 166.44 172.770 36272617.0 0.0 1.0 168.07 173.10 166.44 172.770 36272617.0" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = item.to_pandas()\n", "df.tail()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's append the last day (row) to our item:" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Open High Low Close Volume Ex-Dividend Split Ratio Adj. Open Adj. High Adj. Low Adj. Close Adj. Volume\n", "Date \n", "2018-03-21 175.04 175.09 171.26 171.270 35247358.0 0.0 1.0 175.04 175.09 171.26 171.270 35247358.0\n", "2018-03-22 170.00 172.68 168.60 168.845 41051076.0 0.0 1.0 170.00 172.68 168.60 168.845 41051076.0\n", "2018-03-23 168.39 169.92 164.94 164.940 40248954.0 0.0 1.0 168.39 169.92 164.94 164.940 40248954.0\n", "2018-03-26 168.07 173.10 166.44 172.770 36272617.0 0.0 1.0 168.07 173.10 166.44 172.770 36272617.0\n", "2018-03-27 173.68 175.15 166.92 168.340 38962839.0 0.0 1.0 173.68 175.15 166.92 168.340 38962839.0" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "collection.append('AAPL', aapl[-1:])\n", "\n", "df = collection.item('AAPL').to_pandas()\n", "df.tail()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "# Querying Collections\n", "\n", "After a while, you'll have many items stored, and you may want to look some of them up by metadata. To do this, simply pass your metadata key to the ``list_items()`` method:" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['AAPL']" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "collection.list_items(source='Quandl')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "# Working with Snapshots\n", "\n", "When working with data, there will be times when you'll accidentally mess it up, making it unusable. 
For that reason, PyStore allows you to create snapshots: point-in-time, named references to all current items in a collection.\n", "\n", "First, let's see if we have any existing snapshots:" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[]" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "collection.list_snapshots()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Creating a snapshot is done using the ``create_snapshot`` method:" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['snapshot_name']" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "collection.create_snapshot('snapshot_name')\n", "\n", "collection.list_snapshots()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To see how snapshots work, let's overwrite our original AAPL item so it only includes the `Close` and `Volume` columns." ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Close Volume\n", "Date \n", "2018-03-21 171.270 35247358.0\n", "2018-03-22 168.845 41051076.0\n", "2018-03-23 164.940 40248954.0\n", "2018-03-26 172.770 36272617.0\n", "2018-03-27 168.340 38962839.0" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "collection.write('AAPL', aapl[['Close', 'Volume']],\n", " metadata={'source': 'Quandl'},\n", " overwrite=True)\n", "\n", "# Load the \"new\" item\n", "df = collection.item('AAPL').to_pandas()\n", "df.tail()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To load the item from a previous snapshot, we just need to specify it when using the ``item()`` method:" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Open High Low Close Volume Ex-Dividend Split Ratio Adj. Open Adj. High Adj. Low Adj. Close Adj. Volume\n", "Date \n", "2018-03-21 175.04 175.09 171.26 171.270 35247358.0 0.0 1.0 175.04 175.09 171.26 171.270 35247358.0\n", "2018-03-22 170.00 172.68 168.60 168.845 41051076.0 0.0 1.0 170.00 172.68 168.60 168.845 41051076.0\n", "2018-03-23 168.39 169.92 164.94 164.940 40248954.0 0.0 1.0 168.39 169.92 164.94 164.940 40248954.0\n", "2018-03-26 168.07 173.10 166.44 172.770 36272617.0 0.0 1.0 168.07 173.10 166.44 172.770 36272617.0\n", "2018-03-27 173.68 175.15 166.92 168.340 38962839.0 0.0 1.0 173.68 175.15 166.92 168.340 38962839.0" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "snap_df = collection.item('AAPL', snapshot='snapshot_name')\n", "snap_df.to_pandas().tail()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can, of course, restore our data from the snapshot:" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Open High Low Close Volume Ex-Dividend Split Ratio Adj. Open Adj. High Adj. Low Adj. Close Adj. Volume\n", "Date \n", "2018-03-21 175.04 175.09 171.26 171.270 35247358.0 0.0 1.0 175.04 175.09 171.26 171.270 35247358.0\n", "2018-03-22 170.00 172.68 168.60 168.845 41051076.0 0.0 1.0 170.00 172.68 168.60 168.845 41051076.0\n", "2018-03-23 168.39 169.92 164.94 164.940 40248954.0 0.0 1.0 168.39 169.92 164.94 164.940 40248954.0\n", "2018-03-26 168.07 173.10 166.44 172.770 36272617.0 0.0 1.0 168.07 173.10 166.44 172.770 36272617.0\n", "2018-03-27 173.68 175.15 166.92 168.340 38962839.0 0.0 1.0 173.68 175.15 166.92 168.340 38962839.0" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "collection.write('AAPL', snap_df,\n", " metadata={'source': 'Quandl'},\n", " overwrite=True)\n", "\n", "df = collection.item('AAPL').to_pandas()\n", "df.tail()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Once we're sure we no longer need this snapshot, we can delete it."
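Before we do, it's worth noting that snapshots enable a simple guarded-update pattern: snapshot, attempt the overwrite, and restore if anything goes wrong. The helper below is a sketch, not part of PyStore's API; the ``safe_overwrite`` name and the ``'pre_update'`` snapshot name are ours, and it assumes only the collection methods already used in this tutorial (``create_snapshot``, ``write``, ``item`` and ``delete_snapshot``):

```python
# Hypothetical helper (not part of PyStore): guard a risky overwrite
# with a temporary snapshot of the collection.
def safe_overwrite(collection, item_name, new_df, metadata):
    """Overwrite an item, restoring the previous version if the write fails."""
    collection.create_snapshot('pre_update')
    try:
        collection.write(item_name, new_df, metadata=metadata, overwrite=True)
    except Exception:
        # Roll back to the snapshotted version before re-raising
        old = collection.item(item_name, snapshot='pre_update')
        collection.write(item_name, old.to_pandas(),
                         metadata=metadata, overwrite=True)
        raise
    finally:
        collection.delete_snapshot('pre_update')
```

With a guard like this, a failed overwrite leaves the stored item exactly as it was. Now, the cleanup: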
] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Delete a collection snapshot\n", "collection.delete_snapshot('snapshot_name')\n", "\n", "# To delete all snapshots, use:\n", "# collection.delete_snapshots()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "# Deleting items, collections and stores" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Delete the item from the current version\n", "collection.delete_item('AAPL')" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Delete the collection\n", "store.delete_collection('NASDAQ.EOD')" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Delete the datastore\n", "pystore.delete_store('mydatastore')\n", "\n", "# to delete all datastores in the path, use:\n", "# pystore.delete_stores()" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.5" } }, "nbformat": 4, "nbformat_minor": 2 }