{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "##内容索引\n", "这一小节介绍NumPy的常用函数。\n", "\n", "1. 读入csv\n", "loadtxt函数\n", "\n", "2. 计算平均值\n", "average、mean函数\n", "\n", "3. 求最大最小值\n", "max、min函数\n", "\n", "4. 计算中位数\n", "median、msort函数\n", "\n", "5. 计算方差\n", "var函数" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##1. 读写CSV文件\n", "CSV(Comma-Separated Value,逗号分隔值)格式是一种常见的文件格式。通常,数据库的转存文件就是csv格式的,文件中的各个字段对应于数据库中的列。" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Numpy中的loadtxt函数可以方便地读取csv文件,自动切分字段,并将数据载入Numpy数组。**" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import numpy as np" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": true }, "outputs": [], "source": [ "c, v = np.loadtxt('data.csv', delimiter=',', usecols=(6,7), unpack=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "data.csv文件是苹果公司的历史股价数据。第一列为股票代码,第二列为dd-mm-yyyy格式的日期,第三列为空,随后各列依次是**开盘价(4)、最高价(5)、最低价(6)和收盘价(7)**,最后一列为当日的成交量(8)。" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "loadtxt函数中,*usecols参数*为一个元组,以获得第7字段至第8字段的数据,也就是股票的收盘价和成交量数据。\n", "\n", "*unpack参数*设置为True,意思是分拆存储不同列的数据,即分别将收盘价和成交量的数组赋值给变量c和v。\n", "\n", "**用usecols中的参数指定我们感兴趣的数据列,并将unpack参数设置为True使得不同列的数据分别存储。**" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "array([ 336.1 , 339.32, 345.03, 344.32, 343.44, 346.5 , 351.88,\n", " 355.2 , 358.16, 354.54, 356.85, 359.18, 359.9 , 363.13,\n", " 358.3 , 350.56, 338.61, 342.62, 342.88, 348.16, 353.21,\n", " 349.31, 352.12, 359.56, 360. , 355.36, 355.76, 352.47,\n", " 346.67, 351.99])" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "c" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "array([ 21144800., 13473000., 15236800., 9242600., 14064100.,\n", " 11494200., 17322100., 13608500., 17240800., 33162400.,\n", " 13127500., 11086200., 10149000., 17184100., 18949000.,\n", " 29144500., 31162200., 23994700., 17853500., 13572000.,\n", " 14395400., 16290300., 21521000., 17885200., 16188000.,\n", " 19504300., 12718000., 16192700., 18138800., 16824200.])" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "v" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "collapsed": false }, "outputs": [], "source": [ "#选择第4列,开盘价\n", "opening_price = np.loadtxt('data.csv', delimiter=',', usecols=(3,), unpack=True)" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ 344.17 335.8 341.3 344.45 343.8 343.61 347.89 353.68 355.19\n", " 357.39 354.75 356.79 359.19 360.8 357.1 358.21 342.05 338.77\n", " 344.02 345.29 351.21 355.47 349.96 357.2 360.07 361.11 354.91\n", " 354.69 349.69 345.4 ]\n" ] } ], "source": [ "print opening_price" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##2. 计算平均值" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "###2.1 计算加权平均\n", "VWAP是Volume-Weighted Average Price,成交量加权平均价格,某个价格的成交量越高,该价格所占的权重就越大。**VWAP就是以成交量为权重计算出来的加权平均值。**" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "VWAP = 350.589549353\n" ] } ], "source": [ "vwap = np.average(c, weights=v)\n", "print \"VWAP =\", vwap" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "TWAP是Time0Weighted Average Price,时间加权平均价格,其基本思想是最近的价格重要性大一些,所以我们应该对近期的价格给以较高的权重。\n", "我们使用arange函数创建从0递增的自然数序列,自然数的个数即为收盘价的个数。" ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "twap = 352.428321839\n" ] } ], "source": [ "t = np.arange(len(c))\n", "print \"twap = \",np.average(c, weights=t)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "###2.2 算术平均\n", "使用mean函数计算算术平均" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "mean = 351.037666667\n" ] } ], "source": [ "mean = np.mean(c)\n", "print \"mean = \",mean" ] }, { "cell_type": "code", "execution_count": 32, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "mean = 351.037666667\n" ] } ], "source": [ "print \"mean = \", c.mean()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##3. 求最大最小值和取值范围" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "步骤:读入最高价和最低价,使用max和min函数得到最大最小值。" ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "hightest = 364.9\n", "lowest = 333.53\n" ] } ], "source": [ "h,l = np.loadtxt('data.csv', delimiter=',', usecols=(4,5), unpack=True)\n", "print 'hightest = ', np.max(h)\n", "print 'lowest = ', np.min(l)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "numpy中ptp函数可以计算数组的取值范围。该函数返回的是数组元素最大值和最小值的差值,即max(array)-min(array)。" ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Spread high price : 24.86\n", "Spread low price : 26.97\n" ] } ], "source": [ "print 'Spread high price : ', np.ptp(h)\n", "print 'Spread low price : ', np.ptp(l)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##4. 计算中位数" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "计算收盘价的中位数" ] }, { "cell_type": "code", "execution_count": 26, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "median = 352.055\n" ] } ], "source": [ "closing_price = np.loadtxt('data.csv', delimiter=',', usecols=(6,), unpack=True)\n", "print 'median = ', np.median(closing_price)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "对数组进行排序,之后再去中位数" ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "sorted_closing_price = [ 336.1 338.61 339.32 342.62 342.88 343.44 344.32 345.03 346.5\n", " 346.67 348.16 349.31 350.56 351.88 351.99 352.12 352.47 353.21\n", " 354.54 355.2 355.36 355.76 356.85 358.16 358.3 359.18 359.56\n", " 359.9 360. 363.13]\n" ] } ], "source": [ "sorted_closing = np.msort(closing_price)\n", "print \"sorted_closing_price = \", sorted_closing" ] }, { "cell_type": "code", "execution_count": 29, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "median = 352.055\n" ] } ], "source": [ "#先判断数组的个数是奇数还是偶数\n", "N = len(closing_price)\n", "median_ind = (N-1)/2\n", "if N & 0x1 :\n", " print \"median = \", sorted_closing[median_ind]\n", "else:\n", " print \"median = \", (sorted_closing[median_ind]+sorted_closing[median_ind+1])/2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##5. 计算方差\n", "方差体现变量变化的程度,股价变动过于剧烈的股票一定会给持有者制造麻烦。" ] }, { "cell_type": "code", "execution_count": 30, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "variance = 50.1265178889\n" ] } ], "source": [ "print \"variance = \", np.var(closing_price)" ] }, { "cell_type": "code", "execution_count": 31, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "variance from definition = 50.1265178889\n" ] } ], "source": [ "#手动求方差\n", "print 'variance from definition = ', np.mean( (closing_price-c.mean())**2 )" ] } ], "metadata": { "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.5" } }, "nbformat": 4, "nbformat_minor": 0 }