{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Sorting Arrays" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Using matplotlib backend: Qt4Agg\n", "Populating the interactive namespace from numpy and matplotlib\n" ] } ], "source": [ "%pylab" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## The sort function" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Consider this example:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "array([ 20.8, 53.4, 61.8, 93.2])" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "names = array(['bob', 'sue', 'jan', 'ad'])\n", "weights = array([20.8, 93.2, 53.4, 61.8])\n", "\n", "sort(weights)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`sort` returns a sorted copy of the array, in ascending order." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## The argsort function" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`argsort` returns the indices that would sort the array in ascending order:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "array([0, 2, 3, 1], dtype=int64)" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ordered_indices = argsort(weights)\n", "ordered_indices" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "These indices can be used to index the original array, or any other array of the same length:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "array([ 20.8, 53.4, 61.8, 93.2])" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "weights[ordered_indices]" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "array(['bob', 'jan', 'ad', 'sue'], \n", " dtype='|S3')" ] }, 
"execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "names[ordered_indices]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Calling these functions does not modify the original array:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "array([ 20.8, 93.2, 53.4, 61.8])" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "weights" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## The sort and argsort methods" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Arrays also provide these operations as methods:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "array([0, 2, 3, 1], dtype=int64)" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data = array([20.8, 93.2, 53.4, 61.8])\n", "data.argsort()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `argsort` method behaves the same as the `argsort` function and likewise leaves the array unchanged." ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "array([ 20.8, 93.2, 53.4, 61.8])" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `sort` method, however, sorts the array in place:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": true }, "outputs": [], "source": [ "data.sort()" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "array([ 20.8, 53.4, 61.8, 93.2])" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Sorting two-dimensional arrays" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For multidimensional arrays, `sort` sorts along the last axis by default:" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { 
"collapsed": false }, "outputs": [ { "data": { "text/plain": [ "array([[ 0.2, 0.1, 0.5],\n", " [ 0.4, 0.8, 0.3],\n", " [ 0.9, 0.6, 0.7]])" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a = array([\n", " [.2, .1, .5], \n", " [.4, .8, .3],\n", " [.9, .6, .7]\n", " ])\n", "a" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For a two-dimensional array, the default is equivalent to sorting each row:" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "array([[ 0.1, 0.2, 0.5],\n", " [ 0.3, 0.4, 0.8],\n", " [ 0.6, 0.7, 0.9]])" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sort(a)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Passing `axis=0` sorts each column instead:" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "array([[ 0.2, 0.1, 0.3],\n", " [ 0.4, 0.6, 0.5],\n", " [ 0.9, 0.8, 0.7]])" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sort(a, axis = 0)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## The searchsorted function" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "    searchsorted(sorted_array, values)\n", "\n", "`searchsorted` takes two arguments; the first must be an array that is already sorted." ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "collapsed": true }, "outputs": [], "source": [ "sorted_array = linspace(0,1,5)\n", "values = array([.1,.8,.3,.12,.5,.25])" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "array([1, 4, 2, 1, 2, 1], dtype=int64)" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "searchsorted(sorted_array, values)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Sorted array:\n", "\n", "|0|1|2|3|4|\n", "|-|-|-|-|-|\n", "|0.0|0.25|0.5|0.75|1.0|\n", "\n", "Values:\n", "\n", 
"|Value|0.1|0.8|0.3|0.12|0.5|0.25|\n", "|-|-|-|-|-|-|-|\n", "|Insertion index|1|4|2|1|2|1|\n", "\n", "For each value in the second array, `searchsorted` returns the index at which it could be inserted into the first array while keeping the first array sorted:\n", "\n", "For example, `0.1` lies in the interval (0.0, 0.25], so it would be inserted at index `1` of the first array, and the first returned value is `1`. With the default `side='left'`, a value equal to an existing element is inserted before it, which is why `0.25` also maps to `1`." ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "collapsed": true }, "outputs": [], "source": [ "from numpy.random import rand\n", "data = rand(100)\n", "data.sort()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Comma-separated values without parentheses form a tuple:" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "(0.4, 0.6)" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "bounds = .4, .6\n", "bounds" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Find the insertion index for each of the two bounds:" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "collapsed": true }, "outputs": [], "source": [ "low_idx, high_idx = searchsorted(data, bounds)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Use these indices to slice out every element of the array that lies between the two bounds:" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "array([ 0.41122674, 0.4395727 , 0.45609773, 0.45707137, 0.45772076,\n", " 0.46029997, 0.46757401, 0.47525517, 0.4969198 , 0.53068779,\n", " 0.55764166, 0.56288568, 0.56506548, 0.57003042, 0.58035233,\n", " 0.59279233, 0.59548555])" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data[low_idx:high_idx]" ] } ], "metadata": { "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.10" } }, "nbformat": 4, "nbformat_minor": 0 }