{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Основы программирования в Python\n",
    "\n",
    "*Алла Тамбовцева*"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Основы работы с библиотекой `numpy`"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Знакомство с массивами"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Сегодня мы познакомимся с библиотекой `numpy` (сокращение от *numeric Python*), которая часто используется в задачах, связанных с машинным обучением и построением статистических моделей.\n",
    "\n",
    "Массивы `numpy` очень похожи на списки (даже больше на вложенные списки), только они имеют одну особенность: элементы массива должны быть одного типа. Либо все элементы целые числа, либо числа с плавающей точкой, либо строки. Для обычных списков это условие не является обязательным:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[1, 2, 4, 0]\n",
      "[[1, 0, 3], [3, 6, 7], []]\n",
      "[[1, 3, 6], ['a', 'b', 'c']]\n"
     ]
    }
   ],
   "source": [
    "L = [1, 2, 4, 0]\n",
    "E = [[1, 0, 3], [3, 6, 7], []]\n",
    "D = [[1, 3, 6], ['a', 'b', 'c']]\n",
    "\n",
    "# все работает\n",
    "print(L)\n",
    "print(E)\n",
    "print(D)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Чем хороши массивы `numpy`? Почему обычных списков недостаточно? Во-первых, обработка массивов занимает меньше времени (а их хранение меньше памяти), что очень актуально в случае работы с большими объемами данных. Во-вторых, функции `numpy` являются векторизованными ‒ их можно применять сразу ко всему массиву, то есть поэлементно. В этом смысле работа с массивами напоминает работу с векторами в R. Если в R у нас есть вектор `c(1, 2, 5)`, то, прогнав строчку кода `c(1, 2, 5)**2`, мы получим вектор, состоящий из квадратов значений: `c(1, 4, 25)`. Со списками в Python такое проделать не получится: понадобятся циклы или списковые включения. Зато с массивами `numpy` ‒ легко, и без всяких циклов! И в этом мы сегодня убедимся."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Для начала импортируем библиотеку (и сократим название до `np`):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Получить массив `numpy` можно из обычного списка, просто используя функцию `array()`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([1, 2, 4, 0])"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "A = np.array(L)\n",
    "A"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([1, 2, 4, 0])"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "A = np.array([1, 2, 4, 0]) \n",
    "A"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Как видно из примера выше, список значений можно просто вписать в `array()`. Главное не забыть квадратные скобки: Python не сможет склеить перечень элементов в список самостоятельно и выдаст ошибку:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "ename": "ValueError",
     "evalue": "only 2 non-keyword arguments accepted",
     "output_type": "error",
     "traceback": [
      "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
      "\u001b[0;31mValueError\u001b[0m                                Traceback (most recent call last)",
      "\u001b[0;32m<ipython-input-5-83a5f8fa93ae>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mA\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mnp\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0marray\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;36m1\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m2\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m4\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m0\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
      "\u001b[0;31mValueError\u001b[0m: only 2 non-keyword arguments accepted"
     ]
    }
   ],
   "source": [
    "A = np.array(1, 2, 4, 0)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Посмотрим, какую информацию о массиве можно получить. Например, тип его элементов:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "dtype('int64')"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "A.dtype # integer"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Число измерений ‒ число \"маленьких\" массивов внутри \"большого\" массива (здесь такой один)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "A.ndim"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\"Форма\" массива, о котором можно думать как о размерности матрицы ‒ кортеж, включающий число строк и столбцов. Здесь у нас всего одна строка, поэтому `numpy` считает только число элементов внутри массива."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(4,)"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "A.shape"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Так как массив `A` одномерный, обращаться к его элементам можно так же, как и к элементам списка, указывая индекс элемента в квадратных скобках:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "A[0]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Попытка использовать двойной индекс приведет к неудаче:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "ename": "IndexError",
     "evalue": "invalid index to scalar variable.",
     "output_type": "error",
     "traceback": [
      "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
      "\u001b[0;31mIndexError\u001b[0m                                Traceback (most recent call last)",
      "\u001b[0;32m<ipython-input-10-ac48a874bd97>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mA\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m0\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m0\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;31m# index error\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
      "\u001b[0;31mIndexError\u001b[0m: invalid index to scalar variable."
     ]
    }
   ],
   "source": [
    "A[0][0] # index error"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Общее число элементов в массиве можно получить с помощью метода `size` (аналог `len()` для списков):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "4"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "A.size"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Кроме того, по массиву можно получить разные описательные статистики:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "4"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "A.max() # максимум"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "A.min() # минимум"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1.75"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "A.mean() # среднее"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "О других полезных методах можно узнать, нажав *Tab* после `np.`."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Наконец, массив `numpy` можно легко превратить в список:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[1, 2, 4, 0]"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "A.tolist()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "А теперь перейдем к многомерным массивам."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Многомерные массивы"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Создадим многомерный массив, взяв за основу вложенный список:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [],
   "source": [
    "S = np.array([[8, 1, 2], [2, 8, 9]])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[8, 1, 2],\n",
       "       [2, 8, 9]])"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "S"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Посмотрим на число измерений:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "2"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "S.ndim # два массива внутри"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(2, 3)"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "S.shape # две строки (два списка) и три столбца (по три элемента в списке)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Общее число элементов в массиве (его длина):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "6"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "S.size"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Когда в массиве больше одного измерения, при различных операциях нужно указывать, по какому измерению мы движемся (по строкам или по столбцам). Посмотрим еще раз на массив S и подумаем о нем как о матрице, как о таблице с числами:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[8, 1, 2],\n",
       "       [2, 8, 9]])"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "S"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Можно найти максимальное значение по строкам или столбцам S:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([8, 8, 9])"
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "S.max(axis=0) # по столбцам - три столбца и три максимальных значения"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([8, 9])"
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "S.max(axis=1) # по строкам - две строки и два максимальных значения"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([5. , 4.5, 5.5])"
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "S.mean(axis=0)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([3.66666667, 6.33333333])"
      ]
     },
     "execution_count": 25,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "S.mean(axis=1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Для того, чтобы обратиться к элементу двумерного массива, нужно указывать два индекса: сначала индекс массива, в котором находится нужный нам элемент, а затем индекс элемента внутри этого массива:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "8"
      ]
     },
     "execution_count": 26,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "S[0][0]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "9"
      ]
     },
     "execution_count": 27,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "S[1][2]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Если мы оставим один индекс, мы просто получим массив с соответствующим индексом:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([8, 1, 2])"
      ]
     },
     "execution_count": 28,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "S[0]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Массивы ‒ изменяемые объекты в Python. Обращаясь к элементу массива, ему можно присвоить новое значение:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[8, 1, 2],\n",
       "       [2, 8, 6]])"
      ]
     },
     "execution_count": 29,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "S[1][2] = 6\n",
    "S"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Чтобы выбрать сразу несколько элементов, как и в случае со списками, можно использовать срезы. Рассмотрим массив побольше."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {},
   "outputs": [],
   "source": [
    "T = np.array([[1, 3, 7], [8, 10, 1], [2, 8, 9], [1, 0, 5]])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[ 1,  3,  7],\n",
       "       [ 8, 10,  1],\n",
       "       [ 2,  8,  9],\n",
       "       [ 1,  0,  5]])"
      ]
     },
     "execution_count": 31,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "T"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Как и при выборе среза из списка, правый конец не включается:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[ 1,  3,  7],\n",
       "       [ 8, 10,  1]])"
      ]
     },
     "execution_count": 32,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "T[0:2] # массивы с индексами 0 и 1"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Можно сделать что-то еще более интересное ‒ выставить шаг среза. Другими словами, сообщить Python, что нужно брать? например, элементы, начиная с нулевого, с шагом 2: элемент с индексом 0, с индексом 2, с индексом 4, и так до конца массива."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[1, 3, 7],\n",
       "       [2, 8, 9]])"
      ]
     },
     "execution_count": 33,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "T[0::2] # старт, двоеточие, двоеточие, шаг"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "В примере выше совершенно логично были выбраны элементы с индексами 0 и 2."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Как создать массив?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Способ 1**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "С первым способом мы уже отчасти познакомились: можно получить массив из готового списка, воспользовавшись функцие `array()`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([10.5, 45. ,  2.4])"
      ]
     },
     "execution_count": 34,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "np.array([10.5, 45, 2.4])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Кроме того, при создании массива из списка можно изменить его форму, используя функцию `reshape()`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[2, 5, 6],\n",
       "       [9, 8, 0]])"
      ]
     },
     "execution_count": 35,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "old = np.array([[2, 5, 6], [9, 8, 0]])\n",
    "old "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(2, 3)"
      ]
     },
     "execution_count": 36,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "old.shape # 2 на 3"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[2, 5],\n",
       "       [6, 9],\n",
       "       [8, 0]])"
      ]
     },
     "execution_count": 37,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "new = old.reshape(3, 2) # изменим на 3 на 2\n",
    "new"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(3, 2)"
      ]
     },
     "execution_count": 38,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "new.shape # 3 на 2"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Конечно, такие преобразования разумно применять, если произведение чисел в `reshape()` совпадает с общим числом элементов в массиве. В нашем случае в массиве `old` 6 элементов, поэтому из него можно получить массивы 2 на 3, 3 на 2, 1 на 6, 6 на 1. Несоответствующее число измерений приведет к ошибке:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "metadata": {},
   "outputs": [
    {
     "ename": "ValueError",
     "evalue": "cannot reshape array of size 6 into shape (2,4)",
     "output_type": "error",
     "traceback": [
      "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
      "\u001b[0;31mValueError\u001b[0m                                Traceback (most recent call last)",
      "\u001b[0;32m<ipython-input-39-a803468708c0>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mold\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mreshape\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;36m2\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m4\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;31m# и Python явно пишет, что не так\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
      "\u001b[0;31mValueError\u001b[0m: cannot reshape array of size 6 into shape (2,4)"
     ]
    }
   ],
   "source": [
    "old.reshape(2, 4) # и Python явно пишет, что не так"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Способ 2**\n",
    "\n",
    "Можно создать массив на основе промежутка, созданного с помощью`arange()` ‒ функции из `numpy`, похожей на `range()`, только более гибкую. Посмотрим, как работает эта функция."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([2, 3, 4, 5, 6, 7, 8])"
      ]
     },
     "execution_count": 40,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "np.arange(2, 9) # по умолчанию - как обычный range()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "По умолчанию эта функция создает массив, элементы которого начинаются со значения 2 и заканчиваются на значении 8 (правый конец промежутка не включается), следуя друг за другом с шагом 1. Но этот шаг можно менять:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([2, 5, 8])"
      ]
     },
     "execution_count": 41,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "np.arange(2, 9, 3) # с шагом 3"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "И даже делать дробным!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([2. , 2.5, 3. , 3.5, 4. , 4.5, 5. , 5.5, 6. , 6.5, 7. , 7.5, 8. ,\n",
       "       8.5])"
      ]
     },
     "execution_count": 42,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "np.arange(2, 9, 0.5)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "А теперь совместим `arange()` и `reshape()`, чтобы создать массив нужного вида:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 43,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[2. , 2.5, 3. , 3.5, 4. , 4.5, 5. ],\n",
       "       [5.5, 6. , 6.5, 7. , 7.5, 8. , 8.5]])"
      ]
     },
     "execution_count": 43,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "np.arange(2, 9, 0.5).reshape(2, 7)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Получилось!"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Способ 3**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Еще массив можно создать совсем с нуля. Единственное, что нужно четко представлять ‒ это его размерность, его форму, то есть опять же, число строк и столбцов. Библиотека `numpy` позволяет создать массивы, состоящие из нулей или единиц, а также \"пустые\" массивы (на самом деле, не совсем пустые, как убедимся позже). Удобство заключается в том, что сначала можно создать массив, инициализировать его (например, заполнить нулями), а затем заменить нули на другие значения в соответствии с требуемыми условиями. Как мы помним, массивы ‒ изменяемые объекты, и использовать замену в цикле еще никто не запрещал.\n",
    "\n",
    "Так выглядит массив из нулей:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 44,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[0., 0., 0.],\n",
       "       [0., 0., 0.],\n",
       "       [0., 0., 0.]])"
      ]
     },
     "execution_count": 44,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "Z = np.zeros((3, 3)) # размеры в виде кортежа - не теряйте еще одни круглые скобки\n",
    "Z"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "А так ‒ массив из единиц:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 45,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[1., 1.],\n",
       "       [1., 1.],\n",
       "       [1., 1.],\n",
       "       [1., 1.]])"
      ]
     },
     "execution_count": 45,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "O = np.ones((4, 2))\n",
    "O"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "С пустым (*empty*) массивом все более загадочно:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 46,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[6.92198590e-310, 6.92198590e-310],\n",
       "       [5.31021756e-317, 6.92194731e-310],\n",
       "       [5.39590831e-317, 5.39790038e-317]])"
      ]
     },
     "execution_count": 46,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "Emp = np.empty((3, 2))\n",
    "Emp"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Массив *Emp* ‒ не совсем пустой, в нем содержатся какие-то (псевдо)случайные элементы, которые примерно равны 0. Теоретически создавать массив таким образом можно, но не рекомендуется: лучше создать массив из \"чистых\" нулей, чем из какого-то непонятного \"мусора\"."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Задание:** Дан массив `ages` (см. ниже). Напишите программу с циклом, которая позволит получить массив `ages_bin` такой же размерности, что и `ages`, состоящий из 0 и 1 (0 - младше 18, 1 - не младше 18).\n",
    "\n",
    "*Подсказка:* используйте вложенный цикл."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 47,
   "metadata": {},
   "outputs": [],
   "source": [
    "ages = np.array([[12, 16, 17, 18, 14], [20, 22, 18, 17, 23], [32, 16, 44, 16, 23]])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "*Решение:*"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 48,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[0., 0., 0., 1., 0.],\n",
       "       [1., 1., 1., 0., 1.],\n",
       "       [1., 0., 1., 0., 1.]])"
      ]
     },
     "execution_count": 48,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "shape = ages.shape\n",
    "ages_bin = np.zeros(shape)\n",
    "ages_bin\n",
    "\n",
    "for i in range(0, shape[0]):\n",
    "    for j in range(0, shape[1]):\n",
    "        if ages[i][j] >= 18:\n",
    "            ages_bin[i][j] = 1\n",
    "ages_bin"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Почему массивы `numpy` ‒ это удобно? "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Как уже было отмечено в начале занятия, операции с массивами можно производить поэлементно, не используя циклы или их аналоги. Посмотрим на массив `A`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 49,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([1, 2, 4, 0])"
      ]
     },
     "execution_count": 49,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "A"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "А теперь возведем все его элементы в квадрат:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 50,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([ 1,  4, 16,  0])"
      ]
     },
     "execution_count": 50,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "A ** 2"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Или вычтем из всех элементов единицу:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 51,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([ 0,  1,  3, -1])"
      ]
     },
     "execution_count": 51,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "A - 1"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Кроме того, так же просто к элементам массива можно применять свои функции. Напишем функцию, которая будет добавлять к элементу 1, а затем считать от него натуральный логарифм (здесь эта функция актуальна, так как в массиве `A` есть 0)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 52,
   "metadata": {},
   "outputs": [],
   "source": [
    "def my_log(x):\n",
    "    return np.log(x + 1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Применим:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 53,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([0.69314718, 1.09861229, 1.60943791, 0.        ])"
      ]
     },
     "execution_count": 53,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "my_log(A)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "И никаких циклов и иных нагромождений."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Превратить многомерный массив в одномерный (как список) можно, воспользовавшись методами `flatten()` и `ravel()`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 55,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([12, 16, 17, 18, 14, 20, 22, 18, 17, 23, 32, 16, 44, 16, 23])"
      ]
     },
     "execution_count": 55,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "ages.flatten() # \"плоский\" массив"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 56,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([12, 16, 17, 18, 14, 20, 22, 18, 17, 23, 32, 16, 44, 16, 23])"
      ]
     },
     "execution_count": 56,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "ages.ravel()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Чем еще хорош `numpy`?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "1.Позволяет производить вычисления ‒ нет необходимости дополнительно загружать модуль `math`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 57,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1.0986122886681098"
      ]
     },
     "execution_count": 57,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "np.log(3) # натуральный логарифм"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 58,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "2.6457513110645907"
      ]
     },
     "execution_count": 58,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "np.sqrt(7) # квадратный корень"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 59,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "7.38905609893065"
      ]
     },
     "execution_count": 59,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "np.exp(2) # e^2"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "2.Позволяет производить операции с векторами и матрицами. Пусть у нас есть два вектора `a` и `b`. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 60,
   "metadata": {},
   "outputs": [],
   "source": [
    "a = np.array([1, 2, 3])\n",
    "b = np.array([0, 4, 7])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Если мы просто умножим `a`  на `b` с помощью символа `*`, мы получим массив, содержащий произведения соответствующих элементов `a` и `b`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 61,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([ 0,  8, 21])"
      ]
     },
     "execution_count": 61,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "a * b"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "А если мы воспользуемся функцией `dot()`, получится [скалярное произведение](https://ru.wikipedia.org/wiki/%D0%A1%D0%BA%D0%B0%D0%BB%D1%8F%D1%80%D0%BD%D0%BE%D0%B5_%D0%BF%D1%80%D0%BE%D0%B8%D0%B7%D0%B2%D0%B5%D0%B4%D0%B5%D0%BD%D0%B8%D0%B5) векторов (*dot product*)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 62,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "29"
      ]
     },
     "execution_count": 62,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "np.dot(a, b) # результат - число"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "При желании можно получить [векторное произведение](https://ru.wikipedia.org/wiki/%D0%92%D0%B5%D0%BA%D1%82%D0%BE%D1%80%D0%BD%D0%BE%D0%B5_%D0%BF%D1%80%D0%BE%D0%B8%D0%B7%D0%B2%D0%B5%D0%B4%D0%B5%D0%BD%D0%B8%D0%B5) (*cross product*): "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 63,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([ 2, -7,  4])"
      ]
     },
     "execution_count": 63,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "np.cross(a, b) # результат- вектор"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Теперь создадим матрицу и поработаем с ней. Создадим ее не самым интуитивным образов ‒ из строки (да, так тоже можно)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 64,
   "metadata": {},
   "outputs": [],
   "source": [
    "m = np.array(np.mat('2 4; 1 6')) # np.mat - матрица из строки, np.array - массив из матрицы "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Самое простоеи понятное, что можно сделать с матрицей ‒ транспонировать ее, то есть поменять местами строки и столбцы:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 65,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[2, 1],\n",
       "       [4, 6]])"
      ]
     },
     "execution_count": 65,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "m.T "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Можно вывести ее диагональные элементы:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 66,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([2, 6])"
      ]
     },
     "execution_count": 66,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "m.diagonal()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "И посчитать след матрицы ‒ сумму ее диагональных элементов:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 67,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "8"
      ]
     },
     "execution_count": 67,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "m.trace()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Задание.** Создайте [единичную матрицу](https://ru.wikipedia.org/wiki/%D0%95%D0%B4%D0%B8%D0%BD%D0%B8%D1%87%D0%BD%D0%B0%D1%8F_%D0%BC%D0%B0%D1%82%D1%80%D0%B8%D1%86%D0%B0) 3 на 3, создав массив из нулей, а затем заполнив ее диагональные элементы значениями 1.\n",
    "\n",
    "*Подсказка:* функция `fill_diagonal()`."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "*Решение:*"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 70,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[1., 0., 0.],\n",
       "       [0., 1., 0.],\n",
       "       [0., 0., 1.]])"
      ]
     },
     "execution_count": 70,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "I = np.zeros((3, 3))\n",
    "np.fill_diagonal(I, 1)\n",
    "I"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Правда, для создания массива в виде единичной матрицы в `numpy` уже есть готовая функция (наряду с `zeros` и `ones`):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 71,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[1., 0., 0.],\n",
       "       [0., 1., 0.],\n",
       "       [0., 0., 1.]])"
      ]
     },
     "execution_count": 71,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "np.eye(3)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Найдем [обратную матрицу](https://ru.wikipedia.org/wiki/%D0%9E%D0%B1%D1%80%D0%B0%D1%82%D0%BD%D0%B0%D1%8F_%D0%BC%D0%B0%D1%82%D1%80%D0%B8%D1%86%D0%B0):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 72,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[-3, -5],\n",
       "       [-2, -7]])"
      ]
     },
     "execution_count": 72,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "np.invert(m)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Для других операций с матрицами (и вычислений в рамках линейной алгебры) можно использовать функции из подмодуля `linalg`. Например, так можно найти определитель матрицы:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 73,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "7.999999999999998"
      ]
     },
     "execution_count": 73,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "np.linalg.det(m) # вспоминаем истории про числа с плавающей точкой, это 8 на самом деле"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "И собственные значения:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 74,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([1.17157288, 6.82842712])"
      ]
     },
     "execution_count": 74,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "np.linalg.eigvals(m)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Полный список функций с описанием см. в [документации](https://docs.scipy.org/doc/numpy-1.13.0/reference/routines.linalg.html)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "3.Библиотеку `numpy` часто используют с библиотекой для визуализации `matplotlib`. \n",
    "\n",
    "Рассмотрим функцию `linspace()`. Она возвращает массив одинаково (с одинаковым шагом) распределенных чисел из фиксированного интервала."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 76,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([ 10. ,  32.5,  55. ,  77.5, 100. ])"
      ]
     },
     "execution_count": 76,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "x = np.linspace(10, 100, 5) # 5 чисел из интервала от 10 до 100\n",
    "x"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 78,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([   0.        ,   10.1010101 ,   20.2020202 ,   30.3030303 ,\n",
       "         40.4040404 ,   50.50505051,   60.60606061,   70.70707071,\n",
       "         80.80808081,   90.90909091,  101.01010101,  111.11111111,\n",
       "        121.21212121,  131.31313131,  141.41414141,  151.51515152,\n",
       "        161.61616162,  171.71717172,  181.81818182,  191.91919192,\n",
       "        202.02020202,  212.12121212,  222.22222222,  232.32323232,\n",
       "        242.42424242,  252.52525253,  262.62626263,  272.72727273,\n",
       "        282.82828283,  292.92929293,  303.03030303,  313.13131313,\n",
       "        323.23232323,  333.33333333,  343.43434343,  353.53535354,\n",
       "        363.63636364,  373.73737374,  383.83838384,  393.93939394,\n",
       "        404.04040404,  414.14141414,  424.24242424,  434.34343434,\n",
       "        444.44444444,  454.54545455,  464.64646465,  474.74747475,\n",
       "        484.84848485,  494.94949495,  505.05050505,  515.15151515,\n",
       "        525.25252525,  535.35353535,  545.45454545,  555.55555556,\n",
       "        565.65656566,  575.75757576,  585.85858586,  595.95959596,\n",
       "        606.06060606,  616.16161616,  626.26262626,  636.36363636,\n",
       "        646.46464646,  656.56565657,  666.66666667,  676.76767677,\n",
       "        686.86868687,  696.96969697,  707.07070707,  717.17171717,\n",
       "        727.27272727,  737.37373737,  747.47474747,  757.57575758,\n",
       "        767.67676768,  777.77777778,  787.87878788,  797.97979798,\n",
       "        808.08080808,  818.18181818,  828.28282828,  838.38383838,\n",
       "        848.48484848,  858.58585859,  868.68686869,  878.78787879,\n",
       "        888.88888889,  898.98989899,  909.09090909,  919.19191919,\n",
       "        929.29292929,  939.39393939,  949.49494949,  959.5959596 ,\n",
       "        969.6969697 ,  979.7979798 ,  989.8989899 , 1000.        ])"
      ]
     },
     "execution_count": 78,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "x = np.linspace(0, 1000, 100) \n",
    "x"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "При чем тут `matplotlib`? Представьте, что нам нужно построить обычный график функции $y=x^2$. Если в наши задачи не входит построение графика по определенным данным, можно спокойно создать массив `x` с помощью `linspace`, а затем просто возвести его в квадрат! И нанести полученные точки на график."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 79,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([0.00000000e+00, 1.02030405e+02, 4.08121620e+02, 9.18273646e+02,\n",
       "       1.63248648e+03, 2.55076013e+03, 3.67309458e+03, 4.99948985e+03,\n",
       "       6.52994592e+03, 8.26446281e+03, 1.02030405e+04, 1.23456790e+04,\n",
       "       1.46923783e+04, 1.72431385e+04, 1.99979594e+04, 2.29568411e+04,\n",
       "       2.61197837e+04, 2.94867871e+04, 3.30578512e+04, 3.68329762e+04,\n",
       "       4.08121620e+04, 4.49954086e+04, 4.93827160e+04, 5.39740843e+04,\n",
       "       5.87695133e+04, 6.37690032e+04, 6.89725538e+04, 7.43801653e+04,\n",
       "       7.99918376e+04, 8.58075707e+04, 9.18273646e+04, 9.80512193e+04,\n",
       "       1.04479135e+05, 1.11111111e+05, 1.17947148e+05, 1.24987246e+05,\n",
       "       1.32231405e+05, 1.39679625e+05, 1.47331905e+05, 1.55188246e+05,\n",
       "       1.63248648e+05, 1.71513111e+05, 1.79981635e+05, 1.88654219e+05,\n",
       "       1.97530864e+05, 2.06611570e+05, 2.15896337e+05, 2.25385165e+05,\n",
       "       2.35078053e+05, 2.44975003e+05, 2.55076013e+05, 2.65381084e+05,\n",
       "       2.75890215e+05, 2.86603408e+05, 2.97520661e+05, 3.08641975e+05,\n",
       "       3.19967350e+05, 3.31496786e+05, 3.43230283e+05, 3.55167840e+05,\n",
       "       3.67309458e+05, 3.79655137e+05, 3.92204877e+05, 4.04958678e+05,\n",
       "       4.17916539e+05, 4.31078461e+05, 4.44444444e+05, 4.58014488e+05,\n",
       "       4.71788593e+05, 4.85766758e+05, 4.99948985e+05, 5.14335272e+05,\n",
       "       5.28925620e+05, 5.43720029e+05, 5.58718498e+05, 5.73921028e+05,\n",
       "       5.89327620e+05, 6.04938272e+05, 6.20752984e+05, 6.36771758e+05,\n",
       "       6.52994592e+05, 6.69421488e+05, 6.86052444e+05, 7.02887460e+05,\n",
       "       7.19926538e+05, 7.37169677e+05, 7.54616876e+05, 7.72268136e+05,\n",
       "       7.90123457e+05, 8.08182838e+05, 8.26446281e+05, 8.44913784e+05,\n",
       "       8.63585348e+05, 8.82460973e+05, 9.01540659e+05, 9.20824406e+05,\n",
       "       9.40312213e+05, 9.60004081e+05, 9.79900010e+05, 1.00000000e+06])"
      ]
     },
     "execution_count": 79,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "y = x ** 2\n",
    "y"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Подробнее этот пример мы рассмотрим позже, когда будем работать с `matplotlib`."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "4.Еще `numpy` можно использовать в статистике. Например, чтобы посчитать выборочную дисперсию, стандартное отклонение, медиану. Важный момент: здесь `numpy` чем-то похож на R, он не сможет выдать результат в случае, если в массиве присутствуют пропущенные значения. Проверим."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 81,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([1. , 0. , 4.5, nan, 3. ])"
      ]
     },
     "execution_count": 81,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "q = np.array([1., 0., 4.5, np.nan, 3.]) # np.nan - Not a number (пропущенное значение)\n",
    "q"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 85,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "nan"
      ]
     },
     "execution_count": 85,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "np.var(q) # получаем nan вместо дисперсии"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 84,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "3.046875"
      ]
     },
     "execution_count": 84,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "np.nanvar(q) # если функция начинается с nan, все работает"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Аналогичная история с медианой:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 86,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/usr/local/lib/python3.5/dist-packages/numpy/lib/function_base.py:4033: RuntimeWarning: Invalid value encountered in median\n",
      "  r = func(a, **kwargs)\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "nan"
      ]
     },
     "execution_count": 86,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "np.median(q)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 87,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "2.0"
      ]
     },
     "execution_count": 87,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "np.nanmedian(q)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Еще с помощью `numpy` можно считать выборочный коэффициент корреляции Пирсона. Функция `corrcoef()` возвращает корреляционную матрицу, из которой можно извлечь коэффициент линейной корреляции."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 88,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[1.        , 0.93307545],\n",
       "       [0.93307545, 1.        ]])"
      ]
     },
     "execution_count": 88,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "x = np.array([2, 6, 8, 10, 12])\n",
    "y = np.array([4, 7, 14, 21, 19])\n",
    "\n",
    "np.corrcoef(x, y)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 89,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.9330754546745095"
      ]
     },
     "execution_count": 89,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "np.corrcoef(x, y)[0][1]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Внимание:** не путайте функцию `corrcoef()` с `correlate()`!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 90,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([600])"
      ]
     },
     "execution_count": 90,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "np.correlate(x, y)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Функция `correlate()` используется для нахождения кросс-корреляции."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 91,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Help on function correlate in module numpy.core.numeric:\n",
      "\n",
      "correlate(a, v, mode='valid')\n",
      "    Cross-correlation of two 1-dimensional sequences.\n",
      "    \n",
      "    This function computes the correlation as generally defined in signal\n",
      "    processing texts::\n",
      "    \n",
      "        c_{av}[k] = sum_n a[n+k] * conj(v[n])\n",
      "    \n",
      "    with a and v sequences being zero-padded where necessary and conj being\n",
      "    the conjugate.\n",
      "    \n",
      "    Parameters\n",
      "    ----------\n",
      "    a, v : array_like\n",
      "        Input sequences.\n",
      "    mode : {'valid', 'same', 'full'}, optional\n",
      "        Refer to the `convolve` docstring.  Note that the default\n",
      "        is 'valid', unlike `convolve`, which uses 'full'.\n",
      "    old_behavior : bool\n",
      "        `old_behavior` was removed in NumPy 1.10. If you need the old\n",
      "        behavior, use `multiarray.correlate`.\n",
      "    \n",
      "    Returns\n",
      "    -------\n",
      "    out : ndarray\n",
      "        Discrete cross-correlation of `a` and `v`.\n",
      "    \n",
      "    See Also\n",
      "    --------\n",
      "    convolve : Discrete, linear convolution of two one-dimensional sequences.\n",
      "    multiarray.correlate : Old, no conjugate, version of correlate.\n",
      "    \n",
      "    Notes\n",
      "    -----\n",
      "    The definition of correlation above is not unique and sometimes correlation\n",
      "    may be defined differently. Another common definition is::\n",
      "    \n",
      "        c'_{av}[k] = sum_n a[n] conj(v[n+k])\n",
      "    \n",
      "    which is related to ``c_{av}[k]`` by ``c'_{av}[k] = c_{av}[-k]``.\n",
      "    \n",
      "    Examples\n",
      "    --------\n",
      "    >>> np.correlate([1, 2, 3], [0, 1, 0.5])\n",
      "    array([ 3.5])\n",
      "    >>> np.correlate([1, 2, 3], [0, 1, 0.5], \"same\")\n",
      "    array([ 2. ,  3.5,  3. ])\n",
      "    >>> np.correlate([1, 2, 3], [0, 1, 0.5], \"full\")\n",
      "    array([ 0.5,  2. ,  3.5,  3. ,  0. ])\n",
      "    \n",
      "    Using complex sequences:\n",
      "    \n",
      "    >>> np.correlate([1+1j, 2, 3-1j], [0, 1, 0.5j], 'full')\n",
      "    array([ 0.5-0.5j,  1.0+0.j ,  1.5-1.5j,  3.0-1.j ,  0.0+0.j ])\n",
      "    \n",
      "    Note that you get the time reversed, complex conjugated result\n",
      "    when the two input sequences change places, i.e.,\n",
      "    ``c_{va}[k] = c^{*}_{av}[-k]``:\n",
      "    \n",
      "    >>> np.correlate([0, 1, 0.5j], [1+1j, 2, 3-1j], 'full')\n",
      "    array([ 0.0+0.j ,  3.0+1.j ,  1.5+1.5j,  1.0+0.j ,  0.5+0.5j])\n",
      "\n"
     ]
    }
   ],
   "source": [
    "help(np.correlate)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.5.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}