{ "cells": [ { "cell_type": "markdown", "id": "ea2c042f", "metadata": {}, "source": [ "---\n", "\n", "Department of Data Science\n", "\n", "Course: Tools and Techniques for Data Science\n", "\n", "---\n", "\n", "Instructor: Muhammad Arif Butt, Ph.D." ] },
{ "cell_type": "markdown", "id": "fb955b9f", "metadata": {}, "source": [ "Lecture 3.2 (NumPy-02)" ] },
{ "cell_type": "markdown", "id": "1217d912", "metadata": {}, "source": [ "# _Array vs List.ipynb_" ] },
{ "cell_type": "code", "execution_count": null, "id": "eafe2e2d", "metadata": {}, "outputs": [], "source": [] },
{ "cell_type": "markdown", "id": "f5e8412f", "metadata": {}, "source": [ "# Learning agenda of this notebook\n", "1. A Comparison\n", "   - Python Lists\n", "   - Python Arrays\n", "   - NumPy Arrays\n", "2. Memory Consumption of Python List and NumPy Array\n", "3. Operation Cost on Python List and NumPy Array" ] },
{ "cell_type": "code", "execution_count": null, "id": "0f8adf8a", "metadata": {}, "outputs": [], "source": [] },
{ "cell_type": "markdown", "id": "221f59c9", "metadata": {}, "source": [ "### a. Python Lists\n", "- A Python list is an ordered sequence of elements that can hold heterogeneous types; it is iterable, mutable, and allows duplicate elements.\n", "- A list is a built-in Python type, created by placing comma-separated values in square brackets; you don't have to specify an element type when creating it.\n", "- A Python list is one-dimensional by default. You can build an N-dimensional structure by nesting, but it is still a 1-D list whose elements are other 1-D lists.\n", "- A list stores references to its elements in a contiguous block, but the element objects themselves are scattered across memory.\n", "- Lists are more memory hungry, because every element is a full Python object plus a reference to it.\n", "- Element-wise operations on lists are typically slower than on NumPy arrays; appending, however, takes amortized O(1) time."
] }, { "cell_type": "code", "execution_count": null, "id": "9f8e9078", "metadata": {}, "outputs": [], "source": [ "# creating a list containing elements belonging to different data types\n", "mylist = [1, \"Data Science\", ['a','e'], False, 5.72]\n", "print(mylist)\n", "print(type(mylist))" ] },
{ "cell_type": "code", "execution_count": null, "id": "1b1fdd09", "metadata": {}, "outputs": [], "source": [] },
{ "cell_type": "markdown", "id": "646459ef", "metadata": {}, "source": [ "### b. Python Arrays\n", "- A Python array is a sequence of values of the same data type. The built-in `array` module requires all elements to share one type, so you must specify a type code when creating an array.\n", "\n", "```\n", "array(typecode [, initializer])\n", "```\n", "\n", "- Returns a new array whose items are restricted by typecode, and initialized from the optional initializer value, which must be a list, string or iterable over elements of the appropriate type.\n", "\n", "- Arrays represent basic values and behave very much like lists, except the type of objects stored in them is constrained. The type is specified at object creation time by using a type code, which is a single character.\n", "- The following type codes are defined:\n", "\n", "| Type code | C type | Minimum size in bytes |\n", "|---|---|---|\n", "| 'b' | signed integer | 1 |\n", "| 'B' | unsigned integer | 1 |\n", "| 'u' | Unicode character | 2 (see note) |\n", "| 'h' | signed integer | 2 |\n", "| 'H' | unsigned integer | 2 |\n", "| 'i' | signed integer | 2 |\n", "| 'I' | unsigned integer | 2 |\n", "| 'l' | signed integer | 4 |\n", "| 'L' | unsigned integer | 4 |\n", "| 'q' | signed integer | 8 (see note) |\n", "| 'Q' | unsigned integer | 8 (see note) |\n", "| 'f' | floating point | 4 |\n", "| 'd' | floating point | 8 |" ] },
{ "cell_type": "code", "execution_count": null, "id": "c009f9c6", "metadata": {}, "outputs": [], "source": [ "# To use Python arrays, you have to import Python's built-in array module\n", "import array\n", "\n", "# declaring an array of integers\n", "arr1 = array.array('i', [3, 6, 9, 2])\n", "print(arr1)\n", "print(type(arr1))\n", "\n", "# declaring an array of floats (the integer 2 is stored as 2.0)\n", "arr2 = array.array(\"f\", [3.4, 6.7, 9.5, 2])\n", "print(arr2)\n", "print(type(arr2))\n", "\n", "# Python arrays can grow/shrink dynamically\n", "arr2.append(999)\n", "print(arr2)" ] },
{ "cell_type": "code", "execution_count": null, "id": "9b9d313d", "metadata": {}, "outputs": [], "source": [] },
{ "cell_type": "markdown", "id": "3fb494e4", "metadata": {}, "source": [ "### c. NumPy Arrays\n", "- A NumPy array is an ordered sequence of elements stored contiguously in memory. Its elements share one homogeneous type (usually numbers, but they can also be booleans, strings, or other objects); it is iterable, mutable, fixed in size, and allows duplicate elements.\n", "- NumPy arrays have a fixed size at creation, unlike Python lists and arrays (which can grow dynamically). Changing the size of a NumPy array creates a new array and discards the original. 
\n", "- NumPy arrays are less memory hungry and offer better performance than Python lists.\n", "\n", "\n", "\n", "**Differences between Python Lists and NumPy Arrays:**\n", "1. Lists are part of core Python. NumPy arrays are not; they come from the separately installed NumPy package.\n", "2. Lists can contain elements of different types. An array's elements must all be of the same type.\n", "3. A list needs no import or type declaration. A NumPy array must be created through NumPy, and its element type is fixed at creation.\n", "4. NumPy arrays are optimized for fast mathematical operations. Lists are not.\n", "5. NumPy arrays are optimized for storage (which is why their element type is fixed up front). Lists are not.\n", "6. Lists can grow/shrink and are more flexible (they allow easy extension or reduction by adding/deleting elements). NumPy arrays have a fixed size.\n", "\n", "\n", "- In general, if you are going to make heavy use of mathematical operations, or need to store and process a large amount of numerical data, you should go with NumPy arrays rather than lists. The same applies if memory-efficient storage matters to you." ] },
{ "cell_type": "code", "execution_count": 2, "id": "7c5bae72", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[3.5 1.  9.  2.7 0. ]\n", "<class 'numpy.ndarray'>\n", "<class 'numpy.float64'>\n" ] } ], "source": [ "# NumPy upcasts all elements to a common, wider data type when the input mixes types\n", "# (here the booleans and the integer are converted to floats)\n", "import numpy as np\n", "array1 = np.array([3.5, True, 9, 2.7, False])\n", "print(array1)\n", "print(type(array1))\n", "print(type(array1[1]))" ] },
{ "cell_type": "code", "execution_count": 3, "id": "59c68320", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['3.5' '9' '2.7' 'arif' 'False']\n", "<class 'numpy.ndarray'>\n", "<class 'numpy.str_'>\n" ] } ], "source": [ "# When a string is present, every element is converted to a string\n", "import numpy as np\n", "array1 = np.array([3.5, 9, 2.7, 'arif', False])\n", "print(array1)\n", "print(type(array1))\n", "print(type(array1[1]))" ] },
{ "cell_type": "code", "execution_count": 1, "id": "8b61d554", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[3 0 9 2 1]\n", "<class 'numpy.ndarray'>\n", "<class 'numpy.uint16'>\n" ] } ], "source": [ "# If you specify a dtype, the elements are cast to that type\n", "# (floats are truncated, booleans become 0 or 1)\n", "import numpy as np\n", "array1 = np.array([3.5, False, 9.8, 2.7, True], dtype=np.uint16)\n", "print(array1)\n", "print(type(array1))\n", "print(type(array1[1]))" ] },
{ "cell_type": "code", "execution_count": null, "id": "db1ea8ad", "metadata": {}, "outputs": [], "source": [ "# If you specify a string dtype, every element is converted to a string\n", "# (the np.str alias was removed in NumPy 1.24; use the built-in str instead)\n", "import numpy as np\n", "array1 = np.array([3.5, False, 9.8, 2.7, True], dtype=str)\n", "print(array1)\n", "print(type(array1))\n", "print(type(array1[1]))" ] },
{ "cell_type": "code", "execution_count": null, "id": "82d0c21a", "metadata": {}, "outputs": [], "source": [] },
{ "cell_type": "markdown", "id": "6559684c", "metadata": {}, "source": [ "## 2. 
Memory Consumption of NumPy Array and Python List\n", "- Python lists consume more memory than NumPy arrays holding the same values, because each list element is a separate Python object." ] },
{ "cell_type": "code", "execution_count": null, "id": "d4f5466f", "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import sys\n", "\n", "# declaring a list of 1000 elements (range() alone is a lazy object, not a list)\n", "list1 = list(range(1000))\n", "\n", "# the list object holds only references; each int element is a separate object\n", "element_size = sys.getsizeof(list1[-1])\n", "list1_size = sys.getsizeof(list1) + sum(sys.getsizeof(x) for x in list1)\n", "print(\"Size of one element = {} and approximate size of list1 = {} bytes\".format(element_size, list1_size))\n", "\n", "# declaring a NumPy array of 1000 elements (uint16 can hold 0..999; uint8 cannot)\n", "array1 = np.arange(1000, dtype=np.uint16)\n", "print(\"\\nSize of each element = {} and Size of array1 = {} bytes\".format(array1.itemsize, array1.nbytes))" ] },
{ "cell_type": "code", "execution_count": null, "id": "ef6683be", "metadata": {}, "outputs": [], "source": [] },
{ "cell_type": "markdown", "id": "6cd0dac0", "metadata": {}, "source": [ "## 3. Operations on NumPy Arrays vs Python Lists\n", "- NumPy arrays are stored in one contiguous block of memory, unlike the elements of a list, so the CPU can access and process them very efficiently.\n", "- This property is called **locality of reference** in computer science.\n", "- Together with loops that run in compiled code instead of the Python interpreter, this is the main reason why NumPy is faster than lists. 
\n", "- As a proof of concept, we can multiply two lists and then two arrays, and compare the time each multiplication takes." ] },
{ "cell_type": "markdown", "id": "e2c6900a", "metadata": {}, "source": [ "### Effect of * operator on NumPy Array and Python List" ] },
{ "cell_type": "code", "execution_count": null, "id": "adcf2c46", "metadata": {}, "outputs": [], "source": [ "# You can multiply two NumPy arrays element-wise using the * operator\n", "import numpy as np\n", "myarray1 = np.array([1, 2, 3, 4, 5, 6])\n", "myarray2 = np.array([1, 2, 3, 4, 5, 6])\n", "myarray3 = myarray1 * myarray2\n", "myarray3" ] },
{ "cell_type": "code", "execution_count": null, "id": "4da8d1ed", "metadata": {}, "outputs": [], "source": [ "# The * operator cannot multiply two lists (list * list raises TypeError), so you have to use a loop\n", "mylist1 = [1, 2, 3, 4, 5, 6]\n", "mylist2 = [1, 2, 3, 4, 5, 6]\n", "mylist3 = [0, 0, 0, 0, 0, 0]\n", "for i in range(0,6):\n", "    mylist3[i] = mylist1[i] * mylist2[i]\n", "mylist3" ] },
{ "cell_type": "markdown", "id": "35eb1d0c", "metadata": {}, "source": [ "**Let us calculate the time to multiply two NumPy arrays of 1 million elements**" ] },
{ "cell_type": "code", "execution_count": null, "id": "bc552f6c", "metadata": {}, "outputs": [], "source": [ "import time\n", "size = 1000000\n", "array1 = np.arange(size)\n", "array2 = np.arange(size)\n", "\n", "# capturing time before the multiplication of NumPy arrays\n", "# (perf_counter is a high-resolution clock intended for measuring short durations)\n", "initialTime = time.perf_counter()\n", "\n", "# multiplying elements of both NumPy arrays and storing the result in another NumPy array\n", "array3 = array1 * array2\n", "\n", "# capturing time again after the multiplication is done\n", "finishTime = time.perf_counter()\n", "\n", "print(\"\\nTime taken by NumPy Arrays to perform multiplication:\", finishTime - initialTime, \"seconds\")\n" ] },
{ "cell_type": "markdown", "id": "1238c376", "metadata": {}, "source": [ "**Let us calculate the time to multiply two Python Lists of 1 million elements**" ] },
{ "cell_type": "code", "execution_count": null, "id": "00ef9daf", "metadata": {}, "outputs": [], "source": [ "import time\n", "\n", "# Creating two large lists to multiply element by element (size is defined in the previous code cell)\n", "list1 = list(range(size))\n", "list2 = list(range(size))\n", "list3 = list(range(size))\n", "\n", "# capturing time before the multiplication of Python Lists\n", "initialTime = time.perf_counter()\n", "\n", "# multiplying elements of both lists and storing the result in another list:\n", "# simply run a loop and overwrite the elements of the new list with the resulting value\n", "for i in range(0, len(list1)):\n", "    list3[i] = list1[i] * list2[i]\n", "\n", "\n", "# capturing time again after the multiplication is done\n", "finishTime = time.perf_counter()\n", "\n", "print(\"\\nTime taken by Lists to perform multiplication:\", finishTime - initialTime, \"seconds\")" ] },
{ "cell_type": "code", "execution_count": null, "id": "b15a7194", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.7" } }, "nbformat": 4, "nbformat_minor": 5 }