{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "\n", "Created by [Nathan Kelber](http://nkelber.com) and Ted Lawless for [JSTOR Labs](https://labs.jstor.org/) under [Creative Commons CC BY License](https://creativecommons.org/licenses/by/4.0/)
\n", "For questions/comments/improvements, email nathan.kelber@ithaka.org.
\n", "___" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Counter Objects**\n", "\n", "**Description:** This notebook describes:\n", "* What a Counter object is\n", "* The difference between counters and dictionaries\n", "* Using Counter objects for finding the most common elements\n", "\n", "**Use Case:** For Learners (Detailed explanation, not ideal for researchers)\n", "\n", "**Difficulty:** Intermediate\n", "\n", "**Completion Time:** 20 minutes\n", "\n", "**Knowledge Required:** \n", "* Python Basics Series ([Start Python Basics I](./python-basics-1.ipynb))\n", "\n", "**Knowledge Recommended:** None\n", "\n", "**Data Format:** None\n", "\n", "**Libraries Used:** `Counter` from `collections`\n", "\n", "**Research Pipeline:** None\n", "___" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# The Counter Container Datatype\n", "\n", "A Counter is very similar to a dictionary: each key is an item, and each value is the number of times that item occurs. As the name suggests, Counters are very useful for counting the occurrences of objects. We can create a Counter object from a list." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# A Counter object created from a list\n", "from collections import Counter\n", "\n", "sample_list = ['a', 'c', 'a', 'b', 'a', 'a', 'c', 'b', 'a', 'b', 'b', 'a', 'c', 'a']\n", "Counter(sample_list)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also create a Counter object from a dictionary." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# A Counter object created from a dictionary\n", "sample_dictionary = {'a': 4, 'b': 13, 'c': 2}\n", "Counter(sample_dictionary)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The contents of a Counter object may look identical to a dictionary, but there are some significant differences. 
Let's imagine we are using our Counter object to count the occurrences of five words in a text." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "# An example dictionary with key/value pairs of words and numbers\n", "wordcounts_dictionary = {\n", "    'word_a': 23,\n", "    'word_b': 3,\n", "    'word_c': 4,\n", "    'word_d': 4,\n", "    'word_e': 32}\n", "\n", "# Create a Counter object `wordcounts_counter` from `wordcounts_dictionary`\n", "wordcounts_counter = Counter(wordcounts_dictionary)\n", "print(wordcounts_counter)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "When printed, the Counter object looks just like a dictionary wrapped in the parentheses `()` of `Counter()`. Both dictionaries and Counters can return a **value** from a **key**." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "# Returning a value for a given key in Python dictionaries vs. Counter objects\n", "print(wordcounts_dictionary['word_a']) # Using a dictionary\n", "print(wordcounts_counter['word_a']) # Using a Counter" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "However, a `Counter()` has some helpful differences from a dictionary. One difference is that a `Counter()` returns 0 when no such key exists, without adding the key." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "# With a Counter, the value of the made-up key `no_such_key_exists` is 0.\n", "print(wordcounts_counter['no_such_key_exists'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If a key is not in a dictionary, Python raises a `KeyError`." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# With a dictionary, looking up the made-up key `no_such_key_exists` raises a KeyError\n", "# (uncomment the line below to see the error)\n", "# print(wordcounts_dictionary['no_such_key_exists'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If we wanted to avoid this error with a dictionary, we could use the `.get()` method, which returns a fallback value when the key is missing." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "# Using `.get()` to return a fallback value when no such key exists\n", "print(wordcounts_dictionary.get('no_such_key_exists')) # If no key is found, `None` is returned\n", "print(wordcounts_dictionary.get('no_such_key_exists', 'No such key')) # A second argument sets the value returned instead of `None`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The Counter object is also useful for a second purpose. Its `most_common()` method returns a list of (element, count) pairs sorted from the most common to the least common. We can pass a number to this method to receive only that many results. Let's try it on our example `wordcounts_counter`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "wordcounts_counter.most_common(3) # Return the top 3 most common items in `wordcounts_counter`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There is no `least_common()` method, but we can get the least common element using a negative index or slice on the list returned by `most_common()`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# The least common element is at index -1\n", "wordcounts_counter.most_common()[-1]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The final advantage of the Counter object is that it is one of the specialized container types in the standard library's `collections` module and is designed for tallying, so it can count large amounts of data efficiently in a single call. 
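
As a minimal sketch of what a Counter saves us from writing, the example below tallies the words in a made-up sentence twice: once with a hand-written dictionary loop and once with a single Counter call. The names `sample_text`, `manual_counts`, and `word_counter` are illustrative assumptions and are not used elsewhere in this notebook.

```python
from collections import Counter

# A made-up sentence used only for illustration
sample_text = 'the cat and the dog and the bird'
words = sample_text.split()

# Counting by hand with a plain dictionary and `.get()`
manual_counts = {}
for word in words:
    manual_counts[word] = manual_counts.get(word, 0) + 1

# Counting with a Counter takes a single call
word_counter = Counter(words)

print(manual_counts)
print(word_counter.most_common(2))  # [('the', 3), ('and', 2)]
```

Both approaches produce the same tallies; the Counter version is shorter and also provides `most_common()`.
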
Counter objects are a convenient and efficient choice for working with counted elements in Python." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.6" }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": true, "sideBar": true, "skip_h1_title": false, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": false, "toc_position": {}, "toc_section_display": true, "toc_window_display": false } }, "nbformat": 4, "nbformat_minor": 4 }