{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "___\n", "\n", " \n", " \n", "
\n", "
\n", " \n", "
Content Copyright by Pierian Data and xDM Consulting
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Overview of Received Emails\n", "\n", "Now that we understand how to send emails programmatically with Python, let's explore how we can read and search received emails. To do so, we will use the built-in [imaplib library](https://docs.python.org/3/library/imaplib.html#imap4-example). We will also use the built-in [email](https://docs.python.org/3/library/email.examples.html) library for parsing the received emails." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import imaplib" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": true }, "outputs": [], "source": [ "M = imaplib.IMAP4_SSL('imap.gmail.com')" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import getpass" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "user = input(\"Enter your email: \")" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Enter your password: ········\n" ] } ], "source": [ "# Remember, you may need an app password if you are a Gmail user\n", "# \n", "password = getpass.getpass(\"Enter your password: \")" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "M.login(user, password)" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "('OK',\n", " [b'(\\\\HasNoChildren) \"/\" \"INBOX\"',\n", " b'(\\\\HasNoChildren) \"/\" \"Personal\"',\n", " b'(\\\\HasNoChildren) \"/\" \"Receipts\"',\n", " b'(\\\\HasNoChildren) \"/\" \"Sent\"',\n", " b'(\\\\HasNoChildren) \"/\" \"Trash\"',\n", " b'(\\\\HasNoChildren) \"/\" \"Travel\"',\n", " b'(\\\\HasNoChildren) \"/\" \"Work\"',\n", " b'(\\\\HasChildren \\\\Noselect) 
\"/\" \"[Gmail]\"',\n", " b'(\\\\All \\\\HasNoChildren) \"/\" \"[Gmail]/All Mail\"',\n", " b'(\\\\Drafts \\\\HasNoChildren) \"/\" \"[Gmail]/Drafts\"',\n", " b'(\\\\HasNoChildren \\\\Important) \"/\" \"[Gmail]/Important\"',\n", " b'(\\\\HasNoChildren \\\\Sent) \"/\" \"[Gmail]/Sent Mail\"',\n", " b'(\\\\HasNoChildren \\\\Junk) \"/\" \"[Gmail]/Spam\"',\n", " b'(\\\\Flagged \\\\HasNoChildren) \"/\" \"[Gmail]/Starred\"',\n", " b'(\\\\HasNoChildren \\\\Trash) \"/\" \"[Gmail]/Trash\"'])" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "M.list()" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "('OK', [b'28297'])" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Connect to your inbox\n", "M.select(\"inbox\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Searching Mail\n", "\n", "Now that we have connected to our mail, we should be able to search for it using the specialized syntax of IMAP. Here are the different search keys you can use:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
| Keyword | Definition |\n", "| --- | --- |\n", "
| 'ALL' | Returns all messages in the selected email folder. imaplib limits how long a server response line may be, so a large result can raise an error. To change the limit use imaplib._MAXLINE = 100, where 100 is whatever you want the limit to be. |\n", "
| 'BEFORE date' | Returns all messages before the date. Date must be formatted as 01-Nov-2000. |\n", "
| 'ON date' | Returns all messages on the date. Date must be formatted as 01-Nov-2000. |\n", "
| 'SINCE date' | Returns all messages on or after the date. Date must be formatted as 01-Nov-2000. |\n", "
| 'FROM some_string' | Returns all messages from the sender in the string. The string can be an email address, for example 'FROM user@example.com', or just a string that may appear in the sender field, for example \"FROM example\". |\n", "
| 'TO some_string' | Returns all messages addressed to the recipient in the string. The string can be an email address, for example 'TO user@example.com', or just a string that may appear in the recipient field, for example \"TO example\". |\n", "
| 'CC some_string' and/or 'BCC some_string' | Returns all messages with the string in the CC or BCC field, respectively. |\n", "
| 'SUBJECT string', 'BODY string', 'TEXT \"string with spaces\"' | Returns all messages with the string in the subject (SUBJECT), in the body (BODY), or anywhere in the headers or body (TEXT). If the string you are searching for has spaces in it, wrap it in double quotes. |\n", "
| 'SEEN', 'UNSEEN' | Returns all messages that have been seen or unseen (also known as read or unread). |\n", "
| 'ANSWERED', 'UNANSWERED' | Returns all messages that have been replied to or not replied to. |\n", "
| 'DELETED', 'UNDELETED' | Returns all messages that have been marked as deleted or that have not been marked as deleted. |\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can also combine the search keys above. Keys listed one after another must all match (an implicit AND), while OR and NOT are prefix operators written before the keys they apply to, for example 'OR FROM example SUBJECT test'. Check out the full list of search keys here: http://www.4d.com/docs/CMU/CMU88864.HTM.\n", "\n", "Please note that IMAP servers for different email services may support slightly different syntax. You may need to experiment to get the results you want.\n", "\n", "___________\n", "___________\n", "\n", "Now we can search our mail for any term we want. " ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# Use if you get an error saying limit was reached\n", "imaplib._MAXLINE = 10000000" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Send yourself a test email with the subject line:\n", "\n", "    this is a test email for python\n", "\n", "Or some other uniquely identifying string. \n", "\n", "We will now need to reconnect to our IMAP server. You will probably need to restart your kernel for this step if you are using Jupyter Notebook." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# Restart your kernel and run the following:\n", "import imaplib\n", "import getpass\n", "M = imaplib.IMAP4_SSL('imap.gmail.com')\n", "user = input(\"Enter your email: \")\n", "password = getpass.getpass(\"Enter your password: \")\n", "M.login(user, password)\n" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "('OK', [b'28299'])" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Connect to your inbox\n", "M.select(\"inbox\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's now search for the test email and confirm that it is there:" ] }, { "cell_type": "code", "execution_count": 105, "metadata": { "collapsed": true }, "outputs": [], "source": [ "typ, data = M.search(None, 'SUBJECT \"this is a test email for python\"')" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "We can now inspect what the search returned:" ] }, { "cell_type": "code", "execution_count": 106, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'OK'" ] }, "execution_count": 106, "metadata": {}, "output_type": "execute_result" } ], "source": [ "typ" ] }, { "cell_type": "code", "execution_count": 107, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[b'28298']" ] }, "execution_count": 107, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The data is a list holding one bytes object that contains the numbers of the matching messages, separated by spaces." 
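] }, { "cell_type": "markdown", "metadata": {}, "source": [ "When several messages match, their numbers come back together inside the single bytes object in `data`. Below is a minimal sketch of splitting them apart. It assumes the result has the shape shown above; `parse_search_ids` is a hypothetical helper name (not part of imaplib), and the sample values are made up so that no server connection is needed:\n", "\n", "
```python
def parse_search_ids(data):
    # data is the second item returned by M.search, e.g. [b'28298 28299'].
    # Guard against an empty list or a None entry before splitting.
    if not data or data[0] is None:
        return []
    # bytes.split() with no argument splits on whitespace, so an empty
    # result such as b'' becomes an empty list.
    return data[0].split()


# Sample values shaped like the search output above (assumed, not real mail).
print(parse_search_ids([b'28298']))
print(parse_search_ids([b'28298 28299']))
print(parse_search_ids([b'']))
```
"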
] }, { "cell_type": "code", "execution_count": 108, "metadata": { "collapsed": true }, "outputs": [], "source": [ "\n", "# typ, data = M.fetch(data[0],\"(RFC822)\")" ] }, { "cell_type": "code", "execution_count": 112, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# Fetch the full raw message (RFC822) for the matching message number\n", "result, email_data = M.fetch(data[0], \"(RFC822)\")" ] }, { "cell_type": "code", "execution_count": 113, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# The first item is a tuple; its second element holds the raw message bytes\n", "raw_email = email_data[0][1]" ] }, { "cell_type": "code", "execution_count": 116, "metadata": { "collapsed": true }, "outputs": [], "source": [ "raw_email_string = raw_email.decode('utf-8')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can use the built-in email library to help parse this raw string." ] }, { "cell_type": "code", "execution_count": 120, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import email" ] }, { "cell_type": "code", "execution_count": 121, "metadata": { "collapsed": true }, "outputs": [], "source": [ "email_message = email.message_from_string(raw_email_string)" ] }, { "cell_type": "code", "execution_count": 125, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "b'This is a test to see if the python search worked.\\r\\n'\n" ] } ], "source": [ "for part in email_message.walk():\n", "    if part.get_content_type() == \"text/plain\":\n", "        body = part.get_payload(decode=True)\n", "        print(body)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Excellent! We have successfully checked our email inbox, filtered it by a condition, and read the body of the message that was there. This will come in handy in the near future!" 
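] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The parsing steps above can be condensed into one helper. The following is a minimal sketch rather than the method used in this notebook: it assumes the raw message is available as bytes, swaps in `email.message_from_bytes` so that no manual utf-8 decode is needed, and builds a stand-in message locally instead of fetching one from a server. `get_plain_text_body` is a hypothetical name:\n", "\n", "
```python
from email import message_from_bytes
from email.message import EmailMessage


def get_plain_text_body(raw_email):
    # Parse the raw RFC822 bytes directly into a message object.
    message = message_from_bytes(raw_email)
    for part in message.walk():
        if part.get_content_type() == 'text/plain':
            # decode=True undoes the transfer encoding and returns bytes.
            payload = part.get_payload(decode=True)
            # Fall back to utf-8 when the part declares no charset.
            charset = part.get_content_charset() or 'utf-8'
            return payload.decode(charset, errors='replace')
    # No text/plain part was found.
    return None


# Stand-in message built locally (assumed sample, not fetched mail).
sample = EmailMessage()
sample['Subject'] = 'this is a test email for python'
sample.set_content('This is a test to see if the python search worked.')
raw = sample.as_bytes()

print(get_plain_text_body(raw))
```
"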
] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.3" } }, "nbformat": 4, "nbformat_minor": 2 }