{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Interacting with 3E data using SQL\n", "Before beginning, we'll assume you've got a Jupyter Notebook running in your environment. Ideally in a Python virtual environment, but not completely necessary. You also have a read-only account set up on one of your 3E instances.\n", "\n", "If we are going to connect to 3E and run SQL commands we need a login on the SQL server and we need to install PyODBC. Microsoft does a good job of [describing how to install PyODBC](https://docs.microsoft.com/en-us/sql/connect/python/pyodbc/step-1-configure-development-environment-for-pyodbc-python-development?view=sql-server-2017). Your SQL administrator should be able to provide you a read-only account on the SQL server that houses your 3E instance. \n", "\n", "I should note that I'm using Jupyter Hub hosted on a Linux machine on our internal network. You are most likely running Jupyter Notebook locally on your own machine. The Hub allows me to grant a number of users access and run their own Notebooks while sharing some common utilities that I maintain. If you are on your own, you can store everything on your own machine, however, I will reference settings for the Juptyer Hub setup but I am hoping you can translate to your local setup.\n", "\n", "As we're publishing this to the web, I've placed my server name, database name, user and password in environment variables so they don't appear here in the Notebook. I use Linux and Systemd to start my Jupyter Hub and set the environment variables in my `/etc/systemd/system/jupyter.service` file. In my server's jupyterhub_config.py file I add any of the variables I want exposed to the individual users in the `c.Spawner.env_keep` option. Any users that I'm opening up Jupyter Hub to likely know these read only accounts anyway as they've had some SQL need in the past. Generally this would be other developers, report writers, and analysts.\n", "\n", "I encourage you to try and store the usernames and passwords somewhere other than in your Notebook, but it is not a requirement. You can type them in here if you like. The Notebook is under your control and not public if you've installed Jupyter on your local device.\n", "\n", "If everything is working properly run the next cell. If you get an error, things are not quite right and no point continuing until it works. To run the cell, click in it then use `Shift`+`Enter` or pick `Cell`->`Run Cells` from the menu." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pyodbc" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Whew. Assuming the import of pyodbc worked, we're ready to move on. Lets create some variables to hold our SQL connection information. We'll import `os` to get access to the environement variables if you've set it up that way. Importing `os` want cause you any issues if you are going to type the connection information in directly." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import os \n", "elite_db_server = os.environ['ELITE_PROD_HOST']\n", "elite_db = os.environ['ELITE_PROD_DB']\n", "elite_user = os.environ['ELITE_RO_NAME']\n", "elite_pass = os.environ['ELITE_RO_PASS']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you've done any connections to a SQL server before, you'll be familiar with a connection string. Basically, it's a formatted combination of parameters we pass to the SQL server to log in and obtain a connection. I use [Python f-strings](https://realpython.com/python-f-strings/) for string substition as I feel it is most readable. \n", "\n", "Again, as I'm using Linux this is for my setup. If you are running on Windows, you'll likely need to replace the `FreeTDS` with `SQL SERVER` and the `TDS_Version` is likely not required, but everything else should be fine. \n", "\n", "Making sure you've run the cell above (it should have a number in square parenthesis to the left of the cell), run this cell below and we'll establish a connection to the SQL Server." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "conn_str = (f'DRIVER={{FreeTDS}};SERVER={elite_db_server};'\n", " f'PORT=1433;DATABASE={elite_db};UID={elite_user};'\n", " f'PWD={elite_pass};TDS_Version=8.0;ClientCharset=UTF-8')\n", "conn_3e = pyodbc.connect(conn_str)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Ok, so we're going to start simple with a list of some timekeepers with a particular title in a particular office as of today's date. Let's create the SQL which will return the Timekeeper Number and DisplayName with those parameters. From here on I'm not going to mention running the cells, but you must continue to do so :)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "sql_timekeepers =\"\"\"\n", "select timekeeper.Number,\n", "timekeeper.DisplayName\n", "\n", "from Timekeeper\n", "join tkprdate on tkprdate.TimekeeperLkUp = Timekeeper.TkprIndex\n", "and convert(date,getdate()) between tkprdate.NxStartDate and tkprdate.NxEndDate\n", "join Title on tkprdate.Title = Title.Code\n", "join Office on tkprdate.Office = Office.Code\n", "\n", "where Title.Description = ?\n", "and Office.Description = ?\n", "\n", "order by Timekeeper.SortName\n", "\"\"\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The SQL statement is in the `sql_timekeepers` variable but we haven't handed it over to the SQL server yet. We'll do that in a moment, but for now, look at those two question marks in the sql statement. We're going to potentially want to hand in some different parameters to the same statement, so we've set it up so the statement can be prepared by the SQL server and run multiple times with a different set of parameters. We'll create another variable to hold the parameters." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# the order matters here, they must be in the order they appear in the sql statement\n", "# alter the line below to suit your setup. Title description and Office description\n", "# NB: if you have a query with only one parameter, include the trailing comma: parameters = ('Partner',)\n", "parameters = ('Partner', 'Moncton')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You are all set now. Let's set up what's called a [cursor](https://github.com/mkleehammer/pyodbc/wiki/Cursor). It is not necessary to know everything about them. Not a perfect analogy, but I used to imagine a cursor as the straight edge my Grade 4 teacher moved down her list of student names as she called attendance. It is acting as a place holder as to where you are in the management of the data you have requested." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# first we create the cursor using the connection we've established above\n", "cursor = conn_3e.cursor()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# and we pass it the sql query we wrote and the parameters we've set, storing the result in a variable \n", "timekeeper_results = cursor.execute(sql_timekeepers, parameters)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "At this stage `timekeepers_results` is a prepared statement ready to execute and move through the data set. Let's loop through the cursor and store the results in a Python variable called `timekeepers`.\n", "\n", "Up to know, I'm sure even if you've never used Python, you've been able to follow along. It is an easy to read language. Now, however, we're going to have to store that data, and to do that we're going to use list comprehension to create a list of dictionaries holding the results of the SQL query. Yikes, here we go:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "timekeepers = [{'number': tkpr.Number, \n", " 'displayname': tkpr.DisplayName} for tkpr in timekeeper_results]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For me, this is where the beauty of Notebooks comes into play. I've now got the results in `timekeepers` and can analyse it however I like without having to run all the cells above. If I want a different set of parameters, I only need to re-run the cells above which runs the query and the one that sets the results into timekeepers with the specified parameters. At this point I'm free to view and use the data in any way I like without interacting with the database; we're only using the Python variable `timekeepers` at this point." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# in raw form timekeepers:\n", "timekeepers" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Imagining a more complicated query, you could now loop over these timekeepers and send an email, create a pdf, update some other system or anything else you needed to do. For now, we'll just print their names in a little table practicing some of the formatting with f-strings along the way. We'll finish here for now and will gloss over any of the stuff covered here in future notebooks. Feel free to add additional cells, change sql command, rerun and have fun." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "for timekeeper in timekeepers:\n", " # this is where your imagination should kick in thinking, \"Can I then\n", " # create a new worksheet in an excel file with the Timekeeper's name which I then \n", " # populate with some data from a report all automatically without me having to manually\n", " # run that report and output to excel, massage, and update all the formulas etc? Wait,\n", " # if I do that do I really need to use Excel at all? Can Python just create the PDF I \n", " # want to send by email to each Partner? I've always thought there must be a better way\n", " # than re-storing all of Elite's data again in Excel; that one time back in 2006 I got\n", " # burned by copy and paste in Excel still gives me nightmares - is this the answer? \n", " # Can I use data from other systems as well and combine them all here? Oh, this is going\n", " # to fun.!\" \n", " print(f'{timekeeper[\"number\"]:<20}{timekeeper[\"displayname\"]:}')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Conclusion\n", "In this Notebook, we've set up a connection to our Elite database server, run a SQL query and output the results. \n", "\n", "-30-" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Post article thoughts:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Sidebar for any more advanced Python folks\n", "For the advanced readers out there who recognize some pain points imaginging in having to set a loop where you specifically name everything like this in a dictionary, I'd usually have a function which converts the data set into a list of dictionaries in a more generic fashion. But lately I've been using Pandas which we'll talk about in a later Notebook which handles all of that for us. But for the curious, you can look at the available columns of the cursor in the cursor's `description` first element and you can write your own loops for the fun of it. Google hosts a [full description of the other attributes in the description variable](https://code.google.com/archive/p/pyodbc/wikis/Cursor.wiki)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# a look at the columns of the cursor (for those who are interested)\n", "# re-execute the sql in case we've already looped through the results above.\n", "# cursors are forward only\n", "timekeepers_cursor = cursor.execute(sql_timekeepers, parameters)\n", "timekeepers_cursor.description" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.3" } }, "nbformat": 4, "nbformat_minor": 2 }