{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "![CoST](images/cost_logo.png) | ![OCDS](images/ocds_logo.png) | ![ODS](images/ods_logo.png)\n", "-|-|-\n", "||" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "The raw code for this IPython notebook is by default hidden for easier reading.\n", "To toggle on/off the raw code, click here." ], "text/plain": [ "" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from IPython.display import HTML\n", "\n", "HTML('''\n", "The raw code for this IPython notebook is by default hidden for easier reading.\n", "To toggle on/off the raw code, click here.''')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# OCDS for Infrastructure - Ukraine demonstrator\n", "\n", "This notebook demonstrates the steps involved in developing a lightweight prototype to bring together structured data on infrastructure projects and contracting processes and explores the added value of using OCDS data in infrastructure monitoring.\n", "\n", "The notebook is divided into the following sections:\n", "\n", "- Summary of findings\n", "- Part 1: Scraping projects data from the CoST Ukraine portal\n", "- Part 2: Finding related contracting processes in the Prozorro OCDS export\n", "- Part 3: Comparing data from CoST and Prozorro" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Summary of findings\n", "\n", "### Availability of data\n", "\n", "**Projects**\n", "\n", "An export of structured data is not available from the front end of the CoST Ukraine portal, therefore a scraper script was written to extract projects data and store this in structured format.\n", "\n", "Currently only data from the State Highways Agency is available on the CoST Ukraine portal.\n", "\n", "**Contracting processes**\n", "\n", "Structured data is available from the Prozorro platform via the following methods:\n", "\n", "* An API which serves individual contracting processes in a [format similar to OCDS](http://api-docs.openprocurement.org/en/latest/standard/index.html)\n", "* Fortnightly [bulk exports](http://ocds.prozorro.openprocurement.io/) of OCDS format data, totaling c. 46gb per export \n", "\n", "The most recent bulk export is dated 2018-04-20.\n", "\n", "### Finding related contracting processes\n", "\n", "Currently the only public search interface available to the contracting data is in the [Prozorro front end](https://prozorro.gov.ua/) and there is no search API or interface to the bulk OCDS data to provide programmatic search of the data.\n", "\n", "To reduce the number of manual steps in searching for and retrieving OCDS data from Prozorro the latest fortnightly bulk export was downloaded into an instance of the [ocdsdata ETL tool](https://github.com/open-contracting/ocdsdata) so that the full dataset could be queried using PostgreSQL.\n", "\n", "Using a sample of project titles from the CoST Ukraine portal and searching the `tender/title` and `tender/description` fields in the OCDS yielded the following results (number of relevant results shown in brackets):\n", "\n", "| Search term | Project 1 | Project 2 | Project 3 | Project 4 | Project 5 |\n", "|-------------|-----------|-----------|-----------|-----------|-----------|\n", "| Project title | 1 (1) | 1 (1) | 1 (1) | 3 (3) | 0 (0) |\n", "| Highway name | 14 (6) | 5 (3) | 3 (2) | 18 (8) | 73 (n/a) |\n", "| Highway name and km marker | 6 (6) | 4 (3) | 2 (2) | 9 (8) | n/a |\n", "\n", "Based on this sample, it seems that an automated search for the full project title is not sufficient to discover all related contracting processes and that searching for the highway name and km marker results in the best coverage and accuracy. As such two manual steps are required:\n", "\n", "1. Extract highway name and km marker from project title to use as search term (per project)\n", "1. Review tender title and description to determine whether the contracting process is related (per search result)\n", "\n", "We also noted that:\n", "\n", "* For 3 of the 5 projects, a Prozorro \"announcement number\" (ocid) was recorded in the CoST Portal\n", "* For 4 of the 5 projects, the project title in the CoST Portal was an exact match for a construction contract in Prozorro\n", "\n", "Suggesting that the list of projects in the CoST Ukraine portal may have been generated from Prozorro (e.g. by looking for tenders from the buyer with an item classification relating to construction).\n", "\n", "### Identifiers\n", "\n", "**Projects**\n", "\n", "Projects in the CoST Portal are not assigned an identifier. For the purposes of this analysis identifiers were constructed from the **agency**, **region** and **foreign key** extracted from the URL for the project, e.g.\n", "\n", "> portal.costukraine.org/**UAD**/**POL**/PROJECTS.php?hname=dbo_PROJECTS_dbo_PROJECTS_TECH_DETAILS_handler&fk0=**222***&master_viewmode=0\n", "\n", "**Contracting processes**\n", "\n", "Contracting processes in Prozorro are assigned an identifier (the `ocid`) however identifiers are not provided for the project a contracting process is associated with.\n", "\n", "Some projects in the CoST Ukraine portal include a *single* `ocid`, which appears to be for the main construction contract associated with the project (check this).\n", "\n", "### Defining a project\n", "\n", "The CoST Portal appears to list projects at the level of individual construction contracts.\n", "\n", "When searching for related contracting processes we noted examples where there were several different construction contracts relating to a single highway, suggesting that there may be overarching projects of which the individual construction contracts form a part.\n", "\n", "### Tracking change\n", "\n", "The [OCDS Releases and Records model](http://standard.open-contracting.org/latest/en/getting_started/releases_and_records/) is designed to support disclosure of both individual updates about a contracting process (releases) and a summary of the latest state of the contracting process (records).\n", "\n", "Currently, both the openprocurement API and the bulk OCDS export from Prozorro provide the latest state of the contracting process rather than a history of changes.\n", "\n", "Prozorro functionality to publish individual releases to enable tracking change over the life of a process is in development.\n", "\n", "There are two possible approaches to tracking change with the existing functionality, however there are obstacles to both of these:\n", "\n", "**Scrape the openprocurement API daily for specific contracting processes**\n", "* No search API is available, so requires first using either the Prozorro front end or ocdsdata ETL to find the relevant contracting processes.\n", "* Results in non OCDS data.\n", "\n", "**Download and search historic bulk exports**\n", "* Contracting processes do not neccessarily appear in the same segment across exports, so the full export for each fortnight must be downloaded (c. 46GB per export)\n", "* Exports are fortnightly so changes which occur more frequently than this may be missed\n", "\n", "We weren't able to determine whether the release identifier is updated when the contracting process changes in either the API or bulk exports, so in either case it may be neccessary to compare all fields in the release to determine whether anything has changed.\n", "\n", "\n", "### Comparing project and contracting data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Comparing start and end dates\n", "\n", "For each project:\n", "\n", "* The end date listed in the CoST portal for the project matched the latest end date of the related contracts found in Prozorro.\n", "* There was a related contract in Prozorro with a start date earlier than the start date listed in the project." ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
startDate
projectprojectStartDate
UAD-DNE-4018.04.20172017-02-09T00:00:00+02:00
UAD-DNE-4421.07.20172017-05-11T00:00:00+03:00
UAD-POL-22216.06.20172017-06-08T00:00:00+03:00
UAD-SUM-17312.10.20172016-09-21T18:00:00+00:00
\n", "
" ], "text/plain": [ " startDate\n", "project projectStartDate \n", "UAD-DNE-40 18.04.2017 2017-02-09T00:00:00+02:00\n", "UAD-DNE-44 21.07.2017 2017-05-11T00:00:00+03:00\n", "UAD-POL-222 16.06.2017 2017-06-08T00:00:00+03:00\n", "UAD-SUM-173 12.10.2017 2016-09-21T18:00:00+00:00" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "contracts_df.pivot_table(values='startDate', index=['project', 'projectStartDate'], aggfunc=np.min)" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
endDate
projectprojectEndDate
UAD-DNE-4031.12.20182018-12-31T00:00:00+02:00
UAD-DNE-4431.12.20172017-12-31T00:00:00+02:00
UAD-POL-22231.12.20182018-12-31T00:00:00+02:00
UAD-SUM-17331.12.20182018-12-31T00:00:00+02:00
\n", "
" ], "text/plain": [ " endDate\n", "project projectEndDate \n", "UAD-DNE-40 31.12.2018 2018-12-31T00:00:00+02:00\n", "UAD-DNE-44 31.12.2017 2017-12-31T00:00:00+02:00\n", "UAD-POL-222 31.12.2018 2018-12-31T00:00:00+02:00\n", "UAD-SUM-173 31.12.2018 2018-12-31T00:00:00+02:00" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "contracts_df.pivot_table(values='endDate', index=['project', 'projectEndDate'], aggfunc=np.max)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Understanding the project timeline\n", "\n", "For each project we were able to identify design, construction and monitoring contracts and use the data from Prozorro to understand the project timeline, e.g.\n", "\n", "![Related Contracts](images/relatedContracts.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Comparing project value\n", "\n", "For some projects the total value of contracts in Prozorro was greater than the total cost in the CoST Portal and for some projects is was less:" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
amount
projectprojectValue
UAD-DNE-40968,055,500.0957,555,315.79
UAD-DNE-44203,382,619.0204,569,982.24
UAD-POL-222575,109,925.0576,803,727.92
UAD-SUM-17399,202,651.2102,590,531.98
\n", "
" ], "text/plain": [ " amount\n", "project projectValue \n", "UAD-DNE-40 968,055,500.0 957,555,315.79\n", "UAD-DNE-44 203,382,619.0 204,569,982.24\n", "UAD-POL-222 575,109,925.0 576,803,727.92\n", "UAD-SUM-173 99,202,651.2 102,590,531.98" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.options.display.float_format = '{:,}'.format\n", "contracts_df.pivot_table(values='amount', index=['project','projectValue'], aggfunc=np.sum)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "![Value Differences](images/valueDifferences.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Understanding the split of project costs\n", "\n", "Using the data from Prozorro we were able to understand the split of the total project cost across design, construction and monitoring contracts, e.g.\n", "\n", "![Cost Split](images/valuePie.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Conclusions\n", "\n", "OCDS data can be used to:\n", "\n", "* Discover contracts related to a project\n", "* Understand the project timeline\n", "* Understand the total cost of a project\n", "\n", "The demonstrator suggests there is additional value in using OCDS data for infrastructure monitoring, for example, we found:\n", "\n", "* Multiple design contracts for a single project, including intial design contracts for 'new construction' followed by further design contracts for 'reconstruction'\n", "* Variances in the project values reported in the CoST portal and the total value of contracts in Prozorro\n", "* Variances in the split of the total project value between design, construction and monitoring contracts\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Part 1: Scraping data from CoST Ukraine portal" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Navigate to the [CoST Ukraine Portal](http://portal.costukraine.org)\n", "\n", "*Note: Use Google Chrome with auto-translation enabled, unless you speak Ukrainian*\n", "\n", "Currently only projects of Ukravtodor, the State Highways Agency, are listed, so click the Ukravtodor logo.\n", "\n", "Use the map to choose a region to see the projects list for, e.g. the [Sumy region](http://portal.costukraine.org/proekti/ukravtodor/sumska-oblast/)\n", "\n", "Auto-translate doesn't work on the projects list page, so click \"ТАБЛИЦІ\" in the grey header bar to get a view which can be translated.\n", "\n", "*Note: You might need to open the \"ТАБЛИЦІ\" link in a new tab to get it to load*\n", "\n", "Choose the project which you want to scrape from the list, e.g. [Reconstruction of the bridge crossing on the highway N-12 Sumy-Poltava km 70 + 838](http://89.185.0.248:8888/UAD/SUM/PROJECTS.php?hname=dbo_PROJECTS_dbo_PROJECTS_TECH_DETAILS_handler&fk0=173&master_viewmode=0)\n", "\n", "*Note: Use of Prozorro for above and below threshold procurement has only been mandatory since 1st August 2016, so try and find a project which starts after this date to give the best chance of finding related contracts*\n", "\n", "Set the following variables based on the URL of the project you selected:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "#update with url segment for the public entity, e.g. UAD for State Highways Agency of Ukraine\n", "publicEntity = \"UAD\"\n", "\n", "#update with region for project, e.g. SUM for region, note for MFO projects the url construction is slightly different (see commented out html = line below)\n", "region = \"SUM\"\n", "\n", "#update with value of &fk0 parameter in URL of project you want to scrape - this identifies the project\n", "foreignKey = \"173\"\n", "\n", "#construct an identifier for saving data\n", "projectID = publicEntity + \"-\" + region + \"-\" + foreignKey" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Scrape data from CoST Portal" ] }, { "cell_type": "code", "execution_count": 46, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "scraping PROJECTS.php?hname=dbo_PROJECTS_dbo_PROJECTS_TECH_DETAILS_handler&fk0=3&master_viewmode=0\n", "scraping PROJECTS.php?hname=dbo_PROJECTS_dbo_PROJECTS_SUBJECTS_handler&fk0=3&master_viewmode=0\n", "scraping PROJECTS.php?hname=dbo_PROJECTS_dbo_PROJECTS_CUSTOMER_handler&fk0=3&master_viewmode=0\n", "scraping PROJECTS.php?hname=dbo_PROJECTS_dbo_PROJECTS_FINANCING_handler&fk0=3&master_viewmode=0\n", "scraping PROJECTS.php?hname=dbo_PROJECTS_dbo_PROJECTS_PROJECT_ORGANIZATION_handler&fk0=3&master_viewmode=0\n", "scraping PROJECTS.php?hname=dbo_PROJECTS_dbo_PROJECTS_CONTRACTOR_handler&fk0=3&master_viewmode=0\n", "scraping PROJECTS.php?hname=dbo_PROJECTS_dbo_PROJECTS_ENG_SUPERVIZORY_handler&fk0=3&master_viewmode=0\n", "scraping PROJECTS.php?hname=dbo_PROJECTS_dbo_PROJECTS_TECH_SUPERVIZORY_handler&fk0=3&master_viewmode=0\n", "done scraping\n" ] } ], "source": [ "from requests import get\n", "from bs4 import BeautifulSoup\n", "import pprint\n", "import json\n", "import ipywidgets as widgets\n", "\n", "#function to scrape content of main table\n", "def scrape(url,output):\n", " \n", " #get html and convert to nice object\n", " html = get(url,stream=True).content\n", " html = BeautifulSoup(html, \"html.parser\")\n", " \n", " #get name of section we are scraping and create an object for it\n", " section = html.body[\"id\"]\n", " output[section] = {} \n", " \n", " main_table = html.find(\"div\", class_=\"well\")\n", " \n", " if main_table != None:\n", " for td in main_table.select(\"td\"):\n", " if \"data-column-name\" in td.attrs:\n", " output[section][td[\"data-column-name\"]] = td.text\n", " \n", " return output\n", "\n", "#get html of first page and convert to nice object\n", "html = get(\"http://portal.costukraine.org/uad_mfo/PROJECTS.php?hname=dbo_PROJECTS_dbo_PROJECTS_TECH_DETAILS_handler&fk0=\" + foreignKey + \"&master_viewmode=0\",stream=True).content #use this line for MFO projects\n", "#html = get(\"http://89.185.0.248:8888/\"+publicEntity+\"/\"+region+\"/PROJECTS.php?hname=dbo_PROJECTS_dbo_PROJECTS_TECH_DETAILS_handler&fk0=\" + foreignKey + \"&master_viewmode=0\",stream=True).content #use this line for all other projects\n", "html = BeautifulSoup(html, \"html.parser\")\n", "\n", "#set up array to store urls for each view of project\n", "urls = []\n", "\n", "#get urls of each page\n", "navigation = html.find(\"ul\", class_=\"nav nav-tabs grid-details-tabs\")\n", "for li in navigation.select(\"li\"):\n", " urls.append(li.a[\"href\"])\n", " \n", "#put amendments URL in separate variable (no data found for this page yet, so we don't do anything with this)\n", "amendmentsURL = urls.pop()\n", "\n", "#set up object for scraped data\n", "project = {}\n", "\n", "#scrape summary table (appears on each page, so only do this once)\n", "project[\"summary\"] = {}\n", "\n", "summary_table = html.find(\"div\", class_=\"grid grid-table grid-master js-grid\")\n", "\n", "for th in summary_table.select(\"th\"):\n", " project[\"summary\"][th[\"data-name\"]] = \"\"\n", " \n", "for td in summary_table.select(\"td\"):\n", " if \"data-column-name\" in td.attrs:\n", " project[\"summary\"][td[\"data-column-name\"]] = td.text\n", "\n", "#scrape main table on each page\n", "for url in urls:\n", " print(\"scraping \" + url)\n", " project = scrape(\"http://portal.costukraine.org/uad_mfo/\" + url, project) #use this line for MFO projects\n", " #project = scrape(\"http://89.185.0.248:8888/\"+publicEntity+\"/\"+region+\"/\" + url, project) #use this line for all other projects\n", "\n", "print(\"done scraping\")\n", "\n", "data = {}\n", "data = {\"project\": project}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Translate scraped data" ] }, { "cell_type": "code", "execution_count": 47, "metadata": {}, "outputs": [], "source": [ "import copy\n", "import os\n", "from google.cloud import translate\n", "\n", "#Set google cloud API credentials\n", "os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = './ocds-f08bd860a3be.json'\n", "\n", "#create a copy of the project from the Ukraine portal to translate\n", "data[\"project_translated\"] = copy.deepcopy(project)\n", "\n", "# Instantiate translation client\n", "translate_client = translate.Client()\n", "target = 'en'\n", "\n", "#translate project\n", "project_translated = data[\"project_translated\"]\n", "for section in project_translated:\n", " for key in project_translated[section]:\n", " if type(project_translated[section][key]) == str:\n", " text = project_translated[section][key]\n", " translation = translate_client.translate(text, target_language = target)\n", " if text != translation:\n", " project_translated[section][key] = translation['translatedText']\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Save data" ] }, { "cell_type": "code", "execution_count": 48, "metadata": {}, "outputs": [], "source": [ "import os\n", "import json\n", "\n", "if not os.path.exists(\"data\"):\n", " os.makedirs(\"data\")\n", "\n", "with open(\"data/\" + projectID + \".json\",\"w\") as export:\n", " json.dump(data,export,indent=2) " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Part 2: Finding related contracts in Prozorro (using ocdsdata ETL tool)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Connect to ocdsdata instance" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/home/ddewhurst/open_data_services/ocds/repositories/ocinfra.ocinfra_ukraine_demonstrator/.ve/lib/python3.5/site-packages/psycopg2/__init__.py:144: UserWarning:\n", "\n", "The psycopg2 wheel package will be renamed from release 2.8; in order to keep installing from binary please use \"pip install psycopg2-binary\" instead. For details see: .\n", "\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Enter database user password: ········\n" ] } ], "source": [ "import getpass\n", "import psycopg2\n", "\n", "dbpassword = getpass.getpass(\"Enter database user password: \") #see /.pgpass file on server\n", "\n", "# db connection config\n", "conn = psycopg2.connect(\n", " database = 'ocdsdata',\n", " user = 'ocdsdata',\n", " password = dbpassword,\n", " host = '195.201.163.242',\n", " port = '5432',\n", ")\n", "\n", "# clear db user password\n", "dbpassword = ''\n", "\n", "# create db cursor\n", "cur = conn.cursor()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Search for related contracting processes" ] }, { "cell_type": "code", "execution_count": 52, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Tenders data already exists for this project, overwrite? (y/n) y\n", "\n", "\n", "Project Title:\n", "Будівництво обходів трьох населених пунктів на автомобільній дорозі М-03 Київ-Харків-Довжанський: с.Покровська Багачка, с.Красногорівка, м.Полтава (Ічерга). \n", "\n", "Project Title (translated):\n", "Construction of detours of three settlements on the motorway M-03 Kiev-Kharkiv-Dovzhansky: p.Pokrovskaya Bagachka, s.Krasnogorovka, Poltava (Icherga). \n", "\n", "Enter pipe (|) delimited list of search terms from Ukrainian project title: М-03 Київ-Харків-Довжанський|Будівництво обходів\n", "\n", " Searching...\n", "Found 0 contracting process(es)\n", "\n", "done\n" ] } ], "source": [ "import json\n", "import copy\n", "import os\n", "from google.cloud import translate\n", "from IPython.display import display\n", "\n", "#Set google cloud API credentials\n", "os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = '/home/ddewhurst/open_data_services/ocds/repositories/ocinfra.ocinfra_ukraine_demonstrator/ocds-f08bd860a3be.json'\n", "\n", "# Instantiate translation client\n", "translate_client = translate.Client()\n", "target = 'en'\n", "\n", "#open project data\n", "with open(\"data/\"+projectID+\".json\") as file:\n", " data = json.load(file)\n", "\n", "#check if related contracting processes already exist and confirm overwrite\n", "if \"tenders\" in data:\n", " overwrite = input(\"Tenders data already exists for this project, overwrite? (y/n) \")\n", " if overwrite == \"y\":\n", " del(data[\"tenders\"])\n", " print(\"\\n\")\n", " \n", "\n", "if \"tenders\" not in data:\n", " \n", " #display project name in UK and EN and choose search terms\n", " print(\"Project Title:\")\n", " print(data[\"project\"][\"summary\"][\"name\"],\"\\n\")\n", " print(\"Project Title (translated):\")\n", " print(data[\"project_translated\"][\"summary\"][\"name\"],\"\\n\")\n", " \n", " #prompt for search terms\n", " searchInput = input(\"Enter pipe (|) delimited list of search terms from Ukrainian project title: \")\n", " searchTerms = searchInput.split(\"|\")\n", " \n", " #construct SQL query\n", " searchString = \"\"\"\n", " SELECT\n", " data\n", " FROM\n", " data\n", " WHERE\n", " data ->> 'ocid' LIKE 'ocds-be6bcu%'\n", " \"\"\"\n", " \n", " titleSearchString = \"\"\" AND ((\"\"\"\n", " descSearchString = \"\"\") OR (\"\"\"\n", " \n", " for term in searchTerms:\n", " titleSearchString = titleSearchString + \"data -> 'tender' ->> 'title' LIKE '%\" + term + \"%'\"\n", " descSearchString = descSearchString + \"data -> 'tender' ->> 'description' LIKE '%\" + term + \"%'\"\n", " if searchTerms.index(term) < len(searchTerms) - 1:\n", " titleSearchString = titleSearchString + \" AND \"\n", " descSearchString = descSearchString + \" AND \"\n", " else:\n", " descSearchString = descSearchString + \"))\"\n", "\n", " searchString = searchString + titleSearchString + descSearchString\n", "\n", " #search database and return results\n", " print(\"\\n\",\"Searching...\")\n", " cur.execute(\"rollback\")\n", " cur.execute(searchString)\n", " \n", " allTenders = []\n", " relatedTenders = []\n", " \n", " for result in cur.fetchall():\n", " allTenders.append(result[0])\n", " \n", " print(\"Found \",len(allTenders),\" contracting process(es)\\n\")\n", " \n", " #translate tender title & description\n", " for tender in allTenders:\n", " \n", " print(\"OCID: \",tender[\"ocid\"])\n", " \n", " if \"title\" in tender[\"tender\"]:\n", " text = tender[\"tender\"][\"title\"]\n", " translation = translate_client.translate(text, target_language = \"en\")\n", " tender[\"tender\"][\"title_en\"] = translation[\"translatedText\"]\n", " print(\"Title: \",tender[\"tender\"][\"title_en\"])\n", " \n", " if \"description\" in tender[\"tender\"]:\n", " text = tender[\"tender\"][\"description\"]\n", " translation = translate_client.translate(text, target_language = \"en\")\n", " tender[\"tender\"][\"description_en\"] = translation[\"translatedText\"]\n", " print(\"Description\",tender[\"tender\"][\"description_en\"],\"\\n\")\n", " \n", " #prompt to confirm tender is related and categorise as planning, contstruction or monitoring\n", " if input(\"Is this tender related to the project? (y/n): \") == \"y\":\n", " relatedTenders.append(tender)\n", " tenderType = input(\"Is this a (p)lanning, (c)onstruction or (m)onitoring tender?: \")\n", " tender[\"ocinfra\"] = {}\n", " if tenderType == \"p\":\n", " tender[\"ocinfra\"][\"type\"] = \"planning\"\n", " elif tenderType == \"c\":\n", " tender[\"ocinfra\"][\"type\"] = \"construction\"\n", " elif tenderType == \"m\":\n", " tender[\"ocinfra\"][\"type\"] = \"monitoring\"\n", " else:\n", " print(\"ERROR\")\n", " print(\"\\n\")\n", " \n", " #save data\n", " data[\"tenders\"] = relatedTenders\n", " \n", " with open(\"data/\"+projectID+\".json\",\"w\") as file:\n", " json.dump(data,file,indent=2)\n", " \n", " print(\"done\")\n", " " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Part 3: Compare data in Prozorro to data in CoST Ukraine Portal" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Load data on contracts" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd\n", "import json\n", "import glob\n", "\n", "paths = glob.glob(\"data/*.json\")\n", "\n", "contracts = []\n", "\n", "for path in paths:\n", " if \"translated\" not in path:\n", " with open(path) as file:\n", " data = json.load(file)\n", "\n", " for tender in data[\"tenders\"]:\n", " if \"contracts\" in tender:\n", " for contract in tender[\"contracts\"]:\n", " contracts.append({\n", " \"project\": path.split(\"/\")[-1].split(\".\")[-2],\n", " \"projectStartDate\": data[\"project\"][\"summary\"][\"term_start\"],\n", " \"projectEndDate\": data[\"project\"][\"summary\"][\"term_end\"],\n", " \"projectValue\": data[\"project\"][\"summary\"][\"total_cost\"],\n", " \"ocid\": tender[\"ocid\"],\n", " \"title_en\": tender[\"tender\"][\"title_en\"],\n", " \"type\": tender[\"ocinfra\"][\"type\"],\n", " \"contractID\": contract[\"contractID\"],\n", " \"status\": contract[\"status\"],\n", " \"startDate\": contract[\"period\"][\"startDate\"],\n", " \"endDate\": contract[\"period\"][\"endDate\"],\n", " \"amount\": contract[\"value\"][\"amount\"]\n", " })\n", " if contract[\"value\"][\"currency\"] != \"UAH\":\n", " print(\"Warning: Multiple currencies\")\n", "\n", "contracts_df = pd.DataFrame(contracts)\n", "\n", "contracts_df = contracts_df[[\"project\",\"projectStartDate\",\"projectEndDate\",\"projectValue\",\"ocid\",\"type\",\"title_en\",\"contractID\",\"status\",\"startDate\",\"endDate\",\"amount\"]]\n", "\n", "contracts_df['projectValue'] = contracts_df['projectValue'].str.replace(' ', '')\n", "contracts_df['projectValue'] = pd.to_numeric(contracts_df['projectValue'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Compare start dates" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
startDate
projectprojectStartDate
UAD-DNE-4018.04.20172017-02-09T00:00:00+02:00
UAD-DNE-4421.07.20172017-05-11T00:00:00+03:00
UAD-POL-22216.06.20172017-06-08T00:00:00+03:00
UAD-POL-222_translated16.06.20172017-06-08T00:00:00+03:00
UAD-SUM-17312.10.20172016-09-21T18:00:00+00:00
\n", "
" ], "text/plain": [ " startDate\n", "project projectStartDate \n", "UAD-DNE-40 18.04.2017 2017-02-09T00:00:00+02:00\n", "UAD-DNE-44 21.07.2017 2017-05-11T00:00:00+03:00\n", "UAD-POL-222 16.06.2017 2017-06-08T00:00:00+03:00\n", "UAD-POL-222_translated 16.06.2017 2017-06-08T00:00:00+03:00\n", "UAD-SUM-173 12.10.2017 2016-09-21T18:00:00+00:00" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "contracts_df.pivot_table(values='startDate', index=['project', 'projectStartDate'], aggfunc=np.min)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Compare end dates" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
endDate
projectprojectEndDate
UAD-DNE-4031.12.20182018-12-31T00:00:00+02:00
UAD-DNE-4431.12.20172017-12-31T00:00:00+02:00
UAD-POL-22231.12.20182018-12-31T00:00:00+02:00
UAD-POL-222_translated31.12.20182018-12-31T00:00:00+02:00
UAD-SUM-17331.12.20182018-12-31T00:00:00+02:00
\n", "
" ], "text/plain": [ " endDate\n", "project projectEndDate \n", "UAD-DNE-40 31.12.2018 2018-12-31T00:00:00+02:00\n", "UAD-DNE-44 31.12.2017 2017-12-31T00:00:00+02:00\n", "UAD-POL-222 31.12.2018 2018-12-31T00:00:00+02:00\n", "UAD-POL-222_translated 31.12.2018 2018-12-31T00:00:00+02:00\n", "UAD-SUM-173 31.12.2018 2018-12-31T00:00:00+02:00" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "contracts_df.pivot_table(values='endDate', index=['project', 'projectEndDate'], aggfunc=np.max)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Visualise start and end dates" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/html": [ "" ], "text/vnd.plotly.v1+html": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/vnd.plotly.v1+json": { "data": [ { "marker": { "color": "white" }, "name": "", "type": "scatter", "uid": "119451c8-83f0-11e8-bef2-5ce0c5766920", "x": [ "2017-10-24T00:00:00+03:00", "2017-12-31T00:00:00+02:00" ], "y": [ 0, 0 ] }, { "marker": { "color": "white" }, "name": "", "type": "scatter", "uid": "119451c9-83f0-11e8-bef2-5ce0c5766920", "x": [ "2017-10-24T00:00:00+03:00", "2018-12-31T00:00:00+02:00" ], "y": [ 1, 1 ] }, { "marker": { "color": "white" }, "name": "", "type": "scatter", "uid": "119451ca-83f0-11e8-bef2-5ce0c5766920", "x": [ "2017-10-12T00:00:00+03:00", "2018-12-31T00:00:00+02:00" ], "y": [ 2, 2 ] }, { "marker": { "color": "white" }, "name": "", "type": "scatter", "uid": "119451cb-83f0-11e8-bef2-5ce0c5766920", "x": [ "2016-12-14T22:00:00+02:00", "2017-12-31T22:00:00+02:00" ], "y": [ 3, 3 ] }, { "marker": { "color": "white" }, "name": "", "type": "scatter", "uid": "119451cc-83f0-11e8-bef2-5ce0c5766920", "x": [ "2016-12-14T00:00:00+02:00", "2017-12-31T00:00:00+02:00" ], "y": [ 4, 4 ] }, { "marker": { "color": "white" }, "name": "", "type": "scatter", "uid": "119451cd-83f0-11e8-bef2-5ce0c5766920", "x": [ "2016-09-21T18:00:00+00:00", "2016-12-31T20:00:00+00:00" ], "y": [ 5, 5 ] }, { "hoverinfo": "none", "marker": { "color": "rgb(251, 96, 69)", "size": 1 }, "name": "construction", "showlegend": true, "type": "scatter", "uid": "119451ce-83f0-11e8-bef2-5ce0c5766920", "x": [ "2016-09-21T18:00:00+00:00", "2016-09-21T18:00:00+00:00" ], "y": [ 0, 0 ] }, { "hoverinfo": "none", "marker": { "color": "rgb(108, 117, 225)", "size": 1 }, "name": "monitoring", "showlegend": true, "type": "scatter", "uid": "119451cf-83f0-11e8-bef2-5ce0c5766920", "x": [ "2016-09-21T18:00:00+00:00", "2016-09-21T18:00:00+00:00" ], "y": [ 1, 1 ] }, { "hoverinfo": "none", "marker": { "color": "rgb(214, 225, 0)", "size": 1 }, "name": "planning", "showlegend": true, "type": "scatter", "uid": "119451d0-83f0-11e8-bef2-5ce0c5766920", "x": [ "2016-09-21T18:00:00+00:00", "2016-09-21T18:00:00+00:00" ], "y": [ 2, 2 ] } ], "layout": { "height": 600, "hovermode": "closest", "shapes": [ { "fillcolor": "rgb(214, 225, 0)", "line": { "width": 0 }, "opacity": 1, "type": "rect", "x0": "2017-10-24T00:00:00+03:00", "x1": "2017-12-31T00:00:00+02:00", "xref": "x", "y0": -0.2, "y1": 0.2, "yref": "y" }, { "fillcolor": "rgb(108, 117, 225)", "line": { "width": 0 }, "opacity": 1, "type": "rect", "x0": "2017-10-24T00:00:00+03:00", "x1": "2018-12-31T00:00:00+02:00", "xref": "x", "y0": 0.8, "y1": 1.2, "yref": "y" }, { "fillcolor": "rgb(251, 96, 69)", "line": { "width": 0 }, "opacity": 1, "type": "rect", "x0": "2017-10-12T00:00:00+03:00", "x1": "2018-12-31T00:00:00+02:00", "xref": "x", "y0": 1.8, "y1": 2.2, "yref": "y" }, { "fillcolor": "rgb(214, 225, 0)", "line": { "width": 0 }, "opacity": 1, "type": "rect", "x0": "2016-12-14T22:00:00+02:00", "x1": "2017-12-31T22:00:00+02:00", "xref": "x", "y0": 2.8, "y1": 3.2, "yref": "y" }, { "fillcolor": "rgb(214, 225, 0)", "line": { "width": 0 }, "opacity": 1, "type": "rect", "x0": "2016-12-14T00:00:00+02:00", "x1": "2017-12-31T00:00:00+02:00", "xref": "x", "y0": 3.8, "y1": 4.2, "yref": "y" }, { "fillcolor": "rgb(214, 225, 0)", "line": { "width": 0 }, "opacity": 1, "type": "rect", "x0": "2016-09-21T18:00:00+00:00", "x1": "2016-12-31T20:00:00+00:00", "xref": "x", "y0": 4.8, "y1": 5.2, "yref": "y" } ], "showlegend": true, "title": "Related Contracts", "width": 900, "xaxis": { "rangeselector": { "buttons": [ { "count": 7, "label": "1w", "step": "day", "stepmode": "backward" }, { "count": 1, "label": "1m", "step": "month", "stepmode": "backward" }, { "count": 6, "label": "6m", "step": "month", "stepmode": "backward" }, { "count": 1, "label": "YTD", "step": "year", "stepmode": "todate" }, { "count": 1, "label": "1y", "step": "year", "stepmode": "backward" }, { "step": "all" } ] }, "showgrid": true, "type": "date", "zeroline": false }, "yaxis": { "autorange": false, "range": [ -1, 7 ], "showgrid": true, "ticktext": [ "ГБН Г.1-218-182: 2011 Development of the design documentation stage The working documentation for the object "Reconstruction of the bridge crossing on the highway N-12 Sumy-Poltava km 70 + 838" (DK 021: 2015: 71322000-1 Services from engineering design in the field of civil engineering)", "ГБН Г.1-218-182: 2011 Implementation of author's supervision on the object "Reconstruction of the bridge crossing on the highway N-12 Sumy-Poltava km 70 + 838" (71247000-1 Supervision of construction work)", "ДСТУ Б.Д.1.1-1: 2013 Reconstruction of the bridge crossing on the highway N-12 Sumy-Poltava km 70 + 838 (DK 021: 2015: 45221000-2 Construction of bridges and tunnels, mines and undergrounds)", "Development of project documentation "Reconstruction of the bridge crossing on the highway N-12 Sumy-Poltava km 70 + 838"", "Development of the project documentation "Reconstruction of the bridge crossing on the highway N-12 Sumy-Poltava km 70 + 838" (Additional agreement for the state examination of the project documentation.)", "Development of project documentation "New construction of a bridge crossing on the highway N-12 Sumy-Poltava km 70 + 838"" ], "tickvals": [ 0, 1, 2, 3, 4, 5 ], "zeroline": false } } }, "text/html": [ "
" ], "text/vnd.plotly.v1+html": [ "
" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "import plotly\n", "import plotly.figure_factory as ff\n", "from copy import copy\n", "\n", "plotly.offline.init_notebook_mode()\n", "\n", "# generate gantt chart for each project\n", "gantt_charts = {}\n", "\n", "for project in contracts_df[\"project\"].unique():\n", " \n", " # create dataframe with correct structure for gantt chart\n", " gantt_charts[project] = {}\n", " gantt_df = contracts_df.rename(index=str, columns={\"title_en\":\"Task\", \"startDate\":\"Start\",\"endDate\":\"Finish\"})\n", " gantt_df = gantt_df.loc[gantt_df[\"project\"] == project]\n", " gantt_charts[project][\"data\"] = copy(gantt_df)\n", " \n", " # set colours for contract types\n", " colors= {}\n", " \n", " if \"planning\" in gantt_df[\"type\"].unique():\n", " colors[\"planning\"] = \"#d6e100\"\n", " if \"construction\" in gantt_df[\"type\"].unique():\n", " colors[\"construction\"] = \"#fb6045\"\n", " if \"monitoring\" in gantt_df[\"type\"].unique():\n", " colors[\"monitoring\"] = \"#6c75e1\"\n", " \n", " gantt_df = gantt_df.sort_values(by=\"Start\",ascending=False)\n", " \n", " # create gantt chart\n", " gantt_charts[project][\"chart\"] = ff.create_gantt(\n", " gantt_df,\n", " colors=colors,\n", " index_col='type',\n", " #reverse_colors=True,\n", " show_colorbar=True,\n", " showgrid_x=True,\n", " showgrid_y=True,\n", " title=\"Related Contracts\"\n", " )\n", "# plot example gantt chart and save image for display in summary\n", "plotly.offline.iplot(gantt_charts[projectID][\"chart\"], image=\"png\", filename=\"relatedContracts\")\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Compare project values" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
amount
projectprojectValue
UAD-DNE-40968,055,500.0957,555,315.79
UAD-DNE-44203,382,619.0204,569,982.24
UAD-POL-222575,109,925.0576,803,727.92
UAD-SUM-17399,202,651.2102,590,531.98
\n", "
" ], "text/plain": [ " amount\n", "project projectValue \n", "UAD-DNE-40 968,055,500.0 957,555,315.79\n", "UAD-DNE-44 203,382,619.0 204,569,982.24\n", "UAD-POL-222 575,109,925.0 576,803,727.92\n", "UAD-SUM-173 99,202,651.2 102,590,531.98" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.options.display.float_format = '{:,}'.format\n", "contracts_df.pivot_table(values='amount', index=['project', 'projectValue'], aggfunc=np.sum)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Visualise project value differences" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/html": [ "" ], "text/vnd.plotly.v1+html": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/vnd.plotly.v1+json": { "data": [ { "marker": { "color": "#d6e100" }, "name": "Difference", "type": "bar", "uid": "278b09d6-83f0-11e8-bef2-5ce0c5766920", "x": [ "Project 1", "Project 2", "Project 3", "Project 4" ], "y": [ -10500184.210000038, 1187363.2400000095, 1693802.919999957, 3387880.780000001 ] } ], "layout": { "title": "Difference between value in Prozorro and value in CoST Portal (UAH)" } }, "text/html": [ "
" ], "text/vnd.plotly.v1+html": [ "
" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "import plotly\n", "import plotly.graph_objs as go\n", "\n", "plotly.offline.init_notebook_mode()\n", "\n", "# set number formatting\n", "pd.options.display.float_format = '{:,}'.format\n", "\n", "# calculate sum of contract values for each project\n", "value_pivot = contracts_df.pivot_table(values='amount', index=['project','projectValue'], aggfunc=np.sum)\n", "\n", "# create list of values for chart\n", "values = []\n", "\n", "for index, row in value_pivot.iterrows():\n", " values.append(row['amount']-index[1])\n", "\n", "# set chart properties\n", "trace1 = go.Bar(\n", " x=[\"Project 1\",\"Project 2\", \"Project 3\", \"Project 4\"],\n", " y=values,\n", " name='Difference',\n", " marker=dict(\n", " color='#d6e100'\n", " )\n", ")\n", "data = [trace1]\n", "layout = go.Layout(\n", " title='Difference between value in Prozorro and value in CoST Portal (UAH)'\n", ")\n", "\n", "# plot chart and save image for display in summary\n", "fig = go.Figure(data=data, layout=layout)\n", "plotly.offline.iplot(fig, filename='valueDifferences', image='png')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Visualise value by contract type" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/html": [ "" ], "text/vnd.plotly.v1+html": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/vnd.plotly.v1+json": { "data": [ { "hoverinfo": "label+percent", "labels": [ "planning", "construction", "monitoring" ], "marker": { "colors": [ "#d6e100", "#fb6045", "#6c75e1" ], "line": { "color": "#000000", "width": 1 } }, "name": "pie", "textinfo": "value", "type": "pie", "uid": "426dfab0-83f0-11e8-bef2-5ce0c5766920", "values": [ 3067077.58, 99202651.2, 320803.2 ] } ], "layout": {} }, "text/html": [ "
" ], "text/vnd.plotly.v1+html": [ "
" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "import plotly\n", "import plotly.graph_objs as go\n", "\n", "plotly.offline.init_notebook_mode()\n", "\n", "# create pie chart of contract value by type for each project\n", "pie_charts = {}\n", "\n", "for project in contracts_df[\"project\"].unique():\n", " \n", " pie_charts[project] = {}\n", " \n", " # create dataframe with correct structure for pie chart\n", " pie_df = contracts_df.loc[contracts_df[\"project\"] == project]\n", " \n", " # set colours, values and labels for pie chart\n", " colors= []\n", " values = []\n", " labels = []\n", " \n", " if \"planning\" in pie_df[\"type\"].unique():\n", " colors.append(\"#d6e100\")\n", " values.append(pie_df[pie_df[\"type\"] == \"planning\"][\"amount\"].sum())\n", " labels.append(\"planning\")\n", " if \"construction\" in pie_df[\"type\"].unique():\n", " colors.append(\"#fb6045\")\n", " values.append(pie_df[pie_df[\"type\"] == \"construction\"][\"amount\"].sum())\n", " labels.append(\"construction\")\n", " if \"monitoring\" in pie_df[\"type\"].unique():\n", " colors.append(\"#6c75e1\")\n", " values.append(pie_df[pie_df[\"type\"] == \"monitoring\"][\"amount\"].sum())\n", " labels.append(\"monitoring\")\n", "\n", " labels = ['planning','construction','monitoring']\n", " \n", " # create pie chart\n", " pie_charts[project] = go.Pie(\n", " labels=labels,\n", " values=values,\n", " hoverinfo='label+percent',\n", " textinfo='value',\n", " name=\"pie\",\n", " #layout = {\"title\":\"Global Emissions 1990-2011\"},\n", " marker=dict(\n", " colors=colors,\n", " line=dict(color='#000000', width=1)\n", " )\n", " )\n", "\n", "# display example pie chart and save image to display in summary\n", "plotly.offline.iplot([pie_charts[projectID]], filename='valuePie', image='png')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Assess CoST IDS coverage\n", "\n", "Based on [CoST IDS](http://www.constructiontransparency.org/documentdownload.axd?documentresourceid=31) 'Table 1. Project and Contract Data for proactive disclosure'\n", "\n", "*Note: Some elements of 'Table 2. Project and Contract information for disclosure upon request' are included in the Prozorro data*\n", "\n", "There are some discrepancies between the following versions of the CoST IDS published on the CoST website:\n", "\n", "* [Version A](http://www.constructiontransparency.org/documentdownload.axd?documentresourceid=31)\n", "* [Version B](http://www.constructiontransparency.org/documentdownload.axd?documentresourceid=36)\n", "\n", "Version A includes 'Last updated' and 'Project reference number' elements which are not present in Version B.\n", "\n", "Version A combines 'Contract start date' and 'Contract duration' into a single element, they are separate in Version B\n", "\n", "Version A includes a duplicate element 'Variation to contract price' which there is only a single instance of Version B, whilst Version B includes 'Escalation to contract price' which is not present in Version A.\n", "\n", "For the purposes of this analysis we have used Version A and added the two additional elements from version B." ] }, { "cell_type": "code", "execution_count": 26, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
objectphasedataprozorro_mappingprozorro_examplecost_mappingcost_examplenotes
id
P0ProjectProject identificationProject reference numberProject identifiers were not included in eithe...
P1ProjectProject identificationProject ownerorganization\"organization\": \"SAD in Poltava region\"The owner of the project could be inferred fro...
P2ProjectProject identificationSector, subsectorDiffers from item classification in Prozorro (...
P3ProjectProject identificationProject namename\"name\": \"Construction of overpass and transpor...Project name may appear in tender/title or ten...
P4ProjectProject identificationProject Locationtender/items/deliveryAddress\"deliveryAddress\": { ...Delivery address from the main construction co...
P5ProjectProject identificationPurposeDefinition requires clarification
P6ProjectProject identificationProject descriptionProject description may appear in tender/descr...
P7ProjectProject preparationProject scope (main output)Definition requires clarification
P8ProjectProject preparationEnvironmental impact
P9ProjectProject preparationLand and settlement impact
P10ProjectProject preparationContact detailsDefinition requires clarification, contact det...
P11ProjectProject preparationFunding sources
P12ProjectProject preparationProject Budget
P13ProjectProject preparationProject budget approval date
P14ProjectProject completionProject status (current)
P15ProjectProject completionCompletion cost (projected)total_cost\"total_cost\": \"575 109 925.00\"The projected completion cost could be calcula...
P16ProjectProject completionCompletion date (projected)term_end\"term_end\": \"31.12.2018\"The projected completion date for the project ...
P17ProjectProject completionScope at completion (projected)
P18ProjectProject completionReasons for project changes
P19ProjectProject completionReference to audit and evaluation reports
C1ContractProcurementProcuring entitytender/procuringEntity\"procuringEntity\": { \"additi...Recommend mapping to whole object to capture b...
C2ContractProcurementProcuring entity contact detailstender/procuringEntity/contactPoint\"contactPoint\": { \"email...
C3ContractProcurementContract administration entity
C5ContractProcurementProcurement processtender/procurementMethod,tender/procurementMet...\"procurementMethod\": \"open\",\"procurementMethod...Definition requires clarification
C6ContractProcurementContract typeDefinition requires clarification
C4ContractProcurementContract status (current)Prozorro includes a contracts/status field how...
C7ContractProcurementNumber of firms tenderingtender/numberOfTenderers\"numberOfTenderers\": 4Prozorro also includes the name, address, iden...
C8ContractProcurementCost estimatetender/value\"value\": { \"amount\": 5780000...tender/value is the estimated cost at the time...
C9ContractProcurementContract titletender/title\"title\": \"Construction of overpass and transpo...The contract object in Prozorro does not inclu...
C10ContractProcurementContract firm(s)awards/suppliers\"suppliers\": [ { ...
C11ContractProcurementContract pricecontracts/values\"value\": { \"amount\": 575...
C12ContractProcurementContract scope of workcontracts/documents\"documents\": [ { ...A scope or description for the contract isn't ...
C13ContractProcurementContract start date and durationcontracts/period\"period\": { \"endDate\": \"...
C14ContractImplementationVariation to contract priceOur understanding is that information in Prozo...
C15ContractImplementationEscalation of contract priceOur understanding is that information in Prozo...
C16ContractImplementationVariation to contract durationOur understanding is that information in Prozo...
C17ContractImplementationVariation to contract scopeOur understanding is that information in Prozo...
C18ContractImplementationReasons for price changesOur understanding is that information in Prozo...
C19ContractImplementationReasons for scope and duration changesOur understanding is that information in Prozo...
\n", "
" ], "text/plain": [ " object phase \\\n", "id \n", "P0 Project Project identification \n", "P1 Project Project identification \n", "P2 Project Project identification \n", "P3 Project Project identification \n", "P4 Project Project identification \n", "P5 Project Project identification \n", "P6 Project Project identification \n", "P7 Project Project preparation \n", "P8 Project Project preparation \n", "P9 Project Project preparation \n", "P10 Project Project preparation \n", "P11 Project Project preparation \n", "P12 Project Project preparation \n", "P13 Project Project preparation \n", "P14 Project Project completion \n", "P15 Project Project completion \n", "P16 Project Project completion \n", "P17 Project Project completion \n", "P18 Project Project completion \n", "P19 Project Project completion \n", "C1 Contract Procurement \n", "C2 Contract Procurement \n", "C3 Contract Procurement \n", "C5 Contract Procurement \n", "C6 Contract Procurement \n", "C4 Contract Procurement \n", "C7 Contract Procurement \n", "C8 Contract Procurement \n", "C9 Contract Procurement \n", "C10 Contract Procurement \n", "C11 Contract Procurement \n", "C12 Contract Procurement \n", "C13 Contract Procurement \n", "C14 Contract Implementation \n", "C15 Contract Implementation \n", "C16 Contract Implementation \n", "C17 Contract Implementation \n", "C18 Contract Implementation \n", "C19 Contract Implementation \n", "\n", " data \\\n", "id \n", "P0 Project reference number \n", "P1 Project owner \n", "P2 Sector, subsector \n", "P3 Project name \n", "P4 Project Location \n", "P5 Purpose \n", "P6 Project description \n", "P7 Project scope (main output) \n", "P8 Environmental impact \n", "P9 Land and settlement impact \n", "P10 Contact details \n", "P11 Funding sources \n", "P12 Project Budget \n", "P13 Project budget approval date \n", "P14 Project status (current) \n", "P15 Completion cost (projected) \n", "P16 Completion date (projected) \n", "P17 Scope at completion (projected) \n", "P18 Reasons for project changes \n", "P19 Reference to audit and evaluation reports \n", "C1 Procuring entity \n", "C2 Procuring entity contact details \n", "C3 Contract administration entity \n", "C5 Procurement process \n", "C6 Contract type \n", "C4 Contract status (current) \n", "C7 Number of firms tendering \n", "C8 Cost estimate \n", "C9 Contract title \n", "C10 Contract firm(s) \n", "C11 Contract price \n", "C12 Contract scope of work \n", "C13 Contract start date and duration \n", "C14 Variation to contract price \n", "C15 Escalation of contract price \n", "C16 Variation to contract duration \n", "C17 Variation to contract scope \n", "C18 Reasons for price changes \n", "C19 Reasons for scope and duration changes \n", "\n", " prozorro_mapping \\\n", "id \n", "P0 \n", "P1 \n", "P2 \n", "P3 \n", "P4 tender/items/deliveryAddress \n", "P5 \n", "P6 \n", "P7 \n", "P8 \n", "P9 \n", "P10 \n", "P11 \n", "P12 \n", "P13 \n", "P14 \n", "P15 \n", "P16 \n", "P17 \n", "P18 \n", "P19 \n", "C1 tender/procuringEntity \n", "C2 tender/procuringEntity/contactPoint \n", "C3 \n", "C5 tender/procurementMethod,tender/procurementMet... \n", "C6 \n", "C4 \n", "C7 tender/numberOfTenderers \n", "C8 tender/value \n", "C9 tender/title \n", "C10 awards/suppliers \n", "C11 contracts/values \n", "C12 contracts/documents \n", "C13 contracts/period \n", "C14 \n", "C15 \n", "C16 \n", "C17 \n", "C18 \n", "C19 \n", "\n", " prozorro_example cost_mapping \\\n", "id \n", "P0 \n", "P1 organization \n", "P2 \n", "P3 name \n", "P4 \"deliveryAddress\": { ... \n", "P5 \n", "P6 \n", "P7 \n", "P8 \n", "P9 \n", "P10 \n", "P11 \n", "P12 \n", "P13 \n", "P14 \n", "P15 total_cost \n", "P16 term_end \n", "P17 \n", "P18 \n", "P19 \n", "C1 \"procuringEntity\": { \"additi... \n", "C2 \"contactPoint\": { \"email... \n", "C3 \n", "C5 \"procurementMethod\": \"open\",\"procurementMethod... \n", "C6 \n", "C4 \n", "C7 \"numberOfTenderers\": 4 \n", "C8 \"value\": { \"amount\": 5780000... \n", "C9 \"title\": \"Construction of overpass and transpo... \n", "C10 \"suppliers\": [ { ... \n", "C11 \"value\": { \"amount\": 575... \n", "C12 \"documents\": [ { ... \n", "C13 \"period\": { \"endDate\": \"... \n", "C14 \n", "C15 \n", "C16 \n", "C17 \n", "C18 \n", "C19 \n", "\n", " cost_example \\\n", "id \n", "P0 \n", "P1 \"organization\": \"SAD in Poltava region\" \n", "P2 \n", "P3 \"name\": \"Construction of overpass and transpor... \n", "P4 \n", "P5 \n", "P6 \n", "P7 \n", "P8 \n", "P9 \n", "P10 \n", "P11 \n", "P12 \n", "P13 \n", "P14 \n", "P15 \"total_cost\": \"575 109 925.00\" \n", "P16 \"term_end\": \"31.12.2018\" \n", "P17 \n", "P18 \n", "P19 \n", "C1 \n", "C2 \n", "C3 \n", "C5 \n", "C6 \n", "C4 \n", "C7 \n", "C8 \n", "C9 \n", "C10 \n", "C11 \n", "C12 \n", "C13 \n", "C14 \n", "C15 \n", "C16 \n", "C17 \n", "C18 \n", "C19 \n", "\n", " notes \n", "id \n", "P0 Project identifiers were not included in eithe... \n", "P1 The owner of the project could be inferred fro... \n", "P2 Differs from item classification in Prozorro (... \n", "P3 Project name may appear in tender/title or ten... \n", "P4 Delivery address from the main construction co... \n", "P5 Definition requires clarification \n", "P6 Project description may appear in tender/descr... \n", "P7 Definition requires clarification \n", "P8 \n", "P9 \n", "P10 Definition requires clarification, contact det... \n", "P11 \n", "P12 \n", "P13 \n", "P14 \n", "P15 The projected completion cost could be calcula... \n", "P16 The projected completion date for the project ... \n", "P17 \n", "P18 \n", "P19 \n", "C1 Recommend mapping to whole object to capture b... \n", "C2 \n", "C3 \n", "C5 Definition requires clarification \n", "C6 Definition requires clarification \n", "C4 Prozorro includes a contracts/status field how... \n", "C7 Prozorro also includes the name, address, iden... \n", "C8 tender/value is the estimated cost at the time... \n", "C9 The contract object in Prozorro does not inclu... \n", "C10 \n", "C11 \n", "C12 A scope or description for the contract isn't ... \n", "C13 \n", "C14 Our understanding is that information in Prozo... \n", "C15 Our understanding is that information in Prozo... \n", "C16 Our understanding is that information in Prozo... \n", "C17 Our understanding is that information in Prozo... \n", "C18 Our understanding is that information in Prozo... \n", "C19 Our understanding is that information in Prozo... " ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "import qgrid\n", "import json\n", "\n", "mapping_df = pd.read_csv(\"cost_mapping.csv\")\n", "\n", "for col in ['prozorro_mapping','prozorro_example','cost_mapping','cost_example','notes']:\n", " mapping_df[col] = mapping_df[col].astype('str')\n", "\n", "mapping_df = mapping_df.set_index(\"id\")\n", " \n", "mapping_df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Appendix 1: Creating a versioned release\n", "\n", "The Prozorro dataset provides compiled releases (i.e. the last known state of a contracting process) rather than a change history, so in this section we attempt to reconstruct a change history from previous bulk OCDS downloads from Prozorro" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Get list of package uris we are interested in:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "cur.execute(\"\"\"\n", "SELECT\n", " data.data ->> 'ocid' as ocid,\n", " package_data.data ->> 'uri' as package_uri\n", "FROM ((\n", " package_data\n", " JOIN\n", " release\n", " ON\n", " package_data.id = release.package_data_id\n", " )\n", " JOIN\n", " data\n", " ON\n", " data.id = release.data_id\n", " )\n", "WHERE\n", " data.data ->> 'ocid' LIKE 'ocds-be6bcu%'\n", "AND\n", " data.data -> 'tender' ->> 'title' LIKE '%Н-12 Суми-Полтава%'\n", "AND\n", " data.data -> 'tender' ->> 'title' LIKE '%838%'\n", "\"\"\")" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "ocid package_uri\n", "------ -------------\n" ] } ], "source": [ "printResults(cur)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Above query is broken, so manual copy and paste of results from pgadmin:\n", "\n", "\"ocds-be6bcu-UA-2017-06-22-000543-b\"\t\"http://ocds.prozorro.openprocurement.io/merged_with_extensions_2018-04-20/release-0000054.json\"\n", "\n", "\"ocds-be6bcu-UA-2017-10-24-000672-a\"\t\"http://ocds.prozorro.openprocurement.io/merged_with_extensions_2018-04-20/release-0000042.json\"\n", "\n", "\"ocds-be6bcu-UA-2017-08-18-000844-c\"\t\"http://ocds.prozorro.openprocurement.io/merged_with_extensions_2018-04-20/release-0000161.json\"\n", "\n", "\"ocds-be6bcu-UA-2016-08-22-000314-b\"\t\"http://ocds.prozorro.openprocurement.io/merged_with_extensions_2018-04-20/release-0000086.json\"\n", "\n", "\"ocds-be6bcu-UA-2016-12-14-000431-a\"\t\"http://ocds.prozorro.openprocurement.io/merged_with_extensions_2018-04-20/release-0000392.json\"\n", "\n", "\"ocds-be6bcu-UA-2017-10-24-000686-a\"\t\"http://ocds.prozorro.openprocurement.io/merged_with_extensions_2018-04-20/release-0000429.json\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Get the ocid of the biggest contract:" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'ocds-be6bcu-UA-2017-06-22-000543-b'" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "contracts_df[contracts_df['amount']==contracts_df['amount'].max()]['ocid'][0]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Download historical packages for the biggest contract, dating back to before the tender.period.startDate:" ] }, { "cell_type": "code", "execution_count": 32, "metadata": { "collapsed": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Downloading packages/release-0000054_2018-04-13.json\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "86fc2d975e0b41d08ad4919a22607ca0", "version_major": 2, "version_minor": 0 }, "text/plain": [ "FloatProgress(value=0.0, bar_style='info', description='Downloading:', max=103119101.0)" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Downloading packages/release-0000054_2018-04-06.json\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "86fc2d975e0b41d08ad4919a22607ca0", "version_major": 2, "version_minor": 0 }, "text/plain": [ "FloatProgress(value=0.0, bar_style='info', description='Downloading:', max=106824214.0)" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Downloading packages/release-0000054_2018-03-30.json\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "86fc2d975e0b41d08ad4919a22607ca0", "version_major": 2, "version_minor": 0 }, "text/plain": [ "FloatProgress(value=0.0, bar_style='info', description='Downloading:', max=112747162.0)" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Downloading packages/release-0000054_2018-03-23.json\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "86fc2d975e0b41d08ad4919a22607ca0", "version_major": 2, "version_minor": 0 }, "text/plain": [ "FloatProgress(value=0.0, bar_style='info', description='Downloading:', max=107905082.0)" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Downloading packages/release-0000054_2018-03-16.json\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "86fc2d975e0b41d08ad4919a22607ca0", "version_major": 2, "version_minor": 0 }, "text/plain": [ "FloatProgress(value=0.0, bar_style='info', description='Downloading:', max=101410162.0)" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Downloading packages/release-0000054_2018-03-09.json\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "86fc2d975e0b41d08ad4919a22607ca0", "version_major": 2, "version_minor": 0 }, "text/plain": [ "FloatProgress(value=0.0, bar_style='info', description='Downloading:', max=105871337.0)" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Downloading packages/release-0000054_2018-03-02.json\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "86fc2d975e0b41d08ad4919a22607ca0", "version_major": 2, "version_minor": 0 }, "text/plain": [ "FloatProgress(value=0.0, bar_style='info', description='Downloading:', max=108025409.0)" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Downloading packages/release-0000054_2018-02-23.json\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "86fc2d975e0b41d08ad4919a22607ca0", "version_major": 2, "version_minor": 0 }, "text/plain": [ "FloatProgress(value=0.0, bar_style='info', description='Downloading:', max=105660149.0)" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Downloading packages/release-0000054_2018-02-16.json\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "86fc2d975e0b41d08ad4919a22607ca0", "version_major": 2, "version_minor": 0 }, "text/plain": [ "FloatProgress(value=0.0, bar_style='info', description='Downloading:', max=104189085.0)" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Downloading packages/release-0000054_2018-02-09.json\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "86fc2d975e0b41d08ad4919a22607ca0", "version_major": 2, "version_minor": 0 }, "text/plain": [ "FloatProgress(value=0.0, bar_style='info', description='Downloading:', max=102377447.0)" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Downloading packages/release-0000054_2018-02-02.json\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "86fc2d975e0b41d08ad4919a22607ca0", "version_major": 2, "version_minor": 0 }, "text/plain": [ "FloatProgress(value=0.0, bar_style='info', description='Downloading:', max=101725350.0)" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Downloading packages/release-0000054_2018-01-26.json\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "86fc2d975e0b41d08ad4919a22607ca0", "version_major": 2, "version_minor": 0 }, "text/plain": [ "FloatProgress(value=0.0, bar_style='info', description='Downloading:', max=111157440.0)" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Downloading packages/release-0000054_2018-01-19.json\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "86fc2d975e0b41d08ad4919a22607ca0", "version_major": 2, "version_minor": 0 }, "text/plain": [ "FloatProgress(value=0.0, bar_style='info', description='Downloading:', max=99870098.0)" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Downloading packages/release-0000054_2018-01-12.json\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "86fc2d975e0b41d08ad4919a22607ca0", "version_major": 2, "version_minor": 0 }, "text/plain": [ "FloatProgress(value=0.0, bar_style='info', description='Downloading:', max=102351391.0)" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Downloading packages/release-0000054_2018-01-05.json\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "86fc2d975e0b41d08ad4919a22607ca0", "version_major": 2, "version_minor": 0 }, "text/plain": [ "FloatProgress(value=0.0, bar_style='info', description='Downloading:', max=95.0)" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Downloading packages/release-0000054_2017-12-29.json\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "86fc2d975e0b41d08ad4919a22607ca0", "version_major": 2, "version_minor": 0 }, "text/plain": [ "FloatProgress(value=0.0, bar_style='info', description='Downloading:', max=95.0)" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Downloading packages/release-0000054_2017-12-22.json\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "86fc2d975e0b41d08ad4919a22607ca0", "version_major": 2, "version_minor": 0 }, "text/plain": [ "FloatProgress(value=0.0, bar_style='info', description='Downloading:', max=95.0)" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Downloading packages/release-0000054_2017-12-15.json\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "86fc2d975e0b41d08ad4919a22607ca0", "version_major": 2, "version_minor": 0 }, "text/plain": [ "FloatProgress(value=0.0, bar_style='info', description='Downloading:', max=95.0)" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Downloading packages/release-0000054_2017-12-08.json\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "86fc2d975e0b41d08ad4919a22607ca0", "version_major": 2, "version_minor": 0 }, "text/plain": [ "FloatProgress(value=0.0, bar_style='info', description='Downloading:', max=95.0)" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Downloading packages/release-0000054_2017-12-01.json\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "86fc2d975e0b41d08ad4919a22607ca0", "version_major": 2, "version_minor": 0 }, "text/plain": [ "FloatProgress(value=0.0, bar_style='info', description='Downloading:', max=95.0)" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Downloading packages/release-0000054_2017-11-24.json\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "86fc2d975e0b41d08ad4919a22607ca0", "version_major": 2, "version_minor": 0 }, "text/plain": [ "FloatProgress(value=0.0, bar_style='info', description='Downloading:', max=95.0)" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Downloading packages/release-0000054_2017-11-17.json\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "86fc2d975e0b41d08ad4919a22607ca0", "version_major": 2, "version_minor": 0 }, "text/plain": [ "FloatProgress(value=0.0, bar_style='info', description='Downloading:', max=95.0)" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Downloading packages/release-0000054_2017-11-10.json\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "86fc2d975e0b41d08ad4919a22607ca0", "version_major": 2, "version_minor": 0 }, "text/plain": [ "FloatProgress(value=0.0, bar_style='info', description='Downloading:', max=95.0)" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Downloading packages/release-0000054_2017-11-03.json\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "86fc2d975e0b41d08ad4919a22607ca0", "version_major": 2, "version_minor": 0 }, "text/plain": [ "FloatProgress(value=0.0, bar_style='info', description='Downloading:', max=95.0)" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Downloading packages/release-0000054_2017-10-27.json\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "86fc2d975e0b41d08ad4919a22607ca0", "version_major": 2, "version_minor": 0 }, "text/plain": [ "FloatProgress(value=0.0, bar_style='info', description='Downloading:', max=95.0)" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Downloading packages/release-0000054_2017-10-20.json\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "86fc2d975e0b41d08ad4919a22607ca0", "version_major": 2, "version_minor": 0 }, "text/plain": [ "FloatProgress(value=0.0, bar_style='info', description='Downloading:', max=106365281.0)" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Downloading packages/release-0000054_2017-10-13.json\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "86fc2d975e0b41d08ad4919a22607ca0", "version_major": 2, "version_minor": 0 }, "text/plain": [ "FloatProgress(value=0.0, bar_style='info', description='Downloading:', max=54955577.0)" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Downloading packages/release-0000054_2017-10-06.json\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "86fc2d975e0b41d08ad4919a22607ca0", "version_major": 2, "version_minor": 0 }, "text/plain": [ "FloatProgress(value=0.0, bar_style='info', description='Downloading:', max=55340212.0)" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Downloading packages/release-0000054_2017-09-29.json\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "86fc2d975e0b41d08ad4919a22607ca0", "version_major": 2, "version_minor": 0 }, "text/plain": [ "FloatProgress(value=0.0, bar_style='info', description='Downloading:', max=53099417.0)" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Downloading packages/release-0000054_2017-09-22.json\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "86fc2d975e0b41d08ad4919a22607ca0", "version_major": 2, "version_minor": 0 }, "text/plain": [ "FloatProgress(value=0.0, bar_style='info', description='Downloading:', max=95.0)" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Downloading packages/release-0000054_2017-09-15.json\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "86fc2d975e0b41d08ad4919a22607ca0", "version_major": 2, "version_minor": 0 }, "text/plain": [ "FloatProgress(value=0.0, bar_style='info', description='Downloading:', max=105998048.0)" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Downloading packages/release-0000054_2017-09-08.json\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "86fc2d975e0b41d08ad4919a22607ca0", "version_major": 2, "version_minor": 0 }, "text/plain": [ "FloatProgress(value=0.0, bar_style='info', description='Downloading:', max=116270598.0)" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Downloading packages/release-0000054_2017-09-01.json\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "86fc2d975e0b41d08ad4919a22607ca0", "version_major": 2, "version_minor": 0 }, "text/plain": [ "FloatProgress(value=0.0, bar_style='info', description='Downloading:', max=95.0)" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Downloading packages/release-0000054_2017-08-25.json\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "86fc2d975e0b41d08ad4919a22607ca0", "version_major": 2, "version_minor": 0 }, "text/plain": [ "FloatProgress(value=0.0, bar_style='info', description='Downloading:', max=95.0)" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Downloading packages/release-0000054_2017-08-18.json\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "86fc2d975e0b41d08ad4919a22607ca0", "version_major": 2, "version_minor": 0 }, "text/plain": [ "FloatProgress(value=0.0, bar_style='info', description='Downloading:', max=52089030.0)" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Downloading packages/release-0000054_2017-08-11.json\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "86fc2d975e0b41d08ad4919a22607ca0", "version_major": 2, "version_minor": 0 }, "text/plain": [ "FloatProgress(value=0.0, bar_style='info', description='Downloading:', max=117947469.0)" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Downloading packages/release-0000054_2017-08-04.json\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "86fc2d975e0b41d08ad4919a22607ca0", "version_major": 2, "version_minor": 0 }, "text/plain": [ "FloatProgress(value=0.0, bar_style='info', description='Downloading:', max=95.0)" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Downloading packages/release-0000054_2017-07-28.json\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "86fc2d975e0b41d08ad4919a22607ca0", "version_major": 2, "version_minor": 0 }, "text/plain": [ "FloatProgress(value=0.0, bar_style='info', description='Downloading:', max=50869727.0)" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Downloading packages/release-0000054_2017-07-21.json\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "86fc2d975e0b41d08ad4919a22607ca0", "version_major": 2, "version_minor": 0 }, "text/plain": [ "FloatProgress(value=0.0, bar_style='info', description='Downloading:', max=95.0)" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Downloading packages/release-0000054_2017-07-14.json\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "86fc2d975e0b41d08ad4919a22607ca0", "version_major": 2, "version_minor": 0 }, "text/plain": [ "FloatProgress(value=0.0, bar_style='info', description='Downloading:', max=95.0)" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Downloading packages/release-0000054_2017-07-07.json\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "86fc2d975e0b41d08ad4919a22607ca0", "version_major": 2, "version_minor": 0 }, "text/plain": [ "FloatProgress(value=0.0, bar_style='info', description='Downloading:', max=111018535.0)" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Downloading packages/release-0000054_2017-06-30.json\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "86fc2d975e0b41d08ad4919a22607ca0", "version_major": 2, "version_minor": 0 }, "text/plain": [ "FloatProgress(value=0.0, bar_style='info', description='Downloading:', max=106951321.0)" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Downloading packages/release-0000054_2017-06-23.json\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "86fc2d975e0b41d08ad4919a22607ca0", "version_major": 2, "version_minor": 0 }, "text/plain": [ "FloatProgress(value=0.0, bar_style='info', description='Downloading:', max=115060229.0)" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Downloading packages/release-0000054_2017-06-16.json\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "86fc2d975e0b41d08ad4919a22607ca0", "version_major": 2, "version_minor": 0 }, "text/plain": [ "FloatProgress(value=0.0, bar_style='info', description='Downloading:', max=116402054.0)" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Downloading packages/release-0000054_2017-06-09.json\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "86fc2d975e0b41d08ad4919a22607ca0", "version_major": 2, "version_minor": 0 }, "text/plain": [ "FloatProgress(value=0.0, bar_style='info', description='Downloading:', max=109039645.0)" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Downloading packages/release-0000054_2017-06-02.json\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "86fc2d975e0b41d08ad4919a22607ca0", "version_major": 2, "version_minor": 0 }, "text/plain": [ "FloatProgress(value=0.0, bar_style='info', description='Downloading:', max=105778793.0)" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Downloading packages/release-0000054_2017-05-26.json\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "86fc2d975e0b41d08ad4919a22607ca0", "version_major": 2, "version_minor": 0 }, "text/plain": [ "FloatProgress(value=0.0, bar_style='info', description='Downloading:', max=111417821.0)" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from datetime import datetime, timedelta\n", "\n", "import ipywidgets as widgets\n", "\n", "dates = []\n", "\n", "date = datetime.strptime(\"2018-04-20\",\"%Y-%m-%d\")\n", "\n", "while date > datetime.strptime(\"2017-06-01\",\"%Y-%m-%d\"):\n", " date = date - timedelta(days=7)\n", " dates.append(datetime.strftime(date,\"%Y-%m-%d\"))\n", "\n", "progress = widgets.FloatProgress(\n", " min=0,\n", " max=10.0,\n", " step=0.1,\n", " description='Downloading:',\n", " bar_style='info',\n", " orientation='horizontal'\n", ")\n", "\n", "import requests\n", "from IPython.display import display\n", "\n", "for date in dates:\n", " link = \"http://ocds.prozorro.openprocurement.io/merged_with_extensions_\" + date + \"/release-0000054.json\"\n", " file_name = \"packages/release-0000054_\" + date + \".json\"\n", " progress.value = 0\n", " with open(file_name, \"wb\") as f:\n", " print(\"Downloading %s\" % file_name)\n", " response = requests.get(link, stream=True)\n", " total_length = response.headers.get('content-length')\n", "\n", " if total_length is None: # no content length header\n", " print(\"no content length header\")\n", " f.write(response.content)\n", " else:\n", " progress.max = int(total_length)\n", " display(progress)\n", " for data in response.iter_content(chunk_size=4096):\n", " f.write(data)\n", " progress.value += len(data)\n" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "processing: packages/release-0000054_2017-06-23.json\n", "package contains 4096 releases\n", "processing: packages/release-0000054_2018-03-02.json\n", "package contains 4096 releases\n", "processing: packages/release-0000054_2017-10-20.json\n", "package contains 4096 releases\n", "processing: packages/release-0000054_2017-11-03.json\n", "Loading JSON failed for: packages/release-0000054_2017-11-03.json\n", "processing: packages/release-0000054_2017-09-08.json\n", "package contains 4100 releases\n", "processing: packages/release-0000054_2017-12-15.json\n", "Loading JSON failed for: packages/release-0000054_2017-12-15.json\n", "processing: packages/release-0000054_2017-12-22.json\n", "Loading JSON failed for: packages/release-0000054_2017-12-22.json\n", "processing: packages/release-0000054_2017-06-02.json\n", "package contains 4096 releases\n", "processing: packages/release-0000054_2017-12-01.json\n", "Loading JSON failed for: packages/release-0000054_2017-12-01.json\n", "processing: packages/release-0000054_2018-04-06.json\n", "package contains 4096 releases\n", "processing: packages/release-0000054_2018-02-09.json\n", "package contains 4096 releases\n", "processing: packages/release-0000054_2017-08-25.json\n", "Loading JSON failed for: packages/release-0000054_2017-08-25.json\n", "processing: packages/release-0000054_2017-09-15.json\n", "package contains 4098 releases\n", "processing: packages/release-0000054_2017-05-26.json\n", "package contains 4096 releases\n", "processing: packages/release-0000054_2017-09-01.json\n", "Loading JSON failed for: packages/release-0000054_2017-09-01.json\n", "processing: packages/release-0000054_2017-10-27.json\n", "Loading JSON failed for: packages/release-0000054_2017-10-27.json\n", "processing: packages/release-0000054_2017-06-30.json\n", "package contains 4096 releases\n", "processing: packages/release-0000054_2017-06-16.json\n", "package contains 4096 releases\n", "processing: packages/release-0000054_2017-12-08.json\n", "Loading JSON failed for: packages/release-0000054_2017-12-08.json\n", "processing: packages/release-0000054_2017-10-06.json\n", "package contains 2048 releases\n", "processing: packages/release-0000054_2018-04-13.json\n", "package contains 4096 releases\n", "processing: packages/release-0000054_2018-02-23.json\n", "package contains 4096 releases\n", "processing: packages/release-0000054_2017-09-22.json\n", "Loading JSON failed for: packages/release-0000054_2017-09-22.json\n", "processing: packages/release-0000054_2017-08-18.json\n", "package contains 2048 releases\n", "processing: packages/release-0000054_2018-02-02.json\n", "package contains 4096 releases\n", "processing: packages/release-0000054_2018-03-16.json\n", "package contains 4096 releases\n", "processing: packages/release-0000054_2017-07-28.json\n", "package contains 2048 releases\n", "processing: packages/release-0000054_2018-03-30.json\n", "package contains 4096 releases\n", "processing: packages/release-0000054_2018-03-09.json\n", "package contains 4096 releases\n", "processing: packages/release-0000054_2017-09-29.json\n", "package contains 2049 releases\n", "processing: packages/release-0000054_2017-12-29.json\n", "Loading JSON failed for: packages/release-0000054_2017-12-29.json\n", "processing: packages/release-0000054_2018-01-26.json\n", "package contains 4096 releases\n", "processing: packages/release-0000054_2018-01-12.json\n", "package contains 4096 releases\n", "processing: packages/release-0000054_2017-07-14.json\n", "Loading JSON failed for: packages/release-0000054_2017-07-14.json\n", "processing: packages/release-0000054_2017-11-10.json\n", "Loading JSON failed for: packages/release-0000054_2017-11-10.json\n", "processing: packages/release-0000054_2018-01-19.json\n", "package contains 4096 releases\n", "processing: packages/release-0000054_2018-02-16.json\n", "package contains 4096 releases\n", "processing: packages/release-0000054_2017-11-24.json\n", "Loading JSON failed for: packages/release-0000054_2017-11-24.json\n", "processing: packages/release-0000054_2017-11-17.json\n", "Loading JSON failed for: packages/release-0000054_2017-11-17.json\n", "processing: packages/release-0000054_2018-01-05.json\n", "Loading JSON failed for: packages/release-0000054_2018-01-05.json\n", "processing: packages/release-0000054_2017-06-09.json\n", "package contains 4096 releases\n", "processing: packages/release-0000054_2018-03-23.json\n", "package contains 4096 releases\n", "processing: packages/release-0000054_2017-10-13.json\n", "package contains 2048 releases\n", "processing: packages/release-0000054_2017-08-11.json\n", "package contains 4096 releases\n", "processing: packages/release-0000054_2017-07-07.json\n", "package contains 4096 releases\n", "processing: packages/release-0000054_2017-07-21.json\n", "Loading JSON failed for: packages/release-0000054_2017-07-21.json\n", "processing: packages/release-0000054_2017-08-04.json\n", "Loading JSON failed for: packages/release-0000054_2017-08-04.json\n" ] } ], "source": [ "import glob\n", "import json\n", "\n", "packagelist = glob.glob(\"packages/*\")\n", "releases = []\n", "\n", "for package in packagelist:\n", " print(\"processing: \", package)\n", " with open(package) as file:\n", " try:\n", " package_json = json.load(file)\n", " print(\"package contains \", len(package_json[\"releases\"]), \" releases\")\n", " for release in package_json[\"releases\"]:\n", " if \"d64bd98f2e35428b93f958b421f55356\" in release[\"id\"]:\n", " print(release[\"id\"])\n", " releases.append(release)\n", " except ValueError:\n", " print(\"Loading JSON failed for: \", package)b" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "No matching releases were found, suggesting that releases are not located in a consistent segment of the bulk download across dates, therefore constructing a history of changes would involve downloading all segments for all bulk releases (c. 46GB data for every two weeks)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Appendix 2: Exploring the Prozorro dataset" ] }, { "cell_type": "code", "execution_count": 68, "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " release_count ocid_count\n", "--------------- ------------\n", " 1892352 1892352\n" ] } ], "source": [ "cur.execute(\"\"\"\n", "SELECT\n", " COUNT(id) as release_count,\n", " COUNT(DISTINCT data ->> 'ocid') as ocid_count\n", "FROM\n", " data\n", "WHERE\n", " data ->> 'ocid' LIKE 'ocds-be6bcu%'\n", "\"\"\")\n", "\n", "printResults(cur)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "One release per ocid" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Releases by tag " ] }, { "cell_type": "code", "execution_count": 48, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " release_count tag\n", "--------------- --------------------------------------\n", " 230269 [\"tender\"]\n", " 13203 [\"tender\", \"award\"]\n", " 61133 [\"tender\", \"award\", \"bid\"]\n", " 1004187 [\"tender\", \"award\", \"contract\"]\n", " 546471 [\"tender\", \"award\", \"contract\", \"bid\"]\n", " 37089 [\"tender\", \"bid\"]\n" ] } ], "source": [ "cur.execute(\"\"\"\n", "SELECT\n", " COUNT(id) as release_count,\n", " data ->> 'tag' as tag\n", "FROM\n", " data\n", "WHERE\n", " data ->> 'ocid' LIKE 'ocds-be6bcu%'\n", "GROUP BY\n", " tag\n", "\"\"\")\n", "\n", "printResults(cur)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Results suggest that releases and records model is not fully implemented -> expected to see similar numbers of tender and award releases" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Search for the name of the highway (Н-12 Суми-Полтава)" ] }, { "cell_type": "code", "execution_count": 63, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " release_count ocid_count\n", "--------------- ------------\n", " 14 14\n" ] } ], "source": [ "cur.execute(\"\"\"\n", "SELECT\n", " COUNT(id) as release_count,\n", " COUNT(DISTINCT data ->> 'ocid') as ocid_count\n", "FROM\n", " data\n", "WHERE\n", " data ->> 'ocid' LIKE 'ocds-be6bcu%'\n", "AND\n", " data -> 'tender' ->> 'title' LIKE '%Н-12 Суми-Полтава%'\n", "\n", "\"\"\")\n", "\n", "printResults(cur)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Refine search by km marker" ] }, { "cell_type": "code", "execution_count": 64, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " release_count ocid_count\n", "--------------- ------------\n", " 6 6\n" ] } ], "source": [ "cur.execute(\"\"\"\n", "SELECT\n", " COUNT(id) as release_count,\n", " COUNT(DISTINCT data ->> 'ocid') as ocid_count\n", "FROM\n", " data\n", "WHERE\n", " data ->> 'ocid' LIKE 'ocds-be6bcu%'\n", "AND\n", " data -> 'tender' ->> 'title' LIKE '%Н-12 Суми-Полтава%'\n", "AND\n", " data -> 'tender' ->> 'title' LIKE '%838%'\n", "\"\"\")\n", "\n", "printResults(cur)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Check how many different ocids there are for the project and what tags" ] }, { "cell_type": "code", "execution_count": 62, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " release_count ocid tag\n", "--------------- ---------------------------------- --------------------------------------\n", " 1 ocds-be6bcu-UA-2016-08-22-000314-b [\"tender\", \"award\", \"contract\", \"bid\"]\n", " 1 ocds-be6bcu-UA-2016-12-14-000431-a [\"tender\", \"award\", \"contract\"]\n", " 1 ocds-be6bcu-UA-2017-06-22-000543-b [\"tender\", \"award\", \"contract\", \"bid\"]\n", " 1 ocds-be6bcu-UA-2017-08-18-000844-c [\"tender\", \"award\", \"contract\"]\n", " 1 ocds-be6bcu-UA-2017-10-24-000672-a [\"tender\", \"award\", \"contract\"]\n", " 1 ocds-be6bcu-UA-2017-10-24-000686-a [\"tender\", \"award\", \"contract\"]\n" ] } ], "source": [ "cur.execute(\"\"\"\n", "SELECT\n", " COUNT(id) as release_count,\n", " data ->> 'ocid' as ocid,\n", " data ->> 'tag' as tag\n", "FROM\n", " data\n", "WHERE\n", " data ->> 'ocid' LIKE 'ocds-be6bcu%'\n", "AND\n", " data -> 'tender' ->> 'title' LIKE '%Н-12 Суми-Полтава%'\n", "AND\n", " data -> 'tender' ->> 'title' LIKE '%838%'\n", "GROUP BY\n", " ocid,\n", " tag\n", "\"\"\")\n", "\n", "printResults(cur)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Appendix 3: Finding related contracts in Prozorro (front end search / openprocurement API)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Using **untranslated** name of the project, search the [Prozorro front end](https://prozorro.gov.ua) for related tenders." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Example" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The name of our project:\n", "\n", "> Reconstruction of the bridge crossing on the highway N-12 Sumy-Poltava km 70 + 838\n", "\n", "Was translated from:\n", "\n", "> Реконструкція мостового переходу на автомобільній дорозі Н-12 Суми-Полтава км 70+838\n", "\n", "Searching for the name of the highway (Н-12 Суми-Полтава) returns [67 results](https://prozorro.gov.ua/tender/search?query=%D0%9D-12%20%D0%A1%D1%83%D0%BC%D0%B8-%D0%9F%D0%BE%D0%BB%D1%82%D0%B0%D0%B2%D0%B0). Manually reviewing the results to identify those relating to the bridge at km 70 +838 results in 6 contracts:\n", "\n", "https://prozorro.gov.ua/tender/UA-2017-10-24-000686-a (ecd7008713ce40898a0b8a9e725cd75a)\n", "\n", "https://prozorro.gov.ua/tender/UA-2017-10-24-000672-a (1711f9b90d7244d68aa47d02f538e0f9)\n", "\n", "https://prozorro.gov.ua/tender/UA-2017-06-22-000543-b (1dc0ebaf0c3e4330bf242a91f39579e9)\n", "\n", "https://prozorro.gov.ua/tender/UA-2017-08-18-000844-c (58c516cadcfe4767852e94d63860aed6)\n", "\n", "https://prozorro.gov.ua/tender/UA-2016-12-14-000431-a (d87604c7236a43a09b5ec4249cc5cb84)\n", "\n", "https://prozorro.gov.ua/tender/UA-2016-08-22-000314-b (2f5da1f7f080416e91b3fd655abffc85)\n", "\n", "The individual tender view in the Prozorro front end includes the identifier needed to retrieve the related record from the [openprocurementapi](https://public.api.openprocurement.org/api/2/tenders/) (shown in brackets above)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Populate the following array with the tender identifiers you found in the previous step:" ] }, { "cell_type": "code", "execution_count": 74, "metadata": {}, "outputs": [], "source": [ "identifiers = [\"ecd7008713ce40898a0b8a9e725cd75a\",\"1711f9b90d7244d68aa47d02f538e0f9\",\"1dc0ebaf0c3e4330bf242a91f39579e9\",\"58c516cadcfe4767852e94d63860aed6\",\"d87604c7236a43a09b5ec4249cc5cb84\",\"2f5da1f7f080416e91b3fd655abffc85\"]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Run the following script to download the data from the openprocurement API:" ] }, { "cell_type": "code", "execution_count": 75, "metadata": {}, "outputs": [], "source": [ "import urllib\n", "\n", "data[\"tenders\"] = []\n", "\n", "for identifier in identifiers:\n", " with urllib.request.urlopen(\"https://public.api.openprocurement.org/api/2/tenders/\" + identifier) as url:\n", " response = json.loads(url.read().decode())\n", " data[\"tenders\"].append(response[\"data\"])\n" ] }, { "cell_type": "code", "execution_count": 76, "metadata": {}, "outputs": [], "source": [ "#save results to file\n", "with open(\"data_\" + publicEntity + \"_\" + region + \"_\" + foreignKey + \".json\", \"w\") as output:\n", " json.dump(data, output, indent = 4, ensure_ascii=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Appendix 4: Initial mapping / fields for comparison" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The following table provides a high level mapping of the fields in the CoST Ukraine to fields in OCDS, to identify opportunities to compare data from CoST Ukraine and Prozorro:\n", "\n", "| Section |Field | Example | Related OCDS field(s) | Notes |\n", "|---------|-------|---------|---------------------|--------|\n", "| Summary |`term_end` | 31.12.2018 | `contract/period/endDate` ||\n", "| Summary |`term_start` | 12.10.2017 | `contract/period/startDate` ||\n", "| Summary |`total_cost` | 99 202 651.20 | `contract/value/amount` ||\n", "| Summary |`customer` | Sumymostobud LLC | `awards/suppliers/name` ||\n", "| TECH_DETAILS | `left_cost` | 99 202 651.20 | | Cost remaining? Matches total_cost in summary section |\n", "| SUBJECTS | `prognose cost` | 99202651.20 | | Expected cost? Matches total_cost in summary section |\n", "| SUBJECTS | `decision_date` | 12.10.2017 | `awards/suppliers/date` |\n", "| CONTRACTOR | `costs` | 99 202 651.20 | `awards/amount` | Matches total_cost in summary section |\n", "| CONTRACTOR | `org_name` | Sumymostobud LLC | `awards/suppliers/name` ||\n", "| CONTRACTOR | `announce_place` | Prozor | | Notice location |\n", "| CONTRACTOR | `terms` | 31.12.2018 | `contract.period.endDate` | |\n", "| CONTRACTOR | `announce_date` | 22.06.2017 | | tender date? |\n", "| CONTRACTOR | `start_date` | 12.10.2017 | `contract/period/startDate` | |\n", "| CONTRACTOR | `announce_num` | UA-2017-06-22-000543-b | `ocid` ||\n", "| FINANCING | `state_budget` | 19 667 768.40 | `budget.amount` ||\n" ] } ], "metadata": { "kernelspec": { "display_name": "ocinfra2", "language": "python", "name": "ocinfra2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.5.2" } }, "nbformat": 4, "nbformat_minor": 2 }