{ "metadata": { "name": "", "signature": "sha256:a06c07399b38eb2573ceb5f0906ac2d1d941f9e78f7e7954f0a6465e244e8bae" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [
{ "cell_type": "heading", "level": 1, "metadata": {}, "source": [ "Original dataset to HDF5" ] },
{ "cell_type": "code", "collapsed": false, "input": [ "%pylab inline" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "Populating the interactive namespace from numpy and matplotlib\n" ] } ], "prompt_number": 1 },
{ "cell_type": "code", "collapsed": false, "input": [ "import numpy as np\n", "import pandas as pd\n", "import joblib" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 2 },
{ "cell_type": "markdown", "metadata": {}, "source": [
"## Schema\n",
"\n",
"1. **Square id**: the id of the square within the Milano GRID. TYPE: numeric\n",
"2. **Time interval**: the beginning of the time interval, expressed as the number of milliseconds elapsed since the Unix Epoch (January 1st, 1970, UTC). The end of the time interval can be obtained by adding 600000 milliseconds (10 minutes) to this value. TYPE: numeric\n",
"3. **Country code**: the phone country code of a nation. Depending on the measured activity, this value assumes different meanings that are explained later. TYPE: numeric\n",
"4. **SMS-in activity**: the activity in terms of received SMS inside the Square id, during the Time interval and sent from the nation identified by the Country code. TYPE: numeric\n",
"5. **SMS-out activity**: the activity in terms of sent SMS inside the Square id, during the Time interval and received by the nation identified by the Country code. TYPE: numeric\n",
"6. **Call-in activity**: the activity in terms of received calls inside the Square id, during the Time interval and issued from the nation identified by the Country code. TYPE: numeric\n",
"7. **Call-out activity**: the activity in terms of issued calls inside the Square id, during the Time interval and received by the nation identified by the Country code. TYPE: numeric\n",
"8. **Internet traffic activity**: the activity in terms of internet traffic performed inside the Square id, during the Time interval, by users whose nation is identified by the Country code. TYPE: numeric" ] },
{ "cell_type": "code", "collapsed": false, "input": [
"# Column names follow the order of the schema above.\n",
"column_names = ['Square_id',\n",
"                'Time_interval',\n",
"                'Country_code',\n",
"                'SMS_in',\n",
"                'SMS_out',\n",
"                'Call_in',\n",
"                'Call_out',\n",
"                'Internet_traffic']\n",
"\n",
"# Explicit dtypes passed to read_csv for each column.\n",
"dtypes = {'Square_id': np.int32,\n",
"          'Time_interval': np.uint64,\n",
"          'Country_code': np.int32,\n",
"          'SMS_in': np.float32,\n",
"          'SMS_out': np.float32,\n",
"          'Call_in': np.float32,\n",
"          'Call_out': np.float32,\n",
"          'Internet_traffic': np.float32}\n",
"\n",
"file_pattern = 'data/telco/sms-call-internet-mi-2013-{:02d}-{:02d}.txt'\n",
"\n",
"df_list = []\n",
"\n",
"# One tab-separated file per day: November 2013 (30 days), then December 2013 (31 days).\n",
"for month, n_days in [(11, 30), (12, 31)]:\n",
"    for day in range(1, n_days + 1):\n",
"        path = file_pattern.format(month, day)\n",
"        print('loading ' + path)\n",
"        df = pd.read_csv(path,\n",
"                         sep='\\t', header=None,\n",
"                         names=column_names, dtype=dtypes)\n",
"        df_list.append(df)\n",
"\n",
"# Stack the daily frames into a single DataFrame.\n",
"df = pd.concat(df_list)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "loading data/telco/sms-call-internet-mi-2013-11-01.txt\n", "loading" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " data/telco/sms-call-internet-mi-2013-11-02.txt\n", "loading" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " data/telco/sms-call-internet-mi-2013-11-03.txt\n", "loading" ] }, { "output_type": "stream", "stream": "stdout",
"text": [ " data/telco/sms-call-internet-mi-2013-11-04.txt\n", "loading" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " data/telco/sms-call-internet-mi-2013-11-05.txt\n", "loading" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " data/telco/sms-call-internet-mi-2013-11-06.txt\n", "loading" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " data/telco/sms-call-internet-mi-2013-11-07.txt\n", "loading" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " data/telco/sms-call-internet-mi-2013-11-08.txt\n", "loading" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " data/telco/sms-call-internet-mi-2013-11-09.txt\n", "loading" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " data/telco/sms-call-internet-mi-2013-11-10.txt\n", "loading" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " data/telco/sms-call-internet-mi-2013-11-11.txt\n", "loading" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " data/telco/sms-call-internet-mi-2013-11-12.txt\n", "loading" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " data/telco/sms-call-internet-mi-2013-11-13.txt\n", "loading" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " data/telco/sms-call-internet-mi-2013-11-14.txt\n", "loading" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " data/telco/sms-call-internet-mi-2013-11-15.txt\n", "loading" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " data/telco/sms-call-internet-mi-2013-11-16.txt\n", "loading" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " data/telco/sms-call-internet-mi-2013-11-17.txt\n", "loading" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " data/telco/sms-call-internet-mi-2013-11-18.txt\n", "loading" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " data/telco/sms-call-internet-mi-2013-11-19.txt\n", "loading" ] }, { "output_type": "stream", "stream": "stdout", 
"text": [ " data/telco/sms-call-internet-mi-2013-11-20.txt\n", "loading" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " data/telco/sms-call-internet-mi-2013-11-21.txt\n", "loading" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " data/telco/sms-call-internet-mi-2013-11-22.txt\n", "loading" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " data/telco/sms-call-internet-mi-2013-11-23.txt\n", "loading" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " data/telco/sms-call-internet-mi-2013-11-24.txt\n", "loading" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " data/telco/sms-call-internet-mi-2013-11-25.txt\n", "loading" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " data/telco/sms-call-internet-mi-2013-11-26.txt\n", "loading" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " data/telco/sms-call-internet-mi-2013-11-27.txt\n", "loading" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " data/telco/sms-call-internet-mi-2013-11-28.txt\n", "loading" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " data/telco/sms-call-internet-mi-2013-11-29.txt\n", "loading" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " data/telco/sms-call-internet-mi-2013-11-30.txt\n", "loading" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " data/telco/sms-call-internet-mi-2013-12-01.txt\n", "loading" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " data/telco/sms-call-internet-mi-2013-12-02.txt\n", "loading" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " data/telco/sms-call-internet-mi-2013-12-03.txt\n", "loading" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " data/telco/sms-call-internet-mi-2013-12-04.txt\n", "loading" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " data/telco/sms-call-internet-mi-2013-12-05.txt\n", "loading" ] }, { "output_type": "stream", "stream": "stdout", 
"text": [ " data/telco/sms-call-internet-mi-2013-12-06.txt\n", "loading" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " data/telco/sms-call-internet-mi-2013-12-07.txt\n", "loading" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " data/telco/sms-call-internet-mi-2013-12-08.txt\n", "loading" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " data/telco/sms-call-internet-mi-2013-12-09.txt\n", "loading" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " data/telco/sms-call-internet-mi-2013-12-10.txt\n", "loading" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " data/telco/sms-call-internet-mi-2013-12-11.txt\n", "loading" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " data/telco/sms-call-internet-mi-2013-12-12.txt\n", "loading" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " data/telco/sms-call-internet-mi-2013-12-13.txt\n", "loading" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " data/telco/sms-call-internet-mi-2013-12-14.txt\n", "loading" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " data/telco/sms-call-internet-mi-2013-12-15.txt\n", "loading" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " data/telco/sms-call-internet-mi-2013-12-16.txt\n", "loading" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " data/telco/sms-call-internet-mi-2013-12-17.txt\n", "loading" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " data/telco/sms-call-internet-mi-2013-12-18.txt\n", "loading" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " data/telco/sms-call-internet-mi-2013-12-19.txt\n", "loading" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " data/telco/sms-call-internet-mi-2013-12-20.txt\n", "loading" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " data/telco/sms-call-internet-mi-2013-12-21.txt\n", "loading" ] }, { "output_type": "stream", "stream": "stdout", 
"text": [ " data/telco/sms-call-internet-mi-2013-12-22.txt\n", "loading" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " data/telco/sms-call-internet-mi-2013-12-23.txt\n", "loading" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " data/telco/sms-call-internet-mi-2013-12-24.txt\n", "loading" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " data/telco/sms-call-internet-mi-2013-12-25.txt\n", "loading" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " data/telco/sms-call-internet-mi-2013-12-26.txt\n", "loading" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " data/telco/sms-call-internet-mi-2013-12-27.txt\n", "loading" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " data/telco/sms-call-internet-mi-2013-12-28.txt\n", "loading" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " data/telco/sms-call-internet-mi-2013-12-29.txt\n", "loading" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " data/telco/sms-call-internet-mi-2013-12-30.txt\n", "loading" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " data/telco/sms-call-internet-mi-2013-12-31.txt\n" ] } ], "prompt_number": 3 }, { "cell_type": "code", "collapsed": false, "input": [ "df.index = pd.to_datetime(df.Time_interval.values, unit='ms').tz_localize('utc').tz_convert('Europe/Rome')" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 4 }, { "cell_type": "code", "collapsed": false, "input": [ "df.drop('Time_interval', axis=1, inplace=True)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 5 }, { "cell_type": "code", "collapsed": false, "input": [ "df.sort_index(inplace=True)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 6 }, { "cell_type": "code", "collapsed": false, "input": [ "STORE_FILE = 'stores/sms-call-internet-mi-table-blosc.h5'\n", "store = pd.HDFStore(STORE_FILE, 'w', complib='blosc')\n", 
"store.append('telco_data', df, format='t', data_columns=['Square_id', 'Country_code'])\n", "store.append('telco_codes', df.Country_code.drop_duplicates(), format='t')\n", "store.close()" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 7 } ], "metadata": {} } ] }