# datadictionary_demo

This notebook demonstrates the use of the DataDictionary object in the ukds package.

The demonstration uses the following dataset as its example: Gershuny, J., Sullivan, O. (2017). United Kingdom Time Use Survey, 2014-2015. Centre for Time Use Research, University of Oxford. [data collection]. UK Data Service. SN: 8128, http://doi.org/10.5255/UKDA-SN-8128-1


## Import the ukds package

In [1]:
import ukds

## Set up a filepath to a UKDS .rtf data dictionary file

In [2]:
fp=r'C:\Users\cvskf\OneDrive - Loughborough University\_Data\United_Kingdom_Time_Use_Survey_2014-2015' + \
 r'\UKDA-8128-tab\mrdoc\allissue\uktus15_household_ukda_data_dictionary.rtf'

## Create a DataDictionary object 

In [3]:
dd=ukds.DataDictionary()
print(dd.__doc__)
dd

A class for reading a UK Data Service .rtf data dictionary file
 




## Read in the .rtf file

This reads in the .rtf file and converts its contents into a list of dictionaries, one per variable, which is stored in the attribute `variable_list`.

In [4]:
dd.read_rtf(fp)

## Exploring the variable_list attribute

The number of variables is:

In [5]:
len(dd.variable_list)

335

The first variable has the following information:

In [6]:
dd.variable_list[0]

{'pos': '1',
 'variable': 'serial',
 'variable_label': 'Household number',
 'variable_type': 'numeric',
 'SPSS_measurement_level': 'SCALE',
 'SPSS_user_missing_values': '',
 'value_labels': ''}
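
Because each entry is a plain dictionary, the list can be summarised with ordinary Python. The sketch below is only an illustration: it uses three hand-copied sample records in place of `dd.variable_list`, the value labels are shortened, and the `SPSS_user_missing_values` entry for `Accom` is an assumption (it is not shown in this notebook).

```python
from collections import Counter

# Sample records shaped like the entries of dd.variable_list.
# In the notebook you would use dd.variable_list directly.
variable_list = [
    {'pos': '1', 'variable': 'serial', 'variable_label': 'Household number',
     'variable_type': 'numeric', 'SPSS_measurement_level': 'SCALE',
     'SPSS_user_missing_values': '', 'value_labels': ''},
    {'pos': '4', 'variable': 'HhOut', 'variable_label': 'Final outcome - household',
     'variable_type': 'numeric', 'SPSS_measurement_level': 'SCALE',
     'SPSS_user_missing_values': '', 'value_labels': {0.0: 'Outstanding'}},
    {'pos': '23', 'variable': 'Accom', 'variable_label': 'Type of accomodation',
     'variable_type': 'numeric', 'SPSS_measurement_level': 'NOMINAL',
     'SPSS_user_missing_values': '',  # assumed empty, not shown in the source
     'value_labels': {1.0: 'House or bungalow', 2.0: 'Flat or maisonette'}},
]

# Count how many variables use each SPSS measurement level.
level_counts = Counter(v['SPSS_measurement_level'] for v in variable_list)

# 'pos' is stored as a string, so convert it before using it as a number.
positions = [int(v['pos']) for v in variable_list]

print(level_counts)
print(positions)
```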

Here is an example of a variable that has value labels. Its 'value_labels' entry is a dictionary mapping numeric codes to labels, rather than an empty string:

In [7]:
dd.variable_list[3]

{'pos': '4',
 'variable': 'HhOut',
 'variable_label': 'Final outcome - household',
 'variable_type': 'numeric',
 'SPSS_measurement_level': 'SCALE',
 'SPSS_user_missing_values': '',
 'value_labels': {0.0: 'Outstanding',
 640.0: 'Unknown whether address is residential: No contact after 6+ calls',
 214.0: 'Productive : Household questionnaire completed – no individual interviews',
 520.0: 'Away / in hospital during survey period',
 650.0: 'Residential: unknown if eligible person(s) due to no-contact after 6+ calls',
 531.0: 'Physically unable / incompetent',
 532.0: 'Mentally unable / incompetent',
 110.0: 'Productive : Household interview completed, all eligible household members completed individual interviews and diary',
 790.0: 'Other ineligible',
 410.0: 'Office refusal',
 540.0: 'Language difficulties',
 421.0: 'Information refused about number of DUs/HHs at address',
 422.0: 'Information refused about people in household',
 810.0: 'Information refused about whether address is resid
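
The two outputs above suggest that `value_labels` is an empty string when a variable has no labels and a dictionary keyed by floats when it does. A minimal sketch of decoding a code under that assumption follows; `label_for` is a hypothetical helper, not part of the ukds package, and the sample records carry only a few of the labels shown above.

```python
def label_for(record, code):
    """Return the label for a code, or None if no label is defined.

    `record` is a dictionary shaped like an entry of dd.variable_list.
    """
    labels = record['value_labels']
    if not isinstance(labels, dict):  # '' means the variable has no value labels
        return None
    return labels.get(float(code))    # keys are floats such as 410.0


# Shortened sample records copied from the outputs in this notebook.
hh_out = {'pos': '4', 'variable': 'HhOut',
          'variable_label': 'Final outcome - household',
          'variable_type': 'numeric', 'SPSS_measurement_level': 'SCALE',
          'SPSS_user_missing_values': '',
          'value_labels': {0.0: 'Outstanding',
                           410.0: 'Office refusal',
                           540.0: 'Language difficulties'}}
serial = {'pos': '1', 'variable': 'serial',
          'variable_label': 'Household number',
          'variable_type': 'numeric', 'SPSS_measurement_level': 'SCALE',
          'SPSS_user_missing_values': '', 'value_labels': ''}

print(label_for(hh_out, 410))  # Office refusal
print(label_for(serial, 1))    # None
```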

## get_variable_dict

The `get_variable_dict` method provides access to the information for a single variable:

In [8]:
dd.get_variable_dict('serial')

{'pos': '1',
 'variable': 'serial',
 'variable_label': 'Household number',
 'variable_type': 'numeric',
 'SPSS_measurement_level': 'SCALE',
 'SPSS_user_missing_values': '',
 'value_labels': ''}

In [9]:
dd.get_variable_dict('HhOut')

{'pos': '4',
 'variable': 'HhOut',
 'variable_label': 'Final outcome - household',
 'variable_type': 'numeric',
 'SPSS_measurement_level': 'SCALE',
 'SPSS_user_missing_values': '',
 'value_labels': {0.0: 'Outstanding',
 640.0: 'Unknown whether address is residential: No contact after 6+ calls',
 214.0: 'Productive : Household questionnaire completed – no individual interviews',
 520.0: 'Away / in hospital during survey period',
 650.0: 'Residential: unknown if eligible person(s) due to no-contact after 6+ calls',
 531.0: 'Physically unable / incompetent',
 532.0: 'Mentally unable / incompetent',
 110.0: 'Productive : Household interview completed, all eligible household members completed individual interviews and diary',
 790.0: 'Other ineligible',
 410.0: 'Office refusal',
 540.0: 'Language difficulties',
 421.0: 'Information refused about number of DUs/HHs at address',
 422.0: 'Information refused about people in household',
 810.0: 'Information refused about whether address is resid
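
A lookup by name like this can be pictured as a search over `variable_list` on the 'variable' key. The sketch below is not the package's implementation. `find_variable` and `by_name` are hypothetical names, and raising `KeyError` for an unknown name is an assumption, since this notebook does not show how `get_variable_dict` handles that case.

```python
def find_variable(variable_list, name):
    """Return the first record whose 'variable' entry equals `name`."""
    for record in variable_list:
        if record['variable'] == name:
            return record
    raise KeyError(name)  # assumed behaviour for an unknown name


# Shortened sample records; in the notebook you would pass dd.variable_list.
variable_list = [
    {'pos': '1', 'variable': 'serial', 'variable_label': 'Household number'},
    {'pos': '4', 'variable': 'HhOut', 'variable_label': 'Final outcome - household'},
]

print(find_variable(variable_list, 'HhOut')['variable_label'])

# For many lookups, a name-to-record index avoids scanning the list each time.
by_name = {record['variable']: record for record in variable_list}
print(by_name['serial']['pos'])
```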

## get_variable_names

This method returns a list of all variable names in the data dictionary file.

In [14]:
print(dd.get_variable_names())

['serial', 'strata', 'psu', 'HhOut', 'hh_wt', 'IMonth', 'IYear', 'DM014', 'DM016', 'DM510', 'DM1115', 'DM1619', 'NumAdult', 'NumChild', 'NumSSex', 'NumCPart', 'NumMPart', 'NumCivP', 'DVHsize', 'Relsize', 'SelPer', 'CCPersNo', 'Accom', 'Hhldr1', 'Hhldr2', 'Hhldr3', 'Hhldr4', 'Hhldr5', 'Hhldr6', 'Hhldr7', 'Hhldr8', 'Hhldr9', 'Hhldr10', 'HiHNum', 'Tenure', 'NumRooms', 'TVSet', 'TVSetNum', 'Cable', 'CableNum', 'Games', 'GamesNum', 'Land', 'LandNum', 'Mob', 'MobPerm', 'Comp', 'CompNum', 'Microwav', 'Dishwash', 'WashMach', 'Tumble', 'Freezer', 'HmIntnet', 'IntAcc1', 'IntAcc2', 'IntAcc3', 'IntAcc4', 'IntAcc5', 'IntAcc6', 'IntPurch', 'VehOwn', 'VehNum', 'Repairs', 'Wages', 'SelfEmp', 'Pension', 'UnempBen', 'BenOth', 'Invest', 'IncOth', 'Income', 'IncCat', 'Help1', 'Help2', 'Help3', 'Help4', 'Help5', 'Help6', 'Help7', 'Help8', 'Help9', 'Help10', 'Help11', 'Help12', 'Help13', 'Help14', 'Help15', 'Help16', 'Help17', 'Help18', 'Help19', 'Help20', 'Help21', 'Help22', 'Help23', 'Help24', 'Help25', '
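
Many of the names above form numbered families such as `Hhldr1` to `Hhldr10`. As an illustration of working with the returned list, the sketch below groups names that share a stem followed by digits. `group_by_stem` is a hypothetical helper, and the sample list is a small subset of the names printed above.

```python
import re
from collections import defaultdict


def group_by_stem(names):
    """Group names of the form <stem><digits>, keeping stems with 2+ members."""
    groups = defaultdict(list)
    for name in names:
        match = re.fullmatch(r'(.*?)(\d+)', name)
        if match:
            groups[match.group(1)].append(name)
    return {stem: members for stem, members in groups.items() if len(members) > 1}


# A subset of the names returned by dd.get_variable_names().
names = ['serial', 'strata', 'psu', 'HhOut', 'Hhldr1', 'Hhldr2', 'Hhldr10',
         'IntAcc1', 'IntAcc2', 'Help1', 'Help2', 'Help25']

for stem, members in group_by_stem(names).items():
    print(stem, members)
```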

## to_rdf

This method places the `variable_list` data in an existing `rdflib.Graph`.

In [12]:
print(ukds.DataDictionary.to_rdf.__doc__)

Places the DataDictionary data in an rdflib Graph.
 
 Arguments:
 - graph (rdflib.Graph): a graph to place the data in
 - prefix (str): a prefix for the Data Dictionary ontology
 - uri (str): a uri for the Data Dictionary ontology
 
 Returns:
 - (rdflib.Graph): the input graph with the DataDictionary data inserted into it.
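
Judging from the Turtle output of the `to_rdf` call in this notebook, each variable becomes an `rdf:Property` whose dictionary entries are attached as `ukds:` predicates, with one blank node per value label. The sketch below mimics that mapping with plain tuples instead of rdflib terms. It is an assumption-laden illustration, not the package's code: `variable_to_triples` is a hypothetical function, prefixed names are kept as strings, the blank node names are made up, and `SPSS_user_missing_values` is left out because it does not appear in the output shown.

```python
def variable_to_triples(record, prefix):
    """Return (subject, predicate, object) tuples for one variable record."""
    subject = f"{prefix}:{record['variable']}"
    triples = [(subject, 'rdf:type', 'rdf:Property')]
    for key in ('variable', 'variable_label', 'variable_type',
                'SPSS_measurement_level'):
        triples.append((subject, f'ukds:{key}', record[key]))
    # 'pos' appears as an integer in the Turtle output.
    triples.append((subject, 'ukds:pos', int(record['pos'])))
    labels = record['value_labels']
    if isinstance(labels, dict):  # '' means no value labels
        for i, (value, label) in enumerate(labels.items()):
            node = f"_:{record['variable']}_label{i}"  # made-up blank node name
            triples.append((subject, 'ukds:value_labels', node))
            triples.append((node, 'ukds:value', str(value)))  # e.g. "1.0"
            triples.append((node, 'ukds:label', label))
    return triples


# Shortened sample record for the Accom variable.
accom = {'pos': '23', 'variable': 'Accom',
         'variable_label': 'Type of accomodation',
         'variable_type': 'numeric', 'SPSS_measurement_level': 'NOMINAL',
         'SPSS_user_missing_values': '',  # assumed empty, not shown in the source
         'value_labels': {1.0: 'House or bungalow', 2.0: 'Flat or maisonette'}}

triples = variable_to_triples(accom, 'o8128')
for triple in triples:
    print(triple)
```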
 
 


In [13]:
import rdflib
g=rdflib.Graph()
g=dd.to_rdf(graph=g,
 prefix='o8128',
 uri='http://purl.org/berg/ontology/10.5255/UKDA-SN-8128-1/')
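ttl=g.serialize(format='ttl')
# rdflib 6 and later return a str here; earlier versions return bytes
print(ttl.decode() if isinstance(ttl, bytes) else ttl)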

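@prefix o8128: <http://purl.org/berg/ontology/10.5255/UKDA-SN-8128-1/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ukds: .
@prefix xml: <http://www.w3.org/XML/1998/namespace> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .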

o8128:Accom a rdf:Property ;
 ukds:SPSS_measurement_level "NOMINAL" ;
 ukds:pos 23 ;
 ukds:value_labels [ ukds:label "Flat or maisonette" ;
 ukds:value "2.0" ],
 [ ukds:label "House or bungalow" ;
 ukds:value "1.0" ],
 [ ukds:label "Interview not achieved" ;
 ukds:value "-7.0" ],
 [ ukds:label "Schedule not applicable" ;
 ukds:value "-2.0" ],
 [ ukds:label "Room or rooms" ;
 ukds:value "3.0" ],
 [ ukds:label "No answer/refused" ;
 ukds:value "-9.0" ],
 [ ukds:label "Item not applicable" ;
 ukds:value "-1.0" ],
 [ ukds:label "Don't know" ;
 ukds:value "-8.0" ],
 [ ukds:label "Other" ;
 ukds:value "4.0" ] ;
 ukds:variable "Accom" ;
 ukds:variable_label "Type of accomodation" ;
 ukds:variable_type "numeric" .

o8128:BenOth a rdf:Property ;
 ukds:SPSS_measurement_level "SCALE" ;
 ukds:pos 69 ;
 ukds:value_labels [ ukds:label "Yes" ;
 ukds:value "1.0" ],
 [ ukds:label "No answer/refused" ;
 ukds:v