# DataFrame Conversions

This notebook demonstrates how to [convert](https://nexus-forge.readthedocs.io/en/latest/interaction.html#converting) a [Resource](https://nexus-forge.readthedocs.io/en/latest/interaction.html#resource) to a pandas DataFrame and vice versa.

In [1]:
from kgforge.core import KnowledgeGraphForge

A configuration file is needed to create a `KnowledgeGraphForge` session. One can be generated using the notebook [00-Initialization.ipynb](00%20-%20Initialization.ipynb).

In [2]:
forge = KnowledgeGraphForge("../../configurations/forge.yml")

## Imports

In [3]:
import pandas as pd
import numpy as np

In [4]:
from kgforge.core import Resource

## List of Resources to DataFrame

In [5]:
address = Resource(type="PostalAddress", country="Switzerland", locality="Geneva")

In [6]:
jane = Resource(type="Person", name="Jane Doe", address=address, email="(missing)")

In [7]:
john = Resource(type="Person", name="John Smith", email="john.smith@epfl.ch")

In [8]:
persons = [jane, john]

In [9]:
forge.register(persons)

<count> 2
<action> _register_many
<succeeded> True


In [10]:
forge.as_json(jane)

{'id': 'https://bbp.epfl.ch/nexus/v1/resources/dke/kgforge/_/99105664-6d99-45a5-90e8-82f58e45f36a',
 'type': 'Person',
 'address': {'type': 'PostalAddress',
  'country': 'Switzerland',
  'locality': 'Geneva'},
 'email': '(missing)',
 'name': 'Jane Doe'}

In [11]:
forge.as_json(john)

{'id': 'https://bbp.epfl.ch/nexus/v1/resources/dke/kgforge/_/9b4976dc-6fb8-49fb-892d-a634d27eac3b',
 'type': 'Person',
 'email': 'john.smith@epfl.ch',
 'name': 'John Smith'}

In [12]:
john._store_metadata

{'id': 'https://bbp.epfl.ch/nexus/v1/resources/dke/kgforge/_/9b4976dc-6fb8-49fb-892d-a634d27eac3b',
 '_constrainedBy': 'https://bluebrain.github.io/nexus/schemas/unconstrained.json',
 '_createdAt': '2022-04-12T22:24:14.009Z',
 '_createdBy': 'https://bbp.epfl.ch/nexus/v1/realms/bbp/users/sy',
 '_deprecated': False,
 '_incoming': 'https://bbp.epfl.ch/nexus/v1/resources/dke/kgforge/_/9b4976dc-6fb8-49fb-892d-a634d27eac3b/incoming',
 '_outgoing': 'https://bbp.epfl.ch/nexus/v1/resources/dke/kgforge/_/9b4976dc-6fb8-49fb-892d-a634d27eac3b/outgoing',
 '_project': 'https://bbp.epfl.ch/nexus/v1/projects/dke/kgforge',
 '_rev': 1,
 '_schemaProject': 'https://bbp.epfl.ch/nexus/v1/projects/dke/kgforge',
 '_self': 'https://bbp.epfl.ch/nexus/v1/resources/dke/kgforge/_/9b4976dc-6fb8-49fb-892d-a634d27eac3b',
 '_updatedAt': '2022-04-12T22:24:14.009Z',
 '_updatedBy': 'https://bbp.epfl.ch/nexus/v1/realms/bbp/users/sy'}

In [13]:
forge.as_dataframe(persons)

| | id | type | address.type | address.country | address.locality | email | name |
|---|---|---|---|---|---|---|---|
| 0 | https://bbp.epfl.ch/nexus/v1/resources/dke/kgf... | Person | PostalAddress | Switzerland | Geneva | (missing) | Jane Doe |
| 1 | https://bbp.epfl.ch/nexus/v1/resources/dke/kgf... | Person | NaN | NaN | NaN | john.smith@epfl.ch | John Smith |


Use the `na` parameter to specify which values (here `'(missing)'`) should be replaced by `NaN`.

In [14]:
forge.as_dataframe(persons, na="(missing)")

| | id | type | address.type | address.country | address.locality | email | name |
|---|---|---|---|---|---|---|---|
| 0 | https://bbp.epfl.ch/nexus/v1/resources/dke/kgf... | Person | PostalAddress | Switzerland | Geneva | NaN | Jane Doe |
| 1 | https://bbp.epfl.ch/nexus/v1/resources/dke/kgf... | Person | NaN | NaN | NaN | john.smith@epfl.ch | John Smith |
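
As a rough illustration of what the `na` parameter does conceptually, the sketch below replaces a sentinel string with `NaN` in a list of plain dictionaries. It uses only the standard library. The helper name `replace_sentinel` is hypothetical and not part of kgforge, whose internals may differ.

```python
import math


def replace_sentinel(rows, na):
    """Return copies of the rows with every value equal to `na` replaced by NaN.

    Hypothetical helper for illustration only; not a kgforge function.
    """
    return [
        {key: (math.nan if value == na else value) for key, value in row.items()}
        for row in rows
    ]


rows = [
    {"name": "Jane Doe", "email": "(missing)"},
    {"name": "John Smith", "email": "john.smith@epfl.ch"},
]
cleaned = replace_sentinel(rows, na="(missing)")
print(cleaned[0]["email"])  # nan
print(cleaned[1]["email"])  # john.smith@epfl.ch
```

Only values exactly equal to the sentinel are replaced, so other properties are left untouched.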


Use the `nesting` parameter to specify the string that separates nested property names in the column names. The default is a dot (`.`).

In [15]:
forge.as_dataframe(persons, nesting="__")

| | id | type | address__type | address__country | address__locality | email | name |
|---|---|---|---|---|---|---|---|
| 0 | https://bbp.epfl.ch/nexus/v1/resources/dke/kgf... | Person | PostalAddress | Switzerland | Geneva | (missing) | Jane Doe |
| 1 | https://bbp.epfl.ch/nexus/v1/resources/dke/kgf... | Person | NaN | NaN | NaN | john.smith@epfl.ch | John Smith |
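
The effect of the nesting separator on column names can be sketched with a small recursive function over plain dictionaries. This is a minimal standard-library illustration. The function `flatten` is a hypothetical name and not kgforge's implementation.

```python
def flatten(record, nesting=".", prefix=""):
    """Flatten nested dictionaries into one level, joining keys with `nesting`.

    Hypothetical helper for illustration only; not a kgforge function.
    """
    flat = {}
    for key, value in record.items():
        column = f"{prefix}{nesting}{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse, carrying the column name built so far as the prefix.
            flat.update(flatten(value, nesting, column))
        else:
            flat[column] = value
    return flat


jane = {
    "type": "Person",
    "address": {
        "type": "PostalAddress",
        "country": "Switzerland",
        "locality": "Geneva",
    },
    "name": "Jane Doe",
}
print(list(flatten(jane)))
# ['type', 'address.type', 'address.country', 'address.locality', 'name']
print(list(flatten(jane, nesting="__")))
# ['type', 'address__type', 'address__country', 'address__locality', 'name']
```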


Setting `expanded=True` shows fields and values expanded according to the JSON-LD context.

In [16]:
forge.as_dataframe(persons, expanded=True)

| | @id | @type | http://schema.org/address | http://schema.org/email | http://schema.org/name |
|---|---|---|---|---|---|
| 0 | https://bbp.epfl.ch/nexus/v1/resources/dke/kgf... | [http://schema.org/Person] | [{'@type': ['https://bbp.epfl.ch/nexus/v1/reso... | [{'@value': '(missing)'}] | [{'@value': 'Jane Doe'}] |
| 1 | https://bbp.epfl.ch/nexus/v1/resources/dke/kgf... | [http://schema.org/Person] | NaN | [{'@value': 'john.smith@epfl.ch'}] | [{'@value': 'John Smith'}] |
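
To give a feel for what expansion means, here is a deliberately simplified sketch that maps short property names to full IRIs and wraps literal values in `@value` objects. It assumes a single vocabulary (`http://schema.org/`) and a hypothetical helper `expand`. It is neither the full JSON-LD expansion algorithm nor kgforge's implementation of `expanded=True`, which resolves names through the configured context.

```python
VOCAB = "http://schema.org/"  # assumed vocabulary, for illustration only


def expand(record, vocab=VOCAB):
    """Very simplified stand-in for JSON-LD expansion (hypothetical helper)."""
    expanded = {}
    for key, value in record.items():
        if key == "id":
            expanded["@id"] = value
        elif key == "type":
            expanded["@type"] = [vocab + value]
        elif isinstance(value, dict):
            # Nested objects are expanded recursively and wrapped in a list.
            expanded[vocab + key] = [expand(value, vocab)]
        else:
            # Literal values become value objects.
            expanded[vocab + key] = [{"@value": value}]
    return expanded


john = {"type": "Person", "name": "John Smith", "email": "john.smith@epfl.ch"}
expanded_john = expand(john)
print(expanded_john["@type"])                   # ['http://schema.org/Person']
print(expanded_john["http://schema.org/name"])  # [{'@value': 'John Smith'}]
```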


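Setting `store_metadata=True` adds the store metadata (the fields shown by `john._store_metadata` above) as extra columns.

In [17]: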
forge.as_dataframe(persons, store_metadata=True)

| | id | type | address.type | address.country | address.locality | email | name | _constrainedBy | _createdAt | _createdBy | _deprecated | _incoming | _outgoing | _project | _rev | _schemaProject | _self | _updatedAt | _updatedBy |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | https://bbp.epfl.ch/nexus/v1/resources/dke/kgf... | Person | PostalAddress | Switzerland | Geneva | (missing) | Jane Doe | https://bluebrain.github.io/nexus/schemas/unco... | 2022-04-12T22:24:14.013Z | https://bbp.epfl.ch/nexus/v1/realms/bbp/users/sy | False | https://bbp.epfl.ch/nexus/v1/resources/dke/kgf... | https://bbp.epfl.ch/nexus/v1/resources/dke/kgf... | https://bbp.epfl.ch/nexus/v1/projects/dke/kgforge | 1 | https://bbp.epfl.ch/nexus/v1/projects/dke/kgforge | https://bbp.epfl.ch/nexus/v1/resources/dke/kgf... | 2022-04-12T22:24:14.013Z | https://bbp.epfl.ch/nexus/v1/realms/bbp/users/sy |
| 1 | https://bbp.epfl.ch/nexus/v1/resources/dke/kgf... | Person | NaN | NaN | NaN | john.smith@epfl.ch | John Smith | https://bluebrain.github.io/nexus/schemas/unco... | 2022-04-12T22:24:14.009Z | https://bbp.epfl.ch/nexus/v1/realms/bbp/users/sy | False | https://bbp.epfl.ch/nexus/v1/resources/dke/kgf... | https://bbp.epfl.ch/nexus/v1/resources/dke/kgf... | https://bbp.epfl.ch/nexus/v1/projects/dke/kgforge | 1 | https://bbp.epfl.ch/nexus/v1/projects/dke/kgforge | https://bbp.epfl.ch/nexus/v1/resources/dke/kgf... | 2022-04-12T22:24:14.009Z | https://bbp.epfl.ch/nexus/v1/realms/bbp/users/sy |


## DataFrame to list of Resources

In [20]:
data = pd.DataFrame([
    {
        "type": "Person",
        "address.type": "PostalAddress",
        "address.country": "Switzerland",
        "address.locality": "Geneva",
        "email": "(missing)",
        "name": "Jane Doe",
    },
    {
        "type": "Person",
        "address.type": np.nan,
        "address.country": np.nan,
        "address.locality": np.nan,
        "email": "john.smith@epfl.ch",
        "name": "John Smith",
    }
])

In [21]:
data

| | type | address.type | address.country | address.locality | email | name |
|---|---|---|---|---|---|---|
| 0 | Person | PostalAddress | Switzerland | Geneva | (missing) | Jane Doe |
| 1 | Person | NaN | NaN | NaN | john.smith@epfl.ch | John Smith |


In [22]:
resources = forge.from_dataframe(data)

In [23]:
address = Resource(type="PostalAddress", country="Switzerland", locality="Geneva")

In [24]:
jane = Resource(type="Person", name="Jane Doe", address=address, email="(missing)")

In [25]:
john = Resource(type="Person", name="John Smith", email="john.smith@epfl.ch")

In [26]:
persons = [jane, john]

In [27]:
resources == persons

True

In [28]:
resources_na = forge.from_dataframe(data, na="(missing)")

In [29]:
print(resources[0])

{
    type: Person
    address:
    {
        type: PostalAddress
        country: Switzerland
        locality: Geneva
    }
    email: (missing)
    name: Jane Doe
}


In [30]:
print(resources_na[0])

{
    type: Person
    address:
    {
        type: PostalAddress
        country: Switzerland
        locality: Geneva
    }
    name: Jane Doe
}
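
The two printed resources differ only in the `email` property: with `na="(missing)"`, the property holding the sentinel is left out. A minimal standard-library sketch of that behaviour on a single row follows. The helper `drop_missing` is a hypothetical name, and the real conversion in kgforge may work differently.

```python
import math


def drop_missing(row, na=None):
    """Return the row without NaN values and without values equal to `na`.

    Hypothetical helper for illustration only; not a kgforge function.
    """
    def is_missing(value):
        if isinstance(value, float) and math.isnan(value):
            return True
        return na is not None and value == na

    return {key: value for key, value in row.items() if not is_missing(value)}


row = {"type": "Person", "email": "(missing)", "name": "Jane Doe", "address.type": math.nan}
print(drop_missing(row))
# {'type': 'Person', 'email': '(missing)', 'name': 'Jane Doe'}
print(drop_missing(row, na="(missing)"))
# {'type': 'Person', 'name': 'Jane Doe'}
```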


In [31]:
resources_nesting = forge.from_dataframe(data, nesting=".")

In [32]:
print(resources_nesting[0])

{
    type: Person
    address:
    {
        type: PostalAddress
        country: Switzerland
        locality: Geneva
    }
    email: (missing)
    name: Jane Doe
}


In [33]:
data = pd.DataFrame([
    {
        "type": "Person",
        "address_type": "PostalAddress",
        "address_country": "Switzerland",
        "address_locality": "Geneva",
        "email": "(missing)",
        "name": "Jane Doe",
    },
    {
        "type": "Person",
        "address_type": np.nan,
        "address_country": np.nan,
        "address_locality": np.nan,
        "email": "john.smith@epfl.ch",
        "name": "John Smith",
    }
])

In [34]:
resources_nesting = forge.from_dataframe(data, nesting="_")

In [35]:
print(resources_nesting[0])

{
    type: Person
    address:
    {
        type: PostalAddress
        country: Switzerland
        locality: Geneva
    }
    email: (missing)
    name: Jane Doe
}
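
Going from flat column names back to nested properties amounts to splitting each column name on the separator. The sketch below shows the idea on a plain dictionary using only the standard library. The function `unflatten` is a hypothetical name and not kgforge's implementation.

```python
def unflatten(row, nesting="."):
    """Rebuild nested dictionaries from flat column names split on `nesting`.

    Hypothetical helper for illustration only; not a kgforge function.
    """
    nested = {}
    for column, value in row.items():
        *parents, leaf = column.split(nesting)
        target = nested
        for parent in parents:
            # Create the intermediate dictionary on first use.
            target = target.setdefault(parent, {})
        target[leaf] = value
    return nested


row = {
    "type": "Person",
    "address_type": "PostalAddress",
    "address_country": "Switzerland",
    "address_locality": "Geneva",
    "name": "Jane Doe",
}
person = unflatten(row, nesting="_")
print(person["address"])
# {'type': 'PostalAddress', 'country': 'Switzerland', 'locality': 'Geneva'}
```

In this sketch, a separator that also occurs inside ordinary property names (for example `given_name` with `nesting="_"`) would be split as well, so a separator that cannot clash with property names is the safer choice.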
