# Data mining using pyiron tables

In this example, the data mining capabilities of pyiron are demonstrated using the `PyironTables` class: the ground state properties of fcc Al are computed with several force fields and then compared.

In [1]:
from pyiron import Project
import numpy as np

In [2]:
pr = Project("potential_scan")

## Creating a dummy job to get list of potentials

To get the list of available LAMMPS potentials, a dummy job with an Al bulk structure is created.

In [3]:
dummy_job = pr.create_job(pr.job_type.Lammps, "dummy_job")
dummy_job.structure = pr.create_ase_bulk("Al")
# Choose only a few potentials to run (you can play with this value)
num_potentials = 5
potential_list = dummy_job.list_potentials()[:num_potentials]

## Creating a Murnaghan job for each potential in their respective subprojects

A separate Murnaghan job (to compute the equilibrium lattice constant and the bulk modulus) is created and run for every potential.

In [4]:
for pot in potential_list:
    pot_str = pot.replace("-", "_")
    # open a subproject within a project
    with pr.open(pot_str) as pr_sub:
        # no need for unique job name if in different subprojects 
        job_name = "murn_Al"
        # Use the subproject to create the jobs
        murn = pr_sub.create_job(pr.job_type.Murnaghan, job_name)
        job_ref = pr_sub.create_job(pr.job_type.Lammps, "Al_ref")
        job_ref.structure = pr.create_ase_bulk("Al", cubic=True)
        job_ref.potential = pot
        job_ref.calc_minimize()
        murn.ref_job = job_ref
        # Some potentials may not work with certain LAMMPS compilations.
        # Therefore, we need to have a little exception handling
        try:
            murn.run()
        except RuntimeError:
            pass

The job murn_Al was saved and received the ID: 1
The job strain_0_9 was saved and received the ID: 2
The job strain_0_92 was saved and received the ID: 3
The job strain_0_94 was saved and received the ID: 4
The job strain_0_96 was saved and received the ID: 5
The job strain_0_98 was saved and received the ID: 6
The job strain_1_0 was saved and received the ID: 7
The job strain_1_02 was saved and received the ID: 8
The job strain_1_04 was saved and received the ID: 9
The job strain_1_06 was saved and received the ID: 10
The job strain_1_08 was saved and received the ID: 11
The job strain_1_1 was saved and received the ID: 12
job_id:  2 finished
job_id:  3 finished
job_id:  4 finished
job_id:  5 finished
job_id:  6 finished
job_id:  7 finished
job_id:  8 finished
job_id:  9 finished
job_id:  10 finished
job_id:  11 finished
job_id:  12 finished
The job murn_Al was saved and received the ID: 13
The job strain_0_9 was saved and received the ID: 14
The job strain_0_92 was saved and received

Reading data file ...
  orthogonal box = (0 0 0) to (3.91023 3.91023 3.91023)
  1 by 1 by 1 MPI processor grid
  reading atoms ...
  4 atoms
  read_data CPU = 0.00191307 secs
ERROR: MEAM library error 3 (src/USER-MEAMC/pair_meamc.cpp:596)
Last command: pair_coeff * * MgAlZn.library.meam Mg Al MgAlZn.parameter.meam Mg Al Zn



The job murn_Al was saved and received the ID: 39
The job strain_0_9 was saved and received the ID: 40
The job strain_0_92 was saved and received the ID: 41
The job strain_0_94 was saved and received the ID: 42
The job strain_0_96 was saved and received the ID: 43
The job strain_0_98 was saved and received the ID: 44
The job strain_1_0 was saved and received the ID: 45
The job strain_1_02 was saved and received the ID: 46
The job strain_1_04 was saved and received the ID: 47
The job strain_1_06 was saved and received the ID: 48
The job strain_1_08 was saved and received the ID: 49
The job strain_1_1 was saved and received the ID: 50
job_id:  40 finished
job_id:  41 finished
job_id:  42 finished
job_id:  43 finished
job_id:  44 finished
job_id:  45 finished
job_id:  46 finished
job_id:  47 finished
job_id:  48 finished
job_id:  49 finished
job_id:  50 finished
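
To make the output above concrete: each `strain_*` child job evaluates the energy at one scaled volume, and the Murnaghan parent turns the resulting energy-volume curve into an equilibrium volume and a bulk modulus. The standalone sketch below illustrates that last step on synthetic data. It is not pyiron's fitting routine: it uses a simple three-point parabola around the lowest-energy point, and the helper names, the reference volume, and the synthetic energy curve are all assumptions made for illustration.

```python
EV_PER_A3_TO_GPA = 160.2176634  # 1 eV/Å^3 expressed in GPa


def parabola_through(points):
    """Coefficients (a, b, c) of E = a*V**2 + b*V + c through three (V, E) points."""
    (x0, y0), (x1, y1), (x2, y2) = points
    d01 = (y1 - y0) / (x1 - x0)  # first divided differences
    d12 = (y2 - y1) / (x2 - x1)
    a = (d12 - d01) / (x2 - x0)  # second divided difference
    b = d01 - a * (x0 + x1)
    c = y0 - a * x0**2 - b * x0
    return a, b, c


def equilibrium_from_scan(volumes, energies):
    """Estimate (V0, B0 in GPa) from a volume scan with a three-point parabola."""
    i = min(range(len(energies)), key=energies.__getitem__)
    if i in (0, len(energies) - 1):
        raise ValueError("energy minimum lies at the edge of the scanned range")
    a, b, _ = parabola_through(list(zip(volumes[i - 1:i + 2], energies[i - 1:i + 2])))
    v0 = -b / (2 * a)                   # condition dE/dV = 0
    b0 = v0 * 2 * a * EV_PER_A3_TO_GPA  # B = V * d2E/dV2, evaluated at V0
    return v0, b0


# Synthetic, exactly quadratic E(V) curve (assumed values, not simulation output)
V0_TRUE, B0_TRUE_GPA, E0 = 66.0, 80.0, -13.4
k = B0_TRUE_GPA / EV_PER_A3_TO_GPA / V0_TRUE  # curvature d2E/dV2 in eV/Å^6

v_ref = 66.4  # volume of the hypothetical reference cell in Å^3
strains = [round(0.90 + 0.02 * n, 2) for n in range(11)]  # 0.90 ... 1.10, as in the job names
volumes = [s * v_ref for s in strains]
energies = [E0 + 0.5 * k * (v - V0_TRUE) ** 2 for v in volumes]

v0, b0 = equilibrium_from_scan(volumes, energies)
a_eq = v0 ** (1 / 3)  # lattice parameter of the cubic cell
print(f"V0 = {v0:.3f} Å^3, a_eq = {a_eq:.4f} Å, B0 = {b0:.2f} GPa")
```

Because the synthetic curve is exactly quadratic, the parabola recovers the input values; real energy-volume data are not, which is why a dedicated fit over all scanned points is preferable in practice.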


If you inspect the job table, you will find that each Murnaghan job generates several small LAMMPS jobs (see the `hamilton` column). Some of these jobs might have failed and therefore have the status `aborted`.

In [5]:
pr.job_table()

Unnamed: 0,id,status,chemicalformula,job,subjob,projectpath,project,timestart,timestop,totalcputime,computer,hamilton,hamversion,parentid,masterid
0,1,finished,Al4,murn_Al,/murn_Al,/home/surendralal/,programs/pyiron/notebooks/potential_scan/Al_Mg_Mendelev_eam/,2020-05-01 14:20:15.185926,2020-05-01 14:20:52.212726,37.0,pyiron@cmdell17#1#11/11,Murnaghan,0.3.0,,
1,2,finished,Al4,strain_0_9,/strain_0_9,/home/surendralal/,programs/pyiron/notebooks/potential_scan/Al_Mg_Mendelev_eam/murn_Al_hdf5/,2020-05-01 14:20:16.872239,2020-05-01 14:20:18.199291,1.0,pyiron@cmdell17#1,Lammps,0.1,,1.0
2,3,finished,Al4,strain_0_92,/strain_0_92,/home/surendralal/,programs/pyiron/notebooks/potential_scan/Al_Mg_Mendelev_eam/murn_Al_hdf5/,2020-05-01 14:20:20.376998,2020-05-01 14:20:21.474685,1.0,pyiron@cmdell17#1,Lammps,0.1,,1.0
3,4,finished,Al4,strain_0_94,/strain_0_94,/home/surendralal/,programs/pyiron/notebooks/potential_scan/Al_Mg_Mendelev_eam/murn_Al_hdf5/,2020-05-01 14:20:23.410323,2020-05-01 14:20:24.454505,1.0,pyiron@cmdell17#1,Lammps,0.1,,1.0
4,5,finished,Al4,strain_0_96,/strain_0_96,/home/surendralal/,programs/pyiron/notebooks/potential_scan/Al_Mg_Mendelev_eam/murn_Al_hdf5/,2020-05-01 14:20:26.407384,2020-05-01 14:20:27.448024,1.0,pyiron@cmdell17#1,Lammps,0.1,,1.0
5,6,finished,Al4,strain_0_98,/strain_0_98,/home/surendralal/,programs/pyiron/notebooks/potential_scan/Al_Mg_Mendelev_eam/murn_Al_hdf5/,2020-05-01 14:20:29.389853,2020-05-01 14:20:30.457648,1.0,pyiron@cmdell17#1,Lammps,0.1,,1.0
6,7,finished,Al4,strain_1_0,/strain_1_0,/home/surendralal/,programs/pyiron/notebooks/potential_scan/Al_Mg_Mendelev_eam/murn_Al_hdf5/,2020-05-01 14:20:32.440577,2020-05-01 14:20:33.587692,1.0,pyiron@cmdell17#1,Lammps,0.1,,1.0
7,8,finished,Al4,strain_1_02,/strain_1_02,/home/surendralal/,programs/pyiron/notebooks/potential_scan/Al_Mg_Mendelev_eam/murn_Al_hdf5/,2020-05-01 14:20:35.659606,2020-05-01 14:20:36.717203,1.0,pyiron@cmdell17#1,Lammps,0.1,,1.0
8,9,finished,Al4,strain_1_04,/strain_1_04,/home/surendralal/,programs/pyiron/notebooks/potential_scan/Al_Mg_Mendelev_eam/murn_Al_hdf5/,2020-05-01 14:20:39.247825,2020-05-01 14:20:40.631913,1.0,pyiron@cmdell17#1,Lammps,0.1,,1.0
9,10,finished,Al4,strain_1_06,/strain_1_06,/home/surendralal/,programs/pyiron/notebooks/potential_scan/Al_Mg_Mendelev_eam/murn_Al_hdf5/,2020-05-01 14:20:43.093369,2020-05-01 14:20:44.365442,1.0,pyiron@cmdell17#1,Lammps,0.1,,1.0


## Analysis using `PyironTables`

The idea now is to go over all finished Murnaghan jobs, extract the equilibrium lattice parameter and bulk modulus, and classify them based on the potential used.

### Defining filter functions

Since a project can contain thousands, if not millions, of jobs, it is necessary to filter the data and apply the functions (some of which can be computationally expensive) only to the jobs of interest. In this example, we need the jobs that are finished and of type `Murnaghan`. This can be done in two ways: using the job table, i.e. the entries in the database, or using the job itself, i.e. the entries in the stored HDF5 file. Below are examples of filter functions acting on the job table and on the job, respectively.

In [6]:
# Filtering using the database entries (which are obtained as a pandas DataFrame)
def db_filter_function(job_table):
    # Returns a pandas Series of boolean values (True for entries that have status finished 
    # and hamilton type Murnaghan.)
    return (job_table.status == "finished") & (job_table.hamilton == "Murnaghan")

# Filtering based on the job
def job_filter_function(job):
    # Returns True if the status of the job is finished
    # and "murn" is in its job name
    return (job.status == "finished") and ("murn" in job.job_name)

Filtering via the database is faster in this case because no job has to be opened, but sometimes it is necessary to filter on data that are stored only in the HDF5 file of the job. The database filter is applied first, followed by the job-based filter.
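
The order of the two filters matters for cost. The following sketch mimics it with plain Python objects: a cheap metadata filter runs on every entry, and only the survivors are "loaded" for the second filter. `Entry`, `load_job`, and the sample rows are hypothetical stand-ins, not pyiron objects.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    """Hypothetical stand-in for one row of the job table."""
    job_id: int
    job_name: str
    status: str
    hamilton: str


entries = [
    Entry(1, "murn_Al", "finished", "Murnaghan"),
    Entry(2, "strain_0_9", "finished", "Lammps"),
    Entry(3, "strain_0_92", "aborted", "Lammps"),
    Entry(13, "murn_Al", "aborted", "Murnaghan"),
]

loaded_ids = []  # records which entries were "opened"


def load_job(entry):
    """Pretend to open the job's HDF5 file (the expensive step)."""
    loaded_ids.append(entry.job_id)
    return entry


def db_filter(entry):
    # Cheap: uses only metadata that is already in the table
    return entry.status == "finished" and entry.hamilton == "Murnaghan"


def job_filter(job):
    # Potentially expensive: needs the loaded job
    return "murn" in job.job_name


# Stage 1 runs on all entries; stage 2 only on what stage 1 lets through
candidates = [e for e in entries if db_filter(e)]
selected = [job.job_id for job in map(load_job, candidates) if job_filter(job)]
print(selected, loaded_ids)
```

Only one of the four entries is ever loaded here, which is the reason to put as much of the selection logic as possible into the database filter.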

### Defining functions that act on jobs

Now we define a set of functions, each of which is applied to a job and returns a certain value. The advantage of this approach is that the jobs do not have to be loaded again for every quantity: the filtered jobs are loaded once and then passed to all of these functions to construct the table.

In [7]:
# Getting equilibrium lattice parameter from Murnaghan jobs
def get_lattice_parameter(job):
    return job["output/equilibrium_volume"] ** (1/3)

# Getting equilibrium bulk modulus from Murnaghan jobs
def get_bm(job):
    return job["output/equilibrium_bulk_modulus"]

# Getting the potential used in each Murnaghan job
def get_pot(job):
    child = job.project.inspect(job["output/id"][0])
    return child["input/potential/Name"]
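
The "load once, apply many" pattern described above can be sketched without pyiron. Here each job is represented by a plain dictionary keyed like the HDF5 paths used in the functions above; the dictionaries, their values, and `build_rows` are assumptions for illustration only and do not reflect how pyiron implements its tables internally.

```python
def get_lattice_parameter(job):
    return job["output/equilibrium_volume"] ** (1 / 3)


def get_bm(job):
    return job["output/equilibrium_bulk_modulus"]


def build_rows(jobs, functions):
    """Apply every labelled function to each job, visiting each job once."""
    rows = []
    for job_id, job in jobs.items():
        row = {"job_id": job_id}
        for label, func in functions.items():
            row[label] = func(job)
        rows.append(row)
    return rows


# Made-up job contents (not results from the runs above)
jobs = {
    1: {"output/equilibrium_volume": 64.0, "output/equilibrium_bulk_modulus": 80.0},
    13: {"output/equilibrium_volume": 66.0, "output/equilibrium_bulk_modulus": 78.5},
}
functions = {"a_eq": get_lattice_parameter, "bulk_modulus": get_bm}

rows = build_rows(jobs, functions)
for row in rows:
    print(row)
```

Each label in `functions` becomes one column of the resulting table, and adding another quantity only means adding another entry to that dictionary.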

### Creating a pyiron table

Now that all the functions are defined, the pyiron table called "table" is created in the following way. The table works like a job and can be reloaded at any time.

In [8]:
%%time
# creating a pyiron table
table = pr.create_table("table")

# assigning a database filter function
table.db_filter_function = db_filter_function

# Alternatively/additionally, a job based filter function can be applied 
# (it does the same thing in this case).

#table.filter_function = job_filter_function

# Adding the functions using the labels you like
table.add["a_eq"] = get_lattice_parameter
table.add["bulk_modulus"] = get_bm
table.add["potential"] = get_pot
# Running the table to generate the data
table.run(run_again=True)

  0%|          | 0/4 [00:00<?, ?it/s]

The job table was saved and received the ID: 51


100%|██████████| 4/4 [00:00<00:00, 20.91it/s]


CPU times: user 531 ms, sys: 156 ms, total: 688 ms
Wall time: 725 ms


The output can now be obtained as a pandas DataFrame:

In [9]:
table.get_dataframe()

Unnamed: 0,job_id,a_eq,bulk_modulus,potential
0,1,4.045415,89.015487,Al_Mg_Mendelev_eam
1,13,4.049946,80.836779,Zope_Ti_Al_2003_eam
2,25,4.049954,81.040445,Al_H_Ni_Angelo_eam
3,39,4.031246,78.213776,2000--Landa-A--Al-Pb--LAMMPS--ipr1


You can now compare the equilibrium lattice constants computed for each potential to those listed for fcc Al in the NIST Interatomic Potentials Repository: https://www.ctcms.nist.gov/potentials/system/Al/#Al
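
As a starting point for such a comparison, the sketch below ranks the four potentials from the table by the deviation of `a_eq` from a reference value. The `a_eq` values are copied from the DataFrame above; the reference of 4.05 Å is an approximate, assumed value for fcc Al that you should replace with the figure you want to compare against.

```python
# a_eq values (Å) copied from the table output above
a_eq = {
    "Al_Mg_Mendelev_eam": 4.045415,
    "Zope_Ti_Al_2003_eam": 4.049946,
    "Al_H_Ni_Angelo_eam": 4.049954,
    "2000--Landa-A--Al-Pb--LAMMPS--ipr1": 4.031246,
}

A_REF = 4.05  # Å, assumed approximate reference; substitute your own value

# Relative deviation from the reference in percent
deviation = {name: 100 * (a - A_REF) / A_REF for name, a in a_eq.items()}

# Print from largest to smallest absolute deviation
ranked = sorted(deviation, key=lambda name: abs(deviation[name]), reverse=True)
for name in ranked:
    print(f"{name:36s} {deviation[name]:+.3f} %")
```

The ranking among the closely spaced values depends on the reference chosen, so only the larger deviations should be read as meaningful.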