{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "Back to the main [Index](index.ipynb) " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Tasks, Workflows and Flow\n", "\n", "\n", "## Table of Contents\n", "[[back to top](#top)]\n", "\n", "- [Building a Flow for band structure calculations](#Building-a-Flow-for-band-structure-calculations)\n", "- [How to build and run the Flow](#How-to-build-and-run-the-Flow)\n", "- [Executing a Flow](#Executing-a-Flow)\n", "- [More on Works, Tasks and dependencies](#More-on-Works,-Tasks-and-dependencies)\n", "- [Abirun.py](#Abirun.py)\n", "\n", "In this notebook we discuss some of the basic concepts used in AbiPy \n", "to automate ab-initio calculations. \n", "In particular, we will focus on the following three objects: \n", "\n", " * `Task`\n", " * `Work`\n", " * `Flow`\n", " \n", "The `Task` represents the most *elementary step* of the automatic workflow. \n", "Roughly speaking, it corresponds to the execution of a single Abinit calculation **without** multiple datasets.\n", "\n", "From the point of view of AbiPy, a calculation consists of a set of `Tasks` that are connected \n", "by dependencies. \n", "Each task has a list of files that are needed to start the calculation, \n", "and a list of files that are produced at the end of the run.\n", "\n", "Some of the input files needed by a `Task` must be provided by the user in the form of Abinit input variables \n", "(e.g. 
the crystalline structure, the pseudopotentials), while other inputs may be produced by other tasks.\n", "When a `Task` **B** requires the output file `DEN` of another task **A**, \n", "we say that **B** depends on **A** through a `DEN` file, and we express this dependency with the dictionary:\n", "\n", "```python\n", "B_deps = {A: \"DEN\"}\n", "```\n", "\n", "To clarify this point, let's take a standard KS band structure calculation as an example.\n", "In this case, we have an initial `ScfTask` that solves the KS equations self-consistently to produce a `DEN` file. \n", "The density is then used by a second `NscfTask` to compute a band structure on an arbitrary \n", "set of $k$-points.\n", "The `NscfTask` thus has a dependency on the `ScfTask` in the sense that it cannot be executed \n", "until the `ScfTask` has completed and produced the `DEN` file.\n", "\n", "Now that we have clarified the concept of `Task`, we can finally turn to `Works` and `Flow`.\n", "The `Work` can be seen as a list of `Tasks`, while the `Flow` is essentially a list of `Work` objects.\n", "Works are usually used to group tasks that are connected to each other.\n", "Flows are the final objects that are executed: the `Flow` provides a high-level, easy-to-use API\n", "to perform common operations such as launching the actual jobs, checking the status of the `Tasks`, correcting problems, etc. 
\n", "\n", "AbiPy provides several tools, the so-called factory functions, to generate Flows for typical first-principles calculations.\n", "This means that you do not need to understand all the technical details of the python implementation.\n", "In many cases, indeed, we already provide some kind of `Work` or `Flow` that automates \n", "your calculation, and you only need to provide the correct list of input files.\n", "This list, obviously, must be consistent with the kind of Flow/Work you are using.\n", "(For instance, you should not pass a list of inputs for performing a band structure calculation to a Work \n", "that is expected to compute phonons with DFPT!)\n", "\n", "All the `Works` and the `Tasks` of a flow are created and executed inside the working directory (`workdir`). \n", "The `workdir` is usually specified by the user during the creation of the `Flow` object.\n", "AbiPy creates the workdir of the different Works/Tasks when the `Flow` is executed\n", "for the first time.\n", "\n", "Each `Task` contains a set of input variables that will be used to generate the \n", "Abinit input file. \n", "This input **must** be provided by the user during the creation of the `Task`.\n", "Fortunately, AbiPy provides an object named `AbinitInput` to facilitate the creation \n", "of such input. \n", "Once you have an `AbinitInput`, you can create the corresponding `Task` with the (pseudo) code:\n", "\n", "```python\n", "new_task = Task(abinit_input_object)\n", "```\n", "\n", "The `Task` provides several methods for monitoring the status of the calculation and \n", "post-processing the results.\n", "Note that the concept of dependency is not limited to files. All the Tasks in the \n", "flow are connected and can interact with each other. This allows programmers to implement python\n", "functions that will be invoked by the framework at run time. 
For example, one can \n", "implement a Task that fetches the relaxed structure from a previous Task and \n", "uses this configuration to start a DFPT calculation.\n", "\n", "In the next paragraph, we discuss how to construct a `Flow` for band-structure calculations\n", "with a high-level interface that only requires the specification of the input files.\n", "This example allows us to discuss the most important methods of the `Flow`." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Building a Flow for band structure calculations\n", "[[back to top](#top)]\n", "\n", "Let's start by creating a function that produces two input files. \n", "The first input is a standard self-consistent ground-state run.\n", "The second input uses the density produced in the first run to perform \n", "a non-self-consistent band structure calculation." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "# This line configures matplotlib to show figures embedded in the notebook.\n", "# Replace `inline` with `notebook` in the classic notebook\n", "%matplotlib inline \n", "\n", "# Option available in jupyterlab. 
See https://github.com/matplotlib/jupyter-matplotlib\n", "#%matplotlib widget \n", "\n", "import warnings\n", "warnings.filterwarnings(\"ignore\")  # to get rid of deprecation warnings\n", "\n", "from abipy import abilab\n", "import os\n", "import abipy.flowtk as flowtk\n", "import abipy.data as abidata\n", "\n", "def make_scf_nscf_inputs():\n", "    \"\"\"Build and return the input files for the GS-SCF and the GS-NSCF tasks.\"\"\"\n", "    multi = abilab.MultiDataset(structure=abidata.cif_file(\"si.cif\"),\n", "                                pseudos=abidata.pseudos(\"14si.pspnc\"), ndtset=2)\n", "\n", "    # Set global variables (dataset1 and dataset2)\n", "    multi.set_vars(ecut=6, nband=8)\n", "\n", "    # Dataset 1 (GS-SCF run)\n", "    multi[0].set_kmesh(ngkpt=[8, 8, 8], shiftk=[0, 0, 0])\n", "    multi[0].set_vars(tolvrs=1e-6)\n", "\n", "    # Dataset 2 (GS-NSCF run on a k-path)\n", "    kptbounds = [\n", "        [0.5, 0.0, 0.0],  # L point\n", "        [0.0, 0.0, 0.0],  # Gamma point\n", "        [0.0, 0.5, 0.5],  # X point\n", "    ]\n", "\n", "    multi[1].set_kpath(ndivsm=6, kptbounds=kptbounds)\n", "    multi[1].set_vars(tolwfr=1e-12)\n", "\n", "    # Return two input files for the GS and the NSCF run\n", "    scf_input, nscf_input = multi.split_datasets()\n", "    return scf_input, nscf_input" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Once we have our two input files, we pass them to the \n", "factory function `bandstructure_flow` that returns our `Flow`." 
] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "scf_input, nscf_input = make_scf_nscf_inputs()\n", "\n", "workdir = \"/tmp/hello_bands\"\n", "flow = flowtk.bandstructure_flow(workdir, scf_input, nscf_input)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`bandstructure_flow` took care of creating the correct dependency between the two tasks.\n", "The `NscfTask`, indeed, depends on the `ScfTask` in w0/t0, whereas the `ScfTask` has no dependency:" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "image/svg+xml": [ "\n", "\n", "\n", "\n", "\n", "\n", "flow\n", "\n", "Flow, node_id=174773, workdir=../../../../../tmp/hello_bands\n", "\n", "clusterw0\n", "\n", "BandStructureWork (w0)\n", "\n", "\n", "\n", "w0_t0\n", "\n", "w0_t0\n", "ScfTask\n", "\n", "\n", "\n", "w0_t1\n", "\n", "w0_t1\n", "NscfTask\n", "\n", "\n", "\n", "w0_t0->w0_t1\n", "\n", "\n", "DEN\n", "\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "flow.get_graphviz()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "Note that we have not used `getden2 = -1` in the second dataset \n", "since AbiPy knows how to connect the two Tasks.\n", "So there is no need for `get*` or `ird*` variables with AbiPy. \n", "Just specify the correct dependency and python will do the rest!\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To get useful information about the status of the flow, use:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "Work #0: , Finalized=False\n", "+--------+-------------+---------+--------------+------------+----------+-----------------+--------+-----------+\n", "| Task | Status | Queue | MPI|Omp|Gb | Warn|Com | Class | Sub|Rest|Corr | Time | Node_ID |\n", "+========+=============+=========+==============+============+==========+=================+========+===========+\n", "| w0_t0 | Initialized\u001b[0m | None | 1| 1|2.0 | 1| 1 | ScfTask | (0, 0, 0) | None | 174775 |\n", "+--------+-------------+---------+--------------+------------+----------+-----------------+--------+-----------+\n", "| w0_t1 | Initialized\u001b[0m | None | 1| 1|2.0 | 0| 3 | NscfTask | (0, 0, 0) | None | 174776 |\n", "+--------+-------------+---------+--------------+------------+----------+-----------------+--------+-----------+\n", "\n" ] } ], "source": [ "flow.show_status()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Meaning of the different columns:\n", " \n", " * *Task*: short name of the task (usually *w[index_of_work_in_flow]_t[index_of_task_in_work]*)\n", " * *Status*: status of the task\n", " * *Queue*: QueueName@Job identifier returned by the resource manager when the task is submitted\n", " * *(MPI|Omp|Gb)*: number of MPI procs, OMP threads, and memory per MPI proc\n", " * *(Warn|Com)*: number of Error/Warning/Comment messages found in the ABINIT log\n", " * *Class*: the class of the `Task`\n", " * *(Sub|Rest|Corr)*: number of submissions/restarts/AbiPy corrections performed\n", " * *Node_ID*: identifier of the task, used to select tasks or works in python code or `abirun.py`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Both `Flow` and `Work` are *iterable*. 
\n", "Iterating over a `Flow` gives `Work` objects, whereas\n", "iterating over a `Work` gives the `Tasks` inside that particular `Work`.\n", "\n", "```python\n", "for work in flow:\n", "    for task in work:\n", "        print(task)\n", "```\n", "\n", "`Flows` and `Works` are containers, and we can select items in these containers\n", "with the usual Python syntax: `flow[i]`, `flow[start:stop]` or `work[start:stop]`.\n", "This means that the previous loop is equivalent to the much more verbose version: \n", "\n", "```python\n", "for i in range(len(flow)):\n", "    work = flow[i]\n", "    for t in range(len(work)):\n", "        print(work[t])\n", "```\n", "\n", "At this point, it should not be difficult to understand that:\n", "\n", "```python\n", "flow[0][0]\n", "```\n", "\n", "gives the first task in the first work of the flow, while\n", "\n", "```python\n", "flow[-1][-1]\n", "```\n", "\n", "selects the last `Task` in the last `Work`.\n", "In several cases, we only need to iterate over a flat list of tasks without caring about the works.\n", "In this case, we can use:\n", "\n", "```python\n", "for task in flow.iflat_tasks():\n", "    print(task)\n", "```\n", "\n", "to iterate over all Tasks in the Flow." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## How to build and run the Flow\n", "[[back to top](#top)]\n", "\n", "The flow is still in memory and no file has been produced yet. \n", "To build the workflow on disk, use: " ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Remove the directory created by a previous run (if any)\n", "if os.path.isdir(\"/tmp/hello_bands/\"):\n", "    import shutil\n", "    shutil.rmtree(\"/tmp/hello_bands\")\n", "\n", "flow.build_and_pickle_dump()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This function creates the directories of the `Flow`:\n", "\n", "(On macOS, the `tree` command might not be available. 
To fix this, see http://osxdaily.com/2016/09/09/view-folder-tree-terminal-mac-os-tree-equivalent/)" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "/bin/sh: tree: command not found\n" ] } ], "source": [ "!tree /tmp/hello_bands" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's have a look at the files/directories associated to the first work (flow[0]):" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "image/svg+xml": [ "\n", "\n", "\n", "\n", "\n", "\n", "directory\n", "\n", "/tmp/hello_bands/w0\n", "\n", "cluster_/tmp/hello_bands/w0\n", "\n", "w0\n", "\n", "\n", "cluster_/tmp/hello_bands/w0/tmpdata\n", "\n", "tmpdata\n", "\n", "\n", "cluster_/tmp/hello_bands/w0/indata\n", "\n", "indata\n", "\n", "\n", "cluster_/tmp/hello_bands/w0/outdata\n", "\n", "outdata\n", "\n", "\n", "cluster_/tmp/hello_bands/w0/t1\n", "\n", "t1\n", "\n", "\n", "cluster_/tmp/hello_bands/w0/t0\n", "\n", "t0\n", "\n", "\n", "cluster_/tmp/hello_bands/w0/t1/tmpdata\n", "\n", "tmpdata\n", "\n", "\n", "cluster_/tmp/hello_bands/w0/t1/indata\n", "\n", "indata\n", "\n", "\n", "cluster_/tmp/hello_bands/w0/t1/outdata\n", "\n", "outdata\n", "\n", "\n", "cluster_/tmp/hello_bands/w0/t0/tmpdata\n", "\n", "tmpdata\n", "\n", "\n", "cluster_/tmp/hello_bands/w0/t0/indata\n", "\n", "indata\n", "\n", "\n", "cluster_/tmp/hello_bands/w0/t0/outdata\n", "\n", "outdata\n", "\n", "\n", "\n", "/tmp/hello_bands/w0/t1/run.abi\n", "\n", "run.abi\n", "\n", "\n", "\n", "/tmp/hello_bands/w0/t1/job.sh\n", "\n", "job.sh\n", "\n", "\n", "\n", "/tmp/hello_bands/w0/t1/run.files\n", "\n", "run.files\n", "\n", "\n", "\n", "/tmp/hello_bands/w0/t0/run.abi\n", "\n", "run.abi\n", "\n", "\n", "\n", "/tmp/hello_bands/w0/t0/job.sh\n", "\n", "job.sh\n", "\n", "\n", "\n", "/tmp/hello_bands/w0/t0/run.files\n", "\n", "run.files\n", "\n", "\n", "\n", "\n", "\n", 
"__0:cluster_/tmp/hello_bands/w0->__1:cluster_/tmp/hello_bands/w0/tmpdata\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "__2:cluster_/tmp/hello_bands/w0->__3:cluster_/tmp/hello_bands/w0/indata\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "__4:cluster_/tmp/hello_bands/w0->__5:cluster_/tmp/hello_bands/w0/outdata\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "__6:cluster_/tmp/hello_bands/w0->__7:cluster_/tmp/hello_bands/w0/t1\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "__8:cluster_/tmp/hello_bands/w0->__9:cluster_/tmp/hello_bands/w0/t0\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "__10:cluster_/tmp/hello_bands/w0/t1->__11:cluster_/tmp/hello_bands/w0/t1/tmpdata\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "__12:cluster_/tmp/hello_bands/w0/t1->__13:cluster_/tmp/hello_bands/w0/t1/indata\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "__14:cluster_/tmp/hello_bands/w0/t1->__15:cluster_/tmp/hello_bands/w0/t1/outdata\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "__16:cluster_/tmp/hello_bands/w0/t0->__17:cluster_/tmp/hello_bands/w0/t0/tmpdata\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "__18:cluster_/tmp/hello_bands/w0/t0->__19:cluster_/tmp/hello_bands/w0/t0/indata\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "__20:cluster_/tmp/hello_bands/w0/t0->__21:cluster_/tmp/hello_bands/w0/t0/outdata\n", "\n", "\n", "\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "flow[0].get_graphviz_dirtree()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`w0` is the directory containing the input files of the first `Work` (well, we have only one `Work` in our example).\n", "`t0` and `t1` contain the input files needed to run the SCF and the NSCF run, respectively.\n", "\n", "You might have noticed that each `Task` directory has the same structure:\n", " \n", " * *run.abi*: Input file\n", " * *run.files*: Files file\n", " * *job.sh*: Submission script\n", " * *outdata*: Directory 
containing output data files\n", " * *indata*: Directory containing input data files \n", " * *tmpdata*: Directory with temporary files\n", " \n", "
\n", "`__AbinitFlow__.pickle` is the pickle file used to save the status of the `Flow`. **Don't touch it!** \n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "An Abinit Task *has* an [AbinitInput](abinit_input.ipynb) which in turn has a [Structure](structure.ipynb):" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/html": [ "##############################################
#### SECTION: basic
##############################################
ecut 6
nband 8
ngkpt 8 8 8
kptopt 1
nshiftk 1
shiftk 0 0 0
tolvrs 1e-06
##############################################
#### STRUCTURE
##############################################
natom 2
ntypat 1
typat 1 1
znucl 14
xred
0.0000000000 0.0000000000 0.0000000000
0.2500000000 0.2500000000 0.2500000000
acell 1.0 1.0 1.0
rprim
6.3285005272 0.0000000000 3.6537614829
2.1095001757 5.9665675167 3.6537614829
0.0000000000 0.0000000000 7.3075229659" ], "text/plain": [ "" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "flow[0][0].input" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/html": [ "Full Formula (Si2)
Reduced Formula: Si
abc : 3.866975 3.866975 3.866975
angles: 60.000000 60.000000 60.000000
Sites (2)
# SP a b c
--- ---- ---- ---- ----
0 Si 0 0 0
1 Si 0.25 0.25 0.25" ], "text/plain": [ "Structure Summary\n", "Lattice\n", " abc : 3.86697462 3.86697462 3.86697462\n", " angles : 59.99999999999999 59.99999999999999 59.99999999999999\n", " volume : 40.88829179346891\n", " A : 3.3488982567096763 0.0 1.9334873100000005\n", " B : 1.1162994189032256 3.1573715557642927 1.9334873100000005\n", " C : 0.0 0.0 3.86697462\n", "PeriodicSite: Si (0.0000, 0.0000, 0.0000) [0.0000, 0.0000, 0.0000]\n", "PeriodicSite: Si (1.1163, 0.7893, 1.9335) [0.2500, 0.2500, 0.2500]" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "flow[0][0].input.structure" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", " summary: Troullier-Martins psp for element Si Thu Oct 27 17:31:21 EDT 1994\n", " number of valence electrons: 4.0\n", " maximum angular momentum: d\n", " angular momentum for local part: d\n", " XC correlation: LDA_XC_TETER93\n", " supports spin-orbit: False\n", " radius for non-linear core correction: 1.80626423934776\n", " hint for low accuracy: ecut: 0.0, pawecutdg: 0.0\n", " hint for normal accuracy: ecut: 0.0, pawecutdg: 0.0\n", " hint for high accuracy: ecut: 0.0, pawecutdg: 0.0\n" ] } ], "source": [ "for p in flow[0][0].input.pseudos: \n", " print(p)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's print the value of `kptopt` for all tasks in our flow with:" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1, -2]\n" ] } ], "source": [ "print([task.input[\"kptopt\"] for task in flow.iflat_tasks()])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "that, in this particular case, gives the same result as:" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1, -2]\n" ] } ], "source": [ 
"print([task.input[\"kptopt\"] for task in flow[0]])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Executing a Flow\n", "[[back to top](#top)]\n", "\n", "The `Flow` can be executed with two different approaches: a programmatic interface based \n", "on `flow.make_scheduler` or the `abirun.py` script. \n", "In this section, we discuss the first approach because it plays well with the jupyter notebook.\n", "Note however that `abirun.py` is highly recommended especially when running non-trivial calculations. " ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[33m[Wed Feb 5 11:04:13 2020] Number of launches: 1\u001b[0m\n", "\n", "Work #0: , Finalized=False\n", "+--------+-------------+------------+--------------+------------+----------+-----------------+----------+-----------+\n", "| Task | Status | Queue | MPI|Omp|Gb | Warn|Com | Class | Sub|Rest|Corr | Time | Node_ID |\n", "+========+=============+============+==============+============+==========+=================+==========+===========+\n", "| w0_t0 | \u001b[34mSubmitted\u001b[0m | 35638@gmac | 2| 1|2.0 | 0| 0 | ScfTask | (1, 0, 0) | 0:00:00Q | 174775 |\n", "+--------+-------------+------------+--------------+------------+----------+-----------------+----------+-----------+\n", "| w0_t1 | Initialized\u001b[0m | None | 1| 1|2.0 | NA|NA | NscfTask | (0, 0, 0) | None | 174776 |\n", "+--------+-------------+------------+--------------+------------+----------+-----------------+----------+-----------+\n", "\n", "\u001b[33m[Wed Feb 5 11:04:14 2020] Number of launches: 1\u001b[0m\n", "\n", "Work #0: , Finalized=False\n", "+--------+-----------+------------+--------------+------------+----------+-----------------+----------+-----------+\n", "| Task | Status | Queue | MPI|Omp|Gb | Warn|Com | Class | Sub|Rest|Corr | Time | Node_ID |\n", 
"+========+===========+============+==============+============+==========+=================+==========+===========+\n", "| w0_t0 | \u001b[32mCompleted\u001b[0m | 35638@gmac | 2| 1|2.0 | 1| 1 | ScfTask | (1, 0, 0) | 0:00:01R | 174775 |\n", "+--------+-----------+------------+--------------+------------+----------+-----------------+----------+-----------+\n", "| w0_t1 | \u001b[34mSubmitted\u001b[0m | 35647@gmac | 2| 1|2.0 | 0| 0 | NscfTask | (1, 0, 0) | 0:00:00Q | 174776 |\n", "+--------+-----------+------------+--------------+------------+----------+-----------------+----------+-----------+\n", "\n", "\n", "Work #0: , Finalized=\u001b[32mTrue\u001b[0m\n", " Finalized works are not shown. Use verbose > 0 to force output.\n", "\u001b[32m\n", "all_ok reached\n", "\u001b[0m\n", "\n", "Submitted on: Wed Feb 5 11:04:12 2020\n", "Completed on: Wed Feb 5 11:04:16 2020\n", "Elapsed time: 0:00:04.021915\n", "Flow completed successfully\n", "\n", "Calling flow.finalize()...\n" ] }, { "data": { "text/plain": [ "0" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "flow.make_scheduler().start()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The flow keeps track of the different actions performed by the python code:" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[32m\n", "==================================================================================================================\n", "================= =================\n", "==================================================================================================================\u001b[0m\n", "[Wed Feb 5 11:04:14 2020] In on_ok with sender \n", "[Wed Feb 5 11:04:16 2020] In on_ok with sender \n", "[Wed Feb 5 11:04:16 2020] Finalized set to True\n", "[Wed Feb 5 11:04:16 2020] Work is finalized and broadcasts signal S_OK\n", "[Wed Feb 5 11:04:16 2020] Node broadcasts 
signal Completed\n", "\u001b[32m\n", "===================================================================================================================\n", "===================== =====================\n", "===================================================================================================================\u001b[0m\n", "[Wed Feb 5 11:04:12 2020] Status changed to Ready. msg: Status set to Ready\n", "[Wed Feb 5 11:04:12 2020] Setting input variables: {'autoparal': 1, 'max_ncpus': 2, 'mem_test': 0}\n", "[Wed Feb 5 11:04:12 2020] Old values: {'autoparal': None, 'max_ncpus': None, 'mem_test': None}\n", "[Wed Feb 5 11:04:13 2020] Setting input variables: {'npimage': 1, 'npkpt': 2, 'npspinor': 1, 'npfft': 1, 'npband': 1, 'bandpp': 1}\n", "[Wed Feb 5 11:04:13 2020] Old values: {'npimage': None, 'npkpt': None, 'npspinor': None, 'npfft': None, 'npband': None, 'bandpp': None}\n", "[Wed Feb 5 11:04:13 2020] Status changed to Initialized. msg: finished autoparal run\n", "[Wed Feb 5 11:04:13 2020] Submitted with MPI=2, Omp=1, Memproc=2.0 [Gb] Submitted to queue \n", "[Wed Feb 5 11:04:14 2020] Task completed status set to OK based on abiout\n", "[Wed Feb 5 11:04:14 2020] Finalized set to True\n", "[Wed Feb 5 11:04:14 2020] Node broadcasts signal Completed\n", "\u001b[32m\n", "==================================================================================================================\n", "==================== ====================\n", "==================================================================================================================\u001b[0m\n", "[Wed Feb 5 11:04:14 2020] Status changed to Ready. 
msg: Status set to Ready\n", "[Wed Feb 5 11:04:14 2020] Need path /tmp/hello_bands/w0/t0/outdata/out_DEN with ext DEN\n", "[Wed Feb 5 11:04:14 2020] Linking path /tmp/hello_bands/w0/t0/outdata/out_DEN --> /tmp/hello_bands/w0/t1/indata/in_DEN\n", "[Wed Feb 5 11:04:14 2020] Setting input variables: {'ngfft': [18, 18, 18]}\n", "[Wed Feb 5 11:04:14 2020] Old values: {'ngfft': None}\n", "[Wed Feb 5 11:04:14 2020] Adding connecting vars {'irdden': 1}\n", "[Wed Feb 5 11:04:14 2020] Setting input variables: {'irdden': 1}\n", "[Wed Feb 5 11:04:14 2020] Old values: {'irdden': None}\n", "[Wed Feb 5 11:04:14 2020] Setting input variables: {'autoparal': 1, 'max_ncpus': 2, 'mem_test': 0}\n", "[Wed Feb 5 11:04:14 2020] Old values: {'autoparal': None, 'max_ncpus': None, 'mem_test': None}\n", "[Wed Feb 5 11:04:14 2020] Setting input variables: {'npimage': 1, 'npkpt': 2, 'npspinor': 1, 'npfft': 1, 'npband': 1, 'bandpp': 1}\n", "[Wed Feb 5 11:04:14 2020] Old values: {'npimage': None, 'npkpt': None, 'npspinor': None, 'npfft': None, 'npband': None, 'bandpp': None}\n", "[Wed Feb 5 11:04:14 2020] Status changed to Initialized. 
msg: finished autoparal run\n", "[Wed Feb 5 11:04:14 2020] Submitted with MPI=2, Omp=1, Memproc=2.0 [Gb] Submitted to queue \n", "[Wed Feb 5 11:04:16 2020] Task completed status set to OK based on abiout\n", "[Wed Feb 5 11:04:16 2020] Finalized set to True\n", "[Wed Feb 5 11:04:16 2020] Node broadcasts signal Completed\n", "\u001b[32m\n", "==================================================================================================================\n", "========================= =========================\n", "==================================================================================================================\u001b[0m\n", "[Wed Feb 5 11:04:16 2020] Calling flow.finalize.\n", "[Wed Feb 5 11:04:16 2020] Finalized set to True\n" ] } ], "source": [ "flow.show_history()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you read the logs carefully, you will realize that in the first iteration of the scheduler, \n", "only the `ScfTask` is executed because the second task depends on it. \n", "After the initial submission, the scheduler starts to monitor all the tasks in the flow.\n", "\n", "When the `ScfTask` completes, the dependency of the `NscfTask` is fulfilled and \n", "a new submission takes place. 
Once the second task completes, the scheduler calls `flow.finalize`\n", "to execute (optional) logic that is supposed to be executed to perform some sort of cleanup or final processing.\n", "At this point, all the tasks in the flow are completed and the scheduler exits.\n", "\n", "Now we can have a look at the different output files produced by our flow with:" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "image/svg+xml": [ "\n", "\n", "\n", "\n", "\n", "\n", "directory\n", "\n", "/tmp/hello_bands/w0\n", "\n", "cluster_/tmp/hello_bands/w0\n", "\n", "w0\n", "\n", "\n", "cluster_/tmp/hello_bands/w0/tmpdata\n", "\n", "tmpdata\n", "\n", "\n", "cluster_/tmp/hello_bands/w0/indata\n", "\n", "indata\n", "\n", "\n", "cluster_/tmp/hello_bands/w0/outdata\n", "\n", "outdata\n", "\n", "\n", "cluster_/tmp/hello_bands/w0/t1\n", "\n", "t1\n", "\n", "\n", "cluster_/tmp/hello_bands/w0/t0\n", "\n", "t0\n", "\n", "\n", "cluster_/tmp/hello_bands/w0/t1/tmpdata\n", "\n", "tmpdata\n", "\n", "\n", "cluster_/tmp/hello_bands/w0/t1/indata\n", "\n", "indata\n", "\n", "\n", "cluster_/tmp/hello_bands/w0/t1/outdata\n", "\n", "outdata\n", "\n", "\n", "cluster_/tmp/hello_bands/w0/t0/tmpdata\n", "\n", "tmpdata\n", "\n", "\n", "cluster_/tmp/hello_bands/w0/t0/indata\n", "\n", "indata\n", "\n", "\n", "cluster_/tmp/hello_bands/w0/t0/outdata\n", "\n", "outdata\n", "\n", "\n", "\n", "/tmp/hello_bands/w0/t1/run.log\n", "\n", "run.log\n", "\n", "\n", "\n", "/tmp/hello_bands/w0/t1/run.abo\n", "\n", "run.abo\n", "\n", "\n", "\n", "/tmp/hello_bands/w0/t1/run.abi\n", "\n", "run.abi\n", "\n", "\n", "\n", "/tmp/hello_bands/w0/t1/__startlock__\n", "\n", "__startlock__\n", "\n", "\n", "\n", "/tmp/hello_bands/w0/t1/run.err\n", "\n", "run.err\n", "\n", "\n", "\n", "/tmp/hello_bands/w0/t1/job.sh\n", "\n", "job.sh\n", "\n", "\n", "\n", "/tmp/hello_bands/w0/t1/autoparal.json\n", "\n", "autoparal.json\n", "\n", "\n", "\n", "/tmp/hello_bands/w0/t1/run.files\n", "\n", 
"run.files\n", "\n", "\n", "\n", "/tmp/hello_bands/w0/t1/indata/in_DEN\n", "\n", "in_DEN\n", "\n", "\n", "\n", "../../../../../../private/tmp/hello_bands/w0/t0/outdata/out_DEN\n", "\n", "../../../../../../private/tmp/hello_bands/w0/t0/outdata/out_DEN\n", "\n", "\n", "\n", "/tmp/hello_bands/w0/t1/indata/in_DEN->../../../../../../private/tmp/hello_bands/w0/t0/outdata/out_DEN\n", "\n", "\n", "\n", "\n", "\n", "/tmp/hello_bands/w0/t1/outdata/out_EIG.nc\n", "\n", "out_EIG.nc\n", "\n", "\n", "\n", "/tmp/hello_bands/w0/t1/outdata/out_WFK\n", "\n", "out_WFK\n", "\n", "\n", "\n", "/tmp/hello_bands/w0/t1/outdata/out_EBANDS.agr\n", "\n", "out_EBANDS.agr\n", "\n", "\n", "\n", "/tmp/hello_bands/w0/t1/outdata/out_GSR.nc\n", "\n", "out_GSR.nc\n", "\n", "\n", "\n", "/tmp/hello_bands/w0/t1/outdata/out_EIG\n", "\n", "out_EIG\n", "\n", "\n", "\n", "/tmp/hello_bands/w0/t1/outdata/out_DEN\n", "\n", "out_DEN\n", "\n", "\n", "\n", "/tmp/hello_bands/w0/t1/outdata/out_OUT.nc\n", "\n", "out_OUT.nc\n", "\n", "\n", "\n", "/tmp/hello_bands/w0/t0/run.log\n", "\n", "run.log\n", "\n", "\n", "\n", "/tmp/hello_bands/w0/t0/run.abo\n", "\n", "run.abo\n", "\n", "\n", "\n", "/tmp/hello_bands/w0/t0/run.abi\n", "\n", "run.abi\n", "\n", "\n", "\n", "/tmp/hello_bands/w0/t0/__startlock__\n", "\n", "__startlock__\n", "\n", "\n", "\n", "/tmp/hello_bands/w0/t0/run.err\n", "\n", "run.err\n", "\n", "\n", "\n", "/tmp/hello_bands/w0/t0/job.sh\n", "\n", "job.sh\n", "\n", "\n", "\n", "/tmp/hello_bands/w0/t0/autoparal.json\n", "\n", "autoparal.json\n", "\n", "\n", "\n", "/tmp/hello_bands/w0/t0/run.files\n", "\n", "run.files\n", "\n", "\n", "\n", "/tmp/hello_bands/w0/t0/outdata/out_EIG.nc\n", "\n", "out_EIG.nc\n", "\n", "\n", "\n", "/tmp/hello_bands/w0/t0/outdata/out_WFK\n", "\n", "out_WFK\n", "\n", "\n", "\n", "/tmp/hello_bands/w0/t0/outdata/out_EBANDS.agr\n", "\n", "out_EBANDS.agr\n", "\n", "\n", "\n", "/tmp/hello_bands/w0/t0/outdata/out_DDB\n", "\n", "out_DDB\n", "\n", "\n", "\n", 
"/tmp/hello_bands/w0/t0/outdata/out_GSR.nc\n", "\n", "out_GSR.nc\n", "\n", "\n", "\n", "/tmp/hello_bands/w0/t0/outdata/out_EIG\n", "\n", "out_EIG\n", "\n", "\n", "\n", "/tmp/hello_bands/w0/t0/outdata/out_DEN\n", "\n", "out_DEN\n", "\n", "\n", "\n", "/tmp/hello_bands/w0/t0/outdata/out_OUT.nc\n", "\n", "out_OUT.nc\n", "\n", "\n", "\n", "\n", "\n", "__0:cluster_/tmp/hello_bands/w0->__1:cluster_/tmp/hello_bands/w0/tmpdata\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "__2:cluster_/tmp/hello_bands/w0->__3:cluster_/tmp/hello_bands/w0/indata\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "__4:cluster_/tmp/hello_bands/w0->__5:cluster_/tmp/hello_bands/w0/outdata\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "__6:cluster_/tmp/hello_bands/w0->__7:cluster_/tmp/hello_bands/w0/t1\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "__8:cluster_/tmp/hello_bands/w0->__9:cluster_/tmp/hello_bands/w0/t0\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "__10:cluster_/tmp/hello_bands/w0/t1->__11:cluster_/tmp/hello_bands/w0/t1/tmpdata\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "__12:cluster_/tmp/hello_bands/w0/t1->__13:cluster_/tmp/hello_bands/w0/t1/indata\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "__14:cluster_/tmp/hello_bands/w0/t1->__15:cluster_/tmp/hello_bands/w0/t1/outdata\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "__16:cluster_/tmp/hello_bands/w0/t0->__17:cluster_/tmp/hello_bands/w0/t0/tmpdata\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "__18:cluster_/tmp/hello_bands/w0/t0->__19:cluster_/tmp/hello_bands/w0/t0/indata\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "__20:cluster_/tmp/hello_bands/w0/t0->__21:cluster_/tmp/hello_bands/w0/t0/outdata\n", "\n", "\n", "\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "flow[0].get_graphviz_dirtree()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "or list only the files with a given extension:" ] }, { "cell_type": "code", "execution_count": 26, 
"metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Found 2 files with extension `GSR.nc` produced by the flow\n", "File Size [Mb] Node_ID Node Class\n", "------------------------------------------------------- ----------- --------- ------------\n", "../../../../../tmp/hello_bands/w0/t0/outdata/out_GSR.nc 0.02 174775 ScfTask\n", "../../../../../tmp/hello_bands/w0/t1/outdata/out_GSR.nc 0.02 174776 NscfTask\n" ] } ], "source": [ "flow.listext(\"GSR.nc\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The nice thing about the flow is that the object knows how to locate and interpret the \n", "different input/ouput files produced by Abinit. \n", "As a consequence, it is very easy to expose the AbiPy post-processing tools with a easy-to-use API\n", "in which only tasks/works/flow plus a very few input arguments are required.\n", "\n", "Let's call, for instance, the inspect method to plot the self-consistent cycles:" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\u001b[34mTask does not provide an inspect method\u001b[0m\n" ] } ], "source": [ "flow.inspect(tight_layout=True);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the other AbiPy tutorials, we have explained how to use abiopen to create \n", "python objects from netcdf files. \n", "Well, the same code can be reused with the flow. \n", "It is just a matter of replacing\n", "\n", "```python\n", "with abiopen(filepath) as gsr:\n", "```\n", "\n", "with\n", "\n", "```python\n", "with task.open_gsr() as gsr:\n", "```\n", "\n", "Note that there is no need to specify the file path when you use the task-based API, because \n", "the `Task` knows how to locate its `GSR.nc` output.\n", "Let's do some practice..." ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "with flow[0][0].open_gsr() as gsr:\n", " ebands_kmesh = gsr.ebands\n", " \n", "with flow[0][1].open_gsr() as gsr:\n", " gsr.ebands.plot_with_edos(ebands_kmesh.get_edos(), with_gaps=True);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## More on Works, Tasks and dependencies \n", "[[back to top](#top)]\n", "\n", "In the previous example, we have constructed a workflow for band structure calculations\n", "starting from two input files and the magic line\n", "\n", "```python\n", "flow = flowtk.bandstructure_flow(workdir, scf_input, nscf_input)\n", "```\n", "\n", "Now it is the right time to explain in more details the syntax and the API used in AbiPy \n", "to build a flow with dependencies.\n", "Let's try to build a `Flow` from scratch and use graphviz after each step to show what's happening.\n", "We start with an empty flow in the `hello_flow` directory:" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "data": { "image/svg+xml": [ "\n", "\n", "\n", "\n", "\n", "\n", "flow\n", "\n", "Flow, node_id=174777, workdir=hello_flow\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "hello_flow = flowtk.Flow(workdir=\"hello_flow\")\n", "hello_flow.get_graphviz()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we add a new `Task` by just passing an `AbinitInput` for SCF calculations:" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "data": { "image/svg+xml": [ "\n", "\n", "\n", "\n", "\n", "\n", "flow\n", "\n", "Flow, node_id=174777, workdir=hello_flow\n", "\n", "clusterw0\n", "\n", "Work (w0)\n", "\n", "\n", "\n", "w0_t0\n", "\n", "w0_t0\n", "ScfTask\n", "\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ 
"hello_flow.register_scf_task(scf_input, append=True)\n", "hello_flow.get_graphviz()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now the tricky part. \n", "We want to register a NSCF calculation that should depend on the `scf_task` in `w0_t0` via the DEN file.\n", "We can use the same API but we **must** specify the dependency between the two steps with the \n", "```{scf_task: \"DEN\"}``` dictionary:" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "data": { "image/svg+xml": [ "\n", "\n", "\n", "\n", "\n", "\n", "flow\n", "\n", "Flow, node_id=174777, workdir=hello_flow\n", "\n", "clusterw0\n", "\n", "Work (w0)\n", "\n", "\n", "\n", "w0_t0\n", "\n", "w0_t0\n", "ScfTask\n", "\n", "\n", "\n", "w0_t1\n", "\n", "w0_t1\n", "NscfTask\n", "\n", "\n", "\n", "w0_t0->w0_t1\n", "\n", "\n", "DEN\n", "\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "hello_flow.register_nscf_task(nscf_input, deps={hello_flow[0][0]: \"DEN\"}, append=True)\n", "hello_flow.get_graphviz(engine=\"dot\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Excellent, we managed to build our first AbiPy flow with inter-dependent tasks in just six lines \n", "of code (including the three calls to graphviz).\n", "Now let's assume we want to add a second Nscf calculation (`NscTask`) in which we change one of the input parameters\n", "e.g. the number of bands and that, for some reason, we really want to re-use the output WFK file \n", "produced by `w0_t1` to initialize the eigenvalue solver (obviously we still need a DEN file).\n", "How can we express this with AbiPy? 
\n", "\n", "Well, the syntax for the new deps, it's just:\n", "\n", "```python\n", "deps = {hello_flow[0][0]: \"DEN\", hello_flow[0][1]: \"WFK\"}\n", "```\n", "\n", "but we should also change the input variable nband in the `nscf_input` before creating\n", "the new `NscTask` (remember that building a `Task` requires an `AbinitInput` object \n", "and a list of dependencies, if any).\n", "\n", "Now there are two ways to increase nband: the **wrong** way and the **correct** one!\n", "Let's start from the *wrong* way because it's always useful to learn from our mistakes.\n", "Let's print some values just for the record:" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "nband in the first NscfTask: 8\n" ] } ], "source": [ "t1 = flow[0][1]\n", "print(\"nband in the first NscfTask:\", t1.input[\"nband\"])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now let's use the \"recipe\" recommended to us by the FORTRAN guru of our group: \n", "\n", "\"\"" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "nband in the first NscfTask: 1000\n", "nband in the new input: 1000\n" ] } ], "source": [ "# Just copy the previous input and change nband, it's super-easy, the FORTRAN guru said!\n", "new_input = t1.input\n", "new_input[\"nband\"] = 1000\n", "\n", "print(\"nband in the first NscfTask:\", t1.input[\"nband\"])\n", "print(\"nband in the new input:\", new_input[\"nband\"])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Tada! Thanks to the trick of our beloved FORTRAN guru, we ended up with *two* NscfTaks\n", "with the **same number** of bands (1000!). 
Why?\n", "\n", "Because `AbinitInput` is implemented internally with a dictionary, python dictionaries are **mutable**\n", "and python variables are essentially references (they do not store data, actually they store the address of the data)." ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "a dict: {'foo': 'bar', 'hello': 'world'}\n", "b dict: {'foo': 'bar', 'hello': 'world'}\n", "c dict: {'foo': 'bar'}\n" ] } ], "source": [ "a = {\"foo\": \"bar\"}\n", "b = a \n", "c = a.copy()\n", "a[\"hello\"] = \"world\"\n", "print(\"a dict:\", a)\n", "print(\"b dict:\", b)\n", "print(\"c dict:\", c)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For a more techical explanation see [here](http://docs.python-guide.org/en/latest/writing/gotchas/)\n", "\n", "To avoid this mistake, we need to *copy* the object before changing it" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "nband in the first NscfTask: 8\n", "nband in the new input: 1000\n" ] } ], "source": [ "t1.input[\"nband\"] = 8 # back to the old value\n", "\n", "new_input = t1.input.new_with_vars(nband=1000) # Copy and change nband\n", "\n", "print(\"nband in the first NscfTask:\", t1.input[\"nband\"])\n", "print(\"nband in the new input:\", new_input[\"nband\"])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we can finally add the second `NscfTask` with 1000 bands:" ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [ { "data": { "image/svg+xml": [ "\n", "\n", "\n", "\n", "\n", "\n", "flow\n", "\n", "Flow, node_id=174777, workdir=hello_flow\n", "\n", "clusterw0\n", "\n", "Work (w0)\n", "\n", "\n", "\n", "w0_t0\n", "\n", "w0_t0\n", "ScfTask\n", "\n", "\n", "\n", "w0_t1\n", "\n", "w0_t1\n", "NscfTask\n", "\n", "\n", "\n", "w0_t0->w0_t1\n", "\n", "\n", "DEN\n", "\n", "\n", "\n", "w0_t2\n", 
"\n", "w0_t2\n", "NscfTask\n", "\n", "\n", "\n", "w0_t0->w0_t2\n", "\n", "\n", "DEN\n", "\n", "\n", "\n", "w0_t1->w0_t2\n", "\n", "\n", "WFK\n", "\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 36, "metadata": {}, "output_type": "execute_result" } ], "source": [ "hello_flow.register_nscf_task(new_input, deps={hello_flow[0][0]: \"DEN\", hello_flow[0][1]: \"WFK\"}, append=True)\n", "hello_flow.get_graphviz(engine=\"dot\")" ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[8, 8, 1000]\n" ] } ], "source": [ "print([task.input[\"nband\"] for task in hello_flow.iflat_tasks()])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that AbiPy dependencies can also be fulfilled with external files that are already available\n", "when the flow is constructed. There is no change in the syntax we've used so far.\n", "It is just a matter of using the absolute path to the DEN file as keyword of the dictionary instead of a `Task`.\n", "Let's start with a new `Flow` to avoid confusion and create a `NscfTask` that will start from a pre-computed `DEN` file." 
] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [ { "data": { "image/svg+xml": [ "\n", "\n", "\n", "\n", "\n", "\n", "flow\n", "\n", "Flow, node_id=174782, workdir=flow_with_file\n", "\n", "clusterw0\n", "\n", "Work (w0)\n", "\n", "\n", "\n", "w0_t0\n", "\n", "w0_t0\n", "NscfTask\n", "\n", "\n", "\n", "/Users/gmatteo/git_repos/abipy/abipy/data/refs/si_ebands/si_DEN.nc\n", "\n", "FileNode, node_id=174785, rpath=../../abipy/abipy/data/refs/si_ebands/si_DEN.nc\n", "\n", "\n", "\n", "/Users/gmatteo/git_repos/abipy/abipy/data/refs/si_ebands/si_DEN.nc->w0_t0\n", "\n", "\n", "\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 38, "metadata": {}, "output_type": "execute_result" } ], "source": [ "flow_with_file = flowtk.Flow(workdir=\"flow_with_file\")\n", "\n", "den_filepath = abidata.ref_file(\"si_DEN.nc\")\n", "flow_with_file.register_nscf_task(nscf_input, deps={den_filepath: \"DEN\"})\n", "\n", "flow_with_file.get_graphviz(engine=\"dot\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A call to `new_with_vars` inside a Python `for` loop is all we need to add two more `NscfTasks`\n", "with different `nband`, all starting from the same DEN file.\n", "Since we pass `append=False`, each new task is placed in a new `Work` (`w1` and `w2` in the graph):" ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[8, 10, 20]\n" ] }, { "data": { "image/svg+xml": [ "\n", "\n", "\n", "\n", "\n", "\n", "flow\n", "\n", "Flow, node_id=174782, workdir=flow_with_file\n", "\n", "clusterw0\n", "\n", "Work (w0)\n", "\n", "\n", "clusterw1\n", "\n", "Work (w1)\n", "\n", "\n", "clusterw2\n", "\n", "Work (w2)\n", "\n", "\n", "\n", "w0_t0\n", "\n", "w0_t0\n", "NscfTask\n", "\n", "\n", "\n", "w1_t0\n", "\n", "w1_t0\n", "NscfTask\n", "\n", "\n", "\n", "w2_t0\n", "\n", "w2_t0\n", "NscfTask\n", "\n", "\n", "\n", "/Users/gmatteo/git_repos/abipy/abipy/data/refs/si_ebands/si_DEN.nc\n", "\n", "FileNode, node_id=174785, rpath=../../abipy/abipy/data/refs/si_ebands/si_DEN.nc\n", "\n", "\n", "\n", "/Users/gmatteo/git_repos/abipy/abipy/data/refs/si_ebands/si_DEN.nc->w0_t0\n", "\n", "\n", "\n", "\n", "\n", "/Users/gmatteo/git_repos/abipy/abipy/data/refs/si_ebands/si_DEN.nc->w1_t0\n", "\n", "\n", "\n", "\n", "\n", "/Users/gmatteo/git_repos/abipy/abipy/data/refs/si_ebands/si_DEN.nc->w2_t0\n", "\n", "\n", "\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 39, "metadata": {}, "output_type": "execute_result" } ], "source": [ "for nband in [10, 20]:\n", "    flow_with_file.register_nscf_task(nscf_input.new_with_vars(nband=nband),\n", "                                      deps={den_filepath: \"DEN\"}, append=False)\n", "\n", "print([task.input[\"nband\"] for task in flow_with_file.iflat_tasks()])\n", "flow_with_file.get_graphviz()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "At this point, you may ask why we need `Works`, since all the examples presented so far \n", "mainly involve the `Flow` object.\n", "\n", "The answer is that `Works` allow us to encapsulate reusable logic in magic boxes \n", "that can perform a lot of useful work. \n", "These boxes can then be connected together to generate more complicated workflows.\n", "We have already encountered the `BandStructureWork` at the beginning of this lesson,\n", "and now it is time to introduce another fancy animal of the AbiPy zoo: the `PhononWork`." ] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", " \n", " \n", " \n", "\n", "\n", "

\n", "\n", "
class PhononWork(Work, MergeDdb):\n",
       "    """\n",
       "    This work consists of nirred Phonon tasks where nirred is\n",
       "    the number of irreducible atomic perturbations for a given set of q-points.\n",
       "    It provides the callback method (on_all_ok) that calls mrgddb (mrgdv) to merge\n",
       "    all the partial DDB (POT) files produced. The two files are available in the\n",
       "    output directory of the Work.\n",
       "\n",
       "    .. rubric:: Inheritance Diagram\n",
       "    .. inheritance-diagram:: PhononWork\n",
       "    """\n",
       "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 40, "metadata": {}, "output_type": "execute_result" } ], "source": [ "abilab.print_doc(flowtk.PhononWork)" ] }, { "cell_type": "code", "execution_count": 41, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", " \n", " \n", " \n", "\n", "\n", "

\n", "\n", "
    @classmethod\n",
       "    def from_scf_task(cls, scf_task, qpoints, is_ngqpt=False, tolerance=None, with_becs=False,\n",
       "                      ddk_tolerance=None, manager=None):\n",
       "        """\n",
       "        Construct a `PhononWork` from a |ScfTask| object.\n",
       "        The input file for phonons is automatically generated from the input of the ScfTask.\n",
       "        Each phonon task depends on the WFK file produced by the `scf_task`.\n",
       "\n",
       "        Args:\n",
       "            scf_task: |ScfTask| object.\n",
       "            qpoints: q-points in reduced coordinates. Accepts single q-point, list of q-points\n",
       "                or three integers defining the q-mesh if `is_ngqpt`.\n",
       "            is_ngqpt: True if `qpoints` should be interpreted as divisions instead of q-points.\n",
       "            tolerance: dict {"varname": value} with the tolerance to be used in the phonon run.\n",
       "                None to use AbiPy default.\n",
       "            with_becs: Activate calculation of Electric field and Born effective charges.\n",
       "            ddk_tolerance: dict {"varname": value} with the tolerance used in the DDK run if with_becs.\n",
       "                None to use AbiPy default.\n",
       "            manager: |TaskManager| object.\n",
       "        """\n",
       "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 41, "metadata": {}, "output_type": "execute_result" } ], "source": [ "abilab.print_doc(flowtk.PhononWork.from_scf_task)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The docstring seems to suggest that if I have an `scf_task`, I can construct a magic box\n", "to compute phonons. But wait, I already have such a task!\n", "Actually, I also have another magic box to compute the electronic band structure,\n", "and it would be really great if I could compute the electronic and vibrational properties in a single flow.\n", "Let's connect the two boxes together with:" ] }, { "cell_type": "code", "execution_count": 42, "metadata": {}, "outputs": [ { "data": { "image/svg+xml": [ "\n", "\n", "\n", "\n", "\n", "\n", "flow\n", "\n", "Flow, node_id=174792, workdir=phflow\n", "\n", "clusterw0\n", "\n", "BandStructureWork (w0)\n", "\n", "\n", "clusterw1\n", "\n", "PhononWork (w1)\n", "\n", "\n", "\n", "w0_t0\n", "\n", "w0_t0\n", "ScfTask\n", "\n", "\n", "\n", "w0_t1\n", "\n", "w0_t1\n", "NscfTask\n", "\n", "\n", "\n", "w0_t0->w0_t1\n", "\n", "\n", "DEN\n", "\n", "\n", "\n", "w1_t0\n", "\n", "w1_t0\n", "PhononTask\n", "\n", "\n", "\n", "w0_t0->w1_t0\n", "\n", "\n", "WFK\n", "\n", "\n", "\n", "w1_t1\n", "\n", "w1_t1\n", "PhononTask\n", "\n", "\n", "\n", "w0_t0->w1_t1\n", "\n", "\n", "WFK\n", "\n", "\n", "\n", "w1_t2\n", "\n", "w1_t2\n", "PhononTask\n", "\n", "\n", "\n", "w0_t0->w1_t2\n", "\n", "\n", "WFK\n", "\n", "\n", "\n", "w1_t3\n", "\n", "w1_t3\n", "PhononTask\n", "\n", "\n", "\n", "w0_t0->w1_t3\n", "\n", "\n", "WFK\n", "\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 42, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Create new flow.\n", "ph_flow = flowtk.Flow(workdir=\"phflow\")\n", "\n", "# Band structure (SCF + NSCF)\n", "bands_work = flowtk.BandStructureWork(scf_input, nscf_input, dos_inputs=None)\n", "ph_flow.register_work(bands_work)\n", "\n", "# Build the second work (phonons on a 2x2x2 q-mesh) from the SCF task.\n", "scf_task = bands_work[0]\n", "ph_work = flowtk.PhononWork.from_scf_task(scf_task, [2, 2, 2], is_ngqpt=True, tolerance=None)\n", "ph_flow.register_work(ph_work)\n", "\n", "ph_flow.get_graphviz()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It turns out that the `PhononWork` merges all the DDB files produced by its `PhononTasks`\n", "and puts the final merged file in its outdir.\n", "So, from the AbiPy perspective, a `PhononWork` is not that different from a `ScfTask` that produces e.g. a DEN file.\n", "This means that we can connect other magic boxes to our `PhononWork`, e.g. a set of `EphTasks` that \n", "require a DDB file and another input file with the DFPT potentials \n", "(the DVDB file, merged by `PhononWork` similarly to what is done for the DDB)." ] }, { "cell_type": "code", "execution_count": 43, "metadata": {}, "outputs": [ { "data": { "image/svg+xml": [ "\n", "\n", "\n", "\n", "\n", "\n", "flow\n", "\n", "Flow, node_id=174792, workdir=phflow\n", "\n", "clusterw0\n", "\n", "BandStructureWork (w0)\n", "\n", "\n", "clusterw1\n", "\n", "PhononWork (w1)\n", "\n", "\n", "clusterw2\n", "\n", "Work (w2)\n", "\n", "\n", "\n", "w0_t0\n", "\n", "w0_t0\n", "ScfTask\n", "\n", "\n", "\n", "w0_t1\n", "\n", "w0_t1\n", "NscfTask\n", "\n", "\n", "\n", "w0_t0->w0_t1\n", "\n", "\n", "DEN\n", "\n", "\n", "\n", "w1_t0\n", "\n", "w1_t0\n", "PhononTask\n", "\n", "\n", "\n", "w0_t0->w1_t0\n", "\n", "\n", "WFK\n", "\n", "\n", "\n", "w1_t1\n", "\n", "w1_t1\n", "PhononTask\n", "\n", "\n", "\n", "w0_t0->w1_t1\n", "\n", "\n", "WFK\n", "\n", "\n", "\n", "w1_t2\n", "\n", "w1_t2\n", "PhononTask\n", "\n", "\n", "\n", "w0_t0->w1_t2\n", "\n", "\n", "WFK\n", "\n", "\n", "\n", "w1_t3\n", "\n", "w1_t3\n", "PhononTask\n", "\n", "\n", "\n", "w0_t0->w1_t3\n", "\n", "\n", "WFK\n", "\n", "\n", "\n", "w2_t0\n", "\n", "w2_t0\n", "EphTask\n", "\n", "\n", "\n", "w0_t1->w2_t0\n", "\n", "\n", "WFK\n", "\n", "\n", "\n", "w2_t1\n", "\n", "w2_t1\n", "EphTask\n", 
"\n", "\n", "\n", "w0_t1->w2_t1\n", "\n", "\n", "WFK\n", "\n", "\n", "\n", "w2_t2\n", "\n", "w2_t2\n", "EphTask\n", "\n", "\n", "\n", "w0_t1->w2_t2\n", "\n", "\n", "WFK\n", "\n", "\n", "\n", "DDB (w1)\n", "\n", "DDB (w1)\n", "\n", "\n", "\n", "DDB (w1)->w2_t0\n", "\n", "\n", "\n", "\n", "\n", "DDB (w1)->w2_t1\n", "\n", "\n", "\n", "\n", "\n", "DDB (w1)->w2_t2\n", "\n", "\n", "\n", "\n", "\n", "DVDB (w1)\n", "\n", "DVDB (w1)\n", "\n", "\n", "\n", "DVDB (w1)->w2_t0\n", "\n", "\n", "\n", "\n", "\n", "DVDB (w1)->w2_t1\n", "\n", "\n", "\n", "\n", "\n", "DVDB (w1)->w2_t2\n", "\n", "\n", "\n", "\n", "\n", "\n", "__0:clusterw1->DDB (w1)\n", "\n", "\n", "\n", "\n", "\n", "\n", "__1:clusterw1->DVDB (w1)\n", "\n", "\n", "\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 43, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# EPH tasks require 3 input files (WFK, DDB, DVDB)\n", "eph_deps = {ph_flow[0][1]: \"WFK\", ph_work: [\"DDB\", \"DVDB\"]}\n", "\n", "for i, ecut in enumerate([2, 3, 4]):\n", "    # The first EphTask opens a new Work (append=False), the others are appended to it.\n", "    ph_flow.register_eph_task(nscf_input.new_with_vars(ecut=ecut), deps=eph_deps, append=(i != 0))\n", "\n", "ph_flow.get_graphviz()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This explains why AbiPy organizes calculations in terms of `Tasks/Works/Flows`:\n", "we can implement highly specialized `Works/Tasks` to\n", "solve specific problems and then connect all these nodes together.\n", "Just 11 lines of code to get electrons + phonons + electron-phonon interaction!\n", "\n", "But wait, did you see the gorilla?" 
] }, { "cell_type": "code", "execution_count": 44, "metadata": {}, "outputs": [ { "data": { "image/jpeg": "\n", "text/html": [ "\n", " \n", " " ], "text/plain": [ "" ] }, "execution_count": 44, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from IPython.display import YouTubeVideo\n", "YouTubeVideo(\"vJG698U2Mvo\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There's indeed a bug in the last step. The connections among the nodes are OK, but we made\n", "a mistake while creating the `EphTasks` with:\n", "\n", "```python\n", "ph_flow.register_eph_task(nscf_input.new_with_vars(ecut=ecut), ...)\n", "```\n", "\n", "because we passed an input for a standard band structure calculation to a task that is supposed\n", "to deal with the electron-phonon (E-PH) interaction.\n", "This example stresses that the AbiPy `Flow`, *by design*, does not try to validate your input to make sure\n", "it is consistent with the workflow logic.\n", "This is done on purpose for two reasons:\n", "\n", "- Expert users should be able to customize/tune their input files, and validating all the possible cases in Python is not trivial\n", "- Only Abinit (and God) knows at run-time if your input file makes sense, and we cannot reimplement the same logic in Python\n", "\n", "At this point, you may wonder why we have so many different AbiPy Tasks (`ScfTask`, `NscfTask`, `RelaxTask`, `PhononTask`, `EphTask` ...) if there's no input validation when we create them...\n", "\n", "The answer is that we need all these subclasses to implement extra logic that is specific to each particular calculation. AbiPy, indeed, does not just submit jobs: it also monitors the evolution of the calculation\n", "and executes predefined code to fix run-time problems (and these problems are calculation-specific).\n", "An example will help clarify this point.\n", "\n", "Restarting jobs is one of the typical problems encountered in ab-initio calculations,\n", "and restarting a `RelaxTask` requires different logic than restarting e.g. a `ScfTask`.\n", "In the case of a `ScfTask`, we only need to use the output WFK (DEN) of the previous execution \n", "as input of the restarted job, while a `RelaxTask` must also reuse the (unconverged) final structure \n", "of the previous job to be effective and avoid a potentially infinite loop.\n", "In a nutshell, when you use a particular `Task/Work` class, you are telling AbiPy how to handle possible\n", "problems at run-time, and you are also specifying the actions that should be performed \n", "at the beginning/end of the execution." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Abirun.py\n", "[[back to top](#top)]\n", "\n", "Executing \n", "\n", "```python\n", "flow.make_scheduler().start()\n", "```\n", "\n", "inside a Jupyter notebook is handy if you are dealing with small calculations that take a few seconds or minutes.\n", "This approach, however, is impractical when you have large flows or big calculations requiring hours or days, \n", "even on massively parallel machines.\n", "In this case, one would rather run the scheduler in a separate process in the background \n", "so that the scheduler is not killed when the Jupyter server is closed.\n", "\n", "To start the scheduler in a separate process, use the `abirun.py` script.\n", "The syntax is:\n", "\n", "    abirun.py flow_workdir COMMAND\n", "\n", "where `flow_workdir` is the directory containing the `Flow` \n", "(the directory with the pickle file) and `COMMAND` selects the operation to be performed.\n", "\n", "Typical examples:\n", "\n", "    abirun.py /tmp/hello_bands status\n", "\n", "checks the status of the `Flow` and prints the results to screen, while\n", "\n", "    nohup abirun.py /tmp/hello_bands scheduler > sched.log 2> sched.err &\n", "\n", "starts the scheduler in the background, redirecting the standard output to `sched.log` and the standard error to `sched.err`.\n", "\n", "
\n", "`nohup` is a standard Unix tool. The command makes the scheduler immune \n", "to hangups (`SIGHUP`) so that you can close the shell session without killing the scheduler.\n", "
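As a rough illustration of what the `nohup ... &` command line above achieves, here is a minimal Python sketch. It is not part of AbiPy or `abirun.py`, and `launch_detached` is a hypothetical helper name. Instead of ignoring `SIGHUP` as `nohup` does, the sketch starts the child in a new session (POSIX `setsid`), so it is detached from the terminal that launched it; standard output and standard error are redirected to files, as with `> sched.log 2> sched.err`.

```python
import subprocess


def launch_detached(cmd, out_path, err_path):
    # Hypothetical helper (not part of AbiPy): rough analogue of
    #     nohup CMD > out_path 2> err_path &
    with open(out_path, 'w') as out, open(err_path, 'w') as err:
        # start_new_session=True makes the child call setsid() before exec
        # (POSIX only), so it no longer belongs to the session of the terminal
        # that launched it and is not affected when that terminal hangs up.
        proc = subprocess.Popen(
            cmd,
            stdin=subprocess.DEVNULL,
            stdout=out,
            stderr=err,
            start_new_session=True,
        )
    # Popen returns immediately without waiting for the child, like the trailing &.
    # Closing our file objects is safe: the child holds its own copies of the descriptors.
    return proc
```

For example, `launch_detached(['abirun.py', '/tmp/hello_bands', 'scheduler'], 'sched.log', 'sched.err')` would mimic the shell command above, assuming `abirun.py` is on your `PATH`. In practice, the `nohup` one-liner is simpler.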
\n", "\n", "This brings us to the last and most crucial question. \n", "How do we configure AbiPy to run Abinit workflows on different architectures ranging from \n", "standard laptops to high-performance supercomputers?\n", "\n", "Unfortunately, this notebook is already quite long, and these details are best covered \n", "in the technical documentation.\n", "What should be stressed here is that the behaviour can be customized with two YAML files.\n", "All the information related to your environment (Abinit build, modules, resource managers, shell environment)\n", "is read from the `manager.yml` configuration file, which is usually located in the `~/.abinit/abipy/` directory.\n", "The options for the Python scheduler responsible for job submission are given in `scheduler.yml`.\n", "\n", "For a more complete description of these configuration options, \n", "please consult the [TaskManager documentation](http://abinit.github.io/abipy/workflows/taskmanager.html).\n", "A list of configuration files for different machines and clusters is available \n", "[here](http://abinit.github.io/abipy/workflows/manager_examples.html),\n", "while the [Flows HOWTO](http://abinit.github.io/abipy/flows_howto.html)\n", "gathers answers to frequently asked questions.\n", "\n", "Last but not least, check out our \n", "[gallery of AbiPy Flows](http://abinit.github.io/abipy/flow_gallery/index.html) for inspiration."
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Back to the main [Index](index.ipynb)" ] } ], "metadata": { "anaconda-cloud": {}, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.0" }, "latex_envs": { "bibliofile": "biblio.bib", "cite_by": "apalike", "current_citInitial": 1, "eqLabelWithNumbers": true, "eqNumInitial": 0 } }, "nbformat": 4, "nbformat_minor": 4 }