{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "Before running the tutorial you should have Julia 1.5.x or newer installed and add the JuliaMolSim registry \n", "\n", "`] registry add https://github.com/JuliaRegistries/General`\n", "\n", "`] registry add https://github.com/JuliaMolSim/MolSim.git`\n", "\n", "Packages can then be installed using\n", "\n", "`] add JuLIP, ACE, IPFitting`\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# ACE Fitting and Regularisation" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "using IPFitting, ACE, JuLIP" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Reading in the configurations" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Reading in ./Al_LD_MD.xyz ...\n", "Processing data ...\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "┌ Info: Keys used: E => \"energy\", F => \"force\", V => \"virial\"\n", "└ @ IPFitting.Data /Users/Cas/.julia/dev/IPFitting/src/data.jl:153\n", "┌ Info: Array keys found: \"force\" [1063], \"masses\" [999], \"momenta\" [999], \"numbers\" [1063], \"positions\" [1063]\n", "└ @ IPFitting.Data /Users/Cas/.julia/dev/IPFitting/src/data.jl:157\n", "┌ Info: Info keys found: \"config_type\" [1063], \"energy\" [1063], \"virial\" [1063]\n", "└ @ IPFitting.Data /Users/Cas/.julia/dev/IPFitting/src/data.jl:157\n", "\u001b[32mProgress: 100%|█████████████████████████████████████████| Time: 0:00:07\u001b[39m\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "┌─────────────────┬───────┬───────┬───────┬───────┬───────┐\n", "│\u001b[1m config_type \u001b[0m│\u001b[1m #cfgs \u001b[0m│\u001b[1m #envs \u001b[0m│\u001b[1m #E \u001b[0m│\u001b[1m #F \u001b[0m│\u001b[1m #V \u001b[0m│\n", "│\u001b[90m String \u001b[0m│\u001b[90m Int64 \u001b[0m│\u001b[90m Int64 \u001b[0m│\u001b[90m Int64 \u001b[0m│\u001b[90m Int64 \u001b[0m│\u001b[90m Int64 \u001b[0m│\n", "├─────────────────┼───────┼───────┼───────┼───────┼───────┤\n", "│ Al_T4000 │ 32 │ 2240 │ 32 │ 6720 │ 288 │\n", "│ Al_T1000 │ 32 │ 2240 │ 32 │ 6720 │ 288 │\n", "│ Al_fcc_LD_1000K │ 999 │ 999 │ 999 │ 2997 │ 8991 │\n", "├─────────────────┼───────┼───────┼───────┼───────┼───────┤\n", "│ total │ 1063 │ 5479 │ 1063 │ 16437 │ 9567 │\n", "│ missing │ 0 │ 0 │ 0 │ 0 │ 0 │\n", "└─────────────────┴───────┴───────┴───────┴───────┴───────┘\n" ] } ], "source": [ "cfgs = IPFitting.Data.read_xyz(\"./Al_LD_MD.xyz\", energy_key=\"energy\", force_key=\"force\", virial_key=\"virial\");" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Listing the unique configuration types (taken from Python ASE Atoms objects `at.info[\"config_type\"]` from .xyz file)" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "unique(configtype.(cfgs)) = [\"Al_T4000\", \"Al_T1000\", \"Al_fcc_LD_1000K\"]\n" ] } ], "source": [ "@show unique(configtype.(cfgs));" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## ACE Basis\n", "\n", "We set up a 2B + ACE basis. We will use the many-body ACE basis for interatomic distances [`0.8*r0`, `2.0*r0`], and the 2B for all distances within a `3.0*r0` cutoff.\n", "\n", "Here `B2` is the polynomial degree of the 2B part, `N` is the number of interaction neighbours, `polydeg` the maximum polynomial degree in the ACE fit" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "r0 = rnn(:Al)\n", "N = 3\n", "deg_site = 12\n", "deg_pair = 3\n", "\n", "# construction of a basic basis for site energies \n", "Bsite = rpi_basis(species = :Al, \n", " N = N, # correlation order = body-order - 1\n", " maxdeg = deg_site, # polynomial degree\n", " r0 = r0, # estimate for NN distance\n", " rin = 0.8*r0, rcut = 2*r0, # domain for radial basis (cf documentation)\n", " pin = 2) # require smooth inner cutoff\n", "\n", "# pair potential basis \n", "Bpair = pair_basis(species = :Al, r0 = r0, maxdeg = deg_pair, \n", " rcut = 3 * r0, rin = 0.0, \n", " pin = 0 ) # pin = 0 means no inner cutoff\n", "\n", "\n", "B = JuLIP.MLIPs.IPSuperBasis([Bpair, Bsite]);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`length(B)` returns the size of the ACE Basis. Generally chosen in a 1000 to 10 000 range" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "214" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "length(B)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## LsqDB assembly\n", "\n", "Use `export JULIA_NUM_THREADS=16` to parallelise the LSQDB assembly over 16 cores. Default is assembly in serial. Unfortunately Julia doesn't allow (yet?) controlling the number of threads from within the code.\n", "\n", "The assembled least squares system is then stored on disk. For a simple problem such as this, it is not necessary. But for larger problems it can take hours to assemble the lsq matrix versus second to load it from disk. This makes it possible to experiment interactively with different fitting parameters" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "dbname = \"./Al_MD_LD_PH_B3_N3_12\"\n", "Assemble LSQ blocks in serial\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[32mProgress: 100%|█████████████████████████████████████████| Time: 0:01:15\u001b[39m\n", "┌ Info: Elapsed: 75.8s\n", "└ @ IPFitting.Tools /Users/Cas/.julia/dev/IPFitting/src/tools.jl:68\n", "┌ Info: Writing db to disk...\n", "└ @ IPFitting.DB /Users/Cas/.julia/dev/IPFitting/src/lsq_db.jl:188\n", "┌ Warning: The file ./Al_MD_LD_PH_B3_N3_12_info.json already exists. It will be renamed to ./Al_MD_LD_PH_B3_N3_12_info.json.gyrja to avoid overwriting.\n", "└ @ IPFitting.DB /Users/Cas/.julia/dev/IPFitting/src/lsq_db.jl:76\n", "┌ Warning: The file ./Al_MD_LD_PH_B3_N3_12_kron.h5 already exists. It will be renamed to ./Al_MD_LD_PH_B3_N3_12_kron.h5.zvtwj to avoid overwriting.\n", "└ @ IPFitting.DB /Users/Cas/.julia/dev/IPFitting/src/lsq_db.jl:76\n", "┌ Info: ... done\n", "└ @ IPFitting.DB /Users/Cas/.julia/dev/IPFitting/src/lsq_db.jl:195\n" ] } ], "source": [ "dbname = \"./Al_MD_LD_PH_B$(deg_pair)_N$(N)_$(deg_site)\"\n", "@show dbname\n", "dB = LsqDB(dbname, B, cfgs);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Fitting\n", "\n", "Load the assembled LsqDB database. This is not strictly necessary in this case since it is already in memory. But the more typical workflow is to have one script that assembles the database and a second script that loads it and produces fits." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "dB = LsqDB(dbname);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`E0` is the isolated atom energy\n", "\n", "Set the weights, `default` sets the default weights, which can be overwritten for a specific configuration type" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "OneBody{Float64}(Dict(:Al => -105.5951))" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "E0 = -105.5951\n", "\n", "weights = Dict(\n", " \"default\" => Dict(\"E\" => 30.0, \"F\" => 1.0 , \"V\" => 1.0 ),\n", " \"Al_fcc_LD_1000K\" => Dict(\"E\" => 40.0, \"F\" => 1.0 , \"V\" => 10.0 ));\n", "\n", "Vref = OneBody(:Al => E0)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Perform the fit using the `lsqfit` command. We're using simple ridge regression here.\n" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "┌ Info: assemble lsq system\n", "└ @ IPFitting.Lsq /Users/Cas/.julia/dev/IPFitting/src/lsq.jl:383\n", "┌ Info: Free Memory: ≈ 0.03 GB\n", "└ @ IPFitting.Lsq /Users/Cas/.julia/dev/IPFitting/src/lsq.jl:307\n", "┌ Info: Free Memory: ≈ 0.03 GB\n", "└ @ IPFitting.Lsq /Users/Cas/.julia/dev/IPFitting/src/lsq.jl:307\n", "┌ Info: Free Memory: ≈ 0.13 GB\n", "└ @ IPFitting.Lsq /Users/Cas/.julia/dev/IPFitting/src/lsq.jl:307\n", "┌ Info: solve (23878, 214) LSQ system using Ridge Regression [r = 1.05] \n", "└ @ IPFitting.Lsq /Users/Cas/.julia/dev/IPFitting/src/lsq.jl:432\n", "┌ Info: `reglsq` : solve regularised least squares\n", "└ @ IPFitting.Lsq /Users/Cas/.julia/dev/IPFitting/src/lsq.jl:1160\n", "┌ Info: found bracket, starting bisection\n", "└ @ IPFitting.Lsq /Users/Cas/.julia/dev/IPFitting/src/lsq.jl:1188\n", "┌ Info: found a solution\n", "└ @ IPFitting.Lsq /Users/Cas/.julia/dev/IPFitting/src/lsq.jl:1211\n", "┌ Info: Free Memory: ≈ 0.09 GB\n", "└ @ IPFitting.Lsq /Users/Cas/.julia/dev/IPFitting/src/lsq.jl:307\n", "┌ Info: Relative RMSE on training set: 1.4690414202516895\n", "└ @ IPFitting.Lsq /Users/Cas/.julia/dev/IPFitting/src/lsq.jl:1024\n", "┌ Info: Assemble errors table\n", "└ @ IPFitting.Lsq /Users/Cas/.julia/dev/IPFitting/src/lsq.jl:1056\n", "┌ Warning: new error implementation... redo this part please \n", "└ @ IPFitting.Lsq /Users/Cas/.julia/dev/IPFitting/src/lsq.jl:1057\n", "┌ Info: Assemble Information about the fit\n", "└ @ IPFitting.Lsq /Users/Cas/.julia/dev/IPFitting/src/lsq.jl:1070\n" ] } ], "source": [ "IP, lsqinfo = lsqfit(dB, solver=(:rid, 1.05), weights = weights, Vref = Vref,\n", " asmerrs = true);" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "┌─────────────────┬─────────┬──────────┬────────┬────────┐\n", "│\u001b[1m config type \u001b[0m│\u001b[1m E [meV] \u001b[0m│\u001b[1m F [eV/A] \u001b[0m│\u001b[1m F [%] \u001b[0m│\u001b[1m V[meV] \u001b[0m│\n", "├─────────────────┼─────────┼──────────┼────────┼────────┤\n", "│ Al_T1000 │ 3.402 │ 0.078 │ 10.306 │ 19.783 │\n", "│ Al_T4000 │ 5.238 │ 0.296 │ 8.816 │ 50.306 │\n", "│ Al_fcc_LD_1000K │ 4.092 │ 0.000 │ NaN │ 55.919 │\n", "├─────────────────┼─────────┼──────────┼────────┼────────┤\n", "│ set │ 4.112 │ 0.196 │ 8.894 │ 55.014 │\n", "└─────────────────┴─────────┴──────────┴────────┴────────┘\n" ] } ], "source": [ "rmse_table(lsqinfo[\"errors\"])" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "save_dict(\"ACE_Al.json\", Dict(\"IP\" => write_dict(IP), \"info\" => lsqinfo))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2B Repulsion" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Add 2B repulsion for interatomic distances < `0.6*r0`. This ensures the potential is repulsive at small interatomic distances where there is no data." ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "ri = round(0.6 * r0, digits=2)\n", "Vfit = IP.components[2]\n", "Vrep = ACE.PairPotentials.RepulsiveCore(Vfit, ri)\n", "\n", "IP.components[2] = Vrep;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Calculating error/force/virial errors with regularised and 2B repulsion fit. " ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "add_fits_serial!(IP, cfgs; fitkey=\"fit_rep\")\n", "rmse_, rmserel_ = rmse(cfgs; fitkey=\"fit_rep\");" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "┌─────────────────┬─────────┬──────────┬────────┬────────┐\n", "│\u001b[1m config type \u001b[0m│\u001b[1m E [meV] \u001b[0m│\u001b[1m F [eV/A] \u001b[0m│\u001b[1m F [%] \u001b[0m│\u001b[1m V[meV] \u001b[0m│\n", "├─────────────────┼─────────┼──────────┼────────┼────────┤\n", "│ Al_T1000 │ 3.402 │ 0.078 │ 10.306 │ 19.783 │\n", "│ Al_T4000 │ 5.198 │ 0.296 │ 8.810 │ 50.504 │\n", "│ Al_fcc_LD_1000K │ 4.092 │ 0.000 │ NaN │ 55.919 │\n", "├─────────────────┼─────────┼──────────┼────────┼────────┤\n", "│ set │ 4.111 │ 0.196 │ 8.888 │ 55.020 │\n", "└─────────────────┴─────────┴──────────┴────────┴────────┘\n" ] } ], "source": [ "rmse_table(rmse_, rmserel_)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Saving the fit" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "ename": "LoadError", "evalue": "\u001b[91mUndefVarError: IP_reg not defined\u001b[39m", "output_type": "error", "traceback": [ "\u001b[91mUndefVarError: IP_reg not defined\u001b[39m", "", "Stacktrace:", " [1] top-level scope at In[15]:5", " [2] include_string(::Function, ::Module, ::String, ::String) at ./loading.jl:1091" ] } ], "source": [ "lsqinfo[\"errors\"][\"rmse\"] = rmse_\n", "lsqinfo[\"errors\"][\"rmserel\"] = rmserel_\n", "\n", "potname = \"$(dbname)_regfit_$(ri)_rep.json\"\n", "save_dict(potname, Dict(\"IP\" => write_dict(IP_reg), \"info\" => lsqinfo))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Usage" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "ename": "LoadError", "evalue": "\u001b[91mSystemError: opening file \"./Al_MD_LD_PH_B3_N3_12_regfit_1.72_rep.json\": No such file or directory\u001b[39m", "output_type": "error", "traceback": [ "\u001b[91mSystemError: opening file \"./Al_MD_LD_PH_B3_N3_12_regfit_1.72_rep.json\": No such file or directory\u001b[39m", "", "Stacktrace:", " [1] systemerror(::String, ::Int32; extrainfo::Nothing) at ./error.jl:168", " [2] #systemerror#48 at ./error.jl:167 [inlined]", " [3] systemerror at ./error.jl:167 [inlined]", " [4] open(::String; lock::Bool, read::Nothing, write::Nothing, create::Nothing, truncate::Nothing, append::Nothing) at ./iostream.jl:284", " [5] open at ./iostream.jl:273 [inlined]", " [6] open(::JSON.Parser.var\"#4#5\"{DataType,DataType,Nothing,Bool,Bool,Int64}, ::String; kwargs::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at ./io.jl:323", " [7] open at ./io.jl:323 [inlined]", " [8] #parsefile#3 at /Users/Cas/.julia/packages/JSON/3rsiS/src/Parser.jl:519 [inlined]", " [9] parsefile at /Users/Cas/.julia/packages/JSON/3rsiS/src/Parser.jl:518 [inlined]", " [10] load_dict(::String) at /Users/Cas/.julia/dev/JuLIP/src/FIO.jl:78", " [11] top-level scope at In[16]:1", " [12] include_string(::Function, ::Module, ::String, ::String) at ./loading.jl:1091" ] } ], "source": [ "V = read_dict(load_dict(potname)[\"IP\"]);" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [], "source": [ "at = bulk(:Al) * (2,2,2);" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "ename": "LoadError", "evalue": "\u001b[91mUndefVarError: V not defined\u001b[39m", "output_type": "error", "traceback": [ "\u001b[91mUndefVarError: V not defined\u001b[39m", "", "Stacktrace:", " [1] top-level scope at In[18]:1", " [2] include_string(::Function, ::Module, ::String, ::String) at ./loading.jl:1091" ] } ], "source": [ "energy(V, at)" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "ename": "LoadError", "evalue": "\u001b[91mUndefVarError: V not defined\u001b[39m", "output_type": "error", "traceback": [ "\u001b[91mUndefVarError: V not defined\u001b[39m", "", "Stacktrace:", " [1] top-level scope at In[19]:1", " [2] include_string(::Function, ::Module, ::String, ::String) at ./loading.jl:1091" ] } ], "source": [ "forces(V, at)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "@webio": { "lastCommId": null, "lastKernelId": null }, "kernelspec": { "display_name": "Julia 1.5.2", "language": "julia", "name": "julia-1.5" }, "language_info": { "file_extension": ".jl", "mimetype": "application/julia", "name": "julia", "version": "1.5.2" } }, "nbformat": 4, "nbformat_minor": 2 }