{
 "cells": [
  {
   "cell_type": "markdown",
   "source": [
    "# Probability map of phytoplankton in the North Sea using DIVAnd and a neural network\n",
    "The first step is to load the required modules."
   ],
   "metadata": {}
  },
  {
   "outputs": [],
   "cell_type": "code",
   "source": [
    "using DIVAnd\n",
    "using DIVAndNN\n",
    "using LinearAlgebra\n",
    "using Statistics\n",
    "using Random\n",
    "using Dates"
   ],
   "metadata": {},
   "execution_count": null
  },
  {
   "cell_type": "markdown",
   "source": [
    "The domain and the directory path `datadir` are defined in the file `emodnet_bio_grid.jl`."
   ],
   "metadata": {}
  },
  {
   "outputs": [],
   "cell_type": "code",
   "source": [
    "include(\"../scripts/emodnet_bio_grid.jl\");\n",
    "include(\"../scripts/validate_probability.jl\");\n",
    "include(\"../scripts/PhytoInterp.jl\");"
   ],
   "metadata": {},
   "execution_count": null
  },
  {
   "cell_type": "markdown",
   "source": [
    "Create working directories"
   ],
   "metadata": {}
  },
  {
   "outputs": [],
   "cell_type": "code",
   "source": [
    "mkpath(datadir)\n",
    "mkpath(joinpath(datadir,\"tmp\"))"
   ],
   "metadata": {},
   "execution_count": null
  },
  {
   "cell_type": "markdown",
   "source": [
    "Helper function that downloads a file from a URL only if it is not already present locally"
   ],
   "metadata": {}
  },
  {
   "outputs": [],
   "cell_type": "code",
   "source": [
    "function maybedownload(url,fname)\n",
    "    if !isfile(fname)\n",
    "        mv(download(url),fname)\n",
    "    else\n",
    "        @info(\"$url is already downloaded\")\n",
    "    end\n",
    "end"
   ],
   "metadata": {},
   "execution_count": null
  },
  {
   "cell_type": "markdown",
   "source": [
    "Download the GEBCO Bathymetry"
   ],
   "metadata": {}
  },
  {
   "outputs": [],
   "cell_type": "code",
   "source": [
    "bathname = joinpath(datadir,\"gebco_30sec_4.nc\");\n",
    "bathisglobal = true\n",
    "maybedownload(\"https://dox.ulg.ac.be/index.php/s/RSwm4HPHImdZoQP/download\",\n",
    "              bathname)"
   ],
   "metadata": {},
   "execution_count": null
  },
  {
   "cell_type": "markdown",
   "source": [
    "Download a sample data file.\n",
    "Here we use the _Biddulphia sinensis_ observations prepared by Deltares (NL)."
   ],
   "metadata": {}
  },
  {
   "outputs": [],
   "cell_type": "code",
   "source": [
    "datafile = joinpath(datadir, \"Biddulphia sinensis-1995-2020.csv\")\n",
    "maybedownload(\"https://dox.ulg.ac.be/index.php/s/VgLglubaTLetHzc/download\", datafile)"
   ],
   "metadata": {},
   "execution_count": null
  },
  {
   "cell_type": "markdown",
   "source": [
    "## Mask and bathymetry\n",
    "Interpolate land-sea mask"
   ],
   "metadata": {}
  },
  {
   "outputs": [],
   "cell_type": "code",
   "source": [
    "maskname = joinpath(datadir,\"mask.nc\")\n",
    "DIVAndNN.prep_mask(bathname,bathisglobal,gridlon,gridlat,years,maskname)"
   ],
   "metadata": {},
   "execution_count": null
  },
  {
   "cell_type": "markdown",
   "source": [
    "Load the mask (true: sea, false: land)"
   ],
   "metadata": {}
  },
  {
   "outputs": [],
   "cell_type": "code",
   "source": [
    "ds = Dataset(maskname,\"r\")\n",
    "mask = nomissing(ds[\"mask\"][:,:]) .== 1\n",
    "close(ds)"
   ],
   "metadata": {},
   "execution_count": null
  },
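  {
   "cell_type": "markdown",
   "source": [
    "The comparison `.== 1` used when loading the mask turns the numeric values read from the file into a Boolean array. A minimal sketch with a made-up 2 by 3 array (the values are purely illustrative and are not taken from `mask.nc`):\n",
    "\n",
    "```julia\n",
    "# Hypothetical numeric mask as it could be stored in a file (1: sea, 0: land)\n",
    "raw = [1 0 1; 1 1 0]\n",
    "\n",
    "# Element-wise comparison yields a Boolean array of the same size\n",
    "mask_demo = raw .== 1\n",
    "\n",
    "# Number of sea points in this toy example\n",
    "nsea = count(mask_demo)\n",
    "```"
   ],
   "metadata": {}
  },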
  {
   "cell_type": "markdown",
   "source": [
    "Interpolate the bathymetry"
   ],
   "metadata": {}
  },
  {
   "outputs": [],
   "cell_type": "code",
   "source": [
    "DIVAndNN.prep_bath(bathname,bathisglobal,gridlon,gridlat,datadir)"
   ],
   "metadata": {},
   "execution_count": null
  },
  {
   "cell_type": "markdown",
   "source": [
    "## Environmental covariables\n",
    "These files are quite large and processing them takes some time. We therefore\n",
    "download the prepared data files for the North Sea."
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "These files can be generated by:\n",
    "```julia\n",
    "maybedownload(\"https://ec.oceanbrowser.net/data/emodnet-projects/Phase-3/Combined/Water_body_phosphate_combined_V1.nc\",\n",
    "              joinpath(datadir,\"tmp\",\"Water_body_phosphate_combined_V1.nc\"))\n",
    "\n",
    "maybedownload(\"https://ec.oceanbrowser.net/data/emodnet-projects/Phase-3/Combined/Water_body_nitrogen_combined_V1.nc\",\n",
    "              joinpath(datadir,\"tmp\",\"Water_body_nitrogen_combined_V1.nc\"))\n",
    "\n",
    "maybedownload(\"https://ec.oceanbrowser.net/data/emodnet-projects/Phase-3/Combined/Water_body_silicate_combined_V1.nc\",\n",
    "              joinpath(datadir,\"tmp\",\"Water_body_silicate_combined_V1.nc\"))\n",
    "\n",
    "DIVAndNN.prep_tempsalt(gridlon,gridlat,data_TS,datadir)\n",
    "```"
   ],
   "metadata": {}
  },
  {
   "outputs": [],
   "cell_type": "code",
   "source": [
    "maybedownload(\"https://dox.ulg.ac.be/index.php/s/y9Z0c1wb5YshVDW/download\",\n",
    "              joinpath(datadir,\"silicate.nc\"))\n",
    "\n",
    "maybedownload(\"https://dox.ulg.ac.be/index.php/s/A1NPSWwQYkx6Wy6/download\",\n",
    "              joinpath(datadir,\"phosphate.nc\"))\n",
    "\n",
    "maybedownload(\"https://dox.ulg.ac.be/index.php/s/LDPbPWBvW6wPmCw/download\",\n",
    "              joinpath(datadir,\"nitrogen.nc\"))\n",
    "\n",
    "\n",
    "BLAS.set_num_threads(1)"
   ],
   "metadata": {},
   "execution_count": null
  },
  {
   "cell_type": "markdown",
   "source": [
    "Compute local resolution"
   ],
   "metadata": {}
  },
  {
   "outputs": [],
   "cell_type": "code",
   "source": [
    "mask_unused,pmn,xyi = DIVAnd.domain(bathname,bathisglobal,gridlon,gridlat);"
   ],
   "metadata": {},
   "execution_count": null
  },
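  {
   "cell_type": "markdown",
   "source": [
    "In DIVAnd, `pmn` holds the inverse of the local grid resolution in each direction. As a rough illustration of the idea only (this is not the code used by `DIVAnd.domain`), the east-west grid spacing on a sphere can be approximated as follows. The Earth radius value and the function name `zonal_spacing` are assumptions made for this sketch:\n",
    "\n",
    "```julia\n",
    "# Assumed mean Earth radius in meters (illustrative value)\n",
    "const R_EARTH = 6371e3\n",
    "\n",
    "# Approximate east-west grid spacing (meters) for a longitude step `dlon`\n",
    "# (degrees) at latitude `lat` (degrees), assuming a spherical Earth\n",
    "zonal_spacing(dlon, lat) = R_EARTH * cosd(lat) * deg2rad(dlon)\n",
    "\n",
    "dx = zonal_spacing(0.1, 55.0)   # spacing of a 0.1 degree grid at 55 degrees north\n",
    "pm_demo = 1 / dx                # inverse resolution, in 1/m\n",
    "```\n",
    "\n",
    "Because the spacing shrinks with the cosine of the latitude, the inverse resolution varies across the grid, which is why it is computed locally."
   ],
   "metadata": {}
  },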
  {
   "cell_type": "markdown",
   "source": [
    "Next we load the covariables.\n",
    "Each entry below gives the file name, the variable name and the\n",
    "transformation function applied to the variable."
   ],
   "metadata": {}
  },
  {
   "outputs": [],
   "cell_type": "code",
   "source": [
    "covars_fname = [\n",
    "    (\"bathymetry.nc\" , \"batymetry\" , identity),\n",
    "    (\"nitrogen.nc\"   , \"nitrogen\"  , identity),\n",
    "    (\"phosphate.nc\"  , \"phosphate\" , identity),\n",
    "    (\"silicate.nc\"   , \"silicate\"  , identity),\n",
    "]"
   ],
   "metadata": {},
   "execution_count": null
  },
  {
   "cell_type": "markdown",
   "source": [
    "Prepend `datadir` to the file names and load the covariables"
   ],
   "metadata": {}
  },
  {
   "outputs": [],
   "cell_type": "code",
   "source": [
    "covars_fname = map(entry -> (joinpath(datadir,entry[1]),entry[2:end]...),covars_fname)\n",
    "\n",
    "field = DIVAndNN.loadcovar((gridlon,gridlat),covars_fname;\n",
    "                           covars_const = true);"
   ],
   "metadata": {},
   "execution_count": null
  },
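  {
   "cell_type": "markdown",
   "source": [
    "The `map` call rebuilds each tuple with the full path as first element and leaves the remaining elements untouched. A small self-contained sketch of the same pattern, using a hypothetical directory name and a single made-up entry:\n",
    "\n",
    "```julia\n",
    "# Hypothetical directory and entry, for illustration only\n",
    "datadir_demo = \"data\"\n",
    "entries = [(\"nitrogen.nc\", \"nitrogen\", identity)]\n",
    "\n",
    "# Replace the first element of each tuple by its full path and keep the\n",
    "# remaining elements (variable name and transformation) unchanged\n",
    "entries_full = map(e -> (joinpath(datadir_demo, e[1]), e[2:end]...), entries)\n",
    "```"
   ],
   "metadata": {}
  },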
  {
   "cell_type": "markdown",
   "source": [
    "Normalize covariables"
   ],
   "metadata": {}
  },
  {
   "outputs": [],
   "cell_type": "code",
   "source": [
    "DIVAndNN.normalize!(mask,field)"
   ],
   "metadata": {},
   "execution_count": null
  },
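  {
   "cell_type": "markdown",
   "source": [
    "A common way to normalize covariables is to subtract the mean and divide by the standard deviation, both computed over sea points only. The sketch below illustrates this idea on synthetic data. It is an assumption about what such a step typically does, not a copy of `DIVAndNN.normalize!`, and `standardize_sketch!` is a hypothetical name:\n",
    "\n",
    "```julia\n",
    "using Statistics\n",
    "\n",
    "# Standardize each covariable (third dimension) using sea points only\n",
    "function standardize_sketch!(mask, field)\n",
    "    for n in axes(field, 3)\n",
    "        layer = view(field, :, :, n)\n",
    "        sea = layer[mask]                 # copy of the values at sea points\n",
    "        layer .= (layer .- mean(sea)) ./ std(sea)\n",
    "    end\n",
    "    return field\n",
    "end\n",
    "\n",
    "# Synthetic 2 by 2 grid with two covariables and one land point\n",
    "mask_demo = [true true; false true]\n",
    "field_demo = reshape(Float64[1, 2, 30, 4, 10, 20, 300, 40], 2, 2, 2)\n",
    "standardize_sketch!(mask_demo, field_demo)\n",
    "```\n",
    "\n",
    "Restricting the statistics to sea points keeps placeholder values over land from distorting the scaling."
   ],
   "metadata": {}
  },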
  {
   "cell_type": "markdown",
   "source": [
    "Build an inventory of all data files.\n",
    "For this example we have just one file."
   ],
   "metadata": {}
  },
  {
   "outputs": [],
   "cell_type": "code",
   "source": [
    "data_analysis = DIVAndNN.Format2020(datadir,\"\")\n",
    "scientificname_accepted = listnames(data_analysis);"
   ],
   "metadata": {},
   "execution_count": null
  },
  {
   "cell_type": "markdown",
   "source": [
    "Parameters for the analysis.\n",
    "All parameters except `len` are dimensionless."
   ],
   "metadata": {}
  },
  {
   "outputs": [],
   "cell_type": "code",
   "source": [
    "niter = 500                         # number of iterations\n",
    "trainfrac = 0.01                    # fraction of data used during training\n",
    "epsilon2ap = 10                     # data constraint parameter\n",
    "epsilon2_background = 10            # error variance of obs. relative to background\n",
    "NLayers = [size(field)[end],4,1]    # number of neurons in each layer of the neural network\n",
    "learning_rate = 0.001               # learning rate for the optimizer\n",
    "L2reg = 0.0001                      # L2 regularization for the weights\n",
    "dropoutprob = 0.6                   # drop-out probability\n",
    "len = 75e3                          # correlation length-scale (meters)"
   ],
   "metadata": {},
   "execution_count": null
  },
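  {
   "cell_type": "markdown",
   "source": [
    "`NLayers` gives the size of each layer: one input per covariable, a hidden layer of 4 neurons and a single output. Assuming a standard fully connected network with one bias per neuron (an assumption, since the internal layout of DIVAndNN is not shown in this notebook), the number of trainable parameters for a hypothetical case with 4 input covariables can be counted as:\n",
    "\n",
    "```julia\n",
    "# Hypothetical layer sizes: 4 inputs, 4 hidden neurons, 1 output\n",
    "nlayers_demo = [4, 4, 1]\n",
    "\n",
    "# Each layer transition has (inputs * outputs) weights plus one bias per output\n",
    "nparams = sum(nlayers_demo[i] * nlayers_demo[i+1] + nlayers_demo[i+1]\n",
    "              for i in 1:length(nlayers_demo)-1)\n",
    "```\n",
    "\n",
    "Such a small network keeps the number of parameters low compared to the number of observations, which, together with drop-out and L2 regularization, limits overfitting."
   ],
   "metadata": {}
  },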
  {
   "cell_type": "markdown",
   "source": [
    "Create the output directory and select the species to analyse"
   ],
   "metadata": {}
  },
  {
   "outputs": [],
   "cell_type": "code",
   "source": [
    "outdir = joinpath(datadir,\"Results\",\"test\")\n",
    "mkpath(outdir)\n",
    "\n",
    "sname = String(scientificname_accepted[1])\n",
    "\n",
    "@info sname"
   ],
   "metadata": {},
   "execution_count": null
  },
  {
   "cell_type": "markdown",
   "source": [
    "Load the observations for the selected species and fix the random seed"
   ],
   "metadata": {}
  },
  {
   "outputs": [],
   "cell_type": "code",
   "source": [
    "lon_a,lat_a,obstime_a,value_a,ids_a = loadbyname(data_analysis,years,sname)\n",
    "\n",
    "Random.seed!(1234)\n",
    "\n",
    "xobs_a = (lon_a,lat_a)\n",
    "lenxy = (len,len)"
   ],
   "metadata": {},
   "execution_count": null
  },
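  {
   "cell_type": "markdown",
   "source": [
    "Seeding the random number generator makes steps that draw random numbers repeatable from run to run. A minimal illustration using only the `Random` standard library (the variable names are chosen for this sketch):\n",
    "\n",
    "```julia\n",
    "using Random\n",
    "\n",
    "Random.seed!(1234)\n",
    "a = rand(3)\n",
    "\n",
    "# Re-seeding with the same value reproduces the same sequence\n",
    "Random.seed!(1234)\n",
    "b = rand(3)\n",
    "\n",
    "same = (a == b)\n",
    "```"
   ],
   "metadata": {}
  },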
  {
   "cell_type": "markdown",
   "source": [
    "## Start the analysis"
   ],
   "metadata": {}
  },
  {
   "outputs": [],
   "cell_type": "code",
   "source": [
    "value_analysis,fw0 = DIVAndNN.analysisprob(\n",
    "    mask,pmn,xyi,xobs_a,\n",
    "    value_a,\n",
    "    lenxy,epsilon2ap,\n",
    "    field,\n",
    "    NLayers,\n",
    "    costfun = DIVAndNN.nll,\n",
    "    niter = niter,\n",
    "    dropoutprob = dropoutprob,\n",
    "    L2reg = L2reg,\n",
    "    learning_rate = learning_rate,\n",
    "    rmaverage = true,\n",
    "    trainfrac = trainfrac,\n",
    "    epsilon2_background = epsilon2_background,\n",
    ");"
   ],
   "metadata": {},
   "execution_count": null
  },
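  {
   "cell_type": "markdown",
   "source": [
    "The cost function `DIVAndNN.nll` is presumably a negative log-likelihood. For binary presence/absence observations, a common choice is the Bernoulli negative log-likelihood of a logistic-transformed field. The sketch below shows that textbook form on made-up numbers. `logistic` and `nll_sketch` are hypothetical names, and the exact definition used by DIVAndNN may differ:\n",
    "\n",
    "```julia\n",
    "using Statistics\n",
    "\n",
    "# Logistic function mapping any real number to a probability in (0, 1)\n",
    "logistic(x) = 1 / (1 + exp(-x))\n",
    "\n",
    "# Mean Bernoulli negative log-likelihood of observations y (0 or 1)\n",
    "# given an unbounded field x\n",
    "function nll_sketch(x, y)\n",
    "    p = logistic.(x)\n",
    "    return -mean(y .* log.(p) .+ (1 .- y) .* log.(1 .- p))\n",
    "end\n",
    "\n",
    "x_demo = [2.0, -1.0, 0.0]   # made-up field values at the observation points\n",
    "y_demo = [1, 0, 1]          # made-up presence (1) / absence (0) data\n",
    "cost = nll_sketch(x_demo, y_demo)\n",
    "```\n",
    "\n",
    "The cost decreases as the predicted probabilities move towards the observed presences and absences, so minimizing it fits the probability map to the data."
   ],
   "metadata": {}
  },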
  {
   "cell_type": "markdown",
   "source": [
    "## Save the results"
   ],
   "metadata": {}
  },
  {
   "outputs": [],
   "cell_type": "code",
   "source": [
    "outname = joinpath(outdir,\"DIVAndNN_$(sname)_interp.nc\")\n",
    "create_nc_results(outname, gridlon, gridlat, value_analysis, sname;\n",
    "                  varname = \"probability\", long_name=\"occurrence probability\");"
   ],
   "metadata": {},
   "execution_count": null
  },
  {
   "cell_type": "markdown",
   "source": [
    "## Plots"
   ],
   "metadata": {}
  },
  {
   "outputs": [],
   "cell_type": "code",
   "source": [
    "include(\"../scripts/emodnet_bio_plot2.jl\")"
   ],
   "metadata": {},
   "execution_count": null
  },
  {
   "cell_type": "markdown",
   "source": [
    "---\n",
    "\n",
    "*This notebook was generated using [Literate.jl](https://github.com/fredrikekre/Literate.jl).*"
   ],
   "metadata": {}
  }
 ],
 "nbformat_minor": 3,
 "metadata": {
  "language_info": {
   "file_extension": ".jl",
   "mimetype": "application/julia",
   "name": "julia",
   "version": "1.4.1"
  },
  "kernelspec": {
   "name": "julia-1.4",
   "display_name": "Julia 1.4.1",
   "language": "julia"
  }
 },
 "nbformat": 4
}