{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Getting Started with rTASSEL\n",
    "In this notebook, we will show you the basic usages of `rTASSEL`. `rTASSEL` can be utilized in genetic analysis pipelines in several ways:\n",
    "* Genotype data handling and manipulation\n",
    "* Genetic analysis (e.g. association, LD, etc.)\n",
    "* Visualization"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Load package\n",
    "`rTASSEL` can simply be loaded using the `library()` method once installed:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "library(rTASSEL)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Load genotype and phenotype data\n",
    "In the following code block, we will load example phenotype and genotype data from a file path within the `rTASSEL` package. Once loaded, we can use `rTASSEL`'s `readGenotypePhenotype` method to create an `rTASSEL` data object from the file paths:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Load gentoype info\n",
    "genoPathHMP <- system.file(\"extdata\", \"mdp_genotype.hmp.txt\", package = \"rTASSEL\")\n",
    "\n",
    "# Load phenotype info\n",
    "phenoPath  <- system.file(\"extdata\", \"mdp_traits.txt\", package = \"rTASSEL\")\n",
    "\n",
    "# Create rTASSEL data object\n",
    "tasGenoPheno <- readGenotypePhenotype(\n",
    "    genoPathOrObj    = genoPathHMP,\n",
    "    phenoPathDFOrObj = phenoPath\n",
    ")\n",
    "\n",
    "# View data object\n",
    "tasGenoPheno"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Generate kinship analysis\n",
    "In TASSEL, for mixed linear model analyses, a kinship matrix calculated from genotype data is necessary. This can be accomplished by calculating a kinship TASSEL object using the function `kinshipMatrix()` from `rTASSEL`. The main parameter input is a `TasselGenotypePhenotype` class object that contains a genotype data set (i.e. our prior data object):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "tasKin <- kinshipMatrix(tasObj = tasGenoPheno)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Show summary object\n",
    "tasKin"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Since this object is essentially a TASSEL reference object, it is relatively not of use until association analysis. We can coerce this TASSEL object by using the following function, `as.matrix()` from base R to return a `matrix` object:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Get full R matrix\n",
    "tasKinRMat <- as.matrix(tasKin)\n",
    "\n",
    "# Inspect the first 5 rows and columns\n",
    "tasKinRMat[1:5, 1:5]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Association analyses\n",
    "One of TASSEL's most powerful functionalities is its capability of performing\n",
    "a variety of different association modeling techniques. If you have started\n",
    "reading the walkthrough here it is *strongly suggested that you read the other\n",
    "components of this walkthrough since the following parameters require what we\n",
    "have previously created!*\n",
    "\n",
    "If you are not familar with these methods, more information about how\n",
    "these operate in base TASSEL can be found at following links:\n",
    "\n",
    "* [BLUE/GLM](https://bitbucket.org/tasseladmin/tassel-5-source/wiki/UserManual/GLM/GLM)\n",
    "* [MLM](https://bitbucket.org/tasseladmin/tassel-5-source/wiki/UserManual/MLM/MLM)\n",
    "* [Fast Association](https://bitbucket.org/tasseladmin/tassel-5-source/wiki/UserManual/FastAssociation/FastAssociation)\n",
    "\n",
    "The `assocModelFitter()` function has several primary components:\n",
    "\n",
    "* `tasObj`: a `TasselGenotypePhenotype` class R object\n",
    "* `formula`: an R-based linear model formula\n",
    "* `fitMarkers`: a boolean parameter to differentiate between BLUE and GLM\n",
    "  analyses\n",
    "* `kinship`: a TASSEL kinship object\n",
    "* `fastAssociation`: a boolean parameter for data sets that have many traits\n",
    "\n",
    "Probably the most important concept of this function is `formula` parameter.\n",
    "If you are familar with standard R linear model functions, this concept is\n",
    "fairly similar. In TASSEL, a linear model is composed of the following scheme:\n",
    "\n",
    "$$y \\sim A_{1} + A_{2} + \\dots A_{n}$$\n",
    "\n",
    "Where $y$ is any TASSEL `data` type and $A_{n}$ is any TASSEL `covariate`\n",
    "and / or `factor` types. In the following example, we will fit a mixed linear model (MLM) with our `rTASSEL` data object. In addition to the prior parameters, we will also need the TASSEL kinship object:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Calculate MLM\n",
    "tasMLM <- assocModelFitter(\n",
    "    tasObj = tasGenoPheno,  # <- our prior TASSEL object\n",
    "    formula = EarHT ~ .,    # <- run only EarHT\n",
    "    fitMarkers = TRUE,      # <- set this to TRUE for GLM\n",
    "    kinship = tasKin,       # <- our prior kinship object\n",
    "    fastAssociation = FALSE\n",
    ")\n",
    "\n",
    "# Return names of objects returned\n",
    "names(tasMLM)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For more information about the extent of these methods, please read this [section](https://maize-genetics.github.io/rTASSEL/articles/rtassel_walkthrough.html#overview-2) of our introductory vignette."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Visualizations\n",
    "`rTASSEL` supports automated visualizations for Manhattan plots and linkage disequilibrium (LD) analyses. In this example, we will generate a Manhattan plot for our prior MLM object:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Generate Manhattan plot for ear height trait\n",
    "manhattanEH <- manhattanPlot(\n",
    "    assocStats = tasMLM$MLM_Stats,\n",
    "    trait      = \"EarHT\",\n",
    "    threshold  = 5\n",
    ")\n",
    "\n",
    "# View visualization\n",
    "manhattanEH"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Similarly, we can also visualize LD using automated methods. Like most LD plots, it is wise to filter your genotype information to a specific region of interest:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Filter genotype table by position\n",
    "tasGenoPhenoFilt <- filterGenotypeTableSites(\n",
    "    tasObj              = tasGenoPheno,\n",
    "    siteRangeFilterType = \"position\",\n",
    "    startPos            = 228e6,\n",
    "    endPos              = 300e6,\n",
    "    startChr            = 2,\n",
    "    endChr              = 2\n",
    ")\n",
    "\n",
    "# Generate and visualize LD\n",
    "myLD <- ldPlot(\n",
    "    tasObj  = tasGenoPhenoFilt,\n",
    "    ldType  = \"All\",\n",
    "    plotVal = \"r2\",\n",
    "    verbose = FALSE\n",
    ")\n",
    "\n",
    "# View LD\n",
    "myLD"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Genomic prediction\n",
    "`rTASSEL` also allows for phenotypic prediction through genotype information via genomic best linear unbiased predictors (gBLUPs):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "tasCV <- genomicPrediction(\n",
    "    tasPhenoObj = tasGenoPheno,\n",
    "    kinship     = tasKin,\n",
    "    doCV        = TRUE,\n",
    "    kFolds      = 5,\n",
    "    nIter       = 1\n",
    ")\n",
    "head(tasCV)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "R",
   "language": "R",
   "name": "ir"
  },
  "language_info": {
   "codemirror_mode": "r",
   "file_extension": ".r",
   "mimetype": "text/x-r-source",
   "name": "R",
   "pygments_lexer": "r",
   "version": "4.1.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}