{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## Getting Started with rTASSEL\n", "In this notebook, we will show you the basic usages of `rTASSEL`. `rTASSEL` can be utilized in genetic analysis pipelines in several ways:\n", "* Genotype data handling and manipulation\n", "* Genetic analysis (e.g. association, LD, etc.)\n", "* Visualization" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Load package\n", "`rTASSEL` can simply be loaded using the `library()` method once installed:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "library(rTASSEL)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Load genotype and phenotype data\n", "In the following code block, we will load example phenotype and genotype data from a file path within the `rTASSEL` package. Once loaded, we can use `rTASSEL`'s `readGenotypePhenotype` method to create an `rTASSEL` data object from the file paths:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Load gentoype info\n", "genoPathHMP <- system.file(\"extdata\", \"mdp_genotype.hmp.txt\", package = \"rTASSEL\")\n", "\n", "# Load phenotype info\n", "phenoPath <- system.file(\"extdata\", \"mdp_traits.txt\", package = \"rTASSEL\")\n", "\n", "# Create rTASSEL data object\n", "tasGenoPheno <- readGenotypePhenotype(\n", " genoPathOrObj = genoPathHMP,\n", " phenoPathDFOrObj = phenoPath\n", ")\n", "\n", "# View data object\n", "tasGenoPheno" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Generate kinship analysis\n", "In TASSEL, for mixed linear model analyses, a kinship matrix calculated from genotype data is necessary. This can be accomplished by calculating a kinship TASSEL object using the function `kinshipMatrix()` from `rTASSEL`. The main parameter input is a `TasselGenotypePhenotype` class object that contains a genotype data set (i.e. our prior data object):" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "tasKin <- kinshipMatrix(tasObj = tasGenoPheno)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Show summary object\n", "tasKin" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Since this object is essentially a TASSEL reference object, it is relatively not of use until association analysis. We can coerce this TASSEL object by using the following function, `as.matrix()` from base R to return a `matrix` object:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Get full R matrix\n", "tasKinRMat <- as.matrix(tasKin)\n", "\n", "# Inspect the first 5 rows and columns\n", "tasKinRMat[1:5, 1:5]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Association analyses\n", "One of TASSEL's most powerful functionalities is its capability of performing\n", "a variety of different association modeling techniques. If you have started\n", "reading the walkthrough here it is *strongly suggested that you read the other\n", "components of this walkthrough since the following parameters require what we\n", "have previously created!*\n", "\n", "If you are not familar with these methods, more information about how\n", "these operate in base TASSEL can be found at following links:\n", "\n", "* [BLUE/GLM](https://bitbucket.org/tasseladmin/tassel-5-source/wiki/UserManual/GLM/GLM)\n", "* [MLM](https://bitbucket.org/tasseladmin/tassel-5-source/wiki/UserManual/MLM/MLM)\n", "* [Fast Association](https://bitbucket.org/tasseladmin/tassel-5-source/wiki/UserManual/FastAssociation/FastAssociation)\n", "\n", "The `assocModelFitter()` function has several primary components:\n", "\n", "* `tasObj`: a `TasselGenotypePhenotype` class R object\n", "* `formula`: an R-based linear model formula\n", "* `fitMarkers`: a boolean parameter to differentiate between BLUE and GLM\n", " analyses\n", "* `kinship`: a TASSEL kinship object\n", "* `fastAssociation`: a boolean parameter for data sets that have many traits\n", "\n", "Probably the most important concept of this function is `formula` parameter.\n", "If you are familar with standard R linear model functions, this concept is\n", "fairly similar. In TASSEL, a linear model is composed of the following scheme:\n", "\n", "$$y \\sim A_{1} + A_{2} + \\dots A_{n}$$\n", "\n", "Where $y$ is any TASSEL `data` type and $A_{n}$ is any TASSEL `covariate`\n", "and / or `factor` types. In the following example, we will fit a mixed linear model (MLM) with our `rTASSEL` data object. In addition to the prior parameters, we will also need the TASSEL kinship object:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Calculate MLM\n", "tasMLM <- assocModelFitter(\n", " tasObj = tasGenoPheno, # <- our prior TASSEL object\n", " formula = EarHT ~ ., # <- run only EarHT\n", " fitMarkers = TRUE, # <- set this to TRUE for GLM\n", " kinship = tasKin, # <- our prior kinship object\n", " fastAssociation = FALSE\n", ")\n", "\n", "# Return names of objects returned\n", "names(tasMLM)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For more information about the extent of these methods, please read this [section](https://maize-genetics.github.io/rTASSEL/articles/rtassel_walkthrough.html#overview-2) of our introductory vignette." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Visualizations\n", "`rTASSEL` supports automated visualizations for Manhattan plots and linkage disequilibrium (LD) analyses. In this example, we will generate a Manhattan plot for our prior MLM object:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Generate Manhattan plot for ear height trait\n", "manhattanEH <- manhattanPlot(\n", " assocStats = tasMLM$MLM_Stats,\n", " trait = \"EarHT\",\n", " threshold = 5\n", ")\n", "\n", "# View visualization\n", "manhattanEH" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Similarly, we can also visualize LD using automated methods. Like most LD plots, it is wise to filter your genotype information to a specific region of interest:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Filter genotype table by position\n", "tasGenoPhenoFilt <- filterGenotypeTableSites(\n", " tasObj = tasGenoPheno,\n", " siteRangeFilterType = \"position\",\n", " startPos = 228e6,\n", " endPos = 300e6,\n", " startChr = 2,\n", " endChr = 2\n", ")\n", "\n", "# Generate and visualize LD\n", "myLD <- ldPlot(\n", " tasObj = tasGenoPhenoFilt,\n", " ldType = \"All\",\n", " plotVal = \"r2\",\n", " verbose = FALSE\n", ")\n", "\n", "# View LD\n", "myLD" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Genomic prediction\n", "`rTASSEL` also allows for phenotypic prediction through genotype information via genomic best linear unbiased predictors (gBLUPs):" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "tasCV <- genomicPrediction(\n", " tasPhenoObj = tasGenoPheno,\n", " kinship = tasKin,\n", " doCV = TRUE,\n", " kFolds = 5,\n", " nIter = 1\n", ")\n", "head(tasCV)" ] } ], "metadata": { "kernelspec": { "display_name": "R", "language": "R", "name": "ir" }, "language_info": { "codemirror_mode": "r", "file_extension": ".r", "mimetype": "text/x-r-source", "name": "R", "pygments_lexer": "r", "version": "4.1.1" } }, "nbformat": 4, "nbformat_minor": 4 }