--- title: "HW5 - Programming in R" linkTitle: "HW5: R Programming" description: > type: docs weight: 305 ---

Source code downloads:   [ .R ]
## A. Reverse and complement of DNA __Task 1__: Write a `RevComp` function that returns the reverse and complement of a DNA sequence string. Include an argument that will allow to return only (i) the reversed sequence, (ii) the complemented sequence, or (iii) the reversed and complemented sequence. The following R functions will be useful for the implementation: Generate a short test DNA sequence ```r x <- c("ATGCATTGGACGTTAG") x ``` ``` ## [1] "ATGCATTGGACGTTAG" ``` Vectorize sequence ```r x <- substring(x, 1:nchar(x), 1:nchar(x)) x ``` ``` ## [1] "A" "T" "G" "C" "A" "T" "T" "G" "G" "A" "C" "G" "T" "T" "A" "G" ``` Reverse sequence ```r x <- rev(x) x ``` ``` ## [1] "G" "A" "T" "T" "G" "C" "A" "G" "G" "T" "T" "A" "C" "G" "T" "A" ``` Collapse sequence back to character string ```r x <- paste(x, collapse="") x ``` ``` ## [1] "GATTGCAGGTTACGTA" ``` Form complement of sequence ```r chartr("ATGC", "TACG", x) ``` ``` ## [1] "CTAACGTCCAATGCAT" ``` __Task 2__: Write a function that applies the `RevComp` function to many sequences stored in a vector. In addition, write an export function that saves the sequences generated under Tasks 1 and 2 to a file in FASTA format. ## B. Translate DNA into Protein __Task 3__: Write a function that will translate one or many DNA sequences in all three reading frames into proteins. The following commands will simplify this task: Import lookup table of genetic code ```r AAdf <- read.table(file="http://faculty.ucr.edu/~tgirke/Documents/R_BioCond/My_R_Scripts/AA.txt", header=TRUE, sep="\t") AAdf[1:4,] ``` ``` ## Codon AA_1 AA_3 AA_Full AntiCodon ## 1 TCA S Ser Serine TGA ## 2 TCG S Ser Serine CGA ## 3 TCC S Ser Serine GGA ## 4 TCT S Ser Serine AGA ``` Generated named vector of relevant components ```r AAv <- as.character(AAdf[,2]) names(AAv) <- AAdf[,1] AAv ``` ``` ## TCA TCG TCC TCT TTT TTC TTA TTG TAT TAC TAA TAG TGT TGC TGA TGG CTA CTG CTC CTT CCA CCG CCC CCT CAT ## "S" "S" "S" "S" "F" "F" "L" "L" "Y" "Y" "*" "*" "C" "C" "*" "W" "L" "L" "L" "L" "P" "P" "P" "P" "H" ## CAC CAA CAG CGA CGG CGC CGT ATT ATC ATA ATG ACA ACG ACC ACT AAT AAC AAA AAG AGT AGC AGA AGG GTA GTG ## "H" "Q" "Q" "R" "R" "R" "R" "I" "I" "I" "M" "T" "T" "T" "T" "N" "N" "K" "K" "S" "S" "R" "R" "V" "V" ## GTC GTT GCA GCG GCC GCT GAT GAC GAA GAG GGA GGG GGC GGT ## "V" "V" "A" "A" "A" "A" "D" "D" "E" "E" "G" "G" "G" "G" ``` Tripletize sequence and translate by name subsetting/sorting of AAv ```r y <- gsub("(...)", "\\1_", x) y <- unlist(strsplit(y, "_")) y <- y[grep("^...$", y)] AAv[y] ``` ``` ## GAT TGC AGG TTA CGT ## "D" "C" "R" "L" "R" ``` ## Homework submission Submit the 3 functions in one well structured and annotated R script to your private GitHub repository under `Homework/HW5/HW5.R`. The script should include instructions on how to use the functions. ## Due date This homework is due on Fri, April 26th at 6:00 PM. ## Homework Solutions See [here](https://raw.githubusercontent.com/tgirke/GEN242/main/static/custom/hw_solutions/hw5_solution.R).