# Day 1 In-class Exercises

## The Unix directory file structure

1. Using one command, move to your home directory.


2. Using one command, list the contents of the reference_data directory that is within the unix_lesson directory.


3. Using one command, list one of the files in reference_data.


## Copying, creating, moving and removing data

1. Create a new folder in unix_lesson called selected_fastq.


2. Copy the files Irrel_kd_2.subset.fq and Mov10_oe_2.subset.fq from raw_fastq to the ~/unix_lesson/selected_fastq folder.


3. Rename the selected_fastq folder to exercise1.
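
As a syntax reminder, the sketch below runs `mkdir`, `cp`, and `mv` in a throwaway directory. Every name in it (`scratch`, `data`, `picked`, `renamed`, `a.txt`, `b.txt`) is a hypothetical stand-in rather than one of the lesson's files, so adapt the commands to the exercise.

```bash
# Work in a throwaway directory so nothing in unix_lesson is touched.
scratch=$(mktemp -d)
cd "$scratch"

# Hypothetical starting layout: a data folder holding two small files.
mkdir data
echo "first" > data/a.txt
echo "second" > data/b.txt

# Create a new folder, then copy both files into it with one cp command.
mkdir picked
cp data/a.txt data/b.txt picked/

# mv renames a folder when the destination name does not exist yet.
mv picked renamed

ls renamed
```

The originals stay in `data` because `cp` duplicates files, while `mv` leaves no folder behind under the old name.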


# Day 1 Self-learning Exercises

## Saving time with tab completion, wildcards and other shortcuts

### Wildcards

Do each of the following using a single ls command without navigating to a different directory.

   a. List all of the files in /bin that start with the letter 'c'

   b. List all of the files in /bin that contain the letter 'a'

   c. List all of the files in /bin that end with the letter 'o'

   d. BONUS: Use one command to list all of the files in /bin that contain either 'a' or 'c'.
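
If the wildcard patterns feel unfamiliar, the sketch below tries them on a throwaway directory. The file names and the letters being matched are hypothetical and chosen for illustration; they are not the contents of /bin.

```bash
# Hypothetical files in a throwaway directory (not the contents of /bin).
scratch=$(mktemp -d)
touch "$scratch/cat" "$scratch/chmod" "$scratch/date" "$scratch/echo" "$scratch/ls"

# Names that start with 'd'
ls "$scratch"/d*

# Names that contain 'e' anywhere
ls "$scratch"/*e*

# Names that end with 's'
ls "$scratch"/*s

# Names that contain any one of the characters listed in the brackets
ls "$scratch"/*[lt]*
```

The `*` matches any run of characters, including none, and `[lt]` matches exactly one character from the set.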


### Command history

1. Checking the output of the history command, how many commands have you typed in so far?

2. Use the up arrow key to check the command you used before the history command. What is it? Does it make sense?

3. Type several random characters at the command prompt. Can you bring the cursor to the start with Ctrl + A? Next, can you bring the cursor to the end with Ctrl + E? Finally, what happens when you use Ctrl + C?


## Working with Files

### Examining Files

1. Change directories into genomics_data. You can do this using a full or relative path.
 
2. Use the less command to open up the file Encode-hesc-Nanog.bed.

3. Search for the string chr11; you'll see all instances in the file highlighted.

4. Staying in the less buffer, use the shortcut to get to the end of the file. Report three highlighted lines at the end of the file where you see chr11. 

5. Exit the less buffer and come back to the command prompt.

6. Print to screen the last 5 lines of the file Encode-hesc-Nanog.bed. Report what you see as the output within the Terminal.
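
For the last step, a minimal sketch of `tail` on a small generated file may help. The file `example.bed` and its coordinates are made up for illustration and are not taken from Encode-hesc-Nanog.bed.

```bash
# Build a small hypothetical BED-like file: chromosome, start, end (tab-separated).
scratch=$(mktemp -d)
bed="$scratch/example.bed"
for i in 1 2 3 4 5 6 7 8; do
    printf 'chr11\t%s\t%s\n' "$((i * 100))" "$((i * 100 + 50))"
done > "$bed"

# tail prints the last 10 lines by default; -n sets a different count.
tail -n 3 "$bed"
```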


### Writing/Creating files with Vim

We have covered some basic commands in vim, but practice is key for getting comfortable with the program. Let's practice what we just learned in a brief challenge.

1. Open spider.txt, and delete the word "water" from line #2.

2. Quit without saving.

3. Open spider.txt again, and replace every occurrence of "spider" with "unicorn".

4. Delete: "Down came the rain."

5. Save the file.

6. Undo your previous deletion.

7. Redo your previous deletion.

8. Delete the first and last words from each of the lines.

9. Save the file. 

10. Open up the file and copy and paste the contents to a text editor on your local laptop to submit as homework.


## Searching Files

1. Search for the sequence CTCAATGAGCCA in Mov10_oe_1.subset.fq. How many sequences do you find?

2. In addition to finding the sequence, how can you modify the command so that your search also returns the name of the sequence?

3. If you want to search for that sequence in **all** Mov10 replicate fastq files, what command would you use?
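
The `grep` options these questions rely on can be tried on two tiny made-up FASTQ files first. The file names (`sampleA_1.fq`, `sampleA_2.fq`), reads, and search pattern below are hypothetical; only the option usage carries over to the exercise.

```bash
scratch=$(mktemp -d)
# Two tiny hypothetical FASTQ files (4 lines per read: name, sequence, '+', qualities).
printf '@read1\nACGTTTGA\n+\nIIIIIIII\n@read2\nGGGTACCA\n+\nIIIIIIII\n' > "$scratch/sampleA_1.fq"
printf '@read3\nTTACGTAA\n+\nIIIIIIII\n' > "$scratch/sampleA_2.fq"

# Print every line that contains the pattern.
grep ACGT "$scratch/sampleA_1.fq"

# -c counts matching lines instead of printing them.
grep -c ACGT "$scratch/sampleA_1.fq"

# -B 1 also prints the one line before each match (here, the read name).
grep -B 1 ACGT "$scratch/sampleA_1.fq"

# A wildcard lets one grep command search several files;
# each match is then prefixed with the name of the file it came from.
grep ACGT "$scratch"/sampleA_*.fq
```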


### Searching and redirection final exercise

Now that we know what type of information is inside our GTF file, let's use our commands to answer a simple question about the data: how many unique exons are present on chromosome 1 in chr1-hg19_genes.gtf?

1. Extract only the lines that describe exon features.

2. Subset the dataset to keep only the genomic coordinates.

3. Remove duplicate exons.

4. Count the total number of exons.
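
The four steps above can be chained with pipes. The sketch below shows one possible shape of such a pipeline on a four-line toy file; `toy.gtf` and its rows are invented for illustration, and the choice of columns 1, 4, 5, and 7 (chromosome, start, end, strand) is an assumption about what counts as "genomic coordinates".

```bash
scratch=$(mktemp -d)
gtf="$scratch/toy.gtf"
# Hypothetical tab-separated rows:
# chromosome, source, feature, start, end, score, strand, frame, attributes.
{
    printf 'chr1\ttoy\texon\t100\t200\t.\t+\t.\tgene_id "g1"; transcript_id "t1";\n'
    printf 'chr1\ttoy\tCDS\t120\t200\t.\t+\t0\tgene_id "g1"; transcript_id "t1";\n'
    printf 'chr1\ttoy\texon\t100\t200\t.\t+\t.\tgene_id "g1"; transcript_id "t2";\n'
    printf 'chr1\ttoy\texon\t300\t400\t.\t+\t.\tgene_id "g1"; transcript_id "t2";\n'
} > "$gtf"

# 1. keep lines containing "exon"   2. keep the coordinate columns
# 3. drop duplicate rows            4. count what is left
grep exon "$gtf" | cut -f 1,4,5,7 | sort -u | wc -l
```

In the toy file the same exon appears in two transcripts, so three exon lines collapse to two unique coordinate rows. Note that a plain `grep exon` matches the word anywhere on the line, which is fine for this toy file but worth checking on real data.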


## Shell scripts and variables 

### A simple shell script

1. Open up the script listing.sh using vim. Add the command which prints to screen the contents of the file Mov10_rnaseq_metadata.txt.

2. Add an echo statement before the command that tells the user "This is information about the files in our dataset:"

3. Run the new script. Report the contents of the new script and the output you got after running it.
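
For reference, a script is just a text file of commands that the shell runs in order. The sketch below builds and runs a two-line script in a throwaway directory; `show_notes.sh` and `notes.txt` are hypothetical names, and the here-document stands in for typing the lines in vim.

```bash
scratch=$(mktemp -d)
cd "$scratch"
echo "sample1 control" > notes.txt   # hypothetical data file

# Write a two-line script with a here-document
# (in the lesson you would type these lines in vim instead).
cat > show_notes.sh << 'EOF'
echo "Contents of notes.txt:"
cat notes.txt
EOF

# Run the script with the shell interpreter.
sh show_notes.sh
```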
 

### Bash variables

1. Use the $file variable as input to the head and tail commands, and modify the arguments to display only four lines. Provide the lines of code used and report the header lines (@HWI) you retrieve from each command.
  
2. Create a new variable called meta and assign it the value Mov10_rnaseq_metadata.txt. For the following questions, use the $meta variable but do not change directories. Provide the code you would run to:
    a. Display the contents of the file using cat.
    b. Retrieve only the lines which contain normal samples. (Hint: use grep)
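
As a reminder of how variables stand in for file names, here is a minimal sketch. The variable name `table`, the file `toy_metadata.txt`, and its columns are all hypothetical; the lesson's metadata file has its own contents.

```bash
scratch=$(mktemp -d)
# Hypothetical metadata table; the lesson's file has its own columns and values.
printf 'sample\tgroup\ns1\ttreated\ns2\tcontrol\ns3\ttreated\n' > "$scratch/toy_metadata.txt"

# Assign without spaces around '='; read the value back with '$'.
table="$scratch/toy_metadata.txt"

# The variable can replace the file name in any command.
head -n 2 "$table"
tail -n 2 "$table"
grep treated "$table"
```

Because the variable holds a full path, the commands work from any directory.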

    
### basename command

1. How would you modify the basename command above to only return Mov10_oe_1?
  
2. Use basename with the file Irrel_kd_1.subset.fq as input. Return only Irrel_kd_1 to the terminal.
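
A short sketch of the two forms of `basename`, using a hypothetical path (`/data/project/sample_A.trimmed.fq`) that is not part of the lesson data:

```bash
# Hypothetical path; basename works on the string, so the file need not exist.
path=/data/project/sample_A.trimmed.fq

# With one argument, basename strips the leading directories.
basename "$path"

# A second argument is a suffix to remove from the end of the name.
basename "$path" .fq
```

The suffix can be as long as needed, as long as it matches the end of the file name exactly.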


### Shell scripting with bash variables

1. Run the script directory_info.sh. Report what gets printed to the screen.
  
2. Open up the script directory_info.sh using vim. Change the appropriate line of code so that our directory of interest is ~/unix_lesson/genomics_data. Save and exit Vim.

3. Run the script with the changes and report what gets printed to the screen.
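
Storing the directory in a variable means only one line has to change to point a script somewhere else. The sketch below illustrates the idea with a made-up script; `count_files.sh`, `dirPath`, `dirName`, and the `reads` folder are hypothetical and this is not the content of directory_info.sh.

```bash
scratch=$(mktemp -d)
cd "$scratch"
mkdir reads
touch reads/r1.fq reads/r2.fq

# Hypothetical script: the directory of interest is set once, on the first line.
cat > count_files.sh << 'EOF'
dirPath=reads
dirName=$(basename "$dirPath")
echo "Number of files in $dirName:"
ls "$dirPath" | wc -l
EOF

sh count_files.sh
```

Editing the `dirPath=` line is enough to make every later command in the script act on a different directory.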