### Running in Docker container on EC2 instance

#### Started Docker container with the following command:

```docker run -p 8888:8888 -v /home/ubuntu/gitrepos/LabDocs/jupyter_nbs/sam:/home/notebooks -v /home/ubuntu/data:/data/ -it kubu4/bioinformatics:v11 /bin/bash```

The command exposes Jupyter Notebook on port 8888 and makes my Jupyter Notebook GitHub repo and my data files accessible to the Docker container.
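For reference, here is the same command with one flag per line, written as a dry run that only prints the command instead of starting a container. The array name `docker_cmd` is a hypothetical helper used just for this illustration.

```bash
#!/usr/bin/env bash
# Dry run: assemble the docker command from above and print it; nothing is started.
# docker_cmd is a hypothetical name used only for this sketch.
docker_cmd=(
  docker run
  -p 8888:8888                 # publish container port 8888 on host port 8888
  -v /home/ubuntu/gitrepos/LabDocs/jupyter_nbs/sam:/home/notebooks   # notebook repo
  -v /home/ubuntu/data:/data/  # data files
  -it                          # interactive session with a terminal
  kubu4/bioinformatics:v11     # image and tag
  /bin/bash                    # command to run inside the container
)
printf '%s ' "${docker_cmd[@]}"; echo
```

Each `-v` argument reads `host_path:container_path`, which is why the data ends up under `/data/` inside the container.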

Once the container was running, I started Jupyter Notebook with the following command inside the Docker container:

```jupyter notebook```

This is configured in the Docker container to launch a Jupyter Notebook without a browser on port 8888.

#### Created a tunnel from my local computer to the Docker container:

```ssh -i ~/Dropbox/Lab/Sam/bioinformatics.pem -N -L localhost:8888:localhost:8888 ubuntu@ec2.ip.address```

This command is run in a separate Terminal window from the one used to ssh into the EC2 instance and start Docker.

This ssh command uses my Amazon EC2 authentication file (bioinformatics.pem), along with the -N and -L options for port forwarding (see man ssh for deets), and binds port 8888 on my local computer to port 8888 on the EC2 instance.

The tunnel allows me to open the Jupyter Notebook in my web browser by entering ```localhost:8888``` as the URL.
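As a quick reference for the options in the ssh command above, here is a dry run that prints the tunnel command with each option annotated. The variable names `key`, `host`, and `tunnel_cmd` are hypothetical, and `ec2.ip.address` is the same placeholder used above.

```bash
#!/usr/bin/env bash
# Dry run: print the tunnel command with each option annotated; no connection is made.
# key, host and tunnel_cmd are hypothetical names used only for this sketch.
key=~/Dropbox/Lab/Sam/bioinformatics.pem
host=ubuntu@ec2.ip.address             # placeholder address, as in the text
tunnel_cmd=(
  ssh
  -i "$key"                            # identity (private key) file
  -N                                   # do not run a remote command, only forward
  -L localhost:8888:localhost:8888     # local port 8888 -> port 8888 as seen from the EC2 instance
  "$host"
)
printf '%s ' "${tunnel_cmd[@]}"; echo
```

The second `localhost` in the `-L` argument is resolved on the EC2 side, so the tunnel ends at the port that Docker published for the container.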

In [1]:
%%bash
date

Thu Jul 14 21:17:15 UTC 2016


In [2]:
%%bash
hostname

570c28713283


### Check computer specs

In [3]:
%%bash
lscpu

Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0-7
Thread(s) per core: 2
Core(s) per socket: 4
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 63
Model name: Intel(R) Xeon(R) CPU E5-2666 v3 @ 2.90GHz
Stepping: 2
CPU MHz: 2900.106
BogoMIPS: 5800.21
Hypervisor vendor: Xen
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 25600K
NUMA node0 CPU(s): 0-7


In [4]:
cd /data/

/data


In [5]:
%%bash
mkdir stacks

In [8]:
%%bash
mkdir /data/stacks/radtags_out

In [1]:
cd /data

/data


In [2]:
ls

160123_I132_FCH3YHMBBXX_L4_OYSzenG1AAD96FAAPEI-109_1.fq.gz 1NF_25A_1.fq.gz*
160123_I132_FCH3YHMBBXX_L4_OYSzenG1AAD96FAAPEI-109_2.fq.gz 1NF_25A_2.fq.gz*
1HL_10A_1.fq.gz* 1NF_26A_1.fq.gz*
1HL_10A_2.fq.gz* 1NF_26A_2.fq.gz*
1HL_11A_1.fq.gz* 1NF_27A_1.fq.gz*
1HL_11A_2.fq.gz* 1NF_27A_2.fq.gz*
1HL_12A_1.fq.gz* 1NF_28A_1.fq.gz*
1HL_12A_2.fq.gz* 1NF_28A_2.fq.gz*
1HL_13A_1.fq.gz* 1NF_29A_1.fq.gz*
1HL_13A_2.fq.gz* 1NF_29A_2.fq.gz*
1HL_14A_1.fq.gz* 1NF_2A_1.fq.gz*
1HL_14A_2.fq.gz* 1NF_2A_2.fq.gz*
1HL_15A_1.fq.gz* 1NF_30A_1.fq.gz*
1HL_15A_2.fq.gz* 1NF_30A_2.fq.gz*
1HL_16A_1.fq.gz* 1NF_31A_1.fq.gz*
1HL_16A_2.fq.gz* 1NF_31A_2.fq.gz

### Run ```process_radtags```

The cell below creates two arrays, one for the read 1 FASTQ files and one for the read 2 FASTQ files, loops over the array indices, and assigns the matching FASTQ files to the variables "i" and "j". The values in "i" and "j" are passed to the process_radtags command as the paired FASTQ files.

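Trims sequences to 90 bases (I believe uniform sequence length is needed for later Stacks steps). Note that in process_radtags the ```-t``` option sets the truncation length, and the cell below passes ```-t 16```, not 90.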
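The pairing described above could also be derived from the file names instead of two hand-maintained lists. The following is only a sketch of that idea: it works on empty placeholder files in a scratch directory, prints the pairs rather than calling process_radtags, and the names `workdir`, `pairs`, `r1`, and `r2` are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: pair *_1.fq.gz with *_2.fq.gz by name instead of keeping two lists.
# Uses empty placeholder files in a scratch directory; no Stacks program is called.
workdir=$(mktemp -d)
touch "$workdir"/1HL_10A_1.fq.gz "$workdir"/1HL_10A_2.fq.gz \
      "$workdir"/1HL_1A_1.fq.gz  "$workdir"/1HL_1A_2.fq.gz

pairs=()
for r1 in "$workdir"/*_1.fq.gz; do
  r2=${r1%_1.fq.gz}_2.fq.gz            # strip the _1 suffix, append _2
  if [[ -e $r2 ]]; then
    pairs+=("$(basename "$r1") $(basename "$r2")")
  else
    echo "missing mate for $r1" >&2    # report unpaired files instead of guessing
  fi
done
printf '%s\n' "${pairs[@]}"
rm -r "$workdir"
```

Deriving the mate from the read 1 name means the two files cannot drift out of order, which is the main risk with two parallel lists.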

In [4]:
%%bash
seq1=( 1HL_10A_1.fq.gz 1HL_11A_1.fq.gz 1HL_12A_1.fq.gz 1HL_13A_1.fq.gz 1HL_14A_1.fq.gz 1HL_15A_1.fq.gz 1HL_16A_1.fq.gz 1HL_17A_1.fq.gz 1HL_19A_1.fq.gz 1HL_1A_1.fq.gz 1HL_20A_1.fq.gz 1HL_21A_1.fq.gz 1HL_22A_1.fq.gz 1HL_23A_1.fq.gz 1HL_24A_1.fq.gz 1HL_25A_1.fq.gz 1HL_26A_1.fq.gz 1HL_27A_1.fq.gz 1HL_28A_1.fq.gz 1HL_29A_1.fq.gz 1HL_2A_1.fq.gz 1HL_31A_1.fq.gz 1HL_33A_1.fq.gz 1HL_34A_1.fq.gz 1HL_35A_1.fq.gz 1HL_3A_1.fq.gz 1HL_4A_1.fq.gz 1HL_5A_1.fq.gz 1HL_6A_1.fq.gz 1HL_7A_1.fq.gz 1HL_8A_1.fq.gz 1HL_9A_1.fq.gz 1NF_10A_1.fq.gz 1NF_11A_1.fq.gz 1NF_12A_1.fq.gz 1NF_13A_1.fq.gz 1NF_14A_1.fq.gz 1NF_15A_1.fq.gz 1NF_16A_1.fq.gz 1NF_17A_1.fq.gz 1NF_18A_1.fq.gz 1NF_19A_1.fq.gz 1NF_1A_1.fq.gz 1NF_20A_1.fq.gz 1NF_21A_1.fq.gz 1NF_22A_1.fq.gz 1NF_23A_1.fq.gz 1NF_24A_1.fq.gz 1NF_25A_1.fq.gz 1NF_26A_1.fq.gz 1NF_27A_1.fq.gz 1NF_28A_1.fq.gz 1NF_29A_1.fq.gz 1NF_2A_1.fq.gz 1NF_30A_1.fq.gz 1NF_31A_1.fq.gz 1NF_32A_1.fq.gz 1NF_33A_1.fq.gz 1NF_4A_1.fq.gz 1NF_5A_1.fq.gz 1NF_6A_1.fq.gz 1NF_7A_1.fq.gz 1NF_8A_1.fq.gz 1NF_9A_1.fq.gz 1SN_10A_1.fq.gz 1SN_11A_1.fq.gz 1SN_12A_1.fq.gz 1SN_13A_1.fq.gz 1SN_14A_1.fq.gz 1SN_15A_1.fq.gz 1SN_16A_1.fq.gz 1SN_17A_1.fq.gz 1SN_18A_1.fq.gz 1SN_19A_1.fq.gz 1SN_1A_1.fq.gz 1SN_20A_1.fq.gz 1SN_21A_1.fq.gz 1SN_22A_1.fq.gz 1SN_23A_1.fq.gz 1SN_24A_1.fq.gz 1SN_25A_1.fq.gz 1SN_26A_1.fq.gz 1SN_27A_1.fq.gz 1SN_28A_1.fq.gz 1SN_29A_1.fq.gz 1SN_2A_1.fq.gz 1SN_30A_1.fq.gz 1SN_31A_1.fq.gz 1SN_32A_1.fq.gz 1SN_3A_1.fq.gz 1SN_4A_1.fq.gz 1SN_5A_1.fq.gz 1SN_6A_1.fq.gz 1SN_7A_1.fq.gz 1SN_8A_1.fq.gz 1SN_9A_1.fq.gz )
seq2=( 1HL_10A_2.fq.gz 1HL_11A_2.fq.gz 1HL_12A_2.fq.gz 1HL_13A_2.fq.gz 1HL_14A_2.fq.gz 1HL_15A_2.fq.gz 1HL_16A_2.fq.gz 1HL_17A_2.fq.gz 1HL_19A_2.fq.gz 1HL_1A_2.fq.gz 1HL_20A_2.fq.gz 1HL_21A_2.fq.gz 1HL_22A_2.fq.gz 1HL_23A_2.fq.gz 1HL_24A_2.fq.gz 1HL_25A_2.fq.gz 1HL_26A_2.fq.gz 1HL_27A_2.fq.gz 1HL_28A_2.fq.gz 1HL_29A_2.fq.gz 1HL_2A_2.fq.gz 1HL_31A_2.fq.gz 1HL_33A_2.fq.gz 1HL_34A_2.fq.gz 1HL_35A_2.fq.gz 1HL_3A_2.fq.gz 1HL_4A_2.fq.gz 1HL_5A_2.fq.gz 1HL_6A_2.fq.gz 1HL_7A_2.fq.gz 1HL_8A_2.fq.gz 1HL_9A_2.fq.gz 1NF_10A_2.fq.gz 1NF_11A_2.fq.gz 1NF_12A_2.fq.gz 1NF_13A_2.fq.gz 1NF_14A_2.fq.gz 1NF_15A_2.fq.gz 1NF_16A_2.fq.gz 1NF_17A_2.fq.gz 1NF_18A_2.fq.gz 1NF_19A_2.fq.gz 1NF_1A_2.fq.gz 1NF_20A_2.fq.gz 1NF_21A_2.fq.gz 1NF_22A_2.fq.gz 1NF_23A_2.fq.gz 1NF_24A_2.fq.gz 1NF_25A_2.fq.gz 1NF_26A_2.fq.gz 1NF_27A_2.fq.gz 1NF_28A_2.fq.gz 1NF_29A_2.fq.gz 1NF_2A_2.fq.gz 1NF_30A_2.fq.gz 1NF_31A_2.fq.gz 1NF_32A_2.fq.gz 1NF_33A_2.fq.gz 1NF_4A_2.fq.gz 1NF_5A_2.fq.gz 1NF_6A_2.fq.gz 1NF_7A_2.fq.gz 1NF_8A_2.fq.gz 1NF_9A_2.fq.gz 1SN_10A_2.fq.gz 1SN_11A_2.fq.gz 1SN_12A_2.fq.gz 1SN_13A_2.fq.gz 1SN_14A_2.fq.gz 1SN_15A_2.fq.gz 1SN_16A_2.fq.gz 1SN_17A_2.fq.gz 1SN_18A_2.fq.gz 1SN_19A_2.fq.gz 1SN_1A_2.fq.gz 1SN_20A_2.fq.gz 1SN_21A_2.fq.gz 1SN_22A_2.fq.gz 1SN_23A_2.fq.gz 1SN_24A_2.fq.gz 1SN_25A_2.fq.gz 1SN_26A_2.fq.gz 1SN_27A_2.fq.gz 1SN_28A_2.fq.gz 1SN_29A_2.fq.gz 1SN_2A_2.fq.gz 1SN_30A_2.fq.gz 1SN_31A_2.fq.gz 1SN_32A_2.fq.gz 1SN_3A_2.fq.gz 1SN_4A_2.fq.gz 1SN_5A_2.fq.gz 1SN_6A_2.fq.gz 1SN_7A_2.fq.gz 1SN_8A_2.fq.gz 1SN_9A_2.fq.gz )
time for pair in "${!seq1[@]}"; do
 i=${seq1[$pair]}
 j=${seq2[$pair]}
 /usr/local/bioinformatics/stacks-1.40/process_radtags \
 -1 $i \
 -2 $j \
 -o /data/stacks/radtags_out/ \
 -e apeKI \
 -c \
 -q \
 -i gzfastq \
 -t 16 \
 >>radtags.stdout 2>>radtags.stderr
done


real	61m18.225s
user	50m30.730s
sys	0m7.866s


In [5]:
cd /data/stacks/radtags_out

/data/stacks/radtags_out


In [6]:
ls

1HL_10A_1.1.fq.gz 1NF_10A_1.rem.1.fq.gz 1SN_10A_2.2.fq.gz
1HL_10A_1.rem.1.fq.gz 1NF_10A_2.2.fq.gz 1SN_10A_2.rem.2.fq.gz
1HL_10A_2.2.fq.gz 1NF_10A_2.rem.2.fq.gz 1SN_11A_1.1.fq.gz
1HL_10A_2.rem.2.fq.gz 1NF_11A_1.1.fq.gz 1SN_11A_1.rem.1.fq.gz
1HL_11A_1.1.fq.gz 1NF_11A_1.rem.1.fq.gz 1SN_11A_2.2.fq.gz
1HL_11A_1.rem.1.fq.gz 1NF_11A_2.2.fq.gz 1SN_11A_2.rem.2.fq.gz
1HL_11A_2.2.fq.gz 1NF_11A_2.rem.2.fq.gz 1SN_12A_1.1.fq.gz
1HL_11A_2.rem.2.fq.gz 1NF_12A_1.1.fq.gz 1SN_12A_1.rem.1.fq.gz
1HL_12A_1.1.fq.gz 1NF_12A_1.rem.1.fq.gz 1SN_12A_2.2.fq.gz
1HL_12A_1.rem.1.fq.gz 1NF_12A_2.2.fq.gz 1SN_12A_2.rem.2.fq.gz
1HL_12A_2.2.fq.gz 1NF_12A_2.rem.2.fq.gz 1SN_13A_1.1.fq.gz
1HL_12A_2.rem.2.fq.gz 1NF_13A_1.1.fq.gz 1SN_13A_1.rem.1.fq.gz
1HL_13A_1.1.fq.gz 1NF_13A_1.rem.1.fq.gz 1SN_13A_2.2.fq.gz
1HL_13A_1.rem.1.fq.gz 1NF_13A_2.2.fq.gz 1SN_13A_2.rem.2.fq.gz
1HL_13A_2.2.fq.gz 1NF_13A_2.rem.2.fq.gz 1SN_14A_1.1.fq.gz
1HL_13A_2.rem.2.fq.gz 1NF_14A_1.1.fq.gz 1SN_14A_1.rem.1.fq.gz
1HL_14A_1.1.fq.gz 1NF_14

### Concatenate process_radtags output files into single FASTQ file

Creates four lists of the fastq files and processes them as arrays. A for loop walks through the indices of the arrays, and the value at each index of the four arrays is assigned to one of four variables. The contents of the four files named by those variables are concatenated into a single file. The output file is named using parameter substitution: ${i/_1.1/}. This takes the file name stored in the variable "i", matches the text "_1.1" in that file name, and replaces it with nothing (i.e. deletes it).
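A minimal illustration of that parameter substitution on one of the file names, using a hypothetical variable `out` for the result:

```bash
#!/usr/bin/env bash
# ${var/pattern/replacement} replaces the first match of pattern in $var.
# An empty replacement deletes the matched text.
i=1HL_10A_1.1.fq.gz
out=${i/_1.1/}
echo "$out"    # prints 1HL_10A.fq.gz
```

Only the first match is replaced, and the earlier `_10A` does not match because the pattern requires a literal `.` after `_1`.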

In [7]:
%%bash
list1=( 1HL_10A_1.1.fq.gz 1HL_11A_1.1.fq.gz 1HL_12A_1.1.fq.gz 1HL_13A_1.1.fq.gz 1HL_14A_1.1.fq.gz 1HL_15A_1.1.fq.gz 1HL_16A_1.1.fq.gz 1HL_17A_1.1.fq.gz 1HL_19A_1.1.fq.gz 1HL_1A_1.1.fq.gz 1HL_20A_1.1.fq.gz 1HL_21A_1.1.fq.gz 1HL_22A_1.1.fq.gz 1HL_23A_1.1.fq.gz 1HL_24A_1.1.fq.gz 1HL_25A_1.1.fq.gz 1HL_26A_1.1.fq.gz 1HL_27A_1.1.fq.gz 1HL_28A_1.1.fq.gz 1HL_29A_1.1.fq.gz 1HL_2A_1.1.fq.gz 1HL_31A_1.1.fq.gz 1HL_33A_1.1.fq.gz 1HL_34A_1.1.fq.gz 1HL_35A_1.1.fq.gz 1HL_3A_1.1.fq.gz 1HL_4A_1.1.fq.gz 1HL_5A_1.1.fq.gz 1HL_6A_1.1.fq.gz 1HL_7A_1.1.fq.gz 1HL_8A_1.1.fq.gz 1HL_9A_1.1.fq.gz 1NF_10A_1.1.fq.gz 1NF_11A_1.1.fq.gz 1NF_12A_1.1.fq.gz 1NF_13A_1.1.fq.gz 1NF_14A_1.1.fq.gz 1NF_15A_1.1.fq.gz 1NF_16A_1.1.fq.gz 1NF_17A_1.1.fq.gz 1NF_18A_1.1.fq.gz 1NF_19A_1.1.fq.gz 1NF_1A_1.1.fq.gz 1NF_20A_1.1.fq.gz 1NF_21A_1.1.fq.gz 1NF_22A_1.1.fq.gz 1NF_23A_1.1.fq.gz 1NF_24A_1.1.fq.gz 1NF_25A_1.1.fq.gz 1NF_26A_1.1.fq.gz 1NF_27A_1.1.fq.gz 1NF_28A_1.1.fq.gz 1NF_29A_1.1.fq.gz 1NF_2A_1.1.fq.gz 1NF_30A_1.1.fq.gz 1NF_31A_1.1.fq.gz 1NF_32A_1.1.fq.gz 1NF_33A_1.1.fq.gz 1NF_4A_1.1.fq.gz 1NF_5A_1.1.fq.gz 1NF_6A_1.1.fq.gz 1NF_7A_1.1.fq.gz 1NF_8A_1.1.fq.gz 1NF_9A_1.1.fq.gz 1SN_10A_1.1.fq.gz 1SN_11A_1.1.fq.gz 1SN_12A_1.1.fq.gz 1SN_13A_1.1.fq.gz 1SN_14A_1.1.fq.gz 1SN_15A_1.1.fq.gz 1SN_16A_1.1.fq.gz 1SN_17A_1.1.fq.gz 1SN_18A_1.1.fq.gz 1SN_19A_1.1.fq.gz 1SN_1A_1.1.fq.gz 1SN_20A_1.1.fq.gz 1SN_21A_1.1.fq.gz 1SN_22A_1.1.fq.gz 1SN_23A_1.1.fq.gz 1SN_24A_1.1.fq.gz 1SN_25A_1.1.fq.gz 1SN_26A_1.1.fq.gz 1SN_27A_1.1.fq.gz 1SN_28A_1.1.fq.gz 1SN_29A_1.1.fq.gz 1SN_2A_1.1.fq.gz 1SN_30A_1.1.fq.gz 1SN_31A_1.1.fq.gz 1SN_32A_1.1.fq.gz 1SN_3A_1.1.fq.gz 1SN_4A_1.1.fq.gz 1SN_5A_1.1.fq.gz 1SN_6A_1.1.fq.gz 1SN_7A_1.1.fq.gz 1SN_8A_1.1.fq.gz 1SN_9A_1.1.fq.gz )
list2=( 1HL_10A_1.rem.1.fq.gz 1HL_11A_1.rem.1.fq.gz 1HL_12A_1.rem.1.fq.gz 1HL_13A_1.rem.1.fq.gz 1HL_14A_1.rem.1.fq.gz 1HL_15A_1.rem.1.fq.gz 1HL_16A_1.rem.1.fq.gz 1HL_17A_1.rem.1.fq.gz 1HL_19A_1.rem.1.fq.gz 1HL_1A_1.rem.1.fq.gz 1HL_20A_1.rem.1.fq.gz 1HL_21A_1.rem.1.fq.gz 1HL_22A_1.rem.1.fq.gz 1HL_23A_1.rem.1.fq.gz 1HL_24A_1.rem.1.fq.gz 1HL_25A_1.rem.1.fq.gz 1HL_26A_1.rem.1.fq.gz 1HL_27A_1.rem.1.fq.gz 1HL_28A_1.rem.1.fq.gz 1HL_29A_1.rem.1.fq.gz 1HL_2A_1.rem.1.fq.gz 1HL_31A_1.rem.1.fq.gz 1HL_33A_1.rem.1.fq.gz 1HL_34A_1.rem.1.fq.gz 1HL_35A_1.rem.1.fq.gz 1HL_3A_1.rem.1.fq.gz 1HL_4A_1.rem.1.fq.gz 1HL_5A_1.rem.1.fq.gz 1HL_6A_1.rem.1.fq.gz 1HL_7A_1.rem.1.fq.gz 1HL_8A_1.rem.1.fq.gz 1HL_9A_1.rem.1.fq.gz 1NF_10A_1.rem.1.fq.gz 1NF_11A_1.rem.1.fq.gz 1NF_12A_1.rem.1.fq.gz 1NF_13A_1.rem.1.fq.gz 1NF_14A_1.rem.1.fq.gz 1NF_15A_1.rem.1.fq.gz 1NF_16A_1.rem.1.fq.gz 1NF_17A_1.rem.1.fq.gz 1NF_18A_1.rem.1.fq.gz 1NF_19A_1.rem.1.fq.gz 1NF_1A_1.rem.1.fq.gz 1NF_20A_1.rem.1.fq.gz 1NF_21A_1.rem.1.fq.gz 1NF_22A_1.rem.1.fq.gz 1NF_23A_1.rem.1.fq.gz 1NF_24A_1.rem.1.fq.gz 1NF_25A_1.rem.1.fq.gz 1NF_26A_1.rem.1.fq.gz 1NF_27A_1.rem.1.fq.gz 1NF_28A_1.rem.1.fq.gz 1NF_29A_1.rem.1.fq.gz 1NF_2A_1.rem.1.fq.gz 1NF_30A_1.rem.1.fq.gz 1NF_31A_1.rem.1.fq.gz 1NF_32A_1.rem.1.fq.gz 1NF_33A_1.rem.1.fq.gz 1NF_4A_1.rem.1.fq.gz 1NF_5A_1.rem.1.fq.gz 1NF_6A_1.rem.1.fq.gz 1NF_7A_1.rem.1.fq.gz 1NF_8A_1.rem.1.fq.gz 1NF_9A_1.rem.1.fq.gz 1SN_10A_1.rem.1.fq.gz 1SN_11A_1.rem.1.fq.gz 1SN_12A_1.rem.1.fq.gz 1SN_13A_1.rem.1.fq.gz 1SN_14A_1.rem.1.fq.gz 1SN_15A_1.rem.1.fq.gz 1SN_16A_1.rem.1.fq.gz 1SN_17A_1.rem.1.fq.gz 1SN_18A_1.rem.1.fq.gz 1SN_19A_1.rem.1.fq.gz 1SN_1A_1.rem.1.fq.gz 1SN_20A_1.rem.1.fq.gz 1SN_21A_1.rem.1.fq.gz 1SN_22A_1.rem.1.fq.gz 1SN_23A_1.rem.1.fq.gz 1SN_24A_1.rem.1.fq.gz 1SN_25A_1.rem.1.fq.gz 1SN_26A_1.rem.1.fq.gz 1SN_27A_1.rem.1.fq.gz 1SN_28A_1.rem.1.fq.gz 1SN_29A_1.rem.1.fq.gz 1SN_2A_1.rem.1.fq.gz 1SN_30A_1.rem.1.fq.gz 1SN_31A_1.rem.1.fq.gz 1SN_32A_1.rem.1.fq.gz 1SN_3A_1.rem.1.fq.gz 1SN_4A_1.rem.1.fq.gz 
1SN_5A_1.rem.1.fq.gz 1SN_6A_1.rem.1.fq.gz 1SN_7A_1.rem.1.fq.gz 1SN_8A_1.rem.1.fq.gz 1SN_9A_1.rem.1.fq.gz )
list3=( 1HL_10A_2.2.fq.gz 1HL_11A_2.2.fq.gz 1HL_12A_2.2.fq.gz 1HL_13A_2.2.fq.gz 1HL_14A_2.2.fq.gz 1HL_15A_2.2.fq.gz 1HL_16A_2.2.fq.gz 1HL_17A_2.2.fq.gz 1HL_19A_2.2.fq.gz 1HL_1A_2.2.fq.gz 1HL_20A_2.2.fq.gz 1HL_21A_2.2.fq.gz 1HL_22A_2.2.fq.gz 1HL_23A_2.2.fq.gz 1HL_24A_2.2.fq.gz 1HL_25A_2.2.fq.gz 1HL_26A_2.2.fq.gz 1HL_27A_2.2.fq.gz 1HL_28A_2.2.fq.gz 1HL_29A_2.2.fq.gz 1HL_2A_2.2.fq.gz 1HL_31A_2.2.fq.gz 1HL_33A_2.2.fq.gz 1HL_34A_2.2.fq.gz 1HL_35A_2.2.fq.gz 1HL_3A_2.2.fq.gz 1HL_4A_2.2.fq.gz 1HL_5A_2.2.fq.gz 1HL_6A_2.2.fq.gz 1HL_7A_2.2.fq.gz 1HL_8A_2.2.fq.gz 1HL_9A_2.2.fq.gz 1NF_10A_2.2.fq.gz 1NF_11A_2.2.fq.gz 1NF_12A_2.2.fq.gz 1NF_13A_2.2.fq.gz 1NF_14A_2.2.fq.gz 1NF_15A_2.2.fq.gz 1NF_16A_2.2.fq.gz 1NF_17A_2.2.fq.gz 1NF_18A_2.2.fq.gz 1NF_19A_2.2.fq.gz 1NF_1A_2.2.fq.gz 1NF_20A_2.2.fq.gz 1NF_21A_2.2.fq.gz 1NF_22A_2.2.fq.gz 1NF_23A_2.2.fq.gz 1NF_24A_2.2.fq.gz 1NF_25A_2.2.fq.gz 1NF_26A_2.2.fq.gz 1NF_27A_2.2.fq.gz 1NF_28A_2.2.fq.gz 1NF_29A_2.2.fq.gz 1NF_2A_2.2.fq.gz 1NF_30A_2.2.fq.gz 1NF_31A_2.2.fq.gz 1NF_32A_2.2.fq.gz 1NF_33A_2.2.fq.gz 1NF_4A_2.2.fq.gz 1NF_5A_2.2.fq.gz 1NF_6A_2.2.fq.gz 1NF_7A_2.2.fq.gz 1NF_8A_2.2.fq.gz 1NF_9A_2.2.fq.gz 1SN_10A_2.2.fq.gz 1SN_11A_2.2.fq.gz 1SN_12A_2.2.fq.gz 1SN_13A_2.2.fq.gz 1SN_14A_2.2.fq.gz 1SN_15A_2.2.fq.gz 1SN_16A_2.2.fq.gz 1SN_17A_2.2.fq.gz 1SN_18A_2.2.fq.gz 1SN_19A_2.2.fq.gz 1SN_1A_2.2.fq.gz 1SN_20A_2.2.fq.gz 1SN_21A_2.2.fq.gz 1SN_22A_2.2.fq.gz 1SN_23A_2.2.fq.gz 1SN_24A_2.2.fq.gz 1SN_25A_2.2.fq.gz 1SN_26A_2.2.fq.gz 1SN_27A_2.2.fq.gz 1SN_28A_2.2.fq.gz 1SN_29A_2.2.fq.gz 1SN_2A_2.2.fq.gz 1SN_30A_2.2.fq.gz 1SN_31A_2.2.fq.gz 1SN_32A_2.2.fq.gz 1SN_3A_2.2.fq.gz 1SN_4A_2.2.fq.gz 1SN_5A_2.2.fq.gz 1SN_6A_2.2.fq.gz 1SN_7A_2.2.fq.gz 1SN_8A_2.2.fq.gz 1SN_9A_2.2.fq.gz )
list4=( 1HL_10A_2.rem.2.fq.gz 1HL_11A_2.rem.2.fq.gz 1HL_12A_2.rem.2.fq.gz 1HL_13A_2.rem.2.fq.gz 1HL_14A_2.rem.2.fq.gz 1HL_15A_2.rem.2.fq.gz 1HL_16A_2.rem.2.fq.gz 1HL_17A_2.rem.2.fq.gz 1HL_19A_2.rem.2.fq.gz 1HL_1A_2.rem.2.fq.gz 1HL_20A_2.rem.2.fq.gz 1HL_21A_2.rem.2.fq.gz 1HL_22A_2.rem.2.fq.gz 1HL_23A_2.rem.2.fq.gz 1HL_24A_2.rem.2.fq.gz 1HL_25A_2.rem.2.fq.gz 1HL_26A_2.rem.2.fq.gz 1HL_27A_2.rem.2.fq.gz 1HL_28A_2.rem.2.fq.gz 1HL_29A_2.rem.2.fq.gz 1HL_2A_2.rem.2.fq.gz 1HL_31A_2.rem.2.fq.gz 1HL_33A_2.rem.2.fq.gz 1HL_34A_2.rem.2.fq.gz 1HL_35A_2.rem.2.fq.gz 1HL_3A_2.rem.2.fq.gz 1HL_4A_2.rem.2.fq.gz 1HL_5A_2.rem.2.fq.gz 1HL_6A_2.rem.2.fq.gz 1HL_7A_2.rem.2.fq.gz 1HL_8A_2.rem.2.fq.gz 1HL_9A_2.rem.2.fq.gz 1NF_10A_2.rem.2.fq.gz 1NF_11A_2.rem.2.fq.gz 1NF_12A_2.rem.2.fq.gz 1NF_13A_2.rem.2.fq.gz 1NF_14A_2.rem.2.fq.gz 1NF_15A_2.rem.2.fq.gz 1NF_16A_2.rem.2.fq.gz 1NF_17A_2.rem.2.fq.gz 1NF_18A_2.rem.2.fq.gz 1NF_19A_2.rem.2.fq.gz 1NF_1A_2.rem.2.fq.gz 1NF_20A_2.rem.2.fq.gz 1NF_21A_2.rem.2.fq.gz 1NF_22A_2.rem.2.fq.gz 1NF_23A_2.rem.2.fq.gz 1NF_24A_2.rem.2.fq.gz 1NF_25A_2.rem.2.fq.gz 1NF_26A_2.rem.2.fq.gz 1NF_27A_2.rem.2.fq.gz 1NF_28A_2.rem.2.fq.gz 1NF_29A_2.rem.2.fq.gz 1NF_2A_2.rem.2.fq.gz 1NF_30A_2.rem.2.fq.gz 1NF_31A_2.rem.2.fq.gz 1NF_32A_2.rem.2.fq.gz 1NF_33A_2.rem.2.fq.gz 1NF_4A_2.rem.2.fq.gz 1NF_5A_2.rem.2.fq.gz 1NF_6A_2.rem.2.fq.gz 1NF_7A_2.rem.2.fq.gz 1NF_8A_2.rem.2.fq.gz 1NF_9A_2.rem.2.fq.gz 1SN_10A_2.rem.2.fq.gz 1SN_11A_2.rem.2.fq.gz 1SN_12A_2.rem.2.fq.gz 1SN_13A_2.rem.2.fq.gz 1SN_14A_2.rem.2.fq.gz 1SN_15A_2.rem.2.fq.gz 1SN_16A_2.rem.2.fq.gz 1SN_17A_2.rem.2.fq.gz 1SN_18A_2.rem.2.fq.gz 1SN_19A_2.rem.2.fq.gz 1SN_1A_2.rem.2.fq.gz 1SN_20A_2.rem.2.fq.gz 1SN_21A_2.rem.2.fq.gz 1SN_22A_2.rem.2.fq.gz 1SN_23A_2.rem.2.fq.gz 1SN_24A_2.rem.2.fq.gz 1SN_25A_2.rem.2.fq.gz 1SN_26A_2.rem.2.fq.gz 1SN_27A_2.rem.2.fq.gz 1SN_28A_2.rem.2.fq.gz 1SN_29A_2.rem.2.fq.gz 1SN_2A_2.rem.2.fq.gz 1SN_30A_2.rem.2.fq.gz 1SN_31A_2.rem.2.fq.gz 1SN_32A_2.rem.2.fq.gz 1SN_3A_2.rem.2.fq.gz 1SN_4A_2.rem.2.fq.gz 
1SN_5A_2.rem.2.fq.gz 1SN_6A_2.rem.2.fq.gz 1SN_7A_2.rem.2.fq.gz 1SN_8A_2.rem.2.fq.gz 1SN_9A_2.rem.2.fq.gz )

time for index in "${!list1[@]}"; do
 i=${list1[$index]}
 j=${list2[$index]}
 k=${list3[$index]}
 l=${list4[$index]}
 cat $i $j $k $l > ${i/_1.1/}
 
done


real	6m24.711s
user	0m0.089s
sys	0m6.347s
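Joining `.gz` files with `cat` works because a gzip file may contain several compressed members back to back. The sketch below shows this on two tiny made-up files in a scratch directory. The file names and contents are placeholders, not real reads.

```bash
#!/usr/bin/env bash
# Sketch: gzip files joined with cat form one valid multi-member gzip stream.
# File names and contents are placeholders for illustration only.
workdir=$(mktemp -d)
printf 'read_A\n' | gzip -c > "$workdir/a.fq.gz"
printf 'read_B\n' | gzip -c > "$workdir/b.fq.gz"

cat "$workdir/a.fq.gz" "$workdir/b.fq.gz" > "$workdir/ab.fq.gz"

# Decompressing the joined file yields the contents of both inputs, in order.
joined=$(gzip -dc "$workdir/ab.fq.gz")
echo "$joined"
rm -r "$workdir"
```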


#### List newly concatenated fastq files

In [8]:
%%bash
echo *A.fq.gz

1HL_10A.fq.gz 1HL_11A.fq.gz 1HL_12A.fq.gz 1HL_13A.fq.gz 1HL_14A.fq.gz 1HL_15A.fq.gz 1HL_16A.fq.gz 1HL_17A.fq.gz 1HL_19A.fq.gz 1HL_1A.fq.gz 1HL_20A.fq.gz 1HL_21A.fq.gz 1HL_22A.fq.gz 1HL_23A.fq.gz 1HL_24A.fq.gz 1HL_25A.fq.gz 1HL_26A.fq.gz 1HL_27A.fq.gz 1HL_28A.fq.gz 1HL_29A.fq.gz 1HL_2A.fq.gz 1HL_31A.fq.gz 1HL_33A.fq.gz 1HL_34A.fq.gz 1HL_35A.fq.gz 1HL_3A.fq.gz 1HL_4A.fq.gz 1HL_5A.fq.gz 1HL_6A.fq.gz 1HL_7A.fq.gz 1HL_8A.fq.gz 1HL_9A.fq.gz 1NF_10A.fq.gz 1NF_11A.fq.gz 1NF_12A.fq.gz 1NF_13A.fq.gz 1NF_14A.fq.gz 1NF_15A.fq.gz 1NF_16A.fq.gz 1NF_17A.fq.gz 1NF_18A.fq.gz 1NF_19A.fq.gz 1NF_1A.fq.gz 1NF_20A.fq.gz 1NF_21A.fq.gz 1NF_22A.fq.gz 1NF_23A.fq.gz 1NF_24A.fq.gz 1NF_25A.fq.gz 1NF_26A.fq.gz 1NF_27A.fq.gz 1NF_28A.fq.gz 1NF_29A.fq.gz 1NF_2A.fq.gz 1NF_30A.fq.gz 1NF_31A.fq.gz 1NF_32A.fq.gz 1NF_33A.fq.gz 1NF_4A.fq.gz 1NF_5A.fq.gz 1NF_6A.fq.gz 1NF_7A.fq.gz 1NF_8A.fq.gz 1NF_9A.fq.gz 1SN_10A.fq.gz 1SN_11A.fq.gz 1SN_12A.fq.gz 1SN_13A.fq.gz 1SN_14A.fq.gz 1SN_15A.fq.gz 1SN_16A.fq.gz 1SN_17A.fq.gz 1SN_18A.f

In [None]:
%%bash
#Runs ustacks and appends sql ID to each file for downstream analysis.
sql_id=0
time for i in *A.fq.gz; do
 ((sql_id++))
 /usr/local/bioinformatics/stacks-1.40/ustacks \
 -t gzfastq \
 -f $i \
 -o /data/stacks/ \
 -i $sql_id \
 -d \
 -r \
 -p 16
done

Jupyter kernel crashed while this was running, so no output to screen. I used the top command to verify that pyrad was still running despite Jupyter being down. Let's look at some files to see what happened...
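One way to make a long run less dependent on the notebook staying up is to start it detached and send its output to a log file. This is only a sketch, assuming the job is launched from a terminal inside the container rather than from a notebook cell. The `bash -c` command is a stand-in for the real loop, and `log`, `pid`, and `log_text` are hypothetical names.

```bash
#!/usr/bin/env bash
# Sketch: start a long job detached and keep its output in a log file.
# The bash -c command is a stand-in for the real analysis loop.
log=$(mktemp)
nohup bash -c 'echo started; sleep 1; echo finished' > "$log" 2>&1 &
pid=$!

# kill -0 sends no signal; it only reports whether the process still exists.
if kill -0 "$pid" 2>/dev/null; then
  echo "job $pid is still running"
fi

wait "$pid"                 # in real use, check back later instead of waiting
log_text=$(cat "$log")
echo "$log_text"
rm -f "$log"
```

With the output in a file, a dead kernel no longer means losing the messages from the run.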


In [1]:
ls

20150313_LSU_Oil_Spill_IndexID_Comparisons.ipynb
20150316_LSU_OilSpill_Adapter_ID.ipynb
20150317_LSU_OilSpill_EpinextAdaptor1_ID.ipynb
20150408_Install_Bismark_bisulfite_mapper.ipynb
20150414_C_gigas_Larvae_OA_Trimmomatic_FASTQC.ipynb
20150414_C_virginica_LSU_Oil_Spill_Trimmomatic_FASTQC.ipynb
20150429_Gigas_larvae_OA_BLASTn.ipynb
20150501_Cgigas_larvae_OA_BLASTn_nt.ipynb
20150506_Cgigas_larvae_OA_trimmomatic_FASTQC.ipynb
20150521_Cgigas_larvae_OA_Trimmomatic_FASTQC.ipynb
20160114_wasted_space_synologies.ipynb
20160126_Olurida_BGI_data_handling.ipynb
20160126_Pgenerosa_BGI_data_handling.ipynb
20160203_Olurida_Zymo_Data_Handling.ipynb
20160308_find_rename_2bRAD_undetermined_fastqs.ipynb
20160314_Olurida_GBS_data_management.ipynb
20160406_Oly_GBS_STACKS.ipynb
20160406_STACKS_install.ipynb
20160411_Concatenate_Oly_MBDseq.ipynb
20160418_Oly_GBS_PE-Pyrad_populations.ipynb
20160418_pyrad_oly_PE-GBS.ipynb
20160427_Oly_GBS_data_management.ipynb
20160427_speed_comparison.ipynb
20160428_Oly_GBS_

In [1]:
cd /data/stacks/

/data/stacks


In [2]:
ls

[0m[01;34mradtags_out[0m/


In [3]:
cd ..

/data


In [4]:
ls

[0m[01;32m1HL_10A_1.fq.gz[0m* [01;32m1HL_35A_2.fq.gz[0m* [01;32m1NF_26A_1.fq.gz[0m* [01;32m1SN_19A_2.fq.gz[0m*
[01;32m1HL_10A_2.fq.gz[0m* [01;32m1HL_3A_1.fq.gz[0m* [01;32m1NF_26A_2.fq.gz[0m* [01;32m1SN_1A_1.fq.gz[0m*
[01;32m1HL_11A_1.fq.gz[0m* [01;32m1HL_3A_2.fq.gz[0m* [01;32m1NF_27A_1.fq.gz[0m* [01;32m1SN_1A_2.fq.gz[0m*
[01;32m1HL_11A_2.fq.gz[0m* [01;32m1HL_4A_1.fq.gz[0m* [01;32m1NF_27A_2.fq.gz[0m* [01;32m1SN_20A_1.fq.gz[0m*
[01;32m1HL_12A_1.fq.gz[0m* [01;32m1HL_4A_2.fq.gz[0m* [01;32m1NF_28A_1.fq.gz[0m* [01;32m1SN_20A_2.fq.gz[0m*
[01;32m1HL_12A_2.fq.gz[0m* [01;32m1HL_5A_1.fq.gz[0m* [01;32m1NF_28A_2.fq.gz[0m* [01;32m1SN_21A_1.fq.gz[0m*
[01;32m1HL_13A_1.fq.gz[0m* [01;32m1HL_5A_2.fq.gz[0m* [01;32m1NF_29A_1.fq.gz[0m* [01;32m1SN_21A_2.fq.gz[0m*
[01;32m1HL_13A_2.fq.gz[0m* [01;32m1HL_6A_1.fq.gz[0m* [01;32m1NF_29A_2.fq.gz[0m* [01;32m1SN_22A_1.fq.gz[0m*
[01;32m1HL_14A_1.fq.gz[0m* [01;32m1HL_6A_2.fq.gz[0m* [01;32m1NF_

Guess it didn't complete...

In [6]:
cd /data/

/data


In [7]:
%%bash
#Runs ustacks and appends sql ID to each file for downstream analysis.
sql_id=0
time for i in *A.fq.gz; do
 ((sql_id++))
 /usr/local/bioinformatics/stacks-1.40/ustacks \
 -t gzfastq \
 -f $i \
 -o /data/stacks/ \
 -i $sql_id \
 -d \
 -r \
 -p 16
done

ustacks paramters selected:
 Min depth of coverage to create a stack: 3
 Max distance allowed between stacks: 2
 Max distance allowed to align secondary reads: 4
 Max number of stacks allowed per de novo locus: 3
 Deleveraging algorithm: enabled
 Removal algorithm: enabled
 Model type: SNP
 Alpha significance level for model: 0.05
 Gapped alignments: disabled
Failed to open gzipped file '*A.fq.gz': No such file or directory.

real	0m0.002s
user	0m0.004s
sys	0m0.001s


In [8]:
ls

[0m[01;32m1HL_10A_1.fq.gz[0m* [01;32m1HL_35A_2.fq.gz[0m* [01;32m1NF_26A_1.fq.gz[0m* [01;32m1SN_19A_2.fq.gz[0m*
[01;32m1HL_10A_2.fq.gz[0m* [01;32m1HL_3A_1.fq.gz[0m* [01;32m1NF_26A_2.fq.gz[0m* [01;32m1SN_1A_1.fq.gz[0m*
[01;32m1HL_11A_1.fq.gz[0m* [01;32m1HL_3A_2.fq.gz[0m* [01;32m1NF_27A_1.fq.gz[0m* [01;32m1SN_1A_2.fq.gz[0m*
[01;32m1HL_11A_2.fq.gz[0m* [01;32m1HL_4A_1.fq.gz[0m* [01;32m1NF_27A_2.fq.gz[0m* [01;32m1SN_20A_1.fq.gz[0m*
[01;32m1HL_12A_1.fq.gz[0m* [01;32m1HL_4A_2.fq.gz[0m* [01;32m1NF_28A_1.fq.gz[0m* [01;32m1SN_20A_2.fq.gz[0m*
[01;32m1HL_12A_2.fq.gz[0m* [01;32m1HL_5A_1.fq.gz[0m* [01;32m1NF_28A_2.fq.gz[0m* [01;32m1SN_21A_1.fq.gz[0m*
[01;32m1HL_13A_1.fq.gz[0m* [01;32m1HL_5A_2.fq.gz[0m* [01;32m1NF_29A_1.fq.gz[0m* [01;32m1SN_21A_2.fq.gz[0m*
[01;32m1HL_13A_2.fq.gz[0m* [01;32m1HL_6A_1.fq.gz[0m* [01;32m1NF_29A_2.fq.gz[0m* [01;32m1SN_22A_1.fq.gz[0m*
[01;32m1HL_14A_1.fq.gz[0m* [01;32m1HL_6A_2.fq.gz[0m* [01;32m1NF_

In [9]:
%%bash
#Runs ustacks and appends sql ID to each file for downstream analysis.
sql_id=0
time for i in "*A.fq.gz"; do
 ((sql_id++))
 /usr/local/bioinformatics/stacks-1.40/ustacks \
 -t gzfastq \
 -f $i \
 -o /data/stacks/ \
 -i $sql_id \
 -d \
 -r \
 -p 16
done

ustacks paramters selected:
 Min depth of coverage to create a stack: 3
 Max distance allowed between stacks: 2
 Max distance allowed to align secondary reads: 4
 Max number of stacks allowed per de novo locus: 3
 Deleveraging algorithm: enabled
 Removal algorithm: enabled
 Model type: SNP
 Alpha significance level for model: 0.05
 Gapped alignments: disabled
Failed to open gzipped file '*A.fq.gz': No such file or directory.

real	0m0.001s
user	0m0.000s
sys	0m0.002s


In [10]:
%%bash
#Runs ustacks and appends sql ID to each file for downstream analysis.
sql_id=0
time for i in *A.fq.gz; do
 ((sql_id++))
 /usr/local/bioinformatics/stacks-1.40/ustacks \
 -t gzfastq \
 -f $i \
 -o /data/stacks/ \
 -i $sql_id \
 -d \
 -r \
 -p 16
done

ustacks paramters selected:
 Min depth of coverage to create a stack: 3
 Max distance allowed between stacks: 2
 Max distance allowed to align secondary reads: 4
 Max number of stacks allowed per de novo locus: 3
 Deleveraging algorithm: enabled
 Removal algorithm: enabled
 Model type: SNP
 Alpha significance level for model: 0.05
 Gapped alignments: disabled
Failed to open gzipped file '*A.fq.gz': No such file or directory.

real	0m0.002s
user	0m0.000s
sys	0m0.002s


In [11]:
%%bash
#Runs ustacks and appends sql ID to each file for downstream analysis.
sql_id=0
time for i in *A.fq.gz; do
 ((sql_id++))
 /usr/local/bioinformatics/stacks-1.40/ustacks \
 -t gzfastq \
 -f $i \
 -o /data/stacks/ \
 -i $sql_id \
 -d \
 -r \
 -p 16
done

ustacks paramters selected:
 Min depth of coverage to create a stack: 3
 Max distance allowed between stacks: 2
 Max distance allowed to align secondary reads: 4
 Max number of stacks allowed per de novo locus: 3
 Deleveraging algorithm: enabled
 Removal algorithm: enabled
 Model type: SNP
 Alpha significance level for model: 0.05
 Gapped alignments: disabled
Failed to open gzipped file '*A.fq.gz': No such file or directory.

real	0m0.002s
user	0m0.000s
sys	0m0.001s


Something weird was going on. Got an error message in the PyRad notebook I was running about being out of space (see my [notebook entry](http://onsnetwork.org/kubu4/2016/07/17/computing-amazon-ec2-instance-out-of-space/)). Maybe that's what's causing the issue here and preventing me from re-running this analysis? I've since expanded the storage space of this Amazon EC2 Instance. Let's try again...
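A quick way to rule free space in or out before re-running is to ask `df` directly. The sketch below checks the current directory by default. Passing `/data` as the first argument would check the data volume. `target` and `avail_kb` are hypothetical names.

```bash
#!/usr/bin/env bash
# Sketch: report free space on the filesystem holding a directory before a long run.
# target defaults to the current directory; /data would be the one to check here.
target=${1:-.}

# df -P prints one POSIX-format line per filesystem; with -k, column 4 is the
# available space in 1024-byte blocks.
avail_kb=$(df -Pk "$target" | awk 'NR==2 {print $4}')
echo "available on $target: ${avail_kb} KB"
```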

In [2]:
%%bash
date

Mon Jul 18 15:07:09 UTC 2016


In [3]:
cd /data/

/data


In [4]:
ls

[0m[01;32m1HL_10A_1.fq.gz[0m* [01;32m1HL_35A_2.fq.gz[0m* [01;32m1NF_26A_1.fq.gz[0m* [01;32m1SN_19A_2.fq.gz[0m*
[01;32m1HL_10A_2.fq.gz[0m* [01;32m1HL_3A_1.fq.gz[0m* [01;32m1NF_26A_2.fq.gz[0m* [01;32m1SN_1A_1.fq.gz[0m*
[01;32m1HL_11A_1.fq.gz[0m* [01;32m1HL_3A_2.fq.gz[0m* [01;32m1NF_27A_1.fq.gz[0m* [01;32m1SN_1A_2.fq.gz[0m*
[01;32m1HL_11A_2.fq.gz[0m* [01;32m1HL_4A_1.fq.gz[0m* [01;32m1NF_27A_2.fq.gz[0m* [01;32m1SN_20A_1.fq.gz[0m*
[01;32m1HL_12A_1.fq.gz[0m* [01;32m1HL_4A_2.fq.gz[0m* [01;32m1NF_28A_1.fq.gz[0m* [01;32m1SN_20A_2.fq.gz[0m*
[01;32m1HL_12A_2.fq.gz[0m* [01;32m1HL_5A_1.fq.gz[0m* [01;32m1NF_28A_2.fq.gz[0m* [01;32m1SN_21A_1.fq.gz[0m*
[01;32m1HL_13A_1.fq.gz[0m* [01;32m1HL_5A_2.fq.gz[0m* [01;32m1NF_29A_1.fq.gz[0m* [01;32m1SN_21A_2.fq.gz[0m*
[01;32m1HL_13A_2.fq.gz[0m* [01;32m1HL_6A_1.fq.gz[0m* [01;32m1NF_29A_2.fq.gz[0m* [01;32m1SN_22A_1.fq.gz[0m*
[01;32m1HL_14A_1.fq.gz[0m* [01;32m1HL_6A_2.fq.gz[0m* [01;32m1NF_2A_1.fq.

In [5]:
%%bash
#Runs ustacks and appends sql ID to each file for downstream analysis.
sql_id=0
time for i in *A.fq.gz; do
 ((sql_id++))
 /usr/local/bioinformatics/stacks-1.40/ustacks \
 -t gzfastq \
 -f $i \
 -o /data/stacks/ \
 -i $sql_id \
 -d \
 -r \
 -p 16
done

ustacks paramters selected:
 Min depth of coverage to create a stack: 3
 Max distance allowed between stacks: 2
 Max distance allowed to align secondary reads: 4
 Max number of stacks allowed per de novo locus: 3
 Deleveraging algorithm: enabled
 Removal algorithm: enabled
 Model type: SNP
 Alpha significance level for model: 0.05
 Gapped alignments: disabled
Failed to open gzipped file '*A.fq.gz': No such file or directory.

real	0m4.045s
user	0m0.000s
sys	0m0.002s


Why do I keep getting this error???!! Try this with a different command...

OMG... It's because the command isn't set to look in the correct directory! Duh! Grrrrr!!!!

A good example of why I should always use absolute paths for everything...
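The `Failed to open gzipped file '*A.fq.gz'` message in the runs above is what bash produces when a glob matches nothing: the pattern is passed through unchanged and the loop body runs once with the literal string. The sketch below reproduces that in an empty scratch directory and shows the `nullglob` shell option as one possible guard. `workdir`, `unguarded`, and `count` are hypothetical names.

```bash
#!/usr/bin/env bash
# Sketch: what a for loop sees when its glob matches no files.
workdir=$(mktemp -d)        # empty scratch directory standing in for the wrong location
cd "$workdir" || exit 1

for f in *A.fq.gz; do
  unguarded=$f              # receives the literal pattern, not a file name
done
echo "without nullglob: $unguarded"

shopt -s nullglob           # unmatched globs now expand to nothing
count=0
for f in *A.fq.gz; do
  count=$((count + 1))
done
echo "with nullglob: $count iterations"
shopt -u nullglob

cd "$OLDPWD" || exit 1
rm -r "$workdir"
```

With `nullglob` set the loop is skipped instead of handing a non-existent file name to the program, so a wrong directory shows up as "nothing ran" rather than as a confusing file error.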

In [7]:
%%bash
#Runs ustacks and appends sql ID to each file for downstream analysis.
sql_id=0
time for i in /data/*A.fq.gz; do
 ((sql_id++))
 /usr/local/bioinformatics/stacks-1.40/ustacks \
 -t gzfastq \
 -f $i \
 -o /data/stacks/ \
 -i $sql_id \
 -d \
 -r \
 -p 16
done

ustacks paramters selected:
 Min depth of coverage to create a stack: 3
 Max distance allowed between stacks: 2
 Max distance allowed to align secondary reads: 4
 Max number of stacks allowed per de novo locus: 3
 Deleveraging algorithm: enabled
 Removal algorithm: enabled
 Model type: SNP
 Alpha significance level for model: 0.05
 Gapped alignments: disabled
Failed to open gzipped file '/data/*A.fq.gz': No such file or directory.

real	0m2.260s
user	0m0.000s
sys	0m0.003s


Brain dead. Yeesh. I did it again! The data I want is NOT in the location listed in the above command! It's here:

#### List newly concatenated fastq files

In [8]:
ls /data/stacks/radtags_out/

1HL_10A.fq.gz 1NF_10A_1.1.fq.gz 1SN_10A_1.rem.1.fq.gz
1HL_10A_1.1.fq.gz 1NF_10A_1.rem.1.fq.gz 1SN_10A_2.2.fq.gz
1HL_10A_1.rem.1.fq.gz 1NF_10A_2.2.fq.gz 1SN_10A_2.rem.2.fq.gz
1HL_10A_2.2.fq.gz 1NF_10A_2.rem.2.fq.gz 1SN_11A.fq.gz
1HL_10A_2.rem.2.fq.gz 1NF_11A.fq.gz 1SN_11A_1.1.fq.gz
1HL_11A.fq.gz 1NF_11A_1.1.fq.gz 1SN_11A_1.rem.1.fq.gz
1HL_11A_1.1.fq.gz 1NF_11A_1.rem.1.fq.gz 1SN_11A_2.2.fq.gz
1HL_11A_1.rem.1.fq.gz 1NF_11A_2.2.fq.gz 1SN_11A_2.rem.2.fq.gz
1HL_11A_2.2.fq.gz 1NF_11A_2.rem.2.fq.gz 1SN_12A.fq.gz
1HL_11A_2.rem.2.fq.gz 1NF_12A.fq.gz 1SN_12A_1.1.fq.gz
1HL_12A.fq.gz 1NF_12A_1.1.fq.gz 1SN_12A_1.rem.1.fq.gz
1HL_12A_1.1.fq.gz 1NF_12A_1.rem.1.fq.gz 1SN_12A_2.2.fq.gz
1HL_12A_1.rem.1.fq.gz 1NF_12A_2.2.fq.gz 1SN_12A_2.rem.2.fq.gz
1HL_12A_2.2.fq.gz 1NF_12A_2.rem.2.fq.gz 1SN_13A.fq.gz
1HL_12A_2.rem.2.fq.gz 1NF_13A.fq.gz 1SN_13A_1.1.fq.gz
1HL_13A.fq.gz 1NF_13A_1.1.fq.gz 1SN_13A_1.rem.1.fq.gz
1HL_13A_1.1.fq.gz 1NF_13A_1.rem.1.fq.gz 1SN_13A_2.2.fq.gz
1HL_13A_1.rem.1.fq.gz 1NF

In [None]:
%%bash
#Runs ustacks and appends sql ID to each file for downstream analysis.
sql_id=0
time for i in /data/stacks/radtags_out/*A.fq.gz; do
 ((sql_id++))
 /usr/local/bioinformatics/stacks-1.40/ustacks \
 -t gzfastq \
 -f $i \
 -o /data/stacks/ \
 -i $sql_id \
 -d \
 -r \
 -p 16
done