--- title: HW1 - Online Exercise and Basic GitHub Usage linkTitle: "HW1: Intro & GitHub" description: > type: docs weight: 301 --- ## A. Online Excercise: Databases and Software Tools This is an easy warm-up homework exposing students to a variety of online databases and software tools. 1. Go to [http://www.ncbi.nlm.nih.gov](http://www.ncbi.nlm.nih.gov), select `Protein` database in dropdown, and then run query: `P450 & hydroxylase & human [organism]`, select under _Source_ databases UniProtKB/Swiss-Prot 1. Report final query syntax from _Search Details_ field.

2. Save GIs of the final query result to a file. For this select under `Send to` dropdown `GI List` format. 1. Report the number of retrieved GIs.

3. Retrieve the corresponding sequences through [Batch-Entrez](http://www.ncbi.nlm.nih.gov/sites/batchentrez) using GI list file as query input -> save sequences in FASTA format

4. Generate multiple alignment and tree of these sequences using [MultAalin](http://multalin.toulouse.inra.fr/multalin/) 1. Save multiple alignment and tree to file 2. Identify putative heme binding cysteine in multiple alignment

5. Open corresponding [UniProt page](http://www.uniprot.org) and search for first P450 sequence in your list. 1. Compare putative heme binding cysteine with consensus pattern from Prosite database ([Syntax](http://prosite.expasy.org/scanprosite/scanprosite_doc.html#mo_motifs)) 2. Report corresponding Pfam ID

6. [BLASTP](http://www.ncbi.nlm.nih.gov/blast/Blast.cgi) against the PDB database (use again first P450 in your list); on result page click first entry in BLAST hit list (here [3K9V_A](https://www.ncbi.nlm.nih.gov/protein/3K9V_A?report=genbank&log$=protalign&blast_rank=1&RID=6BZUZS51016)); then select 'Identify Conserved Domains' on side bar; click grey bar labelled 'CYP24A1'; then select 'Interactive View' under 'Structure' menu which will download a file named 'pfam00067.cn3'. 1. Compare resulting alignment with result from MultAlin 2. View 3D structure (pfam00067.cn3) in Cn3D*, save structure (screen shot) and highlight heme binding cysteine. Note, Cn3D* can be downloaded from [here](https://www.ncbi.nlm.nih.gov/Structure/CN3D/cn3d.shtml). *If there are problems in the last step (6.2) with the install of Cn3D, then please use this online only alternative: (i) click in the [3K9V_A](https://www.ncbi.nlm.nih.gov/protein/3K9V_A?report=genbank&log$=protalign&blast_rank=1&RID=6BZUZS51016) page _'Protein 3D Structure'_ instead of _'Identify Conserved Domains'_; (ii) choose one of the two structure entries provided on the subsequent page; (iii) select option "full-featured 3D viewer" in the bottom right corner of the structure image; (iv) choose the '_Details'_ tab on the right; (v) after this the structure of the protein is shown on the left and the underlying protein sequence on the right; (vi) highlight the heme binding cysteine in the structure by selecting it in the sequence; and (vii) then save the structure view to a PNG file or take a screenshot. ## B. Homework Submission to a Private GitHub Repository Please assemble the results of this homework in one PDF file and upload it to your private course GitHub repository under `Homework/HW1/HW1.pdf`. ## Due date Most homework will be due one week after they are assigned. This one is due on Thu, April 11th at 6:00 PM. You have unlimited attempts. Students can edit and re-upload files anytime before the deadline. ## Homework solution A solution for this homework is not required since the tasks are identical to the steps described above under sections HW1A-B.