Alview User's Manual
-Richard Finney NCI/NIH
Latest revision Feb 11, 2016

Contents.

I. Introduction.
II. Power User Fast Start
III. Installation.
IV. Compilation.
V. Operation as Command Line Program
VI. Operation as Web Server
VII. General Operation
VIII. "alview.conf" text configuration file
IX. Utilities
X. FAQs - Frequently Asked Questions.

I. Introduction.

Alview is a fast computer program for viewing BAM files. Various versions are available at my webserver at http://45.56.125.191/ . Alview source is available via the NCI/NIH github at https://github.com/NCIP/alview .

This is relatively new; I'd appreciate feedback. Contact me at my email address : [finneyr AT mail nih gov](add @ and dots) with any questions or to troubleshoot a problem.

Alview can take three forms:
#1 as a native standalone desktop GUI application (Cocoa, GTK or WIN32).
#2 as a command line utility.
#3 as a CGI web server.

It is supported on three Operating Systems:
#1 Windows 32 bit (works on 64 bit)
#2 Linux (current binary release is 64 bit; recompilation on another UNIX is easy)
#3 Apple MAC 64 bit

Alview is available via the NCI/NIH github at https://github.com/NCIP/alview . Please note the Windows GUI version installer. The Mac version is available as a DMG or as a tarball. A 64 bit Linux version is available as a tarball, too.

The command line can be used to batch create images of snp locations in bam files. The user can then create a slide show for rapid viewing. This takes some user engagement.

Alview currently supports hg18 and hg19 via large genome support files. They are required. More builds are forthcoming (mouse, GRCh38, etc).

II. Power User Fast Start

Consult the Alview github page for the link to my webserver. Currently it is at http://45.56.125.191/ . Various Alview builds are available there.

If you are blocked from installing on your system, then do this:

Pick or create a directory on your computer to put Alview in.
Download the binary for your system at https://github.com/NCIP/alview/tree/master/BINARIES. Put it in your directory. Set permissions on the alview executable to "execute".

Download http://45.56.125.191/AlviewGenomeData.tar (from my webserver) to your Alview directory. This is a hassle on slower connections; it might take a few minutes on your home internet connection.

Untar this with the "tar xvf AlviewGenomeData.tar" command (or use a graphical uncompression program for your OS). Untar-ing (extracting) the genomic support data from "AlviewGenomeData.tar" creates genome subdirectories off of the newly created directory named "GENOMEDATA". Make sure all new files have "read" permissions set correctly. You may need to use chmod on Linux, Mac or Cygwin, or icacls.exe on Windows. Cygwin users will want to check their work carefully.

Set up the text configuration file "alview.conf". This must be correct. The file "alview.conf" is editable using a text editor. Check it carefully. Make sure it points to the GENOMEDATA directory. See the section on the "alview.conf" file in this document for details.

For command line or webserver, you might have to compile it yourself; I provide several versions at http://45.56.125.191/ . It's pretty straightforward for Linux and Apple MAC ...

Make and/or install 1) samtools and 2) zlib if they are not already there.

Sample compile for command line using NEW samtools 1.0 (after August 2014) on Linux

Note: please make sure samtools, the GTK2 development libraries, and gcc are installed. The GTK2 runtime is often already installed, but you need the GTK2 development libraries. gcc (the C compiler) is already installed on most Linux systems. samtools only requires a download and "make". Know where you installed it.

    # set samtools to proper directory for your system.
    # the output name ("alview" here) is your choice.
    cp alvmisc.c alvmisc.cpp
    gcc -Wall -DSAMTOOLS1=1 -DUNIX=1 -DCMD_LINE=1 -o alview alvmisc.cpp alviewcore.cpp -I. \
        -I/h1/finneyr/samtools-1.0/ \
        -I/h1/finneyr/samtools-1.0/htslib-1.0/ \
        -I/h1/finneyr/samtools-1.0/htslib-1.0/htslib/ \
        -L/h1/finneyr/samtools-1.0/ \
        /h1/finneyr/samtools-1.0/libbam.a \
        /h1/finneyr/samtools-1.0/htslib-1.0/libhts.a \
        -lbam -lm -lz -lgd -lpthread -lstdc++

Compile for command line using older samtools on Linux ...

    # set samtools to proper directory for your system.
    # ****** note, can't mix c and cpp effortlessly ** ouch *** so just copy file.c to file.cpp
    cp alvmisc.c alvmisc.cpp
    gcc -Wall -DUNIX=1 -DCMD_LINE=1 -o alview_cmdline_linux alvmisc.cpp \
        alviewcore.cpp -I. -I/data/nextgen/finneyr/samtools-0.1.18/ \
        -L/data/nextgen/finneyr/samtools-0.1.18/ -lbam -lm -lz -lstdc++

Cygwin users may need to modify the samtools makefile to get samtools to compile. There's information on the internet on how to do this. For most, downloading samtools and running "make" should do the trick. Currently samtools 1.1 is a bit hard to get right on Windows. For Windows, samtools version 0.1.18 is what is currently supported, and it works with old compilers.

For a GUI on Linux, users should install the gtk2 (not gtk3) development packages. These are easily installable via apt-get or yum or your GUI installer. Know which directories the libraries are in, as you'll need to provide them to the linker.

Sources for *Windows versions* of the required support libraries (zlib and samtools) are provided at the Alview github. Note that these are modified to work with the often very picky (Visual Studio) cl.exe C/C++ compiler.

Set the appropriate IFDEF flags. These flags are APPLE, UNIX, WINDOWS, WEB_SERVER, CMD_LINE. Set UNIX=1 for Linux. Set them as compiler flags.

Compile alview.cpp with the proper flags, pointing to the proper libraries. Note that most versions are statically compiled; you can compile Alview with shared libraries if you wish.

Windows users may just want to use the Linux version by using Cygwin; just compile the Linux version under Cygwin.
You might need to install zlib and samtools. The makefile must be modified to make the samtools support library compile. Consult the internet for troubleshooting tips on making samtools on Cygwin.

You will have to set up the genome data directory GENOMEDATADIR for the command line version in the text file "alview.conf" properly !!! Be careful.

Sample webserver compilation for Linux ...

    # note that you have to point to the samtools directory; samtools 0.1.18
    # is the safest version to use.
    g++ -DUNIX -DWEB_SERVER=1 -UQT_GUI -UCOMMAND_LINE -DNATIVE -Wall -o alview \
        alviewcore.cpp alvmisc.cpp -lz -Wall -I/h1/finneyr/samtools-0.1.18/ \
        /h1/finneyr/samtools-0.1.18/libbam.a -lm -lz

See the WEBSERVER section for Web Server setup.

III. Installation

Alview is a fast computer program for viewing BAM files. Various versions are available at http://45.56.125.191/ .

Windows: run the Windows version installer from http://45.56.125.191/ .

APPLE MAC GUI: Download the DMG file and click on it. A Mac tarball is also available. Alternately, you can hand install on MAC:
Download the "alvmac" (alview mac) binary executable file.
Download GENOMEDATADIR data.
Edit alview.conf.
Check permissions, run "alvmac".

Linux: A 64 bit Linux version made on an old system is available, so it ~should~ run on later 64 bit Linux versions. Download, untar, set permissions, edit "alview.conf", run alvgtk (alview linux). Hand install is this ...
Download the "alvgtk" binary executable file.
Download GENOMEDATADIR data.
Edit alview.conf.
Check permissions, run "alvgtk".

IV. Compilation.

At the Alview github page, check the README.md file for the latest versions and additional instructions.

Remember, you can just install the GUIs and some command line versions of Alview on your desktop, but you must compile for the webserver and for some command line versions.

Install 1) samtools, 2) curses and 3) zlib if they are not already there. Samtools is a non-standard package and must be installed by hand.
The packages "curses" and "zlib" are required by samtools. Sometimes curses is called "ncurses".

Linux compilation is done via the "gogtk" batch file. Edit this file ("gogtk") to point to your samtools. You may have to download the gtk2 development library, which is not a big deal if you have root or admin privileges. Otherwise, ask your sysadmin for help or download and make gtk2 in your user local filesystem.

Set the appropriate IFDEF flags. These flags are APPLE, UNIX, WINDOWS, WEB_SERVER, CMD_LINE. You can set them via command line options as you invoke the compiler.

Example compilation on Linux for command line ...

    # note the -I and -L parameters need to point to *your* samtools directory
    gcc -Wall -DUNIX=1 -DCMD_LINE=1 -o alview alviewcore.cpp -I. \
        -I/data/nextgen/finneyr/samtools-0.1.18/ \
        -L/data/nextgen/finneyr/samtools-0.1.18/ -lbam -lm -lz -lgd -lstdc++

You must also download the large genome data support files, packaged in the tar file http://45.56.125.191/AlviewGenomeData.tar . From the directory where you want to put a new GENOMEDATA directory, run this command line to unpack AlviewGenomeData.tar:

    tar xvf AlviewGenomeData.tar

(This works on Linux, the Mac command line, or Cygwin [ or cmd.exe with unix utilities ].)

Set up the "alview.conf" file. Edit this file with a text editor.

V. Operation as Command Line Utility.

This can be used in scripts for batch processing files. It's pretty simple to operate. Note that this is an ideal utility for power users who wish to create their own batch files or integrate Alview into existing viewing tools. PERL, Python, shell, Powershell users and users with other programming and scripting skills can use their favorite tools to call the alview command line executable to customize their examination of bam files. The output of the alview command line is a PNG image file that can be produced and dealt with as the user wishes.

You might have to compile for standalone or download a working version. See the compilation section if you need to compile.
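Because each call produces one PNG, batch use comes down to invoking the executable once per region. The following is a minimal Python sketch of that idea, not part of Alview itself. It assumes the command line argument order used by Alview (bam file, output image, position, build); the executable name "alview", the BAM path, the SNP list, the output directory and the 50-base flank are all hypothetical placeholders.

```python
import subprocess
from pathlib import Path


def snp_region(chrom, pos, flank=50):
    """Return a 'chr:start-end' position string centered on a SNP."""
    start = max(1, pos - flank)  # genomic coordinates start at 1
    return f"{chrom}:{start}-{pos + flank}"


def render_snps(alview_exe, bam, snps, out_dir, build="hg19", flank=50, dry_run=False):
    """Build (and optionally run) one alview command per SNP.

    Argument order per command: bam, output png, position, build.
    Returns the list of commands so they can be logged or inspected.
    """
    out_dir = Path(out_dir)
    if not dry_run:
        out_dir.mkdir(parents=True, exist_ok=True)
    commands = []
    for chrom, pos in snps:
        png = out_dir / f"{chrom}_{pos}.png"
        cmd = [str(alview_exe), str(bam), str(png), snp_region(chrom, pos, flank), build]
        commands.append(cmd)
        if not dry_run:
            # check=True stops the batch on the first failing call
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    # Hypothetical inputs; dry_run=True only prints the commands.
    snps = [("chr17", 7515990), ("chr17", 7518000)]
    for cmd in render_snps("alview", "sample.bam", snps, "imgs", dry_run=True):
        print(" ".join(cmd))
```

Running with dry_run=True first is a cheap way to check the generated positions and file names before spending time on thousands of images.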
We provide some command line binaries.

You need to set up the "alview.conf" text configuration file to point to the GENOMEDATADIR, and you need to have downloaded the *large* GENOMEDATA tar and untarred it.

Here's a quick example on Windows:

    alview_cmd_win.exe c:\rich\BAMS\RW1.BWA-aln.RG.MKDUP.bam temptest.png chr17:7512444-7519536 hg19

The actual command name is OS specific.

The Alview command line arguments are ...

    arg1 fn_bam = name of bam file (will use to determine bai file)
    arg2 outimg = output file name (png file)
    arg3 position = example chr17:10000-10500 ; it can handle 17:10000-10500 type positions
    arg4 blds = genomic build string (example "hg19")
    (optional) arg5 ih = image height
    (optional) arg6 iw = image width

So, create a batch file that calls Alview repeatedly to make pictures of different BAM files or regions (centered on the SNP of interest).

*** THIS PART TAKES SOME USER INVOLVEMENT ***

Feel free to improvise or customize as you see fit. The Alview command line just takes a picture. You can add lots of value to this collection of pictures via other tools. You can use whatever tools you want to organize or examine these images.

You can view the output in a slideshow; the file "slideshowtemplate.html" is a template file we provide for assisting in this. This is on github. This is just an html file you can hack to point to your images and custom annotation (like from a VCF file or annotated snp caller output).

Edit the file "slideshowtemplate.html" like this: set the "IMGSHERE" and "ANOSHERE" lines with image files and annotation. This is a little work, but it makes your massive set of 40,000 SNP calls easy to review! This is a true power user's tool for reviewing lab results and prioritizing further work. Actually reviewing your snp calls for anomalies can save you a lot of grief.

VI. Operation as Web Server

For the webserver, create the alview program for webserver, copy the resulting alview to the cgi-bin directory, and check permissions.
See the compilation section for examples. Then, set up the genome data directory and edit "alview.conf" properly.

Setting up the genome data directory involves this ...

Download the *large* genome data support files, which are in the tar file at http://45.56.125.191/AlviewGenomeData.tar . (This is my linode server.) This is a large (1.6 GB) file and takes a while to unpack. Use this command:

    tar xvf AlviewGenomeData.tar

Edit the "alview.conf" file to point to the genome data location that you just created, and make sure cgi programs can access it. If in doubt, just set it up as a subdirectory of the executable (cgi-bin) directory [ the default location should get detected ].

Alview webserver uses 1 CSS and 3 javascript files. Hand install them from https://github.com/NCIP/alview/blob/master/MISC/ ...

Create a "js" directory off of your HTDOCS directory (the Apache default directory). Make sure it's readable.

    alview.jquery-1.8.3.js - put in "js" off of the HTDOCS (default) directory
    alview.jquery-ui-1.9.2.custom.js - put in "js" off of the HTDOCS (default) directory
    alview.jquery.imgareaselect.js - put in "js" off of the HTDOCS (default) directory
    imgareaselect-default.css - put in HTDOCS (default) directory

Compile the file alviewcore.cpp with the appropriate OS flag and define WEB_SERVER to 1. Install the binary in the cgi-bin directory and run it as a web application.

Example webserver compilation:

    g++ -DUNIX -DWEB_SERVER=1 -UQT_GUI -UCOMMAND_LINE -DNATIVE -Wall -o alview \
        alviewcore.cpp alvmisc.cpp -lz -Wall -I${SAMTOOLSDIR} ${SAMTOOLSDIR}/libbam.a -lm -lz

Edit the text file called "list.public" to list the full paths to your bam files. Place this in the cgi-bin directory.

Example "list.public" file is ...
    #This is list for PUBLIC ALVIEW - public datasets - comments begin with #
    /tcga_next_gen_a/bam_tier_2/GASTRIC/AGP1_RNA.bam AGP1_RNA T Asian_Gastric_RNA
    /tcga_next_gen_a/bam_tier_2/GASTRIC/AGP2_RNA.bam AGP2_RNA T Asian_Gastric_RNA
    /tcga_next_gen_a/bam_tier_2/GASTRIC/AGP3_RNA.bam AGP3_RNA T Asian_Gastric_RNA
    /tcga_next_gen_a/bam_tier_2/GASTRIC/AGP4_RNA.bam AGP4_RNA T Asian_Gastric_RNA

Hooks in the code are available for creating your own security mechanisms.

****This is the user's responsibility****

For instance, you can create user logins and examine cookies and your own user databases. Additionally, you could restrict access to certain IP addresses.

VII. General Operation

Enter the genomic coordinates or a HUGO gene name in the position box. You can drag select a region with the mouse to zoom in. You can use the navigation buttons to move around. You can use the quality flag to view a grayscale graphic of the quality values (if available).

See http://45.56.125.191/help.html for the latest help file.

You can use the specified button to call the UCSC browser focused on your region of interest.

You ~can~ view a whole chromosome. This may take some time. Users will find that zooming out too far is not very informative.

VIII. alview.conf file

Example "alview.conf" file; this one is set up for Windows. This must be a text file, not a Word file or html file.

---NEXT LINE IS FIRST LINE
# comments start with #
# USE / at end of directories ( backslash for Windows )
# windows ...
GENOMEDATADIR=c:\rich\GENOMEDATA\
# Linux ...
#GENOMEDATADIR=/home/rfinney/GENOMEDATA/
# mac ...
#GENOMEDATADIR=/Users/finneyr/GENOMEDATA/
#end
---PREVIOUS LINE IS LAST LINE OF alview.conf CONFIGURATION TEXT FILE

IX. Utilities

Programs used to create the refseq and snp annotation track files are in https://github.com/NCIP/alview/tree/master/UTILS

To modify, edit UTILS/ccflat.c and UTILS/nsfp2mydb.c and re-compile. Instructions are at the very top of each file.
Please note, the output record size is *not* necessarily the C/C++ struct record size. The record definition is specified by the field byte sizes, which are in the read/write functions in these programs. nsfp2mydb.c is a very new program.

These are simple "vi gcc run" C programs. They create raw data, binary search-able flat files for RefSeq and NSFPdb+dbSNP.

Input files are from UCSC (refFlat and dbSNP) and dbNSFP. dbNSFP is at https://sites.google.com/site/jpopgen/dbNSFP. It is "a database developed for functional prediction and annotation of all potential non-synonymous single-nucleotide variants (nsSNVs) in the human genome". UCSC files are the refFlat for the build and a dbSNP file. Currently it's hg18.138 hg19.dbsnpComon and hg18.reffat hg19.reflat.

X. FAQS - Frequently Asked Questions.

-What is Memory Usage?

There is a base level, typically very small, in a running program. Windows Task Manager reports 11MB memory used on startup. Additional memory usage is based on image size and the size of the genomic region viewed. Images are RGB, and there is a buffer for XOR selecting the zoom region and an "is this space already occupied" buffer. Space is also needed for ATCG base content and counts of reference vs. nonreference calls.

So memory usage is

    basic_amount + (Height * Width) * 3 * 3 + region_size + (region_size * 2 * 2)

where
    basic_amount is the fairly negligible basic runtime footprint (11MB on Windows)
    Height is the height of the image
    Width is the width of the image
    3 (first 3) is for RGB
    3 (second 3) is for 1) main, 2) XOR buffer and 3) is_spot_occupied images, all of the same size
    region_size is the number of bases (nucleotides) the user selected to view
    2 (first 2) is for the count of bases from reads that map to that location
    2 (second 2) is for the count of mismatched bases at that location
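The memory formula in this FAQ can be turned into a quick estimate. The Python sketch below is only an illustration of that arithmetic, not code from Alview. The function name, the default of 11 * 1024 * 1024 bytes for the base footprint (the manual only says "11MB"), and the 1000 x 600 image size are assumptions; the region is the chr17:7512444-7519536 example from the command line section.

```python
def estimate_memory_bytes(width, height, region_size, basic_amount=11 * 1024 * 1024):
    """Estimate memory use following the formula in the FAQ.

    basic_amount + (height * width) * 3 * 3 + region_size + (region_size * 2 * 2)
    """
    image_bytes = height * width * 3 * 3   # RGB x (main, XOR buffer, occupied buffer)
    base_bytes = region_size               # ATCG base content for the region
    count_bytes = region_size * 2 * 2      # mapped-base counts and mismatch counts
    return basic_amount + image_bytes + base_bytes + count_bytes


if __name__ == "__main__":
    # chr17:7512444-7519536 spans 7519536 - 7512444 + 1 bases.
    region = 7519536 - 7512444 + 1
    total = estimate_memory_bytes(width=1000, height=600, region_size=region)
    print(f"about {total / (1024 * 1024):.1f} MB")
```

For typical views the image buffers dominate; the region terms only start to matter when viewing very large regions such as a whole chromosome.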