Release Notes geWorkbench V2.0.0 June 8th, 2010 Joint Centers for Systems Biology, Columbia University New York, NY 10032 http://www.geworkbench.org ================================================================ Contents ================================================================ 1.0 geWorkbench Installation Notes 2.0 geWorkbench Introduction and History 3.0 New Features and Updates 4.0 Known Issues/Defects 5.0 Bug Reports and Support 6.0 Documentation and Files 7.0 geWorkbench Web Pages ================================================================ 1.0 geWorkbench Installation Notes ================================================================ System Requirements: Java: The Java 6 JRE is required. On Windows and Linux it can be installed separately or together with geWorkbench. On MacOSX, the Java 6 JRE is included with MacOSX versions 10.5 and higher. Please note that Java 6 is, in places referred to by Sun as Java 1.6. 32-bit and 64-bit versions of Java can be used on appropriate platforms. See http://java.sun.com/javase/downloads/index.jsp Memory: At least 2 GB is recommended. geWorkbench 2.0.0 by default will request up to 1 GB of memory for the Java VM. Operating System: Windows XP/Vista/Windows 7 (32 or 64-bit): no special requirements. MacOSX: Version 10.5 or higher is required to provide the Java 6 JRE. Linux: no special requirements known. All three platform-specific versions of geWorkbench (Windows, Linux, and Macintosh) provide an installation wizard (generated using InstallAnywhere). A generic version of geWorkbench, which does not use any installer, is also available. Additional installation details are provided below, and at www.geworkbench.org. All user documentation is maintained in online form at www.geworkbench.org. geWorkbench, unless otherwise noted for particular components, can be run on both 32 and 64-bit operating systems and JREs. Platform-specific release details: 1. Windows (XP/Vista/Windows 7) Special note for Vista/Windows 7 - if you run the installer on Vista or Windows 7, please install geWorkbench to your root directory, e.g. c:\geWorkbench_2.0.0 rather than to C:\Program Files\geWorkbench_2.0.0. File: geWorkbench_v2.0.0_Windows_installer_with_JRE6.exe Includes the 32-bit Sun Java 6 JRE. File: geWorkbench_v2.0.0_Windows_installer_noJRE.exe No JRE is included, you must make sure that an appropriate Java 6 JRE is installed on your system before installing geWorkbench. Download and double-click the installer file to begin installation. 2. MacOSX File: geWorkbench_v2.0.0_MacOSX_installer.zip. This version relies on the Java 6 JRE included with recent updates to the MacOSX operating system. Double-click geworkbench_v2.0.0_MacOSX_installer.zip to begin installation. Notes * Requires Mac OS X 10.5 or later * The compressed installer should be recognized by Stuffit Expander and should automatically be expanded after downloading. If it is not expanded, you can expand it manually using a recent version of StuffIt Expander. 3. Linux File: geWorkbench_v2.0.0_Linux_installer_with_JRE6.bin. Includes the 32-bit Sun Java 6 JRE. The Linux version of geWorkbench relies on X-Windows being installed and running. If you are running Linux on a server and e.g. Windows on your desktop, you will also need to run an X-windows server on your desktop machine. Further information can be found on the Download and Installation page of geworkench.org. After downloading, cd (if needed) to the directory to which you downloaded the installer. To begin the installation, type the command: "sh ./geWorkbench_v2.0.0_Linux_installer_with_JRE6.bin". This will extract geWorkbench into a new directory called geWorkbench_2.0.0. To run geWorkbench, and assuming you are using the Linux bash shell, issue the command: "./rungeWorkbench_2.0.0" 4. Generic - A non-installer-based version of geWorkbench is supplied in a Zip file which should work on any platform. File: geWorkbench_v2.0.0_Generic.zip Installation: Unzip the file. It will create a directory geWorkbench_2.0.0. Running geWorkbench (generic): You must have the Java 6 JRE installed and the JRE must be in the path for geWorkbench. Windows: you can double click on the file "launch_geWorkbench.bat" to launch geWorkbench, or run it from a command window. Linux/Unix: Execute the script "launch_geworkbench.sh". Any: Alternatively, if you have Apache Ant installed, you can type "ant run" in the geWorkbench directory. ================================================================ 2.0 - geWorkbench Introduction and History ================================================================ geWorkBench, an open source bioinformatics platform written in Java, makes sophisticated tools for data management, analysis and visualization available to the community in a convenient fashion. geWorkbench evolved from a project, caWorkbench, which was originally sponsored by the National Cancer Institute Center for Bioinformatics (NCICB). Some of the most fully developed capabilities of the platform include microarray data analysis, pathway analysis, sequence analysis, transcription factor binding site analysis, and pattern discovery. geWorkbench 2.0.0 adds several new components contributed by the MAGNet Center at Columbia University. It also adds file parsers for MAGE-TAB and GEO Soft files, new filtering components, and includes many other new features, enhancements, and bug fixes. ================================================================ 3.0 New Features and Updates ================================================================ Changes included in geWorkbench 2.0.0 New components Skyline - A high-throughput comparative modeling pipeline. It creates structural homology models for protein sequences with similarity to a protein with an experimentally determined 3-D structure. The input is a PDB file. Skybase - SkyBase is a database that stores the homology models built by SkyLine analysis for all NESG PSI2 protein structures. It is queried using FASTA-format protein sequence files. Pudge - Interface to a protein structure prediction server which integrates tools used at different stages of the structural prediction process. Modeling starts with a FASTA-format protein sequence file. Other major new features in release 2.0 * More than 250 "bug reports" were closed. These included many new features, improvements in the usability of numerous components, and actual bug fixes. * Java 6 - Moved from Java 5 to Java 6. geWorkbench now requires Java 6. Works on both 32 bit and 64 bit VMs (JREs). * Look and Feel - Switched to new, more modern Look and Feel (Nimbus). geWorkbench appearance now consistent across all platforms. * caBIO component updated from 4.2 to 4.3. * Cellular Network Knowledge Base (CNKB) - Revamped interface to allow choice of interactome and data types. * File parsers added: MAGE-TAB data matix GEO Soft format - added series (GSE) and curated matrix (GDS). * Filtering - completely revamped - now works directly for all modes, allows specification of minimum % matching arrays before filtering occurs. List of other major changes * caArray - Improved memory usage on downloads from caArray. * CNKB - Can now return markers direct from CNKB without use of Cytoscape. * Color Mosaic - enhancements to display (bug 2147): toggle array names on/off search on array name, accession, or label * Component Configuration Manager - now can filter display list by categories: Analysis, Viewer, Normalizer, Filter * Cytoscape - Corrected mapping between gene names in Cytoscape display and markers in Marker Sets panel (now uses Entrez IDs). * Dendrogram - can now create Array subsets as well as marker subsets. * Markers and Arrays - Hover text available in Markers and Arrays phenotypes to visualize long names if needed. * Marker Annotation - search results can be saved to a text file, including relevant URLs and pathway BioCarta pathway names. * File loading - Checking for "out of memory" errors during file loading. * GUI - in switching to new L&F, fixed many text highlighting problems that were previously seen on Macintosh only but now appeared on Windows also. * File parser menu - The file parser selection menu now shows valid file extensions for each type. * Promoter - JASPAR promoter motifs now filterable by taxon. * Sequence alignment (BLAST) - many enhancements, including added additional databases to match those listed at NCBI improved handling of results from searches containing long query sequences. Online Help chapters updated * CCM * Filtering * Normalization Versions of external files/components included in this release * gene_ontology.1_2.obo downloaded 5/24/2010 from geneontology.org. * Ontologizer.jar version 2.0, file released 3/10/2010. * Jaspar_CORE (http://jaspar.genereg.net/) SQL files last updated on server 10/2009. * JMOL - component updated. ***Changes in previous versions*** Changes included in geWorkbench 1.8.0 New components 1. Gene Ontology Enrichment - Analysis and visual components. Analysis component is built on Ontologizer 2.0. Other changes in release 1.8.0: 1. caArray - Update caArray component to use caArray 2.3.0 Java API. Please note that geWorkbench 1.8.0 is not compatible with earlier versions of caArray. 2. CNKB - The network graph generated by CNKB was only showing nodes centered about a focus node. Now all accepted nodes will be displayed. 3. Dataset History - Additions for several modules. 4. Grid Services - A number of fixes to grid services were made. 5. Marker Annotations - Fixed a problem with retrieving marker annotations when microarray data downloaded from caArray. 6. Mark-Us - JMOL dependency added for molecule display. 7. Promoter - Update JASPAR motifs to release of December 2007. -Note on October 12, 2009 a new version of JASPAR was released which made an incompatible change in the file format. 8. Promoter - component now displays logos using the "Schneider" method, including his "small-value correction", rather than using a previous "in-house" method. 9. Promoter - the displayed data now does not include the effects of the pseudo-count normalization process. 10. Promoter - Added ability to specify pseudocount or select previous hard-coded option of square root of number of sequences. 11. Promoter - Loaded TFs now are properly added to the list of available TFs. 12. Sequence Alignment (BLAST) - PFP filtering option removed 13. Usability fixes - operation of cancel buttons, progress bar. 14. Release Notes - Added specific installation instructions. Online Help chapters updated 1. ANOVA 2. ARACNe 3. CNKB 4. Marker Annotations 5. Master Regulator Analysis 6. Promoter 7. Sequence Alignment (BLAST) Changes included in geWorkbench 1.7.0 New components 1. MarkUs - The MarkUs component assists in the assessment of the biochemical function for a given protein structure. It serves as an interface to the Mark-Us web server at Columbia. Mark-Us identifies related protein structures and sequences, detects protein cavities, and calculates the surface electrostatic potentials and amino acid conservation profile. 2. MRA - The Master Regulator Analysis component attempts to identify transcription factors which control the regulation of a set of differentially expressed target genes (TGs). Differential expression is determined using a t-test on microarray gene expression profiles from 2 cellular phenotypes, e.g. experimental and control. 3. Pudge - Interface to a protein structure prediction server (Honig lab) which integrates tools used at different stages of the structural prediction process. 4. ARACNe2 - upgraded to ARACNe2 distribution from Califano lab, which adds selectable modes (Preprocessing, Discovery, Complete) and a new algorithm (Adaptive Partitioning). Preprocessing allows determination of key parameters from actual input dataset. 5. caGrid v1.3 - Upgrading of grid services to caGrid v1.3 + introduction of caTransfer for large data tranfers. 6. Component Configuration Manager - allows individual components to be loaded into or unloaded from geWorkbench. 7. genSpace collaborative framework - discovery and visualization of workflows. Implemented user registration and preferences. 8. SVM 3.0 (GenePattern) - Support Vector machines for classification. Other changes in release 1.7.0: 1. Analysis - Parameter saving implemented in all components. If current settings match a saved set, it is highlighted. 2. ARACNe - improved description of DPI in Online Help. 3. caArray - query filtering on Array Provider, Organism and Investigator implemented. 4. caArray - can now add a local annotation file to caArray data downloads. 5. caGrid - caGrid connectivity is now built directly in to supported components rather than being a separate component itself. 6. caScript - The caScript editor is no longer supported. 7. Color Mosaic - now interactive with the Marker Sets list and Selection set. 8. Cytoscape - Upgrade to Cytoscape version 2.4 for network visualization and interaction. 9. Cytoscape - Set operations on genes being returned from Cytoscape network visualizations, via right-click menu. 10. Cytoscape - Changes to tag-for-visualization - e.g., now only one way, from marker set to Cytoscape, not vice-versa. 11. Gene Ontology file - the OBO 1.2 file format is supported. 12. Marker Annotations - Direct access to the NCI Cancer Gene Index was added. It supplies detailed literature-based annotations on a curated set of cancer-related genes. 13. Marker Annotations - add export to CSV file. 14. Marker Sets component - a set copy function was added. 15. MINDy - many improvements to display and results filtering - including marker set filtering. 16. Scatter Plot - Up to 100 overlapping points can be displayed in a single tooltip. 17. Various - A number of components were refactored. 18. Workspace saving - now works properly for all components. Changes included in geWorkbench 1.6.3 * geWorkbench 1.6.3 fixes several caArray related issues: - connection issue that may cause a time-out on some machines. - incorrect caching of caArray query results. - duplicate query process removed. Changes included in geWorkbench 1.6.2 * geWorkbench 1.6.2 provides improved proxy communication with its grid service dispatcher component (see Mantis bug 1631). * A problem was fixed in the server-side grid implementation of hierarchical clustering (Mantis bug 1598). Changes included in geWorkbench 1.6.1 * A Java servlet now provides connectivity to the Cellular Networks Knowledge Base database through the firewall. * Online help for the Sequence Retriever component was added. * The GenePix annotation parser was augmented to include more data fields. * Added a missing GenSpace component. * The GenSpace component was moved from the visual area to the command area. * Volcano plot scaling was fixed to display extreme P-values (E-45). Changes included in geWorkbench 1.6.0 * Adds Mindy component * The GO Terms component is not included in this release. It will return in a future release. * Fixed a problem (caused by a change in a server-side URL) with retrieving annotations for genes in Biocarta pathway diagrams (bug 1577). * The default caArray server was set to the production server at NCI (array.nci.nih.gov, port 8080) (bug 1602). The URL for the staging array was updated to array-stage.nci.nih.gov. * An incorrect argument was being sent to NCBI's BLAST server. Due to recent changes there implementing stricter checking, blastn would no longer run. (bug 1597). * Corrected a problem where, when using the adjusted Bonferroni correction, or the Westphal-Young with MaxT, only values with positive fold-changes were returned and displayed (bug 1603). * Added a feature whereby the user is warned before any operation that will alter the dataset, e.g. before filtering out markers, or before a log2 transformation. * Added a feature to allow adding a new empty marker set. This can then be used to receive markers selected interactively in Cytoscape (bug 1541). * Fixed a problem displaying patterns in the sequence viewer after running Pattern Discovery (SPLASH) (bug 1415). * Fixed a problem with displaying adjacency matrices generated by ARACNE in the Cytoscape component (bug 1449). * Numerous changes were made to improve responsiveness, including when - selecting a marker in a large dataset (bug 1346), - right-clicking on Project with a large dataset (bug 1337), - saving a workspace (bug 1525), and - starting an analysis (bug 1544). * Remaining changes, not listed here in detail, included - internal issues within geWorkbench, - improved verification of parameters and set selections before beginning a calculation, - improvements to the graphical user interfaces of many components, and - corrections to the grid implementations of analytical services (Hierarchical Clustering, SOM, ANOVA etc). Changes included in geWorkbench 1.5.1: . * It addresses changes in the APIs for the caArray and caBIO data services since geWorkbench 1.5 was released. geWorkbench 1.5.1 can currently connect with caArray 2.1 and caBIO 4.0/4.1. * It also includes an update to parse the new release 26 of Affymetrix annotation files. * Fixes a problem where annotation information was not associated with arrays that were merged. Changes included in geWorkbench 1.5: New Modules: * ARACNE – gene network reverse engineering (from Andrea Califano's lab at Columbia University, http://wiki.c2b2.columbia.edu/califanolab/index.php/Software). * ANOVA – Analysis of variance, ported from TIGR's MEV, http://www.tm4.org/mev.html). * caArray2.0 connectivity – query for and download data from caArray 2.0 directly into geWorkbench. * Cellular Networks Knowledge Base – database of molecular interactions. (from Andrea Califano's lab at Columbia University, http://amdec-bioinfo.cu-genome.org/html/BCellInteractome.html). * GenSpace - provide social networking capabilities and allow you to connect with other geWorkbench users. * MatrixReduce – transcription factor binding site prediction (from Harmen Bussemaker's lab at Columbia University, http://bussemaker.bio.columbia.edu/software/MatrixREDUCE/). * Analysis components ported from GenePattern (http://www.genepattern.org) - Principle Component Analysis (PCA) - K-nearest neighbors (KNN) - Weighted Voting (WV) New File types supported * The NCBI GEO series matrix file for microarray data (tab-delimited) New server side architecture * Invocation of caGrid services is now delagated to an independent component (the Dispatcher). This makes it possible to exit geWorkbench after submitting a long-running job and then automatically pick up any results next time the application starts. Other changes * The Marker and Array/Phenotypes components now support algebraic operations (union, intersection, xor) on marker and array groups. * Upon exiting the application, the user is prompted to store their workspace. * Workspace persistence problems have been resolved. * The Marker Annotations component has been enhanced in several ways: ** The integration with caBIO has been updated to use API Version 4.0 ** The caBIO Pathway component (previously an independent geWorkbench component that would display BioCarta pathway images) has been integrated into the Marker Annotations component. ** Markers can be returned from BioCarta pathway diagrams. ** A new option is provided to choose between human or mouse CGAP annotation pages. ================================================================ 4.0 Known Issues/Defects ================================================================ Affymetrix Annotation files: Due to licensing restrictions, Affymetrix annotation files cannot be included in this distribution. geWorkbench users who are working with Affymetrix chip data should retrieve the latest version of the appropriate annotation file for the chip type they using directly from https://www.affymetrix.com/site/login/login.affx A free account at Affymetrix.com is required. Current annotation files in CSV format are listed there. If you need an annotation file for an older file you can use its name in the search field on the web page, e.g. "HG_U95Av2". An example file from the Affymetrix site is "HG_U95Av2.na29.annot.csv.zip". This file would need to be unzipped before use. You can place the file in any convenient directory. When you load a new data file, you will be asked for the location of the annotation file and can browse to it. Grid Computations The reference implementations of the server-side grid-enabled algorithms currently are running on a single front-end server not meant for heavy computational use. That server is not configured for computing on large datasets or for long-running jobs. ================================================================ 5.0 Bug Reports and Support ================================================================ Support is provided via online forums at the NCI's Molecular Analysis Tools Knowledge Center. See https://cabig-kc.nci.nih.gov/Molecular/forums/ FAQs and other articles are also available at https://cabig-kc.nci.nih.gov/Molecular/KC/index.php/Main_Page#geWorkbench Finally, please see the geWorkbench project page for additional known issues and FAQs. www.geworkbench.org. ================================================================ 6.0 Documentation and Files ================================================================ The documents and support files in this distribution include: geWorkbench Release Notes: ReleaseNotes_2.0.0.txt (this file) geWorkbench License: geWorkbenchLicense.txt Online Help: Within geWorkbench, users can access "Help Topics" by clicking the top menu. It has detailed information about each module. For other documentation not directly included as part of the distribution, see the following section (7.0) Web Resources. ================================================================ 7.0 geWorkbench Web Resources ================================================================ The geWorkbench team maintains a Wiki containing extensive documentation, a User Manual, tutorials and training slides. It is available at: http://www.geworkbench.org