Directory 'tsv/ensembl-compara/homologies' contains tab-separated value (TSV) dumps of homologies inferred on gene trees.

All homology TSV dump files have the same naming convention and fields, as follows.

    Compara.{release}.{protein|ncrna}_{gene_tree_collection}.homologies.tsv.gz

Each homology between a pair of genes is represented on one tab-delimited line, with the following fields:

gene_stable_id : gene stable_id of the first homologous gene
protein_stable_id : sequence stable_id of the first homologous gene (may be a protein or transcript stable_id, depending on whether the gene is protein-coding)
species : name of the genome containing the first homologous gene
identity : identity between homologous sequences, expressed as a percentage of the length of the representative sequence of the first homologous gene
homology_type : homology type and cardinality (e.g. 'ortholog_one2one')
homology_gene_stable_id : gene stable_id of the second homologous gene
homology_protein_stable_id : sequence stable_id of the second homologous gene (may be a protein or transcript stable_id)
homology_species : name of the genome containing the second homologous gene
homology_identity : identity between homologous sequences, expressed as a percentage of the length of the representative sequence of the second homologous gene
dn : non-synonymous mutation rate (currently unused)
ds : synonymous mutation rate (currently unused)
goc_score : gene order conservation (GOC) score of the homology
wga_coverage : whole genome alignment (WGA) coverage of the homology
is_high_confidence : whether this is considered a 'high-confidence' homology
homology_id : unique internal ID of the homology

Note that within these files, the order of the first and second homologous genes within each row is arbitrary, and should not be interpreted as conferring any special status on one or the other. Both genes are co-equal participants in a homology relationship.

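As an illustration of the naming convention and field layout described above, the following Python sketch splits a file name into its parts and reads homology lines into dictionaries keyed by field name. It is a minimal example, not an official Ensembl tool. The function names (parse_filename, read_homologies, open_homology_file) are hypothetical, and the sketch assumes that a file may begin with a header line naming the fields, which it skips if present. The sample values are made-up placeholders, not real Ensembl data.

```python
import csv
import gzip
import io
import re

# Field names, in the order they are listed in this README.
FIELDS = [
    "gene_stable_id", "protein_stable_id", "species", "identity",
    "homology_type", "homology_gene_stable_id", "homology_protein_stable_id",
    "homology_species", "homology_identity", "dn", "ds", "goc_score",
    "wga_coverage", "is_high_confidence", "homology_id",
]

# Compara.{release}.{protein|ncrna}_{gene_tree_collection}.homologies.tsv.gz
FILENAME_RE = re.compile(
    r"^Compara\.(?P<release>[^.]+)\.(?P<member_type>protein|ncrna)_"
    r"(?P<collection>.+)\.homologies\.tsv\.gz$"
)


def parse_filename(name):
    """Return release, member type and collection, or None if no match."""
    match = FILENAME_RE.match(name)
    return match.groupdict() if match else None


def read_homologies(stream):
    """Yield one dict per homology line from a text stream of TSV data."""
    for row in csv.reader(stream, delimiter="\t"):
        if not row:
            continue  # ignore blank lines
        if row[0] == FIELDS[0]:
            continue  # assumption: a header line, if present, names the fields
        yield dict(zip(FIELDS, row))


def open_homology_file(path):
    """Open a gzip-compressed homology TSV dump as a text stream."""
    return gzip.open(path, "rt", newline="")


# Small in-memory demonstration with placeholder values (not real data,
# and not necessarily in the value format used by the real dumps).
sample = "\t".join(FIELDS) + "\n" + "\t".join([
    "geneA", "protA", "species_one", "90.5", "ortholog_one2one",
    "geneB", "protB", "species_two", "88.0", "x", "x", "x", "x", "x", "1",
]) + "\n"
records = list(read_homologies(io.StringIO(sample)))
```

For a downloaded file, the same reader could be used as `with open_homology_file(path) as handle: for record in read_homologies(handle): ...`. All values are kept as strings here, because this README does not specify how missing values are encoded.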
The contents of these files differ depending on their location relative to the directory 'tsv/ensembl-compara/homologies'.

Each homology TSV dump file at the top level in directory 'tsv/ensembl-compara/homologies' contains the complete set of available homologies for a given gene-tree collection (e.g. 'default') and member type (e.g. 'protein'). We recommend downloading this complete homology TSV file if you need to access most or all of the homologies in a gene-tree collection.

For those who need access to homologies for a subset of genomes in a gene-tree collection, genome-specific homology TSV dump files are available within a subdirectory of 'tsv/ensembl-compara/homologies' named for a particular genome. Subdirectories are structured in a way that mirrors the directory structure for single-species TSV data:

- data for genomes from species-specific core databases are in a single directory named for that species (e.g. 'tsv/ensembl-compara/homologies/saccharomyces_cerevisiae')
- data for genomes from collection core databases are in a directory path which includes the name of the collection core (e.g. 'tsv/ensembl-compara/homologies/fungi_ascomycota2_collection/erysiphe_necator_gca_000798715')

NOTE: To eliminate redundancy, each genome-specific homology TSV file contains an arbitrary subset of orthologies involving the given genome. To access all available orthologies between two genomes (e.g. 'drosophila_melanogaster' and 'saccharomyces_cerevisiae'), you will need to download the genome-specific files of both genomes (e.g. 'drosophila_melanogaster/Compara.116.protein_default.homologies.tsv.gz' and 'saccharomyces_cerevisiae/Compara.116.protein_default.homologies.tsv.gz').
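The NOTE above implies that homologies between two genomes have to be gathered from both genome-specific files. The following Python sketch shows one way this could be done. It is illustrative only: the function names (iter_rows, homologies_between) are hypothetical, a header line naming the fields is assumed to be possible and is skipped if present, and the de-duplication by homology_id is a defensive assumption rather than something the dumps are known to require. Because the order of the two genes in a row is arbitrary, the genome pair is compared as an unordered set.

```python
import csv
import gzip

# Field names, in the order they are listed in this README.
FIELDS = [
    "gene_stable_id", "protein_stable_id", "species", "identity",
    "homology_type", "homology_gene_stable_id", "homology_protein_stable_id",
    "homology_species", "homology_identity", "dn", "ds", "goc_score",
    "wga_coverage", "is_high_confidence", "homology_id",
]


def iter_rows(path):
    """Yield one dict per homology line of a gzip-compressed TSV dump."""
    with gzip.open(path, "rt", newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            # Skip blank lines and (assumed) header lines.
            if row and row[0] != FIELDS[0]:
                yield dict(zip(FIELDS, row))


def homologies_between(record_sources, genome_a, genome_b):
    """Collect homologies linking genome_a and genome_b from several sources.

    record_sources is an iterable of iterables of record dicts, for example
    one iter_rows(...) generator per genome-specific file.
    """
    wanted = {genome_a, genome_b}
    seen_ids = set()
    result = []
    for records in record_sources:
        for rec in records:
            # Row order of the two genes is arbitrary, so compare as a set.
            if {rec["species"], rec["homology_species"]} != wanted:
                continue
            # Defensive: keep each homology_id once across all sources.
            if rec["homology_id"] in seen_ids:
                continue
            seen_ids.add(rec["homology_id"])
            result.append(rec)
    return result


# Example call, assuming both files have been downloaded locally:
# pairs = homologies_between(
#     [
#         iter_rows("drosophila_melanogaster/Compara.116.protein_default.homologies.tsv.gz"),
#         iter_rows("saccharomyces_cerevisiae/Compara.116.protein_default.homologies.tsv.gz"),
#     ],
#     "drosophila_melanogaster",
#     "saccharomyces_cerevisiae",
# )
```

Passing record iterables rather than file paths keeps the filtering logic independent of how the files are read, so the same function also works on rows taken from the complete top-level file.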