---
name: bio-phylo-distance-calculations
description: Compute evolutionary distances and build phylogenetic trees using Biopython Bio.Phylo.TreeConstruction. Use when creating distance matrices from alignments, building NJ/UPGMA trees, or generating bootstrap consensus trees.
tool_type: python
primary_tool: Bio.Phylo.TreeConstruction
---

# Distance Calculations and Tree Building

Compute distances from alignments and construct phylogenetic trees.

## Required Imports

```python
from Bio import Phylo, AlignIO
from Bio.Phylo.TreeConstruction import DistanceCalculator, DistanceTreeConstructor
from Bio.Phylo.TreeConstruction import DistanceMatrix
from Bio.Phylo.TreeConstruction import ParsimonyScorer, ParsimonyTreeConstructor, NNITreeSearcher
from Bio.Phylo.Consensus import strict_consensus, majority_consensus, bootstrap_trees, bootstrap_consensus
```

## Distance Matrix from Alignment

```python
from Bio import AlignIO
from Bio.Phylo.TreeConstruction import DistanceCalculator

alignment = AlignIO.read('alignment.fasta', 'fasta')

# Create calculator with distance model
calculator = DistanceCalculator('identity')  # Simple identity-based distance
dm = calculator.get_distance(alignment)
print(dm)

# Example DNA model
calculator = DistanceCalculator('blastn')  # BLASTN-style distance

# Example protein model
calculator = DistanceCalculator('blosum62')  # BLOSUM62-based distance
```

## Available Distance Models

| Model | Type | Description |
|-------|------|-------------|
| `identity` | DNA/Protein | 1 - (identical positions / total) |
| `blastn` | DNA | BLASTN scoring distance |
| `trans` | DNA | Transition/transversion weighted |
| `blosum62` | Protein | BLOSUM62 matrix distance |
| `blosum45` | Protein | BLOSUM45 matrix distance |
| `blosum80` | Protein | BLOSUM80 matrix distance |
| `pam250` | Protein | PAM250 matrix distance |
| `pam30` | Protein | PAM30 matrix distance |

## Building Trees with Distance Methods

### Neighbor Joining (NJ)

```python
from Bio import AlignIO, Phylo
from Bio.Phylo.TreeConstruction import DistanceCalculator, DistanceTreeConstructor

alignment = AlignIO.read('alignment.fasta', 'fasta')
calculator = DistanceCalculator('identity')
dm = calculator.get_distance(alignment)

constructor = DistanceTreeConstructor()
nj_tree = constructor.nj(dm)
Phylo.draw_ascii(nj_tree)
```

### UPGMA

```python
# Reuses `dm` from the Neighbor Joining example
constructor = DistanceTreeConstructor()
upgma_tree = constructor.upgma(dm)
Phylo.draw_ascii(upgma_tree)
```

### One-Step Tree Building

```python
# Reuses `calculator` and `alignment` from the examples above
# Build tree directly from alignment
constructor = DistanceTreeConstructor(calculator, 'nj')
tree = constructor.build_tree(alignment)

# Or with UPGMA
constructor = DistanceTreeConstructor(calculator, 'upgma')
tree = constructor.build_tree(alignment)
```

## Pairwise Distances Between Taxa

```python
from Bio import Phylo

tree = Phylo.read('tree.nwk', 'newick')

# Distance between two taxa (sum of branch lengths)
taxon1 = tree.find_any(name='Human')
taxon2 = tree.find_any(name='Mouse')
dist = tree.distance(taxon1, taxon2)
print(f'Distance Human-Mouse: {dist:.4f}')

# All pairwise distances
terminals = tree.get_terminals()
for i, t1 in enumerate(terminals):
    for t2 in terminals[i+1:]:
        d = tree.distance(t1, t2)
        print(f'{t1.name}-{t2.name}: {d:.4f}')
```

## Creating Distance Matrix Manually

```python
from Bio.Phylo.TreeConstruction import DistanceMatrix, DistanceTreeConstructor

names = ['A', 'B', 'C', 'D']

# Lower triangular matrix (including diagonal)
matrix = [
    [0],
    [0.1, 0],
    [0.2, 0.15, 0],
    [0.3, 0.25, 0.2, 0]
]
dm = DistanceMatrix(names, matrix)
print(dm)

# Build tree from custom matrix
constructor = DistanceTreeConstructor()
tree = constructor.nj(dm)
```

## Parsimony Tree Construction

```python
from Bio import AlignIO, Phylo
from Bio.Phylo.TreeConstruction import DistanceCalculator, DistanceTreeConstructor
from Bio.Phylo.TreeConstruction import ParsimonyScorer, NNITreeSearcher, ParsimonyTreeConstructor

alignment = AlignIO.read('alignment.fasta', 'fasta')

# Create scorer and searcher
scorer = ParsimonyScorer()
searcher = NNITreeSearcher(scorer)

# Build parsimony tree (needs starting tree)
constructor = DistanceTreeConstructor(DistanceCalculator('identity'), 'nj')
starting_tree = constructor.build_tree(alignment)

pars_constructor = ParsimonyTreeConstructor(searcher, starting_tree)
pars_tree = pars_constructor.build_tree(alignment)

print(f'Parsimony score: {scorer.get_score(pars_tree, alignment)}')
Phylo.draw_ascii(pars_tree)
```

## Bootstrap Analysis

```python
from Bio import AlignIO, Phylo
from Bio.Phylo.TreeConstruction import DistanceCalculator, DistanceTreeConstructor
from Bio.Phylo.Consensus import bootstrap_trees, bootstrap_consensus, majority_consensus

alignment = AlignIO.read('alignment.fasta', 'fasta')
calculator = DistanceCalculator('identity')
constructor = DistanceTreeConstructor(calculator, 'nj')

# Generate bootstrap trees
boot_trees = list(bootstrap_trees(alignment, 100, constructor))
print(f'Generated {len(boot_trees)} bootstrap trees')

# Get bootstrap consensus
# Note: bootstrap_consensus draws its own 100 replicates;
# to reuse boot_trees instead, call majority_consensus(boot_trees)
consensus = bootstrap_consensus(alignment, 100, constructor, majority_consensus)
Phylo.draw_ascii(consensus)
```

## Consensus Tree Methods

```python
from Bio import Phylo
from Bio.Phylo.Consensus import strict_consensus, majority_consensus, adam_consensus

trees = list(Phylo.parse('bootstrap.nwk', 'newick'))

# Strict consensus (only clades in ALL trees)
strict = strict_consensus(trees)

# Majority rule consensus (clades in >50% of trees)
majority = majority_consensus(trees, cutoff=0.5)

# Adam consensus
adam = adam_consensus(trees)

Phylo.draw_ascii(majority)
```

## Tree Depths and Total Length

```python
from Bio import Phylo

tree = Phylo.read('tree.nwk', 'newick')

# Total branch length
total = tree.total_branch_length()
print(f'Total branch length: {total:.4f}')

# Depths from root to each node
depths = tree.depths()
for clade, depth in depths.items():
    if clade.is_terminal():
        print(f'{clade.name}: {depth:.4f}')

# Maximum depth (tree height)
tree_height = max(depths.values())
print(f'Tree height: {tree_height:.4f}')
```

## Comparing Tree Distances

```python
from Bio import Phylo

tree1 = Phylo.read('tree1.nwk', 'newick')
tree2 = Phylo.read('tree2.nwk', 'newick')

# Compare total branch lengths
len1 = tree1.total_branch_length()
len2 = tree2.total_branch_length()
print(f'Tree 1 total: {len1:.4f}')
print(f'Tree 2 total: {len2:.4f}')

# Compare specific pairwise distances
taxa = ['Human', 'Mouse']
t1 = [tree1.find_any(name=t) for t in taxa]
t2 = [tree2.find_any(name=t) for t in taxa]
d1 = tree1.distance(t1[0], t1[1])
d2 = tree2.distance(t2[0], t2[1])
print(f'Human-Mouse distance: Tree1={d1:.4f}, Tree2={d2:.4f}')
```

## Complete Pipeline: Alignment to Bootstrapped Tree

```python
from Bio import AlignIO, Phylo
from Bio.Phylo.TreeConstruction import DistanceCalculator, DistanceTreeConstructor
from Bio.Phylo.Consensus import bootstrap_consensus, majority_consensus

alignment = AlignIO.read('sequences.aln', 'clustal')
print(f'Alignment: {len(alignment)} sequences, {alignment.get_alignment_length()} positions')

calculator = DistanceCalculator('identity')
constructor = DistanceTreeConstructor(calculator, 'nj')

# Build simple tree
simple_tree = constructor.build_tree(alignment)
simple_tree.ladderize()

# Build bootstrap consensus (100 replicates)
consensus_tree = bootstrap_consensus(alignment, 100, constructor, majority_consensus)
consensus_tree.ladderize()

Phylo.write(simple_tree, 'nj_tree.nwk', 'newick')
Phylo.write(consensus_tree, 'bootstrap_consensus.nwk', 'newick')
```

## Quick Reference: Distance Models

### DNA Models

| Model | Description |
|-------|-------------|
| `identity` | Simple mismatch counting |
| `blastn` | BLASTN-style scoring |
| `trans` | Weights transitions vs transversions |

### Protein Models

| Model | Description |
|-------|-------------|
| `blosum62` | General proteins |
| `blosum45` | Divergent proteins |
| `blosum80` | Similar proteins |
| `pam250` | Distant homologs |
| `pam30` | Close homologs |

## Related Skills

- tree-io - Save constructed trees to files
- tree-visualization - Draw resulting trees
- tree-manipulation - Root and process built trees
- alignment-io - Read alignments for tree building
- msa-statistics - Alignment quality before tree building
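
## Worked Example: Identity Distance by Hand

The `identity` entry in the model tables is described as 1 - (identical positions / total). The sketch below works that formula through in plain Python on three made-up aligned sequences and arranges the results as a lower-triangular list with the diagonal included. It is an illustration only, not Biopython's implementation: the helper names `identity_distance` and `lower_triangle` are hypothetical, and gap characters are assumed to be compared like any other character, which may differ from what `DistanceCalculator` does.

```python
def identity_distance(seq1, seq2):
    """Fraction of aligned columns where the two sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError('Aligned sequences must have equal length')
    if not seq1:
        raise ValueError('Sequences must not be empty')
    matches = sum(a == b for a, b in zip(seq1, seq2))
    return 1 - matches / len(seq1)


def lower_triangle(seqs):
    """Pairwise distances as lower-triangular rows, diagonal included."""
    return [
        [identity_distance(seqs[i], seqs[j]) for j in range(i + 1)]
        for i in range(len(seqs))
    ]


# Toy aligned sequences, made up for illustration
names = ['A', 'B', 'C']
seqs = ['ACGTACGT', 'ACGTACGA', 'ACGAACGA']

matrix = lower_triangle(seqs)
for name, row in zip(names, matrix):
    print(name, row)
```

The resulting `matrix` has the same nested-list shape as the one passed to `DistanceMatrix(names, matrix)` in the manual distance matrix section, so a hand-computed matrix like this could be handed to `DistanceTreeConstructor` in the same way.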