---
name: coot-model-building
description: "Best Practices for Model-Building Tools and Refinement"
---

# Key Lessons from Chain A Refinement Session

## Workflow Checklist - Follow This for Model-Building and Refinement

Best practices for fixing any issue:

1. ☐ Center on the residue of interest: `coot.set_go_to_atom_chain_residue_atom_name(chain, resno, "CA")`, or on a position of interest: `coot.set_rotation_centre(x, y, z)`
2. ☐ Check current metrics (Rama/correlation/overlaps)
3. ☐ Make a checkpoint if trying something experimental
4. ☐ Apply the fix
5. ☐ Re-check metrics to confirm improvement
6. ☐ If worse, restore the checkpoint

## Refinement Best Practices

### 1. Make Checkpoints Before Model Changes

**CRITICAL: Always create a checkpoint before making significant model changes.**

Use `make_backup_checkpoint()` before any operation that might need to be reverted:

- Adding/deleting residues
- Adding ligands or waters
- Major refinement operations
- Any experimental model building

```python
# Create a named checkpoint before risky operation
checkpoint_idx = coot.make_backup_checkpoint(0, "before adding OXT")

# Try the operation
coot.add_OXT_to_residue(0, "A", 93, "")
result = coot.refine_residues_py(0, [["A", 93, ""]])

# Check if it worked - if not, restore
# (result_is_bad is a placeholder for your own validation check)
if result_is_bad:
    coot.restore_to_backup_checkpoint(0, checkpoint_idx)
```

**Why checkpoints are better than undo:**

- `apply_undo()` only steps back one operation at a time
- Checkpoints let you jump back to a specific point
- Named checkpoints are self-documenting
- Multiple checkpoints allow comparing different approaches

```python
# Compare two different approaches
# (check_correlation is a placeholder for your own density-correlation helper)
checkpoint_before = coot.make_backup_checkpoint(0, "original state")

# Try approach 1
coot.auto_fit_best_rotamer(0, "A", 42, "", "", 1, 1, 0.01)
results = coot.refine_residues_py(0, [["A", 41, ""], ["A", 42, ""], ["A", 43, ""]])
score_approach1 = check_correlation(0, "A", 42)
checkpoint_approach1 = coot.make_backup_checkpoint(0, "after approach 1")

# Restore and try approach 2
coot.restore_to_backup_checkpoint(0, checkpoint_before)
coot.pepflip(0, "A", 42, "", "")
results = coot.refine_residues_py(0, [["A", 41, ""], ["A", 42, ""], ["A", 43, ""]])
score_approach2 = check_correlation(0, "A", 42)

# Keep the better result
if score_approach1 > score_approach2:
    coot.restore_to_backup_checkpoint(0, checkpoint_approach1)
```

## MANDATORY: Complete Validation Workflow

**CRITICAL: You MUST check ALL validation metrics before AND after EVERY fix.**

Fixing only one problem (e.g., Ramachandran) while leaving others (rotamer, density fit) is a FAILED fix. A residue is only "fixed" when ALL metrics are acceptable.

### Before fixing ANY residue:

1. **ALWAYS** center on it: `coot.set_go_to_atom_chain_residue_atom_name(chain, resno, "CA")`
2. **ALWAYS** check ALL of these:
   - Ramachandran probability (from `all_molecule_ramachandran_score_py`)
   - Rotamer score (from `rotamer_graphs_py`)
   - Density correlation - both all-atom and side-chain (from `map_to_model_correlation_stats_per_residue_range_py`)
   - Atom overlaps involving this residue (from `molecule_atom_overlaps_py`)

### After fixing ANY residue:

3. **ALWAYS** re-check ALL the same metrics. Sometimes residues/issues are just not fixable (that's what makes refinement and model-building tricky).
4. **ONLY MOVE** on to the next residue/issue once you have tried to make all of these true:
   - Ramachandran probability > 0.02 (preferably > 0.1)
   - Rotamer score > 1.0% (preferably > 5%)
   - Density correlation > 0.7 (all-atom and side-chain, preferably > 0.8)
   - No severe clashes (< 2.0 Å³ overlap volume)

### If ANY metric is still bad after your first fix:

5. **MUST** try additional fixes:
   - Bad rotamer → `auto_fit_best_rotamer()`, then try following that up with `refine_residues_py()` for that residue and its upstream and downstream neighbours (if any).
   - Poor density fit → try alternative rotamers, check for missing atoms
   - Persistent clashes → refine with the addition of spatial neighbors using `residues_near_residue()`
6. **NEVER** declare a residue "fixed" based on only one metric improving
7. **ALWAYS** re-validate after each additional fix

### Acceptable Reasons to Stop (without perfect metrics):

- You've tried multiple approaches and documented them
- The best achievable metrics are still recorded
- You've created checkpoints to compare approaches
- You explain why the residue remains problematic (e.g., poor density, crystal contact)

## Example of CORRECT Workflow

```python
# 1. ALWAYS center on problem residue first
coot.set_go_to_atom_chain_residue_atom_name("A", 41, "CA")

# 2. Get ALL metrics BEFORE
# Index 5 of the Ramachandran result holds the per-residue list
rama_data = [r for r in coot.all_molecule_ramachandran_score_py(0)[5] if r[1] == ['A', 41, '']][0]
rama_prob_before = rama_data[2]

rotamer_data = [r for r in coot.rotamer_graphs_py(0) if r[0] == 'A' and r[1] == 41][0]
rotamer_score_before = rotamer_data[3]

corr_data = [s for s in coot.map_to_model_correlation_stats_per_residue_range_py(0, "A", 1, 1, 0)[0] if s[0][1] == 41][0]
correlation_before = corr_data[1][1]

overlaps_before = [o for o in coot.molecule_atom_overlaps_py(0, 30)
                   if (o['atom-1-spec'][1:3] == ['A', 41] or o['atom-2-spec'][1:3] == ['A', 41])]

print(f"BEFORE: Rama={rama_prob_before:.4f}, Rotamer={rotamer_score_before:.2f}%, Corr={correlation_before:.3f}, Clashes={len(overlaps_before)}")

# 3. Apply first fix (e.g., pepflip for backbone)
coot.pepflip(0, "A", 41, "", "")
coot.refine_residues_py(0, [["A", 40, ""], ["A", 41, ""], ["A", 42, ""], ["A", 43, ""]])

# 4. Check ALL metrics AFTER first fix
rama_prob_after = [r for r in coot.all_molecule_ramachandran_score_py(0)[5] if r[1] == ['A', 41, '']][0][2]
rotamer_score_after = [r for r in coot.rotamer_graphs_py(0) if r[0] == 'A' and r[1] == 41][0][3]
correlation_after = [s for s in coot.map_to_model_correlation_stats_per_residue_range_py(0, "A", 1, 1, 0)[0] if s[0][1] == 41][0][1][1]

print(f"AFTER: Rama={rama_prob_after:.4f}, Rotamer={rotamer_score_after:.2f}%, Corr={correlation_after:.3f}")

# 5. If rotamer or correlation still bad, DON'T STOP - fix them!
if rotamer_score_after < 1.0:
    print("Rotamer still bad - trying auto_fit_best_rotamer")
    coot.auto_fit_best_rotamer(0, "A", 41, "", "", 1, 1, 0.01)
    coot.refine_residues_py(0, [["A", 40, ""], ["A", 41, ""], ["A", 42, ""]])

# 6. ALWAYS re-check after additional fixes
rotamer_score_final = [r for r in coot.rotamer_graphs_py(0) if r[0] == 'A' and r[1] == 41][0][3]
correlation_final = [s for s in coot.map_to_model_correlation_stats_per_residue_range_py(0, "A", 1, 1, 0)[0] if s[0][1] == 41][0][1][1]

print(f"FINAL: Rotamer={rotamer_score_final:.2f}%, Corr={correlation_final:.3f}")

# 7. Only NOW can you move to the next residue
```

## Example of WRONG Workflow (DO NOT DO THIS)

```python
# ❌ WRONG: Checking only Ramachandran
coot.pepflip(0, "A", 41, "", "")
coot.refine_residues_py(0, [["A", 40, ""], ["A", 41, ""], ["A", 42, ""]])
rama_after = coot.all_molecule_ramachandran_score_py(0)[5][39][2]
print(f"Ramachandran improved to {rama_after}")
# MOVES ON without checking rotamer or density fit - WRONG!

# ❌ WRONG: Not centering on residue
# Goes straight to fix without set_go_to_atom_chain_residue_atom_name()

# ❌ WRONG: Not checking metrics before the fix
# How do you know if it improved if you don't know what it was before?

# ❌ WRONG: Declaring success with bad rotamer
rama = 0.30         # Good!
rotamer = 0.0001    # TERRIBLE!
correlation = 0.59  # POOR!
print("Residue fixed!")  # NO IT ISN'T!
```

## Why This Matters

A residue with:

- ✅ Good Ramachandran (0.30)
- ❌ Terrible rotamer (0.01%)
- ❌ Poor density fit (0.59)

is NOT fixed. The side chain is clearly wrong. The backbone geometry might be OK, but the model is still incorrect.

**ALL metrics must be acceptable before moving on.**

### 2. Extend Selection Around Problem Residues

**Don't refine problem residues in isolation** - include neighboring residues for context.

- ❌ **Bad:** `refine_residues_py(0, [["A", 41, ""]])` - Often fails to correct the model
- ✅ **Good:** `refine_residues_py(0, [["A", 40, ""], ["A", 41, ""], ["A", 42, ""], ["A", 43, ""]])`

**Recommended approach:**

- For a single problem residue: include ±1 or ±2 neighbors
- For consecutive problem residues: include ±1 neighbor on each end
- Larger regions (±3-4 residues) can sometimes help with severe issues

**Example from session:**

- Residues 41-42 had severe Ramachandran outliers
- Refining just 41-42 failed
- Refining 40-43 succeeded: Residue 41 improved from p=0.00004 to p=0.308

**Neighboring Residues:**

- You can use `coot.residues_near_residue()` to find residues that are close in space but distant in sequence, so that they can be added to the residue selection for refinement.

### 3. Include Spatial Neighbours, Not Just Sequence Neighbours

**Critical insight:** Residues that are close in 3D space affect each other during refinement, even if they're far apart in sequence.

Coot's refinement includes spatially neighbouring atoms in the non-bonded contact interactions, but only the selected residues can move during minimization. If a nearby (but unselected) residue is in the wrong position, it will "push" your selected residues away via non-bonded contact penalties - potentially pushing them out of correct density to avoid the clash with the incorrectly-placed neighbour.

**Diagnostic workflow:**

1.
   **Check for clashes** after refinement:

   ```python
   overlaps = coot.molecule_atom_overlaps_py(0, 50)
   for o in overlaps:
       spec1, spec2 = o['atom-1-spec'], o['atom-2-spec']
       vol = o['overlap-volume']
       if vol > 0.5:  # Significant clash
           print(f"{spec1[1]}/{spec1[2]} {spec1[4]} - {spec2[1]}/{spec2[2]} {spec2[4]}: {vol:.2f}")
   ```

2. **If a problem residue clashes with a distant residue**, fix the distant residue first:

   ```python
   # Example: A/2 has poor correlation (0.13) and clashes with A/89
   # First fix A/89:
   coot.auto_fit_best_rotamer(0, "A", 89, "", "", 1, 1, 0.01)
   results = coot.refine_residues_py(0, [["A", 88, ""], ["A", 89, ""], ["A", 90, ""]])

   # Then re-refine A/2 INCLUDING A/89 as a spatial neighbour:
   results = coot.refine_residues_py(0, [["A", 1, ""], ["A", 2, ""], ["A", 3, ""], ["A", 89, ""]])
   # A/2 correlation improved: 0.13 → 0.81
   ```

**Why this matters:**

- Coot's refinement "feels" spatial neighbours via non-bonded contact terms
- But only selected residues can move during minimization
- A/89 was in a wrong position (correlation 0.050) and pushing A/2 away
- A/2 moved out of its correct density to reduce the non-bonded penalty with A/89
- Fixing A/89 first put it in the right place, so it no longer pushed A/2 incorrectly

**Real example:**

```
Before:
  A/2 correlation = 0.131, A/89 correlation = 0.050
  A/2 CA ↔ A/89 CZ clash: 1.06 Å³

After fixing A/89 first, then refining together:
  A/2 correlation = 0.805, A/89 correlation = 0.928
  No clash
```

### 4. Iterative Refinement Strategy

Sometimes multiple rounds of refinement with different selections help:

1. **First pass:** Refine a larger region to establish general geometry
2. **Second pass:** Refine a smaller region to fine-tune the specific problem
3. **Check validation** after each step
4. **Restore checkpoint** if results get worse

**Example workflow:**

```python
# Create checkpoint first!
checkpoint = coot.make_backup_checkpoint(0, "before iterative refinement")

# check_validation() is a placeholder for your own metric checks

# First: larger region
results_1 = coot.refine_residues_py(0, [["A", i, ""] for i in range(40, 44)])
check_validation()  # Did it help?

# Second: targeted refinement
results_2 = coot.refine_residues_py(0, [["A", 41, ""], ["A", 42, ""], ["A", 43, ""]])
check_validation()  # Better or worse?

# If worse:
coot.restore_to_backup_checkpoint(0, checkpoint)
```

### 5. Measure Before and After

**Always validate changes objectively** using:

- Ramachandran probabilities
- Density correlation (all-atom and side-chain)
- Geometry statistics

```python
def check_residue_validation(imol, chain_id, resno):
    """Check both Ramachandran and density correlation"""
    # Get Ramachandran (index 5 holds the per-residue list)
    rama_data = coot.all_molecule_ramachandran_score_py(imol)
    residue_data = rama_data[5]

    rama_score = None
    for r in residue_data:
        if r[1][0] == chain_id and r[1][1] == resno:
            rama_score = r[2]
            break

    # Get density correlation
    corr_data = coot.map_to_model_correlation_stats_per_residue_range_py(
        imol, chain_id, 1, 1, 1
    )

    all_atom_corr = None
    sidechain_corr = None

    for r in corr_data[0]:
        if r[0][1] == resno:
            all_atom_corr = r[1][1]
            break

    for r in corr_data[1]:
        if r[0][1] == resno:
            sidechain_corr = r[1][1]
            break

    return {
        'residue': resno,
        'rama_prob': rama_score,
        'all_atom_corr': all_atom_corr,
        'sidechain_corr': sidechain_corr
    }

# Usage with checkpoint
checkpoint = coot.make_backup_checkpoint(0, "before refinement test")
before = check_residue_validation(0, "A", 41)

coot.refine_residues_py(0, [["A", 40, ""], ["A", 41, ""], ["A", 42, ""], ["A", 43, ""]])
coot.accept_moving_atoms_py()

after = check_residue_validation(0, "A", 41)

# Compare and decide
if after['all_atom_corr'] > before['all_atom_corr']:
    # Keep it!
    pass
else:
    # Revert to checkpoint
    coot.restore_to_backup_checkpoint(0, checkpoint)
```

### 6. Use Checkpoints and Undo Liberally

**Don't be afraid to revert changes:**

- `make_backup_checkpoint()` / `restore_to_backup_checkpoint()` - for jumping back to a specific state
- `apply_undo()` - for stepping back one operation at a time

Use checkpoints when:

- Starting a new model-building task
- About to try something experimental
- Before a series of related operations

Use undo when:

- The last single operation made things worse
- A quick single-step revert is needed

### 7. Auto-fit Rotamer for Side-chain Issues

**For poor side-chain density correlation**, try `auto_fit_best_rotamer()` first:

```python
# Create checkpoint first
checkpoint = coot.make_backup_checkpoint(0, "before rotamer fitting")

# Check if it's a side-chain problem
validation = check_residue_validation(0, "A", 89)
if validation['sidechain_corr'] < 0.5:
    # Try auto-fit rotamer
    score = coot.auto_fit_best_rotamer(0, "A", 89, "", "", 1, 1, 0.01)

    if score > 0:  # Positive score is good
        # Check improvement
        after = check_residue_validation(0, "A", 89)
        if after['sidechain_corr'] > validation['sidechain_corr']:
            # Success! (e.g., 0.034 → 0.900)
            pass
        else:
            coot.restore_to_backup_checkpoint(0, checkpoint)
    else:
        # Negative score means failure
        coot.restore_to_backup_checkpoint(0, checkpoint)
```

### 8. Set Refinement to Synchronous Mode

**Always call this at the start** to make refinement complete immediately:

```python
coot.set_refinement_immediate_replacement(1)
```

Without this, refinement may be asynchronous and difficult to control programmatically.

### 9. Navigate to Residue Before Working

**Bring the residue to screen center** so you can watch the refinement:

```python
coot.set_go_to_atom_molecule(0)
coot.set_go_to_atom_chain_residue_atom_name("A", 41, "CA")
```

This helps with:

- Visual inspection of the problem
- Seeing the refinement in real-time
- Verifying the result makes geometric sense

### 10. Flipping Peptides

If the Ramachandran probability is poor, try `coot.pepflip(imol, chain_id, res_no, ins_code, alt_conf)` followed by a refinement of the residues in the extended region.

### 11. Flipping a Side-chain's Terminal Chi Angle

If the rotamer score is poor, try `coot.do_180_degree_side_chain_flip()`. It occasionally improves the score.

## Key Takeaway

**Context matters in refinement.** Including neighboring residues provides the geometric and density context needed for refinement algorithms to find better solutions, especially for severe outliers.

**Always checkpoint before changes.** Use `make_backup_checkpoint()` before any significant model modification so you can easily revert if needed.
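
The "context" and "all metrics" rules in this document can be sketched as two small pure-Python helpers. This is a minimal sketch, not part of the Coot API: `neighbour_selection` and `failing_metrics` are hypothetical names, the residue-spec layout `[chain_id, res_no, ins_code]` is copied from the `refine_residues_py` calls in this document, and the thresholds are the minimum values from the validation workflow. A metric that is `None` or absent is assumed to be not applicable (for example, no side chain) and is skipped.

```python
def neighbour_selection(chain_id, res_no, n_neighbours=1, first_res_no=None):
    """Build a residue-spec list covering res_no plus/minus n_neighbours.

    Hypothetical helper: it does not check that the residues exist.
    If first_res_no is given, lower residue numbers are left out.
    """
    start = res_no - n_neighbours
    if first_res_no is not None:
        start = max(first_res_no, start)
    return [[chain_id, i, ""] for i in range(start, res_no + n_neighbours + 1)]


# Minimum acceptable values, taken from the validation workflow in this document.
THRESHOLDS = {
    "rama_prob": 0.02,      # Ramachandran probability
    "rotamer_score": 1.0,   # rotamer score, in percent
    "all_atom_corr": 0.7,   # all-atom density correlation
    "sidechain_corr": 0.7,  # side-chain density correlation
}
MAX_OVERLAP_VOLUME = 2.0    # cubic Angstroms; at or above this is a severe clash


def failing_metrics(metrics):
    """Return the names of the metrics that do not meet their threshold.

    Assumption: values that are None or absent are skipped, not failed.
    """
    failed = []
    for name, minimum in THRESHOLDS.items():
        value = metrics.get(name)
        if value is not None and value <= minimum:
            failed.append(name)
    worst = metrics.get("worst_overlap")
    if worst is not None and worst >= MAX_OVERLAP_VOLUME:
        failed.append("worst_overlap")
    return failed


# Selection for residue 41 with one neighbour on each side
selection = neighbour_selection("A", 41, 1)
# → [["A", 40, ""], ["A", 41, ""], ["A", 42, ""]]

# The "not fixed" residue from this document: good Rama, bad rotamer and fit
still_bad = failing_metrics(
    {"rama_prob": 0.30, "rotamer_score": 0.01, "all_atom_corr": 0.59}
)
# → ["rotamer_score", "all_atom_corr"]
```

The list returned by `neighbour_selection` has the same shape as the second argument of `coot.refine_residues_py`, and an empty list from `failing_metrics` is the condition for moving on to the next residue.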