---
name: gflownet
description: "Bengio's GFlowNets: Generative Flow Networks that sample proportionally to reward. Diversity over maximization for causal discovery and molecule design."
version: 1.0.0
---

# GFlowNet Skill

> *"Sample x with probability proportional to R(x), not just maximize R(x)."*
> — Yoshua Bengio

## Overview

**GFlowNets** (Generative Flow Networks) are a new paradigm:

- **RL**: Maximize expected reward → single optimal solution
- **MCMC**: Sample from a distribution → slow mixing
- **GFlowNet**: Learn to sample P(x) ∝ R(x) → fast, diverse sampling

## Core Concept

```latex
GFlowNet Objective:

  ∀ terminal state x:  P_θ(x) = R(x) / Z

Where:
  P_θ(x) = probability of generating x via the forward policy
  R(x)   = unnormalized reward function
  Z      = partition function (normalizing constant)

Key Insight: We DON'T need to know Z to train!
```

## Architecture

```
┌─────────────────────────────────────────────────────┐
│                      GFlowNet                       │
├─────────────────────────────────────────────────────┤
│  Initial State s₀                                   │
│       │                                             │
│       ▼                                             │
│  ┌─────────────┐                                    │
│  │   Forward   │  P_F(s' | s) = learned policy      │
│  │   Policy    │                                    │
│  └──────┬──────┘                                    │
│         │ sample action                             │
│         ▼                                           │
│  ┌─────────────┐                                    │
│  │ Transition  │  s → s'                            │
│  └──────┬──────┘                                    │
│         │                                           │
│         ▼                                           │
│  ┌─────────────┐                                    │
│  │  Terminal?  │───No──▶ continue                   │
│  └──────┬──────┘                                    │
│         │ Yes                                       │
│         ▼                                           │
│  ┌─────────────┐                                    │
│  │    R(x)     │  Evaluate reward                   │
│  └─────────────┘                                    │
└─────────────────────────────────────────────────────┘
```
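The generation loop in the diagram above can be sketched in a few lines. This is a minimal illustration, not a real API: `forward_policy`, `transition`, `is_terminal`, and `reward` are hypothetical callables supplied by the environment.

```python
def sample_trajectory(initial_state, forward_policy, transition, is_terminal, reward):
    """Roll out the forward policy until a terminal state, then evaluate R(x)."""
    state = initial_state
    trajectory = [state]
    while not is_terminal(state):
        action = forward_policy(state)     # sample an action from P_F(· | s)
        state = transition(state, action)  # apply the transition s → s'
        trajectory.append(state)
    return trajectory, reward(state)       # terminal state gets reward R(x)
```

For example, a toy chain environment where each step increments an integer until reaching 3 yields the trajectory `[0, 1, 2, 3]`.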
## Training Objectives

### 1. Trajectory Balance (TB)

```python
import torch
from torch import Tensor
from typing import List

# `State` is the environment-specific state type; these are methods on a
# GFlowNet class holding `log_Z`, `forward_policy`, and `backward_policy`.

def trajectory_balance_loss(self, trajectory: List[State], reward: float) -> Tensor:
    """
    TB: Z × Π P_F(s_t → s_{t+1}) = R(x) × Π P_B(s_{t+1} → s_t)

    In log space: log Z + Σ log P_F = log R + Σ log P_B
    """
    log_Z = self.log_Z  # Learnable scalar parameter
    log_P_F = sum(self.forward_policy.log_prob(s, s_next)
                  for s, s_next in zip(trajectory[:-1], trajectory[1:]))
    log_P_B = sum(self.backward_policy.log_prob(s_next, s)
                  for s, s_next in zip(trajectory[:-1], trajectory[1:]))
    loss = (log_Z + log_P_F - torch.log(torch.as_tensor(reward)) - log_P_B) ** 2
    return loss
```

### 2. Detailed Balance (DB)

```python
def detailed_balance_loss(self, s: State, s_next: State, reward_s: float) -> Tensor:
    """
    DB: F(s) × P_F(s → s') = F(s') × P_B(s' → s)

    Where F(s) is a learned flow function. (At terminal states,
    log F(s') is replaced by log R(x).)
    """
    log_F_s = self.flow_network(s)
    log_F_s_next = self.flow_network(s_next)
    log_P_F = self.forward_policy.log_prob(s, s_next)
    log_P_B = self.backward_policy.log_prob(s_next, s)
    loss = (log_F_s + log_P_F - log_F_s_next - log_P_B) ** 2
    return loss
```

## Applications

### 1. Molecule Design

```python
# GFlowNet for drug discovery
class MoleculeGFlowNet:
    def __init__(self):
        self.action_space = ['add_atom', 'add_bond', 'terminate']

    def sample_molecule(self) -> SMILES:
        state = EmptyMolecule()
        while not state.is_terminal():
            action = self.forward_policy.sample(state)
            state = state.apply(action)
        return state.to_smiles()

    def reward(self, molecule: SMILES) -> float:
        # Combines drug-likeness, binding affinity, and synthesizability
        return docking_score(molecule) * qed(molecule)
```

### 2. Causal Discovery

```python
# GFlowNet for DAG sampling
class CausalDAGGFlowNet:
    def __init__(self, n_variables: int):
        self.n = n_variables

    def sample_dag(self) -> DAG:
        """Sample DAG with P(G) ∝ P(data | G)."""
        dag = EmptyDAG(self.n)
        while not dag.is_complete():
            edge = self.forward_policy.sample(dag)
            if not dag.would_create_cycle(edge):
                dag.add_edge(edge)
        return dag
```
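The acyclicity check used during DAG sampling can be implemented as a simple reachability test. A minimal sketch, assuming a plain adjacency-dict representation (the `adj` mapping is an illustrative choice, not part of any snippet's API):

```python
def would_create_cycle(adj, u, v):
    """Adding edge u → v creates a cycle iff v already reaches u.

    `adj` maps each node to the set of its children (hypothetical
    representation; the DAG type above is left abstract).
    """
    stack, seen = [v], set()
    while stack:
        node = stack.pop()
        if node == u:
            return True      # found a path v → … → u, so u → v closes a cycle
        if node in seen:
            continue
        seen.add(node)
        stack.extend(adj.get(node, ()))
    return False
```

With `adj = {1: {2}, 2: {3}}`, adding `3 → 1` is rejected (1 reaches 3 via 2), while adding `1 → 3` is safe.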
### 3. Combinatorial Optimization

```python
# GFlowNet for set generation
class SetGFlowNet:
    def sample_set(self, universe: set) -> set:
        """Sample a set S with P(S) ∝ R(S)."""
        current_set = set()
        for element in self.ordering(universe):
            include = self.forward_policy.sample(current_set, element)
            if include:
                current_set.add(element)
        return current_set
```

## GF(3) Triads

```
# Causal-Categorical Triad
sheaf-cohomology (-1) ⊗ cognitive-superposition (0) ⊗ gflownet (+1) = 0 ✓

# Diversity Triad
persistent-homology (-1) ⊗ glass-bead-game (0) ⊗ gflownet (+1) = 0 ✓

# Sampling Triad
three-match (-1) ⊗ epistemic-arbitrage (0) ⊗ gflownet (+1) = 0 ✓
```

## Integration with Interaction Entropy

```ruby
module GFlowNet
  def self.sample_proportional(candidates, reward_fn, seed)
    gen = SplitMixTernary::Generator.new(seed)

    # Build forward trajectory
    trajectory = []
    state = initial_state

    until terminal?(state)
      # Use color to guide sampling
      color = gen.next_color
      action = select_action(state, color)
      next_state = transition(state, action)

      trajectory << { state: state, action: action, color: color }
      state = next_state
    end

    reward = reward_fn.call(state)

    {
      terminal_state: state,
      reward: reward,
      trajectory: trajectory,
      trit: 1  # Generator (creates diverse samples)
    }
  end
end
```

## Key Properties

1. **Amortized**: Learn once, sample many times (unlike per-problem MCMC)
2. **Off-policy**: Can train on trajectories from any behavior policy
3. **Diverse**: Samples cover modes in proportion to reward
4. **Compositional**: Build complex objects step by step

## References

1. Bengio, E. et al. (2021). "Flow Network Based Generative Models for Non-Iterative Diverse Candidate Generation."
2. Malkin, N. et al. (2022). "Trajectory Balance: Improved Credit Assignment in GFlowNets."
3. Deleu, T. et al. (2022). "Bayesian Structure Learning with Generative Flow Networks."
4. [torchgfn library](https://github.com/GFNOrg/torchgfn)

## Scientific Skill Interleaving

This skill connects to the K-Dense-AI/claude-scientific-skills ecosystem:

### Graph Theory

- **networkx** [○] via bicomodule - Universal graph hub

### Bibliography References

- `dynamical-systems`: 41 citations in bib.duckdb

## SDF Interleaving

This skill connects to **Software Design for Flexibility** (Hanson & Sussman, 2021):

### Primary Chapter: 1. Flexibility through Abstraction

**Concepts**: combinators, compose, parallel-combine, spread-combine, arity

### GF(3) Balanced Triad

```
gflownet (−) + SDF.Ch1 (+) + [balancer] (○) = 0
```

**Skill Trit**: -1 (MINUS - verification)

### Secondary Chapters

- Ch5: Evaluation
- Ch4: Pattern Matching
- Ch10: Adventure Game Example

### Connection Pattern

Combinators compose operations. This skill provides composable abstractions.

## Cat# Integration

This skill maps to **Cat# = Comod(P)** as a bicomodule in the equipment structure:

```
Trit: 0 (ERGODIC)
Home: Prof
Poly Op: ⊗
Kan Role: Adj
Color: #26D826
```

### GF(3) Naturality

The skill participates in triads satisfying:

```
(-1) + (0) + (+1) ≡ 0 (mod 3)
```

This ensures compositional coherence in the Cat# equipment structure.
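The GF(3) balance condition used throughout this document can be checked mechanically. A toy sketch (the `balanced` helper is hypothetical, introduced only for illustration):

```python
def balanced(*trits: int) -> bool:
    """A triad is balanced when its trits sum to 0 modulo 3."""
    return sum(trits) % 3 == 0

# Every triad listed above pairs a -1, a 0, and a +1 skill:
assert balanced(-1, 0, 1)     # e.g. sheaf-cohomology, cognitive-superposition, gflownet
assert not balanced(1, 1, 0)  # two (+1) skills and one (0) skill do not balance
```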