glm-4-6’s worst writing failures look less like “bad style” and more like the model losing control of the boundary between story text and everything that can appear in its training continuations. Under high lyrical pressure—dense imagery, long sentences, and especially endings where it tries to deliver mood, meaning, and closure at once—it starts emitting non-diegetic artifacts as if they were valid next tokens. That’s how you get lines that read like pipeline corruption rather than prose: “crisis_line_748”, stray counters like “639]”, or outright drafting intrusions such as “[650? No, recount needed.]”. These are not random typos; they’re consistent with a weak “output sanitization” prior where local token plausibility outruns the global constraint “this must be clean narrative English.” Once the stream is contaminated, nearby coherence often degrades too, because the same loss of document-level control that lets markup through also weakens the model’s ability to keep sentence structure and referents stable.

The second major mechanism is state-tracking collapse: the model doesn’t reliably preserve a persistent ledger of who is who, what is where, and when things happen as paragraphs accumulate. In climaxes with many props and callbacks, it optimizes each sentence for fresh resonance and forward momentum, but it doesn’t consistently reconcile those sentences with earlier invariants. The result is name-slot swapping (“Elian… Elias… Elliot”), object teleportation, and timeline math that contradicts itself (“It had been centuries…” paired with “seasons later”). You see the same pattern in smaller, physical continuities: an envelope torn open becomes “the sealed envelope,” or a prop is pocketed and later “sat on the desk” without any transfer verb.
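The artifact-leakage half of this failure mode is mechanical enough to catch after the fact. As a minimal sketch (assuming the three quoted shapes of leakage — snake_case identifiers, stray counters, bracketed drafting notes — are representative; the patterns below are illustrative, not a survey of glm-4-6’s actual output distribution):

```python
import re

# Heuristic patterns for non-diegetic artifacts, modeled on the
# examples quoted above; a real prose filter would need a richer set.
ARTIFACT_PATTERNS = [
    re.compile(r"\b\w+_\w+\b"),          # snake_case ids like "crisis_line_748"
    re.compile(r"\b\d+\]"),              # stray counters like "639]"
    re.compile(r"\[[^\]]*\?[^\]]*\]"),   # drafting notes like "[650? No, recount needed.]"
]

def find_artifacts(text: str) -> list[str]:
    """Return substrings that look like pipeline leakage rather than prose."""
    hits = []
    for pat in ARTIFACT_PATTERNS:
        hits.extend(m.group(0) for m in pat.finditer(text))
    return hits

sample = "The phone rang. crisis_line_748 She answered. [650? No, recount needed.]"
print(find_artifacts(sample))
# prints ['crisis_line_748', '[650? No, recount needed.]']
```

The point of the sketch is that these intrusions are lexically distinctive: they violate surface conventions clean narrative English never does, which is exactly why their appearance signals a mode switch rather than a mere typo.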
Mechanistically, this looks like a planning-horizon limit combined with aggressive paraphrase behavior: it can restate motifs vividly, but it doesn’t keep bindings stable across scene beats unless the prompt or structure forces explicit re-anchoring. When glm-4-6 tries to be cinematic inside that same compressed, poetic mode, its reality-model checks weaken, and metaphor starts getting treated as literal staging without the necessary “magic license.” That’s why errors range from glaring factual slips (“absence of all thirty-two pawns.”) to basic physics violations where the narrative demands stakes but the world can’t support them (“its cable snapped like a taut thread…” followed by a safe landing). The pattern repeats in technical or craft scenes, where it reaches for plausibility via gadgets or jargon but doesn’t simulate the intermediate constraints: “scraped a fragment onto his portable analyzer… displaying fragmented sentences” turns adhesive into exposition, and “subsonic frequencies… studying background radiation” mashes domains into something that sounds technical while being physically incoherent. This is the same underlying bias as the lyrical imagery issue: it chooses high-contrast, high-status tokens that “feel like” the right kind of solution, without a strong downstream verifier checking that the implied mechanics can actually work.

Plot and explanation then fail in a predictable direction: causality shortcuts. Whenever the story requires friction—guards, locks, investigations, moral dilemmas that must be decided—it often jumps straight to the payoff line and leaves the connective tissue unmodeled. The guards become “momentarily mesmerized,” outcomes arrive because “the method came to him,” and climaxes resolve via undefined transformations like “the seashell’s whispers turned into a confession that unraveled everything.” This isn’t only a plotting weakness; it’s a continuation strategy.
If the model can’t (or doesn’t) maintain a multi-step causal chain, it substitutes an “outcome statement” propped up by mystical or pseudo-technical bridging, which feels satisfying at the sentence level but collapses tension because nothing was earned onstage.

These failures cluster because they share the same trigger: “poetic compression” under endpoint pressure. In those moments, glm-4-6 tries to maximize density—image, theme, twist, and closure in one breath—and that overload drives multiple control systems to fail together: cleanliness constraints (artifact leakage), global invariants (names/props/timeline), and world-model plausibility (physics/biology/geometry) all get sacrificed to keep the prose moving. The reader experiences it as a single kind of unreliability: not just that something is wrong, but that the narration can’t be trusted to keep its own rules from one sentence to the next. That is why the errors are so immersion-breaking: a glitch like “a鈥檘” or a sudden “I saw them daily in my shop” isn’t merely awkward—it’s evidence that the model has switched modes midstream, prioritizing the next locally attractive phrase over the story’s maintained state, camera position, and physical reality.
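Of these failure families, the name-slot drift (“Elian… Elias… Elliot”) is the most checkable from outside the model. A minimal sketch of a drift detector, using string similarity as a crude proxy for “same character slot”; the 0.7 threshold and the capitalized-word heuristic are illustrative assumptions, not calibrated values:

```python
import re
from difflib import SequenceMatcher

def find_name_drift(text: str, threshold: float = 0.7) -> list[tuple[str, str]]:
    """Flag pairs of capitalized tokens that are suspiciously similar,
    e.g. 'Elian' vs 'Elias' -- likely one character slot drifting."""
    # Crude: any capitalized word of 3+ letters counts as a candidate name,
    # so sentence-initial words can slip in; a real checker would use NER.
    names = sorted(set(re.findall(r"\b[A-Z][a-z]{2,}\b", text)))
    suspects = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if SequenceMatcher(None, a, b).ratio() >= threshold:
                suspects.append((a, b))
    return suspects

text = "Elian lit the lamp. Later, Elias pocketed the key. Elliot sat by the desk."
print(find_name_drift(text))
# prints [('Elian', 'Elias')]
```

That such a small external ledger can catch what the model misses is the point: the failure isn’t that consistency is hard to check, it’s that nothing in the generation loop is checking it.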