Posts

Paper deep dives, experiments, and everything in between.

Tutorials·Apr 6, 2026·12 min read
Anatomy of a Claude Code Setup
A visual guide to configuring Claude Code: what goes in your global config, what goes in the project, and how to keep personal settings out of git.
@kraxkrokat
Tutorials·Apr 1, 2026·15 min read
Is That Improvement Real? A visual guide to eval statistics
You changed the prompt and the score went up. Should you ship it? An interactive companion to Anthropic's eval guide that walks through variance, standard error, and paired comparisons on a concrete example, assuming less stats background than the original.
@kraxkrokat
Paper Deep Dives·Mar 24, 2026·11 min read
Tree of Thoughts: What happens when you let a model explore before committing
How treating reasoning as search — generating, evaluating, and pruning candidate thoughts — unlocked planning capabilities that CoT and ReAct couldn't reach.
@kraxkrokat
Paper Deep Dives·Mar 20, 2026·10 min read
Self-Refine: What happens when you let an LLM critique its own work
How a single model can generate, critique, and revise its own outputs — improving 5–40% across seven tasks without any training.
@kraxkrokat
Experiments·Mar 17, 2026·12 min read
Replicating Reflexion: What happens when you actually run the code
A reimplementation of the Reflexion framework with modern models reveals that stronger models don't improve reflection — they make it worse.
@kraxkrokat
Paper Deep Dives·Mar 14, 2026·9 min read
Reflexion: What happens when an agent can learn from its own mistakes
The Reflexion paper added memory across attempts — letting agents reflect on failures in natural language and improve without retraining.
@kraxkrokat
Paper Deep Dives·Mar 10, 2026·10 min read
ReAct: How giving LLMs the ability to think and act changed everything
How the ReAct paper established the think-act-observe loop that powers every modern agent system.
@kraxkrokat
Paper Deep Dives·Mar 5, 2026·11 min read
Chain-of-Thought: The prompting trick that unlocked reasoning in language models
How a simple prompting technique elicited multi-step reasoning in LLMs — and why it only works above a critical scale threshold.
@kraxkrokat
8 posts