Posts

Paper deep dives, experiments, and everything in between.

Tutorials·Apr 6, 2026·9 min read

Claude Code Has a Configuration System. You Should Use It.

I used Claude Code for a year before I started customizing it. Skills, hooks, MCP servers, project instructions. Here's the mental model and my actual config.

@kraxkrokat

Tutorials·Apr 1, 2026·15 min read

Is That Improvement Real? A visual guide to eval statistics

You changed the prompt and the score went up. Should you ship it? An interactive companion to Anthropic's eval guide that walks through variance, standard error, and paired comparisons on a concrete example, assuming less statistics background than the original.

@kraxkrokat

Paper Deep Dives·Mar 24, 2026·11 min read

Tree of Thoughts: What happens when you let a model explore before committing

How treating reasoning as search — generating, evaluating, and pruning candidate thoughts — unlocked planning capabilities that CoT and ReAct couldn't reach.

@kraxkrokat

Paper Deep Dives·Mar 20, 2026·10 min read

Self-Refine: What happens when you let an LLM critique its own work

How a single model can generate, critique, and revise its own outputs — improving 5–40% across seven tasks without any training.

@kraxkrokat

Experiments·Mar 17, 2026·12 min read

Replicating Reflexion: What happens when you actually run the code

A reimplementation of the Reflexion framework with modern models reveals that stronger models don't improve reflection — they make it worse.

@kraxkrokat

Paper Deep Dives·Mar 14, 2026·9 min read

Reflexion: What happens when an agent can learn from its own mistakes

The Reflexion paper added memory across attempts — letting agents reflect on failures in natural language and improve without retraining.

@kraxkrokat

Paper Deep Dives·Mar 10, 2026·10 min read

ReAct: How giving LLMs the ability to think and act changed everything

How the ReAct paper established the think-act-observe loop that powers every modern agent system.

@kraxkrokat

Paper Deep Dives·Mar 5, 2026·11 min read

Chain-of-Thought: The prompting trick that unlocked reasoning in language models

How a simple prompting technique unlocked multi-step reasoning in LLMs — and why it only works above a critical scale threshold.

@kraxkrokat

8 posts