TL;DR
Anthropic published lessons from running hundreds of Claude Code Skills across its engineering organization, describing Skills as folders that bundle instructions, scripts, references, templates and guardrails. The report says verification-focused Skills had the biggest measured effect on output quality, though broader adoption practices are still developing.
Anthropic has published lessons from running hundreds of Claude Code Skills across its engineering organization, saying the reusable units helped move teams away from repeated ad-hoc prompting and toward shared, versioned workflows for AI coding agents.
The post, titled “Lessons from building Claude Code: How we use skills” and attributed to Thariq Shihipar, describes a Skill as a folder the agent can discover, read and run, rather than a saved prompt. Anthropic says such folders can include instructions, scripts, references, templates, configuration files and hooks.
According to the source material, Anthropic grouped its internal Skills into nine categories: library and API reference, product verification, data fetching and analysis, business-process automation, code scaffolding and templates, code quality and review, CI/CD and deployment, runbooks, and infrastructure operations.
The strongest claim in the material is attributed to Anthropic’s own measurement: verification Skills, which check whether work is correct, had the largest effect on output quality. The exact measurement method, sample size and benchmark design were not included in the provided source text.
A Skill is a folder, not a prompt
Anthropic published what it learned running hundreds of Skills across its own engineering org. Read as a business memo, the point is bigger than a coding trick: this is how ad-hoc prompting becomes durable institutional capability, in the form of SOPs that agents actually follow, versioned and shared.
The misconception: “A Skill is just a clever markdown prompt you save in a file.”
The correction: a Skill is a folder the agent can discover, read and run, holding instructions, scripts, references, templates, config and on-demand hooks.
The knowledge of how your organization actually operates can be captured, versioned, shared and executed, and the thing capturing it is a humble folder with a script and a gotchas list inside. For the builder, that’s context engineering with real tools attached. For whoever owns the budget, it’s the difference between AI that starts from zero every morning and an asset that compounds. Caveats: best practices are still evolving, checked-in Skills cost context, and curation beats accumulation. Start with one Skill, one gotcha, and the category that catches your mistakes.
The report matters because it frames AI agent setup as organizational infrastructure, not just individual prompt craft. If a Skill stores a team’s instructions, scripts and caveats in one reusable folder, the same working knowledge can be shared, reviewed and improved like other engineering assets.
For companies using coding agents, the practical point is consistency. Anthropic’s account suggests Skills can reduce the need for workers to restate the same rules each day and may help agents apply team-specific standards, especially around verification and review.
As an affiliate, we earn on qualifying purchases.
Claude Code’s Folder Model
The Thorsten Meyer AI write-up describes the key correction as simple: a Skill is a folder, not a clever markdown file. In that model, SKILL.md provides root instructions and a trigger description, while subfolders such as references, scripts and assets supply deeper material only when needed.
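The folder model described above can be sketched as a small scaffold script. This is an illustration under stated assumptions, not Anthropic's tooling: the Skill name `verify-release-notes`, the file names inside it, and the `name`/`description` front-matter layout of SKILL.md are all hypothetical here, so check the documentation at code.claude.com/docs/en/skills for the actual format.

```python
from pathlib import Path
import tempfile

# Hypothetical SKILL.md body: a short trigger description plus root
# instructions that point to deeper material in the subfolders.
SKILL_MD = """\
---
name: verify-release-notes
description: Use when checking that release notes match the merged changes.
---

# Verify release notes

1. Read references/style-guide.md only if the wording is in question.
2. Run scripts/check_notes.py and report any failures.
"""


def scaffold_skill(root: Path, name: str) -> Path:
    """Create a Skill folder with SKILL.md and the subfolders named in the article."""
    skill_dir = root / name
    for sub in ("references", "scripts", "assets"):
        (skill_dir / sub).mkdir(parents=True, exist_ok=True)
    (skill_dir / "SKILL.md").write_text(SKILL_MD, encoding="utf-8")
    return skill_dir


def list_layout(skill_dir: Path) -> list[str]:
    """Return every path inside the Skill folder, relative to it, sorted."""
    return sorted(p.relative_to(skill_dir).as_posix() for p in skill_dir.rglob("*"))


if __name__ == "__main__":
    # Build the folder in a temporary directory so nothing is left behind.
    with tempfile.TemporaryDirectory() as tmp:
        skill = scaffold_skill(Path(tmp), "verify-release-notes")
        for entry in list_layout(skill):
            print(entry)
```

Keeping SKILL.md short and pushing detail into the subfolders mirrors the article's point that deeper material is supplied only when needed.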
The source compares the design to giving a new hire a short operating guide that points to detailed documentation. Anthropic’s suggested practices include writing descriptions for the model, adding scripts instead of only prose, using on-demand guardrails, and keeping room for the agent to adapt.
“A Skill is a folder, not a prompt.”
— Thorsten Meyer AI summary
Measurement Details Still Missing
Several points remain unclear from the provided material. The source does not give the underlying data for Anthropic’s quality measurement, how many teams were included, how output quality was scored, or whether the results apply outside Claude Code and Anthropic’s own engineering practices.
It is also not yet clear how quickly Skills become hard to maintain at scale. The source notes that best practices are still evolving, that checked-in Skills can cost context, and that curation matters more than accumulation.
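One way a team might keep an eye on the context cost mentioned above is a small audit script. The sketch below is an assumption-heavy illustration: it treats the character count of each root SKILL.md as a crude proxy for context use, the 4000-character budget is arbitrary, and the function names are hypothetical. How much of a Skill an agent actually loads depends on the tool.

```python
from pathlib import Path


def skill_footprints(skills_root: Path) -> list[tuple[str, int]]:
    """Return (skill name, SKILL.md character count) pairs, largest first."""
    sizes = []
    for skill_md in skills_root.glob("*/SKILL.md"):
        text = skill_md.read_text(encoding="utf-8")
        sizes.append((skill_md.parent.name, len(text)))
    return sorted(sizes, key=lambda item: item[1], reverse=True)


def over_budget(footprints: list[tuple[str, int]], max_chars: int = 4000) -> list[str]:
    """Names of Skills whose root file exceeds the (arbitrary) character budget."""
    return [name for name, size in footprints if size > max_chars]


if __name__ == "__main__":
    # Assumes Skills live in sibling folders under ./skills; adjust as needed.
    report = skill_footprints(Path("skills"))
    for name, size in report:
        print(f"{name}: {size} chars")
    print("over budget:", over_budget(report))
```

A report like this does not decide what to prune, but it gives curation a starting point.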
Teams Test Skill Libraries
The next step for adopters is likely to be small pilots, especially around verification Skills that catch mistakes in existing workflows. The source recommends starting with one Skill, one known caveat and the category most likely to prevent repeated errors.
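A pilot of "one Skill, one known caveat" could be as small as a single check script bundled in the Skill's scripts folder. The sketch below is hypothetical and not taken from Anthropic's post: it assumes the team's known caveat is that generated Python must not contain bare `except:` clauses, and the function names are illustrative.

```python
import ast


def find_bare_excepts(source: str) -> list[int]:
    """Return line numbers of bare `except:` clauses in Python source."""
    tree = ast.parse(source)
    return sorted(
        node.lineno
        for node in ast.walk(tree)
        if isinstance(node, ast.ExceptHandler) and node.type is None
    )


def verify(source: str) -> list[str]:
    """Run the check and return readable failures; an empty list means pass."""
    try:
        lines = find_bare_excepts(source)
    except SyntaxError as exc:
        # Code that does not parse fails verification outright.
        return [f"syntax error on line {exc.lineno}: {exc.msg}"]
    return [f"line {n}: bare except hides errors" for n in lines]


if __name__ == "__main__":
    sample = "try:\n    x = 1\nexcept:\n    pass\n"
    for failure in verify(sample):
        print(failure)
```

Returning a list of failures rather than raising lets an agent read every problem in one pass and decide how to fix them.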
Anthropic’s documentation at code.claude.com/docs/en/skills is the cited reference for implementation details. Readers should expect practices around Skill design, memory, hooks and governance to keep changing as more teams test agent workflows in production settings.
Key Questions
What did Anthropic publish?
Anthropic published a June 3, 2026 Claude blog post about how it uses Claude Code Skills across its engineering organization.
What is a Skill in this report?
A Skill is described as a folder an agent can discover, read and run. It can contain instructions, scripts, references, templates, configuration and hooks.
Which Skill type had the biggest reported effect?
According to the source material, verification Skills had the largest measured effect on output quality, though the provided material does not include the full measurement details.
Why does this matter for engineering teams?
The report suggests teams can turn repeated instructions into shared workflows that are versioned and reused, reducing reliance on one-off prompting.
What is still uncertain?
The source does not confirm how well Anthropic’s findings generalize to other companies, other agents or non-engineering teams. Maintenance costs and governance practices are also still developing.
Source: Thorsten Meyer AI