Anthropic’s lineup got two new names recently — Claude Fable 5 and Claude Mythos 5 — sitting above an already-strong Opus tier (Opus 4.6, 4.7, and 4.8). The marketing pages don’t really tell you what to use when, and the API differences are significant enough that a naive model-string swap will break production code. This post is the guide I wish existed: what these models are, how they relate to each other, exactly what changes at the API level, what changes behaviorally, and how to decide which one your workload actually needs.

Short version: Fable 5 is Anthropic’s most capable widely released model. Mythos 5 is the same model under a different name, accessible only through Project Glasswing. Both behave very differently from Opus on the API surface, and the pricing reflects it.

A quick map of the current lineup

Before diving into Fable specifically, it helps to see where everything sits. Anthropic’s models currently form four tiers:

| Model | ID | Context | Max Output | Input $/MTok | Output $/MTok | Availability |
|---|---|---|---|---|---|---|
| Claude Fable 5 | claude-fable-5 | 1M | 128K | $10.00 | $50.00 | Widely released |
| Claude Mythos 5 | claude-mythos-5 | 1M | 128K | $10.00 | $50.00 | Project Glasswing only |
| Claude Opus 4.8 | claude-opus-4-8 | 1M | 128K | $5.00 | $25.00 | Widely released |
| Claude Opus 4.7 | claude-opus-4-7 | 1M | 128K | $5.00 | $25.00 | Widely released |
| Claude Opus 4.6 | claude-opus-4-6 | 1M | 128K | $5.00 | $25.00 | Widely released |
| Claude Sonnet 4.6 | claude-sonnet-4-6 | 1M | 64K | $3.00 | $15.00 | Widely released |
| Claude Haiku 4.5 | claude-haiku-4-5 | 200K | 64K | $1.00 | $5.00 | Widely released |

Three things to absorb from this table:

  1. Fable 5 and Mythos 5 are priced at exactly 2× Opus 4.8. That premium has to earn itself per-workload, and it doesn’t always — more on that below.
  2. Everything Opus-tier and above now has a 1M-token context window at standard pricing — no long-context surcharge. The context window is no longer the differentiator between tiers; capability is.
  3. The Opus family is three generations deep and all still active. Opus 4.8 is the current default “smartest reasonable choice”; 4.7 and 4.6 remain pinnable for teams mid-migration.
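
To make the 2× premium concrete, here is a minimal cost sketch built from the list prices in the table above. The function name is my own, and it deliberately ignores prompt caching and any other discounts, so treat it as a rough estimate rather than a billing calculator.

```python
# List prices in USD per million tokens (input, output), copied from the table above.
PRICES = {
    "claude-fable-5": (10.00, 50.00),
    "claude-mythos-5": (10.00, 50.00),
    "claude-opus-4-8": (5.00, 25.00),
    "claude-opus-4-7": (5.00, 25.00),
    "claude-opus-4-6": (5.00, 25.00),
    "claude-sonnet-4-6": (3.00, 15.00),
    "claude-haiku-4-5": (1.00, 5.00),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the list-price USD cost of one request (no caching or discounts)."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000
```

A request with 200K input tokens and 10K output tokens comes to $1.25 on Opus 4.8 and $2.50 on Fable 5. Put differently, at the same input/output mix Fable 5 only breaks even if it finishes the job in half the tokens.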

One thing that trips people up: model IDs are exact strings. It’s claude-fable-5, not claude-fable-5-20260601 or any date-suffixed variant. Anthropic’s aliases are complete as-is.

Fable 5 vs Mythos 5: same model, different door

This part is simple, and worth stating plainly so nobody overthinks it.

Fable 5 and Mythos 5 are the same underlying model — same capabilities, same pricing, same context window, same API surface, same behavioral characteristics, same safety classifiers, same data-retention requirement. Every API note in this post applies to both. The only thing that differs is how you reach them.

  • Mythos 5 (claude-mythos-5) is available exclusively through Project Glasswing. If your organization isn’t a Glasswing participant, you cannot call it — there is no waitlist workaround, no regional exception. Mythos 5 succeeds the older invitation-only Claude Mythos Preview (claude-mythos-preview), which is now a migration source, not a destination.
  • Fable 5 (claude-fable-5) is the widely released version. Anyone with an Anthropic API account (meeting the data-retention requirement — see below) can call it.

Why two names for one model?

Access channels. Project Glasswing participants get the model under the Mythos brand as a continuation of the Mythos Preview program they were already in; everyone else gets the same model as Fable 5. Think of it like an airline selling the same seat under two fare codes — the seat doesn’t change.

Practical guidance

  • Writing docs, blog posts, or sample code? Use claude-fable-5. Your readers can actually run it.
  • Migrating off Mythos Preview? Target claude-mythos-5 if your org is in Glasswing, claude-fable-5 otherwise. The migration is mostly a model-ID swap because the tokenizer family matches — but you must remove any thinking configuration and any assistant prefill (details in the migration section).
  • Building a product where some customers are Glasswing orgs and some aren’t? Parameterize the model ID. Everything else about the integration is identical.
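
Parameterizing the model ID can be as small as the sketch below. The function names and the org_in_glasswing flag are hypothetical; how you learn whether a customer is a Glasswing participant is up to your own account data.

```python
from typing import Any, Dict, List


def model_id_for(org_in_glasswing: bool) -> str:
    """Pick the model ID for an org; nothing else in the request changes."""
    return "claude-mythos-5" if org_in_glasswing else "claude-fable-5"


def build_request(org_in_glasswing: bool, messages: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Build the same request body for either access channel."""
    return {
        "model": model_id_for(org_in_glasswing),
        "max_tokens": 16000,
        "messages": messages,
    }
```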

From here on I’ll say “Fable 5” and mean both.

Where Fable 5 actually shines

It’s tempting to read “most capable model” as “Opus 4.8 but a bit better at everything.” That’s not the right mental model. Fable 5’s gains are concentrated on work above what prior models could do at all:

  • Long-horizon autonomous runs. Overnight coding sessions, multi-hour refactors, research agents that have to stay coherent across hundreds of tool calls without a human correcting course.
  • First-shot implementations of well-specified systems. Give it a complete spec up front and it will often deliver a working system in one pass where Opus 4.8 needed iteration.
  • End-to-end enterprise deliverables. Financial analysis, spreadsheets, slide decks, formal documents — work where the artifact has to land correct, not approximately correct.
  • Repository-scale code review and debugging. Including searching repository history to find when and why a behavior changed. (Note: the bug-finding gains explicitly exclude security-focused analysis — the cyber safety classifiers apply there, which is part of why the refusal handling below matters.)
  • Vision on dense or degraded images. It’s explicitly trained to use bash and crop tools on flipped, blurry, or noisy inputs rather than giving up.
  • Parallel sub-agent delegation. It reliably sustains ongoing communication with long-running sub-agents and peer agents — delegation is a strength to lean on, not a behavior to suppress.
  • Navigating ambiguity. It’s a stronger thought partner: it asks the right scoping questions, pushes back when a plan is wrong, and infers intent from context.

The teams with the best early outcomes gave it their hardest unsolved problems first — had it scope the problem, ask questions, then execute. If you evaluate Fable 5 only on workloads Opus 4.8 already handles, you’ll conclude it’s an expensive sidegrade. The win shows up on the failure cases the previous frontier didn’t reach.

Fable 5 vs the Opus family: the API-level differences

This is where the practical deltas live. Moving from Opus 4.7 or 4.8 to Fable 5 is not just a model-string swap — the request shape changes, the response shape changes, and some failure modes are genuinely new. Each item below tells you what changed, why it matters, and what to do about it.

1. Thinking is always on. You don’t configure it — at all.

The history here, briefly: Opus 4.6 introduced adaptive thinking (thinking: {type: "adaptive"}) alongside the old fixed-budget extended thinking. Opus 4.7 removed budget_tokens entirely — adaptive became the only on-mode, but thinking still defaulted to off and you could pass {type: "disabled"} explicitly. Opus 4.8 kept that surface unchanged.

Fable 5 takes the final step: thinking is always on, and any attempt to turn it off or give it a budget is rejected. The only accepted thinking setting is {type: "adaptive"}, optionally with a display field (see item 3).

| Request | Opus 4.8 | Fable 5 |
|---|---|---|
| thinking omitted | Runs without thinking | Adaptive thinking applies automatically |
| thinking: {type: "adaptive"} | Adaptive thinking on | Accepted (same as omitting) |
| thinking: {type: "disabled"} | Accepted (thinking off) | 400 error |
| thinking: {type: "enabled", budget_tokens: N} | 400 error | 400 error |

Note the subtle trap in the first row: omitting thinking means off on Opus 4.8 but on on Fable 5. Code that “doesn’t use thinking” on Opus is silently using it on Fable — which is fine (and intended), but it changes your latency and token profile.
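
If you want a pre-flight check in your own client code, the table above can be written down as a small function. This is only a sketch of the table as described in this post, not an official validator, and the function name and return labels are my own.

```python
from typing import Any, Dict, Optional

FABLE_FAMILY = ("claude-fable-5", "claude-mythos-5")
SUPPORTED = FABLE_FAMILY + ("claude-opus-4-8",)


def thinking_outcome(model: str, thinking: Optional[Dict[str, Any]]) -> str:
    """Return "off", "adaptive", or "400" for a given thinking config."""
    if model not in SUPPORTED:
        raise ValueError(f"table only covers {SUPPORTED}, got {model!r}")
    is_fable = model in FABLE_FAMILY
    if thinking is None:
        # The trap: omitted means off on Opus 4.8 but on for Fable 5.
        return "adaptive" if is_fable else "off"
    kind = thinking.get("type")
    if kind == "adaptive":
        return "adaptive"
    if kind == "disabled":
        return "400" if is_fable else "off"
    return "400"  # fixed-budget "enabled" is rejected on both
```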

There is no thinking token budget on Fable 5 and no replacement for one. The depth control is output_config.effort:

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
client.messages.create(
    model="claude-fable-5",
    max_tokens=16000,
    output_config={"effort": "high"},   # low | medium | high | xhigh | max
    messages=[{"role": "user", "content": "..."}],
)

2. Effort is the primary control — and lower settings are surprisingly good

output_config.effort controls how much the model thinks and acts: how many tool calls it makes, how much it deliberates, how thoroughly it verifies. On Fable 5 it matters more than any other parameter you can set. The levels:

| Level | Use for | What to expect |
|---|---|---|
| max | Extremely hard, latency-insensitive problems | The ceiling. Most rigorous verification; can overthink routine work |
| xhigh | The most capability-sensitive coding/agentic workloads | Sustained deep work; pair with large max_tokens |
| high | The recommended default for most tasks | Strong reasoning with sane latency |
| medium | Routine work, cost-sensitive paths | Fewer, more consolidated tool calls; terser output |
| low | Sub-agents, scoped tasks, latency-sensitive routes | Fast and direct |

The non-obvious finding from early adopters: lower effort settings on Fable 5 often exceed the xhigh/max performance of previous models. A low-effort Fable 5 call can outperform a maxed-out older model on routine work, at a fraction of the wall-clock time. So don’t migrate by mapping “we used max before → use max now.” Run an effort sweep (low, medium, high, xhigh) on your own eval set per route, and let the data pick. The relationship between effort and total cost isn’t even monotonic on agentic work: higher effort up front often means smarter planning, fewer wasted tool calls, fewer turns, and lower total cost.
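
Once you have sweep results, “let the data pick” can be a few lines. The sketch below assumes you have already measured, per effort level, an eval pass rate and a total cost per task on your own eval set; the function name and input shape are hypothetical. It compares measured total cost rather than assuming cost rises with effort, since the relationship is not monotonic.

```python
from typing import Dict, Optional, Tuple

EFFORT_ORDER = ["low", "medium", "high", "xhigh", "max"]


def pick_effort(
    results: Dict[str, Tuple[float, float]], quality_bar: float
) -> Optional[str]:
    """Pick the cheapest effort level whose pass rate meets the bar.

    results maps effort level -> (pass_rate, total_cost_per_task).
    Ties on cost go to the lower effort level. Returns None if nothing passes.
    """
    passing = [
        (cost, EFFORT_ORDER.index(level), level)
        for level, (pass_rate, cost) in results.items()
        if pass_rate >= quality_bar
    ]
    if not passing:
        return None
    return min(passing)[2]
```

Run it once per route: a route where medium already clears the bar should not inherit the setting chosen for your hardest agentic workload.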

Also inherited from Opus 4.7+: sampling parameters (temperature, top_p, top_k) are removed and return 400. If you were using temperature = 0 for determinism, use lower effort plus a tighter prompt; if you were using high temperature for variety, ask the model for variety explicitly in the prompt.

3. The raw chain of thought is never returned

On Fable 5, the model’s actual reasoning trace is never exposed over the API — to anyone, under any setting. What you receive are regular thinking blocks whose contents depend on thinking.display:

  • display: "omitted" (the default): thinking blocks arrive with an empty text field. Thinking still happened and is billed identically; you just don’t see it.
  • display: "summarized": thinking blocks contain a readable summary of the reasoning, not the raw trace.

Two practical consequences:

The UX one. If your product streams responses, the default looks like a long, dead pause before output begins — possibly minutes long on hard tasks. If you show users any kind of “thinking” indicator, set thinking: {type: "adaptive", display: "summarized"} explicitly so there’s visible progress. (This default also silently changed back at Opus 4.7 — Opus 4.6 defaulted to summarized — so teams coming from 4.6 get bitten twice.)

The multi-turn one. When continuing a conversation on the same model, pass thinking blocks back exactly as received — including the ones with empty text. The API rejects blocks that have been modified, not blocks you’ve read; displaying the summary is fine, editing or reconstructing blocks is not. And if you replay a Fable 5 conversation on a different model, its thinking blocks are silently dropped from the prompt before pricing — you aren’t billed for them and there’s nothing to strip. Don’t write stripping logic; it can trigger ordering errors and solves a problem that doesn’t exist.
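
A minimal sketch of the “pass it back untouched” rule, assuming you keep the conversation as plain dictionaries. The helper name is my own, and the thinking-block field names in the usage note are an assumption; the point is only that the assistant content goes back in exactly as it arrived.

```python
import copy
from typing import Any, Dict, List


def next_turn(
    history: List[Dict[str, Any]],
    assistant_content: List[Dict[str, Any]],
    user_text: str,
) -> List[Dict[str, Any]]:
    """Append the assistant turn exactly as received, then the new user turn.

    No filtering, no stripping, no rebuilding of thinking blocks,
    including the ones whose text is empty.
    """
    return history + [
        {"role": "assistant", "content": copy.deepcopy(assistant_content)},
        {"role": "user", "content": user_text},
    ]
```

If the previous response contained an empty thinking block followed by a text block, both appear in the next request in the same order. Reading the blocks for display is fine; the helper just never edits them.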

Related corner: prompts that try to elicit the model’s internal reasoning in the response text can be refused with stop_details.category: "reasoning_extraction". If your application needs reasoning visibility, read the summarized thinking blocks — don’t prompt for the raw trace.

4. The refusal stop reason is a first-class response you must handle

Fable 5 runs safety classifiers on incoming requests, primarily targeting research biology and most cybersecurity content — domains the model is explicitly not intended for. The important operational fact: benign adjacent work can occasionally trigger false positives. Security tooling, life-sciences data pipelines, even vivid fiction touching the wrong themes. That’s why the handling below matters even for completely legitimate workloads.

A declined request is not an HTTP error. It’s a successful 200 with stop_reason: "refusal" plus a stop_details object carrying a category ("cyber", "bio", "reasoning_extraction", "frontier_llm", or null). The classifier can fire:

  • Before any output: content is an empty array, and the request is not billed at all (no input tokens, no rate-limit consumption).
  • Mid-stream, after partial output: the already-streamed portion is billed; discard it rather than treating it as a complete answer.

Code written for older models that reads response.content[0] unconditionally breaks immediately:

response = client.messages.create(model="claude-fable-5", ...)

if response.stop_reason == "refusal":
    # content is empty (pre-output) or partial (mid-stream) — don't read it as an answer
    category = response.stop_details.category if response.stop_details else None
    handle_refusal(category)
else:
    print(response.content[0].text)

Branch on stop_reason, never on stop_details: stop_details is informational and can be null even on a refusal.
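
The two refusal cases can be told apart from stop_reason and content alone. Here is a small sketch; the Outcome type and its labels are hypothetical names of mine, and it assumes you have already collected the content blocks into a list.

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass(frozen=True)
class Outcome:
    kind: str     # "answer", "refused_before_output", or "refused_mid_stream"
    usable: bool  # False for any refusal: never show its content as an answer


def classify(stop_reason: str, content: List[Any]) -> Outcome:
    """Classify a response using stop_reason only, never stop_details."""
    if stop_reason != "refusal":
        return Outcome("answer", True)
    if not content:
        return Outcome("refused_before_output", False)
    # Partial output arrived before the classifier fired: discard it.
    return Outcome("refused_mid_stream", False)
```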

Retrying on a fallback model. For production traffic the standard pattern is to retry refused requests on Opus 4.8, which doesn’t run these classifiers. Three options, in order of preference:

  1. Server-side fallbacks parameter (beta, header server-side-fallback-2026-06-01) — one round trip; the API runs Opus 4.8 on the same request when Fable 5 declines and returns its answer, with credit-style repricing applied automatically:
response = client.beta.messages.create(
    model="claude-fable-5",
    max_tokens=16000,
    betas=["server-side-fallback-2026-06-01"],
    fallbacks=[{"model": "claude-opus-4-8"}],
    messages=[{"role": "user", "content": "..."}],
)
# a `fallback` content block marks the switch point;
# usage.iterations tells you which model actually served the response
  2. SDK client-side middleware: BetaRefusalFallbackMiddleware registered on the client retries refusals automatically (streaming included), and BetaFallbackState pins follow-up turns to the model that accepted. This is the path on providers without server-side fallbacks (Amazon Bedrock, Vertex AI, Microsoft Foundry). Create one state object per conversation; it’s the pinning scope.

  3. Hand-rolled retry: detect via stop_reason, re-send the conversation as-is on claude-opus-4-8 (its thinking blocks get dropped automatically, so no cleanup is needed), and keep using the fallback model for subsequent turns. The beta fallback credit mechanism makes these retries cheaper by honoring your prompt-cache spend on the new model.

Whichever you pick, size your Opus 4.8 rate limits for expected refusal volume — a fallback you can’t serve is just a slower refusal.
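
For the hand-rolled option, the core logic is short. The sketch below takes the actual API call as an injected send function (a hypothetical wrapper you would write around your client) and treats requests and responses as plain dictionaries, so it stays independent of any particular SDK version.

```python
from typing import Any, Callable, Dict, Tuple

Request = Dict[str, Any]
Response = Dict[str, Any]


def send_with_fallback(
    send: Callable[[Request], Response],
    request: Request,
    fallback_model: str = "claude-opus-4-8",
) -> Tuple[Response, str]:
    """Send once; on a refusal, re-send the same request on the fallback model.

    Returns (response, model_that_served_it) so the caller can pin
    follow-up turns to whichever model answered.
    """
    response = send(request)
    if response.get("stop_reason") != "refusal":
        return response, request["model"]
    # Re-send as-is apart from the model ID; no thinking-block cleanup.
    retry = {**request, "model": fallback_model}
    return send(retry), fallback_model
```

The caller still has to branch on stop_reason for the returned response, since nothing here guarantees the second attempt produced a usable answer.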

5. No assistant prefill

The old trick of forcing a response shape by ending messages with {role: "assistant", content: "{"} returns 400 — same as on the whole 4.6+ family. The replacements, depending on what the prefill was doing:

| Prefill was for | Use instead |
|---|---|
| Forcing JSON / schema output | output_config.format with a json_schema (structured outputs) |
| Forcing a classification label | A tool with an enum field, or structured outputs |
| Skipping preambles (“Here is the summary:”) | System prompt: “Respond directly without preamble” |
| Continuing an interrupted response | Put the continuation request in the user turn |
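
As an illustration of the classification-label row, here is a sketch of a request body that constrains the label with a JSON schema instead of a prefill. The exact nesting under output_config.format is my assumption from the parameter name in the table, so check it against the structured-outputs documentation before relying on it; the function name is hypothetical.

```python
from typing import Any, Dict, List


def label_request(text: str, labels: List[str]) -> Dict[str, Any]:
    """Build a request that forces one label from a fixed set, with no prefill."""
    schema = {
        "type": "object",
        "properties": {"label": {"type": "string", "enum": list(labels)}},
        "required": ["label"],
        "additionalProperties": False,
    }
    return {
        "model": "claude-fable-5",
        "max_tokens": 1024,
        "system": "Respond directly without preamble.",
        # Assumed shape for output_config.format; verify against the docs.
        "output_config": {"format": {"type": "json_schema", "schema": schema}},
        # The conversation ends on a user turn, so there is nothing to reject.
        "messages": [{"role": "user", "content": text}],
    }
```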

6. 30-day data retention is required

Fable 5 is not available under zero data retention. If your org’s retention configuration is below 30 days, every Fable 5 request returns 400 invalid_request_error — with a perfectly valid payload. This is the single most confusing failure mode in the whole migration, because the error looks like a malformed request and nothing in the body is wrong. If a Fable 5 rollout suddenly 400s across the board, check the org’s retention configuration before debugging the payload. Opus 4.8 has no such requirement, which makes it the natural home for ZDR orgs.

(On Amazon Bedrock, Google Vertex AI, and Microsoft Foundry, retention requirements are set by each platform rather than this rule.)

7. Same tokenizer as Opus 4.8

Fable 5 uses the same tokenizer as Opus 4.8 (the one introduced with Opus 4.7). Token counts are roughly unchanged when migrating from Opus 4.7/4.8 or Mythos Preview, so your max_tokens settings, compaction triggers, and context budgets carry over; only the per-token price differs. Opus 4.6, Sonnet, and Haiku use a different tokenizer: the same content comes out at roughly 1.0× to 1.35× as many tokens on Fable 5. Re-baseline with the count_tokens endpoint on representative prompts rather than applying a blanket multiplier.
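
Until you have real counts, the rough range quoted above is still useful as a planning bound. The sketch below turns an old-tokenizer count into a (low, high) estimate using integer arithmetic; the function name is my own, and the result is a worst-case guess for budgeting, not a substitute for measuring with count_tokens.

```python
from typing import Tuple


def rebaselined_range(old_tokens: int) -> Tuple[int, int]:
    """Estimate the new token count as a (low, high) range of 1.00x to 1.35x."""
    if old_tokens < 0:
        raise ValueError("old_tokens must be non-negative")
    # Ceiling division keeps the upper bound conservative.
    high = -((-old_tokens * 135) // 100)
    return old_tokens, high
```

A prompt that measured 100,000 tokens on the older tokenizer should be budgeted for anywhere between 100,000 and 135,000 tokens after the move.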

8. Turns can run for minutes — design for it

This is the biggest structural shift and it isn’t a parameter. A single request on a hard task at higher effort can run many minutes — a 15-minute single call is normal when the task involves gathering context, building, and self-verifying. Code and UX written around “responses arrive in 30 seconds” needs rework:

  • Always stream. Non-streaming requests with large max_tokens will hit SDK HTTP timeouts; the SDKs will actually refuse to send them. Use the streaming helpers and .get_final_message() if you don’t need per-token handling.
  • Set client timeouts generously and make them per-route, not global.
  • Show real progress. This is exactly what display: "summarized" is for — streamed thinking summaries are your progress indicator.
  • Structure long work asynchronously. Callers should check in on runs rather than blocking inside one request. If you’re on Anthropic’s Managed Agents, this is the session/event-stream model working as intended.
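
The check-in pattern from the last bullet can be sketched in-process with the standard library. This is a toy under stated assumptions: RunRegistry is a hypothetical name, runs live only in memory, and the work callable stands in for whatever function wraps your streamed model call. A production version would persist run state somewhere durable.

```python
import uuid
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Any, Callable, Dict


class RunRegistry:
    """Start long runs in the background and let callers poll for status."""

    def __init__(self, max_workers: int = 4) -> None:
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._runs: Dict[str, Future] = {}

    def start(self, work: Callable[[], Any]) -> str:
        """Kick off a run and return its ID immediately."""
        run_id = uuid.uuid4().hex
        self._runs[run_id] = self._pool.submit(work)
        return run_id

    def status(self, run_id: str) -> Dict[str, Any]:
        """Report running, failed, or done without blocking the caller."""
        future = self._runs[run_id]
        if not future.done():
            return {"state": "running"}
        error = future.exception()
        if error is not None:
            return {"state": "failed", "error": repr(error)}
        return {"state": "done", "result": future.result()}
```

The HTTP handler that starts a run returns the ID right away, and the UI polls status (or subscribes to events) instead of holding a request open for fifteen minutes.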

What carries over unchanged

It’s not all churn. The Messages API shape, tool use patterns, structured outputs, the effort parameter itself, Task Budgets (beta), server-side compaction (beta), the memory tool, context editing, prompt caching, and high-resolution vision all work on Fable 5 exactly as on Opus 4.7/4.8. One caching note: the minimum cacheable prefix on Fable 5 is 2048 tokens (vs 4096 on Opus 4.8), and switching models always invalidates the cache — the first Fable 5 request on a previously-Opus conversation writes the cache fresh.

The behavioral differences: prompting Fable 5 well

None of these break code, but they’re where a migrated workload feels different — and where prompts tuned for Opus actively hurt.

De-prescribe your prompts. This is the counterintuitive one. Prompts and skills written for prior models — step-by-step scaffolding, “first do X, then Y, then Z” — are often too prescriptive for Fable 5 and measurably reduce output quality. State the goal and the constraints; let it figure out the steps. After migrating, A/B your workload with the old scaffolding removed. You’ll probably delete more prompt than you add.

Give it the reason, not just the request. Fable 5 performs better when it understands intent — it connects the task to relevant context rather than guessing. A frame like “I’m working on [larger task] for [who]. They need [what the output enables]. With that in mind: [request]” costs one sentence and pays off most on long-running agents juggling multiple workstreams.

Front-load the task spec. The model’s long-horizon strength comes partly from planning deeply at the start. One complete, well-specified opening turn beats the same information dribbled across five follow-ups — in both quality and total token spend.

Nudge it past overplanning on ambiguous tasks. At higher effort it can deliberate beyond what a task needs. A standard system-prompt line fixes it: “When you have enough information to act, act. If you are weighing a choice, give a recommendation, not an exhaustive survey.”

Guard against unrequested tidying. Higher effort buys excellent verification behavior, but also a tendency to refactor adjacent code or add abstractions nobody asked for. The fix is explicit: “Don’t add features, refactor, or introduce abstractions beyond what the task requires. Do the simplest thing that works well.”

Ground progress claims. For long autonomous runs, require claims to be audited against tool results: “Before reporting progress, audit each claim against a tool result from this session. If something is not yet verified, say so explicitly.” In testing this nearly eliminated fabricated status reports on tasks designed to elicit them.

Lean into sub-agents. Prior-model guardrails often suppressed delegation because sub-agents went off the rails. On Fable 5, parallel sub-agents are dependable — flip the guidance: “Delegate independent subtasks to sub-agents and keep working while they run.” Asynchronous delegation (orchestrator keeps working) outperforms spawn-and-block.

Give it a memory surface. Even a plain .md file the agent writes learnings to measurably improves multi-session performance. Tell it where the file is, tell it to consult it, give it a format: one lesson per file, one-line summary at top, update rather than duplicate, delete what turns out wrong.

Two rare quirks worth knowing. Deep into very long sessions it can occasionally end a turn stating intent (“I’ll now run X”) without doing it — a “continue” recovers it interactively; autonomous pipelines should add a system reminder that no one is watching and it should proceed without asking. And if your harness surfaces a remaining-context countdown, it can develop “context anxiety” and start truncating its own work — avoid showing explicit token counts, or tell it plainly that it has ample context and should continue.
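
If you want to A/B these instructions one at a time, it helps to keep them as named snippets rather than one hand-edited blob. A minimal sketch, with the guidance text taken from the sections above and the keys and function name being my own:

```python
from typing import Dict, List

GUIDANCE: Dict[str, str] = {
    "act": (
        "When you have enough information to act, act. If you are weighing "
        "a choice, give a recommendation, not an exhaustive survey."
    ),
    "scope": (
        "Don't add features, refactor, or introduce abstractions beyond what "
        "the task requires. Do the simplest thing that works well."
    ),
    "audit": (
        "Before reporting progress, audit each claim against a tool result "
        "from this session. If something is not yet verified, say so explicitly."
    ),
    "delegate": (
        "Delegate independent subtasks to sub-agents and keep working "
        "while they run."
    ),
}


def build_system_prompt(base: str, keys: List[str]) -> str:
    """Join the base prompt with the selected guidance snippets, in order."""
    return "\n\n".join([base] + [GUIDANCE[key] for key in keys])
```

Each eval arm then differs by a list of keys, which makes it easy to see which line actually moved the result.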

Quick decision table

| You need… | Use |
|---|---|
| The absolute hardest reasoning, longest agentic runs, willing to pay 2× | Fable 5 (claude-fable-5) |
| The same, and you’re in Project Glasswing | Mythos 5 (claude-mythos-5) |
| The current default for “make this as smart as reasonable” | Opus 4.8 (claude-opus-4-8) |
| Best balance of speed, cost, and quality for most production apps | Sonnet 4.6 (claude-sonnet-4-6) |
| High-volume classification, simple Q&A, latency-critical paths | Haiku 4.5 (claude-haiku-4-5) |
| Migrating off Mythos Preview | Mythos 5 if in Glasswing, Fable 5 otherwise |
| A ZDR org, or thinking: disabled for latency | Opus 4.8 (Fable 5 can’t do either) |

When Fable 5 is worth 2× Opus 4.8 — and when it isn’t

Worth it: when the workload genuinely sits above what Opus 4.8 does reliably. Autonomous overnight runs that must not derail. First-shot builds from a complete spec. Deliverables where a wrong number in a spreadsheet costs more than the API bill. Repository-scale debugging. Research agents coordinating sub-agents for hours. On these, Fable 5’s higher per-token price often nets out cheaper than Opus retrying its way to the answer — fewer turns, less rework, less human correction.

Not worth it: when your evals show Opus 4.8 already hitting the bar. Paying 2× for the median request buys you very little; the win lives on the tail. Also stay on Opus 4.8 if any of these apply:

  • Code relies on thinking: {type: "disabled"} for latency or cost (Fable 5 rejects it).
  • Your org runs zero data retention and can’t change it.
  • Volume economics: at scale, 2× on every token is a real number.
  • Latency budgets that can’t absorb minutes-long turns even occasionally.

A useful middle path: route by difficulty. Send the median traffic to Opus 4.8 or Sonnet 4.6, and escalate the requests that fail or that you classify as hard to Fable 5. The fallbacks machinery and consistent API shape make this kind of tiered routing much less painful than it used to be.
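
A difficulty router along those lines might look like the sketch below. The is_hard and failed_before signals are hypothetical inputs you would compute yourself (a classifier, a retry counter), and fable_allowed should be False for ZDR orgs or routes that depend on disabled thinking.

```python
def route_model(
    is_hard: bool,
    failed_before: bool,
    fable_allowed: bool = True,
    default_model: str = "claude-opus-4-8",
) -> str:
    """Send median traffic to the default model and escalate the tail."""
    needs_escalation = is_hard or failed_before
    if needs_escalation and fable_allowed:
        return "claude-fable-5"
    if needs_escalation:
        # Fable 5 is off the table, so Opus 4.8 is the ceiling.
        return "claude-opus-4-8"
    return default_model
```

Passing default_model="claude-sonnet-4-6" gives the three-tier version: Sonnet for the median, Fable 5 for the hard tail, and Opus 4.8 as the ceiling wherever Fable 5 is ruled out.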

And remember the Opus 4.7 → 4.8 hop itself has no breaking changes — it’s a model-string swap plus prompt re-tuning. If you’re on 4.7 and not ready for Fable 5, taking 4.8 first is a free upgrade and gets you onto the same tokenizer and API surface, making the eventual Fable 5 move smaller.

Migration cheat sheet — Opus 4.7/4.8 → Fable 5

# Before (Opus 4.8)
client.messages.create(
    model="claude-opus-4-8",
    max_tokens=16000,
    thinking={"type": "adaptive", "display": "summarized"},
    output_config={"effort": "high"},
    messages=[...],
)

# After (Fable 5)
client.messages.create(
    model="claude-fable-5",
    max_tokens=16000,
    # thinking is implicit and mandatory — omit the config entirely,
    # or keep {"type": "adaptive", "display": "summarized"} if you
    # surface reasoning summaries to users
    output_config={"effort": "high"},
    messages=[...],
)

The checklist, in order of “this will 400” to “this will surprise you”:

  1. Update the model ID to claude-fable-5 (or claude-mythos-5 in Glasswing).
  2. Remove thinking: {type: "disabled"} anywhere it appears — it errors on Fable 5.
  3. Remove any remaining budget_tokens, temperature, top_p, top_k — all 400 (these already 400 on Opus 4.7/4.8, so 4.7/4.8 callers are clean).
  4. Remove assistant prefills; switch to output_config.format or system-prompt instructions.
  5. Confirm 30-day data retention at the org level — otherwise every request 400s.
  6. Add stop_reason == "refusal" handling before reading response.content, and pick a fallback strategy (server-side fallbacks → SDK middleware → hand-rolled).
  7. Set display: "summarized" explicitly if you surface reasoning — the default is "omitted".
  8. Plan for minutes-long turns: streaming everywhere, generous per-route timeouts, async check-in UX.
  9. Re-baseline cost, not tokens — counts match Opus 4.8, prices don’t.
  10. A/B your prompts with old scaffolding removed — over-prescriptive prompts reduce Fable 5 quality.
  11. Run an effort sweep including medium and low for routine routes before defaulting everything to xhigh.
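
Items 1 to 4 are mechanical enough to script. The sketch below rewrites a request dictionary and returns notes on what it changed; the function name is my own, and it only flags a trailing assistant prefill rather than removing it, because the right replacement depends on what the prefill was doing.

```python
import copy
from typing import Any, Dict, List, Tuple

SAMPLING_KEYS = ("temperature", "top_p", "top_k")


def migrate_request(
    params: Dict[str, Any], target: str = "claude-fable-5"
) -> Tuple[Dict[str, Any], List[str]]:
    """Return (migrated_params, notes) without mutating the input."""
    out = copy.deepcopy(params)
    notes: List[str] = []
    out["model"] = target

    thinking = out.get("thinking")
    if isinstance(thinking, dict) and (
        thinking.get("type") != "adaptive" or "budget_tokens" in thinking
    ):
        del out["thinking"]
        notes.append("removed thinking config (only adaptive is accepted)")

    for key in SAMPLING_KEYS:
        if key in out:
            del out[key]
            notes.append(f"removed {key}")

    messages = out.get("messages") or []
    if messages and messages[-1].get("role") == "assistant":
        notes.append("trailing assistant prefill found: replace it by hand")

    return out, notes
```

Run it over recorded production requests first and read the notes; the prefill flag in particular tells you which call sites need a real redesign rather than a deletion.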

Closing

Fable 5 and Mythos 5 are the same model behind different doors — Glasswing participants get the Mythos badge, everyone else gets Fable, and nothing else differs. Against the Opus family, the differences are real and asymmetric: thinking is mandatory and private, refusals are a first-class response with their own retry machinery, sampling knobs and prefills are gone, data retention is a hard gate, and single turns can run as long as a coffee break. In exchange you get the strongest long-horizon agentic model Anthropic has shipped — one that rewards complete specs, honest delegation, and prompts that state goals instead of steps.

Pick by workload, not by version number. Opus 4.8 remains the right default for most demanding work, Sonnet 4.6 the right answer for most production traffic, and Fable 5 the model you reach for when the task is the kind that used to be impossible. The expensive model is only the right model when the task actually needs it — but when it does, nothing else comes close.