I am writing after a long time. I have been busy with my new work - my first job in Europe, at Wolt (acquired by DoorDash). That is another story for another post: moving countries, finding an apartment, figuring out how to deal with the winter, and learning a new language for writing grocery lists.
But today I want to share something practical - something I have been refining for months on personal projects (I am not taking the risk at work yet): using a coding agent to take a JIRA ticket and turn it into shipped, reviewed, merged code.
When it works, it feels like cheating. When it fails, it fails because I got lazy with the process. So this post is the process. It is long on purpose - I want a new developer to read this once and have everything they need to start tomorrow.
Why Bother With a Workflow?
Most people who are disappointed by coding agents are using them as a fancy autocomplete. You type, it suggests, you accept. That is fine for tiny tasks but it does not scale to “implement this ticket.”
The mindset shift is this: treat the agent like a capable junior engineer. A junior who is fast, has read every open-source project ever written, but who will confidently ship the wrong thing if you hand them a vague ticket and walk away. Your job is to give them structure. The structure is the workflow.
There are three parts to this: read the ticket, classify the work, run the right playbook. Let us walk through each one.
Part 1: Give the Agent Eyes - The JIRA MCP Tool
MCP (Model Context Protocol) is the reason any of this works. It is a standard way to give an agent access to external systems without baking API keys into prompts. Think of it as USB for AI tools - plug in a server, the agent can now call its functions.
For our workflow we need a JIRA MCP server that exposes at least these functions:
```yaml
tool: jira.get_ticket
input: { ticket_id: "WOLT-1234" }
output:
  key: string
  title: string
  description: string
  status: string
  type: string              # Story, Bug, Task, Epic, etc.
  acceptance_criteria: string[]
  labels: string[]
  linked_tickets: [{id, relation}]
  comments: [{author, timestamp, body}]

tool: jira.get_sub_tickets
input: { parent_id: "WOLT-1234" }
output: [{id, title, status, type, assignee}]

tool: jira.add_comment
input: { ticket_id, body }
```
Why these three? Because they cover the read-summarise-report loop. The agent reads the parent, walks into each sub-ticket, does the work, and reports back on the ticket where the PM is actually looking.
A small but important rule: give the agent read access to everything, but require human approval for write actions (creating tickets, transitioning status, posting comments). You want speed without surprises.
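To make the tool output concrete, here is a sketch of how `jira.get_ticket` might map JIRA's REST v2 issue payload onto the schema above. The standard field paths (`fields.summary`, `fields.status.name`, `fields.issuetype.name`) are real JIRA REST conventions; acceptance criteria is a custom field in most JIRA instances, so `customfield_10100` is a placeholder, and the `Ticket` class is my own illustration, not part of any MCP server.

```python
from dataclasses import dataclass, field

@dataclass
class Ticket:
    key: str
    title: str
    description: str
    status: str
    type: str
    acceptance_criteria: list = field(default_factory=list)
    labels: list = field(default_factory=list)

def parse_ticket(raw: dict) -> Ticket:
    """Map a JIRA REST v2 issue payload onto the schema the agent sees.

    'customfield_10100' stands in for your instance's acceptance
    criteria field ID - replace it with the real one.
    """
    f = raw["fields"]
    return Ticket(
        key=raw["key"],
        title=f["summary"],
        description=f.get("description") or "",
        status=f["status"]["name"],
        type=f["issuetype"]["name"],
        acceptance_criteria=f.get("customfield_10100") or [],
        labels=f.get("labels", []),
    )
```

The point of normalising early is that every playbook downstream sees the same shape, no matter how messy the raw JIRA payload is.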
Part 2: Summary and TODO Before a Single Line of Code
This is the step everyone wants to skip, and skipping it is the most common way I see agent workflows fail.
Before touching code, the agent produces two artifacts:
The Summary
Three to five sentences answering:
- What problem are we solving?
- Who is the user or stakeholder?
- What does “done” look like? What is not in scope?
- What assumptions is the agent making?
If the summary is wrong, you correct it in 30 seconds. If the code is wrong, you waste an afternoon.
The TODO List
One line per logical chunk of work. Usually this maps to sub-tickets, but often you need to split further. A good TODO item has:
- A verb (add, fix, extract, migrate, investigate)
- A noun (the thing being changed)
- An acceptance signal (how we know it is done)
Example prompt:
Use `jira.get_ticket` and `jira.get_sub_tickets` on WOLT-1234. Write a 5-sentence summary, a numbered TODO list, and list any assumptions you are making. Do NOT write code yet. Wait for my approval.
That last line is the seatbelt. Use it every time.
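The verb-noun-acceptance rule can even be made mechanical. Here is a toy linter sketch; the verb list and the `(done when: ...)` convention are my own assumptions, not a standard - adapt both to how your team writes TODO items.

```python
# Verbs I expect a well-formed TODO item to start with - extend as needed.
ACTION_VERBS = {"add", "fix", "extract", "migrate", "investigate",
                "remove", "rename", "document"}

def todo_lint(item: str) -> list:
    """Return a list of problems with a TODO item (empty = looks good).

    Assumed convention: items read 'verb noun... (done when: signal)'.
    """
    problems = []
    words = item.split()
    first_word = words[0].lower() if words else ""
    if first_word not in ACTION_VERBS:
        problems.append("does not start with an action verb")
    if "(done when:" not in item.lower():
        problems.append("missing an acceptance signal")
    return problems
```

For example, `todo_lint("fix discount rounding in checkout (done when: 33% case passes)")` comes back clean, while `todo_lint("the rounding is wrong")` flags both problems.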
Part 3: Classify Each Task
This is the heart of the workflow. A bug is not a feature. A refactor is not a migration. Treating them the same is how you ship broken code with confident commit messages.
I use five types. They are exhaustive enough for real work and simple enough for the agent to classify correctly ~95% of the time. For the other 5%, it asks.
| Type | What it means | Output shape | Risk profile |
|---|---|---|---|
| Bug | Something is broken or behaves wrong | Root cause + targeted fix + regression test | Medium - tempting to over-fix |
| New Feature | Net-new behaviour or capability | Design + implementation + full test suite | High - most likely to drift from intent |
| Refactor | Restructure without behaviour change | Diff-style changes, tests unchanged | Low if disciplined, high if not |
| Migration | Move data or systems | Forward script + rollback + runbook | High - hard to undo in production |
| Spike | Investigation or research | Findings document, not code | Low - no production impact |
```mermaid
flowchart TD
    Q1{Is behaviour different from expected?} -->|Yes| BUG[Bug]
    Q1 -->|No| Q2{Does this add new user-visible behaviour?}
    Q2 -->|Yes| FEAT[New Feature]
    Q2 -->|No| Q3{Is this a structure change with no behaviour change?}
    Q3 -->|Yes| REF[Refactor]
    Q3 -->|No| Q4{Are we moving data or swapping systems?}
    Q4 -->|Yes| MIG[Migration]
    Q4 -->|No| SPK[Spike]
    style BUG fill:#ef4444
    style FEAT fill:#3b82f6
    style REF fill:#8b5cf6
    style MIG fill:#f59e0b
    style SPK fill:#10b981
```
Prompt:
For every TODO item, classify it as Bug, New Feature, Refactor, Migration, or Spike. If any item is ambiguous or mixes types, stop and ask me. Do not classify by guessing.
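The decision tree above is small enough to write down directly. This sketch mirrors the flowchart's question order - in practice the agent answers each question from the ticket text and stops to ask when it cannot:

```python
def classify(broken: bool, new_behaviour: bool,
             structure_only: bool, moves_data: bool) -> str:
    """Walk the classification tree. Each flag answers one question
    from the flowchart; the first 'yes' wins, in flowchart order."""
    if broken:
        return "Bug"
    if new_behaviour:
        return "New Feature"
    if structure_only:
        return "Refactor"
    if moves_data:
        return "Migration"
    return "Spike"
```

The ordering matters: a data migration that also fixes broken behaviour classifies as a Bug first, which is exactly the "mixes types, stop and ask" situation the prompt guards against.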
Part 4: Run the Right Playbook
Now the work begins. Each type has its own script. Do not cross streams.
The Bug Playbook
Bugs have a natural shape: reproduce, isolate, fix, protect. The agent should never “just fix it.” The failing test must come first - it is the single best defence against a fix that only looks like one.
Steps:
- Reproduce. The agent pulls the ticket, follows the repro steps, confirms the bug. If it cannot reproduce, it asks.
- Write a failing test that captures the bug in plain language (`should_not_charge_twice_on_retry`). It must fail for the right reason.
- Root cause analysis. The agent reads the relevant code and writes a one-paragraph explanation of why this happens. Not “the total was wrong” but “`floor` was used instead of `round` in `calculatePricingWithDiscount`.”
- TDD fix. The smallest change that turns the test green. Resist the urge to refactor nearby code.
- Regression test. Often the bug has a sibling. If you fix `floor` here, check for `floor` in the tax code too.
- Run the full suite. Green everywhere, not just the new test.
- Branch named `fix/WOLT-1234-discount-rounding`.
- Commit with `fix(pricing): round discount amount (WOLT-1234)`.
- PR with: repro steps, root cause, fix summary, test list.
- Comment on the ticket with the PR link and a one-sentence summary.
Scenario: Customers report checkout totals are off by one cent on orders with percentage discounts. Agent reproduces, writes `should_round_discounted_total_correctly`, traces the bug to `Math.floor(price * (1 - discount) * 100) / 100`, fixes it to `Math.round`, adds a test for the 33% case that was silently failing, pushes.
Common pitfall: The agent fixes the symptom (the test case) instead of the cause. Require the root-cause paragraph before the fix.
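Here is the floor-versus-round class of bug from the scenario, sketched in Python rather than the JavaScript the scenario implies. The function names are hypothetical, and I use a 12.5% discount on 1.00 rather than the ticket's 33% case because 0.875 and 87.5 are exact in binary floating point, which makes the test deterministic (note that Python's `round` uses half-to-even):

```python
import math

def discounted_total_buggy(price: float, discount: float) -> float:
    # The bug: floor truncates partial cents downward.
    return math.floor(price * (1 - discount) * 100) / 100

def discounted_total_fixed(price: float, discount: float) -> float:
    # The fix: round to the nearest cent instead.
    return round(price * (1 - discount) * 100) / 100

def test_should_round_discounted_total_correctly():
    # 12.5% off 1.00 is 0.875 - exactly representable in binary,
    # so this test is deterministic. The correct total is 0.88.
    assert discounted_total_fixed(1.00, 0.125) == 0.88

def test_buggy_version_truncates():
    # Written first, against the buggy code, this documents the defect:
    # floor drops the half cent and charges 0.87.
    assert discounted_total_buggy(1.00, 0.125) == 0.87
```

Writing `test_should_round_discounted_total_correctly` before the fix, and watching it fail against the buggy version, is the whole point of the playbook's ordering.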
The New Feature Playbook
Features are where agents drift hardest. A vague ticket plus a capable agent equals 800 lines of plausible-looking code that solves the wrong problem. The antidote is designing first.
Steps:
- Read ticket + acceptance criteria. List any AC that is unclear. Ask.
- Design note (half a page). Data model, API shape, state transitions, three edge cases, what is explicitly out of scope.
- Human approval. You read the note. You redirect. The agent does not code until approved.
- Feature flag if the blast radius is bigger than a screen. Default off.
- Vertical slice first. End-to-end skeleton (API → service → DB → UI) before filling in details. This surfaces integration pain early.
- Unit tests for logic, integration tests for boundaries, one E2E for the happy path.
- Negative tests. What happens when the input is empty, malformed, unauthorized?
- Branch `feat/WOLT-1234-checkout-tipping`.
- PR links to the design note and the feature flag config.
- Comment on ticket with the PR link, design link, and how to enable the flag in staging.
Scenario: Add tipping at checkout. Agent writes a design note proposing a `Tip` value object with `{amount, currency, type: preset|custom}`, three edge cases (zero tip, tip exceeds cap, currency mismatch), a feature flag `checkout.tipping`, and a vertical slice that persists a tip through the order lifecycle. You approve. Agent implements, tests, opens PR.
Common pitfall: The agent keeps “helpfully” adding scope. Tell it: “Implement only what the design note describes. Anything new goes in the follow-up section of the PR description.”
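A value object like the `Tip` from the scenario might look like this sketch. The cap value and the decision to reject a zero tip are my assumptions (a real design note might treat zero as "no tip" instead); the three validation branches map to the three edge cases the note calls out:

```python
from dataclasses import dataclass

TIP_CAP = 20.00  # hypothetical per-order cap from the design note

@dataclass(frozen=True)
class Tip:
    amount: float
    currency: str
    type: str  # "preset" or "custom"

    def validate(self, order_currency: str) -> list:
        """Return validation errors for the three edge cases in the
        design note: zero/negative tip, cap exceeded, currency mismatch."""
        errors = []
        if self.amount <= 0:
            errors.append("tip must be positive")
        if self.amount > TIP_CAP:
            errors.append("tip exceeds cap")
        if self.currency != order_currency:
            errors.append("currency mismatch")
        return errors
```

Keeping validation on the value object means the vertical slice can exercise the edge cases end-to-end before any UI exists.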
The Refactor Playbook
Refactors are deceptively dangerous. The whole promise is “nothing changes” - which means any behaviour change is a bug you shipped while pretending to clean up. Tests are your lie detector.
Steps:
- Scope it tightly. “Extract `PricingEngine` from `OrderService`” is good. “Clean up the order module” is not.
- Check test coverage. If the area is not covered, write characterisation tests that pin down current behaviour (even weird behaviour - capture it all).
- Diff-style changes. Small, mechanical, reviewable. Move, rename, extract - in that order, not all at once.
- Run the full suite after every step. Green must stay green. Any red is a sign the refactor is changing behaviour.
- No fixes on the way. See a bug during refactor? File a ticket. Do not mix.
- Branch `refactor/WOLT-1234-extract-pricing-engine`.
- PR description starts with “No behaviour change.” Then explain the structural win.
Scenario: Pricing logic is tangled into `OrderService`. Agent writes characterisation tests for six pricing paths, extracts `PricingEngine` via three mechanical commits (move file, update imports, narrow interface), runs the suite after each, opens PR saying “No behaviour change. Reduces `OrderService` by 240 lines, isolates pricing for upcoming regional VAT work.”
Common pitfall: The agent “improves” logic while moving it. That is not a refactor, that is a behaviour change hiding inside one. Disallow it.
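Characterisation tests deserve a concrete example, because they feel backwards the first time: they assert what the code does today, not what it should do. Everything below is an invented stand-in for legacy logic, including the deliberately weird empty-cart branch:

```python
def legacy_shipping_fee(subtotal: float) -> float:
    """Stand-in for tangled legacy logic about to be refactored."""
    if subtotal >= 20:
        return 0.0
    if subtotal == 0:
        return 0.0   # weird: empty carts ship free - capture it anyway
    return 4.9

# Characterisation tests: pin down current behaviour, quirks included.
# They must stay green through every step of the refactor.
def test_free_shipping_over_threshold():
    assert legacy_shipping_fee(25.0) == 0.0

def test_empty_cart_quirk_is_preserved():
    assert legacy_shipping_fee(0.0) == 0.0

def test_standard_fee_below_threshold():
    assert legacy_shipping_fee(10.0) == 4.9
```

If the empty-cart quirk is actually a bug, that is a separate ticket - the characterisation test still pins it during the refactor, and the fix (with its own test change) lands afterwards.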
The Migration Playbook
Migrations touch data you cannot easily re-create. The rollback plan is not optional - it is the other half of the work. Until the rollback exists, the migration does not exist.
Steps:
- Define scope. Which table, which rows, which env? Estimate row counts.
- Forward script. Idempotent if possible. Logs what it did.
- Rollback script. Same commit. Tested. If it cannot be fully rolled back (rare, but real), call that out loudly in the PR.
- Dry run on staging with a representative data snapshot. Measure: duration, lock time, row counts before/after.
- Backfill strategy. For big tables: batched updates, not one giant transaction.
- Runbook. How to run, how to verify, how to revert, who to call if it explodes.
- Branch `migration/WOLT-1234-add-loyalty-tier`.
- PR includes the forward script, rollback, runbook, and dry-run results.
Scenario: Replace the freeform tier string with a `loyalty_tier_id` FK to a new lookup table. Agent writes: forward migration (add column nullable, backfill in batches of 5k, add `NOT NULL`), rollback (drop column), runbook with “how to verify counts match,” runs on staging against a 2M-row copy, reports 47-minute runtime with no locks held over 300ms. PR reviewed, scheduled, run.
Common pitfall: Agent writes forward migration only. Always require rollback in the same PR, or the PR does not merge.
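The batched backfill from the scenario can be sketched like this. I am using SQLite as a stand-in for the real database, and the table and column names (`customers`, `loyalty_tiers`, `tier`, `loyalty_tier_id`) are hypothetical; the point is the shape - small batches, a commit per batch so locks stay short, and a loop that provably terminates:

```python
import sqlite3

BATCH_SIZE = 5000  # matches the batch size from the scenario

def backfill_loyalty_tier(conn: sqlite3.Connection) -> int:
    """Backfill loyalty_tier_id from the legacy tier string in batches.

    Only rows whose tier name exists in the lookup table are selected,
    so rows with unknown tiers are skipped (and the loop terminates)
    rather than being re-updated to NULL forever. Returns rows updated.
    """
    total = 0
    while True:
        cur = conn.execute(
            """UPDATE customers
               SET loyalty_tier_id =
                   (SELECT id FROM loyalty_tiers
                    WHERE loyalty_tiers.name = customers.tier)
               WHERE rowid IN (
                   SELECT rowid FROM customers
                   WHERE loyalty_tier_id IS NULL
                     AND tier IN (SELECT name FROM loyalty_tiers)
                   LIMIT ?)""",
            (BATCH_SIZE,),
        )
        n = cur.rowcount
        conn.commit()   # one short transaction per batch
        if n == 0:
            return total
        total += n
```

On a real production database you would also log each batch and sleep between them to let replication catch up; the runbook is where those operational details live.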
The Spike Playbook
Spikes are investigations. The deliverable is knowledge, not code. The mistake is letting spike code sneak into main.
Steps:
- Frame the question in one sentence. “Should we use WebSockets or SSE for live order tracking?”
- Define what success looks like. What data would change your mind? Latency under X? Cost under Y? Operational simplicity?
- Throwaway prototype on a scratch branch or notebook. No production code. No merging.
- Measure or compare - numbers if you can, tradeoffs if you cannot.
- Findings document: what was tried, what was learned, what surprised you, what is still unknown.
- Recommendation with your confidence level. “Use SSE. High confidence for this use case, low confidence if we add bidirectional features later.”
- Post findings on the ticket, link the prototype branch read-only.
- File follow-up tickets for the chosen path.
Scenario: “How should we push live order status to the customer app?” Agent prototypes WebSockets and SSE, measures reconnection behaviour on flaky networks, checks infra cost estimates, writes a 1-page findings doc recommending SSE with a fallback to polling. No production code lands. Three follow-up tickets get filed for the implementation.
Common pitfall: Spike code gets promoted to production because “it already works.” It does not. It is a sketch. Require a fresh implementation ticket.
Putting It All Together
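The whole flow - read the ticket, summarise, get approval, classify, run the right playbook - reduces to a small dispatcher. The playbook functions here are placeholders standing in for the five scripts above; the structural point is that every item arrives pre-classified and pre-approved, and anything unclassified stops the run instead of guessing:

```python
# Placeholder playbooks - in reality each runs the full script from Part 4.
PLAYBOOKS = {
    "Bug": lambda item: f"bug playbook: {item}",
    "New Feature": lambda item: f"feature playbook: {item}",
    "Refactor": lambda item: f"refactor playbook: {item}",
    "Migration": lambda item: f"migration playbook: {item}",
    "Spike": lambda item: f"spike playbook: {item}",
}

def run_ticket(todo_items: list) -> list:
    """Route each (item, type) pair to its playbook.

    Items are assumed to be already summarised, classified, and
    human-approved; an unknown type halts the run - the agent asks,
    it never guesses.
    """
    results = []
    for item, kind in todo_items:
        if kind not in PLAYBOOKS:
            raise ValueError(f"unclassified item, ask the human: {item}")
        results.append(PLAYBOOKS[kind](item))
    return results
```

That `ValueError` is the code-shaped version of the seatbelt from Part 2: ambiguity stops the machine and pulls a human back in.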
Lessons From Using This Day to Day
- Always demand the summary first. The cost is 30 seconds. The value is catching misunderstandings before they become code.
- Reward the agent for asking. An agent that says “Is this a bug or a refactor?” is an agent that will not silently ship the wrong playbook. A confident agent that never asks is scarier than a cautious one.
- One ticket at a time. Agents drift on big batches. Keep the unit of work small enough that you can hold the context in your head.
- Review the diff like a human wrote it. Because one did - just a digital one. Look for plausible-but-wrong patterns: the right library used the wrong way, the right pattern applied in the wrong layer.
- Keep playbooks in a file in the repo, such as `AGENTS.md` or `.claude/playbooks.md`. The agent reads it every session. When a playbook step proves wrong, update the file - not a one-off prompt.
- Branch and commit naming is non-negotiable. `fix/`, `feat/`, `refactor/`, `migration/`, `spike/` prefixes with the ticket ID. This is how Future You finds things.
- Never let the agent transition ticket status or merge on its own. Read is free; write should require a human nod.
- Measure your own trust. Over a month, note how often the agent’s first attempt is acceptable per task type. Mine is roughly: Bug 80%, Refactor 75%, Feature 50%, Migration 40%, Spike 90%. That tells me where to keep the shortest leash.
Closing
You do not need a fancy framework to make coding agents genuinely useful. You need three things and a habit:
- MCP access to JIRA so the agent reads tickets like you do.
- A summary-before-code habit so you catch misunderstandings cheap.
- Five playbooks keyed to task type, written down in the repo.
Start with the Bug playbook tomorrow. It is the easiest to trust because the test either goes green or it does not. Once you see it work, add the Refactor playbook. Then Feature. Migration and Spike are the hardest - save them for when the others feel boring.
It is good to be back writing. More from Helsinki soon - I promised I would tell the Wolt story, and I will.