I am writing after a long time. I have been busy with my new work - my first job in Europe, at Wolt (acquired by DoorDash). That is another story for another post: moving countries, finding an apartment, figuring out how to deal with the winter, and learning a new language for writing grocery lists.
But today I want to share something practical - something I have been refining for months on personal projects (I am not taking the risk at work yet): using a coding agent to take a JIRA ticket and turn it into shipped, reviewed, merged code.
When it works, it feels like cheating. When it fails, it fails because I got lazy with the process. So this post is the process. It is long on purpose - I want a new developer to read this once and have everything they need to start tomorrow.
Why Bother With a Workflow?
Most people who are disappointed by coding agents are using them as a fancy autocomplete. You type, it suggests, you accept. That is fine for tiny tasks but it does not scale to “implement this ticket.”
The mindset shift is this: treat the agent like a capable junior engineer. A junior who is fast, has read every open-source project ever written, but who will confidently ship the wrong thing if you hand them a vague ticket and walk away. Your job is to give them structure. The structure is the workflow.
There are three parts to this: read the ticket, classify the work, run the right playbook. Let us walk through each one.
Part 1: Give the Agent Eyes - The JIRA MCP Tool
MCP (Model Context Protocol) is the reason any of this works. It is a standard way to give an agent access to external systems without baking API keys into prompts. Think of it as USB for AI tools - plug in a server, the agent can now call its functions.
For our workflow we need a JIRA MCP server that exposes at least these functions:
```yaml
tool: jira.get_ticket
input: { ticket_id: "WOLT-1234" }
output:
  key: string
  title: string
  description: string
  status: string
  type: string              # Story, Bug, Task, Epic, etc.
  acceptance_criteria: string[]
  labels: string[]
  linked_tickets: [{id, relation}]
  comments: [{author, timestamp, body}]

tool: jira.get_sub_tickets
input: { parent_id: "WOLT-1234" }
output: [{id, title, status, type, assignee}]

tool: jira.add_comment
input: { ticket_id, body }
```
Why these three? Because they cover the read-summarise-report loop. The agent reads the parent, walks into each sub-ticket, does the work, and reports back on the ticket where the PM is actually looking.
A small but important rule: give the agent read access to everything, but require human approval for write actions (creating tickets, transitioning status, posting comments). You want speed without surprises.
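To make the tool output concrete, here is a sketch of how `jira.get_ticket` might map JIRA's REST v2 issue payload onto the schema above. The standard field paths (`fields.summary`, `fields.status.name`, `fields.issuetype.name`) are real JIRA REST conventions; acceptance criteria is a custom field in most JIRA instances, so `customfield_10100` is a placeholder, and the `Ticket` class is my own illustration, not part of any MCP server.

```python
from dataclasses import dataclass, field

@dataclass
class Ticket:
    key: str
    title: str
    description: str
    status: str
    type: str
    acceptance_criteria: list = field(default_factory=list)
    labels: list = field(default_factory=list)

def parse_ticket(raw: dict) -> Ticket:
    """Map a JIRA REST v2 issue payload onto the schema the agent sees.

    'customfield_10100' stands in for your instance's acceptance
    criteria field ID - replace it with the real one.
    """
    f = raw["fields"]
    return Ticket(
        key=raw["key"],
        title=f["summary"],
        description=f.get("description") or "",
        status=f["status"]["name"],
        type=f["issuetype"]["name"],
        acceptance_criteria=f.get("customfield_10100") or [],
        labels=f.get("labels", []),
    )
```

The point of normalising early is that every playbook downstream sees the same shape, no matter how messy the raw JIRA payload is.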
Part 2: Summary and TODO Before a Single Line of Code
This is the step everyone wants to skip, and skipping it is the most common way I see agent workflows fail.
Before touching code, the agent produces two artifacts:
The Summary
Three to five sentences answering:
- What problem are we solving?
- Who is the user or stakeholder?
- What does “done” look like? What is not in scope?
- What assumptions is the agent making?
If the summary is wrong, you correct it in 30 seconds. If the code is wrong, you waste an afternoon.
The TODO List
One line per logical chunk of work. Usually this maps to sub-tickets, but often you need to split further. A good TODO item has:
- A verb (add, fix, extract, migrate, investigate)
- A noun (the thing being changed)
- An acceptance signal (how we know it is done)
Example prompt:
Use `jira.get_ticket` and `jira.get_sub_tickets` on WOLT-1234. Write a 5-sentence summary, a numbered TODO list, and list any assumptions you are making. Do NOT write code yet. Wait for my approval.
That last line is the seatbelt. Use it every time.
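The verb-noun-acceptance rule can even be made mechanical. Here is a toy linter sketch; the verb list and the `(done when: ...)` convention are my own assumptions, not a standard - adapt both to how your team writes TODO items.

```python
# Verbs I expect a well-formed TODO item to start with - extend as needed.
ACTION_VERBS = {"add", "fix", "extract", "migrate", "investigate",
                "remove", "rename", "document"}

def todo_lint(item: str) -> list:
    """Return a list of problems with a TODO item (empty = looks good).

    Assumed convention: items read 'verb noun... (done when: signal)'.
    """
    problems = []
    words = item.split()
    first_word = words[0].lower() if words else ""
    if first_word not in ACTION_VERBS:
        problems.append("does not start with an action verb")
    if "(done when:" not in item.lower():
        problems.append("missing an acceptance signal")
    return problems
```

For example, `todo_lint("fix discount rounding in checkout (done when: 33% case passes)")` comes back clean, while `todo_lint("the rounding is wrong")` flags both problems.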
Part 3: Classify Each Task
This is the heart of the workflow. A bug is not a feature. A refactor is not a migration. Treating them the same is how you ship broken code with confident commit messages.
I use five types. They are exhaustive enough for real work and simple enough for the agent to classify correctly ~95% of the time. For the other 5%, it asks.
| Type | What it means | Output shape | Risk profile |
|---|---|---|---|
| Bug | Something is broken or behaves wrong | Root cause + targeted fix + regression test | Medium - tempting to over-fix |
| New Feature | Net-new behaviour or capability | Design + implementation + full test suite | High - most likely to drift from intent |
| Refactor | Restructure without behaviour change | Diff-style changes, tests unchanged | Low if disciplined, high if not |
| Migration | Move data or systems | Forward script + rollback + runbook | High - hard to undo in production |
| Spike | Investigation or research | Findings document, not code | Low - no production impact |
```mermaid
flowchart TD
    Q1{Is behaviour different from expected?} -->|Yes| BUG[Bug]
    Q1 -->|No| Q2{Does this add new user-visible behaviour?}
    Q2 -->|Yes| FEAT[New Feature]
    Q2 -->|No| Q3{Is this a structure change with no behaviour change?}
    Q3 -->|Yes| REF[Refactor]
    Q3 -->|No| Q4{Are we moving data or swapping systems?}
    Q4 -->|Yes| MIG[Migration]
    Q4 -->|No| SPK[Spike]
    style BUG fill:#ef4444
    style FEAT fill:#3b82f6
    style REF fill:#8b5cf6
    style MIG fill:#f59e0b
    style SPK fill:#10b981
```
Prompt:
For every TODO item, classify it as Bug, New Feature, Refactor, Migration, or Spike. If any item is ambiguous or mixes types, stop and ask me. Do not classify by guessing.
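The decision tree above is small enough to write down directly. This sketch mirrors the flowchart's question order - in practice the agent answers each question from the ticket text and stops to ask when it cannot:

```python
def classify(broken: bool, new_behaviour: bool,
             structure_only: bool, moves_data: bool) -> str:
    """Walk the classification tree. Each flag answers one question
    from the flowchart; the first 'yes' wins, in flowchart order."""
    if broken:
        return "Bug"
    if new_behaviour:
        return "New Feature"
    if structure_only:
        return "Refactor"
    if moves_data:
        return "Migration"
    return "Spike"
```

The ordering matters: a data migration that also fixes broken behaviour classifies as a Bug first, which is exactly the "mixes types, stop and ask" situation the prompt guards against.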
Part 4: Run the Right Playbook
Now the work begins. Each type has its own script. Do not cross streams.
The Bug Playbook
Bugs have a natural shape: reproduce, isolate, fix, protect. The agent should never “just fix it.” The failing test must come first - it is the single best defence against a fix that only looks like one.
Steps:
- Reproduce. The agent pulls the ticket, follows the repro steps, confirms the bug. If it cannot reproduce, it asks.
- Write a failing test that captures the bug in plain language (`should_not_charge_twice_on_retry`). It must fail for the right reason.
- Root cause analysis. The agent reads the relevant code and writes a one-paragraph explanation of why this happens. Not “the total was wrong” but “`floor` was used instead of `round` in `calculatePricingWithDiscount`.”
- TDD fix. The smallest change that turns the test green. Resist the urge to refactor nearby code.
- Regression test. Often the bug has a sibling. If you fix `floor` here, check for `floor` in the tax code too.
- Run the full suite. Green everywhere, not just the new test.
- Branch named `fix/WOLT-1234-discount-rounding`.
- Commit with `fix(pricing): round discount amount (WOLT-1234)`.
- PR with: repro steps, root cause, fix summary, test list.
- Comment on the ticket with the PR link and a one-sentence summary.
Scenario: Customers report checkout totals are off by one cent on orders with percentage discounts. Agent reproduces, writes `should_round_discounted_total_correctly`, traces the bug to `Math.floor(price * (1 - discount) * 100) / 100`, fixes it to `Math.round`, adds a test for the 33% case that was silently failing, pushes.
Common pitfall: The agent fixes the symptom (the test case) instead of the cause. Require the root-cause paragraph before the fix.
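Here is the floor-versus-round class of bug from the scenario, sketched in Python rather than the JavaScript the scenario implies. The function names are hypothetical, and I use a 12.5% discount on 1.00 rather than the ticket's 33% case because 0.875 and 87.5 are exact in binary floating point, which makes the test deterministic (note that Python's `round` uses half-to-even):

```python
import math

def discounted_total_buggy(price: float, discount: float) -> float:
    # The bug: floor truncates partial cents downward.
    return math.floor(price * (1 - discount) * 100) / 100

def discounted_total_fixed(price: float, discount: float) -> float:
    # The fix: round to the nearest cent instead.
    return round(price * (1 - discount) * 100) / 100

def test_should_round_discounted_total_correctly():
    # 12.5% off 1.00 is 0.875 - exactly representable in binary,
    # so this test is deterministic. The correct total is 0.88.
    assert discounted_total_fixed(1.00, 0.125) == 0.88

def test_buggy_version_truncates():
    # Written first, against the buggy code, this documents the defect:
    # floor drops the half cent and charges 0.87.
    assert discounted_total_buggy(1.00, 0.125) == 0.87
```

Writing `test_should_round_discounted_total_correctly` before the fix, and watching it fail against the buggy version, is the whole point of the playbook's ordering.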
The New Feature Playbook
Features are where agents drift hardest. A vague ticket plus a capable agent equals 800 lines of plausible-looking code that solves the wrong problem. The antidote is designing first.
Steps:
- Read ticket + acceptance criteria. List any AC that is unclear. Ask.
- Design note (half a page). Data model, API shape, state transitions, three edge cases, what is explicitly out of scope.
- Human approval. You read the note. You redirect. The agent does not code until approved.
- Feature flag if the blast radius is bigger than a screen. Default off.
- Vertical slice first. End-to-end skeleton (API → service → DB → UI) before filling in details. This surfaces integration pain early.
- Unit tests for logic, integration tests for boundaries, one E2E for the happy path.
- Negative tests. What happens when the input is empty, malformed, unauthorized?
- Branch `feat/WOLT-1234-checkout-tipping`.
- PR links to the design note and the feature flag config.
- Comment on ticket with the PR link, design link, and how to enable the flag in staging.
Scenario: Add tipping at checkout. Agent writes a design note proposing a `Tip` value object with `{amount, currency, type: preset|custom}`, three edge cases (zero tip, tip exceeds cap, currency mismatch), a feature flag `checkout.tipping`, and a vertical slice that persists a tip through the order lifecycle. You approve. Agent implements, tests, opens PR.
Common pitfall: The agent keeps “helpfully” adding scope. Tell it: “Implement only what the design note describes. Anything new goes in the follow-up section of the PR description.”
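A value object like the `Tip` from the scenario might look like this sketch. The cap value and the decision to reject a zero tip are my assumptions (a real design note might treat zero as "no tip" instead); the three validation branches map to the three edge cases the note calls out:

```python
from dataclasses import dataclass

TIP_CAP = 20.00  # hypothetical per-order cap from the design note

@dataclass(frozen=True)
class Tip:
    amount: float
    currency: str
    type: str  # "preset" or "custom"

    def validate(self, order_currency: str) -> list:
        """Return validation errors for the three edge cases in the
        design note: zero/negative tip, cap exceeded, currency mismatch."""
        errors = []
        if self.amount <= 0:
            errors.append("tip must be positive")
        if self.amount > TIP_CAP:
            errors.append("tip exceeds cap")
        if self.currency != order_currency:
            errors.append("currency mismatch")
        return errors
```

Keeping validation on the value object means the vertical slice can exercise the edge cases end-to-end before any UI exists.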
The Refactor Playbook
Refactors are deceptively dangerous. The whole promise is “nothing changes” - which means any behaviour change is a bug you shipped while pretending to clean up. Tests are your lie detector.
Steps:
- Scope it tightly. “Extract `PricingEngine` from `OrderService`” is good. “Clean up the order module” is not.
- Check test coverage. If the area is not covered, write characterisation tests that pin down current behaviour (even weird behaviour - capture it all).
- Diff-style changes. Small, mechanical, reviewable. Move, rename, extract - in that order, not all at once.
- Run the full suite after every step. Green must stay green. Any red is a sign the refactor is changing behaviour.
- No fixes on the way. See a bug during refactor? File a ticket. Do not mix.
- Branch `refactor/WOLT-1234-extract-pricing-engine`.
- PR description starts with “No behaviour change.” Then explain the structural win.
Scenario: Pricing logic is tangled into `OrderService`. Agent writes characterisation tests for six pricing paths, extracts `PricingEngine` via three mechanical commits (move file, update imports, narrow interface), runs the suite after each, opens PR saying “No behaviour change. Reduces `OrderService` by 240 lines, isolates pricing for upcoming regional VAT work.”
Common pitfall: The agent “improves” logic while moving it. That is not a refactor, that is a behaviour change hiding inside one. Disallow it.
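Characterisation tests deserve a concrete example, because they feel backwards the first time: they assert what the code does today, not what it should do. Everything below is an invented stand-in for legacy logic, including the deliberately weird empty-cart branch:

```python
def legacy_shipping_fee(subtotal: float) -> float:
    """Stand-in for tangled legacy logic about to be refactored."""
    if subtotal >= 20:
        return 0.0
    if subtotal == 0:
        return 0.0   # weird: empty carts ship free - capture it anyway
    return 4.9

# Characterisation tests: pin down current behaviour, quirks included.
# They must stay green through every step of the refactor.
def test_free_shipping_over_threshold():
    assert legacy_shipping_fee(25.0) == 0.0

def test_empty_cart_quirk_is_preserved():
    assert legacy_shipping_fee(0.0) == 0.0

def test_standard_fee_below_threshold():
    assert legacy_shipping_fee(10.0) == 4.9
```

If the empty-cart quirk is actually a bug, that is a separate ticket - the characterisation test still pins it during the refactor, and the fix (with its own test change) lands afterwards.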
The Migration Playbook
Migrations touch data you cannot easily re-create. The rollback plan is not optional - it is the other half of the work. Until the rollback exists, the migration does not exist.
Steps:
- Define scope. Which table, which rows, which env? Estimate row counts.
- Forward script. Idempotent if possible. Logs what it did.
- Rollback script. Same commit. Tested. If it cannot be fully rolled back (rare, but real), call that out loudly in the PR.
- Dry run on staging with a representative data snapshot. Measure: duration, lock time, row counts before/after.
- Backfill strategy. For big tables: batched updates, not one giant transaction.
- Runbook. How to run, how to verify, how to revert, who to call if it explodes.
- Branch `migration/WOLT-1234-add-loyalty-tier`.
- PR includes the forward script, rollback, runbook, and dry-run results.
Scenario: Replace the freeform tier string with a `loyalty_tier_id` FK to a new lookup table. Agent writes: forward migration (add column nullable, backfill in batches of 5k, add `NOT NULL`), rollback (drop column), runbook with “how to verify counts match,” runs on staging against a 2M-row copy, reports 47-minute runtime with no locks held over 300ms. PR reviewed, scheduled, run.
Common pitfall: Agent writes forward migration only. Always require rollback in the same PR, or the PR does not merge.
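The batched backfill from the scenario can be sketched like this. I am using SQLite as a stand-in for the real database, and the table and column names (`customers`, `loyalty_tiers`, `tier`, `loyalty_tier_id`) are hypothetical; the point is the shape - small batches, a commit per batch so locks stay short, and a loop that provably terminates:

```python
import sqlite3

BATCH_SIZE = 5000  # matches the batch size from the scenario

def backfill_loyalty_tier(conn: sqlite3.Connection) -> int:
    """Backfill loyalty_tier_id from the legacy tier string in batches.

    Only rows whose tier name exists in the lookup table are selected,
    so rows with unknown tiers are skipped (and the loop terminates)
    rather than being re-updated to NULL forever. Returns rows updated.
    """
    total = 0
    while True:
        cur = conn.execute(
            """UPDATE customers
               SET loyalty_tier_id =
                   (SELECT id FROM loyalty_tiers
                    WHERE loyalty_tiers.name = customers.tier)
               WHERE rowid IN (
                   SELECT rowid FROM customers
                   WHERE loyalty_tier_id IS NULL
                     AND tier IN (SELECT name FROM loyalty_tiers)
                   LIMIT ?)""",
            (BATCH_SIZE,),
        )
        n = cur.rowcount
        conn.commit()   # one short transaction per batch
        if n == 0:
            return total
        total += n
```

On a real production database you would also log each batch and sleep between them to let replication catch up; the runbook is where those operational details live.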
The Spike Playbook
Spikes are investigations. The deliverable is knowledge, not code. The mistake is letting spike code sneak into main.
Steps:
- Frame the question in one sentence. “Should we use WebSockets or SSE for live order tracking?”
- Define what success looks like. What data would change your mind? Latency under X? Cost under Y? Operational simplicity?
- Throwaway prototype on a scratch branch or notebook. No production code. No merging.
- Measure or compare - numbers if you can, tradeoffs if you cannot.
- Findings document: what was tried, what was learned, what surprised you, what is still unknown.
- Recommendation with your confidence level. “Use SSE. High confidence for this use case, low confidence if we add bidirectional features later.”
- Post findings on the ticket, link the prototype branch read-only.
- File follow-up tickets for the chosen path.
Scenario: “How should we push live order status to the customer app?” Agent prototypes WebSockets and SSE, measures reconnection behaviour on flaky networks, checks infra cost estimates, writes a 1-page findings doc recommending SSE with a fallback to polling. No production code lands. Three follow-up tickets get filed for the implementation.
Common pitfall: Spike code gets promoted to production because “it already works.” It does not. It is a sketch. Require a fresh implementation ticket.
Putting It All Together
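The whole flow - read the ticket, summarise, get approval, classify, run the right playbook - reduces to a small dispatcher. The playbook functions here are placeholders standing in for the five scripts above; the structural point is that every item arrives pre-classified and pre-approved, and anything unclassified stops the run instead of guessing:

```python
# Placeholder playbooks - in reality each runs the full script from Part 4.
PLAYBOOKS = {
    "Bug": lambda item: f"bug playbook: {item}",
    "New Feature": lambda item: f"feature playbook: {item}",
    "Refactor": lambda item: f"refactor playbook: {item}",
    "Migration": lambda item: f"migration playbook: {item}",
    "Spike": lambda item: f"spike playbook: {item}",
}

def run_ticket(todo_items: list) -> list:
    """Route each (item, type) pair to its playbook.

    Items are assumed to be already summarised, classified, and
    human-approved; an unknown type halts the run - the agent asks,
    it never guesses.
    """
    results = []
    for item, kind in todo_items:
        if kind not in PLAYBOOKS:
            raise ValueError(f"unclassified item, ask the human: {item}")
        results.append(PLAYBOOKS[kind](item))
    return results
```

That `ValueError` is the code-shaped version of the seatbelt from Part 2: ambiguity stops the machine and pulls a human back in.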
Lessons From Using This Day to Day
- Always demand the summary first. The cost is 30 seconds. The value is catching misunderstandings before they become code.
- Reward the agent for asking. An agent that says “Is this a bug or a refactor?” is an agent that will not silently ship the wrong playbook. A confident agent that never asks is scarier than a cautious one.
- One ticket at a time. Agents drift on big batches. Keep the unit of work small enough that you can hold the context in your head.
- Review the diff like a human wrote it. Because one did - just a digital one. Look for plausible-but-wrong patterns: the right library used the wrong way, the right pattern applied in the wrong layer.
- Keep playbooks in a file in the repo, such as `AGENTS.md` or `.claude/playbooks.md`. The agent reads it every session. When a playbook step proves wrong, update the file - not a one-off prompt.
- Branch and commit naming is non-negotiable. `fix/`, `feat/`, `refactor/`, `migration/`, `spike/` prefixes with the ticket ID. This is how Future You finds things.
- Never let the agent transition ticket status or merge on its own. Read is free; write should require a human nod.
- Measure your own trust. Over a month, note how often the agent’s first attempt is acceptable per task type. Mine is roughly: Bug 80%, Refactor 75%, Feature 50%, Migration 40%, Spike 90%. That tells me where to keep the shortest leash.
Closing
You do not need a fancy framework to make coding agents genuinely useful. You need three things and a habit:
- MCP access to JIRA so the agent reads tickets like you do.
- A summary-before-code habit so you catch misunderstandings cheap.
- Five playbooks keyed to task type, written down in the repo.
Start with the Bug playbook tomorrow. It is the easiest to trust because the test either goes green or it does not. Once you see it work, add the Refactor playbook. Then Feature. Migration and Spike are the hardest - save them for when the others feel boring.
It is good to be back writing. More from Helsinki soon - I promised I would tell the Wolt story, and I will.