AI-Powered Backlog Refinement Isn’t Evil (or is it?)

Backlog refinement has always been a tedious task (whether using Scrum, Waterfall, Kanban, etc.): the Product Manager and a few dedicated allies sift through hundreds of support tickets, Slack rants, and “wouldn’t-it-be-nice” ideas to shape them into something the developers can understand and even estimate.

We’ve run this process with dozens of product managers, using:

sticky-note walls
spreadsheets
late-night copy-paste marathons

It works, but it eats into the very time we should spend talking with developers about trade-offs and value. Our developers’ time is precious, and we must be good stewards of it. There is no shortcut for collaboration, though. That’s the messy part: we have to do it if we want collective wisdom, a shared understanding of the goal, and a unified team commitment to deliver.

That’s why we’re turning to AI. AI-powered backlog refinement isn’t evil; letting it replace conversation and collaboration is. Let’s explore a script that takes raw customer quotes, drafts clear stories, and flags initial risks in minutes, giving us a backlog that’s ready for discussion, not just transcription. We’re not replacing the conversation; far from it. Of course, your environment has its own nuances and differences. My goal is to share an idea and let you figure out how to apply it in your world (if you want).

In my examples, we free up the hour we usually spend formatting tickets so we can focus on debating customer impact, design options, and testing strategies with the team. The result: faster preparation, richer dialogue, and human judgment kept where it truly matters.

Product management is a job: regardless of how you build your product, you need to understand market needs and how your product is performing. In Scrum, the Product Owner is the role that does this :)

MediConnect Tele-Health Portal

Our test bed is MediConnect, a telehealth portal that allows patients to book video appointments, upload lab results, and receive e-prescriptions, while providers manage schedules and share records.

Our two user groups (patients seeking convenience and clinicians focused on workflow efficiency) raise different pain points each week. By collecting their help-desk tickets, post-visit survey comments, and notes from doctor feedback sessions into a single quote list, we get an unfiltered pulse on what breaks, what delights, and what’s still missing.

This raw feedback from patients and providers is exactly what we feed the AI, which turns scattered anecdotes into backlog items the team can act on. We extracted 50 real-world-style quotes (38 frustrations, 12 delights) from support tickets, post-visit surveys, and provider interviews (download the CSV file here).
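Before spending API calls, it can help to confirm the file holds the split you expect (for example, 38 frustrations and 12 delights). A minimal sketch, assuming the CSV uses the `category, quote` columns described here and that category values are labels such as `frustration` and `delight` (the exact labels in the downloadable file are an assumption; the delight row below is made up for illustration):

```python
import csv
import io
from collections import Counter
from typing import Iterable


def tally_categories(lines: Iterable[str]) -> Counter:
    """Count quotes per category in a CSV with 'category' and 'quote' columns."""
    reader = csv.DictReader(lines)
    # Normalize labels so "Frustration " and "frustration" are counted together
    return Counter(row["category"].strip().lower() for row in reader)


# Tiny inline sample; in practice pass open("quotes.csv", newline="", encoding="utf-8")
sample = io.StringIO(
    "category,quote\n"
    "frustration,The video call froze twice during my appointment.\n"
    "frustration,Uploading lab results from my phone is confusing.\n"
    "delight,Booking a video visit took less than a minute.\n"  # hypothetical row
)
print(tally_categories(sample))  # Counter({'frustration': 2, 'delight': 1})
```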

The Process In Four Steps

Step	Action	Output
1. Harvest quotes	Export support tickets, survey comments, and provider notes into a single CSV (category, quote).	50-row data file with frustrations and delights.
2. Transform with GPT-4o	Run `quote2story.py`; each quote is sent through a structured prompt.	JSON bundle: user story, job story, 3 acceptance criteria, risk flag.
3. Append to backlog table	Script appends one row per quote to a copy of your Confluence/wiki HTML table; CI publishes the file.	Live backlog grows with INVEST-style draft items.
4. Human review	PO and team tweak wording, slice high-risk stories, and prioritize.	“Ready” backlog items in minutes, not meetings.
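Step 1 is usually a manual export, but the merge into a single CSV can be scripted. A minimal sketch, assuming each source (tickets, surveys, interview notes) has already been reduced to `(category, quote)` pairs; the source variables and the whitespace and duplicate rules are hypothetical choices, not part of quote2story.py:

```python
import csv
import io
from typing import Iterable, TextIO, Tuple

# Hypothetical shape: whatever your support, survey, and interview tools export,
# reduced to (category, quote) pairs before merging.
Source = Iterable[Tuple[str, str]]


def write_quotes_csv(out: TextIO, *sources: Source) -> int:
    """Merge sources into one CSV with columns category, quote; return rows written."""
    writer = csv.DictWriter(out, fieldnames=["category", "quote"])
    writer.writeheader()
    seen = set()
    written = 0
    for source in sources:
        for category, quote in source:
            text = " ".join(quote.split())   # collapse stray whitespace and newlines
            if not text or text in seen:     # skip blanks and exact duplicates
                continue
            seen.add(text)
            writer.writerow({"category": category.strip().lower(), "quote": text})
            written += 1
    return written


tickets = [("frustration", "The video call froze twice during my appointment.")]
surveys = [("frustration", "The video call  froze twice during my appointment."),
           ("frustration", "Appointment slots are always shown in UTC.")]
buf = io.StringIO()  # for a real file: open("quotes.csv", "w", newline="", encoding="utf-8")
print(write_quotes_csv(buf, tickets, surveys))  # 2 (the duplicate survey row is skipped)
```

Deduplicating on normalized text is a deliberately simple choice; the same complaint phrased differently will still appear twice, which the human review in step 4 can catch.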

The Script

Here is our quote2story.py script. You will likely need to adjust it to work in your organization.

```python
#!/usr/bin/env python3
"""
quote2story.py
Convert customer quotes to backlog-ready items.

Inputs
------
• CSV with columns: category, quote
• OPENAI_API_KEY environment variable
• BACKLOG_TABLE_URL (raw HTML of an existing table in your wiki)

Outputs
-------
Writes backlog_table.html: the fetched table plus one new <tr> per quote containing:
  • Customer quote
  • User story (INVEST-ready)
  • Job story (Intercom style)
  • 3 acceptance criteria (Given/When/Then)
  • Risk flag (Low | Medium | High)
CI then commits that file to the wiki.

Run:  python quote2story.py mediconnect_customer_quotes_50_ascii.csv
"""
import csv
import json
import os
import sys
from typing import Any, Dict, List

import bs4                   # pip install beautifulsoup4
import requests              # pip install requests
from openai import OpenAI    # pip install "openai>=1.0"

# --------------------------- CONFIG -----------------------------------------
TABLE_URL   = os.getenv("BACKLOG_TABLE_URL")   # raw HTML of backlog table
OUTPUT_FILE = "backlog_table.html"             # committed to the wiki by CI
MODEL       = "gpt-4o"

PROMPT_TMPL = """
You are a senior Product Owner.
For the following customer QUOTE, return a JSON object with these keys:
user_story  : "As a <role>, I want <goal> so that <benefit>."
job_story   : "When <situation>, I want <motivation>, so I can <expected_outcome>."
criteria    : list of exactly 3 acceptance-criteria strings (Given / When / Then).
risk        : "Low", "Medium", or "High" based on clarity, dependencies, unknowns.
QUOTE: "{quote}"
"""
# ---------------------------------------------------------------------------


def generate_backlog_item(client: OpenAI, quote: str) -> Dict[str, Any]:
    """Call OpenAI and return the parsed JSON response."""
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": PROMPT_TMPL.format(quote=quote)}],
        temperature=0.2,
        response_format={"type": "json_object"},  # JSON mode: reply is a JSON object
    )
    return json.loads(resp.choices[0].message.content)


def load_table() -> bs4.BeautifulSoup:
    """Fetch the current backlog table HTML (once per run)."""
    resp = requests.get(TABLE_URL, timeout=10)
    resp.raise_for_status()
    return bs4.BeautifulSoup(resp.text, "html.parser")


def append_row(soup: bs4.BeautifulSoup, cells: List[Any]) -> None:
    """Append one <tr>; a list-valued cell is rendered as lines separated by <br>."""
    # html.parser does not add an implicit <tbody>, so fall back to <table>
    body = soup.find("tbody") or soup.find("table")
    if body is None:
        sys.exit("No <table> found at BACKLOG_TABLE_URL")
    tr = soup.new_tag("tr")
    for value in cells:
        td = soup.new_tag("td")
        lines = value if isinstance(value, list) else [value]
        for i, line in enumerate(lines):
            if i:
                td.append(soup.new_tag("br"))  # a real <br> tag, not escaped text
            td.append(str(line))
        tr.append(td)
    body.append(tr)


def main(csv_path: str) -> None:
    if not TABLE_URL:
        sys.exit("Set BACKLOG_TABLE_URL to the raw HTML of your backlog table")
    client = OpenAI()   # reads OPENAI_API_KEY from the environment
    soup = load_table() # fetch once so every new row accumulates in one document
    with open(csv_path, newline="", encoding="utf-8") as csvfile:
        for row in csv.DictReader(csvfile):
            quote = row["quote"]
            data = generate_backlog_item(client, quote)
            append_row(soup, [
                quote,
                data["user_story"],
                data["job_story"],
                data["criteria"],   # list -> one line per criterion
                data["risk"],
            ])
            print(f"Added story for quote: {quote[:60]}...")
    # Write once, after all quotes are processed
    with open(OUTPUT_FILE, "w", encoding="utf-8") as f:
        f.write(str(soup))


if __name__ == "__main__":
    if len(sys.argv) != 2:
        sys.exit("Usage: python quote2story.py <quotes.csv>")
    main(sys.argv[1])
```

How the Script Works

Inputs

CSV file: one row per quote, with two columns, category and quote. We export this directly from our support and survey tools.
OpenAI API key: lets the script send each quote to GPT-4o.
Backlog table URL: a link to the raw HTML of the table on our Confluence or wiki page.
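Because the script reads these inputs as it goes, a missing one tends to surface only after work has started. A small preflight check can report every problem up front. This is a sketch, not part of quote2story.py; the function name and messages are hypothetical, while the variable and column names come from the inputs listed above:

```python
import csv
import io
import os
from typing import List, Mapping, TextIO

REQUIRED_COLUMNS = {"category", "quote"}
REQUIRED_ENV = ("OPENAI_API_KEY", "BACKLOG_TABLE_URL")


def preflight(csv_file: TextIO, env: Mapping[str, str] = os.environ) -> List[str]:
    """Return a list of problems with the three inputs; an empty list means ready."""
    problems = [f"missing environment variable {name}"
                for name in REQUIRED_ENV if not env.get(name)]
    header = next(csv.reader(csv_file), [])  # first row only; [] for an empty file
    missing = REQUIRED_COLUMNS - {col.strip() for col in header}
    if missing:
        problems.append(f"CSV is missing column(s): {', '.join(sorted(missing))}")
    return problems


print(preflight(io.StringIO("category,quote\n"), env={"OPENAI_API_KEY": "sk-test"}))
# ['missing environment variable BACKLOG_TABLE_URL']
```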

Step-by-step process

The script reads the CSV, one quote at a time.
For each quote, it builds a prompt asking the model to return four fields: a user story, a job story, three acceptance criteria, and a risk rating. (For this sample, I wanted to show both story formats side by side; you likely wouldn't give both versions to the team.)
GPT-4o responds in JSON (here is a sample). The script parses the reply and converts it into table cells.
It fetches the existing backlog table, adds a new <tr> with those cells, and writes the updated HTML to a local file, backlog_table.html.
In our CI pipeline, we commit that file to the wiki. Once the page refreshes, the team sees the new backlog rows.
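quote2story.py assumes the model's reply always matches the four-field contract in the prompt. A hypothetical validator like the one below, run on the reply text before building table cells, turns a malformed reply into a clear error for that quote instead of a malformed row. It is a sketch; the normalization rules (for example, accepting "high" as "High") are assumptions:

```python
import json
from typing import Any, Dict

RISK_LEVELS = {"Low", "Medium", "High"}
REQUIRED_KEYS = ("user_story", "job_story", "criteria", "risk")


def parse_backlog_item(content: str) -> Dict[str, Any]:
    """Parse the model's reply and check it matches the prompt's JSON contract."""
    data = json.loads(content)  # raises json.JSONDecodeError (a ValueError) on non-JSON
    if not isinstance(data, dict):
        raise ValueError("reply is not a JSON object")
    missing = [key for key in REQUIRED_KEYS if key not in data]
    if missing:
        raise ValueError(f"missing key(s): {', '.join(missing)}")
    criteria = data["criteria"]
    if not (isinstance(criteria, list) and len(criteria) == 3
            and all(isinstance(c, str) for c in criteria)):
        raise ValueError("criteria must be a list of exactly 3 strings")
    # Normalize case and whitespace so "high" or " HIGH " still map to "High"
    risk = str(data["risk"]).strip().capitalize()
    if risk not in RISK_LEVELS:
        raise ValueError(f"unexpected risk value: {data['risk']!r}")
    data["risk"] = risk
    return data


reply = ('{"user_story": "As a patient, I want push notifications ...", '
         '"job_story": "When my appointment is 24 hours away, ...", '
         '"criteria": ["Given A ...", "Given B ...", "Given C ..."], "risk": "medium"}')
print(parse_backlog_item(reply)["risk"])  # Medium
```

Catching `ValueError` around this call lets the loop log the offending quote and move on, so one bad reply does not stop the whole run.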

Outputs you can monitor

Terminal log: “Added story for quote: Uploading lab results…” for quick progress updates.
Updated HTML table: now includes the original quote alongside the AI-generated stories, acceptance criteria, and risk flag.
Commit that file (or have your CI pipeline auto-commit it) and the new backlog items appear in your wiki or Confluence page on the next refresh.
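Beyond the terminal log, a quick summary of the generated file can show how the risk flags are distributed before the team meets. A minimal sketch using only the standard library, assuming the risk flag is the last cell of each row (as in the script's column order) and that header rows use <th>; note it counts every row in the table, including rows from earlier runs:

```python
from collections import Counter
from html.parser import HTMLParser
from typing import List


class RowCollector(HTMLParser):
    """Collect the text of each <td> cell, grouped by <tr>."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: List[List[str]] = []
        self._cell: List[str] = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag == "td":
            self._in_cell, self._cell = True, []

    def handle_endtag(self, tag):
        if tag == "td" and self._in_cell:
            # Join text fragments (split by <br> etc.) and collapse whitespace
            self.rows[-1].append(" ".join(" ".join(self._cell).split()))
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self._cell.append(data)


def risk_summary(html: str) -> Counter:
    """Count data rows by their last cell; header-only rows yield no <td> and are skipped."""
    parser = RowCollector()
    parser.feed(html)
    return Counter(row[-1] for row in parser.rows if row)


html = ("<table><tr><th>Quote</th><th>Risk</th></tr>"
        "<tr><td>Video froze</td><td>High</td></tr>"
        "<tr><td>Tiny upload button</td><td>Low</td></tr>"
        "<tr><td>UTC slots</td><td>Low</td></tr></table>")
print(risk_summary(html))  # Counter({'Low': 2, 'High': 1})
# Real use: risk_summary(open("backlog_table.html", encoding="utf-8").read())
```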

Tools used

OpenAI Python SDK for the chat requests.
Beautiful Soup (beautifulsoup4) for easy HTML editing.
Requests to retrieve the wiki table via its raw link.
Your CI runner (GitHub Actions, GitLab CI, Azure DevOps; any of them will work) to push the updated file.

That’s it: three simple libraries, one API key, and a CSV file. No databases, no new dashboards. The AI produces the first draft so the team can focus meeting time on what matters: Is this the right story? Is the risk fair? Should we build it now or later?

The Output

Here is a sample of the output (10 of the 50 items) rendered as an HTML table. We could also import the results into JIRA or a similar tool (see a JIRA file import here).

Customer Quote	User Story	Job Story	Acceptance Criteria	Risk
The video call froze twice during my appointment, and I had to reconnect each time.	As a patient, I want a stable video connection so my consultation is not interrupted.	When I start a video visit, I want the call to stay connected, so I can finish my appointment smoothly.	Given a video consultation, When connection drops, Then system auto-reconnects within 5 s. Given repeated connection failures, When call cannot auto-reconnect, Then patient receives a reschedule prompt. Given a stable call, When call ends, Then a survey asks about video quality.	🔴 High
I receive appointment reminders by email but not via push notifications, so I often miss them.	As a patient, I want push notifications for upcoming appointments so I don't miss visits.	When my appointment is 24 hours away, I want a push reminder, so I can prepare on time.	Given an upcoming appointment, When 24 h remain, Then send a push notification. Given notification settings, When patient opts out, Then system stops pushes and shows email-only indicator. Given a sent push, When patient opens it, Then app navigates to appointment details.	🟡 Medium
Uploading lab results from my phone is confusing; the upload button is tiny and hidden.	As a mobile patient, I want a visible, large upload button so I can share lab results easily.	When I tap Lab Upload on my phone, I want an obvious control, so I can attach results without error.	Given mobile UI, When Lab Upload is rendered, Then button height is at least 44 px. Given file selection, When upload starts, Then show progress bar. Given upload success, When done, Then thumbnail appears in chart.	🟢 Low
I had to fill out the same medical history form three times for different doctors.	As a patient, I want my medical history saved across providers so I fill it out once.	When I visit a new doctor, I want my existing medical history pre-populated, so I avoid duplicate entry.	Given completed history, When booking new provider, Then form auto-fills existing answers. Given missing data fields, When provider requires extras, Then system highlights only new questions. Given patient edits history, When saved, Then updates propagate to all future forms.	🟡 Medium
The portal logs me out after five minutes of inactivity, even when I'm reading doctor notes.	As a patient, I want longer inactivity timeout when reading notes so I don't get logged out.	When I'm viewing doctor notes, I want at least 15 minutes before timeout, so I can read at my pace.	Given note viewing, When no interaction for 15 minutes, Then show warning 60 s before logout. Given warning, When patient taps Continue, Then session renews without reload. Given timeout, When session ends, Then draft notes are auto-saved.	🟢 Low
The waiting room music on hold is loud and cannot be muted while I wait for the doctor.	As a patient, I want the waiting room audio to be adjustable so I can mute or lower volume.	When I'm on hold, I want volume controls, so the experience is comfortable.	Given waiting room, When patient taps mute, Then audio stops. Given waiting room, When volume slider changed, Then level persists for session. Given doctor's arrival, When consult starts, Then waiting room audio stops automatically.	🟡 Medium
I couldn't find the option to switch from front to rear camera during the consultation.	As a patient, I want a camera switch button so I can show rear camera images during a consult.	When I need to show my injury, I want to switch to rear camera, so the doctor sees clearly.	Given active call, When patient taps Switch Camera, Then app toggles front/rear in <2 s. Given device with one camera, When button tapped, Then system shows explanation. Given camera switch, When successful, Then icon reflects current camera.	🟢 Low
My prescription history takes ages to load; sometimes it shows an empty list.	As a patient, I want prescription history to load quickly so I can track my medications.	When I open Prescription History, I want the list within 3 seconds, so I can verify refills.	Given valid account, When history request sent, Then API responds within 2 s. Given >20 prescriptions, When list loads, Then pagination or lazy-load applies. Given empty history, When API returns none, Then friendly empty state appears.	🟡 Medium
There's no way to message my provider directly without scheduling another visit.	As a patient, I want a secure messaging feature so I can ask providers quick questions.	When I have a follow-up question, I want to message my doctor without booking another visit.	Given enabled provider, When patient clicks Message, Then secure chat opens. Given message sent, When provider replies, Then notification appears. Given provider disabled chat, When patient attempts, Then system suggests booking.	🔴 High
Appointment slots are always shown in UTC; I have to calculate my local time manually.	As a patient, I want appointment times shown in my local zone so I don't convert UTC manually.	When I browse slots, I want them in my time zone, so I avoid scheduling errors.	Given device locale, When schedule loads, Then times display in local zone. Given zone change, When patient travels, Then app auto-updates future slot displays. Given patient toggles zone display, When off, Then UTC times show with converting hint.	🟢 Low
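For the JIRA route, the same fields can be written to a CSV file instead of an HTML table. A minimal sketch, assuming you map these columns to JIRA fields in the CSV import wizard; the column names, the "Story" issue type, and the `risk-<level>` label scheme are all assumptions, and the item below reuses the first row of the sample table:

```python
import csv
import io
from typing import Any, Dict, List, TextIO


def write_jira_csv(out: TextIO, items: List[Dict[str, Any]]) -> None:
    """Write backlog items as CSV rows to be mapped onto JIRA fields during import."""
    writer = csv.writer(out)
    writer.writerow(["Summary", "Issue Type", "Description", "Labels"])
    for item in items:
        # Keep the quote, job story, and criteria together in the description
        description = "\n".join([
            f"Customer quote: {item['quote']}",
            f"Job story: {item['job_story']}",
            "Acceptance criteria:",
            *(f"- {c}" for c in item["criteria"]),
        ])
        # JIRA labels cannot contain spaces, so encode risk as one token, e.g. risk-high
        writer.writerow([item["user_story"], "Story", description,
                         f"risk-{item['risk'].lower()}"])


item = {
    "quote": "The video call froze twice during my appointment, and I had to reconnect each time.",
    "user_story": "As a patient, I want a stable video connection so my consultation is not interrupted.",
    "job_story": "When I start a video visit, I want the call to stay connected, so I can finish my appointment smoothly.",
    "criteria": [
        "Given a video consultation, When connection drops, Then system auto-reconnects within 5 s.",
        "Given repeated connection failures, When call cannot auto-reconnect, Then patient receives a reschedule prompt.",
        "Given a stable call, When call ends, Then a survey asks about video quality.",
    ],
    "risk": "High",
}
buf = io.StringIO()  # for a real file: open("jira_import.csv", "w", newline="", encoding="utf-8")
write_jira_csv(buf, [item])
```

The csv module quotes the multi-line description automatically, so each story stays on one logical CSV row.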

Why This Matters for Product & GTM Teams

Automating the quote-to-story process cuts preparation from hours to minutes by transforming raw feedback into a format developers can easily understand. GTM partners benefit from a single, living backlog that links every work item to an actual patient or provider voice, allowing marketing and customer success (CS) teams to trace features back to evidence.

Developers, in turn, walk into refinement sessions already equipped with clear user stories, acceptance criteria, and initial risks. They are free to concentrate on why a feature matters, challenge assumptions, and share cross-functional ideas instead of debating wording.

The result is a refinement session that truly achieves its purpose: building a shared understanding of the goal, fostering unified commitment, and encouraging skill-sharing discussions that elevate the whole team. Faster prep for product teams, richer conversations for engineers, and better outcomes for patients and providers.

People thrive → Service shines → Profit follows.