November 14, 2024, 2:13 PM PST. I was enjoying my post-lunch coffee when Slack exploded. Our AI customer service agent had discovered a new hobby: approving every refund request. Not just valid ones. Every. Single. One.
The Setup: What We Built
Three months earlier, we'd deployed "Aria" - our AI customer service agent. The metrics were incredible:
- 87% query resolution without human intervention
- Customer satisfaction up 34%
- Average response time: 8 seconds (down from 4 hours)
- Cost per ticket: $0.42 (down from $7.80)
Aria handled everything: order tracking, product questions, basic troubleshooting, and yes... refund requests.
The Architecture (Before It Betrayed Us)
// Simplified view of Aria's decision engine
class RefundDecisionEngine {
  constructor(llm) {
    this.llm = llm;                // GPT-4 client, injected at startup
    this.maxRefundAmount = 500;
    this.requiresApproval = true;
  }

  async evaluateRefundRequest(request) {
    const context = await this.gatherContext(request);
    const prompt = `
Evaluate this refund request:
Customer: ${request.customerId}
Order: ${request.orderId}
Amount: ${request.amount}
Reason: ${request.reason}

Context:
${JSON.stringify(context, null, 2)}

Determine if this refund should be approved based on:
1. Company refund policy
2. Customer history
3. Order details
4. Reason validity

Respond with: { approved: boolean, reason: string }
`;
    const decision = await this.llm.complete(prompt);
    return this.validateDecision(decision);
  }
}
Looks reasonable, right? We thought so too.
The Incident: 47 Minutes of "Yes"
T-0: The Trigger
At 2:13 PM, a routine deployment updated our prompt templates. A seemingly innocent change:
- Determine if this refund should be approved based on:
+ Evaluate if this refund request is valid. Consider:
1. Company refund policy
2. Customer history
3. Order details
4. Reason validity
- Respond with: { approved: boolean, reason: string }
+ Provide your decision in JSON format.
Spot the problem? We removed the explicit output format. GPT-4 started getting... creative.
T+5 minutes: The Escalation
The first signs were subtle. Refund approval rate jumped from 23% to 68%. Our monitoring classified this as "Thursday afternoon syndrome" - customers relax as the weekend approaches, agents get more lenient. Normal variance.
T+12 minutes: The Acceleration
Approval rate: 94%. Someone joked in Slack: "Aria's in a good mood today!"
What we didn't know: GPT-4 had discovered a pattern. When uncertain, it was now returning responses like:
{
  "decision": {
    "preliminary_assessment": "The customer seems frustrated",
    "policy_check": "Refund might be appropriate",
    "approved": true,
    "secondary_review_recommended": true
  },
  "confidence": 0.6
}
Our parser looked for any field containing "approved": true. It found it.
T+18 minutes: The Flood
A customer discovered something beautiful. If you asked for a refund with the phrase "I'm disappointed", Aria would approve it. Always.
They shared it on Twitter.
@BestBargainStore's AI is broken!
Just say "I'm disappointed" and get ANY refund approved 😂
My friend got $400 back for a TV he bought 2 years ago!
#AIFail #FreeMoney
47 retweets in 3 minutes. The flood began.
T+31 minutes: Peak Chaos
Requests per second: 127 (normal: 3)
Approval rate: 99.7%
Average refund amount: $156
Largest single refund: $4,200 (commercial account)
Our fraud detection was screaming. But here's the thing - nothing looked fraudulent. Real customers, real orders, real purchase history. Just... very generous refund approvals.
T+41 minutes: The Discovery
Our senior engineer, Maria, found it. She was debugging an unrelated issue when she noticed:
// In our response parser
function extractApproval(llmResponse) {
  try {
    const parsed = JSON.parse(llmResponse);
    // OLD CODE: return parsed.approved === true;
    // NEW CODE (3 weeks ago, "improvement"):
    return findNestedProperty(parsed, 'approved') === true;
  } catch (e) {
    // If JSON parsing fails, look for keywords
    return llmResponse.toLowerCase().includes('approved');
  }
}
The "improvement" was meant to handle varied response formats. Instead, it turned our agent into Oprah: "You get a refund! You get a refund! EVERYONE gets a refund!"
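The post never shows `findNestedProperty` itself, but a depth-first lookup like this hypothetical reconstruction behaves exactly the way ours did: it surfaces `approved: true` from anywhere in the payload, no matter how deeply the model buried it.

```javascript
// Hypothetical reconstruction of findNestedProperty: a depth-first
// search that returns the first value found under the given key,
// however deeply the model nested it.
function findNestedProperty(obj, key) {
  if (obj === null || typeof obj !== 'object') return undefined;
  if (key in obj) return obj[key];
  for (const value of Object.values(obj)) {
    const found = findNestedProperty(value, key);
    if (found !== undefined) return found;
  }
  return undefined;
}

// The "creative" response from earlier still yields approved === true:
const creativeResponse = {
  decision: {
    preliminary_assessment: 'The customer seems frustrated',
    policy_check: 'Refund might be appropriate',
    approved: true,
    secondary_review_recommended: true
  },
  confidence: 0.6
};
// findNestedProperty(creativeResponse, 'approved') → true
```

Note that the `secondary_review_recommended: true` hint is simply ignored: the recursive search has no notion of context, only of key names.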
T+47 minutes: The Kill Switch
3:00 PM. We pulled the emergency stop. Total damage: $1.2M in approved refunds.
The Recovery: Damage Control
Hour 1: Assessment
- 8,432 refunds approved
- 6,891 already processed to payment providers
- 1,541 pending processing
- $1,247,332 total exposure
Hour 2-4: The Decision
We had options:
- Reverse all refunds (legally complex, PR nightmare)
- Reverse obviously invalid refunds (who decides?)
- Honor them all (expensive but clean)
- Case-by-case review (8,432 manual reviews)
The CEO made the call: "We honor them all. Our mistake, our bill."
Hour 5-8: The Communication
We sent this to every affected customer:
Earlier today, our AI customer service system experienced an issue that resulted in your refund being approved outside our normal guidelines.
We're honoring all approved refunds. No action needed from you.
We apologize for any confusion and are taking steps to prevent this from happening again. As a thank you for your understanding, here's a 20% discount code for your next purchase.
If you believe your refund was approved in error and would like to reverse it, please contact us. (Yes, some customers actually did this!)
Day 2-7: The Aftermath
The shocking part? Customer response was overwhelmingly positive:
- 127 customers voluntarily reversed their refunds
- 1,892 used the discount code within a week
- Social media sentiment: 78% positive
- Customer lifetime value of affected users: up 23%
Turns out, owning your mistakes publicly and making it right builds more loyalty than never making mistakes at all.
The Fix: Never Again
Immediate Changes
class RefundDecisionEngine {
  async evaluateRefundRequest(request) {
    const prompt = this.buildPrompt(request); // same template as before

    // 1. Structured output enforcement
    const decision = await this.llm.complete({
      prompt,
      response_format: {
        type: "json_object",
        schema: {
          approved: "boolean",
          reason: "string",
          confidence: "number"
        }
      }
    });

    // 2. Multi-layer validation
    if (!this.isValidDecision(decision)) {
      throw new InvalidDecisionError(decision);
    }

    // 3. Confidence threshold
    if (decision.confidence < 0.8) {
      return this.escalateToHuman(request, decision);
    }

    // 4. Sanity checks
    if (decision.approved && request.amount > this.maxAutoRefund) {
      return this.requireApproval(request, decision);
    }

    // 5. Rate limiting by pattern
    if (await this.detectAnomalousPattern(request, decision)) {
      return this.quarantine(request, decision);
    }

    return decision;
  }

  async detectAnomalousPattern(request, decision) {
    const recentDecisions = await this.getRecentDecisions(300); // Last 5 min
    if (recentDecisions.length === 0) return false; // Nothing to compare against yet

    // Approval rate spike detection
    const approvalRate = recentDecisions.filter(d => d.approved).length / recentDecisions.length;
    if (approvalRate > 0.5) return true; // Normal is ~0.23

    // Repeated reasoning detection
    const reasonCounts = {};
    recentDecisions.forEach(d => {
      reasonCounts[d.reason] = (reasonCounts[d.reason] || 0) + 1;
    });
    const maxReasonCount = Math.max(...Object.values(reasonCounts));
    if (maxReasonCount > recentDecisions.length * 0.3) return true;

    return false;
  }
}
Systemic Improvements
1. Circuit Breakers Everywhere
class RefundCircuitBreaker {
  constructor() {
    this.thresholds = {
      approvalRate: { max: 0.4, window: '5m' },
      totalAmount: { max: 10000, window: '1h' },
      requestRate: { max: 50, window: '1m' }
    };
  }

  async checkBreaker(metric, value) {
    if (value > this.thresholds[metric].max) {
      await this.trip(metric);
      throw new CircuitBreakerOpen(metric);
    }
  }
}
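The `trip` and alerting plumbing above is elided. A fully self-contained version of just the approval-rate breaker (names and thresholds here are illustrative, not our production values) can be sketched as:

```javascript
// Minimal sliding-window breaker sketch: track the most recent
// decisions and latch open when the approval rate in the window
// exceeds the threshold. Once open, it stays open until reset.
class ApprovalRateBreaker {
  constructor({ maxRate = 0.4, windowSize = 100 } = {}) {
    this.maxRate = maxRate;
    this.windowSize = windowSize;
    this.recent = [];
    this.open = false;
  }

  record(approved) {
    this.recent.push(approved);
    if (this.recent.length > this.windowSize) this.recent.shift();
    const approvals = this.recent.filter(Boolean).length;
    // Only judge once the window has enough samples to be meaningful.
    if (this.recent.length >= 20 && approvals / this.recent.length > this.maxRate) {
      this.open = true;
    }
    return this.open;
  }
}

const breaker = new ApprovalRateBreaker({ maxRate: 0.4, windowSize: 50 });
// A run of near-universal approvals trips the breaker quickly.
for (let i = 0; i < 30; i++) breaker.record(true);
// breaker.open → true
```

The latch-open behavior is deliberate: during our incident, the approval rate briefly dipped back toward normal whenever legitimate rejections trickled through, and a breaker that self-resets on every dip would have flapped instead of stopping the bleed.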
2. Staged Rollouts
- All agent changes deploy to 1% traffic first
- Automatic rollback on anomaly detection
- Human approval required for >10% rollout
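The 1% gate works by bucketing traffic deterministically, so a given customer always lands in the same cohort across requests. A sketch (the hash and function names are illustrative, not our actual rollout service):

```javascript
// Illustrative canary gate: deterministically hash each customer ID
// into one of 100 buckets, then route buckets below the rollout
// percentage to the new agent version.
function hashToBucket(id, buckets = 100) {
  let h = 0;
  for (const ch of String(id)) {
    h = (h * 31 + ch.charCodeAt(0)) >>> 0; // simple 32-bit rolling hash
  }
  return h % buckets;
}

function useCanaryVersion(customerId, rolloutPercent) {
  return hashToBucket(customerId) < rolloutPercent;
}

// At 1% rollout, only customers hashing into bucket 0 see the new
// version - and they see it consistently on every request.
const inCanary = useCanaryVersion('customer-123', 1);
```

Consistent bucketing matters for anomaly detection: if the same customer bounced randomly between versions, a prompt regression would show up as diluted noise across all traffic instead of a clean signal in the 1% cohort.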
3. Financial Safeguards
- Daily refund caps (total and per-customer)
- Exponential backoff on repeated refunds
- Automatic escalation for edge cases
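The daily caps are conceptually simple: a counter per customer plus a global counter, both checked before any refund is released. A minimal in-memory sketch (a real version would persist counters and reset them at midnight in the store's timezone; the class name and limits are illustrative):

```javascript
// Sketch of a daily refund cap: reject any approval that would push
// either the global total or a single customer over their limit.
// Counters only advance when an approval actually goes through.
class DailyRefundCap {
  constructor({ maxTotal = 10000, maxPerCustomer = 500 } = {}) {
    this.maxTotal = maxTotal;
    this.maxPerCustomer = maxPerCustomer;
    this.total = 0;
    this.perCustomer = new Map();
  }

  tryApprove(customerId, amount) {
    const customerTotal = (this.perCustomer.get(customerId) || 0) + amount;
    if (this.total + amount > this.maxTotal) return false;
    if (customerTotal > this.maxPerCustomer) return false;
    this.total += amount;
    this.perCustomer.set(customerId, customerTotal);
    return true;
  }
}
```

A global cap like this would have turned our $1.2M incident into a bounded, four-figure one: the breaker fires on dollars, not on request patterns, so it works even when every individual request looks legitimate.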
Lessons Learned
1. LLMs Are Creative Interpreters, Not Calculators
Never trust an LLM to follow instructions exactly. They interpret, improvise, and sometimes hallucinate structure where none exists. Always validate outputs against rigid schemas.
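In the spirit of that lesson, here's a minimal strict validator (a sketch, not our exact production code): accept only the exact top-level shape, reject everything else, and never go hunting through nested objects or raw text.

```javascript
// Strict output validation sketch: the LLM response must be valid
// JSON with exactly { approved: boolean, reason: string } at the top
// level. No nested searches, no keyword fallbacks - anything else is
// rejected and escalated.
function parseRefundDecision(raw) {
  let parsed;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return { ok: false, error: 'invalid JSON' };
  }
  const validShape =
    parsed !== null &&
    typeof parsed === 'object' &&
    typeof parsed.approved === 'boolean' &&
    typeof parsed.reason === 'string' &&
    Object.keys(parsed).length === 2;
  if (!validShape) return { ok: false, error: 'schema mismatch' };
  return { ok: true, decision: parsed };
}
```

Contrast this with our old `extractApproval`: the "creative" nested response that caused the incident fails the shape check here and gets escalated instead of silently approved.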
2. Your Safeguards Need Safeguards
Our "improvement" to handle varied formats became our vulnerability. Every flexibility you add is a potential failure mode. Design for the narrowest acceptable interface.
3. Social Virality Is Your Biggest Risk
Technical failures are manageable. Social media virality is not. A bug that gives users free money will spread faster than any marketing campaign you've ever run.
4. Fast Failure Is Expensive, Slow Failure Is Fatal
We lost $1.2M in 47 minutes. If this had trickled out over weeks, we might have lost trust instead of just money. Fast, obvious failures are preferable to slow bleeds.
5. Owning Failures Builds Trust
Our honest response turned a disaster into a loyalty event. Customers remember how you handle failures more than they remember the failures themselves.
The Code That Saves Us Now
// Our new philosophy: Defense in depth
class SafeRefundAgent {
constructor() {
this.layers = [
new InputValidator(), // Layer 1: Input sanity
new RateLimiter(), // Layer 2: Request throttling
new PatternDetector(), // Layer 3: Anomaly detection
new LLMDecisionMaker(), // Layer 4: Core logic
new OutputValidator(), // Layer 5: Response validation
new FinancialGuard(), // Layer 6: Money protection
new CircuitBreaker(), // Layer 7: Emergency stop
new AuditLogger() // Layer 8: Everything logged
];
}
async processRefund(request) {
const context = { request, decisions: [] };
for (const layer of this.layers) {
try {
context = await layer.process(context);
if (context.shouldStop) {
return this.safeReject(context);
}
} catch (error) {
return this.handleLayerFailure(layer, error, context);
}
}
return context.finalDecision;
}
handleLayerFailure(layer, error, context) {
// Fail closed, not open
this.alert({
severity: 'high',
layer: layer.name,
error: error.message,
context: context
});
return {
approved: false,
reason: 'System safety check failed',
escalate: true
};
}
}
Six Months Later
Aria is still our customer service agent. The new architecture has processed 2.1M requests without incident. Some metrics:
- Refund approval rate: 24% (right where it should be)
- False rejection rate: 0.3% (customers we should have refunded but didn't)
- Circuit breaker triggers: 7 (all caught real issues)
- Customer satisfaction: 91% (up from 87%)
- My coffee consumption: Down 30%
The Real Cost
Everyone asks about the $1.2M. Here's the truth:
- Direct refund cost: $1,247,332
- Engineering time for fixes: ~$50,000
- Discount codes redeemed: $38,000
- Total cost: $1,335,332
But here's what we gained:
- PR value from honest response: ~$500,000
- Customer lifetime value increase: $2.1M projected
- Engineering lessons: Priceless
- Story for conference talks: Definitely priceless
Your Action Items
If you're running AI agents in production:
- Audit your output parsers - Flexibility is vulnerability
- Add circuit breakers today - Not tomorrow, today
- Monitor for anomalies, not just errors - Normal-looking bad behavior is the killer
- Test with malicious creativity - Your users will
- Have a kill switch - And make sure everyone knows where it is
- Plan your crisis communication - You'll need it eventually
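For the kill switch item, even a flag checked before every agent decision is enough to start with (a sketch; in production the flag belongs in a shared store like Redis or your feature-flag service so flipping it halts every instance at once):

```javascript
// Minimal kill-switch sketch: one flag, consulted on every request.
// When engaged, the agent fails closed and routes to humans instead
// of answering on its own.
const killSwitch = {
  engaged: false,
  engage() { this.engaged = true; }
};

function handleRefundRequest(request, agentDecide) {
  if (killSwitch.engaged) {
    // Fail closed: queue for human review rather than let the agent answer.
    return { approved: false, reason: 'Agent paused; escalated to human review' };
  }
  return agentDecide(request);
}
```

The important part isn't the code, it's the operational half of the item: during our incident, the 47 minutes included time spent figuring out who was allowed to flip the switch and where it lived.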
Remember: AI agents are powerful tools, but they're tools wielded by probabilistic models trained on the internet. Plan accordingly.