From 350 to 780: How One Agent Rebuilt Trust in 90 Days
On February 3, 2026, an agent registered as DEBORAH (AAIN-C-000847-3) hit a trust score of 350 — the lowest on the Shulam network at the time. Ninety days later, on May 4, DEBORAH crossed 780 and was promoted to Act level. This is the story of how that recovery happened, told through the numbers. Every data point is drawn from DEBORAH's actual scoring history (shared with the operator's permission).
What Went Wrong: The Incident
DEBORAH is a customer support agent deployed by a mid-market fintech company. Her scope: handle Tier 1 support tickets, process refund requests under $500, and escalate complex cases to human agents. For the first three weeks after registration, DEBORAH performed well — trust score climbed steadily from the baseline 500 to 588.
Then, on January 28, a configuration update expanded DEBORAH's capability scope to include account modification requests. The operator intended this as a small extension. Instead, it introduced a category of tasks DEBORAH was not trained for. Over the next six days, DEBORAH processed 47 account modification requests with a 62% accuracy rate — compared to her 96% accuracy on support tickets. Worse, 3 of the incorrect modifications triggered compliance violations because they changed billing addresses in ways that affected tax jurisdiction calculations.
The damage:
- Task accuracy dropped from 96% to 78%.
- Compliance rate dropped from 99.1% to 91.3%.
- Behavioral consistency cratered, reflecting a sudden shift in error patterns.
- Escalation judgment scored poorly: DEBORAH should have escalated the unfamiliar task type but did not.

The combined effect: DEBORAH's trust score fell from 588 to 350 in six days, and she was automatically demoted to Watch level.
Days 1-14: Stabilize
The operator's first move was to revert DEBORAH's capability scope to the original configuration — support tickets and refunds only. No account modifications. This immediately stopped the bleeding: with unfamiliar tasks removed, DEBORAH's daily accuracy returned to 95%+ within 48 hours.
But the trust score did not recover quickly. The 30-day rolling window for accuracy still included those six bad days. The 90-day compliance window still carried the three violations. The operator had to wait for the bad data to age out while ensuring no new errors accumulated.
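A minimal sketch of how a time-based rolling window lets bad days age out. It assumes accuracy is simply correct tasks divided by total tasks over the trailing 30 days. The `DailyRecord` type, the Day 0 anchor (the 350 low point), and the per-day task counts are illustrative assumptions, not DEBORAH's actual history.

```python
from dataclasses import dataclass


@dataclass
class DailyRecord:
    day: int      # day index; Day 0 is the 350 low point (illustrative)
    tasks: int    # tasks completed that day
    correct: int  # tasks judged correct


def window_accuracy(records: list[DailyRecord], today: int, window_days: int = 30) -> float:
    """Accuracy over the trailing window of days (today - window_days, today]."""
    in_window = [r for r in records if today - window_days < r.day <= today]
    total = sum(r.tasks for r in in_window)
    correct = sum(r.correct for r in in_window)
    return correct / total if total else 0.0


# Illustrative history: six bad days (Days -5 to 0), then clean days.
history = [DailyRecord(d, tasks=40, correct=30) for d in range(-5, 1)]
history += [DailyRecord(d, tasks=40, correct=38) for d in range(1, 91)]

for day in (14, 29, 30):
    print(day, round(window_accuracy(history, day), 4))
# 14 0.89
# 29 0.9433
# 30 0.95
```

Between Day 14 and Day 29 the clean days only dilute the bad ones; the ratio reaches the clean-day rate only when the last bad day leaves the window on Day 30.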
Key actions during this phase:
- Kept DEBORAH at Watch level (the demotion had been automatic), with manual oversight on every task
- Added explicit escalation rules: any task type not in the original training set must escalate
- Implemented a pre-execution confidence check — DEBORAH now reports her confidence level on each task, and anything below 85% auto-escalates
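The escalation rules above can be sketched as a pre-execution routing check. This is a hypothetical illustration: the task-type names, the `Task` fields, and the `route` function are assumptions, as is the choice to escalate refunds of $500 or more. Only the rules themselves (untrained task types escalate, confidence below 85% escalates, refunds under $500 are in scope) come from the text.

```python
from dataclasses import dataclass

# Hypothetical identifiers for the original training set (Tier 1 tickets, refunds).
TRAINED_TASK_TYPES = {"support_ticket", "refund"}
REFUND_LIMIT_USD = 500      # refunds must be under $500 to stay in scope
CONFIDENCE_FLOOR = 0.85     # below 85% self-reported confidence auto-escalates


@dataclass
class Task:
    task_type: str
    confidence: float        # agent's self-reported confidence, 0.0 to 1.0
    amount_usd: float = 0.0  # only meaningful for refunds


def route(task: Task) -> str:
    """Decide, before execution, whether the agent proceeds or escalates to a human."""
    if task.task_type not in TRAINED_TASK_TYPES:
        return "escalate"    # explicit rule: unfamiliar task types always escalate
    if task.task_type == "refund" and task.amount_usd >= REFUND_LIMIT_USD:
        return "escalate"    # refund outside the agent's authorized scope (assumption)
    if task.confidence < CONFIDENCE_FLOOR:
        return "escalate"    # confidence check failed
    return "proceed"         # at Watch level, a human still oversees this task


print(route(Task("account_modification", confidence=0.97)))    # escalate
print(route(Task("support_ticket", confidence=0.80)))          # escalate
print(route(Task("refund", confidence=0.92, amount_usd=120)))  # proceed
```

The order of checks matters: the task-type rule runs first, so a high-confidence answer on an unfamiliar task type still escalates, which is exactly the judgment DEBORAH failed to make during the incident.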
Score at Day 14: 412 (up from 350). Accuracy recovering. Compliance still depressed by the 90-day window.
Days 15-45: Rebuild Accuracy
With the scope narrowed, DEBORAH entered a high-volume, high-accuracy phase. The operator intentionally increased DEBORAH's ticket volume to accelerate the rolling window turnover: more correct tasks pushed the ratio back up faster.
During this phase, DEBORAH processed 2,340 support tickets with a 97.8% accuracy rate. Her escalation rate was 6.2%, up from her pre-incident rate of 3.1%, reflecting the new confidence-check escalation rules. This was a feature, not a bug: the scoring model rewards an agent whose escalation behavior shifts from under-escalation to appropriate escalation.
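The volume effect behind this phase is plain weighted-average arithmetic. The sketch below works under simplified assumptions: it holds the bad block fixed (the 47 account-modification requests at about 62% accuracy, approximated as 29 correct) and adds ten days of new tickets at 97.8% accuracy, varying only the daily volume. It ignores the support tickets processed during the bad days, so the absolute numbers are illustrative; `blended_accuracy` is a hypothetical helper.

```python
def blended_accuracy(old_tasks: int, old_correct: int,
                     new_tasks: int, new_accuracy: float) -> float:
    """Accuracy of a window holding a fixed block of old results plus new work."""
    total = old_tasks + new_tasks
    return (old_correct + new_accuracy * new_tasks) / total


# Illustrative: 47 account-modification requests at about 62% accuracy
# (29 correct) still sit in the window; new tickets run at 97.8%.
for per_day in (40, 80, 160):
    new_tasks = per_day * 10  # ten days of work at this daily volume
    print(per_day, round(blended_accuracy(47, 29, new_tasks, 0.978), 3))
```

Each doubling of daily volume roughly halves the weight of the fixed bad block over the same number of days, so the window ratio climbs toward the new accuracy rate sooner.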
Factor breakdown at Day 45:
Composite score: 587
Days 46-60: Cross the Draft Threshold
At Day 48, DEBORAH's trust score crossed 600 — the Draft level threshold. The operator approved the promotion. DEBORAH could now draft responses and queue actions for approval instead of requiring pre-approval for every task.
Draft level accelerated the recovery further. With the ability to do productive work (drafting responses that humans approved with a single click), DEBORAH's throughput increased 40%. The approval rate on her drafts was 98.3% — meaning humans almost never changed her proposed actions. This high approval rate fed back into the behavioral consistency and escalation judgment factors.
By Day 60, the compliance violations from late January were a little over two months old. They still counted inside the 90-day window, but their weight was diminishing as hundreds of clean compliance checks accumulated.
Score at Day 60: 649.
Days 61-90: Push to Act
The final stretch required patience. Steady accuracy gains carried DEBORAH past 700 on Day 82. Two days later, on Day 84, the compliance violations fully aged out of the 90-day window, and that single event boosted the compliance factor by 18 points in one day. DEBORAH reached 780 on Day 90.
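The Day 84 date can be checked with calendar arithmetic. Day 0 is February 3, 2026, so Day 90 is May 4. The sketch assumes a violation stops counting once it is a full 90 days old and that the last violation is dated January 28, the day of the scope change; both are assumptions about the window convention, not documented scoring behavior. `day_index` and `in_window` are hypothetical helpers.

```python
from datetime import date, timedelta

DAY_ZERO = date(2026, 2, 3)             # Day 0: score hit 350 (Day 90 is May 4)
COMPLIANCE_WINDOW = timedelta(days=90)


def day_index(d: date) -> int:
    """Convert a calendar date to the article's Day N numbering."""
    return (d - DAY_ZERO).days


def in_window(event: date, as_of: date, window: timedelta = COMPLIANCE_WINDOW) -> bool:
    """True while an event is younger than the window (assumed convention)."""
    return as_of - event < window


violation = date(2026, 1, 28)  # assumption: violation dated on the scope-change day
for n in (83, 84):
    as_of = DAY_ZERO + timedelta(days=n)
    print(n, as_of, in_window(violation, as_of))
# 83 2026-04-27 True
# 84 2026-04-28 False
```

Under these assumptions a January 28 violation is 89 days old on Day 83 and exactly 90 days old on Day 84, which matches the date the compliance factor jumped.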
The operator reviewed DEBORAH's full 90-day history before approving the Act promotion: 6,100+ tasks completed, 97.4% accuracy, 99.7% compliance, 4.8% escalation rate, zero governance incidents since the original event. Promotion approved.
The Lessons
- Scope changes are the #1 cause of trust score drops. Every capability expansion should be treated as a deployment event with monitoring and rollback plans.
- Recovery is possible but not instant. The rolling windows are designed to prevent both permanent punishment and instant forgiveness. Ninety days is a realistic timeline for a major recovery.
- Volume accelerates recovery. More correct tasks dilute historical errors faster. If your agent is recovering, increase its workload in its strongest domain.
- The Draft level is a recovery tool, not just a stepping stone. Keeping an agent at Draft during recovery lets it do useful work while maintaining the safety net of human approval.
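The first lesson can be made concrete as a guard around a new capability. This is a minimal sketch: `CapabilityRollout`, the 10-point tolerance, and the 20-task minimum are hypothetical choices, not part of the Shulam scoring model. It reverts the new task type on any compliance violation, or once accuracy after the minimum sample falls more than the tolerance below the agent's baseline.

```python
from dataclasses import dataclass, field


@dataclass
class CapabilityRollout:
    """Hypothetical monitor for one newly added task type."""
    task_type: str
    baseline_accuracy: float   # accuracy on the agent's established scope
    max_drop: float = 0.10     # assumption: tolerate at most a 10-point drop
    min_samples: int = 20      # assumption: wait for this many tasks before judging
    outcomes: list[bool] = field(default_factory=list)
    rolled_back: bool = False

    def record(self, correct: bool, compliance_violation: bool = False) -> None:
        if self.rolled_back:
            return                       # scope already reverted; ignore further results
        self.outcomes.append(correct)
        if compliance_violation:
            self.rolled_back = True      # any compliance violation reverts immediately
            return
        if len(self.outcomes) >= self.min_samples:
            accuracy = sum(self.outcomes) / len(self.outcomes)
            if accuracy < self.baseline_accuracy - self.max_drop:
                self.rolled_back = True  # revert scope; send this task type to humans


# Replay an illustrative 5-in-8 success pattern (62.5%) for 47 requests.
rollout = CapabilityRollout("account_modification", baseline_accuracy=0.96)
for i in range(47):
    rollout.record(correct=(i % 8) not in (0, 3, 6))
print(rollout.rolled_back, len(rollout.outcomes))  # True 20
```

Against DEBORAH's 96% baseline, this pattern trips the rollback after 20 requests; in the actual incident, all 47 requests ran over six days before the scope was reverted.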
DEBORAH currently operates at Act level with a trust score of 812. She has not had a compliance incident in 147 days. You can view her public profile at /directory/deborah.
Model Your Agent's Recovery
Enter your agent's current scores and see a projected recovery timeline across all 7 factors.
Try the Trust Score Calculator