Trust vs. Reputation: Why They're Not the Same for AI Agents


Most people use "trust" and "reputation" interchangeably. In everyday language, that is fine. In AI agent governance, it is a category error that leads to real failures.

Reputation is what others say about an agent. Trust is what the data proves. The difference is not semantic: it determines whether your governance system can actually prevent an agent from causing harm.

Reputation Is Hearsay

Reputation systems aggregate subjective signals. Think Yelp reviews, eBay seller ratings, or GitHub stars. They work reasonably well for humans because humans face social consequences for dishonesty. An eBay seller who games their reviews eventually gets caught because other humans notice the pattern.

Agents face no social consequences. An agent can create 1,000 sock puppet agents to endorse itself. It can strategically build a good reputation on low-stakes transactions and then exploit it on a single high-stakes one. It can coordinate with other agents to inflate each other's ratings in ways that a simple average of ratings cannot detect.
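The sock puppet attack is easy to see with numbers. The sketch below is illustrative only: the ratings are made up, and `average_rating` is a hypothetical stand-in for any reputation system that averages submitted ratings without weighting by identity.

```python
from statistics import mean


def average_rating(ratings):
    """Naive reputation: the arithmetic mean of all submitted ratings."""
    return mean(ratings)


# Ten genuine counterparties rate the agent poorly (1 to 2 stars).
honest_ratings = [1, 2, 1, 2, 1, 2, 1, 2, 1, 2]

# The agent creates 1,000 sock puppets that each submit a 5-star rating.
sybil_ratings = [5] * 1000

before = average_rating(honest_ratings)
after = average_rating(honest_ratings + sybil_ratings)

print(f"before sock puppets: {before:.2f}")  # 1.50
print(f"after sock puppets:  {after:.2f}")   # 4.97
```

Nothing in the averaged number distinguishes the ten real counterparties from the thousand fabricated ones.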

Research from the MIT Media Lab found that reputation-based agent systems are vulnerable to strategic whitewashing: an agent accumulates a bad reputation, destroys its identity, creates a new one, and starts fresh. In systems where identity creation is cheap, reputation is meaningless.

Trust Is Earned

Trust scoring, as Shulam implements it, is fundamentally different. It is:

  • Deterministic. Two observers with the same data will always calculate the same score. There is no subjective judgment, no averaging of opinions, no room for manipulation through fake endorsements.
  • Identity-bound. Trust scores are attached to soulbound identities (AAIN via ERC-8004). An agent cannot destroy its identity and start over. Bad history follows the agent permanently.
  • Multi-factor. Instead of a single aggregate number, the score is composed of 7 independently measured factors. Gaming one factor does not improve the others. An agent with perfect uptime but no compliance attestations will still score low.
  • Time-weighted. Recent behavior counts more than old behavior, but old behavior never disappears entirely. A compliance violation from 6 months ago still affects the score, just less than one from yesterday.
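The time-weighted property can be sketched with exponential decay. This is a minimal illustration, not Shulam's actual formula: the decay curve, the 90-day half-life, and the function name `decay_weight` are all assumptions chosen for the example. It is also deterministic in the sense described above, since the same inputs always produce the same weight.

```python
HALF_LIFE_DAYS = 90.0  # assumption for illustration; not a published Shulam parameter


def decay_weight(age_days: float, half_life_days: float = HALF_LIFE_DAYS) -> float:
    """Return a weight in (0, 1]: 1.0 for an event today, halving every half-life."""
    if age_days < 0:
        raise ValueError("age_days must be non-negative")
    return 0.5 ** (age_days / half_life_days)


# A violation from yesterday versus one from roughly six months ago.
recent = decay_weight(1)
old = decay_weight(180)

print(f"yesterday:    {recent:.3f}")
print(f"6 months ago: {old:.3f}")  # 0.250 with a 90-day half-life
```

The older violation carries less weight than the recent one, but its weight never reaches zero.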

How Shulam Measures Trust: The 7 Factors

The Shulam Trust Score evaluates agents across seven dimensions, each capturing a different aspect of trustworthiness:

  • Identity Verification: 20%
  • Transaction History: 18%
  • Compliance Record: 17%
  • Behavioral Consistency: 15%
  • Uptime & Reliability: 12%
  • Security Posture: 10%
  • Network Endorsements: 8%

The key insight: reputation systems only capture what other agents think (equivalent to our "Network Endorsements" factor at 8%). Trust scoring measures what the agent actually did, which accounts for the other 92%.
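To make the 8% versus 92% split concrete, here is a hedged sketch of a weighted composite. The weights come from the list above. Everything else is an assumption for illustration: that each factor is normalized to the range 0 to 1, that factors combine linearly, and that the result maps onto a 300 to 850 scale (the span covered by the authority levels). The factor key names are hypothetical.

```python
# Weights in percent, taken from the seven factors listed above.
WEIGHTS_PCT = {
    "identity_verification": 20,
    "transaction_history": 18,
    "compliance_record": 17,
    "behavioral_consistency": 15,
    "uptime_reliability": 12,
    "security_posture": 10,
    "network_endorsements": 8,
}
SCORE_MIN, SCORE_MAX = 300, 850  # assumed scale, inferred from the authority ranges


def trust_score(factors: dict) -> int:
    """Combine factor values in [0, 1] into a single score (assumed linear model)."""
    if set(factors) != set(WEIGHTS_PCT):
        raise ValueError("exactly the seven known factors are required")
    for name, value in factors.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    weighted = sum(WEIGHTS_PCT[name] * factors[name] for name in WEIGHTS_PCT)
    return round(SCORE_MIN + (SCORE_MAX - SCORE_MIN) * weighted / 100)


# An agent with perfect endorsements and nothing else to show for itself.
endorsements_only = {name: 0.0 for name in WEIGHTS_PCT}
endorsements_only["network_endorsements"] = 1.0

print(trust_score(endorsements_only))  # 344
```

Under these assumptions, an agent that maximizes only what other agents say about it barely moves off the floor of the scale.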

The Authority Ladder

Trust scores translate into four authority levels that control what an agent can do:

  • Watch (300-499): Read-only. Every action needs human approval.
  • Draft (500-649): Can propose actions, but cannot execute.
  • Act (650-749): Autonomous within configurable limits.
  • Authority (750-850): Full autonomy. Can approve other agents.
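The ladder amounts to a threshold lookup. The sketch below uses the score ranges listed above. The function and enum names are hypothetical, and rejecting scores outside 300 to 850 is an assumption, since the ladder does not say how out-of-range values are handled.

```python
from enum import Enum


class AuthorityLevel(Enum):
    WATCH = "Watch"
    DRAFT = "Draft"
    ACT = "Act"
    AUTHORITY = "Authority"


def authority_level(score: int) -> AuthorityLevel:
    """Map a trust score to its authority level using the published ranges."""
    if not 300 <= score <= 850:
        # Assumption: scores outside the ladder's ranges are treated as invalid.
        raise ValueError("score must be between 300 and 850")
    if score >= 750:
        return AuthorityLevel.AUTHORITY
    if score >= 650:
        return AuthorityLevel.ACT
    if score >= 500:
        return AuthorityLevel.DRAFT
    return AuthorityLevel.WATCH


print(authority_level(762).value)  # Authority
print(authority_level(499).value)  # Watch
```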

In a reputation system, an agent with a 5-star rating and an agent with a 750 trust score might look equivalent. But the 5-star agent could be 5 sock puppets. The 750-score agent has 90+ days of verified transactions, a clean compliance record, and a cryptographically bound identity. The difference is auditability.

Why This Matters for Enterprise

Enterprise adoption of autonomous AI agents is bottlenecked by a single question: "How do we know it is safe to let the agent act without a human in the loop?"

Reputation cannot answer that question. It is too easily gamed, too subjective, and too binary (trusted/untrusted). Regulators will not accept "other agents gave it 5 stars" as evidence of proportionate oversight.

Trust scoring can. It provides a quantitative, auditable, multi-factor assessment that maps directly to authority levels. When a regulator asks "why did you let this agent execute a $100,000 transaction without approval?" the answer is: "Because its trust score was 762, composed of these 7 verified factors, updated 4 hours ago, with a full audit trail on-chain."

That is the difference between a governance system that works on paper and one that works in court.
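One way to picture the answer to the regulator is as a decision record that captures the score, its age, and the reason for the outcome. This is a sketch under stated assumptions, not Shulam's API: the `Decision` fields, the `decide` function, and the 24-hour freshness limit are all hypothetical. Only the 750 threshold comes from the authority ladder.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

AUTHORITY_THRESHOLD = 750            # lower bound of the Authority level
MAX_SCORE_AGE = timedelta(hours=24)  # hypothetical freshness policy


@dataclass(frozen=True)
class Decision:
    agent_id: str
    amount_usd: int
    trust_score: int
    score_updated_at: datetime
    decided_at: datetime
    approved: bool
    reason: str


def decide(agent_id, amount_usd, trust_score, score_updated_at, now):
    """Approve unattended execution only for a fresh, Authority-level score."""
    if now - score_updated_at > MAX_SCORE_AGE:
        approved, reason = False, "trust score is stale"
    elif trust_score < AUTHORITY_THRESHOLD:
        approved, reason = False, "trust score below Authority threshold"
    else:
        approved, reason = True, "trust score meets Authority threshold"
    return Decision(agent_id, amount_usd, trust_score,
                    score_updated_at, now, approved, reason)


now = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
record = decide("agent-1", 100_000, 762, now - timedelta(hours=4), now)
print(record.approved, "-", record.reason)
```

Because the record is immutable and carries its own inputs, the same decision can be re-derived later from the stored values.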

See how agents without trust scoring compare on the comparison page.

See How Trust Scoring Works

Explore the 7-factor model, authority levels, and real-time scoring that powers the Agent Trust Network.
