Building Trustworthy AI Agents: Why We Need Standards Now
The Model Context Protocol (MCP) is becoming the standard way AI systems connect to tools and databases. Think of it as a translator that lets AI assistants talk to external services like email, calendars, or customer databases. As MCP spreads across industries, our new position paper argues that we should embed safeguards directly into the protocol itself, rather than hoping each company implements them separately.
The Problem with Leaving Safety to Chance
Right now, trustworthiness measures are typically added by individual companies at the application level. This creates gaps. One organization might build careful oversight systems, while another cuts corners. It's like having food safety standards that some restaurants follow rigorously while others ignore.
In our work, together with researchers at the University of Tübingen, we point out the analogy to HTTPS. In the web's early years, encryption was optional: each website decided whether to encrypt data, and adoption stayed inconsistent until secure connections became a default built into browsers and the protocols themselves. Today, encrypted web traffic is close to universal.
MCP could work the same way. Since the protocol sits between AI systems and their tools, improvements embedded here would automatically apply across all deployments.
Seven Key Trustworthiness Challenges
We analyzed MCP against the seven key requirements of the European Commission's Ethics Guidelines for Trustworthy AI; here are the main issues:
Human Oversight Breaks Down: When an AI system acts through multiple steps (human instruction → agent reasoning → tool execution), the original intent can get lost or distorted along the way. Users are often asked to approve actions described in technical jargon they don't fully understand. We suggest that systems continuously monitor for "task drift" and flag actions that stray too far from what users intended.
Security Gaps Exist Across the Supply Chain: An analysis of 1,899 MCP servers found that 7.2% had security vulnerabilities, yet traditional security tools miss many MCP-specific risks. Compromised tools can access databases and file systems directly. The protocol validates whether a tool is registered but doesn't guarantee it's safe to run.
Privacy Gets Messy with Multiple Tools: When an AI system passes data between different tools, it's hard to track where information goes or to prevent it from being misused. We recommend "sticky policies": metadata that travels with the data and records the purpose it was collected for, so each tool can check whether a new use is permitted.
Transparency is Absent: Users and auditors can't easily see what happened when something goes wrong. Our paper suggests standardized "Server Cards" (like product labels for tools) that document what data a tool accesses, who maintains it, and what it's designed for.
Bias Compounds Across Workflows: Research has shown that when multiple AI agents discuss a topic, echo-chamber dynamics that reinforce initial positions can produce biased outcomes. Different tools may embed different biases, and these can amplify each other. Currently, there's no protocol-level way to test for or prevent this.
Power Concentrates in Registries: MCP registries (like app stores for AI tools) control which tools are visible and trusted. This could create gatekeeping power similar to how app store rankings determine which apps succeed or fail. We need transparent ranking rules and contestable delisting processes.
Speed of Deployment Outpaces Worker Adaptation: MCP makes AI deployment faster and cheaper, which could accelerate job displacement without giving workers or institutions time to adjust. We suggest including participatory rollout mechanisms so workers have input on how automation affects their roles.
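To make the sticky-policy and Server Card ideas above more concrete, here is a minimal Python sketch of a purpose check a host could run before passing data to a tool. It assumes a hypothetical Server Card with `declared_purposes` and a hypothetical sticky policy with `allowed_purposes`; these field names, the `check_purpose` function, and the blocking rule are illustrative assumptions, not part of the MCP specification or the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerCard:
    """Hypothetical tool label: purpose, maintainer, and data the tool touches."""
    tool_name: str
    maintainer: str
    declared_purposes: frozenset[str]
    data_accessed: frozenset[str]


@dataclass(frozen=True)
class StickyPolicy:
    """Hypothetical metadata that travels with a piece of data between tools."""
    original_purpose: str
    allowed_purposes: frozenset[str]


@dataclass
class TaggedData:
    payload: dict
    policy: StickyPolicy


class PurposeViolation(Exception):
    """Raised when data would flow to a tool for a purpose its policy does not allow."""


def check_purpose(data: TaggedData, card: ServerCard) -> None:
    # A tool that declares no purpose cannot be checked, so treat it as a violation.
    if not card.declared_purposes:
        raise PurposeViolation(f"{card.tool_name} declares no purpose")
    # Block the tool if it declares any purpose outside the data's allowed set.
    disallowed = card.declared_purposes - data.policy.allowed_purposes
    if disallowed:
        raise PurposeViolation(
            f"{card.tool_name} declares {sorted(disallowed)}, but this data was "
            f"collected for '{data.policy.original_purpose}'"
        )


record = TaggedData(
    payload={"patient_id": "p-001", "lab_result": 4.2},
    policy=StickyPolicy("diagnosis", frozenset({"diagnosis", "clinical_audit"})),
)
lab_tool = ServerCard("lab-lookup", "Hospital IT", frozenset({"diagnosis"}), frozenset({"lab_results"}))
crm_tool = ServerCard("crm-export", "Vendor X", frozenset({"marketing"}), frozenset({"contact_details"}))

check_purpose(record, lab_tool)  # allowed: returns None without raising
try:
    check_purpose(record, crm_tool)
except PurposeViolation as err:
    print(f"blocked: {err}")
```

The check is deliberately conservative: a tool that lists any disallowed purpose is blocked, even if it also lists an allowed one.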
Five Concrete Improvements
Rather than vague principles, we propose five specific changes:
Optional profiles for high-risk deployments: Keep MCP lightweight by default, but allow stricter requirements (like human approval) for sensitive applications like healthcare.
Standardized metadata: Add information to tools describing their purpose, data access patterns, risk level, and who built them. This helps systems make smarter decisions about which tools to use.
Host-side enforcement: The AI system's host (the organization running it) should be responsible for enforcing safeguards like audit trails and limiting tool access to what's necessary.
Registry governance: Treat tool registries as rule-making bodies with transparent policies about which tools get listed, how they're ranked, and how they're removed if problematic.
Sustainability and fairness tracking: Add an "Eco Mode" that monitors energy use and limits unnecessary tool calls. Document tools' accessibility features and bias testing.
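Several of these improvements (an optional high-risk profile, host-side enforcement of least-privilege tool access, and an audit trail) can be combined in one host-side gate. The Python sketch below is illustrative only: the `Profile` and `Host` classes, the profile names, the `requires_approval` flag, and the `approve` callback are hypothetical, MCP does not define them, and a real host would sit in front of an MCP client session rather than calling plain Python functions.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Callable


@dataclass(frozen=True)
class Profile:
    """Hypothetical deployment profile: a lightweight core plus optional stricter rules."""
    name: str
    allowed_tools: frozenset[str]    # least privilege: only these tools may be called
    requires_approval: bool = False  # high-risk profiles add a human sign-off step


BASELINE = Profile("baseline", frozenset({"calendar.read", "email.draft"}))
HIGH_RISK = Profile("high-risk", frozenset({"ehr.read"}), requires_approval=True)


def deny_by_default(tool: str, args: dict) -> bool:
    # Without an explicit approver, high-risk calls are refused.
    return False


@dataclass
class Host:
    """Hypothetical host-side gate between the model and its tools."""
    profile: Profile
    tools: dict[str, Callable[..., Any]]
    approve: Callable[[str, dict], bool] = deny_by_default
    audit_log: list[dict] = field(default_factory=list)

    def call(self, tool: str, **args: Any) -> Any:
        entry = {
            "time": datetime.now(timezone.utc).isoformat(),
            "profile": self.profile.name,
            "tool": tool,
            "args": args,
        }
        if tool not in self.profile.allowed_tools:
            entry["outcome"] = "denied: not in profile"
        elif tool not in self.tools:
            entry["outcome"] = "denied: unknown tool"
        elif self.profile.requires_approval and not self.approve(tool, args):
            entry["outcome"] = "denied: no human approval"
        else:
            entry["outcome"] = "allowed"
        self.audit_log.append(entry)  # every attempt is recorded, allowed or denied
        if entry["outcome"] != "allowed":
            raise PermissionError(entry["outcome"])
        return self.tools[tool](**args)


def read_record(patient_id: str) -> dict:
    # Stand-in for a real MCP tool call.
    return {"patient_id": patient_id, "status": "stable"}


host = Host(HIGH_RISK, {"ehr.read": read_record}, approve=lambda tool, args: True)
print(host.call("ehr.read", patient_id="p-001"))
try:
    host.call("email.draft", to="colleague@example.org")
except PermissionError as err:
    print(err)  # denied: not in profile
```

Logging denied attempts as well as allowed ones is a deliberate choice in this sketch: reviewers can see what an agent tried to do, not only what it was permitted to do.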
What This Means in Practice
Imagine a hospital using MCP-connected AI to help diagnose patients. Today, doctors cannot see exactly what data the AI sent to each tool, there is no standardized way to audit whether tools were used appropriately, nothing enforces that tools respect patient privacy across the full workflow, and different tools may embed different biases that compound into errors.
With the paper's recommendations, each tool discloses what data it needs and who can approve its use, audit logs automatically track what happened and can be reviewed, purpose metadata helps ensure patient data is not repurposed for unrelated tasks, and tools are tested for bias and documented accordingly. In Europe, deploying such a workflow under FADP/GDPR and the AI Act still requires institutional work (data processing agreements, impact assessments, human oversight at key decision points), but a trustworthy MCP can support that work by shifting the effort from reinventing each safeguard to configuring shared ones.
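For audit logs to be useful in a review, reviewers need some assurance that entries were not edited afterwards. One common technique, which the paper does not prescribe, is to hash-chain the entries so that changing or deleting any past entry breaks verification. The minimal Python sketch below uses only the standard library; the `HashChainedLog` class and the entry fields are hypothetical.

```python
import hashlib
import json


class HashChainedLog:
    """Append-only log in which each entry stores the hash of the previous one."""

    GENESIS = "0" * 64  # placeholder "previous hash" for the first entry

    def __init__(self) -> None:
        self.entries: list[dict] = []

    @staticmethod
    def _digest(body: dict, prev_hash: str) -> str:
        # sort_keys gives a stable serialization, so identical content hashes identically.
        data = json.dumps(body, sort_keys=True) + prev_hash
        return hashlib.sha256(data.encode("utf-8")).hexdigest()

    def append(self, body: dict) -> None:
        prev = self.entries[-1]["hash"] if self.entries else self.GENESIS
        self.entries.append({"body": body, "prev": prev, "hash": self._digest(body, prev)})

    def verify(self) -> bool:
        # Recompute every link; any edited, reordered, or deleted entry breaks the chain.
        prev = self.GENESIS
        for entry in self.entries:
            if entry["prev"] != prev or entry["hash"] != self._digest(entry["body"], prev):
                return False
            prev = entry["hash"]
        return True


log = HashChainedLog()
log.append({"tool": "ehr.read", "patient_id": "p-001", "purpose": "diagnosis"})
log.append({"tool": "lab-lookup", "patient_id": "p-001", "purpose": "diagnosis"})
print(log.verify())  # True

log.entries[0]["body"]["purpose"] = "marketing"  # simulate after-the-fact tampering
print(log.verify())  # False
```

Hash chaining only makes tampering evident if the most recent hash is also stored somewhere the log writer cannot rewrite; otherwise an attacker could simply recompute the whole chain.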
The Bigger Picture
We don't argue that protocols alone can ensure trustworthiness. We argue that protocols can provide the scaffolding that makes responsible practices easier to implement consistently. Right now, every organization rebuilds these safeguards from scratch. Standardizing them at the protocol level means building them once, correctly, for everyone.
The challenge is balancing safety with usability. Add too many requirements and developers will bypass the protocol or abandon it. Make it too flexible and we're back to inconsistent adoption. Our solution: a minimal core with optional requirements that organizations can activate based on risk level.
As AI agents become more autonomous and more integrated into critical systems, the stakes rise. Standards that seem overcautious today might look obvious in five years, after something goes wrong.
For technical details and citations, see the full ICML 2026 paper: Trustworthy Model Context Protocol Position Paper (preprint, May 15, 2026)