SAFE-MCP Community Recap: Building a Security Baseline for AI Agents
From Multi-modal Prompt Injection to Agent CLI Exploitation, from Non-Human Identity to Decentralized Trust — a deep dive into MCP security in practice
📌 TL;DR: Quick Take on SAFE-MCP
- SAFE-MCP is an open-source security framework designed to build an actionable security baseline for the Model Context Protocol (MCP), helping enterprises and developers systematically defend against AI Agent risks.
- It adopts a MITRE ATT&CK-style taxonomy, assigning a unique ID (e.g., `SAFE-T1001`) to each MCP attack (Prompt Injection, OAuth abuse, Agent Chain attacks), enabling standardized naming and cross-team communication.
- All security entries are based on real-world attack cases, including:
- Multi-modal Prompt Injection (instructions hidden in images/audio)
- Agent CLI Weaponization (malicious NPM packages stealing SSH keys and crypto wallets)
- Vector Database Poisoning (covert prompt injection in RAG systems)
- OAuth token abuse and missing audience validation
- Drives collaboration with the OIDC AI Identity WG, Linux Foundation OpenSSF, and OWASP, aiming to evolve SAFE-MCP into an industry-wide security standard.
- Provides a vulnerability sandbox with reproducible attack scenarios, supporting hackathons and security training.
- Encourages community contributions: developers can submit techniques, mitigations, or tool integrations to become co-authors of the standard.
- The ultimate goal: Enable CISOs and engineers to confidently answer—“Is your MCP system SAFE-MCP compliant?”
🔍 What is SAFE-MCP? More Than a Guide—It’s a “Security Language”
SAFE-MCP (Secure Model Context Protocol) is an open-source security framework created to provide an actionable evaluation and defense system for Anthropic’s Model Context Protocol (MCP).
Its origin comes from a real-world problem:
When a CISO is asked, “Can we safely deploy MCP?”, they have no authoritative security baseline to rely on.
SAFE-MCP aims to become the checklist you can tick off. Just as you reference the OWASP Top 10 for web services or NIST SP 800-204D for cloud-native systems, in the future you will ask:
“Is it SAFE-MCP compliant?”
👉 GitHub: https://github.com/SAFE-MCP/safe-mcp
🧩 Three Core Design Principles
1. Modeled after MITRE ATT&CK, Building a Taxonomy of Attacks
SAFE-MCP models MCP threats in three layers:
- Tactics: The attacker’s goals, such as privilege escalation or data exfiltration
- Techniques: The concrete methods, such as Prompt Injection or OAuth token abuse
- Evidence: Real-world incidents and forensic indicators
This structure shifts security teams from vague concerns to specific defenses.
2. Defining a “Security Vocabulary”: Unified Language for Engineers
Each attack is assigned a unique ID, for example:
Tactic ID | Tactic Name | Technique ID | Technique Name | Description |
---|---|---|---|---|
ATK-TA0001 | Initial Access | SAFE-T1001 | Tool Poisoning Attack (TPA) | Attackers embed malicious instructions within MCP tool descriptions that are invisible to users but processed by LLMs |
ATK-TA0002 | Execution | SAFE-T1101 | Command Injection | Exploitation of unsanitized input in MCP server implementations leading to remote code execution |
ATK-TA0003 | Persistence | SAFE-T1201 | MCP Rug Pull Attack | Time-delayed malicious tool definition changes after initial approval |
Like HTTP 404 or CVE IDs, this makes communication faster and more precise.
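To make the "shared vocabulary" idea concrete, here is a minimal sketch of the three rows above as a machine-readable catalog. The `Technique` class, the `lookup` helper, and the ID pattern are illustrative assumptions inferred from the example IDs, not part of the SAFE-MCP project itself.

```python
import re
from dataclasses import dataclass

# Assumed ID shape, inferred from the example IDs in the table above.
TECHNIQUE_ID = re.compile(r"SAFE-T\d{4}")


@dataclass(frozen=True)
class Technique:
    technique_id: str
    name: str
    tactic_id: str
    tactic_name: str


# The three example rows from the table above.
CATALOG = {
    t.technique_id: t
    for t in (
        Technique("SAFE-T1001", "Tool Poisoning Attack (TPA)", "ATK-TA0001", "Initial Access"),
        Technique("SAFE-T1101", "Command Injection", "ATK-TA0002", "Execution"),
        Technique("SAFE-T1201", "MCP Rug Pull Attack", "ATK-TA0003", "Persistence"),
    )
}


def lookup(technique_id: str) -> Technique:
    """Return the catalog entry for an ID, rejecting malformed or unknown IDs."""
    if not TECHNIQUE_ID.fullmatch(technique_id):
        raise ValueError(f"malformed technique ID: {technique_id!r}")
    if technique_id not in CATALOG:
        raise KeyError(f"unknown technique ID: {technique_id}")
    return CATALOG[technique_id]


if __name__ == "__main__":
    entry = lookup("SAFE-T1201")
    print(f"{entry.technique_id}: {entry.name} ({entry.tactic_name})")
```

With a catalog like this, a ticket, log line, or scanner finding can carry just the ID and every team resolves it to the same definition.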
3. Grounded in Real Attacks, Not Theoretical Models
Frederick Kautz emphasized:
“Many developers think MCP attacks are still theoretical, but the reality is we see new reports almost daily—crypto wallets drained, data leaks, systems compromised via prompt injection.”
Every SAFE-MCP entry must include a real-world case or reproducible PoC to ensure practical value.
⚠️ OAuth: “A Simple Protocol with Complex Traps”
The MCP spec looks clean: at its core it is a JSON-RPC message interface. The real complexity comes from what it depends on, namely OAuth and Server-Sent Events (SSE).
As Frederick noted:
“MCP itself is ‘easy’ but not ‘simple.’ It pulls in protocols like OAuth, which are extremely complex and disastrous if misused.”
🔐 OAuth Best Practices: Two Non-Negotiable Rules
✅ Rule 1: Never Hand-Roll OAuth
- OAuth involves dozens of security edge cases.
- Correct approach: use a mature library or provider SDK (e.g., Auth0, Okta) rather than reinventing the wheel.
✅ Rule 2: Always Validate the `audience` Field
Suppose your MCP service is `acme.example`. When receiving a GitHub token, you must check:
{
"aud": "acme.example"
}
If `aud` is `github.com` or something else, the token was not issued for you and must be rejected.
🌪️ Otherwise, attackers could reuse tokens from other MCP services to impersonate users.
Frederick warned:
“If I don’t validate audience, an attacker can take a token issued for foo.com and use it against me. The token is valid, but it’s not mine — I must reject it.”
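The audience rule can be sketched in a few lines. This is a minimal illustration, assuming the token's signature and expiry have already been verified by a mature OAuth/JWT library (per Rule 1) and that `claims` is the decoded payload. The function name `audience_matches` is hypothetical; `acme.example` and `foo.com` come from the examples above.

```python
def audience_matches(claims: dict, expected_audience: str) -> bool:
    """Return True only if the token's `aud` claim names this service.

    Per RFC 7519, `aud` may be a single string or a list of strings.
    This check does NOT verify the signature; it assumes that was done first.
    """
    aud = claims.get("aud")
    if isinstance(aud, str):
        return aud == expected_audience
    if isinstance(aud, list):
        return expected_audience in aud
    return False  # missing or malformed `aud`: reject


if __name__ == "__main__":
    print(audience_matches({"aud": "acme.example"}, "acme.example"))
    print(audience_matches({"aud": "foo.com"}, "acme.example"))
```

Rejecting a missing `aud` is a deliberate choice here: a token with no stated audience gives no evidence that it was issued for this service.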
🧪 New Attack Cases: MCP’s Rapidly Expanding Threat Surface
SAFE-MCP highlights entirely new attack paths emerging in the AI Agent ecosystem. Recent PRs captured threats like:
Attack Type | Description | Mitigation |
---|---|---|
Multi-modal Prompt Injection | Encode malicious instructions in image pixels (e.g., 255,255,254) or audio spectrograms | Semantic input validation, limit contextual injections |
Agent CLI Weaponization | Malicious NPM package executes on install, stealing .env, SSH keys, wallets | Disable auto-exec, enforce human code review |
Vector DB Poisoning (RAG Attacks) | Inject prompts into KB, triggered during retrieval | Integrity checks and access controls for RAG sources |
Container Escape & Privilege Escalation | Exploit sandbox tool vulnerabilities to break out to host | Enforce least privilege, strong isolation |
Agent Chain MITM | Alter messages between chained Agents to mislead actions | Secure handshake ensuring trusted sources |
Privilege Escalation in Delegation | Read-only Agent upgraded to read-write+execute mid-chain | Implement privilege chain validation |
These cases show: MCP risks go far beyond input poisoning — they affect the full AI workflow.
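As one example, the "privilege chain validation" mitigation in the last row can be sketched as a subset check along the delegation chain. This is an illustrative sketch, not SAFE-MCP's reference mitigation: permissions are modeled as plain sets of strings, and the function name is hypothetical.

```python
def validate_delegation_chain(chain: list) -> bool:
    """Return True if no hop holds a permission its delegator lacked.

    `chain` is a list of permission sets, ordered from the original
    caller to the final Agent in the chain.
    """
    for delegator, delegate in zip(chain, chain[1:]):
        # A delegate may narrow permissions but never widen them.
        if not set(delegate) <= set(delegator):
            return False
    return True


if __name__ == "__main__":
    narrowing = [{"read", "write"}, {"read"}, {"read"}]
    escalating = [{"read"}, {"read", "write", "execute"}]
    print(validate_delegation_chain(narrowing))
    print(validate_delegation_chain(escalating))
```

The second chain mirrors the case in the table: a read-only Agent that shows up mid-chain with write and execute rights fails the check.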
🛠️ Identity & Trust Models: From NHI to Decentralized CA
SAFE-MCP also tackles trust — how do we know who an Agent is, and what it can do?
🔑 Non-Human Identity (NHI)
- Current identity systems (Okta, etc.) are human-centric.
- AI Agents, scripts, and microservices also need identities.
- Proposal:
  - Use TPM or cloud TEEs (e.g., AWS Nitro) as trust roots
  - Issue sub-certificates to Agents
  - Build traceable, auditable chains
💡 Example: PayPal could run its own CA, issuing certs for all internal AI Agents.
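The "traceable chain" part of this proposal can be illustrated with a toy model. This sketch only models the bookkeeping of who issued whom. It performs no cryptography, so it is not a substitute for real X.509 certificate validation, and the names (`Identity`, `trace_to_root`, `tpm-root`, `org-agent-ca`, `billing-agent`) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Identity:
    name: str
    issuer: Optional["Identity"] = None  # None marks a trust root


def trace_to_root(identity: Identity, max_depth: int = 8) -> list:
    """Walk issuer links and return the chain of names, leaf first."""
    path = []
    current = identity
    while current is not None:
        path.append(current.name)
        if len(path) > max_depth:
            raise ValueError("issuer chain too long")
        current = current.issuer
    return path


def is_trusted(identity: Identity, trusted_roots: set) -> bool:
    """An identity is trusted only if its chain ends at a known root."""
    return trace_to_root(identity)[-1] in trusted_roots


if __name__ == "__main__":
    root = Identity("tpm-root")                # hardware-backed trust root
    org_ca = Identity("org-agent-ca", root)    # the organization's own CA
    agent = Identity("billing-agent", org_ca)  # an internal AI Agent
    print(trace_to_root(agent))
    print(is_trusted(agent, {"tpm-root"}))
```

The returned path is what makes the chain auditable: every Agent action can be attributed to a named issuer and, ultimately, to a root the organization controls.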
🌐 Decentralized Trust Models
- Don’t rely on one central CA.
- Support multi-org, multi-tenant distributed trust.
- Inspired by Web3: record policies/identities on blockchain or decentralized storage.
“Instead of one central CA, PayPal controls its own Agents, another company controls theirs — that’s a sane trust boundary.” — Arjun Subedi
🧩 MCP in Essence: Not an API Replacement, but a “Bridge”
A frequent question: How is MCP different from APIs?
Answer: MCP is an API spec purpose-built for LLMs.
🔄 MCP’s Two-phase Interaction Model
- Initialization (Schema Registration)
  - Tool provides JSON Schema describing functions, parameters, return values.
  - Example: `{ "name": "send_email", "description": "Send an email to a recipient", "parameters": { ... } }`
- Invocation (Structured Execution)
  - LLM interprets intent → generates schema-compliant JSON
  - Tool executes → returns structured result
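The two phases can be sketched with a tiny in-memory registry. This is a simplified illustration rather than an MCP implementation: `TOOLS`, `register_tool`, and `invoke` are hypothetical names, the argument check covers only required and unknown fields, and the `send_email` handler is a stub that sends nothing. The schema key is named `parameters` to mirror the example above; the MCP specification's tool definitions name this field `inputSchema`.

```python
# Hypothetical in-memory registry; a real MCP server exposes tools over the protocol.
TOOLS = {}


def register_tool(schema, handler):
    """Phase 1 (initialization): record a tool's schema and its implementation."""
    TOOLS[schema["name"]] = (schema, handler)


def invoke(call):
    """Phase 2 (invocation): check an LLM-generated call against the schema, then run it."""
    if call["name"] not in TOOLS:
        raise ValueError(f"unknown tool: {call['name']}")
    schema, handler = TOOLS[call["name"]]
    params = schema["parameters"]
    args = call.get("arguments", {})
    missing = [k for k in params.get("required", []) if k not in args]
    unknown = [k for k in args if k not in params.get("properties", {})]
    if missing or unknown:
        raise ValueError(f"invalid arguments: missing={missing}, unknown={unknown}")
    return handler(**args)


def send_email(to, subject):
    # Stub handler: returns a structured result without sending anything.
    return {"status": "queued", "to": to, "subject": subject}


register_tool(
    {
        "name": "send_email",
        "description": "Send an email to a recipient",
        "parameters": {
            "type": "object",
            "properties": {"to": {"type": "string"}, "subject": {"type": "string"}},
            "required": ["to", "subject"],
        },
    },
    send_email,
)

if __name__ == "__main__":
    call = {"name": "send_email", "arguments": {"to": "a@example.com", "subject": "Hello"}}
    print(invoke(call))
```

Validating before executing matters for security as well as correctness: the call is generated by a model, so the tool should treat it as untrusted input.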
✅ Why MCP?
Traditional APIs | MCP |
---|---|
Manual integration logic | LLM auto-discovers and calls |
Assumes deterministic caller | Maps fuzzy language to exact fields |
No unified discovery | Tools auto-exposed via `tools/list` |
“The value of MCP isn’t fewer APIs, but enabling LLMs to call any tool safely and reliably.” — Frederick Kautz
📚 From Policy to Procedure: SAFE-MCP in Compliance Systems
SAFE-MCP’s philosophy maps neatly to enterprise compliance layers:
Layer | Example | SAFE-MCP Role |
---|---|---|
Policy | “AI systems must prevent data leakage” | Defines high-level security principle |
Standard | MITRE ATT&CK, OWASP AI Top 10 | Maps attack techniques, taxonomy |
Procedure | “How to configure MCP to prevent T1001” | Concrete operational checklist |
Guidance | “Use LangFuse for observability” | Tooling integration advice |
SAFE-MCP is evolving into the “playbook” of AI security — bridging “what to do” with “how to do it.”
🤝 Community & Collaboration: From Project to Standard
- 9 active contributors, spanning security, AI, and cloud-native domains
- 10+ merged PRs, covering OAuth, CLI security, multimodal attacks
- Migrated to independent GitHub org to mark project’s open-source maturity
- First attack sandbox released, with common MCP misconfigurations for testing
- Bi-weekly hackathons & contributor meetups keep momentum high
🌐 Three Key Standards Partnerships
Org | Collaboration | Impact |
---|---|---|
OIDC AI Identity WG | Co-develop MCP identity standards | NHI standardization |
Linux Foundation OpenSSF | Apply for incubation | Resource support, industry endorsement |
OWASP | Align with AI Security Top 10 | Cross-standard mapping |
Frederick:
“Our goal isn’t an isolated framework — it’s a shared language for AI security.”
🧑‍💻 How Developers Can Contribute
SAFE-MCP is open, transparent, and community-driven.
🚀 Ways to Join
- Contribute Techniques
  - Browse open `T1xxx` issues on GitHub
  - Add cases, tactics, evidence, mitigations
- Submit Mitigations
  - Suggest defenses for existing attacks (`M1`, `M2`, etc.)
- Build Toolchains
  - Auditing, detection, identity mgmt, observability
- Write Docs
  - Guides, best practices, onboarding materials
✅ Conclusion: The Future of SAFE-MCP
SAFE-MCP is building a scalable, verifiable, and practical AI Agent security baseline:
- ✅ Standardized language: IDs and definitions unify communication
- ✅ Real-world grounded: based on true cases, reproducible PoCs
- ✅ Operationalized: checklists, sandboxes, integrations
- ✅ Ecosystem-driven: connected to OIDC, OpenSSF, OWASP
Future possibilities:
- Startups building APIs around `SAFE-T1001` defenses
- CISOs adopting SAFE-MCP as MCP launch criteria
- LLM devs embedding SAFE-MCP checks in dev workflows
Author’s Note: This blog is based on the SAFE-MCP Contributor Gathering (Sept 1, 2024). All details and cases are drawn from the live talks and discussions, aiming to faithfully capture the project’s vision and progress.
If you found this post useful, feel free to bookmark, share, or follow my blog at astromen.github.io!