How CraftedTrust scores MCP servers.
The public score is simple on purpose: 12 weighted categories, backed by 63 underlying checks and labeled with the scan depth behind them.
One canonical model
Everything public rolls up in the same order: score model, categories, underlying checks, then scan depth.
100 points across 12 categories
Buyers see a single score from 0 to 100 points. Each category has a fixed weight, and the weights always add up to 100.
63 checks across 9 research domains
Touchstone runs the deeper checks. Those checks feed the 12 public categories so the registry stays readable without hiding the underlying work.
Coverage changes confidence
A score is stronger when coverage is deeper. Metadata-only evidence is lighter than package verification, live endpoint scans, or manual review.
The 12 public score categories
These are the only categories used in the public trust score.
Identity & Auth
10 points. Auth requirements, documented auth flow, and obvious credential-handling risk.
Permission Scope
8 points. Whether the server asks for more power than it appears to need.
Transport Security
8 points. HTTPS, TLS posture, and basic transport-layer safety signals.
Network Behavior
10 points. Observed outbound behavior, undeclared connections, and suspicious network activity.
Protocol Compliance
8 points. MCP compatibility, capability negotiation, and basic protocol correctness.
Declaration Accuracy
8 points. Whether declared tools and resources match what is actually exposed.
Tool Integrity
10 points. Prompt-injection risk, tool tampering patterns, and risky hidden behavior.
Input Validation
8 points. Input constraints, schema quality, and common injection resistance signals.
Supply Chain
8 points. Dependency risk, package provenance, and known vulnerability exposure.
Code Transparency
6 points. Source availability, repository health, and basic documentation quality.
Publisher Trust
8 points. Verified publisher signals, review history, and public accountability.
Data Protection
8 points. Exposure of credentials, sensitive data, and avoidable data-handling risk.
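As a rough illustration, the weights listed above can be written down and checked against the 100-point total. The `total_score` function and its per-category fraction input are assumptions made for this sketch; this page does not describe how points inside a category are awarded.

```python
# Category weights exactly as listed above.
CATEGORY_WEIGHTS = {
    "Identity & Auth": 10,
    "Permission Scope": 8,
    "Transport Security": 8,
    "Network Behavior": 10,
    "Protocol Compliance": 8,
    "Declaration Accuracy": 8,
    "Tool Integrity": 10,
    "Input Validation": 8,
    "Supply Chain": 8,
    "Code Transparency": 6,
    "Publisher Trust": 8,
    "Data Protection": 8,
}

# The model is described as 12 categories whose weights add up to 100.
assert len(CATEGORY_WEIGHTS) == 12
assert sum(CATEGORY_WEIGHTS.values()) == 100


def total_score(earned_fraction: dict[str, float]) -> float:
    """Hypothetical roll-up, not CraftedTrust's actual formula.

    ASSUMPTION: each category earns a fraction (0.0 to 1.0) of its weight.
    Categories missing from the input earn nothing.
    """
    total = 0.0
    for name, weight in CATEGORY_WEIGHTS.items():
        fraction = earned_fraction.get(name, 0.0)
        if not 0.0 <= fraction <= 1.0:
            raise ValueError(f"fraction for {name!r} must be between 0 and 1")
        total += weight * fraction
    return total
```

For example, a server that earns full marks everywhere except half of Identity & Auth would land at 95 points under this assumed roll-up.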
How the 63 checks feed the 12 categories
Touchstone organizes its deeper checks into 9 research domains. Those domains do not replace the public score. They feed it.
| Research domain | Checks | Feeds these public categories |
|---|---|---|
| Authentication | 9 | Identity & Auth, Permission Scope |
| Tool Security | 10 | Declaration Accuracy, Tool Integrity |
| Input Validation | 9 | Input Validation, Data Protection |
| Data Security | 6 | Data Protection, Network Behavior |
| Supply Chain | 8 | Supply Chain, Code Transparency, Publisher Trust |
| Infrastructure | 8 | Transport Security, Network Behavior, Protocol Compliance |
| Runtime Behavior | 5 | Tool Integrity, Network Behavior, Protocol Compliance |
| A2A Agent Cards | 5 | Declaration Accuracy, Protocol Compliance |
| Fairness & Bias | 3 | Data Protection, Publisher Trust |
Not every check carries the same weight, and not every scan runs every check. That is why CraftedTrust separates the public score categories from the underlying research domains and then shows scan depth and confidence separately.
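The table above can be read in the other direction too: for each public category, which research domains feed it. The sketch below transcribes the table and inverts it. The names `RESEARCH_DOMAINS` and `domains_by_category` are made up for this example and are not part of any CraftedTrust or Touchstone API.

```python
from collections import defaultdict

# Research domain -> (number of checks, public categories it feeds),
# transcribed from the table above.
RESEARCH_DOMAINS = {
    "Authentication": (9, ["Identity & Auth", "Permission Scope"]),
    "Tool Security": (10, ["Declaration Accuracy", "Tool Integrity"]),
    "Input Validation": (9, ["Input Validation", "Data Protection"]),
    "Data Security": (6, ["Data Protection", "Network Behavior"]),
    "Supply Chain": (8, ["Supply Chain", "Code Transparency", "Publisher Trust"]),
    "Infrastructure": (8, ["Transport Security", "Network Behavior", "Protocol Compliance"]),
    "Runtime Behavior": (5, ["Tool Integrity", "Network Behavior", "Protocol Compliance"]),
    "A2A Agent Cards": (5, ["Declaration Accuracy", "Protocol Compliance"]),
    "Fairness & Bias": (3, ["Data Protection", "Publisher Trust"]),
}

# The per-domain counts should add up to the 63 checks quoted on this page.
total_checks = sum(count for count, _ in RESEARCH_DOMAINS.values())


def domains_by_category(domains):
    """Invert the table: public category -> research domains that feed it."""
    feeders = defaultdict(list)
    for domain, (_, categories) in domains.items():
        for category in categories:
            feeders[category].append(domain)
    return dict(feeders)


FEEDERS = domains_by_category(RESEARCH_DOMAINS)
```

Inverting the table this way shows that all 12 public categories are fed by at least one domain, and that some, such as Network Behavior, draw on three.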
Scan depth and confidence
The same server can have a strong or weak evidence base depending on how much was actually observed.
Metadata only
Basic listing information is present, but package or live behavior has not been fully verified yet. Lowest confidence.
Package verified
Package and source metadata were reviewed. Useful for supply-chain evidence, but still lighter than a live scan.
Live endpoint reached
CraftedTrust successfully contacted the live server and recorded its behavior. This is stronger evidence for buyer review than package verification alone.
Manual review performed
A deeper publisher review or certification pass exists. This adds the strongest public confidence signal.
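One way to model the four labels above is as an ordered enumeration, so that deeper evidence compares as greater. Only the ordering comes from this page. The integer values, the `ScanDepth` name, and the `strongest_evidence` helper are assumptions for illustration.

```python
from enum import IntEnum


class ScanDepth(IntEnum):
    """Scan-depth labels, ordered from lowest to highest confidence.

    ASSUMPTION: the numeric values are arbitrary; only their order matters.
    """
    METADATA_ONLY = 1
    PACKAGE_VERIFIED = 2
    LIVE_ENDPOINT_REACHED = 3
    MANUAL_REVIEW_PERFORMED = 4


def strongest_evidence(depths):
    """Return the deepest scan level recorded for a server (hypothetical helper)."""
    depths = list(depths)
    if not depths:
        raise ValueError("at least one scan depth is required")
    return max(depths)
```

With this ordering, a server that has both a metadata listing and a live endpoint scan would be reported at the live endpoint level.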
Letter grades stay fixed
- A: 90-100
- B: 75-89
- C: 60-74
- D: 40-59
- F: 0-39
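The grade bands above translate into a simple threshold lookup. This is a minimal sketch: the function name is made up, and treating a fractional score such as 89.5 as belonging to the lower band is an assumption, since the bands are only stated for whole numbers.

```python
# Lower bound of each grade band, highest first, taken from the list above.
GRADE_FLOORS = [(90, "A"), (75, "B"), (60, "C"), (40, "D"), (0, "F")]


def letter_grade(score: float) -> str:
    """Map a 0-100 score to its letter grade.

    ASSUMPTION: a fractional score falls into the band whose floor it has reached.
    """
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    for floor, grade in GRADE_FLOORS:
        if score >= floor:
            return grade
    raise AssertionError("unreachable: every valid score is at least 0")
```

Under these bands, a score of 89 is a B and a score of 90 is an A.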
Support material, not a second score
CoSAI, OWASP MCP and agentic AI guidance, and selected buyer diligence mappings help translate findings. They do not replace the core 12-category score model.