Methodology

How CraftedTrust scores MCP servers.

The public score is simple on purpose: 12 weighted categories, backed by 63 underlying checks and a scan-depth label that shows how much evidence sits behind each score.

One canonical model

Everything public rolls up in the same order: score model, categories, underlying checks, then scan depth.

Score model

100 points across 12 categories

The score buyers see is on a 0-100 point scale. Each category has a fixed weight, and the weights always add up to 100.

Underlying checks

63 checks across 9 research domains

Touchstone runs the deeper checks. Those checks feed the 12 public categories so the registry stays readable without hiding the underlying work.

Scan depth

Coverage changes confidence

A score is stronger when coverage is deeper. Metadata-only evidence is lighter than package verification, live endpoint scans, or manual review.

The 12 public score categories

These are the only categories used in the public trust score.

Authentication & Access

Identity & Auth

10 points. Auth requirements, documented auth flow, and obvious credential-handling risk.

Authentication & Access

Permission Scope

8 points. Whether the server asks for more power than it appears to need.

Server Security

Transport Security

8 points. HTTPS, TLS posture, and basic transport-layer safety signals.

Server Security

Network Behavior

10 points. Observed outbound behavior, undeclared connections, and suspicious network activity.

Server Security

Protocol Compliance

8 points. MCP compatibility, capability negotiation, and basic protocol correctness.

Tool Safety

Declaration Accuracy

8 points. Whether declared tools and resources match what is actually exposed.

Tool Safety

Tool Integrity

10 points. Prompt-injection risk, tool tampering patterns, and risky hidden behavior.

Tool Safety

Input Validation

8 points. Input constraints, schema quality, and common injection resistance signals.

Supply Chain

Supply Chain

8 points. Dependency risk, package provenance, and known vulnerability exposure.

Supply Chain

Code Transparency

6 points. Source availability, repository health, and basic documentation quality.

Supply Chain

Publisher Trust

8 points. Verified publisher signals, review history, and public accountability.

Data Handling

Data Protection

8 points. Exposure of credentials, sensitive data, and avoidable data-handling risk.
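
The weights listed above can be checked with a few lines of arithmetic. The sketch below is illustrative only: the category names and point values come from this page, but the `total_score` function and the idea that each category earns a fraction of its weight are assumptions, not a description of how CraftedTrust computes the score internally.

```python
# Category weights as listed on this page (points per public category).
WEIGHTS = {
    "Identity & Auth": 10,
    "Permission Scope": 8,
    "Transport Security": 8,
    "Network Behavior": 10,
    "Protocol Compliance": 8,
    "Declaration Accuracy": 8,
    "Tool Integrity": 10,
    "Input Validation": 8,
    "Supply Chain": 8,
    "Code Transparency": 6,
    "Publisher Trust": 8,
    "Data Protection": 8,
}

# The page states 12 categories whose weights add up to 100.
assert len(WEIGHTS) == 12
assert sum(WEIGHTS.values()) == 100


def total_score(earned_fraction: dict[str, float]) -> float:
    """Hypothetical helper: sum weight * fraction earned per category.

    Assumption: each category earns a fraction between 0.0 and 1.0 of its
    weight. Categories missing from the input earn nothing in this sketch.
    """
    total = 0.0
    for category, weight in WEIGHTS.items():
        fraction = earned_fraction.get(category, 0.0)
        if not 0.0 <= fraction <= 1.0:
            raise ValueError(f"fraction out of range for {category}: {fraction}")
        total += weight * fraction
    return round(total, 1)
```

Under these assumptions, a server that earns full marks everywhere except half of Identity & Auth would land at 95 points.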

How the 63 checks feed the 12 categories

Touchstone organizes its deeper checks into 9 research domains. Those domains do not replace the public score. They feed it.

Research domain | Checks | Feeds these public categories
Authentication | 9 | Identity & Auth, Permission Scope
Tool Security | 10 | Declaration Accuracy, Tool Integrity
Input Validation | 9 | Input Validation, Data Protection
Data Security | 6 | Data Protection, Network Behavior
Supply Chain | 8 | Supply Chain, Code Transparency, Publisher Trust
Infrastructure | 8 | Transport Security, Network Behavior, Protocol Compliance
Runtime Behavior | 5 | Tool Integrity, Network Behavior, Protocol Compliance
A2A Agent Cards | 5 | Declaration Accuracy, Protocol Compliance
Fairness & Bias | 3 | Data Protection, Publisher Trust

Not every check carries the same weight, and not every scan runs every check. That is why CraftedTrust separates the public score categories from the underlying research domains and then shows scan depth and confidence separately.
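
The domain-to-category mapping can also be read in reverse, to see which research domains stand behind a given public category. The following is a minimal sketch: the counts and category names are copied from the table on this page, while the `DOMAINS` structure and the `domains_feeding` helper are hypothetical names chosen for illustration.

```python
from collections import defaultdict

# (check count, public categories fed) per research domain, as listed on this page.
DOMAINS = {
    "Authentication": (9, ["Identity & Auth", "Permission Scope"]),
    "Tool Security": (10, ["Declaration Accuracy", "Tool Integrity"]),
    "Input Validation": (9, ["Input Validation", "Data Protection"]),
    "Data Security": (6, ["Data Protection", "Network Behavior"]),
    "Supply Chain": (8, ["Supply Chain", "Code Transparency", "Publisher Trust"]),
    "Infrastructure": (8, ["Transport Security", "Network Behavior", "Protocol Compliance"]),
    "Runtime Behavior": (5, ["Tool Integrity", "Network Behavior", "Protocol Compliance"]),
    "A2A Agent Cards": (5, ["Declaration Accuracy", "Protocol Compliance"]),
    "Fairness & Bias": (3, ["Data Protection", "Publisher Trust"]),
}

# The page states 63 checks across 9 research domains.
assert len(DOMAINS) == 9
assert sum(count for count, _ in DOMAINS.values()) == 63


def domains_feeding() -> dict[str, list[str]]:
    """Invert the table: public category -> research domains that feed it."""
    by_category = defaultdict(list)
    for domain, (_, categories) in DOMAINS.items():
        for category in categories:
            by_category[category].append(domain)
    return dict(by_category)
```

Inverting the table this way shows that every one of the 12 public categories is fed by at least one research domain, and that several, such as Network Behavior, draw on three.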

Scan depth and confidence

The same server can have a strong or weak evidence base depending on how much was actually observed.

Depth 1

Metadata only

Basic listing information is present, but package or live behavior has not been fully verified yet. Lowest confidence.

Depth 2

Package verified

Package and source metadata were reviewed. Useful for supply-chain evidence, but still lighter than a live scan.

Depth 3

Live endpoint reached

CraftedTrust successfully contacted the live server and recorded behavior. This is stronger evidence for buyer review.

Depth 4

Manual review performed

A deeper publisher review or certification pass exists. This adds the strongest public confidence signal.
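
Because the four depth levels are ordered, they fit naturally into an ordered enumeration. The sketch below is one possible way to model them: the level numbers and labels come from this page, but the `ScanDepth` type and the `describe` helper are illustrative assumptions rather than part of any CraftedTrust API.

```python
from enum import IntEnum


class ScanDepth(IntEnum):
    """Scan depth levels as described on this page; a higher value means deeper coverage."""
    METADATA_ONLY = 1
    PACKAGE_VERIFIED = 2
    LIVE_ENDPOINT_REACHED = 3
    MANUAL_REVIEW_PERFORMED = 4


# Human-readable labels, copied from the depth headings on this page.
LABELS = {
    ScanDepth.METADATA_ONLY: "Metadata only",
    ScanDepth.PACKAGE_VERIFIED: "Package verified",
    ScanDepth.LIVE_ENDPOINT_REACHED: "Live endpoint reached",
    ScanDepth.MANUAL_REVIEW_PERFORMED: "Manual review performed",
}


def describe(score: int, depth: ScanDepth) -> str:
    """Hypothetical helper: show a score together with its evidence depth."""
    return f"{score}/100 (Depth {depth.value}: {LABELS[depth]})"
```

Keeping depth separate from the number means two servers with the same score can still be told apart by how much was actually observed, for example `describe(82, ScanDepth.METADATA_ONLY)` versus `describe(82, ScanDepth.LIVE_ENDPOINT_REACHED)`.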

Grades

Letter grades stay fixed

  • A: 90-100
  • B: 75-89
  • C: 60-74
  • D: 40-59
  • F: 0-39
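
The grade bands above translate directly into a lookup. This is a minimal sketch with a hypothetical `letter_grade` function. The bands are stated for whole numbers, so treating each band's lower bound as a threshold for fractional scores is an assumption.

```python
def letter_grade(score: float) -> str:
    """Map a 0-100 score to a letter grade using the bands listed above.

    Assumption: each band's lower bound acts as a threshold, so a
    fractional score such as 89.5 falls in the B band.
    """
    if not 0 <= score <= 100:
        raise ValueError(f"score must be between 0 and 100, got {score}")
    if score >= 90:
        return "A"
    if score >= 75:
        return "B"
    if score >= 60:
        return "C"
    if score >= 40:
        return "D"
    return "F"
```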

Framework mapping

Support material, not a second score

Guidance from CoSAI, OWASP guidance on MCP and agentic AI, and selected buyer diligence mappings help translate findings. They support the core 12-category score model and do not replace it.