AI app trust & transparency index

Methodology

This page explains exactly how every grade is produced: what we read, how we score it, what each grade means, and what the index deliberately does not claim. The method is designed to be reproducible: anyone with the same policy text and this rubric should arrive at the same letter.

What does this index measure?

The index scores what an AI app discloses about its data-governance practices in its public privacy policy and terms. It measures the quality and substance of those written commitments. It does not measure whether the commitments are true, whether they are enforced, or whether the underlying AI is “safe.”

A strong policy can hide weak practice, and a thin policy can hide good practice. Every grade is a statement about documents, not about behaviour.

Scores are built from positive public evidence. When an app clearly documents a good practice, it earns points. When a practice is undocumented, it earns nothing, and we make no assumption that the practice is bad. Absence of evidence lowers the score and the confidence figure; it is not an accusation.

The scoring methodology

Each app is scored across seven data-governance domains. A domain’s point budget is its weight, made visible, so a reader can re-add the points by hand instead of trusting a hidden formula. The allocation is a declared editorial judgement by VerifyWise about how much weight each disclosure carries.

ID	Domain	What is assessed	Points
D1	Training-data use	Whether user inputs train models, opt-out or opt-in, human-reviewer access, and whether the user owns generated outputs.	2.0
D2	Data-subject rights & user control	Access, deletion, portability, correction, and the right to object or opt out, each with a named mechanism.	2.2
D3	Retention, deletion & minimization	Named retention periods, deletion timelines, shorter windows for AI conversation logs, and a data-minimization commitment.	1.5
D4	Third-party sharing, sub-processors & transfers	Categories of recipients, a sub-processor list or DPA, whether data is sold or shared, international-transfer safeguards, and government-access standards.	1.5
D5	Transparency & AI disclosure	Disclosure that you are interacting with AI, marking of AI-generated output, the data categories collected, legal bases, and policy versioning.	1.5
D6	Sensitive data, children & automated decisions	Special-category data limits, biometric governance, children's-data protections, and disclosure of consequential automated decisions.	0.7
D7	Security & accountability	Named security controls, a breach-notification commitment, and named certifications or a privacy contact.	0.6
Total			10.0
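
Re-adding the budget by hand can be sketched in a few lines of Python. The domain IDs and point values are taken from the table above; the variable names are illustrative only, and Decimal is used here simply to avoid floating-point rounding in the sum.

```python
from decimal import Decimal

# Domain point budgets as published in the table above.
# The dictionary name and layout are illustrative, not part of the rubric.
DOMAIN_POINTS = {
    "D1": Decimal("2.0"),  # Training-data use
    "D2": Decimal("2.2"),  # Data-subject rights & user control
    "D3": Decimal("1.5"),  # Retention, deletion & minimization
    "D4": Decimal("1.5"),  # Third-party sharing, sub-processors & transfers
    "D5": Decimal("1.5"),  # Transparency & AI disclosure
    "D6": Decimal("0.7"),  # Sensitive data, children & automated decisions
    "D7": Decimal("0.6"),  # Security & accountability
}

# Exact decimal sum of the seven budgets.
total = sum(DOMAIN_POINTS.values(), Decimal("0"))
print(total)  # 10.0
```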

How points are earned

Each domain holds several indicators (30 in total). Every indicator awards full, half, or zero of its point slice:

Full. The clause explicitly grants the right or limit with specifics: a named mechanism, a concrete timeframe (a number), or a defined scope, such as “30 days,” “Standard Contractual Clauses,” or “AES-256.”
Half. The topic is addressed, but vaguely, conditionally, or only on a paid tier (for example “industry-standard security” or “as long as necessary”). Named-but-vague boilerplate always scores half, never zero.
Zero. The policy is silent on the topic, or it explicitly reserves the right to the harmful behaviour. Both earn zero, but the published record distinguishes silent from adverse.

The total is the sum of awarded points divided by the maximum available over the indicators that apply to that app, scaled to 100. Indicators that depend on a capability the app does not have, such as synthetic-media marking for a text-only app, are removed from both the earned points and the maximum, so no app is penalised for a capability it lacks.
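
The arithmetic described above can be sketched as follows. This is a minimal illustration, not the index's actual code: the Indicator class, the indicator names, and the 0.5-point slices are all hypothetical, since the individual indicator slices are not listed on this page.

```python
from dataclasses import dataclass
from enum import Enum
from fractions import Fraction


class Credit(Enum):
    """Share of an indicator's point slice that is awarded."""
    FULL = Fraction(1)
    HALF = Fraction(1, 2)
    ZERO = Fraction(0)


@dataclass
class Indicator:
    name: str              # hypothetical label
    points: Fraction       # hypothetical slice of the domain budget
    credit: Credit
    applies: bool = True   # False when the app lacks the capability


def score_out_of_100(indicators):
    """Sum earned points over applicable indicators and scale to 100."""
    applicable = [i for i in indicators if i.applies]
    maximum = sum((i.points for i in applicable), Fraction(0))
    if maximum == 0:
        raise ValueError("no applicable indicators")
    earned = sum((i.points * i.credit.value for i in applicable), Fraction(0))
    return float(100 * earned / maximum)


# Hypothetical example: the non-applicable indicator is dropped from
# both the earned points and the maximum.
indicators = [
    Indicator("deletion right", Fraction("0.5"), Credit.FULL),
    Indicator("retention period", Fraction("0.5"), Credit.HALF),
    Indicator("breach notification", Fraction("0.5"), Credit.ZERO),
    Indicator("synthetic-media marking", Fraction("0.5"), Credit.FULL, applies=False),
]
print(score_out_of_100(indicators))  # 50.0
```

In this example the app earns 0.75 of an applicable 1.5 points, so the excluded indicator neither helps nor hurts the result.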

Length is never scored, only documented substance. A short policy that explicitly grants deletion, names a retention window, and says it does not train on your data earns those full slices whatever its length.

How are AI app grades given?

The score, on a scale of 0 to 100, is banded into five letter grades:

Grade	Score	Meaning
A	70–100	Strong disclosure
B	60–69	Good disclosure
C	48–59	Partial disclosure
D	35–47	Weak disclosure
F	0–34	Poor disclosure

The raw score and the per-domain points are always published next to the letter, so you can see the distance between two apps in the same band and re-derive the total yourself.
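
The banding can be sketched as a simple lookup. The band boundaries are the published ones; how a fractional score between two bands (for example 59.5) is treated is not stated here, so this sketch assumes a score belongs to the highest band whose lower bound it meets. The function and constant names are illustrative.

```python
# Lower bound of each band, best grade first, as published above.
BANDS = [(70, "A"), (60, "B"), (48, "C"), (35, "D"), (0, "F")]


def letter_grade(score: float) -> str:
    """Map a 0-100 score to a letter grade.

    Assumption: a fractional score falls in the highest band whose
    lower bound it reaches; the page does not specify rounding.
    """
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    for lower_bound, letter in BANDS:
        if score >= lower_bound:
            return letter
    raise AssertionError("unreachable: the F band starts at 0")


print(letter_grade(72))  # A
print(letter_grade(47))  # D
```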

Dealbreaker flags

Some clauses are dealbreakers for trust no matter how good the rest of the policy reads. When an app explicitly reserves one of these, we raise a prominent flag and cap its displayed grade at B. The underlying score is never changed, so the number stays accurate while the warning stays visible:

Trains on your data with no way to opt out.
Claims a broad, perpetual licence to your content.
Refuses deletion or asserts indefinite retention with no deletion right.
Sells or shares your data for advertising with no opt-out.

The most common trigger by far is training on user data with no opt-out; an explicit perpetual content licence or a refusal to delete is rarer. Where a policy pushes risk into silence and vagueness rather than openly reserving a harmful right, no flag is raised, and the point sum alone penalises that silence.
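
The cap can be sketched as a small function that changes only the displayed letter. The flag identifiers and function name below are hypothetical; the only behaviour taken from this page is that a flagged app is displayed no higher than B while its score is left untouched.

```python
GRADE_ORDER = "ABCDF"  # best to worst
CAP = "B"              # highest grade a flagged app may display


def displayed_grade(letter: str, dealbreaker_flags: list) -> str:
    """Return the letter to display; the numeric score is never altered."""
    is_better_than_cap = GRADE_ORDER.index(letter) < GRADE_ORDER.index(CAP)
    if dealbreaker_flags and is_better_than_cap:
        return CAP
    return letter


# Hypothetical flag identifier, for illustration only.
print(displayed_grade("A", ["trains_without_opt_out"]))  # B
print(displayed_grade("A", []))                          # A
print(displayed_grade("C", ["trains_without_opt_out"]))  # C
```

Under this sketch the cap only ever affects an app that would otherwise display an A; a flagged app already at B or below keeps its letter and shows the flag.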

Confidence

Alongside each grade we publish a confidence band of High, Medium, or Low, based on how much of the judgement rests on quoted evidence versus recorded silence. An app graded low on many silent indicators carries lower confidence than one graded low on quoted adverse clauses. We therefore state exactly how much of each grade is backed by quotes, something a self-attested label never does.
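
One way such a band could be derived is sketched below. This page does not publish the cutoffs, so the 0.75 and 0.5 thresholds, the function name, and the use of a simple quoted-versus-silent ratio are all assumptions made for illustration, not the index's actual rule.

```python
def confidence_band(quoted: int, silent: int,
                    high_cutoff: float = 0.75,
                    medium_cutoff: float = 0.5) -> str:
    """Band the share of indicators that rest on quoted evidence.

    Assumption: both cutoffs are hypothetical placeholders; the
    published methodology does not state numeric thresholds.
    """
    total = quoted + silent
    if total == 0:
        raise ValueError("no indicators were assessed")
    quoted_share = quoted / total
    if quoted_share >= high_cutoff:
        return "High"
    if quoted_share >= medium_cutoff:
        return "Medium"
    return "Low"


print(confidence_band(quoted=27, silent=3))   # High
print(confidence_band(quoted=6, silent=24))   # Low
```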

Why do some of the best apps cluster at B?

Most general-purpose assistants train on your conversations by default and offer an opt-out, rather than not training at all. We treat “we do not train on your data” (or opt-in only) as clearly stronger than “we train by default, opt out if you find the setting.” So even the strongest such app cannot top the scale on training alone, and a group of strong assistants lands at B. That reflects how the current market actually works.

Which apps are included?

The index covers 200+ AI apps drawn from two worlds. Consumer apps are seeded from a citable third-party ranking (a16z Top 100 Gen AI Consumer Apps, 6th Edition, Mar 2026). Enterprise and business apps are drawn from a market-share report (Menlo Ventures: 2025 State of Generative AI in the Enterprise, Dec 2025) and supplemented by an editorial selection of widely used business AI tools that store or process customer and corporate data, which is where data-governance transparency matters most.

In every case, inclusion is about reach and data exposure, not a judgement of any app: VerifyWise decides the score, never whether an app belongs. Capability metadata, such as whether an app generates images, processes biometrics, or operates internationally, is recorded from public sources and is never inferred from policy text. Apps that no longer exist as standalone products, for example after an acquisition, are removed.

Reproducibility, freshness & disputes

These are the ground rules behind every grade: how a score can be reproduced, how often it is refreshed, which version of a policy we read, and how a vendor can contest a result.

Reproducible. For each app we read both the privacy policy and the terms of service where a distinct terms document exists, capturing each page’s text as a dated snapshot. Each of the rubric’s indicators is awarded full, half, or no credit against quoted evidence at temperature 0, and the final score is summed in code, not estimated. The same snapshot and rubric version 2.0 yield the same grade.
A snapshot in time. A score reflects each policy as captured on its assessed date; policies change, and re-publication triggers a re-score.
Region. We score the global / US-default version most users receive, not the strongest regional carve-out.
Independence. No app can pay to change its grade. A score changes only on document evidence, such as a new or corrected clause, never on a vendor’s claim about its own behaviour.

Known limitations

No scoring method is perfect, and this one has clear boundaries worth stating up front.

It scores disclosure, not behaviour. A vendor can write the right words without doing them.
It can be gamed by careful drafting. A well-written policy scores well whatever the practice behind it.
Weights and thresholds are declared editorial judgements, calibrated in a pilot rather than derived empirically.
The index keeps growing, and grades may be revised as policies change or the rubric evolves; every change is versioned.

Get in touch

Want your app added to the index, have a question about how a grade was reached, or think an app should be re-graded? Email us at hello@verifywise.ai. We read every message and update grades when the evidence supports it.

Disclaimer

The VerifyWise AI Trust & Transparency Index is provided for general informational purposes only and reflects VerifyWise's good-faith analysis of publicly available privacy policies and terms of service as of each app's stated assessment date. Each grade is a statement of opinion based on our published methodology, not a statement of fact, legal or compliance advice, or an endorsement or warning about any company or product. We assess only what these documents disclose; we do not audit, test, or verify any company's actual data handling, security practices, or legal compliance, and a grade does not measure whether an app is safe or trustworthy to use. Policies change over time, so a grade may not reflect an app's current documents, and any assessment may contain errors or omissions. The index is not a substitute for your own review; you should read the relevant policies and seek professional advice before relying on it. VerifyWise makes no warranty as to accuracy or completeness and, to the fullest extent permitted by law, accepts no liability for any decision made in reliance on the index. Company and product names are trademarks of their respective owners; their inclusion does not imply any affiliation with or endorsement by VerifyWise. To request a correction, contact privacy@verifywise.ai.

Last assessed 2026-07-08. Rubric version 2.0.