Benchmark

Claude Opus-4.7

Anthropic · anthropic/claude-opus-4.7

Composite

Verifiability

Specificity

Currency

Coverage

Briefs evaluated: 12

Total signals: 192

Run: 2026-05-13

Verifier: google/gemini-2.5-flash:online

Specificity judge: google/gemini-2.5-flash

Per-industry signals

12 industries. Each industry lists the model's signals with verdict, judge commentary, and citations.

Healthcare Regulated AI

Clinical
Ambient Scribe Clinical Deployment
Speculative
Ambient AI scribes from Nuance, Abridge, and Suki reach over 100 US health systems by late 2024. Indicates clinician documentation workflows shift toward AI-mediated capture across specialties.
verif 80 · spec 90 · cur 50 · newest src 2025-03-01
Judge · No direct confirmation of 100+ health systems using Nuance, Abridge, and Suki specifically by late 2024. Adoption is rapid but exact numbers are not in current sources.
Writing · Concrete actors, products, quantifiable goal. Active voice. Minor passivity in "Indicates workflows shift."
Clinical
Diagnostic AI Hallucination Reports
Grounded
Peer-reviewed studies document fabricated findings in LLM-generated radiology and pathology summaries at rates between 2-8%. Signals patient safety exposure when generative outputs enter clinical decision pathways.
verif 100 · spec 90 · cur 70 · newest src 2025-11-06
Judge · Studies show AI tools produce hallucinations in medical imaging and text, generating plausible but incorrect information. This poses significant patient safety risks, yet regulatory guidance is still developing.
Writing · Concrete actor, event, and quantitative anchor. Lacks present tense on objective sentence.
Clinical
FDA-Cleared Algorithm Drift
Grounded
FDA's 950+ cleared AI/ML devices show post-market performance degradation across demographic subgroups in published audits. Indicates monitoring obligations extend beyond initial validation for deployed diagnostic models.
verif 100 · spec 65 · cur 10 · newest src 2024-03-31
Judge · Both the FDA and EU regulations (MDR, AI Act) emphasize the need for continuous post-market surveillance of AI/ML medical devices due to performance degradation over time or with new data.
Writing · Concrete actor (FDA, AI devices) and shift (performance degradation) are present. Lacks specific temporal or quantitative anchors.
Clinical
Sepsis Model Override Patterns
Speculative
Epic and Bayesian sepsis prediction tools show clinician override rates above 60% in published health system evaluations. Signals erosion of frontline trust in embedded predictive algorithms.
verif 80 · spec 85 · cur 50 · newest src 2025-05-08
Judge · No direct evidence of 60%+ override rates for Epic/Bayesian models was found. High alert burden and low positive predictive values are noted, which *could* lead to overrides, but specific rates are not provided.
Writing · Concrete actors, products, quantifiable data, and present tense. No hype or vague quantifiers.
Regulatory
EU AI Act High-Risk Compliance
Grounded
EU AI Act provisions for high-risk medical AI systems enter into force in August 2026 with conformity assessment requirements. Indicates documentation, risk management, and human oversight obligations for hospital deployments.
verif 100 · spec 65 · cur 85 · newest src 2025-12-16
Judge · MDR-classified medical devices using AI are high-risk under the EU AI Act, requiring notified body assessments, increasing burden.
Writing · Concrete actor, event, and anchor, but lacks a specific product/filing. Contains some generic forecast.
Regulatory
HTI-1 Algorithm Transparency Rule
Grounded
ONC HTI-1 final rule requires certified EHR vendors to disclose predictive decision support attributes by January 2025. Signals provider accountability for source data and bias disclosures.
verif 100 · spec 90 · cur 10 · newest src 2024-02-08
Judge · The ONC HTI-1 final rule establishes requirements for transparency of AI and predictive algorithms in certified health IT, including disclosures by January 2025.
Writing · Concrete actor, event, and temporal anchor. Active voice. Avoids hype. Slight generalization on 'provider accountability'.
Regulatory
State AI Insurance Denial Laws
Grounded
California SB 1120 and similar statutes in Texas and Illinois restrict algorithmic medical necessity determinations. Indicates patchwork compliance demands for utilization management and payer-facing workflows.
verif 100 · spec 90 · cur 30 · newest src 2024-10-03
Judge · California's SB 1120 restricts AI in medical necessity determinations. Multiple sources confirm its enactment and specifics, effective January 1, 2025. This indicates patchwork compliance for utilization management.
Writing · Concrete actors, events, and a quantitative anchor are present.
Regulatory
CMS AI Reimbursement Codes
Grounded
CMS established CPT Category III codes and NTAP payments for specific AI diagnostics including cardiac and stroke imaging. Signals reimbursement infrastructure formalizing for algorithm-augmented services.
verif 100 · spec 85 · cur 100 · newest src 2026-04-15
Judge · CMS has established national payment rates for AI-powered ECG analysis (effective Jan 2025) and a new billing code for AI-driven calcium analysis on CT scans (effective Apr 2026), formalizing reimbursement for these AI diagnostics.
Writing · Concrete actor, events, and a strong temporal anchor. Minimal vagueness or hype. Excellent specificity.
Operational
AI Governance Committee Mandates
Fabricated
Joint Commission and CHAI issued joint guidance in 2024 requiring formal AI oversight structures in accredited hospitals. Indicates new governance roles, model inventories, and validation processes within operational scope.
verif 20 · spec 90 · cur 70 · newest src 2025-09-18
Judge · Guidance was issued in September 2025, not 2024. While it recommends formal AI oversight, it's guidance, not a regulatory mandate.
Writing · Concrete actors, event, and temporal anchor. Specific requirements outlined.
Operational
Vendor Model Card Gaps
Indicative
Audits by KLAS and ECRI find under 40% of clinical AI vendors provide complete training data and performance disclosures. Signals procurement and contracting friction for compliant deployments.
verif 60 · spec 90 · cur 85 · newest src 2025-12-31
Judge · Multiple sources highlight significant transparency gaps in AI model documentation from developers. While KLAS and ECRI specific audits aren't detailed, the broader trend is well-documented.
Writing · Concrete actors (KLAS, ECRI), specific percentage (under 40%), and clear impact on procurement.
Operational
GPU Capacity Procurement Constraints
Grounded
Health systems report 6-12 month lead times for on-premise inference hardware and cloud PHI-compliant GPU capacity. Indicates infrastructure bottlenecks for ambient and generative AI scaling.
verif 100 · spec 85 · cur 100 · newest src 2026-03-01
Judge · Late 2025/early 2026 saw headline-making GPU/memory shortages. Longer lead times and higher capital outlays are forcing strategic procurement shifts.
Writing · Concrete actors, events, and a quantitative anchor. "Indicates" is present tense, keeping it objective.
Operational
Cyber Insurance AI Exclusions
Indicative
Underwriters including Beazley and Coalition introduced AI-specific exclusions and questionnaires in 2024 healthcare cyber policies. Signals risk transfer narrowing for algorithm-related liability events.
verif 60 · spec 90 · cur 70 · newest src 2025-11-01
Judge · While specific insurers like Beazley and Coalition aren't confirmed, the trend of insurers adding exclusions and scrutinizing AI use for cyber/errors and omissions policies is well-documented.
Writing · Concrete actors (Beazley, Coalition), event (AI exclusions), and temporal anchor (2024). Active voice.
Patient Trust
Patient AI Disclosure Preferences
Grounded
Pew and JAMA surveys show 60-66% of US patients want explicit notification when AI participates in their care. Indicates consent and transparency expectations outpace current hospital disclosure practices.
verif 100 · spec 85 · cur 50 · newest src 2024-12-17
Judge · Multiple reputable surveys, including JAMA and University of Michigan/Minnesota, consistently show 60-66% of US patients desire AI notification, confirming the signal's accuracy and indicating a clear public preference.
Writing · Concrete actors (Pew, JAMA), quantitative data (60-66%), clear event (surveys), and present tense.
Patient Trust
Algorithmic Bias Litigation Filings
Grounded
Class actions against UnitedHealth nH Predict and Cigna PxDx algorithms advance in federal courts through 2024. Signals legal exposure when patients attribute denials or harms to opaque models.
verif 100 · spec 90 · cur 70 · newest src 2025-09-04
Judge · Multiple sources confirm class-action lawsuits against UnitedHealth and Cigna regarding algorithmic denial of care. Case documents confirm advancement in federal courts.
Writing · Concrete actors, products, and temporal anchor. No hype or vague quantifiers.
Patient Trust
Clinician AI Confidence Decline
Dubious
AMA 2024 physician survey shows enthusiasm for health AI rising while trust in oversight falls to 35%. Indicates internal advocacy gap affecting patient-facing communication about AI use.
verif 40 · spec 75 · cur 100 · newest src 2026-03-04
Judge · No source indicates a 32% *drop* in patient confidence in AI care. Some surveys show lower trust in AI vs. human care, but not a significant recent decline.
Writing · Concrete actor (AMA), event (2024 survey), and quantitative anchors (35%). Clear, specific. "internal advocacy gap" is a slight interpretation.
Patient Trust
Generative Chatbot Safety Incidents
Grounded
Documented cases of patient-facing chatbots providing inaccurate medication and triage guidance reach mainstream media in 2024. Signals reputational risk for systems deploying conversational AI without clinical guardrails.
verif 100 · spec 65 · cur 100 · newest src 2026-05-05
Judge · Multiple sources from 2024-2026 confirm instances of chatbots providing inaccurate medical advice and triage, with significant safety concerns and regulatory actions.
Writing · Names actor (patient-facing chatbots), event (inaccurate guidance), and temporal anchor (2024). 'Reputational risk' is a generic forecast.

Industries in this run

Healthcare Regulated AI

Fintech Stablecoin Rails

Defense Autonomous Systems

Climate Adaptation Capital

Retail Genai Commerce

Biotech Platform Shifts

Energy Grid Electrification

Education AI Tutors

Geopolitics Tech Blocs

AI Infrastructure Scaling

Mobility Autonomous Fleets

Food AgTech Shifts