Benchmark

Claude Opus-4.7

Anthropic · anthropic/claude-opus-4.7

Composite: 86
Verifiability: 90
Specificity: 85
Currency: 68
Coverage: 96
Briefs evaluated: 12
Total signals: 192
Run: 2026-05-13
Verifier: google/gemini-2.5-flash:online
Specificity judge: google/gemini-2.5-flash

Per-industry signals

12 industries · each signal below lists its verdict, scores, newest source date, and judge commentary.

  • Clinical

    Ambient Scribe Clinical Deployment

    Speculative

    Ambient AI scribes from Nuance, Abridge, and Suki reach over 100 US health systems by late 2024. Indicates clinician documentation workflows shift toward AI-mediated capture across specialties.

    verif 80 · spec 90 · cur 50 · newest src 2025-03-01

    Judge · No direct confirmation of 100+ health systems using Nuance, Abridge, and Suki specifically by late 2024. Adoption is rapid but exact numbers are not in current sources.

    Writing · Concrete actors, products, quantifiable goal. Active voice. Minor passivity in "Indicates workflows shift."

  • Clinical

    Diagnostic AI Hallucination Reports

    Grounded

    Peer-reviewed studies document fabricated findings in LLM-generated radiology and pathology summaries at rates between 2-8%. Signals patient safety exposure when generative outputs enter clinical decision pathways.

    verif 100 · spec 90 · cur 70 · newest src 2025-11-06

    Judge · Studies show AI tools produce hallucinations in medical imaging and text, generating plausible but incorrect information. This poses significant patient safety risks, yet regulatory guidance is still developing.

    Writing · Concrete actor, event, and quantitative anchor. Lacks present tense on objective sentence.

  • Clinical

    FDA-Cleared Algorithm Drift

    Grounded

    FDA's 950+ cleared AI/ML devices show post-market performance degradation across demographic subgroups in published audits. Indicates monitoring obligations extend beyond initial validation for deployed diagnostic models.

    verif 100 · spec 65 · cur 10 · newest src 2024-03-31

    Judge · Both the FDA and EU regulations (MDR, AI Act) emphasize the need for continuous post-market surveillance of AI/ML medical devices due to performance degradation over time or with new data.

    Writing · Concrete actor (FDA, AI devices) and shift (performance degradation) are present. Lacks specific temporal or quantitative anchors.

  • Clinical

    Sepsis Model Override Patterns

    Speculative

    Epic and Bayesian sepsis prediction tools show clinician override rates above 60% in published health system evaluations. Signals erosion of frontline trust in embedded predictive algorithms.

    verif 80 · spec 85 · cur 50 · newest src 2025-05-08

    Judge · No direct evidence of 60%+ override rates for Epic/Bayesian models was found. High alert burden and low positive predictive values are noted, which *could* lead to overrides, but specific rates are not provided.

    Writing · Concrete actors, products, quantifiable data, and present tense. No hype or vague quantifiers.

  • Regulatory

    EU AI Act High-Risk Compliance

    Grounded

    EU AI Act provisions for high-risk medical AI systems enter force August 2026 with conformity assessment requirements. Indicates documentation, risk management, and human oversight obligations for hospital deployments.

    verif 100 · spec 65 · cur 85 · newest src 2025-12-16

    Judge · MDR-classified medical devices using AI are high-risk under the EU AI Act, requiring notified body assessments, increasing burden.

    Writing · Concrete actor, event, and anchor, but lacks a specific product/filing. Contains some generic forecast.

  • Regulatory

    HTI-1 Algorithm Transparency Rule

    Grounded

    ONC HTI-1 final rule requires certified EHR vendors to disclose predictive decision support attributes by January 2025. Signals provider accountability for source data and bias disclosures.

    verif 100 · spec 90 · cur 10 · newest src 2024-02-08

    Judge · The ONC HTI-1 final rule establishes requirements for transparency of AI and predictive algorithms in certified health IT, including disclosures by January 2025.

    Writing · Concrete actor, event, and temporal anchor. Active voice. Avoids hype. Slight generalization on 'provider accountability'.

  • Regulatory

    State AI Insurance Denial Laws

    Grounded

    California SB 1120 and similar statutes in Texas and Illinois restrict algorithmic medical necessity determinations. Indicates patchwork compliance demands for utilization management and payer-facing workflows.

    verif 100 · spec 90 · cur 30 · newest src 2024-10-03

    Judge · California's SB 1120 restricts AI in medical necessity determinations. Multiple sources confirm its enactment and specifics, effective January 1, 2025. This indicates patchwork compliance for utilization management.

    Writing · Concrete actors, events, and a quantitative anchor are present.

  • Regulatory

    CMS AI Reimbursement Codes

    Grounded

    CMS established CPT Category III codes and NTAP payments for specific AI diagnostics including cardiac and stroke imaging. Signals reimbursement infrastructure formalizing for algorithm-augmented services.

    verif 100 · spec 85 · cur 100 · newest src 2026-04-15

    Judge · CMS has established national payment rates for AI-powered ECG analysis (effective Jan 2025) and a new billing code for AI-driven calcium analysis on CT scans (effective Apr 2026), formalizing reimbursement for these AI diagnostics.

    Writing · Concrete actor, events, and a strong temporal anchor. Minimal vagueness or hype. Excellent specificity.

  • Operational

    AI Governance Committee Mandates

    Fabricated

    Joint Commission and CHAI issued joint guidance in 2024 requiring formal AI oversight structures in accredited hospitals. Indicates new governance roles, model inventories, and validation processes within operational scope.

    verif 20 · spec 90 · cur 70 · newest src 2025-09-18

    Judge · Guidance was issued in September 2025, not 2024. While it recommends formal AI oversight, it's guidance, not a regulatory mandate.

    Writing · Concrete actors, event, and temporal anchor. Specific requirements outlined.

  • Operational

    Vendor Model Card Gaps

    Indicative

    Audits by KLAS and ECRI find under 40% of clinical AI vendors provide complete training data and performance disclosures. Signals procurement and contracting friction for compliant deployments.

    verif 60 · spec 90 · cur 85 · newest src 2025-12-31

    Judge · Multiple sources highlight significant transparency gaps in AI model documentation from developers. While KLAS and ECRI specific audits aren't detailed, the broader trend is well-documented.

    Writing · Concrete actors (KLAS, ECRI), specific percentage (under 40%), and clear impact on procurement.

  • Operational

    GPU Capacity Procurement Constraints

    Grounded

    Health systems report 6-12 month lead times for on-premise inference hardware and cloud PHI-compliant GPU capacity. Indicates infrastructure bottlenecks for ambient and generative AI scaling.

    verif 100 · spec 85 · cur 100 · newest src 2026-03-01

    Judge · Late 2025/early 2026 saw headline-making GPU/memory shortages. Longer lead times and higher capital outlays are forcing strategic procurement shifts.

    Writing · Concrete actors, events, and a quantitative anchor. "Indicates" is present tense, keeping it objective.

  • Operational

    Cyber Insurance AI Exclusions

    Indicative

    Underwriters including Beazley and Coalition introduced AI-specific exclusions and questionnaires in 2024 healthcare cyber policies. Signals risk transfer narrowing for algorithm-related liability events.

    verif 60 · spec 90 · cur 70 · newest src 2025-11-01

    Judge · While specific insurers like Beazley and Coalition aren't confirmed, the trend of insurers adding exclusions and scrutinizing AI use for cyber/errors and omissions policies is well-documented.

    Writing · Concrete actors (Beazley, Coalition), event (AI exclusions), and temporal anchor (2024). Active voice.

  • Patient Trust

    Patient AI Disclosure Preferences

    Grounded

    Pew and JAMA surveys show 60-66% of US patients want explicit notification when AI participates in their care. Indicates consent and transparency expectations outpace current hospital disclosure practices.

    verif 100 · spec 85 · cur 50 · newest src 2024-12-17

    Judge · Multiple reputable surveys, including JAMA and University of Michigan/Minnesota, consistently show 60-66% of US patients desire AI notification, confirming the signal's accuracy and indicating a clear public preference.

    Writing · Concrete actors (Pew, JAMA), quantitative data (60-66%), clear event (surveys), and present tense.

  • Patient Trust

    Algorithmic Bias Litigation Filings

    Grounded

    Class actions against UnitedHealth nH Predict and Cigna PxDx algorithms advance in federal courts through 2024. Signals legal exposure when patients attribute denials or harms to opaque models.

    verif 100 · spec 90 · cur 70 · newest src 2025-09-04

    Judge · Multiple sources confirm class-action lawsuits against UnitedHealth and Cigna regarding algorithmic denial of care. Case documents confirm advancement in federal courts.

    Writing · Concrete actors, products, and temporal anchor. No hype or vague quantifiers.

  • Patient Trust

    Clinician AI Confidence Decline

    Dubious

    AMA 2024 physician survey shows enthusiasm for health AI rising while trust in oversight falls to 35%. Indicates internal advocacy gap affecting patient-facing communication about AI use.

    verif 40 · spec 75 · cur 100 · newest src 2026-03-04

    Judge · No source indicates a 32% *drop* in patient confidence in AI care. Some surveys show lower trust in AI vs. human care, but not a significant recent decline.

    Writing · Concrete actor (AMA), event (2024 survey), and quantitative anchors (35%). Clear, specific. "internal advocacy gap" is a slight interpretation.

  • Patient Trust

    Generative Chatbot Safety Incidents

    Grounded

    Documented cases of patient-facing chatbots providing inaccurate medication and triage guidance reach mainstream media in 2024. Signals reputational risk for systems deploying conversational AI without clinical guardrails.

    verif 100 · spec 65 · cur 100 · newest src 2026-05-05

    Judge · Multiple sources from 2024-2026 confirm instances of chatbots providing inaccurate medical advice and triage, with significant safety concerns and regulatory actions.

    Writing · Names actor (patient-facing chatbots), event (inaccurate guidance), and temporal anchor (2024). 'Reputational risk' is a generic forecast.
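
In the signals listed above, each verdict label coincides with exactly one verifiability score (Grounded with 100, Speculative with 80, Indicative with 60, Dubious with 40, Fabricated with 20). The sketch below records that pairing and checks it against the 16 signals shown. The mapping is an observation read off this page, not the benchmark's documented scoring rule, and the names `OBSERVED_VERDICT_BY_VERIF`, `SIGNALS`, `inconsistent_signals`, and `verdict_counts` are hypothetical.

```python
from collections import Counter

# Assumption: pairing inferred from the signals on this page only.
# It may not hold for the other briefs or reflect the verifier's actual rubric.
OBSERVED_VERDICT_BY_VERIF = {
    100: "Grounded",
    80: "Speculative",
    60: "Indicative",
    40: "Dubious",
    20: "Fabricated",
}

# (signal title, verdict, verif score) copied from the listing above.
SIGNALS = [
    ("Ambient Scribe Clinical Deployment", "Speculative", 80),
    ("Diagnostic AI Hallucination Reports", "Grounded", 100),
    ("FDA-Cleared Algorithm Drift", "Grounded", 100),
    ("Sepsis Model Override Patterns", "Speculative", 80),
    ("EU AI Act High-Risk Compliance", "Grounded", 100),
    ("HTI-1 Algorithm Transparency Rule", "Grounded", 100),
    ("State AI Insurance Denial Laws", "Grounded", 100),
    ("CMS AI Reimbursement Codes", "Grounded", 100),
    ("AI Governance Committee Mandates", "Fabricated", 20),
    ("Vendor Model Card Gaps", "Indicative", 60),
    ("GPU Capacity Procurement Constraints", "Grounded", 100),
    ("Cyber Insurance AI Exclusions", "Indicative", 60),
    ("Patient AI Disclosure Preferences", "Grounded", 100),
    ("Algorithmic Bias Litigation Filings", "Grounded", 100),
    ("Clinician AI Confidence Decline", "Dubious", 40),
    ("Generative Chatbot Safety Incidents", "Grounded", 100),
]


def inconsistent_signals(signals):
    """Return signals whose verdict differs from the observed verif pairing."""
    return [s for s in signals if OBSERVED_VERDICT_BY_VERIF.get(s[2]) != s[1]]


def verdict_counts(signals):
    """Count how many listed signals carry each verdict label."""
    return Counter(verdict for _, verdict, _ in signals)


if __name__ == "__main__":
    print("inconsistent:", inconsistent_signals(SIGNALS))
    for verdict, count in verdict_counts(SIGNALS).most_common():
        print(f"{verdict}: {count}")
```

Running the check on these 16 signals finds no mismatches, which is consistent with the verdict being a banded view of the verifiability score, though the page itself does not state that.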