Benchmark

Claude Opus-4.8

Anthropic · anthropic/claude-opus-4.8

Composite: 74
Verifiability: 88
Specificity: 67
Currency: 33
Coverage: 91
Briefs evaluated: 12
Total signals: 192
Run: 2026-05-13
Verifier: google/gemini-2.5-flash:online
Specificity judge: google/gemini-2.5-flash

Per-industry signals

12 industries · each signal is listed with its verdict, scores, and judge commentary.

  • Clinical

    FDA AI Device Authorizations Surge

    Grounded

    FDA lists over 1,000 cleared AI-enabled medical devices, with radiology dominating clearances. Indicates clinical workflows now embed algorithmic decision support across imaging departments.

    verif 100 · spec 65 · cur 10 · newest src 2024-05-10

    Judge · FDA maintains a list of authorized AI/ML-enabled medical devices. The number indeed exceeds 1,000, with radiology prominent.

    Writing · Concrete actor (FDA) and event (authorizations, 1k devices, radiology), but 'surges' is vague/hype. Objective sentence is absent.

  • Clinical

    Ambient AI Scribes in Exam Rooms

    Grounded

    Health systems deploy ambient documentation tools transcribing clinician-patient conversations into notes. Signals shift toward AI-mediated clinical encounters affecting documentation accuracy and liability.

    verif 100 · spec 65 · cur 10 · newest src 2024-05-15

    Judge · Multiple health systems (e.g., Mass General Brigham, UCSD) have deployed ambient AI scribes. The impact on clinician time allocation and accuracy review is widely discussed.

    Writing · Concrete actor (hospital systems) and event (deploy AI). Lacks specific company/product names, dates, or numbers.

  • Clinical

    Sepsis Algorithm Accuracy Disputes

    Dubious

    Published validation studies report widely used sepsis prediction models miss cases and trigger frequent false alerts. Indicates clinical reliance on unvalidated AI carries patient safety exposure.

    verif 40 · spec 85 · cur 100 · newest src 2026-05-12

    Judge · One source mentions alert fatigue as a concern for AI sepsis systems, but no evidence of high override rates or erosion of utility was found; instead, one tool achieved high adoption.

    Writing · Concrete actor (hospitals, clinicians), event (override rates), and quantitative anchor (85%).

  • Clinical

    LLM Diagnostic Pilots in Triage

    Grounded

    Hospitals test large language models for symptom triage and differential diagnosis support in emergency settings. Signals expansion of generative AI into frontline clinical reasoning roles.

    verif 100 · spec 45 · cur 10 · newest src 2024-03-01

    Judge · Multiple reports confirm pilots of LLMs for triage and diagnostic support in healthcare settings. Regulatory and ethical challenges remain, but testing is active.

    Writing · No concrete actor, event, or specific anchor. Vague 'hospitals' and 'expansion'.

  • Regulatory

    EU AI Act High-Risk Classification

    Grounded

    EU AI Act designates most medical AI as high-risk, requiring conformity assessments and post-market monitoring. Indicates compliance obligations now overlap with existing MDR device rules.

    verif 100 · spec 65 · cur 85 · newest src 2025-12-16

    Judge · MDR-classified medical devices using AI are high-risk under the EU AI Act, requiring notified body assessments, increasing burden.

    Writing · Concrete actor, event, and anchor, but lacks a specific product/filing. Contains some generic forecast.

  • Regulatory

    FDA Predetermined Change Plans

    Grounded

    FDA finalizes guidance allowing predetermined change control plans for adaptive AI device updates. Signals regulatory pathways adjusting to continuously learning algorithms.

    verif 100 · spec 75 · cur 50 · newest src 2024-12-04

    Judge · FDA has finalized guidance on Predetermined Change Control Plans (PCCPs) for AI-enabled devices, enabling iterative improvements without new marketing submissions if aligned with authorized PCCPs.

    Writing · Concrete actor (FDA), event (finalizing framework), and measurable shift (new regulatory pathway) are present. Lacks a temporal anchor.

  • Regulatory

    State-Level AI Disclosure Mandates

    Grounded

    US states enact laws requiring disclosure when AI communicates with patients or influences care decisions. Indicates fragmented compliance burden across multi-state hospital networks.

    verif 100 · spec 45 · cur 85 · newest src 2026-02-01

    Judge · Numerous US states have enacted or introduced laws mandating AI disclosure in healthcare, particularly for utilization review and patient interactions. This is a clear, active trend.

    Writing · Concrete actor (states) and event (laws) but lacks specific examples or quantitative/temporal anchors.

  • Regulatory

    Algorithmic Bias Audit Requirements

    Grounded

    HHS rules under Section 1557 require providers to mitigate discrimination in clinical decision support tools. Indicates legal accountability for biased algorithm outputs shifts to health systems.

    verif 100 · spec 65 · cur 10 · newest src 2024-05-06

    Judge · HHS final rule updates Section 1557, explicitly addressing algorithmic discrimination in healthcare, impacting US health systems.

    Writing · Concrete actor (HHS), event (rules), but 'mitigate discrimination' is a bit vague. Lacks a specific quantitative or temporal anchor.

  • Operational

    AI Governance Committees Formalized

    Fabricated

    Hospital networks establish dedicated AI oversight committees to vet, monitor, and approve algorithmic tools. Signals institutionalization of AI risk management within governance structures.

    verif 20 · spec 90 · cur 70 · newest src 2025-09-18

    Judge · Guidance was issued in September 2025, not 2024. While it recommends formal AI oversight, it's guidance, not a regulatory mandate.

    Writing · Concrete actors, event, and temporal anchor. Specific requirements outlined.

  • Operational

    Vendor Model Transparency Gaps

    Grounded

    Procurement teams report AI vendors withhold training data details and performance metrics across subgroups. Indicates due diligence obstacles complicate safe deployment decisions.

    verif 100 · spec 65 · cur 10 · newest src 2024-05-15

    Judge · Multiple reports from regulatory bodies, industry associations, and research papers highlight vendor transparency issues in AI, especially concerning training data and performance bias.

    Writing · Concrete actor (procurement teams, AI vendors), concrete events (withhold data, performance metrics), infers a present state hindering deployment.

  • Operational

    EHR-Embedded AI Default Settings

    Speculative

    Major EHR platforms ship predictive and generative AI features enabled by default in clinical modules. Signals reduced institutional control over which tools reach clinicians.

    verif 80 · spec 65 · cur 10 · newest src 2023-11-20

    Judge · Some EHR vendors integrate AI, but 'default enablement' across 'major platforms' and 'reduced institutional control' is not broadly confirmed yet. This is an emerging area.

    Writing · Concrete platforms, product types, and observable action. Lacks specific names, dates, or measurable shift.

  • Operational

    Clinician AI Workload Backlash

    Grounded

    Surveys document staff frustration with alert fatigue and unverified AI outputs adding review burden. Indicates operational friction undermines anticipated efficiency gains.

    verif 100 · spec 65 · cur 10 · newest src 2024-03-27

    Judge · Multiple reports from credible sources confirm clinician frustration with AI-driven alert fatigue and review burden, undermining efficiency.

    Writing · Concrete actor and event, but 'surveys' lacks specificity and 'anticipated' is weak.

  • Patient Trust

    Patient AI Opt-Out Requests

    Future-looking

    Patients increasingly request exclusion from AI-assisted diagnosis and ambient recording during visits. Signals consent expectations expanding to algorithmic involvement in care.

    verif 75 · spec 25 · cur 10 · newest src 2024-03-20

    Judge · No widespread reports of 'increasing' patient opt-out requests for AI diagnosis/ambient recording yet, but consent forms are evolving. Plausible expectation given privacy concerns.

    Writing · No concrete actor, event, or anchor. "Increasingly" is vague. "Emerging expectation" is a generic forecast.

  • Patient Trust

    Data Use Litigation Against Hospitals

    Indicative

    Lawsuits target health systems for sharing patient data with AI developers without explicit consent. Indicates legal exposure tied to training data partnerships.

    verif 60 · spec 65 · cur 10 · newest src 2024-03-27

    Judge · Numerous lawsuits exist regarding data sharing with third parties without explicit consent, including those related to AI model training or data analytics. Broader trend of legal challenges is well-documented.

    Writing · Concrete actor (health systems, AI developers) and event (lawsuits) are present. Lacks quantitative/temporal anchor.

  • Patient Trust

    Public Skepticism Toward AI Diagnosis

    Indicative

    Polling shows most patients prefer human clinicians over AI for diagnostic decisions. Indicates trust gap constrains patient acceptance of automated tools.

    verif 60 · spec 65 · cur 100 · newest src 2026-03-05

    Judge · Patients have general concerns about AI errors and loss of human interaction in healthcare, but specific distrust numbers for AI diagnostics vary.

    Writing · Concrete actor (US patients), quantitative anchor (45%), active voice. 'Rises' is a vague quantifier.

  • Patient Trust

    Transparency Labeling Demands Rise

    Indicative

    Advocacy groups push for clear labeling when AI contributes to test results or treatment recommendations. Signals patient demand for visibility into algorithmic care.

    verif 60 · spec 55 · cur 10 · newest src 2024-03-12

    Judge · While specific 'demands' are difficult to quantify, the broader trend for AI transparency in healthcare is well-documented by regulators and advocacy groups across EU/US.

    Writing · Concrete actor and event, but 'advocacy groups' is slightly vague. 'Signals patient demand' is a generic forecast.