Benchmark

Claude Opus-4.6

Anthropic · anthropic/claude-opus-4.6

Composite: 81
Verifiability: 81
Specificity: 80
Currency: 71
Coverage: 96
Briefs evaluated: 12
Total signals: 192
Run: 2026-05-13
Verifier: google/gemini-2.5-flash:online
Specificity judge: google/gemini-2.5-flash
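
The header figures can be cross-checked with a little arithmetic. The sketch below is illustrative only: the variable names are hypothetical, and this page does not state how the composite is derived, so the unweighted mean appears purely for comparison and is not the benchmark's formula.

```python
# Header figures copied from this page; all names are hypothetical.
sub_scores = {
    "verifiability": 81,
    "specificity": 80,
    "currency": 71,
    "coverage": 96,
}
composite_reported = 81
briefs_evaluated = 12
total_signals = 192

# Average number of signals per evaluated brief.
signals_per_brief = total_signals / briefs_evaluated

# Assumption: a plain unweighted mean, used only as a point of comparison.
unweighted_mean = sum(sub_scores.values()) / len(sub_scores)

print(signals_per_brief)                     # 16.0
print(unweighted_mean)                       # 82.0
print(unweighted_mean - composite_reported)  # 1.0
```

The one-point gap between the unweighted mean and the reported composite may reflect rounding of the displayed sub-scores or a weighting that the page does not describe.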

Per-industry signals

12 industries · each signal is listed with its verdict, scores, judge commentary, and citations.

  • Clinical

    AI Diagnostic Drift in Radiology

    Speculative

    FDA adverse event reports show AI-assisted radiology tools producing inconsistent sensitivity rates across diverse patient populations. Signals a calibration gap that affects diagnostic equity in imaging departments.

    verif 80 · spec 65 · cur 85 · newest src 2025-12-01

    Judge · The FDA is soliciting public comment on AI drift and real-world performance. The signal is plausible but no specific adverse event report numbers were found.

    Writing · Concrete actor (FDA, imaging dept) & event (adverse reports) cited. 'Inconsistent sensitivity' is quantifiable, but 'diverse patient populations' is vague.

  • Clinical

    LLM Hallucination in Clinical Notes

    Grounded

    Health systems report large language model-generated clinical summaries containing fabricated medication histories and lab values. Indicates an immediate patient safety risk in AI-augmented documentation workflows.

    verif 100 · spec 75 · cur 50 · newest src 2025-02-19

    Judge · Multiple studies and reports confirm LLM hallucinations in clinical notes, outlining immediate safety risks due to fabricated information like medication histories and lab values.

    Writing · Concrete actor (health systems), event (fabricated data), and immediate risk. Lacks a specific temporal anchor.

  • Clinical

    Sepsis Algorithm Alert Fatigue Rise

    Dubious

    Hospitals using AI-based sepsis prediction tools report clinician override rates exceeding 85% due to false positives. Signals erosion of clinical utility and potential liability exposure for missed true cases.

    verif 40 · spec 85 · cur 100 · newest src 2026-05-12

    Judge · One source mentions alert fatigue as a concern for AI sepsis systems, but no evidence of high override rates or erosion of utility was found; instead, one tool achieved high adoption.

    Writing · Concrete actor (hospitals, clinicians), event (override rates), and quantitative anchor (85%).

  • Clinical

    AI Pathology Second-Read Mandates

    Speculative

    Academic medical centers now require human pathologist confirmation for all AI-flagged malignancy classifications before treatment decisions. Indicates institutional recognition that autonomous AI diagnosis remains premature for oncology.

    verif 80 · spec 65 · cur 85 · newest src 2026-01-13

    Judge · While there is strong emphasis on human oversight and second reads are common in pathology workflows, a formal 'mandate' for all AI-flagged malignancy classifications is not explicitly stated as a new, widespread requirement.

    Writing · Concrete actor (academic medical centers), concrete event (mandates), but lacks a temporal anchor.

  • Regulatory

    EU AI Act Health Tier Compliance

    Speculative

    The EU AI Act classifies most clinical decision-support tools as high-risk, requiring conformity assessments by August 2025. Signals mandatory infrastructure investment for any US health system operating in European markets.

    verif 80 · spec 85 · cur 50 · newest src 2025-05-01

    Judge · The EU AI Act classifies most clinical decision-support tools as high-risk. However, the August 2025 compliance date for high-risk AI was delayed to August 2026, or potentially December 2027.

    Writing · Concrete actor, event, and temporal anchor. Active voice. Avoids hype. 'Most' is slightly vague.

  • Regulatory

    FDA Draft Rule on LLM Oversight

    Speculative

    FDA releases draft guidance requiring continuous post-market surveillance for generative AI tools used in clinical settings. Indicates a shift from one-time clearance to ongoing algorithmic monitoring obligations.

    verif 80 · spec 85 · cur 100 · newest src 2026-04-28

    Judge · The sources do not mention a draft rule requiring continuous post-market surveillance for *generative AI tools* in clinical settings. They discuss draft guidances for AI-enabled medical devices and for AI in drug development, which cover broader AI applications and lifecycle management. The closest match to 'ongoing algorithmic monitoring obligations' is the recommendation for postmarket performance monitoring of AI-enabled devices [fda.gov], which is neither specific to generative AI tools nor a 'draft rule requiring' it. The FDA is moving toward real-time clinical trials [fda.gov] and continuous monitoring, but no draft rule specific to LLM oversight or generative AI was found.

    Writing · Concrete actor (FDA), event (draft guidance), and clear shift. Specific about 'post-market surveillance'.

  • Regulatory

    State-Level AI Transparency Laws

    Grounded

    Colorado and California enact laws requiring patient notification when AI contributes to coverage denials or clinical recommendations. Signals a fragmented US compliance landscape that complicates multi-state health system operations.

    verif 100 · spec 85 · cur 50 · newest src 2025-05-12

    Judge · Multiple states are enacting laws requiring human oversight and disclosure of AI use in healthcare decisions, particularly for denials.

    Writing · Concrete actors, events, and a clear shift. Avoids hype though 'complicates' is slightly vague.

  • Regulatory

    CMS Reimbursement Code AI Limits

    Indicative

    CMS proposes restricting reimbursement for AI-only diagnostic interpretations without documented physician involvement. Indicates payer-side pressure to maintain human accountability in billable clinical services.

    verif 60 · spec 85 · cur 70 · newest src 2025-11-05

    Judge · CMS is focusing on preventing discrimination and bias in AI use within healthcare. No explicit 'AI-only diagnostic interpretation' reimbursement restriction was found, but the stated intent to maintain human accountability in billable clinical services is evident.

    Writing · Names a concrete actor (CMS), a concrete event (proposes restricting), and a specific condition for reimbursement.

  • Operational

    Vendor Lock-In for AI Platforms

    Speculative

    Health systems report inability to switch AI clinical vendors due to proprietary data formatting and integration dependencies. Signals strategic risk in long-term contracting without interoperability safeguards.

    verif 80 · spec 45 · cur 100 · newest src 2026-03-11

    Judge · While federal regulations are pushing for interoperability and transparency to mitigate risks, current sources do not directly confirm vendor lock-in as a widespread reported issue.

    Writing · No concrete actors, events, or numbers. Uses active voice for the core observation.

  • Operational

    AI Workforce Role Reclassification

    Speculative

    Hospitals create new positions such as clinical AI liaisons and algorithm auditors to manage deployed machine learning tools. Indicates rising operational overhead that offsets projected AI efficiency gains.

    verif 80 · spec 75 · cur 0

    Judge · The provided sources contain no direct evidence of hospitals creating new roles such as 'clinical AI liaisons' or 'algorithm auditors' to manage ML tools, nor of rising operational overhead offsetting efficiency gains. They focus on AI adoption and regulatory changes within HHS and FDA.

    Writing · Concrete actors (hospitals), events (create positions), and specific roles named. 'Rising' is vague.

  • Operational

    Cybersecurity Gaps in AI Pipelines

    Grounded

    Penetration tests reveal AI model endpoints in hospital networks lack standard access controls and audit logging. Signals an expanded attack surface requiring immediate security architecture review.

    verif 100 · spec 65 · cur 85 · newest src 2026-02-02

    Judge · Multiple sources confirm AI-related cybersecurity gaps in healthcare, including a real-world hospital audit and new guidelines addressing these risks for autonomous agents.

    Writing · Concrete actor (hospital networks), specific event (penetration tests), measurable shift implied. Lacks precise quantifiers or a named project.

  • Operational

    EHR-AI Integration Downtime Costs

    Speculative

    Unplanned outages of AI modules embedded in EHR workflows cause documentation backlogs averaging four hours per incident. Indicates fragile system dependencies that reduce rather than enhance operational resilience.

    verif 80 · spec 85 · cur 85 · newest src 2025-11-17

    Judge · The signal points to potential disruptions from AI-EHR integration. While rapid adoption is noted, there's no direct evidence of specific 'four-hour documentation backlogs' due to AI module outages within the provided sources. However, the potential for workflow disruption and administrative burden stemming from AI integration is implied.

    Writing · Concrete actor (EHR-AI), event (downtime), and quantitative anchor (four hours) are strong.

  • Patient Trust

    Patient Opt-Out Rates for AI Care

    Speculative

    Surveyed patients at US academic centers show 34% decline AI involvement in their diagnostic process when given explicit choice. Signals a consent-design challenge that affects AI tool utilization and ROI projections.

    verif 80 · spec 85 · cur 100 · newest src 2026-04-07

    Judge · While patient trust in AI is debated, a specific 34% opt-out rate from US academic centers for diagnostic AI is not explicitly confirmed across multiple sources. The Ohio State survey indicates a decline in openness to AI in healthcare generally, but not a specific diagnostic opt-out rate.

    Writing · Concrete actors, events, and a quantitative anchor are strong. Minor deduction for 'measurable factor'.

  • Patient Trust

    Bias Perception Among Minority Groups

    Dubious

    Community health studies document higher distrust of AI recommendations among Black and Hispanic patient populations. Indicates that health equity concerns directly limit AI adoption in underserved communities.

    verif 40 · spec 65 · cur 85 · newest src 2026-01-01

    Judge · Minority groups, especially Black and Hispanic adults, show higher reported trust in AI for health advice, particularly mental health. This contradicts the signal's claim of higher distrust.

    Writing · Concrete actors, event, and temporal anchor are present. No future tense or hype.

  • Patient Trust

    Demand for AI Explainability Reports

    Grounded

    Patient advocacy organizations now request plain-language explanations of how AI tools influence individual treatment plans. Signals rising accountability expectations that require new clinician communication protocols.

    verif 100 · spec 65 · cur 100 · newest src 2026-04-10

    Judge · Multiple sources confirm patient and consumer groups demanding AI explainability, driven by new EU regulations and existing privacy laws.

    Writing · Concrete actor, measurable shift implied. Abstract 'expectations' and 'standards' detract.

  • Patient Trust

    Malpractice Litigation Citing AI Use

    Grounded

    Plaintiff attorneys in three US jurisdictions file malpractice claims specifically naming AI decision-support tools as contributing factors. Indicates that public perception of AI liability shapes both trust and institutional risk exposure.

    verif 100 · spec 65 · cur 100 · newest src 2026-03-25

    Judge · Multiple lawsuits in various US jurisdictions cite AI as a contributing factor in denied medical claims, often alleging improper denials and lack of human review.

    Writing · Concrete actors, event, and temporal anchor are good. Avoids hype and generic forecasts.
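
The per-signal score lines follow a regular pattern (verif, spec, cur, and an optional newest-source date), so they can be parsed and summarized. The following is a minimal sketch, assuming the verdicts and score lines are copied by hand from the 16 signals listed here. The regular expression, `parse_scores`, and the `SIGNALS` layout are hypothetical and not part of the benchmark's tooling. The pattern accepts both fused fields (`verif 80spec 65cur 85newest src ...`) and `·`-separated ones.

```python
import re
from collections import Counter
from statistics import mean

# Hypothetical pattern: accepts fused fields as well as "·"-separated ones;
# the newest-source date is optional (one signal on this page has none).
SCORE_RE = re.compile(
    r"verif\s*(\d+)\s*·?\s*spec\s*(\d+)\s*·?\s*cur\s*(\d+)"
    r"(?:\s*·?\s*newest src\s*(\d{4}-\d{2}-\d{2}))?"
)


def parse_scores(line):
    """Return (verif, spec, cur, newest_src or None) for one score line."""
    match = SCORE_RE.search(line)
    if match is None:
        raise ValueError(f"unrecognized score line: {line!r}")
    verif, spec, cur, newest = match.groups()
    return int(verif), int(spec), int(cur), newest


# (verdict, score line) pairs copied from the 16 signals listed above.
SIGNALS = [
    ("Speculative", "verif 80 · spec 65 · cur 85 · newest src 2025-12-01"),
    ("Grounded", "verif 100 · spec 75 · cur 50 · newest src 2025-02-19"),
    ("Dubious", "verif 40 · spec 85 · cur 100 · newest src 2026-05-12"),
    ("Speculative", "verif 80 · spec 65 · cur 85 · newest src 2026-01-13"),
    ("Speculative", "verif 80 · spec 85 · cur 50 · newest src 2025-05-01"),
    ("Speculative", "verif 80 · spec 85 · cur 100 · newest src 2026-04-28"),
    ("Grounded", "verif 100 · spec 85 · cur 50 · newest src 2025-05-12"),
    ("Indicative", "verif 60 · spec 85 · cur 70 · newest src 2025-11-05"),
    ("Speculative", "verif 80 · spec 45 · cur 100 · newest src 2026-03-11"),
    ("Speculative", "verif 80 · spec 75 · cur 0"),
    ("Grounded", "verif 100 · spec 65 · cur 85 · newest src 2026-02-02"),
    ("Speculative", "verif 80 · spec 85 · cur 85 · newest src 2025-11-17"),
    ("Speculative", "verif 80 · spec 85 · cur 100 · newest src 2026-04-07"),
    ("Dubious", "verif 40 · spec 65 · cur 85 · newest src 2026-01-01"),
    ("Grounded", "verif 100 · spec 65 · cur 100 · newest src 2026-04-10"),
    ("Grounded", "verif 100 · spec 65 · cur 100 · newest src 2026-03-25"),
]

rows = [(verdict, *parse_scores(line)) for verdict, line in SIGNALS]

# How many signals received each verdict.
verdict_counts = Counter(verdict for verdict, *_ in rows)

# Which verif scores appear under each verdict label.
verif_by_verdict = {}
for verdict, verif, *_ in rows:
    verif_by_verdict.setdefault(verdict, set()).add(verif)

# Mean of each score over the listed signals only.
mean_verif = mean(row[1] for row in rows)
mean_spec = mean(row[2] for row in rows)
mean_cur = mean(row[3] for row in rows)

print(verdict_counts)
print(verif_by_verdict)
print(mean_verif, mean_spec, mean_cur)
```

On these 16 signals each verdict label coincides with a single verif value (Grounded 100, Speculative 80, Indicative 60, Dubious 40). The page does not say whether that mapping is a rule of the benchmark or a coincidence of this sample. The means over the listed signals are 80 (verif), 73.75 (spec), and 77.8125 (cur); they need not match the header figures, which cover all 192 signals.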