Benchmark

GPT-5.4

OpenAI · openai/gpt-5.4

Composite: 83
Verifiability: 95
Specificity: 61
Currency: 78
Coverage: 99
Briefs evaluated: 12
Total signals: 192
Run: 2026-05-13
Verifier: google/gemini-2.5-flash:online
Specificity judge: google/gemini-2.5-flash
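
The page does not state how the Composite score is derived. The four sub-scores reported above are, however, consistent with an unweighted mean rounded to the nearest integer. The sketch below checks that arithmetic; the formula and the function name `unweighted_composite` are assumptions for illustration, not the benchmark's documented method.

```python
# Sub-scores copied from the header of this page.
sub_scores = {
    "Verifiability": 95,
    "Specificity": 61,
    "Currency": 78,
    "Coverage": 99,
}


def unweighted_composite(scores):
    """Assumed formula: plain mean of the sub-scores, rounded to an integer."""
    return round(sum(scores.values()) / len(scores))


composite = unweighted_composite(sub_scores)
print(composite)  # (95 + 61 + 78 + 99) / 4 = 83.25, which rounds to 83
```

If the benchmark weights the sub-scores differently, the same result could arise by coincidence, so treat this as a consistency check only.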

Per-industry signals

12 industries · each signal is listed with its verdict, scores, and judge commentary.

  • Clinical

    Ambient Scribe Liability Reviews

    Grounded

    Hospitals document diagnostic and medication errors linked to ambient AI scribes that omit symptoms, allergies, or negations in clinical notes. Signals immediate need for clinician verification standards, audit trails, and specialty-specific deployment limits.

    verif 100 · spec 85 · cur 70 · newest src 2025-08-01

    Judge · Multiple sources confirm risks of omissions and hallucinations, impacting diagnoses and treatments. Clinician review, logging, and evaluation frameworks are crucial for safety.

    Writing · Concrete actor, event, and quantifiable shift included. Future-tense recommendations deduct slightly.

  • Clinical

    Sepsis Model Drift Incidents

    Indicative

    Health systems report sepsis alert performance changes after EHR upgrades, population shifts, and revised lab workflows alter input patterns. Indicates immediate relevance for continuous validation, recalibration schedules, and oversight of model-dependent care pathways.

    verif 60 · spec 65 · cur 50 · newest src 2025-05-08

    Judge · Multiple sources highlight AI model variability and the need for localized validation and recalibration due to differing patient populations and clinical contexts, implying drift.

    Writing · Concrete actor (health systems), event (EHR upgrades), and temporal anchor (after). Deductions for 'many' implicit, 'alters input patterns'. Uses active voice effectively.

  • Clinical

    AI Imaging Triage Overrides

    Grounded

    Radiology services track cases where AI triage flags differ from radiologist prioritization, especially in stroke, fracture, and chest imaging queues. Signals immediate pressure to define override authority, escalation rules, and documentation for discrepant findings.

    verif 100 · spec 85 · cur 100 · newest src 2026-03-01

    Judge · The increasing use of AI for triage in radiology, as highlighted by recent research in mammography, makes the tracking of AI-radiologist discrepancies a present concern.

    Writing · Concrete actors (radiology services, AI triage, radiologist), specific events (flags, prioritization), and domain.

  • Clinical

    LLM Discharge Instruction Errors

    Grounded

    Pilot programs find large language models producing discharge instructions with reading-level mismatches, dosing ambiguities, and unsupported follow-up advice. Indicates immediate relevance for human review, multilingual validation, and standardized patient education controls.

    verif 100 · spec 65 · cur 50 · newest src 2024-11-20

    Judge · Multiple studies identify hallucinations, medication errors (including dosage omissions), and lack of personalization in AI-generated discharge instructions. Human review is consistently emphasized.

    Writing · Good concrete problems identified; lacks specific actor/event for higher score.

  • Regulatory

    EU AI Act Device Mapping

    Grounded

    Health technology vendors map clinical AI products to EU AI Act risk tiers alongside MDR and IVDR classifications. Signals immediate compliance work for procurement criteria, documentation requests, and post-market monitoring responsibilities.

    verif 100 · spec 65 · cur 100 · newest src 2026-04-15

    Judge · The EU AI Act classifies AI systems in healthcare by risk, imposing specific compliance and oversight requirements. This necessitates comprehensive mapping and governance shifts.

    Writing · Concrete actor (EU AI Act, Europe), event (risk mapping), and shift (procurement to governance).

  • Regulatory

    FDA AI Change Control Plans

    Grounded

    FDA guidance discussions center on predetermined change control plans for software functions that update through machine learning. Indicates immediate relevance for vendor contracts, validation evidence, and governance of model modifications after deployment.

    verif 100 · spec 65 · cur 70 · newest src 2025-08-18

    Judge · The FDA has issued guidance on PCCPs for AI-enabled devices, with ongoing research into evaluation methods.

    Writing · Concrete actor/event, but 'discussions center on' is passive and 'immediate relevance' is weak.

  • Regulatory

    OCR AI Privacy Enforcement

    Grounded

    US regulators scrutinize health data flows to analytics and AI tools that transmit identifiers through tracking pixels, prompts, and cloud logs. Signals immediate need for HIPAA risk reviews, vendor restrictions, and logging minimization practices.

    verif 100 · spec 65 · cur 10 · newest src 2024-05-06

    Judge · OCR actively investigating AI-related complaints and emphasizing HIPAA compliance for AI, especially concerning tracking technologies and vendor agreements.

    Writing · Concrete actor (US regulators), event (scrutiny), and anchor (HIPAA) are present. 'Immediate need' is a slight deduction.

  • Regulatory

    Algorithmic Bias Audit Mandates

    Grounded

    State and EU policymakers advance rules requiring impact assessments, dataset documentation, and bias testing for high-risk automated decisions. Indicates immediate relevance for hospital governance committees, evidence retention, and procurement due diligence.

    verif 100 · spec 65 · cur 100 · newest src 2026-05-13

    Judge · Both US and EU regulations mandate bias audits and impact assessments for AI, especially in healthcare, with compliance deadlines imminent.

    Writing · Concrete actors (State, EU), specific actions (impact assessments, dataset documentation, bias testing). Lacks a specific quantitative or temporal anchor.

  • Operational

    GPU Capacity Allocation Conflicts

    Speculative

    Hospital IT teams face compute bottlenecks as imaging, documentation, and revenue-cycle AI projects compete for limited GPU and cloud budgets. Signals immediate need for portfolio prioritization, usage metering, and cost-to-value tracking.

    verif 80 · spec 65 · cur 50 · newest src 2025-01-01

    Judge · While the impact of resource constraints on AI adoption in healthcare is acknowledged, specific evidence linking it directly to GPU capacity allocation conflicts is not explicitly detailed across multiple sources within the provided context.

    Writing · Concrete actors (Hospital IT teams, AI projects) and events. Lacks a quantitative/temporal anchor.

  • Operational

    EHR Copilot Workflow Friction

    Speculative

    Clinicians report extra clicks, inbox clutter, and note-reconciliation work when EHR copilots insert suggestions outside established documentation patterns. Indicates immediate relevance for workflow redesign, usability testing, and specialty-level adoption metrics.

    verif 80 · spec 65 · cur 85 · newest src 2025-11-17

    Judge · No direct mention of 'extra clicks,' 'inbox clutter,' or 'note-reconciliation work' from EHR copilots, but studies indicate a dynamic impact on workflow and the need for careful review of AI-generated notes, suggesting potential friction points.

    Writing · Concrete actor (Clinicians, EHR copilots), specific frictions named, no vague qualifiers or hype.

  • Operational

    Shadow AI Procurement Gaps

    Grounded

    Departments purchase AI tools through local budgets or free trials, bypassing security review, integration checks, and data-processing assessments. Signals immediate need for centralized intake, inventory controls, and standardized contract language.

    verif 100 · spec 65 · cur 100 · newest src 2026-03-16

    Judge · Shadow AI is widespread in healthcare, often bypassing IT and compliance for speed and functionality.

    Writing · Concrete actors (Departments), clear actions, and identified gaps. Lacks specific temporal/quantitative anchors.

  • Operational

    Vendor Indemnity Clause Disputes

    Grounded

    Contract negotiations increasingly focus on responsibility for clinical harm, copyright claims, and regulatory violations tied to generative AI outputs. Indicates immediate relevance for legal review, insurance coverage checks, and deployment approval thresholds.

    verif 100 · spec 65 · cur 100 · newest src 2026-05-13

    Judge · Indemnification clauses are increasingly covering regulatory non-compliance, algorithmic bias, and IP infringement in healthcare AI contracts, driven by evolving regulations and potential for patient harm.

    Writing · Concrete areas of dispute are named. Lacks specific actors, products, or quantitative anchors.

  • Patient Trust

    Consent Questions on AI Notes

    Grounded

    Patients ask whether ambient listening tools record encounters, store audio, or train models using sensitive visit conversations. Signals immediate relevance for disclosure language, consent workflows, and visible safeguards during appointments.

    verif 100 · spec 65 · cur 85 · newest src 2025-12-23

    Judge · Multiple sources confirm patient concerns regarding ambient AI recording, data use, and the need for clear consent. HHS is seeking feedback on related issues.

    Writing · Concrete actor (patients), event (ask questions), and immediate relevance. Lacks a temporal anchor.

  • Patient Trust

    Portal Chatbot Confidence Gaps

    Indicative

    Patient portal users report confusion when chatbots answer billing, triage, or medication questions without clear sourcing or escalation paths. Indicates immediate relevance for labeling rules, handoff options, and response quality monitoring.

    verif 60 · spec 65 · cur 100 · newest src 2026-05-09

    Judge · While specific user confusion isn't detailed, sources highlight AI risks in patient portals and underscore the need for transparency, clear governance, and monitoring.

    Writing · Concrete actor/event (Portal Chatbot/users) and a clear problem identified. Lacks a temporal or quantitative anchor.

  • Patient Trust

    Bias Concerns in Risk Scores

    Grounded

    Community groups challenge algorithmic risk scores that use proxies linked to race, disability, language, or prior access patterns. Signals immediate relevance for explainability materials, fairness reviews, and stakeholder engagement in deployment decisions.

    verif 100 · spec 55 · cur 100 · newest src 2026-03-25

    Judge · Multiple sources confirm concerns about algorithmic bias, particularly in Medicare Advantage, impacting equitable access and patient outcomes. Regulations are emerging to address this.

    Writing · No actor/event/quantity; uses active voice; concrete topics.

  • Patient Trust

    Data Use Objections for Training

    Grounded

    Patients and advocates object when de-identified records support vendor model training without clear notice, opt-out processes, or benefit-sharing terms. Indicates immediate relevance for data governance transparency, contract disclosures, and public communication.

    verif 100 · spec 45 · cur 85 · newest src 2025-12-29

    Judge · Multiple sources highlight concerns over data privacy, transparency, consent challenges, and regulatory complexity in AI development using patient data, emphasizing the need for clear governance and stakeholder involvement.

    Writing · No concrete actor, event, product or quantitative anchor. Uses vague concepts like "clear notice".