Large Language Models: From Concept to Public-Health Workhorse

AIPublicHealthupdate.com — July 9, 2025

Why now?
Six months ago, most public-health teams were still “experimenting” with ChatGPT. Since then, large language models (LLMs) have jumped from pilot projects to production systems that forecast outbreaks, lighten regulatory workloads, and translate guidance for multilingual communities. Below are four fresh, freely accessible examples, followed by the guardrails every agency should be watching.

Real-world deployments (Jan – Jul 2025)

PandemicLLM forecasts disease dynamics – Johns Hopkins and Duke researchers fused real-time epidemiology, policy, genomic and demographic feeds into an LLM called PandemicLLM. In U.S. retrospective tests it beat gold-standard CDC models at predicting COVID-19 hospitalisations one- to three-weeks ahead, especially during variant surges.(hub.jhu.edu)
FDA’s “Elsa” accelerates regulatory review – Rolled out agency-wide on 2 June, Elsa summarises adverse-event reports, compares drug labels and even drafts SQL for internal databases—all inside a secure GovCloud so no industry data leave FDA firewalls. Early pilots cut clinical-protocol review times by double-digit percentages.(fda.gov)
ChatGPT Gov reaches health agencies – OpenAI’s government-only edition lets federal, state and local offices—including public-health departments—run GPT-4o within their own Azure environments. Minnesota’s Enterprise Translations Office is already using it to deliver instant plain-language health alerts in 24 languages.(theverge.com)
Misinformation stress-test exposes weak spots – An Australian study showed popular LLMs could be re-programmed to deliver polished—but false—health answers with fabricated citations; only Anthropic’s Claude resisted most attempts. The finding underscores the need for robust prompt-level and system-level safeguards before deploying chatbots to the public.(reuters.com)

Ethical, regulatory & implementation notes

Data protection – FDA’s Elsa and ChatGPT Gov both operate in zero-training, FedRAMP-ready clouds, addressing HIPAA-style concerns about mingling protected health information with commercial model training.
Transparency & reproducibility – PandemicLLM’s team published model cards and open-sourced code, aligning with HHS’s 2025 AI strategy that calls for shareable documentation and evaluation benchmarks.
Misinformation resilience – The Reuters study highlights the ease of “jail-breaking” consumer models. Agencies should combine constitutional-AI–style guardrails, domain-specific fine-tuning, and human review for any public-facing chatbot.
Equity & language access – LLM-powered translation (e.g., Minnesota) can shrink information gaps, but must be validated for cultural nuance and readability at different health-literacy levels.

Take-home points for practitioners

LLMs are operational today – Forecasting, regulatory review and multilingual communication have moved beyond pilots; start small but plan for scale.
Security and governance first – Use government-grade instances or private-cloud deployments, publish model cards, and require human oversight for any patient-facing output.
Test for harm, then value – Stress-test models for misinformation and bias before touting efficiency gains; ethical deployment sustains public trust.