HIPAA is not a model selection problem. It is a configuration discipline. Any of the major cloud LLM providers can be deployed in a HIPAA-compliant posture, and any of them can be misconfigured into a violation. The reference architecture below is what we deploy when healthcare PHI is in scope and an auditor will eventually visit.
TL;DR — The Five Controls That Matter
- BAA chain. Signed BAA from every vendor that handles PHI, end to end. No exceptions, no exemptions.
- Encryption. TLS 1.3 in transit, AES-256 at rest, KMS-controlled keys with rotation.
- Access control. Least privilege, MFA-required, role-based, with break-glass procedures.
- Audit logging. Every PHI access, retained six years, tamper-evident, queryable.
- Minimum necessary. AI workflows must request only the PHI fields the task actually needs.
The architecture below operationalizes each of these.
What HIPAA Actually Requires (the Short Version)
HIPAA's relevant rules for AI systems:
- Privacy Rule — limits how PHI is used and disclosed. Drives the "minimum necessary" principle.
- Security Rule — administrative, physical, and technical safeguards. The technical safeguards (access control, audit controls, integrity, transmission security) map directly onto AI system design.
- Breach Notification Rule — disclosure rules if PHI is exposed. Drives logging and monitoring requirements.
A useful framing: HIPAA does not prescribe technology. It prescribes outcomes. Your architecture must produce evidence that PHI is protected; how you produce that evidence is engineering judgment.
The Reference Architecture
┌────────────────────────────────────┐
│ Customer-owned AWS / Azure / GCP │ ← Single tenant, BAA in place
│ │
│ ┌──────────────────────────────┐ │
│ │ Edge / API gateway │ │ ← TLS 1.3, WAF, IP allowlist
│ │ (SSO → MFA → JWT) │ │
│ └──────────┬───────────────────┘ │
│ │ │
│ ┌──────────▼───────────────────┐ │
│ │ PHI Tokenizer / De-id │ │ ← Optional, depends on workflow
│ └──────────┬───────────────────┘ │
│ │ │
│ ┌──────────▼───────────────────┐ │
│ │ Orchestration (private VPC) │ │ ← LangGraph in private subnet
│ └──┬───────────┬───────────────┘ │
│ │ │ │
│ ┌──▼──┐ ┌─────▼─────┐ │
│ │ RAG │ │ LLM │ │
│ │vec │ │ provider │ │ ← BAA-covered: Azure OpenAI,
│ │store│ │ │ │ AWS Bedrock, on-prem
│ └──┬──┘ └─────┬─────┘ │
│ │ │ │
│ ┌──▼───────────▼─────────────┐ │
│ │ Audit Log (immutable) │ │ ← S3 Object Lock / WORM
│ │ 6-year retention │ │
│ └────────────────────────────┘ │
└────────────────────────────────────┘
Everything inside the box is single-tenant infrastructure owned by the customer (or a partner with a BAA). Anything calling out of the box must be BAA-covered.
Control 1 — The BAA Chain
A Business Associate Agreement is the legal instrument that makes a vendor a "business associate" handling PHI on your behalf. Without it, the vendor is not authorized to receive PHI, period.
The chain must be unbroken:
Covered Entity (provider or health plan) → Your Org (BA) → Cloud Provider (BA) → LLM Provider (BA) → Subprocessors (BA)
A missing link is a violation regardless of how well the rest is configured.
What to verify before deploying:
| Vendor | Has BAA for AI workloads? | Notes |
|---|---|---|
| Azure OpenAI | Yes | Standard Microsoft BAA covers it on enterprise contracts |
| AWS Bedrock | Yes | AWS BAA must explicitly include Bedrock |
| Google Vertex AI | Yes | Google Cloud BAA covers it |
| Anthropic (direct) | Yes, enterprise tier | Requires separate BAA, not on standard plans |
| OpenAI (direct) | Yes, enterprise tier | API-only; ChatGPT consumer not BAA-eligible |
| Pinecone | Yes, enterprise tier | Confirm BAA explicitly covers your workload |
| Self-hosted models | N/A | You are the operator; no third-party data sharing |
A common mistake: using a vendor's "HIPAA-eligible" service through the wrong contract type. "Eligible" is not the same as "covered" — you have to sign the BAA explicitly and keep your deployment within the eligible subset of services.
Control 2 — Encryption and Key Management
- In transit: TLS 1.3 between every component. mTLS for service-to-service.
- At rest: AES-256 on every store — vector DB, audit logs, conversation history, embeddings.
- Keys: KMS-managed (AWS KMS, Azure Key Vault, GCP KMS) with documented rotation policy and access audit.
- Customer-managed keys (CMKs) are not strictly required by HIPAA but are increasingly expected in enterprise procurement and dramatically simplify breach scoping.
Embeddings are derivative of PHI. They must be encrypted at rest with the same key class as the source documents.
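As one concrete shape of this, here is a minimal sketch of writing an embedding artifact to S3 under a customer-managed KMS key. The bucket name, object key, and CMK alias are hypothetical; the same pattern applies with Azure Key Vault or GCP KMS:

```python
import boto3

s3 = boto3.client("s3")
embedding_bytes = b"..."  # serialized embedding vector: derivative of PHI

# Hypothetical bucket and CMK alias. SSE-KMS ties the object to an auditable,
# rotatable customer-managed key rather than the provider-default S3 key.
s3.put_object(
    Bucket="phi-embeddings",
    Key="doc-456/chunk-001.npy",
    Body=embedding_bytes,
    ServerSideEncryption="aws:kms",
    SSEKMSKeyId="alias/phi-data-cmk",
)
```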
Control 3 — Access Control
The technical safeguards in the Security Rule map onto five concrete controls:
- Unique user identification. Service accounts forbidden for human-driven access. Every action attributable to a person.
- Automatic logoff. Sessions expire on inactivity. AI workflows must re-authenticate after configurable timeouts.
- Encryption and decryption. Keys held in KMS, not in environment variables.
- MFA at every entry point. SSO + MFA, no exceptions for "convenience" tools.
- Role-based access. A care coordinator can read patient charts; an AI training engineer cannot. The AI orchestrator runs under a service identity with strictly scoped permissions.
Break-glass procedures: for emergencies, a designated role can elevate permissions with mandatory justification, mandatory paired approval, and a logged alert to the security team. Used <0.1% of the time; logged 100% of it.
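A sketch of how the role-to-scope mapping might look in the orchestrator. The role names and scope strings are illustrative, not a fixed schema; real deployments pull roles from the IdP rather than a hardcoded dict:

```python
from dataclasses import dataclass
from functools import wraps

@dataclass
class Actor:
    id: str
    role: str

# Illustrative role → PHI-scope mapping.
ROLE_SCOPES = {
    "care_coordinator":  {"read:chart", "read:encounter"},
    "ai_orchestrator":   {"read:encounter"},  # strictly scoped service identity
    "training_engineer": set(),               # no PHI access
}

class AccessDenied(Exception):
    pass

def require_scope(scope: str):
    def decorator(fn):
        @wraps(fn)
        def wrapper(actor: Actor, *args, **kwargs):
            if scope not in ROLE_SCOPES.get(actor.role, set()):
                # Denials are auditable events too; log before raising.
                raise AccessDenied(f"{actor.role} lacks {scope}")
            return fn(actor, *args, **kwargs)
        return wrapper
    return decorator

@require_scope("read:encounter")
def summarize_visit(actor: Actor, encounter_id: str) -> str:
    ...
```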
Control 4 — Audit Logging
The single most-failed HIPAA control in AI deployments.
Every PHI access — generation, retrieval, embedding, display — must be logged with:
- Timestamp (to the millisecond, UTC)
- Actor identity (user_id, service identity if applicable)
- Action (read / write / generate / retrieve)
- PHI affected (patient_id, document_id, field-level granularity where relevant)
- Outcome (success / failure / denied)
- Source context (request_id, trace_id)
Logs go to immutable storage — S3 Object Lock in compliance mode, Azure Immutable Blob Storage, or write-once tape — and are retained for six years (HIPAA minimum) or seven (a defensible margin).
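With S3 Object Lock, for example, the retention floor can be enforced as a bucket default. A sketch, assuming a bucket created with Object Lock enabled (the bucket name is hypothetical):

```python
import boto3

s3 = boto3.client("s3")

s3.put_object_lock_configuration(
    Bucket="phi-audit-logs",
    ObjectLockConfiguration={
        "ObjectLockEnabled": "Enabled",
        "Rule": {
            "DefaultRetention": {
                "Mode": "COMPLIANCE",  # cannot be shortened or removed, even by root
                "Years": 6,            # HIPAA minimum retention
            }
        },
    },
)
```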
The logging wrapper itself is small. A sketch; `PHIAccessEvent`, `audit_store`, and `SIGNING_KEY` are application-level objects assumed by the snippet:

```python
import hmac, hashlib, json
from datetime import datetime, timezone

def hmac_sign(record: dict, key: bytes) -> str:
    # Canonical JSON so the signature is stable across serializations.
    payload = json.dumps(record, sort_keys=True, default=str).encode()
    return hmac.new(key, payload, hashlib.sha256).hexdigest()

async def log_phi_access(event: PHIAccessEvent):
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(timespec="milliseconds"),
        "actor": event.actor.id,
        "actor_type": event.actor.type,   # human / service
        "action": event.action,           # read / write / generate / retrieve
        "patient_id": event.patient_id,
        "document_id": event.document_id,
        "fields": event.fields,
        "outcome": event.outcome,         # success / failure / denied
        "trace_id": event.trace_id,
        "ip": event.client_ip,
        "user_agent": event.user_agent,
    }
    record["hash"] = hmac_sign(record, SIGNING_KEY)
    await audit_store.append(record)  # append-only write to the WORM target
```
The hash-chain pattern (each log entry's hash includes the previous entry's hash) gives you tamper-evidence cheaply. An auditor can verify the chain is intact in seconds.
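That verification walk is a few lines. The sketch below reuses `hmac_sign` from the wrapper above and assumes the writer is extended to stamp each record with a `prev_hash` field holding the preceding entry's hash (which the snippet above does not yet do):

```python
def verify_chain(records: list[dict], key: bytes) -> bool:
    # Walk the log in write order; any edit, deletion, or reorder breaks the chain.
    prev_hash = None
    for rec in records:
        body = {k: v for k, v in rec.items() if k != "hash"}
        if rec["hash"] != hmac_sign(body, key):
            return False  # record body was altered after signing
        if prev_hash is not None and body.get("prev_hash") != prev_hash:
            return False  # a record was removed or reordered
        prev_hash = rec["hash"]
    return True
```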
Control 5 — Minimum Necessary in AI Workflows
The Privacy Rule's "minimum necessary" standard requires that PHI use be limited to what is needed for the task.
In AI architecture this turns into concrete patterns:
- Field-level retrieval. A medication-history agent retrieves only medication rows, not full charts.
- De-identification before generation when possible. If an LLM call doesn't strictly need names and dates of birth, mask them before sending.
- Per-tool authz on PHI fields. A "summarize visit" tool authorizes against `read:encounter`, not `read:billing`.
- Eval data is also PHI. If your eval set is built from production traces, it is PHI and inherits all protections.
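Two of these patterns, field-level retrieval and masking before generation, fit in a few lines. A sketch; the field names and placeholder token are illustrative choices, not an EHR schema:

```python
# Illustrative chart record.
chart = {
    "name": "Jane Doe",
    "date_of_birth": "1980-01-01",
    "medications": ["lisinopril 10mg"],
    "billing_codes": ["99213"],
}

# Direct identifiers an LLM call rarely needs; mask before the request
# leaves the trust boundary.
DIRECT_IDENTIFIERS = {"name", "date_of_birth", "ssn", "address", "phone"}

def minimum_necessary(record: dict, needed_fields: set[str]) -> dict:
    """Return only the fields the task needs, masking direct identifiers."""
    out = {}
    for field in needed_fields:
        if field in record:
            out[field] = "[REDACTED]" if field in DIRECT_IDENTIFIERS else record[field]
    return out

# A medication-history agent gets medication rows, not the full chart:
payload = minimum_necessary(chart, {"medications"})
assert "billing_codes" not in payload
```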
What HIPAA-Compliance Does NOT Require
Several myths cost teams unnecessary engineering effort. HIPAA does not require:
- On-premise hosting (cloud + BAA + correct configuration is fine).
- US-only data residency (HIPAA itself is silent on location; specific BAAs may impose it).
- Disabling logging or storage of conversations (on the contrary: PHI access must be logged).
- Avoiding cloud LLM providers categorically (you may use them under BAA).
- A specific encryption algorithm (AES-256 is current best practice but the Security Rule is technology-neutral).
What it does require: documented decisions, evidence of controls, and the ability to produce that evidence on demand.
The Annual Compliance Calendar
Compliance is not a one-time achievement. The cadence we run with healthcare-AI clients:
| Activity | Frequency |
|---|---|
| Risk analysis (Security Rule § 164.308(a)(1)) | Annually + after material changes |
| Access review — who has access to PHI systems | Quarterly |
| Audit log review — sampled and trend analysis | Monthly |
| Penetration test (external) | Annually |
| Workforce HIPAA training | Annually + onboarding |
| Disaster recovery test | Annually |
| BAA inventory review | Annually + before adding any new vendor |
| Incident response tabletop exercise | Annually |
Each of these is documented; the documentation is what survives an audit.
Common Audit Findings (and Fixes)
The findings auditors flag most often in AI systems, with fixes:
| Finding | Root cause | Fix |
|---|---|---|
| Audit logs incomplete or inconsistent | Logging added per-feature, not per-data-access | Centralize logging through a wrapper around every PHI access |
| No documented risk analysis | Compliance treated as engineering-only | Run the 164.308 risk analysis annually with named owners |
| Excessive PHI in vector stores | "We embedded the whole record" | Field-level chunking; de-id before embedding when possible |
| Lapsed BAAs after vendor changes | No process to re-verify | Quarterly BAA inventory; block adds without BAA |
| AI prompts contain PHI in logs | Logs include raw prompts | Redact or hash PHI in app logs; full record only in audit log |
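For the last finding, a sketch of a logging filter that scrubs obvious PHI patterns before application logs are written. The patterns are illustrative and deliberately narrow; real deployments lean on the tokenizer service's dictionary rather than regexes alone, and the full record lives only in the audit log:

```python
import logging
import re

# Illustrative patterns only: SSNs and MRN-style identifiers.
PHI_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),   # SSN
    re.compile(r"\bMRN[:\s]?\d{6,10}\b"),   # medical record number
]

class PHIRedactingFilter(logging.Filter):
    def filter(self, record: logging.LogRecord) -> bool:
        msg = record.getMessage()
        for pattern in PHI_PATTERNS:
            msg = pattern.sub("[PHI-REDACTED]", msg)
        record.msg, record.args = msg, ()
        return True

logging.getLogger("app").addFilter(PHIRedactingFilter())
```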
Frequently Asked Questions
Is GPT-4 / Claude / Gemini HIPAA-compliant?
The base APIs are not. Azure OpenAI, AWS Bedrock, and Google Vertex AI offer HIPAA-eligible enterprise tiers with a signed BAA when deployed correctly. Anthropic offers HIPAA eligibility on enterprise plans via direct BAA. Configuration, not the model, determines compliance.
Does HIPAA require on-premise AI?
No. HIPAA requires administrative, physical, and technical safeguards over PHI — not a specific deployment topology. Cloud is acceptable when the cloud provider signs a Business Associate Agreement and the customer configures the workload correctly.
What is the most-failed HIPAA control in AI systems?
Audit logging. Specifically: not logging every PHI access, not retaining logs long enough (six years), and not logging at the granularity that lets an investigator answer "who accessed which patient's record on which date?"
Can RAG systems be HIPAA-compliant?
Yes. The vector store, embeddings, retriever, and generation must all operate inside a BAA-covered environment. Embeddings are derivative of PHI and inherit the same protections. Logging covers retrieval as well as generation.
How long does it take to certify a HIPAA-aligned architecture?
HIPAA has no certification; vendors self-attest. Achieving "audit-ready" posture on a new system typically takes 8-12 weeks of focused work plus a six-month operational track record before the first external assessment.
Key Takeaways
- HIPAA compliance is configuration, not architecture style — cloud and on-prem both work when configured correctly.
- Embeddings and retrieval logs are PHI; treat them with the same controls as the source records.
- Audit logging at request granularity, retained for six years, is the most-failed control in healthcare AI.
- The BAA chain must be unbroken from your customer down to every provider that touches PHI.
- Compliance is a continuous practice, not a one-time achievement — the annual cadence is the work.