Zero Cloud AI: What Law Firms and Medical Practices Need to Know

Section 01

Executive Summary

Key Findings

Every major cloud AI platform — including ChatGPT, Microsoft Copilot, and Google Gemini — routes user queries and document content through third-party servers the client does not control.
In United States v. Heppner, 1:25-cr-00503(JSR), 2026 WL 436479 (S.D.N.Y. Feb. 17, 2026), a federal judge ruled that AI queries containing attorney-provided materials were not protected by attorney-client privilege because they were processed by a cloud platform. The court noted that on-premises AI directed by counsel may qualify for protection.
Using cloud AI tools with Protected Health Information (PHI) without a compliant Business Associate Agreement (BAA) constitutes a potential HIPAA violation under 45 C.F.R. § 164.502. Critically, even with a BAA, certain platform data practices may not satisfy HIPAA's minimum necessary standard.
Zero cloud AI — large language models running on hardware the client owns and controls — eliminates the third-party transmission risk by architectural design, not contractual assurance.
Zero cloud is not a complete security solution. Physical security, network access controls, authentication, and audit logging are required in addition to local deployment to meet compliance standards.
Open-source models including Meta's Llama 3 and Google's Gemma, running via frameworks such as Ollama, are now capable of performing document analysis, summarization, and workflow automation at a quality level suitable for professional use.

This white paper is written for two audiences. The first is the practice owner, managing partner, or administrator who needs to understand the business and legal risk in plain language and make a decision about their AI policy. The second is the IT administrator or compliance officer who needs to understand what a compliant private AI deployment requires technically. The paper is structured to serve both: the first six sections address the business and legal case; sections seven through ten address the technical implementation.

Section 02

The Cloud AI Problem

Artificial intelligence tools have entered the professional workplace faster than compliance frameworks have adapted. A 2025 survey by the American Bar Association found that more than half of law firm associates reported using AI tools in their work — and that the majority of those tools were consumer-grade cloud products not approved by their firms.^[1] The picture in healthcare is similar: a 2024 survey by the American Medical Association found that 38% of physicians had used a general-purpose AI tool to assist with clinical documentation.^[2]

The compliance gap this creates is not theoretical. It is the result of a straightforward architectural fact: when a user types a query into ChatGPT, Microsoft Copilot, or Google Gemini, that query — along with any document content pasted into it — is transmitted over the internet to a server operated by a third party. The user has no visibility into what happens to that data after transmission.

What Actually Happens to Your Data

Each major cloud AI platform handles data differently, but several common practices are worth understanding:

OpenAI (ChatGPT): By default, conversations on the consumer platform may be used to train future models. Enterprise agreements provide different terms, but the data still traverses OpenAI's infrastructure. Under OpenAI's data retention policy, conversation data is retained for up to 30 days by default.^[3]
Microsoft Copilot: For enterprise Microsoft 365 customers, Microsoft commits that Copilot data is not used to train foundation models. However, data is processed in Microsoft's cloud infrastructure, and access controls depend entirely on your Microsoft 365 tenant configuration.^[4]
Google Gemini: Google Workspace customers have specific data processing commitments, but Gemini queries are processed on Google's infrastructure. Consumer Gemini usage may be reviewed by human reviewers.^[5]

The critical point is not which platform is "most private" — it is that all of them require transmitting confidential data to infrastructure you do not control, administered by parties who are not subject to the same professional obligations you are.

The Core Problem

A law firm partner and a medical practice owner share the same fundamental exposure: their staff is using AI tools that were not designed for professional confidentiality requirements. The question is not whether this is happening — it almost certainly is — but what the legal consequences are and how to stop them.

Section 03

The Legal Exposure: Attorney-Client Privilege

Attorney-client privilege is one of the oldest and most fundamental protections in American law. It exists to encourage full and frank communication between attorneys and their clients, and it depends on a critical condition: the communication must be made in confidence, without voluntary disclosure to third parties.

The question of whether cloud AI use constitutes voluntary disclosure to a third party — and therefore destroys privilege — was addressed directly by a federal court in early 2026.

United States v. Heppner — S.D.N.Y., Feb. 17, 2026

In this case, a defendant used a cloud-based AI assistant to process materials he had received from his attorney. The government sought to compel disclosure of the AI queries on the grounds that the communications were not privileged. Judge Jed S. Rakoff of the Southern District of New York agreed, ruling that the AI queries were not protected by attorney-client privilege because the data had been transmitted to and processed by a third-party cloud platform.

Critically, Judge Rakoff noted that the outcome might have been different had the AI system been directed by counsel rather than by the client, and had it operated on counsel's own infrastructure — language that directly points to on-premises AI as the architecturally privileged alternative.

1:25-cr-00503(JSR), 2026 WL 436479 (S.D.N.Y. Feb. 17, 2026) · Analysis via New York State Bar Association, March 10, 2026

The Work Product Doctrine

The work product doctrine, which protects materials prepared by or for an attorney in anticipation of litigation, faces a similar challenge. Work product protection can be waived by voluntary disclosure to an adverse party or to a third party in a manner inconsistent with maintaining its confidentiality. Transmitting attorney work product to a cloud AI platform — whose terms of service allow data access by platform employees, automated systems, and potentially government subpoena — is arguably inconsistent with maintaining that confidentiality.

State Bar Ethics Opinions

Multiple state bar associations have issued ethics guidance on cloud AI use. The California State Bar's guidance on technology and competence requires attorneys to understand the material risks of AI tools and to take reasonable measures to prevent inadvertent disclosure of client information.^[6] "Reasonable measures" — in the context of a federal court having already ruled that cloud AI use can void privilege — is a standard that is increasingly difficult to meet with consumer cloud AI products.

The Practical Risk for Law Firms

Consider a concrete scenario: An associate pastes a client's deposition transcript into ChatGPT and asks it to identify key admissions and contradictions. The resulting analysis is useful, but the deposition transcript and the AI's analysis of it now exist on OpenAI's servers. If opposing counsel learns of this during discovery, the privilege over those communications may be destroyed. The client's case may be damaged. The firm may face a malpractice claim.

This concern is not hypothetical. The Heppner ruling addressed a closely related fact pattern, in which attorney-provided materials were processed by a cloud AI platform, and the ruling's implications extend to any law firm whose staff uses cloud AI with client materials.

Section 04

The HIPAA Exposure: Medical Practices

The Health Insurance Portability and Accountability Act of 1996, and its implementing regulations at 45 C.F.R. Parts 160 and 164, create a comprehensive framework for protecting individually identifiable health information. For medical practices using cloud AI tools with patient data, there are two distinct exposure vectors: the absence of a Business Associate Agreement, and the inadequacy of existing BAAs.

What Constitutes a HIPAA Violation

Under the HIPAA Privacy Rule (45 C.F.R. § 164.502), a covered entity — which includes virtually all medical practices — may not disclose Protected Health Information to a third party without patient authorization or a compliant legal basis. Using a cloud AI platform to process PHI without a valid BAA constitutes an impermissible disclosure, which is both a Privacy Rule violation and potentially a Security Rule violation if the platform does not meet HIPAA's technical safeguard requirements.

PHI is broadly defined. It includes not just diagnoses and treatment records but also any information that could reasonably identify a patient — including names, dates of birth, appointment types, and even the combination of a medical condition with a geographic area. A staff member who types "remind John Smith about his follow-up for his lumbar disc herniation on May 15" into ChatGPT has transmitted PHI to a third party without authorization.

The HIPAA Fine Schedule

The Office for Civil Rights (OCR) at the Department of Health and Human Services enforces HIPAA and imposes civil monetary penalties on a tiered basis:

Tier	Standard	Per Violation	Annual Cap
Tier 1	Did not know (and could not have known)	$100 – $50,000	$25,000
Tier 2	Reasonable cause (not willful neglect)	$1,000 – $50,000	$100,000
Tier 3	Willful neglect (corrected within 30 days)	$10,000 – $50,000	$250,000
Tier 4	Willful neglect (not corrected)	$50,000 per violation	$1,500,000

OCR enforcement has increased significantly in recent years. In 2024, OCR reached 22 settlement agreements with covered entities, recovering more than $9.8 million in penalties.^[7] Penalties are assessed per violation, and each patient whose PHI was impermissibly disclosed may constitute a separate violation.

Real-World Exposure Calculation

A small medical practice with 1,200 active patients, one staff member using ChatGPT with PHI for four months, affecting perhaps 80 patients: at Tier 2 minimum ($1,000 per violation), that is $80,000 in potential penalties — for a single staff member, on a single platform, over four months. The practice's HIPAA training policy, or its absence, determines whether OCR views this as Tier 2 or Tier 3.
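The arithmetic behind this estimate can be reproduced with a short script. The sketch below is a simplification under stated assumptions: it uses the tier figures from the schedule above, treats each affected patient as one violation, and applies the annual cap as if every violation fell in a single calendar year. The function name minimum_exposure is illustrative only, and the result is a planning figure, not a prediction of what OCR would assess.

```python
# Tier table copied from the fine schedule above:
# (minimum per violation, maximum per violation, annual cap), in dollars.
TIERS = {
    1: (100, 50_000, 25_000),
    2: (1_000, 50_000, 100_000),
    3: (10_000, 50_000, 250_000),
    4: (50_000, 50_000, 1_500_000),
}


def minimum_exposure(tier: int, violations: int) -> dict:
    """Return the minimum penalty total, before and after the annual cap.

    Assumes one violation per affected patient and a single calendar year.
    """
    per_violation_min, _per_violation_max, annual_cap = TIERS[tier]
    uncapped = per_violation_min * violations
    return {"uncapped": uncapped, "capped": min(uncapped, annual_cap)}


# The scenario from the text: 80 affected patients.
print("Tier 2:", minimum_exposure(tier=2, violations=80))
print("Tier 3:", minimum_exposure(tier=3, violations=80))
```

Running the same 80 violations through Tier 3 shows why the training policy matters: the minimum per-violation figure is ten times higher, and the total is limited only by the annual cap.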

Breach Notification Requirements

Under the HIPAA Breach Notification Rule (45 C.F.R. §§ 164.400–414), covered entities must notify affected individuals, the Secretary of HHS, and in some cases the media, when a breach of unsecured PHI occurs. Individual notification must occur without unreasonable delay and no later than 60 days after discovery. Breaches affecting 500 or more individuals must be reported to HHS on the same timeline and are posted on OCR's public breach portal, informally known as the "Wall of Shame"; smaller breaches may be reported to HHS annually.

Using an unauthorized cloud AI platform with PHI is very likely a breach requiring notification. This means disclosure to patients — which carries its own reputational and operational consequences, independent of the financial penalties.

Section 05

Why "We Have a BAA" Is Not Enough

The most common objection to the compliance concerns raised above is: "We use the enterprise version and we have a BAA." This is a meaningful protection — it is better than nothing — but it is insufficient as a complete answer for several reasons.

What a BAA Actually Covers

A Business Associate Agreement is a contract that requires the AI vendor to handle PHI in accordance with HIPAA requirements. It does not:

Prevent the vendor from being subpoenaed or receiving a government access request for your data
Prevent the vendor's employees from accessing data for platform operations, safety review, or abuse prevention
Guarantee that the vendor's security controls actually meet HIPAA's technical safeguard requirements in practice, only that the vendor contractually agrees to comply
Protect attorney-client privilege — a BAA is a HIPAA instrument, not a privilege instrument
Insulate you from a breach notification obligation if the vendor suffers a data breach

The Privilege Gap

For law firms specifically: even a fully compliant enterprise BAA with an AI vendor does not address the privilege question. Attorney-client privilege is not a contract — it is a legal protection that depends on the confidentiality of the communication. A BAA governs the vendor's data handling obligations. It does not determine whether a court will treat AI-processed communications as privileged. Heppner made this clear: the existence of an enterprise agreement does not change the analysis.

The Minimum Necessary Standard

HIPAA's minimum necessary standard (45 C.F.R. § 164.502(b)) requires that covered entities limit PHI disclosures to the minimum necessary to accomplish the intended purpose. When a staff member pastes an entire patient record or intake form into a cloud AI prompt, this standard is almost certainly violated — even with a BAA in place — because the disclosure is broader than necessary for the AI's output.
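One way to reduce over-disclosure, whatever platform receives the prompt, is to filter a record down to the fields a given task needs before any prompt is built. The following is a minimal sketch of that idea. The task names, field names, and the TASK_FIELDS allowlist are hypothetical, and a filter like this supports the minimum necessary standard without satisfying it on its own.

```python
from typing import Mapping

# Hypothetical allowlist: the record fields each task is permitted to see.
TASK_FIELDS = {
    "appointment_reminder": {"first_name", "appointment_date"},
    "referral_letter": {"first_name", "last_name", "diagnosis", "referring_provider"},
}


def minimum_necessary(record: Mapping[str, str], task: str) -> dict:
    """Return only the fields allowlisted for this task.

    Unknown tasks raise KeyError, so the filter fails closed rather than
    passing the whole record through.
    """
    allowed = TASK_FIELDS[task]
    return {key: value for key, value in record.items() if key in allowed}


record = {
    "first_name": "John",
    "last_name": "Smith",
    "date_of_birth": "1970-01-01",
    "diagnosis": "lumbar disc herniation",
    "appointment_date": "May 15",
}
print(minimum_necessary(record, "appointment_reminder"))
```

An allowlist is used rather than a blocklist so that a newly added record field stays out of prompts until someone decides a task needs it.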

Section 06

What Zero Cloud Actually Means

"Zero cloud" is a term that describes an AI deployment architecture in which the large language model — the core AI engine — runs entirely on hardware the client owns and controls, with no data transmitted to external servers during operation.

The Technical Architecture

A zero cloud AI deployment typically consists of the following components:

Component	Description	Status
Hardware	A workstation or server with sufficient RAM (16GB minimum, 32GB+ recommended) and ideally a GPU with 8GB+ VRAM for acceptable performance on larger models	Required
Model Runtime	Ollama or a similar framework that manages model loading, serves an API on localhost, and handles inference, entirely offline once the model is downloaded	Required
LLM Model	An open-source model such as Meta Llama 3 (8B or 70B), Google Gemma 3, or Mistral; downloaded once, stored locally, never queried remotely during use	Required
Application Layer	A custom interface (web application, desktop app, or API) that provides the user-facing experience and connects to the local model runtime	Required
RAG / Indexing	Retrieval-Augmented Generation: a vector database (such as ChromaDB) that indexes practice documents locally so the AI can answer questions from your own files	Recommended
Voice / STT	Local speech-to-text using OpenAI's open-source Whisper model, running on-premises; enables voice dictation without cloud transcription services	Recommended
Auth / Logging	Access control, user authentication, and audit logging; required for HIPAA compliance, strongly recommended for law firm deployments	Recommended

Once this stack is operational, the system functions entirely without internet connectivity. A query submitted to the AI travels from the user's browser to the local application server, to the local model runtime, and back — all on the practice's or firm's own hardware, within their own network perimeter.
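The hop from the application server to the model runtime can be illustrated with a short sketch. It assumes Ollama is the runtime, that it is serving its HTTP API on its default local port (11434), and that a model tagged llama3 has already been downloaded; the function names are illustrative. The guard that rejects non-loopback URLs is an addition for this example, not a feature of Ollama.

```python
import json
import urllib.request
from urllib.parse import urlparse

# Assumed endpoint: Ollama's generate API on its default local port.
OLLAMA_URL = "http://127.0.0.1:11434/api/generate"
LOOPBACK_HOSTS = ("127.0.0.1", "localhost", "::1")


def build_request(prompt: str, model: str = "llama3",
                  url: str = OLLAMA_URL) -> urllib.request.Request:
    """Build a POST request for the local runtime, refusing remote hosts."""
    host = urlparse(url).hostname
    if host not in LOOPBACK_HOSTS:
        raise ValueError(f"refusing to send prompt to non-local host: {host}")
    body = json.dumps({"model": model, "prompt": prompt, "stream": False})
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str) -> str:
    """Send the prompt to the local runtime and return the generated text."""
    with urllib.request.urlopen(build_request(prompt), timeout=120) as resp:
        return json.loads(resp.read())["response"]
```

With the runtime running, ask("Summarize the attached intake note.") returns the model's text. Nothing in the request path names an external host, which is the property the architecture relies on.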

The Models

Open-source large language models have advanced dramatically. Meta's Llama 3 (released 2024) and its successors, Google's Gemma family, and Mistral's models are capable of performing sophisticated document analysis, summarization, and generation tasks. These are not inferior versions of commercial AI — they are the same class of model, running locally instead of in the cloud. For the specific workflows that law firms and medical practices need — document analysis, note generation, letter drafting — the performance gap between local and cloud models is minimal for most practical purposes.

Section 07

What Zero Cloud Protects Against

Zero cloud architecture addresses the specific risks created by third-party data transmission. When properly implemented, it eliminates:

Third-party data interception: No data travels over the internet during AI operation. There is no transmission to intercept.
Vendor data breaches: If OpenAI, Microsoft, or Google suffers a data breach, your client data is not in their systems and cannot be part of the breach.
Government subpoenas served to vendors: Law enforcement and regulators cannot compel your AI vendor to produce data they do not have. Your data is in your building.
Vendor employee access: Platform employees — whether for safety review, model training, or other operational purposes — cannot access your data because it never leaves your premises.
Model training on your data: Open-source models running locally do not use your queries or documents to train or improve the model. What you input stays input — it does not become training data.
Unauthorized access via vendor terms of service: Cloud AI platforms' terms of service include provisions that allow broad data access for platform purposes. These provisions do not apply to a model running on your hardware.
HIPAA breach via unauthorized vendor disclosure: PHI processed locally is not disclosed to a third party, which is the act that triggers HIPAA breach analysis.
Privilege waiver via third-party transmission: As the Heppner ruling framed it, on-premises AI directed by counsel is architecturally positioned to preserve privilege in a way that cloud AI cannot.

Section 08

What Zero Cloud Does Not Protect Against

Intellectual honesty requires acknowledging the limits of zero cloud architecture. Moving the AI on-premises eliminates the cloud transmission risk — it does not make the overall system secure. The following risks remain and must be addressed separately.

Physical Security

The hardware running the AI model is now the most valuable data asset in the office. If the server is physically stolen, all locally processed data may be compromised. Full disk encryption (BitLocker for Windows, FileVault for macOS) is a baseline requirement. Physical security of the server — locked room, secured rack, access logging — should be treated with the same seriousness as any other protected record.

Local Network Exposure

Ollama binds to 127.0.0.1 by default, but the binding is easy to widen: setting OLLAMA_HOST=0.0.0.0, or publishing the port from a container, makes the runtime listen on all network interfaces, and other runtimes may ship with different defaults. Once the runtime is exposed, any device on the same local network, including guest WiFi, can potentially query the AI without authentication. In a practice or firm with multiple users and shared network infrastructure, this is a meaningful exposure. Keeping the model runtime bound to localhost, implementing network segmentation, and restricting API access to authorized devices are required mitigations.
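A deployment checklist can include an automated check that the runtime's bind address is pinned to loopback. The sketch below inspects an OLLAMA_HOST-style value. The parsing rules are an assumption covering the common forms (a bare host, host:port, or a full URL), and the function deliberately treats an unset value as "not pinned" so that the configuration has to be explicit.

```python
import ipaddress
from typing import Optional


def bind_host(value: str) -> str:
    """Extract the host from '127.0.0.1', '0.0.0.0:11434', 'http://host:port', etc."""
    value = value.strip()
    if "://" in value:
        value = value.split("://", 1)[1]   # drop the scheme
    value = value.split("/", 1)[0]         # drop any path
    if value.startswith("["):              # bracketed IPv6, e.g. [::1]:11434
        return value[1:value.index("]")]
    if value.count(":") == 1:              # host:port
        return value.split(":", 1)[0]
    return value                           # bare host or bare IPv6 address


def is_loopback_only(ollama_host: Optional[str]) -> bool:
    """True only when the bind address is explicitly set to a loopback host."""
    if not ollama_host:
        return False                       # unset: not explicitly pinned
    host = bind_host(ollama_host)
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        return False                       # unrecognized hostname: fail closed
```

In practice the check would be called as is_loopback_only(os.environ.get("OLLAMA_HOST")) on the deployment machine. It complements, and does not replace, a firewall rule or a port scan from another device on the network.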

Access Control and Authentication

A zero cloud deployment without authentication means any user who can reach the application can use the AI without accountability. Role-based access controls — limiting which staff can use which AI capabilities, with individual user authentication — are necessary for HIPAA compliance and good practice governance.
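At its simplest, role-based access control is a lookup performed before a request reaches the model. The following sketch shows the shape of that check. The role names, capability names, and the AccessDenied exception are hypothetical, and a real deployment would take the user's identity and role from its authentication layer rather than from function arguments.

```python
# Hypothetical mapping of staff roles to the AI capabilities they may use.
ROLE_CAPABILITIES = {
    "physician": {"soap_note", "referral_letter", "document_qa"},
    "front_desk": {"appointment_reminder"},
    "paralegal": {"document_qa", "deposition_summary"},
}


class AccessDenied(Exception):
    """Raised when a user's role does not include the requested capability."""


def authorize(user_id: str, role: str, capability: str) -> None:
    """Raise AccessDenied unless the role is allowed to use the capability.

    Unknown roles get an empty capability set, so they are denied by default.
    """
    allowed = ROLE_CAPABILITIES.get(role, set())
    if capability not in allowed:
        raise AccessDenied(f"{user_id} ({role}) may not use {capability}")


authorize("dr_lee", "physician", "soap_note")  # permitted, returns None
```

Calling authorize at the top of every request handler keeps the policy in one place, which also makes it easier to document for a compliance review.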

Audit Logging

HIPAA's Security Rule requires covered entities to implement audit controls (45 C.F.R. § 164.312(b)) — hardware, software, and procedural mechanisms to record and examine activity in systems that contain PHI. A zero cloud AI system without audit logging does not meet this standard. Query metadata — user ID, timestamp, scenario used — should be logged to a secure, tamper-evident log. Query content may or may not be logged depending on the practice's data minimization policy.
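One common way to make a log tamper-evident is to chain entries by hash, so that each entry commits to the one before it. The sketch below shows the idea for the metadata fields named above (user ID, timestamp, scenario). It is an illustration under assumptions, not a prescribed implementation: the log is an in-memory list, the function names are hypothetical, and a real deployment would write to append-only storage and record the latest hash somewhere separate, because a hash chain reveals edits but not removal of the most recent entries.

```python
import hashlib
import json
from datetime import datetime, timezone
from typing import Optional

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(fields: dict) -> str:
    """SHA-256 over a canonical JSON encoding of the entry fields."""
    payload = json.dumps(fields, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_entry(log: list, user_id: str, scenario: str,
                 timestamp: Optional[str] = None) -> dict:
    """Append a metadata-only entry that commits to the previous entry's hash."""
    entry = {
        "user_id": user_id,
        "timestamp": timestamp or datetime.now(timezone.utc).isoformat(),
        "scenario": scenario,
        "prev_hash": log[-1]["hash"] if log else GENESIS,
    }
    entry["hash"] = _digest(entry)  # computed before the "hash" key is added
    log.append(entry)
    return entry


def verify_chain(log: list) -> bool:
    """Recompute every hash and link; False means an entry was altered."""
    prev = GENESIS
    for entry in log:
        fields = {k: v for k, v in entry.items() if k != "hash"}
        if entry["prev_hash"] != prev or _digest(fields) != entry["hash"]:
            return False
        prev = entry["hash"]
    return True
```

Editing any field of an earlier entry changes its hash, which breaks the link stored in the next entry, so verify_chain returns False.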

Endpoint Security

The device running the AI system is a target for malware and unauthorized access. If the deployment machine is compromised by ransomware, a keylogger, or a remote access tool, the zero cloud protection collapses entirely. Standard endpoint security — current OS patches, antivirus/EDR, application allowlisting, no unauthorized software — applies to the deployment machine with particular urgency.

Insider Threats

Zero cloud does not prevent a staff member from copying confidential AI outputs and transmitting them externally. This is a human problem that requires policy, training, and if appropriate, data loss prevention controls at the network or endpoint level.

Model Integrity

Open-source models should be downloaded from official, verified sources (the Ollama model registry, official HuggingFace repositories with verified checksums). Models from unofficial sources may have been tampered with. Verify model checksums at download and document the source for compliance purposes.
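Checksum verification needs only the standard library. The sketch below computes a SHA-256 digest of a downloaded file and compares it with a published value. It assumes the publisher provides a SHA-256 hex digest; the file path and the function names are illustrative.

```python
import hashlib
import hmac


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash the file in 1 MiB chunks so multi-gigabyte models fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path: str, published_hex: str) -> bool:
    """Compare the local digest with the published one, ignoring case and whitespace."""
    return hmac.compare_digest(sha256_of_file(path), published_hex.strip().lower())
```

Recording the file name, the digest, the source, and the date of the check gives the documentation trail described above.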

The Honest Summary

Zero cloud eliminates one significant and well-documented attack vector: third-party data exposure via cloud transmission. It is necessary but not sufficient for a complete compliance posture. A practice or firm that deploys zero cloud AI and ignores physical security, access controls, and audit logging has reduced its risk but has not eliminated it. The goal is a layered security posture in which zero cloud is one important component.

Section 09

The Complete Security Stack

The following represents a baseline security posture for a zero cloud AI deployment in a medical practice or law firm. Requirements are stratified by urgency.

Baseline Requirements (Deploy Before Go-Live)

Full disk encryption on the AI deployment machine — BitLocker (Windows) or FileVault (macOS)
Localhost-only binding for the model runtime — Ollama's OLLAMA_HOST=127.0.0.1 configuration prevents network-wide access
Application-level authentication — at minimum, HTTP basic auth or token-based auth on the AI application layer
BIOS/firmware password and locked boot order to prevent booting an unauthorized OS from external media, which also makes cold boot attacks harder
Current OS and software patches — the deployment machine must be on a patched, supported OS
Written security policy — a documented AI use policy that staff acknowledge, covering what data may be used with the AI and what is prohibited

Compliance Requirements (HIPAA / Legal)

Audit logging — user ID, timestamp, and scenario for each AI session, stored in a tamper-evident log
Role-based access controls — not all staff need access to all AI capabilities; limit access to job function
Backup and recovery — the AI deployment (model, configuration, indexes) should be included in the practice's backup and DR procedures
Incident response plan update — include the AI system in the practice's HIPAA incident response plan; document how a compromise would be detected and responded to
Risk assessment update: HIPAA requires an accurate and thorough risk analysis (45 C.F.R. § 164.308(a)(1)); the rule does not fix an interval, though annual review is common practice, and the addition of any new system, including an AI deployment, requires updating the analysis

Recommended Enhancements

Network segmentation — place the AI server on a dedicated VLAN or subnet, separate from guest and general staff networks
VPN or LAN-only access — if remote access to the AI is needed, require VPN authentication rather than exposing the application to the public internet
Endpoint detection and response (EDR) — commercial EDR on the deployment machine provides early warning of compromise
Physical access logging — log physical access to the room or rack containing the AI server

Section 10

Implementation Considerations

Hardware Requirements

Current open-source models can run on consumer-grade hardware. A workstation with 16GB of RAM can run 7B to 8B parameter models (Llama 3 8B, Gemma 3) at practical speeds for document analysis. A GPU with 8GB of VRAM dramatically improves performance — on an NVIDIA RTX 4070 or equivalent, inference on an 8B model produces results in seconds rather than minutes. For 13B to 32B parameter models, which produce higher quality output on complex tasks, 32GB of RAM and a 12GB+ VRAM GPU are recommended.
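A rough rule of thumb helps explain these figures: the memory needed for a model's weights is approximately the parameter count multiplied by the bytes stored per weight. The sketch below applies that rule. It is an estimate under assumptions: it covers weights only (not the context cache or runtime overhead), and the 4-bit figure assumes a quantized build of the model, which is how these models are commonly distributed for local use.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: int) -> float:
    """Approximate memory for model weights only, in gigabytes (10^9 bytes)."""
    return params_billions * bits_per_weight / 8


# Compare full 16-bit weights with an assumed 4-bit quantized build.
for label, size in [("8B", 8), ("12B", 12), ("32B", 32)]:
    full = weight_memory_gb(size, 16)
    quantized = weight_memory_gb(size, 4)
    print(f"{label}: ~{full:.0f} GB at 16-bit, ~{quantized:.0f} GB at 4-bit")
```

On this estimate an 8B model needs about 16 GB for 16-bit weights but about 4 GB at 4-bit, which is why it can fit on a GPU with 8GB of VRAM with room left for working memory.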

Model Selection

For most law firm and medical practice workflows, the following models are recommended as starting points:

Llama 3 8B — Meta's flagship 8B model. Excellent general performance, fast on modern consumer hardware, strong instruction following. Suitable for document Q&A, summarization, and note generation.
Gemma 3 12B — Google's 12B model. Strong on structured output tasks, well-suited for SOAP note generation, prior authorization letters, and contract analysis where format matters.
Mistral 7B — Strong multilingual performance and efficient inference. A good choice for practices with non-English-speaking patient populations.

Typical Deployment Timeline

Week 1: Hardware assessment, model selection, baseline security configuration, initial deployment and testing
Week 2: Workflow configuration — prompt templates tuned to practice-specific documentation style, staff training, security policy documentation
Week 3–4 (if applicable): RAG indexing of practice documents, custom workflow development, audit logging setup
Ongoing: Model updates (open-source models improve regularly), workflow refinement, security patch management

Cost Considerations

The capital cost of a zero cloud AI deployment is primarily the hardware, if an upgrade is needed. Most practices and firms already own hardware capable of running smaller models. There is no per-seat license, no monthly subscription, and no per-query cost. The economics are substantially different from cloud AI products: higher upfront cost where new hardware is required, zero ongoing variable cost, and a total cost of ownership that typically falls below that of an equivalent enterprise cloud AI subscription within 12 to 18 months.
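The break-even point can be estimated with simple division. In the sketch below every number is hypothetical and chosen only to show the calculation: a $4,500 workstation, 10 seats, and a $30 per-seat monthly subscription. The estimate ignores electricity, maintenance, and implementation costs, which would push the break-even point later.

```python
import math


def break_even_months(hardware_cost: float, seats: int,
                      monthly_fee_per_seat: float) -> int:
    """Months until cumulative subscription fees reach the upfront hardware cost."""
    monthly_cloud_cost = seats * monthly_fee_per_seat
    if monthly_cloud_cost <= 0:
        raise ValueError("subscription cost must be positive")
    return math.ceil(hardware_cost / monthly_cloud_cost)


# Hypothetical inputs, for illustration only.
print(break_even_months(hardware_cost=4500, seats=10, monthly_fee_per_seat=30))
```

With these inputs the subscription costs $300 per month, so the hardware is paid back in 15 months. Substituting a practice's own quotes gives a figure that can be compared with the 12 to 18 month range.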

The Role of Professional Implementation

While the technical components of a zero cloud AI deployment are increasingly accessible, the compliance configuration — audit logging, access controls, risk assessment integration, policy documentation — requires domain expertise. A practice or firm that deploys open-source AI without attending to the compliance layer has reduced cloud exposure but may have introduced new compliance gaps. Professional implementation that addresses both the technical and compliance dimensions is the appropriate standard.

Section 11

Conclusion

The adoption of AI tools in law firms and medical practices is no longer a future-tense phenomenon. It is happening now, largely driven by individual staff members using consumer products that were not designed for professional confidentiality requirements. The gap between current practice and compliant practice is measurable, documented, and in some cases already litigated.

The Heppner ruling is the clearest signal yet that courts are paying attention. A federal judge has ruled that cloud AI use can destroy attorney-client privilege — and has implicitly identified on-premises AI as the architecturally privileged alternative. HIPAA enforcement data shows that OCR is willing to pursue civil monetary penalties against practices of all sizes. The fine schedule is not designed to bankrupt small practices; it is designed to make non-compliance more expensive than compliance.

Zero cloud AI is not a perfect solution, and this paper has been explicit about what it does and does not protect against. But it addresses the most significant and most documented risk: the inadvertent disclosure of confidential data to third parties through cloud transmission. Combined with the physical security, access control, and audit logging measures described in Section 9, it provides a compliance posture that can be defended to a regulator, a client, or a court.

The good news is that the technical barrier to implementation has fallen dramatically. Open-source models running on practice-owned hardware are capable of performing the specific workflows — SOAP note generation, deposition analysis, contract review, prior authorization letters — that create the most immediate productivity value. The investment required is modest compared to the ongoing cost of cloud AI subscriptions, and the compliance return is immediate.

The practices and firms that act now — before a breach, before a discovery dispute, before an OCR investigation — will be better positioned than those that wait for a forcing event.

See It in Action

AI Driven deploys private, zero-cloud AI for medical practices and law firms in Los Angeles and nationally. Schedule a demonstration to see the system running on your documents, in your office, with zero data leaving your building.


Section 12

References and Citations

1. American Bar Association, 2025 Legal Technology Survey Report, ABA Legal Technology Resource Center, 2025.
2. American Medical Association, 2024 Digital Health Survey: Physicians and AI, AMA, 2024.
3. OpenAI, Privacy Policy and Data Usage, openai.com/privacy, accessed May 2026.
4. Microsoft, Microsoft 365 Copilot Data Privacy and Security, microsoft.com/en-us/trust-center, accessed May 2026.
5. Google, Gemini Apps Privacy Notice, support.google.com/gemini, accessed May 2026.
6. State Bar of California, Practical Guidance for the Use of Generative Artificial Intelligence in the Practice of Law, California State Bar, 2023.
7. U.S. Department of Health and Human Services, Office for Civil Rights, HIPAA Enforcement Highlights 2024, hhs.gov/ocr, 2025.
8. United States v. Heppner, 1:25-cr-00503(JSR), 2026 WL 436479 (S.D.N.Y. Feb. 17, 2026).
9. New York State Bar Association, Loose AI Prompts Sink Ships: How Heppner Shook the Legal Community, nysba.org, March 10, 2026.
10. 45 C.F.R. Parts 160 and 164, HIPAA Administrative Simplification Regulations.
11. 45 C.F.R. § 164.312(b), HIPAA Security Rule, Audit Controls.
12. 45 C.F.R. § 164.308(a)(1), HIPAA Security Rule, Risk Analysis and Management.
