Every time you send a prompt to an AI assistant, your message travels in plaintext through systems that can access it. Even if a provider claims to delete data after processing, your prompt is still visible at runtime — to logging infrastructure, monitoring tools, and the engineers running the platform. You are not interacting in private. You are interacting with a pipeline that is, by design, transparent to its operator.
That is starting to change. A new category of AI infrastructure has emerged in early 2026 — providers that can mathematically prove your prompts are private, not just promise it in a terms of service document. The technology underpinning this shift is called confidential computing, and it is moving faster than most people realize.
What Makes Inference Fundamentally Different
End-to-end encryption has protected messages and files for years. The challenge with AI inference is that a GPU server must be an active endpoint in the conversation — whoever runs that server can, by default, see what is happening inside it. As Moxie Marlinspike, the cryptographer behind Signal, put it when launching his encrypted AI chatbot Confer in December 2025: AI chat interfaces have become some of the largest centralized data lakes in history, accumulating more sensitive data than any prior consumer technology. The same privacy principles he applied to messaging needed to be reinvented from scratch for the inference layer.
The solution is a Trusted Execution Environment, or TEE — a cryptographically sealed enclave built directly into the CPU or GPU. Code and data inside it are isolated from the host operating system, the cloud provider, and the service operator. Before any data is processed, the hardware produces a cryptographic attestation: a signed certificate proving exactly which code is running, and that the environment is genuine and unmodified. This transforms privacy from a policy claim into a verifiable mathematical guarantee.
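The attestation flow can be sketched in a few lines. This is a deliberately simplified stand-in, not a real Intel SGX or NVIDIA attestation protocol: a genuine enclave signs its measurement with a hardware-rooted asymmetric key whose certificate chains back to the silicon vendor. Here an HMAC with a placeholder key stands in for that hardware signature, and a hash of the server binary stands in for the enclave measurement.

```python
import hashlib
import hmac

# Placeholder for the hardware-rooted signing key. In a real TEE this is an
# asymmetric key fused into the chip, with a vendor certificate chain.
HARDWARE_KEY = b"simulated-hardware-root-key"

def make_attestation(enclave_code: bytes) -> dict:
    """Enclave side: measure the loaded code and sign the measurement."""
    measurement = hashlib.sha256(enclave_code).hexdigest()
    signature = hmac.new(HARDWARE_KEY, measurement.encode(), hashlib.sha256).hexdigest()
    return {"measurement": measurement, "signature": signature}

def verify_attestation(report: dict, expected_code: bytes) -> bool:
    """Client side: check the signature is genuine and the measurement matches
    the code we audited, before sending any sensitive data."""
    expected_measurement = hashlib.sha256(expected_code).hexdigest()
    expected_sig = hmac.new(HARDWARE_KEY, report["measurement"].encode(),
                            hashlib.sha256).hexdigest()
    return (hmac.compare_digest(report["signature"], expected_sig)
            and report["measurement"] == expected_measurement)

audited_code = b"inference-server-build-v1.0"
report = make_attestation(audited_code)
assert verify_attestation(report, audited_code)           # genuine, unmodified
assert not verify_attestation(report, b"tampered-build")  # measurement mismatch
```

The key property the sketch preserves: the client refuses to proceed unless the signed measurement matches code it has independently audited, which is what turns "trust us" into "verify us".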
TEE Alone, or TEE Plus E2EE
TEE-only protection secures the compute layer, but data still passes through the provider’s network in standard TLS during transit. Client-side end-to-end encryption closes that remaining gap: prompts are encrypted on the user’s device before transmission — using key exchange protocols such as ECDH — and can only be decrypted inside a verified enclave. The provider’s infrastructure is on the wire, but never sees the plaintext.
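That flow can be sketched with the widely used Python `cryptography` package. X25519 as the ECDH curve and HKDF as the key-derivation step are illustrative choices, not details disclosed by any provider; in a real deployment the enclave's public key would arrive inside its verified attestation report.

```python
import os
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric.x25519 import X25519PrivateKey
from cryptography.hazmat.primitives.ciphers.aead import AESGCM
from cryptography.hazmat.primitives.kdf.hkdf import HKDF

def derive_key(own_private: X25519PrivateKey, peer_public) -> bytes:
    """ECDH shared secret, stretched to a 256-bit AES key via HKDF."""
    shared = own_private.exchange(peer_public)
    return HKDF(algorithm=hashes.SHA256(), length=32, salt=None,
                info=b"e2ee-inference-demo").derive(shared)

# Client and enclave each hold an ECDH key pair; only public keys are exchanged.
client_priv = X25519PrivateKey.generate()
enclave_priv = X25519PrivateKey.generate()

# Client side: encrypt the prompt before it ever leaves the device.
key = derive_key(client_priv, enclave_priv.public_key())
nonce = os.urandom(12)
ciphertext = AESGCM(key).encrypt(nonce, b"my confidential prompt", None)

# Enclave side: derive the same key from its own private half and decrypt.
# Everything in between sees only ciphertext.
enclave_key = derive_key(enclave_priv, client_priv.public_key())
plaintext = AESGCM(enclave_key).decrypt(nonce, ciphertext, None)
assert plaintext == b"my confidential prompt"
```

Because the symmetric key exists only on the client and inside the enclave, the provider's relay infrastructure handles nothing but AES-GCM ciphertext.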
Venice AI released both TEE and full E2EE inference modes in March 2026, operated through enclave partners NEAR AI Cloud and Phala Network. Each response is accompanied by verifiable attestation evidence, and the E2EE implementation uses ECDH key exchange with AES-256-GCM encryption, meaning neither Venice nor its infrastructure partners can access prompts at any stage. Tinfoil, a Y Combinator-backed startup founded by MIT researchers with NVIDIA confidential-computing backgrounds, takes a TEE-only approach but pushes the technical frontier further: it claims multi-GPU confidential computing support and automatic client-side attestation verification via its SDK, and it is collaborating with Red Hat to bring confidential inference to Kubernetes-native infrastructure. For developers, both Venice and Tinfoil offer OpenAI-compatible APIs, making them straightforward drop-in integrations.
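"OpenAI-compatible" means the request shape is identical and only the base URL changes. The sketch below assembles a standard chat-completions request without sending it; the base URL and model name are hypothetical placeholders, not documented endpoints of either provider.

```python
import json

def build_chat_request(base_url: str, api_key: str, model: str, prompt: str) -> dict:
    """Assemble an OpenAI-style chat-completions request. Any
    OpenAI-compatible provider accepts this same payload shape."""
    return {
        "url": f"{base_url.rstrip('/')}/chat/completions",
        "headers": {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        "body": json.dumps({
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
        }),
    }

# Hypothetical endpoint: switching providers is a one-line base-URL change.
req = build_chat_request("https://api.example-confidential.ai/v1",
                         "sk-placeholder", "some-model", "Hello")
assert req["url"].endswith("/v1/chat/completions")
```

In practice this is why existing applications built against the OpenAI SDK can adopt a confidential provider without rewriting their integration code.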
Confer combines TEE inference with client-side encryption via WebAuthn passkeys and the Noise Protocol — the same cryptographic foundation as Signal — and the entire stack is open source. In March 2026, Marlinspike announced that Confer’s technology would be integrated into Meta AI. The parallel to Signal’s own trajectory is striking: the Signal Protocol was later adopted by WhatsApp to encrypt billions of conversations. If the Meta integration proceeds at scale, TEE-backed private inference could shift from a niche technical offering to a default feature for one of the world’s most widely used AI platforms.
Proton Lumo and the Limits of Policy-Based Privacy
Proton Lumo, launched by Swiss privacy company Proton in July 2025, takes a different path. Proton has a well-earned reputation in the privacy space — its mail and VPN products are built on genuine zero-access encryption, and that ethos carries over to Lumo’s stored chat history, which is E2EE on the client. For users who want a privacy-respecting AI assistant under European jurisdiction with no account required, Lumo is a compelling option.
The inference layer, however, is a different story. There is no TEE and no cryptographic attestation. Privacy during actual prompt processing relies on asymmetric encryption in transit to Proton's GPU servers, where prompts must be decrypted to plaintext for inference, a trust-based model rather than a verifiable one. More notably, Proton is opaque about which specific LLMs are powering Lumo at any given time. For a company whose entire brand is built on transparency and verifiability, the unwillingness to disclose which models users are talking to is a meaningful inconsistency. Users extending Proton's well-deserved trust in its email product to Lumo's inference privacy should be aware that the guarantees are not equivalent.
How the Providers Compare
| Provider | TEE | Client E2EE | Attestation | Public API | Open Source | Jurisdiction |
|---|---|---|---|---|---|---|
| Venice AI | ✅ | ✅ | ✅ | ✅ | ⚠️ Partial | 🇺🇸 |
| Tinfoil | ✅ Multi-GPU | ❌ | ✅ Auto SDK | ✅ | ✅ Security code | 🇺🇸 |
| Confer | ✅ | ✅ Passkey/Noise | ✅ | ❌ | ✅ Fully open | 🇺🇸 |
| Proton Lumo | ❌ | ⚠️ History only | ❌ | ❌ | ⚠️ Client only | 🇨🇭 |
| Chutes | ✅ | ❌ | ✅ | ✅ | ⚠️ Partial | Decentralised |
The direction of travel is clear. The era in which AI privacy meant reading a terms of service and deciding whether to trust it is ending. Cryptographically verifiable inference is not a niche development — it is a foundational shift in how the most sensitive layer of the AI stack will be built. The question for the industry is no longer whether this is possible. It is how quickly it becomes expected.
Key Takeaways
- Trusted Execution Environments make AI inference privacy verifiable via cryptographic attestation — not just a policy promise.
- Client-side E2EE goes further, encrypting prompts on your device so they are never visible in transit or at the provider.
- Venice AI and Tinfoil are the main providers with public APIs; Confer has no API, but its technology is being integrated into Meta AI, potentially reaching billions of users.
- Proton Lumo offers genuine value for European (Swiss) jurisdiction privacy and zero-access chat history, but its inference layer is trust-based and its model choices are opaque.
- All providers with strong cryptographic inference guarantees are currently US-based — a gap that matters for European users and regulated industries.
Photo: Nic Wood via Pexels
