LLM proxy architecture
Deploy AI security through the pattern that fits your infrastructure: reverse proxy, Kubernetes sidecar, client-side agent, or ICAP gateway integration. All patterns share the same inspection and policy engine.
Deployment patterns
Reverse Proxy
Standalone proxy server that applications route AI API calls through. Simple DNS or environment variable change redirects traffic.
Use case: API-first applications, centralized AI access
Pros
- ✓ Simple deployment
- ✓ Provider-agnostic
- ✓ Centralized logging
Cons
- △ Requires DNS/config change
- △ Single point of routing
Sidecar
Co-located container that intercepts AI traffic at the pod level in Kubernetes environments.
Use case: Kubernetes-native, microservice architectures
Pros
- ✓ No application code changes
- ✓ Per-service policies
- ✓ Container-native
Cons
- △ K8s dependency
- △ Per-pod resource overhead
Client Agent
Browser extension, editor plugin, or CLI tool that intercepts AI interactions at the user endpoint.
Use case: Browser AI tools, IDE completions, developer CLIs
Pros
- ✓ Covers web AI tools
- ✓ IDE integration
- ✓ No server changes
Cons
- △ Requires endpoint deployment
- △ Client-side complexity
ICAP Gateway
Integration with existing web proxies (Zscaler, Squid) via ICAP protocol for network-level AI traffic interception.
Use case: Enterprise proxy infrastructure, NW-level control
Pros
- ✓ Leverages existing infra
- ✓ No endpoint agents
- ✓ Policy at network edge
Cons
- △ Requires ICAP-capable proxy
- △ Limited to HTTP traffic
Unified policy across patterns
The key advantage of PromptWall's architecture is that all deployment patterns share the same policy engine. The same policy rules, PII detection thresholds, and injection prevention apply whether traffic arrives from the reverse proxy, sidecar, client agent, or ICAP gateway.
Choose your deployment pattern
Explore how PromptWall fits your infrastructure.
Frequently asked questions
What is an LLM proxy?+
An LLM proxy is a reverse proxy that intercepts AI API traffic between your applications and LLM providers. It inspects prompt content, applies security controls, and forwards clean requests to the provider — similar to how a web proxy inspects HTTP traffic.
Does a proxy add significant latency?+
PromptWall's proxy architecture adds less than 100ms of latency for inspection — negligible compared to LLM inference times of 1-10 seconds. Detection engines run in parallel, and the policy engine evaluates rules in-memory for minimal overhead.
Which deployment pattern should I choose?+
For API-only workloads, use the reverse proxy pattern. For Kubernetes applications, use sidecar injection. For browser and editor protection, use the client-side agent. Most enterprises use a combination — the same policy engine governs all patterns.
