Securing GenAI Systems in Production: Defense-in-Depth Beyond Prompt Injection

Prompt injection has become the headline topic in GenAI security, and rightly so: it captures a fundamental property of LLMs that conventional application security did not have to handle. The attention has produced reasonable defensive practice on that specific vector. Less attention has been paid to the broader production security work that determines whether a GenAI system holds up under real conditions. The systems that operate safely at scale share defense-in-depth practices that go well beyond prompt injection countermeasures.

Treat the LLM as Untrusted Code

The most useful mental model for production GenAI security is to treat the LLM as untrusted code running in your environment. Untrusted code gets sandboxed, has tightly scoped permissions, runs with monitored capabilities, and its outputs are not implicitly trusted by downstream consumers. The same disciplines apply directly to LLMs. The model is not malicious; it is unreliable, in ways that differ from, but are analogous to, the unreliability of a third-party library. Architectural choices flow from this framing.

Data Flows: Knowing What Goes In and What Comes Out

A surprising number of production GenAI incidents trace back to data flows the team did not fully map. Typical examples: customer PII flowing into prompts and then into provider logs; confidential business data ending up in fine-tuning datasets that train shared models; retrieved documents being shown to users without the access control that the source system would have applied. The mapping work is unglamorous. For each model call, it asks what data goes in, what data comes out, where it flows next, and who has access at each step. Implementations that do this mapping rigorously discover surprising data exposure paths; implementations that skip it find them through incidents.
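One way to make the mapping concrete is to record each model call's data classes and destinations as data, then check them mechanically. The sketch below is a minimal illustration under stated assumptions: the classification labels (`pii`, `confidential`, `public`), the sink names, and the `summarize_ticket` call are all hypothetical, and a real inventory would use the organization's own taxonomy.

```python
from dataclasses import dataclass

# Hypothetical data classes treated as sensitive; substitute your own taxonomy.
SENSITIVE = {"pii", "confidential"}


@dataclass(frozen=True)
class Sink:
    """A destination that data from a model call can reach."""
    name: str
    approved_for: frozenset  # data classes this sink is cleared to receive


@dataclass
class ModelCallFlow:
    """One model call: what data classes go in and where they can end up."""
    call_name: str
    input_classes: set
    sinks: list


def find_exposures(flow):
    """Return (sink name, data class) pairs where sensitive data reaches an unapproved sink."""
    exposures = []
    for sink in flow.sinks:
        for data_class in sorted(flow.input_classes & SENSITIVE):
            if data_class not in sink.approved_for:
                exposures.append((sink.name, data_class))
    return exposures


# Hypothetical flow: a ticket summarizer whose prompts are retained in provider logs.
support_flow = ModelCallFlow(
    call_name="summarize_ticket",
    input_classes={"pii", "public"},
    sinks=[
        Sink("provider_logs", frozenset({"public"})),
        Sink("internal_ticket_db", frozenset({"public", "pii"})),
    ],
)

exposures = find_exposures(support_flow)
print(exposures)
```

Keeping the map as reviewable data means a new integration shows up as a diff, which is a natural trigger for revisiting the flow.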

Privilege Separation Around the Model

A model with write access to the file system is more dangerous than one with read-only access. A model with the ability to make API calls to internal systems is more dangerous than one that cannot. A model that operates on behalf of an authenticated user with broad permissions is more dangerous than one whose tool surface is narrowed to the specific permissions the current task requires. Privilege separation around the model is the most effective defense against abused capabilities. Where the worst case is "the model could be tricked into doing X," the answer is usually to remove the ability to do X rather than to harden the model against the trick.
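Per-task scoping can be sketched as an allowlist that sits between the model and its tools. Everything named here is an assumption made for illustration: the tool functions are stubs, and `TASK_ALLOWLIST`, `dispatch`, and the task names are hypothetical rather than part of any particular agent framework.

```python
# Hypothetical tool implementations; stubs stand in for real side effects.
def read_file(path):
    return f"<contents of {path}>"


def delete_file(path):
    return f"deleted {path}"


def call_billing_api(customer_id):
    return f"billing record for {customer_id}"


TOOLS = {
    "read_file": read_file,
    "delete_file": delete_file,
    "call_billing_api": call_billing_api,
}

# Each task gets only the tools it needs; unknown tasks get nothing.
TASK_ALLOWLIST = {
    "summarize_document": {"read_file"},
    "answer_billing_question": {"call_billing_api"},
}


class ToolNotPermitted(PermissionError):
    """Raised when the model requests a tool outside the current task's scope."""


def dispatch(task, tool_name, *args):
    """Run a model-requested tool call only if the current task allows it."""
    allowed = TASK_ALLOWLIST.get(task, set())
    if tool_name not in allowed:
        raise ToolNotPermitted(f"{tool_name!r} is not available for task {task!r}")
    return TOOLS[tool_name](*args)


print(dispatch("summarize_document", "read_file", "notes.txt"))
```

With this shape, a summarization task that is tricked into requesting `delete_file` fails at the dispatcher regardless of what the prompt said, because the capability was never in scope.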

Output Validation: The Step Teams Most Often Skip

Output from an LLM that gets rendered in a browser, executed as code, passed to a downstream API, or stored in a database needs validation appropriate to the destination. Rendering model output as HTML without escaping reintroduces XSS. Executing model-generated SQL without validation reintroduces injection. Passing model output to a downstream service without schema validation produces unpredictable downstream behaviour. These are not novel vulnerabilities; they are familiar ones reintroduced by treating model output as trusted. Validation that conventional applications already do at trust boundaries needs to be applied at the LLM boundary too.
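As a minimal sketch of destination-appropriate validation, the example below escapes model text before it is placed in HTML and checks a structured response against a strict shape before it is passed downstream. The action names and the `action`/`ticket_id` fields are hypothetical; the sketch uses only the Python standard library, and a production system might prefer a dedicated schema validator.

```python
import html
import json

# Hypothetical set of actions the downstream service accepts.
ALLOWED_ACTIONS = {"refund", "escalate", "close"}


def render_safe(model_text):
    """Escape model output so it is treated as text, not markup, in an HTML page."""
    return html.escape(model_text)


def parse_action(model_text):
    """Parse and validate a model-proposed action; raise ValueError on anything unexpected."""
    data = json.loads(model_text)  # malformed JSON raises a ValueError subclass
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    if set(data) != {"action", "ticket_id"}:
        raise ValueError("unexpected or missing fields")
    if data["action"] not in ALLOWED_ACTIONS:
        raise ValueError("action not in allowlist")
    ticket_id = data["ticket_id"]
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(ticket_id, int) or isinstance(ticket_id, bool):
        raise ValueError("ticket_id must be an integer")
    return data


print(render_safe("<script>alert(1)</script>"))
print(parse_action('{"action": "refund", "ticket_id": 42}'))
```

For model-generated SQL the analogous step is to avoid executing free-form statements at all where possible, for example by having the model choose parameters for a fixed, parameterized query.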

A pattern that surfaces in incident reviews: the team built strong defenses against prompt injection on the input side and treated the output as inherently safe because "it is just the model speaking." The system was vulnerable to attacks where a malicious input produces an output that exploits the consuming system. The model was the messenger, not the attacker — but the system was still compromised. Output validation matters as much as input validation.

Telemetry That Detects What Filters Miss

A defensive system that relies entirely on filters will eventually fail because filters can be bypassed. The complement is telemetry — comprehensive logging of model interactions, behavioural baselines for what normal looks like, anomaly detection that flags interactions outside the baseline. Telemetry does not prevent attacks. It detects them, often after they have started but before they have completed, giving the team a chance to respond. GenAI systems without this telemetry are flying blind.
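A behavioural baseline can start very simply. The sketch below flags an interaction whose metric deviates sharply from recent history, using a z-score. The metric (tool calls per interaction), the sample values, and the threshold of 3.0 are assumptions chosen for illustration; real deployments would likely track several signals and tune thresholds against their own traffic.

```python
import statistics


def is_anomalous(baseline, value, threshold=3.0):
    """Return True if value is more than `threshold` standard deviations from the baseline mean."""
    mean = statistics.mean(baseline)
    stdev = statistics.pstdev(baseline)
    if stdev == 0:
        # A perfectly flat baseline: any deviation at all is unusual.
        return value != mean
    return abs(value - mean) / stdev > threshold


# Hypothetical history: tool calls made per interaction during normal operation.
tool_calls_baseline = [2, 3, 2, 4, 3, 2, 3, 3]

print(is_anomalous(tool_calls_baseline, 3))   # a typical interaction
print(is_anomalous(tool_calls_baseline, 25))  # a burst of tool calls worth a closer look
```

A flag like this does not block anything by itself; it gives responders a signal while an attack may still be in progress.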

Operational Practices That Hold Up

Map data flows end-to-end for every model call; revisit when integrations change
Apply least-privilege scoping to model capabilities per task, not per platform
Validate model output before consumption, with validation appropriate to the consuming system
Maintain detailed telemetry of model interactions and detect anomalies against a behavioural baseline
Plan incident response for GenAI-specific scenarios — not all conventional IR playbooks apply
Conduct red-team exercises against the deployed system regularly, not just at launch
Treat the prompt-and-context structure as code and review it through the same process as source code

When the Discipline Pays Back

GenAI security investment pays back most clearly for systems that handle sensitive data, operate at meaningful scale, or take actions with consequences. For low-risk internal tools, the operational overhead of full defense-in-depth may exceed the risk it addresses. For customer-facing systems, regulated environments, or systems with material agency, the investment is what determines whether the system can operate safely under realistic adversarial conditions. The discipline scales with the stakes — and the stakes have been rising consistently as more GenAI systems take on roles that previously belonged to vetted human workflows.