Attack patterns
Prompt injection attack patterns a prompt firewall should catch.
Prompt injection is a family of attacks, not one string to block. A serious prompt firewall needs to evaluate intent, context, data movement, and follow-on actions before the request reaches an LLM.
Control
Policy first
Map every AI interaction to allow, flag, mask, or block decisions.
Evidence
Audit ready
Keep explainable records for security, risk, and compliance reviews.
Patterns
Attackers use language, context, and workflow gaps to bypass model intent.
Enterprise teams should test direct override prompts, indirect instructions hidden in retrieved content, jailbreak role-play, encoded payloads, multi-turn escalation, tool-use manipulation, data exfiltration requests, and prompt leakage attempts. Some look malicious immediately; others only become risky when combined with user role, retrieval context, or provider route.
Direct
Direct instruction override
User input asks the model to ignore system or developer instructions.
Read more
Indirect
Indirect retrieved-context attack
A document, web page, or knowledge base entry carries instructions into the model context.
Read more
Data
Sensitive data exfiltration
The prompt tries to extract credentials, customer records, or system prompts from the workflow.
Read more
Defense
Detection should lead to action, not just a risk score.
PromptWall maps prompt injection signals to policy outcomes. Low-confidence patterns can be flagged, clear attacks can be blocked, and sensitive content can be masked before any provider dispatch. That preserves productivity while preventing false confidence.
Test PromptWall against prompt injection patterns
Walk through direct, indirect, and data-focused attack examples with the PromptWall team.
Frequently asked questions
Are prompt injection attacks always obvious?+
No. Many attacks are indirect, encoded, multi-step, or hidden inside retrieved context. That is why prompt-layer inspection needs context and policy.
Should every suspicious prompt be blocked?+
Not always. Enterprise controls should support flagging, masking, and blocking based on confidence, data sensitivity, and business context.
