Document leak detection
Employees copy and paste content from internal documents into AI tools every day: strategy decks, financial reports, customer data, and legal memos. AI DLP (data loss prevention) with semantic similarity analysis can detect this leakage even when the content has been paraphrased or partially modified.
Beyond keyword matching
Traditional DLP relies on keyword lists and regex patterns to detect sensitive content. This approach fails for document leakage because employees rarely paste entire documents. Instead, they paraphrase, summarize, or extract key points, so the exact keywords and patterns may never appear in the prompt. PromptWall uses vector embeddings and cosine similarity to detect semantic matches even when the wording differs from the original document.
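Cosine similarity is the standard measure for comparing two embeddings. For a prompt embedding p and a chunk embedding c, both of dimension d (notation introduced here for illustration, not taken from PromptWall's documentation), it is defined as:

```latex
\operatorname{sim}(\mathbf{p}, \mathbf{c})
  = \frac{\mathbf{p} \cdot \mathbf{c}}{\lVert \mathbf{p} \rVert \, \lVert \mathbf{c} \rVert}
  = \frac{\sum_{i=1}^{d} p_i c_i}{\sqrt{\sum_{i=1}^{d} p_i^2} \, \sqrt{\sum_{i=1}^{d} c_i^2}}
```

The score ranges from -1 to 1 and depends only on the angle between the two vectors, not on their lengths. Whether a paraphrase scores highly therefore depends on the embedding model placing it close to the original passage; the formula itself only measures that closeness.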
How it works
- Corpus indexing: Upload confidential documents to the protected corpus. Each document is split into chunks, and each chunk is converted to a vector embedding.
- Real-time comparison: When a prompt is submitted, it is embedded and compared against every chunk in the protected corpus using cosine similarity.
- Threshold evaluation: A similarity score above the configured threshold (default 0.85) triggers the policy rule. The threshold can be adjusted per document category.
- Enforcement: Depending on the policy configuration, the prompt is flagged for review, the matching sections are masked, or the entire prompt is blocked.
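The four steps above can be sketched in Python as follows. This is a minimal, hypothetical illustration, not PromptWall's implementation: `embed`, `chunk_text`, `ProtectedCorpus`, and `evaluate_prompt` are invented names, the 50-word chunk size is an assumption, and `embed` is a hashed bag-of-words stand-in for a real embedding model. Because the stand-in only rewards shared words, it will not catch paraphrases the way a semantic embedding model can. Only the 0.85 default threshold comes from the documentation.

```python
from __future__ import annotations

import hashlib
import math
import re
from dataclasses import dataclass, field

DIM = 256                 # size of the toy embedding vectors (assumption)
DEFAULT_THRESHOLD = 0.85  # documented default threshold


def embed(text: str) -> list[float]:
    """Hypothetical stand-in for an embedding model: a hashed bag of words.

    It only rewards shared words, so unlike a semantic model it will not
    recognize paraphrases.
    """
    vec = [0.0] * DIM
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        digest = hashlib.md5(word.encode("utf-8")).digest()
        vec[int.from_bytes(digest[:4], "big") % DIM] += 1.0
    return vec


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def chunk_text(text: str, size: int = 50) -> list[str]:
    """Split a document into consecutive chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


@dataclass
class ProtectedCorpus:
    # Each entry is (document id, chunk embedding).
    chunks: list[tuple[str, list[float]]] = field(default_factory=list)

    def index(self, doc_id: str, text: str) -> None:
        """Corpus indexing: chunk the document and embed every chunk."""
        for chunk in chunk_text(text):
            self.chunks.append((doc_id, embed(chunk)))

    def best_match(self, prompt: str) -> tuple[str | None, float]:
        """Real-time comparison: score the prompt against every chunk."""
        p = embed(prompt)
        best_doc, best_score = None, 0.0
        for doc_id, vec in self.chunks:
            score = cosine(p, vec)
            if score > best_score:
                best_doc, best_score = doc_id, score
        return best_doc, best_score


def evaluate_prompt(corpus: ProtectedCorpus, prompt: str,
                    threshold: float = DEFAULT_THRESHOLD,
                    action: str = "block") -> dict:
    """Threshold evaluation and enforcement.

    Returns the configured action label ("flag", "mask", or "block") when the
    best score is above the threshold, otherwise "allow". Actually masking or
    blocking the prompt is left out of this sketch.
    """
    doc_id, score = corpus.best_match(prompt)
    if score > threshold:
        return {"action": action, "document": doc_id, "score": score}
    return {"action": "allow", "document": None, "score": score}


corpus = ProtectedCorpus()
corpus.index("roadmap-draft", "Launch the enterprise tier in Q3 with usage based pricing in EMEA")
print(evaluate_prompt(corpus, "Launch the enterprise tier in Q3 with usage based pricing in EMEA"))
print(evaluate_prompt(corpus, "What is the capital of France?"))
```

The sketch scans every chunk linearly, which is enough for illustration; a production similarity index would typically use approximate nearest-neighbor search to keep lookups fast as the corpus grows.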
Protected document categories
- Strategic documents — Product roadmaps, competitive analysis, M&A materials, board presentations
- Financial content — Quarterly earnings drafts, revenue projections, pricing models
- Legal and compliance — Contracts, NDAs, regulatory filings, legal opinions
- Customer data — Customer lists, account details, support tickets, feedback reports
- Technical assets — Architecture documents, proprietary algorithms, internal APIs, security assessments
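Because the similarity threshold is adjustable per document category, a policy table keyed by category is one natural way to model it. In the minimal sketch below, `CategoryPolicy`, `POLICIES`, `FALLBACK`, and `decide` are hypothetical names, and every threshold and action except the 0.85 default is an illustrative assumption rather than a recommended value.

```python
from dataclasses import dataclass

DEFAULT_THRESHOLD = 0.85  # documented default


@dataclass(frozen=True)
class CategoryPolicy:
    threshold: float  # similarity score that must be exceeded to trigger the rule
    action: str       # "flag", "mask", or "block"


# Hypothetical example values, one entry per protected category.
POLICIES = {
    "strategic": CategoryPolicy(0.85, "block"),
    "financial": CategoryPolicy(0.80, "block"),
    "legal": CategoryPolicy(0.85, "flag"),
    "customer_data": CategoryPolicy(0.80, "mask"),
    "technical": CategoryPolicy(0.88, "flag"),
}
# Used for documents whose category has no explicit policy.
FALLBACK = CategoryPolicy(DEFAULT_THRESHOLD, "flag")


def decide(category: str, score: float) -> str:
    """Return the category's action if the score is above its threshold, else "allow"."""
    policy = POLICIES.get(category, FALLBACK)
    return policy.action if score > policy.threshold else "allow"


print(decide("financial", 0.82))  # block: 0.82 is above the 0.80 threshold
print(decide("strategic", 0.82))  # allow: 0.82 is below the 0.85 threshold
```

Lowering a category's threshold makes it more sensitive, catching looser paraphrases at the cost of more false positives; raising it does the reverse.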
Document leak detection complements PII masking: PII masking protects individual entities, while document leak detection protects larger passages of confidential content. Used together, they cover both kinds of data exposure through AI tools.
Multi-tenant isolation
Each tenant maintains a separate, fully isolated protected document corpus. Documents uploaded by one tenant are never compared against prompts from another tenant. This lets managed service providers and enterprise divisions operate independently within the same PromptWall deployment.
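One way to model this isolation is to keep a separate index per tenant and route every lookup through the requesting tenant's ID. The sketch below is hypothetical: `TenantIndexes` and its methods are invented for illustration, and `embed` is a simple word-count stand-in for a real embedding model.

```python
from __future__ import annotations

import math
from collections import Counter, defaultdict


def embed(text: str) -> Counter:
    # Hypothetical stand-in for a real embedding model: a sparse word-count vector.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class TenantIndexes:
    """One protected corpus per tenant; lookups never cross tenant boundaries."""

    def __init__(self) -> None:
        self._corpora: dict[str, list[tuple[str, Counter]]] = defaultdict(list)

    def add_document(self, tenant_id: str, doc_id: str, text: str) -> None:
        self._corpora[tenant_id].append((doc_id, embed(text)))

    def max_similarity(self, tenant_id: str, prompt: str) -> float:
        # Only the requesting tenant's corpus is searched; .get avoids
        # creating an empty corpus for unknown tenants.
        p = embed(prompt)
        corpus = self._corpora.get(tenant_id, [])
        return max((cosine(p, vec) for _, vec in corpus), default=0.0)


indexes = TenantIndexes()
indexes.add_document("tenant-a", "deal-memo", "acquire competitor in q4")
print(indexes.max_similarity("tenant-a", "acquire competitor in q4"))
print(indexes.max_similarity("tenant-b", "acquire competitor in q4"))  # 0.0: tenant-b has no documents
```

Because each lookup only reads the requesting tenant's list, a document indexed under one tenant cannot contribute to another tenant's similarity score.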
Frequently asked questions
How does document leak detection work?
PromptWall converts protected documents into vector embeddings and stores them in a similarity index. When a prompt is submitted, it is also embedded and compared against the protected corpus. If the similarity score exceeds the configured threshold, the action set in the policy is applied: the prompt is flagged for review, the matching sections are masked, or the prompt is blocked.
Does it catch paraphrased content?
Yes. Semantic similarity analysis compares meaning rather than exact text. When an employee paraphrases a confidential document, the embedding of the paraphrase usually stays close to the embedding of the original passage, so the similarity score can still exceed the threshold. This is a key advantage over keyword-based detection.
What types of documents can be protected?
Any text-based document: strategy decks, financial reports, HR policies, legal contracts, product roadmaps, source code files, customer lists, and research papers. Documents are indexed by tenant, ensuring multi-tenant isolation.
