How Archevi Protects Your Family's Privacy
When you upload sensitive family documents -- tax returns, insurance policies, legal papers, medical records -- you need to know they're protected. Not just stored, but architecturally protected from exposure.
At Archevi, privacy isn't a toggle you switch on. It's built into the foundation of how we process every query. Here's exactly how it works.
The Problem with Most AI Document Apps
Most AI-powered apps send your actual data to cloud language models. When you ask "what did my insurance policy say about water damage?", the AI receives your real policy text, your real name, your real address -- everything. Even if providers promise not to train on your data, the information still crosses the wire in plain form.
Archevi takes a fundamentally different approach.
Boundary Anonymization: How It Works
We call our approach boundary anonymization. The idea is simple: anonymize data at the boundary between our system and external AI providers. Your real data lives securely in our database. Only when a query needs AI processing do we replace personal information with realistic surrogates.
Here's what happens when you ask a question:
1. You ask: "What did Sarah Thompson say about the mortgage renewal?"
2. Archevi detects entities: Sarah Thompson (PERSON), mortgage (financial term)
3. The AI receives: "What did James Chen say about the mortgage renewal?"
4. The AI processes the query using surrogates, finds the relevant document passages, and generates an answer referencing "James Chen"
5. You see the answer with your real names restored: "Sarah Thompson mentioned the renewal is due in March..."
The AI never knew your real name. It processed a realistic but fake identity, found the right information, and returned a useful answer. We swapped the surrogates back before you saw the result.
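The round trip above can be sketched in a few lines. This is an illustrative sketch only: entity detection is hard-coded here (Archevi's real pipeline uses Presidio), and the class and method names are made up for this example, not Archevi's actual API.

```python
class ConversationVault:
    """Maps real entities to surrogates and back for one conversation."""

    def __init__(self):
        self.real_to_fake = {}
        self.fake_to_real = {}

    def anonymize(self, text, detected):
        # detected: {real_entity: surrogate} pairs from entity detection
        for real, fake in detected.items():
            self.real_to_fake[real] = fake
            self.fake_to_real[fake] = real
            text = text.replace(real, fake)
        return text

    def deanonymize(self, text):
        # Restore real entities in the AI's answer before showing the user
        for fake, real in self.fake_to_real.items():
            text = text.replace(fake, real)
        return text


vault = ConversationVault()
query = "What did Sarah Thompson say about the mortgage renewal?"
safe_query = vault.anonymize(query, {"Sarah Thompson": "James Chen"})
# safe_query: "What did James Chen say about the mortgage renewal?"

# The AI answers using the surrogate identity...
ai_answer = "James Chen mentioned the renewal is due in March."
# ...and the vault restores the real name before display.
final_answer = vault.deanonymize(ai_answer)
# final_answer: "Sarah Thompson mentioned the renewal is due in March."
```

The key property: the text that crosses the boundary (`safe_query`) contains no real names, while the text the user sees (`final_answer`) does.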
If you accidentally include sensitive data like a SIN or credit card number in a question, Archevi's hard redaction layer catches it before it reaches any AI service. The query is blocked entirely, not anonymized.
What Gets Anonymized
Our entity detection system, powered by Microsoft Presidio (an open-source PII detection framework used by enterprises worldwide), automatically detects and replaces:
- Names -- personal and family names become different realistic names
- Email addresses -- replaced with generated surrogate emails
- Phone numbers -- swapped with different numbers
- Locations -- cities and addresses replaced (Toronto becomes Halifax, etc.)
- Organizations -- company names replaced with generated alternatives
Each conversation maintains its own anonymization vault -- a mapping between real entities and their surrogates. This means the same person always maps to the same surrogate within a conversation, so the AI can reason consistently across multiple questions.
Hard Redaction: The Second Layer
Some data is too sensitive even for surrogates. When our system detects highly sensitive information like Social Insurance Numbers, credit card numbers, bank account numbers, or passport numbers, it doesn't anonymize them -- it blocks the query entirely.
This two-layer approach uses:
- Layer 1: Regex pattern matching -- instant detection of structured data formats (SIN patterns, credit card numbers, IBANs)
- Layer 2: Presidio NER analysis -- deep entity recognition for unstructured mentions of sensitive data
If either layer detects highly sensitive data, the query is rejected before it reaches any external service. You'll see a clear message explaining what was detected and why the query was blocked.
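Layer 1 can be sketched with standard-library regexes plus a Luhn checksum to cut down on false positives for card numbers. These patterns are simplified illustrations, not Archevi's production rules, and layer 2 (Presidio analysis) is omitted here.

```python
import re

# Simplified illustrative patterns (not production rules)
SIN_RE = re.compile(r"\b\d{3}[- ]?\d{3}[- ]?\d{3}\b")
CARD_RE = re.compile(r"\b(?:\d[ -]?){13,19}\b")


def luhn_ok(number: str) -> bool:
    """Standard Luhn checksum used to validate card-like digit runs."""
    digits = [int(d) for d in re.sub(r"\D", "", number)]
    checksum = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        checksum += d
    return checksum % 10 == 0


def blocked(query: str) -> bool:
    """Return True if the query must be rejected outright (layer 1 only)."""
    if SIN_RE.search(query):
        return True
    m = CARD_RE.search(query)
    if m and luhn_ok(m.group()):
        return True
    return False


print(blocked("My SIN is 046-454-286"))            # True: SIN-shaped digits
print(blocked("Card 4111 1111 1111 1111 please"))  # True: Luhn-valid card
print(blocked("What does my policy cover?"))       # False: nothing sensitive
```

Note that `blocked` rejects rather than rewrites: unlike names or locations, these values are never sent onward in any form.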
Canadian Data Residency
Your documents are stored on Canadian infrastructure (DigitalOcean, Toronto region) and are subject to Canadian privacy law (PIPEDA). Your files themselves never leave our servers. Only anonymized query text -- containing surrogates, not your real data -- reaches cloud AI providers for processing.
AI Providers We Trust
We use AI providers with contractual commitments not to use customer data for model training. For the full technical comparison, see our AI with guardrails post.
Family Isolation
Every family on Archevi operates in a completely separate tenant with database-enforced row-level security. Your documents, conversations, anonymization vaults, and search history are invisible to other families. There is no query path that crosses tenant boundaries.
What We Don't Do
- We don't sell your data
- We don't use your documents for AI training
- We don't share your data with advertisers
- We don't send real personal information to cloud AI providers
- We don't retain data longer than needed
Privacy-preserving AI isn't just a feature we added. It's the architecture we built. Learn more on our security page, or start a free trial to see it in action.
For a deeper technical dive into how we run LLMs without data exposure, see our post on AI with guardrails.
For a comparison of how AI providers handle training data, read why your family AI won't train on your data.
Related Posts
Why Your Family Document AI Won't Use Your Data for Training
The #1 concern with AI tools: will my data train the model? At Archevi, the answer is no -- not by policy alone, but by architecture. Three independent layers ensure your family data never trains any AI.
Archevi vs. Google Drive: Why Families Need More Than Storage
Google Drive is great for storing files, but managing a family's important documents requires more than just storage. Compare AI search, privacy, expiry tracking, and family features side by side.
Why We Self-Host Everything on One Server
Most startups spread their stack across a dozen SaaS platforms. We put everything -- website, CMS, database, analytics, and AI pipeline -- on a single server. Here's why, and what it actually costs us in ways that aren't just money.