How Archevi Protects Your Family's Privacy
When you upload sensitive family documents -- tax returns, insurance policies, legal papers, medical records -- you need to know they're protected. Not just stored, but architecturally protected from exposure.
At Archevi, privacy isn't a toggle you switch on. It's built into the foundation of how we process every query. Here's exactly how it works.
The Problem with Most AI Document Apps
Most AI-powered apps send your actual data to cloud language models. When you ask "what did my insurance policy say about water damage?", the AI receives your real policy text, your real name, your real address -- everything. Even if providers promise not to train on your data, the information still crosses the wire in plain form.
Archevi takes a fundamentally different approach.
Boundary Anonymization: How It Works
We call our approach boundary anonymization. The idea is simple: anonymize data at the boundary between our system and external AI providers. Your real data lives securely in our database. Only when a query needs AI processing do we replace personal information with realistic surrogates.
Here's what happens when you ask a question:
1. You ask: "What did Sarah Thompson say about the mortgage renewal?"
2. Archevi detects entities: Sarah Thompson (PERSON), mortgage (financial term)
3. The AI receives: "What did James Chen say about the mortgage renewal?"
4. The AI processes the query using surrogates, finds the relevant document passages, and generates an answer referencing "James Chen"
5. You see the answer with your real names restored: "Sarah Thompson mentioned the renewal is due in March..."
The AI never knew your real name. It processed a realistic but fake identity, found the right information, and returned a useful answer. We swapped the surrogates back before you saw the result.
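The round trip above can be sketched in a few lines. This is an illustrative sketch only: entity detection is hard-coded here (Archevi's real pipeline uses Presidio), and the class and method names are made up for this example, not Archevi's actual API.

```python
class ConversationVault:
    """Maps real entities to surrogates and back for one conversation."""

    def __init__(self):
        self.real_to_fake = {}
        self.fake_to_real = {}

    def anonymize(self, text, detected):
        # detected: {real_entity: surrogate} pairs from entity detection
        for real, fake in detected.items():
            self.real_to_fake[real] = fake
            self.fake_to_real[fake] = real
            text = text.replace(real, fake)
        return text

    def deanonymize(self, text):
        # Restore real entities in the AI's answer before showing the user
        for fake, real in self.fake_to_real.items():
            text = text.replace(fake, real)
        return text


vault = ConversationVault()
query = "What did Sarah Thompson say about the mortgage renewal?"
safe_query = vault.anonymize(query, {"Sarah Thompson": "James Chen"})
# safe_query: "What did James Chen say about the mortgage renewal?"

# The AI answers using the surrogate identity...
ai_answer = "James Chen mentioned the renewal is due in March."
# ...and the vault restores the real name before display.
final_answer = vault.deanonymize(ai_answer)
# final_answer: "Sarah Thompson mentioned the renewal is due in March."
```

The key property: the text that crosses the boundary (`safe_query`) contains no real names, while the text the user sees (`final_answer`) does.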
If you accidentally include sensitive data like a SIN or credit card number in a question, Archevi's hard redaction layer catches it before it reaches any AI service. The query is blocked entirely, not anonymized.
What Gets Anonymized
Our entity detection system, powered by Microsoft Presidio (an open-source PII detection framework used by enterprises worldwide), automatically detects and replaces:
- Names -- personal and family names become different realistic names
- Email addresses -- replaced with generated surrogate emails
- Phone numbers -- swapped with different numbers
- Locations -- cities and addresses replaced (Toronto becomes Halifax, etc.)
- Organizations -- company names replaced with generated alternatives
Each conversation maintains its own anonymization vault -- a mapping between real entities and their surrogates. This means the same person always maps to the same surrogate within a conversation, so the AI can reason consistently across multiple questions.
Hard Redaction: The Second Layer
Some data is too sensitive even for surrogates. When our system detects highly sensitive information like Social Insurance Numbers, credit card numbers, bank account numbers, or passport numbers, it doesn't anonymize them -- it blocks the query entirely.
This two-layer approach uses:
- Layer 1: Regex pattern matching -- instant detection of structured data formats (SIN patterns, credit card numbers, IBANs)
- Layer 2: Presidio NER analysis -- deep entity recognition for unstructured mentions of sensitive data
If either layer detects highly sensitive data, the query is rejected before it reaches any external service. You'll see a clear message explaining what was detected and why the query was blocked.
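Layer 1 can be sketched with standard-library regexes plus a Luhn checksum to cut down on false positives for card numbers. These patterns are simplified illustrations, not Archevi's production rules, and layer 2 (Presidio analysis) is omitted here.

```python
import re

# Simplified illustrative patterns (not production rules)
SIN_RE = re.compile(r"\b\d{3}[- ]?\d{3}[- ]?\d{3}\b")
CARD_RE = re.compile(r"\b(?:\d[ -]?){13,19}\b")


def luhn_ok(number: str) -> bool:
    """Standard Luhn checksum used to validate card-like digit runs."""
    digits = [int(d) for d in re.sub(r"\D", "", number)]
    checksum = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        checksum += d
    return checksum % 10 == 0


def blocked(query: str) -> bool:
    """Return True if the query must be rejected outright (layer 1 only)."""
    if SIN_RE.search(query):
        return True
    m = CARD_RE.search(query)
    if m and luhn_ok(m.group()):
        return True
    return False


print(blocked("My SIN is 046-454-286"))            # True: SIN-shaped digits
print(blocked("Card 4111 1111 1111 1111 please"))  # True: Luhn-valid card
print(blocked("What does my policy cover?"))       # False: nothing sensitive
```

Note that `blocked` rejects rather than rewrites: unlike names or locations, these values are never sent onward in any form.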
Canadian Data Residency
Your documents are stored on Canadian infrastructure (DigitalOcean, Toronto region) and are subject to Canadian privacy law (PIPEDA). Your files themselves never leave our servers. Only anonymized query text -- containing surrogates, not your real data -- reaches cloud AI providers for processing.
AI Providers We Trust
We use AI providers with contractual commitments not to use customer data for model training. For the full technical comparison, see our AI with guardrails post.
Family Isolation
Every family on Archevi operates in a completely separate tenant with database-enforced row-level security. Your documents, conversations, anonymization vaults, and search history are invisible to other families. There is no query path that crosses tenant boundaries.
What We Don't Do
- We don't sell your data
- We don't use your documents for AI training
- We don't share your data with advertisers
- We don't send real personal information to cloud AI providers
- We don't retain data longer than needed
Privacy-preserving AI isn't just a feature we added. It's the architecture we built. Learn more on our security page, or start a free trial to see it in action.
For a deeper technical dive into how we run LLMs without data exposure, see our post on AI with guardrails.
For a comparison of how AI providers handle training data, read why your family AI won't train on your data.
Related Posts
Why Your Family Document AI Won't Use Your Data for Training
The #1 concern with AI tools: will my data train the model? At Archevi, the answer is no -- not by policy alone, but by architecture. Three independent layers ensure your family data never trains any AI.
Archevi vs. Google Drive: Why Families Need More Than Storage
Google Drive is great for storing files, but managing a family's important documents requires more than just storage. Compare AI search, privacy, expiry tracking, and family features side by side.
Why We Self-Host Everything on One Server
Most startups spread their stack across a dozen SaaS platforms. We put everything -- website, CMS, database, analytics, and AI pipeline -- on a single server. Here's why, and what it actually costs us in ways that aren't just money.