Document filing rules
The harness assumes you'll file documents (PDFs, screenshots, emails) into the vault throughout the year. This chapter is the discipline for doing that without creating compliance or security problems.
The PII rule
The harness has one non-negotiable rule:
Never put PII (personally identifiable information) in a .md file.
PII includes:
- Social Security Numbers (SSNs)
- Bank account numbers
- Routing numbers
- Full credit card numbers
- Passport, driver's license, or government ID numbers
- Health record details (HIPAA-protected)
Where these appear in source documents (PDFs of bank statements, tax returns, payroll stubs), the source PDF stays in the vault as-is. But any .md file you generate from it must replace each of these values with [on file].
Why
.md files get read by AI tools, may sync to backups or cloud services, and may end up in search indices. Even if your vault is encrypted, plain-text .md files are easier to leak than PDFs.
Treat .md as public-adjacent. Treat source PDFs as inert.
This rule has saved me from at least three potential leaks in the last 18 months.
The PII scan hook
The vault template ships with a pre-commit-style hook at _System/hooks/pii-scan.py. Before any AI tool writes a .md file, the hook scans the content for PII patterns:
- SSN format: XXX-XX-XXXX
- Routing numbers: 9-digit sequences in known bank patterns
- Credit card numbers: 13-19 digit sequences passing Luhn
- Account numbers: longer numeric sequences in financial contexts
If the hook detects PII, it blocks the write and tells the user to replace the value with [on file].
Install the hook before your first session.
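The real hook ships with the vault template, and its code isn't reproduced here. As a rough illustration of the four checks listed above, here is a minimal Python sketch. Several pieces are assumptions: the exact regexes, the ABA prefix ranges and checksum standing in for "known bank patterns", the keyword window standing in for "financial contexts", and the calling convention (file paths on the command line, with a nonzero exit to block the write).

```python
import re
import sys
from pathlib import Path

# ASSUMPTION: illustrative patterns, not the ones in _System/hooks/pii-scan.py.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")  # 13-19 digits, optional separators
ROUTING_RE = re.compile(r"\b\d{9}\b")
# ASSUMPTION: "financial context" = an account keyword shortly before 6-17 digits.
ACCOUNT_RE = re.compile(r"(?i)\b(?:accounts?|acct)\b\D{0,20}(\d{6,17})\b")


def luhn_valid(digits: str) -> bool:
    """Luhn checksum: double every second digit counting from the right."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d = d * 2 - 9 if d > 4 else d * 2
        total += d
    return total % 10 == 0


def aba_valid(digits: str) -> bool:
    """ABA routing number: assigned prefix range plus 3-7-1 weighted checksum."""
    prefix = int(digits[:2])
    if not (prefix <= 12 or 21 <= prefix <= 32 or 61 <= prefix <= 72 or prefix == 80):
        return False
    weights = (3, 7, 1) * 3
    return sum(w * int(c) for w, c in zip(weights, digits)) % 10 == 0


def scan(text: str) -> list[tuple[str, str]]:
    """Return (kind, matched text) for every likely PII hit."""
    hits = [("ssn", m.group()) for m in SSN_RE.finditer(text)]
    for m in CARD_RE.finditer(text):
        if luhn_valid(re.sub(r"\D", "", m.group())):
            hits.append(("card", m.group()))
    hits += [("routing", m.group()) for m in ROUTING_RE.finditer(text) if aba_valid(m.group())]
    hits += [("account", m.group(1)) for m in ACCOUNT_RE.finditer(text)]
    return hits


def main(paths: list[str]) -> int:
    blocked = False
    for name in paths:
        if not name.endswith(".md"):
            continue
        for kind, value in scan(Path(name).read_text(encoding="utf-8")):
            blocked = True
            # Log only the last 4 digits so the hook log doesn't become a leak itself.
            tail = re.sub(r"\D", "", value)[-4:]
            print(f"BLOCKED {name}: possible {kind} ending {tail}; replace with [on file]",
                  file=sys.stderr)
    return 1 if blocked else 0


if __name__ == "__main__":
    sys.exit(main(sys.argv[1:]))
```

False positives are the intended trade-off here. Any 9-digit number that happens to pass the routing checksum gets flagged, which costs you one look at the log rather than a leak.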
The .md companion pattern
Every PDF you file in the vault gets a .md companion with the same basename: 2026-04-15_invoice-acme-vendor.pdf gets 2026-04-15_invoice-acme-vendor.md.
The companion contains structured data parsed from the PDF.
The companion answers the questions you'd ask without re-opening the PDF: "what was the ending balance?", "what was that big transaction?"
The companion does NOT include the account number, routing number, or full statement detail. Those stay in the PDF.
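To make the pattern concrete, here is a small Python sketch that derives the companion's path from the PDF and renders parsed fields as Markdown. The field names, the heading layout, and the SENSITIVE_KEYS set are hypothetical. Any sensitive field the parser happens to extract is written as [on file], never as its value.

```python
from pathlib import Path

# HYPOTHETICAL: parsed field names that must never reach a companion.
SENSITIVE_KEYS = {"account_number", "routing_number", "ssn", "card_number"}


def companion_path(pdf: Path) -> Path:
    """Same directory, same basename, .md extension."""
    return pdf.with_suffix(".md")


def render_companion(title: str, fields: dict[str, str], notable: list[str]) -> str:
    """Render parsed statement data as Markdown, masking sensitive fields."""
    lines = [f"# {title}", ""]
    for key, value in fields.items():
        shown = "[on file]" if key in SENSITIVE_KEYS else value
        lines.append(f"- {key.replace('_', ' ').capitalize()}: {shown}")
    if notable:
        lines += ["", "## Notable transactions", ""]
        lines += [f"- {item}" for item in notable]
    return "\n".join(lines) + "\n"
```

For a bank statement you might pass fields such as ending_balance and statement_period. If the parser also extracted account_number, it comes out as "Account number: [on file]".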
Naming conventions
For files inside numbered categories:
Temporal documents (statements, invoices, contracts): YYYY-MM-DD_short-description.ext
Examples:
- 2026-05-20_lease-amendment-208-e-main.pdf
- 2026-04-15_invoice-acme-vendor.pdf
- 2025-12-31_w2-from-msc.pdf
Topic documents (analyses, SOPs, frameworks): topic-name.md
Examples:
- vendor-evaluation-criteria.md
- pricing-tier-rationale.md
- quarterly-close-checklist.md
Lowercase, kebab-case, date-prefixed when relevant.
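You can enforce these conventions with a filename check. The Python sketch below is one strict reading of the rules above. The regexes and the classify_name helper are assumptions and not part of the harness. Read literally, the check also rejects uppercase identifiers such as the INV in the invoice example later in this chapter, so loosen the slug pattern if you keep invoice numbers in their original case.

```python
import re
from datetime import date
from typing import Optional

# Lowercase kebab-case slug: runs of [a-z0-9] joined by single hyphens.
SLUG = r"[a-z0-9]+(?:-[a-z0-9]+)*"
TEMPORAL_RE = re.compile(r"^(\d{4})-(\d{2})-(\d{2})_" + SLUG + r"\.[a-z0-9]+$")
TOPIC_RE = re.compile(r"^" + SLUG + r"\.md$")


def classify_name(name: str) -> Optional[str]:
    """Return "temporal", "topic", or None if the name breaks the convention."""
    m = TEMPORAL_RE.match(name)
    if m:
        try:
            date(*map(int, m.groups()))  # reject impossible dates like 2026-02-30
        except ValueError:
            return None
        return "temporal"
    return "topic" if TOPIC_RE.match(name) else None
```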
Filing by category — examples
A vendor invoice arrives
- Save the PDF: 04-resources/vendors/acme-vendor/invoices/2026-05-20_invoice-INV-2026-001.pdf
- Create a companion, 2026-05-20_invoice-INV-2026-001.md, with:
  - Invoice date, due date, amount
  - Line items
  - Payment method (account references redacted)
  - Notes
- When you pay the invoice, move the PDF to 02-finance/invoices/paid/YYYY/ and update the companion to add the payment date.
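The paid-invoice step is mechanical enough to script. This Python sketch makes three assumptions the text doesn't settle: YYYY means the year of payment, the companion's path is passed in explicitly, and the payment date is appended as a new bullet. The helper names are hypothetical.

```python
import shutil
from datetime import date
from pathlib import Path


def paid_invoice_dest(vault: Path, pdf_name: str, paid_on: date) -> Path:
    # ASSUMPTION: YYYY is the payment year, not the invoice year.
    return vault / "02-finance" / "invoices" / "paid" / str(paid_on.year) / pdf_name


def payment_line(paid_on: date) -> str:
    return f"- Payment date: {paid_on.isoformat()}\n"


def mark_invoice_paid(pdf: Path, companion: Path, vault: Path, paid_on: date) -> Path:
    """Move the PDF into the paid folder and record the payment date in its companion."""
    dest = paid_invoice_dest(vault, pdf.name, paid_on)
    if dest.exists():
        raise FileExistsError(f"refusing to overwrite {dest}")
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(pdf), str(dest))
    # ASSUMPTION: the companion already ends with a newline.
    with companion.open("a", encoding="utf-8") as f:
        f.write(payment_line(paid_on))
    return dest
```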
A new contract gets signed
- Save the PDF: 05-legal/contracts/2026-05-20_acme-msa.pdf
- Create a companion with:
  - Parties
  - Effective date
  - Term and renewal terms
  - Key provisions
  - Termination notice period
  - Auto-renewal flag
If the contract is vendor-specific, you can also drop a pointer to it in 04-resources/vendors/acme/contracts/.
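The termination notice period and the auto-renewal flag are the companion fields most worth computing on. Here is a minimal Python sketch. It assumes the notice period is a count of calendar days counted back from the end of the term. Real contracts may specify business days or other wording, so check the clause itself. The ContractSummary shape is hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class ContractSummary:
    """HYPOTHETICAL structured form of a contract companion."""
    parties: list[str]
    effective: date
    term_end: date
    notice_days: int  # ASSUMPTION: calendar days before term_end
    auto_renews: bool


def notice_deadline(c: ContractSummary) -> Optional[date]:
    """Last day to send termination notice, or None if the contract doesn't auto-renew."""
    if not c.auto_renews:
        return None
    return c.term_end - timedelta(days=c.notice_days)
```

For example, an auto-renewing MSA whose term ends on 2027-05-19 with 60 days' notice gives a deadline of date(2027, 3, 20). That date is worth putting on a calendar the day the companion is written.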
A tax document
- Save the PDF: 02-finance/tax/2025/2026-04-15_2025-form-1040.pdf
- Create a companion, 2026-04-15_2025-form-1040.md, with the parsed data, but with SSNs and bank account numbers replaced with [on file].
- The companion should be enough to answer "what did I owe?" or "what was my AGI?" without re-opening the PDF.
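Before a parsed tax companion is written, the sensitive values can be swapped for [on file] automatically. Here is a minimal Python sketch. The patterns are assumptions: the SSN format, plus a "routing" or "account" label followed by 6 to 17 digits. They will miss identifiers laid out differently on your forms, which is why the PII scan hook still checks the .md before it is written.

```python
import re

# ASSUMPTION: illustrative patterns; check them against your own documents.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
# Keep the label ("Account number: "), replace only the digits.
LABELED_NUMBER_RE = re.compile(
    r"(?i)\b((?:account|routing)(?: number| no\.?)?\s*[:#]?\s*)\d{6,17}\b"
)


def redact(text: str) -> str:
    """Replace SSNs and labeled routing/account numbers with [on file]."""
    text = SSN_RE.sub("[on file]", text)
    return LABELED_NUMBER_RE.sub(lambda m: m.group(1) + "[on file]", text)
```

For example, redact("Routing number: 021000021") returns "Routing number: [on file]". Figures like AGI pass through untouched.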
A screenshot
Screenshots come from your daily work and often don't have an obvious category. Default: drop the screenshot in today's daily-log directory with a descriptive name. File it properly later if it ends up mattering.
If three days later you decide the screenshot was important, move it to the right category. If not, it can stay in daily-log indefinitely, or you can delete it during quarterly cleanup.
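Filing a screenshot into today's daily log is easy to script. This Python sketch assumes that daily logs live at daily-log/YYYY-MM-DD/ under the vault root and that the filename follows the date-prefixed convention. Both the layout and the helper names are hypothetical.

```python
import re
import shutil
from datetime import date
from pathlib import Path
from typing import Optional


def slugify(text: str) -> str:
    """Lowercase kebab-case: runs of any other characters become one hyphen."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def screenshot_dest(vault: Path, description: str, suffix: str, today: date) -> Path:
    # ASSUMPTION: <vault>/daily-log/YYYY-MM-DD/YYYY-MM-DD_description.ext
    day = today.isoformat()
    return vault / "daily-log" / day / f"{day}_{slugify(description)}{suffix.lower()}"


def file_screenshot(src: Path, vault: Path, description: str,
                    today: Optional[date] = None) -> Path:
    """Move a screenshot into today's daily-log directory under a descriptive name."""
    dest = screenshot_dest(vault, description, src.suffix, today or date.today())
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(src), str(dest))
    return dest
```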
What about email?
Emails are the most common source of inbound documents. Two approaches:
Approach A: Save the email itself (as PDF or raw)
Forward important emails to a personal address, or use your email client's "Save as PDF" function. File the PDF in the right category with a .md companion.
Approach B: Just save the parsed companion
If the email isn't worth preserving in original form, write only the companion: a .md with the date, sender, subject, key content. File in the right category.
For most operational emails, Approach B is sufficient. Reserve Approach A for emails you might need as legal evidence (contracts, disputes, complaints).
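For Approach B, Python's standard email package can pull the date, sender, subject, and plain-text body out of a raw message, such as a .eml export. The companion layout, the email- filename prefix, the truncation length, and the email_companion helper are all assumptions. The body is copied as-is, so the PII rule and the scan hook still apply before the .md is written.

```python
import re
from email import policy
from email.parser import BytesParser
from email.utils import parsedate_to_datetime


def email_companion(raw: bytes, max_chars: int = 2000) -> tuple[str, str]:
    """Return (filename, markdown) for a .md companion of a raw RFC 5322 message."""
    msg = BytesParser(policy=policy.default).parsebytes(raw)
    if msg["Date"] is None:
        raise ValueError("message has no Date header")
    day = parsedate_to_datetime(str(msg["Date"])).date().isoformat()
    subject = str(msg["Subject"] or "(no subject)")
    part = msg.get_body(preferencelist=("plain",))
    body = part.get_content().strip() if part is not None else "(no plain-text body)"
    slug = re.sub(r"[^a-z0-9]+", "-", subject.lower()).strip("-")[:60].rstrip("-") or "email"
    markdown = "\n".join([
        f"# {subject}",
        "",
        f"- Date: {day}",
        f"- From: {msg['From']}",
        f"- Subject: {subject}",
        "",
        "## Key content",
        "",
        body[:max_chars],  # ASSUMPTION: truncate long threads; trim by hand
        "",
    ])
    return f"{day}_email-{slug}.md", markdown
```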
When to file vs. when to skip
Not everything deserves filing. A guide:
File:
- Tax documents (statutory retention requirements)
- Contracts and legal agreements
- Insurance policies
- Bank / credit card statements (3 years minimum)
- Employee records (federal retention rules vary)
- Vendor invoices (paid and unpaid)
- Customer payments (refunds, chargebacks)
- Major correspondence (anything you might cite later)
Skip:
- Marketing emails
- Routine confirmations ("order placed," "package shipped")
- One-time meeting notes (unless decisions were made; those go in the daily log)
- Drafts you never sent
When in doubt, drop it in today's daily-log directory and decide later.
Quarterly cleanup
Every quarter, spend an hour cleaning up:
- Move documents that were dropped in daily-log directories into their proper categories
- Delete documents that turned out to be noise
- Audit _loop/collect/ files: are observations still relevant? Did any become measured patterns?
- Review the PII scan hook log: any blocked writes? Why?
Without quarterly cleanup, the vault accumulates clutter that makes it slower to navigate. With cleanup, the vault stays sharp.
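The first cleanup step is easier with a list of what's still sitting in daily-log. This Python sketch makes three assumptions: the same hypothetical daily-log/YYYY-MM-DD/ layout, a 90-day cutoff to match the quarterly cadence, and that .md files in those folders are the log entries themselves, so it leaves them alone.

```python
from datetime import date, timedelta
from pathlib import Path


def is_stale(day_dir_name: str, today: date, min_age_days: int = 90) -> bool:
    """True if a YYYY-MM-DD folder name is at least min_age_days old."""
    try:
        day = date.fromisoformat(day_dir_name)
    except ValueError:
        return False  # not a dated folder
    return day <= today - timedelta(days=min_age_days)


def unfiled_documents(vault: Path, today: date, min_age_days: int = 90) -> list[Path]:
    """Non-.md files left in stale daily-log folders, oldest folder first."""
    root = vault / "daily-log"  # ASSUMPTION: hypothetical layout
    if not root.is_dir():
        return []
    found = []
    for day_dir in sorted(root.iterdir()):
        if day_dir.is_dir() and is_stale(day_dir.name, today, min_age_days):
            found += sorted(p for p in day_dir.iterdir()
                            if p.is_file() and p.suffix != ".md")
    return found
```

Each file it returns either gets moved to its category or deleted. Nothing is removed automatically.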
Retention rules
Different documents have different retention requirements:
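- Tax records: 7 years is the safe default. The IRS's general period is 3 years from filing, but it extends to 6 years if you underreport income by more than 25%, and to 7 years if you claim a loss from worthless securities or a bad-debt deduction
- Employment records: varies. I-9 forms are kept 3 years from hire or 1 year after termination, whichever is later; payroll records are 4 years federal, varies by state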
- Bank statements: 3 years (statutory), 7 years (best practice)
- Contracts: term of contract + 6 years
- Health records: HIPAA permits indefinite retention; some states limit
When you don't know the rule, default to 7 years.
The harness makes retention easy: organize by year (02-finance/tax/2025/), and at year-end you can archive entire year folders to cold storage.
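Turning the retention list into dates is simple arithmetic. Here is a minimal Python sketch that uses the periods listed above, with 7 years as the fallback, plus the I-9 "whichever is later" rule. The category keys and function names are hypothetical. The sketch also assumes the clock starts at whatever date you pass in. For tax returns the IRS counts from the filing date, so pass that date.

```python
from datetime import date
from typing import Optional

# Years to keep, from the retention list (bank uses the 7-year best practice).
RETENTION_YEARS = {"tax": 7, "bank": 7, "payroll": 4}
DEFAULT_YEARS = 7  # "When you don't know the rule, default to 7 years."


def add_years(d: date, years: int) -> date:
    """Same calendar day N years later; Feb 29 becomes Feb 28 in non-leap years."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def keep_until(category: str, start: date) -> date:
    return add_years(start, RETENTION_YEARS.get(category, DEFAULT_YEARS))


def contract_keep_until(term_end: date) -> date:
    return add_years(term_end, 6)  # term of contract + 6 years


def i9_keep_until(hired: date, terminated: Optional[date]) -> Optional[date]:
    """3 years after hire or 1 year after termination, whichever is later."""
    if terminated is None:
        return None  # still employed: keep the form
    return max(add_years(hired, 3), add_years(terminated, 1))
```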
What NOT to put in the vault
Some things should not live in your vault at all:
- Passwords: use a dedicated password manager (1Password, Bitwarden)
- API keys and credentials: use environment variables or a secrets manager, never files
- Cryptocurrency private keys: hardware wallets only
- The PII you redact from .md files: stays in source PDFs only, never duplicated elsewhere
The vault is durable, backed up, and AI-readable. That's the wrong place for secrets.
Next chapter: anti-patterns I learned by stepping in them.