ADR-003: Shared Schema + Row-Level Security

Decision record for choosing shared schema with PostgreSQL RLS over schema-per-tenant, database-per-tenant, and application-layer filtering.

Status: Accepted Date: 2026-03-21 (Week 0 Decision Sprint), modified 2026-03-26 (TRO-8 implementation) Deciders: Kyle Olson (Solo Founder)

Decision

Use shared schema with PostgreSQL Row-Level Security as the primary tenant isolation mechanism. All tenants share one database, one schema. Every tenant-scoped table has RLS policies enforced at the database level, making cross-tenant data access structurally impossible even if application code is bypassed. A dedicated database-per-tenant upgrade path is designed but deferred until enterprise customers justify the cost.

Tenancy model: shared schema + RLS (all tenants in one database)
Three-tier accounts: Personal, Family, Company -- implemented as Better Auth organizations with a type field
Isolation mechanism: PostgreSQL RLS policies on every tenant-scoped table, enforced via transaction-scoped SET LOCAL variables
Defense-in-depth: Three layers -- session context, CASL authorization, database enforcement

Context

Multi-tenant SaaS where a cross-tenant data leak is a company-killing event. The founder's explicit framing: tenant isolation must be enforced at the database level, not just trusted to application code. AI agents writing code take the shortest path to working functionality and will bypass application-layer checks if nothing stops them -- the database must be the final safety net.

Constraints:

RLS is non-negotiable -- retrofitting RLS onto an existing schema is exponentially more expensive than building it from day one
Budget: $27/month for Cloud SQL at launch; ~$0.27 per tenant at 100 tenants
ORM support: Drizzle is the only TypeScript ORM with native RLS policy support
Scalability: must handle 10K+ tenants on shared infrastructure before any dedicated instances are needed

Alternatives Considered

Schema-per-Tenant

Separate PostgreSQL schemas per tenant within one database.

Pros: stronger logical isolation than shared schema, per-tenant schema customization, database-level access control via schema permissions
Cons: migration complexity scales linearly with tenant count (500 tenants = 500 DDL operations). PostgreSQL catalog bloat limits practical scale to ~500-2,000 tenants. No physical isolation despite separate schemas. No native Drizzle ORM support for runtime schema switching.
Why rejected: combines the operational complexity of database-per-tenant with the isolation limitations of shared infrastructure. Catalog bloat at scale and linear migration cost make it untenable for a solo developer.

Database-per-Tenant

Dedicated Cloud SQL instance per tenant.

Pros: maximum physical isolation, per-tenant performance tuning and backup/restore, meets strictest compliance requirements (HIPAA, FedRAMP)
Cons: minimum $27/month per tenant. At 1,000 tenants = $27,000/month vs $300/month shared. Each instance requires provisioning (5-10 min), monitoring, patching, migration orchestration.
Why rejected: cost-prohibitive as the default tier. Preserved as the enterprise upgrade path -- when a customer's revenue justifies dedicated infrastructure ($500+/month contract), a dedicated instance can be provisioned via Terraform and the tenant routing layer updated with one row change.

Application-Layer Filtering Only (No RLS)

Rely on WHERE tenant_id = ? in every query.

Pros: simplest to implement, no PostgreSQL-specific features needed, works with any ORM
Cons: every query must include the filter. A single forgotten filter -- by a human or AI agent -- leaks all tenant data. No database-level safety net.
Why rejected: one missed filter in one query is a company-killing event. AI agents generating code are especially likely to miss filter clauses in complex joins or subqueries. The database must enforce isolation independently of application code.

Key Implementation Decisions

`tenantPolicies()` generates four CRUD policies per table

Every tenant-scoped table calls tenantPolicies(tableName), which generates SELECT, INSERT, UPDATE, and DELETE policies using (SELECT current_setting('app.tenant_id')). The SELECT subquery wrapper enables PostgreSQL initPlan caching (avoids re-evaluating per row). Writing individual policies per table would be error-prone and inconsistent.

For the helper implementation, see Data & Storage -- Tenant Scoping.

`withTenantContext()` uses transaction-scoped `SET LOCAL`

All tenant-scoped database access goes through withTenantContext(tenantId, userId, fn), which opens a transaction, calls set_config('app.tenant_id', tenantId, true) (the true = transaction-local), then runs the callback. Transaction-local scope means the tenant context is automatically cleared on COMMIT/ROLLBACK and never leaks between pooled connections.

Connection-level SET was rejected because connection pooling would leak tenant context between requests.

For the runtime mechanics, see Data & Storage -- Tenant Context.

Tenant column is `organization_id` (TEXT), not `tenant_id`

Better Auth's organization plugin uses organization_id as its column name with TEXT type. Rather than maintaining a mapping layer, all tenant-scoped tables use organization_id directly. The tenantPolicies() helper defaults to "tenant_id" but every call in the codebase passes "organization_id" explicitly.

Default-deny when tenant context is missing

When app.tenant_id is not set, current_setting() returns an empty string. The comparison organization_id = '' is always FALSE, so zero rows are returned. A missing withTenantContext() call returns empty results rather than all tenants' data -- fail-safe, not fail-open.

Non-tenant tables explicitly skip RLS

Eight tables intentionally have no RLS: user, session, account, verification, organization, ai_model, ai_model_pricing, personal_access_token. These are either global reference data or user-scoped (not tenant-scoped). The list is maintained and verified by architecture tests.

The TRO-10 bug: a company-ending one-character fix

During TRO-10, a code review discovered that tenantProcedure was passing the bare db pool to ctx.db instead of the transaction-scoped tx from withTenantContext(). The code compiled, the app functioned, and single-tenant tests passed -- but downstream queries had no RLS context and would return all tenants' data. The fix was a one-character change (db to tx), but the incident drove all subsequent enforcement work. See Enforcement for the full enforcement evolution.

Consequences

Positive

Database-enforced isolation -- even if application code is completely bypassed, the database rejects cross-tenant queries
Near-zero marginal cost -- adding a tenant is an organization row insert, not infrastructure provisioning
Single migration path -- schema changes apply to all tenants in one operation
Fail-safe default -- missing tenant context returns zero rows, not all rows
Three-layer defense -- session, CASL, and RLS are each independently sufficient for tenant isolation

Negative

2-5% RLS overhead -- every query passes through policy evaluation. Acceptable at current scale; requires organization_id as leading index column for optimization
Logical isolation only -- shared infrastructure means a database-level vulnerability or DBA compromise could expose all tenants. Mitigated by the hybrid upgrade path for enterprise customers.
Per-tenant backup/restore requires row-level extraction -- cannot use Cloud SQL PITR for a single tenant
Noisy neighbor risk -- one tenant's heavy workload affects all tenants. Mitigated by Cloud SQL resource limits.

Risks

RLS bypass through new code paths -- every new database access point must go through withTenantContext(). Mitigated by ESLint import restrictions, the /client subpath pattern, CI architecture tests, and the AI review checklist.
Enterprise isolation demand before hybrid path is built -- if an enterprise customer requires physical isolation before the routing layer exists, it would need to be fast-tracked. Mitigated by the simple routing design (one row in tenant_registry + separate connection string).
RLS performance at scale -- if table sizes exceed millions of rows per tenant with complex policies, query performance may degrade. Mitigated by proper indexing (organization_id as leading column) and the planned Enterprise Plus upgrade.

References

Full ADR-003 text
Related: ADR-001 (Database -- Drizzle ORM), ADR-002 (ID strategy -- TEXT IDs), ADR-004 (Authorization -- CASL)
Linear: TRO-8 (Database + RLS), TRO-10 (CASL + RLS bug fix), TRO-88 (Package boundary enforcement), TRO-94 (CLAUDE.md rules audit)