Health Checks

The /api/health endpoint, individual service checks, Docker healthcheck integration, and degraded-state behavior.

The /api/health endpoint is the primary liveness and readiness signal for Trovella. It checks three infrastructure services in parallel and returns an aggregate status. Docker uses it as a container healthcheck, and external uptime monitors can poll it.

Endpoint Behavior

The handler in apps/web/src/app/api/health/route.ts runs three checks concurrently:

const [database, redis, typesense] = await Promise.all([
  checkDatabase(), // SELECT 1 via Drizzle
  checkRedis(), // PING via IORedis
  checkTypesense(), // /health via Typesense client
]);

Each check function returns { ok: boolean, latencyMs: number } and never throws -- errors are caught internally, returning ok: false with the time spent waiting.

Response Shape

{
  "status": "healthy",
  "checks": {
    "database": { "ok": true, "latencyMs": 3 },
    "redis": { "ok": true, "latencyMs": 2 },
    "typesense": { "ok": true, "latencyMs": 8 }
  },
  "timestamp": "2026-04-08T14:30:00.000Z"
}

Status Logic

Condition	Status	HTTP Code	Meaning
All three checks pass	`healthy`	200	Everything is operational
One or two checks fail	`degraded`	200	Partial outage, app still functional
All three checks fail	`unhealthy`	503	Total infrastructure failure

A degraded status still returns HTTP 200. This is intentional -- Docker healthchecks and uptime monitors require a 2xx response to consider the container alive. The application can still serve requests with degraded infrastructure (for example, cache misses fall through to the database, or search falls back to pgvector if Typesense is down). Only a total failure returns 503.
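The status table can be expressed as a small pure function. The following is a minimal sketch, not the handler's actual code: the names `CheckResult`, `HealthStatus`, and `summarize` are illustrative assumptions, and only the mapping itself comes from the table above.

```typescript
// Shape returned by each check function, as described above.
interface CheckResult {
  ok: boolean;
  latencyMs: number;
}

type HealthStatus = "healthy" | "degraded" | "unhealthy";

// Hypothetical helper: maps the individual results to the aggregate
// status and HTTP code listed in the Status Logic table.
function summarize(checks: Record<string, CheckResult>): {
  status: HealthStatus;
  httpCode: 200 | 503;
} {
  const results = Object.values(checks);
  const passing = results.filter((c) => c.ok).length;

  // All checks pass: healthy.
  if (passing === results.length) return { status: "healthy", httpCode: 200 };
  // No check passes: the only case that returns 503.
  if (passing === 0) return { status: "unhealthy", httpCode: 503 };
  // Anything in between is a partial outage that still returns 200.
  return { status: "degraded", httpCode: 200 };
}
```

Keeping the mapping in one place like this makes it easy to unit-test the 200 versus 503 boundary without starting any of the three services.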

Logging on Degradation

When any check fails, the endpoint logs a warning containing the full response body:

if (!allHealthy) {
  logger.warn(body, "Health check degraded or unhealthy");
}

This means healthy checks produce no log output -- only degraded or unhealthy states generate log entries, keeping Cloud Logging noise-free during normal operation.

Individual Check Functions

Database (`checkDatabase`)

Defined in packages/db/src/client.ts. Executes SELECT 1 against the PostgreSQL connection pool via Drizzle ORM.

export async function checkDatabase(): Promise<{
  ok: boolean;
  latencyMs: number;
}> {
  const start = performance.now();
  try {
    await db.execute(sql`SELECT 1`);
    return { ok: true, latencyMs: Math.round(performance.now() - start) };
  } catch {
    return { ok: false, latencyMs: Math.round(performance.now() - start) };
  }
}

In production, this goes through the Cloud SQL Auth Proxy sidecar container. A failure here typically means the proxy is down, the Cloud SQL instance is unreachable, or the connection pool is exhausted.

Redis (`checkRedis`)

Defined in packages/cache/src/health.ts. Sends a PING command via IORedis. See Caching -- Health and Observability for full details on Redis health, connection lifecycle, and debugging.
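The source of checkRedis is not reproduced here. As a rough sketch of what a PING-based check in the same style as checkDatabase could look like, the example below takes the client as a parameter. The `Pingable` interface and the name `checkRedisSketch` are assumptions for illustration; the only client call relied on is `ping()`, which IORedis clients provide.

```typescript
// Minimal slice of the client this sketch needs. IORedis clients expose
// ping(), which resolves to "PONG" when the server answers.
interface Pingable {
  ping(): Promise<string>;
}

// Assumed shape, mirroring checkDatabase; the real checkRedis may differ.
async function checkRedisSketch(
  client: Pingable,
): Promise<{ ok: boolean; latencyMs: number }> {
  const start = performance.now();
  try {
    await client.ping();
    return { ok: true, latencyMs: Math.round(performance.now() - start) };
  } catch {
    // Never throw: report the failure together with the time spent waiting.
    return { ok: false, latencyMs: Math.round(performance.now() - start) };
  }
}
```

Accepting the client as an argument is a choice made for this sketch so that it can be exercised with a fake client in tests.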

Typesense (`checkTypesense`)

Defined in packages/search/src/health.ts. Calls the Typesense /health HTTP endpoint via the Typesense client library.

export async function checkTypesense(): Promise<TypesenseHealthCheck> {
  const start = performance.now();
  try {
    const client = getTypesense();
    await client.health.retrieve();
    return { ok: true, latencyMs: Math.round(performance.now() - start) };
  } catch {
    return { ok: false, latencyMs: Math.round(performance.now() - start) };
  }
}

Docker Healthcheck Integration

The production Docker Compose file (infra/docker-compose.prod.yml) configures a healthcheck on the web service that calls the health endpoint:

web:
  healthcheck:
    test:
      [
        "CMD",
        "node",
        "-e",
        "fetch('http://localhost:3000/api/health').then(r => { if (!r.ok) process.exit(1) }).catch(() => process.exit(1))",
      ]
    interval: 10s
    timeout: 5s
    retries: 3
    start_period: 15s

Key details:

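start_period: 15s -- failed checks during the first 15 seconds do not count toward the retry limit, giving the Next.js server time to start. The first check still runs one interval (10 seconds) after the container starts.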
retries: 3 -- the container is only marked unhealthy after three consecutive failures.
The check uses Node's fetch rather than curl or wget because the standalone Next.js image is minimal and does not include those tools.
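The check fails on any non-2xx response, on a connection error, or when the command runs longer than the 5-second timeout. Of the endpoint's own responses, only 503 (all checks failed) is non-2xx; degraded (200) is treated as healthy for container orchestration purposes.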
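To make the pass/fail rule of the inline command explicit, the sketch below reproduces its decision as a pure function. The name `healthcheckExitCode` and the use of `null` to represent a rejected fetch are illustrative assumptions; they do not appear in the Compose file.

```typescript
// Hypothetical helper mirroring the inline `node -e` command.
// `status` is the HTTP status returned by /api/health, or null when
// fetch rejected (for example, connection refused).
function healthcheckExitCode(status: number | null): 0 | 1 {
  if (status === null) return 1; // .catch(() => process.exit(1))
  const ok = status >= 200 && status <= 299; // same rule as Response.ok
  return ok ? 0 : 1; // if (!r.ok) process.exit(1)
}

console.log(healthcheckExitCode(200)); // healthy or degraded: prints 0
console.log(healthcheckExitCode(503)); // unhealthy: prints 1
console.log(healthcheckExitCode(null)); // server not reachable: prints 1
```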

The Caddy reverse proxy depends on this healthcheck via condition: service_healthy, so Compose does not start the Caddy container, and no traffic is routed, until the web container first reports healthy.

Local Development

In local development, Docker Compose (docker-compose.yml) defines individual healthchecks for Postgres and Redis:

postgres:
  healthcheck:
    test: ["CMD-SHELL", "pg_isready -U trovella"]
    interval: 5s

redis:
  healthcheck:
    test: ["CMD", "redis-cli", "ping"]
    interval: 5s

You can verify the health endpoint locally by hitting it directly:

curl -s http://localhost:3000/api/health | jq
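For scripted verification, the same request can be made from Node. The sketch below is one possible approach rather than a tool from the repository: `HealthBody`, `failingChecks`, and `reportHealth` are hypothetical names, and the default URL assumes the dev server listens on port 3000 as in the curl command above.

```typescript
interface CheckResult {
  ok: boolean;
  latencyMs: number;
}

// Response shape as documented in the Response Shape section.
interface HealthBody {
  status: string;
  checks: Record<string, CheckResult>;
  timestamp: string;
}

// Returns the names of the checks that reported ok: false.
function failingChecks(body: HealthBody): string[] {
  return Object.entries(body.checks)
    .filter(([, result]) => !result.ok)
    .map(([name]) => name);
}

// Fetches the endpoint and prints a one-line summary such as
// "200 degraded (failing: redis)".
async function reportHealth(
  url = "http://localhost:3000/api/health",
): Promise<void> {
  const response = await fetch(url);
  const body = (await response.json()) as HealthBody;
  const failing = failingChecks(body);
  const suffix = failing.length > 0 ? ` (failing: ${failing.join(", ")})` : "";
  console.log(`${response.status} ${body.status}${suffix}`);
}
```

Calling `reportHealth()` requires a runtime with a global `fetch`, such as Node 18 or later.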

Common Issues

Symptom	Likely Cause	Resolution
All checks `ok: false`	Dev server not connected to Docker services	Run `pnpm docker:up` to start local containers
Database `ok: false`, others pass	Cloud SQL Proxy down (prod) or Postgres container stopped (local)	Check `docker ps` for the proxy/postgres container
Redis `ok: false` only	Redis container or Upstash connection issue	See Caching -- Health and Observability
Typesense `ok: false` only	Typesense container not running or API key mismatch	Check `TYPESENSE_API_KEY` env var and container status
High `latencyMs` on any check	Network issue or overloaded service	Check Cloud SQL / Upstash status; review connection pool settings