Trovella Wiki

Duplication Detection

jscpd configuration reference -- thresholds, scanned formats, exclusions, and the HTML report artifact.

jscpd (JavaScript Copy/Paste Detector) finds duplicated code blocks across the codebase. It tokenizes source files and identifies sequences of tokens that appear in multiple locations, regardless of variable naming differences. This catches "mutant duplicates": nearly identical code blocks with different variable names, created when AI agents solve the same problem independently in different sessions, which a simple text search would miss.

Running Locally

pnpm lint:duplication

This script runs jscpd against the current directory (jscpd .) and reads its configuration from .jscpd.json at the monorepo root.

Configuration

The full configuration in .jscpd.json:

{
  "threshold": 15,
  "minLines": 5,
  "minTokens": 50,
  "ignore": [
    "**/node_modules/**",
    "**/dist/**",
    "**/.next/**",
    "**/*.d.ts",
    "**/generated/**",
    "**/migrations/**",
    "pnpm-lock.yaml"
  ],
  "format": ["typescript", "typescriptx"],
  "reporters": ["console", "html"],
  "output": ".jscpd-report",
  "gitignore": true
}

Thresholds

Setting   | Value | Purpose
threshold | 15%   | Codebase duplication above this percentage triggers CI failure
minLines  | 5     | Minimum lines for a clone to be flagged
minTokens | 50    | Minimum tokens for a clone to be flagged

The 15% threshold was chosen as a starting point -- strict enough to catch significant duplication but permissive enough to avoid blocking PRs for minor repetition. The 5-line / 50-token minimums avoid false positives on short, naturally repeated patterns like import blocks and simple conditionals.
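The way these three settings interact can be sketched in a few lines of TypeScript. This is an illustration rather than jscpd's internals: the function names are hypothetical, and it assumes that a clone must meet both minimums to be reported and that the overall percentage is duplicated lines divided by total scanned lines.

```typescript
// Values copied from .jscpd.json.
const config = { threshold: 15, minLines: 5, minTokens: 50 };

interface CloneCandidate {
  lines: number;
  tokens: number;
}

// Assumption: a candidate is flagged only when it meets BOTH minimums.
function isReportable(candidate: CloneCandidate): boolean {
  return candidate.lines >= config.minLines && candidate.tokens >= config.minTokens;
}

// Assumption: percentage = duplicated lines / total scanned lines.
function duplicationPercentage(duplicatedLines: number, totalLines: number): number {
  return totalLines === 0 ? 0 : (duplicatedLines * 100) / totalLines;
}

// The check fails only when duplication is strictly above the threshold.
function failsThreshold(duplicatedLines: number, totalLines: number): boolean {
  return duplicationPercentage(duplicatedLines, totalLines) > config.threshold;
}
```

Under these assumptions, a 4-line repeated import block is never reported regardless of its token count, and a codebase with 1,600 duplicated lines out of 10,000 (16%) would fail the check while 1,400 (14%) would pass.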

Scanned Formats

Only TypeScript (typescript) and TSX (typescriptx) files are scanned. JSON, YAML, Markdown, and other formats are excluded because they naturally contain repeated structures that are not actionable.

Exclusions

Pattern        | Why
node_modules/  | Third-party code
dist/          | Build output
.next/         | Next.js build cache
*.d.ts         | Type declaration files (often contain repeated structures)
generated/     | Generated code (duplication is expected)
migrations/    | Database migrations (SQL patterns naturally repeat)
pnpm-lock.yaml | Lockfile

The gitignore: true flag also excludes any path matched by .gitignore.
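To see which paths the ignore globs cover, the following TypeScript sketch converts each pattern into a regular expression and tests a few sample paths. It is a simplified approximation written for this page: globToRegExp and isIgnored are hypothetical helpers, only * and ** are handled, and jscpd's own glob matching may differ in edge cases. The sample paths are made up.

```typescript
// Ignore patterns copied from .jscpd.json.
const ignore: string[] = [
  "**/node_modules/**",
  "**/dist/**",
  "**/.next/**",
  "**/*.d.ts",
  "**/generated/**",
  "**/migrations/**",
  "pnpm-lock.yaml",
];

// Simplified glob translation (assumption: only "*" and "**" need support).
function globToRegExp(glob: string): RegExp {
  let out = "";
  let i = 0;
  while (i < glob.length) {
    if (glob.startsWith("**/", i)) {
      out += "(?:.*/)?"; // zero or more leading directories
      i += 3;
    } else if (glob.startsWith("**", i)) {
      out += ".*"; // anything, including slashes
      i += 2;
    } else if (glob.charAt(i) === "*") {
      out += "[^/]*"; // anything within one path segment
      i += 1;
    } else {
      // Escape regex metacharacters in literal characters.
      out += glob.charAt(i).replace(/[.+?^${}()|[\]\\]/g, "\\$&");
      i += 1;
    }
  }
  return new RegExp(`^${out}$`);
}

const ignoreMatchers: RegExp[] = ignore.map(globToRegExp);

// True when any ignore pattern matches the repo-relative path.
function isIgnored(path: string): boolean {
  return ignoreMatchers.some((re) => re.test(path));
}
```

With this approximation, packages/db/migrations/0001_init.ts and src/types/env.d.ts are skipped, while apps/web/src/page.tsx is scanned.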

Reporters

Two reporters are configured:

  • console -- outputs clone details to stdout for CI logs
  • html -- generates an HTML report in .jscpd-report/ for visual inspection

The HTML report is retained as a CI artifact for 14 days. It provides a visual view of detected clones with side-by-side diffs, making it easy to determine whether flagged duplicates are actual problems or acceptable structural similarity.

Why AI Agents Create Duplicates

AI coding agents take the shortest path to working functionality. When solving a problem, they write new code rather than searching for an existing implementation. Two common patterns:

  1. Session amnesia -- two Claude Code sessions on different branches independently implement the same utility function. Both implementations compile and pass tests. Only one is actually needed.

  2. Incremental feature addition -- an agent copies an existing feature's implementation as a starting point for a new feature, modifying only the parts that differ. The shared boilerplate is now duplicated across two files.

jscpd catches both patterns because it uses token-based matching, which ignores variable name differences.
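The idea behind name-insensitive matching can be illustrated with a toy normalizer that replaces every identifier with a placeholder before comparing token sequences. This is a sketch of the concept only, not jscpd's actual tokenizer; the two snippets, the keyword list, and the normalizeTokens helper are all invented for the example.

```typescript
// Two hypothetical helpers, as might be written in separate agent sessions.
const sessionA = `
function sumPrices(items) {
  let total = 0;
  for (const item of items) { total += item.price; }
  return total;
}`;

const sessionB = `
function addUpCosts(rows) {
  let acc = 0;
  for (const row of rows) { acc += row.price; }
  return acc;
}`;

// Minimal keyword list, just enough for the snippets above.
const KEYWORDS = new Set(["function", "let", "const", "for", "of", "return"]);

// Split into identifiers, numbers, and single punctuation characters,
// then replace every non-keyword identifier with the placeholder "ID".
function normalizeTokens(source: string): string[] {
  const raw: string[] = source.match(/[A-Za-z_$][\w$]*|\d+|[^\s\w]/g) ?? [];
  return raw.map((tok) => (/^[A-Za-z_$]/.test(tok) && !KEYWORDS.has(tok) ? "ID" : tok));
}

const textuallyEqual = sessionA === sessionB;
const structurallyEqual =
  normalizeTokens(sessionA).join(" ") === normalizeTokens(sessionB).join(" ");
```

Here textuallyEqual is false, so a plain text search would not connect the two functions, while structurallyEqual is true because the token sequences are identical once names are normalized.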

CI Integration

jscpd runs as pnpm lint:duplication in the pnpm ci:check pipeline, after Knip (lint:dead-code) and before typecheck. If the overall duplication percentage exceeds 15%, the exit code is non-zero and the PR is blocked.

Responding to Findings

When jscpd flags a duplicate:

  1. Open the HTML report (.jscpd-report/html/index.html) to visually compare the clones
  2. Determine if the duplication is a genuine problem (extractable shared logic) or acceptable structural similarity (similar but semantically different code)
  3. If extractable: create a shared function in the appropriate package (@repo/utils for pure logic, the feature package for domain logic)
  4. If acceptable: no action needed -- the threshold allows reasonable repetition
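As an example of the "extractable" case in step 3, suppose two features each carried their own copy of a batching loop. A minimal sketch of the extracted helper follows; chunk is a hypothetical function, and the two call sites are invented for illustration. In the repo it would be exported from @repo/utils and imported by both features.

```typescript
// Hypothetical shared helper; in the monorepo this would live in @repo/utils.
function chunk<T>(items: readonly T[], size: number): T[][] {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError("size must be a positive integer");
  }
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

// Two former clone sites now call the shared helper instead of
// each keeping its own copy of the loop.
const emailBatches = chunk(["a@x.io", "b@x.io", "c@x.io"], 2);
const idPages = chunk([1, 2, 3, 4, 5], 2);
```

Once both call sites use the shared helper, the clone disappears from the next jscpd run.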

Related

  • Dead Code Detection -- Knip catches the complementary problem (code that exists once but is unused)
