Docs Kit MCP living connection between code & knowledge

charNgrams function exported ✓ 100.0%

Last updated: 2026-02-24T19:46:21.789Z

Location

src/lib/shared/ngram.ts:60-77

Metrics

LOC: 18 Complexity: 2 Params: 1 Coverage: 100.0% (6/6 lines, 5x executed)

Signature

charNgrams(text: string): string[]

Summary

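Extracts character n-grams from the input text.

Processing steps:
1. Lowercase the text.
2. Remove diacritics (NFD normalization, then strip the combining marks).
3. Replace common separators (underscore, hyphen, slash, period) with spaces.
4. Collapse runs of whitespace into a single space and trim the ends.
5. Pad the text with boundary markers ("_") to preserve edge context.

Example (N = 3): "Email" → "email" → "_email_" → ["_em", "ema", "mai", "ail", "il_"]

Boundary padding marks the first and last n-grams of the text, so the same characters at the start, middle, or end yield different n-grams. This improves discrimination between prefixes and suffixes.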
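To make the processing steps concrete, here is a minimal sketch that exposes each intermediate stage. The names traceCharNgrams and NgramTrace are hypothetical and exist only for illustration. NGRAM_SIZE = 3 is an assumption taken from the "N = 3" example; the real constant is defined elsewhere and is not shown on this page.

```typescript
// Assumption: n-gram size of 3, matching the "N = 3" example above.
const NGRAM_SIZE = 3;

// Hypothetical type used only to expose the intermediate stages.
interface NgramTrace {
  lowered: string;
  stripped: string;
  spaced: string;
  padded: string;
  ngrams: string[];
}

// Hypothetical tracing variant of charNgrams; same steps, but every stage is returned.
function traceCharNgrams(text: string): NgramTrace {
  // Step 1: lowercase.
  const lowered = text.toLowerCase();
  // Step 2: decompose accented characters, then drop the combining marks.
  const stripped = lowered.normalize("NFD").replace(/[\u0300-\u036f]/g, "");
  // Steps 3 and 4: separators become spaces, runs of whitespace collapse, ends are trimmed.
  const spaced = stripped.replace(/[_\-/.]+/g, " ").replace(/\s+/g, " ").trim();
  // Step 5: boundary markers on both sides.
  const padded = `_${spaced}_`;

  // Slide a window of NGRAM_SIZE characters across the padded text.
  const ngrams: string[] = [];
  for (let i = 0; i <= padded.length - NGRAM_SIZE; i++) {
    ngrams.push(padded.slice(i, i + NGRAM_SIZE));
  }
  return { lowered, stripped, spaced, padded, ngrams };
}

const email = traceCharNgrams("Email");
console.log(email.padded);                 // _email_
console.log(JSON.stringify(email.ngrams)); // ["_em","ema","mai","ail","il_"]

const menu = traceCharNgrams("Caf\u00e9_Menu");
console.log(menu.spaced);                  // cafe menu
console.log(JSON.stringify(menu.ngrams));
// ["_ca","caf","afe","fe ","e m"," me","men","enu","nu_"]
```

Note that the space left by the replaced separator is kept inside the n-grams, so word boundaries within the text also leave a trace in the output.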

Source Code

export function charNgrams(text: string): string[] {
  const normalized = text
    .toLowerCase()
    .normalize("NFD")
    .replace(/[\u0300-\u036f]/g, "")
    .replace(/[_\-/.]+/g, " ")
    .replace(/\s+/g, " ")
    .trim();

  const padded = `_${normalized}_`;

  const result: string[] = [];
  for (let i = 0; i <= padded.length - NGRAM_SIZE; i++) {
    result.push(padded.slice(i, i + NGRAM_SIZE));
  }

  return result;
}
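
The effect of the padding line and the loop bound can be checked in isolation. The sketch below is illustrative only: slidingWindows and paddedNgrams are hypothetical helpers, not part of ngram.ts, and the window size of 3 is an assumption based on the "N = 3" example. Inputs are assumed to be already normalized.

```typescript
// Assumption: window size of 3, as in the "N = 3" example.
const WINDOW = 3;

// Hypothetical helper: every substring of length WINDOW, left to right.
function slidingWindows(s: string): string[] {
  const out: string[] = [];
  for (let i = 0; i <= s.length - WINDOW; i++) {
    out.push(s.slice(i, i + WINDOW));
  }
  return out;
}

// Hypothetical helper: same windows, but over the boundary-padded text.
function paddedNgrams(word: string): string[] {
  return slidingWindows(`_${word}_`);
}

const artist = paddedNgrams("artist"); // "_artist_"
const start = paddedNgrams("start");   // "_start_"

// Both words contain "art", but only "artist" begins with "ar"...
console.log(artist.includes("_ar"), start.includes("_ar")); // true false
// ...and only "start" ends with "rt".
console.log(artist.includes("rt_"), start.includes("rt_")); // false true

// Edge cases that follow from the loop bound:
console.log(JSON.stringify(paddedNgrams("")));  // []  ("__" is shorter than the window)
console.log(JSON.stringify(paddedNgrams("a"))); // ["_a_"]
```

Without padding, "art" would be the only evidence shared between the query and either word, and its position inside the word would be lost.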

No outgoing dependencies.

Impact (Incoming)

graph LR
  charNgrams["charNgrams"]
  buildVocab["buildVocab"]
  vectorize["vectorize"]
  buildVocab -->|calls| charNgrams
  vectorize -->|calls| charNgrams
  vectorize -->|calls| charNgrams
  style charNgrams fill:#dbeafe,stroke:#2563eb,stroke-width:2px
  click charNgrams "4a9652b5a506b435.html"
  click buildVocab "775e3614ccc16f3a.html"
  click vectorize "fd9c9589294ace14.html"

Source	Type
buildVocab	calls
vectorize	calls
vectorize	calls