charNgrams function exported ✓ 100.0%
Last updated: 2026-02-24T19:46:21.789Z
Location
Metrics
LOC: 18
Complexity: 2
Params: 1
Coverage: 100.0% (6/6 lines, 5x executed)
Signature
charNgrams(text: string): string[]
Summary
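Extracts character n-grams from input text.
Processing steps:
1. Lowercase the text.
2. Remove diacritics (NFD normalization, then strip combining marks U+0300 to U+036F).
3. Replace runs of common separators (_, -, /, .) with a space.
4. Collapse multiple whitespace characters into a single space and trim both ends.
5. Pad the text with boundary markers ("_") to preserve edge context.
Example (N = 3): "Email" → "email" → "_email_" → ["_em", "ema", "mai", "ail", "il_"]
Boundary padding marks n-grams that touch the start or end of the text, which improves discrimination between prefixes and suffixes. Because underscores in the input are converted to spaces in step 3, a "_" in the output can only be a boundary marker.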
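The processing steps can be traced one stage at a time. The sketch below is illustrative only: `normalizeStages` and `NormalizationStages` are hypothetical names that do not exist in the documented module. It applies the same string operations as the function's source and returns every intermediate value.

```typescript
// Hypothetical helper (not part of the documented module): exposes each
// intermediate string produced by the normalization steps.
interface NormalizationStages {
  lowercased: string;
  withoutDiacritics: string;
  separatorsReplaced: string;
  collapsed: string;
  padded: string;
}

function normalizeStages(text: string): NormalizationStages {
  // Step 1: lowercase.
  const lowercased = text.toLowerCase();
  // Step 2: decompose accented characters, then drop the combining marks.
  const withoutDiacritics = lowercased
    .normalize("NFD")
    .replace(/[\u0300-\u036f]/g, "");
  // Step 3: runs of "_", "-", "/" or "." become a single space.
  const separatorsReplaced = withoutDiacritics.replace(/[_\-/.]+/g, " ");
  // Step 4: collapse whitespace runs and trim both ends.
  const collapsed = separatorsReplaced.replace(/\s+/g, " ").trim();
  // Step 5: add the boundary markers.
  const padded = `_${collapsed}_`;
  return { lowercased, withoutDiacritics, separatorsReplaced, collapsed, padded };
}

const stages = normalizeStages("Café_Menu-2.0");
console.log(stages.lowercased);         // "café_menu-2.0"
console.log(stages.withoutDiacritics);  // "cafe_menu-2.0"
console.log(stages.separatorsReplaced); // "cafe menu 2 0"
console.log(stages.padded);             // "_cafe menu 2 0_"
```

Note that the "." in "2.0" is treated as a separator, so version-like tokens are split into individual digits before n-grams are extracted.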
Source Code
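// NGRAM_SIZE is a module-level constant defined outside this excerpt.
export function charNgrams(text: string): string[] {
  const normalized = text
    .toLowerCase()
    .normalize("NFD")                 // decompose accented characters
    .replace(/[\u0300-\u036f]/g, "")  // strip combining diacritical marks
    .replace(/[_\-/.]+/g, " ")        // separators become spaces
    .replace(/\s+/g, " ")             // collapse whitespace runs
    .trim();
  // Boundary markers preserve prefix and suffix context.
  const padded = `_${normalized}_`;
  const result: string[] = [];
  // Slide a window of NGRAM_SIZE characters across the padded text.
  for (let i = 0; i <= padded.length - NGRAM_SIZE; i++) {
    result.push(padded.slice(i, i + NGRAM_SIZE));
  }
  return result;
}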
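The windowing loop yields `padded.length - NGRAM_SIZE + 1` n-grams, or none when the padded text is shorter than the window. The sketch below isolates that loop to show the edge cases. It is an assumption-laden illustration: `slidingNgrams` is a hypothetical name, and the window size is passed as a parameter `n` (set to 3 here to match the summary example) because the actual value of `NGRAM_SIZE` is not shown on this page.

```typescript
// Hypothetical standalone version of the windowing loop; `n` stands in for
// the module-level NGRAM_SIZE constant, whose real value is not shown here.
function slidingNgrams(padded: string, n: number): string[] {
  const grams: string[] = [];
  // The last valid start index is padded.length - n.
  for (let i = 0; i <= padded.length - n; i++) {
    grams.push(padded.slice(i, i + n));
  }
  return grams;
}

console.log(slidingNgrams("_email_", 3)); // ["_em", "ema", "mai", "ail", "il_"]
console.log(slidingNgrams("_a_", 3));     // ["_a_"]
console.log(slidingNgrams("__", 3));      // [] (what empty input looks like after padding)
```

With a window of 3, an empty or separator-only input therefore produces no n-grams at all, so callers may want to handle an empty result.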
No outgoing dependencies.
Impact (Incoming)
| Source | Type |
|---|---|
| buildVocab | calls |
| vectorize | calls |