vectorize function exported ✓ 100.0%
Last updated: 2026-02-24T19:46:21.789Z
Location
Metrics
LOC: 24
Complexity: 6
Params: 4
Coverage: 100.0% (10/10 lines, 3x executed)
Signature
vectorize(
text: string,
vocab: Map<string, number>,
): Float32Array
Summary
Converts text into a dense Float32Array feature vector using a fixed vocabulary.
Steps:
1. Generate character n-grams.
2. Count the term frequency of each n-gram present in vocab.
3. Produce a dense vector of size vocab.size.
4. Apply L2 normalisation.
Output:
- A unit-length vector (||v|| = 1), or an all-zero vector when none of the text's n-grams appear in vocab (normalisation is skipped when the norm is 0).
- Suitable for cosine similarity comparison via dot product, because both operands are already normalised.
Notes:
- This is a TF (term frequency) representation, not TF-IDF.
- The vocabulary must be consistent between training and inference.
- Unknown n-grams are ignored.
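The normalisation and dot-product comparison described above can be illustrated with a minimal sketch. The helpers `l2Normalize` and `dot` are hypothetical names introduced for this example only; they are not part of the documented module, and the input counts are made up.

```typescript
// Hypothetical helper (not part of the documented module): scale raw
// term counts to unit length, leaving an all-zero vector untouched.
function l2Normalize(counts: number[]): Float32Array {
  const v = Float32Array.from(counts);
  let norm = 0;
  for (let i = 0; i < v.length; i++) norm += v[i] * v[i];
  norm = Math.sqrt(norm);
  if (norm > 0) {
    for (let i = 0; i < v.length; i++) v[i] /= norm;
  }
  return v;
}

// Hypothetical helper: dot product of two vectors built from the same vocabulary.
function dot(a: Float32Array, b: Float32Array): number {
  if (a.length !== b.length) {
    throw new Error("vectors must come from the same vocabulary");
  }
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += a[i] * b[i];
  return sum;
}

// Made-up term counts over a three-entry vocabulary.
const a = l2Normalize([3, 4, 0]); // approximately [0.6, 0.8, 0]
const b = l2Normalize([0, 4, 3]); // approximately [0, 0.8, 0.6]

// For unit vectors the dot product equals the cosine similarity.
const similarity = dot(a, b); // approximately 0.64
```

Since both inputs are already unit length, no division by the norms is needed at comparison time. A zero vector yields a dot product of 0 against anything.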
Source Code
export function vectorize(
  text: string,
  vocab: Map<string, number>,
): Float32Array {
  const v = new Float32Array(vocab.size);
  // Term frequency accumulation
  for (const ng of charNgrams(text)) {
    const i = vocab.get(ng);
    if (i !== undefined) v[i] += 1;
  }
  // Compute L2 norm
  let norm = 0;
  for (let i = 0; i < v.length; i++) norm += v[i] * v[i];
  norm = Math.sqrt(norm);
  // Normalise to unit vector
  if (norm > 0) {
    for (let i = 0; i < v.length; i++) v[i] /= norm;
  }
  return v;
}
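A usage sketch may help show how the vocabulary and the output vector relate. The real `charNgrams` is not shown in this document, so the version below is an assumption (lowercased trigrams, no padding). `buildVocab` is likewise a hypothetical helper, and `vectorize` is repeated from the listing above only so the example runs on its own.

```typescript
// ASSUMPTION: stand-in for the real charNgrams, whose n-gram size and
// preprocessing are not documented here. This one emits lowercased trigrams.
function charNgrams(text: string, n = 3): string[] {
  const s = text.toLowerCase();
  const out: string[] = [];
  for (let i = 0; i + n <= s.length; i++) out.push(s.slice(i, i + n));
  return out;
}

// Hypothetical helper: give each distinct n-gram in the corpus a column index.
function buildVocab(corpus: string[]): Map<string, number> {
  const vocab = new Map<string, number>();
  for (const doc of corpus) {
    for (const ng of charNgrams(doc)) {
      if (!vocab.has(ng)) vocab.set(ng, vocab.size);
    }
  }
  return vocab;
}

// Copy of vectorize from the listing above, so the sketch is standalone.
function vectorize(text: string, vocab: Map<string, number>): Float32Array {
  const v = new Float32Array(vocab.size);
  for (const ng of charNgrams(text)) {
    const i = vocab.get(ng);
    if (i !== undefined) v[i] += 1;
  }
  let norm = 0;
  for (let i = 0; i < v.length; i++) norm += v[i] * v[i];
  norm = Math.sqrt(norm);
  if (norm > 0) {
    for (let i = 0; i < v.length; i++) v[i] /= norm;
  }
  return v;
}

// Under the trigram assumption the vocabulary is: hel, ell, llo, elp.
const vocab = buildVocab(["hello", "help"]);

// "hello" hits three vocabulary entries once each, then is normalised.
const known = vectorize("hello", vocab);

// "xyz" shares no n-gram with the vocabulary, so the result stays all zeros.
const unknown = vectorize("xyz", vocab);
```

The same `vocab` map must be reused at inference time. Rebuilding it from different text would change the column indices and make vectors incomparable.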
Dependencies (Outgoing)
| Target | Type |
|---|---|
| charNgrams | calls |
| ng | dynamic_call |
Impact (Incoming)
| Source | Type |
|---|---|
| trainModelFromDataset | calls |
| PretrainedState | uses |
| resetModelMock | uses |