vectorize function exported ✓ 100.0%

Last updated: 2026-02-24T19:46:21.789Z

Location

Metrics

LOC: 24 Complexity: 6 Params: 4 Coverage: 100.0% (10/10 lines, 3x executed)

Signature

vectorize(text: string, vocab: Map<string, number>): Float32Array

Summary

Converts text into a dense Float32Array feature vector using a fixed vocabulary.

Steps:
1. Generate character n-grams from the text.
2. Count the term frequency of each n-gram present in vocab.
3. Produce a dense vector of size vocab.size.
4. Apply L2 normalisation.

Output:
- A unit-length vector (||v|| = 1), or an all-zero vector when no n-gram of the text is in vocab (the norm is then 0, so normalisation is skipped).
- Suitable for cosine similarity comparison via dot product.

Notes:
- This is a TF (term frequency) representation, not TF-IDF.
- Vocabulary must be consistent between training and inference.
- Unknown n-grams are ignored.
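The claim that the output suits cosine similarity via dot product can be illustrated with a small sketch. The helper name dot and the hand-built vectors are assumptions made for illustration; they are not part of this module. Because cos(a, b) = (a · b) / (||a|| ||b||) and both norms are 1, the division drops out.

```typescript
// Hypothetical helper (not part of the documented module): dot product of
// two equal-length vectors. For unit-length inputs this equals their
// cosine similarity.
function dot(a: Float32Array, b: Float32Array): number {
  if (a.length !== b.length) {
    throw new Error("vectors must be built from the same vocabulary");
  }
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += a[i] * b[i];
  return sum;
}

// Hand-built unit vectors standing in for vectorize() output.
const vecA = new Float32Array([0.6, 0.8, 0]);
const vecB = new Float32Array([0.8, 0.6, 0]);
// What vectorize() returns when no n-gram of the text is in vocab.
const vecZero = new Float32Array(3);

const simAB = dot(vecA, vecB);      // 0.6*0.8 + 0.8*0.6, approximately 0.96
const simAA = dot(vecA, vecA);      // approximately 1 (identical direction)
const simZero = dot(vecA, vecZero); // exactly 0
```

Note that an all-zero vector scores 0 against everything, so callers may want to treat it as "no information" rather than as a genuine dissimilarity.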

Source Code

export function vectorize(
  text: string,
  vocab: Map<string, number>,
): Float32Array {
  const v = new Float32Array(vocab.size);

  // Term frequency accumulation
  for (const ng of charNgrams(text)) {
    const i = vocab.get(ng);
    if (i !== undefined) v[i] += 1;
  }

  // Compute L2 norm
  let norm = 0;
  for (let i = 0; i < v.length; i++) norm += v[i] * v[i];
  norm = Math.sqrt(norm);

  // Normalise to unit vector
  if (norm > 0) {
    for (let i = 0; i < v.length; i++) v[i] /= norm;
  }

  return v;
}
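
The source of charNgrams and the way vocab is built are not shown on this page. The sketch below is one plausible shape for both, under stated assumptions: charNgramsSketch and buildVocabSketch are hypothetical names, the n-gram length of 3 is a guess, and the real charNgrams may lowercase, pad, or tokenise differently. The point it illustrates is the contract vectorize relies on: every index in vocab must lie in the range 0 to vocab.size - 1, because writes outside a Float32Array's bounds are silently ignored.

```typescript
// Hypothetical stand-in for charNgrams: every substring of length n.
// The real implementation may differ (normalisation, padding, n).
function charNgramsSketch(text: string, n: number = 3): string[] {
  const grams: string[] = [];
  for (let i = 0; i + n <= text.length; i++) {
    grams.push(text.slice(i, i + n));
  }
  return grams;
}

// Hypothetical vocabulary builder: each distinct n-gram gets the next
// free index, so indices are contiguous from 0 to vocab.size - 1.
function buildVocabSketch(corpus: string[], n: number = 3): Map<string, number> {
  const vocab = new Map<string, number>();
  for (const doc of corpus) {
    for (const ng of charNgramsSketch(doc, n)) {
      if (!vocab.has(ng)) vocab.set(ng, vocab.size);
    }
  }
  return vocab;
}

const demoGrams = charNgramsSketch("abcd");          // "abc", "bcd"
const demoVocab = buildVocabSketch(["abcd", "bcde"]); // abc, bcd, cde
```

A vocabulary built this way would need to be persisted with the trained model and reused unchanged at inference, which is what the consistency note in the Summary requires.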

Dependencies (Outgoing)

graph LR
  vectorize["vectorize"]
  charNgrams["charNgrams"]
  vectorize -->|calls| charNgrams
  style vectorize fill:#dbeafe,stroke:#2563eb,stroke-width:2px
  click vectorize "20cf7613c5e8a682.html"
  click charNgrams "4a9652b5a506b435.html"

Target	Type
charNgrams	calls
ng	dynamic_call

Impact (Incoming)

graph LR
  vectorize["vectorize"]
  trainModelFromDataset["trainModelFromDataset"]
  PretrainedState["PretrainedState"]
  resetModelMock["resetModelMock"]
  trainModelFromDataset -->|calls| vectorize
  PretrainedState -->|uses| vectorize
  resetModelMock -->|uses| vectorize
  style vectorize fill:#dbeafe,stroke:#2563eb,stroke-width:2px
  click vectorize "20cf7613c5e8a682.html"
  click trainModelFromDataset "116a4fd1e25c7132.html"
  click PretrainedState "5cefc72e50bf5399.html"
  click resetModelMock "4ef72c19f1c89871.html"

Source	Type
trainModelFromDataset	calls
PretrainedState	uses
resetModelMock	uses