extractTextFromPdf function exported ✗ 0.0%
Last updated: 2026-03-03T12:11:09.168Z
Metrics
LOC: 45
Complexity: 4
Params: 3
Coverage: 0.0% (0/22 lines, 0x executed)
Signature
extractTextFromPdf(
  buffer: ArrayBuffer,
  maxChars = 12000,
): Promise<string>
Summary
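Extracts the text content of a PDF supplied as an ArrayBuffer. On each page, text items are joined with spaces and runs of spaces are collapsed. Non-empty pages are then joined with a blank line, and the result is truncated to at most maxChars characters. A page whose text cannot be extracted is logged and skipped rather than aborting the whole extraction.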
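The final assembly step can be checked in isolation. The `assembleText` helper below is hypothetical (it is not part of the source) and reproduces what happens after the page loop: pages without text are dropped, the remaining pages are separated by a blank line, and the joined string is cut at `maxChars`.

```typescript
// Hypothetical helper (not in the original source): mirrors the last step of
// extractTextFromPdf, which joins per-page text with a blank line and truncates.
function assembleText(pageTexts: string[], maxChars = 12000): string {
  // Pages that produced no text are skipped, as in the original loop.
  const nonEmpty = pageTexts.filter((text) => text.length > 0);
  // String.prototype.slice counts UTF-16 code units, so the cap is in code units.
  return nonEmpty.join("\n\n").slice(0, maxChars);
}

// Usage: three pages, the second of which had no extractable text.
const joined = assembleText(["First page.", "", "Third page."], 100);
console.log(JSON.stringify(joined)); // "First page.\n\nThird page."
```

Because the `\n\n` separators count toward `maxChars`, the limit caps the length of the final string rather than the amount of page text, and the cut can fall in the middle of a page.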
Source Code
export async function extractTextFromPdf(
  buffer: ArrayBuffer,
  maxChars = 12000,
): Promise<string> {
  log.debug(
    `extractTextFromPdf: buffer=${(buffer.byteLength / 1024).toFixed(1)}KB, maxChars=${maxChars}`,
  );
  // ensureWorker is defined elsewhere in this module; it runs before pdf.js is loaded.
  await ensureWorker();
  const pdfjsLib = await import("pdfjs-dist");
  const loadingTask = pdfjsLib.getDocument({ data: buffer });
  const pdf = await loadingTask.promise;
  log.debug(`PDF loaded for extraction: ${pdf.numPages} page(s)`);
  const textParts: string[] = [];
  // pdf.js page numbers are 1-based.
  for (let pageNum = 1; pageNum <= pdf.numPages; pageNum++) {
    try {
      const page = await pdf.getPage(pageNum);
      const content = await page.getTextContent();
      // Keep items that carry a text string, join them with spaces,
      // collapse runs of spaces, and trim.
      const pageText = (content.items as Array<{ str?: string }>)
        .filter((item): item is { str: string } => typeof item.str === "string")
        .map((item) => item.str)
        .join(" ")
        .replace(/ {2,}/g, " ")
        .trim();
      log.debug(`Page ${pageNum}: ${pageText.length} characters extracted.`);
      // Skip pages with no text so they add no empty separators.
      if (pageText) textParts.push(pageText);
      page.cleanup();
    } catch (pageErr) {
      // A failing page is logged and skipped; extraction continues with the next one.
      log.warn(`Failed to extract text from page ${pageNum}:`, pageErr);
    }
  }
  // destroy() releases the document and its worker-side resources;
  // cleanup() would only free cached data and keep the document alive.
  await pdf.destroy();
  // Separate pages with a blank line, then truncate to maxChars.
  const result = textParts.join("\n\n").slice(0, maxChars);
  log.info(
    `extractTextFromPdf finished: ${result.length} chars (from ${textParts.length} page(s) with text).`,
  );
  return result;
}
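The per-page normalization in the loop can be pulled out into a pure function to see what it does to a page's text items. The sketch below uses a hypothetical `normalizePageText` helper and simplified local item types (assumptions, not the pdf.js type definitions): text items carry a `str` string, while marked-content markers, which pdf.js only emits when `getTextContent` is called with `includeMarkedContent`, do not and are filtered out.

```typescript
// Simplified stand-ins for pdf.js text content items (assumption: only `str` matters here).
type TextItemLike = { str: string };
type MarkedContentLike = { type: string; id?: string };
type ContentItem = TextItemLike | MarkedContentLike;

// Hypothetical helper mirroring the normalization chain in extractTextFromPdf.
function normalizePageText(items: ContentItem[]): string {
  return items
    // Keep only items that actually carry a text string.
    .filter((item): item is TextItemLike => "str" in item && typeof item.str === "string")
    .map((item) => item.str)
    // Join fragments with one space, then collapse runs of two or more spaces.
    .join(" ")
    .replace(/ {2,}/g, " ")
    .trim();
}

// Usage: an empty fragment and a marked-content marker between two words.
const items: ContentItem[] = [
  { str: "Hello" },
  { str: "" },
  { type: "beginMarkedContent" },
  { str: " world " },
];
console.log(JSON.stringify(normalizePageText(items))); // "Hello world"
```

Only runs of ordinary spaces (U+0020) are collapsed; tabs or newlines inside a `str` value pass through unchanged.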
Dependencies (Outgoing)
| Target | Type |
|---|---|
| ensureWorker | calls |
No incoming dependencies.
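`ensureWorker` is the only outgoing dependency, and its body is not shown here. Because it is awaited on every call, its name suggests an idempotent, one-time setup. Assuming that is its role, a common way to get this behavior is to cache the setup promise so that concurrent callers share a single initialization. `makeEnsureOnce` below is hypothetical and is not taken from the source.

```typescript
// Hypothetical sketch: wraps an async setup step so it runs at most once,
// even when several callers await it concurrently. Not the original ensureWorker.
function makeEnsureOnce(setup: () => Promise<void>): () => Promise<void> {
  let pending: Promise<void> | null = null;
  return () => {
    if (pending === null) {
      // Cache the promise itself (not its result) so concurrent callers share it.
      pending = setup().catch((err: unknown) => {
        // Clear the cache so a later call can retry after a failed setup.
        pending = null;
        throw err;
      });
    }
    return pending;
  };
}

// Usage: the setup body runs once across three concurrent calls.
let setupRuns = 0;
const ensureReady = makeEnsureOnce(async () => {
  setupRuns += 1;
});

async function main(): Promise<void> {
  await Promise.all([ensureReady(), ensureReady(), ensureReady()]);
  console.log(setupRuns); // 1
}
main();
```

Clearing the cached promise on failure is a design choice: it lets a transient setup error be retried on the next call instead of being returned to every later caller.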