SamDecrock/pdf2table
pdf2table

pdf2table is a Node.js library that attempts to extract tables from a PDF.

The 'tables' are extracted as an array of rows.

It uses pdf2json to extract the PDF data.

Install

You can install pdf2table using the Node Package Manager (npm):

npm install pdf2table

Simple example

var pdf2table = require('pdf2table');
var fs = require('fs');

// Read the PDF into a buffer, then hand the buffer to pdf2table.
fs.readFile('./test.pdf', function (err, buffer) {
    if (err) return console.log(err);

    pdf2table.parse(buffer, function (err, rows, rowsdebug) {
        if (err) return console.log(err);

        // rows holds the extracted table data as an array of rows.
        console.log(rows);
    });
});
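
Once you have the rows, a common next step is to write them out as CSV. The sketch below is one way to do that. It assumes each row is itself an array of cell values, which is not stated above, and `rowsToCsv` is a hypothetical helper, not part of pdf2table. The sample data stands in for the `rows` argument of the parse callback.

```javascript
// Hypothetical helper, not part of pdf2table.
// Assumption: `rows` is an array of rows, and each row is an array of cell values.
function rowsToCsv(rows) {
    return rows.map(function (row) {
        return row.map(function (cell) {
            var text = String(cell);
            // Quote cells that contain a comma, a double quote, or a line break,
            // and double any embedded double quotes.
            if (/[",\r\n]/.test(text)) {
                return '"' + text.replace(/"/g, '""') + '"';
            }
            return text;
        }).join(',');
    }).join('\n');
}

// Made-up sample data standing in for the `rows` passed to the parse callback.
var sampleRows = [
    ['Name', 'Amount'],
    ['Widget, large', '12.50']
];

console.log(rowsToCsv(sampleRows));
```

Inside the `pdf2table.parse` callback you could call `rowsToCsv(rows)` in place of `console.log(rows)` and write the result to a file with `fs.writeFile`.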

Note

Note that this is a simplistic implementation of table extraction. If your PDF contains content other than tables, pdf2table will still attempt to shape that content into rows. Feel free to improve it and send pull requests.
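
Because non-table content also ends up as rows, you may want to filter the result yourself. The sketch below shows one possible heuristic: keep only the rows whose column count matches the most common one. It assumes each row is an array of cells, and `keepDominantWidth` is a hypothetical helper, not part of pdf2table. It will not suit every document, for example one with several tables of different widths.

```javascript
// Hypothetical helper, not part of pdf2table.
// Assumption: `rows` is an array of rows, and each row is an array of cells.
function keepDominantWidth(rows) {
    // Count how many rows have each column count.
    var counts = {};
    rows.forEach(function (row) {
        counts[row.length] = (counts[row.length] || 0) + 1;
    });

    // Find the column count shared by the most rows.
    var bestWidth = -1;
    var bestCount = 0;
    Object.keys(counts).forEach(function (width) {
        if (counts[width] > bestCount) {
            bestCount = counts[width];
            bestWidth = Number(width);
        }
    });

    // Keep only the rows with that column count.
    return rows.filter(function (row) {
        return row.length === bestWidth;
    });
}

// Made-up sample data: a title line and a footer mixed in with a 3-column table.
var mixedRows = [
    ['Quarterly report'],
    ['Item', 'Qty', 'Price'],
    ['Bolt', '10', '0.25'],
    ['Nut', '12', '0.10'],
    ['Page 1']
];

console.log(keepDominantWidth(mixedRows));
```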