PDF Utils for node
This library contains tools for analysing and converting PDF files. You can get metadata, extract text, and render pages to SVG or PNG, all in our beloved asynchronous programming style.
It is planned to support extracting links from the document and creating ImageMaps (you remember them, don't you?) on the fly. pdfutils should also support password-locked files. But that's still on the to-do list.
The library is currently in beta. This means it has incomplete error handling and lacks a test suite.
Installation
To install pdfutils you have to install libpoppler-glib first.
Using Debian execute:
apt-get install libpoppler-glib-dev libpoppler-glib8 libcairo2-dev libcairo2
Using CentOS execute:
yum install poppler poppler-glib-devel
Using MacOS and Macports:
port install poppler
or if you prefer brew:
brew install poppler --with-glib
export PKG_CONFIG_PATH=/usr/X11/lib/pkgconfig
Then install pdfutils
npm install pdfutils
Usage
See this very basic example:
var pdfutils = require('pdfutils').pdfutils;
pdfutils("document.pdf", function(err, doc) {
doc[0].asPNG({maxWidth: 100, maxHeight: 100}).toFile("firstpage.png");
});
Three source lines of code to generate thumbnails of PDFs. Awesome!
Here is a bit more documentation:
pdfutils(source, callback)
this function is a factory for Documents
arguments:
- source: can be a Buffer or a String. If it is a String, it is used as the path of the file to read. If it is a Buffer, its content is treated as an in-memory PDF. Make sure not to change the buffer while pdfutils is using it!
- callback(err, doc): a callback with the following arguments:
- err: an error string when the pdf couldn't be loaded successfully, otherwise null
- doc: an instance of Document when the pdf is loaded successfully, otherwise undefined
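If you prefer Promises over callbacks, the factory can be wrapped in a few lines. This is only a sketch: openDocument is a hypothetical helper, not part of pdfutils, and it takes the factory as an argument so it makes no assumption about how the module is imported.

```javascript
// Hypothetical helper (not part of pdfutils): wraps a callback-style
// factory such as pdfutils(source, callback) in a Promise.
function openDocument(factory, source) {
  return new Promise(function (resolve, reject) {
    factory(source, function (err, doc) {
      if (err) {
        // err is documented as a string, so wrap it in an Error object.
        reject(err instanceof Error ? err : new Error(String(err)));
        return;
      }
      resolve(doc);
    });
  });
}
```

It could then be called as openDocument(require('pdfutils').pdfutils, "document.pdf"), or with a Buffer as the second argument.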
Class PDFDocument
Instances of this class are created by pdfutils(source, callback), described above.
members:
- 0, 1, 2, 3, 4, ... , n: instances of the Pages contained in the Document. See the description of Page below
- length: number of Pages in a document
- author: the author of the document or null if not known
- creationDate: the creation date as integer since 1970-01-01
- creator: creator of the document or null if unknown
- format: exact format of this PDF file or null if unknown
- keywords: keywords of the document as string or null if unknown
- linearized: true if document is linearized, otherwise false
- metadata: Metadata as string
- modDate: last modification of pdf as integer since 1970-01-01
- pageLayout: the layout of the pages. Can be one of the following strings or null if unknown:
- singlePage
- oneColumn
- twoColumnLeft
- twoColumnRight
- twoPageLeft
- twoPageRight
- pageMode: the suggested viewing mode of a page. Can be one of the following strings or null if unknown:
- none
- useOutlines
- useThumbs
- fullscreen
- useOc
- useAttachments
- permissions: the permissions of this document. Is an object with the following members:
- print: whether the user is allowed to print
- modify: whether the user is allowed to modify the document
- copy: whether the user is allowed to take copies of this document
- notes: whether the user is allowed to make notes
- fillForm: whether the user is allowed to fill out forms
- producer: producer of a document or null if unknown
- subject: subject of this document or null if unknown
- title: title of the document or null if unknown
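The members above can be gathered into a small summary object. The following is a minimal sketch: describeDocument and toDate are hypothetical helpers, and the date conversion assumes that creationDate and modDate count seconds since 1970-01-01, which the description above does not state explicitly.

```javascript
// ASSUMPTION: the date members are seconds since 1970-01-01 (Unix time).
// If they turn out to be milliseconds, drop the "* 1000".
function toDate(value) {
  return typeof value === 'number' ? new Date(value * 1000) : null;
}

// Hypothetical helper: builds a plain summary from a Document's members.
function describeDocument(doc) {
  const perms = doc.permissions || {};
  return {
    title: doc.title || '(untitled)',   // title is null if unknown
    author: doc.author || '(unknown)',  // author is null if not known
    pages: doc.length,
    created: toDate(doc.creationDate),
    modified: toDate(doc.modDate),
    canPrint: Boolean(perms.print),
    canCopy: Boolean(perms.copy)
  };
}
```

Inside the pdfutils callback this would be used as console.log(describeDocument(doc)).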
Class PDFPage
This class represents a page of a document
members:
- width: width of the page
- height: height of the page
- index: number of this page.
- label: label of this page or null if no label was defined.
- links: array containing links of a page
- asSVG(opts): returns an instance of PageJob described below, opts is an
optional argument with an Object with the following optional fields:
- maxWidth: maximal width of the resulting SVG in px.
- minWidth: minimal width of the resulting SVG in px.
- maxHeight: maximal height of the resulting SVG in px.
- minHeight: minimal height of the resulting SVG in px.
- width: the width of the resulting SVG in px. Overrides minWidth and maxWidth.
- height: the height of the resulting SVG in px. Overrides minHeight and maxHeight.
- asPDF(opts): returns an instance of PageJob described below, opts is an
optional argument with an Object with the following optional fields:
- maxWidth: maximal width of the resulting PDF in pt.
- minWidth: minimal width of the resulting PDF in pt.
- maxHeight: maximal height of the resulting PDF in pt.
- minHeight: minimal height of the resulting PDF in pt.
- width: the width of the resulting PDF in pt. Overrides minWidth and maxWidth.
- height: the height of the resulting PDF in pt. Overrides minHeight and maxHeight.
- asPNG(opts): returns an instance of PageJob described below, opts is an
optional argument with an Object with the following optional fields:
- maxWidth: maximal width of the resulting PNG in px.
- minWidth: minimal width of the resulting PNG in px.
- maxHeight: maximal height of the resulting PNG in px.
- minHeight: minimal height of the resulting PNG in px.
- width: the width of the resulting PNG in px. Overrides minWidth and maxWidth.
- height: the height of the resulting PNG in px. Overrides minHeight and maxHeight.
- asText(opts): returns an instance of PageJob described below. opts is an optional argument with an Object, which is currently ignored.
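Combining the indexed pages of a Document with asPNG gives a thumbnail for every page. This is a sketch under assumptions: writeThumbnails is a hypothetical helper, the file naming is made up for illustration, and the output directory is expected to exist already.

```javascript
const path = require('path');

// Hypothetical helper: renders every page of `doc` to a PNG thumbnail that
// fits into a size x size box, and returns the target file names.
function writeThumbnails(doc, outDir, size) {
  const files = [];
  for (let i = 0; i < doc.length; i++) {
    const file = path.join(outDir, 'page-' + (i + 1) + '.png');
    // Same call chain as in the basic example, applied to page i.
    doc[i].asPNG({ maxWidth: size, maxHeight: size }).toFile(file);
    files.push(file);
  }
  return files;
}
```

The returned list only names the files. Whether they are fully written at that point depends on how toFile behaves, which is not specified here.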
Class PDFPageJob
This class inherits from Stream. It handles converting a Page (described above) to SVG, PDF, PNG or text.
members:
- links: array containing links of a page, translated to fit the output page.
events:
- data: emitted when a new chunk of the converted file is available
- end: emitted when the file is successfully converted
- error: emitted when the file cannot be converted. This event is not implemented yet.
members:
- toFile(path, [options]): writes a page to the file in the desired format.
- see Stream for further members.
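Because a PageJob emits data and end events, its output can also be collected in memory instead of being written to a file. The sketch below uses a hypothetical helper named collect, which works with any emitter of these two events. It does not assume whether chunks arrive as Buffers or as strings.

```javascript
// Hypothetical helper: gathers all 'data' chunks of a job and resolves
// with one Buffer once 'end' is emitted.
function collect(job) {
  return new Promise(function (resolve, reject) {
    const chunks = [];
    job.on('data', function (chunk) {
      // Normalise each chunk to a Buffer, whatever type was emitted.
      chunks.push(Buffer.isBuffer(chunk) ? chunk : Buffer.from(String(chunk)));
    });
    job.on('end', function () {
      resolve(Buffer.concat(chunks));
    });
    // 'error' is documented as not implemented yet; the listener is
    // harmless and becomes useful once the event exists.
    job.on('error', reject);
  });
}
```

For example, collect(doc[0].asText()).then(function (buf) { console.log(buf.toString()); }) would print the text of the first page, assuming the text is emitted as UTF-8.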
License
This module is licensed under GPL.