PDF Filler

PDF Filler is a RESTful service (API) to aid in the completion of existing PDF-based forms and empowers web developers to use browser-based forms and modern web standards to facilitate the collection of information.

PDF Filler works with virtually any unencrypted PDF, supporting both fillable (e.g., PDFs with pre-defined entry fields) and non-fillable (e.g., scanned PDFs) forms. Simply pass it the URL to any publicly hosted PDF. PDF Filler can even automatically create the markup necessary to embed an HTML form in an existing webpage.

Features

  • RESTful service (API) to aid in the completion of PDF-based forms
  • Submit form values via HTTP POST, receive the completed PDF as a download
  • Works with both fillable and non-fillable (e.g., scanned) PDFs
  • Dynamically generates HTML forms for any fillable PDF
  • Provides developers with field name lookup service to facilitate the rapid development of client applications

Under the Hood

The project abstracts the form-filling logic of pdftk and prawn.

Usage

PDF Filler works by accepting key => value pairs of field names and values. A field can be either a fillable PDF form field or an arbitrary x/y coordinate on a non-fillable form. For fillable PDFs, the key should be the field name. For non-fillable PDFs, the key should be the field coordinates as described below (e.g., 100,100). In both instances, the value should contain the user input for that field.

Getting Field Names

Field names can be discovered locally using open-source PDF utility pdftk, or dynamically using the service.

To get a list of all fields within a given PDF:

/fields?pdf={URL to the PDF}

To get a JSON representation of all fields within a given PDF:

/fields.json?pdf={URL to the PDF}
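
As a rough illustration, the JSON lookup could be called from Ruby as sketched below. The base URL assumes a local instance on the default port, and the helper names (fields_json_uri, fetch_fields) are hypothetical. The shape of the JSON response is not documented here, so the sketch returns the parsed value unchanged.

```ruby
require "json"
require "net/http"
require "uri"

# Assumption: a local instance started with `ruby app.rb` on the default port.
BASE_URL = "http://localhost:4567"

# Build the lookup URL. The PDF's own URL is percent-encoded so that it
# travels as a single query parameter.
def fields_json_uri(pdf_url, base: BASE_URL)
  uri = URI.join(base, "/fields.json")
  uri.query = URI.encode_www_form(pdf: pdf_url)
  uri
end

# Fetch and parse the field list (hypothetical helper; performs a network call).
def fetch_fields(pdf_url, base: BASE_URL)
  response = Net::HTTP.get_response(fields_json_uri(pdf_url, base: base))
  raise "lookup failed: HTTP #{response.code}" unless response.is_a?(Net::HTTPSuccess)
  JSON.parse(response.body)
end
```

Calling fetch_fields with the URL of a publicly hosted PDF would then return whatever structure the service emits for that document.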

Filling Out Forms

To fill out a PDF, issue a POST request to /fill. POST data should be key => value pairs, where the key is the field name and the value is the field value. Be sure to pass a key of "pdf" with the URL of the PDF to fill. The service will return the filled-in PDF as a download.

Note: Due to the way HTML handles forms, certain special characters, such as square brackets, will not POST properly to the service. If the PDF field name contains reserved characters, URL-encode the field name before POSTing.
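
A programmatic client might look like the following minimal sketch. It assumes a local instance on the default port; fill_request_body and fill_pdf are hypothetical helper names, not part of the project. Standard form encoding percent-encodes square brackets in field names; whether the service also expects the name to be pre-encoded, as the note above suggests for HTML forms, has not been verified here.

```ruby
require "net/http"
require "uri"

# Assumption: a local instance started with `ruby app.rb` on the default port.
BASE_URL = "http://localhost:4567"

# Encode the POST body: the required "pdf" key first, then the field values.
# URI.encode_www_form percent-encodes reserved characters such as [ and ].
def fill_request_body(pdf_url, values)
  URI.encode_www_form({ "pdf" => pdf_url }.merge(values))
end

# POST the values to /fill and save the returned PDF (performs a network call).
def fill_pdf(pdf_url, values, output_path, base: BASE_URL)
  response = Net::HTTP.post(
    URI.join(base, "/fill"),
    fill_request_body(pdf_url, values),
    "Content-Type" => "application/x-www-form-urlencoded"
  )
  raise "fill failed: HTTP #{response.code}" unless response.is_a?(Net::HTTPSuccess)
  File.binwrite(output_path, response.body)
end
```

For example, fill_pdf("http://example.com/form.pdf", { "first_name" => "Ada" }, "filled.pdf") would write the service's response to filled.pdf.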

Generating HTML Forms

To get a generic HTML representation of any fillable PDF form:

/form?pdf={URL To PDF}

Non-Fillable PDFs

Non-fillable PDFs (e.g., scanned documents or other PDFs without structured form fields) require X and Y coordinates and, optionally, a page number. Pass this data in the field name using the convention x,y,page (or simply x,y), where x and y are the coordinates measured from the bottom left-hand corner of the document. If no page is given, the first page is assumed.
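
The naming convention can be sketched with two small helpers. Both function names are hypothetical, and the parser only illustrates the convention described above; it is not the service's actual implementation. Whole-number coordinates are assumed.

```ruby
# Build a field name for a non-fillable PDF: "x,y" or "x,y,page".
def coordinate_key(x, y, page = nil)
  [x, y, page].compact.join(",")
end

# Parse such a field name back into its parts.
# The page defaults to 1 when it is omitted.
def parse_coordinate_key(key)
  parts = key.split(",").map { |part| Integer(part.strip, 10) }
  raise ArgumentError, "expected x,y or x,y,page" unless [2, 3].include?(parts.length)
  x, y, page = parts
  { x: x, y: y, page: page || 1 }
end
```

A value posted under coordinate_key(100, 100, 2) would therefore target position 100,100 on the second page.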

Structuring the HTML Form

Data can be submitted programmatically (e.g., via an API) or as a standard web-based form. For example, an HTML form could be structured as follows:

    <form method="post" action="/fill">

      <!-- Required: the URL of the PDF to fill -->
      <input type="hidden" name="pdf" value="{URL to the PDF}" />

      <!-- A standard, fillable field, simply pass the field name -->
      <label>First Name: <input type="text" name="first_name" /></label>

      <!-- A non-fillable field for which we pass coordinates -->
      <label>Last Name: <input type="text" name="100,100,1" /></label>

      <input type="submit" value="Submit" />

    </form>

Requirements

  • Latest stable version of Ruby (+ the Bundler gem)
  • PDFtk

Setting up

  1. Install the latest version of Ruby if not already installed ($ \curl -L https://get.rvm.io | bash -s stable --ruby)
  2. Install PDFtk
  3. Install bundler if not already installed (gem install bundler)
  4. Install git if not already installed (or simply download the repository and unzip in the following step)
  5. Clone the git repository (git clone git@github.com:GSA-OCSIT/pdf-filler.git) and cd into the target directory (most likely pdf-filler)
  6. Run bundle install

Running

To start the service, run the command ruby app.rb from the project's directory. The service will be exposed on port 4567 by default.

You can freely use PDF Filler as a web service. But if you'd like to grab the source code and host it locally, it's actually pretty easy.

PDF Filler uses pdftk to handle the actual form filling. pdftk can be freely downloaded and installed on most systems. If it is installed at a location other than /usr/local/bin/pdftk, be sure to update the configuration by setting the environment variable PATH_TO_PDFTK to the proper path.
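
The lookup described above can be pictured with the sketch below. How the app itself reads the variable is not shown in this document, so the helper pdftk_path and the constant name are assumptions made for illustration only.

```ruby
# Default location named in the text above.
DEFAULT_PDFTK_PATH = "/usr/local/bin/pdftk"

# Resolve the pdftk binary: use PATH_TO_PDFTK when it is set,
# otherwise fall back to the default location.
# `env` is injectable so the lookup can be exercised without touching ENV.
def pdftk_path(env = ENV)
  env.fetch("PATH_TO_PDFTK", DEFAULT_PDFTK_PATH)
end
```

Starting the service with PATH_TO_PDFTK=/usr/bin/pdftk ruby app.rb would set the variable for that one process.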

PDF Filler is written in Ruby and uses Sinatra to expose a RESTful API.

Deploying

PDF Filler is simple to deploy as a backend service on your server. Follow the instructions here: http://www.kalzumeus.com/2010/01/15/deploying-sinatra-on-ubuntu-in-which-i-employ-a-secretary/ as an example of how to deploy and set up the app as a backend service on your machine. There is a file called daemon.rb that is part of the app for this purpose.

Hosting

The app is designed to be hosted on services like Heroku. If using Heroku, be sure to select the "Bamboo" build (which comes compiled with pdftk) and set the config variable PATH_TO_PDFTK to /usr/bin/pdftk.

Examples

Contributing

Anyone is encouraged to contribute to the project by forking and submitting a pull request. (If you are new to GitHub, you might start with a basic tutorial.)

By contributing to this project, you grant a world-wide, royalty-free, perpetual, irrevocable, non-exclusive, transferable license to all users under the terms of the MIT License.

License

This project constitutes a United States Government Work under 17 USC 105 and is distributed under the terms of the MIT License.
