• This repository has been archived on 14/Jul/2023
  • Stars
    star
    590
  • Rank 75,794 (Top 2 %)
  • Language
    Java
  • License
    MIT License
  • Created almost 11 years ago
  • Updated over 1 year ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

A standalone Java library/command line tool that converts DOC, DOCX, PPT, PPTX and ODT documents to PDF files.

Docs to PDF Converter

(I'm not maintaining this code as I neither have personal resources nor am I still using this project. I'll be happy to oblige if you have any pull requests or even if you wish to be a co-maintainer.)

A standalone Java library/command line tool that converts DOC, DOCX, PPT, PPTX and ODT documents to pdf files. (Requires JRE 7)

The v1.7 release has not been updated for about 2 years although it seems quite reliable for me. In response to an issue request to update the libraries, I have done so with the new v1.8. I now use Maven to managed the libraries in the pom.xml file.

I have not tested v1.8 much so if you face any issues, you can still use v1.7 in the Releases section.

Table of content

Why?

I wanted a simple program that can convert Microsoft Office documents to PDF but without dependencies like LibreOffice or expensive proprietary solutions. Seeing as how code and libraries to convert each individual format is scattered around the web, I decided to combine all those solutions into one single program. Along the way, I decided to add ODT support as well since I encountered the code too.

Command Line Usage:

java -jar doc-converter.jar -type "type" -input "path" -output "path" -verbose
java -jar doc-converter.jar -input test.doc
java -jar doc-converter.jar -i test.ppt -o ~\output.pdf
java -jar doc-converter.jar -i ~\no-extension-file -o ~\output.pdf -t docx

Parameters:

-inputPath (-i, -in, -input) "path" : specifies a path for the input file

-outputPath (-o, -out, -output) "path" : specifies a path for the output PDF, use input file directory and name.pdf if not specified (Optional)

-type (-t) [DOC | DOCX | PPT | PPTX | ODT] : Specifies doc converter. Leave blank to let program infer via file  extension (Optional)

-verbose (-v) : To view intermediate processing messages. (Optional)

Library Usage:

  1. Drop the jar into your lib folder and add to build path.
  2. Choose the converter of your choice, they are named DocToPDFConverter, DocxToPDFConverter, PptToPDFConverter, PptxToPDFConverter and OdtToPDFConverter.
  3. Instantiate with 4 parameters
    • InputStream inStream: Document source stream to be converted
    • OutputStream outStream: Document output stream
    • boolean showMessages: Whether to show intermediate processing messages to Standard Out (stdout)
    • boolean closeStreamsWhenComplete: Whether to close input and output streams when complete
  4. Call the "convert()" method and wait.

Caveats and technical details:

This tool relies on Apache POI, xdocreport, docx4j and odfdom libraries. They are not 100% reliable and the output format may not always be what you desire.

DOC:

Generally ok but takes some time to convert.. I notice that after conversion, the paragraph spacing tends to increase affecting your page layout. Conversion is done using docx4j to convert DOC to DOCX then to PDF.(Cannot use xdocreport once the DOCX data is obtained as the intermediate data structure is docx4j specific.)

DOCX:

Very good results. Fast conversion too. Conversion is done using xdocreport library as it seems faster and more accurate than docx4j.

PPT and PPTX:

Resulting file is a PDF comprising of a PNG embedded in each page. Should be good enough for printing. This is the limitation of the Apache POI and docx4j libraries.

ODT:

Quality and speed as good as DOCX. Conversion is done using odfdom of the Apache ODF Toolkit.

Main Libraries

Apache POI: https://poi.apache.org/
xdocreport: http://code.google.com/p/xdocreport/
docx4j: http://www.docx4java.org/
odfdom: https://incubator.apache.org/odftoolkit/odfdom/
and others...

Compiling the code

I just want the jar

# If you don't already have Maven in your Mac
brew install maven
mvn clean package

The output jar file can be found in the target folder.

Development

I'm using Eclipse Mars IDE Java EE with the M2Eclipse plugin. Simply create a workspace and import my project into it. Let Maven do its work in downloading all the necessary dependencies. Once everything is downloaded, you should be able to run the MainClass.

More details can be found in the Wiki section.

The MIT License (MIT) Copyright (c) 2013-2014 Yeo Kheng Meng

More Repositories

1

w31slack

A proof-of-concept Slack client for Windows for Workgroups 3.11 with tests.
C
219
star
2

doschgpt

A proof-of-concept ChatGPT and Hugging Face client for DOS with text-to-speech for Sound Blaster compatible systems.
C++
199
star
3

SwiftSerial

A Swift Linux and Mac library for reading and writing to serial ports.
Swift
137
star
4

http-to-https-proxy

A proxy that upgrades HTTP connections to HTTPS for systems which cannot make HTTPS requests.
Go
83
star
5

gentoo-on-486

Instructions on how to install modern Gentoo Linux on ancient 486-based PCs.
67
star
6

programming-win31

The sample codes provided in the floppy disk of Charles Petzold's book Programming Windows 3.1.
C
49
star
7

ndp2019-wristband-teardown

Tear-down effort of the Pixmob wristband used in NDP2019.
47
star
8

retro-configs

Collection of my DOS configurations and drivers of my retro machines
Assembly
31
star
9

reverse-engineering-ndp2016-wristband

Schematics of the reverse engineered Pixmob wristband used in NDP2016. Sample IR brute-force code is available but has not been fully tested in practice.
KiCad Layout
25
star
10

covox-music-player

A Linux program that plays MP3 or WAV files to the Covox Speech Thing or its clones via a native parallel port.
C
18
star
11

nus-soc-print

An Android application that prints office documents and PDF files to Unix printers in NUS School Of Computing via SSH
Java
16
star
12

reflections-on-trusting-trust

A talk (including sample code) about the Turing award lecture "Reflections on Trusting Trust" originally given by Ken Thompson.
C
16
star
13

pcb-covox-amp

A tiny sound card based on the Covox Speech Thing design which includes an LM386 amplifier.
15
star
14

coders-dvorak-regular-keypad

Programmer Dvorak layout with the typical keypad layout instead of the modified one by Roland Kaufmann.
15
star
15

distance-machine-locker

A system that locks your computer the moment you move away from it.
Swift
14
star
16

ssim

Program to measure the similarity between two videos using the OpenCV library and the structural similarity algorithm (SSIM). This is modified from the video-input-psnr-ssim tutorial of OpenCV.
C++
13
star
17

intro-to-ble

A talk about basic Bluetooth Low Energy concepts followed by code explanations.
Java
11
star
18

ne2000plus-collection

Collection of NE2000+ software obtained from various sources
Assembly
11
star
19

pcb-covox

A tiny sound card based on the Covox Speech Thing design.
9
star
20

pi-radio

Raspberry Pi internet radio streamer for Singapore. Can be modified to stream other internet radio stations.
Python
9
star
21

nus-soc-print-ios

An iOS Application that prints office documents and PDF files to Unix printers in NUS School Of Computing.
Swift
8
star
22

ssim-cuda

CUDA Program to measure the similarity between two videos using the OpenCV library and the structural similarity algorithm (SSIM). This is modified from the video-input-psnr-ssim tutorial of OpenCV
C++
7
star
23

ble-localiser

This talk is about how I manage to do localisation using 3 Raspberry Pis(RPis) as beacons.
Swift
7
star
24

iot-esp32-mcu-workshop

This is an introductory IoT ESP32 Microcontroller workshop to get acquainted with microcontroller programming and simple electronics.
C++
6
star
25

dvfs

A C++ native Android program that does dynamic frequency scaling to achieve a target frames-per-second (FPS).
C++
5
star
26

reflections-on-trusting-trust-go

A talk (including sample Golang code) about the Turing award lecture "Reflections on Trusting Trust" originally given by Ken Thompson.
Go
5
star
27

SwiftLinuxSerial

A Swift 3 Linux-only library for reading and writing to serial ports. (Deprecated for SwiftSerial)
Swift
4
star
28

ht107e-socket-tester-teardown

4
star
29

intro-to-pcb-design-eagle

A class to introduce students to designing Printed Circuit Boards (PCBs) using the Eagle software.
Eagle
4
star
30

pcb-usb-ft245r-parallel-adapter

A USB to parallel adapter using the FTDI FT245R chip. This is not a IEEE 1284 compliant parallel port. Just using the DB25 connector to access the data pins.
KiCad Layout
3
star
31

pcb-name-card

My business card that is on a Printed Circuit Board (PCB). Comes with an white LED, UV LED, ruler and QR code.
Eagle
3
star
32

getting-started-with-rpi

A talk about introducing the audience to setting up the Raspberry Pi as well as basic Python coding on the GPIO pins.
Python
3
star
33

intro-to-rpi

A short introductory course I conducted for people new to Raspberry Pis and Linux.
Python
3
star
34

odroid-power-measure

A Java program that dynamically shows FPS, CPU/GPU frequency and power of Odroid-XU.
Java
2
star
35

pcb-breakout-ltc3625-tps63031

Pin breakout of the LTC3625 supercapacitor charger with TPS63031 3.3V regulator to through-hole pin headers for prototyping
2
star
36

pcb-covox-amp-v2

A tiny sound card based on the Covox Speech Thing design which includes an LM386 amplifier and FT245RL chip to receive data via USB. This project is a combination of my 2 earlier projects pcb-covox-amp and pcb-usb-ft245r-parallel-adapter
KiCad Layout
2
star
37

mov-is-turing-complete

A talk about the academic paper "mov is Turing-complete" written by Stephan Dolan". Practical examples from movcc compiler by Chris Domas.
C
2
star
38

sensor_clock_2

A clock composed of an Arduino, temperature and humidity sensors. Includes integrated charger.
Arduino
2
star
39

vibrate_alarm_clock_2

An vibrating alarm clock based on Microview.
Arduino
1
star
40

start-sslstrip-on-boot

Scripts to to start SSLStrip on boot for Arch Linux ARM on Raspberry Pi
Shell
1
star
41

power_measure_tool

An Arduino setup to measure the voltage and current of a load.
C++
1
star
42

retro-configs-apple

Collection of my setup configurations and disc locations of my retro Apple machines
1
star
43

cs5272-project

A group project done in NUS CS5272 module in AY2014-2015 Sem 2.
C
1
star
44

yeokm1.github.io-hugo

Hugo source files for my site at http://yeokhengmeng.com.
JavaScript
1
star
45

yeokm1.github.io

Hugo compiled frontend files for my site at http://yeokhengmeng.com.
HTML
1
star
46

pcb-bme280-breakout

Pin breakout of the Bosch BME280 sensor to through-hole pin headers for prototyping
Eagle
1
star
47

dvfsapp

An app front end that does dynamic frequency scaling to achieve a target frames-per-second (FPS).
Java
1
star
48

vibrate_alarm_clock

An alarm clock with vibration capabilities.
Arduino
1
star
49

auto-rotate-screen

A program that auto rotates the screen when the Thinkpad is rotated.
C++
1
star
50

fpsmeasure

Java
1
star