  • Stars: 1,302
  • Rank: 34,649 (Top 0.8 %)
  • Language: Rich Text Format
  • License: Apache License 2.0
  • Created over 11 years ago
  • Updated almost 2 years ago


Repository Details

✒️ Word Processing Document Library

SheetJS js-word

Parser and writer for various word processing document formats. Pure-JS clean-room implementation built from official specifications, related documents, and test files. Emphasis on parsing and writing robustness, cross-format feature compatibility with a unified JS representation, and maximal browser compatibility.

Test Files

Test files should be placed in the test_files directory, in the subdirectory for their file type. For example, DOCX files should be placed in test_files\docx\wordjs and RTF files in test_files\rtf\wordjs.

Every test file should be accompanied by a plain text .txt representation whose filename is the original filename with .txt appended. For example, the DOCX file test_files\docx\wordjs\foo.docx pairs with the plain text file test_files\docx\wordjs\foo.docx.txt.
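The naming convention can be expressed in a few lines of Node.js. This is a minimal sketch, not code from js-word: baselineFor and testDirFor are hypothetical helpers, and testDirFor assumes the type subdirectory is always the lowercase file extension (as with docx and rtf above).

```javascript
const path = require("path");

// Hypothetical helpers (not part of js-word): they only encode the
// naming conventions for test files and their plain text baselines.

// The baseline for a test file is the same path with ".txt" appended.
function baselineFor(testFile) {
  return testFile + ".txt";
}

// Assumption: test files live in test_files/<lowercase extension>/<suite>.
function testDirFor(fileName, suite) {
  const ext = path.extname(fileName).slice(1).toLowerCase(); // "foo.RTF" -> "rtf"
  return path.join("test_files", ext, suite);
}

const docx = path.join(testDirFor("foo.docx", "wordjs"), "foo.docx");
console.log(docx);              // test_files/docx/wordjs/foo.docx on POSIX
console.log(baselineFor(docx)); // test_files/docx/wordjs/foo.docx.txt on POSIX
```

On Windows, path.join produces backslash-separated paths like the ones shown in this README.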

Generating Baselines using Word for Windows

  1. Ensure you have PowerShell 7.0 or later installed
  2. In PowerShell 7.0 running as Administrator, run Set-ExecutionPolicy RemoteSigned or Set-ExecutionPolicy -Scope Process -ExecutionPolicy Bypass
  3. Place the generate_txt.ps1 script in the root of the repository
  4. Run .\generate_txt.ps1 .\test_files\EXT_TYPE\FOLDER (e.g. .\generate_txt.ps1 .\test_files\docx\apachepoi)

On the first run, if a test file does not have an accompanying .txt file, the script will open Word and save the file as plain text. Word will rapidly open and close during this process.

The script will not attempt to open Word or try to generate .txt files if they already exist. After a clean run, Word should not open on future runs.

The script will halt for documents that are broken in certain ways. Word will display a prompt, stalling the automated process. Those documents can be skipped by creating a .skip file as described below.

Skipping Files

The script will look for files with the .skip extension and skip processing the base file. For example, if test_files\docx\wordjs\Hello.docx.skip exists, the script will not attempt to process test_files\docx\wordjs\Hello.docx.

When the UI blocks (for example, on a VBA error with ThisDocument), the corresponding .skip file should be created manually. The script merely tests if the file exists, so the content is immaterial and a single letter suffices.
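Because only the existence of the marker matters, creating and checking a .skip file is trivial. The sketch below uses hypothetical helper names (isSkipped, markSkipped) that are not part of the repository; it simply writes a one-letter file next to the blocking document.

```javascript
const fs = require("fs");

// Hypothetical helpers: the marker for "<file>" is "<file>.skip".
// Only its existence is checked, so its contents are irrelevant.
function isSkipped(testFile) {
  return fs.existsSync(testFile + ".skip");
}

// Create the marker by hand for a document whose prompt blocks automation.
function markSkipped(testFile) {
  fs.writeFileSync(testFile + ".skip", "x"); // a single letter suffices
}
```

For example, markSkipped("test_files/docx/wordjs/Hello.docx") creates Hello.docx.skip in the same folder.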

Generating .skip files

The script will attempt to open password-protected documents using the password "WordJS". The script will not halt on these documents, but it will not generate a text file for them either. Instead, it writes a message to the terminal indicating the skip and generates a .skip file for the document.

License

Please consult the attached LICENSE file for details. All rights not explicitly granted by the Apache 2.0 License are reserved by the Original Author.

References

OSP-covered Specifications
  • MS-CFB: Compound File Binary File Format
  • MS-DOC: Word (.doc) Binary File Format
  • RTF: Rich Text Format
  • ISO/IEC 29500:2012(E) "Information technology — Document description and processing languages — Office Open XML File Formats"
  • Open Document Format for Office Applications Version 1.3 (25 December 2019)
