  • Stars: 1,463
  • Rank: 32,132 (Top 0.7%)
  • Language: Ruby
  • License: MIT License
  • Created over 10 years ago
  • Updated 5 months ago

Repository Details

A Ruby gem to liberate content from Microsoft Word documents

Word to Markdown converter

A Ruby gem to liberate content from the jail that is Word documents

The problem

Our default content publishing workflow is terribly broken. We've all been trained to make paper, yet today, content authored once is more commonly consumed in multiple formats, and rarely, if ever, does it embody physical form. Put another way, our go-to content authoring workflow remains relatively unchanged since it was conceived in the early 80s.

I'm asked regularly by government employees — knowledge workers who fire up a desktop word processor as the first step to any project — for an automated pipeline to convert Microsoft Word documents to Markdown, the lingua franca of the internet, but as my recent foray into building just such a converter proves, it's not that simple.

Markdown isn't just an alternative format. Markdown forces you to write for the web.

Just want to convert a Microsoft Word (or Google) document to Markdown?

You can use the hosted service at word2md.com (or check out its source, word-to-markdown-server).

Install

You'll need to install LibreOffice. Then:

gem install word-to-markdown

Usage

file = WordToMarkdown.new("/path/to/document.docx")
=> <WordToMarkdown path="/path/to/document.docx">

file.to_s
=> "# Test\n\n This is a test"

file.document.tree
=> <Nokogiri Document>
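
A common next step is writing the converted Markdown to disk. The sketch below shows one way to do that, relying only on the `WordToMarkdown.new(path)` and `#to_s` calls shown above. The helper names `markdown_path_for` and `convert_to_file` are hypothetical and not part of the gem, and the `converter:` parameter is an assumption added so the helper can be exercised without LibreOffice installed.

```ruby
# Hypothetical helpers (not part of the gem): convert a Word document and
# save the resulting Markdown next to the original file.

# Builds the output path by swapping the extension for ".md",
# e.g. "/docs/report.docx" becomes "/docs/report.md".
def markdown_path_for(docx_path)
  File.join(File.dirname(docx_path), "#{File.basename(docx_path, '.*')}.md")
end

# converter: any class whose instances take a path and respond to #to_s.
# When omitted, the real gem is loaded (assumes the gem and LibreOffice
# are both installed).
def convert_to_file(docx_path, converter: nil)
  if converter.nil?
    require "word-to-markdown"
    converter = WordToMarkdown
  end

  markdown = converter.new(docx_path).to_s
  output_path = markdown_path_for(docx_path)
  File.write(output_path, markdown)
  output_path
end
```

With the gem installed, `convert_to_file("/path/to/document.docx")` would write `/path/to/document.md` and return that path.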

Command line usage

Once you've installed the gem, it's just:

$ w2m path/to/document.docx

This prints the resulting Markdown to stdout.

Supports

  • Paragraphs
  • Numbered lists
  • Unnumbered lists
  • Nested lists
  • Italic
  • Bold
  • Explicit headings (e.g., selected as "Heading 1" or "Heading 2")
  • Implicit headings (e.g., text with a larger font size relative to paragraph text)
  • Images
  • Tables
  • Hyperlinks
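
To make the idea of implicit headings concrete, here is a minimal sketch of one possible heuristic: treat the most common font size as paragraph text and promote larger sizes to heading levels. It is an illustration under stated assumptions, not the gem's actual implementation. The `TextRun` struct, the method names, and the point sizes are all hypothetical.

```ruby
# Hypothetical heuristic for illustration only; NOT the gem's implementation.
TextRun = Struct.new(:text, :font_size)

# Maps each font size larger than the body size to a heading level:
# the largest size becomes level 1, the next largest level 2, and so on
# (capped at 6, the deepest Markdown heading).
def implicit_heading_levels(runs)
  return {} if runs.empty?

  sizes = runs.map(&:font_size)
  # Assumption: the most frequent font size is ordinary paragraph text.
  body_size = sizes.tally.max_by { |_size, count| count }.first
  larger = sizes.uniq.select { |size| size > body_size }.sort.reverse
  larger.each_with_index.to_h { |size, index| [size, [index + 1, 6].min] }
end

# Renders the runs as Markdown, prefixing detected headings with "#".
def runs_to_markdown(runs)
  levels = implicit_heading_levels(runs)
  runs.map do |run|
    level = levels[run.font_size]
    level ? "#{'#' * level} #{run.text}" : run.text
  end.join("\n\n")
end

runs = [
  TextRun.new("Title", 24),
  TextRun.new("Intro text", 12),
  TextRun.new("Section", 16),
  TextRun.new("More text", 12)
]
puts runs_to_markdown(runs)
```

In this example, 12 pt is the most common size, so the 24 pt and 16 pt runs become first- and second-level headings respectively.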

Requirements and configuration

Word-to-markdown requires soffice, a command-line interface to LibreOffice that works on Linux, Mac, and Windows. To install soffice, see the LibreOffice documentation.
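
Before converting, it can help to confirm that soffice is reachable. The following is a minimal preflight sketch, not part of the gem: `find_executable` is a hypothetical helper that only searches the directories listed in `PATH`, so an installation that places soffice elsewhere (or names it `soffice.exe`) would not be detected.

```ruby
# Hypothetical preflight check (not part of the gem): look for an executable
# called `name` in each directory of a PATH-style string.
def find_executable(name, path_env = ENV.fetch("PATH", ""))
  path_env.split(File::PATH_SEPARATOR).each do |dir|
    next if dir.empty?

    candidate = File.join(dir, name)
    return candidate if File.file?(candidate) && File.executable?(candidate)
  end
  nil # nothing matching was found on the given path
end

soffice = find_executable("soffice")
if soffice
  puts "Found soffice at #{soffice}"
else
  warn "soffice not found on PATH; install LibreOffice first"
end
```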

Testing

script/cibuild

Docker

First, create the Gemfile.lock by installing the dependencies:

bundle install

Everything you need to run the executable locally:

docker-compose build
docker-compose run --rm app bundle exec w2m --help
docker-compose run --rm app bundle exec w2m test/fixtures/em.docx

Hosted service

Word-to-markdown-server contains a lightweight server for converting Word documents as a service. A live version runs at word2md.com.

More Repositories

1

wordpress-to-jekyll-exporter

One-click WordPress plugin that converts all posts, pages, taxonomies, metadata, and settings to Markdown and YAML which can be dropped into Jekyll (or Hugo or any other Markdown and YAML based site engine).
PHP
1,023
star
2

jekyll-auth

A simple way to use GitHub OAuth to serve a protected Jekyll site to your GitHub organization
Ruby
841
star
3

jekyll-remote-theme

Jekyll plugin for building Jekyll sites with any GitHub-hosted theme
Ruby
291
star
4

github-mention-highlighter

A Chrome extension that automatically highlights any time you are mentioned on a GitHub issue or pull request thread by highlighting your username, any team you're a member of, and the border of any containing comment.
TypeScript
273
star
5

word_diff

Word Diff empowers you to be a Markdown person in a Microsoft Word world by automatically converting any Word document committed to a GitHub repo to Markdown
Ruby
196
star
6

gman

A Ruby gem to check if the owner of a given email address or website is working for THE MAN (a.k.a. verifies government domains).
Ruby
164
star
7

dc-maps

Maps of Washington DC area public geodata
Ruby
152
star
8

pi-hole-cloudflared-docker-compose-ansible-caddy

Example configuration for using Pi-Hole, Cloudflared, Docker Compose, Ansible, and Caddy to over-engineer your home network for privacy and security.
Dockerfile
150
star
9

congressional-districts

Historic and current US Congressional districts as GeoJSON, versioned within Git
149
star
10

benbalter.github.com

The personal website of Ben Balter. Built using Jekyll and GitHub Pages. See humans.txt for more info.
Ruby
146
star
11

jekyll-relative-links

A Jekyll plugin to convert relative links to markdown files to their rendered equivalents
Ruby
140
star
12

jekyll-include-cache

A Jekyll plugin to cache the rendering of Liquid includes
Ruby
113
star
13

github_records_archiver

Backs up a GitHub organization's repositories and all their associated information for archival purposes.
Ruby
108
star
14

count-org-loc

Count total lines of code across a GitHub organization
Ruby
108
star
15

geojson-diff

A Ruby library for diffing GeoJSON files
Ruby
101
star
16

jekyll-titles-from-headings

Ruby
94
star
17

dc-wifi-social

A collaborative list of DC locations that serve up both Internet and Alcohol
Shell
90
star
18

jekyll-readme-index

A Jekyll plugin to render a project's README as the site's index.
Ruby
87
star
19

site-inspector

Ruby Gem to sniff information about a domain's technology and capabilities.
Ruby
84
star
20

markdown_to_word

Converts GitHub-flavored Markdown to a Word document
Ruby
83
star
21

jekyll-optional-front-matter

A Jekyll plugin to make front matter optional for Markdown files
Ruby
82
star
22

wordpress-plugin-tests

Travis CI-compatible unit testing for WordPress plugins that uses the WordPress core unit testing framework and PHPUnit
PHP
76
star
23

retlab

A minimalist Jekyll theme for your personal site
HTML
75
star
24

markdown-to-pdf

On demand generation of enterprise-grade PDFs from GitHub-hosted markdown files
Ruby
74
star
25

word-to-markdown-server

A hosted version of the Word to Markdown gem
Ruby
70
star
26

jekyll-style-guide

An opinionated guide to common Jekyll design patterns and anti-patterns.
Shell
66
star
27

github-forms

A RESTful API for submitting standard HTML form data to a GitHub-hosted CSV.
CSS
65
star
28

comment-card

A simple interface for non-technical users — both authenticated and pseudonymous — to provide feedback for your GitHub-hosted project
Ruby
56
star
29

jekyll-default-layout

Silently sets default layouts for Jekyll pages and posts
Ruby
55
star
30

jekyllbot

Listens for GitHub post-receive service hook messages, runs Jekyll, and pushes the results back to GitHub.
Ruby
54
star
31

problem_child

Allows authenticated or anonymous users to fill out a standard web form to create GitHub issues (and pull requests).
Ruby
50
star
32

zoom-launcher

A command line tool for joining your next Zoom meeting.
Ruby
48
star
33

open-source-alternatives

A collaborative list of open-source alternatives to typical government and enterprise software needs
HTML
47
star
34

change_agent

A Git-backed key-value store, for tracking changes to documents and other files over time
Ruby
46
star
35

situation-clock

LED-style Situation Room wall clock formatted for iPads
TypeScript
37
star
36

github-uploader

A simple app to enable drag-and-drop uploading of binary and other assets to GitHub Repositories
Ruby
36
star
37

word-to-markdown-js

Convert Word documents to beautiful Markdown. Via command line or in your browser.
TypeScript
36
star
38

Convert-Word-Documents-to-HTML

Cleans up the otherwise-ugly output generated by Microsoft Word when you use the File -> Save As HTML option and optionally includes Twitter Bootstrap
PHP
35
star
39

troll-repellant

A micro-service to automatically comment on and close issues opened by bothersome users.
Ruby
35
star
40

zoom-go

A command line tool for joining your next Zoom meeting.
Go
34
star
41

huebot

Changes a Philips Hue light's color and flashes based on GitHub's status
Ruby
34
star
42

dotfiles

@BenBalter's computering environment and the scripts to initialize it and keep it up to date.
Ruby
31
star
43

jekyll-bootstrap-sass

A plugin to add Twitter Bootstrap to your Jekyll site
Ruby
30
star
44

Frequency-Analysis

Script to parse text file and find frequency of n-word phrases
PHP
30
star
45

redliner

A tool for facilitating the redlining of documents with the GitHub uninitiated
Ruby
27
star
46

add-to-org

A simple OAuth App to automatically add users to an organization
Ruby
27
star
47

avatar-montage

Create a tiled montage (or collage) of GitHub Avatar images for use in all hands and other "welcome" or "thank you" slides
Shell
25
star
48

open-sourcing-government

Presentation on applying the open-collaboration philosophy to the process of governing
CSS
25
star
49

WP-Resume

Out-of-the-box solution to get your resume online. Built on WordPress's custom post types, it offers a uniquely familiar approach to publishing
PHP
25
star
50

sitemap-parser

Ruby Gem to parse sitemaps.org compliant sitemaps
Ruby
24
star
51

jekyll-restful-api-generator

Given a CSV file (or a series of YAML files), generates a RESTful API using Jekyll
PHP
21
star
52

plister

A utility for programmatically setting OS X plist file preferences
Ruby
21
star
53

Twitter-Mentions-as-Comments

Please note: This project is no longer actively maintained and is seeking a new maintainer.
PHP
20
star
54

naughty_or_nice

You've made the list, we'll help you check it twice. Given a domain-like string, verifies inclusion in a list you provide.
Ruby
19
star
55

bulk-issue-creator

Bulk opens batches of issues (or posts comments) across GitHub repositories based on a template and CSV of values. Think of it like "mail merge" for GitHub issues.
TypeScript
18
star
56

gmail-and-google-calendar-stats

Scrapes your Gmail and Google Calendar data and returns it as a CSV for further analysis.
TypeScript
17
star
57

ra11y

Ruby-based automated accessibility testing
Ruby
16
star
58

pdftotext

A Ruby wrapper for the `pdftotext` command line library
Ruby
15
star
59

Site-Inspector-php

PHP Class to provide information about a given domain.
PHP
15
star
60

coconductor

A work-in-progress code of conduct detector based off Licensee
Ruby
15
star
61

government-glossary

A GovSpeak to English translator a.k.a. glossary of common government IT and procurement terms, abbreviations and acronyms (CGITPTAA)
Ruby
15
star
62

2016-campaign-tech

An entirely unofficial look at the technology stacks of the 2016 Presidential Campaigns
Ruby
14
star
63

squad_goals

A tiny app to allow open-source contributors to opt-in to GitHub teams.
Ruby
12
star
64

Plugin-Boilerplate

PHP
12
star
65

Infinite-Scroll

"Fork" of the infinite scroll WordPress Plugin
PHP
12
star
66

copy-issue-link-bookmarklet

Copies a markdown link to the GitHub issue you're currently viewing in the form of [issue title](issue URL).
TypeScript
12
star
67

2021-analysis-of-federal-dotgov-domains

Analysis of what technologies power federal .gov websites - 2021 edition.
HTML
12
star
68

open-source-for-government

A collaborative resource for government employees looking to participate in the open source community
HTML
12
star
69

simple-api

An example of making a simple API with Jekyll and GitHub Pages
12
star
70

open_source_stats

A quick script to generate metrics about the contribution your organization makes to the open source community in a 24-hour period
Ruby
11
star
71

issue-shadower

A webhook receiver to mirror GitHub.com issues to private GitHub Enterprise forks.
Ruby
10
star
72

so_far_so_good

A Ruby Gem to parse and manipulate the Federal Acquisition Regulation (FAR) and Defense Federal Acquisition Regulation Supplement (DFARS)
Ruby
9
star
73

copy-to

A quick-and-dirty Heroku app to simulate running `git clone`, `git remote add`, and `git push` locally.
Ruby
9
star
74

wordpress-unix-theme

Unix-like Interface for WordPress
JavaScript
9
star
75

.github

Script to share community files (e.g., CONTRIBUTING.md) and probot configuration files across repositories
Ruby
8
star
76

Domain-Inventory

WordPress plugin to track Federal .Govs by Agency, Status, Non-WWW Support, IPv6 Support, CDN, CMS, Cloud Provider, Analytics, JavaScript Libraries, and HTTPs support.
PHP
7
star
77

tweets

Archives historical Tweets as a Jekyll site on GitHub Pages
HTML
7
star
78

PCLJ-Members-Workspace

Customization of WP Document Revisions for the Public Contract Law Journal
PHP
7
star
79

wordpress-plugin-boilerplate-classes

PHP
7
star
80

Convert-Microsoft-Word-Footnotes-to-WordPress-Simple-Footnotes

Plugin to parse Microsoft Word footnotes into WordPress's Simple Footnotes format
PHP
7
star
81

behind-github-geojson

Open source, open standards and 50 lines of code: A look behind GitHub’s GeoJSON rendering and diffing
CSS
7
star
82

feedback

Ask @benbalter anything!
6
star
83

markdown-table-formatter

A Ruby class to normalize column lengths in a markdown table
Ruby
6
star
84

when-did-nacin-last-blog

Answering the internet's toughest questions since 2011.
Ruby
6
star
85

hubot-webscale

A web-sockets based Hubot adapter to allow Hubot to be embedded in a website at webscale.
CoffeeScript
6
star
86

meeting-matrix

Visually counts down the time remaining in a meeting on an RGB LED Matrix using a Raspberry Pi.
Python
6
star
87

ti-83-programs

The first (useful) software I ever wrote. Calculator programs written between 2001 and 2005 on a TI-83 Plus.
Shell
5
star
88

Recursive-Merge-Sort-in-PHP

Recursive Merge Sort Function in PHP
JavaScript
5
star
89

import_export

A Ruby client for Trade.gov's Consolidated Screening List
Ruby
5
star
90

Encrypt-Posts

WordPress Plugin that provides data-at-rest encryption of post content
PHP
5
star
91

bot-notification-silencer

Marks notifications from bots as read.
TypeScript
5
star
92

federal-open-source-repositories

A dynamically generated list of open source projects hosted on GitHub
Ruby
5
star
93

change_agent_demo

Demo data store for https://github.com/benbalter/change_agent
4
star
94

jekyll-accessibility-template

Jekyll Voluntary Product Accessibility Template (VPAT) and §508 accessibility information template
HTML
4
star
95

jekyll-edit-link

A Jekyll tag that generates links to edit the current page on GitHub
Ruby
4
star
96

make-maps-better-together

Collaborative mapping the GitHub Way™
CSS
4
star
97

chilling_effects

A Ruby gem to interact with the Chilling Effects API
Ruby
4
star
98

dont-build-an-api

Presentation for the DC API meetup
CSS
4
star
99

go-huebot

Changes a Philips Hue light's color and flashes based on GitHub's status
Go
4
star
100

site-inspector-demo

HTML
3
star