  • Stars: 230
  • Rank 174,053 (Top 4 %)
  • Language: JavaScript
  • Created almost 4 years ago
  • Updated over 3 years ago

Repository Details

Extract PDFs to Markdown within Obsidian

Extract PDF text to Markdown

Allows you to extract the basic textual content of a PDF into a Markdown file. Works well with headings, paragraphs and lists.

How to use this plugin

After you've installed and activated the plugin:

  1. Drag your PDF into Obsidian
  2. Open the PDF within Obsidian
  3. Make sure the pane with your PDF is focused
  4. Click the "PDF to Markdown" button in the sidebar
  5. Edit the generated markdown file to your needs

Tips & Tricks for editing the generated markdown file

I just went ahead and turned a 500-page PDF into Markdown and found that it worked better and faster than I expected.

Bulk-removing page footers

The book I used had the same footer on every page, so the footer text got copied into the Markdown file over and over, too.

For bulk search-and-replace I use the Atom editor (https://atom.io):

  1. Copy the footer text into your clipboard
  2. Download and install Atom
  3. Open the Markdown file in Atom
  4. Use "Find -> Find in Buffer" and paste the footer text into the search field
  5. Leave the replacement field empty and use the button "Replace" or "Replace All" to remove the footer text
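
If you would rather script this step than use an editor, the same idea can be sketched in a few lines of Python. This is only a minimal sketch: the helper name `remove_footer` and the sample footer text are made up for illustration, and it assumes the footer sits on a line of its own and is identical on every page (a footer that contains a changing page number would not match).

```python
def remove_footer(markdown: str, footer: str) -> str:
    """Drop every line whose text equals the footer (hypothetical helper)."""
    target = footer.strip()
    if not target:
        # An empty footer would match every blank line, so change nothing.
        return markdown
    # keepends=True preserves the original line endings of the kept lines.
    lines = markdown.splitlines(keepends=True)
    return "".join(line for line in lines if line.strip() != target)


# Made-up sample text standing in for the generated Markdown file.
sample = (
    "# Chapter 1\n"
    "First paragraph.\n"
    "My Book | Page footer\n"
    "Second paragraph.\n"
    "My Book | Page footer"
)
cleaned = remove_footer(sample, "My Book | Page footer")
print(cleaned)
```

For a real file you would read the Markdown text from disk, pass it through the function and write the result back. Keep a copy of the original file until you have checked the output.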

Remove a single space before a new line of text

Weirdly, some new lines of text had a space in front of them, such as:

 Some text

...which resulted in Obsidian treating the line as a sub-block of the preceding line.

To remove the space for those lines, I used a regular expression search-and-replace:

  1. In "Find in Buffer", activate "Regex Search" (the .* icon)
  2. Enter ^([ ]|\t)+ into the search field
  3. Use the button "Replace" or "Replace All" to remove the space
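
The same regular expression can also be applied outside an editor. The sketch below uses Python's `re` module. The function name `strip_leading_whitespace` is made up for illustration, and the `re.MULTILINE` flag is an assumption needed so that `^` matches at the start of every line rather than only at the start of the text.

```python
import re

# Same pattern as in step 2. re.MULTILINE makes ^ match at the start of each line.
LEADING_WHITESPACE = re.compile(r"^([ ]|\t)+", flags=re.MULTILINE)


def strip_leading_whitespace(markdown: str) -> str:
    """Remove all spaces and tabs at the start of every line (hypothetical helper)."""
    return LEADING_WHITESPACE.sub("", markdown)


# Made-up sample: the second line carries the stray leading space.
sample = "First line\n Some text\nLast line\n"
print(strip_leading_whitespace(sample))
```

Note that the pattern removes every leading space and tab, not just a single space. It will therefore also flatten intentionally indented lines, such as nested list items, so check the result before you keep it.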

Known issues

First-time use

If you had a PDF open in Obsidian before you installed and activated the plugin, hitting the button may not work. I've had this issue with other plugins as well: the code just doesn't hook up to files that are already open.

The solution is to simply close the PDF note and re-open it. That will allow the plugin to hook into it.

Limited PDF parsing

Please understand that this is a basic, best-effort tool. It really just gets the text and headings from a PDF and turns them into Markdown. The plugin doesn't handle anything more complex, such as tables, images or annotations:

  • Does not turn PDF highlights and annotations into MD highlights
  • Does not retain PDF numbered lists
  • Does not skip text in headers and footers
