  • Stars: 151
  • Rank 246,057 (Top 5 %)
  • Language: C
  • License: Apache License 2.0
  • Created over 8 years ago
  • Updated over 1 year ago

Repository Details

scan paper documents 📄 from a scanner 🖨️ as PDFs to Google Drive for full-text search

scan2drive

scan2drive is a Go program (with a web interface) for scanning, converting and uploading physical documents to Google Drive. The author runs scan2drive as a gokrazy appliance on a Raspberry Pi 4.

During the conversion step, scan2drive skips empty pages and converts the rest from multi-megabyte JPEGs into a kilobyte-sized PDF. This allows you to use Google Drive’s OCR-based full text search.
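
The heuristic scan2drive uses to decide that a page is empty is not described here. As an illustration only, here is a minimal sketch that assumes a page counts as empty when nearly all of its pixels are close to white. The names `isMostlyBlank` and `uniformPage` and both thresholds are hypothetical and not taken from scan2drive.

```go
package main

import (
	"fmt"
	"image"
	"image/color"
)

// isMostlyBlank reports whether at least minFraction of the pixels in img
// are brighter than whiteLevel (luminance, 0-255). Both thresholds are
// hypothetical tuning knobs, not values taken from scan2drive.
func isMostlyBlank(img image.Image, whiteLevel uint8, minFraction float64) bool {
	b := img.Bounds()
	total := b.Dx() * b.Dy()
	if total == 0 {
		return true
	}
	bright := 0
	for y := b.Min.Y; y < b.Max.Y; y++ {
		for x := b.Min.X; x < b.Max.X; x++ {
			g := color.GrayModel.Convert(img.At(x, y)).(color.Gray)
			if g.Y > whiteLevel {
				bright++
			}
		}
	}
	return float64(bright)/float64(total) >= minFraction
}

// uniformPage builds a w by h grayscale page filled with one luminance value.
// It exists only to make the example self-contained.
func uniformPage(w, h int, lum uint8) *image.Gray {
	img := image.NewGray(image.Rect(0, 0, w, h))
	for i := range img.Pix {
		img.Pix[i] = lum
	}
	return img
}

func main() {
	blank := uniformPage(100, 100, 250)

	// Simulate printed text by darkening a band of rows.
	printed := uniformPage(100, 100, 250)
	for y := 20; y < 40; y++ {
		for x := 0; x < 100; x++ {
			printed.SetGray(x, y, color.Gray{Y: 10})
		}
	}

	fmt.Println("blank page skipped:", isMostlyBlank(blank, 240, 0.99))
	fmt.Println("printed page skipped:", isMostlyBlank(printed, 240, 0.99))
}
```

Real scans contain sensor noise and show-through from the reverse side, so a usable threshold would have to be tuned against actual pages.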

Both the originals and the converted PDF are uploaded to Google Drive, so that you can enjoy full text search but still have the full-quality originals just in case.

In comparison to the native Google Drive connectivity which some document scanner vendors provide, scan2drive has these main advantages:

  1. scan2drive integrates with the scan button of your document scanner. You press one button and your documents will end up on Google Drive. Other solutions require you to use a mobile app or software on your PC.
  2. scan2drive is self-hosted and depends only on Google Drive being available, not the scanner vendor’s cloud integration service. Many vendors send documents into their own clouds and then to Google Drive. You are welcome to archive the scan directory of scan2drive to other places you see fit, in case there are any issues with Google Drive.
  3. scan2drive converts the scanned documents into a PDF which is small enough to be full text indexed by Google Drive, but it also retains the original JPEGs in case you need them.

Project status and vision

Currently, there are a number of open issues and not all functionality might work well. Use at your own risk!

The project vision is described above. Notably, scan2drive is already feature complete. We don’t want to add any more features to it than it currently has.

scan2drive was published in the hope that it could be useful to others, but the main author has no time to create an active community around it or accept contributions in a timely manner. All support, development and bug fixes are strictly best effort.

Supported scanners

  • scan2drive can scan from any AirScan-compatible scanner. This means any scanner that is marketed as compatible with Apple iPhones should work. You can find a list of tested devices at https://github.com/stapelberg/airscan#tested-devices
  • Fujitsu ScanSnap iX500 connected via USB

Architecture

Directory structure

The scans directory (-scans_dir flag) contains the following files:

  • <sub>/ is the per-user directory under which scans are placed
  • 2016-05-09-21:05:02+0200/ is a directory for an individual scan
    • page*.jpg are the raw pages obtained by calling scanimage
    • scan.pdf is the converted PDF
    • thumb.png is the first page of the converted PDF for display in the UI
    • COMPLETE.* are empty files recording which individual processing steps are done

Any file in the scans directory can be deleted at will, with the caveat that deleting scans before the COMPLETE.uploadoriginals file is present will result in that scan being irrevocably lost.
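
The deletion caveat above can be checked mechanically before cleaning up old scans. The following is a minimal sketch, assuming only what the text states: a scan directory is safe to delete once it contains COMPLETE.uploadoriginals. The helpers `safeToDelete`, `markDone` and `runDemo` are hypothetical and not part of scan2drive; the demo works in a temporary directory so it never touches real scans.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// safeToDelete reports whether scanDir contains the COMPLETE.uploadoriginals
// marker, meaning the original JPEGs have been uploaded.
func safeToDelete(scanDir string) (bool, error) {
	_, err := os.Stat(filepath.Join(scanDir, "COMPLETE.uploadoriginals"))
	if err == nil {
		return true, nil
	}
	if errors.Is(err, fs.ErrNotExist) {
		return false, nil
	}
	return false, err
}

// markDone creates an empty COMPLETE.<step> file in scanDir.
// Hypothetical helper used only to set up the demo below.
func markDone(scanDir, step string) error {
	f, err := os.Create(filepath.Join(scanDir, "COMPLETE."+step))
	if err != nil {
		return err
	}
	return f.Close()
}

// runDemo checks a temporary scan directory before and after the
// upload marker is written.
func runDemo() (before, after bool, err error) {
	dir, err := os.MkdirTemp("", "scan-demo-")
	if err != nil {
		return false, false, err
	}
	defer os.RemoveAll(dir)

	if before, err = safeToDelete(dir); err != nil {
		return false, false, err
	}
	if err = markDone(dir, "uploadoriginals"); err != nil {
		return false, false, err
	}
	after, err = safeToDelete(dir)
	return before, after, err
}

func main() {
	before, after, err := runDemo()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("safe to delete before upload marker:", before)
	fmt.Println("safe to delete after upload marker:", after)
}
```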

The state directory (-state_dir flag) contains the following files:

  • cookies.key is a secret key with which cookies are encrypted
  • sessions/ contains session contents
  • users/ is a directory containing per-user data
  • users/<sub>/ is a directory for an individual user
    • drive_folder.json contains information about the selected destination Google Drive folder. In case this file is deleted, the user will need to re-select the destination folder and scans cannot be uploaded until a new destination folder has been selected.
    • token.json contains the offline OAuth token for accessing Google Drive on behalf of the user. In case this file is deleted, the user will need to re-login. In case this file is leaked, the user should revoke the token.

Installation

First, follow the gokrazy quickstart instructions.

Then, add the github.com/stapelberg/scan2drive/cmd/scan2drive package to your gokrazy instance:

gok -i scanner add github.com/stapelberg/scan2drive/cmd/scan2drive

Deploy your gokrazy instance to your Raspberry Pi and connect a supported scanner.

You should be able to access the gokrazy web interface at the URL which the gok tool printed. To access the scan2drive web interface, switch to port 7120.

For setting up Google OAuth, you’ll need to access scan2drive via a domain name with a valid TLS certificate. scan2drive has built-in support for obtaining free certificates from Let’s Encrypt, but you do need to make your scan2drive installation reachable over the internet for this to work:

  1. If your provider offers IPv6, set your domain name’s AAAA record to point to your Raspberry Pi’s internet-reachable IPv6 address.
  2. If you don’t have IPv6 available, set up port forwarding on your router and use Dynamic DNS to make your domain name point to your current IP address.
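
Before requesting a certificate, it can help to confirm what the domain name currently resolves to. The following is a rough sketch using only the Go standard library; it is not part of scan2drive, `ipv6Addresses` is a hypothetical helper, and `localhost` is merely a placeholder default for the domain.

```go
package main

import (
	"fmt"
	"net"
	"os"
)

// ipv6Addresses returns the entries of addrs that parse as IPv6 addresses.
// Entries that fail to parse (for example addresses with a zone suffix)
// and IPv4 addresses are dropped.
func ipv6Addresses(addrs []string) []string {
	var out []string
	for _, a := range addrs {
		ip := net.ParseIP(a)
		if ip != nil && ip.To4() == nil {
			out = append(out, a)
		}
	}
	return out
}

func main() {
	domain := "localhost" // placeholder; pass your own domain as the first argument
	if len(os.Args) > 1 {
		domain = os.Args[1]
	}

	addrs, err := net.LookupHost(domain)
	if err != nil {
		fmt.Fprintln(os.Stderr, "lookup failed:", err)
		return
	}

	v6 := ipv6Addresses(addrs)
	if len(v6) == 0 {
		fmt.Println(domain, "has no IPv6 address; the AAAA record may be missing")
		return
	}
	fmt.Println(domain, "resolves to IPv6:", v6)
}
```

A successful lookup only shows that DNS is configured; whether the Raspberry Pi is actually reachable from the internet still depends on your router and firewall.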

Building with libjpeg-turbo

libjpeg-turbo is a JPEG image codec that uses SIMD instructions (Arm Neon in case of the Raspberry Pi) to accelerate baseline JPEG compression.

scan2drive can optionally make use of libjpeg-turbo (via the turbojpeg build tag), but doesn’t include it by default because of the cumbersome setup.

Using libjpeg-turbo on gokrazy requires a few extra setup steps. Because gokrazy does not include a C runtime environment (neither libc nor a dynamic linker), we need to link scan2drive statically.

  1. Install the gcc cross compiler, for example on Debian:

    apt install crossbuild-essential-arm64
    
  2. Enable cgo for your gokrazy instance. This means setting the following environment variables when calling gok (for example in your “gokline”, see gokrazy → Automation):

    export CC=aarch64-linux-gnu-gcc
    export CGO_ENABLED=1
    
  3. Enable static linking and the turbojpeg build tag for scan2drive in your instance config (use gok edit):

{
    "Hostname": "scanner",
    "Packages": [
        "github.com/gokrazy/fbstatus",
        "github.com/gokrazy/hello",
        "github.com/gokrazy/serial-busybox",
        "github.com/gokrazy/breakglass",
        "github.com/stapelberg/scan2drive/cmd/scan2drive"
    ],
    "PackageConfig": {
        "github.com/stapelberg/scan2drive/cmd/scan2drive": {
            "GoBuildFlags": [
                "-ldflags=-linkmode external -extldflags -static"
            ],
            "GoBuildTags": [
                "turbojpeg"
            ]
        }
    }
}
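
To double-check an edited config, the fields shown above can be decoded with encoding/json. This is a sketch under the assumption that the config is plain JSON with exactly the field names shown; the struct covers only those fields, and `hasTurboJPEG` and `sampleConfig` are hypothetical names, not part of gokrazy or scan2drive.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// instanceConfig mirrors only the fields that appear in the config above.
type instanceConfig struct {
	Hostname      string
	Packages      []string
	PackageConfig map[string]packageConfig
}

// packageConfig holds the per-package build settings shown above.
type packageConfig struct {
	GoBuildFlags []string
	GoBuildTags  []string
}

const scan2drivePkg = "github.com/stapelberg/scan2drive/cmd/scan2drive"

// hasTurboJPEG reports whether raw enables the turbojpeg build tag
// for the scan2drive package.
func hasTurboJPEG(raw []byte) (bool, error) {
	var cfg instanceConfig
	if err := json.Unmarshal(raw, &cfg); err != nil {
		return false, err
	}
	for _, tag := range cfg.PackageConfig[scan2drivePkg].GoBuildTags {
		if tag == "turbojpeg" {
			return true, nil
		}
	}
	return false, nil
}

// sampleConfig is a shortened copy of the config above, for the demo only.
const sampleConfig = `{
    "Hostname": "scanner",
    "Packages": ["github.com/stapelberg/scan2drive/cmd/scan2drive"],
    "PackageConfig": {
        "github.com/stapelberg/scan2drive/cmd/scan2drive": {
            "GoBuildFlags": ["-ldflags=-linkmode external -extldflags -static"],
            "GoBuildTags": ["turbojpeg"]
        }
    }
}`

func main() {
	ok, err := hasTurboJPEG([]byte(sampleConfig))
	if err != nil {
		fmt.Fprintln(os.Stderr, "invalid config:", err)
		os.Exit(1)
	}
	fmt.Println("turbojpeg enabled:", ok)
}
```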
