
A tiny Clojure library that deals with MIME types (Internet media types)

Pantomime, a Library For Working With MIME Types In Clojure

Pantomime is a Clojure interface to Apache Tika.

Originally created as a library that deals with MIME types (Internet media types, sometimes referred to as "content types"), it now also supports extraction of document metadata and text content.

Maven Artifacts

Pantomime artifacts are released to Clojars. If you are using Maven, add the following repository definition to your pom.xml:

<repository>
  <id>clojars.org</id>
  <url>http://clojars.org/repo</url>
</repository>

Latest Release

With Leiningen:

[com.novemberain/pantomime "2.11.0"]

With Maven:

<dependency>
  <groupId>com.novemberain</groupId>
  <artifactId>pantomime</artifactId>
  <version>2.11.0</version>
</dependency>

Supported Clojure versions

Pantomime requires Clojure 1.8+. The most recent stable release is highly recommended.

Caveats

Pantomime depends on a reasonably modern version of org.apache.commons/commons-compress, which can conflict with other libraries that bring in an older version. If you run into undefined classes, missing methods, or similar errors, run lein deps :tree to check for conflicting dependencies, then exclude the conflicting dependency (either from the libraries that bring in older commons-compress versions or from Pantomime) as a workaround.
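As a rough sketch of the exclusion workaround, a Leiningen project.clj might look like the following. The project name, the Clojure version, and the some.group/other-library coordinates are hypothetical placeholders, not real artifacts; substitute whichever library lein deps :tree shows pulling in the older commons-compress.

```clojure
;; project.clj (illustrative only; names marked hypothetical are assumptions)
(defproject your-app "0.1.0-SNAPSHOT"
  :dependencies [[org.clojure/clojure "1.10.1"]
                 [com.novemberain/pantomime "2.11.0"]
                 ;; hypothetical library that brings in an older commons-compress;
                 ;; the :exclusions entry keeps its copy off the classpath
                 [some.group/other-library "1.0.0"
                  :exclusions [org.apache.commons/commons-compress]]])
```

After changing exclusions, running lein deps :tree again should confirm that only one commons-compress version remains.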

Usage

Detecting MIME type

The pantomime.mime/mime-type-of function accepts content as a byte array, a java.io.InputStream, or a java.net.URL, as well as a filename string or a java.io.File. It returns the MIME type as a string, or "application/octet-stream" if detection fails.

An example:

(ns your.app.namespace
  (:require [pantomime.mime :refer [mime-type-of]])
  (:import [java.io File]
           [java.net URL]))

;; by content (as byte array); note these are the bytes of the string itself, not of a PDF
(mime-type-of (.getBytes "filename.pdf"))
;; by file extension
(mime-type-of "filename.pdf")
;; by file content (as java.io.File)
(mime-type-of (File. "some/file/without/extension"))
;; by content (as java.net.URL)
(mime-type-of (URL. "http://domain.com/some/url/path.pdf"))

Pantomime also has a variant of the mime-type-of function for cases when content was fetched from the Web and HTTP headers are also available:

(ns your.app.namespace
  (:require [pantomime.web :refer [mime-type-of]]))

;; body is a string or input stream, headers is a map of lowercased headers.
;; Ring and clj-http both use this format for headers, for example.
(mime-type-of body headers)

In this case, Pantomime first tries to detect the content type from the response body (because some applications, frameworks and servers report content type incorrectly, for example, serving PDFs as text/html). If that fails, it falls back to the content type header.

The HTTP headers map must contain a "content-type" key for the content type header to be used. Most Clojure HTTP clients, such as clj-http, use lowercase strings for header names, so Pantomime follows this convention.
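If the headers come from a source that does not lowercase header names, they can be normalized before being passed along. The helper below is a minimal sketch using only clojure.core and clojure.string; the name lowercase-header-names is made up for this example and is not part of Pantomime.

```clojure
(require '[clojure.string :as str])

(defn lowercase-header-names
  "Returns a copy of the headers map with every key converted to a
  lowercase string. Accepts string or keyword keys. Hypothetical helper,
  not part of Pantomime."
  [headers]
  (into {}
        (map (fn [[k v]] [(str/lower-case (name k)) v]))
        headers))

;; Usage with the function described above (requires Pantomime on the classpath):
;; (pantomime.web/mime-type-of body (lowercase-header-names headers))
```

Note that if two keys differ only by case, one of them overwrites the other in the resulting map.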

Extension Recommendation

Pantomime can recommend an extension (one of the well known ones) for a MIME type:

(require '[pantomime.mime :as pm])

(pm/extension-for-name "application/vnd.ms-excel")
;= ".xls"
(pm/extension-for-name "image/jpeg")
;= ".jpg"
(pm/extension-for-name "application/octet-stream")
;= ".bin"

Parsing and Recognizing Media Types

(ns your.app.namespace
  (:require [pantomime.media :as mt]))

(mt/parse "application/json")

(mt/base-type "text/html; charset=UTF-8") ;; => media type of "text/html"

(mt/application? "application/json")
(mt/application? "application/xhtml+xml")
(mt/application? "application/pdf")
(mt/application? "application/vnd.ms-excel")
(mt/application? (mt/parse "application/json"))

(mt/image? "image/jpeg")
(mt/audio? "audio/mp3")
(mt/video? "video/quicktime")
(mt/text?  "text/plain")
(mt/has-parameters? "text/html; charset=UTF-8") ;; => true
(mt/has-parameters? "text/html") ;; => false
(mt/parameters-of "text/html; charset=UTF-8") ;; => {"charset" "UTF-8"}
(mt/charset-of "text/html; charset=UTF-8") ;; => "UTF-8"

Language Detection

pantomime.languages is a new namespace that provides functions for detecting natural languages:

(require '[pantomime.languages :as pl])

(pl/detect-language "this is English, it should not be hard to detect")
;= "en"

(pl/detect-language "parlez-vous Français")
;= "fr"

Note that Tika (and, in turn, Pantomime) supports detection of a limited number of languages. To get the list of supported languages, use the pantomime.languages/supported-languages var.

Metadata and Text Extraction

pantomime.extract provides two functions for extracting metadata, content, and embedded files from byte arrays, java.io.InputStream and java.net.URL instances as well as filenames as strings and java.io.File instances. The extraction functions differ in how they handle embedded documents.

pantomime.extract/parse takes as its single argument any of the types mentioned above. It returns a map containing all the metadata Tika was able to extract from the document, and the text content of the document concatenated with the text of all embedded documents, recursively.

An example:

(require '[clojure.java.io :as io]
         '[clojure.pprint :refer [pprint]]
         '[pantomime.extract :as extract])

(pprint (extract/parse "test/resources/pdf/qrl.pdf"))

;= {:producer ("GNU Ghostscript 7.05"),
;=  :pdf:pdfversion ("1.2"),
;=  :dc:title ("main.dvi"),
;=  :dc:format ("application/pdf; version=1.2"),
;=  :xmp:creatortool ("dvips(k) 5.86 Copyright 1999 Radical Eye Software"),
;=  :pdf:encrypted ("false"),
;=  ...
;=  :text "\nQuickly Reacquirable Locks∗\n\nDave Dice Mark Moir ... "
;= }

pantomime.extract/parse-extract-embedded also returns Tika-extracted metadata and document text, but it handles embedded documents differently. Instead of returning the concatenation of all embedded document text, it saves each embedded file to the filesystem and includes a vector of file names and paths in the returned data. Remember to remove those files when you're done with them!

For example, the file fileAttachment.pdf contains a single attached file, which gets saved to /tmp/pantomime-3207476364135900258-embedded:

(require '[clojure.java.io :as io]
         '[clojure.pprint :refer [pprint]]
         '[pantomime.extract :as extract])

(pprint (extract/parse-extract-embedded "test/resources/pdf/fileAttachment.pdf"))

;= {:date ("2012-11-23T14:40:50Z"),
;=  :producer ("Acrobat Distiller 9.5.2 (Windows)"),
;=  :creator ("van der Knijff"),
;=  :pdf:pdfversion ("1.7"),
;=  :dc:title ("This is a test document"),
;=  :text "\nThis is a test document. It contains a file attachment..."
;=  ...
;=  :embedded [{:path "/tmp/pantomime-3207476364135900258-embedded",
;=              :name "KSBASE.WQ2"}],
;=  ...}

Note that parse-extract-embedded creates temporary files in the JVM's default location.
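Cleaning up those temporary files could look roughly like this. The sketch assumes only the result shape shown in the example above (an :embedded vector of maps with a :path key); delete-embedded-files! is a hypothetical helper name, not a Pantomime function.

```clojure
(require '[clojure.java.io :as io])

(defn delete-embedded-files!
  "Deletes every file referenced under :embedded in a result map shaped
  like the parse-extract-embedded output shown above. Files that are
  already gone are skipped silently. Returns a vector of the paths it
  tried to delete. Hypothetical helper, not part of Pantomime."
  [result]
  (let [paths (keep :path (:embedded result))]
    (doseq [p paths]
      ;; the second argument makes delete-file return quietly on failure
      (io/delete-file p true))
    (vec paths)))

;; Usage (requires Pantomime on the classpath):
;; (let [result (extract/parse-extract-embedded "test/resources/pdf/fileAttachment.pdf")]
;;   ;; ... work with the embedded files ...
;;   (delete-embedded-files! result))
```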

If extraction fails, the functions will return the following:

{:text "",
 :content-type ("application/octet-stream"),
 :x-parsed-by ("org.apache.tika.parser.EmptyParser")}
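Callers that want to branch on this case can check for the marker in the returned map. The predicate below is a heuristic sketch based solely on the failure map shown above; extraction-failed? is a hypothetical name, and Tika may report EmptyParser in other situations as well, so treat a true result as a hint rather than a guarantee.

```clojure
(defn extraction-failed?
  "Returns true when the result map lists Tika's EmptyParser under
  :x-parsed-by, as in the failure map shown above. Hypothetical helper,
  not part of Pantomime."
  [result]
  (boolean (some #{"org.apache.tika.parser.EmptyParser"}
                 (:x-parsed-by result))))
```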

Community

Pantomime has a mailing list. Feel free to join it and ask any questions you may have.

For announcements of releases, important changes and so on, please follow @ClojureWerkz on Twitter.

Pantomime Is a ClojureWerkz Project

Pantomime is part of the group of libraries known as ClojureWerkz, together with Monger, Langohr, Neocons, Elastisch, Quartzite and several others.

Continuous Integration

CI is hosted by travis-ci.org

Development

Pantomime uses Leiningen 2. Make sure you have it installed and then run tests against all supported Clojure versions using

lein all test

Then create a branch and make your changes on it. Once you are done with your changes and all tests pass, submit a pull request on GitHub.

License

Copyright (C) 2011-2019 Michael S. Klishin, and the ClojureWerkz team.

Distributed under the Eclipse Public License, the same as Clojure.
