feedparser-clj
Parse RSS/Atom feeds with a simple, clojure-friendly API. Uses the Java ROME library, wrapped in StructMaps.
Status
Usable for parsing and exploring feeds. No escaping of potentially-malicious content is performed, and we've inherited any quirks that ROME itself has.
Supports the following syndication formats:
- RSS 0.90
- RSS 0.91 Netscape
- RSS 0.91 Userland
- RSS 0.92
- RSS 0.93
- RSS 0.94
- RSS 1.0
- RSS 2.0
- Atom 0.3
- Atom 1.0
Usage
For a more detailed understanding about supported feed types and meanings, the ROME javadocs (under com.sun.syndication.feed.synd
) are a good resource.
There is only one function, parse-feed
, which takes a URL and returns a StructMap with all the feed's structure and content.
The following REPL session should give an idea about the capabilities and usage of feedparser-clj
.
Load the package into your namespace:
user=> (ns user (:require [feedparser-clj.core] [clojure.string :as string]))
Retrieve and parse a feed:
user=> (def f (parse-feed "http://gregheartsfield.com/atom.xml"))
parse-feed
also accepts a java.io.InputStream for reading from a file or other sources (see clojure.java.io/input-stream):
;; Contents of resources/feed.rss
<rss>
...
</rss>
user=> (def f (with-open
[feed-stream (-> "feed.rss"
clojure.java.io/resource
clojure.java.io/input-stream)]
(parse-feed feed-stream)))
f
is now a map that can be accessed by key to retrieve feed information:
user=> (keys f)
(:authors :author :categories :contributors :copyright :description :encoding :entries :feed-type :image :language :link :entry-links :published-date :title :uri)
A key applied to the feed gives the value, or nil if it was not defined for the feed.
user=> (:title f)
"Greg Heartsfield"
Feed/entry ID or GUID can be obtained with the :uri
key:
user=> (:uri f)
"http://gregheartsfield.com/"
Some feed attributes are maps themselves (like :image
) or lists of structs (like :entries
and :authors
):
user=> (map :email (:authors f))
("[email protected]")
Check how many entries are in the feed:
user=> (count (:entries f))
18
Determine the feed type:
user=> (:feed-type f)
"atom_1.0"
Look at the first few entry titles:
user=> (map :title (take 3 (:entries f)))
("Version Control Diagrams with TikZ" "Introducing cabal2doap" "hS3, with ByteString")
Find the most recently updated entry's title:
user=> (first (map :title (reverse (sort-by :updated-date (:entries f)))))
"Version Control Diagrams with TikZ"
Compute what percentage of entries have the word "haskell" in the body (uses clojure.string
):
user=> (let [es (:entries f)]
(* 100.0 (/ (count (filter #(string/substring? "haskell"
(:value (first (:contents %)))) es))
(count es))))
55.55555555555556
Installation
This library uses the Leiningen build tool.
ROME and JDOM are required dependencies, which may have to be manually retrieved and installed with Maven. After that, simply clone this repository, and run:
lein install
License
Distributed under the BSD-3 License.
Copyright
Copyright (C) 2010 Greg Heartsfield