Package feed implements a flexible, robust and efficient RSS/Atom parser.
If you just want some bytes quickly parsed into an object without caring about the underlying feed type, start with Simple Use.
If you want to take a deeper dive into customizing the parser's behavior, see:
- Extending BasicFeed
- Robustness and recovery from bad input
- Parse with specification compliancy checking
- RSS and Atom extensions
Get the pkg
go get github.com/jloup/xml
Use it in code
import "github.com/jloup/xml/feed"
Simple Use
Example:
package main

import (
	"fmt"
	"os"

	"github.com/jloup/xml/feed"
)

func main() {
	f, err := os.Open("feed.txt")
	if err != nil {
		fmt.Printf("Cannot open feed: %s\n", err)
		return
	}
	defer f.Close()
	myfeed, err := feed.Parse(f, feed.DefaultOptions)
	if err != nil {
		fmt.Printf("Cannot parse feed: %s\n", err)
		return
	}
	fmt.Printf("FEED '%s'\n", myfeed.Title)
	for i, entry := range myfeed.Entries {
		fmt.Printf("\t#%v '%s' (%s)\n\t\t%s\n\n", i, entry.Title, entry.Link, entry.Summary)
	}
}
Output:
FEED 'Me, Myself and I'
	#0 'Breakfast' (http://example.org/2005/04/02/breakfast)
		eggs and bacon, yup !
	#1 'Dinner' (http://example.org/2005/04/02/dinner)
		got soap delivered !
feed.Parse returns a BasicFeed, whose fields are:
// RSS channel or Atom feed
type BasicFeed struct {
	Title   string
	Id      string    // Atom:feed:id | RSS:channel:link
	Date    time.Time
	Image   string    // Atom:feed:logo:iri | RSS:channel:image:url
	Entries []BasicEntryBlock
}

type BasicEntryBlock struct {
	Title   string
	Link    string
	Date    time.Time // Atom:entry:updated | RSS:item:pubDate
	Id      string    // Atom:entry:id | RSS:item:guid
	Summary string
}
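The field comments above show that one struct is filled from either format: Id comes from Atom's feed/id or from RSS's channel/link. The following stdlib-only sketch illustrates that idea on a tiny scale. It picks the format from the root element and maps format-specific fields onto a common struct. The names basicFeed, atomFeed, rssFeed and parseBasic are hypothetical. This is not how feed.Parse is implemented, and it ignores RSS 1.0 (RDF) roots.

```go
package main

import (
	"bytes"
	"encoding/xml"
	"fmt"
)

const atomNS = "http://www.w3.org/2005/Atom"

// basicFeed is a hypothetical, cut-down analogue of BasicFeed.
type basicFeed struct {
	Title string
	ID    string // Atom feed/id, or RSS channel/link
}

// atomFeed and rssFeed hold only the fields this sketch maps.
type atomFeed struct {
	Title string `xml:"title"`
	ID    string `xml:"id"`
}

type rssFeed struct {
	Channel struct {
		Title string `xml:"title"`
		Link  string `xml:"link"`
	} `xml:"channel"`
}

// parseBasic detects the format from the root element, then copies the
// format-specific fields into one common struct.
func parseBasic(data []byte) (basicFeed, error) {
	d := xml.NewDecoder(bytes.NewReader(data))
	for {
		tok, err := d.Token()
		if err != nil {
			return basicFeed{}, err
		}
		se, ok := tok.(xml.StartElement)
		if !ok {
			continue // skip the XML declaration, comments and whitespace
		}
		switch {
		case se.Name.Space == atomNS && se.Name.Local == "feed":
			var a atomFeed
			if err := d.DecodeElement(&a, &se); err != nil {
				return basicFeed{}, err
			}
			return basicFeed{Title: a.Title, ID: a.ID}, nil
		case se.Name.Local == "rss":
			var r rssFeed
			if err := d.DecodeElement(&r, &se); err != nil {
				return basicFeed{}, err
			}
			return basicFeed{Title: r.Channel.Title, ID: r.Channel.Link}, nil
		default:
			return basicFeed{}, fmt.Errorf("unsupported root element <%s>", se.Name.Local)
		}
	}
}

func main() {
	docs := []string{
		`<feed xmlns="http://www.w3.org/2005/Atom"><title>Me, Myself and I</title><id>urn:example:feed</id></feed>`,
		`<rss version="2.0"><channel><title>Me, Myself and I</title><link>http://example.org/</link></channel></rss>`,
	}
	for _, doc := range docs {
		f, err := parseBasic([]byte(doc))
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%q %q\n", f.Title, f.ID)
	}
}
```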
Extending BasicFeed
BasicFeed is a very basic struct implementing the feed.UserFeed interface. If you need more of the values extracted from feeds, you can pass your own implementation of feed.UserFeed to feed.ParseCustom.
type UserFeed interface {
	PopulateFromAtomFeed(f *atom.Feed) // see github.com/jloup/xml/feed/atom
	PopulateFromAtomEntry(e *atom.Entry)
	PopulateFromRssChannel(c *rss.Channel) // see github.com/jloup/xml/feed/rss
	PopulateFromRssItem(i *rss.Item)
}

func ParseCustom(r io.Reader, feed UserFeed, options ParseOptions) error
To avoid starting from scratch, you can embed feed.BasicEntryBlock and feed.BasicFeedBlock in your structs.
Example:
package main

import (
	"fmt"
	"os"

	"github.com/jloup/xml/feed"
	"github.com/jloup/xml/feed/atom"
	"github.com/jloup/xml/feed/rss"
)

type MyFeed struct {
	feed.BasicFeedBlock
	Generator string
	Entries   []feed.BasicEntryBlock
}

// Feed-level callbacks: let the embedded block fill the basic fields,
// then extract the extra Generator value.
func (m *MyFeed) PopulateFromAtomFeed(f *atom.Feed) {
	m.BasicFeedBlock.PopulateFromAtomFeed(f)
	m.Generator = fmt.Sprintf("%s V%s", f.Generator.Uri.String(),
		f.Generator.Version.String())
}

func (m *MyFeed) PopulateFromRssChannel(c *rss.Channel) {
	m.BasicFeedBlock.PopulateFromRssChannel(c)
	m.Generator = c.Generator.String()
}

// Entry-level callbacks: build a basic entry and append it.
func (m *MyFeed) PopulateFromAtomEntry(e *atom.Entry) {
	newEntry := feed.BasicEntryBlock{}
	newEntry.PopulateFromAtomEntry(e)
	m.Entries = append(m.Entries, newEntry)
}

func (m *MyFeed) PopulateFromRssItem(i *rss.Item) {
	newEntry := feed.BasicEntryBlock{}
	newEntry.PopulateFromRssItem(i)
	m.Entries = append(m.Entries, newEntry)
}

func main() {
	f, err := os.Open("feed.txt")
	if err != nil {
		fmt.Printf("Cannot open feed: %s\n", err)
		return
	}
	defer f.Close()
	myfeed := &MyFeed{}
	err = feed.ParseCustom(f, myfeed, feed.DefaultOptions)
	if err != nil {
		fmt.Printf("Cannot parse feed: %s\n", err)
		return
	}
	fmt.Printf("FEED '%s' generated with %s\n", myfeed.Title, myfeed.Generator)
	for i, entry := range myfeed.Entries {
		fmt.Printf("\t#%v '%s' (%s)\n", i, entry.Title, entry.Link)
	}
}
Output:
FEED 'Me, Myself and I' generated with http://www.atomgenerator.com/ V1.0
	#0 'Breakfast' (http://example.org/2005/04/02/breakfast)
	#1 'Dinner' (http://example.org/2005/04/02/dinner)
Robustness and recovery from bad input
Feeds are widely used, and it is quite common for a single invalid character or a missing opening or closing tag to invalidate the whole feed. The standard encoding/xml package is quite pedantic about input XML, as it should be.
To produce an output feed at all costs, you can set how many times the parser should recover from invalid input via the XMLTokenErrorRetry field of ParseOptions. The strategy is simple. If the XML decoder returns an XMLTokenError while parsing, the faulty token is removed from the input and the parser retries building a feed from it. This is useful, for example, when an Atom content element contains invalid HTML or XML.
Example:
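f, err := os.Open("testdata/invalid_atom.xml")
if err != nil {
	fmt.Printf("Cannot open feed: %s\n", err)
	return
}
defer f.Close()
opt := feed.DefaultOptions
opt.XMLTokenErrorRetry = 1 // remove at most one faulty token and retry
_, err = feed.Parse(f, opt)
if err != nil {
	fmt.Printf("Cannot parse feed: %s\n", err)
} else {
	fmt.Println("no error")
}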
Output:
no error
With XMLTokenErrorRetry set to 0, it would have produced the following error:
Cannot parse feed: [XMLTokenError] XML syntax error on line 574: illegal character code U+000C
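The following stdlib-only sketch illustrates the same recover-and-retry loop using encoding/xml directly. It deliberately uses a coarser technique than the one described above. Instead of removing only the faulty token, it drops the whole line reported by the *xml.SyntaxError and re-parses, up to a fixed number of times. The names checkXML and recoverXML are hypothetical, and this is not the package's implementation.

```go
package main

import (
	"bytes"
	"encoding/xml"
	"errors"
	"fmt"
	"io"
)

// checkXML returns nil if data is well-formed XML, or the first error
// reported by the standard decoder.
func checkXML(data []byte) error {
	d := xml.NewDecoder(bytes.NewReader(data))
	for {
		if _, err := d.Token(); err == io.EOF {
			return nil
		} else if err != nil {
			return err
		}
	}
}

// recoverXML removes the line named by each *xml.SyntaxError and tries
// again, at most retries times. It returns the (possibly trimmed) input and
// the error of the last attempt (nil on success).
func recoverXML(data []byte, retries int) ([]byte, error) {
	for attempt := 0; ; attempt++ {
		err := checkXML(data)
		var syn *xml.SyntaxError
		if err == nil || attempt >= retries || !errors.As(err, &syn) {
			return data, err
		}
		lines := bytes.Split(data, []byte("\n"))
		if syn.Line < 1 || syn.Line > len(lines) {
			return data, err // cannot locate the faulty line; give up
		}
		// Drop the whole offending line (coarser than token-level removal).
		lines = append(lines[:syn.Line-1], lines[syn.Line:]...)
		data = bytes.Join(lines, []byte("\n"))
	}
}

func main() {
	// U+000C (form feed) is not a legal XML character.
	input := []byte("<feed>\n<title>ok</title>\n<summary>bad \x0c char</summary>\n</feed>")
	for _, retries := range []int{0, 1} {
		out, err := recoverXML(input, retries)
		if err != nil {
			fmt.Printf("retries=%d: %v\n", retries, err)
			continue
		}
		fmt.Printf("retries=%d: ok, %q\n", retries, out)
	}
}
```

With retries=0 the decoder reports an illegal character code U+000C error on line 3. With retries=1 the summary line is dropped and the rest parses. Line-level removal can throw away valid content, which is presumably why a token-level strategy is preferable in a real parser.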
Parse with specification compliancy checking
RSS and Atom feeds should conform to their specifications (Atom's is fairly complex). By default, the Parse functions are not very restrictive about input feeds. To validate feeds, you can pass a custom FlagChecker in ParseOptions. If you really know what you are doing, you can enable or disable individual spec checks.
The error flags for each standard are listed in the corresponding package documentation:
- RSS : github.com/jloup/xml/feed/rss
- Atom : github.com/jloup/xml/feed/atom
Example:
// the input feed is not spec-compliant
f, err := os.Open("feed.txt")
if err != nil {
	fmt.Printf("Cannot open feed: %s\n", err)
	return
}
defer f.Close()
// require the input feed to be 100% spec-compliant...
flags := xmlutils.NewErrorChecker(xmlutils.EnableAllError)
// ...but tolerate Atom entries that have no <updated> element
flags.DisableErrorChecking("entry", atom.MissingDate)
options := feed.ParseOptions{extension.Manager{}, &flags}
myfeed, err := feed.Parse(f, options)
if err != nil {
	fmt.Printf("Cannot parse feed:\n%s\n", err)
	return
}
fmt.Printf("FEED '%s'\n", myfeed.Title)
Output:
Cannot parse feed:
in 'feed':
[MissingId]
feed's id should exist
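In the example above, every check starts enabled and a single one, MissingDate, is switched off for entry elements only. The following stdlib-only sketch shows one plausible way to model that "defaults plus per-element overrides" idea with bit flags. The names errorFlag, checker, newChecker, disable and enabled are hypothetical and do not describe how xmlutils actually implements its checker.

```go
package main

import "fmt"

// errorFlag is a hypothetical bit flag: one bit per spec violation.
type errorFlag uint32

const (
	missingID errorFlag = 1 << iota
	missingDate
	missingTitle
	enableAll = missingID | missingDate | missingTitle
)

// checker keeps a default flag set plus per-element overrides.
type checker struct {
	defaults errorFlag
	perElem  map[string]errorFlag
}

func newChecker(defaults errorFlag) checker {
	return checker{defaults: defaults, perElem: map[string]errorFlag{}}
}

// flagsFor returns the effective flags for an element name.
func (c *checker) flagsFor(elem string) errorFlag {
	if f, ok := c.perElem[elem]; ok {
		return f
	}
	return c.defaults
}

// disable clears flag f for elem only, leaving other elements untouched.
func (c *checker) disable(elem string, f errorFlag) {
	c.perElem[elem] = c.flagsFor(elem) &^ f
}

// enabled reports whether violation f should be reported inside elem.
func (c *checker) enabled(elem string, f errorFlag) bool {
	return c.flagsFor(elem)&f != 0
}

func main() {
	c := newChecker(enableAll)
	c.disable("entry", missingDate)
	fmt.Println(c.enabled("entry", missingDate)) // false
	fmt.Println(c.enabled("feed", missingDate))  // true
	fmt.Println(c.enabled("entry", missingID))   // true
}
```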
RSS and Atom extensions
Both formats allow third-party extensions. A few extensions have been implemented as examples, e.g. RSS dc:creator (github.com/jloup/xml/feed/rss/extension/dc).
Example:
package main

import (
	"fmt"
	"os"

	"github.com/jloup/xml/feed"
	"github.com/jloup/xml/feed/atom"
	"github.com/jloup/xml/feed/extension"
	"github.com/jloup/xml/feed/rss"
	"github.com/jloup/xml/feed/rss/extension/dc"
)

type ExtendedFeed struct {
	feed.BasicFeedBlock
	Entries []ExtendedEntry
}

type ExtendedEntry struct {
	feed.BasicEntryBlock
	Creator string // <dc:creator> is only present in RSS feeds
}

func (f *ExtendedFeed) PopulateFromAtomEntry(e *atom.Entry) {
	newEntry := ExtendedEntry{}
	newEntry.PopulateFromAtomEntry(e)
	f.Entries = append(f.Entries, newEntry)
}

func (f *ExtendedFeed) PopulateFromRssItem(i *rss.Item) {
	newEntry := ExtendedEntry{}
	newEntry.PopulateFromRssItem(i)
	creator, ok := dc.GetCreator(i)
	// we must check that the item actually has a dc:creator element
	if ok {
		newEntry.Creator = creator.String()
	}
	f.Entries = append(f.Entries, newEntry)
}

func main() {
	f, err := os.Open("rss.txt")
	if err != nil {
		fmt.Printf("Cannot open feed: %s\n", err)
		return
	}
	defer f.Close()
	// Manager is in github.com/jloup/xml/feed/extension
	manager := extension.Manager{}
	// add the dc extension (github.com/jloup/xml/feed/rss/extension/dc) to it
	dc.AddToManager(&manager)
	opt := feed.DefaultOptions
	// pass our custom extension Manager to ParseOptions
	opt.ExtensionManager = manager
	myfeed := &ExtendedFeed{}
	err = feed.ParseCustom(f, myfeed, opt)
	if err != nil {
		fmt.Printf("Cannot parse feed: %s\n", err)
		return
	}
	fmt.Printf("FEED '%s'\n", myfeed.Title)
	for i, entry := range myfeed.Entries {
		fmt.Printf("\t#%v '%s' by %s (%s)\n", i, entry.Title,
			entry.Creator,
			entry.Link)
	}
}
Output:
FEED 'Me, Myself and I'
	#0 'Breakfast' by Peter J. (http://example.org/2005/04/02/breakfast)
	#1 'Dinner' by Peter J. (http://example.org/2005/04/02/dinner)