  • Language
    Go
  • License
    MIT License

The robots.txt exclusion protocol implementation for the Go language

What

This is a robots.txt exclusion protocol implementation for the Go language (golang).

Build

To build and run the tests, run go test in the source directory.

Contribute

Warm welcome.

  • If desired, add your name to README.rst, section Who.
  • Run script/test && script/clean && echo ok
  • You can ignore linter warnings, but everything else must pass.
  • Send your change as a pull request or just a regular patch to the current maintainer (see section Who).

Thank you.

Usage

As usual, no special installation is required. Just add

import "github.com/temoto/robotstxt"

to your source, run go get, and you're ready.

1. Parse

First of all, you need to parse the robots.txt data. You can do this with the function FromBytes(body []byte) (*RobotsData, error), or with FromString for a string:

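// Parse from a byte slice.
robots, err := robotstxt.FromBytes([]byte("User-agent: *\nDisallow:"))
// Or parse from a string; plain assignment, since both variables already exist.
robots, err = robotstxt.FromString("User-agent: *\nDisallow:")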

As of 2012-10-03, FromBytes is the most efficient method; everything else is a wrapper around this core function.

There are a few convenience constructors for various purposes:

  • FromResponse(*http.Response) (*RobotsData, error) to init robots data from an HTTP response. It does not call response.Body.Close():

robots, err := robotstxt.FromResponse(resp)
resp.Body.Close()
if err != nil {
    log.Println("Error parsing robots.txt:", err.Error())
}
  • FromStatusAndBytes(statusCode int, body []byte) (*RobotsData, error) or FromStatusAndString, if you prefer to read the bytes (string) yourself. Passing the status code applies the following logic, in line with Google's interpretation of robots.txt files:

  • status 2xx -> parse body with FromBytes and apply rules listed there.
  • status 4xx -> allow all (even 401/403, as recommended by Google).
  • other (5xx) -> disallow all, consider this a temporary unavailability.
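
The status rules above can be sketched as a small decision function. This is only an illustration of the listed rules using the standard library, not the library's actual code; the names policy and policyForStatus are hypothetical. The list does not spell out codes outside the 2xx, 4xx and 5xx ranges, so the sketch reports them as unspecified rather than guessing.

```go
package main

import "fmt"

// policy names what a crawler should do with a robots.txt response.
// The type and its values are hypothetical and exist only for this sketch.
type policy string

const (
	parseBody   policy = "parse body"
	allowAll    policy = "allow all"
	disallowAll policy = "disallow all"
	unspecified policy = "unspecified"
)

// policyForStatus maps an HTTP status code to a policy,
// following the three rules listed above.
func policyForStatus(statusCode int) policy {
	switch {
	case statusCode >= 200 && statusCode < 300:
		return parseBody // apply the rules found in the body
	case statusCode >= 400 && statusCode < 500:
		return allowAll // includes 401 and 403
	case statusCode >= 500 && statusCode < 600:
		return disallowAll // treated as temporary unavailability
	default:
		return unspecified // not covered by the rules above
	}
}

func main() {
	for _, code := range []int{200, 403, 404, 503} {
		fmt.Printf("%d -> %s\n", code, policyForStatus(code))
	}
}
```

In real code you would normally just call FromStatusAndBytes and let the library apply this logic.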

2. Query

Parsing robots.txt content builds a kind of logic database, which you can query with (r *RobotsData) TestAgent(url, agent string) (bool).

Passing the agent explicitly is useful if you want to query for different agents. For single-agent users there is a more efficient option: RobotsData.FindGroup(userAgent string) returns a structure with a .Test(path string) method and a .CrawlDelay time.Duration field.

A simple query with an explicit user agent. Each call scans all rules.

allow := robots.TestAgent("/", "FooBot")

Or query several paths against the same user agent for better performance.

group := robots.FindGroup("BarBot")
group.Test("/")
group.Test("/download.mp3")
group.Test("/news/article-2012-1")
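
As a rough sketch of how a crawler might use a group, the program below filters a list of paths through a single Test method and pauses between fetches. To stay runnable without the library it uses a hypothetical pathTester interface and a stand-in prefixRules type; the value returned by FindGroup is assumed to satisfy the interface (with Test returning bool), and the pause would come from its CrawlDelay field.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// pathTester is a hypothetical interface for this sketch. The group
// returned by FindGroup is assumed to satisfy it via its Test method.
type pathTester interface {
	Test(path string) bool
}

// prefixRules is a stand-in used only to make the sketch runnable:
// it disallows any path that starts with one of its prefixes.
type prefixRules []string

func (r prefixRules) Test(path string) bool {
	for _, prefix := range r {
		if strings.HasPrefix(path, prefix) {
			return false
		}
	}
	return true
}

// allowedPaths returns the paths the group permits, in input order.
func allowedPaths(g pathTester, paths []string) []string {
	var allowed []string
	for _, p := range paths {
		if g.Test(p) {
			allowed = append(allowed, p)
		}
	}
	return allowed
}

func main() {
	group := prefixRules{"/download"}   // stand-in for robots.FindGroup("BarBot")
	crawlDelay := 10 * time.Millisecond // stand-in for group.CrawlDelay

	paths := []string{"/", "/download.mp3", "/news/article-2012-1"}
	for _, p := range allowedPaths(group, paths) {
		fmt.Println("fetch", p)
		time.Sleep(crawlDelay) // wait between requests
	}
}
```

The group is looked up once and reused for every path, which is the point of FindGroup compared with calling TestAgent for each path.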

Who

Honorable contributors (in undefined order):

  • Ilya Grigorik (igrigorik)
  • Martin Angers (PuerkitoBio)
  • Micha Gorelick (mynameisfiber)

Initial commit and other: Sergey Shepelev [email protected]

Flair

https://travis-ci.org/temoto/robotstxt.svg?branch=master https://goreportcard.com/badge/github.com/temoto/robotstxt
