What
====

This is an implementation of the robots.txt exclusion protocol for the Go
language (golang).


Build
=====

To build and run the tests, run `go test` in the source directory.


Contribute
==========

Warm welcome.

* If desired, add your name in README.rst, section Who.
* Run `script/test && script/clean && echo ok`
* You can ignore linter warnings, but everything else must pass.
* Send your change as a pull request or just a regular patch to the current
  maintainer (see section Who).

Thank you.


Usage
=====

As usual, no special installation is required, just::

    import "github.com/temoto/robotstxt"

then run `go get` and you're ready.
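Concretely::

    go get github.com/temoto/robotstxt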

1. Parse
^^^^^^^^

First of all, you need to parse the robots.txt data. You can do this with the
function `FromBytes(body []byte) (*RobotsData, error)` or its `string`
counterpart `FromString(body string) (*RobotsData, error)`::

    robots, err := robotstxt.FromBytes([]byte("User-agent: *\nDisallow:"))
    robots, err := robotstxt.FromString("User-agent: *\nDisallow:")

As of 2012-10-03, `FromBytes` is the most efficient method; everything else
is a wrapper around this core function.
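The snippets above discard the error; a minimal sketch of checking it before
use (the `log.Fatal` is purely illustrative error handling)::

    robots, err := robotstxt.FromString("User-agent: *\nDisallow:")
    if err != nil {
        log.Fatal("parsing robots.txt:", err)
    }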

There are a few convenience constructors for various purposes:

* `FromResponse(*http.Response) (*RobotsData, error)` to initialize robots
  data from an HTTP response. It *does not* call `response.Body.Close()`::

    robots, err := robotstxt.FromResponse(resp)
    resp.Body.Close()
    if err != nil {
        log.Println("Error parsing robots.txt:", err.Error())
    }
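  For completeness, a sketch of the full fetch-and-parse flow, assuming
  `net/http` is imported and using an illustrative URL::

    resp, err := http.Get("https://example.com/robots.txt")
    if err != nil {
        return err // network failure: no robots.txt verdict at all
    }
    robots, err := robotstxt.FromResponse(resp)
    resp.Body.Close()
    if err != nil {
        return err
    }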

* `FromStatusAndBytes(statusCode int, body []byte) (*RobotsData, error)` or
  `FromStatusAndString` if you prefer to read the bytes (string) yourself;
  see the sketch after this list. Passing the status code applies the
  following logic, in line with Google's interpretation of robots.txt files:

  * status 2xx -> parse body with `FromBytes` and apply the rules listed there.
  * status 4xx -> allow all (even 401/403, as recommended by Google).
  * other (5xx) -> disallow all, consider this a temporary unavailability.
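A sketch of reading the status and body yourself, assuming `net/http` and
`io/ioutil` are imported and, again, an illustrative URL::

    resp, err := http.Get("https://example.com/robots.txt")
    if err != nil {
        return err
    }
    body, err := ioutil.ReadAll(resp.Body)
    resp.Body.Close()
    if err != nil {
        return err
    }
    robots, err := robotstxt.FromStatusAndBytes(resp.StatusCode, body)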

2. Query
^^^^^^^^

Parsing robots.txt content builds a kind of logical database, which you can
query with `(r *RobotsData) TestAgent(url, agent string) bool`.

Passing the agent explicitly is useful if you want to query for different
agents. For single-agent use there is a more efficient option:
`RobotsData.FindGroup(userAgent string)` returns a structure with a
`.Test(path string)` method and a `.CrawlDelay time.Duration` field.

A simple query with an explicit user agent. Each call will scan all rules.

::

    allow := robots.TestAgent("/", "FooBot")

Or query several paths against the same user agent for better performance.

::

    group := robots.FindGroup("BarBot")
    group.Test("/")
    group.Test("/download.mp3")
    group.Test("/news/article-2012-1")
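The `.CrawlDelay` field mentioned above can be used to throttle a crawler; a
minimal sketch, assuming a zero duration means no `Crawl-delay` directive was
present::

    group := robots.FindGroup("BarBot")
    if group.Test("/download.mp3") {
        time.Sleep(group.CrawlDelay) // sleeping for zero is a no-op
        // fetch "/download.mp3" here
    }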


Who
===

Honorable contributors (in undefined order):

* Ilya Grigorik (igrigorik)
* Martin Angers (PuerkitoBio)
* Micha Gorelick (mynameisfiber)

Initial commit and other: Sergey Shepelev temotor@gmail.com


Flair
=====

.. image:: https://travis-ci.org/temoto/robotstxt.svg?branch=master
    :target: https://travis-ci.org/temoto/robotstxt

.. image:: https://codecov.io/gh/temoto/robotstxt/branch/master/graph/badge.svg
    :target: https://codecov.io/gh/temoto/robotstxt

.. image:: https://goreportcard.com/badge/github.com/temoto/robotstxt
    :target: https://goreportcard.com/report/github.com/temoto/robotstxt