robots.txt

Check rules

How do I know if my robot can access a resource? You can check resources with our API by sending a GET or POST request to https://api.robotstxt.io/v1/allowed. Every valid response contains a field called allowed, which indicates whether the resource can be crawled by the given user agent. See the examples below:

Example using GET request

Imagine that you want to check whether your robot (called AwesomeBot) can crawl content from https://example.com/comments. The request you would make is:

curl "https://api.robotstxt.io/v1/allowed?url=https://example.com/comments&agent=AwesomeBot"

The result will look like this:

{
    "url": "https://example.com/comments",
    "agent": "AwesomeBot",
    "allowed": true
}
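
If you prefer to make the same GET check from code rather than curl, here is a minimal Python sketch using only the standard library. It assumes the endpoint and query parameters behave exactly as described above:

import json
import urllib.parse
import urllib.request

# Build the query string with the resource URL and the user agent to check.
params = urllib.parse.urlencode({
    "url": "https://example.com/comments",
    "agent": "AwesomeBot",
})

# Call the allowed endpoint and parse the JSON response.
with urllib.request.urlopen(f"https://api.robotstxt.io/v1/allowed?{params}") as response:
    result = json.load(response)

print(result["allowed"])  # True if AwesomeBot may crawl the page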
            
Example using POST request

Imagine that you want to check whether your robot (called AwesomeBot) can crawl content from https://example.com/comments. With this method you need to send a JSON payload with two keys, url and agent. The request you would make is:

curl https://api.robotstxt.io/v1/allowed \
    -H 'Content-Type: application/json' \
    -d '{"url": "https://example.com/comments", "agent": "AwesomeBot"}'
            

The result will look like this:

{
    "url": "https://example.com/comments",
    "agent": "AwesomeBot",
    "allowed": true
}
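
The POST variant can also be scripted. Below is a minimal Python sketch using only the standard library, assuming the endpoint accepts the JSON payload exactly as shown in the curl example:

import json
import urllib.request

# JSON payload with the two required keys: url and agent.
payload = json.dumps({
    "url": "https://example.com/comments",
    "agent": "AwesomeBot",
}).encode("utf-8")

request = urllib.request.Request(
    "https://api.robotstxt.io/v1/allowed",
    data=payload,
    headers={"Content-Type": "application/json"},
)

# Attaching a body makes urllib issue a POST request by default.
with urllib.request.urlopen(request) as response:
    result = json.load(response)

print(result["allowed"])  # True if AwesomeBot may crawl the page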