~liljamo/robots.txt

7b980c91 Jonni Liljamo: fix: nginx 403 -> 444 (27 days ago)

# robots.txt

This repository contains my very opinionated (and possibly overengineered) setup for generating robots.txt files.

lists/ contains lists of user agents to disallow at root:

  • .txt files that list user agents, one per line
  • empty lines are ignored
  • lines starting with # are ignored
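
The list format above can be read with a short loop. This is a minimal sketch and not code taken from this repository: the function name read_agents is hypothetical, and it assumes a comment is a line whose first character is #.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): print the user agents
# found in one list file, one per line.
read_agents() {
    local file="$1" line
    # "|| [ -n "$line" ]" keeps a final line that lacks a trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        case "$line" in
            '')   continue ;;  # empty lines are ignored
            '#'*) continue ;;  # lines starting with # are ignored
        esac
        printf '%s\n' "$line"
    done < "$file"
}
```

Calling read_agents on a file under lists/ would print only the user agent lines, with comments and blank lines dropped.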

bases/ contains site specific base robots.txt files:

  • in the robots.txt format
  • names represent the domains they're served at

generate.sh is a bash script that generates the output robots.txt files:

  • $1 (required): path to the output file
  • $2 (optional): path to a base file
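
A generator with these two arguments could look roughly like the sketch below. It is not the actual generate.sh: the function name generate_robots, the LISTS_DIR variable, the output order (base file first, generated rules after) and the one-group-per-agent layout are all assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Sketch only, not the repository's generate.sh.
# Usage: generate_robots OUT_FILE [BASE_FILE]
generate_robots() {
    if [ "$#" -lt 1 ]; then
        echo "usage: generate_robots OUT_FILE [BASE_FILE]" >&2
        return 1
    fi
    local out="$1" base="${2:-}"
    local lists_dir="${LISTS_DIR:-lists}"  # assumed location of the lists
    local list agent

    : > "$out"  # create or truncate the output file

    # Assumption: the optional base file is copied first, unchanged.
    if [ -n "$base" ]; then
        cat "$base" >> "$out"
        printf '\n' >> "$out"
    fi

    for list in "$lists_dir"/*.txt; do
        [ -e "$list" ] || continue  # the glob matched nothing
        while IFS= read -r agent || [ -n "$agent" ]; do
            case "$agent" in ''|'#'*) continue ;; esac  # skip blanks and comments
            # Assumption: one group per agent, disallowed at root.
            printf 'User-agent: %s\nDisallow: /\n\n' "$agent" >> "$out"
        done < "$list"
    done
}
```

For example, `generate_robots out/robots.txt bases/example.com` would write the base file followed by one Disallow group per listed user agent. Both paths here are placeholders.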

generate-nginx.sh is a bash script that generates an nginx if block for blocking the same user agents at the web server level:

  • prints an if block that matches every user agent in the lists
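
One way such an if block might be produced is sketched below. This is not the actual generate-nginx.sh: the function name generate_nginx_block, the LISTS_DIR variable, the case-insensitive match on $http_user_agent and the 444 response (nginx's non-standard code that closes the connection without sending a response) are assumptions.

```shell
#!/usr/bin/env bash
# Sketch only, not the repository's generate-nginx.sh.
generate_nginx_block() {
    local lists_dir="${LISTS_DIR:-lists}"  # assumed location of the lists
    local list agent pattern=""

    for list in "$lists_dir"/*.txt; do
        [ -e "$list" ] || continue  # the glob matched nothing
        while IFS= read -r agent || [ -n "$agent" ]; do
            case "$agent" in ''|'#'*) continue ;; esac  # skip blanks and comments
            # Escape regex metacharacters so each name matches literally.
            agent="$(printf '%s' "$agent" | sed 's/[][\.*^$+?(){}|"]/\\&/g')"
            # Join the names into one alternation: A|B|C
            pattern="${pattern:+$pattern|}$agent"
        done < "$list"
    done

    [ -n "$pattern" ] || return 0  # nothing to block, print nothing

    # Assumption: case-insensitive match, then drop the connection.
    printf 'if ($http_user_agent ~* "(%s)") {\n    return 444;\n}\n' "$pattern"
}
```

The printed block would then be pasted or included inside a server or location context of the nginx configuration.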