~liljamo/robots.txt

opinionated sane robots.txt generation

clone

read-only
https://git.src.quest/~liljamo/robots.txt
read/write
git@git.src.quest:~liljamo/robots.txt

You can also use your local clone with git send-email.

# robots.txt

This repository contains my very opinionated (and possibly overengineered) robots.txt generation.

lists/ contains lists of user agents to disallow at root:

  • .txt files with one user agent per line
  • empty lines are ignored
  • lines starting with # are ignored
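For illustration, a file in lists/ might look like this (the bot names here are made up, not taken from the actual lists):

```
# example crawlers to block (hypothetical names)
ExampleBot

AnotherCrawler
```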

bases/ contains site-specific base robots.txt files:

  • in the robots.txt format
  • file names match the domains they're served on
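A base file is just ordinary robots.txt content that the generated disallow rules get combined with. A hypothetical bases/example.com might contain:

```
User-agent: *
Disallow: /admin/
```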

generate.sh is a bash script for generating output robots.txt files:

  • arg $1 (required): path to the output file
  • arg $2 (optional): path to a base file
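The logic the arguments above suggest could be sketched as a shell function. This is a hypothetical reconstruction, not the actual generate.sh: the function name, the single `Disallow: /` group, and the lists/ layout are assumptions based on the descriptions above.

```shell
generate_robots() {
  # Hypothetical sketch of generate.sh's logic; the real script may differ.
  local out="$1"        # $1 (required): path to the output file
  local base="${2:-}"   # $2 (optional): path to a base file

  # Start from the site-specific base file if one was given.
  if [[ -n "$base" ]]; then
    cat "$base" > "$out"
    echo >> "$out"
  else
    : > "$out"
  fi

  # One User-agent line per listed agent, skipping blank lines and
  # lines starting with #, then a single root disallow for the group.
  local list ua
  for list in lists/*.txt; do
    while IFS= read -r ua; do
      [[ -z "$ua" || "$ua" == \#* ]] && continue
      printf 'User-agent: %s\n' "$ua" >> "$out"
    done < "$list"
  done
  printf 'Disallow: /\n' >> "$out"
}
```

Listing several User-agent lines before one Disallow puts all the agents in a single robots.txt group, which keeps the output short.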

generate-nginx.sh is a bash script for generating an nginx if block that blocks the listed user agents at the web-server level:

  • prints an if block matching every user agent in the lists
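That could look roughly like the following sketch. It is a hypothetical reconstruction of generate-nginx.sh: the function name and the exact output format are assumptions, and it returns 444 (nginx's non-standard "close connection" status) per the repo's 403 -> 444 fix.

```shell
generate_nginx_if() {
  # Hypothetical sketch of generate-nginx.sh's logic; the real script may differ.
  local list ua agents=()

  # Collect every user agent from lists/, skipping blanks and # comments.
  for list in lists/*.txt; do
    while IFS= read -r ua; do
      [[ -z "$ua" || "$ua" == \#* ]] && continue
      agents+=("$ua")
    done < "$list"
  done

  # Join the agents into one case-insensitive alternation; nginx's 444
  # drops the connection without sending a response.
  local IFS='|'
  printf 'if ($http_user_agent ~* "(%s)") {\n    return 444;\n}\n' "${agents[*]}"
}
```

The resulting block can be pasted into a server context; matching in nginx rather than serving robots.txt catches crawlers that ignore robots.txt entirely.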