Blocking Certain Robots

Discussion in 'System Administration' started by BamaStangGuy, Jun 23, 2014.

  1. BamaStangGuy

    BamaStangGuy Active Member

    465
    136
    43
    May 25, 2014
    Ratings:
    +179
    Local Time:
    4:22 AM
    Who here blocks certain scraper sites like BoardReader? If so, what robots.txt are you using? What other similar services are out there like BoardReader that we should be blocking?
     
  2. eva2000

    eva2000 Administrator Staff Member

    29,033
    6,589
    113
    May 24, 2014
    Brisbane, Australia
    Ratings:
    +9,782
    Local Time:
    7:22 PM
    Nginx 1.13.x
    MariaDB 5.5
    Isn't it just

    Code:
    User-agent: THEUSERAGENTYOUWANTTOBLOCK
    Disallow: /
    There are a few blocked user agents in /usr/local/nginx/conf/block.conf for Centmin Mod installs (not active):
    Code:
        ## Block user agents
        set $block_user_agents 0;
    
        # Don't disable wget if you need it to run cron jobs!
        #if ($http_user_agent ~ "Wget") {
        #    set $block_user_agents 1;
        #}
    
        # Disable Akeeba Remote Control 2.5 and earlier
        if ($http_user_agent ~ "Indy Library") {
            set $block_user_agents 1;
        }
    
        # Common bandwidth hoggers and hacking tools.
        if ($http_user_agent ~ "libwww-perl") {
            set $block_user_agents 1;
        }
        if ($http_user_agent ~ "GetRight") {
            set $block_user_agents 1;
        }
        if ($http_user_agent ~ "GetWeb!") {
            set $block_user_agents 1;
        }
        if ($http_user_agent ~ "Go!Zilla") {
            set $block_user_agents 1;
        }
        if ($http_user_agent ~ "Download Demon") {
            set $block_user_agents 1;
        }
        if ($http_user_agent ~ "Go-Ahead-Got-It") {
            set $block_user_agents 1;
        }
        if ($http_user_agent ~ "TurnitinBot") {
            set $block_user_agents 1;
        }
        if ($http_user_agent ~ "GrabNet") {
            set $block_user_agents 1;
        }
    
        if ($http_user_agent ~ "dirbuster") {
            set $block_user_agents 1;
        }
    
        if ($http_user_agent ~ "nikto") {
            set $block_user_agents 1;
        }
    
        if ($http_user_agent ~ "SF") {
            set $block_user_agents 1;
        }
    
        if ($http_user_agent ~ "sqlmap") {
            set $block_user_agents 1;
        }
    
        if ($http_user_agent ~ "fimap") {
            set $block_user_agents 1;
        }
    
        if ($http_user_agent ~ "nessus") {
            set $block_user_agents 1;
        }
    
        if ($http_user_agent ~ "whatweb") {
            set $block_user_agents 1;
        }
    
        if ($http_user_agent ~ "Openvas") {
            set $block_user_agents 1;
        }
    
        if ($http_user_agent ~ "jbrofuzz") {
            set $block_user_agents 1;
        }
    
        if ($http_user_agent ~ "libwhisker") {
            set $block_user_agents 1;
        }
    
        if ($http_user_agent ~ "webshag") {
            set $block_user_agents 1;
        }
    
        if ($http_user_agent ~ "Acunetix-Product") {
            set $block_user_agents 1;
        }
    
        if ($http_user_agent ~ "Acunetix") {
            set $block_user_agents 1;
        }
    
        if ($block_user_agents = 1) {
            return 403;
        }
     
    Last edited: Jun 23, 2014
  3. BamaStangGuy

    Anyone know what user agent BoardReader uses?
     
  4. eva2000

    Have you checked your Nginx access logs?

    I recall setting up a BoardReader disallow for one client way back, but you can probably find info on the legitimate bot at http://boardreader.com/info/robots.htm. That wouldn't help with malicious bots masking themselves as a different bot/user agent, though.

    To remove a specific forum make sure you have the following in your robots.txt
    Code:
    User-agent: BoardReader
    Disallow: Forum ID: 2345, etc...
    
    To remove your board entirely, make sure you have the following in your robots.txt
    Code:
    User-agent: BoardReader
    Disallow: /
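
Checking the access logs, as suggested at the top of this post, can be sketched in Python. This is a minimal sketch that assumes the log uses Nginx's stock "combined" format, where the user agent is the last double-quoted field on each line; a custom log_format would need a different pattern. The sample lines, IP addresses and the name ExampleBot are made up for illustration and are not real BoardReader traffic.

```python
import re
from collections import Counter

# Assumption: "combined" log format, so the user agent is the last quoted field.
LAST_QUOTED_FIELD = re.compile(r'"([^"]*)"\s*$')


def count_user_agents(lines):
    """Return a Counter mapping each user agent string to its request count."""
    counts = Counter()
    for line in lines:
        match = LAST_QUOTED_FIELD.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Made-up sample lines; in practice, iterate over the lines of your access log.
SAMPLE_LOG = [
    '203.0.113.5 - - [23/Jun/2014:04:22:01 +0000] "GET /forums/ HTTP/1.1" 200 5120 "-" "ExampleBot/1.0"',
    '203.0.113.5 - - [23/Jun/2014:04:22:03 +0000] "GET /threads/1/ HTTP/1.1" 200 7311 "-" "ExampleBot/1.0"',
    '198.51.100.7 - - [23/Jun/2014:04:22:09 +0000] "GET / HTTP/1.1" 200 4096 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
]

if __name__ == "__main__":
    for agent, hits in count_user_agents(SAMPLE_LOG).most_common():
        print(hits, agent)
```

Sorting by hit count puts the busiest crawlers at the top, which makes it easier to find the exact string to put in a robots.txt User-agent line.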
    
     
  5. BamaStangGuy

    Code:
    User-agent: BoardReader
    Disallow: /
    
    User-agent: *
    Allow: /
    I want to allow everything for all bots that I don't specifically disallow. Does this look like the correct setup for that?

    Do I even need the Allow part?
     
  6. eva2000

    Yeah, that would work. I usually have it reversed, with the allow for * up top and any disallow entries after, but it shouldn't make a difference.
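
One way to sanity-check the ordering question is Python's standard urllib.robotparser. This is a rough sketch: it shows how that one parser reads the rules, and real crawlers may interpret robots.txt differently or ignore it. SomeOtherBot and the URL path are hypothetical placeholders.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt from the post above.
BLOCK_FIRST = """\
User-agent: BoardReader
Disallow: /

User-agent: *
Allow: /
"""

# The same rules with the wildcard group on top.
ALLOW_FIRST = """\
User-agent: *
Allow: /

User-agent: BoardReader
Disallow: /
"""

# No wildcard group at all, to test whether the Allow part is needed.
BLOCK_ONLY = """\
User-agent: BoardReader
Disallow: /
"""


def allowed(robots_txt, user_agent, url="/threads/example.1/"):
    """Return True if this parser would let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    variants = [("block first", BLOCK_FIRST), ("allow first", ALLOW_FIRST), ("block only", BLOCK_ONLY)]
    for name, rules in variants:
        print(name, allowed(rules, "BoardReader"), allowed(rules, "SomeOtherBot"))
```

With this parser, all three variants block BoardReader and allow the other bot, so the group order does not matter and the explicit Allow group is optional.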
     
  7. dorobo

    dorobo Active Member

    420
    104
    43
    Jun 6, 2014
    Ratings:
    +161
    Local Time:
    5:22 PM
    latest
    latest
    The problem with some spiders is they don't follow the rules of robots.txt

    Might as well block them through csf.deny
     
  8. eva2000

    Yeah, robots.txt is only a set of guidelines for bots; they don't have to abide by the rules laid out in it.

    That is what /usr/local/nginx/conf/block.conf is intended for in Centmin Mod :)
     
  9. BamaStangGuy

    It would be nice to be able to easily comment out certain areas of block.conf. It conflicts with the XenForo like system, for example, if all of it is left in.

    I would like to simply use the useragent blocking part.
     
  10. eva2000

    Just create your own /usr/local/nginx/conf/block_ua.conf file with these contents:

    Code:
      
    ## Block user agents
        set $block_user_agents 0;
    
        # Don't disable wget if you need it to run cron jobs!
        #if ($http_user_agent ~* "Wget") {
        #    set $block_user_agents 1;
        #}
    
        # Disable Akeeba Remote Control 2.5 and earlier
        if ($http_user_agent ~* "Indy Library") {
            set $block_user_agents 1;
        }
    
        # Common bandwidth hoggers and hacking tools.
        if ($http_user_agent ~* "libwww-perl") {
            set $block_user_agents 1;
        }
        if ($http_user_agent ~* "GetRight") {
            set $block_user_agents 1;
        }
        if ($http_user_agent ~* "GetWeb!") {
            set $block_user_agents 1;
        }
        if ($http_user_agent ~* "Go!Zilla") {
            set $block_user_agents 1;
        }
        if ($http_user_agent ~* "Download Demon") {
            set $block_user_agents 1;
        }
        if ($http_user_agent ~* "Go-Ahead-Got-It") {
            set $block_user_agents 1;
        }
        if ($http_user_agent ~* "TurnitinBot") {
            set $block_user_agents 1;
        }
        if ($http_user_agent ~* "GrabNet") {
            set $block_user_agents 1;
        }
    
        if ($http_user_agent ~* "dirbuster") {
            set $block_user_agents 1;
        }
    
        if ($http_user_agent ~* "nikto") {
            set $block_user_agents 1;
        }
    
        # Note: with ~* this short pattern also matches any user agent containing "sf".
        if ($http_user_agent ~* "SF") {
            set $block_user_agents 1;
        }
    
        if ($http_user_agent ~* "sqlmap") {
            set $block_user_agents 1;
        }
    
        if ($http_user_agent ~* "fimap") {
            set $block_user_agents 1;
        }
    
        if ($http_user_agent ~* "nessus") {
            set $block_user_agents 1;
        }
    
        if ($http_user_agent ~* "whatweb") {
            set $block_user_agents 1;
        }
    
        if ($http_user_agent ~* "Openvas") {
            set $block_user_agents 1;
        }
    
        if ($http_user_agent ~* "jbrofuzz") {
            set $block_user_agents 1;
        }
    
        if ($http_user_agent ~* "libwhisker") {
            set $block_user_agents 1;
        }
    
        if ($http_user_agent ~* "webshag") {
            set $block_user_agents 1;
        }
    
        if ($http_user_agent ~* "Acunetix-Product") {
            set $block_user_agents 1;
        }
    
        if ($http_user_agent ~* "Acunetix") {
            set $block_user_agents 1;
        }
    
        if ($block_user_agents = 1) {
            return 403;
        }
    
    and include it in your Nginx vhost:

    Code:
    include /usr/local/nginx/conf/block_ua.conf;
    Then restart Nginx.

    Ta da! :D
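
Before enabling the list, it can help to see which user agents it would catch. The sketch below approximates the case-insensitive `~*` matches from block_ua.conf using Python's re module. It assumes Python's regex engine treats these simple patterns the same way Nginx's PCRE does, which holds for plain literals like these but is not guaranteed for more complex expressions. The user agent strings in the usage lines are hypothetical.

```python
import re

# Patterns copied from the block_ua.conf listing above.
BLOCKED_PATTERNS = [
    "Indy Library", "libwww-perl", "GetRight", "GetWeb!", "Go!Zilla",
    "Download Demon", "Go-Ahead-Got-It", "TurnitinBot", "GrabNet",
    "dirbuster", "nikto", "SF", "sqlmap", "fimap", "nessus", "whatweb",
    "Openvas", "jbrofuzz", "libwhisker", "webshag", "Acunetix-Product",
    "Acunetix",
]


def matching_patterns(user_agent, patterns=BLOCKED_PATTERNS):
    """Return the patterns that match user_agent, ignoring case like ~* does."""
    return [p for p in patterns if re.search(p, user_agent, re.IGNORECASE)]


def is_blocked(user_agent, patterns=BLOCKED_PATTERNS):
    """Return True if at least one pattern matches, i.e. Nginx would return 403."""
    return bool(matching_patterns(user_agent, patterns))


if __name__ == "__main__":
    # Hypothetical user agent strings, for illustration only.
    for agent in ["sqlmap/1.0", "Mozilla/5.0 (compatible; ExampleBrowser)", "TransferTool/1.0"]:
        print(agent, is_blocked(agent), matching_patterns(agent))
```

The last sample is caught only by the "SF" entry, because "Transfer" contains "sf". Short patterns like that are worth reviewing against your own access logs before the include goes live.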
     
  11. pamamolf

    pamamolf Well-Known Member

    2,532
    231
    63
    May 31, 2014
    Ratings:
    +394
    Local Time:
    12:22 PM
    Nginx-1.13.x
    MariaDB 10.1.x
    What is the user agent name for Bing search?
     
  12. eva2000

  13. pamamolf

    Can we enable this globally, or must we enable it per vhost?