
Blocking Certain Robots

Discussion in 'System Administration' started by BamaStangGuy, Jun 23, 2014.

  1. BamaStangGuy

    BamaStangGuy Active Member

    Who here blocks scraper sites like BoardReader? If so, what robots.txt rules are you using? What other services similar to BoardReader are out there that we should be blocking?

     
  2. eva2000

    eva2000 Administrator Staff Member

    Isn't it just

    Code:
    User-agent: THEUSERAGENTYOUWANTTOBLOCK
    Disallow: /
    There are a few blocked user agents in /usr/local/nginx/conf/block.conf on Centmin Mod installs (not active by default):
    Code:
        ## Block user agents
        set $block_user_agents 0;
    
        # Don't disable wget if you need it to run cron jobs!
        #if ($http_user_agent ~ "Wget") {
        #    set $block_user_agents 1;
        #}
    
        # Disable Akeeba Remote Control 2.5 and earlier
        if ($http_user_agent ~ "Indy Library") {
            set $block_user_agents 1;
        }
    
        # Common bandwidth hoggers and hacking tools.
        if ($http_user_agent ~ "libwww-perl") {
            set $block_user_agents 1;
        }
        if ($http_user_agent ~ "GetRight") {
            set $block_user_agents 1;
        }
        if ($http_user_agent ~ "GetWeb!") {
            set $block_user_agents 1;
        }
        if ($http_user_agent ~ "Go!Zilla") {
            set $block_user_agents 1;
        }
        if ($http_user_agent ~ "Download Demon") {
            set $block_user_agents 1;
        }
        if ($http_user_agent ~ "Go-Ahead-Got-It") {
            set $block_user_agents 1;
        }
        if ($http_user_agent ~ "TurnitinBot") {
            set $block_user_agents 1;
        }
        if ($http_user_agent ~ "GrabNet") {
            set $block_user_agents 1;
        }
    
        if ($http_user_agent ~ "dirbuster") {
            set $block_user_agents 1;
        }
    
        if ($http_user_agent ~ "nikto") {
            set $block_user_agents 1;
        }
    
        if ($http_user_agent ~ "SF") {
            set $block_user_agents 1;
        }
    
        if ($http_user_agent ~ "sqlmap") {
            set $block_user_agents 1;
        }
    
        if ($http_user_agent ~ "fimap") {
            set $block_user_agents 1;
        }
    
        if ($http_user_agent ~ "nessus") {
            set $block_user_agents 1;
        }
    
        if ($http_user_agent ~ "whatweb") {
            set $block_user_agents 1;
        }
    
        if ($http_user_agent ~ "Openvas") {
            set $block_user_agents 1;
        }
    
        if ($http_user_agent ~ "jbrofuzz") {
            set $block_user_agents 1;
        }
    
        if ($http_user_agent ~ "libwhisker") {
            set $block_user_agents 1;
        }
    
        if ($http_user_agent ~ "webshag") {
            set $block_user_agents 1;
        }
    
        if ($http_user_agent ~ "Acunetix-Product") {
            set $block_user_agents 1;
        }
    
        if ($http_user_agent ~ "Acunetix") {
            set $block_user_agents 1;
        }
    
        if ($block_user_agents = 1) {
            return 403;
        }
     
    Last edited: Jun 23, 2014
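As an alternative to the long chain of if blocks above, the same user agent check can be condensed into a single map lookup. This is a minimal sketch, not the Centmin Mod default: the variable name $bad_ua is hypothetical (chosen so it does not collide with $block_user_agents if block.conf is also included), only some of the patterns are shown, and ~* makes the matches case-insensitive rather than the case-sensitive ~ used in block.conf. The map block must sit in the http {} context (for example in nginx.conf), while the check goes in each server {} block.

```nginx
# http {} context: map each request's User-Agent to 1 (block) or 0 (allow).
# $bad_ua is a hypothetical variable name. Regex variants are tested in the
# order they appear and the first match wins; no match falls back to default.
map $http_user_agent $bad_ua {
    default                0;
    "~*Indy Library"       1;
    ~*libwww-perl          1;
    ~*GetRight             1;
    ~*Go-Ahead-Got-It      1;
    ~*TurnitinBot          1;
    ~*dirbuster            1;
    ~*nikto                1;
    ~*sqlmap               1;
    ~*nessus               1;
    ~*Acunetix             1;
    # ...add the remaining patterns from block.conf here
}

# server {} context: "0" (or an empty value) is false in an if test,
# so only matched user agents get the 403.
server {
    # ...existing listen/server_name/root directives...
    if ($bad_ua) {
        return 403;
    }
}
```

Patterns containing spaces, such as "Indy Library", need to be quoted. Running nginx -t before restarting confirms the syntax.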
  3. BamaStangGuy

    BamaStangGuy Active Member

    Does anyone know what user agent BoardReader uses?
     
  4. eva2000

    eva2000 Administrator Staff Member

    Have you checked your Nginx access logs?

    I recall setting up a BoardReader disallow for a client a while back, but you can probably find the details for the legitimate bot at http://boardreader.com/info/robots.htm. That won't help if malicious bots disguise themselves with a different bot name or user agent, though.

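    To remove a specific forum, make sure you have the following in your robots.txt (Disallow takes a URL path, so replace the placeholder path below with the path of the forum you want excluded, e.g. the forum with ID 2345):
    Code:
    User-agent: BoardReader
    Disallow: /path/to/forum-2345/
    
    To remove your board entirely, make sure you have the following in your robots.txt: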
    Code:
    User-agent: BoardReader
    Disallow: /
    
     
  5. BamaStangGuy

    BamaStangGuy Active Member

    Code:
    User-agent: BoardReader
    Disallow: /
    
    User-agent: *
    Allow: /
    I want to allow everything for all bots that I don't specifically disallow. Does this look like the correct setup for that?

    Do I even need the Allow part?
     
  6. eva2000

    eva2000 Administrator Staff Member

    Yeah, that would work. I usually have it reversed, with the User-agent: * allow group at the top and any disallow entries after it, but the order shouldn't make a difference.
     
  7. dorobo

    dorobo Active Member

    The problem with some spiders is that they don't follow the rules in robots.txt.

    You might as well block them through csf.deny.
     
  8. eva2000

    eva2000 Administrator Staff Member

    Yeah, robots.txt only offers guidelines to bots; they don't have to abide by the rules laid out in it.

    That is what /usr/local/nginx/conf/block.conf is intended for in Centmin Mod :)
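Because robots.txt is only advisory, a BoardReader rule can also be enforced at the Nginx level in the same style as block.conf. This sketch assumes that BoardReader's User-Agent header contains the string "BoardReader" (the token used in its robots.txt instructions); confirm the actual string in your access logs before relying on it. The rule has to come after set $block_user_agents 0; and before the final if ($block_user_agents = 1) check.

```nginx
# Assumption: BoardReader's User-Agent contains "BoardReader" (verify in access logs).
# Place after "set $block_user_agents 0;" and before the final 403 check,
# so a match flags the request and the existing check returns 403.
if ($http_user_agent ~* "BoardReader") {
    set $block_user_agents 1;
}
```

Like any user agent rule, this does nothing against bots that fake their user agent string.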
     
  9. BamaStangGuy

    BamaStangGuy Active Member

    It would be nice to be able to easily comment out certain sections of block.conf. For example, if all of it is left enabled, it conflicts with XenForo's like system.

    I would like to use just the user agent blocking part.
     
  10. eva2000

    eva2000 Administrator Staff Member

    Just create your own /usr/local/nginx/conf/block_ua.conf file with the following contents:

    Code:
      
    ## Block user agents
        set $block_user_agents 0;
    
        # Don't disable wget if you need it to run cron jobs!
        #if ($http_user_agent ~* "Wget") {
        #    set $block_user_agents 1;
        #}
    
        # Disable Akeeba Remote Control 2.5 and earlier
        if ($http_user_agent ~* "Indy Library") {
            set $block_user_agents 1;
        }
    
        # Common bandwidth hoggers and hacking tools.
        if ($http_user_agent ~* "libwww-perl") {
            set $block_user_agents 1;
        }
        if ($http_user_agent ~* "GetRight") {
            set $block_user_agents 1;
        }
        if ($http_user_agent ~* "GetWeb!") {
            set $block_user_agents 1;
        }
        if ($http_user_agent ~* "Go!Zilla") {
            set $block_user_agents 1;
        }
        if ($http_user_agent ~* "Download Demon") {
            set $block_user_agents 1;
        }
        if ($http_user_agent ~* "Go-Ahead-Got-It") {
            set $block_user_agents 1;
        }
        if ($http_user_agent ~* "TurnitinBot") {
            set $block_user_agents 1;
        }
        if ($http_user_agent ~* "GrabNet") {
            set $block_user_agents 1;
        }
    
        if ($http_user_agent ~* "dirbuster") {
            set $block_user_agents 1;
        }
    
        if ($http_user_agent ~* "nikto") {
            set $block_user_agents 1;
        }
    
        if ($http_user_agent ~* "SF") {
            set $block_user_agents 1;
        }
    
        if ($http_user_agent ~* "sqlmap") {
            set $block_user_agents 1;
        }
    
        if ($http_user_agent ~* "fimap") {
            set $block_user_agents 1;
        }
    
        if ($http_user_agent ~* "nessus") {
            set $block_user_agents 1;
        }
    
        if ($http_user_agent ~* "whatweb") {
            set $block_user_agents 1;
        }
    
        if ($http_user_agent ~* "Openvas") {
            set $block_user_agents 1;
        }
    
        if ($http_user_agent ~* "jbrofuzz") {
            set $block_user_agents 1;
        }
    
        if ($http_user_agent ~* "libwhisker") {
            set $block_user_agents 1;
        }
    
        if ($http_user_agent ~* "webshag") {
            set $block_user_agents 1;
        }
    
        if ($http_user_agent ~* "Acunetix-Product") {
            set $block_user_agents 1;
        }
    
        if ($http_user_agent ~* "Acunetix") {
            set $block_user_agents 1;
        }
    
        if ($block_user_agents = 1) {
            return 403;
        }
    
    and include it in the server block of your Nginx vhost:

    Code:
    include /usr/local/nginx/conf/block_ua.conf;
    Then restart Nginx.

    Ta-da! :D
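On placement: block_ua.conf uses set and if, which Nginx accepts in server and location context but not directly in http, so the include belongs inside the server {} block of each vhost. A minimal sketch with a hypothetical domain:

```nginx
server {
    listen 80;
    server_name example.com www.example.com;   # hypothetical domain

    # User agent checks: sets $block_user_agents and returns 403 on a match
    include /usr/local/nginx/conf/block_ua.conf;

    # ...rest of the existing vhost configuration (root, locations, etc.)...
}
```

Running nginx -t before the restart will catch a misplaced include, for example one placed directly in the http {} block, where set is not allowed.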
     
  11. pamamolf

    pamamolf Premium Member Premium Member

    What is the user agent name for Bing's search crawler?
     
  12. eva2000

    eva2000 Administrator Staff Member

  13. pamamolf

    pamamolf Premium Member Premium Member

    Can we enable this globally, or must we enable it per vhost?