Twitter's robots.txt question:

Discussion in 'Web Development & Web Hosting' started by Hemat, Dec 24, 2010.

  1. Hemat

    Hemat New Member

    Twitter's robots.txt shows everything as disallowed, yet search engines are crawling and indexing everybody's profile pages. Why?
     
  2. karan1337

    karan1337 Byte Poster

    http://www.robotstxt.org/faq/blockjustbad.html
     
    Certifications: MCP, MCDST, MCTS, Brainbench: XP and Vista [Master]
    WIP: Bachelors:Computer Science
  3. dmarsh
    Honorary Member 500 Likes Award

    dmarsh Petabyte Poster

    Agreed, bad robots can simply ignore robots.txt, which makes it useless against them. In that case you will need to use a firewall and blacklist their IPs, or something similar.

    This does not really answer the question, however, as the large search engines should be using crawlers that respect robots.txt, so the question still stands.

    Are you sure the robots.txt is correctly set to block all access?
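
One way to check this is to feed the file to a parser and ask, per crawler, whether a profile URL may be fetched. The sketch below uses Python's standard `urllib.robotparser`. The robots.txt content, the `example.com` URL, the `SomeOtherBot` name, and the helper `allowed_agents` are all made up for illustration; this is not Twitter's actual file. It only shows how a blanket `Disallow: /` for `*` can coexist with a group that lets a named crawler in, which would make the file look like it blocks everything at a glance.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration only, NOT Twitter's real file.
# The named group applies to Googlebot; every other crawler falls back to "*".
SAMPLE_ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
"""


def allowed_agents(robots_txt, agents, url):
    """Return {agent: may_fetch} for each user agent, per the given rules."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


result = allowed_agents(
    SAMPLE_ROBOTS_TXT,
    ["Googlebot", "SomeOtherBot"],   # "SomeOtherBot" is a made-up crawler name
    "https://example.com/someuser",  # placeholder profile URL
)
print(result)
```

To test a live site instead, `RobotFileParser` also offers `set_url()` followed by `read()`, which downloads the file before you call `can_fetch()`.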
     
