
Twitter's robots.txt question:

Discussion in 'Web Development & Web Hosting' started by Hemat, Dec 24, 2010.

  1. Hemat

    Hemat New Member

    Twitter's robots.txt appears to disallow everything, but, surprisingly, search engines are still crawling and indexing everybody's profile pages. Why?
     
  2. karan1337

    karan1337 Byte Poster

    http://www.robotstxt.org/faq/blockjustbad.html
     
    Certifications: MCP, MCDST, MCTS, Brainbench: XP and Vista [Master]
    WIP: Bachelor's: Computer Science
  3. dmarsh

    dmarsh Terabyte Poster

    Agreed, bad robots can simply ignore robots.txt, which makes it useless against them. In that case you will need to use a firewall to blacklist their IPs, or something similar.
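As a rough illustration of the blacklisting idea, here is a minimal Python sketch using the standard `ipaddress` module. It assumes an application-level check; in practice this is usually done in the firewall or web server configuration instead. The `BLOCKED_NETWORKS` list and the `is_blocked` helper are hypothetical, and the ranges are documentation-only addresses, not real crawlers.

```python
import ipaddress

# Hypothetical blocklist: these are documentation-only ranges (RFC 5737),
# used here purely as placeholders for misbehaving crawlers' addresses.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.17/32"),
]


def is_blocked(client_ip: str) -> bool:
    """Return True if client_ip falls inside any blocked network."""
    addr = ipaddress.ip_address(client_ip)
    # An IPv6 address is simply never "in" an IPv4 network, so mixed
    # address families return False rather than raising.
    return any(addr in net for net in BLOCKED_NETWORKS)


if __name__ == "__main__":
    for ip in ("192.0.2.55", "198.51.100.17", "203.0.113.1"):
        print(ip, "blocked" if is_blocked(ip) else "allowed")
```

Unlike robots.txt, a check like this is enforced on the server side, so it does not depend on the crawler choosing to cooperate.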

    This does not really answer the question, however, as the large search engines should be using crawlers that respect robots.txt, so the question still stands.

    Are you sure the robots.txt is correctly set to block all access?
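One thing worth checking (an assumption, not confirmed in this thread): a robots.txt can give a named crawler its own group ahead of a catch-all `User-agent: *` / `Disallow: /` group. A compliant crawler obeys only the group that matches it, so the catch-all never applies to that crawler. The sketch below uses Python's standard `urllib.robotparser` to show this. The file contents are hypothetical, not Twitter's actual robots.txt, and `can_crawl` is a made-up helper name.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt (NOT Twitter's real file): a named group for one
# crawler, followed by a catch-all group that disallows everything.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /search

User-agent: *
Disallow: /
"""


def can_crawl(agent: str, url: str) -> bool:
    """Check whether `agent` may fetch `url` under ROBOTS_TXT."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for agent in ("Googlebot", "SomeOtherBot"):
        for url in ("https://twitter.com/someuser", "https://twitter.com/search"):
            print(f"{agent:13} {url:30} {can_crawl(agent, url)}")
```

Here Googlebot matches its own group, so profile-style paths such as `/someuser` are allowed and only `/search` is blocked. Any other crawler falls through to `User-agent: *` and is blocked everywhere. Python's parser applies the first matching rule in a group and treats `*` inside a path literally, so a particular search engine may interpret the same file somewhat differently.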
     
    Certifications: CITP, BSc, HND, SCJP, SCJD, SCWCD, SCBCD, SCEA, N+, Sec+, Proj+, Server+, Linux+, MCTS, MCPD, MCSA, MCITP, CCDH
