Review: Webbots, Spiders, and Screen Scrapers

Discussion in 'Articles, Reviews and Interviews' started by tripwire45, May 4, 2007.

  1. tripwire45
    Honorary Member

    tripwire45 Zettabyte Poster

    13,493
    179
    287
    Webbots, Spiders, and Screen Scrapers: A Guide to Developing Internet Agents with PHP/CURL
    Author: Michael Schrenk
    Format: Paperback 328 pages
    Publisher: No Starch Press (March 30, 2007)
    ISBN-10: 1593271204
    ISBN-13: 978-1593271206

    Review by James Pyles
    May 4, 2007

    The book assumes two things about the reader, which is good. If you aren't part of the "assumed" group, this book won't be very interesting, or at least not very useful. The first assumption is that you know how to program. The book doesn't say to what degree, so I'll assume that basic programming skills will be sufficient. The second assumption is a little more specific: you'll need at least a basic understanding of PHP. If you need help in these areas, try reading books like Beginning PHP5, Learning PHP5, and perhaps some other beginner's programming book.

    As with most books, and especially most programming books, there's a companion website, which in this case is http://www.schrenk.com/nostarch/webbots/. Once there, you can download code libraries and sample scripts. The only provision is that you can't use these materials commercially (which is a bummer if you do this professionally...still, it's best to play by the rules).

    Since Schrenk was so polite as to build a promotional website for the public, I decided to check it out. I found it interesting that the author refers to himself in the first person plural on his home page, especially when he talks about writing this book: "We offer both traditional online services as well as advanced strategies incorporating automated browsing agents called webbots. In fact, we wrote the book on webbots." I checked, and Schrenk is the sole author of the book, so "we" must be an attempt to create the illusion of having a corporate staff when, in fact, Michael Schrenk is the corporate staff.

    Oh, the book. I'm supposed to be reviewing the book (and I'll try not to hold it against the author that he mentions being a fan of The Brady Bunch). OK, here goes. Learning the skills taught in this book is a little like learning to be "M": developing and sending your "double-oh" agents out to solve problems and retrieve information. When you develop webbots, you are creating "representatives" of your intentions to the web (for good or for ill). Web robots tend to have a bad rep since the general public is almost always unaware of them until some cracker uses them to steal their identity, to spam them, or to load malicious adware on their PCs. Like most things, however, there is both a light and a dark side to consider.

    If you specialize, or plan to specialize, in developing webbots for corporate use, this book contains all the information, skills, and tools you'll need to get going. Schrenk presents the material both with the authority that his eleven years of programming experience gives him and in a friendly, easy-to-read style. You don't necessarily have to "speak geek" to read this book.

    The book's assumptions are spot on. If you don't have programming experience in general and PHP experience in particular, your learning curve will be steep, if not just plain vertical. On the other hand, you don't have to be a programming genius to use this book's content effectively. I'd say the text's target audience falls in the beginner to intermediate range. Some of the screenshots indicate that the author develops on a Microsoft platform, but the software tools he recommends (PHP, CURL, and MySQL) are freely available downloads. Using this book to learn webbot development won't cause you to break into your piggy bank (unless you can't afford the price of this text), plus we Linux people can also participate.

    Pay special attention to Chapter 28, Keeping Webbots Out of Trouble. Earlier, I mentioned the "dark side" of webbot development. It would be just as easy to use this information to develop "pain-in-the-butt" bots that do everything from annoying millions of Internet users to committing crimes against them. Take very seriously the ethical and legal standards that govern legitimate webbot development. These "critters" have a valued place on the web, but they can be very much misused. In the words of Uncle Ben (or rather Stan Lee), "With great power comes great responsibility." It's like learning to drive a car. It's not enough to learn how to drive adequately. You must also practice driving safely. Don't hurt anyone. With that in mind, ladies and gentlemen, start your engines. This book is a great ride.
     
    Certifications: A+ and Network+
  2. tripwire45
    Honorary Member

    tripwire45 Zettabyte Poster

    13,493
    179
    287
    I got a nice email from the author thanking me for my review. Turns out he really does have a staff and they did assist in getting the book written, so the "we" is authentic. :wink:
     
    Certifications: A+ and Network+
  3. JasonGawker

    JasonGawker New Member

    1
    0
    1
    Hello

    I read the ***removed spam link***, and I found it highly informative, cutting edge, and usually it goes straight to the point. I liked the illustrations and the code snippets as well, they're very useful. I worked on my private spider myself and it's up and running, and that's only under a few months of work. Of course I recommend it to anyone interested in these topics.
     
  4. dmarsh

    dmarsh Terabyte Poster

    3,782
    302
    184
    I think the editing of the above post is a little ironic.

    Trip, since you don't code in PHP and, I presume, know little about spiders and bots, aren't you essentially spamming the forums...?

    In fact, with 2600 views in under 20 minutes, I can't help but think this post is being spidered!
     
    Certifications: CITP, BSc, HND, SCJP, SCJD, SCWCD, SCBCD, SCEA, N+, Sec+, Proj+, Server+, Linux+, MCTS, MCPD, MCSA, MCITP, CCDH
  5. Sparky
    Highly Decorated Member Award

    Sparky Zettabyte Poster Moderator

    10,191
    299
    319
    Posted in 2007 mate 8)
     
    Certifications: MSc MCSE MCSA:M MCSA:S MCITP:EA MCTS(x5) Security+ Network+ A+
    WIP: Exchange 2007\2010
  6. dmarsh

    dmarsh Terabyte Poster

    3,782
    302
    184
    Oh, my bad, I missed the date on the OP and didn't realise it was a zombie thread... :oops:
     
    Certifications: CITP, BSc, HND, SCJP, SCJD, SCWCD, SCBCD, SCEA, N+, Sec+, Proj+, Server+, Linux+, MCTS, MCPD, MCSA, MCITP, CCDH
