Analysing text files to obtain statistics on their content

Discussion in 'Scripting & Programming' started by Davo1977, Jun 23, 2008.

  1. Davo1977

    Davo1977 New Member

    5
    0
    0
    perl assignment

    I need to write a Perl program that analyses a text file and reports statistics on its contents. It should check whether a filename has been provided as an argument and, if not, prompt for one and accept it from the keyboard. The filename must be in MS-DOS format, with a name of no more than 8 characters. The file extension is optional, but if it is given it must be .txt, in either upper or lower case. If no extension is given, .TXT should be added to the end of the filename. If the filename is not in the right format, the program should display a suitable error message and end at that point. It should then check whether the file exists under the filename given. If it doesn't, an error message should be displayed and the program should end. If the file exists but is empty, an error message should likewise be displayed and the program should end. If the file exists and has content, it should be read and crude statistics displayed on the number of characters, words, lines, sentences and paragraphs in the file.

    I am really struggling with this assignment, as the course study notes are not good enough to help me.

    Thompsonelvis@aol.com
     
    WIP: CIW Website Design Manager
  2. Maruchino

    Maruchino Bit Poster

    22
    0
    4
    Not one to bash your post, but what are you asking? You simply state that you cannot do the project and supply your email address. Do you expect someone to complete it and email it to you?

    Think man think..
     
  3. hbroomhall

    hbroomhall Petabyte Poster Gold Member

    6,623
    115
    224
    I replied to this earlier - but somehow the reply has been lost in a black hole! (Glitches on CF?)

    First - welcome to CF!

    Second - not a good idea to put your email address in a posting, unless you *want* that mailbox to be full of spam!

    On to the main question. I think you need to let us see how far you have got with it. I'm not going to give you 'the answer' but I will point out where you may be going wrong.

    The golden rule for programming is to break the problem down into pieces. Code up those pieces, and then you will know how to deal with the problem as a whole. If you are going to be doing Web programming you *have* to be able to do this!
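As a rough illustration of what one "piece" might look like (a sketch only, not a model answer): the "argument or prompt" step can be written as its own small sub and tried out in isolation. The name get_filename and the idea of passing in the input handle are assumptions made here so the piece can be tested without a keyboard.

```perl
use strict;
use warnings;

# Hypothetical helper: return the filename from the argument list,
# or prompt for one on the given input handle when no argument was supplied.
sub get_filename {
    my ($args, $in) = @_;
    return $args->[0] if @$args;        # got a filename as an argument
    print "Please enter a filename: ";
    my $name = <$in>;
    die "No filename given\n" unless defined $name;
    chomp $name;                        # strip the trailing newline
    return $name;
}
```

In the real program the call would be something like get_filename(\@ARGV, \*STDIN). Once that piece behaves, move on to the next one (checking the name, checking the file, counting).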

    Harry.
     
    Certifications: ECDL A+ Network+ i-Net+
    WIP: Server+
  4. Davo1977

    Davo1977 New Member

    5
    0
    0
    I am very new to Perl and have managed to put together this code using examples from various books. Could anyone look over it and suggest how it could be improved?

    #!/usr/bin/perl

    use strict;
    use warnings;

    if ($#ARGV == -1) #no filename provided as a command line argument.
    {
    print("Please enter a filename: ");
    $filename = <STDIN>;
    chomp($filename);
    }
    else #got a filename as an argument.
    {
    $filename = $ARGV[0];
    }

    #perform the specified checks
    #check if filename is valid, exit if not
    if ($filename !~ m^/[a-z]{1,7}\.TXT$/i)
    {
    die("File format not valid\n");)
    }

    if ($filename !~ m/\.TXT$/i)
    {
    $filename .= ".TXT";
    }

    #check if filename is actual file, exit if it is.
    if (-e $filename)
    {
    die("File does not exist\n");
    }

    #check if filename is empty, exit if it is.
    if (-s $filename)
    {
    die("File is empty\n");
    }

    my $i = 0;
    my $p = 1;
    my $words = 0;
    my $chars = 0;

    open(READFILE, "<$data1.txt") or die "Can't open file '$filename: $!";

    #then use a while loop and series of if statements similar to the following
    while (<READFILE>) {
    chomp; #removes the input record Separator
    $i = $.; #"$". is the input record line numbers, $i++ will also work
    $p++ if (m/^$/); #count paragraphs
    $my @t = split (/\s+/); #split sentences into "words"
    $words += @t; #add count to $words
    $chars += tr/ //c; #tr/ //c count all characters except spaces and add to $chars
    }


    #display results
    print "There are $i lines in $data1\n";
    print "There are $p Paragraphs in $data1\n";
    print "There are $words in $data1\n";
    print "There are $chars in $data1\n";

    close(READFILE);
     
    WIP: CIW Website Design Manager
  5. hbroomhall

    hbroomhall Petabyte Poster Gold Member

    6,623
    115
    224
    Well - it seems a fairly good start. However, the way it reads suggests that it has been copied from at least two different places. Do you actually understand the code? Because if you do, some of the changes should be obvious.

    I'll comment on some of the things I spotted. That isn't to say there aren't other points:
    On the filename check:
    If you are using /i then the a-zA-Z is redundant - just use a-z. (Strictly you should use [:alpha:] instead of [a-z], but I wouldn't ding a beginner for that!)
    You are insisting on the .TXT being there - not allowing it to be optional.
    You aren't anchoring the match to the beginning of the filename (use ^).
    You allow a zero length part before the . - I would say that was a grey area - I'd insist on at least one character myself.

    On opening the file:
    You have got your filename above - why introduce something else?
    This is muddled. data1.txt was a literal filename - not a variable. If you wanted to use a variable it would be $file (or $filename from higher up).

    In the loop:
    You can't use my like this ($my @t)! my isn't a variable. And you use my on a variable just once, but the loop here will try and reuse it. Take the 'my @t' out of the loop.
    The comment doesn't match the code. The code is counting words.
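
To make the points above concrete, here is one possible shape for the checks and the counting loop. It is a sketch under assumptions, not the one right answer: the sub names are made up for illustration, the name part is limited to letters (as in the posted regex), sentences are counted crudely as runs of . ! or ?, and paragraphs are counted as runs of non-blank lines rather than by the blank-line count in the posted code. Note that -e is true when the file exists, -s is true when it has a non-zero size and -z is true when it is empty, so the posted tests are the wrong way round.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the filename with .TXT appended when no
# extension was given, or nothing when the name does not fit the pattern.
# Assumption: letters only in the name part, as in the posted regex.
sub normalise_filename {
    my ($name) = @_;
    return unless $name =~ /^[a-z]{1,8}(?:\.txt)?\z/i;
    $name .= '.TXT' unless $name =~ /\.txt\z/i;
    return $name;
}

# Hypothetical helper: return an error message for a missing or empty
# file, or nothing when the file is usable.
sub file_problem {
    my ($filename) = @_;
    return "File does not exist\n" unless -e $filename;
    return "File is empty\n" if -z $filename;    # -z: zero size; -s: non-zero size
    return;
}

# Hypothetical helper: read the file once and return crude counts.
sub count_stats {
    my ($filename) = @_;
    open(my $fh, '<', $filename)
        or die "Can't open file '$filename': $!\n";

    my %n = (lines => 0, words => 0, chars => 0, sentences => 0, paragraphs => 0);
    my $in_para = 0;    # true while inside a run of non-blank lines
    my @t;              # declared once, outside the loop
    while (my $line = <$fh>) {
        chomp $line;
        $n{lines}++;
        @t = split ' ', $line;              # whitespace-separated "words"
        $n{words} += @t;
        $n{chars} += ($line =~ tr/ //c);    # every character except spaces
        $n{sentences}++ while $line =~ /[.!?]+/g;    # crude: runs of . ! ?
        if (@t) {
            $n{paragraphs}++ unless $in_para;    # first line of a new paragraph
            $in_para = 1;
        }
        else {
            $in_para = 0;                        # blank line ends the paragraph
        }
    }
    close($fh);
    return %n;
}
```

The main program would then call normalise_filename on the name it was given and die with "File format not valid" if nothing comes back, die with whatever file_problem returns, and finally print the counts from count_stats using $filename throughout rather than a second variable.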


    Harry.
     
    Certifications: ECDL A+ Network+ i-Net+
    WIP: Server+
