Analysing text files to obtain statistics on their content

Davo1977 · Jun 23, 2008

Perl assignment

I need to analyse a text file to obtain statistics on its contents. The program should check whether an argument has been provided and, if not, prompt for and accept a filename from the keyboard. The filename should be checked to ensure it is in MS-DOS format, with a name no longer than 8 characters. The file extension is optional, but if it is given it must be .txt, in either upper or lower case. If an extension isn't given, .TXT should be added to the end of the filename. If the filename isn't in the right format, the program should display a suitable error message and end at that point.

The program should then check whether the file exists. If it doesn't, it should display an error message and end. If the file exists but is empty, it should likewise display an error message and end. If the file exists and contains text, it should be read and crude statistics displayed on the number of characters, words, lines, sentences and paragraphs in the file.

I am really struggling with this assignment, as the course study notes are simply not good enough to help me.

[email address redacted]

Maruchino · Jun 23, 2008

Not one to bash your post, but what are you asking? You simply state that you cannot do the project and supply your email address. Do you expect someone to complete it and email it to you?

Think, man, think.

hbroomhall · Jun 23, 2008

I replied to this earlier - but somehow the reply has been lost in a black hole! (Glitches on CF?)

First - welcome to CF!

Second - not a good idea to put your email address in a posting, unless you *want* that mailbox to be full of spam!

On to the main question. I think you need to let us see how far you have got with it. I'm not going to give you 'the answer' but I will point out where you may be going wrong.

The golden rule for programming is to break the problem down into pieces. Code up those pieces, and then you will know how to deal with the problem as a whole. If you are going to be doing Web programming you *have* to be able to do this!

Harry.

Davo1977 · Jun 24, 2008

I am very new to Perl and have managed to put this code together using examples from various books. Could anyone look over it and suggest how it could be improved?

#!/usr/bin/perl

use strict;
use warnings;

my $filename;

if ($#ARGV == -1)    # no filename provided as a command line argument
{
    print("Please enter a filename: ");
    $filename = <STDIN>;
    chomp($filename);
}
else                 # got a filename as an argument
{
    $filename = $ARGV[0];
}

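# perform the specified checks
# check if filename is valid, exit if not: 1 to 8 letters or digits,
# optionally followed by .TXT in either case
if ($filename !~ m/^[a-z0-9]{1,8}(\.TXT)?$/i)
{
    die("File format not valid\n");
}

# add the default extension if none was given
if ($filename !~ m/\.TXT$/i)
{
    $filename .= ".TXT";
}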

# check if filename is an actual file, exit if it is not
if (! -e $filename)
{
    die("File does not exist\n");
}

# check if the file is empty, exit if it is
if (-z $filename)
{
    die("File is empty\n");
}

my $i = 0;
my $p = 1;
my $words = 0;
my $chars = 0;

open(READFILE, "<", $filename) or die "Can't open file '$filename': $!";

# read the file line by line and update the counts
while (<READFILE>) {
    chomp;                  # removes the input record separator
    $i = $.;                # $. is the input line number; $i++ would also work
    $p++ if (m/^$/);        # count paragraphs (a blank line separates two)
    my @t = split(' ');     # split the line into "words"
    $words += @t;           # add the word count to $words
    $chars += tr/ //c;      # count all characters except spaces and add to $chars
}

# display results
print "There are $i lines in $filename\n";
print "There are $p paragraphs in $filename\n";
print "There are $words words in $filename\n";
print "There are $chars characters (excluding spaces) in $filename\n";

close(READFILE);

hbroomhall · Jun 24, 2008

Davo1977 said: ↑

This is what I have done so far on the following subject. Could anybody elaborate on it, please?

Well - it seems a fairly good start. However, the way it reads suggests that it has been copied from two (at least) different places. Do you actually understand the code? Because if you do, some of the changes should be obvious.

I'll comment on some of the things I spotted. That isn't to say that there aren't other points:

Davo1977 said: ↑

#check if filename is valid, exit if not
if ($filename !~ m/[a-zA-Z]{0,7}\.TXT$/i)

If you are using /i then the a-zA-Z is redundant - just use a-z. (Strictly you should use [[:alpha:]] instead of [a-z], but I wouldn't ding a beginner for that!)
You are insisting on the .TXT being there - not allowing it to be optional.
You aren't anchoring the match to the beginning of the filename (use ^).
You allow a zero length part before the . - I would say that was a grey area - I'd insist on at least one character myself.
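
Putting those four points together, the check might look something like the sketch below. The sub name normalise_filename is made up for illustration and isn't from your post. I've also assumed the name part may be up to 8 letters or digits, going by the assignment text; real MS-DOS names allow a few other symbols as well, which this doesn't handle.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): returns the filename,
# with .TXT appended if no extension was given, or undef if it is not valid.
sub normalise_filename {
    my ($name) = @_;

    # ^ and $ anchor both ends; /i makes [a-z] and \.txt case-insensitive.
    # {1,8} insists on at least one character; (?:\.txt)? makes the
    # extension optional.
    return unless $name =~ /^[a-z0-9]{1,8}(?:\.txt)?$/i;

    # Add the default extension only when none was supplied.
    $name .= ".TXT" unless $name =~ /\.txt$/i;
    return $name;
}

for my $candidate ("data1.txt", "notes", "REPORT.TXT", "waytoolongname.txt", ".txt") {
    my $result = normalise_filename($candidate);
    print "$candidate => ", (defined $result ? $result : "not valid"), "\n";
}
```

Returning undef for a bad name leaves it to the caller to decide whether to die with an error message, which keeps the check itself easy to test on its own.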

Davo1977 said: ↑

my $file = "data1.txt";

You have got your filename above - why introduce something else?

Davo1977 said: ↑

open(READFILE, "<$data1.txt") or die "Can't open file '$data1.txt': $!";

This is muddled. data1.txt was a literal filename, not a variable, so "$data1.txt" asks Perl to interpolate a variable called $data1 that doesn't exist. If you wanted to use a variable it would be $file (or $filename from higher up).
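
For what it's worth, here is a rough sketch of an open that uses the variable. The three-argument form and the lexical handle ($fh) are my own preference rather than anything in your post, and the file here is a scratch file created with File::Temp purely so the snippet runs on its own.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Scratch file so the example is self-contained; it stands in for the
# file whose name the user supplied.
my ($tmp, $filename) = tempfile(SUFFIX => ".txt", UNLINK => 1);
print $tmp "first line\nsecond line\n";
close($tmp);

# Use the variable that already holds the name. Passing the mode as a
# separate argument means nothing in the filename can be mistaken for it.
open(my $fh, "<", $filename) or die "Can't open file '$filename': $!";
my @lines = <$fh>;
close($fh);

print "Read ", scalar(@lines), " lines from $filename\n";
```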

Davo1977 said: ↑

#then use a while loop and series of if statements similar to the following
while (<READFILE>) {
chomp; #removes the input record Separator
$i = $.; #"$". is the input record line numbers, $i++ will also work
$p++ if (m/^$/); #count paragraphs
$my @t = split (/\s+/); #split sentences into "words" + store them in @t

You can't use my like this! It is a keyword, not a variable, so it takes no $ in front. Either write 'my @t = ...' inside the loop, which declares a fresh array on each pass, or declare 'my @t;' once before the loop and just assign to @t inside it.

Davo1977 said: ↑

$words += @t; #count all characters except spaces and add to $chars

The comment doesn't match the code. The code is counting words.
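
As a rough illustration of the loop body with the declaration fixed and the comments lined up with what each statement does, something like the following would work. The sub name count_stats and the sample lines are invented for the example, and I've fed it a list of lines rather than a filehandle so it can be tried without a file.

```perl
use strict;
use warnings;

# Hypothetical helper: takes the lines of a file and returns crude counts
# of lines, paragraphs, words and non-space characters.
sub count_stats {
    my @lines = @_;
    my ($line_count, $paragraphs, $words, $chars) = (0, 1, 0, 0);

    for my $line (@lines) {
        chomp(my $text = $line);          # work on a copy without the newline
        $line_count++;
        $paragraphs++ if $text =~ /^$/;   # a blank line starts a new paragraph
        my @t = split(' ', $text);        # split the line into words
        $words += @t;                     # add the number of words on this line
        $chars += ($text =~ tr/ //c);     # count every character except spaces
    }
    return ($line_count, $paragraphs, $words, $chars);
}

my ($l, $p, $w, $c) = count_stats("Hello world.\n", "\n", "Second paragraph here.\n");
print "lines=$l paragraphs=$p words=$w chars=$c\n";
```

Note that the paragraph count is still crude: two blank lines in a row, or a blank line at the end of the file, will each bump it by one.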

Harry.
