The purpose of this research project is to discover the magical world of Perl, scripting language extraordinaire!
Your report needs to cover the origins/history of Perl, common uses of Perl, its prospects for portability, and a comparison/contrast with C (there are many common features and some distinct differences!). You can also earn bonus credit for including an appendix with a biography of Larry Wall.
Support code can be chosen from the following suggestions. (Note: Any code below that totals over 4 levels will count as a program as well!)
deHTML (Level 3)
remove all HTML tags, center the title and any headings, place a blank line between paragraphs, use * for unordered lists or number ordered lists, put *'s around bold/strong text, put _'s around underlined text, put ~'s around italics/emphasized text
add (Level 1.5) to process the 'type' field for ordered lists; legal types are A, a, I, i, and 1
add (Level 1.5) to process nested lists correctly (cycle through *, +, @, o, and . for unordered lists and cycle through 1, I, A, i, and a for ordered lists)
add (Level 3) to process tables
add (Level 1.5) to process nested tables correctly
add (Level 5) to process CSS
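The inline-markup and nested-bullet rules above can be sketched as follows. This is only a starting point under stated assumptions: the sub names convert_inline and bullet_for_depth are made up for illustration, tags are assumed to be well formed, and a real solution would probably want a proper HTML parser rather than regular expressions.

```perl
use strict;
use warnings;

# Bullet characters for nested unordered lists, cycled by depth (from the spec).
my @BULLETS = ('*', '+', '@', 'o', '.');

# Return the bullet for a nesting depth (0 = outermost list).
# Hypothetical helper name; the cycle order comes from the assignment text.
sub bullet_for_depth {
    my ($depth) = @_;
    return $BULLETS[ $depth % @BULLETS ];
}

# Replace simple inline tags with plain-text markers, then drop all other tags.
# Assumption: tags are well formed and no attribute value contains '>'.
sub convert_inline {
    my ($html) = @_;
    $html =~ s{<\s*/?\s*(?:b|strong)\b[^>]*>}{*}gi;   # bold/strong   -> *
    $html =~ s{<\s*/?\s*u\b[^>]*>}{_}gi;              # underline     -> _
    $html =~ s{<\s*/?\s*(?:i|em)\b[^>]*>}{~}gi;       # italics/em    -> ~
    $html =~ s{<[^>]*>}{}g;                           # everything else is removed
    return $html;
}
```

For example, convert_inline('A <b>bold</b> word') gives 'A *bold* word', and bullet_for_depth(1) gives '+'.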
command line:
name of file to 'deHTML'; fixed output is sent to screen — user can redirect
-i# set paragraph indentation to # spaces
-l# set line length to # characters (for use in centering and justification)
-jA justify paragraphs; the A can be f or nothing for full justification, l for left justification, r for right justification, or c for centered justification
add (Level 2.5) for a -h option that alters headings appropriately by size category (h1, you'll recall, is the largest), calling out to figlet if possible or rendering them internally otherwise
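One possible way to read the -i#, -l#, and -jA options and to do the centering and full justification they control is sketched below. The sub names and the defaults (indent 0, line length 80, left justification) are assumptions, not part of the assignment; the options are parsed by hand here so that a bare -j can mean full justification.

```perl
use strict;
use warnings;

# Parse -i#, -l# and -jA; anything else is treated as a file name.
# Assumed defaults: indent 0, width 80, justify 'l' (not given in the spec).
sub parse_args {
    my @args = @_;
    my %opt = (indent => 0, width => 80, justify => 'l', files => []);
    for my $arg (@args) {
        if    ($arg =~ /^-i(\d+)$/)     { $opt{indent}  = $1 }
        elsif ($arg =~ /^-l(\d+)$/)     { $opt{width}   = $1 }
        elsif ($arg =~ /^-j([flrc]?)$/) { $opt{justify} = $1 eq '' ? 'f' : $1 }
        else                            { push @{ $opt{files} }, $arg }
    }
    return \%opt;
}

# Center a title or heading within $width columns (left padding only).
sub center_line {
    my ($text, $width) = @_;
    my $pad = int( ($width - length($text)) / 2 );
    $pad = 0 if $pad < 0;
    return (' ' x $pad) . $text;
}

# Full justification: spread the spare spaces between words, leftmost gaps first.
sub justify_full {
    my ($text, $width) = @_;
    my @words = split ' ', $text;
    return $text if @words < 2;
    my $gaps  = @words - 1;
    my $chars = 0;
    $chars += length($_) for @words;
    my $spaces = $width - $chars;
    return join(' ', @words) if $spaces < $gaps;    # line too long: leave as is
    my ($base, $extra) = (int($spaces / $gaps), $spaces % $gaps);
    my $out = '';
    for my $i (0 .. $#words - 1) {
        $out .= $words[$i] . ' ' x ($base + ($i < $extra ? 1 : 0));
    }
    return $out . $words[-1];
}
```

A main program might call parse_args(@ARGV) once and pass $opt->{width} to the other two subs; the last line of a fully justified paragraph is normally left alone.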
readability (Level 2)
calculates the '3F' readability ('3F' is '4F' without the Fry Graph — which is a bit tedious at best) of the file(s) specified on the command line
add (Level 2) to add the 4th 'F'
command line options:
-a forces all contents to be considered instead of the regular sampling for each method
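The individual measures are not spelled out here. Assuming one of the F's is the Flesch Reading Ease score, 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words), a rough sketch might look like the following. The syllable counter is a crude vowel-group heuristic and the sub names are made up for illustration; it also scores the whole text, which corresponds to the -a behavior rather than sampling.

```perl
use strict;
use warnings;

# Rough syllable estimate: count groups of consecutive vowels, dropping a
# trailing silent 'e'. A heuristic only, not a dictionary lookup.
sub count_syllables {
    my ($word) = @_;
    $word = lc $word;
    $word =~ s/[^a-z]//g;
    return 0 if $word eq '';
    $word =~ s/e$// unless $word =~ /le$/ || length($word) <= 3;
    my $count = () = $word =~ /[aeiouy]+/g;
    return $count > 0 ? $count : 1;
}

# Flesch Reading Ease for a block of text (higher scores read more easily).
# Assumption: sentences end at runs of '.', '!' or '?'.
sub flesch_reading_ease {
    my ($text) = @_;
    my @words = $text =~ /[A-Za-z']+/g;
    return unless @words;
    my $sentences = () = $text =~ /[.!?]+/g;
    $sentences ||= 1;
    my $syllables = 0;
    $syllables += count_syllables($_) for @words;
    return 206.835 - 1.015 * (@words / $sentences)
                   - 84.6  * ($syllables / @words);
}
```

Abbreviations, decimal numbers, and unusual spellings will throw off both the sentence and syllable counts, so treat the result as an estimate.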
space abuse checker (Level 1)
looking through /etc/passwd, find all standard user accounts and get their space usage figure (du with -k for kilobytes; see the man page for other useful options). now sort this list of account|kbytes-used pairs primarily by kbytes (non-increasing) and secondarily by account name (non-decreasing). display the list for the 'admin'.
add (Level 2) to implement the space usage calculation in Perl; don't call du (think stat, opendir, et al.)
add another (Level 1.5) to make your Perl space usage calculator efficient by not recalculating the usage of directories you've already calculated (think hash)
command line options:
-n# only show the usage for the # highest accounts (think head)
-xu,u,u... exclude the comma-separated list of users from the report; useful when certain special user accounts are known to have large disk usage but have already obtained permission for it
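The pieces of the space abuse checker could be sketched roughly as below. Several things here are assumptions rather than requirements: a "standard" account is taken to be one with a UID at or above a threshold you pass in, sizes are apparent file sizes in bytes (du reports allocated blocks, so the figures will differ somewhat), and all sub names are hypothetical.

```perl
use strict;
use warnings;

# Pick "standard" accounts out of passwd-format lines.
# Assumption: standard means UID >= $min_uid (often 1000, but check your system).
sub standard_users {
    my ($min_uid, @lines) = @_;
    my @users;
    for my $line (@lines) {
        chomp $line;
        my ($name, undef, $uid, undef, undef, $home) = split /:/, $line;
        next unless defined $uid && $uid =~ /^\d+$/ && $uid >= $min_uid;
        push @users, [$name, $home];
    }
    return @users;
}

# Total bytes under $dir using opendir and lstat (symlinks are not followed).
# %$seen caches per-directory totals so nothing is calculated twice.
sub dir_bytes {
    my ($dir, $seen) = @_;
    return $seen->{$dir} if exists $seen->{$dir};
    opendir(my $dh, $dir) or return $seen->{$dir} = 0;
    my @entries = grep { $_ ne '.' && $_ ne '..' } readdir $dh;
    closedir $dh;
    my $total = 0;
    for my $entry (@entries) {
        my $path = "$dir/$entry";
        my @st = lstat($path) or next;
        my $is_dir = -d _;               # '_' reuses the lstat result above
        $total += $st[7];                # size in bytes
        $total += dir_bytes($path, $seen) if $is_dir;
    }
    return $seen->{$dir} = $total;
}

# Sort [account, kbytes] pairs by kbytes (largest first), then by account name.
# $exclude is the comma-separated -x list; $top is the -n value (0 means all).
sub rank_usage {
    my ($usage, $exclude, $top) = @_;
    my %skip = map { $_ => 1 } split /,/, ($exclude // '');
    my @rows = sort { $b->[1] <=> $a->[1] or $a->[0] cmp $b->[0] }
               grep { !$skip{ $_->[0] } } @$usage;
    splice @rows, $top if $top && $top < @rows;
    return @rows;
}
```

A driver would read /etc/passwd, call dir_bytes on each home directory with one shared cache hash, convert to kilobytes, and print whatever rank_usage returns.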
mass renaming utility (Level 4)
rename files specified by wild-card on the command line via a pattern also given on the command line, subject to options; we must warn when extensions are changed and ask whether to continue (this can be turned off on the command line, or via the 'A'lways choice when asked if it is okay)
patterns we support:
A replaced by the (left-most) non-numeric sequence of characters in the filename
9 replaced by the (left-most) numeric sequence of characters in the filename
C replaced by a counter (normally starts at 0 but can be started elsewhere via command-line option)
E replaced by the original file extension without alteration
F replaced by the original file name without alteration
T replaced by the last modification time of the file
D replaced by the last modification date of the file
a pattern symbol that is to be interpreted literally (and not as a pattern) must be escaped with a \ character
default behaviors:
the counter pattern is normally justified in a field large enough to count all files concerned, but this can be turned off via command-line option
the counter is normally reset for each new 'group' (basically, when the rest of the filename except the counter changes at all, we've entered a new group), but this can be turned off via an option so that all files being renamed use the same counter
the date pattern format is normally YYYY-MM-DD, but this may be changeable via an option
the time pattern format is normally HH:MM:SS (24-hour clock), but this may be changeable via an option
normally we only consider files for renaming, but we may allow directories via an option setting
we normally search only in the current directory for files, but this may be changeable via an option
normally renaming alters the modification time of the file to the renaming time, but this may be turned off via an option
command-line options:
-e allow extension changes without warning
-j do not justify counter patterns
-c do not re-start counter pattern for each new group — count through all files being renamed
-C# start counter patterns numbering from #
add (Level 1) for a -d option that allows renaming of directories as well as files
add (Level 1) for a -a option that turns off alteration of the file modification time when renaming
add (Level 1.5) for a -R option that recursively descends directories under the current looking for the desired files (may need command-line tweaking when the shell pre-expands file wild-cards)
add (Level 3) for a -D option that allows the user to pick among at least 7 different date formats for the date pattern replacement
add (Level 2) for a -T option that allows the user to pick among at least 4 different time formats for the time pattern replacement
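The pattern replacement itself could be sketched as a single substitution pass, as below. Several choices are assumptions made for illustration: the sub name and the hash keys (name, counter, width, mtime) are hypothetical, F is taken to mean the name without its extension, A and 9 are taken from the part before the extension, and the counter is justified by zero padding.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Expand a rename pattern for one file.
# %info keys (hypothetical): name, counter, width (counter field), mtime (epoch).
sub expand_pattern {
    my ($pattern, %info) = @_;
    # Split the name at its last dot; $ext stays empty when there is none.
    my ($base, $ext) = $info{name} =~ /^(.*?)(?:\.([^.]*))?$/;
    $ext = '' unless defined $ext;
    my ($alpha)  = $base =~ /(\D+)/;     # left-most non-numeric run
    my ($digits) = $base =~ /(\d+)/;     # left-most numeric run
    my @when = localtime($info{mtime} // 0);
    my %value = (
        A => $alpha  // '',
        9 => $digits // '',
        C => sprintf('%0*d', $info{width} // 1, $info{counter} // 0),
        E => $ext,
        F => $base,                      # assumption: name without extension
        D => strftime('%Y-%m-%d', @when),
        T => strftime('%H:%M:%S', @when),
    );
    # \x yields x literally; a '.' directly before E is dropped when the
    # original extension was empty; other pattern symbols are looked up.
    $pattern =~ s{\\(.)|(\.?)([A9CEFTD])}{
        defined $1                  ? $1
      : ($3 eq 'E' && $ext eq '')   ? ''
      :                               $2 . $value{$3}
    }gex;
    return $pattern;
}
```

For example, expand_pattern('A_C.E', name => 'report7.txt', counter => 3, width => 2) gives 'report_03.txt'. Choosing the counter width, detecting groups, and the -D and -T format tables are left out of this sketch.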
examples:
you need to rename the files from your digital camera that you took at your niece's birthday party:
massren -e 123[.]*.jp[e]*g \\Alicia_bday_9.jpg
you want to change all the usenet files you downloaded last week from GIF and gif and Gif etc. to just gif; on the way, let's sensibly sequence them:
massren -e [A-Za-z]*[0-9]*.[Gg][Ii][Ff] A_C.gif
Bob (recently fired) has named all of his sales report files after himself with nonsensical numbering to boot! we need to rename all his files — throughout his account — and watch for multiple reports per day (note the quotes to stop wild-card expansion by the shell):
massren -R 'Bob[0-9]*.doc' -a SalesRep_D_C.doc
we've moved all the files for the Acara project into a single directory and need to get them consistently named (note: if the original extension was empty, we'll not put the . in the new filename — just in case the OS is picky):
massren -a -c * \\Acara_C.E
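When the wild-card arrives quoted (as in the -R example), the shell has not expanded it, so the program has to match file names itself. A minimal sketch of one way to do that is to translate the wild-card into a regular expression and walk the tree with File::Find. The sub names are made up, and only *, ?, and [...] classes are handled.

```perl
use strict;
use warnings;
use File::Find;

# Translate a shell-style wild-card into an anchored Perl regex.
# Supports *, ? and [...] (with a leading ! for negation); all other
# characters are matched literally.
sub glob_to_regex {
    my ($glob) = @_;
    my $re = '';
    while ($glob =~ /\G(?:(\*)|(\?)|(\[[^\]]+\])|(.))/gs) {
        if    (defined $1) { $re .= '.*' }
        elsif (defined $2) { $re .= '.' }
        elsif (defined $3) { (my $class = $3) =~ s/^\[!/[^/; $re .= $class }
        else               { $re .= quotemeta $4 }
    }
    return qr/\A$re\z/s;
}

# Recursively collect plain files under $top whose names match $glob.
sub find_matching {
    my ($top, $glob) = @_;
    my $re = glob_to_regex($glob);
    my @hits;
    find(sub { push @hits, $File::Find::name if -f $_ && $_ =~ $re }, $top);
    return sort @hits;
}
```

With this, find_matching('.', 'Bob[0-9]*.doc') would list every matching report under the current directory, ready to be passed to the renaming step.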
Any chosen code must use functions to break the program down properly into smaller, re-usable parts. Any code that uses Perl's idea of object orientation gains (Level 2). Any code that creates a (proper) Perl library gains (Level 2). Any code that links to C code (for speed, etc.) gains (Level 3). These gains shrink each time they are used: 3, 2, 1, 0.5. So, if you hand in two programs that each link to C code, the first would be worth an extra 3 levels but the second would only gain 2 levels from the C code.
You must choose at least one of deHTML and readability and at least one of 'space abuse checker' and 'mass renaming utility' for adequate supporting code to your paper.
This assignment is (Level 4) (not including any support code — which each have their own level rating to be added to this base).