Perl

What is Perl?

Perl is a powerful and easy-to-use scripting language. A script is a program written to automate the execution of multiple tasks (e.g. running numerical convergence tests) that could otherwise be executed one by one by a human operator. This is in contrast to languages such as Fortran, which are used mainly for numerical computation rather than for processing text or interacting with the shell. Over the years, Perl has evolved into a general-purpose programming language used for a wide range of tasks such as web development, network programming, GUI development, and more.

This section gives a quick overview of the parts of Perl that are necessary and relevant to the focus of this course. Much more information can be found on the internet; see e.g. perldoc, perltoc, and tutorialspoint.

Before we start using Perl, make sure you have Perl. In a shell terminal type:

$ which perl
/usr/bin/perl
$ perl --version

This is perl 5, version 18, subversion 2 (v5.18.2) built for darwin-thread-multi-2level

Good! I have Perl 5 in the path /usr/bin/ which is already in my search path (PATH).

Perl scripts

A Perl script is a text file, conventionally with the extension “.pl”. Create a text file, named hello.pl, with the content:

#!/usr/bin/perl

print "Hello World!\n";

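The first line (starting with the shebang characters #!) tells the operating system to use the Perl interpreter in /usr/bin/ when the script is executed directly (you may need to modify the path depending on the location of Perl on your computer). Alternatively, if Perl is in your search path (as it should be), you could write

#!/usr/bin/env perl

print "Hello World!\n";

To run the script, go to a shell and type perl hello.pl. When a script is started this way, the interpreter is the perl command found in your search path, not the one named in the first line, so the shorter first line #!perl used in the examples below is sufficient.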

Scalar data

A scalar is a single unit of data. It is either a number or a string:

  • Number literals can be integers (e.g. 6 or 123) or floating point numbers (e.g. -1.23e-4 or 1.2).

  • String literals are sequences of characters (e.g. 'a1' or "Hello"). They are usually alphanumeric values delimited by either single (') or double (") quotes.

    • Single quotes: A single-quoted string literal is taken literally, character by character, e.g. 'a1' or 'Hello'. Only two escape sequences are recognized: \' (a single quote) and \\ (a backslash).
    • Double quotes: A double-quoted string literal allows variable interpolation and supports backslash (\) escape sequences, such as \n (newline), \t (tab), \u (forces next character to uppercase), \l (forces next character to lowercase), \U (forces all following characters to uppercase), \L (forces all following characters to lowercase), \E (ends \U and \L).

    For example, generate a file, named test1.pl:

    #!perl
    
    print 'Abc' . "\n" ;
    print 'Abc\'s' . "\n" ;
    print 'Abc\\' . "\n" ;
    
    print "\n";
    
    print "A\tB\n\LCDEF\E\n123\n";
    

    And then in the shell:

    $ perl test1.pl
    Abc
    Abc's
    Abc\
    
    A       B
    cdef
    123
    

    Note that . concatenates two strings; see “miscellaneous operators” below.

List data

A list is an ordered set of scalars. For example (1,2,3) is a list of three numbers, ('a','b','c') is a list of three strings, ('Hello') is a list of one string, and ( ) is an empty list.

Scalar variables

A scalar variable stores a single scalar value (a number or a string) and hence reserves some space in memory. The name of a scalar variable starts with a $ sign. For example, consider $a = 1; written in a Perl script. Here, 1 is a scalar number and $a is a scalar variable that holds the value 1. As another example, $name='Mohammad'; stores the string 'Mohammad' in the scalar variable $name. Note that variable names are case sensitive. This means that $ab and $AB are two different variables, so it is OK to write, for instance, $ab=1 and $AB=2 in the same script.
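
The effect of the two quoting styles on a scalar variable, and the case sensitivity of names, can be checked with a short sketch. The variable names and values below are chosen for illustration only, and the use strict, use warnings, and my declarations are precautions that the examples in this section omit.

```perl
use strict;
use warnings;

my $name = 'Mohammad';

# Single quotes keep the text as is: $name is NOT replaced by its value
my $single = 'Hello $name';

# Double quotes interpolate: $name is replaced by its value
my $double = "Hello $name";

# Variable names are case sensitive: $ab and $AB are different variables
my $ab = 1;
my $AB = 2;
my $total = $ab + $AB;

print "$single\n";    # prints: Hello $name
print "$double\n";    # prints: Hello Mohammad
print "$total\n";     # prints: 3
```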

Array variables

An array is a variable that stores an ordered list of scalar values. The name of an array variable starts with a @ sign. To refer to a single element of an array, which is a scalar, we use the $ sign with the variable name, followed by the index of the element in square brackets []. Indexing of elements starts at 0. For example, generate a file, named test2.pl:

#!perl

@ages = (20, 22, 25);
@names = ("Dan", "Maria", "Sanju");

print "\$ages[0] = $ages[0]\n";       #or equivalently "\$ages[0] = " . $ages[0] . "\n";
print "\$ages[1] = $ages[1]\n";
print "\$ages[2] = $ages[2]\n";
print "\$names[0] = $names[0]\n";
print "\$names[1] = $names[1]\n";
print "\$names[2] = $names[2]\n";

print "\n";

print "$names[0] is $ages[0] years old\n";
print "$names[1] is $ages[1] years old\n";
print "$names[2] is $ages[2] years old\n";

We put \ before $ to print the name of the variable literally rather than its value. When executed (type perl test2.pl in a shell), this will produce the following result:

$ages[0] = 20
$ages[1] = 22
$ages[2] = 25
$names[0] = Dan
$names[1] = Maria
$names[2] = Sanju

Dan is 20 years old
Maria is 22 years old
Sanju is 25 years old

Remark: (lists vs. arrays) One of the most common sources of confusion is the difference between lists and arrays. Consider @vec = (1,2,3). In this example, the thing on the right-hand side of = is a list. We assign that list to the variable @vec. That variable, which begins with the @ sign, is an array. Therefore, a list can be assigned to an array. Moreover, arrays can have names (starting with @), but lists cannot.
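
The distinction in the remark can be illustrated with a small sketch. It uses two features not covered in this section, scalar(@vec) for the number of elements and $#vec for the last index, and its variable names are illustrative assumptions; use strict, use warnings, and my are added as precautions.

```perl
use strict;
use warnings;

# The list (1, 2, 3) on the right-hand side is assigned to the array @vec
my @vec = (1, 2, 3);

# A list can also be assigned to a list of scalar variables, element by element
my ($x, $y, $z) = (10, 20, 30);

# scalar(@vec) gives the number of elements of the array
my $count = scalar(@vec);

# $#vec is the index of the last element (indexing starts at 0)
my $last = $#vec;

print "vec has $count elements; last index is $last\n";
print "x=$x y=$y z=$z\n";
```

Only the array has a name that can be used again later; the list exists just long enough to be assigned.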

Perl operators

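Perl supports many types of operators. We will review four of the most frequently used categories.

Arithmetic operators include addition (+), subtraction (-), multiplication (*), division (/), and exponentiation (**). Unlike Fortran, Perl has no separate integer and real variable types: numbers are stored as native integers or double-precision floating-point values, and floating-point arithmetic is carried out in double precision.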

For example let $a = 10 and $b = 2. Then $a + $b will give 12 and $a ** $b will give 100.
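
As a quick check of the arithmetic operators, the following sketch evaluates each of them on two illustrative values (the names $p and $q are assumptions made for this example; use strict, use warnings, and my are added as precautions). Note in particular that division of two integers is not truncated.

```perl
use strict;
use warnings;

my $p = 10;
my $q = 4;

my $sum   = $p + $q;     # 14
my $diff  = $p - $q;     # 6
my $prod  = $p * $q;     # 40
my $quot  = $p / $q;     # 2.5 (not truncated to 2)
my $power = $p ** 2;     # 100

print "$p / $q = $quot\n";      # prints: 10 / 4 = 2.5
print "$p ** 2 = $power\n";     # prints: 10 ** 2 = 100
```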

Assignment operators:

  • = assigns values from right side operand to left side operand
  • += e.g. $b += $a is equivalent to $b = $b + $a
  • -= e.g. $b -= $a is equivalent to $b = $b - $a
  • *= e.g. $b *= $a is equivalent to $b = $b * $a
  • /= e.g. $b /= $a is equivalent to $b = $b / $a
  • **= e.g. $b **= $a is equivalent to $b = $b ** $a

Relational operators are divided into two categories:

  • Numeric relational operators (==, !=, <, >, <=, >=)
  • String relational operators (eq, ne, lt, gt, le, ge)

Example: suppose $a=10, $b=20, $c="xyz", $d="XYZ". Then ($a == $b) is false, and ($c ne $d) is true, because string comparison is case sensitive.
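
Choosing the wrong category of relational operator is a common mistake, because the same two values can compare differently as numbers and as strings. A minimal sketch, with illustrative values and with the ?: (conditional) operator, which is not covered in this section, used only to turn each result into a printable word:

```perl
use strict;
use warnings;

my $s = "10";
my $t = "10.0";

# == compares numeric values: 10 and 10.0 are the same number
my $num_equal = ($s == $t) ? "true" : "false";

# eq compares character by character: "10" and "10.0" differ
my $str_equal = ($s eq $t) ? "true" : "false";

# < compares numerically: 9 is less than 10
my $num_less = (9 < 10) ? "true" : "false";

# lt compares in character order: "9" comes after "1", so "9" is not lt "10"
my $str_less = ("9" lt "10") ? "true" : "false";

print "10 == 10.0 : $num_equal\n";    # prints: 10 == 10.0 : true
print "10 eq 10.0 : $str_equal\n";    # prints: 10 eq 10.0 : false
print "9 <  10    : $num_less\n";     # prints: 9 <  10    : true
print "9 lt 10    : $str_less\n";     # prints: 9 lt 10    : false
```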

Miscellaneous Operators:

  • . (concatenation) concatenates two strings; see the examples above.
  • x (repetition) returns a string consisting of the left operand repeated the number of times specified by the right operand. For example ('+' x 5) will give +++++.
  • .. (range) returns a list of values counting (up by ones) from the left value to the right value. For example, (4..9) will give (4, 5, 6, 7, 8, 9).
  • ++ (increment) increases an integer value by one. For example, if $a=7, then after $a++ the value of $a is 8.
  • -- (decrement) decreases an integer value by one. For example, if $a=7, then after $a-- the value of $a is 6.
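
The miscellaneous operators can be tried together in one short sketch. The variable names and values are illustrative assumptions, and use strict, use warnings, and my are added as precautions. The last two lines show that $n++ hands back the old value before increasing the variable.

```perl
use strict;
use warnings;

my $line  = '-' x 10;           # repetition: "----------"
my $label = 'run' . '_' . 3;    # concatenation: "run_3" (3 is converted to a string)
my @steps = (1..5);             # range: (1, 2, 3, 4, 5)

my $n   = 7;
my $old = $n++;                 # $old receives 7, then $n becomes 8

print "$line\n";                # prints: ----------
print "$label\n";               # prints: run_3
print "@steps\n";               # prints: 1 2 3 4 5
print "old=$old n=$n\n";        # prints: old=7 n=8
```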

Loops

The most useful loops in Perl are while, for, and foreach loops.

The while and for loops in Perl behave much like those in most other languages.

The syntax of a while loop is while(condition) {statements;}.

The syntax of a for loop is for ( init; condition; increment ){statements;}.

$n = 5; $fact = 1; $i = 1;

while ($i <= $n ) {
         $fact *= $i;
         $i += 1;
}

print "$n! = $fact \n";

This will display 5! = 120 in the terminal window.

for ($i = 1; $i <= 10; $i += 1) {
      print "$i ";
}

print "\n";

This will display 1 2 3 4 5 6 7 8 9 10 in the terminal window.

The foreach loop iterates over a list (often one stored in an array variable), setting the iteration variable to each element of the list in turn.

The syntax for a foreach loop is foreach $i (list) {statements;}.

@food = qw/ pancake  taco soup/ ;          # qw is the quote word operator
@meal = ('breakfast', 'lunch', 'dinner');
$i=0;

foreach $a (@food) {
       print "We have $a for $meal[$i] \n";
       $i+=1;
}

This will display

We have pancake for breakfast
We have taco for lunch
We have soup for dinner
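
A foreach loop also combines naturally with the range operator (..). As a sketch, the factorial computed earlier with a while loop could be written as follows; the loop variable $k is an illustrative name, and use strict, use warnings, and my are added as precautions.

```perl
use strict;
use warnings;

my $n    = 5;
my $fact = 1;

# (1..$n) is the list (1, 2, 3, 4, 5); $k takes each value in turn
foreach my $k (1..$n) {
    $fact *= $k;
}

print "$n! = $fact\n";    # prints: 5! = 120
```

Compared with the while version, there is no counter to initialize or increment by hand.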

Nested loops: A loop can be nested inside another loop. For example the syntax for a “nested for loop” is

for ( init; condition; increment ){
      for ( init; condition; increment ){statements;}
      statements;
}
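
As a concrete instance of this pattern, the following sketch builds a small multiplication table, with the outer loop running over rows and the inner loop over columns. The variable names are illustrative assumptions; the .= operator (append to a string) is not listed among the assignment operators above but works in the same way as +=. use strict, use warnings, and my are added as precautions.

```perl
use strict;
use warnings;

my $table = '';

for (my $i = 1; $i <= 3; $i += 1) {         # outer loop: one pass per row
    for (my $j = 1; $j <= 3; $j += 1) {     # inner loop: one pass per column
        $table .= ($i * $j) . " ";          # append the product and a space
    }
    $table .= "\n";                         # end the row
}

print $table;
# prints:
# 1 2 3
# 2 4 6
# 3 6 9
```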

Conditionals

The basic structure of the if-elsif-else statement is shown in the following simple example.

if (1==2) {print "1=2\n";}
elsif (1==3) {print "1=3\n";}
else {print "I found out that 1 is not equal to 2 or 3! \n";}
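
In practice the conditions involve variables rather than constants. The sketch below classifies a residual against a tolerance; both values and all names are hypothetical and chosen only for illustration, and use strict, use warnings, and my are added as precautions.

```perl
use strict;
use warnings;

my $residual = 1.0e-7;    # hypothetical value, e.g. from a convergence test
my $tol      = 1.0e-6;    # hypothetical tolerance
my $status;

if ($residual < 0) {
    $status = "invalid";          # a residual should not be negative
}
elsif ($residual <= $tol) {
    $status = "converged";
}
else {
    $status = "not converged";
}

print "status: $status\n";    # prints: status: converged
```

The branches are tested in order and only the first one whose condition is true is executed.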

The special variable $_

There are some variables which have special meanings in Perl. The most commonly used special variable is $_. It serves as the default iteration variable in a foreach loop if no other variable is supplied, and many functions, such as print, operate on $_ when no argument is given. In these cases you can either type $_ or leave it out. For example, in the example above, you may leave out the iteration variable “$a”. Perl will then use $_ in its place:

@food = qw/ pancake  taco soup/ ;
@meal = ('breakfast', 'lunch', 'dinner');
$i=0;

foreach (@food) {
       print "We have ";
       print ;
       print " for $meal[$i] \n";
       $i+=1;
}

Here foreach sets $_ to each element in turn, and the second print, having no argument, prints $_. The output will be the same as above.

File Input-Output

Perl makes file input and output extremely easy. We use the open command to open a filestream and then “read” from and “write” to it. Then once we are done, we use the close command to close the file.

The syntax for opening a file is

  • In read-only mode: open(FILEHANDLE,"<filename"); or open(FILEHANDLE,"filename");
  • In writing mode: open(FILEHANDLE,">filename");
  • To append to a file: open(FILEHANDLE,">>filename");

All these commands open the file filename, which is located on your disk, and associate a filehandle FILEHANDLE with the file. A filehandle, usually all caps, is a structure that associates a file with a name.

As an example, consider the following code:

#!perl

#Part 1
$myFile="./data1.txt";
$outFile="./data2.txt";
open(FILE,"<$myFile") || die "cannot open file $myFile!";
open(OUTFILE,">$outFile") || die "cannot open file!";

#Part 2
while( $line = <FILE> )  # read one line at a time until the end of file
{
print OUTFILE $line;
print $line;
}

#Part 3
close(OUTFILE);
close(FILE);

This program will first open a file, named “data1.txt”, for reading and a file, named “data2.txt”, for writing. The die command (followed by a message) will halt the program if it fails to open a file, for example, if the file “data1.txt” does not exist in the current working directory. The program then reads $myFile line by line, writing each line both to $outFile and to the screen. Finally, it closes both files.
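
The same open, read/write, close cycle can also be written with the three-argument form of open and a filehandle stored in a scalar variable, a style not used elsewhere in this section but common in current Perl code. The sketch below is self-contained: it writes a scratch file, reads it back, and deletes it. The file name is hypothetical, and chomp, push, and unlink are standard functions not covered above.

```perl
use strict;
use warnings;

my $file = "perl_demo_scratch.txt";    # hypothetical scratch file name

# Write two lines; the mode '>' is given as a separate argument
open(my $out, '>', $file) or die "cannot open $file for writing: $!";
print $out "first line\n";
print $out "second line\n";
close($out);

# Read the lines back, one at a time
my @lines;
open(my $in, '<', $file) or die "cannot open $file for reading: $!";
while (my $line = <$in>) {
    chomp($line);              # remove the trailing newline
    push(@lines, $line);       # append the line to the array
}
close($in);

unlink($file);                 # delete the scratch file

print scalar(@lines), " lines read\n";    # prints: 2 lines read
```

The special variable $! in the die message holds the reason reported by the operating system, which makes a failed open easier to diagnose.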

Another example:

#!perl

open FILE, ">data3.txt";    #opens a file to be written to
while(<>){                         #while we are getting input from the keyboard
print FILE $_;                     #write it to the file
}
close FILE;                         #closes the file.

You can end the input from keyboard by Ctrl+D.

Note that > creates the file “data3.txt” if it does not exist and writes the data into it. If the file already exists, its previous contents are erased and replaced by the new data. To keep the existing contents and add to the end of the file, open it in >> mode instead.
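
The difference between the two modes can be observed directly. In the sketch below, write_text and read_lines are hypothetical helper subroutines written for this example (subroutines are not covered in this section), the file name is an assumption, and the three-argument form of open is used so that the mode can be passed as a parameter.

```perl
use strict;
use warnings;

my $log = "perl_demo_log.txt";    # hypothetical scratch file name

# Hypothetical helper: open $path in the given mode and write $text to it
sub write_text {
    my ($mode, $path, $text) = @_;
    open(my $fh, $mode, $path) or die "cannot open $path: $!";
    print $fh $text;
    close($fh);
}

# Hypothetical helper: return all lines of $path as a list
sub read_lines {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "cannot open $path: $!";
    my @all = <$fh>;    # assigning to an array reads all remaining lines at once
    close($fh);
    return @all;
}

write_text('>',  $log, "run 1\n");        # creates the file
write_text('>>', $log, "run 2\n");        # appends: "run 1" is kept
my @after_append = read_lines($log);      # two lines

write_text('>',  $log, "run 3\n");        # overwrites: earlier lines are lost
my @after_overwrite = read_lines($log);   # one line

unlink($log);                             # delete the scratch file

print "after >>: ", scalar(@after_append), " lines\n";       # prints: after >>: 2 lines
print "after > : ", scalar(@after_overwrite), " lines\n";    # prints: after > : 1 lines
```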

Regular Expressions

A regular expression (regex) is a pattern against which a string can be matched, and whose matches can optionally be replaced by other text. For example, we may need to search a file for some pattern (e.g. a particular word) and then replace it with something else (e.g. another word).

Two main regex operators within Perl are: match (//) and substitute (s///).

The Match Operator is used to match a string or statement to a regex. For example, to match the regex “green” against the default $_ = "The tree is green", we write the following code:

#!/usr/bin/perl

$_ = "The tree is green";

if(/green/){
   print "Found green!\n";
}

The above code checks if “green” appears in the default string $_. If it does, the expression in the if-statement is true; otherwise it is false. Hence the above code will print "Found green!", because there is a “green” in the string $_. Note that the two forward slashes are the delimiters of the regex (just as single-quotes or double-quotes are delimiters of regular strings).

Matching against the default variable $_ is not the only way to use regex in Perl. We can also use the binding operator =~ to match against the string on the left.

$str = 'The tree is green';
if($str =~ /green/){
   print "Found green!\n";
}

On the left-hand side of the =~ operator there is a string. On the right-hand side there is a regex (which is “green”). This code would also print "Found green!".

Some useful characters:

  • . matches any single character except newline. For example, the regex /c.t/ will match any string with ‘c’ followed by any character, followed by ‘t’. It will hence match e.g. “cat”, “cut”, “c t”, and “c.t”.
  • * matches zero or more occurrences of the preceding expression. For example, in the pattern /xy*z/ the x and the z are required, but the y can appear any number of times including not at all. This pattern would match e.g. xz, xyz, xyyz, xyyyyyyyyyyyyyyyyz, etc.
  • + matches one or more occurrences of the preceding expression. For example /A+/ matches A, AA, etc.
  • {n} matches exactly n occurrences of the preceding expression.
  • Parentheses () group several characters into a single expression, so that a following *, +, or {n} applies to the whole group. For example, /(OMG)+/ would match OMGOMGOMG while /OMG+/ would match OMGGGGGGGGG.
  • i, placed after the closing delimiter, makes the match case-insensitive. For example, /(OMG)+/i would also match oMgomGomg.
  • g, placed after the closing delimiter of a substitution, stands for “global” and tells Perl to replace all matches, and not just the first one.
  • \b matches a word boundary, which lets you match whole words only. For example, /\bOMG\b/ would match only OMG and not TOMG.

There are more of these that you can find online.
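
The behaviour of these characters can be verified by matching short test strings. In the sketch below the test strings and variable names are illustrative assumptions, and the ?: (conditional) operator is used only to store each result as 1 (match) or 0 (no match); use strict, use warnings, and my are added as precautions.

```perl
use strict;
use warnings;

my $dot    = ('cut'         =~ /c.t/)      ? 1 : 0;   # 1: . stands for the u
my $star   = ('xz'          =~ /xy*z/)     ? 1 : 0;   # 1: * allows zero y's
my $plus   = ('xz'          =~ /xy+z/)     ? 1 : 0;   # 0: + needs at least one y
my $exact  = ('xyyz'        =~ /xy{2}z/)   ? 1 : 0;   # 1: exactly two y's
my $group  = ('OMGOMG'      =~ /(OMG){2}/) ? 1 : 0;   # 1: the whole group twice
my $single = ('OMGG'        =~ /(OMG){2}/) ? 1 : 0;   # 0: only one full OMG
my $nocase = ('oMg'         =~ /OMG/i)     ? 1 : 0;   # 1: case is ignored
my $inside = ('TOMG'        =~ /\bOMG\b/)  ? 1 : 0;   # 0: OMG is part of a longer word
my $word   = ('say OMG now' =~ /\bOMG\b/)  ? 1 : 0;   # 1: OMG stands alone

print "$dot $star $plus $exact $group $single $nocase $inside $word\n";
# prints: 1 1 0 1 1 0 1 0 1
```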

The Substitution Operator allows you to replace the matched text with some new text. You can apply it to the default variable $_ or to another variable by using the binding operator =~:

$_ = "I have a cat on the mat.\n";
s/cat/CAT/;
print ;

This will print "I have a CAT on the mat.".
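
The effect of the g and i modifiers on a substitution can be compared side by side. The sketch below applies three variants to copies of the same illustrative string; it also relies on the fact that s/// returns the number of substitutions it made. The variable names are assumptions, and use strict, use warnings, and my are added as precautions.

```perl
use strict;
use warnings;

my $text = "cat, cat, and Cat";

my $first_only = $text;
my $n1 = ($first_only =~ s/cat/dog/);      # no modifier: first match only

my $all = $text;
my $n2 = ($all =~ s/cat/dog/g);            # g: every lowercase "cat"

my $any_case = $text;
my $n3 = ($any_case =~ s/cat/dog/ig);      # ig: every match, ignoring case

print "$n1: $first_only\n";    # prints: 1: dog, cat, and Cat
print "$n2: $all\n";           # prints: 2: dog, dog, and Cat
print "$n3: $any_case\n";      # prints: 3: dog, dog, and dog
```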

$str = "Sja sjosjuka sjoman skottes av sju skona " .
       "sjukskoterskor pa det sjankande skeppet Shanghai.\n";
$str =~ s/sja/sju/ig;
print "$str\n";

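This will print "sju sjosjuka sjoman skottes av sju skona sjukskoterskor pa det sjunkande skeppet Shanghai.". Because of the i modifier, the capitalized Sja at the start is also matched and replaced by the lowercase sju, and because of the g modifier, the sja inside sjankande is replaced as well.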

A fun break: The sentence above is a Swedish tongue-twister:

Sju sjösjuka sjömän sköttes av sju sköna sjuksköterskor på det sjunkande skeppet Shanghai.

Seven seasick sailors were nursed by seven beautiful nurses on the sinking ship of Shanghai.

It is used by Swedes to make someone who is learning Swedish as a second language feel miserable and give up pronouncing some of the more difficult Swedish words. See the following video. It is perhaps more fun than Perl.