====================================
Perl
====================================

What is Perl?
++++++++++++++++++++++++++

Perl is a powerful and easy-to-use scripting language; that is, it supports "scripts", which are programs written to automate the execution of multiple tasks (e.g. running numerical convergence tests) that could otherwise be executed one by one by a human operator. This is in contrast to programming languages such as Fortran, which are used mainly for computation rather than for processing text or interacting with the shell. Over the years, Perl has evolved into a general-purpose programming language used for a wide range of tasks such as web development, network programming, GUI development, and more.

This section gives a quick overview of the parts of Perl that are necessary and relevant to the focus of this course. Much more information can be found on the internet; see e.g. `perldoc`__ and `perltoc`__ and `tutorialspoint`__.

Before we start using Perl, make sure you have Perl. In a shell terminal type:

.. code-block:: none

   $ which perl
   /usr/bin/perl
   $ perl --version
   This is perl 5, version 18, subversion 2 (v5.18.2) built for darwin-thread-multi-2level

Good! I have Perl 5 in the directory `/usr/bin/`, which is already in my search path (`PATH`__).

Perl scripts
++++++++++++++++++++++++

A Perl script is a text file, conventionally with the extension ".pl". Create a text file, named ``hello.pl``, with the content:

.. code-block:: none

   #!/usr/bin/perl
   print "Hello World!\n";

The first line (starting with the shebang characters ``#!``) tells the system to run the script with the Perl interpreter located in `/usr/bin/` (you may need to modify it depending on the location of Perl on your computer). Alternatively, if ``perl`` is in your search path (which it should be), you could also write

.. code-block:: none

   #!perl
   print "Hello World!\n";

To "run the script" you would simply go to a shell and type ``perl hello.pl``.

Scalar data
+++++++++++++++++++++++

A scalar is a single unit of data.
It is either a `number` or a `string`:

* **Number literals** can be integers (e.g. ``6`` or ``123``) or floating point numbers (e.g. ``-1.23e-4`` or ``1.2``).

* **String literals** are sequences of characters (e.g. ``'a1'`` or ``"Hello"``). They are usually alphanumeric values delimited by either single (') or double (") quotes.

  * `Single quotes`: A single-quoted string literal is just a collection of characters, e.g. ``'a1'`` or ``'Hello'``. It supports only two escape sequences, ``\'`` and ``\\``.

  * `Double quotes`: A double-quoted string literal allows variable interpolation and supports ``\`` (backslash) escape characters, such as ``\n`` (newline), ``\t`` (tab), ``\u`` (forces next character to uppercase), ``\l`` (forces next character to lowercase), ``\U`` (forces all following characters to uppercase), ``\L`` (forces all following characters to lowercase), ``\E`` (ends ``\U`` and ``\L``).

For example, generate a file, named ``test1.pl``:

.. code-block:: none

   #!perl
   print 'Abc' . "\n" ;
   print 'Abc\'s' . "\n" ;
   print 'Abc\\' . "\n" ;
   print "\n";
   print "A\tB\n\LCDEF\E\n123\n";

And then in the shell:

.. code-block:: none

   $ perl test1.pl
   Abc
   Abc's
   Abc\

   A       B
   cdef
   123

Note that ``.`` concatenates two strings; see "miscellaneous operators" below.

List data
+++++++++++++++++++++++

A list is an ordered set of scalars. For example ``(1,2,3)`` is a list of three numbers, ``('a','b','c')`` is a list of three strings, ``('Hello')`` is a list of one string, and ``( )`` is an empty list.

Scalar variables
+++++++++++++++++++++++

A scalar variable stores a single scalar value (a number or a string) and hence reserves some space in memory. The name of a scalar variable starts with a ``$`` sign. For example, consider ``$a = 1;`` written in a Perl script. Here, ``1`` is a scalar number and ``$a`` is a scalar variable that holds the value 1 (or stores the number 1). As another example, ``$name='Mohammad';`` stores the string ``'Mohammad'`` in the scalar variable ``$name``.
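
As a small illustration of scalar variables and of the difference between the two quoting styles, here is a minimal sketch. The file name ``test_scalar.pl`` and the variables ``$count`` and ``$sum`` are made up for this example and do not appear elsewhere in these notes.

.. code-block:: perl

   #!/usr/bin/perl
   $name  = 'Mohammad';   # a string stored in a scalar variable
   $count = 3;            # a number stored in a scalar variable (hypothetical name)

   print "Hello $name\n";        # double quotes: $name is interpolated
   print 'Hello $name' . "\n";   # single quotes: $name is printed literally
   $sum = $count + 1;            # arithmetic on a scalar variable
   print "count + 1 = $sum\n";

Running ``perl test_scalar.pl`` should print ``Hello Mohammad``, then ``Hello $name``, and finally ``count + 1 = 4``.
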
Note that variable names are case sensitive. This means that ``$ab`` and ``$AB`` are different variables, so it is OK if, for instance, we write ``$ab=1`` and ``$AB=2`` in the same script.

Array variables
+++++++++++++++++++++++

An array is a variable that stores an ordered list of scalar values. The name of an array variable starts with a ``@`` sign. To refer to a single element of an array variable, which is a scalar, we use the ``$`` sign with the variable name, followed by the index of the element in square brackets ``[]``. Indexing of elements starts at 0. For example, generate a file, named ``test2.pl``:

.. code-block:: none

   #!perl
   @ages = (20, 22, 25);
   @names = ("Dan", "Maria", "Sanju");

   print "\$ages[0] = $ages[0]\n"; #or equivalently "\$ages[0] = " . $ages[0] . "\n";
   print "\$ages[1] = $ages[1]\n";
   print "\$ages[2] = $ages[2]\n";
   print "\$names[0] = $names[0]\n";
   print "\$names[1] = $names[1]\n";
   print "\$names[2] = $names[2]\n";
   print "\n";
   print "$names[0] is $ages[0] years old\n";
   print "$names[1] is $ages[1] years old\n";
   print "$names[2] is $ages[2] years old\n";

We use ``\`` before ``$`` to print the name of the variable rather than its value. When executed (type ``perl test2.pl`` in a shell), this will produce the following result:

.. code-block:: none

   $ages[0] = 20
   $ages[1] = 22
   $ages[2] = 25
   $names[0] = Dan
   $names[1] = Maria
   $names[2] = Sanju

   Dan is 20 years old
   Maria is 22 years old
   Sanju is 25 years old

**Remark:** (lists vs. arrays) One of the most common sources of confusion is the difference between `lists` and `arrays`. Consider ``@vec = (1,2,3)``. In this example, the thing on the right-hand side of ``=`` is a list. We assign that list to the variable ``@vec``. That variable, which begins with the ``@`` sign, is an array. Therefore, a list can be assigned to an array. Moreover, arrays can have names (starting with ``@``), but lists cannot.

Perl operators
++++++++++++++++++++++++

Perl supports many types of operators. We will review the four most frequently used types.

**Arithmetic operators** include addition (``+``), subtraction (``-``), multiplication (``*``), division (``/``), and exponentiation (``**``). Floating point arithmetic in Perl is performed in double precision on most systems. For example, let ``$a = 10`` and ``$b = 2``. Then ``$a + $b`` will give 12 and ``$a ** $b`` will give 100.

**Assignment operators:**

* ``=`` assigns the value of the right side operand to the left side operand
* ``+=`` e.g. ``$b += $a`` is equivalent to ``$b = $b + $a``
* ``-=`` e.g. ``$b -= $a`` is equivalent to ``$b = $b - $a``
* ``*=`` e.g. ``$b *= $a`` is equivalent to ``$b = $b * $a``
* ``/=`` e.g. ``$b /= $a`` is equivalent to ``$b = $b / $a``
* ``**=`` e.g. ``$b **= $a`` is equivalent to ``$b = $b ** $a``

**Relational operators** are divided into two categories:

* Numeric relational operators (``==``, ``!=``, ``<``, ``>``, ``<=``, ``>=``)
* String relational operators (``eq``, ``ne``, ``lt``, ``gt``, ``le``, ``ge``)

Example: suppose ``$a=10``, ``$b=20``, ``$c="xyz"``, ``$d="XYZ"``. Then ``($a == $b)`` is not true, and ``($c ne $d)`` is true.

**Miscellaneous operators:**

* ``.`` (concatenation) concatenates two strings; see the examples above.
* ``x`` (repetition) returns a string consisting of the left operand repeated the number of times specified by the right operand. For example ``('+' x 5)`` will give ``+++++``.
* ``..`` (range) returns a list of values counting (up by ones) from the left value to the right value. For example, ``(4..9)`` will give ``(4, 5, 6, 7, 8, 9)``.
* ``++`` (increment) increases an integer value by one. For example, if ``$a=7``, then after ``$a++`` the value of ``$a`` is 8.
* ``--`` (decrement) decreases an integer value by one. For example, if ``$a=7``, then after ``$a--`` the value of ``$a`` is 6.

Loops
++++++++++++++++++++++++++++++

The most useful loops in Perl are ``while``, ``for``, and ``foreach`` loops. The ``while`` and ``for`` loops in Perl behave pretty much as they do in most other languages. The syntax of a ``while`` loop is ``while(condition) {statements;}``.
The syntax of a ``for`` loop is ``for ( init; condition; increment ){statements;}``.

.. code-block:: none

   $n = 5;
   $fact = 1;
   $i = 1;
   while ($i <= $n ) {
       $fact *= $i;
       $i += 1;
   }
   print "$n! = $fact \n";

This will display ``5! = 120`` in the terminal window.

.. code-block:: none

   for ($i = 1; $i <= 10; $i += 1) {
       print "$i ";
   }
   print "\n";

This will display ``1 2 3 4 5 6 7 8 9 10`` in the terminal window.

The ``foreach`` loop iterates over a list (for example, one assigned to an array variable), setting the iteration variable to each element of the list in turn. The syntax for a ``foreach`` loop is ``foreach $i (list) {statements;}``.

.. code-block:: none

   @food = qw/ pancake taco soup/ ; # qw is the quote word operator
   @meal = ('breakfast', 'lunch', 'dinner');
   $i=0;
   foreach $a (@food) {
       print "We have $a for $meal[$i] \n";
       $i+=1;
   }

This will display

.. code-block:: none

   We have pancake for breakfast
   We have taco for lunch
   We have soup for dinner

**Nested loops:** A loop can be nested inside another loop. For example, the syntax for a "nested for loop" is

.. code-block:: none

   for ( init; condition; increment ){
       for ( init; condition; increment ){statements;}
       statements;
   }

Conditionals
++++++++++++++++++++++++++++++

The basic structure of the ``if-elsif-else`` statement is shown in the following simple example.

.. code-block:: none

   if (1==2) {print "1=2\n";}
   elsif (1==3) {print "1=3\n";}
   else {print "I found out that 1 is not equal to 2 or 3! \n";}

The special variable $_
+++++++++++++++++++++++++++++

There are some variables which have special meanings in Perl. The most commonly used special variable is ``$_``. It is the default iterator variable in a ``foreach`` loop if no other variable is supplied. Several functions, such as ``print``, also operate on ``$_`` when no argument is given, so in these cases you can either type ``$_`` or leave it out. For example, in the ``foreach`` example above, you may leave out the iteration variable ``$a``. Perl will then use ``$_`` in its place:

.. code-block:: none

   @food = qw/ pancake taco soup/ ;
   @meal = ('breakfast', 'lunch', 'dinner');
   $i=0;
   foreach (@food) {
       print "We have ";
       print ;
       print " for $meal[$i] \n";
       $i+=1;
   }

Here both ``foreach`` and the second ``print`` use ``$_`` by default. The output will be the same as above.

File Input-Output
+++++++++++++++++++++++++++

Perl makes file input and output extremely easy. We use the ``open`` command to open a filestream and then "read" from and "write" to it. Once we are done, we use the ``close`` command to close the file. The syntax for opening a file is

* In read-only mode: ``open(FILEHANDLE,"<filename");``
* To create a file and write to it: ``open(FILEHANDLE,">filename");``
* To append to a file: ``open(FILEHANDLE,">>filename");``

All these commands open the file `filename`, which is located on your disk, and associate a filehandle `FILEHANDLE` with the file. A filehandle, usually written in all caps, is a name associated with an open file. As an example, consider the following code:

.. code-block:: none

   #!perl

   #Part 1
   $myFile="./data1.txt";
   $outFile="./data2.txt";
   open(FILE,"<$myFile") || die "cannot open file $myFile!";
   open(OUTFILE,">$outFile") || die "cannot open file!";

   #Part 2
   while( $line = <FILE> ) # read one line at a time until the end of file
   {
       print OUTFILE $line;
       print $line;
   }

   #Part 3
   close(OUTFILE);
   close(FILE);

This program will first open a file, named "data1.txt", to read from and a file, named "data2.txt", to write to. The ``die`` command (followed by a message) will halt the program if it fails to open the file, for example, if the file "data1.txt" does not exist in the current working directory. It then copies the contents of ``$myFile`` to ``$outFile`` line by line, also printing each line to the terminal. Finally, it closes both files.

Another example:

.. code-block:: none

   #!perl
   open FILE, ">data3.txt"; #opens a file to be written to
   while(<>){ #while we are getting input from the keyboard
       print FILE $_; #write it to the file
   }
   close FILE; #closes the file.

You can end the input from the keyboard with ``Ctrl+D``.
Note that ``>`` will create a new file, named "data3.txt": it will open a new file and write data into it. If the file already existed, its contents would be erased and replaced by the data you just wrote. To prevent this, you would need to open the file in ``>>`` (append) mode.

Regular Expressions
++++++++++++++++++++++++++++++++++++++++

A regular expression (regex) is a pattern that a string can be matched against; the matched text can then be substituted by other text. For example, we may need to search a file for some pattern (e.g. a particular word) and then replace it with something else (e.g. another word). The two main regex operators in Perl are **match** (``//``) and **substitute** (``s///``).

**The Match Operator** is used to match a string or statement to a regex. For example, to match the regex "green" against the default ``$_ = "The tree is green"``, we write the following code:

.. code-block:: none

   #!/usr/bin/perl
   $_ = "The tree is green";
   if(/green/){
       print "Found green!\n";
   }

The above code checks if "green" appears in the default string ``$_``. If it appears, the expression in the if-statement returns true; otherwise it returns false. Hence the above code will print ``"Found green!"``, because there is a "green" in the string ``$_``. Note that the two forward slashes are the delimiters of the regex (just as single quotes or double quotes are delimiters of regular strings).

Matching against the default variable ``$_`` is not the only way to use a regex in Perl. We can also use the binding operator ``=~`` to match against the string on the left.

.. code-block:: none

   $str = 'The tree is green';
   if($str =~ /green/){
       print "Found green!\n";
   }

On the left-hand side of the ``=~`` operator there is a string. On the right-hand side there is a regex (which is "green"). This code would also print ``"Found green!"``.

**Some useful characters:**

* ``.`` matches any single character except newline. For example, the regex ``/c.t/`` will match any string containing 'c' followed by any character, followed by 't'. It will hence match e.g. "cat", "cut", "c t", and "c.t".
* ``*`` matches zero or more occurrences of the preceding expression. For example, in the pattern ``/xy*z/`` the ``x`` and the ``z`` are required, but the ``y`` can appear any number of times, including not at all. This pattern would match e.g. ``xz``, ``xyz``, ``xyyz``, ``xyyyyyyyyyyyyyyyyz``, etc.
* ``+`` matches one or more occurrences of the preceding expression. For example ``/A+/`` matches ``A``, ``AA``, etc.
* ``{n}`` matches exactly n occurrences of the preceding expression.
* Parentheses ``()`` group several characters into a single item. For example, ``/(OMG)+/`` would match ``OMGOMGOMG`` while ``/OMG+/`` would match ``OMGGGGGGGGG``.
* ``i`` (a modifier placed after the closing delimiter) makes the match case-insensitive. For example, ``/(OMG)+/i`` would also match ``oMgomGomg``.
* ``g`` (also a modifier) stands for "global" and tells Perl to replace all matches in a substitution, and not just the first one.
* ``\b`` marks a word boundary and ensures that you match only the whole word. For example, ``/\bOMG\b/`` would match ``OMG`` as a whole word but not the ``OMG`` inside ``TOMG``.

There are more of these that you can find online.

**The Substitution Operator** allows you to replace the matched text with some new text. You can do this for the default variable ``$_`` or by using binding:

.. code-block:: none

   $_ = "I have a cat on the mat.\n";
   s/cat/CAT/;
   print ;

This will print ``I have a CAT on the mat.``.

.. code-block:: none

   $str = "Sja sjosjuka sjoman skottes av sju skona sjukskoterskor pa det sjankande skeppet Shanghai.\n";
   $str =~ s/sja/sju/ig;
   print "$str\n";

This will print ``sju sjosjuka sjoman skottes av sju skona sjukskoterskor pa det sjunkande skeppet Shanghai.``.

**A fun break**: The sentence above is a Swedish tongue-twister:

.. code-block:: none

   Sju sjösjuka sjömän sköttes av sju sköna sjuksköterskor på det sjunkande skeppet Shanghai.
   Seven seasick sailors were nursed by seven beautiful nurses on the sinking ship of Shanghai.

It is used by Swedes to make someone who is learning Swedish as a second language feel miserable and give up pronouncing some of the difficult Swedish words. See the following video. It is perhaps more fun than Perl.

.. raw:: html

__ http://perldoc.perl.org/index.html
__ https://perldoc.pl/5.005/perltoc
__ http://www.tutorialspoint.com/perl
__ https://math.unm.edu/~motamed/Teaching/Fall20/HPSC/unix.html#path-and-the-search-path