Saturday 30 May 2009

PHP, Code Igniter, PEAR, command line and daemons [geek alert]

Oh hear ye mortals who pass through this barren land . . . I'm still alive and kicking, and after half a year ready to fill up the blog with a little bit more content.
In this post and probably the next few I will give the computer geek inside myself free rein and delve into programming lore, which may seem utterly boring and/or incomprehensible to you if you aren't very technically minded. If that description applies to you, please accept my apologies, consider yourself warned and steer clear of posts with "[geek alert]" in the title. Everything has its time and place: rest assured that I'll write about other stuff as well.
So, with that out of the way — over to you, geek!

I haven't got any one message or motivation for this post, but rather thought I'd just share some of the fun stuff I have learned lately concerning programming. To cite Tom Lehrer: This may prove useful, some day, to some of you, under a somewhat bizarre set of circumstances . . .

Let me start by mentioning, as a background, several programming languages that I have come to appreciate through the years and use extensively today:
  • Pascal, which I first learned in high school, is still my compiled language of choice (nowadays in the form of the excellent FreePascal 2, which includes powerful object-oriented programming, GUI capabilities and basically everything one would expect of a modern programming language).
  • Bash shell scripts, together with the related utilities, are a great tool for all kinds of computer "chores" like finding, batch-altering and sorting files. I don't use them for major projects, but they are part of what attracts me to Linux, and no Windows computer I regularly work on stays for long without an installation of bash and its ilk from the Cygwin suite.
  • Matlab is what I nowadays use for 99.9% of my scientific computing: calculations, data processing, statistics and plotting. I'll probably devote another post to it one of these days.
  • PHP — the main topic of this post.
I started learning PHP through participation in a nice web-based bibliography project, Aigaion. When we were discussing the transition to an all-new version 2 about two years ago, we got to know the concept of MVC (model-view-controller) frameworks and after a while settled on the simple-yet-powerful framework CodeIgniter (aka CI).
We liked that CI implemented the MVC concept quite flexibly, not forcing a strict "architecture" upon the application programmer as some others did. The only thing one positively needs in CI is a "Controller" class for each category or "sub-application" (say, "blog") with an entry function for each action (e.g. "write", "index", "delete"). That is the core of your program. Beyond that, it is your free choice whether to use a "Model" class for talking to the database (provided that your app uses one) or to call the DB directly from your controller. Likewise, while I think that a "View" component (typically an HTML page with just some embedded PHP for including the dynamically generated data) is a very good idea, you are by no means forced to use one.
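The controller-plus-action convention can be pictured with a toy dispatcher. This is a sketch only, not CodeIgniter's own code: the Blog class, its return values and the dispatch() function are made up for illustration, and real CI controllers would normally echo output or load a view instead of returning strings.

```php
<?php
// Toy controller (hypothetical): one public method per action.
class Blog
{
    public function index(): string
    {
        return 'list of posts';
    }

    public function write(string $title = 'untitled'): string
    {
        return "writing post: $title";
    }
}

// Toy dispatcher (hypothetical): "blog/write/hello" becomes (new Blog())->write('hello').
function dispatch(string $route): string
{
    // Split the route into non-empty segments: controller, method, arguments.
    $segments = array_values(array_filter(explode('/', $route), 'strlen'));
    $class = ucfirst(strtolower($segments[0] ?? ''));
    $method = $segments[1] ?? 'index';   // no method given: fall back to index()
    if ($class === '' || !class_exists($class) || !is_callable([new $class(), $method])) {
        throw new InvalidArgumentException("No such route: $route");
    }
    // Any remaining segments are handed to the action as arguments.
    return call_user_func_array([new $class(), $method], array_slice($segments, 2));
}

echo dispatch('blog/index'), "\n";        // prints: list of posts
echo dispatch('blog/write/hello'), "\n";  // prints: writing post: hello
```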
CI also includes lots of libraries and "Helper" components that make it easy to program many common functions. Among them are components for communicating with databases (with an optional "Active Record" interface) and for internationalisation, as well as one that works transparently in the background, "sanitising" input from the web so that the application is reasonably immune to common security risks like SQL injection.
As to databases, some people swear by the Active Record pattern, while others hate it and want to write their own SQL code . . . I for one appreciate it, because I like to keep code in one language reasonably free of strings in secondary languages, and also because, for instance, the commands for inserting data simply take associative arrays of data, without me having to loop through the fields and build a lengthy query string myself. Yeah, talk about comfort!
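To make the comfort argument concrete, here is a rough sketch of the chore that an Active Record style insert takes off your hands. The helper build_insert() and the table and column names are hypothetical and written in modern PHP; it is not CodeIgniter's implementation, and it does not escape table or column names.

```php
<?php
// Hypothetical helper (not CodeIgniter code): build a parameterised INSERT
// statement from an associative array of column => value pairs.
function build_insert(string $table, array $row): array
{
    $columns = array_keys($row);
    $placeholders = array_fill(0, count($columns), '?');
    $sql = sprintf(
        'INSERT INTO %s (%s) VALUES (%s)',
        $table,
        implode(', ', $columns),
        implode(', ', $placeholders)
    );
    // Return the statement together with the values to bind to the placeholders.
    return [$sql, array_values($row)];
}

$row = ['title' => 'Geek alert', 'author' => 'me', 'views' => 0];
[$sql, $params] = build_insert('posts', $row);
echo $sql, "\n";   // prints: INSERT INTO posts (title, author, views) VALUES (?, ?, ?)

// With CodeIgniter's database library loaded, the same array can be handed
// over directly from a controller or model, along the lines of:
//   $this->db->insert('posts', $row);
```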

As cool as CI is, I started wondering a few weeks ago if it would be feasible to take it beyond web applications and use it in non-web programs/scripts (so that I could, for instance, continue using the database library). It is clearly written for the web, as it uses the web address to determine what command to run (e.g. http://your.site.com/blog/index calls the function index in the Blog controller). But after some googling, I found a very nice solution: the web-address part used by CI is usually found in $_SERVER['REQUEST_URI'], so as long as you put something of the form "controller/method" into that variable, you're good! Command-line parameters are found in PHP's $argv array (with $argv[0] being the name of the script); you can for instance assign the content of $argv[1] to the REQUEST_URI field, or generate some valid string inside your PHP file depending on different command-line parameters.
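A minimal sketch of such a command-line entry script follows. The file name cli.php, the helper route_from_argv(), the default route and the path to CI's index.php are all assumptions for illustration. Depending on the CI version and its uri_protocol setting, a different $_SERVER field may be consulted, so treat this as a starting point rather than a recipe.

```php
<?php
// cli.php: hypothetical command-line entry point for a CodeIgniter app.
// Assumed usage: php cli.php blog/index

// Turn the command-line arguments into a "controller/method" route string.
function route_from_argv(array $argv, string $default = 'welcome/index'): string
{
    // $argv[0] is the script name; the route is expected in $argv[1].
    $route = isset($argv[1]) ? trim($argv[1], '/') : '';
    return $route === '' ? $default : $route;
}

// $argv is only populated when PHP runs from the command line.
$args = isset($argv) ? $argv : [];
$_SERVER['REQUEST_URI'] = route_from_argv($args);

// Hand over to CodeIgniter's front controller (the path is an assumption):
// require dirname(__FILE__) . '/index.php';
```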

Which brings me to my next finding: parsing of the command line. From my earlier bash days, I knew "getopts" for working through lists of optional parameters. It works, though not extremely flexibly, and PHP's equivalent "getopt" is even more limited . . . Then I started looking at PEAR, the PHP Extension and Application Repository, and found the Console_CommandLine package there, which allows you to define really powerful sets of command-line options and even automatically builds a help page listing all your options and parameters!

A little while later I wanted to execute an operation (creating a plot of the state of a running process in the lab) about every 30 seconds, i.e. more often than cron allows, since its finest granularity is one minute. The plotting is currently done upon user request within a PHP/CI web app (using Ploticus), but takes so much time that I'd like the graphs to be prepared in the background. With cron out of the question, a continually looping script is the natural alternative that comes to mind, but I wouldn't want to start it by hand and have it rely on some terminal staying open all the time. Is it possible to run a PHP script as a daemon (roughly a "service" in Windows parlance)? It turns out that yes, that is quite possible too!
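The post does not say which approach was used, so the following is only one possible sketch: detach from the terminal with PHP's pcntl and posix extensions (available for command-line PHP on Unix-like systems), then loop at a fixed interval. The function names daemonize() and run_periodically(), as well as update_plots(), are hypothetical.

```php
<?php
// Detach from the terminal. Needs the pcntl and posix extensions (CLI only).
function daemonize(): void
{
    $pid = pcntl_fork();
    if ($pid === -1) {
        fwrite(STDERR, "fork failed\n");
        exit(1);
    }
    if ($pid > 0) {
        exit(0);                   // parent: hand control back to the shell
    }
    if (posix_setsid() === -1) {   // child: new session, no controlling terminal
        exit(1);
    }
}

// Run $task every $interval seconds; $maxRuns = null means "forever".
function run_periodically(callable $task, int $interval, ?int $maxRuns = null): int
{
    $runs = 0;
    while ($maxRuns === null || $runs < $maxRuns) {
        $started = microtime(true);
        $task();
        $runs++;
        // Sleep only for the rest of the interval, so a slow task does not
        // push every later run further back.
        $remaining = $interval - (microtime(true) - $started);
        if ($remaining > 0 && ($maxRuns === null || $runs < $maxRuns)) {
            usleep((int) round($remaining * 1000000));
        }
    }
    return $runs;
}

// Intended use (commented out so that loading this file does not fork):
// daemonize();
// run_periodically('update_plots', 30);   // update_plots() is hypothetical
```

The $maxRuns parameter exists mainly so that the loop can be tried out without running forever.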

Last but not least, I wanted to be able to have mathematical transformations saved as strings, e.g. in a config file or database, and have PHP interpret the formulas on the fly. For kicks I chose not to rely on PHP's eval() but to write my own parser. Parsing infix notation (such as 3+4*2) and interpreting it right away quickly proves to be fiendishly difficult, what with parentheses, the different operator precedences (* and / take precedence over + and -, etc.) and operator associativity (+, -, *, / are left-associative, i.e. 2*3*4 = (2*3)*4, while the exponent ^ is right-associative: 2^3^4 = 2^(3^4)). You and the program have to keep track of a lot . . . Then I remembered having heard about RPN (Reverse Polish Notation) at university, a postfix notation (3+4*2 would translate to 3 4 2 * +, for instance) which circumvents the complexity of infix interpretation and makes do without any brackets at all. Thankfully the wise Wikipedia yielded a very good description of an algorithm for converting infix into postfix, by none other than Edsger Dijkstra, an influential early computer scientist (whose shortest-path algorithm I had also implemented once upon a time during my studies). Implementing that Shunting-yard algorithm and the RPN calculator was fun and went surprisingly smoothly, so in the end I added several (backward-compatible) bonus functionalities to my code:
  • Variable data: Give the RPN calculator an associative array with numeric data as a second argument and you can use that data within the calculation.
  • Unary minus: All standard RPN operators are binary, so to get -1, as far as I know, you would have to write 0 1 -. So in my RPN calculator I added the unary minus (negation) operator "~", and the infix-to-RPN converter detects lone minuses at the beginning of the expression, after opening parentheses as well as in function arguments (like in "min(0, -1)").
  • Variable number of function arguments: The original conversion algorithm already recognised functions like "sin(a)" or "min(2,3)", but there was no way of telling the RPN calculator how many arguments to expect. For standard functions like sin, cos, sqrt etc. that would be a known 1 and no problem, but I'd like to allow syntax like "min(a,b,c)" or "min(a)" (with an array). The code supports that now; in RPN, that looks like, for instance, "a b c {3} min".
  • Array calculations: Since I work a lot with numbers in arrays (measurement data), I had already developed functions for various operations on arrays, and the mathematical ones I now have the parsers understand, in Matlab's notation with a dot prepended to the respective operator: .+ .- .* ./ and .^ (power, ^ for non-array power is also understood).
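
For readers who want to see the idea in code, here is a compact sketch of a Shunting-yard converter and an RPN evaluator. It is not the code described above: the function names and the precedence table are assumptions, and it covers only numbers, named variables, parentheses, the operators + - * / ^ and the unary minus "~" (no functions or array operators). Characters the tokenizer does not recognise are silently skipped.

```php
<?php
// Operator precedences; "~" is the unary minus mentioned in the post.
const PREC = ['+' => 1, '-' => 1, '*' => 2, '/' => 2, '~' => 3, '^' => 4];
const RIGHT_ASSOC = ['~' => true, '^' => true];

function is_op(?string $tok): bool
{
    return $tok !== null && array_key_exists($tok, PREC);
}

// Shunting-yard: infix string to an array of RPN tokens.
function infix_to_rpn(string $expr): array
{
    preg_match_all('/\d+(?:\.\d+)?|[A-Za-z_]\w*|[-+*\/^()]/', $expr, $m);
    $output = [];
    $stack = [];
    $prev = null;   // previous token, used to spot a unary minus
    foreach ($m[0] as $tok) {
        if ($tok === '-' && ($prev === null || $prev === '(' || is_op($prev))) {
            $tok = '~';
            $stack[] = $tok;   // prefix operator: nothing to pop first
        } elseif (is_op($tok)) {
            while ($stack && is_op(end($stack))) {
                $top = PREC[end($stack)];
                $leftAssoc = !array_key_exists($tok, RIGHT_ASSOC);
                if ($top < PREC[$tok] || ($top === PREC[$tok] && !$leftAssoc)) {
                    break;
                }
                $output[] = array_pop($stack);
            }
            $stack[] = $tok;
        } elseif ($tok === '(') {
            $stack[] = $tok;
        } elseif ($tok === ')') {
            while ($stack && end($stack) !== '(') {
                $output[] = array_pop($stack);
            }
            if (!$stack) {
                throw new InvalidArgumentException('Unbalanced parentheses');
            }
            array_pop($stack);   // discard the "("
        } else {
            $output[] = $tok;    // number or variable name
        }
        $prev = $tok;
    }
    while ($stack) {
        $op = array_pop($stack);
        if ($op === '(') {
            throw new InvalidArgumentException('Unbalanced parentheses');
        }
        $output[] = $op;
    }
    return $output;
}

// Evaluate RPN tokens; $vars maps variable names to numbers.
function eval_rpn(array $rpn, array $vars = []): float
{
    $stack = [];
    foreach ($rpn as $tok) {
        if (is_op($tok)) {
            $arity = $tok === '~' ? 1 : 2;
            if (count($stack) < $arity) {
                throw new InvalidArgumentException('Missing operand');
            }
            $b = array_pop($stack);
            $a = $arity === 2 ? array_pop($stack) : 0.0;
            switch ($tok) {
                case '+': $stack[] = $a + $b; break;
                case '-': $stack[] = $a - $b; break;
                case '*': $stack[] = $a * $b; break;
                case '/': $stack[] = $a / $b; break;
                case '^': $stack[] = pow($a, $b); break;
                case '~': $stack[] = -$b; break;
            }
        } elseif (is_numeric($tok)) {
            $stack[] = (float) $tok;
        } elseif (array_key_exists($tok, $vars)) {
            $stack[] = (float) $vars[$tok];
        } else {
            throw new InvalidArgumentException("Unknown token: $tok");
        }
    }
    if (count($stack) !== 1) {
        throw new InvalidArgumentException('Malformed expression');
    }
    return $stack[0];
}

echo implode(' ', infix_to_rpn('3+4*2')), "\n";                       // prints: 3 4 2 * +
echo eval_rpn(infix_to_rpn('-a*(b-1)'), ['a' => 2, 'b' => 4]), "\n";  // prints: -6
```

Giving "~" a precedence between * and ^ is a design choice in this sketch: it makes -2^2 evaluate as -(2^2), while 2*-3 still works because a prefix operator is pushed without popping anything first.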
