Technology for CMS building
Earlier we looked at how changing demands on websites occurred alongside innovation in technology, and particularly mentioned the arrival of scripting languages. Of these, JavaScript is the most popular at present, but for server-side scripting the favorite is PHP. With version 5, PHP reached a new level. The most significant changes are in the object-oriented features. These were thought to be a kind of "extra" when they were introduced into version 4. But extensive and enthusiastic use of these features to build object-oriented applications has led to PHP5 being built with a much better range of class and object capabilities. This provides the opportunity to adopt a much more thoroughgoing object orientation in the building of a new CMS framework. Strangely, despite all the talk of "Internet years" and rapid change, the move to PHP5 has been extremely slow, taking about five years from first release to widespread deployment.
Software developers can argue at length about the relative merits of different languages, but there is no doubt that PHP has established itself as a very popular tool for creating active websites. Two factors stand out, one of which applies to PHP generally, the other specifically to PHP5.
The general consideration is the ongoing attempt to separate the creation of views (which in practice means creating XHTML) from the problem-oriented logic. More generally, the aim is to split development along the lines of the MVC pattern: model, view, and controller. While some have seen a need to create templating systems to achieve this, such systems have always been questionable on the grounds that PHP itself contains the necessary features for handling XHTML in a sound way. The trend recently has been to see templating systems as an unnecessary overhead. Indeed, one developer of a templating system has written to say that he now considers such systems undesirable. So a significant advantage of using PHP is the ability to handle XHTML neatly. There still remain plenty of unsolved problems in this area, notably the viability of widget libraries and the issue of how to support easy customization. Despite those problems, PHP offers powerful mechanisms for dealing with XHTML, briefly illustrated in a later section.
The specific advantage of PHP5 is its greatly improved provisions for classes and objects. Many experienced developers take the view that object principles lead to more flexible systems and better quality code. Of course, this does not happen automatically. Knowledge and skill are still required. More detailed comments about object-oriented development are made in a later section.
Note
When I left the Mambo development team and decided to create a radically changed CMS that would evolve out of the Mambo history, it meant a major commitment of development effort. Given the huge advantage of PHP5 through its radically improved handling of classes and objects, it would have seemed foolish to commit so much effort to an obsolescent system. Because object orientation enables such radical improvements to the design of a CMS framework, it seemed to me that the logical conclusion was to work in PHP5 and wait for the world to catch up. It is now easy to find PHP5 hosting, and most developers have either migrated or are currently making the transition.
Before we go into specifics in later chapters, there are some general points about PHP that apply everywhere. There is scope for varying opinions in programming practice, so it has to be said that these are only my opinions and others may well disagree. But they do affect the way in which the code examples are written, so mentioning them may aid understanding. Much more could be said; the following comments are a selection of what seem the most important considerations for sound use of PHP. Other points will become apparent through the rest of the book.
PHP will not fail if variables are uninitialized, as it will assume that they are null and will issue a notice to tell you about it. Sometimes, PHP software is run with warnings and notices suppressed. This is not a good way to work. It hardly requires any more effort to write code so that variables are always initialized before use. The same applies to all other situations that give rise to notices or warnings, which can be easily avoided. Often, quite serious logical errors can be picked up by seeing a notice or warning. The error may not make the code fail in an obvious way, but nonetheless something may be going badly wrong. A low-level error is frequently an important sign of a problem. It is therefore best to make sure that you find out about every level of error.
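As a minimal sketch of this way of working, the following turns on reporting for every level of error and initializes a variable before it is used. The function name and data are hypothetical, and displaying errors on screen is assumed to be appropriate only on a development machine.

```php
<?php
// Report every level of error, including notices, during development.
error_reporting(E_ALL);
ini_set('display_errors', '1');

// Hypothetical helper: totals a list of prices.
function sumPrices(array $prices)
{
    $total = 0;                 // initialized before use, so nothing is reported
    foreach ($prices as $price) {
        $total += $price;
    }
    return $total;
}

$total = sumPrices(array(5, 10, 20));
```

Had the line that sets $total to zero been omitted, the first addition would have read a variable that did not yet exist, and with full error reporting PHP would have said so.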
Declarations are powerful, and it pays to maximize their power. Classes can be declared as abstract when they are not intended to be used on their own to create objects, but used only to build subclasses. Conversely, classes that should not be subclassed, or methods that should not be overridden, can be declared final. Methods can be declared as public, private, or protected, and the most suitable option should always be explicitly chosen. Properties inside an object should be declared wherever possible, and like methods, their visibility should be stated. In line with the previous comments, it is a good idea to initialize every declared variable in a class with a reasonably safe value.
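The declarations just described can be sketched together in a few lines. The Shape and Rectangle classes are hypothetical and exist only to show the keywords in use.

```php
<?php
// Hypothetical classes, purely to illustrate the declarations.
abstract class Shape
{
    protected $name = 'shape';      // visible to subclasses, with a safe default

    // Subclasses must supply the calculation.
    abstract public function area();

    // final: subclasses may not override this method.
    final public function describe()
    {
        return $this->name . ' with area ' . $this->area();
    }
}

final class Rectangle extends Shape     // final: cannot be subclassed
{
    private $width = 0;                 // private: visible only inside Rectangle
    private $height = 0;

    public function __construct($width, $height)
    {
        $this->name = 'rectangle';
        $this->width = $width;
        $this->height = $height;
    }

    public function area()
    {
        return $this->width * $this->height;
    }
}

$rectangle = new Rectangle(3, 4);
$description = $rectangle->describe();
```

Attempting to instantiate Shape directly, or to extend Rectangle, would be rejected by PHP, so the declarations document the design and enforce it at the same time.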
'Magic quotes' is a crude facility that should be avoided. It was introduced in the early days of PHP to put backslashes in front of quote marks so that strings could be used in ways such as storing in a database without further effort. But for other purposes, the escaping backslashes are a nuisance (they are sometimes visible on web pages when they should not be), and it is in any case better to use database-specific routines for escaping "difficult" characters before storing them. Software that relies on magic quotes will fail on a server that has the option turned off, and the whole issue will be finally settled in PHP version 6, as it will then be totally withdrawn. Wherever possible, Aliro will strip out magic quotes, but this is less reliable than avoiding them in the first place.
I have mixed feelings about symbols. PHP allows a string to be defined as a symbol and given an equivalent value. Symbols are global in scope, which is a reason for disliking them. Another drawback is that they are much more costly than variables, probably because of the work involved in making them global. Once defined, they cannot be altered. This last point can be an advantage for security reasons. If some critical and widely used information can be set as a defined symbol (or constant) very early in the processing, it will generally be available and cannot be altered by any means. So my current view is that symbols should mostly be avoided, but should not be ignored altogether and have a valuable role in specific circumstances.
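A minimal sketch of the kind of use that seems justified follows. The constant name and path are hypothetical; the point is only that the value is set once, early, and is then readable everywhere without being alterable.

```php
<?php
// Hypothetical name and value, set once, very early in processing.
if (!defined('MYCMS_ROOT_PATH')) {
    define('MYCMS_ROOT_PATH', '/var/www/example');
}

function includePath($file)
{
    // Defined symbols are visible here without any global declaration.
    return MYCMS_ROOT_PATH . '/includes/' . $file;
}

$path = includePath('config.php');
```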
In line with those comments, it should not be necessary to make anything global using the PHP global keyword or the $GLOBALS super-global. Use of globals obscures the flow of information through the program, and the explicit passing of parameters is nearly always preferred. Class names are automatically global and, as their name implies, so are the PHP super-globals such as $_POST.
There are many built-in functions in PHP, and because they are made with compiled code, they can operate much faster than PHP code. It is, therefore, worth getting to know about the function library, and using it in preference to writing code wherever this is logical and clear. The array handling functions are particularly extensive and powerful.
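As a small illustration, a handful of built-in array functions can do work that would otherwise need several hand-written loops. The data here is hypothetical.

```php
<?php
// Hypothetical data: menu titles keyed by id.
$titles = array(3 => 'News', 1 => 'Home', 2 => 'Contact');

ksort($titles);                               // sort in place by key (id)
$upper = array_map('strtoupper', $titles);    // transform every value, keys kept
$ids = array_keys($titles);                   // the ids, now in ascending order
$list = implode(', ', $titles);               // join the titles into one string
```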
PHP provides an eval function so that you can dynamically construct some PHP code and then have it executed within a program by invoking eval. It is very rare for this to be unavoidable, and any use of eval involving user input is risky. Mostly it is better to think of an alternative solution.
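One common alternative, sketched here with a hypothetical TextFormatter class, is to check the requested action against a fixed list and then call the method by name, rather than building code as a string and evaluating it.

```php
<?php
// Hypothetical formatter class; the method names are illustrative only.
class TextFormatter
{
    public function upper($text) { return strtoupper($text); }
    public function lower($text) { return strtolower($text); }
}

// Rather than constructing PHP code from $action and passing it to eval,
// accept only known actions and invoke the matching method directly.
function applyAction(TextFormatter $formatter, $action, $text)
{
    $allowed = array('upper', 'lower');
    if (!in_array($action, $allowed, true)) {
        return $text;               // unknown action: leave the text unchanged
    }
    return call_user_func(array($formatter, $action), $text);
}

$result = applyAction(new TextFormatter(), 'upper', 'hello');
$ignored = applyAction(new TextFormatter(), 'phpinfo', 'hello');
```

Because the action is matched against a list written by the developer, nothing supplied by a user is ever executed as code.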
In general, I like to lay code out neatly, but do not believe that there is one particular set of detailed rules for layout that should be slavishly followed. Consistency and clarity are the main criteria, and the latter is liable to be subjective.
Although efficiency is important, I would not allow small differences in performance to determine code to the detriment of clarity. Sometimes code is written knowing that it is slightly less efficient, for the simple reason that it looks better and is therefore easier to grasp. Efficiency is achieved by good design and avoiding writing unnecessary code. The fastest code is the code that has been factored out of the design and no longer exists! For something like a CMS framework, my inclination is towards compactness. This may make code harder to understand at first glance. Provided the logic is directly related to the problem, though, I believe that it is easier to disentangle a short piece of code than a long one.
It is easy to have too many globals in PHP. Before we consider what to do, we should review the generally received opinion that globals are bad, particularly in relation to PHP.
If you just start writing PHP with no particular structure, then every variable that you use is global in scope. That is to say, if there are several separate PHP code files, each of which is PHP code outside of functions and classes, the variables used in the separate files will be the very same variables if they have the same name (assuming all the code is run to handle a single request). Inside functions and classes, the global keyword can be used to declare that a variable has global scope.
There is a general opinion that globals are bad. This needs some qualification, but there are certainly some good reasons to look carefully at globals. Uncontrolled use of global variables is undoubtedly bad. The problem is that it becomes very difficult to isolate what sections of program code are doing, since their operation is liable to be affected by values that may have been altered in any one of a number of places. There is a large benefit in clarity when functions, or the preferred alternative, class methods operate in a way where the only variability comes from the data that is passed in as parameters.
Global variables are not the only kind of globals in PHP. One category has already been mentioned, that of symbols. But also mentioned was the fact that global symbols can have good features. Since they cannot be redefined, they are good candidates for holding critical information about the environment whose modification might compromise security. Still, the number of symbols should be kept small. A good number of symbols are automatically defined by PHP.
Another category is functions and classes. The real reason for using classes is to implement object-oriented design, but a supplementary reason is that method names need to be unique only within the class, and are not global across the system. Thus, they have an advantage over functions.
Yet another is the set of PHP "super-globals" such as $_SERVER and $_POST. These are filled with values by PHP and they can be used anywhere in a program.
It has been pointed out that the data in a database is also a kind of global data, since there is, in general, no constraint on access to database tables. Observing this point, it should be starting to be clear that we cannot expect to eliminate globalness, and it may not even be a sensible goal.
What we really need are some guidelines as to how to constrain globalness. One consideration is that readable globals are a lot less damaging than those that are readable and writeable. This applies to symbols, for example. Ideally, they are set in a limited number of places, and thereafter are read only.
More generally, the solution is to have data guarded by classes that can reasonably be expected to "know" about the data. What we want to do is to avoid scattering data around our programs in an uncontrolled way. But anticipating the next chapter, we can imagine having a class that knows about all the classes available in a whole system, and knows how to load their code when required. It is reasonable to ask for this information at any point in the system.
Likewise, although it is technically possible to access database tables anywhere, it is undesirable for this to be done. Usually, it is better for a particular table to be managed entirely by either a single class, or possibly a small framework of classes. For example, the class that knows about all the system's classes might well be the only one that accesses the corresponding database table.
If we favor program structure, as I do, then we will have very little code that runs outside functions and classes, and so very little globalness will arise in that way. And it is hard to see any good reason for using the global keyword to make information global.
So, our general conclusion is that we should exercise caution in the use of globals, and that where they are justified, they should be managed by appropriate classes, so that their modification takes place in a controlled way, and their use can be easily traced.
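A minimal sketch of data guarded by a class might look like the following. The SiteSettings class and its setting names are hypothetical and are not taken from Aliro; the idea is simply that reading is easy, while every modification passes through one traceable method.

```php
<?php
// Hypothetical class: the only place where site settings may be changed.
class SiteSettings
{
    private $values = array();

    public function __construct(array $initial)
    {
        $this->values = $initial;
    }

    // Reading is allowed from anywhere that holds the object.
    public function get($name, $default = null)
    {
        return isset($this->values[$name]) ? $this->values[$name] : $default;
    }

    // Writing goes through one method, so changes are easy to trace.
    public function set($name, $value)
    {
        if (!isset($this->values[$name])) {
            throw new InvalidArgumentException("Unknown setting: $name");
        }
        $this->values[$name] = $value;
    }
}

$settings = new SiteSettings(array('sitename' => 'Example', 'offline' => false));
$settings->set('offline', true);
$name = $settings->get('sitename');
```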
The crux of the argument for PHP5 is the radically improved object model. PHP4 had object capabilities tacked on almost as an afterthought, and the developers were surprised at the extent to which they were used. Their response to this turn of events was to review the object facilities and make them much better, without sacrificing too much backwards compatibility. Inevitably, though, it is impossible to take full advantage of the new model while writing code that will still run in PHP4.
But before getting into any details, we need to establish why object features matter. Arguments will no doubt continue to rage for a long time yet, so I simply state my views on the subject. Object techniques were first devised long ago, around the time the fundamental ideas of windows, icons, and mice came into being. All of these ideas took a long time to become mainstream, and it was the mid-nineties before object orientation started to become widely accepted.
Before that, building objects had become fundamental to the creation of graphical user interfaces. The model of computer programming as something that had a main line of logic which controlled a variety of subsidiary processing sections no longer worked when the user could pick and choose from a variety of different possible actions. And the creation of complex displays needed to be built up out of simpler, reusable components.
Thinking about the example of the graphical user interface gives us an inkling of the essential nature of object orientation. It is an approach where the software is a model of aspects of the problem. This is easy enough to imagine in a user interface, where we know that there are windows, slide bars, buttons, and so on. There are different kinds of windows, but they all have something in common, and any particular kind of window is likely to be created multiple times with different data. In a CMS, the things that are being modeled are often more abstract, but the principles still apply. Software that models the problem is easier to understand and, if well written, is powerful and flexible.
Bearing this in mind, what I advocate is an approach to object orientation where the classes and objects come naturally from the problem being solved. Arbitrary rules for class building are best avoided in favor of letting the problem show through the code. And although patterns can be immensely valuable, they should also be treated with considerable caution.
Objects, patterns, and refactoring
For more than a decade, object design has had to take account of the idea of patterns. This is a sound principle, which says that real problems produce object-oriented solutions that often fall into one or another pattern. Common patterns can be documented, with guidance on how best they can be implemented.
Refactoring is a new name for an old practice that has frequently not been followed. It has always been true of software development that code is overvalued. The hardest part of development is ironing out the inconsistencies and impossibilities in the original specification, and creating a good solution to the problem. As development proceeds, the problem becomes better understood, and the design ideas better developed. As this goes on, it makes good sense to throw away and rewrite code that is seen as unsatisfactory. It is often feasible to retain the best code for reuse. When this principle is applied in an object environment, an added twist is that the most important moves may be changes to the object model.
Despite the obvious benefits derived from thinking about patterns, I am always concerned to see them becoming too dominant in design. An idea that can be expressed in code is an algorithm, not a pattern. Unfortunately, the seminal book Design Patterns by Gamma, Helm, Johnson and Vlissides, often called the Gang of Four or GoF, includes a coded example for each pattern. These have often been closely followed, ignoring the GoF's caveat that they are intended only as illustrations, and not as models.
Patterns are meant to be at a higher level, where no single algorithm captures all the ways in which the pattern might be used. It therefore makes no sense to attempt a standard implementation of a pattern, nor is it sound design to force a problem into a pattern rather than think through the best solution. It has been observed that letting patterns drive development frequently results in excessively complex solutions that are inefficient and hard to understand.
Another problem is that patterns are often narrowly viewed as a standardized way to build objects in code. But a pattern does not have to be confined to guiding the PHP code; it can be used in a wider context. For example, a CMS may use XML documents for various purposes, notably for defining extensions. Sometimes a better solution can be achieved by using a pattern that embraces more than just the code. That way, both XML documents and PHP objects can be involved as actors in the pattern, which may result in a simpler, cleaner solution. Naturally, XML documents are not the only objects that could play a role in a broader application of patterns.
There are now a number of interesting web articles on the pitfalls of design patterns. A useful starting point is http://blog.jovan-s.com/2009/07/29/do-not-use-design-patterns-upfront/.
With those caveats, let us look briefly at some possible patterns that can help our project.
A number of well-known patterns may be relevant for the construction of a CMS:
The Singleton pattern appears a number of times, especially for the handler objects described later. In theory, handlers could be implemented as class methods, but in PHP5 a class is not itself a first class object and it is more effective to use singleton objects. Other objects are naturally singletons, such as the session object, as PHP5 handles only a single request at a time. Some people object to singletons, but a defense of them is given later.
The Factory pattern is used to create objects when the circumstances alter what class is required. For example, a factory could be used to create the session object which will be of a different class depending on whether the session belongs to an administrator or an ordinary user.
The Observer (sometimes called subject-observer or publish-subscribe) pattern is implemented more as an architecture to handle plugins than as PHP5 classes. This is because the actual process of adding a plugin is an installation that affects the database more than the object structure.
The Command pattern is also used to handle plugins, since the external interface for plugins must be completely generalized. A Factory is used to create the correct plugin object to handle a particular request.
The Bridge and Memento patterns can also be used in the construction of plugins.
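To make the first two patterns concrete, here is a rough sketch in which a factory decides the class of the session object and a singleton guarantees that only one is ever built. The class names are hypothetical and this is an illustration of the idea, not the code used in Aliro.

```php
<?php
// Hypothetical session classes; names are illustrative only.
abstract class Session
{
    private static $instance = null;

    // Factory and singleton combined: the class of the one session object
    // depends on whether the request is for the administrator side.
    public static function getInstance($isAdmin = false)
    {
        if (self::$instance === null) {
            self::$instance = $isAdmin ? new AdminSession() : new UserSession();
        }
        return self::$instance;
    }

    // Protected, so code outside the class hierarchy cannot use "new".
    protected function __construct() {}

    abstract public function role();
}

class AdminSession extends Session
{
    public function role() { return 'admin'; }
}

class UserSession extends Session
{
    public function role() { return 'user'; }
}

$first = Session::getInstance(true);
$second = Session::getInstance();     // returns the object already created
```

The second call ignores its argument because the single session object already exists, which matches the observation that only one request, and therefore one session, is being handled at a time.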
The object-relational compromise
In an ideal world, it would be possible to simply store objects in a database. Certainly, object databases do exist, and have important applications. But the vast majority of CMS designs still rely on a relational database. This kind of database is easily obtained and its strengths are well understood. Unfortunately, there is a mismatch between the principles of relational design and the character of many objects.
Compromises can be made in both areas. An example will make this clearer. In common with other systems, Aliro implements a system of plugins. A plugin is a piece of code that can be installed in the system to respond to some kind of "trigger". This allows the CMS to easily support added functionality in areas that can be defined in outline but not in detail. So we know that there is a need for user authentication, but there are alternative ways to achieve this. The CMS may provide as default an ID and password scheme. But we can imagine that the CMS might need to interface to an LDAP system, or any number of other possible authentication mechanisms. The authentication of a user is therefore handled by one or more plugins, triggered just at the point a user needs to be authenticated.
Now a single plugin might want to respond to several different triggers, with a different but related action for each. Pure relational analysis would require that if information about the plugin is held in a relational table, details of the triggers for each plugin would have to be in another table, since there may be more than one. In fact, Aliro makes a relational compromise here and stores the triggers for a plugin as a comma-separated list in the table of plugins. This design decision keeps the relational table structure simple, although impure.
It also implies something about the object mechanisms. It is messy to retrieve information from a relational table when the key is part of a list held as a single field. But Aliro does not do that. The number of plugins and the amount of data needed to describe them is small enough that there is a plugin handler, which knows about all the plugins and their triggers. The handler simply reads the entire database table describing plugins and builds internal data structures that make the actual operations on plugins as simple as possible. Because of this, a plugin object needs to know its triggers, but does not need to be stored in a relationally pure form. The handler is a singleton object that is normally stored in cache and only built infrequently.
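The approach can be sketched as follows. The class, the column names, and the trigger names are all hypothetical, and an array of rows stands in for the result of reading the whole plugin table; the real handler in Aliro may differ considerably.

```php
<?php
// Hypothetical handler; $rows stands in for the contents of the plugin table.
class PluginHandler
{
    private $byTrigger = array();   // trigger name => list of plugin names

    public function __construct(array $rows)
    {
        foreach ($rows as $row) {
            // The triggers column holds a comma-separated list.
            $triggers = array_map('trim', explode(',', $row['triggers']));
            foreach ($triggers as $trigger) {
                $this->byTrigger[$trigger][] = $row['name'];
            }
        }
    }

    // Lookup by trigger is now a simple array access.
    public function pluginsFor($trigger)
    {
        return isset($this->byTrigger[$trigger]) ? $this->byTrigger[$trigger] : array();
    }
}

$rows = array(
    array('name' => 'ldapauth', 'triggers' => 'onAuthenticate, onLogout'),
    array('name' => 'passwordauth', 'triggers' => 'onAuthenticate'),
);
$handler = new PluginHandler($rows);
$authPlugins = $handler->pluginsFor('onAuthenticate');
```

The awkward question of finding a key inside a list field never reaches the database, because the list is split once in memory and indexed by trigger.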
The object-relational compromise thus means that design is biased towards breaking relational rules in minor ways and giving some precedence to choosing objects that are simple to store. This compromise will be seen again as we build the CMS framework.
Basics of combining PHP and XHTML
Now we digress from the more or less arcane issues of object design into the practicalities of creating XHTML. Although I mentioned HTML in the historical comments early in this chapter, I am now assuming that all current development is implementing XHTML, preferably at the 1.0 Strict level. We need to consider how to generate XHTML using PHP scripts before we can adequately consider Model-View-Controller (MVC) architecture. There are several ways in which PHP can handle XHTML. It can, like pretty much any programming language, create XHTML as strings, in either single or double quotes. If double quotes are used, then PHP variables can be embedded in the string. Unlike most other languages, PHP can also flip between program code and XHTML. It will assume that it is processing XHTML (and simply pass it on to the web server) until it comes across the <?php start tag. It then handles everything as program code until it reaches a ?> end tag. This means that it is possible to write pure XHTML with occasional interruptions of PHP code, such as the following fragment:
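A minimal sketch of this style is shown here. The variables $title and $items are hypothetical stand-ins for data that a real page would obtain elsewhere.

```php
<?php
// Hypothetical values; a real page would obtain these from the application.
$title = 'Latest news';
$items = array('First story', 'Second story');
?>
<div class="news">
  <h2><?php echo htmlspecialchars($title); ?></h2>
  <ul>
<?php foreach ($items as $item) { ?>
    <li><?php echo htmlspecialchars($item); ?></li>
<?php } ?>
  </ul>
</div>
```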
As code like this builds up, I find all the clutter of PHP tags and the necessity for the 'echo' irritating and a barrier to comprehension. Also, it is often more effective and more flexible to build chunks of XHTML in memory before transmitting a substantial block to the browser. The other PHP alternative is one that I neglected for a long time, and which seems to be ignored by a lot of other developers. It is the PHP heredoc. It is a kind of string, but bounded by a name rather than any kind of quote mark. Here is a simple example:
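A minimal sketch of a heredoc, assuming a hypothetical $heading value and purely illustrative markup:

```php
<?php
// Hypothetical heading text.
$heading = 'Latest news';

// Everything up to the closing DETAIL_HTML; line is part of the string.
$html = <<<DETAIL_HTML
<div class="detail">
  <h2 class="title">$heading</h2>
  <p>Quotes such as "these" and 'these' need no escaping.</p>
</div>
DETAIL_HTML;

echo $html;
```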
Now we have the advantage that nothing terminates the XHTML text until we get to the concluding DETAIL_HTML; line, so there is no need to worry about escaping quotes. The PHP tags have also disappeared, and the PHP variable $heading is simply included within the XHTML. In PHP4, it was also possible to include an object property, such as $this->heading. But PHP5 goes a lot further, and provided they are included in curly brackets, complex expressions that start with a $ sign can be written within heredoc, as follows:
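A minimal sketch of curly-bracket expressions inside heredoc, using a hypothetical NewsItem class with a property, an array, and a method:

```php
<?php
// Hypothetical item class, purely for illustration.
class NewsItem
{
    public $heading = 'Latest news';
    public $tags = array('first' => 'php');

    public function getAuthor()
    {
        return 'A. Writer';
    }
}

$item = new NewsItem();

// Each {...} expression starts with a $ sign and is evaluated in place.
$html = <<<DETAIL_HTML
<div class="detail">
  <h2>{$item->heading}</h2>
  <p>Tagged "{$item->tags['first']}" by {$item->getAuthor()}</p>
</div>
DETAIL_HTML;
```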
Individual developers will make choices about how much to assign complex PHP into simple variables, and how much to use it directly within heredoc. Whatever the precise implementation, the result should be code that can be understood by a web designer without needing to know much at all about PHP.
Note
Note that although the printed layout of example code does not show it clearly, the terminating name, such as DETAIL_HTML; for heredoc must be at the extreme left-hand side of a separate line with no whitespace before or after it.
Model, view, and controller
It has long been agreed that it is good to make a separation between the model and views of the model. The model is understood as the set of objects that emulates the problem being solved by a computer application. As an example, one class of objects that is likely to be useful in a CMS is the class of menu items. An individual menu item is likely to know its own title, as displayed to the user. It probably also knows something about how to invoke the service for which it is the menu entry. It is quite a simple class, with relatively few properties and a handful of methods (or behaviors). But we will want to view menu items in different ways, so that one view is where the item is part of a menu shown to the user in a browser window.
Another view is shown to the administrator who will be interested in controlling the appearance and function of the item, and perhaps who is permitted to see it. So it makes sense to keep model and view separate. Views may well change in different ways and at different times from the model.
The MVC pattern comes about because of a feeling that views should be further refined by separating out something called a controller. The view concentrates on presentation to the user, taking information from the model. The controller manages the situation by handling input from the user, organizing the required objects, and then supplying them to the appropriate view. This approach minimizes the amount of decision making required in the view, so that it can be written using simple code.
The phrase "simple code" is deliberately vague. This is the point at which advocates of templates jump in and claim that views should be created using a template system (popular examples include patTemplate and Smarty). The role of the controller then includes marshalling data and somehow transferring it into the template system. Templates themselves still require some conditional and looping capabilities, unless the number of templates and the extent of duplication are to grow uncontrollably. This is inevitable, since displays are often repetitive, and we would not want to have one template when there are five repeats and a different template when there are six, and so on. Neither do we want to have different templates to allow for optional sections.
Given this need for control at the template level, the main template systems have introduced their own syntax. Skepticism about the added value from template systems is growing. While the principle of keeping the display output (typically including a good deal of XHTML) free of complex logic is sound, the doubt is whether template systems provide the best way to do this. There is no good reason why the syntax adopted for templates should be any easier to understand than straightforward PHP. And given that in practice, the division between software developers and web designers is a fuzzy one, there is a clear advantage to using a single language throughout. This also eliminates an overhead in marshalling data, which is already available through direct access to the objects that model the problem domain.
My approach to MVC, and that of Aliro, is therefore to create a model of the problem that is developed purely to provide functionality that makes sense in the terms used to discuss the problem. It exists in a kind of abstracted computing world, devoid of user interfaces. Controllers then look at requests from users and make sure that the relevant parts of the problem are instantiated as objects. Those objects are then passed to view classes, which contain the simplest possible code needed to present data from the model to the user in the form required. Usually, the form is XHTML.
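The division of labor can be sketched in miniature, using the menu item example from earlier in this section. All three classes, and the shape of the request array, are hypothetical; this is an illustration of the separation, not Aliro's actual code.

```php
<?php
// Model: knows nothing about user interfaces.
class MenuItem
{
    private $title = '';
    private $link = '';

    public function __construct($title, $link)
    {
        $this->title = $title;
        $this->link = $link;
    }

    public function getTitle() { return $this->title; }
    public function getLink() { return $this->link; }
}

// View: the simplest code that turns model objects into XHTML.
class MenuView
{
    public function render(array $items)
    {
        $rows = '';
        foreach ($items as $item) {
            $title = htmlspecialchars($item->getTitle());
            $link = htmlspecialchars($item->getLink());
            $rows .= '<li><a href="' . $link . '">' . $title . '</a></li>';
        }
        return <<<MENU_HTML
<ul class="menu">$rows</ul>
MENU_HTML;
    }
}

// Controller: reads the request, builds model objects, hands them to a view.
class MenuController
{
    public function handle(array $request)
    {
        $items = array(new MenuItem('Home', '/'));
        if (!empty($request['admin'])) {
            $items[] = new MenuItem('Settings', '/admin/settings');
        }
        $view = new MenuView();
        return $view->render($items);
    }
}

$controller = new MenuController();
$output = $controller->handle(array('admin' => 1));
```

All the decision making sits in the controller, so the view contains nothing more demanding than a loop and some string building.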