PHP 5 CMS Framework Development - 2nd Edition

Chapter 1. CMS Architecture

This chapter lays the groundwork that helps us to understand what Content Management Systems (CMS) are all about. First, it summarizes the whole idea of a CMS: where it came from and what it looks like. This is followed by a review of the technology that is advocated here for CMS building. Next, we will take account of how the circumstances in which a CMS is deployed affect its design; some of the important environmental factors, including security, are considered. Finally, all these things are brought together in an overview of CMS architecture. Along the way we introduce Aliro, the CMS framework that is used for illustrating implementations throughout this book.

The idea of a CMS

Since you are reading this book, most likely you have already decided to build or use a CMS. But before we go into any detail, it is worth spending some time presenting a clear picture of where we are and how we got here. To be more precise, I will describe how I got here, in the expectation that at least some aspects of my experiences are quite typical.

The World Wide Web (WWW) is a huge set of interlinked documents built using a small group of simple protocols, originally put together by Tim Berners-Lee. Prominent among them was HTML, a simplified markup language. The protocols utilized the Internet with the immediate aim of sharing academic papers. The Web performed this useful function for some years while the Internet remained relatively closed, with access limited primarily to academics. As the Internet opened up during the nineties, early efforts at web pages were very simple. I started up a monthly magazine that reflected my involvement at the time with OS/2 and wrote the pages using a text editor. While writing a page, a tag was needed occasionally, but the work was simple, since for the most part the only tags used were headings and paragraphs, with the occasional bold or italic. With the addition of the odd graphic, perhaps including a repeating background, the result was perfectly presentable by the standards of the time.

But that was followed by a period in which competition between browsers was accompanied by radical development of complex HTML to create far higher standards of presentation. It became much harder for amateurs to create presentable websites, and people started to look for tools. One early success was the development of Lotus Notes as a CMS, by grafting HTML capability onto the existing document-handling features. While this was not a final solution, it certainly demonstrated some key features of CMS. One was the attempt to separate the skills of the web designer from the knowledge of the people who understood the content. Another was to take account of the fact that websites increasingly needed a way to organize large volumes of regularly changing material.

As HTML evolved, so did the servers and programs that delivered it. A significant evolutionary step was the introduction of server-side scripting languages, the most notable being PHP. They built on traditional "third generation" programming language concepts, but allied to special features designed for the creation of HTML for the Web. As they evolved, scripting languages acquired numerous features that are geared specifically to the web environment.

The next turning point was the appearance of complete systems designed to organize material, and present it in a slick way. In particular, open source systems offered website-building capabilities to people with little or no budget. That was exactly my situation a few years ago, as a consultant wanting a respectable website that could be easily maintained, but costing little or nothing to buy and run. A number of systems could lay claim to being ground breakers in this area, and I tried a few that seemed to me to not quite achieve a solution.

For me, the breakthrough came with Mambo 4.5. It installed in a few minutes, and already there was the framework of a complete website, with navigation and a few other useful capabilities. The vital feature was that it came with templates that made my plain text look good. By spending a small amount of money, it was possible to have a personalized template that looked professional, and then it took no special skills to insert articles of one kind or another. Mambo also included some simple publishing features to support the workflow involved in the creation and publication of articles. Mambo and its grown-up offspring Joomla! have become well-known features in the CMS world.

My own site relied on Mambo for a number of years, and I gradually became more and more involved with the software, eventually becoming leader of the Mambo development team for a critical period in the development of version 4.6. For various reasons, though, I finally departed from the Mambo organization and eventually wrote my own CMS framework, called Aliro. Extensions that I develop are usually capable of running on any of MiaCMS, Mambo, Joomla!, or Aliro. The Aliro system is used to provide all the code examples given here, and you can find a site that is running the exact software described in this book at http://packt.aliro.org.

Some people said of the first edition of this book that it was only about Aliro. In one sense that is true, but in another it is not. Something like a CMS consists of many parts, but they all need to integrate successfully. This makes it difficult to take one part from here, another from there, and hope to make them work together. And in order to give code examples that could be relied on to work, I was anxious to take them from a complete system. However, when creating Aliro I sought to question every single design decision and never do anything without considering alternatives. This book aims to explain the issues that were reviewed along the way, as well as the choices made. You may look at the same issues and make different choices, but I hope to help you in making your choices. I also hope that people will find that some of the ideas here can be applied in areas other than CMS frameworks.

From time to time, you will find mentions of backwards compatibility, mostly in relation to the code examples taken from Aliro. In this context, backwards compatibility should be understood to be features that have been put into Aliro so that software originally designed to run with Mambo (or its various descendants) can be used with relatively little modification in Aliro. The vast majority of the Aliro code is completely new, and no feature of older systems has been retained if it seriously restricts desirable features or requires serious compromise of sound design.

Critical CMS features

It might seem that we have now defined a CMS as a system for managing content on the Web. That would be to look backwards rather than forwards, though. In retrospect, it is apparent that one of the limitations of systems like Mambo is that their design is geared too heavily to handling documents. While every website has some pages of text, few are now confined to that. Even where text is primary, older systems are pushed to the limit by demands for more flexibility in who has access to what, and who can do what.

While the so called "core" Mambo system could be installed with useful functionality, an essential part of Mambo's success was the ability to add extensions. Outside the core development, numerous extra functions were created. The existence of this pool of added capabilities was vital to many users of Mambo. For many common requirements, there was an extension available off the shelf. For unusual cases, either the existing code could be customized or new code could be commissioned within the Mambo framework. The big advantages were the ability to impose overall styling and the existence of site-wide schemes for navigation and other basic services.

The outcome is that the systems have outgrown the CMS tag, as the world of the Web has become ever more interactive. Sites such as Amazon and eBay have inspired many other innovations where the website is far more than a compendium of articles. This is reflected in a trend for the CMS to migrate towards being a framework for the creation of web capabilities. Presentation of text, often with illustrations, is one important capability, but flexibility and extensibility are critical.

So what is left? As with computing generally, new ideas are often implemented as islands. There is then pressure to integrate them. At the very least, the aim is to show users a single, rich interface, preferably with a common look and feel. The functionality is likely to be richer if the integration runs deeper than the top presentation level. For example, integration is excessively superficial if users have to authenticate themselves separately for different facilities in the same website. Ideally, the CMS framework would be able to take the best-of-breed applications and weave them together through commonly agreed APIs, RESTful interfaces, and XML-RPC exchanges. Today's reality is far from this, and progress has been slow, but some integration is possible.

It should now be possible to create a list of essential requirements and another list of desirable features for a CMS. The essentials are:

Continuity: Despite the limitations of basic web protocols, many website functions need to retain information through a series of user interactions and the information must be protected from hijacking. The framework should handle this in a way that makes it easy for extensions to keep whatever data they need.
User management: The framework needs to provide the fundamentals for a system of controlling users via some form of authentication. But this needs to be flexible so that the least amount of code is installed to handle the requirement, which can range from a single administrative user to handling hundreds of thousands of distinct users and a variety of authentication systems.
Access control: Constraints are always required, if only to limit who can configure the website. Often much more is needed as various groups of users are allocated different privileges. It is now widely agreed that the best approach is the Role-Based Access Control (RBAC) system. This means that it is roles that are granted permissions, and accessors are allocated roles. It is preferable to think of accessors rather than users, since roles also need to be given to things other than just users, such as computer systems.
Extension management: A framework is useful if it can be easily extended. There is no single user-visible facility that is essential to every website, so ideally the framework is stripped of all such functions. Each capability visible to users can then be added as an extension. When the requirements for building a website are considered, it turns out that there are several different kinds of extensions. One well-known classification is into components, modules, plugins, and templates. These are explained in detail in Chapter 8.
Security and error handling: Everyone is aware of the tide of threats from spam to malicious cracking of websites. To be effective, security has to be built in from the start so that the framework not only achieves the best possible security, but also provides a helpful environment for building secure extensions. Errors are significant both as a usability problem and a potential security flaw, so a standard error handling mechanism is also required.
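The role-based idea in the access control item can be made concrete with a small sketch. It is only an illustration under stated assumptions: the class name RbacSketch, its method names, and the accessor identifiers are hypothetical and are not taken from Aliro. Permissions are granted to roles, and roles are allocated to accessors, which need not be human users.

```php
<?php
// Minimal RBAC sketch: permissions attach to roles, roles attach to accessors.
// All names here are illustrative assumptions, not part of any real framework.
class RbacSketch
{
    // role name => array of "action:subject" => true
    private $rolePermissions = array();
    // accessor identifier => array of role name => true
    private $accessorRoles = array();

    // Grant a role the right to perform an action on a kind of subject.
    public function permit($role, $action, $subject)
    {
        $this->rolePermissions[$role][$action . ':' . $subject] = true;
    }

    // Allocate a role to an accessor (a user, or some other system).
    public function assign($accessor, $role)
    {
        $this->accessorRoles[$accessor][$role] = true;
    }

    // An accessor is allowed if any of its roles holds the permission.
    public function isAllowed($accessor, $action, $subject)
    {
        if (!isset($this->accessorRoles[$accessor])) {
            return false;
        }
        foreach (array_keys($this->accessorRoles[$accessor]) as $role) {
            if (isset($this->rolePermissions[$role][$action . ':' . $subject])) {
                return true;
            }
        }
        return false;
    }
}

$rbac = new RbacSketch();
$rbac->permit('editor', 'publish', 'article');
$rbac->assign('user:42', 'editor');
$rbac->assign('system:feed-importer', 'editor'); // an accessor that is not a person
```

Because checks are made against roles rather than individual users, changing what editors may do is a single call to permit, however many accessors hold the role.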

Desirable CMS features

Most people would not be content to stop with the list of critical features. Although they are the essentials, it is likely that more facilities will be needed in practice, especially if the creation of extensions is to be made easy. The list of desirable features certainly includes:

Efficient and maintainable code handling: The framework is likely to consist of a number of separate code files. It is essential that they be loaded when needed, and preferable that they are not loaded if not needed. The mechanisms used need to be capable of handling extra code files added as extensions.
Database interface: Many web applications need access to a database to be able to function efficiently. The framework itself needs a database to perform its own functions. While PHP provides an interface to various databases, there is much that can be done in a CMS framework to provide higher level functions to meet common requirements. These are needed both by the framework and by many extensions.
Caches: These are used in many different contexts for Internet processing. To date, the two most productive areas have been object and XHTML caching. Both the speed of operation and the processing load benefit considerably from well implemented caches. So it is highly desirable for a CMS framework to provide suitable mechanisms that are lightweight and easy to use.
Menus: These are a common feature of websites, especially when taken in the widest sense to include such things as navigation bars and other ways to present what are essentially lists of links. It is not desirable for the framework to create final XHTML because that preempts decisions about presentation that should belong to templates or other extensions. But it is desirable for the framework to provide the logic for creating and managing menus, including a standard interface to extensions for menu creation. The framework should also provide menu data in a way that makes it easy to create a menu display.
Languages: Nowadays, as a minimum, software development should take account of the requirements imposed by implementation in different languages, including those that need multi-byte characters. It is now broadly agreed that part of the solution to this requirement is the use of UTF-8. A mechanism to allow fixed text to be translated is highly desirable. The bundle of issues raised by demands for language support are usually described using the terms internationalization and localization. The first is the building of capabilities into a system to support different ways of doing things, of which the most prominent is choice of language. Localization is the deployment of specific local characteristics into a system that has been internationalized. Apart from language itself, matters to be considered include the presentation of dates, times, monetary amounts, and numbers.
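The first item in this list, loading code files only when they are needed, can be sketched with PHP's spl_autoload_register. This is a minimal illustration, assuming a simple map from class names to file paths; the map, the DemoGreeter class, and the temporary file are all invented for the example, and a real framework would have to build its map from its own files and from installed extensions.

```php
<?php
// Hypothetical class map: class name => file that declares it.
$classMap = array();

// Write a throwaway class file so that the sketch is self-contained.
$demoFile = sys_get_temp_dir() . '/DemoGreeter_' . uniqid() . '.php';
file_put_contents(
    $demoFile,
    '<?php class DemoGreeter { public function greet() { return "hello"; } }'
);
$classMap['DemoGreeter'] = $demoFile;

// The autoloader is called only when an unknown class is first used,
// so files for classes that are never used are never read.
spl_autoload_register(function ($className) use (&$classMap) {
    if (isset($classMap[$className])) {
        require_once $classMap[$className];
    }
});

$loadedBefore = class_exists('DemoGreeter', false); // second argument: do not autoload
$greeter = new DemoGreeter();                       // first use triggers the autoloader
$loadedAfter = class_exists('DemoGreeter', false);

unlink($demoFile); // tidy up the temporary file
```

An extension could add its own classes simply by adding entries to the map, without the framework loading any of them until they are used.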

Many other services are useful, such as handling the sending of e-mails, assistance in the creation of XHTML, insulating applications from the file system, and so on. But before considering an approach to implementation, there is an important matter of how a CMS is to be managed.

System management

In this discussion of system management, it is assumed that a web interface is provided. The person in control of a site, typically called the manager or administrator, is often in the same situation as the user of the site. That is to say, the site itself is installed on a hosted web server distant from both its users and its managers. A logical response to this scenario is to implement all interactions with the site through web interfaces.

There are disagreements about how much, if any, system management should be kept apart from user access. One school of thought requires a distinct management login using a slightly different URI. Opposing this is the view that everything should be done from the same starting point, but allowing different facilities according to the identity of the user. Drupal is the best known example of the latter approach, while Mambo and Joomla! keep the administrator separate. Aliro continues along the path trodden by Mambo and Joomla!

There is some justification for the idea that everything should be merged, with no distinct administrator area. As the CMS grows in sophistication, user groups proliferate; the distinction between an administrator and a privileged user is hard to sustain. Typically, visitors may be given quite a lot of read access to site material, but constrained write access, mainly because of misuse problems. But users who have identified themselves to the site may be given quite extensive capabilities. These might extend to having areas of the site where they are able to publish their own material. The registered user can thus become an administrator of his/her own material, needing similar facilities to a site administrator.

The argument in favor of splitting off some administrative functions is largely to do with security. Someone at the highest administrator level is likely to have access to tools that are capable of destroying the site and possibly the whole server. With everything merged, the safety of key administrative functions depends critically on the robustness of user management. It is difficult to be completely confident in this, especially as the total volume of software deployed on a site becomes large. Allowing access to the most sensitive administrative functions only through a distinct URI and login mechanism allows for other security mechanisms to be combined with the CMS user management. This might be a different user and password scheme implemented using Apache, or it might be a constraint on the IP addresses permitted to access the administrator login URI. No security mechanism is perfect, but combining more than one mechanism increases the chances of keeping out intruders. More is said about security issues in a later section of this chapter.
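A constraint on IP addresses is often imposed by the web server, but as an additional layer it could also be checked in PHP at the top of the administrator entry point. The following is only a sketch of that idea: the function name is hypothetical, and the addresses come from the ranges reserved for documentation. In a live request the address to test would normally come from $_SERVER['REMOTE_ADDR'].

```php
<?php
// Return true only if the remote address is on the allow-list.
// The strict flag avoids any loose type comparison surprises.
function isAdminAddressAllowed($remoteAddress, array $allowedAddresses)
{
    return in_array($remoteAddress, $allowedAddresses, true);
}

// Example allow-list (documentation-range addresses, purely illustrative).
$allowed = array('192.0.2.10', '192.0.2.11');

$okay = isAdminAddressAllowed('192.0.2.10', $allowed);
$blocked = isAdminAddressAllowed('203.0.113.5', $allowed);
```

A check like this supplements the administrator login rather than replacing it, in keeping with the principle of combining more than one mechanism.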

Because of the separatist arguments, Aliro is implemented with a distinct administrator login to a small range of critical functions. Extensions added to the CMS have the ability to implement an administrator-side interface, but are free to make their own design decisions on the balance to be struck. The functions provided by the Aliro base system for administrators are as follows:

Basic system configuration such as details of databases used, caching options, mailing options, and presentation of system information
Management of extensions through the ability to install packages of software or to remove them, and the ability to manage what appears on which display
A particular part of extension management is the handling of themes (formerly known as templates in the Mambo world) that affect the presentation of the whole site
Management of a folder system that supports a tree structure of arbitrary depth, around which site content can be constructed
Creation and management of menu information
Access to error reports that contain detailed diagnostic information
A generalized system for modifying URIs to be friendly to humans and search engines, and to manage metadata
Whatever management functions are provided by extensions to the basic CMS

In Aliro, some of the critical classes that provide these facilities are not known to the general user side of the system, which provides another obstacle to misuse. Indeed it is possible to rename the directory under which code exclusive to the administrator side of the system resides. Code on the general user side does not have any straightforward means to find out where the administrator code exists. On balance, I believe that splitting off the most fundamental administrative functions is the more secure policy.

Now we have lists of essential and desirable CMS features, together with a set of administrator functions. We also need to start thinking about the technology needed for building a CMS.

Technology for CMS building

Earlier we looked at how changing demands on websites occurred alongside innovation in technology, and particularly mentioned the arrival of scripting languages. Of these, JavaScript is the most popular at present, but for server-side scripting the favorite is PHP. With version 5, PHP reached a new level. The most significant changes are in the object-oriented features. These were thought to be a kind of "extra" when they were introduced into version 4. But extensive and enthusiastic use of these features to build object-oriented applications has led to PHP5 being built with a much better range of class and object capabilities. This provides the opportunity to adopt a much more thoroughgoing object orientation in the building of a new CMS framework. Strangely, despite all the talk of "Internet years" and rapid change, the move to PHP5 has been extremely slow, taking about five years from first release to widespread deployment.

Leveraging PHP5

Software developers can argue at length about the relative merits of different languages, but there is no doubt that PHP has established itself as a very popular tool for creating active websites. Two factors stand out, one of which applies to PHP generally, the other specifically to PHP5.

The general consideration is the ongoing attempt to separate the creation of views (which in practice means creating XHTML) from the problem-oriented logic. More generally, the aim is to split development according to the MVC pattern: model, view, and controller. While some have seen a need to create templating systems to achieve this, such systems have always been questionable on the grounds that PHP itself contains the necessary features for handling XHTML in a sound way. The trend recently has been to see templating systems as an unnecessary overhead. Indeed, one developer of a templating system has written to say that he now considers such systems undesirable. So a significant advantage of using PHP is the ability to handle XHTML neatly. There still remain plenty of unsolved problems in this area, notably the viability of widget libraries and the issue of how to support easy customization. Despite those problems, PHP offers powerful mechanisms for dealing with XHTML, briefly illustrated in a later section.
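As a small taste of what PHP can do unaided, here is a sketch of a view written with PHP's alternative control syntax and output buffering. The function name renderMenu and the sample link are assumptions made for the example; the point is only that markup and escaping can be handled without a separate templating language.

```php
<?php
// Build an XHTML list of links. The markup is written directly,
// with PHP supplying the loop and escaping the values.
function renderMenu(array $links)
{
    ob_start(); // capture the output instead of sending it to the browser
    ?>
<ul>
<?php foreach ($links as $url => $label): ?>
  <li><a href="<?php echo htmlspecialchars($url, ENT_QUOTES, 'UTF-8'); ?>"><?php echo htmlspecialchars($label, ENT_QUOTES, 'UTF-8'); ?></a></li>
<?php endforeach; ?>
</ul>
<?php
    return ob_get_clean(); // hand back the captured markup as a string
}

$html = renderMenu(array('/news' => 'News & events'));
```

The ampersand in the label is escaped on output, so the data passed in by the controlling code does not need to know anything about XHTML.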

The specific advantage of PHP5 is its greatly improved provisions for classes and objects. Many experienced developers take the view that object principles lead to more flexible systems and better quality code. Of course, this does not happen automatically. Knowledge and skill are still required. More detailed comments about object-oriented development are made in a later section.

Note

When I left the Mambo development team and decided to create a radically changed CMS that would evolve out of the Mambo history, it meant a major commitment of development effort. Given the huge advantage of PHP5 through its radically improved handling of classes and objects, it would have seemed foolish to commit so much effort to an obsolescent system. Because object orientation enables such radical improvements to the design of a CMS framework, it seemed to me that the logical conclusion was to work in PHP5 and wait for the world to catch up. It is now easy to find PHP5 hosting, and most developers have either migrated or are currently making the transition.

Some PHP policies

Before we go into specifics in later chapters, there are some general points about PHP that apply everywhere. There is scope for varying opinions in programming practice, so it has to be said that these are only my opinions and others may well disagree. But they do affect the way in which the code examples are written, so mentioning them may aid understanding. Much more could be said; the following comments are a selection of what seem the most important considerations for sound use of PHP. Other points will become apparent through the rest of the book.

PHP will not fail if variables are uninitialized, as it will assume that they are null and will issue a notice to tell you about it. Sometimes, PHP software is run with warnings and notices suppressed. This is not a good way to work. It hardly requires any more effort to write code so that variables are always initialized before use. The same applies to all other situations that give rise to notices or warnings, which can be easily avoided. Often, quite serious logical errors can be picked up by seeing a notice or warning. The error may not make the code fail in an obvious way, but nonetheless something may be going badly wrong. A low-level error is frequently an important sign of a problem. It is therefore best to make sure that you find out about every level of error.
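The following sketch shows the difference in practice. It assumes a simple error handler that just collects messages; the function names are invented for the example. Note that the exact severity PHP assigns to an undefined variable depends on the version (later versions raise a warning rather than a notice), which is one more reason to report every level.

```php
<?php
error_reporting(E_ALL); // ask PHP to report every level of error

// Collect diagnostics instead of displaying them, so they can be inspected.
$problems = array();
set_error_handler(function ($level, $message) use (&$problems) {
    $problems[] = $message;
    return true; // handled; do not fall through to PHP's own handler
});

// Careful version: the accumulator is initialized before use.
function totalLengthCareful(array $words)
{
    $total = 0;
    foreach ($words as $word) {
        $total += strlen($word);
    }
    return $total;
}

// Sloppy version: $total is used before it has been given a value.
function totalLengthSloppy(array $words)
{
    foreach ($words as $word) {
        $total += strlen($word);
    }
    return $total;
}

$words = array('content', 'management');
$careful = totalLengthCareful($words);
$problemsAfterCareful = count($problems);
$sloppy = totalLengthSloppy($words);
$problemsAfterSloppy = count($problems);

restore_error_handler();
```

Both functions return the same number here, which is exactly the danger: without the diagnostic, nothing would reveal that the second one relies on an accident.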

Declarations are powerful, and it pays to maximize their power. Classes can be declared as abstract when they are not intended to be used on their own to create objects, but used only to build subclasses. Conversely, classes or methods that should not be subclassed can be declared final. Methods can be declared as public, private, or protected and the most suitable options should always be explicitly chosen. Properties inside an object should be declared wherever possible, and like methods, their visibility should be stated. In line with the previous comments, it is a good idea to initialize every declared variable in a class with a reasonably safe value.
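A short sketch may help to show these declarations working together. The class names are hypothetical and chosen only for illustration; they are not taken from Aliro.

```php
<?php
// Abstract: exists only to be subclassed, never instantiated directly.
abstract class BaseExtension
{
    // Declared properties with explicit visibility and safe initial values.
    protected $name = '';
    private $installed = false;

    public function __construct($name)
    {
        $this->name = (string) $name;
    }

    // Final: subclasses may not change how installation is recorded.
    final public function install()
    {
        $this->installed = true;
        return $this->describe();
    }

    public function isInstalled()
    {
        return $this->installed;
    }

    // Each concrete subclass must say how it describes itself.
    abstract protected function describe();
}

// Final class: not intended to be subclassed any further.
final class MenuModule extends BaseExtension
{
    protected function describe()
    {
        return 'module: ' . $this->name;
    }
}

$module = new MenuModule('mainmenu');
$summary = $module->install();
```

Attempting to write new BaseExtension('x'), to extend MenuModule, or to override install would each be rejected by PHP, so the declarations document the design and enforce it at the same time.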

'Magic quotes' is a crude facility that should be avoided. It was introduced in the early days of PHP to put backslashes in front of quote marks so that strings could be used in ways such as storing in a database without further effort. But for other purposes, the escaping backslash is a nuisance (they are sometimes visible on web pages when they should not be) and it is anyway better to use database-specific routines for escaping "difficult" characters before storing them. Software that relies on magic quotes will fail on a server that has the option turned off, and the whole issue will be finally settled in PHP version 6, as it will then be totally withdrawn. Wherever possible, Aliro will strip out magic quotes, but this is less reliable than avoiding them in the first place.
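Stripping unwanted backslashes from request data can be sketched as a recursive function. This is an illustration only, not the code used in Aliro: the function name is hypothetical, and the step of detecting whether magic quotes are active is left out because it depends on the PHP version in use.

```php
<?php
// Remove escaping backslashes from a value, descending into nested arrays.
// Non-string values (numbers, booleans, null) are returned untouched.
function stripSlashesDeep($value)
{
    if (is_array($value)) {
        return array_map('stripSlashesDeep', $value);
    }
    return is_string($value) ? stripslashes($value) : $value;
}

// Data as it might arrive with magic quotes applied.
$raw = array(
    'author' => 'O\\\'Brien',
    'tags' => array('say \\"hi\\"'),
    'count' => 3,
);
$clean = stripSlashesDeep($raw);
```

Once the data is clean, escaping for storage is better left to the routines of the database interface being used.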

I have mixed feelings about symbols. PHP allows a string to be defined as a symbol, using define, and given an equivalent value. Symbols are global in scope, which is a reason for disliking them. Another drawback is that they are much more costly than variables, probably because of the work involved in making them global. Once defined, they cannot be altered. This last point can be an advantage for security reasons. If some critical and widely used information can be set as a defined symbol (or constant) very early in the processing, it will generally be available and cannot be altered by any means. So my current view is that symbols should mostly be avoided, but should not be ignored altogether, since they have a valuable role in specific circumstances.
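The security property is easy to demonstrate. In this sketch the symbol name and the paths are invented for the example; a temporary error handler is installed only so that the diagnostic from the second define does not clutter the output.

```php
<?php
// Set a critical value very early in processing.
define('SKETCH_SITE_ROOT', '/var/www/example');

// A later attempt to change it fails: define returns false
// and PHP raises a diagnostic, which is swallowed here.
set_error_handler(function () {
    return true;
});
$redefined = define('SKETCH_SITE_ROOT', '/tmp/elsewhere');
restore_error_handler();

$currentRoot = SKETCH_SITE_ROOT; // still the original value
```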

In line with those comments, it should not be necessary to make anything global using the PHP global keyword or the $GLOBALS super-global. Use of globals obscures the flow of information through the program and the explicit passing of parameters is nearly always preferred. Class names are automatically global, and as their name obviously implies, so are the PHP super-globals such as $_POST.

There are many built-in functions in PHP, and because they are made with compiled code, they can operate much faster than PHP code. It is, therefore, worth getting to know about the function library, and using it in preference to writing code wherever this is logical and clear. The array handling functions are particularly extensive and powerful.
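
As a small illustration, here is the same calculation written first as a loop and then with built-in array functions. The data and the function name demoIsSubstantial are invented for the example.

```php
<?php
// Sample data, invented for the example.
$prices = array('pen' => 1.50, 'book' => 12.00, 'bag' => 30.00, 'cup' => 4.25);

// Hand-written loop: total of all prices of 4.00 or more.
$loopTotal = 0.0;
foreach ($prices as $price) {
    if ($price >= 4.00) {
        $loopTotal += $price;
    }
}

// Callback used by array_filter to select the prices of interest.
function demoIsSubstantial($price)
{
    return $price >= 4.00;
}

// The same result using the built-in array functions.
$selected = array_filter($prices, 'demoIsSubstantial'); // keys are preserved
$builtinTotal = array_sum($selected);
$names = array_keys($selected);
```

Both totals are the same, and the filtered array keeps its keys, so $names holds the names of the items that were counted.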

PHP provides an eval function so that you can dynamically construct some PHP code and then have it executed within a program by invoking eval. It is very rare for this to be unavoidable, and any use of eval involving user input is risky. Mostly it is better to think of an alternative solution.
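
One common alternative, sketched here under the assumption that the aim is to choose an operation named in user input, is a fixed list of permitted operations. The function demoFormat and its list are hypothetical.

```php
<?php
// Hypothetical example: choosing an operation named in user input.
// The risky approach would build a string of PHP code from the input
// and pass it to eval; nothing of that kind is executed here.

function demoFormat($operation, $text)
{
    // Fixed list mapping accepted operation names to built-in functions.
    $allowed = array(
        'upper' => 'strtoupper',
        'lower' => 'strtolower',
        'reverse' => 'strrev',
    );
    if (!isset($allowed[$operation])) {
        return null; // unknown request: refuse rather than execute
    }
    return call_user_func($allowed[$operation], $text);
}

$ok = demoFormat('upper', 'aliro');
$rejected = demoFormat('phpinfo', 'aliro');
```

Only names in the list can ever be called, so unexpected input is refused instead of being executed.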

In general, I like to lay code out neatly, but do not believe that there is one particular set of detailed rules for layout that should be slavishly followed. Consistency and clarity are the main criteria, and the latter is liable to be subjective.

Although efficiency is important, I would not allow small differences in performance to determine code to the detriment of clarity. Sometimes code is written knowing that it is slightly less efficient, for the simple reason that it looks better and is therefore easier to grasp. Efficiency is achieved by good design and avoiding writing unnecessary code. The fastest code is the code that has been factored out of the design and no longer exists! For something like a CMS framework, my inclination is towards compactness. This may make code harder to understand at first glance. Provided the logic is directly related to the problem, though, I believe that it is easier to disentangle a short piece of code than a long one.

Globalness in PHP

It is easy to have too many globals in PHP. Before we consider what to do, we should review the generally received opinion that globals are bad, particularly in relation to PHP.

If you just start writing PHP with no particular structure, then every variable that you use is global in scope. That is to say, if there are several separate PHP code files, each of which is PHP code outside of functions and classes, the variables used in the separate files will be the very same variables if they have the same name (assuming all the code is run to handle a single request). Inside functions and classes, the global keyword can be used to give a variable global scope.

There is a general opinion that globals are bad. This needs some qualification, but there are certainly some good reasons to look carefully at globals. Uncontrolled use of global variables is undoubtedly bad. The problem is that it becomes very difficult to isolate what sections of program code are doing, since their operation is liable to be affected by values that may have been altered in any one of a number of places. There is a large benefit in clarity when functions, or the preferred alternative, class methods, operate in a way where the only variability comes from the data that is passed in as parameters.

Global variables are not the only kind of globals in PHP. One category has already been mentioned, that of symbols. But also mentioned was the fact that global symbols can have good features. Since they cannot be redefined, they are good candidates for holding critical information about the environment whose modification might compromise security. Still, the number of symbols should be kept small. A good number of symbols are automatically defined by PHP.

Another category is functions and classes. The real reason for using classes is to implement object-oriented design, but a supplementary reason for using them is that method names need to be unique only within the class, and are not global across the system. Thus, methods have an advantage over functions.

Yet another is the set of PHP "super-globals" such as $_SERVER and $_POST. These are filled with values by PHP and they can be used anywhere in a program.

It has been pointed out that the data in a database is also a kind of global data, since there is, in general, no constraint on access to database tables. Observing this point, it should be becoming clear that we cannot expect to eliminate globalness, and it may not even be a sensible goal.

What we really need are some guidelines as to how to constrain globalness. One consideration is that globals that can only be read are a lot less damaging than those that are both readable and writeable. This applies to symbols, for example. Ideally, they are set in a limited number of places, and thereafter are read only.

More generally, the solution is to have data guarded by classes that can reasonably be expected to "know" about the data. What we want to do is to avoid scattering data around our programs in an uncontrolled way. But anticipating the next chapter, we can imagine having a class that knows about all the classes available in a whole system, and knows how to load their code when required. It is reasonable to ask for this information at any point in the system.
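
A minimal sketch of such a guarding class follows. The class name DemoClassMap, its method names, and the file paths are assumptions made for illustration; the real mechanism is the subject of the next chapter.

```php
<?php
// Hypothetical class; the names and paths are invented for illustration.
class DemoClassMap
{
    private static $instance = null;

    // The guarded data: class name => file that defines it.
    private $map = array();

    private function __construct()
    {
        // In a real system this might come from a database table or a cache.
        $this->map = array(
            'DemoMenu' => 'classes/DemoMenu.php',
            'DemoUser' => 'classes/DemoUser.php',
        );
    }

    // The one place from which the information can be requested.
    public static function getInstance()
    {
        if (null === self::$instance) {
            self::$instance = new self();
        }
        return self::$instance;
    }

    // Read access from anywhere, but no way to alter the map from outside.
    public function getClassPath($className)
    {
        return isset($this->map[$className]) ? $this->map[$className] : null;
    }
}

$path = DemoClassMap::getInstance()->getClassPath('DemoMenu');
$unknown = DemoClassMap::getInstance()->getClassPath('NoSuchClass');
```

The information is available at any point in the system, yet the data itself is private, so every change to it has to go through code in this one class.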

Likewise, although it is technically possible to access database tables anywhere, it is undesirable for this to be done. Usually, it is better for a particular table to be managed entirely by either a single class, or possibly a small framework of classes. For example, the class that knows about all the system's classes might well be the only one that accesses the corresponding database table.

If we favor program structure, as I do, then we will have very little code that runs outside functions and classes, and so very little globalness will arise in that way. And it is hard to see any good reason for using the global keyword to make information global.

So, our general conclusion is that we should exercise caution in the use of globals, and that where they are justified, they should be managed by appropriate classes, so that their modification takes place in a controlled way, and their use can be easily traced.

Classes and objects

The crux of the argument for PHP5 is the radically improved object model. PHP4 had object capabilities tacked on almost as an afterthought, and the developers were surprised at the extent to which they were used. Their response to this turn of events was to review the object facilities and make them much better, without sacrificing too much backwards compatibility. Inevitably, though, it is impossible to take full advantage of the new model while writing code that will still run in PHP4.

But before getting into any details, we need to establish why object features matter. Arguments will no doubt continue to rage for a long time yet, so I simply state my views on the subject. Object techniques were first devised long ago, around the time the fundamental ideas of windows, icons, and mice came into being. All of these ideas took a long time to become mainstream, and it was the mid-nineties before object orientation started to become widely accepted.

Before that, building objects had become fundamental to the creation of graphical user interfaces. The model of computer programming as something that had a main line of logic which controlled a variety of subsidiary processing sections no longer worked when the user could pick and choose from a variety of different possible actions. And complex displays needed to be built up out of simpler, reusable components.

Thinking about the example of the graphical user interface gives us an inkling of the essential nature of object orientation. It is an approach where the software is a model of aspects of the problem. This is easy enough to imagine in a user interface, where we know that there are windows, slide bars, buttons, and so on. There are different kinds of windows, but they all have something in common, and any particular kind of window is likely to be created multiple times with different data. In a CMS, the things that are being modeled are often more abstract, but the principles still apply. Software that models the problem is easier to understand and, if well written, is powerful and flexible.

Bearing this in mind, what I advocate is an approach to object orientation where the classes and objects come naturally from the problem being solved. Arbitrary rules for class building are best avoided in favor of letting the problem show through the code. And although patterns can be immensely valuable, they should also be treated with considerable caution.

Objects, patterns, and refactoring

For more than a decade, object design has had to take account of the idea of patterns. This is a sound principle, which says that real problems produce object-oriented solutions that often fall into one or another pattern. Common patterns can be documented, with guidance on how best they can be implemented.

Refactoring is a new name for an old practice that has frequently not been followed. It has always been true of software development that code is overvalued. The hardest part of development is ironing out the inconsistencies and impossibilities in the original specification, and creating a good solution to the problem. As development proceeds, the problem becomes better understood, and the design ideas better developed. As this goes on, it makes good sense to throw away and rewrite code that is seen as unsatisfactory. It is often feasible to retain the best code for reuse. When this principle is applied in an object environment, an added twist is that the most important moves may be changes to the object model.

Despite the obvious benefits derived from thinking about patterns, I am always concerned to see them becoming too dominant in design. An idea that can be expressed in code is an algorithm, not a pattern. Unfortunately, the seminal book Design Patterns by Gamma, Helm, Johnson and Vlissides, often called the Gang of Four or GoF, includes a coded example for each pattern. These have often been closely followed, ignoring the GoF's caveat that they are intended only as illustrations, and not as models.

Patterns are meant to be at a higher level, where no single algorithm captures all the ways in which the pattern might be used. It therefore makes no sense to attempt a standard implementation of a pattern, nor is it sound design to force a problem into a pattern rather than think through the best solution. It has been observed that letting patterns drive development frequently results in excessively complex solutions that are inefficient and hard to understand.

Another problem is that patterns are often narrowly viewed as a standardized way to build objects in code. But a pattern does not have to be confined to guiding the PHP code; it can be used in a wider context. For example, a CMS may use XML documents for various purposes, notably for defining extensions. Sometimes a better solution can be achieved by using a pattern that embraces more than just the code. That way, both XML documents and PHP objects can be involved as actors in the pattern, which may result in a simpler, cleaner solution. Naturally, XML documents are not the only objects that could play a role in a broader application of patterns.

There are now a number of interesting web articles on the pitfalls of design patterns. A useful starting point is http://blog.jovan-s.com/2009/07/29/do-not-use-design-patterns-upfront/.

With those caveats, let us look briefly at some possible patterns that can help our project.

A number of well-known patterns may be relevant for the construction of a CMS:

The Singleton pattern appears a number of times, especially for the handler objects described later. In theory, handlers could be implemented as class methods, but in PHP5 a class is not itself a first-class object and it is more effective to use singleton objects. Other objects are naturally singletons, such as the session object, as PHP5 handles only a single request at a time. Some people object to singletons, but a defense of them is given later.
The Factory pattern is used to create objects when the circumstances alter what class is required. For example, a factory could be used to create the session object which will be of a different class depending on whether the session belongs to an administrator or an ordinary user.
The Observer (sometimes called subject-observer or publish-subscribe) pattern is implemented more as an architecture to handle plugins than as PHP5 classes. This is because the actual process of adding a plugin is an installation that affects the database more than the object structure.
The Command pattern is also used to handle plugins, since the external interface for plugins must be completely generalized. A Factory is used to create the correct plugin object to handle a particular request.
The Bridge and Memento patterns can also be used in the construction of plugins.
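
To make the session example concrete, here is a hedged sketch of a factory that chooses the session class and also ensures that only one session object exists per request. All of the class names are hypothetical, and Aliro's own classes may be organized differently.

```php
<?php
// Hypothetical session classes; Aliro's real classes may differ.
abstract class DemoSession
{
    protected $userName = '';

    public function __construct($userName)
    {
        $this->userName = $userName;
    }

    abstract public function isAdmin();
}

class DemoUserSession extends DemoSession
{
    public function isAdmin()
    {
        return false;
    }
}

class DemoAdminSession extends DemoSession
{
    public function isAdmin()
    {
        return true;
    }
}

class DemoSessionFactory
{
    // The single session object for the current request.
    private static $session = null;

    // Factory: the circumstances decide which class is instantiated.
    public static function getSession($userName, $adminSide)
    {
        if (null === self::$session) {
            self::$session = $adminSide
                ? new DemoAdminSession($userName)
                : new DemoUserSession($userName);
        }
        return self::$session;
    }
}

$session = DemoSessionFactory::getSession('alice', true);
$again = DemoSessionFactory::getSession('bob', false);
```

The second call returns the object created by the first, since a request has only one session. Calling code asks the factory for "the session" and does not need to know which class it will receive.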

The object-relational compromise

In an ideal world, it would be possible to simply store objects in a database. Certainly, object databases do exist, and have important applications. But the vast majority of CMS designs still rely on a relational database. This kind of database is easily obtained and its strengths are well understood. Unfortunately, there is a mismatch between the principles of relational design and the character of many objects.

Compromises can be made in both areas. An example will make this clearer. In common with other systems, Aliro implements a system of plugins. A plugin is a piece of code that can be installed in the system to respond to some kind of "trigger". This allows the CMS to easily support added functionality in areas that can be defined in outline but not in detail. So we know that there is a need for user authentication, but there are alternative ways to achieve this. The CMS may provide as default an ID and password scheme. But we can imagine that the CMS might need to interface to an LDAP system, or any number of other possible authentication mechanisms. The authentication of a user is therefore handled by one or more plugins, triggered just at the point a user needs to be authenticated.

Now a single plugin might want to respond to several different triggers, with a different but related action for each. Pure relational analysis would require that if information about the plugin is held in a relational table, details of the triggers for each plugin would have to be in another table, since there may be more than one. In fact, Aliro makes a relational compromise here and stores the triggers for a plugin as a comma-separated list in the table of plugins. This design decision keeps the relational table structure simple, although impure.

It also implies something about the object mechanisms. It is messy to retrieve information from a relational table when the key is part of a list held as a single field. But Aliro does not do that. The number of plugins and the amount of data needed to describe them is small enough that there is a plugin handler, which knows about all the plugins and their triggers. The handler simply reads the entire database table describing plugins and builds internal data structures that make the actual operations on plugins as simple as possible. Because of this, a plugin object needs to know its triggers, but does not need to be stored in a relationally pure form. The handler is a singleton object that is normally stored in cache and only built infrequently.
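
The handler's approach can be sketched as follows. The rows stand in for the result of reading the whole plugin table, and the class, field, and plugin names are invented; this is an illustration of the idea rather than Aliro's implementation.

```php
<?php
// Hypothetical rows, standing in for the result of reading the plugin table.
$rows = array(
    array('name' => 'passwordAuth', 'triggers' => 'authenticate,logout'),
    array('name' => 'ldapAuth', 'triggers' => 'authenticate'),
    array('name' => 'searchContent', 'triggers' => 'search'),
);

class DemoPluginHandler
{
    // Internal structure: trigger name => list of plugin names.
    private $byTrigger = array();

    public function __construct(array $rows)
    {
        foreach ($rows as $row) {
            // Split the comma-separated list once, when the handler is built.
            foreach (explode(',', $row['triggers']) as $trigger) {
                $this->byTrigger[trim($trigger)][] = $row['name'];
            }
        }
    }

    // Find the plugins for a trigger without any further database access.
    public function getPluginsFor($trigger)
    {
        return isset($this->byTrigger[$trigger]) ? $this->byTrigger[$trigger] : array();
    }
}

$handler = new DemoPluginHandler($rows);
$auth = $handler->getPluginsFor('authenticate');
$none = $handler->getPluginsFor('nosuchtrigger');
```

Because the list is split in memory, no query ever has to search inside the comma-separated field, which is what makes the impure table design tolerable.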

The object-relational compromise thus means that design is biased towards breaking relational rules in minor ways and giving some precedence to choosing objects that are simple to store. This compromise will be seen again as we build the CMS framework.

Basics of combining PHP and XHTML

Now we digress from the more or less arcane issues of object design into the practicalities of creating XHTML. Although I mentioned HTML in the historical comments early in this chapter, I am now assuming that all current development is implementing XHTML, preferably at the 1.0 Strict level. We need to consider how to generate XHTML using PHP scripts before we can adequately consider Model-View-Controller (MVC) architecture. There are several ways in which PHP can handle XHTML. It can, like pretty much any programming language, create XHTML as strings, in either single or double quotes. If double quotes are used, then PHP variables can be embedded in the string. PHP also has an ability that very few other languages possess, which is to flip between program code and XHTML. PHP will assume that it is processing XHTML (and simply pass it on to the web server) until it comes across the <?php start tag. It then handles everything as program code until it reaches a ?> end tag. This means that it is possible to write pure XHTML with occasional interruptions of PHP code, such as the following fragment:

<td width="30%" valign="top" align="right">
<strong><?php echo T_('Icon'); ?></strong>
</td>

As code like this builds up, I find all the clutter of PHP tags and the necessity for the 'echo' irritating and a barrier to comprehension. Also, it is often more effective and more flexible to build chunks of XHTML in memory before transmitting a substantial block to the browser. The other PHP alternative is one that I neglected for a long time, and which seems to be ignored by a lot of other developers. It is the PHP heredoc. It is a kind of string, but bounded by a name rather than any kind of quote mark. Here is a simple example:

echo <<<DETAIL_HTML
<table class="adminheading">
<thead>
<tr>
<th class="user">
$heading
</th>
</tr>
</thead>
DETAIL_HTML;

Now we have the advantage that nothing terminates the XHTML text until we get to the concluding DETAIL_HTML; so there is no need to worry about escaping quotes. The PHP tags have also disappeared, and the PHP variable $heading is simply included within the XHTML. In PHP4, it was also possible to include an object property, such as $this->heading. But PHP5 goes a lot further, and provided they are included in curly brackets, complex expressions that start with a $ sign can be written within heredoc, as follows:

<div id="topmenu">
{$this->screenarea['topmenu']->getData()}
</div>

Individual developers will make choices about how much to assign complex PHP into simple variables, and how much to use it directly within heredoc. Whatever the precise implementation, the result should be code that can be understood by a web designer without needing to know much at all about PHP.
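
For readers who want to try this, the following self-contained sketch combines a simple property and a braced expression inside heredoc. The classes DemoArea and DemoPage are hypothetical stand-ins for whatever objects supply the data.

```php
<?php
// Hypothetical classes standing in for a screen area and a page view.
class DemoArea
{
    private $data = '';

    public function __construct($data)
    {
        $this->data = $data;
    }

    public function getData()
    {
        return $this->data;
    }
}

class DemoPage
{
    private $screenarea = array();
    private $heading = 'Welcome';

    public function __construct()
    {
        $this->screenarea['topmenu'] = new DemoArea('<ul><li>Home</li></ul>');
    }

    public function render()
    {
        // A simple property and a braced complex expression inside heredoc.
        // The terminating name sits at the extreme left of its own line.
        return <<<PAGE_HTML
<h1>$this->heading</h1>
<div id="topmenu">
{$this->screenarea['topmenu']->getData()}
</div>
PAGE_HTML;
    }
}

$page = new DemoPage();
$html = $page->render();
```

The heredoc is returned as a string rather than echoed, which fits the earlier point about building chunks of XHTML in memory before sending them to the browser.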

Note

Note that although the printed layout of example code does not show it clearly, the terminating name, such as DETAIL_HTML; for heredoc must be at the extreme left-hand side of a separate line with no whitespace before or after it.

Model, view, and controller

It has long been agreed that it is good to make a separation between the model and views of the model. The model is understood as the set of objects that emulates the problem being solved by a computer application. As an example, one class of objects that is likely to be useful in a CMS is the class of menu items. An individual menu item is likely to know its own title, as displayed to the user. It probably also knows something about how to invoke the service for which it is the menu entry. It is quite a simple class, with relatively few properties and a handful of methods (or behaviors). But we will want to view menu items in different ways. One view is where the item is part of a menu shown to the user in a browser window.

Another view is shown to the administrator who will be interested in controlling the appearance and function of the item, and perhaps who is permitted to see it. So it makes sense to keep model and view separate. Views may well change in different ways and at different times from the model.

The MVC pattern comes about because of a feeling that views should be further refined by separating out something called a controller. The view concentrates on presentation to the user, taking information from the model. The controller manages the situation by handling input from the user, organizing the required objects, and then supplying them to the appropriate view. This approach minimizes the amount of decision making required in the view, so that it can be written using simple code.

The phrase "simple code" is deliberately vague. This is the point at which advocates of templates jump in and claim that views should be created using a template system (popular examples include patTemplate and Smarty). The role of the controller then includes marshalling data and somehow transferring it into the template system. Templates themselves still require some conditional and looping capabilities, unless the number of templates and the extent of duplication are to grow uncontrollably. This is inevitable, since displays are often repetitive, and we would not want to have one template when there are five repeats and a different template when there are six, and so on. Neither do we want to have different templates to allow for optional sections.

Given this need for control at the template level, the main template systems have introduced their own syntax. Skepticism about the added value from template systems is growing. While the principle of keeping the display output (typically including a good deal of XHTML) free of complex logic is sound, the doubt is whether template systems provide the best way to do this. There is no good reason why the syntax adopted for templates should be any easier to understand than straightforward PHP. And given that in practice, the division between software developers and web designers is a fuzzy one, there is a clear advantage to using a single language throughout. This also eliminates an overhead in marshalling data, which is already available through direct access to the objects that model the problem domain.

My approach and that of Aliro to MVC is therefore to create a model of the problem that is developed purely to provide functionality that makes sense in the terms used to discuss the problem. It exists in a kind of abstracted computing world, devoid of user interfaces. Controllers then look at requests from users and make sure that the relevant parts of the problem are instantiated as objects. Those objects are then passed to view classes, which contain the simplest possible code needed to present data from the model to the user in the form required. Usually, the form is XHTML.
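
The division of labor just described can be sketched with the menu item example. Every name here is hypothetical and the data is hard-coded; it is a simplified illustration of the approach, not Aliro's code.

```php
<?php
// Model: knows about menu items, nothing about presentation.
class DemoMenuItem
{
    private $title = '';
    private $link = '';

    public function __construct($title, $link)
    {
        $this->title = $title;
        $this->link = $link;
    }

    public function getTitle()
    {
        return $this->title;
    }

    public function getLink()
    {
        return $this->link;
    }
}

// View: the simplest code needed to present model objects as XHTML.
class DemoMenuView
{
    public function render(array $items)
    {
        $rows = '';
        foreach ($items as $item) {
            $title = htmlspecialchars($item->getTitle());
            $link = htmlspecialchars($item->getLink());
            $rows .= "<li><a href=\"$link\">$title</a></li>\n";
        }
        return <<<MENU_HTML
<ul class="menu">
$rows</ul>
MENU_HTML;
    }
}

// Controller: examines the request, instantiates the model, calls the view.
class DemoMenuController
{
    public function handle(array $request)
    {
        $menu = isset($request['menu']) ? $request['menu'] : 'main';
        $view = new DemoMenuView();
        return $view->render($this->loadItems($menu));
    }

    private function loadItems($menu)
    {
        // Hard-coded data standing in for whatever storage a real system uses.
        $data = array('main' => array(
            array('Home', 'index.php'),
            array('News & Views', 'index.php?option=news'),
        ));
        $items = array();
        if (isset($data[$menu])) {
            foreach ($data[$menu] as $row) {
                $items[] = new DemoMenuItem($row[0], $row[1]);
            }
        }
        return $items;
    }
}

$controller = new DemoMenuController();
$html = $controller->handle(array('menu' => 'main'));
```

The view contains one loop and no decisions, and an administrator's view of the same items could be added as another view class without touching DemoMenuItem.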

The CMS environment

It is time now to consider the web environment. While all software has common features, writing for the Web involves considerations that are not found in longer established application areas.

Hosting the CMS

A huge range of hosting services exists, with costs ranging from zero upwards. Quality varies enormously, and is not always related to price. It is not easy to choose a hosting service, as the information given by rival providers is only part of the picture. It is difficult to offer general advice on the topic, but there is one issue that frequently causes problems with advanced systems such as a CMS, particularly where a web interface is provided for management.

This is the question of how to manage permissions for files and directories. The majority of hosting runs on Linux servers and therefore UNIX permission principles apply. The scheme is simple enough in concept, with permissions given separately for the owner of the file or directory, the group of which the owner is a member, and everyone else.

But there are some twists to this that make matters more difficult. From the way UNIX permissions work, it is clear that the situation of a particular file or directory depends on who owns it; only then is it possible to see what the permissions mean in practice. The web serving software, usually Apache, runs by default as a special user for whom a variety of names are used, including apache, nobody, www-data, or many other alternatives. At the same time, the site owner is given access through FTP, usually with the alternative of a file manager. The site owner is a quite different user from the web server.

Why does this make a difference? Problems arise because maintenance operations directly performed by the site owner create files belonging to one user, while maintenance operations (including the installation of extensions) carried out through the web interface create files owned by the Apache user. Even if all the files have the same nominal permissions (usually expressed in octal numbers, such as 0644), the actual ability to handle the files will vary according to the owner. Generally, if you are not the owner of a file, you will not be able to change the ownership or permissions of that file, so it is frequently impossible to change any of the permissions on a file created through the web management interface.
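
The effect of ownership can be illustrated with a short sketch that decodes an octal mode and then asks whether a given user could write to the file. The function names and user names are invented, and group membership is deliberately ignored to keep the example small.

```php
<?php
// Decode an octal mode such as 0644 into rwx strings for each class of user.
function demoDescribeMode($mode)
{
    $classes = array('owner' => 6, 'group' => 3, 'others' => 0);
    $result = array();
    foreach ($classes as $who => $shift) {
        $bits = ($mode >> $shift) & 7; // the three bits for this class
        $result[$who] = (($bits & 4) ? 'r' : '-')
            . (($bits & 2) ? 'w' : '-')
            . (($bits & 1) ? 'x' : '-');
    }
    return $result;
}

// Simplification: groups are ignored, so a user is either the owner
// of the file or falls under "others".
function demoCanWrite($mode, $fileOwner, $currentUser)
{
    $perms = demoDescribeMode($mode);
    $class = ($fileOwner === $currentUser) ? 'owner' : 'others';
    return $perms[$class][1] === 'w';
}

$perms = demoDescribeMode(0644);

// A file created through the web interface belongs to the web server user.
$serverCanWrite = demoCanWrite(0644, 'apache', 'apache');
$ownerCanWrite = demoCanWrite(0644, 'apache', 'siteowner');
```

With mode 0644 the web server user can write to the file it created, but the site owner, arriving by FTP as a different user, cannot, even though the nominal permissions look perfectly ordinary.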

A rather crude solution is to give everyone all rights to every file, but that may lead to weak security. Another solution is to avoid using FTP or a file manager, and instead rely on web interfaces for all operations, which may not always be possible.

My strongly preferred solution is to insist on some mechanism that runs the PHP programs making up the website under the ownership of the site owner. Apache is capable of switching who is the active user when it comes to running a script, and there are various schemes for applying this in a PHP environment. All involve some degree of overhead, but good implementations keep this to an acceptable minimum. The benefit is a much smoother running site with far fewer issues over permissions, because all files are now under the ownership of the site owner, whether created directly or through a web interface.

Using this configuration is also a good solution to the security problems that can arise in shared hosting, where the actions of other customers of the hosting provider can cause damage. This may be accidental rather than malicious, but I have had whole sites demolished by another user's faulty script. It's not an experience to be recommended! Normally, in my preferred configuration, you also have to watch out for files that give write permission to "others" as they are blocked from being executed as a security feature.

In general, hosting companies are keen to host whatever they can get, so as a customer you need to ask questions to find out whether you will get what you really need for your CMS.

Basic browser matters

To build any web application, we have to make some assumptions about what will happen at the browser. This is made complicated by the existence of many different browsers, each with its own peculiarities. Most of these relate to the details of XHTML and CSS usage, but there are some broad questions of usage that we can review now.

One is to adopt a policy on the use of JavaScript. It is certainly possible to improve the responsiveness of web applications by the use of a browser-based scripting language. The code runs on the visitor's own computer, rather than always having to go back to the server to run code. For some applications, such as WYSIWYG editors, it is impractical to use anything other than mechanisms that exist in the browser. Although there are various options for browser scripting, the most widely used is JavaScript.

There are problems over standardization with JavaScript, but most of all there is an accessibility problem. Not everyone is running a browser that will handle JavaScript, and in particular, screen readers used by people who cannot read information from a screen usually do not do so. The developments described here do not, therefore, rely on JavaScript to any significant extent. Relative to predecessor systems, Aliro is much less dependent on its use. No doubt improvements can be made by reintroducing more JavaScript, but as a matter of policy this should be done in a way that supports graceful degradation for visitors (and this should include site administrators) who cannot make use of it. The lack of JavaScript should not block access to any facility that could possibly be delivered some other way.

Another general consideration is the use of cookies. Despite scare stories soon after their introduction, appropriate use of cookies is now considered perfectly normal. The major exception we will encounter is the search bots that crawl the net looking at web pages and refusing cookies. Otherwise, since we are interested in building an advanced CMS, and critical features such as the ability to allow users to log in or shoppers to build up a shopping cart cannot be provided securely without cookies, we assume that cookies will be accepted. That is not to say a visitor who refuses cookies will be blocked, only that the services they receive will be restricted. The possibility of having sessions without the use of cookies is disregarded for reasons given in Chapter 5, Sessions and Users.

Security of a CMS

Software has always needed to be robust, but increasing involvement with people raises the stakes. Long ago, when software ran in a closed computer room, attended only by specialist operators, security was a simple issue. As software became exposed to direct interactions with users, so the security questions increased. But, as everyone knows, the Internet has raised the issue to a completely new level on account of the existence of a significant number of people who may damage a service. Some of the damage has been done out of simple curiosity, but a lot is now caused in pursuit of money making schemes that abuse Internet facilities in one way or another.

There is controversy over whether "hacking" means breaking into computer systems or a certain approach to software development. To avoid this misunderstanding, I have used the alternative terms "cracking" and "crackers" to refer to abusive actions and actors respectively. Cracking is now so prevalent that we need to start thinking about security before we get into any serious coding at all. Not only are weaknesses in Web software likely to be found and exploited, but crackers also use software tools that are quite as sophisticated as any of the applications that are subjected to cracking. It may not be nice, but it is a reality.

Software developers differ in their approach to security. Some take the view that, as professional developers, they have taken the trouble to know how to build secure software, and that is all there is to be said. Personally, I disagree with this approach, and prefer to think in terms of placing obstacles in the way of crackers. While writing the code, it may seem to be placing an impassable obstacle, but crackers are ingenious and find unexpected routes to evade obstacles. The regular appearance of security loopholes in major software projects demonstrates that total security is extremely hard to attain. Moreover, it is in the nature of a CMS that it is likely to have code added to it by different authors, and it may be that not all are as security aware as the original CMS creator. So anything that makes a significant contribution to increasing the difficulty of cracking is worth considering for inclusion.

Much old PHP code runs in an environment making extensive use of global data. Often the code is run at the top level, not inside a function or class, so that variables are automatically global. This means that two separate PHP files will share data without any specific declaration, simply by the use of common variable names. In the worst cases, this is combined with reliance on "register globals". That is a PHP capability that automatically places values returned in URI strings or forms into PHP variables. In the days of innocence before cracking was rife, it seemed a nice way to make coding easier. Nowadays, it is the cause of many cracks and every effort is being made to eliminate it.
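
The danger can be imitated without actually enabling the feature. In this sketch, extract() plays the part of "register globals" by turning request values into variables; the function names and the request data are invented for the demonstration.

```php
<?php
// Simulation only: extract() imitates what "register globals" did
// automatically with values from the URI or a form.
function demoCheckUnsafe(array $request, $passwordOk)
{
    extract($request); // a request value "authorized" becomes $authorized
    if ($passwordOk) {
        $authorized = true;
    }
    // $authorized was never initialized, so a value from the request survives.
    return !empty($authorized);
}

function demoCheckSafe(array $request, $passwordOk)
{
    // Initialized to a safe value; request data is never turned into variables.
    $authorized = false;
    if ($passwordOk) {
        $authorized = true;
    }
    return $authorized;
}

// The kind of request a cracker might send, with no valid password.
$attack = array('authorized' => '1');
$unsafe = demoCheckUnsafe($attack, false);
$safe = demoCheckSafe($attack, false);
```

The unsafe version reports the visitor as authorized purely because of a value in the request, while the version that initializes its variable does not. This is one reason for the earlier advice to give every variable a reasonably safe initial value.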

Aliro adopts a thoroughgoing class architecture, not least because of the contribution this makes to security. The entire system contains only six lines of code at the global level. There are very few functions at the global level; mostly they are used in the language system, and work as functions because they are needed so frequently that they would be clumsy as class methods. The rest of the system consists entirely of classes.

Classes have the considerable merit that their code does not run until the class is invoked. Many cracks have involved loading PHP code in a way that was never intended and causing it to execute in a compromised way. That cannot happen with classes, because loading the code of a class simply makes the class known to PHP; it does not cause any code to execute (unless the file that is loaded has code outside the class). In a totally class-based system, control of what is executed is guaranteed to follow a logical path from the original starting point, typically in an index.php file. Use of class methods can be controlled with PHP 5 visibility features, so that wherever possible they are designated as internal to the class (private or protected) and may not be used from outside. Even where methods are public, they are tightly associated with the environment of a particular class.
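A minimal sketch of these two points follows: defining a class runs none of its code, and a private method cannot be reached from outside. The class PageCounter is hypothetical and is not taken from Aliro.

```php
<?php
// Hypothetical class, for illustration only; not part of Aliro.
class PageCounter
{
    private $count = 0;

    // Public entry point: the only way to change the counter.
    public function hit()
    {
        $this->increment();
        return $this->count;
    }

    // Internal helper: cannot be called from outside the class.
    private function increment()
    {
        $this->count++;
    }
}

// Defining the class above has executed none of its methods.
// Nothing happens until calling code explicitly invokes it:
$counter = new PageCounter();
echo $counter->hit(), "\n";   // prints 1

// Visibility can be checked without triggering a fatal error.
var_dump(is_callable(array($counter, 'hit')));        // bool(true)
var_dump(is_callable(array($counter, 'increment')));  // bool(false)
```

In a real class file only the class definition would appear; the calling lines are shown in the same file purely to keep the sketch self-contained. If a cracker managed to load such a file directly, PHP would learn about the class and nothing more.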

No single step will ever eliminate security problems. But writing systems entirely out of classes makes a useful contribution, quite apart from its benefits in quality of code. This imposes a requirement on a general CMS framework, which is the effective handling of the classes belonging to extensions. That is solved in the next chapter.

Some CMS terminology

There is scope for improvement and standardization in the terminology that is used in relation to content management systems. Unfortunately, it is difficult for one person to achieve much in this direction. This book is written within the tradition established by Mambo and, although I have made some attempts to clarify particularly confusing areas, the text largely conforms to convention. The names used in code examples are firmly linked to the traditional terminology, and altering the text while leaving the code in older terms would have been too confusing.

So, it is perhaps worth defining the major terms here, before we move on to any CMS details. The main CMS has been called the "core", although definitions of its boundary vary. Major extensions that are added to the CMS have been called components or applications, and could be likened to whole web applications. Minor extensions usually create small screen boxes with useful information, and have been called modules. The more pluggable units of code, which can be triggered in a variety of ways not directly related to what appears on the screen, were called mambots in Mambo, and are more generally referred to as plugins.

In an attempt to clarify what happens as different pieces of code work together to create the browser display, I have talked about blocks and boxes. Modules are pieces of code that create boxes, and the boxes are grouped together to form blocks, which are named portions of the browser display. One module may create multiple boxes on the same or different displays.

The styling of the site, or of pages within the site, is achieved by a collection of PHP, CSS, and images, which have been known collectively as a template. Some people prefer to keep the term "template" to describe only the code that is directly involved in determining a layout. So, although the code examples stick with the name "template", the text also uses an increasingly popular alternative term and calls these packages "themes".


