Of course, backups are not the only issue when managing multiple remote systems. In particular, it is often desirable to manage the configurations of such systems from a centralized application.
One issue administrators frequently face is having multiple remote systems that run mostly similar software, but with minor differences in what is installed or running. Debian provides several packages that can help manage such an environment in a unified manner. Two of the more popular are FAI and Puppet. While we don't have the space to go into detail, both applications are described briefly here.
Fully Automated Installation (FAI) focuses on managing Linux installations, and is developed using Debian, although it works with many different distributions, not just Debian. FAI uses a class concept for categorizing similar systems, and provides a good deal of flexibility and customization via hooks. FAI provides for unattended, automatic installation as well as tools for monitoring and updating groups of systems. FAI is frequently used for creating and maintaining clusters. More information is available at http://fai-project.org/.
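The class concept can be pictured with a short Python sketch. This is an illustration of the idea only and is not FAI's configuration format: the host names, class names, and package lists are hypothetical, chosen to show how a system that belongs to several classes ends up with the combined configuration of all of them.

```python
# Illustrative only: this is NOT FAI's configuration format.
# Host names, class names, and package lists are hypothetical.

HOST_CLASSES = {
    "web01": ["DEFAULT", "DEBIAN", "WEBSERVER"],
    "node17": ["DEFAULT", "DEBIAN", "COMPUTE"],
}

CLASS_PACKAGES = {
    "DEFAULT": ["openssh-server"],
    "DEBIAN": ["apt-utils"],
    "WEBSERVER": ["apache2"],
    "COMPUTE": ["openmpi-bin"],
}


def packages_for(host):
    """Return the packages for a host: the union over its classes, in class order."""
    packages = []
    # Hosts that are not listed fall back to the DEFAULT class only.
    for cls in HOST_CLASSES.get(host, ["DEFAULT"]):
        for pkg in CLASS_PACKAGES.get(cls, []):
            if pkg not in packages:  # keep the first occurrence, drop duplicates
                packages.append(pkg)
    return packages
```

Calling packages_for("web01") combines the packages of the three classes assigned to that host, so similar systems share most of their configuration and differ only in the classes that are specific to them.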
Probably the best known application for distributed management is Puppet, developed by Puppet Labs. Unlike FAI, only the Open Source edition is free; the Enterprise edition, which has many additional features, is not. Puppet does include support for environments other than Linux. The desired configuration is described in a custom, high-level definition language, and distributed to systems with installed clients. Unlike FAI, Puppet does not provide its own bare metal remote installation method, but it can use existing methods (such as kickstart) to provide this function. A number of companies that make heavy use of distributed and clustered systems use Puppet to manage their environments. More information is available at http://puppetlabs.com/.
There are other packages that can be used to manage a distributed environment, such as Chef and BCFG2. While simpler than Puppet or FAI, they support similar functions and have been used in some distributed and clustered environments.
The use of FAI, Puppet, and others in cluster management warrants a brief look at clustering next, and what packages in Debian support clustering.
A cluster is a group of systems that work together in such a way that the whole functions as a single unit. Such clusters can be loosely coupled or tightly coupled. In a loosely coupled environment, each system is complete in itself, and can handle all of the tasks any of the other systems can handle. The environment provides mechanisms for redundancy, load sharing, and fail-over between systems, and is often called a High Availability (HA) cluster. In a tightly coupled environment, the systems involved are highly dependent on one another, often sharing memory and disk storage, and all work on the same task together. The environment provides mechanisms for data sharing, avoiding storage conflicts, keeping the systems in synchronization, and splitting up tasks appropriately. This design is often used in super-computing environments.
Clustering is an advanced technique that involves more than just installing and configuring software. It also involves hardware integration, as well as systems and network design and implementation. Along with the URLs mentioned below, a good text on the subject is Building Clustered Linux Systems, by Robert W. Lucke, Prentice Hall. Here we will only touch on the very basics, along with the tools Debian provides.
Let's take a brief look at each environment, and some of the tools used to create them.
Two primary functions are required to implement a high availability cluster: coordinating load sharing and fail-over between the systems, and keeping their data synchronized.
Debian includes meta packages that bring together software from the Linux High Availability project, including cluster-agents and resource-agents, two of the higher-level meta packages. These packages install various agents that are useful in coordinating and managing load balancing and fail-over. In some cases, a master server is designated to distribute the processing load among other servers.
Data synchronization is handled by using shared storage and any of the filesystems that provide for multiple accesses and shared files, such as NFS or AFS.
High Availability clusters generally use standard software, along with software that is readily available to manage the dynamics of such environments.
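To make the load sharing and fail-over idea concrete, here is a minimal Python sketch of a dispatcher that hands out work in round-robin order and skips systems that have been marked as failed. It is an assumption-laden illustration, not how the Linux High Availability agents are implemented: the Dispatcher class, its methods, and the node names are all hypothetical, and a real cluster would detect failures through monitoring rather than manual calls.

```python
class Dispatcher:
    """Hand out work round-robin, skipping nodes marked as failed.

    Hypothetical illustration; real HA software detects failures itself.
    """

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.healthy = set(self.nodes)
        self._next = 0  # index of the node to try next

    def mark_failed(self, node):
        """Take a node out of rotation (fail-over)."""
        self.healthy.discard(node)

    def mark_recovered(self, node):
        """Put a known node back into rotation."""
        if node in self.nodes:
            self.healthy.add(node)

    def pick(self):
        """Return the next healthy node, or raise if none is left."""
        # Try each node at most once per call.
        for _ in range(len(self.nodes)):
            node = self.nodes[self._next]
            self._next = (self._next + 1) % len(self.nodes)
            if node in self.healthy:
                return node
        raise RuntimeError("no healthy nodes available")
```

Because each system in a loosely coupled cluster can handle any of the tasks, taking one node out of rotation only changes where the work goes, not whether it gets done.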
In addition to the considerations for High Availability clusters, more tightly coupled environments such as Beowulf clusters also require an infrastructure to manage and distribute computing tasks. There are several web pages devoted to creating a Beowulf cluster using Debian, as well as packages that aid in creating such a cluster. One such page is https://wiki.debian.org/StartaBeowulf, a Debian Wiki page on Beowulf basics. The manual for FAI also has a section on creating a Beowulf cluster. Books are available as well. Debian provides several packages that are helpful in building such a cluster, such as the OpenMPI libraries for message passing, and various utilities that run commands on multiple systems, such as those in the kanif package. There are even projects that have released scripts and live CDs that allow you to set up a cluster quickly (one such project is the PelicanHPC project, developed for Debian Lenny, hosted at http://www.pelicanhpc.org/).
This type of cluster is not something that you can set up and go. Beowulf and other tightly coupled clusters are intended for highly parallel computing, and the programs that do the actual computing must be designed specifically for such an environment. That said, some packages for specific parallel computations do exist in Debian, such as nwchem, which provides several applications for computational chemistry that take advantage of parallelism.
Some common components of clusters have already been mentioned, such as the OpenMPI libraries. Aside from the meta-packages already mentioned, the redhat-cluster suite of tools is available in Debian, as well as many useful libraries, scheduling tools, and failover tools such as booth. All of these can be found using apt-cache or Synaptic by searching for "cluster".
Many administrators will never have to administer a cluster, and many won't be responsible for a large number of systems requiring central backup solutions. However, even administering a single system using command line tools and text editors can be a chore. Even clusters sometimes require administrative tasks on individual systems. Fortunately, there is an application that can ease many administrative tasks, is easy to use, and can handle many aspects of Linux administration. It is called Webmin.
Up until Debian Sarge, Webmin was a part of Debian distributions. However, the Debian developer in charge of packaging it had difficulty keeping up with the frequent releases, and it was eventually dropped from Debian. The upstream Webmin developers nevertheless maintain current packages that install cleanly. Webmin does not always handle configuration files exactly as Debian intends, although it attempts to handle them in a compatible manner. As a result, some users have experienced problems with upgrades, but many administrators are quite happy with Webmin.
As long as you are willing to deal with conflicts during upgrades, or restrict use of modules that have major configuration impacts, you will find Webmin quite useful.
Webmin may be installed by adding the following lines to your apt sources file:
deb http://download.webmin.com/download/repository sarge contrib
deb http://webmin.mirror.somersettechsolutions.co.uk/repository sarge contrib
Usually, this is added to a separate webmin.list file in /etc/apt/sources.list.d.
The use of 'sarge' for the release name in the configuration is not a mistake. Since Webmin was dropped after the Sarge release (Debian 3.1), the developers update the repository as it is and haven't bothered changing it to keep up with the Debian code names. However, the versions available in the repository are compatible with any Debian release since 3.1.
After updating your package cache (for example, with apt-get update), Webmin can be installed and maintained using apt-get, aptitude, or Synaptic. Also, if you request a Webmin upgrade from within Webmin itself on a Debian system, it will use the proper Debian package to upgrade.
Webmin runs in the background, and provides an HTTP or HTTPS server on localhost port 10000. You can use any web browser to connect to http://localhost:10000/ to access Webmin. Upon first installation, only the root user, or users in a group allowed to use sudo to access the root account, may log in. Webmin users can later be managed separately or in conjunction with local users.
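If the browser cannot connect, a quick check of whether anything is listening on the Webmin port can help. The following Python sketch uses only the standard library; the function name webmin_reachable is hypothetical, and the default host and port are the ones given above. It only tests that a TCP connection can be opened, not that the service answering is actually Webmin.

```python
import socket


def webmin_reachable(host="localhost", port=10000, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection raises OSError (refused, timed out, unreachable)
        # when nothing accepts the connection.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling webmin_reachable() with no arguments checks the local default; pass another host name to check a remote Webmin server, subject to any firewall between the two systems.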
Webmin provides extensive and easy to understand menus and icons for various configuration tasks. Webmin is also highly modular and extensible, and an extensive list of standard modules is included with the base package. It is not possible to cover Webmin as fully here as it deserves, but a short list of some of its capabilities includes:
There are even modules that apply to clusters. Also, Webmin can search for other Webmin servers on the local network and allow access to them, or you can define remote servers manually. This allows a central Webmin server, installed on a particular system, to be the gateway to all of the other servers in your environment, essentially providing a single point of access to manage all Webmin-enabled servers.
Webmin understands the configuration file layout of many distributions. The main problem is when a particular module does not handle certain types of configuration in the way the Debian developers prefer, which can make package upgrades somewhat difficult.
This can be handled in several ways. Most modules provide a means to edit configuration files directly, so if you have read the Debian documentation you can modify the configuration appropriately to use Debian-specific configuration techniques. Or, you may choose to allow Webmin to modify files as it sees fit, and handle any conflicts manually when you upgrade the software involved. Finally, you can avoid the modules for specific software that are more likely to cause problems.
One such module is Apache, which doesn't use links from sites-enabled to sites-available. Rather, it configures directly in the sites-enabled directory. Some administrators create the configuration in Webmin, and then move and link the files. Others prefer to manually configure Apache outside of Webmin.
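The "move and link" step can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function name move_and_link is hypothetical, /etc/apache2 is assumed as the Apache configuration directory, and the script must run with sufficient privileges. It moves a site file that was created directly in sites-enabled into sites-available and leaves a symbolic link in its place.

```python
import shutil
from pathlib import Path


def move_and_link(site_name, apache_dir="/etc/apache2"):
    """Move a site file from sites-enabled to sites-available and link it back."""
    enabled = Path(apache_dir) / "sites-enabled" / site_name
    available = Path(apache_dir) / "sites-available" / site_name

    if enabled.is_symlink():
        return available  # already a link; nothing to do
    if not enabled.is_file():
        raise FileNotFoundError(enabled)
    if available.exists():
        # Refuse to overwrite an existing definition.
        raise FileExistsError(available)

    available.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(enabled), str(available))
    # Relative link, so it stays valid if the directory tree is relocated.
    enabled.symlink_to(Path("..") / "sites-available" / site_name)
    return available
```

After running this for a site, Apache still finds the configuration through sites-enabled, while the file itself lives in sites-available where the Debian tools expect it. Reload Apache afterwards as you would after any configuration change.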
Webmin modules are constantly changing, and some actually recognize the Debian file layouts well, so it is not possible to give a comprehensive list of modules to avoid at this time.
Best practice when using Webmin is to read the documentation and check the configuration files for specific software prior to using Webmin. Then, after configuring with Webmin, check the files again to determine whether changes may be required to work within the particular package's Debian configuration framework. Based upon this, you can decide whether to continue to configure using Webmin or switch back to manual configuration of that particular software.
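One simple way to "check the files again" is to save a copy of a configuration file before using Webmin and compare it with the file afterwards. The sketch below uses Python's standard difflib module; the function name config_diff and the labels in the diff headers are hypothetical, and how you capture the before and after text is left to you.

```python
import difflib


def config_diff(before, after, name="config"):
    """Return unified-diff lines describing how a configuration text changed."""
    return list(
        difflib.unified_diff(
            before.splitlines(),
            after.splitlines(),
            fromfile=f"{name} (before Webmin)",
            tofile=f"{name} (after Webmin)",
            lineterm="",  # lines are already split, so add no line endings
        )
    )
```

An empty result means Webmin left the file untouched. Otherwise, lines starting with - were removed and lines starting with + were added, which shows exactly what needs to be reconciled with the package's Debian configuration framework.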
Security is always a concern when remote access to a system is involved. Webmin handles this by requiring authentication and providing for detailed access restrictions that provide a layer of control beyond the firewall. Webmin users can be defined separately, or certain local users can be designated. Access to the various modules in Webmin can be restricted to certain users or groups of users, and detailed logs of Webmin actions are kept.
In addition to Webmin, there is a server called Usermin, which may be installed from the same repository as Webmin. It allows individual users to perform a number of functions more easily, such as changing their password, accessing their files, reading and managing their email, and managing some aspects of their user profile. It is also modular and has the same security features as Webmin.
Several powerful and flexible central backup solutions exist that help manage backups for multiple remote servers and sites. Debian provides packages that assist in building High Availability and Beowulf style multiprocessing clusters as well. And whether you manage clusters, many systems, or only a single system, Webmin can ease an administrator's tasks.