Search icon CANCEL
Subscription
0
Cart icon
Your Cart (0 item)
Close icon
You have no products in your basket yet
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Conferences
Free Learning
Arrow right icon
Arrow up icon
GO TO TOP
DevOps: Puppet, Docker, and Kubernetes

You're reading from   DevOps: Puppet, Docker, and Kubernetes Practical recipes to make the most of DevOps with powerful tools

Arrow left icon
Product type Course
Published in Mar 2017
Publisher Packt
ISBN-13 9781788297615
Length 925 pages
Edition 1st Edition
Tools
Concepts
Arrow right icon
Authors (6):
Arrow left icon
Ke-Jou Carol Hsu Ke-Jou Carol Hsu
Author Profile Icon Ke-Jou Carol Hsu
Ke-Jou Carol Hsu
Neependra Khare Neependra Khare
Author Profile Icon Neependra Khare
Neependra Khare
John Arundel John Arundel
Author Profile Icon John Arundel
John Arundel
Hideto Saito Hideto Saito
Author Profile Icon Hideto Saito
Hideto Saito
Thomas Uphill Thomas Uphill
Author Profile Icon Thomas Uphill
Thomas Uphill
Hui-Chuan Chloe Lee Hui-Chuan Chloe Lee
Author Profile Icon Hui-Chuan Chloe Lee
Hui-Chuan Chloe Lee
+2 more Show less
Arrow right icon
View More author details
Toc

Chapter 6. Managing Resources and Files

 

"The art of simplicity is a puzzle of complexity".

 
 --Douglas Horton

In this chapter, we will cover the following recipes:

  • Distributing cron jobs efficiently
  • Scheduling when resources are applied
  • Using host resources
  • Using exported host resources
  • Using multiple file sources
  • Distributing and merging directory trees
  • Cleaning up old files
  • Auditing resources
  • Temporarily disabling resources

Introduction

In the previous chapter, we introduced virtual and exported resources. Virtual and exported resources are ways to manage the way in which resources are applied to a node. In this chapter, we will deal with when and how to apply resources. In some cases, you may only wish to apply a resource off hours, while in others, you may wish to only audit the resource but change nothing. In other cases, you may wish to apply completely different resources based on which node is using the code. As we will see, Puppet has the flexibility to deal with all these scenarios.

Distributing cron jobs efficiently

When you have many servers executing the same cron job, it's usually a good idea not to run them all at the same time. If all the jobs access a common server (for example, when running backups), it may put too much load on that server, and even if they don't, all the servers will be busy at the same time, which may affect their capacity to provide other services.

As usual, Puppet can help; this time, using the inline_template function to calculate a unique time for each job.

How to do it...

Here's how to have Puppet schedule the same job at a different time for each machine:

  1. Modify your site.pp file as follows:
    node 'cookbook' {
      cron { 'run-backup':
        ensure  => present,
        command => '/usr/local/bin/backup',
        hour    => inline_template('<%= @hostname.sum % 24 %>'),
        minute  => '00',
      }
    }
  2. Run Puppet:
    [root@cookbook ~]# puppet agent -t
    Info: Caching catalog for cookbook.example.com
    Info: Applying configuration version '1413730771'
    Notice: /Stage[main]/Main/Node[cookbook]/Cron[run-backup]/ensure: created
    Notice: Finished catalog run in 0.11 seconds
    
  3. Run crontab to see how the job has been configured:
    [root@cookbook ~]# crontab -l
    # HEADER: This file was autogenerated at Sun Oct 19 10:59:32 -0400 2014 by puppet.
    # HEADER: While it can still be managed manually, it is definitely not recommended.
    # HEADER: Note particularly that the comments starting with 'Puppet Name' should
    # HEADER: not be deleted, as doing so could cause duplicate cron jobs.
    # Puppet Name: run-backup
    0 15 * * * /usr/local/bin/backup
    

How it works...

We want to distribute the hour of the cron job runs across all our nodes. We choose something that is unique across all the machines and convert it to a number. This way, the value will be distributed across the nodes and will not change per node.

We can do the conversion using Ruby's sum method, which computes a numerical value from a string that is unique to the machine (in this case, the machine's hostname). The sum function will generate a large integer (in the case of the string cookbook, the sum is 855), and we want values for hour between 0 and 23, so we use Ruby's % (modulo) operator to restrict the result to this range. We should get a reasonably good (though not statistically uniform) distribution of values, depending on your hostnames. Another option here is to use the fqdn_rand() function, which works in much the same way as our example.

If all your machines have the same name (it does happen), don't expect this trick to work! In this case, you can use some other string that is unique to the machine, such as ipaddress or fqdn.

There's more...

If you have several cron jobs per machine and you want to run them a certain number of hours apart, add this number to the hostname.sum resource before taking the modulus. Let's say we want to run the dump_database job at some arbitrary time and the run_backup job an hour later, this can be done using the following code snippet:

cron { 'dump-database':
  ensure  => present,
  command => '/usr/local/bin/dump_database',
  hour    => inline_template('<%= @hostname.sum % 24 %>'),
  minute  => '00',
}

cron { 'run-backup':
  ensure  => present,
  command => '/usr/local/bin/backup',
  hour    => inline_template('<%= ( @hostname.sum + 1) % 24 %>'),
  minute  => '00',
}

The two jobs will end up with different hour values for each machine Puppet runs on, but run_backup will always be one hour after dump_database.

Most cron implementations have directories for hourly, daily, weekly, and monthly tasks. The directories /etc/cron.hourly, /etc/cron.daily, /etc/cron.weekly, and /etc/cron.monthly exist on both our Debian and Enterprise Linux machines. These directories hold executables, which will be run on the referenced schedule (hourly, daily, weekly, or monthly). I find it better to describe all the jobs in these folders and push the jobs as file resources. An admin on the box searching for your script will be able to find it with grep in these directories. To use the same trick here, we would push a cron task into /etc/cron.hourly and then verify that the hour is the correct hour for the task to run. To create the cron jobs using the cron directories, follow these steps:

  1. First, create a cron class in modules/cron/init.pp:
    class cron {
      file { '/etc/cron.hourly/run-backup':
        content => template('cron/run-backup'),
        mode    => 0755,
      }
    }
  2. Include the cron class in your cookbook node in site.pp:
    node cookbook {
      include cron
    }
  3. Create a template to hold the cron task:
    #!/bin/bash
    
    runhour=<%= @hostname.sum%24 %>
    hour=$(date +%H)
    if [ "$runhour" -ne "$hour" ]; then
      exit 0
    fi
    
    echo run-backup
  4. Then, run Puppet:
    [root@cookbook ~]# puppet agent -t
    Info: Caching catalog for cookbook.example.com
    Info: Applying configuration version '1413732254'
    Notice: /Stage[main]/Cron/File[/etc/cron.hourly/run-backup]/ensure: defined content as '{md5}5e50a7b586ce774df23301ee72904dda'
    Notice: Finished catalog run in 0.11 seconds
    
  5. Verify that the script has the same value we calculated before, 15:
    #!/bin/bash
    
    runhour=15
    hour=$(date +%H)
    if [ "$runhour" -ne "$hour" ]; then
      exit 0
    fi
    
    echo run-backup

Now, this job will run every hour but only when the hour, returned by $(date +%H), is equal to 15 will the rest of the script run. Creating your cron jobs as file resources in a large organization makes it easier for your fellow administrators to find them. When you have a very large number of machines, it can be advantageous to add another random wait at the beginning of your job. You would need to modify the line before echo run-backup and add the following:

MAXWAIT=600
sleep $((RANDOM%MAXWAIT))

This will sleep a maximum of 600 seconds but will sleep a different amount each time it runs (assuming your random number generator is working). This sort of random wait is useful when you have thousands of machines, all running the same task and you need to stagger the runs as much as possible.

See also

  • The Running Puppet from cron recipe in Chapter 2, Puppet Infrastructure
How to do it...

Here's how to have Puppet schedule the same job at a different time for each machine:

Modify your site.pp file as follows:
node 'cookbook' {
  cron { 'run-backup':
    ensure  => present,
    command => '/usr/local/bin/backup',
    hour    => inline_template('<%= @hostname.sum % 24 %>'),
    minute  => '00',
  }
}
Run Puppet:
[root@cookbook ~]# puppet agent -t
Info: Caching catalog for cookbook.example.com
Info: Applying configuration version '1413730771'
Notice: /Stage[main]/Main/Node[cookbook]/Cron[run-backup]/ensure: created
Notice: Finished catalog run in 0.11 seconds
Run crontab to see how the job has been configured:
[root@cookbook ~]# crontab -l
# HEADER: This file was autogenerated at Sun Oct 19 10:59:32 -0400 2014 by puppet.
# HEADER: While it can still be managed manually, it is definitely not recommended.
# HEADER: Note particularly that the comments starting with 'Puppet Name' should
# HEADER: not be deleted, as doing so could cause duplicate cron jobs.
# Puppet Name: run-backup
0 15 * * * /usr/local/bin/backup

How it works...

We want to distribute the hour of the cron job runs across all our nodes. We choose something that is unique across all the machines and convert it to a number. This way, the value will be distributed across the nodes and will not change per node.

We can do the conversion using Ruby's sum method, which computes a numerical value from a string that is unique to the machine (in this case, the machine's hostname). The sum function will generate a large integer (in the case of the string cookbook, the sum is 855), and we want values for hour between 0 and 23, so we use Ruby's % (modulo) operator to restrict the result to this range. We should get a reasonably good (though not statistically uniform) distribution of values, depending on your hostnames. Another option here is to use the fqdn_rand() function, which works in much the same way as our example.

If all your machines have the same name (it does happen), don't expect this trick to work! In this case, you can use some other string that is unique to the machine, such as ipaddress or fqdn.

There's more...

If you have several cron jobs per machine and you want to run them a certain number of hours apart, add this number to the hostname.sum resource before taking the modulus. Let's say we want to run the dump_database job at some arbitrary time and the run_backup job an hour later, this can be done using the following code snippet:

cron { 'dump-database':
  ensure  => present,
  command => '/usr/local/bin/dump_database',
  hour    => inline_template('<%= @hostname.sum % 24 %>'),
  minute  => '00',
}

cron { 'run-backup':
  ensure  => present,
  command => '/usr/local/bin/backup',
  hour    => inline_template('<%= ( @hostname.sum + 1) % 24 %>'),
  minute  => '00',
}

The two jobs will end up with different hour values for each machine Puppet runs on, but run_backup will always be one hour after dump_database.

Most cron implementations have directories for hourly, daily, weekly, and monthly tasks. The directories /etc/cron.hourly, /etc/cron.daily, /etc/cron.weekly, and /etc/cron.monthly exist on both our Debian and Enterprise Linux machines. These directories hold executables, which will be run on the referenced schedule (hourly, daily, weekly, or monthly). I find it better to describe all the jobs in these folders and push the jobs as file resources. An admin on the box searching for your script will be able to find it with grep in these directories. To use the same trick here, we would push a cron task into /etc/cron.hourly and then verify that the hour is the correct hour for the task to run. To create the cron jobs using the cron directories, follow these steps:

  1. First, create a cron class in modules/cron/init.pp:
    class cron {
      file { '/etc/cron.hourly/run-backup':
        content => template('cron/run-backup'),
        mode    => 0755,
      }
    }
  2. Include the cron class in your cookbook node in site.pp:
    node cookbook {
      include cron
    }
  3. Create a template to hold the cron task:
    #!/bin/bash
    
    runhour=<%= @hostname.sum%24 %>
    hour=$(date +%H)
    if [ "$runhour" -ne "$hour" ]; then
      exit 0
    fi
    
    echo run-backup
  4. Then, run Puppet:
    [root@cookbook ~]# puppet agent -t
    Info: Caching catalog for cookbook.example.com
    Info: Applying configuration version '1413732254'
    Notice: /Stage[main]/Cron/File[/etc/cron.hourly/run-backup]/ensure: defined content as '{md5}5e50a7b586ce774df23301ee72904dda'
    Notice: Finished catalog run in 0.11 seconds
    
  5. Verify that the script has the same value we calculated before, 15:
    #!/bin/bash
    
    runhour=15
    hour=$(date +%H)
    if [ "$runhour" -ne "$hour" ]; then
      exit 0
    fi
    
    echo run-backup

Now, this job will run every hour but only when the hour, returned by $(date +%H), is equal to 15 will the rest of the script run. Creating your cron jobs as file resources in a large organization makes it easier for your fellow administrators to find them. When you have a very large number of machines, it can be advantageous to add another random wait at the beginning of your job. You would need to modify the line before echo run-backup and add the following:

MAXWAIT=600
sleep $((RANDOM%MAXWAIT))

This will sleep a maximum of 600 seconds but will sleep a different amount each time it runs (assuming your random number generator is working). This sort of random wait is useful when you have thousands of machines, all running the same task and you need to stagger the runs as much as possible.

See also

  • The Running Puppet from cron recipe in Chapter 2, Puppet Infrastructure
How it works...

We want to

distribute the hour of the cron job runs across all our nodes. We choose something that is unique across all the machines and convert it to a number. This way, the value will be distributed across the nodes and will not change per node.

We can do the conversion using Ruby's sum method, which computes a numerical value from a string that is unique to the machine (in this case, the machine's hostname). The sum function will generate a large integer (in the case of the string cookbook, the sum is 855), and we want values for hour between 0 and 23, so we use Ruby's % (modulo) operator to restrict the result to this range. We should get a reasonably good (though not statistically uniform) distribution of values, depending on your hostnames. Another option here is to use the fqdn_rand() function, which works in much the same way as our example.

If all your machines have the same name (it does happen), don't expect this trick to work! In this case, you can use some other string that is unique to the machine, such as ipaddress or fqdn.

There's more...

If you have several cron jobs per machine and you want to run them a certain number of hours apart, add this number to the hostname.sum resource before taking the modulus. Let's say we want to run the dump_database job at some arbitrary time and the run_backup job an hour later, this can be done using the following code snippet:

cron { 'dump-database':
  ensure  => present,
  command => '/usr/local/bin/dump_database',
  hour    => inline_template('<%= @hostname.sum % 24 %>'),
  minute  => '00',
}

cron { 'run-backup':
  ensure  => present,
  command => '/usr/local/bin/backup',
  hour    => inline_template('<%= ( @hostname.sum + 1) % 24 %>'),
  minute  => '00',
}

The two jobs will end up with different hour values for each machine Puppet runs on, but run_backup will always be one hour after dump_database.

Most cron implementations have directories for hourly, daily, weekly, and monthly tasks. The directories /etc/cron.hourly, /etc/cron.daily, /etc/cron.weekly, and /etc/cron.monthly exist on both our Debian and Enterprise Linux machines. These directories hold executables, which will be run on the referenced schedule (hourly, daily, weekly, or monthly). I find it better to describe all the jobs in these folders and push the jobs as file resources. An admin on the box searching for your script will be able to find it with grep in these directories. To use the same trick here, we would push a cron task into /etc/cron.hourly and then verify that the hour is the correct hour for the task to run. To create the cron jobs using the cron directories, follow these steps:

  1. First, create a cron class in modules/cron/init.pp:
    class cron {
      file { '/etc/cron.hourly/run-backup':
        content => template('cron/run-backup'),
        mode    => 0755,
      }
    }
  2. Include the cron class in your cookbook node in site.pp:
    node cookbook {
      include cron
    }
  3. Create a template to hold the cron task:
    #!/bin/bash
    
    runhour=<%= @hostname.sum%24 %>
    hour=$(date +%H)
    if [ "$runhour" -ne "$hour" ]; then
      exit 0
    fi
    
    echo run-backup
  4. Then, run Puppet:
    [root@cookbook ~]# puppet agent -t
    Info: Caching catalog for cookbook.example.com
    Info: Applying configuration version '1413732254'
    Notice: /Stage[main]/Cron/File[/etc/cron.hourly/run-backup]/ensure: defined content as '{md5}5e50a7b586ce774df23301ee72904dda'
    Notice: Finished catalog run in 0.11 seconds
    
  5. Verify that the script has the same value we calculated before, 15:
    #!/bin/bash
    
    runhour=15
    hour=$(date +%H)
    if [ "$runhour" -ne "$hour" ]; then
      exit 0
    fi
    
    echo run-backup

Now, this job will run every hour but only when the hour, returned by $(date +%H), is equal to 15 will the rest of the script run. Creating your cron jobs as file resources in a large organization makes it easier for your fellow administrators to find them. When you have a very large number of machines, it can be advantageous to add another random wait at the beginning of your job. You would need to modify the line before echo run-backup and add the following:

MAXWAIT=600
sleep $((RANDOM%MAXWAIT))

This will sleep a maximum of 600 seconds but will sleep a different amount each time it runs (assuming your random number generator is working). This sort of random wait is useful when you have thousands of machines, all running the same task and you need to stagger the runs as much as possible.

See also

  • The Running Puppet from cron recipe in Chapter 2, Puppet Infrastructure
There's more...

If you have several cron jobs per machine and you want to run them a certain number of hours apart, add this number to the hostname.sum resource before taking the modulus. Let's say we want to run the dump_database job at some arbitrary time and the run_backup job an hour later, this can be done using the following code snippet:

cron { 'dump-database': ensure => present, command => '/usr/local/bin/dump_database', hour => inline_template('<%= @hostname.sum % 24 %>'), minute => '00', } cron { 'run-backup': ensure => present, command => '/usr/local/bin/backup', hour => inline_template('<%= ( @hostname.sum + 1) % 24 %>'), minute => '00', }

The two jobs will end up with different hour values for each machine Puppet runs on, but run_backup will always be one hour after dump_database.

Most cron implementations have directories for hourly, daily, weekly, and monthly tasks. The directories /etc/cron.hourly, /etc/cron.daily, /etc/cron.weekly, and /etc/cron.monthly exist on both our Debian and Enterprise Linux machines. These directories hold executables, which will be run on the referenced schedule (hourly, daily, weekly, or monthly). I find it better to describe all the jobs in these folders and push the jobs as file resources. An

admin on the box searching for your script will be able to find it with grep in these directories. To use the same trick here, we would push a cron task into /etc/cron.hourly and then verify that the hour is the correct hour for the task to run. To create the cron jobs using the cron directories, follow these steps:

  1. First, create a cron class in modules/cron/init.pp:
    class cron {
      file { '/etc/cron.hourly/run-backup':
        content => template('cron/run-backup'),
        mode    => 0755,
      }
    }
  2. Include the cron class in your cookbook node in site.pp:
    node cookbook {
      include cron
    }
  3. Create a template to hold the cron task:
    #!/bin/bash
    
    runhour=<%= @hostname.sum%24 %>
    hour=$(date +%H)
    if [ "$runhour" -ne "$hour" ]; then
      exit 0
    fi
    
    echo run-backup
  4. Then, run Puppet:
    [root@cookbook ~]# puppet agent -t
    Info: Caching catalog for cookbook.example.com
    Info: Applying configuration version '1413732254'
    Notice: /Stage[main]/Cron/File[/etc/cron.hourly/run-backup]/ensure: defined content as '{md5}5e50a7b586ce774df23301ee72904dda'
    Notice: Finished catalog run in 0.11 seconds
    
  5. Verify that the script has the same value we calculated before, 15:
    #!/bin/bash
    
    runhour=15
    hour=$(date +%H)
    if [ "$runhour" -ne "$hour" ]; then
      exit 0
    fi
    
    echo run-backup

Now, this job will run every hour but only when the hour, returned by $(date +%H), is equal to 15 will the rest of the script run. Creating your cron jobs as file resources in a large organization makes it easier for your fellow administrators to find them. When you have a very large number of machines, it can be advantageous to add another random wait at the beginning of your job. You would need to modify the line before echo run-backup and add the following:

MAXWAIT=600
sleep $((RANDOM%MAXWAIT))

This will sleep a maximum of 600 seconds but will sleep a different amount each time it runs (assuming your random number generator is working). This sort of random wait is useful when you have thousands of machines, all running the same task and you need to stagger the runs as much as possible.

See also

  • The Running Puppet from cron recipe in Chapter 2, Puppet Infrastructure
See also

The Running Puppet from cron recipe in
  • Chapter 2, Puppet Infrastructure

Scheduling when resources are applied

So far, we looked at what Puppet can do, and the order that it does things in, but not when it does them. One way to control this is to use the schedule metaparameter. When you need to limit the number of times a resource is applied within a specified period, schedule can help. For example:

exec { "/usr/bin/apt-get update":
    schedule => daily,
}

The most important thing to understand about schedule is that it can only stop a resource being applied. It doesn't guarantee that the resource will be applied with a certain frequency. For example, the exec resource shown in the preceding code snippet has schedule => daily, but this just represents an upper limit on the number of times the exec resource can run per day. It won't be applied more than once a day. If you don't run Puppet at all, the resource won't be applied at all. Using the hourly schedule, for instance, is meaningless on a machine configured to run the agent every 4 hours (via the runinterval configuration setting).

That being said, schedule is best used to restrict resources from running when they shouldn't, or don't need to; for example, you might want to make sure that apt-get update isn't run more than once an hour. There are some built-in schedules available for you to use:

  • hourly
  • daily
  • weekly
  • monthly
  • never

However, you can modify these and create your own custom schedules, using the schedule resource. We'll see how to do this in the following example. Let's say we want to make sure that an exec resource representing a maintenance job won't run during office hours, when it might interfere with production.

How to do it...

In this example, we'll create a custom schedule resource and assign this to the resource:

  1. Modify your site.pp file as follows:
    schedule { 'outside-office-hours':
      period => daily,
      range  => ['17:00-23:59','00:00-09:00'],
      repeat => 1,
    }
    node 'cookbook' {
      notify { 'Doing some maintenance':
        schedule => 'outside-office-hours',
      }
    }
  2. Run Puppet. What you'll see will depend on the time of the day. If it's currently outside the office hours period you defined, Puppet will apply the resource as follows:
    [root@cookbook ~]# date
    Fri Jan  2 23:59:01 PST 2015
    [root@cookbook ~]# puppet agent -t
    Info: Caching catalog for cookbook.example.com
    Info: Applying configuration version '1413734477'
    Notice: Doing some maintenance
    Notice: /Stage[main]/Main/Node[cookbook]/Notify[Doing some maintenance]/message: defined 'message' as 'Doing some maintenance'
    Notice: Finished catalog run in 0.07 seconds
    
  3. If the time is within the office hours period, Puppet will do nothing:
    [root@cookbook ~]# date
    Fri Jan  2 09:59:01 PST 2015
    [root@cookbook ~]# puppet agent -t
    Info: Caching catalog for cookbook.example.com
    Info: Applying configuration version '1413734289'
    Notice: Finished catalog run in 0.09 seconds
    

How it works...

A schedule consists of three bits of information:

  • The period (hourly, daily, weekly, or monthly)
  • The range (defaults to the whole period, but can be a smaller part of it)
  • The repeat count (how often the resource is allowed to be applied within the range; the default is 1 or once per period)

Our custom schedule named outside-office-hours supplies these three parameters:

schedule { 'outside-office-hours':
  period => daily,
  range  => ['17:00-23:59','00:00-09:00'],
  repeat => 1,
}

The period is daily, and range is defined as an array of two time intervals:

17:00-23:59
00:00-09:00

The schedule named outside-office-hours is now available for us to use with any resource, just as though it were built into Puppet such as the daily or hourly schedules. In our example, we assign this schedule to the exec resource using the schedule metaparameter:

notify { 'Doing some maintenance':
  schedule => 'outside-office-hours',
}

Without this schedule parameter, the resource would be applied every time Puppet runs. With it, Puppet will check the following parameters to decide whether or not to apply the resource:

  • Whether the time is in the permitted range
  • Whether the resource has already been run the maximum permitted number of times in this period

For example, let's consider what happens if Puppet runs at 4 p.m., 5 p.m., and 6 p.m. on a given day:

  • 4 p.m.: It's outside the permitted time range, so Puppet will do nothing
  • 5 p.m.: It's inside the permitted time range, and the resource hasn't been run yet in this period, so Puppet will apply the resource
  • 6 p.m.: It's inside the permitted time range, but the resource has already been run the maximum number of times in this period, so Puppet will do nothing

And so on until the next day.

There's more...

The repeat parameter governs how many times the resource will be applied given the other constraints of the schedule. For example, to apply a resource no more than six times an hour, use a schedule as follows:

period => hourly,
repeat => 6,

Remember that this won't guarantee that the job is run six times an hour. It just sets an upper limit; no matter how often Puppet runs or anything else happens, the job won't be run if it has already run six times this hour. If Puppet only runs once a day, the job will just be run once. So schedule is best used to make sure things don't happen at certain times (or don't exceed a given frequency).

How to do it...

In this example, we'll create a

custom schedule resource and assign this to the resource:

  1. Modify your site.pp file as follows:
    schedule { 'outside-office-hours':
      period => daily,
      range  => ['17:00-23:59','00:00-09:00'],
      repeat => 1,
    }
    node 'cookbook' {
      notify { 'Doing some maintenance':
        schedule => 'outside-office-hours',
      }
    }
  2. Run Puppet. What you'll see will depend on the time of the day. If it's currently outside the office hours period you defined, Puppet will apply the resource as follows:
    [root@cookbook ~]# date
    Fri Jan  2 23:59:01 PST 2015
    [root@cookbook ~]# puppet agent -t
    Info: Caching catalog for cookbook.example.com
    Info: Applying configuration version '1413734477'
    Notice: Doing some maintenance
    Notice: /Stage[main]/Main/Node[cookbook]/Notify[Doing some maintenance]/message: defined 'message' as 'Doing some maintenance'
    Notice: Finished catalog run in 0.07 seconds
    
  3. If the time is within the office hours period, Puppet will do nothing:
    [root@cookbook ~]# date
    Fri Jan  2 09:59:01 PST 2015
    [root@cookbook ~]# puppet agent -t
    Info: Caching catalog for cookbook.example.com
    Info: Applying configuration version '1413734289'
    Notice: Finished catalog run in 0.09 seconds
    

How it works...

A schedule consists of three bits of information:

  • The period (hourly, daily, weekly, or monthly)
  • The range (defaults to the whole period, but can be a smaller part of it)
  • The repeat count (how often the resource is allowed to be applied within the range; the default is 1 or once per period)

Our custom schedule named outside-office-hours supplies these three parameters:

schedule { 'outside-office-hours':
  period => daily,
  range  => ['17:00-23:59','00:00-09:00'],
  repeat => 1,
}

The period is daily, and range is defined as an array of two time intervals:

17:00-23:59
00:00-09:00

The schedule named outside-office-hours is now available for us to use with any resource, just as though it were built into Puppet such as the daily or hourly schedules. In our example, we assign this schedule to the exec resource using the schedule metaparameter:

notify { 'Doing some maintenance':
  schedule => 'outside-office-hours',
}

Without this schedule parameter, the resource would be applied every time Puppet runs. With it, Puppet will check the following parameters to decide whether or not to apply the resource:

  • Whether the time is in the permitted range
  • Whether the resource has already been run the maximum permitted number of times in this period

For example, let's consider what happens if Puppet runs at 4 p.m., 5 p.m., and 6 p.m. on a given day:

  • 4 p.m.: It's outside the permitted time range, so Puppet will do nothing
  • 5 p.m.: It's inside the permitted time range, and the resource hasn't been run yet in this period, so Puppet will apply the resource
  • 6 p.m.: It's inside the permitted time range, but the resource has already been run the maximum number of times in this period, so Puppet will do nothing

And so on until the next day.

There's more...

The repeat parameter governs how many times the resource will be applied given the other constraints of the schedule. For example, to apply a resource no more than six times an hour, use a schedule as follows:

period => hourly,
repeat => 6,

Remember that this won't guarantee that the job is run six times an hour. It just sets an upper limit; no matter how often Puppet runs or anything else happens, the job won't be run if it has already run six times this hour. If Puppet only runs once a day, the job will just be run once. So schedule is best used to make sure things don't happen at certain times (or don't exceed a given frequency).

How it works...

A schedule consists of three

bits of information:

  • The period (hourly, daily, weekly, or monthly)
  • The range (defaults to the whole period, but can be a smaller part of it)
  • The repeat count (how often the resource is allowed to be applied within the range; the default is 1 or once per period)

Our custom schedule named outside-office-hours supplies these three parameters:

schedule { 'outside-office-hours':
  period => daily,
  range  => ['17:00-23:59','00:00-09:00'],
  repeat => 1,
}

The period is daily, and range is defined as an array of two time intervals:

17:00-23:59
00:00-09:00

The schedule named outside-office-hours is now available for us to use with any resource, just as though it were built into Puppet such as the daily or hourly schedules. In our example, we assign this schedule to the exec resource using the schedule metaparameter:

notify { 'Doing some maintenance':
  schedule => 'outside-office-hours',
}

Without this schedule parameter, the resource would be applied every time Puppet runs. With it, Puppet will check the following parameters to decide whether or not to apply the resource:

  • Whether the time is in the permitted range
  • Whether the resource has already been run the maximum permitted number of times in this period

For example, let's consider what happens if Puppet runs at 4 p.m., 5 p.m., and 6 p.m. on a given day:

  • 4 p.m.: It's outside the permitted time range, so Puppet will do nothing
  • 5 p.m.: It's inside the permitted time range, and the resource hasn't been run yet in this period, so Puppet will apply the resource
  • 6 p.m.: It's inside the permitted time range, but the resource has already been run the maximum number of times in this period, so Puppet will do nothing

And so on until the next day.

There's more...

The repeat parameter governs how many times the resource will be applied given the other constraints of the schedule. For example, to apply a resource no more than six times an hour, use a schedule as follows:

period => hourly,
repeat => 6,

Remember that this won't guarantee that the job is run six times an hour. It just sets an upper limit; no matter how often Puppet runs or anything else happens, the job won't be run if it has already run six times this hour. If Puppet only runs once a day, the job will just be run once. So schedule is best used to make sure things don't happen at certain times (or don't exceed a given frequency).

There's more...

The repeat parameter

governs how many times the resource will be applied given the other constraints of the schedule. For example, to apply a resource no more than six times an hour, use a schedule as follows:

period => hourly,
repeat => 6,

Remember that this won't guarantee that the job is run six times an hour. It just sets an upper limit; no matter how often Puppet runs or anything else happens, the job won't be run if it has already run six times this hour. If Puppet only runs once a day, the job will just be run once. So schedule is best used to make sure things don't happen at certain times (or don't exceed a given frequency).

Using host resources

It's not always practical or convenient to use DNS to map your machine names to IP addresses, especially in cloud infrastructures, where those addresses may change all the time. However, if you use entries in the /etc/hosts file instead, you then have the problem of how to distribute these entries to all machines and keep them up to date.

Here's a better way to do it; Puppet's host resource type controls a single /etc/hosts entry, and you can use this to map a hostname to an IP address easily across your whole network. For example, if all your machines need to know the address of the main database server, you can manage it with a host resource.

How to do it...

Follow these steps to create an example host resource:

  1. Modify your site.pp file as follows:
    node 'cookbook' {
      host { 'packtpub.com':
        ensure => present,
        ip     => '83.166.169.231',
      }
    }
  2. Run Puppet:
    [root@cookbook ~]# puppet agent -t
    Info: Caching catalog for cookbook.example.com
    Info: Applying configuration version '1413781153'
    Notice: /Stage[main]/Main/Node[cookbook]/Host[packtpub.com]/ensure: created
    Info: Computing checksum on file /etc/hosts
    Notice: Finished catalog run in 0.12 seconds
    

How it works...

Puppet will check the target file (usually /etc/hosts) to see whether the host entry already exists, and if not, add it. If an entry for that hostname already exists with a different address, Puppet will change the address to match the manifest.

There's more...

Organizing your host resources into classes can be helpful. For example, you could put the host resources for all your DB servers into one class called admin::dbhosts, which is included by all web servers.

Where machines may need to be defined in multiple classes (for example, a database server might also be a repository server), virtual resources can solve this problem. For example, you could define all your hosts as virtual in a single class:

class admin::allhosts {
  @host { 'db1.packtpub.com':
    tag => 'database'
    ...
  }
}

You could then realize the hosts you need in the various classes:

class admin::dbhosts {
  Host <| tag=='database' |>
}

class admin::webhosts {
  Host <| tag=='web' |>
}
How to do it...

Follow these steps to create an example host resource:

Modify your site.pp file as follows:
node 'cookbook' {
  host { 'packtpub.com':
    ensure => present,
    ip     => '83.166.169.231',
  }
}
Run Puppet:
[root@cookbook ~]# puppet agent -t
Info: Caching catalog for cookbook.example.com
Info: Applying configuration version '1413781153'
Notice: /Stage[main]/Main/Node[cookbook]/Host[packtpub.com]/ensure: created
Info: Computing checksum on file /etc/hosts
Notice: Finished catalog run in 0.12 seconds

How it works...

Puppet will check the target file (usually /etc/hosts) to see whether the host entry already exists, and if not, add it. If an entry for that hostname already exists with a different address, Puppet will change the address to match the manifest.

There's more...

Organizing your host resources into classes can be helpful. For example, you could put the host resources for all your DB servers into one class called admin::dbhosts, which is included by all web servers.

Where machines may need to be defined in multiple classes (for example, a database server might also be a repository server), virtual resources can solve this problem. For example, you could define all your hosts as virtual in a single class:

class admin::allhosts {
  @host { 'db1.packtpub.com':
    tag => 'database'
    ...
  }
}

You could then realize the hosts you need in the various classes:

class admin::dbhosts {
  Host <| tag=='database' |>
}

class admin::webhosts {
  Host <| tag=='web' |>
}
How it works...

Puppet will check

the target file (usually /etc/hosts) to see whether the host entry already exists, and if not, add it. If an entry for that hostname already exists with a different address, Puppet will change the address to match the manifest.

There's more...

Organizing your host resources into classes can be helpful. For example, you could put the host resources for all your DB servers into one class called admin::dbhosts, which is included by all web servers.

Where machines may need to be defined in multiple classes (for example, a database server might also be a repository server), virtual resources can solve this problem. For example, you could define all your hosts as virtual in a single class:

class admin::allhosts {
  @host { 'db1.packtpub.com':
    tag => 'database'
    ...
  }
}

You could then realize the hosts you need in the various classes:

class admin::dbhosts {
  Host <| tag=='database' |>
}

class admin::webhosts {
  Host <| tag=='web' |>
}
There's more...

Organizing your host resources into classes can be helpful. For example, you could put the host resources for all your DB servers into one class called admin::dbhosts, which is included by all web servers.

Where machines may need to be defined in multiple classes (for example, a database server might also be a repository server), virtual resources can solve this problem. For example, you could define all your hosts as virtual in a single class:

class admin::allhosts { @host { 'db1.packtpub.com': tag => 'database' ... } }

You could then

realize the hosts you need in the various classes:

class admin::dbhosts {
  Host <| tag=='database' |>
}

class admin::webhosts {
  Host <| tag=='web' |>
}

Using exported host resources

In the previous example, we used the spaceship syntax to collect virtual host resources for hosts of type database or type web. You can use the same trick with exported resources. The advantage to using exported resources is that as you add more database servers, the collector syntax will automatically pull in the newly created exported host entries for those servers. This makes your /etc/hosts entries more dynamic.

Getting ready

We will be using exported resources. If you haven't already done so, set up puppetdb and enable storeconfigs to use puppetdb as outlined in Chapter 2, Puppet Infrastructure.

How to do it...

In this example, we will configure database servers and clients to communicate with each other. We'll make use of exported resources to do the configuration.

  1. Create a new database module, db:
    t@mylaptop ~/puppet/modules $ mkdir -p db/manifests
    
  2. Create a new class for your database servers, db::server:
    class db::server {
      @@host {"$::fqdn":
        host_aliases => $::hostname,
        ip           => $::ipaddress,
        tag          => 'db::server',
      }
      # rest of db class
    }
  3. Create a new class for your database clients:
    class db::client {
      Host <<| tag == 'db::server' |>>
    }
  4. Apply the database server module to some nodes, in site.pp, for example:
    node 'dbserver1.example.com' {
      class {'db::server': }
    }
    node 'dbserver2.example.com' {
      class {'db::server': }
    }
  5. Run Puppet on the nodes with the database server module to create the exported resources.
  6. Apply the database client module to cookbook:
    node 'cookbook' {
      class {'db::client': }
    }
  7. Run Puppet:
    [root@cookbook ~]# puppet agent -t
    Info: Caching catalog for cookbook.example.com
    Info: Applying configuration version '1413782501'
    Notice: /Stage[main]/Db::Client/Host[dbserver2.example.com]/ensure: created
    Info: Computing checksum on file /etc/hosts
    Notice: /Stage[main]/Db::Client/Host[dbserver1.example.com]/ensure: created
    Notice: Finished catalog run in 0.10 seconds
    
  8. Verify the host entries in /etc/hosts:
    [root@cookbook ~]# cat /etc/hosts
    # HEADER: This file was autogenerated at Mon Oct 20 01:21:42 -0400 2014
    # HEADER: by puppet.  While it can still be managed manually, it
    # HEADER: is definitely not recommended.
    127.0.0.1	localhost  localhost.localdomain localhost4 localhost4.localdomain4 
    ::1  localhost  localhost.localdomain localhost6 localhost6.localdomain6
    83.166.169.231  packtpub.com
    192.168.122.150  dbserver2.example.com  dbserver2
    192.168.122.151  dbserver1.example.com  dbserver1
    

How it works...

In the db::server class, we create an exported host resource:

@@host {"$::fqdn":
  host_aliases => $::hostname,
  ip           => $::ipaddress,
  tag          => 'db::server',
}

This resource uses the fully qualified domain name ($::fqdn) of the node on which it is applied. We also use the short hostname ($::hostname) as an alias of the node. Aliases are printed after fqdn in /etc/hosts. We use the node's $::ipaddress fact as the IP address for the host entry. Finally, we add a tag to the resource so that we can collect based on that tag later.

The important thing to remember here is that if the ip address should change for the host, the exported resource will be updated, and nodes that collect the exported resource will update their host records accordingly.

We created a collector in db::client, which only collects exported host resources that have been tagged with 'db::server':

Host <<| tag == 'db::server' |>>

We applied the db::server class for a couple of nodes, dbserver1 and dbserver2, which we then collected on cookbook by applying the db::client class. The host entries were placed in /etc/hosts (the default file). We can see that the host entry contains both the fqdn and the short hostname for dbserver1 and dbserver2.

There's more...

Using exported resources in this manner is very useful. Another similar system would be to create an NFS server class, which creates exported resources for the mount points that it exports (via NFS). You can then use tags to have clients collect the appropriate mount points from the server. In the previous example, we made use of a tag to aid in our collection of exported resources. It is worth noting that there are several tags automatically added to resources when they are created, one of which is the scope where the resource was created.

Getting ready

We will be using exported resources. If you haven't already done so, set up puppetdb and enable storeconfigs to use puppetdb as outlined in

Chapter 2, Puppet Infrastructure.

How to do it...

In this example, we will configure database servers and clients to communicate with each other. We'll make use of exported resources to do the configuration.

  1. Create a new database module, db:
    t@mylaptop ~/puppet/modules $ mkdir -p db/manifests
    
  2. Create a new class for your database servers, db::server:
    class db::server {
      @@host {"$::fqdn":
        host_aliases => $::hostname,
        ip           => $::ipaddress,
        tag          => 'db::server',
      }
      # rest of db class
    }
  3. Create a new class for your database clients:
    class db::client {
      Host <<| tag == 'db::server' |>>
    }
  4. Apply the database server module to some nodes, in site.pp, for example:
    node 'dbserver1.example.com' {
      class {'db::server': }
    }
    node 'dbserver2.example.com' {
      class {'db::server': }
    }
  5. Run Puppet on the nodes with the database server module to create the exported resources.
  6. Apply the database client module to cookbook:
    node 'cookbook' {
      class {'db::client': }
    }
  7. Run Puppet:
    [root@cookbook ~]# puppet agent -t
    Info: Caching catalog for cookbook.example.com
    Info: Applying configuration version '1413782501'
    Notice: /Stage[main]/Db::Client/Host[dbserver2.example.com]/ensure: created
    Info: Computing checksum on file /etc/hosts
    Notice: /Stage[main]/Db::Client/Host[dbserver1.example.com]/ensure: created
    Notice: Finished catalog run in 0.10 seconds
    
  8. Verify the host entries in /etc/hosts:
    [root@cookbook ~]# cat /etc/hosts
    # HEADER: This file was autogenerated at Mon Oct 20 01:21:42 -0400 2014
    # HEADER: by puppet.  While it can still be managed manually, it
    # HEADER: is definitely not recommended.
    127.0.0.1	localhost  localhost.localdomain localhost4 localhost4.localdomain4 
    ::1  localhost  localhost.localdomain localhost6 localhost6.localdomain6
    83.166.169.231  packtpub.com
    192.168.122.150  dbserver2.example.com  dbserver2
    192.168.122.151  dbserver1.example.com  dbserver1
    

How it works...

In the db::server class, we create an exported host resource:

@@host {"$::fqdn":
  host_aliases => $::hostname,
  ip           => $::ipaddress,
  tag          => 'db::server',
}

This resource uses the fully qualified domain name ($::fqdn) of the node on which it is applied. We also use the short hostname ($::hostname) as an alias of the node. Aliases are printed after fqdn in /etc/hosts. We use the node's $::ipaddress fact as the IP address for the host entry. Finally, we add a tag to the resource so that we can collect based on that tag later.

The important thing to remember here is that if the ip address should change for the host, the exported resource will be updated, and nodes that collect the exported resource will update their host records accordingly.

We created a collector in db::client, which only collects exported host resources that have been tagged with 'db::server':

Host <<| tag == 'db::server' |>>

We applied the db::server class for a couple of nodes, dbserver1 and dbserver2, which we then collected on cookbook by applying the db::client class. The host entries were placed in /etc/hosts (the default file). We can see that the host entry contains both the fqdn and the short hostname for dbserver1 and dbserver2.

There's more...

Using exported resources in this manner is very useful. Another similar system would be to create an NFS server class, which creates exported resources for the mount points that it exports (via NFS). You can then use tags to have clients collect the appropriate mount points from the server. In the previous example, we made use of a tag to aid in our collection of exported resources. It is worth noting that there are several tags automatically added to resources when they are created, one of which is the scope where the resource was created.

How to do it...

In this example, we will configure database servers and clients to communicate with each other. We'll make use of exported resources to do the configuration.

Create a new database module, db:
t@mylaptop ~/puppet/modules $ mkdir -p db/manifests
Create a new class for your database servers, db::server:
class db::server {
  @@host {"$::fqdn":
    host_aliases => $::hostname,
    ip           => $::ipaddress,
    tag          => 'db::server',
  }
  # rest of db class
}
Create a new class for your database clients:
class db::client {
  Host <<| tag == 'db::server' |>>
}
Apply the database server module to some nodes, in site.pp, for example:
node 'dbserver1.example.com' {
  class {'db::server': }
}
node 'dbserver2.example.com' {
  class {'db::server': }
}
Run Puppet on the nodes
  1. with the database server module to create the exported resources.
  2. Apply the database client module to cookbook:
    node 'cookbook' {
      class {'db::client': }
    }
  3. Run Puppet:
    [root@cookbook ~]# puppet agent -t
    Info: Caching catalog for cookbook.example.com
    Info: Applying configuration version '1413782501'
    Notice: /Stage[main]/Db::Client/Host[dbserver2.example.com]/ensure: created
    Info: Computing checksum on file /etc/hosts
    Notice: /Stage[main]/Db::Client/Host[dbserver1.example.com]/ensure: created
    Notice: Finished catalog run in 0.10 seconds
    
  4. Verify the host entries in /etc/hosts:
    [root@cookbook ~]# cat /etc/hosts
    # HEADER: This file was autogenerated at Mon Oct 20 01:21:42 -0400 2014
    # HEADER: by puppet.  While it can still be managed manually, it
    # HEADER: is definitely not recommended.
    127.0.0.1	localhost  localhost.localdomain localhost4 localhost4.localdomain4 
    ::1  localhost  localhost.localdomain localhost6 localhost6.localdomain6
    83.166.169.231  packtpub.com
    192.168.122.150  dbserver2.example.com  dbserver2
    192.168.122.151  dbserver1.example.com  dbserver1
    

How it works...

In the db::server class, we create an exported host resource:

@@host {"$::fqdn":
  host_aliases => $::hostname,
  ip           => $::ipaddress,
  tag          => 'db::server',
}

This resource uses the fully qualified domain name ($::fqdn) of the node on which it is applied. We also use the short hostname ($::hostname) as an alias of the node. Aliases are printed after fqdn in /etc/hosts. We use the node's $::ipaddress fact as the IP address for the host entry. Finally, we add a tag to the resource so that we can collect based on that tag later.

The important thing to remember here is that if the ip address should change for the host, the exported resource will be updated, and nodes that collect the exported resource will update their host records accordingly.

We created a collector in db::client, which only collects exported host resources that have been tagged with 'db::server':

Host <<| tag == 'db::server' |>>

We applied the db::server class for a couple of nodes, dbserver1 and dbserver2, which we then collected on cookbook by applying the db::client class. The host entries were placed in /etc/hosts (the default file). We can see that the host entry contains both the fqdn and the short hostname for dbserver1 and dbserver2.

There's more...

Using exported resources in this manner is very useful. Another similar system would be to create an NFS server class, which creates exported resources for the mount points that it exports (via NFS). You can then use tags to have clients collect the appropriate mount points from the server. In the previous example, we made use of a tag to aid in our collection of exported resources. It is worth noting that there are several tags automatically added to resources when they are created, one of which is the scope where the resource was created.

How it works...

In the db::server class, we

create an exported host resource:

@@host {"$::fqdn":
  host_aliases => $::hostname,
  ip           => $::ipaddress,
  tag          => 'db::server',
}

This resource uses the fully qualified domain name ($::fqdn) of the node on which it is applied. We also use the short hostname ($::hostname) as an alias of the node. Aliases are printed after fqdn in /etc/hosts. We use the node's $::ipaddress fact as the IP address for the host entry. Finally, we add a tag to the resource so that we can collect based on that tag later.

The important thing to remember here is that if the ip address should change for the host, the exported resource will be updated, and nodes that collect the exported resource will update their host records accordingly.

We created a collector in db::client, which only collects exported host resources that have been tagged with 'db::server':

Host <<| tag == 'db::server' |>>

We applied the db::server class for a couple of nodes, dbserver1 and dbserver2, which we then collected on cookbook by applying the db::client class. The host entries were placed in /etc/hosts (the default file). We can see that the host entry contains both the fqdn and the short hostname for dbserver1 and dbserver2.

There's more...

Using exported resources in this manner is very useful. Another similar system would be to create an NFS server class, which creates exported resources for the mount points that it exports (via NFS). You can then use tags to have clients collect the appropriate mount points from the server. In the previous example, we made use of a tag to aid in our collection of exported resources. It is worth noting that there are several tags automatically added to resources when they are created, one of which is the scope where the resource was created.

There's more...

Using exported resources in this manner is very useful. Another similar system would be to create an NFS server class, which creates exported resources for the mount points that it exports (via NFS). You can then use tags to have clients collect the appropriate mount points from the server. In the previous example, we made use of a tag to aid in our collection of exported resources. It is worth noting that there are several tags automatically added to resources when they are created, one of which is the scope where the resource was created.

Using multiple file sources

A neat feature of Puppet's file resource is that you can specify multiple values for the source parameter. Puppet will search them in order. If the first source isn't found, it moves on to the next, and so on. You can use this to specify a default substitute if the particular file isn't present, or even a series of increasingly generic substitutes.

How to do it...

This example demonstrates using multiple file sources:

  1. Create a new greeting module as follows:
    class greeting {
      file { '/tmp/greeting':
        source => [ 'puppet:///modules/greeting/hello.txt',
                    'puppet:///modules/greeting/universal.txt'],
      }
    }
  2. Create the file modules/greeting/files/hello.txt with the following contents:
    Hello, world.
  3. Create the file modules/greeting/files/universal.txt with the following contents:
    Bah-weep-Graaaaagnah wheep ni ni bong
  4. Add the class to a node:
    node cookbook {
      class {'greeting': }
    }
  5. Run Puppet:
    [root@cookbook ~]# puppet agent -t
    Info: Caching catalog for cookbook.example.com
    Info: Applying configuration version '1413784347'
    Notice: /Stage[main]/Greeting/File[/tmp/greeting]/ensure: defined content as '{md5}54098b367d2e87b078671fad4afb9dbb'
    Notice: Finished catalog run in 0.43 seconds
    
  6. Check the contents of the /tmp/greeting file:
    [root@cookbook ~]# cat /tmp/greeting 
    Hello, world.
    
  7. Now remove the hello.txt file from your Puppet repository and rerun the agent:
    [root@cookbook ~]# puppet agent -t
    Info: Caching catalog for cookbook.example.com
    Info: Applying configuration version '1413784939'
    Notice: /Stage[main]/Greeting/File[/tmp/greeting]/content: 
    --- /tmp/greeting	2014-10-20 01:52:28.117999991 -0400
    +++ /tmp/puppet-file20141020-4960-1o9g344-0	2014-10-20 02:02:20.695999979 -0400
    @@ -1 +1 @@
    -Hello, world.
    +Bah-weep-Graaaaagnah wheep ni ni bong
    
    Info: Computing checksum on file /tmp/greeting
    Info: /Stage[main]/Greeting/File[/tmp/greeting]: Filebucketed /tmp/greeting to puppet with sum 54098b367d2e87b078671fad4afb9dbb
    Notice: /Stage[main]/Greeting/File[/tmp/greeting]/content: content changed '{md5}54098b367d2e87b078671fad4afb9dbb' to '{md5}933c7f04d501b45456e830de299b5521'
    Notice: Finished catalog run in 0.77 seconds
    

How it works...

On the first Puppet run, puppet searches for the available file sources in the order given:

source => [
  'puppet:///modules/greeting/hello.txt',
  'puppet:///modules/greeting/universal.txt'
],

The file hello.txt is first in the list, and is present, so Puppet uses that as the source for /tmp/greeting:

Hello, world.

On the second Puppet run, hello.txt is missing, so Puppet goes on to look for the next file, universal.txt. This is present, so it becomes the source for /tmp/greeting:

Bah-weep-Graaaaagnah wheep ni ni bong

There's more...

You can use this trick anywhere you have a file resource. A common example is a service that is deployed on all nodes, such as rsyslog. The rsyslog configuration is the same on every host except for the rsyslog server. Create an rsyslog class with a file resource for the rsyslog configuration file:

class rsyslog {
  file { '/etc/rsyslog.conf':
    source => [
      "puppet:///modules/rsyslog/rsyslog.conf.${::hostname}",
      'puppet:///modules/rsyslog/rsyslog.conf' ],
  }

Then, you put the default configuration in rsyslog.conf. For your rsyslog server, logger, create an rsyslog.conf.logger file. On the machine logger, rsyslog.conf.logger will be used before rsyslog.conf because it is listed first in the array of sources.

See also

  • The Passing parameters to classes recipe in Chapter 3, Writing Better Manifests
How to do it...

This example demonstrates using multiple file sources:

Create a new greeting module as follows:
class greeting {
  file { '/tmp/greeting':
    source => [ 'puppet:///modules/greeting/hello.txt',
                'puppet:///modules/greeting/universal.txt'],
  }
}
Create the file modules/greeting/files/hello.txt with the following contents:
Hello, world.
Create the file modules/greeting/files/universal.txt with the following contents:
Bah-weep-Graaaaagnah wheep ni ni bong
Add the class to a node:
node cookbook {
  class {'greeting': }
}
Run Puppet:
[root@cookbook ~]# puppet agent -t
Info: Caching catalog for cookbook.example.com
Info: Applying configuration version '1413784347'
Notice: /Stage[main]/Greeting/File[/tmp/greeting]/ensure: defined content as '{md5}54098b367d2e87b078671fad4afb9dbb'
Notice: Finished catalog run in 0.43 seconds
Check the contents of the /tmp/greeting file:
[root@cookbook ~]# cat /tmp/greeting 
Hello, world.
Now remove the hello.txt file from your Puppet repository and rerun the agent:
[root@cookbook ~]# puppet agent -t
Info: Caching catalog for cookbook.example.com
Info: Applying configuration version '1413784939'
Notice: /Stage[main]/Greeting/File[/tmp/greeting]/content: 
--- /tmp/greeting	2014-10-20 01:52:28.117999991 -0400
+++ /tmp/puppet-file20141020-4960-1o9g344-0	2014-10-20 02:02:20.695999979 -0400
@@ -1 +1 @@
-Hello, world.
+Bah-weep-Graaaaagnah wheep ni ni bong

Info: Computing checksum on file /tmp/greeting
Info: /Stage[main]/Greeting/File[/tmp/greeting]: Filebucketed /tmp/greeting to puppet with sum 54098b367d2e87b078671fad4afb9dbb
Notice: /Stage[main]/Greeting/File[/tmp/greeting]/content: content changed '{md5}54098b367d2e87b078671fad4afb9dbb' to '{md5}933c7f04d501b45456e830de299b5521'
Notice: Finished catalog run in 0.77 seconds

How it works...

On the first Puppet run, puppet searches for the available file sources in the order given:

source => [
  'puppet:///modules/greeting/hello.txt',
  'puppet:///modules/greeting/universal.txt'
],

The file hello.txt is first in the list, and is present, so Puppet uses that as the source for /tmp/greeting:

Hello, world.

On the second Puppet run, hello.txt is missing, so Puppet goes on to look for the next file, universal.txt. This is present, so it becomes the source for /tmp/greeting:

Bah-weep-Graaaaagnah wheep ni ni bong

There's more...

You can use this trick anywhere you have a file resource. A common example is a service that is deployed on all nodes, such as rsyslog. The rsyslog configuration is the same on every host except for the rsyslog server. Create an rsyslog class with a file resource for the rsyslog configuration file:

class rsyslog {
  file { '/etc/rsyslog.conf':
    source => [
      "puppet:///modules/rsyslog/rsyslog.conf.${::hostname}",
      'puppet:///modules/rsyslog/rsyslog.conf' ],
  }

Then, you put the default configuration in rsyslog.conf. For your rsyslog server, logger, create an rsyslog.conf.logger file. On the machine logger, rsyslog.conf.logger will be used before rsyslog.conf because it is listed first in the array of sources.

See also

  • The Passing parameters to classes recipe in Chapter 3, Writing Better Manifests
How it works...

On the first Puppet run, puppet searches for

the available file sources in the order given:

source => [
  'puppet:///modules/greeting/hello.txt',
  'puppet:///modules/greeting/universal.txt'
],

The file hello.txt is first in the list, and is present, so Puppet uses that as the source for /tmp/greeting:

Hello, world.

On the second Puppet run, hello.txt is missing, so Puppet goes on to look for the next file, universal.txt. This is present, so it becomes the source for /tmp/greeting:

Bah-weep-Graaaaagnah wheep ni ni bong

There's more...

You can use this trick anywhere you have a file resource. A common example is a service that is deployed on all nodes, such as rsyslog. The rsyslog configuration is the same on every host except for the rsyslog server. Create an rsyslog class with a file resource for the rsyslog configuration file:

class rsyslog {
  file { '/etc/rsyslog.conf':
    source => [
      "puppet:///modules/rsyslog/rsyslog.conf.${::hostname}",
      'puppet:///modules/rsyslog/rsyslog.conf' ],
  }

Then, you put the default configuration in rsyslog.conf. For your rsyslog server, logger, create an rsyslog.conf.logger file. On the machine logger, rsyslog.conf.logger will be used before rsyslog.conf because it is listed first in the array of sources.

See also

  • The Passing parameters to classes recipe in Chapter 3, Writing Better Manifests
There's more...

You can use this trick anywhere you

have a file resource. A common example is a service that is deployed on all nodes, such as rsyslog. The rsyslog configuration is the same on every host except for the rsyslog server. Create an rsyslog class with a file resource for the rsyslog configuration file:

class rsyslog {
  file { '/etc/rsyslog.conf':
    source => [
      "puppet:///modules/rsyslog/rsyslog.conf.${::hostname}",
      'puppet:///modules/rsyslog/rsyslog.conf' ],
  }

Then, you put the default configuration in rsyslog.conf. For your rsyslog server, logger, create an rsyslog.conf.logger file. On the machine logger, rsyslog.conf.logger will be used before rsyslog.conf because it is listed first in the array of sources.

See also

  • The Passing parameters to classes recipe in Chapter 3, Writing Better Manifests
See also

The Passing parameters to classes recipe in
  • Chapter 3, Writing Better Manifests

Distributing and merging directory trees

As we saw in the previous chapter, the file resource has a recurse parameter, which allows Puppet to transfer entire directory trees. We used this parameter to copy an admin user's dotfiles into their home directory. In this section, we'll show how to use recurse and another parameter sourceselect to extend our previous example.

How to do it...

Modify our admin user example as follows:

  1. Remove the $dotfiles parameter, remove the condition based on $dotfiles. Add a second source to the home directory file resource:
    define admin_user ($key, $keytype) { 
      $username = $name
      user { $username:
        ensure     => present,
      }
      file { "/home/${username}/.ssh":
        ensure  => directory,
        mode    => '0700',
        owner   => $username,
        group   => $username,
        require => File["/home/${username}"],
      }
      ssh_authorized_key { "${username}_key":
        key     => $key,
        type    => "$keytype",
        user    => $username,
        require => File["/home/${username}/.ssh"],
      }
      # copy in all the files in the subdirectory
      file { "/home/${username}":
        recurse => true,
        mode    => '0700',
        owner   => $username,
        group   => $username,
        source  => [
          "puppet:///modules/admin_user/${username}",
          'puppet:///modules/admin_user/base' ],
        sourceselect => 'all',
        require      => User["$username"],
      }
    }
    
  2. Create a base directory and copy all the system default files from /etc/skel:
    t@mylaptop ~/puppet/modules/admin_user/files $ cp -a /etc/skel base
    
  3. Create a new admin_user resource, one that will not have a directory defined:
    node 'cookbook' {
      admin_user {'steven':
        key     => 'AAAAB3N...',
        keytype => 'dsa',
      }
    }
  4. Run Puppet:
    [root@cookbook ~]# puppet agent -t
    Info: Caching catalog for cookbook.example.com
    Info: Applying configuration version '1413787159'
    Notice: /Stage[main]/Main/Node[cookbook]/Admin_user[steven]/User[steven]/ensure: created
    Notice: /Stage[main]/Main/Node[cookbook]/Admin_user[steven]/File[/home/steven]/ensure: created
    Notice: /Stage[main]/Main/Node[cookbook]/Admin_user[steven]/File[/home/steven/.bash_logout]/ensure: defined content as '{md5}6a5bc1cc5f80a48b540bc09d082b5855'
    Notice: /Stage[main]/Main/Node[cookbook]/Admin_user[steven]/File[/home/steven/.emacs]/ensure: defined content as '{md5}de7ee35f4058681a834a99b5d1b048b3'
    Notice: /Stage[main]/Main/Node[cookbook]/Admin_user[steven]/File[/home/steven/.bashrc]/ensure: defined content as '{md5}2f8222b4f275c4f18e69c34f66d2631b'
    Notice: /Stage[main]/Main/Node[cookbook]/Admin_user[steven]/File[/home/steven/.bash_profile]/ensure: defined content as '{md5}f939eb71a81a9da364410b799e817202'
    Notice: /Stage[main]/Main/Node[cookbook]/Admin_user[steven]/File[/home/steven/.ssh]/ensure: created
    Notice: /Stage[main]/Main/Node[cookbook]/Admin_user[steven]/Ssh_authorized_key[steven_key]/ensure: created
    Notice: Finished catalog run in 1.11 seconds
    

How it works...

If a file resource has the recurse parameter set on it, and it is a directory, Puppet will deploy not only the directory itself, but all its contents (including subdirectories and their contents). As we saw in the previous example, when a file has more than one source, the first source file found is used to satisfy the request. This applies to directories as well.

There's more...

By specifying the parameter sourceselect as 'all', the contents of all the source directories will be combined. For example, add thomas admin_user back into your node definition in site.pp for cookbook:

admin_user {'thomas':
    key     => 'ABBA...',
    keytype => 'rsa',
  }

Now run Puppet again on cookbook:

[root@cookbook thomas]# puppet agent -t
Info: Caching catalog for cookbook.example.com
Info: Applying configuration version '1413787770'
Notice: /Stage[main]/Main/Node[cookbook]/Admin_user[thomas]/File[/home/thomas/.bash_profile]/content: content changed '{md5}3e8337f44f84b298a8a99869ae8ca76a' to '{md5}f939eb71a81a9da364410b799e817202'
Notice: /Stage[main]/Main/Node[cookbook]/Admin_user[thomas]/File[/home/thomas/.bash_profile]/group: group changed 'root' to 'thomas'
Notice: /Stage[main]/Main/Node[cookbook]/Admin_user[thomas]/File[/home/thomas/.bash_profile]/mode: mode changed '0644' to '0700'
Notice: /File[/home/thomas/.bash_profile]/seluser: seluser changed 'system_u' to 'unconfined_u'
Notice: /Stage[main]/Main/Node[cookbook]/Admin_user[thomas]/File[/home/thomas/.bash_logout]/ensure: defined content as '{md5}6a5bc1cc5f80a48b540bc09d082b5855'
Notice: /Stage[main]/Main/Node[cookbook]/Admin_user[thomas]/File[/home/thomas/.bashrc]/content: content changed '{md5}db2a20b2b9cdf36cca1ca4672622ddd2' to '{md5}033c3484e4b276e0641becc3aa268a3a'
Notice: /Stage[main]/Main/Node[cookbook]/Admin_user[thomas]/File[/home/thomas/.bashrc]/group: group changed 'root' to 'thomas'
Notice: /Stage[main]/Main/Node[cookbook]/Admin_user[thomas]/File[/home/thomas/.bashrc]/mode: mode changed '0644' to '0700'
Notice: /File[/home/thomas/.bashrc]/seluser: seluser changed 'system_u' to 'unconfined_u'
Notice: /Stage[main]/Main/Node[cookbook]/Admin_user[thomas]/File[/home/thomas/.emacs]/ensure: defined content as '{md5}de7ee35f4058681a834a99b5d1b048b3'
Notice: Finished catalog run in 0.86 seconds

Because we previously applied the thomas admin_user to cookbook, the user existed. The two files defined in the thomas directory on the Puppet server were already in the home directory, so only the additional files, .bash_logout, .bash_profile, and .emacs were created. Using these two parameters together, you can have default files that can be overridden easily.

Sometimes you want to deploy files to an existing directory but remove any files which aren't managed by Puppet. A good example would be if you are using mcollective in your environment. The directory holding client credentials should only have certificates that come from Puppet.

The purge parameter will do this for you. Define the directory as a resource in Puppet:

file { '/etc/mcollective/ssl/clients':
  purge   => true,
  recurse => true,
}

The combination of recurse and purge will remove all files and subdirectories in /etc/mcollective/ssl/clients that are not deployed by Puppet. You can then deploy your own files to that location by placing them in the appropriate directory on the Puppet server.

If there are subdirectories that contain files you don't want to purge, just define the subdirectory as a Puppet resource, and it will be left alone:

file { '/etc/mcollective/ssl/clients':
  purge => true,
  recurse => true,
}
file { '/etc/mcollective/ssl/clients/local':
  ensure => directory,
}

Note

Be aware that, at least in current implementations of Puppet, recursive file copies can be quite slow and place a heavy memory load on the server. If the data doesn't change very often, it might be better to deploy and unpack a tar file instead. This can be done with a file resource for the tar file and an exec, which requires the file resource and unpacks the archive. Recursive directories are less of a problem when filled with small files. Puppet is not a very efficient file server, so creating large tar files and distributing them with Puppet is not a good idea either. If you need to copy large files around, using the Operating Systems packager is a better solution.

How to do it...

Modify our admin user example as follows:

Remove the $dotfiles parameter, remove the condition based on $dotfiles. Add a second source to the home directory file resource:
define admin_user ($key, $keytype) { 
  $username = $name
  user { $username:
    ensure     => present,
  }
  file { "/home/${username}/.ssh":
    ensure  => directory,
    mode    => '0700',
    owner   => $username,
    group   => $username,
    require => File["/home/${username}"],
  }
  ssh_authorized_key { "${username}_key":
    key     => $key,
    type    => "$keytype",
    user    => $username,
    require => File["/home/${username}/.ssh"],
  }
  # copy in all the files in the subdirectory
  file { "/home/${username}":
    recurse => true,
    mode    => '0700',
    owner   => $username,
    group   => $username,
    source  => [
      "puppet:///modules/admin_user/${username}",
      'puppet:///modules/admin_user/base' ],
    sourceselect => 'all',
    require      => User["$username"],
  }
}
Create a
  1. base directory and copy all the system default files from /etc/skel:
    t@mylaptop ~/puppet/modules/admin_user/files $ cp -a /etc/skel base
    
  2. Create a new admin_user resource, one that will not have a directory defined:
    node 'cookbook' {
      admin_user {'steven':
        key     => 'AAAAB3N...',
        keytype => 'dsa',
      }
    }
  3. Run Puppet:
    [root@cookbook ~]# puppet agent -t
    Info: Caching catalog for cookbook.example.com
    Info: Applying configuration version '1413787159'
    Notice: /Stage[main]/Main/Node[cookbook]/Admin_user[steven]/User[steven]/ensure: created
    Notice: /Stage[main]/Main/Node[cookbook]/Admin_user[steven]/File[/home/steven]/ensure: created
    Notice: /Stage[main]/Main/Node[cookbook]/Admin_user[steven]/File[/home/steven/.bash_logout]/ensure: defined content as '{md5}6a5bc1cc5f80a48b540bc09d082b5855'
    Notice: /Stage[main]/Main/Node[cookbook]/Admin_user[steven]/File[/home/steven/.emacs]/ensure: defined content as '{md5}de7ee35f4058681a834a99b5d1b048b3'
    Notice: /Stage[main]/Main/Node[cookbook]/Admin_user[steven]/File[/home/steven/.bashrc]/ensure: defined content as '{md5}2f8222b4f275c4f18e69c34f66d2631b'
    Notice: /Stage[main]/Main/Node[cookbook]/Admin_user[steven]/File[/home/steven/.bash_profile]/ensure: defined content as '{md5}f939eb71a81a9da364410b799e817202'
    Notice: /Stage[main]/Main/Node[cookbook]/Admin_user[steven]/File[/home/steven/.ssh]/ensure: created
    Notice: /Stage[main]/Main/Node[cookbook]/Admin_user[steven]/Ssh_authorized_key[steven_key]/ensure: created
    Notice: Finished catalog run in 1.11 seconds
    

How it works...

If a file resource has the recurse parameter set on it, and it is a directory, Puppet will deploy not only the directory itself, but all its contents (including subdirectories and their contents). As we saw in the previous example, when a file has more than one source, the first source file found is used to satisfy the request. This applies to directories as well.

There's more...

By specifying the parameter sourceselect as 'all', the contents of all the source directories will be combined. For example, add thomas admin_user back into your node definition in site.pp for cookbook:

admin_user {'thomas':
    key     => 'ABBA...',
    keytype => 'rsa',
  }

Now run Puppet again on cookbook:

[root@cookbook thomas]# puppet agent -t
Info: Caching catalog for cookbook.example.com
Info: Applying configuration version '1413787770'
Notice: /Stage[main]/Main/Node[cookbook]/Admin_user[thomas]/File[/home/thomas/.bash_profile]/content: content changed '{md5}3e8337f44f84b298a8a99869ae8ca76a' to '{md5}f939eb71a81a9da364410b799e817202'
Notice: /Stage[main]/Main/Node[cookbook]/Admin_user[thomas]/File[/home/thomas/.bash_profile]/group: group changed 'root' to 'thomas'
Notice: /Stage[main]/Main/Node[cookbook]/Admin_user[thomas]/File[/home/thomas/.bash_profile]/mode: mode changed '0644' to '0700'
Notice: /File[/home/thomas/.bash_profile]/seluser: seluser changed 'system_u' to 'unconfined_u'
Notice: /Stage[main]/Main/Node[cookbook]/Admin_user[thomas]/File[/home/thomas/.bash_logout]/ensure: defined content as '{md5}6a5bc1cc5f80a48b540bc09d082b5855'
Notice: /Stage[main]/Main/Node[cookbook]/Admin_user[thomas]/File[/home/thomas/.bashrc]/content: content changed '{md5}db2a20b2b9cdf36cca1ca4672622ddd2' to '{md5}033c3484e4b276e0641becc3aa268a3a'
Notice: /Stage[main]/Main/Node[cookbook]/Admin_user[thomas]/File[/home/thomas/.bashrc]/group: group changed 'root' to 'thomas'
Notice: /Stage[main]/Main/Node[cookbook]/Admin_user[thomas]/File[/home/thomas/.bashrc]/mode: mode changed '0644' to '0700'
Notice: /File[/home/thomas/.bashrc]/seluser: seluser changed 'system_u' to 'unconfined_u'
Notice: /Stage[main]/Main/Node[cookbook]/Admin_user[thomas]/File[/home/thomas/.emacs]/ensure: defined content as '{md5}de7ee35f4058681a834a99b5d1b048b3'
Notice: Finished catalog run in 0.86 seconds

Because we previously applied the thomas admin_user to cookbook, the user existed. The two files defined in the thomas directory on the Puppet server were already in the home directory, so only the additional files, .bash_logout, .bash_profile, and .emacs were created. Using these two parameters together, you can have default files that can be overridden easily.

Sometimes you want to deploy files to an existing directory but remove any files which aren't managed by Puppet. A good example would be if you are using mcollective in your environment. The directory holding client credentials should only have certificates that come from Puppet.

The purge parameter will do this for you. Define the directory as a resource in Puppet:

file { '/etc/mcollective/ssl/clients':
  purge   => true,
  recurse => true,
}

The combination of recurse and purge will remove all files and subdirectories in /etc/mcollective/ssl/clients that are not deployed by Puppet. You can then deploy your own files to that location by placing them in the appropriate directory on the Puppet server.

If there are subdirectories that contain files you don't want to purge, just define the subdirectory as a Puppet resource, and it will be left alone:

file { '/etc/mcollective/ssl/clients':
  purge => true,
  recurse => true,
}
file { '/etc/mcollective/ssl/clients/local':
  ensure => directory,
}

Note

Be aware that, at least in current implementations of Puppet, recursive file copies can be quite slow and place a heavy memory load on the server. If the data doesn't change very often, it might be better to deploy and unpack a tar file instead. This can be done with a file resource for the tar file and an exec, which requires the file resource and unpacks the archive. Recursive directories are less of a problem when filled with small files. Puppet is not a very efficient file server, so creating large tar files and distributing them with Puppet is not a good idea either. If you need to copy large files around, using the Operating Systems packager is a better solution.

How it works...

If a file resource

has the recurse parameter set on it, and it is a directory, Puppet will deploy not only the directory itself, but all its contents (including subdirectories and their contents). As we saw in the previous example, when a file has more than one source, the first source file found is used to satisfy the request. This applies to directories as well.

There's more...

By specifying the parameter sourceselect as 'all', the contents of all the source directories will be combined. For example, add thomas admin_user back into your node definition in site.pp for cookbook:

admin_user {'thomas':
    key     => 'ABBA...',
    keytype => 'rsa',
  }

Now run Puppet again on cookbook:

[root@cookbook thomas]# puppet agent -t
Info: Caching catalog for cookbook.example.com
Info: Applying configuration version '1413787770'
Notice: /Stage[main]/Main/Node[cookbook]/Admin_user[thomas]/File[/home/thomas/.bash_profile]/content: content changed '{md5}3e8337f44f84b298a8a99869ae8ca76a' to '{md5}f939eb71a81a9da364410b799e817202'
Notice: /Stage[main]/Main/Node[cookbook]/Admin_user[thomas]/File[/home/thomas/.bash_profile]/group: group changed 'root' to 'thomas'
Notice: /Stage[main]/Main/Node[cookbook]/Admin_user[thomas]/File[/home/thomas/.bash_profile]/mode: mode changed '0644' to '0700'
Notice: /File[/home/thomas/.bash_profile]/seluser: seluser changed 'system_u' to 'unconfined_u'
Notice: /Stage[main]/Main/Node[cookbook]/Admin_user[thomas]/File[/home/thomas/.bash_logout]/ensure: defined content as '{md5}6a5bc1cc5f80a48b540bc09d082b5855'
Notice: /Stage[main]/Main/Node[cookbook]/Admin_user[thomas]/File[/home/thomas/.bashrc]/content: content changed '{md5}db2a20b2b9cdf36cca1ca4672622ddd2' to '{md5}033c3484e4b276e0641becc3aa268a3a'
Notice: /Stage[main]/Main/Node[cookbook]/Admin_user[thomas]/File[/home/thomas/.bashrc]/group: group changed 'root' to 'thomas'
Notice: /Stage[main]/Main/Node[cookbook]/Admin_user[thomas]/File[/home/thomas/.bashrc]/mode: mode changed '0644' to '0700'
Notice: /File[/home/thomas/.bashrc]/seluser: seluser changed 'system_u' to 'unconfined_u'
Notice: /Stage[main]/Main/Node[cookbook]/Admin_user[thomas]/File[/home/thomas/.emacs]/ensure: defined content as '{md5}de7ee35f4058681a834a99b5d1b048b3'
Notice: Finished catalog run in 0.86 seconds

Because we previously applied the thomas admin_user to cookbook, the user existed. The two files defined in the thomas directory on the Puppet server were already in the home directory, so only the additional files, .bash_logout, .bash_profile, and .emacs were created. Using these two parameters together, you can have default files that can be overridden easily.

Sometimes you want to deploy files to an existing directory but remove any files which aren't managed by Puppet. A good example would be if you are using mcollective in your environment. The directory holding client credentials should only have certificates that come from Puppet.

The purge parameter will do this for you. Define the directory as a resource in Puppet:

file { '/etc/mcollective/ssl/clients':
  purge   => true,
  recurse => true,
}

The combination of recurse and purge will remove all files and subdirectories in /etc/mcollective/ssl/clients that are not deployed by Puppet. You can then deploy your own files to that location by placing them in the appropriate directory on the Puppet server.

If there are subdirectories that contain files you don't want to purge, just define the subdirectory as a Puppet resource, and it will be left alone:

file { '/etc/mcollective/ssl/clients':
  purge => true,
  recurse => true,
}
file { '/etc/mcollective/ssl/clients/local':
  ensure => directory,
}

Note

Be aware that, at least in current implementations of Puppet, recursive file copies can be quite slow and place a heavy memory load on the server. If the data doesn't change very often, it might be better to deploy and unpack a tar file instead. This can be done with a file resource for the tar file and an exec, which requires the file resource and unpacks the archive. Recursive directories are less of a problem when filled with small files. Puppet is not a very efficient file server, so creating large tar files and distributing them with Puppet is not a good idea either. If you need to copy large files around, using the Operating Systems packager is a better solution.

There's more...

By specifying the

parameter sourceselect as 'all', the contents of all the source directories will be combined. For example, add thomas admin_user back into your node definition in site.pp for cookbook:

admin_user {'thomas':
    key     => 'ABBA...',
    keytype => 'rsa',
  }

Now run Puppet again on cookbook:

[root@cookbook thomas]# puppet agent -t
Info: Caching catalog for cookbook.example.com
Info: Applying configuration version '1413787770'
Notice: /Stage[main]/Main/Node[cookbook]/Admin_user[thomas]/File[/home/thomas/.bash_profile]/content: content changed '{md5}3e8337f44f84b298a8a99869ae8ca76a' to '{md5}f939eb71a81a9da364410b799e817202'
Notice: /Stage[main]/Main/Node[cookbook]/Admin_user[thomas]/File[/home/thomas/.bash_profile]/group: group changed 'root' to 'thomas'
Notice: /Stage[main]/Main/Node[cookbook]/Admin_user[thomas]/File[/home/thomas/.bash_profile]/mode: mode changed '0644' to '0700'
Notice: /File[/home/thomas/.bash_profile]/seluser: seluser changed 'system_u' to 'unconfined_u'
Notice: /Stage[main]/Main/Node[cookbook]/Admin_user[thomas]/File[/home/thomas/.bash_logout]/ensure: defined content as '{md5}6a5bc1cc5f80a48b540bc09d082b5855'
Notice: /Stage[main]/Main/Node[cookbook]/Admin_user[thomas]/File[/home/thomas/.bashrc]/content: content changed '{md5}db2a20b2b9cdf36cca1ca4672622ddd2' to '{md5}033c3484e4b276e0641becc3aa268a3a'
Notice: /Stage[main]/Main/Node[cookbook]/Admin_user[thomas]/File[/home/thomas/.bashrc]/group: group changed 'root' to 'thomas'
Notice: /Stage[main]/Main/Node[cookbook]/Admin_user[thomas]/File[/home/thomas/.bashrc]/mode: mode changed '0644' to '0700'
Notice: /File[/home/thomas/.bashrc]/seluser: seluser changed 'system_u' to 'unconfined_u'
Notice: /Stage[main]/Main/Node[cookbook]/Admin_user[thomas]/File[/home/thomas/.emacs]/ensure: defined content as '{md5}de7ee35f4058681a834a99b5d1b048b3'
Notice: Finished catalog run in 0.86 seconds

Because we previously applied the thomas admin_user to cookbook, the user existed. The two files defined in the thomas directory on the Puppet server were already in the home directory, so only the additional files, .bash_logout, .bash_profile, and .emacs were created. Using these two parameters together, you can have default files that can be overridden easily.

Sometimes you want to deploy files to an existing directory but remove any files which aren't managed by Puppet. A good example would be if you are using mcollective in your environment. The directory holding client credentials should only have certificates that come from Puppet.

The purge parameter will do this for you. Define the directory as a resource in Puppet:

file { '/etc/mcollective/ssl/clients':
  purge   => true,
  recurse => true,
}

The combination of recurse and purge will remove all files and subdirectories in /etc/mcollective/ssl/clients that are not deployed by Puppet. You can then deploy your own files to that location by placing them in the appropriate directory on the Puppet server.

If there are subdirectories that contain files you don't want to purge, just define the subdirectory as a Puppet resource, and it will be left alone:

file { '/etc/mcollective/ssl/clients':
  purge => true,
  recurse => true,
}
file { '/etc/mcollective/ssl/clients/local':
  ensure => directory,
}

Note

Be aware that, at least in current implementations of Puppet, recursive file copies can be quite slow and place a heavy memory load on the server. If the data doesn't change very often, it might be better to deploy and unpack a tar file instead. This can be done with a file resource for the tar file and an exec, which requires the file resource and unpacks the archive. Recursive directories are less of a problem when filled with small files. Puppet is not a very efficient file server, so creating large tar files and distributing them with Puppet is not a good idea either. If you need to copy large files around, using the Operating Systems packager is a better solution.

Cleaning up old files

Puppet's tidy resource will help you clean up old or out-of-date files, reducing disk usage. For example, if you have Puppet reporting enabled as described in the section on generating reports, you might want to regularly delete old report files.

How to do it...

Let's get started.

  1. Modify your site.pp file as follows:
    node 'cookbook' {
      tidy { '/var/lib/puppet/reports':
        age     => '1w',
        recurse => true,
      }
    }
  2. Run Puppet:
    [root@cookbook clients]# puppet agent -t
    Info: Caching catalog for cookbook.example.com
    Notice: /Stage[main]/Main/Node[cookbook]/File[/var/lib/puppet/reports/cookbook.example.com/201409090637.yaml]/ensure: removed
    Notice: /Stage[main]/Main/Node[cookbook]/File[/var/lib/puppet/reports/cookbook.example.com/201409100556.yaml]/ensure: removed
    Notice: /Stage[main]/Main/Node[cookbook]/File[/var/lib/puppet/reports/cookbook.example.com/201409090631.yaml]/ensure: removed
    Notice: /Stage[main]/Main/Node[cookbook]/File[/var/lib/puppet/reports/cookbook.example.com/201408210557.yaml]/ensure: removed
    Notice: /Stage[main]/Main/Node[cookbook]/File[/var/lib/puppet/reports/cookbook.example.com/201409080557.yaml]/ensure: removed
    Notice: /Stage[main]/Main/Node[cookbook]/File[/var/lib/puppet/reports/cookbook.example.com/201409100558.yaml]/ensure: removed
    Notice: /Stage[main]/Main/Node[cookbook]/File[/var/lib/puppet/reports/cookbook.example.com/201408210546.yaml]/ensure: removed
    Notice: /Stage[main]/Main/Node[cookbook]/File[/var/lib/puppet/reports/cookbook.example.com/201408210539.yaml]/ensure: removed
    Notice: Finished catalog run in 0.80 seconds
    

How it works...

Puppet searches the specified path for any files matching the age parameter; in this case, 2w (two weeks). It also searches subdirectories (recurse => true).

Any files matching your criteria will be deleted.

There's more...

You can specify file ages in seconds, minutes, hours, days, or weeks by using a single character to specify the time unit, as follows:

  • 60s
  • 180m
  • 24h
  • 30d
  • 4w

You can specify that files greater than a given size should be removed, as follows:

size => '100m',

This removes files of 100 megabytes and over. For kilobytes, use k, and for bytes, use b.

Note

Note that if you specify both age and size parameters, they are treated as independent criteria. For example, if you specify the following, Puppet will remove all files that are either at least one day old, or at least 512 KB in size:

age => "1d",

size => "512k",

How to do it...

Let's get started.

Modify your site.pp file as follows:
node 'cookbook' {
  tidy { '/var/lib/puppet/reports':
    age     => '1w',
    recurse => true,
  }
}
Run Puppet:
[root@cookbook clients]# puppet agent -t
Info: Caching catalog for cookbook.example.com
Notice: /Stage[main]/Main/Node[cookbook]/File[/var/lib/puppet/reports/cookbook.example.com/201409090637.yaml]/ensure: removed
Notice: /Stage[main]/Main/Node[cookbook]/File[/var/lib/puppet/reports/cookbook.example.com/201409100556.yaml]/ensure: removed
Notice: /Stage[main]/Main/Node[cookbook]/File[/var/lib/puppet/reports/cookbook.example.com/201409090631.yaml]/ensure: removed
Notice: /Stage[main]/Main/Node[cookbook]/File[/var/lib/puppet/reports/cookbook.example.com/201408210557.yaml]/ensure: removed
Notice: /Stage[main]/Main/Node[cookbook]/File[/var/lib/puppet/reports/cookbook.example.com/201409080557.yaml]/ensure: removed
Notice: /Stage[main]/Main/Node[cookbook]/File[/var/lib/puppet/reports/cookbook.example.com/201409100558.yaml]/ensure: removed
Notice: /Stage[main]/Main/Node[cookbook]/File[/var/lib/puppet/reports/cookbook.example.com/201408210546.yaml]/ensure: removed
Notice: /Stage[main]/Main/Node[cookbook]/File[/var/lib/puppet/reports/cookbook.example.com/201408210539.yaml]/ensure: removed
Notice: Finished catalog run in 0.80 seconds

How it works...

Puppet searches the specified path for any files matching the age parameter; in this case, 2w (two weeks). It also searches subdirectories (recurse => true).

Any files matching your criteria will be deleted.

There's more...

You can specify file ages in seconds, minutes, hours, days, or weeks by using a single character to specify the time unit, as follows:

  • 60s
  • 180m
  • 24h
  • 30d
  • 4w

You can specify that files greater than a given size should be removed, as follows:

size => '100m',

This removes files of 100 megabytes and over. For kilobytes, use k, and for bytes, use b.

Note

Note that if you specify both age and size parameters, they are treated as independent criteria. For example, if you specify the following, Puppet will remove all files that are either at least one day old, or at least 512 KB in size:

age => "1d",

size => "512k",

How it works...

Puppet searches the

specified path for any files matching the age parameter; in this case, 2w (two weeks). It also searches subdirectories (recurse => true).

Any files matching your criteria will be deleted.

There's more...

You can specify file ages in seconds, minutes, hours, days, or weeks by using a single character to specify the time unit, as follows:

  • 60s
  • 180m
  • 24h
  • 30d
  • 4w

You can specify that files greater than a given size should be removed, as follows:

size => '100m',

This removes files of 100 megabytes and over. For kilobytes, use k, and for bytes, use b.

Note

Note that if you specify both age and size parameters, they are treated as independent criteria. For example, if you specify the following, Puppet will remove all files that are either at least one day old, or at least 512 KB in size:

age => "1d",

size => "512k",

There's more...

You can specify file ages in seconds, minutes, hours, days, or weeks by using a single character to specify the time unit, as follows:

60s
180m
24h
30d
4w

You can specify that files greater than a given size should be removed, as follows:

size => '100m',

This removes files of 100 megabytes and over. For kilobytes, use k, and for bytes, use b.

Note

Note that if you specify both age and size parameters, they are treated as independent criteria. For example, if you specify the following, Puppet will remove all files that are either at least one day old, or at least 512 KB in size:

age => "1d",

size => "512k",

Auditing resources

Dry run mode, using the --noop switch, is a simple way to audit any changes to a machine under Puppet's control. However, Puppet also has a dedicated audit feature, which can report changes to resources or specific attributes.

How to do it...

Here's an example showing Puppet's auditing capabilities:

  1. Modify your site.pp file as follows:
    node 'cookbook' {
      file { '/etc/passwd':
        audit => [ owner, mode ],
      }
    }
  2. Run Puppet:
    [root@cookbook clients]# puppet agent -t
    Info: Caching catalog for cookbook.example.com
    Info: Applying configuration version '1413789080'
    Notice: /Stage[main]/Main/Node[cookbook]/File[/etc/passwd]/owner: audit change: newly-recorded value 0
    Notice: /Stage[main]/Main/Node[cookbook]/File[/etc/passwd]/mode: audit change: newly-recorded value 644
    Notice: Finished catalog run in 0.55 seconds
    

How it works...

The audit metaparameter tells Puppet that you want to record and monitor certain things about the resource. The value can be a list of the parameters that you want to audit.

In this case, when Puppet runs, it will now record the owner and mode of the /etc/passwd file. In future runs, Puppet will spot whether either of these has changed. For example, if you run:

[root@cookbook ~]# chmod 666 /etc/passwd

Puppet will pick up this change and log it on the next run:

Notice: /Stage[main]/Main/Node[cookbook]/File[/etc/passwd]/mode: audit change: previously recorded value 0644 has been changed to 0666

There's more...

This feature is very useful to audit large networks for any changes to machines, either malicious or accidental. It's also very handy to keep an eye on things that aren't managed by Puppet, for example, application code on production servers. You can read more about Puppet's auditing capability here:

http://puppetlabs.com/blog/all-about-auditing-with-puppet/

If you just want to audit everything about a resource, use all:

file { '/etc/passwd':
  audit => all,
}

See also

  • The Noop - the don't change anything option recipe in Chapter 10, Monitoring, Reporting, and Troubleshooting
How to do it...

Here's an example

showing Puppet's auditing capabilities:

  1. Modify your site.pp file as follows:
    node 'cookbook' {
      file { '/etc/passwd':
        audit => [ owner, mode ],
      }
    }
  2. Run Puppet:
    [root@cookbook clients]# puppet agent -t
    Info: Caching catalog for cookbook.example.com
    Info: Applying configuration version '1413789080'
    Notice: /Stage[main]/Main/Node[cookbook]/File[/etc/passwd]/owner: audit change: newly-recorded value 0
    Notice: /Stage[main]/Main/Node[cookbook]/File[/etc/passwd]/mode: audit change: newly-recorded value 644
    Notice: Finished catalog run in 0.55 seconds
    

How it works...

The audit metaparameter tells Puppet that you want to record and monitor certain things about the resource. The value can be a list of the parameters that you want to audit.

In this case, when Puppet runs, it will now record the owner and mode of the /etc/passwd file. In future runs, Puppet will spot whether either of these has changed. For example, if you run:

[root@cookbook ~]# chmod 666 /etc/passwd

Puppet will pick up this change and log it on the next run:

Notice: /Stage[main]/Main/Node[cookbook]/File[/etc/passwd]/mode: audit change: previously recorded value 0644 has been changed to 0666

There's more...

This feature is very useful to audit large networks for any changes to machines, either malicious or accidental. It's also very handy to keep an eye on things that aren't managed by Puppet, for example, application code on production servers. You can read more about Puppet's auditing capability here:

http://puppetlabs.com/blog/all-about-auditing-with-puppet/

If you just want to audit everything about a resource, use all:

file { '/etc/passwd':
  audit => all,
}

See also

  • The Noop - the don't change anything option recipe in Chapter 10, Monitoring, Reporting, and Troubleshooting
How it works...

The audit metaparameter tells Puppet that you want to record and monitor certain things about the resource. The value can be a list of the parameters that you want to audit.

In this case, when Puppet runs, it will now record the owner and mode of the /etc/passwd file. In future runs, Puppet will spot whether either of these has changed. For example, if you run:

[root@cookbook ~]# chmod 666 /etc/passwd

Puppet will pick up this change and log it on the next run:

Notice: /Stage[main]/Main/Node[cookbook]/File[/etc/passwd]/mode: audit change: previously recorded value 0644 has been changed to 0666

There's more...

This feature is very useful to audit large networks for any changes to machines, either malicious or accidental. It's also very handy to keep an eye on things that aren't managed by Puppet, for example, application code on production servers. You can read more about Puppet's auditing capability here:

http://puppetlabs.com/blog/all-about-auditing-with-puppet/

If you just want to audit everything about a resource, use all:

file { '/etc/passwd':
  audit => all,
}

See also

  • The Noop - the don't change anything option recipe in Chapter 10, Monitoring, Reporting, and Troubleshooting
There's more...

This feature is very useful to audit large networks for any changes to machines, either malicious or accidental. It's also very handy to keep an eye on things that aren't managed by Puppet, for example, application code on production servers. You can read more about Puppet's auditing capability

here:

http://puppetlabs.com/blog/all-about-auditing-with-puppet/

If you just want to audit everything about a resource, use all:

file { '/etc/passwd':
  audit => all,
}

See also

  • The Noop - the don't change anything option recipe in Chapter 10, Monitoring, Reporting, and Troubleshooting
See also

The Noop - the don't change anything option recipe in
  • Chapter 10, Monitoring, Reporting, and Troubleshooting

Temporarily disabling resources

Sometimes you want to disable a resource for the time being so that it doesn't interfere with other work. For example, you might want to tweak a configuration file on the server until you have the exact settings you want, before checking it into Puppet. You don't want Puppet to overwrite it with an old version in the meantime, so you can set the noop metaparameter on the resource:

noop => true,

How to do it...

This example shows you how to use the noop metaparameter:

  1. Modify your site.pp file as follows:
    node 'cookbook' {
      file { '/etc/resolv.conf':
        content => "nameserver 127.0.0.1\n",
        noop    => true,
      }
    }
  2. Run Puppet:
    [root@cookbook ~]# puppet agent -t
    Info: Caching catalog for cookbook.example.com
    Info: Applying configuration version '1413789438'
    Notice: /Stage[main]/Main/Node[cookbook]/File[/etc/resolv.conf]/content: 
    --- /etc/resolv.conf  2014-10-20 00:27:43.095999975 -0400
    +++ /tmp/puppet-file20141020-8439-1lhuy1y-0	2014-10-20 03:17:18.969999979 -0400
    @@ -1,3 +1 @@  
    -; generated by /sbin/dhclient-script
    -search example.com
    -nameserver 192.168.122.1
    +nameserver 127.0.0.1
    
    Notice: /Stage[main]/Main/Node[cookbook]/File[/etc/resolv.conf]/content: current_value {md5}4c0d192511df253826d302bc830a371b, should be {md5}949343428bded6a653a85910f6bdb48e (noop)
    Notice: Node[cookbook]: Would have triggered 'refresh' from 1 events
    Notice: Class[Main]: Would have triggered 'refresh' from 1 events
    Notice: Stage[main]: Would have triggered 'refresh' from 1 events
    Notice: Finished catalog run in 0.50 seconds
    

How it works...

The noop metaparameter is set to true, so for this particular resource, it's as if you had to run Puppet with the --noop flag. Puppet noted that the resource would have been applied, but otherwise did nothing.

The nice thing with running the agent in test mode (-t) is that Puppet output a diff of what it would have done if the noop was not present (you can tell puppet to show the diff's without using -t with --show_diff; -t implies many different settings):

--- /etc/resolv.conf  2014-10-20 00:27:43.095999975 -0400
+++ /tmp/puppet-file20141020-8439-1lhuy1y-0	2014-10-20 03:17:18.969999979 -0400
@@ -1,3 +1 @@
-; generated by /sbin/dhclient-script
-search example.com
-nameserver 192.168.122.1
+nameserver 127.0.0.1

This can be very useful when debugging a template; you can work on your changes and then see what they would look like on the node without actually applying them. Using the diff, you can see whether your updated template produces the correct output.

How to do it...

This example shows you how to use the noop metaparameter:

Modify your site.pp file as follows:
node 'cookbook' {
  file { '/etc/resolv.conf':
    content => "nameserver 127.0.0.1\n",
    noop    => true,
  }
}
Run Puppet:
[root@cookbook ~]# puppet agent -t
Info: Caching catalog for cookbook.example.com
Info: Applying configuration version '1413789438'
Notice: /Stage[main]/Main/Node[cookbook]/File[/etc/resolv.conf]/content: 
--- /etc/resolv.conf  2014-10-20 00:27:43.095999975 -0400
+++ /tmp/puppet-file20141020-8439-1lhuy1y-0	2014-10-20 03:17:18.969999979 -0400
@@ -1,3 +1 @@  
-; generated by /sbin/dhclient-script
-search example.com
-nameserver 192.168.122.1
+nameserver 127.0.0.1

Notice: /Stage[main]/Main/Node[cookbook]/File[/etc/resolv.conf]/content: current_value {md5}4c0d192511df253826d302bc830a371b, should be {md5}949343428bded6a653a85910f6bdb48e (noop)
Notice: Node[cookbook]: Would have triggered 'refresh' from 1 events
Notice: Class[Main]: Would have triggered 'refresh' from 1 events
Notice: Stage[main]: Would have triggered 'refresh' from 1 events
Notice: Finished catalog run in 0.50 seconds

How it works...

The noop metaparameter is set to true, so for this particular resource, it's as if you had to run Puppet with the --noop flag. Puppet noted that the resource would have been applied, but otherwise did nothing.

The nice thing with running the agent in test mode (-t) is that Puppet output a diff of what it would have done if the noop was not present (you can tell puppet to show the diff's without using -t with --show_diff; -t implies many different settings):

--- /etc/resolv.conf  2014-10-20 00:27:43.095999975 -0400
+++ /tmp/puppet-file20141020-8439-1lhuy1y-0	2014-10-20 03:17:18.969999979 -0400
@@ -1,3 +1 @@
-; generated by /sbin/dhclient-script
-search example.com
-nameserver 192.168.122.1
+nameserver 127.0.0.1

This can be very useful when debugging a template; you can work on your changes and then see what they would look like on the node without actually applying them. Using the diff, you can see whether your updated template produces the correct output.

How it works...

The noop metaparameter is set to true, so for this particular resource, it's as if you had to run Puppet with the --noop flag. Puppet noted that the resource would have been applied, but otherwise did nothing.

The nice thing with running the agent in test mode (-t) is that Puppet output a diff of what it would have done if the noop was not present (you can tell puppet to show the diff's without using -t with --show_diff; -t implies many different settings):

--- /etc/resolv.conf 2014-10-20 00:27:43.095999975 -0400 +++ /tmp/puppet-file20141020-8439-1lhuy1y-0 2014-10-20 03:17:18.969999979 -0400 @@ -1,3 +1 @@ -; generated by /sbin/dhclient-script -search example.com -nameserver 192.168.122.1 +nameserver 127.0.0.1

This can be very useful when debugging a template; you can work on your changes and then see what they would look like on the node without actually applying them. Using the diff, you can see whether your updated template produces the correct output.

You have been reading a chapter from
DevOps: Puppet, Docker, and Kubernetes
Published in: Mar 2017
Publisher: Packt
ISBN-13: 9781788297615
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at $19.99/month. Cancel anytime
Banner background image