Puppet Infrastructure Options

Puppet is a powerful framework to allow for centralized configuration management and infrastructure as code (IaC). This is the starting point for a series about IaC and Puppet, in addition to my series about Kubernetes. Let's find out about our options to build a great Puppet landscape.


Introduction

What exactly is configuration management, and why should I use it? How can I improve my IT landscape and its availability with Infrastructure as Code? I am using shell scripts, so why should I not continue to do so? What is Puppet, and why should I use it when there are other tools like Ansible or SaltStack available?

Well, these are good questions, so let's dive in.

Configuration Management

Red Hat defines configuration management as follows:
Configuration management is a process for maintaining computer systems, servers, and software in a desired, consistent state. It’s a way to make sure that a system performs as it’s expected to as changes are made over time. Managing IT system configurations involves defining a system's desired state—like server configuration—then building and maintaining those systems. Closely related to configuration assessments and drift analyses, configuration management uses both to identify systems to update, reconfigure, or patch.

The key point is indeed the definition of a desired state. Configuring a system in the '*nix' world (Unix, Linux) is almost always done the same way:

  • install binaries
  • manage one or more configuration files (content, permissions) and their respective parent directories
  • manage the status of a service (daemon).

For instance, to keep the clocks across your infrastructure in sync, you would use NTP (Network Time Protocol). That typically involves NTP servers and clients, and hence a variety of different settings. Modern infrastructure landscapes often include many different systems and even various operating systems, and you will not want to touch every single one of them all the time, while still being able to configure different settings as required.
So you define the desired state of those components and apply it to all systems in question. Usually we populate the configuration files through parameters and variables, which lets us use one or more templates for the various scenarios through defaults and overrides.
A very simple working example looks like this:

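class ntp {

  # Install the NTP binaries
  package { 'ntp':
    ensure => present,
  }

  # Enable the daemon at boot and keep it running
  # ('ntpd' is the service name on Red Hat family systems)
  service { 'ntpd':
    ensure  => running,
    enable  => true,
    require => Package['ntp'],
  }
}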

The 'package' resource installs the binaries, and the 'service' resource enables the service and makes sure that it is running. Note that installing the package also installs a default configuration file, which is not ideal but will suffice for this example. Once you have applied the class to all required systems, you will find the service running and synchronizing with time servers on the internet, assuming that the firewall allows it.
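Managing the configuration file as well, the second step in the list above, could look roughly like the following sketch. It assumes a Red Hat family system (configuration file /etc/ntp.conf, service name ntpd); the class name ntp_config, its $servers parameter and the pool.ntp.org server names are hypothetical choices for illustration, not part of any existing module.

```puppet
# Hypothetical class: package, configuration file and service (Red Hat family assumed)
class ntp_config (
  # Hypothetical parameter: upstream time servers written into ntp.conf
  Array[String] $servers = ['0.pool.ntp.org', '1.pool.ntp.org'],
) {
  # 1. Install the binaries
  package { 'ntp':
    ensure => present,
  }

  # 2. Manage the configuration file's content and permissions
  file { '/etc/ntp.conf':
    ensure  => file,
    owner   => 'root',
    group   => 'root',
    mode    => '0644',
    # Render an inline EPP template with one "server" line per entry;
    # the unquoted @(EOT) heredoc does not interpolate, so EPP sees $servers
    content => inline_epp(@(EOT), { 'servers' => $servers }),
      <%- | Array[String] $servers | -%>
      <% $servers.each |$server| { -%>
      server <%= $server %> iburst
      <% } -%>
      | EOT
    require => Package['ntp'],
    # Restart the daemon whenever the rendered file changes
    notify  => Service['ntpd'],
  }

  # 3. Keep the daemon running and enabled at boot
  service { 'ntpd':
    ensure => running,
    enable => true,
  }
}
```

The notify relationship is the design choice worth noting here: Puppet restarts ntpd only when the rendered file content actually changes, not on every run.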

Of course, reality will likely involve much more complex classes and modules. However, we did define the desired state (here, which binary to install and how to manage the service) and let our tool of choice (here Puppet) put it into action.

To manage or not to manage, that seems to be the question here?

Over time you will determine which aspects of your infrastructure (i.e. which services and applications) you want to actively manage, while others can probably be left alone, typically anything that does not need special configuration.

Almost always you will control your access management centrally, e.g. SSH and LDAP, which both rely on accurate time, so NTP is important too, as is the proper management of file system permissions. Sooner or later this develops into tight control and a chain of classes and modules to manage. On the other hand, it can quickly lead to undesired overkill, namely when things are controlled that don't need controlling. While you probably need to ensure that all required configuration files exist and are properly accessible, you may not always need to manage their contents.
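Puppet supports this middle ground directly: a file resource with neither a content nor a source attribute manages only the file's existence and metadata, and leaves its contents alone. A minimal sketch, where the paths and the myapp group are hypothetical (the group would have to exist on the host):

```puppet
# Hypothetical application directory: enforce existence and permissions only
file { '/etc/myapp':
  ensure => directory,
  owner  => 'root',
  group  => 'myapp',
  mode   => '0750',
}

# No 'content' or 'source' attribute: Puppet creates the file if it is missing
# and corrects owner, group and mode, but never touches what is inside it.
# The parent directory above is autorequired, so no explicit ordering is needed.
file { '/etc/myapp/myapp.conf':
  ensure => file,
  owner  => 'root',
  group  => 'myapp',
  mode   => '0640',
}
```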

One of the advantages of developing a Puppet infrastructure (or any other configuration management framework, for that matter) is that you gain an understanding of how the services and applications actually work under the hood. This allows you to align the configuration of your services with the requirements of your infrastructure. This is what configuration management and IaC are all about. Manage what you need to manage. Leave everything else alone. Improve the service availability, performance and robustness of your infrastructure. Reduce repetitive work.

For this to work well, you need a proper Puppet infrastructure in place, as well as a version control system like git. More about this below. Meanwhile, have a look at Infrastructure as Code if you have not done so yet.

Puppet Basics

So how does Puppet work? While you can use Puppet Enterprise, which offers many advanced management options and support, open source Puppet is all you need in most environments. This is what I focus on here.
In either case, you have at least one Puppet server and an agent on each managed host. In addition, there are Facter and a PuppetDB instance.

The Puppet server compiles the classes, which contain the functions and logic describing your desired state, into a catalog. It also uses variables to override defaults based on facts or provided values. Each agent receives the catalog compiled for its host, based on which classes are allocated to that host.
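Without Hiera or an ENC, the built-in way to allocate classes to hosts is a node definition in the main manifest (commonly site.pp). A minimal sketch reusing the ntp class from above; the host name matches the example node used later in this post, and the regular expression stands for a hypothetical naming scheme:

```puppet
# Main manifest of an environment, e.g. manifests/site.pp
# Exact match against the agent's certname
node 'example.node.fqdn' {
  include ntp
}

# Regular expression match, e.g. for hypothetical hosts web01, web02, ...
node /^web\d+\./ {
  include ntp
}

# Fallback for every host without a more specific node definition
node default {
  include ntp
}
```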

Facter gathers local system information and provides it as facts; the agent sends these facts to the server, which uses them when compiling the catalog. The agent then fetches the catalog from the server, compares the current state with the desired state, and applies any required changes. To keep your Puppet infrastructure up to date about its entire state, the agent creates and uploads a report after each Puppet run. The agent uses the host OS's native package management, like apt or yum.
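Facts are available to your code during catalog compilation through the $facts hash. As a sketch, a class could choose the NTP service name by OS family. The class name ntp_by_os is hypothetical, and the service names are assumptions about typical distributions ('ntpd' on Red Hat family systems, 'ntp' on Debian family systems using the classic ntp package):

```puppet
# Hypothetical class: pick the service name from a core fact
class ntp_by_os {
  # os.family is reported by Facter, e.g. 'RedHat' or 'Debian'
  $service_name = $facts['os']['family'] ? {
    'RedHat' => 'ntpd',
    'Debian' => 'ntp',
    default  => fail("Unsupported OS family: ${facts['os']['family']}"),
  }

  package { 'ntp':
    ensure => present,
  }

  service { $service_name:
    ensure  => running,
    enable  => true,
    require => Package['ntp'],
  }
}
```

Failing compilation for unknown OS families is a deliberate choice in this sketch: an explicit error is easier to spot than a service resource that silently targets the wrong name.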

PuppetDB stores exported resources as well as facts, catalogs and reports, and makes them available for queries.
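Exported resources are where PuppetDB becomes essential. In this sketch (the class name and tag are hypothetical), every node exports an /etc/hosts entry for itself, and every node applying the class collects all exported entries with that tag, so hosts learn about each other without hand-maintained host lists:

```puppet
# Hypothetical class: share /etc/hosts entries between nodes via PuppetDB
class shared_hosts {
  # '@@' exports the resource to PuppetDB instead of applying it locally
  @@host { $facts['networking']['fqdn']:
    ip           => $facts['networking']['ip'],
    host_aliases => [$facts['networking']['hostname']],
    tag          => 'shared_hosts',
  }

  # The collector '<<| |>>' pulls in all matching exported resources,
  # including this node's own entry, and applies them locally
  Host <<| tag == 'shared_hosts' |>>
}
```

A node only sees entries that other nodes have already exported during their own runs, so a new host shows up everywhere once it has run and the other nodes have run again.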

Infrastructure Requirements and Options

So at a minimum, there must be one Puppet server, plus the agent on each managed host, which also installs Facter and other required components. PuppetDB should also be used. But where does the source code for the catalog live, and what about the variables that override class defaults? How are they applied to the server? Is a single server a good idea? How is the code developed and tested?

The answers depend on the model used in your environment, typically either Hiera or an external node classifier (ENC). Both can also be combined, but that quickly becomes complex and should be avoided.

Option 1 - Hiera

As mentioned before, classes contain the functionality and logic to build the desired state. While we can always write this as static code (see the ntp example above), this is often not sufficient. Sometimes you want the desired state to be a specific package version, not just present. So let's change the code a little:

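class ntp (
  # Defaults; Hiera or an ENC can override these per host
  String                     $package_ensure = 'present',
  Enum['running', 'stopped'] $service_ensure = 'running',
) {
  package { 'ntp':
    ensure => $package_ensure,
  }

  service { 'ntpd':
    ensure  => $service_ensure,
    enable  => true,
    require => Package['ntp'],
  }
}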

We introduced a parameter for each "ensure" and set a default. When the catalog is compiled, Puppet looks up a value for each parameter instead of relying on static code. If we provide a different value for the host in question for some or all of the parameters, that value overrides the default. Suddenly you can have a different outcome per host without touching the Puppet code. For this to work, the value needs a place to live, which can be an External Node Classifier (ENC, see below) or, here, Hiera.
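On the consuming side, nothing else changes. The following sketch, assuming the parameterized ntp class above, shows how the parameter values get resolved. The precedence is: values given in a resource-like declaration, then data found by automatic lookup (keys of the form ntp::package_ensure), then the class defaults. The two declaration styles are alternatives, so the resource-like one is commented out here:

```puppet
# Include-like declaration: each parameter is resolved via automatic lookup
# of 'ntp::package_ensure' and 'ntp::service_ensure', falling back to the
# class defaults when no data is found
include ntp

# Alternative (not in addition to the include above): resource-like
# declaration; values set here win over Hiera data for those parameters
# class { 'ntp':
#   service_ensure => 'stopped',
# }

# Explicit lookup of the same key elsewhere in your code, with a type check,
# 'first' merge behavior (use the first value found) and a default value
$ntp_package = lookup('ntp::package_ensure', String, 'first', 'present')
notice("ntp package ensure resolves to ${ntp_package}")
```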

Hiera is a YAML-based key-value lookup system for configuration data, and it is built into Puppet. All you need is to configure a hiera.yaml file, which defines where to look for keys and their values. These typically live in a hierarchy of YAML files (hence "Hiera"), so the main hiera.yaml needs to define that hierarchy. Hiera also supports encrypted eyaml files (via the hiera-eyaml backend) for sensitive data like passwords, which is very useful and important for keeping your Puppet infrastructure secure.

Structuring data

The data files themselves contain only key-value pairs, whose values can be strings, booleans or arrays (as well as hashes), stored in YAML. While many of the common values live in common.yaml, there might be host-specific ones based on the FQDN, so we might keep a nodes/example.node.fqdn.yaml in a subfolder to keep things structured.

This would result in the following example files (always include the file extensions):
common.yaml:

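ntp::package_ensure: present
ntp::service_ensure: running

nodes/example.node.fqdn.yaml:

ntp::package_ensure: '4.2.6p5-28'
ntp::service_ensure: stopped

The keys follow the pattern <class name>::<parameter name>, which is how Puppet's automatic parameter lookup binds Hiera data to class parameters.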

As a result, all hosts will by default have ntp installed, regardless of version, with the ntpd daemon running. The node example.node.fqdn, however, gets a very specific package version installed, and its daemon will be stopped. If anyone manually updates the package to the latest version and/or starts the service, Puppet will revert to this particular package version and stop the service again, all using the same Puppet code.

Option 2 - Foreman as ENC

While using Puppet with Hiera is probably the most common approach, it certainly is not the only option. External node classifiers are a very good way to structure your node data differently, often backed by a database. I personally prefer this approach, using Foreman as the ENC. Foreman also provides node management, reporting and alerting, which is very handy, and it comes with its own GUI, which in my opinion is much better than the Puppet console. From there you allocate classes to nodes and set key-value overrides for class defaults per FQDN, domain, OS or host group. You can always add your own criteria, but usually these suffice.

Figure: Foreman GUI showing the status of the Puppet infrastructure

Foreman imports the Puppet classes directly from the Puppet server(s) via foreman-proxy and stores them in a relational database like PostgreSQL. The structured data is added via the GUI or CLI and kept in the same database, which allows Foreman to compile and provide the YAML node data that Puppet needs for each host. Foreman also tracks changes to the code and the structured data in the same database.

Foreman adds many other features, like user roles and host management, including provisioning hosts from scratch via PXE boot using templates, and remotely managing cloud environments or vSphere nodes.

The next screen shows how default values are overridden in Foreman:

Figure: overriding default values based on FQDN

As with Hiera, we use the same Puppet code with parameters. The code needs to be imported into Foreman through foreman-proxy so that Foreman is aware of the parameters. We should never have to adjust the Puppet code itself.

Conclusion

Advantages of using Hiera for your Puppet infrastructure are:

  • data is structured as key-value pairs in YAML format, which is easy to read.
  • structuring encrypted data in eyaml format is easy and secure.
  • the data can (and should) be versioned, e.g. in its own git repository, so changes can be tracked easily and reverted if needed.
  • no additional nodes are required to host the data; it all lives on the Puppet server(s).

Advantages of using Foreman as ENC:

  • GUI-driven management
  • CLI also available for CI/CD pipelines
  • many valuable features, including management and grouping of hosts and users, alerting, and reporting of facts and trends
  • relational database for code and parameters

The YAML data that Foreman provides, however, is plain and unencrypted. The connection between Puppet server and agent, on the other hand, is always TLS-encrypted. Unless you decide to autosign new hosts (not recommended), you should be fine.

It is possible to combine Hiera and an ENC, but that often leads to extremely complex scenarios. Troubleshooting can be cumbersome, especially when you receive a phone call in the middle of the night because of a failure. It can also lower performance significantly.

Which option is better for you really depends on the requirements of your company, and probably even more on the skills and expertise of your staff.
