Variable precedence
As you learned in the previous section, there are a few major types of variables, and they can be defined in many locations. This leads to a very important question: what happens when the same variable name is used in multiple locations? Ansible loads variable data in a defined precedence order, and that order decides which definition will "win". Overriding variable values is an advanced use of Ansible, so it is important to fully understand the semantics before attempting it.
Precedence order
Ansible defines the precedence order as follows:
- Extra vars (from command line) always win
- Connection variables defined in inventory
- Most everything else
- Rest of the variables defined in inventory
- Facts discovered about a system
- Role defaults
This list is a useful starting point; however, things are a bit more nuanced, as we will explore.
Extra-vars
Extra-vars, as supplied on the command line, override anything else. Regardless of where else a variable might be defined, even if it's explicitly set in a play with set_fact, the value provided on the command line will be the value used.
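As a minimal sketch of this behavior, the following playbook sets a variable with set_fact and then prints it. The file name extra_vars_demo.yaml and the variable name greeting are assumptions made purely for illustration.

```yaml
---
# extra_vars_demo.yaml (hypothetical file name)
# Run with an extra var to see the override:
#   ansible-playbook extra_vars_demo.yaml -e "greeting=from_cli"
- hosts: localhost
  connection: local
  gather_facts: false
  tasks:
    - name: Attempt to set the variable inside the play
      set_fact:
        greeting: from_set_fact

    - name: Show which value is in effect
      debug:
        var: greeting
```

When run with the -e option shown in the comment, the debug task should report the command-line value; without it, the value from set_fact should be reported.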
Connection variables
Next up are connection variables, the behavioral variables outlined earlier. These variables influence how Ansible connects to a system and executes tasks on it; examples include ansible_ssh_user, ansible_ssh_host, and others described in the earlier section on behavioral inventory parameters. The Ansible documentation states that these come from the inventory; however, they can be overridden by tasks such as set_fact. A set_fact task on a variable such as ansible_ssh_user will override the value that came from the inventory source. There is a precedence order within the inventory as well. Host-specific definitions override group definitions, and child group definitions override parent group definitions. This makes it possible to set a value that applies to most hosts in a group and override it on the specific hosts that need a different value. When a host belongs to multiple groups and each group defines the same variable with different values, the behavior is less well defined, and relying on it is strongly discouraged.
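The host-over-group rule can be sketched with a pair of inventory variable files. The group name webservers, the host name web03, and both user names are hypothetical and chosen only for this illustration.

```yaml
---
# group_vars/webservers (hypothetical group name)
# Applies to every host in the webservers group.
ansible_ssh_user: deploy
---
# host_vars/web03 (hypothetical host name)
# Overrides the group value for this one host only.
ansible_ssh_user: legacy_admin
```

With these two files in place, every host in webservers would be reached as deploy except web03, which would be reached as legacy_admin.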
Most everything else
The "most everything else" block is a big grouping of sources. These include:
- Command line switches
- Play variables
- Task variables
- Role variables (not defaults)
These sets of variables can override each other as well, with the rule being that the last supplied variable wins. The role variables in this set refer to the variables provided in a role's vars/main.yaml file and the variables defined when assigning a role or a role dependency. In this example, we will provide a variable named role_var at the time we assign the role:
- role: example_role
  role_var: var_value_here
An important nuance here is that a definition provided at role assignment time will override the definition within a role's vars/main.yaml file. Also remember the last-supplied rule: if, within the role example_role, the role_var variable is redefined via a task, that definition will win from that point on.
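To make the first nuance concrete, here is a sketch of the two definitions side by side. The playbook name site.yaml and the value value_from_role_vars are assumptions for illustration; example_role and role_var come from the text above.

```yaml
---
# roles/example_role/vars/main.yaml
role_var: value_from_role_vars
---
# site.yaml (hypothetical playbook name)
- hosts: all
  roles:
    # The value given at assignment time is expected to take effect
    # instead of the one in the role's vars/main.yaml file.
    - role: example_role
      role_var: var_value_here
```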
The rest of the inventory variables
The next lower set of variables is the remaining inventory variables. These are variables that can be defined within the inventory data, but do not alter the behavior of Ansible. The same host-over-group and child-over-parent rules described for connection variables apply here.
Facts discovered about a system
Discovered fact variables are the variables we get when gathering facts. The exact list of variables depends on the platform of the host and on any extra software installed on that host that can be executed to display system information. Outside of role defaults, these are the lowest level of variables and are the most likely to be overridden.
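As a brief sketch, discovered facts can be referenced like any other variable once fact gathering has run. The two facts used here, ansible_hostname and ansible_distribution, are commonly available, but as noted above the exact set depends on the host.

```yaml
---
- hosts: all
  gather_facts: true
  tasks:
    - name: Show two commonly available facts
      debug:
        msg: "{{ ansible_hostname }} runs {{ ansible_distribution }}"
```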
Role defaults
Roles can have default variables defined within them. These are reasonable defaults for use within the role, and they are the values a user is expected to customize when applying the role. This makes roles much more reusable, flexible, and tunable to the environment and conditions in which the role will be applied.
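A minimal sketch of role defaults and their customization might look like the following. The variable names example_port and example_log_level are hypothetical; example_role is the role name used earlier in this section.

```yaml
---
# roles/example_role/defaults/main.yaml
# Lowest-precedence values; any other definition of these names overrides them.
example_port: 8080
example_log_level: info
---
# A play that keeps the default port but overrides the log level
- hosts: all
  roles:
    - role: example_role
      example_log_level: debug
```

Because defaults sit at the bottom of the precedence order, the same override could equally come from inventory variables or extra-vars without any change to the role itself.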
Merging hashes
In the previous section, we focused on the order of precedence in which variables will override each other. The default behavior of Ansible is that any overriding definition for a variable name will completely mask the previous definition of that variable. However, that behavior can be altered for one type of variable, the hash. A hash variable (a "dictionary" in Python terms) is a dataset of keys and values. Values can be of different types for each key, and can even be hashes themselves for complex data structures.
In some advanced scenarios, it is desirable to replace just one bit of a hash or add to an existing hash rather than replacing the hash altogether. To unlock this ability, a configuration change is necessary in an Ansible config file. The config entry is hash_behaviour (note the spelling), which takes one of two values: replace or merge. A setting of merge will instruct Ansible to merge or blend the values of two hashes when presented with an override scenario rather than the default of replace, which will completely replace the old variable data with the new data.
Let's walk through an example of the two behaviors. We will start with a hash loaded with data and simulate a scenario where a different value for the hash is provided as a higher priority variable.
Starting data:
hash_var:
  fred:
    home: Seattle
    transport: Bicycle
New data loaded via include_vars:
hash_var:
  fred:
    transport: Bus
With the default behavior, the new value for hash_var will be:
hash_var:
  fred:
    transport: Bus
However, if we enable the merge behavior we would get the following result:
hash_var:
  fred:
    home: Seattle
    transport: Bus
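The scenario above could be reproduced with a small playbook along the following lines. This is a sketch only: the file names new_data.yaml and merge_demo.yaml are hypothetical, and defining the starting data as a play variable is an assumption made to keep the example self-contained.

```yaml
---
# new_data.yaml (hypothetical file name): the overriding definition
hash_var:
  fred:
    transport: Bus
---
# merge_demo.yaml (hypothetical file name)
- hosts: localhost
  connection: local
  gather_facts: false
  vars:
    # Starting data
    hash_var:
      fred:
        home: Seattle
        transport: Bicycle
  tasks:
    - name: Load the overriding definition
      include_vars: new_data.yaml

    - name: Print the resulting hash
      debug:
        var: hash_var
```

Running the playbook once with the default configuration and once with merge enabled should show the two results described above.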
There are even more nuances and undefined behaviors when using merge, and as such, it is strongly recommended to only use this setting if absolutely needed.