Understanding SoT
We’ve already mentioned SoT a few times. It’s finally time to dive in. Let’s start by talking about data. We’ll do that through the lens of making a change on the network.
Let’s assume that you want to turn up a new port that’s going to terminate a connection to a new building. If you look at other similar configurations on the same device, you’re going to find a configuration similar to this:
interface vlan100
 description Routed Interface for connection to off campus house
 ip address 10.1.100.1/24
interface GigabitEthernet4/1
 description connects to och-sw-01 GigabitEthernet1/1 (off campus house)
 switchport
 switchport access vlan 100
vlan 100
 name off_campus_house
Is there any other way to configure the same interface? Could we have used a routed port? Could we have configured a trunk instead? A different prefix? Sure, these are all valid possibilities. The point is that you are going to have your own standards, and they will drive your new configuration. When adopting a SoT approach, we need to decouple data from configuration syntax.
For example, the standard configuration you copy and paste becomes your template, and you extract the data from it. The data is any input that varies from one change to the next and is needed to derive the configuration. In this example, the data is as follows:
- SVI interface: 100
- SVI description: Routed interface for connection to off-campus house
- SVI IP address: 10.1.100.1/24
- Physical interface: GigabitEthernet4/1
- Physical interface description: Connects to och-sw-01 GigabitEthernet1/1
- VLAN ID: 100
In reality, the SVI interface and the IP address could be removed from the data inputs since they can be derived from the VLAN ID. We’ll see that soon. The descriptions can also be auto-generated if a use case or description of the project is defined. Let’s look at a few examples of this data represented as YAML:
Note
Teaching YAML and Jinja2 is outside the scope of this book.
svi_interface: 100
svi_description: Routed Interface for connection to off campus house
svi_ip_address: 10.1.100.1/24
physical_interface: GigabitEthernet4/1
physical_interface_description: connects to och-sw-01 GigabitEthernet1/1 (off campus house)
vlan_id: 100
You may opt to nest some data, like this:
svi:
  interface: 100
  description: Routed Interface for connection to off campus house
  ip_address: 10.1.100.1/24
physical_interface:
  name: GigabitEthernet4/1
  description: connects to och-sw-01 GigabitEthernet1/1 (off campus house)
vlan_id: 100
Going one step further, a few values could be eliminated if there is more logic in your Jinja2 template. This one also adds data for the remote peer:
physical_interface: GigabitEthernet4/1
vlan_id: 100
connection:
  description: Routed Interface for connection to off campus house
  remote_peer: och-sw-01
  remote_interface: GigabitEthernet1/1
Finally, a Jinja template that could consume this data and render a configuration snippet would look like this (focused on one of the devices):
interface vlan{{ vlan_id }}
 description {{ connection['description'] }}
 ip address 10.1.{{ vlan_id }}.1/24
interface {{ physical_interface }}
 description connects to {{ connection['remote_peer'] }} {{ connection['remote_interface'] }}
 switchport
 switchport access vlan {{ vlan_id }}
vlan {{ vlan_id }}
 name {{ connection['description'] }}
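To make the separation of data and syntax concrete, here is a minimal sketch in plain Python that renders the same snippet from the same data. It uses only the standard library as a stand-in for Jinja2. The function names `derive_svi_ip` and `render_config` are hypothetical, and the 10.1.<vlan_id>.1/24 addressing rule is simply the one implied by the template above.

```python
import ipaddress

# Data for one change (mirrors the last YAML example).
data = {
    "physical_interface": "GigabitEthernet4/1",
    "vlan_id": 100,
    "connection": {
        "description": "Routed Interface for connection to off campus house",
        "remote_peer": "och-sw-01",
        "remote_interface": "GigabitEthernet1/1",
    },
}


def derive_svi_ip(vlan_id: int) -> str:
    """Derive the SVI address from the VLAN ID using the 10.1.<vlan_id>.1/24 rule."""
    # The VLAN ID becomes the third octet, so it has to fit in one octet.
    if not 1 <= vlan_id <= 255:
        raise ValueError("this addressing rule only covers VLAN IDs 1-255")
    return str(ipaddress.ip_interface(f"10.1.{vlan_id}.1/24"))


def render_config(data: dict) -> str:
    """Render the configuration snippet; only the data changes between runs."""
    conn = data["connection"]
    vlan_id = data["vlan_id"]
    lines = [
        f"interface vlan{vlan_id}",
        f" description {conn['description']}",
        f" ip address {derive_svi_ip(vlan_id)}",
        f"interface {data['physical_interface']}",
        f" description connects to {conn['remote_peer']} {conn['remote_interface']}",
        " switchport",
        f" switchport access vlan {vlan_id}",
        f"vlan {vlan_id}",
        f" name {conn['description']}",
    ]
    return "\n".join(lines)


print(render_config(data))
```

Changing `vlan_id` or the remote peer in `data` produces a different configuration without touching the rendering logic, which is the same property the YAML and Jinja2 pairing provides.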
Defining SoT
Having looked at a few different ways to represent data, the main point is that we have successfully decoupled data, shown as YAML, from syntax, shown as a Jinja template. The templates are built or defined by those who own the standards, whereas the data is what needs to be created or updated for any given change. Focusing on the data keeps attention on the change itself, without getting pulled into syntactical details that vary per vendor.
This data is now the SoT (technically, the SoT would be the file that contains the data).
With our focus on the data, now come the real questions to ask:
- Why did we pick GigabitEthernet4/1?
- Why was VLAN 100 chosen?
- Why was 10.1.100.1 chosen?
- How did we construct the interface descriptions?
It would be fairly common to check one or more spreadsheets to get this data, but it’s more likely that you just knew because you’re good at what you do and you checked the devices and connections that you most recently deployed.
The idea of a SoT is that it allows you to plan and focus on what should be. A SoT defines the desired state. With a SoT, users manage the data that’s used for upcoming changes, which is then programmatically accessed by automation tools during a change. The automation tools access the data, render a network configuration, and then ensure that configuration exists on the network. On your SoT journey, you should be able to build a document that defines one tool as the authoritative source per type of data – for example, ASNs, VLANs, and so on.
Due to the breadth of network data required to manage a production network, more than one system is often used as an authoritative source of information to build a configuration. For example, one database might be used for inventory and IP addresses, and another for the policies used to build ACLs. The authoritative source of data is the location where updates are made. This is also often referred to as a system of record (SoR). It’s worth calling out that SoT and SoR are often used interchangeably:
Figure 1.1 – Visualizing SoR, SoT, and SSoT
Generally speaking, a SoT is a system that stores data from one or more SoRs. However, given how often SoR and SoT are used interchangeably, the term Single Source of Truth (SSoT) is often used for a system that aggregates data from multiple SoRs. This type of system allows relationships to be formed between these datasets and also provides one unified API that can be used to access all network data. Accessing this data through a single API significantly lowers the amount of work required by your automation tooling. In Appendix 2, we review working with multiple SoTs, take a deep dive into the Nautobot SSoT application, and discuss other designs used for managing network data.
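The aggregation idea can be sketched in a few lines of Python. This is only an illustration of relating records from two systems of record by device name. It is not how the Nautobot SSoT application works, and the record layouts, site values, and the `aggregate` function are all hypothetical.

```python
# Hypothetical records held in two separate systems of record (SoRs).
inventory_sor = {
    "nyc-sw-01": {"site": "nyc", "role": "access"},
    "och-sw-01": {"site": "och", "role": "access"},
}
ipam_sor = {
    "nyc-sw-01": {"vlan100": "10.1.100.1/24"},
}


def aggregate(inventory: dict, ipam: dict) -> dict:
    """Build one unified view per device that relates inventory and IP data."""
    unified = {}
    for name, attrs in inventory.items():
        # Devices with no IPAM record get an empty mapping rather than an error.
        unified[name] = {**attrs, "ip_addresses": dict(ipam.get(name, {}))}
    return unified


ssot = aggregate(inventory_sor, ipam_sor)

# Automation now reads one structure instead of querying each system.
print(ssot["nyc-sw-01"])
```

Each SoR remains the place where its own data is updated. The aggregated view is read-only from the perspective of the automation that consumes it.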
Approaches to SoT
The previous section described the purist view and the most correct approach to understanding a SoT. It is based on the premise that the SoT always contains the intended state. This means that as a user, you change the data and then perform your change using that data. Of course, using automation to fetch the data is the ideal state, but even if you were using it as a documentation store, it’s a step in the right direction. The gap in this approach is that the SoT does not always reflect the actual state of the network (maybe a user makes a manual change because they don’t like automation or they are just fixing something quickly). There should be tooling built around the SoT in this approach that compares the SoT and the actual network. This provides assurance and compliance that the network is operating as expected.
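A comparison between the SoT and the actual network can be sketched as a simple diff of two dictionaries. This is a minimal illustration under stated assumptions: the data structures and the `find_drift` function are hypothetical, and real tooling would first have to collect and normalize the actual state from devices.

```python
def find_drift(intended: dict, actual: dict) -> dict:
    """Return every key whose intended and actual values differ.

    A value of None in the report means the key is absent on that side.
    """
    drift = {}
    for key in sorted(intended.keys() | actual.keys()):
        want = intended.get(key)
        have = actual.get(key)
        if want != have:
            drift[key] = {"intended": want, "actual": have}
    return drift


# Intended state, as it might be stored in the SoT (hypothetical structure).
intended_state = {
    "GigabitEthernet4/1": {"mode": "access", "vlan": 100},
}

# Actual state, as it might be collected from the device (hypothetical values).
actual_state = {
    "GigabitEthernet4/1": {"mode": "access", "vlan": 200},
    "GigabitEthernet4/2": {"mode": "access", "vlan": 300},
}

for name, diff in find_drift(intended_state, actual_state).items():
    print(f"{name}: intended={diff['intended']} actual={diff['actual']}")
```

Here the first interface has drifted from its intended VLAN, and the second exists on the device but not in the SoT, which is the kind of manual change described above.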
Note
Based on the network technology deployed or your preference, another approach is also possible when implementing a SoT. The alternative is to ensure the SoT reflects what exists on the network. This approach may be used as a one-time event to populate the SoT with its initial data. This may seem a little confusing because it goes against the purist view of SoT, but we think it is worth calling out because it reflects reality.
With the growth of NetDevOps over the past few years, one common place to start with a SoT is to define data in a YAML file and version it in a Git repository. The YAML data is the intended state. That data gets rendered with one or more templates to generate the intended configuration, which is later deployed to the network. This approach provides peer review (through pull and merge requests) on the data before being merged and later deployed and also enables users to run automated tests with CI on the data providing even more assurances the data is good. This approach of defining the data first and having that drive automation is what data-driven network automation is all about.
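An automated test on the data can be as small as the following sketch, which a CI job could run before a merge. It assumes the flat key names from the first YAML example earlier in this section, and the `validate` function is hypothetical. In practice, the dictionary would be loaded from the YAML file (for instance with PyYAML’s `yaml.safe_load`), but it is written inline here to keep the example self-contained.

```python
import ipaddress


def validate(data: dict) -> list:
    """Return a list of human-readable problems; an empty list means the data passed."""
    errors = []

    vlan_id = data.get("vlan_id")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(vlan_id, bool) or not isinstance(vlan_id, int) or not 1 <= vlan_id <= 4094:
        errors.append(f"vlan_id must be an integer from 1 to 4094, got {vlan_id!r}")

    address = data.get("svi_ip_address")
    if not isinstance(address, str):
        errors.append("svi_ip_address must be a string such as 10.1.100.1/24")
    else:
        try:
            ipaddress.ip_interface(address)
        except ValueError:
            errors.append(f"svi_ip_address is not a valid address: {address!r}")

    if not data.get("physical_interface"):
        errors.append("physical_interface is required")

    return errors


good = {
    "vlan_id": 100,
    "svi_ip_address": "10.1.100.1/24",
    "physical_interface": "GigabitEthernet4/1",
}
bad = {
    "vlan_id": 5000,
    "svi_ip_address": "10.1.300.1/24",
    "physical_interface": "GigabitEthernet4/1",
}

print(validate(good))
for problem in validate(bad):
    print(problem)
```

Checks like these catch typos in the data during peer review, before any template is rendered or any device is touched.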
Due to the plethora of technologies that exist today from SDN and cloud-native networking, networks are not always planned – they may be dynamic. There may be auto-scaling or dynamic policies. In these types of environments, you may prefer to see the actual state in one place. This is also possible by using a SoT. With this approach, the SoT is more analogous to a discovery engine, but for configuration data.
It is also possible to employ a hybrid approach. This would mean certain data in the SoT is authoritative and drives the intent of the network, and other data shows what exists in certain domain managers, controllers, or clouds. The general assumption here would be that the data added via controllers or the cloud is authoritative and what is intended to be configured.
Overall, it’s always worth remembering that not all purist points of view and ideals can be implemented in a network that has been evolving for 25 years. We need to take a pragmatic approach, but it is important to recognize proper definitions and terminology to ensure everyone embarking on their SoT journey is on the same page.
Keeping the purist view in mind allows us to see the relationship between network data and network automation, given the data is ultimately at the center and driving network automation. The beautiful thing about data-driven network automation is that it allows us to start thinking about abstractions and the level of intent at which we want to describe the network.
Even in this book, we’re talking about lower-level data, which leads to lower-level intent. However, once you’ve embraced data, it is possible to build abstractions around design. Consider the earlier example at a higher level of intent:
connection:
    source:
        device: nyc-sw-01
        interface: GigabitEthernet4/1
    destination:
        device: och-sw-01
        interface: GigabitEthernet1/1
    type: off_campus
In this example YAML data file, you’ll notice off_campus defined as a type. This was not used in the prior example. With logic in your templates and automation, the right data will be generated and then populated in the SoT based on the standard off_campus designs for both required devices. You could go one step further and not even choose the devices and let the automation tell you the ports to use on particular devices that have capacity. This will take time, but it starts with repeatable standards (few to no snowflakes) and data, meaning it starts with SoT.
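The expansion from higher-level intent to lower-level data can be sketched as follows. This is an illustration only: the `DESIGNS` lookup, its contents, and the `expand_connection` function are hypothetical, and the VLAN value simply reuses the one from the earlier off-campus example.

```python
# Hypothetical design standards keyed by connection type.
DESIGNS = {
    "off_campus": {"vlan_id": 100},
}

# The higher-level intent from the YAML example above.
intent = {
    "connection": {
        "source": {"device": "nyc-sw-01", "interface": "GigabitEthernet4/1"},
        "destination": {"device": "och-sw-01", "interface": "GigabitEthernet1/1"},
        "type": "off_campus",
    }
}


def expand_connection(connection: dict) -> dict:
    """Generate per-device, lower-level data for both ends of a connection."""
    design = DESIGNS[connection["type"]]
    src, dst = connection["source"], connection["destination"]
    per_device = {}
    # Each end of the link sees the other end as its remote peer.
    for local, remote in ((src, dst), (dst, src)):
        per_device[local["device"]] = {
            "physical_interface": local["interface"],
            "vlan_id": design["vlan_id"],
            "connection": {
                "remote_peer": remote["device"],
                "remote_interface": remote["interface"],
            },
        }
    return per_device


for device, data in expand_connection(intent["connection"]).items():
    print(device, data)
```

The output for each device has the same shape as the lower-level data shown earlier, so it could be stored in the SoT and fed to the same templates.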
SoT tools and products
After learning more about SoT and the role of network data in network automation, we’re ready to look at SoT tools and products. The fact is that there are not many tools that focus on network data specifically for network automation. Let’s look at some tools that may be used in building out an overarching SoT strategy. Some are more common than others:
- Nautobot: It should be obvious and is likely the reason you’re reading this book, but we believe Nautobot is the SoT for networking. With native models, extensibility, and a framework in place for aggregating data to and from other data sources, it is becoming the de facto standard for enterprises adopting a SoT for network automation. Nautobot is an open source project sponsored by Network to Code. Network to Code’s mission is to continue to drive network automation around the world, one network at a time.
- YAML files: Usually playing a part in almost every network automation journey, they provide a solid path to getting started and understanding data-driven network automation. In Chapter 6, we’ll look at integrating YAML files stored in a Git repository directly into Nautobot – showing that with the click of a button, those files and data can be pulled directly into Nautobot.
- NetBox: The motivation for Nautobot, NetBox is a solution that models and documents modern networks. NetBox is an open source project sponsored by NetBox Labs. Nautobot forked NetBox when NetBox was at v2.10 and has continued to diverge (as a hard fork (https://producingoss.com/en/forks.html#:~:text=Hard%20forks%20(also%20sometimes%20called,line%20with%20their%20own%20vision)) since February 2021.
- Configuration management databases (CMDBs): More often than not, CMDBs are part of a greater ITSM strategy, with products such as ServiceNow and BMC Remedy. These tools may be used as the SoT for inventory or general asset management but are usually not used to model network configuration data due to a lack of data models, a lack of skills, and the fact that the teams who own them are often disconnected from the network teams. These tools are often built on auto-discovery engines, with a general trend toward showing what is rather than the intended state.
- Device42: This is usually adopted for data center infrastructure management (DCIM) with a focus on inventory, data center design, rack layouts, and IPAM with automated discovery. Similar to CMDBs, there is a focus on auto-discovery with a general trend toward showing what is rather than the intended state, and it is usually not used to model actual network configurations such as routing, interfaces, and more, or to power network automation solutions.
- Infoblox and BlueCat: Arguably the most widely deployed IPAM solutions, their focus is on IPAM. They also have discovery capabilities. They have some SoT branding and marketing, but usually, it’s on discovering IPs versus defining the intent of IP schemes and having that drive automation.
These are just a select few tools that exist on the market and are being used by network teams. What we believe, and the premise for creating this book, is that Nautobot has grown immensely over the past 2 years and fills a gap in the market as an enterprise network SoT catered specifically for network automation. Through the remainder of this book, we hope you’ll see what Nautobot has to offer and how it can act as the SoT and nucleus to power your data-driven network automation stack on your network automation journey.
Finally, let’s dive into Nautobot.