ACI uses intra-fabric messaging (IFM) to communicate between the different nodes. IFM uses TCP packets, which are secured by 1024-bit SSL encryption, and the keys are stored on secure storage. The Cisco Manufacturing Certificate Authority (CMCA) signs the keys.
Issues with IFM can prevent fabric nodes from communicating and from joining the fabric. We will cover this in greater depth in the SSL Troubleshooting recipe in Chapter 9, Troubleshooting ACI, but we can look at the output of the checks on a healthy system:
apic1# netstat -ant | grep :12
tcp 0 0 10.0.0.1:12151 0.0.0.0:* LISTEN
tcp 0 0 10.0.0.1:12215 0.0.0.0:* LISTEN
tcp 0 0 10.0.0.1:12471 0.0.0.0:* LISTEN
tcp 0 0 10.0.0.1:12279 0.0.0.0:* LISTEN
<truncated>
tcp 0 0 10.0.0.1:12567 10.0.248.29:49187 ESTABLISHED
tcp 0 0 10.0.0.1:12343 10.0.248.30:45965 ESTABLISHED
tcp 0 0 10.0.0.1:12343 10.0.248.31:47784 ESTABLISHED
tcp 0 0 10.0.0.1:12343 10.0.248.29:49942 ESTABLISHED
tcp 0 0 10.0.0.1:12343 10.0.248.30:42946 ESTABLISHED
tcp 0 0 10.0.0.1:50820 10.0.248.31:12439 ESTABLISHED
apic1# openssl s_client -state -connect 10.0.0.1:12151
CONNECTED(00000003)
SSL_connect:before/connect initialization
SSL_connect:SSLv2/v3 write client hello A
SSL_connect:SSLv3 read server hello A
depth=1 O = Cisco Systems, CN = Cisco Manufacturing CA
verify error:num=19:self signed certificate in certificate chain
verify return:0
SSL_connect:SSLv3 read server certificate A
SSL_connect:SSLv3 read server key exchange A
SSL_connect:SSLv3 read server certificate request A
SSL_connect:SSLv3 read server done A
SSL_connect:SSLv3 write client certificate A
SSL_connect:SSLv3 write client key exchange A
SSL_connect:SSLv3 write change cipher spec A
SSL_connect:SSLv3 write finished A
SSL_connect:SSLv3 flush data
SSL3 alert read:fatal:handshake failure
SSL_connect:failed in SSLv3 read server session ticket A
139682023904936:error:14094410:SSL routines:SSL3_READ_BYTES:sslv3 alert handshake failure:s3_pkt.c:1300:SSL alert number 40
139682023904936:error:140790E5:SSL routines:SSL23_WRITE:ssl handshake failure:s23_lib.c:177:
---
Certificate chain
0 s:/CN=serialNumber=PID:APIC-SERVER-L1 SN:TEP-1-1, CN=TEP-1-1
i:/O=Cisco Systems/CN=Cisco Manufacturing CA
1 s:/O=Cisco Systems/CN=Cisco Manufacturing CA
i:/O=Cisco Systems/CN=Cisco Manufacturing CA
---
Server certificate
-----BEGIN CERTIFICATE-----
<truncated>
-----END CERTIFICATE-----
subject=/CN=serialNumber=PID:APIC-SERVER-L1 SN:TEP-1-1, CN=TEP-1-1
issuer=/O=Cisco Systems/CN=Cisco Manufacturing CA
---
No client certificate CA names sent
---
SSL handshake has read 2171 bytes and written 210 bytes
---
New, TLSv1/SSLv3, Cipher is DHE-RSA-AES256-GCM-SHA384
Server public key is 2048 bit
Secure Renegotiation IS supported
Compression: zlib compression
Expansion: NONE
SSL-Session:
Protocol : TLSv1.2
Cipher : DHE-RSA-AES256-GCM-SHA384
Session-ID:
Session-ID-ctx:
Master-Key: 419BF5E19D0A02AA0D40BDF380E8E959A4F27371A87EFAD1B
Key-Arg : None
PSK identity: None
PSK identity hint: None
SRP username: None
Compression: 1 (zlib compression)
Start Time: 1481059783
Timeout : 300 (sec)
Verify return code: 19 (self signed certificate in certificate chain)
---
apic1#
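If you want to check the netstat output programmatically rather than by eye, a small script can count the sockets by state. The following is a minimal sketch, not an official tool: the function name summarize_ifm_sockets and the idea of selecting local ports that begin with 12 are assumptions for illustration, taken from the grep pattern used above. Unlike grep :12, it looks only at the local port, so a session whose remote end is a 12xxx port is not counted.

```python
from collections import Counter


def summarize_ifm_sockets(netstat_text, port_prefix="12"):
    """Count sockets per TCP state for local ports starting with port_prefix.

    Hypothetical helper: the 12xxx port selection mirrors the grep used
    in the text and is an assumption, not a documented port list.
    """
    counts = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        # Expected layout: proto recv-q send-q local foreign state
        if len(fields) != 6 or fields[0] != "tcp":
            continue
        local_port = fields[3].rsplit(":", 1)[-1]
        if local_port.startswith(port_prefix):
            counts[fields[5]] += 1
    return counts


# A few lines copied from the netstat output shown above.
sample = """\
tcp 0 0 10.0.0.1:12151 0.0.0.0:* LISTEN
tcp 0 0 10.0.0.1:12215 0.0.0.0:* LISTEN
tcp 0 0 10.0.0.1:12567 10.0.248.29:49187 ESTABLISHED
tcp 0 0 10.0.0.1:12343 10.0.248.30:45965 ESTABLISHED
tcp 0 0 10.0.0.1:50820 10.0.248.31:12439 ESTABLISHED
"""

summary = summarize_ifm_sockets(sample)
print(dict(summary))
```

A result with no ESTABLISHED entries would be a hint that no fabric node currently has a session to the APIC on these ports.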
IFM is essential to the success of the discovery process. A fabric node is considered active only when the APIC and the node can exchange heartbeats through IFM. IFM remains necessary once nodes are active, because the APIC also uses it to push policies to the fabric leaf nodes.
The fabric discovery process has three stages and uses IFM, LLDP (Link Layer Discovery Protocol), DHCP (Dynamic Host Configuration Protocol), and TEPs (tunnel endpoints):
- Stage 1: The leaf node that is directly connected to the APIC is discovered.
- Stage 2: A second discovery brings in any spines connected to the initial "seed" leaf.
- Stage 3: The other leaf nodes and the other APICs in the cluster are discovered.
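One way to picture the order in which nodes join is as an outward walk from the first APIC: the directly attached leaf, then the spines behind it, then everything else. The sketch below models only that ordering on a made-up topology. It does not reproduce how ACI performs discovery (no LLDP, DHCP, or IFM is involved), and the node names and the discovery_stages function are hypothetical.

```python
def discovery_stages(links, apic):
    """Group nodes by the order in which a walk from `apic` reaches them.

    links: dict mapping a node name to the set of its directly cabled
    neighbours. Returns (seed_leaves, spines, remaining), each sorted.
    """
    seen = {apic}
    frontier = {apic}
    levels = []
    while frontier:
        reached = set()
        for node in frontier:
            reached |= links.get(node, set())
        reached -= seen
        if reached:
            levels.append(sorted(reached))
        seen |= reached
        frontier = reached
    seed_leaves = levels[0] if levels else []
    spines = levels[1] if len(levels) > 1 else []
    # Everything further out (other leaves, other APICs) is grouped together.
    remaining = sorted(n for level in levels[2:] for n in level)
    return seed_leaves, spines, remaining


# Hypothetical two-spine, two-leaf fabric with two APICs.
links = {
    "apic1": {"leaf101"},
    "leaf101": {"apic1", "spine201", "spine202"},
    "spine201": {"leaf101", "leaf102"},
    "spine202": {"leaf101", "leaf102"},
    "leaf102": {"spine201", "spine202", "apic2"},
    "apic2": {"leaf102"},
}

seed, spines, rest = discovery_stages(links, "apic1")
print(seed, spines, rest)
```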
The process can be visualized as follows:
The node can transition through a number of different states during the discovery process:
- Unknown: Node discovered but no node ID policy configured
- Undiscovered: Node ID configured but not yet discovered
- Discovering: Node discovered but no IP address assigned
- Unsupported: Node is not a supported model
- Disabled: Node has been decommissioned
- Inactive: No IP connectivity
- Active: Node is active
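When scripting against these states, it can help to keep the list above as a lookup table. This is a minimal sketch: the descriptions are copied from the list, but the lowercase spellings used as keys are an assumption about how the CLI prints each state, and describe_state is a hypothetical helper.

```python
# Descriptions taken from the list above; key spellings are assumed.
NODE_STATES = {
    "unknown": "Node discovered but no node ID policy configured",
    "undiscovered": "Node ID configured but not yet discovered",
    "discovering": "Node discovered but no IP address assigned",
    "unsupported": "Node is not a supported model",
    "disabled": "Node has been decommissioned",
    "inactive": "No IP connectivity",
    "active": "Node is active",
}


def describe_state(state):
    """Return the description for a state name, ignoring case and padding."""
    return NODE_STATES.get(state.strip().lower(), "unrecognized state")


print(describe_state("Unknown"))
```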
Using the acidiag fnvread command, you can see the current state. In the following command output, the leaf node is in the unknown state (note that I have removed the final column in the output, which was LastUpdMsg, the value of which was 0):
apic1# acidiag fnvread
ID Pod ID Name Serial Number IP Address Role State
---------------------------------------------------------------------
0 0 TEP-1-101 0.0.0.0 unknown unknown
Total 1 nodes
apic1#
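The same output can be read by a script if you need to watch for nodes that have not yet become active. The sketch below is an illustration under stated assumptions: it expects the layout shown above with the LastUpdMsg column already removed, so it reads the IP address, role, and state from the last three fields of each row. It does not try to separate the name and serial number columns, because either can be empty. The function name parse_fnvread_states is hypothetical.

```python
def parse_fnvread_states(output):
    """Extract id, ip, role and state from fnvread-style rows.

    Assumes the final column is State (LastUpdMsg removed, as in the
    listing above). Rows that do not start with a numeric ID are skipped.
    """
    rows = []
    for line in output.splitlines():
        fields = line.split()
        if len(fields) < 4 or not fields[0].isdigit():
            continue
        rows.append({
            "id": int(fields[0]),
            "ip": fields[-3],
            "role": fields[-2],
            "state": fields[-1],
        })
    return rows


# The listing from the text, pasted as-is.
sample = """\
apic1# acidiag fnvread
ID Pod ID Name Serial Number IP Address Role State
---------------------------------------------------------------------
0 0 TEP-1-101 0.0.0.0 unknown unknown
Total 1 nodes
"""

rows = parse_fnvread_states(sample)
not_active = [r for r in rows if r["state"] != "active"]
print(not_active)
```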
During fabric registration and initialization, a port may transition to an out-of-service state. In this state, the only traffic permitted is DHCP and CDP or LLDP. There can be a number of reasons for a port entering this state, but they generally come down to human error, such as incorrect cabling or LLDP not being enabled; again, these are covered in the Layer-2 troubleshooting recipe in Chapter 9, Troubleshooting ACI.
There are a couple of ways in which we can check the health of our controllers and nodes. We can use the CLI to check LLDP (show lldp neighbors), or we can use the GUI (System | Controllers | Node | Cluster as Seen by Node):
This shows us the APIC, and we can look at our leaf nodes from the Fabric menu. In the code output from acidiag fnvread, we saw a node named TEP-1-101. This is a leaf node, as we can see from the GUI (Fabric | Inventory | Fabric Membership):
We will look at the GUI in the next section.