This article by Muhammad Zeeshan Munir, author of the book VMware vSphere Troubleshooting, covers troubleshooting vSphere virtual distributed switches, vSphere standard virtual switches, VLANs, uplinks, DNS, and routing, which are among the core issues a seasoned system engineer has to deal with on a daily basis. It covers all these topics and gives you hands-on, step-by-step instructions to manage and monitor your network resources.
Some of the commands that can be used for network troubleshooting include net-dvs, esxcli network, vicfg-route, vicfg-vmknic, vicfg-dns, vicfg-nics, and vicfg-vswitch.
You can use the net-dvs command to troubleshoot VMware distributed switches (dvSwitches). The command shows all the information regarding the distributed switch configuration. It reads the information from the /etc/vmware/dvsdata.db file and displays all the data in the console. A vSphere host updates its dvsdata.db file every five minutes.
net-dvs
In the preceding screenshot, you can see that the first line represents the UUID of a VMware distributed switch. The second line shows the maximum number of ports a distributed switch can have. The line com.vmware.common.alias = dvswitch-Network-Pools represents the name of a distributed switch. The next line, com.vmware.common.uplinkPorts: dvUplink1 to dvUplinkn, shows the uplink ports a distributed switch has. The distributed switch MTU is set to 1600, and you can see the information about CDP just below it. CDP information can be useful to troubleshoot connectivity issues.
You can see com.vmware.common.respools.list listing the networking resource pools, while com.vmware.common.host.uplinkPorts shows the port numbers assigned to the uplink ports. Further details about each uplink port are then listed by port number. You can also see the port statistics as displayed in the following screenshot. When you perform troubleshooting, these statistics help you check the behavior of the distributed switch and its ports, and diagnose whether data packets are going in and out. As you can see in the following screenshot, all the metrics regarding packet drops are zero. If you find in your troubleshooting that packets are being dropped, the port that reports the drops is the place to start looking for the root cause:
Unfortunately, the net-dvs command is very poorly documented, and it is usually hard to find useful references. Moreover, it is not supported by VMware. However, you can use it with the -h switch to display more options.
Sometimes, the dvsdata.db file of a vSphere host becomes corrupted and you face different types of distributed switch errors, for example, unable to create proxy DVS. In this case, when you try to run the net-dvs command on a vSphere host, it fails with an error as well. As mentioned earlier, the net-dvs command reads data from the /etc/vmware/dvsdata.db file, so it fails because it is unable to read data from the file. Possible causes of the corruption include a network outage, or a vSphere host that was disconnected from vCenter and deleted while it still held the distributed switch information in its cache.
You can resolve this issue by restoring the dvsdata.db file by following these steps:
The esxcli network command is a longtime friend of system administrators and support staff for troubleshooting network-related issues. It is used to examine different network configurations and to troubleshoot problems. You can type esxcli network to quickly see a help reference and the different options that can be used with the command.
Let's walk through some useful esxcli network troubleshooting commands. Type the following command into your vSphere CLI to list all the virtual machines and the networks they are on. You can see that the command returned World ID, virtual machine name, number of ports, and the network:
esxcli network vm list
World ID  Name                                              Num Ports  Networks
--------  ------------------------------------------------  ---------  ---------------
14323012  cluster08_(5fa21117-18f7-427c-84d1-c63922199e05)          1  dvportgroup-372
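When several virtual machines are listed, it can help to pull the World IDs out of this table programmatically so that they can be fed to the next command. The following is a minimal Python sketch, not part of the original article. It assumes the output has been captured as text and that each virtual machine is attached to a single network whose name contains no spaces. The function name parse_vm_list is hypothetical.

```python
def parse_vm_list(text):
    """Parse the table printed by `esxcli network vm list` into dictionaries."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        # Data rows start with a numeric World ID; this skips the header and the dashed rule.
        if len(parts) < 4 or not parts[0].isdigit():
            continue
        rows.append({
            "world_id": int(parts[0]),
            # The name is everything between the World ID and the last two columns.
            "name": " ".join(parts[1:-2]),
            "num_ports": int(parts[-2]),
            # Assumption: one network per VM, with no spaces in its name.
            "networks": parts[-1],
        })
    return rows


# Sample output copied from the listing above.
SAMPLE = """\
World ID Name Num Ports Networks
-------- --------------------------------------------------- --------- ---------------
14323012 cluster08_(5fa21117-18f7-427c-84d1-c63922199e05) 1 dvportgroup-372
"""

vms = parse_vm_list(SAMPLE)
for vm in vms:
    # Print the follow-up command for each virtual machine instead of running it.
    print(f"esxcli network vm port list -w {vm['world_id']}")
```

The sketch only prints the follow-up command; running it against a host is left to whichever shell or remote session you normally use.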
Now use the World ID of a virtual machine returned by the last command to list all the ports the virtual machine is currently using. You can see the virtual switch name, MAC address of the NIC, IP address, and uplink port ID:
esxcli network vm port list -w 14323012
Port ID: 50331662
vSwitch: dvSwitch-Network-Pools
Portgroup: dvportgroup-372
DVPort ID: 1063
MAC Address: 00:50:56:01:00:7e
IP Address: 0.0.0.0
Team Uplink: all(2)
Uplink Port ID: 0
Active Filters:
Type the following command in the CLI to list the statistics of the virtual switch port. Replace the port ID after the -p flag with the one returned by the last command:
esxcli network port stats get -p 50331662
Packet statistics for port 50331662
Packets received: 10787391024
Packets sent: 7661812086
Bytes received: 3048720170788
Bytes sent: 154147668506
Broadcast packets received: 17831672
Broadcast packets sent: 309404
Multicast packets received: 656
Multicast packets sent: 52
Unicast packets received: 10769558696
Unicast packets sent: 7661502630
Receive packets dropped: 92865923
Transmit packets dropped: 0
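Raw counters are easier to judge as a ratio. The following minimal Python sketch, which is not from the original article, parses the name and value lines shown above and expresses the dropped packets as a percentage of the corresponding packet counter. The function names are hypothetical, and the article does not say whether dropped packets are already included in the received counter, so treat the result as an approximation.

```python
def parse_stats(text):
    """Turn 'Name: value' lines from esxcli statistics output into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        value = value.strip()
        # Lines without a colon (such as the title line) or without a number are skipped.
        if sep and value.isdigit():
            stats[name.strip()] = int(value)
    return stats


def drop_percentage(stats, direction):
    """Return dropped packets as a percentage of the matching packet counter.

    direction is "Receive" or "Transmit".
    """
    total_key = "Packets received" if direction == "Receive" else "Packets sent"
    total = stats[total_key]
    dropped = stats[f"{direction} packets dropped"]
    return 100.0 * dropped / total if total else 0.0


# Subset of the sample output shown above.
SAMPLE = """\
Packet statistics for port 50331662
Packets received: 10787391024
Packets sent: 7661812086
Receive packets dropped: 92865923
Transmit packets dropped: 0
"""

stats = parse_stats(SAMPLE)
rx_pct = drop_percentage(stats, "Receive")
tx_pct = drop_percentage(stats, "Transmit")
print(f"receive drops: {rx_pct:.2f}%  transmit drops: {tx_pct:.2f}%")
```

For the sample port, the receive side has dropped roughly 0.86 percent of the packets it received, while the transmit side has dropped none, which suggests looking at the receive path first.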
Type the following command to list the complete statistics of a physical network adapter of the vSphere host, in this case vmnic0:
esxcli network nic stats get -n vmnic0
NIC statistics for vmnic0
Packets received: 2969343419
Packets sent: 155331621
Bytes received: 2264469102098
Bytes sent: 46007679331
Receive packets dropped: 0
Transmit packets dropped: 0
Total receive errors: 78507
Receive length errors: 0
Receive over errors: 22
Receive CRC errors: 0
Receive frame errors: 0
Receive FIFO errors: 78485
Receive missed errors: 0
Total transmit errors: 0
Transmit aborted errors: 0
Transmit carrier errors: 0
Transmit FIFO errors: 0
Transmit heartbeat errors: 0
Transmit window errors: 0
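With this many counters, the nonzero ones are easy to miss. The following is a minimal Python sketch, not from the original article, that filters the output above down to the error and drop counters that are greater than zero. The function name nonzero_problem_counters is hypothetical, and the sketch assumes the output has been captured as plain text.

```python
def nonzero_problem_counters(text):
    """Return the error and drop counters from NIC statistics that are above zero."""
    problems = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        name, value = name.strip(), value.strip()
        if not sep or not value.isdigit():
            continue  # title line or anything that is not 'Name: number'
        is_problem_counter = "error" in name.lower() or "dropped" in name.lower()
        if is_problem_counter and int(value) > 0:
            problems[name] = int(value)
    return problems


# Sample output copied from the listing above.
SAMPLE = """\
NIC statistics for vmnic0
Packets received: 2969343419
Packets sent: 155331621
Bytes received: 2264469102098
Bytes sent: 46007679331
Receive packets dropped: 0
Transmit packets dropped: 0
Total receive errors: 78507
Receive length errors: 0
Receive over errors: 22
Receive CRC errors: 0
Receive frame errors: 0
Receive FIFO errors: 78485
Receive missed errors: 0
Total transmit errors: 0
Transmit aborted errors: 0
Transmit carrier errors: 0
Transmit FIFO errors: 0
Transmit heartbeat errors: 0
Transmit window errors: 0
"""

problems = nonzero_problem_counters(SAMPLE)
for name, count in problems.items():
    print(f"{name}: {count}")
```

In this sample, the receive FIFO errors (78485) and receive over errors (22) add up to the total receive errors (78507), so the receive side of vmnic0 is where to look.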
A complete reference of the esxcli network command can be found at https://goo.gl/9OMbVU.
All the vicfg-* commands are very helpful and easy to use. I encourage you to learn them in order to make your life easier. Here are some of the vicfg-* commands relevant to network troubleshooting:
We will use the vicfg-nics command to manage physical network adapters of vSphere hosts. The vicfg-nics command can also be used to set up the speed, VMkernel name for the uplink adapters, duplex setting, driver information, and link state information of the NIC.
Connect to your vMA appliance console and set up the target vSphere host:
vifptarget --set crimv3esx001.linxsol.com
List all the network cards available in the vSphere host. See the following screenshot for the output:
vicfg-nics -l
You can see that my vSphere host has five network cards from vmnic0 to vmnic5. You are able to see the PCI and driver information. The link state for all the network cards is up. You can also see two types of network card speeds: 1000 Mbps and 9000 Mbps. The Description field shows the card name, and the MTU and MAC address of each network card are listed as well. You can set up a network card to auto-negotiate as follows:
vicfg-nics --auto vmnic0
Now let's set the speed of vmnic0 to 1000 and its duplex settings to full:
vicfg-nics --duplex full --speed 1000 vmnic0
The last command we will discuss in this article is vicfg-vswitch. The vicfg-vswitch command is a very powerful command that can be used to manipulate the day-to-day operations of a virtual switch. I will show you how to create and configure port groups and virtual switches.
In the vMA appliance, set the vSphere host whose virtual switch information you want to view as the target:
vifptarget --set crimv3esx001.linxsol.com
Type the following command to list all the information about the switches the vSphere host has. You can see the command output in the screenshot that follows:
vicfg-vswitch -l
You can see that the vSphere host has one virtual switch and two virtual NICs carrying traffic for the management network and for vMotion. The virtual switch has 128 ports, and 7 of them are in use. There are two uplinks to the switch with the MTU set to 1500, while two VLANs are being used: one for the management network and one for the vMotion traffic. You can also see three distributed switches named OpenStack, dvSwitch-External-Networks, and dvSwitch-Network-Pools.
Prefixing the distributed switch name with dv is a common practice, and it can help you to easily recognize a distributed switch.
I will go through adding a new virtual switch:
vicfg-vswitch --add vSwitch002
This creates a virtual switch with 128 ports and an MTU of 1500. You can use the --mtu flag to specify a different MTU. Now add the uplink adapter vmnic0 to the newly created virtual switch vSwitch002:
vicfg-vswitch --link vmnic0 vSwitch002
To add a port group to the virtual switch, use the following command:
vicfg-vswitch --add-pg portgroup002 vSwitch002
Now add an uplink adapter to the port group:
vicfg-vswitch --add-pg-uplink vmnic0 --pg portgroup002 vSwitch002
We have discussed all the commands to create a virtual switch and its port groups and to add uplinks. Now we will see how to delete and edit the configuration of a virtual switch. An uplink NIC can be removed from a port group using the -N flag (--del-pg-uplink in its long form). Remove vmnic0 from portgroup002:
vicfg-vswitch --del-pg-uplink vmnic0 --pg portgroup002 vSwitch002
You can delete the recently created port group as follows:
vicfg-vswitch --del-pg portgroup002 vSwitch002
To delete a switch, you first need to remove the uplink adapter from the virtual switch. Use the -U flag (--unlink in its long form), which unlinks the uplink from the switch:
vicfg-vswitch --unlink vmnic0 vSwitch002
You can delete a virtual switch using the -d flag (--delete in its long form). Here is how you do it:
vicfg-vswitch --delete vSwitch002
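The order of these commands matters: uplinks and port groups are removed before the switch itself is deleted. The following Python sketch, which is not from the original article, builds the command sequences as strings so that they can be reviewed before being run from the vMA. It uses only the vicfg-vswitch options shown above. The function names are hypothetical, and nothing is executed.

```python
def create_commands(vswitch, portgroup, uplink):
    """Commands that create a switch, link an uplink, and add a port group, in order."""
    return [
        f"vicfg-vswitch --add {vswitch}",
        f"vicfg-vswitch --link {uplink} {vswitch}",
        f"vicfg-vswitch --add-pg {portgroup} {vswitch}",
        f"vicfg-vswitch --add-pg-uplink {uplink} --pg {portgroup} {vswitch}",
    ]


def teardown_commands(vswitch, portgroup, uplink):
    """Commands that undo create_commands; the switch is deleted last."""
    return [
        f"vicfg-vswitch --del-pg-uplink {uplink} --pg {portgroup} {vswitch}",
        f"vicfg-vswitch --del-pg {portgroup} {vswitch}",
        f"vicfg-vswitch --unlink {uplink} {vswitch}",
        f"vicfg-vswitch --delete {vswitch}",
    ]


setup = create_commands("vSwitch002", "portgroup002", "vmnic0")
teardown = teardown_commands("vSwitch002", "portgroup002", "vmnic0")

# Dry run: print the commands for review rather than executing them.
for command in setup + teardown:
    print(command)
```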
You can check the Cisco Discovery Protocol (CDP) settings by using the --get-cdp flag with the vicfg-vswitch command. The following command reports that CDP is in the listen state, which indicates that the vSphere host is configured to receive CDP information from the physical switch:
vi-admin@vma:~[crimv3esx001.linxsol.com]> vicfg-vswitch --get-cdp vSwitch0
listen
You can set the CDP mode for the vSphere host to down, listen, advertise, or both. In listen mode, the vSphere host detects and displays the information it receives from a Cisco switch port, but the information about the vSwitch cannot be seen by the Cisco device. In advertise mode, the vSphere host does not detect and display information about the Cisco switch; instead, it publishes information about its vSwitch to the Cisco switch. In both mode, the host listens and advertises at the same time. The following command sets vSwitch0 to both:
vicfg-vswitch --set-cdp both vSwitch0
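The CDP modes can be summarized as a small lookup table. The following Python sketch is an illustration only and not part of the original article. It records, for each mode accepted by --set-cdp (including the both value used in the command above), whether the host listens for CDP information and whether it advertises its vSwitch. The helper name switch_can_see_vswitch is hypothetical.

```python
# For each CDP mode: does the host receive CDP data, and does it send its own?
CDP_MODES = {
    "down": {"listens": False, "advertises": False},
    "listen": {"listens": True, "advertises": False},
    "advertise": {"listens": False, "advertises": True},
    "both": {"listens": True, "advertises": True},
}


def switch_can_see_vswitch(mode):
    """True when the physical Cisco switch receives information about the vSwitch."""
    return CDP_MODES[mode]["advertises"]


visible_modes = [mode for mode in CDP_MODES if switch_can_see_vswitch(mode)]
print("Modes in which the Cisco switch sees the vSwitch:", ", ".join(visible_modes))
```

If the network team reports that the host does not show up in the CDP neighbor table of the physical switch, a mode of down or listen would explain it.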
Virtual LANs, or VLANs, are used to separate the physical switching segment into different logical switching segments in order to segregate the broadcast domains. VLANs not only provide network segmentation but also give us a method of effective network management. They also increase the overall network security, and nowadays they are very commonly used in infrastructure. If they are not set up correctly, your vSphere host can lose connectivity, and you can face some very common problems where you are unable to ping or resolve the host names anymore. Some common errors are Destination host unreachable and Connection failed. A Private VLAN (PVLAN) is an extended version of a VLAN that divides a logical broadcast domain into further segments and forms private groups. PVLANs are divided into primary and secondary PVLANs.
The primary PVLAN is the original VLAN that is divided into smaller segments, and it hosts all the secondary PVLANs within it. Secondary PVLANs live within the primary PVLAN, and each secondary PVLAN is identified by the VLAN ID linked to it. Just like in ordinary VLANs, the packets that travel within secondary PVLANs are tagged with their associated IDs. The physical switch then recognizes whether the packets are tagged as isolated, community, or promiscuous.
As network troubleshooting involves taking care of many different aspects, one aspect you will come across in the troubleshooting cycle is troubleshooting VLANs. vSphere Enterprise Plus licensing is a requirement to connect a host using a virtual distributed switch and VLANs. You can see the three different network segments in the following screenshot. VLAN A connects all the virtual machines on different vSphere hosts; VLAN B is responsible for carrying management network traffic; and VLAN C is responsible for carrying vMotion-related traffic. In order to create PVLANs on your vSphere host, you also need the support of a physical switch:
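The three secondary PVLAN types differ in which ports may talk to each other. The article does not spell the rules out, so the following Python sketch applies the standard PVLAN semantics: promiscuous ports reach every port, community ports reach ports in the same community and promiscuous ports, and isolated ports reach only promiscuous ports. The function name and the VLAN IDs are hypothetical, and all ports are assumed to be in the same primary PVLAN.

```python
def can_communicate(port_a, port_b):
    """Decide whether two ports in the same primary PVLAN can exchange traffic.

    Each port is a (pvlan_type, secondary_vlan_id) tuple, where pvlan_type is
    "promiscuous", "community", or "isolated".
    """
    type_a, vlan_a = port_a
    type_b, vlan_b = port_b
    # A promiscuous port can talk to every port in the primary PVLAN.
    if type_a == "promiscuous" or type_b == "promiscuous":
        return True
    # Community ports can talk to each other only within the same community.
    if type_a == "community" and type_b == "community":
        return vlan_a == vlan_b
    # Isolated ports reach only promiscuous ports, not each other or communities.
    return False


# Hypothetical secondary VLAN IDs used purely for illustration.
router = ("promiscuous", 100)
web1 = ("community", 101)
web2 = ("community", 101)
db1 = ("community", 102)
guest1 = ("isolated", 103)
guest2 = ("isolated", 103)

print(can_communicate(web1, web2))      # same community
print(can_communicate(web1, db1))       # different communities
print(can_communicate(guest1, guest2))  # two isolated ports
print(can_communicate(guest1, router))  # isolated to promiscuous
```

When a ping between two virtual machines fails on a PVLAN, checking their port types against these rules shows whether the failure is expected behavior or a fault.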
For detailed information about the vSphere network, refer to the VMware official networking guide for vSphere 5.5 at http://goo.gl/SYySFL.
The first and most important step to troubleshooting your VLAN problem is to look into the VLAN configuration of your vSphere host. You should always start by verifying it. Let's walk through how to verify the network configuration of the management network and VLAN configuration from the vSphere client:
The following are the steps for verifying the VLAN configuration from the CLI:
esxcfg-vswitch -l
vicfg-vswitch -l
Switch Name         Num Ports  Used Ports  Configured Ports  MTU   Uplinks
vSwitch0            128        7           128               1500  vmnic3,vmnic2
PortGroup Name      VLAN ID    Used Ports  Uplinks
vMotion             2231       1           vmnic3,vmnic2
Management Network  2230       1           vmnic3,vmnic2
---Omitted output---
Switch Name         Num Ports  Used Ports  Configured Ports  MTU   Uplinks
vSwitch0            128        7           128               1500  vmnic2,vmnic3
PortGroup Name      VLAN ID    Used Ports  Uplinks
vMotion             2231       1           vmnic2,vmnic3
Management Network  2230       1           vmnic3,vmnic2
---Omitted output---
esxcfg-vswitch -v 2233 -p "Management Network" vSwitch0
vicfg-vswitch --vlan 2233 --pg "Management Network" vSwitch0
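Checking VLAN IDs by eye becomes error-prone on hosts with many port groups. The following minimal Python sketch, not from the original article, parses the port group section of the switch listing shown above and compares each VLAN ID with the value you expect. The function names and the expected values are hypothetical, and the sketch assumes that every port group row has a non-empty Uplinks column.

```python
def parse_portgroup_vlans(text):
    """Map port group names to VLAN IDs from `vicfg-vswitch -l` style output."""
    vlans = {}
    in_portgroups = False
    for line in text.splitlines():
        if line.startswith("PortGroup Name"):
            in_portgroups = True
            continue
        if line.startswith("Switch Name") or line.startswith("-"):
            in_portgroups = False
            continue
        parts = line.split()
        # Row layout: <name, possibly with spaces> <VLAN ID> <used ports> <uplinks>
        if in_portgroups and len(parts) >= 4 and parts[-3].isdigit() and parts[-2].isdigit():
            vlans[" ".join(parts[:-3])] = int(parts[-3])
    return vlans


def vlan_mismatches(actual, expected):
    """Return {port group: (expected, actual)} for every VLAN ID that differs."""
    return {
        name: (vlan, actual.get(name))
        for name, vlan in expected.items()
        if actual.get(name) != vlan
    }


# Sample output copied from the listing above.
SAMPLE = """\
Switch Name Num Ports Used Ports Configured Ports MTU Uplinks
vSwitch0 128 7 128 1500 vmnic3,vmnic2
PortGroup Name VLAN ID Used Ports Uplinks
vMotion 2231 1 vmnic3,vmnic2
Management Network 2230 1 vmnic3,vmnic2
"""

# Hypothetical expected values: the management network should be on VLAN 2233.
EXPECTED = {"vMotion": 2231, "Management Network": 2233}

actual = parse_portgroup_vlans(SAMPLE)
mismatches = vlan_mismatches(actual, EXPECTED)
for name, (want, have) in mismatches.items():
    print(f"{name}: expected VLAN {want}, found {have}")
```

With the sample listing, only the Management Network port group is reported, which is the port group that the two commands above move to VLAN 2233.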
Verifying information about VLANs from the PowerCLI is fairly simple. Type the following command into the console after connecting with vCenter using Connect-VIServer:
Get-VirtualPortGroup -VMHost crimv3esx001.linxsol.com | select Name, VirtualSwitch, VLanId
Name                VirtualSwitch  VLanId
----                -------------  ------
vMotion             vSwitch0       2231
Management Network  vSwitch0       2233
When you have configured PVLANs or secondary PVLANs in your vSphere infrastructure, you may arrive at a situation where you need to troubleshoot them. This topic will provide you some tips to obtain and view information about PVLANs and secondary PVLANs, as follows:
Whenever you are troubleshooting, virtual-machine-to-virtual-machine testing is very important. It helps you to isolate the problem domain to a smaller scope. When performing virtual-machine-to-virtual-machine testing, you should always start by moving the virtual machines to a single vSphere host. You can then start troubleshooting the network using basic commands, such as ping. If ping works on a single host, move the virtual machines to different hosts and test again. If ping fails at that point, the cause is most likely the configuration of a physical switch or a mismatched physical trunk configuration. The most common problem in this scenario is a problematic physical switch configuration.
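The isolation procedure above can be written down as a small decision function. The following Python sketch is an illustration only and not part of the original article. The function name and the wording of the results are hypothetical, and the branch for a failing same-host ping is an inference: when both virtual machines are on one host and one virtual switch, their traffic does not need to cross the physical network, so the virtual side is the first suspect.

```python
def likely_problem_domain(same_host_ping_ok, cross_host_ping_ok=None):
    """Suggest where to look next, based on virtual-machine-to-virtual-machine ping tests.

    same_host_ping_ok:  result of ping with both VMs on a single vSphere host.
    cross_host_ping_ok: result with the VMs on different hosts, or None if not tested yet.
    """
    if not same_host_ping_ok:
        # Inference, not stated in the article: check the virtual side first.
        return "virtual: check the guest network settings, port group, and VLAN ID"
    if cross_host_ping_ok is None:
        return "next step: move the VMs to different hosts and test again"
    if cross_host_ping_ok:
        return "no fault found between these virtual machines"
    return "physical: check the physical switch and trunk configuration"


print(likely_problem_domain(True))
print(likely_problem_domain(True, False))
```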
In this section, we will see how to troubleshoot VMkernel interfaces:
You should know how to use these commands to test if everything is working. You should be able to ping to ensure connectivity exists.
We will use the vicfg-vmknic command to configure vSphere VMkernel NICs. Let's create a new VMkernel NIC in a vSphere host using the following steps:
vicfg-vmknic -h crimv3esx001.linxsol.com --add --ip 10.2.0.10 -n 255.255.255.0 'portgroup01'
You can enable vMotion using the vicfg-vmknic command as follows:
vicfg-vmknic --enable-vmotion 'portgroup01'
You will not be able to enable vMotion from ESXCLI. vMotion migrates your virtual machines with zero downtime.
vicfg-vmknic -h crimv3esx001.linxsol.com --delete 'portgroup01'
vicfg-vmknic -l
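Before pinging a VMkernel interface, it is worth confirming that the source and destination are in the same subnet, because a ping across subnets also depends on routing. The following minimal Python sketch, not from the original article, uses the standard ipaddress module with the IP address and netmask from the --add command above. The peer addresses and the function name same_subnet are hypothetical.

```python
import ipaddress

# IP address and netmask taken from the vicfg-vmknic --add command above.
vmk = ipaddress.ip_interface("10.2.0.10/255.255.255.0")


def same_subnet(interface, peer):
    """True when the peer address is inside the interface's network."""
    return ipaddress.ip_address(peer) in interface.network


print("VMkernel network:", vmk.network)

# Hypothetical peers used purely for illustration.
for peer in ("10.2.0.20", "10.2.1.20"):
    verdict = "same subnet" if same_subnet(vmk, peer) else "different subnet, check routing"
    print(f"{peer}: {verdict}")
```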
When you successfully install vSphere, the first yellow screen that you see is called the vSphere DCUI. The DCUI is a frontend management system that helps you perform some basic system administration tasks. It also offers the best way to troubleshoot some problems that may be difficult to troubleshoot through vMA, vCLI, or PowerCLI. Further, it is very useful when your host becomes unresponsive from vCenter or is not accessible from any of the management tools.
Some useful tasks that can be performed using the DCUI are as follows:
The vSphere host automatically assigns the first network card available to the system for the management network. Moreover, the default installation of the vSphere host does not let you set up VLAN tags until the VMkernel has been loaded. Verifying network connectivity from the DCUI is important but easy. To do so, follow these steps:
You can also verify the settings of your management network from the DCUI.
As you can see in the preceding screenshot, you can also configure the IP address and DNS settings for your vSphere host. You can also use DCUI to configure VLANs and DNS Suffix for your vSphere host.
In this article, we took a deep dive into the troubleshooting commands and some of the monitoring tools used to monitor network performance.
Having several platforms from which to execute these commands lets you choose the one that suits the scope of the problem. For example, to troubleshoot a single vSphere host, you may like to use esxcli, but for a group of vSphere hosts, you would rather automate the tasks with scripts from PowerCLI or from a vMA appliance.