Gathering hardware information
As a matter of principle, most people will suggest that all system information can be categorized as either hardware-based or software-based. This approach certainly simplifies things, but throughout the course of this chapter I will show that there are instances in which the interplay of hardware and software is itself the reason for the issue at hand.
So, before you begin troubleshooting a system, consider that gathering information about it is the recommended way to gain additional insight and familiarity. Look at it this way: gathering hardware information is not strictly required, but an investigation of this type may assist you in the search for an eventual diagnosis.
We will start by running a simple CPU-based hardware report with the following command:
# cat /proc/cpuinfo
As you will see, this command outputs all information related to the CPU, including the model, family, architecture, cache, and much more. Reading from /proc is a long-standing approach, but the following command is generally considered to be better practice and is far easier to use:
# lscpu
This command will query the system and output all relevant information associated with the CPU in the following manner:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 2
On-line CPU(s) list: 0,1
Thread(s) per core: 1
Core(s) per socket: 2
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
...
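As a cross-check, the CPU(s) figure reported by lscpu should normally agree with the number of processor entries in /proc/cpuinfo. The following is a minimal sketch, assuming the x86_64 layout in which each logical CPU is introduced by a line beginning with "processor"; the function name count_cpus is hypothetical and not part of any standard tool.

```shell
#!/bin/sh
# count_cpus [FILE]: count the logical processors listed in a cpuinfo-style file.
# FILE defaults to /proc/cpuinfo. The function name is illustrative only.
count_cpus() {
    # Assumption: every logical CPU starts with a line beginning "processor".
    grep -c '^processor' "${1:-/proc/cpuinfo}"
}

# Example on a live system (compare the result with the CPU(s) line of lscpu):
#   count_cpus
```

If the two figures disagree, that in itself is worth recording before you continue.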
On the other hand, rather than querying absolutely everything, you can filter the output with grep (a subject that we will return to a little later in this chapter) in order to obtain only the pertinent information, like this:
# lscpu | grep op-mode
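One limitation of a plain grep is that a pattern such as CPU(s) matches several lines at once (CPU(s), On-line CPU(s) list, and so on). The sketch below shows one possible way to pull out a single value by matching the key exactly. It assumes the "Key: value" layout shown above, and lscpu_field is a hypothetical helper name rather than an existing command.

```shell
#!/bin/sh
# lscpu_field NAME: read lscpu-style "Key: value" lines on stdin and print
# the value whose key equals NAME exactly. Illustrative helper only.
lscpu_field() {
    awk -F':' -v key="$1" '
        {
            k = $1
            sub(/^[ \t]+/, "", k)        # tolerate indented keys
        }
        k == key {
            sub(/^[^:]*:[ \t]*/, "")     # drop the key, the colon and padding
            print
            exit
        }'
}

# Example on a live system:
#   lscpu | lscpu_field "CPU(s)"
```

Matching on the whole key keeps the result to a single line, which is convenient when you are recording values for future reference.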
So, having done this and recorded the results for future reference, we will now continue our investigation by running a simple hardware report with the lspci command in the following way:
# lspci
This command may output something similar to the following:
00:00.0 Host bridge: Intel Corporation 82P965/G965 Memory Controller Hub (rev 02)
00:01.0 PCI bridge: Intel Corporation 82G35 Express PCI Express Root Port (rev 02)
00:05.0 Ethernet controller: Red Hat, Inc Virtio network device
00:0a.0 PCI bridge: Digital Equipment Corporation DECchip 21150
00:0e.0 RAM memory: Red Hat, Inc Virtio memory balloon
00:1d.0 USB controller: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family) USB UHCI #1 (rev 02)
00:1d.7 USB controller: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family) USB2 EHCI Controller (rev 02)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev f2)
00:1f.0 ISA bridge: Intel Corporation 82801HB/HR (ICH8/R) LPC Interface Controller (rev 02)
The lspci command provides all the relevant information concerning the PCI devices of your server, which, in turn, can be expanded by employing the -v option or the more verbose -vv and -vvv options, depending on the level of detail you require:
# lspci -v
# lspci -vv
# lspci -vvv
The above commands provide the information you need to confirm whether a device is supported by any of the modules currently installed on your system. You should only need to do this when hardware upgrades have been implemented, when the system has just been installed, or when you are attempting to familiarize yourself with a new environment. However, in order to simplify this exercise even further, you will be glad to know that a "tree view mode" is also available. The purpose of this facility is to output each device's address and show how the devices are connected to the relevant bus.
To do this, type the following command:
# lspci -t
As a troubleshooter, you will be aware that every device must maintain a unique identifier, because CentOS, like all other operating systems, uses that identifier to bind a driver to the device. The lspci command works by scanning the /sys tree for all connected devices, collecting details such as the connection port, the device type, and the class, to name but a few. Having done this, the lspci command then consults /usr/share/hwdata/pci.ids to provide the human-readable entries it displays.
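To illustrate the lookup step, the sketch below resolves a vendor ID to its name in roughly the way described. It assumes the usual pci.ids layout, in which vendor entries start in the first column as a four-digit hexadecimal ID followed by the name, while device entries are indented with tabs. The helper name pci_vendor_name is hypothetical; lspci performs this lookup internally.

```shell
#!/bin/sh
# pci_vendor_name ID [FILE]: print the vendor name for a four-digit hex
# vendor ID. FILE defaults to the pci.ids path mentioned in the text.
# Illustrative helper only.
pci_vendor_name() {
    # Accept "0x8086" (as stored under /sys) or "8086"; normalise to lower case.
    _pvn_id=$(printf '%s' "${1#0x}" | tr 'A-F' 'a-f')
    awk -v id="$_pvn_id" '
        /^#/ || /^[ \t]/ { next }          # skip comments and indented device lines
        ($1 "") == (id "") {               # force a string comparison, not a numeric one
            sub(/^[^ \t]+[ \t]+/, "")      # drop the ID column, keep the name
            print
            exit
        }' "${2:-/usr/share/hwdata/pci.ids}"
}

# Example on a live system, using the vendor file of the first device in the
# lspci listing shown earlier (the 0000:00:00.0 address is an assumption):
#   pci_vendor_name "$(cat /sys/bus/pci/devices/0000:00:00.0/vendor)"
```

The explicit string comparison matters because some hexadecimal IDs, such as 1e00, would otherwise be read by awk as numbers in scientific notation.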
In addition, you can display the kernel driver handling each device, along with the kernel modules capable of handling it, by running the lspci command with the -k option like this:
# lspci -k
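On a server with many devices, it can be quicker to list only those devices that have no driver bound to them. The following is a minimal sketch, assuming the usual lspci -k layout in which each device header starts in the first column and its details (including any "Kernel driver in use:" line) are indented beneath it. The function name pci_without_driver is hypothetical.

```shell
#!/bin/sh
# pci_without_driver: read `lspci -k` output on stdin and print the header
# line of every device that has no "Kernel driver in use:" entry.
# Illustrative helper only.
pci_without_driver() {
    awk '
        /^[0-9a-fA-F]/ {                   # a new device header starts at column 0
            if (dev != "" && !bound) print dev
            dev = $0
            bound = 0
            next
        }
        /Kernel driver in use:/ { bound = 1 }
        END { if (dev != "" && !bound) print dev }'
}

# Example on a live system:
#   lspci -k | pci_without_driver
```

Bear in mind that a device without a bound driver is not necessarily faulty; some devices never need one, so treat the result as a starting point rather than a diagnosis.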
Naturally, during any hardware-based troubleshooting investigation you will want to review the system logs for additional clues, but as we have seen, both the lscpu and lspci commands are particularly useful when attempting to discover more about the hardware present on your system.
You can learn more about these commands by reviewing their respective manual pages at any time:
$ man lscpu
$ man lspci
Meanwhile, if you want to practice more, a simple test would be to insert a USB thumb drive and analyze the findings yourself, paying close attention to the enumeration messages found within /var/log/messages.
Note
Remember, if you do try this, you are looking at how the system reacted once the USB drive was inserted; you are not necessarily looking at the USB drive itself. Information about the drive can be obtained with lsusb.
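If you would like to narrow the log down to the relevant entries for this exercise, a filter along the following lines may help. It is only a sketch, assuming that kernel messages in /var/log/messages carry a "kernel:" tag and that reading the file requires root privileges; the function name usb_events is hypothetical.

```shell
#!/bin/sh
# usb_events [FILE]: print kernel log lines that mention USB activity.
# FILE defaults to /var/log/messages. Illustrative helper only.
usb_events() {
    # Case-insensitive match so that both "usb 1-1:" and "USB Mass Storage" are kept.
    grep -i 'kernel:.*usb' "${1:-/var/log/messages}"
}

# Example on a live system, run as root shortly after inserting the drive:
#   usb_events | tail -n 20
```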
On the other hand, in the same way that we can use grep with lscpu, if you are already feeling comfortable with this type of investigation, you can also use grep with the lspci command to discover more about your RAID controller in the following way:
# lspci | grep -i raid
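When you do not know in advance which keyword to search for, it may help to see which device classes lspci reports before choosing a grep pattern. The sketch below counts devices per class, assuming the plain lspci layout in which the class name sits between the device address and the first ": ". The function name pci_class_summary is hypothetical.

```shell
#!/bin/sh
# pci_class_summary: read plain `lspci` output on stdin and print how many
# devices fall into each class, most frequent first. Illustrative helper only.
pci_class_summary() {
    awk '
        {
            line = $0
            sub(/^[^ ]+ /, "", line)       # drop the address, e.g. "00:1d.0 "
            n = index(line, ": ")          # the class ends at the first ": "
            if (n > 0) count[substr(line, 1, n - 1)]++
        }
        END { for (c in count) printf "%d %s\n", count[c], c }' | sort -k1,1nr -k2
}

# Example on a live system:
#   lspci | pci_class_summary
```

A class such as "RAID bus controller" appearing in the summary would then confirm that the grep shown above has something to match.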
Now, I am sure you will not be surprised to learn that there are many more commands associated with obtaining hardware information. These include (but are not limited to) lsmod, dmidecode, hdparm, df -h, and lsblk, along with the many others that will be mentioned throughout the course of this book. All of them are useful, but for those who do not want to commit them to memory, a significant amount of information can be found by simply reading the files within the /proc and /sys directories like this:
# find /proc | less
# find /sys | less
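Once you have located a file of interest, reading a single value from it is usually straightforward. As a small illustration, the sketch below extracts the total memory figure from /proc/meminfo, assuming the usual "MemTotal: <number> kB" line; the function name mem_total_kib is hypothetical.

```shell
#!/bin/sh
# mem_total_kib [FILE]: print the MemTotal figure (in kB) from a
# meminfo-style file. FILE defaults to /proc/meminfo. Illustrative helper only.
mem_total_kib() {
    awk '$1 == "MemTotal:" { print $2; exit }' "${1:-/proc/meminfo}"
}

# Example on a live system:
#   mem_total_kib
```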
Consequently, and before we move on, you should now be aware that when you are dealing with hardware analysis, perfecting your skills is about practice and exposure to a server over the long term. I say this because a fresh installation tends to expose hardware problems almost immediately, but without that luxury, and as time goes by, it is possible that the hardware will need replacing or servicing. RAID battery packs will fail, memory modules will fail, and, on some occasions, a particular driver may not have fully loaded during the most recent reboot. In this situation, you may find that the kernel floods the logs with seemingly random messages to such an extent that an entirely different issue appears to be causing the problem. So yes, hardware troubleshooting requires a good measure of patience and observation, and it is for this reason that a quick review of both the lscpu and lspci commands has formed our introduction to troubleshooting CentOS 7.