OpenStack is a complex suite of software that can make tracking down issues and faults quite daunting to beginners and experienced system administrators alike. While there is no single approach to troubleshooting systems, understanding where OpenStack logs vital information and what tools are available to help track down bugs will help resolve the issues we may encounter. However, OpenStack, like all software, will have bugs that we are not able to solve ourselves. In that case, gathering the required information so that the OpenStack community can identify bugs and suggest fixes is important in ensuring those bugs or issues are dealt with quickly and efficiently.
Logging is important in all computer systems, but the more complex the system, the more you rely on logging to be able to spot problems and cut down on troubleshooting time. Understanding logging in OpenStack is important to ensure your environment is healthy and you are able to submit relevant log entries back to the community to help fix bugs.
Log in as the root user onto the appropriate servers where the OpenStack services are installed. This makes troubleshooting easier as root privileges are required to view all the logs.
OpenStack produces a large number of logs that help troubleshoot our OpenStack installations. The following details outline where these services write their logs:
Logs for the OpenStack Compute services are written to /var/log/nova/, which is owned by the nova user, by default. To read these, log in as the root user (or use sudo privileges when accessing the files). The following is a list of services and their corresponding logs. Note that not all logs exist on all servers. For example, nova-compute.log exists on your compute hosts only:
nova-compute.log: Log entries regarding the spinning up and running of the instances
nova-network.log: Log entries regarding network state, assignment, routing, and security groups
nova-manage.log: Log entries produced when running the nova-manage command
nova-conductor.log: Log entries regarding services making requests for database information
nova-scheduler.log: Log entries pertaining to the scheduler, its assignment of tasks to nodes, and messages from the queue
nova-api.log: Log entries regarding user interaction with OpenStack as well as messages regarding interaction with other components of OpenStack
nova-cert.log: Entries regarding the nova-cert process
nova-console.log: Details about the nova-console VNC service
nova-consoleauth.log: Authentication details related to the nova-console service
nova-dhcpbridge.log: Network information regarding the dhcpbridge service
OpenStack Dashboard (Horizon) is a web application that runs through Apache by default, so any errors and access details will be in the Apache logs. These can be found in /var/log/apache2/*.log, which will help you understand who is accessing the service as well as the report on any errors seen with the service.
OpenStack Object Storage (Swift) writes logs to syslog by default. On an Ubuntu system, these can be viewed in /var/log/syslog. On other systems, these might be available at /var/log/messages.
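Swift tags its syslog entries with the name of the server that wrote them (proxy-server, object-server, and so on, depending on the log_name setting), so a filter such as the following is a reasonable starting point for pulling the Swift entries out of the system log:
grep -E 'proxy-server|account-server|container-server|object-server' /var/log/syslog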
The OpenStack Block Storage service, Cinder, will produce logs in /var/log/cinder by default. The following list is a breakdown of the log files:
cinder-api.log: Details about the cinder-api service
cinder-scheduler.log: Details related to the operation of the Cinder scheduling service
cinder-volume.log: Log entries related to the Cinder volume service
The OpenStack Identity service, Keystone, writes its logging information to /var/log/keystone/keystone.log. Depending on how you have Keystone configured, the information in this log file can range from very sparse to extremely verbose, including complete plaintext requests.
The OpenStack Image Service, Glance, stores its logs in /var/log/glance/*.log, with a separate log file for each service. The following is a list of the default log files:
api.log: Entries related to the Glance API
registry.log: Log entries related to the Glance registry service. Things like metadata updates and access will be stored here, depending on your logging configuration.
OpenStack Networking Service, formerly Quantum, now Neutron, stores its log files in /var/log/quantum/*.log with a separate log file for each service. The following is a list of the corresponding logs:
dhcp-agent.log: Log entries pertaining to the dhcp-agent
l3-agent.log: Log entries related to the L3 agent and its functionality
metadata-agent.log: Log entries related to requests Quantum has proxied to the Nova metadata service
openvswitch-agent.log: Entries related to the operation of the Open vSwitch agent. When implementing OpenStack Networking, if you use a different plugin, its log file will be named accordingly.
server.log: Details and entries related to the Quantum API service
ovs-vswitchd.log: Details and entries related to the Open vSwitch switch daemon itself (on Ubuntu, this is typically written to /var/log/openvswitch/ rather than /var/log/quantum/)
By default, each OpenStack service has a sane level of logging, with the log level set to WARNING. That is, it will log enough information to tell you the status of the running system, as well as some basic troubleshooting information. However, there will be times when you need to adjust the logging verbosity either up or down to help diagnose an issue or reduce logging noise.
As each service can be configured similarly, we will show you how to make these changes on the OpenStack Compute service.
To do this, log into the box where the OpenStack Compute service is running and execute the following commands:
sudo vim /etc/nova/logging.conf
Change the log levels to DEBUG, INFO, or WARNING for any of the loggers listed in that file:
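For example, the stock logging.conf follows the standard Python logging file format; a representative fragment (your handler and logger names may differ slightly) looks like the following, where raising the nova logger to DEBUG is usually enough:
[logger_root]
level = WARNING
handlers = null

[logger_nova]
# Raised from the default to produce debug output
level = DEBUG
handlers = stderr
qualname = nova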
Other services such as Glance and Keystone currently have their log-level settings within their main configuration files such as /etc/glance/glance-api.conf. Adjust the log levels by altering the following lines to achieve INFO or DEBUG levels:
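For example, in /etc/glance/glance-api.conf (and similarly in /etc/keystone/keystone.conf), the relevant options live in the [DEFAULT] section, where verbose enables INFO-level output and debug enables DEBUG-level output:
[DEFAULT]
# Show more verbose log output (sets INFO log level output)
verbose = True
# Show debugging output in logs (sets DEBUG log level output)
debug = False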
Restart the relevant service to pick up the log-level change.
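For example, on Ubuntu the Compute services can be restarted with the service command; the service names below are just the ones most commonly adjusted:
sudo service nova-api restart
sudo service nova-scheduler restart
sudo service nova-compute restart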
Logging is an important activity in any software, and OpenStack is no different. It allows an administrator to track down problematic activity that can be used in conjunction with the community to help provide a solution. Understanding where the services log, and managing those logs so that problems can be identified quickly and easily, is important.
OpenStack provides tools to check on its services. In this section, we'll show you how to check the operational status of these services. We will also use common system commands to check whether our environment is running as expected.
To check our OpenStack Compute host, we must log into that server, so do this now before following the given steps.
To check that OpenStack Compute is running the required services, we invoke the nova-manage tool and ask it various questions about the environment, as follows:
To check our OpenStack Compute services, issue the following command:
sudo nova-manage service list
You will see an output similar to the following. The :-) indicates that everything is fine.
nova-manage service list
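An illustrative healthy line of output (the host name compute.book matches the failure example later in this section; your hosts and timestamps will differ) looks like this:
nova-compute     compute.book     nova     enabled     :-)     2013-06-18 16:47:35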
The fields are defined as follows:
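Binary: the name of the service binary, for example nova-compute
Host: the host the service is running on
Zone: the availability zone the service belongs to
Status: whether the service is administratively enabled or disabled
State: :-) if the service is checking in, XXX if it is not
Updated_At: when the service last checked in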
If OpenStack Compute has a problem, you will see XXX in place of :-), as shown in the following example:
nova-compute compute.book nova enabled XXX 2013-06-18 16:47:35
If you do see XXX, the answer to the problem will be in the logs at /var/log/nova/.
If you get intermittent XXX and :-) for a service, first check whether the clocks are in sync.
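A quick, if crude, way to compare clocks is to run date on each host over SSH; the host names here are placeholders for your own:
for host in controller compute-01 compute-02; do ssh $host date; done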
The OpenStack Image Service, Glance, while critical to the ability of OpenStack to provision new instances, does not have its own tool to check the status of the service. Instead, we rely on some built-in Linux tools, as follows:
ps -ef | grep glance
netstat -ant | grep "9292.*LISTEN"
These should return process information for Glance, showing that it is running, and confirm that 9292, the default Glance API port, is open and in the LISTEN state, ready for use. The output of these commands will be similar to the following:
ps -ef | grep glance
This should list the glance-api and glance-registry processes, showing that Glance is running.
To check if the correct port is in use, issue the following command:
netstat -ant | grep 9292
The output should include a line like the following:
tcp 0 0 0.0.0.0:9292 0.0.0.0:* LISTEN
Should Glance be having issues while the above services are in working order, you will want to check the following services as well:
sudo rabbitmqctl status
For example, when everything is running correctly, rabbitmqctl status reports the status of the node, including details of memory usage, listeners, and running applications. If RabbitMQ isn't working as expected, the command instead reports that it is unable to connect to the node, indicating that the rabbitmq service or node is down.
ntpq -p
NTP is required for multi-host OpenStack environments, but it may not be installed by default. Install the ntp package with sudo apt-get install -y ntp.
This should return output regarding contacting NTP servers, for example:
PASSWORD=openstack
mysqladmin -uroot -p$PASSWORD status
This will return some basic statistics about MySQL, such as uptime, thread count, and queries per second, if it is running.
Like Glance, the OpenStack Dashboard service, Horizon, does not come with a built-in tool to check its health. However, Horizon relies on the Apache web server to serve pages, so checking the status of the service means checking the health of the web service. To check the health of the Apache web service, log into the server running Horizon and execute the following command:
ps -ef | grep apache
This command should show a number of running Apache worker processes.
To check that Apache is running on the expected port, TCP Port 80, issue the following command:
netstat -ano | grep :80
This command should show the following output:
tcp 0 0 0.0.0.0:80 0.0.0.0:* LISTEN off (0.00/0/0)
To test access to the web server from the command line issue the following command:
telnet localhost 80
This command should show the following output:
Trying 127.0.0.1... Connected to localhost. Escape character is '^]'.
Keystone comes with a client-side tool, the python-keystoneclient. We use this tool to check the status of our Keystone services.
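The keystone client reads its credentials from the environment, so export these first if you have not already done so; the values shown here are placeholders for your own:
# Placeholder values - substitute your own credentials and endpoint
export OS_USERNAME=admin
export OS_PASSWORD=openstack
export OS_TENANT_NAME=cookbook
export OS_AUTH_URL=http://127.0.0.1:5000/v2.0/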
To check that Keystone is running the required services, we invoke the keystone command:
# keystone user-list
This should return a list of the users configured in Keystone, showing each user's ID, name, enabled state, and e-mail address.
Additionally, you can use the following commands to check the status of Keystone. The following command checks the status of the service:
# ps -ef | grep keystone
This should show output similar to the following:
keystone 5441 1 0 Jun20 ? 00:00:04 /usr/bin/python /usr/bin/keystone-all
Next you can check that the service is listening on the network. The following command can be used:
netstat -anlp | grep 5000
This command should show output like the following:
tcp 0 0 0.0.0.0:5000 0.0.0.0:* LISTEN 54421/python
When running the OpenStack Networking service, Neutron, there are a number of services that should be running on the Controller, Compute, and Network nodes. Check them as follows:
On the Controller node, check the Quantum Server API service is running on TCP Port 9696 as follows:
sudo netstat -anlp | grep 9696
The command brings back output like the following:
tcp 0 0 0.0.0.0:9696 0.0.0.0:* LISTEN 22350/python
On the Compute nodes, check the following services are running using the ps command:
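In a typical Open vSwitch based installation, these will include the following (the exact names depend on the plugin and packaging you use):
ovsdb-server
ovs-vswitchd
quantum-plugin-openvswitch-agent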
For example, run the following command:
ps -ef | grep ovsdb-server
On the Network node, check the following services are running:
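Again assuming the Open vSwitch plugin, the Network node will typically be running the following:
ovsdb-server
ovs-vswitchd
quantum-plugin-openvswitch-agent
quantum-dhcp-agent
quantum-l3-agent
quantum-metadata-agent
dnsmasq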
To check our Neutron agents are running correctly, issue the following command from the Controller host when you have the correct OpenStack credentials sourced into your environment:
quantum agent-list
When everything is running correctly, this lists each Neutron agent along with the host it is running on and an alive status of :-).
To check the status of the OpenStack Block Storage service, Cinder, you can use the following commands:
ps -ef | grep cinder
This command should show the running Cinder processes, such as cinder-api, cinder-scheduler, and cinder-volume (depending on which services are installed on the host).
To check that the iSCSI target daemon that exports Cinder volumes is listening on its port, 3260, issue the following command:
netstat -anp | grep 3260
This command produces output like the following:
tcp 0 0 0.0.0.0:3260 0.0.0.0:* LISTEN 10236/tgtd
To check that the Cinder API service is listening on its default port, 8776, issue the following command:
netstat -an | grep 8776
This command produces output like the following:
tcp 0 0 0.0.0.0:8776 0.0.0.0:* LISTEN
Finally, with your OpenStack credentials sourced into your environment, check that the Cinder API responds to a client request:
cinder list
This returns a list of the block storage volumes that have been created (the list may be empty), rather than an error.
The OpenStack Object Storage service, Swift, has a few built-in utilities that allow us to check its health. To do so, log into your Swift node and run the following commands:
swift stat
This returns statistics for the account, including the number of containers, objects, and bytes stored.
There will be a running process for each configured service: the account, container, and object servers.
ps -ef | grep swift
This should show the Swift account, container, and object server processes running.
ps -ef | grep swift-proxy
This should show the swift-proxy-server process running.
netstat -anlp | grep 8080
This should produce output like the following:
tcp 0 0 0.0.0.0:8080 0.0.0.0:* LISTEN 9818/python
We have used some basic commands that communicate with OpenStack services to show they're running. This elementary level of checking helps with troubleshooting our OpenStack environment.
OpenStack Compute services are complex, and being able to diagnose faults is an essential part of ensuring the smooth running of the services. Fortunately, OpenStack Compute provides some tools to help with this process, along with tools provided by Ubuntu to help identify issues.
Troubleshooting OpenStack Compute services can be a complex issue, but working through problems methodically and logically will help you reach a satisfactory outcome. Carry out the following suggested steps when encountering the different problems presented.
If instances are failing to get network connectivity, first check that IP forwarding is enabled on the host running the network service:
sysctl -A | grep ip_forward
This should return the following:
net.ipv4.ip_forward=1
If it does not, enable IP forwarding in /etc/sysctl.conf and reload the settings:
sudo sysctl -p
If IPv6 is not used in your environment, it can be disabled by placing the following line in a file under /etc/modprobe.d/, which prevents the kernel module from loading:
install ipv6 /bin/true
To see the state of your instances, run the following command:
nova list
To view the console output of a particular instance, use the following command:
nova console-log INSTANCE_ID
For example:
nova console-log ee0cb5ca-281f-43e9-bb40-42ffddcb09cd
The console logs are owned by root, so only an administrator can do this. They are placed at /var/lib/nova/instances/<instance_id>/console.log.
If an instance fails to reach the metadata service to download the extra information supplied to it, we can end up in a situation where the instance is up but you're unable to log in, because the SSH key information is injected using this method.
Viewing the console log will show the instance's boot output; when metadata retrieval is failing, you will typically see repeated, failed attempts to reach http://169.254.169.254 (for example, cloud-init timing out) in this output.
If you are not using Neutron, check the following on the host running the nova-network service. First, inspect the NAT table:
sudo iptables -L -n -t nat
We should see a DNAT rule that redirects traffic destined for 169.254.169.254 on port 80 to the nova-api metadata service on port 8775.
ps -ef | grep dnsmasq
This will bring back two process entries, the parent dnsmasq process and a spawned child (verify by the PIDs). If there are any other instances of dnsmasq running, kill the dnsmasq processes. When killed, restart nova-network, which will spawn dnsmasq again without any conflicting processes.
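A quick way to do this, assuming the standard Ubuntu packaging used elsewhere in this article, is:
sudo pkill dnsmasq
sudo service nova-network restart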
If you are using Neutron:
The first place to look is in the /var/log/quantum/metadata_agent.log on the Network host. Here you may see Python stack traces that could indicate a service isn't running correctly. A connection refused message may appear here suggesting the metadata agent running on the Network host is unable to talk to the Metadata service on the Controller host via the Metadata Proxy service (also running on the Network host).
The metadata service runs on port 8775 on our Controller host, so checking that it is running involves checking that the port is open and that the metadata service is listening on it. To do this on the Controller host, run the following:
sudo netstat -antp | grep 8775
This will bring back the following output if everything is OK:
tcp 0 0 0.0.0.0:8775 0.0.0.0:* LISTEN
If nothing is returned, check that the nova-api service is running and if not, start it.
Sometimes, a little patience is needed before assuming that an instance has not booted, because the image may still be copying across the network to a node that has not seen it before. At other times, though, if the instance has been stuck in a booting or similar state for longer than normal, it indicates a problem. The first place to look is for errors in the logs. A quick way of doing this from the controller server is to issue the following command:
sudo nova-manage logs errors
This command brings back any log lines with ERROR as the log level, but you will need to view the logs in more detail to get a clearer picture. A common error you will see is usually related to AMQP being unreachable. Generally, these errors can be ignored unless their time stamps show that they are appearing now; a number of these messages tend to be logged when the services first start up, so look at the timestamp before reaching conclusions.
A key log file, when troubleshooting instances that are not booting properly, is /var/log/nova/nova-scheduler.log on the controller host. This file often reveals why an instance is stuck in the Building state. Another file to view for further information is /var/log/nova/nova-compute.log on the compute host; look here around the time you launch the instance. In a busy environment, you will want to tail the log file and filter for the instance ID.
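For example, to follow the compute log for a particular instance as it builds, tail the log and filter on the instance ID (the ID here is the one from the earlier console-log example):
tail -f /var/log/nova/nova-compute.log | grep ee0cb5ca-281f-43e9-bb40-42ffddcb09cd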
Check /var/log/nova/nova-network.log (for Nova Network) and /var/log/quantum/*.log (for Neutron) for any reason why instances aren't being assigned IP addresses. It could be issues around DHCP preventing address allocation or quotas being reached.
The majority of the OpenStack services are web services, which means the responses from the services are well-defined HTTP status codes.
40X: These codes mean the service is up but is responding to an event produced by some user error. For example, a 401 is an authentication failure, so check the credentials used when accessing the service.
50X: These codes mean a dependent service is unavailable, or the service hit an unexpected error while processing the request. Common problems here are services that have not started properly, so check for running services. A quick probe with curl, as shown below, can help confirm whether the service itself is responding.
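A quick way to see the raw status code a service returns is to probe its API with curl. For example, hitting the Glance API on its default port, 9292, without a token, assuming Glance is configured to authenticate against Keystone, should return an HTTP 401 rather than a connection error, which confirms that the service itself is up:
curl -i http://127.0.0.1:9292/v1/images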
If all avenues have been exhausted when troubleshooting your environment, reach out to the community, using the mailing list or IRC, where there is a raft of people willing to offer their time and assistance. See the Getting help from the community recipe at the end of this article for more information.
From the OpenStack controller node, you can execute the following command to get a list of the running instances in the environment:
sudo nova-manage vm list
To view all instances across all tenants, as a user with an admin role execute the following command:
nova list --all-tenants
These commands are useful in identifying any failed instances and the hosts on which they are running, so that you can investigate further.
Troubleshooting OpenStack Compute problems can be quite complex, but looking in the right places can help solve some of the more common problems. Unfortunately, like troubleshooting any computer system, there isn't a single command that can help identify all the problems that you may encounter, but OpenStack provides some tools to help you identify some problems. Having an understanding of managing servers and networks will help troubleshoot a distributed cloud environment such as OpenStack.
There's more than one place where you can go to identify the issues, as they can stem from the environment to the instances themselves. Methodically working your way through the problems though will help lead you to a resolution.