Troubleshooting the installation
Ansible is a tool written by people, and it runs playbooks, also written by people, to perform configuration that people would otherwise carry out by hand. As such, errors can occur. The end result is only as good as the input.
Typical failures either occur quickly and are relatively self-evident, such as connection problems, or occur after long-running jobs, often as a result of load or network timeouts. In either case, the OpenStack-Ansible playbooks provide an efficient mechanism to rerun playbooks without having to repeat the tasks they have already completed.
On failure, Ansible produces a file in /root (as we're running these playbooks as root), named after the playbook with the file extension .retry. This file simply lists the hosts that failed, so that it can be referenced when running the playbook again. The rerun then targets only that single host or small group of hosts, which is far more efficient than rerunning against a large cluster of machines that completed successfully.
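Before rerunning, it can help to see which hosts a failed run recorded. The following is a minimal sketch, assuming the retry file holds one host name per line; the function name list_retry_hosts is hypothetical and is not part of Ansible or OpenStack-Ansible.

```shell
#!/usr/bin/env bash
# Sketch only: print the hosts recorded in a .retry file.
# Assumption: the file contains one host name per line.

# list_retry_hosts FILE
# Prints each non-blank line of FILE; returns 1 if FILE does not exist.
list_retry_hosts() {
    local retry_file="$1"
    if [ ! -f "$retry_file" ]; then
        echo "no retry file at $retry_file" >&2
        return 1
    fi
    # Drop blank lines so that only host names are printed.
    sed '/^[[:space:]]*$/d' "$retry_file"
}

# Example usage (path taken from the text above):
# list_retry_hosts /root/setup-openstack.retry
```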
How to do it...
We will step through a problem that caused one of the playbooks to fail.
Note the failed playbook and then invoke it again with the following steps:
Ensure that you're in the playbooks directory as follows:
cd /opt/openstack-ansible/playbooks
Now rerun that playbook, but specify the retry file:
openstack-ansible setup-openstack.yml --limit @/root/setup-openstack.retry
In most situations, this will be enough to rectify the problem. However, OpenStack-Ansible has been written to be idempotent, meaning that the whole playbook can be run again and will modify only what it needs to. Therefore, you can also run the playbook again without specifying the retry file.
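The choice between a limited rerun and a full rerun can be sketched as a small helper. This is an illustration under stated assumptions: the helper name rerun_command is hypothetical, the retry file is assumed to be named after the playbook with .yml replaced by .retry, and the limited rerun uses Ansible's --limit @file form for reading hosts from a file. The helper only prints the command, so nothing is executed until you run the printed line yourself.

```shell
#!/usr/bin/env bash
# Sketch only: decide how to rerun a failed playbook.

# rerun_command PLAYBOOK [RETRY_DIR]
# Prints a rerun command limited to the failed hosts when a non-empty
# retry file exists in RETRY_DIR (default: /root); otherwise prints the
# command for a full, idempotent rerun.
rerun_command() {
    local playbook="$1"
    local retry_dir="${2:-/root}"
    # Assumed naming: setup-openstack.yml -> setup-openstack.retry
    local retry_file="$retry_dir/${playbook%.yml}.retry"
    if [ -s "$retry_file" ]; then
        echo "openstack-ansible $playbook --limit @$retry_file"
    else
        echo "openstack-ansible $playbook"
    fi
}

# Example usage:
# cd /opt/openstack-ansible/playbooks
# rerun_command setup-openstack.yml
```

Printing rather than executing keeps the decision visible, which is useful when you want to confirm the host list before starting a long run.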
Should there be a failure at this first stage, execute the following:
First remove the generated inventory files:
rm -f /etc/openstack_deploy/openstack_inventory.json
rm -f /etc/openstack_deploy/openstack_hostnames_ips.yml
Now rerun the setup-hosts.yml playbook:
cd /opt/openstack-ansible/playbooks
openstack-ansible setup-hosts.yml
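The inventory cleanup can also be wrapped so that it reports what it actually removed. This is a sketch, not part of OpenStack-Ansible: the function name remove_inventory is hypothetical, and the two file names are the ones given in the step above.

```shell
#!/usr/bin/env bash
# Sketch only: remove the generated inventory files and report the result.

# remove_inventory [DEPLOY_DIR]
# DEPLOY_DIR defaults to /etc/openstack_deploy.
remove_inventory() {
    local deploy_dir="${1:-/etc/openstack_deploy}"
    local name
    for name in openstack_inventory.json openstack_hostnames_ips.yml; do
        if [ -e "$deploy_dir/$name" ]; then
            rm -f "$deploy_dir/$name"
            echo "removed $deploy_dir/$name"
        else
            # A missing file is not an error; it may never have been generated.
            echo "not present: $deploy_dir/$name"
        fi
    done
}

# Example usage:
# remove_inventory
# cd /opt/openstack-ansible/playbooks
# openstack-ansible setup-hosts.yml
```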
In some situations, it might be applicable to destroy the installation and begin again. As each service gets installed in LXC containers, it is very easy to wipe an installation and start from the beginning. To do so, carry out the following steps:
We first destroy all of the containers in the environment:
cd /opt/openstack-ansible/playbooks
openstack-ansible lxc-containers-destroy.yml
You will be asked to confirm this action. Follow the on-screen prompts.
We recommend that you uninstall the following package to avoid any conflicts with future runs of the playbooks, and also clear out any remnants of containers on each host:
ansible hosts -m shell -a "pip uninstall -y appdirs"
Finally, remove the inventory information:
rm -f /etc/openstack_deploy/openstack_inventory.json /etc/openstack_deploy/openstack_hostnames_ips.yml
How it works…
Ansible is not perfect, and neither are computers. Sometimes failures occur in the environment due to SSH timeouts or some other transient problem. Also, despite Ansible trying its best to retry the execution of a playbook, the result might still be a failure. Failure in Ansible is quite obvious: it is usually indicated by red text on the screen. In most cases, rerunning the offending playbook gets past transient problems. Each playbook runs a specific set of tasks, and Ansible will state which task has failed. Troubleshooting why that particular task failed will eventually lead to a good outcome. In the worst case, you can reset your installation and start again from the beginning.
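Because many failures are transient, rerunning can be automated with a bounded retry loop. The following is a generic sketch rather than a feature of Ansible or OpenStack-Ansible; the function name retry_command and its arguments are hypothetical. It relies on the playbooks being idempotent, as described earlier, so that repeating a run is safe.

```shell
#!/usr/bin/env bash
# Sketch only: rerun a command a limited number of times.

# retry_command MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or MAX_ATTEMPTS is reached.
# Returns 0 on the first success, 1 if every attempt failed.
retry_command() {
    local max_attempts="$1"
    local delay="$2"
    shift 2
    local attempt=1
    while true; do
        "$@" && return 0
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "giving up after $attempt attempts: $*" >&2
            return 1
        fi
        echo "attempt $attempt failed, retrying in ${delay}s" >&2
        attempt=$((attempt + 1))
        sleep "$delay"
    done
}

# Example usage: try setup-hosts.yml up to three times, 30 seconds apart.
# cd /opt/openstack-ansible/playbooks
# retry_command 3 30 openstack-ansible setup-hosts.yml
```

A loop like this only helps with transient problems. If the same task fails on every attempt, read the failed task's output instead of retrying further.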