Storage nodes within a MySQL Cluster store all the data either in memory or on disk; they store indexes in memory and carry out a significant portion of the SQL query processing. The single-threaded storage node process is called ndbd, and either this or the multi-threaded version (ndbmtd) must be installed and executed on each storage node.
Firstly, download the two files required on each storage node (that is, complete this on all storage nodes simultaneously):
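The exact package names and download location depend on the MySQL Cluster version and platform you are using, so treat the following as a placeholder sketch rather than the literal commands; the URL and filenames here are illustrative only, assuming the Cluster storage and tools RPMs used in the previous recipe:
# Run on every storage node; substitute the real download URL and the
# package filenames for your version and platform.
cd /tmp
wget http://example.com/mysql-cluster/MySQL-Cluster-gpl-storage-<version>.rpm
wget http://example.com/mysql-cluster/MySQL-Cluster-gpl-tools-<version>.rpm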
Once both files are downloaded, install these two packages using the same command that was used in the previous recipe:
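Assuming RPM packages as in the previous recipe, the install step is a single rpm command; the wildcards below are only a convenience and assume no other MySQL-Cluster RPMs are present in the directory:
# Install the storage and tools packages downloaded above
rpm -Uvh MySQL-Cluster-gpl-storage-*.rpm MySQL-Cluster-gpl-tools-*.rpm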
Now, using your favorite text editor, insert the following into the /etc/my.cnf file, replacing 10.0.0.5:1186 with the hostname or IP address of the already installed management node:
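As a minimal sketch (your my.cnf may already contain other sections; only the cluster connect string section is shown here), the entry looks like this:
[mysql_cluster]
# Tell cluster processes on this host where to find the management node
ndb-connectstring=10.0.0.5:1186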
Note
Ensure that you have completed the above steps on all storage nodes before continuing. This is because (unless you force it) a MySQL Cluster will not start without all storage nodes, and it is best practice to start all storage nodes at the same time, if possible.
Now that we have installed the storage node client and configured the /etc/my.cnf file to allow a starting storage node process to find its management node, we can start our cluster.
To join storage nodes to our cluster, the following requirements must be met:
All storage nodes must be ready to join the cluster (this can be overridden, if really required)
A config.ini file must be prepared with the details of the storage nodes in the cluster, and then a management node must be started based on this file
The storage nodes must be able to communicate with (that is, not be firewalled from) the management node; otherwise, the storage nodes will fail to connect (a quick connectivity check is sketched after this list)
The storage nodes must be able to communicate freely with each other; problems here can cause clusters to fail to start or, in some cases, truly bizarre behavior
There must be enough memory on the storage nodes to start the process (with the configuration in this example, that is, the defaults, total RAM usage will be approximately 115 MB per storage node)
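A quick way to confirm the connectivity requirement is to check, from each storage node, that port 1186 on the management node is reachable; telnet is used here purely as an illustration and any port-checking tool will do:
# From each storage node; a "Connected to 10.0.0.5" message means the
# management node is reachable and not blocked by a firewall.
telnet 10.0.0.5 1186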
Note
When you start ndbd, you will notice that two processes are started. One is an angel process, which monitors the other, the main ndbd process. The angel process is generally configured to automatically restart the main process if a problem is detected that causes that process to exit. This can cause confusion if you attempt to send a KILL signal to just the main process, as the angel process can create a replacement process.
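You can see both processes on a storage node with a standard ps command; this is only an illustrative check, not part of the recipe itself:
# Lists the angel process and the main ndbd process it monitors
# (the main process shows the angel's PID as its parent)
ps -C ndbd -o pid,ppid,cmd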
To start our storage nodes, on each node create the data directory that was configured for each storage node in the config.ini file on the management node, and run ndbd with the --initial flag:
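On each storage node that looks something like the following (the DataDir path is the one used in the previous recipe):
# Create the DataDir configured in config.ini, then do an initial start
mkdir -p /var/lib/mysql-cluster
ndbd --initial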
If you fail to create the data directory (/var/lib/mysql-cluster in our example in the previous recipe), you may well get a Cannot become daemon: /var/lib/mysql-cluster/ndb_3.pid: open for write failed: No such file or directory error. If you get this, run the mkdir command again on the relevant storage node.
Once you have started ndbd on each node, you can run the management client, ndb_mgm, from any machine as long as it can talk to port 1186 on the management node with which it will work.
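For example, on the management node itself (or any host that can reach port 1186 on it), the following starts the client, and the SHOW command then displays the cluster's nodes:
# Start the management client and display the cluster configuration
ndb_mgm
ndb_mgm> SHOW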
Note
ndb_mgm, like all MySQL Cluster binaries, reads the [mysql_cluster] section in /etc/my.cnf to find the management node's IP address to connect to. If you are running ndb_mgm on a node that does not have this set in /etc/my.cnf, you should pass a cluster connect string to ndb_mgm (see the final recipe in this chapter, Cluster Concepts).
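As a sketch, a connect string can be passed on the command line like this (using the management node address from this example):
# Point ndb_mgm at the management node explicitly
ndb_mgm --ndb-connectstring=10.0.0.5:1186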
The --initial flag tells ndbd to initialize the DataDir on the local disk, overwriting any existing data (or, in this case, creating it for the first time). You should only use --initial the first time you start a cluster or if you are deliberately discarding the local copy of the data. If you use it at other times, you risk losing all the data held in the cluster if no other node in the same nodegroup stays online long enough to update the starting node.
Note
Be aware that starting ndbd with --initial does not always delete all of the logfiles in the DataDir; you should delete these manually if you need to remove them.
The cluster will go through various stages as it starts. If you run the ALL STATUS command in the management client while the nodes are starting, you will see that they start off as unconnected, then go through the startup phases, and finally are marked as started.
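For instance, you can watch the nodes move through the phases by running ALL STATUS repeatedly, either inside the client or non-interactively with the -e option:
# One-off, non-interactive status check of all storage nodes
ndb_mgm -e "ALL STATUS"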
Because a starting node must often apply a large number of database transactions, either from other nodes or from its local disk, this process can take some time in clusters that already contain data; in the case of an initial start, however, it should be relatively fast. The following output shows the management client when all the storage nodes have started:
At this point, this cluster has one management node and two storage nodes that are connected. You are now able to start the SQL nodes.
If you have any problems, look at the following:
Cluster error log: in the DataDir on the management node, with a filename similar to DataDir/ndb_1_cluster.log (the number is the sequence number). MySQL Cluster has an inbuilt rotation system: when a file reaches 1 MB, a new one with a higher sequence number is created
Storage node error log: in the DataDir on the relevant storage node, with a filename similar to DataDir/ndb_4_out.log (the number is the cluster node ID)
These two logfiles should give you a pretty good idea of what is causing a problem.
If one node remains in phase 1 when others are in phase 2, this likely indicates a network problem; normally, a firewall between storage nodes causes this issue. In such cases, double-check that there are no software or hardware firewalls between the nodes.
For convenience (particularly when writing scripts to manage large clusters), you may want to start the ndbd process on all storage nodes only to the point where they obtain configuration data from the management node and can be controlled by it, without starting them completely. This is achieved with the --nostart or -n flag:
On the storage nodes (and assuming that ndbd is not already running):
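That is, something like the following on each storage node (adding --initial as described in the note below if this is the node's very first start):
# Connect to the management node and wait, but do not actually start
ndbd --nostart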
Note
If required (for example, the first time you start a node), you could add the --initial flag as you would for a normal start of the storage node process.
Then, on the management node you should see that the nodes are not started (this is different from not connected) as shown here:
It is then possible to start all the nodes at the same time from the management client with the command <nodeid> START as follows:
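For example, assuming the two storage nodes were allocated node IDs 3 and 4 (the IDs in your cluster may differ), you would issue:
ndb_mgm> 3 START
ndb_mgm> 4 START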
You can start all storage nodes that are in a not started state with the command ALL START:
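In the management client this is simply:
ndb_mgm> ALL START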
During the startup of storage nodes, you may find the following list of phases useful if nodes fail or hang during a certain startup phase:
Note
MySQL Clusters with a large amount of data in them will be slow to start; it is often useful to look at CPU usage and network traffic to reassure yourself that the cluster is actually still doing something.
Setup and initialization (Phase -1): During this phase, each storage node is initialized: it obtains a node ID from the management node, fetches the configuration data (effectively the contents of the config.ini file), allocates ports for inter-cluster communication, and allocates memory.
Phase 0: If the storage node is started with --initial, the cluster kernel is initialized, and in all cases certain parts of it are prepared for use.
Phase 1: The remainder of the cluster kernel is started and nodes start communicating with each other (using heartbeats).
Phase 2: Nodes check the status of each other and elect a Master node.
Phase 3: Additional parts of the cluster kernel used for communication are initialized.
Phase 4: For an initial start or initial node restart, the redo logfiles are created; the number of these files is equal to the NoOfFragmentLogFiles parameter in the config.ini file (a config.ini sketch showing this parameter follows this list). In the case of a restart, nodes read schemas and apply local checkpoints, until the last restorable global checkpoint has been reached.
Phase 5: A local checkpoint is executed, followed by a global checkpoint and a memory usage check.
Phase 6: Establish node groups.
Phase 7: The arbitrator node is selected and begins to function. At this point, nodes are shown as started in the management client, and SQL nodes may join the cluster.
Phase 8: In the case of a restart, indexes are rebuilt.
Phase 9: Internal node startup variables are reset.
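For reference, the parameter mentioned in Phase 4 lives in the [ndbd default] section of config.ini on the management node; the value shown here is illustrative only, and the default applies if the parameter is not set:
[ndbd default]
# Number of redo logfile sets created during an initial start
NoOfFragmentLogFiles=16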