The Mesos API
Mesos provides an API to allow developers to build custom frameworks that can run on top of the underlying distributed infrastructure. The detailed steps involved in developing bespoke frameworks leveraging this API and the new HTTP API will be explored in detail in Chapter 6, Mesos Frameworks.
Messages
Mesos implements an actor-style message-passing programming model to enable nonblocking communication between different Mesos components and leverages protocol buffers for the same. For example, a scheduler needs to tell the executor to utilize a certain number of resources, an executor needs to provide status updates to the scheduler regarding the tasks that are executed, and so on. Protocol buffers provide the required flexible message delivery mechanism to enable this communication by allowing developers to define custom formats and protocols that can be used across different languages. For more details regarding the messages that are passed between different Mesos components, refer to https://github.com/apache/mesos/blob/master/include/mesos/mesos.proto
API details
A brief description of the different APIs and methods that Mesos provides is provided in the following section:
Executor API
A brief description of the Executor API is given below. For more details, visit http://mesos.apache.org/api/latest/java/org/apache/mesos/Executor.html.
registered
: This can be registered via the following code:void registered(ExecutorDriver driver, ExecutorInfo executorInfo, FrameworkInfo frameworkInfo, SlaveInfo slaveInfo)
This code is invoked once the executor driver is able to successfully connect with Mesos. In particular, a scheduler can pass some data to its executors through the
ExecutorInfo.getData()
field.The following are the parameters:
driver
: This is the executor driver that was registered and connected to the Mesos clusterexecutorInfo
: This describes information about the executor that was registeredframeworkInfo
: This describes the framework that was registeredslaveInfo
: This describes the slave that will be used to launch the tasks for this executor
reregistered
: This can be reregistered as follows:void reregistered(ExecutorDriver driver, SlaveInfo slaveInfo)
This code is invoked when the executor reregisters with a restarted slave.
The following are the parameters:
driver
: This is the executor driver that was reregistered with the Mesos masterslaveInfo
: This describes the slave that will be used to launch the tasks for this executor
disconnected
: This can be disconnected via the following code:void disconnected(ExecutorDriver driver)
The preceding code is invoked when the executor gets "disconnected" from the slave—for example, when the slave is restarted due to an upgrade).
The following is the parameter:
driver
: This is the executor driver that was disconnected.
launchTask
: Take a look at the following code:void launchTask(ExecutorDriver driver, TaskInfo task)
The preceding code is invoked when a task is launched on this executor (initiated via
SchedulerDriver.launchTasks(java.util.Collection<OfferID>, java.util.Collection<TaskInfo>, Filters
). Note that this task can be realized with a thread, a process, or some simple computation; however, no other callbacks will be invoked on this executor until this callback returns.The following are the parameters:
driver
: This is the executor driver that launched the tasktask
: This describes the task that was launched
killTask
: Run the following code:void killTask(ExecutorDriver driver, TaskID taskId)
This is invoked when a task running within this executor is killed via
SchedulerDriver.killTask
(TaskID). Note that no status update will be sent on behalf of the executor, and the executor is responsible for creating a newTaskStatus
protobuf message (that is, withTASK_KILLED
) and invokingExecutorDriver.sendStatusUpdate
(TaskStatus
).The following are the parameters:
driver
: This is the executor driver that owned the task that was killedtaskId
: This is the ID of the task that was killed
frameworkMessage
: Run the following code:void frameworkMessage(ExecutorDriver driver, byte[] data)
This is invoked when a framework message arrives for this executor. These messages are the best effort; do not expect a framework message to be retransmitted in any reliable fashion.
The following are the parameters:
driver
: This is the executor driver that received the messagedata
: This is the message payload
shutdown
: Execute the following code:void shutdown(ExecutorDriver driver)
This is invoked when the executor terminates all of its currently running tasks. Note that after Mesos determines that an executor has terminated, any tasks that the executor did not send Terminal status updates for (for example,
TASK_KILLED
,TASK_FINISHED
,TASK_FAILED
, and so on), and aTASK_LOST
status update will be created.The following is the parameter:
driver
: This is the executor driver that should terminate.
error
: Run the following:void error(ExecutorDriver driver, java.lang.String message)
The previous code is invoked when a fatal error occurs with the executor and/or executor driver. The driver will be aborted BEFORE invoking this callback.
The following are the parameters:
driver
: This is the executor driver that was aborted due to this errormessage
: This is the error message
The Executor Driver API
A brief description of the Executor Driver API is given below. For more details, visit http://mesos.apache.org/api/latest/java/org/apache/mesos/ExecutorDriver.html.
start
: Run the following line:Status start()
The preceding code starts the executor driver. This needs to be called before any other driver calls are made.
The state of the driver after the call is returned.
stop
: Run the following line:Status stop()
This stops the executor driver.
The state of the driver after the call is the return.
abort
: Run the following line:Status abort()
This aborts the driver so that no more callbacks can be made to the executor. The semantics of abort and stop are deliberately separated so that the code can detect an aborted driver (via the return status of
join()
; refer to the following section) and instantiate and start another driver if desired (from within the same process, although this functionality is currently not supported for executors).The state of the driver after the call is the return.
join
: Run the following:Status join()
This waits for the driver to be stopped or aborted, possibly blocking the current thread indefinitely. The return status of this function can be used to determine whether the driver was aborted (take a look at
mesos.proto
for a description of status).The state of the driver after the call is the return.
run
: Take a look at the following line of code:Status run()
This starts and immediately joins (that is, blocks) the driver.
The state of the driver after the call is the return.
sendStatusUpdate
: Here's the code to execute:Status sendStatusUpdate(TaskStatus status)
This sends a status update to the framework scheduler, retrying as necessary until an acknowledgement is received or the executor is terminated (in which case, a
TASK_LOST
status update will be sent). Take a look atScheduler.statusUpdate(org.apache.mesos.SchedulerDriver, TaskStatus)
for more information about status update acknowledgements.The following is the parameter:
status
: This is the status update to send.
- The state of the driver after the call is the return.
sendFrameworkMessage
: Run the following code:Status sendFrameworkMessage(byte[] data)
This sends a message to the framework scheduler. These messages are sent on a best effort basis and should not be expected to be retransmitted in any reliable fashion.
The parameters are as follows:
data
: This is the message payload.
The state of the driver after the call is the return.
The Scheduler API
A brief description of the Scheduler API is given below. For more details, visit http://mesos.apache.org/api/latest/java/org/apache/mesos/Scheduler.html.
registered
: This can be registered via the following code:void registered(SchedulerDriver driver, FrameworkID frameworkId, MasterInfo masterInfo)
The preceding is invoked when the scheduler successfully registers with a Mesos master. A unique ID (generated by the master) is used to distinguish this framework from others, and
MasterInfo
with the IP and port of the current master are provided as arguments.The following are the parameters:
driver
: This is the scheduler driver that was registeredFrameworkID
: This is theFrameworkID
generated by the masterMasterInfo
: This is the information about the current master, including the IP and port.
reregistered
: The preceding code can be reregistered as follows:void reregistered(SchedulerDriver driver, MasterInfo masterInfo)
The preceding code is invoked when the scheduler reregisters with a newly elected Mesos master. This is only called when the scheduler is previously registered.
MasterInfo
containing the updated information about the elected master is provided as an argument.The parameters are as follows:
driver
: This is the driver that was reregisteredMasterInfo
: This is the updated information about the elected master
resourceOffers
: Execute the following code:void resourceOffers(SchedulerDriver driver, java.util.List<Offer> offers)
The preceding code is invoked when resources are offered to this framework. A single offer will only contain resources from a single slave. Resources associated with an offer will not be reoffered to this framework until either; (a) this framework rejects these resources (refer to
SchedulerDriver.launchTasks(java.util.Collection<OfferID>, java.util.Collection<TaskInfo>, Filters)
), or (b) these resources are rescinded (refer toofferRescinded(org.apache.mesos.SchedulerDriver, OfferID)
). Note that resources may be concurrently offered to more than one framework at a time, depending on the allocator being used. In this case, the first framework to launch tasks using these resources will be able to use them, while the other frameworks will have these resources rescinded. (Alternatively, if a framework has already launched tasks with these resources, these tasks will fail with aTASK_LOST
status and a message saying as much).The following are the parameters:
driver
: This is the driver that was used to run this scheduleroffers
: These are the resources offered to this framework
offerRescinded
: Run the following code:void offerRescinded(SchedulerDriver driver, OfferID offerId)
This is invoked when an offer is no longer valid (for example, the slave is lost or another framework is used resources in the offer). If, for whatever reason, an offer is never rescinded (for example, a dropped message, failing over framework, and so on), a framework that attempts to launch tasks using an invalid offer will receive a
TASK_LOST
status update for these tasks (take a look atresourceOffers(org.apache.mesos.SchedulerDriver, java.util.List<Offer>)
).The following are the parameters:
driver
: This is the driver that was used to run this schedulerofferID
: This is the ID of the offer that was rescinded
statusUpdate
: Take a look at the following code:void statusUpdate(SchedulerDriver driver, TaskStatus status)
The preceding code is invoked when the status of a task changes (for example, a slave is lost, so the task is lost; a task is finished, and an executor sends a status update saying so; and so on). If, for whatever reason, the scheduler is aborted during this callback or the process exits, then another status update will be delivered. (Note, however, that this is currently not true if the slave sending the status update is lost or fails during this time.)
The parameters are as follows:
driver
: This is the driver that was used to run this schedulerstatus
: This is the status update, which includes the task ID and status
frameworkMessage
: Take a look at the following code:void frameworkMessage(SchedulerDriver driver, ExecutorID executorId, SlaveID slaveId, byte[] data)
The preceding code is invoked when an executor sends a message. These messages are sent on a best effort basis and should not be expected to be retransmitted in any reliable fashion.
The parameters are as follows:
driver
: This is the driver that received the messageExecutorID
: This is the ID of the executor that sent the messageSlaveID
: This is the ID of the slave that launched the executordata
: This is the message payload
disconnected
: Run the following:void disconnected(SchedulerDriver driver)
This is invoked when the scheduler becomes disconnected from the master (for example, the master fails and another takes over).
The following is the parameter:
driver
: This is the driver that was used to run this scheduler
slaveLost
: Execute the following code:void slaveLost(SchedulerDriver driver, SlaveID slaveId)
This is invoked when a slave is determined unreachable (for example, machine failure or network partition). Most frameworks need to reschedule any tasks launched on this slave on a new slave.
The following are the parameters:
driver
: This is the driver that was used to run this schedulerSlaveID
: This is the ID of the slave that was lost
executorLost
: Run the following:void executorLost(SchedulerDriver driver, ExecutorID executorId, SlaveID slaveId, int status)
The preceding is invoked when an executor is exited or terminated. Note that any running task will have the
TASK_LOST
status update automatically generated.The following are the parameters:
driver
: This is the driver that was used to run this schedulerExecutorID
: This is the ID of the executor that was lostslaveID
: This is the ID of the slave that launched the executorstatus
: This is the exit status of the executor
error
: Run the following code:void error(SchedulerDriver driver, java.lang.String message)
The preceding is invoked when there is an unrecoverable error in the scheduler or driver. The driver will be aborted before invoking this callback.
The following are the parameters:
driver
: This is the driver that was used to run this schedulermessage
: This is the error message
The Scheduler Driver API
A brief description of the Scheduler Driver API is given below. For more details, visit http://mesos.apache.org/api/latest/java/org/apache/mesos/SchedulerDriver.html
start
: Run the following code:Status start()
This starts the scheduler driver. It needs to be called before any other driver calls are made.
The preceding returns the state of the driver after the call.
stop
: Execute the following code:Status stop(boolean failover)
This stops the scheduler driver. If the
failover
flag is set to false, it is expected that this framework will never reconnect to Mesos. So, Mesos will unregister the framework and shut down all its tasks and executors. Iffailover
is true, all executors and tasks will remain running (for some framework-specific failover timeout), allowing the scheduler to reconnect (possibly in the same process or from a different process—for example, on a different machine).The following is the parameter:
failover
: This is whether framework failover is expected
This returns the state of the driver after the call.
Stop
: Run the following line:Status stop()
This stops the scheduler driver assuming no failover. This will cause Mesos to unregister the framework and shut down all its tasks and executors.
This returns the state of the driver after the call.
abort
: Execute the following code:Status abort()
This aborts the driver so that no more callbacks can be made to the scheduler. The semantics of abort and stop are deliberately separated so that code can detect an aborted driver (via the return status of
join()
; refer to the following section) and instantiate and start another driver if desired from within the same process.This returns the state of the driver after the call.
join
: Run the following:Status join()
This waits for the driver to be stopped or aborted, possibly blocking the current thread indefinitely. The return status of this function can be used to determine whether the driver was aborted (take a look at
mesos.proto
for a description ofStatus
).This returns the state of the driver after the call.
run
: Execute the following:Status run()
This starts and immediately joins (that is, blocks) the driver.
It returns the state of the driver after the call.
requestResources
: Take a look at the following:Status requestResources(java.util.Collection<Request> requests)
This requests resources from Mesos (take a look at
mesos.proto
for a description of Request and how, for example, to request resources from specific slaves). Any resources available are offered to the framework via theScheduler.resourceOffers(org.apache.mesos.SchedulerDriver, java.util.List<Offer>)
callback asynchronously.The following is the parameter:
requests
: These are the resource requests.
It returns the state of the driver after the call.
launchTasks
: Use the following code:Status launchTasks(java.util.Collection<OfferID> offerIds, java.util.Collection<TaskInfo> tasks, Filters filters)
The preceding code launches the given set of tasks on a set of offers. Resources from offers are aggregated when more than one is provided. Note that all the offers must belong to the same slave. Any resources remaining (that is, not used by the tasks or their executors) will be considered declined. The specified filters are applied on all unused resources (take a look at
mesos.proto
for a description of Filters). Invoking this function with an empty collection of tasks declines offers in their entirety (refer todeclineOffer(OfferID, Filters)
).The following are the parameters:
offerIds
: This is the collection of offer IDstasks
: This is the collection of tasks to be launchedfilters
: This is the filters to set for any remaining resources.
It returns the state of the driver after the call.
killTask
: Execute the following code:Status killTask(TaskID taskId)
This kills the specified task. Note that attempting to kill a task is currently not reliable. If, for example, a scheduler fails over while it attempts to kill a task, it will need to retry in the future. Likewise, if unregistered/disconnected, the request will be dropped (these semantics may be changed in the future).
The following is the parameter:
taskId
: This is the ID of the task to be killed
It returns the state of the driver after the call.
declineOffer
: Run the following code:Status declineOffer(OfferID offerId, Filters filters)
This declines an offer in its entirety and applies the specified filters on the resources (take a look at
mesos.proto
for a description of Filters). Note that this can be done at any time, and it is not necessary to do this within theScheduler.resourceOffers(org.apache.mesos.SchedulerDriver, java.util.List<Offer>)
callback.The following are the parameters:
offerId
: This is the ID of the offer to be declinedfilters
: These are the filters to be set for any remaining resources
It returns the state of the driver after the call.
reviveOffers
: Execute the following:Status reviveOffers()
This removes all the filters previously set by the framework (via
launchTasks(java.util.Collection<OfferID>, java.util.Collection<TaskInfo>, Filters)
). This enables the framework to receive offers from these filtered slaves.It returns the state of the driver after the call.
sendFrameworkMessage
: Take a look at the following:Status sendFrameworkMessage(ExecutorID executorId, SlaveID slaveId, byte[] data)
This sends a message from the framework to one of its executors. These messages are sent on a best effort basis and should not be expected to be retransmitted in any reliable fashion.
The parameters are:
executorId
: This is the ID of the executor to send the message toslaveId
: This is the ID of the slave that runs the executordata
: This is the message
It returns the state of the driver after the call.
reconcileTasks
: Take a look at the following code:Status reconcileTasks(java.util.Collection<TaskStatus> statuses)
This allows the framework to query the status for nonterminal tasks. This causes the master to send back the latest task status for each task in
statuses
if possible. Tasks that are no longer known will result in aTASK_LOST
update. Ifstatuses
is empty, the master will send the latest status for each task currently known.The following are the parameters:
statuses
: This is the collection of nonterminalTaskStatus
protobuf messages to reconcile.
It returns the state of the driver after the call.