This document is designed for the developer who want to create a cloud provider or container extension for the DockerParallel
package. We will discuss the structure of the package and calling order of the package function.
The package assumes a server-worker-client structure, their relationship can be summarized as follow
{width=90%}
When the user creates a DockerCluster
object, the object serves as the client. The server will receive the computation jobs from the client and send them to the workers. The server and workers are defined by the DockerContainer
objects. They can be dynamically created on the cloud using the functions provided by the corresponding CloudProvider
object. However, it is possible to have the server and some workers running outside of the cloud as the user might have them in some other platforms.
The design of the package is twofold. It contains two sets of APIs, user APIs and developer APIs. The user APIs are called by the user to start the cluster on the cloud. The developer APIs are all S4 generics and should be defined by the developer to collect the container information and deploy the container on the cloud. Below shows the difference between the user and developer APIs
{width=90%}
The user issues the high-level command to the DockerCluster
object(e.g. startCluster
). The DockerCluster
object will call the downstream functions to supply what the user needs. DockerCluster
class contains fives major slots, cloudConfig
, cloudRuntime
, cloudProvider
, serverContainer
and workerContainer
.
CloudConfig
stores the cluster static settings such as jobQueueName
and workerNumber
.
CloudRuntime
keeps the runtime information of the server and workers. CloudProvider
provides the functions to run the container on the cloud and DockerContainer
defines which container should be used. As the cluster might need to deploy both the server and workers, the DockerCluster
object defines both serverContainer
and workerContainer
as its slots. Since a DockerCluster
object corresponds to a cluster on the cloud, all the before-mentioned components behave like the environment object in R. Therefore, it is possible to change the value in the DockerCluster
object inside a function and broadcast the effect to the same object outside of the function.
The reference class CloudProvider
and DockerContainer
are generalizable, the developer can define a new cloud provider by defining a new class which inherits CloudProvider
. The same rule applies to DockerContainer
as well. In the rest of the document, we will discuss the implementation details of the DockerCluster
object.
In this section, we use the function DockerCluster$startCluster
as an example to show how the components of the DockerCluster
object work together to deploy a cluster on the cloud.
{width=90%}
The gray color means the function is from DockerCluster
, red means the CloudProvider
and blue means the DockerContainer
. Note that the flowchart only contains these three classes as CloudConfig
and CloudRuntime
are purely the class for storing the cluster information. In most cases, it is DockerCluster
's job to manage the data in the cluster, there is no need to change the value of CloudConfig
and CloudRuntime
in the function defined for CloudProvider
or DockerContainer
. The only exception is the function reconnectDockerCluster
where the cloud provider is responsible for setting all the values in the cluster.
The package provides getters/setters for the developer to access the values in the DockerCluster
object. The first argument is always the DockerCluster
object. The high-level accessors are
.getCloudProvider
.getCloudConfig
.getServerContainer
.getWorkerContainer
.getCloudRuntime
Note that there are no setters for them as they are of the reference class. The accessors for the CloudConfig
values are
## Getter
.getJobQueueName
.getExpectedWorkerNumber
.getWorkerHardware
.getServerHardware
.getServerWorkerSameLAN
.getServerClientSameLAN
.getServerPassword
.getServerPort
## Setter
.setJobQueueName
.setExpectedWorkerNumber
.setWorkerHardware
.setServerHardware
.setServerWorkerSameLAN
.setServerClientSameLAN
.setServerPassword
.setServerPort
The accessors for the CloudRuntime
values are
## Getter
.getServerFromOtherSource
.getServerPrivateIp
.getServerPrivatePort
.getServerPublicIp
.getServerPublicPort
## Setter
.setServerFromOtherSource
.setServerPrivateIp
.setServerPrivatePort
.setServerPublicIp
.setServerPublicPort
In most cases, only the getter are required to be called by the developer when developing the cloud provider or the container.
DockerContainer
defines the docker image and its environment variables. Its class definition is
setRefClass(
"DockerContainer",
fields = list(
name = "character",
backend = "character",
maxWorkerNum = "integer",
environment = "list",
image = "character"
)
)
where name
is the name of the container and backend is the name of the parallel backend used by the cluster. maxWorkerNum
defines the maximum number of workers that can be run in the same container. environment
is the environment variable in the container. image
is the image used by the container. Note that a minimum container should define at least the image
slot. The other slots are optional.
The generic function for the DockerContainer
class are
configServerContainerEnv
configWorkerContainerEnv
registerParallelBackend
deregisterParallelBackend
getServerContainer
getExportedNames
getExportedObject
A minimum container should override configServerContainerEnv
, configWorkerContainerEnv
, registerParallelBackend
and deregisterParallelBackend
. The rest is optional.
The motivation for configuring the container environment is to allow developer to pass the cluster data(e.g. server password) to the container before deploying it on the cloud(please see The big picture). Therefore, server can be password protected and worker can find the server via the settings in the environment variable. The function prototypes are
getGeneric("configServerContainerEnv")
#> nonstandardGenericFunction for "configServerContainerEnv" defined from package "DockerParallel"
#>
#> function (container, cluster, verbose)
#> {
#> standardGeneric("configServerContainerEnv")
#> }
#> <bytecode: 0x0000000015c2e5d8>
#> <environment: 0x0000000015c1fff0>
#> Methods may be defined for arguments: container
#> Use showMethods(configServerContainerEnv) for currently available ones.
and
getGeneric("configWorkerContainerEnv")
#> nonstandardGenericFunction for "configWorkerContainerEnv" defined from package "DockerParallel"
#>
#> function (container, cluster, workerNumber, verbose)
#> {
#> standardGeneric("configWorkerContainerEnv")
#> }
#> <bytecode: 0x0000000015cae000>
#> <environment: 0x0000000015c77708>
#> Methods may be defined for arguments: container
#> Use showMethods(configWorkerContainerEnv) for currently available ones.
where container
is the worker container and cluster
is the DockerCluster
object. workerNumber
defines how many workers should be run in the same container. Please keep in mind that the user might also define the environment variables in the container, so you should insert your environment variables to DockerContainer$environment
, not overwrite the entire environment object.
Since the container object is a reference object, it is recommended to call $copy()
before set the environment variable to avoid the side affect.
It is the container's duty to register the parallel backend as neither the cloud provider nor the cluster knows which backend your container supports and how to register it. The function prototypes are
getGeneric("registerParallelBackend")
#> nonstandardGenericFunction for "registerParallelBackend" defined from package "DockerParallel"
#>
#> function (container, cluster, verbose, ...)
#> {
#> standardGeneric("registerParallelBackend")
#> }
#> <bytecode: 0x0000000016146ed0>
#> <environment: 0x0000000016143950>
#> Methods may be defined for arguments: container
#> Use showMethods(registerParallelBackend) for currently available ones.
and
getGeneric("deregisterParallelBackend")
#> nonstandardGenericFunction for "deregisterParallelBackend" defined from package "DockerParallel"
#>
#> function (container, cluster, verbose, ...)
#> {
#> standardGeneric("deregisterParallelBackend")
#> }
#> <bytecode: 0x0000000015ce33b8>
#> <environment: 0x0000000015cd05e0>
#> Methods may be defined for arguments: container
#> Use showMethods(deregisterParallelBackend) for currently available ones.
where container
is the worker container and cluster
is the DockerCluster
object. The backend can be anything defined globally that allow the user to do the parallel computing. The popular parallel frameworks are foreach
and BiocParallel
. Other frameworks are also possible. The default deregisterParallelBackend
can deregister the foreach backend. If your backend is not from foreach
, you should define deregisterParallelBackend
.
The function getServerContainer
is purely for obtaining the server container from the worker container. By doing so the user only need to provide the worker container to the cluster. The prototype is
getGeneric("getServerContainer")
#> nonstandardGenericFunction for "getServerContainer" defined from package "DockerParallel"
#>
#> function (workerContainer)
#> {
#> standardGeneric("getServerContainer")
#> }
#> <bytecode: 0x0000000016070500>
#> <environment: 0x0000000016068960>
#> Methods may be defined for arguments: workerContainer
#> Use showMethods(getServerContainer) for currently available ones.
This function is optional, but we recommend to define it as otherwise the user must explicitly provide both the server and worker container to the cluster if both need to be deployed by the cloud.
You can export the functions and variables in the container via getExportedNames
and getExportedObject
. The prototypes are
getGeneric("getExportedNames")
#> nonstandardGenericFunction for "getExportedNames" defined from package "DockerParallel"
#>
#> function (x)
#> {
#> standardGeneric("getExportedNames")
#> }
#> <bytecode: 0x0000000015ef65e0>
#> <environment: 0x0000000015ee5c00>
#> Methods may be defined for arguments: x
#> Use showMethods(getExportedNames) for currently available ones.
and
getGeneric("getExportedObject")
#> nonstandardGenericFunction for "getExportedObject" defined from package "DockerParallel"
#>
#> function (x, name)
#> {
#> standardGeneric("getExportedObject")
#> }
#> <bytecode: 0x0000000016052338>
#> <environment: 0x000000001604a238>
#> Methods may be defined for arguments: x
#> Use showMethods(getExportedObject) for currently available ones.
getExportedNames
defines the exported names and getExportedObject
gets the exported object.
They will be called by the cluster and the user can see the exported objects from DockerCluster$serverContainer$...
and DockerCluster$workerContainer$...
.
If the exported object is a function, the exported function will be defined in an environment such that the DockerCluster
object is implicitly assigned to the variable cluster
. In other words, the exported function can use the variable cluster
without defining it. For example, if we export the function
foo <- function(){
cluster$startCluster()
}
the user can call foo
via DockerCluster$workerContainer$foo()
with no argument and the cluster will be started. This can be useful if the developer needs to change anything in the cluster without asking the user to provide the DockerCluster
object. If the function has the argument cluster
, the argument will be removed from the function when the function is exported to the user. The user would not be bothered with the redundant cluster
argument.
CloudProvider
provides functions to deploy the container on the cloud. Its generic functions are
initializeCloudProvider
runDockerServer
stopDockerServer
getServerStatus
getDockerServerIp
setDockerWorkerNumber
getDockerWorkerNumbers
dockerClusterExists
reconnectDockerCluster
cleanupDockerCluster
most function names should be self-explained and we will introduce them in the following sections. A minimum cloud provider only needs to define runDockerServer
, stopDockerServer
, getDockerServerIp
, setDockerWorkerNumber
. However, many important features will be missing if you do not define the optional functions.
initializeCloudProvider
allows the developer to initialize the cloud provider. It will be automatically called by the DockerCluster
before running the server and workers on the cloud. This function can be omitted if the cloud provider does not require any initialization. Its generic is
getGeneric("initializeCloudProvider")
#> nonstandardGenericFunction for "initializeCloudProvider" defined from package "DockerParallel"
#>
#> function (provider, cluster, verbose)
#> {
#> standardGeneric("initializeCloudProvider")
#> }
#> <bytecode: 0x00000000160b5e80>
#> <environment: 0x00000000160b0ba0>
#> Methods may be defined for arguments: provider
#> Use showMethods(initializeCloudProvider) for currently available ones.
where the provider
is the cloud provider object, cluster
is the DockerCluster
object which contains the provider
. verbose
is an integer showing the verbose level.
runDockerServer
and setDockerWorkerNumber
implement the core functions of the cloud provider. The generics are
getGeneric("runDockerServer")
#> nonstandardGenericFunction for "runDockerServer" defined from package "DockerParallel"
#>
#> function (provider, cluster, container, hardware, verbose)
#> {
#> standardGeneric("runDockerServer")
#> }
#> <bytecode: 0x00000000151c1b18>
#> <environment: 0x00000000151d43d8>
#> Methods may be defined for arguments: provider
#> Use showMethods(runDockerServer) for currently available ones.
getGeneric("setDockerWorkerNumber")
#> nonstandardGenericFunction for "setDockerWorkerNumber" defined from package "DockerParallel"
#>
#> function (provider, cluster, container, hardware, workerNumber,
#> verbose)
#> {
#> standardGeneric("setDockerWorkerNumber")
#> }
#> <bytecode: 0x0000000014fec648>
#> <environment: 0x0000000015016290>
#> Methods may be defined for arguments: provider
#> Use showMethods(setDockerWorkerNumber) for currently available ones.
where container
should be an object that inherits from DockerContainer
. hardware
is from the class DockerHardware
. workerNumber
indicates how many workers should be run on the cloud. There is no return value for these generics.
The provider needs to make sure the worker number meets the user requirement. If some workers are died due to an unexpected reason, DockerCluster
will call setDockerWorkerNumber
to ask the provider to update the workers.
Not all slots in the container
object need to be used by the cloud provider. Only the slots name
, environment
and image
should be handled. The environment
slot in the container has been configured before passing to the cloud provider as The big picture) shows. However, the worker number is set to 1 for each container. If the cloud provider have enough resources to support multiple workers in one container, the provider should call configWorkerContainerEnv
with the new worker number again to overwrite the previous settings. This can make the worker deployment faster sometimes. It is providers responsibility to make sure the worker number does not exceed the number limitation specified in container$maxWorkerNum
The argument hardware
specifies the hardware for each container. As the hardware is different for each cloud provider, the base class DockerHardware
only contains the most import hardware parameters, that is, it only has the slots cpu
, memory
and id
. Although the cluster does not have any hard restriction on how these three parameters will be explained by the cloud provider, we recommend to explain them as follow
cpu
: The CPU unit used by each worker, 1024 units corresponds to a physical CPU core.memory
: The memory in MB used by each worker.id
: The id of the hardware. It is a character that is only meaningful for the provider. This can be ignored if the cloud provider do not need this slot.getDockerServerIp
needs to return the public/private IP of the container. Its generic is
getGeneric("getDockerServerIp")
#> nonstandardGenericFunction for "getDockerServerIp" defined from package "DockerParallel"
#>
#> function (provider, cluster, verbose)
#> {
#> standardGeneric("getDockerServerIp")
#> }
#> <bytecode: 0x0000000015d818d8>
#> <environment: 0x0000000015d76648>
#> Methods may be defined for arguments: provider
#> Use showMethods(getDockerServerIp) for currently available ones.
The return value should be a name list with four elements publicIp
, publicPort
, privateIp
and privatePort
. If the server does not have the public endpoint, the public IP and port can be NULL
.
The DockerCluster
object will use getDockerWorkerNumbers
to check the worker status. The prototype is
getGeneric("getDockerWorkerNumbers")
#> nonstandardGenericFunction for "getDockerWorkerNumbers" defined from package "DockerParallel"
#>
#> function (provider, cluster, verbose)
#> {
#> standardGeneric("getDockerWorkerNumbers")
#> }
#> <bytecode: 0x0000000015e88d48>
#> <environment: 0x0000000015e71080>
#> Methods may be defined for arguments: provider
#> Use showMethods(getDockerWorkerNumbers) for currently available ones.
The return value is a named list with initializing
and running
elements. Each of them is an integer indicating how many workers are in that status.
Sometimes users might have a running cluster on the cloud and want to reuse the same cluster. The functions dockerClusterExists
and reconnectDockerCluster
are designed to achieve it. The generics are
getGeneric("dockerClusterExists")
#> nonstandardGenericFunction for "dockerClusterExists" defined from package "DockerParallel"
#>
#> function (provider, cluster, verbose)
#> {
#> standardGeneric("dockerClusterExists")
#> }
#> <bytecode: 0x0000000015d31f48>
#> <environment: 0x0000000015d2a108>
#> Methods may be defined for arguments: provider
#> Use showMethods(dockerClusterExists) for currently available ones.
getGeneric("reconnectDockerCluster")
#> nonstandardGenericFunction for "reconnectDockerCluster" defined from package "DockerParallel"
#>
#> function (provider, cluster, verbose)
#> {
#> standardGeneric("reconnectDockerCluster")
#> }
#> <bytecode: 0x0000000016105130>
#> <environment: 0x00000000160fbc58>
#> Methods may be defined for arguments: provider
#> Use showMethods(reconnectDockerCluster) for currently available ones.
Two cluster is considered the same if they have the same jobQueueName
in CloudConfig
. Reconnecting to the same cluster is tricky, the provider needs to store the cluster data in a secret place and recover them later as the user requests. The package provides getDockerStaticData
to quickly extract the data from the cluster. The generic is
getGeneric("getDockerStaticData")
#> nonstandardGenericFunction for "getDockerStaticData" defined from package "DockerParallel"
#>
#> function (x)
#> {
#> standardGeneric("getDockerStaticData")
#> }
#> <bytecode: 0x0000000015da5208>
#> <environment: 0x0000000015d96fa8>
#> Methods may be defined for arguments: x
#> Use showMethods(getDockerStaticData) for currently available ones.
Where x
should be the DockerCluster
object. The method for DockerCluster
will apply the same generic again on cloudConfig
, workerContainer
and serverContainer
to obtain the data in the slots of DockerCluster
.
When reconnecting, the provider can find the extracted data from the cloud and use setDockerStaticData
to recover them. Its generic is
getGeneric("setDockerStaticData")
#> nonstandardGenericFunction for "setDockerStaticData" defined from package "DockerParallel"
#>
#> function (x, staticData)
#> {
#> standardGeneric("setDockerStaticData")
#> }
#> <bytecode: 0x00000000150b6360>
#> <environment: 0x00000000150ce8a0>
#> Methods may be defined for arguments: x
#> Use showMethods(setDockerStaticData) for currently available ones.
The slot cloudConfig
, workerContainer
and serverContainer
in the DockerCluster
object will be recovered. CloudRuntime
should be automatically updated by the function call to getDockerServerIp
and getDockerWorkerNumbers
after reconnectDockerCluster
returns. Therefore, it is expected that the provider is functional once reconnectDockerCluster
returns the control to the DockerCluster
object.
The package provides cleanupDockerCluster
to delete the resources on the cloud when the cluster is not needed. Its generic is
getGeneric("cleanupDockerCluster")
#> nonstandardGenericFunction for "cleanupDockerCluster" defined from package "DockerParallel"
#>
#> function (provider, cluster, deep, verbose)
#> {
#> standardGeneric("cleanupDockerCluster")
#> }
#> <bytecode: 0x0000000015be5410>
#> <environment: 0x0000000015bb8bc0>
#> Methods may be defined for arguments: provider
#> Use showMethods(cleanupDockerCluster) for currently available ones.
where deep
means whether to perform a deep cleanup. This generic will only be called when the cluster has been stopped and all server and workers has been removed.
The provider also supports exporting APIs to the user, it follows the same rule as the container and user can find them in cluster$cloudProvider$...
. Please see Extension to the container) for the details.
We provide a general purpose unit test function for the developers to test their extensions. The test function uses testthat
framework. Since the package needs a provider and a container package to work, the developer needs to define both components in the test file.
provider <- ECSFargateProvider::ECSFargateProvider()
container <- BiocFEDRContainer::BiocFEDRWorkerContainer()
generalDockerClusterTest(
cloudProvider = provider,
workerContainer = container,
workerNumber = 3L,
testReconnect = TRUE)
Please see ?generalDockerClusterTest
for more information,