Virtual Clusters on Federated Cloud Sites (VCOC)

The main objective of the VCOC (Virtual Clusters on federated Cloud sites) experiment is to evaluate the feasibility of using multiple Cloud environments to deploy services that need a large pool of CPUs or virtual machines allocated to a single user, for either High Throughput Computing or High Performance Computing. As an example of such a service, the experiment deploys virtual clusters to run an application developed in the eIMRT project, which uses Monte Carlo methods to calculate the dose for radiotherapy treatments (see the video).

The experiment tries to answer several questions about the use of virtual clusters in distributed Cloud environments. A first set of questions concerns the time that the deployment and enlargement of such a cluster need to become operational, and the influence that other simultaneous operations have on this process. The objective is to better understand how to manage these virtual clusters so that the time to solution, or latency (that is, the time from when the cluster is requested until the end of the service), can be guaranteed. To perform this experiment, several scripts have been developed that configure a virtual cluster from virtual machines running on top of the BonFIRE infrastructure. In addition, to study the elasticity of such a cluster, a set of application probes has been developed. These probes provide the information used to trigger the enlargement of the cluster when it is needed to guarantee the quality of service.
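The kind of decision that such probes could feed can be sketched as follows. This is only an illustration, not the VCOC scripts: the function name, its parameters, and the assumption that the workload splits into independent jobs processed at a uniform rate per node (plausible for a Monte Carlo application) are all hypothetical. The deployment time of new nodes, which is one of the quantities the experiment measures, reduces the time in which added nodes can contribute.

```python
import math


def nodes_to_add(remaining_jobs, jobs_per_node_hour, hours_to_deadline,
                 current_nodes, deploy_hours=0.0):
    """Estimate how many extra nodes are needed to finish before the deadline.

    Hypothetical model: independent jobs, identical nodes, constant throughput.
    """
    if remaining_jobs <= 0:
        return 0
    if jobs_per_node_hour <= 0:
        raise ValueError("jobs_per_node_hour must be positive")

    # Work the nodes already in the cluster can complete before the deadline.
    done_by_current = current_nodes * jobs_per_node_hour * hours_to_deadline
    shortfall = remaining_jobs - done_by_current
    if shortfall <= 0:
        return 0  # the current cluster is already large enough

    # New nodes only contribute once their deployment has finished.
    useful_hours = hours_to_deadline - deploy_hours
    if useful_hours <= 0:
        raise ValueError("new nodes would become operational after the deadline")

    return math.ceil(shortfall / (jobs_per_node_hour * useful_hours))
```

For example, under these assumptions, with 1000 jobs remaining, 10 nodes each completing 10 jobs per hour, 5 hours to the deadline and 1 hour to deploy a node, `nodes_to_add(1000, 10, 5, 10, 1)` returns 13. The same estimate applies when part of a cluster is lost: `current_nodes` is then the number of nodes that survive.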

 


A second set of experiments will investigate how the distributed capability of Cloud providers can be used to protect the service against failures. A virtual cluster will be deployed divided into two sets, and the characteristics of the network (latency, bandwidth and packet loss rate) will be changed to study their effects on the performance of the cluster. The extreme situation of losing part of the cluster will also be simulated, in which the surviving site must recover from the loss to guarantee that the customer receives the solution on time.

Added-value of the BonFIRE infrastructure

BonFIRE facilities are unique in the Cloud field. They provide capabilities that are difficult for small research centres or SMEs to implement locally, and with which software can be tested and new concepts tried out. Although deploying open software stacks for Cloud inside one organization is relatively easy, building a stable and productive infrastructure for Cloud experiments is still complex and consumes a lot of effort. Commercial Clouds are not well suited to this kind of experiment either: the infrastructure is shared with production workloads and the level of isolation is not always guaranteed, which is a problem for experiments in which the factors must be controlled in order to reach correct conclusions.

This experiment needs some of BonFIRE's unique capabilities:
  • The multi-site environment. The experiment will gather very valuable information about how to deploy and manage virtual clusters that span several sites, and about the influence of requesting several deployments simultaneously.
  • A testbed where the experimenter can control network parameters such as bandwidth, latency or errors. This facility will allow the experimenter to study the effects of the network on the deployment and management of a distributed virtual cluster.

Impact on the BonFIRE project

This experiment uses most of the components of the BonFIRE infrastructure and will help to improve the final user experience. It has submitted a large number of new requirements to the project, which are being implemented and deployed. In addition, because the experiment measures the time needed to deploy and enlarge a set of machines, BonFIRE can use this information to detect bottlenecks.