As part of our data management support, CDS has developed a workflow automation framework for data processing. This allows for 24×7 operations without 24×7 support personnel. The centerpiece of this framework is a program called Islands. Islands is a general-purpose automation system for reliably executing tasks on a cluster of heterogeneous servers. It enables complex process flows to be modeled as distinct logical steps, each with potentially different resource requirements.
These steps are then executed on appropriate nodes based on resource availability, with configurable automatic re-execution on errors. Sequential process flows are supported, as are more complex patterns such as one task spawning many (for independent, parallel execution) and many tasks converging to one (for aggregating or combining results, or for more efficient execution of small tasks). Information about all tasks is stored in a relational database for easy reporting. This allows instantaneous tracking of data receipt, current status, processing times, failure rates, and availability to end users.
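The re-execution and spawn/converge patterns described above can be illustrated with a minimal Python sketch. This is not the Islands API: the names run_with_retries, fan_out_fan_in, and max_attempts are hypothetical, and the sketch runs everything in one process rather than across cluster nodes.

```python
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def run_with_retries(task: Callable[[], T], max_attempts: int = 3) -> T:
    """Run a task, re-executing it on error up to max_attempts times (hypothetical helper)."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except Exception:
            # Give up and surface the error once the configured limit is reached.
            if attempt == max_attempts:
                raise
    raise AssertionError("unreachable")


def fan_out_fan_in(
    items: Iterable[T],
    step: Callable[[T], R],
    combine: Callable[[List[R]], R],
    max_attempts: int = 3,
) -> R:
    """One task spawns many independent sub-tasks, then their results converge to one."""
    # Fan out: each item becomes its own sub-task with its own retry budget.
    results = [
        run_with_retries(lambda item=item: step(item), max_attempts)
        for item in items
    ]
    # Fan in: a single aggregating task combines the results.
    return combine(results)
```

For example, fan_out_fan_in([1, 2, 3], lambda x: x * x, sum) squares each item as a separate sub-task and then sums the results. A real scheduler would dispatch the sub-tasks to different nodes and record each attempt in its task database; the sketch only shows the control flow.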
Unlike schedulers that are geared towards large map and reduce tasks, Islands is equally well suited to millions of extremely short tasks per day and to longer-running tasks. Additionally, Islands supports non-homogeneous computing clusters through the concept of named resources. Rather than each node simply having a fixed number of “slots” for job execution, Islands allocates tasks with finer resolution based on actual resource requirements. Commonly used resources are CPU, RAM, and I/O bandwidth, but user-defined named resources are also supported.
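To make the contrast with fixed slots concrete, here is a minimal Python sketch of placement by named resources. It is an illustration under assumptions, not how Islands is implemented: the Node class, the place function, the first-fit policy, and resource names such as "cpu", "ram_gb", and "gpu" are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Node:
    """A server advertising free amounts of arbitrarily named resources (hypothetical model)."""
    name: str
    free: Dict[str, float]

    def can_run(self, needs: Dict[str, float]) -> bool:
        # A resource the node does not list is treated as having zero available.
        return all(self.free.get(res, 0) >= amount for res, amount in needs.items())

    def reserve(self, needs: Dict[str, float]) -> None:
        for res, amount in needs.items():
            self.free[res] = self.free.get(res, 0) - amount

    def release(self, needs: Dict[str, float]) -> None:
        for res, amount in needs.items():
            self.free[res] = self.free.get(res, 0) + amount


def place(needs: Dict[str, float], nodes: List[Node]) -> Optional[Node]:
    """First-fit placement: reserve on the first node that satisfies every named requirement."""
    for node in nodes:
        if node.can_run(needs):
            node.reserve(needs)
            return node
    return None  # No node currently has enough of every resource; the task must wait.


if __name__ == "__main__":
    cluster = [
        Node("small", {"cpu": 2, "ram_gb": 4}),
        Node("big", {"cpu": 8, "ram_gb": 64, "gpu": 1}),  # "gpu" is a user-defined resource
    ]
    chosen = place({"cpu": 1, "gpu": 1}, cluster)
    print(chosen.name if chosen else "queued")
```

Because requirements are expressed per resource, a node can run many lightweight tasks or a few heavy ones without the operator having to pick a single slot count in advance.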