Executing Cyclic Scientific Workflows in the Cloud
Journal of Cloud Computing
We present an algorithm and a software architecture for a cloud-based system that executes cyclic scientific workflows whose structure may change during run time. Existing approaches either rely on workflow definitions based on directed acyclic graphs (DAGs) or require workarounds to implement cyclic structures. In contrast, our system supports cycles natively, avoids workarounds, and thus reduces the complexity of workflow modelling and maintenance. Our algorithm traverses workflow graphs and transforms them iteratively into linear sequences of executable actions. We call these sequences process chains. Our software architecture distributes the process chains to multiple compute nodes in the cloud and oversees their execution. We evaluate our approach by applying it to two practical use cases from the domains of astronomy and engineering. We also compare it with two existing workflow management systems. The evaluation demonstrates that our algorithm is able to execute dynamically changing workflows with cycles and that the design and maintenance of complex workflows is easier than with existing solutions. It also shows that our software architecture can run process chains on multiple compute nodes in parallel to significantly speed up the workflow execution. An implementation of our algorithm and the software architecture is available with the Steep Workflow Management System that we released under an open-source license. The resources for the first practical use case are also available as open source for reproduction.
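The core idea of the iterative transformation can be illustrated with a minimal sketch (this is not Steep's actual implementation; the data model and function names are assumptions for illustration): in each iteration, all actions whose inputs are already available are collected into a process chain, their outputs are marked as available, and the remaining graph is re-examined. Because the graph is revisited after every iteration, cycles and run-time structure changes do not require the workflow to be a DAG up front.

```python
# Illustrative sketch (NOT Steep's actual algorithm): iteratively peel
# executable actions off a workflow graph and group them into process
# chains. An action is a pair (inputs, outputs) of data item names.

def build_process_chains(actions, available):
    """actions: dict name -> (inputs, outputs); available: set of data items.
    Returns a list of process chains (lists of action names)."""
    chains = []
    remaining = dict(actions)
    while remaining:
        # find all actions whose inputs are satisfied by the data we have
        ready = [n for n, (ins, _) in remaining.items()
                 if all(i in available for i in ins)]
        if not ready:
            break  # the rest must wait for results produced at run time
        chains.append(ready)  # one chain per iteration (simplified)
        for n in ready:
            available |= set(remaining.pop(n)[1])
    return chains
```

In a real system each chain would be dispatched to a compute node; a cyclic dependency simply leaves actions in `remaining` until a later iteration, after new results have been registered, makes them ready.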
Function as a Service for the Storage of Large Geospatial Data
TU Darmstadt, Master's Thesis, 2020
Applications grow over time. While they are usually small at the beginning, more and more features are added over the years. At some point, this leads to a heavy monolithic system. On the other hand, there is a trend towards deploying applications in smaller units. The most recent stage of this development is Function as a Service (FaaS). It uses isolated, short-running and stateless functions. This reduces the complexity of individual functions compared to the entire monolithic application. Furthermore, a framework can scale the functions up and down as needed. To run an existing monolithic application as FaaS, it has to be adjusted. This thesis presents a strategy for the migration process. It analyzes the processing flow in the monolithic application and defines criteria for a division into functions. The suitability of the process is demonstrated based on an existing application. For this purpose, two functional concepts with different focuses are developed. The first one preserves backwards compatibility with the monolithic application and thus allows a flexible change between monolithic operation and FaaS execution. The second concept focuses on a fine-grained division of functions. Here, compatibility with the monolithic application is lost, but the implementation becomes more flexible and can be extended more easily. To execute the designed functions, an improved scaling metric is presented. It is based on the number of outstanding function requests and integrates into the underlying FaaS framework. The evaluation shows that the presented functional concepts are suitable for processing real-world data. Both concepts lead to a speedup in processing compared to the monolithic application. However, this performance gain is accompanied by increased resource consumption, so the use of a FaaS-based solution must be weighed up depending on the situation.
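A scaling metric driven by outstanding requests can be sketched as follows (a minimal illustration, not the thesis's actual implementation; the function name and parameters are assumptions): the desired number of function replicas is the number of queued and in-flight requests divided by a target load per replica, clamped to the framework's limits.

```python
# Hedged sketch of an outstanding-request scaling metric. All names and
# parameters here are illustrative, not taken from the thesis.
import math

def desired_replicas(outstanding, target_per_replica, current, max_replicas):
    """Scale so that each replica handles at most target_per_replica
    outstanding (queued + in-flight) requests."""
    if outstanding == 0:
        return min(current, 1)  # idle: scale down, keep one warm replica
    wanted = math.ceil(outstanding / target_per_replica)
    return max(1, min(wanted, max_replicas))
```

Compared with CPU-based metrics, such a queue-length metric reacts directly to demand: for example, 25 outstanding requests with a target of 10 per replica yield 3 desired replicas, regardless of how busy the current replicas appear.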
Scalable Processing of Massive Geodata in the Cloud: Generating a Level-of-Detail Structure Optimized for Web Visualization
Full paper Proceedings of the 23rd AGILE Conference on Geographic Information Science
Conference on Geographic Information Science (AGILE) <23, 2020, Chania, Crete, Greece>
We present a cloud-based approach to transform arbitrarily large terrain data to a hierarchical level-of-detail structure that is optimized for web visualization. Our approach is based on a divide-and-conquer strategy. The input data is split into tiles that are distributed to individual workers in the cloud. These workers apply a Delaunay triangulation with a maximum number of points and a maximum geometric error. They merge the results and triangulate them again to generate less detailed tiles. The process repeats until a hierarchical tree of different levels of detail has been created. This tree can be used to stream the data to the web browser. We have implemented this approach in the frameworks Apache Spark and GeoTrellis. Our paper includes an evaluation of our approach and the implementation. We focus on scalability and runtime but also investigate bottlenecks, possible reasons for them, as well as options for mitigation. The results of our evaluation show that our approach and implementation are scalable and that we are able to process massive terrain data.
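The bottom-up construction of the level-of-detail tree can be sketched as follows (a simplified illustration, not the paper's Spark/GeoTrellis implementation; `simplify` is a hypothetical stand-in for the Delaunay triangulation with a point budget): leaf tiles are simplified first, then merged pairwise and simplified again, level by level, until a single root tile remains.

```python
# Illustrative sketch of the bottom-up LOD build. simplify() is a
# hypothetical placeholder for the paper's Delaunay triangulation with
# a maximum point count; here it just keeps an evenly spaced subset.

def simplify(points, max_points):
    if len(points) <= max_points:
        return list(points)
    step = len(points) / max_points
    return [points[int(i * step)] for i in range(max_points)]

def build_lod(tiles, max_points):
    """tiles: list of point lists. Returns levels from finest to coarsest,
    ending with a single root tile."""
    level = [simplify(t, max_points) for t in tiles]
    levels = [level]
    while len(level) > 1:
        # merge neighbouring tiles pairwise, then simplify each merge result
        merged = [level[i] + level[i + 1] if i + 1 < len(level) else level[i]
                  for i in range(0, len(level), 2)]
        level = [simplify(t, max_points) for t in merged]
        levels.append(level)
    return levels
```

In the distributed setting, each `simplify` call corresponds to an independent worker task, which is what makes the approach scale: every level halves the number of tiles, and all tiles of a level can be processed in parallel.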