Generic Schema Descriptions for Comma-Separated Values Files of Environmental Data
Geospatial Technologies for All
Conference on Geographic Information Science (AGILE) <21, 2018, Lund, Sweden>
Comma-Separated Values (CSV) files are commonly used to publish data about environmental phenomena and environmental sensor measurements. Due to its simplicity, this format has many advantages. At the same time, however, there is no official standard for CSV and no way to specify schematic constraints or other metadata. As a result, CSV files come in many variations and often without metadata that would support interpretation or further processing, analysis and visualization. In this paper, we propose a framework for the specification of schema descriptions for CSV files as they are used in the environmental sciences. It allows constraining the structure and content of a CSV file and also specifying relations between files, for example when they are published in one data package. The framework is extensible, including to other spatial data formats such as GeoTIFF. The schema descriptions are encoded in JSON or XML so that they can be published on the Web as a supplement to the data. The framework is a lightweight solution that provides the metadata required to publish OGC-compliant services from CSV files. It helps to overcome the heterogeneity among data providers when exchanging environmental measurement data on the Web.
Keywords: tabular data, generic schema language, CSV, comma separated values, metadata.
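To illustrate the general idea of constraining a CSV file with a JSON-encoded schema description, the following is a minimal sketch only. It does not reproduce the framework's actual schema vocabulary: the keys `delimiter`, `columns`, `name`, `type` and `required`, the sample column names, and the function `validate_csv` are all hypothetical and chosen for illustration.

```python
import csv
import io
import json

# Hypothetical schema description. The keys used here are illustrative
# assumptions, not the vocabulary defined by the framework in the paper.
SCHEMA_JSON = """
{
  "delimiter": ",",
  "columns": [
    {"name": "station", "type": "string", "required": true},
    {"name": "timestamp", "type": "string", "required": true},
    {"name": "temperature", "type": "number", "required": false}
  ]
}
"""


def _matches_type(value, type_name):
    """Return True if the raw CSV cell can be read as the declared type."""
    if type_name == "number":
        try:
            float(value)
            return True
        except ValueError:
            return False
    return True  # any text is accepted for the "string" type


def validate_csv(text, schema):
    """Check CSV text against the schema and return a list of error messages."""
    rows = list(csv.reader(io.StringIO(text), delimiter=schema["delimiter"]))
    if not rows:
        return ["file is empty"]

    expected = [col["name"] for col in schema["columns"]]
    if rows[0] != expected:
        return [f"header mismatch: expected {expected}, got {rows[0]}"]

    errors = []
    for line_no, row in enumerate(rows[1:], start=2):
        if len(row) != len(expected):
            errors.append(f"line {line_no}: expected {len(expected)} fields, got {len(row)}")
            continue
        for col, value in zip(schema["columns"], row):
            if value == "":
                # Empty cells are only an error for required columns.
                if col["required"]:
                    errors.append(f"line {line_no}: missing value for '{col['name']}'")
                continue
            if not _matches_type(value, col["type"]):
                errors.append(f"line {line_no}: '{value}' is not a valid {col['type']}")
    return errors


if __name__ == "__main__":
    schema = json.loads(SCHEMA_JSON)
    sample = "station,timestamp,temperature\nA1,2018-06-12T10:00:00Z,21.5\n"
    print(validate_csv(sample, schema))
```

Keeping the schema in a separate JSON document, rather than in the CSV file itself, reflects the idea of publishing the description as a supplement to the data.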
Vector Based Web Visualization of Geospatial Big Data
Darmstadt, TU, Bachelor Thesis, 2018
Today, big data is one of the most challenging topics in computer science. To give customers, developers or domain experts an overview of their data, it needs to be visualized. They need to explore their data using visualization technologies at a high level but also in detail. As a base technology, visualizations can support more complex data analytics tasks. If the data contains geospatial information, this becomes more difficult, because nearly every user already has well-established expectations of how geographic information is explored. Map applications provide an interface in which users can zoom and pan over the whole world. This thesis focuses on evaluating one approach to visualizing huge sets of geospatial data in modern web browsers. The contribution of this work is to make it possible to render more than one million polygons in a modern web application by using 2D vector tiles. Another major challenge is the web application itself, which provides interaction features such as data-driven filtering and styling of vector data for intuitive data exploration. A key concern is memory management in modern web browsers and its limitations.
A Modular Software Architecture for Processing of Big Geospatial Data in the Cloud
Computers & Graphics
In this paper we propose a software architecture that allows for processing of large geospatial data sets in the cloud. Our system is modular and flexible and supports multiple algorithm design paradigms such as MapReduce, in-memory computing or agent-based programming. It contains a web-based user interface where domain experts (e.g. GIS analysts or urban planners) can define high-level processing workflows using a domain-specific language (DSL). The workflows are passed through a number of components including a parser, an interpreter, and a service called the job manager. These components use declarative and procedural knowledge encoded in rules to generate a processing chain specifying the execution of the workflows on a given cloud infrastructure according to the constraints defined by the user. The job manager evaluates this chain, spawns processing services in the cloud and monitors them. The services communicate with each other through a distributed file system that is scalable and fault-tolerant. Compared to previous work describing cloud infrastructures and architectures, we focus on the processing of big heterogeneous geospatial data. In addition, we do not rely on one specific programming model or a certain cloud infrastructure but support several. Combined with the possibility to control the processing through DSL-based workflows, this makes our architecture very flexible and configurable. We see the cloud not only as a means to store and distribute large data sets but also as a way to harness the processing power of distributed computing environments for large-volume geospatial data sets. The proposed architecture design has been developed for the IQmulus research project funded by the European Commission. The paper concludes with the evaluation results from applying our solution to two example workflows from this project.
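The step from a high-level workflow to a processing chain can be pictured with a small sketch. This is an illustration under stated assumptions, not the architecture's actual rule system or DSL: the classes `Step` and `Task`, the `RULES` table, the volume thresholds and the paradigm label "single-process" are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One high-level workflow step (hypothetical model)."""
    action: str
    data_volume_gb: float


@dataclass(frozen=True)
class Task:
    """One entry of the generated processing chain (hypothetical model)."""
    action: str
    paradigm: str


# Ordered rules: the first condition that holds decides the paradigm.
# The thresholds are arbitrary illustrative values.
RULES = [
    (lambda step: step.data_volume_gb >= 100, "mapreduce"),
    (lambda step: step.data_volume_gb >= 1, "in-memory"),
    (lambda step: True, "single-process"),  # fallback rule
]


def build_chain(workflow):
    """Turn workflow steps into a processing chain by applying the rules."""
    chain = []
    for step in workflow:
        for condition, paradigm in RULES:
            if condition(step):
                chain.append(Task(step.action, paradigm))
                break
    return chain


if __name__ == "__main__":
    workflow = [Step("tile", 500.0), Step("filter", 4.0), Step("export", 0.1)]
    for task in build_chain(workflow):
        print(task.action, "->", task.paradigm)
```

Because the rules are data rather than code paths, the same workflow could be mapped to a different chain by swapping the rule table, which is one way to read the abstract's point about not depending on a single programming model.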
Rule-based Process Orchestration
Campus Gießen, TH Mittelhessen, Master Thesis, 2015
The constant improvement of remote-sensing technologies, along with the emergence of new technologies for gathering geo-related information, has led over the last decades to an exponential growth of geospatial data and of the range of potential applications. New technologies are required to address the challenges that arise not only from the variety but, more importantly, from the volume of so-called Big Data. This thesis presents a modular and flexible software architecture that aims to address these issues through the rule-based generation of dynamic process chains with respect to properties such as the type and volume of data and to the efficient utilisation of the available processing infrastructure. The use of an abstract workflow model enables processing steps to be defined in a concise and comprehensible manner without the need to consider efficient execution. After the exploration and evaluation of related solutions, a precise analysis of the context is carried out. In this way, all interacting systems and data sources are identified and considered during the subsequent design of the architecture. Furthermore, the requirements and quality goals, such as high availability and scalability, are set out. To prove the feasibility of the technology-independent architecture, a prototypical implementation is presented, which can also serve as groundwork for further development. Finally, real-world scenarios are used to show how such a system behaves under conditions simulating practical use, which allows a discussion of the strengths and weaknesses of this approach.