Towards Visual Feature Selection for Multivariate Time Series Data
Magdeburg, Univ., Master Thesis, 2017
Time series analysis and modeling are essential tools for transferring knowledge across time, a task also called forecasting. This often involves identifying the smallest number of features that are most useful for building a model that accurately forecasts a target without suffering from the curse of dimensionality. This is challenging because time series exhibit many different characteristics that a model needs to capture. Traditional wrapper approaches are bound to the actual learning algorithm that builds the model, which makes them computationally expensive and limits their range of application. Filter methods are independent of the future model, but they mostly take the form of black-box algorithms, which do not allow analysts to monitor and interactively guide the feature selection. In this thesis, we advance the filter concept for multivariate time series by exploiting human perception and interpretation abilities to evaluate the quality of a feature subset independently of the model. To ensure this independence, we derive a quality criterion from a general assumption about the relationship between input and output in a valid model. An overview visualization enables analysts to visually assess whether this assumption holds and to steer the analysis towards regions of interest where the quality of the feature subset is insufficient. Critical regions can be analyzed in detail using a surrounding system of linked views. The findings feed into an interactive refinement of the feature subset, which can also incorporate the analyst's expertise. We evaluate the proposed method by applying it to real-world sensor data and to an artificial time-oriented data set. The analyst was able to quickly distinguish well-explained regions of the feature space from critical ones, for which the search for an additional explanatory feature could be tackled straightaway.
Due to visualization constraints, the approach can handle only two-dimensional feature subsets, each of which serves as the input to a single feature selection iteration. Still, it might be an inspiring step towards universal dimensionality reduction that makes use of human strengths.