ProBGP: Progressive Visual Analytics of Live BGP Updates
EuroVis 2021. 23rd Eurographics / IEEE VGTC Conference on Visualization 2021
Eurographics / IEEE VGTC Conference on Visualization (EuroVis) <23, 2021, online>
The global routing network is the backbone of the Internet. However, it is quite vulnerable to attacks that cause major disruptions or routing manipulations. Prior related works have visualized routing path changes with node-link diagrams, but judging whether a routing change between autonomous systems is suspicious requires strong domain expertise. Geographic visualization has an advantage over conventional node-link diagrams in uncovering such suspicious routes, as the user can immediately see whether a path is the shortest path to the target or an unreasonable detour. In this paper, we present ProBGP, a web-based progressive approach to visually analyze BGP update routes. We created a novel progressive data processing algorithm for the geographic approximation of autonomous systems and combined it with a progressively updating visualization. While the newest log data is continuously loaded, our approach also allows querying the entire log recordings since 1999. We demonstrate the usefulness of our approach with a real use case of a major route leak from June 2019. We report on multiple interviews with domain experts throughout the development. Finally, we evaluated our algorithm quantitatively against a public peering database and qualitatively against AS network maps.
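The detour idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the ProBGP algorithm: it assumes each autonomous system on a path has already been approximated by a single (latitude, longitude) point, and the names `haversine_km` and `detour_ratio` are hypothetical helpers introduced only for this example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def detour_ratio(path):
    """Travelled length of a path divided by the direct distance between its endpoints.

    `path` is a list of (lat, lon) points, one per hop (assumed AS locations).
    A ratio close to 1 means a direct route; a large ratio hints at a detour.
    """
    travelled = sum(haversine_km(p, q) for p, q in zip(path, path[1:]))
    direct = haversine_km(path[0], path[-1])
    if direct == 0:
        return 1.0 if travelled == 0 else float("inf")
    return travelled / direct


# Hypothetical example: Frankfurt -> New York -> Paris
print(detour_ratio([(50.1, 8.7), (40.7, -74.0), (48.9, 2.35)]))
```

A fixed threshold on this ratio would be a crude heuristic at best; the abstract describes the geographic view as something the analyst inspects visually.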
Information Visualization Interface on Home Router Traffic Data for Laypersons
Proceedings of the Working Conference on Advanced Visual Interfaces AVI 2020
International Conference on Advanced Visual Interfaces (AVI) <2020, online>
With the aim of increasing everyday internet users' awareness of their own home network traffic, we present two interactive visualization interfaces for the visual exploration of home router traffic records. We differentiate between users who are intrinsically motivated by the topic and those who are not. Gamification in the first interface is used to maintain the motivation of the first type of user, while the storytelling concept based on the hero's journey in the second interface aims at increasing the perceived incentives for the second user group.
NetCapVis: Web-based Progressive Visual Analytics for Network Packet Captures
IEEE Symposium on Visualization for Cyber Security (VizSec) <16, 2019>
Network traffic log data is a key data source for forensic analysis of cybersecurity incidents. Packet Captures (PCAPs) are the raw information directly gathered from the network device. As the bandwidth and the number of connections to other hosts rise, this data quickly becomes very large. Malware analysts and administrators frequently use this data for their analysis. However, Wireshark, currently the most widely used tool, displays the data as a table, making it difficult to get an overview and focus on the significant parts. Also, the process of loading large files into Wireshark takes time and has to be repeated each time the file is closed. We believe that this problem poses an optimal setting for a client-server infrastructure with a progressive visual analytics approach. The processing can be outsourced to the server while the client is progressively updated. In this paper we present NetCapVis, a web-based progressive visual analytics system where the user can upload PCAP files, set initial filters to reduce the data before uploading, and then instantly interact with the data while the rest is progressively loaded into the visualizations.
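The progressive principle described above (partial results become available while the remaining data is still being processed) can be sketched as follows. This is only an illustration under simplifying assumptions, not the NetCapVis implementation: packets are assumed to be already parsed into dictionaries with a `protocol` key, and `progressive_counts` is a hypothetical name.

```python
from collections import Counter
from itertools import islice


def progressive_counts(packets, chunk_size=1000):
    """Yield a running per-protocol packet count after each processed chunk.

    `packets` is any iterable of dicts with a "protocol" key (an assumed,
    already-parsed representation; no real PCAP parsing happens here).
    """
    iterator = iter(packets)
    counts = Counter()
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            break
        counts.update(p["protocol"] for p in chunk)
        # A client could redraw its charts from each intermediate snapshot.
        yield dict(counts)


sample = [{"protocol": "TCP"}, {"protocol": "UDP"}, {"protocol": "TCP"}]
for snapshot in progressive_counts(sample, chunk_size=2):
    print(snapshot)
```

Each yielded snapshot is a complete but preliminary aggregate, so a view built on it can be used immediately and refined as later chunks arrive.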
Using Dashboard Networks to Visualize Multiple Patient Histories: A Design Study on Post-operative Prostate Cancer
IEEE Transactions on Visualization and Computer Graphics
In this design study, we present a visualization technique that segments patients' histories instead of treating them as raw event sequences, aggregates the segments using criteria such as the whole history or treatment combinations, and then visualizes the aggregated segments as static dashboards that are arranged in a dashboard network to show longitudinal changes. The static dashboards were developed in nine iterations, to show 15 important attributes from the patients' histories. The final design was evaluated with five non-experts, five visualization experts and four medical experts, who successfully used it to gain an overview of a 2,000 patient dataset, and to make observations about longitudinal changes and differences between two cohorts. The research represents a step-change in the detail of large-scale data that may be successfully visualized using dashboards, and provides guidance about how the approach may be generalized.
Visual-Interactive Identification of Anomalous IP-Block Behavior Using Geo-IP Data
IEEE Symposium on Visualization for Cyber Security (VizSec) <15, 2018, Berlin, Germany>
Routing of network packets from one computer to another is the backbone of the internet and impacts the everyday life of many people. Although this is a fully automated process, it has many security issues. IP hijacks and misconfigurations occur very often and are difficult to detect. In the past, visual analytics approaches aimed at detecting these phenomena, but only a few of them integrated geographical references. Geo-IP data is mostly used as a lookup table, which undervalues its capabilities. In this paper we present a visual-interactive system which relies only on Geo-IP data to create more awareness of this data source. We show that looking at Geo-IP data over time in combination with owner and location information of IP blocks already reveals suspicious cases. Together with our design study we also contribute a pre-processing algorithm for the Maxmind GeoIP2 City and ISP databases, to motivate the community to integrate this data source in future approaches.
Visual-Interactive Learning of Time Series Similarity
Darmstadt, TU, Master Thesis, 2017
Similarity is important for the applicability of a series of data analysis tasks. Pattern recognition, clustering and nearest neighbor search require a meaningful similarity function for their functionality. In this work, we consider a similarity function to be meaningful if it reflects the similarity notion in the minds of the users. Therefore, the design of a similarity function has to incorporate and reflect users' preferences. In this work we focus on the similarity for time-oriented data. The definition of similarity functions for this data type requires a cascade of routines, including preprocessing steps, descriptors, normalization steps, and distance measures. Referring to this cascade, manually choosing appropriate routines matching the user's expectations of time series similarity is a tedious process. We present a visual-interactive approach that identifies meaningful similarity functions for time-oriented data automatically. The core principle is to learn from user-defined labels about the similarity of pairs of time series. Automatic choice of fitting routines allows matching the similarity notion of the users at run time. We implement a labeling interface for pairwise time series, including active learning support to enhance the learning process. Different views allow the analysis of the learning process of similarity for time series data. A list-based ranking interface provides detailed information on the best performing similarity functions. Filtering interfaces allow for detailed analysis of the applicability of routines included in the similarity functions. Furthermore, they provide drill-down functionality that can be used to experiment with different sets of similarity functions, in order to increase the robustness of the prediction. Nearest neighbor search closes the feedback loop and enables the users to validate whether the defined similarity function complies with their notion of similarity.
In addition, we demonstrate the applicability of the approach in case studies based on different pre-defined notions of similarity used for labeling. Finally, we evaluate our approach to determine which factors influence the prediction accuracy. In conclusion, our approach extends the classical user-centered and iterative design process to an online learning process that defines similarity functions based on user feedback. We report on an increase in efficiency from a tedious design process for similarity functions, down to a process that only takes minutes of expert time.
Visual-Interactive Similarity Search for Complex Objects by Example of Soccer Player Analysis
IVAPP 2017. Proceedings
International Conference on Information Visualization Theory and Applications (IVAPP) <8, 2017, Porto, Portugal>
The definition of similarity is a key prerequisite when analyzing complex data types in data mining, information retrieval, or machine learning. However, the meaningful definition is often hampered by the complexity of data objects and particularly by different notions of subjective similarity latent in targeted user groups. Taking the example of soccer players, we present a visual-interactive system that learns users' mental models of similarity. In a visual-interactive interface, users are able to label pairs of soccer players with respect to their subjective notion of similarity. Our proposed similarity model automatically learns the respective concept of similarity using an active learning strategy. A visual-interactive retrieval technique is provided to validate the model and to execute downstream retrieval tasks for soccer player analysis. The applicability of the approach is demonstrated in different evaluation strategies, including usage scenarios and cross-validation tests.
Towards Combining Attribute-Based and Time Series-Based Visual Querying
EuroVis 2016. Eurographics / IEEE Symposium on Visualization 2016: Posters
Eurographics Conference on Visualization (EuroVis) <18, 2016, Groningen, The Netherlands>
We present a concept for the visual-interactive definition of meaningful subsets in data sets comprising multivariate attributes and time series data. Based on a generalization of requirements of a real-world user group, we propose a three-stage approach, combining visual-interactive querying, query filter analysis, and result exploration. The approach includes several design parameters that can easily be adapted in future design studies for alternative applications.
Visual-Interactive Exploration of Relations Between Time-Oriented Data and Multivariate Data
International EuroVis Workshop on Visual Analytics (EuroVA) <7, 2016, Groningen, The Netherlands>
The analysis of large, multivariate data sets is challenging, especially when some of these data objects are time-oriented. Exploring relationships between multivariate and temporal information, e.g., to identify patterns that support decision making, is an important industrial analysis task. The target group of this design study are data analysts aiming at detecting fault patterns in a telecommunications network in order to spend maintenance budget more effectively. We present a visual analytics tool that provides overviews of multivariate data sets and associated time series. Users can select data subsets of interest in both attribute data and clustered time series data. Linked views consequently support the identification of relations between the two spaces. To ensure usefulness, the tool was designed in an iterative way, based on a careful characterization of the data, users, and tasks. A usage scenario demonstrates the applicability of the approach.
A Visual Active Learning System for the Assessment of Patient Well-Being in Prostate Cancer Research
Proceedings of the 2015 Workshop on Visual Analytics in Healthcare
Workshop in Visual Analytics in Healthcare (VAHC) <2015, Chicago, IL, USA>
The assessment of patient well-being is highly relevant for the early detection of diseases, for assessing the risks of therapies, or for evaluating therapy outcomes. The knowledge to assess a patient's well-being is actually tacit knowledge and thus can only be used by the physicians themselves. The rationale of this research approach is to use visual interfaces to capture the mental models of experts and make them available more explicitly. We present a visual active learning system that enables physicians to label the well-being state of patient histories suffering from prostate cancer. The labeled instances are iteratively learned in an active learning approach. In addition, the system provides models and visual interfaces for a) estimating the number of patients needed for learning, b) suggesting meaningful learning candidates and c) visual feedback on test candidates. We present the results of two evaluation strategies that prove the validity of the applied model. In a representative real-world use case, we learned the feedback of physicians on a data collection of more than 16,000 prostate cancer histories.
A Visual-Interactive System for Prostate Cancer Cohort Analysis
IEEE Computer Graphics and Applications
Data-centered research is becoming increasingly important in prostate cancer research, where a long-term goal is a sound prognosis prior to surgery. The proposed visual-interactive system, developed in close collaboration with medical researchers, helps physicians efficiently and effectively visualize single and multiple patient histories at a glance.
Towards a User-Defined Visual-Interactive Definition of Similarity Functions for Mixed Data
IEEE Conference on Visual Analytics Science and Technology. Proceedings
IEEE Symposium on Visual Analytics Science and Technology (VAST) <9, 2014, Paris, France>
The creation of similarity functions based on visual-interactive user feedback is a promising means to capture the mental similarity notion in the heads of domain experts. In particular, concepts exist where users arrange multivariate data objects on a 2D data landscape in order to learn new similarity functions. While systems that incorporate numerical data attributes have been presented in the past, the remaining overall goal may be to develop systems for mixed data sets as well. In this work, we present a feedback model for categorical data which can be used alongside numerical feedback models in the future.
User-Based Visual-Interactive Similarity Definition for Mixed Data Objects - Concept and First Implementation
WSCG 2014. Communication Papers Proceedings
International Conference in Central Europe on Computer Graphics, Visualization and Computer Vision (WSCG) <22, 2014, Plzen, Czech Republic>
The definition of similarity between data objects plays a key role in many analytical systems. The process of similarity definition comprises several challenges as three main problems occur: different stakeholders, mixed data, and changing requirements. Firstly, in many applications the developers of the analytical system (data scientists) model the similarity, while the users (domain experts) have distinct (mental) similarity notions. Secondly, the definition of similarity for mixed data types is challenging. Thirdly, many systems use static similarity models that cannot adapt to changing data or user needs. We present a concept for the development of systems that support the visual-interactive similarity definition for mixed data objects emphasizing 15 crucial steps. For each step different design considerations and implementation variants are presented, revealing a large design space. Moreover, we present a first implementation of our concept, enabling domain experts to express mental similarity notions through a visual-interactive system. The provided implementation tackles the different-stakeholders problem, the mixed data problem, and the changing requirements problem. The implementation is not limited to a specific mixed data set. However, we show the applicability of our implementation in a case study where a functional similarity model is trained for countries as objects.
User-centered Interactive Similarity Definition for Complex Data Objects
Darmstadt, TU, Bachelor Thesis, 2014
The definition of similarity between data objects plays a key role for the applicability of many analytical systems. Similarity measures are used for prominent data analysis tasks like nearest neighbor search, clustering, or pattern recognition. These tasks are applied in many scientific domains like Information Retrieval, Data Mining, Machine Learning, Information Visualization and Visual Analytics. The data used for the calculation of similarity can either be of uniform attribute type (like numerical, ordinal, categorical or binary) or consist of combinations thereof (mixed data). The process of similarity definition comprises several challenges which I aim to tackle in this work. To start with, in many applications the developers (data experts) of the analytical system are not necessarily the users (domain experts) of the system. A problem arises, because data experts implement the functional similarity specification for domain experts. The functional similarity specification, however, should reflect the similarity notion in the minds of domain experts. Therefore the domain experts should be involved in the similarity generation process. The second challenge refers to the similarity definition for mixed data. A variety of similarity definitions for numerical, categorical or binary data exist. However, the similarity definition based on mixed data is cumbersome because of the complexity of the data. Finally, there are two possibilities when the similarity can be defined, namely at compile time or at run time. Today, many analytical systems define the similarity at compile time. However, the similarity notion of domain experts or the data set may vary over time. This would require a new specification of the functional similarity and a new compilation of the system. The definition of similarity at run time would solve this problem. I present a visual-interactive system that enables domain experts to define a similarity measure that reflects their similarity notion. 
The system is applicable for mixed data sets. Domain experts can align objects in a visual interface to generate feedback. Dynamic recalculation of the functional similarity specification allows matching the similarity notion of the domain expert at run time. This way the functional similarity specification can be adjusted at any time. Further, I provide a visual-interactive mode which enables the data expert to explore the similarity definition process of the domain expert. In addition, I evaluate the system to assess the quality of the similarity concept as well as the feedback generation process. The results of the evaluation illustrate both the validity of my solution and extension possibilities depending on the complexity of the given user feedback. In two case studies I show the applicability of the system. Both use cases show that the 'mental' similarity notion of users can be captured by the similarity concept. The results of the evaluation and the observations made in the case studies can be applied to improve the system or be used as a baseline for future approaches for user-centered interactive similarity definition for complex data objects.
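To make the notion of a similarity measure for mixed data with run-time adjustable parameters more concrete, the following minimal sketch uses a weighted Gower-style similarity. Gower's coefficient is a well-known general technique swapped in here for illustration; the abstract does not state that the thesis uses it. The schema format, the attribute names, and the function name `mixed_similarity` are assumptions made for this example.

```python
def mixed_similarity(x, y, schema, weights=None):
    """Weighted Gower-style similarity in [0, 1] for objects with mixed attributes.

    schema maps attribute name -> ("num", value_range) or ("cat", None).
    weights maps attribute name -> non-negative weight (default 1.0); the
    weights are the part that could be re-estimated from user feedback.
    """
    total, weight_sum = 0.0, 0.0
    for attr, (kind, value_range) in schema.items():
        w = 1.0 if weights is None else weights.get(attr, 1.0)
        if kind == "num":
            # Numerical attributes: 1 minus the range-normalized absolute difference.
            s = 1.0 - abs(x[attr] - y[attr]) / value_range if value_range else 1.0
        else:
            # Categorical and binary attributes: simple matching.
            s = 1.0 if x[attr] == y[attr] else 0.0
        total += w * s
        weight_sum += w
    return total / weight_sum if weight_sum else 0.0


# Hypothetical country objects with one numerical and one categorical attribute.
schema = {"population_m": ("num", 100.0), "continent": ("cat", None)}
x = {"population_m": 80.0, "continent": "Europe"}
y = {"population_m": 60.0, "continent": "Europe"}
print(mixed_similarity(x, y, schema))
print(mixed_similarity(x, y, schema, weights={"continent": 0.0}))
```

Because the weights are plain data rather than compiled logic, changing them changes the similarity function immediately, which corresponds to the run-time definition argued for in the abstract.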