Advances in Representation and Learning of Temporal Event Sequences

Date

2020-08

Authors

Mirbagheritabatabaei, Seyedmohammad

Publisher

Faculty of Graduate Studies and Research, University of Regina

Abstract

This thesis focuses on representation and learning of two types of temporal event sequences. The first type is a sequence of interval-based events, called an e-sequence, where every event can have a duration, and the second type is a sequence of point-based events, where events have no durations.

For interval-based event sequences, we propose the coincidence eventset representation (CER) to represent interval-based events along with their durations. This representation is especially designed for pattern mining problems. We incorporate both internal and external utilities into e-sequences and formulate the problem of high utility pattern mining of e-sequences. We present a sound and complete algorithm called HUIPMiner to discover high utility patterns from e-sequence datasets. We introduce the L-sequence downward-closure property (LDCP), which is utilized in our pruning strategy to reduce the search space. We then demonstrate that the HUIPMiner algorithm generates all high utility patterns.

We propose a feature-based framework called FIBS for the e-sequence classification problem. In FIBS, features from an e-sequence dataset are extracted based on two representations: vectors of the relative frequency of event labels and vectors of the temporal relations among event intervals. We also propose a heuristic filter-based strategy to avoid selecting irrelevant features. We show the superiority of the FIBS performance in terms of classification accuracy compared to state-of-the-art competitors.

We propose three novel distance-based approaches for full-length matching of e-sequences. The first approach, ERF, is based on the Euclidean distance between relative frequency representations of two e-sequences. The second approach, EPC, is based on the cosine distance between position code representations of two e-sequences. The third approach, WLC, uses a weighted linear combination of the ERF and EPC measures. We demonstrate that WLC outperforms ERF, EPC, and existing state-of-the-art methods in terms of nearest neighbor classification accuracy.

For point-based event sequences, we build an ensemble model that predicts the time of occurrence of the next point-based event. The ensemble model comprises nine other methods that are able to perform the prediction task. We demonstrate that the prediction results obtained by the ensemble method are more accurate than the results obtained by most individual methods.
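
As an illustration of the ERF idea described above, the following is a minimal sketch, not the thesis implementation. It assumes an e-sequence is a list of `(label, start, end)` tuples and that the relative frequency of a label is its share of the total interval duration in the e-sequence; the abstract does not define either, so both are assumptions, and the function names `relative_frequency` and `erf_distance` are hypothetical.

```python
import math
from collections import defaultdict


def relative_frequency(eseq, labels):
    """Return one value per label: that label's share of total duration.

    Assumption: relative frequency is duration-weighted. The abstract
    does not state the exact definition used in the thesis.
    """
    totals = defaultdict(float)
    for label, start, end in eseq:
        totals[label] += end - start
    total = sum(totals.values())
    if total == 0:
        # No measurable duration: fall back to an all-zero vector.
        return [0.0] * len(labels)
    return [totals[label] / total for label in labels]


def erf_distance(eseq_a, eseq_b):
    """Euclidean distance between the two relative frequency vectors."""
    # Use a shared, ordered label set so both vectors are aligned.
    labels = sorted({l for l, _, _ in eseq_a} | {l for l, _, _ in eseq_b})
    vec_a = relative_frequency(eseq_a, labels)
    vec_b = relative_frequency(eseq_b, labels)
    return math.dist(vec_a, vec_b)


# Two toy e-sequences (hypothetical data, for illustration only).
s1 = [("A", 0, 4), ("B", 2, 6)]  # A and B each cover 4 time units
s2 = [("A", 0, 8)]               # only A occurs
print(erf_distance(s1, s2))
```

Under these assumptions, `s1` maps to the vector `[0.5, 0.5]` and `s2` to `[1.0, 0.0]`, so the distance is the square root of 0.5. A distance like this can be used directly in a nearest neighbor classifier, which is the evaluation setting the abstract mentions.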

Description

A Thesis Submitted to the Faculty of Graduate Studies and Research In Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy in Computer Science, University of Regina. xv, 156 p.
