Advances in Representation and Learning of Temporal Event Sequences
Abstract
This thesis focuses on representation and learning of two types of temporal event
sequences. The first type is a sequence of interval-based events, called an e-sequence,
where every event can have a duration, and the second type is a sequence of point-based
events, where events have no durations.
For interval-based event sequences, we propose the coincidence eventset representation
(CER) to represent interval-based events along with their durations. This
representation is designed specifically for pattern mining problems. We incorporate
both internal and external utilities into e-sequences and formulate the problem of
high utility pattern mining of e-sequences. We present a sound and complete algorithm
called HUIPMiner to discover high utility patterns from e-sequence datasets.
We introduce the L-sequence downward-closure property (LDCP), which is utilized
in our pruning strategy to reduce the search space. We then demonstrate that the
HUIPMiner algorithm generates all high utility patterns.
We propose a feature-based framework called FIBS for the e-sequence classification
problem. In FIBS, features from an e-sequence dataset are extracted based on
two representations: vectors of the relative frequency of event labels and vectors of
the temporal relations among event intervals. We also propose a heuristic filter-based
strategy to avoid selecting irrelevant features. We show that FIBS achieves higher
classification accuracy than state-of-the-art competitors.
We propose three novel distance-based approaches for full-length matching of e-sequences.
The first approach, ERF, is based on the Euclidean distance between
relative frequency representations of two e-sequences. The second approach, EPC, is
based on the cosine distance between position code representations of two e-sequences.
The third method, WLC, uses a weighted linear combination of the ERF and EPC
measures. We demonstrate that WLC outperforms ERF, EPC, and existing state-of-the-art
methods in terms of nearest neighbor classification accuracy.
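As a rough illustration of the ERF and WLC measures described above, the following Python sketch may help. It is not the thesis implementation. It assumes an e-sequence is stored as a list of (label, start, end) tuples, and that the relative frequency of a label is its share of the total event duration; the abstract does not define either, so both are assumptions. The function names and the weight alpha are hypothetical, and the EPC distance is passed in as a precomputed number because the position code representation is not specified here.

```python
import math
from collections import defaultdict


def relative_frequency(eseq, labels):
    """Duration-weighted share of each label (assumed definition)."""
    total = sum(end - start for _, start, end in eseq)
    per_label = defaultdict(float)
    for label, start, end in eseq:
        per_label[label] += end - start
    # Guard against an empty or zero-duration e-sequence.
    return [per_label[l] / total if total else 0.0 for l in labels]


def erf_distance(eseq_a, eseq_b, labels):
    """Euclidean distance between relative frequency vectors."""
    ra = relative_frequency(eseq_a, labels)
    rb = relative_frequency(eseq_b, labels)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(ra, rb)))


def wlc_distance(erf, epc, alpha=0.5):
    """Weighted linear combination of two distances.

    alpha is a hypothetical weight in [0, 1]; epc is assumed to be
    computed elsewhere from position code representations.
    """
    return alpha * erf + (1.0 - alpha) * epc


# Two toy e-sequences over the labels A and B.
s1 = [("A", 0, 4), ("B", 2, 6)]
s2 = [("A", 0, 8)]
d_erf = erf_distance(s1, s2, ["A", "B"])
```

In a nearest neighbor classifier, a distance of this kind would be evaluated between a query e-sequence and each labeled e-sequence, with the weight chosen on held-out data.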
For point-based event sequences, we build an ensemble model that predicts the
time of occurrence of the next point-based event. The ensemble model combines nine
individual methods, each able to perform the prediction task. We demonstrate that
the prediction results obtained by the ensemble method are more accurate than the
results obtained by most individual methods.