Advances in Representation and Learning of Temporal Event Sequences

Date

2020-08

Publisher

Faculty of Graduate Studies and Research, University of Regina

Abstract

This thesis focuses on representation and learning of two types of temporal event sequences. The first type is a sequence of interval-based events, called an e-sequence, where every event can have a duration, and the second type is a sequence of point-based events, where events have no durations.

For interval-based event sequences, we propose the coincidence eventset representation (CER) to represent interval-based events along with their durations. This representation is especially designed for pattern mining problems. We incorporate both internal and external utilities into e-sequences and formulate the problem of high utility pattern mining of e-sequences. We present a sound and complete algorithm called HUIPMiner to discover high utility patterns from e-sequence datasets. We introduce the L-sequence downward-closure property (LDCP), which is utilized in our pruning strategy to reduce the search space. We then demonstrate that the HUIPMiner algorithm generates all high utility patterns.

We propose a feature-based framework called FIBS for the e-sequence classification problem. In FIBS, features from an e-sequence dataset are extracted based on two representations: vectors of the relative frequency of event labels and vectors of the temporal relations among event intervals. We also propose a heuristic filter-based strategy to avoid selecting irrelevant features. We show the superiority of the FIBS performance in terms of classification accuracy compared to state-of-the-art competitors.
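
As a rough illustration of the first representation, the sketch below builds a relative-frequency vector over event labels for a single e-sequence. The `(label, start, end)` tuple layout, the function name `relative_frequency_vector`, and the use of plain occurrence counts (rather than, for example, duration-weighted frequencies) are assumptions made for this example; the abstract does not specify them, and this is not the FIBS implementation.

```python
from collections import Counter


def relative_frequency_vector(e_sequence, labels):
    """Return the share of events carrying each label, in `labels` order.

    e_sequence: list of (label, start, end) tuples (assumed layout).
    labels: fixed, ordered label vocabulary shared across the dataset.
    """
    # Count how many event intervals carry each label (assumption:
    # occurrence counts, not duration-weighted frequencies).
    counts = Counter(label for label, _start, _end in e_sequence)
    total = sum(counts[label] for label in labels)
    if total == 0:
        # No event uses a known label: return an all-zero vector.
        return [0.0] * len(labels)
    return [counts[label] / total for label in labels]


# Hypothetical e-sequence with three labels and overlapping intervals.
example = [("A", 0, 5), ("B", 3, 8), ("A", 10, 12), ("C", 11, 15)]
vector = relative_frequency_vector(example, ["A", "B", "C"])
```

Using one shared label vocabulary for every e-sequence keeps the vectors the same length, so they can be passed directly to a standard classifier.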

We propose three novel distance-based approaches for full-length matching of e-sequences. The first approach, ERF, is based on the Euclidean distance between relative frequency representations of two e-sequences. The second approach, EPC, is based on the cosine distance between position code representations of two e-sequences. The third method, WLC, uses a weighted linear combination of the ERF and EPC measures. We demonstrate that WLC outperforms ERF, EPC, and existing state-of-the-art methods in terms of nearest neighbor classification accuracy.
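
The way the three measures relate can be sketched as follows, assuming each e-sequence has already been converted to two numeric vectors. The function names, the single weight `alpha`, and the convention for zero vectors are assumptions for this example. The abstract does not define how position codes are built, so the vectors passed in here are placeholders rather than the thesis's actual representations.

```python
import math


def euclidean_distance(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cosine_distance(u, v):
    """One minus the cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        # Convention chosen for this sketch: treat a zero vector as
        # maximally dissimilar to avoid dividing by zero.
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def weighted_combination(freq_u, freq_v, code_u, code_v, alpha=0.5):
    """Weighted linear combination of the two distances.

    alpha weights the Euclidean term and (1 - alpha) the cosine term;
    the value 0.5 is an arbitrary default, not taken from the thesis.
    """
    return (alpha * euclidean_distance(freq_u, freq_v)
            + (1 - alpha) * cosine_distance(code_u, code_v))


# Placeholder vectors: identical frequencies, orthogonal "position codes".
d = weighted_combination([0.5, 0.5], [0.5, 0.5], [1.0, 0.0], [0.0, 1.0])
```

In a nearest neighbor setting, a combined distance like this would be computed between a query e-sequence and every training e-sequence, and the label of the closest one returned.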

For point-based event sequences, we build an ensemble model that predicts the time of occurrence of the next point-based event. The ensemble model comprises nine other methods that are able to perform the prediction task. We demonstrate that the prediction results obtained by the ensemble method are more accurate than the results obtained by most individual methods.

Description

A Thesis Submitted to the Faculty of Graduate Studies and Research In Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy in Computer Science, University of Regina. xv, 156 p.
