Learning Pattern Languages from a Small Number of Helpfully Chosen Examples

Date
2013-08
Authors
Mazadi, Zeinab
Journal Title
Journal ISSN
Volume Title
Publisher
Faculty of Graduate Studies and Research, University of Regina
Abstract

A pattern is a string containing variable symbols and constants. The language of a pattern is the set of all strings obtained by replacing all variables in the pattern with non-empty strings. Patterns and their languages were introduced by Angluin in 1980. Since that time, learning of pattern languages has been a topic of great interest in the research area of computational learning theory, mainly because of its relevance for many applications. Areas in which patterns are suitable for modelling data are for example bioinformatics (e.g., when representing sets of amino acid sequences) or text mining (e.g., for automated information extraction). This thesis studies learning of pattern languages in the context of computational learning theory. Computational learning theory studies various models of learning and investigates the learnability of classes of languages using each learning model. More- over, determining the number of data points (sample complexity) and the amount of computational time (time complexity) required for learning a particular class of languages in a particular model is a main goal in computational learning theory. In this thesis we focus on learning classes of pattern languages in a model called “learning from helpful examples” or “learning from teachers”. In particular, we are interested in determining the worst case number of examples that need to be com- municated between a teacher and a learner to identify the target language from all other languages in the class. The sample complexity measures we use are the teaching dimension in the classic teaching model and the recursive teaching dimension in the cooperative teaching model that was introduced later. We are interested in computing these parameters for certain classes of pattern languages over various sizes of alphabets from which the constant symbols can be taken. In particular, we aim to determine the effect of the alphabet size on these parameters. We study arbitrary patterns, regular patterns and one-variable patterns over three types of alphabets, namely finite alphabets of size at least two, singleton alphabets and infinite alphabets. Our results show that sometimes the alphabet size does in- fluence the sample complexity parameters and sometimes it does not influence them. Moreover, we demonstrate that more advanced models of teaching, like the recursive teaching model, can be more sample-efficient than the classic teaching model when learning certain kinds of pattern languages.

Description
A Thesis Submitted to the Faculty of Graduate Studies and Research In Partial Fulfillment of the Requirements for the Degree of Master of Science in Computer Science, University of Regina. viii, 74 p.
Keywords
Citation
Collections