Sparse Coding Tone-Like Structures in Sound Using Local Image Features
Abstract
A trend in machine learning has emphasized features that are learned algorithmically, in contrast to the hand-engineered features traditionally used in classification tasks. Classical sparse coding is a robust feature-learning paradigm that represents each input as a sparse vector of coefficients applied to a dictionary of basis functions. While sparse coding has yielded state-of-the-art results in many application domains, its computational cost often makes it impractical to use. This thesis examines the application of ScSPM, a sparse coding algorithm based on local image features, to the problem domain of audio classification. The convex optimization problems involved in dictionary learning are discussed, and existing methods are reviewed. With the goal of mitigating some of the computational expense of sparse coding local image features, alternative image-based representations of audio are proposed that isolate the tone-like structures present in the signal. The proposed representations are evaluated in a multi-class audio classification task with respect to both training time and classification accuracy.
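To make the phrase "a sparse vector of coefficients applied to a dictionary of basis functions" concrete, the following is a minimal sketch of the coding step only, for a fixed dictionary. It assumes the common L1-penalized formulation, minimizing 0.5 * ||x - D a||^2 + lam * ||a||_1 over a, and solves it with ISTA (iterative soft-thresholding). ISTA is chosen here purely for illustration and is not necessarily the solver used in the thesis or in ScSPM. The names `sparse_code_ista`, `lasso_objective`, `lam`, and `n_iter`, as well as the random dictionary, are hypothetical.

```python
import numpy as np


def soft_threshold(v, t):
    """Element-wise soft-thresholding: the proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def lasso_objective(x, D, a, lam):
    """Assumed objective: 0.5 * ||x - D a||^2 + lam * ||a||_1."""
    residual = x - D @ a
    return 0.5 * residual @ residual + lam * np.sum(np.abs(a))


def sparse_code_ista(x, D, lam=0.1, n_iter=200):
    """Approximate the sparse code of x under a fixed dictionary D using ISTA.

    D has shape (signal_dim, n_atoms); each column is one basis function.
    """
    # Lipschitz constant of the gradient of the quadratic term:
    # the squared spectral norm of D.
    lipschitz = np.linalg.norm(D, 2) ** 2
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ a - x)  # gradient of 0.5 * ||x - D a||^2
        a = soft_threshold(a - grad / lipschitz, lam / lipschitz)
    return a


# Illustrative data (an assumption, not from the thesis): a random
# overcomplete dictionary with unit-norm atoms and a signal built
# from three of those atoms.
rng = np.random.default_rng(0)
D = rng.standard_normal((20, 50))
D /= np.linalg.norm(D, axis=0)

a_true = np.zeros(50)
a_true[[3, 17, 41]] = [1.5, -2.0, 1.0]
x = D @ a_true

lam = 0.05
a_hat = sparse_code_ista(x, D, lam=lam, n_iter=500)
print("non-zero coefficients:", np.count_nonzero(a_hat))
print("objective:", lasso_objective(x, D, a_hat, lam))
```

Dictionary learning alternates a coding step of this kind with an update of `D`. Each of the two subproblems is convex when the other variable is held fixed, which is the setting in which the convex optimization methods mentioned above apply.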