Abstract
Images have long been used to express and convey information. Over time, the human visual cortex has adapted to the environments in which we live. This has paved the way for cognitive modelling of the visual cortex, in which computer models are developed to carry out visual information processing functions. This dissertation presents a system that reads digital images of mathematical expressions with noisy backgrounds and applies agents at the various stages of image recognition to identify the characters in each expression. The model is called the Natural Image Mathematical Expression Recognition Model (NIMER). NIMER applies both supervised and unsupervised learning methods to the recognition process. It follows the classic two-step recognition process, segmentation followed by classification, and applies multiple agents and ensemble learning at each stage. The segmentation stage is composed of Region-based Convolutional Neural Network (R-CNN), Minimum Spanning Tree (MST) and Connected Components Labelling (CCL) agents. The MST and CCL agents segment images using a form of unsupervised learning similar to clustering, whereas the R-CNN agent uses supervised learning. The classification stage is made up of Convolutional Neural Network (CNN), K-Nearest Neighbour (KNN) and Support Vector Machine (SVM) agents, all of which use supervised learning. At each stage, the ensemble results are better than the results of the individual agents.
M.Sc. (Computer Science)
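
As an illustration of the ensemble idea described in the abstract, the following minimal sketch combines the per-symbol labels of three classification agents by simple majority vote. The abstract does not state how NIMER combines agent outputs, so majority voting, the tie-breaking rule, the function names (`majority_vote`, `ensemble_classify`) and the sample labels are all assumptions made for illustration only, not the dissertation's method or data.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the most common label; ties go to the earliest agent's label.

    Assumption: plain majority voting. The dissertation may use a
    different combination rule.
    """
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label


def ensemble_classify(agent_outputs):
    """Combine per-symbol labels from several agents.

    agent_outputs maps a (hypothetical) agent name to its list of
    predicted labels, one per segmented symbol, in the same order.
    """
    per_symbol = zip(*agent_outputs.values())
    return [majority_vote(list(votes)) for votes in per_symbol]


# Made-up labels for a three-symbol expression, for illustration only.
outputs = {
    "cnn": ["x", "+", "2"],
    "knn": ["x", "t", "2"],
    "svm": ["k", "+", "z"],
}
print(ensemble_classify(outputs))  # prints ['x', '+', '2']
```

In this toy input each agent misreads at least one symbol, yet the vote recovers all three, which is the kind of gain an ensemble can offer when the agents make different errors.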