Latent Semantic Analysis as a Method of Content-Based Image Retrieval in Medical Applications

Citation data:

CEC Theses and Dissertations

Makovoz, Gennadiy
CBIR with High Resolution CT Images; Content-Based Image Retrieval; LSA; LSA with CT images; Medical Imaging; Computer Sciences
thesis / dissertation description
The research investigated whether a Latent Semantic Analysis (LSA)-based approach to image retrieval can map pixel intensity into a smaller concept space with good accuracy and reasonable computational cost. From a large set of computed tomography (CT) images, a retrieval query found all images for a particular patient based on semantic similarity. The effectiveness of the LSA retrieval was evaluated based on precision, recall, and F-score.

This work extended the application of LSA to high-resolution CT radiology images. The images were chosen for their unique characteristics and their importance in medicine. Because CT images are intensity-only, they carry less information than color images. They typically have greater noise, higher intensity, greater contrast, and fewer colors than a raw RGB image. The study targeted intensity level for image feature extraction.

The focus of this work was a formal evaluation of the LSA method in the context of a large number of high-resolution radiology images. The study reported on preprocessing and retrieval time and discussed how reduction of the feature set size affected the results. LSA is an information retrieval technique that is based on the vector-space model. It works by reducing the dimensionality of the vector space, bringing similar terms and documents closer together. MATLAB software was used to report on retrieval and preprocessing time.

In determining the minimum size of concept space, it was found that the best combination of precision, recall, and F-score was achieved with 250 concepts (k = 250). This research reported precision of 100% on 100% of the queries and recall close to 90% on 100% of the queries with k = 250. Selecting a higher number of concepts did not improve recall and resulted in significantly increased computational cost.
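
The general LSA retrieval idea described in the abstract (reduce a feature-by-image matrix to k concepts, then compare a query with the images in that concept space and score the result with precision, recall, and F-score) can be illustrated with a minimal sketch. This is not the thesis's implementation: the study used MATLAB, while the sketch uses Python with NumPy, and the abstract does not specify the feature extraction, similarity measure, or retrieval cutoff. The synthetic "patient" data, the cosine similarity, the 0.9 threshold, and all function names (`build_lsa`, `project_query`, `cosine_similarities`, `precision_recall_f`, `run_demo`) are assumptions made for illustration only.

```python
import numpy as np


def build_lsa(features, k):
    """Truncated SVD of a (n_features, n_images) matrix, keeping k concepts."""
    U, s, Vt = np.linalg.svd(features, full_matrices=False)
    U_k = U[:, :k]                      # feature-to-concept basis
    s_k = s[:k]                         # top-k singular values
    images_k = Vt[:k, :].T * s_k        # image coordinates, shape (n_images, k)
    return U_k, s_k, images_k


def project_query(query, U_k):
    """Fold a raw feature vector into the same k-dimensional concept space."""
    return U_k.T @ query


def cosine_similarities(query_k, images_k):
    """Cosine similarity between the query and every image (assumed measure)."""
    norms = np.linalg.norm(images_k, axis=1) * np.linalg.norm(query_k)
    return (images_k @ query_k) / np.where(norms == 0, 1.0, norms)


def precision_recall_f(retrieved, relevant):
    """Precision, recall, and F-score for one query."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_score


def run_demo(k=3, threshold=0.9):
    """Synthetic stand-in for CT data: 3 hypothetical patients, 4 images each."""
    rng = np.random.default_rng(0)
    labels = np.repeat(np.arange(3), 4)              # patient id per image
    prototypes = rng.standard_normal((64, 3))        # one intensity pattern per patient
    noise = 0.05 * rng.standard_normal((64, labels.size))
    features = prototypes[:, labels] + noise         # columns are images

    U_k, _, images_k = build_lsa(features, k)
    query_k = project_query(features[:, 0], U_k)     # query with image 0
    sims = cosine_similarities(query_k, images_k)

    retrieved = np.flatnonzero(sims > threshold).tolist()
    relevant = np.flatnonzero(labels == labels[0]).tolist()
    return precision_recall_f(retrieved, relevant)


if __name__ == "__main__":
    print(run_demo())
```

In this sketch the images are stored as rows of `images_k` scaled by the singular values, so a query folded in with `U_k.T @ query` lands in the same coordinates. Raising `k` enlarges every stored vector and every comparison, which is the kind of cost growth the abstract reports for concept spaces larger than k = 250.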