Text mining and high dimensional statistical analysis
Seminar Room 2, Newton Institute Gatehouse
Text mining can be thought of as a synthesis of information retrieval, natural language processing and statistical data mining. The collection under consideration can run to hundreds of thousands of documents, and the associated lexicon to a million or more words. Analysis is often based on a term-document matrix, or even a bigram-document matrix, so the dimensionality of the term vector can easily be a million or more. In this talk I will describe some of the approaches to text mining on which we have been working. This is joint work with Dr Edward Wegman.
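
To make the term-document and bigram-document matrices mentioned above concrete, here is a minimal sketch in Python. It is an illustration only, not the method used in the talk: the whitespace tokenizer, the helper names (tokenize, bigrams, build_matrix) and the two toy documents are all assumptions made for the example. The matrix is stored sparsely as a dictionary keyed by (row, column), since at the scale described almost all entries would be zero.

```python
from collections import Counter


def tokenize(text):
    # Assumption: lowercase and split on whitespace; a real pipeline
    # would also handle punctuation, stemming, stop words, and so on.
    return text.lower().split()


def bigrams(tokens):
    # Adjacent token pairs, joined into a single feature string.
    return [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]


def build_matrix(docs, featurize):
    """Return (vocab, matrix) where matrix maps (feature_row, doc_col) -> count.

    Only nonzero counts are stored, so memory grows with the number of
    distinct (feature, document) pairs rather than with rows * columns.
    """
    counts = [Counter(featurize(tokenize(d))) for d in docs]
    vocab = sorted(set().union(*counts))
    row = {feature: i for i, feature in enumerate(vocab)}
    matrix = {}
    for col, doc_counts in enumerate(counts):
        for feature, n in doc_counts.items():
            matrix[(row[feature], col)] = n
    return vocab, matrix


# Hypothetical toy corpus, purely for illustration.
docs = [
    "text mining finds patterns in text",
    "statistical data mining finds patterns",
]

term_vocab, term_doc = build_matrix(docs, lambda tokens: tokens)
bigram_vocab, bigram_doc = build_matrix(docs, bigrams)
```

Even in this toy corpus the bigram vocabulary is as large as the unigram one despite each document yielding fewer bigrams than words; on a real collection the number of distinct bigrams typically grows much faster than the lexicon, which is why the dimensionality becomes so high.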