Text mining and high-dimensional statistical analysis

Presented by: 
YH Said [George Mason]
Wednesday 13th February 2008 - 14:00 to 15:00
INI Seminar Room 2

Text mining can be thought of as a synthesis of information retrieval, natural language processing and statistical data mining. The document collections under consideration can scale to hundreds of thousands of documents, and the associated lexicon can contain a million or more words. Analysis is often based on a term-document matrix, or even a bigram-document matrix, so the dimensionality of the term vector can easily reach a million or more. In this talk I will describe some of the approaches to text mining that we have been working on. This is joint work with Dr Edward Wegman.
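To make the term-document and bigram-document matrices concrete, here is a minimal sketch in Python. It is not the presenters' method. The whitespace-and-punctuation tokenizer and the names `tokenize`, `ngrams` and `build_matrix` are hypothetical, chosen only for illustration. Because a lexicon can hold a million or more entries while any single document uses only a small fraction of them, the sketch stores only the nonzero counts, keyed by (row, column), rather than a dense array.

```python
from collections import Counter


def tokenize(text):
    # Naive tokenizer (an assumption for illustration): lowercase, split on
    # whitespace, and strip surrounding punctuation from each token.
    tokens = (t.strip(".,;:!?\"'()") for t in text.lower().split())
    return [t for t in tokens if t]


def ngrams(tokens, n):
    # Contiguous n-grams as tuples: n=1 gives single terms, n=2 gives bigrams.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def build_matrix(docs, n=1):
    """Build a sparse n-gram-by-document count matrix.

    Returns (lexicon, matrix): lexicon maps each n-gram to its row index, and
    matrix maps (row_index, doc_index) -> count. Zero entries are never stored.
    """
    lexicon = {}
    matrix = {}
    for j, doc in enumerate(docs):
        counts = Counter(ngrams(tokenize(doc), n))
        for gram, count in counts.items():
            # Assign a new row index the first time an n-gram is seen.
            i = lexicon.setdefault(gram, len(lexicon))
            matrix[(i, j)] = count
    return lexicon, matrix


if __name__ == "__main__":
    docs = ["Text mining uses statistics.", "Statistics and text mining meet."]
    for n, label in [(1, "term-document"), (2, "bigram-document")]:
        lexicon, matrix = build_matrix(docs, n)
        rows, cols = len(lexicon), len(docs)
        print(f"{label}: {rows} x {cols} matrix, {len(matrix)} nonzero entries")
```

With a realistic collection the row count grows with the lexicon (and faster still for bigrams), while the number of stored entries grows only with the total number of distinct n-gram occurrences per document. A production pipeline would more likely use a compressed sparse matrix format than a Python dict.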

