


Text mining and high dimensional statistical analysis

Said, YH (George Mason)
Wednesday 13 February 2008, 14:00-15:00

Seminar Room 2, Newton Institute Gatehouse


Text mining can be thought of as a synthesis of information retrieval, natural language processing and statistical data mining. The document collections under consideration can scale to hundreds of thousands of documents, and the associated lexicon can contain a million or more words. Analysis is often based on a term-document matrix, or even a bigram-document matrix, so the dimensionality of each document's term vector can easily reach a million or more. In this talk I will describe some of the approaches to text mining on which we have been working. This is joint work with Dr Edward Wegman.
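To make the term-document and bigram-document representations concrete, here is a minimal sketch in Python. It is not the speakers' method. It assumes naive whitespace tokenization, and the helper names (`tokenize`, `bigrams`, `document_matrix`) are hypothetical. Each document becomes one sparse column mapping term indices to counts. With a lexicon of a million or more terms, any single document touches only a tiny fraction of them, which is why sparse storage is the natural choice.

```python
from collections import Counter


def tokenize(text):
    # Hypothetical, deliberately naive tokenizer: lowercase and split on
    # whitespace. A real pipeline would also strip punctuation, drop stop
    # words and possibly stem.
    return text.lower().split()


def bigrams(tokens):
    # Adjacent word pairs, e.g. ["text", "mining", "of"] -> ["text mining", "mining of"].
    return [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]


def document_matrix(docs, featurize):
    """Build a sparse feature-by-document count matrix.

    Returns (lexicon, columns): lexicon is the sorted list of features
    (terms or bigrams), and columns[j] is a dict {feature_index: count}
    for document j. Only nonzero entries are stored.
    """
    counts = [Counter(featurize(tokenize(d))) for d in docs]
    lexicon = sorted(set().union(*counts))
    index = {feature: i for i, feature in enumerate(lexicon)}
    columns = [{index[f]: c for f, c in cnt.items()} for cnt in counts]
    return lexicon, columns


docs = ["text mining of large corpora", "statistical mining of text data"]

# Term-document matrix: the features are single words.
terms, term_cols = document_matrix(docs, lambda toks: toks)

# Bigram-document matrix: the features are adjacent word pairs.
pairs, bigram_cols = document_matrix(docs, bigrams)

print(terms)
print(term_cols)
print(pairs)
```

Sorting the lexicon only gives each feature a stable index. For a collection of hundreds of thousands of documents, the same dictionary-of-counts idea would normally be handed to a dedicated sparse-matrix format rather than kept as Python dicts.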




