Some thoughts about the design of dissimilarity measures
Seminar Room 2, Newton Institute Gatehouse
In many situations, dissimilarities between objects cannot be measured directly, but have to be constructed from some known characteristics of the objects of interest, e.g. some values on certain variables.
From a philosophical point of view, the assumption that a 'true' but not directly observable dissimilarity value objectively exists between two objects is highly questionable. We treat dissimilarity construction as a problem of choosing or designing such a measure, not as a problem of estimating some existing but unknown quantity.
Therefore, subjective judgment is necessarily involved, and the main aim of the design of a dissimilarity measure is the proper representation of a subjective or intersubjective concept (usually of subject-matter experts) of similarity or dissimilarity between the objects.
The design of dissimilarity measures is of particular interest when analyzing high-dimensional data, because methods such as MDS and nearest-neighbour techniques operate on dissimilarity matrices, and such matrices are not essentially more complex when derived from high-dimensional data: their size depends only on the number of objects, not on the number of variables.
Some guidelines for the choice and design of dissimilarity measures are given and illustrated by the construction of a new dissimilarity measure between species distribution areas in biogeography, which are formalized as binary presence-absence data on a set of geographic units.
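To make the data format concrete, the following is a minimal sketch of a dissimilarity between two distribution areas coded as 0/1 vectors over the same geographic units. It uses the standard Jaccard dissimilarity purely as an illustration; it is not the new measure presented in the talk, and the function name, the toy species vectors and the convention for two empty areas are assumptions.

```python
def jaccard_dissimilarity(area_a, area_b):
    """Jaccard dissimilarity between two presence-absence vectors.

    Each vector holds one 0/1 entry per geographic unit
    (1 = species present in that unit, 0 = absent).
    """
    if len(area_a) != len(area_b):
        raise ValueError("both areas must cover the same geographic units")
    # Units where both species are present.
    both = sum(1 for a, b in zip(area_a, area_b) if a and b)
    # Units where at least one of the two species is present.
    either = sum(1 for a, b in zip(area_a, area_b) if a or b)
    if either == 0:
        # Assumed convention: two empty areas are treated as identical.
        return 0.0
    return 1.0 - both / either


# Hypothetical toy data: three species on six geographic units.
species_1 = [1, 1, 1, 0, 0, 0]
species_2 = [0, 1, 1, 1, 0, 0]
species_3 = [0, 0, 0, 0, 1, 1]

print(jaccard_dissimilarity(species_1, species_2))  # overlapping areas
print(jaccard_dissimilarity(species_1, species_3))  # disjoint areas
```

Joint absences are ignored by this measure, which is one of the design decisions a subject-matter expert would have to judge as appropriate or not for the application at hand.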
I will also discuss alternatives to the Euclidean distance and their implications for high-dimensional situations in which it is not feasible to use information about the meaning of individual variables to construct a dissimilarity measure.
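As a rough illustration of what an alternative to the Euclidean distance can look like, the sketch below computes Minkowski distances for different exponents and builds a dissimilarity matrix from them. The abstract does not say which alternatives are discussed, so the choice of the Minkowski family, the function names and the toy data are assumptions.

```python
def minkowski(x, y, p):
    """Minkowski (L_p) distance; p=2 is Euclidean, p=1 is Manhattan."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def dissimilarity_matrix(objects, dist):
    """n x n matrix of pairwise dissimilarities between n objects."""
    return [[dist(x, y) for y in objects] for x in objects]


# Two points in the plane: the distance depends on the chosen exponent.
euclidean = minkowski([0, 0], [3, 4], 2)
manhattan = minkowski([0, 0], [3, 4], 1)
print(euclidean, manhattan)

# Hypothetical high-dimensional data: 3 objects, 1000 variables each.
objects = [[float(i * j % 7) for j in range(1000)] for i in range(3)]
matrix = dissimilarity_matrix(objects, lambda x, y: minkowski(x, y, 1))
print(len(matrix), len(matrix[0]))  # matrix size depends only on the number of objects
```

Whatever the number of variables, the resulting matrix has one row and one column per object, so methods that work on dissimilarity matrices see the same kind of input regardless of dimensionality.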