The goal of clustering is to partition the objects (here: persons) into groups that are ideally homogeneous and well-separated (i.e., low within-group and high between-group heterogeneity). In the context of data preparation, cluster analysis reveals underlying structures in the data and provides subgroups that are easier to handle in subsequent steps of the analysis. Most clustering techniques use a distance or dissimilarity matrix, but finding a good (i.e., interpretable) distance measure for a particular clustering task is hard. A more direct approach to clustering may therefore perform better: homogeneity and heterogeneity are the two extremes of a measure of dispersion. In this paper, a new clustering procedure for discrete-time, discrete-valued life course trajectories is introduced that relies on dispersions rather than on a dissimilarity measure. The dispersion measure must handle nominal data appropriately. Moreover, a discrete measure of association is needed to cope with the dependency structure of the time series. Both measures are discussed, a model for clustering discrete time series is introduced, and the applicability of the new algorithm is demonstrated on a fairly large data set from the German pension insurance. The paper thus offers a technical foundation for accounting more precisely for the heterogeneous histories of participants in observational studies without immediately running into problems of dimensionality. This is particularly useful for policy evaluation.
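The core idea, splitting groups so that within-group dispersion drops, without ever computing pairwise distances, can be illustrated with a minimal sketch. The abstract does not name the dispersion measure, so the Gini-Simpson heterogeneity index for nominal data is used here as an assumption. Dispersion is simply summed over time points, which ignores the temporal dependence that the paper handles with a separate measure of association. The binary splitting rule (state equal or not equal to a given value at one time point) and all function names (`gini_dispersion`, `group_dispersion`, `best_split`, `divide`) are hypothetical illustrations, not the author's algorithm.

```python
from collections import Counter
from typing import Optional, Sequence

# A trajectory is a tuple of nominal states, one per discrete time point.
Trajectory = tuple[str, ...]


def gini_dispersion(states: Sequence[str]) -> float:
    """Gini-Simpson heterogeneity of nominal values: 0 if all states are equal."""
    n = len(states)
    if n == 0:
        return 0.0
    counts = Counter(states)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def group_dispersion(group: Sequence[Trajectory]) -> float:
    """Sum of per-time-point dispersions (assumption: ignores dependence over time)."""
    if not group:
        return 0.0
    return sum(gini_dispersion([t[k] for t in group]) for k in range(len(group[0])))


def best_split(group: list[Trajectory]) -> Optional[tuple]:
    """Hypothetical rule: try 'state at time k == s' vs. the rest and keep the
    split with the lowest size-weighted within-group dispersion."""
    best = None
    n = len(group)
    for k in range(len(group[0])):
        for state in sorted({t[k] for t in group}):  # sorted for determinism
            left = [t for t in group if t[k] == state]
            right = [t for t in group if t[k] != state]
            if not right:
                continue  # not a real split
            cost = (len(left) * group_dispersion(left)
                    + len(right) * group_dispersion(right)) / n
            if best is None or cost < best[0]:
                best = (cost, k, state, left, right)
    return best


def divide(group: list[Trajectory], max_dispersion: float,
           min_size: int = 2) -> list[list[Trajectory]]:
    """Divisive hierarchical clustering: split recursively until each group
    is small or homogeneous enough (dispersion <= max_dispersion)."""
    if len(group) < min_size or group_dispersion(group) <= max_dispersion:
        return [group]
    split = best_split(group)
    if split is None:
        return [group]
    _, _, _, left, right = split
    return divide(left, max_dispersion, min_size) + divide(right, max_dispersion, min_size)


if __name__ == "__main__":
    # Toy data: E = employed, U = unemployed, over three periods.
    data = [("E", "E", "E"), ("E", "E", "U"), ("U", "U", "U"), ("U", "U", "E")]
    for cluster in divide(data, max_dispersion=0.5):
        print(cluster)
```

On the toy data, the first split separates trajectories starting in E from those starting in U; lowering `max_dispersion` to 0 would continue splitting down to singletons. In practice a stopping rule and a dispersion measure that accounts for dependence between time points would be needed, as the abstract indicates.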
Dlugosz, Stephan (2011), Clustering Life Trajectories - A New Divisive Hierarchical Clustering Algorithm for Discrete-valued Discrete Time Series, ZEW Discussion Paper No. 11-015, Mannheim.