Hi All,

I have been fiddling about with various statistical packages for cluster analysis, namely SPSS and R, and will shortly use Minitab. While I really like SPSS, as I find the command line infuriating these days, I was quietly impressed with R. So having got my dataset together, I plugged into

SPSS outputs the clusters in the following way, which is quite neat.

By double-clicking on the input boxes, one is able to see the distribution of the data. From there one can do one's comparisons. SPSS also writes the cluster number back into the original dataset, like this:

            mpg cyl  disp  hp drat    wt  qsec vs am gear carb Cluster
Mazda RX4  21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4       2

Haven’t been able to display a dendrogram.

R, on the other hand, requires the command line! A great and simple tutorial is here.

For my dataset, however, it resulted in a dendrogram that looked like this. It’s a little hard to read, let’s be honest!
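For reference, a dendrogram of this kind can be drawn in base R along the following lines. This is only a sketch: it uses the built-in mtcars data as a stand-in rather than my own dataset, and the cex and hang values are arbitrary choices.

```r
# Stand-in data: the built-in mtcars data set (an assumption, not my dataset)
hc <- hclust(dist(as.matrix(mtcars)))

# cex shrinks the row labels; hang = -1 lines the labels up along the baseline
plot(hc, cex = 0.6, hang = -1, main = "Cluster dendrogram (mtcars)")
```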

One can fiddle around with the font size, but with this number of rows, one really needs the data in spreadsheet form to see what row belongs to what cluster.

This can be done by using the following commands (with thanks to Chi Yau from http://www.r-tutor.com/):

> B <- mtcars[1:10, ]

> x <- hclust(dist(as.matrix(B)))

The cluster tree info is in x$merge:

> x$merge
     [,1] [,2]
[1,]   -1   -2
[2,]  -10    1
[3,]   -3   -9
[4,]   -4   -6
[5,]   -8    3
[6,]    2    5
[7,]   -5   -7
[8,]    4    6
[9,]    7    8

The first row of x$merge is (-1, -2). It means merging the 1st and 2nd cars of B into a new cluster #1.

The second row of x$merge is (-10, 1). It means merging the 10th car of B and cluster #1 into a new cluster #2.

The third row of x$merge is (-3, -9). It means merging the 3rd and 9th cars of B into a new cluster #3.

The rest of the table can be interpreted similarly.
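Reading the table by hand gets tedious once clusters start merging with other clusters, so here is a rough sketch of a helper that expands any row of x$merge into the car names it contains. The function name members is my own invention for illustration; it is not part of R or the tutorial.

```r
B <- mtcars[1:10, ]
x <- hclust(dist(as.matrix(B)))

# Hypothetical helper: list every car contained in the cluster formed at merge step i.
# Negative entries are single observations; positive entries point to earlier steps.
members <- function(hc, i) {
  ids <- hc$merge[i, ]
  unlist(lapply(ids, function(id) {
    if (id < 0) hc$labels[-id] else members(hc, id)
  }))
}

members(x, 2)  # cars in cluster #2
members(x, 4)  # cars in cluster #4
```

The last merge step (row 9 here) contains every car, since that is where the whole tree joins up.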

Hence to get the cluster information of, say, “Hornet 4 Drive”, which is the 4th car in B:

> cbind(x$labels)
      [,1]
 [1,] "Mazda RX4"
 [2,] "Mazda RX4 Wag"
 [3,] "Datsun 710"
 [4,] "Hornet 4 Drive"
 [5,] "Hornet Sportabout"
 [6,] "Valiant"
 [7,] "Duster 360"
 [8,] "Merc 240D"
 [9,] "Merc 230"
[10,] "Merc 280"

We can look up the row in x$merge that contains the member “-4”:

[4,] -4 -6

It tells us that “Hornet 4 Drive” is a member of cluster #4, which contains “Hornet 4 Drive” (-4) and “Valiant” (-6).
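To get something closer to the SPSS output, with one cluster number per row, base R's cutree() can cut the tree into a fixed number of groups. The sketch below is not from the tutorial: the choice of k = 3 and the output file name are assumptions made purely for illustration.

```r
B <- mtcars[1:10, ]
x <- hclust(dist(as.matrix(B)))

# Cut the tree into k groups; k = 3 is an arbitrary choice for this example
B$Cluster <- cutree(x, k = 3)

# Show a few columns alongside the new cluster number
B[, c("mpg", "cyl", "disp", "Cluster")]

# Write the result out so it can be opened as a spreadsheet
# (hypothetical file name, saved to the session's temporary directory)
out <- file.path(tempdir(), "clusters.csv")
write.csv(B, out)
```

Note that the Cluster column is added after the distances are computed, so it does not feed back into the clustering itself.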