R Cluster Analysis

Hi All,

I have been fiddling about with various statistical packages for cluster analysis, namely SPSS and R, and will shortly use Minitab.  While I really like SPSS, as I find the command line infuriating these days, I was quietly impressed with R. So having got my dataset together, I plugged into

SPSS outputs the clusters in the following way, which is quite neat.

By double-clicking on the input boxes one is able to see the distribution of the data.  From there one can do one's comparisons.  SPSS also outputs the cluster number back into the original dataset, like this:

           mpg cyl  disp  hp drat    wt  qsec vs am gear carb Cluster
Mazda RX4 21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4       2

I haven’t been able to display a dendrogram in SPSS, though.

R, on the other hand, requires the command line!  A great and simple tutorial is here.

For my dataset, however, it resulted in a dendrogram that looked like this.  It’s a little hard to read, let’s be honest!

One can fiddle around with the font size, but with this number of rows, one really needs the data in spreadsheet form to see what row belongs to what cluster.
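As a rough sketch of the font-size fiddling mentioned above: the built-in mtcars data stands in for my own dataset here, and the cex value and k = 3 are arbitrary choices for illustration, not recommendations.

```r
# Sketch only: mtcars stands in for the real dataset.
hc <- hclust(dist(as.matrix(mtcars)))

# cex shrinks the leaf labels; hang = -1 lines all labels up along the bottom.
plot(hc, cex = 0.6, hang = -1, main = "Cluster dendrogram", xlab = "", sub = "")

# Optionally outline a chosen number of clusters (k = 3 is an arbitrary choice).
rect.hclust(hc, k = 3, border = "red")
```

Even with smaller labels, a tree with many rows stays crowded, which is why a table of cluster memberships is still useful.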

The cluster membership can be worked out using the following commands (with thanks to Chi Yau from http://www.r-tutor.com/):

> B <- mtcars[1:10, ]
> x <- hclust(dist(as.matrix(B)))

The cluster tree info is in x$merge:

> x$merge
     [,1] [,2]
[1,]   -1   -2
[2,]  -10    1
[3,]   -3   -9
[4,]   -4   -6
[5,]   -8    3
[6,]    2    5
[7,]   -5   -7
[8,]    4    6
[9,]    7    8

The first row of x$merge is (-1, -2). It means merging the 1st and 2nd cars of B into a new cluster #1.

The second row of x$merge is (-10, 1). It means merging the 10th car of B and cluster #1 into a new cluster #2.

The third row of x$merge is (-3, -9). It means merging the 3rd and 9th cars of B into a new cluster #3.

The rest of the table can be interpreted similarly.
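Reading the table by eye gets tedious, so here is a small sketch that spells out each merge step in words. The helper name describe and the variable steps are my own inventions for illustration; only hclust, dist and the merge and labels components come from R itself.

```r
B <- mtcars[1:10, ]
x <- hclust(dist(as.matrix(B)))

# Negative entries are single observations (row numbers of B);
# positive entries refer to the cluster formed at that earlier step.
describe <- function(id) {
  if (id < 0) x$labels[-id] else paste0("cluster #", id)
}

# One sentence per row of x$merge.
steps <- vapply(seq_len(nrow(x$merge)), function(i) {
  paste0("step ", i, ": ", describe(x$merge[i, 1]), " + ",
         describe(x$merge[i, 2]), " -> cluster #", i)
}, character(1))

print(steps)
```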

Hence to get the cluster information of, say, “Hornet 4 Drive”, which is the 4th car in B:

> cbind(x$labels)
      [,1]
 [1,] "Mazda RX4"
 [2,] "Mazda RX4 Wag"
 [3,] "Datsun 710"
 [4,] "Hornet 4 Drive"
 [5,] "Hornet Sportabout"
 [6,] "Valiant"
 [7,] "Duster 360"
 [8,] "Merc 240D"
 [9,] "Merc 230"
[10,] "Merc 280"

We can look up the row in x$merge that contains the member “-4”:

[4,]   -4   -6

It tells us that “Hornet 4 Drive” is a member of cluster #4, which contains “Hornet 4 Drive” (-4) and “Valiant” (-6).
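The same lookup can be done in code, and the cluster number can be written back into the dataset much as SPSS does. This is only a sketch: it uses cutree, which is not part of the tutorial steps above, the choice of k = 3 groups is arbitrary, and the file name clusters.csv is hypothetical.

```r
B <- mtcars[1:10, ]
x <- hclust(dist(as.matrix(B)))

# Find the merge step in which "Hornet 4 Drive" first joins a cluster.
i <- match("Hornet 4 Drive", x$labels)   # its position in B
first_step <- which(x$merge == -i, arr.ind = TRUE)[, "row"]
print(x$merge[first_step, ])

# Cut the tree into k groups and store the membership as a new column.
# k = 3 is an arbitrary choice for illustration.
B$Cluster <- cutree(x, k = 3)
print(B[, c("mpg", "cyl", "Cluster")])

# Save for use in a spreadsheet (hypothetical file name).
write.csv(B, "clusters.csv")
```

Note that the cluster numbers from cutree label the final groups at the chosen cut, so they are not the same thing as the step numbers in x$merge.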
