Calculate the Number of Days Between Dates in Excel

Hi again,

There is a really useful undocumented function in Excel called DATEDIF(). It calculates the difference between two given dates in the unit of time specified:

=DATEDIF(start_date, end_date, unit)

The function has three arguments that need to be entered as part of the function:

  • start_date – the first or starting date.
  • end_date – the second or last date.
  • unit – tells the function to find the number of days (“D”), complete months (“M”), or complete years (“Y”) between the two dates.

The unit parameter MUST be enclosed in quotes, and start_date MUST NOT be later than end_date, otherwise an error will occur.
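
For example, assuming your start date is in cell A1 and your end date is in B1 (the cell references are just for illustration):

```
=DATEDIF(A1, B1, "D")    days between the two dates
=DATEDIF(A1, B1, "M")    complete months between the two dates
=DATEDIF(A1, B1, "Y")    complete years between the two dates
```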

More at these links below:

Posted in Uncategorized

Microsoft Excel 2007 Table Formula Issue

Excel Tables are a convenient way to arrange and reference your data in a spreadsheet.

However, I had an infuriating problem with formulas not working within my table. No amount of cell reformatting, cutting and pasting, or resizing table dimensions seemed to cure the problem.

The fix, it turned out, was quite simple. Formulas in Excel 2007 tables are CASE SENSITIVE, as you can see from the attached image:

Excel Formula Table Problems

Write case-sensitive formulas and you will be fine!

Posted in Uncategorized

Taxation Statistics by Postcode

Been looking to add average salary by Australian postcode information to a data model that I am designing.

Initially my searches, despite the power of Google, didn’t turn up much. You would think the ABS site would be the first port of call, but it groups average wage by the notion of the “Local Government Area” (LGA).

There is some discussion online about how to transform the LGA code to a postcode – which was a step that I quite seriously wanted to avoid.

However, after searching for “taxation statistics 2009-10” on Google, I found what I was looking for on the ATO site – you do have to search their site quite hard though.

The tables are under “Detailed tables and description” -> “Personal Tax”.

You can also search for previous years.

Posted in data, forecasting, regression, statistics

R Binomial Regression

Been doing some propensity modelling recently using R’s glm() function, but upon a recommendation from a professional in my network, I wanted to try k-fold cross-validation to assess the accuracy of my model.

So doing some research I found a great article on fitting models on this site (which I have linked to before):

This is a really great site – such good, clear examples of R code, with comments, so you can understand what is going on. The model in that example is a linear model, while I am using a GLM, so instead of cv.lm I can use cv.glm (from the boot package) to run the cross-validation.

# Load boot, which provides cv.glm()
library(boot)

# Grab data
e_data <- read.csv("G:/Binomial Regression/set.csv", header = TRUE, stringsAsFactors = FALSE)

# Build model (the model needs to be told which data frame its variables live in)
model <- glm(Y ~ X1 + X2 + X3, data = e_data, family = binomial(link = "logit"))

# 10-fold cross-validation (the data frame must match the one used to fit the model)
cv.err <- cv.glm(e_data, model, K = 10)

# Return the cross-validation estimates of prediction error
cv.err$delta

The deltas should be compared across different models, and indeed against an MSE you may have calculated using a more traditional test/validation set.
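
The CSV above lives on my machine, so here is a self-contained sketch of the same recipe with simulated data (all variable names here are made up for illustration):

```r
library(boot)

# Simulate a binary outcome driven by two predictors
set.seed(42)
n  <- 500
X1 <- rnorm(n)
X2 <- rnorm(n)
Y  <- rbinom(n, 1, plogis(-0.5 + 1.2 * X1 - 0.8 * X2))
d  <- data.frame(Y, X1, X2)

# Fit the binomial GLM, then 10-fold cross-validate it
model  <- glm(Y ~ X1 + X2, data = d, family = binomial(link = "logit"))
cv.err <- cv.glm(d, model, K = 10)

cv.err$delta   # raw and bias-adjusted estimates of prediction error
```

cv.err$delta holds two numbers: the raw cross-validation estimate of prediction error and a bias-adjusted version.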

NB: here is a good article on cross-validation by Rob Hyndman:

Posted in data, forecasting, R, statistics, Uncategorized

Propensity Modelling

Been doing some propensity modelling recently.  There are some really good articles out there on how to perform the Binomial Regression.  Provided you’ve set up your data correctly, it’s fairly straightforward.
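
As a minimal sketch of the data setup (hypothetical data and variable names): glm() wants the response coded as 0/1, or as a factor, with family = binomial:

```r
# Hypothetical example: propensity to buy as a function of number of contacts
d <- data.frame(
  bought   = c("yes", "no", "no", "yes", "no", "yes"),
  contacts = c(5, 1, 3, 2, 0, 4)
)

# Recode the yes/no response to 0/1 before fitting
d$bought <- as.integer(d$bought == "yes")

fit <- glm(bought ~ contacts, data = d, family = binomial)
predict(fit, type = "response")   # fitted propensities, one per row
```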

This is a really good simple example:

Slightly more complicated:


Posted in Uncategorized

R Cluster Analysis

Hi All,

Have been fiddling about with various statistical packages for cluster analysis.  Namely, I’ve been using SPSS and R, and will shortly use Minitab.  While I really like SPSS, as I find the command line infuriating these days, I was quietly impressed with R. So, having got my dataset together, I plugged it in.

SPSS outputs the clusters in the following way, which is quite neat.

By double clicking on the input boxes one is able to see the distribution of data.  From there one can do one’s comparisons.  SPSS also outputs the cluster number back into the original dataset – like this:

           mpg cyl  disp  hp drat    wt  qsec vs am gear carb Cluster
Mazda RX4 21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4       2

Haven’t been able to get SPSS to display a dendrogram, though.

R, on the other hand, requires the command line!  A great and simple tutorial is here.

For my dataset, however, it resulted in a dendrogram that looked like this.  It’s a little hard to read, let’s be honest!

One can fiddle around with the font size, but with this number of rows one really needs the data in spreadsheet form to see which row belongs to which cluster.

This can be done using the following commands (with thanks to Chi Yau):

> B <- mtcars[1:10, ]
> x <- hclust(dist(as.matrix(B)))

The cluster tree info is in x$merge:

> x$merge
[,1] [,2]
[1,]   -1   -2
[2,]  -10    1
[3,]   -3   -9
[4,]   -4   -6
[5,]   -8    3
[6,]    2    5
[7,]   -5   -7
[8,]    4    6
[9,]    7    8

The first row of x$merge is (-1, -2). It means merging the 1st and 2nd cars of B into a new cluster #1.

The second row of x$merge is (-10, 1). It means merging the 10th car of B and cluster #1 into a new cluster #2.

The third row of x$merge is (-3, -9). It means merging the 3rd and 9th cars of B into a new cluster #3.

The rest of the table can be interpreted similarly.

Hence to get the cluster information of, say, “Hornet 4 Drive”, which is the 4th car in B:

> cbind(x$labels)
[1,] "Mazda RX4"
[2,] "Mazda RX4 Wag"
[3,] "Datsun 710"
[4,] "Hornet 4 Drive"
[5,] "Hornet Sportabout"
[6,] "Valiant"
[7,] "Duster 360"
[8,] "Merc 240D"
[9,] "Merc 230"
[10,] "Merc 280"

We can look up the row in x$merge that contains the member “-4”:

[4,]   -4   -6

It tells us that “Hornet 4 Drive”  is a member of cluster #4, which contains “Hornet 4 Drive” (-4) and “Valiant” (-6).
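
A simpler route to spreadsheet-style cluster membership (not covered in the tutorial above, but part of base R) is cutree(), which cuts the dendrogram into a chosen number of groups:

```r
# Same clustering as above
B <- mtcars[1:10, ]
x <- hclust(dist(as.matrix(B)))

# cutree() turns the hclust tree into flat cluster assignments
groups <- cutree(x, k = 3)   # cut the tree into 3 clusters

cbind(B, Cluster = groups)   # cluster number next to each original row
```

This avoids reading x$merge by hand when all you want is the cluster membership column.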

Posted in forecasting, statistics

New job, etc

Haven’t posted in a while…

Not in gambling anymore

The black smudge on my heart that was working in the gambling industry ceased 4 weeks ago, and now I work at a growing ASX 200 company in the Business Intelligence team.  It’s been interesting so far.

How to find a job

The process whereby I got this job was interesting, and it dimmed my view of HR company recruiters.  Some reacted personally to my non-acceptance of jobs, which I found uncomfortable, and others were a fraction pushy.  I am not into hard sells, so it was difficult for me.  But I guess they are on commission and it’s their livelihood, so good luck to them.

I did receive some help from them along the way.  After starting my job search in June, I sent out over 40 applications for a grand total of 1 job interview.  A good Samaritan from Suncorp (whom I need to take out for coffee in gratitude) knew me through a mutual contact and caught up with me, suggesting a few tweaks to my CV…

I beefed up my statistics/analysis background, and the result was … 11 interviews at 7 companies in 7 days and 3 job offers.

The lesson is – make your CV relevant to the job you’re going for – mine was too IT focussed.  The interviews after that seemed to be a breeze, almost to the point that I was pre-empting questions from the interviewer.  A great experience, which led to the difficult decision of “which job do I take?”.  City location, biggest salary, and being convinced by my new manager got me across the line.

So far, it’s been great.

Time Series Forecasting

Headed along to this talk last night:

Normally these R meetings bore me a little bit (too much coding, too little statistical analysis), but this one was really good.  Rob Hyndman is a great lecturer at Monash Uni and showed off his forecasting packages in R.  They looked ridiculously simple to use (indeed, almost a black box): throw some data into his functions and watch them choose the best model and spit back an accurate forecast.

The package did exponential smoothing, ARIMA, and Box-Cox transformed forecasts.  It was really amazing, and will make some aspects of my job easy.
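
I haven’t reproduced his examples here, but as a minimal sketch of that workflow (using the forecast package, with R’s built-in AirPassengers series as stand-in data of my choosing):

```r
library(forecast)

# auto.arima() does the model selection automatically
fit <- auto.arima(AirPassengers)

# Forecast 12 months ahead
fc <- forecast(fit, h = 12)
fc$mean   # the point forecasts
```

ets() works the same way for exponential smoothing state space models.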

RStudio also looked like a good R environment to work in.

Had the joy (sigh) of running a 99designs contest for a logo I wanted to create for a sports club I play for.  99designs is a competition site where you post a design brief and a prize, and designers compete for the prize money by submitting potential designs.  After 7 days or so you pick a winner.  It’s a site that has created justifiable angst in the design community:

And quite frankly the underlying mood of the contest (designer comments, etc) indicated some level of hostility/frustration.  Not a friendly environment to work in.  If you want something done, go see your local graphic designer.  That’s my advice.

Well I think I broached lots of topics here…until next time…

Posted in data, life, statistics