More Retrosheet stuff…

In my previous post I used some Retrosheet tools to extract data from Retrosheet play-by-play files…

http://www.retrosheet.org/datause.txt

The problem with the previous command was that it spat out a series of unnamed columns. On the command line you can actually specify which columns you wish to extract:

  -f flist  give list of fields to output
              Default is 0-6,8-9,12-13,16-17,26-40,43-45,51,58-61

The following list presents all of the above options with the numbers to use
with the -f option to specify them. Those marked with an asterisk are produced
by the default option when the user specifies no fields.

Therefore one can use a command such as this to extract only fields 0 to 6:

BEVENT.EXE -y 2010 -s 0101 -e 1231 -f 0-6 2010*.EV* > 2010.1.csv

number field
------ -----
0 game id*
1 visiting team*
2 inning*
3 batting team*
4 outs*
5 balls*
6 strikes*
7 pitch sequence
8 vis score*
9 home score*
10 batter
11 batter hand
12 res batter*
13 res batter hand*
14 pitcher
15 pitcher hand
16 res pitcher*
17 res pitcher hand*
18 catcher
19 first base
20 second base
21 third base
22 shortstop
23 left field
24 center field
25 right field
26 first runner*
27 second runner*
28 third runner*
29 event text*
30 leadoff flag*
31 pinchhit flag*
32 defensive position*
33 lineup position*
34 event type*
35 batter event flag*
36 ab flag*
37 hit value*
38 SH flag*
39 SF flag*
40 outs on play*
41 double play flag
42 triple play flag
43 RBI on play*
44 wild pitch flag*
45 passed ball flag*
46 fielded by
47 batted ball type
48 bunt flag
49 foul flag
50 hit location
51 num errors*
52 1st error player
53 1st error type
54 2nd error player
55 2nd error type
56 3rd error player
57 3rd error type
58 batter dest* (5 if scores and unearned, 6 if team unearned)
59 runner on 1st dest* (5 if scores and unearned, 6 if team unearned)
60 runner on 2nd dest* (5 if scores and unearned, 6 if team unearned)
61 runner on 3rd dest* (5 if scores and unearned, 6 if team unearned)
62 play on batter
63 play on runner on 1st
64 play on runner on 2nd
65 play on runner on 3rd
66 SB for runner on 1st flag
67 SB for runner on 2nd flag
68 SB for runner on 3rd flag
69 CS for runner on 1st flag
70 CS for runner on 2nd flag
71 CS for runner on 3rd flag
72 PO for runner on 1st flag
73 PO for runner on 2nd flag
74 PO for runner on 3rd flag
75 Responsible pitcher for runner on 1st
76 Responsible pitcher for runner on 2nd
77 Responsible pitcher for runner on 3rd
78 New Game Flag
79 End Game Flag
80 Pinch-runner on 1st
81 Pinch-runner on 2nd
82 Pinch-runner on 3rd
83 Runner removed for pinch-runner on 1st
84 Runner removed for pinch-runner on 2nd
85 Runner removed for pinch-runner on 3rd
86 Batter removed for pinch-hitter
87 Position of batter removed for pinch-hitter
88 Fielder with First Putout (0 if none)
89 Fielder with Second Putout (0 if none)
90 Fielder with Third Putout (0 if none)
91 Fielder with First Assist (0 if none)
92 Fielder with Second Assist (0 if none)
93 Fielder with Third Assist (0 if none)
94 Fielder with Fourth Assist (0 if none)
95 Fielder with Fifth Assist (0 if none)
96 event num
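
Since the extracted file still has no header row, the numbers passed to -f have to be mapped back to names when the file is loaded. Below is a minimal Python sketch of that idea, assuming a file such as the 2010.1.csv produced above. The helper names (parse_field_list, FIELD_NAMES, read_events) are made up for illustration and are not part of the Retrosheet tools, only fields 0 to 9 are named, and it is assumed that BEVENT writes the columns in ascending field-number order.

```python
import csv

# Names for the first few BEVENT fields, copied from the list above.
# Extend this dict with any other field numbers you extract.
FIELD_NAMES = {
    0: "game id",
    1: "visiting team",
    2: "inning",
    3: "batting team",
    4: "outs",
    5: "balls",
    6: "strikes",
    7: "pitch sequence",
    8: "vis score",
    9: "home score",
}


def parse_field_list(flist):
    """Expand a -f style list such as "0-6,8-9" into [0, 1, ..., 6, 8, 9]."""
    fields = []
    for part in flist.split(","):
        if "-" in part:
            start, end = part.split("-")
            fields.extend(range(int(start), int(end) + 1))
        else:
            fields.append(int(part))
    return fields


def read_events(path, flist):
    """Read a headerless BEVENT CSV into a list of dicts keyed by field name."""
    # Assumption: columns appear in ascending field-number order.
    names = [FIELD_NAMES.get(n, "field %d" % n) for n in parse_field_list(flist)]
    with open(path, newline="") as handle:
        return [dict(zip(names, row)) for row in csv.reader(handle)]
```

Calling read_events("2010.1.csv", "0-6") would then give one dict per play, with keys such as "game id" and "inning" instead of bare column positions.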

Posted in baseball, data, retrosheet, statistics

Back to Baseball Statistics

Yep, I’m back to looking at Baseball statistics at work…

Will need to download data from the following sites:

http://baseball1.com/ and http://www.retrosheet.org/

While baseball1.com allows you to download in Access/CSV format, retrosheet.org has its own proprietary format for flat file storage…

However, there are clever people out there who have written converters, which is very handy. I don’t want to re-invent the wheel, so I’m heading over to these two sites to check out their data conversion applications:

http://chadwick.sourceforge.net/doc/index.html

http://www.retrosheet.org/tools.htm

Ended up using the Retrosheet tools to extract the data.  The instructions to do this are here:

http://www.retrosheet.org/stepex.txt

The following command was used to convert the data into CSV format.

BEVENT.EXE -y 2010 -s 0101 -e 1231 2010*.EV* > 2010.csv

The CSV file comes without headers, but the following link gives a list of the columns that BEVENT.EXE generates:

http://www.retrosheet.org/datause.txt
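
To repeat the extraction for several seasons, the command above could be built programmatically. Here is a rough Python sketch, assuming BEVENT.EXE is on the PATH and is run from the directory holding the event files. The function names are hypothetical, and only the flags already shown on this blog (-y, -s, -e and -f) are used.

```python
import subprocess


def bevent_command(year, fields=None):
    """Build the BEVENT argument list for one full season."""
    args = ["BEVENT.EXE", "-y", str(year), "-s", "0101", "-e", "1231"]
    if fields:
        args += ["-f", fields]
    # The wildcard is passed through untouched, as in the command above.
    # Assumption: BEVENT expands the wildcard itself.
    args.append("%d*.EV*" % year)
    return args


def run_bevent(year, out_path, fields=None):
    """Run BEVENT and send its output to out_path (like "> 2010.csv")."""
    with open(out_path, "w") as out:
        subprocess.run(bevent_command(year, fields), stdout=out, check=True)
```

For example, run_bevent(2010, "2010.csv") should be equivalent to the command line above, provided the working directory contains the 2010 event files.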


Clive James

I’ve always liked Clive James, here are some transcripts of his talks…I particularly like this one about his battles to give up smoking…

http://www.clivejames.com/point-of-view/smoking


Cassandra/Hadoop/Apache Stuff

I posted a few weeks ago that I was going to look into this Hadoop stuff. Well, I actually looked into the “Cassandra” project…and managed to get it going on a LAMP stack.

Cassandra is “A scalable multi-master database with no single points of failure”.

On Ubuntu 10.04 I extracted it to a directory and then installed phpcassa.  It took a while but I was able to run the tutorial script defined here:

http://thobbs.github.com/phpcassa/tutorial.html

I guess now it’s time to change my mindset from a relational (MySQL) database to this new way of storing data.
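
To make that shift a little more concrete, here is a toy Python sketch of the column-family idea. It is not phpcassa and does not talk to a real cluster. Each row key maps to its own set of named columns, so rows in the same column family do not need to share a schema. The users data and the helper names are invented for illustration.

```python
# A column family modelled as a dict of rows; each row is a dict of columns.
users = {}


def insert(column_family, row_key, columns):
    """Add or overwrite columns on a row, creating the row if needed."""
    column_family.setdefault(row_key, {}).update(columns)


def get(column_family, row_key, column_names=None):
    """Fetch a whole row, or only the named columns that exist on it."""
    row = column_family.get(row_key, {})
    if column_names is None:
        return dict(row)
    return {name: row[name] for name in column_names if name in row}


insert(users, "alice", {"email": "alice@example.com", "team": "Demons"})
insert(users, "bob", {"email": "bob@example.com"})  # no "team" column at all
insert(users, "alice", {"team": "Cats"})  # updates one column in place
```

Unlike a MySQL table, nothing here forces every row to carry the same columns, and lookups go through the row key rather than through joins.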

 


The Demons – in turmoil again/How to run a professional sports organisation

The Australian Rules football club I support here in Melbourne – the Melbourne Football Club – are in turmoil again after a near record thrashing at the hands of the Geelong Cats.

To be honest, with the work on my Masters (and playing sport on the weekend) my interest in the game has waned over the last few years. However, I still go to the odd game and occasionally read an opinion piece about the Demons in the paper.

For my entire lifetime, and since 1964, the club has pretty much lurched from one crisis to the next (proposed mergers, poor membership, poor performance, grand final thrashings, sacked coaches, tragic injuries to key players) and on the weekend we sacked our coach Dean Bailey.

All this news prompted me to voraciously consume online and traditional media content about the Demons, something I haven’t done for quite a while. In this search I stumbled across MFC CEO Cameron Schwab’s YouTube “Wednesday Whiteboard” sessions.

Cameron is probably my favourite sports administrator outside of the Oakland A’s Billy Beane (yes, Mr Moneyball), and while he may not be the subject of any books, I think he’s one of the most compelling speakers in AFL football.

So here’s an example of “Wednesday Whiteboard”:


The U.S. Debt Owed by Each American Throughout History – The Atlantic


Interesting article about the US Debt!  $14 Trillion is a lot of money!


X,Y,Z Co-ordinate Charting/Visualisation

Working with a raw dataset at the moment, and my team believes that a certain piece of data within the set may be x,y,z co-ordinates.  We also thought that the raw data we had could be better interpreted if we visualised it. (“No way??!!” I hear you say…)

Unfortunately Excel doesn’t offer a readily available way to plot things in 3D, so I went on a bit of a search for some free tools online. I came across KDNuggets of course, and their list of tools:

http://www.kdnuggets.com/software/visualization.html

and…

http://personal.cscs.ch/~mvalle/visualization/tools.html

But then I thought, R! R has 3D scatterplots, of course…

http://www.statmethods.net/graphs/scatterplot.html

Let’s see how that goes…
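
The post settles on R, but as an alternative, the same idea can be sketched in Python. This is only an illustration under assumptions: the raw data is taken to be comma-separated x,y,z text lines, the function names are hypothetical, and the plotting step needs the third-party matplotlib library.

```python
def parse_xyz(lines):
    """Turn lines like "1.0,2.5,-3" into a list of (x, y, z) float tuples."""
    points = []
    for line in lines:
        parts = line.strip().split(",")
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        try:
            points.append(tuple(float(p) for p in parts))
        except ValueError:
            continue  # skip lines with non-numeric values, such as a header
    return points


def plot_xyz(points, out_path="scatter3d.png"):
    """Save a 3D scatter plot of the points as an image (needs matplotlib)."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display needed
    import matplotlib.pyplot as plt

    xs, ys, zs = zip(*points)
    fig = plt.figure()
    ax = fig.add_subplot(projection="3d")
    ax.scatter(xs, ys, zs)
    ax.set_xlabel("x")
    ax.set_ylabel("y")
    ax.set_zlabel("z")
    fig.savefig(out_path)
```

If the suspected columns really are co-ordinates, the saved image should show some recognisable spatial structure rather than a shapeless cloud.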

 
