Archive for October, 2009

py-mlb: Python library for MLB.com “API”

So in a follow-up to an earlier post regarding MLB.com’s unofficial, undocumented, kind-of, sort-of API, I’ve written a Python abstraction layer for the league, the teams, and the players. Like everything it’s beta, or maybe alpha, and anything it does to your system is likely your own doing.

That being said, I’m rather happy with the first iteration, but there are some design issues to iron out and lots of to-dos. I wanted to throw it out there to the community so that A) it can be used, B) my Python skills and design ideas can be harshly criticized (only partially kidding here, I love to hear feedback), and C) anyone who wants to can work on the code as well.

As an example, here’s how one might get the entire roster for the Seattle Mariners:

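from py_mlb import player, team, league

# Look up the Mariners by team code and load their roster
l = league.League()
mariners = l.teams['sea']
mariners.fetchRoster()

# roster maps player IDs to player objects; the loop variables are named
# so they don't shadow the imported player and team modules
for player_id, p in mariners.roster.items():
    print(p.name_full)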

I’ll set up a permanent page for the project in a bit. For now, the project is hosted here at GitHub.

Cliff Lee makes good

Tonight featured an adjectivally challenging game pitched by Cliff Lee, giving his Phillies a 1-0 lead in the best-of-seven series. Words like masterful, dominating, domineering, totally badass, etc., all feel somewhat inadequate in describing what we saw. Lee threw a complete game and allowed no earned runs; the only Yankee to cross the plate did so on a throwing error by Jimmy Rollins in the ninth, with one out remaining.

Lee’s game score for the evening was 89, which ties him with Don Drysdale (1963) and Hod Eller (1919) for the sixth-best score all time in World Series games pitched. The best World Series game ever pitched by game score? It surprised me:

Dude             Team  Date        Game Score  IP  ER  SO
Babe Ruth        BOS   1916-10-09  97          14  1   4
Don Larsen       NYY   1956-10-08  94          9   0   7
Ed Walsh         CHW   1906-10-11  94          9   0   12
Bob Gibson       STL   1968-10-02  93          9   0   17
Randy Johnson    ARI   2001-10-28  91          9   0   11
Monte Pearson    NYY   1939-10-05  90          9   0   8
George Earnshaw  PHA   1931-10-06  90          9   0   8
Bill Dinneen     BOS   1903-10-02  90          9   0   11
Don Drysdale     LAD   1963-10-05  89          9   0   9
Hod Eller        CIN   1919-10-06  89          9   0   9
Cliff Lee        PHI   2009-10-28  89          9   0   10

In Babe Ruth’s game two win in 1916, he out-pitched Brooklyn’s Sherry Smith who himself went 13.1 innings, allowing two runs and striking out two for a game score of 82. Really nothing to sneeze at.

Take away Rollins’s mistake in the ninth and Lee goes to 91, putting him in some pretty elite company. The Phillies could truly ask for nothing more than what they got from game one. If P-E-D-R-O and Hamels are able to even come close to Lee, we could well see a sweep.
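For anyone wondering where the two extra points come from: here is a minimal sketch of Bill James’s original Game Score formula, which is what I’m assuming these numbers are based on. The function name and arguments are my own. The first check uses Don Larsen’s perfect game (27 outs and 7 strikeouts from the table, and by definition no hits, walks, or runs); the second uses a made-up pitching line, not Lee’s actual box score, just to show what a single unearned run costs.

```python
def game_score(outs, hits, earned_runs, unearned_runs, walks, strikeouts):
    """Bill James's original Game Score for a starting pitcher."""
    innings_completed = outs // 3
    score = 50                                   # every start begins at 50
    score += outs                                # +1 per out recorded
    score += 2 * max(0, innings_completed - 4)   # +2 per inning completed after the 4th
    score += strikeouts                          # +1 per strikeout
    score -= 2 * hits                            # -2 per hit allowed
    score -= 4 * earned_runs                     # -4 per earned run
    score -= 2 * unearned_runs                   # -2 per unearned run
    score -= walks                               # -1 per walk
    return score


# Don Larsen, 1956: a perfect game with 7 strikeouts.
larsen = game_score(outs=27, hits=0, earned_runs=0, unearned_runs=0,
                    walks=0, strikeouts=7)
print(larsen)  # 94, matching the table

# Hypothetical line (NOT a real box score): the same start with and
# without one unearned run.
with_run = game_score(outs=27, hits=5, earned_runs=0, unearned_runs=1,
                      walks=2, strikeouts=6)
without_run = game_score(outs=27, hits=5, earned_runs=0, unearned_runs=0,
                         walks=2, strikeouts=6)
print(without_run - with_run)  # 2
```

An unearned run is worth exactly two points, which is how an 89 becomes a 91 once the error is erased.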

Speaking of which, PEDRO. In Yankee Stadium. Like old times. Sort of.

MLB.com’s unofficially, thoroughly awesome API

So MLB.com has an unofficial API. When you go to any player’s page and open up an inspector such as Firefox’s Firebug or Safari’s Web Inspector, you can see that the page initiates a bunch of AJAX requests for various data, including awesomely useful things such as player statistics and game logs.

Behold Felix Hernandez’s ‘core’ player information. Mouse-over to see the URL.

And then you can get Felix’s game log using this incredibly long URL. I haven’t really played with the query string parameters too much, but there’s one, ‘results’, which is used on the player’s page to limit the number of games shown to 10. Making that 165 will show all games for the season. Likewise there’s ‘year’, which corresponds to the season. I’m not quite sure how far back the data will go. There’s also ‘game_type’; the value used in the URLs in this post is ‘R’, so I’m guessing it selects regular season, postseason, or preseason.
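As a rough sketch of how those three parameters fit into a query string, something like the following would do. The base URL here is a placeholder I made up, not the real endpoint, and the real URL carries other parameters that this ignores; only ‘results’, ‘year’, and ‘game_type’ come from what I’ve actually seen.

```python
from urllib.parse import urlencode

# Placeholder only: stands in for the long game-log URL mentioned above.
BASE_URL = "https://example.invalid/player_game_log"


def game_log_url(year, results=10, game_type="R", base_url=BASE_URL):
    """Build a game-log URL from the three query parameters described above."""
    params = {
        "game_type": game_type,  # 'R' is the value seen in this post's URLs
        "year": year,            # the season
        "results": results,      # maximum number of games returned
    }
    return base_url + "?" + urlencode(params)


# Ask for up to 165 games, i.e. the whole 2009 season.
print(game_log_url(2009, results=165))
```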

Here’s the URL for a position player, Ichiro’s 2009 stats.

There are more GET calls to return detailed player biography and news-related items, awards, and other stuff. There are also GET calls on the team pages for team information.

Anyway. I’m pretty certain that these APIs are not meant for public consumption, and could change at any time, though I hope not, because they are awesome. You can use the data returned with any basic knowledge of XML and JSON, and most (if not all) programming languages now have pretty solid libraries for that.
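To give a flavor of how little code that takes, here is a tiny sketch using Python’s standard library. Both payloads are invented for illustration: the ‘row’ wrapper and the field names are my assumptions, not MLB.com’s actual response format.

```python
import json
import xml.etree.ElementTree as ET

# Invented sample responses; the real ones are shaped differently and much larger.
json_text = '{"row": {"name_full": "Felix Hernandez", "team": "sea"}}'
xml_text = '<row name_full="Felix Hernandez" team="sea"/>'

# JSON decodes straight into dicts and lists.
from_json = json.loads(json_text)["row"]

# XML attributes come back as a dict on the parsed element.
from_xml = ET.fromstring(xml_text).attrib

print(from_json["name_full"], from_xml["team"])
```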

Update: I have made a Python abstraction layer for the MLB.com API. You can find it on its GitHub project page or read more about it in this blog post.

FINAL UPDATE – This project now lives on my baseball projects page.

Retrosheet and GitHub

In a post a few months ago I put up some code to spider the Retrosheet site, download game data, and build a database out of event & game data. A reader responded a week or so ago recommending an update so the code works with the newer (0.5.2) version of Chadwick, and suggesting I put the code out on GitHub so it can be worked on collaboratively. I’m all for it, so I went ahead and signed up and made the project publicly available.

The project is located here. Dude who recommended this all to me made a branch and submitted a patch (pardon, git folks, if patch isn’t really the word, I’m still firmly planted in the Subversion universe), which I then applied to the master project.

DO FEEL FREE to fork and work on the thing – I’d love to see whatever you might make of it. In my spare time as of late I’ve been building a Python library which pulls data from MLB.com for the current season (which, sadly, is all but over) and provides nice abstractions for working with it. I’ll put it up on GitHub when it’s ready to be roundly criticized (er, publicized).

Not at all interested in talking about the past, McGwire to return?

HUH! The big news floating around tonight does not involve the weather in New York: it’s that the world shall soon see the (triumphant? tragic?) reentry of one Mark McGwire. According to… sources… he’s going to be LaRussa’s hitting coach next year in St. Louis.

I wonder if LaRussa is making this a condition of his own re-signing: surely the Cardinals brass are telling him what a horribly awkward, if not outright horrible, PR issue this will be. I don’t think LaRussa quite gets PR, or quite gets the fact that other human beings act out entirely complex emotional lives outside of the minutiae contained inside those white lines of his universe. And not to intrude upon those very lines with frothing ignorance, but wasn’t Mark McGwire a career .263 hitter?

It will be interesting to watch it unfold. STL fans seem to still love the guy. Will fans on the road take out their long-bottled anger on McGwire? Or will bygones be bygones?