Here they are in CSV with MLBAM’s venue ID added. Here’s a schema you can use if you want to load them into a database:
DROP TABLE IF EXISTS `park_factors`;
CREATE TABLE `park_factors` (
  -- Primary key columns cannot be NULL, so these two are NOT NULL
  venue_id INT NOT NULL,
  year INT NOT NULL,
  name VARCHAR(100) DEFAULT NULL,
  -- One park factor per stat
  R FLOAT DEFAULT NULL,
  H FLOAT DEFAULT NULL,
  HR FLOAT DEFAULT NULL,
  H2B FLOAT DEFAULT NULL,
  H3B FLOAT DEFAULT NULL,
  BB FLOAT DEFAULT NULL,
  PRIMARY KEY (venue_id, year)
);
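If you would rather script the load than import by hand, here is a minimal sketch of one way to do it. It uses Python's built-in sqlite3 module as a stand-in for MySQL so it runs with no setup, and it assumes the CSV has a header row whose column names match the schema above. The sample rows, venue IDs, and park names are made up for illustration and are not real park factors.

```python
import csv
import io
import sqlite3

# Made-up sample rows in the assumed column order; the real CSV may differ.
SAMPLE_CSV = """venue_id,year,name,R,H,HR,H2B,H3B,BB
1,2010,Example Park,1.05,1.02,1.10,0.98,0.90,1.01
2,2010,Sample Field,0.95,0.99,0.88,1.03,1.20,1.00
"""

# SQLite version of the park_factors schema (types adapted from MySQL).
SCHEMA = """
CREATE TABLE park_factors (
    venue_id INTEGER NOT NULL,
    year INTEGER NOT NULL,
    name TEXT,
    R REAL, H REAL, HR REAL, H2B REAL, H3B REAL, BB REAL,
    PRIMARY KEY (venue_id, year)
)
"""

STAT_COLUMNS = ["R", "H", "HR", "H2B", "H3B", "BB"]


def load_park_factors(conn, csv_file):
    """Insert every CSV row into park_factors and return the row count."""
    reader = csv.DictReader(csv_file)
    rows = [
        (int(r["venue_id"]), int(r["year"]), r["name"],
         *(float(r[col]) for col in STAT_COLUMNS))
        for r in reader
    ]
    conn.executemany(
        "INSERT INTO park_factors VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)", rows
    )
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
loaded = load_park_factors(conn, io.StringIO(SAMPLE_CSV))
```

To point this at a real file, pass an open file handle instead of the `io.StringIO` wrapper. For MySQL you would swap in a MySQL driver and its placeholder style.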
MLB.com provides, among other things, all of the pitch information for each MLB and AAA game in XML format. It’s the data that drives their wonderful little service, Gameday. If you want to take a spin through what the data looks like, start here and poke around. What’s key, most folks agree, is the PITCHf/x information, but there are also pitch-by-pitch logs for every game.
I’ve put together a little package that includes (1) a schema for a MySQL database to hold the information, and (2) a Python script that handles fetching and parsing the XML data found on MLB.com servers. If you’re interested in such a thing, you can download it from the GitHub project page.
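To give a feel for the parsing half of that job, here is a rough sketch using Python's standard ElementTree module. This is not the code from the package. The XML below is a hand-written stand-in, and the element and attribute names (`atbat`, `pitch`, `des`, `start_speed`, and so on) are assumptions for illustration, so check them against the real files before relying on them. Fetching over HTTP is left out entirely.

```python
import xml.etree.ElementTree as ET

# Hand-written stand-in for a Gameday inning file. Element and attribute
# names are assumptions for illustration, not a documented schema.
SAMPLE_XML = """
<inning num="1">
  <top>
    <atbat num="1" batter="100001" pitcher="200001">
      <pitch id="3" type="B" des="Ball" start_speed="91.2"/>
      <pitch id="4" type="S" des="Called Strike" start_speed="84.5"/>
    </atbat>
  </top>
</inning>
"""


def parse_pitches(xml_text):
    """Flatten every <pitch> element into a dict, tagged with its at-bat."""
    root = ET.fromstring(xml_text)
    pitches = []
    for atbat in root.iter("atbat"):
        for pitch in atbat.iter("pitch"):
            speed = pitch.get("start_speed")
            pitches.append({
                "atbat": int(atbat.get("num")),
                "pitch_id": int(pitch.get("id")),
                "result": pitch.get("type"),
                "description": pitch.get("des"),
                # Not every pitch is guaranteed to carry tracking data.
                "start_speed": float(speed) if speed else None,
            })
    return pitches


pitches = parse_pitches(SAMPLE_XML)
```

Each dict maps naturally onto one row of a pitches table, which is roughly the shape a database loader wants.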
Detailed installation and execution instructions are on the GitHub wiki as well, but here is the short version:
./gameday.py
With the following arguments:
--year=XXXX four-digit year
--day=X,Y days in a comma-separated list
--month=X,Y months in a comma-separated list
--type=[mlb, aaa] optional: which league to process. Default is ‘mlb’. Any of the categories found here (AA, etc.) should work; I’ve only worked with MLB and AAA.
--verbose Shows every HTTP request
--delta Uses delta mode.
When run in delta mode, the script stores the last date it processed in the database. On the next execution, it starts from where it left off. This is useful for running the thing nightly to grab the latest stuff.
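The idea behind delta mode can be sketched in a few lines. This is an illustration of the technique, not the script's actual implementation. The `last_run` table, the function names, and the choice to resume on the day after the stored date are all assumptions, and sqlite3 stands in for MySQL so the example runs on its own.

```python
import sqlite3
from datetime import date, timedelta


def init_state(conn):
    """Create a one-row bookkeeping table (hypothetical name: last_run)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS last_run ("
        "id INTEGER PRIMARY KEY CHECK (id = 1), last_date TEXT NOT NULL)"
    )


def get_last_date(conn):
    """Return the last processed date, or None if nothing has run yet."""
    row = conn.execute("SELECT last_date FROM last_run WHERE id = 1").fetchone()
    return date.fromisoformat(row[0]) if row else None


def set_last_date(conn, day):
    """Record the given day as the most recently processed date."""
    conn.execute(
        "INSERT OR REPLACE INTO last_run (id, last_date) VALUES (1, ?)",
        (day.isoformat(),),
    )
    conn.commit()


def run_delta(conn, today, default_start, process_day):
    """Process each unprocessed day up to today, saving progress as we go."""
    last = get_last_date(conn)
    # Assumption: resume on the day after the stored date.
    day = last + timedelta(days=1) if last else default_start
    processed = []
    while day <= today:
        process_day(day)
        set_last_date(conn, day)  # saved per day, so a crash loses little
        processed.append(day)
        day += timedelta(days=1)
    return processed


conn = sqlite3.connect(":memory:")
init_state(conn)
# First run starts from the default date; the second picks up after it.
first = run_delta(conn, date(2010, 4, 6), date(2010, 4, 4), lambda d: None)
second = run_delta(conn, date(2010, 4, 8), date(2010, 4, 4), lambda d: None)
```

Saving the date after each day rather than once at the end means an interrupted nightly run only has to redo the day it was working on.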