On page 120 of Infinite Jest, came across this gem. Yes, there’s a gem clocked at probably every other paragraph, but…
Something humble, placid even, about inert feet under stall doors. The defecatory posture is an accepting posture, it occurs to him. Head down, elbows on knees, the fingers laced together between the knees. Some hunched timeless millennial type of waiting, almost religious. Luther’s shoes on the floor beneath the chamber pot, placid, possibly made of wood, Luther’s 16th century shoes, awaiting epiphany. The mute quiescent suffering of generations of salesmen in the stalls of train-station johns, heads down, fingers laced, shined shoes inert, awaiting the acid gush. Women’s slippers, centurions’ dusty sandals, dockworkers’ hobnailed boots, Pope’s slippers. All waiting, pointing straight ahead, slightly tapping. Huge shaggy-browed men in skins hunched just past the firelight’s circle with wadded leaves in one hand waiting.
Reading Infinite Jest again this summer as part of Infinite Summer. Wondering what to make of this little quip of Hal’s as he’s talking to Mario in their darkened dorm room, discussing the death of their father and their mother’s sad/not sad reaction:
Remember the flag only halfway up the pole? Booboo, there are two ways to lower a flag to half-mast. Are you listening? Because no shit I really have to sleep here in a second. So listen – one way to lower the flag to half mast is just to lower the flag. There’s another way though. You can also just raise the pole. You can raise the pole to like twice its original height. You get me? You understand what I mean, Mario?
Installed Twitter Tools and mostly liked it, except that I couldn’t change the tweet format without editing the PHP file, so every upgrade would wipe out my changes. It also did a lot more than I wanted, and I’m a fan of software that seeks to do as little as possible. I cooked up a very simple plugin that will, if you so choose, send a status update to Twitter on posting. Options include using TinyURL to shorten your post’s permalink, and the ability to edit the format of the tweet itself.
You can download the plugin here. Let me know if it should suddenly explode or run off with your wife.
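For the curious, the tweet-format option boils down to filling a template and keeping the result under Twitter's 140-character limit. Here is a minimal sketch of that idea in Python rather than the plugin's PHP. The `{title}` and `{url}` placeholder names and the `build_tweet` function are hypothetical, chosen for illustration, and are not the plugin's actual tokens.

```python
# Hypothetical placeholder names; the plugin's real format tokens may differ.
TWEET_LIMIT = 140  # Twitter's per-status character limit in 2009


def build_tweet(template, title, url, limit=TWEET_LIMIT):
    """Fill {title} and {url} into template, trimming the title if too long."""
    tweet = template.format(title=title, url=url)
    overflow = len(tweet) - limit
    if overflow > 0:
        # Trim the title, never the URL, and mark the cut with "...".
        # If the title is too short to absorb the overflow, the result
        # can still exceed the limit; this sketch does not handle that.
        keep = max(len(title) - overflow - 3, 0)
        tweet = template.format(title=title[:keep] + "...", url=url)
    return tweet
```

Calling `build_tweet("New post: {title} {url}", post_title, short_url)` with an already-shortened permalink gives the status text; the shortening step itself is left out here.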
posted: 23 June 2009, 3:08 pm by Wells
comments: 11
tags: wordpress
NOTE: Here’s an update to this post which includes better and more complete scripts.
Been reading a lot of Joseph Adler’s incredible book Baseball Hacks. In it he details many ways to retrieve, parse, and investigate various sources of MLB data. One of the sections covers the data used to run MLB.com’s Gameday application. This data is all stored in XML files hosted on MLB.com’s servers and contains information for every game played since 2005: batters, pitchers, the results of each play, the box scores, and play-by-play data including the location and information for each pitch, derived from the PitchFX system. MLB updates this information every day, with every game, and there’s so much you can pull from it.
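To give a feel for what "pulling from it" looks like, here is a small sketch that reads pitch records out of an inning file with Python's standard `xml.etree.ElementTree`. The sample XML is hand-written for illustration, not real Gameday output, and the element and attribute names in it (`atbat`, `pitch`, `type`, `start_speed`) are assumptions about the file layout that should be checked against an actual download.

```python
import xml.etree.ElementTree as ET

# Hand-written sample, NOT real Gameday output; names are assumptions.
SAMPLE = """
<inning num="1">
  <top>
    <atbat batter="111111" pitcher="222222" des="Strikeout">
      <pitch des="Called Strike" type="S" start_speed="91.2"/>
      <pitch des="Ball" type="B" start_speed="84.0"/>
      <pitch des="Swinging Strike" type="S" start_speed="92.5"/>
    </atbat>
  </top>
</inning>
"""


def pitches(xml_text):
    """Return one dict of attributes per <pitch> element, in document order."""
    root = ET.fromstring(xml_text)
    return [dict(pitch.attrib) for pitch in root.iter("pitch")]


def average_speed(rows):
    """Mean of the (assumed) start_speed attribute, skipping pitches without it."""
    speeds = [float(r["start_speed"]) for r in rows if r.get("start_speed")]
    return sum(speeds) / len(speeds) if speeds else None
```

Because each pitch comes back as a plain dictionary, whatever attributes the real files carry pass through unchanged, so the same loop works for a first look before settling on a schema.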
Adler provides a few scripts in his book to spider the MLB site and retrieve the Gameday information using Perl. I decided to cook one up with Python, and the resulting script is far leaner, clocking in at 91 lines. For each game, it saves the box score, the players involved (older games stored this in a TXT file; newer in XML), information for each batter and pitcher in the game, the inning data (each play), and the play-by-play data for each pitch. It uses threading, something that in my humble opinion Python handles far better than Perl, to speed up the processing time while still being nice to the MLB.com servers.
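The spidering rests on the site's predictable directory layout: one directory per year, month, and day, with the game directories underneath. A minimal sketch of that layout follows, using the same base URL and output directory as the script in this post. The helper names `day_urls` and `local_path` are made up for illustration and do not appear in the script.

```python
import calendar

BASE = "http://gd2.mlb.com/components/game/mlb/"
OUTPUT = "./data/"


def day_urls(year, month):
    """Yield the Gameday directory URL for each calendar day of a month."""
    days_in_month = calendar.monthrange(year, month)[1]
    for day in range(1, days_in_month + 1):
        yield "%syear_%d/month_%02d/day_%02d/" % (BASE, year, month, day)


def local_path(url):
    """Map a remote URL to its mirror location under OUTPUT."""
    return url.replace(BASE, OUTPUT, 1)
```

Mirroring the remote path under a local directory means a downloaded file can always be traced back to its source URL, and a re-run can skip anything already on disk.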
BE FOREWARNED! I ran this script for seasons 2006 through the present day and it downloaded 9.2GB of data. I’ll be updating the script at some point to pull nightly updates. Additionally, I need to load all of this into a MySQL database, but for now, here is the script to grab all of the data.
#!/usr/bin/env python
# Python 2 script: spiders MLB.com's Gameday directories and mirrors them locally.
import threading
import urllib2
import os
import re
import calendar
import time

BASE = "http://gd2.mlb.com/components/game/mlb/"
OUTPUT = "./data/"
YEARS = range(2006, 2010)


class Handler(threading.Thread):
    """Downloads every game of one month; one thread is started per month."""

    def __init__(self, url):
        threading.Thread.__init__(self)
        self.url = url

    def fetch(self, url):
        """Return the body of url, or '' on a 404 or when every try fails."""
        for tries in xrange(10):
            try:
                #print "fetching %s" % url
                page = urllib2.urlopen(url)
            except urllib2.URLError, e:
                if isinstance(e, urllib2.HTTPError) and e.code == 404:
                    return ''
                print "retrying %s" % url
                # one second sleep between retries
                time.sleep(1)
                continue
            if page.code == 200:
                return page.read()
            break
        return ''

    def save(self, url, location):
        """Download url and write it to location, creating directories as needed."""
        content = self.fetch(url)
        if not content:
            return
        directory = os.path.dirname(location)
        if not os.path.exists(directory):
            try:
                os.makedirs(directory)
            except OSError:
                # another thread may have created the directory first
                if not os.path.isdir(directory):
                    raise
        with open(location, 'wb') as output:
            output.write(content)

    def regex_save(self, url, regex):
        """Save every file linked from the directory listing at url that matches regex."""
        content = self.fetch(url)
        if not content:
            return
        for match in re.finditer(regex, content, re.S):
            file_url = "%s%s" % (url, match.group(1))
            location = "%s%s" % (OUTPUT, file_url.replace(BASE, ''))
            self.save(file_url, location)

    def run(self):
        year = int(re.search(r'year_(\d{4})/', self.url).group(1))
        month = int(re.search(r'month_(\d{2})/', self.url).group(1))
        for day in range(1, calendar.monthrange(year, month)[1] + 1):
            url = "%sday_%02d/" % (self.url, day)
            html = self.fetch(url)
            if not html:
                # no listing for this day (off day, or the fetch failed)
                continue
            for match in re.finditer(r'<a href="(gid_\w+)/">', html, re.S):
                game_url = "%s%s/" % (url, match.group(1))
                location = "%s%s" % (OUTPUT, game_url.replace(BASE, ''))
                self.save("%sboxscore.xml" % game_url, "%sboxscore.xml" % location)
                self.save("%splayers.txt" % game_url, "%splayers.txt" % location)
                self.save("%splayers.xml" % game_url, "%splayers.xml" % location)
                self.regex_save("%spbp/batters/" % game_url, r'</a><a href="(\d+\.xml)">')
                self.regex_save("%spbp/pitchers/" % game_url, r'</a><a href="(\d+\.xml)">')
                self.regex_save("%sinning/" % game_url, r'</a><a href="(inning_\d+\.xml)">')
                self.regex_save("%sbatters/" % game_url, r'</a><a href="(\w+\.xml)">')
                self.regex_save("%spitchers/" % game_url, r'</a><a href="(\w+\.xml)">')
                # one second sleep between games
                time.sleep(1)


threads = []
for year in YEARS:
    print "processing %d..." % year
    for month in range(3, 12):
        handler = Handler("%syear_%d/month_%02d/" % (BASE, year, month))
        handler.start()
        threads.append(handler)

# wait for every month's thread to finish
for handler in threads:
    handler.join()
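The script above targets Python 2, whose `urllib2` module was split into `urllib.request` and `urllib.error` in Python 3. A minimal sketch of the same fetch logic (retry on failure, give up quietly on a 404) in Python 3 might look like the following. The `opener`, `tries`, and `delay` parameters are illustrative additions, not part of the original script; they exist so the retry behavior can be exercised without touching the network.

```python
import time
import urllib.error
import urllib.request


def fetch(url, opener=urllib.request.urlopen, tries=10, delay=1.0):
    """Return the response body as bytes, or b'' on a 404 or if every try fails."""
    for _ in range(tries):
        try:
            with opener(url) as page:
                return page.read()
        except urllib.error.HTTPError as e:
            # A missing file is normal here; do not retry it.
            if e.code == 404:
                return b""
        except urllib.error.URLError:
            # Network-level failure; fall through to the sleep and retry.
            pass
        time.sleep(delay)
    return b""
```

Note that the body comes back as `bytes` in Python 3, so the file should be opened in binary mode when saving, and any regex run over a directory listing needs either a bytes pattern or a decoded string.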
KYLE MACLACHLAN MUST BE STOPPED. The man wants to reboot, or remake, or rekindle, or restart, or simply maybe regurgitate TWIN PEAKS on the web, using five-minute episodes, or, as the kids call them, *barf* – WEBISODES. Normal, well-adjusted, obsessive fans of the original show, which was great, cannot let this happen. MacLachlan says that David Lynch will not be involved. Well great, that’s like saying there’s this new Nirvana record coming out, sans Kurt Cobain! Count me out.
Granted, all right, the second half of the second season was HORRIBLE, but the first season was far and away one of the brighter shining moments in the history of television, and that’s even if you believe in that hogwash phony-baloney Hollywood “moon landing” thing which of course you DON’T.
We’ll see how this develops. Hopefully, it won’t – then I can go about maintaining my memory of the original show as it was sullied and tarnished by… the original show. Ah, never mind.
posted: 19 June 2009, 3:45 pm by Wells
comments: 0