<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Downloading MLB data with Python&#8230;</title>
	<atom:link href="http://blog.wellsoliver.com/2009/06/downloading-mlb-data-with-python/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.wellsoliver.com/2009/06/downloading-mlb-data-with-python/?utm_source=rss&amp;utm_medium=rss&amp;utm_campaign=downloading-mlb-data-with-python</link>
	<description>on the cutting edge of the back burner</description>
	<lastBuildDate>Tue, 07 Sep 2010 07:17:30 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
	<item>
		<title>By: Brock</title>
		<link>http://blog.wellsoliver.com/2009/06/downloading-mlb-data-with-python/comment-page-1/#comment-34</link>
		<dc:creator>Brock</dc:creator>
		<pubDate>Mon, 06 Jul 2009 23:50:00 +0000</pubDate>
		<guid isPermaLink="false">http://blog.wellsoliver.com/?p=58#comment-34</guid>
		<description>Thanks for the feedback!!!  I can&#039;t wait to work through this.

Many thanks,

Brock</description>
		<content:encoded><![CDATA[<p>Thanks for the feedback!!!  I can&#8217;t wait to work through this.</p>
<p>Many thanks,</p>
<p>Brock</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Wells</title>
		<link>http://blog.wellsoliver.com/2009/06/downloading-mlb-data-with-python/comment-page-1/#comment-33</link>
		<dc:creator>Wells</dc:creator>
		<pubDate>Mon, 06 Jul 2009 15:25:36 +0000</pubDate>
		<guid isPermaLink="false">http://blog.wellsoliver.com/?p=58#comment-33</guid>
		<description>OK Brock, let me try and start from the top. First I think a lot of your issues are because you&#039;re running Windows, and I haven&#039;t tested this script (or any script) on a Windows machine because I don&#039;t have access to one (by design!). My apologies. I&#039;m wondering if the threading is causing some issue when working w/ the FAT32 file system.

1. I am running Python 2.6. MySQLdb can be built and installed against Python 2.6.2.

2. I&#039;m not sure where you&#039;d install Chadwick on Windows, but all you need to do is change the location as defined in the variable on line 21 (CHADWICK) to like &quot;c:\program files\chadwick&quot;, or whatever.

3. Classes are a nice way of encapsulating functionality into discrete units to avoid redundant code as well as provide easy reuse. This is a huge oversimplification. It&#039;s a big topic; maybe you could start here: http://en.wikipedia.org/wiki/Object-oriented_programming

4. Threads offer a way of running tasks concurrently, so you can handle N things at once versus handling one thing, waiting for it to finish, and continuing on. This too is an oversimplification- threading can become quite complex, but that&#039;s the central idea.

5. Queues in Python are much faster to consume from the front: when you use a list, you have to .pop(0) to get the next element, and to do this Python has to shift every remaining element down one slot. This takes a while, especially when you have a large set of data. Queues also handle their own locking, so in my experience they play nicer with multithreaded applications.

6a. I am confused as to why you&#039;d get an IO error on line 70- that line just connects to a database. The culprit is probably my lack of attention to Windows and the FAT32/NTFS filesystem- I wonder if it&#039;s locking something where my ext3fs (on Linux) is not. I&#039;ll look into this, but I don&#039;t have a great answer. You sure it&#039;s line 70?

6b. Do you have a line # for where this error is occurring?

6c. This could be related to 6a. I&#039;ll try to find a Windows machine and test this sucker out.
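
To make points 4 and 5 a bit more concrete, here is a minimal sketch of a few threads pulling work off a queue. It is not taken from the script: the names worker and run are made up for illustration, doubling the year is only a stand-in for real work such as a download, and it assumes Python 3, where the module is called queue (in the 2.x series it is Queue).

```python
import queue      # named "Queue" in Python 2.x
import threading


def worker(jobs, results):
    # Each thread keeps pulling items until it sees the None sentinel.
    while True:
        year = jobs.get()
        if year is None:
            break
        results.put(year * 2)   # stand-in for real work


def run(years, num_threads=4):
    jobs = queue.Queue()
    results = queue.Queue()
    threads = [threading.Thread(target=worker, args=(jobs, results))
               for _ in range(num_threads)]
    for t in threads:
        t.start()
    for year in years:
        jobs.put(year)
    for _ in threads:
        jobs.put(None)          # one sentinel per thread so each one exits
    for t in threads:
        t.join()
    # Threads finish in no fixed order, so sort before returning.
    return sorted(results.get() for _ in range(len(years)))


print(run([2007, 2008, 2009]))
```

The queue does the locking, so the threads never need to coordinate by hand over who takes which item.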

Regarding your second post..

1. Python&#039;s range() stops just before its upper bound, so range(2009, 2009) is empty and the for loop never runs. If you want 2009 to be included, you have to go to 2010, and the loop will end after 2009 without ever reaching 2010.

2. &#039;self&#039; is more a convention in Python than anything- it refers to the actual object that you&#039;ve instantiated (based on the class), and Python passes it in automatically as the first argument of each method, which is why you never see it defined anywhere. I&#039;d refer you back to that Wikipedia link on OO programming, and maybe Google for &quot;python self&quot; and see what there is to see.

3. Do you mean the import statements? Or something else? The import statements are in no particular order.

4. Do you mean the pbp script, or the Retrosheet script? I am assuming the former, b/c there are no XML files w/ the Retrosheet information. So regarding the former, the location of the saved XML files is defined by the OUTPUT variable on line 12- if you change that to like &#039;OUTPUT=&quot;c:\\wherever\\&quot;&#039; (backslashes doubled, because a lone backslash before the closing quote would escape it), it will put the XML files there.
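
Here is a small sketch that pulls the range() and self answers together. The Downloader class and its output and years names are hypothetical, made up for this example rather than copied from the script, and the forward-slash path is just one way to avoid backslash escaping on Windows.

```python
class Downloader(object):
    def __init__(self, output):
        # "self" is the instance being created; the path is stored on it.
        self.output = output

    def years(self, start, end):
        # range() stops before its second argument, so add 1 to include "end".
        return list(range(start, end + 1))


d = Downloader("c:/wherever/")   # Python passes d in as "self" automatically

print(list(range(2009, 2009)))   # []
print(list(range(2009, 2010)))   # [2009]
print(d.years(2009, 2009))       # [2009]
print(d.output)
```

Calling d.years(2009, 2009) is the same as calling Downloader.years(d, 2009, 2009), which is all that self amounts to.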

Hope this helps.</description>
		<content:encoded><![CDATA[<p>OK Brock, let me try and start from the top. First I think a lot of your issues are because you&#8217;re running Windows, and I haven&#8217;t tested this script (or any script) on a Windows machine because I don&#8217;t have access to one (by design!). My apologies. I&#8217;m wondering if the threading is causing some issue when working w/ the FAT32 file system.</p>
<p>1. I am running Python 2.6. MySQLdb can be built and installed against Python 2.6.2.</p>
<p>2. I&#8217;m not sure where you&#8217;d install Chadwick on Windows, but all you need to do is change the location as defined in the variable on line 21 (CHADWICK) to like &#8220;c:\program files\chadwick&#8221;, or whatever.</p>
<p>3. Classes are a nice way of encapsulating functionality into discrete units to avoid redundant code as well as provide easy reuse. This is a huge oversimplification. It&#8217;s a big topic; maybe you could start here: <a href="http://en.wikipedia.org/wiki/Object-oriented_programming" rel="nofollow">http://en.wikipedia.org/wiki/Object-oriented_programming</a></p>
<p>4. Threads offer a way of running tasks concurrently, so you can handle N things at once versus handling one thing, waiting for it to finish, and continuing on. This too is an oversimplification- threading can become quite complex, but that&#8217;s the central idea.</p>
<p>5. Queues in Python are much faster to consume from the front: when you use a list, you have to .pop(0) to get the next element, and to do this Python has to shift every remaining element down one slot. This takes a while, especially when you have a large set of data. Queues also handle their own locking, so in my experience they play nicer with multithreaded applications.</p>
<p>6a. I am confused as to why you&#8217;d get an IO error on line 70- that line just connects to a database. The culprit is probably my lack of attention to Windows and the FAT32/NTFS filesystem- I wonder if it&#8217;s locking something where my ext3fs (on Linux) is not. I&#8217;ll look into this, but I don&#8217;t have a great answer. You sure it&#8217;s line 70?</p>
<p>6b. Do you have a line # for where this error is occurring?</p>
<p>6c. This could be related to 6a. I&#8217;ll try to find a Windows machine and test this sucker out.</p>
<p>Regarding your second post..</p>
<p>1. Python&#8217;s range() stops just before its upper bound, so range(2009, 2009) is empty and the for loop never runs. If you want 2009 to be included, you have to go to 2010, and the loop will end after 2009 without ever reaching 2010.</p>
<p>2. &#8216;self&#8217; is more a convention in Python than anything- it refers to the actual object that you&#8217;ve instantiated (based on the class), and Python passes it in automatically as the first argument of each method, which is why you never see it defined anywhere. I&#8217;d refer you back to that Wikipedia link on OO programming, and maybe Google for &#8220;python self&#8221; and see what there is to see.</p>
<p>3. Do you mean the import statements? Or something else? The import statements are in no particular order.</p>
<p>4. Do you mean the pbp script, or the Retrosheet script? I am assuming the former, b/c there are no XML files w/ the Retrosheet information. So regarding the former, the location of the saved XML files is defined by the OUTPUT variable on line 12- if you change that to like &#8216;OUTPUT="c:\\wherever\\"&#8217; (backslashes doubled, because a lone backslash before the closing quote would escape it), it will put the XML files there.</p>
<p>Hope this helps.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Brock</title>
		<link>http://blog.wellsoliver.com/2009/06/downloading-mlb-data-with-python/comment-page-1/#comment-30</link>
		<dc:creator>Brock</dc:creator>
		<pubDate>Sun, 05 Jul 2009 00:13:21 +0000</pubDate>
		<guid isPermaLink="false">http://blog.wellsoliver.com/?p=58#comment-30</guid>
		<description>Ok, let me work backwards.  Maybe I tried to bite off too much too quickly.  The code above was more manageable, and after a few tweaks (indenting matters?), I got it to run. 

I know I am asking a lot from you, but maybe these few questions will help with the laundry list I wrote above.

1)  I tried to set the range to be (2009,2009) - just wanted one year - and to get it to run, I had to say 2009,2010.  Why?  In VBA, I think for comparable projects, I did 2009,2009, so I am kinda confused on that one (but I did figure it out! :) )

2)  This self thing is throwing me.  I don&#039;t see it defined anywhere, I am kinda lost as to what it implies, what it gets, etc.  Obviously from the name and google searches I get that it means this project, or something referenced inside the module, but I can&#039;t quite get my head around its &quot;value&quot;.

3)  In what order are the modules called - I guess I am thinking in terms of how the program works through each iteration.  I can look at the modules and understand the general process of each, but the order is throwing me.

4)  Finally, how is the file path where the xml files are saved determined?  I found the files on my computer, but I would have preferred to store them someplace else.



Honestly, I apologize, because I know I am asking a lot.  Any help or insight is MASSIVELY appreciated!

Best wishes,  Brock</description>
		<content:encoded><![CDATA[<p>Ok, let me work backwards.  Maybe I tried to bite off too much too quickly.  The code above was more manageable, and after a few tweaks (indenting matters?), I got it to run. </p>
<p>I know I am asking a lot from you, but maybe these few questions will help with the laundry list I wrote above.</p>
<p>1)  I tried to set the range to be (2009,2009) &#8211; just wanted one year &#8211; and to get it to run, I had to say 2009,2010.  Why?  In VBA, I think for comparable projects, I did 2009,2009, so I am kinda confused on that one (but I did figure it out! :) )</p>
<p>2)  This self thing is throwing me.  I don&#8217;t see it defined anywhere, I am kinda lost as to what it implies, what it gets, etc.  Obviously from the name and google searches I get that it means this project, or something referenced inside the module, but I can&#8217;t quite get my head around its &#8220;value&#8221;.</p>
<p>3)  In what order are the modules called &#8211; I guess I am thinking in terms of how the program works through each iteration.  I can look at the modules and understand the general process of each, but the order is throwing me.</p>
<p>4)  Finally, how is the file path where the xml files are saved determined?  I found the files on my computer, but I would have preferred to store them someplace else.</p>
<p>Honestly, I apologize, because I know I am asking a lot.  Any help or insight is MASSIVELY appreciated!</p>
<p>Best wishes,  Brock</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Brock</title>
		<link>http://blog.wellsoliver.com/2009/06/downloading-mlb-data-with-python/comment-page-1/#comment-29</link>
		<dc:creator>Brock</dc:creator>
		<pubDate>Sat, 04 Jul 2009 13:25:50 +0000</pubDate>
		<guid isPermaLink="false">http://blog.wellsoliver.com/?p=58#comment-29</guid>
		<description>Many, many thanks.  No issues...just trying to learn.  I wanted to post to the other entry, but the form was closed...my apologies.  I will try to break up my questions, because I am sure at some point, you will get sick of answering them :), which is completely fine.  But here it goes:

Pre-python questions:

1.  What version are you using?  I was getting an error when I was using Python 2.5 on the extractall from zipfile, and when I looked it up, it appears that it is only available in 2.6.  I upgraded, but MySQLdb isn&#039;t supported on 2.6.  I have to be missing something obvious.  That said, I did find a link that I believe installed the module for 2.6, and in Eclipse, while I believe I can connect and query the database, the error says that it is an unresolved import.

2.  I downloaded the same version of Chadwick as you, but where do I put all of the files, or some of the files?  I can not believe I did this right (as I am unsure as to what &quot;/usr/local/bin/&quot; is).

Python basics:

3.  One of the things I am having the most trouble with is looking at modules and classes.  I think I start to see why it&#039;s good to use modules, but why classes?  When you call a class, does it just run all of the modules for you..is that the idea?  Also, in your code and others, I often see def __init__.  Between that and self (what is this?), I have no idea what is going on.

4.  What are threads, as in threading.Thread?

5.  Is there an advantage to using queue instead of a list?

6.  As for when I execute the program, I am getting a host of errors.  I won&#039;t paste my console here, but I will try to summarize.

6A)  The first error I am getting is WindowsError: Error32, which appears to be at os.f at approximately line 70.  It says that it cannot access the file because it is being used by another process.  I get this error 20 times, which appears to be the range you set in the for loop (and as the constant).  I am using Vista Ultimate 64bit...this is probably the culprit, right?

6B)  When it processes the game files (printed to console), I get an error that the system cannot find the specified path.

6C)  The console tells me that it is processing events and games for years 1911 thru 1961.  At the end, I get another error on os.remove(file).  Again, it says that it cannot access the file because it is being used by another program.


Phew.  This is probably pushing my luck, but I can tell you I already have learned a TON!  I learn by seeing, so regex is not as scary anymore, and I have completely started to internalize program flow when using modules as well as loops.  A major step forward in just one day.

Anyway, I apologize for the long comment, but I really, really do appreciate your time.  Have a great 4th!</description>
		<content:encoded><![CDATA[<p>Many, many thanks.  No issues&#8230;just trying to learn.  I wanted to post to the other entry, but the form was closed&#8230;my apologies.  I will try to break up my questions, because I am sure at some point, you will get sick of answering them :), which is completely fine.  But here it goes:</p>
<p>Pre-python questions:</p>
<p>1.  What version are you using?  I was getting an error when I was using Python 2.5 on the extractall from zipfile, and when I looked it up, it appears that it is only available in 2.6.  I upgraded, but MySQLdb isn&#8217;t supported on 2.6.  I have to be missing something obvious.  That said, I did find a link that I believe installed the module for 2.6, and in Eclipse, while I believe I can connect and query the database, the error says that it is an unresolved import.</p>
<p>2.  I downloaded the same version of Chadwick as you, but where do I put all of the files, or some of the files?  I can not believe I did this right (as I am unsure as to what &#8220;/usr/local/bin/&#8221; is).</p>
<p>Python basics:</p>
<p>3.  One of the things I am having the most trouble with is looking at modules and classes.  I think I start to see why it&#8217;s good to use modules, but why classes?  When you call a class, does it just run all of the modules for you..is that the idea?  Also, in your code and others, I often see def __init__.  Between that and self (what is this?), I have no idea what is going on.</p>
<p>4.  What are threads, as in threading.Thread?</p>
<p>5.  Is there an advantage to using queue instead of a list?</p>
<p>6.  As for when I execute the program, I am getting a host of errors.  I won&#8217;t paste my console here, but I will try to summarize.</p>
<p>6A)  The first error I am getting is WindowsError: Error32, which appears to be at os.f at approximately line 70.  It says that it cannot access the file because it is being used by another process.  I get this error 20 times, which appears to be the range you set in the for loop (and as the constant).  I am using Vista Ultimate 64bit&#8230;this is probably the culprit, right?</p>
<p>6B)  When it processes the game files (printed to console), I get an error that the system cannot find the specified path.</p>
<p>6C)  The console tells me that it is processing events and games for years 1911 thru 1961.  At the end, I get another error on os.remove(file).  Again, it says that it cannot access the file because it is being used by another program.</p>
<p>Phew.  This is probably pushing my luck, but I can tell you I already have learned a TON!  I learn by seeing, so regex is not as scary anymore, and I have completely started to internalize program flow when using modules as well as loops.  A major step forward in just one day.</p>
<p>Anyway, I apologize for the long comment, but I really, really do appreciate your time.  Have a great 4th!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Wells</title>
		<link>http://blog.wellsoliver.com/2009/06/downloading-mlb-data-with-python/comment-page-1/#comment-28</link>
		<dc:creator>Wells</dc:creator>
		<pubDate>Sat, 04 Jul 2009 04:48:10 +0000</pubDate>
		<guid isPermaLink="false">http://blog.wellsoliver.com/?p=58#comment-28</guid>
		<description>Hey Brock, fire away with any questions or issues.</description>
		<content:encoded><![CDATA[<p>Hey Brock, fire away with any questions or issues.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Brock</title>
		<link>http://blog.wellsoliver.com/2009/06/downloading-mlb-data-with-python/comment-page-1/#comment-27</link>
		<dc:creator>Brock</dc:creator>
		<pubDate>Fri, 03 Jul 2009 21:02:02 +0000</pubDate>
		<guid isPermaLink="false">http://blog.wellsoliver.com/?p=58#comment-27</guid>
		<description>First let me say how happy I am that I found your postings.  I am a hobby programmer at best, but I have been trying to learn Python lately since a software package I use at work has embraced it.  

Anyway, I read Adler&#039;s book a ways back and wanted to teach myself Python  by replicating his work...which you did.  Long story short, your two baseball scripts embody everything I want to learn for Python (get web data, parse it into a db, and even extract zip files when needed). I can look at the code and have a general idea of what is going on (I already learned a few things as I worked through it), but I still have a ton of holes that I need to fill in. 

Anyway, since I am really new to Python and am not a computer scientist, working through errors that I am getting is next to impossible.  I am hoping you might be willing to help me out / answer some questions of mine so I can learn the skills you used and apply them to other projects down the road.  If not, which I would understand, any places you can suggest so I can try to debug my errors and learn how the program works?

Many thanks for posting this!  

- Brock</description>
		<content:encoded><![CDATA[<p>First let me say how happy I am that I found your postings.  I am a hobby programmer at best, but I have been trying to learn Python lately since a software package I use at work has embraced it.  </p>
<p>Anyway, I read Adler&#8217;s book a ways back and wanted to teach myself Python  by replicating his work&#8230;which you did.  Long story short, your two baseball scripts embody everything I want to learn for Python (get web data, parse it into a db, and even extract zip files when needed). I can look at the code and have a general idea of what is going on (I already learned a few things as I worked through it), but I still have a ton of holes that I need to fill in. </p>
<p>Anyway, since I am really new to Python and am not a computer scientist, working through errors that I am getting is next to impossible.  I am hoping you might be willing to help me out / answer some questions of mine so I can learn the skills you used and apply them to other projects down the road.  If not, which I would understand, any places you can suggest so I can try to debug my errors and learn how the program works?</p>
<p>Many thanks for posting this!  </p>
<p>- Brock</p>
]]></content:encoded>
	</item>
</channel>
</rss>
