<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Silveira Neto &#187; urllib</title>
	<atom:link href="http://silveiraneto.net/tag/urllib/feed/" rel="self" type="application/rss+xml" />
	<link>http://silveiraneto.net</link>
	<description></description>
	<lastBuildDate>Fri, 09 Mar 2012 04:13:27 +0000</lastBuildDate>
	<language>pt-br</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Python Fast XML Parsing</title>
		<link>http://silveiraneto.net/2009/12/25/python-fast-xml-parsing/</link>
		<comments>http://silveiraneto.net/2009/12/25/python-fast-xml-parsing/#comments</comments>
		<pubDate>Fri, 25 Dec 2009 18:04:50 +0000</pubDate>
		<dc:creator>Silveira</dc:creator>
				<category><![CDATA[english]]></category>
		<category><![CDATA[dtd]]></category>
		<category><![CDATA[expat]]></category>
		<category><![CDATA[game]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[pygame]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[sax]]></category>
		<category><![CDATA[schema]]></category>
		<category><![CDATA[urllib]]></category>
		<category><![CDATA[XML]]></category>

		<guid isPermaLink="false">http://silveiraneto.net/?p=3081</guid>
		<description><![CDATA[<a href="http://silveiraneto.net/2009/12/25/python-fast-xml-parsing/" title="Python Fast XML Parsing"></a>Here is a useful tip on Python XML decoding. I was extending the xml.sax.ContentHandler class in an example that decodes maps for a Pygame application when my connection went down, and I noticed that the program stopped working, raising an exception &#8230;<p class="read-more"><a href="http://silveiraneto.net/2009/12/25/python-fast-xml-parsing/">Read more &#187;</a></p>]]></description>
			<content:encoded><![CDATA[<a href="http://silveiraneto.net/2009/12/25/python-fast-xml-parsing/" title="Python Fast XML Parsing"></a><p style="text-align: center;"><img class="size-full wp-image-3082 aligncenter" title="monty python bunny toy" src="http://silveiraneto.net/wp-content/uploads/2009/12/monty_python_bunny_toy.jpg" alt="" width="175" height="178" /></p>
<p>Here is a useful tip on Python XML decoding.</p>
<p>I was extending the <a title="Python Documentation on XML SAX" href="http://docs.python.org/library/xml.sax.html">xml.sax.ContentHandler</a> class in <a title="Tiled TMX Map Loader for Pygame" href="http://silveiraneto.net/2009/12/19/tiled-tmx-map-loader-for-pygame/">an example that decodes maps for a Pygame application</a> when my connection went down, and I noticed that the program stopped working, raising an exception related to a call to <a title="Python Documentation on urllib" href="http://docs.python.org/library/urllib.html">urllib</a> (a module for retrieving resources by URL). It turned out that the parser was fetching the remote <a title="Wikipedia on Document Type Definition" href="http://en.wikipedia.org/wiki/Document_Type_Definition">DTD</a> declared at the top of the XML file:</p>

<div class="wp_syntax"><div class="code"><pre class="xml" style="font-family:monospace;"><span style="color: #00bbdd;">&lt;!DOCTYPE map SYSTEM &quot;http://mapeditor.org/dtd/1.0/map.dtd&quot;&gt;</span></pre></div></div>

<p>This is not a requirement for my application, and it adds a large performance overhead even when it works (almost 1 second for each map loaded). When the application runs in an environment without Internet access, it waits for almost a minute and then fails without decoding the rest of the file. A quick-and-dirty workaround is to open the XML file and remove the line containing the DTD reference.</p>
<p>But the right way to parse XML when we are not concerned with validating it against a DTD is to use <a href="http://docs.python.org/library/pyexpat.html">xml.parsers.expat</a> directly. Instead of implementing an interface, you just set a few callback functions with the behavior you want. This is an example from the documentation:</p>
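<p>If staying with xml.sax is preferable, another possible workaround is to ask the parser not to load external entities. The sketch below assumes the default expat-based SAX reader; the <code>ElementNames</code> handler, the <code>parse_without_external_entities</code> helper and the sample map are hypothetical names made up for illustration, and only the DOCTYPE line comes from the file discussed above.</p>

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_external_ges


class ElementNames(ContentHandler):
    """Hypothetical handler that records the name of each element it sees."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        self.names.append(name)


def parse_without_external_entities(data, handler):
    """Parse XML bytes with SAX while leaving external entities unresolved."""
    parser = xml.sax.make_parser()
    # Assumption: with the expat-based reader, turning this feature off also
    # keeps the parser from fetching the remote DTD named in the DOCTYPE.
    parser.setFeature(feature_external_ges, False)
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(data))
    return handler


# Illustrative document; the element and attribute names are made up.
SAMPLE = b"""<?xml version="1.0"?>
<!DOCTYPE map SYSTEM "http://mapeditor.org/dtd/1.0/map.dtd">
<map version="1.0"><layer name="ground"/></map>"""

names = parse_without_external_entities(SAMPLE, ElementNames()).names
print(names)
```

<p>Recent Python 3 releases appear to ship with this feature already turned off, so the explicit call mostly matters on older interpreters.</p>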

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">import</span> <span style="color: #dc143c;">xml</span>.<span style="color: black;">parsers</span>.<span style="color: black;">expat</span>
&nbsp;
<span style="color: #808080; font-style: italic;"># 3 handler functions</span>
<span style="color: #ff7700;font-weight:bold;">def</span> start_element<span style="color: black;">&#40;</span>name, attrs<span style="color: black;">&#41;</span>:
    <span style="color: #ff7700;font-weight:bold;">print</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">'Start element:'</span>, name, attrs<span style="color: black;">&#41;</span>
<span style="color: #ff7700;font-weight:bold;">def</span> end_element<span style="color: black;">&#40;</span>name<span style="color: black;">&#41;</span>:
    <span style="color: #ff7700;font-weight:bold;">print</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">'End element:'</span>, name<span style="color: black;">&#41;</span>
<span style="color: #ff7700;font-weight:bold;">def</span> char_data<span style="color: black;">&#40;</span>data<span style="color: black;">&#41;</span>:
    <span style="color: #ff7700;font-weight:bold;">print</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">'Character data:'</span>, <span style="color: #dc143c;">repr</span><span style="color: black;">&#40;</span>data<span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
&nbsp;
p = <span style="color: #dc143c;">xml</span>.<span style="color: black;">parsers</span>.<span style="color: black;">expat</span>.<span style="color: black;">ParserCreate</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
&nbsp;
p.<span style="color: black;">StartElementHandler</span> = start_element
p.<span style="color: black;">EndElementHandler</span> = end_element
p.<span style="color: black;">CharacterDataHandler</span> = char_data
&nbsp;
p.<span style="color: black;">Parse</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;&quot;&quot;&lt;?xml version=&quot;1.0&quot;?&gt;
&lt;parent id=&quot;top&quot;&gt;&lt;child1 name=&quot;paul&quot;&gt;Text goes here&lt;/child1&gt;
&lt;child2 name=&quot;fred&quot;&gt;More text&lt;/child2&gt;
&lt;/parent&gt;&quot;&quot;&quot;</span>, <span style="color: #ff4500;">1</span><span style="color: black;">&#41;</span></pre></div></div>

<p>The output:</p>
<pre>
Start element: parent {'id': 'top'}
Start element: child1 {'name': 'paul'}
Character data: 'Text goes here'
End element: child1
Character data: '\n'
Start element: child2 {'name': 'fred'}
Character data: 'More text'
End element: child2
Character data: '\n'
End element: parent
</pre>
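<p>To connect this back to the map files, here is a minimal sketch of the same callback approach applied to a document that carries the DOCTYPE line shown earlier. The <code>MapCollector</code> class, the <code>collect_elements</code> helper and the sample elements are hypothetical and are not taken from the actual map loader; the point is only that a plain expat parser with no external entity handler should read the file without touching the network.</p>

```python
import xml.parsers.expat


class MapCollector:
    """Hypothetical collector: stores (name, attrs) for every start tag."""

    def __init__(self):
        self.elements = []

    def start_element(self, name, attrs):
        # Copy attrs so each stored entry is independent of the parser.
        self.elements.append((name, dict(attrs)))


def collect_elements(data):
    """Parse an XML string with expat and return the start tags seen."""
    collector = MapCollector()
    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = collector.start_element
    # No ExternalEntityRefHandler is installed, so the DTD URL in the
    # DOCTYPE is not fetched.
    parser.Parse(data, True)
    return collector.elements


# Illustrative document; the element and attribute names are made up.
SAMPLE = """<?xml version="1.0"?>
<!DOCTYPE map SYSTEM "http://mapeditor.org/dtd/1.0/map.dtd">
<map version="1.0"><layer name="ground"/></map>"""

for name, attrs in collect_elements(SAMPLE):
    print(name, attrs)
```

<p>Keeping the handlers as methods of a small object is one way to accumulate results without global variables; plain functions as in the documentation example work just as well.</p>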
]]></content:encoded>
			<wfw:commentRss>http://silveiraneto.net/2009/12/25/python-fast-xml-parsing/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>

