<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="4.4.1">Jekyll</generator><link href="https://www.zopatista.com/atom.xml" rel="self" type="application/atom+xml" /><link href="https://www.zopatista.com/" rel="alternate" type="text/html" /><updated>2026-04-11T18:23:00+01:00</updated><id>https://www.zopatista.com/atom.xml</id><title type="html">Martijn Pieters</title><subtitle>Martijn Pieters — Software Architect and Python mentor</subtitle><author><name>Martijn Pieters</name></author><entry><title type="html">Interview on Distinguished Devs</title><link href="https://www.zopatista.com/python/2019/09/27/distinguished-devs-interview/" rel="alternate" type="text/html" title="Interview on Distinguished Devs" /><published>2019-09-27T01:00:00+01:00</published><updated>2019-09-27T01:00:00+01:00</updated><id>https://www.zopatista.com/python/2019/09/27/distinguished-devs-interview</id><content type="html" xml:base="https://www.zopatista.com/python/2019/09/27/distinguished-devs-interview/"><![CDATA[<p>Recently, <a href="https://dev.to/bengineer">Ben James</a> reached out to me for an interview, to appear on his <a href="https://dev.to/bengineer/elite-devs-0-introduction-247"><em>Distinguished Devs</em> interview series</a>.</p>

<p>You can listen to it right here:</p>

<iframe width="100%" height="166" scrolling="no" frameborder="no" allow="autoplay" src="https://w.soundcloud.com/player/?url=https%3A//api.soundcloud.com/tracks/687095827&amp;color=%23355c7a&amp;auto_play=false&amp;hide_related=false&amp;show_comments=true&amp;show_user=true&amp;show_reposts=false&amp;show_teaser=true"></iframe>

<p>You can read Ben’s take on it as well, on Dev.to: <a href="https://dev.to/bengineer/what-i-learned-from-interviewing-stackoverflow-s-top-python-contributor-martijn-pieters-5dpi"><em>What I learned from interviewing StackOverflow’s top Python contributor, Martijn Pieters</em></a></p>]]></content><author><name>Martijn Pieters</name></author><category term="python" /><category term="podcast" /><category term="interview" /><category term="stackoverflow" /><summary type="html"><![CDATA[I was interviewed by Ben James for his "Distinguished Devs" podcast series. Read his take on it, and hear me talk about my career.]]></summary></entry><entry><title type="html">Logging in asyncio applications</title><link href="https://www.zopatista.com/python/2019/05/11/asyncio-logging/" rel="alternate" type="text/html" title="Logging in asyncio applications" /><published>2019-05-11T01:00:00+01:00</published><updated>2019-05-11T01:00:00+01:00</updated><id>https://www.zopatista.com/python/2019/05/11/asyncio-logging</id><content type="html" xml:base="https://www.zopatista.com/python/2019/05/11/asyncio-logging/"><![CDATA[<p>So you are building an asyncio-based application, and, like most production-quality systems, it needs to log events throughout. Normally, you’d reach for the <a href="https://docs.python.org/3/library/logging.html"><code class="language-plaintext highlighter-rouge">logging</code> module</a>, but the logging module uses blocking I/O when emitting records.</p>

<p><em>Or does it?</em></p>

<p>The framework is very flexible in how records are emitted; that is entirely up to the <a href="https://docs.python.org/3/library/logging.handlers.html">logging handlers</a> you install. Since Python 3.2, the standard library has included an interesting handler for this, the <a href="https://docs.python.org/3/library/logging.handlers.html#queuehandler"><code class="language-plaintext highlighter-rouge">QueueHandler</code> class</a>, which comes with a corresponding <a href="https://docs.python.org/3/library/logging.handlers.html#queuelistener"><code class="language-plaintext highlighter-rouge">QueueListener</code> class</a>. These were originally developed to handle logging in the child processes of the <a href="https://docs.python.org/3/library/multiprocessing.html"><code class="language-plaintext highlighter-rouge">multiprocessing</code> library</a>, but are otherwise perfectly usable in an asyncio context.</p>

<p>The <code class="language-plaintext highlighter-rouge">QueueListener</code> class starts its own thread to watch a queue and pass records on to the handlers it manages, so whatever blocking I/O those handlers do happens outside the asyncio event loop. The <code class="language-plaintext highlighter-rouge">QueueHandler</code> implementation simply puts records into the queue you specify with <code class="language-plaintext highlighter-rouge">put_nowait()</code>, after a minimal clean-up operation to ensure records can be serialised easily. This makes the handler entirely non-blocking.</p>
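<p>As a rough illustration of that division of labour, the sketch below pairs a <code class="language-plaintext highlighter-rouge">QueueHandler</code> with a <code class="language-plaintext highlighter-rouge">QueueListener</code> on a dedicated logger. It is not part of the original setup. The <code class="language-plaintext highlighter-rouge">ThreadRecordingHandler</code> class and the <code class="language-plaintext highlighter-rouge">"queue-demo"</code> logger name are hypothetical and exist only to record which thread each handler call runs in.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import logging
import logging.handlers
import queue
import threading


class ThreadRecordingHandler(logging.Handler):
    """Hypothetical handler that notes the thread each record is handled in."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def emit(self, record):
        self.seen.append((record.getMessage(), threading.current_thread().name))


log_queue = queue.Queue()
recorder = ThreadRecordingHandler()
listener = logging.handlers.QueueListener(log_queue, recorder)

logger = logging.getLogger("queue-demo")
logger.propagate = False  # keep the demo away from the root logger's handlers
logger.setLevel(logging.INFO)
logger.addHandler(logging.handlers.QueueHandler(log_queue))

listener.start()
logger.info("logged from %s", threading.current_thread().name)
listener.stop()  # processes the queued records, then joins the listener thread

for message, thread_name in recorder.seen:
    print(message, "was handled in", thread_name)
</code></pre></div></div>

<p>The record is created in the calling thread, but the handler only ever sees it in the listener’s own thread.</p>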

<h2 id="move-existing-handlers-to-queuelistener">Move existing handlers to <code class="language-plaintext highlighter-rouge">QueueListener</code></h2>

<p>My strategy is simply to move all root handlers to a <code class="language-plaintext highlighter-rouge">QueueListener</code> object before the main asyncio loop starts, and to put a <code class="language-plaintext highlighter-rouge">QueueHandler</code> object in their place. From then on, all blocking operations in handlers run in a separate thread, so the asyncio loop no longer has to wait for log records to be written out to files or network sockets.</p>

<p>I do customise the <code class="language-plaintext highlighter-rouge">QueueHandler</code> class a little, but only minimally: there is no need to prepare records that go into a local, in-process queue, so we can skip that step and minimise the cost of logging further:</p>
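<p>To show what that skipped step does, this illustrative sketch calls the standard <code class="language-plaintext highlighter-rouge">QueueHandler.prepare()</code> method on a hand-built record. The record values are made up. <code class="language-plaintext highlighter-rouge">prepare()</code> merges the message with its arguments and clears the arguments and exception information so the record can be pickled. For a queue that never leaves the process, the handlers on the other end can do that formatting themselves.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import logging
import logging.handlers
import queue

handler = logging.handlers.QueueHandler(queue.Queue())

# A hand-built record with made-up values; logger calls normally create these
record = logging.LogRecord(
    name="demo", level=logging.INFO, pathname="demo.py", lineno=1,
    msg="%d items in %s", args=(3, "basket"), exc_info=None,
)

prepared = handler.prepare(record)
print(repr(prepared.msg), prepared.args)  # merged message, args cleared
</code></pre></div></div>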

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="n">asyncio</span>
<span class="kn">import</span> <span class="n">logging</span>
<span class="kn">import</span> <span class="n">logging.handlers</span>
<span class="k">try</span><span class="p">:</span>
    <span class="c1"># Python 3.7 and newer, fast reentrant implementation
</span>    <span class="c1"># without task tracking (not needed when logging)
</span>    <span class="kn">from</span> <span class="n">queue</span> <span class="kn">import</span> <span class="n">SimpleQueue</span> <span class="k">as</span> <span class="n">Queue</span>
<span class="k">except</span> <span class="nb">ImportError</span><span class="p">:</span>
    <span class="kn">from</span> <span class="n">queue</span> <span class="kn">import</span> <span class="n">Queue</span>
<span class="kn">from</span> <span class="n">typing</span> <span class="kn">import</span> <span class="n">List</span>


<span class="k">class</span> <span class="nc">LocalQueueHandler</span><span class="p">(</span><span class="n">logging</span><span class="p">.</span><span class="n">handlers</span><span class="p">.</span><span class="n">QueueHandler</span><span class="p">):</span>
    <span class="k">def</span> <span class="nf">emit</span><span class="p">(</span><span class="n">self</span><span class="p">,</span> <span class="n">record</span><span class="p">:</span> <span class="n">logging</span><span class="p">.</span><span class="n">LogRecord</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="bp">None</span><span class="p">:</span>
        <span class="c1"># Removed the call to self.prepare(), handle task cancellation
</span>        <span class="k">try</span><span class="p">:</span>
            <span class="n">self</span><span class="p">.</span><span class="nf">enqueue</span><span class="p">(</span><span class="n">record</span><span class="p">)</span>
        <span class="k">except</span> <span class="n">asyncio</span><span class="p">.</span><span class="n">CancelledError</span><span class="p">:</span>
            <span class="k">raise</span>
        <span class="k">except</span> <span class="nb">Exception</span><span class="p">:</span>
            <span class="n">self</span><span class="p">.</span><span class="nf">handleError</span><span class="p">(</span><span class="n">record</span><span class="p">)</span>


<span class="k">def</span> <span class="nf">setup_logging_queue</span><span class="p">()</span> <span class="o">-&gt;</span> <span class="bp">None</span><span class="p">:</span>
    <span class="sh">"""</span><span class="s">Move log handlers to a separate thread.

    Replace handlers on the root logger with a LocalQueueHandler,
    and start a logging.QueueListener holding the original
    handlers.

    </span><span class="sh">"""</span>
    <span class="n">queue</span> <span class="o">=</span> <span class="nc">Queue</span><span class="p">()</span>
    <span class="n">root</span> <span class="o">=</span> <span class="n">logging</span><span class="p">.</span><span class="nf">getLogger</span><span class="p">()</span>

    <span class="n">handlers</span><span class="p">:</span> <span class="n">List</span><span class="p">[</span><span class="n">logging</span><span class="p">.</span><span class="n">Handler</span><span class="p">]</span> <span class="o">=</span> <span class="p">[]</span>

    <span class="n">handler</span> <span class="o">=</span> <span class="nc">LocalQueueHandler</span><span class="p">(</span><span class="n">queue</span><span class="p">)</span>
    <span class="n">root</span><span class="p">.</span><span class="nf">addHandler</span><span class="p">(</span><span class="n">handler</span><span class="p">)</span>
    <span class="k">for</span> <span class="n">h</span> <span class="ow">in</span> <span class="n">root</span><span class="p">.</span><span class="n">handlers</span><span class="p">[:]:</span>
        <span class="k">if</span> <span class="n">h</span> <span class="ow">is</span> <span class="ow">not</span> <span class="n">handler</span><span class="p">:</span>
            <span class="n">root</span><span class="p">.</span><span class="nf">removeHandler</span><span class="p">(</span><span class="n">h</span><span class="p">)</span>
            <span class="n">handlers</span><span class="p">.</span><span class="nf">append</span><span class="p">(</span><span class="n">h</span><span class="p">)</span>

    <span class="n">listener</span> <span class="o">=</span> <span class="n">logging</span><span class="p">.</span><span class="n">handlers</span><span class="p">.</span><span class="nc">QueueListener</span><span class="p">(</span>
        <span class="n">queue</span><span class="p">,</span> <span class="o">*</span><span class="n">handlers</span><span class="p">,</span> <span class="n">respect_handler_level</span><span class="o">=</span><span class="bp">True</span>
    <span class="p">)</span>
    <span class="n">listener</span><span class="p">.</span><span class="nf">start</span><span class="p">()</span>
</code></pre></div></div>
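<p>Here is a minimal usage sketch, built on a few assumptions of my own. The handler configuration comes from a hypothetical <code class="language-plaintext highlighter-rouge">logging.config.dictConfig()</code> dictionary. The setup function is a variant of the one above that also returns the listener, so it can be stopped when the application exits. <code class="language-plaintext highlighter-rouge">QueueListener</code> runs its thread as a daemon thread, so stopping it explicitly gives it a chance to process records still waiting in the queue.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import asyncio
import logging
import logging.config
import logging.handlers
import queue


class LocalQueueHandler(logging.handlers.QueueHandler):
    def emit(self, record):
        # As above: skip self.prepare(), records never leave this process
        try:
            self.enqueue(record)
        except asyncio.CancelledError:
            raise
        except Exception:
            self.handleError(record)


def setup_logging_queue():
    """Variant of setup_logging_queue() that returns the listener."""
    log_queue = queue.SimpleQueue()  # Python 3.7 and newer
    root = logging.getLogger()
    handler = LocalQueueHandler(log_queue)
    root.addHandler(handler)
    handlers = []
    for h in root.handlers[:]:
        if h is not handler:
            root.removeHandler(h)
            handlers.append(h)
    listener = logging.handlers.QueueListener(
        log_queue, *handlers, respect_handler_level=True
    )
    listener.start()
    return listener


# Hypothetical configuration, applied before the event loop starts
logging.config.dictConfig({
    "version": 1,
    "formatters": {"plain": {"format": "%(levelname)s %(name)s: %(message)s"}},
    "handlers": {
        "console": {"class": "logging.StreamHandler", "formatter": "plain"},
    },
    "root": {"level": "INFO", "handlers": ["console"]},
})


async def main():
    logging.getLogger("app").info("logged from inside the event loop")
    await asyncio.sleep(0)


listener = setup_logging_queue()
try:
    asyncio.run(main())
finally:
    listener.stop()  # let the listener drain the queue before exiting
</code></pre></div></div>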

<p>You could, of course, configure the <code class="language-plaintext highlighter-rouge">logging</code> module to use the queue handler to begin with, but I find the above pattern works better when you also use the <a href="https://docs.python.org/3/library/logging.config.html"><code class="language-plaintext highlighter-rouge">logging.config</code></a> module to configure handlers elsewhere. The above can then simply take an already-configured system and make it suitable for an asyncio environment.</p>]]></content><author><name>Martijn Pieters</name></author><category term="python" /><category term="asyncio" /><category term="logging" /><summary type="html"><![CDATA[Strategy to adapt the standard library logging framework for an asyncio app.]]></summary></entry><entry><title type="html">Interview on Talk Python to Me</title><link href="https://www.zopatista.com/python/2016/11/24/talk-python-to-me-interview/" rel="alternate" type="text/html" title="Interview on Talk Python to Me" /><published>2016-11-24T00:00:00+00:00</published><updated>2016-11-24T00:00:00+00:00</updated><id>https://www.zopatista.com/python/2016/11/24/talk-python-to-me-interview</id><content type="html" xml:base="https://www.zopatista.com/python/2016/11/24/talk-python-to-me-interview/"><![CDATA[<p>I was interviewed for the excellent <a href="https://talkpython.fm/"><em>Talk Python to Me</em> podcast</a>, in <a href="https://talkpython.fm/episodes/show/86/python-at-stackoverflow">Episode #86</a> on the subject of Stack Overflow, Python, and my connection to both!</p>

<blockquote>
  <p>If you run into a problem with some API or Python code what do you do to solve it? I personally throw a few keywords into google, sometimes even before checking the full docs.</p>

  <p>Why does this work? Because invariably an excellent conversation and answer from StackOverflow comes back with just what I need.</p>

  <p>This week you’ll meet Martijn Pieters. One of the top Python contributors at StackOverflow with over 16,500 questions answered and a reputation of over 500,000.</p>
</blockquote>

<iframe width="100%" height="166" scrolling="no" frameborder="no" src="https://w.soundcloud.com/player/?url=https%3A//api.soundcloud.com/tracks/294479325&amp;color=ff5500&amp;auto_play=false&amp;hide_related=false&amp;show_comments=true&amp;show_user=true&amp;show_reposts=false">
</iframe>

<p>Relevant links (apart from my social media links):</p>

<ul>
  <li><a href="https://code.facebook.com/posts/218678814984400/scaling-mercurial-at-facebook">Scaling Mercurial at Facebook</a></li>
</ul>

<p>The specific questions I discussed:</p>

<ul>
  <li><a href="https://stackoverflow.com/q/40226063/100297">Set literal gives different result from set function call</a></li>
  <li><a href="https://stackoverflow.com/q/100003/100297">What is a metaclass in Python?</a></li>
  <li><a href="https://stackoverflow.com/q/23294658/100297">Asking the user for input until they give a valid response</a></li>
  <li><a href="https://stackoverflow.com/q/13905741/100297">Accessing class variables from a list comprehension in the class definition</a></li>
  <li><a href="https://stackoverflow.com/q/19608134/100297">Why is Python 3.x’s super() magic?</a></li>
  <li><a href="https://stackoverflow.com/q/30081275/100297">Why is “1000000000000000 in range(1000000000000001)” so fast in Python 3?</a></li>
  <li><a href="https://stackoverflow.com/q/35004162/100297">Why does range(0) == range(2, 2, 2) equal True in Python 3?</a></li>
  <li><a href="https://stackoverflow.com/q/15479928/100297">Why is the order in dictionaries and sets arbitrary?</a></li>
</ul>]]></content><author><name>Martijn Pieters</name></author><category term="python" /><category term="podcast" /><category term="interview" /><category term="stackoverflow" /><summary type="html"><![CDATA[Listen to an interview with me on the Talk Python to Me podcast on the subject of Stack Overflow and, of course, Python.]]></summary></entry><entry><title type="html">Cross-Python metaclasses</title><link href="https://www.zopatista.com/python/2014/03/14/cross-python-metaclasses/" rel="alternate" type="text/html" title="Cross-Python metaclasses" /><published>2014-03-14T00:00:00+00:00</published><updated>2014-03-14T00:00:00+00:00</updated><id>https://www.zopatista.com/python/2014/03/14/cross-python-metaclasses</id><content type="html" xml:base="https://www.zopatista.com/python/2014/03/14/cross-python-metaclasses/"><![CDATA[<p><em>Using a class decorator for applying a metaclass in both Python 2 and 3</em></p>

<p>When you want to create a class that uses a metaclass, making it compatible with both Python 2 and 3 can be a little tricky.</p>

<p>The excellent <a href="https://six.readthedocs.io/"><code class="language-plaintext highlighter-rouge">six</code> library</a> provides you with a <a href="https://six.readthedocs.io/#six.with_metaclass"><code class="language-plaintext highlighter-rouge">six.with_metaclass()</code> factory function</a> that’ll generate a base class for you from a given metaclass:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="n">six</span> <span class="kn">import</span> <span class="n">with_metaclass</span>

<span class="k">class</span> <span class="nc">Meta</span><span class="p">(</span><span class="nb">type</span><span class="p">):</span>
    <span class="k">pass</span>

<span class="k">class</span> <span class="nc">Base</span><span class="p">(</span><span class="nb">object</span><span class="p">):</span>
    <span class="k">pass</span>

<span class="k">class</span> <span class="nc">MyClass</span><span class="p">(</span><span class="nf">with_metaclass</span><span class="p">(</span><span class="n">Meta</span><span class="p">,</span> <span class="n">Base</span><span class="p">)):</span>
    <span class="k">pass</span>
</code></pre></div></div>

<p>The basic trick is that you can call any metaclass to produce a class for you, given a name, a tuple of base classes and a dictionary holding the class body. <code class="language-plaintext highlighter-rouge">six</code> produces a <em>new</em>, intermediary base class for you:</p>

<div class="language-console highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gp">&gt;&gt;&gt;</span><span class="w"> </span><span class="nf">type</span><span class="p">(</span><span class="n">MyClass</span><span class="p">)</span>
<span class="go">&lt;class '__main__.Meta'&gt;
</span><span class="gp">&gt;&gt;&gt;</span><span class="w"> </span><span class="n">MyClass</span><span class="p">.</span><span class="n">__mro__</span>
<span class="go">(&lt;class '__main__.MyClass'&gt;, &lt;class 'six.NewBase'&gt;, &lt;class '__main__.Base'&gt;, &lt;type 'object'&gt;)
</span></code></pre></div></div>

<p>This can complicate your code, as for some use cases you now have to account for the extra <code class="language-plaintext highlighter-rouge">six.NewBase</code> base class.</p>
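<p>The basic trick of calling a metaclass directly is easy to try in isolation. In this small sketch, <code class="language-plaintext highlighter-rouge">Meta</code> is called with a name, a tuple of bases and a namespace dictionary, which is roughly what a <code class="language-plaintext highlighter-rouge">class</code> statement does on your behalf. The <code class="language-plaintext highlighter-rouge">greeting</code> attribute is made up for illustration. No intermediary class ends up in the MRO:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>class Meta(type):
    pass


class Base(object):
    pass


# name, tuple of base classes, namespace dictionary for the class body
MyClass = Meta('MyClass', (Base,), {'greeting': 'hello'})

print(type(MyClass).__name__)                 # Meta
print([c.__name__ for c in MyClass.__mro__])  # ['MyClass', 'Base', 'object']
print(MyClass.greeting)                       # hello
</code></pre></div></div>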

<p>Rather than creating a base class, I’ve come up with a class decorator that replaces any class with one produced from the metaclass, instead:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">with_metaclass</span><span class="p">(</span><span class="n">mcls</span><span class="p">):</span>
    <span class="k">def</span> <span class="nf">decorator</span><span class="p">(</span><span class="n">cls</span><span class="p">):</span>
        <span class="n">body</span> <span class="o">=</span> <span class="nf">vars</span><span class="p">(</span><span class="n">cls</span><span class="p">).</span><span class="nf">copy</span><span class="p">()</span>
        <span class="c1"># clean out class body
</span>        <span class="n">body</span><span class="p">.</span><span class="nf">pop</span><span class="p">(</span><span class="sh">'</span><span class="s">__dict__</span><span class="sh">'</span><span class="p">,</span> <span class="bp">None</span><span class="p">)</span>
        <span class="n">body</span><span class="p">.</span><span class="nf">pop</span><span class="p">(</span><span class="sh">'</span><span class="s">__weakref__</span><span class="sh">'</span><span class="p">,</span> <span class="bp">None</span><span class="p">)</span>
        <span class="k">return</span> <span class="nf">mcls</span><span class="p">(</span><span class="n">cls</span><span class="p">.</span><span class="n">__name__</span><span class="p">,</span> <span class="n">cls</span><span class="p">.</span><span class="n">__bases__</span><span class="p">,</span> <span class="n">body</span><span class="p">)</span>
    <span class="k">return</span> <span class="n">decorator</span>
</code></pre></div></div>

<p>which you’d use as:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">Meta</span><span class="p">(</span><span class="nb">type</span><span class="p">):</span>
    <span class="k">pass</span>

<span class="k">class</span> <span class="nc">Base</span><span class="p">(</span><span class="nb">object</span><span class="p">):</span>
    <span class="k">pass</span>

<span class="nd">@with_metaclass</span><span class="p">(</span><span class="n">Meta</span><span class="p">)</span>
<span class="k">class</span> <span class="nc">MyClass</span><span class="p">(</span><span class="n">Base</span><span class="p">):</span>
    <span class="k">pass</span>
</code></pre></div></div>

<p>which results in a cleaner MRO:</p>

<div class="language-console highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gp">&gt;&gt;&gt;</span><span class="w"> </span><span class="nf">type</span><span class="p">(</span><span class="n">MyClass</span><span class="p">)</span>
<span class="go">&lt;class '__main__.Meta'&gt;
</span><span class="gp">&gt;&gt;&gt;</span><span class="w"> </span><span class="n">MyClass</span><span class="p">.</span><span class="n">__mro__</span>
<span class="go">(&lt;class '__main__.MyClass'&gt;, &lt;class '__main__.Base'&gt;, &lt;type 'object'&gt;)
</span></code></pre></div></div>
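<p>The decorator as written has one limitation with <code class="language-plaintext highlighter-rouge">__slots__</code>. <code class="language-plaintext highlighter-rouge">vars(cls)</code> then also contains a member descriptor for every slot name, and passing those back to the metaclass makes <code class="language-plaintext highlighter-rouge">type()</code> raise a <code class="language-plaintext highlighter-rouge">ValueError</code> because the slot conflicts with a class variable. Below is a sketch that drops those descriptors as well, similar in spirit to the <code class="language-plaintext highlighter-rouge">__slots__</code> support mentioned in the update below. It is not that library’s code, and the <code class="language-plaintext highlighter-rouge">Point</code> class is just an example:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>try:
    string_types = basestring  # Python 2
except NameError:
    string_types = str  # Python 3


def with_metaclass(mcls):
    """Sketch: the decorator above, extended to tolerate __slots__."""
    def decorator(cls):
        body = vars(cls).copy()
        slots = body.get('__slots__')
        if slots is not None:
            if isinstance(slots, string_types):
                slots = [slots]
            # drop the member descriptors created for each slot; calling
            # the metaclass recreates them from __slots__ itself
            for name in slots:
                body.pop(name, None)
        body.pop('__dict__', None)
        body.pop('__weakref__', None)
        return mcls(cls.__name__, cls.__bases__, body)
    return decorator


class Meta(type):
    pass


@with_metaclass(Meta)
class Point(object):
    __slots__ = ('x', 'y')

    def __init__(self, x, y):
        self.x, self.y = x, y


p = Point(1, 2)
print(type(Point).__name__, p.x, p.y, hasattr(p, '__dict__'))
</code></pre></div></div>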

<h2 id="update">Update</h2>

<p>As it turns out, <a href="https://github.com/benjaminp/six/commit/0163ad03b519fcde529a4473ba712d71a57ac4ba">Jason Coombs took Guido’s time machine</a> and added the same functionality to the <code class="language-plaintext highlighter-rouge">six</code> library last summer. Not only that, he included support for classes with <code class="language-plaintext highlighter-rouge">__slots__</code> in his version. Thanks to <a href="http://kmike.ru/">Mikhail Korobov</a> for pointing this out.</p>

<p>The <code class="language-plaintext highlighter-rouge">six</code> decorator is called <a href="https://six.readthedocs.io/#six.add_metaclass"><code class="language-plaintext highlighter-rouge">@six.add_metaclass()</code></a>:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nd">@six.add_metaclass</span><span class="p">(</span><span class="n">Meta</span><span class="p">)</span>
<span class="k">class</span> <span class="nc">MyClass</span><span class="p">(</span><span class="n">Base</span><span class="p">):</span>
    <span class="k">pass</span>
</code></pre></div></div>]]></content><author><name>Martijn Pieters</name></author><category term="python" /><category term="metaclass" /><category term="decorator" /><category term="python-2-and-3" /><summary type="html"><![CDATA[Using a class decorator for applying a metaclass in both Python 2 and 3]]></summary></entry><entry><title type="html">Easy in-place file rewriting</title><link href="https://www.zopatista.com/python/2013/11/26/inplace-file-rewriting/" rel="alternate" type="text/html" title="Easy in-place file rewriting" /><published>2013-11-26T00:00:00+00:00</published><updated>2013-11-26T00:00:00+00:00</updated><id>https://www.zopatista.com/python/2013/11/26/inplace-file-rewriting</id><content type="html" xml:base="https://www.zopatista.com/python/2013/11/26/inplace-file-rewriting/"><![CDATA[<p><em>Using a context manager to allow painless rewriting of files</em></p>

<p>Whenever you need to process a file in-place, transforming the contents and writing it out again in the same location, you can reach for the <a href="http://docs.python.org/2/library/fileinput.html"><code class="language-plaintext highlighter-rouge">fileinput</code> module</a> and use its <code class="language-plaintext highlighter-rouge">inplace</code> option:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="n">fileinput</span>

<span class="k">for</span> <span class="n">line</span> <span class="ow">in</span> <span class="n">fileinput</span><span class="p">.</span><span class="nf">input</span><span class="p">(</span><span class="n">somefilename</span><span class="p">,</span> <span class="n">inplace</span><span class="o">=</span><span class="bp">True</span><span class="p">):</span>
    <span class="n">line</span> <span class="o">=</span> <span class="sh">'</span><span class="s">additional information </span><span class="sh">'</span> <span class="o">+</span> <span class="n">line</span><span class="p">.</span><span class="nf">rstrip</span><span class="p">(</span><span class="sh">'</span><span class="se">\n</span><span class="sh">'</span><span class="p">)</span>
    <span class="nf">print</span><span class="p">(</span><span class="n">line</span><span class="p">)</span>
</code></pre></div></div>

<p>There are a few problems with the <code class="language-plaintext highlighter-rouge">fileinput</code> module, however. My biggest nitpick with the module is that it has an API that relies heavily on globals; <code class="language-plaintext highlighter-rouge">fileinput.input()</code> creates a global <a href="http://docs.python.org/2/library/fileinput.html#fileinput.FileInput"><code class="language-plaintext highlighter-rouge">fileinput.FileInput()</code> object</a>, which other functions in the module then access. You can of course ignore all that and reach straight for the <code class="language-plaintext highlighter-rouge">fileinput.FileInput()</code> constructor, but <code class="language-plaintext highlighter-rouge">fileinput.input()</code> is presented as the main API entry point.</p>

<p>Another is that the in-place mode hijacks <code class="language-plaintext highlighter-rouge">sys.stdout</code> as the means to write back to the replacement file. Ostensibly this is to make it easy to use a <code class="language-plaintext highlighter-rouge">print</code> statement, but then you have to remember to remove the newline from the lines read from the old file.</p>

<p>Last, but not least, the <a href="http://docs.python.org/3/library/fileinput.html"><code class="language-plaintext highlighter-rouge">fileinput</code> version in the Python 3 standard library</a> does not support specifying an encoding, error mode or newline handling. You can open the input file in binary mode, but output is always handled in text mode. This greatly diminishes the usefulness of this library.</p>
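<p>The <code class="language-plaintext highlighter-rouge">sys.stdout</code> hijacking is easier to see in a running example. This is a Python 3 sketch of the <code class="language-plaintext highlighter-rouge">fileinput</code> approach, run against a throwaway temporary file whose contents are made up. <code class="language-plaintext highlighter-rouge">print()</code> writes into the replacement file. Each line still carries its newline, so <code class="language-plaintext highlighter-rouge">end=''</code> is needed to avoid doubling them:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import fileinput
import os
import tempfile

# Create a throwaway file to rewrite (made-up content)
fd, path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'w') as f:
    f.write('first\nsecond\n')

# With inplace=True, sys.stdout is redirected to the replacement file,
# so print() writes into the file rather than to the console
for line in fileinput.input(path, inplace=True):
    # line still ends in '\n'; end='' avoids doubling the newlines
    print('additional information ' + line, end='')

with open(path) as f:
    result = f.read()
os.remove(path)
print(repr(result))
</code></pre></div></div>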

<p>So I wrote my own replacement, using the excellent <a href="http://docs.python.org/2/library/contextlib.html#contextlib.contextmanager"><code class="language-plaintext highlighter-rouge">@contextlib.contextmanager</code> decorator</a>. This version works on both Python 2 and 3, relying on <a href="http://docs.python.org/2/library/io.html#io.open"><code class="language-plaintext highlighter-rouge">io.open()</code></a> to remain compatible between Python versions:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="n">contextlib</span> <span class="kn">import</span> <span class="n">contextmanager</span>
<span class="kn">import</span> <span class="n">io</span>
<span class="kn">import</span> <span class="n">os</span>


<span class="nd">@contextmanager</span>
<span class="k">def</span> <span class="nf">inplace</span><span class="p">(</span><span class="n">filename</span><span class="p">,</span> <span class="n">mode</span><span class="o">=</span><span class="sh">'</span><span class="s">r</span><span class="sh">'</span><span class="p">,</span> <span class="n">buffering</span><span class="o">=-</span><span class="mi">1</span><span class="p">,</span> <span class="n">encoding</span><span class="o">=</span><span class="bp">None</span><span class="p">,</span> <span class="n">errors</span><span class="o">=</span><span class="bp">None</span><span class="p">,</span>
            <span class="n">newline</span><span class="o">=</span><span class="bp">None</span><span class="p">,</span> <span class="n">backup_extension</span><span class="o">=</span><span class="bp">None</span><span class="p">):</span>
    <span class="sh">"""</span><span class="s">Allow for a file to be replaced with new content.

    yields a tuple of (readable, writable) file objects, where writable
    replaces readable.

    If an exception occurs, the old file is restored, removing the
    written data.

    mode should *not* use </span><span class="sh">'</span><span class="s">w</span><span class="sh">'</span><span class="s">, </span><span class="sh">'</span><span class="s">a</span><span class="sh">'</span><span class="s"> or </span><span class="sh">'</span><span class="s">+</span><span class="sh">'</span><span class="s">; only read-only modes are supported.

    </span><span class="sh">"""</span>

    <span class="c1"># move existing file to backup, create new file with same permissions
</span>    <span class="c1"># borrowed extensively from the fileinput module
</span>    <span class="k">if</span> <span class="nf">set</span><span class="p">(</span><span class="n">mode</span><span class="p">).</span><span class="nf">intersection</span><span class="p">(</span><span class="sh">'</span><span class="s">wa+</span><span class="sh">'</span><span class="p">):</span>
        <span class="k">raise</span> <span class="nc">ValueError</span><span class="p">(</span><span class="sh">'</span><span class="s">Only read-only file modes can be used</span><span class="sh">'</span><span class="p">)</span>

    <span class="n">backupfilename</span> <span class="o">=</span> <span class="n">filename</span> <span class="o">+</span> <span class="p">(</span><span class="n">backup_extension</span> <span class="ow">or</span> <span class="n">os</span><span class="p">.</span><span class="n">extsep</span> <span class="o">+</span> <span class="sh">'</span><span class="s">bak</span><span class="sh">'</span><span class="p">)</span>
    <span class="k">try</span><span class="p">:</span>
        <span class="n">os</span><span class="p">.</span><span class="nf">unlink</span><span class="p">(</span><span class="n">backupfilename</span><span class="p">)</span>
    <span class="k">except</span> <span class="n">os</span><span class="p">.</span><span class="n">error</span><span class="p">:</span>
        <span class="k">pass</span>
    <span class="n">os</span><span class="p">.</span><span class="nf">rename</span><span class="p">(</span><span class="n">filename</span><span class="p">,</span> <span class="n">backupfilename</span><span class="p">)</span>
    <span class="n">readable</span> <span class="o">=</span> <span class="n">io</span><span class="p">.</span><span class="nf">open</span><span class="p">(</span><span class="n">backupfilename</span><span class="p">,</span> <span class="n">mode</span><span class="p">,</span> <span class="n">buffering</span><span class="o">=</span><span class="n">buffering</span><span class="p">,</span>
                       <span class="n">encoding</span><span class="o">=</span><span class="n">encoding</span><span class="p">,</span> <span class="n">errors</span><span class="o">=</span><span class="n">errors</span><span class="p">,</span> <span class="n">newline</span><span class="o">=</span><span class="n">newline</span><span class="p">)</span>
    <span class="k">try</span><span class="p">:</span>
        <span class="n">perm</span> <span class="o">=</span> <span class="n">os</span><span class="p">.</span><span class="nf">fstat</span><span class="p">(</span><span class="n">readable</span><span class="p">.</span><span class="nf">fileno</span><span class="p">()).</span><span class="n">st_mode</span>
    <span class="k">except</span> <span class="nb">OSError</span><span class="p">:</span>
        <span class="n">writable</span> <span class="o">=</span> <span class="n">io</span><span class="p">.</span><span class="nf">open</span><span class="p">(</span><span class="n">filename</span><span class="p">,</span> <span class="sh">'</span><span class="s">w</span><span class="sh">'</span> <span class="o">+</span> <span class="n">mode</span><span class="p">.</span><span class="nf">replace</span><span class="p">(</span><span class="sh">'</span><span class="s">r</span><span class="sh">'</span><span class="p">,</span> <span class="sh">''</span><span class="p">),</span>
                           <span class="n">buffering</span><span class="o">=</span><span class="n">buffering</span><span class="p">,</span> <span class="n">encoding</span><span class="o">=</span><span class="n">encoding</span><span class="p">,</span> <span class="n">errors</span><span class="o">=</span><span class="n">errors</span><span class="p">,</span>
                           <span class="n">newline</span><span class="o">=</span><span class="n">newline</span><span class="p">)</span>
    <span class="k">else</span><span class="p">:</span>
        <span class="n">os_mode</span> <span class="o">=</span> <span class="n">os</span><span class="p">.</span><span class="n">O_CREAT</span> <span class="o">|</span> <span class="n">os</span><span class="p">.</span><span class="n">O_WRONLY</span> <span class="o">|</span> <span class="n">os</span><span class="p">.</span><span class="n">O_TRUNC</span>
        <span class="k">if</span> <span class="nf">hasattr</span><span class="p">(</span><span class="n">os</span><span class="p">,</span> <span class="sh">'</span><span class="s">O_BINARY</span><span class="sh">'</span><span class="p">):</span>
            <span class="n">os_mode</span> <span class="o">|=</span> <span class="n">os</span><span class="p">.</span><span class="n">O_BINARY</span>
        <span class="n">fd</span> <span class="o">=</span> <span class="n">os</span><span class="p">.</span><span class="nf">open</span><span class="p">(</span><span class="n">filename</span><span class="p">,</span> <span class="n">os_mode</span><span class="p">,</span> <span class="n">perm</span><span class="p">)</span>
        <span class="n">writable</span> <span class="o">=</span> <span class="n">io</span><span class="p">.</span><span class="nf">open</span><span class="p">(</span><span class="n">fd</span><span class="p">,</span> <span class="sh">"</span><span class="s">w</span><span class="sh">"</span> <span class="o">+</span> <span class="n">mode</span><span class="p">.</span><span class="nf">replace</span><span class="p">(</span><span class="sh">'</span><span class="s">r</span><span class="sh">'</span><span class="p">,</span> <span class="sh">''</span><span class="p">),</span> <span class="n">buffering</span><span class="o">=</span><span class="n">buffering</span><span class="p">,</span>
                           <span class="n">encoding</span><span class="o">=</span><span class="n">encoding</span><span class="p">,</span> <span class="n">errors</span><span class="o">=</span><span class="n">errors</span><span class="p">,</span> <span class="n">newline</span><span class="o">=</span><span class="n">newline</span><span class="p">)</span>
        <span class="k">try</span><span class="p">:</span>
            <span class="k">if</span> <span class="nf">hasattr</span><span class="p">(</span><span class="n">os</span><span class="p">,</span> <span class="sh">'</span><span class="s">chmod</span><span class="sh">'</span><span class="p">):</span>
                <span class="n">os</span><span class="p">.</span><span class="nf">chmod</span><span class="p">(</span><span class="n">filename</span><span class="p">,</span> <span class="n">perm</span><span class="p">)</span>
        <span class="k">except</span> <span class="nb">OSError</span><span class="p">:</span>
            <span class="k">pass</span>
    <span class="k">try</span><span class="p">:</span>
        <span class="k">yield</span> <span class="n">readable</span><span class="p">,</span> <span class="n">writable</span>
    <span class="k">except</span> <span class="nb">Exception</span><span class="p">:</span>
        <span class="c1"># move backup back
</span>        <span class="k">try</span><span class="p">:</span>
            <span class="n">os</span><span class="p">.</span><span class="nf">unlink</span><span class="p">(</span><span class="n">filename</span><span class="p">)</span>
        <span class="k">except</span> <span class="n">os</span><span class="p">.</span><span class="n">error</span><span class="p">:</span>
            <span class="k">pass</span>
        <span class="n">os</span><span class="p">.</span><span class="nf">rename</span><span class="p">(</span><span class="n">backupfilename</span><span class="p">,</span> <span class="n">filename</span><span class="p">)</span>
        <span class="k">raise</span>
    <span class="k">finally</span><span class="p">:</span>
        <span class="n">readable</span><span class="p">.</span><span class="nf">close</span><span class="p">()</span>
        <span class="n">writable</span><span class="p">.</span><span class="nf">close</span><span class="p">()</span>
        <span class="k">try</span><span class="p">:</span>
            <span class="n">os</span><span class="p">.</span><span class="nf">unlink</span><span class="p">(</span><span class="n">backupfilename</span><span class="p">)</span>
        <span class="k">except</span> <span class="n">os</span><span class="p">.</span><span class="n">error</span><span class="p">:</span>
            <span class="k">pass</span>
</code></pre></div></div>

<p>This context manager deliberately focuses on just <em>one</em> file and, unlike the <code class="language-plaintext highlighter-rouge">fileinput</code> module, ignores <code class="language-plaintext highlighter-rouge">sys.stdin</code>. It is aimed squarely at replacing a single file in-place.</p>

<p>Usage example, in Python 2, with the CSV module:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="n">csv</span>

<span class="k">with</span> <span class="nf">inplace</span><span class="p">(</span><span class="n">csvfilename</span><span class="p">,</span> <span class="sh">'</span><span class="s">rb</span><span class="sh">'</span><span class="p">)</span> <span class="nf">as </span><span class="p">(</span><span class="n">infh</span><span class="p">,</span> <span class="n">outfh</span><span class="p">):</span>
    <span class="n">reader</span> <span class="o">=</span> <span class="n">csv</span><span class="p">.</span><span class="nf">reader</span><span class="p">(</span><span class="n">infh</span><span class="p">)</span>
    <span class="n">writer</span> <span class="o">=</span> <span class="n">csv</span><span class="p">.</span><span class="nf">writer</span><span class="p">(</span><span class="n">outfh</span><span class="p">)</span>

    <span class="k">for</span> <span class="n">row</span> <span class="ow">in</span> <span class="n">reader</span><span class="p">:</span>
        <span class="n">row</span> <span class="o">+=</span> <span class="p">[</span><span class="sh">'</span><span class="s">new</span><span class="sh">'</span><span class="p">,</span> <span class="sh">'</span><span class="s">columns</span><span class="sh">'</span><span class="p">]</span>
        <span class="n">writer</span><span class="p">.</span><span class="nf">writerow</span><span class="p">(</span><span class="n">row</span><span class="p">)</span>
</code></pre></div></div>

<p>and the Python 3 version:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="n">csv</span>

<span class="k">with</span> <span class="nf">inplace</span><span class="p">(</span><span class="n">csvfilename</span><span class="p">,</span> <span class="sh">'</span><span class="s">r</span><span class="sh">'</span><span class="p">,</span> <span class="n">newline</span><span class="o">=</span><span class="sh">''</span><span class="p">)</span> <span class="nf">as </span><span class="p">(</span><span class="n">infh</span><span class="p">,</span> <span class="n">outfh</span><span class="p">):</span>
    <span class="n">reader</span> <span class="o">=</span> <span class="n">csv</span><span class="p">.</span><span class="nf">reader</span><span class="p">(</span><span class="n">infh</span><span class="p">)</span>
    <span class="n">writer</span> <span class="o">=</span> <span class="n">csv</span><span class="p">.</span><span class="nf">writer</span><span class="p">(</span><span class="n">outfh</span><span class="p">)</span>

    <span class="k">for</span> <span class="n">row</span> <span class="ow">in</span> <span class="n">reader</span><span class="p">:</span>
        <span class="n">row</span> <span class="o">+=</span> <span class="p">[</span><span class="sh">'</span><span class="s">new</span><span class="sh">'</span><span class="p">,</span> <span class="sh">'</span><span class="s">columns</span><span class="sh">'</span><span class="p">]</span>
        <span class="n">writer</span><span class="p">.</span><span class="nf">writerow</span><span class="p">(</span><span class="n">row</span><span class="p">)</span>
</code></pre></div></div>]]></content><author><name>Martijn Pieters</name></author><category term="python" /><category term="in-place" /><category term="contextmanager" /><category term="file-io" /><summary type="html"><![CDATA[Using a context manager to allow painless rewriting of files]]></summary></entry><entry><title type="html">Portlets as ESI include</title><link href="https://www.zopatista.com/plone/2012/06/14/portlets-as-esi-include/" rel="alternate" type="text/html" title="Portlets as ESI include" /><published>2012-06-14T00:00:00+01:00</published><updated>2012-06-14T00:00:00+01:00</updated><id>https://www.zopatista.com/plone/2012/06/14/portlets-as-esi-include</id><content type="html" xml:base="https://www.zopatista.com/plone/2012/06/14/portlets-as-esi-include/"><![CDATA[<p><em>Using <abbr title="Edge Side Includes">ESI</abbr> includes to cache Plone portlets separately.</em></p>

<p>To help make a large and busy intranet website perform better, we’ve used a light sprinkling of <a href="https://en.wikipedia.org/wiki/Edge_Side_Includes"><abbr title="Edge Side Includes">ESI</abbr></a> (via <a href="https://www.varnish-cache.org/trac/wiki/ESIfeatures">Varnish’s <abbr title="Edge Side Includes">ESI</abbr> support</a>) to improve the cacheability of pages in the site. By delegating assembly of parts of the page to the Varnish cache, pages become much more cacheable, as frequently changing chunks such as the personal bar at the top are requested separately.</p>
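<p>To illustrate the idea, here is a toy Python sketch of what an <abbr title="Edge Side Includes">ESI</abbr> processor does when it assembles a page: every <code class="language-plaintext highlighter-rouge">&lt;esi:include&gt;</code> tag is swapped for a separately fetched, and separately cached, fragment. The <code class="language-plaintext highlighter-rouge">assemble()</code> function, the simplified regular expression and the <code class="language-plaintext highlighter-rouge">@@personal-bar</code> fragment URL are all hypothetical; Varnish’s real <abbr title="Edge Side Includes">ESI</abbr> implementation does considerably more.</p>

```python
import re

# Simplified pattern for a self-closing <esi:include src="..."/> tag; a real
# ESI processor such as Varnish understands more syntax than this.
ESI_INCLUDE = re.compile(r'<esi:include\s+src="([^"]*)"\s*/>')


def assemble(page, fetch):
    """Replace each <esi:include> tag in page with the text fetch(src) returns."""
    return ESI_INCLUDE.sub(lambda match: fetch(match.group(1)), page)


# Hypothetical fragment cache: each URL is cached (and expires) on its own.
fragments = {
    '/@@personal-bar': '<div id="personal-bar">Welcome back, Ann</div>',
}

page = '<body><esi:include src="/@@personal-bar" /><p>Cached article</p></body>'
print(assemble(page, lambda src: fragments.get(src, '')))
# prints: <body><div id="personal-bar">Welcome back, Ann</div><p>Cached article</p></body>
```

<p>The outer page and the personal bar fragment can now have completely different cache lifetimes, which is the whole point of the exercise.</p>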

<h2 id="portlets-via-esi-include">Portlets via <abbr title="Edge Side Includes">ESI</abbr> include</h2>

<p>One such chunk we separated out is the right-hand portlets column. Varnish has been configured to set a special header so that we can detect that <abbr title="Edge Side Includes">ESI</abbr> is supported:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">sub</span> <span class="n">vcl_recv</span> <span class="p">{</span>
    <span class="p">...</span>
    <span class="cp"># Indicate that a varnish capable of doing ESI is in front...
</span>    <span class="n">set</span> <span class="n">req</span><span class="p">.</span><span class="n">http</span><span class="p">.</span><span class="n">X</span><span class="o">-</span><span class="n">ESI</span> <span class="o">=</span> <span class="s">"esi"</span><span class="p">;</span>
    <span class="p">...</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Using this header we can then conditionally swap out the portlets column with an <code class="language-plaintext highlighter-rouge">&lt;esi:include&gt;</code> statement; this makes site development much easier as we do not have to run Varnish just to see the site working. Here is the relevant section from the <code class="language-plaintext highlighter-rouge">main_template.pt</code> file:</p>

<div class="language-xml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nt">&lt;td</span> <span class="na">id=</span><span class="s">"portal-column-two"</span>
    <span class="na">metal:define-slot=</span><span class="s">"column_two_slot"</span>
    <span class="na">tal:condition=</span><span class="s">"sr"</span><span class="nt">&gt;</span>
  <span class="nt">&lt;div</span> <span class="na">class=</span><span class="s">"visualPadding"</span>
       <span class="na">tal:define=</span><span class="s">"
           esi_header request/HTTP_X_ESI | nothing;
           base context/@@plone_context_state/current_base_url | nothing;
           location python:base and base.rstrip('/').split('/')[-1].lstrip('@');
           esi python:esi_header and (location not in (
               'manage-portlets', 'manage-content-type-portlets'));
           queryString request/QUERY_STRING;
           queryString python: queryString and '?' + queryString or '';
           "</span><span class="nt">&gt;</span>
    <span class="nt">&lt;metal:portlets</span> <span class="na">define-slot=</span><span class="s">"portlets_two_slot"</span><span class="nt">&gt;</span>
      <span class="nt">&lt;esi:include</span> <span class="na">tal:condition=</span><span class="s">"esi"</span>
          <span class="na">tal:attributes=</span><span class="s">"src string:${context/absolute_url}/@@right-column${queryString}"</span> <span class="nt">/&gt;</span>
      <span class="nt">&lt;tal:noesi</span> <span class="na">condition=</span><span class="s">"not: esi"</span>
                 <span class="na">replace=</span><span class="s">"structure provider:plone.rightcolumn"</span> <span class="nt">/&gt;</span>
    <span class="nt">&lt;/metal:portlets&gt;</span>
    <span class="ni">&amp;nbsp;</span>
  <span class="nt">&lt;/div&gt;</span>
<span class="nt">&lt;/td&gt;</span>
</code></pre></div></div>

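<p>Note that <abbr title="Edge Side Includes">ESI</abbr> is deliberately not used on the portlet management views (<code class="language-plaintext highlighter-rouge">@@manage-portlets</code> and <code class="language-plaintext highlighter-rouge">@@manage-content-type-portlets</code>).</p>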
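<p>The decision made in that <code class="language-plaintext highlighter-rouge">tal:define</code> block can be restated in plain Python. This is only a sketch: <code class="language-plaintext highlighter-rouge">use_esi()</code> and its parameters are my own names, with <code class="language-plaintext highlighter-rouge">esi_header</code> standing in for <code class="language-plaintext highlighter-rouge">request/HTTP_X_ESI</code> and <code class="language-plaintext highlighter-rouge">base_url</code> for the <code class="language-plaintext highlighter-rouge">current_base_url</code> value.</p>

```python
# Views on which the portlets column is always rendered inline.
MANAGEMENT_VIEWS = ('manage-portlets', 'manage-content-type-portlets')


def use_esi(esi_header, base_url):
    """Return True when an <esi:include> should replace the inline portlets."""
    if not esi_header:
        # No X-ESI header, so no ESI-capable Varnish in front of the site.
        return False
    location = ''
    if base_url:
        # Last path segment without its '@@' prefix, e.g.
        # 'http://intranet/folder/@@manage-portlets' -> 'manage-portlets'
        location = base_url.rstrip('/').split('/')[-1].lstrip('@')
    return location not in MANAGEMENT_VIEWS


print(use_esi('esi', 'http://intranet/folder/@@manage-portlets'))  # False
print(use_esi('esi', 'http://intranet/folder/'))                   # True
```

<p>As in the template, a missing base URL does not disable <abbr title="Edge Side Includes">ESI</abbr>; only an absent header or one of the two management views does.</p>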

<p>The <code class="language-plaintext highlighter-rouge">@@right-column</code> view is simply a template:</p>

<div class="language-xml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nt">&lt;html</span> <span class="na">tal:omit-tag=</span><span class="s">""</span><span class="nt">&gt;</span>
<span class="nt">&lt;body</span> <span class="na">tal:omit-tag=</span><span class="s">""</span><span class="nt">&gt;</span>

<span class="nt">&lt;tal:block</span> <span class="na">replace=</span><span class="s">"structure provider:plone.rightcolumn"</span> <span class="nt">/&gt;</span>

<span class="nt">&lt;/body&gt;</span>
<span class="nt">&lt;/html&gt;</span>
</code></pre></div></div>

<p>This whole setup was working swimmingly; we could cache pages for extended periods of time, while parts such as the portlets were updated much more frequently, with their caching keyed to specific groups of users.</p>

<h2 id="where-did-that-portlet-go">Where did that portlet go?</h2>

<p>This being a large and complex intranet, it took some time for someone to notice that some lightly-used portlets were no longer showing up. These portlets depend on certain content being present, so their absence was not necessarily a problem. However, it became clear that even when their specific conditions were met, they were still not being rendered. The problem was quickly narrowed down to the <abbr title="Edge Side Includes">ESI</abbr>-included portlet rendering; if you bypassed the cache, the portlets showed up.</p>

<p>So what went wrong?</p>

<p>Portlets are essentially rendered as part of the Zope viewlet framework. Viewlets are snippets of page output that are looked up by a key consisting of the current context, the current request, the current view and the viewlet manager. Portlets have access to the same pieces of information, so you can register portlets that only show for certain contexts (particular content types, marker interfaces, etc.), browser layers (usually themes), and even only for specific views or portlet managers (tying the portlet to the left, right or dashboard portlet wells).</p>

<p>With the lesser-known <a href="https://github.com/plone/plone.app.portlets/blob/7a6303400b4ecf7595fb21ec9c43b38b31fb8aca/plone/app/portlets/metadirectives.py#L67"><code class="language-plaintext highlighter-rouge">&lt;plone:portletRenderer /&gt;</code> directive</a>, you can also vary the way portlets are rendered for the above keys. Thus, a portlet can look different in different themes, different portlet managers, or when a certain extra marker interface is present on your content objects. This is what had happened to the vanished portlets here; they had been tied to specific <em>views</em>:</p>

<div class="language-xml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nt">&lt;configure</span>
    <span class="na">xmlns=</span><span class="s">"http://namespaces.zope.org/zope"</span>
    <span class="na">xmlns:plone=</span><span class="s">"http://namespaces.plone.org/plone"</span>
    <span class="nt">&gt;</span>

  <span class="nt">&lt;plone:portlet</span>
    <span class="na">name=</span><span class="s">"foobar.portlets.localcalendar"</span>
    <span class="na">interface=</span><span class="s">".localportlet.ILocalCalendarPortlet"</span>
    <span class="na">assignment=</span><span class="s">".localportlet.Assignment"</span>
    <span class="na">renderer=</span><span class="s">".localportlet.Hidden"</span>
    <span class="na">addview=</span><span class="s">".localportlet.AddForm"</span>
    <span class="nt">/&gt;</span>

  <span class="c">&lt;!-- My HQ page --&gt;</span>
  <span class="nt">&lt;plone:portletRenderer</span>
    <span class="na">portlet=</span><span class="s">".localportlet.ILocalCalendarPortlet"</span>
    <span class="na">class=</span><span class="s">".localportlet.Renderer"</span>
    <span class="na">view=</span><span class="s">"foobar.types.browser.mychain.MyChainView"</span>
    <span class="nt">/&gt;</span>

  <span class="c">&lt;!-- My Store page --&gt;</span>
  <span class="nt">&lt;plone:portletRenderer</span>
    <span class="na">portlet=</span><span class="s">".localportlet.ILocalCalendarPortlet"</span>
    <span class="na">class=</span><span class="s">".localportlet.Renderer"</span>
    <span class="na">view=</span><span class="s">"foobar.types.browser.store.StoreView"</span>
    <span class="nt">/&gt;</span>
<span class="nt">&lt;/configure&gt;</span>
</code></pre></div></div>

<p>The above <code class="language-plaintext highlighter-rouge">plone:portlet</code> declaration registers a portlet that is hidden by default. The two <code class="language-plaintext highlighter-rouge">plone:portletRenderer</code> declarations then assign new renderers when certain views are being used instead. This neat trick allows for the portlet to be targeted very specifically.</p>

<p>This all works great, unless you use a dedicated view for <abbr title="Edge Side Includes">ESI</abbr> rendering of the portlets. Suddenly the current view is no longer <code class="language-plaintext highlighter-rouge">MyChainView</code> or <code class="language-plaintext highlighter-rouge">StoreView</code>, but rather <code class="language-plaintext highlighter-rouge">@@right-column</code>. Thus the dedicated renderer is skipped in favour of the <code class="language-plaintext highlighter-rouge">.localportlet.Hidden</code> renderer, which does what it says on the tin: not render.</p>
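<p>A deliberately simplified model of that lookup may help. This is <em>not</em> the <code class="language-plaintext highlighter-rouge">zope.component</code> API: plain classes stand in for the views, a dictionary stands in for the component registry, and walking the view’s class hierarchy approximates the interface-based adapter lookup. All names are hypothetical, loosely modelled on the ZCML above.</p>

```python
# Hypothetical stand-ins for the views named in the ZCML above.
class MyChainView:
    pass


class StoreView:
    pass


class RightColumnView:
    """Plays the role of the @@right-column view used for ESI rendering."""


# (portlet, view class) -> renderer; None registers the default for any view.
RENDERERS = {
    ('localcalendar', None): 'Hidden',
    ('localcalendar', MyChainView): 'Renderer',
    ('localcalendar', StoreView): 'Renderer',
}


def lookup_renderer(portlet, view):
    """Pick the most specific renderer registered for the current view."""
    for cls in type(view).__mro__:
        renderer = RENDERERS.get((portlet, cls))
        if renderer is not None:
            return renderer
    # Nothing view-specific matched: use the default registration.
    return RENDERERS[(portlet, None)]


print(lookup_renderer('localcalendar', StoreView()))        # Renderer
print(lookup_renderer('localcalendar', RightColumnView()))  # Hidden
```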

<h2 id="reconstruct-the-whole-context">Reconstruct the <em>whole</em> context</h2>

<p>The solution, of course, is to reconstruct the whole context. The <code class="language-plaintext highlighter-rouge">@@right-column</code> view already had most things right; only the current view is wrong. With a simple set of TAL declarations we can set up a new value for the <code class="language-plaintext highlighter-rouge">view</code> variable when rendering the portlets. Here is the reworked <code class="language-plaintext highlighter-rouge">main_template.pt</code> code:</p>

<div class="language-xml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nt">&lt;td</span> <span class="na">id=</span><span class="s">"portal-column-two"</span>
    <span class="na">metal:define-slot=</span><span class="s">"column_two_slot"</span>
    <span class="na">tal:condition=</span><span class="s">"sr"</span><span class="nt">&gt;</span>
  <span class="nt">&lt;div</span> <span class="na">class=</span><span class="s">"visualPadding"</span>
       <span class="na">tal:define=</span><span class="s">"
           esi_header request/HTTP_X_ESI | nothing;
           base context/@@plone_context_state/current_base_url | nothing;
           location python:base and base.rstrip('/').split('/')[-1].lstrip('@');
           esi python:esi_header and (location not in ('manage-portlets', 'manage-content-type-portlets'));
           viewContext string:?__view_context=${view/__name__};
           queryString request/QUERY_STRING;
           queryString python: queryString and viewContext + '&amp;amp;' + queryString or viewContext;
                   "</span><span class="nt">&gt;</span>
    <span class="nt">&lt;metal:portlets</span> <span class="na">define-slot=</span><span class="s">"portlets_two_slot"</span><span class="nt">&gt;</span>
      <span class="nt">&lt;esi:include</span> <span class="na">tal:condition=</span><span class="s">"esi"</span>
                   <span class="na">tal:attributes=</span><span class="s">"src string:${context/absolute_url}/@@right-column${queryString}"</span> <span class="nt">/&gt;</span>
      <span class="nt">&lt;tal:noesi</span> <span class="na">condition=</span><span class="s">"not: esi"</span>
                 <span class="na">replace=</span><span class="s">"structure provider:plone.rightcolumn"</span> <span class="nt">/&gt;</span>
    <span class="nt">&lt;/metal:portlets&gt;</span>
    <span class="ni">&amp;nbsp;</span>
  <span class="nt">&lt;/div&gt;</span>
<span class="nt">&lt;/td&gt;</span>
</code></pre></div></div>

<p>We use a GET parameter to pass along the name of the view to look up; I’ve used a double-underscore prefix here to reduce the chances we clash with a query string parameter used elsewhere in the site. The <code class="language-plaintext highlighter-rouge">@@right-column</code> view then restores this view for portlet rendering (with a fallback to the Plone default view context <code class="language-plaintext highlighter-rouge">@@plone</code>):</p>

<div class="language-xml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nt">&lt;html</span> <span class="na">tal:omit-tag=</span><span class="s">""</span><span class="nt">&gt;</span>
<span class="nt">&lt;body</span> <span class="na">tal:omit-tag=</span><span class="s">""</span><span class="nt">&gt;</span>

<span class="nt">&lt;tal:block</span>
    <span class="na">define=</span><span class="s">"viewname request/__view_context | nothing;
            viewname python:viewname and '@@' + viewname or '@@plone';
            view nocall:context/?viewname"</span>
    <span class="na">replace=</span><span class="s">"structure provider:plone.rightcolumn"</span> <span class="nt">/&gt;</span>

<span class="nt">&lt;/body&gt;</span>
<span class="nt">&lt;/html&gt;</span>
</code></pre></div></div>
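<p>The round trip through the query string can be sketched in Python as well. The helper names <code class="language-plaintext highlighter-rouge">esi_src()</code> and <code class="language-plaintext highlighter-rouge">restore_view()</code> and the example view name <code class="language-plaintext highlighter-rouge">mychain</code> are hypothetical. Like the template, the sketch does not URL-quote the view name, so it assumes view names only contain URL-safe characters.</p>

```python
from urllib.parse import parse_qs


def esi_src(absolute_url, view_name, query_string=''):
    """Build the @@right-column include URL as the reworked template does:
    __view_context first, followed by the original query string, if any."""
    src = absolute_url + '/@@right-column?__view_context=' + view_name
    if query_string:
        src += '&' + query_string
    return src


def restore_view(query_string):
    """Return the view name @@right-column renders the portlets for,
    falling back to @@plone when __view_context is absent or empty."""
    values = parse_qs(query_string).get('__view_context')
    return '@@' + values[0] if values else '@@plone'


src = esi_src('http://intranet/hq', 'mychain', 'b_start=20')
print(src)  # http://intranet/hq/@@right-column?__view_context=mychain&b_start=20
print(restore_view(src.split('?', 1)[1]))  # @@mychain
```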

<p>Et voilà, our portlets are showing up good and proper again.</p>]]></content><author><name>Martijn Pieters</name></author><category term="plone" /><category term="ESI" /><category term="caching" /><category term="portlets" /><summary type="html"><![CDATA[Using ESI includes to cache Plone portlets separately.]]></summary></entry><entry><title type="html">Unicode in RTF documents</title><link href="https://www.zopatista.com/python/2012/06/06/rtf-and-unicode/" rel="alternate" type="text/html" title="Unicode in RTF documents" /><published>2012-06-06T00:00:00+01:00</published><updated>2012-06-06T00:00:00+01:00</updated><id>https://www.zopatista.com/python/2012/06/06/rtf-and-unicode</id><content type="html" xml:base="https://www.zopatista.com/python/2012/06/06/rtf-and-unicode/"><![CDATA[<p><em>How to encode unicode codepoints in <abbr title="Rich Text Format">RTF</abbr> documents using PyRTF.</em></p>

<p>Some time ago I had to output some nicely formatted reports from a web application, to be usable offline by Windows users. Naturally, I used the aging but still reliable <a href="https://pypi.python.org/pypi/PyRTF"><code class="language-plaintext highlighter-rouge">PyRTF</code> module</a> to generate <abbr title="Rich Text Format">RTF</abbr> documents with headers, tables, and a consistent style.</p>

<p>As my application users are mostly Norwegians, however, I quickly discovered that the <code class="language-plaintext highlighter-rouge">PyRTF</code> module does not handle international characters (i.e. anything outside the <abbr title="American Standard Code for Information Interchange">ASCII</abbr> range) at all. There is no unicode support whatsoever (it has been on the TODO list since forever), let alone conversion of unicode codepoints to whatever <abbr title="Rich Text Format">RTF</abbr> uses to represent international characters.</p>

<p>Recently, a <a href="http://stackoverflow.com/q/10852810/100297">Stack Overflow question</a> reminded me of how I solved this problem at the time, and clearly this question has <a href="http://stackoverflow.com/q/9908647/100297">come</a> <a href="https://groups.google.com/forum/?fromgroups#!topic/django-users/gZH1mnBfgoI">up</a> <a href="http://web.archive.org/web/20120812232813/http://osdir.com/ml/web2py/2010-03/msg01045.html">before</a>. Because some approaches I’ve seen can actually produce incorrect or overly verbose output (including <a href="https://code.google.com/p/pyrtf-ng/">pyrtf-ng</a>), I wanted to explain and expand on my solution to provide a definitive answer to the problem, and also see how my original method fared in terms of speed.</p>

<h2 id="so-does-rtf-handle-unicode">So <em>does</em> <abbr title="Rich Text Format">RTF</abbr> handle unicode?</h2>

<p>Since PyRTF doesn’t filter the text you add to a document at all, we can just encode unicode strings ourselves. Luckily, the <a href="https://en.wikipedia.org/wiki/Rich_Text_Format">Wikipedia entry on <abbr title="Rich Text Format">RTF</abbr></a> has a fairly detailed section on how <a href="https://en.wikipedia.org/wiki/Rich_Text_Format#Character_encoding"><abbr title="Rich Text Format">RTF</abbr> handles characters outside the <abbr title="American Standard Code for Information Interchange">ASCII</abbr> range</a>. Together with the <a href="http://www.boumphrey.com/rtf/rtfspec.pdf">published <abbr title="Rich Text Format">RTF</abbr> 1.9.1 specification</a> (PDF), there is plenty of information on how to encode unicode codepoints to <abbr title="Rich Text Format">RTF</abbr> control sequences.</p>

<p>There are basically two choices:</p>

<ul>
  <li>
    <p>The <code class="language-plaintext highlighter-rouge">\'hh</code> control sequence; a backslash and single quote, followed by an 8-bit value written as two hexadecimal digits. The value is interpreted as a codepoint in a Windows codepage, which limits its use. You <em>can</em> assign different codepages to different fonts, but you still cannot use the full range of unicode in a paragraph.</p>
  </li>
  <li>
    <p>The <code class="language-plaintext highlighter-rouge">\uN?</code> control sequence; backslash ‘u’ followed by a signed 16-bit integer value in decimal and a placeholder character (represented here by a question mark). The signed 16-bit integer here is consistent with the <abbr title="Rich Text Format">RTF</abbr> standard for control word parameters: a value between -32768 and 32767.</p>

    <p>This control sequence <em>can</em> properly represent unicode, at least for codepoints U+0000 through U+FFFF. This sequence was introduced in the 1.5 revision of the <abbr title="Rich Text Format">RTF</abbr> spec, in 1997, so it should be widely supported. The placeholder character is meant for readers that do not yet support this escape sequence, and should be the <abbr title="American Standard Code for Information Interchange">ASCII</abbr> character closest to the unicode codepoint.</p>
  </li>
</ul>
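<p>To make the difference between the two concrete, here is a small Python 3 sketch that encodes a single character both ways. The function names are my own, and cp1252 is just an example codepage. Note how the <code class="language-plaintext highlighter-rouge">\'hh</code> form cannot express a character missing from the codepage at all, while the <code class="language-plaintext highlighter-rouge">\uN?</code> form wraps codepoints above U+7FFF into negative numbers. The sketch handles one character at a time and does not escape <abbr title="Rich Text Format">RTF</abbr>’s own special characters.</p>

```python
def rtf_hex_escape(char, codepage='cp1252'):
    """Encode one character with the \\'hh control sequence."""
    # Only works for characters that exist in the chosen Windows codepage;
    # anything else raises UnicodeEncodeError.
    encoded = char.encode(codepage)
    return "\\'%02x" % encoded[0]


def rtf_unicode_escape(char, placeholder='?'):
    """Encode one character with the \\uN? control sequence."""
    codepoint = ord(char)
    if codepoint > 0xFFFF:
        # This sketch only covers U+0000 through U+FFFF.
        raise ValueError('codepoint outside U+0000..U+FFFF')
    if codepoint > 0x7FFF:
        # Wrap into the signed 16-bit range, -32768 through 32767.
        codepoint -= 0x10000
    return '\\u%d%s' % (codepoint, placeholder)


print(rtf_hex_escape('\xe9'))        # \'e9
print(rtf_unicode_escape('\xe9'))    # \u233?
print(rtf_unicode_escape('\u8123'))  # \u-32477?
```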

<p>The <code class="language-plaintext highlighter-rouge">\uN?</code> format is the easiest to produce, especially if you ignore the replacement character (just set it to ‘?’ at all times; surely most <abbr title="Rich Text Format">RTF</abbr> readers support the 1.5 <abbr title="Rich Text Format">RTF</abbr> standard by now, as it has been out there for 15 years).</p>

<h2 id="encoding-the-slow-and-incorrect-way">Encoding the slow (and incorrect) way</h2>

<p>A quick search with Google showed me how <a href="https://code.google.com/p/pyrtf-ng/source/browse/trunk/rtfng/Renderer.py?r=81#506">pyrtf-ng encodes unicode points</a>:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">writeUnicodeElement</span><span class="p">(</span><span class="n">self</span><span class="p">,</span> <span class="n">element</span><span class="p">):</span>
    <span class="n">text</span> <span class="o">=</span> <span class="sh">''</span><span class="p">.</span><span class="nf">join</span><span class="p">([</span><span class="sh">'</span><span class="s">\u%s?</span><span class="sh">'</span> <span class="o">%</span> <span class="nf">str</span><span class="p">(</span><span class="nf">ord</span><span class="p">(</span><span class="n">e</span><span class="p">))</span> <span class="k">for</span> <span class="n">e</span> <span class="ow">in</span> <span class="n">element</span><span class="p">])</span>
    <span class="n">self</span><span class="p">.</span><span class="nf">_write</span><span class="p">(</span><span class="n">text</span> <span class="ow">or</span> <span class="sh">''</span><span class="p">)</span>
</code></pre></div></div>

<p>Unfortunately, the above snippet does a few things wrong: it uses a control code for <em>every</em> character in the unicode string, producing output that is at least 5 times as long as the input, and it doesn’t produce negative numbers for codepoints over <code class="language-plaintext highlighter-rouge">\u7fff</code>:</p>

<div class="language-console highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gp">&gt;&gt;&gt;</span><span class="w"> </span><span class="n">example</span> <span class="o">=</span> <span class="sa">u</span><span class="sh">'</span><span class="s">CJK Ideograph: </span><span class="se">\u8123</span><span class="sh">'</span>
<span class="gp">&gt;&gt;&gt;</span><span class="w"> </span><span class="sh">''</span><span class="p">.</span><span class="nf">join</span><span class="p">([</span><span class="sh">'</span><span class="s">\u%s?</span><span class="sh">'</span> <span class="o">%</span> <span class="nf">str</span><span class="p">(</span><span class="nf">ord</span><span class="p">(</span><span class="n">e</span><span class="p">))</span> <span class="k">for</span> <span class="n">e</span> <span class="ow">in</span> <span class="n">example</span><span class="p">])</span>
<span class="go">u'\\u67?\\u74?\\u75?\\u32?\\u73?\\u100?\\u101?\\u111?\\u103?\\u114?\\u97?\\u112?\\u104?\\u58?\\u32?\\u33059?'
</span></code></pre></div></div>

<p>A recent <a href="http://stackoverflow.com/a/9912561/100297">Stack Overflow answer</a> improved on this by only encoding characters over <code class="language-plaintext highlighter-rouge">\u007f</code> (decimal 127) but it still iterates over every character in the string to do so:</p>

<div class="language-console highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gp">&gt;&gt;&gt;</span><span class="w"> </span><span class="sh">''</span><span class="p">.</span><span class="nf">join</span><span class="p">([</span><span class="sh">'</span><span class="s">\u%s?</span><span class="sh">'</span> <span class="o">%</span> <span class="nf">str</span><span class="p">(</span><span class="nf">ord</span><span class="p">(</span><span class="n">e</span><span class="p">))</span> <span class="k">if</span> <span class="nf">ord</span><span class="p">(</span><span class="n">e</span><span class="p">)</span> <span class="o">&gt;</span> <span class="mi">127</span> <span class="k">else</span> <span class="n">e</span> <span class="k">for</span> <span class="n">e</span> <span class="ow">in</span> <span class="n">example</span><span class="p">])</span>
<span class="go">u'CJK Ideograph: \\u33059?'
</span></code></pre></div></div>

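<p>This outputs a unicode string, because characters below codepoint 128 are passed through untouched as unicode characters. Codepoints of 32768 and up are not converted to the signed 16-bit integers <abbr title="Rich Text Format">RTF</abbr> expects, either. Control characters and the <abbr title="Rich Text Format">RTF</abbr> special characters <code class="language-plaintext highlighter-rouge">\</code>, <code class="language-plaintext highlighter-rouge">{</code> and <code class="language-plaintext highlighter-rouge">}</code> are not escaped at all. Here is my variation that remedies these things, and formats the codepoint with <code class="language-plaintext highlighter-rouge">%i</code> instead of calling <code class="language-plaintext highlighter-rouge">str(ord(e))</code>:</p>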

<div class="language-console highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gp">&gt;&gt;&gt;</span><span class="w"> </span><span class="sh">''</span><span class="p">.</span><span class="nf">join</span><span class="p">([</span><span class="sh">'</span><span class="s">\u%i?</span><span class="sh">'</span> <span class="o">%</span> <span class="p">(</span><span class="nf">ord</span><span class="p">(</span><span class="n">e</span><span class="p">)</span> <span class="k">if</span> <span class="n">e</span> <span class="o">&lt;</span> <span class="sa">u</span><span class="sh">'</span><span class="se">\u8000</span><span class="sh">'</span> <span class="k">else</span> <span class="nf">ord</span><span class="p">(</span><span class="n">e</span><span class="p">)</span> <span class="o">-</span> <span class="mi">65536</span><span class="p">)</span> <span class="nf">if </span><span class="p">(</span><span class="n">e</span> <span class="o">&gt;</span> <span class="sh">'</span><span class="se">\x7f</span><span class="sh">'</span> <span class="ow">or</span> <span class="n">e</span> <span class="o">&lt;</span> <span class="sh">'</span><span class="se">\x20</span><span class="sh">'</span> <span class="ow">or</span> <span class="n">e</span> <span class="ow">in</span> <span class="sa">u</span><span class="sh">'</span><span class="se">\\</span><span class="s">{}</span><span class="sh">'</span><span class="p">)</span> <span class="k">else</span> <span class="nf">str</span><span class="p">(</span><span class="n">e</span><span class="p">)</span> <span class="k">for</span> <span class="n">e</span> <span class="ow">in</span> <span class="n">example</span><span class="p">])</span>
<span class="go">'CJK Ideograph: \\u-32477?'
</span></code></pre></div></div>

<p>That still feels rather wasteful to me, and is likely to be slow as well. I wanted to see how my own solution stacks up against this naive, character-by-character implementation.</p>
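<p>For comparison, here is the same character-by-character variation written out as a plain Python 3 function. It is a sketch, not a drop-in replacement for the PyRTF code. The name <code class="language-plaintext highlighter-rouge">rtf_escape_naive</code> is hypothetical. The <code class="language-plaintext highlighter-rouge">if</code>/<code class="language-plaintext highlighter-rouge">else</code> around <code class="language-plaintext highlighter-rouge">ord(e) - 65536</code> is replaced by the equivalent sign-extension expression <code class="language-plaintext highlighter-rouge">(codepoint ^ 0x8000) - 0x8000</code>, which only gives meaningful results for codepoints within the <abbr title="Basic Multilingual Plane">BMP</abbr>.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def rtf_escape_naive(text):
    """Escape text for RTF one character at a time (Python 3 sketch)."""
    parts = []
    for char in text:
        codepoint = ord(char)
        if codepoint in range(0x20, 0x80) and char not in '\\{}':
            # Printable ASCII (and DEL, as in the variation above) passes through.
            parts.append(char)
        else:
            # \uN takes a signed 16-bit value: sign-extend so that
            # 0x8000-0xFFFF become -32768..-1 (BMP codepoints only).
            parts.append('\\u%d?' % ((codepoint ^ 0x8000) - 0x8000))
    return ''.join(parts).encode('ascii')


print(rtf_escape_naive('CJK Ideograph: \u8123'))  # b'CJK Ideograph: \\u-32477?'
</code></pre></div></div>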

<h2 id="encoding-the-lazy-way">Encoding the lazy way</h2>

<p>While casting around for my own solution, I also looked into the Python <a href="http://docs.python.org/library/codecs.html"><code class="language-plaintext highlighter-rouge">codecs</code> module</a> for ideas on how to do this more efficiently. Most of the codecs behind that module are implemented in C. Even so, the <code class="language-plaintext highlighter-rouge">unicode_escape</code> codec produced output quite close to what I needed for <abbr title="Rich Text Format">RTF</abbr>: codepoints from <code class="language-plaintext highlighter-rouge">\u0020</code> to <code class="language-plaintext highlighter-rouge">\u007e</code> are left alone (bar the backslash, which is doubled). The rest are encoded to one of the <code class="language-plaintext highlighter-rouge">\xhh</code>, <code class="language-plaintext highlighter-rouge">\uhhhh</code> or <code class="language-plaintext highlighter-rouge">\Uhhhhhhhh</code> 8, 16 or 32-bit escapes, with the exception of <code class="language-plaintext highlighter-rouge">\t</code>, <code class="language-plaintext highlighter-rouge">\n</code> and <code class="language-plaintext highlighter-rouge">\r</code>, which get their short escapes. Would there be any way to reuse this output?</p>
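<p>A quick check in Python 3, where the codec returns bytes rather than a string, shows this behaviour on an arbitrary sample string. Note that <code class="language-plaintext highlighter-rouge">\x7f</code> is escaped too, and that a character beyond the <abbr title="Basic Multilingual Plane">BMP</abbr> gets the 32-bit <code class="language-plaintext highlighter-rouge">\U</code> form:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code># An arbitrary sample: tab, Latin-1, CJK, backslash, DEL and a non-BMP emoji.
sample = 'a\tb\u00f8\u8123\\\x7f\U0001f600'
escaped = sample.encode('unicode_escape')  # Python 3 returns bytes here
print(escaped)  # b'a\\tb\\xf8\\u8123\\\\\\x7f\\U0001f600'
</code></pre></div></div>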

<p>Well, if you combine this with a bit of <a href="http://docs.python.org/library/re.html#re.sub"><code class="language-plaintext highlighter-rouge">re.sub</code></a> magic, you can in fact produce convincing <abbr title="Rich Text Format">RTF</abbr> command sequences:</p>

<div class="language-console highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gp">&gt;&gt;&gt;</span><span class="w"> </span><span class="kn">import</span> <span class="n">re</span>
<span class="gp">&gt;&gt;&gt;</span><span class="w"> </span><span class="kn">import</span> <span class="n">struct</span>
<span class="gp">&gt;&gt;&gt;</span><span class="w"> </span><span class="n">_charescape</span> <span class="o">=</span> <span class="n">re</span><span class="p">.</span><span class="nf">compile</span><span class="p">(</span><span class="sa">r</span><span class="sh">'</span><span class="s">(?&lt;!\\)\\(?:x([0-9a-fA-F]{2})|u([0-9a-fA-F]{4}))</span><span class="sh">'</span><span class="p">)</span>
<span class="gp">&gt;&gt;&gt;</span><span class="w"> </span><span class="k">def</span> <span class="nf">_replace_struct</span><span class="p">(</span><span class="n">match</span><span class="p">):</span>
<span class="gp">...</span><span class="w">     </span><span class="n">match</span> <span class="o">=</span> <span class="k">match</span><span class="p">.</span><span class="nf">groups</span><span class="p">()</span>
<span class="gp">...</span><span class="w">     </span><span class="c1"># Convert XX or XXXX hex string into 2 bytes
</span><span class="gp">...</span><span class="w">     </span><span class="n">codepoint</span> <span class="o">=</span> <span class="p">(</span><span class="k">match</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="ow">and</span> <span class="sh">'</span><span class="s">00</span><span class="sh">'</span> <span class="o">+</span> <span class="k">match</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="ow">or</span> <span class="k">match</span><span class="p">[</span><span class="mi">1</span><span class="p">]).</span><span class="nf">decode</span><span class="p">(</span><span class="sh">'</span><span class="s">hex</span><span class="sh">'</span><span class="p">)</span>
<span class="gp">...</span><span class="w">     </span><span class="c1"># Convert 2 bytes into a signed integer, insert into escape sequence
</span><span class="gp">...</span><span class="w">     </span><span class="k">return</span> <span class="sh">'</span><span class="se">\\</span><span class="s">u%i?</span><span class="sh">'</span> <span class="o">%</span> <span class="n">struct</span><span class="p">.</span><span class="nf">unpack</span><span class="p">(</span><span class="sh">'</span><span class="s">!h</span><span class="sh">'</span><span class="p">,</span> <span class="n">codepoint</span><span class="p">)</span>
<span class="c">... 
</span><span class="gp">&gt;&gt;&gt;</span><span class="w"> </span><span class="n">escaped</span> <span class="o">=</span> <span class="n">example</span><span class="p">.</span><span class="nf">encode</span><span class="p">(</span><span class="sh">'</span><span class="s">unicode_escape</span><span class="sh">'</span><span class="p">)</span>
<span class="gp">&gt;&gt;&gt;</span><span class="w"> </span><span class="n">escaped</span>
<span class="go">'CJK Ideograph: \\u8123'
</span><span class="gp">&gt;&gt;&gt;</span><span class="w"> </span><span class="n">_charescape</span><span class="p">.</span><span class="nf">sub</span><span class="p">(</span><span class="n">_replace_struct</span><span class="p">,</span> <span class="n">escaped</span><span class="p">)</span>
<span class="go">'CJK Ideograph: \\u-32477?'
</span></code></pre></div></div>

<p>The <a href="http://docs.python.org/library/struct.html"><code class="language-plaintext highlighter-rouge">struct</code> module</a> gave me a quick way to reinterpret the hexadecimal digits produced by the <code class="language-plaintext highlighter-rouge">unicode_escape</code> codec as a signed short. I did have to make sure there were always exactly 2 bytes, hence the <code class="language-plaintext highlighter-rouge">'00'</code> padding for <code class="language-plaintext highlighter-rouge">\xhh</code> escapes.</p>
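<p>Python 3 no longer has <code class="language-plaintext highlighter-rouge">str.decode('hex')</code>, so a port of the hybrid approach has to change a few things. What follows is a sketch along those lines, not the code timed in this post. <code class="language-plaintext highlighter-rouge">binascii.unhexlify</code> stands in for the <code class="language-plaintext highlighter-rouge">'hex'</code> codec. Instead of a negative lookbehind, the expression matches an escaped backslash pair first and hands it back unchanged. That way a <code class="language-plaintext highlighter-rouge">\xhh</code> escape directly after a literal backslash is still converted. The name <code class="language-plaintext highlighter-rouge">rtf_escape_hybrid</code> is hypothetical.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import binascii
import re
import struct

# Either an escaped backslash pair (left alone) or a \xhh / \uhhhh escape
# as emitted by the unicode_escape codec, which returns bytes in Python 3.
_charescape = re.compile(rb'\\\\|\\(?:x([0-9a-fA-F]{2})|u([0-9a-fA-F]{4}))')


def _replace_struct(match):
    xx, xxxx = match.groups()
    if xx is None and xxxx is None:
        return match.group(0)  # a literal backslash, already doubled
    # Pad \xhh to four hex digits so there are always exactly 2 bytes.
    raw = binascii.unhexlify(xxxx or b'00' + xx)
    # Reinterpret those 2 bytes as a big-endian signed short.
    (value,) = struct.unpack('!h', raw)
    return b'\\u%d?' % value


def rtf_escape_hybrid(text):
    return _charescape.sub(_replace_struct, text.encode('unicode_escape'))


print(rtf_escape_hybrid('CJK Ideograph: \u8123'))  # b'CJK Ideograph: \\u-32477?'
</code></pre></div></div>

<p>Like the original, this still leaves <code class="language-plaintext highlighter-rouge">\t</code>, <code class="language-plaintext highlighter-rouge">\n</code> and <code class="language-plaintext highlighter-rouge">\r</code> in their <code class="language-plaintext highlighter-rouge">unicode_escape</code> form and does nothing about <code class="language-plaintext highlighter-rouge">{</code> and <code class="language-plaintext highlighter-rouge">}</code>.</p>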

<p>Of course, the above trick does not handle newlines, returns or tabs (<code class="language-plaintext highlighter-rouge">\n</code>, <code class="language-plaintext highlighter-rouge">\r</code> and <code class="language-plaintext highlighter-rouge">\t</code> respectively) correctly, nor does it escape existing backslashes yet. Back then, though, I hoped that this proof of concept would run several orders of magnitude faster than the naive character-by-character method when dealing with mostly-<abbr title="American Standard Code for Information Interchange">ASCII</abbr> input; most of the work is done in C by the <code class="language-plaintext highlighter-rouge">codecs</code> and <code class="language-plaintext highlighter-rouge">re</code> modules, after all.</p>

<p>So this time around I decided to time these:</p>

<div class="language-console highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gp">&gt;&gt;&gt;</span><span class="w"> </span><span class="kn">import</span> <span class="n">timeit</span>
<span class="gp">&gt;&gt;&gt;</span><span class="w"> </span><span class="k">def</span> <span class="nf">test1</span><span class="p">():</span> <span class="sh">''</span><span class="p">.</span><span class="nf">join</span><span class="p">([</span><span class="sh">'</span><span class="s">\u%i?</span><span class="sh">'</span> <span class="o">%</span> <span class="p">(</span><span class="nf">ord</span><span class="p">(</span><span class="n">e</span><span class="p">)</span> <span class="k">if</span> <span class="n">e</span> <span class="o">&lt;</span> <span class="sa">u</span><span class="sh">'</span><span class="se">\u8000</span><span class="sh">'</span> <span class="k">else</span> <span class="nf">ord</span><span class="p">(</span><span class="n">e</span><span class="p">)</span> <span class="o">-</span> <span class="mi">65536</span><span class="p">)</span> <span class="nf">if </span><span class="p">(</span><span class="n">e</span> <span class="o">&gt;</span> <span class="sh">'</span><span class="se">\x7f</span><span class="sh">'</span> <span class="ow">or</span> <span class="n">e</span> <span class="o">&lt;</span> <span class="sh">'</span><span class="se">\x20</span><span class="sh">'</span> <span class="ow">or</span> <span class="n">e</span> <span class="ow">in</span> <span class="sa">u</span><span class="sh">'</span><span class="se">\\</span><span class="s">{}</span><span class="sh">'</span><span class="p">)</span> <span class="k">else</span> <span class="nf">str</span><span class="p">(</span><span class="n">e</span><span class="p">)</span> <span class="k">for</span> <span class="n">e</span> <span class="ow">in</span> <span class="n">testdocument</span><span class="p">])</span>
<span class="c">... 
</span><span class="gp">&gt;&gt;&gt;</span><span class="w"> </span><span class="k">def</span> <span class="nf">test2</span><span class="p">():</span> <span class="n">_charescape</span><span class="p">.</span><span class="nf">sub</span><span class="p">(</span><span class="n">_replace_struct</span><span class="p">,</span> <span class="n">testdocument</span><span class="p">.</span><span class="nf">encode</span><span class="p">(</span><span class="sh">'</span><span class="s">unicode_escape</span><span class="sh">'</span><span class="p">))</span>
<span class="c">... 
</span><span class="gp">&gt;&gt;&gt;</span><span class="w"> </span><span class="n">declaration</span> <span class="o">=</span> <span class="p">(</span>
<span class="gp">...</span><span class="w">     </span><span class="sa">u</span><span class="sh">'</span><span class="s">Alle mennesker er f</span><span class="se">\xf8</span><span class="s">dt frie og med samme menneskeverd og menneskerettigheter. </span><span class="sh">'</span>
<span class="gp">...</span><span class="w">     </span><span class="sa">u</span><span class="sh">'</span><span class="s">De er utstyrt med fornuft og samvittighet og b</span><span class="se">\xf8</span><span class="s">r handle mot hverandre i brorskapets </span><span class="se">\xe5</span><span class="s">nd.</span><span class="sh">'</span>
<span class="gp">...</span><span class="w"> </span><span class="p">)</span>
<span class="gp">&gt;&gt;&gt;</span><span class="w"> </span><span class="n">testdocument</span> <span class="o">=</span> <span class="n">declaration</span> <span class="o">*</span> <span class="mi">100</span>
<span class="gp">&gt;&gt;&gt;</span><span class="w"> </span><span class="n">timeit</span><span class="p">.</span><span class="nf">timeit</span><span class="p">(</span><span class="sh">'</span><span class="s">test1()</span><span class="sh">'</span><span class="p">,</span> <span class="sh">'</span><span class="s">from __main__ import test1</span><span class="sh">'</span><span class="p">,</span> <span class="n">number</span><span class="o">=</span><span class="mi">500</span><span class="p">)</span>
<span class="go">5.982733964920044
</span><span class="gp">&gt;&gt;&gt;</span><span class="w"> </span><span class="n">timeit</span><span class="p">.</span><span class="nf">timeit</span><span class="p">(</span><span class="sh">'</span><span class="s">test2()</span><span class="sh">'</span><span class="p">,</span> <span class="sh">'</span><span class="s">from __main__ import test2</span><span class="sh">'</span><span class="p">,</span> <span class="n">number</span><span class="o">=</span><span class="mi">500</span><span class="p">)</span>
<span class="go">1.4459600448608398
</span></code></pre></div></div>

<p>Cool, so my hybrid encode-plus-regular-expression solution looks to be around 4 times as fast (1.45 against 5.98 seconds for 500 runs), at least for simple Norwegian text with a handful of Latin-1 characters, my most common case. Note, however, that I am not handling the <abbr title="Rich Text Format">RTF</abbr> escape characters properly, nor are the <code class="language-plaintext highlighter-rouge">\n</code>, <code class="language-plaintext highlighter-rouge">\r</code> and <code class="language-plaintext highlighter-rouge">\t</code> characters handled correctly.</p>

<h2 id="can-i-do-better">Can I do better?</h2>

<p>But I am actually being too clever by half (read: pretty dumb really); why did I encode to <code class="language-plaintext highlighter-rouge">unicode_escape</code> in the first place? I was still in the process of fully understanding the issues and saw a shortcut. My regular expression isn’t particularly clever, I dabbled with the struct module to get my signed short values, and with all this hocus-pocus I lost sight of the goal: to escape certain classes of characters to <abbr title="Rich Text Format">RTF</abbr> command codes.</p>

<p>But aren’t regular expressions quite good at finding those classes all by themselves? I may as well use a decent expression that selects what needs to be encoded directly:</p>

<div class="language-console highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gp">&gt;&gt;&gt;</span><span class="w"> </span><span class="n">_charescape_direct</span> <span class="o">=</span> <span class="n">re</span><span class="p">.</span><span class="nf">compile</span><span class="p">(</span><span class="sa">u</span><span class="sh">'</span><span class="s">([</span><span class="se">\x00</span><span class="s">-</span><span class="se">\x1f\\\\</span><span class="s">{}</span><span class="se">\x80</span><span class="s">-</span><span class="se">\uffff</span><span class="s">])</span><span class="sh">'</span><span class="p">)</span>
<span class="gp">&gt;&gt;&gt;</span><span class="w"> </span><span class="k">def</span> <span class="nf">_replace_direct</span><span class="p">(</span><span class="n">match</span><span class="p">):</span>
<span class="gp">...</span><span class="w">     </span><span class="n">codepoint</span> <span class="o">=</span> <span class="nf">ord</span><span class="p">(</span><span class="k">match</span><span class="p">.</span><span class="nf">group</span><span class="p">(</span><span class="mi">1</span><span class="p">))</span>
<span class="gp">...</span><span class="w">     </span><span class="k">return</span> <span class="sh">'</span><span class="se">\\</span><span class="s">u%s?</span><span class="sh">'</span> <span class="o">%</span> <span class="p">(</span><span class="n">codepoint</span> <span class="k">if</span> <span class="n">codepoint</span> <span class="o">&lt;</span> <span class="mi">32768</span> <span class="k">else</span> <span class="n">codepoint</span> <span class="o">-</span> <span class="mi">65536</span><span class="p">)</span>
<span class="gp">&gt;&gt;&gt;</span><span class="w"> </span><span class="n">_charescape_direct</span><span class="p">.</span><span class="nf">sub</span><span class="p">(</span><span class="n">_replace_direct</span><span class="p">,</span> <span class="n">example</span><span class="p">).</span><span class="nf">encode</span><span class="p">(</span><span class="sh">'</span><span class="s">ascii</span><span class="sh">'</span><span class="p">)</span>
<span class="go">'CJK Ideograph: \\u-32477?'
</span><span class="gp">&gt;&gt;&gt;</span><span class="w"> </span><span class="k">def</span> <span class="nf">test3</span><span class="p">():</span> <span class="n">_charescape_direct</span><span class="p">.</span><span class="nf">sub</span><span class="p">(</span><span class="n">_replace_direct</span><span class="p">,</span> <span class="n">testdocument</span><span class="p">).</span><span class="nf">encode</span><span class="p">(</span><span class="sh">'</span><span class="s">ascii</span><span class="sh">'</span><span class="p">)</span>
<span class="c">...
</span><span class="gp">&gt;&gt;&gt;</span><span class="w"> </span><span class="n">timeit</span><span class="p">.</span><span class="nf">timeit</span><span class="p">(</span><span class="sh">'</span><span class="s">test3()</span><span class="sh">'</span><span class="p">,</span> <span class="sh">'</span><span class="s">from __main__ import test3</span><span class="sh">'</span><span class="p">,</span> <span class="n">number</span><span class="o">=</span><span class="mi">500</span><span class="p">)</span>
<span class="go">0.5356400012969971
</span></code></pre></div></div>

<p>Suddenly we have a better than tenfold speed increase over the character-by-character version, and nearly three times the speed of my hybrid attempt! Not only that, I am now also properly escaping the whitespace characters <code class="language-plaintext highlighter-rouge">\n</code>, <code class="language-plaintext highlighter-rouge">\r</code> and <code class="language-plaintext highlighter-rouge">\t</code>, along with every other control character below <code class="language-plaintext highlighter-rouge">\x20</code>. As an added bonus, the <abbr title="Rich Text Format">RTF</abbr> special characters <code class="language-plaintext highlighter-rouge">\</code>, <code class="language-plaintext highlighter-rouge">{</code> and <code class="language-plaintext highlighter-rouge">}</code> are now also being escaped! I call this a result, and a lesson to learn.</p>
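<p>Here is the direct approach as a Python 3 sketch. Every string is unicode in Python 3, so only the signed-short conversion changes. It uses the sign-extension expression <code class="language-plaintext highlighter-rouge">(codepoint ^ 0x8000) - 0x8000</code> in place of the conditional, and the pattern is written as a raw string, since <code class="language-plaintext highlighter-rouge">re</code> understands <code class="language-plaintext highlighter-rouge">\x</code> and <code class="language-plaintext highlighter-rouge">\u</code> escapes itself. The wrapper name <code class="language-plaintext highlighter-rouge">rtf_escape</code> is hypothetical.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import re

# Control characters, the RTF specials \ { } and everything from U+0080 to
# U+FFFF; characters beyond the BMP are not matched.
_charescape_direct = re.compile(r'[\x00-\x1f\\{}\x80-\uffff]')


def _replace_direct(match):
    codepoint = ord(match.group(0))
    # Sign-extend to the signed 16-bit value that \uN expects.
    return '\\u%d?' % ((codepoint ^ 0x8000) - 0x8000)


def rtf_escape(text):
    return _charescape_direct.sub(_replace_direct, text).encode('ascii')


print(rtf_escape('CJK Ideograph: \u8123'))  # b'CJK Ideograph: \\u-32477?'
print(rtf_escape('{\\}\n'))  # b'\\u123?\\u92?\\u125?\\u10?'
</code></pre></div></div>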

<h2 id="perhaps-we-can-translate-instead">Perhaps we can translate instead</h2>

<p>We could also use a <a href="http://docs.python.org/library/stdtypes.html#str.translate">translation table</a> to do the escaping for us. This is simply a dict that maps unicode codepoints to replacement values. Creating a static dict covering every unicode codepoint would be somewhat tricky, requiring either a dict subclass with a custom <code class="language-plaintext highlighter-rouge">__missing__</code> method or loading a generated structure on import.</p>

<p>Before digging into clever solutions to that, I should perhaps first test the speed of a simple translation table, one that only covers codepoints up to <code class="language-plaintext highlighter-rouge">\u00ff</code>, the Latin-1 range:</p>

<div class="language-console highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gp">&gt;&gt;&gt;</span><span class="w"> </span><span class="n">_table</span> <span class="o">=</span> <span class="p">{</span><span class="n">i</span><span class="p">:</span> <span class="sa">u</span><span class="sh">"</span><span class="se">\\</span><span class="sh">'</span><span class="s">{0:02x}</span><span class="sh">"</span><span class="p">.</span><span class="nf">format</span><span class="p">(</span><span class="n">i</span><span class="p">)</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nf">xrange</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">32</span><span class="p">)}</span>
<span class="gp">&gt;&gt;&gt;</span><span class="w"> </span><span class="n">_table</span><span class="p">.</span><span class="nf">update</span><span class="p">({</span><span class="nf">ord</span><span class="p">(</span><span class="n">c</span><span class="p">):</span> <span class="sa">u</span><span class="sh">"</span><span class="se">\\</span><span class="sh">'</span><span class="s">{0:02x}</span><span class="sh">"</span><span class="p">.</span><span class="nf">format</span><span class="p">(</span><span class="nf">ord</span><span class="p">(</span><span class="n">c</span><span class="p">))</span> <span class="k">for</span> <span class="n">c</span> <span class="ow">in</span> <span class="sh">'</span><span class="se">\\</span><span class="s">{}</span><span class="sh">'</span><span class="p">})</span>
<span class="gp">&gt;&gt;&gt;</span><span class="w"> </span><span class="n">_table</span><span class="p">.</span><span class="nf">update</span><span class="p">({</span><span class="n">i</span><span class="p">:</span> <span class="sa">u</span><span class="sh">"</span><span class="se">\\</span><span class="s">u{0}?</span><span class="sh">"</span><span class="p">.</span><span class="nf">format</span><span class="p">(</span><span class="n">i</span><span class="p">)</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nf">xrange</span><span class="p">(</span><span class="mi">128</span><span class="p">,</span> <span class="mi">256</span><span class="p">)})</span>
<span class="gp">&gt;&gt;&gt;</span><span class="w"> </span><span class="nf">len</span><span class="p">(</span><span class="n">_table</span><span class="p">)</span>
<span class="go">163
</span><span class="gp">&gt;&gt;&gt;</span><span class="w"> </span><span class="n">timeit</span><span class="p">.</span><span class="nf">timeit</span><span class="p">(</span><span class="sh">'</span><span class="s">testdocument.translate(_table).encode(</span><span class="sh">"</span><span class="s">ascii</span><span class="sh">"</span><span class="s">)</span><span class="sh">'</span><span class="p">,</span> <span class="sh">'</span><span class="s">from __main__ import testdocument, _table</span><span class="sh">'</span><span class="p">,</span> <span class="n">number</span><span class="o">=</span><span class="mi">500</span><span class="p">)</span>
<span class="go">2.66812801361084
</span></code></pre></div></div>

<p>Unfortunately, using <code class="language-plaintext highlighter-rouge">.translate</code> turns out to slow us down considerably: 2.67 seconds, against 0.54 seconds for the direct regular expression. Reducing the table to just a few codepoints doesn’t help much either:</p>

<div class="language-console highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gp">&gt;&gt;&gt;</span><span class="w"> </span><span class="n">_basictable</span> <span class="o">=</span> <span class="p">{</span><span class="nf">ord</span><span class="p">(</span><span class="n">c</span><span class="p">):</span> <span class="sa">u</span><span class="sh">"</span><span class="se">\\</span><span class="sh">'</span><span class="s">{0:02x}</span><span class="sh">"</span><span class="p">.</span><span class="nf">format</span><span class="p">(</span><span class="nf">ord</span><span class="p">(</span><span class="n">c</span><span class="p">))</span> <span class="k">for</span> <span class="n">c</span> <span class="ow">in</span> <span class="sh">'</span><span class="se">\n\r\t\\</span><span class="s">{}</span><span class="sh">'</span><span class="p">}</span>
<span class="gp">&gt;&gt;&gt;</span><span class="w"> </span><span class="nf">len</span><span class="p">(</span><span class="n">_basictable</span><span class="p">)</span>
<span class="go">6
</span><span class="gp">&gt;&gt;&gt;</span><span class="w"> </span><span class="n">timeit</span><span class="p">.</span><span class="nf">timeit</span><span class="p">(</span><span class="sh">'</span><span class="s">testdocument.translate(_basictable)</span><span class="sh">'</span><span class="p">,</span> <span class="sh">'</span><span class="s">from __main__ import testdocument, _basictable</span><span class="sh">'</span><span class="p">,</span> <span class="n">number</span><span class="o">=</span><span class="mi">500</span><span class="p">)</span>
<span class="go">2.0113179683685303
</span></code></pre></div></div>

<p>So it looks like I might want to avoid using <code class="language-plaintext highlighter-rouge">.translate</code> if at all possible.</p>
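<p>For completeness, the <code class="language-plaintext highlighter-rouge">__missing__</code> idea mentioned above could be sketched like this in Python 3. The class name <code class="language-plaintext highlighter-rouge">RTFEscapeTable</code> is hypothetical and this version has not been benchmarked. Every lookup still goes through <code class="language-plaintext highlighter-rouge">str.translate</code>’s per-character mapping, so there is no particular reason to expect it to beat the regular expression. <code class="language-plaintext highlighter-rouge">str.translate</code> looks codepoints up with ordinary item access, so a dict subclass that defines <code class="language-plaintext highlighter-rouge">__missing__</code> can fill itself in on demand:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>class RTFEscapeTable(dict):
    """Translation table that computes (and caches) entries on first use."""

    def __missing__(self, codepoint):
        char = chr(codepoint)
        if codepoint in range(0x20, 0x80) and char not in '\\{}':
            value = char  # printable ASCII maps to itself
        else:
            # Signed 16-bit \uN escape; only meaningful within the BMP.
            value = '\\u%d?' % ((codepoint ^ 0x8000) - 0x8000)
        self[codepoint] = value  # cache so each codepoint is computed once
        return value


_table = RTFEscapeTable()
print('f\xf8dt {x}'.translate(_table).encode('ascii'))  # b'f\\u248?dt \\u123?x\\u125?'
</code></pre></div></div>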

<h2 id="worst-case-scenario">Worst-case scenario</h2>

<p>So far, I’ve compared methods by testing them against some Norwegian text. Like many European languages, it consists mostly of <abbr title="American Standard Code for Information Interchange">ASCII</abbr> characters.</p>

<p>To get a more complete picture, I need to test these methods against a worst-case scenario. For that I used a UTF-8 encoded file containing every assigned, printable codepoint from 0 to 0xFFFF, taken from an excellent collection of <a href="https://github.com/bits/UTF-8-Unicode-Test-Documents">UTF-8 test documents</a>:</p>

<div class="language-console highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gp">&gt;&gt;&gt;</span><span class="w"> </span><span class="kn">import</span> <span class="n">urllib</span>
<span class="gp">&gt;&gt;&gt;</span><span class="w"> </span><span class="n">utf8_sequence</span> <span class="o">=</span> <span class="n">urllib</span><span class="p">.</span><span class="nf">urlopen</span><span class="p">(</span><span class="sh">'</span><span class="s">https://raw.github.com/bits/UTF-8-Unicode-Test-Documents/master/UTF-8_sequence_unseparated/utf8_sequence_0-0xffff_assigned_printable_unseparated.txt</span><span class="sh">'</span><span class="p">).</span><span class="nf">read</span><span class="p">().</span><span class="nf">decode</span><span class="p">(</span><span class="sh">'</span><span class="s">utf-8</span><span class="sh">'</span><span class="p">)</span>
<span class="gp">&gt;&gt;&gt;</span><span class="w"> </span><span class="nf">len</span><span class="p">(</span><span class="n">utf8_sequence</span><span class="p">)</span>
<span class="go">58081
</span><span class="gp">&gt;&gt;&gt;</span><span class="w"> </span><span class="n">testdocument</span> <span class="o">=</span> <span class="n">utf8_sequence</span>
<span class="gp">&gt;&gt;&gt;</span><span class="w"> </span><span class="n">timeit</span><span class="p">.</span><span class="nf">timeit</span><span class="p">(</span><span class="sh">'</span><span class="s">test1()</span><span class="sh">'</span><span class="p">,</span> <span class="sh">'</span><span class="s">from __main__ import test1</span><span class="sh">'</span><span class="p">,</span> <span class="n">number</span><span class="o">=</span><span class="mi">10</span><span class="p">)</span>
<span class="go">0.7785000801086426
</span><span class="gp">&gt;&gt;&gt;</span><span class="w"> </span><span class="n">timeit</span><span class="p">.</span><span class="nf">timeit</span><span class="p">(</span><span class="sh">'</span><span class="s">test3()</span><span class="sh">'</span><span class="p">,</span> <span class="sh">'</span><span class="s">from __main__ import test3</span><span class="sh">'</span><span class="p">,</span> <span class="n">number</span><span class="o">=</span><span class="mi">10</span><span class="p">)</span>
<span class="go">0.8913929462432861
</span></code></pre></div></div>

<p>Interesting! So in the worst-case scenario, where the vast majority (99.8%) of the text requires encoding, the character-by-character method is actually a little faster again (0.78 against 0.89 seconds for 10 runs)! In the more common case, you insert shorter text snippets into an <abbr title="Rich Text Format">RTF</abbr> document and far fewer characters need escaping. There, the regular expression method beats the character-by-character method hands down.</p>
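<p>The deciding factor appears to be the proportion of characters that need escaping, so it may be worth measuring that for your own input before picking a method. Below is a minimal Python 3 sketch. The helper <code class="language-plaintext highlighter-rouge">escape_ratio</code> is hypothetical, and it uses the same character class as the direct expression:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import re

# The same character class as the direct regular expression.
_needs_escape = re.compile(r'[\x00-\x1f\\{}\x80-\uffff]')


def escape_ratio(text):
    """Return the fraction of characters in text that need an RTF escape."""
    if not text:
        return 0.0
    return len(_needs_escape.findall(text)) / len(text)


declaration = (
    'Alle mennesker er f\xf8dt frie og med samme menneskeverd og menneskerettigheter. '
    'De er utstyrt med fornuft og samvittighet og b\xf8r handle mot hverandre i brorskapets \xe5nd.'
)
print('%.1f%% of the Norwegian sample needs escaping' % (100 * escape_ratio(declaration)))
</code></pre></div></div>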

<h2 id="so-what-about-non-bmp-unicode">So what about non-<abbr title="Basic Multilingual Plane">BMP</abbr> Unicode?</h2>

<p>So far I’ve focused only on characters within the <a href="https://en.wikipedia.org/wiki/Unicode_plane#Basic_Multilingual_Plane"><abbr title="Basic Multilingual Plane">BMP</abbr></a>. For codepoints beyond the <abbr title="Basic Multilingual Plane">BMP</abbr> you can apparently use a <a href="https://en.wikipedia.org/wiki/UTF-16#Code_points_U.2B10000_to_U.2B10FFFF">UTF-16 surrogate pair</a>, at least according to Wikipedia. However, the <abbr title="Rich Text Format">RTF</abbr> specification itself is silent on this. The Microsoft platform uses UTF-16-LE throughout, but byte order shouldn’t come into it. Each <code class="language-plaintext highlighter-rouge">\uN</code> control word carries a whole 16-bit code unit as a decimal number, and UTF-16 always puts the high surrogate before the low surrogate, in both its little- and big-endian forms.</p>

<p>However, I cannot at this time be bothered to extend my encoder to support such codepoints. On a narrow (UCS-2) Python build, codepoints beyond the <abbr title="Basic Multilingual Plane">BMP</abbr> are stored internally as UTF-16 surrogate pairs. By happy coincidence, they are therefore sort-of supported by this method anyway:</p>

<div class="language-console highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gp">&gt;&gt;&gt;</span><span class="w"> </span><span class="n">beyond</span> <span class="o">=</span> <span class="sa">u</span><span class="sh">'</span><span class="se">\U00010196</span><span class="sh">'</span>
<span class="gp">&gt;&gt;&gt;</span><span class="w"> </span><span class="n">_charescape_direct</span><span class="p">.</span><span class="nf">sub</span><span class="p">(</span><span class="n">_replace_direct</span><span class="p">,</span> <span class="n">beyond</span><span class="p">).</span><span class="nf">encode</span><span class="p">(</span><span class="sh">'</span><span class="s">ascii</span><span class="sh">'</span><span class="p">)</span>
<span class="go">'\\u-10240?\\u-8810?'
</span></code></pre></div></div>

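<p>Note that the first value is -10240, or <code class="language-plaintext highlighter-rouge">0xd800</code> in unsigned hexadecimal: the high surrogate, followed by the low surrogate -8810 (<code class="language-plaintext highlighter-rouge">0xdd96</code>). That is the standard UTF-16 order, which does not depend on byte order, so a narrow build on Windows should produce the same sequence.</p>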

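<p>On a wide (UCS-4) build, and on any Python 3.3 or newer, the codepoint falls outside the range matched by the regular expression. It is passed through unescaped, and the <code class="language-plaintext highlighter-rouge">.encode('ascii')</code> call raises a <code class="language-plaintext highlighter-rouge">UnicodeEncodeError</code> instead.</p>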

<p>I am calling this ‘unsupported’ and a day. Suggestions for implementing this in a neat and performant way are welcome!</p>
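<p>One possible direction is sketched below for Python 3, where every string holds full codepoints. It assumes, as Wikipedia suggests, that <abbr title="Rich Text Format">RTF</abbr> readers accept surrogate pairs written as two consecutive <code class="language-plaintext highlighter-rouge">\uN</code> control words. That assumption is untested against real readers. The idea is to widen the character class to <code class="language-plaintext highlighter-rouge">\U0010ffff</code> and let the <code class="language-plaintext highlighter-rouge">utf-16-be</code> codec split each matched character into 16-bit code units, which <code class="language-plaintext highlighter-rouge">struct</code> then reads as signed shorts. All names here are hypothetical:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import re
import struct

# Control characters, the RTF specials \ { } and U+0080 up to U+10FFFF.
_charescape_wide = re.compile(r'[\x00-\x1f\\{}\x80-\U0010ffff]')


def _replace_wide(match):
    # UTF-16 yields one 16-bit code unit for BMP characters and a surrogate
    # pair (high surrogate first) for everything beyond the BMP.
    data = match.group(0).encode('utf-16-be', 'surrogatepass')
    # '!h' reads each unit as a big-endian signed short, as \uN requires.
    units = struct.unpack('!%dh' % (len(data) // 2), data)
    return ''.join('\\u%d?' % unit for unit in units)


def rtf_escape_wide(text):
    return _charescape_wide.sub(_replace_wide, text).encode('ascii')


print(rtf_escape_wide('\U00010196'))  # b'\\u-10240?\\u-8810?'
</code></pre></div></div>

<p>For U+10196 this yields the same two values the narrow build produced above.</p>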

<h2 id="off-to-pypi-we-go">Off to PyPI we go</h2>

<p>I am quite happy with the simple regular expression method, and prefer it over the character-by-character loop.</p>

<p>So I packaged up my regular expression method as a handy <a href="https://pypi.python.org/pypi/rtfunicode">module on PyPI</a>, complete with Python 2 and 3 support and a minuscule test suite; the <a href="https://github.com/mjpieters/rtfunicode">source code is available on GitHub</a>.</p>

<p>The module in fact registers a new codec, called <code class="language-plaintext highlighter-rouge">rtfunicode</code>, so after you import the package all you need do is use the new codec in the <code class="language-plaintext highlighter-rouge">.encode()</code> method:</p>

<div class="language-console highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gp">&gt;&gt;&gt;</span><span class="w"> </span><span class="kn">import</span> <span class="n">rtfunicode</span>
<span class="gp">&gt;&gt;&gt;</span><span class="w"> </span><span class="n">declaration</span><span class="p">.</span><span class="nf">encode</span><span class="p">(</span><span class="sh">'</span><span class="s">rtfunicode</span><span class="sh">'</span><span class="p">)</span>
<span class="go">'Alle mennesker er f\\u248?dt frie og med samme menneskeverd og menneskerettigheter. De er utstyrt med fornuft og samvittighet og b\\u248?r handle mot hverandre i brorskapets \\u229?nd.'
</span></code></pre></div></div>
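
<p>For readers curious what registering a codec involves, here is a minimal Python 3 sketch built on the standard <code class="language-plaintext highlighter-rouge">codecs.register()</code> hook. It is not the rtfunicode package’s source: the codec name <code class="language-plaintext highlighter-rouge">rtfunicode_sketch</code> and the helper names are invented for this example, and only encoding is implemented.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import codecs
import re
import struct

# Hypothetical codec name, chosen so it cannot clash with the real rtfunicode package
CODEC_NAME = 'rtfunicode_sketch'

# Everything outside 7-bit ASCII needs an RTF escape
NON_ASCII = re.compile(r'[^\x00-\x7f]')


def _escape(match):
    # UTF-16 code units as signed 16-bit integers; astral characters yield two
    data = match.group().encode('utf-16-be')
    units = struct.unpack('!%dh' % (len(data) // 2), data)
    # The '?' is the single fallback character shown by readers without Unicode support
    return ''.join('\\u%d?' % unit for unit in units)


def rtf_encode(text, errors='strict'):
    # Codec encoders return (encoded bytes, number of characters consumed)
    return NON_ASCII.sub(_escape, text).encode('ascii', errors), len(text)


def _search(name):
    # Search functions get the normalized codec name; returning None means "not mine"
    if name == CODEC_NAME:
        return codecs.CodecInfo(rtf_encode, None, name=CODEC_NAME)
    return None


codecs.register(_search)

print('f\u00f8dt'.encode(CODEC_NAME))  # b'f\\u248?dt'
</code></pre></div></div>

<p>Because Python 3 strings always hold whole codepoints, the astral-plane case called ‘unsupported’ above is handled by this sketch: encoding the matched character to UTF-16 first yields both halves of the surrogate pair.</p>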

<p>Hopefully it comes in handy for others. Feedback is most welcome, as are patches!</p>]]></content><author><name>Martijn Pieters</name></author><category term="python" /><category term="RTF" /><category term="unicode" /><category term="encoding" /><summary type="html"><![CDATA[How to encode unicode codepoints in RTF documents using PyRTF.]]></summary></entry><entry><title type="html">The dreaded plone.relations IntId KeyError</title><link href="https://www.zopatista.com/plone/2011/06/29/the-dreaded-plone-relations-intid-keyerror/" rel="alternate" type="text/html" title="The dreaded plone.relations IntId KeyError" /><published>2011-06-29T00:00:00+01:00</published><updated>2011-06-29T00:00:00+01:00</updated><id>https://www.zopatista.com/plone/2011/06/29/the-dreaded-plone-relations-intid-keyerror</id><content type="html" xml:base="https://www.zopatista.com/plone/2011/06/29/the-dreaded-plone-relations-intid-keyerror/"><![CDATA[<p><em>When IntIds go missing, the going gets tough. Specifically, plone.app.relations and related packages do not deal gracefully when a relationship source or target is missing. Here is how we clear such broken relationships.</em></p>

<p>We’ve been experimenting with <a href="http://pypi.python.org/pypi/plone.app.relations">plone.app.relations</a> to manage relationships between objects for a few years now. This package uses <a href="http://pypi.python.org/pypi/zc.relationship">zc.relationship</a> to lay the links between content items in your site, which in turn relies on <a href="http://pypi.python.org/pypi/zope.app.intid">zope.app.intid</a> to create those links indirectly. Basically, intids are integer pointers to the real objects, which lets you handle the linking efficiently.</p>

<h2 id="water-in-the-bilge">Water in the Bilge</h2>

<p>The relations machinery is not very forgiving if any intid has gone AWOL. Normally, the relations data structures are kept in sync through Zope events, but this doesn’t always work out. In our experience, you can end up with objects and their intids removed, but the relationships pointing to the now-gone intids still in place. When this happens, things break, and you get tracebacks ending in the dreaded <code class="language-plaintext highlighter-rouge">KeyError: &lt;long number&gt;</code> in <code class="language-plaintext highlighter-rouge">getObject</code> of <code class="language-plaintext highlighter-rouge">zope/app/intid/__init__.py</code>. The frame before that will be in the <code class="language-plaintext highlighter-rouge">resolveToken</code> method of <code class="language-plaintext highlighter-rouge">zc/relationship/index.py</code>.</p>
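
<p>To make the failure mode concrete, here is a deliberately simplified model in plain Python. <code class="language-plaintext highlighter-rouge">ToyIntIds</code> is a hypothetical class, not the zope.app.intid API, although its <code class="language-plaintext highlighter-rouge">refs</code> attribute and <code class="language-plaintext highlighter-rouge">getObject</code> method mirror the names used in the code below. Relationships hold only integer ids, so once an object is unregistered without its relationships being cleaned up, resolving the relationship raises a <code class="language-plaintext highlighter-rouge">KeyError</code> carrying the missing id.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>class ToyIntIds:
    """Toy stand-in for an intid utility: integer ids mapped to objects."""

    def __init__(self):
        self.refs = {}  # maps intid to object (the real utility stores key references)
        self._next_id = 1

    def register(self, obj):
        iid = self._next_id
        self._next_id += 1
        self.refs[iid] = obj
        return iid

    def unregister(self, iid):
        del self.refs[iid]

    def getObject(self, iid):
        return self.refs[iid]  # raises KeyError(iid) for an intid that is gone


intids = ToyIntIds()
doc = intids.register('document')
image = intids.register('image')

# A relationship only stores intids, not the objects themselves
relationships = [(doc, image)]

# The image goes away, but the event that should clean up the relationship never fires
intids.unregister(image)

for source, target in relationships:
    try:
        intids.getObject(source), intids.getObject(target)
    except KeyError as missing:
        print('dangling relationship, missing intid:', missing)  # ... missing intid: 2
</code></pre></div></div>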

<p>Now, the zc.relationship package is very powerful and very, very flexible. This comes at a price, as its internal data structures are quite daunting to the uninitiated. If you have to repair these relations and all you have is the missing intid at one end of the relation, it’ll be a long, hard slog through a maze of 3 or 4 different packages and opaque TreeSets.</p>

<h2 id="bucket-by-bucket">Bucket by Bucket</h2>

<p>Luckily, we already did the deep code dive for you. The following function takes an intid, finds any references to it in the relations data structures, and removes them:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="n">plone.relations.interfaces</span> <span class="kn">import</span> <span class="n">IComplexRelationshipContainer</span>
<span class="kn">from</span> <span class="n">zope.app.intid.interfaces</span> <span class="kn">import</span> <span class="n">IIntIds</span>

<span class="k">def</span> <span class="nf">removeKeyErrorRelationship</span><span class="p">(</span><span class="n">iid</span><span class="p">):</span>
    <span class="sh">"""</span><span class="s">Remove all relationships that point to a intid no 
       longer in the site
    </span><span class="sh">"""</span>
    <span class="n">intids</span> <span class="o">=</span> <span class="nf">getUtility</span><span class="p">(</span><span class="n">IIntIds</span><span class="p">)</span>
    <span class="n">relationships</span> <span class="o">=</span> <span class="nf">getUtility</span><span class="p">(</span><span class="n">IComplexRelationshipContainer</span><span class="p">,</span> 
                               <span class="n">name</span><span class="o">=</span><span class="sh">'</span><span class="s">relations</span><span class="sh">'</span><span class="p">)</span>
    <span class="n">relIndex</span> <span class="o">=</span> <span class="n">relationships</span><span class="p">.</span><span class="n">relationIndex</span>
    <span class="k">for</span> <span class="n">direction</span> <span class="ow">in</span> <span class="p">(</span><span class="sh">'</span><span class="s">target</span><span class="sh">'</span><span class="p">,</span> <span class="sh">'</span><span class="s">source</span><span class="sh">'</span><span class="p">):</span>
        <span class="n">data</span> <span class="o">=</span> <span class="n">relIndex</span><span class="p">.</span><span class="n">_name_TO_mapping</span><span class="p">[</span><span class="n">direction</span><span class="p">].</span><span class="nf">get</span><span class="p">(</span><span class="n">iid</span><span class="p">)</span>
        <span class="k">if</span> <span class="ow">not</span> <span class="n">data</span> <span class="ow">or</span> <span class="n">data</span><span class="p">[</span><span class="mi">0</span><span class="p">].</span><span class="n">value</span> <span class="o">==</span> <span class="mi">0</span><span class="p">:</span>
            <span class="k">continue</span> <span class="c1"># Empty set for this direction
</span>        <span class="k">for</span> <span class="n">relid</span> <span class="ow">in</span> <span class="nf">list</span><span class="p">(</span><span class="n">data</span><span class="p">[</span><span class="mi">1</span><span class="p">]):</span>
            <span class="n">keyref</span> <span class="o">=</span> <span class="n">intids</span><span class="p">.</span><span class="n">refs</span><span class="p">.</span><span class="nf">get</span><span class="p">(</span><span class="n">relid</span><span class="p">)</span>
            <span class="k">if</span> <span class="n">keyref</span> <span class="ow">is</span> <span class="bp">None</span><span class="p">:</span>
                <span class="c1"># Not even the relationship exists anymore
</span>                <span class="n">relIndex</span><span class="p">.</span><span class="nf">_remove</span><span class="p">(</span><span class="n">relid</span><span class="p">,</span> <span class="p">(</span><span class="n">iid</span><span class="p">,),</span> <span class="n">direction</span><span class="p">)</span>
            <span class="k">else</span><span class="p">:</span>
                <span class="n">relation</span> <span class="o">=</span> <span class="n">keyref</span><span class="p">.</span><span class="nb">object</span>
                <span class="k">try</span><span class="p">:</span>
                    <span class="n">relation</span><span class="p">.</span><span class="n">__parent__</span><span class="p">.</span><span class="nf">remove</span><span class="p">(</span><span class="n">relation</span><span class="p">)</span>
                <span class="k">except</span> <span class="nb">AttributeError</span><span class="p">:</span>
                    <span class="c1"># The relation object only exists in the intid utility;
</span>                    <span class="c1"># in this case __parent__ is None.
</span>                    <span class="n">relIndex</span><span class="p">.</span><span class="nf">unindex</span><span class="p">(</span><span class="n">relation</span><span class="p">)</span>
                    <span class="n">relIndex</span><span class="p">.</span><span class="nf">unindex_doc</span><span class="p">(</span><span class="n">relid</span><span class="p">)</span> <span class="c1"># be doubly sure
</span>                    <span class="n">intids</span><span class="p">.</span><span class="nf">unregister</span><span class="p">(</span><span class="n">keyref</span><span class="p">)</span>
</code></pre></div></div>

<p>Note that this function assumes you already have <a href="http://stackoverflow.com/questions/5819978/how-do-i-trigger-portal-quickinstaller-reinstallproducts-form-outside-the-plone-s/5820885#5820885">the local site manager set up properly</a>, so that <code class="language-plaintext highlighter-rouge">getUtility()</code> can find the site’s utilities. It is a great little function for getting rid of individual KeyError problems: just pass in the number from the error.</p>

<h2 id="man-the-pumps">Man the Pumps</h2>

<p>It would be better if you could clear out all missing intids from the relations tool altogether, <em>before</em> they become a problem and things fall down. Luckily, you can! The following code will hunt down and remove all missing intids from the tool. Note that it’ll take a while (it scans through two whole relations indexes, one for targets and one for sources), so you had better sit back and relax while the work is done.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="n">plone.relations.interfaces</span> <span class="kn">import</span> <span class="n">IComplexRelationshipContainer</span>
<span class="kn">from</span> <span class="n">zope.app.intid.interfaces</span> <span class="kn">import</span> <span class="n">IIntIds</span>
<span class="kn">from</span> <span class="n">BTrees.IOBTree</span> <span class="kn">import</span> <span class="n">difference</span>

<span class="k">def</span> <span class="nf">clearAllMissingLinks</span><span class="p">():</span>
    <span class="sh">"""</span><span class="s">Find and remove all missing intids in the
       relations tool.
    </span><span class="sh">"""</span>
    <span class="n">intids</span> <span class="o">=</span> <span class="nf">getUtility</span><span class="p">(</span><span class="n">IIntIds</span><span class="p">)</span>
    <span class="n">relationships</span> <span class="o">=</span> <span class="nf">getUtility</span><span class="p">(</span><span class="n">IComplexRelationshipContainer</span><span class="p">,</span> 
                               <span class="n">name</span><span class="o">=</span><span class="sh">'</span><span class="s">relations</span><span class="sh">'</span><span class="p">)</span>
    <span class="n">relIndex</span> <span class="o">=</span> <span class="n">relationships</span><span class="p">.</span><span class="n">relationIndex</span>
    <span class="n">rtotal</span> <span class="o">=</span> <span class="n">itotal</span> <span class="o">=</span> <span class="mi">0</span>
    <span class="k">for</span> <span class="n">direction</span> <span class="ow">in</span> <span class="p">(</span><span class="sh">'</span><span class="s">target</span><span class="sh">'</span><span class="p">,</span> <span class="sh">'</span><span class="s">source</span><span class="sh">'</span><span class="p">):</span>
        <span class="n">idx</span> <span class="o">=</span> <span class="n">relIndex</span><span class="p">.</span><span class="n">_name_TO_mapping</span><span class="p">[</span><span class="n">direction</span><span class="p">]</span>
        <span class="k">for</span> <span class="n">iid</span> <span class="ow">in</span> <span class="nf">difference</span><span class="p">(</span><span class="n">idx</span><span class="p">,</span> <span class="n">intids</span><span class="p">.</span><span class="n">refs</span><span class="p">):</span>
            <span class="n">itotal</span> <span class="o">+=</span> <span class="mi">1</span>
            <span class="k">for</span> <span class="n">relid</span> <span class="ow">in</span> <span class="nf">list</span><span class="p">(</span><span class="n">idx</span><span class="p">[</span><span class="n">iid</span><span class="p">][</span><span class="mi">1</span><span class="p">]):</span>
                <span class="n">keyref</span> <span class="o">=</span> <span class="n">intids</span><span class="p">.</span><span class="n">refs</span><span class="p">.</span><span class="nf">get</span><span class="p">(</span><span class="n">relid</span><span class="p">)</span>
                <span class="k">if</span> <span class="n">keyref</span> <span class="ow">is</span> <span class="bp">None</span><span class="p">:</span>
                    <span class="c1"># Not even the relationship exists anymore
</span>                    <span class="n">relIndex</span><span class="p">.</span><span class="nf">_remove</span><span class="p">(</span><span class="n">relid</span><span class="p">,</span> <span class="p">(</span><span class="n">iid</span><span class="p">,),</span> <span class="n">direction</span><span class="p">)</span>
                <span class="k">else</span><span class="p">:</span>
                    <span class="n">relation</span> <span class="o">=</span> <span class="n">keyref</span><span class="p">.</span><span class="nb">object</span>
                    <span class="k">try</span><span class="p">:</span>
                        <span class="n">relation</span><span class="p">.</span><span class="n">__parent__</span><span class="p">.</span><span class="nf">remove</span><span class="p">(</span><span class="n">relation</span><span class="p">)</span>
                    <span class="k">except</span> <span class="nb">AttributeError</span><span class="p">:</span>
                        <span class="c1"># The relation object only exists in the intid utility;
</span>                        <span class="c1"># in this case __parent__ is None.
</span>                        <span class="n">relIndex</span><span class="p">.</span><span class="nf">unindex</span><span class="p">(</span><span class="n">relation</span><span class="p">)</span>
                        <span class="n">relIndex</span><span class="p">.</span><span class="nf">unindex_doc</span><span class="p">(</span><span class="n">relid</span><span class="p">)</span> <span class="c1"># be doubly sure
</span>                        <span class="n">intids</span><span class="p">.</span><span class="nf">unregister</span><span class="p">(</span><span class="n">keyref</span><span class="p">)</span>
                <span class="n">rtotal</span> <span class="o">+=</span> <span class="mi">1</span>
    <span class="k">return</span> <span class="n">itotal</span><span class="p">,</span> <span class="n">rtotal</span>
</code></pre></div></div>

<p>Note that this function returns a tuple: the total number of missing intids found, and the total number of relationships removed.</p>
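
<p>Stripped of the Zope machinery, the sweep is a set difference followed by cleanup. The plain-Python sketch below uses hypothetical names and sample data: dicts stand in for the BTrees, and simply dropping an index entry stands in for the unindexing and unregistering done above. It returns the same pair of counts as <code class="language-plaintext highlighter-rouge">clearAllMissingLinks</code>.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def clear_missing(index, registered):
    """Drop index entries whose intid is no longer registered.

    Returns (intids found, relationships removed), like clearAllMissingLinks.
    """
    itotal = rtotal = 0
    # The set difference plays the role of BTrees.IOBTree.difference above
    for iid in sorted(set(index) - set(registered)):
        itotal += 1
        rtotal += len(index.pop(iid))  # relationship ids filed under the missing intid
    return itotal, rtotal


# Hypothetical sample data: the index maps an intid to the ids of relationships using it
target_index = {1: {100}, 2: {101, 102}, 3: {103}}
registered = {1: 'doc', 3: 'folder', 100: 'rel', 101: 'rel', 102: 'rel', 103: 'rel'}

print(clear_missing(target_index, registered))  # (1, 2)
print(target_index)                             # {1: {100}, 3: {103}}
</code></pre></div></div>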

<h2 id="patch-the-leak">Patch the leak?</h2>

<p>Instead of pumping out the water, we should of course patch the leak. We have yet to find it, though; if we do, we’ll make sure the affected packages receive the patch!</p>

<h3 id="april-2012-update-clean-up-methods-fine-tuned"><em>April 2012 Update</em>: clean-up methods fine-tuned.</h3>

<p>I’ve found that, in practice, some relationships were still referenced by intid keyrefs and present in the relationships index, but were no longer present in the relationship utility itself. These have to be unindexed and removed manually; the code examples above have been updated to reflect this.</p>

<p><em>This article was originally published on <a href="http://web.archive.org/web/20111231230726/http://www.jarn.com/">jarn.com</a>.</em></p>]]></content><author><name>Martijn Pieters</name></author><category term="plone" /><category term="plone.relations" /><category term="IntIds" /><category term="KeyError" /><summary type="html"><![CDATA[When IntIds go missing, the going gets tough. Specifically, plone.app.relations and related packages do not deal gracefully when a relationship source or target is missing. Here is how we clear such broken relationships.]]></summary></entry><entry><title type="html">Saving the day: recovering lost objects</title><link href="https://www.zopatista.com/plone/2008/12/18/saving-the-day-recovering-lost-objects/" rel="alternate" type="text/html" title="Saving the day: recovering lost objects" /><published>2008-12-18T00:00:00+00:00</published><updated>2008-12-18T00:00:00+00:00</updated><id>https://www.zopatista.com/plone/2008/12/18/saving-the-day-recovering-lost-objects</id><content type="html" xml:base="https://www.zopatista.com/plone/2008/12/18/saving-the-day-recovering-lost-objects/"><![CDATA[<p><em>When a customer discovers over a week later that an important object was accidentally deleted, what do you do?</em></p>

<h2 id="oh-noes">Oh noes!</h2>

<p>A customer discovered that an entire, important section of his site was missing and asked us to bring it back. This was in a heavily edited site, with loads of writes each day, but we quickly located the offending transaction: someone had deleted the object in question 9 days earlier.</p>

<p>Undo was no longer an option, though: too many things had changed, not least the catalog. Truncating the Data.fs (removing the offending transaction and every transaction committed after it) was not only undesirable but also impossible, as the site stores its data in Oracle through <a href="http://web.archive.org/web/20081224104516/http://wiki.zope.org/ZODB/RelStorage">RelStorage</a>.</p>

<h2 id="time-travel">Time travel</h2>

<p>So, instead of permanently removing transactions, we used a handy little package to do some time traveling: <a href="http://pypi.python.org/pypi/zc.beforestorage">zc.beforestorage</a>.</p>
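
<p>Conceptually, a “before” storage is a read-only view of the database as it stood at a given moment: every object is loaded as of the last transaction committed before the cut-off. The toy model below illustrates only that idea in plain Python; <code class="language-plaintext highlighter-rouge">ToyHistory</code> and its methods are hypothetical and say nothing about how zc.beforestorage is actually implemented on top of a real ZODB storage.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>class ToyHistory:
    """Toy model of a storage that keeps every revision of every object."""

    def __init__(self):
        self.revisions = {}  # maps oid to a list of (transaction time, data), oldest first

    def store(self, oid, when, data):
        # data=None marks the object as deleted in that transaction
        self.revisions.setdefault(oid, []).append((when, data))

    def load_before(self, oid, before):
        """Return the object's state as of the last transaction strictly before `before`."""
        earlier = [data for when, data in self.revisions.get(oid, []) if before > when]
        if not earlier or earlier[-1] is None:
            raise KeyError(oid)  # did not exist yet, or was already deleted, at that time
        return earlier[-1]


history = ToyHistory()
history.store('section', 1, 'first version')
history.store('section', 3, 'edited version')
history.store('section', 5, None)  # the accidental delete

print(history.load_before('section', 5))  # edited version
</code></pre></div></div>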

<p><code class="language-plaintext highlighter-rouge">zc.beforestorage</code> does require a ZODB version 3.8 or 3.9; the customer installation is on Plone 3.0, so a newer ZODB3 egg was necessary for this operation. A small additional buildout configuration file (saved as beforestorage.cfg) helps out:</p>

<div class="language-ini highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nn">[buildout]</span><span class="w">
</span><span class="py">extends</span><span class="w"> </span><span class="p">=</span>
<span class="w">    </span><span class="na">buildout.cfg</span><span class="w">
</span><span class="na">eggs</span><span class="w"> </span><span class="na">+=</span><span class="w">
    </span><span class="na">zc.beforestorage</span><span class="w">
    </span><span class="na">ZODB3</span><span class="w">
    </span><span class="na">zope.proxy</span><span class="w">

</span><span class="nn">[versions]</span><span class="w">
</span><span class="py">ZODB3</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="s">3.8.1</span>
<span class="py">zope.proxy</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="s">3.4.2</span>
<span class="w">
</span><span class="nn">[relstorage-patch]</span><span class="w">
</span><span class="py">recipe</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="s">plone.recipe.command</span>
<span class="py">command</span><span class="w"> </span><span class="p">=</span><span class="w"> 
    </span><span class="s">cd ${buildout:eggs-directory}/ZODB3-3.8.1-py2.4-linux-i686.egg/ZODB</span>
<span class="w">    </span><span class="na">curl</span><span class="w"> </span><span class="na">-s</span><span class="w"> </span><span class="na">http://svn.zope.de/zope.org/relstorage/tags/1.1c1/poll-invalidation-1-zodb-3-8-0.patch</span><span class="w"> </span><span class="na">|</span><span class="w"> </span><span class="na">patch</span><span class="w"> </span><span class="na">-N</span><span class="w"> </span><span class="na">-p0</span><span class="w">
    </span><span class="na">cd</span><span class="w"> </span><span class="na">${buildout:directory}</span><span class="w">
</span><span class="py">update-command</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="s">${relstorage-patch:command}</span>
<span class="w">
</span><span class="nn">[instance]</span><span class="w">
</span><span class="na">zope-conf-additional</span><span class="w"> </span><span class="na">+=</span><span class="w">
    </span><span class="na">enable-product-installation</span><span class="w"> </span><span class="na">False</span><span class="w">
</span></code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">relstorage-patch</code> section in the above code ensures that our ZODB3 egg is patched with the RelStorage additions, and the zope.proxy egg is needed because ZODB 3.8 requires a newer version. The <code class="language-plaintext highlighter-rouge">enable-product-installation</code> line is required because <code class="language-plaintext highlighter-rouge">zc.beforestorage</code> puts your ZODB in read-only mode (understandibly); the option tells Zope not to try and write product information to the ZODB.</p>

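<p>Once buildout has been run with this configuration (using the <code class="language-plaintext highlighter-rouge">-c</code> switch, as in <code class="language-plaintext highlighter-rouge">bin/buildout -c beforestorage.cfg</code>), you’ll still need to edit the <code class="language-plaintext highlighter-rouge">zope.conf</code> file for your instance, usually found at <code class="language-plaintext highlighter-rouge">parts/instance/etc/zope.conf</code>. In its <code class="language-plaintext highlighter-rouge">&lt;zodb_db main&gt;</code> section, wrap the storage in a <code class="language-plaintext highlighter-rouge">&lt;before&gt;</code> section. Ours looked something like this:</p>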

<div class="language-xml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nt">&lt;zodb_db</span> <span class="err">main</span><span class="nt">&gt;</span>
    # Main database
    cache-size 650000
%import zc.beforestorage
%import relstorage
    <span class="nt">&lt;before&gt;</span>
    before 2008-12-08T10:29:03
    <span class="nt">&lt;relstorage&gt;</span>
        <span class="nt">&lt;oracle&gt;</span>
            dsn RELSTORAGE_DSN
            password xxxxxxxxx
            user xxxxxxxx
        <span class="nt">&lt;/oracle&gt;</span>
    <span class="nt">&lt;/relstorage&gt;</span>
    <span class="nt">&lt;/before&gt;</span>
    mount-point /
<span class="nt">&lt;/zodb_db&gt;</span>
</code></pre></div></div>

<p>Every line containing the word ‘before’ is new. We took the timestamp from the undo log and simply converted it to UTC. Now, when you start the instance, you are in the past. You can’t alter this past (no killing of grandfathers), but you <em>can</em> read it. And lo and behold, the deleted object is back.</p>
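
<p>The conversion to UTC can be scripted. Below is a small Python 3 sketch with a hypothetical helper name; it turns a local undo-log time into the format used for the <code class="language-plaintext highlighter-rouge">before</code> option above. The UTC+1 offset and the timestamp in the example are assumptions for illustration only, so substitute the offset your own undo log uses.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>from datetime import datetime, timedelta, timezone


def before_timestamp(local_time, utc_offset):
    """Convert a naive local time from the undo log to a UTC 'before' value."""
    aware = local_time.replace(tzinfo=timezone(utc_offset))
    return aware.astimezone(timezone.utc).strftime('%Y-%m-%dT%H:%M:%S')


# Hypothetical undo-log entry, read in a UTC+1 time zone
print(before_timestamp(datetime(2008, 12, 1, 9, 15, 0), timedelta(hours=1)))
# 2008-12-01T08:15:00
</code></pre></div></div>

<p>The output format matches the <code class="language-plaintext highlighter-rouge">before 2008-12-08T10:29:03</code> line in the configuration above.</p>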

<h2 id="recovery">Recovery</h2>

<p>Now that we have found the lost object, we can recover it. We simply exported it: in the ZMI, use the Import/Export button and save the export on the server. Remove the zc.beforestorage configuration (just run buildout with your regular buildout file), restart, import the <code class="language-plaintext highlighter-rouge">.zexp</code> file, and you’re done!</p>

<p>Note that you’ll need to reindex the imported content, and that any related data living outside the object itself is gone. For example, its intids are gone, and any relationships to it will have to be recreated. But you just saved your customer’s bacon; I’m sure they won’t mind a little manual work!</p>

<p><em>This article was originally published on <a href="http://web.archive.org/web/20111231230726/http://www.jarn.com/">jarn.com</a>.</em></p>]]></content><author><name>Martijn Pieters</name></author><category term="plone" /><category term="recovery" /><category term="beforestorage" /><summary type="html"><![CDATA[When a customer discovers over a week later that an important object was accidentally deleted, what do you do?]]></summary></entry><entry><title type="html">One cookie please, but hold the pickles</title><link href="https://www.zopatista.com/plone/2007/11/09/one-cookie-please/" rel="alternate" type="text/html" title="One cookie please, but hold the pickles" /><published>2007-11-09T11:00:00+00:00</published><updated>2007-11-09T11:00:00+00:00</updated><id>https://www.zopatista.com/plone/2007/11/09/one-cookie-please</id><content type="html" xml:base="https://www.zopatista.com/plone/2007/11/09/one-cookie-please/"><![CDATA[<p><em>The python pickle module is dangerous, didn’t you know?</em></p>

<h2 id="all-your-base-are-belong-to-us">All your base are belong to us</h2>

<p>By now you all should have installed <a href="http://web.archive.org/web/20080629190539/http://plone.org/products/plone-hotfix/releases/20071106">last Tuesday’s Hotfix</a>. If you haven’t yet, but are running Plone 2.5 or Plone 3.0 websites, you should do so <strong>yesterday</strong>, or at least as soon as humanly possible.</p>

<p>The Hotfix patches a serious security problem in the statusmessages and linkintegrity modules, where network-supplied data was interpreted as <a href="http://docs.python.org/lib/module-pickle.html">pickles</a>. “Network-supplied” data in this case means both cookies and form data, and no authentication is required to exploit the holes.</p>

<h2 id="what-happen-">What happen ?</h2>

<p>The basic problem with the holes is that the Plone community was totally unaware of how dangerous the pickle module really is. Hanno Schlichting did file a <em>report</em><sup id="fnref:1"><a href="#fn:1" class="footnote" rel="footnote" role="doc-noteref">1</a></sup> a few months ago stating that the code was potentially dangerous, but even he didn’t fully appreciate that pickles are a security hole only waiting for attacker input. The scary thing here is that the code in question was written by extremely capable and experienced developers, but none of them were aware of the fact that you cannot ever use pickles to load user-supplied data.</p>

<p>What is needed then, is education. This is my contribution.</p>

<h2 id="you-are-on-the-way-to-destruction">You are on the way to destruction</h2>

<p>So what is wrong with pickles? They are just a damn handy way to serialize arbitrary data into binary strings and back again, right?</p>

<p>Yes, they are that, but the pickle format is also a <a href="http://peadrop.com/blog/2007/06/18/pickle-an-interesting-stack-language/">simple stack language</a> that can create arbitrary Python structures and call them. This stack language lets you import any attribute of any module (the ‘c’ opcode) and apply arguments to callables (the ‘R’ opcode), thus causing code to be run. Combine this with the Python built-in functions <code class="language-plaintext highlighter-rouge">eval</code> and <code class="language-plaintext highlighter-rouge">compile</code> and you have the perfect vehicle for an attacker: a well-crafted pickle makes the pickle loader execute arbitrary Python code. Just imagine what an attacker could do with that to your Zope server. Do you think you’ll ever be sure you got all the backdoors out of your Data.fs?</p>
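
<p>Here is a harmless demonstration in Python 3, where the built-ins module is called <code class="language-plaintext highlighter-rouge">builtins</code> rather than Python 2’s <code class="language-plaintext highlighter-rouge">__builtin__</code>. The byte string below is a hand-written protocol 0 pickle that uses exactly the ‘c’ and ‘R’ opcodes described above to look up <code class="language-plaintext highlighter-rouge">eval</code> and call it. Here it only computes 6 * 7; an attacker would pass something far nastier.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import pickle

# A hand-written protocol 0 pickle, one opcode per step:
#   cbuiltins\neval\n   GLOBAL ('c'): import builtins, push its eval function
#   (                   MARK
#   S'6 * 7'\n          STRING: push the argument
#   t                   TUPLE: collect everything since the mark into a tuple
#   R                   REDUCE ('R'): call eval('6 * 7')
#   .                   STOP: return the top of the stack
CRAFTED = b"cbuiltins\neval\n(S'6 * 7'\ntR."

print(pickle.loads(CRAFTED))  # 42: merely loading the "data" ran eval()
</code></pre></div></div>

<p>Nothing in that string looks like code at a glance, which is exactly why cookie and form data must never reach <code class="language-plaintext highlighter-rouge">pickle.loads()</code>.</p>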

<h2 id="we-get-signal">We get signal</h2>

<p>So next time you need to preserve data across HTTP requests, please do not be tempted to use the pickle module to create strings for you. Rarely will you have anything more than a handful of simple datatypes to pass along anyway, so just invent a simple data format and use that instead. (No, using a subclass of the Python implementation of pickle is not a simpler solution.)</p>

<p>With statusmessages, for example, each message consists of a message and a type string, both unicode. So we switched to a hand-rolled format: a 2-byte length header (11 bits for the message length, 5 for the type length) directly followed by the message and type strings, encoded to UTF-8. When reading this back from a cookie later, the decoder simply reads the lengths from the first 2 bytes, then reads the right number of characters to get the message and type back. A similar method was used to encode the linkintegrity data. Simple, effective, and impervious to attacks.</p>
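
<p>A format along these lines can be sketched in Python 3 as follows. This is not the statusmessages code: the function names are hypothetical, and the details the description leaves open are assumptions, namely that the message length sits in the high 11 bits, that the header is big-endian, and that both lengths count UTF-8 bytes. Returning any remaining bytes lets several messages be concatenated, which is a design choice of this sketch.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import struct


def encode_message(message, msg_type):
    """Pack one status message: 2-byte header, then the UTF-8 message and type."""
    msg = message.encode('utf-8')
    kind = msg_type.encode('utf-8')
    if len(msg) > 0x7FF or len(kind) > 0x1F:
        raise ValueError('message or type too long for an 11/5-bit header')
    header = len(msg) * 32 + len(kind)  # message length in the high 11 bits, type in the low 5
    return struct.pack('!H', header) + msg + kind


def decode_message(data):
    """Unpack one message; return (message, type, remaining bytes)."""
    (header,) = struct.unpack('!H', data[:2])
    msg_len, kind_len = divmod(header, 32)  # split the 11-bit and 5-bit fields
    end = 2 + msg_len + kind_len
    if end > len(data):
        raise ValueError('truncated message')
    message = data[2:2 + msg_len].decode('utf-8')
    msg_type = data[2 + msg_len:end].decode('utf-8')
    return message, msg_type, data[end:]


cookie = encode_message('Changes saved', 'info') + encode_message('Caf\u00e9 not found', 'error')
while cookie:
    message, msg_type, cookie = decode_message(cookie)
    print(msg_type, message)
</code></pre></div></div>

<p>The raw bytes would still need a cookie-safe transport encoding, such as base64, before being sent to the browser; the point is that decoding them can only ever produce strings, never run code.</p>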

<blockquote cite="http://en.wikipedia.org/wiki/All_your_base_are_belong_to_us">
  <p>Congratulation!!<br />
A.D.2111<br />
All bases of CATS were destroyed.<br />
It seems to be peaceful.<br />
But it is incorrect. CATS is still alive.<br />
ZIG-01 must fight against CATS again.<br />
And down with them completely !<br />
Good luck.<br />
<cite><a href="http://en.wikipedia.org/wiki/All_your_base_are_belong_to_us">Zero Wing, 1989</a></cite></p>
</blockquote>

<p><em>This article was originally published on <a href="http://web.archive.org/web/20111231230726/http://www.jarn.com/">jarn.com</a>.</em></p>

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:1">
      <p>the original link, <code class="language-plaintext highlighter-rouge">http://dev.plone.org/plone/ticket/6943</code>, is unfortunately now lost without trace. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
  </ol>
</div>]]></content><author><name>Martijn Pieters</name></author><category term="plone" /><category term="cookies" /><category term="pickling" /><category term="pickles" /><category term="security" /><summary type="html"><![CDATA[The python pickle module is dangerous, didn't you know?]]></summary></entry></feed>