<h1>Finding the size of a directory with Python</h1> <p>joe di castro, 2011-05-16T21:05:00+02:00, http://joedicastro.com/conocer-el-tamano-de-un-directorio-con-python.html</p> <p>Finding the size of a directory on systems such as Linux is trivial: the <code>du</code> command is all you need. Doing the same from <strong>Python</strong>, without calling that command, is not so simple, especially if you want a solution that returns the size of either a file or a directory. When I ran into this need, the first thing I did was search the Internet for an existing solution (reinventing the wheel is not always the best idea), and this is what I found:</p> <div class="codehilite"><pre><span class="kn">import</span> <span class="nn">os</span>

<span class="k">def</span> <span class="nf">get_dir_size</span><span class="p">(</span><span class="n">the_path</span><span class="p">):</span>
    <span class="sd">&quot;&quot;&quot;Get size of a directory tree in bytes.&quot;&quot;&quot;</span>
    <span class="n">path_size</span> <span class="o">=</span> <span class="mi">0</span>
    <span class="k">for</span> <span class="n">path</span><span class="p">,</span> <span class="n">dirs</span><span class="p">,</span> <span class="n">files</span> <span class="ow">in</span> <span class="n">os</span><span class="o">.</span><span class="n">walk</span><span class="p">(</span><span class="n">the_path</span><span class="p">):</span>
        <span class="k">for</span> <span class="n">fil</span> <span class="ow">in</span> <span class="n">files</span><span class="p">:</span>
            <span class="n">filename</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">path</span><span class="p">,</span> <span class="n">fil</span><span class="p">)</span>
            <span class="n">path_size</span> <span class="o">+=</span> <span 
class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">getsize</span><span class="p">(</span><span class="n">filename</span><span class="p">)</span>
    <span class="k">return</span> <span class="n">path_size</span>
</pre></div> <p>This returns the size of a directory in bytes. But this solution, which I found in several places, has two problems:</p> <ul> <li> <p><strong>It does not give an exact size</strong>. It adds up only the files and leaves out the space taken by the directories themselves, which <code>du</code> does count. (Hidden files, those whose names start with a <code>.</code> on Linux, are included, because <code>os.walk</code> lists them like any other file.) It also does not handle symbolic links properly: <code>os.path.getsize</code> follows them and reports the size of the target rather than of the link. For these reasons the output of this function does not match the space reported by the <abbr title="Linux, Unix, Solaris, BSD, etc">UN*X</abbr> command <code>du -bs</code>.</p> </li> <li> <p><strong>It does not work for a single file</strong>. It only works when run on a directory; run on a single file, it always returns 0.</p> </li> </ul> <p>With this as a starting point, I wrote a function that solves both problems and returns the exact size of a directory or file. 
This is the <strong>function that gives the correct result</strong>:</p> <div class="codehilite"><pre><span class="k">def</span> <span class="nf">get_size</span><span class="p">(</span><span class="n">the_path</span><span class="p">):</span>
    <span class="sd">&quot;&quot;&quot;Get size of a directory tree or a file in bytes.&quot;&quot;&quot;</span>
    <span class="n">path_size</span> <span class="o">=</span> <span class="mi">0</span>
    <span class="k">for</span> <span class="n">path</span><span class="p">,</span> <span class="n">directories</span><span class="p">,</span> <span class="n">files</span> <span class="ow">in</span> <span class="n">os</span><span class="o">.</span><span class="n">walk</span><span class="p">(</span><span class="n">the_path</span><span class="p">):</span>
        <span class="k">for</span> <span class="n">filename</span> <span class="ow">in</span> <span class="n">files</span><span class="p">:</span>
            <span class="n">path_size</span> <span class="o">+=</span> <span class="n">os</span><span class="o">.</span><span class="n">lstat</span><span class="p">(</span><span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">path</span><span class="p">,</span> <span class="n">filename</span><span class="p">))</span><span class="o">.</span><span class="n">st_size</span>
        <span class="k">for</span> <span class="n">directory</span> <span class="ow">in</span> <span class="n">directories</span><span class="p">:</span>
            <span class="n">path_size</span> <span class="o">+=</span> <span class="n">os</span><span class="o">.</span><span class="n">lstat</span><span class="p">(</span><span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">path</span><span class="p">,</span> <span class="n">directory</span><span class="p">))</span><span class="o">.</span><span 
class="n">st_size</span>
    <span class="n">path_size</span> <span class="o">+=</span> <span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">getsize</span><span class="p">(</span><span class="n">the_path</span><span class="p">)</span>
    <span class="k">return</span> <span class="n">path_size</span>
</pre></div> <p>The result of this function is the same as the one returned by the Linux command <code>du -bs</code>. It also accounts for symbolic links inside the tree without following them: <code>os.lstat</code> measures the link itself. Later, looking for a <strong>slightly faster solution</strong> (though less elegant and less <em>pythonic</em>) that still gave accurate results, I wrote a variant based on generators. </p> <div class="codehilite"><pre><span class="k">def</span> <span class="nf">get_size_fast</span><span class="p">(</span><span class="n">the_path</span><span class="p">):</span>
    <span class="sd">&quot;&quot;&quot;Get size of a directory tree or a file in bytes.&quot;&quot;&quot;</span>
    <span class="k">def</span> <span class="nf">get_sizes</span><span class="p">(</span><span class="n">the_path</span><span class="p">):</span>
        <span class="sd">&quot;&quot;&quot;Make a generator of individual file &amp; directory sizes.&quot;&quot;&quot;</span>
        <span class="k">if</span> <span class="ow">not</span> <span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">islink</span><span class="p">(</span><span class="n">the_path</span><span class="p">):</span>
            <span class="k">if</span> <span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">isdir</span><span class="p">(</span><span class="n">the_path</span><span class="p">):</span>
                <span class="k">for</span> <span class="n">file_or_dir</span> <span class="ow">in</span> <span class="n">os</span><span class="o">.</span><span class="n">listdir</span><span class="p">(</span><span 
class="n">the_path</span><span class="p">):</span>
                    <span class="n">path</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">the_path</span><span class="p">,</span> <span class="n">file_or_dir</span><span class="p">)</span>
                    <span class="k">if</span> <span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">isfile</span><span class="p">(</span><span class="n">path</span><span class="p">):</span>
                        <span class="k">yield</span> <span class="n">os</span><span class="o">.</span><span class="n">lstat</span><span class="p">(</span><span class="n">path</span><span class="p">)</span><span class="o">.</span><span class="n">st_size</span>
                    <span class="k">else</span><span class="p">:</span>
                        <span class="k">for</span> <span class="n">size</span> <span class="ow">in</span> <span class="n">get_sizes</span><span class="p">(</span><span class="n">path</span><span class="p">):</span>
                            <span class="k">yield</span> <span class="n">size</span>
            <span class="k">yield</span> <span class="n">os</span><span class="o">.</span><span class="n">lstat</span><span class="p">(</span><span class="n">the_path</span><span class="p">)</span><span class="o">.</span><span class="n">st_size</span>
        <span class="k">else</span><span class="p">:</span>
            <span class="k">yield</span> <span class="n">os</span><span class="o">.</span><span class="n">lstat</span><span class="p">(</span><span class="n">the_path</span><span class="p">)</span><span class="o">.</span><span class="n">st_size</span>
    <span class="k">return</span> <span class="nb">sum</span><span class="p">(</span><span class="n">get_sizes</span><span class="p">(</span><span class="n">the_path</span><span class="p">))</span>
</pre></div> <h2 id="obtener_el_tama+o_del_directorio_en_la_mejor_unidad_posible">Getting the directory size in the best possible unit</h2> <p>These functions give the result we want, but in a unit that is hard to read: bytes. What if we want to see it in <a href="http://es.wikipedia.org/wiki/Prefijo_binario">mebibytes, gibibytes</a>, ... and always in whichever unit displays best? To answer that, I wrote a function that does precisely this: it takes a size in bytes and returns the value in the most suitable <a href="http://physics.nist.gov/cuu/Units/binary.html">IEC binary unit</a>, that is, the largest unit that keeps the value below 1024:</p> <div class="codehilite"><pre><span class="k">def</span> <span class="nf">best_unit_size</span><span class="p">(</span><span class="n">bytes_size</span><span class="p">):</span>
    <span class="sd">&quot;&quot;&quot;Get a size in bytes &amp; convert it to the best IEC prefix for readability.</span>
<span class="sd">    Return a dictionary with three pairs of keys/values:</span>
<span class="sd">    &quot;s&quot; -- (float) Size of path converted to the best unit for easy read</span>
<span class="sd">    &quot;u&quot; -- (str) The prefix (IEC) for s (from bytes(2^0) to YiB(2^80))</span>
<span class="sd">    &quot;b&quot; -- (int / long) The original size in bytes</span>
<span class="sd">    &quot;&quot;&quot;</span>
    <span class="k">for</span> <span class="n">exp</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">90</span><span class="p">,</span> <span class="mi">10</span><span class="p">):</span>
        <span class="n">bu_size</span> <span class="o">=</span> <span class="nb">abs</span><span class="p">(</span><span class="n">bytes_size</span><span class="p">)</span> <span class="o">/</span> <span class="nb">pow</span><span class="p">(</span><span class="mf">2.0</span><span class="p">,</span> <span class="n">exp</span><span class="p">)</span>
        <span class="k">if</span> <span class="nb">int</span><span class="p">(</span><span 
class="n">bu_size</span><span class="p">)</span> <span class="o">&lt;</span> <span class="mi">2</span> <span class="o">**</span> <span class="mi">10</span><span class="p">:</span>
            <span class="n">unit</span> <span class="o">=</span> <span class="p">{</span><span class="mi">0</span><span class="p">:</span><span class="s">&quot;bytes&quot;</span><span class="p">,</span> <span class="mi">10</span><span class="p">:</span><span class="s">&quot;KiB&quot;</span><span class="p">,</span> <span class="mi">20</span><span class="p">:</span><span class="s">&quot;MiB&quot;</span><span class="p">,</span> <span class="mi">30</span><span class="p">:</span><span class="s">&quot;GiB&quot;</span><span class="p">,</span> <span class="mi">40</span><span class="p">:</span><span class="s">&quot;TiB&quot;</span><span class="p">,</span> <span class="mi">50</span><span class="p">:</span><span class="s">&quot;PiB&quot;</span><span class="p">,</span> <span class="mi">60</span><span class="p">:</span><span class="s">&quot;EiB&quot;</span><span class="p">,</span> <span class="mi">70</span><span class="p">:</span><span class="s">&quot;ZiB&quot;</span><span class="p">,</span> <span class="mi">80</span><span class="p">:</span><span class="s">&quot;YiB&quot;</span><span class="p">}[</span><span class="n">exp</span><span class="p">]</span>
            <span class="k">break</span>
    <span class="k">return</span> <span class="p">{</span><span class="s">&quot;s&quot;</span><span class="p">:</span><span class="n">bu_size</span><span class="p">,</span> <span class="s">&quot;u&quot;</span><span class="p">:</span><span class="n">unit</span><span class="p">,</span> <span class="s">&quot;b&quot;</span><span class="p">:</span><span class="n">bytes_size</span><span class="p">}</span>
</pre></div> <p>This function returns a dictionary with three keys:</p> <ul> <li><code>'s'</code>: the size converted to the most readable IEC unit.</li> <li><code>'u'</code>: the IEC prefix for that size.</li> <li><code>'b'</code>: the original size in bytes.</li> </ul> <p>The best way to understand it is with a few examples:</p> <div class="codehilite"><pre><span class="gp">&gt;&gt;&gt; </span><span class="kn">import</span> <span class="nn">get_size</span>
<span class="gp">&gt;&gt;&gt; </span><span class="n">size</span> <span class="o">=</span> <span class="n">get_size</span><span class="o">.</span><span class="n">best_unit_size</span><span class="p">(</span><span class="mi">38467206502</span><span class="p">)</span>
<span class="gp">&gt;&gt;&gt; </span><span class="s">&quot;{0:.2f} {1}&quot;</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">size</span><span class="p">[</span><span class="s">&#39;s&#39;</span><span class="p">],</span> <span class="n">size</span><span class="p">[</span><span class="s">&#39;u&#39;</span><span class="p">])</span>
<span class="go">&#39;35.83 GiB&#39;</span>
<span class="gp">&gt;&gt;&gt; </span><span class="n">size</span> <span class="o">=</span> <span class="n">get_size</span><span class="o">.</span><span class="n">best_unit_size</span><span class="p">(</span><span class="mi">45332</span><span class="p">)</span>
<span class="gp">&gt;&gt;&gt; </span><span class="s">&quot;{0:.2f} {1}&quot;</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">size</span><span class="p">[</span><span class="s">&#39;s&#39;</span><span class="p">],</span> <span class="n">size</span><span class="p">[</span><span class="s">&#39;u&#39;</span><span class="p">])</span>
<span class="go">&#39;44.27 KiB&#39;</span>
<span class="gp">&gt;&gt;&gt; </span><span class="n">size</span> <span class="o">=</span> <span class="n">get_size</span><span class="o">.</span><span class="n">best_unit_size</span><span class="p">(</span><span class="mi">9878323</span><span class="p">)</span>
<span class="gp">&gt;&gt;&gt; </span><span class="s">&quot;{0:.2f} {1} is equal to {2} bytes&quot;</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">size</span><span class="p">[</span><span class="s">&#39;s&#39;</span><span class="p">],</span> <span class="n">size</span><span class="p">[</span><span class="s">&#39;u&#39;</span><span class="p">],</span> <span class="n">size</span><span class="p">[</span><span class="s">&#39;b&#39;</span><span class="p">])</span>
<span class="go">&#39;9.42 MiB is equal to 9878323 bytes&#39;</span>
</pre></div> <p>And of course, combining the two functions into one saves us from having to call both on the same directory or file. </p> <div class="codehilite"><pre><span class="k">def</span> <span class="nf">get_unit_size</span><span class="p">(</span><span class="n">the_path</span><span class="p">):</span>
    <span class="sd">&quot;&quot;&quot;Calculate size of a directory/file &amp; convert it for the best IEC prefix.</span>
<span class="sd">    Return a dictionary with three pairs of keys/values:</span>
<span class="sd">    &quot;s&quot; -- (float) Size of path converted to the best unit for easy read</span>
<span class="sd">    &quot;u&quot; -- (str) The prefix (IEC) for s (from bytes(2^0) to YiB(2^80))</span>
<span class="sd">    &quot;b&quot; -- (int / long) The original size in bytes</span>
<span class="sd">    &quot;&quot;&quot;</span>
    <span class="n">bytes_size</span> <span class="o">=</span> <span class="mi">0</span>
    <span class="k">for</span> <span class="n">path</span><span class="p">,</span> <span class="n">directories</span><span class="p">,</span> <span class="n">files</span> <span class="ow">in</span> <span class="n">os</span><span class="o">.</span><span class="n">walk</span><span class="p">(</span><span class="n">the_path</span><span class="p">):</span>
        <span class="k">for</span> <span class="n">filename</span> <span class="ow">in</span> <span class="n">files</span><span class="p">:</span>
            <span class="n">bytes_size</span> <span class="o">+=</span> <span 
class="n">os</span><span class="o">.</span><span class="n">lstat</span><span class="p">(</span><span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">path</span><span class="p">,</span> <span class="n">filename</span><span class="p">))</span><span class="o">.</span><span class="n">st_size</span>
        <span class="k">for</span> <span class="n">directory</span> <span class="ow">in</span> <span class="n">directories</span><span class="p">:</span>
            <span class="n">bytes_size</span> <span class="o">+=</span> <span class="n">os</span><span class="o">.</span><span class="n">lstat</span><span class="p">(</span><span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">path</span><span class="p">,</span> <span class="n">directory</span><span class="p">))</span><span class="o">.</span><span class="n">st_size</span>
    <span class="n">bytes_size</span> <span class="o">+=</span> <span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">getsize</span><span class="p">(</span><span class="n">the_path</span><span class="p">)</span>
    <span class="k">for</span> <span class="n">exp</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">90</span><span class="p">,</span> <span class="mi">10</span><span class="p">):</span>
        <span class="n">bu_size</span> <span class="o">=</span> <span class="nb">abs</span><span class="p">(</span><span class="n">bytes_size</span><span class="p">)</span> <span class="o">/</span> <span class="nb">pow</span><span class="p">(</span><span class="mf">2.0</span><span class="p">,</span> <span class="n">exp</span><span class="p">)</span>
        <span class="k">if</span> <span class="nb">int</span><span 
class="p">(</span><span class="n">bu_size</span><span class="p">)</span> <span class="o">&lt;</span> <span class="mi">2</span> <span class="o">**</span> <span class="mi">10</span><span class="p">:</span>
            <span class="n">unit</span> <span class="o">=</span> <span class="p">{</span><span class="mi">0</span><span class="p">:</span><span class="s">&quot;bytes&quot;</span><span class="p">,</span> <span class="mi">10</span><span class="p">:</span><span class="s">&quot;KiB&quot;</span><span class="p">,</span> <span class="mi">20</span><span class="p">:</span><span class="s">&quot;MiB&quot;</span><span class="p">,</span> <span class="mi">30</span><span class="p">:</span><span class="s">&quot;GiB&quot;</span><span class="p">,</span> <span class="mi">40</span><span class="p">:</span><span class="s">&quot;TiB&quot;</span><span class="p">,</span> <span class="mi">50</span><span class="p">:</span><span class="s">&quot;PiB&quot;</span><span class="p">,</span> <span class="mi">60</span><span class="p">:</span><span class="s">&quot;EiB&quot;</span><span class="p">,</span> <span class="mi">70</span><span class="p">:</span><span class="s">&quot;ZiB&quot;</span><span class="p">,</span> <span class="mi">80</span><span class="p">:</span><span class="s">&quot;YiB&quot;</span><span class="p">}[</span><span class="n">exp</span><span class="p">]</span>
            <span class="k">break</span>
    <span class="k">return</span> <span class="p">{</span><span class="s">&quot;s&quot;</span><span class="p">:</span><span class="n">bu_size</span><span class="p">,</span> <span class="s">&quot;u&quot;</span><span class="p">:</span><span class="n">unit</span><span class="p">,</span> <span class="s">&quot;b&quot;</span><span class="p">:</span><span class="n">bytes_size</span><span class="p">}</span>
</pre></div> <p>This returns a dictionary like the previous one, so a single function gives us both the size in bytes and the size in the most suitable IEC unit. 
</p> <p>All of these functions, with examples (plus a class that uses them), can be found in the file <code>get_size.py</code> in my <em>Python Recipes</em> repository, hosted on <a href="http://github.com/joedicastro/python-recipes">GitHub</a>. Running the file as a script shows a comparison of the various functions, in speed and accuracy, against the <code>du -bs</code> command.</p>
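<p>For readers on a recent interpreter, the two steps described in this post (adding up sizes without following symbolic links, then picking the most readable IEC unit) could be sketched roughly as follows. This is only an illustration, assuming Python 3.6 or later; the names <code>tree_size</code>, <code>human_size</code> and <code>IEC_UNITS</code> are made up for this sketch and are not part of <code>get_size.py</code>. Unlike <code>get_size</code>, it measures a symbolic link passed as the top-level path instead of following it.</p>

```python
import os

# IEC binary units, from 2^0 up to 2^80 (names chosen for this sketch).
IEC_UNITS = ("bytes", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB", "ZiB", "YiB")


def tree_size(the_path):
    """Return the apparent size in bytes of a file or a directory tree.

    Hypothetical helper: symbolic links are measured with lstat, not followed.
    """
    total = os.lstat(the_path).st_size
    # Descend only into real directories, never into links to directories.
    if os.path.isdir(the_path) and not os.path.islink(the_path):
        with os.scandir(the_path) as entries:
            for entry in entries:
                total += tree_size(entry.path)
    return total


def human_size(bytes_size):
    """Return (value, unit) in the largest IEC unit that keeps value < 1024."""
    value = float(abs(bytes_size))
    for unit in IEC_UNITS[:-1]:
        if value < 1024:
            return value, unit
        value /= 1024
    # Anything that large is reported in the biggest unit available.
    return value, IEC_UNITS[-1]
```

<p>A call such as <code>"{0:.2f} {1}".format(*human_size(tree_size(".")))</code> would then print the size of the current directory. Returning a plain tuple instead of the dictionary used above is just a choice made to keep the sketch short.</p>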