João Victor Risso's Blog - GSoChttps://joaovictortr.me/2017-08-29T00:58:00-03:00GSoC - Final Report2017-08-29T00:58:00-03:002017-08-29T00:58:00-03:00João Victor Rissotag:joaovictortr.me,2017-08-29:/2017/gsoc-final-report.html<p>Final report of my project on the Google Summer of Code</p><h2>Initial Proposal</h2>
<p>In the initial proposal [4], the goal was to implement wrappers for functions that can be executed on the <span class="caps">GPU</span>, so as to accelerate the computation of models in Theano. More specifically, the goal was to implement the following functionality:</p>
<ol>
<li><strong>Wrapper for the warp-ctc library into Theano</strong>, in order to provide fast <span class="caps">CTC</span> computations on both the <span class="caps">CPU</span> and the <span class="caps">GPU</span>. There were two existing wrappers on GitHub; however, they were neither complete nor compatible with Theano’s <code>gpuarray</code> library.</li>
<li><strong>Wrapper for a Symmetric Eigenvalue Solver</strong>, using the cuSolver library, in order to obtain the eigenvalues on the <span class="caps">GPU</span>.</li>
<li><strong>Wrapper for a <span class="caps">QR</span> factorization function</strong>, also using the cuSolver library, which would enable fast eigenvalue computations and factorizations.</li>
<li><strong>Implement Spatial Transformer Network Ops from cuDNN</strong>, which allow neural networks to handle distorted inputs and to learn the transformation parameters that better extract features from images.</li>
</ol>
<h3>Changes to the Proposal</h3>
<p>However, the Theano developers have been working on wrappers for functions of the <a href="http://icl.cs.utk.edu/magma/"><span class="caps">MAGMA</span> library</a>, which implements linear algebra functions in a very efficient manner, with support for multi-core processors and GPUs. The <a href="https://github.com/Theano/Theano/issues/5911">issue</a> that lists which operations have been implemented includes both items 2 and 3, that is, the <span class="caps">QR</span> factorization and the eigenvalue solver for symmetric (Hermitian) matrices.</p>
<p>In order to avoid duplicated work, the mentors and I decided that an additional implementation of the spatial transformer would replace items 2 and 3 of the proposal, since it would also be interesting to have a <span class="caps">CPU</span> implementation of that functionality.</p>
<p>Hence, the project was divided into three parts: implementation of the <span class="caps">CTC</span> wrapper, spatial transformer using cuDNN, and the <span class="caps">CPU</span> spatial transformer.</p>
<h2>Contributions</h2>
<p>In this section, I will describe the contributions made to Theano during the course of the project.</p>
<h3>Connectionist Temporal Classification Loss</h3>
<p>The first part of the project consisted of implementing a wrapper for Theano that makes use of <a href="https://github.com/baidu-research/warp-ctc">warp-ctc</a> [1], a fast implementation of the <span class="caps">CTC</span> loss function by Baidu Research. Their implementation works both on multi-core processors (by using OpenMP threads) and on GPUs, using <span class="caps">CUDA</span> kernels to compute the <span class="caps">CTC</span> function. A more detailed explanation of how warp-ctc works is provided in the paper that accompanied the release [2].</p>
<p>Outputs of a <span class="caps">CTC</span> network are given by a softmax layer, whose results are interpreted as a probability distribution over all possible label sequences, conditioned by a given input sequence. Given that distribution, an objective function was derived to maximize the probabilities of correct labellings. Since the objective function is differentiable, the network can be trained with backpropagation through time.</p>
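<p>To make the objective concrete, here is a minimal sketch of the <span class="caps">CTC</span> forward recursion in plain Python. It is not warp-ctc’s code: the function name <code>ctc_loss</code>, the choice of index 0 for the blank label, and the use of raw probabilities instead of log-probabilities are assumptions made for illustration (production implementations typically work in log space for numerical stability).</p>

```python
import math


def ctc_loss(probs, labels, blank=0):
    """Negative log-likelihood of `labels` under CTC (illustrative sketch).

    probs:  list of T rows; each row holds the softmax probabilities of
            every class at one time step.
    labels: list of class indices, without blanks.
    """
    # Interleave blanks: [a, b] -> [blank, a, blank, b, blank]
    ext = [blank]
    for label in labels:
        ext += [label, blank]
    S, T = len(ext), len(probs)

    # alpha[s]: total probability of all partial alignments ending at ext[s]
    alpha = [0.0] * S
    alpha[0] = probs[0][ext[0]]
    if S > 1:
        alpha[1] = probs[0][ext[1]]

    for t in range(1, T):
        prev = alpha
        alpha = [0.0] * S
        for s in range(S):
            total = prev[s]
            if s >= 1:
                total += prev[s - 1]
            # Skipping over a blank is only allowed between different labels
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                total += prev[s - 2]
            alpha[s] = total * probs[t][ext[s]]

    # Valid alignments end on the last label or on the trailing blank
    likelihood = alpha[-1] + (alpha[-2] if S > 1 else 0.0)
    return -math.log(likelihood)


uniform = [[0.5, 0.5], [0.5, 0.5]]
loss = ctc_loss(uniform, [1])
```

<p>With two time steps, uniform probabilities over {blank, 1} and the label sequence [1], the three alignments 1-1, blank-1 and 1-blank each contribute 0.25, so the likelihood is 0.75 and the loss is -log(0.75).</p>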
<p>Implementations of the <span class="caps">CTC</span> functionality for <span class="caps">CPU</span> and <span class="caps">GPU</span> can be found in Theano’s <code>theano.tensor.nnet.ctc</code> and <code>theano.gpuarray.ctc</code> modules, respectively. Furthermore, optimizations were implemented so that the user can call a single <span class="caps">CPU</span> function and have it ‘lifted’ for execution on the <span class="caps">GPU</span>, depending on their configuration. Finally, wrappers for the <span class="caps">CTC</span> gradients were also implemented for both <span class="caps">CPU</span> and <span class="caps">GPU</span>.</p>
<p>Below is a brief description of each Op, with links to where they are located in Theano’s codebase:</p>
<ul>
<li><a href="https://github.com/Theano/Theano/blob/master/theano/tensor/nnet/ctc.py#L199">theano.tensor.nnet.ctc.ctc</a>: function that sets up a <span class="caps">CTC</span> Op, i.e. it creates the node in the graph that will compute the <span class="caps">CTC</span> loss function.</li>
<li><a href="https://github.com/Theano/Theano/blob/master/theano/tensor/nnet/ctc.py#L88">theano.tensor.nnet.ctc.ConnectionistTemporalClassification</a>: COp class that implements the computation of the <span class="caps">CTC</span> loss function.</li>
<li><a href="https://github.com/Theano/Theano/blob/master/theano/gpuarray/ctc.py#L144">theano.gpuarray.gpu_ctc</a>: function that sets up a <span class="caps">GPU</span> <span class="caps">CTC</span> Op, i.e. it creates the node in the graph that will compute the <span class="caps">CTC</span> loss function on the <span class="caps">GPU</span>.</li>
<li><a href="https://github.com/Theano/Theano/blob/master/theano/gpuarray/ctc.py#L20">theano.gpuarray.GpuConnectionistTemporalClassification</a>: COp class that implements the computation of the <span class="caps">CTC</span> loss function on the <span class="caps">GPU</span>.</li>
</ul>
<p>In the COp classes, one can find the paths to the C wrappers, which make the interface between Theano and the warp-ctc library.</p>
<h3>Spatial Transformer</h3>
<p>In the second and third parts of the project, I have worked on implementing a spatial transformer, at first only on the <span class="caps">GPU</span>, and then on the <span class="caps">CPU</span>. A Spatial Transformer is a component of a neural network that can provide spatial manipulation of data within the network. Spatial manipulation can improve models by introducing invariance to affine transformations, such as translation, scaling and rotation. This kind of invariance improves classification performance, since the networks become able to recognize samples that have distortions or are rotated, for example.</p>
<p><img alt="Spatial Transformer Representation" src="spatial_transformer.png" title="Spatial Transformer Representation"></p>
<p>There are three main components in a Spatial Transformer, as shown in the image above (provided in the paper by <a href="https://arxiv.org/abs/1506.02025">Jaderberg et al.</a>):</p>
<ul>
<li><strong>Localisation network:</strong> neural network that receives the input feature map U, where U is a space spanned by the width, height and channels. It outputs the parameters of the transformation to be applied to the feature map. In 2D, the parameters take the form of a 2x3 matrix (i.e. an affine transformation matrix). It can take the form of any neural network, but it should include a final regression layer to produce the transformation parameters.</li>
<li><strong>Grid generator:</strong> normalized grid of coordinates over the input feature map. It maps the original coordinate system of the input to the interval [-1, 1], and applies the transformation in the normalized space.</li>
<li><strong>Sampler:</strong> the sampler takes a set of sampling points from the grid generator, along with the input feature map U and produces the sampled output feature map V.</li>
</ul>
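<p>The grid generator step can be sketched in a few lines of plain Python. This is only an illustration: <code>affine_grid</code> is a hypothetical helper, the matrix is supplied by hand instead of by a localisation network, and placing the grid end points exactly at -1 and 1 is an assumption (cuDNN’s exact sampling positions may differ).</p>

```python
def affine_grid(theta, out_h, out_w):
    """Return an out_h x out_w grid of source coordinates (x_s, y_s).

    theta is a 2x3 affine matrix given as nested lists. Target coordinates
    are spread evenly over [-1, 1] in both directions.
    """
    def linspace(n):
        # n evenly spaced values covering [-1, 1]
        if n == 1:
            return [0.0]
        return [-1.0 + 2.0 * i / (n - 1) for i in range(n)]

    grid = []
    for y_t in linspace(out_h):
        row = []
        for x_t in linspace(out_w):
            # Multiply theta by the homogeneous target coordinate (x_t, y_t, 1)
            x_s = theta[0][0] * x_t + theta[0][1] * y_t + theta[0][2]
            y_s = theta[1][0] * x_t + theta[1][1] * y_t + theta[1][2]
            row.append((x_s, y_s))
        grid.append(row)
    return grid


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
flip = [[-1.0, 0.0, 0.0], [0.0, -1.0, 0.0]]
grid_id = affine_grid(identity, 2, 3)
grid_flip = affine_grid(flip, 2, 3)
```

<p>With the identity matrix every target point maps to itself; with the flip matrix the top-left target point (-1, -1) reads from the bottom-right source point (1, 1). The sampler then interpolates the input feature map at these source coordinates.</p>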
<h4>Spatial Transformer using cuDNN</h4>
<p>cuDNN has provided spatial transformer functions since version 6, and those functions were used to implement the second part of the project. There are two types of functions: forward and backward. Forward functions implement the operations of the sampling grid and the sampler. Backward functions compute the gradients for each forward function, i.e. there is one function to compute the gradients of the inputs and another to compute the gradients of the affine transformation, so that they can be backpropagated through the neural network in order for it to learn.</p>
<p>Spatial transformer functions from cuDNN were implemented in Theano’s <code>gpuarray.dnn</code> module, as <a href="http://deeplearning.net/software/theano/extending/op.html">Theano Ops</a>. In order to wrap the required functions, I have implemented wrappers in C, which interface with Theano’s <code>PyGpuArrayObject</code> instances and the cuDNN functions.</p>
<p>Below is a brief description of each Op, with links to where they are located in Theano’s codebase:</p>
<ul>
<li><a href="https://github.com/Theano/Theano/blob/master/theano/gpuarray/dnn.py#L2973"><code>theano.gpuarray.dnn.dnn_spatialtf</code></a>: function that sets up a complete spatial transformer, i.e. it creates the sampling grid and the sampler, and returns the latter to the user.</li>
<li><a href="https://github.com/Theano/Theano/blob/master/theano/gpuarray/dnn.py#L2792"><code>theano.gpuarray.dnn.GpuDnnTransformerGrid</code></a>: COp class that implements the sampling grid, using cuDNN’s forward grid generator function.</li>
<li><a href="https://github.com/Theano/Theano/blob/master/theano/gpuarray/dnn.py#L2848"><code>theano.gpuarray.dnn.GpuDnnTransformerSampler</code></a>: COp class that implements the sampler, which is currently limited by cuDNN to bilinear interpolation. This class interfaces with cuDNN’s forward sampler function.</li>
<li><a href="https://github.com/Theano/Theano/blob/master/theano/gpuarray/dnn.py#L2907"><code>theano.gpuarray.dnn.GpuDnnTransformerGradI</code></a>: COp class that implements the gradients of the inputs, which interfaces with cuDNN’s backward sampler function.</li>
<li><a href="https://github.com/Theano/Theano/blob/master/theano/gpuarray/dnn.py#L2944"><code>theano.gpuarray.dnn.GpuDnnTransformerGradT</code></a>: COp class that implements the gradients of the affine transformation, which interfaces with cuDNN’s backward sampling grid function.</li>
</ul>
<p>In the COp classes, one can find the paths to the C wrappers, which make the interface between Theano and cuDNN.</p>
<h4>Spatial Transformer on the <span class="caps">CPU</span></h4>
<p>Based on an implementation in <a href="https://github.com/Lasagne/Lasagne/blob/master/lasagne/layers/special.py#L354">Lasagne</a>, a spatial transformer was implemented on the <span class="caps">CPU</span> as well. The work on this third part of the project consisted of adapting the implementation from Lasagne, which uses Theano symbolic variables to perform the computations, into Theano Ops.</p>
<p>However, Lasagne does not provide implementations of the gradients, and cuDNN’s backward functions cannot be used on the <span class="caps">CPU</span>. So those have to be implemented based on the equations in the paper by Jaderberg et al. [3]. Furthermore, it is necessary to provide a concrete implementation (e.g. using NumPy) of each of those Ops, in order to enable users to debug code that uses the functionality provided by the spatial transformer.</p>
<p>Most of the implementation is completed, including the gradients of inputs, with only the gradients of the affine transformation currently not passing the gradient tests. Fixing the computations of the affine transformation is the last step required to finish the implementation of the spatial transformer on the <span class="caps">CPU</span>.</p>
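<p>A gradient test of this kind compares an analytic gradient against finite differences. The sketch below does this for the grid generator alone, with a toy cost, and only for the first row of the affine matrix. It is not Theano’s test code: all function names and the cost are hypothetical, chosen for illustration.</p>

```python
def grid_x(theta, points):
    """Source x-coordinates of target points (x_t, y_t) under a 2x3 theta."""
    return [theta[0][0] * x + theta[0][1] * y + theta[0][2] for x, y in points]


def cost(theta, points):
    # Toy scalar cost: sum of squared source x-coordinates
    return sum(v * v for v in grid_x(theta, points))


def analytic_grad_row0(theta, points):
    """dC/dtheta for the first row of theta, obtained with the chain rule."""
    grad = [0.0, 0.0, 0.0]
    for (x, y), v in zip(points, grid_x(theta, points)):
        dc_dv = 2.0 * v  # dC/dx_s for this grid point
        # dx_s/dtheta_0j is x_t, y_t and 1 for j = 0, 1, 2
        for j, basis in enumerate((x, y, 1.0)):
            grad[j] += dc_dv * basis
    return grad


def numeric_grad_row0(theta, points, eps=1e-6):
    """Central finite differences over the same three parameters."""
    grad = []
    for j in range(3):
        plus = [row[:] for row in theta]
        minus = [row[:] for row in theta]
        plus[0][j] += eps
        minus[0][j] -= eps
        grad.append((cost(plus, points) - cost(minus, points)) / (2 * eps))
    return grad


theta = [[0.8, 0.1, 0.05], [-0.2, 1.1, 0.0]]
points = [(-1.0, -1.0), (0.0, 0.5), (1.0, 1.0)]
analytic = analytic_grad_row0(theta, points)
numeric = numeric_grad_row0(theta, points)
max_error = max(abs(a - n) for a, n in zip(analytic, numeric))
```

<p>If the analytic gradient is correct, <code>max_error</code> stays close to zero; a large value points at a mistake in the derivation or in its implementation.</p>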
<h3>Pull Requests</h3>
<p>The vast majority of the discussions with the mentors during the course of the project were carried out on the public mailing list of Theano developers and in GitHub Pull Requests. Furthermore, all implementations went through unit testing and peer review by the mentors.</p>
<p>The first and second parts of the project were successfully completed, and are already merged into Theano (see the Pull Requests below). However, the third part is not yet complete, for the reasons explained above.</p>
<p>Links to the original Pull Requests are provided below:</p>
<ol>
<li>Connectionist Temporal Classification Loss with warp-ctc: <a href="https://github.com/Theano/Theano/pull/5949">Pull Request</a></li>
<li>Spatial Transformer using cuDNN: <a href="https://github.com/Theano/Theano/pull/6061">Pull Request</a></li>
<li>Spatial Transformer on the <span class="caps">CPU</span> (<span class="caps">WIP</span>): <a href="https://github.com/Theano/Theano/pull/6298">Pull Request</a></li>
</ol>
<p>You can also see my commits in Theano, <a href="https://github.com/Theano/Theano/commits/master?author=joaovictortr">here</a> for the <span class="caps">CTC</span> and Spatial Transformer with cuDNN, and <a href="https://github.com/joaovictortr/Theano/commits/spatialtf_cpu?author=joaovictortr">here</a> for the Spatial Transformer on the <span class="caps">CPU</span>.</p>
<h2>Conclusion</h2>
<p>In this project, I have implemented wrappers for <span class="caps">GPU</span> functions in Theano, in order to accelerate the computation of deep learning models. Two of the three parts of the project have been merged into Theano, with the third only requiring fixing the computation of gradients.</p>
<p>During this summer, I have learned a lot about the inner workings of Theano. I have also considerably improved my knowledge of Python, as I come from a strong C/C++ background.</p>
<h3>What’s Next?</h3>
<p>I’ll start getting deeper into machine learning, and Theano will be a great tool for the job. With some knowledge of the internals, I can implement my own models, as well as suggest and add new functionalities.</p>
<h2>Acknowledgements</h2>
<p>I would like to thank Steven Bocco, my mentor, for guiding me in the execution of the project, providing feedback and reviewing the code. I would also like to thank Frédéric Bastien and Arnaud Bergeron for helping with organizational aspects and code review.</p>
<p>Finally, I would like to thank the staff of GSoC, and Google, for this unique opportunity.</p>
<h2>References</h2>
<p>[1] <a href="https://insidehpc.com/2016/01/warp-ctc/">“Accelerating Machine Learning with Open Source Warp-<span class="caps">CTC</span>”</a>. 2016. Accessed on: 2017-08-28.</p>
<p>[2] Amodei et al. <a href="https://arxiv.org/abs/1512.02595">Deep Speech 2: End-to-End Speech Recognition in English and Mandarin</a>. 2015. Accessed on: 2017-08-28.</p>
<p>[3] Jaderberg et al. <a href="https://arxiv.org/abs/1506.02025">Spatial Transformer Networks</a>. 2015. Accessed on: 2017-08-28.</p>
<p>[4] Risso, <span class="caps">J. V.</span> T. <a href="https://www.sharelatex.com/project/58cee4bcf98f21f60fedec74">Extend usage of optimized <span class="caps">GPU</span> libraries in Theano</a>. 2017. Accessed on: 2017-08-28.</p>GSoC - Spatial Transformer 32017-08-06T16:04:00-03:002017-08-06T16:04:00-03:00João Victor Rissotag:joaovictortr.me,2017-08-06:/2017/gsoc-spatial-transformer3.html<p>In this post, I’ll present a third update on the Spatial Transformer development in Theano</p><p>The spatial transformer implementation on the <span class="caps">GPU</span> using cuDNN is now complete. The
problem with the gradient computation was solved, with the help of my mentor.
Also, comprehensive unit tests of the Op and the gradients were added. Further
details can be found in the unit tests and in the discussion in the <a href="https://github.com/Theano/Theano/pull/6061">pull request</a>.</p>
<p>As a future direction, and as the last part of my GSoC project, I will implement a <span class="caps">CPU</span>
version of the spatial transformer, using symbolic operations from Theano. The
initial implementation and discussions can be followed in the corresponding <a href="https://github.com/Theano/Theano/pull/6298">pull request</a> as well.</p>GSoC - Spatial Transformer 22017-07-24T14:37:00-03:002017-07-24T14:37:00-03:00João Victor Rissotag:joaovictortr.me,2017-07-24:/2017/gsoc-spatial-transformer2.html<p>In this post, I’ll present an update on the Spatial Transformer development in Theano</p><p>The spatial transformer implementation is not yet complete, due to some
issues in the implementation of the gradients. These issues should be solved
this week. A <span class="caps">CPU</span> implementation of the spatial transformer might
follow after that; one already exists in Lasagne, but it needs
to be ported to Theano.</p>
<p>The main problem with the gradients is that some <code>DisconnectedType</code> objects
appear in the computational graph, where Theano variables would be expected,
when we compute and verify the gradients in the unit tests. I have tried
several different approaches to set up the gradient test, but none of them
has succeeded so far.</p>
<p>Gradient computations in Theano implement, through symbolic operations, the chain
rule of differential calculus:</p>
<p><span class="math">\(\frac{\partial C}{\partial x} = \frac{\partial C}{\partial f} * \frac{\partial f}{\partial x}\)</span></p>
<p>Here, C is the cost function that returns a scalar, f is the function computed by
the spatial transformer over a vector or tensor x, and x consists of our input images.
In order to compute the gradient of the cost with respect to the inputs, one
must implement the chain rule using symbolic operations. However, in the case of the
gradients of the spatial transformer, the issue is that the derivative of the cost
with respect to f is a <code>DisconnectedType</code> object, i.e. not a Theano variable, as
one would expect.</p>
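<p>As a small numeric illustration of the chain rule above (not Theano code), the sketch below uses an elementwise square as a stand-in for the transformer output f and a weighted sum as the cost C. The function names are hypothetical. The point is that the backward step needs the upstream term, the gradient of C with respect to f, as an ordinary value it can multiply with.</p>

```python
def forward(x):
    # Stand-in for the transformer output f(x): elementwise square
    return [v * v for v in x]


def cost(f):
    # Scalar cost C = sum_i 3 * f_i, so dC/df_i = 3 for every i
    return sum(3.0 * v for v in f)


def backward(x, dc_df):
    # Chain rule: dC/dx_i = dC/df_i * df_i/dx_i, with df_i/dx_i = 2 * x_i
    return [g * 2.0 * v for g, v in zip(dc_df, x)]


x = [0.5, -1.0, 2.0]
dc_df = [3.0] * len(x)      # gradient of the cost w.r.t. f
dc_dx = backward(x, dc_df)  # gradient of the cost w.r.t. x: 6 * x_i
```

<p>If the upstream term is not available as a value of this kind, the product cannot be formed, which is the situation described above.</p>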
<p>To summarize, here is a short list of what has changed since the last post:</p>
<ul>
<li>Op now uses scaling factors to set the dimensions of the sampling grid.</li>
<li>Op implementation was merged into a single class (GpuDnnTransformer).</li>
<li>Initial implementations of the gradients for both inputs and affine transformation were added.</li>
<li>Comprehensive tests and type checks were added to verify the Op implementation.</li>
</ul>
<p>You can follow the <a href="https://github.com/Theano/Theano/pull/6061">Pull Request</a> to
keep track of the ongoing development and discussions.</p>
<script type="text/javascript">if (!document.getElementById('mathjaxscript_pelican_#%@#$@#')) {
var align = "center",
indent = "0em",
linebreak = "false";
if (false) {
align = (screen.width < 768) ? "left" : align;
indent = (screen.width < 768) ? "0em" : indent;
linebreak = (screen.width < 768) ? 'true' : linebreak;
}
var mathjaxscript = document.createElement('script');
mathjaxscript.id = 'mathjaxscript_pelican_#%@#$@#';
mathjaxscript.type = 'text/javascript';
mathjaxscript.src = 'https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.0/MathJax.js?config=TeX-AMS-MML_HTMLorMML';
mathjaxscript[(window.opera ? "innerHTML" : "text")] =
"MathJax.Hub.Config({" +
" config: ['MMLorHTML.js']," +
" TeX: { extensions: ['AMSmath.js','AMSsymbols.js','noErrors.js','noUndefined.js'], equationNumbers: { autoNumber: 'AMS' } }," +
" jax: ['input/TeX','input/MathML','output/HTML-CSS']," +
" extensions: ['tex2jax.js','mml2jax.js','MathMenu.js','MathZoom.js']," +
" displayAlign: '"+ align +"'," +
" displayIndent: '"+ indent +"'," +
" showMathMenu: true," +
" messageStyle: 'normal'," +
" tex2jax: { " +
" inlineMath: [ ['\\\\(','\\\\)'] ], " +
" displayMath: [ ['$$','$$'] ]," +
" processEscapes: true," +
" preview: 'TeX'," +
" }, " +
" 'HTML-CSS': { " +
" styles: { '.MathJax_Display, .MathJax .mo, .MathJax .mi, .MathJax .mn': {color: 'inherit ! important'} }," +
" linebreaks: { automatic: "+ linebreak +", width: '90% container' }," +
" }, " +
"}); " +
"if ('default' !== 'default') {" +
"MathJax.Hub.Register.StartupHook('HTML-CSS Jax Ready',function () {" +
"var VARIANT = MathJax.OutputJax['HTML-CSS'].FONTDATA.VARIANT;" +
"VARIANT['normal'].fonts.unshift('MathJax_default');" +
"VARIANT['bold'].fonts.unshift('MathJax_default-bold');" +
"VARIANT['italic'].fonts.unshift('MathJax_default-italic');" +
"VARIANT['-tex-mathit'].fonts.unshift('MathJax_default-italic');" +
"});" +
"MathJax.Hub.Register.StartupHook('SVG Jax Ready',function () {" +
"var VARIANT = MathJax.OutputJax.SVG.FONTDATA.VARIANT;" +
"VARIANT['normal'].fonts.unshift('MathJax_default');" +
"VARIANT['bold'].fonts.unshift('MathJax_default-bold');" +
"VARIANT['italic'].fonts.unshift('MathJax_default-italic');" +
"VARIANT['-tex-mathit'].fonts.unshift('MathJax_default-italic');" +
"});" +
"}";
(document.body || document.getElementsByTagName('head')[0]).appendChild(mathjaxscript);
}
</script>GSoC - Spatial Transformer2017-06-29T00:30:00-03:002017-06-29T00:30:00-03:00João Victor Rissotag:joaovictortr.me,2017-06-29:/2017/gsoc-spatial-transformer.html<p>In this post, I’ll present a follow up on the Spatial Transformer development in Theano</p><p>Currently, I am working on the implementation of a wrapper over cuDNN functions
to provide a Spatial Transformer in Theano. Today, I’ve got the first part of
the transformer working. In this post, I’ll show a basic example of how to use
this initial functionality.</p>
<p>First, let’s obtain an image to which we will apply some transformations. scipy
provides a simple example image, so we import scipy, matplotlib
(to view the image) and NumPy (used in the following steps), and then load the image:</p>
<div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">matplotlib.pyplot</span> <span class="kn">as</span> <span class="nn">plt</span>
<span class="kn">import</span> <span class="nn">numpy</span> <span class="kn">as</span> <span class="nn">np</span>
<span class="kn">from</span> <span class="nn">scipy</span> <span class="kn">import</span> <span class="n">misc</span>
<span class="n">f</span> <span class="o">=</span> <span class="n">misc</span><span class="o">.</span><span class="n">face</span><span class="p">()</span>
<span class="n">plt</span><span class="o">.</span><span class="n">imshow</span><span class="p">(</span><span class="n">f</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">show</span><span class="p">()</span>
</pre></div>
<p>Which should show the following image:</p>
<p><img alt="Raccoon face image from scipy.misc module" src="https://joaovictortr.me/images/raccoon_face.png"></p>
<p>Images are usually represented with an (h, w, c) data layout, where h is the
height, w is the width, and c is the number of color channels. Also, the pixel
values are represented as integers in the range [0, 255]. When we pack
a set of images with this data layout into a single NumPy array, we get a data
layout of (n, h, w, c), where n is the number of images, which makes it a 4-dimensional
array (or a 4-D tensor).</p>
<p>Since the spatial transformer uses a grid to sample data from the input, we
must define its dimensions. If the width and height dimensions are bigger than
the corresponding ones of the input image, we will perform oversampling, using
bilinear interpolation. Conversely, by using a smaller width or height, we will
perform subsampling of the original image, also with bilinear interpolation.</p>
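<p>To illustrate what sampling with bilinear interpolation means, here is a minimal sketch in plain Python that resamples a tiny single-channel image on a larger and on a smaller grid. The helpers <code>bilinear_sample</code> and <code>resample</code> are hypothetical, the mapping of -1 and 1 onto the centres of the border pixels is an assumption, and coordinates are assumed to lie inside [-1, 1]; cuDNN’s implementation may differ in these details.</p>

```python
import math


def bilinear_sample(image, x, y):
    """Sample a 2D image (list of rows) at normalized (x, y) in [-1, 1]."""
    h, w = len(image), len(image[0])
    # Map [-1, 1] to pixel positions [0, w - 1] and [0, h - 1]
    px = (x + 1.0) * (w - 1) / 2.0
    py = (y + 1.0) * (h - 1) / 2.0
    x0, y0 = int(math.floor(px)), int(math.floor(py))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    wx, wy = px - x0, py - y0
    # Weighted average of the four surrounding pixels
    top = (1 - wx) * image[y0][x0] + wx * image[y0][x1]
    bottom = (1 - wx) * image[y1][x0] + wx * image[y1][x1]
    return (1 - wy) * top + wy * bottom


def resample(image, out_h, out_w):
    """Resize by sampling an out_h x out_w untransformed grid over the image."""
    def coords(n):
        return [0.0] if n == 1 else [-1.0 + 2.0 * i / (n - 1) for i in range(n)]
    return [[bilinear_sample(image, x, y) for x in coords(out_w)]
            for y in coords(out_h)]


image = [[0.0, 10.0],
         [20.0, 30.0]]
bigger = resample(image, 3, 3)   # grid larger than the input: oversampling
smaller = resample(image, 1, 1)  # grid smaller than the input: subsampling
```

<p>The 3x3 result keeps the four original corner values and fills in the averages between them, while the 1x1 result is the value at the image centre.</p>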
<p>So, let’s define our grid dimensions as follows:</p>
<div class="highlight"><pre><span></span><span class="c1"># shape: (num_images, channels, height, width)</span>
<span class="n">grid_dims</span> <span class="o">=</span> <span class="p">(</span><span class="mi">3</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">128</span><span class="p">,</span> <span class="mi">128</span><span class="p">)</span>
</pre></div>
<p>The layout of the grid dimensions is given by the number of images, followed by
the number of channels, height and width of these images.</p>
<p>Packing images together is relevant in this case, since the spatial transformer
functions from cuDNN expect a 4D data layout. We pack our original image in the
(n, h, w, c) layout, adding more copies of the same image to the final array:</p>
<div class="highlight"><pre><span></span><span class="n">img</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">asarray</span><span class="p">(</span><span class="n">grid_dims</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">*</span> <span class="p">[</span><span class="n">f</span><span class="p">])</span>
</pre></div>
<p>Also, our input images need to be normalized into the interval [-1, 1] (see the
<a href="https://arxiv.org/abs/1506.02025">original paper</a> for more details). So we convert
our images to float, then normalize them:</p>
<div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">normalize_input</span><span class="p">(</span><span class="nb">input</span><span class="p">):</span>
    <span class="c1"># Scale input from [0, 255] to [0, 2]</span>
    <span class="n">scale_factor</span> <span class="o">=</span> <span class="mi">2</span> <span class="o">**</span> <span class="o">-</span><span class="mi">7</span> <span class="c1"># equivalent to 1 / 128</span>
    <span class="nb">input</span> <span class="o">*=</span> <span class="n">scale_factor</span>
    <span class="c1"># Re-scale input from [0, 2] to [-1, 1] (normalized)</span>
    <span class="nb">input</span> <span class="o">-=</span> <span class="mi">1</span>
    <span class="k">return</span> <span class="nb">input</span>

<span class="n">img</span> <span class="o">=</span> <span class="n">normalize_input</span><span class="p">(</span><span class="n">img</span><span class="o">.</span><span class="n">astype</span><span class="p">(</span><span class="n">np</span><span class="o">.</span><span class="n">float32</span><span class="p">))</span>
</pre></div>
<p>The <code>normalize_input</code> function takes the images as input, and normalizes the
pixel values into the range [-1, 1]. Multiplication and subtraction work on
each element of the array through <a href="https://docs.scipy.org/doc/numpy-1.12.0/user/basics.broadcasting.html">broadcasting</a>.</p>
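<p>The arithmetic of this normalization, and of its inverse used later to display the result, can be checked on single pixel values. The scalar helpers below are a sketch written for this check only; they are not part of the example code above.</p>

```python
def normalize(value):
    # Multiply by 1/128, then shift down by 1
    return value * 2 ** -7 - 1.0


def rescale(value):
    # Inverse mapping: shift up by 1, then multiply by 128
    return (value + 1.0) * 128.0


lowest = normalize(0)               # darkest pixel
middle = normalize(128)             # mid-grey pixel
highest = normalize(255)            # brightest pixel
round_trip = rescale(normalize(255))
```

<p>Because the scale factor is 1/128 rather than 2/255, the brightest pixel maps to 0.9921875 instead of exactly 1, and the inverse mapping recovers the original value of 255.</p>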
<p>Now, we have to obtain the transformation we want to apply to the image. The
transformation is usually obtained from a localisation network, which must
have a regression layer as its output layer. A localisation network can be
built on top of already existing components of Theano.</p>
<p>In this case, we’ll use a predefined transformation, to show how the spatial
transformer works without the localisation net. Suppose we want to flip the
image, i.e. rotate it by 180 degrees. To achieve that with an affine
transformation, we define a rotation in our transformation matrix:</p>
<div class="highlight"><pre><span></span><span class="c1"># Rotation matrix for 180 degree rotation, with zero translation</span>
<span class="n">rotate</span> <span class="o">=</span> <span class="p">[[</span><span class="o">-</span><span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">],</span>
          <span class="p">[</span><span class="mi">0</span><span class="p">,</span> <span class="o">-</span><span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">]]</span>
<span class="c1"># One matrix is applied for each image, in this case we use the same matrix</span>
<span class="c1"># for all images.</span>
<span class="n">theta</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">asarray</span><span class="p">(</span><span class="n">grid_dims</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">*</span> <span class="p">[</span><span class="n">rotate</span><span class="p">],</span> <span class="n">dtype</span><span class="o">=</span><span class="n">np</span><span class="o">.</span><span class="n">float32</span><span class="p">)</span>
</pre></div>
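<p>The matrix for the 180 degree case follows from the general 2D rotation matrix. As a sketch, a hypothetical helper <code>rotation_theta</code> (not part of Theano) can build the 2x3 matrix for any angle, assuming a rotation about the image centre and no translation:</p>

```python
import math


def rotation_theta(degrees):
    """2x3 affine matrix for a rotation about the centre, zero translation."""
    angle = math.radians(degrees)
    c, s = math.cos(angle), math.sin(angle)
    # Round away floating-point noise, e.g. sin(pi) is about 1.2e-16
    c, s = round(c, 12), round(s, 12)
    return [[c, -s, 0.0],
            [s, c, 0.0]]


theta_180 = rotation_theta(180)
theta_90 = rotation_theta(90)
```

<p>For 180 degrees the cosine is -1 and the sine is 0, which gives the matrix used in this post.</p>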
<p>The spatial transformer also expects the data layout of the tensor to be (n, c, h, w),
instead of the (n, h, w, c) layout we have created, so we have to transpose the tensor:</p>
<div class="highlight"><pre><span></span><span class="n">img</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">transpose</span><span class="p">(</span><span class="n">img</span><span class="p">,</span> <span class="n">axes</span><span class="o">=</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">))</span>
</pre></div>
<p>We could also perform the transpose operation on a Theano symbolic variable,
using the transpose function of <code>theano.tensor</code>.</p>
<p>Now we are ready to transform the images, so we instantiate the spatial transformer:</p>
<div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">theano</span>
<span class="kn">from</span> <span class="nn">theano.gpuarray.dnn</span> <span class="kn">import</span> <span class="n">dnn_spatialtf</span>
<span class="n">transformer</span> <span class="o">=</span> <span class="n">dnn_spatialtf</span><span class="p">(</span><span class="n">img</span><span class="p">,</span> <span class="n">theta</span><span class="p">,</span> <span class="n">grid_dims</span><span class="p">)</span>
</pre></div>
<p>Then, we have to compile and call a Theano function to compute the transformed values:</p>
<div class="highlight"><pre><span></span><span class="c1"># Create theano function to compute transformation</span>
<span class="n">fn</span> <span class="o">=</span> <span class="n">theano</span><span class="o">.</span><span class="n">function</span><span class="p">([],</span> <span class="p">[</span><span class="n">transformer</span><span class="p">])</span>
<span class="c1"># Compute transformation; fn returns a list with a single output</span>
<span class="n">out_img_gpu</span> <span class="o">=</span> <span class="n">fn</span><span class="p">()[</span><span class="mi">0</span><span class="p">]</span>
</pre></div>
<p>Now, we have three things to consider:</p>
<ol>
<li>Results are in the <span class="caps">GPU</span> memory, so we have to copy them back</li>
<li>Resulting values of the images are in the [-1, 1] range, so we have to rescale them to [0, 255]</li>
<li>Data layout of resulting images is (n, c, h, w), so we have to convert them to (n, h, w, c), in order to visualize the images.</li>
</ol>
<p>Which we will do next:</p>
<div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">rescale_input</span><span class="p">(</span><span class="nb">input</span><span class="p">):</span>
    <span class="c1"># Re-scale output to range [0, 2]</span>
    <span class="nb">input</span> <span class="o">+=</span> <span class="mi">1</span>
    <span class="c1"># Re-scale output to range [0, 255]</span>
    <span class="nb">input</span> <span class="o">*=</span> <span class="mi">128</span>
    <span class="k">return</span> <span class="nb">input</span>
<span class="c1"># Copy results back from the GPU</span>
<span class="n">out_img</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">asarray</span><span class="p">(</span><span class="n">out_img_gpu</span><span class="p">)</span>
<span class="c1"># Re-scale values from [-1, 1] to [0, 255], and convert to uint8</span>
<span class="n">out_img</span> <span class="o">=</span> <span class="n">rescale_input</span><span class="p">(</span><span class="n">out_img</span><span class="p">)</span><span class="o">.</span><span class="n">astype</span><span class="p">(</span><span class="n">np</span><span class="o">.</span><span class="n">uint8</span><span class="p">)</span>
<span class="c1"># Convert from NCHW to NHWC</span>
<span class="n">out_img</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">transpose</span><span class="p">(</span><span class="n">out_img</span><span class="p">,</span> <span class="n">axes</span><span class="o">=</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">1</span><span class="p">))</span>
</pre></div>
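<p>As a quick sanity check of these two steps, the sketch below applies them to a small synthetic batch instead of real GPU output. The names <code>to_displayable</code> and <code>fake_out</code> are hypothetical, and scaling by 127.5 followed by clipping is an assumption chosen so that the result always fits in the <code>uint8</code> range.</p>

```python
import numpy as np

def to_displayable(batch):
    """Map a float batch in [-1, 1] with NCHW layout to uint8 NHWC."""
    # Work on a copy so the caller's array is left untouched
    scaled = (batch.astype(np.float64) + 1.0) * 127.5
    # Guard against values slightly outside [-1, 1] before the cast
    scaled = np.clip(scaled, 0, 255).astype(np.uint8)
    # (n, c, h, w) -> (n, h, w, c)
    return np.transpose(scaled, axes=(0, 2, 3, 1))

# Synthetic stand-in for the sampler output: 2 images, 3 channels, 4x5 pixels
fake_out = np.linspace(-1.0, 1.0, num=2 * 3 * 4 * 5).reshape(2, 3, 4, 5)
images = to_displayable(fake_out)
print(images.shape, images.dtype, images.min(), images.max())
```

<p>Unlike the in-place version, this variant returns a new array, which is convenient when the original output is still needed.</p>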
<p>Finally, we can visualize our transformed images:</p>
<div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">matplotlib.pyplot</span> <span class="kn">as</span> <span class="nn">plt</span>
<span class="k">for</span> <span class="n">img_idx</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">out_img</span><span class="p">)):</span>
    <span class="n">plt</span><span class="o">.</span><span class="n">imshow</span><span class="p">(</span><span class="n">out_img</span><span class="p">[</span><span class="n">img_idx</span><span class="p">])</span>
    <span class="n">plt</span><span class="o">.</span><span class="n">show</span><span class="p">()</span>
</pre></div>
<p>Which would look like the following:</p>
<p><img alt="Raccoon face image from scipy.misc module" src="https://joaovictortr.me/images/raccoon_face_flipped.png"></p>
<p>In this case, the image is subsampled, because our grid dimensions are smaller
than those of the original image. However, we can still see the original image flipped.</p>
<p>That is it for today. However, a few tasks remain to complete the implementation:</p>
<ul>
<li>Add proper tests to the already implemented functionality</li>
<li>Gradients are not yet supported, so one of the next steps is implementing the
backward operations that compute them (allowing backpropagation)</li>
<li>Add proper tests also to the gradients</li>
<li>Provide a functional example: I’m working with a <a href="https://github.com/Lasagne/Recipes/blob/master/examples/spatial_transformer_network.ipynb">Lasagne implementation</a>
to build a neural net model that can be used with the <span class="caps">GPU</span> Spatial Transformer.</li>
</ul>
<h2>GSoC - The Road So Far</h2>
<p>João Victor Risso, 2017-06-26T22:30:00-03:00 (tag:joaovictortr.me,2017-06-26:/2017/gsoc-project-progress-report.html)</p>
<p>In this post, I will talk about the progress of my GSoC project from June 11 until June 26.</p>
<p>In the last post, I introduced what the GSoC project consists of and the
operations I intend to implement by the end of the program. I am happy to announce
that the first part of the project is now complete, although it has not
been merged yet. You can find the <span class="caps">PR</span> with the code and discussions on
the wrapper implementation <a href="https://github.com/Theano/Theano/pull/5949">here</a>.</p>
<h3>Connectionist Temporal Classification</h3>
<p>The first part of the project consisted of implementing a wrapper for Theano
that makes use of <a href="https://github.com/baidu-research/warp-ctc">warp-ctc</a>,
a fast implementation of the <span class="caps">CTC</span> loss function by Baidu Research. Their
implementation works both on multi-core processors (using OpenMP threads)
and on GPUs (using <span class="caps">CUDA</span> kernels) to compute the <span class="caps">CTC</span> function. More details
on the inner workings of warp-ctc can be found
<a href="https://github.com/baidu-research/warp-ctc">here</a> and <a href="http://arxiv.org/abs/1512.02595">here</a>.</p>
<p><span class="caps">CTC</span> stands for Connectionist Temporal Classification and was proposed by
<a href="http://www.cs.toronto.edu/%7Egraves/icml_2006.pdf">Graves <em>et al.</em></a> It
is a loss function that allows temporal classification of unsegmented
data in recurrent neural networks (RNNs). Problems with unsegmented data are
very common in perceptual tasks, such as handwriting recognition and speech recognition.</p>
<p>Labelling unsegmented sequence data, such as images and sound, was challenging
in RNNs because objective functions for these kinds of networks are defined
separately for each point (of features) in the training sequence. That meant
that training data had to be pre-segmented, and the network outputs post-processed, to obtain a final label sequence.</p>
<p>Outputs of a <span class="caps">CTC</span> network are given by a softmax layer, whose results are
interpreted as a probability distribution over all possible label sequences,
conditioned on a given input sequence. Given that distribution, an objective
function was derived to maximize the probabilities of correct labellings. Since
the objective function is differentiable, the network can be trained with
backpropagation through time.</p>
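<p>To make the objective more concrete, here is a minimal NumPy sketch of the forward recursion described in the paper by Graves <em>et al.</em>, which sums the probabilities of every frame-level path that collapses to a given label sequence. It is not how warp-ctc or the Theano wrapper is implemented: the function name <code>ctc_negative_log_likelihood</code>, the choice of index 0 for the blank label and the toy inputs are assumptions made for illustration, and no log-space stabilisation is applied.</p>

```python
import numpy as np

def ctc_negative_log_likelihood(probs, labels, blank=0):
    """Return -log p(labels | input) for one sequence.

    probs  : array of shape (T, K), softmax output for each of the T frames
    labels : target label indices, without blanks
    blank  : index of the blank label (assumed to be 0 in this sketch)
    """
    # Extended target: a blank before, between and after the labels
    extended = [blank]
    for label in labels:
        extended += [label, blank]
    n_frames, n_states = probs.shape[0], len(extended)

    # alpha[t, s]: total probability of all path prefixes that are in
    # state s of the extended target after frame t
    alpha = np.zeros((n_frames, n_states))
    alpha[0, 0] = probs[0, extended[0]]
    if n_states > 1:
        alpha[0, 1] = probs[0, extended[1]]

    for t in range(1, n_frames):
        for s in range(n_states):
            total = alpha[t - 1, s]
            if s >= 1:
                total += alpha[t - 1, s - 1]
            # Skipping the blank in between is allowed only when moving
            # between two different non-blank labels
            if s >= 2 and extended[s] != blank and extended[s] != extended[s - 2]:
                total += alpha[t - 1, s - 2]
            alpha[t, s] = total * probs[t, extended[s]]

    # Valid paths end in the last label or in the trailing blank
    likelihood = alpha[-1, -1]
    if n_states > 1:
        likelihood += alpha[-1, -2]
    return -np.log(likelihood)

# Toy input: 2 frames, 2 classes (blank and one label), uniform softmax output
toy_probs = np.full((2, 2), 0.5)
loss = ctc_negative_log_likelihood(toy_probs, [1])
print(loss)
```

<p>In the toy input, three of the four possible two-frame paths collapse to the single label, so the likelihood is 3 × 0.25 = 0.75 and the loss is its negative logarithm.</p>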
<p>At the moment, I am preparing a working example of the <span class="caps">CTC</span> wrapper to
showcase how the implementation can be used in Theano. I’m building the network
using Lasagne, a lightweight library of neural network building
blocks on top of Theano.</p>
<h3>Spatial Transformer</h3>
<p>I started working on the Spatial Transformer Op last week, and some of
the basic functionality is already in place. The Op is built on top of cuDNN
functions that provide the components needed for a Spatial Transformer,
so the implementation is limited to GPUs for now.</p>
<p>A Spatial Transformer is a component of a neural network that can provide
spatial manipulation of data within the network. Spatial manipulation can
improve models by introducing invariance to affine transformations, such as
translation, scaling and rotation. This kind of invariance improves classification
performance, since the networks become able to recognize, for example, distorted samples.</p>
<p><img alt="Spatial Transformer. Source: original paper by Jaderberg et al." src="https://joaovictortr.me/images/spatial_transformer.png"></p>
<p>There are three main components in a Spatial Transformer, as shown in the image
above (provided in the <a href="https://arxiv.org/abs/1506.02025">paper by Jaderberg <em>et al.</em></a>):</p>
<ul>
<li>Localisation network: a neural network that receives the input feature map U,
a tensor with width, height and channel dimensions, and outputs the
parameters of the transformation to be applied to the feature map. In 2D, the
parameters take the form of a 2x3 matrix (i.e. an affine transformation matrix).
The localisation network can have any architecture, as long as it ends in a regression
layer that produces the transformation parameters.</li>
<li>Grid generator: builds a normalized grid of coordinates over the input feature map. It
maps the original coordinate system of the input to the interval [-1, 1], and
applies the transformation in that normalized space.</li>
<li>Sampler: takes the set of sampling points from the grid generator, along
with the input feature map U, and produces the sampled output feature map V.</li>
</ul>
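<p>The grid generator and the sampler can be sketched on the CPU with plain NumPy. This is only an illustration of the idea, not the cuDNN implementation or the Theano Op: the function names <code>affine_grid</code> and <code>bilinear_sample</code> are hypothetical, the feature map uses a (height, width, channels) layout, and sampling points outside the image are clamped to the border, which is an assumption of this sketch.</p>

```python
import numpy as np

def affine_grid(theta, out_h, out_w):
    """Return sampling coordinates (x, y) in [-1, 1], shape (out_h, out_w, 2)."""
    xs = np.linspace(-1.0, 1.0, out_w)
    ys = np.linspace(-1.0, 1.0, out_h)
    x_t, y_t = np.meshgrid(xs, ys)  # output grid, each (out_h, out_w)
    # Homogeneous coordinates (x, y, 1) of every output location
    target = np.stack([x_t, y_t, np.ones_like(x_t)], axis=-1)
    # Apply the 2x3 affine matrix to every location of the grid
    return target @ theta.T

def bilinear_sample(feature_map, grid):
    """Sample feature_map (h, w, c) at the normalized coordinates in grid."""
    h, w, _ = feature_map.shape
    # Map normalized coordinates back to pixel positions
    x = (grid[..., 0] + 1.0) * (w - 1) / 2.0
    y = (grid[..., 1] + 1.0) * (h - 1) / 2.0
    # Indices of the four neighbouring pixels, clamped to the border
    x0 = np.clip(np.floor(x).astype(int), 0, w - 1)
    y0 = np.clip(np.floor(y).astype(int), 0, h - 1)
    x1 = np.clip(x0 + 1, 0, w - 1)
    y1 = np.clip(y0 + 1, 0, h - 1)
    # Interpolation weights, broadcast over the channel axis
    wx = np.clip(x - x0, 0.0, 1.0)[..., None]
    wy = np.clip(y - y0, 0.0, 1.0)[..., None]
    top = (1 - wx) * feature_map[y0, x0] + wx * feature_map[y0, x1]
    bottom = (1 - wx) * feature_map[y1, x0] + wx * feature_map[y1, x1]
    return (1 - wy) * top + wy * bottom

# Toy feature map with 4 rows, 5 columns and 3 channels
img = np.arange(4 * 5 * 3, dtype=float).reshape(4, 5, 3)
identity = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
flip_x = np.array([[-1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])  # horizontal flip

same = bilinear_sample(img, affine_grid(identity, 4, 5))
flipped = bilinear_sample(img, affine_grid(flip_x, 4, 5))
# A grid smaller than the input subsamples the feature map
small = bilinear_sample(img, affine_grid(identity, 2, 3))
print(same.shape, flipped.shape, small.shape)
```

<p>In a full Spatial Transformer, the 2x3 matrix would come from the localisation network instead of being fixed by hand as it is here.</p>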
<p>The cuDNN library provides functions to create both the grid generator and the
sampler as well as the back-propagation functions. Therefore, the main task of
the second part is to provide a wrapper in Theano over these functions, in order
to provide the Spatial Transformer Op. There is a limitation, however: cuDNN can
only handle 2D transformations on the inputs, so for now mainly images are supported.</p>
<p>Since it is still a work in progress, further testing is necessary for the grid
generator and the sampler. Back-propagation is not yet implemented, but it will
be soon. You can keep up with the development of the wrapper and provide feedback
in the <a href="https://github.com/Theano/Theano/pull/6061">pull request</a>.</p>
<h3>Showcasing and Examples</h3>
<p>I started working with Lasagne today to implement neural network models that
demonstrate how to use the implementations and serve as a reference for users
interested in the functionalities.</p>
<p>For now, I am trying to find an interesting dataset to work with, which is difficult for <span class="caps">CTC</span> in particular,
mainly because of the lack of open speech datasets. Also, the models proposed in the
original papers are somewhat complex and might take too much time to build,
so I will focus on building simpler models.</p>
<h3>Conclusion</h3>
<p>I have finished the first implementation part of the project by providing the
warp-ctc wrapper. Now, I am working on implementing the Spatial Transformer,
with some of the basic functionality already in place. Furthermore, I intend
to provide some simple models to demonstrate how to use the functionalities
using Lasagne and Theano.</p>
<h2>Starting at GSoC 2017 with Theano</h2>
<p>João Victor Risso, 2017-06-11T23:45:00-03:00 (tag:joaovictortr.me,2017-06-11:/2017/gsoc-starting-with-theano.html)</p>
<p>I’ve been accepted for GSoC 2017. In this post, I’ll talk about some details of the project.</p>
<p>I have been selected for GSoC 2017 and I’m glad to work on a project to bring
more <span class="caps">GPU</span> acceleration to <a href="https://github.com/Theano/Theano">Theano</a>, a sub-org
of the <a href="https://www.python.org/psf/">Python Software Foundation</a>.</p>
<p>This summer I’ll add support for some <span class="caps">GPU</span> libraries in Theano, which can help
applications run faster on <span class="caps">GPU</span>-enabled systems.</p>
<p>Why does <span class="caps">GPU</span> acceleration matter, anyway?</p>
<p>GPUs are becoming increasingly important
to accelerate Deep Learning applications. These accelerators provide massive
parallelism through lots of concurrent threads. Parallel processing of images,
speech and video, for example, speeds up the tasks of training and inference
in neural networks.</p>
<p>What is Theano?</p>
<p>Theano is a Python library to handle mathematical expressions
with multi-dimensional arrays (tensors) in an efficient way. Theano has been
developed at the <a href="https://mila.umontreal.ca/">Montreal Institute for Learning Algorithms (<span class="caps">MILA</span>)</a>,
at the University of Montreal, since 2007. It is commonly used to implement
neural network models, and provides several features, such as:</p>
<ul>
<li>Efficient symbolic differentiation: the library computes the derivatives of
functions with one or many inputs.</li>
<li>Speed and stability optimizations: Theano can rearrange your computations
to improve numerical stability. It also supports dynamic C code
generation, which provides efficient code to perform your computations.</li>
<li>Transparent use of a <span class="caps">GPU</span>: compute-intensive tasks can be performed on a <span class="caps">GPU</span>
transparently, and some operations defined for the <span class="caps">CPU</span> can be moved automatically to
the <span class="caps">GPU</span>.</li>
</ul>
<p>As Marek Rei pointed out in a <a href="http://www.marekrei.com/blog/theano-tutorial/">blog post</a>,
Theano isn’t actually a machine learning library, but it provides you with
means to build your own machine learning models.</p>
<p><a href="https://summerofcode.withgoogle.com/projects/#5331912125054976">My project</a>
for this summer consists of three main tasks:</p>
<ul>
<li>Integrate Baidu Research’s <a href="https://github.com/baidu-research/warp-ctc">warp-ctc</a>
library to provide fast connectionist temporal classification (<span class="caps">CTC</span>) loss
computations. Work on this feature is almost complete (for both <span class="caps">CPU</span> and <span class="caps">GPU</span>),
and should be merged soon. You can find more details in the
<a href="https://github.com/Theano/Theano/pull/5949">pull request</a>.</li>
<li>Add further <span class="caps">GPU</span>-accelerated linear algebra functions to Theano’s new <span class="caps">GPU</span>
backend. Details from the original project have changed, so I’ll add a follow-up
post as soon as possible.</li>
<li>Implement Spatial Transformer Networks’ operations from the cuDNN library.
These networks are composed of three main components: a localization network,
a grid generator and a grid sampler. You can find more details in the
<a href="https://arxiv.org/abs/1506.02025">original paper</a>.</li>
</ul>
<p>I will add more posts as the project progresses throughout the summer.</p>