Improve support for Recursive Neural Networks. · davisking/dlib#290

4 comments · 0 reactions · 0 assignees · C++ · 12,570 stars · 3,314 forks

Labels: enhancement, help wanted

Description

I am opening this issue partly to summarize the limitations of dlib I encountered when using it to implement an RNN, and partly to follow up on the discussion in pull request #213 and propose alternative solutions.

@davisking mentioned (in issue #157) an interest in eventually adding RNN support directly to dlib, but he also ran into many of the same issues I did. He favors a dynamic approach, where the architecture could be defined at run time, but I chose the easy way and stuck to how dlib already handles neural networks, where the architecture is static and defined at compile time. Either way, the issues I bring up here should affect any RNN implementation.

I created a special rnn_ layer, which expects as its input the transpose of a batch of sequences. It contains an "inner" network, defined as a template argument, that processes the batch one sequence step at a time, taking as input the current sequence position together with the output of the previous iteration. This layer makes it possible to represent a cycle in the network graph topology.
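To make the data flow concrete, here is a minimal sketch of the recurrence described above. It does not use dlib: `toy_inner_net` and `toy_rnn` are hypothetical names, scalars stand in for tensors, and the initial "previous output" is assumed to be zero.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical inner "network" (not a dlib type): combines the current
// sequence element with the output of the previous iteration.
struct toy_inner_net {
    float w_x = 0.5f;  // weight applied to the current input
    float w_h = 0.5f;  // weight applied to the previous output
    float operator()(float x, float h_prev) const { return w_x * x + w_h * h_prev; }
};

// Hypothetical stand-in for the rnn_ layer: the inner network is a template
// argument and is applied once per sequence step.
template <typename InnerNet>
class toy_rnn {
public:
    explicit toy_rnn(InnerNet net = InnerNet()) : net_(net) {}

    // steps[t][b] is position t of sequence b, i.e. the transposed batch.
    // Returns the output of the last step for every sequence in the batch.
    std::vector<float> forward(const std::vector<std::vector<float>>& steps) const {
        if (steps.empty())
            return {};
        std::vector<float> h(steps.front().size(), 0.0f);  // assumed initial state
        for (const auto& step : steps) {
            assert(step.size() == h.size());  // every step covers the whole batch
            for (std::size_t b = 0; b < h.size(); ++b)
                h[b] = net_(step[b], h[b]);   // current input + previous output
        }
        return h;
    }

private:
    InnerNet net_;
};
```

The loop over `steps` is where the cycle in the graph shows up: the same `net_` object is reused at every step, which is exactly why any per-layer cache gets overwritten.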

The main issue I faced is the need to unroll the cycle into a directed acyclic graph: when forwarding through the rnn_ layer, the forward() method of each inner network layer may cache internal data for an eventual call to backward(), which will use the data cached by the most recent forward() call. Since forward() runs once per sequence step, only the cache of the last step survives.

To cope with this requirement (and still have all the layers implemented in dlib available for use in the rnn_ inner topology), I have to make a copy of the inner network object after every call to forward(), one per sequence step, so that the corresponding call to backward() can be made on that copy.

In @davisking's proposed solution, the tensor class would be able to share its state with another tensor (maybe through some kind of shared_ptr?), so the copy of the network could be shallow, and all the copied objects could share the inner tensors with the originals. Personally, I would prefer to avoid the copies altogether and not "unroll the DAG". I believe this can be done by slightly changing the contract of the forward() and backward() methods with respect to the cache some layers need (as a consequence, all layers that cache data between forward() and backward() would need to be adapted).

The new rule would be that if forward() is called N times in succession, backward() is then also called N times in succession, in the reverse order of the forward() calls. So, if a layer needs to cache data, it can do so in an internal stack: one push for every forward(), one pop for every backward(). This should be done only during training (so the stack won't grow forever). If backward() won't be called, no data must be cached. The layer can tell which case applies by taking an extra parameter in forward(), such as bool is_training or bool expect_backward. I hope such a solution, if implemented, would not impact the performance of existing layers. Is such a solution acceptable?
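A minimal sketch of that contract, assuming a toy element-wise layer rather than a real dlib layer. `square_layer` and `cached_steps()` are hypothetical names, and `std::vector<float>` stands in for a tensor; only the `expect_backward` flag comes from the proposal above.

```cpp
#include <cassert>
#include <cstddef>
#include <stack>
#include <utility>
#include <vector>

// Hypothetical toy layer (not dlib's API): computes y = x * x element-wise.
// Its gradient, dL/dx = 2 * x * dL/dy, needs the input x of the matching
// forward() call, so x has to be cached between forward() and backward().
class square_layer {
public:
    // expect_backward: cache the input only if a matching backward() will follow.
    std::vector<float> forward(const std::vector<float>& x, bool expect_backward) {
        if (expect_backward)
            cache_.push(x);  // one push per forward()
        std::vector<float> y(x.size());
        for (std::size_t i = 0; i < x.size(); ++i)
            y[i] = x[i] * x[i];
        return y;
    }

    // Must be called in the reverse order of the forward() calls.
    std::vector<float> backward(const std::vector<float>& grad_out) {
        assert(!cache_.empty() && "backward() without a matching forward()");
        const std::vector<float> x = std::move(cache_.top());
        cache_.pop();        // one pop per backward()
        assert(x.size() == grad_out.size());
        std::vector<float> grad_in(x.size());
        for (std::size_t i = 0; i < x.size(); ++i)
            grad_in[i] = 2.0f * x[i] * grad_out[i];
        return grad_in;
    }

    // Number of forward() calls still waiting for their backward().
    std::size_t cached_steps() const { return cache_.size(); }

private:
    std::stack<std::vector<float>> cache_;
};
```

With this shape, a layer that is only ever called as one forward() followed by one backward() holds at most one cached entry, so the extra cost over a single cached member should be small, though that would need measuring on real layers.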

The second problem, not nearly as bad as the first, is that the trained parameters of rnn_ are actually the parameters of all the inner network layers concatenated. In order to return them via get_layer_params(), I have to copy all of them into a single tensor. This tensor is updated by the training method, so before using the inner network again, I have to assign the updated values from the tensor back to its layers' parameters.

Now, I believe the dimensions property is not important to the training algorithm, which sees the tensor as a flat array. If it doesn't compromise performance, it would be nice if, instead of a tensor&, get_layer_params() returned a user-defined iterator. In ordinary cases, the iterator could be a plain pointer to float, but in my case, the rnn_ layer would return an iterator that directly accesses the parameters of the inner network layers, avoiding that copy.
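A rough sketch of what such an iterator could look like, assuming the parameters live in plain `std::vector<float>` buffers instead of dlib tensors. `chained_param_iterator` and `sgd_step` are hypothetical names made up for this illustration and are not part of dlib or its solvers.

```cpp
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical iterator (not part of dlib): walks the parameter buffers of
// several inner layers as if they were one flat array, without copying them.
class chained_param_iterator {
public:
    using iterator_category = std::forward_iterator_tag;
    using value_type = float;
    using difference_type = std::ptrdiff_t;
    using pointer = float*;
    using reference = float&;

    chained_param_iterator() = default;
    // Pass block == blocks->size() to build the end iterator.
    chained_param_iterator(const std::vector<std::vector<float>*>* blocks,
                           std::size_t block, std::size_t offset = 0)
        : blocks_(blocks), block_(block), offset_(offset) { skip_exhausted(); }

    reference operator*() const { return (*(*blocks_)[block_])[offset_]; }
    chained_param_iterator& operator++() { ++offset_; skip_exhausted(); return *this; }
    chained_param_iterator operator++(int) { auto old = *this; ++*this; return old; }

    bool operator==(const chained_param_iterator& o) const {
        return blocks_ == o.blocks_ && block_ == o.block_ && offset_ == o.offset_;
    }
    bool operator!=(const chained_param_iterator& o) const { return !(*this == o); }

private:
    // Move to the first element of the next non-empty buffer when needed.
    void skip_exhausted() {
        while (block_ < blocks_->size() && offset_ >= (*blocks_)[block_]->size()) {
            ++block_;
            offset_ = 0;
        }
    }

    const std::vector<std::vector<float>*>* blocks_ = nullptr;
    std::size_t block_ = 0;
    std::size_t offset_ = 0;
};

// A trainer that only needs flat access can be written against any iterator,
// so the same code accepts a plain float* or the chained iterator above.
template <typename ParamIt, typename GradIt>
void sgd_step(ParamIt first, ParamIt last, GradIt grad, float learning_rate) {
    for (; first != last; ++first, ++grad)
        *first -= learning_rate * *grad;
}
```

The trade-off is that the chained iterator does a bounds check on every increment, so whether it beats the copy-in, copy-out approach for large parameter sets is an open question.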

So, what do you think? Are changes like these desirable and fit for merging?

Contributor guide

Tech stack: cpp
Area: machine learning
Issue type: feature
Difficulty: 5
Estimated time: over 1 week
Activity status: stale
Clarity: mostly clear
Prerequisites: C++, dlib, neural networks, deep learning
Beginner friendliness: 10
Investigation approach: Examine the discussion in pull request #213 and issue #157 for context. Review the current implementation of neural network layers in dlib, especially the forward() and backward() methods and the tensor class. Evaluate the proposed stack-based caching for forward/backward calls and the iterator-based parameter retrieval.