Description
A list of PyTorch 1.7 features. An item is checked if Flux, or a package in the Julia ecosystem that Flux supports, offers something more or less equivalent. The list is not complete; it comes from a rough scan of PyTorch's documentation. Please add anything I missed in the comments; anyone with write access can edit the list directly. Related issue: https://github.com/FluxML/ML-Coordination-Tracker/issues/16, and more generally anything in https://github.com/FluxML/ML-Coordination-Tracker/issues
PyTorch Features
Conv Layers
- `Conv1d`, `Conv2d`, `Conv3d`.
- `ConvTranspose1d`, `ConvTranspose2d`, `ConvTranspose3d`.
- groups in convolution layers
- `Fold`, `Unfold`. In progress: https://github.com/FluxML/NNlib.jl/pull/444
Pooling Layers
- `MaxPool1d`, `MaxPool2d`, `MaxPool3d`
- `MaxUnpool1d`, `MaxUnpool2d`, `MaxUnpool3d`
- `AvgPool1d`, `AvgPool2d`, `AvgPool3d`
- `FractionalMaxPool2d`
- `LPPool1d`, `LPPool2d`
- `AdaptiveAvgPool1d`, `AdaptiveAvgPool2d`, `AdaptiveAvgPool3d`
- `AdaptiveMaxPool1d`, `AdaptiveMaxPool2d`, `AdaptiveMaxPool3d`
Padding Layers
- ReflectionPad (1d,2d)
- ReplicationPad (1d,2d,3d) (NNlib.pad_repeat)
- ZeroPad (2d)
- ConstantPad (1d,2d,3d)
- Add corresponding layers for all of the above wrapping the NNlib functions, or keep them as functions. Need to add them to Flux's docs.
Activations
- ... . NNlib has an extensive collection of activation functions, and any Julia function can be used as an activation.
Normalization Layers
- `BatchNorm1d`, `BatchNorm2d`, `BatchNorm3d`
- `LayerNorm`
- `GroupNorm`
- `InstanceNorm1d`, `InstanceNorm2d`, `InstanceNorm3d`
- `SyncBatchNorm`
- `LocalResponseNorm`. Very old unfinished PR #312. It is an outdated technique; we can probably live without it.
- Move the functional implementations to NNlib.jl (https://github.com/FluxML/NNlib.jl/issues/19)
Recurrent Layers
- `RNN`
- `GRU`
- `LSTM`
Attention Layers
- `Transformer`. Well-maintained implementations in Transformers.jl.
- MultiHeadAttention. Should be moved from Transformers.jl to Flux.jl (ensure hitting cudnn kernels). PR #2146
Linear Layers
- `Identity`
- `Linear`
- `Bilinear`
Dropout Layers
- `Dropout`
- `Dropout2d`, `Dropout3d` (#1490)
- `AlphaDropout`
Sparse Layers
- `Embedding` PR #1516
- `EmbeddingBag` PR #2031
Distance Functions
- `CosineSimilarity`. We have this in Distances.jl. Also easy to hand-code. TODO: check if AD and GPU friendly.
- `PairwiseDistance`. We have this in Distances.jl. TODO: check if AD and GPU friendly (could use Tullio.jl to achieve both)
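
As a rough illustration of the "easy to hand-code" route, here is a minimal plain-Julia sketch. The function names, the column-per-sample layout and the `eps` guard are assumptions made for this example; they are not the Distances.jl or Flux API. The code uses only broadcasts and reductions, which is usually a good starting point for AD and GPU compatibility, but that still needs to be checked.

```julia
# Cosine similarity between corresponding columns of x and y
# (one sample per column). `eps` guards against division by zero.
function cosine_similarity(x::AbstractMatrix, y::AbstractMatrix; eps = 1e-8)
    num = sum(x .* y; dims = 1)
    nx = sqrt.(sum(abs2, x; dims = 1))
    ny = sqrt.(sum(abs2, y; dims = 1))
    return vec(num ./ max.(nx .* ny, eps))
end

# p-norm distance between corresponding columns of x and y.
function pairwise_distance(x::AbstractMatrix, y::AbstractMatrix; p = 2)
    return vec(sum(abs.(x .- y) .^ p; dims = 1) .^ (1 / p))
end

x = [1.0 0.0; 0.0 1.0]
y = [1.0 1.0; 0.0 0.0]
cosine_similarity(x, y)   # first columns are parallel, second columns are orthogonal
pairwise_distance(x, y)   # first columns are identical, second differ in both entries
```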
Loss Functions
- .... . We should be well covered here.
- CTCLoss. Being implemented in #1287 (TODO: remove separate GPU case, integrate with cudnn)
Vision Layers
- `PixelShuffle`. #1468
- `Upsample` (for 1d, 2d, and 3d). (partially done in #1468)
  - 'nearest'
  - 'linear' (CPU version merged in NNlib, CUDA PR still to come)
  - 'bilinear'
  - 'bicubic'
  - 'trilinear' (CPU version merged in NNlib, CUDA PR still open)
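
For reference, the 'nearest' mode is simple enough to sketch in plain Julia. The helper name and the WHCN (width, height, channels, batch) layout are assumptions made for this example; this is not the NNlib implementation.

```julia
# Nearest-neighbour upsampling of a WHCN array: every spatial element is
# repeated `scale` times along each of the two spatial dimensions.
upsample_nearest_sketch(x::AbstractArray{T,4}, scale::Integer) where {T} =
    repeat(x; inner = (scale, scale, 1, 1))

x = reshape(Float32[1 2; 3 4], 2, 2, 1, 1)
y = upsample_nearest_sketch(x, 2)
size(y)         # (4, 4, 1, 1)
y[:, :, 1, 1]   # each input element becomes a 2×2 block
```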
Initialization
- `xavier_uniform`, `xavier_normal`. Called `glorot` here.
- `kaiming_normal`, `kaiming_uniform`
- `sparse`
- `orthogonal` (#1496)
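
To make the correspondence concrete, the sketch below writes out the standard Glorot (Xavier) uniform and Kaiming (He) normal formulas in plain Julia. The `_sketch` function names and keyword defaults are hypothetical and chosen for this example only; Flux ships its own initializers, which should be preferred in practice.

```julia
using Random

# Glorot (Xavier) uniform: entries drawn from U(-a, a) with
# a = gain * sqrt(6 / (fan_in + fan_out)).
function glorot_uniform_sketch(rng::AbstractRNG, fan_out::Integer, fan_in::Integer; gain = 1)
    a = Float32(gain) * sqrt(6.0f0 / (fan_in + fan_out))
    return (rand(rng, Float32, fan_out, fan_in) .* 2 .- 1) .* a
end

# Kaiming (He) normal: entries drawn from N(0, std^2) with
# std = gain / sqrt(fan_in); gain = sqrt(2) is the usual choice for ReLU.
function kaiming_normal_sketch(rng::AbstractRNG, fan_out::Integer, fan_in::Integer; gain = sqrt(2))
    std = Float32(gain) / sqrt(Float32(fan_in))
    return randn(rng, Float32, fan_out, fan_in) .* std
end

W = glorot_uniform_sketch(MersenneTwister(0), 4, 3)   # 4×3 Float32 matrix
V = kaiming_normal_sketch(MersenneTwister(0), 4, 3)   # 4×3 Float32 matrix
```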
Parallelism and Distributed
- `DataParallel`
- `DistributedDataParallel` (solved by https://github.com/DhairyaLGandhi/DaggerFlux.jl)
- `set_num_threads`, `set_num_interop_threads`. Not sure which operations are parallelized in PyTorch. Here we have parallelization only in BLAS operations.
Distributions
- diff rules for `logpdf` offered by DistributionsAD.jl
- `rsample`. Differentiating through sampling with respect to a distribution's parameters is supported by many distributions: `gradient(mu -> rand(Normal(mu, 1)), 0) == (1,)`.
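
The gradient quoted above follows from the reparameterization trick: a sample from Normal(mu, sigma) can be written as `mu + sigma * eps` with `eps` drawn from a standard normal, so its derivative with respect to `mu` is 1. The sketch below checks this with a finite difference in plain Julia; the helper names are hypothetical, and this is not how DistributionsAD.jl or Zygote implement it.

```julia
using Random

# Reparameterized sample from Normal(mu, sigma): the randomness lives in the
# standard-normal draw, so the sample is a differentiable function of mu and sigma.
rsample_normal(rng::AbstractRNG, mu, sigma) = mu + sigma * randn(rng)

# Central finite difference in mu, reusing the same seed so that both
# evaluations see the same noise draw.
function dsample_dmu(mu, sigma; seed = 1, h = 1e-3)
    hi = rsample_normal(MersenneTwister(seed), mu + h, sigma)
    lo = rsample_normal(MersenneTwister(seed), mu - h, sigma)
    return (hi - lo) / (2h)
end

dsample_dmu(0.0, 1.0)   # close to 1, up to floating-point error
```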
ONNX
- Current best support in ONNXmutable. See this discussion
FFT
- ... . Zygote has the adjoints for `AbstractFFTs`.
Quantization
- ...
Pruning
- WIP pruning package here
Optim
- schedulers #1434 and #1506, also see ParameterSchedulers.jl
  - Integrate with Flux's optimizers? (See https://github.com/FluxML/Optimisers.jl/pull/15)
  - Document in Flux (see #1511 and #1513)
  - [ ] Reexport in Flux (see #1506) (TBD)
  - `LambdaLR` (handled in ParameterSchedulers.jl)
  - `MultiplicativeLR` (handled in ParameterSchedulers.jl)
- optimizers
  - SGD (+ momentum)
  - Adam
  - AdaGrad
  - AdaDelta
  - RMSprop
  - LBFGS. Integration with Optim.jl
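
For readers unfamiliar with the two PyTorch schedulers named in the list above, the sketch below spells out their update rules in plain Julia. The helper names and the step counter starting at 0 are assumptions made for this example; this is not the ParameterSchedulers.jl API.

```julia
# LambdaLR-style schedule: lr(t) = base_lr * f(t), with t counted from 0.
lambda_lr(base_lr, f, t::Integer) = base_lr * f(t)

# MultiplicativeLR-style schedule: lr(0) = base_lr and
# lr(t) = lr(t - 1) * f(t) for t >= 1.
function multiplicative_lr(base_lr, f, t::Integer)
    lr = base_lr
    for s in 1:t
        lr *= f(s)
    end
    return lr
end

lambda_lr(0.1, t -> 0.5^t, 2)          # 0.1 * 0.25
multiplicative_lr(0.1, t -> 0.5, 2)    # 0.1 * 0.5 * 0.5
```

With a constant factor the two rules coincide, as in the calls above; they differ once the factor depends on the step.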
LinAlg
- `det`
- `norm`
Tensorboard
- integration offered by TensorBoardLogger.jl
XLA
- Some work in XLA.jl
Misc
- PyTorch has both layers and their functional counterparts.
- `einsum`. AD- and CUDA-compatible Einstein summation is provided by Tullio.jl and other packages
  - add documentation to Flux.jl
- LazyModuleMixin (pytorch 1.8) PR #2078
- `weight_norm`. Attempt in #1005, PR #2053
- modules iterator. #1444
- `spectral_norm`. Old attempt in #115
PyTorch Extras
Torchvision
- datasets. Some are implemented in DLDatasets.jl (unreleased), some in FastAI.jl, some in MLDatasets.jl, many are missing.
  - Will consolidate in MLDatasets.jl (see https://github.com/lorenzoh/DLDatasets.jl/issues/1)
- models. Some are implemented in Metalhead.jl, but it is a bit stale and not comprehensive.
  - Metalhead's PR should add a bunch of models and generally revive the repo
  - We should expose the possibility to load pretrained weights
- io
- transforms. Some unreleased work in DataAugmentation.jl
Torchaudio ...
Torchtext ...