FluxML/Flux.jl

PyTorch feature parity

Open

#1431 opened on December 19, 2020

help wanted

Description

A list of PyTorch 1.7 features. Items are checked if we have something more or less equivalent in Flux, or in the Julia ecosystem and supported by Flux. This list is not complete; it comes from a rough scan of PyTorch's documentation. Please feel free to add anything I missed in the comments, and anyone with write access is welcome to modify the list. Related issue: https://github.com/FluxML/ML-Coordination-Tracker/issues/16, and more generally anything in https://github.com/FluxML/ML-Coordination-Tracker/issues

PyTorch Features

Conv Layers

Pooling Layers

  • MaxPool1d, MaxPool2d, MaxPool3d
  • MaxUnpool1d, MaxUnpool2d, MaxUnpool3d
  • AvgPool1d, AvgPool2d, AvgPool3d
  • FractionalMaxPool2d
  • LPPool1d, LPPool2d
  • AdaptiveAvgPool1d, AdaptiveAvgPool2d, AdaptiveAvgPool3d
  • AdaptiveMaxPool1d, AdaptiveMaxPool2d, AdaptiveMaxPool3d

Padding Layers

  • ReflectionPad (1d, 2d)
  • ReplicationPad (1d, 2d, 3d) (NNlib.pad_repeat)
  • ZeroPad (2d)
  • ConstantPad (1d, 2d, 3d)
  • Add corresponding layers for all of the above, wrapping the NNlib functions (or keep them as functions). They need to be added to Flux's docs.

Activations

  • ... NNlib has an extensive collection of activations, and any Julia function can be used as well.

Normalization Layers

  • BatchNorm1d, BatchNorm2d, BatchNorm3d
  • LayerNorm
  • GroupNorm
  • InstanceNorm1d, InstanceNorm2d, InstanceNorm3d
  • SyncBatchNorm
  • LocalResponseNorm. Very old unfinished PR #312. It is an outdated technique; we can probably live without it.
  • Move the functional implementations to NNlib.jl (https://github.com/FluxML/NNlib.jl/issues/19)

Recurrent Layers

  • RNN
  • GRU
  • LSTM

Attention Layers

  • Transformer. Well-maintained implementations in Transformers.jl.
  • MultiHeadAttention. Should be moved from Transformers.jl to Flux.jl (make sure it hits the cuDNN kernels). PR #2146

Linear Layers

  • Identity
  • Linear
  • Bilinear

Dropout Layers

  • Dropout
  • Dropout2d, Dropout3d (#1490)
  • AlphaDropout

Sparse Layers

  • Embedding. PR #1516
  • EmbeddingBag. PR #2031

Distance Functions

  • CosineSimilarity. We have this in Distances.jl. Also easy to hand-code. TODO: check whether it is AD and GPU friendly.
  • PairwiseDistance. We have this in Distances.jl. TODO: check whether it is AD and GPU friendly (could use Tullio.jl to achieve both).
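
As a rough illustration of the "easy to hand-code" remark above, here is a minimal sketch using only the LinearAlgebra standard library. The function names, the `eps` guard, and the one-sample-per-column layout are assumptions made for this example; they are not the Distances.jl API, and AD/GPU friendliness has not been checked here either.

```julia
using LinearAlgebra

# Cosine similarity between two vectors.
# `eps` (an assumed default) guards against division by zero for zero vectors.
cosine_similarity(x, y; eps=1f-8) = dot(x, y) / max(norm(x) * norm(y), eps)

# Euclidean distance between corresponding columns of X and Y
# (assumed layout: each column is one sample).
pairwise_distance(X, Y) = vec(sqrt.(sum(abs2, X .- Y; dims=1)))
```

Both functions are written with broadcasting and reductions only, which is usually a reasonable starting point for AD and GPU arrays, but that still needs to be verified as the TODOs say.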

Loss Functions

  • ... We should be well covered here.
  • CTCLoss. Being implemented in #1287 (TODO: remove separate GPU case, integrate with cuDNN).

Vision Layers

  • PixelShuffle. #1468
  • Upsample (for 1d, 2d, and 3d). (partially done in #1468)
    • 'nearest'
    • 'linear' (CPU version merged in NNlib, CUDA PR still to come)
    • 'bilinear'
    • 'bicubic'
    • 'trilinear' (CPU version merged in NNlib, CUDA PR still open)

Initialization

  • xavier_uniform, xavier_normal. Called glorot here.
  • kaiming_normal, kaiming_uniform
  • sparse
  • orthogonal (#1496)
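
For readers coming from PyTorch, a minimal sketch of what Glorot (Xavier) uniform initialization computes for a dense weight matrix is given below. `my_glorot_uniform` is a hypothetical name used only for illustration and is not Flux's own implementation; it assumes a `fan_out × fan_in` matrix and Float32 weights.

```julia
using Random

# Glorot (Xavier) uniform: entries drawn from U(-b, b)
# with b = sqrt(6 / (fan_in + fan_out)).
# `my_glorot_uniform` is a hypothetical helper, not part of Flux.
function my_glorot_uniform(rng::AbstractRNG, fan_out::Integer, fan_in::Integer)
    b = sqrt(6f0 / (fan_in + fan_out))
    # rand gives values in [0, 1); rescale them to [-b, b).
    return (2f0 .* rand(rng, Float32, fan_out, fan_in) .- 1f0) .* b
end

W = my_glorot_uniform(MersenneTwister(0), 4, 3)
```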

Parallelism and Distributed

  • DataParallel
  • DistributedDataParallel (solved by https://github.com/DhairyaLGandhi/DaggerFlux.jl)
  • set_num_threads, set_num_interop_threads. Not sure which operations are parallelized in PyTorch. Here we have parallelization only in BLAS operations.

Distributions

  • Differentiation rules for logpdf are offered by DistributionsAD.jl.
  • rsample. Differentiating with respect to parameters through sampling is supported by many distributions: gradient(mu -> rand(Normal(mu, 1)), 0) == (1,).
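
The gradient of 1 quoted above follows from the reparameterization trick: a sample from Normal(mu, sigma) can be written as mu + sigma * eps with eps drawn from a standard normal, so the sample is a deterministic function of mu. The sketch below checks this numerically with the standard library only; `rsample` and `dsample_dmu` are hypothetical helpers written for this example (not DistributionsAD.jl or Zygote functions), and a finite difference stands in for AD.

```julia
using Random

# Reparameterized sample from Normal(mu, sigma): the randomness lives in
# eps ~ N(0, 1), so the result is differentiable in mu and sigma.
rsample(rng, mu, sigma) = mu + sigma * randn(rng)

# Central finite difference in mu, reusing the same seed (hence the same eps)
# on both sides so that only mu changes.
function dsample_dmu(mu, sigma; seed=42, h=1e-6)
    hi = rsample(MersenneTwister(seed), mu + h, sigma)
    lo = rsample(MersenneTwister(seed), mu - h, sigma)
    return (hi - lo) / (2h)
end

d = dsample_dmu(0.0, 1.0)
```

Up to floating-point error, `d` should be close to 1, matching the gradient stated in the list item.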

ONNX

FFT

  • ... Zygote has the adjoints for AbstractFFTs.

Quantization

  • ...

Pruning

  • WIP pruning package here

Optim

  • schedulers #1434 and #1506, also see ParameterSchedulers.jl
    • Integrate with Flux's optimizers? (See https://github.com/FluxML/Optimisers.jl/pull/15)
    • Document in Flux (see #1511 and #1513)
    • Reexport in Flux (see #1506) (TBD)
    • LambdaLR (handled in ParameterSchedulers.jl)
    • MultiplicativeLR (handled in ParameterSchedulers.jl)
  • optimizers
    • SGD (+ momentum)
    • Adam
    • AdaGrad
    • AdaDelta
    • RMSprop
    • LBFGS. Integration with Optim.jl

LinAlg

  • det
  • norm

Tensorboard

XLA

Misc

  • PyTorch has both layers and their functional counterparts.
  • einsum. AD- and CUDA-compatible Einstein summation is provided by Tullio.jl and other packages.
    • Add documentation to Flux.jl
  • LazyModuleMixin (PyTorch 1.8). PR #2078
  • weight_norm. Attempt in #1005, PR #2053
  • modules iterator. #1444
  • spectral_norm. Old attempt in #115

PyTorch Extras

Torchvision

Torchaudio ...

Torchtext ...
