enhancementhelp wanted
Description
May I ask if there is any plan adding the coverage attention mechanism (https://arxiv.org/pdf/1601.04811.pdf) and coverage loss (https://arxiv.org/pdf/1704.04368.pdf) to the decoder, as these could potentially help alleviating the repetition problem in generation?
Or, any hints on a quick implementation? Thanks!