Neural Machine Translation of Rare Words with Subword Units | https://arxiv.org/abs/1508.07909 | |
---|---|---|
Root Mean Square Layer Normalization | https://dl.acm.org/doi/pdf/10.5555/3454287.3455397 | |
RoFormer: Enhanced Transformer with Rotary Position Embedding | https://arxiv.org/abs/2104.09864 | |