Policy Diffusion Transformer
Policy-based PyTorch implementation of the encoder foundation.
- Input: 6314-dim embedding
- Encoder: 22 Transformer layers, each with 40 attention heads
- Output: ROUGE-L projection
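The architecture above can be sketched with PyTorch's built-in `nn.TransformerEncoder`. This is a minimal sketch under stated assumptions, not the original implementation: the class name `PolicyEncoder` and the `out_dim` parameter are hypothetical, the encoder is assumed to run directly at the embedding width, and the "ROUGE-L projection" is stood in for by a plain `nn.Linear` head because the source does not define it. `nn.MultiheadAttention` requires the model width to be divisible by the number of heads, and 6314 / 40 = 157.85, so the documented values would be rejected as written; the sketch checks this up front and raises a clear error.

```python
import torch
import torch.nn as nn


class PolicyEncoder(nn.Module):
    """Hypothetical encoder: stacked Transformer layers followed by an output head."""

    def __init__(self, d_model: int, num_layers: int, num_heads: int, out_dim: int):
        super().__init__()
        # Multi-head attention splits d_model evenly across heads.
        if d_model % num_heads != 0:
            raise ValueError(
                f"d_model={d_model} is not divisible by num_heads={num_heads}"
            )
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=num_heads, batch_first=True
        )
        # num_layers independent copies of the layer above.
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        # Assumption: a linear head stands in for the "ROUGE-L projection".
        self.output_proj = nn.Linear(d_model, out_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model) -> (batch, seq_len, out_dim)
        return self.output_proj(self.encoder(x))


# Small demo configuration; the documented sizes are far larger.
model = PolicyEncoder(d_model=64, num_layers=2, num_heads=4, out_dim=8)
print(model(torch.randn(3, 5, 64)).shape)
```

With the documented depth and head count (`num_layers=22`, `num_heads=40`) and `d_model=6314`, the constructor raises `ValueError`; the source does not say how the original reconciles the embedding width with the head count (for example, by projecting the input to a divisible width first).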
Training config
optimizer=Adadelta, lr=0.192, scheduler=exponential, warmup=1386
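The training config maps onto `torch.optim.Adadelta` with a `torch.optim.lr_scheduler.LambdaLR` schedule. This is a sketch under assumptions: `warmup=1386` is read as 1386 optimizer steps of linear warmup before exponential decay begins, the decay rate `GAMMA` is a hypothetical placeholder because the source gives none, and `lr_factor` and `build_optimizer` are illustrative helper names.

```python
import torch
import torch.nn as nn
from torch.optim import Adadelta
from torch.optim.lr_scheduler import LambdaLR

BASE_LR = 0.192
WARMUP_STEPS = 1386
GAMMA = 0.999  # hypothetical: the exponential decay rate is not given in the source


def lr_factor(step: int) -> float:
    """Multiplier applied to BASE_LR at a given scheduler step."""
    if step < WARMUP_STEPS:
        # Linear warmup: reaches 1.0 (the full BASE_LR) at step WARMUP_STEPS - 1.
        return (step + 1) / WARMUP_STEPS
    # Exponential decay, starting from 1.0 at step WARMUP_STEPS.
    return GAMMA ** (step - WARMUP_STEPS)


def build_optimizer(model: nn.Module):
    """Hypothetical helper: Adadelta plus warmup/exponential-decay schedule."""
    optimizer = Adadelta(model.parameters(), lr=BASE_LR)
    # LambdaLR sets lr = BASE_LR * lr_factor(step) on every scheduler.step().
    scheduler = LambdaLR(optimizer, lr_lambda=lr_factor)
    return optimizer, scheduler
```

Under this reading, `scheduler.step()` is called once per optimizer step rather than once per epoch, so that the warmup length is counted in steps.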