Rakuten interview question

Transformer architecture, loss function, NumPy
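
The note does not record the exact question. As a minimal sketch of how the three listed topics can fit together, the following assumes single-head scaled dot-product attention (the core operation of the Transformer) and cross-entropy as the loss, both written in plain NumPy. The function names and shapes are illustrative assumptions, not taken from the interview.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the max along the axis for numerical stability before exponentiating.
    shifted = x - np.max(x, axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=axis, keepdims=True)


def scaled_dot_product_attention(q, k, v):
    # Assumed shapes: q and k are (seq_len, d_k), v is (seq_len, d_v).
    # Returns the attended values (seq_len, d_v) and the weights (seq_len, seq_len).
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)      # similarity of every query with every key
    weights = softmax(scores, axis=-1)   # each row sums to 1
    return weights @ v, weights


def cross_entropy(logits, targets):
    # logits: (n, num_classes); targets: (n,) integer class ids.
    # The small epsilon guards against log(0).
    probs = softmax(logits, axis=-1)
    n = logits.shape[0]
    return -np.mean(np.log(probs[np.arange(n), targets] + 1e-12))
```

This sketch leaves out masking, multiple heads, and learned projection matrices. A full answer would likely add those on top of the same three functions.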