⁷Note that the neela and the mutli models in Table 2 were trained with a lower dimension than the best-performing model, so their results are not directly comparable with those of the other architectures.