Pivot Pre-finetuning for Low Resource MT: A Case Study in Kikamba

Abstract

Current approaches to performant machine translation often require large amounts of data (Koehn et al., 2022). However, the majority of the world's 7,000+ languages lack substantial digitized, organized text and are considered low-resource. In practical terms, this often means a substantial gap in translation quality between high- and low-resource language pairs. We explore the intersection of rapid NMT adaptation techniques and pre-trained sequence-to-sequence models to better leverage multilingual models, presenting a case study on Kikamba.
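To make the two-stage idea concrete, here is a minimal sketch of pivot pre-finetuning with a pretrained multilingual sequence-to-sequence checkpoint: first fine-tune on a higher-resource, related pivot pair (Swahili-English here), then continue fine-tuning on the low-resource Kikamba-English pair. The base model, language tags, file paths, column names, and hyperparameters are illustrative assumptions, not the paper's exact setup.

```python
# Minimal sketch: (1) pre-finetune a pretrained multilingual seq2seq model
# on a higher-resource pivot pair, then (2) continue fine-tuning on the
# low-resource Kikamba pair. All names below are illustrative assumptions.
from datasets import load_dataset
from transformers import (
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

checkpoint = "facebook/m2m100_418M"  # assumed base model, not the paper's
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

# Kikamba has no dedicated tag in this checkpoint, so we reuse the Swahili
# tag for both stages -- a pragmatic assumption, not the paper's method.
tokenizer.src_lang, tokenizer.tgt_lang = "sw", "en"

def preprocess(batch):
    # The "src"/"tgt" column names are assumptions about the corpus format.
    inputs = tokenizer(batch["src"], truncation=True, max_length=128)
    labels = tokenizer(text_target=batch["tgt"], truncation=True, max_length=128)
    inputs["labels"] = labels["input_ids"]
    return inputs

def finetune(data_file, output_dir, epochs):
    # Each stage tokenizes its parallel corpus and updates the same model.
    ds = load_dataset("json", data_files=data_file)["train"].map(
        preprocess, batched=True, remove_columns=["src", "tgt"]
    )
    args = Seq2SeqTrainingArguments(
        output_dir=output_dir,
        per_device_train_batch_size=16,
        num_train_epochs=epochs,
        learning_rate=5e-5,
    )
    Seq2SeqTrainer(
        model=model,
        args=args,
        train_dataset=ds,
        data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
        tokenizer=tokenizer,
    ).train()

# Stage 1: pivot pre-finetuning on a related, higher-resource pair.
finetune("swa_eng_parallel.json", "ckpt_pivot", epochs=3)
# Stage 2: adapt to Kikamba-English, warm-started from stage 1.
finetune("kam_eng_parallel.json", "ckpt_kikamba", epochs=10)
```

The design choice worth noting is that both stages share one set of weights: stage 2 starts from the pivot-adapted parameters rather than the raw checkpoint, which is what lets the related pivot language transfer to Kikamba.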

Publication
ICLR 2023 Tiny Papers
Stephen Kiilu
NLP Researcher

My research interests include machine learning, multilingual NLP, and low-resource NLP.