The partner organizations behind Lacuna Fund have announced the second cohort of funded projects supporting Natural Language Processing (NLP) technologies across Africa.

Funding recipients will create openly accessible text and speech datasets that will fuel NLP technologies in 29 languages across Africa. The teams will be required to produce training datasets for languages in Eastern, Western, and Southern Africa. These datasets will support a range of needs for low-resource languages, including machine translation, speech recognition, named entity recognition, part-of-speech tagging, and sentiment analysis, as well as multimodal datasets. All datasets produced will be locally developed and owned, and will be openly accessible to the international data community.

Lacuna Fund received over 50 applications from, or in partnership with, organizations across Africa. While all of these applicants, and many others, are poised for impact, the selected projects include:

  • Building an Annotated Spoken Corpus for Igbo NLP Tasks – University of Ibadan/Nweya
  • Entity Recognition and Parts of Speech Datasets for African Languages – K4A/Nabende
  • Open Source Datasets for Local Ghanaian Languages: A Case for Twi and Ga – Ashesi University/Boateng
  • Masakhane MT: Decolonizing Scientific Writing for Africa – K4A/Abbott
  • Building NLP Text and Speech Datasets for Low Resourced Languages in East Africa – Makerere/Katumba
  • Multimodal Datasets for Bemba – University of Zambia/Sikasote

Speaking during the announcement, FAIR Forward’s Balthas Seibold said, “If we want to seriously level the playing field, we not only need to invest in open training data, computing power and machine learning expertise, but raise attention and bring visibility to the growing African technology ecosystems. The Lacuna Fund builds on a recent groundswell of momentum to create better and more open NLP tools in African languages from machine learning community members, including academic workshops and programs, volunteer collaborations, startup projects, and other efforts.”

Participating in the Lacuna Fund complements FAIR Forward’s activities to build skills and capacities that can make direct use of these datasets to create AI-based solutions. In addition, FAIR Forward creates its own open AI training datasets, especially for voice recognition in low-resource languages, which also contribute to the work of the Lacuna Fund projects.