Google and a consortium of Africa’s leading research institutions have unveiled WAXAL, a massive, open-access speech dataset.

While voice-activated assistants and real-time translation have become staples of modern life in the West, most of Africa’s 2,000+ languages have been left behind due to a lack of high-quality training data. WAXAL (which means “to speak” in Wolof) changes that narrative by providing foundational data for 21 Sub-Saharan languages, ranging from Hausa and Yoruba to Luganda and Acholi.

The WAXAL project is the result of an intensive three-year collaboration funded by Google. The dataset comprises:

  • 1,250 hours of transcribed, natural speech for conversational AI.
  • 20+ hours of high-fidelity studio recordings, specifically designed to help developers create natural-sounding synthetic voices.

“The ultimate impact of WAXAL is the empowerment of people in Africa,” said Aisha Walcott-Bryant, Head of Google Research Africa. “This dataset provides the critical foundation for students, researchers, and entrepreneurs to build technology on their own terms, reaching over 100 million people.”

Unlike many global tech projects, WAXAL was built on a principle of local sovereignty. Data collection was led by African academic and community organizations, including Makerere University (Uganda), the University of Ghana, and Digital Umuganda (Rwanda).

Crucially, these partner institutions retain full ownership of the data. This “community-first” framework ensures that the intellectual property and cultural nuances of the data remain in the hands of the people who created it.

“For AI to have a real impact in Africa, it must speak our languages and understand our contexts,” said Joyce Nakatumba-Nabende of Makerere University. “In Uganda, it has already strengthened our local research capacity and supported new student and faculty-led projects.”

The reach of the project is vast, covering a diverse linguistic map:

  • West Africa: Akan, Ewe, Fante, Fulani, Hausa, Igbo, Yoruba, and more.
  • East Africa: Luganda, Swahili, Kikuyu, Dholuo, and Acholi.
  • Southern/Central Africa: Shona, Lingala, and Malagasy.

The complete WAXAL collection is now publicly available on Hugging Face under an open license, inviting a new wave of African-led innovation.