Large Language Models as Open Source
So when Prof Jerry John Kponyo asks "What datasets do we need to ensure responsible AI?", my reply is that we need so much more than good datasets to ensure responsible AI. I honestly think that democratic countries need to join together to develop LLMs that are open source, open to researchers, and designed explicitly with the betterment of human lives in mind. I think the LLMs of the future will be like electricity or water: something everyone will need and use for most everyday tasks. For that reason, we need to create models for which we can ensure transparency and equal access for everyone.
The training data for these models should, of course, include much richer data, like the Afro-centric datasets developed by Prof Kponyo's fantastic Responsible AI Lab. There is no doubt we need models that reflect the entire human experience and are not based solely on Western European and North American data.
That said, my answer has to circle back to the fact that no matter how rich our datasets become, our models will have shortcomings and biases. By owning the models, however, we can identify those shortcomings in an open and public setting – and iterate to make the next model a little less imperfect.
After four texts from four countries, I would now like to return to Enkelejda Kasneci. She began our global dialog with thoughts on AI and the education system – and will conclude this series with a look at the perspectives of all the contributors.