http://ipkitten.blogspot.com/2024/09/participatory-research-and-joint.html
One of the key ways in which African communities (whether geographically bounded or bounded by common interests) are addressing the challenge of the low-resource nature of their languages in the machine learning and natural language processing space is through participatory research and open licensing of African [language] datasets arising from such research.
For participatory research, which is the focus of this post, one pertinent issue is how to quantify and recognise the contributions of key participants in the creation of African datasets in a way that is accurate and equitable. When existing standard open licences are used to distribute and share datasets and other outputs from participatory research, those licences may not appropriately capture and address the contributions and interests of research contributors and participants. A particularly compelling illustration of this challenge is the plethora of African language datasets that are openly shared on digital platforms under existing standard open licences without acknowledging, or catering to the interests of, the data sources, content creators, translators, data curators, language technologists and data evaluators who understand the languages involved. These stakeholders have diverse needs, ranging from access to technology and electricity, to language preprocessors, MT toolkits and keyboards, to access to compute resources.
Creating and producing African datasets is rarely a solo venture, hence the concept of participatory research. Various persons and organisations with different interests come together and collaborate to produce these African language datasets. It is both important and ethical to acknowledge their significant contributions and input, and to make provision for recognising them.
Each African dataset creation/production community can establish a custom for acknowledging and, where appropriate, fairly compensating contributions, especially contributions that are material to the African dataset that emerges. This post presents the African dataset creation split sheet as one way to address this issue.
From music split sheets to African dataset creation split sheet
A split sheet is a widely recognised document in the music publishing (and even recording) industry and is used to identify and name all of the key creative contributors to the song creation process (i.e. songwriters and sometimes producers) as well as the respective percentage of contribution (‘splits’) of each named contributor. To be effective and useful, split sheets are handed out to collaborators prior to the start of recording of the composition. All collaborators discuss splits and list the percentages on the sheet as soon as the composition is completed. Copies of the signed split sheet are handed over to all collaborators and also lodged with publishing companies and collective management organisations.
Split sheets are neither copyright agreements nor licence arrangements. However, given the way copyright law defines works of joint authorship, royalties from songs or musical compositions created without split sheets would likely be divided equally among all involved contributors or, worse, have one person or organisation taking up royalties that actually belong to several persons jointly. Per the South African Copyright Act, a “work of joint authorship” means “a work produced by the collaboration of two or more authors in which the contribution of each author is not separable from the contribution of the other author or authors”. See also this Katpost analysing a UK Court of Appeal decision on joint authorship. As such, split sheets are helpful and necessary to accurately capture contributions to compositions and, by extension, royalty splits amongst composers.
Deciding on percentage splits
As with any jointly authored or collaborative project, people are not always on the same page about what percentage they feel they contributed. This can make it complicated to settle on a split that feels fair to everyone. However, certain splits are customary based on genre. Whatever the split formula, ownership shares of a given song must add up to 100% in total. Similarly, African dataset creation communities can establish a custom for splits depending on the form of dataset creation/production project. For example, the approach of the Masakhane Research community, according to its Authorship Guidelines, is that an individual would be invited to be an author if they have contributed one or more of results, data, lived experiences, code, etc. to the research. This thinking, and the split sheet approach, may be extended to participatory research for datasets to capture the contributions of all parties to the creation of a given dataset.
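The rule that shares must total 100% is easy to check mechanically. The following is a minimal sketch in Python and is not part of the split sheet itself: the function names `validate_splits` and `equal_splits`, and the roles and percentages in the usage note, are hypothetical. `equal_splits` models only the equal-division default described earlier for works created without a split sheet.

```python
from fractions import Fraction


def validate_splits(splits):
    """Return True if every share is positive and the shares total exactly 100%.

    `splits` maps a contributor (hypothetical names or roles) to a percentage.
    Fractions are used so that shares such as one third do not suffer
    floating-point rounding.
    """
    for name, share in splits.items():
        if share <= 0:
            raise ValueError(f"{name} has a non-positive share: {share}")
    total = sum(splits.values(), Fraction(0))
    if total != 100:
        raise ValueError(f"shares total {total}%, expected 100%")
    return True


def equal_splits(contributors):
    """Divide 100% equally among contributors (the default absent an agreed split)."""
    if not contributors:
        raise ValueError("at least one contributor is required")
    share = Fraction(100, len(contributors))
    return {name: share for name in contributors}
```

For instance, `validate_splits({"translator": Fraction(40), "data curator": Fraction(35), "data evaluator": Fraction(25)})` passes, while a sheet whose shares total 90% raises a `ValueError`. How the percentages themselves are arrived at remains a matter for the community's custom, not for the code.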
Extrapolating songwriter split sheet information to the datasets creation environment
Songwriter split sheets indicate each contributor’s percentage of contribution and also their percentage of ownership. Generally, split sheets contain information such as the song title; the legal name, role and percentage owned by each contributor to the song; the collective management organisation, publishing company and/or record label of each contributor; contact information, including the email address, of each contributor; and the signature of each party.
In many ways, the creation and production of African language datasets is similar to the process of songwriting, composition and recording. Both processes involve various persons and organisations with different interests coming together and collaborating to produce protectable materials. As such, extending the songwriter split sheet approach to the African dataset creation environment, to capture the contributions of all parties to the creation of a given dataset, means not just requiring every key contributor to provide some of their bio data but also requiring them to reflect on their role and contribution to the datasets created. This and related information has been condensed into a form – the African Datasets creation split sheet – to provide a tool for those working on or interested in the creation of African datasets to capture key contributions and address diverse interests.
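To make the analogy concrete, a dataset split sheet could be represented as a simple record. The sketch below is illustrative only and does not reproduce the actual form: the class and field names are hypothetical and simply mirror the songwriter split sheet fields listed above (title, legal name, role, percentage, organisation, contact details, signature), plus a free-text note for each contributor's reflection on their role.

```python
from dataclasses import dataclass, field
from fractions import Fraction
from typing import List


@dataclass
class Contributor:
    # All field names are assumptions modelled on a songwriter split sheet.
    legal_name: str
    role: str                # e.g. translator, data curator, data evaluator
    contribution_note: str   # the contributor's own reflection on their role
    share_percent: Fraction  # agreed percentage of contribution
    affiliation: str         # organisation or community, if any
    email: str
    signed: bool = False


@dataclass
class DatasetSplitSheet:
    dataset_title: str
    contributors: List[Contributor] = field(default_factory=list)

    def total_share(self) -> Fraction:
        """Sum of all agreed shares, as a percentage."""
        return sum((c.share_percent for c in self.contributors), Fraction(0))

    def is_complete(self) -> bool:
        """True once shares total 100% and every contributor has signed."""
        return (
            bool(self.contributors)
            and self.total_share() == 100
            and all(c.signed for c in self.contributors)
        )
```

A sheet built this way stays incomplete until every listed contributor has signed and the shares total 100%, which matches the practice of circulating signed copies to all collaborators.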
What do readers think about the utility and/or efficacy of this tool? Are there revisions that readers consider necessary?
Content reproduced from The IPKat as permitted under the Creative Commons Licence (UK).