textaugment
TextAugment: Text Augmentation Library

covid19za
Coronavirus COVID-19 (2019-nCoV) Data Repository and Dashboard for South Africa

covid19africa
Africa open COVID-19 data working group

masakhane-web
Masakhane Web is a translation web application solely for African languages.

PuoBERTa
A RoBERTa-based language model specially designed for Setswana, using the new PuoData dataset.

gov-za-multilingual
The data set contains cabinet statements from the South African government. Data was scraped from the government's website: https://www.gov.za/cabinet-statements

Higher_Education_EDA
An EDA Git repository for education researchers and practitioners

project-state-capture
Zondo Commission or State Capture Commission Transcripts

za-mavito
DSFSI South African Terminology Lists and Lexicon Project

dsfsi-datasets
Datasets made available for different small projects

PuoData
Curated corpora for Setswana. Used to train PuoBERTa.

za-bank-risk
This repository is an initial pipeline for reading, processing, labelling and classifying unstructured annual reports of South African (SA) banks with the aim of identifying financial risk. It leverages work on the Corporate Financial Information Environment-Final Report Structure Extractor (CFIE-FRSE) by El-Haj et al., which created a corpus of annual reports of United Kingdom (UK) companies.

sa-parliament
South African Member of Parliament Data

embedding-eval-data
Embedding Evaluation Data for South African Languages

2020-AMMI-salomon

zabantu-beta
ZaBantu is a fleet of light-weight Masked Language Models for Southern Bantu Languages

gov-za-sona-multilingual

izindaba-zesizulu
Categorised isiZulu news. The source data is isiZulu news from SABC social media posts.