Data Science for Social Impact Research Group @ University of Pretoria (@dsfsi)

Top repositories

1

textaugment

TextAugment: Text Augmentation Library
Python
395
star
2

covid19za

Coronavirus COVID-19 (2019-nCoV) Data Repository and Dashboard for South Africa
Jupyter Notebook
255
star
3

covid19africa

Africa open COVID-19 data working group
Jupyter Notebook
48
star
4

masakhane-web

Masakhane Web is a translation web application solely for African languages.
Jupyter Notebook
34
star
5

PuoBERTa

A RoBERTa-based language model specially designed for Setswana, using the new PuoData dataset.
Makefile
4
star
6

gov-za-multilingual

The dataset contains cabinet statements from the South African government. Data was scraped from the government's website: https://www.gov.za/cabinet-statements
Jupyter Notebook
4
star
7

Higher_Education_EDA

This is an exploratory data analysis (EDA) Git repository for education researchers and practitioners.
Jupyter Notebook
3
star
8

project-state-capture

Zondo Commission or State Capture Commission Transcripts
2
star
9

za-mavito

DSFSI South African Terminology Lists and Lexicon Project
HTML
2
star
10

dsfsi-datasets

Datasets made available for different small projects
Jupyter Notebook
2
star
11

PuoData

Curated corpora for Setswana. Used to train PuoBERTa.
2
star
12

za-bank-risk

This repository is an initial pipeline for reading, processing, labelling and classifying unstructured annual reports of South African (SA) banks with the aim of identifying financial risk. It leveraged work by the Corporate Financial Information Environment-Final Report Structure Extractor (CFIE-FRSE) of El-Haj et al., which created a corpus of annual reports of United Kingdom (UK) companies.
Jupyter Notebook
2
star
13

sa-parliament

South African Member of Parliament Data
Python
2
star
14

embedding-eval-data

Embedding Evaluation Data for South African Languages
1
star
15

2020-AMMI-salomon

Jupyter Notebook
1
star
16

dsfsi-dataset-template

Makefile
1
star
17

zabantu-beta

ZaBantu is a fleet of lightweight Masked Language Models for Southern Bantu Languages.
Python
1
star
18

gov-za-sona-multilingual

Python
1
star
19

izindaba-zesizulu

Categorised isiZulu news. The source data is isiZulu news from SABC social media posts.
1
star