  • Stars: 1
  • Language: R
  • Created about 2 years ago
  • Updated 7 months ago

More Repositories

1

splink

Fast, accurate and scalable probabilistic data linkage with support for multiple SQL backends
JavaScript
1,251
star
2

splink_demos

Interactive notebooks containing demonstration code of the splink library
HTML
38
star
3

shinyGovstyle

Now up to GDS frontend version v4.0.0
CSS
38
star
4

airflow-pdf2embeddings

NLP tool for scraping text from a corpus of PDF files, embedding the sentences in the text and finding semantically similar sentences to a given search query.
Python
35
star
5

xltabr

xltabr: An R package for writing formatted cross tabulations (contingency tables) to Excel using openxlsx
R
31
star
6

etl-pipeline-example

An example of an ETL pipeline that lays out generic data engineering (DE) processes. This is now out of date but still provides useful information.
Python
26
star
7

coffee-and-coding-public

MoJ coffee and coding sessions that can be made publicly available
HTML
24
star
8

etl_manager

A Python package to create a database on the platform using our MoJ data warehousing framework
Python
20
star
9

IntroRTraining

Introductory R training
HTML
18
star
10

dataengineeringutils3

Fully unit tested utility functions for data engineering. Python 3 only.
Python
14
star
11

our-coding-standards

DASD's coding principles for analytical projects
HTML
13
star
12

mojchart

R package for formatting ggplot2 charts and applying MoJ corporate colours.
R
13
star
13

user-guidance

User guidance for the MoJ Analytical Platform
HTML
12
star
14

writing_functions_in_r

How to write functions in R
HTML
12
star
15

rpackage_training

Making and developing R packages
11
star
16

pq-tool

Tool to analyse past parliamentary questions with visualisation in RShiny
R
10
star
17

splink_graph

PySpark-parallelised functions producing graph-theoretical metrics in connected component clusters for use in record linkage (or other domains)
HTML
10
star
18

pydbtools

Python version of dbtools
Python
10
star
19

data-engineering-and-modelling-applicant-info

Information for potential applicants to MoJ Data Engineering, including links to our work and information about our teams.
9
star
20

mojap-arrow-pd-parser

Conforms pandas to "correct" datatypes to ensure data in/out using CSV, JSONL and Parquet is read the same (using arrow).
Python
8
star
21

s3tools

Interact with files in S3 on the Analytical Platform
R
8
star
22

mojrap

For generalised functions for RAP. If there are any functions in your RAP that will be useful to other people, please use this space to share them.
R
8
star
23

docker_spark_history_ui

A dockerised version of the Spark history server, which lets us access metrics in the Spark UI from a log generated by AWS Glue
Dockerfile
8
star
24

graph-club

Tri-weekly hackathons and talks on Graph Theory and Network Analysis.
Jupyter Notebook
8
star
25

splink_synthetic_data

Generate synthetic datasets for linking
Python
7
star
26

rmarkdown_training

Short training session on RMarkdown, for JSAS
R
7
star
27

mojspeakr

Formatting RMarkdown into govspeak for publishing on gov.uk
R
7
star
28

dataengineeringutils

A Python package containing functions that help manage our data management processes on AWS
Python
6
star
29

data_linter

Docker image used to automatically validate data
Python
6
star
30

fuzzyfinder

Fuzzy search that matches records and scores search results according to how closely they match
Python
6
star
31

mojap-aws-tools-demo

A repo to test the different open source AWS tools we use / maintain for Data Engineering
Jupyter Notebook
6
star
32

NLP-guidance

Some thinking about Natural Language Processing
JavaScript
6
star
33

dbtools

Basic wrapper functions to query data using boto3 and Athena
R
5
star
34

splink_cluster_studio

Create interactive dashboards to visualise and analyse the outputs of data linking
JavaScript
5
star
35

mojap-metadata

Schema definitions and management of our metadata used by the Data Engineering Team at MoJ
Python
5
star
36

Rdbtools

Accessing Athena on the Analytical Platform
R
4
star
37

splink_scalaudfs

Data linking functions in Scala, to be used in a Pyspark environment.
Scala
4
star
38

data_generator

Generates data using Faker and our metadata schemas
Python
4
star
39

rshiny-template

Template RShiny project
R
4
star
40

intro_r_training_extension

An extension to the IntroRTraining course
HTML
4
star
41

iam_builder

Little helper to write IAM policies
Python
4
star
42

ggplotTraining

HTML
4
star
43

mojSuppression

R
3
star
44

QA.that

R
3
star
45

platform_user_guidance

**DEPRECATED** See https://github.com/moj-analytical-services/user-guidance
HTML
3
star
46

data-engineering-exports

Infrastructure to allow data from the Analytical Platform to be accessed by other services
Python
3
star
47

goodtables_test

Public repo with examples of goodtables
Jupyter Notebook
3
star
48

splink_comparison_viewer

JavaScript
3
star
49

s3_data_packer

Python
3
star
50

Rs3tools

R
3
star
51

coffee_roulette_pairs

A package to generate random pairings for Coffee Roulette
R
3
star
52

FuzzyMatchR

Reference page to link to R implementation of a probabilistic matching function
3
star
53

mojverse

The tidyverse equivalent for MoJ packages
3
star
54

intro_to_github_training

R
3
star
55

AWS-study-group-quizzes

2
star
56

I-RAP

R
2
star
57

data-engineering-template

Standard content, settings and hooks for data engineering
Shell
2
star
58

rmarkdown-vegawidget-template

A template for a deployed app that renders a markdown report
R
2
star
59

s3browser

An RStudio Addin that allows you to browse the files you have access to in S3
JavaScript
2
star
60

splink_data_generation

Generate datasets with known m and u probabilities to feed into the Fellegi-Sunter model
Jupyter Notebook
2
star
61

RSuperscript

A function that allows you to add superscripts and subscripts to cells in Excel
R
2
star
62

airflow_osrm_scrape

Scrapes the Open Source Routing Machine (OSRM) for all combinations of LSOAs and MSOAs
Python
2
star
63

metadata_vis

Data discovery tool that ingests metadata and makes it searchable. Uses metadata in the format required for https://github.com/moj-analytical-services/etl_manager
CSS
2
star
64

OPG

Python
2
star
65

airflow-de-intro-project

Python
2
star
66

SQL_from_square_one

Guidance on learning SQL from square one (i.e. zero knowledge)
HTML
1
star
67

iceberg-evaluation

Jupyter Notebook
1
star
68

random-coffee-trials

Automating rct
R
1
star
69

jwmodel

Judicial Workforce Modelling R Package
R
1
star
70

airflow_get_index_of_multiple_deprivation

Airflow job to get the Index of Multiple Deprivation dataset
Python
1
star
71

datacleaningutils

Unit tested functions for cleaning data as part of ETL processes
Python
1
star
72

rshiny-test

R
1
star
73

cronjob-template

Example of project with a Cronjob
1
star
74

shiny-headers-demo

R
1
star
75

gluejobutils

Python 2.7 utility functions to include with AWS Glue jobs
Python
1
star
76

actions-lint-python

1
star
77

lookup_hmcts_regions

A lookup table that maps local authorities to HMCTS regions.
Jupyter Notebook
1
star
78

.github

Ministry of Justice Analytical Services GitHub workflow templates
1
star
79

pq_scraper

Parliamentary Questions (PQ) scraper
Python
1
star
80

airflow-murad-ali-j-test

Python
1
star
81

vega-lite-away-day

R
1
star
82

data_linter_deprecated

A package to lint data against our metadata schemas
Python
1
star
83

predictr

R
1
star
84

kerins-shiny-app

R
1
star
85

civilreadr

Easy reading of published civil CSVs
R
1
star
86

template-airflow-python

Template repository for running Airflow Python tasks in Kubernetes/Docker
Python
1
star
87

ap-tools-training

R
1
star
88

geoharmonise

R
1
star
89

splink_examples_synthetic_data

Python
1
star
90

criminal_history_sankey

A Sankey diagram for criminal history statistics
HTML
1
star
91

rshiny-xoen-kaniko-test

Testing kaniko to build Docker images
R
1
star
92

github-outside-collaborators

Manage outside collaborators on our GitHub repositories
Ruby
1
star
93

airflow-platform-user-data

Airflow job to gather platform user data from Auth0
Python
1
star
94

mojap-airflow-tools

A few wrappers and tools to use with Airflow on the Analytical Platform
Python
1
star
95

oracleConnectR

Wrapper to simplify connection to Oracle databases
R
1
star
96

mojtext

Functions to automate text
R
1
star
97

mojtable

R
1
star