  • Stars
    1
  • Language
    Ruby
  • License
    MIT License
  • Created over 1 year ago
  • Updated 24 days ago

Repository Details

Manage outside collaborators on our GitHub repositories

More Repositories

1

splink

Fast, accurate and scalable probabilistic data linkage with support for multiple SQL backends
JavaScript
1,251
star
2

splink_demos

Interactive notebooks containing demonstration code of the splink library
HTML
38
star
3

shinyGovstyle

Now up to GDS frontend version v4.0.0
CSS
38
star
4

airflow-pdf2embeddings

NLP tool for scraping text from a corpus of PDF files, embedding the sentences in the text and finding semantically similar sentences to a given search query.
Python
35
star
5

xltabr

xltabr: An R package for writing formatted cross tabulations (contingency tables) to Excel using openxlsx
R
31
star
6

etl-pipeline-example

An example of an ETL pipeline that lays out generic DE processes. This is now out of date but still provides useful information
Python
26
star
7

coffee-and-coding-public

MoJ coffee and coding sessions that can be made publicly available
HTML
24
star
8

etl_manager

A Python package to create a database on the platform using our MoJ data warehousing framework
Python
20
star
9

IntroRTraining

Introductory R training
HTML
18
star
10

dataengineeringutils3

Fully unit tested utility functions for data engineering. Python 3 only.
Python
14
star
11

our-coding-standards

DASD's coding principles for analytical projects
HTML
13
star
12

mojchart

R package for formatting ggplot2 charts and applying MoJ corporate colours.
R
13
star
13

user-guidance

User guidance for the MoJ Analytical Platform
HTML
12
star
14

writing_functions_in_r

How to write functions in R
HTML
12
star
15

rpackage_training

Making and developing R packages
11
star
16

pq-tool

Tool to analyse past parliamentary questions with visualisation in RShiny
R
10
star
17

splink_graph

PySpark-parallelised functions producing graph-theoretical metrics in connected component clusters for use in record linkage (or other domains)
HTML
10
star
18

pydbtools

Python version of dbtools
Python
10
star
19

data-engineering-and-modelling-applicant-info

Information for potential applicants to MoJ Data Engineering, including links to our work and information about our teams.
9
star
20

mojap-arrow-pd-parser

Conforms pandas to "correct" datatypes to ensure data in/out using CSV, JSONL and Parquet is read the same (using arrow).
Python
8
star
21

s3tools

Interact with files in S3 on the Analytical Platform
R
8
star
22

mojrap

For generalised functions for RAP. If there are any functions in your RAP that will be useful to other people, please use this space to share them.
R
8
star
23

docker_spark_history_ui

A dockerised version of the Spark history server, which enables us to access metrics in the Spark UI from a log generated by AWS Glue
Dockerfile
8
star
24

graph-club

Tri-weekly hackathons and talks on Graph Theory and Network Analysis.
Jupyter Notebook
8
star
25

splink_synthetic_data

Generate synthetic datasets for linking
Python
7
star
26

rmarkdown_training

Short training session on RMarkdown, for JSAS
R
7
star
27

mojspeakr

Formatting RMarkdown into govspeak for publishing on gov.uk
R
7
star
28

dataengineeringutils

A Python package containing functions that help manage our data management processes on AWS
Python
6
star
29

data_linter

Docker image used to automatically validate data
Python
6
star
30

fuzzyfinder

Fuzzy search for matching records and scoring search results according to how closely they match
Python
6
star
31

mojap-aws-tools-demo

A repo to test the different open source AWS tools we use / maintain for Data Engineering
Jupyter Notebook
6
star
32

NLP-guidance

Some thinking about Natural Language Processing
JavaScript
6
star
33

dbtools

Basic wrapper functions to query data using boto3 and Athena
R
5
star
34

splink_cluster_studio

Create interactive dashboards to visualise and analyse the outputs of data linking
JavaScript
5
star
35

mojap-metadata

Schema definitions and management of our metadata used by the Data Engineering Team at MoJ
Python
5
star
36

Rdbtools

Accessing Athena on the Analytical Platform
R
4
star
37

splink_scalaudfs

Data linking functions in Scala, to be used in a PySpark environment.
Scala
4
star
38

data_generator

Generates data using Faker and our metadata schemas
Python
4
star
39

rshiny-template

Template RShiny project
R
4
star
40

intro_r_training_extension

An extension to the IntroRTraining course
HTML
4
star
41

iam_builder

Little helper to write IAM policies
Python
4
star
42

ggplotTraining

HTML
4
star
43

mojSuppression

R
3
star
44

QA.that

R
3
star
45

platform_user_guidance

**DEPRECATED** See https://github.com/moj-analytical-services/user-guidance
HTML
3
star
46

data-engineering-exports

Infrastructure to allow data from the Analytical Platform to be accessed by other services
Python
3
star
47

goodtables_test

Public repo with examples of goodtables
Jupyter Notebook
3
star
48

splink_comparison_viewer

JavaScript
3
star
49

s3_data_packer

Python
3
star
50

Rs3tools

R
3
star
51

coffee_roulette_pairs

A package to generate random pairings for Coffee Roulette
R
3
star
52

FuzzyMatchR

Reference page to link to R implementation of a probabilistic matching function
3
star
53

mojverse

The tidyverse equivalent for MoJ packages
3
star
54

intro_to_github_training

R
3
star
55

AWS-study-group-quizzes

2
star
56

I-RAP

R
2
star
57

data-engineering-template

Standard content, settings and hooks for data engineering
Shell
2
star
58

rmarkdown-vegawidget-template

A template for a deployed app that renders a markdown report
R
2
star
59

s3browser

An RStudio Addin that allows you to browse the files you have access to in S3
JavaScript
2
star
60

splink_data_generation

Generate datasets with known m and u probabilities to feed into the Fellegi Sunter model
Jupyter Notebook
2
star
61

RSuperscript

A function that allows you to add superscripts and subscripts to cells in Excel
R
2
star
62

airflow_osrm_scrape

Scrapes the open streetmap routing machine for all combinations of LSOAs and MSOAs
Python
2
star
63

metadata_vis

Data discovery tool that ingests metadata and makes it searchable. Uses metadata in the format required for https://github.com/moj-analytical-services/etl_manager
CSS
2
star
64

OPG

Python
2
star
65

airflow-de-intro-project

Python
2
star
66

SQL_from_square_one

Guidance on learning SQL from square one (i.e. zero knowledge)
HTML
1
star
67

iceberg-evaluation

Jupyter Notebook
1
star
68

random-coffee-trials

Automating rct
R
1
star
69

jwmodel

Judicial Workforce Modelling R Package
R
1
star
70

airflow_get_index_of_multiple_deprivation

Airflow job to get the Index of Multiple Deprivation dataset
Python
1
star
71

datacleaningutils

Unit tested functions for cleaning data as part of ETL processes
Python
1
star
72

rshiny-test

R
1
star
73

cronjob-template

Example of project with a Cronjob
1
star
74

shiny-headers-demo

R
1
star
75

gluejobutils

Python 2.7 utility functions to include with AWS Glue jobs
Python
1
star
76

actions-lint-python

1
star
77

lookup_hmcts_regions

A lookup table that maps local authorities to HMCTS regions.
Jupyter Notebook
1
star
78

.github

Ministry of Justice Analytical Services GitHub workflow templates
1
star
79

pq_scraper

Parliamentary Questions (PQ) scraper
Python
1
star
80

a11ycharts

R
1
star
81

airflow-murad-ali-j-test

Python
1
star
82

vega-lite-away-day

R
1
star
83

data_linter_deprecated

A package to lint data against our metadata schemas
Python
1
star
84

predictr

R
1
star
85

kerins-shiny-app

R
1
star
86

civilreadr

Easy reading of published civil CSVs
R
1
star
87

template-airflow-python

Template repository for running Airflow Python tasks in Kubernetes/Docker
Python
1
star
88

ap-tools-training

R
1
star
89

geoharmonise

R
1
star
90

splink_examples_synthetic_data

Python
1
star
91

criminal_history_sankey

A sankey diagram for criminal history statistics
HTML
1
star
92

rshiny-xoen-kaniko-test

Testing kaniko to build Docker images
R
1
star
93

airflow-platform-user-data

Airflow job to gather platform user data from Auth0
Python
1
star
94

mojap-airflow-tools

A few wrappers and tools to use with Airflow on the Analytical Platform
Python
1
star
95

oracleConnectR

Wrapper to simplify connection to Oracle databases
R
1
star
96

mojtext

Functions to automate text
R
1
star
97

mojtable

R
1
star