  • Stars: 6
  • Rank 2,539,965 (Top 51 %)
  • Language: Python
  • Created almost 5 years ago
  • Updated 9 months ago

Repository Details

Docker image used to automatically validate data

More Repositories

1

splink

Fast, accurate and scalable probabilistic data linkage with support for multiple SQL backends
Python
1,328
star
2

shinyGovstyle

Apply GOV.UK styled components and formats in shiny
CSS
39
star
3

splink_demos

Interactive notebooks containing demonstration code of the splink library
HTML
38
star
4

airflow-pdf2embeddings

NLP tool for scraping text from a corpus of PDF files, embedding the sentences in the text and finding semantically similar sentences to a given search query.
Python
36
star
5

xltabr

xltabr: An R package for writing formatted cross tabulations (contingency tables) to Excel using openxlsx
R
31
star
6

etl-pipeline-example

An example of an ETL pipeline that lays out generic data engineering (DE) processes. It is now out of date but still provides useful information.
Python
26
star
7

coffee-and-coding-public

MoJ coffee and coding sessions that can be made publicly available
HTML
25
star
8

etl_manager

A Python package to create a database on the platform using our MoJ data warehousing framework
Python
20
star
9

IntroRTraining

Introductory R training
HTML
18
star
10

mojchart

R package for formatting ggplot2 charts and applying MoJ corporate colours.
R
15
star
11

dataengineeringutils3

Fully unit tested utility functions for data engineering. Python 3 only.
Python
14
star
12

our-coding-standards

DASD's coding principles for analytical projects
HTML
13
star
13

user-guidance

User guidance for the MoJ Analytical Platform
HTML
12
star
14

writing_functions_in_r

How to write functions in R
HTML
12
star
15

rpackage_training

Making and developing R packages
11
star
16

data-engineering-and-modelling-applicant-info

Information for potential applicants to MoJ Data Engineering, including links to our work and information about our teams.
10
star
17

pq-tool

Tool to analyse past parliamentary questions with visualisation in RShiny
R
10
star
18

splink_graph

PySpark-parallelised functions producing graph-theoretical metrics in connected component clusters for use in record linkage (or other domains)
HTML
10
star
19

pydbtools

Python version of dbtools
Python
10
star
20

mojap-arrow-pd-parser

Conforms pandas to "correct" datatypes to ensure data in/out using CSV, JSONL and Parquet is read the same (using arrow).
Python
9
star
21

s3tools

Interact with files in S3 on the Analytical Platform
R
8
star
22

mojrap

For generalised functions for RAP. If there are any functions in your RAP that will be useful to other people, please use this space to share them.
R
8
star
23

graph-club

Tri-weekly hackathons and talks on Graph Theory and Network Analysis.
Jupyter Notebook
8
star
24

splink_synthetic_data

Generate synthetic datasets for linking
Python
8
star
25

docker_spark_history_ui

A dockerised version of the Spark history server, which lets us access metrics in the Spark UI from a log generated by AWS Glue
Dockerfile
8
star
26

rmarkdown_training

Short training session on RMarkdown, for JSAS
R
7
star
27

mojspeakr

Formatting RMarkdown into govspeak for publishing on gov.uk
R
7
star
28

fuzzyfinder

Fuzzy search for matching records and scoring search results according to how closely they match
Python
7
star
29

dataengineeringutils

A Python package containing functions that help manage our data management processes on AWS
Python
6
star
30

mojap-aws-tools-demo

A repo to test the different open source AWS tools we use / maintain for Data Engineering
Jupyter Notebook
6
star
31

NLP-guidance

Some thinking about Natural Language Processing
JavaScript
6
star
32

dbtools

Basic wrapper functions to query data using boto3 and Athena
R
5
star
33

splink_cluster_studio

Create interactive dashboards to visualise and analyse the outputs of data linking
JavaScript
5
star
34

mojap-metadata

Schema definitions and management of our metadata used by the Data Engineering Team at MoJ
Python
5
star
35

Rdbtools

Accessing Athena on the Analytical Platform
R
4
star
36

splink_scalaudfs

Data linking functions in Scala, to be used in a PySpark environment.
Scala
4
star
37

data_generator

Generates data using Faker and our metadata schemas
Python
4
star
38

rshiny-template

Template RShiny project
R
4
star
39

intro_r_training_extension

An extension to the IntroRTraining course
HTML
4
star
40

iam_builder

Little helper to write IAM policies
Python
4
star
41

ggplotTraining

HTML
4
star
42

mojSuppression

R
3
star
43

QA.that

R
3
star
44

splink_comparison_viewer

JavaScript
3
star
45

platform_user_guidance

**DEPRECATED** See https://github.com/moj-analytical-services/user-guidance
HTML
3
star
46

data-engineering-exports

Infrastructure to allow data from the Analytical Platform to be accessed by other services
Python
3
star
47

goodtables_test

Public repo with examples of goodtables
Jupyter Notebook
3
star
48

s3_data_packer

Python
3
star
49

Rs3tools

R
3
star
50

coffee_roulette_pairs

A package to generate random pairings for Coffee Roulette
R
3
star
51

FuzzyMatchR

Reference page to link to R implementation of a probabilistic matching function
3
star
52

mojverse

The tidyverse equivalent for MoJ packages
3
star
53

intro_to_github_training

R
3
star
54

AWS-study-group-quizzes

2
star
55

splink_data_generation

Generate datasets with known m and u probabilities to feed into the Fellegi Sunter model
Jupyter Notebook
2
star
56

I-RAP

R
2
star
57

data-engineering-template

Standard content, settings and hooks for data engineering
Shell
2
star
58

rmarkdown-vegawidget-template

A template for a deployed app that renders a markdown report
R
2
star
59

s3browser

An RStudio Addin that allows you to browse the files you have access to in S3
JavaScript
2
star
60

RSuperscript

A function that allows you to add superscripts and subscripts to cells in Excel
R
2
star
61

airflow_osrm_scrape

Scrapes the OpenStreetMap routing machine for all combinations of LSOAs and MSOAs
Python
2
star
62

metadata_vis

Data discovery tool that ingests metadata and makes it searchable. Uses metadata in the format required for https://github.com/moj-analytical-services/etl_manager
CSS
2
star
63

OPG

Python
2
star
64

airflow-de-intro-project

Python
2
star
65

SQL_from_square_one

Guidance on learning SQL from square one (i.e. zero knowledge)
HTML
1
star
66

iceberg-evaluation

Jupyter Notebook
1
star
67

random-coffee-trials

Automating rct
R
1
star
68

jwmodel

Judicial Workforce Modelling R Package
R
1
star
69

airflow_get_index_of_multiple_deprivation

Airflow job to get the Index of Multiple Deprivation dataset
Python
1
star
70

datacleaningutils

Unit tested functions for cleaning data as part of ETL processes
Python
1
star
71

rshiny-test

R
1
star
72

cronjob-template

Example of project with a Cronjob
1
star
73

shiny-headers-demo

R
1
star
74

gluejobutils

Python 2.7 utility functions to include with AWS Glue jobs
Python
1
star
75

actions-lint-python

1
star
76

lookup_hmcts_regions

A lookup table that maps local authorities to HMCTS regions.
Jupyter Notebook
1
star
77

.github

Ministry of Justice Analytical Services GitHub workflow templates
1
star
78

pq_scraper

Parliamentary Questions (PQ) scraper
Python
1
star
79

a11ycharts

R
1
star
80

airflow-murad-ali-j-test

Python
1
star
81

vega-lite-away-day

R
1
star
82

data_linter_deprecated

A package to lint data against our metadata schemas
Python
1
star
83

predictr

R
1
star
84

kerins-shiny-app

R
1
star
85

civilreadr

Easy reading of published civil CSVs
R
1
star
86

template-airflow-python

Template repository for running airflow python tasks in Kubernetes/Docker
Python
1
star
87

ap-tools-training

R
1
star
88

criminal_history_sankey

A sankey diagram for criminal history statistics
HTML
1
star
89

geoharmonise

R
1
star
90

splink_examples_synthetic_data

Python
1
star
91

rshiny-xoen-kaniko-test

Testing kaniko to build Docker images
R
1
star
92

github-outside-collaborators

Manage outside collaborators on our GitHub repositories
Ruby
1
star
93

airflow-platform-user-data

Airflow job to gather platform user data from Auth0
Python
1
star
94

mojap-airflow-tools

A few wrappers and tools to use with Airflow on the Analytical Platform
Python
1
star
95

oracleConnectR

Wrapper to simplify connection to Oracle databases
R
1
star
96

mojtext

Functions to automate text
R
1
star
97

mojtable

R
1
star
98

intro-to-python

Jupyter Notebook
1
star
99

data-and-analytics-engineering-tech-radar

Visualizing our technology choices
Python
1
star