splink: Fast, accurate and scalable probabilistic data linkage with support for multiple SQL backends
splink_demos: Interactive notebooks containing demonstration code of the splink library
airflow-pdf2embeddings: NLP tool for scraping text from a corpus of PDF files, embedding the sentences in the text and finding semantically similar sentences to a given search query.
xltabr: An R package for writing formatted cross tabulations (contingency tables) to Excel using openxlsx
etl-pipeline-example: An example of an ETL pipeline that lays out generic DE processes. This is now out of date but still provides useful information
coffee-and-coding-public: MoJ coffee and coding sessions that can be made publicly available
etl_manager: A python package to create a database on the platform using our moj data warehousing framework
IntroRTraining: Introductory R training
mojchart: R package for formatting ggplot2 charts and applying MoJ corporate colours.
dataengineeringutils3: Fully unit tested utility functions for data engineering. Python 3 only.
our-coding-standards: DASD's coding principles for analytical projects
user-guidance: User guidance for the MoJ Analytical Platform
writing_functions_in_r: How to write functions in R
rpackage_training: Making and developing R packages
data-engineering-and-modelling-applicant-info: Information for potential applicants to MoJ Data Engineering, including links to our work and information about our teams.
pq-tool: Tool to analyse past parliamentary questions with visualisation in RShiny
splink_graph: pyspark-parallelised functions producing graph-theoretical metrics in connected component clusters for use in record-linkage (or other domains)
pydbtools: Python version of dbtools
mojap-arrow-pd-parser: Conforms pandas to "correct" datatypes to ensure data in/out using CSV, JSONL and Parquet is read the same (using arrow).
s3tools: Interact with files in s3 on the Analytical Platform
mojrap: For generalised functions for RAP. If there are any functions in your RAP that will be useful to other people, please use this space to share them.
graph-club: Tri-weekly hackathons and talks on Graph Theory and Network Analysis.
splink_synthetic_data: Generate synthetic datasets for linking
docker_spark_history_ui: A dockerised version of the spark history server which enables us to access metrics in the spark ui from a log generated by AWS glue
rmarkdown_training: Short training session on RMarkdown, for JSAS
mojspeakr: Formatting RMarkdown into govspeak for publishing on gov.uk
fuzzyfinder: Fuzzy search for matching records and score search results according to how closely they match
dataengineeringutils: A python package containing functions that help manage our data management processes on AWS
data_linter: Docker image used to automatically validate data
mojap-aws-tools-demo: A repo to test the different open source AWS tools we use / maintain for Data Engineering
NLP-guidance: Some thinking about Natural Language Processing
dbtools: Basic wrapper functions to query data using boto3 and Athena
splink_cluster_studio: Create interactive dashboards to visualise and analyse the outputs of data linking
mojap-metadata: Schema definitions and management of our metadata used by the Data Engineering Team at MoJ
Rdbtools: Accessing Athena on the Analytical Platform
splink_scalaudfs: Data linking functions in Scala, to be used in a Pyspark environment.
data_generator: Generates data using faker and our meta data schemas
rshiny-template: Template RShiny project
intro_r_training_extension: An extension to the IntroRTraining course
iam_builder: Little helper to write IAM policies
ggplotTraining
mojSuppression
QA.that
splink_comparison_viewer
platform_user_guidance: **DEPRECATED** See https://github.com/moj-analytical-services/user-guidance
data-engineering-exports: Infrastructure to allow data from the Analytical Platform to be accessed by other services
goodtables_test: Public repo with examples of goodtables
s3_data_packer
Rs3tools
coffee_roulette_pairs: A package to generate random pairings for Coffee Roulette
FuzzyMatchR: Reference page to link to R implementation of a probabilistic matching function
mojverse: The tidyverse equivalent for MoJ packages
intro_to_github_training
AWS-study-group-quizzes
splink_data_generation: Generate datasets with known m and u probabilities to feed into the Fellegi Sunter model
I-RAP
data-engineering-template: Standard content, settings and hooks for data engineering
rmarkdown-vegawidget-template: A template for a deployed app that renders a markdown report
s3browser: A R Studio Addin that allows you to browse the files you have access to in S3
RSuperscript: A function that allows you to add superscripts and subscripts to cells in excel
airflow_osrm_scrape: Scrapes the open streetmap routing machine for all combinations of LSOAs, and MSOAs
metadata_vis: Data discovery tool that ingests metadata and makes it searchable. Uses metadata in the format required for https://github.com/moj-analytical-services/etl_manager
OPG
airflow-de-intro-project
SQL_from_square_one: Guidance on learning SQL from square one (i.e. zero knowledge)
iceberg-evaluation
random-coffee-trials: Automating rct
jwmodel: Judicial Workforce Modelling R Package
airflow_get_index_of_multiple_deprivation: Airflow job to get dataset of index of multiple deprivation
datacleaningutils: Unit tested functions for cleaning data as part of ETL processes
rshiny-test
cronjob-template: Example of project with a Cronjob
shiny-headers-demo
gluejobutils: Python 2.7 utility functions to include with AWS glue jobs
actions-lint-python
lookup_hmcts_regions: A lookup table that maps local authorities to HMCTS regions.
.github: Ministry of Justice Analytical Services GitHub workflow templates
pq_scraper: Parliamentary Questions (PQ) scraper
a11ycharts
airflow-murad-ali-j-test
vega-lite-away-day
data_linter_deprecated: A package to lint data against our meta data schemas
predictr
kerins-shiny-app
civilreadr: Easy reading of published civil CSVs
template-airflow-python: Template repository for running airflow python tasks in Kubernetes/Docker
ap-tools-training
criminal_history_sankey: A sankey diagram for criminal history statistics
geoharmonise
splink_examples_synthetic_data
rshiny-xoen-kaniko-test: Testing kaniko to build Docker images
github-outside-collaborators: Manage outside collaborators on our Github repositories
airflow-platform-user-data: Airflow job to gather platform user data from Auth0
mojap-airflow-tools: A few wrappers and tools to use with Airflow on the Analytical Platform
oracleConnectR: Wrapper to simplify connection to Oracle databases
mojtext: Functions to automate text
mojtable
intro-to-python
data-and-analytics-engineering-tech-radar: Visualizing our technology choices