• This repository was archived on 25 Jan 2023
  • Stars
    1
  • Language
    R
  • License
    Creative Commons ...
  • Created over 4 years ago
  • Updated over 3 years ago

Repository Details

Wrapper to simplify connection to Oracle databases

More Repositories

1

splink

Fast, accurate and scalable probabilistic data linkage with support for multiple SQL backends
Python
1,328
star
2

shinyGovstyle

Apply GOV.UK styled components and formats in shiny
CSS
39
star
3

splink_demos

Interactive notebooks containing demonstration code of the splink library
HTML
38
star
4

airflow-pdf2embeddings

NLP tool for scraping text from a corpus of PDF files, embedding the sentences in the text and finding semantically similar sentences to a given search query.
Python
36
star
5

xltabr

xltabr: An R package for writing formatted cross tabulations (contingency tables) to Excel using openxlsx
R
31
star
6

etl-pipeline-example

An example of an ETL pipeline that lays out generic data engineering (DE) processes. This is now out of date but still provides useful information.
Python
26
star
7

coffee-and-coding-public

MoJ coffee and coding sessions that can be made publicly available
HTML
25
star
8

etl_manager

A Python package to create a database on the platform using our MoJ data warehousing framework
Python
20
star
9

IntroRTraining

Introductory R training
HTML
18
star
10

mojchart

R package for formatting ggplot2 charts and applying MoJ corporate colours.
R
15
star
11

dataengineeringutils3

Fully unit tested utility functions for data engineering. Python 3 only.
Python
14
star
12

our-coding-standards

DASD's coding principles for analytical projects
HTML
13
star
13

user-guidance

User guidance for the MoJ Analytical Platform
HTML
12
star
14

writing_functions_in_r

How to write functions in R
HTML
12
star
15

rpackage_training

Making and developing R packages
11
star
16

data-engineering-and-modelling-applicant-info

Information for potential applicants to MoJ Data Engineering, including links to our work and information about our teams.
10
star
17

pq-tool

Tool to analyse past parliamentary questions with visualisation in RShiny
R
10
star
18

splink_graph

PySpark-parallelised functions producing graph-theoretical metrics in connected component clusters for use in record linkage (or other domains)
HTML
10
star
19

pydbtools

Python version of dbtools
Python
10
star
20

mojap-arrow-pd-parser

Conforms pandas to "correct" datatypes to ensure data in/out using CSV, JSONL and Parquet is read the same (using arrow).
Python
9
star
21

s3tools

Interact with files in S3 on the Analytical Platform
R
8
star
22

mojrap

For generalised functions for RAP. If there are any functions in your RAP that will be useful to other people, please use this space to share them.
R
8
star
23

graph-club

Tri-weekly hackathons and talks on Graph Theory and Network Analysis.
Jupyter Notebook
8
star
24

splink_synthetic_data

Generate synthetic datasets for linking
Python
8
star
25

docker_spark_history_ui

A dockerised version of the Spark history server which enables us to access metrics in the Spark UI from a log generated by AWS Glue
Dockerfile
8
star
26

rmarkdown_training

Short training session on RMarkdown, for JSAS
R
7
star
27

mojspeakr

Formatting RMarkdown into govspeak for publishing on gov.uk
R
7
star
28

fuzzyfinder

Fuzzy search for matching records and scoring search results according to how closely they match
Python
7
star
29

dataengineeringutils

A Python package containing functions that help manage our data management processes on AWS
Python
6
star
30

data_linter

Docker image used to automatically validate data
Python
6
star
31

mojap-aws-tools-demo

A repo to test the different open source AWS tools we use / maintain for Data Engineering
Jupyter Notebook
6
star
32

NLP-guidance

Some thinking about Natural Language Processing
JavaScript
6
star
33

dbtools

Basic wrapper functions to query data using boto3 and Athena
R
5
star
34

splink_cluster_studio

Create interactive dashboards to visualise and analyse the outputs of data linking
JavaScript
5
star
35

mojap-metadata

Schema definitions and management of our metadata used by the Data Engineering Team at MoJ
Python
5
star
36

Rdbtools

Accessing Athena on the Analytical Platform
R
4
star
37

splink_scalaudfs

Data linking functions in Scala, to be used in a PySpark environment.
Scala
4
star
38

data_generator

Generates data using Faker and our metadata schemas
Python
4
star
39

rshiny-template

Template RShiny project
R
4
star
40

intro_r_training_extension

An extension to the IntroRTraining course
HTML
4
star
41

iam_builder

Little helper to write IAM policies
Python
4
star
42

ggplotTraining

HTML
4
star
43

mojSuppression

R
3
star
44

QA.that

R
3
star
45

splink_comparison_viewer

JavaScript
3
star
46

platform_user_guidance

**DEPRECATED** See https://github.com/moj-analytical-services/user-guidance
HTML
3
star
47

data-engineering-exports

Infrastructure to allow data from the Analytical Platform to be accessed by other services
Python
3
star
48

goodtables_test

Public repo with examples of goodtables
Jupyter Notebook
3
star
49

s3_data_packer

Python
3
star
50

Rs3tools

R
3
star
51

coffee_roulette_pairs

A package to generate random pairings for Coffee Roulette
R
3
star
52

FuzzyMatchR

Reference page to link to R implementation of a probabilistic matching function
3
star
53

mojverse

The tidyverse equivalent for MoJ packages
3
star
54

intro_to_github_training

R
3
star
55

AWS-study-group-quizzes

2
star
56

splink_data_generation

Generate datasets with known m and u probabilities to feed into the Fellegi Sunter model
Jupyter Notebook
2
star
57

I-RAP

R
2
star
58

data-engineering-template

Standard content, settings and hooks for data engineering
Shell
2
star
59

rmarkdown-vegawidget-template

A template for a deployed app that renders a markdown report
R
2
star
60

s3browser

An RStudio Addin that allows you to browse the files you have access to in S3
JavaScript
2
star
61

RSuperscript

A function that allows you to add superscripts and subscripts to cells in Excel
R
2
star
62

airflow_osrm_scrape

Scrapes the OpenStreetMap routing machine for all combinations of LSOAs and MSOAs
Python
2
star
63

metadata_vis

Data discovery tool that ingests metadata and makes it searchable. Uses metadata in the format required for https://github.com/moj-analytical-services/etl_manager
CSS
2
star
64

OPG

Python
2
star
65

airflow-de-intro-project

Python
2
star
66

SQL_from_square_one

Guidance on learning SQL from square one (i.e. zero knowledge)
HTML
1
star
67

iceberg-evaluation

Jupyter Notebook
1
star
68

random-coffee-trials

Automating rct
R
1
star
69

jwmodel

Judicial Workforce Modelling R Package
R
1
star
70

airflow_get_index_of_multiple_deprivation

Airflow job to get the Index of Multiple Deprivation dataset
Python
1
star
71

datacleaningutils

Unit tested functions for cleaning data as part of ETL processes
Python
1
star
72

rshiny-test

R
1
star
73

cronjob-template

Example of project with a Cronjob
1
star
74

shiny-headers-demo

R
1
star
75

gluejobutils

Python 2.7 utility functions to include with AWS Glue jobs
Python
1
star
76

actions-lint-python

1
star
77

lookup_hmcts_regions

A lookup table that maps local authorities to HMCTS regions.
Jupyter Notebook
1
star
78

.github

Ministry of Justice Analytical Services GitHub workflow templates
1
star
79

pq_scraper

Parliamentary Questions (PQ) scraper
Python
1
star
80

a11ycharts

R
1
star
81

airflow-murad-ali-j-test

Python
1
star
82

vega-lite-away-day

R
1
star
83

data_linter_deprecated

A package to lint data against our metadata schemas
Python
1
star
84

predictr

R
1
star
85

kerins-shiny-app

R
1
star
86

civilreadr

Easy reading of published civil CSVs
R
1
star
87

template-airflow-python

Template repository for running Airflow Python tasks in Kubernetes/Docker
Python
1
star
88

ap-tools-training

R
1
star
89

criminal_history_sankey

A sankey diagram for criminal history statistics
HTML
1
star
90

geoharmonise

R
1
star
91

splink_examples_synthetic_data

Python
1
star
92

rshiny-xoen-kaniko-test

Testing kaniko to build Docker images
R
1
star
93

github-outside-collaborators

Manage outside collaborators on our GitHub repositories
Ruby
1
star
94

airflow-platform-user-data

Airflow job to gather platform user data from Auth0
Python
1
star
95

mojap-airflow-tools

A few wrappers and tools to use with Airflow on the Analytical Platform
Python
1
star
96

mojtext

Functions to automate text
R
1
star
97

mojtable

R
1
star
98

intro-to-python

Jupyter Notebook
1
star
99

data-and-analytics-engineering-tech-radar

Visualizing our technology choices
Python
1
star