  • Stars: 393
  • Rank 109,518 (Top 3 %)
  • Language: Python
  • License: Apache License 2.0
  • Created over 9 years ago
  • Updated about 1 year ago

Repository Details

Clean personally identifiable information from dirty dirty text.

scrubadub

Remove personally identifiable information from free text. Sometimes we have additional metadata about the people we wish to anonymize. Other times we don't. This package makes it easy to seamlessly scrub personal information from free text, without compromising the privacy of the people we are trying to protect.

scrubadub currently supports removing:

  • Names
  • Email addresses
  • Addresses/Postal codes (US, GB, CA)
  • Credit card numbers
  • Dates of birth
  • URLs
  • Phone numbers
  • Username and password combinations
  • Skype/twitter usernames
  • Social security numbers (US and GB national insurance numbers)
  • Tax numbers (GB)
  • Driving licence numbers (GB)

Quick start

Getting started with scrubadub is as easy as running pip install scrubadub and incorporating it into your Python scripts like this:

>>> import scrubadub

# My cat may be more tech-savvy than most, but he doesn't want other people to know it.
>>> text = "My cat can be contacted on example@example.com, or 1800 555-5555"

# Replaces the phone number and email address with anonymous IDs.
>>> scrubadub.clean(text)
'My cat can be contacted on {{EMAIL}}, or {{PHONE}}'

There are many ways to tailor the behavior of scrubadub using different Detectors and PostProcessors. Scrubadub is highly configurable and supports localisation for different languages and regions.
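
To make the detector idea concrete, here is a minimal standard-library sketch of pattern-based scrubbing. It does not use scrubadub's own classes and is not how the package is implemented; the names `DETECTORS` and `scrub`, and both regular expressions, are hypothetical and far simpler than a real detector would need to be.

```python
import re

# Hypothetical, deliberately simple patterns (assumptions for illustration);
# real detectors cover many more formats and locales.
DETECTORS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "PHONE": re.compile(r"\b\d{4} \d{3}-\d{4}\b"),
}


def scrub(text, detectors=DETECTORS):
    """Replace every match of each pattern with a {{TYPE}} placeholder."""
    for filth_type, pattern in detectors.items():
        text = pattern.sub("{{" + filth_type + "}}", text)
    return text


cleaned = scrub("My cat can be contacted on example@example.com, or 1800 555-5555")
print(cleaned)
```

Each entry in the dictionary plays the role of one detector, so adding or removing an entry changes what gets scrubbed without touching the replacement logic.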

Installation

To install scrubadub using pip, simply type:

pip install scrubadub

Several other packages can optionally be installed to enable extra detectors. These are scrubadub_address, scrubadub_spacy and scrubadub_stanford; because they require additional dependencies, see the relevant documentation (address detector documentation and name detector documentation) for more information. This package requires at least Python 3.6. For Python 2.7 or 3.5 support, use v1.2.2, which is the last version that supports them.
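
The version rule above can be written as a small check. This is only a sketch: `scrubadub_requirement` is a hypothetical helper, not part of scrubadub, and it returns nothing for interpreter versions the text above does not mention.

```python
import sys


def scrubadub_requirement(version=None):
    """Return a pip requirement string for a (major, minor) Python version.

    Hypothetical helper: encodes only the versions stated in the text above.
    """
    major_minor = tuple((version or sys.version_info)[:2])
    if major_minor >= (3, 6):
        return "scrubadub"  # current releases need at least Python 3.6
    if major_minor in {(2, 7), (3, 5)}:
        return "scrubadub==1.2.2"  # last version supporting 2.7 and 3.5
    return None  # version not covered by the statement above


print(scrubadub_requirement())
```

The returned string can be passed to pip install as-is.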

New maintainers

LeapBeyond are excited to be supporting scrubadub with ongoing maintenance and development. Thanks to all of the contributors who made this package a success, but especially @deanmalmgren, IDEO and Datascope.
