  • Stars: 755
  • Rank: 60,125 (Top 2 %)
  • Language: TypeScript
  • License: MIT License
  • Created over 2 years ago
  • Updated about 2 months ago

Generate deterministic fake values: The same input will always generate the same fake-output.

copycat

import { copycat } from '@snaplet/copycat'

copycat.email('foo')
// => '[email protected]'

copycat.email('bar')
// => '[email protected]'

copycat.email('foo')
// => '[email protected]'

Motivation

The problem

Many of the use cases we aim to solve with snaplet involve anonymizing sensitive information. In practice, this means replacing each bit of sensitive data with something else that resembles the original value, yet does not allow the original value to be inferred.

To do this, we initially turned to faker for replacing the sensitive data with fake data. This approach took us quite far. However, we struggled with getting the replacement data to be deterministic: we found we did not have enough control over how results are generated to be able to easily ensure that for each value of the original data we wanted to replace, we'd always get the same replacement value out.

Faker allows one to seed a pseudo-random number generator (PRNG), such that the same sequence of values will be generated every time. While this means the sequence is deterministic, the problem was we did not have enough control over where the next value in the sequence was going to be used. Changes to the contents or structure in the original data we're replacing and changes to how we are using faker both had an effect on the way we used this sequence, which in turn had an effect on the resulting replacement value for any particular value in the original data. In other words, we had determinism, but not in a way that is useful for our purpose.

The solution

What we really needed was not the same sequence of generated values every time, but the same mapping to generated values every time.

This is exactly what we designed Copycat to do. For each method provided by Copycat, a given input value will always map to the same output value.

import { copycat } from '@snaplet/copycat'

copycat.email('foo')
// => '[email protected]'

copycat.email('bar')
// => '[email protected]'

copycat.email('foo')
// => '[email protected]'

Copycat works statelessly: for the same input, the same value will be returned regardless of the environment, process, call ordering, or any other external factors.

Under the hood, Copycat hashes the input values (using SipHash), with the intention of making it computationally infeasible for the input values to be inferred from the output values.
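To make the difference between a seeded sequence and an input-keyed mapping concrete, here is a minimal sketch that does not use Copycat at all. Everything in it is an assumption for illustration: fnv1a is a toy hash standing in for SipHash, pool is a made-up list of replacement values, and replaceBySequence and replaceByMapping are hypothetical helpers.

```typescript
// Toy 32-bit FNV-1a hash. A stand-in for illustration only: Copycat itself uses SipHash.
function fnv1a(text: string): number {
  let hash = 0x811c9dc5
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i)
    hash = Math.imul(hash, 0x01000193)
  }
  return hash >>> 0
}

// Hypothetical pool of replacement values.
const pool = ['Ada', 'Grace', 'Linus', 'Margaret', 'Alan']

// Sequence-based: each record consumes the next value of a seeded generator,
// so a record's replacement depends on its position in the data.
function replaceBySequence(records: string[], seed: number): Map<string, string> {
  let state = seed >>> 0
  const out = new Map<string, string>()
  for (const record of records) {
    state = (Math.imul(state, 1664525) + 1013904223) >>> 0 // linear congruential step
    out.set(record, pool[state % pool.length])
  }
  return out
}

// Mapping-based: each record's replacement depends only on the record itself.
function replaceByMapping(records: string[]): Map<string, string> {
  const out = new Map<string, string>()
  for (const record of records) {
    out.set(record, pool[fnv1a(record) % pool.length])
  }
  return out
}

const before = ['user-1', 'user-2', 'user-3']
const after = ['user-0', 'user-1', 'user-2', 'user-3'] // one row inserted at the front

// With the sequence, 'user-1' now receives the value that 'user-2' had before the insert.
console.log(replaceBySequence(before, 42).get('user-2') === replaceBySequence(after, 42).get('user-1'))
// => true

// With the mapping, 'user-2' keeps its replacement no matter what surrounds it.
console.log(replaceByMapping(before).get('user-2') === replaceByMapping(after).get('user-2'))
// => true
```

The sequence is deterministic, but the value belongs to a position rather than to a record, which is the behavior described in the problem statement above.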

Alternative approaches

It is still technically possible to make use of faker or similar libraries that offer deterministic PRNG - with some modification. That said, these solutions came with practical limitations that we decided made them less viable for us:

  • It is possible to simply seed the PRNG for every identifier, and then use it to generate only a single value. This seems to be a misuse of these libraries though: there is an up-front cost to seeding these PRNGs that can be expensive if done for each and every value to be generated. Here are benchmarks that point to this up-front cost.
  • You can generate a sequence of N values, hash identifiers to some integer smaller than N, then simply use that as an index to lookup a value in the sequence. This can even be done lazily. Still, you're now limiting the uniqueness of the values to N. The larger N is, the larger the cost of keeping these sequences in memory, or the more computationally expensive it is if you do not hold onto the sequences in memory. The smaller N is, the less unique your generated values are.

Note though that for either of these approaches, hashing might also still be needed to make it infeasible for the inputs to be inferred from the outputs.
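The second approach can be sketched as follows. This is only an illustration under stated assumptions: fnv1a is a toy hash, makePool stands in for "generate a sequence of N values", and lookup is a hypothetical helper. It shows why uniqueness is capped at N regardless of how many identifiers there are.

```typescript
// Toy 32-bit FNV-1a hash, used here only to turn an identifier into an integer.
function fnv1a(text: string): number {
  let hash = 0x811c9dc5
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i)
    hash = Math.imul(hash, 0x01000193)
  }
  return hash >>> 0
}

// Stand-in for a pre-generated sequence of N fake values.
function makePool(n: number): string[] {
  return Array.from({ length: n }, (_, i) => `value-${i}`)
}

// Hash the identifier to an integer smaller than N and use it as an index.
function lookup(pool: string[], identifier: string): string {
  return pool[fnv1a(identifier) % pool.length]
}

const pool = makePool(8)
const identifiers = Array.from({ length: 100 }, (_, i) => `id-${i}`)
const distinctOutputs = new Set(identifiers.map((id) => lookup(pool, id)))

// 100 distinct identifiers can never produce more than 8 distinct outputs.
console.log(distinctOutputs.size <= pool.length)
// => true
```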

API Reference

Overview

All Copycat functions take in an input value as their first parameter:

import { copycat } from '@snaplet/copycat'

copycat.email('foo')
// => '[email protected]'

The given input can be any JSON-serializable value. For any two calls to the same function, if the input given in each call serializes down to the same value, the same output will be returned.

Note that, unlike with JSON.stringify(), object property ordering is not considered: { a: 1, b: 2 } and { b: 2, a: 1 } are treated as the same input.
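One way to get order-insensitive serialization is to sort object keys before serializing. The sketch below is an assumption about how such behavior could be achieved, not a description of Copycat's internals; Json and canonicalize are hypothetical names.

```typescript
// JSON-serializable values.
type Json = null | boolean | number | string | Json[] | { [key: string]: Json }

// Serialize a value so that objects with the same entries produce the same
// string regardless of the order in which their properties were defined.
function canonicalize(value: Json): string {
  if (value === null || typeof value !== 'object') {
    return JSON.stringify(value)
  }
  if (Array.isArray(value)) {
    // Array order is meaningful, so it is kept as-is.
    return '[' + value.map((item) => canonicalize(item)).join(',') + ']'
  }
  const keys = Object.keys(value).sort()
  return '{' + keys.map((key) => JSON.stringify(key) + ':' + canonicalize(value[key])).join(',') + '}'
}

console.log(JSON.stringify({ a: 1, b: 2 }) === JSON.stringify({ b: 2, a: 1 }))
// => false
console.log(canonicalize({ a: 1, b: 2 }) === canonicalize({ b: 2, a: 1 }))
// => true
```

Hashing the canonical string rather than the raw JSON.stringify() output would then give the same result for both objects.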

Working with PII (Personally Identifiable Information)

If you're using sensitive information as input to Copycat, Copycat's use of SipHash makes it difficult to work backwards from an output value to the original input value: doing so is intended to be computationally infeasible.

// It is difficult to infer 'Some sensitive input' from 'Rhianna Ebert'
copycat.fullName('Some sensitive input')
// -> 'Rhianna Ebert'

That said, there is still something we need to watch out for: with enough guessing, the input values can still be figured out from the output values.

Let's say we replaced all the first names in some table of data. Included in this data was the name 'Susan', which was replaced with 'Therese':

copycat.firstName('Susan') // -> 'Therese'

While the attacker is able to see the name 'Therese', it is difficult for them to look at Copycat's code and work out 'Susan' from 'Therese'. But the attacker knows they're dealing with first names, and they have access to the Copycat library. What they can do is feed a list of first names into Copycat until they find a match.

Let's say they input the name 'John'. The result is 'April', which does not match 'Therese', so they move on. They next try 'Sarah', which maps to 'Florencio': again no match, so they move on. They next try 'Susan', which maps to the name they see, 'Therese'. This means they have a match, and now know that the original name was 'Susan':

copycat.firstName('John') // -> 'April', no match
copycat.firstName('Sarah') // -> 'Florencio', no match
copycat.firstName('Susan') // -> 'Therese', match!

To prevent this, you'll need to give copycat a key to use when hashing the values:

// store this somewhere safe
const key = copycat.generateHashKey('g9u*rT#!72R$zl5e')


copycat.fullName('foo')
// => 'Mohamed Weber'

copycat.setHashKey(key)

copycat.fullName('foo')
// => 'Bertha Harris'

The idea is that while Copycat's code is publicly known, the key isn't. This means that even though attackers have access to Copycat's code, they are not able to figure out which inputs map to which outputs, since they do not have access to the key.
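The guessing attack and the role of the key can be sketched with a toy model. All names here are assumptions for illustration: fnv1a is a non-cryptographic stand-in for SipHash, mixing the key in by concatenation is far weaker than real keyed hashing, and fakeFirstName and guessInputs are hypothetical helpers rather than Copycat functions.

```typescript
// Toy 32-bit FNV-1a hash. Not secure; it only stands in for SipHash here.
function fnv1a(text: string): number {
  let hash = 0x811c9dc5
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i)
    hash = Math.imul(hash, 0x01000193)
  }
  return hash >>> 0
}

const firstNames = ['April', 'Florencio', 'Therese', 'Morris', 'Bertha', 'Mohamed']

// Toy keyed mapping: the key is mixed into the text that gets hashed.
function fakeFirstName(input: string, key = ''): string {
  return firstNames[fnv1a(key + '\u0000' + input) % firstNames.length]
}

// The attacker's procedure: run candidate inputs through the mapping and
// keep the ones whose output matches the value they observed.
function guessInputs(observed: string, candidates: string[], key = ''): string[] {
  return candidates.filter((candidate) => fakeFirstName(candidate, key) === observed)
}

const candidates = ['John', 'Sarah', 'Susan']
const secretKey = 'g9u*rT#!72R$zl5e'

// Without a key, anyone can reproduce the mapping, so 'Susan' is always among the matches.
const unkeyedMatches = guessInputs(fakeFirstName('Susan'), candidates)
console.log(unkeyedMatches.indexOf('Susan') !== -1)
// => true

// With a key, reproducing the mapping requires the key itself.
const keyedOutput = fakeFirstName('Susan', secretKey)
const matchesWithKey = guessInputs(keyedOutput, candidates, secretKey)
console.log(matchesWithKey.indexOf('Susan') !== -1)
// => true
```

An attacker without secretKey can only evaluate the unkeyed mapping, whose outputs are in general unrelated to the keyed ones. With a pool this small, coincidental matches are still likely, so the sketch only illustrates the mechanism and says nothing about real-world strength.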

faker

A re-export of faker from @faker-js/faker. We do not alter faker in any way, and do not seed it.

fictional

A re-export of fictional, a library used under the hood by copycat for mapping inputs to primitive values.

copycat.scramble(input[, options])

Takes in an input value, and returns a value of the same type and length, but with each character/digit replaced with a different character/digit.

For string, the replacement characters will be in the same character range:

  • By default, spaces are preserved (see preserve option below)
  • Lower case ASCII characters are replaced with lower case ASCII letters
  • Upper case ASCII characters are replaced with upper case ASCII letters
  • Digits are replaced with digits
  • Any other ASCII character in the code point range 32 to 126 (0x20 - 0x7e) is replaced with either an alphanumeric character, or _, -, or +
  • Any other character is replaced with a Latin-1 character in the range 0x20 - 0x7e or 0xa0 - 0xff

copycat.scramble('Zakary Hessel')
// => 'Xowkjj Lzydrd'
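The character-class rules for strings can be sketched roughly as below. This is a simplified illustration, not Copycat's implementation: fnv1a is a toy hash, scrambleSketch is a hypothetical name, only letters and digits are handled (everything else is left unchanged), and unlike the real function it does not guarantee that a character is replaced with a different one.

```typescript
// Toy 32-bit FNV-1a hash used to derive a number per character position.
function fnv1a(text: string): number {
  let hash = 0x811c9dc5
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i)
    hash = Math.imul(hash, 0x01000193)
  }
  return hash >>> 0
}

const LOWER = 'abcdefghijklmnopqrstuvwxyz'
const UPPER = LOWER.toUpperCase()
const DIGITS = '0123456789'

// Replace each character with one from the same class, keeping preserved characters.
function scrambleSketch(input: string, preserve: string[] = [' ']): string {
  let out = ''
  for (let i = 0; i < input.length; i++) {
    const ch = input.charAt(i)
    const n = fnv1a(input + ':' + i) // depends on the whole input and the position
    if (preserve.indexOf(ch) !== -1) out += ch
    else if (LOWER.indexOf(ch) !== -1) out += LOWER.charAt(n % LOWER.length)
    else if (UPPER.indexOf(ch) !== -1) out += UPPER.charAt(n % UPPER.length)
    else if (DIGITS.indexOf(ch) !== -1) out += DIGITS.charAt(n % DIGITS.length)
    else out += ch // other characters are left as-is in this sketch
  }
  return out
}

const scrambled = scrambleSketch('Zakary Hessel')
console.log(/^[A-Z][a-z]{5} [A-Z][a-z]{5}$/.test(scrambled))
// => true
```

The output keeps the length, the casing pattern, and the position of the space, which is the property the list above describes.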

If a number is given, each digit will be replaced, and the floating point (if relevant) will be preserved:

copycat.scramble(782364.902374)
// => 533482.326595

If an object or array is given, the values inside the object or array will be recursively scrambled:

copycat.scramble({
  a: [
    {
      b: 23,
      c: 'foo',
    },
  ],
})
// => { a: [ { b: 24, c: 'uro' } ] }

If a date is given, each segment in the date will be scrambled:

copycat.scramble(new Date('2022-10-25T19:08:39.374Z'))
// => {}

If a boolean or null value is given, the value will simply be returned.

If a value of any other type is given, an error will be thrown.

options

  • preserve: An array of characters that should remain the same if present in the given input string

copycat.scramble('[email protected]', { preserve: ['@', '.'] })
// => '[email protected]'

copycat.oneOf(input, values)

Takes in an input value and an array of values, and returns an item in values that corresponds to that input:

copycat.oneOf('foo', ['red', 'green', 'blue'])
// => 'green'

copycat.someOf(input, range, values)

Takes in an input value, a range, and an array of values, and picks items from that array. The number of items picked falls within the given range and corresponds to that input. Each item will be picked no more than once.

copycat.someOf('foo', [1,2], ['paper', 'rock'])
// => []
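The selection logic behind oneOf and someOf can be approximated as follows. This is a sketch under assumptions, not the library's code: fnv1a is a toy hash, oneOfSketch and someOfSketch are hypothetical names, and the outputs will not match Copycat's.

```typescript
// Toy 32-bit FNV-1a hash used to turn an input into an integer.
function fnv1a(text: string): number {
  let hash = 0x811c9dc5
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i)
    hash = Math.imul(hash, 0x01000193)
  }
  return hash >>> 0
}

// Pick the item whose index corresponds to the hashed input.
function oneOfSketch<T>(input: string, values: T[]): T {
  return values[fnv1a(input) % values.length]
}

// Pick between min and max items, each at most once.
function someOfSketch<T>(input: string, range: [number, number], values: T[]): T[] {
  const [min, max] = range
  const count = min + (fnv1a(input + ':count') % (max - min + 1))
  const remaining = values.slice() // copy, so the caller's array is not modified
  const picked: T[] = []
  for (let i = 0; i < count && remaining.length > 0; i++) {
    const index = fnv1a(input + ':' + i) % remaining.length
    picked.push(remaining.splice(index, 1)[0]) // removing the item prevents repeats
  }
  return picked
}

const colour = oneOfSketch('foo', ['red', 'green', 'blue'])
const hands = someOfSketch('foo', [1, 2], ['paper', 'rock'])
console.log(hands.length >= 1 && hands.length <= 2)
// => true
```

Removing each picked item from the working copy is what guarantees that no item is returned twice.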

copycat.int(input[, options])

Takes in an input value and returns an integer.

copycat.int('foo')
// => 5208378699696662

options

  • min=0 and max=Infinity: the minimum and maximum possible values for returned numbers

copycat.bool(input)

Takes in an input value and returns a boolean.

copycat.bool('foo')
// => false

copycat.float(input[, options])

Takes in an input value and returns a number value with both a whole and decimal segment.

copycat.float('foo')
// => 51167487947531.74

copycat.char(input)

Takes in an input value and returns a string with a single character.

copycat.char('foo')
// => 'a'

The generated character will be an alphanumeric: lower and upper case ASCII letters and digits 0 to 9.

copycat.digit(input)

Takes in an input value and returns a string with a single digit value.

copycat.digit('foo')
// => '2'

copycat.hex(input)

Takes in an input value and returns a string with a single hex value.

copycat.hex('foo')
// => '6'

options

  • min=0 and max=Infinity: the minimum and maximum possible values for returned numbers

copycat.dateString(input[, options])

Takes in an input value and returns a string representing a date in ISO 8601 format.

copycat.dateString('foo')
// => '2002-03-15T14:10:10.000Z'

options

  • minYear=1980 and maxYear=2019: the minimum and maximum possible year values for returned dates

copycat.uuid(input)

Takes in an input and returns a string value resembling a uuid.

copycat.uuid('foo')
// => '2fabe7f3-6216-5e0b-a885-4fb9951363f5'

copycat.email(input[, options])

Takes in an input and returns a string value resembling an email address.

copycat.email('foo')
// => '[email protected]'

options

  • limit: Constrain generated values to be less than or equal to limit number of chars

copycat.firstName(input[, options])

Takes in an input and returns a string value resembling a first name.

copycat.firstName('foo')
// => 'Morris'

options

  • limit: Constrain generated values to be less than or equal to limit number of chars

copycat.lastName(input[, options])

Takes in an input and returns a string value resembling a last name.

copycat.lastName('foo')
// => 'Gleichner'

options

  • limit: Constrain generated values to be less than or equal to limit number of chars

copycat.fullName(input[, options])

Takes in an input and returns a string value resembling a full name.

copycat.fullName('foo')
// => 'Bertha Harris'

options

  • limit: Constrain generated values to be less than or equal to limit number of chars

copycat.phoneNumber(input)

Takes in an input and returns a string value resembling a phone number.

copycat.phoneNumber('foo')
// => '+69642130883467'

Note: The strings resemble phone numbers, but will not always be valid. For example, the country dialing code may not exist, or for a particular country, the number of digits may be incorrect. Please let us know if you need valid phone numbers, and feel free to contribute :)

copycat.username(input[, options])

Takes in an input and returns a string value resembling a username.

copycat.username('foo')
// => 'Albin.Schneider56223'

options

  • limit: Constrain generated values to be less than or equal to limit number of chars

copycat.password(input)

Takes in an input value and returns a string value resembling a password.

copycat.password('foo')
// => 'uoU{Dz6@[d!M'

Note: not recommended for use as a personal password generator.

copycat.city(input)

Takes in an input and returns a string value representing a city.

copycat.city('foo')
// => 'Rockford'

copycat.country(input)

Takes in an input and returns a string value representing a country.

copycat.country('foo')
// => 'Tajikistan'

copycat.streetName(input)

Takes in an input and returns a string value representing a fictitious street name.

copycat.streetName('foo')
// => 'Jewel Oval'

copycat.streetAddress(input)

Takes in an input and returns a string value representing a fictitious street address.

copycat.streetAddress('foo')
// => '11 Felipa Course'

copycat.postalAddress(input)

Takes in an input and returns a string value representing a fictitious postal address.

copycat.postalAddress('foo')
// => '114 Pacocha Ville, Potomac 5305, Saint Barthelemy'

copycat.countryCode(input)

Takes in an input and returns a string value representing a country code.

copycat.countryCode('foo')
// => 'GB'

copycat.timezone(input)

Takes in an input and returns a string value representing a time zone.

copycat.timezone('foo')
// => 'America/Caracas'

copycat.word(input[, options])

Takes in an input value and returns a string value resembling a fictitious word.

copycat.word('foo')
// => 'Makinyo'

options

  • capitalize=true: whether or not the word should start with an upper case letter
  • minSyllables=2 and maxSyllables=4: the minimum and maximum possible number of syllables that returned words will contain

copycat.word('id-2', {
  minSyllables: 1,
  maxSyllables: 6,
  unicode: 0.382
})
// => 'Meano'

copycat.words(input[, options])

Takes in an input value and returns a string value resembling fictitious words.

copycat.words('foo')
// => 'Nitaso keraekora'

options

  • min=2 and max=3: the minimum and maximum possible number of words that returned strings will contain.
  • capitalize='first': whether or not the words should start with upper case letters. If true or 'all' is given, each string returned will start with an upper case letter in each word. If 'first' is given, for each string returned, only the first word will start with an upper case letter. If false is given, each string returned will always contain only lower case letters.
  • minSyllables=1 and maxSyllables=4: the minimum and maximum possible number of syllables that returned words will contain

copycat.sentence(input[, options])

Takes in an input value and returns a string value resembling a sentence of fictitious words.

copycat.sentence('foo')
// => 'No kaikahy kokin raekinmohy somikamu, momi vakimuno hyvayo yo nonanihy.'

options

  • minClauses=1 and maxClauses=2: the minimum and maximum possible number of clauses that a returned sentence will contain.
  • minWords=5 and maxWords=8: the minimum and maximum possible number of words that each clause will contain.
  • minSyllables=1 and maxSyllables=4: the minimum and maximum possible number of syllables that returned words will contain

copycat.paragraph(input[, options])

Takes in an input value and returns a string value resembling a paragraph of fictitious words.

copycat.paragraph('foo')
// => 'Kemuvakin hychi chikaisota shi hakokinta no hanoceani. Vamuha keta rakeno me ceamomimu so kinma me, shisohy kake mukaiyurae ko meta nakinma nomukeko ma. Koka kaianiva rahyra mishimano meramua ki. Ceakinko hykochiyu chimanoshi kaika kaishiyo. Shiceayu nishiko hanomira vakinkaiko vi shiashira chiko ni, yunayuha ke kemoke ki hymakoka nacea haceamemu. Hako vikako hakomuchi vano memoako makechiyo keyucea yokina, yokanoha vakeshikai ma shicea ka muceame.'

options

  • minSentences=3 and maxSentences=7: the minimum and maximum possible number of sentences that a returned paragraph will contain.
  • minClauses=1 and maxClauses=2: the minimum and maximum possible number of clauses that each sentence will contain.
  • minWords=5 and maxWords=8: the minimum and maximum possible number of words that each clause will contain.
  • minSyllables=1 and maxSyllables=4: the minimum and maximum possible number of syllables that returned words will contain

copycat.ipv4(input)

Takes in an input value and returns a string value resembling an IPv4 address.

copycat.ipv4('foo')
// => '215.18.220.239'

copycat.mac(input)

Takes in an input value and returns a string value resembling a MAC address.

copycat.mac('foo')
// => '9b:16:b2:05:40:6c'

copycat.userAgent(input)

Takes in an input value and returns a string value resembling a browser User Agent string.

copycat.userAgent('foo')
// => 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:14.4) Gecko/20100101 Firefox/14.4.7'

Note: For simplicity, this currently works from a list of 500 pre-defined user agent strings. If this is too limiting for your needs and you need something more dynamic, please let us know, and feel free to contribute :)

copycat.times(input, range, fn)

Takes in an input value and a function fn, calls that function repeatedly (each time with a unique input) for a number of times within the given range, and returns the results as an array:

copycat.times('foo', [4, 5], copycat.word)
// => [ 'Mahy', 'Ceavihy', 'Koceachita', 'Mia', 'Moyuni' ]

As shown above, range can be a tuple array of the minimum and maximum possible number of times fn should be called. It can also be given as a number, in which case fn will be called exactly that number of times:

copycat.times('foo', 2, copycat.word)
// => [ 'Tachishimo', 'Mahy' ]
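A rough sketch of how such a helper could derive a unique input for each call is shown below. It is an illustration under assumptions rather than Copycat's implementation: fnv1a is a toy hash, timesSketch is a hypothetical name, and the word function is a made-up stand-in for copycat.word.

```typescript
// Toy 32-bit FNV-1a hash used to turn an input into an integer.
function fnv1a(text: string): number {
  let hash = 0x811c9dc5
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i)
    hash = Math.imul(hash, 0x01000193)
  }
  return hash >>> 0
}

// Call fn a deterministic number of times, each time with a distinct derived input.
function timesSketch<T>(input: string, range: number | [number, number], fn: (input: string) => T): T[] {
  const bounds: [number, number] = typeof range === 'number' ? [range, range] : range
  const [min, max] = bounds
  const count = min + (fnv1a(input + ':count') % (max - min + 1))
  const results: T[] = []
  for (let i = 0; i < count; i++) {
    results.push(fn(input + ':' + i)) // the index makes every derived input unique
  }
  return results
}

// Made-up stand-in for a maker function such as copycat.word.
const word = (input: string): string => 'w' + fnv1a(input).toString(36)

console.log(timesSketch('foo', 2, word).length)
// => 2
```

Deriving each call's input from the original input plus the call index keeps the whole array reproducible while still giving every element its own input.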

copycat.generateHashKey(secret)

Takes in a 16-byte secret value and returns an array of four 32-bit integers:

copycat.generateHashKey('Lhz1Xe7l$vPIwWr3')
// => Uint32Array(4) [ 830105676, 1815569752, 1230009892, 863131511 ]

See Working with PII for more.
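The documented output above is consistent with reading the 16 characters of the secret as bytes and packing them, four at a time in little-endian order, into 32-bit unsigned integers. The sketch below reproduces that output for this ASCII secret; packSecret is a hypothetical name, and whether the library behaves exactly this way for other inputs (for example non-ASCII secrets) is an assumption.

```typescript
// Pack a 16-character single-byte secret into four little-endian 32-bit integers.
function packSecret(secret: string): Uint32Array {
  if (secret.length !== 16) {
    throw new Error('secret must be exactly 16 characters long')
  }
  const key = new Uint32Array(4)
  for (let word = 0; word < 4; word++) {
    let value = 0
    for (let byte = 0; byte < 4; byte++) {
      const code = secret.charCodeAt(word * 4 + byte)
      if (code > 0xff) throw new Error('secret must contain single-byte characters only')
      value += code * 2 ** (8 * byte) // the lowest-order byte comes first
    }
    key[word] = value
  }
  return key
}

console.log(Array.from(packSecret('Lhz1Xe7l$vPIwWr3')))
// => [ 830105676, 1815569752, 1230009892, 863131511 ]
```

For example, the first four characters 'L', 'h', 'z', '1' have the byte values 0x4c, 0x68, 0x7a, 0x31, which read in little-endian order give 0x317a684c, or 830105676.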

copycat.setHashKey(key)

Takes in an array with four 32-bit integers to use as the hash key, and changes copycat's internal state to use it when mapping inputs to output values. See Working with PII for more.

const key = copycat.generateHashKey('Lhz1Xe7l$vPIwWr3')

copycat.setHashKey(key)
