pg-anonymizer's Introduction

pg-anonymizer

Export your PostgreSQL database anonymized. Replace all sensitive data thanks to faker. Output to a file that you can easily import with psql.

Usage

Run this command with a connection string and an output file name (no installation needed, thanks to npx):

npx pg-anonymizer postgres://user:secret@localhost:1234/mydb -o dump.sql

โ˜๏ธ This command requires pg_dump. It may already be installed as soon as PostgreSQL is installed.

The output can also be stdout ('-'), so you can pipe it to zip, gzip, or psql:

npx pg-anonymizer postgres://user:secret@localhost:1234/mydb -o - | psql DATABASE_URL

API

--columns | -c

Specify the list of columns to anonymize

Use the --columns option with a comma-separated list of column names:

npx pg-anonymizer postgres://localhost/mydb \
  --columns=email,firstName,lastName,phone

Specifying a list via --columns replaces the default list of automatically anonymized columns:

email,name,description,address,city,country,phone,comment,birthdate

You can also specify the table for a column using the dot notation:

public.user.email,public.product.description,email,name

Customize replacements

You can also choose which faker function you want to use to replace data (default is faker.random.word):

npx pg-anonymizer postgres://localhost/mydb \
  --columns=firstName:faker.name.firstName,lastName:faker.name.lastName

👉 You don't need to specify a faker function: the command tries to find a suitable one from the column name.

You can use plain text too for static replacements:

npx pg-anonymizer postgres://localhost/mydb \
  --columns=textcol:hello,jsoncol:{},intcol:12
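
When the column list grows, it can be easier to keep it as a mapping and generate the --columns value from it. The following is only a sketch: `buildColumnsArg` is a hypothetical helper, not part of pg-anonymizer, and it assumes the `column:replacement` syntax shown above with a bare column name meaning "let the command choose".

```javascript
// buildColumns.js
// Hypothetical helper (not part of pg-anonymizer): turns a mapping of
// column name -> replacement into the comma-separated --columns value.
function buildColumnsArg(spec) {
  return Object.entries(spec)
    .map(([column, replacement]) =>
      // A null replacement means "no explicit replacement for this column".
      replacement == null ? column : `${column}:${replacement}`
    )
    .join(',');
}

const columns = buildColumnsArg({
  'public.user.email': null,          // dot notation, default replacement
  firstName: 'faker.name.firstName',  // explicit faker function
  lastName: 'faker.name.lastName',
  textcol: 'hello',                   // static replacement
});

console.log(`--columns=${columns}`);
```

The printed string can then be appended to the npx pg-anonymizer command line.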

--extension

Use an extension file to create your own custom replacements

Create an extension file written in JavaScript:

// myExtension.js
module.exports = {
  // Keep the first and last characters of the local part, e.g.
  // "john.doe@example.com" becomes "j...e@example.com".
  maskEmail: (email) => {
    const [name, domain] = email.split('@');
    const { length: len } = name;
    const maskedName = name[0] + '...' + name[len - 1];
    const maskedEmail = maskedName + '@' + domain;
    return maskedEmail;
  }
};

Pass the path to --extension and reference the module's exports in --columns:

npx pg-anonymizer postgres://localhost/mydb \
  --extension ./myExtension.js \
  --columns=email:extension.maskEmail
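
The maskEmail example above assumes every value is a well-formed email string. A more defensive variant might look like the sketch below. The guards are assumptions about what the function could receive (non-string values, strings without '@', one-character local parts), not documented pg-anonymizer behavior, and the file name `safeExtension.js` is only illustrative.

```javascript
// safeExtension.js
// Sketch of a more defensive maskEmail. The guards below are assumptions
// about possible inputs, not documented pg-anonymizer behavior.
const maskEmail = (email) => {
  // Leave non-string values (for example null) untouched.
  if (typeof email !== 'string') return email;

  const at = email.lastIndexOf('@');
  // No '@' or an empty local part: return a fixed mask instead of the raw value.
  if (at < 1) return '...';

  const name = email.slice(0, at);
  const domain = email.slice(at + 1);
  // Avoid repeating the only character of a one-character local part.
  const maskedName =
    name.length === 1 ? name + '...' : name[0] + '...' + name[name.length - 1];
  return maskedName + '@' + domain;
};

module.exports = { maskEmail };
```

It would be referenced the same way, with --extension ./safeExtension.js and --columns=email:extension.maskEmail.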

--config | -f

Use a configuration file

You can use the --config option to specify a file with a list of column names and optional replacements, one per line.

Create a configuration file:

name
email
password:faker.random.word

Pass the path to the file to --config:

npx pg-anonymizer postgres://localhost/mydb \
  --config /path/to/file

--skip

Skip tables

Use --skip to skip anonymizing entire tables:

npx pg-anonymizer postgres://localhost/mydb --skip public.posts

--preserve-null | -n

Preserve NULL values

Use --preserve-null to skip anonymization on fields with NULL values.

npx pg-anonymizer postgres://localhost/mydb --preserve-null

--faker-locale

Set faker's locale (i18n)

Use --faker-locale to change the locale used by faker (default: en)

Import the anonymized file

The anonymized output file is plain SQL text that you can import with psql:

psql -d mylocaldb < output.sql

Why

There are several alternatives, but I could not get them to work for my use case:

  • postgresql_anonymizer can be hard to set up and cumbersome for simple usage. Still, I guess it's the best solution.
  • pganonymize fails when the database does not use the public schema or when columns have uppercase characters.
  • pganonymizer also fails on simple cases. Its errors are either not explicit or silent.

pg-anonymizer's People

Contributors

0xflotus, alexhall, dependabot[bot], hugoarru, imreactmd, jackall3n, kfranqueiro, rap2hpoutre, semantic-release-bot

pg-anonymizer's Issues

Excluding Tables

Is it possible to exclude certain tables from being included in the dump?

Context aware anonymization

It would be great to be able to pass a whole row to a masking function. We have a column called entity_number: for a person it holds their personal_id, and for a company it holds its company_id. Another column (entity_type) lets us distinguish between them, so it would be really useful to have this "context" available in order to generate a random personal_id or company_id.
I understand that it's not possible if the information is stored in another table, but if it's in the same row, I hope it shouldn't be too difficult. I can try to implement it myself if you are accepting merge requests, but to be honest I'm not very familiar with JavaScript/TypeScript. I can give it a shot, though.

strange error after upgrading to 0.7.0

After upgrading to 0.7.0 I encountered an issue:

ERROR: syntax error at or near "List"
LINE 1: List: public.message.target_from, public.message.target_to

public.message.target_from, public.message.target_to seems to be part of the content of my --configFile, which is actually:

public.message.target_from:extension.maskContact
public.message.target_to:extension.maskContact

After downgrading to 0.6.0, everything works as expected.

Cannot specify columns in tables with uppercase characters

Unfortunately, I am unable to anonymize columns that are in tables with uppercase characters.

These are always skipped.

I've tried every variant I can think of. For example the following:

-o anont.sql --list=abc."USER".name
-o anont.sql --list=abc."USER".name

Is this basically not possible or am I doing something wrong?

Thanks for the help in advance

Update Faker version

Hi! I see the Faker version is v5 whilst v8 is out already. Are there plans to update the faker version?

Apply ESLint fixes

Currently, there is an incompatible mix of Prettier defaults and ESLint. We should rely only on ESLint and run lint on all files.

Side note: I have to wait for #13 to be merged to avoid conflicts

params like skip not working or list with strange rules

The project looks cool! Thanks for sharing 👍

I tried it out and managed to pass a list of columns.
But it only worked when I gave just the column name, like --list=email, and not --list=public.user.email as in the docs.
Also, --skip is not skipping the table I want it to skip, no matter how I type it:
--skip=public.videoLink, --skip=videoLink, --skip=public."videoLink"

I still see (keeping only the relevant info):

npx pg-anonymizer postgresql://user:password@localhost:5432/mydb -o dump.sql --skip=public."videoLink" --list=email

List: email
Skipping: public.videolink
Output file: dump.sql

Launching pg_dump...
Command pg_dump started, running anonymization.

public."User": idclient, [email]
Anonymizing 1 column...

public."VideoLink": id, link, serviceid
Skipping... no matching columns

And it's not skipped.

Support for JSON values

Is it possible to anonymize values inside a (simple) JSON structure?

Let's say I have a JSONB column:

data
----
{"first_name":"Michael","phone":"+123456"}

and I want to anonymize data->>'first_name'. Would that work too?

Fix TypeScript issues

Currently, there are unresolved TypeScript errors and the project cannot be built. We have to fix them.

Side note: I have to wait for #13 to be merged to avoid conflicts

Ignore NULL values

Hey 👋

It would be cool to have a setting, also usable with a config file, so that any field that is NULL in the DB remains NULL instead of, for example, getting a generated fake name.

Let me know where I can help!

Cheers!

Error: Nonexistent flags

This command was working great until fairly recently

npx pg-anonymizer postgres://"${PROD_POSTGRES_USER}":"${PROD_POSTGRES_PASSWORD}"@"${PROD_POSTGRES_HOST}":5432/"${PROD_POSTGRES_DB}" --no-privileges --no-owner --clean --if-exists -T "Changelog"

This has been failing with the following error:
› Error: Nonexistent flags: --no-privileges, --no-owner, --clean,
› --if-exists, -T, -T, -T
› See more help with --help

Has the ability to pass pg_dump flags been removed?

Thanks!

pg_dump command failed

@rap2hpoutre

Even though pg_dump is installed, I get this error when running pg-anonymizer:

npx pg-anonymizer postgres://....
npx: installed 80 in 4.721s
Launching pg_dump
pg_dump command failed. Are you sure it is installed?

root@ip-10-0-00-000:/ # pg_dump --version
pg_dump (PostgreSQL) 12.7 (Ubuntu 12.7-1.pgdg20.04+1)

Anything I can try?
