humanmade / go-anonymize-mysqldump
Allows you to pipe data from mysqldump or an SQL file and anonymize it.
License: GNU General Public License v3.0
I just tried to follow the installation instructions for Mac. I ran
curl -OL https://github.com/humanmade/go-anonymize-mysqldump/releases/download/latest/go-anonymize-mysqldump_darwin_amd64.gz
and then tried
gunzip go-anonymize-mysqldump_darwin_amd64.gz
However, the latter returns
gunzip: go-anonymize-mysqldump_darwin_amd64.gz: not in gzip format
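One way to narrow this down is to check whether the downloaded file really starts with the gzip magic bytes (0x1f 0x8b, per RFC 1952). The sketch below does only that; the helper name `isGzip` is made up for illustration, and the idea that the download might be something other than an archive (for example an error page) is an assumption, not something confirmed in this report.

```go
package main

import (
	"fmt"
	"os"
)

// isGzip reports whether data starts with the two-byte gzip magic number
// (0x1f 0x8b) defined in RFC 1952.
func isGzip(data []byte) bool {
	return len(data) >= 2 && data[0] == 0x1f && data[1] == 0x8b
}

func main() {
	// File name taken from the curl command above.
	f, err := os.Open("go-anonymize-mysqldump_darwin_amd64.gz")
	if err != nil {
		fmt.Println("open failed:", err)
		return
	}
	defer f.Close()

	// Read just the first two bytes and inspect them.
	header := make([]byte, 2)
	n, _ := f.Read(header)
	fmt.Println("looks like gzip:", isGzip(header[:n]))
}
```

If this prints `false`, opening the file in a text editor should show what the server actually returned.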
When trying to fix #15 with a simple markdown update, I noticed the CI pipeline is now failing.
fatal: Could not read from remote repository.
Please make sure you have the correct access rights
and the repository exists.
exit status 128
I'm getting a:
panic: interface conversion: sqlparser.Expr is *sqlparser.NullVal, not *sqlparser.SQLVal
goroutine 33 [running]:
main.modifyValues(0xc0004f4000, 0x4fd, 0x555, 0xc000014420, 0x9, 0xc0000f8700, 0x4, 0x4, 0x0, 0x0, ...)
/Users/aang/Code/golang/src/github.com/humanmade/go-anonymize-mysqldump/anonymize-mysqldump.go:299 +0x572
main.applyConfigToInserts(0xc0000a2dc0, 0xc0000c08c0, 0x2, 0x4, 0x0, 0x0, 0x0)
/Users/aang/Code/golang/src/github.com/humanmade/go-anonymize-mysqldump/anonymize-mysqldump.go:277 +0x1f9
main.applyConfigToParsedLine(0x66be60, 0xc0000a2dc0, 0xc0000c08c0, 0x2, 0x4, 0x0, 0x0, 0x0, 0x0)
/Users/aang/Code/golang/src/github.com/humanmade/go-anonymize-mysqldump/anonymize-mysqldump.go:250 +0x68
main.processLine(0xc000560000, 0xff0d5, 0xc0000c08c0, 0x2, 0x4, 0x0, 0x0)
/Users/aang/Code/golang/src/github.com/humanmade/go-anonymize-mysqldump/anonymize-mysqldump.go:218 +0x2c1
main.processInput.func1(0xc000014550, 0xc0000c08c0, 0x2, 0x4, 0xc0003dc360, 0xc000560000, 0xff0d5)
/Users/aang/Code/golang/src/github.com/humanmade/go-anonymize-mysqldump/anonymize-mysqldump.go:194 +0xab
created by main.processInput
/Users/aang/Code/golang/src/github.com/humanmade/go-anonymize-mysqldump/anonymize-mysqldump.go:192 +0x41c
mysqldump: Got errno 32 on write
I believe this happens because I set some field types that are not yet supported.
If that's the case, I think the error message should be clearer, so it's easier to tell that the mistake is mine and not a real bug.
I want to be able to add additional configs, as in when applications have additional fields to anonymize or aren't even WordPress. Ideally this would be in a JSON/YAML/TOML format.
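As a rough sketch of what loading such a config from JSON could look like, the example below uses Go's standard `encoding/json` package. The struct and function names (`Config`, `Pattern`, `Field`, `parseConfig`) are assumptions for illustration; the JSON keys follow the example config that appears later in this thread. The `constraints` key is left out of the struct because its shape is not shown here, and `encoding/json` ignores unknown keys by default.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Config, Pattern and Field are hypothetical types mirroring the JSON keys
// "patterns", "tableName", "fields", "field", "position" and "type".
type Config struct {
	Patterns []Pattern `json:"patterns"`
}

type Pattern struct {
	TableName string  `json:"tableName"`
	Fields    []Field `json:"fields"`
}

type Field struct {
	Field    string `json:"field"`
	Position int    `json:"position"`
	Type     string `json:"type"`
}

// parseConfig decodes a JSON document into a Config.
func parseConfig(raw string) (Config, error) {
	var c Config
	err := json.Unmarshal([]byte(raw), &c)
	return c, err
}

func main() {
	raw := `{"patterns":[{"tableName":"test","fields":[{"field":"email","position":1,"type":"email","constraints":null}]}]}`
	cfg, err := parseConfig(raw)
	if err != nil {
		fmt.Println("invalid config:", err)
		return
	}
	for _, p := range cfg.Patterns {
		for _, f := range p.Fields {
			fmt.Printf("%s.%s (column %d) -> %s\n", p.TableName, f.Field, f.Position, f.Type)
		}
	}
}
```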
I discovered today that this won't be able to parse multi-line insert statements like the following SQL:
INSERT INTO `wp_cavalcade_jobs` VALUES
(1,1,"wp_version_check","a:0:{}","2017-08-24 08:15:12","2017-09-11 20:15:12",43200,"failed"),
(2,1,"wp_update_plugins","a:0:{}","2017-08-24 08:15:12","2017-09-11 20:15:12",43200,"failed"),
(3,1,"wp_update_themes","a:0:{}","2017-08-24 08:15:12","2017-09-11 20:15:12",43200,"failed"),
(4,1,"wp_scheduled_delete","a:0:{}","2017-10-15 10:11:15","2017-10-15 10:11:15",86400,"failed");
Because we're parsing line by line, it reads each value as its own SQL statement which breaks the parser. I need to either merge them all into a single line or find some other solution.
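The "merge them into a single line" option could be sketched as follows: buffer physical lines until one ends with `;`, then hand the joined text to the parser as one statement. The function name `joinStatements` is hypothetical, and the approach is deliberately naive: it assumes that no line in the middle of a statement ends with `;` (for example inside a string value that spans lines).

```go
package main

import (
	"fmt"
	"strings"
)

// joinStatements merges physical lines into logical statements. Lines are
// accumulated until one ends with ";" (ignoring surrounding whitespace).
// Blank lines are dropped, and "--" comment lines that appear between
// statements are passed through as their own entries.
func joinStatements(lines []string) []string {
	var out []string
	var buf []string
	for _, line := range lines {
		trimmed := strings.TrimSpace(line)
		if trimmed == "" {
			continue
		}
		if len(buf) == 0 && strings.HasPrefix(trimmed, "--") {
			out = append(out, trimmed)
			continue
		}
		buf = append(buf, trimmed)
		if strings.HasSuffix(trimmed, ";") {
			out = append(out, strings.Join(buf, " "))
			buf = nil
		}
	}
	if len(buf) > 0 {
		// Trailing text without a terminator is emitted as-is.
		out = append(out, strings.Join(buf, " "))
	}
	return out
}

func main() {
	lines := []string{
		"INSERT INTO `wp_cavalcade_jobs` VALUES",
		`(1,1,"wp_version_check","a:0:{}","2017-08-24 08:15:12","2017-09-11 20:15:12",43200,"failed"),`,
		`(4,1,"wp_scheduled_delete","a:0:{}","2017-10-15 10:11:15","2017-10-15 10:11:15",86400,"failed");`,
	}
	for i, stmt := range joinStatements(lines) {
		fmt.Printf("statement %d: %s\n", i+1, stmt)
	}
}
```

A more robust version would track quoting state while scanning, so that a `;` inside a string literal is not mistaken for the end of a statement.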
While I've been building the prototype, I've been replacing values using functions with fixed strings. I want to use a faker-type library to actually generate random data, but unit testing randomly generated data is tough so I've been deferring solving that problem.
I just realized I could use the existing functions as mocks for testing purposes, so now I just need to write the integration.
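A minimal sketch of that idea: keep the generators behind a small function type, so tests can plug in fixed-string versions while real runs use random ones. Everything here (`Generator`, `fixedGenerators`, `randomGenerators`, `anonymize`) is a hypothetical name, and the random generators use `math/rand` purely as a stand-in for a faker library.

```go
package main

import (
	"fmt"
	"math/rand"
)

// Generator produces a replacement value for one field type.
type Generator func() string

// fixedGenerators return constant strings, so tests can assert exact output.
func fixedGenerators() map[string]Generator {
	return map[string]Generator{
		"username": func() string { return "username" },
		"email":    func() string { return "user@example.com" },
	}
}

// randomGenerators stand in for faker-backed generators in real runs.
func randomGenerators(rng *rand.Rand) map[string]Generator {
	return map[string]Generator{
		"username": func() string { return fmt.Sprintf("user%d", rng.Intn(1000000)) },
		"email":    func() string { return fmt.Sprintf("user%d@example.com", rng.Intn(1000000)) },
	}
}

// anonymize replaces value using the generator registered for fieldType.
// Values with an unknown field type are returned unchanged.
func anonymize(gens map[string]Generator, fieldType, value string) string {
	if g, ok := gens[fieldType]; ok {
		return g()
	}
	return value
}

func main() {
	fixed := fixedGenerators()
	random := randomGenerators(rand.New(rand.NewSource(1)))
	fmt.Println(anonymize(fixed, "email", "real.person@corp.test"))
	fmt.Println(anonymize(random, "email", "real.person@corp.test"))
}
```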
Currently we support replacing usernames, emails, etc. I would like to expose the other data types the faker library supports as well.
We're currently using the production database in dev.
We plan to improve this by anonymizing the user data.
However, we have some test/dev/admin accounts with a variety of roles and associated data that we want to preserve.
It would be nice to be able to skip all transformations for a row if a condition matches. An example config we'd use to achieve this:
"tableName": "users",
"ignore": [
{
"field": "mail",
"position": 6,
"value": "[email protected]"
},
{
"field": "mail",
"position": 6,
"value": "[email protected]"
}
],
"fields": [
As a workaround:
We take two mysqldumps. The first one contains all data and goes through anonymization.
The second dump contains only the users we plan to keep as-is, written as "REPLACE INTO" queries, and is appended to the first.
mysqldump --no-create-info --where "email like '%admin.com'" --replace application user_data > own_users.sql
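For the requested feature itself, the row check could look roughly like this. `IgnoreRule` and `shouldSkipRow` are hypothetical names modelled on the proposed `ignore` entries, positions are assumed to be 1-based as in the existing `fields` config, and the email address is a placeholder.

```go
package main

import "fmt"

// IgnoreRule is a hypothetical struct for one entry of the proposed
// "ignore" list: a column position and the value that exempts a row.
type IgnoreRule struct {
	Field    string
	Position int // 1-based column position (assumed)
	Value    string
}

// shouldSkipRow reports whether any rule matches the row, in which case
// the row would be written out without any transformation.
func shouldSkipRow(rules []IgnoreRule, row []string) bool {
	for _, r := range rules {
		idx := r.Position - 1
		if idx >= 0 && idx < len(row) && row[idx] == r.Value {
			return true
		}
	}
	return false
}

func main() {
	rules := []IgnoreRule{{Field: "mail", Position: 6, Value: "admin@example.com"}}
	keep := []string{"1", "admin", "x", "x", "x", "admin@example.com"}
	anonymizeMe := []string{"2", "jane", "x", "x", "x", "jane@example.org"}
	fmt.Println(shouldSkipRow(rules, keep))        // true
	fmt.Println(shouldSkipRow(rules, anonymizeMe)) // false
}
```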
Need to run tests automatically.
We have a user table where the username and email columns contain the same value.
We'd like the anonymized replacements to be the same for both columns in the same row.
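One possible approach, sketched under assumptions: remember the replacement chosen for each original value, so that the same input always maps to the same output regardless of which column it appears in. `consistentReplacer` is a hypothetical helper, not part of the tool.

```go
package main

import "fmt"

// consistentReplacer wraps a generator so that each distinct original value
// is mapped to exactly one replacement. Identical inputs (for example the
// username and email columns of one row) therefore get identical outputs.
func consistentReplacer(generate func() string) func(string) string {
	seen := make(map[string]string)
	return func(original string) string {
		if r, ok := seen[original]; ok {
			return r
		}
		r := generate()
		seen[original] = r
		return r
	}
}

func main() {
	n := 0
	replace := consistentReplacer(func() string {
		n++
		return fmt.Sprintf("user%d@example.com", n)
	})
	// Both columns hold the same original value, so both get the same replacement.
	fmt.Println(replace("jane@corp.test"), replace("jane@corp.test"))
	fmt.Println(replace("bob@corp.test"))
}
```

The map grows with the number of distinct values and is not safe for concurrent use, so a version used from several goroutines would need a mutex around it.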
If you have a UNIQUE index set on those fields, anonymizing will break the dump, as there's no option for truly unique emails/usernames.
My workaround has been to add new transformation functions:
// Add these imports.
import (
	"strconv"
	"time"
)

// generateUniqueEmail builds an address from the current Unix time in nanoseconds.
func generateUniqueEmail(value *sqlparser.SQLVal) *sqlparser.SQLVal {
	return sqlparser.NewStrVal([]byte(strconv.FormatInt(time.Now().UnixNano(), 10) + "@gmail.com"))
}

// generateUniqueUsername appends the same kind of timestamp to a faker username.
func generateUniqueUsername(value *sqlparser.SQLVal) *sqlparser.SQLVal {
	return sqlparser.NewStrVal([]byte(faker.Internet().UserName() + strconv.FormatInt(time.Now().UnixNano(), 10)))
}
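A timestamp can repeat if two values are generated within the clock's resolution, so an alternative worth considering is a process-wide counter, which cannot repeat within a single run. The sketch below is an assumption-laden illustration: `nextUniqueEmail` and `nextUniqueUsername` are hypothetical helpers that return plain strings, and the `example.com` domain is a placeholder.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// uniqueCounter is shared by all generators. atomic.AddUint64 makes it
// safe to increment from several goroutines.
var uniqueCounter uint64

// nextUniqueEmail returns an address that is unique within this process.
func nextUniqueEmail() string {
	n := atomic.AddUint64(&uniqueCounter, 1)
	return fmt.Sprintf("user%d@example.com", n)
}

// nextUniqueUsername appends the counter to a base name, for example one
// produced by a faker library.
func nextUniqueUsername(base string) string {
	n := atomic.AddUint64(&uniqueCounter, 1)
	return fmt.Sprintf("%s%d", base, n)
}

func main() {
	fmt.Println(nextUniqueEmail())
	fmt.Println(nextUniqueEmail())
	fmt.Println(nextUniqueUsername("jane"))
}
```

Values are only unique per run. If two dumps anonymized separately are later merged, the counter would need a per-run prefix or similar.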
Given a database like
DROP TABLE IF EXISTS `test`;
CREATE TABLE `test` (
`email` varchar(255)
);
INSERT INTO `test` VALUES ('foo'), (NULL), ('hodger');
if you use a config like
{
"patterns": [
{
"tableName": "test",
"fields": [
{
"field": "email",
"position": 1,
"type": "email",
"constraints": null
}
]
}
]
}
the tool will crash with a panic:
panic: interface conversion: sqlparser.Expr is *sqlparser.NullVal, not *sqlparser.SQLVal
goroutine 35 [running]:
main.modifyValues(0xc0003f6060, 0x3, 0x4, 0xc00008e2f8, 0x4, 0xc0001e6000, 0x1, 0x4, 0x0, 0x0, ...)
/Users/aang/Code/golang/src/github.com/humanmade/go-anonymize-mysqldump/anonymize-mysqldump.go:299 +0x572
main.applyConfigToInserts(0xc00041a0b0, 0xc0001e4000, 0x1, 0x4, 0x0, 0x0, 0x0)
/Users/aang/Code/golang/src/github.com/humanmade/go-anonymize-mysqldump/anonymize-mysqldump.go:277 +0x1f9
main.applyConfigToParsedLine(0x126f020, 0xc00041a0b0, 0xc0001e4000, 0x1, 0x4, 0x0, 0x0, 0x0, 0x0)
/Users/aang/Code/golang/src/github.com/humanmade/go-anonymize-mysqldump/anonymize-mysqldump.go:250 +0x68
main.processLine(0xc0001cc1c0, 0x36, 0xc0001e4000, 0x1, 0x4, 0x0, 0x0)
/Users/aang/Code/golang/src/github.com/humanmade/go-anonymize-mysqldump/anonymize-mysqldump.go:218 +0x2c1
main.processInput.func1(0xc00008e320, 0xc0001e4000, 0x1, 0x4, 0xc0003f20c0, 0xc0001cc1c0, 0x36)
/Users/aang/Code/golang/src/github.com/humanmade/go-anonymize-mysqldump/anonymize-mysqldump.go:194 +0xab
created by main.processInput
/Users/aang/Code/golang/src/github.com/humanmade/go-anonymize-mysqldump/anonymize-mysqldump.go:192 +0x41c
If NULL does not appear, it works fine.
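The panic message points at an unchecked type assertion on a value that turned out to be `*sqlparser.NullVal`. A self-contained sketch of the usual Go remedy, the comma-ok form of the assertion, is shown below. The `Expr`, `SQLVal` and `NullVal` types are local stand-ins declared only so the example runs on its own; they are not the real sqlparser definitions, and `anonymizeExpr` is a hypothetical helper rather than the tool's actual code.

```go
package main

import "fmt"

// Local stand-ins for the sqlparser types named in the panic message.
type Expr interface{}

type SQLVal struct{ Val []byte }

type NullVal struct{}

// anonymizeExpr transforms only *SQLVal values. Anything else, such as
// *NullVal, is returned unchanged instead of triggering a panic.
func anonymizeExpr(e Expr, transform func(*SQLVal) *SQLVal) Expr {
	v, ok := e.(*SQLVal) // the comma-ok form never panics
	if !ok {
		return e
	}
	return transform(v)
}

func main() {
	mask := func(v *SQLVal) *SQLVal {
		return &SQLVal{Val: []byte("user@example.com")}
	}
	// Mirrors the three rows from the INSERT above: 'foo', NULL, 'hodger'.
	row := []Expr{&SQLVal{Val: []byte("foo")}, &NullVal{}, &SQLVal{Val: []byte("hodger")}}
	for _, e := range row {
		switch out := anonymizeExpr(e, mask).(type) {
		case *SQLVal:
			fmt.Println("value:", string(out.Val))
		case *NullVal:
			fmt.Println("NULL left untouched")
		}
	}
}
```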
Ty for creating this tool!!
It would be perfect to be able to set up all users with the same password, so we can log in as any of them during development.