artlabss / open-data-anonymizer
Python Data Anonymization & Masking Library For Data Science Tasks
Home Page: https://www.artlabs.tech
License: BSD 3-Clause "New" or "Revised" License
Hi,
while calling the anonymize function:
Could you please let me know how I can fix this error?
Thanks, Shiva
Is your feature request related to a problem? Please describe.
Thank you for developing this interesting package! However, I think the package would greatly benefit from testing. The two main reasons are:
Describe the solution you'd like
At a minimum, unit and integration tests should be implemented. Unit tests should ensure that functions and methods behave as expected: pass in a value and check that the correct output is produced. Integration tests should ensure that, from a user's standpoint, the functions in your package are doing what they should.
Testing is commonly implemented using the pytest package.
To be extra safe, static code checking should be done, for instance using pytest-flake8, and pytest-mypy for static type checking. These can catch hidden bugs before they become a problem. Additionally, adding pytest-pylint would ensure that contributors format their code well.
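As a sketch of what such a unit test could look like with pytest (the `mask_email` helper below is hypothetical, shown only to illustrate the pattern, not part of anonympy):

```python
# test_masking.py -- minimal pytest-style unit test sketch.
# `mask_email` is a hypothetical helper used only to illustrate the pattern.

def mask_email(email: str) -> str:
    """Replace the local part of an email address with asterisks."""
    local, _, domain = email.partition("@")
    return "*" * len(local) + "@" + domain

def test_mask_email_hides_local_part():
    assert mask_email("alice@example.com") == "*****@example.com"

def test_mask_email_keeps_domain():
    assert mask_email("bob@artlabs.tech").endswith("@artlabs.tech")
```

Running `pytest` would collect and execute both `test_` functions automatically.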
Describe alternatives you've considered
There is no alternative to testing. However, there are a few packages to implement it aside from pytest, such as Robot and unittest.
Additional context
This package is related to privacy, so it is critical that users can be confident in the product. I gain confidence in a package by seeing what claims are being substantiated by passing tests. If there isn't a test for it, I assume the feature cannot yet be trusted.
As it stands, I also would not be able to contribute (e.g. refactoring for efficiency) because I cannot be sure that I am not breaking current functionality.
Is your feature request related to a problem? Please describe.
It would be nice if dfAnonymizer held a Faker instance that could be replaced or configured. This way I could add my own custom providers and have them called using pandas.
Right now, it looks like the Faker instance is only created once the _fake_column method is called.
Describe the bug
To Reproduce
Steps to reproduce the behavior:
My script:
from anonympy.pdf import pdfAnonymizer
# need to specify paths, since I don't have them in system variables
anonym = pdfAnonymizer(
path_to_pdf="embedded_text.pdf",
pytesseract_path=r"C:\Program Files\Tesseract-OCR\tesseract.exe",
poppler_path=r"C:\Users\Krish Patel\Downloads\poppler-23.07.0\Library\bin",
)
# Calling the generic function
anonym.anonymize(
output_path="output.pdf", remove_metadata=True, fill="red", outline="black"
)
I have already downloaded both poppler and tesseract.
Then I ran
pip install anonympy
-> Ran successfully
Then I ran the script and got the below error:
(openadapt-py3.10) PS P:\OpenAdapt AI - MLDS AI\cloned_repo\test_other\OpenAdapt\openadapt\research_redaction> python .\test_open_data_anony_pdf.py
Traceback (most recent call last):
File "P:\OpenAdapt AI - MLDS AI\cloned_repo\test_other\OpenAdapt\openadapt\research_redaction\test_open_data_anony_pdf.py", line 1, in <module>
from anonympy.pdf import pdfAnonymizer
File "C:\Users\Krish Patel\AppData\Local\pypoetry\Cache\virtualenvs\openadapt-DSRh12US-py3.10\lib\site-packages\anonympy\__init__.py", line 1, in <module>
from anonympy import pandas
File "C:\Users\Krish Patel\AppData\Local\pypoetry\Cache\virtualenvs\openadapt-DSRh12US-py3.10\lib\site-packages\anonympy\pandas\__init__.py", line 6, in <module>
from anonympy.pandas.core_pandas import dfAnonymizer
File "C:\Users\Krish Patel\AppData\Local\pypoetry\Cache\virtualenvs\openadapt-DSRh12US-py3.10\lib\site-packages\anonympy\pandas\core_pandas.py", line 6, in <module>
from cape_privacy.pandas import dtypes
ModuleNotFoundError: No module named 'cape_privacy'
(openadapt-py3.10) PS P:\OpenAdapt AI - MLDS AI\cloned_repo\test_other\OpenAdapt\openadapt\research_redaction>
Then I ran pip install cape-dataframes
-> It ran successfully.
Then on running pip install cape-privacy
it gave the following error:
(openadapt-py3.10) PS P:\OpenAdapt AI - MLDS AI\cloned_repo\test_other\OpenAdapt\openadapt\research_redaction> pip install cape-privacy
...
/Tcnumpy\core\src\multiarray\scalarapi.c /Fobuild\temp.win-amd64-3.10\Release\numpy\core\src\multiarray\scalarapi.obj
C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.36.32532\bin\HostX86\x64\cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MD -DNPY_INTERNAL_BUILD=1 -DHAVE_NPY_CONFIG_H=1 -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE=1 -D_LARGEFILE64_SOURCE=1 -Ibuild\src.win-amd64-3.1\numpy\core\src\private -Inumpy\core\include -Ibuild\src.win-amd64-3.1\numpy\core\include/numpy -Inumpy\core\src\private -Inumpy\core\src -Inumpy\core -Inumpy\core\src\npymath -Inumpy\core\src\multiarray -Inumpy\core\src\umath -Inumpy\core\src\npysort -I"C:\Users\Krish Patel\AppData\Local\pypoetry\Cache\virtualenvs\openadapt-DSRh12US-py3.10\include" -I"C:\Program Files\Python310\include" -I"C:\Program Files\Python310\Include" -Ibuild\src.win-amd64-3.1\numpy\core\src\private -Ibuild\src.win-amd64-3.1\numpy\core\src\npymath -Ibuild\src.win-amd64-3.1\numpy\core\src\private -Ibuild\src.win-amd64-3.1\numpy\core\src\npymath -Ibuild\src.win-amd64-3.1\numpy\core\src\private -Ibuild\src.win-amd64-3.1\numpy\core\src\npymath -I"C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.36.32532\include" -I"C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.36.32532\ATLMFC\include" -I"C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Auxiliary\VS\include" -I"C:\Program Files (x86)\Windows Kits\10\include\10.0.22621.0\ucrt" -I"C:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\um" -I"C:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\shared" -I"C:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\winrt" -I"C:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\cppwinrt" -I"C:\Program Files (x86)\Windows Kits\NETFXSDK\4.8\include\um" -I"C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\INCLUDE" -I"C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\ATLMFC\INCLUDE" -I"C:\Program Files (x86)\Windows Kits\10\include\10.0.22621.0\ucrt" -I"C:\Program Files (x86)\Windows 
Kits\10\include\10.0.22621.0\shared" -I"C:\Program Files (x86)\Windows Kits\10\include\10.0.22621.0\um" -I"C:\Program Files (x86)\Windows Kits\10\include\10.0.22621.0\winrt" /Tcbuild\src.win-amd64-3.1\numpy\core\src\multiarray\scalartypes.c /Fobuild\temp.win-amd64-3.10\Release\build\src.win-amd64-3.1\numpy\core\src\multiarray\scalartypes.obj
error: Command "C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.36.32532\bin\HostX86\x64\cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MD -DNPY_INTERNAL_BUILD=1 -DHAVE_NPY_CONFIG_H=1 -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE=1 -D_LARGEFILE64_SOURCE=1 -Ibuild\src.win-amd64-3.1\numpy\core\src\private -Inumpy\core\include -Ibuild\src.win-amd64-3.1\numpy\core\include/numpy -Inumpy\core\src\private -Inumpy\core\src -Inumpy\core -Inumpy\core\src\npymath -Inumpy\core\src\multiarray -Inumpy\core\src\umath -Inumpy\core\src\npysort -I"C:\Users\Krish Patel\AppData\Local\pypoetry\Cache\virtualenvs\openadapt-DSRh12US-py3.10\include" -I"C:\Program Files\Python310\include" -I"C:\Program Files\Python310\Include" -Ibuild\src.win-amd64-3.1\numpy\core\src\private -Ibuild\src.win-amd64-3.1\numpy\core\src\npymath -Ibuild\src.win-amd64-3.1\numpy\core\src\private -Ibuild\src.win-amd64-3.1\numpy\core\src\npymath -Ibuild\src.win-amd64-3.1\numpy\core\src\private -Ibuild\src.win-amd64-3.1\numpy\core\src\npymath -I"C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.36.32532\include" -I"C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.36.32532\ATLMFC\include" -I"C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Auxiliary\VS\include" -I"C:\Program Files (x86)\Windows Kits\10\include\10.0.22621.0\ucrt" -I"C:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\um" -I"C:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\shared" -I"C:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\winrt" -I"C:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\cppwinrt" -I"C:\Program Files (x86)\Windows Kits\NETFXSDK\4.8\include\um" -I"C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\INCLUDE" -I"C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\ATLMFC\INCLUDE" -I"C:\Program Files (x86)\Windows Kits\10\include\10.0.22621.0\ucrt" -I"C:\Program Files (x86)\Windows 
Kits\10\include\10.0.22621.0\shared" -I"C:\Program Files (x86)\Windows Kits\10\include\10.0.22621.0\um" -I"C:\Program Files (x86)\Windows Kits\10\include\10.0.22621.0\winrt" /Tcbuild\src.win-amd64-3.1\numpy\core\src\multiarray\scalartypes.c /Fobuild\temp.win-amd64-3.10\Release\build\src.win-amd64-3.1\numpy\core\src\multiarray\scalartypes.obj" failed with exit status 2
scalartypes.c
numpy\core\include\numpy/npy_3kcompat.h(198): warning C4244: '=': conversion from 'Py_ssize_t' to 'int', possible loss of data
numpy\core\src\multiarray\common.h(269): warning C4244: 'return': conversion from 'npy_intp' to 'int', possible loss of data
numpy\core\src\multiarray\scalartypes.c.src(483): warning C4244: '=': conversion from 'Py_ssize_t' to 'int', possible loss of data
numpy\core\src\multiarray\scalartypes.c.src(483): warning C4244: '=': conversion from 'Py_ssize_t' to 'int', possible loss of data
numpy\core\src\multiarray\scalartypes.c.src(483): warning C4244: '=': conversion from 'Py_ssize_t' to 'int', possible loss of data
numpy\core\src\multiarray\scalartypes.c.src(482): warning C4996: 'PyUnicode_AsUnicode': deprecated in 3.3
numpy\core\src\multiarray\scalartypes.c.src(483): warning C4996: '_PyUnicode_get_wstr_length': deprecated in 3.3
numpy\core\src\multiarray\scalartypes.c.src(488): warning C4996: 'PyUnicode_FromUnicode': deprecated in 3.3
numpy\core\src\multiarray\scalartypes.c.src(483): warning C4244: '=': conversion from 'Py_ssize_t' to 'int', possible loss of data
numpy\core\src\multiarray\scalartypes.c.src(482): warning C4996: 'PyUnicode_AsUnicode': deprecated in 3.3
numpy\core\src\multiarray\scalartypes.c.src(483): warning C4996: '_PyUnicode_get_wstr_length': deprecated in 3.3
numpy\core\src\multiarray\scalartypes.c.src(488): warning C4996: 'PyUnicode_FromUnicode': deprecated in 3.3
numpy\core\src\multiarray\scalartypes.c.src(516): warning C4267: '=': conversion from 'size_t' to 'int', possible loss of data
numpy\core\src\multiarray\scalartypes.c.src(517): warning C4267: '=': conversion from 'size_t' to 'int', possible loss of data
numpy\core\src\multiarray\scalartypes.c.src(1912): warning C4244: 'function': conversion from 'Py_ssize_t' to 'int', possible loss of data
numpy\core\src\multiarray\scalartypes.c.src(1912): warning C4244: 'function': conversion from 'Py_ssize_t' to 'int', possible loss of data
numpy\core\src\multiarray\scalartypes.c.src(1866): warning C4996: 'PyUnicode_AsUnicode': deprecated in 3.3
numpy\core\src\multiarray\scalartypes.c.src(1867): warning C4996: '_PyUnicode_get_wstr_length': deprecated in 3.3
numpy\core\src\multiarray\scalartypes.c.src(1871): warning C4996: 'PyObject_AsReadBuffer': deprecated in 3.0
numpy\core\src\multiarray\scalartypes.c.src(2768): warning C4244: '=': conversion from 'Py_ssize_t' to 'int', possible loss of data
[identical C4244 warnings repeated for the remaining scalar types; duplicate lines omitted]
numpy\core\src\multiarray\scalartypes.c.src(2788): warning C4244: '=': conversion from 'Py_ssize_t' to 'int', possible loss of data
numpy\core\src\multiarray\scalartypes.c.src(3228): error C2440: 'function': cannot convert from 'double' to 'PyObject *'
numpy\core\src\multiarray\scalartypes.c.src(3228): warning C4024: '_Py_HashDouble': different types for formal and actual parameter 1
numpy\core\src\multiarray\scalartypes.c.src(3228): error C2198: '_Py_HashDouble': too few arguments for call
numpy\core\src\multiarray\scalartypes.c.src(3237): error C2440: 'function': cannot convert from 'double' to 'PyObject *'
numpy\core\src\multiarray\scalartypes.c.src(3237): warning C4024: '_Py_HashDouble': different types for formal and actual parameter 1
numpy\core\src\multiarray\scalartypes.c.src(3236): error C2198: '_Py_HashDouble': too few arguments for call
numpy\core\src\multiarray\scalartypes.c.src(3243): error C2440: 'function': cannot convert from 'double' to 'PyObject *'
numpy\core\src\multiarray\scalartypes.c.src(3243): warning C4024: '_Py_HashDouble': different types for formal and actual parameter 1
numpy\core\src\multiarray\scalartypes.c.src(3242): error C2198: '_Py_HashDouble': too few arguments for call
numpy\core\src\multiarray\scalartypes.c.src(3228): error C2440: 'function': cannot convert from 'npy_longdouble' to 'PyObject *'
numpy\core\src\multiarray\scalartypes.c.src(3228): warning C4024: '_Py_HashDouble': different types for formal and actual parameter 1
numpy\core\src\multiarray\scalartypes.c.src(3228): error C2198: '_Py_HashDouble': too few arguments for call
numpy\core\src\multiarray\scalartypes.c.src(3237): error C2440: 'function': cannot convert from 'npy_longdouble' to 'PyObject *'
numpy\core\src\multiarray\scalartypes.c.src(3237): warning C4024: '_Py_HashDouble': different types for formal and actual parameter 1
numpy\core\src\multiarray\scalartypes.c.src(3236): error C2198: '_Py_HashDouble': too few arguments for call
numpy\core\src\multiarray\scalartypes.c.src(3243): error C2440: 'function': cannot convert from 'npy_longdouble' to 'PyObject *'
numpy\core\src\multiarray\scalartypes.c.src(3243): warning C4024: '_Py_HashDouble': different types for formal and actual parameter 1
numpy\core\src\multiarray\scalartypes.c.src(3242): error C2198: '_Py_HashDouble': too few arguments for call
numpy\core\src\multiarray\scalartypes.c.src(3258): error C2440: 'function': cannot convert from 'double' to 'PyObject *'
numpy\core\src\multiarray\scalartypes.c.src(3258): warning C4024: '_Py_HashDouble': different types for formal and actual parameter 1
numpy\core\src\multiarray\scalartypes.c.src(3258): error C2198: '_Py_HashDouble': too few arguments for call
numpy\core\src\multiarray\scalartypes.c.src(4478): warning C4244: 'return': conversion from 'npy_intp' to 'int', possible loss of data
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for numpy
Running setup.py clean for numpy
error: subprocess-exited-with-error
python setup.py clean did not run successfully.
exit code: 1
[10 lines of output]
Running from numpy source directory.
`setup.py clean` is not supported, use one of the following instead:
- `git clean -xdf` (cleans all files)
- `git clean -Xdf` (cleans all versioned files, doesn't touch
files that aren't checked into the git repo)
Add `--force` to your command to use it anyway if you must (unsupported).
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed cleaning build dir for numpy
Failed to build numpy
ERROR: Could not build wheels for numpy, which is required to install pyproject.toml-based projects
[notice] A new release of pip is available: 23.1.2 -> 23.2
[notice] To update, run: python.exe -m pip install --upgrade pip
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error
× pip subprocess to install build dependencies did not run successfully.
│ exit code: 1
╰─> See above for output.
note: This error originates from a subprocess, and is likely not a problem with pip.
Expected behavior
The script should run and produce the anonymized output.pdf.
It would be great if we could set the locale for Faker - setLocale().
That way, when anonymizing data (addresses, for example) with the locale set, it would generate addresses specific to that location.
Hi, this is a courtesy ping from the folks at Cape. It looks like this library is using the cape_privacy Python package that we maintain. FYI, the package has been renamed to cape_dataframes and can be installed via pip install cape-dataframes going forward. The existing cape_privacy PyPI package won't receive new versions.
Describe the bug
I am not able to install anonympy with python 3.10
To Reproduce
Steps to reproduce the behavior:
pip install anonympy
I get
Collecting anonympy
Downloading anonympy-0.3.7.tar.gz (5.8 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 5.8/5.8 MB 6.0 MB/s eta 0:00:00
Preparing metadata (setup.py) ... error
error: subprocess-exited-with-error
× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [1 lines of output]
error in anonympy setup command: 'python_requires' must be a string containing valid version specifiers; Invalid specifier: '>=3.6*'
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed
× Encountered error while generating package metadata.
╰─> See above for output.
note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
Expected behavior
It should install without any error.
pip install anonympy
results in
error in anonympy setup command: 'python_requires' must be a string containing valid version specifiers; Invalid specifier: '>=3.6*'
This error only happens sometimes... I am using Python 3.10.10 and I can install it, then suddenly I cannot anymore.
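The error comes from the package's setup metadata: under PEP 440, a trailing `*` wildcard is only valid with the `==`/`!=` operators, so `'>=3.6*'` is rejected while `'>=3.6'` parses fine. This can be checked directly with the `packaging` library (the fix on the packaging side would presumably be declaring `python_requires='>=3.6'` in setup.py):

```python
# PEP 440: the wildcard suffix is only valid with == / !=, so
# '>=3.6*' is rejected while '>=3.6' parses fine.
from packaging.specifiers import SpecifierSet, InvalidSpecifier

SpecifierSet(">=3.6")          # valid -- what setup.py should declare

try:
    SpecifierSet(">=3.6*")     # the specifier anonympy 0.3.7 ships
except InvalidSpecifier as exc:
    print("rejected:", exc)
```

This also explains the intermittent failures: whether pip hits the error depends on whether it validates the metadata from a cached wheel or re-runs setup.py from the sdist.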