yelp / detect-secrets
An enterprise friendly way of detecting and preventing secrets in code.
License: Apache License 2.0
Scanning a Unicode file (e.g. detect-secrets scan poc.foo) results in
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe5' in position 76: ordinal not in range(128)
The last place in the stack trace is
https://github.com/Yelp/detect-secrets/pull/129/files was a previous fix for something similar, where from __future__ import unicode_literals fixed the issue easily. I vim'd into my site-packages, added the line to ini_file_parser.py, and it fixed it. So we just need to make a PR similar to #129.
It is one of the rare, specific kinds of secrets where we should check the length and format of the captured group, so that e.g. foo or bar are not captured with that keyword. This isn't possible for most keywords, since e.g. passwords can be anything.
The type in the secrets baseline just lists "type": "High Entropy String" for both the Base64HighEntropyString and HexHighEntropyString detectors. This makes determining which plugin is active quite difficult.
From #52, we're able to do:
$ detect-secrets scan --string '012345678a'
but what happens if the string contains two or more secrets? e.g.
$ detect-secrets scan --string '"0123456789a" and "0123456789b"'
Right now, we're only going to show the scanned results for the first secret. But you can imagine it's kinda weird UX to only show results for the first one (silently ignoring the second).
We're less concerned about private keys, if they are encrypted with a passphrase. An example format is:
-----BEGIN RSA PRIVATE KEY-----
Proc-Type: 4,ENCRYPTED
DEK-Info: AES-128-CBC,99AD1487680054D5E49D263D3E4CBFEB
We probably can use this heuristic to reduce flagged data.
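The heuristic above could be sketched roughly as follows. This is a minimal illustration, not the plugin's actual code: the function name is_encrypted_private_key is hypothetical, and it only looks for the Proc-Type header shown in the example format.

```python
# Hypothetical helper: decide whether a PEM block looks passphrase-protected.
# Assumption: only the traditional "Proc-Type: 4,ENCRYPTED" header is checked.
ENCRYPTED_HEADER = 'Proc-Type: 4,ENCRYPTED'


def is_encrypted_private_key(pem_text):
    """Return True if any line of the PEM text is the encrypted marker."""
    return any(
        line.strip() == ENCRYPTED_HEADER
        for line in pem_text.splitlines()
    )
```

A private key plugin could call such a helper after seeing a BEGIN line and skip (or downgrade) the finding when it returns True.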
Someone reported to us the following redacted stacktrace:
...detect_secrets/core/audit.py", line 347, in _highlight_secret
secret_line[index_of_secret + len(raw_secret):],
UnicodeEncodeError: 'ascii' codec can't encode character u'\xef' in position 51: ordinal not in range(128)
In #127 we added a line exclude for all plugins and reverted 15a6e6a, which was a line-exclude regex for just the high-entropy plugins. This was partially due to imagined use cases of things someone may want to exclude from all plugins (e.g. lines whose first non-whitespace character is a #), and partially for general cleanliness (DRY).
The keyword detector, however, is a young plugin with a fairly ambitious goal. Users shouldn't have to add things that may be specific to their codebases to the FALSE_POSITIVES dict, and version bumps shouldn't be needed to trim false positives.
This is a good-first-issue, b/c it was done very similarly in 15a6e6a
The user may be spamming s and Enter, so we will want a way to go backwards.
This tool is meant to be fast. Not blazingly fast, but fast nonetheless. Before we can optimize on speed though, we need to create some testing frameworks to accurately measure performance of the engine.
This way, we can make improvements, and note the speed differential for different regexes / features.
So that we do not have to repeat
if WHITELIST_REGEX.search(string):
    return output
in each plugin
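One way to avoid the repetition is to do the check once in a shared base class. The sketch below is only an illustration under assumptions: the class names, the analyze_string_content hook, and the exact WHITELIST_REGEX pattern are hypothetical and are not taken from the detect-secrets source.

```python
import re

# Assumption: pattern chosen for illustration only.
WHITELIST_REGEX = re.compile(r'#\s*pragma:\s*whitelist\s+secret')


class BasePlugin(object):
    """Hypothetical base class that performs the whitelist check once."""

    def analyze_string(self, string, line_num, filename):
        # Shared guard: whitelisted lines never reach the plugin logic.
        if WHITELIST_REGEX.search(string):
            return {}
        return self.analyze_string_content(string, line_num, filename)

    def analyze_string_content(self, string, line_num, filename):
        raise NotImplementedError


class ExampleKeywordPlugin(BasePlugin):
    """Toy plugin: flags any line that mentions 'password'."""

    def analyze_string_content(self, string, line_num, filename):
        if 'password' in string.lower():
            return {(filename, line_num): string}
        return {}
```

Each concrete plugin then only implements analyze_string_content and inherits the guard.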
If possible
While trying out detect-secrets for the first time today, I noticed the term 'secret_key' is not present in the keyword plugin (see https://github.com/Yelp/detect-secrets/blob/master/detect_secrets/plugins/keyword.py#L38). This means it doesn't find the very common Django SECRET_KEY variable. I was wondering if folks think 'secret_key' should be added to the keyword plugin's blacklist, but as a newcomer to this library I wasn't sure if that would cause consternation, since it would basically point out SECRET_KEY for any and all Django projects (if I understand the plugin correctly).
Just looking to start the conversation! Thanks!
The docs suggest using the --scan option, but it doesn't exist:
$ detect-secrets --scan > .secrets.baseline
usage: detect-secrets [-h] [--base64-limit BASE64_LIMIT]
[--hex-limit HEX_LIMIT] [-v] [--initialize [INITIALIZE]]
[--exclude EXCLUDE]
detect-secrets: error: unrecognized arguments: --scan
Where applicable plugins are currently the BasicAuth, Keyword and PrivateKey plugins
We output the same thing over and over again
e.g. for one repo, running pre-commit run detect-secrets --all-files outputs:
Detect secrets...........................................................Failed
hookid: detect-secrets
Files were modified by this hook. Additional output:
The baseline file was updated.
Probably to keep line numbers of secrets up-to-date.
Please `git add .secrets.baseline`, thank you.
The baseline file was updated.
Probably to keep line numbers of secrets up-to-date.
Please `git add .secrets.baseline`, thank you.
Your baseline file (.secrets.baseline) is unstaged.
`git add .secrets.baseline` to fix this.
Your baseline file (.secrets.baseline) is unstaged.
`git add .secrets.baseline` to fix this.
Your baseline file (.secrets.baseline) is unstaged.
`git add .secrets.baseline` to fix this.
Your baseline file (.secrets.baseline) is unstaged.
`git add .secrets.baseline` to fix this.
The baseline file was updated.
Probably to keep line numbers of secrets up-to-date.
Please `git add .secrets.baseline`, thank you.
The baseline file was updated.
Probably to keep line numbers of secrets up-to-date.
Please `git add .secrets.baseline`, thank you.
We should try to do this just once.
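A minimal sketch of printing each message once, assuming the hook collects its messages in a list before printing; the function name unique_messages is hypothetical.

```python
def unique_messages(messages):
    """Return messages with duplicates removed, keeping first-seen order."""
    seen = set()
    result = []
    for message in messages:
        if message not in seen:
            seen.add(message)
            result.append(message)
    return result
```

With the output above, this would reduce the report to one "baseline file was updated" notice and one "baseline file is unstaged" notice.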
$ cat environment.variables
PASSWORD=d1bc8d3ba4afc7e109612cb73acbdddac052c93025aa1f82942edabb7deb82a1
$ detect-secrets scan environment.variables
{
"exclude_regex": null,
"generated_at": "2018-12-21T20:02:03Z",
"plugins_used": [
{
"base64_limit": 4.5,
"name": "Base64HighEntropyString"
},
{
"name": "BasicAuthDetector"
},
{
"hex_limit": 3,
"name": "HexHighEntropyString"
},
{
"name": "PrivateKeyDetector"
}
],
"results": {},
"version": "0.11.0"
}
Environment variable files look like .ini files, but without a header. We should be able to use this fact to capture them.
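The idea of treating an environment file as a header-less .ini file could be sketched like this. It is a Python 3 illustration only, not the project's implementation; the function name parse_env_file and the section name env are hypothetical.

```python
import configparser


def parse_env_file(text):
    """Parse KEY=value lines by prepending a dummy section header."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep keys case-sensitive (default lowercases)
    parser.read_string('[env]\n' + text)
    return dict(parser['env'])
```

The resulting values could then be handed to the high-entropy checks in the same way as values from a real .ini file.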
How do I integrate this into my own small project?
Doing e.g. detect-secrets audit .secrets.baseline won't highlight the relevant secret, which makes it kind of frustrating to use.
I didn't dive into what is causing this, b/c I switched to a Python 3 venv, but we should fix it.
Currently, we use the baseline for two purposes:
For migratory purposes, we need some way to distinguish them, so that we can aggregate how many secrets are left to move to more secure storage. Therefore, the proposed way of displaying secrets in the baseline is:
{
    "type": "High Entropy String",
    "line_number": 19,
    "hashed_secret": "b8b693a3759e023b509093f4cacf0a3c973266fc",
    "is_secret": false
}
Then, we should make an --audit command line flag that prompts users through each of the secrets found, verifies with them whether it is indeed a secret, and makes the appropriate baseline change.
Also noticed a few bugs in that file:
The regex for high_entropy_strings is too restrictive in the sense that it requires a string to be defined as 'string here' or "string here".
This means that for file formats like .ini or .yaml, it fails to capture high entropy strings.
E.g.
[credentials]
admin_secret = superhighentropystringhere
This prevents unnecessary baseline modifications.
We json.dumps a plain dictionary, not an ordered dictionary, so passing sort_keys=True to the json.dumps call here should fix it.
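As a small illustration of the suggested fix, json.dumps with sort_keys=True produces the same text regardless of the dictionary's insertion order. The function name format_baseline and the indent value are assumptions made for this sketch.

```python
import json


def format_baseline(data):
    """Serialize a baseline dict with a stable key order."""
    return json.dumps(data, indent=2, sort_keys=True)
```

Because the key order is stable, regenerating the baseline would only produce a diff when the contents actually change.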
To name a few:
"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
frozenset('0123456789ABCDEFabcdef')
"abcdefghijklmnopqrstuvwxyz=/"
"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
'0123456789abcdef'
'0123456789ABCDEF'
"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
'0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ'
"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
'0123456789ABCDEFabcdef'
'abcdefghijklmnopqrstuvwxyz'
'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
should not trigger, as they are sequential. We could do secret in that_string, or maybe keep a set of sequential strings, but that's the naive first-thought solution.
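The naive "secret in that_string" idea could look like the sketch below. The SEQUENCES tuple covers only some of the strings listed above, and the name is_sequential_string is hypothetical.

```python
import string

# Assumption: a small, non-exhaustive set of reference sequences.
SEQUENCES = (
    string.ascii_uppercase + string.ascii_lowercase + string.digits + '+/',
    string.ascii_lowercase + string.ascii_uppercase + string.digits,
    string.digits + string.ascii_uppercase,
    string.digits + 'abcdef',
    string.digits + 'ABCDEFabcdef',
)


def is_sequential_string(secret):
    """Return True if the candidate is a substring of a known sequence."""
    return any(secret in sequence for sequence in SEQUENCES)
```

Note that any substring of a sequence matches, so very short candidates would always be treated as sequential; a minimum length check may be needed in practice.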
$ tox -e pre-commit -- run detect-secrets --all-files
[detect-secrets] Detect secrets..........................................Failed
hookid: detect-secrets
Files were modified by this hook. Additional output:
Your baseline file (.secrets.baseline) is unstaged.
`git add .secrets.baseline` to fix this.
Your baseline file (.secrets.baseline) is unstaged.
`git add .secrets.baseline` to fix this.
Your baseline file (.secrets.baseline) is unstaged.
`git add .secrets.baseline` to fix this.
Your baseline file (.secrets.baseline) is unstaged.
`git add .secrets.baseline` to fix this.
Your baseline file (.secrets.baseline) is unstaged.
`git add .secrets.baseline` to fix this.
Your baseline file (.secrets.baseline) is unstaged.
`git add .secrets.baseline` to fix this.
Your baseline file (.secrets.baseline) is unstaged.
`git add .secrets.baseline` to fix this.
Your baseline file (.secrets.baseline) is unstaged.
`git add .secrets.baseline` to fix this.
Your baseline file (.secrets.baseline) is unstaged.
`git add .secrets.baseline` to fix this.
Your baseline file (.secrets.baseline) is unstaged.
`git add .secrets.baseline` to fix this.
Your baseline file (.secrets.baseline) is unstaged.
`git add .secrets.baseline` to fix this.
Your baseline file (.secrets.baseline) is unstaged.
`git add .secrets.baseline` to fix this.
Your baseline file (.secrets.baseline) is unstaged.
`git add .secrets.baseline` to fix this.
Your baseline file (.secrets.baseline) is unstaged.
`git add .secrets.baseline` to fix this.
Your baseline file (.secrets.baseline) is unstaged.
`git add .secrets.baseline` to fix this.
Your baseline file (.secrets.baseline) is unstaged.
`git add .secrets.baseline` to fix this.
Your baseline file (.secrets.baseline) is unstaged.
`git add .secrets.baseline` to fix this.
Your baseline file (.secrets.baseline) is unstaged.
`git add .secrets.baseline` to fix this.
Your baseline file (.secrets.baseline) is unstaged.
`git add .secrets.baseline` to fix this.
Your baseline file (.secrets.baseline) is unstaged.
`git add .secrets.baseline` to fix this.
Your baseline file (.secrets.baseline) is unstaged.
`git add .secrets.baseline` to fix this.
Your baseline file (.secrets.baseline) is unstaged.
`git add .secrets.baseline` to fix this.
Your baseline file (.secrets.baseline) is unstaged.
`git add .secrets.baseline` to fix this.
Your baseline file (.secrets.baseline) is unstaged.
`git add .secrets.baseline` to fix this.
Your baseline file (.secrets.baseline) is unstaged.
`git add .secrets.baseline` to fix this.
Your baseline file (.secrets.baseline) is unstaged.
`git add .secrets.baseline` to fix this.
Your baseline file (.secrets.baseline) is unstaged.
`git add .secrets.baseline` to fix this.
Your baseline file (.secrets.baseline) is unstaged.
`git add .secrets.baseline` to fix this.
Your baseline file (.secrets.baseline) is unstaged.
`git add .secrets.baseline` to fix this.
Your baseline file (.secrets.baseline) is unstaged.
`git add .secrets.baseline` to fix this.
Your baseline file (.secrets.baseline) is unstaged.
`git add .secrets.baseline` to fix this.
Your baseline file (.secrets.baseline) is unstaged.
`git add .secrets.baseline` to fix this.
Your baseline file (.secrets.baseline) is unstaged.
`git add .secrets.baseline` to fix this.
Your baseline file (.secrets.baseline) is unstaged.
`git add .secrets.baseline` to fix this.
Your baseline file (.secrets.baseline) is unstaged.
`git add .secrets.baseline` to fix this.
Your baseline file (.secrets.baseline) is unstaged.
`git add .secrets.baseline` to fix this.
Your baseline file (.secrets.baseline) is unstaged.
`git add .secrets.baseline` to fix this.
Your baseline file (.secrets.baseline) is unstaged.
`git add .secrets.baseline` to fix this.
Your baseline file (.secrets.baseline) is unstaged.
`git add .secrets.baseline` to fix this.
Your baseline file (.secrets.baseline) is unstaged.
`git add .secrets.baseline` to fix this.
Your baseline file (.secrets.baseline) is unstaged.
`git add .secrets.baseline` to fix this.
Your baseline file (.secrets.baseline) is unstaged.
`git add .secrets.baseline` to fix this.
Your baseline file (.secrets.baseline) is unstaged.
`git add .secrets.baseline` to fix this.
Your baseline file (.secrets.baseline) is unstaged.
`git add .secrets.baseline` to fix this.
Your baseline file (.secrets.baseline) is unstaged.
`git add .secrets.baseline` to fix this.
Your baseline file (.secrets.baseline) is unstaged.
`git add .secrets.baseline` to fix this.
Your baseline file (.secrets.baseline) is unstaged.
`git add .secrets.baseline` to fix this.
Your baseline file (.secrets.baseline) is unstaged.
`git add .secrets.baseline` to fix this.
Your baseline file (.secrets.baseline) is unstaged.
`git add .secrets.baseline` to fix this.
Your baseline file (.secrets.baseline) is unstaged.
`git add .secrets.baseline` to fix this.
Your baseline file (.secrets.baseline) is unstaged.
`git add .secrets.baseline` to fix this.
Your baseline file (.secrets.baseline) is unstaged.
`git add .secrets.baseline` to fix this.
Your baseline file (.secrets.baseline) is unstaged.
`git add .secrets.baseline` to fix this.
Your baseline file (.secrets.baseline) is unstaged.
`git add .secrets.baseline` to fix this.
Your baseline file (.secrets.baseline) is unstaged.
`git add .secrets.baseline` to fix this.
Your baseline file (.secrets.baseline) is unstaged.
`git add .secrets.baseline` to fix this.
Your baseline file (.secrets.baseline) is unstaged.
`git add .secrets.baseline` to fix this.
Your baseline file (.secrets.baseline) is unstaged.
`git add .secrets.baseline` to fix this.
Your baseline file (.secrets.baseline) is unstaged.
`git add .secrets.baseline` to fix this.
Your baseline file (.secrets.baseline) is unstaged.
`git add .secrets.baseline` to fix this.
Your baseline file (.secrets.baseline) is unstaged.
`git add .secrets.baseline` to fix this.
Your baseline file (.secrets.baseline) is unstaged.
`git add .secrets.baseline` to fix this.
Your baseline file (.secrets.baseline) is unstaged.
`git add .secrets.baseline` to fix this.
Your baseline file (.secrets.baseline) is unstaged.
`git add .secrets.baseline` to fix this.
Your baseline file (.secrets.baseline) is unstaged.
`git add .secrets.baseline` to fix this.
Your baseline file (.secrets.baseline) is unstaged.
`git add .secrets.baseline` to fix this.
Your baseline file (.secrets.baseline) is unstaged.
`git add .secrets.baseline` to fix this.
Your baseline file (.secrets.baseline) is unstaged.
`git add .secrets.baseline` to fix this.
Your baseline file (.secrets.baseline) is unstaged.
`git add .secrets.baseline` to fix this.
Your baseline file (.secrets.baseline) is unstaged.
`git add .secrets.baseline` to fix this.
Your baseline file (.secrets.baseline) is unstaged.
`git add .secrets.baseline` to fix this.
Your baseline file (.secrets.baseline) is unstaged.
`git add .secrets.baseline` to fix this.
Your baseline file (.secrets.baseline) is unstaged.
`git add .secrets.baseline` to fix this.
Your baseline file (.secrets.baseline) is unstaged.
`git add .secrets.baseline` to fix this.
Your baseline file (.secrets.baseline) is unstaged.
`git add .secrets.baseline` to fix this.
Your baseline file (.secrets.baseline) is unstaged.
`git add .secrets.baseline` to fix this.
Your baseline file (.secrets.baseline) is unstaged.
`git add .secrets.baseline` to fix this.
Your baseline file (.secrets.baseline) is unstaged.
`git add .secrets.baseline` to fix this.
Your baseline file (.secrets.baseline) is unstaged.
`git add .secrets.baseline` to fix this.
Your baseline file (.secrets.baseline) is unstaged.
`git add .secrets.baseline` to fix this.
Your baseline file (.secrets.baseline) is unstaged.
`git add .secrets.baseline` to fix this.
Your baseline file (.secrets.baseline) is unstaged.
`git add .secrets.baseline` to fix this.
Your baseline file (.secrets.baseline) is unstaged.
`git add .secrets.baseline` to fix this.
Your baseline file (.secrets.baseline) is unstaged.
`git add .secrets.baseline` to fix this.
Your baseline file (.secrets.baseline) is unstaged.
`git add .secrets.baseline` to fix this.
Your baseline file (.secrets.baseline) is unstaged.
`git add .secrets.baseline` to fix this.
Your baseline file (.secrets.baseline) is unstaged.
`git add .secrets.baseline` to fix this.
Your baseline file (.secrets.baseline) is unstaged.
`git add .secrets.baseline` to fix this.
Your baseline file (.secrets.baseline) is unstaged.
`git add .secrets.baseline` to fix this.
Your baseline file (.secrets.baseline) is unstaged.
`git add .secrets.baseline` to fix this.
Your baseline file (.secrets.baseline) is unstaged.
`git add .secrets.baseline` to fix this.
Your baseline file (.secrets.baseline) is unstaged.
`git add .secrets.baseline` to fix this.
Your baseline file (.secrets.baseline) is unstaged.
`git add .secrets.baseline` to fix this.
Your baseline file (.secrets.baseline) is unstaged.
`git add .secrets.baseline` to fix this.
Your baseline file (.secrets.baseline) is unstaged.
`git add .secrets.baseline` to fix this.
Your baseline file (.secrets.baseline) is unstaged.
`git add .secrets.baseline` to fix this.
Your baseline file (.secrets.baseline) is unstaged.
`git add .secrets.baseline` to fix this.
Your baseline file (.secrets.baseline) is unstaged.
`git add .secrets.baseline` to fix this.
Your baseline file (.secrets.baseline) is unstaged.
`git add .secrets.baseline` to fix this.
Your baseline file (.secrets.baseline) is unstaged.
`git add .secrets.baseline` to fix this.
Your baseline file (.secrets.baseline) is unstaged.
`git add .secrets.baseline` to fix this.
Your baseline file (.secrets.baseline) is unstaged.
`git add .secrets.baseline` to fix this.
Your baseline file (.secrets.baseline) is unstaged.
`git add .secrets.baseline` to fix this.
Your baseline file (.secrets.baseline) is unstaged.
`git add .secrets.baseline` to fix this.
Your baseline file (.secrets.baseline) is unstaged.
`git add .secrets.baseline` to fix this.
Your baseline file (.secrets.baseline) is unstaged.
`git add .secrets.baseline` to fix this.
Your baseline file (.secrets.baseline) is unstaged.
`git add .secrets.baseline` to fix this.
Your baseline file (.secrets.baseline) is unstaged.
`git add .secrets.baseline` to fix this.
Your baseline file (.secrets.baseline) is unstaged.
`git add .secrets.baseline` to fix this.
Your baseline file (.secrets.baseline) is unstaged.
`git add .secrets.baseline` to fix this.
Your baseline file (.secrets.baseline) is unstaged.
`git add .secrets.baseline` to fix this.
ERROR: InvocationError: '/nail/home/louist/pg/puppet/.tox/pre-commit/bin/pre-commit run detect-secrets --verbose --all-files'
________________________________________________________________________________________________________________________________________________________ summary _________________________________________________________________________________________________________________________________________________________
ERROR: pre-commit: commands failed
diff --git a/.secrets.baseline b/.secrets.baseline
index 70a2e0c..37fd650 100644
--- a/.secrets.baseline
+++ b/.secrets.baseline
@@ -1,19 +1,7 @@
{
"exclude_regex": "^(\\.git|venv|vendor|secrets)",
- "generated_at": "2018-06-13T09:34:56Z",
- "plugins_used": [
- {
- "limit": 4.5,
- "name": "Base64HighEntropyString"
- },
- {
- "limit": 3,
- "name": "HexHighEntropyString"
- },
- {
- "name": "PrivateKeyDetector"
- }
- ],
+ "generated_at": "2018-06-18T09:46:46Z",
+ "plugins_used": [],
"results": {
"Puppetfile": [
{
In #132 a keyword exclude option was added, but we didn't write its value to the baseline. This is problematic because e.g. detect-secrets scan --update .secrets.baseline will re-scan and never use the keyword exclude.
I realized this before releasing 0.12.1 but thought it was pretty low priority, as the option mostly exists so that we do not have to bump detect-secrets and detect-secrets-server versions to make additions to the FALSE_POSITIVES dictionary in the keyword plugin, and can sync it upstream asynchronously.
If you have e.g.
self.thepassword = "thepassword"
and run the soon-to-be-merged keyword detector, then run the audit functionality, it will highlight the first occurrence instead of the second. This is a bug.
This is because in audit.py we find the index of the secret
detect-secrets/detect_secrets/core/audit.py
Line 561 in 1415b4b
secret_generator method of plugins.
This is somewhat related to the issue of handling multiple secrets on the same line.
In the private key plugin we alert if there is any beginning line of a private key in a file.
If all that is between a BEGIN string and the corresponding END string is e.g. some_private_key, then we shouldn't alert on it.
The code for this won't be as pretty as it is now.
Right now the private_key plugin returns a dictionary of the form {'filename': PotentialSecret}, whereas the high_entropy_strings plugin returns a dictionary of the form {PotentialSecret.__hash__(): PotentialSecret}. We should change these to be the same. I lean towards {PotentialSecret.__hash__(): PotentialSecret}, because I think the rest of the code assumes this. So e.g.
- output[filename] = PotentialSecret(
+ secret = PotentialSecret(
self.secret_type,
filename,
line_num,
string,
)
+ output[secret] = secret
Having a bunch of
INFO: Checking file: some_image.png
WARNING: some_image.png failed to load.
from
In detect-secrets-server we already have the IGNORED_FILE_EXTENSIONS tuple we made to skip files like this:
IGNORED_FILE_EXTENSIONS = (
'7z',
'bmp',
'bz2',
'dmg',
'exe',
'gif',
'gz',
'ico',
'jar',
'jpg',
'jpeg',
'png',
'rar',
'realm',
's7z',
'tar',
'tif',
'tiff',
'webp',
'zip',
)
Maybe we should move it to detect-secrets, change it to a dict, and use it.
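A rough sketch of how such a lookup could be used to skip binary files before scanning. The function name should_skip_file is hypothetical, and a frozenset is used here instead of the dict mentioned above purely as a choice for this illustration.

```python
import os

# Same extensions as the tuple above, stored in a frozenset for fast lookup.
IGNORED_FILE_EXTENSIONS = frozenset((
    '7z', 'bmp', 'bz2', 'dmg', 'exe', 'gif', 'gz', 'ico', 'jar', 'jpg',
    'jpeg', 'png', 'rar', 'realm', 's7z', 'tar', 'tif', 'tiff', 'webp', 'zip',
))


def should_skip_file(filename):
    """Return True if the file's final extension is in the ignore list."""
    extension = os.path.splitext(filename)[1].lstrip('.').lower()
    return extension in IGNORED_FILE_EXTENSIONS
```

Checking this before opening a file would avoid the "failed to load" warnings for images and archives.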
So only in, say, .json files you want to exclude checksum, but in no other file type. Or something like that.
Or no answer = in .tf files.
Currently, baselines have no notion of which version of detect-secrets created them. This makes things slightly annoying, because a major version bump could invalidate old baselines, requiring the user to recreate the baseline to be compliant once again.
At the very least, baselines should record which detect-secrets version created them, so we can be aware when this happens.
E.g. #26
$ detect-secrets scan test_data/short_files/first_line.py > .secrets.baseline
$ echo "delete the secret from test_data/short_files/first_line.py"
$ PYTHONPATH=`pwd` detect_secrets/pre_commit_hook.py --baseline .secrets.baseline test_data/short_files/first_line.py
$ git diff .secrets.baseline
Certain API keys use hyphens.
e.g. blahblah-aaaa-bbbb-cccc-ddddddd
This currently is not caught by the suite of HighEntropyStringPlugins.
Repro: detect-secrets --scan . -vvv
Here lies the relevant stack trace:
Traceback (most recent call last):
File "/hey/three_six/bin/detect-secrets", line 11, in <module>
sys.exit(main())
File "/hey/three_six/lib/python3.6/site-packages/detect_secrets/main.py", line 27, in main
log.set_debug_level(args.verbose)
File "/hey/three_six/lib/python3.6/site-packages/detect_secrets/core/log.py", line 46, in _set_debug_level
self.setLevel(mapping[debug_level])
KeyError: 3
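The KeyError comes from looking up a verbosity count that has no entry in the mapping. One possible fix is to clamp the count to the highest known level, as in the sketch below. The mapping values and the name level_for_verbosity are assumptions for illustration; the real mapping lives in detect_secrets/core/log.py.

```python
import logging

# Assumption: illustrative mapping from number of -v flags to log level.
VERBOSITY_TO_LEVEL = {
    0: logging.ERROR,
    1: logging.INFO,
    2: logging.DEBUG,
}


def level_for_verbosity(count):
    """Clamp the -v count into the supported range before the lookup."""
    capped = min(max(count, 0), max(VERBOSITY_TO_LEVEL))
    return VERBOSITY_TO_LEVEL[capped]
```

With this, -vvv (and anything higher) behaves like the most verbose supported level instead of crashing.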
From https://github.com/Yelp/detect-secrets/pull/105/files#diff-557d95bdd433460fd987ace2659caee6, right now we only mention # pragma: whitelist secret in the README.
Hello,
I've tested the new version and I get this error :
Traceback (most recent call last):
File "/home/.../detect-secrets-allprojects/venv/bin/detect-secrets", line 11, in
sys.exit(main())
File "/home/.../detect-secrets-allprojects/venv/local/lib/python3.5/site-packages/detect_secrets/main.py", line 43, in main
_perform_scan(args, plugins),
File "/home/.../detect-secrets-allprojects/venv/local/lib/python3.5/site-packages/detect_secrets/main.py", line 118, in _perform_scan
args.all_files,
File "/home/.../detect-secrets-allprojects/venv/local/lib/python3.5/site-packages/detect_secrets/core/baseline.py", line 52, in initialize
output.scan_file(file)
File "/home/.../detect-secrets-allprojects/venv/local/lib/python3.5/site-packages/detect_secrets/core/secrets_collection.py", line 185, in scan_file
self._extract_secrets_from_file(f, filename_key)
File "/home/.../detect-secrets-allprojects/venv/local/lib/python3.5/site-packages/detect_secrets/core/secrets_collection.py", line 282, in _extract_secrets_from_file
results.update(plugin.analyze(f, filename))
File "/home/.../detect-secrets-allprojects/venv/local/lib/python3.5/site-packages/detect_secrets/plugins/base.py", line 32, in analyze
secrets = self.analyze_string(line, line_num, filename)
File "/home/.../detect-secrets-allprojects/venv/local/lib/python3.5/site-packages/detect_secrets/plugins/keyword.py", line 136, in analyze_string
filetype=determine_file_type(filename),
File "/home/.../detect-secrets-allprojects/venv/local/lib/python3.5/site-packages/detect_secrets/plugins/keyword.py", line 157, in secret_generator
match = REGEX.search(lowered_string)
Do you have any idea?
Best regards,
Tioborto
Hello,
I'm trying to scan using detect-secrets --all-files and I noticed that the tool does not detect multiple instances of the same secret in a single file. It will simply flag the first instance found. The comment located here makes me believe that this is by design. Is it possible to change this behavior or is the tool not designed for this?
While testing what I uploaded to Test PyPI, I did detect-secrets --help (on Python 2), and got:
ImportError: No module named configparser
(high_entropy_strings.py)
flake8, in requirements-dev.txt, requires configparser, so we never ran into this during make test. Pretty sneaky!
We should add a "make test w/ setup.py dependencies only" option to the Makefile.
But for now, python-future suggests doing from configparser import ConfigParser, so that is what I will do.
Hi,
I run truffleHog. I recognize other projects have spun out doing similar things, so a little while ago I broke the regexes into their own package:
https://github.com/dxa4481/truffleHogRegexes
The thinking is I'd ideally like to get the whole community contributing regexes to one place, even if the underlying engine and technology is different.
It's on pypi, feel free to include the regex library.
Regex in question
https://github.com/Yelp/detect-secrets/blob/master/detect_secrets/plugins/basic_auth.py#L10
https://www.loggly.com/blog/five-invaluable-techniques-to-improve-regex-performance/
Maybe we can run truffle hog regex, instead
Perhaps in baseline.py we can do something similar to
+ if os.path.isdir(
+ os.path.join(rootdir, '.git')
+ ):
+ # This only works when you run it on the root directory of another repository
+ git_ls_files_args = [
+ 'git',
+ '--git-dir', os.path.join(rootdir, '.git'),
+ 'ls-files',
+ ]
+ else:
+ # This only works when you run it on a folder or file in the current repository
+ git_ls_files_args = [
+ 'git',
+ 'ls-files',
+ rootdir,
+ ]
with open(os.devnull, 'w') as fnull:
git_files = subprocess.check_output(
- [
- 'git',
- 'ls-files',
- rootdir,
- ],
+ git_ls_files_args,
stderr=fnull,
)
However, I tested this, and although the output from the command was good, the outputted baseline was not.
Also note that the above only works if the directory outside the current repository is the root directory for that repository, not if it's some/other/repo/folder_inside_that_repo.
I can imagine we could loop through all parents and see if one of them has a .git directory, but that feels really dirty. We could maybe try something similar to
subprocess.check_output(
('git', 'remote', 'get-url', 'origin'),
cwd=os.path.dirname(filename)
).decode('utf8').strip()
detect-secrets doesn't complain about:
whitelisted_api_key: "DEADBEEF1234" # pragma: whitelist secret
but complains about:
whitelisted_api_key: "DEADBEEF1234" # pragma: whitelist secret blah
If it's a comment, it should support text that follows it.
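A pattern that tolerates trailing text could look like the sketch below. This is an assumption about one possible fix, not the regex currently used by detect-secrets; the names WHITELIST_REGEX and is_whitelisted are illustrative.

```python
import re

# Assumption: no end-of-line anchor, so text after the pragma is allowed.
# The trailing \b keeps "secret" from matching a longer word such as "secrets".
WHITELIST_REGEX = re.compile(r'#\s*pragma:\s*whitelist\s+secret\b')


def is_whitelisted(line):
    """Return True if the line carries the whitelist pragma comment."""
    return WHITELIST_REGEX.search(line) is not None
```

Both example lines above would then be treated as whitelisted.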
When scanning a non-ini file with more than 1 million lines, it hangs at the line below.
(self._analyze_ini_file(add_header=True), configparser.Error,),
I traced this back to configparser and found that the following line is extremely inefficient: it adds every offending line (essentially all the lines in the file) to the error message using string concatenation.
self.message += '\n\t[line %2d]: %s' % (lineno, line)
I did not have the patience to wait for the scan to finish; on my laptop it hung for more than 10 minutes.
We need a more efficient way to scan large non-ini files.
Some of the plugins, in particular the entropy-based and keyword plugins, can generate a relatively high number of false positives. When some of our teams use detect-secrets, they choose to exclude certain plugins (with or without also excluding some files). Currently, if you run a scan with a --no-xxx-scan option, the list of plugins used is persisted in the baseline file.
If a developer or automation system picks up the repo, has no pre-commit hook set up, and is unaware of the exclude list, they could run detect-secrets --update baseline, and the baseline file would then be regenerated with all plugins used.
Would the community entertain the idea that the detect-secrets --update baseline scan uses the plugin list from the baseline instead of all plugins (the default setting)? Some additional options could be added if you want to scan the repo with more plugins than the ones in the baseline.
We have something implemented in our fork (offline in our GHE), and we'd like to hear some feedback on the problem before submitting a big PR.
--no-line-numbers (in baseline), --no-generated-at (in baseline), and "Make pre-commit hook only look at the git diff" options are all possible ideas.
hookid: detect-secrets
The supplied baseline may be incompatible with the current
version of detect-secrets. Please recreate your baseline to
avoid potential mis-configurations.
Current Version: 0.10.3
Baseline Version: 0.9.1
detect-secrets should heal the repository when this happens rather than requiring manual intervention.
When you audit a baseline, if a file has been removed, it crashes the audit command and you lose all your progress.
Progress should be saved after each keystroke, and system calls should be protected by a try/except block.
When using --import <baseline_filename>, a baseline is created without knowledge of the existing baseline file. This means that the current baseline file will be scanned for secrets (which it will clearly find, due to the secret hashes stored in there).
When we upgrade baselines, we currently need to perform a two-liner (without sponge):
$ detect-secrets scan --import .secrets.baseline > .secrets.baseline.new
$ mv .secrets.baseline.new .secrets.baseline
If we already know the filename we're importing from (as compared to reading from stdin), we should also write to it.
$ detect-secrets scan --upgrade .secrets.baseline
This will write results to the provided file, and ignore the false positives in the current baseline file.
One that the scanner server will use instead of the regular exclude regex, if present.
Hello,
When I run a scan on a repo, KeywordDetector is not used.
If this is expected, why is it disabled?
{
"exclude_regex": null,
"generated_at": "2018-11-26T09:42:29Z",
"plugins_used": [
{
"base64_limit": 4.5,
"name": "Base64HighEntropyString"
},
{
"name": "BasicAuthDetector"
},
{
"hex_limit": 3,
"name": "HexHighEntropyString"
},
{
"name": "PrivateKeyDetector"
}
],
"results": {},
"version": "0.10.5"
}
Thank you