yelp / detect-secrets

An enterprise friendly way of detecting and preventing secrets in code.

License: Apache License 2.0

Makefile 0.07% Python 99.38% PHP 0.03% Shell 0.53%

detect-secrets's Issues

Scan throws a UnicodeEncodeError when ini file has unicode on RHS

Scanning a Unicode file (e.g. detect-secrets scan poc.foo) results in

UnicodeEncodeError: 'ascii' codec can't encode character u'\xe5' in position 76: ordinal not in range(128)

The last place in the stack trace is

detect-secrets/detect_secrets/plugins/common/ini_file_parser.py

Line 103 in 3ab11c6

re.escape(values_list[current_value_list_index]),

https://github.com/Yelp/detect-secrets/pull/129/files was a previous fix for something similar, where adding from __future__ import unicode_literals fixed the issue easily. I opened ini_file_parser.py in my site-packages with vim, added that line, and the error went away. So we just need to make a PR similar to #129.

Improve accuracy of aws_secret_access_key in keyword detector

This is one of the rare kinds of secrets that is specific enough for us to validate the length and character set of the captured group, so that values such as foo or bar are not captured with that keyword. This isn't possible for most keywords, since e.g. passwords can be anything.
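
A minimal sketch of such a check, assuming AWS secret access keys are 40 characters drawn from the base64 alphabet. The helper name looks_like_aws_secret is hypothetical and not part of the keyword plugin.

```python
import re

# Assumption: an AWS secret access key is exactly 40 characters from the
# base64 alphabet. Both names below are illustrative only.
AWS_SECRET_VALUE = re.compile(r'[A-Za-z0-9/+=]{40}')


def looks_like_aws_secret(value):
    """Return True if value has the length and alphabet of an AWS secret key."""
    return AWS_SECRET_VALUE.fullmatch(value) is not None
```

With this in place, a value such as foo captured next to aws_secret_access_key would be rejected before it is reported.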

type in baseline file does not indicate which plugin is used

The type in the secrets baseline just lists "type": "High Entropy String" for both the Base64HighEntropyString and HexHighEntropyString detectors. This makes determining which plugin is active quite difficult.

Detect same secret multiple times in the same line or file

From #52, we're able to do:

$ detect-secrets scan --string '012345678a'

but what happens if the string contains two or more secrets? e.g.

$ detect-secrets scan --string '"0123456789a" and "0123456789b"'

Right now, we're only going to show the scanned results for the first secret. But you can imagine it's kinda weird UX to only show results for the first one (silently ignoring the second).
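
One way to surface every candidate rather than only the first is to iterate over all matches. This is a sketch only: the QUOTED pattern and the find_all_candidates helper are assumptions for illustration, not the regex the plugins use.

```python
import re

# Hypothetical pattern: a single- or double-quoted run of 10+ characters.
QUOTED = re.compile(r'''(["'])([^"']{10,})\1''')


def find_all_candidates(line):
    """Return (start_index, text) for every quoted candidate on the line."""
    return [(m.start(2), m.group(2)) for m in QUOTED.finditer(line)]
```

Returning the start index alongside the text also gives the caller enough information to report each occurrence separately.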

Reduce false positives for private keys

We're less concerned about private keys, if they are encrypted with a passphrase. An example format is:

We probably can use this heuristic to reduce flagged data.

[audit functionality] Handle Unicode better

Someone reported to us the following redacted stacktrace:

...detect_secrets/core/audit.py", line 347, in _highlight_secret
    secret_line[index_of_secret + len(raw_secret):],
UnicodeEncodeError: 'ascii' codec can't encode character u'\xef' in position 51: ordinal not in range(128)

Line exclude regex just for keyword detector

In #127 we added a line exclude for all plugins and reverted 15a6e6a, which was a line exclude regex for just the high-entropy plugins. This was partially due to imagined use-cases of things someone may want to exclude from all plugins (e.g. lines whose first non-whitespace character is a #), and partially for general cleanliness (DRY).

The keyword detector, however, is a young plugin with a fairly ambitious goal. Users shouldn't have to add things specific to their codebases to the FALSE_POSITIVES dict, and we shouldn't need version bumps just to trim false positives.

This is a good-first-issue, b/c it was done very similarly in 15a6e6a

Add a (b)ack option to 'Is this a valid secret?'

User may be spamming s Enter, and so we will want to go backwards.

Create performance testing benchmarks

This tool is meant to be fast. Not blazingly fast, but fast nonetheless. Before we can optimize on speed though, we need to create some testing frameworks to accurately measure performance of the engine.

This way, we can make improvements, and note the speed differential for different regexes / features.
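
As a starting point, the standard library's timeit module can time a single regex against a sample line. This is only a sketch; time_regex is a hypothetical helper, and a real benchmark would run the whole engine over a fixed corpus.

```python
import re
import timeit


def time_regex(pattern, line, number=1000):
    """Return the seconds taken to search `line` with `pattern` `number` times."""
    compiled = re.compile(pattern)
    return timeit.timeit(lambda: compiled.search(line), number=number)
```

Comparing time_regex for two candidate patterns on the same input gives a rough speed differential, although results vary from run to run.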

Make excluding whitelisted lines part of the base plugin

So that we do not have to repeat

if WHITELIST_REGEX.search(string):
            return output

in each plugin

If possible
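
A sketch of how the check could move into the base class. The class layout, the analyze_string_content hook and the WHITELIST_REGEX pattern shown here are assumptions for illustration.

```python
import re
from abc import ABC, abstractmethod

# Assumed pattern; the project's actual WHITELIST_REGEX may differ.
WHITELIST_REGEX = re.compile(r'# ?pragma: ?whitelist[ -]secret')


class BasePlugin(ABC):
    def analyze_string(self, string, line_num, filename):
        # The whitelist check lives here once, instead of in every plugin.
        if WHITELIST_REGEX.search(string):
            return {}
        return self.analyze_string_content(string, line_num, filename)

    @abstractmethod
    def analyze_string_content(self, string, line_num, filename):
        """Plugin-specific detection; only called for non-whitelisted lines."""


class EverythingPlugin(BasePlugin):
    """Toy plugin that flags every line, to show the hook in use."""

    def analyze_string_content(self, string, line_num, filename):
        return {(filename, line_num): string}
```

Each concrete plugin then implements only analyze_string_content and inherits the whitelist behaviour.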

Should 'secret_key' be added to the keyword plugin?

While trying out detect-secrets for the first time today, I noticed the term 'secret_key' is not present in the keyword plugin (see https://github.com/Yelp/detect-secrets/blob/master/detect_secrets/plugins/keyword.py#L38). This means it doesn't find the very common Django SECRET_KEY variable. I was wondering if folks think 'secret_key' should be added to the keyword plugin's blacklist...but as a newcomer to this library I wasn't sure if that would cause consternation, since it would basically point out SECRET_KEY for any and all Django projects (if I understand the plugin correctly).

Just looking to start the conversation! Thanks!

--scan option is just completely missing

The docs suggest using the --scan option, but it doesn't exist:

$ detect-secrets --scan > .secrets.baseline
usage: detect-secrets [-h] [--base64-limit BASE64_LIMIT]
                      [--hex-limit HEX_LIMIT] [-v] [--initialize [INITIALIZE]]
                      [--exclude EXCLUDE]
detect-secrets: error: unrecognized arguments: --scan

Refactor applicable plugins to all inherit from RegexBasedDetector

Where applicable plugins are currently the BasicAuth, Keyword and PrivateKey plugins

Output "The baseline file was updated." dialogue once

We output the same thing over and over again

e.g. for one repo, running pre-commit run detect-secrets --all-files outputs:

Detect secrets...........................................................Failed
hookid: detect-secrets

Files were modified by this hook. Additional output:

The baseline file was updated.
Probably to keep line numbers of secrets up-to-date.
Please `git add .secrets.baseline`, thank you.


The baseline file was updated.
Probably to keep line numbers of secrets up-to-date.
Please `git add .secrets.baseline`, thank you.


Your baseline file (.secrets.baseline) is unstaged.
`git add .secrets.baseline` to fix this.
Your baseline file (.secrets.baseline) is unstaged.
`git add .secrets.baseline` to fix this.
Your baseline file (.secrets.baseline) is unstaged.
`git add .secrets.baseline` to fix this.
Your baseline file (.secrets.baseline) is unstaged.
`git add .secrets.baseline` to fix this.
The baseline file was updated.
Probably to keep line numbers of secrets up-to-date.
Please `git add .secrets.baseline`, thank you.


The baseline file was updated.
Probably to keep line numbers of secrets up-to-date.
Please `git add .secrets.baseline`, thank you.

We should try to do this just once.

ini-like file not parsing

$ cat environment.variables
PASSWORD=d1bc8d3ba4afc7e109612cb73acbdddac052c93025aa1f82942edabb7deb82a1
$ detect-secrets scan environment.variables
{
  "exclude_regex": null,
  "generated_at": "2018-12-21T20:02:03Z",
  "plugins_used": [
    {
      "base64_limit": 4.5,
      "name": "Base64HighEntropyString"
    },
    {
      "name": "BasicAuthDetector"
    },
    {
      "hex_limit": 3,
      "name": "HexHighEntropyString"
    },
    {
      "name": "PrivateKeyDetector"
    }
  ],
  "results": {},
  "version": "0.11.0"
}

Environment variables look like .ini files, but without a header. We should be able to capture this using this fact.
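
A minimal sketch of that idea using the standard library's configparser: prepend a dummy section header before parsing. The function name and the section name are made up for illustration.

```python
import configparser


def parse_headerless_ini(text, section='global'):
    """Parse KEY=VALUE lines by prepending a dummy section header."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep the original case of keys
    parser.read_string('[{}]\n{}'.format(section, text))
    return dict(parser[section])
```

The values returned this way could then be fed to the existing detectors as if they had come from a regular .ini file.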

Password detection framework

How do I integrate it into my own small project?

Audit doesn't highlight on python2

Doing e.g. detect-secrets audit .secrets.baseline won't highlight the relevant secret, which makes it kind of frustrating to use.

I didn't dive into what is causing this, b/c I switched to a python 3 venv, but we should fix it.

Add an --audit functionality to audit created baseline

Currently, we use the baseline for two purposes:

Acknowledgement of current true positive secrets in the code base, AND
Whitelisted false positives.

For migratory purposes, we need some way to distinguish them, so that we can count how many secrets are left to move to more secure storage. Therefore, the proposed way of displaying secrets in the baseline is:

{
    "type": "High Entropy String",
    "line_number": 19,
    "hashed_secret": "b8b693a3759e023b509093f4cacf0a3c973266fc",
    "is_secret": false
}

Then, we should make an --audit command line flag that prompts users through each of their secrets found, verifies with them whether it is indeed a secret, and makes the appropriate baseline change.
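
The prompt loop could look roughly like the sketch below. The audit function, its ask parameter and the y/n/s choices are assumptions for illustration; only the is_secret field comes from the proposal above.

```python
def audit(secrets, ask=input):
    """Prompt for each secret and record the answer in its `is_secret` field.

    `ask` is injectable so the loop can be driven without a terminal.
    """
    for secret in secrets:
        answer = ''
        while answer not in ('y', 'n', 's'):
            answer = ask('Is this a valid secret? (y)es, (n)o, (s)kip: ')
            answer = answer.strip().lower()
        if answer != 's':
            secret['is_secret'] = (answer == 'y')
    return secrets
```

Skipped entries keep no is_secret key, so unaudited secrets stay distinguishable from audited ones.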

private_key.py could be considered a derivative work without proper license attribution

detect-secrets/detect_secrets/plugins/private_key.py

Line 20 in 86e96ee

This is based off https://github.com/pre-commit/pre-commit-hooks.

https://github.com/pre-commit/pre-commit-hooks/blob/a193eab99ed99429b9b8e517be68bed9a7f6ec4f/LICENSE#L10-L11

Also noticed a few bugs in that file:

high_entropy_strings plugin fails to capture non-string-like secrets

The regex for high_entropy_strings is too restrictive in the sense that it requires a string to be defined as 'string here' or "string here".

This means for file formats like .ini or .yaml, it fails to capture high entropy strings.

Eg.

[credentials]
admin_secret = superhighentropystringhere
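
To illustrate, the sketch below pulls the unquoted right-hand side out of a key = value line and computes a plain Shannon entropy over it. Both helpers are hypothetical, and the plugin's own entropy calculation may differ in its details.

```python
import math
from collections import Counter


def shannon_entropy(value):
    """Shannon entropy of `value`, in bits per character."""
    if not value:
        return 0.0
    total = len(value)
    counts = Counter(value)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def unquoted_value(line):
    """Return the text after the first '=', or None if there is no '='."""
    _, sep, value = line.partition('=')
    return value.strip() if sep else None
```

Scoring unquoted_value(line) instead of only quoted strings would let .ini and .yaml style assignments reach the entropy check.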

Prevent detect-secrets from reordering the dicts in baseline

This prevents unnecessary baseline modifications.

So we json.dumps a dictionary, not an ordered dictionary, so doing sort_keys=True in the json.dumps here should fix it.
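
A small sketch of the effect of sort_keys=True; the dump_baseline helper and the indent value are illustrative only.

```python
import json


def dump_baseline(baseline):
    """Serialise the baseline with a stable key order."""
    return json.dumps(baseline, indent=2, sort_keys=True)
```

Two dictionaries with the same contents but different insertion order then serialise to identical text, so the baseline file does not change spuriously.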

Fix sequential false-positives

To name a few:
"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
frozenset('0123456789ABCDEFabcdef')
"abcdefghijklmnopqrstuvwxyz=/"
"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
'0123456789abcdef'
'0123456789ABCDEF'
"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
'0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ'
"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
'0123456789ABCDEFabcdef'
'abcdefghijklmnopqrstuvwxyz'
'ABCDEFGHIJKLMNOPQRSTUVWXYZ'

These should not trigger, as they are sequential. We could check secret in that_string against a set of known sequential strings, but that's the naive first-thought solution.
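
The naive approach can be sketched as a substring check against a few known sequences. The SEQUENCES tuple below is an assumed, incomplete list built from the standard string module; is_sequential is a hypothetical helper.

```python
import string

# Assumed set of reference sequences; extend as new false positives appear.
SEQUENCES = (
    string.ascii_uppercase + string.ascii_lowercase + string.digits + '+/',
    string.ascii_lowercase + string.ascii_uppercase + string.digits,
    string.digits + string.ascii_uppercase,
    string.digits + string.ascii_lowercase,
    '0123456789ABCDEFabcdef',
)


def is_sequential(secret):
    """Return True if the secret is a contiguous slice of a known sequence."""
    return bool(secret) and any(secret in sequence for sequence in SEQUENCES)
```

This does not cover every string listed above (for example the one ending in =/), which is part of why it is only a first-thought solution.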

pre-commit hook removes plugins from baseline (on version 0.8.8)

$ tox -e pre-commit -- run detect-secrets --all-files
[detect-secrets] Detect secrets..........................................Failed
hookid: detect-secrets

Files were modified by this hook. Additional output:

Your baseline file (.secrets.baseline) is unstaged.
`git add .secrets.baseline` to fix this.
[... the same two lines repeat 110 more times ...]

ERROR: InvocationError: '/nail/home/louist/pg/puppet/.tox/pre-commit/bin/pre-commit run detect-secrets --verbose --all-files'
________________________________________________________________________________________________________________________________________________________ summary _________________________________________________________________________________________________________________________________________________________
ERROR:   pre-commit: commands failed

diff --git a/.secrets.baseline b/.secrets.baseline
index 70a2e0c..37fd650 100644
--- a/.secrets.baseline
+++ b/.secrets.baseline
@@ -1,19 +1,7 @@
 {
   "exclude_regex": "^(\\.git|venv|vendor|secrets)",
-  "generated_at": "2018-06-13T09:34:56Z",
-  "plugins_used": [
-    {
-      "limit": 4.5,
-      "name": "Base64HighEntropyString"
-    },
-    {
-      "limit": 3,
-      "name": "HexHighEntropyString"
-    },
-    {
-      "name": "PrivateKeyDetector"
-    }
-  ],
+  "generated_at": "2018-06-18T09:46:46Z",
+  "plugins_used": [],
   "results": {
     "Puppetfile": [
       {

Add keyword exclude to baseline

In #132 a keyword exclude option was added, but we didn't write what it was to the baseline. This is problematic because e.g. detect-secrets scan --update .secrets.baseline will re-scan and never use the keyword exclude.

I realized this before releasing 0.12.1 but thought it was pretty low priority, as it is mostly used to enable us to not have to bump detect-secrets and detect-secrets-server versions to make additions to the FALSE_POSITIVES dictionary in the keyword plugin, and sync it upstream asynchronously.

[audit functionality] Return the correct occurrence of the secret text in a line

If you have e.g.

self.thepassword = "thepassword"

and run the soon-to-be-merged keyword detector, then run the audit functionality, it will highlight the first occurrence instead of the second. This is a bug.

This is because in audit.py we find the index of the secret

detect-secrets/detect_secrets/core/audit.py

Line 561 in 1415b4b

index_of_secret = secret_line.lower().index(raw_secret.lower())

and we don't return the index of the secret in the secret_generator method of plugins.

This is somewhat related to the issue of handling multiple secrets on the same line:

detect-secrets/detect_secrets/plugins/base.py

Line 76 in 1415b4b

# TODO: Handle multiple secrets on single line.
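
A sketch of what returning the position from the plugin could look like: the regex captures the secret in a group, and the group's span is handed to the audit code. The KEYWORD pattern and find_secret_span are assumptions for illustration, not the keyword detector's real regex.

```python
import re

# Hypothetical keyword pattern with the secret in capture group 1.
KEYWORD = re.compile(r'password\s*=\s*"([^"]+)"', re.IGNORECASE)


def find_secret_span(line):
    """Return (start, end) of the captured secret, or None if nothing matches."""
    match = KEYWORD.search(line)
    if match is None:
        return None
    return match.span(1)
```

For the line above, the span points at the quoted value rather than at the attribute name, which is where a plain str.index lookup lands.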

Filter out false-positives in private key detector

In the private key plugin we alert if there is any beginning line of a private key in a file.

If all that is between a BEGIN string and the corresponding END string is, e.g., some_private_key, then we shouldn't alert on it.

The code for this won't be as pretty as it is now.
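
One possible shape for that check, as a sketch only: require the text between the markers to be reasonably long and base64-like. The patterns, the min_length threshold and the function name are all assumptions, and keys that carry extra header lines inside the block would need separate handling.

```python
import re

# Assumed marker format: "-----BEGIN <TYPE >PRIVATE KEY-----" ... matching END.
BLOCK = re.compile(
    r'-----BEGIN ([A-Z ]*)PRIVATE KEY-----(.*?)-----END \1PRIVATE KEY-----',
    re.DOTALL,
)
BASE64_BODY = re.compile(r'[A-Za-z0-9+/=\s]+')


def has_plausible_key(text, min_length=64):
    """Return True if any BEGIN/END block contains a base64-like body."""
    for match in BLOCK.finditer(text):
        body = match.group(2).strip()
        if len(body) >= min_length and BASE64_BODY.fullmatch(body):
            return True
    return False
```

A placeholder such as some_private_key fails both the length and the alphabet test, so it would no longer raise an alert.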

Inconsistent analyze_string return values

Right now the private_key plugin returns a dictionary of the form {'filename': PotentialSecret}, whereas the high_entropy_strings plugin returns a dictionary of the form {PotentialSecret.__hash__(): PotentialSecret}. We should make these consistent. I lean towards {PotentialSecret.__hash__(): PotentialSecret}, because I think the rest of the code assumes this. So e.g.

-            output[filename] = PotentialSecret(
+            secret = PotentialSecret(
                 self.secret_type,
                 filename,
                 line_num,
                 string,
             )
+            output[secret] = secret

Handle un-scannable files more gracefully

Having a bunch of
INFO: Checking file: some_image.png
WARNING: some_image.png failed to load.
from

detect-secrets/detect_secrets/core/secrets_collection.py

Line 310 in 9f3d9ee

log.warning("%s failed to load.", filename)

is not ideal. We know we cannot scan certain files e.g. images, so we should behave more gracefully.

In detect-secrets-server we already have the IGNORED_FILE_EXTENSIONS tuple we made to skip files like this

IGNORED_FILE_EXTENSIONS = (
    '7z',
    'bmp',
    'bz2',
    'dmg',
    'exe',
    'gif',
    'gz',
    'ico',
    'jar',
    'jpg',
    'jpeg',
    'png',
    'rar',
    'realm',
    's7z',
    'tar',
    'tif',
    'tiff',
    'webp',
    'zip',
)

maybe we should move it to detect-secrets, change it to a dict, and use it.
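
A sketch of the lookup, using a frozenset in place of the dict suggested above since only membership is needed. The list is abbreviated and should_skip is a hypothetical helper.

```python
import os

# Abbreviated copy of the tuple above, stored as a set for fast membership tests.
IGNORED_FILE_EXTENSIONS = frozenset((
    '7z', 'bmp', 'bz2', 'gif', 'gz', 'jpg', 'jpeg', 'png', 'tar', 'zip',
))


def should_skip(filename):
    """Return True if the file's extension marks it as un-scannable."""
    _, extension = os.path.splitext(filename)
    return extension[1:].lower() in IGNORED_FILE_EXTENSIONS
```

Checking should_skip(filename) before opening the file would avoid the "failed to load" warning for known binary formats.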

File extension specific exclusions

So only in say, .json files you want to exclude checksum, but no other file type. Or something like that.

Or no answer = in .tf files

baselines break with version bumps

Currently, baselines have no notion of which version of detect-secrets created them. This is slightly annoying, because a major version bump can invalidate old baselines, requiring the user to recreate the baseline to be compliant once again.

At the very least, we should have baselines know which detect-secrets version created it, so we can be aware when this happens.

E.g. #26

pre-commit hook removes audited secrets

Steps to Reproduce

$ detect-secrets scan test_data/short_files/first_line.py > .secrets.baseline
$ echo "delete the secret from test_data/short_files/first_line.py"
$ PYTHONPATH=`pwd` detect_secrets/pre_commit_hook.py --baseline .secrets.baseline test_data/short_files/first_line.py
$ git diff .secrets.baseline

Improve false negative ratio by detecting keys with hyphens

Certain API keys use hyphens.

e.g. blahblah-aaaa-bbbb-cccc-ddddddd

This currently is not caught by the suite of HighEntropyStringPlugins.
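
As an illustration, a candidate pattern whose character class includes the hyphen would pick up such keys. The CANDIDATE regex and its 20-character minimum are assumptions, not the pattern the high entropy plugins use today.

```python
import re

# Hypothetical candidate pattern: base64 alphabet plus '-', 20+ characters.
CANDIDATE = re.compile(r'[A-Za-z0-9+/=-]{20,}')


def candidates(line):
    """Return every run on the line that could be a hyphenated key."""
    return CANDIDATE.findall(line)
```

The matches would still need to pass the entropy threshold, so widening the character class alone does not decide whether a string is reported.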

-v Verbosity overflow, 3+ v's cause a KeyError

Repro: detect-secrets --scan . -vvv

Here lies the relevant stack trace:

Traceback (most recent call last):
  File "/hey/three_six/bin/detect-secrets", line 11, in <module>
    sys.exit(main())
  File "/hey/three_six/lib/python3.6/site-packages/detect_secrets/main.py", line 27, in main
    log.set_debug_level(args.verbose)
  File "/hey/three_six/lib/python3.6/site-packages/detect_secrets/core/log.py", line 46, in _set_debug_level
    self.setLevel(mapping[debug_level])
KeyError: 3
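
A simple fix is to clamp the verbosity count to the keys that exist in the mapping. This is a sketch; the MAPPING values below are assumed and may not match the ones in log.py.

```python
import logging

# Assumed verbosity-to-level mapping, for illustration only.
MAPPING = {
    0: logging.ERROR,
    1: logging.INFO,
    2: logging.DEBUG,
}


def level_for(verbosity):
    """Clamp the -v count into the range of known keys before the lookup."""
    clamped = max(min(verbosity, max(MAPPING)), min(MAPPING))
    return MAPPING[clamped]
```

With the clamp, -vvv and beyond behave the same as the highest supported verbosity instead of raising KeyError.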

Document additional whitelist directive regexes

From https://github.com/Yelp/detect-secrets/pull/105/files#diff-557d95bdd433460fd987ace2659caee6, right now we only mention # pragma: whitelist secret in the README

Keyword plugin regex.search throws exception in 0.11.3

Hello,

I've tested the new version and I get this error :

Traceback (most recent call last):
  File "/home/.../detect-secrets-allprojects/venv/bin/detect-secrets", line 11, in <module>
    sys.exit(main())
  File "/home/.../detect-secrets-allprojects/venv/local/lib/python3.5/site-packages/detect_secrets/main.py", line 43, in main
    _perform_scan(args, plugins),
  File "/home/.../detect-secrets-allprojects/venv/local/lib/python3.5/site-packages/detect_secrets/main.py", line 118, in _perform_scan
    args.all_files,
  File "/home/.../detect-secrets-allprojects/venv/local/lib/python3.5/site-packages/detect_secrets/core/baseline.py", line 52, in initialize
    output.scan_file(file)
  File "/home/.../detect-secrets-allprojects/venv/local/lib/python3.5/site-packages/detect_secrets/core/secrets_collection.py", line 185, in scan_file
    self._extract_secrets_from_file(f, filename_key)
  File "/home/.../detect-secrets-allprojects/venv/local/lib/python3.5/site-packages/detect_secrets/core/secrets_collection.py", line 282, in _extract_secrets_from_file
    results.update(plugin.analyze(f, filename))
  File "/home/.../detect-secrets-allprojects/venv/local/lib/python3.5/site-packages/detect_secrets/plugins/base.py", line 32, in analyze
    secrets = self.analyze_string(line, line_num, filename)
  File "/home/.../detect-secrets-allprojects/venv/local/lib/python3.5/site-packages/detect_secrets/plugins/keyword.py", line 136, in analyze_string
    filetype=determine_file_type(filename),
  File "/home/.../detect-secrets-allprojects/venv/local/lib/python3.5/site-packages/detect_secrets/plugins/keyword.py", line 157, in secret_generator
    match = REGEX.search(lowered_string)

Do you have an idea ?

Best regards,

Tioborto

Same secret multiple times in the same file

Hello,

I'm trying to scan using detect-secrets --all-files and I noticed that the tool does not detect multiple instances of the same secret in a single file. It will simply flag the first instance found. The comment located here makes me believe that this is by design. Is it possible to change this behavior or is the tool not designed for this?

configparser requirement not in setup.py

While testing what I uploaded to Test PyPI, I did detect-secrets --help (on Python 2), and got:

ImportError: No module named configparser (high_entropy_strings.py)

flake8, which is in requirements-dev.txt, requires configparser, so we never ran into this during make test. 😮 Pretty sneaky!

We should add a Makefile option that runs make test with only the setup.py dependencies installed.

But for now, python-future suggests doing from configparser import ConfigParser, so that is what I will do.

Consider using truffleHogRegexes

Hi,

I run truffleHog. I recognize other projects have spun out doing similar things, so a little while ago I broke the regexes into their own package:

https://github.com/dxa4481/truffleHogRegexes

The thinking is I'd ideally like to get the whole community contributing regexes to one place, even if the underlying engine and technology is different.

It's on pypi, feel free to include the regex library.

BasicAuth regex runs all night long

Regex in question
https://github.com/Yelp/detect-secrets/blob/master/detect_secrets/plugins/basic_auth.py#L10

https://www.loggly.com/blog/five-invaluable-techniques-to-improve-regex-performance/

Maybe we can use the truffleHog regexes instead.

detect-secrets --scan outside/of/repository/ does not work

Perhaps in baseline.py we can do something similar to

+        if os.path.isdir(
+            os.path.join(rootdir, '.git')
+        ):
+            # This only works when you run it on the root directory of another repository
+            git_ls_files_args = [
+                'git',
+                '--git-dir', os.path.join(rootdir, '.git'),
+                'ls-files',
+            ]
+        else:
+            # This only works when you run it on a folder or file in the current repository
+            git_ls_files_args = [
+                'git',
+                'ls-files',
+                rootdir,
+            ]
         with open(os.devnull, 'w') as fnull:
             git_files = subprocess.check_output(
-                [
-                    'git',
-                    'ls-files',
-                    rootdir,
-                ],
+                git_ls_files_args,
                 stderr=fnull,
             )

However, when I tested this, the output of the git command looked correct, but the generated baseline did not.

Also note that the above only works if the directory outside the current repository is the root directory of that repository, not if it's some/other/repo/folder_inside_that_repo. I can imagine we could loop through all parents and see if one of them has a .git directory, but that feels real dirty. We could maybe try something similar to

    subprocess.check_output(
        ('git', 'remote', 'get-url', 'origin'),
        cwd=os.path.dirname(filename)
    ).decode('utf8').strip()
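
For reference, the "loop through all parents" idea could look roughly like the sketch below. The function name is hypothetical and this is not code from baseline.py.

```python
import os


def find_git_root(path):
    """Walk up from `path`; return the first directory containing `.git`.

    Returns None when no ancestor looks like a repository root.
    """
    current = os.path.abspath(path)
    if not os.path.isdir(current):
        # Start from the containing directory when given a file.
        current = os.path.dirname(current)
    while True:
        if os.path.exists(os.path.join(current, '.git')):
            return current
        parent = os.path.dirname(current)
        if parent == current:  # reached the filesystem root
            return None
        current = parent
```

The result could then be joined with '.git' and passed to --git-dir as in the diff above. Asking git itself (git rev-parse --show-toplevel, run with cwd set to the target directory) would be an alternative that avoids the manual walk.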

`pragma: whitelist secret` doesn't support additional text

detect-secrets doesn't complain about:

whitelisted_api_key: "DEADBEEF1234" # pragma: whitelist secret

but complains about:

whitelisted_api_key: "DEADBEEF1234" # pragma: whitelist secret blah

If it's a comment, it should support text that follows it.
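
A sketch of a check that tolerates trailing text, assuming the pragma is matched with a regular expression. The pattern and function name are hypothetical, not detect-secrets' own.

```python
import re

# Hypothetical pattern: the pragma only has to appear after a comment
# marker; it is not anchored to the end of the line, so extra text may
# follow it.
WHITELIST_PRAGMA = re.compile(r'(#|//)\s*pragma: ?whitelist[ -]secret\b')


def is_whitelisted(line):
    """Return True if the line carries the whitelist pragma in a comment."""
    return WHITELIST_PRAGMA.search(line) is not None
```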

Slow performance when scanning a non ini file with millions of lines

When scanning a non-ini file with more than 1 million lines, the scan hangs at the line below.

(self._analyze_ini_file(add_header=True), configparser.Error,),

I traced this back to configparser and found that the following line is extremely inefficient: it appends every offending line (essentially every line in the file) to the error message via string concatenation.

self.message += '\n\t[line %2d]: %s' % (lineno, line)

I did not have the patience to wait for the scan to finish; on my laptop it hung for more than 10 minutes.

We need a more efficient way to scan large non-ini files.
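
One possible direction, sketched under assumptions: run a cheap line-based pre-check and give up early, so that configparser is never handed a huge file that is obviously not ini. The function name, the patterns, and the max_bad_lines threshold are all hypothetical.

```python
import re

_SECTION = re.compile(r'^\s*\[[^\]]+\]\s*$')
_KEY_VALUE = re.compile(r'^\s*[^=:\s][^=:]*\s*[=:]')
_COMMENT_OR_BLANK = re.compile(r'^\s*([#;].*)?$')


def looks_like_ini(lines, max_bad_lines=50):
    """Cheap pre-check: stop as soon as too many lines are not ini-like."""
    bad = 0
    for line in lines:
        if (
            _COMMENT_OR_BLANK.match(line)
            or _SECTION.match(line)
            or _KEY_VALUE.match(line)
        ):
            continue
        bad += 1
        if bad > max_bad_lines:
            # Bail out early instead of reading the rest of the file.
            return False
    return True
```

Because it accepts any iterable of lines, an open file object can be passed directly and at most a bounded number of non-ini lines are read before it returns False.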

Respect plugin list from baseline

Some of the plugins, in particular the entropy-based and keyword plugins, can generate a relatively high number of false positives. When some of our teams use detect-secrets, they choose to exclude certain plugins (with or without also excluding some files). Currently, if you run a scan with a --no-xxx-scan option, the list of plugins used is persisted in the baseline file.

If a developer or automation system picks up the repo without a pre-commit hook set up and unaware of the exclude list, they may run detect-secrets --update baseline, which regenerates the baseline file with all plugins enabled.

Would the community entertain the idea of having the detect-secrets --update baseline scan use the plugin list from the baseline instead of all plugins (the default setting)? Additional options could be added for cases where you want to scan the repo with more plugins than those in the baseline.

We have something implemented in our fork (offline in our GHE), and we'd like to hear some feedback on the problem before submitting a big PR.
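
To make the proposal concrete, here is a minimal sketch of recovering the plugin list from an existing baseline. It assumes the baseline is JSON with a plugins_used list whose entries carry a name plus that plugin's parameters; the function name is hypothetical and this is not the code from our fork.

```python
import json


def plugins_from_baseline(baseline_text):
    """Map plugin name -> its saved parameters, read from `plugins_used`."""
    baseline = json.loads(baseline_text)
    plugins = {}
    for entry in baseline.get('plugins_used', []):
        params = dict(entry)       # copy so the input is left untouched
        name = params.pop('name')  # everything else is a parameter
        plugins[name] = params
    return plugins
```

An update could then initialize only these plugins, with these parameters, instead of falling back to the defaults.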

Add options for baseline diff minimizing

Some possible ideas:

--no-line-numbers (in baseline)
--no-generated-at (in baseline)
Make the pre-commit hook only look at the git diff
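
A sketch of what the first two ideas would amount to when writing the baseline. It assumes results maps each filename to a list of secret entries that may carry a line_number field; the function and its keyword arguments are hypothetical.

```python
import copy


def minimize_baseline(baseline, keep_line_numbers=False, keep_generated_at=False):
    """Return a copy of the baseline without fields that churn in diffs."""
    slim = copy.deepcopy(baseline)
    if not keep_generated_at:
        slim.pop('generated_at', None)
    if not keep_line_numbers:
        # Assumed layout: {"results": {filename: [{"line_number": ...}, ...]}}
        for secrets in slim.get('results', {}).values():
            for secret in secrets:
                secret.pop('line_number', None)
    return slim
```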

`pre-commit autoupdate` always fails if detect-secrets is present

hookid: detect-secrets

The supplied baseline may be incompatible with the current
version of detect-secrets. Please recreate your baseline to
avoid potential mis-configurations.

Current Version: 0.10.3
Baseline Version: 0.9.1

detect-secrets should heal the repository when this happens rather than requiring manual intervention.

Performance issues

I've been trying to run this tool against the discourse repository.
It seems to get stuck when encountering this file:

[secrets_collection]    INFO    Checking file: ./plugins/discourse-narrative-bot/lib/discourse_narrative_bot/certificate_generator.rb

Any workaround/suggestion?

--audit crashes on non-existent files

When you audit a baseline, if a file has been removed, it crashes the audit command and you lose all your progress.

Progress should be saved after each keystroke, and system calls should be protected by a try ... except block.

How to reproduce

Create a baseline with secrets
Remove one of the files referenced in the baseline
Audit the baseline
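
A rough sketch of the two suggestions above: guard the file read so a missing file is skipped rather than fatal, and write the baseline out after every decision. Both function names are hypothetical and this is not the audit code itself.

```python
import json


def read_secret_line(filename, line_number):
    """Return the 1-indexed line, or None if it can no longer be read."""
    try:
        with open(filename) as f:
            for number, line in enumerate(f, start=1):
                if number == line_number:
                    return line.rstrip('\n')
    except (IOError, OSError):
        # The file was removed (or is unreadable) since the baseline was made.
        return None
    return None  # the file is shorter than it used to be


def save_progress(baseline, baseline_path):
    """Persist the audit state; intended to be called after each answer."""
    with open(baseline_path, 'w') as f:
        json.dump(baseline, f, indent=2, sort_keys=True)
```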

`--import` baseline could be smarter

Issues

When using --import <baseline_filename>, a baseline is created without knowledge of the existing baseline file. This means that the current baseline file will be scanned for secrets (which it will clearly find, due to the secret hashes stored in there).
When we upgrade baselines, we currently need to perform a two-liner (without sponge):

$ detect-secrets scan --import .secrets.baseline > .secrets.baseline.new
$ mv .secrets.baseline.new .secrets.baseline

If we already know the filename we're importing from (as compared to reading from stdin), we should also write to it.

Suggested Fix

$ detect-secrets scan --upgrade .secrets.baseline

This will write results to the provided file, and ignore the false positives in the current baseline file.
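
The write-back step could be done safely along these lines: write to a temporary file next to the baseline, then swap it in. This is a sketch only; the function name is hypothetical, and os.replace requires Python 3.3+.

```python
import json
import os
import tempfile


def write_baseline_in_place(baseline, path):
    """Replace `path` with the new baseline without a half-written file."""
    directory = os.path.dirname(os.path.abspath(path))
    # Same directory, so the final rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix='.tmp')
    try:
        with os.fdopen(fd, 'w') as f:
            json.dump(baseline, f, indent=2, sort_keys=True)
        os.replace(tmp_path, path)  # Python 3.3+
    except Exception:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

This does in one call what the mv two-liner above does by hand.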

Add a server-scanner only exclude regex

One that the scanner server will use instead of the regular exclude regex, if present.

Keyword Detector is not used in 0.10.5

Hello,

When I run a scan on a repo, KeywordDetector is not used.
If this is expected, why is it disabled?

{
  "exclude_regex": null,
  "generated_at": "2018-11-26T09:42:29Z",
  "plugins_used": [
    {
      "base64_limit": 4.5,
      "name": "Base64HighEntropyString"
    },
    {
      "name": "BasicAuthDetector"
    },
    {
      "hex_limit": 3,
      "name": "HexHighEntropyString"
    },
    {
      "name": "PrivateKeyDetector"
    }
  ],
  "results": {},
  "version": "0.10.5"
}

Thank you
