I spoke with @KevinHock today about this, and decided to write our conversation down for posterity.
Historical Context
The ini parser was originally written to catch secrets that aren't wrapped in quote marks, which is the norm in config files:
$ cat config.ini
[private]
key=secret
The issue is that there's no easy way to tell whether a file is a config file. File extensions don't help, because config files have no standard set of extensions. Nor is there a special header that marks a file as a config file, so you can't, for example, just run:
$ file config.ini
Therefore, the only reliable way to identify a config file is to try to parse it and handle errors appropriately.
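A minimal sketch of this try-to-parse approach, using Python's standard-library configparser. The helper name is_config_file is hypothetical and this is not detect-secrets' actual code; it simply treats text as a config file if the whole thing parses and yields at least one section or default value.

```python
import configparser


def is_config_file(text: str) -> bool:
    """Hypothetical helper: treat text as a config file only if configparser accepts all of it."""
    parser = configparser.ConfigParser()
    try:
        parser.read_string(text)
    except configparser.Error:
        # Covers MissingSectionHeaderError (an entry before any [section])
        # and ParsingError (one or more lines that aren't valid entries).
        return False
    # An empty file parses cleanly but isn't a useful config file.
    return bool(parser.sections() or parser.defaults())


print(is_config_file("[private]\nkey=secret\n"))
```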
Issue
This approach incurs two performance hits:
- Needing to parse the entire file with configparser before having usable results.
- Error traceback construction for large files takes a long time (as @killuazhu pointed out).
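The second cost is easy to reproduce with the standard library alone: configparser doesn't stop at the first malformed line, but keeps reading and records every bad line in the ParsingError it raises at the end. The input below is a made-up example (a [section] header followed by many lines that aren't key=value pairs), and collect_parse_errors is a hypothetical helper for illustration.

```python
import configparser


def collect_parse_errors(text: str) -> list:
    """Hypothetical helper: return the (lineno, line) pairs configparser rejected."""
    parser = configparser.ConfigParser()
    try:
        parser.read_string(text)
    except configparser.MissingSectionHeaderError:
        # No section header before the first entry: a different failure mode.
        raise
    except configparser.ParsingError as err:
        # configparser keeps reading after each malformed line and raises
        # once at the end, so err.errors holds one entry per bad line.
        return err.errors
    return []


# Made-up input: a valid header, then many lines that aren't options.
sample = "[section]\nkey=value\n" + "this is not an option line\n" * 1_000
errors = collect_parse_errors(sample)
print(len(errors))  # 1000
```

Both err.errors and the exception message grow with the number of bad lines, so a long non-config file that happens to start with a bracketed header pays for every one of them.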
Possible Solutions
1. Use the first N lines to try to determine whether a file is actually a config file
Credit to @KevinHock for this idea. Essentially, if the following conditions hold true, we may be able to identify whether a file is a config file by reading the first few lines.
a. The first N lines are a representative sample for the entire file, and
b. The first N lines are independently parseable as a config file by themselves.
If both conditions hold, this would address both issues listed above, since we would no longer need to parse the entire file to decide whether it is suitable for ini file parsing.
The problem is that we don't have a large enough sample of config files to validate this method.
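Assuming conditions (a) and (b) hold, the heuristic might look like the sketch below. The sample size is a placeholder (the discussion doesn't settle on a value for N), and looks_like_config and DEFAULT_SAMPLE_LINES are hypothetical names. Because itertools.islice only pulls n lines from its input, passing an open file object means the rest of the file is never read.

```python
import configparser
from itertools import islice
from typing import Iterable

# Hypothetical sample size; N is not fixed anywhere in this discussion.
DEFAULT_SAMPLE_LINES = 20


def looks_like_config(lines: Iterable[str], n: int = DEFAULT_SAMPLE_LINES) -> bool:
    """Guess whether a file is ini-style by parsing only its first n lines.

    Relies on conditions (a) and (b): the first n lines are representative
    of the whole file and parse as a config file on their own.
    """
    # islice stops after n lines, so a file handle is only partially read.
    sample = "".join(islice(lines, n))
    parser = configparser.ConfigParser()
    try:
        parser.read_string(sample)
    except configparser.Error:
        return False
    # An empty sample parses cleanly, so also require some actual content.
    return bool(parser.sections() or parser.defaults())


print(looks_like_config(["[private]\n", "key=secret\n"]))
```

The trade-off is exactly the one described above: a file whose first n lines look like a config file but whose remainder doesn't will still be treated as a config file.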
2. Try to use a different library for config file parsing
A different library might avoid that expensive error traceback construction and speed things up. Alternatively, we might be able to subclass configparser so that ParsingError doesn't record every line that fails to parse.
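One way to sidestep the error accumulation, whether through a different parser or a customized one, is to fail fast at the first bad line. The sketch below is a small hand-rolled checker, not configparser and not necessarily what shipped later; first_invalid_line is a hypothetical name, and it assumes a simplified grammar: bracketed section headers, key=value or key: value entries, # or ; comments, and indented continuation lines.

```python
import re
from typing import Iterable, Optional

# Simplified ini grammar (an assumption; configparser accepts more, e.g.
# custom delimiters, inline comments, and keys without values).
_SECTION = re.compile(r"\[[^\]]+\]\s*$")
_OPTION = re.compile(r"[^=:\s][^=:]*[=:]")
_COMMENT_PREFIXES = ("#", ";")


def first_invalid_line(lines: Iterable[str]) -> Optional[int]:
    """Return the 1-based number of the first line that breaks the grammar, or None.

    Unlike configparser, this stops at the first bad line instead of
    collecting all of them.
    """
    in_section = False
    has_option = False
    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        stripped = line.strip()
        if not stripped or stripped.startswith(_COMMENT_PREFIXES):
            continue  # blank lines and comments are always allowed
        if line[0].isspace() and has_option:
            continue  # indented line: continuation of the previous value
        if _SECTION.match(stripped):
            in_section, has_option = True, False
            continue
        if not in_section:
            return lineno  # content before any [section] header
        if _OPTION.match(stripped):
            has_option = True
            continue
        return lineno  # neither header, entry, nor continuation
    return None


print(first_invalid_line(["[private]\n", "key=secret\n"]))
```

With an open file object as input, rejecting a non-config file costs only as many lines as it takes to hit the first bad one. The downside is that the grammar has to be kept in step with whatever configparser settings the scanner actually uses.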
3. Rethink how we approach config files
See detect-secrets/detect_secrets/plugins/high_entropy_strings.py, lines 59 to 64 in 1fabf92.
Maybe there's a better way to do this than scanning the ini file twice?
I ran into this today as well, with a file that was ~250k lines.
We implemented a short-term fix, option 2 from @domanchi's comment, in the PRs referenced above. The changes are live in version 0.12.2.
Thanks again for opening this issue; I'm going to keep it open until we've improved on it more thoroughly.
Closing this issue, since #187 provides concrete evidence that the changes have been effective for long files.
We can separately track performance for files with long lines.