Comments (16)
So I think we should restore the
header
functionality
Although perhaps we should make header
have lower priority than filename
, like we now already do with signature
(and unlike in the old implementation, where it was non-deterministic whether filename
match or header
match would take precedence). Perhaps then we won't even need to fix json?
from micro.
But besides ^{$ is very unspecific so far and doesn't really help the initial problem still persists.
But the problem still persists but it will only happen if the file name matches more than one file type, the problem would be the signature patterns, they must be fixed, improved and added so that false positives can be corrected. The signature
should be used in case the file type is unknown
too and that false positives are reported by whoever finds it and that the signatures that would avoid these errors are implemented.
from micro.
Placing an option that activates or deactivates the use of signature
would be trying to sweep the problem of detecting the file type from the file content down the table, in the same way as placing these bugs only in files that match with more than one filename
.
Maybe warn that signature
may present errors and ask the user to report bugs found with this option activated in the option documentation, something like:
* `detectsignature`: If it is `true`, the `signature` field of the syntax files
will be used to try to find the `filetype` when it is `unknown` or when
more than one `filetype` is found (this function may end up generating false positives, if this happens please report the bug so we can improve its functionality).
default value: `false`
But I still believe that signature
should be used to try to find the type of files of type unknown
by default, so signature will have a greater frequency of use than it is today, like #3183 (comment) that shows an example of use of the signature of the json
files and in which scenarios this could happen unexpectedly.
from micro.
If I put the signature with "%YAML", any file that has this content of this type?
No, as described in #3183 the signature
check is currently used as last resort to sort out multiple detected filename
s (more or less as Vim
does)
What is this field for?
See above.
How is it used by the micro to carry out detection?
It will iterate over the amount of detectlimit
lines to search for e.g. keywords and or patterns with which the file type can be recognized.
In which scenarios would I have which behaviors?
As written, it's currently used to define the chosen one of multiple filename
matches, depending on the given/defined patterns.
In what cases could there be an error?
In case the pattern is insufficient or filename
will be changed without considering signature
once again.
That's definitely something to keep in mind when we decide to add more filename
patterns into already existing definitions and don't create new dedicated definitions.
How to make proper use of it?
As of now, I agree that it most probably was a fault to keep them available (with the same pattern as before with header
) for everything else other than the C* headers. Because this caused unintended confusion.
But as already written, there is still some room to use it as a absolute last fallback.
How do I find this information without having to review PR's and issues?
Well, what shall I say/write?
We figured out by discussion, since confusion came up due to the lack of proper documentation.
As already often said:
This project is already larger than 2-3 maintainers can handle (which was the reason why I vote for you too). 😃
At least labeling should be possible for volunteers, but unfortunately it isn't.
Help is appreciated with support to prepare the proper doc update. 😉
from micro.
I've read the discussion and I come to a conclusion that we should admit that replacing "headers" with "signatures" was a hasty decision. It is not just a matter of a proper doc update, it is a matter of breakage of useful functionality. header
and signature
are different things, with quite different purposes, both are useful, we should have added the latter without obstructing the former.
If we have a file named foo
with the following content:
#!/bin/sh
echo "hello world"
with v2.0.13 it is recognized and highlighted as a shell script. With the newest master it is not.
To me, this is a regression, and a rather serious one.
The majority of micro's syntax files have no header
/signature
patterns. In those syntax files that do, those patterns are mainly shebangs (#!/bin/sh
, #!/usr/bin/awk
etc) and things like <?xml ... ?>
or <!DOCTYPE html5>
, i.e. things that: 1. can be quite reliably used for detecting the filetype independently of filename
, not just in addition to it. 2. should only occur in the first line of the file, not in the first 100 lines or so. In other words, the old header
semantics was exactly what was needed for those filetypes, while the new signature
semantics makes little sense for them.
So I think we should restore the header
functionality, restore its usage instead of signature
for most filetypes, and keep using signature
probably just for distinguishing C/C++/Objective-C headers.
When it comes to false positives caused by header
in the past, were #2894 and #3054 the only known cases of such? If so, in all cases json was guilty, so I think we should fix that simply by removing this "^{$"
pattern from json.yaml
, since this pattern is indeed too general and probably was causing more harm than good.
from micro.
To me, this is a regression, and a rather serious one.
Ok, then we keep it short and header
will return. signature
is and can be used to solve multiple hits with the same filename
and header
will step up in the moment no filename
hit at all.
When it comes to false positives caused by
header
in the past, were #2894 and #3054 the only known cases of such?
Yep, grep
tells that the json
definition is right now the only one with a very generic pattern.
Last but not least the doc of signature
can then be extended/optimized too.
from micro.
With #2819 it was changed in that way that it considers the signature
-check only in case more than one file type check hit, to concretize the result (C* file types). Previously it has led to to false positive results that the header
-check hit (#2894), before the possible later file type check could hit.
It can be discussed to try this guess as well in the moment no file type check hit at all, but this again could lead to false positives in the moment the signature
-check identifies some lines by the "wrong" signature
of a different syntax definition than expected due to an much to generalized wording within the signature
definition. For example it detects a buzz word within a comment of your unspecified/unknown file.
So from that point there is no guarantee to hit exactly, it would be a guess and it could be wrong (with a chance for new issues).
The correct way would be to add an concrete file type in a new or existing syntax definition.
from micro.
In my opinion, the fields within the detect
key should be used to detect the file type, as well as the header
in 2.0.13.
But in view of the behavior of the signature
field, I think it would be worth removing the signatures that don't make sense, for example, in what scenario would the signature of json files be used? I can't think of any filenames for this functionality to be used.
The change was made as if the behavior of the header
and signature
were the same, but some signatures will never be used and could be removed.
from micro.
...for example, in what scenario would the signature of json files be used?
But wasn't the same question legit for the old json header
too?
I can't think of any filenames for this functionality to be used.
Currently me too, but this doesn't mean that they don't exist. I "broke" some expected default handling in the past.
But besides ^{$
is very unspecific so far and doesn't really help the initial problem still persists.
The signature check can help as fallback and heavily relies on sufficient rules.
the fields within the detect key should be used to detect the file type
So you'd recommend to consider to provide this as a fallback method and remove insufficient signatures, right?
from micro.
...for example, in what scenario would the signature of json files be used?
But wasn't the same question legit for the old json
header
too?
I don't think so, all files that start with {
in the first line will be considered json files, but this is very vague and can generate a lot of false positives.
The point is that the header has been changed as if it were for an equivalent, which is not true.
The json signature will only be used if another syntax file matches the name of the file, currently json files are those that end with .json and no other syntax file matches a similar pattern, so the way it is in file types like json, this field is unusable.
I think the right thing to do in this case would be to remove it from the file, but that's not my decision.
the fields within the detect key should be used to detect the file type
So you'd recommend to consider to provide this as a fallback method and remove insufficient signatures, right?
No, actually the signature
is used to define the type of the file in cases where more than one detect.filename
match (as far as I understand, correct me if I'm wrong) and the header
that was used to define the type of the file by the content of the first line of the file has been removed, this was an approved decision and merged.
I was just thinking that the signature was the same as the header, I hadn't correctly understood the function of signature
, I'll close the subject.
from micro.
Maybe user can be asked about filetype if it was deduced from patterns or can be added one more option to control if patterns are used at all
from micro.
But I still believe that signature should be used to try to find the type of files of type unknown by default [...]
I'm open for that. We just need to keep in mind, that this will have downsides.
Just one example:
Any (new) language which has types (structs, classes, etc.) and allows to "open" them at the next line with {
, but no file extension is given yet or the language is yet unknown, because it's new.
Do we want to have it highlighted as json
? 😉
So proper highlighting should be produced with explicit file extensions or whole file names in the syntax definition. Everything else is some (better) guessing, especially in cases with more or less "wildcard" signatures.
from micro.
Ok. What vim does? You said in one of the PR's vim uses similar approach so do they have the issue?
from micro.
Vim
detects it as json
, but it does this by this simple fact:
https://github.com/vim/vim/blob/master/runtime/filetype.vim#L1627
So the file name "Pipfile.lock" is explicit tracked to be of a json file type. 😉
For *.h files for example Vim
does that dynamic lookup...
https://github.com/vim/vim/blob/master/runtime/filetype.vim#L364 -> https://github.com/vim/vim/blob/master/runtime/autoload/dist/ft.vim#L183
...from which micro
s signature
check was inspired. It doesn't do this by default.
from micro.
From what I understand, the FTheader
function is only used for *.h
files, just as the FTbas
function is only used for *.sys\c
files. Here signature can (?) run for all files definitions.
I believe that the solution for the Pipfile.lock file is to update the filename
regex of json
to match this file.
The question is, what does the signature
field look like?
You may also provide an optional
signature
regex that will check a certain
amount of lines of a file to find specific marks. For example:detect: filename: "\\.ya?ml$" signature: "%YAML"
If I put the signature
with "%YAML", any file that has this content of this type?
We found out not with Pipfile.lock and signature
from json
, but that's not what the documentation says.
What is this field for?
How is it used by the micro to carry out detection?
In which scenarios would I have which behaviors?
In what cases could there be an error?
How to make proper use of it?
How do I find this information without having to review PR's and issues?
from micro.
So proper highlighting should be produced with explicit file extensions or whole file names in the syntax definition. Everything else is some (better) guessing, especially in cases with more or less "wildcard" signatures.
Off topic sorry... It is a good place for a tuned and lightweight AI similar to googles magika which weights 1MB and it would be really perfect for such a problem.
It is even open source
from micro.
Related Issues (20)
- palettero plugin breaks `\u001b[` keybindings
- lock to provide warning when opening a file already opened in another instance of micro HOT 2
- set filetype sets the filetype for all opened tabs and not the current focused HOT 4
- Feature proposal: TUI drop-down menus HOT 2
- Enhanced multi-line editing
- Why doesn't Micro use Nano's syntax coloring? HOT 6
- [WSL] Font changed to Raster Fonts when opening Micro
- Changing default linter behaviour HOT 1
- how to input nano keybinds into micro? HOT 4
- Maybe redo should keep cursor where modification happens (not respecting pure line change in redoing)? HOT 6
- Micro places compiled Java programs into incorrect directory when auto-compiling HOT 2
- Running remote commands causes the editor to freeze up HOT 2
- Scroll with moving the scrollbar - a small idea for more convenience HOT 2
- Request (or just a general question) about .mo (modelica) syntax highlight support. HOT 1
- Weird tab bar behavior HOT 1
- Jumping to matching brace not as expected HOT 5
- editorconfig not applied HOT 2
- XHTML? HOT 3
- Distribution with DotSlash? HOT 1
- Match brace highlight is not working at all. HOT 3
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from micro.