Comments (49)
Hm, OK, maybe add some safeguard for Delphi 2007 and earlier:
{$IFNDEF D2009}
{$UNDEF Unicode}
{$ENDIF}
updated my fork! Pls see - renamed define Unicode to UnicodeRE so it's separate now with Delphi's define. it seems good to me.
Yes and no: While there is no longer a conflict with the predefined Unicode symbol, UnicodeRE is defined by default. The user will have to figure out that the compile error much later in the file means that he has to undefine it. With my safeguard suggestion this will be done automatically (first variant) or he will at least get an explicit hint what to do (second variant). But I forgot fpc support, so that safeguard must be more complex.
(It's getting rather late at my location. I'm afraid I must stop now. Thank you for your great help. I'll check back here tomorrow or the day after.)
from tregexpr.
There is another problem with Delphi 6 and 7: They don't support UTF-8 with BOM. Removing the BOM fixes that problem.
How to observe this? can you suggest a fix?
The compiler complains about an invalid identifier (meaning the BOM at the beginning of the file).
The fix is to remove it. I used Notepad++ for that: Encoding -> UTF-8 did the trick.
from tregexpr.
Can you disable that {$DEFINE UNICODE} in RegExpr?
from tregexpr.
Yes, but if I do that, I get other compile errors, e.g. in line 5950, where ExecNext is called without parameters. That's another bug: The symbol DefParam is not defined for Delphi 2007.
I'll try to fix that too and see where I get.
from tregexpr.
Delphi 2007 is completely missing from regexpr_compilers.inc. I'll add that.
In fact, Delphi 2005 to 2007 are missing from that file. I just added them.
Now I get a compile error regarding inlining in line 887. Removing that inline declaration got it to compile in Delphi 2007.
from tregexpr.
I have attached the changes that were necessary to compile with Delphi 2007. If you want I can also check with Delphi 6 and 7, but I don't have 2005 and 2006 available on this computer.
But somebody must have thought that Unicode must be defined, so I am afraid undefining it will break something, even if it compiles now.
from tregexpr.
Integrated your change for *compilers.inc. thanks!
Now I want to merge your changes for main code.
function IsPairedBreak(p: PRegExprChar): boolean; // cannot inline with local type declaration {$IFDEF InlineFuncs}inline;{$ENDIF}
can you change it to something like
function IsPairedBreak(p: PRegExprChar): boolean; {$IFDEF D2009} {$IFDEF InlineFuncs}inline;{$ENDIF} {$ENDIF}
so that 'InlineFuncs' will work here only on D2009+. or D2007+. or D2005+. please adjust that version here.
from tregexpr.
About 'Unicode'. It is totally safe to disable it! it will give no problems! but regex engine will not work on Unicode strings (WideStrings). so it will handle ASCII/ANSI.
from tregexpr.
Unfortunately I currently have only Delphi 6, 7, 2007, 10.2 and 11 available. I can therefore only say that the inline compiles with Delphi 10.2 but doesn't with 2007.
But simply moving the type declaration for PtrPair outside the function fixes the problem:
type
PtrPair = {$IFDEF Unicode} ^LongInt; {$ELSE} ^Word; {$ENDIF}
function IsPairedBreak(p: PRegExprChar): boolean; {$IFDEF InlineFuncs}inline;{$ENDIF}
const
cBreak = {$IFDEF Unicode} $000D000A; {$ELSE} $0D0A; {$ENDIF}
begin
Result := PtrPair(p)^ = cBreak;
end;
from tregexpr.
About 'Unicode'. It is totally safe to disable it! it will give no problems! but regex engine will not work on Unicode strings (WideStrings). so it will handle ASCII/ANSI.
Are you sure? Why was it added back then? For fpc support?
edit: OK, I see: Nothing in the change relied on that symbol being defined. Yes, removing it should probably be fine, unless something was added later that did.
from tregexpr.
It was added for non-Unicode old Delphi. so one can UNdefine 'inlinefuncs'.
from tregexpr.
Tried to integrate new change. Pls test it in my fork. I will post the pull-request later.
from tregexpr.
https://github.com/Alexey-T/TRegExpr
from tregexpr.
Yes, this compiles, if I remove the define for Unicode again.
There are some compiler hints though:
The declaration for field fLineSepArray should be ifdef'ed like this:
{$IFDEF UseLineSep}
{$IFNDEF UniCode}
fLineSepArray: array[byte] of boolean;
{$ENDIF}
{$ENDIF}
In function TRegExpr.FindRepeated the variable i should be ifdef'ed like this:
var
// ...
ArrayIndex: integer;
{$IFDEF UnicodeEx}
i: integer;
{$ENDIF}
And in procedure TRegExpr.FillFirstCharSet the variable i should be ifdef'ed like this:
var
// ...
min_cnt: integer;
{$IFDEF UseLineSep}
i: integer;
{$ENDIF}
from tregexpr.
regarding {$DEFINE Unicode}:
My suggestion would be to rename that symbol to ForceUnicode and later use
{$IFDEF Unicode}
// the compiler defaults to Unicode
{$DEFINE SUPPORT_UNICODE}
{$ENDIF}
{$IFDEF ForceUnicode}
// the user wants to force Unicode support
{$DEFINE SUPPORT_UNICODE}
{$ENDIF}
And later in the code replace {$IFDEF Unicode} with {$IFDEF SUPPORT_UNICODE}
That should solve that issue for all kinds of compilers and the user can still enable it.
from tregexpr.
There is another problem with the include file: It declares D_105 for Delphi 10.5, but that version does not exist. It's Delphi 11 now, so the symbol should probably be D_110 (or D_11). Since that symbol is not used anywhere, renaming it would not have any consequences.
from tregexpr.
I corrected my fork regarding your msgs. Except this one
regarding {$DEFINE Unicode}: My suggestion would be to rename that symbol to ForceUnicode and later use
It's not a good idea. We won't be able to UNdefine Unicode mode for new Delphi/FPC. That is needed when I want to speed up the engine for ASCII.
from tregexpr.
Hm, OK, maybe add some safeguard for Delphi 2007 and earlier:
{$IFNDEF D2009}
{$UNDEF Unicode}
{$ENDIF}
or
{$IFNDEF D2009}
{$IFDEF Unicode}
'Delphi < 2009 does not support Unicode, disable it above'
{$ENDIF}
{$ENDIF}
There are some more compile errors with Delphi 6. I'll try to fix them.
from tregexpr.
Delphi 6 problem: The types NativeInt and NativeUInt do not exist. In addition to that the type NativeInt is declared incorrectly as Int64 in Delphi 7 to Delphi 2007 (I found out the hard way) and NativeUInt appeared in Delphi XE2.
My proposed fix for this:
{$IFNDEF FPC}
// Delphi doesn't have PtrInt but has NativeInt
// but unfortunately NativeInt is declared wrongly in several versions
{$IF SizeOf(Pointer)=4}
PtrInt=Integer;
PtrUInt=Cardinal;
{$ELSE}
PtrInt = Int64;
PtrUInt = UInt64;
{$IFEND}
{$ENDIF}
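A minimal sketch of how the proposed declarations could be checked on a given compiler, assuming they are placed in a small stand-alone console program. The program name PtrIntCheck and the variables Buf and Distance are made up for illustration and are not part of TRegExpr. It prints the sizes so one can verify that PtrInt matches the pointer size:

```pascal
program PtrIntCheck;
{$APPTYPE CONSOLE}
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

{$IFNDEF FPC}
type
  // FPC already provides PtrInt/PtrUInt; Delphi gets them declared here,
  // sized by the pointer width instead of relying on NativeInt.
  {$IF SizeOf(Pointer) = 4}
  PtrInt = Integer;
  PtrUInt = Cardinal;
  {$ELSE}
  PtrInt = Int64;
  PtrUInt = UInt64;
  {$IFEND}
{$ENDIF}

var
  Buf: array[0..3] of Byte;  // illustration only
  Distance: PtrInt;
begin
  // Both sizes should be equal on every supported compiler.
  WriteLn('SizeOf(Pointer) = ', SizeOf(Pointer));
  WriteLn('SizeOf(PtrInt)  = ', SizeOf(PtrInt));
  // Typical use: pointer arithmetic via an integer of pointer size.
  Distance := PtrInt(@Buf[3]) - PtrInt(@Buf[0]);
  WriteLn('Distance        = ', Distance);
end.
```

If the two sizes differ on some compiler version, the conditional block above is not being applied there and the declarations need another look.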
from tregexpr.
Hm, OK, maybe add some safeguard for Delphi 2007 and earlier:
{$IFNDEF D2009}
{$UNDEF Unicode}
{$ENDIF}
updated my fork! Pls see - renamed define Unicode to UnicodeRE so it's separate now with Delphi's define. it seems good to me.
from tregexpr.
{$IF SizeOf(Pointer)=4}
Will that compile on older Delphi? D7, D6?
from tregexpr.
{$IF SizeOf(Pointer)=4}
Will that compile on older Delphi? D7, D6?
It will work with Delphi 6 and later. Note the {$IFEND} though. {$ENDIF} won't compile.
There is another problem with Delphi 6 and 7: They don't support UTF-8 with BOM. Removing the BOM fixes that problem.
from tregexpr.
{$IF SizeOf(Pointer)=4}
Merged. thanks. to my fork.
from tregexpr.
There is another problem with Delphi 6 and 7: They don't support UTF-8 with BOM. Removing the BOM fixes that problem.
How to observe this? can you suggest a fix?
from tregexpr.
The compiler complains about an invalid identifier (meaning the BOM at the beginning of the file).
RegExpr doesn't operate on FILES! It operates on strings! So you must skip the BOM in the string which you pass!
from tregexpr.
I used Notepad++ for that
Better use CudaText, and give me feedback on it in my Github. :=) it is much better.
from tregexpr.
Yes and no: While there is no longer a conflict with the predefined Unicode symbol, UnicodeRE is defined by default. The user will have to figure out that the compile error much later in the file means that he has to undefine it. With my safeguard suggestion this will be done automatically (first variant)
Now I don't understand. What compiler error? We don't give any compiler errors now. The user can still undef UnicodeRE to return to the ASCII mode of the engine.
With my safeguard suggestion this will be done automatically (first variant)
But I merged that safeguard, didn't I?
from tregexpr.
RegExpr doesn't operate on FILES!
I was talking about the unit source code itself and the include file. Both are UTF-8 with a BOM. The Delphi 6 and 7 compilers don't like that and cannot compile it. Removing the BOM fixes that problem.
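For checking which source files are affected, a small console program along these lines could be used. This is only a sketch: the program name CheckBom and the function HasUtf8Bom are hypothetical and not part of TRegExpr. It only tests for the three bytes EF BB BF at the start of a file and does not remove them.

```pascal
program CheckBom;
{$APPTYPE CONSOLE}
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

uses
  SysUtils, Classes;

// Hypothetical helper: returns True if the file starts with the
// UTF-8 byte order mark (bytes EF BB BF).
function HasUtf8Bom(const FileName: string): Boolean;
var
  Stream: TFileStream;
  Buf: array[0..2] of Byte;
begin
  Stream := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
  try
    // Files shorter than three bytes cannot contain the BOM.
    Result := (Stream.Read(Buf, SizeOf(Buf)) = SizeOf(Buf))
      and (Buf[0] = $EF) and (Buf[1] = $BB) and (Buf[2] = $BF);
  finally
    Stream.Free;
  end;
end;

begin
  if ParamCount < 1 then
  begin
    WriteLn('usage: CheckBom <file>');
    Halt(2);
  end;
  if HasUtf8Bom(ParamStr(1)) then
    WriteLn(ParamStr(1), ': UTF-8 BOM found')
  else
    WriteLn(ParamStr(1), ': no UTF-8 BOM');
end.
```

Running it against the unit and against the include file would show whether either still carries the BOM that Delphi 6 and 7 reject.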
from tregexpr.
Aha, I get it now - removed the BOM now (using CudaText !)
from tregexpr.
It seems we solved all your issues, right? feel free to post more.
from tregexpr.
Now I don't understand. What compiler error? We don't give any compiler errors now. The user can still undef UnicodeRE to return to the ASCII mode of the engine.
If UnicodeRE is defined, which is the default, Delphi 6 gives the compile error:
"regexpr.pas(1324) Error: Incompatible types: 'Char' and 'WideChar'"
OK, I now understand what you meant: It should still compile even for pre-Unicode Delphis even if UnicodeRE is defined. Well, it doesn't. That's because the ARegExpr parameter of RegExprSubExpressions is a string, not a wide string. Declaring it as RegExprString solves that problem.
from tregexpr.
good catch. I fixed it as suggested.
from tregexpr.
There is only a warning left:
regexpr.pas(4119) Warning: Variable 'GrpIndex' might not have been initialized
No idea how to fix that or if it needs fixing at all.
from tregexpr.
It doesn't need fixing (if you look at the code flow, it's initialized). Added an initialization for it anyway.
from tregexpr.
Now I get a compile error for Unicode Delphi (10.2), if UnicodeRE is not defined:
[dcc32 Error] regexpr.pas(1259): E2010 Incompatible types: 'Char' and 'AnsiChar'
Should that combination also compile? I'm not sure it makes sense.
Edit: Even changing the obvious
PRegExprChar = PAnsiChar;
REChar = AnsiChar;
... only results in a different compile error:
[dcc32 Error] regexpr.pas(1000): E2010 Incompatible types: 'AnsiChar' and 'Char'
I now definitely think it's not worth the effort.
from tregexpr.
We have at line 152: REChar = Char;
What if you replace 'Char' with 'AnsiChar'?
from tregexpr.
PRegExprChar = PChar;
RegExprString = AnsiString;
REChar = Char;
PChar to PAnsiChar too!
from tregexpr.
I did that, see above.
from tregexpr.
I fixed that obvious spot; now you have an error at line 1000. I cannot fix it - I don't have a new Delphi! I don't want to install it, sorry. Can you see how we work near line 1000 and fix it?
from tregexpr.
I think in order to fix this, the unit AnsiStrings must be added to the uses list. Unfortunately that has quite a lot of side effects.
from tregexpr.
It makes no sense to disable UnicodeRE on Unicode Delphi, you are right.
from tregexpr.
Yes, adding the AnsiStrings unit solves the compile error, but produces a lot of warnings.
I suggest to add
{$IFDEF UNICODE}
{$DEFINE UNICODERE}
{$ENDIF}
instead.
from tregexpr.
adding the AnsiStrings unit
I don't know what Delphi versions support it, we need good IFDEFs for this unit
from tregexpr.
{$IFDEF UNICODE}
{$DEFINE UNICODERE}
{$ENDIF}
Makes little sense, coz we define UnicodeRE by default
from tregexpr.
AnsiStrings was introduced with Delphi 2009 and got renamed to System.AnsiStrings at some later time. But since the Unit prefix System is always in the list of Unit Scope Names, simply using AnsiStrings would be enough. But as I said above: Adding it would make it compile but leave very many warnings.
from tregexpr.
Then let us not use AnsiStrings.
from tregexpr.
Makes little sense, coz we define UnicodeRE by default
Alternatively you could add
{$IFDEF UNICODE}
{$IFNDEF UNICODERE}
{$MESSAGE ERROR 'you cannot undefine UNICODERE for Unicode Delphi versions'}
{$ENDIF}
{$ENDIF}
from tregexpr.
Okay, added this code
from tregexpr.
I would have moved that below the last DEFINE that users may set:
{$DEFINE UnicodeRE} // Use WideChar for characters and UnicodeString/WideString for strings
{ off $DEFINE UnicodeEx} // Support Unicode >0xFFFF, e.g. emoji, e.g. "." must find 2 WideChars of 1 emoji
{ off $DEFINE UseWordChars} // Use WordChars property, otherwise fixed list 'a'..'z','A'..'Z','0'..'9','_'
{ off $DEFINE UseSpaceChars} // Use SpaceChars property, otherwise fixed list
{ off $DEFINE UseLineSep} // Use LineSeparators property, otherwise fixed line-break chars
{$IFDEF UNICODE}
{$IFNDEF UnicodeRE}
{$MESSAGE ERROR 'You cannot undefine UnicodeRE for Unicode Delphi versions'}
{$ENDIF}
{$ENDIF}
because otherwise it is more difficult to figure out that there are more user definable symbols.
It's getting late again and I think we have now covered all issues. I will close this ticket now. Thanks again.
btw: I have now added regexpr to GExperts in order to replace an outdated version of that unit that comes with SynEdit. This fixes a case-sensitivity problem in GExperts Grep, described here: https://en.delphipraxis.net/topic/6608-gexperts-grep-is-always-case-sensitive-when-regular-expressions-are-enabled/
from tregexpr.