Comments (49)

dummzeuch avatar dummzeuch commented on September 18, 2024 1

Hm, OK, maybe add some safeguard for Delphi 2007 and earlier:
{$IFNDEF D2009}
{$UNDEF Unicode}
{$ENDIF}

Updated my fork! Please see: I renamed the define Unicode to UnicodeRE, so it is now separate from Delphi's own define. It seems good to me.

Yes and no: While there is no longer a conflict with the predefined Unicode symbol, UnicodeRE is defined by default. The user will have to figure out that the compile error much later in the file means that he has to undefine it. With my safeguard suggestion this will be done automatically (first variant) or he will at least get an explicit hint what to do (second variant). But I forgot fpc support, so that safeguard must be more complex.

(It's getting rather late at my location. I'm afraid I must stop now. Thank you for your great help. I'll check back here tomorrow or the day after.)

from tregexpr.

dummzeuch avatar dummzeuch commented on September 18, 2024 1

There is another problem with Delphi 6 and 7: They don't support UTF-8 with BOM. Removing the BOM fixes that problem.

How to observe this? can you suggest a fix?

The compiler complains about an invalid identifier (meaning the BOM at the beginning of the file).

The fix is to remove it. I used Notepad++ for that: Encoding -> UTF-8 did the trick.
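
For anyone who prefers not to depend on an editor setting, the same fix can be sketched as a small console program. This is only an illustration: the program name StripBom and the helper names HasUtf8Bom and StripUtf8Bom are made up here, and it assumes the file really is UTF-8 with the three-byte BOM EF BB BF at the start.

```pascal
program StripBom;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

// Returns True if the stream starts with the UTF-8 BOM (EF BB BF).
function HasUtf8Bom(Stream: TStream): Boolean;
var
  Buf: array[0..2] of Byte;
begin
  Stream.Position := 0;
  Result := (Stream.Read(Buf, SizeOf(Buf)) = 3)
    and (Buf[0] = $EF) and (Buf[1] = $BB) and (Buf[2] = $BF);
end;

// Rewrites FileName without the leading BOM.
// Returns False (and leaves the file untouched) if no BOM was found.
function StripUtf8Bom(const FileName: string): Boolean;
var
  Src, Dst: TMemoryStream;
begin
  Src := TMemoryStream.Create;
  Dst := TMemoryStream.Create;
  try
    Src.LoadFromFile(FileName);
    Result := HasUtf8Bom(Src);
    if Result then
    begin
      Src.Position := 3;
      // CopyFrom with Count = 0 would copy the whole stream, so guard it.
      if Src.Size > 3 then
        Dst.CopyFrom(Src, Src.Size - 3);
      Dst.SaveToFile(FileName);
    end;
  finally
    Dst.Free;
    Src.Free;
  end;
end;

begin
  if ParamCount <> 1 then
  begin
    WriteLn('usage: StripBom <file>');
    Halt(1);
  end;
  if StripUtf8Bom(ParamStr(1)) then
    WriteLn('BOM removed')
  else
    WriteLn('no BOM found');
end.
```

It overwrites the file in place, so run it on a copy or a version-controlled file.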

from tregexpr.

Alexey-T avatar Alexey-T commented on September 18, 2024

Can you disable that {$DEFINE UNICODE} in RegExpr?

from tregexpr.

dummzeuch avatar dummzeuch commented on September 18, 2024

Yes, but if I do that, I get other compile errors, e.g. in line 5950, where ExecNext is called without parameters. That's another bug: The symbol DefParam is not defined for Delphi 2007.
I'll try to fix that too and see where I get.

from tregexpr.

dummzeuch avatar dummzeuch commented on September 18, 2024

Delphi 2007 is completely missing from regexpr_compilers.inc. I'll add that.
In fact, Delphi 2005 to 2007 are missing from that file. I just added them.
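
As a rough sketch of what such detection can look like: the VERxxx symbols below are the compiler's own predefined ones (VER170 for Delphi 2005, VER180 for Delphi 2006 and 2007, VER185 additionally for Delphi 2007). The names D2005, D2006 and D2007 are assumptions modelled on the D2009 symbol used elsewhere in this thread; they are not copied from regexpr_compilers.inc, which may name or group them differently.

```pascal
program DetectOldDelphi;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

// Assumed symbol names, for illustration only.
{$IFDEF VER170} {$DEFINE D2005} {$ENDIF} // Delphi 2005
{$IFDEF VER180} {$DEFINE D2006} {$ENDIF} // Delphi 2006, also defined by Delphi 2007
{$IFDEF VER185} {$DEFINE D2007} {$ENDIF} // Delphi 2007 only

begin
  // Each line is compiled in only if the matching symbol was defined above.
  {$IFDEF D2005} WriteLn('D2005 defined'); {$ENDIF}
  {$IFDEF D2006} WriteLn('D2006 defined'); {$ENDIF}
  {$IFDEF D2007} WriteLn('D2007 defined'); {$ENDIF}
  WriteLn('done');
end.
```

Because Delphi 2007 defines both VER180 and VER185, code that must tell 2006 and 2007 apart has to test VER185.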
Now I get a compile error regarding inlining in line 887. Removing that inline declaration got it to compile in Delphi 2007 now.

from tregexpr.

dummzeuch avatar dummzeuch commented on September 18, 2024

I have attached the changes that were necessary to compile with Delphi 2007. If you want I can also check with Delphi 6 and 7, but I don't have 2005 and 2006 available on this computer.

But somebody must have thought that Unicode must be defined, so I am afraid undefining it will break something, even if it compiles now.

RegExpr.zip

from tregexpr.

Alexey-T avatar Alexey-T commented on September 18, 2024

Integrated your change for *compilers.inc. thanks!
Now I want to merge your changes for main code.

function IsPairedBreak(p: PRegExprChar): boolean; // cannot inline with local type declaration {$IFDEF InlineFuncs}inline;{$ENDIF}

can you change it to something like

function IsPairedBreak(p: PRegExprChar): boolean; {$IFDEF D2009} {$IFDEF InlineFuncs}inline;{$ENDIF} {$ENDIF}

so that 'InlineFuncs' takes effect here only on D2009+ (or D2007+, or D2005+)? Please adjust the version as needed.

from tregexpr.

Alexey-T avatar Alexey-T commented on September 18, 2024

About 'Unicode': It is totally safe to disable it! It will cause no problems, but the regex engine will then not work on Unicode strings (WideStrings), so it will handle only ASCII/ANSI.

from tregexpr.

dummzeuch avatar dummzeuch commented on September 18, 2024

Unfortunately I currently have only Delphi 6, 7, 2007, 10.2 and 11 available. I can therefore only say that the inline compiles with Delphi 10.2 but doesn't with 2007.
But simply moving the type declaration for PtrPair outside the function fixes the problem:

type
  PtrPair = {$IFDEF Unicode} ^LongInt; {$ELSE} ^Word; {$ENDIF}

function IsPairedBreak(p: PRegExprChar): boolean; {$IFDEF InlineFuncs}inline;{$ENDIF}
const
  cBreak = {$IFDEF Unicode} $000D000A; {$ELSE} $0D0A; {$ENDIF}
begin
  Result := PtrPair(p)^ = cBreak;
end;

from tregexpr.

dummzeuch avatar dummzeuch commented on September 18, 2024

About 'Unicode': It is totally safe to disable it! It will cause no problems, but the regex engine will then not work on Unicode strings (WideStrings), so it will handle only ASCII/ANSI.

Are you sure? Why was it added back then? For fpc support?

edit: OK, I see: Nothing in the change relied on that symbol being defined. Yes, removing it should probably be fine, unless something was added later that did.

from tregexpr.

Alexey-T avatar Alexey-T commented on September 18, 2024

It was added for non-Unicode old Delphi. so one can UNdefine 'inlinefuncs'.

from tregexpr.

Alexey-T avatar Alexey-T commented on September 18, 2024

Tried to integrate new change. Pls test it in my fork. I will post the pull-request later.

from tregexpr.

Alexey-T avatar Alexey-T commented on September 18, 2024

https://github.com/Alexey-T/TRegExpr

from tregexpr.

dummzeuch avatar dummzeuch commented on September 18, 2024

Yes, this compiles, if I remove the define for Unicode again.

There are some compiler hints though:

The declaration for field fLineSepArray should be ifdef'ed like this:

{$IFDEF UseLineSep}
  {$IFNDEF UniCode}
  fLineSepArray: array[byte] of boolean;
  {$ENDIF}
{$ENDIF}

In function TRegExpr.FindRepeated the variable i should be ifdef'ed like this:

var
// ...
ArrayIndex: integer;
{$IFDEF UnicodeEx}
i: integer;
{$ENDIF}

and in procedure TRegExpr.FillFirstCharSet the variable i should be ifdef'ed like this:

var
// ...
min_cnt: integer;
{$IFDEF UseLineSep}
i: integer;
{$ENDIF}

from tregexpr.

dummzeuch avatar dummzeuch commented on September 18, 2024

regarding {$DEFINE Unicode}:
My suggestion would be to rename that symbol to ForceUnicode and later use

{$IFDEF Unicode}
// the compiler defaults to Unicode
{$DEFINE SUPPORT_UNICODE}
{$ENDIF}
{$IFDEF ForceUnicode}
// the user wants to force Unicode support
{$DEFINE SUPPORT_UNICODE}
{$ENDIF}

And later in the code replace {$IFDEF Unicode} with {$IFDEF SUPPORT_UNICODE}

That should solve that issue for all kinds of compilers and the user can still enable it.

from tregexpr.

dummzeuch avatar dummzeuch commented on September 18, 2024

There is another problem with the include file: It declares D_105 for Delphi 10.5, but that version does not exist. It's Delphi 11 now, so the symbol should probably be D_110 (or D_11). Since that symbol is not used anywhere, renaming it would not have any consequences.

from tregexpr.

Alexey-T avatar Alexey-T commented on September 18, 2024

I corrected my fork regarding your msgs. Except this one

regarding {$DEFINE Unicode}: My suggestion would be to rename that symbol to ForceUnicode and later use

It's not a good idea. We wouldn't be able to undefine Unicode mode for new Delphi/FPC, and that is needed when I want to speed up the engine for ASCII-only input.

from tregexpr.

dummzeuch avatar dummzeuch commented on September 18, 2024

Hm, OK, maybe add some safeguard for Delphi 2007 and earlier:
{$IFNDEF D2009}
{$UNDEF Unicode}
{$ENDIF}

or

{$IFNDEF D2009}
{$IFDEF Unicode}
'Delphi < 2009 does not support Unicode, disable it above'
{$ENDIF}
{$ENDIF}

There are some more compile errors with Delphi 6. I'll try to fix them.

from tregexpr.

dummzeuch avatar dummzeuch commented on September 18, 2024

Delphi 6 problem: The types NativeInt and NativeUInt do not exist. In addition to that the type NativeInt is declared incorrectly as Int64 in Delphi 7 to Delphi 2007 (I found out the hard way) and NativeUInt appeared in Delphi XE2.

My proposed fix for this:

{$IFNDEF FPC}
  // Delphi doesn't have PtrInt but has NativeInt
  // but unfortunately NativeInt is declared wrongly in several versions
  {$IF SizeOf(Pointer)=4}
  PtrInt = Integer;
  PtrUInt = Cardinal;
  {$ELSE}
  PtrInt = Int64;
  PtrUInt = UInt64;
  {$IFEND}
{$ENDIF}
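
To check that these aliases really match the pointer size on a given compiler, a small test program along these lines may help. It is a sketch only: the program name and the compile-time check are additions for illustration, not part of the proposed patch. Under FPC the built-in PtrInt and PtrUInt are used and the Delphi branch is skipped.

```pascal
program PtrIntCheck;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

{$IFNDEF FPC}
type
  // Same aliases as in the proposed fix above.
  {$IF SizeOf(Pointer) = 4}
  PtrInt = Integer;
  PtrUInt = Cardinal;
  {$ELSE}
  PtrInt = Int64;
  PtrUInt = UInt64;
  {$IFEND}
{$ENDIF}

// Fail the build if the integer alias cannot hold a pointer.
{$IF SizeOf(PtrInt) <> SizeOf(Pointer)}
  {$MESSAGE ERROR 'PtrInt does not match the pointer size'}
{$IFEND}

begin
  WriteLn('SizeOf(Pointer) = ', SizeOf(Pointer));
  WriteLn('SizeOf(PtrInt)  = ', SizeOf(PtrInt));
  WriteLn('SizeOf(PtrUInt) = ', SizeOf(PtrUInt));
end.
```

The compile-time check is the useful part: a wrongly sized alias stops the build instead of surfacing later as a pointer truncation bug.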

from tregexpr.

Alexey-T avatar Alexey-T commented on September 18, 2024

Hm, OK, maybe add some safeguard for Delphi 2007 and earlier:
{$IFNDEF D2009}
{$UNDEF Unicode}
{$ENDIF}

Updated my fork! Please see: I renamed the define Unicode to UnicodeRE, so it is now separate from Delphi's own define. It seems good to me.

from tregexpr.

Alexey-T avatar Alexey-T commented on September 18, 2024

{$IF SizeOf(Pointer)=4}

Will that compile on older Delphi? D7, D6?

from tregexpr.

dummzeuch avatar dummzeuch commented on September 18, 2024

{$IF SizeOf(Pointer)=4}

Will that compile on older Delphi? D7, D6?

It will work with Delphi 6 and later. Note the {$IFEND} though. {$ENDIF} won't compile.

There is another problem with Delphi 6 and 7: They don't support UTF-8 with BOM. Removing the BOM fixes that problem.

from tregexpr.

Alexey-T avatar Alexey-T commented on September 18, 2024

{$IF SizeOf(Pointer)=4}

Merged. thanks. to my fork.

from tregexpr.

Alexey-T avatar Alexey-T commented on September 18, 2024

There is another problem with Delphi 6 and 7: They don't support UTF-8 with BOM. Removing the BOM fixes that problem.

How to observe this? can you suggest a fix?

from tregexpr.

Alexey-T avatar Alexey-T commented on September 18, 2024

The compiler complains about an invalid identifier (meaning the BOM at the beginning of the file).

RegExpr doesn't operate on FILES! It operates on strings! So you must skip the BOM in the string which you pass!

from tregexpr.

Alexey-T avatar Alexey-T commented on September 18, 2024

I used Notepad++ for that

Better use CudaText, and give me feedback on it in my Github. :=) it is much better.

from tregexpr.

Alexey-T avatar Alexey-T commented on September 18, 2024

Yes and no: While there is no longer a conflict with the predefined Unicode symbol, UnicodeRE is defined by default. The user will have to figure out that the compile error much later in the file means that he has to undefine it. With my safeguard suggestion this will be done automatically (first variant)

Now I don't understand. What compiler error? We don't give any compiler errors now. The user can still undefine UnicodeRE to return to the ASCII mode of the engine.

With my safeguard suggestion this will be done automatically (first variant)

But I merged that safeguard, didn't I?

from tregexpr.

dummzeuch avatar dummzeuch commented on September 18, 2024

RegExpr doesn't operate on FILES!

I was talking about the unit source code itself and the include file. Both are UTF-8 with a BOM. The Delphi 6 and 7 compilers don't like that and cannot compile it. Removing the BOM fixes that problem.

from tregexpr.

Alexey-T avatar Alexey-T commented on September 18, 2024

Aha, I get it now - removed the BOM now (using CudaText !)

from tregexpr.

Alexey-T avatar Alexey-T commented on September 18, 2024

It seems we solved all your issues, right? feel free to post more.

from tregexpr.

dummzeuch avatar dummzeuch commented on September 18, 2024

Now I don't understand. What compiler error? We don't give any compiler errors now. The user can still undefine UnicodeRE to return to the ASCII mode of the engine.

If UnicodeRE is defined, which is the default, Delphi 6 gives the compile error:
"regexpr.pas(1324) Error: Incompatible types: 'Char' and 'WideChar'"

OK, I now understand what you meant: It should still compile even for pre-Unicode Delphis even if UnicodeRE is defined. Well, it doesn't. That's because the ARegExpr parameter of RegExprSubExpressions is a string, not a wide string. Declaring it as RegExprString solves that problem.

from tregexpr.

Alexey-T avatar Alexey-T commented on September 18, 2024

good catch. I fixed it as suggested.

from tregexpr.

dummzeuch avatar dummzeuch commented on September 18, 2024

There is only a warning left:
regexpr.pas(4119) Warning: Variable 'GrpIndex' might not have been initialized

No idea how to fix that or if it needs fixing at all.

from tregexpr.

Alexey-T avatar Alexey-T commented on September 18, 2024

It doesn't need fixing (if you look at the code flow, it is initialized). I added an explicit initialization anyway.

from tregexpr.

dummzeuch avatar dummzeuch commented on September 18, 2024

Now I get a compile error for Unicode Delphi (10.2), if UnicodeRE is not defined:
[dcc32 Error] regexpr.pas(1259): E2010 Incompatible types: 'Char' and 'AnsiChar'

Should that combination also compile? I'm not sure it makes sense.

Edit: Even changing the obvious

PRegExprChar = PAnsiChar;
REChar = AnsiChar;

... only results in a different compile error:
[dcc32 Error] regexpr.pas(1000): E2010 Incompatible types: 'AnsiChar' and 'Char'

I now definitely think it's not worth the effort.

from tregexpr.

Alexey-T avatar Alexey-T commented on September 18, 2024

We have at line 152: REChar = Char;
What if you replace 'Char' with 'AnsiChar'?

from tregexpr.

Alexey-T avatar Alexey-T commented on September 18, 2024
  PRegExprChar = PChar;
  RegExprString = AnsiString;
  REChar = Char;

PChar to PAnsiChar too!

from tregexpr.

dummzeuch avatar dummzeuch commented on September 18, 2024

I did that, see above.

from tregexpr.

Alexey-T avatar Alexey-T commented on September 18, 2024

I fixed that obvious part; now you get an error at line 1000. I cannot fix it: I don't have a new Delphi, and I don't want to install it, sorry. Can you look at what we do near line 1000 and fix it?

from tregexpr.

dummzeuch avatar dummzeuch commented on September 18, 2024

I think in order to fix this, the unit AnsiStrings must be added to the uses list. Unfortunately that has quite a lot of side effects.

from tregexpr.

Alexey-T avatar Alexey-T commented on September 18, 2024

It makes no sense to disable UnicodeRE on Unicode Delphi, you are right.

from tregexpr.

dummzeuch avatar dummzeuch commented on September 18, 2024

Yes, adding the AnsiStrings unit solves the compile error, but produces a lot of warnings.
I suggest adding

{$IFDEF UNICODE}
{$DEFINE UNICODERE}
{$ENDIF}

instead.

from tregexpr.

Alexey-T avatar Alexey-T commented on September 18, 2024

adding the AnsiStrings unit

I don't know what Delphi versions support it, we need good IFDEFs for this unit

from tregexpr.

Alexey-T avatar Alexey-T commented on September 18, 2024

{$IFDEF UNICODE}
{$DEFINE UNICODERE}
{$ENDIF}

Makes little sense, coz we define UnicodeRE by default

from tregexpr.

dummzeuch avatar dummzeuch commented on September 18, 2024

AnsiStrings was introduced with Delphi 2009 and got renamed to System.AnsiStrings at some later time. But since the Unit prefix System is always in the list of Unit Scope Names, simply using AnsiStrings would be enough. But as I said above: Adding it would make it compile but leave very many warnings.

from tregexpr.

Alexey-T avatar Alexey-T commented on September 18, 2024

Then let us not use AnsiStrings.

from tregexpr.

dummzeuch avatar dummzeuch commented on September 18, 2024

Makes little sense, coz we define UnicodeRE by default

Alternatively you could add

{$IFDEF UNICODE}
{$IFNDEF UNICODERE}
{$MESSAGE ERROR 'you cannot undefine UNICODERE for Unicode Delphi versions'}
{$ENDIF}
{$ENDIF}

from tregexpr.

Alexey-T avatar Alexey-T commented on September 18, 2024

Okay, added this code

from tregexpr.

dummzeuch avatar dummzeuch commented on September 18, 2024

I would have moved that below the last DEFINE that users may set:

{$DEFINE UnicodeRE} // Use WideChar for characters and UnicodeString/WideString for strings
{ off $DEFINE UnicodeEx} // Support Unicode >0xFFFF, e.g. emoji, e.g. "." must find 2 WideChars of 1 emoji
{ off $DEFINE UseWordChars} // Use WordChars property, otherwise fixed list 'a'..'z','A'..'Z','0'..'9','_'
{ off $DEFINE UseSpaceChars} // Use SpaceChars property, otherwise fixed list
{ off $DEFINE UseLineSep} // Use LineSeparators property, otherwise fixed line-break chars
{$IFDEF UNICODE}
{$IFNDEF UnicodeRE}
{$MESSAGE ERROR 'You cannot undefine UnicodeRE for Unicode Delphi versions'}
{$ENDIF}
{$ENDIF}

because otherwise it is more difficult to figure out that there are more user definable symbols.

It's getting late again and I think we have now covered all issues. I will close this ticket now. Thanks again.

btw: I have now added regexpr to GExperts in order to replace an outdated version of that unit that comes with SynEdit. This fixes a case-sensitivity problem in GExperts Grep, described here: https://en.delphipraxis.net/topic/6608-gexperts-grep-is-always-case-sensitive-when-regular-expressions-are-enabled/

from tregexpr.
