Comments (18)

sakairyota commented on June 15, 2024

Dear @khaled-alshamaa
Thank you for your response, and I'm sorry for misunderstanding the original issue.

Additionally, I noticed that in the arIdentify function, $cDec in the first loop iteration appears to read the 2nd byte of $str (the byte at index 1). If that is true and not intended, it may be related.

from ar-php.

sakairyota commented on June 15, 2024

Hi,

A similar problem has occurred in our project, and we have found some of the conditions that trigger it.

We are using the utf8Glyphs function. The following call

$result = $arabic->utf8Glyphs("ٱد");

produces an invalid UTF-8 byte stream:

unpack('C*', $result);
=> [
     1 => 217,
     2 => 32,
     3 => 239,
     4 => 186,
     5 => 169,
   ]

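The sequence 217, 32 is not valid UTF-8: 217 (0xD9) is the lead byte of a two-byte character and must be followed by a continuation byte in the range 0x80-0xBF, but 32 (0x20) is an ASCII space.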
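The byte dump above can be checked independently of PHP. The following is a minimal Python sketch that only inspects the five reported bytes; the helper names `is_valid_utf8` and `is_continuation` are made up for this illustration and are not part of ar-php.

```python
# Bytes reported in this comment for utf8Glyphs("ٱد").
reported = bytes([217, 32, 239, 186, 169])


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def is_continuation(byte: int) -> bool:
    """UTF-8 continuation bytes have the bit pattern 10xxxxxx."""
    return (byte & 0b1100_0000) == 0b1000_0000


print(is_valid_utf8(reported))        # False
print(is_continuation(reported[1]))   # False: 0x20 is an ASCII space
# The last three bytes on their own are a well-formed character (U+FEA9).
print(f"U+{ord(reported[2:].decode('utf-8')):04X}")
```

So only the first two bytes are damaged; the trailing three bytes are the correctly shaped isolated form of د.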

We found that the following conditions reproduce the issue:

  • The text starts with a non-Arabic character (a character outside the range U+060C to U+0652).
  • When the character just before the first Arabic character is an ASCII character, it is replaced by a space ' '.
  • When the character just before the first Arabic character is a multi-byte character, the sequence becomes invalid (the last byte of that character is replaced by a space byte).

For example, "aد" becomes " ﺩ", and "あد" becomes b"Òü ´║®" (an invalid UTF-8 sequence).

Additionally, ٱ (U+0671) is an Arabic character but lies outside the range U+060C to U+0652, so "ٱد" (the first example) also produces an invalid sequence.
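The conditions above can be reproduced on paper with a short Python sketch. It assumes the check is a plain code point range test over U+060C to U+0652, as described in this comment (the library's real code is not shown in this thread), and it only mimics the observed symptom by overwriting the byte just before the first in-range character with a space. It does not perform any glyph shaping, and `in_reported_range` and `simulate_corruption` are hypothetical names.

```python
RANGE_START, RANGE_END = 0x060C, 0x0652  # range quoted in this comment


def in_reported_range(ch: str) -> bool:
    """True if the character falls inside the quoted Arabic range."""
    return RANGE_START <= ord(ch) <= RANGE_END


def simulate_corruption(text: str) -> bytes:
    """Mimic the reported symptom only; this is not the library's code."""
    data = bytearray()
    seen_arabic = False
    for ch in text:
        if not seen_arabic and in_reported_range(ch):
            seen_arabic = True
            if data:
                data[-1] = 0x20  # byte before the first in-range char
        data += ch.encode("utf-8")
    return bytes(data)


for sample in ("aد", "あد", "ٱد"):
    out = simulate_corruption(sample)
    try:
        out.decode("utf-8")
        status = "valid"
    except UnicodeDecodeError:
        status = "invalid"
    print(out.hex(" "), status)
```

Under these assumptions the ASCII case stays valid (the "a" simply turns into a space), while the two multi-byte cases end in a lead byte followed by 0x20, which matches the 217, 32 prefix in the dump.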

Thank you.

muotaz commented on June 15, 2024

Peace be upon you,

Please have a look at the following ticket:
#25

mouadh-dev commented on June 15, 2024

Dear @khaled-alshamaa,
I am sorry if I did not explain my problem clearly, but I did mention the direction in the issue.
Thank you for all the help you have given me.

mouadh-dev commented on June 15, 2024

Dear @khaled-alshamaa,
Thanks for your effort, I appreciate it.

sakairyota commented on June 15, 2024

Dear @khaled-alshamaa

Thanks

khaled-alshamaa commented on June 15, 2024

Dear @mouadh-dev,

Please check issue #25: the same problem was reported before, and we have already fixed it. Either download the latest development version from here on GitHub, or wait until we release the next version, 6.3, within a week from now.

Thanks for your feedback 🤗

khaled-alshamaa commented on June 15, 2024

Dear @sakairyota,

Thanks for reporting this and providing very clear instructions to reproduce it. Please note that it is a different problem from the original issue reported in this thread.

It looks critical, so I will look at it carefully. I hope to resolve it and include the fix in the upcoming version 6.3, which we will release very soon 😉

khaled-alshamaa commented on June 15, 2024

Dear @sakairyota,

Thanks again for bringing this issue to the table. It comes at just the right time, as we are extensively reviewing the legacy utf8Glyphs part of the code. Your issue was not straightforward and touched on several parts, so I will divide and conquer ;-)

  1. The $cDec in the arIdentifier method does not start from the 2nd byte, because the initial value of $i is -1.
  2. We have a method called addGlyphs for adding any extra letter that does not yet exist in the library (such as the letter ٱ that you mentioned). You can add it with the following line of code:
$Arabic->addGlyphs('ٱ', 'FB50FB51FB50FB51', false, true);
  • The 2nd parameter is a string of 16 hexadecimal digits giving the letter's Unicode code points in the following order: ISOLATED FORM, FINAL FORM, INITIAL FORM, MEDIAL FORM.
  • The 3rd parameter tells the library that the added character cannot connect to the following letter.
  • The 4th parameter tells the library that the added character can connect to the previous letter.
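For readers unfamiliar with the presentation forms, the 16-digit string can be unpacked as follows. This is a small Python sketch that only splits the string in the order given above and looks the code points up in the Unicode database; it does not call ar-php, and `split_glyph_forms` is a hypothetical helper name.

```python
import unicodedata

# Order of the four 4-digit groups, as described in the list above.
FORM_ORDER = ("isolated", "final", "initial", "medial")


def split_glyph_forms(hex_string: str) -> dict:
    """Split a 16-hex-digit string into four named code points."""
    if len(hex_string) != 16:
        raise ValueError("expected 16 hexadecimal digits")
    chunks = [hex_string[i:i + 4] for i in range(0, 16, 4)]
    return {form: chr(int(chunk, 16)) for form, chunk in zip(FORM_ORDER, chunks)}


forms = split_glyph_forms("FB50FB51FB50FB51")
for form, ch in forms.items():
    print(f"{form:9} U+{ord(ch):04X} {unicodedata.name(ch)}")
```

The isolated and final code points appear twice in the string. That fits the 3rd parameter being false: a letter that does not join to the following letter has no separate initial or medial shape, so the isolated and final forms are reused in those slots.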

But when I tried that solution myself, I noticed another related issue: arIdentifier does not recognize that letter as an Arabic letter! When I dug deeper, I found that the range defined in that method was incomplete and missed some extended Arabic characters. Therefore, I redefined it properly using this reference, and it is working fine now.

One final note, the new version will have an external JSON file where you can contribute easily by adding any Arabic characters not already included.

I will work now on the other issues you mentioned. Sorry to be late in handling these reported bugs. I am just waiting for my weekend to work on this side project.

khaled-alshamaa commented on June 15, 2024

Hi @sakairyota, it's me again!

Thank you for carefully reviewing the utf8Glyphs method and for reporting this precise feedback with clear reproducing instructions.

I am happy to let you know that we have also solved the issue you submitted, where Arabic and non-Arabic characters appear in the same word (i.e., with no space separator), as well as the case of losing the first character when a string/line starts with a non-Arabic character.

khaled-alshamaa commented on June 15, 2024

Dear @sakairyota, please download the latest version directly from GitHub, test it, and close this issue if we resolved your reported problem properly.

mouadh-dev commented on June 15, 2024

Dear @khaled-alshamaa,
This is my problem, and it is not resolved yet: the RTL direction is still not working.

khaled-alshamaa commented on June 15, 2024

Dear @mouadh-dev,

I already resolved the problem you mentioned at the root of this thread in this previous reply.

Please note that you said nothing about any RTL problem, but let me guess from your screenshot that you are referring to the right-aligned text issue, which is simply outside the scope of this library!

It is related to the output function/library you are using. For example, imagettftext() has no parameter to control alignment. There are some tricks to overcome that limitation using imagettfbbox(), like the one described in this Stack Overflow post.
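The arithmetic behind that kind of trick is short. The sketch below, in Python, is not the code from the linked post. It assumes a bounding box in the layout PHP's imagettfbbox() documents (eight integers: the x, y pairs of the lower-left, lower-right, upper-right and upper-left corners) and computes the x coordinate to pass to the drawing call so that the text ends at the right edge. The function name `right_aligned_x` and the sample numbers are made up for illustration.

```python
def right_aligned_x(bbox, image_width, margin=0):
    """Return the x start position that right-aligns the measured text.

    bbox is assumed to hold eight integers, relative to the text origin:
    [ll_x, ll_y, lr_x, lr_y, ur_x, ur_y, ul_x, ul_y].
    """
    if len(bbox) != 8:
        raise ValueError("expected eight bounding-box values")
    rightmost = max(bbox[0::2])  # largest x among the four corners
    # Text drawn at x spans up to x + rightmost, so solve for x.
    return image_width - margin - rightmost


# Illustrative values only, not measured from a real font.
sample_bbox = [-1, 3, 120, 3, 120, -18, -1, -18]
print(right_aligned_x(sample_bbox, image_width=400, margin=10))  # 270
```

In PHP the same subtraction would be applied to the array returned by imagettfbbox() before calling imagettftext().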

I hope these notes give you enough clue to solve your problem ;-)

sakairyota commented on June 15, 2024

Dear @khaled-alshamaa

Thank you for your response and the modification.

"The $cDec in the arIdentifier method does not start from the 2nd byte because the initial $i value is -1"

Yes, it starts at -1, but it is incremented in the while condition and becomes 0, and then $ascii[$i + 1] is evaluated. So I think the first $cDec value is $ascii[1].
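The two readings can be compared with a small Python sketch. It assumes a loop of the shape `while (++$i < $max) { $cDec = $ascii[$i + 1]; ... }`, which is inferred only from the description in this thread and not copied from the library. Because the thread does not show how `$ascii` is built, the sketch models both a 0-based list and a 1-based map like the one `unpack('C*', ...)` printed earlier in this thread. `first_index_read` is a hypothetical name.

```python
def first_index_read(start: int = -1) -> int:
    """Index used by the first $cDec read under the assumed loop shape."""
    i = start
    i += 1        # pre-increment in the while condition: -1 becomes 0
    return i + 1  # the body then reads $ascii[$i + 1]


text_bytes = "ٱد".encode("utf-8")  # D9 B1 D8 AF

zero_based = list(text_bytes)                          # keys 0..3
one_based = dict(enumerate(text_bytes, start=1))       # keys 1..4

idx = first_index_read()
print(idx)                   # 1
print(hex(zero_based[idx]))  # 0xb1, the 2nd byte
print(hex(one_based[idx]))   # 0xd9, the 1st byte
```

Under these assumptions the first read always uses index 1. Whether that is the first or the second byte depends on how the array is keyed, which would need to be confirmed in the library source.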

The new JSON format looks good and extensible. Unfortunately, none of our team members is familiar with Arabic or the Arabic script, so it is somewhat difficult for us to use it. (I learned a little about the Arabic script system from Wikipedia and similar sources, but it is still difficult to find the right forms, especially for letters in the extended blocks :( .)

The fixed version looks good. I'll try it.

"Sorry to be late in handling these reported bugs."
I understand; if anything, your quick responses are a great help. Thanks a lot.

sakairyota commented on June 15, 2024

Dear @khaled-alshamaa

I have tried 3 patterns, and 2 of them are OK. However, 'ٱب' becomes 'ب&#x;' without any configuration. I have fixed this by setting default values in arGlyphs so that the original character is returned when addGlyphs has not been called.
I have also added the 3 patterns to the unit tests.
Please refer to #30.

I also found that testEnglishToArabicTransliteration15Cases fails.

khaled-alshamaa commented on June 15, 2024

Thanks for your efforts. I will check it on the weekend and get back to you.

khaled-alshamaa commented on June 15, 2024

Dear @sakairyota,

The 3rd pattern doesn't work because you didn't add the letter using addGlyphs() method before calling utf8Glyphs() for that string:

$Arabic->addGlyphs('ٱ', 'FB50FB51FB50FB51', false, true);

Because we are in a rush to release version 6.3 of the library, I will merge your PR in appreciation of your efforts. However, I will remove the code block you added to arGlyphsInit(), because it is not a suitable mechanism for handling the contextual glyphs of Arabic letters (check this Arabic Presentation Forms-A). Also, the library code can properly handle characters not listed in the ar_glyphs.json file (but it will not process their glyphs unless you add them using addGlyphs()). The new test cases you added to the PHPUnit file are still valid, but I need to add the call to addGlyphs() as mentioned above.

khaled-alshamaa commented on June 15, 2024

I fixed the issues in your PR #30
