php-mb-string's Introduction

php-mb-string

A high-performance multibyte string implementation for frequent read/write operations.

Why Did I Write This Package?

Suppose you have a LONG multibyte string and you want to perform many of the following operations on it.

  • Random reading/writing, such as $char = $str[5]; or $str[5] = '許';.
  • Replacement, such as str_replace($search, $replace, $str);.
  • Insertion, such as substr_replace($str, $insert, $position, 0);.
  • Getting a substring, such as substr($str, $start, $length);.

PHP strings are plain byte sequences with no notion of character encoding, so to perform the operations above safely on multibyte text you have to either use the mb_*() functions or calculate the byte offsets yourself. Calling mb_*() functions frequently can hurt performance, because each call has to decode the source string again according to the given encoding. The longer the string, the more severe the problem becomes.
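To illustrate the byte-versus-character mismatch described above, here is a minimal sketch. It assumes the mbstring extension is available and that the source file is saved as UTF-8; the sample string and variable names are made up for illustration.

```php
<?php
// Sample UTF-8 string: three CJK characters (3 bytes each), a space, and "test".
$str = '許功蓋 test';

// strlen() counts bytes, mb_strlen() counts characters.
$byteLength = strlen($str);
$charLength = mb_strlen($str, 'UTF-8');

// Array-style access returns a single byte, which here is only
// the first third of the character '許'.
$firstByte = $str[0];

// mb_substr() decodes the string and returns the whole character.
$firstChar = mb_substr($str, 0, 1, 'UTF-8');

// A lone lead byte is not valid UTF-8 on its own.
$firstByteIsValidUtf8 = mb_check_encoding($firstByte, 'UTF-8');
```

Here `$byteLength` and `$charLength` differ, and `$firstByte` is not a usable character, which is why plain byte indexing is unsafe on such strings.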

Instead, this class stores the string internally in UTF-32, which is fixed-width (one character always occupies 4 bytes), so random access is fast. With random access available, the class can use the str_*() functions internally to do the job.
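The fixed-width idea can be sketched as follows. This is not the package's actual code: the helper names (toUtf32, fromUtf32, charAt, setCharAt) are hypothetical, and the sketch assumes the mbstring extension and big-endian UTF-32 without a byte-order mark.

```php
<?php
// Hypothetical helpers for illustration only; not the API of jfcherng/php-mb-string.

/** Convert UTF-8 text to fixed-width UTF-32 (big-endian, no BOM). */
function toUtf32(string $utf8): string
{
    return mb_convert_encoding($utf8, 'UTF-32BE', 'UTF-8');
}

/** Convert UTF-32BE text back to UTF-8. */
function fromUtf32(string $utf32): string
{
    return mb_convert_encoding($utf32, 'UTF-8', 'UTF-32BE');
}

/** Read the character at a character index: it always starts at byte index * 4. */
function charAt(string $utf32, int $index): string
{
    return fromUtf32(substr($utf32, $index * 4, 4));
}

/** Overwrite the character at a character index with a single UTF-8 character. */
function setCharAt(string $utf32, int $index, string $utf8Char): string
{
    return substr_replace($utf32, toUtf32($utf8Char), $index * 4, 4);
}

$raw = toUtf32('PHP 字串');      // decoded once, up front
$fifth = charAt($raw, 4);         // '字'
$raw = setCharAt($raw, 5, '許');  // replace '串' with '許'
$result = fromUtf32($raw);        // 'PHP 字許'
```

Because every character sits at a byte offset that is a multiple of 4, the byte-oriented substr() and substr_replace() are enough for reads and writes, and the string is decoded only once instead of on every call.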

Installation

composer require jfcherng/php-mb-string

Example

See tests/MbStringTest.php.

Benchmark

See benchmark/_results.txt.

What Are You Doing With This Package?

I developed this for a PHP diff package, jfcherng/php-diff.

php-mb-string's People

Contributors

jfcherng

php-mb-string's Issues

substr() expects parameter 1 to be string, bool given in MbString.php:340

Fatal error: Uncaught TypeError: substr() expects parameter 1 to be string, bool given in /srv/src/vendor/jfcherng/php-mb-string/src/MbString.php:340
Stack trace:
#0 /srv/src/vendor/jfcherng/php-mb-string/src/MbString.php(340): substr(false, 4)
#1 /srv/src/vendor/jfcherng/php-mb-string/src/MbString.php(71): Jfcherng\Utility\MbString->inputConv('\t\t\tvar re = '',...')
#2 /srv/src/vendor/jfcherng/php-diff/src/Renderer/Html/AbstractHtml.php(153): Jfcherng\Utility\MbString->set('\t\t\tvar re = '',...')
#3 /srv/src/vendor/jfcherng/php-diff/src/Renderer/Html/AbstractHtml.php(89): Jfcherng\Diff\Renderer\Html\AbstractHtml->renderChangedExtent(Object(Jfcherng\Diff\Renderer\Html\LineRenderer\Char), '\t\t\tvar re = '',...', '\t\t\tvar re = '',...')
#4 /srv/src/vendor/jfcherng/php-diff/src/Renderer/Html/AbstractHtml.php(113): Jfcherng\Diff\Renderer\Html\AbstractHtml->getChanges(Object(Jfcherng\Diff\Differ))
#5 /srv/src/vendor/jfcherng/php-diff/src/Renderer/AbstractRenderer.php(172): Jfcherng\Diff\Renderer\Html\AbstractHtml->renderWo in /srv/src/vendor/jfcherng/php-mb-string/src/MbString.php on line 340

iconv() can return a string or false.

I am guessing this string is what is breaking it but that's a different issue '\\s!"#$%&()*+,-./:;<=>?@[\]^_{|}����������������\u201d\u201c'
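Since iconv() returns false on failure, one way to avoid passing false on to substr() is to check the result first. The following is a minimal sketch, not the library's actual fix: the function name utf8ToUtf32OrFail is hypothetical, and the sketch requests UTF-32BE so that no byte-order mark has to be stripped, which is a choice made here and not necessarily what the library does.

```php
<?php
/**
 * Hypothetical helper: convert UTF-8 to UTF-32BE, or throw if iconv() fails.
 */
function utf8ToUtf32OrFail(string $utf8): string
{
    // '@' suppresses the notice iconv() raises on illegal input;
    // the failure is reported through the exception below instead.
    $converted = @iconv('UTF-8', 'UTF-32BE', $utf8);

    if ($converted === false) {
        throw new InvalidArgumentException(
            'Input could not be converted from UTF-8 to UTF-32.'
        );
    }

    return $converted;
}
```

With such a check, input that is not valid UTF-8 produces a descriptive exception at the conversion step instead of a TypeError further down.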

iconv behaviour unreliable

Hi!

I've been building a CMS versioning system using your excellent php-diff library which of course has php-mb-string as a dependency.
It turns out that it works in some server environments but not in others, due to its use of iconv to convert the encoding to UTF-32.
https://github.com/jfcherng/php-mb-string/blob/master/src/MbString.php#L361

For example, locally \iconv('UTF-8', 'UTF-32', 'This is a string.') outputs the expected string. However, on some hosting providers it returns false.

In a perfect world we could control the server environments it gets installed on, but often it's not the case. 😅

Is there any chance of switching to something like mb_convert_encoding instead of iconv?
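One possible shape for such a change is to try iconv first and fall back to mb_convert_encoding when iconv is missing or fails. This is only a sketch of the idea, not what the library does: the function name toUtf32WithFallback is hypothetical, and UTF-32BE is used here as an assumption to keep the output free of a byte-order mark.

```php
<?php
/**
 * Hypothetical helper: convert UTF-8 to UTF-32BE using whichever
 * extension is available and working on the current server.
 */
function toUtf32WithFallback(string $utf8): string
{
    if (function_exists('iconv')) {
        // '@' hides the notice iconv() raises when it cannot convert.
        $converted = @iconv('UTF-8', 'UTF-32BE', $utf8);
        if ($converted !== false) {
            return $converted;
        }
    }

    if (function_exists('mb_convert_encoding')) {
        return mb_convert_encoding($utf8, 'UTF-32BE', 'UTF-8');
    }

    throw new RuntimeException(
        'Neither iconv nor mbstring could convert the input to UTF-32.'
    );
}

$utf32 = toUtf32WithFallback('This is a string.');
```

Note that the two extensions treat malformed input differently: iconv() fails, while mb_convert_encoding() substitutes a replacement character, so the fallback path can change the result for invalid UTF-8.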
