Giter VIP home page Giter VIP logo

cchardet's Introduction

cChardet

cChardet is high speed universal character encoding detector. - binding to uchardet.

PyPI version

Build for Linux

Build for macOS

Build for Windows

Supported Languages/Encodings

  • International (Unicode)
    • UTF-8
    • UTF-16BE / UTF-16LE
    • UTF-32BE / UTF-32LE / X-ISO-10646-UCS-4-34121 / X-ISO-10646-UCS-4-21431
  • Arabic
    • ISO-8859-6
    • WINDOWS-1256
  • Bulgarian
    • ISO-8859-5
    • WINDOWS-1251
  • Chinese
    • ISO-2022-CN
    • BIG5
    • EUC-TW
    • GB18030
    • HZ-GB-2312
  • Croatian:
    • ISO-8859-2
    • ISO-8859-13
    • ISO-8859-16
    • Windows-1250
    • IBM852
    • MAC-CENTRALEUROPE
  • Czech
    • Windows-1250
    • ISO-8859-2
    • IBM852
    • MAC-CENTRALEUROPE
  • Danish
    • ISO-8859-1
    • ISO-8859-15
    • WINDOWS-1252
  • English
    • ASCII
  • Esperanto
    • ISO-8859-3
  • Estonian
    • ISO-8859-4
    • ISO-8859-13
    • ISO-8859-13
    • Windows-1252
    • Windows-1257
  • Finnish
    • ISO-8859-1
    • ISO-8859-4
    • ISO-8859-9
    • ISO-8859-13
    • ISO-8859-15
    • WINDOWS-1252
  • French
    • ISO-8859-1
    • ISO-8859-15
    • WINDOWS-1252
  • German
    • ISO-8859-1
    • WINDOWS-1252
  • Greek
    • ISO-8859-7
    • WINDOWS-1253
  • Hebrew
    • ISO-8859-8
    • WINDOWS-1255
  • Hungarian:
    • ISO-8859-2
    • WINDOWS-1250
  • Irish Gaelic
    • ISO-8859-1
    • ISO-8859-9
    • ISO-8859-15
    • WINDOWS-1252
  • Italian
    • ISO-8859-1
    • ISO-8859-3
    • ISO-8859-9
    • ISO-8859-15
    • WINDOWS-1252
  • Japanese
    • ISO-2022-JP
    • SHIFT_JIS
    • EUC-JP
  • Korean
    • ISO-2022-KR
    • EUC-KR / UHC
  • Lithuanian
    • ISO-8859-4
    • ISO-8859-10
    • ISO-8859-13
  • Latvian
    • ISO-8859-4
    • ISO-8859-10
    • ISO-8859-13
  • Maltese
    • ISO-8859-3
  • Polish:
    • ISO-8859-2
    • ISO-8859-13
    • ISO-8859-16
    • Windows-1250
    • IBM852
    • MAC-CENTRALEUROPE
  • Portuguese
    • ISO-8859-1
    • ISO-8859-9
    • ISO-8859-15
    • WINDOWS-1252
  • Romanian:
    • ISO-8859-2
    • ISO-8859-16
    • Windows-1250
    • IBM852
  • Russian
    • ISO-8859-5
    • KOI8-R
    • WINDOWS-1251
    • MAC-CYRILLIC
    • IBM866
    • IBM855
  • Slovak
    • Windows-1250
    • ISO-8859-2
    • IBM852
    • MAC-CENTRALEUROPE
  • Slovene
    • ISO-8859-2
    • ISO-8859-16
    • Windows-1250
    • IBM852
    • M

Example

# -*- coding: utf-8 -*-
import cchardet as chardet
with open(r"src/tests/samples/wikipediaJa_One_Thousand_and_One_Nights_SJIS.txt", "rb") as f:
    msg = f.read()
    result = chardet.detect(msg)
    print(result)

Benchmark

$ cd src/
$ pip install chardet
$ python tests/bench.py

Results

CPU: Intel(R) Core(TM) i7-9700K CPU @ 3.60GHz

RAM: DDR4-3200 64GB

Platform: Ubuntu 20.04 amd64

Python 3.9.0

Request (call/s)
chardet v3.0.4

0.46

cchardet v2.1.7

1404.05

LICENSE

See COPYING file.

Contact

Platform

Support

  • Windows i686, x86_64
  • Linux i686, x86_64
  • macOS x86_64

Do not Support

cchardet's People

Contributors

pyyoshi avatar dependabot[bot] avatar craigds avatar decaz avatar meshy avatar denik avatar moden-py avatar nirzak avatar asapo avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.