libchardet - Mozilla's Universal Charset Detector C/C++ API

License

Copyright (c) 2019 JoungKyun.Kim http://oops.org All rights reserved.

This program is licensed under MPL 1.1 or LGPL 2.1.

Description

libchardet is based on the Mozilla Universal Charset Detector library and detects the character set used to encode data.

The original code was written by Netscape Communications Corporation. The techniques used by universalchardet are described at <http://www-archive.mozilla.org/projects/intl/UniversalCharsetDetection.html>.

libchardet also draws on John Gardiner Myers's Encode-Detect-1.01 Perl module, and adds a C wrapper API and a library build environment based on libtool.

Since 1.0.5, libchardet incorporates uchardet's single-byte charset detection confidence algorithm and its new language models (Arabic, Danish, Esperanto, German, Spanish, Turkish, Vietnamese).

Since 1.0.6, a bom member has been added to the DetectObj structure. A value of 1 means that a BOM was detected. Whether the bom member is supported can be determined by checking whether the CHARDET_BOM_CHECK constant is defined. See the sample code below.
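
For readers unfamiliar with byte order marks, here is a minimal sketch of what a BOM check involves. It is not libchardet's implementation: the function name `bom_name` is hypothetical, and the sketch assumes only the standard Unicode BOM byte sequences and the C standard library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of libchardet: returns the name of the
 * encoding whose BOM starts the buffer, or NULL if no BOM is present. */
const char *bom_name (const unsigned char *buf, size_t len) {
     assert (buf != NULL || len == 0);

     /* UTF-32 is tested before UTF-16 because the UTF-32LE BOM
      * (FF FE 00 00) begins with the UTF-16LE BOM (FF FE). */
     if ( len >= 4 && memcmp (buf, "\xFF\xFE\x00\x00", 4) == 0 )
          return "UTF-32LE";
     if ( len >= 4 && memcmp (buf, "\x00\x00\xFE\xFF", 4) == 0 )
          return "UTF-32BE";
     if ( len >= 3 && memcmp (buf, "\xEF\xBB\xBF", 3) == 0 )
          return "UTF-8";
     if ( len >= 2 && memcmp (buf, "\xFF\xFE", 2) == 0 )
          return "UTF-16LE";
     if ( len >= 2 && memcmp (buf, "\xFE\xFF", 2) == 0 )
          return "UTF-16BE";

     return NULL;
}
```

With libchardet itself, none of this is needed: when CHARDET_BOM_CHECK is defined, the bom member of DetectObj reports the result.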

Installation

See the INSTALL document.

Sample Codes

See also the test directory of the source code.

       #include <stdio.h>
       #include <string.h>
       #include <chardet.h>

       int main (void) {
            DetectObj *obj;
            char * str = "안녕하세요";

            if ( (obj = detect_obj_init ()) == NULL ) {
                 fprintf (stderr, "Memory Allocation failed\n");
                 return CHARDET_MEM_ALLOCATED_FAIL;
            }

       #ifndef CHARDET_BINARY_SAFE
            // before 1.0.5; this API is deprecated as of 1.0.5
            switch (detect (str, &obj))
       #else
            // from 1.0.5
            switch (detect_r (str, strlen (str), &obj))
       #endif
            {
                 case CHARDET_OUT_OF_MEMORY :
                      fprintf (stderr, "Out of memory while processing data\n");
                      detect_obj_free (&obj);
                      return CHARDET_OUT_OF_MEMORY;
                 case CHARDET_NULL_OBJECT :
                      fprintf (stderr,
                                "The DetectObj argument must be allocated "
                                "with the detect_obj_init API\n");
                      return CHARDET_NULL_OBJECT;
            }

        #ifndef CHARDET_BOM_CHECK
            printf ("encoding: %s, confidence: %f\n", obj->encoding, obj->confidence);
        #else
            // from 1.0.6, also reports whether a BOM exists
            printf (
                "encoding: %s, confidence: %f, exist BOM: %d\n",
                 obj->encoding, obj->confidence, obj->bom
            );
        #endif
            detect_obj_free (&obj);

            return 0;
       }

Or, when detecting repeatedly in a loop with a reusable handle:

       #include <stdio.h>
       #include <string.h>
       #include <chardet.h>

       int main (void) {
            Detect    * d;
            DetectObj * obj;
            char * str = "안녕하세요";

            if ( (d = detect_init ()) == NULL ) {
                 fprintf (stderr, "chardet handle initialize failed\n");
                 return CHARDET_MEM_ALLOCATED_FAIL;
            }

            while ( 1 ) {
                detect_reset (&d);

                if ( (obj = detect_obj_init ()) == NULL ) {
                     fprintf (stderr, "Memory Allocation failed\n");
                     // release the handle before leaving
                     detect_destroy (&d);
                     return CHARDET_MEM_ALLOCATED_FAIL;
                }

       #ifndef CHARDET_BINARY_SAFE
                // before 1.0.5; this API is deprecated as of 1.0.5
                switch (detect_handledata (&d, str, &obj))
       #else
                // from 1.0.5
                switch (detect_handledata_r (&d, str, strlen (str), &obj))
       #endif
                {
                     case CHARDET_OUT_OF_MEMORY :
                          fprintf (stderr, "Out of memory while processing data\n");
                          detect_obj_free (&obj);
                          detect_destroy (&d);
                          return CHARDET_OUT_OF_MEMORY;
                     case CHARDET_NULL_OBJECT :
                          fprintf (stderr,
                                    "The DetectObj argument must be allocated "
                                    "with the detect_obj_init API\n");
                          detect_destroy (&d);
                          return CHARDET_NULL_OBJECT;
                }

        #ifndef CHARDET_BOM_CHECK
                printf ("encoding: %s, confidence: %f\n", obj->encoding, obj->confidence);
        #else
                // from 1.0.6, also reports whether a BOM exists
                printf (
                    "encoding: %s, confidence: %f, exist BOM: %d\n",
                    obj->encoding, obj->confidence, obj->bom
                );
        #endif
                detect_obj_free (&obj);

                if ( 1 )
                    break;
            }
            detect_destroy (&d);

            return 0;
       }
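
The difference between the two API generations in the samples above is the explicit length argument. The sketch below does not use libchardet at all; it only illustrates, in standard C, why a length matters for input that contains zero bytes, which is presumably the motivation for the binary-safe variants (`detect_r`, `detect_handledata_r`). The helper name `visible_length` and the sample buffer are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* "Hi" encoded as UTF-16LE with a BOM: FF FE 48 00 69 00 */
const char utf16le_hi[] = { '\xFF', '\xFE', 'H', '\0', 'i', '\0' };

/* Hypothetical helper: the number of bytes an API that relies on NUL
 * termination would see in a buffer that really holds len bytes. */
size_t visible_length (const char *buf, size_t len) {
     const char *nul;

     assert (buf != NULL);

     /* memchr stays within len bytes, unlike strlen */
     nul = memchr (buf, '\0', len);
     return nul == NULL ? len : (size_t) (nul - buf);
}
```

Here `visible_length (utf16le_hi, sizeof utf16le_hi)` is 3 although the buffer holds 6 bytes, so a call that depends on NUL termination would only see the first half of the data. Passing the real length, as in `detect_r (buf, len, &obj)`, avoids that truncation.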

APIs
