libmobi's Introduction

Libmobi

C library for handling Mobipocket/Kindle (MOBI) ebook format documents.

The library comes with several command-line tools for working with mobi ebooks. The tools' source code may also serve as an example of how to use the library.

Features:

  • reading and parsing:
    • some older text Palmdoc formats (pdb),
    • Mobipocket files (prc, mobi),
    • newer MOBI files including KF8 format (azw, azw3),
    • Replica Print files (azw4)
  • recreating source files using indices
  • reconstructing references (links and embedded) in html files
  • reconstructing source structure that can be fed back to kindlegen
  • reconstructing dictionary markup (orth, infl tags)
  • writing back loaded documents
  • metadata editing
  • handling encrypted documents
  • encrypting documents for use on eInk Kindles

Todo:

  • improve writing
  • serialize rawml into raw records
  • process RESC records

Doxygen documentation:

Source:

Packages:

Installation:

[for git] $ ./autogen.sh
$ ./configure
$ make
[optionally] $ make test
$ sudo make install

On macOS, you can install via Homebrew with brew install libmobi.

Alternative build systems

  • The supported way of building the project is with autotools.
  • Optionally, the project provides basic support for CMake, Xcode and MSVC++. However, these alternative configurations do not cover all options of the autotools project. They are also neither tested nor updated regularly.

Usage

  • single include file: #include <mobi.h>
  • linker flag: -lmobi
  • basic usage:
#include <mobi.h>

/* Initialize main MOBIData structure */
/* Must be deallocated with mobi_free() when not needed */
MOBIData *m = mobi_init();
if (m == NULL) { 
  return ERROR; 
}

/* Open file for reading */
FILE *file = fopen(fullpath, "rb");
if (file == NULL) {
  mobi_free(m);
  return ERROR;
}

/* Load file into MOBIData structure */
/* This structure will hold raw data/metadata from mobi document */
MOBI_RET mobi_ret = mobi_load_file(m, file);
fclose(file);
if (mobi_ret != MOBI_SUCCESS) { 
  mobi_free(m);
  return ERROR;
}

/* Initialize MOBIRawml structure */
/* Must be deallocated with mobi_free_rawml() when not needed */
/* In the next step this structure will be filled with parsed data */
MOBIRawml *rawml = mobi_init_rawml(m);
if (rawml == NULL) {
  mobi_free(m);
  return ERROR;
}
/* Raw data from MOBIData will be converted to html, css, fonts, media resources */
/* Parsed data will be available in MOBIRawml structure */
mobi_ret = mobi_parse_rawml(rawml, m);
if (mobi_ret != MOBI_SUCCESS) {
  mobi_free(m);
  mobi_free_rawml(rawml);
  return ERROR;
}

/* Do something useful here */
/* ... */
/* For examples of how to access data in the MOBIRawml structure, see mobitool.c */

/* Free MOBIRawml structure */
mobi_free_rawml(rawml);

/* Free MOBIData structure */
mobi_free(m);

return SUCCESS;
  • for examples of usage, see tools

Requirements

  • compiler supporting C99
  • zlib (optional, configure --with-zlib=no to use included miniz.c instead)
  • libxml2 (optional, configure --with-libxml2=no to use internal xmlwriter)
  • tested with gcc (>=4.2.4), clang (llvm >=3.4), sun c (>=5.13), MSVC++ (2015)
  • builds on Linux, macOS, Windows (MSVC++, MinGW), Android, Solaris
  • tested architectures: x86, x86-64, arm, ppc
  • works cross-compiled on Kindle :)

Tests

  • Github Action status
  • Travis status
  • Coverity status

Projects using libmobi

License:

  • LGPL, either version 3, or any later

Credits:

  • The huffman decompression and KF8 parsing algorithms were learned by studying python source code of KindleUnpack.
  • Thanks to all contributors of Mobileread MOBI wiki

libmobi's People

Contributors

bfabiszewski, codetheweb, occia

libmobi's Issues

AZW3 file generates table of contents that does not work

I have an AZW3 file that I cannot post publicly, but I could send it to you by email for testing. When converted to EPUB, it produces a non-functional table of contents (TOC): the chapter names are correct, but the links do not work. The TOC entries look like this:

  <navPoint id="toc-2" playOrder="2">
   <navLabel>
    <text>CAP&amp;Iacute;TULO II: Otra mudanza ca&amp;oacute;tica</text>
   </navLabel>
   <content src="part00000.html#"/>
  </navPoint>

Note that the src attribute of the content element is missing a target after the # character. The same happens with internal links in the ebook text that point to the chapter titles. The same file converts fine to EPUB with, for example, Calibre.

BTW, I tried to email you privately about this first, but the email does not go through and sits in the retry queue. Your own mail server at your .net domain says that your email address is graylisted...

Greg

An issue with the implementation of mobi_buffer_get_varlen_internal in src/buffer.c

When you create a MOBIBuffer object:

typedef struct {
    size_t offset; /**< Current offset in respect to buffer start */
    size_t maxlen; /**< Length of the buffer data */
    unsigned char *data; /**< Pointer to buffer data */
    MOBI_RET error; /**< MOBI_SUCCESS = 0 if operation on buffer is successful, non-zero value on failure */
} MOBIBuffer;

the initial value of buf->offset is 0:

MOBIBuffer * mobi_buffer_init_null(unsigned char *data, const size_t len) {
    MOBIBuffer *buf = malloc(sizeof(MOBIBuffer));
    if (buf == NULL) {
        debug_print("%s", "Buffer allocation failed\n");
        return NULL;
    }
    buf->data = data;
    buf->offset = 0;
    buf->maxlen = len;
    buf->error = MOBI_SUCCESS;
    return buf;
}

I think there is a problem when mobi_buffer_get_varlen_internal is called with direction set to -1 (reading the buffer backwards) and buf->offset equal to 3.
According to its documentation, the function "reads maximum 4 bytes from the buffer" and "stops when byte has bit 7 set",
so with buf->offset at 3 it should read byte 3, byte 2, byte 1, and then byte 0.
But when it comes to reading byte 0, it hits the following check at line 267:
if (buf->offset < 1) {
Zero is less than 1, so an error is printed and only the 3 bytes read so far are returned, not all 4
(even though, according to the pull request, it should return 0).

If byte 0 needs to be read, the function should read it and then return without decrementing buf->offset. Decrementing an offset of 0 causes an integer underflow and leaves the maximum size_t value in buf->offset. I suggest checking whether buf->offset is 0 after the byte has been read and val has been updated. If buf->offset is 0,
we should check byte_count and decide accordingly whether to execute

                debug_print("%s", "End of buffer\n");
                buf->error = MOBI_BUFFER_END;
                return 0;

or to set byte to stop_flag, so that reading stops and val is returned while buf->offset stays at 0.
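
The suggestion above could be sketched roughly as follows. This is a minimal, self-contained illustration and not the library's code: SketchBuffer and sketch_get_varlen_dec are hypothetical names, the error field is a plain int rather than MOBI_RET, and the choice to report an error when byte 0 is consumed without a stop flag is one possible reading of the proposal.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for MOBIBuffer (not the library's type). */
typedef struct {
    size_t offset;             /* current position in data */
    size_t maxlen;             /* number of valid bytes in data */
    const unsigned char *data; /* buffer contents */
    int error;                 /* 0 on success, non-zero on failure */
} SketchBuffer;

/* Reads up to 4 bytes backwards, 7 payload bits per byte, starting at
 * buf->offset. Stops after a byte with bit 7 set or after 4 bytes.
 * Never decrements an offset of 0. Returns 0 and sets buf->error if the
 * start of the buffer is reached before the value is complete. */
static uint32_t sketch_get_varlen_dec(SketchBuffer *buf, size_t *len) {
    assert(buf != NULL && buf->data != NULL && len != NULL);
    const unsigned char stop_flag = 0x80;
    const unsigned char mask = 0x7f;
    uint32_t val = 0;
    uint32_t shift = 0;
    unsigned byte_count = 0;
    if (buf->offset >= buf->maxlen) {
        buf->error = 1;
        return 0;
    }
    for (;;) {
        const unsigned char byte = buf->data[buf->offset];
        val |= (uint32_t)(byte & mask) << shift;
        shift += 7;
        (*len)++;
        byte_count++;
        if ((byte & stop_flag) || byte_count == 4) {
            /* Value complete: step back only if there is room to do so. */
            if (buf->offset > 0) {
                buf->offset--;
            }
            return val;
        }
        if (buf->offset == 0) {
            /* Byte 0 consumed but no stop flag seen: end of buffer. */
            buf->error = 1;
            return 0;
        }
        buf->offset--;
    }
}
```

With a 4-byte buffer and buf->offset at 3, this version reads bytes 3, 2, 1 and 0 and leaves buf->offset at 0 instead of wrapping around.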

Can't get image from mobi

Hello @bfabiszewski
I am using your other library, QLMobi, combined with libmobi to parse HTML and images from mobi books.
Most books work great, but for some books the media images cannot be retrieved.
I have tried to fix it but could not find the cause. I hope you can help; I think this is the last problem for me.
Both QLMobi and libmobi are great, nearly perfect libraries.
Thank you very much for your great work!
World of Warcraft - Dawn of the Aspects Part I.mobi.zip

Also, I am the developer of Alook Browser - 2x Speed (https://itunes.apple.com/us/app/alook-web-browser-2x-speed/id1261944766?mt=8). If you are using iOS, here is a promotional code: JWYTH3FE4JJK
Forgive my poor English.
Best Regards.

Out of bounds write, crash

diff --git a/src/util.c b/src/util.c
index be08b26..8887afd 100644
--- a/src/util.c
+++ b/src/util.c
@@ -1601,7 +1601,7 @@ static MOBI_RET mobi_decompress_content(const MOBIData *m, char *text, FILE *fil
         if (dump) {
             fwrite(decompressed, 1, decompressed_size, file);
         } else {
-            if (text_length > *len) {
+            if (text_length + decompressed_size > *len) {
                 debug_print("%s", "Text buffer too small\n");
                 /* free huff/cdic tables */
                 mobi_free_huffcdic(huffcdic);
-- 
2.7.4
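
The idea behind the patch, checking that the data already written plus the new chunk still fits before copying, can be illustrated in isolation. The sketch below is an assumption-laden stand-in rather than code from util.c: append_chunk and its parameters are hypothetical names, and the check is written as a subtraction so that the sum of two sizes cannot wrap around.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Appends chunk to dst at position *used.
 * Returns 0 on success, -1 if the chunk would not fit in capacity bytes. */
static int append_chunk(char *dst, size_t capacity, size_t *used,
                        const char *chunk, size_t chunk_size) {
    assert(dst != NULL && used != NULL && chunk != NULL);
    /* Equivalent to (*used + chunk_size > capacity), without overflow. */
    if (*used > capacity || chunk_size > capacity - *used) {
        return -1;
    }
    memcpy(dst + *used, chunk, chunk_size);
    *used += chunk_size;
    return 0;
}
```

A caller would stop and clean up on the first -1, much as the patched branch does when it reports "Text buffer too small".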

Bug: Integer overflow parsing record offsets

There is an error in parsing the record offsets in mobi_load_rec. If the next record's offset is lower than the current one, the subtraction yields a negative size that wraps around as an unsigned integer, so the malloc in mobi_load_recdata can be enormous.

        if (curr->next != NULL) {
            next = curr->next;
            size = next->offset - curr->offset; // <- integer overflow here
        } else {
           ....stripped
        }

        curr->size = size;
        ret = mobi_load_recdata(curr, file); // -> malloc(curr->size); -> enormous malloc

Here is a sample that shows this behaviour:
sample.zip
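
One way to guard against this, sketched under the assumption that record offsets are 32-bit unsigned values, is to compare the two offsets before subtracting. checked_record_size is a hypothetical helper, not a libmobi function, and it deliberately covers only the branch shown above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Computes next_offset - curr_offset into *size.
 * Returns 0 on success, -1 if the offsets are out of order, in which case
 * *size is left untouched and no allocation should be attempted. */
static int checked_record_size(uint32_t curr_offset, uint32_t next_offset,
                               uint32_t *size) {
    assert(size != NULL);
    if (next_offset < curr_offset) {
        return -1;
    }
    *size = next_offset - curr_offset;
    return 0;
}
```

Without the check, offsets of 250 followed by 100 would give a 32-bit size of 4294967146 (2^32 - 150) instead of an error.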

Amazon azw4 format?

Hi Bartek!
One of the users of my app (@voice Aloud Reader in Google Play) sent me the first azw4 ebook. Do you think you could include this format into your library? Would you need any help with this? I was able to convert the file to epub using the latest calibre, but the format is weird - short lines about 80 characters long formatted as <p>...</p>. Could be a problem with this original file, or calibre's conversion process, don't know at this time.

OK, just managed to update my old Kindle HDX 3rd generation, and it opened the azw4 file fine, no problem with formatting there. Apparently Calibre's conversion is not perfect yet. Please let me know if you have any plans regarding AZW4. Thanks!

Grzesiek

I am confused about a function.

I am puzzled by the _buffer_get_varlen function. Why does it read only 7 bits from each byte, and why does it stop when a byte has bit 7 set? Shouldn't it simply read 8 bits at a time, byte by byte?
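
For context, this is how a variable-length integer encoding typically works: each byte carries 7 bits of the value, and bit 7 is reserved as a marker for the last byte, so the reader knows where the number ends without a separate length field. The sketch below illustrates the forward-reading case only. sketch_varlen_forward is a hypothetical name, and the 4-byte limit and most-significant-group-first order are assumptions based on the description quoted in the issues above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decodes a forward variable-length integer from data (at most maxlen bytes).
 * Each byte contributes its low 7 bits; a byte with bit 7 set ends the value.
 * Reads at most 4 bytes and stores the number of bytes read in *consumed. */
static uint32_t sketch_varlen_forward(const unsigned char *data, size_t maxlen,
                                      size_t *consumed) {
    assert(data != NULL && consumed != NULL);
    uint32_t val = 0;
    size_t i = 0;
    while (i < maxlen && i < 4) {
        const unsigned char byte = data[i++];
        val = (val << 7) | (uint32_t)(byte & 0x7f); /* append 7 payload bits */
        if (byte & 0x80) {
            break; /* bit 7 set: this was the last byte of the value */
        }
    }
    *consumed = i;
    return val;
}
```

For example, the bytes 0x04 0x22 0x91 decode to 0x11111: the payloads 0x04, 0x22 and 0x11 are concatenated 7 bits at a time, and the set top bit in 0x91 ends the value after three bytes.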

Trying to get in touch regarding a security issue

Hey there!

I'd like to report a security issue but cannot find contact instructions on your repository.

If not a hassle, might you kindly add a SECURITY.md file with an email, or another contact method? GitHub recommends this best practice to ensure security issues are responsibly disclosed, and it would serve as a simple instruction for security researchers in the future.

Thank you for your consideration, and I look forward to hearing from you!

(cc @huntr-helper)

enabling MOBI_DEBUG on Windows

I am getting a CMake error if I enable MOBI_DEBUG on Windows (VS 2022):

cl : command line error D8021: invalid numeric argument '/Wextra'

convert mobi ebook to epub error

The mobi file converts to epub without errors, but the resulting epub file is malformed: it can't be opened by iBooks or by many Android epub readers. I checked the epub file with calibre-edit and got the errors below:

ERROR: Parsing failed: xmlParseEntityRef: no name, line 1, column 807    [OEBPS/part00000.html]
INFO: File too large    [OEBPS/part00000.html]

123_test.epub.zip

AddressSanitizer: heap-buffer-overflow at buffer.c:212

Our fuzzer found several heap-buffer-overflow errors when libmobi is compiled with AddressSanitizer and run with the command mobitool -i7m $file. Someone else also found a few others here.

We will list them separately in the following issue threads and this is the 1st one.

POC (proof-of-crash) files:
https://github.com/ntu-sec/pocs/blob/master/libmobi/hbo_buffer.c%3A212_1.mobi
https://github.com/ntu-sec/pocs/blob/master/libmobi/hbo_buffer.c%3A212_2.mobi

gdb output:
https://github.com/ntu-sec/pocs/blob/master/libmobi/hbo_buffer.c:212_1.mobi.gdb.txt
https://github.com/ntu-sec/pocs/blob/master/libmobi/hbo_buffer.c:212_2.mobi.gdb.txt

README question: can libmobi also create new documents from scratch?

The README lists a lot of features, but they're all apparently centered around reading or modifying an existing file.

Can libmobi also create new ebooks from scratch? (For use in an EPUB->MOBI conversion software) If yes, maybe another bullet point in the README clarifying that would be useful 🙂

Thanks for creating this cool library!

Homebrew formula

Homebrew is an awesome package manager for macOS. If you add a brew formula, i.e. libmobi.rb, it will get very convenient to install libmobi on macOS.

MOBI_ATTRNAME_MAXSIZE 100 is not enough for some books

Please increase MOBI_ATTRNAME_MAXSIZE and MOBI_ATTRVALUE_MAXSIZE to 150:

#define MOBI_ATTRNAME_MAXSIZE 150 /**< Maximum length of tag attribute name, like "href" */
#define MOBI_ATTRVALUE_MAXSIZE 150 /**< Maximum length of tag attribute value */

thanks

toc.ncx is sometimes created with wrong navigation labels

The issue is with "World of Warcraft - Dawn of the Aspects Part I.mobi" ebook file, submitted with "Mobi file can't parse #10" by @LiuDeng:

The toc.ncx that libmobi generates from this file has wrong links. For example for "Part I" we have in toc.ncx:

<navPoint id="toc-3" playOrder="3">
<navLabel>
<text>Part I</text>
</navLabel>
<content src="part00000.html#0000006908"/>

However, there is no element with id "0000006908" in part00000.html at all. Instead, "Part I" header is preceded with:

<a id="0000006902">

Could you maybe tell me where and how the toc.ncx is constructed, maybe then I could find a fix on my own...

Greg

Can't convert mobi file to epub

Hi,
I use create_epub(const MOBIRawml *rawml, const char *fullpath) to create an epub file, but the resulting file has the wrong format and can't be opened. Thanks.
