Giter VIP home page Giter VIP logo

Comments (11)

rikyoz avatar rikyoz commented on June 12, 2024

Hi!
Thanks for reporting the issue!
Which version of MSVC are you using?
I couldn't reproduce your issue, but I only tested with recent versions of MSVC.

However, while troubleshooting the problem, I found a few lines of code where path encoding wasn't correctly handled.
It might be related to your issue.

I just pushed a commit to the develop branch that should fix those lines (d789fa6).
Hopefully, it should fix your issue, too.

from bit7z.

ra-dave avatar ra-dave commented on June 12, 2024

Appreciate the quick response! I'm using VS 2022.

Just tried the new code, still getting the same crash with BitFileCompressor compress().

BitFileExtractor extract() can't handle a UTF-8 character in the zip filename either. That is the more critical one for me at the moment because I can workaround the BitFileCompressor side if I need to by creating a zip file with a temporary filename and then renaming it outside of bit7z. But trying to do something similar on the extract side is much more problematic (might not have permissions to rename the zip file, etc).

This is where it is crashing when I try extract():

image

image

My build settings:

C:\bit7z\build>cmake ../ -DCMAKE_BUILD_TYPE=Debug
-- Selecting Windows SDK version 10.0.19041.0 to target Windows 10.0.19045.
-- Standard filesystem: YES
-- Target Version: 4.0.0
-- Compiler ID: MSVC
-- Compiler Version: 19.35.32215.0
-- Architecture: x64
-- Build Type: Debug
-- Language Standard for bit7z: C++17
-- Auto format detection: OFF
-- Regex matching extraction: OFF
-- Use std::byte: OFF
-- Use native string: OFF
-- Generate Position Independent Code: OFF
-- Disable Zip ASCII password check: OFF
-- 7-zip version: 23.01
-- Build tests: OFF
-- Build docs: OFF
-- Auto prefix long paths: OFF
-- Use the default codepage: OFF
-- Path sanitization: OFF
-- Code analysis: OFF
-- Static runtime: OFF
-- CPM: Adding package [email protected] (v23.01)
-- 7-zip source code available at C:/bit7z/build/_deps/7-zip-src
-- Configuring done
-- Generating done
-- Build files have been written to: C:/bit7z/build

from bit7z.

rikyoz avatar rikyoz commented on June 12, 2024

I'm using VS 2022.
Uhm, this is really strange, as I can't replicate the issue...

Just tried the new code, still getting the same crash with BitFileCompressor compress().

I just pushed some other minor fixes that should make the encoding handling even more robust.
However, at this point, I don't think the problem is in bit7z...

BitFileExtractor extract() can't handle a UTF-8 character in the zip filename either. That is the more critical one for me at the moment because I can workaround the BitFileCompressor side if I need to by creating a zip file with a temporary filename and then renaming it outside of bit7z. But trying to do something similar on the extract side is much more problematic (might not have permissions to rename the zip file, etc).

I see. Do you have this issue only with zip files or also with other formats like 7z?

This is where it is crashing when I try extract():

Interesting! Bit7z internally uses MSVC's implementation of std::filesystem. In this particular case, it uses the function std::filesystem::u8path as it expects to construct a path from a UTF-8 string.

Now, MSVC's fs::path internally uses wide strings, so it needs to convert a UTF-8 string to a wide string.

The error message No mapping for the Unicode character exists in the target multi-byte code page. makes me think that the error is that the original string you pass to bit7z is not actually UTF-8 encoded and hence contains some invalid UTF-8 sequences, causing the exception.

One possible way to avoid the throwing of the exception would be to use bit7z's own string conversion functions like this:

   return fs::path{ bit7z::widen( str ) }; // instead of fs::u8path

However, if the original string is not UTF-8 encoded, this might not result in a correct path encoding.

Anyway, thank you for all the details you provided! 🙏

from bit7z.

ra-dave avatar ra-dave commented on June 12, 2024

My zip filename seems to be properly UTF-8 encoded (based on the fact that it appears in the debugger's watch window correctly when I use ",s8" modifier.

I see that you have some automated tests for testing files in the archive that are UNICODE characters but are there any tests for testing the actual archive filename itself? (for Windows)

from bit7z.

rikyoz avatar rikyoz commented on June 12, 2024

My zip filename seems to be properly UTF-8 encoded (based on the fact that it appears in the debugger's watch window correctly when I use ",s8" modifier.

Uhm ok, strange! I'll need to investigate this further, even though without replicating is a bit difficult to understand what the actual problem is!

I see that you have some automated tests for testing files in the archive that are UNICODE characters but are there any tests for testing the actual archive filename itself? (for Windows)

Actually, yeah, here and here the automated tests do exactly that: reading an archive with Unicode characters in its name!

from bit7z.

ra-dave avatar ra-dave commented on June 12, 2024

Uhm ok, strange! I'll need to investigate this further, even though without replicating is a bit difficult to understand what the actual problem is!

It's easy to replicate. Just use a UTF-8 encoded std::string for the archive filename, in Windows.

from bit7z.

rikyoz avatar rikyoz commented on June 12, 2024

It's easy to replicate. Just use a UTF-8 encoded std::string for the archive filename, in Windows.

Yeah, the problem is that when I try to replicate it, it works!

Just for context, I'm testing everything in a clean installation of Windows 10 inside a VM with the default regional and language settings:

The only things installed are 7-zip, Visual Studio 2022, CMake, and Git.
I cloned bit7z in the Documents folder, and created a CMake project in a separate folder called bit7z-test:

cmake_minimum_required( VERSION 3.8.2 FATAL_ERROR )
project( bit7ztest )

set( CMAKE_CXX_STANDARD 11 )
set( CMAKE_CXX_STANDARD_REQUIRED ON )

add_executable( bit7ztest main.cpp )

add_subdirectory( ${PROJECT_SOURCE_DIR}/../bit7z/ ${CMAKE_CURRENT_BINARY_DIR}/bit7z )
target_link_libraries( bit7ztest PRIVATE bit7z )
target_compile_options( bit7ztest PRIVATE /utf-8 )

The test files (to be compressed) are in the test_unicode folder in the project's folder:

I tested the compression, and it works just fine:

#include <iostream>

#include <direct.h>

#include <bit7z/bit7zlibrary.hpp>
#include <bit7z/bitexception.hpp>
#include <bit7z/bitfilecompressor.hpp>

using namespace bit7z;

auto main() -> int {
    std::cout << "Current Code Page: " << GetACP() << std::endl;
    _chdir(R"(C:\Users\Riccardo\Documents\bit7z-test\)");

    try {
        Bit7zLibrary lib{ R"(C:\Program Files\7-Zip\7z.dll)" };
        BitFileCompressor compressor{ lib, BitFormat::Zip };
        compressor.compressDirectory( "test_unicode/", u8"SottoMenù.zip" );
    } catch (const bit7z::BitException& ex) {
        std::cerr << ex.what() << std::endl;
    }
    return 0;
}



I tested extraction too, from the same archive created in the previous test:

#include <iostream>

#include <direct.h>

#include <bit7z/bit7zlibrary.hpp>
#include <bit7z/bitexception.hpp>
#include <bit7z/bitfileextractor.hpp>

using namespace bit7z;

auto main() -> int {
    std::cout << "Current Code Page: " << GetACP() << std::endl;
    _chdir(R"(C:\Users\Riccardo\Documents\bit7z-test\)");

    try {
        Bit7zLibrary lib{ R"(C:\Program Files\7-Zip\7z.dll)" };
        BitFileExtractor extractor{ lib, BitFormat::Zip };
        extractor.extract( u8"SottoMenù.zip", "test_extract/" );
    } catch (const bit7z::BitException& ex) {
        std::cerr << ex.what() << std::endl;
    }
    return 0;
}

Again, it worked without issues:

All the tests were performed using the develop branch of bit7z.

from bit7z.

rikyoz avatar rikyoz commented on June 12, 2024

The only way I found to replicate your issue is by forcing the encoding of the string literal to be Windows-1252:

compressor.compressDirectory( "test_unicode/", "SottoMen\xF9.zip" );

But we already confirmed that you're using UTF-8 strings, so I'll need to investigate the issue further!

from bit7z.

rikyoz avatar rikyoz commented on June 12, 2024

So, I've released a new maintenance version v4.0.2 that contains all the fixes to the UTF-8 support I made on the develop branch.
I don't know if you tested all these changes, but they could fix your issue.

If the issue persists, could you provide a more detailed stack trace of where the exception happens? At least on bit7z's side.
Also, are you using MSVC's /utf-8 option, or the /source-charset and /execution-charset options?

Any details you can provide will be really useful;
I'm continuing to try to replicate it, but I haven't been lucky so far, so I really appreciate any help you can provide!

from bit7z.

ra-dave avatar ra-dave commented on June 12, 2024

I'm still testing but it appears 4.0.2 fixed my issue (4.0.1 definitely still had the issue).

Thanks! I'll let you know if there are any issues but it's looking good.

from bit7z.

rikyoz avatar rikyoz commented on June 12, 2024

I'm still testing but it appears 4.0.2 fixed my issue (4.0.1 definitely still had the issue).
Thanks!

Great, you're welcome!

I'll let you know if there are any issues but it's looking good.

Thanks!

from bit7z.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.