Comments (5)

rasmusgreve avatar rasmusgreve commented on June 10, 2024

Sample files demonstrating the issue:
format.txt
sample.txt
Note that format.txt would be format.ini if GitHub allowed upload of .ini files.

from fwdataviz.

shriprem avatar shriprem commented on June 10, 2024

This is because FWDataViz uses byte counts, and not character counts, for determining field widths. This choice was made during the initial stages to keep the core algorithm for the plugin simple and fast. Unicode usage was enabled only on the File Type Label, Record Type Label and Field Labels since their use with those labels didn't affect that goal.
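To illustrate the difference, here is a minimal Python sketch. It is not FWDataViz's actual code: the helper names, the sample record and the field widths are all made up for illustration. It splits one UTF-8 record first by byte widths and then by character widths:

```python
def split_by_bytes(record: str, widths: list[int]) -> list[str]:
    """Slice the UTF-8 encoded record at byte offsets (hypothetical helper)."""
    data = record.encode("utf-8")
    fields, pos = [], 0
    for w in widths:
        # errors="replace" keeps the sketch from raising if a slice
        # happens to cut through a multi-byte character.
        fields.append(data[pos:pos + w].decode("utf-8", errors="replace"))
        pos += w
    return fields


def split_by_chars(record: str, widths: list[int]) -> list[str]:
    """Slice the record at character offsets (hypothetical helper)."""
    fields, pos = [], 0
    for w in widths:
        fields.append(record[pos:pos + w])
        pos += w
    return fields


# Hypothetical record: a 10-character name field and a 5-character city field.
# "ø" takes 2 bytes in UTF-8, so the record is 15 characters but 16 bytes.
record = "Søren".ljust(10) + "Oslo".ljust(5)
widths = [10, 5]

print(split_by_chars(record, widths))  # fields line up with the definition
print(split_by_bytes(record, widths))  # second field starts one byte early
```

In byte mode, every multi-byte character ahead of a field boundary shifts that boundary to the left, which matches the misaligned highlighting reported in this issue.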

That being said, I took the time to review the code before posting this response here. My preliminary observation is that a File Type-level flag can be added to switch the algorithm between byte count (existing) and character count (new) modes for determining field widths.

The UTF-8 character encoding uses between 1 and 4 bytes per character. Character counting entails inspecting every byte in each record on every paint event of the NPP document (this paint event is triggered more frequently than you would think). This overhead affects the plugin's performance. Therefore, the switch to character count mode should be made only for those file types that require it. So, I plan to make this switch available in the File Type Configuration dialog in a very obscure, non-intuitive manner. Only those needing this mode should be able to make the switch.
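As a rough sketch of why character counting costs more than byte counting, the Python below walks every byte of a record to locate character boundaries. The function names are hypothetical, the input is assumed to be valid UTF-8, and the plugin's own implementation may differ:

```python
def utf8_char_count(data: bytes) -> int:
    """Count characters in valid UTF-8 data by inspecting every byte."""
    # Continuation bytes have the bit pattern 10xxxxxx;
    # every other byte starts a new character.
    return sum(1 for b in data if (b & 0xC0) != 0x80)


def byte_offset_of_char(data: bytes, char_index: int) -> int:
    """Return the byte offset at which the 0-based char_index-th character starts.

    Returns len(data) if char_index is at or past the end of the data.
    """
    seen = 0
    for offset, b in enumerate(data):
        if (b & 0xC0) != 0x80:  # start of a character
            if seen == char_index:
                return offset
            seen += 1
    return len(data)


# One character each of 1, 2, 3 and 4 bytes: 10 bytes, 4 characters.
sample = "aé€😀".encode("utf-8")
print(len(sample), utf8_char_count(sample))
print([byte_offset_of_char(sample, i) for i in range(5)])
```

In byte mode a field boundary is a plain offset. In character mode, finding the same boundary requires a scan like the one above, repeated for each record that is painted.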

Also, note that turning on the character count mode means that TAB characters will be treated as multiple spaces and counted as such. This characteristic is part of the Scintilla API. You can see this in play with my other plugin, GotoLineCol, in which both byte-based and character-based column navigation features are available.
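A small sketch of what "TAB counted as multiple spaces" can mean for column positions. It assumes conventional tab-stop behaviour, where a TAB advances to the next multiple of the tab width. The function name and the default tab width of 4 are assumptions for illustration, not values taken from the plugin or from Scintilla:

```python
def visual_column(line: str, char_index: int, tab_width: int = 4) -> int:
    """Return the column of char_index, expanding each TAB to the next tab stop."""
    col = 0
    for ch in line[:char_index]:
        if ch == "\t":
            # Advance to the next multiple of tab_width.
            col += tab_width - (col % tab_width)
        else:
            col += 1
    return col


line = "a\tb"
print(visual_column(line, 1))  # column of the TAB itself
print(visual_column(line, 2))  # column of "b" after the TAB is expanded
```

With this behaviour, a field that follows a TAB starts at a column that depends on the tab width setting, so fixed-width data containing TABs may not line up as expected in character count mode.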

Lastly, in this round of enhancement, the File Record Terminator, ADFT Line Regex Keys, and Record Type Regex Keys could continue to be limited to ANSI characters only. We will take baby steps, and only if and when we need them.

It will take me a few days to implement this enhancement. In the meantime, if you have any additional input, please let me know. Thanks!

shriprem avatar shriprem commented on June 10, 2024

The enhancement for Multi-Byte Character mode is available in the latest 2.2.0.0 Release.

Please be sure to read the Multi-Byte Character document. You may benefit from enabling the hidden option for quick override mode from the plugin panel.

DLL-only manual upgrade of a previously installed version of the plugin

Go to the 2.2.0.0 Release page. Download the zip file version to match your Notepad++ bitness. Then, either:

  1. In Notepad++, navigate to menu: Settings » Import » Import Plugins... and import the dll file extracted from the zip file.
  2. OR, extract the FWDataviz.dll file into the <NPP_Plugins_folder>/FWDataviz folder to overwrite the existing DLL therein.

rasmusgreve avatar rasmusgreve commented on June 10, 2024

That is an excellent solution to the problem, and a fine compromise between keeping the highlighter fast while also supporting formats with multi-byte characters.
Thank you for your great work!

shriprem avatar shriprem commented on June 10, 2024

@rasmusgreve, please try out the latest v2.3.0.0 release of FWDataViz. This release fully supports multi-byte characters, including in Record Terminators, ADFT Regexes & Record Markers. It also features Field Copy, Field Paste and Field Hop, all of which are accessible via menu-based keyboard shortcuts.

Thank you.
