Comments (6)
I cannot tell from the example a.js file in the description whether the á character is correctly encoded as UTF-8 in the file you're actually using when you see this error.
Can you confirm that the input file, a.js, is actually correct UTF-8?
from closure-compiler.
Actually, could you just attach 2 files to this issue?
- The actual a.js file.
- The exact output from closure-compiler itself (i.e. the input that Python is seeing).
from closure-compiler.
Here are the input files: a.zip
C3 A1 is 11000011 10100001, which is of the form 110xxxxx 10yyyyyy, i.e. a leading byte followed by a continuation byte. See e.g. Wikipedia on UTF-8 Encoding. The Unicode code point in this case is xxxxxyyyyyy = 00011 100001 = 0xE1, i.e. U+00E1 (https://www.compart.com/en/unicode/U+00E1).
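The bit arithmetic above can be checked with a few lines of Python. This is only a sketch; the helper name decode_two_byte_utf8 is made up for illustration and handles just the 2-byte form discussed here.

```python
def decode_two_byte_utf8(lead: int, cont: int) -> int:
    """Decode a 2-byte UTF-8 sequence (110xxxxx 10yyyyyy) into a code point."""
    if lead & 0b11100000 != 0b11000000:
        raise ValueError("not a 2-byte leading byte")
    if cont & 0b11000000 != 0b10000000:
        raise ValueError("not a continuation byte")
    # Keep the 5 payload bits of the leading byte and the 6 of the continuation byte.
    return ((lead & 0b00011111) << 6) | (cont & 0b00111111)


code_point = decode_two_byte_utf8(0xC3, 0xA1)
print(hex(code_point), chr(code_point))
```

Python's built-in decoder can be used as a cross-check: bytes([0xC3, 0xA1]).decode('utf-8') should give the same character.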
The exact output from closure-compiler itself. (i.e. the input that python is seeing)
The test case does not produce any JavaScript output from closure-compiler. Python attempts to capture the stderr error message from the Closure process, but Python croaks internally since it cannot decode the stderr bytes that Closure is outputting, and so it does not produce any output to the calling a.py file.
Executing the following python file instead
import subprocess
ret = subprocess.run(['npx', 'google-closure-compiler','--charset=UTF8','--js','a.js','--js_output_file','o.js'], encoding='iso-8859-1', stderr=subprocess.PIPE, shell=True)
print(ret.stderr)
does not throw an exception, and instead causes Python to print the stderr as expected:
a.js:1:4: WARNING - [JSC_SUSPICIOUS_NAN] Comparison against NaN is always false. Did you mean isNaN()?
  1| if (4 == NaN) console.log('á');
         ^^^^^^^^
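The iso-8859-1 variant cannot throw because ISO-8859-1 maps every byte value to a character, whereas UTF-8 rejects malformed sequences. A minimal sketch of that difference follows. The sample bytes, with 'á' as a lone 0xE1 byte, are an assumption about what non-UTF-8 stderr might look like, not the bytes captured in this issue, and try_decode is a hypothetical helper.

```python
# Hypothetical sample: 'á' written as the single byte 0xE1 instead of C3 A1.
sample = b"console.log('\xe1');"


def try_decode(data: bytes, encoding: str) -> str:
    """Return the decoded text, or a short marker if decoding fails."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        return f"<{encoding} failed at byte {exc.start}>"


print(try_decode(sample, "utf-8"))       # UTF-8 rejects the lone 0xE1
print(try_decode(sample, "iso-8859-1"))  # ISO-8859-1 accepts any byte
```

Note that not throwing does not mean the text is right: if the bytes really were UTF-8, decoding them as ISO-8859-1 would silently turn 'á' into two characters.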
from closure-compiler.
What I want to know is this:
Is closure-compiler actually generating an invalid character sequence to stderr, or is something else going on?
One thing that could be happening is that the stderr output from closure-compiler could be getting mixed with output from either its own stdout or output from some other process that happens to share the same output stream. Due to buffering, the 2-byte sequence for 'á' that closure-compiler sends to stderr could be interrupted by output from somewhere else.
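To illustrate why an interrupted sequence would matter, here is a small sketch. The interleaved byte b"X" is a made-up stand-in for output from another stream, and is_valid_utf8 is a hypothetical helper.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Report whether the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


encoded = "\u00e1".encode("utf-8")  # the two bytes C3 A1
# Hypothetical interleaving: a foreign byte lands between the two halves.
interleaved = encoded[:1] + b"X" + encoded[1:]

print(is_valid_utf8(encoded))
print(is_valid_utf8(interleaved))
```

If this were the cause, capturing stderr as raw bytes would show C3 and A1 separated by unrelated bytes.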
Thanks for providing the a.js file and your command line. We can use that to find out what the actual stderr output from the latest closure-compiler build is for this case.
If this problem is in some way actually tied to Windows, we're unlikely to fix it ourselves as none of the core team uses Windows when working on closure-compiler.
from closure-compiler.
Thank you for supplying the a.js file.
- I downloaded it.
- I checked out and built the latest version of closure-compiler as a Java jar file.
- I stored the path to that jar file in $ccjar.
- I ran the following commands to check the behavior.
First confirm that my terminal / OS is using UTF-8
$ echo $LANG
en_US.UTF-8
$ echo á |xxd
00000000: c3a1 0a
Yep. c3a1 is the correct byte pair for this UTF-8 character as stated in a previous comment.
Now confirm that the character is correct in a.js
$ xxd a.js
00000000: 6966 2028 3420 3d3d 204e 614e 2920 636f if (4 == NaN) co
00000010: 6e73 6f6c 652e 6c6f 6728 27c3 a127 293b nsole.log('..');
00000020: 0d0a ..
Yep.
Now run the compiler with the options as described in earlier comments, save its stderr output into err.out, and use xxd to check the contents of that file.
$ java -jar $ccjar --charset=UTF8 --js a.js --js_output_file o.js 2> err.out
$ xxd err.out
00000000: 612e 6a73 3a31 3a34 3a20 5741 524e 494e a.js:1:4: WARNIN
00000010: 4720 2d20 5b4a 5343 5f53 5553 5049 4349 G - [JSC_SUSPICI
00000020: 4f55 535f 4e41 4e5d 2043 6f6d 7061 7269 OUS_NAN] Compari
00000030: 736f 6e20 6167 6169 6e73 7420 4e61 4e20 son against NaN
00000040: 6973 2061 6c77 6179 7320 6661 6c73 652e is always false.
00000050: 2044 6964 2079 6f75 206d 6561 6e20 6973 Did you mean is
00000060: 4e61 4e28 293f 0a20 2031 7c20 6966 2028 NaN()?. 1| if (
00000070: 3420 3d3d 204e 614e 2920 636f 6e73 6f6c 4 == NaN) consol
00000080: 652e 6c6f 6728 27c3 a127 293b 0d0a 2020 e.log('..');..
00000090: 2020 2020 2020 205e 5e5e 5e5e 5e5e 5e0a ^^^^^^^^.
000000a0: 0a30 2065 7272 6f72 2873 292c 2031 2077 .0 error(s), 1 w
000000b0: 6172 6e69 6e67 2873 290a arning(s).
Yep. We again see "c3" and "a1" used as the 2-byte encoding in bytes at positions 0x87 and 0x88.
The Java jar executing in Linux is definitely generating stderr using UTF-8 encoding.
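The same offset check can be done without xxd. The sketch below rebuilds the first two lines of the warning from the dump above as a string, rather than reading err.out, and looks for the C3 A1 pair. The variable names are illustrative.

```python
# First two lines of the warning, reconstructed from the hex dump above.
stderr_text = (
    "a.js:1:4: WARNING - [JSC_SUSPICIOUS_NAN] Comparison against NaN is always false. "
    "Did you mean isNaN()?\n"
    "  1| if (4 == NaN) console.log('\u00e1');\r\n"
)

data = stderr_text.encode("utf-8")
offset = data.find(b"\xc3\xa1")  # byte offset of the UTF-8 pair for 'á'
print(hex(offset))
```

To run this against a real capture, replace data with the result of pathlib.Path("err.out").read_bytes(). The printed offset should agree with the position seen in the dump.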
Probably the closure-compiler you're running has been converted from a jar file to a native Windows binary using Graal, because I think that's what the google/closure-compiler-npm code that generates the NPM release tries to make the default.
I'm not sure whether the different behavior you see is the result of Windows behavior, the behavior of Java on Windows (as emulated by Graal), or something else.
from closure-compiler.
One simplification/note to the bug test case is that the original a.py was
import subprocess
subprocess.run(['npx', 'google-closure-compiler','--charset=UTF8','--js','a.js','--js_output_file','o.js'], encoding='utf-8', stderr=subprocess.PIPE, shell=True)
However, this bug is unrelated to the --charset=UTF8 parameter; it also occurs with the shorter line
import subprocess
subprocess.run(['npx', 'google-closure-compiler','--js','a.js','--js_output_file','o.js'], encoding='utf-8', stderr=subprocess.PIPE, shell=True)
It is expected that the issue does not occur on Linux or macOS, since those OSes default to UTF-8 widely.
In my Windows shell I have changed my active codepage to UTF-8, i.e.
C:\emsdk\emscripten\main>chcp
Active code page: 65001
See chcp 65001.
However, this change does not affect the bug, so this is not a Windows terminal/console issue but something in the libraries involved, either in Closure or somewhere else, as observed above.
We successfully worked around this in Emscripten code by specifying encoding='iso-8859-1' if WINDOWS else 'utf-8' when invoking Closure.
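A rough sketch of that workaround as standalone code, for readers outside Emscripten. The WINDOWS flag here is a stand-in derived from sys.platform and is an assumption, not Emscripten's actual definition. The two helper functions are hypothetical. decode_stderr shows an alternative that was not used in the workaround: capture raw bytes and decode leniently.

```python
import sys

# Assumption: a stand-in for Emscripten's WINDOWS flag.
WINDOWS = sys.platform.startswith("win")


def closure_stderr_encoding() -> str:
    """Pick the encoding passed to subprocess.run(..., encoding=...)."""
    return "iso-8859-1" if WINDOWS else "utf-8"


def decode_stderr(raw: bytes) -> str:
    """Alternative: decode raw stderr bytes as UTF-8, replacing malformed bytes."""
    return raw.decode("utf-8", errors="replace")


print(closure_stderr_encoding())
print(decode_stderr(b"\xc3\xa1"))
```

The lenient decode never raises and keeps valid UTF-8 intact, at the cost of turning undecodable bytes into U+FFFD replacement characters.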
from closure-compiler.