clj-det-enc is an encoding detector built on the juniversalchardet Java library.
(require '[det-enc.core :as det])
Usage: (det/detect target)
(det/detect "utf8.txt")
;=> "UTF-8"
(det/detect "unknown.txt")
;=> nil
Usage: (det/detect target encodingname-when-unknown)
(det/detect "unknown.txt" "EUC-JP")
;=> "EUC-JP"
(det/detect "unknown.txt" :default)
;=> "SHIFT_JIS" (the result depends on your JVM's default charset)
return:
The encoding name, or nil when the target encoding cannot be detected.
target:
Whatever clojure.java.io/input-stream can deal with.
(File, filename (String), InputStream, BufferedInputStream, etc.)
The target stream is closed automatically.
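As a sketch of how the detected name might be used, the following reads a file as text with whatever encoding was detected. `slurp-detected` is a hypothetical helper, not part of clj-det-enc, and the "UTF-8" fallback is an assumption chosen only for illustration. Because detect closes the stream it opens, the file is simply opened a second time for reading.

```clojure
(require '[clojure.java.io :as io]
         '[det-enc.core :as det])

(defn slurp-detected
  "Reads `path` as text using the detected encoding,
  falling back to UTF-8 (an assumed default) when detection fails."
  [path]
  (let [f   (io/file path)
        ;; A File is one of the target types clojure.java.io/input-stream accepts.
        enc (det/detect f "UTF-8")]
    ;; slurp opens its own reader; detect has already closed its stream.
    (slurp f :encoding enc)))
```

Passing a filename string instead of a File should work the same way, since both are handled by clojure.java.io/input-stream.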
encodingname-when-unknown:
Return this value when target encoding cannot be detected.
- :default means the default charset of your Java virtual machine.
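To see which charset :default refers to on a given machine, the JVM default can be inspected with plain Java interop. This is a minimal sketch that uses only the JDK; `default-charset-name` is a hypothetical helper, and it is an assumption that the library returns exactly this string for :default.

```clojure
(import 'java.nio.charset.Charset)

(defn default-charset-name
  "Returns the canonical name of the JVM's default charset."
  []
  (.name (Charset/defaultCharset)))

;; The value varies by platform, locale, and JVM settings.
(default-charset-name)
```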
Which encodings can be detected? See the juniversalchardet documentation.
Leiningen
[clj-det-enc "1.0.0"]