---
author: Stéphane Laurent
date: '2017-10-24'
highlighter: 'pandoc-solarized'
output:
  html_document:
    keep_md: no
  md_document:
    preserve_yaml: True
    variant: markdown
tags: R
title: File encoding detection in R
---

There is a Java port of `universalchardet`, Mozilla's encoding detection
library. It is called `juniversalchardet`. I'm going to show how to use it
from R with the `rJava` package.

First, download the `juniversalchardet` `jar` file (the code below uses
version 1.0.3). Then proceed as follows:

``` {.r}
library(rJava)
.jinit()
.jaddClassPath("path/to/juniversalchardet-1.0.3.jar")
# create the detector (no listener)
detector <- new(J("org/mozilla/universalchardet/UniversalDetector"), NULL)
f <- "juniversalchardet.Rmd" # file whose encoding you want to know
flength <- as.integer(file.size(f))
# feed the whole file to the detector as raw bytes
.jcall(detector, "V", "handleData", readBin(f, what="raw", n=flength), 0L, flength)
# signal the end of the data, then query the result
.jcall(detector, "V", "dataEnd")
.jcall(detector, "S", "getDetectedCharset")
## [1] "UTF-8"
# reset the detector before reusing it on another file
.jcall(detector, "V", "reset")
```

Update 2020
===========

Nowadays one can achieve the same result with the R package `uchardet`
(which does not use Java).
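
Here is a minimal sketch of the `uchardet` route. It assumes the package is installed from CRAN and that its `detect_file_enc()` function takes a file path and returns the detected encoding as a character string. The temporary sample file and its contents are made up for the illustration, and detection on such a short text may be less reliable than on a real file.

``` {.r}
# Assumption: the CRAN package 'uchardet' is installed
# (install.packages("uchardet")).
library(uchardet)

# Hypothetical sample: write a small UTF-8 encoded file so that the
# example does not depend on any existing file.
f <- tempfile(fileext = ".txt")
txt <- "St\u00e9phane d\u00e9tecte l'encodage des fichiers : \u00e9t\u00e9, for\u00eat, c\u0153ur.\n"
con <- file(f, open = "wb")
writeBin(charToRaw(enc2utf8(txt)), con) # write the UTF-8 bytes as-is
close(con)

# Ask uchardet to guess the encoding of the file
enc <- detect_file_enc(f)
print(enc)
```

To check one of your own files, replace `f` with its path.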