Comments (12)
UPDATE: SI units are now supported in R-devel, see r71960.
from wishlist-for-r.
Filed PR18435 adding new SI prefixes RB (ronnabytes) and QB (quettabytes) to format() for object_size.
from wishlist-for-r.
As a first step, I just filed a backward-compatible patch to add support for IEC units in utils:::format.object_size(), cf. PR #16649.
UPDATE: This has been implemented as of 2016-01-06 in r69879.
from wishlist-for-r.
IEC units are now supported by R. As the next step, I filed a backward-compatible patch to add support for JEDEC units in utils:::format.object_size(), cf. PR #16657.
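As a brief illustration of the IEC support mentioned above, the following sketch formats hand-made object_size values with binary-prefixed units. It assumes an R version that already includes the IEC patch; the byte counts are made up for the example.

```r
# Made-up sizes, tagged with the "object_size" class so that format()
# dispatches to the object_size method in utils.
small <- structure(2048, class = "object_size")
large <- structure(3 * 1024^2, class = "object_size")

# Request IEC (base-2) units explicitly.
kib <- format(small, units = "KiB")
mib <- format(large, units = "MiB")
print(kib)
print(mib)
```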
from wishlist-for-r.
- Can you give an example and reference for "The Ubuntu Linux distribution uses the IEC prefixes since 2010"? Personally, I find the 'KiB' notation quite ugly. I see df -h, du -h, ls -h all use suffixes K, M, G, ... but no "iB" (or "B" or "b").
- The real problem is that the SI standard really wants "KB" or "MB" to mean something different than "KiB" or "MiB", and JEDEC does not. But really the SI system is the world standard, and JEDEC is mainly "industry" and not science-based (which the SI is). So, in principle, if we are willing to break backward compatibility, we should really move towards the real world standard, i.e., the SI system, and consequently I'd be against endorsing JEDEC any more than we do now (by accepting it on "input").
from wishlist-for-r.
Thanks for the comments.
- I got the "Ubuntu" statement from [1], but must have been sloppy. I've now clarified it to say: "The Ubuntu Linux distribution uses the IEC prefixes for base-2 units and SI prefixes for base-10 units" which reflects Ubuntu's official UnitsPolicy.
- Searching the web, there are references starting ~2010 (around Ubuntu 10.10) saying Ubuntu will move to using decimal/base-10 units with SI prefixes throughout. I don't know where they are regarding that goal.
- SI vs JEDEC confusion: If I understand your comment correctly, you're saying we'll introduce more confusion if we explicitly add support for JEDEC. If so, I agree with you. My idea was to introduce it properly, to make it explicit that the old R units are home brewed. I'm happy to skip JEDEC.
- Long-term for R: If this is what you are saying, I agree: supporting both decimal/base-10 and binary/base-2 units, using SI and IEC prefixes respectively, would be ideal. I'm all for that as well. Since R has only a single API entry (= utils::format.object_size()), we could even introduce an argument base=getOption("object.size.base", 2) controlling whether base 2 or base 10 should be displayed (when units="auto"). It would also allow us to migrate from the current base 2 to base 10 smoothly (and allow users to undo it via the option), if that is where we are heading. BTW, gc() should utilize utils::format.object_size().
- To implement the transition from R's current base-2 units (Kb, Mb, Gb) to SI/base-10 units (kB, MB, GB), it might be less of a shock if one does this over a few release cycles:
  - Switch to using IEC/base-2 units (KiB, MiB, GiB, ...) for units="auto".
  - Deprecate explicit usage of units="Kb", units="Mb", ...
  - Switch to using SI/base-10 units (kB, MB, GB, ...) for units="auto".
What do you think?
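The option-controlled base floated above could look roughly like the sketch below. Everything in it is hypothetical: format_bytes_sketch() is an illustrative helper, not part of R, and the option name object.size.base is simply the one suggested in this comment.

```r
# Hypothetical helper (not part of R): formats a byte count using either
# binary (base = 2, IEC prefixes) or decimal (base = 10, SI prefixes) units.
# The default is read from the hypothetical option "object.size.base".
format_bytes_sketch <- function(x, base = getOption("object.size.base", 2),
                                digits = 1L) {
    stopifnot(base %in% c(2, 10))
    if (base == 2) {
        step  <- 1024
        units <- c("B", "KiB", "MiB", "GiB", "TiB", "PiB")
    } else {
        step  <- 1000
        units <- c("B", "kB", "MB", "GB", "TB", "PB")
    }
    # Step up one unit at a time while x still fills the next larger unit.
    power <- 0L
    while (x >= step^(power + 1L) && power < length(units) - 1L) {
        power <- power + 1L
    }
    paste(round(x / step^power, digits = digits), units[power + 1L])
}

binary_default <- format_bytes_sketch(2e6)             # base 2 unless the option says otherwise
decimal_on_request <- format_bytes_sketch(2e6, base = 10)

# Flip the default globally via the option, then undo it again.
old <- options(object.size.base = 10)
decimal_via_option <- format_bytes_sketch(2e6)
options(old)
```

Because the default argument is evaluated at call time, changing the option affects later calls without touching any call sites, which is what would make a gradual migration (and a user-side opt-out) possible.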
from wishlist-for-r.
Another approach that could work is to add support for units="IEC", units="SI" and units="legacy". That can be done without breaking backward compatibility. The units="auto" can equal units="legacy" and any future transitions can be in what units="auto" corresponds to.
UPDATE: The issue with this is that it's not possible to control whether units="MB" is meant to be current R "legacy" (base-2) units or SI (base-10) units.
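The ambiguity can be made concrete with a little arithmetic. The sketch below, using a made-up size of two million bytes, shows that the same "MB" label yields different numbers depending on which convention is assumed.

```r
# The same byte count, labelled "MB", under the two competing conventions.
bytes <- 2e6

legacy_mb <- round(bytes / 1024^2, digits = 1)  # base-2 ("legacy") reading of "MB"
si_mb     <- round(bytes / 1000^2, digits = 1)  # base-10 (SI) reading of "MB"

cat("legacy:", legacy_mb, "MB\n")
cat("SI:    ", si_mb, "MB\n")
```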
from wishlist-for-r.
Here's my new proposal for supporting "legacy", IEC and SI units in a backward compatible way and such that it will be easy to switch from today's default "legacy" to SI units at some point in R's future.
The file to be updated in R is src/library/utils/R/object.size.R:
object.size <- function(x)
    structure(.Call(C_objectSize, x), class = "object_size")

format.object_size <- function(x, units = "b", standard = "auto", digits = 1L, ...)
{
    known_bases <- c(legacy = 1024, IEC = 1024, SI = 1000)
    known_units <- list(
        SI     = c("B", "kB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"),
        IEC    = c("B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB", "ZiB", "YiB"),
        legacy = c("b", "Kb", "Mb", "Gb", "Tb", "Pb"),
        LEGACY = c("B", "KB", "MB", "GB", "TB", "PB")
    )
    units <- match.arg(units, c("auto", unique(unlist(known_units), use.names = FALSE)))
    standard <- match.arg(standard, c("auto", names(known_bases)))

    ## Infer 'standard' from 'units'?
    if (standard == "auto") {
        standard <- "legacy" ## default; to become "SI"
        if (units != "auto") {
            if (grepl("iB$", units)) {
                standard <- "IEC"
            } else if (grepl("b$", units)) {
                standard <- "legacy" ## keep when "SI" is the default
            } else if (units == "kB") {
                ## SPECIAL: Drop when "SI" becomes the default
                stop("For SI units, please specify standard = \"SI\"")
            }
        }
    }

    base <- known_bases[[standard]]
    units_map <- known_units[[standard]]

    if (units == "auto") {
        power <- if (x <= 0) 0 else min(as.integer(log(x, base = base)), length(units_map) - 1L)
    } else {
        power <- match(toupper(units), toupper(units_map)) - 1L
        if (is.na(power)) {
            stop(gettextf("Unit %s is not part of standard %s", sQuote(units), sQuote(standard)))
        }
    }

    unit <- units_map[power + 1L]
    ## SPECIAL: Use suffix 'bytes' instead of 'b' for 'legacy'
    if (power == 0 && standard == "legacy") unit <- "bytes"

    paste(round(x / base^power, digits = digits), unit)
}

print.object_size <-
    function(x, quote = FALSE, units = "b", standard = "auto", digits = 1L, ...)
{
    y <- format.object_size(x, units = units, standard = standard, digits = digits)
    if(quote) print.default(y, ...) else cat(y, "\n", sep = "")
    invisible(x)
}
Examples and tests
assert_size <- function(x, ..., expected) {
    size <- structure(x, class = "object_size")
    res <- try(format(size, ...), silent = TRUE)
    if (expected == "error") {
        if (!inherits(res, "try-error"))
            stop(sprintf("Expected %s but got %s", sQuote(expected), sQuote(res)))
    } else if (res != expected) {
        stop(sprintf("Expected %s but got %s", sQuote(expected), sQuote(res)))
    }
}
## The default is the 'legacy' standard (backward compatibility)
assert_size(0, expected = "0 bytes")
assert_size(1, expected = "1 bytes")
assert_size(1023, expected = "1023 bytes")
assert_size(1024, expected = "1024 bytes")
## Standard inferred from 'legacy' units
assert_size(0, units = "b", expected = "0 bytes")
assert_size(1, units = "B", expected = "1 bytes")
assert_size(999, units = "B", expected = "999 bytes")
assert_size(1000, units = "Kb", expected = "1 Kb")
assert_size(1024, units = "KB", expected = "1 Kb")
assert_size(2.0 * 1000^2, units = "MB", expected = "1.9 Mb")
assert_size(3.1 * 1000^3, units = "GB", expected = "2.9 Gb")
assert_size(4.2 * 1000^8, units = "TB", expected = "3819877747446.3 Tb")
assert_size(4.2 * 1000^9, units = "Pb", expected = "3730349362740.5 Pb")
## Standard inferred from 'IEC' units
assert_size(1000, units = "KiB", expected = "1 KiB")
assert_size(1024, units = "KiB", expected = "1 KiB")
assert_size(2.0 * 1000^2, units = "MiB", expected = "1.9 MiB")
assert_size(3.1 * 1000^3, units = "GiB", expected = "2.9 GiB")
assert_size(4.2 * 1000^8, units = "TiB", expected = "3819877747446.3 TiB")
assert_size(4.2 * 1000^9, units = "PiB", expected = "3730349362740.5 PiB")
## Inferring standard from 'SI' units is not possible because they
## conflict with 'legacy' units (and it would be confusing to support
## high-range SI units not covered by the legacy units)
assert_size(3.1 * 1024^1, units = "kB", expected = "error")
assert_size(3.1 * 1024^6, units = "EB", expected = "error")
assert_size(3.1 * 1024^7, units = "ZB", expected = "error")
assert_size(3.1 * 1024^8, units = "YB", expected = "error")
## Automatic 'legacy' units (default)
assert_size(0, units = "auto", expected = "0 bytes")
assert_size(1, units = "auto", expected = "1 bytes")
assert_size(1023, units = "auto", expected = "1023 bytes")
assert_size(1024, units = "auto", expected = "1 Kb")
assert_size(2.0 * 1000^2, units = "auto", expected = "1.9 Mb")
## Automatic 'legacy' units
assert_size(0, units = "auto", standard = "legacy", expected = "0 bytes")
assert_size(1, units = "auto", standard = "legacy", expected = "1 bytes")
assert_size(1023, units = "auto", standard = "legacy", expected = "1023 bytes")
assert_size(1024, units = "auto", standard = "legacy", expected = "1 Kb")
assert_size(2.0 * 1000^2, units = "auto", standard = "legacy", expected = "1.9 Mb")
assert_size(3.1 * 1024^3, units = "auto", standard = "legacy", expected = "3.1 Gb")
assert_size(3.1 * 1024^4, units = "auto", standard = "legacy", expected = "3.1 Tb")
assert_size(3.1 * 1024^5, units = "auto", standard = "legacy", expected = "3.1 Pb")
assert_size(3.1 * 1024^6, units = "auto", standard = "legacy", expected = "3174.4 Pb")
## Automatic 'IEC' units
assert_size(0, units = "auto", standard = "IEC", expected = "0 B")
assert_size(1, units = "auto", standard = "IEC", expected = "1 B")
assert_size(1023, units = "auto", standard = "IEC", expected = "1023 B")
assert_size(1024, units = "auto", standard = "IEC", expected = "1 KiB")
assert_size(2.0 * 1000^2, units = "auto", standard = "IEC", expected = "1.9 MiB")
assert_size(3.1 * 1024^3, units = "auto", standard = "IEC", expected = "3.1 GiB")
assert_size(3.1 * 1024^4, units = "auto", standard = "IEC", expected = "3.1 TiB")
assert_size(3.1 * 1024^5, units = "auto", standard = "IEC", expected = "3.1 PiB")
assert_size(3.1 * 1024^6, units = "auto", standard = "IEC", expected = "3.1 EiB")
assert_size(3.1 * 1024^7, units = "auto", standard = "IEC", expected = "3.1 ZiB")
assert_size(4.2 * 1024^8, units = "auto", standard = "IEC", expected = "4.2 YiB")
assert_size(4.2 * 1024^9, units = "auto", standard = "IEC", expected = "4300.8 YiB")
## Automatic 'SI' units
assert_size(0, units = "auto", standard = "SI", expected = "0 B")
assert_size(1, units = "auto", standard = "SI", expected = "1 B")
assert_size(999, units = "auto", standard = "SI", expected = "999 B")
assert_size(1000, units = "auto", standard = "SI", expected = "1 kB")
assert_size(1024, units = "auto", standard = "SI", expected = "1 kB")
assert_size(2.0 * 1000^2, units = "auto", standard = "SI", expected = "2 MB")
assert_size(3.1 * 1000^3, units = "auto", standard = "SI", expected = "3.1 GB")
assert_size(3.1 * 1000^4, units = "auto", standard = "SI", expected = "3.1 TB")
assert_size(3.1 * 1000^5, units = "auto", standard = "SI", expected = "3.1 PB")
assert_size(3.1 * 1000^6, units = "auto", standard = "SI", expected = "3.1 EB")
assert_size(3.1 * 1000^7, units = "auto", standard = "SI", expected = "3.1 ZB")
assert_size(4.2 * 1000^8, units = "auto", standard = "SI", expected = "4.2 YB")
assert_size(4.2 * 1000^9, units = "auto", standard = "SI", expected = "4200 YB")
UPDATE: 2017-01-01: Forgot that SI uses 'kB'; minor tweaks above.
from wishlist-for-r.
I'll just add a link to a thread on twitter for your future references on this topic: https://twitter.com/henrikbengtsson/status/1231986947360354305
from wishlist-for-r.
Posted PR18297 titled 'Use standard file-size units everywhere in base R (e.g., Mb -> MiB)' on 2022-02-01.
from wishlist-for-r.
SI prefixes RB (ronnabytes) and QB (quettabytes) have been added to R-devel (to become R 4.3.0), cf. wch/r-source@cd2d0ba
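For reference, the two new prefixes extend the SI ladder by two more factors of 1000. The sketch below only illustrates the arithmetic; the si_units and si_bytes names are made up here and do not reflect how R stores its unit table internally.

```r
# SI byte units, including the newer ronna (R, 10^27) and quetta (Q, 10^30).
si_units <- c("B", "kB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB", "RB", "QB")

# Hypothetical lookup table (illustration only): bytes per named SI unit.
si_bytes <- 1000^(seq_along(si_units) - 1L)
names(si_bytes) <- si_units

one_rb <- si_bytes[["RB"]]  # 1000^9 bytes
one_qb <- si_bytes[["QB"]]  # 1000^10 bytes
print(c(RB = one_rb, QB = one_qb))
```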
from wishlist-for-r.
One more location to fix was just added to src/main/memory.c in R-devel, cf. wch/r-source@459492b.
from wishlist-for-r.