Comments (12)

HenrikBengtsson commented on May 28, 2024

UPDATE: SI units are now supported in R-devel, see r71960.
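
For reference, a minimal sketch of what this looks like from the user's side. It assumes an R version whose format() method for object_size accepts the standard argument; the size below is made up for illustration.

```r
## A made-up size of 2,000,000 bytes; the class attribute makes format()
## dispatch to the object_size method.
size <- structure(2e6, class = "object_size")

si     <- format(size, units = "auto", standard = "SI")      # base 1000
iec    <- format(size, units = "auto", standard = "IEC")     # base 1024
legacy <- format(size, units = "auto", standard = "legacy")  # base 1024, R's old labels

print(c(SI = si, IEC = iec, legacy = legacy))
```

The same byte count should come out as 2 MB, 1.9 MiB and 1.9 Mb respectively, which is the whole point of making the standard explicit.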

from wishlist-for-r.

HenrikBengtsson commented on May 28, 2024

Filed PR18435 adding new SI prefixes RB (ronnabytes) and QB (quettabytes) to format() for object_size.

HenrikBengtsson commented on May 28, 2024

As a first step, I just filed a backward-compatible patch to add support for IEC units in utils:::format.object_size(), cf. PR #16649.

UPDATE: This has been implemented as of 2016-01-06 in r69879.

HenrikBengtsson commented on May 28, 2024

IEC units are now supported by R. As the next step, I filed a backward-compatible patch to add support for JEDEC units in utils:::format.object_size(), cf. PR #16657.

mmaechler commented on May 28, 2024
  • Can you give an example and reference for "The Ubuntu Linux distribution uses the IEC prefixes since 2010"? Personally, I find the 'KiB' notation quite ugly. I see that df -h, du -h, and ls -h all use the suffixes K, M, G, ... but no "iB" (or "B" or "b").
  • The real problem is that the SI standard really wants "KB" or "MB" to mean something different from "KiB" or "MiB", and JEDEC does not. But the SI system really is the world standard, whereas JEDEC is mainly "industry" and not science-based (which SI is). So, in principle, if we are willing to break backward compatibility, we should move towards the real world standard, i.e., the SI system. Consequently, I'd be against endorsing JEDEC any more than we do now (by accepting it on "input").

HenrikBengtsson commented on May 28, 2024

Thanks for the comments.

  • I got the "Ubuntu" statement from [1], but must have been sloppy. I've now clarified it to say: "The Ubuntu Linux distribution uses the IEC prefixes for base-2 units and SI prefixes for base-10 units" which reflects Ubuntu's official UnitsPolicy.
  • Searching the web, there are references starting ~2010 (around Ubuntu 10.10) saying Ubuntu will move to using decimal/base-10 units with SI prefixes throughout. I don't know where they are regarding that goal.
  • SI vs JEDEC confusion: If I understand your comment correctly, you're saying we'll introduce more confusion if we explicitly add support for JEDEC. If so, I agree with you. My idea was to introduce it properly, to make it explicit that the old R units are home brewed. I'm happy to skip JEDEC.
  • Long-term for R: If this is what you are saying, I agree; supporting both decimal/base-10 and binary/base-2 units, using SI and IEC prefixes respectively, would be ideal. Since R has only a single API entry point (utils:::format.object_size()), we could even introduce an argument base = getOption("object.size.base", 2) controlling whether base 2 or base 10 is displayed (when units="auto"). It would also allow us to migrate smoothly from the current base 2 to base 10 (and allow users to undo this via the option), if that is where we are heading. BTW, gc() should utilize utils:::format.object_size().
  • To implement the transition from R's current base-2 units (Kb, Mb, Gb) to SI/base-10 units (kB, MB, GB), it might be less of a shock if this is done over a few release cycles:
    1. Switch to using IEC/base-2 units (KiB, MiB, GiB, ...) for units="auto".
    2. Deprecate explicit usage of units="Kb", units="Mb", ...
    3. Switch to using SI/base-10 units (kB, MB, GB, ...) for units="auto".

What do you think?
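
To make the idea of a base argument backed by an option concrete, here is a rough sketch. Both the helper format_bytes() and the option name object.size.base are hypothetical; nothing like this exists in R, and the real change would go into format.object_size() instead.

```r
## Hypothetical helper, not part of R. 'object.size.base' is an assumed
## option name; base 2 uses IEC prefixes, base 10 uses SI prefixes.
format_bytes <- function(x, base = getOption("object.size.base", 2), digits = 1L) {
    stopifnot(base %in% c(2, 10))
    if (base == 2) {
        step  <- 1024
        units <- c("B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB", "ZiB", "YiB")
    } else {
        step  <- 1000
        units <- c("B", "kB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB")
    }
    ## Largest power of 'step' not exceeding x, capped at the last known unit
    power <- 0L
    while (x >= step^(power + 1L) && power < length(units) - 1L)
        power <- power + 1L
    paste(round(x / step^power, digits = digits), units[power + 1L])
}

iec_out <- format_bytes(2e6, base = 2)    # binary units
si_out  <- format_bytes(2e6, base = 10)   # decimal units

## Users could flip the default globally and undo it again:
old <- options(object.size.base = 10)
opt_out <- format_bytes(2e6)
options(old)
```

With the option set to 10 the default output follows SI, and restoring the old options brings back the base-2 behaviour, which is the "smooth migration" part of the proposal.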

HenrikBengtsson commented on May 28, 2024

Another approach that could work is to add support for units="IEC", units="SI" and units="legacy". That can be done without breaking backward compatibility. Initially, units="auto" would be equivalent to units="legacy", and any future transition would only change what units="auto" corresponds to.

UPDATE: The issue with this is that it's not possible to control whether units="MB" is meant as current R "legacy" (base-2) units or as SI (base-10) units.
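
The ambiguity is easy to see with plain arithmetic. A small sketch, with a made-up size of 2,000,000 bytes:

```r
bytes <- 2e6

## "MB" read as R's legacy unit: 1 MB = 1024^2 bytes
legacy_mb <- round(bytes / 1024^2, digits = 1)

## "MB" read as an SI unit: 1 MB = 1000^2 bytes
si_mb <- round(bytes / 1000^2, digits = 1)

print(c(legacy = legacy_mb, SI = si_mb))
```

The same label gives 1.9 under one reading and 2 under the other, so the unit string alone cannot tell the formatter which base the caller had in mind.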

HenrikBengtsson commented on May 28, 2024

Here's my new proposal for supporting "legacy", IEC and SI units in a backward compatible way and such that it will be easy to switch from today's default "legacy" to SI units at some point in R's future.

The file to be updated in R is src/library/utils/R/object.size.R:

object.size <- function(x)
    structure(.Call(C_objectSize, x), class = "object_size")

format.object_size <- function(x, units = "b", standard = "auto", digits = 1L, ...)
{
    known_bases <- c(legacy = 1024, IEC = 1024, SI = 1000)
    known_units <- list(
        SI      =  c("B", "kB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"),
        IEC     =  c("B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB", "ZiB", "YiB"),
        legacy  =  c("b", "Kb", "Mb", "Gb", "Tb", "Pb"),
        LEGACY  =  c("B", "KB", "MB", "GB", "TB", "PB")
    )

    units <- match.arg(units, c("auto", unique(unlist(known_units, use.names = FALSE))))
    standard <- match.arg(standard, c("auto", names(known_bases)))

    ## Infer 'standard' from 'units'?
    if (standard == "auto") {
        standard <- "legacy"           ## default; to become "SI"
        if (units != "auto") {
            if (grepl("iB$", units)) {
                standard <- "IEC"
            } else if (grepl("b$", units)) {
                standard <- "legacy"   ## keep when "SI" is the default
            } else if (units == "kB") {
                ## SPECIAL: Drop when "SI" becomes the default
                stop("For SI units, please specify standard = \"SI\"")
            }
        }
    }

    base <- known_bases[[standard]]
    units_map <- known_units[[standard]]

    if (units == "auto") {
        power <- if (x <= 0) 0 else min(as.integer(log(x, base = base)), length(units_map) - 1L)
    } else {
        power <- match(toupper(units), toupper(units_map)) - 1L
        if (is.na(power)) {
            stop(gettextf("Unit %s is not part of standard %s", sQuote(units), sQuote(standard)))
        }
    }

    unit <- units_map[power + 1L]

    ## SPECIAL: Use suffix 'bytes' instead of 'b' for 'legacy'
    if (power == 0 && standard == "legacy") unit <- "bytes"
    
    paste(round(x / base^power, digits = digits), unit)
}

print.object_size <-
    function(x, quote = FALSE, units = "b", standard = "auto", digits = 1L, ...)
{
    y <- format.object_size(x, units = units, standard = standard, digits = digits)
    if(quote) print.default(y, ...) else cat(y, "\n", sep = "")
    invisible(x)
}

Examples and tests

assert_size <- function(x, ..., expected) {
    size <- structure(x, class = "object_size")
    res <- try(format(size, ...), silent = TRUE)
    if (expected == "error") {
        if (!inherits(res, "try-error"))
            stop(sprintf("Expected %s but got %s", sQuote(expected), sQuote(res)))
    } else if (res != expected) {
        stop(sprintf("Expected %s but got %s", sQuote(expected), sQuote(res)))
    }
}

## The default is the 'legacy' standard (backward compatibility)
assert_size(0,    expected = "0 bytes")
assert_size(1,    expected = "1 bytes")
assert_size(1023, expected = "1023 bytes")
assert_size(1024, expected = "1024 bytes")

## Standard inferred from 'legacy' units
assert_size(0,            units = "b",  expected = "0 bytes")
assert_size(1,            units = "B",  expected = "1 bytes")
assert_size(999,          units = "B",  expected = "999 bytes")
assert_size(1000,         units = "Kb", expected = "1 Kb")
assert_size(1024,         units = "KB", expected = "1 Kb")
assert_size(2.0 * 1000^2, units = "MB", expected = "1.9 Mb")
assert_size(3.1 * 1000^3, units = "GB", expected = "2.9 Gb")
assert_size(4.2 * 1000^8, units = "TB", expected = "3819877747446.3 Tb")
assert_size(4.2 * 1000^9, units = "Pb", expected = "3730349362740.5 Pb")

## Standard inferred from 'IEC' units
assert_size(1000,         units = "KiB", expected = "1 KiB")
assert_size(1024,         units = "KiB", expected = "1 KiB")
assert_size(2.0 * 1000^2, units = "MiB", expected = "1.9 MiB")
assert_size(3.1 * 1000^3, units = "GiB", expected = "2.9 GiB")
assert_size(4.2 * 1000^8, units = "TiB", expected = "3819877747446.3 TiB")
assert_size(4.2 * 1000^9, units = "PiB", expected = "3730349362740.5 PiB")

## Inferring standard from 'SI' units is not possible because they
## conflict with 'legacy' units (and it would be confusing to support
## high-range SI units not covered by the legacy units)
assert_size(3.1 * 1024^1, units = "kB", expected = "error")
assert_size(3.1 * 1024^6, units = "EB", expected = "error")
assert_size(3.1 * 1024^7, units = "ZB", expected = "error")
assert_size(3.1 * 1024^8, units = "YB", expected = "error")


## Automatic 'legacy' units (default)
assert_size(0,            units = "auto", expected = "0 bytes")
assert_size(1,            units = "auto", expected = "1 bytes")
assert_size(1023,         units = "auto", expected = "1023 bytes")
assert_size(1024,         units = "auto", expected = "1 Kb")
assert_size(2.0 * 1000^2, units = "auto", expected = "1.9 Mb")

## Automatic 'legacy' units
assert_size(0,            units = "auto", standard = "legacy", expected = "0 bytes")
assert_size(1,            units = "auto", standard = "legacy", expected = "1 bytes")
assert_size(1023,         units = "auto", standard = "legacy", expected = "1023 bytes")
assert_size(1024,         units = "auto", standard = "legacy", expected = "1 Kb")
assert_size(2.0 * 1000^2, units = "auto", standard = "legacy", expected = "1.9 Mb")
assert_size(3.1 * 1024^3, units = "auto", standard = "legacy", expected = "3.1 Gb")
assert_size(3.1 * 1024^4, units = "auto", standard = "legacy", expected = "3.1 Tb")
assert_size(3.1 * 1024^5, units = "auto", standard = "legacy", expected = "3.1 Pb")
assert_size(3.1 * 1024^6, units = "auto", standard = "legacy", expected = "3174.4 Pb")

## Automatic 'IEC' units
assert_size(0,            units = "auto", standard = "IEC", expected = "0 B")
assert_size(1,            units = "auto", standard = "IEC", expected = "1 B")
assert_size(1023,         units = "auto", standard = "IEC", expected = "1023 B")
assert_size(1024,         units = "auto", standard = "IEC", expected = "1 KiB")
assert_size(2.0 * 1000^2, units = "auto", standard = "IEC", expected = "1.9 MiB")
assert_size(3.1 * 1024^3, units = "auto", standard = "IEC", expected = "3.1 GiB")
assert_size(3.1 * 1024^4, units = "auto", standard = "IEC", expected = "3.1 TiB")
assert_size(3.1 * 1024^5, units = "auto", standard = "IEC", expected = "3.1 PiB")
assert_size(3.1 * 1024^6, units = "auto", standard = "IEC", expected = "3.1 EiB")
assert_size(3.1 * 1024^7, units = "auto", standard = "IEC", expected = "3.1 ZiB")
assert_size(4.2 * 1024^8, units = "auto", standard = "IEC", expected = "4.2 YiB")
assert_size(4.2 * 1024^9, units = "auto", standard = "IEC", expected = "4300.8 YiB")

## Automatic 'SI' units
assert_size(0,            units = "auto", standard = "SI", expected = "0 B")
assert_size(1,            units = "auto", standard = "SI", expected = "1 B")
assert_size(999,          units = "auto", standard = "SI", expected = "999 B")
assert_size(1000,         units = "auto", standard = "SI", expected = "1 kB")
assert_size(1024,         units = "auto", standard = "SI", expected = "1 kB")
assert_size(2.0 * 1000^2, units = "auto", standard = "SI", expected = "2 MB")
assert_size(3.1 * 1000^3, units = "auto", standard = "SI", expected = "3.1 GB")
assert_size(3.1 * 1000^4, units = "auto", standard = "SI", expected = "3.1 TB")
assert_size(3.1 * 1000^5, units = "auto", standard = "SI", expected = "3.1 PB")
assert_size(3.1 * 1000^6, units = "auto", standard = "SI", expected = "3.1 EB")
assert_size(3.1 * 1000^7, units = "auto", standard = "SI", expected = "3.1 ZB")
assert_size(4.2 * 1000^8, units = "auto", standard = "SI", expected = "4.2 YB")
assert_size(4.2 * 1000^9, units = "auto", standard = "SI", expected = "4200 YB")

UPDATE: 2017-01-01: Forgot that SI uses 'kB'; minor tweaks above.

llrs commented on May 28, 2024

I'll just add a link to a Twitter thread on this topic for future reference: https://twitter.com/henrikbengtsson/status/1231986947360354305

HenrikBengtsson commented on May 28, 2024

Posted PR18297 titled 'Use standard file-size units everywhere in base R (e.g., Mb -> MiB)' on 2022-02-01.

HenrikBengtsson commented on May 28, 2024

SI prefixes RB (ronnabytes) and QB (quettabytes) have been added to R-devel (to become R 4.3.0), cf. wch/r-source@cd2d0ba

HenrikBengtsson commented on May 28, 2024

One more location to fix was just added to src/main/memory.c in R-devel, cf. wch/r-source@459492b.
