Comments (23)
I've got a lot of code to do this. And not just for weekends either...it will work for public holidays in defined locales. I will think about how to do this..
from lubridate.
Imanuel, that would be excellent.
from lubridate.
I will start with business_days()
. I think its argument should be numeric (e.g. 3) rather than a Period (e.g. days(3)) and should return a Period. Having a Period argument feels like doubling up.
Did you want to make business_days
general enough so that in the future it can deal with public holidays (e.g. Christmas)?
from lubridate.
If I understand the idea correctly, the semantics should be exactly as that of days(). So it should accept a numeric argument. It should not return a period, but rather an object of class that derives from "Period". May be "Working Period", "Business Period" or whatever.
from lubridate.
Alternatively new arguments can be added to existing functions: days(30,
keep="business"), days(20, keep="holidays"), days(3, keep="weekend").
Exclusion can be useful: days(30, exclude="weekend"), days(30,
exclude=c("Monday", "Saturday").
Further examples: years(4, exclude="leap"), seconds(4, keep="odd")
Not sure about keep <> exclude. May be keep<>skip.
from lubridate.
@Vitoshka thanks for the comments. Creating a subclass of Period might work. However, what I've done at the moment is more the alternative you present. The way I see it:
- A period is a period. Whether or not it is a calendar period (e.g. days, months etc) or not (e.g. business days). Existing periods don't have exact lengths and depend on the starting / end date. Same deal for business days - albeit a little more rubbery.
- Working day periods have calendar day equivalents (given a reference date). For example,
ymd(20131005) + business_days(1) # 2013-10-07 UTC
ymd(20131005) + days(2) # 2013-10-07 UTC
So what I've done is created a new slot in the Period
class, bday
which is used more or less solely so that the code can figure out the equivalent day
value using a new function, bdays_to_cdays
. I'm not sure its the best way of tackling this, but it reflects reality (x business days = y calendar days given a particular reference date) and it makes the code update easier because I'm really only tapping into existing functionality.
In terms of semantics and dealing with public holidays, you need to be able to specify an exact political locale. For example, the 25 April is a public holiday in most cities in Australia but isn't a public holiday in most parts of the US. The locale should be city specific (perhaps identified by UN/LOCODE as above), because different cities in the same larger political entity have different public holidays (e.g. Melbourne Cup is an official public holiday in Melbourne Australia but not in Sydney Australia). This locale is endogenous to the business_day
concept and as such I would argue that it is necessary to tweak the semantics so that the business_day()
includes a locale
argument (which for eg days()
does not have).
There is another issue related to public holidays which I haven't yet resolved: how do you store locale alongside bday in the Period
class? One option I am considering is using names
attributes for the bday
slot. Where no names are provided, locales are not used and default to excluding weekends only.
In terms of task left for me:
- Update / write documentation to reflect changes / additions I have made
- Write new test cases for
business_day()
(yes yes should have done that earlier) - Make sure other test cases pass with flying colours
- Decide whether or not to implement public holiday elements to
business_day()
and if so, how. This might have to wait until later but will try to keep the code flexible enough to accommodate this possibility.
from lubridate.
Imanuel Costigan on Sun, 06 Oct 2013 14:24:04 -0700 wrote:
[...]
So what I've done is created a new slot in the Period class, bday which is used
more or less solely so that the code can figure out the equivalent day value
using a new function, bdays_to_cdays. I'm not sure its the best way of tackling
this, but it reflects reality (x business days = y calendar days given a
particular reference date) and it makes the code update easier because I'm
really only tapping into existing functionality.In terms of semantics and dealing with public holidays, you need to be able to
specify an exact political locale.
Indeed. And this is why days(20, skip=holidays("Melbourne/Australia"))
might work better.
How about instead of @bday slot in period class to use a more generic
@Skip and @keep slots? If we decide to add this functionality, it better
be as generic as it possibly can be, even if it might be a bit more
work.
@Skip and @keep should be of an union class of period enumeration
classes. For example "holidays_enum" class is an enumeration of holidays
in a specific locale (also containing a slot to indicate the locale);
"leap_years_enum" is an enumeration of leap years; "wdays_enum" is an
enumeration of week days to skip etc.
There is another issue related to public holidays which I haven't yet
resolved: how do you store locale alongside bday in the Period class?
One option I am considering is using names attributes for the bday
slot. Where no names are provided, locales are not used and default
to excluding weekends only.
Please don't use attributes. Just define a new class with locale slot
and use it as a slot in Period class.
In terms of task left for me:
1 Update / write documentation to reflect changes / additions I have made
2 Write new test cases for business_day() (yes yes should have done that
earlier)
3 Make sure other test cases pass with flying colours
4 Decide whether or not to implement public holiday elements to business_day()
and if so, how. This might have to wait until later but will try to keep the
code flexible enough to accommodate this possibility.
Sounds good. But, let's figure out the interface first. Nobody wants to
do double work;)
Thanks,
Vitalie
from lubridate.
I'm not convinced by your interface proposal. Your suggestion does seem a bit more algorithmically elegant. However:
- Distinguishing between calendar
days
andbusiness_days
adds a bit more of a sharper distinction between the two period types (perhaps just likedays
andmonths
are distinguished - albeit a little less so). - Existing understanding of
days
will not be interrupted - Using
days
for both calendar and business days will require much more rework of existing code
See commit for my proposed approach.
In terms of the holiday_enum class, are you suggesting explicit dates be stored therein? I think that would not be feasible. It would be "easier" to functionally specify bad date checkers for specific locales (along with any aberrations from general holiday rules) than maintain a vector of dates. See the bday.r of my lubridate fork for a skeleton of the approach that might be taken.
from lubridate.
Comments are below. Here is an OO version of your code, specifically
your "bad_day_checker" idea.
Add a slot "skip" to Period class of virtual class "lubridateEnum".
Now, for each "thing" you want to skip define a class that derives from
"lubridateEnum" and define a methods "continuous_period" with 2
arguments which will convert Periods with jumps into standard
periods. First argument is of the derived class "lubridateEnum", second
is the Period. The dispatch is over the first argument. Return value of
"continuous_period" is a "standard" Period with "skip" slot NULL.
Examples.
You want to skip holidays. Then define derived "holidays" class (with
one slot: "location") and define "continuous_period(holidays, period)".
Same for weekends. Define derived class "weekends" from "lubridateEnum"
and define "continuous_period" for lubridateEnum.
Same for leap_years, odd_seconds etc. Now each time someone want a
weirdo skipping mechanism he/she can simply define their class and
"continuous_period" conversion method.
Imanuel Costigan on Sun, 06 Oct 2013 17:38:38 -0700 wrote:
1 Distinguishing between calendar days and business_days adds a bit more of a
sharper distinction between the two period types (perhaps just like days and
months are distinguished - albeit a little less so).
You don't distinguish period "types" either. You use the same class to
represent both. My whole point is that if you start altering Period
class by adding new slots, better add a generic slot which is
potentially useful in many circumstances.
In your version Period class has @DayS and @bdays containing redundant
information. Not good.
Note that month, days etc are not separate types ether, they are all
represented by the same class.
2 Existing understanding of days will not be interrupted
Again, you are altering the Period for the sake of business days only,
and thus modify the "existing understanding of days". That you don't
modify the constructor is a completely different issue. Less
constructors you have, more elegant is the language. Lubridate already
has enough constructors to hang yourself.
3 Using days for both calendar and business days will require much more rework
of existing code
See comment 2 above. You are already using it that way. It is not more
code, really.
See commit for my proposed approach.
Something went wrong. There are a lot of unrelated changes showing on
github. Might be a github diff issue.
In terms of the holiday_enum class, are you suggesting explicit dates be stored
therein?
Not necessary. It can be a simple label indicating the locale. Whenever
the computation on holidays is needed the code will use the central
database.
I think that would not be feasible. It would be "easier" to
functionally specify bad date checkers for specific locales (along
with any aberrations from general holiday rules) than maintain a
vector of dates. See the bday.r of my lubridate fork for a skeleton
of the approach that might be taken.
If someone will want to skip only "Mondays" or "leap year" will you add
"lyear" and "wday" slots to Period class? I am sorry but this doesn't
look like the way to go.
from lubridate.
User interface
Let's start with some use cases (of interest to me) to clearly see the user interface:
now() + months(1)
now() + months(1, spring=Weekends(locale = 'au/syd', roll='f', end_to_end=TRUE))
now() + months(1, spring=Holidays(locale='au/syd', roll='f', end_to_end=TRUE))
Comments:
- The first case is simply what is currently occurring. Implementation wise,
months(1) == months(1, spring=NULL)
- The second case would modify the first. It would spring forward one month (as per usual). If that date falls on a weekend, that date is rolled to the following (
'f'
for following) date which a weekday. If the end-to-end convention applies (end_to_end = TRUE
), then ifnow()
is the last weekday of the month, the resulting date would also be the last weekday of its month. It may be possible to add alocale
argument toWeekends()
as this may be defined differently across the world (especially between the Western world and say the Middle East). - The third case is a simple expansion of the concept in the second case. Instead of weekends being "bad days", weekends and gazetted public holidays in the given
locale
would be bad days. The roll and end-to-end conventions apply in a similar fashion. - The
locale
parameter should be based on the UN/LOCODE with the first bit representing the country and the second bit the city. IATA codes may be alternative. However, these are tied particularly to airports and some places don't have airports. The UN/LOCODE database would be more comprehensive. - The
roll
parameter could take a number of different options. I would expect to include the business day roll conventions common in financial markets (see Chapter 4). - The end-to-end convention is described above and should be
logical
.
Sketch of other potential uses:
now() + days(2, spring=Holidays(locale='au/syd', roll='f', end_to_end=FALSE))
now() + years(1, spring=Holidays(locale='au/syd', roll='f', end_to_end=TRUE))
It might also be desirable to have union / intersection of holidays.
# Union
now() + years(1, spring=Holidays(locale=c('au/syd,us/nyc', roll='f', end_to_end=TRUE))
# Intersection
now() + years(1, spring=Holidays(locale=c('au/syd&us/nyc', roll='f', end_to_end=TRUE))
OK so this is what I would envisage being the user interface. The next step is describing a good implementation of this.
from lubridate.
Implementation
(A work in progress)
Let's work backward on implementation.
From the above, the formal arguments of the Period
constructors, days()
, weeks()
, months()
etc will need to be expanded to include spring
. These constructors are enclose of new_period()
.
This additional formal argument will be derived from the virtual class Springer
. Two derived classes of this have been hinted at in the examples above: Weekends
and Holidays
. Their constructors will bear the same name. Holidays
should be derived from the Weekend
class as weekends are also deemed to be holidays (for my purposes). Also create a derived class, say NoSpringer()
that does nothing (maintains current behaviour). This is slightly cleaner than class union approach. If we are going to introduce classes to this, we may as well do this fully.
In terms of how the ops are modified to deal with springers, would define appropriate generics that have the operands as formala arguments, with method dispatch based on the class of the Springer. This would be called from the relevant generic ops method for Period classes.
...
Tasks:
- Write classes
Springer
,NoSpringer
,Weekends
andHolidays
- Write constructors for these classes bearing the same names.
- modify
days()
,weeks()
,months()
etc for new formal argument,spring
. Set its default value toNULL
. - Write
as.character()
method forSpringer
classes. These can be used in show methods forPeriod
- Write
spring
generic with methods for the Springers above and embed this inadd_period_to_date()
.
from lubridate.
I have a couple of questions on this design:
- Why "spring" and not "skip" or "exclude"? Don't you just want to exclude Weekends/Holidays when computing a period? Spring is also a season, and might be confusing with months and years.
- What is the effect of
now() + months(1, spring=Holidays(locale='au/syd', roll='f', end_to_end=TRUE))
andnow() + years(1, spring=Holidays(locale=c('au/syd&us/nyc', roll='f', end_to_end=TRUE))
. - Do we need end_to_end as part of the period class? We already have %m+% to roll to end of the month. It might be a good idea, though.
from lubridate.
Instead of modifying everything that already exists and then working with a time zone like system for holiday locales, might it be easier to create a new S4 class? It is a bit of work (especially to document), but then you can handle the behavior in the class methods for the math operators. This is how lubridate approaches the other timespan classes. To keep things simple you can let individual users create their own holiday schedules.
The idea would be to create a new timespan object that behaves like a period but has a slot that contains forbidden dates, perhaps in some sort of calendar object.
work_days <- new_odd_period("day", skip = my_holidays)
now() + work_days(5)
To second idea is to save work by letting users assemble their own list of holidays. You might need to think about what tools can help them here.
my_holidays <- ymd(20140101, 20141225, 20140704)
# or
# every weekend in 2014
my_holidays <- calendar(2014)[wday(calendar(2014)) == 1 | wday(calendar(2014)) == 7]
You could use this system to create a dedicated week_day
time span (e.g. skips weekends). week_day
could come in lubridate ready to use, but also serve as an example for a vignette that describes how to make a customized time span with the system.
Making the math work quickly and correctly seems like a challenge to me, but that will be a challenge no matter the interface.
As a bonus, the new class will be independent of the stable parts of lubridate, it will have its own dedicated documentation pages, and you could split it off as its own package if that appeals to you.
from lubridate.
@garrettgman a few responses:
- I've gone down the rabbit hole of creating vectors of dates that are bad days. This is quite a lot of work, you have to find a reliable source of all such dates for many years in advance, and I'd rather implement something more formulaic (which works pretty well in my experience).
- I agree that there's merit to creating a new class derived from Timespan (or possibly an extension of
Period
?). As you mentioned, it will prevent the existing stable Period class and methods from being polluted. That's a pretty big plus. The downside is that there will be yet another class available to represent timespans. But given that we are creating a new class anyway (Springer
at the moment), this may not be such a big deal. Although this is something that is hidden from users unless they want to create a Springer.
As the maintainer, how strongly do you feel about hiving this off into a new class derived from Timespan? It's your garden that we're playing in.
@Vitoshka a few responses:
- I have changed the
spring
slot tospringer
in thePeriod
class and reservedspring
for method names (i.e. more clearly a verb). There is still a danger to using spring. I'll rethink (and in light of @garrettgman suggestions). - the effect of the first call is to return the date one month from
now()
. However, if that date is not a business day in Sydney, then the following business date will be returned. In the case whennow()
is the last business day of the month andend_to_end=TRUE
, the call will return the last business day of the next month. Same goes for the other call. In a sense this isn't important. You can create your ownSpringer()
classes to behave differently.
from lubridate.
@Vitoshka perhaps Calendar
is more appropriate class name than Springer
and roll()
instead of spring()
.
from lubridate.
@imanuelcostigan I don't see a new class as a downside. In general, it is harder to maintain arguments than to maintain new classes and functions. Since arguments affect every instance of a function, they are more likely to create unintended side effects. Spreading a function over many cases also shrinks your options when you have to debug. Ideally, if you develop a timespan that behaves in a new way, I'd like to see that in a new class. If you develop a new way to add dates, I'd like to see that in its own function or operator.
A. I think that you can stay out of rabbit holes if you give your users the tools to create their own vectors of dates. That will give them complete control over the process as well. I see why you'd like to give them preset dates or locales. I just don't see how you'll be able to compile all of that data, or how a user could tweak it should they wish. That said, maybe a vector of dates won't actually help you when you implement the math.
You can make vectors easier for users with helper functions.
# 1. Use one of these to create a convenient set of dates
calendar <- function(year) {
start <- ymd(paste0(year, "0101"))
start + days(0:(364 + leap_year(start)))
}
decade <- function(year) { #something similar }
century <- function(year) { #something similar }
USholidays <- function(year) {
ymd(paste0(year, .us_holidays))
}
dates <- calendar(2014)
# Remove dates with logical tests
# just weekends in 2014
weekend <- function(x) wday(x) %in% c(1, 7)
dates <- dates[weekend(dates)]
# add US thanksgiving, the last Thurs of November
nov <- ymd(20141101) + days(0:30)
thurs <- nov[wday(nov) == 5]
thanksgiving <- thurs[length(thurs)]
dates <- c(dates, thanksgiving)
B. May I ask for clarification?
Are you suggesting (1) a new way to do math, like
ymd(20140205) + bdays(10) = 2014-02-17 # because 2014-02-15 was a Saturday
Here the Periods don't change; the math operator modifies its result to avoid business days. So I'd suggest a new operator or function instead of an argument. %m+%
is an example, but a function might work better
roll_forward_days(ymd(20140205) + bdays(10), avoid = "weekends")
Or are you suggesting (2) a new type of timespan that follows an altered time line, like
ymd(20140205) + bdays(10) = 2014-02-19 # because this is the tenth bday after Feb 5.
Here bdays follow a timeline that does not include weekends. That's a significant change from regular periods, so I'd suggest a new timespan class that uses its own methods, for arithmetic and anything else that seems appropriate.
from lubridate.
- Providing an API for user provided "bad days" might be useful. However, it is much less useful to me than a functional implementation and so is going to be less of a priority for me.
- I'd recast the first example you provide as
roll_forward_days(ymd(20140205) + days(10), avoid = "weekends")
as more likely in the cases I'm considering. The second example is also is another use case. In the first case, you are simply adding 10 calendar days to an instant and rolling the result if it falls on a weekend (according to some rule). In the second case, you are adding 10 business days to an instant. Basically when you are in the environment I'm interested in, you would rather not worry about "bad" days. You want to pass them over one way or the other.
I take your points about expanding existing class and modding its functions vs. new class(es) and new functions (or new methods as is appropriate) and don't disagree. That's sitting on one side of a scale. The other side of the scale is that this makes user interaction more complex. Users will need to know that in order to tap into the behaviour I'm wanting, they need to use different sets of period constructors etc. Perhaps not a great additional cognitive tax; but one nevertheless. In the implementation I'm considering (see calendars.R and periods.R in my fork), users can still using month()
etc without worrying about all the guff I'm wanting to load up as default options will mean they end up where they are currently landing. Granted, if hacking existing classes and methods introduces bugs, then that's arguably worse in user land than more complex interactions. But isn't that why we test? It makes developer land harder, but user land is simplified (assuming the implementation is not messed up as a result).
In any case, I'm happy to extend Timespan
to get what I'm looking for. It will mean duplicating large slabs of the Period infrastructure for minor code modifications.
Appreciate your feedback.
from lubridate.
Lubridate is already quite complex in terms of the number of different time-span classes and various constructors. Given that the new "odd_periods" conceptually behaves as "period" it might be worth re-using the same constructors which would return different classes depending on the given parameters. By adding new subclasses of a period one would encourage proliferation of nonsensical constructors: bdays, bmonths, my_weired_holidays etc. Instead, as Imanuel intends, one can define new behavior through the date skipping classes.
One can also use a general "elastic_Period" class with a classed slot ("skipper" or "springer" or whatever) to be a subclass of "period". Then days(10, skip=Holidays())
will return elastic_Period
with "skipper" slot set to the object returned by Holidays()
. This is what Imanuel is after, if I understand correctly. But, then what's the point of adding new class elastic_Period
if one can simply generalize Period
by one slot, and keep the consistency of all the constructors?
The modifications to the current UI, as Imanuel proposes, are minimal and the behavior is still extensible in a full S4 way though the skipper classes.
from lubridate.
Guys, I'm not convinced.
We test because as code becomes complex, it is difficult to maintain it without introducing unintended bugs. We should modularize for the same reason.
It is not a cognitive tax to provide a new function or a new object class. The simplest way to introduce a new behavior is to give the user a new word that clearly defines it. In other words, the code must provide a language that is precise. The cognitive tax appears when you do not clearly separate things that act different. A user (and a developer) must ask how the skip class would change all of the current ways you can use a period. What happens when you
- add a period with a skip class to one without, or one with a different skip class
- modulo a period by a period with a skip class
- divide an interval by a period with a skip class
- coerce a period with a skip class to an interval by associating it with an instant
- coerce a period with a skip class to a duration
- logically compare the lengths of two periods with different skip classes
- access the minutes, hours, days, etc. element of a period with a skip class
- combine periods with two different skip classes into a vector
- do %m+% math which rolls back to the end of a month, when the end of the month is a skipped date
- etc.
It is simpler for everyone to write a distinct class method (for a distinct class) to handle each of these, than to make each of the period methods behave in different ways in the presence of a skip class. You can still provide helper functions that keep the UI intuitive, e.g.
now() + business_days(2)
If you just want to do math with existing periods under new conditions (so the results avoid certain dates), than it would be simpler to create a new function than a new S4 class.
It is also simpler to set up a system that behaves consistently than to try to protect the users from making odd elastic periods. The guy who needs to make my_weird_holidays
is going to file a bug when he cannot.
If lubridate is already too complex due to a proliferation of object classes, then this suggest that we should start thinking about a separate package.
from lubridate.
I did quite like the possibility of modifying the existing constructor APIs as it allows one to modify the operators. But the cost is a less obvious definition (as Garrett has demonstrated).
@garrettgman said:
If you just want to do math with existing periods under new conditions (so the results avoid certain dates), than it would be simpler to create a new function than a new S4 class.
Thinking about it overnight, I think this might be the best compromise. In essence here's what I would like to do:
- Create new classes
PeriodWithCalendar
andCalendar
. The form derives fromPeriod
with an extra slot for an object of typeCalendar
. The latter's definition is obvious. - Create the
business_day()
constructor for aPeriodWithCalendar
object with one argument,calendar
and define all the various operations associated with this. - I will need to write a new generic function
shift
that imitates+
and-
and takes as an arguments a vector of dates, a vector of periods (or perhaps a character vector that maps to Periods) and a roll convention. This is because some of the roll conventions I'm thinking of applying need to know the instant to which the period is being applied (and not just the result of the ops(instant, Period). This is why I was keen to modify the existing Period constructors. - The roll operations that I've defined will be defined in a new generic function
roll
which takes three arguments: a vector of dates,PeriodWithCalendar
object and the roll convention.
I'll see how I get on with this and send through a pull request for feedback.
from lubridate.
I've spun this functionality out into a separate package: https://github.com/imanuelcostigan/fmdates
from lubridate.
@imanuelcostigan, that's great! I guess fmdates
would also be a proper place for handling holidays as per #341.
from lubridate.
There's now also https://github.com/DavisVaughan/almanac — I think it's now clear this is sufficiently complex that it belongs outside of lubridate.
from lubridate.
Related Issues (20)
- `parse_date_time()` cannot match missing zeroes
- Inconsistent behavior of `parse_date_time()` inside `dplyr::mutate()` HOT 2
- month() and otehrs fail on objects from class 'timeDate'
- ymd_hms() function left-pads some dates that have missing "seconds" values
- round_date in 0.1 sec doesn't work correctly
- FR: int_overlaps with exclusive endpoints
- unique() always zero for periods HOT 1
- Implement Set Operations methods for Dates HOT 1
- Parsing dates with `my` seems to have a limit size
- Do we need something like `%m-%` to subtract years from leap 02/29? HOT 1
- Cannot compute `<date> + lubridate::year(1)` when `<date>` is a leap day. HOT 1
- m:s:ms time data
- `dmy()` not failing (and returning incorrect date) on wrong date format
- ceiling_date() issue when using multi units
- Fractional Seconds with conversion and rounding/truncation?
- mdy("04 July 2019") GIVES "2019-04-20" : Instead should give an error.
- data.table merge doesn't work with intervals HOT 1
- yearmonth() throws an error with C_force_tz
- Feature request: Excel date origin
- how about adding lubridate hex log to tidyverse main page? HOT 1
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from lubridate.