Comments (14)
@dylannalex Played around with the class and it looks great. Good job. Since both "Email" and "Date" are done, I'm closing this issue.
from pregex.
Hi, are you already working on it? I can try adding Email and Date classes if you want.
from pregex.
@dylannalex I've actually finished the Email one, though Date is yet to be made. You can have a go at it, but it first needs some thought on the design. I'm thinking of it having a single string parameter "format", through which you define the format of the date that you want to match, e.g. "mm/dd/yyyy". You can find more formats here. The other thing I'm thinking is that we could have Date(*formats), so you can match many formats with one instance. So, instead of having to do:
from pregex import *
date1 = Date("mm/dd/yyyy")
date2 = Date("dd/mm/yyyy")
dates = op.Either(date1, date2)
they can just do:
from pregex import *
dates = Date("mm/dd/yyyy", "dd/mm/yyyy")
In case no formats are provided, the default will be to match any date format. What do you think?
I'm gonna create a branch called v2.0.1, as well as another branch based on this issue. You can work on it there.
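To make the Date(*formats) idea concrete, here is a minimal sketch using the standard `re` module rather than pregex itself. The two pattern strings and the `any_of` helper are hypothetical stand-ins for `Date("mm/dd/yyyy")`, `Date("dd/mm/yyyy")` and the proposed variadic constructor; they are not part of the library.

```python
import re

# Hypothetical regex stand-ins for Date("mm/dd/yyyy") and Date("dd/mm/yyyy").
MM_DD_YYYY = r"(?:0[1-9]|1[0-2])/(?:0[1-9]|[12]\d|3[01])/\d{4}"
DD_MM_YYYY = r"(?:0[1-9]|[12]\d|3[01])/(?:0[1-9]|1[0-2])/\d{4}"


def any_of(*patterns: str) -> re.Pattern:
    """Combine several patterns into one alternation, as Date(*formats) would."""
    return re.compile("|".join(f"(?:{p})" for p in patterns))


# One object covers both formats, with no explicit Either-style wrapper.
dates = any_of(MM_DD_YYYY, DD_MM_YYYY)
matches = dates.findall("12/25/2021 and 25/12/2021")
print(matches)  # ['12/25/2021', '25/12/2021']
```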
from pregex.
Hi @manoss96, I've been working on the Date class. Here's what I came up with:
Date formats
I added Date.date_formats, a Date class attribute that contains all valid date formats:
from pregex import *
Date.date_formats
>>> ('mm/dd/yyyy', 'dd/mm/yyyy', 'yyyy/mm/dd')
By default, Date matches any date format in Date.date_formats.
Date arguments
I followed your suggestion and let the user match many formats with a single Date instance.
from pregex import *
text = """
01/11/2001
12/09/1996
1875/11/02
"""
pre1 = Date()
pre1.get_matches(text)
>>> ['01/11/2001', '12/09/1996', '1875/11/02']
pre2 = Date("dd/mm/yyyy")
pre2.get_matches(text)
>>> ['01/11/2001', '12/09/1996']
pre3 = Date("dd/mm/yyyy", "yyyy/mm/dd")
pre3.get_matches(text)
>>> ['01/11/2001', '12/09/1996', '1875/11/02']
Note: Date converts all uppercase characters in a date format into lowercase characters (e.g. "DD/MM/YYYY" is converted to "dd/mm/yyyy")
Invalid formats
The given formats are compared to the date formats in Date.date_formats. When an invalid format is found, Date raises InvalidArgumentValueException.
pre = Date("dd/mm/yyy")
>>> pregex.core.exceptions.InvalidArgumentValueException: Provided date format "dd/mm/yyy" is not valid.
Let me know your thoughts. I'm up for adding more features or improving any aspect you consider!
from pregex.
I think this should also consider short-hand notations for years such as 02 for 2002. It might also make sense to add notations for time as well. Something like the strptime function in the datetime module.
For example, you could have D/M/y to match things like 01/03/02, but D/M/Y to match stuff like 01/03/2002.
I feel like this format makes sense because it's already synonymous with other Python libraries and won't be a hassle for users to learn.
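For reference, this is how the standard library's `datetime.strptime` already distinguishes the two year notations mentioned above: `%y` parses a two-digit year and `%Y` a four-digit one. The snippet only illustrates the existing stdlib convention; it says nothing about how pregex would spell its own format tokens.

```python
from datetime import datetime

# %y parses a two-digit year, %Y a four-digit year.
short_year = datetime.strptime("01/03/02", "%d/%m/%y")
long_year = datetime.strptime("01/03/2002", "%d/%m/%Y")

print(short_year.date())  # 2002-03-01
print(long_year.date())   # 2002-03-01
```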
from pregex.
@dylannalex Looks great, good job! As for the formats, I suggest that we follow this notation. That way we can have all lowercase while at the same time we can differentiate between 2002 and 02 like @alansun17904 said. For now, I'd say that implementing any valid combination of "d/dd", "m/mm", and "yy/yyyy" along with separators "/" and "-", is good enough. In the future, more formats might follow.
To wrap up, I suggest the following list of formats:
- d/m/yy
- dd/m/yy
- d/mm/yy
- dd/mm/yy
- d/m/yyyy
- dd/m/yyyy
- d/mm/yyyy
- dd/mm/yyyy
- m/d/yy
- mm/d/yy
- m/dd/yy
- mm/dd/yy
- m/d/yyyy
- mm/d/yyyy
- m/dd/yyyy
- mm/dd/yyyy
- yy/m/d
- yyyy/m/d
- yy/mm/d
- yyyy/mm/d
- yy/m/dd
- yyyy/m/dd
- yy/mm/dd
- yyyy/mm/dd
Plus all of the above using the "-" separator, summing to a total of 24 + 24 = 48 different formats.
I don't know about your current implementation, but I suggest having a dictionary of 6 different keys, namely "d", "dd", "m", "mm", "yy" and "yyyy", each mapping to a different pre-defined "Pregex" instance for matching each possible part of the date. Then it's just a matter of combining these instances together, separated by either "/" or "-". How's that sound?
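The dictionary approach described above could look roughly like the following. This is a sketch with plain `re` strings standing in for the pre-defined Pregex instances; `TOKEN_PATTERNS`, `SEPARATORS` and `date_pattern` are hypothetical names, and `ValueError` stands in for the library's own exception.

```python
import re

# Hypothetical regex fragments for the six keys; the real class would map
# each key to a Pregex instance instead of a raw string.
TOKEN_PATTERNS = {
    "d": r"[1-9]",
    "dd": r"(?:0[1-9]|[12]\d|3[01])",
    "m": r"[1-9]",
    "mm": r"(?:0[1-9]|1[0-2])",
    "yy": r"\d{2}",
    "yyyy": r"\d{4}",
}
SEPARATORS = ("/", "-")


def date_pattern(date_format: str) -> str:
    """Translate a format such as "dd/mm/yyyy" into a regex string."""
    for sep in SEPARATORS:
        tokens = date_format.lower().split(sep)
        # Simplification: only checks that there are three known tokens,
        # not that they form one of the 48 agreed orderings.
        if len(tokens) == 3 and all(t in TOKEN_PATTERNS for t in tokens):
            return re.escape(sep).join(TOKEN_PATTERNS[t] for t in tokens)
    raise ValueError(f'Provided date format "{date_format}" is not valid.')


pattern = re.compile(date_pattern("dd/mm/yyyy"))
print(pattern.findall("01/11/2001 12/09/1996 1875/11/02"))
# ['01/11/2001', '12/09/1996']
```

Keeping one fragment per token means a new token or separator only touches the lookup tables, not the combining logic.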
from pregex.
Sounds great, @manoss96! Thank you and @alansun17904 for the help!
About what @alansun17904 said, I'd avoid date time values for now, since I consider it would be better to have a Date class for matching only dates and a Time class for matching time values. Once we have these two classes working, implementing a DateTime class should be as easy as merging Date and Time.
from pregex.
> About what @alansun17904 said, I'd avoid date time values for now, since I consider it would be better to have a Date class for matching only dates and a Time class for matching time values. Once we have these two classes working, implementing a DateTime class should be as easy as merging Date and Time.
Yeah, I agree with @dylannalex. As for the implementation that we discussed, feel free to use other classes from pregex.meta as they might help you. For example, you can use Integer(1, 12) for "m".
from pregex.
About default formats, it is impractical to hardcode all 48 different combinations. What about adding a static method Date.date_formats() to compute all different format combinations? I think itertools.permutations from the standard library would be a great tool for this task. Let me know if I can import this function!
Oh, and one last thing:
> you can use Integer(1, 12) for "m"
Do you mean Integer(1, 10)?
from pregex.
> About default formats, it is impractical to hardcode all 48 different combinations. What about adding a static method Date.date_formats() to compute all different format combinations? I think itertools.permutations from the standard library would be a great tool for this task. Let me know if I can import this function!
Sure, you can use it. Just make sure that you import it with a different name starting with a "_" so it isn't directly imported every time pregex.meta is imported. Better yet, import it within the "Date" class itself.
> Oh, and one last thing:
>> you can use Integer(1, 12) for "m"
> Do you mean Integer(1, 10)?
Yeah, I'm sorry, you're right. I was under the impression that "m" matched "11" and "12" too, and that it only indicated that a single-digit month must not have a leading zero, e.g. "3" would be okay but "03" would not. In that case, I'm guessing that using "Integer" would be overkill, so you can go with something simpler. However, if you find that some class in pregex.meta could help you, don't hesitate to use it!
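As a rough sketch of how the 48 formats could be computed rather than hardcoded: the proposed list uses only three orderings (d/m/y, m/d/y, y/m/d), whereas `itertools.permutations` over the three parts would yield six, so this example uses `itertools.product` over fixed orderings instead. The names `VARIANTS`, `ORDERS` and `date_formats` are assumptions for illustration, not the implementation that was eventually merged.

```python
from itertools import product

# Assumed building blocks, taken from the format list proposed in this thread.
VARIANTS = {"d": ("d", "dd"), "m": ("m", "mm"), "y": ("yy", "yyyy")}
ORDERS = (("d", "m", "y"), ("m", "d", "y"), ("y", "m", "d"))
SEPARATORS = ("/", "-")


def date_formats() -> list[str]:
    """Return every supported format string."""
    formats = []
    for sep in SEPARATORS:
        for order in ORDERS:
            # One choice per position: 2 * 2 * 2 = 8 formats per ordering.
            for parts in product(*(VARIANTS[key] for key in order)):
                formats.append(sep.join(parts))
    return formats


formats = date_formats()
print(len(formats))  # 48
```

Following the advice above, the `itertools` import could equally live inside the class or be aliased with a leading underscore so that it is not re-exported.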
from pregex.
I've finished the Date class implementation. I've implemented all 48 different combinations dynamically, so adding new date formats should be straightforward.
Features:
- If no format is provided, Date considers all possible formats.
- All provided formats are converted to lowercase (e.g. dD/mM/yyYY is converted to dd/mm/yyyy).
- Raises InvalidArgumentValueException when an invalid format is provided.
I also didn't use itertools.permutations, so no extra import needed!
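The three features listed above can be sketched as a small validation helper. This is an illustration only: `VALID_FORMATS` is a hypothetical three-entry subset of the 48 formats, `normalize_formats` is an invented name, and `ValueError` stands in for pregex's InvalidArgumentValueException.

```python
# Hypothetical subset of the supported formats, for illustration only.
VALID_FORMATS = ("dd/mm/yyyy", "mm/dd/yyyy", "yyyy/mm/dd")


def normalize_formats(*formats: str) -> tuple[str, ...]:
    """Lower-case the given formats and reject unknown ones."""
    if not formats:
        # No arguments means "consider every supported format".
        return VALID_FORMATS
    normalized = tuple(f.lower() for f in formats)
    for original, lowered in zip(formats, normalized):
        if lowered not in VALID_FORMATS:
            # ValueError stands in for InvalidArgumentValueException.
            raise ValueError(f'Provided date format "{original}" is not valid.')
    return normalized


print(normalize_formats("dD/mM/yyYY"))  # ('dd/mm/yyyy',)
```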
I'm now working on documentation, which is not my strong point. I'd really appreciate some help with it.
In a nutshell, the Date class has the following structure:
class Date(_pre.Pregex):
    '''
    Matches any date.
    :param str \*formats: Strings that determine which date formats are considered a match.
        A date can either be dd/mm/yy, mm/dd/yy or yy/mm/dd (separated by '/' or '-'), where:
        yy – two-digit year, e.g. 21
        yyyy – four-digit year, e.g. 2021
        m – one-digit month for months below 10, e.g. 3
        mm – two-digit month, e.g. 03
        d – one-digit day of the month for days below 10, e.g. 2
        dd – two-digit day of the month, e.g. 02
        By default, all date formats are considered.
    :raises InvalidArgumentValueException: Invalid date format provided.
    '''
    __date_separators: tuple[str, str] = ("-", "/")
    __date_value_pre: dict[str, _pre.Pregex] = {
        "d": _cl.AnyDigit() - "0",
        "dd": _op.Either("0" + _cl.AnyDigit(), PositiveInteger(10, 31)),
        "m": _cl.AnyDigit() - "0",
        "mm": _op.Either("0" + _cl.AnyDigit(), PositiveInteger(10, 12)),
        "yy": _cl.AnyDigit() * 2,
        "yyyy": _cl.AnyDigit() * 4,
    }
    def __init__(self, *formats: str):
        '''
        Matches any date.
        :param str \*formats: Strings that determine which date formats are considered a match.
            A date can either be dd/mm/yy, mm/dd/yy or yy/mm/dd (separated by '/' or '-'), where:
            yy – two-digit year, e.g. 21
            yyyy – four-digit year, e.g. 2021
            m – one-digit month for months below 10, e.g. 3
            mm – two-digit month, e.g. 03
            d – one-digit day of the month for days below 10, e.g. 2
            dd – two-digit day of the month, e.g. 02
            By default, all date formats are considered.
        :raises InvalidArgumentValueException: Invalid date format provided.
        '''
    def __date_pre(format: str) -> _pre.Pregex:
        """
        Converts a date format into a ``Pregex`` instance.
        :param str format: The date format to be converted.
        """
    def __date_formats() -> list[str]:
        '''
        Returns a list containing all possible date format combinations.
        '''
from pregex.
Looks good! Don't worry about documentation, I can do this later. A few points:
- Make sure you do (cl.AnyDigit() - "0") in "mm" and "dd" so a match with "00" isn't possible.
- Replace "PositiveInteger" with "Integer" as the former will try to match the sign "+" too.
- Add some tests in "tests/test_meta_essentials.py" if it's easy for you. Nothing crazy, just trying to match some valid/invalid dates. You can copy the testing structure of classes like HttpUrl, IPv4 and IPv6.
After doing these I think that you're good to go, so open a PR whenever you're ready.
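The first review point can be illustrated with plain regexes: "0" followed by any digit accepts "00", while excluding "0" from the second digit does not. The two patterns below are hand-written stand-ins for the reviewed fragments, not pregex's actual output, and the bare assertions only hint at the kind of valid/invalid checks the requested tests might contain.

```python
import re

# Plain-regex stand-ins for the "dd" fragment before and after the fix.
DD_WITH_ZERO = r"0\d|[12]\d|3[01]"   # "0" + AnyDigit(): accepts "00"
DD_FIXED = r"0[1-9]|[12]\d|3[01]"    # "0" + (AnyDigit() - "0"): rejects "00"


def is_day(pattern: str, text: str) -> bool:
    """Return True if the whole text is a valid two-digit day for the pattern."""
    return re.fullmatch(pattern, text) is not None


# Valid/invalid checks in the spirit of the requested tests.
assert is_day(DD_WITH_ZERO, "00")
assert not is_day(DD_FIXED, "00")
assert all(is_day(DD_FIXED, f"{day:02d}") for day in range(1, 32))
assert not is_day(DD_FIXED, "32")
```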
from pregex.
I think this seems great! I can help out with documentation as well if need be.
from pregex.
Thanks for your help, @manoss96!
PR is open. I've added tests and fixed what we discussed. I also ensured date values (i.e. 'd', 'dd', 'm', 'mm', 'yy', 'yyyy') are not enclosed by any other digit:
__date_value_pre: dict[str, _pre.Pregex] = {
    "d": _asr.NotEnclosedBy(_cl.AnyDigit() - "0", _cl.AnyDigit()),
    "dd": _asr.NotEnclosedBy(
        _op.Either("0" + (_cl.AnyDigit() - "0"), Integer(10, 31)),
        _cl.AnyDigit()),
    "m": _asr.NotEnclosedBy(_cl.AnyDigit() - "0", _cl.AnyDigit()),
    "mm": _asr.NotEnclosedBy(
        _op.Either("0" + (_cl.AnyDigit() - "0"), Integer(10, 12)),
        _cl.AnyDigit()),
    "yy": _asr.NotEnclosedBy(_cl.AnyDigit() * 2, _cl.AnyDigit()),
    "yyyy": _asr.NotEnclosedBy(_cl.AnyDigit() * 4, _cl.AnyDigit()),
}
Greetings.
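To show why guarding against neighbouring digits matters, here is a sketch with plain `re` lookarounds. It approximates the intent described above (a date should not be carved out of a longer run of digits) and is an assumption about the effect, not a statement of the exact regex that NotEnclosedBy compiles to. `YY_MM_DD` and `GUARDED` are hypothetical names.

```python
import re

# Unguarded yy/mm/dd pattern: happily matches inside a longer number.
YY_MM_DD = r"\d{2}/(?:0[1-9]|1[0-2])/(?:0[1-9]|[12]\d|3[01])"
# Guarded variant: no digit may sit directly before or after the date.
GUARDED = rf"(?<!\d){YY_MM_DD}(?!\d)"

text = "1875/11/02"
unguarded_matches = re.findall(YY_MM_DD, text)
guarded_matches = re.findall(GUARDED, text)

print(unguarded_matches)  # ['75/11/02']
print(guarded_matches)    # []
```

Without the guards, a four-digit-year date is misread as a two-digit-year date starting at its third character.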
from pregex.