Comments (14)

manoss96 commented on May 31, 2024

@dylannalex Played around with the class and it looks great. Good job. Since both "Email" and "Date" are done, I'm closing this issue.

from pregex.

dylannalex commented on May 31, 2024

Hi, are you already working on it? I can try adding Email and Date classes if you want.

manoss96 commented on May 31, 2024

@dylannalex I've actually finished the Email one, though Date is yet to be made. You can have a go at it, though it first needs some thought on the design. I'm thinking of it having a single string parameter "format", through which you define the format of the date that you want to match, e.g. "mm/dd/yyyy". You can find more formats here. The other thing I'm thinking is that we could have Date(*formats), so you can match many formats with one instance. So, instead of having to do:

from pregex import *

date1 = Date("mm/dd/yyyy")
date2 = Date("dd/mm/yyyy")

dates = op.Either(date1, date2)

they can just do:

from pregex import *

dates = Date("mm/dd/yyyy", "dd/mm/yyyy")

If no formats are provided, the default will be to match any date format. What do you think?
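To illustrate the Date(*formats) idea outside of pregex, here is a minimal sketch using only the standard library's re module. The hand-written patterns and the name date_regex are assumptions made for this example, not the library's implementation; the point is only that several formats collapse into one alternation, and that passing no formats falls back to all known ones.

```python
import re

# Hypothetical, hand-written patterns for two formats only
# (illustrative; not taken from pregex).
PATTERNS = {
    "mm/dd/yyyy": r"(?:0[1-9]|1[0-2])/(?:0[1-9]|[12]\d|3[01])/\d{4}",
    "dd/mm/yyyy": r"(?:0[1-9]|[12]\d|3[01])/(?:0[1-9]|1[0-2])/\d{4}",
}

def date_regex(*formats):
    """Compile one regex matching any of the given formats.

    With no arguments, every known format is used.
    """
    chosen = formats or tuple(PATTERNS)
    return re.compile("|".join(f"(?:{PATTERNS[f]})" for f in chosen))

text = "Invoice dated 12/25/2021, shipped 25/12/2021."
print(date_regex("mm/dd/yyyy").findall(text))  # ['12/25/2021']
print(date_regex().findall(text))              # ['12/25/2021', '25/12/2021']
```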

I'm gonna create a branch called v2.0.1, as well as another branch based on this issue. You can work on it there.

dylannalex commented on May 31, 2024

Hi @manoss96, I've been working on the Date class. Here's what I came up with:

Date formats

I added Date.date_formats, a Date class attribute that contains all valid date formats:

from pregex import *

Date.date_formats
>>> ('mm/dd/yyyy', 'dd/mm/yyyy', 'yyyy/mm/dd')

By default, Date matches any date format in Date.date_formats.

Date arguments

I followed your suggestion and let the user match many formats with a single Date instance.

from pregex import *

text = """
01/11/2001
12/09/1996
1875/11/02
"""

pre1 = Date()
pre1.get_matches(text)
>>> ['01/11/2001', '12/09/1996', '1875/11/02']

pre2 = Date("dd/mm/yyyy")
pre2.get_matches(text)
>>> ['01/11/2001', '12/09/1996']

pre3 = Date("dd/mm/yyyy", "yyyy/mm/dd")
pre3.get_matches(text)
>>> ['01/11/2001', '12/09/1996', '1875/11/02']

Note: Date converts all uppercase characters in a date format into lowercase characters (e.g. "DD/MM/YYYY" is converted to "dd/mm/yyyy")

Invalid formats

The given formats are compared to the date formats in Date.date_formats. When an invalid format is found, Date raises InvalidArgumentValueException.

pre = Date("dd/mm/yyy")
>>> pregex.core.exceptions.InvalidArgumentValueException: Provided date format "dd/mm/yyy" is not valid.
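The normalization and validation steps described above could be sketched roughly as follows. This is a standalone illustration: normalize_formats and InvalidDateFormatError are hypothetical names standing in for the Date constructor logic and for pregex's InvalidArgumentValueException.

```python
VALID_FORMATS = ("mm/dd/yyyy", "dd/mm/yyyy", "yyyy/mm/dd")

class InvalidDateFormatError(ValueError):
    """Hypothetical stand-in for pregex's InvalidArgumentValueException."""

def normalize_formats(*formats):
    """Lowercase each format and reject any that is not a known format."""
    if not formats:
        # No formats given: fall back to every valid format.
        return VALID_FORMATS
    normalized = []
    for fmt in formats:
        lowered = fmt.lower()
        if lowered not in VALID_FORMATS:
            raise InvalidDateFormatError(
                f'Provided date format "{fmt}" is not valid.'
            )
        normalized.append(lowered)
    return tuple(normalized)

print(normalize_formats("DD/MM/YYYY"))  # ('dd/mm/yyyy',)
```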

Let me know your thoughts. I'm up for adding more features or improving any aspect you consider!

alansun17904 commented on May 31, 2024

I think this should also consider short-hand notations for years such as 02 for 2002. It might also make sense to add notations for time as well. Something like the strptime function in the datetime module.

For example, you could have D/M/y to match things like 01/03/02, but D/M/Y to match stuff like 01/03/2002.

I feel like this format makes sense because it's already synonymous with other Python libraries and won't be a hassle for users to learn.

manoss96 commented on May 31, 2024

@dylannalex Looks great, good job! As for the formats, I suggest that we follow this notation. That way we can have all lowercase while at the same time we can differentiate between 2002 and 02 like @alansun17904 said. For now, I'd say that implementing any valid combination of "d/dd", "m/mm", and "yy/yyyy" along with separators "/" and "-", is good enough. In the future, more formats might follow.

To wrap up, I suggest the following list of formats:

  1. d/m/yy
  2. dd/m/yy
  3. d/mm/yy
  4. dd/mm/yy
  5. d/m/yyyy
  6. dd/m/yyyy
  7. d/mm/yyyy
  8. dd/mm/yyyy
  9. m/d/yy
  10. mm/d/yy
  11. m/dd/yy
  12. mm/dd/yy
  13. m/d/yyyy
  14. mm/d/yyyy
  15. m/dd/yyyy
  16. mm/dd/yyyy
  17. yy/m/d
  18. yyyy/m/d
  19. yy/mm/d
  20. yyyy/mm/d
  21. yy/m/dd
  22. yyyy/m/dd
  23. yy/mm/dd
  24. yyyy/mm/dd

Plus all of the above using the "-" separator, summing to a total of 24 + 24 = 48 different formats.

I don't know about your current implementation, but I suggest having a dictionary of 6 different keys, namely "d", "dd", "m", "mm", "yy" and "yyyy", each mapping to a different pre-defined "Pregex" instance for matching each possible part of the date. Then it's just a matter of combining these instances together, separated by either "/" or "-". How's that sound?
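As a rough illustration of the dictionary idea, here is a sketch that uses plain regex strings in place of Pregex instances. It assumes that "d" and "m" mean a single digit from 1 to 9; TOKEN_PATTERNS and format_to_regex are hypothetical names.

```python
import re

# One regex fragment per date token (plain strings instead of Pregex
# instances; an assumption made to keep the example self-contained).
TOKEN_PATTERNS = {
    "d": r"[1-9]",
    "dd": r"(?:0[1-9]|[12]\d|3[01])",
    "m": r"[1-9]",
    "mm": r"(?:0[1-9]|1[0-2])",
    "yy": r"\d{2}",
    "yyyy": r"\d{4}",
}

def format_to_regex(fmt):
    """Translate a format such as 'dd-mm-yyyy' into a regex string."""
    sep = "/" if "/" in fmt else "-"
    tokens = fmt.split(sep)
    return re.escape(sep).join(TOKEN_PATTERNS[token] for token in tokens)

pattern = re.compile(format_to_regex("dd-mm-yyyy"))
print(pattern.findall("Due 03-11-2021, not 3-11-2021."))  # ['03-11-2021']
```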

dylannalex commented on May 31, 2024

Sounds great, @manoss96! Thank you and @alansun17904 for the help!

About what @alansun17904 said, I'd avoid date time values for now, since I consider it would be better to have a Date class for matching only dates and a Time class for matching time values. Once we have these two classes working, implementing a DateTime class should be as easy as merging Date and Time.

manoss96 commented on May 31, 2024

> About what @alansun17904 said, I'd avoid date time values for now, since I consider it would be better to have a Date class for matching only dates and a Time class for matching time values. Once we have these two classes working, implementing a DateTime class should be as easy as merging Date and Time.

Yeah I agree with @dylannalex . As for the implementation that we discussed, feel free to use other classes from pregex.meta as they might help you. For example, you can use Integer(1, 12) for "m".

dylannalex commented on May 31, 2024

About default formats, it is impractical to hardcode all 48 different combinations. What about adding a static method Date.date_formats() to compute all the different format combinations? I think itertools.permutations from the standard library would be a great tool for this task. Let me know if I can import this function!
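For reference, one way the 48 combinations could be computed is sketched below. It uses itertools.product rather than itertools.permutations, since each date part contributes one of two variants while only three field orders are wanted; the constant and function names are hypothetical.

```python
from itertools import product

DAY, MONTH, YEAR = ("d", "dd"), ("m", "mm"), ("yy", "yyyy")

# Only these three field orders are listed in the thread.
ORDERS = ((DAY, MONTH, YEAR), (MONTH, DAY, YEAR), (YEAR, MONTH, DAY))
SEPARATORS = ("/", "-")

def all_date_formats():
    """Return every format: 2 separators x 3 orders x 2 x 2 x 2 variants."""
    formats = []
    for sep in SEPARATORS:
        for order in ORDERS:
            # product picks one variant from each part of the order.
            for parts in product(*order):
                formats.append(sep.join(parts))
    return formats

formats = all_date_formats()
print(len(formats))  # 48
```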

Oh, and one last thing:

> you can use Integer(1, 12) for "m"

Do you mean Integer(1, 10)?

manoss96 commented on May 31, 2024

> About default formats, it is impractical to hardcode all 48 different combinations. What about adding a static method Date.date_formats() to compute all the different format combinations? I think itertools.permutations from the standard library would be a great tool for this task. Let me know if I can import this function!

Sure, you can use it. Just make sure that you import it with a different name starting with a "_" so it isn't directly imported every time pregex.meta is imported. Better yet, import it within the "Date" class itself.

> Oh, and one last thing:
>
> > you can use Integer(1, 12) for "m"
>
> Do you mean Integer(1, 10)?

Yeah, I'm sorry, you're right. I was under the impression that "m" matched "11" and "12" too, and that it only indicated that a single-digit month must not have a leading zero, e.g. "3" would be okay but "03" would not. In that case, I'm guessing using "Integer" would be overkill, so you can go with something simpler. However, if you find that some class in pregex.meta could help you, don't hesitate to use it!

dylannalex commented on May 31, 2024

I've finished the Date class implementation. All 48 combinations are generated dynamically, so adding new date formats should be straightforward.

Features:

  • If no format is provided, Date considers all possible formats.
  • All provided formats are converted to lowercase (e.g. dD/mM/yyYY is converted to dd/mm/yyyy).
  • Raises InvalidArgumentValueException when an invalid format is provided.

I also didn't use itertools.permutations, so no extra import needed!

I'm now working on documentation, which is not my strong point. I'd really appreciate some help with it 😄
In a nutshell, the Date class has the following structure:

class Date(_pre.Pregex):
    '''
    Matches any date.

    :param str \*formats: Strings that determine which date formats are considered a match.
        A date can be in dd/mm/yy, mm/dd/yy or yy/mm/dd order (separated by '/' or '-'), where:
            yy – two-digit year, e.g. 21
            yyyy – four-digit year, e.g. 2021
            m – one-digit month for months below 10, e.g. 3
            mm – two-digit month, e.g. 03
            d – one-digit day of the month for days below 10, e.g. 2
            dd – two-digit day of the month, e.g. 02
        By default, all date formats are considered.
    
    :raises InvalidArgumentValueException: Invalid date format provided.
    '''
    __date_separators: tuple[str, str] = ("-", "/")
    __date_value_pre: dict[str, _pre.Pregex] = {
        "d":_cl.AnyDigit() - "0",
        "dd":_op.Either("0" + _cl.AnyDigit(), PositiveInteger(10, 31)),
        "m":_cl.AnyDigit() - "0",
        "mm":_op.Either("0" + _cl.AnyDigit(), PositiveInteger(10, 12)),
        "yy":_cl.AnyDigit() * 2,
        "yyyy":_cl.AnyDigit() * 4,
    }

    def __init__(self, *formats: str):
        '''
        Matches any date.

        :param str \*formats: Strings that determine which date formats are considered a match. \
            A date can be in dd/mm/yy, mm/dd/yy or yy/mm/dd order (separated by '/' or '-'), where:
                yy – two-digit year, e.g. 21
                yyyy – four-digit year, e.g. 2021
                m – one-digit month for months below 10, e.g. 3
                mm – two-digit month, e.g. 03
                d – one-digit day of the month for days below 10, e.g. 2
                dd – two-digit day of the month, e.g. 02
            By default, all date formats are considered.
        
        :raises InvalidArgumentValueException: Invalid date format provided.
        '''

    def __date_pre(format: str) -> _pre.Pregex:
        """
        Converts a date format into a ``Pregex`` instance.
        
        :param str format: The date format to be converted.
        """

    def __date_formats() -> list[str]:
        '''
        Returns a list containing all possible date format combinations.
        '''

manoss96 commented on May 31, 2024

Looks good! Don't worry about documentation, I can do this later. A few points:

  • Make sure you do (cl.AnyDigit() - "0") in "mm" and "dd" so a match with "00" isn't possible.
  • Replace "PositiveInteger" with "Integer" as the former will try to match the sign "+" too.
  • Add some tests in "tests/test_meta_essentials.py" if it's easy for you. Nothing crazy, just trying to match some valid/invalid dates. You can copy the testing structure of classes like HttpUrl, IPv4 and IPv6.

After doing these I think that you're good to go, so open a PR whenever you're ready.
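The first point above can be seen with plain regular expressions. This sketch uses the standard re module rather than pregex classes, and the pattern names are hypothetical: a two-digit month built as "0" followed by any digit also accepts "00", whereas excluding "0" from the second digit does not.

```python
import re

# "0" followed by any digit, or 10 to 12: accepts "00" by mistake.
loose_mm = re.compile(r"0\d|1[0-2]")
# "0" followed by a non-zero digit, or 10 to 12.
strict_mm = re.compile(r"0[1-9]|1[0-2]")

candidates = ["00", "01", "09", "10", "12", "13"]
print([c for c in candidates if loose_mm.fullmatch(c)])
# ['00', '01', '09', '10', '12']
print([c for c in candidates if strict_mm.fullmatch(c)])
# ['01', '09', '10', '12']
```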

alansun17904 commented on May 31, 2024

I think this seems great! I can help out with documentation as well if need be.

dylannalex commented on May 31, 2024

Thanks for your help, @manoss96!

PR is open. I've added tests and fixed what we discussed. I also ensured date values (i.e. 'd', 'dd', 'm', 'mm', 'yy', 'yyyy') are not enclosed by any other digit:

__date_value_pre: dict[str, _pre.Pregex] = {
        "d":_asr.NotEnclosedBy(_cl.AnyDigit() - "0", _cl.AnyDigit()),
        "dd":_asr.NotEnclosedBy(
            _op.Either("0" + (_cl.AnyDigit() - "0"), Integer(10, 31)),
            _cl.AnyDigit()),
        "m":_asr.NotEnclosedBy(_cl.AnyDigit() - "0", _cl.AnyDigit()),
        "mm":_asr.NotEnclosedBy(
            _op.Either("0" + (_cl.AnyDigit() - "0"), Integer(10, 12)),
            _cl.AnyDigit()),
        "yy":_asr.NotEnclosedBy(_cl.AnyDigit() * 2, _cl.AnyDigit()),
        "yyyy":_asr.NotEnclosedBy(_cl.AnyDigit() * 4, _cl.AnyDigit()),
    }
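To show why the digit guard matters, here is a standalone sketch with the standard re module. It assumes that NotEnclosedBy behaves roughly like a pair of negative lookarounds, and for brevity it wraps a whole "dd/mm/yy" pattern instead of each date value separately, so it is not the code from the PR.

```python
import re

# Hypothetical "dd/mm/yy" pattern without any guard against extra digits.
core = r"(?:0[1-9]|[12]\d|3[01])/(?:0[1-9]|1[0-2])/\d{2}"

unguarded = re.compile(core)
# Negative lookbehind and lookahead: no digit directly before or after.
guarded = re.compile(rf"(?<!\d){core}(?!\d)")

text = "01/02/2021 and 03/04/21"
print(unguarded.findall(text))  # ['01/02/20', '03/04/21']
print(guarded.findall(text))    # ['03/04/21']
```

Without the guard, the two-digit year pattern picks up the first half of a four-digit year, which is the kind of partial match the change above is meant to prevent.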

Greetings.
