Comments (14)

bubkoo commented on August 26, 2024

@service-paradis It now trims leading and trailing spaces before returning.

  export function formatTitle(title: string) {
    const exclude = core.getInput('exclude')
    if (exclude) {
      return exclude
        .split(/[\s\n]+/)
        .map((keyworld) => keyworld.trim())
        .filter((keyworld) => keyworld.length > 0)
        .reduce((memo, keyworld) => memo.replace(keyworld, ''), title)
        .replace(/\s+/, ' ')
        .trim()
    }
    return title
  }

from potential-duplicates.

mondeja commented on August 26, 2024

Sorry @bubkoo, but this is not what we need.

If I'm not mistaken, when an issue is opened and its title matches at least one filter (that is, its title is "valid"), it will not be checked for potential duplicates. What we need is different: regardless of the opened issue's title (without "validating" it), remove the unneeded words from it and, when comparing against other titles, remove those words from the other titles as well, to improve the match between titles.

As @service-paradis pointed out, we need an "exclude" function. Is that something you plan to include?

service-paradis commented on August 26, 2024

Thank you for your work @bubkoo!

Would it make sense to make the list of excluded words case insensitive?

I agree that it would be better if the exclusions were case insensitive. What do you think @bubkoo?

Yes, the "keyworld" typo comes from the source itself.

bubkoo commented on August 26, 2024

@service-paradis The filter input will be supported in the next release. Detection is skipped for any newly created issue whose title matches the filter. The filter can be a string or several space-separated strings, matched with https://www.npmjs.com/package/anymatch.

service-paradis commented on August 26, 2024

@bubkoo Thanks for your work and the follow-up!
The changes are great. They are not exactly what I need, though.

Here is an example.

  • if I open Request Ubuntu icon and Request Fedora icon, they will be flagged as potential duplicates

I would like the algo to exclude a custom list of words before comparing the titles. For example, having something like:

excludes:
  - Request
  - icon
  • This way, the algo will compare Ubuntu with Fedora (Request and icon are removed from both titles). They won't be flagged as potential duplicates
  • If I have the titles Ubuntu icon and Request Ubuntu icon, the algo will compare Ubuntu with Ubuntu. They will be flagged as potential duplicates
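
To make the request concrete, here is a rough sketch of the idea. It is not the action's actual comparison: the Jaccard token overlap below is only a stand-in similarity measure, and the names tokens and jaccard are hypothetical.

```typescript
// Hypothetical helper: lowercase a title, split it into words,
// and drop every word that appears in the exclusion list.
function tokens(title: string, excluded: string[]): Set<string> {
  const drop = new Set(excluded.map((word) => word.toLowerCase()))
  return new Set(
    title
      .toLowerCase()
      .split(/\s+/)
      .filter((word) => word.length > 0 && !drop.has(word)),
  )
}

// Stand-in similarity (an assumption, not the action's algorithm):
// shared words divided by the total number of distinct words.
function jaccard(a: Set<string>, b: Set<string>): number {
  if (a.size === 0 && b.size === 0) return 0
  let shared = 0
  a.forEach((word) => {
    if (b.has(word)) shared += 1
  })
  return shared / (a.size + b.size - shared)
}

const excludes = ['Request', 'icon']

// Without exclusions the two titles share 2 of 4 distinct words.
console.log(jaccard(tokens('Request Ubuntu icon', []), tokens('Request Fedora icon', []))) // 0.5

// With exclusions only "ubuntu" and "fedora" remain, so nothing is shared.
console.log(jaccard(tokens('Request Ubuntu icon', excludes), tokens('Request Fedora icon', excludes))) // 0

// Both titles reduce to "ubuntu", so they match completely.
console.log(jaccard(tokens('Ubuntu icon', excludes), tokens('Request Ubuntu icon', excludes))) // 1
```

With the exclusions applied, the Ubuntu and Fedora requests no longer look alike, while the two Ubuntu titles still do.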

bubkoo commented on August 26, 2024

@service-paradis Configure it like this:

jobs:
  run:
    runs-on: ubuntu-latest
    steps:
      - uses: bubkoo/potential-duplicates@v1
        with:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          filter: 'Request ** icon'

And these are my test issues: #15 and #16

service-paradis commented on August 26, 2024

Thanks again @bubkoo 😄
Since people are not that disciplined, would it also work for variations other than the current filter?
For example, people can request icons using these kinds of titles:

  • Request: Ubuntu icon
  • Request Ubuntu icon
  • Request: Ubuntu
  • Request Ubuntu
  • Add Ubuntu icon
  • Add Ubuntu
  • Ubuntu icon
  • Ubuntu
  • ...

bubkoo commented on August 26, 2024

@service-paradis You can specify multiple filters, one per line, such as:

jobs:
  run:
    runs-on: ubuntu-latest
    steps:
      - uses: bubkoo/potential-duplicates@v1
        with:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          filter: |
            Request ** icon
            Add ** icon
            ** icon **
            ** Ubuntu **

bubkoo commented on August 26, 2024

@mondeja @service-paradis Keywords specified in exclude will be replaced with an empty string before detection.

jobs:
  run:
    runs-on: ubuntu-latest
    steps:
      - uses: bubkoo/potential-duplicates@v1
        with:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          exclude: |
            request
            icon
            add
            ubuntu

mondeja commented on August 26, 2024

What about removing them instead of replacing them with empty strings? The empty strings are also compared, which increases the chance of false positives. Check this test: the action is comparing "" Ansys "" with "" Ubuntu "" and raising false positives. What is the point of replacing the exclusions with empty strings?

service-paradis commented on August 26, 2024

Thanks again for your work on this @bubkoo

Unfortunately, I don't think the changes you made will fully solve the previous problem with unnecessary spaces.

For example, if we want to exclude "Request" and "icon" from "Request Ubuntu icon"

before:

  .reduce((memo, keyworld) => memo.replace(keyworld, ' '), title)

gives "⎵⎵Ubuntu⎵⎵"

after:

  .reduce((memo, keyworld) => memo.replace(keyworld, ''), title)
  .replace(/\s+/, ' ')

gives "⎵Ubuntu⎵"

For the comparison, you are creating arrays using split(' '). We can see a slight improvement, as it will compare ["", "Ubuntu", ""] instead of ["", "", "Ubuntu", "", ""]. But it can still produce false positives.

Maybe trimming leading and trailing spaces before splitting would solve the above.
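
A small sketch of what I mean, assuming the titles are tokenized with split(' ') as described above. The helper names naiveTokens and cleanTokens are hypothetical and only illustrate the effect; they are not taken from the action's source.

```typescript
// Hypothetical helper mirroring the split(' ') tokenization described above.
function naiveTokens(title: string): string[] {
  return title.split(' ')
}

// Hypothetical alternative: trim first, split on any run of whitespace,
// and drop empty tokens so they never take part in the comparison.
function cleanTokens(title: string): string[] {
  return title
    .trim()
    .split(/\s+/)
    .filter((token) => token.length > 0)
}

console.log(naiveTokens('  Ubuntu  ')) // ['', '', 'Ubuntu', '', '']
console.log(naiveTokens(' Ubuntu '))   // ['', 'Ubuntu', '']
console.log(cleanTokens(' Ubuntu '))   // ['Ubuntu']
console.log(cleanTokens(' Ansys '))    // ['Ansys']
```

With the empty tokens gone, " Ansys " and " Ubuntu " no longer share any token.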

ericcornelissen commented on August 26, 2024

Would it make sense to make the list of excluded words case insensitive?

(Also, keyworld should probably be keyword, not sure if you copied this snippet straight from the source.)

bubkoo commented on August 26, 2024

@service-paradis Thanks for your tips and suggestions.

service-paradis commented on August 26, 2024

@bubkoo I see that you added case insensitivity when matching titles.

It would also be great to make the removal of excluded words case insensitive. For example, here I need to add every word in different cases (e.g. request and Request).
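
Something along these lines might work. This is only a sketch: stripExcluded and escapeRegExp are hypothetical names, and the exclude list is passed in as a parameter here instead of being read with core.getInput('exclude'). Like the posted snippet, it removes substrings and not whole words.

```typescript
// Escape regex metacharacters so each excluded word is matched literally.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')
}

// Hypothetical case-insensitive variant of the exclusion step.
function stripExcluded(title: string, exclude: string): string {
  return exclude
    .split(/\s+/)
    .filter((keyword) => keyword.length > 0)
    .reduce(
      // 'g' removes every occurrence, 'i' ignores case.
      (memo, keyword) => memo.replace(new RegExp(escapeRegExp(keyword), 'gi'), ' '),
      title,
    )
    .replace(/\s+/g, ' ') // collapse every run of whitespace, not just the first
    .trim()
}

console.log(stripExcluded('REQUEST Ubuntu Icon', 'request\nicon')) // 'Ubuntu'
console.log(stripExcluded('Request Ubuntu icon', 'request\nicon')) // 'Ubuntu'
```

This way a single lowercase entry such as request would cover Request and REQUEST as well.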
