Thanks for this action! I would like to use something similar for ...

Parameters to excludes custom words (potential-duplicates, 14 comments, closed)

wow-actions commented on August 26, 2024 4

Parameters to excludes custom words

from potential-duplicates.

Comments (14)

bubkoo commented on August 26, 2024 2

@service-paradis It will now trim leading and trailing spaces before returning.

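  import * as core from '@actions/core'

  // Remove the keywords listed in the `exclude` input from an issue title.
  export function formatTitle(title: string) {
    const exclude = core.getInput('exclude')
    if (exclude) {
      return exclude
        .split(/[\s\n]+/) // keywords are separated by spaces or newlines
        .map((keyworld) => keyworld.trim())
        .filter((keyworld) => keyworld.length > 0)
        // a string pattern only replaces the first occurrence of each keyword
        .reduce((memo, keyworld) => memo.replace(keyworld, ''), title)
        .replace(/\s+/g, ' ') // collapse every whitespace run, not just the first
        .trim()
    }
    return title
  }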

mondeja commented on August 26, 2024 1

Sorry @bubkoo, but this is not what we need.

If I'm not mistaken, when an issue is opened and its title matches at least one filter (that is, its title is "valid"), it will not be checked for potential duplicates. What we need is different: regardless of the opened issue's title (without "validating" it), the unneeded words should be removed from it, and the same words should also be removed from the other titles it is compared against, to improve the match between titles.

As @service-paradis pointed out, we need an "exclude" function. Is that something you plan to include?

service-paradis commented on August 26, 2024 1

Thank you for your work @bubkoo!

> Would it make sense to make the list of excluded words case insensitive?

I agree that it would be better if the exclusions were case insensitive. What do you think @bubkoo?

Yes, the typo comes from the source itself.

bubkoo commented on August 26, 2024

@service-paradis A filter input will be supported in the next release. Detection stops for any newly created issue whose title matches the filter. The filter can be a single string or several space-separated strings, which are matched using https://www.npmjs.com/package/anymatch.

service-paradis commented on August 26, 2024

@bubkoo Thanks for your work and the follow-up!
The changes are great, though they are not exactly what I need.

Here is an example.

If I open "Request Ubuntu icon" and "Request Fedora icon", they will be flagged as potential duplicates.

I would like the algorithm to exclude a custom list of words before comparing titles. For example, having something like:

excludes:
  - Request
  - icon

This way, the algorithm will compare ~~Request~~ Ubuntu ~~icon~~ with ~~Request~~ Fedora ~~icon~~. They won't be flagged as potential duplicates.
If I have the titles "Ubuntu icon" and "Request Ubuntu icon", the algorithm will compare Ubuntu ~~icon~~ with ~~Request~~ Ubuntu ~~icon~~. They will be flagged as potential duplicates.
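
The behaviour requested above could be sketched roughly as follows. This is only an illustration, not the action's implementation: `stripExcluded` is a hypothetical helper, and it assumes excluded words are removed as whole, case-sensitive, whitespace-separated tokens.

```typescript
// Hypothetical helper (not part of the action): drop every excluded word
// from a title. Words are compared as whole tokens, case-sensitively.
function stripExcluded(title: string, excludes: string[]): string {
  return title
    .split(/\s+/)
    .filter((word) => word.length > 0 && excludes.indexOf(word) === -1)
    .join(' ')
}

const excludes = ['Request', 'icon']
console.log(stripExcluded('Request Ubuntu icon', excludes)) // "Ubuntu"
console.log(stripExcluded('Request Fedora icon', excludes)) // "Fedora"
console.log(stripExcluded('Ubuntu icon', excludes)) // "Ubuntu"
```

With the excluded words gone, only "Ubuntu" and "Fedora" are left to compare in the first case, while the last two titles reduce to the same string.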

bubkoo commented on August 26, 2024

@service-paradis Configure it like this:

jobs:
  run:
    runs-on: ubuntu-latest
    steps:
      - uses: bubkoo/potential-duplicates@v1
        with:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          filter: 'Request ** icon'

And these are my test issues: #15 and #16.

service-paradis commented on August 26, 2024

Thanks again @bubkoo 😄
Since people are not that disciplined, would it also work for variations other than the current filter?
For example, people can request icons using titles like these:

Request: Ubuntu icon
Request Ubuntu icon
Request: Ubuntu
Request Ubuntu
Add Ubuntu icon
Add Ubuntu
Ubuntu icon
Ubuntu
...

bubkoo commented on August 26, 2024

@service-paradis You can specify multiple filters, one per line, such as:

jobs:
  run:
    runs-on: ubuntu-latest
    steps:
      - uses: bubkoo/potential-duplicates@v1
        with:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          filter: |
            Request ** icon
            Add ** icon
            ** icon **
            ** Ubuntu **
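
To make the idea of multi-line filters concrete, here is a rough, self-contained sketch. It does not use anymatch and does not reproduce its glob semantics: `patternToRegExp` and `matchesAnyFilter` are hypothetical helpers, and the sketch assumes that each `**` simply matches any run of characters and that a title matching at least one line is skipped.

```typescript
// Hypothetical stand-in for glob matching; NOT anymatch's implementation.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')
}

// Assumption: each "**" matches any run of characters (including none).
function patternToRegExp(pattern: string): RegExp {
  const body = pattern.trim().split('**').map(escapeRegExp).join('.*')
  return new RegExp('^' + body + '$')
}

// Assumption: one filter per line; a title matching any line counts as matched.
function matchesAnyFilter(title: string, filterInput: string): boolean {
  return filterInput
    .split('\n')
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .some((line) => patternToRegExp(line).test(title))
}

const filterInput = `
  Request ** icon
  Add ** icon
`
console.log(matchesAnyFilter('Request Ubuntu icon', filterInput)) // true
console.log(matchesAnyFilter('Add Fedora icon', filterInput)) // true
console.log(matchesAnyFilter('Ubuntu', filterInput)) // false
```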

bubkoo commented on August 26, 2024

@mondeja @service-paradis Keywords specified in exclude will be replaced with an empty string before detection.

jobs:
  run:
    runs-on: ubuntu-latest
    steps:
      - uses: bubkoo/potential-duplicates@v1
        with:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          exclude: |
            request
            icon
            add
            ubuntu

mondeja commented on August 26, 2024

What about removing them instead of replacing them with empty strings? Empty strings will also be compared, increasing the possibility of false positives. Check this test: the action is comparing "" Ansys "" with "" Ubuntu "" and raising false positives. What is the point of replacing the exclusions with empty strings?

service-paradis commented on August 26, 2024

Thanks again for your work on this @bubkoo

Unfortunately, I don't think the changes you made will fully solve the previous problem with unnecessary spaces.

For example, if we want to exclude "Request" and "icon" from "Request Ubuntu icon":

before:

  .reduce((memo, keyworld) => memo.replace(keyworld, ' '), title)

gives "⎵⎵Ubuntu⎵⎵"

after:

  .reduce((memo, keyworld) => memo.replace(keyworld, ''), title)
  .replace(/\s+/, ' ')

gives "⎵Ubuntu⎵"

For the comparison, you are creating arrays using split(' '). We can see a slight improvement, as it will compare ["", "Ubuntu", ""] instead of ["", "", "Ubuntu", "", ""]. But it can still produce false positives.

Maybe trimming leading and trailing spaces before splitting would solve the above.
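
The effect described above can be reproduced with plain string operations. The sketch below is only illustrative: `tokenizeNaive` and `tokenizeTrimmed` are hypothetical names, not functions from the action.

```typescript
// Splitting on a single space keeps an empty token for every extra space.
function tokenizeNaive(title: string): string[] {
  return title.split(' ')
}

// Collapse whitespace runs and trim before splitting, so that no empty
// tokens are left to take part in the comparison.
function tokenizeTrimmed(title: string): string[] {
  const cleaned = title.replace(/\s+/g, ' ').trim()
  return cleaned.length > 0 ? cleaned.split(' ') : []
}

console.log(tokenizeNaive('  Ubuntu  ')) // [ '', '', 'Ubuntu', '', '' ]
console.log(tokenizeNaive(' Ubuntu ')) // [ '', 'Ubuntu', '' ]
console.log(tokenizeTrimmed('  Ubuntu  ')) // [ 'Ubuntu' ]
```

Two titles that share nothing but leading and trailing empty tokens would still overlap in the naive version, which is where the false positives come from.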

ericcornelissen commented on August 26, 2024

Would it make sense to make the list of excluded words case insensitive?

(Also, keyworld should probably be keyword; not sure if you copied this snippet straight from the source.)

bubkoo commented on August 26, 2024

@service-paradis Thanks for your tips and suggestions.

service-paradis commented on August 26, 2024

@bubkoo I see that you added case insensitivity when matching titles.

It would also be great to make the removal of excluded words case insensitive. For example, here I need to add every word in different cases (e.g. request and Request).
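
One possible way to do this, sketched under assumptions: `removeExcludedIgnoreCase` and `escapeKeyword` are hypothetical helpers, not the action's code. Each keyword is turned into a regular expression with the `g` and `i` flags, so every occurrence is removed regardless of case.

```typescript
// Escape regex metacharacters so a keyword is matched literally.
function escapeKeyword(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')
}

// Hypothetical helper: remove every occurrence of each excluded keyword,
// ignoring case, then collapse and trim the leftover whitespace.
function removeExcludedIgnoreCase(title: string, excludes: string[]): string {
  return excludes
    .filter((word) => word.length > 0)
    .reduce(
      (memo, word) => memo.replace(new RegExp(escapeKeyword(word), 'gi'), ''),
      title,
    )
    .replace(/\s+/g, ' ')
    .trim()
}

console.log(removeExcludedIgnoreCase('Request Ubuntu ICON', ['request', 'icon'])) // "Ubuntu"
```

Note that this sketch matches substrings, so a keyword such as "icon" would also be stripped from a longer word that contains it; adding word boundaries to the pattern would be one way to tighten that.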
