Comments (7)

ilike2burnthing commented on May 27, 2024

Quite a lot needs to change about the current title filters (these go back years, and it turns out I merged one of the PRs that added much of it). I'm kind of surprised they haven't caused more issues before now.

See my questions and comments in the code below, and respond to them if you can:

    title:
      selector: name
      filters:

# do these really need to be removed?
#        - name: re_replace
#          args: ["[\\[\\]]", ""]
#        - name: re-replace
#          args: ["(?i)\\bHD-?Olimpo\\b", ""]

# needs to be more specific and to use boundaries, currently:
# a full house > a BRDISK house
# fully loaded > BRDISKy loaded
#        - name: re_replace
#          args: ["(?i)(full(bluray)?)", "BRDISK"]

        - name: re_replace
          args: ["\\bE-AC-3\\b", "EAC3"]
        - name: re_replace
          args: ["(?i)\\b(es-en|en-es)\\b", "MULTi SPANiSH ENGLiSH"]
        - name: re_replace
          args: ["(?i)\\bes-cat?\\b", "MULTi SPANiSH CATALAN"]
        - name: re_replace
          args: ["(?i)\\bes-(ja|ja?p)\\b", "MULTi SPANiSH JAPANESE"]

# what is the third language? if it can vary,
# then are the other two always English and Spanish?
#        - name: re_replace
#          args: ["(?i)\\btriaudio\\b", "MULTi SPANiSH ENGLiSH"]

# not specific enough, not sure it can be made more specific
#        - name: re_replace
#          args: ["(?i)\\bdual\\b", "MULTi SPANiSH"]
#        - name: re_replace
#          args: ["(?i)\\bspa\\b", "SPANiSH"]

# not specific enough, especially for a Spanish tracker,
# not sure it can be made more specific
#        - name: re_replace
#          args: [" ES ", " SPANiSH "]
#        - name: re_replace
#          args: ["(?i)\\bvas\\b", "BASQUE"]

        - name: re_replace
          args: ["(?i)\\b(espa[ñn]ol|castellano|esp)\\b", "SPANiSH"]
        - name: re_replace
          args: ["(?i)\\b(ingl[ée]s|[ei]ng)\\b", "ENGLiSH"]
        - name: re_replace
          args: ["(?i)\\bcat\\b", "CATALAN"]
        - name: re_replace
          args: ["(?i)\\bfr[ae]\\b", "FRENCH"]
        - name: re_replace
          args: ["(?i)\\b(jap|jp)\\b", "JAPANESE"]
        - name: re_replace
          args: ["(?i)\\bita\\b", "ITALiAN"]
        - name: re_replace
          args: ["(?i)\\brus\\b", "RUSSiAN"]
        - name: re_replace
          args: ["(?i)\\bger\\b", "GERMAN"]
        - name: re_replace
          args: ["(\\s|\\.)+", "$1"]
        - name: re_replace
          args: ["^\\.", ""]

I used to have an account, but it was banned for inactivity (that's the problem with having accounts for a few hundred trackers). If you want me to take a closer look at this, you could open a support ticket with the tracker staff and ask them to reinstate the account nahik99374 and email address jacketttest [AT] gmail [DOT] com, then link to this issue 👋

from jackett.

goioCodes commented on May 27, 2024

From the changes in the code, I guess it's not possible to simply use [" ES[- ]", " SPANiSH "]. In that case, I will try to respond to your comments considering that.

# do these really need to be removed?
#        - name: re_replace
#          args: ["[\\[\\]]", ""]
#        - name: re-replace
#          args: ["(?i)\\bHD-?Olimpo\\b", ""]

I don't see how removing any of these will affect anything. So I would say it's not necessary. As it is written here though, removing the square brackets without replacing with a space will cause some words to fuse, such as the [PACK][NF ... specifications. Double spaces would then need to be removed.
Moreover, most of the releases do not include HD-Olimpo at the end but simply HDO.
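
To illustrate the fusing problem, here is a minimal sketch in Python. It uses Python's `re` module purely for demonstration (Jackett applies these filters with .NET regular expressions), and the sample title and function names are made up for the example.

```python
import re

def strip_brackets_fused(title: str) -> str:
    # Mirrors the commented-out filter: brackets are deleted outright.
    return re.sub(r"[\[\]]", "", title)

def strip_brackets_spaced(title: str) -> str:
    # Alternative: turn each bracket into a space, then collapse space runs.
    spaced = re.sub(r"[\[\]]", " ", title)
    return re.sub(r" {2,}", " ", spaced).strip()

sample = "Some Show S01 [PACK][NF WEB-DL 1080p]"  # hypothetical title

print(strip_brackets_fused(sample))   # PACK and NF fuse into PACKNF
print(strip_brackets_spaced(sample))  # PACK and NF stay separate words
```

The second variant needs the extra whitespace-collapsing step, which is the "double spaces" clean-up mentioned above.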

# needs to be more specific and to use boundaries, currently:
# a full house > a BRDISK house
# fully loaded > BRDISKy loaded
#        - name: re_replace
#          args: ["(?i)(full(bluray)?)", "BRDISK"]

I don't understand the comments. I can only say that the FullBluRay specification is used when the file is a .iso image. If there is any further differentiation to be made, I am not aware of it.
On a separate note, while this pattern catches all .iso files, it leaves leftover fragments for some of the specifications used. Here are all the ways a .iso file is annotated, which could perhaps be included in the regex:
FullBluRay
Full Blu-Ray
Full UHD
UHD FullBluRay
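
One possible pattern covering the four annotations above is sketched below in Python. This is only a candidate, not the pattern used by the indexer; the name `ISO_PATTERN`, the helper `tag_iso`, and the sample titles are assumptions made for the example. Note that it folds the UHD marker into BRDISK, which may or may not be desirable.

```python
import re

# Optional leading "UHD ", then "Full", then either a Blu-Ray spelling or "UHD".
# Word boundaries keep ordinary words such as "full" or "fully" untouched.
ISO_PATTERN = re.compile(r"\b(?:UHD\s+)?Full\s?(?:Blu-?Ray|UHD)\b", re.IGNORECASE)

def tag_iso(title: str) -> str:
    """Replace any of the listed .iso annotations with BRDISK."""
    return ISO_PATTERN.sub("BRDISK", title)

for example in (
    "Movie [FullBluRay 1080p]",
    "Movie [Full Blu-Ray 1080p]",
    "Movie [Full UHD 2160p]",
    "Movie [UHD FullBluRay 2160p]",
    "A Full House",   # must stay unchanged
    "Fully Loaded",   # must stay unchanged
):
    print(example, ">", tag_iso(example))
```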

# what is the third language? if it can vary,
# then are the other two always English and Spanish?
#        - name: re_replace
#          args: ["(?i)\\btriaudio\\b", "MULTi SPANiSH ENGLiSH"]

The rules of the tracker indicate that every release must have at least the Spanish and the original-language audio tracks. This means that for movies whose original language is English, the third track is always Catalan (I couldn't find a counterexample). For anime and other foreign movies, it can vary (most likely Catalan, sometimes English). Unless there is some way of obtaining the original language of the release, I would not assume English or any other language aside from Spanish is present. For example, most anime will have Spanish, Catalan and Japanese and can still be marked as triaudio.

# not specific enough, not sure it can be made more specific
#        - name: re_replace
#          args: ["(?i)\\bdual\\b", "MULTi SPANiSH"]
#        - name: re_replace
#          args: ["(?i)\\bspa\\b", "SPANiSH"]

Unfortunately, it looks like Dual cannot be made more specific without relying on the position of the specification. If that is possible, then it always appears after the resolution and video encoding. The same goes for spa.
I must say I have searched for any release marked with spa and haven't found one. It is always either ES or, for some of the older releases, one of the other variants that are already covered by the regex "(?i)\\b(espa[ñn]ol|castellano|esp)\\b".
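
The position idea can be sketched as follows in Python. This only illustrates the principle and is not a Jackett filter: the helper name, the resolution pattern and the sample title are all assumptions, and expressing the same thing inside a single `re_replace` filter may need a different approach.

```python
import re

# Assumed shape of a resolution token such as 720p, 1080p or 2160p.
RESOLUTION = re.compile(r"\b\d{3,4}[pi]\b", re.IGNORECASE)
DUAL = re.compile(r"\bdual\b", re.IGNORECASE)

def replace_dual_after_resolution(title: str) -> str:
    """Rewrite 'Dual' only in the part of the title after the resolution."""
    match = RESOLUTION.search(title)
    if match is None:
        return title  # no resolution found: leave the title untouched
    head, tail = title[:match.end()], title[match.end():]
    return head + DUAL.sub("MULTi SPANiSH", tail)

# Hypothetical release whose movie name is itself "Dual".
print(replace_dual_after_resolution("Dual (2022) [BDRip 1080p x264 Dual AC3]"))
```

The word in the movie name is preserved because it sits before the resolution, while the audio tag after it is rewritten.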

# not specific enough, especially for a Spanish tracker,
# not sure it can be made more specific
#        - name: re_replace
#          args: [" ES ", " SPANiSH "]
#        - name: re_replace
#          args: ["(?i)\\bvas\\b", "BASQUE"]

ES is all the tracker gives for the usual releases (containing Spanish and the original language), so this is not possible without using the position. Honestly, since the most important rule on this tracker is that there must be a Spanish audio track, it could even be possible to just ignore everything and add SPANiSH to all titles.
What exactly makes vas less specific than all the other languages that follow?

Now, I think it would be great if the following patterns accepted the same variations as the ones you have included in the pairs above. For example, in any of the strings ES-CA-JA, ES-CAT-EN, the third language is not matched as it is written right now.

        - name: re_replace
          args: ["(?i)\\b(ingl[ée]s|[ei]ng)\\b", "ENGLiSH"]
        - name: re_replace
          args: ["(?i)\\bcat\\b", "CATALAN"]
        - name: re_replace
          args: ["(?i)\\b(jap|jp)\\b", "JAPANESE"]

Finally, just so it isn't left out, I would also handle French like this, since I've seen it appear as FR, FRA, FRE, FRAN and FREN:

        - name: re_replace
          args: ["(?i)\\bes-fr([ae]n?)?\\b", "MULTi SPANiSH FRENCH"]
...
        - name: re_replace
          args: ["(?i)\\bfr([ae]n?)?\\b", "FRENCH"]

To be fair, any other appearance of a second language, such as ES-GER, ES-ITA, ES-RU and all of their variations, will still break the ES -> SPANiSH replacement. They are not very common, but it might be worth adding them anyway. Here are their possible appearances:
Italian: IT, ITA
German: GER, GE, ALE, AL, ALEM
Russian: RU, RUS
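
A sketch of how the ES-paired filters could be generated from a table of these variants, written in Python for illustration only. The table contents come from this comment; the names `SECOND_LANGUAGE`, `build_pair_filters` and `apply_filters` are hypothetical, and the sample title is made up. Short codes are matched only directly after "ES-", to limit false positives.

```python
import re

# Variants as listed in this thread.
SECOND_LANGUAGE = {
    "FRENCH": ["FR", "FRA", "FRE", "FRAN", "FREN"],
    "ITALiAN": ["IT", "ITA"],
    "GERMAN": ["GER", "GE", "ALE", "AL", "ALEM"],
    "RUSSiAN": ["RU", "RUS"],
}

def build_pair_filters(table):
    """Return (compiled pattern, replacement) pairs, one per language."""
    filters = []
    for language, variants in table.items():
        # Longest variants first, so ALEM is tried before ALE and AL.
        ordered = sorted(variants, key=len, reverse=True)
        alternation = "|".join(re.escape(v) for v in ordered)
        pattern = re.compile(rf"\bes-(?:{alternation})\b", re.IGNORECASE)
        filters.append((pattern, f"MULTi SPANiSH {language}"))
    return filters

def apply_filters(title, filters):
    for pattern, replacement in filters:
        title = pattern.sub(replacement, title)
    return title

FILTERS = build_pair_filters(SECOND_LANGUAGE)
print(apply_filters("Film [BDRemux 1080p AVC ES-ALE DTS]", FILTERS))  # hypothetical title
```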

ilike2burnthing commented on May 27, 2024

Thanks, that's a great help. I'm working on making changes to the indexer while I type this comment, so hopefully I don't miss anything.

full(bluray)? means that bluray is optional, so it will match any use of full, and as there are no word boundaries it could be a partial match, such as fully. I've changed that now using your examples.
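
The partial-match behaviour can be reproduced with a short Python sketch. Python's `re` is used only for demonstration, and the bounded pattern shown is one possible tightening, not necessarily the one that went into the indexer.

```python
import re

# The original filter: "bluray" is optional and there are no word boundaries.
ORIGINAL = re.compile(r"(full(bluray)?)", re.IGNORECASE)

# One possible tightening: require the Blu-Ray part and add word boundaries.
BOUNDED = re.compile(r"\bfull\s?blu-?ray\b", re.IGNORECASE)

for title in ("a full house", "fully loaded", "Movie [FullBluRay 1080p]"):
    print(title)
    print("  original:", ORIGINAL.sub("BRDISK", title))
    print("  bounded: ", BOUNDED.sub("BRDISK", title))
```

With the original pattern the first two titles become "a BRDISK house" and "BRDISKy loaded"; the bounded pattern leaves them alone and rewrites only the third.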

I've changed triaudio to just be MULTi SPANiSH, so that we're not making assumptions.

As vas is a Spanish word (ir, voy, vas, va...), I could see it resulting in false positives. Same with other short words like en, es, ja, and ca. I don't speak Spanish, but Google Translate didn't pick up any of the rest as being Spanish.
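
A small Python sketch of the false-positive risk, using a made-up title in which "vas" appears both as an ordinary Spanish word and as an audio tag. The title and the ES-VAS tag are hypothetical; the case-sensitive variant relies on the assumption that tags are upper case while movie names are not.

```python
import re

title = "Dime adonde vas (2019) [WEB-DL 1080p ES-VAS]"  # hypothetical title

# Case-insensitive, as in the commented-out filter: both occurrences are hit.
insensitive = re.sub(r"(?i)\bvas\b", "BASQUE", title)

# Case-sensitive: only the upper-case tag is rewritten.
sensitive = re.sub(r"\bVAS\b", "BASQUE", title)

print(insensitive)
print(sensitive)
```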

The end result is slightly convoluted, but for most results it should be better than just throwing SPANiSH on the end, which we now only do as a last resort.

It's tested with a different indexer, so it shouldn't break anything, but if I've missed anything specific to HD-Olimpo, let me know. New build should be out in ~6hrs.

garfield69 commented on May 27, 2024

v0.21.2496

goioCodes commented on May 27, 2024

Thank you for the update. I updated Jackett and this seems like a good solution. But it looks like SPANiSH is being appended to all titles, even when it already appears in the title. For example:
Golpe a Wall Street (2023) [FullBluRay 1080p AVC ES-EN DTS-HD MA 5.1 Subs]HDO iso
Golpe a Wall Street (2023) [BRDISK 1080p AVC MULTi SPANiSH ENGLiSH DTS-HD MA 5.1 Subs]HDO iso SPANiSH
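
The intended "last resort" behaviour could be sketched like this in Python. It is not the indexer's actual logic; the function name is hypothetical and the single regex shown is just one way to express "append only if absent" with a negative lookahead.

```python
import re

def append_spanish_if_missing(title: str) -> str:
    """Append SPANiSH only when the title does not already contain it."""
    # The lookahead fails as soon as SPANiSH appears anywhere in the title,
    # so such titles are returned unchanged.
    return re.sub(r"^(?!.*\bSPANiSH\b)(.+)$", r"\1 SPANiSH", title)

already_tagged = "Golpe a Wall Street (2023) [BRDISK 1080p AVC MULTi SPANiSH ENGLiSH DTS-HD MA 5.1 Subs]HDO iso"
untagged = "Some Film (2020) [BDRemux 1080p AVC DTS Subs]HDO"  # hypothetical title

print(append_spanish_if_missing(already_tagged))  # unchanged
print(append_spanish_if_missing(untagged))        # gets a trailing SPANiSH
```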

Also, what I meant in my last paragraph is that it would be nice if the pattern that matches each language independently also had the same variations as the one that is paired with es-, in order to ensure that a language appearing in chains longer than 2 is still matched. For example,
Rebel (2022) [BDRemux 1080p AVC ES-CA-FR DTS-HD MA 5.1 Subs][HDO]
converts to
Rebel (2022) [BDRemux 1080p AVC MULTi SPANiSH CATALAN-FR DTS-HD MA 5.1 Subs][HDO] SPANiSH
because the second French regex does not consider the FR variation. If the title contained ES-CA-FRE, it would match.
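
The Rebel example can be reproduced with a two-step Python sketch. The patterns are the ones quoted in this thread; whether the released build uses exactly these is an assumption, and the final SPANiSH append is left out to keep the example short.

```python
import re

def normalise(title: str, french_pattern: str) -> str:
    # Step 1: the ES-CA / ES-CAT pair filter quoted earlier in the thread.
    title = re.sub(r"(?i)\bes-cat?\b", "MULTi SPANiSH CATALAN", title)
    # Step 2: the standalone French filter under test.
    return re.sub(french_pattern, "FRENCH", title)

title = "Rebel (2022) [BDRemux 1080p AVC ES-CA-FR DTS-HD MA 5.1 Subs][HDO]"

current = normalise(title, r"(?i)\bfr[ae]\b")        # requires FRA or FRE
widened = normalise(title, r"(?i)\bfr([ae]n?)?\b")   # also accepts FR, FRAN, FREN

print(current)  # FR is left as-is
print(widened)  # FR becomes FRENCH
```

The widened pattern also matches a bare FR anywhere in the title, so it carries the same false-positive risk as the other short codes.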

If there's some problem in the future I would still consider using the case-sensitive match for all languages independently (titles written in capitals are very rare), but this will work for 99% of releases 👍🏻
