Italian indexer MIRCrew has a problem (jackett/jackett)

Comments (54)

mtguido87 commented on June 29, 2024 1

I don't know exactly how prowlarr does things, but I do know that it imports the mircrew indexer code that we wrote for Jackett, and for Jackett the flow is:

perform a search on the mircrew web site, example https://mircrew-releases.org/search.php?keywords=%2Ble+%2Bali+%2Bdella+%2Bliberta&terms=all&sc=0&sf=titleonly&sr=topics&sk=t&sd=d&st=0&ch=300&t=0&submit=Cerca&fid[]=25&fid[]=26&fid[]=51&fid[]=52&fid[]=29&fid[]=30&fid[]=31&fid[]=33&fid[]=34&fid[]=35&fid[]=36&fid[]=37&fid[]=39&fid[]=40&fid[]=41&fid[]=42&fid[]=43&fid[]=45&fid[]=46&fid[]=47

and process the results page HTML by listing the names of the posts and providing default values for the seed, leech and size.

when a user clicks on the download link, perform the thank you by doing an HTTP GET, example: https://mircrew-releases.org/viewtopic.php?f=26&p=309816&thanks=309816&to_id=197&from_id=3950

then when the post page is refreshed, fetch the new page with another http GET, example: https://mircrew-releases.org/viewtopic.php?f=26&p=309816

Then we process this page with the HTML handler, looking for the first magnet link
  selectors:
    - selector: a[href^="magnet:?xt="]
      attribute: href
Note that this is not an edit of the post page, it's a browser HTML view, so there is no ASCII carriage return to process.
So your theory does not apply to Jackett, and I expect it does not apply to Prowlarr either, but I would want one of the Prowlarr team to confirm this.
So whatever issue is routinely preventing your Prowlarr indexer from returning the magnet, it is some other problem.
I note that the mircrew website has recently been under constant performance pressure; for example, I often see a "you cannot perform a search at this time, the load is too high" message that causes the search to return no results.
It's possible that the same applies to fetching the details page after the thank you is done, and it's simply this that prevents the magnet link from being found. (This is my theory only, not based on any actual research / testing).

As for the size, when the indexer performs a search, mircrew returns a list of post links in the search results page, but with no other details other than the name. So the indexer has hardcoded the size, seed and leech values, size 512MB, 1 seed, 1 leech. And while the seed and leech numbers are present on the post page once you have clicked thank you, I cannot find the size in the post page. But even if the size and updated seed and leech values were present in the post page, the indexer is not going to go fetch every post from the results page in order to provide those values, that is not practical.

However, I just double-checked, and the failure comes from exactly that: when the indexer tries to click the thank-you button and the button is not there, it clicks the button before it, which opens the quote view, and looks for the magnet there. I have already tried on Jackett (and tried again just now) and it does exactly the same thing.
To understand how it works you have to look at the logs; the configuration file alone does not tell you everything.
I have also tried about 20 cases and the rule is always the same: when the magnet wraps onto a new line it fails, when it does not it works. So that must be the problem.

On the other hand, regarding the size, how come it is not possible to extract the size from the page?

from jackett.

garfield69 commented on June 29, 2024 1

So what is really happening is that once a user has performed a thank you, the post page no longer has the thank you link.
So if you perform the search a second time and click on the indexer download link, the indexer tries to perform a thank you with https://mircrew-releases.org/viewtopic.php?f=26&p=309816&thanks=309816&to_id=197&from_id=3950 but ends up selecting the edit link instead. This is not intended and is the cause of the error.

The only way forward would be to recode the indexer in C# so that code can be added to check for the presence of the thank you link and only use it if found:
if the thank you link is not found, look for the magnet;
if the thank you link is found, do the thank you, then find the magnet.
[edit] corrected analysis
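
The branch proposed in this comment could be sketched roughly as follows. This is a Python illustration of the logic only, not Jackett's C# code: the `fetch` callable, the two regular expressions and the function names are hypothetical, and `fetch` is assumed to resolve relative URLs itself.

```python
import html
import re
from typing import Callable, Optional

# Hypothetical patterns: a thank-you link carries a "thanks=<id>" query
# parameter, and a magnet link is an href that holds "magnet:?xt=".
THANKS_LINK = re.compile(r'href="([^"]*\bthanks=\d+[^"]*)"')
MAGNET_LINK = re.compile(r'href="\s*(magnet:\?xt=[^"]*)"')


def find_magnet(page: str) -> Optional[str]:
    """Return the first magnet URI found in the page, or None."""
    match = MAGNET_LINK.search(page)
    return match.group(1).strip() if match else None


def get_magnet(post_url: str, fetch: Callable[[str], str]) -> Optional[str]:
    """Thank the post only when a thank-you link is present, then read the magnet."""
    page = fetch(post_url)
    thanks = THANKS_LINK.search(page)
    if thanks is None:
        # No thank-you link: the post was already thanked, so look for the magnet.
        return find_magnet(page)
    fetch(html.unescape(thanks.group(1)))   # perform the thank you (HTTP GET)
    return find_magnet(fetch(post_url))     # reload the post and read the magnet
```

With this shape, a second download attempt on an already-thanked post costs one request instead of three and never touches any other button on the page.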


garfield69 commented on June 29, 2024 1

that partly helps, in that the logs show the selector was picked up, but then it errors out with a bencode error.
I'm having a bad day diagnosing stuff, I'm going to take a break ;-b


garfield69 commented on June 29, 2024 1

\o/


garfield69 commented on June 29, 2024 1

Would there perhaps be a way by which the forum releasers can specify the size and have jackett be able to read it easily?

Probably a way forward would be for the releasers to include the size in their topic title within {}, for example:
title movie (2024) 1080p x265 {2.6 GB} ita eng AAC Sub ita eng-group or
title series 2024 - Stagione 01 (2024) [COMPLETA] 480p x264 {920 MB} ita eng AAC Sub ita eng-group

The indexer when processing the search results, could then look for the {} in the title and extract the size, and if not present default to 512MB
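
As a rough illustration of that idea (a Python sketch, not the indexer's actual YAML), the lookup could be written as below. The function name and the requirement that the braces start with a digit are assumptions made for the example; the 512MB fallback is the default mentioned in this thread.

```python
import re

DEFAULT_SIZE = "512MB"  # fallback used by the indexer, per this thread

# Assumption: only treat {...} as a size when its content starts with a digit.
BRACED_SIZE = re.compile(r"\{\s*(\d[^{}]*?)\s*\}")


def size_from_title(title: str) -> str:
    """Return the text found inside {} in the title, or the default size."""
    match = BRACED_SIZE.search(title)
    return match.group(1) if match else DEFAULT_SIZE


print(size_from_title("title movie (2024) 1080p x265 {2.6 GB} ita eng AAC Sub ita eng-group"))
print(size_from_title("title movie (2024) 1080p x265 ita eng AAC Sub ita eng-group"))
```

The returned string would still have to go through a size parser to become a byte count.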


ilike2burnthing commented on June 29, 2024 1

I'd prefer to have some requirement for brackets, just to keep false positives to a minimum (e.g. 2B (2009) 1080p would show a size of 2 bytes). How about the following?

[\[\({](1?\d{1,3}(?:[\.,]\d{1,2})?\s*[KMGTP]?i?B)[\]\)}]
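
A quick way to try this pattern against sample titles is the Python sketch below. The pattern is copied from the comment; the `find_size` helper and the case-insensitive flag are assumptions for the example.

```python
import re

# Pattern from the comment above; IGNORECASE is an assumption so that
# "gb", "Gb" and "GB" are all accepted.
SIZE_IN_BRACKETS = re.compile(
    r"[\[\({](1?\d{1,3}(?:[\.,]\d{1,2})?\s*[KMGTP]?i?B)[\]\)}]",
    re.IGNORECASE,
)


def find_size(title: str):
    """Return the first bracketed size in the title, or None."""
    match = SIZE_IN_BRACKETS.search(title)
    return match.group(1) if match else None


for sample in (
    "2B (2009) 1080p",                           # bare "2B" is not in brackets
    "title movie (2024) 1080p x265 {2.6 GB} ita",
    "title series [COMPLETA] 480p x264 [920 MB]",
):
    print(sample, "->", find_size(sample))
```

Because a bracket is required on both sides, neither the bare `2B` nor the year in `(2009)` is reported as a size.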


garfield69 commented on June 29, 2024 1

Given that the original query was for Radarr dual title unknown-movie flagging, I would just do movies to reduce the collateral, and definitely as an optional config setting. Then strip out the 2nd title from the names.


garfield69 commented on June 29, 2024

I don't know exactly how prowlarr does things, but I do know that it imports the mircrew indexer code that we wrote for Jackett, and for Jackett the flow is:

perform a search on the mircrew web site, example https://mircrew-releases.org/search.php?keywords=%2Ble+%2Bali+%2Bdella+%2Bliberta&terms=all&sc=0&sf=titleonly&sr=topics&sk=t&sd=d&st=0&ch=300&t=0&submit=Cerca&fid[]=25&fid[]=26&fid[]=51&fid[]=52&fid[]=29&fid[]=30&fid[]=31&fid[]=33&fid[]=34&fid[]=35&fid[]=36&fid[]=37&fid[]=39&fid[]=40&fid[]=41&fid[]=42&fid[]=43&fid[]=45&fid[]=46&fid[]=47
and process the results page HTML by listing the names of the posts and providing default values for the seed, leech and size.
when a user clicks on the download link, perform the thank you by doing an HTTP GET, example: https://mircrew-releases.org/viewtopic.php?f=26&p=309816&thanks=309816&to_id=197&from_id=3950
then when the post page is refreshed, fetch the new page with another http GET, example: https://mircrew-releases.org/viewtopic.php?f=26&p=309816
Then we process this page with the HTML handler, looking for the first magnet link

  selectors:
    - selector: a[href^="magnet:?xt="]
      attribute: href

Note that this is not an edit of the post page, it's a browser HTML view, so there is no ASCII carriage return to process.
So your theory does not apply to Jackett, and I expect it does not apply to Prowlarr either, but I would want one of the Prowlarr team to confirm this.
So whatever issue is routinely preventing your Prowlarr indexer from returning the magnet, it is some other problem.
I note that the mircrew website has recently been under constant performance pressure; for example, I often see a "you cannot perform a search at this time, the load is too high" message that causes the search to return no results.
It's possible that the same applies to fetching the details page after the thank you is done, and it's simply this that prevents the magnet link from being found. (This is my theory only, not based on any actual research / testing).

As for the size, when the indexer performs a search, mircrew returns a list of post links in the search results page, but with no other details other than the name. So the indexer has hardcoded the size, seed and leech values, size 512MB, 1 seed, 1 leech.
And while the seed and leech numbers are present on the post page once you have clicked thank you, I cannot find the size in the post page.
But even if the size and updated seed and leech values were present in the post page, the indexer is not going to go fetch every post from the results page in order to provide those values, that is not practical.
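
For readers unfamiliar with the selector in the flow above, `a[href^="magnet:?xt="]` picks the first anchor whose `href` starts with `magnet:?xt=`. A minimal Python sketch of the same lookup, using only the standard library, is shown here; the class and function names are made up for the example and this is not how Jackett implements it.

```python
from html.parser import HTMLParser
from typing import Optional


class FirstMagnet(HTMLParser):
    """Remember the first <a> whose href starts with 'magnet:?xt='."""

    def __init__(self) -> None:
        super().__init__()
        self.magnet: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "a" or self.magnet is not None:
            return
        href = dict(attrs).get("href") or ""
        if href.startswith("magnet:?xt="):  # same test as [href^="magnet:?xt="]
            self.magnet = href


def first_magnet(page: str) -> Optional[str]:
    parser = FirstMagnet()
    parser.feed(page)
    return parser.magnet
```

Only the already-fetched post page is inspected, so this step adds no extra requests.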


garfield69 commented on June 29, 2024

regarding the size, how come it is not possible to extract the size from the page

found the size on the post, sorry, missed it the first dozen times :-D

So we don't fetch the size from the post page, because it's not practical.
The indexer does a GET to perform the search, which may return up to 100 results.
Then the indexer would have to do another 100 GETs, one for each post, to read the size.

No website is going to want anyone to perform 101 GET requests every time someone does a search; the traffic would push a web site into overload. It may even block your IP as a DDoS generator.


ilike2burnthing commented on June 29, 2024

MIRCrew was changed from looking specifically for the thanks button a while ago - 259d98c. This is why on already thanked torrents it's selecting the quote button. This is working fine in both Jackett and Prowlarr for me. ilDraGoNeRo2 is also working using the same method (I've just updated the selector).

@garfield69 are you getting an error in Jackett?


garfield69 commented on June 29, 2024

@ilike2burnthing I believe the first time you search for a post that you have not already tried to download before, then the indexer works fine.
But if you search for it again and try to download it a second time then that is when you get the error.


ilike2burnthing commented on June 29, 2024

Yea, for me first, second, and all subsequent download attempts work fine.


garfield69 commented on June 29, 2024

Same here for me on Jackett. Is it something peculiar to prowlarr then, have you tried it there?


ilike2burnthing commented on June 29, 2024

Working on Prowlarr too. I think your original theory about it just coinciding with the site being up and down might be the likely culprit.


garfield69 commented on June 29, 2024

try searching for le ali della libert and see if you can download that once or twice.
[edit] the first and second torrents don't work, but the third one works ok
which I think is what the OP has been trying to tell us.


ilike2burnthing commented on June 29, 2024

Oh waoh, yea I can't even download it the first time. Looks like a formatting issue on their part. Change a[href^="magnet:?xt="] to a[href*="magnet:?xt="]?
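
The difference between the two selectors can be illustrated with plain string tests. This is a Python sketch; the `broken` value is a hypothetical href with a line break in front of the magnet, which is one way the formatting issue could look.

```python
NEEDLE = "magnet:?xt="


def starts_with(href: str) -> bool:
    """Equivalent of the CSS attribute selector [href^="magnet:?xt="]."""
    return href.startswith(NEEDLE)


def contains(href: str) -> bool:
    """Equivalent of the CSS attribute selector [href*="magnet:?xt="]."""
    return NEEDLE in href


clean = "magnet:?xt=urn:btih:abc"
broken = "\nmagnet:?xt=urn:btih:abc"  # hypothetical: line break before the magnet

print(starts_with(clean), contains(clean))
print(starts_with(broken), contains(broken))
```

The prefix test rejects the href with the leading line break, while the substring test still finds it. The value returned by the substring selector keeps the stray characters, so it may still need cleaning before use.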


ilike2burnthing commented on June 29, 2024

Yea, seeing that as well. Playing around with things and will come back in a bit.


mtguido87 commented on June 29, 2024

As I have written since the first post, the problem occurs with releases where the thank you button is not present.
In that case the system quotes the message (I don't know why it does that) and then takes the magnet from there.
The problem occurs in these cases when the magnet wraps onto a new line. I also put some explanatory pictures in the first post.

If you want to find a release that does not work, look for this one: The Green Mile 2160p Licdom, and you will see that the download fails.

https://i.postimg.cc/G3zfSnm2/image.png

Do you have any idea how to fix it?


garfield69 commented on June 29, 2024

Do you have any idea how to fix it?

I think we are finally understanding the scope of the problem (BTW thank you for persevering and remaining calm throughout my ramblings), and now that we can reproduce the problem for ourselves we will try to work out a fix, if possible, within the YAML construct.
But it may be that the solution requires a rewrite of the indexer in C#.


ilike2burnthing commented on June 29, 2024

Might be as simple as this:

  selectors:
    - selector: a[href*="magnet:?xt="]
      attribute: href
      filters:
        - name: re_replace
          args: ["\n", ""]

Currently just trying to work out if htmldecode is also needed.
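
In Python terms, the `re_replace` filter above amounts to a regex substitution on the extracted `href`. The sketch below is only an analogy: `clean_href` is a made-up name, and `html.unescape` stands in for what an `htmldecode` step would do if the attribute still contained entities such as `&amp;`.

```python
import html
import re


def clean_href(href: str, decode: bool = False) -> str:
    """Drop line breaks from the href; optionally decode HTML entities too."""
    cleaned = re.sub(r"\n", "", href)  # same arguments as the filter: "\n" -> ""
    return html.unescape(cleaned) if decode else cleaned


print(clean_href("\nmagnet:?xt=urn:btih:abc\n&dn=x"))
```

Whether the decode step is needed depends on how the HTML handler returns attribute values, which the comment leaves open.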


mtguido87 commented on June 29, 2024

This is the thread for "The Green Mile" release that does not work; as you can see in the image, the magnet wraps onto a new line:

https://i.postimg.cc/PJYfMf1S/image.png


mtguido87 commented on June 29, 2024

Might be as simple as this:
  selectors:
    - selector: a[href*="magnet:?xt="]
      attribute: href
      filters:
        - name: re_replace
          args: ["\n", ""]
Currently just trying to work out if htmldecode is also needed.

Thank you so much for investigating and trying to fix the problem.
If you also want to solve the problem of the fixed "512MB" size, I will offer you a sweet cappuccino!!


ilike2burnthing commented on June 29, 2024

#15034 (comment)


garfield69 commented on June 29, 2024

So ilike2burnthing has found a solve for the magnet issue, and it will be available in the Jackett app with the next release due out in about 5 hours or so.
When the Prowlarr team import the updated indexer from our source (anytime they please from now) then this should be available on that app too.


mtguido87 commented on June 29, 2024

So ilike2burnthing has found a solve for the magnet issue, and it will be available in the Jackett app with the next release due out in about 5 hours or so. When the Prowlarr team import the updated indexer from our source (anytime they please from now) then this should be available on that app too.

You guys are awesome!!! THANK YOU! On behalf of myself and others in the MIRCrew community who will benefit!
Will Prowlarr automatically pick up this fix, or does it have to be reported to someone?

You explained that there is no way to solve the file size problem from your side, right?
Would there perhaps be a way by which the forum releasers can specify the size and have jackett be able to read it easily? I could then flag it and solve that problem as well.
Let me know!

Thanks


ilike2burnthing commented on June 29, 2024

When the Prowlarr team import the updated indexer from our source (anytime they please from now) then this should be available on that app too.

Unless the file size of each release is in the search results page (which they can't/aren't going to do because the site is just a forum), this can't be fixed by them.


mtguido87 commented on June 29, 2024

Would there perhaps be a way by which the forum releasers can specify the size and have jackett be able to read it easily?

Probably a way forward would be for the releasers to include the size in their topic title within {}, for example: title movie (2024) 1080p x265 {2.6 GB} ita eng AAC Sub ita eng-group or title series 2024 - Stagione 01 (2024) [COMPLETA] 480p x264 {920 MB} ita eng AAC Sub ita eng-group

The indexer when processing the search results, could then look for the {} in the title and extract the size, and if not present default to 512MB

This would be a very good solution.
If you give me confirmation that you could update jackett with this feature I will immediately try to notify the releasers to include the size between {} in the title, hopefully they will accept this proposal.

Do they just have to put the size between {}? Is it preferable to use a period or comma for the two decimals? Should GB also be written after the numbers?

Which of the following?
{3,60}
{3.60}
{3.60GB}
{3.60GB}
{3.60gb}
{3.60gb}

THANK YOU


garfield69 commented on June 29, 2024

I chose braces because they are not usually part of the title, so it would be easy for the indexer to detect and attempt to parse the content as a size value.

The Jackett size parse tool is fairly flexible in its attempts to recognise a size value, as shown in this code snippet

        // ex: " 3.5  gb   " -> "3758096384" , "3,5GB" -> "3758096384" ,  "296,98 MB" -> "311406100.48" , "1.018,29 MB" -> "1067754455.04"
        // ex:  "1.018.29mb" -> "1067754455.04" , "-" -> "0" , "---" -> "0"
        public static long GetBytes(string str)
        {
            var valStr = new string(str.Where(c => char.IsDigit(c) || c == '.' || c == ',').ToArray());
            valStr = (valStr.Length == 0) ? "0" : valStr.Replace(",", ".");
            if (valStr.Count(c => c == '.') > 1)
            {
                var lastOcc = valStr.LastIndexOf('.');
                valStr = valStr.Substring(0, lastOcc).Replace(".", string.Empty) + valStr.Substring(lastOcc);
            }
            var unit = new string(str.Where(char.IsLetter).ToArray());
            var val = CoerceFloat(valStr);
            return GetBytes(unit, val);
        }

        public static long GetBytes(string unit, float value)
        {
            unit = unit.Replace("i", "").ToLowerInvariant();
            if (unit.Contains("kb"))
                return BytesFromKB(value);
            if (unit.Contains("mb"))
                return BytesFromMB(value);
            if (unit.Contains("gb"))
                return BytesFromGB(value);
            if (unit.Contains("tb"))
                return BytesFromTB(value);
            return (long)value;
        }

But for clarity and simplicity, {2 TB} or {6.9 GB} or {20 MB} or {400 KB} would most reliably get recognised.
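
To experiment with which title formats would be recognised, the snippet above can be mirrored in Python. This is a sketch, not Jackett code: the 1024-based multipliers are inferred from the `"3.5 gb" -> "3758096384"` example in the comment, the fallback to 0 for unparsable numbers is an assumption about `CoerceFloat`, and Python's double precision may differ slightly from C#'s `float` on fractional results.

```python
def get_bytes(text: str) -> int:
    """Python rendering of the GetBytes(string) logic quoted above."""
    # Keep digits and separators, then normalise commas to dots.
    val = "".join(c for c in text if c.isdigit() or c in ".,")
    val = "0" if not val else val.replace(",", ".")
    if val.count(".") > 1:
        # Several dots: only the last one is the decimal separator.
        last = val.rfind(".")
        val = val[:last].replace(".", "") + val[last:]
    unit = "".join(c for c in text if c.isalpha()).replace("i", "").lower()
    try:
        value = float(val)
    except ValueError:
        value = 0.0  # assumption: unparsable input is treated as 0
    # Assumption: binary multipliers, as implied by the "3.5 gb" example.
    for name, power in (("kb", 1), ("mb", 2), ("gb", 3), ("tb", 4)):
        if name in unit:
            return int(value * 1024 ** power)
    return int(value)


for sample in ("{2 TB}", "{20 MB}", "{400 KB}", "3,5GB", "-"):
    print(sample, "->", get_bytes(sample))
```

Note that every letter in the input ends up in the unit string, so the parser should be handed only the size token (for example the content of the braces), not the whole title.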


garfield69 commented on June 29, 2024

size extractor from title added to indexer.


garfield69 commented on June 29, 2024

v0.21.1672


mtguido87 commented on June 29, 2024

Again, thank you for the very quick action.

I am preparing a thread on the MIRCrew forum to ask the releasers to use this formatting in the titles for size.

On the other hand, regarding the failure to report seeders and leechers, couldn't this be solved with the same method they use in the forum to report them? Basically the system contacts the first tracker in the magnet trackerlist and gets the related information from there, couldn't jackett do that too?


ilike2burnthing commented on June 29, 2024

Again, we're not going to do 100 additional requests for every search to return this data.


mtguido87 commented on June 29, 2024

Again, we're not going to do 100 additional requests for every search to return this data.

Sorry, I don't know how it works internally, so I'm probably saying something stupid.
But jackett can now pick up the magnet without any problems, especially after the fix you made yesterday.
Once picked up, wouldn't it be able to get the seeders and leechers directly from the magnet's tracker list without having to make 100 calls?


ilike2burnthing commented on June 29, 2024

No.


mtguido87 commented on June 29, 2024

Okay, I tried. So I think we will definitely have to do without the seeders and leechers.

Thank you all the same!


mtguido87 commented on June 29, 2024

Good evening,
I proposed on mircrew to start putting the size of releases in curly brackets and they seem to have welcomed it, at least for future releases.
However, there is a releaser who asked if it would be possible, instead of looking for the size between the brackets, to look for it through a regex of this type:

(\d+[\.,]?\d+?\s*[KkMmGgTtBb][bB]?)

This is because many existing releases have the size in square brackets. So we would gain the size not only for future releases but also for past ones!!!
Moreover, with such a regex the size would be picked up in practically every form, even if it were not in brackets.

Is this a feasible thing for you?

Thanks!!!


ilike2burnthing commented on June 29, 2024

We can make it case insensitive and then use the following to reduce any false positives to a minimum:

\[(1?\d{1,3}(?:[\.,]\d{1,2})?\s*[KMGTP]?i?B)\]

This would mean that [1009,11 GiB] and [57kb] would be detected, but something like 2024b would not.
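
The effect of requiring square brackets can be checked with a short Python sketch. `STRICT` is the pattern from this comment with the case-insensitive flag applied, `LOOSE` is the pattern the releaser suggested earlier in the thread, and the `first` helper is made up for the example.

```python
import re

LOOSE = re.compile(r"(\d+[\.,]?\d+?\s*[KkMmGgTtBb][bB]?)")
STRICT = re.compile(
    r"\[(1?\d{1,3}(?:[\.,]\d{1,2})?\s*[KMGTP]?i?B)\]",
    re.IGNORECASE,
)


def first(pattern, title: str):
    """Return the first captured size for the pattern, or None."""
    match = pattern.search(title)
    return match.group(1) if match else None


for sample in ("Movie [1009,11 GiB] ita", "Movie [57kb]", "Movie 2024b"):
    print(sample, "| strict:", first(STRICT, sample), "| loose:", first(LOOSE, sample))
```

The loose pattern reports `2024b` as a size, which is the kind of false positive the brackets are meant to rule out.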


EmilioSalgari commented on June 29, 2024

I suggest (1?\d{1,3}(?:[\.,]\d{1,2})?\s*[KMGTP]?i?B)[^\w] so that [1009,11 GiB] [57kb] (43Kb) 6.9Tb would be detected but 320kbps would not.
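
This variant can be tried the same way. The Python sketch below assumes case-insensitive matching (the examples mix `kb`, `Kb` and `Tb`); the `sizes` helper is a made-up name.

```python
import re

SIZE = re.compile(
    r"(1?\d{1,3}(?:[\.,]\d{1,2})?\s*[KMGTP]?i?B)[^\w]",
    re.IGNORECASE,  # assumption, needed for "57kb", "43Kb" and "6.9Tb"
)


def sizes(title: str):
    """Return every size-like token followed by a non-word character."""
    return SIZE.findall(title)


print(sizes("a [1009,11 GiB] b [57kb] c (43Kb) d 6.9Tb e"))
print(sizes("mp3 320kbps"))
print(sizes("movie 6.9Tb"))  # size is the last token of the title
```

One thing to keep in mind: `[^\w]` must consume a character, so a size that is the very last token of a title is not matched. A lookahead such as `(?!\w)` would accept that case as well.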


mtguido87 commented on June 29, 2024

Good morning and thank you also for this update with Regex. We have communicated on the MIRCrew forum that they can now use brackets as so many were doing previously.

The TV series issue has come up now. On the forum it often happens that a magnet is posted within the thread for each episode of the season.
Therefore, would it be possible (for Sonarr only) to make it so that when the download button is clicked, not only the first magnet in the thread is picked up but all of them? Would it create some kind of problem? I read in your tracker information that this is possible but would have to be coded in C#. Could you take care of this?
Thank you


ilike2burnthing commented on June 29, 2024

Not possible, in YAML or C#, without creating the aforementioned massive amount of additional traffic - #14543 (comment)


mtguido87 commented on June 29, 2024

Not possible, in YAML or C#, without creating the aforementioned massive amount of additional traffic - #14543 (comment)

Why should it generate massive traffic? I just ask that when you click download it doesn't just take the first magnet but all the ones in the thread. The action would be triggered only when the download is pressed so not for every search done.


ilike2burnthing commented on June 29, 2024

We present a single torrent/magnet for each result. If there are multiple torrents/magnets then we would present them as separate results, e.g. per episode/season/quality.

To do that in this case, we'd need to fetch every results page, click thanks on them all, refresh, and return all the magnets.


mtguido87 commented on June 29, 2024

We present a single torrent/magnet for each result. If there are multiple torrents/magnets then we would present them as separate results, e.g. per episode/season/quality.

To do that in this case, we'd need to fetch every results page, click thanks on them all, refresh, and return all the magnets.

Of course it is clear that this is not possible, and I understand that this would generate a lot of traffic.
Instead, what I was proposing is this: since mircrew is a forum and a thread is created for each season, the thread title (and thus also the result in Sonarr) indicates the entire season. On clicking download, it should queue all the magnets contained in the thread and not just the first one (as it does now). Provided, of course, that this is possible and that you agree with the approach.


ilike2burnthing commented on June 29, 2024

This isn't possible, no.


garfield69 commented on June 29, 2024

The reason this is not possible, is because, as you may know, Jackett abides by the Torznab specifications with regards to the format of the queries and the responses.
The specs allow only for a single download link for every title returned from a query request.
So to provide a title with multiple download links would contravene the standards, and we are not going down that rabbit hole.


mtguido87 commented on June 29, 2024

The reason this is not possible, is because, as you may know, Jackett abides by the Torznab specifications with regards to the format of the queries and the responses. The specs allow only for a single download link for every title returned from a query request. So to provide a title with multiple download links would contravene the standards, and we are not going down that rabbit hole.

Ok now the reason is more clear. Thank you for the explanation.


boz96 commented on June 29, 2024

Hi guys, first of all thanks for all the work you are doing.
I take this opportunity to ask another thing: is it possible to bypass the "unknown movie" error?
Because on MIRCrew many releases have the double title [ITA_NAME - ENG_NAME (YYYY)] which is not recognized by Radarr. As a result, a good 50% of releases would not be taken into consideration.
Thanks a lot


garfield69 commented on June 29, 2024

is it possible to bypass the "unknown movie" error?

This is a limitation of Radarr which wants to match the full title.
Unfortunately I don't see a way around this, because the mircrew web site does not support tmdbid or imdbid searching, which is the usual way to bypass the title and work off the id.
Some multi-lingual web sites have a title in the primary language and an alternate title with the other language.
But since mircrew only provides the topic title in search results, there is no reliable way for the indexer to edit the English title (or the Italian title for that matter) out of the topic title and thus return a shortened single-language title which Radarr could match.


garfield69 commented on June 29, 2024

However, if the releasers were to add the year to the Italian side of the title then Radarr would probably match.
eg. il miglio verde (1999) - the green mile (1999) 1080p x265 ITA ENG {4.0GB} -group


ilike2burnthing commented on June 29, 2024

We could add an indexer setting that would strip out the second part of the title. A re_replace along the lines of:

^(.+ )- (?!Stagion[ei] \d|Special \((?:19|20)\d{2}|Miniserie).+? (\((?:19|20)\d{2})

It would definitely need to be disabled by default, as no matter how much I tweak that it's going to have false positives.


garfield69 commented on June 29, 2024

Hmm. Would tagging whether it's a movie or not reduce the candidates and thus the false hits?
something like:

    _is_movie:
      selector: a[href^="./viewforum.php?f="]
      attribute: href
      case:
        a[href*="f=25"]: yes
        a[href*="f=26"]: yes
        a[href*="f=34"]: yes
        a[href*="f=36"]: yes
        "*": no


ilike2burnthing commented on June 29, 2024

Yea, that would definitely help if we only want to do it for movies. There were some examples of TV series with dual language titles (e.g. His Dark Materials, Percy Jackson and the Olympians, Servant of the People); they were few and far between, but are there.

I ended up with this:
^(.+ )- (?!Stagion[ei] \d|Special \((?:19|20)\d{2}|Miniserie).+? ((?:- Stagion[ei] \d+(?:-\d+)? )?\((?:19|20)\d{2}|\[COMPLETA|\[IN CORSO|\[IN PAUSA|\[INCOMPLETA)
but even that had false positives, and it breaks a lot of stuff outside of Film and Serie TV categories (though I suppose we could also add _is_serie_tv).

I'd lean towards just movies for now, and if we get reports of issues with 'unknown series' in Sonarr then we can revisit. What do you think?

Just movies:
^(.+ )- .+? (\((?:19|20)\d{2})
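
Put together, the movies-only approach could look like the Python sketch below. The pattern is the "just movies" one from this comment and the forum ids 25, 26, 34 and 36 come from the `_is_movie` proposal earlier in the thread. The replacement `\1\2` (keep the first title and the year), the function names, and the exact-id check are assumptions for the example; the proposed YAML uses substring matches such as `f=25` instead.

```python
import re
from urllib.parse import parse_qs, urlparse

MOVIE_FORUM_IDS = {"25", "26", "34", "36"}  # ids from the _is_movie proposal
SECOND_TITLE = re.compile(r"^(.+ )- .+? (\((?:19|20)\d{2})")


def is_movie(forum_href: str) -> bool:
    """True when the viewforum link points at one of the movie forums."""
    query = parse_qs(urlparse(forum_href).query)
    return any(f in MOVIE_FORUM_IDS for f in query.get("f", []))


def strip_second_title(title: str, forum_href: str) -> str:
    """Drop the second-language title, for movie topics only."""
    if not is_movie(forum_href):
        return title
    # Assumption: the replacement keeps group 1 (first title) and group 2 (year).
    return SECOND_TITLE.sub(r"\1\2", title, count=1)


print(strip_second_title(
    "il miglio verde - the green mile (1999) 1080p x265 ITA ENG",
    "./viewforum.php?f=26",
))
```

Titles from other forums pass through unchanged, which keeps the rewrite away from series names that legitimately contain " - ".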


garfield69 commented on June 29, 2024

v0.21.1700


TheDestr0yer commented on June 29, 2024

I just tested the release and confirm that it now works. I also tried downloading and it goes without any problems. Thanks !

This is the title in Radarr

and this is the title on MIRCrew

