Comments (54)
I don't know exactly how Prowlarr does things, but I do know that it imports the mircrew indexer code that we wrote for Jackett, and for Jackett the flow is:
- perform a search on the mircrew web site, for example:
https://mircrew-releases.org/search.php?keywords=%2Ble+%2Bali+%2Bdella+%2Bliberta&terms=all&sc=0&sf=titleonly&sr=topics&sk=t&sd=d&st=0&ch=300&t=0&submit=Cerca&fid[]=25&fid[]=26&fid[]=51&fid[]=52&fid[]=29&fid[]=30&fid[]=31&fid[]=33&fid[]=34&fid[]=35&fid[]=36&fid[]=37&fid[]=39&fid[]=40&fid[]=41&fid[]=42&fid[]=43&fid[]=45&fid[]=46&fid[]=47
- process the results page HTML by listing the names of the posts and providing default values for the seed, leech and size.
- when a user clicks on the download link, perform the thank you by doing an HTTP GET, for example:
https://mircrew-releases.org/viewtopic.php?f=26&p=309816&thanks=309816&to_id=197&from_id=3950
- once the post page has been refreshed, fetch the new page with another HTTP GET, for example:
https://mircrew-releases.org/viewtopic.php?f=26&p=309816
- process this page with the HTML handler, looking for the first magnet link:
selectors:
  - selector: a[href^="magnet:?xt="]
    attribute: href
Note that this is not an edit of the post page, it's a browser HTML view, so there is no ASCII carriage return to process.
So your theory does not apply to Jackett, and I expect it does not apply to Prowlarr either, but I would want one of the Prowlarr team to confirm this.
So whatever issue is routinely preventing your Prowlarr indexer from returning the magnet, it is some other problem.
I note that the mircrew website has recently been under constant performance pressure; for example, I often see a "you cannot perform a search at this time, the load is too high" message that causes the search to return no results.
It's possible that the same applies to fetching the details page after the thank you is done, and it's simply this that prevents the magnet link from being found. (This is my theory only, not based on any actual research / testing.)
As for the size, when the indexer performs a search, mircrew returns a list of post links in the search results page, but with no details other than the name. So the indexer has hardcoded the size, seed and leech values: size 512MB, 1 seed, 1 leech.
And while the seed and leech numbers are present on the post page once you have clicked thank you, I cannot find the size in the post page.
But even if the size and updated seed and leech values were present in the post page, the indexer is not going to fetch every post from the results page in order to provide those values; that is not practical.
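The two GET requests in the download step described above can be sketched as follows. This is only an illustration built from the example URLs in this comment; the function and parameter names are made up here, and the meaning of `to_id` and `from_id` is not explained in the thread, so they are simply passed through.

```python
from urllib.parse import urlencode

# Base URL taken from the example links in the comment above.
BASE = "https://mircrew-releases.org/viewtopic.php"


def thanks_url(forum_id, post_id, to_id, from_id):
    """URL of the GET that performs the thank you (hypothetical helper)."""
    query = {
        "f": forum_id,
        "p": post_id,
        "thanks": post_id,
        "to_id": to_id,
        "from_id": from_id,
    }
    return f"{BASE}?{urlencode(query)}"


def post_url(forum_id, post_id):
    """URL of the second GET, which re-fetches the post page (hypothetical helper)."""
    return f"{BASE}?{urlencode({'f': forum_id, 'p': post_id})}"


print(thanks_url(26, 309816, 197, 3950))
print(post_url(26, 309816))
```

The magnet link is then looked up in the HTML returned by the second request.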
And yet I just double-checked, and the failure comes from exactly there: when the indexer tries to give thanks and the button is not there, it clicks the previous button instead, which opens the quote view, and it looks for the magnet there. I have already tried this on Jackett (and tried again just now) and it does exactly the same thing.
To understand how it works you have to look at the logs; the configuration file alone will not get you there.
I have also tried about 20 releases and the rule is always the same: when the magnet wraps onto a new line it fails, and when it does not it works. So that has to be the problem.
On the other hand, regarding the size, how come it is not possible to extract the size from the page?
from jackett.
So what is really happening is that once a user has performed a thank you, the post page does not have the thank you link anymore.
So if you perform a search a second time and click on the indexer download link, the indexer tries to perform a thank you with https://mircrew-releases.org/viewtopic.php?f=26&p=309816&thanks=309816&to_id=197&from_id=3950
but ends up selecting the edit link instead. This is not intended and is the cause of the error.
The only way forward would be to recode the indexer in C#, so that code can be added to check for the presence of the thank you link and only use it if found:
if the thank you link is not found, look for the magnet.
if the thank you link is found, do the thank you, then find the magnet.
[edit] corrected analysis
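The check proposed above can be sketched in a few lines. This is a rough illustration in Python rather than the C# the comment has in mind, and it assumes that a thank you link can be recognised by `thanks=` appearing in its href, as in the example URL; the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def plan_download(post_html):
    """Return the steps to run for a post page (hypothetical helper)."""
    parser = LinkCollector()
    parser.feed(post_html)
    # Assumption: the thank you link carries a "thanks=" query parameter.
    if any("thanks=" in href for href in parser.hrefs):
        return ["thank", "refetch", "find_magnet"]
    # Already thanked: skip straight to the magnet lookup.
    return ["find_magnet"]
```

With this shape, a post that has already been thanked never triggers a click on whichever button happens to sit where the thank you link used to be.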
from jackett.
That partly helps, in that the logs show the selector was picked up, but then it errors out with a bencode error.
I'm having a bad day diagnosing stuff, I'm going to take a break ;-b
from jackett.
\o/
from jackett.
Would there perhaps be a way by which the forum releasers can specify the size and have jackett be able to read it easily?
Probably a way forward would be for the releasers to include the size in their topic title within {}, for example:
title movie (2024) 1080p x265 {2.6 GB} ita eng AAC Sub ita eng-group
or
title series 2024 - Stagione 01 (2024) [COMPLETA] 480p x264 {920 MB} ita eng AAC Sub ita eng-group
The indexer when processing the search results, could then look for the {} in the title and extract the size, and if not present default to 512MB
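A minimal sketch of that idea, assuming the size is the only thing the releasers put inside `{}`; the function name and the regular expression are illustrative and are not the indexer's actual code.

```python
import re

DEFAULT_SIZE = "512MB"  # fallback mentioned in the comment above
BRACE_CONTENT = re.compile(r"\{([^{}]+)\}")


def size_from_title(title):
    """Return the text inside the first {...} in the title, else the default."""
    match = BRACE_CONTENT.search(title)
    return match.group(1).strip() if match else DEFAULT_SIZE


print(size_from_title("title movie (2024) 1080p x265 {2.6 GB} ita eng AAC Sub ita eng-group"))
print(size_from_title("title movie (2024) 1080p x265 ita eng AAC Sub ita eng-group"))
```

The extracted string would still need to be converted to bytes by the indexer's size parser.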
from jackett.
I'd prefer to have some requirement for brackets, just to keep false positives to a minimum (e.g. 2B (2009) 1080p would show a size of 2 bytes). How about the following?
[\[\({](1?\d{1,3}(?:[\.,]\d{1,2})?\s*[KMGTP]?i?B)[\]\)}]
from jackett.
Given that the original query was for Radarr dual title unknown-movie flagging, I would just do movies to reduce the collateral, and definitely as an optional config setting. Then strip out the second title from the names.
from jackett.
regarding the size, how come it is not possible to extract the size from the page
Found the size on the post, sorry, missed it the first dozen times :-D
So we don't fetch the size from the post page, because it's not practical.
The indexer does a GET to perform the search, which may return up to 100 results.
Then the indexer would have to do another 100 GETs, one for each post, to read the size.
No website is going to want anyone to perform 101 GET requests every time someone does a search; the traffic would push the site into overload. It may even block your IP as a DDoS source.
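The arithmetic behind that figure, as a tiny sketch (the function name is made up for illustration):

```python
def requests_per_search(results, fetch_each_post):
    """One GET for the search page, plus one per result if every post is fetched."""
    return 1 + (results if fetch_each_post else 0)


print(requests_per_search(100, fetch_each_post=False))
print(requests_per_search(100, fetch_each_post=True))
```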
from jackett.
MIRCrew was changed from looking specifically for the thanks button a while ago - 259d98c. This is why on already thanked torrents it's selecting the quote button. This is working fine in both Jackett and Prowlarr for me. ilDraGoNeRo2 is also working using the same method (I've just updated the selector).
@garfield69 are you getting an error in Jackett?
from jackett.
@ilike2burnthing I believe the first time you search for a post that you have not already tried to download before, then the indexer works fine.
But if you search for it again and try to download it a second time then that is when you get the error.
from jackett.
Yea, for me first, second, and all subsequent download attempts work fine.
from jackett.
Same here for me on Jackett. Is it something peculiar to prowlarr then, have you tried it there?
from jackett.
Working on Prowlarr too. I think your original theory about it just coinciding with the site being up and down might be the likely culprit.
from jackett.
Try searching for le ali della liberta and see if you can download that once or twice.
[edit] the first and second torrents don't work, but the third one works ok
which I think is what the OP has been trying to tell us.
from jackett.
Oh waoh, yea I can't even download it the first time. Looks like a formatting issue on their part. Change a[href^="magnet:?xt="] to a[href*="magnet:?xt="] ?
from jackett.
Yea, seeing that as well. Playing around with things and will come back in a bit.
from jackett.
As I have written since the first post, the problem occurs with releases where the thank you button is not present.
In that case the system quotes the message (I don't know why it does that) and then takes the magnet.
In these cases the problem occurs when the magnet wraps onto a new line. I also put some explanatory pictures in the first post.
If you want to see a release that does not work, look for this one: The Green Mile 2160p Licdom, and you will see that the download fails.
https://i.postimg.cc/G3zfSnm2/image.png
Do you have any idea how to fix it?
from jackett.
Do you have any idea how to fix it?
I think we are finally understanding the scope of the problem (BTW thank you for persevering and remaining calm throughout my ramblings), and now that we can reproduce the problem for ourselves we will try to work out a solve, if possible, within the yaml construct.
But it may be that the solution is a rewrite of the indexer in C#.
from jackett.
Might be as simple as this:
selectors:
  - selector: a[href*="magnet:?xt="]
    attribute: href
    filters:
      - name: re_replace
        args: ["\n", ""]
Currently just trying to work out if htmldecode is also needed.
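Why the change from `^=` to `*=` plus the `re_replace` filter would help can be shown with a small sketch. It assumes the broken posts have a line break at the start of the href value, which is a guess based on the screenshots described in this thread; the function name and the shortened magnet value are placeholders.

```python
import re

MAGNET_MARKER = "magnet:?xt="


def pick_magnet(hrefs, contains=True):
    """Return the first magnet href with line breaks removed, or None.

    contains=False mimics a[href^=...] (prefix match),
    contains=True mimics a[href*=...] (substring match).
    """
    for href in hrefs:
        if contains:
            matched = MAGNET_MARKER in href
        else:
            matched = href.startswith(MAGNET_MARKER)
        if matched:
            # Same effect as the re_replace filter with args ["\n", ""].
            return re.sub(r"\n", "", href)
    return None


# Assumed shape of a broken post: the href starts with a line break.
hrefs = ["./viewforum.php?f=26", "\nmagnet:?xt=urn:btih:abc&dn=x"]
print(pick_magnet(hrefs, contains=False))
print(pick_magnet(hrefs, contains=True))
```

Under that assumption the prefix match finds nothing, while the substring match finds the link and the replacement removes the stray line break.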
from jackett.
This is the thread for the "The Green Mile" release that does not work; as you can see in the image, the magnet wraps onto a new line:
https://i.postimg.cc/PJYfMf1S/image.png
from jackett.
Might be as simple as this:
selectors:
  - selector: a[href*="magnet:?xt="]
    attribute: href
    filters:
      - name: re_replace
        args: ["\n", ""]
Currently just trying to work out if htmldecode is also needed.
Thank you so much for investigating and trying to fix the problem.
If you also want to solve the problem of the fixed "512MB" size, I will buy you a sweet cappuccino!!
from jackett.
So ilike2burnthing has found a solve for the magnet issue, and it will be available in the Jackett app with the next release due out in about 5 hours or so.
When the Prowlarr team import the updated indexer from our source (anytime they please from now) then this should be available on that app too.
from jackett.
So ilike2burnthing has found a solve for the magnet issue, and it will be available in the Jackett app with the next release due out in about 5 hours or so. When the Prowlarr team import the updated indexer from our source (anytime they please from now) then this should be available on that app too.
You guys are awesome!!! THANK YOU! On behalf of myself and others in the MIRCrew community who will benefit!
Will Prowlarr pick up this fix automatically, or does it have to be reported to someone?
And as you explained to me, there is no way to solve the file size problem from your side, right?
Would there perhaps be a way by which the forum releasers can specify the size and have Jackett be able to read it easily? I could then raise it with them and solve that problem as well.
Let me know!
Thanks
from jackett.
When the Prowlarr team import the updated indexer from our source (anytime they please from now) then this should be available on that app too.
Unless the file size of each release is in the search results page (which they can't/aren't going to do because the site is just a forum), this can't be fixed by them.
from jackett.
Would there perhaps be a way by which the forum releasers can specify the size and have Jackett be able to read it easily?
Probably a way forward would be for the releasers to include the size in their topic title within {}, for example:
title movie (2024) 1080p x265 {2.6 GB} ita eng AAC Sub ita eng-group
or
title series 2024 - Stagione 01 (2024) [COMPLETA] 480p x264 {920 MB} ita eng AAC Sub ita eng-group
The indexer, when processing the search results, could then look for the {} in the title and extract the size, and if not present default to 512MB
This would be a very good solution.
If you give me confirmation that you could update jackett with this feature I will immediately try to notify the releasers to include the size between {} in the title, hopefully they will accept this proposal.
Do they just have to put the size between {}? Is it preferable to use a period or comma for the two decimals? Should GB also be written after the numbers?
Which of the following?
{3,60}
{3.60}
{3.60GB}
{3.60GB}
{3.60gb}
{3.60gb}
THANK YOU
from jackett.
I chose braces because they are not usually part of the title, so it would be easy for the indexer to detect them and attempt to parse the content as a size value.
The Jackett size parsing tool is fairly flexible in its attempts to recognise a size value, as shown in this code snippet:
// ex: " 3.5 gb " -> "3758096384" , "3,5GB" -> "3758096384" , "296,98 MB" -> "311406100.48" , "1.018,29 MB" -> "1067754455.04"
// ex: "1.018.29mb" -> "1067754455.04" , "-" -> "0" , "---" -> "0"
public static long GetBytes(string str)
{
    var valStr = new string(str.Where(c => char.IsDigit(c) || c == '.' || c == ',').ToArray());
    valStr = (valStr.Length == 0) ? "0" : valStr.Replace(",", ".");
    if (valStr.Count(c => c == '.') > 1)
    {
        var lastOcc = valStr.LastIndexOf('.');
        valStr = valStr.Substring(0, lastOcc).Replace(".", string.Empty) + valStr.Substring(lastOcc);
    }
    var unit = new string(str.Where(char.IsLetter).ToArray());
    var val = CoerceFloat(valStr);
    return GetBytes(unit, val);
}

public static long GetBytes(string unit, float value)
{
    unit = unit.Replace("i", "").ToLowerInvariant();
    if (unit.Contains("kb"))
        return BytesFromKB(value);
    if (unit.Contains("mb"))
        return BytesFromMB(value);
    if (unit.Contains("gb"))
        return BytesFromGB(value);
    if (unit.Contains("tb"))
        return BytesFromTB(value);
    return (long)value;
}
But for clarity and simplicity, {2 TB} or {6.9 GB} or {20 MB} or {400 KB} would most reliably get recognised.
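For readers who want to try candidate formats, here is a loose Python port of the snippet above. It is a sketch, not Jackett's code: it uses 1024-based units (consistent with the examples in the C# comments), Python floats instead of C# `float`, and a plain `try/except` in place of `CoerceFloat`, whose behaviour is not shown in this thread.

```python
def get_bytes(text):
    """Parse a human-readable size such as '{6.9 GB}' into a byte count."""
    # Keep only digits and separators, treating ',' like '.'.
    digits = "".join(c for c in text if c.isdigit() or c in ".,")
    digits = digits.replace(",", ".") if digits else "0"
    if digits.count(".") > 1:
        # Only the last separator is the decimal point; drop the others.
        last = digits.rfind(".")
        digits = digits[:last].replace(".", "") + digits[last:]
    unit = "".join(c for c in text if c.isalpha()).lower().replace("i", "")
    try:
        value = float(digits)
    except ValueError:  # assumption: unparsable input counts as 0
        value = 0.0
    for name, power in (("kb", 1), ("mb", 2), ("gb", 3), ("tb", 4)):
        if name in unit:
            return int(value * 1024 ** power)
    return int(value)


for sample in ("{2 TB}", "{20 MB}", "{400 KB}", "3,5GB", "-"):
    print(sample, get_bytes(sample))
```

Because the braces are neither digits nor letters, they are ignored by the parser, so the whole `{...}` token can be passed in unchanged.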
from jackett.
size extractor from title added to indexer.
from jackett.
v0.21.1672
from jackett.
Again, thank you for the very quick action.
I am preparing a thread on the MIRCrew forum to ask the releasers to use this formatting in the titles for size.
On the other hand, regarding the failure to report seeders and leechers, couldn't this be solved with the same method they use in the forum to report them? Basically the system contacts the first tracker in the magnet trackerlist and gets the related information from there, couldn't jackett do that too?
from jackett.
Again, we're not going to do 100 additional requests for every search to return this data.
from jackett.
Again, we're not going to do 100 additional requests for every search to return this data.
Sorry, I don't know how it works internally, so I'm probably saying something stupid.
But Jackett can now pick up the magnet without any problems, especially after the fix you made yesterday.
Once it has the magnet, couldn't it get the seeders and leechers directly from the magnet's tracker list without having to make 100 calls?
from jackett.
No.
from jackett.
Okay, I tried. So I think we will definitely have to do without the seeders and leechers.
Thank you all the same!
from jackett.
Good evening,
I proposed on mircrew to start putting the size of releases in curly brackets and they seem to have welcomed it, at least for future releases.
However, there is a releaser who asked if it would be possible, instead of looking for the size between the brackets, to look for it through a regex of this type:
(\d+[\.,]?\d+?\s*[KkMmGgTtBb][bB]?)
This is because so many existing releases have the size in square brackets. So we would get the size not only for future releases but also for past ones!
Moreover, with such a regex the size would be picked up however it is written, even if it were not in brackets.
Is this feasible for you?
Thanks!!!
from jackett.
We can make it case insensitive and then use the following to reduce any false positives to a minimum:
\[(1?\d{1,3}(?:[\.,]\d{1,2})?\s*[KMGTP]?i?B)\]
This would mean that [1009,11 GiB] and [57kb] would be detected, but something like 2024b would not.
from jackett.
I suggest (1?\d{1,3}(?:[\.,]\d{1,2})?\s*[KMGTP]?i?B)[^\w] so that [1009,11 GiB] [57kb] (43Kb) 6.9Tb would be detected but 320kbps would not.
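The two patterns proposed in the last two comments can be compared on the examples given in the thread. This is a quick check using Python's `re` module with case-insensitive matching, which is an assumption about how the indexer applies them; the surrounding title text in the samples is made up.

```python
import re

# Square-bracket-only pattern from the earlier comment.
BRACKETED = re.compile(
    r"\[(1?\d{1,3}(?:[\.,]\d{1,2})?\s*[KMGTP]?i?B)\]", re.IGNORECASE
)
# Pattern that only requires a non-word character after the unit.
LOOSE = re.compile(
    r"(1?\d{1,3}(?:[\.,]\d{1,2})?\s*[KMGTP]?i?B)[^\w]", re.IGNORECASE
)


def find_size(pattern, title):
    """Return the captured size text, or None when the pattern does not match."""
    match = pattern.search(title)
    return match.group(1) if match else None


for sample in ("movie [1009,11 GiB] ita", "movie (43Kb) ita",
               "movie 6.9Tb ita", "movie 320kbps ita"):
    print(sample, "|", find_size(BRACKETED, sample), "|", find_size(LOOSE, sample))
```

Note that the second pattern needs some character after the unit, so a size at the very end of a title would not match as written.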
from jackett.
Good morning and thank you also for this update with Regex. We have communicated on the MIRCrew forum that they can now use brackets as so many were doing previously.
The issue of TV series has now come up. On the forum it often happens that a single thread contains one magnet for each episode of the season.
Therefore, would it be possible (for Sonarr only) to make it so that when the download button is clicked, not only the first [magnet] in the thread is picked up but all of them? Would it create some kind of problem? I read in your tracker information that this is possible but would have to be coded in C#. Could you take care of this?
Thank you
from jackett.
Not possible, in YAML or C#, without creating the aforementioned massive amount of additional traffic - #14543 (comment)
from jackett.
Not possible, in YAML or C#, without creating the aforementioned massive amount of additional traffic - #14543 (comment)
Why should it generate massive traffic? I just ask that when you click download it doesn't just take the first magnet but all the ones in the thread. The action would be triggered only when the download is pressed so not for every search done.
from jackett.
We present a single torrent/magnet for each result. If there are multiple torrents/magnets then we would present them as separate results, e.g. per episode/season/quality.
To do that in this case, we'd need to fetch every results page, click thanks on them all, refresh, and return all the magnets.
from jackett.
We present a single torrent/magnet for each result. If there are multiple torrents/magnets then we would present them as separate results, e.g. per episode/season/quality.
To do that in this case, we'd need to fetch every results page, click thanks on them all, refresh, and return all the magnets.
Of course, it is clear that this is not possible, and I understand that it would generate a lot of traffic.
What I was proposing instead is this: since mircrew is a forum and a thread is created for each season, the thread title (and thus also the result in Sonarr) indicates the entire season. On clicking download, all the magnets contained in the thread would be sent for download, not just the first one (as happens now). Only if this is possible and acceptable to you, of course.
from jackett.
This isn't possible, no.
from jackett.
The reason this is not possible, is because, as you may know, Jackett abides by the Torznab specifications with regards to the format of the queries and the responses.
The specs allow only for a single download link for every title returned from a query request.
So to provide a title with multiple download links would contravene the standards, and we are not going down that rabbit hole.
from jackett.
The reason this is not possible, is because, as you may know, Jackett abides by the Torznab specifications with regards to the format of the queries and the responses. The specs allow only for a single download link for every title returned from a query request. So to provide a title with multiple download links would contravene the standards, and we are not going down that rabbit hole.
Ok now the reason is more clear. Thank you for the explanation.
from jackett.
Hi guys, first of all thanks for all the work you are doing.
I take this opportunity to ask another thing: is it possible to bypass the "unknown movie" error?
Because on MIRCrew many releases have the double title [ITA_NAME - ENG_NAME (YYYY)] which is not recognized by Radarr. As a result, a good 50% of releases would not be taken into consideration.
Thanks a lot
from jackett.
is it possible to bypass the "unknown movie" error?
This is a limitation of Radarr which wants to match the full title.
Unfortunately I don't see a way around this, because the mircrew web site does not support tmdbid or imdbid searching, which is the usual way to bypass the title and work off the id.
Some multi-lingual web sites have a title in the primary language and an alternate title with the other language.
But since mircrew only provides the topic title in search results, there is no reliable way for the indexer to strip the English title (or the Italian title, for that matter) and return a shortened single-language title which Radarr could match.
from jackett.
However, if the releasers were to add the year to the Italian side of the title then Radarr would probably match.
eg. il miglio verde (1999) - the green mile (1999) 1080p x265 ITA ENG {4.0GB} -group
from jackett.
We could add an indexer setting that would strip out the second part of the title. A re_replace along the lines of:
^(.+ )- (?!Stagion[ei] \d|Special \((?:19|20)\d{2}|Miniserie).+? (\((?:19|20)\d{2})
It would definitely need to be disabled by default, as no matter how much I tweak that it's going to have false positives.
from jackett.
Hmm. Would tagging whether it's a movie or not reduce the candidates and thus the false hits?
Something like:
_is_movie:
  selector: a[href^="./viewforum.php?f="]
  attribute: href
  case:
    a[href*="f=25"]: yes
    a[href*="f=26"]: yes
    a[href*="f=34"]: yes
    a[href*="f=36"]: yes
    "*": no
?
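The intent of that `case` block, expressed as a small sketch: it treats the four forum ids listed above as movie categories and everything else as not a movie. The function name is hypothetical, and unlike the substring selectors it compares the `f` query parameter exactly.

```python
from urllib.parse import parse_qs, urlparse

# Forum ids taken from the case block above.
MOVIE_FORUM_IDS = {"25", "26", "34", "36"}


def is_movie(forum_href):
    """True when the forum link points at one of the movie forums."""
    forum_ids = parse_qs(urlparse(forum_href).query).get("f", [])
    return any(forum_id in MOVIE_FORUM_IDS for forum_id in forum_ids)


print(is_movie("./viewforum.php?f=26"))
print(is_movie("./viewforum.php?f=51"))
```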
from jackett.
Yea, that would definitely help if we only want to do it for movies. There were some examples of TV series with dual language titles (e.g. His Dark Materials, Percy Jackson and the Olympians, Servant of the People); they were few and far between, but are there.
I ended up with this:
^(.+ )- (?!Stagion[ei] \d|Special \((?:19|20)\d{2}|Miniserie).+? ((?:- Stagion[ei] \d+(?:-\d+)? )?\((?:19|20)\d{2}|\[COMPLETA|\[IN CORSO|\[IN PAUSA|\[INCOMPLETA)
but even that had false positives, and it breaks a lot of stuff outside of Film and Serie TV categories (though I suppose we could also add _is_serie_tv).
I'd lean towards just movies for now, and if we get reports of issues with 'unknown series' in Sonarr then we can revisit. What do you think?
Just movies:
^(.+ )- .+? (\((?:19|20)\d{2})
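To see what the movies-only pattern does to a dual-language title, here is a small sketch. The replacement string (first title followed by the year group) is an assumption, since the thread only gives the pattern; the function name and the sample title are illustrative.

```python
import re

# Movies-only pattern from the comment above.
DUAL_TITLE = re.compile(r"^(.+ )- .+? (\((?:19|20)\d{2})")


def strip_second_title(title, is_movie):
    """Drop the second-language title from a movie release name."""
    if not is_movie:
        return title
    # Assumed replacement: keep group 1 (first title) and group 2 (the year).
    return DUAL_TITLE.sub(r"\1\2", title, count=1)


print(strip_second_title(
    "Il miglio verde - The Green Mile (1999) 1080p x265 ITA ENG", True))
```

Titles without a " - " separator are left untouched, and non-movie results are skipped entirely.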
from jackett.
v0.21.1700
from jackett.
I just tested the release and confirm that it now works. I also tried downloading and it goes without any problems. Thanks !
This is the title in Radarr
and this is the title on MIRCrew
from jackett.