
Comments (50)

GhbSmwc commented on June 22, 2024

Also, I would prefer allowing quotes to surround the URL:

java -jar InternetWaybackMachine.jar "www.example url with spaces.com"

because find-and-replace in a text editor can be difficult, and errors can be caused not just by spaces but also by plus signs and other “reserved characters”.

from internetwaybackmachine.

kasramp commented on June 22, 2024

Thanks, @GhbSmwc. I'll fix it. Or even better you can make a PR :-)

GhbSmwc commented on June 22, 2024

Looks like I found a solution for the command prompt and batch files: %20 is the encoded space character, and in a batch file you double the percent sign (%%20). Credit to these Stack Overflow posts, which can help: https://stackoverflow.com/questions/22964857/space-to-url-in-a-batch-file
https://stackoverflow.com/questions/14509652/what-is-the-difference-between-and-in-a-cmd-file

This command worked:

java -jar InternetWaybackMachine.jar http://www.antialias.se/chiptunes/mod/DOH%%20-%%20Bob%%20le%%20cintre.mod >>OutputLog.txt
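The encoded command above can also be generated rather than typed by hand. The following is a minimal sketch using a hypothetical helper class that is not part of InternetWaybackMachine. The multi-argument constructor of java.net.URI encodes the spaces in the path as %20, and every % is then doubled because a .bat file treats % specially, as described in the Stack Overflow posts above.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper (not part of InternetWaybackMachine): builds a batch-file line
// for a URL whose path contains spaces.
class BatchUrlEscaper {

    // The multi-argument URI constructor quotes characters that are illegal in a path,
    // so each space becomes %20.
    static String encode(String scheme, String host, String path) {
        try {
            return new URI(scheme, host, path, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Cannot build a URI from " + path, e);
        }
    }

    // Inside a .bat/.cmd file, a literal % must be written as %%.
    static String escapeForBatch(String encodedUrl) {
        return encodedUrl.replace("%", "%%");
    }

    public static void main(String[] args) {
        String url = encode("http", "www.antialias.se", "/chiptunes/mod/DOH - Bob le cintre.mod");
        System.out.println("java -jar InternetWaybackMachine.jar "
                + escapeForBatch(url) + " >>OutputLog.txt");
    }
}
```

Running main prints the same command as the one shown above.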

Oh, and as a side note: when saving hundreds of URLs from certain sites, keep in mind that there is roughly a 1-3% chance that a save will fail. This is not because the URL was excluded. The site being saved may think it is under a DDoS attack (this happens on Twitter when saving thousands of links and images), or the Internet Archive may be under maintenance. So I thought it would be a good idea to put >>Output.txt at the end of each line, so that failed links can be found and re-saved. I wrote a tutorial on this: https://ia801400.us.archive.org/1/items/HowToSaveTwitter/HowToSaveTonsOfTweets.html

GhbSmwc commented on June 22, 2024

You may want to look at how certain characters are encoded (for example, the percent-encoding of Japanese characters):

https://s3-ap-northeast-1.amazonaws.com/uchinoko/chara_images/pictures/000/065/510/original/onedraw_estica_kansei_%E3%81%AE%E3%82%B3%E3%83%94%E3%83%BC.png?1445384478
Which decodes to:
https://s3-ap-northeast-1.amazonaws.com/uchinoko/chara_images/pictures/000/065/510/original/onedraw_estica_kansei_のコピー.png?1445384478

This fails, even if the batch file starts with chcp 65001. On Tumblr, it seems to work (both with and without chcp 65001):

https://yama252527.tumblr.com/post/184525594848/%E3%83%8A%E3%83%83%E3%83%88%E3%83%AC%E3%82%A4-%E3%83%80%E3%82%B8%E3%83%A3%E3%82%B9-183-22%E6%AD%B3
Which decodes to:
https://yama252527.tumblr.com/post/184525594848/ナットレイ-ダジャス-183-22歳

I don't know whether the difference in how the URL is interpreted comes from the website alone (the site redirects you, not the browser) or from how the Internet Archive handles it. I did a manual save (visited https://archive.org/web/ in Google Chrome and entered the URL), and it all worked. NOTE: if you copy the URL from your address bar, it will be percent-encoded when pasted. This does not happen if you copy only some of its characters (say, all but the first character of the URL: ttps://example.com). Another possible cause is that the tool doesn't use UTF-8, which would make it incompatible with chcp 65001. For more on character encodings, see here and look for “UTF-8”; hopefully Wikipedia lists all the possible URL encodings.
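You can check that the raw and percent-encoded spellings name the same resource by decoding both and comparing the results. Here is a small Java sketch that does this, assuming the escapes are UTF-8 (which is what java.net.URI uses when decoding). The class name is hypothetical and not part of the tool. The Japanese suffix is written as Unicode escapes so the source file stays ASCII-only.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper: decodes the %XX escapes in a URL's path (as UTF-8) so that a
// percent-encoded URL can be compared with its raw spelling.
class JapanesePathCodec {

    static String decodedPath(String url) {
        try {
            return new URI(url).getPath(); // getPath() returns the decoded path
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Not a valid URI: " + url, e);
        }
    }

    public static void main(String[] args) {
        String encoded = "https://s3-ap-northeast-1.amazonaws.com/uchinoko/chara_images/pictures/000/065/510/original/onedraw_estica_kansei_%E3%81%AE%E3%82%B3%E3%83%94%E3%83%BC.png?1445384478";
        // Unicode escapes for the Japanese suffix keep this source file ASCII-only.
        String raw = "https://s3-ap-northeast-1.amazonaws.com/uchinoko/chara_images/pictures/000/065/510/original/onedraw_estica_kansei_\u306E\u30B3\u30D4\u30FC.png?1445384478";
        System.out.println(decodedPath(encoded).equals(decodedPath(raw))); // prints true
    }
}
```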

GhbSmwc commented on June 22, 2024

I did some further testing (this is the entire batch file):

chcp 65001
del "OutputLog1.txt"
java -jar InternetWaybackMachine.jar https://s3-ap-northeast-1.amazonaws.com/uchinoko/charas/avatars/000/011/795/thumb/zenshin_lusia_のコピー.png?1455468784 >>OutputLog1.txt
java -jar InternetWaybackMachine.jar https://s3-ap-northeast-1.amazonaws.com/uchinoko/charas/avatars/000/011/797/thumb/zenshin_trent_のコピー.png?1455548054 >>OutputLog1.txt
java -jar InternetWaybackMachine.jar https://s3-ap-northeast-1.amazonaws.com/uchinoko/charas/avatars/000/011/798/thumb/zenshin_shild_のコピー.png?1455548205 >>OutputLog1.txt
java -jar InternetWaybackMachine.jar https://s3-ap-northeast-1.amazonaws.com/uchinoko/charas/avatars/000/011/799/thumb/zenshin_fade_のコピー.png?1455630328 >>OutputLog1.txt
java -jar InternetWaybackMachine.jar https://s3-ap-northeast-1.amazonaws.com/uchinoko/charas/avatars/000/011/800/thumb/zenshin_flugel_のコピー.png?1455634546 >>OutputLog1.txt
java -jar InternetWaybackMachine.jar https://s3-ap-northeast-1.amazonaws.com/uchinoko/charas/avatars/000/011/795/original/zenshin_lusia_のコピー.png?1455468784 >>OutputLog1.txt
java -jar InternetWaybackMachine.jar https://s3-ap-northeast-1.amazonaws.com/uchinoko/charas/avatars/000/011/797/original/zenshin_trent_のコピー.png?1455548054 >>OutputLog1.txt
java -jar InternetWaybackMachine.jar https://s3-ap-northeast-1.amazonaws.com/uchinoko/charas/avatars/000/011/798/original/zenshin_shild_のコピー.png?1455548205 >>OutputLog1.txt
java -jar InternetWaybackMachine.jar https://s3-ap-northeast-1.amazonaws.com/uchinoko/charas/avatars/000/011/799/original/zenshin_fade_のコピー.png?1455630328 >>OutputLog1.txt
java -jar InternetWaybackMachine.jar https://s3-ap-northeast-1.amazonaws.com/uchinoko/charas/avatars/000/011/800/original/zenshin_flugel_のコピー.png?1455634546 >>OutputLog1.txt
pause

This also results in an error.

I converted the percent-encoded URLs back to their original strings with this browser extension: https://chrome.google.com/webstore/detail/url-decode-encode/dgoepmkoiphgabefpbapldnjmbbiaoag

If you manually enter those into the address bar, they work. I'll try every chcp value to see whether any other code page works: https://docs.microsoft.com/en-us/windows-server/administration/windows-commands/chcp and https://docs.microsoft.com/en-us/windows/win32/intl/code-page-identifiers

kasramp commented on June 22, 2024

Hi @GhbSmwc. Instead of bug fixing, I have rewritten the entire program. You can try it. It should resolve the problem.

kasramp commented on June 22, 2024

Read the README file for instructions.

GhbSmwc commented on June 22, 2024

Uh, I'm using the latest version of Java, and when I run the jar file this happens:

C:\Users\RedBro\Desktop\InternetWayBackMachine-master\InternetWayBackMachine-master\archiver\InternetWayBackMachine-master_2019-08-04>java -jar internet-way-back-machine-1.0.jar
Exception in thread "main" java.lang.UnsupportedClassVersionError: com/madadipouya/internet/waybackmachine/InternetWaybackMachine has been compiled by a more recent version of the Java Runtime (class file version 55.0), this version of the Java Runtime only recognizes class file versions up to 52.0
        at java.lang.ClassLoader.defineClass1(Native Method)
        at java.lang.ClassLoader.defineClass(Unknown Source)
        at java.security.SecureClassLoader.defineClass(Unknown Source)
        at java.net.URLClassLoader.defineClass(Unknown Source)
        at java.net.URLClassLoader.access$100(Unknown Source)
        at java.net.URLClassLoader$1.run(Unknown Source)
        at java.net.URLClassLoader$1.run(Unknown Source)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(Unknown Source)
        at java.lang.ClassLoader.loadClass(Unknown Source)
        at org.springframework.boot.loader.LaunchedURLClassLoader.loadClass(LaunchedURLClassLoader.java:92)
        at java.lang.ClassLoader.loadClass(Unknown Source)
        at org.springframework.boot.loader.MainMethodRunner.run(MainMethodRunner.java:45)
        at org.springframework.boot.loader.Launcher.launch(Launcher.java:86)
        at org.springframework.boot.loader.Launcher.launch(Launcher.java:50)
        at org.springframework.boot.loader.JarLauncher.main(JarLauncher.java:51)

kasramp commented on June 22, 2024

@GhbSmwc - are you using JDK 11 or 12? I think you are running Java 8 (class file version 52). I compiled it with JDK 11 or newer.
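The numbers in the error map directly to Java releases. Here is a minimal sketch, using a hypothetical helper class that is not part of the project, that reads the major version from a class file header. For Java 5 and later, the release number is the major version minus 44, so 52 is Java 8 and 55 is Java 11.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical helper: reads the class file version that the
// UnsupportedClassVersionError message refers to.
class ClassFileVersion {

    // A .class file starts with the magic number 0xCAFEBABE, then minor_version (u2)
    // and major_version (u2), all big-endian.
    static int majorVersion(byte[] classFile) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(classFile))) {
            if (in.readInt() != 0xCAFEBABE) {
                throw new IllegalArgumentException("Not a class file");
            }
            in.readUnsignedShort(); // minor_version, not needed here
            return in.readUnsignedShort();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // For Java 5 and later: release = major_version - 44 (52 -> 8, 55 -> 11).
    static int javaRelease(int majorVersion) {
        return majorVersion - 44;
    }

    public static void main(String[] args) {
        System.out.println("This JVM implements Java " + System.getProperty("java.specification.version"));
    }
}
```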

GhbSmwc commented on June 22, 2024

[screenshot]
I don't know whether JDK is the version number of Java or one of its components.

kasramp commented on June 22, 2024

I see. That's JDK 8. Let me see what I can do.

GhbSmwc commented on June 22, 2024

thanks!

kasramp commented on June 22, 2024

@GhbSmwc - give another try with https://github.com/kasramp/InternetWayBackMachine/releases/tag/internet-way-back-machine-1.1

GhbSmwc commented on June 22, 2024

Um, it did work, but I'd prefer an automated script that saves the URLs in bulk. I would like to be able to run a single batch file that saves each link automatically, like I did earlier:

chcp 65001
del "OutputLog1.txt"
java -jar InternetWaybackMachine.jar https://s3-ap-northeast-1.amazonaws.com/uchinoko/charas/avatars/000/011/795/thumb/zenshin_lusia_のコピー.png?1455468784 >>OutputLog1.txt
java -jar InternetWaybackMachine.jar https://s3-ap-northeast-1.amazonaws.com/uchinoko/charas/avatars/000/011/797/thumb/zenshin_trent_のコピー.png?1455548054 >>OutputLog1.txt
java -jar InternetWaybackMachine.jar https://s3-ap-northeast-1.amazonaws.com/uchinoko/charas/avatars/000/011/798/thumb/zenshin_shild_のコピー.png?1455548205 >>OutputLog1.txt
java -jar InternetWaybackMachine.jar https://s3-ap-northeast-1.amazonaws.com/uchinoko/charas/avatars/000/011/799/thumb/zenshin_fade_のコピー.png?1455630328 >>OutputLog1.txt
java -jar InternetWaybackMachine.jar https://s3-ap-northeast-1.amazonaws.com/uchinoko/charas/avatars/000/011/800/thumb/zenshin_flugel_のコピー.png?1455634546 >>OutputLog1.txt
java -jar InternetWaybackMachine.jar https://s3-ap-northeast-1.amazonaws.com/uchinoko/charas/avatars/000/011/795/original/zenshin_lusia_のコピー.png?1455468784 >>OutputLog1.txt
java -jar InternetWaybackMachine.jar https://s3-ap-northeast-1.amazonaws.com/uchinoko/charas/avatars/000/011/797/original/zenshin_trent_のコピー.png?1455548054 >>OutputLog1.txt
java -jar InternetWaybackMachine.jar https://s3-ap-northeast-1.amazonaws.com/uchinoko/charas/avatars/000/011/798/original/zenshin_shild_のコピー.png?1455548205 >>OutputLog1.txt
java -jar InternetWaybackMachine.jar https://s3-ap-northeast-1.amazonaws.com/uchinoko/charas/avatars/000/011/799/original/zenshin_fade_のコピー.png?1455630328 >>OutputLog1.txt
java -jar InternetWaybackMachine.jar https://s3-ap-northeast-1.amazonaws.com/uchinoko/charas/avatars/000/011/800/original/zenshin_flugel_のコピー.png?1455634546 >>OutputLog1.txt
pause

I'd also like to be able to list the URLs as a column (one URL per line) rather than just a row, since the link extractor tools I use usually output them that way. I did some more testing, and special characters still break:

java -jar internet-way-back-machine-1.1.jar

I only run a batch file containing that command. Testing the URLs:
[screenshot]
URLs with spaces and percent-encoding still error out, and so do Japanese ones:
[screenshot]

GhbSmwc commented on June 22, 2024

By the way, in case you don't know how I have been saving in bulk until now (my preferred method), this link describes it: https://ia801400.us.archive.org/1/items/HowToSaveTwitter/HowToSaveTonsOfTweets.html. I wouldn't want to enter each link manually instead of automating it. I also want to be able to write the output to an external text file using >>OutputLog.txt, since there is a random chance that a site may fail to save, as happens on Twitter.
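For the bulk workflow described above, here is a minimal sketch of a Java driver. The class name BulkSaver is hypothetical; the jar and log names are the ones used in this thread. It reads one URL per line from a UTF-8 text file and skips blank lines, including a trailing empty line. It also drops a possible byte-order mark. It then runs the jar once per URL and appends the output to OutputLog.txt. Because each URL gets its own process, an exception on one URL does not stop the later ones. Passing the URL as a separate process argument also avoids the %% doubling needed in .bat files. On Windows, however, this likely does not fix the Japanese-character problem discussed later in the thread, since the child JVM still decodes its arguments using the system code page.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical driver (not part of InternetWaybackMachine): does what the batch file does,
// but reads one URL per line from a text file.
class BulkSaver {

    // Drops a UTF-8 byte-order mark (some editors add one), trims each line and skips blank
    // lines. Spaces inside a URL are preserved.
    static List<String> parseUrlList(List<String> lines) {
        List<String> urls = new ArrayList<>();
        for (String line : lines) {
            String url = line.replace("\uFEFF", "").trim();
            if (!url.isEmpty()) {
                urls.add(url);
            }
        }
        return urls;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // Assumed usage: java BulkSaver urls.txt   (the file must be saved as UTF-8)
        List<String> urls = parseUrlList(Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8));
        File log = new File("OutputLog.txt");
        for (String url : urls) {
            // One JVM per URL, so a crash on one URL does not halt the rest of the list.
            ProcessBuilder pb = new ProcessBuilder("java", "-jar", "InternetWaybackMachine.jar", url);
            pb.redirectErrorStream(true); // unlike >>, this also captures stderr
            pb.redirectOutput(ProcessBuilder.Redirect.appendTo(log));
            int exit = pb.start().waitFor();
            System.out.println(url + " -> exit code " + exit);
        }
    }
}
```

Assumed usage: compile with javac BulkSaver.java, then run java BulkSaver urls.txt from the folder containing InternetWaybackMachine.jar.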

kasramp commented on June 22, 2024

@GhbSmwc - I see. Have you tried the save-file command? You can pass it a file containing URLs.

kasramp commented on June 22, 2024

OK, I see what the problem is. I have to look into it more deeply; it doesn't seem that straightforward.

kasramp commented on June 22, 2024

Please provide more samples (raw, NOT encoded) of the URLs you have trouble with, and I'll test against each one.

GhbSmwc commented on June 22, 2024

Created a text file:

https://google.com
https://en.wikipedia.org
https://s3-ap-northeast-1.amazonaws.com/uchinoko/charas/avatars/000/011/795/thumb/zenshin_lusia_のコピー.png?1455468784
http://www.antialias.se/chiptunes/mod/4-MAT - 4-Mat's madness.mod

Result:
[screenshot]
The selected parts are the output for each line of the txt file containing the URLs. The one with spaces returns a 404, and the one with Japanese characters just prints the number 0.

GhbSmwc commented on June 22, 2024

You may want to monitor the requests (the URLs being sent to and from your device) and compare them with what happens when the URLs are entered manually (both percent-encoded and raw). When I test Tumblr URLs with Japanese characters (both percent-encoded and raw), they work:

https://yama252527.tumblr.com/post/185307526973/%E3%82%B9%E3%82%BA%E3%83%9F%E3%83%A4-a%E3%82%AD%E3%83%A5%E3%82%A6%E3%82%B3%E3%83%B3
https://yama252527.tumblr.com/post/185307526973/スズミヤ-aキュウコン

[screenshot]

GhbSmwc commented on June 22, 2024

I tried manually entering these URLs in my browser to save the pages (without using the script at all):

https://web.archive.org/save/https://yama252527.tumblr.com/post/185307526973/%E3%82%B9%E3%82%BA%E3%83%9F%E3%83%A4-a%E3%82%AD%E3%83%A5%E3%82%A6%E3%82%B3%E3%83%B3
https://web.archive.org/save/https://yama252527.tumblr.com/post/185307526973/スズミヤ-aキュウコン
https://web.archive.org/save/https://s3-ap-northeast-1.amazonaws.com/uchinoko/charas/avatars/000/011/795/thumb/zenshin_lusia_のコピー.png?1455468784
https://web.archive.org/save/https://s3-ap-northeast-1.amazonaws.com/uchinoko/charas/avatars/000/011/795/thumb/zenshin_lusia_%E3%81%AE%E3%82%B3%E3%83%92%E3%82%9A%E3%83%BC.png?1455468784

And ALL of them worked successfully.
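The same kind of save URL can be built programmatically. Normalizing the target to ASCII first makes the raw and percent-encoded spellings come out identical. The following is a minimal Java sketch with a hypothetical class name; the https://web.archive.org/save/ prefix is the one typed into the browser above. java.net.URI.toASCIIString() percent-encodes non-ASCII characters as UTF-8 and leaves existing %XX escapes alone. URLs containing raw spaces must be encoded before being passed in, because the single-argument URI constructor rejects spaces.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper: builds a "save" URL like the ones typed into the browser above,
// normalizing the target URL to ASCII first.
class SaveUrlBuilder {

    static final String SAVE_PREFIX = "https://web.archive.org/save/";

    // Raw non-ASCII characters are percent-encoded (UTF-8); existing %XX escapes are kept,
    // so the raw and the already-encoded spelling give the same result.
    static String saveUrl(String target) {
        try {
            return SAVE_PREFIX + new URI(target).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Not a valid URI (encode spaces first): " + target, e);
        }
    }
}
```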

GhbSmwc commented on June 22, 2024

Oh, and the reason I want the output in a txt file is that the command prompt deletes the oldest lines in the console window once the buffer size is exceeded. So with a really, REALLY big list, even with the largest buffer setting, the first n URLs are no longer displayed. Having this output file is REALLY important when I'm AFK and random failures happen (not just on Twitter: saves also fail when the IA goes into maintenance).

I really like this tool because it automates saving, so I don't have to save the links manually over and over.

GhbSmwc commented on June 22, 2024

Found a major glitch when saving this list of links (it contains no encoded characters):
U.txt
^Download that file.

Now try making it read that file:
[screenshot]

GhbSmwc commented on June 22, 2024

I tested with a single link and the error still happens. Strangely, the error differs depending on whether or not the file ends with an empty last line:
[screenshot]
I tested this link: https://www.eff.org/deeplinks/2019/07/doj-and-fbi-show-no-signs-correcting-past-untruths-their-new-attacks-encryption and it worked.

I retested the link https://s3-ap-northeast-1.amazonaws.com/uchinoko/chara_images/pictures/000/326/328/medium/57d027867d8e17d46c3e365275effc34.png?1529667058, but this time with save "https://s3-ap-northeast-1.amazonaws.com/uchinoko/chara_images/pictures/000/326/328/medium/57d027867d8e17d46c3e365275effc34.png?1529667058" typed in the console window instead of an external txt file. It errors out the same way as with the text file. The URL contains no non-ASCII characters. I looked at the stack trace to see what's going on:

java.lang.ArrayIndexOutOfBoundsException: 0
        at com.madadipouya.internet.waybackmachine.service.impl.DefaultInternetArchiveService.lambda$submit$0(DefaultInternetArchiveService.java:45)
        at java.util.Optional.ifPresent(Unknown Source)
        at com.madadipouya.internet.waybackmachine.service.impl.DefaultInternetArchiveService.submit(DefaultInternetArchiveService.java:38)
        at com.madadipouya.internet.waybackmachine.commands.SubmitCommand.submitUrl(SubmitCommand.java:35)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
        at java.lang.reflect.Method.invoke(Unknown Source)
        at org.springframework.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:282)
        at org.springframework.shell.Shell.evaluate(Shell.java:180)
        at org.springframework.shell.Shell.run(Shell.java:142)
        at org.springframework.shell.jline.InteractiveShellApplicationRunner.run(InteractiveShellApplicationRunner.java:84)
        at org.springframework.boot.SpringApplication.callRunner(SpringApplication.java:770)
        at org.springframework.boot.SpringApplication.callRunners(SpringApplication.java:760)
        at org.springframework.boot.SpringApplication.run(SpringApplication.java:318)
        at org.springframework.boot.SpringApplication.run(SpringApplication.java:1213)
        at org.springframework.boot.SpringApplication.run(SpringApplication.java:1202)
        at com.madadipouya.internet.waybackmachine.InternetWaybackMachine.main(InternetWaybackMachine.java:11)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
        at java.lang.reflect.Method.invoke(Unknown Source)
        at org.springframework.boot.loader.MainMethodRunner.run(MainMethodRunner.java:47)
        at org.springframework.boot.loader.Launcher.launch(Launcher.java:86)
        at org.springframework.boot.loader.Launcher.launch(Launcher.java:50)
        at org.springframework.boot.loader.JarLauncher.main(JarLauncher.java:51)

Hmmm, java.lang.ArrayIndexOutOfBoundsException: 0 means the code tried to read element 0 of an empty array. So it seems that either reading the URLs into the array goes wrong, or the index used to pick a URL from the list points beyond its end.
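This example is for illustration only, since line 45 of DefaultInternetArchiveService is not shown in this thread. The hypothetical Java sketch below reproduces the same kind of exception with a split that yields an empty array, and shows a guard that avoids it. Whether the tool actually splits a string at that line is an assumption.

```java
import java.util.Optional;

// Hypothetical illustration: the tool's real code at DefaultInternetArchiveService.java:45
// is not shown in this thread, so this only reproduces the same kind of exception.
class SafeIndex {

    // Returns element 0 only if the array has one, instead of indexing [0] unconditionally.
    static Optional<String> first(String[] parts) {
        return parts.length > 0 ? Optional.of(parts[0]) : Optional.empty();
    }

    public static void main(String[] args) {
        // Splitting "," on "," yields only empty strings, and split() drops trailing
        // empty strings, so the result has length 0.
        String[] parts = ",".split(",");
        System.out.println(parts.length);             // 0
        System.out.println(first(parts).isPresent()); // false
        try {
            System.out.println(parts[0]);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("Unguarded access fails: " + e);
        }
    }
}
```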

GhbSmwc commented on June 22, 2024

I did some more URL testing on SMW Central file bins (this one is mine), and they have similar character-encoding issues:

https://bin.smwcentral.net/u/18802/11.-%2BSpitz%2B-%2BRobinson.txt
https://bin.smwcentral.net/u/18802/11.-+Spitz+-+Robinson.txt
https://bin.smwcentral.net/u/18802/11.- Spitz - Robinson.txt
https://bin.smwcentral.net/u/18802/11.-%20Spitz%20-%20Robinson.txt

Now, here is some quick explanation:

[space] -> %20
+ (plus sign) -> %2B

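According to w3schools (https://www.w3schools.com/tags/ref_urlencode.asp), a space can be represented in two ways: as + and as %20. Note that + only stands for a space in form-encoded data such as query strings. In a URL path it is a literal plus sign, and its encoded form is %2B (as in the first URL above).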

The Internet Archive may get confused about how to interpret this correctly. It's important to note that if you enter a percent-encoded URL in the top search bar to view the saved version:
[screenshot]
it gets translated to its raw form:
[screenshot]

Viewing the saved pages yields different results:
https://web.archive.org/web/20190705193557/https://bin.smwcentral.net/u/18802/11.-+Spitz+-+Robinson.txt results in a “file not found”
https://web.archive.org/web/20190804230931/https://bin.smwcentral.net/u/18802/11.-%20Spitz%20-%20Robinson.txt gives you the actual file.
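The difference between these two snapshots matches the two decoding rules. Here is a minimal Java sketch, with a hypothetical class name, that compares form decoding (where + means a space) with path decoding (where + stays a plus sign and only %XX escapes are decoded). It only illustrates the rules; it does not show how the Wayback Machine itself decodes URLs.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLDecoder;

// Hypothetical comparison of the two decoding rules discussed above.
class PlusVsPercent20 {

    // Form decoding (query strings, HTML forms): '+' means space.
    static String formDecode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    // Path decoding: '+' is a literal plus sign; only %XX escapes are decoded.
    static String pathDecode(String url) {
        try {
            return new URI(url).getPath();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String base = "https://bin.smwcentral.net/u/18802/";
        System.out.println(pathDecode(base + "11.-+Spitz+-+Robinson.txt"));         // plus signs kept
        System.out.println(pathDecode(base + "11.-%20Spitz%20-%20Robinson.txt"));   // real spaces
        System.out.println(formDecode("11.-+Spitz+-+Robinson.txt"));               // spaces
    }
}
```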

GhbSmwc commented on June 22, 2024

Testing another Japanese Tumblr URL, both work:
U.txt
U2.txt
(I ran them separately because when one URL makes the program itself throw an error, it stops the later URLs in the sequence from being saved. I don't mean a link that fails to save with a 404; I mean an error saying “Details of the error have been omitted.”)

EDIT: The reason this works is that Tumblr REDIRECTS you to the correct URL. You can put random stuff after https://yama252527.tumblr.com/post/177979176558/<random stuff here, or just end the string here> and it will redirect you to https://yama252527.tumblr.com/post/177979176558/ドクロッグ-グロリア-23歳-180-みんなのアイドル-グロリアでーす . So if Tumblr didn't do that, it would theoretically error out the same way as https://s3-ap-northeast-1.amazonaws.com . I have to look for another Japanese site that has Japanese characters in its URLs and does not redirect to the correct location. Such sites are rare, because most sites use ID numbers or base64 strings instead of names and avoid percent-encoding.

I found a site that deals with non-ASCII characters: https://www.avekt.com/en-us/WebApps/UrlEncodeJapanese

GhbSmwc commented on June 22, 2024

I almost forgot: saving links that have no Japanese characters (for example, https://s3-ap-northeast-1.amazonaws.com/uchinoko/chara_images/pictures/000/326/328/medium/57d027867d8e17d46c3e365275effc34.png?1529667058 ) with the older version of the archive script does not error out and works correctly.

GhbSmwc commented on June 22, 2024

In case you need to refer to the older version and you have deleted it, I have a backup:
OldArchiveScript.zip

kasramp commented on June 22, 2024

@GhbSmwc - thanks. I have the old version. I'll try one last fix on it and see whether it works.

kasramp commented on June 22, 2024

@GhbSmwc - I've fixed the old version. You can download it from https://github.com/kasramp/InternetWayBackMachine/releases/tag/2019-08-07
I tested with the following URLs,

kasramp commented on June 22, 2024

By the way, how did you manage to host this page: https://ia801400.us.archive.org/1/items/HowToSaveTwitter/HowToSaveTonsOfTweets.html
I mean, is it a snapshot of a site or something else?

GhbSmwc commented on June 22, 2024

I used the file uploading service feature of the IA: https://archive.org/details/HowToSaveTwitter

GhbSmwc commented on June 22, 2024

Still happens:
[screenshot]

chcp 65001
del "OutputLog.txt"
java -jar InternetWaybackMachine.jar https://s3-ap-northeast-1.amazonaws.com/uchinoko/chara_images/pictures/000/065/514/original/onedraw_ibe_のコピー.png?1445384858 >>OutputLog.txt
java -jar InternetWaybackMachine.jar https://s3-ap-northeast-1.amazonaws.com/uchinoko/chara_images/pictures/000/065/510/original/onedraw_estica_kansei_のコピー.png?1445384478 >>OutputLog.txt
pause

kasramp commented on June 22, 2024

I think your terminal does not support Unicode. Can you check whether it works if you put the URL in double quotes? I don't have a Windows machine to test on. In Bash it works fine.

GhbSmwc commented on June 22, 2024

Still doesn't work:

chcp 65001
del "OutputLog.txt"
java -jar InternetWaybackMachine.jar "https://s3-ap-northeast-1.amazonaws.com/uchinoko/chara_images/pictures/000/065/514/original/onedraw_ibe_のコピー.png?1445384858" >>OutputLog.txt
java -jar InternetWaybackMachine.jar "https://s3-ap-northeast-1.amazonaws.com/uchinoko/chara_images/pictures/000/065/510/original/onedraw_estica_kansei_のコピー.png?1445384478" >>OutputLog.txt
pause

and

chcp 65001
del "OutputLog.txt"
java -jar InternetWaybackMachine.jar ""https://s3-ap-northeast-1.amazonaws.com/uchinoko/chara_images/pictures/000/065/514/original/onedraw_ibe_のコピー.png?1445384858"" >>OutputLog.txt
java -jar InternetWaybackMachine.jar ""https://s3-ap-northeast-1.amazonaws.com/uchinoko/chara_images/pictures/000/065/510/original/onedraw_estica_kansei_のコピー.png?1445384478"" >>OutputLog.txt
pause

GhbSmwc commented on June 22, 2024

I've installed Ubuntu and Bash (thanks to this: https://itsfoss.com/install-bash-on-windows/ ). I would like to try this out in Bash to see whether it works on my PC, but I'm new to this and I don't know how to run the batch file with Bash.

GhbSmwc commented on June 22, 2024

Good news: I tested the URL that has spaces (no encoding), and it worked:

chcp 65001
del "OutputLog1.txt"
java -jar InternetWaybackMachine.jar "http://www.antialias.se/chiptunes/mod/4-MAT - 4-Mat's madness.mod" >>OutputLog1.txt

pause

(uses OutputLog1.txt as a backup)

However, percent-encoded Japanese still breaks, with and without quotes.

kasramp commented on June 22, 2024

Great. Regarding the Japanese characters issue, that's a well-known old incompatibility between the Windows Command Prompt and the Java command line:
https://stackoverflow.com/questions/388490/how-to-use-unicode-characters-in-windows-command-line/388500#388500
https://stackoverflow.com/questions/7660651/passing-command-line-unicode-argument-to-java-code
https://stackoverflow.com/questions/11927518/java-unicode-utf-8-and-windows-command-prompt
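To see what actually arrives in Java on a given machine, a hypothetical diagnostic class can print each argument's code points together with the JVM's encoding properties. file.encoding is a standard system property. sun.jnu.encoding is an undocumented JDK-internal property and may be absent on some JVMs. If the argument arrives intact, のコピー should print as U+306E U+30B3 U+30D4 U+30FC. A ? (U+003F) or other unexpected code points would point at the code-page conversion described in the links above.

```java
// Hypothetical diagnostic (not part of InternetWaybackMachine): run it with the same
// argument as the real jar to see which characters actually reach Java's main().
class ArgDiagnostics {

    // Lists each code point as U+XXXX, so a mangled character (for example '?', U+003F)
    // is easy to spot.
    static String describe(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("file.encoding    = " + System.getProperty("file.encoding"));
        System.out.println("sun.jnu.encoding = " + System.getProperty("sun.jnu.encoding"));
        for (String arg : args) {
            System.out.println(arg + " -> " + describe(arg));
        }
    }
}
```

Assumed usage from the same batch file, after compiling with javac ArgDiagnostics.java: java ArgDiagnostics "のコピー"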

kasramp commented on June 22, 2024

Either I add a feature to read a file containing URLs, as in the new version, or you can try changing the system locale to Japanese (as suggested in some replies at the links above) to see whether it works. But that wouldn't be a permanent fix, as it would break on other locales such as Korean.

GhbSmwc commented on June 22, 2024

So it isn't your fault. It's as if the command prompt is speaking English and Java is speaking another language, and the computer in between can't understand what it is trying to do. I'll test the system locale to see whether I can get the command prompt to speak Japanese with Java.

GhbSmwc commented on June 22, 2024

Changing the system locale still blows up:
[screenshot]
Adding chcp 932 at the top gives the same result.

GhbSmwc commented on June 22, 2024

I wondered whether changing the batch file's text encoding would work:
[screenshot]

chcp 932
del "OutputLog.txt"
java -jar InternetWaybackMachine.jar "https://s3-ap-northeast-1.amazonaws.com/uchinoko/chara_images/pictures/000/065/510/original/onedraw_estica_kansei_のコピー.png?1445384478"
pause

[screenshot]
It worked! So the problem was the encoding of the text in the batch file.

GhbSmwc commented on June 22, 2024

Did some more testing:

chcp 65001
del "OutputLog.txt"
java -jar InternetWaybackMachine.jar "https://s3-ap-northeast-1.amazonaws.com/uchinoko/chara_images/pictures/000/065/514/original/onedraw_ibe_のコピー.png?1445384858" >>OutputLog.txt
java -jar InternetWaybackMachine.jar "https://s3-ap-northeast-1.amazonaws.com/uchinoko/chara_images/pictures/000/065/510/original/onedraw_estica_kansei_のコピー.png?1445384478" >>OutputLog.txt
pause

Results in this:
[screenshot]
while this:

del "OutputLog.txt"
java -jar InternetWaybackMachine.jar "https://s3-ap-northeast-1.amazonaws.com/uchinoko/chara_images/pictures/000/065/514/original/onedraw_ibe_のコピー.png?1445384858" >>OutputLog.txt
java -jar InternetWaybackMachine.jar "https://s3-ap-northeast-1.amazonaws.com/uchinoko/chara_images/pictures/000/065/510/original/onedraw_estica_kansei_のコピー.png?1445384478" >>OutputLog.txt
pause

Results in this:
[screenshot]

Got to be really careful with the code page.
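Roughly speaking, cmd.exe interprets the bytes of a .bat file using the active console code page (65001 is UTF-8; 932 is the Windows variant of Japanese Shift_JIS). So the same Japanese text must be stored with the bytes that match the chcp setting. The hypothetical Java sketch below shows that the two encodings produce different bytes, and that reading one as the other does not give back the original text. It assumes the JVM ships the windows-31j charset, which is Java's name for code page 932 and is included in standard JDK builds.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of why the batch file's own encoding matters:
// the same text is stored as different bytes in UTF-8 and in code page 932.
class EncodingMismatch {

    // "windows-31j" (alias MS932) is Java's name for Windows code page 932.
    static final Charset CP932 = Charset.forName("windows-31j");

    static String readAs(byte[] bytes, Charset cs) {
        return new String(bytes, cs);
    }

    public static void main(String[] args) {
        String text = "\u306E\u30B3\u30D4\u30FC"; // the Japanese suffix used in this thread
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        byte[] sjis = text.getBytes(CP932);
        System.out.println(utf8.length + " vs " + sjis.length);                 // 12 vs 8
        System.out.println(readAs(utf8, CP932).equals(text));                   // false
        System.out.println(readAs(sjis, StandardCharsets.UTF_8).equals(text));  // false
    }
}
```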

GhbSmwc commented on June 22, 2024

However, these still don't work:

java -jar InternetWaybackMachine.jar "https://s3-ap-northeast-1.amazonaws.com/uchinoko/charas/avatars/000/011/795/thumb/zenshin_lusia_のコヒ゜ー.png?1455468784" >>OutputLog.txt
java -jar InternetWaybackMachine.jar "https://s3-ap-northeast-1.amazonaws.com/uchinoko/charas/avatars/000/011/797/thumb/zenshin_trent_のコヒ゜ー.png?1455548054" >>OutputLog.txt
java -jar InternetWaybackMachine.jar "https://s3-ap-northeast-1.amazonaws.com/uchinoko/charas/avatars/000/011/798/thumb/zenshin_shild_のコヒ゜ー.png?1455548205" >>OutputLog.txt
java -jar InternetWaybackMachine.jar "https://s3-ap-northeast-1.amazonaws.com/uchinoko/charas/avatars/000/011/799/thumb/zenshin_fade_のコヒ゜ー.png?1455630328" >>OutputLog.txt
java -jar InternetWaybackMachine.jar "https://s3-ap-northeast-1.amazonaws.com/uchinoko/charas/avatars/000/011/800/thumb/zenshin_flugel_のコヒ゜ー.png?1455634546" >>OutputLog.txt
java -jar InternetWaybackMachine.jar "https://s3-ap-northeast-1.amazonaws.com/uchinoko/charas/avatars/000/011/795/original/zenshin_lusia_のコヒ゜ー.png?1455468784" >>OutputLog.txt
java -jar InternetWaybackMachine.jar "https://s3-ap-northeast-1.amazonaws.com/uchinoko/charas/avatars/000/011/797/original/zenshin_trent_のコヒ゜ー.png?1455548054" >>OutputLog.txt
java -jar InternetWaybackMachine.jar "https://s3-ap-northeast-1.amazonaws.com/uchinoko/charas/avatars/000/011/798/original/zenshin_shild_のコヒ゜ー.png?1455548205" >>OutputLog.txt
java -jar InternetWaybackMachine.jar "https://s3-ap-northeast-1.amazonaws.com/uchinoko/charas/avatars/000/011/799/original/zenshin_fade_のコヒ゜ー.png?1455630328" >>OutputLog.txt
java -jar InternetWaybackMachine.jar "https://s3-ap-northeast-1.amazonaws.com/uchinoko/charas/avatars/000/011/800/original/zenshin_flugel_のコヒ゜ー.png?1455634546" >>OutputLog.txt

EDIT: These URLs don't exist:
[screenshot]

Be very careful when switching back and forth between encodings, as the text may not be restored properly. For example:

https://s3-ap-northeast-1.amazonaws.com/uchinoko/charas/avatars/000/011/800/original/zenshin_flugel_のコピー.png?1455634546
gets converted to:
https://s3-ap-northeast-1.amazonaws.com/uchinoko/charas/avatars/000/011/800/original/zenshin_flugel_のコヒ゜ー.png?1455634546
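One possible explanation for this is Unicode normalization, though this is an assumption and was not confirmed in the thread. ピ can be stored as one precomposed character (U+30D4, NFC) or as ヒ followed by a combining semi-voiced sound mark (U+30D2 U+309A, NFD). Both look almost the same but have different bytes, so they percent-encode differently. Notably, the percent-encoded S3 save URL tested manually earlier in the thread ends in %E3%83%92%E3%82%9A%E3%83%BC, which is exactly the decomposed form. The hypothetical Java sketch below uses java.text.Normalizer and a small hand-written UTF-8 percent-encoder to show both forms.

```java
import java.nio.charset.StandardCharsets;
import java.text.Normalizer;

// Hypothetical illustration: U+30D4 can be stored precomposed (NFC) or as U+30D2 plus the
// combining mark U+309A (NFD). Both look alike but percent-encode differently.
class KanaNormalization {

    // Percent-encodes every byte of the UTF-8 form except RFC 3986 unreserved characters.
    static String percentEncodeUtf8(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            int v = b & 0xFF;
            boolean unreserved = (v >= 'A' && v <= 'Z') || (v >= 'a' && v <= 'z')
                    || (v >= '0' && v <= '9') || v == '-' || v == '.' || v == '_' || v == '~';
            if (unreserved) {
                sb.append((char) v);
            } else {
                sb.append(String.format("%%%02X", v));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String suffix = "\u306E\u30B3\u30D4\u30FC"; // the Japanese suffix used in this thread
        String nfc = Normalizer.normalize(suffix, Normalizer.Form.NFC);
        String nfd = Normalizer.normalize(suffix, Normalizer.Form.NFD);
        System.out.println(percentEncodeUtf8(nfc)); // %E3%81%AE%E3%82%B3%E3%83%94%E3%83%BC
        System.out.println(percentEncodeUtf8(nfd)); // %E3%81%AE%E3%82%B3%E3%83%92%E3%82%9A%E3%83%BC
    }
}
```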

kasramp commented on June 22, 2024

I see. If I have time, I'll try to add a feature so you can pass an input file. That way I don't think it will break.

kasramp commented on June 22, 2024

@GhbSmwc - I've added support for file import: https://github.com/kasramp/InternetWayBackMachine/releases/tag/2019-08-10
Hope it solves the issue.

GhbSmwc commented on June 22, 2024

It hates the hash symbol:

chcp 65001
del "OutputLog2.txt"
java -jar InternetWaybackMachine.jar "https://modland.ziphoid.com/pub/modules/Protracker/Estrayk/darknews %239.mod" >>OutputLog2.txt
java -jar InternetWaybackMachine.jar "https://modland.ziphoid.com/pub/modules/Protracker/Estrayk/euskal rep%235.mod" >>OutputLog2.txt
java -jar InternetWaybackMachine.jar "https://modland.ziphoid.com/pub/modules/Protracker/Estrayk/fanzine%2311.mod" >>OutputLog2.txt
java -jar InternetWaybackMachine.jar "https://modland.ziphoid.com/pub/modules/Protracker/Estrayk/fanzine%232.mod" >>OutputLog2.txt
java -jar InternetWaybackMachine.jar "https://modland.ziphoid.com/pub/modules/Protracker/Estrayk/hear is u only god.mod" >>OutputLog2.txt
java -jar InternetWaybackMachine.jar "https://modland.ziphoid.com/pub/modules/Protracker/Estrayk/never mind %2357.mod" >>OutputLog2.txt
pause

Each of these lines resulted in: Page submission failed :-(

(%23 is the percent-encoded form of #.)

I tried saving manually, and it said the request was invalid. The address bar showed this:
https://modland.ziphoid.com/pub/modules/Protracker/Estrayk/darknews
The Wayback Machine truncates everything after the space that isn't percent-encoded. The URL with only spaces seems to work correctly.

EDIT: I found something peculiar about how the hash is interpreted. For example, entering this in your browser's address bar works:
https://modland.ziphoid.com/pub/modules/Protracker/Estrayk/fanzine%2311.mod
But entering this doesn't:
https://modland.ziphoid.com/pub/modules/Protracker/Estrayk/fanzine#11.mod
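To illustrate why the two URLs above behave differently, here is a minimal Java sketch using the JDK's `java.net.URI` parser (chosen only for illustration; browsers and the WBM have their own parsers). With `%23`, the whole file name stays in the path; with a raw `#`, the path ends at `fanzine` and `11.mod` becomes a fragment that is never sent to the server. `FragmentDemo.split` is a hypothetical helper.

```java
import java.net.URI;

public class FragmentDemo {
    // Returns {raw path, raw fragment} as parsed by java.net.URI.
    // The fragment is null when the URL contains no raw '#'.
    static String[] split(String url) {
        URI uri = URI.create(url);
        return new String[] { uri.getRawPath(), uri.getRawFragment() };
    }

    public static void main(String[] args) {
        String encoded = "https://modland.ziphoid.com/pub/modules/Protracker/Estrayk/fanzine%2311.mod";
        String raw = "https://modland.ziphoid.com/pub/modules/Protracker/Estrayk/fanzine#11.mod";
        for (String url : new String[] { encoded, raw }) {
            String[] parts = split(url);
            System.out.println("path=" + parts[0] + "  fragment=" + parts[1]);
        }
    }
}
```

For the first URL this prints the full path ending in `fanzine%2311.mod` with a null fragment; for the second it prints the path ending in `fanzine` and the fragment `11.mod`.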

GhbSmwc commented on June 22, 2024

I found the issue: %23 and # are treated differently. Here is an example:
FragmentIdentifier.zip
Open .../test%23test.html, append #test to the end of it in the address bar (making it .../test%23test.html#test), and press Enter: you'll be taken to the bottom of that page. Now try entering .../test%23test.html%23test instead: you'll get a file-not-found error.

The Internet Archive treats # and %23 the same way: when there is a # in the URL, it and all characters after it are removed. For example, saving https://en.wikipedia.org/wiki/Fragment_identifier#Examples saves https://en.wikipedia.org/wiki/Fragment_identifier instead. The same thing happens with URLs of files, which produces invalid URLs. For example, saving https://modland.ziphoid.com/pub/modules/Protracker/Estrayk/euskal rep%235.mod converts it to https://modland.ziphoid.com/pub/modules/Protracker/Estrayk/euskal rep#5.mod. The WBM then drops #5.mod, treating it as a location within a web page that a browser would jump to, and tries to save https://modland.ziphoid.com/pub/modules/Protracker/Estrayk/euskal rep instead, which is invalid. You can confirm this by looking at the URL on the WBM error page:
[Screenshot of the WBM error page]
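A minimal Java sketch that reproduces the behaviour described above, assuming (my guess from the symptoms, not documented WBM logic) that the URL is first fully percent-decoded and everything from the first `#` is then dropped as a fragment. `decodeAll` and `decodeThenStrip` are hypothetical helper names.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class DecodeThenStrip {
    // Value of one ASCII hex digit, or -1 if c is not a hex digit.
    static int hexVal(int c) {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    }

    // Decodes EVERY %XX escape, reserved or not, then reads the bytes as UTF-8.
    static String decodeAll(String s) {
        byte[] in = s.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < in.length; i++) {
            if (in[i] == '%' && i + 2 < in.length) {
                int hi = hexVal(in[i + 1]);
                int lo = hexVal(in[i + 2]);
                if (hi >= 0 && lo >= 0) {
                    out.write((hi << 4) | lo);
                    i += 2; // skip the two hex digits just consumed
                    continue;
                }
            }
            out.write(in[i]);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    // Assumed WBM behaviour: decode first, then cut at the first '#' as if it began a fragment.
    static String decodeThenStrip(String url) {
        String decoded = decodeAll(url);
        int hash = decoded.indexOf('#');
        return hash < 0 ? decoded : decoded.substring(0, hash);
    }

    public static void main(String[] args) {
        System.out.println(decodeThenStrip(
            "https://modland.ziphoid.com/pub/modules/Protracker/Estrayk/euskal rep%235.mod"));
        System.out.println(decodeThenStrip(
            "https://en.wikipedia.org/wiki/Fragment_identifier#Examples"));
    }
}
```

Both calls end up at the truncated URLs described above: `.../Estrayk/euskal rep` and `.../wiki/Fragment_identifier`.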

If you manually enter https://modland.ziphoid.com/pub/modules/Protracker/Estrayk/euskal rep#5.mod in your browser's address bar, your browser does the same thing as the WBM: it assumes you want to go to the #5.mod section of https://modland.ziphoid.com/pub/modules/Protracker/Estrayk/euskal rep. So you have to replace the # with %23 for it to work.
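To do that replacement programmatically instead of by hand, one option (a sketch, not part of InternetWaybackMachine) is the multi-argument `java.net.URI` constructor, which percent-encodes characters that are illegal in a path, such as space (`%20`) and `#` (`%23`). `toASCIIString()` additionally encodes non-ASCII characters, like the Japanese file names earlier in this thread, as UTF-8 escapes. It expects the raw, unencoded path: these constructors always quote `%`, so an already-encoded path would be double-encoded. `EncodePath.encode` is a hypothetical helper.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class EncodePath {
    // Builds an ASCII-only https URL from a host and an UNENCODED absolute path.
    static String encode(String host, String path) {
        try {
            // fragment = null, so no '#' is ever appended; a '#' in the path is quoted as %23.
            return new URI("https", host, path, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Cannot build URI for path: " + path, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(encode("modland.ziphoid.com",
            "/pub/modules/Protracker/Estrayk/euskal rep#5.mod"));
    }
}
```

This prints `https://modland.ziphoid.com/pub/modules/Protracker/Estrayk/euskal%20rep%235.mod`, which keeps `#5.mod` as part of the file name.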

I contacted the Internet Archive about this nasty flaw in the WBM: all percent-encoded characters, including reserved ones, are converted to their raw form, even though reserved characters in raw form have a special meaning that differs from their percent-encoded form.
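For contrast, RFC 3986 (section 6.2.2.2, percent-encoding normalization) only permits decoding percent-encoded *unreserved* characters (letters, digits, `-`, `.`, `_`, `~`) when normalizing a URL; escapes of reserved characters such as `%23` must stay encoded, because decoding them changes what the URL means. A minimal Java sketch of that safer normalization (hypothetical helper, not the WBM's code):

```java
public class SafeNormalize {
    // Value of one ASCII hex digit, or -1 if c is not a hex digit.
    static int hexVal(int c) {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    }

    // RFC 3986 unreserved set: ALPHA / DIGIT / "-" / "." / "_" / "~".
    static boolean isUnreserved(int c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9')
            || c == '-' || c == '.' || c == '_' || c == '~';
    }

    // Decodes %XX only when it encodes an unreserved character. Every other escape
    // (%23 for '#', %20 for space, UTF-8 bytes, ...) is kept, with its hex digits uppercased.
    static String normalize(String url) {
        StringBuilder sb = new StringBuilder(url.length());
        for (int i = 0; i < url.length(); i++) {
            char c = url.charAt(i);
            if (c == '%' && i + 2 < url.length()) {
                int hi = hexVal(url.charAt(i + 1));
                int lo = hexVal(url.charAt(i + 2));
                if (hi >= 0 && lo >= 0) {
                    int b = (hi << 4) | lo;
                    if (isUnreserved(b)) {
                        sb.append((char) b);
                    } else {
                        sb.append('%')
                          .append(Character.toUpperCase(url.charAt(i + 1)))
                          .append(Character.toUpperCase(url.charAt(i + 2)));
                    }
                    i += 2; // skip the two hex digits just consumed
                    continue;
                }
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // %23 survives, so the file name is not cut off at '#'.
        System.out.println(normalize("https://modland.ziphoid.com/pub/modules/Protracker/Estrayk/fanzine%2311.mod"));
        // %7E and %2E are unreserved ('~' and '.'), so they are safe to decode.
        System.out.println(normalize("https://example.com/%7Euser/file%2Ename"));
    }
}
```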

kasramp commented on June 22, 2024

@GhbSmwc - Yes, # is interpreted as the anchor in the URL, and fixing that is out of my control.
But the encoding bugs on Windows should be solved if you use the file import feature I added yesterday.
You can use it like this:
java -jar InternetWaybackMachine.jar -i [path to file]
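A minimal sketch of how such a file could be read in Java, assuming (the format isn't documented in this thread) plain UTF-8 text with one URL per line. Reading the file as UTF-8 sidesteps the Windows console code page entirely; the sketch also strips the byte-order mark that some Windows editors (e.g. older Notepad) add to UTF-8 files, which would otherwise corrupt the first URL. `UrlListReader` is hypothetical, not the tool's actual code.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class UrlListReader {
    // Reads one URL per line from a UTF-8 file, skipping blank lines.
    static List<String> readUrls(Path file) throws IOException {
        List<String> lines = Files.readAllLines(file, StandardCharsets.UTF_8);
        List<String> urls = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            String line = lines.get(i);
            // Java's UTF-8 decoder keeps a leading BOM as U+FEFF, so remove it explicitly.
            if (i == 0 && line.startsWith("\uFEFF")) {
                line = line.substring(1);
            }
            line = line.trim(); // leading/trailing whitespace only; inner spaces are kept
            if (!line.isEmpty()) {
                urls.add(line);
            }
        }
        return urls;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java UrlListReader urls.txt
        for (String url : readUrls(Path.of(args[0]))) {
            System.out.println(url);
        }
    }
}
```

Note that this only fixes the shell-encoding side: a URL in the file still needs %23 rather than a raw # for the file name to survive.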
