
urlfix's Introduction

Python R Shell Script

Thank you 🖤

Keep Building 🏗

urlfix's Issues

Add URL exceptions

Description

I would like to exclude some URLs from being replaced.

Similar Features

This will be similar to the replacement method once it's in place.

Feature Details

Exclude certain URLs from replacement. If I know a URL is correct, such as a Zenodo link that allows citing the latest version across multiple releases, it should be skipped during replacement.

Proposed Implementation

Write a RegEx to skip such links.
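A minimal sketch of the RegEx approach, assuming exclusions are given as a list of patterns that is checked before any URL is visited or replaced. EXCLUDE_PATTERNS, is_excluded and urls_to_check are hypothetical names, not part of urlfix, and the Zenodo patterns are illustrative only.

```python
import re
from typing import List

# Hypothetical exclusion patterns; Zenodo links are the example given in this issue.
EXCLUDE_PATTERNS = [
    r"^https?://(www\.)?zenodo\.org/",
    r"^https?://doi\.org/10\.5281/zenodo\.\d+$",
]

# Combine the patterns into one compiled alternation so each URL is scanned once.
_EXCLUDE_RE = re.compile("|".join(f"(?:{p})" for p in EXCLUDE_PATTERNS))


def is_excluded(url: str) -> bool:
    """Return True if the URL matches an exclusion pattern and must be left unchanged."""
    return _EXCLUDE_RE.search(url) is not None


def urls_to_check(urls: List[str]) -> List[str]:
    """Drop excluded URLs before they are checked or replaced."""
    return [u for u in urls if not is_excluded(u)]


print(urls_to_check(["https://doi.org/10.5281/zenodo.1234567",
                     "https://github.com/Nelson-Gon/urlfix"]))
```

Keeping the patterns in a list rather than one hand-written RegEx makes it easy to accept user-supplied exclusions later, for example read from a file.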

Move to Classes

Description

A proposal to implement a class or classes as opposed to standalone methods.

Similar Features

N/A

Feature Details

Using a class would provide an easier way to work with several files, improve readability, and provide greater control.

Proposed Implementation

Implement class URLFix

Support recursive replacements

Description

Given a directory of directories, I would like to be able to recursively replace outdated links.

Similar Features

Similar to DirURLFix but does so recursively.

Feature Details

Given a folder with sub folders, automatically detect sub folders and replace outdated links, if requested.

Proposed Implementation

  • Rethink DirURLFix. Walk over input directory.
  • Check that self.recursive works as expected.
  • Rewrite tests to reflect changes.
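A rough sketch of the directory walk, assuming files are selected by extension. iter_files and DEFAULT_EXTENSIONS are hypothetical; each yielded path would be handed to the existing single-file replacement. With recursive=False the same helper covers the current, non-recursive DirURLFix case.

```python
from pathlib import Path
from typing import Iterable, Iterator

# Hypothetical set of extensions to scan; urlfix's actual list may differ.
DEFAULT_EXTENSIONS = {".md", ".txt", ".rst", ".py"}


def iter_files(root, recursive: bool = True,
               extensions: Iterable[str] = DEFAULT_EXTENSIONS) -> Iterator[Path]:
    """Yield files under root whose suffix is in extensions, in sorted order.

    With recursive=True every sub folder is walked; otherwise only the
    top level of root is scanned.
    """
    root = Path(root)
    if not root.is_dir():
        raise NotADirectoryError(f"{root} is not a directory")
    wanted = {ext.lower() for ext in extensions}
    candidates = root.rglob("*") if recursive else root.iterdir()
    for path in sorted(candidates):
        if path.is_file() and path.suffix.lower() in wanted:
            yield path
```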

Requirements.txt doesn't specify urllib version

Describe the bug

When setting up the virtual environment, the urllib version is not specified in requirements.txt, so the system default is used.

To Reproduce

  1. Clone a clean branch of the repo and check the system's version of urllib (on Linux, with pip show urllib).
  2. Create the virtual environment with python3 -m venv urlvenv and install the requirements with pip install -r requirements.txt.
  3. Check the urllib version again and see that it matches the system default.
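For step 3, the installed version can also be read from Python itself rather than from pip show. This is a sketch using importlib.metadata (Python 3.8+); installed_version is a hypothetical helper. Note that urllib is part of the standard library and has no separate distribution, so only urllib3 reports a version here.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


print("urllib3", installed_version("urllib3") or "not installed")
```

Running this inside and outside urlvenv shows whether the venv has its own copy. If a specific release is required, a pinned line such as urllib3==1.26.3 (the version reported in this issue) in requirements.txt would make it explicit.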

Expected behavior

The urllib version required by the source code should be explicitly pinned and installed.

Unexpected behavior

The system default version of urllib is used when running the source code. The only version of urllib installed inside or outside the venv set up with the current requirements.txt file is urllib3==1.26.3, which is in fact the latest release (perhaps this is intentional).

System Details

Arch Linux.
Pip 20.2.3
Python 3.9

Speed up URL checks

Description

When checking whether a URL is outdated, avoid visiting each URL.

Similar Features

This is similar to visit_urls but would avoid actually visiting the URLs.

Feature Details

Visiting each link may be problematic and very slow for documents that contain many links.

Proposed Implementation

None yet

Create progress reports

Description

I would like to have a progress bar to report the estimated time for updating links.

Similar Features

N/A

Feature Details

I would like to estimate the time it will take and also report the actual time taken at the end of the update.

Proposed Implementation

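For the time taken, a simple use of time may suffice, for example recording start = time.time() before the update and computing time.time() - start afterwards.

Using a performance counter such as time.perf_counter(), which is designed for measuring elapsed time, may also be good.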
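A sketch of both ideas, assuming URLs are updated one at a time by some update callable (a hypothetical stand-in for urlfix's replacement step). The ETA is a naive estimate: the average time per URL so far times the number of URLs remaining. A library such as tqdm could draw an actual progress bar; this sketch keeps to the standard library.

```python
import time
from typing import Callable, List


def update_with_progress(urls: List[str], update: Callable[[str], None],
                         report: Callable[[str], None] = print) -> float:
    """Call update(url) for each URL, reporting progress with an ETA, then the total time."""
    total = len(urls)
    start = time.perf_counter()
    for done, url in enumerate(urls, start=1):
        update(url)
        elapsed = time.perf_counter() - start
        # Naive ETA: mean time per URL so far multiplied by the URLs still to go.
        eta = elapsed / done * (total - done)
        report(f"[{done}/{total}] {url} (ETA {eta:.1f}s)")
    elapsed = time.perf_counter() - start
    report(f"Updated {total} links in {elapsed:.2f}s")
    return elapsed


# Example with a dummy update step that just sleeps.
update_with_progress(["https://example.com/a", "https://example.com/b"],
                     update=lambda url: time.sleep(0.01))
```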

Faster URL Checks: A Proposal

Description

Proposed implementation of a faster way to check for outdated links.

Similar Features

This is similar to the correct_urls argument in replace_urls

Feature Details

Instead of opening each link, it would be great to do one of the following:

  • Use a cache for first time checks and use this subsequently.
  • Use a known URLs database and check if the URL exists in this database
  • Use threading
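For the threading option, a sketch with concurrent.futures.ThreadPoolExecutor. URL checks spend most of their time waiting on the network, so threads can overlap that waiting. final_url and check_urls are hypothetical names; the check function is injectable so the logic can be tried without network access.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable
from urllib.request import urlopen


def final_url(url: str, timeout: float = 10.0) -> str:
    """Open url and return the address it resolves to after any redirects."""
    with urlopen(url, timeout=timeout) as response:
        return response.geturl()


def check_urls(urls: Iterable[str], check: Callable[[str], str] = final_url,
               max_workers: int = 8) -> Dict[str, str]:
    """Run check on every unique URL concurrently and map each URL to its result."""
    unique = list(dict.fromkeys(urls))  # de-duplicate while keeping order
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(check, unique)
        return dict(zip(unique, results))
```

As written, an exception raised by any single check propagates out of check_urls; real code would likely catch errors per URL so that one dead link does not abort the whole run.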

Proposed Implementation

Probably option 1 (0 index), but which database (?).
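Either caching option needs somewhere to store results. One possibility, sketched here as an assumption rather than a decision, is SQLite through Python's built-in sqlite3 module: no extra dependency and a single file on disk. URLCache, its table layout and the default file name are hypothetical.

```python
import sqlite3
from typing import Callable, Optional


class URLCache:
    """A minimal URL -> resolved-URL cache backed by SQLite (hypothetical design)."""

    def __init__(self, path: str = "urlfix_cache.sqlite3"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS urls (url TEXT PRIMARY KEY, resolved TEXT NOT NULL)"
        )
        self.conn.commit()

    def get(self, url: str) -> Optional[str]:
        row = self.conn.execute("SELECT resolved FROM urls WHERE url = ?", (url,)).fetchone()
        return row[0] if row else None

    def put(self, url: str, resolved: str) -> None:
        with self.conn:  # commits the insert on success
            self.conn.execute("INSERT OR REPLACE INTO urls VALUES (?, ?)", (url, resolved))

    def resolve(self, url: str, check: Callable[[str], str]) -> str:
        """Return the cached result, calling check(url) only on a cache miss."""
        cached = self.get(url)
        if cached is None:
            cached = check(url)
            self.put(url, cached)
        return cached

    def close(self) -> None:
        self.conn.close()
```

Because results persist on disk, a second run over the same documents only visits URLs it has not seen before; a real cache would probably also need an expiry so that links which later move are rechecked.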

Restore test files even when tests fail

Describe the bug

Running python tests.py overwrites test files and does not restore them.

To Reproduce

For 9b11c37, run python tests.py and observe what happens to the files in the testfiles directory.

Expected behavior

I expected to have tests pass/fail without changing the original test files.

Unexpected behavior

When tests fail, the original test files have already been overwritten and cannot be easily restored.

System Details

Version: 9b11c37

Inplace replacement tests are not comprehensive

Describe the bug

Running tests twice will fail because of inplace replacement tests.

To Reproduce

Run python tests.py twice.

Expected behavior

I expected to have OK tests regardless of how many times I run them.

Unexpected behavior

Tests will fail because an in-place replacement has already changed the file, so running the tests a second time fails. One fix might be to restore the files in the testinplace directory after running the tests. We might also need to test that a second in-place pass reports 0 replacements for every file, which confirms the first pass worked.
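A sketch of the second check, assuming the in-place replacement returns the number of replacements it made. replace_inplace here is a hypothetical stand-in that uses a fixed old-to-new mapping instead of visiting URLs, so only the shape of the test carries over. Working on a temporary copy also avoids having to restore testinplace at all.

```python
import tempfile
import unittest
from pathlib import Path
from typing import Dict


def replace_inplace(path: Path, mapping: Dict[str, str]) -> int:
    """Hypothetical stand-in: replace each old URL with its new URL in path, return the count."""
    text = path.read_text(encoding="utf-8")
    total = 0
    for old, new in mapping.items():
        total += text.count(old)
        text = text.replace(old, new)
    path.write_text(text, encoding="utf-8")
    return total


class InplaceIdempotenceTest(unittest.TestCase):
    def test_second_run_replaces_nothing(self):
        mapping = {"http://example.com/old": "https://example.com/new"}
        with tempfile.TemporaryDirectory() as d:
            path = Path(d) / "doc.md"  # a throwaway copy, never the real fixture
            path.write_text("See http://example.com/old and http://example.com/old.")
            self.assertEqual(replace_inplace(path, mapping), 2)
            # If the first pass worked, nothing outdated is left: expect 0.
            self.assertEqual(replace_inplace(path, mapping), 0)
```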

System Details

OS independent, version 0.2.1 https://github.com/Nelson-Gon/urlfix/tree/558aa0fcf436b2e651eae97ecf9595cee2ac90f5/

Some links in MarkDown files are not replaced

Describe the bug

For markdown in the format [[]]()(), links in the second () are not replaced.

To Reproduce

Run replace_urls on Markdown files.

Expected behavior

I expected an output file with all links replaced.

Unexpected behavior

Only links in the first () are matched and replaced.

System Details

0.2.1 Dev: https://github.com/Nelson-Gon/urlfix/tree/4212bad4503cca5ae53f694c7bf623ca6d66fbf5

OS independent.


TODO:

  • Ensure text is kept

  • Match all URLs for Markdown in the form [[]]()()

  • Add tests for known string existence

  • Restore matching of .txt links.
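A sketch for the second TODO item, assuming the [[]]()() form refers to nested badge syntax such as [![alt](image-url)](link-url). Scanning the whole line with re.findall picks up the URL inside each pair of parentheses instead of stopping after the first. MD_LINK_URL and markdown_urls are hypothetical names, and the pattern does not handle URLs that themselves contain parentheses.

```python
import re
from typing import List

# Any http(s) URL directly inside parentheses. In [![alt](image-url)](link-url)
# both the image URL and the link URL match, because findall scans the whole line.
MD_LINK_URL = re.compile(r"\((https?://[^\s()]+)\)")


def markdown_urls(line: str) -> List[str]:
    """Return every parenthesised URL on a Markdown line, in order of appearance."""
    return MD_LINK_URL.findall(line)


badge = "[![build](https://img.example.com/badge.svg)](https://example.com/latest)"
print(markdown_urls(badge))
# ['https://img.example.com/badge.svg', 'https://example.com/latest']
```

Since only the URL inside the parentheses is captured, a re.sub on the same pattern can swap URLs while leaving the surrounding text untouched, which also serves the "Ensure text is kept" item.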

Automatically handle firewall restrictions

Description

When running replace_urls, I receive the following error:

[WinError 10060] A connection attempt failed because the connected p....

Similar Features

This is related to replace_urls

Feature Details

The code editor or Python itself may be blocked by a firewall. In that case, replacements will not work and the error above is raised.

Proposed Implementation

Use urllib's proxy handlers.
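A sketch of the proxy approach with urllib.request.ProxyHandler and build_opener. The proxy address is a placeholder and build_proxy_opener is a hypothetical helper. Passing None makes ProxyHandler read the usual proxy environment variables instead. WinError 10060 is a connection timeout, so a proxy only helps where the network actually requires one.

```python
import urllib.request
from typing import Dict, Optional

# Placeholder proxy address; replace it with the proxy your network actually uses.
PROXIES = {
    "http": "http://proxy.example.com:8080",
    "https": "http://proxy.example.com:8080",
}


def build_proxy_opener(proxies: Optional[Dict[str, str]] = None) -> urllib.request.OpenerDirector:
    """Build an opener that routes requests through the given proxies.

    With proxies=None, ProxyHandler falls back to the proxy settings found by
    urllib.request.getproxies() (e.g. the HTTP_PROXY / HTTPS_PROXY variables).
    """
    handler = urllib.request.ProxyHandler(proxies)
    return urllib.request.build_opener(handler)


opener = build_proxy_opener(PROXIES)
# opener.open(url, timeout=10) can then be used wherever urlopen(url) was called.
```

Calling urllib.request.install_opener(opener) would make plain urlopen calls go through the proxy as well.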

Replace outdated URLs inplace

Description

Replace outdated URLs in the same file.

Similar Features

This is similar to replace_urls in URLFix and DirURLFix. The issue with the current replacements is that the former requires an output file while the latter creates them on-the-fly.

Feature Details

Using output files may create many files on the user's system, which may be problematic for users with little storage space.

Proposed Implementation

Create temporary files to write to before renaming them as the input file.
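A sketch of the temporary-file approach, with transform standing in for whatever function rewrites the text (a hypothetical parameter). The temporary file is created next to the input so that os.replace swaps it in with a single rename on the same filesystem; if anything fails before that, the original file is untouched and the temporary file is removed.

```python
import os
import shutil
import tempfile
from pathlib import Path
from typing import Callable


def rewrite_inplace(path, transform: Callable[[str], str], encoding: str = "utf-8") -> None:
    """Write transform(text) to a temporary file, then rename it over the original."""
    path = Path(path)
    with open(path, encoding=encoding, newline="") as src:  # newline="" keeps line endings
        text = src.read()
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=path.name + ".", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding=encoding, newline="") as tmp:
            tmp.write(transform(text))
        shutil.copymode(path, tmp_name)  # mkstemp creates owner-only files; keep original mode
        os.replace(tmp_name, path)       # swap the new content in with one rename
    except BaseException:
        os.unlink(tmp_name)  # leave the original as it was and clean up the temp file
        raise
```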

tests.py does not remove files it creates; ruins future tests

Describe the bug

The tests create files that cause subsequent test runs to fail.

To Reproduce

Clone the main branch, run the tests (success), then run the tests again (an assertion 3 != 6 fails).

Expected behavior

The tests should succeed multiple times in a row if the code does not change or stays correct.

Unexpected behavior

Testing consecutively fails.

System Details

Developer version; Arch Linux; Python 3.9.1.

Warn if the target URL no longer exists

Description

If a URL is outdated and no new link exists, warn the user.

Similar Features

This is similar to replace_urls.

Feature Details

if url.is_outdated and url.redirects.status == 404:
    user.warn
    old_url.keep

The above is pseudocode that hopefully makes it clear what should be done.

Proposed Implementation

As above.
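A runnable reading of the pseudocode, under two assumptions: urlopen follows redirects itself, so a redirect whose target is gone surfaces as an HTTPError with code 404; and the warning goes through Python's warnings module. resolve_or_keep is a hypothetical name, and opener is injectable so the logic can be exercised without the network. As written it cannot tell a dead redirect from a URL that returned 404 from the start; both keep the old URL with a warning.

```python
import warnings
from typing import Callable
from urllib.error import HTTPError
from urllib.request import urlopen


def resolve_or_keep(url: str, opener: Callable = urlopen, timeout: float = 10.0) -> str:
    """Return the URL that url redirects to, or url itself (with a warning) on a 404.

    Other HTTP errors are re-raised unchanged.
    """
    try:
        with opener(url, timeout=timeout) as response:
            return response.geturl()  # final address after any redirects
    except HTTPError as err:
        if err.code == 404:
            warnings.warn(f"{url} is outdated and no replacement exists (HTTP 404); keeping it.")
            return url
        raise
```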

Only one link per line is matched.

Describe the bug

For both file and directory URL replacements, only one link per line is replaced.

To Reproduce

Run tests and observe what happens at the console with verbosity set. Alternatively, test the number of matched links against the expected number.

Expected behavior

Expected all URLs on every line to be matched.

Unexpected behavior

Only one URL per line is matched.

System Details

Main 0.2.1: https://github.com/Nelson-Gon/urlfix/tree/43018ca09b1aab6591b691b2e768a4233e5a4e2d

OS independent.
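A frequent cause of this symptom is matching with re.search or re.match, which stop at the first hit; whether urlfix does this is an assumption here. re.sub with a replacement function, by contrast, visits every match on the line. URL_RE and replace_all_on_line are hypothetical, and the URL pattern is deliberately simple.

```python
import re
from typing import Dict

# Deliberately simple URL pattern: stops at whitespace, quotes, angle brackets,
# parentheses and square brackets. Trailing punctuation such as "." is still captured.
URL_RE = re.compile(r"https?://[^\s\"'<>()\[\]]+")


def replace_all_on_line(line: str, mapping: Dict[str, str]) -> str:
    """Replace every URL on the line that has a known replacement, not just the first."""
    return URL_RE.sub(lambda m: mapping.get(m.group(0), m.group(0)), line)


mapping = {"http://a.example/x": "https://a.example/x",
           "http://b.example/y": "https://b.example/y"}
print(replace_all_on_line("Old: http://a.example/x and http://b.example/y here", mapping))
```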

Enable script mode

Description

I would like to be able to fix outdated links at the command line.

Similar Features

This is just like using urlfix or dirurlfix but at the command line.

Feature Details

Sufficiently described.

Proposed Implementation

  • Create a file named main.py (for now). TODO: Find a suitable name that also avoids conflicts with urlfix.
  • Add the options --mode (-m), --input-file (-in), --output-file (-o), --verbose (-v), --inplace (-i)
  • Make this executable via python main.py -in myfile.md -o [optional] -v 1 -i 1. Add to path? This should work from anywhere, not just within urlfix (if cloned, for instance).
  • Add correct_urls argument for known URLs. This is less practical and may not be convenient at the command line. A current idea is to provide a file with known URLs and read that instead.
  • Add tests for script mode.

Thanks!
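A sketch of the proposed options with argparse, assuming 0/1 integer flags as in the example command above. The mode choices file and dir are hypothetical, as is the dispatch comment in main. argparse accepts single-dash multi-character options such as -in and tries an exact match first, so -in and -i do not clash.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Options proposed in this issue; the --mode choices are hypothetical."""
    parser = argparse.ArgumentParser(
        prog="main.py", description="Replace outdated URLs in a file or directory."
    )
    parser.add_argument("-m", "--mode", choices=["file", "dir"], default="file",
                        help="Fix a single file or every file in a directory.")
    parser.add_argument("-in", "--input-file", required=True,
                        help="File (or directory, in dir mode) to process.")
    parser.add_argument("-o", "--output-file", default=None,
                        help="Optional output path.")
    parser.add_argument("-v", "--verbose", type=int, choices=[0, 1], default=0)
    parser.add_argument("-i", "--inplace", type=int, choices=[0, 1], default=0)
    return parser


def main(argv: Optional[List[str]] = None) -> argparse.Namespace:
    args = build_parser().parse_args(argv)
    # Dispatch to the existing file or directory replacement here (hypothetical).
    return args
```

main would be called from an if __name__ == "__main__": block in main.py, after which python main.py -in myfile.md -v 1 -i 1 parses as expected. For running it from anywhere, a console_scripts entry point in the package's setup configuration is one option.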

Use logger instead of verbose messages

Description

Instead of printing to the console, we could use a logger that writes all output to a log file rather than to stdout.

Similar Features

This is similar to simply printing what is taking place at the console.

Feature Details

Sufficiently described above.

Proposed Implementation

Use logging as done in Nelson-Gon/pycite#23
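A minimal sketch with the standard logging module, assuming a single logger named urlfix that writes to a file and, when verbose, also to the console. get_logger and the default file name urlfix.log are hypothetical; the approach in Nelson-Gon/pycite#23 may differ.

```python
import logging
from typing import Optional


def get_logger(log_file: Optional[str] = "urlfix.log", verbose: bool = False) -> logging.Logger:
    """Return a logger that writes to log_file and, if verbose, also to the console."""
    logger = logging.getLogger("urlfix")
    logger.setLevel(logging.DEBUG)
    # Close and drop handlers from earlier calls so messages are not duplicated.
    for handler in list(logger.handlers):
        logger.removeHandler(handler)
        handler.close()
    fmt = logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    if log_file:
        file_handler = logging.FileHandler(log_file, encoding="utf-8")
        file_handler.setFormatter(fmt)
        logger.addHandler(file_handler)
    if verbose:
        console = logging.StreamHandler()
        console.setFormatter(fmt)
        logger.addHandler(console)
    return logger
```

A call such as logger.info("Replaced %s with %s", old_url, new_url) then replaces a verbose print; the %-style arguments are only formatted if the message is actually emitted.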

Replace URLs in an entire file directory

Description

I would like to replace outdated URLs in all files in a given directory.

Similar Features

Currently replace_urls only accepts a single file.

Feature Details

The feature has been well described.

Proposed Implementation

  • Provide a directory argument.
  • Loop over all files and replace URLs in each file. This may be computationally slow.

Automatically detect file extension in replace_urls

Description

The default in replace_urls(output_file="replacement.txt") should automatically match the input file's format.

Similar Features

This is similar to replace_urls(output_file="replacement.txt")

Feature Details

It has been sufficiently described.

Proposed Implementation

Use a regular expression to match the file format and use that.
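The issue suggests a regular expression; pathlib.Path.suffix does the same job without one, so this sketch swaps it in. default_output_file is a hypothetical helper, and falling back to .txt for files without an extension is an assumption that mirrors the current default.

```python
from pathlib import Path


def default_output_file(input_file: str, stem: str = "replacement",
                        fallback_suffix: str = ".txt") -> str:
    """Build the default output name so that it keeps the input file's extension."""
    suffix = Path(input_file).suffix or fallback_suffix
    return stem + suffix


print(default_output_file("notes.md"))  # replacement.md
print(default_output_file("LICENSE"))   # replacement.txt
```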

Thank you
