sganis / minirepo


15.0 15.0 13.0 1.14 MB

Create a local PyPI repository to use pip offline

License: MIT License

Python 100.00%

minirepo's People

Contributors

Stargazers

Watchers

Forkers

pythonpeixun shustinm greatbahram sy-base dm168168 nadeemnazeer3 niftymist abduhbm theonlynexus baas-hans bartbroere allamiro

minirepo's Issues

Compatibility Issue with get_config Function for Python 2 and Python 3

Issue Description:

The get_config function in the minirepo script uses raw_input, which is only available in Python 2. This causes a NameError when running the script in Python 3. Additionally, the function prompts for user input, which is not ideal for non-interactive use.
Steps to Reproduce:

* Run the minirepo script with Python 3.
* Observe the NameError due to raw_input.

Expected Behavior:

The script should work seamlessly in both Python 2 and Python 3, and it should avoid prompting for user input in a non-interactive environment.

Actual Behavior:

The script throws a NameError due to the use of raw_input in Python 3, and it prompts for user input if the configuration file is missing or has errors.

Proposed Solution:

Update the get_config function to work on both Python 2 and Python 3 by using input (aliased to raw_input on Python 2), and fall back to default values instead of prompting for user input when no interactive terminal is available.

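import json
import os
import sys

# Python 2/3 compatibility: use raw_input on Python 2, the built-in input on Python 3
try:
    input = raw_input
except NameError:
    pass

# PYTHON_VERSIONS, PACKAGE_TYPES and EXTENSIONS are the module-level defaults in minirepo.py

def get_config():
    config_file = os.path.expanduser("~/.minirepo")
    repository = os.path.expanduser("~/minirepo")
    processes = 10
    try:
        with open(config_file, 'r') as f:
            config = json.load(f)
    except (IOError, OSError, ValueError):
        # A missing file raises IOError (Python 2) or FileNotFoundError (an OSError, Python 3);
        # invalid JSON raises ValueError (json.JSONDecodeError subclasses it on Python 3).
        # Only prompt when running interactively; otherwise keep the defaults.
        if sys.stdin.isatty():
            newrepo = input('Repository folder [%s]: ' % repository)
            if newrepo:
                repository = newrepo
            newprocesses = input('Number of processes [%s]: ' % processes)
            if newprocesses:
                processes = int(newprocesses)
        config = {
            "repository": repository,
            "processes": processes,
            "python_versions": PYTHON_VERSIONS,
            "package_types": PACKAGE_TYPES,
            "extensions": EXTENSIONS,
        }
        with open(config_file, 'w') as w:
            json.dump(config, w, indent=2)

    for c in sorted(config):
        print('%-15s = %s' % (c, config[c]))
    print('Using config file %s ' % config_file)

    return config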

Environment:

Python version: 2.x and 3.x
minirepo version: 1.0.3
Operating System: Zorin OS

OS Filters

When I ran minirepo, it downloaded only amd64 wheel files. I expected it to download both wheel files and source files. The config I used was:

{
  "processes"      : "20", 
  "package_types"  : ["bdist_wheel","sdist"], 
  "extensions"     : ["tgz","whl","gz"], 
  "python_versions": ["2.7","cp27","cp36","cp37","py2","py3","py2.py3","py27","py36","py37
  "repository"     : "/home/<username>/minirepo"
}

Then I noticed a platform filter in the code:
"platforms" : ["linux","win32","win_amd64","macOS"],
It looks like it covers all platforms, but I wonder whether there is some hard-coded filter that downloads amd64 only.

I ask because when I added the following line to the config above:
"platforms" : ["linux"],
and restarted minirepo, it took a while (I guess to process the huge packages.json file) and then downloaded this:

2020-05-05 20:32:14,417:WARNING: Downloaded: cmappertools-1.0.24-cp27-cp27m-win_amd64.whl Ok pid:8919 1% [277/23193.0]

Before that, it had downloaded some 12 GB of win_amd64 packages. Shouldn't it have downloaded only Linux wheels (...none-linux_arm.whl) and source files?
How can I make it download only Linux and source files?
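A minimal sketch of what such a filter could look like, assuming a "platforms" list like the one above. The function names are hypothetical and this is not minirepo's actual code: it reads the platform tag from a wheel filename (the last dash-separated field, following the PEP 427 naming convention) and keeps source distributions and pure-Python "any" wheels regardless of platform.

```python
def wheel_platform_tags(filename):
    """Return the platform tag(s) of a wheel filename.

    Wheel names follow name-version(-build)?-python-abi-platform.whl (PEP 427);
    the platform field may hold several tags joined by '.'.
    """
    stem = filename[:-len(".whl")]
    return stem.split("-")[-1].split(".")


def keep_file(filename, platforms):
    """Hypothetical filter: keep non-wheel files and wheels matching a wanted platform."""
    if not filename.endswith(".whl"):
        # Non-wheel files (e.g. sdist .tar.gz/.zip) are treated as platform-neutral here
        return True
    wanted = [p.lower() for p in platforms]
    for tag in wheel_platform_tags(filename):
        tag = tag.lower()
        # "any" marks a pure-Python wheel; substring matching lets "linux"
        # match tags such as "manylinux1_x86_64" or "linux_armv7l"
        if tag == "any" or any(p in tag for p in wanted):
            return True
    return False
```

With platforms set to ["linux"], this sketch would reject the cmappertools-1.0.24-cp27-cp27m-win_amd64.whl file from the log above.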

Repo does not have a license

Hi, I'd like to use minirepo in one of my projects, but I'm hesitant to do so without a license for the repo.
Would you please consider adding one? Perhaps MIT or BSD?
Thank you!

Replace Log-Based Progress with Visual Progress Bar and Total Size Calculation in minirepo

Issue Description:

The current implementation of the minirepo script provides progress updates through log messages that display the number of packages downloaded and a percentage completion. While functional, this method is less user-friendly compared to a visual progress bar. Additionally, the script does not calculate or display the total size to be downloaded before starting the process, which would be useful for users in managing disk space and network bandwidth.

Findings:

Log-Based Progress Reporting: The script currently logs the number of packages downloaded and the percentage of progress, for example: WARNING: Downloaded: example-package-1.0.0.whl Ok pid:1234 20% [2/10]. While this is informative, it lacks the visual clarity and continuous feedback of a graphical progress bar.
Absence of Total Download Size: The script does not calculate the total size of all packages to be downloaded before starting, which can lead to unexpected resource usage, particularly with large downloads.

Proposed Solution:

Implement a Visual Progress Bar: Replace the log-based progress reporting with a visual progress bar, using, for example, the tqdm library. The progress bar should show the percentage completed, the current download speed, and the total data downloaded so far.
Calculate and Display Total Download Size: Before starting the download process, calculate the total size of all packages that will be downloaded. Display this total alongside the progress bar so users know the expected resource usage.
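A minimal sketch of both ideas, assuming each file entry is a dict carrying a "size" field in bytes (as file entries in PyPI's JSON API do) and that download_one is a hypothetical callback that fetches one file and returns the number of bytes written. It uses tqdm's byte-scaled bar when tqdm is installed and falls back to plain downloading otherwise.

```python
def human_size(num_bytes):
    """Format a byte count using binary units (KiB, MiB, ...)."""
    size = float(num_bytes)
    for unit in ("B", "KiB", "MiB", "GiB"):
        if size < 1024:
            return "%.1f %s" % (size, unit)
        size /= 1024
    return "%.1f TiB" % size


def total_download_size(files):
    """Sum the 'size' field (bytes) of every file entry; missing sizes count as 0."""
    return sum(int(f.get("size", 0)) for f in files)


def download_all(files, download_one):
    """Print the total size up front, then download with a byte-based progress bar.

    download_one(entry) is a hypothetical callback returning the bytes written.
    """
    total = total_download_size(files)
    print("Total to download: %s" % human_size(total))
    try:
        from tqdm import tqdm  # optional third-party dependency
    except ImportError:
        tqdm = None
    if tqdm is None:
        for entry in files:
            download_one(entry)
        return
    # unit_scale/unit_divisor make tqdm display sizes and speed as KiB, MiB, ...
    with tqdm(total=total, unit="B", unit_scale=True, unit_divisor=1024) as bar:
        for entry in files:
            bar.update(download_one(entry))
```

Advancing the bar by bytes rather than by package count keeps the speed and ETA meaningful when package sizes vary widely.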

Benefits:

A visual progress bar provides continuous feedback and is more intuitive for users. Additionally, knowing the total size in advance allows users to make informed decisions about whether to proceed with the download.

Environment:

Python version: 2.x and 3.x
minirepo version: 1.0.3
Operating System: Zorin OS

feature request: sync minirepo with only new/updated packages

If I re-run the minirepo command, it downloads all packages again. It would be nice to download only new or updated packages (maybe keeping a MongoDB/SQLite database of downloaded packages internally).

It would also be nice to delete obsolete or old-versioned packages.
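A minimal sketch of the SQLite idea using Python's built-in sqlite3 module. The table and function names are hypothetical, and the MD5 digest is assumed to come from the package index (PyPI's JSON API reports one per file). A file is re-downloaded only when it is unknown or its digest changed, and files recorded locally but no longer listed in the index are reported as candidates for deletion.

```python
import sqlite3


def open_db(path):
    """Open (or create) the database of downloaded files."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS downloaded ("
        " filename TEXT PRIMARY KEY,"
        " md5 TEXT NOT NULL)"
    )
    return conn


def needs_download(conn, filename, md5):
    """True if the file was never downloaded or its digest differs from the index."""
    row = conn.execute(
        "SELECT md5 FROM downloaded WHERE filename = ?", (filename,)
    ).fetchone()
    return row is None or row[0] != md5


def mark_downloaded(conn, filename, md5):
    """Record (or update) a successfully downloaded file."""
    conn.execute(
        "INSERT OR REPLACE INTO downloaded (filename, md5) VALUES (?, ?)",
        (filename, md5),
    )
    conn.commit()


def obsolete_files(conn, current_filenames):
    """Return recorded files that no longer appear in the index, sorted by name."""
    current = set(current_filenames)
    rows = conn.execute("SELECT filename FROM downloaded")
    return sorted(name for (name,) in rows if name not in current)
```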

XML Parsing Error When Fetching Package Names from PyPI Simple API

Issue Description:

When using the minirepo script to fetch package names from the PyPI Simple API, an xml.etree.ElementTree.ParseError occurs. The error is due to attempting to parse HTML content as XML. The issue can be resolved by switching to an HTML parser like BeautifulSoup.
Steps to Reproduce:

* Run the minirepo script to fetch package names from PyPI Simple API.
* Observe the xml.etree.ElementTree.ParseError error message.

Expected Behavior:

The script should correctly parse the HTML content returned by the PyPI Simple API and extract package names without errors.

Actual Behavior:

The script throws an xml.etree.ElementTree.ParseError due to attempting to parse HTML content as XML.

Proposed Solution:

Replace the use of xml.etree.ElementTree with BeautifulSoup for parsing HTML content. The updated get_names function is given in the follow-up comment below.

Environment:

Python version: 3.x
minirepo version: 1.0.3
Operating System:  Zorin OS

This change ensures compatibility with the HTML content returned by the PyPI Simple API and prevents parsing errors.

Configuration file contents:
/root/.minirepo

{
  "processes": 10,
  "package_types": [
    "bdist_egg",
    "bdist_wheel",
    "sdist"
  ],
  "extensions": [
    "bz2",
    "egg",
    "gz",
    "tgz",
    "whl",
    "zip"
  ],
  "python_versions": [
    "3.0",
    "3.1",
    "3.2",
    "3.3",
    "3.4.10",
    "3.5.7",
    "3.6.9",
    "any",
    "cp27",
    "py2",
    "py2.py3",
    "py27",
    "source"
  ],
  "repository": "/data"
}

Proposed Solution:
Replace the use of xml.etree.ElementTree with BeautifulSoup for parsing HTML content. Below is the updated get_names function:

Install BeautifulSoup (imported as bs4):

pip install beautifulsoup4

Update the file:

vim /root/venv/lib64/python3.6/site-packages/minirepo.py

import sys

import requests
from bs4 import BeautifulSoup

def get_names():
    # Fetch the HTML index of all project names from the PyPI Simple API
    resp = requests.get('https://pypi.python.org/simple')
    resp.raise_for_status()
    html_content = resp.content.decode('utf-8')  # Ensure the content is a string

    try:
        # Parse the content as HTML (the Simple API does not return well-formed XML)
        soup = BeautifulSoup(html_content, 'html.parser')

        # Each <a> element's text is a project name
        return [a.text for a in soup.find_all('a')]
    except Exception as e:
        print(f"HTML ParseError: {e}")
        sys.exit(1)
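If an extra dependency is undesirable, the standard library's html.parser module can do the same job. A minimal sketch (Python 3; the class and function names are hypothetical) that collects the text of every <a> element, like the BeautifulSoup version above. Parsing is kept separate from fetching so it can be tested offline; parse_names(resp.text) would take the place of the BeautifulSoup call.

```python
from html.parser import HTMLParser


class AnchorTextParser(HTMLParser):
    """Collect the text content of every <a> element."""

    def __init__(self):
        super().__init__()
        self.names = []
        self._in_anchor = False
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._in_anchor = True
            self._chunks = []

    def handle_data(self, data):
        # Text may arrive in several chunks, so buffer it until </a>
        if self._in_anchor:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._in_anchor:
            self.names.append("".join(self._chunks).strip())
            self._in_anchor = False


def parse_names(html_content):
    """Return the anchor texts (project names) found in a Simple API page."""
    parser = AnchorTextParser()
    parser.feed(html_content)
    parser.close()
    return parser.names
```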

Typos in README

I've corrected two typos in the JSON in a local branch and would love to push it for review.

Dependency on requests isn't installed automatically

Really minor issue: when installing via pip, the requests dependency is not picked up, so an error occurs when running minirepo. It looks like the line in setup.py is commented out, presumably by accident?

From setup.py (approx. line 66):

install_requires=['requests'],

TypeError: unsupported operand type(s) for -: 'unicode' and 'int'

Running minirepo produces the following error without prompting for any input:

minirepo
/******** Minirepo ********/
extensions      = [u'bz2', u'egg', u'gz', u'tgz', u'whl', u'zip']
package_types   = [u'bdist_egg', u'bdist_wheel', u'sdist']
processes       = ls -la
python_versions = [u'2.7', u'any', u'cp27', u'py2', u'py2.py3', u'py27', u'source']
repository      = pwd
Using config file /home/ec2-user/.minirepo
Traceback (most recent call last):
  File "/home/ec2-user/.local/share/virtualenvs/pypi_minirepo-r1IsK5pf/bin/minirepo", line 11, in <module>
    sys.exit(main())
  File "/home/ec2-user/.local/share/virtualenvs/pypi_minirepo-r1IsK5pf/lib/python2.7/site-packages/minirepo.py", line 290, in main
    pool = mp.Pool(PROCESSES)
  File "/usr/lib64/python2.7/multiprocessing/__init__.py", line 232, in Pool
    return Pool(processes, initializer, initargs, maxtasksperchild)
  File "/usr/lib64/python2.7/multiprocessing/pool.py", line 159, in __init__
    self._repopulate_pool()
  File "/usr/lib64/python2.7/multiprocessing/pool.py", line 213, in _repopulate_pool
    for i in range(self._processes - len(self._pool)):
TypeError: unsupported operand type(s) for -: 'unicode' and 'int'
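The traceback shows multiprocessing.Pool receiving a unicode string as its process count: json.load returns whatever type the config file stores, so a value such as "processes": "20" (as in the OS Filters config above) stays a string. A minimal sketch of a hypothetical helper that coerces and validates the value before it reaches the pool:

```python
def get_processes(config, default=10):
    """Return config['processes'] as a positive int.

    JSON configs may store the value as a string (e.g. "20"), but
    multiprocessing.Pool needs an int.
    """
    value = config.get("processes", default)
    try:
        processes = int(value)
    except (TypeError, ValueError):
        # e.g. "ls -la" ends up here with a readable message instead of a traceback
        raise ValueError("invalid 'processes' value in config: %r" % (value,))
    if processes < 1:
        raise ValueError("'processes' must be at least 1, got %d" % processes)
    return processes
```

The call site would then become pool = mp.Pool(get_processes(config)).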

"Skip if already in repo" not skipping but downloading

Thanks for this great work. I've been running this script: the first time it downloads the packages, but on the second and later runs it downloads them again instead of skipping them: https://github.com/sganis/minirepo/blob/master/minirepo.py#L166
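For comparison, a minimal sketch of a skip check, assuming files are stored directly under the repository folder and that the index reports each file's size in bytes. The function name is hypothetical and this is not the code at the linked line. Comparing sizes, rather than only checking that the file exists, also re-fetches truncated downloads, and int() tolerates a size that arrives as a string.

```python
import os


def already_downloaded(repository, filename, expected_size):
    """Return True if repository/filename exists and has the expected size in bytes."""
    path = os.path.join(repository, filename)
    if not os.path.isfile(path):
        return False
    return os.path.getsize(path) == int(expected_size)
```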

Fails when Installing on a Mac

I installed the project twice, each time in a brand-new virtualenv (Python 2.7) on a Mac.

The first time I used pip; the second time I cloned your Git repo.

Alas, same error both times. I'm not sure whether it is a Mac issue, as it occurs in the multiprocessing calls.

The error is:

/******** Minirepo ********/
extensions = [u'bz2', u'egg', u'gz', u'tgz', u'whl', u'zip']
package_types = [u'bdist_egg', u'bdist_wheel', u'sdist']
processes = 1
python_versions = [u'3.4', u'py34']
repository = /Users/tim/minirepo
Using config file /Users/tim/.minirepo
Traceback (most recent call last):
  File "/Users/tim/pymirror/bin/minirepo", line 9, in <module>
    load_entry_point('minirepo==1.0.3', 'console_scripts', 'minirepo')()
  File "build/bdist.macosx-10.11-x86_64/egg/minirepo.py", line 290, in main
  File "/usr/local/Cellar/python/2.7.11/Frameworks/Python.framework/Versions/2.7/lib/python2.7/multiprocessing/__init__.py", line 232, in Pool
    return Pool(processes, initializer, initargs, maxtasksperchild)
  File "/usr/local/Cellar/python/2.7.11/Frameworks/Python.framework/Versions/2.7/lib/python2.7/multiprocessing/pool.py", line 159, in __init__
    self._repopulate_pool()
  File "/usr/local/Cellar/python/2.7.11/Frameworks/Python.framework/Versions/2.7/lib/python2.7/multiprocessing/pool.py", line 214, in _repopulate_pool
    for i in range(self._processes - len(self._pool)):
TypeError: unsupported operand type(s) for -: 'unicode' and 'int'

Implement Package Tracking for Efficient Syncing

Issue Description:

Currently, the minirepo.py script does not have the capability to track downloaded packages and only sync new or updated packages during subsequent runs. This can lead to unnecessary re-downloads of packages that have already been fetched, resulting in wasted bandwidth, increased runtime, and redundant storage use.

Findings:

Redundant Downloads
No Version or Update Checking
Absence of Metadata Storage: There is no local storage of metadata about downloaded packages (such as package name, version, size, and checksum).

Proposed Solution:

Implement Local Metadata Storage: Create a local metadata file (e.g., a JSON file) to store details about downloaded packages, including package name, version, file size, and MD5 hash.
Pre-Download Check Against Metadata: Before downloading a package, the script should check it against the local metadata.
Sync Only New or Updated Packages: Use the metadata to identify and download only new or updated packages during subsequent runs.
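A minimal sketch of the JSON-based approach (Python 3, since it relies on os.replace). The shape of the remote entries (dicts with "name", "version", "filename" and "md5" keys) and all function names are hypothetical; the MD5 value is assumed to come from the package index. Each finished download is verified against the index digest before being recorded, and the metadata file is written atomically so an interrupted run cannot leave it half-written.

```python
import hashlib
import json
import os


def file_md5(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def load_metadata(path):
    """Load the metadata file, or return an empty dict on the first run."""
    if not os.path.exists(path):
        return {}
    with open(path) as f:
        return json.load(f)


def save_metadata(path, metadata):
    """Write the metadata atomically: dump to a temp file, then rename it."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump(metadata, f, indent=2, sort_keys=True)
    os.replace(tmp_path, path)


def files_to_sync(remote_files, metadata):
    """Return remote entries that are new or whose MD5 differs from the recorded one."""
    return [
        entry for entry in remote_files
        if metadata.get(entry["filename"], {}).get("md5") != entry["md5"]
    ]


def record_download(metadata, entry, local_path):
    """Record a finished download after verifying its MD5 against the index."""
    actual = file_md5(local_path)
    if actual != entry["md5"]:
        raise ValueError("MD5 mismatch for %s" % entry["filename"])
    metadata[entry["filename"]] = {
        "name": entry["name"],
        "version": entry["version"],
        "size": os.path.getsize(local_path),
        "md5": actual,
    }
```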

Benefits:

By only downloading new or updated packages, the script will significantly reduce unnecessary data transfer, conserving bandwidth.

dependency resolver

Hi,
Do you use any dependency resolver for the list of packages? I skimmed your code and didn't see anything that checks a package's dependencies.
