sganis / minirepo
Create a local pypi repository to use pip off-line
License: MIT License
The get_config function in the minirepo script uses raw_input, which is only available in Python 2. This causes a NameError when running the script in Python 3. Additionally, the function prompts for user input, which is not ideal for non-interactive use.
Steps to Reproduce:
* Run the minirepo script with Python 3.
* Observe the NameError due to raw_input.
The script should work seamlessly in both Python 2 and Python 3, and it should avoid prompting for user input in a non-interactive environment.
The script throws a NameError due to the use of raw_input in Python 3, and it prompts for user input if the configuration file is missing or has errors.
Update the get_config function to ensure compatibility with both Python 2 and Python 3 by using input and setting default values without prompting for user input.
import json
import os

# On Python 2, make input() behave like raw_input(); on Python 3 this is a no-op.
try:
    input = raw_input
except NameError:
    pass

def get_config():
    config_file = os.path.expanduser("~/.minirepo")
    repository = os.path.expanduser("~/minirepo")
    processes = 10
    try:
        with open(config_file, 'r') as f:
            config = json.load(f)
    # IOError and ValueError exist on both Python 2 and 3; FileNotFoundError
    # and json.JSONDecodeError are Python 3 only and are subclasses of these.
    except (IOError, ValueError):
        newrepo = input('Repository folder [%s]: ' % repository)
        if newrepo:
            repository = newrepo
        newprocesses = input('Number of processes [%s]: ' % processes)
        if newprocesses:
            processes = int(newprocesses)
        # PYTHON_VERSIONS, PACKAGE_TYPES and EXTENSIONS are module-level
        # constants defined elsewhere in minirepo.py.
        config = {
            "repository": repository,
            "processes": processes,
            "python_versions": PYTHON_VERSIONS,
            "package_types": PACKAGE_TYPES,
            "extensions": EXTENSIONS
        }
        with open(config_file, 'w') as w:
            json.dump(config, w, indent=2)
    for c in sorted(config):
        print('%-15s = %s' % (c, config[c]))
    print('Using config file %s ' % config_file)
    return config
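Note that the function above still prompts when the config file is missing or malformed. A minimal non-interactive sketch, assuming it is acceptable to fall back silently to built-in defaults, might look like the following. The names `load_config` and `DEFAULTS` are hypothetical and not part of minirepo, and the default values are only illustrative.

```python
import json
import os

# Hypothetical defaults for illustration; minirepo's real defaults may differ.
DEFAULTS = {
    "repository": os.path.expanduser("~/minirepo"),
    "processes": 10,
}


def load_config(config_file, defaults=DEFAULTS):
    """Return the saved config merged over the defaults, never prompting."""
    config = dict(defaults)
    try:
        with open(config_file, "r") as f:
            loaded = json.load(f)
    except (IOError, OSError, ValueError):
        # Missing, unreadable, or malformed file: keep the defaults.
        # ValueError covers json.JSONDecodeError on Python 3 and the
        # plain ValueError that json raises on Python 2.
        loaded = {}
    if isinstance(loaded, dict):
        config.update(loaded)
    return config
```

With this shape, a cron job or CI run would get the defaults instead of blocking on a prompt, and an existing `~/.minirepo` would still override them key by key.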
Python version: 2.x and 3.x
minirepo version: 1.0.3
Operating System: Zorin OS
When I ran minirepo it was downloading amd64 wheel files only. I expected that it would download wheel files and source files. The config I used was
{
"processes" : "20",
"package_types" : ["bdist_wheel","sdist"],
"extensions" : ["tgz","whl","gz"],
"python_versions": ["2.7","cp27","cp36","cp37","py2","py3","py2.py3","py27","py36","py37
"repository" : "/home/<username>/minirepo"
}
Now I saw in the code something about a platform filter.
"platforms" : ["linux","win32","win_amd64","macOS"],
It looks like it covers all platforms. But I wonder if there is some hardcoded filter that downloads amd64 only?
Because when I added to the config above the following line
"platforms" : ["linux"],
and restarted it, it took a while (I guess to process the huge packages.json file) and then it downloaded this:
2020-05-05 20:32:14,417:WARNING: Downloaded: cmappertools-1.0.24-cp27-cp27m-win_amd64.whl Ok pid:8919 1% [277/23193.0]
Before that, it had downloaded some 12GB of win_amd64 packages. Shouldn't it have downloaded only Linux wheels (...none-linux_arm.whl) and source files?
How can I make it download Linux and Source files?
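For reference, here is a rough sketch of how a platform filter could be applied to file names. It is not minirepo's actual filter, which I have not verified. It only relies on the standard wheel naming convention, where the platform tag is the last dash-separated field before `.whl`. The function names `wheel_platform` and `wanted` are hypothetical.

```python
def wheel_platform(filename):
    """Return the platform tag of a wheel filename, or None for non-wheels."""
    if not filename.endswith(".whl"):
        return None
    # name-version[-build]-python-abi-platform.whl: platform is the last field
    return filename[:-len(".whl")].split("-")[-1]


def wanted(filename, platforms):
    """Keep sdists and 'any' wheels; keep other wheels on a substring match."""
    platform = wheel_platform(filename)
    if platform is None or platform == "any":
        return True
    platform = platform.lower()
    # Substring match so that "linux" also accepts manylinux tags.
    return any(p.lower() in platform for p in platforms)
```

Under this sketch, `wanted("cmappertools-1.0.24-cp27-cp27m-win_amd64.whl", ["linux"])` is false, while source archives and pure-Python wheels are always kept.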
Hi, I'd like to use minirepo in one of my projects, but I'm hesitant to do so without a license for the repo.
Would you please consider adding one? Perhaps MIT or BSD?
Thank you!
The current implementation of the minirepo script provides progress updates through log messages that display the number of packages downloaded and a percentage completion. While functional, this method is less user-friendly compared to a visual progress bar. Additionally, the script does not calculate or display the total size to be downloaded before starting the process, which would be useful for users in managing disk space and network bandwidth.
Log-Based Progress Reporting: The script currently logs the number of packages downloaded and the percentage of progress. Example: WARNING: Downloaded: example-package-1.0.0.whl Ok pid:1234 20% [2/10]
While this is informative, it lacks the visual clarity and continuous feedback provided by a graphical progress bar.
Absence of Total Download Size: The script does not calculate the total size of all packages to be downloaded before starting. This can lead to unexpected resource usage, particularly when dealing with large downloads.
Implement a Visual Progress Bar: Replace the log-based progress reporting with a visual progress bar using, for example, the tqdm library. The progress bar should show the percentage completed, the current download speed, and the total data downloaded so far.
Calculate and Display Total Download Size: Before starting the download process, calculate the total size of all packages that will be downloaded. Display this total alongside the progress bar, so users know the expected resource usage.
A visual progress bar provides continuous feedback and is more intuitive for users. Additionally, knowing the total size in advance allows users to make informed decisions about whether to proceed with the download.
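As a starting point for the total-size part, here is a small stdlib-only sketch. It assumes each file to be downloaded is available as a dict with a `size` field in bytes (as I understand the per-file entries in PyPI's JSON metadata to be); that assumption and the helper names `total_size` and `human_size` are mine, not minirepo's.

```python
def total_size(files):
    """Sum the 'size' field (bytes) over file records; missing sizes count as 0."""
    return sum(int(f.get("size") or 0) for f in files)


def human_size(num_bytes):
    """Format a byte count using binary units."""
    size = float(num_bytes)
    for unit in ("B", "KiB", "MiB", "GiB"):
        if size < 1024.0:
            return "%.1f %s" % (size, unit)
        size /= 1024.0
    return "%.1f TiB" % size
```

The resulting byte total could then be printed before the download starts, or passed as the `total` of a byte-based tqdm bar if tqdm is adopted.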
Python version: 2.x and 3.x
minirepo version: 1.0.3
Operating System: Zorin OS
If I re-run the minirepo command, it downloads all packages again. It would be nice to download only the new or updated packages (maybe by keeping an internal MongoDB/SQLite database of downloaded packages).
When using the minirepo script to fetch package names from the PyPI Simple API, an xml.etree.ElementTree.ParseError occurs. The error is due to attempting to parse HTML content as XML. The issue can be resolved by switching to an HTML parser like BeautifulSoup.
Steps to Reproduce:
* Run the minirepo script to fetch package names from PyPI Simple API.
* Observe the xml.etree.ElementTree.ParseError error message.
The script should correctly parse the HTML content returned by the PyPI Simple API and extract package names without errors.
The script throws an xml.etree.ElementTree.ParseError due to attempting to parse HTML content as XML.
Replace the use of xml.etree.ElementTree with BeautifulSoup for parsing HTML content. The updated get_names function is listed under "Proposed Solution".
Python version: 3.x
minirepo version: 1.0.3
Operating System: Zorin OS
This change ensures compatibility with the HTML content returned by the PyPI Simple API and prevents parsing errors.
configuration file contents
/root/.minirepo
{
  "processes": 10,
  "package_types": [
    "bdist_egg",
    "bdist_wheel",
    "sdist"
  ],
  "extensions": [
    "bz2",
    "egg",
    "gz",
    "tgz",
    "whl",
    "zip"
  ],
  "python_versions": [
    "3.0",
    "3.1",
    "3.2",
    "3.3",
    "3.4.10",
    "3.5.7",
    "3.6.9",
    "any",
    "cp27",
    "py2",
    "py2.py3",
    "py27",
    "source"
  ],
  "repository": "/data"
}
Proposed Solution:
Replace the use of xml.etree.ElementTree with BeautifulSoup for parsing HTML content. Below is the updated get_names function:
install bs4
pip install bs4
update the file
vim /root/venv/lib64/python3.6/site-packages/minirepo.py
import sys

import requests
from bs4 import BeautifulSoup

def get_names():
    # Fetch the HTML content from the PyPI Simple API
    resp = requests.get('https://pypi.python.org/simple')
    html_content = resp.content.decode('utf-8')  # Ensure the content is a string
    # Print the HTML content for debugging
    print("HTML content:")
    print(html_content)
    try:
        # Parse the HTML content
        soup = BeautifulSoup(html_content, 'html.parser')
        # Extract and return package names
        return [a.text for a in soup.find_all('a')]
    except Exception as e:
        print(f"HTML ParseError: {e}")
        sys.exit(1)
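If adding bs4 as a dependency is undesirable, the same anchor extraction can be sketched with the standard library's HTML parser instead. This is an alternative technique, not the fix proposed above; the class and function names (`AnchorTextParser`, `extract_names`) are hypothetical.

```python
try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser  # Python 2


class AnchorTextParser(HTMLParser):
    """Collect the text of every <a> element."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.names = []
        self._in_anchor = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._in_anchor = True
            self.names.append("")

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_anchor = False

    def handle_data(self, data):
        # Text may arrive in several pieces, so append to the current entry.
        if self._in_anchor:
            self.names[-1] += data


def extract_names(html_text):
    """Return the stripped link texts found in an HTML index page."""
    parser = AnchorTextParser()
    parser.feed(html_text)
    parser.close()
    return [n.strip() for n in parser.names]
```

Inside get_names, `extract_names(html_content)` would take the place of the BeautifulSoup calls.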
I've corrected two typos in the JSON in a local branch. I would love to push it for review.
Really minor issue... When installing via pip, the dependency for requests is not picked up, so an error occurs when running. Looks like the line in setup.py is commented out - assuming accidentally?
from setup.py (approx #66):
Running minirepo generates the following error with no human input taken.
minirepo
/******** Minirepo ********/
extensions = [u'bz2', u'egg', u'gz', u'tgz', u'whl', u'zip']
package_types = [u'bdist_egg', u'bdist_wheel', u'sdist']
processes = ls -la
python_versions = [u'2.7', u'any', u'cp27', u'py2', u'py2.py3', u'py27', u'source']
repository = pwd
Using config file /home/ec2-user/.minirepo
Traceback (most recent call last):
File "/home/ec2-user/.local/share/virtualenvs/pypi_minirepo-r1IsK5pf/bin/minirepo", line 11, in <module>
sys.exit(main())
File "/home/ec2-user/.local/share/virtualenvs/pypi_minirepo-r1IsK5pf/lib/python2.7/site-packages/minirepo.py", line 290, in main
pool = mp.Pool(PROCESSES)
File "/usr/lib64/python2.7/multiprocessing/__init__.py", line 232, in Pool
return Pool(processes, initializer, initargs, maxtasksperchild)
File "/usr/lib64/python2.7/multiprocessing/pool.py", line 159, in __init__
self._repopulate_pool()
File "/usr/lib64/python2.7/multiprocessing/pool.py", line 213, in _repopulate_pool
for i in range(self._processes - len(self._pool)):
TypeError: unsupported operand type(s) for -: 'unicode' and 'int'
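The traceback suggests that `processes` reached `mp.Pool` as a string rather than an integer (the printed config shows `processes = ls -la`). A small defensive sketch, assuming a fallback to a default is preferable to crashing, could normalize the value before the pool is created. The helper name `parse_processes` and the default of 10 are assumptions for illustration.

```python
def parse_processes(value, default=10):
    """Return value as a positive int, or default if it cannot be parsed."""
    try:
        count = int(value)
    except (TypeError, ValueError):
        return default
    return count if count > 0 else default
```

A config value such as `"20"` would then become `20`, and a stray entry such as `"ls -la"` would fall back to the default instead of raising a TypeError inside multiprocessing.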
Thanks for this great work. I've been running this script: the first run downloads the packages, but the second and every later run downloads them again instead of skipping them. See https://github.com/sganis/minirepo/blob/master/minirepo.py#L166
I just installed the project, both times in a brand new virtualenv (Python 2.7) on a Mac.
The first time I installed it with pip; the second time I pulled your Git repo.
Alas, I get the same error. I'm not sure whether it is a Mac issue, as it occurs in the multiprocessing calls.
The error is:
/******** Minirepo ********/
extensions = [u'bz2', u'egg', u'gz', u'tgz', u'whl', u'zip']
package_types = [u'bdist_egg', u'bdist_wheel', u'sdist']
processes = 1
python_versions = [u'3.4', u'py34']
repository = /Users/tim/minirepo
Using config file /Users/tim/.minirepo
Traceback (most recent call last):
File "/Users/tim/pymirror/bin/minirepo", line 9, in <module>
load_entry_point('minirepo==1.0.3', 'console_scripts', 'minirepo')()
File "build/bdist.macosx-10.11-x86_64/egg/minirepo.py", line 290, in main
File "/usr/local/Cellar/python/2.7.11/Frameworks/Python.framework/Versions/2.7/lib/python2.7/multiprocessing/__init__.py", line 232, in Pool
return Pool(processes, initializer, initargs, maxtasksperchild)
File "/usr/local/Cellar/python/2.7.11/Frameworks/Python.framework/Versions/2.7/lib/python2.7/multiprocessing/pool.py", line 159, in __init__
self._repopulate_pool()
File "/usr/local/Cellar/python/2.7.11/Frameworks/Python.framework/Versions/2.7/lib/python2.7/multiprocessing/pool.py", line 214, in _repopulate_pool
for i in range(self._processes - len(self._pool)):
TypeError: unsupported operand type(s) for -: 'unicode' and 'int'
Currently, the minirepo.py script does not have the capability to track downloaded packages and only sync new or updated packages during subsequent runs. This can lead to unnecessary re-downloads of packages that have already been fetched, resulting in wasted bandwidth, increased runtime, and redundant storage use.
By only downloading new or updated packages, the script will significantly reduce unnecessary data transfer, conserving bandwidth.
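One simple way to approach this without a database is to check the local file before each download. The sketch below assumes a SHA-256 digest is available for each remote file; that assumption and the names `sha256_of` and `needs_download` are illustrative, not existing minirepo code.

```python
import hashlib
import os


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large packages do not load into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def needs_download(path, expected_sha256=None):
    """True if the file is absent or its hash differs from the expected one."""
    if not os.path.isfile(path):
        return True
    if expected_sha256 is None:
        return False  # no hash available: trust the existing file
    return sha256_of(path) != expected_sha256.lower()
```

Hashing every file on each run costs disk reads, so comparing file sizes first, or recording verified files in a small index, may be a reasonable trade-off for large mirrors.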
Hi,
Do you use a dependency resolver for the list of packages? I skimmed your code and didn't see anything that checks a package's dependencies.