Comments (19)
Yes, there are a few ways of hosting example data of this sort. Investigating that ecosystem is on my to-do list actually.
from geoplot.
@choldgraf OK so I removed the data to a separate repo. Now what remains is removing these files from git history.
I don't suppose you know how to do that? It's an awful lot of magic...
from geoplot.
hhmmmm - it's something I've done but that was a long time ago :-)
usually I remind myself with this SO post:
this tool has always seemed helpful, though I've never used it since in my case it was usually just one file
https://rtyley.github.io/bfg-repo-cleaner/
A challenge here is that this rewrites git history, so I think it might mess up people's forks when they try to commit (double check this though). That said, it's a good reason to nip these things in the bud sooner rather than later...
from geoplot.
See here for some answers: https://twitter.com/huitseeker/status/909094893833695232
It sounds like people will need to rebase onto master if they've already got forks, but other than that I think you're safe to do this. Another person recommended the BFG thing above :-)
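For anyone with a fork, here is a rough sketch of what "rebase onto master" could look like after the history rewrite. The function name and all three arguments are hypothetical placeholders, and it assumes you still know the old (pre-rewrite) commit your branch was based on. A plain rebase may try to replay the old data-carrying commits, so the sketch uses `--onto` to carry over only your own commits.

```shell
# Sketch only: replay just your own commits onto the rewritten upstream branch.
# All three arguments are placeholders you would fill in yourself:
#   $1  old (pre-rewrite) upstream commit your branch was based on
#   $2  new (rewritten) upstream ref, e.g. upstream/master after a fetch
#   $3  your local branch
rebase_onto_rewritten() {
    old_base=$1
    new_base=$2
    branch=$3
    # Takes the commits in old_base..branch and reapplies them on new_base,
    # so none of the old, data-carrying history is dragged along.
    git rebase --onto "$new_base" "$old_base" "$branch"
}

# Example (hypothetical names):
#   git fetch upstream
#   rebase_onto_rewritten <old-master-sha> upstream/master my-feature
```

After this, the fork's branch would itself need a force push, since its history has changed too.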
from geoplot.
I tried the manual way provided in the StackOverflow thread, but that did not help: after pushing a rebase, the data was still there.
I will try the BFG approach tomorrow.
from geoplot.
(geoplot) Honorss-MacBook-Air-42:geoplot.git Honors$ java -jar ../bfg-1.12.15.jar --delete-folders "data" .
Using repo : /Users/Honors/Desktop/geoplot.git/.
Found 196 objects to protect
Found 4 tag-pointing refs : refs/tags/0.0.1, refs/tags/0.0.2, refs/tags/0.0.3, refs/tags/0.0.4
Found 4 commit-pointing refs : HEAD, refs/heads/master, refs/pull/34/head, refs/pull/34/merge
Protected commits
-----------------
These are your protected commits, and so their contents will NOT be altered:
* commit 613080fd (protected by 'HEAD')
Cleaning
--------
Found 257 commits
Cleaning commits: 100% (257/257)
Cleaning commits completed in 635 ms.
Updating 7 Refs
---------------
Ref Before After
----------------------------------------
refs/heads/master | 613080fd | ca2eecfd
refs/pull/34/head | 56f0e66c | c61b4257
refs/pull/34/merge | 04f1aefb | 63c3ca9b
refs/tags/0.0.1 | 5822a3d5 | 58b4d3cc
refs/tags/0.0.2 | 364a880c | b9dd4bf7
refs/tags/0.0.3 | 227db476 | 654cbec4
refs/tags/0.0.4 | c8a23d08 | a54b814a
Updating references: 100% (7/7)
...Ref update completed in 39 ms.
Commit Tree-Dirt History
------------------------
Earliest Latest
| |
...............DDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD
D = dirty commits (file tree fixed)
m = modified commits (commit message or parents changed)
. = clean commits (no changes to file tree)
Before After
-------------------------------------------
First modified commit | 9ddd5f1f | e5b19f0b
Last dirty commit | 42dc047a | 8e55e249
In total, 291 object ids were changed. Full details are logged here:
/Users/Honors/Desktop/geoplot.git/..bfg-report/2017-09-17/10-42-56
BFG run is complete! When ready, run: git reflog expire --expire=now --all && git gc --prune=now --aggressive
--
You can rewrite history in Git - don't let Trump do it for real!
Trump's administration has lied consistently, to make people give up on ever
being told the truth. Don't give up: https://github.com/bkeepers/stop-trump
--
(geoplot) Honorss-MacBook-Air-42:geoplot.git Honors$ git reflog expire --expire=now --all && git gc --prune=now --aggressive
Counting objects: 2184, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (2148/2148), done.
Writing objects: 100% (2184/2184), done.
Total 2184 (delta 1091), reused 890 (delta 0)
(geoplot) Honorss-MacBook-Air-42:geoplot.git Honors$ git push
fatal: remote error:
You can't push to git://github.com/ResidentMario/geoplot.git
Use https://github.com/ResidentMario/geoplot.git
(geoplot) Honorss-MacBook-Air-42:geoplot.git Honors$ git push --set-upstream https://github.com/ResidentMario/geoplot.git master
To https://github.com/ResidentMario/geoplot.git
! [rejected] master -> master (fetch first)
error: failed to push some refs to 'https://github.com/ResidentMario/geoplot.git'
hint: Updates were rejected because the remote contains work that you do
hint: not have locally. This is usually caused by another repository pushing
hint: to the same ref. You may want to first integrate the remote changes
hint: (e.g., 'git pull ...') before pushing again.
hint: See the 'Note about fast-forwards' in 'git push --help' for details.
(geoplot) Honorss-MacBook-Air-42:geoplot.git Honors$ git push --set-upstream https://github.com/ResidentMario/geoplot.git master --force
Counting objects: 1997, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (1013/1013), done.
Writing objects: 100% (1997/1997), 116.10 MiB | 48.00 KiB/s, done.
Total 1997 (delta 963), reused 1980 (delta 955)
remote: Resolving deltas: 100% (963/963), done.
To https://github.com/ResidentMario/geoplot.git
+ 613080f...ca2eecf master -> master (forced update)
Branch master set up to track remote branch master from https://github.com/ResidentMario/geoplot.git.
It seems to have worked. When I look at the repo commit history, I no longer see a data folder in any commits!
But...
(geoplot) Honorss-MacBook-Air-42:Desktop Honors$ git clone https://github.com/ResidentMario/geoplot.git
Cloning into 'geoplot'...
remote: Counting objects: 2285, done.
remote: Compressing objects: 100% (95/95), done.
remote: Total 2285 (delta 87), reused 176 (delta 85), pack-reused 2105
Receiving objects: 100% (2285/2285), 152.76 MiB | 5.76 MiB/s, done.
Resolving deltas: 100% (1062/1062), done.
Checking connectivity... done.
...it's still 150 MiB.
The files seem to be well and truly gone. But the repo is still the same size as it was before!
from geoplot.
huh, that's strange! and the files are gone from history and everything?
from geoplot.
Yes. I asked this Q on StackOverflow.
from geoplot.
you need to force-push all the tags and branches as well (not just master)
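A minimal sketch of that, assuming the remote is called `origin` (the function name is made up for illustration):

```shell
# Sketch only: force-push every local branch and every local tag, so that no
# remote ref keeps pointing at the old, pre-rewrite commits.
push_rewritten_refs() {
    remote=${1:-origin}
    git push --force --all "$remote"    # all branches under refs/heads
    git push --force --tags "$remote"   # all tags under refs/tags
}

# Example:
#   push_rewritten_refs origin
```

One caveat I'm not certain about: as far as I know, GitHub does not let you update `refs/pull/*` from the client side, so the two pull refs in the BFG output above may keep old objects alive on the server until GitHub cleans them up. Treat that as an assumption to check against GitHub's own docs.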
from geoplot.
@asottile I ran:
git tag -d 0.0.3
git tag 0.0.3 fb27de2
git tag -d 0.0.4
git tag 0.0.4 9f7e5a9
git push --tags origin --force
Which netted:
Total 0 (delta 0), reused 0 (delta 0)
To https://github.com/ResidentMario/geoplot.git
+ 227db47...fb27de2 0.0.3 -> 0.0.3 (forced update)
+ c8a23d0...9f7e5a9 0.0.4 -> 0.0.4 (forced update)
But downloading and unpacking [email protected] from here opens up 191 MB on disk (are these numbers going...up?).
from geoplot.
Those aren't the revisions I expect given the output above. Your tags still contain the data history:
$ git tag -l | xargs --replace bash -c 'echo ============ && echo {} && echo ============ && git log --oneline {} -- data'
============
0.0.1
============
ec23e99 Demos.
cac4125 Work on aggplot.
e42eb0c Swap examples.
1114725 Populate examples plage.
2420b0e Another example.
57c20c8 Another example. Implement custom geometry in sankey.
fee0fd9 Another example.
30a4c34 Another example.
51c0d28 Another example.
ccbb393 WSubplotting, first example done.
9ddd5f1 Upload data.
============
0.0.2
============
ec23e99 Demos.
cac4125 Work on aggplot.
e42eb0c Swap examples.
1114725 Populate examples plage.
2420b0e Another example.
57c20c8 Another example. Implement custom geometry in sankey.
fee0fd9 Another example.
30a4c34 Another example.
51c0d28 Another example.
ccbb393 WSubplotting, first example done.
9ddd5f1 Upload data.
============
0.0.3
============
ec23e99 Demos.
cac4125 Work on aggplot.
e42eb0c Swap examples.
1114725 Populate examples plage.
2420b0e Another example.
57c20c8 Another example. Implement custom geometry in sankey.
fee0fd9 Another example.
30a4c34 Another example.
51c0d28 Another example.
ccbb393 WSubplotting, first example done.
9ddd5f1 Upload data.
============
0.0.4
============
ec23e99 Demos.
cac4125 Work on aggplot.
e42eb0c Swap examples.
1114725 Populate examples plage.
2420b0e Another example.
57c20c8 Another example. Implement custom geometry in sankey.
fee0fd9 Another example.
30a4c34 Another example.
51c0d28 Another example.
ccbb393 WSubplotting, first example done.
9ddd5f1 Upload data.
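If the `xargs --replace` form doesn't run for you (I believe it is a GNU option, and the session above is on macOS), a plain loop can do roughly the same check. This is a sketch under that assumption. The function name is hypothetical, and it prints only the names of branches and tags whose history still touches the given path.

```shell
# Sketch only: list every branch and tag whose history still contains
# at least one commit touching the given path.
refs_with_path_history() {
    path=$1
    git for-each-ref --format='%(refname)' refs/heads refs/tags |
    while read -r ref; do
        # git log prints nothing if no commit reachable from $ref touched $path
        if [ -n "$(git log --oneline -1 "$ref" -- "$path")" ]; then
            echo "$ref"
        fi
    done
}

# Example:
#   refs_with_path_history data
```

An empty result would suggest that no local branch or tag still reaches the removed folder. It says nothing about refs that only exist on the remote.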
from geoplot.
For example, I expect the 0.0.4 tag to point to 4e1fbf5 (part of your master history)
from geoplot.
Yeah I noticed this too (that they still contain the folder, even). I didn't delete the GitHub tags before pushing the local ones, which appears to have resulted in no change (?).
I ran a bunch of git push --delete origin 0.0.x commands (per this Gist), then recreated the 0.0.4 release using the button on GitHub. That did seem to work: there's now just [email protected], which is 24 MB zipped. The repo is now down to 110 MiB when cloned, but that's still clearly too much?
from geoplot.
To summarize the changes thus far: I followed the BFG sequence above and, on the advice above, deleted all of the old tags (0.0.1 through 0.0.4). Then I created a new 0.0.4 tag based on the present state of the repository.
Overall, this reduced git clone size from 150-ish MiB to 100-ish MiB, which still doesn't seem correct to me.
After further reflection I'm realizing that there are other large file diffs that I have pushed into history that are causing this excessive size. The figures folder contains a set of images for the website that were updated relatively often; there used to be an html folder with the raw website output. I've now cleaned both out.
The other large files that remain are the tutorial and API reference generator notebooks, which contain a lot of images as well. Practically speaking, the solution is going to be to fork all of that stuff off into a separate repository, e.g. geoplot.github.io. I can host the website off of that domain (instead of my own) and haul all that cruft over to there instead of here.
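For tracking down which files are responsible, something like the following can list the largest blobs anywhere in history, including files that have since been deleted from the working tree. It is a sketch, not what I actually ran. The function name is made up, and the output format (size in bytes, then path) is just what this particular pipeline produces.

```shell
# Sketch only: print the N largest blobs reachable from any ref,
# as "<size in bytes> <path>", biggest first.
largest_blobs() {
    count=${1:-10}
    git rev-list --objects --all |
        git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
        awk '$1 == "blob"' |      # keep file contents, drop commits and trees
        cut -d' ' -f3- |          # keep "<size> <path>"
        sort -rn |
        head -n "$count"
}

# Example:
#   largest_blobs 20
```

Files that change often (like regenerated images) will show up as many separate entries, one per version, which is where the size tends to add up.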
I'm still not happy with the size of the repo on clone, but I'd like to prioritize feature work for a bit...
from geoplot.
one option is to use something like sphinx-gallery, which would let you include examples as .py files and it'd collect the image outputs etc and render them notebook-like online. It shouldn't be too hard to get that working if it's a big part of the size.
from geoplot.
(e.g., see http://martinos.org/mne/dev/auto_tutorials/plot_sensors_decoding.html for an example from another package I work on)
from geoplot.
Down to ~35 MB now, after very aggressive history pruning. This is likely as good as it's going to get!
from geoplot.
woot! that's an order of magnitude improvement...nice!
from geoplot.