dalehenrich / fsgit

Git FileSystem implementation

Home Page: http://www.squeaksource.com/FSGit.html

Smalltalk 100.00%

fsgit's Introduction

#FileSystem-Git

Note: Since both FileSystem and FileSystem-Git are under active development, a lot of things may break or already be broken when you load the packages.

FileSystem-Git is a Git implementation for Pharo Smalltalk. The focus of this project is to bring the power of version management that Git provides to the Pharo environment without forcing users to use a command line (at least for day-to-day workflows).

This repository is part proof of concept and part backup for the development of FileSystem-Git and FileTree.

##Installation instructions

###Pharo 1.4 and Pharo 2.0

Get the latest `FileSystem` packages:

  Gofer new
    squeaksource3: 'Pharo20';
    package: 'FileSystem-Core';
    package: 'FileSystem-Disk';
    package: 'FileSystem-Memory';
    package: 'FileSystem-AnsiStreams';
    package: 'FileSystem-Tests-Core';
    package: 'FileSystem-Tests-Disk';
    package: 'FileSystem-Tests-Memory';
    package: 'FileSystem-Tests-AnsiStreams';
    load

Load FileSystem-Git:

  Gofer new
    url: 'http://smalltalkhub.com/mc/MaxLeske/FileSytem-Git/main';
    package: 'System-Hashing';
    package: 'FileSystem-Git';
    load

fsgit's Issues

Implement a proper git index

the monster called GitModificationManager needs to be replaced by a proper index. There are two reasons for this:

- the implementation is hideous and uses all sorts of implicit magic to track changes
- the modification manager only works as long as all changes are performed through a GitFilesystem. In the case of FileTree, however, that isn't enough because FileTree relies on FileDirectory. This means that changes to the repository happen without a GitFilesystem knowing about them, and the modification manager will think that there are no changes (major WTF...)

validation issue after adding second mcz file to repository

Pharo1.3 - #13315, FSGit 2.2.5 ...

I created an empty git repo (Git, not OS-Git), then copied (using the Monticello Browser) an existing mcz file (ConfigurationOfMetacello-dkh.675) from http://seaside.gemstone.com/ss/metacello and everything looked fine. Then I copied ConfigurationOfMetacello-dkh.674 from the same repository into the git repo and got a validation error (sorry, don't have the stack).

It appears that ConfigurationOfMetacello-dkh.674 did get added to the repo, but I can't tell you what happened to the source tree ... I ended up blowing the whole thing away

I looked at the error message, but it didn't give me a clue as to the problem (I guess I was expecting an error and was surprised to see the mcz file in the repo).

Need a platform class

We need to better isolate the GUI-specific classes/methods from the core code ... GemStone doesn't have any Morphic classes, so MCGitRepository>>morphicOpen: needs to be in a platform-specific package

ZnEasy referenced in GitDumbHtppProtocol>>readFile:

Either FS-Git-Remote requires Zinc, or this one slipped through

multiple packages per commit

If I recall correctly, the model in the MCGitRepository in FSGit performs one commit per package save, but in the work I'm currently doing for Metacello I think that the granularity of commits should allow for multiple packages to be grouped together in a single commit ... so we'll need some sort of staging going on ...

I assume that this type of thing is possible with Git, but I'm curious about your thoughts on this ...

Presumably the commit would be done in an ensure block while the individual packages are written to the repo (staged?) ... we'd need some error handling to possibly back out of the commit on error, but then we will need to worry about the packages that are no longer dirty but haven't been committed to the repo (probably mark them dirty again?) ... solvable but sticky
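A minimal workspace sketch of that staging idea, assuming nothing about the real MCGitRepository API: the package names, the `writePackage` block and the simulated failure are all made up. It uses `on:do:` instead of `ensure:` because the back-out should run only when something fails.

```smalltalk
"Workspace sketch. Package names and the failure in Pkg-C are hypothetical;
 no FSGit or Monticello API is used."
| packages dirty staged committed writePackage |
packages := #('Pkg-A' 'Pkg-B' 'Pkg-C').
dirty := Set withAll: packages.
staged := OrderedCollection new.
committed := false.

"Stand-in for writing one package into the repository (staging)."
writePackage := [:name |
    name = 'Pkg-C' ifTrue: [Error signal: 'simulated write failure'].
    staged add: name.
    dirty remove: name].

[packages do: [:name | writePackage value: name].
 "Only reached when every package was staged: one commit for the whole group."
 committed := true]
    on: Error
    do: [:error |
        "Back out: nothing is committed and the staged packages become dirty again."
        dirty addAll: staged.
        staged := OrderedCollection new].
```

After evaluating this, `committed` is still false, `staged` is empty and all three packages are back in `dirty`, which is the "mark them dirty again" behaviour described above.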

handle large number of objects

See issue #11.

Even when we're not talking about packs we need to consider doing more work on disk to keep the memory free.
One idea would be to introduce a threshold. Below the threshold, all work would be done in memory for performance.

This also concerns memory filesystems in a special way since, obviously, everything is supposed to happen in memory. I do not think that it's likely that someone will use such a large repository in memory but we should be prepared.

#lastVersions: returns more commits than it is supposed to

lastVersions: passes the integer by value to a recursive function. The quick fix is to use an object instead (e.g. an association) to make sure that all calls use the same integer as limit.
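To illustrate the quick fix, here is a small workspace sketch that is not the actual `lastVersions:` code: the commit graph, the `walk` block and the `#remaining` key are hypothetical. Every recursive call decrements the same Association, so the limit holds across branches.

```smalltalk
"Hypothetical commit graph: each commit maps to its parents (c5 is a merge)."
| parents limit result walk |
parents := Dictionary new.
parents at: #c5 put: #(#c4 #c3).
parents at: #c4 put: #(#c2).
parents at: #c3 put: #(#c2).
parents at: #c2 put: #(#c1).
parents at: #c1 put: #().

"Shared, mutable limit: all recursive calls see and update the same value."
limit := #remaining -> 3.
result := OrderedCollection new.
walk := [:commit |
    (limit value > 0 and: [(result includes: commit) not]) ifTrue: [
        result add: commit.
        limit value: limit value - 1.
        (parents at: commit) do: [:parent | walk value: parent]]].
walk value: #c5.
result size "3"
```

With a limit of 3 the depth-first walk collects c5, c4 and c2 and never reaches c3, the second parent of the merge.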

Another thing: should this method really walk the graph depth first?

[ENH]: List all available versions for a package

This is required for dalehenrich/filetree#50: "[ENH]: Show all available versions in MC Tools"

add abstraction layer between FileSystem and plumbing

`Metacello git`?

While thinking about using git in Smalltalk this morning it occurred to me that there is a natural way to integrate git into Metacello using Metacello git and executing commands like the following:

Metacello git
  project: 'Metacello';
  repository: 'git://github.com/dalehenrich/metacello-work.git';
  load "which does a clone of the repository  and then a load from the repo"

Metacello git
  project: 'Metacello';
  cherryPick: '-x -n 0b0165a620c433341601cac5ee01888f40c44c66'

Metacello git
  project: 'Metacello';
  pull: 'origin master'

Metacello git
  project: 'Metacello';
  status

Metacello git
  project: 'Metacello';
  log

Metacello git
  project: 'Metacello';
  diff

Metacello git
  project: 'Metacello';
  commit: '-a -m"this is a commit comment"'

Not sure that I like the command line arg passing, but it maps directly to the git documentation, which is a real plus ... we could write a little argument parser to pick apart the command line and turn it into proper Smalltalk message sends underneath the covers.
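As a rough sketch of the first half of such a parser (the `tokenize` block is hypothetical and not part of Metacello), the argument string can be split on spaces while keeping double-quoted text together. Mapping the tokens to message sends is left out.

```smalltalk
"Hypothetical tokenizer: splits on spaces, keeps double-quoted text together."
| tokenize |
tokenize := [:line | | tokens current inQuotes |
    tokens := OrderedCollection new.
    current := WriteStream on: String new.
    inQuotes := false.
    line do: [:char |
        char = $"
            ifTrue: [inQuotes := inQuotes not]
            ifFalse: [
                (char = Character space and: [inQuotes not])
                    ifTrue: [
                        current contents isEmpty ifFalse: [tokens add: current contents].
                        current := WriteStream on: String new]
                    ifFalse: [current nextPut: char]]].
    current contents isEmpty ifFalse: [tokens add: current contents].
    tokens].

tokenize value: '-a -m"this is a commit comment"'.
tokenize value: 'origin master'.
```

The first call yields the two tokens `-a` and `-mthis is a commit comment` (the quotes are dropped, as a shell would do); the second yields `origin` and `master`.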

We could do something like:

Metacello git
  project: 'Metacello';
  a;
  m: 'this is a commit comment';
  commit

but then positional args would be difficult to handle ... not sure how to map the following to proper Smalltalk:

Metacello git
  project: 'Metacello';
  pull: 'origin master'

Anyway, some food for thought ...

add theseion as collaborator

theseion aka max

MCGitRepository one more layer of refactoring?

MCGitRepository>>basicStoreVersion:

I want to experiment with different forms of identifying the parent commit.
- head used now, but I think that a Monticello git repository might represent a specific tag/branch/commit
I want to experiment with a different GitWriter ...
- presumably the writer could be completely pluggable
- for that matter the fileSystem may be as well ...
I want to experiment with different commit combinations
- I know that tags are used for extracting mcz versions, but there are alternate techniques possible

retrieving loose objects loads the same trees as distinct objects

when retrieving loose objects from a repository, trees are loaded from their files but also via the load mechanism of the commit object. Thus the number of tree objects in the image might be 6 while there is in reality only one (e.g. 5 commits with no changes to the tree entries).
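One possible direction, sketched with made-up names (`treeCache`, `loadTree`, a string standing in for a tree object): route every load through a cache keyed by object hash, so both load paths end up with the same instance.

```smalltalk
"Hypothetical cache keyed by object hash; the string stands in for a tree object."
| treeCache loadCount loadTree first second |
treeCache := Dictionary new.
loadCount := 0.
loadTree := [:hash |
    treeCache at: hash ifAbsentPut: [
        "Only evaluated on a cache miss, i.e. once per distinct hash."
        loadCount := loadCount + 1.
        'tree ', hash]].

first := loadTree value: 'abc123'.    "e.g. loaded from the loose object file"
second := loadTree value: 'abc123'.   "e.g. requested again by a commit"
first == second "true: both paths share one instance"
```

A plain Dictionary keeps every tree alive for as long as the cache exists, so a real implementation would probably want some eviction or a weak collection.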

change naming scheme from FSxxx to FileSystemxxx

create index from pack file

In certain cases it will be necessary to create a new index for a pack file. Find out how.

large number of objects blow memory

Even having only a collection of object names in the image might blow the image memory. Consider a collection of 20-byte hashes with size > 2G. That alone might be enough and if not, something like collection asSortedCollection might well double the memory consumption (not sure how the VM handles integers).
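A quick back-of-the-envelope check, reading "size > 2G" as roughly 2^31 entries (that reading is an assumption):

```smalltalk
| entryCount bytesPerName payloadBytes |
entryCount := 2 raisedTo: 31.       "assumed reading of the 2G figure"
bytesPerName := 20.                 "one SHA-1 object name"
payloadBytes := entryCount * bytesPerName.
payloadBytes / (1024 raisedTo: 3).  "40, i.e. 40 GiB of raw hash bytes"
(2 raisedTo: 160) class.            "LargePositiveInteger"
```

That is the raw hash payload only; object headers and the collection's pointer slots come on top. On the integer question: a 160-bit value is far outside the SmallInteger range, so a hash held as an integer is a heap-allocated LargePositiveInteger rather than an immediate value.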

This problem is pretty urgent. I can't even properly test the 8-byte offset table in the pack index file because I can't generate enough object names.
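A possible way around this for testing: in version 2 of Git's pack index format, a 4-byte offset entry with its most significant bit set holds an index into the 8-byte offset table, so that table comes into play when a pack offset reaches 2^31, independent of how many objects there are. If the index writer can be fed synthetic offsets (an assumption about FSGit's design), the encoding could be exercised like this. The `encodeOffset` block and `largeOffsets` collection are hypothetical names.

```smalltalk
"Hypothetical encoder for one entry of the 4-byte offset table."
| largeOffsets encodeOffset small large |
largeOffsets := OrderedCollection new.   "becomes the 8-byte offset table"
encodeOffset := [:offset |
    offset < (2 raisedTo: 31)
        ifTrue: [offset]
        ifFalse: [
            "Store the real offset in the large table and emit
             its zero-based index with the most significant bit set."
            largeOffsets add: offset.
            (2 raisedTo: 31) bitOr: largeOffsets size - 1]].

small := encodeOffset value: 12.
large := encodeOffset value: (2 raisedTo: 32) + 5.
```

Here `small` stays 12, `large` becomes 16r80000000 (index 0 with the top bit set), and `largeOffsets` holds the single real offset 4294967301.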

Create pack files that include delta chains

Packs are currently written without deltification. Find out how to deltify objects.