microsoft / scalar

Scalar: A set of tools and extensions for Git to allow very large monorepos to run on Git without a virtualization layer

License: MIT License

C# 84.49% Shell 12.49% Batchfile 2.53% PowerShell 0.49%

scalar's Issues

Create a perf test suite

To give us high confidence that customer satisfaction will be greater than it would be on VFS for Git, I think we want to measure identical scenarios on both VFS for Git sparse mode and scalar. Ideally we can also configure what part of the cone is available, so we can compare and contrast different sizes of enlistments. We should use representative sparse enlistments for different segments of our customer base.

Thoughts on the approach or execution?

[Mount Removal] Remove disk layout upgrade code

UPDATE

The upgrade steps are run as part of scalar mount, which will be going away as part of the mount removal process.

We should remove the upgrade code as part of removing the mount process. If we need to perform disk layout upgrades in the future, they will need to be driven by the service and/or the installer.

We no longer need back-compat logic for previous GVFS disk layouts. We will dramatically change the way we store the repo config, and hopefully do so before we ship to EA.

At some point, we will not allow breaking changes and then will need upgrade logic. Should we delete the disk layout code now and then redesign/reimplement the upgrade logic when we need it?

cc: @mjcheetham

[Mount Removal] Remove BlobSizesRoot related code

This cleanup task will make it easier to remove the mount process.

Create PR validation pipeline

Must have:
macOS build + unit tests
Windows build + unit tests

Rename EVERYTHING

New name TBD.

Replace all instances of "GVFS" and "VFSForGit" (and variants) in the filenames using an exact-rename commit.
Drop the base "GVFS" folder in the same commit.
Perform text-based replacement of "GVFS", "VFSForGit", and variants in most places (some things need to stay, such as the Git package name).

Use a derivative of calver for scalar

It would be ideal if the versioning scheme gave a better indication of the time of release and the milestone associated with the bits.

I propose the following, where {build} = counter(SourceRef, 0)

Source ref	Version template	Version example	Build Number
refs/heads/releases/19.08.157	{yy}.{MM}.{Milestone}.{build}	19.08.157.1	Release-19.08.157.1
refs/pull/25/merge	10.20.{PRNum}.{build}	10.20.25.1	PR-25.1
refs/heads/master	{yy}.{MM}.{dd}.{build}	19.08.10.33	CI-master.33
refs/tags/tagname	{yy}.{MM}.{dd}.{build}	19.08.10.34	CI-tagname.34
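
The table above can be read as a small mapping from source ref to version string. The following is a minimal Python sketch of that mapping, not the pipeline's actual implementation. The function name `version_for_ref` is hypothetical, the `build` argument stands in for counter(SourceRef, 0), and the sketch assumes that the {yy}.{MM}.{Milestone} parts of a release version are taken from the release branch name.

```python
import re
from datetime import date


def version_for_ref(source_ref: str, build: int, today: date) -> tuple[str, str]:
    """Return (version, build_number) for a source ref, following the table.

    `build` stands in for counter(SourceRef, 0); how the pipeline supplies it
    is outside the scope of this sketch.
    """
    # Release branches: assumed to carry yy.MM.Milestone in the branch name.
    release = re.fullmatch(r"refs/heads/releases/(\d{2})\.(\d{2})\.(\d+)", source_ref)
    if release:
        yy, mm, milestone = release.groups()
        version = f"{yy}.{mm}.{milestone}.{build}"
        return version, f"Release-{version}"

    # Pull requests: the literal 10.20 prefix is copied from the table.
    pull = re.fullmatch(r"refs/pull/(\d+)/merge", source_ref)
    if pull:
        pr_num = pull.group(1)
        return f"10.20.{pr_num}.{build}", f"PR-{pr_num}.{build}"

    # Other branches and tags share the date-based template.
    name = source_ref.split("/", 2)[-1]
    version = f"{today:%y}.{today:%m}.{today:%d}.{build}"
    return version, f"CI-{name}.{build}"
```

Passing the date and counter in as arguments keeps the function pure, so each row of the table can be checked directly.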

Sparse: Check for superseding parents in input

BUG: if we run git sparse-checkout set A A/B, then A is registered as a recursive closure AND A/B is marked as a recursive closure. This also means that A is marked as a parent path.

This results in Git complaining that the patterns are not cone-style and reverting to the slow pattern-matching algorithm.

To fix, consider removing paths from the "parent" list if they are in the "recursive" list. Further: remove children from the recursive list.
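
The proposed fix could look roughly like the following Python sketch. It is an illustration of the idea only, not the code in Git or Scalar; the function name `normalize_cone_input` and the two-set representation are assumptions made for this example.

```python
from pathlib import PurePosixPath


def normalize_cone_input(paths):
    """Return (recursive, parents) sets of directory paths for cone input.

    recursive: directories whose whole subtree is requested.
    parents:   proper ancestors of the recursive directories.
    """
    requested = {PurePosixPath(p.strip("/")) for p in paths if p.strip("/")}

    # Remove children from the recursive list: a path is dropped when one of
    # its ancestors was also requested (A/B is redundant if A is present).
    recursive = {
        p for p in requested
        if not any(ancestor in requested for ancestor in p.parents)
    }

    # Parents are the proper ancestors of the surviving recursive paths.
    # PurePosixPath(".") denotes the repository root and is skipped.
    root = PurePosixPath(".")
    parents = {
        ancestor
        for p in recursive
        for ancestor in p.parents
        if ancestor != root
    }

    return {str(p) for p in recursive}, {str(p) for p in parents}
```

Because children are removed first, no surviving recursive path can also appear in the parent set, so the input A A/B reduces to a single recursive entry A with no parents.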

Progress indicators in `git read-tree -mu HEAD`

scalar sparse --add needs progress indicators. These progress indicators need to be in two places (at least):

BlobPrefetcher needs to provide feedback as it discovers and downloads blobs.
git read-tree -mu HEAD needs to provide feedback as it populates the working directory.

These are very different solutions, so this issue will track git read-tree -mu HEAD.

[Mount Removal] Move repo registration from mount verb to clone verb

Repo registration (with Scalar.Service) should happen during scalar clone rather than mount.

Additionally, the clone verb itself should update the registration file (or each clone should create its own file) and Scalar.Service should read the file(s) to discover which repos have been registered.

Additionally:

Scalar.Service should also remove repos when it finds they're no longer on disk.
There should be a verb for manually registering/unregistering repos
Functional tests should not register when cloning or they should register in a test location that does not impact the installed Scalar.Service

Performance: git add

We need to investigate what we can do about git add in our target enlistment.

git add -p from src took 31s.
git add . from src took 43s.

These are no-op adds with fsmonitor and untracked cache.

Remove code that handles scenarios for old clones

There is some code specifically for 'old' clones that we should remove.

Ex.

scalar/Scalar/CommandLine/ScalarVerb.cs

Line 807 in 8372f6b

 // Note: Repos cloned with a version of Scalar that predates the local cache will not have a local cache configured 

Please audit the code and ensure any unnecessary code is removed.

Git: core.gvfs config setting

The core.gvfs config setting does a lot of things, including blocking unwanted commands.

This setting was dropped as part of the rename effort (#38) and should be put back for now.

However, that config option does a lot of things that we may not want in the Scalar world. Update Git to split those actions apart based on other config options, or add a core.scalar option for our situation.

For example, we still want to block git gc, but that could be part of core.virtualizeobjects instead.

[Mount Removal] Update 'scalar status' verb to communicate with Scalar.Service and read RepoMetadata directly, or remove the verb entirely

Currently scalar status connects to the Scalar mount process to retrieve information about the repo. Assuming that this verb is kept, it should connect to Scalar.Service instead and/or read information about the repo from RepoMetadata.dat directly.

This work depends on #112 and #6

Create new algorithm for prefix-matching `sparse-checkout`

The current algorithm for matching in the sparse-checkout file will not scale to thousands of patterns over millions of files. We need something better.

Match the prefix-matching pattern from the VFS for Git Sparse Mode.
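
One way to make matching independent of the number of patterns is to keep the directories in hash sets and test each file path against its own ancestors. The Python sketch below illustrates that idea under stated assumptions: it is not the VFS for Git or Git implementation, the function name `is_included` is hypothetical, and it assumes that files at the repository root are always included.

```python
def is_included(path, recursive, parents):
    """Decide whether a file path falls inside the sparse cone.

    recursive: set of directories whose entire subtree is included.
    parents:   set of directories whose immediate files are included.
    """
    parts = path.split("/")
    if len(parts) == 1:
        return True  # assumption: root-level files are always included

    # Files directly inside a parent directory are included.
    directory = "/".join(parts[:-1])
    if directory in parents:
        return True

    # Walk the ancestor directories from shallowest to deepest and stop at
    # the first one that is a recursive entry.
    prefix = ""
    for part in parts[:-1]:
        prefix = part if not prefix else prefix + "/" + part
        if prefix in recursive:
            return True
    return False
```

Each lookup costs time proportional to the depth of the path rather than to the number of patterns, which is the property needed for thousands of patterns over millions of files.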

Remove LibGit2

Can we get rid of LibGit2 entirely? Here are some tradeoffs:

We currently track which downloaded objects are blobs or not. Dropping this stat would save time, and the batched read-object hook (#7) would make that less important.
CommitAndRootTreeExists() exists for a VFS for Git reason: to see if we need to prefetch the folders at a commit on clone time so we can generate an index before projecting. This isn't needed any more.
The LooseObjectsStep checks for corrupt loose objects. We'll have fewer objects with the batched read-object, and we could teach git pack-objects to clear corrupt objects, perhaps.

Outside of that last one, many of these changes are super small and don't have a huge impact on the full story.

'sparse' verb should record how long the git sparse-checkout took to run

Currently the log file only includes the overall time, and not the time of the git command specifically.

Prototype git sparse verb

Goal: provide a user-friendly experience around configuring the sparse checkout.

Scope: Verb to add and remove entire directories from sparse enlistment.

Non-goals: High-performance application of sparse-checkout file.

Progress indicator for vfs helper

While acquiring a set of objects prior to a workdir changing operation like checkout or reset, we should show progress similar to fetch's.

The clone verb does not properly initialize the repo if --no-mount is passed in

If the mount process is not running the clone verb will be unable to download the objects it needs to complete the checkout.

Update Functional Test data after rename

The functional tests use an old copy of the GVFS repo, so all of the paths use GVFS in the names. Those paths were modified automatically as part of the rename operation (#38).

As functional tests are re-added, we will need to revert the changes to those paths, but it will be a manual process.

Delete HealthVerb

Git commands have segmentation fault after staging and unstaging files

Steps to reproduce:

Scalar clone a repo
Make changes to tracked files
Create untracked files
git add all of the changes made above
git restore --staged .
Run a git command (e.g. git status)

Result:

Segmentation fault: 11

Remove GitHooksLoader

The GitHooksLoader should not be needed, as we only have the read-object hook, which is already native.

Install/configure watchman as part of the functional tests

Customers will always be configured to use Watchman in Scalar repos, and so our functional tests should configure/use Watchman as well.

Set scalar.telemetry-pipe in installer

Set scalar.telemetry-pipe in the installer to get telemetry from scalar daemons.
This is a peer of gvfs.telemetry-pipe.

scalar.telemetry-pipe=scalar-c780ac06-135a-4e9e-ab6c-d41e2d265baa

We do NOT need the corresponding scalar.telemetry-id (like we have in gvfs).

[Mount Removal] Perform maintenance jobs in the service rather than the mount process

This is part of the work required to eliminate the mount process.

Special care will need to be taken regarding ACLs. The service runs with elevation, and we need to make sure that non-elevated git processes are still able to use the files produced by the maintenance tasks.

As an alternative, we should investigate running Scalar.Service as the user rather than as admin.

Investigate having the 'sparse' verb call 'git status' before finishing to prime the untracked cache

After adding a large set of cones to the repo using scalar sparse --add-stdin the first git status took a long time:

~/ScalarTests/repo/src>git status
On branch master
Your branch is up to date with 'origin/master'.

It took 17.52 seconds to enumerate untracked files. 'status -uno'
may speed it up, but you have to be careful not to forget to add
new files yourself (see 'git help status').
nothing to commit, working tree clean

If we had the sparse verb call git status before it finishes users would have a better experience running git status for the first time.

Install: scripted installation

We need to produce a simple scripted installation for macOS that pulls together Scalar, Git, GCM Core, Watchman, and internal tooling, and correctly configures everything. This is to support demo scenarios and automation like perf and large build runs.

Remove reliance on mount process from read-object hook

Rather than sending a message to the mount process, investigate simply acquiring the objects inline.

Sparse: Update SparseVerb to interact with `git sparse-checkout add`, drop other uses

We have the SparseVerb from VFS for Git. Update it to be a small layer over git sparse-checkout add with an additional scalar prefetch --folders-list first. That prefetch, along with #62, will make the expansion much faster.

Resumability: batch requests in vfs helper

In order to have partial resumability in the event of a network failure, we should limit the number of objects we request in one go and ask for multiple batches rather than one large batch.

This also opens the door for parallelization later.
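
As a rough illustration of the batching idea, the Python sketch below splits a list of object IDs into fixed-size batches and skips batches that already completed. It is a sketch under stated assumptions, not the helper's real design: `fetch_batch` is a hypothetical callable that downloads one batch and raises on failure, the default batch size is arbitrary, and persisting the `completed` set between runs is left out.

```python
def batches(object_ids, batch_size):
    """Yield successive lists of at most batch_size object IDs."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(object_ids), batch_size):
        yield object_ids[start:start + batch_size]


def download_all(object_ids, fetch_batch, batch_size=4000, completed=None):
    """Fetch object_ids batch by batch, skipping batches already completed.

    fetch_batch is a hypothetical callable that downloads one batch and
    raises on failure. completed is a set of batch indexes that finished in
    an earlier attempt; it is only meaningful if object_ids and batch_size
    are the same as in that attempt.
    """
    completed = set() if completed is None else completed
    for index, batch in enumerate(batches(object_ids, batch_size)):
        if index in completed:
            continue  # already downloaded before the failure
        fetch_batch(batch)
        completed.add(index)
    return completed
```

If a network failure interrupts the loop, only the batch in flight is lost; a retry with the same inputs and the recorded `completed` set resumes from that batch.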

Planning: feature branch in microsoft/git

This issue is to facilitate discussion.

In microsoft/git#171, we introduce the git sparse-checkout builtin. This has the features we need to get moving on the sparse clones in Scalar, but it is not ready for merging into vfs-2.22.0. In particular, we need to get feedback from the mailing list before we take a hard dependency on it, especially in the shipped version with microsoft/vfsforgit.

Here is my proposal:

Create a new feature branch features/sparse-checkout in microsoft/git.
The feature branch will include all updates to sparse-checkout (#8) and batch object downloading (#7, #36).
As vfs-2.22.0 advances, we can dual-checkin if it is a critical change. This should happen rarely as we are mostly doing upstream-first development in Git for VFS.
As git/git and git-for-windows/git ship new versions, microsoft/git gets a new vfs-2.XX.0 branch. The features/sparse-checkout will then be rebased on top of that using a force-push.
As features/sparse-checkout updates, we generate installers with suffix -sc to indicate this is something to consume in Scalar but not VFS for Git.

This setup should allow us to merge PRs like #54 and start working on functional tests, follow-up features, and perf tests.

/cc @jrbriggs, @wilbaker, @jeffhostetler, @kewillford, @jeschu1, @mjcheetham, @garimasi514, @nickgra.

Sparse: Add functional test workflow

Create a functional test set that follows a typical workflow around a sparse enlistment:

scalar clone --sparse=true
Verify root files only.
scalar sparse add
Verify folders are added.

May be combined with #76.

Sparse: Update functional tests to check sparse mode

After #76, update the sparse-mode functional tests to work with a sparse scalar clone.

Installer does not auto-unmount enlistments

When installing the product, we need to manually unmount all scalar mount processes.

Create a transport layer to use the gvfs protocol

We'll want to gather objects in bulk before a workdir changing operation like checkout.

Integrate fsmonitor with Watchman on macOS

For optimal performance, we need git status to run in O(modified) time. The fsmonitor feature exists in Git, and we should take advantage of it.

@dscho is working on this, but I can't assign it to him for some reason.

Progress indicators in BlobPrefetcher

scalar sparse --add needs progress indicators. These progress indicators need to be in two places (at least):

BlobPrefetcher needs to provide feedback as it discovers and downloads blobs.
git read-tree -mu HEAD needs to provide feedback as it populates the working directory.

These are very different solutions, so this issue will track BlobPrefetcher.

Rename the product

While we do that, flatten the repo.

Reduce error/exception noise by having maintenance tasks check if the repo exists before running

Copied from microsoft/VFSForGit#1447

Remove GitHooksLoader related installation code

Related to #11

There is code in HooksInstaller (e.g. MergeHooksData) that is specific to GitHooksLoader and can be removed.

https://github.com/microsoft/gsd/blob/b04ff8960ed2f31d0d1e0c4f400960459b44dee8/GVFS/GVFS.Common/FileSystem/HooksInstaller.cs#L26

Rewrite README

The README is a leftover from VFS for Git. It needs updating. Perhaps it should just point to the roadmap for now?

[Mount Removal] Remove 'mount' and 'unmount' verbs, and remove the service verb options for mounting and unmounting

These verbs are no longer needed once the mount process is removed.

This issue depends on:

Create CI pipeline on master

Should rely largely on yaml from #3. Publishes installers.

Prefetch --stdin-folders-list doesn't match cone patterns

If we supply the same set of paths to git sparse-checkout add and scalar prefetch --stdin-folders-list, the prefetch command gets a smaller set of files than the sparse-checkout requires when writing files to disk. This leads to a very slow first checkout, even after prefetching.

The real solution is described in #36.

However, it may be worth a temporary fix to the BlobPrefetcher to match a few more paths and speed this up in the short term.

Scalar installs into C:\Program Files\GVFS

I'm not sure what instance of "GVFS" in the codebase causes the installer to write into C:\Program Files\GVFS, but it requires the GVFS.Service and other GVFS.Mount processes to be terminated for the Scalar installer to work.

Finish conversion to .NET Core

We'll fully embrace the new project model and drop remaining support for .NET Framework.

Fsmonitor: If watchman is installed, then set up fsmonitor

During a scalar clone, we can go ahead and set up the default fsmonitor hook if we detect that watchman is installed. This is orthogonal to #66, as we can assume the demo machine already has watchman installed independently. When #66 is complete, then the check will be redundant, but the hook placement will still work.

Precompute list of needed blobs before a checkout

Rather than fall back on read-object one-by-one, let's precompute what's needed, a la partial.

Ongoing: Port work from VFS for Git

Work in microsoft/vfsforgit sometimes needs a corresponding change here in microsoft/scalar.

Add a comment linking to the PR(s) that need porting to Scalar.

(Use 👍 to indicate you are working on it, 🚀 to indicate the item is done. 👎 for "don't need")
