tomono's Introduction

Multi- to Monorepo Migration

This script merges multiple independent tiny repositories into a single “monorepo”. Every original repo is moved into its own subdirectory, and branches that share a name across repos are merged into one monorepo branch of that name. See Example for the details.

Download the tomono script from github.com/hraban/tomono.

Features

  • 🕙 Full history of all your prior repos is intact, no changes to checksums
  • #️⃣ Signatures of old repos stay valid
  • 🔁 Create the monorepo and keep pulling in changes from your minirepos later
  • 🔀 Pull in entire new repos as you go, no need to prepare the whole thing at once
  • 🏷 Tags are namespaced to avoid clashes, but tag signatures remain valid
  • 🉑 Branches with weird names (slashes, etc)
  • 👥 No conflicts between files with the same name
  • 📁 Every project gets its own subdirectory

Usage

Run the tomono script with your config on stdin, in the following format:

$ cat my-repos.txt
git@github.com:mycompany/my-repo-abc.git  abc
git@github.com:mycompany/my-repo-def.git  def
git@github.com:mycompany/my-lib-uuu.git   uuu  lib/uuu
git@github.com:mycompany/my-lib-zzz.git   zzz  lib/zzz
https://gitee.com/shijie/zhongguo.git     **

Concrete example:

$ cat my-repos.txt | /path/to/tomono

That should be all ✅.
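
Each config line holds a repository URL, a name, and optionally a target path; as the Implementation section shows, a missing path falls back to the name. Here is a minimal sketch to preview how your lines will be read, without touching any repository. The function name preview_config is made up for this illustration and is not part of tomono:

```shell
# Hypothetical helper (not part of tomono): show which subdirectory each
# config line would map to. Mirrors the script's "path defaults to name" rule.
preview_config() {
    while read -r repourl reponame repopath; do
        if [ -z "$repopath" ]; then
            repopath="$reponame"
        fi
        printf '%s: %s -> %s/\n' "$reponame" "$repourl" "$repopath"
    done
}
```

Invoke it as preview_config < my-repos.txt and check the right-hand column before running the real thing.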

Custom name for monorepo directory

By default the monorepo is created in a directory named core. Don’t like that name? Set a different one through an envvar before running the script:

export MONOREPO_NAME=the-big-repo
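
The script picks the directory name with a shell default expansion (see the Implementation section): an unset or empty MONOREPO_NAME becomes core. A small sketch of that behaviour follows; the function name monorepo_dir exists only for this illustration:

```shell
# Illustrative only: resolve the monorepo directory the same way the script
# does, in a subshell so the caller's environment is left alone.
monorepo_dir() {
    (
        : "${MONOREPO_NAME:=core}"
        echo "$MONOREPO_NAME"
    )
}
```

With the variable unset this prints core; after the export above it prints the-big-repo.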

Custom “master” / “main” branch name

No need to do anything. This script does not handle any master / main branch in any special way. It just merges whatever branches exist. Don’t have a “master” branch? None will be created.

Make sure your own computer has the right branch set up in its init.defaultBranch setting.

Continue existing migration

Large teams can’t afford to “stop the world” while a migration is in progress. You’ll be fixing stuff and pulling in new repositories as you go.

Here’s how to pull in an entirely new set of repositories:

/path/to/tomono --continue < my-new-repos.txt

Make sure your environment is set up exactly as it was for the initial run (including MONOREPO_NAME, if you set it). In particular, you must be in the parent dir of the monorepo.

Tags

Tags are namespaced per remote, to avoid clashes. If your remote foo and bar both have a tag v1.0.0, your monorepo ends up with foo/v1.0.0 and bar/v1.0.0 pointing at their relevant commits.

If you don’t like this rewriting, you can fetch all tags from a specific remote to the top-level of the monorepo:

$ git fetch --tags foo

Be prepared to deal with any conflicts.
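
Before running that fetch, it can help to see which tags would clash. A minimal sketch, assuming the monorepo already holds the namespaced tags for the remote (foo/v1.0.0 and so on); the function name clashing_tags is invented here and is not part of tomono:

```shell
# Hypothetical helper, not part of tomono: print every tag of remote $1 whose
# bare name already exists as a top-level tag in the current repository.
clashing_tags() {
    remote="$1"
    git for-each-ref --format '%(refname:lstrip=3)' "refs/tags/$remote/" |
        while read -r name; do
            if git show-ref --verify --quiet "refs/tags/$name"; then
                echo "$name"
            fi
        done
}
```

Run clashing_tags foo inside the monorepo; an empty result suggests the top-level fetch will not overwrite or reject anything.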

Lightweight vs. Annotated Tags

N.B.: This namespacing works for all tags: lightweight, annotated, signed. However, for the latter two, there is one snag: an annotated tag contains its own tag name as part of the tag object. I have chosen not to modify the object itself, so the annotated tag object thinks it still has its old name. This is a mixed bag: it depends on your case whether that’s a feature or a bug. One major advantage of this approach is that signed tags remain valid. But you will occasionally get messages like:

$ git describe linux/v5.9-rc4
warning: tag 'linux/v5.9-rc4' is externally known as 'v5.9-rc4'
v5.9-rc4-0-gf4d51dffc6c0

If you know what you’re doing, you can force update all signed and annotated tags to their (nested) ref tag name with the following snippet:

git for-each-ref --format '%(objecttype) %(refname:lstrip=2)' | \
    sed -ne 's/^tag //p' |
    GIT_EDITOR=true xargs -I + -n 1 -- git tag -f -a + +^{}

N.B.: this will convert all signed tags to regular annotated tags (their signatures would fail anyway).

Source: GitHub user mwasilew2.
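
Before rewriting anything, you may want to list which annotated tags are affected. This is a sketch only, relying on the %(tag) field of git for-each-ref (the name stored inside the tag object); the function name renamed_tags is made up:

```shell
# Hypothetical helper: print "<ref name> <embedded name>" for every annotated
# (or signed) tag whose embedded name differs from the ref it is stored under.
# Lightweight tags have object type "commit" and are skipped.
renamed_tags() {
    git for-each-ref --format '%(objecttype) %(refname:lstrip=2) %(tag)' refs/tags |
        awk '$1 == "tag" && $2 != $3 { print $2, $3 }'
}
```

Every line it prints is a tag that git describe would warn about.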

Example

Run these commands to set up a fresh directory with git multi-repos that you can later merge:

Initial setup of fake repos

d="$(mktemp -d)"
echo "Setting up fresh multi-repos in $d"
cd "$d"

mkdir foo
(
    cd foo
    git init
    git commit -m "foo’s empty root" --allow-empty
    echo "This is foo" > i-am-foo.txt
    git add -A
    git commit -m "foo’s master"
    git tag v1.0
    git checkout -b branch-a
    echo "I am a new foo feature" > feature-a.txt
    git add -A
    git commit -m "foo’s feature branch A"
)

mkdir 中文
(
    cd 中文
    git init
    echo "你好" > 你好.txt
    git add -A
    git commit -m "中文的root"
    git tag v1.0
    git checkout -b branch-a
    echo "你好 from feature-a" > feature-a.txt
    git add -A
    git commit -m "new 中文 feature branch A"
    git branch branch-b master
    git checkout branch-b
    echo "I am an entirely new 中文 feature: B" > feature-b.txt
    git add -A
    git commit -m "中文’s feature branch B"
)

You now have two directories:

  • foo (branches: master, branch-a)
  • 中文 (branches: master, branch-a, branch-b)

Combine into monorepo

Assuming the tomono script is in your $PATH, you can invoke it like this, from that same directory:

tomono <<EOF
$PWD/foo foo
$PWD/中文 中文
EOF

This will create a new directory, core, where you can find a git tree which looks somewhat like this:

*   b742af2 Merge 中文/branch-a (branch-a)
|\
| * c05c53c new 中文 feature branch A (中文/branch-a)
* |   a51d138 Merge foo/branch-a
|\ \
| * | ebb490a foo’s feature branch A (foo/branch-a)
* | | a08fa18 Root commit for monorepo branch branch-a
 / /
| | *   c53bf94 Merge 中文/branch-b (branch-b)
| | |\
| | | * 5e7f4f5 中文’s feature branch B (中文/branch-b)
| | |/
| |/|
| | * 2738327 Root commit for monorepo branch branch-b
| |
| | *   9a4b33a Merge 中文/master (HEAD -> master)
| | |\
| | |/
| |/|
| * | a9841a8 中文的root (tag: 中文/v1.0, 中文/master)
|  /
| *   b75840e Merge foo/master
| |\
| |/
|/|
* | 1515265 foo’s master (tag: foo/v1.0, foo/master)
* | f71fcde foo’s empty root
 /
* 7803cf5 Root commit for monorepo branch master

Pull in new changes from a remote

It’s possible that while you’re working on setting up your fresh monorepo, new changes have been pushed to the existing single repos:

(
    cd foo
    echo New changes >> i-am-foo.txt
    git commit -va -m 'New changes to foo'
)

Because their history was imported verbatim and nothing has been rewritten, you can import those changes into the monorepo.

First, fetch the changes from the remote:

$ cd core
$ git fetch foo

Now merge your changes using subtree merge:

git checkout master
git merge -X subtree=foo/ foo/master

And the updates should be reflected in the monorepo:

$ cat foo/i-am-foo.txt
This is foo
New changes

I used the branch master in this example, but any branch works the same way.

Continue

Now imagine you want to pull in a third repository into the monorepo:

mkdir zimlib
(
    cd zimlib
    git init
    echo "This is zim" > i-am-zim.txt
    git add -A
    git commit -m "zim’s master"
    git checkout -b branch-a
    echo "I am a new zim feature" > feature-a.txt
    git add -A
    git commit -m "zim’s feature branch A"
    # And some more weird stuff, to mess with you
    git checkout master
    git checkout -d
    echo top secret > james-bond.txt
    git add -A
    git commit -m "I am unreachable"
    git tag leaking-you HEAD
    git checkout --orphan empty-branch
    git rm --cached -r .
    git clean -dfx
    git commit -m "zim’s tricky empty orphan branch" --allow-empty
)

Continue importing it:

echo "$PWD/zimlib zim lib/zim" | tomono --continue

Note that we used a different name for this subrepo, inside the lib dir.

The result is that it gets imported into the existing monorepo, alongside the existing two projects:

$ cd core
$ git checkout master
Switched to branch 'master'
$ tree
.
├── foo
│   └── i-am-foo.txt
├── lib
│   └── zim
│       └── i-am-zim.txt
└── 中文
    └── 你好.txt

4 directories, 3 files
$ git checkout branch-a
Switched to branch 'branch-a'
$ tree
.
├── foo
│   ├── feature-a.txt
│   └── i-am-foo.txt
├── lib
│   └── zim
│       ├── feature-a.txt
│       └── i-am-zim.txt
└── 中文
    ├── feature-a.txt
    └── 你好.txt

4 directories, 6 files
$ head **/feature-a.txt
==> foo/feature-a.txt <==
I am a new foo feature

==> lib/zim/feature-a.txt <==
I am a new zim feature

==> 中文/feature-a.txt <==
你好 from feature-a

Implementation

(This section is best viewed in HTML form; the GitHub Readme viewer misses some info.)

The outer program structure is a flat bash script which loops over every repo supplied over stdin:

<<init>>

# Note this is top-level in the script so it’s reading from the script’s stdin
while <<windows-fix>> read -r repourl reponame repopath; do
    if [[ -z "$repopath" ]]; then
        repopath="$reponame"
    fi

    <<handle-remote>>
done

<<finalize>>

# <<copyright>>

Per repository

Every repository is fetched and fully handled individually, and sequentially:

  1. fetch all the data related to this repository,
  2. immediately check out and initialise every single branch which belongs to that repository.

git remote add "$reponame" "$repourl"
git config --add "remote.$reponame.fetch" "+refs/tags/*:refs/tags/$reponame/*"
git config "remote.$reponame.tagOpt" --no-tags
git fetch --atomic "$reponame"

<<list-branches>> | while read -r branch ; do
    <<handle-branch>>
done

The remotes are configured to make sure that a default fetch always fetches all tags, and also puts them in their own namespace. The default refspec for tags is +refs/tags/*:refs/tags/*; as you can see, that puts everything from the remote at the same level in your monorepo. Obviously that will cause clashes, so we add the reponame as an extra namespace.

The --no-tags option is the complement to --tags, which has that default refspec we don’t want. That’s why we disable it and roll our own, entirely.
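
To see the effect of this configuration in isolation, here is a throwaway experiment. It is a sketch under assumptions: the directory names foo and mono, the tag v1.0.0 and the demo identity are all invented, and --atomic is left out so it also runs on older git versions.

```shell
set -eu
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.invalid"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.invalid"

work="$(mktemp -d)"
cd "$work"

# A source repo with one commit and one tag.
git init -q foo
git -C foo commit -q --allow-empty -m "foo root"
git -C foo tag v1.0.0

# A second repo, configured the same way as in the snippet above.
git init -q mono
cd mono
git remote add foo "$work/foo"
git config --add remote.foo.fetch "+refs/tags/*:refs/tags/foo/*"
git config remote.foo.tagOpt --no-tags
git fetch -q foo

# Only the namespaced tag should exist; no top-level v1.0.0.
tags="$(git tag -l)"
echo "$tags"
```

If the refspec works as described, the only tag listed is foo/v1.0.0.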

Per branch (this is where the magic happens)

In the context of a single repository, every branch is independently read into a subdirectory for that repository, and merged into the monorepo.

This is the money shot.

<<move-files-to-subdirectory>>
<<ensure-on-target-branch-in-monorepo>>

git read-tree --prefix "$repopath" "refs/remotes/$reponame/$branch"
tree="$(git write-tree)"
merge_commit="$(git commit-tree \
    "$tree" \
    -p "$branch" \
    -p "$move_commit" \
    -m "Merge $reponame/$branch")"
git reset -q "$merge_commit"

Source: https://git-scm.com/book/en/v2/Git-Internals-Git-Objects
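
The merge commit built this way should have exactly two parents: the monorepo branch and the move commit. One way to check that afterwards is sketched here; the helper name parent_count is hypothetical:

```shell
# Hypothetical helper: print how many parents a commit has.
# "git rev-list --parents -n 1" prints the commit followed by its parents.
parent_count() {
    git rev-list --parents -n 1 "$1" | awk '{ print NF - 1 }'
}
```

For example, parent_count branch-a in the finished monorepo is expected to report 2 when the tip is one of these merges.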

Move files to a subdirectory

The files are moved in a separate, isolated pre-merge step: this helps keep the merge commit a “pure” merge and helps git log --follow heuristics.

git read-tree "$empty_tree"
git read-tree --prefix "$repopath" "refs/remotes/$reponame/$branch"
tree="$(git write-tree)"
move_commit="$(git commit-tree \
    "$tree" \
    -p "refs/remotes/$reponame/$branch" \
    -m "Move all files to $repopath/")"

Source: https://stackoverflow.com/a/17440474/4359699
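
The same plumbing can be tried on a scratch repository to see what the move commit looks like. In this sketch the file name, the prefix lib/foo and the demo identity are assumptions made for the example:

```shell
set -eu
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.invalid"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.invalid"

# Throwaway repository with one file at its root.
cd "$(mktemp -d)"
git init -q
echo "This is foo" > i-am-foo.txt
git add -A
git commit -q -m "foo root"
original="$(git rev-parse HEAD)"

# Empty the index, read the original tree back in under a prefix, and record
# the result as a child of the original commit. The worktree is not touched.
repopath="lib/foo"
empty_tree="$(git hash-object -t tree /dev/null)"
git read-tree "$empty_tree"
git read-tree --prefix "$repopath" "$original"
tree="$(git write-tree)"
move_commit="$(git commit-tree "$tree" -p "$original" -m "Move all files to $repopath/")"

# List the files recorded in the move commit.
moved_files="$(git ls-tree -r --name-only "$move_commit")"
echo "$moved_files"
```

The listing should show the file under its new path, lib/foo/i-am-foo.txt, while the original commit stays untouched as the parent.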

Ensure we are on the right branch

In this snippet, we ensure that we are ready to merge fresh code from a subrepo into this branch: either we checkout an existing branch in the monorepo by this name, or we create a fresh one.

We are given the variable $branch which is the final name of the branch we want to operate on. It is the same as the name of the branch in each individual target repo.

if ! git show-ref --verify --quiet "refs/heads/$branch"; then
    root_commit="$(git commit-tree \
        "$empty_tree" \
        -m "Root commit for monorepo branch $branch")"
    git branch -- "$branch" "$root_commit"
fi
git symbolic-ref HEAD "refs/heads/$branch"
git reset -q

Instead of using git checkout --orphan and trying to create a new empty commit from the index, we create the empty commit directly and point the new branch to it. Then, we read the branch, new or existing, into the index. Now we have the current index representing the branch, and HEAD pointing at the branch. This allows us to stay in the index and avoid the worktree.

Working with HEAD feels odd, and it requires using git reset to update the branch, rather than git branch -f ..., because the branch is checked out. This is still more reliable than not pointing HEAD at the branch, because HEAD is always pointing at some branch (e.g. “master”), so it is easier to just assume you’re always pointing at the “current” branch.
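
To convince yourself that this is safe to run repeatedly, the snippet can be wrapped in a function and called twice on a scratch repository. This is a sketch; the function name ensure_branch and the branch names are illustrative only:

```shell
set -eu
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.invalid"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.invalid"

cd "$(mktemp -d)"
git init -q
empty_tree="$(git hash-object -t tree /dev/null)"

# Same steps as the snippet above, wrapped in a function.
ensure_branch() {
    branch="$1"
    if ! git show-ref --verify --quiet "refs/heads/$branch"; then
        root_commit="$(git commit-tree "$empty_tree" \
            -m "Root commit for monorepo branch $branch")"
        git branch -- "$branch" "$root_commit"
    fi
    git symbolic-ref HEAD "refs/heads/$branch"
    git reset -q
}

ensure_branch branch-a
first_tip="$(git rev-parse refs/heads/branch-a)"
ensure_branch branch-b
ensure_branch branch-a    # second call: the branch exists, no new root commit
second_tip="$(git rev-parse refs/heads/branch-a)"
current="$(git symbolic-ref --short HEAD)"
echo "HEAD is on $current"
```

The tip of branch-a is the same before and after the second call, and HEAD ends up pointing at whichever branch was requested last.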

Non-goal: merging into root

GitHub user @woopla proposed in #42 the ability to merge a minirepo into the monorepo root, as if you used . as the subdirectory. We ended up not going for it, but it was interesting to investigate how to do this with git read-tree. The closest I got was:

if [[ "$repopath" == "." ]]; then
    # Experimental—is this how git read-tree works? I find it very confusing.
    git read-tree "$branch" "$reponame/$branch"
else
    git read-tree --prefix "$repopath" "$reponame/$branch"
fi

I must confess I find the git read-tree man page too daunting to fully stand by this. I mostly figured it out by trial and error. It seems to work?

If anyone could explain to me exactly what this tool is supposed to do, what those separate stages are (it talks about “stage 0” to “stage 3” in its 3 way merge), and how you would cleanly do this, just for argument’s sake, I’d love to know.

But, as it turned out, this tool already has a way to merge a repo into the root: just make it the monorepo, and use it as a target for a --continue operation. That solves that.

Set up the monorepo directory

We create a fresh directory for this script to run in, or continue on an existing one if the --continue flag is passed.

# Poor man’s arg parse :/
arg="${1-}"
: "${MONOREPO_NAME:=core}"

case "$arg" in
    "")
        if [[ -d "$MONOREPO_NAME" ]]; then
            >&2 echo "monorepo directory $MONOREPO_NAME already exists"
            exit 1
        fi
        mkdir "$MONOREPO_NAME"
        cd "$MONOREPO_NAME"
        git init
        ;;

    "--continue")
        if [[ ! -d "$MONOREPO_NAME" ]]; then
            >&2 echo "Asked to --continue, but monorepo directory $MONOREPO_NAME doesn’t exist"
            exit 1
        fi
        cd "$MONOREPO_NAME"
        if git status --porcelain | grep . ; then
            >&2 echo "Git status shows pending changes in the repo. Cannot --continue."
            exit 1
        fi
        # There isn’t anything special about --continue, really.
        ;;

    "--help" | "-h" | "help")
        cat <<EOF
Usage: tomono [--continue]

For more information, see the documentation at "https://tomono.0brg.net".
EOF
        exit 0
        ;;

    *)
        >&2 echo "Unexpected argument: $arg"
        >&2 echo
        >&2 echo "Usage: tomono [--continue]"
        exit 1
        ;;
esac

Most of this rigmarole is about UI, and preventing mistakes. As you can see, there is functionally no difference between continuing and starting fresh, beyond mkdir and git init. At the end of the day, every repo is read in greedily, and whether you do that on an existing monorepo, or a fresh one, doesn’t matter: every repo name you read in, is in fact itself like a --continue operation.

It’s horrible and kludgy but I just want to get something working out the door, for now.

List individual branches

I want a single branch name per line on stdout, for a single specific remote:

git branch -r --no-color --list "$reponame/*" --format "%(refname:lstrip=3)"
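
A quick experiment shows that this keeps branch names containing slashes intact. Everything named here (the repos foo and mono, the branch feature/with/slashes, the demo identity) is an assumption for the sake of the example:

```shell
set -eu
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.invalid"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.invalid"

work="$(mktemp -d)"
cd "$work"

# Source repo with a default branch plus one branch with slashes in its name.
git init -q foo
git -C foo commit -q --allow-empty -m "foo root"
git -C foo branch feature/with/slashes

# Fetch it as a remote, then list its branches with the remote name stripped.
git init -q mono
cd mono
git remote add foo "$work/foo"
git fetch -q foo
branches="$(git branch -r --no-color --list "foo/*" --format "%(refname:lstrip=3)")"
printf '%s\n' "$branches"
```

The output has one branch per line, including feature/with/slashes in full. Depending on your git version, other remote-tracking entries may show up in the list as well, so inspect the output rather than assuming its exact contents.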

Implementations that didn’t make the cut

Solutions I abandoned, due to one short-coming or another:

git branch -r with grep

The most straight-forward way to list branch names:

$ git branch -r
  bar/branch-a
  bar/branch-b
  bar/master
  foo/branch-a
  foo/master

This could be combined with grep to filter all branches for a specific remote, and filter out the name. It’s very close, but how do you reliably remove an unknown string?

find .git/refs/remotes

( cd ".git/refs/remotes/$reponame" && find . -mindepth 1 -type f | sed -e s/..// )

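Closer, but ugly, and I got reports that it missed some branches (although I was never able to repro). One plausible cause: refs that git has packed into .git/packed-refs no longer exist as individual files.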

git ls-remote

git ls-remote --heads --refs "$reponame" | sed 's_[^ ]* *refs/heads/__'

Originally suggested in PR 39, I’ve decided not to use this because git-ls-remote actively queries the remote to list its branches, rather than inspecting the local state of whatever we just fetched. That feels like a race condition at best, and becomes very annoying if you’re dealing with password protected remotes or otherwise inaccessible repos.

Init & finalize

Initialization is what you’d expect from a shell script:

<<set-flags>>

<<prep-dir>>

empty_tree="$(git hash-object -t tree /dev/null)"
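
As an aside on that last line: hashing /dev/null as a tree yields the ID of git's empty tree, which git knows about even though nothing is written to the object store. A tiny sketch, assuming the default SHA-1 object format:

```shell
# Compute the ID of the empty tree without writing any object.
empty_tree="$(git hash-object -t tree /dev/null)"
echo "$empty_tree"
# With SHA-1 (the default object format) this is the well-known constant
# 4b825dc642cb6eb9a060e54bf8d69288fbee4904.
```

Computing it instead of hard-coding the constant keeps the script independent of the object format in use.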

On the other side, when done, update the working tree to whatever the current branch is to avoid any confusion:

git checkout .

Error flags, warnings, debug

Various sh flags allow us to control the behaviour of the shell: treat any reference to an unset variable as an error, treat any non-zero exit status in a pipeline as an error (instead of only looking at the last program), and treat any error as fatal and quit. Additionally, if the DEBUGSH environment variable is set, enable “debug” mode by echoing every command before it gets executed.

set -euo pipefail ${DEBUGSH+-x}

if ((BASH_VERSINFO[0] > 4 || (BASH_VERSINFO[0] == 4 && BASH_VERSINFO[1] >= 4))); then
	shopt -s inherit_errexit
fi

The snippet also contains a monstrosity which is essentially a version guard around the inherit_errexit option, which was only introduced in Bash 4.4. Notably, Mac’s default bash doesn’t support it, so the version guard is useful.
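
If you are unsure what these flags change, the following sketch compares behaviour with and without them. Each case runs in a child bash so that a failure cannot terminate your own shell; the variable names are arbitrary:

```shell
# pipefail: the status of a pipeline reflects a failure anywhere in it.
without="$(bash -c 'false | true; echo "status=$?"')"
with="$(bash -c 'set -o pipefail; false | true; echo "status=$?"')"
echo "without pipefail: $without"
echo "with pipefail:    $with"

# nounset (-u): expanding a variable that was never set is an error.
nounset="accepted"
if ! bash -c 'set -u; echo "$no_such_variable"' >/dev/null 2>&1; then
    nounset="rejected"
fi
echo "unset variable with -u: $nounset"
```

Without pipefail the failing false goes unnoticed (status 0); with it the pipeline reports status 1. The unset variable is rejected under -u.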

Windows newline fix

On Windows the config file could contain Windows line endings (CRLF). Bash doesn’t treat the carriage return as a field separator, so it ends up glued to the last field of every line. Even on Windows…

We force it by adding CR as a field separator:

IFS=$'\r'"$IFS"

It can’t hurt to do this on other computers, because who has a carriage return in their repo name or path? Nobody does.

The real question is: why is this not standard in Bash for Windows? Who knows. I’d add it to my .bashrc if I were you 🤷‍♀️.
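
Here is a small bash sketch of the difference, using an invented config line that ends in a carriage return (the URL is a placeholder):

```shell
# A config line as it would arrive from a CRLF file: note the trailing \r.
line=$'https://example.invalid/repo.git name lib/name\r'

# Default IFS: the carriage return stays attached to the last field.
read -r url name path <<< "$line"
plain_len="${#path}"

# With CR added to IFS for this one read, it is treated as a separator.
IFS=$'\r'"$IFS" read -r url name path <<< "$line"
fixed_len="${#path}"

echo "default IFS: ${plain_len} characters; with CR in IFS: ${fixed_len} characters"
```

The path lib/name is 8 characters long; with the default IFS the stray carriage return makes it 9, which is what later trips up git read-tree.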

Building the code

This is for tomono development only—end users can directly use the tomono script from this repo without building anything.

Nix

To build a stand-alone executable:

nix build .#dist

Find the executable in ./result/bin/, and the documentation in ./result/doc.

To test the code

nix flake check .

Troubleshooting: If you don’t have flakes enabled, add this flag just after the nix command:

nix --extra-experimental-features "nix-command flakes" ...

Manually using Emacs

You can use Emacs to build the code manually:

Most of the code in this repository is generated from this readme file. This can be done in stock Emacs, by opening this file and calling M-x org-babel-tangle.

This file can also be exported to HTML. You can use the code below (and its exported command literate-html-export) to add some flourish to the HTML.

;;; literate-html.el --- Export org file to HTML -*- lexical-binding: t; -*-

;; Author: Hraban Luyat <[email protected]>
;; Keywords: lisp
;; Version: 0.0.1
;; Package-Requires: ((emacs "27.1") (dash "2.19.1"))
;; URL: https://tomono.0brg.net/

;; <<copyright>>

;;; Commentary:

;; Slightly more elaborate HTML export for literate programming in Org, aka
;; babel + noweb. Adds references between listings.

;;; Code:

(require 'cl-lib)
(require 'dash)
(require 's)
(require 'org)
(require 'ox-html) ;; For the dynamic config vars

(defun literate-html--org-info-name (info)
  (nth 4 info))

(defun literate-html--insert-ln (&rest args)
  (apply #'insert args)
  (newline))

(defun literate-html--should-reference (info)
  "Determine if this info block is a referencing code block"
  (not (memq (alist-get :noweb (nth 2 info))
             '(nil "no"))))

(defun literate-html--re-findall (re str &optional offset)
  "Find all matches of a regex in the given string"
  (let ((start (string-match re str offset))
        (end (match-end 0)))
    (when (numberp start)
      (cons (substring str start end) (literate-html--re-findall re str end)))))

;; Match groups are the perfect tool to achieve this but EL's regex is
;; inferior and it's not worth the hassle. Blag it manually.

(defun literate-html--strip-delimiters (s prefix suffix)
  "Strip a PREFIX and SUFFIX delimiter from S.

(literate-html--strip-delimiters \"<a>\" \"<\" \">\")
=> \"a\"

Note this function trusts the input string has those delimiters"
  (substring s (length prefix) (- (length suffix))))

(defun literate-html--strip-noweb-delimiters (s)
  "Strip the org noweb link delimiters from S, usually << and >>"
  (literate-html--strip-delimiters s
                        org-babel-noweb-wrap-start
                        org-babel-noweb-wrap-end))

(defun literate-html--extract-refs (body)
  (mapcar #'literate-html--strip-noweb-delimiters
          (literate-html--re-findall (org-babel-noweb-wrap) body)))

(defun literate-html--add-to-hash-list (k elem hash)
  "Assuming the HASH values are lists, add this ELEM to K’s list"
  (puthash k (cons elem (gethash k hash)) hash))

(defvar literate-html--forward-refs)
(defvar literate-html--back-refs)

(defun literate-html--register-refs (name refs)
  (puthash name refs literate-html--forward-refs)
  ;; Add a backreference to every ref
  (mapc (lambda (ref)
          (literate-html--add-to-hash-list ref name literate-html--back-refs))
        refs))

(defun literate-html--parse-blocks ()
  (let ((literate-html--forward-refs (make-hash-table :test 'equal))
        (literate-html--back-refs (make-hash-table :test 'equal)))
    (org-babel-map-src-blocks nil
      ;; Probably not v efficient, but should be memoized anyway?
      (let* ((info (org-babel-get-src-block-info full-block))
             (name (literate-html--org-info-name info)))
        (when (and name (literate-html--should-reference info))
          (literate-html--register-refs name (literate-html--extract-refs body)))))
    (list literate-html--forward-refs literate-html--back-refs)))

(defun literate-html--format-ref (ref)
  (format "[[%s][%s]]" ref ref))

(defun literate-html--insert-references-block (info title refs)
  (when refs
    (insert title)
    (->> refs (mapcar 'literate-html--format-ref) (s-join ", ") literate-html--insert-ln)
    (newline)))

(defun literate-html--insert-references (info forward back)
  (when (or forward back)
    (newline)
    (literate-html--insert-ln ":REFERENCES:")
    (literate-html--insert-references-block info "References: " forward)
    (literate-html--insert-references-block info "Used by: " back)
    (literate-html--insert-ln ":END:")))

(defun literate-html--fix-references (backend)
  "Append a references section to every noweb codeblock"
  (cl-destructuring-bind (forward-refs back-refs) (literate-html--parse-blocks)
    (org-babel-map-src-blocks nil
      (let ((info (org-babel-get-src-block-info full-block)))
        (when (literate-html--should-reference info)
          (let ((name (literate-html--org-info-name info)))
            (goto-char end-block)
            (literate-html--insert-references
             info
             (gethash name forward-refs)
             (gethash name back-refs))))))))

(defun literate-html-export ()
  "Export current org buffer to HTML"
  (interactive)
  (add-hook 'org-export-before-parsing-hook 'literate-html--fix-references nil t)

  ;; The HTML output
  (let ((org-html-htmlize-output-type 'css))
    (org-html-export-to-html)))

(provide 'literate-html)

Tests

(This section is best viewed in HTML form; the GitHub Readme viewer misses some info.)

The examples from this document can be combined into a test script:

<<set-flags>>
# In tests always echo the command:
set -x
export DEBUGSH=true

# The tomono script is tangled right next to the test script
export PATH="$PWD:$PATH"

# Ensure testing always works even on unconfigured CI etc
export GIT_AUTHOR_NAME="Test"
export GIT_AUTHOR_EMAIL="[email protected]"
export GIT_COMMITTER_NAME="Test"
export GIT_COMMITTER_EMAIL="[email protected]"

<<test-setup>>
<<test-run>>
<<test-evaluate>>

<<test-extra>>

All we needed to write was the code that actually evaluates the tests and fixtures.

I use that weird diff -u <(..) trick instead of a string compare like [[ "foo" == "..." ]] , because the diff shows you where the problem is, instead of just failing the test without comment.
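
The trick can be wrapped in a small helper. This is a sketch for illustration; expect_output is a made-up name and does not appear in the actual test script:

```shell
# Hypothetical test helper: compare a command's stdout with the expected text.
# On mismatch, diff prints a unified diff and returns non-zero, which fails
# the test under "set -e" while showing exactly what differed.
expect_output() {
    expected="$1"
    shift
    diff -u <(printf '%s\n' "$expected") <("$@")
}

expect_output "hello" echo hello
```

Process substitution is a bash feature, so this needs bash rather than plain sh.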

Edge case: same branch and tag name

If you have a branch and tag with the same name in a git repo, you will be familiar with this error:

warning: refname ‘foo’ is ambiguous.

See #53. This happens whenever you refer to the tag or branch by its bare name, without specifying whether it’s a tag or a branch. To fix this, the monorepo script must always use refs/heads/... to specify the branch name.

Example:

mkdir duplicates
(
  cd duplicates
  git init -b check-dupes
  echo a > a
  echo b > b
  git add -A
  git commit -m commit1 a
  git tag check-dupes
  git commit -m commit2 b
)

We now have a duplicates repository with a branch and tag check-dupes, pointing at different revisions. After including it in the monorepo:

echo "$PWD/duplicates duplicates" | tomono --continue

We should get:

(
  cd core
  git checkout check-dupes
  # This file must exist
  diff -u duplicates/a <(echo a)
  # This file too
  diff -u duplicates/b <(echo b)
)

Copyright and license

This is a cleanroom reimplementation of the tomono.sh script, originally written with copyright assigned to Ravelin Ltd., a UK fraud detection company. There were some questions around licensing, and it was unclear how to go forward with maintenance of this project given its dispersed copyright, so I went ahead and rewrote the entire thing for a fresh start.

The license and copyright attribution of this entire document can now be set:

Copyright © 2020, 2022, 2023 Hraban Luyat

This program is free software: you can redistribute it and/or modify
it under the terms of the GNU Affero General Public License as
published by the Free Software Foundation, version 3 of the License.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU Affero General Public License for more details.

You should have received a copy of the GNU Affero General Public License
along with this program.  If not, see <https://www.gnu.org/licenses/>.

I did not look at the original implementation at all while developing this.

tomono's Issues

Error: .gitignore overlaps with .gitignore. Cannot bind.

Attempting to merge two repositories, "sdk" and "ios", into a common repo, "shared".

I've tried twice now, but receive this error:

Fetching sdk..
Automatic merge went well; stopped before committing as requested
Automatic merge went well; stopped before committing as requested
Automatic merge went well; stopped before committing as requested
Automatic merge went well; stopped before committing as requested
e718385aae9109caf79b60678769846ec427f7b3
error: Entry 'sdk/.gitignore' overlaps with 'sdk/.gitignore'. Cannot bind.

Both sdk and ios have gitignore files, so I removed git from the master branches and pushed. Tried to merge again and received the same error. Both repos have many branches.

  1. Does gitignore have to be manually removed from every branch?
  2. Should I modify tomono.sh to use diff-tree instead of read-tree?

Combine into monorepo - same git name

I'm facing problems merging these repos (I guess this happens because the name Algorithm.git is the same):

tomono <<EOF
git@XXXX:/home/git/projects/Algorithm.git Algorithm_1
git@YYYY:/home/git/projects/Algorithm.git Algorithm_2
EOF

How to sync the new branches of the subrepos after migration?

Hello,
I have two subrepos named repo1 and repo2, each with branches master and dev. After the migration, the monorepo has branches master and dev. After that, repo1 and repo2 each get a new branch, dev_new. How can the monorepo sync the new branch dev_new?
Thanks

How to merge repositories from different paths to monorepo and subsplit

Plan

  • merge the history of 1 remote repository into 1 monorepo
  • publish via git subsplit from the monorepo back to 1 repository

Steps to Reproduce

1. Tomono

# repositories.txt
[email protected]:shopsys/product-feed-zbozi.git packages/zbozi
wget https://raw.githubusercontent.com/unravelin/tomono/master/tomono.sh
chmod +x tomono.sh 
cat repositories.txt | ./tomono.sh
# move all to this level (usually done manually, so not sure if this script works)
mv core/.git .
mv core/* .

2. Split To Many Repos

git subsplit init [email protected]:TomasVotruba/shopsys-monorepo-test.git
git subsplit publish --heads="master" packages/zbozi:[email protected]:TomasVotruba/tomono-after-split-product-feed-zbozi.git
rm -rf .subsplit/

Expected

Real Result

error: invalid path on Windows (carriage-return issue)

To run this on Windows I needed to use:
cat config.txt | tr -d '\r' | ./tomono
Without the middle section, read-tree failed with error: invalid path '{repopath} ?/.{first-file-in-repo}'. This is because the \r before \n gets included in $repopath.

packed-refs are ignored when listing remote branches

The function remote-branches has a bug which may lead to missing branches and/or subtrees in the monorepo.

The file-based approach for listing all branches for a given remote does not consider packed-refs. The Git GC (garbage collector) may occasionally pack individual ref files into a single file named .git/packed-refs and remove them from the refs directory. Those refs will be ignored by tomono.

Further information:

For now we cannot provide any steps to reproduce, but it seems that the issue is related to the number and size of the repositories to process. In our case, we migrated ~100 single repositories with a total of ~125000 commits and ~1400 remote branches into a monorepo, when we detected that the content of a certain single repository was missing.

Ownership Transfer

Hey @hraban, I don't think anyone here at Ravelin knows much about tomono, though we're grateful for the monorepo that it's given us. As you're the most active here in replying to issues and pull requests, would you be interested in us transferring ownership to you?

Some kind of issue in your sed command

Mac:

$ cat repos.txt | ./tomono/tomono.sh
~/work/src/github.com/althea-mesh/core ~/work/src/github.com/althea-mesh
Initialized empty Git repository in /Users/jehan/work/src/github.com/althea-mesh/core/.git/
Merging in https://github.com/althea-mesh/althea_types..
Automatic merge went well; stopped before committing as requested
Merging in https://github.com/althea-mesh/althea_kernel_interface..
warning: no common commits
sed: 1: "s_kernel_interface/__
": bad flag in substitute command: '_'

Linux:

ubuntu@ubuntu-xenial:/vagrant/src/github.com/althea-mesh$ sudo cat repos.txt | ./tomono/tomono.sh 
/vagrant/src/github.com/althea-mesh/core /vagrant/src/github.com/althea-mesh
Initialized empty Git repository in /vagrant/src/github.com/althea-mesh/core/.git/
Merging in https://github.com/althea-mesh/althea_types..
Automatic merge went well; stopped before committing as requested
Merging in https://github.com/althea-mesh/althea_kernel_interface..
warning: no common commits
sed: -e expression #1, char 21: unknown option to `s'
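
Both error messages are consistent with a sed expression that uses _ as its delimiter while the interpolated text also contains _: sed then reads s_kernel_interface/__ as pattern "kernel", replacement "interface/" and flag "_". A minimal sketch of the failure and one way around it follows. It assumes the goal is to strip a prefix from a name; the variables prefix and ref are hypothetical values, not taken from the script.

```shell
#!/usr/bin/env bash
set -u

prefix='kernel_interface/'       # hypothetical value that contains the delimiter
ref='kernel_interface/master'    # hypothetical input

# Reproduce the failure: the expression expands to s_kernel_interface/__
if ! printf '%s\n' "$ref" | sed "s_${prefix}__" >/dev/null 2>&1; then
    echo "sed rejected the expression"
fi

# Parameter expansion treats the prefix as a literal string,
# so no delimiter can clash with the data.
stripped="${ref#"$prefix"}"
echo "$stripped"
```

Choosing a different sed delimiter only moves the problem to names that contain that character, which is why the sketch avoids sed for this step.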

Tags

Hi,
your script does not migrate the existing tags to the monorepo (at least in my case). Do you have any advice how to achieve this?

Thanks!

Create monorepo with git commit history without branches

Hi.

Could you add the ability to create a clean new repository in which the commit history of several projects is preserved but the old branches are not migrated? I want to combine several of my projects, which are already several years old, into one monorepo, to get something like this:
. core/
. . projects/
. . . project1 (use devPod1,devPod2,devPod3...)
. . . project2 (use devPod1,devPod2,devPod3...)
. . . project3 (use devPod1, devPod3...)
. . developerPods
. . . devPod1
. . . devPod2
. . . devPod3

How to handle ambiguous refname

I am working on merging multiple long lived repos together.

Unfortunately, in many places they have a branch and a tag with the same name. This causes the following warning and error:

warning: refname 'reponame/branchname' is ambiguous.
fails here2?
warning: refname 'reponame/branchanme' is ambiguous.
fatal: sha1commithash is not a valid 'commit' object

The "fails here2?" print comes after this line:
https://github.com/hraban/tomono/blob/master/tomono#L73

Is there a programmatic way to handle this?

I started looking at this and trying to use "refs/heads/" in order to use the fully qualified branch name, but I don't understand this well enough yet to get that to work.
https://stackoverflow.com/questions/28192422/git-warning-refname-xxx-is-ambiguous
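
The idea of using fully qualified names can be tried out in isolation. The sketch below builds a throwaway repository in which a tag and a remote-tracking branch share the short name reponame/branchname (a layout assumed from the warning above, not taken from the script) and compares how the short and the fully qualified names resolve.

```shell
#!/usr/bin/env bash
set -euo pipefail

tmp="$(mktemp -d)"
trap 'rm -rf "$tmp"' EXIT
git init -q "$tmp/repo"
cd "$tmp/repo"

commit() {
    git -c user.name=demo -c user.email=demo@example.com \
        commit -q --allow-empty -m "$1"
}

commit first
tag_target="$(git rev-parse HEAD)"
git tag reponame/branchname                 # refs/tags/reponame/branchname

commit second
branch_target="$(git rev-parse HEAD)"
git update-ref refs/remotes/reponame/branchname "$branch_target"

# Short name: Git prints the "ambiguous" warning and picks the tag.
short="$(git rev-parse --verify 'reponame/branchname^{commit}' 2>/dev/null)"

# Fully qualified name: no ambiguity, resolves to the branch.
full="$(git rev-parse --verify 'refs/remotes/reponame/branchname^{commit}')"

echo "short name -> $short"
echo "full name  -> $full"
```

Here the short name silently resolves to the tag, so a script that means the branch would need to pass refs/remotes/... (or refs/heads/... for local branches) everywhere it refers to it.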

Does it properly handle branches in multiple repos?

Dear all,

According to the README, this tool supports multiple repos with multiple branches each, merging all of them into a new repo with even more branches.

Is that correct?

I am asking because that seems to be really hard: I could not find this in the many other options/tools, even after googling for quite a while.

Also, are the branches fully functioning afterwards or are there any restrictions? One of the issues here says that branches cannot be merged anymore post migration.

Many Thanks!
Andreas

Namespaced tags: pushing to remote fails

Hello @hraban, I have an issue with pushing namespaced tags. It affects this tool in a way, since it produces the namespaced tags. I've tried several ways of pushing namespaced and non-namespaced tags together to multiple repositories, but have been unsuccessful.

After several attempts with different methods, I settled on the refspec refs/{remotes/origin,tags/foo,tags}/*, which seems to fail. I really need help with this.

License missing

We'd like to build upon your work. However this repository is missing a license. Could you please add one?

LFS Support?

I wonder what it would take to support LFS, or if that's too big a lift for the plumbing involved...

Here's my public test case:

testmono.repos.txt:

https://github.com/hraban/tomono.git tomono tools/tomono
https://github.com/Apress/repo-with-large-file-storage.git lfsrepo

I have already installed git lfs globally, so that I can just git clone from my own repos using LFS and everything "just works":

$ git clone https://github.com/Apress/repo-with-large-file-storage.git
Cloning into 'repo-with-large-file-storage'...
remote: Enumerating objects: 16, done.
remote: Total 16 (delta 0), reused 0 (delta 0), pack-reused 16
Receiving objects: 100% (16/16), 146.84 KiB | 18.36 MiB/s, done.
Resolving deltas: 100% (5/5), done.
$ cd repo-with-large-file-storage
$ git lfs ls-files
6e1b890daf * LargeFile.zip

But when I run tomono, I get:

$ env MONOREPO_NAME=testmono ./tomono/tomono < testmono.repos.txt
...
From https://github.com/Apress/repo-with-large-file-storage
 * [new branch]      master     -> lfsrepo/master
Downloading lfsrepo/LargeFile.zip (208 MB)
Error downloading object: lfsrepo/LargeFile.zip (6e1b890): Smudge error: Error downloading lfsrepo/LargeFile.zip (6e1b890dafa9956b1bdff43b3fd46aaf273085a1d2041bde7efce6cf5eb0262b): batch request: missing protocol: ""

Errors logged to '/home/bschober/git/testmono/.git/lfs/logs/20240607T153027.89679484.log'.
Use `git lfs logs last` to view the log.
error: external filter 'git-lfs filter-process' failed
fatal: lfsrepo/LargeFile.zip: smudge filter lfs failed

Did we do it wrong?

We used the script and have the files. We also have the commits (as shown by a git log in the root), but they are disconnected.

All files have a single commit against them "Merging api to master" for example. Basically they look like fresh files - but, interestingly, the commits are still there, but point to files in their old location e.g.
api/index.js = single commit
index.js = all historical commits for api's index.js

What went wrong? Is there a way to recover?

Thanks

Commit history of a repository is missing after the merge of four repositories

Hi, Thanks for the solution.

I have tried merging four repositories together with this script.

However when we check the commit history of the merged repository, we found that the first repository's commit history is completely missing from the overall merged repo commits.

Is this expected or should we make any changes to the script to get the commit history of all four repos in the merged repository?

We just want to have the commit history INTACT (combination of all commits together across these 4 repos) on the merged repo.

Thanks,
Sitaram

Empty history for folders

Hello, thank you for the script. I was playing with it and noticed one issue:
The folder git history is empty. There is only one entry like this created by tomono:
"Merging module-test to master"

I've seen this issue:
#6
It is strange that this user is able to see the history for the folder, but not for the file.
In my case actually it's empty for both files and folders when I run "git log", but I see the file history when I run it from my IDE. But for the folders, it's always empty. So currently the main problem for me is folder history.
git blame works fine in CLI and IDE.

Do you think there is any way to fix it?
Is it related to this git bug as well? http://git.661346.n2.nabble.com/git-log-follow-doesn-t-follow-a-rename-over-a-merge-td6480971.html
I tried --follow as well. In this case log is completely empty.

Thank you.

Only the remote's local branches are cloned, not the remote's remote tracking branches

I'm trying tomono as follows:

  • Clone repositories A and B from server to local machine.
  • Run tomono on local A and local B.

The problem is that only the master branch gets handled, although local A and local B have many remote branches. The solution is to track the remote branches in A and B first:

for i in $(git branch -r | grep -vE "HEAD|master"); do
    # Create a local branch tracking each remote branch, e.g. origin/foo -> foo
    git branch --track "${i#*/}" "$i"
done

Then tomono does its job as expected.

Should tomono be changed to somehow consider also the untracked remote branches in A and B?

Git Log not returning full history of files

I have tried running the script on three repositories, which ran successfully. However, when running "git log" on files, it seems we have lost the commit history of the file, apart from one commit entry which appears to come from the script. For example, running "git log somefile" in ProjectBlah we see:

.....
"Merging ProjectBlah to master"
....

However, when running "git log" on a folder it returns the expected commit history.

Have you seen this before?

After using tomono, rebase and log contents are very different.

Hi,

This is a question. I've used the tomono script on a collection of repos that used to be a monorepo, have since spent many years as individual repos, and would now once again like to be a monorepo.

So there are many commits that could be fixed up into one commit, and this is obvious when viewed in chronological order over the set of all paths. (20 commits all saying the same thing!)

git log gives me that order, and this script appears to have done the job very well. (Thank you: I did get similar results myself, but without support for branches, and the simplicity of your approach is very enlightening.)

But to do these fixups I need to rebase, and when I rebase I see the commits in the order they were applied rather than the order in which they were authored or committed. I see no option to change that within git rebase.
This means those 20 commits end up entire repos/paths apart in the list, because in the rebase you see the repo/path#1 commits in chronological order, then repo/path#2 ... #n.

I wrote a script to write my own rebase todo file using git log, however, this is complicated due to merge commits, and the repo won't rebase without replaying the merges.

Do you have any ideas how I might progress?
When you have used this script, did you then try to rebase your monorepo to tidy it up, and if so did you encounter this problem, or is it unique to me?

Merging branches after migration

Hi,

Thanks for the script, it works as expected. Had some issues with slashes in the branch names, but you guys fixed them in the meantime.

We've been using git flow in all our individual repos until now. After running the migration script, as expected we have master, develop, and a few feature/X branches. The problem comes when trying to finish one of the features (merge it to develop). Git seems to be very confused and throws a lot of conflicts. Some concrete examples include:

  1. develop contains a file, while feature/f1 has deleted it.
  2. develop contains a file, while feature/f1 has renamed it.
  3. small files with changes (this can be fixed with setting rename-threshold lower)

Trying to perform the same merge in the original repository shows no conflict. Based on my understanding, this issue arises because read-tree shows as an independent copy on each individual branch (develop and feature/f1).

Did anybody encounter this issue? Any recommendations?

Thanks,
Calin

issue with annotated tags

There seems to be an issue with migration of tags if repos contain annotated tags. I'm not sure what exactly is happening there, but it seems that the script is just moving refs/tags without updating the "tag objects in the git database" (these should only exist for annotated tags). As a result there's some unexpected behaviour, e.g. :

$ git describe
warning: tag 'my_tag' is really 'repo1/mytag' here
my_tag

One solution is to rewrite annotated tags. A simple git tag should do it. It will result in a single tag, e.g. repo1/mytag, which has both objects in the git database: the new one with metadata from the migration and the old one with the original metadata (date, author, signing info, etc.).

I'll submit a PR with a fix, but I'll gladly hear about any other ideas
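
The mismatch described above can be observed directly. The sketch below assumes, as the report suggests, that the migration moves the ref under a namespace without rewriting the tag object; the names repo1 and mytag and the throwaway repository are only for illustration.

```shell
#!/usr/bin/env bash
set -euo pipefail

tmp="$(mktemp -d)"
trap 'rm -rf "$tmp"' EXIT
git init -q "$tmp/repo"
cd "$tmp/repo"

git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m init
git -c user.name=demo -c user.email=demo@example.com \
    tag -a -m 'release' mytag

# Move the ref under a namespace without touching the tag object
# (an assumption about what the migration does).
git update-ref refs/tags/repo1/mytag "$(git rev-parse refs/tags/mytag)"
git update-ref -d refs/tags/mytag

# An annotated tag ref points at a "tag" object; a lightweight one
# would point straight at a "commit".
kind="$(git cat-file -t refs/tags/repo1/mytag)"

# The tag object still records its original name.
embedded="$(git cat-file -p refs/tags/repo1/mytag | sed -n 's/^tag //p')"

echo "object type:   $kind"
echo "embedded name: $embedded"
```

The ref is now called repo1/mytag while the object it points to still says mytag, which is the kind of disagreement that git describe warns about.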
