
p4-fusion's Introduction

p4-fusion


A fast Perforce depot to Git repository converter using the Helix Core C/C++ API as an attempt to mitigate the performance bottlenecks in git-p4.py.

This project was started as a proof of concept for an internal project which required converting P4 depots to Git repositories. A similar solution, git-p4.py, exists within Git. However, as of the time of writing, it has performance issues with any depot greater than 1 GB in size, and it runs in a single thread using Python 2, which further limits its use for larger use-cases.

This tool solves some of the most impactful scaling and performance limitations in git-p4.py by:

  • Using the Helix Core C++ API to handle downloading CLs with more control over the memory and how it is committed to the Git repo without unnecessary memory copies and file I/O.
  • Using libgit2 to forward the file contents received from the Perforce server as-is to a Git repository, while avoiding memory copies as much as possible. This library allows creating commits from file contents existing plainly in memory.
  • Using a custom wakeup-based threadpool implemented in C++11 that runs thread-local library contexts of the Helix Core C++ API to heavily multithread the changelist downloading process.

Performance

Please be aware that this tool is fast enough to instantaneously generate a tremendous amount of load on your Perforce server (more than 150K requests in a few seconds if running with a couple hundred network threads). Since p4-fusion will continue generating load within the limits set using the runtime arguments, it needs careful monitoring to ensure that your Perforce server does not get impacted.

However, having no rate limits and running this tool with several hundred network threads (or more if possible) is the ideal case for achieving maximum speed in the conversion process.

The number of network threads should generally be set much higher than the number of logical CPUs, because the most time-consuming step, downloading the CL data from the Perforce server, is not CPU intensive.

In our study, this tool runs roughly 100 times faster than git-p4.py. We observed an average time of 26 seconds for the conversion of the history inside a depot path containing around 3393 moderately sized changelists using 200 parallel connections, while git-p4.py took close to 42 minutes to convert the same depot path (about 97 times slower). If the Perforce server has the files cached completely then these conversion times might be reproducible; if the file cache is empty, the first couple of runs are expected to take much longer.

These execution times are expected to scale to larger depots (millions of CLs or more). The tool provides options to control memory utilization during the conversion process, which should help in larger use-cases.

Usage

[ PRINT @ Main:56 ] Usage:
--client [Required]
        Name/path of the client workspace specification.

--flushRate [Optional, Default is 1000]
        Rate at which profiling data is flushed on the disk.

--fsyncEnable [Optional, Default is false]
        Enable fsync() while writing objects to disk to ensure they get written to permanent storage immediately instead of being cached. This is to mitigate data loss in events of hardware failure.

--includeBinaries [Optional, Default is false]
        Do not discard binary files while downloading changelists.

--lookAhead [Required]
        How many CLs in the future, at most, shall we keep downloaded by the time it is to commit them?

--branch [Optional]
        A branch to migrate under the depot path.  May be specified more than once.  If at least one is given and the noMerge option is false, then the Git repository will include merges between branches in the history.  You may use the formatting 'depot/path:git-alias', separating the Perforce branch sub-path from the git alias name by a ':'; if the depot path contains a ':', then you must provide the git branch alias.

--noMerge [Optional, Default is false]
        When false and at least one branch is given, then the Git repository will include merges between branches in the history.  If this is true, then the Git history will not contain any merges, except for an artificial empty commit added at the root, which acts as a common source to make later merges easier.

--maxChanges [Optional, Default is -1]
        Specify the max number of changelists which should be processed in a single run. -1 signifies unlimited range.

--networkThreads [Optional, Default is 16]
        Specify the number of threads in the threadpool for running network calls. Defaults to the number of logical CPUs.

--noColor [Optional, Default is false]
        Disable colored output.

--path [Required]
        P4 depot path to convert to a Git repo

--port [Required]
        Specify which P4PORT to use.

--printBatch [Optional, Default is 1]
        Specify the p4 print batch size.

--refresh [Optional, Default is 100]
        Specify how many times a connection should be reused before it is refreshed.

--retries [Optional, Default is 10]
        Specify how many times a command should be retried before the process exits in a failure.

--src [Required]
        Relative path where the git repository should be created. This path should be empty before running p4-fusion for the first time in a directory.

--user [Required]
        Specify which P4USER to use. Please ensure that the user is logged in.

Notes On Branches

When at least one branch argument exists, the tool will enable branching mode.

Branching mode currently only supports very simple branch layouts. The format must be //common/depot/path/branch-name. The common depot path is given as the --path argument, and each --branch argument specifies one branch name to inspect. Branch names must be a directory name immediately after the path (it replaces the ...).

In branching mode, the generated Git repository will be initially populated with a zero-content commit. This allows branches to later be merged without needing the --allow-unrelated-histories flag in Git. All branches will have this in their history.

If a Perforce changelist contains an integration-like action (move, integrate, copy, etc.) from another branch listed in a --branch argument, then the tool will mark the Git commit with the integration as having two parents: the current branch and the source branch. If a changelist contains integrations into one branch from multiple other branches, they are put into separate commits, each with just one source branch. If a changelist contains integrations into multiple branches, then each one of those is also its own commit.

Because Perforce integration isn't a 1-to-1 mapping onto Git merge, there can be situations where having the tool mark a commit as a merge, but not bringing over all the changes, leads to later merge logic not picking up every changed file correctly. To avoid this situation, passing --noMerge true ensures the branches share only the single zero-content root commit, so any merge done after the migration will force a full file tree inspection.

If the Perforce tree contains sub-branches, such as //base/tree/sub being a sub-branch of //base/tree, then you can use the arguments --path //base/... --branch tree/sub:tree-sub --branch tree. The ordering is important here: provide the deeper paths first to have them take priority over the others. Because Git creates branches with '/' characters as implicit directories, you must provide the Git branch alias to prevent Git reporting an error where the branch "tree" can't be created because it is already a directory, or "tree/sub" can't be created because "tree" isn't a directory.
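The ordering rule can be illustrated with a small shell sketch. The function names (branch_subpath, branch_alias, resolve_branch) are hypothetical and are not part of p4-fusion, which implements its branch logic internally in C++. Splitting the argument at the last ':' is an assumption, inferred from the rule that a depot path containing ':' requires an explicit alias. The sketch only shows why the first matching branch winning means deeper paths must be listed first.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration; not p4-fusion's actual code.

# Print the Perforce sub-path of a "sub/path:alias" spec.
# Assumption: the spec is split at the LAST ':'.
branch_subpath() {
    local spec=$1
    case $spec in
        *:*) printf '%s\n' "${spec%:*}" ;;
        *)   printf '%s\n' "$spec" ;;
    esac
}

# Print the Git alias of a spec; without ':' the sub-path is the alias.
branch_alias() {
    local spec=$1
    case $spec in
        *:*) printf '%s\n' "${spec##*:}" ;;
        *)   printf '%s\n' "$spec" ;;
    esac
}

# resolve_branch BASE FILE SPEC...
# Print the alias of the FIRST spec whose sub-path is a directory prefix
# of FILE relative to BASE. Returns 1 if no spec matches.
resolve_branch() {
    local base=$1 file=$2; shift 2
    local rel=${file#"$base"}
    local spec sub
    for spec in "$@"; do
        sub=$(branch_subpath "$spec")
        case $rel in
            "$sub"/*) branch_alias "$spec"; return 0 ;;
        esac
    done
    return 1
}

# Deeper path first: the file lands on the sub-branch.
resolve_branch //base/ //base/tree/sub/a.txt tree/sub:tree-sub tree
# Shallower path first: "tree" shadows the sub-branch.
resolve_branch //base/ //base/tree/sub/a.txt tree tree/sub:tree-sub
```

With the deeper path listed first the file resolves to tree-sub; with the order reversed it resolves to tree, which is the shadowing the paragraph above warns about.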

Checking Results

In order to test the validity of the logic, we need to run the program over a Perforce depot and compare each changelist against the corresponding Git commit SHA, to ensure the files match up.

The provided script validate-migration.sh runs through every generated Git commit, and ensures the file state exactly matches the state of the Perforce depot.

Because of the extra effort the script performs, expect it to take orders of magnitude longer than the original p4-fusion execution.

Build

  1. Pre-requisites
  • Install OpenSSL at /usr/local/ssl by following the steps here.
  • Install CMake 3.16+.
  • Install g++ 11.2.0 (older versions compatible with C++11 are also supported).
  • Clone this repository or get a release distribution.
  • Get the Helix Core C++ API binaries from the official Perforce website.
    • Tested versions: 2021.1, 2021.2, 2022.1
    • We recommend always picking the newest API versions that compile with p4-fusion.
  • Extract the contents in ./vendor/helix-core-api/linux/ or ./vendor/helix-core-api/mac/ based on your OS.

For CentOS, you can try yum install git make cmake gcc-c++ libarchive to set up the compilation toolchain. Installing libarchive is only required to fix a bug that stops CMake from starting properly.

This tool uses C++11 and thus it should work with much older GCC versions. We have tested compiling with both GCC 11.2.0 and GCC 4.8.

  2. Generate a CMake cache
./generate_cache.sh Debug

Replace Debug with Release or RelWithDebInfo or MinSizeRel for a differently optimized binary. Debug will run marginally slower (considering the tool is mostly bottlenecked by network I/O) but will contain debug symbols and allows a better debugging experience while working with a debugger.

By default tracing is disabled in p4-fusion. It can be enabled by including p in the second argument while generating the CMake cache. If tracing is enabled, p4-fusion generates trace JSON files in the cloning directory. These files can be opened in the about:tracing window in Chromium web browsers to view the tracing data.

Tests can be enabled by including t in the second command argument.

E.g. You can build tests and at the same time enable profiling by running ./generate_cache.sh Debug pt.

  3. Build
./build.sh
  4. Run!
./build/p4-fusion/p4-fusion \
        --path //depot/path/... \
        --user $P4USER \
        --port $P4PORT \
        --client $P4CLIENT \
        --src clones/.git \
        --networkThreads 200 \
        --printBatch 100 \
        --lookAhead 2000 \
        --retries 10 \
        --refresh 100

There should be a Git repo being created in the clones/.git directory with commits being created as the tool runs.

Note: The Git repository is created bare, i.e. without a working directory. Running the same command again will detect the last committed CL and continue from that CL onwards. Binary files are ignored by default; this behaviour can be changed with the --includeBinaries option. We do not handle .git directories in the Perforce history.

Contributing

Please refer to CONTRIBUTING.md


Licensed under the BSD 3-Clause License. Third-party license attributions are present in THIRDPARTY.md.

p4-fusion's People

Contributors

ericthemagician, groboclown, igorvpcleao, jclx, peter-esik, ppavacic, svc-scm, twarit-waikar


p4-fusion's Issues

Add version string to code and --version argument

p4-fusion doesn't have a way to report the version of the built binary. We should fix this so that it is easier for people to work with multiple versions or to upgrade p4-fusion in their workflows.

`p4-fusion` 1.13 core dumps with certain configs

Occasional core dumps with 1.13, most of them SIGSEGV

core dumps do not happen all of the time, but this command loops five times; a core dump usually happens several times:

P4USER=<perforce user>
P4PASSWD=<user ticket>
P4PORT=<perforce host>
DEPOT=<depot name>
export P4USER P4PASSWD P4PORT
# Report p4-fusion's exit status for the pipeline instead of grep's.
set -o pipefail
for x in 1 2 3 4 5
do
    tmpdir=${TMP}/test_p4-fusion-${RANDOM}
    rm -rf "${tmpdir}"
    mkdir -p "${tmpdir}"
    echo "attempt ${x}"
    p4-fusion-binary \
        --path ${DEPOT}... \
        --client "" \
        --user ${P4USER} \
        --src "${tmpdir}/.git" \
        --networkThreads 16 \
        --printBatch 10 \
        --port ${P4PORT} \
        --lookAhead 16 \
        --maxChanges 32 \
        --retries 100 \
        --refresh 100000 \
        --includeBinaries false \
        --fsyncEnable true \
        --noColor true 2>&1 | { grep "core dumped" || true; }
    rc=$?
    echo "return code: ${rc}"
    rm -rf "${tmpdir}"
done

Note that <perforce user> needs to be replaced with a valid user for the Perforce instance, <user ticket> needs to be replaced with the correct ticket for the user, <perforce host> replaced with the actual Perforce host + port #, and <depot name> with the depot name or path, including the leading // and trailing /.

Test plan

Connect to gitserver via a terminal and run the above command. If using p4-fusion 1.13, it will fail some or all of the time with core dumps.

Example output:

attempt 1
qemu: uncaught target signal 6 (Aborted) - core dumped
Aborted
return code: 134
attempt 2
qemu: uncaught target signal 11 (Segmentation fault) - core dumped
Segmentation fault
return code: 139
attempt 3
return code: 0
attempt 4
return code: 0
attempt 5
qemu: uncaught target signal 11 (Segmentation fault) - core dumped
Segmentation fault
return code: 139

If using p4-fusion 1.12, the output will be all return code: 0, like this one:

attempt 1
return code: 0
attempt 2
return code: 0
attempt 3
return code: 0
attempt 4
return code: 0
attempt 5
return code: 0

Not that it helps here, but changing lookAhead to 1 seems to stop the core dumps, at the cost of cloning speed.

Handle `SIG_KILL` to gracefully exit when the host receives a shutdown signal

This looks like something that might have caused issues but surprisingly hasn’t so far, which means p4-fusion is not corrupting the repo even if it gets interrupted. However, we should be on the safe side and also add the graceful exit handler for SIG_KILL in addition to SIG_INT (SIG_INT handler is already implemented)
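One caveat for this request: SIGKILL cannot be caught, blocked, or ignored by any process, so a graceful exit handler can only cover catchable signals such as SIGTERM, which hosts typically send before resorting to SIGKILL during shutdown. The pattern itself is sketched below in shell with hypothetical names (request_stop, convert_all); p4-fusion's real SIGINT handler is written in C++ and is not reproduced here. The idea is that the handler only sets a flag, and the conversion loop checks that flag between changelists so that no commit is interrupted halfway.

```shell
#!/usr/bin/env bash
# Illustrative pattern only; names are hypothetical, not p4-fusion code.

stop_requested=0

# The handler does the minimum: it records that a stop was requested.
request_stop() {
    stop_requested=1
}

# SIGKILL is deliberately absent: it cannot be trapped.
trap request_stop INT TERM

# Process changelists one at a time, checking the flag between them so
# the current changelist is always finished before exiting.
convert_all() {
    local cl
    for cl in "$@"; do
        if [ "$stop_requested" -eq 1 ]; then
            echo "stop requested, exiting before CL $cl"
            return 0
        fi
        echo "committed CL $cl"
    done
}

convert_all 101 102 103
```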

Add safeguards to deter people from making any changes to commit message formats

  1. Add a comment here that says that changing the commit message format is a significant change which needs to be evaluated by code owners. The blast radius of changing this format needs to be contained in a single location.
    std::string commitMsg = cl + " - " + desc + "\n[p4-fusion: depot-paths = \"" + depotPath.substr(0, depotPath.size() - 3) + "\": change = " + cl + "]";
  2. Add tests that try to detect if a change in the commit message format has been made and if it is a valid change.
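One way such a test could look is a golden-string comparison. The sketch below mirrors the C++ expression above in shell purely for illustration; build_commit_msg is a hypothetical name, and a real test would live next to the C++ code and call it directly. It assumes the depot path ends in "...", which is what the substr call strips.

```shell
#!/usr/bin/env bash
# Hypothetical shell mirror of the C++ commit message expression.

build_commit_msg() {
    local cl=$1 desc=$2 depot_path=$3
    # Mirrors depotPath.substr(0, size - 3): drop the trailing "...".
    printf '%s - %s\n[p4-fusion: depot-paths = "%s": change = %s]' \
        "$cl" "$desc" "${depot_path%...}" "$cl"
}

# Golden value: any change to the format makes this comparison fail.
expected='42 - Fix crash
[p4-fusion: depot-paths = "//depot/path/": change = 42]'

actual=$(build_commit_msg 42 'Fix crash' '//depot/path/...')

if [ "$actual" = "$expected" ]; then
    echo "commit message format unchanged"
else
    echo "commit message format changed"
fi
```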

Improve thread affinity at high thread count

At high thread counts (over 500 or so), the CPU starts to get swamped with p4 calls, and it tends to starve the commit thread and causes it to not get any CPU time at all.

This means p4-fusion downloads all the files at first but it is really slow to digest those files and write them out into the repository while creating a commit. This essentially hangs the process.

High thread counts may even start losing connections because they were left idle for too long and so the Perforce server closed those connections.

p4-fusion should be smarter by giving more affinity to the commit thread and also, manage connections in a way that they are not closed off due to idle time. This means a new threadpool dispatch algorithm might be required.

'Invalid option' error when running p4-fusion

Receiving the following error when trying p4-fusion out

[ WARNING @ Run:126 ] Retrying: p4 changes -l -s submitted -r //src/...
[ ERROR @ HandleError:13 ] Received error: error Usage: changes [-i -t -l -L -f -c client -m count -s status -u user] [files...]
Invalid option: -r.

p4 changes has no '-r' option, although the docs state it does.

$ p4 -V
Perforce - The Fast Software Configuration Management System.
Copyright 1995-2022 Perforce Software. All rights reserved.
This product includes software developed by the OpenSSL Project
for use in the OpenSSL Toolkit (http://www.openssl.org/)
Version of OpenSSL Libraries: OpenSSL 1.1.1n 15 Mar 2022
See 'p4 help [ -l ] legal' for additional license information on
these licenses and others.
Extensions/scripting support built-in.
Parallel sync threading built-in.
Rev. P4/LINUX26X86_64/2022.1/2305383 (2022/06/28).

This does not work on M1 Macs

Just a heads up (this is 100% not on this repository but there is not a better place to warn anyone) this is unable to build on M1 macs because they are considered arm machines but there is not a mac arm build on the perforce download site

Feel free to close this since it is not your issue but I wanted it recorded somewhere that M1 macs are unable to build this because the G++ libraries that they use are for arm machines but that is not compatible with the x86/x64 intel C helper files from perforce.

Thanks so much for building this!


Unfortunately even with a docker container running in the M1 there is not an arm linux release which would indicate they have no desire to run on ARM CPUs. So I had to swap to another machine.

RFE: Add timestamp to log messages

Could you add date/time to the progress messages? Messages like this one:

CL 1218143 --> Commit 4d3be6af2d1251f83d9462dc1bc34ed3ab34764c with 6 files (23515/111346|499). Elapsed 59.9115 mins. 223.776 mins left.

[https://github.com/salesforce/p4-fusion/blob/master/p4-fusion/main.cc#L301]
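Until such a feature exists, a possible workaround is to timestamp the output outside the tool. The sketch below defines a hypothetical add_timestamps filter that prefixes every line it reads with the current UTC time; it is not part of p4-fusion and does not change the tool's own log format.

```shell
#!/usr/bin/env bash
# Workaround sketch: prefix each line read from stdin with a UTC timestamp.
add_timestamps() {
    local line
    while IFS= read -r line; do
        printf '%s %s\n' "$(date -u '+%Y-%m-%dT%H:%M:%SZ')" "$line"
    done
}

# Demo with a stand-in log line.
echo 'example progress message' | add_timestamps
```

In practice the tool's combined output would be piped through the filter, for example by appending `2>&1 | add_timestamps` to the p4-fusion command line.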

complex --path value, --branch, and BranchSet::stripBasePath incompatibilities

Hello,

I have a streams-based repository with branches of the format:

//vehicles/rav4/...
//vehicles/rav4-prime/...
//vehicles/rav4-hybrid/...
//vehicles/corolla/...
//vehicles/corolla-cross/...
//vehicles/corolla-hybrid/...

The "prime", "hybrid", and "cross" variants are child streams of the appropriately-named mainline stream.

I'd like to p4-fusion --branch rav4 --branch rav4-prime --branch rav4-hybrid --path //vehicles/rav*/... to convert these three streams. The client is written to also select only these three.

The internal call to p4 changes -l -s submitted //vehicles/rav*/... will get only the CLs that are associated with the three streams implied by the depot path. This simplifies and shortens the list of CLs requested from the server.

The result of p4-fusion called this way is actually to request the right CLs but they end up skipping because they are determined to be "Not under the depot path" in BranchSet::ParseAffectedFiles due to the prefix strip not matching.

If I use --path //vehicles/... I end up with a lot of zero file CLs (corresponding to the non-rav4 streams) and an eventual segfault whose cause I haven't determined yet.

I can kludge the p4-fusion run to do what I want by manually assigning m_basePath but this is very ugly.

Cannot Migrate a Stream Branch

We use stream branches in our development.
I created a virtual branch to narrow down the scope of what I want to export to git.
I ran into a number of issues. I would normally open a separate issue for each, but you may need the whole story for them all to make sense, so sorry for the long story / issue.

Issue 1 - Mappings don't work properly

So I ran something like this after creating my new virtual branch:

p4-fusion --client "eyen_p4_fusion" \
    --includeBinaries true  \
    --path "//origin/main/..." \
    --port "perforce.theobjects.com:1666" \
    --printBatch 10 \
    --src /work/git \
    --user user \
    --lookAhead 100

but then I get an error like this:

[ PRINT @ Main:31 ] Running p4-fusion from: /work/p4-fusion/build/p4-fusion/p4-fusion
[ SUCCESS @ InitializeLibraries:144 ] Initialized P4Libraries successfully
[ PRINT @ Main:80 ] Updated client workspace view eyen_p4_fusion with 26 mappings
[ ERROR @ Main:92 ] The depot path specified is not under the p4_fusion_test client spec. Consider changing the client spec so that it does. Exiting.

I know it's under the spec. I just created the virtual branch and workspace.

So I just removed the return statement here and then I got further.

But before going further, I tracked down the problem to this: According to the docs

 Translate() is designed to map single files. 
To model the effect of passing a broader path through a mapping, create a new one-sided mapping that represents that path and Join() it with the other mapping.

In my workspace mapping, I had this:

//origin/main/parent_folder/folder1 /work/space/folder1/...
//origin/main/parent_folder/folder2 /work/space/folder2/...

But then when I try to map the --path argument //origin/main/..., Translate() is not able to determine that the path is valid.

So that's the crux of the issue and I don't know enough of the perforce api to know how to check properly.
I can ask them for support, but I suppose you are more knowledgeable than I am.

Issue 2 - It crashes with malloc(): unaligned tcache chunk detected

After removing that return statement, I then get a crash here:
[screenshot of the crash location]
These are the parameters printed at the beginning

[ PRINT @ Main:134 ] Perforce Port: ---
[ PRINT @ Main:135 ] Perforce User: --
[ PRINT @ Main:136 ] Perforce Client: ---
[ PRINT @ Main:137 ] Depot Path: //origin/main/...
[ PRINT @ Main:138 ] Network Threads: 48
[ PRINT @ Main:139 ] Print Batch: 10
[ PRINT @ Main:140 ] Look Ahead: 100
[ PRINT @ Main:141 ] Max Retries: 10
[ PRINT @ Main:142 ] Max Changes: -1
[ PRINT @ Main:143 ] Refresh Threshold: 100
[ PRINT @ Main:144 ] Fsync Enable: 0
[ PRINT @ Main:145 ] Include Binaries: 1
[ PRINT @ Main:146 ] Profiling: 0
[ PRINT @ Main:147 ] Profiling Flush Rate: 1000

I didn't know what the tcache bug meant so I ran it with ASAN and the code runs now.
I don't know what this bug means. Maybe I should force it to use less network threads, but it ran.

So now I have my git repo, but this brings me to issue 3.

Issue 3: Workspace mappings are not respected

In my workspace mappings, I have mappings like this:

//origin/main/parent_folder/folder2 /work/space/folder2/...

but in my git repo, what I see when I do clone is this:
git_folder/parent_folder/folder2 instead of git_folder/folder

Ideally, I want p4-fusion to Translate() the mapping to the local folder, strip the workspace root, and then append the stripped path to the git repo. I don't know where to go from there.

Using openMP

You can use OpenMP for multi-threaded calculations

It is supported by most C++ compilers (GCC, Clang and MSVC) and easily applies to a for loop, for example.
You can even use another machine or the graphics card to perform some part of the calculations.

More info:
https://bisqwit.iki.fi/story/howto/openmp/
https://medium.com/swlh/introduction-to-the-openmp-with-c-and-some-integrals-approximation-a7f03e9ebb65

Example: my brute-force project:
https://github.com/bensuperpc/GTA_SA_cheat_finder/blob/196dcedf30e81317148d025961b65f7213aa47e1/source/main.cpp#L89

Not ignoring .git directory

Hi there,

After running a sync for a large project I found this error:

[ ERROR @ AddFileToIndex:147 ] GitAPI: -1:10: invalid path: 'redacted/redacted/redacted/redacted/redacted/redacted/redacted/redacted/Private/Interfaces/.git'

It seems that p4-fusion is trying to add .git from a submodule located in a deeper directory.

By looking at p4-fusion source code (change_list.cc#L57), p4-fusion is ignoring .git/*, but not the .git per se. I believe this is a bug and it should also ignore files ending with .git.

Does it make sense? After implementing this change locally, the project was successfully synced.
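A check based on path components rather than on the string ending would avoid also skipping unrelated files such as foo.git. The sketch below shows that idea in shell; is_git_metadata_path is a hypothetical name, and this is not the code in change_list.cc nor the change the reporter applied locally.

```shell
#!/usr/bin/env bash
# Hypothetical filter: succeed when a path component is exactly ".git".
is_git_metadata_path() {
    case $1 in
        .git|.git/*|*/.git|*/.git/*) return 0 ;;
        *) return 1 ;;
    esac
}

for path in Private/Interfaces/.git Private/.git/config src/foo.git src/.gitignore; do
    if is_git_metadata_path "$path"; then
        echo "skip $path"
    else
        echo "keep $path"
    fi
done
```

This skips both a .git file (as created by a submodule) and anything under a .git directory, while keeping foo.git and .gitignore.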

git p4 sync ?

Hi,

Thanks for open sourcing this tool. It's definitely very fast when compared to git-p4.py

Do you plan to add the sync command ?

I am willing to contribute its implementation if you can give me some pointers on how you'd like it done.

Thanks,
Mathieu.

issue with case-sensitivity

We were hunting down a missing file after the import:

The CL's action is "branch". Within the CL, all files have integrate as the action. Except one, which has the action "branch".

After the import, we found that the file that has "branch" as the action is missing from the git repo.

p4-fusion is a godsend by the way. Thanks for open-sourcing this tool!

version: 1.10

Random crash when built and run under Alpine linux

The error seems to only occur in the musl library, and doesn't mention any p4-fusion specific functions in the stacktrace, even in a debug build. However, we only see this in alpine builds.

p4-fusion crashes in this case either at the end of the execution, or before starting the Git commit process (or when the tool is downloading files)

GDB core dump below:

Core was generated by `p4-fusion --path //depot/path/... --client p4-client'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007f5bd160c896 in ?? () from /lib/ld-musl-x86_64.so.1
[Current thread is 1 (LWP 296)]
(gdb) where
#0  0x00007f5bd160c896 in ?? () from /lib/ld-musl-x86_64.so.1
#1  0x00007f5bd160cc2f in ?? () from /lib/ld-musl-x86_64.so.1
#2  0x0000000000000000 in ?? ()

Rare cases when `~P4API()` crashes while p4-fusion is shutting down

In rare cases, p4-fusion crashes when running ThreadPool::ShutDown(), where the P4API destructor (defined in the Helix Core C++ API library) runs into a segmentation fault.

Version: 1.9
OS: CentOS 7

Crash backtrace from gdb:

Program terminated with signal 11, Segmentation fault.
#0  Signaler::DeleteOnIntr (this=0x55a7a6720460 <signaler>, ptr=0x7f1b1893c3b0) at ../p4/sys/signaler.cc:229
229	../p4/sys/signaler.cc: No such file or directory.
(gdb)
(gdb) where
#0  Signaler::DeleteOnIntr (this=0x55a7a6720460 <signaler>, ptr=0x7f1b1893c3b0) at ../p4/sys/signaler.cc:229
#1  0x000055a7a62c4b13 in Rpc::~Rpc (this=0x7f1b1893c3b0, __in_chrg=<optimized out>) at ../p4/rpc/rpc.cc:312
#2  0x000055a7a62a3ef7 in Client::~Client (this=0x7f1b1893c3b0, __in_chrg=<optimized out>) at ../p4/support/strbuf.h:369
#3  0x000055a7a629c8ac in ClientApi::~ClientApi (this=0x7f1b188df100, __in_chrg=<optimized out>) at ../p4/client/clientapi.cc:21
#4  0x000055a7a6298d4c in _Destroy<P4API> (__pointer=0x7f1b188df100) at /usr/include/c++/10.3.1/bits/stl_construct.h:135
#5  __destroy<P4API*> (__last=<optimized out>, __first=0x7f1b188df100) at /usr/include/c++/10.3.1/bits/stl_construct.h:152
#6  _Destroy<P4API*> (__last=<optimized out>, __first=<optimized out>) at /usr/include/c++/10.3.1/bits/stl_construct.h:185
#7  _Destroy<P4API*, P4API> (__last=<optimized out>, __first=<optimized out>) at /usr/include/c++/10.3.1/bits/alloc_traits.h:738
#8  _M_erase_at_end (__pos=0x7f1b188df100, this=0x55a7a671f750 <ThreadPool::GetSingleton()::singleton+112>) at /usr/include/c++/10.3.1/bits/stl_vector.h:1796
#9  clear (this=0x55a7a671f750 <ThreadPool::GetSingleton()::singleton+112>) at /usr/include/c++/10.3.1/bits/stl_vector.h:1499
#10 ThreadPool::ShutDown (this=0x55a7a671f6e0 <ThreadPool::GetSingleton()::singleton>) at /tmp/tmp.CjOLdF/p4-fusion-src/p4-fusion/thread_pool.cc:78
#11 0x000055a7a6299385 in ThreadPool::ShutDown (this=<optimized out>) at /tmp/tmp.CjOLdF/p4-fusion-src/p4-fusion/thread_pool.cc:58
#12 0x000055a7a6289fc1 in Main(int, char**) () at /tmp/tmp.CjOLdF/p4-fusion-src/p4-fusion/main.cc:308
#13 0x000055a7a627b027 in main () at /tmp/tmp.CjOLdF/p4-fusion-src/p4-fusion/main.cc:344
#14 0x00007f1b18d31a03 in ?? ()
#15 0x00007f1b18d319dc in ?? ()
#16 0x00007ffc0b6fce20 in ?? ()
#17 0x0000000000000000 in ?? ()

Add Tag Creation for Automatic Labels

As Perforce supports automatic labels, which are a view mapping + a revision marker, it's possible to map these to a Git tag under very limited criteria:

  1. The Label contains the field Revision with a value of @(integer)
  2. The Label contains a View with a path that fully contains the depot path being copied.
  3. The (integer) revision is a changelist that the transfer includes.

Because this label fully matches to a Git commit, a corresponding tag can be created off of the label.
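The three criteria can be sketched as a single check. The function below is hypothetical (label_tag_cl is not a p4-fusion or Perforce command), and it simplifies criterion 2 heavily: it assumes a label view with a single line ending in "..." and treats containment as a plain prefix match, whereas real label views can have several lines and exclusions.

```shell
#!/usr/bin/env bash
# label_tag_cl REVISION VIEW DEPOT_PATH CL...
# Prints the changelist number when the label could become a Git tag.
# Simplified sketch; see the assumptions stated above.
label_tag_cl() {
    local revision=$1 view=$2 depot_path=$3; shift 3
    local cl=${revision#@} candidate

    # 1. Revision must be "@" followed by digits only.
    case $revision in @*) ;; *) return 1 ;; esac
    case $cl in ''|*[!0-9]*) return 1 ;; esac

    # 2. The view must contain the depot path (prefix match on the
    #    parts before the trailing "...").
    case ${depot_path%...} in
        "${view%...}"*) ;;
        *) return 1 ;;
    esac

    # 3. The changelist must be one of the converted changelists.
    for candidate in "$@"; do
        if [ "$candidate" = "$cl" ]; then
            printf '%s\n' "$cl"
            return 0
        fi
    done
    return 1
}

label_tag_cl @1200 '//depot/...' '//depot/main/...' 1100 1200 1300
```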

Submit/push back to Perforce? / compatibility with git-p4?

As far as I understand, p4-fusion only allows to clone/convert a p4 repo into a git one, not to make changes from git and contribute back to p4, right?
Is the resulting repo then compatible with git-p4 (ie CL annotations in commit messages/notes), so we can use git-p4 to submit back?

Parsing CL from commit messages breaks after using bfg-repo-cleaner

I used https://github.com/rtyley/bfg-repo-cleaner to purge large files from a repo generated by p4-fusion. bfg adds Former-commit-id to the commit messages:

...
[p4-fusion: depot-paths = "<snip>": change = 43110426]

Former-commit-id: 9885e208a53eb53e0f691fed688eae5565a814fd

After that, p4-fusion is unable to parse the CL from commit message:

[ WARNING @ Main:177 ] Detected last CL committed as CL 814f
[ PRINT @ Main:180 ] Requesting changelists to convert from the Perforce server
[ ERROR @ HandleError:13 ] Received error: error Invalid changelist/client/label/date '@>814f'.

814f comes from Former-commit-id: 9885e208a53eb53e0f691fed688eae5565a814fd.

Workaround: run bfg with --private to omit Former-commit-id from commit messages.
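A parser that anchors on the whole p4-fusion tag line, rather than on the end of the message, would tolerate trailers like Former-commit-id. The sketch below shows this in shell with a hypothetical extract_last_cl function; it does not describe how p4-fusion currently parses the message, only one way the tag line shown above could be matched.

```shell
#!/usr/bin/env bash
# Read a commit message on stdin and print the changelist number from the
# last "[p4-fusion: depot-paths = "...": change = N]" line, if any.
extract_last_cl() {
    sed -n 's/^\[p4-fusion: depot-paths = ".*": change = \([0-9][0-9]*\)]$/\1/p' | tail -n 1
}

# Demo: a message rewritten by bfg, with a hypothetical depot path.
printf '%s\n' \
    '43110426 - Some change' \
    '[p4-fusion: depot-paths = "//depot/path/": change = 43110426]' \
    '' \
    'Former-commit-id: 9885e208a53eb53e0f691fed688eae5565a814fd' | extract_last_cl
```

Against a real repository the input could come from `git log -1 --format=%B`.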

Audit behaviour when file type changes between non-binary and binary are encountered with `--includeBinaries false`

p4-fusion, when run with the --includeBinaries false arguments, currently disregards binary file changes completely, even if the file was changed from text to a binary file or vice versa.

We need to have a consistent behavior when choosing to ignore binary file changes when the file types are changed in such a manner.

E.g.

Case 1: A file added as text, then change to a binary file in a subsequent CL

p4-fusion will disregard the change and will leave the file state as was in the repo when it was a "text" file.

Case 2: A file added as text, then changed to binary, and then deleted from the depot

p4-fusion will add the file as text initially but not remove it later on since it was intermediately changed to a binary file before it got deleted.

How to insert P4PASSWD in order to use p4-fusion?

Hello, I have been trying to use p4-fusion.

But, I don't know where I should insert a P4PASSWD value.

I tried setting it to the environment but it did not work.

I would appreciate it if you could let me know.

minor: exclude vendor/libgit2/tests

I was downloading this repo on a slow connection and wondered why it took so long.

$ du -hs vendor/libgit2/tests/
53M	vendor/libgit2/tests/

Minor issue, feel free to close out and ignore.

Client issues result in empty Git repos

If p4-fusion runs into a problem with a client spec while cloning, it will create Git commits from the Perforce changelists, but it will not sync any files into those commits.

Some examples are:

There are probably situations where errors before syncing can be ignored or retried, but for some errors, client validation being one of them, perhaps it should halt and exit with a non-zero code to indicate an error condition.

Optionally display a flag saying the file came from a proxy cache

When using p4-fusion with a Perforce proxy server, it is sometimes useful for performance debugging to know the proxy cache hit/miss rate while downloading files through p4 print.

This is usually done by adding a -Zproxyverbose tag in the p4 CLI:

p4 -Zproxyverbose sync

We can specify the same tag when we create the P4API contexts.

m_ClientAPI.SetProtocol("tag", "");

We should then be able to log that information through p4-fusion, either by printing it to stdout or by adding it to the commit message.

Support view mappings for branches

The docs show that this only works for single branches and even then, only branches containing single paths. Perforce allows branches to be located in any location, and even have different mappings for different branches.

Proposal: Allow the specification of a configuration file that describes the branches and the views used by them to sync:

[main]
//depot/main/config/my-component/... config/...
//depot/main/pkg/my-subcomponent-a/... pkg/my-subcomponent-a/...
//depot/main/pkg/my-subcomponent-b/... pkg/my-subcomponent-b/...

[dev]
//depot/comp/my-component/branches/dev/... ...
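A rough sketch of how such a file could be read, assuming the format proposed above: `[name]` starts a branch section and every other non-blank line is a `depot-path repo-path` pair separated by whitespace. `ParseBranchViews` and the type aliases are hypothetical, and paths containing spaces are not handled.

```cpp
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// One view line: depot path on the left, repository-relative path on the right.
using Mapping = std::pair<std::string, std::string>;
using BranchViews = std::map<std::string, std::vector<Mapping>>;

// Hypothetical parser for the proposed format (an assumption, not an
// existing p4-fusion feature).
BranchViews ParseBranchViews(const std::string& text)
{
    BranchViews views;
    std::string branch;
    std::istringstream lines(text);
    std::string line;

    while (std::getline(lines, line))
    {
        std::istringstream fields(line);
        std::string first;
        std::string second;

        if (!(fields >> first))
            continue; // blank line

        if (first.front() == '[' && first.back() == ']')
        {
            branch = first.substr(1, first.size() - 2);
            views[branch]; // register the branch even if no mappings follow
            continue;
        }

        // Skip mappings that appear before any section or lack a right side.
        if (branch.empty() || !(fields >> second))
            continue;

        views[branch].emplace_back(first, second);
    }
    return views;
}
```

Parsing the example above would yield two branches, `main` with three mappings and `dev` with one whose repository path is `...`.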

Cache user signatures outside the loop

Most of the time, signatures can be created once outside the commit loop and reused, because the userbase doesn't change during a single p4-fusion run.

The expected performance boost from this is very small, but it is something to test as well.
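The idea could look roughly like the sketch below. It is not tied to libgit2: `User`, `Identity`, `BuildIdentityMap` and `Lookup` are hypothetical stand-ins, and only the time-independent part of a signature (name and email) is cached, on the assumption that the commit timestamp still has to be supplied per changelist.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-ins for the Perforce user record and the
// time-independent part of a commit signature.
struct User { std::string id; std::string fullName; std::string email; };
struct Identity { std::string name; std::string email; };

using IdentityMap = std::unordered_map<std::string, Identity>;

// Built once, before the commit loop starts.
IdentityMap BuildIdentityMap(const std::vector<User>& users)
{
    IdentityMap identities;
    identities.reserve(users.size());
    for (const User& user : users)
        identities.emplace(user.id, Identity{user.fullName, user.email});
    return identities;
}

// Called inside the commit loop: a hash lookup instead of rebuilding the
// identity for every changelist. Unknown users fall back to `unknown`.
const Identity& Lookup(const IdentityMap& identities,
                       const std::string& userId,
                       const Identity& unknown)
{
    const auto it = identities.find(userId);
    return it != identities.end() ? it->second : unknown;
}
```

Whether this is measurable depends on how expensive the per-commit signature creation is compared with the rest of the commit step, which is what the test mentioned above would need to show.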

p4-fusion fails to build on macOS with recent Helix Core C/C++ API versions

Repro

  1. Set up the environment for building p4-fusion
  2. Download the Helix Core C/C++ API for ARM macOS, at least version 2023.2 (somewhat older versions might be affected, too)
  3. Build p4-fusion

Expected: The build succeeds
Actual: The build fails at the linking stage, citing many undefined references to curl and sqlite functions.

Interestingly, the issue does not occur on Linux (tried on Ubuntu 22.04 LTS).

The issue can be fixed by adding the p4script_curl and p4script_sqlite libraries here:

target_link_libraries(p4-fusion PUBLIC

I wonder if there are valid use cases for building the tool with older P4 SDK versions.

--includeBinaries option causes hang

When p4-fusion is run with --includeBinaries true, it hangs and won't download any files.

Full command is:

./build/p4-fusion/p4-fusion \
        --path //AAA/dev/... \
        --user $P4USER \
        --port $P4PORT \
        --client $P4CLIENT \
        --src clones/AAA/.git \
        --networkThreads 20 \
        --printBatch 100 \
        --lookAhead 2000 \
        --retries 10 \
        --refresh 100 \
        --includeBinaries true

And Log is:

[ PRINT @ Main:29 ] Running p4-fusion from: .[CENSORED]
[ SUCCESS @ InitializeLibraries:142 ] Initialized P4Libraries successfully
[ PRINT @ Main:72 ] Updated client workspace view [CENSORED]with 15 mappings
[ PRINT @ Main:126 ] Perforce Port: [CENSORED]
[ PRINT @ Main:127 ] Perforce User: [CENSORED]
[ PRINT @ Main:128 ] Perforce Client: [CENSORED]
[ PRINT @ Main:129 ] Depot Path: //[CENSORED]/[CENSORED]/...
[ PRINT @ Main:130 ] Network Threads: 20
[ PRINT @ Main:131 ] Print Batch: 100
[ PRINT @ Main:132 ] Look Ahead: 2000
[ PRINT @ Main:133 ] Max Retries: 10
[ PRINT @ Main:134 ] Max Changes: -1
[ PRINT @ Main:135 ] Refresh Threshold: 100
[ PRINT @ Main:136 ] Fsync Enable: 0
[ PRINT @ Main:137 ] Include Binaries: 1
[ PRINT @ Main:138 ] Profiling: 0
[ PRINT @ Main:139 ] Profiling Flush Rate: 1000
[ SUCCESS @ InitializeRepository:70 ] Initialized Git repository at clones/[CENSORED]/.git
[ PRINT @ Main:168 ] Requesting changelists to convert from the Perforce server
[ SUCCESS @ Main:180 ] Found 228 uncloned CLs starting from CL 7076 to CL 12754
[ PRINT @ Main:182 ] Creating 20 network threads
[ SUCCESS @ Main:184 ] Created 20 threads in thread pool
[ SUCCESS @ Main:204 ] Queued first 228 CLs up until CL 12754 for downloading
[ SUCCESS @ Main:207 ] Perforce server timezone is 60 minutes
[ SUCCESS @ Main:211 ] Received userbase details from the Perforce server
[ PRINT @ Main:216 ] Last CL to start downloading is CL 12754
[ WARNING @ CreateIndex:136 ] No HEAD commit was found. Created a fresh index.

Optionally omit colour codes from logs

When running p4-fusion remotely, the shell or the tool capturing its output may not render colour codes properly, so p4-fusion logs end up wrapped in raw escape sequences, as shown below.

\u001b[91m[ ERROR @ Run:113 ] Connection dropped or command errored, retrying in 5 seconds.\u001b[0m

We should provide an optional build flag that omits the colour codes from the logging macros in common.h:

#define ERR(x) \
std::cerr << "\033[91m" \
<< "[ ERROR @ " << __func__ << ":" << __LINE__ << " ] " \
<< x << "\033[0m" << std::endl
#define WARN(x) std::cerr << "\033[93m" \
<< "[ WARNING @ " << __func__ << ":" << __LINE__ << " ] " \
<< x << "\033[0m" << std::endl
#define SUCCESS(x) std::cerr << "\033[32m" \
<< "[ SUCCESS @ " << __func__ << ":" << __LINE__ << " ] " \
<< x << "\033[0m" << std::endl
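One way this could be wired up is sketched below. `P4_FUSION_NO_COLOUR`, `COLOUR` and `LOG_TO` are hypothetical names that do not exist in common.h; the sketch relies only on adjacent string literals being concatenated by the compiler.

```cpp
#include <iostream>
#include <sstream>

// P4_FUSION_NO_COLOUR is a hypothetical build flag (for example passed as
// -DP4_FUSION_NO_COLOUR). When it is defined, every colour code collapses
// to an empty string literal.
#ifdef P4_FUSION_NO_COLOUR
#define COLOUR(code) ""
#else
#define COLOUR(code) "\033[" code "m"
#endif

// Shared body for all log levels. `stream` can be any std::ostream.
#define LOG_TO(stream, colour, level, x) \
    (stream) << COLOUR(colour) \
             << "[ " level " @ " << __func__ << ":" << __LINE__ << " ] " \
             << x << COLOUR("0") << std::endl

#define ERR(x) LOG_TO(std::cerr, "91", "ERROR", x)
#define WARN(x) LOG_TO(std::cerr, "93", "WARNING", x)
#define SUCCESS(x) LOG_TO(std::cerr, "32", "SUCCESS", x)
```

Call sites stay unchanged, e.g. `ERR("Connection dropped, retrying in " << 5 << " seconds.");`. Because `LOG_TO` takes the stream as a parameter, the same macro can also write to a `std::ostringstream` when the output needs to be captured.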
