cloud-migration-tool's People

Contributors

hojo, joacchim, vrancurel


cloud-migration-tool's Issues

Report output in dest account

It would be practical to be able to retrieve the output report directly from the destination account/bucket.

Deb/RPM installers

A CPack configuration should be added to the CMakeLists so that both installers can be produced.
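
A minimal sketch of what such a CPack configuration could look like; the package name, version, contact, and dependency below are illustrative placeholders, not the project's actual values:

set(CPACK_GENERATOR "DEB;RPM")
set(CPACK_PACKAGE_NAME "cloudmig")
set(CPACK_PACKAGE_VERSION "0.1.0")
set(CPACK_PACKAGE_CONTACT "maintainer@example.com")   # required by the DEB generator
set(CPACK_DEBIAN_PACKAGE_DEPENDS "libdroplet")        # hypothetical dependency name
include(CPack)                                        # must come after the set() calls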

Improve Cloudmig-view testability

We need a dummy cloudmig able to load and run scripted scenarios, so that cloudmig-view can be tested easily and quickly. This should be part of an effort to improve the testability and stability of the tools.

Improve ETA

Currently, the ETA measures the last few seconds of bandwidth to estimate how much data can still be transferred. It does not take into account:

  • bandwidth variation/evolution over a timespan larger than 10 seconds,
  • migrations of small objects, which induce a lot of latency (and thus reduce bandwidth).

We should try to find an efficient ETA algorithm, make it testable, run scenarios with it, and replace the current, unreliable one.
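
One possible direction, sketched below, is an exponentially weighted moving average of the bandwidth: older samples get a decaying weight, so the estimate tracks long-term evolution instead of only the last few seconds. This is not the current implementation, and ALPHA is a tuning assumption.

#include <stdint.h>

#define ALPHA 0.1 /* smoothing factor: smaller = longer memory */

static double ewma_bw = 0.0; /* smoothed bandwidth, bytes per second */

/* Feed one bandwidth sample (bytes transferred during the last tick). */
void eta_sample(uint64_t bytes_this_tick, double tick_seconds)
{
    double sample = (double)bytes_this_tick / tick_seconds;

    if (ewma_bw == 0.0)
        ewma_bw = sample;                          /* seed with first sample */
    else
        ewma_bw = ALPHA * sample + (1.0 - ALPHA) * ewma_bw;
}

/* Estimated seconds remaining, or -1 if no throughput observed yet. */
double eta_seconds(uint64_t bytes_remaining)
{
    return ewma_bw > 0.0 ? (double)bytes_remaining / ewma_bw : -1.0;
}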

Add Status building to the view

Currently, the user can only visualize the transfer of the files.
We want to add status building to the view as well, so the user can tell how far along the scan is and verify that nothing is blocked.

ACL management : pushing raw ACL XML

We should provide an option to attempt pushing the raw ACL XML to the destination.
In case of failure, fall back to pushing the ACL in a separate file with the extension .acl.
Thus, the file Test.txt would either have its ACL set, or gain a companion file Test.txt.acl containing the ACL's XML.

cloudmig fails to run in concurrent, simultaneous instances

Cloudmig will fail if run twice simultaneously in two different bash shells: only one instance will succeed.
If I add a slight delay (even 0.1 seconds) between cloudmig runs, everything works perfectly fine. The problem occurs specifically when they run simultaneously.

I proved this out with the following script & configurations:
https://gist.github.com/mlaurie/8d8011aaeaad03a41e6e

If you don't get the error immediately, try running several times. It will error pretty commonly.

There is one primary error that cloudmig gives:

Unexpected exception: Command 'cloudmig -c /tmp/testConfig1.json' returned non-zero exit status 1
cloudmig:6611:[INFO][Loading Profiles]: Starting...
cloudmig:6611:[INFO][Loading Profiles]: Profiles loaded with success.
cloudmig:6611:[INFO][Loading Status] Starting status loading...
cloudmig:6611:[ERR][Loading Status/Exists] Could not list open status store path(opendir): DPL_ENOENT
cloudmig:6611:[INFO][Creating Status Store]  Status Store not found. Creating...
mkdir: File exists
cloudmig:6611:[ERR][Creating Status Store] Could not create store(directory): DPL_EEXIST
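
The ENOENT from the listing followed by the EEXIST from mkdir suggests a check-then-create race: both instances see the status store missing, and the slower mkdir loses. A minimal sketch of a race-free creation, assuming the store is a local directory (as with the posix status backend; the function name is illustrative):

#include <errno.h>
#include <sys/stat.h>

/* Create the status store if needed; treat a concurrent creation as success. */
int ensure_status_store(const char *path)
{
    if (mkdir(path, 0700) == 0)
        return 0;            /* we created it */
    if (errno == EEXIST)
        return 0;            /* another instance won the race: that's fine */
    return -1;               /* genuine failure */
}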

Option to choose status bucket

An option to specify the name of the status bucket would prevent problems caused by bucket-name limitations (255 characters, etc.).
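
A hypothetical shape for such an option, following the configuration format used elsewhere in this tracker ("status-bucket" is an illustrative key, not an implemented one):

{
    "cloudmig": {
        "status-bucket": "my-migration-status"
    }
}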

delete_source option fails on s3

On POSIX, the deletion succeeds, but the run finishes with a segfault.
On S3, the deletion fails, and the run also finishes with a segfault.

On S3, attempting to use the "delete_source" configuration parameter results in failure and a segfault
(possibly due to the leading-dot prefix in the looked-up hostname, i.e. an empty bucket name?):

cloudmig -c configuration.json
....skipped output here....
cloudmig:7433:[INFO][Migrating] : file 100Kfile.txt migrated.
cloudmig:7426:[INFO]Uploading digest: 1/0 objs, 1/0 bytes
cloudmig:7426:[INFO][Uploading Status Digest]  Uploaded digest: { "objects": 1, "done_objects": 1, "bytes": 102400, "done_bytes": 102400 }
cloudmig:7426:[INFO]Migration finished with success !
cloudmig:7426:[INFO][Deleting Source]: Starting deletion of the migration's source...
error: src/conn.c:533: dpl_conn_open_host: Failed to lookup hostname ".s3.amazonaws.com": Unknown server error
error: src/conn.c:533: dpl_conn_open_host: Failed to lookup hostname ".s3.amazonaws.com": Unknown server error
cloudmig:7426:[ERR][Deleting Source File] Could not delete the file 100Kfile.txt : DPL_FAILURE.
Segmentation fault (core dumped)

configuration.json:

{
    "source": {
        "backend": "s3",
        "aws_region": "us-east-1",
        "host": "s3.amazonaws.com",
        "access_key": "mykey",
        "secret_key": "mysecret"
    },
    "destination": {
        "backend": "s3",
        "aws_region": "us-east-1",
        "host": "s3.amazonaws.com",
        "access_key": "mykey",
        "secret_key": "mysecret"
    },
    "status": {
        "backend": "posix",
        "base_path": "/tmp/statusDir1"
    },
    "cloudmig": {
        "buckets": {"srcbucket:/": "dstbucket:/"},
        "delete-source": true,
        "worker-threads": 10,
        "create-directories": true
    }
}
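
The failed lookups for ".s3.amazonaws.com" above suggest the bucket name is empty when the deletion path builds the virtual-host style hostname. A defensive sketch of the idea (the function and parameters are illustrative, not droplet's actual API):

#include <stdio.h>

/* Refuse to build "<bucket>.<base>" from an empty bucket name, which would
 * otherwise yield the unresolvable ".s3.amazonaws.com" seen above. */
int build_host(char *dst, size_t size, const char *bucket, const char *base)
{
    if (bucket == NULL || bucket[0] == '\0')
        return -1;
    snprintf(dst, size, "%s.%s", bucket, base);
    return 0;
}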

Output to a log file

When in background-mode, the tool should by default output everything to a file.
An option to choose the output file would be convenient.

Background Mode option

An option should exist to tell the tool to run in background and not block the user's terminal during the transfer.

Logging : syslog

The application is to be used mainly by system administrators.
Instead of giving them an unstructured log on standard output, we should log everything that happens through syslog.
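
A minimal sketch using the standard syslog(3) API; the ident string and facility below are assumptions:

#include <syslog.h>

void log_init(void)
{
    openlog("cloudmig", LOG_PID, LOG_DAEMON);   /* prepend ident and PID */
}

void log_event(int priority, const char *msg)
{
    syslog(priority, "%s", msg);                /* e.g. LOG_INFO, LOG_ERR */
}

void log_close(void)
{
    closelog();
}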

Add Files deletion to the view

When a migration is done, it is possible to delete the source files, but nothing is shown in the viewer, so the user cannot see how the deletion is progressing.

ACL management : pushing raw ACL in a separate file

We should provide an option to push the raw ACL XML into a separate file.
For instance, the file foo.txt would be transferred, and the new file foo.txt.acl would contain the file's ACL XML.

This would be the default fallback in case the option described in ticket #8 fails.

End of migration report

The tool should print a final summary line at the end of the migration with the stats of the whole run:

  • number of files transferred,
  • number of files not transferred (failures),
  • total size of the transfer,
  • average speed,
  • time it took.

Multi-thread the source files deletions

The source files' deletion is currently driven by the main thread, while it could benefit greatly from having all the available threads delete files concurrently, as sketched below.
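
A minimal sketch of the idea, assuming a shared work queue guarded by a mutex; all names, including the per-file helper delete_one_file, are illustrative:

#include <pthread.h>
#include <stddef.h>

struct delete_queue {
    char            **paths;  /* paths still to delete */
    size_t          count;
    size_t          next;     /* index of the next path to take */
    pthread_mutex_t lock;
};

void delete_one_file(const char *path);  /* hypothetical per-file deletion */

/* Each worker pops paths until the queue is drained. */
void *delete_worker(void *arg)
{
    struct delete_queue *q = arg;

    for (;;) {
        pthread_mutex_lock(&q->lock);
        if (q->next >= q->count) {
            pthread_mutex_unlock(&q->lock);
            return NULL;
        }
        const char *path = q->paths[q->next++];
        pthread_mutex_unlock(&q->lock);
        delete_one_file(path);
    }
}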

Build error

I am getting build errors when running make after the build files have been generated by the CMake scripts. The error is attached in the snapshot (make_error).

Add an option to choose whether to encrypt on the fly

For now, at most, the application uses the DPL_VFILE_MD5 flag, which allows checking the file's integrity but doesn't prevent anyone from reading the data in transit.

An option to select on-the-fly encryption would be a good idea.

Option for source data removal

Currently, the source data is automatically removed at the end of the transfer.
It should not be removed by default, but only when an option is given to the program: --delete-source

Permission for the viewer

Currently, the viewer can see any ongoing migration, whether it was started by the same user or not.
Setting permissions on the directories containing the socket file should be enough to prevent this behavior and limit the viewer to what it should access.

Improve the verbose option

Currently, there are three verbosity modes:

  • DEBUG (with the verbose option; includes FULL droplet library tracing)
  • INFO (default)
  • WARN (quiet mode)

It would be nice to choose whether to activate the droplet library tracing or not, with a more flexible option.

Droplet backend support

Currently, the tool is too rigid in the way it supports droplet's multiple backends.
For instance, the POSIX droplet backend cannot be supported because it does not provide any bucket functionality, a feature on which the internals of cloudmig rely heavily.

The solution may lie in supporting migration from a directory to a bucket, from a bucket to a directory, and between directories within buckets.

Add an option to deactivate transfer for specific file types

When migrating data between two backends whose feature support differs, we would like to be able to tell cloudmig not to try to migrate certain kinds of files.
For instance, the S3 backend does not support symlinks; we would therefore like to skip every symlink during a migration to this backend, suppressing the errors symlink migrations would produce.
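
A hypothetical shape for such an option, following the configuration format used elsewhere in this tracker ("exclude-types" is an illustrative key, not an implemented one):

{
    "cloudmig": {
        "exclude-types": ["symlink"]
    }
}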

Build: libmenu finding

Currently, libmenu's path is hard-coded in the viewer's CMakeLists instead of being retrieved by a find-package call.
We need one to ensure the library can be found reliably.
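
CMake ships no FindMenu module, so one option is a find_library/find_path pair. A minimal sketch; the cloudmig-view target name is an assumption:

# Locate libmenu instead of hard-coding its path.
find_library(MENU_LIBRARY NAMES menu)
find_path(MENU_INCLUDE_DIR NAMES menu.h PATH_SUFFIXES ncurses)
if(NOT MENU_LIBRARY OR NOT MENU_INCLUDE_DIR)
    message(FATAL_ERROR "libmenu not found")
endif()
target_include_directories(cloudmig-view PRIVATE ${MENU_INCLUDE_DIR})
target_link_libraries(cloudmig-view ${MENU_LIBRARY})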

Identification of migration in cloudmig-view

Currently, cloudmig-view displays the process ID, which is not meaningful to the user.

It should display the URIs of the source and the destination.
To do that, the main tool could write a simple one-line file in the same directory as the socket file, to be read by the viewer.

Viewer does not compile in 64bits

Many integer variables are declared as uint64_t and printed with the format %llu.
On 64-bit platforms where uint64_t is unsigned long, this should be %lu, so a portable solution is needed to make the viewer compile.
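
A minimal sketch of the standard fix: the PRIu64 macro from <inttypes.h> expands to the correct conversion specifier for the platform, so neither format string has to be hard-coded.

#include <inttypes.h>
#include <stdio.h>

void print_count(uint64_t n)
{
    /* PRIu64 is "llu" or "lu" depending on the platform's uint64_t. */
    printf("transferred %" PRIu64 " bytes\n", n);
}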

Multipart upload not working: intermediate status cannot be saved

There is an error in the computation of the paths of a status directory and its temporary status files: the name of the directory created for a specific bucket's status differs from the directory name used in the path computed for the temporary status files.

All of this happens in status_bucket.c

ACL management : default behaviour

ACLs are still not managed within the application: everything is created with the canned ACL PRIVATE.

By default, the application should at least reproduce the source's canned ACL, falling back to PRIVATE on failure.

Migration might overwrite files without user's consent

When doing a "merge"-type migration (merging a source directory into an existing, in-use target directory), any name-conflicting file might be overwritten without warning.

We might want to prevent that by default but add an option to force the overwrite.

Using delete_source parameter also deletes the status files

When using the "delete_source" configuration parameter, the status files are also deleted when the test completes.

While the migration is running, the status files are created properly and populated. But when the migration completes, it appears those status files are also removed, even though they are in a different location (posix:/tmp/statusDir1) than the migration-source objects (s3:srcbucket:/).

Expected:

/tmp/statusDir1/cloudmig.s3.amazonaws.com.to.s3.amazonaws.com/
mlz1%3a%2f  
mlz1%3a%2f.json

Actual:

/tmp/statusDir1/


config.json

{
    "source": {
        "backend": "s3",
        "aws_region": "us-east-1",
        "host": "s3.amazonaws.com",
        "access_key": "mykey",
        "secret_key": "mysecret"
    },
    "destination": {
        "backend": "s3",
        "aws_region": "us-east-1",
        "host": "s3.amazonaws.com",
        "access_key": "mykey",
        "secret_key": "mysecret"
    },
    "status": {
        "backend": "posix",
        "base_path": "/tmp/statusDir1"
    },
    "cloudmig": {
        "buckets": {"srcbucket:/": "dstbucket:/"},
        "delete-source": true,
        "worker-threads": 10,
        "create-directories": true
    }
}

delete_source does not delete the source

Using the delete_source configuration option does not seem to delete the source.
It fails with 'Failed to lookup hostname ".s3.amazonaws.com"'.

Error output:

cloudmig:5944:[INFO][Migrating] File '100Kfile.txt' transfer succeeded !
unlink: No such file or directory
cloudmig:5944:[INFO][Migrating] : file 100Kfile.txt migrated.
cloudmig:5937:[INFO]Uploading digest: 1/0 objs, 1/0 bytes
cloudmig:5937:[INFO][Uploading Status Digest]  Uploaded digest: { "objects": 1, "done_objects": 1, "bytes": 102400, "done_bytes": 102400 }
cloudmig:5937:[INFO]Migration finished with success !
cloudmig:5937:[INFO][Deleting Source]: Starting deletion of the migration's source...
error: src/conn.c:533: dpl_conn_open_host: Failed to lookup hostname ".s3.amazonaws.com": Unknown server error
error: src/conn.c:533: dpl_conn_open_host: Failed to lookup hostname ".s3.amazonaws.com": Unknown server error
cloudmig:5937:[ERR][Deleting Source File] Could not delete the file 100Kfile.txt : DPL_FAILURE.
cloudmig:5937:[INFO][Deleting Source]: Deletion of the migration's source done.
cloudmig:5937:[STATUS]End of data migration. During this session :
    Transfered 1 objects, totaling 1/1 objects.
    Transfered 102400 Bytes, totaling 102400/102400 Bytes.
    Average transfer speed : 102400 Bytes/s.
    Transfer Duration : 0d0h0m1s.

configuration.json

{
    "source": {
        "backend": "s3",
        "aws_region": "us-east-1",
        "host": "s3.amazonaws.com",
        "access_key": "mykey",
        "secret_key": "mysecret"
    },
    "destination": {
        "backend": "s3",
        "aws_region": "us-east-1",
        "host": "s3.amazonaws.com",
        "access_key": "mykey",
        "secret_key": "mysecret"
    },
    "status": {
        "backend": "posix",
        "base_path": "/tmp/statusDir1"
    },
    "cloudmig": {
        "buckets": {"srcbucket:/": "dstbucket:/"},
        "delete-source": true,
        "worker-threads": 10,
        "create-directories": true
    }
}

Properly manage ^C Interruption

The tool does not properly handle a ^C interruption, possibly corrupting some files (statuses mostly).
We should handle it properly and perform a clean stop in this case.
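
A minimal sketch of the usual pattern, assuming the workers poll a flag between objects (names are illustrative): the handler only records the request, and the transfer loops flush a consistent status before exiting.

#include <signal.h>
#include <string.h>

static volatile sig_atomic_t stop_requested = 0;

/* Async-signal-safe: just record the request. */
static void on_sigint(int sig)
{
    (void)sig;
    stop_requested = 1;
}

void install_sigint_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_sigint;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGINT, &sa, NULL);
}

/* Worker loops then check stop_requested between objects and write out a
 * consistent status before returning. */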

Change the way the names are stored

Currently, the names in the status files are stored with an exact length.
If the length is 0 mod 4, there is no NUL terminating character.
We should improve the binary status-file format to make manipulating those files easier. (That would probably help clean up the code too.)
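
One possible sketch of a friendlier encoding: keep the explicit length, but always pad the stored name with at least one NUL byte up to the 4-byte boundary, so every entry is directly usable as a C string (the surrounding entry layout is an assumption).

#include <stdint.h>
#include <string.h>

/* Bytes occupied by a stored name of `len` characters: at least one NUL,
 * rounded up to a 4-byte boundary. */
static uint32_t stored_name_size(uint32_t len)
{
    return (len + 1 + 3) & ~(uint32_t)3;
}

void store_name(char *dst, const char *name, uint32_t len)
{
    memcpy(dst, name, len);
    memset(dst + len, 0, stored_name_size(len) - len);  /* NUL padding */
}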

success reported on failure when destination inaccessible

When the destination isn't directly accessible, cloudmig reports a successful migration even though no files were transferred.
The transfer fails properly if the source is not accessible; the problem only occurs when the destination is not accessible.

Example1:
When cloudmig receives HTTP-403 during posix->RS2 migration, it results in:

cloudmig:3476:[INFO][Migrating] File 'newfile.txt' transfer failed !
cloudmig:3476:[ERR][Migrating] : Could not migrate file newfile.txt
cloudmig:3469:[INFO]Uploading digest: 0/0 objs, 1/0 bytes
cloudmig:3469:[INFO][Uploading Status Digest]  Uploaded digest: { "objects": 1, "done_objects": 0, "bytes": 1024, "done_bytes": 0 }
cloudmig:3469:[INFO]Migration finished with success !

Example2:
When cloudmig receives HTTP-307 during S3->S3 migration, it results in:

cloudmig:2545:[INFO][Migrating] File '100Kfile.txt' transfer failed !
cloudmig:2545:[ERR][Migrating] : Could not migrate file 100Kfile.txt
cloudmig:2538:[INFO]Uploading digest: 0/0 objs, 1/0 bytes
cloudmig:2538:[INFO][Uploading Status Digest]  Uploaded digest: { "objects": 1, "done_objects": 0, "bytes": 102400, "done_bytes": 0 }
cloudmig:2538:[INFO]Migration finished with success !
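
The digests above already carry the needed counters, so one sketch of a fix is to derive the final status from them instead of reporting success unconditionally. The struct and function below are assumptions; the field names follow the digest printed in the logs.

#include <stdint.h>
#include <stdio.h>

struct digest {
    uint64_t objects;       /* "objects" in the uploaded digest */
    uint64_t done_objects;  /* "done_objects" in the uploaded digest */
};

/* Return the process exit status implied by the final digest. */
int migration_exit_status(const struct digest *d)
{
    if (d->done_objects < d->objects) {
        fprintf(stderr, "Migration finished with %llu failed object(s)\n",
                (unsigned long long)(d->objects - d->done_objects));
        return 1;
    }
    printf("Migration finished with success!\n");
    return 0;
}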

Transfer of files inside directories is buggy

Currently, because of libdroplet's vfile and vdir APIs, a limitation occurs:
if a file whose name contains a directory-like path is to be transferred, the vfile API checks whether a file with the directory's name exists.
e.g.

The file 'bucket:/directory/foo.txt' requires a file named 'bucket:/directory/' to exist in order to be transferred.

Sadly, this cannot be expected from every provider/account, and it must be possible to transfer a file without this limitation.

cannot double migrate 1 source bucket

A single source bucket cannot be migrated into two different destination buckets.
Below is a sample configuration that might be used for this scenario (the buckets section is the important part here):

{
    "cloudmig": {
        "buckets": {"srcbucket1:/": "dstbucket1:/", "srcbucket1:/": "dstbucket2:/"},
        "create-directories": true
    }
}

Expected results:

  • srcbucket1 is migrated to dstbucket1
  • AND
  • srcbucket1 is migrated to dstbucket2

Actual results:

  • srcbucket1 is migrated to dstbucket1
  • OR
  • srcbucket1 is migrated to dstbucket2

This is likely because JSON object keys must be unique, so the parser keeps only one of the duplicate "srcbucket1:/" entries. A workaround is to migrate the same source twice, using two different bucket configurations, as sketched below.
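
Concretely, the workaround amounts to two runs over two configuration files that differ only in the bucket mapping (file names are illustrative; other sections omitted for brevity):

First run (config1.json):

{
    "cloudmig": {
        "buckets": {"srcbucket1:/": "dstbucket1:/"},
        "create-directories": true
    }
}

Second run (config2.json):

{
    "cloudmig": {
        "buckets": {"srcbucket1:/": "dstbucket2:/"},
        "create-directories": true
    }
}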

Resume option

Currently, the tool resumes a migration by default.

To avoid the risk of starting two migrations on the same source account at the same time, it is preferable to have an option that forces resuming the migration.

By default, it should not do anything if a migration is currently ongoing.

destination buckets name

Currently, whenever a destination bucket name is given, the tool does not check whether the bucket already exists before creating it.
This results in an error while creating the destination bucket in cases where it could simply use the existing one.

Add some usage documentation

Currently, there is no proper documentation on how to use the tool:

  • What parameters are mandatory? -> source, destination, cloudmig/buckets
  • What can help explain an issue? -> the assumptions behind the different configuration variables (e.g. cloudmig/buckets assumes a trailing / for a directory; otherwise the names are just prepended), etc.

Also, some basic example configurations might be useful, even a simple configuration file with all the default values set in. It would be almost like launching cloudmig with the minimal number of parameters, but with everything written explicitly.
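
A minimal example built from the mandatory parameters listed above, mirroring the configurations attached to other issues in this tracker (credentials are placeholders):

{
    "source": {
        "backend": "s3",
        "aws_region": "us-east-1",
        "host": "s3.amazonaws.com",
        "access_key": "mykey",
        "secret_key": "mysecret"
    },
    "destination": {
        "backend": "s3",
        "aws_region": "us-east-1",
        "host": "s3.amazonaws.com",
        "access_key": "mykey",
        "secret_key": "mysecret"
    },
    "status": {
        "backend": "posix",
        "base_path": "/tmp/statusDir"
    },
    "cloudmig": {
        "buckets": {"srcbucket:/": "dstbucket:/"}
    }
}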

Verbose : droplet trace option

An option should be added to activate parts of the droplet library's tracing options at will.
Something like OpenBSD's ktrace -t option would be nice (that is, -t ihce, with each letter enabling a specific tracing option).

Migration between buckets

Currently, the tool only migrates an entire account's content to another account.

We need an option to migrate only between two buckets.

Multithreading

Most of the code is planned to be multithreaded one day or another, so let's just do that.

Transfer configuration

An option allowing a configuration file to be given instead of command-line arguments, written as --config-file=config_file.

It would contain information about:

  • the source account (replacing the droplet source profile),
  • the destination account (replacing the droplet destination profile),
  • a list of source/destination bucket associations.

Every command-line argument should have an equivalent in this file.

Force restart option

A resume option is to be added to the tool, but there is no way to ignore the status bucket and force restarting the transfer.
Such an option may prove useful.
