wrench-project / wrench

WRENCH: Cyberinfrastructure Simulation Workbench

Home Page: https://wrench-project.org

License: GNU Lesser General Public License v3.0

CMake 0.89% C++ 84.53% C 0.09% Shell 0.03% HTML 14.39% Python 0.08%
batch-job distributed-computing distributed-systems hpc reproducible-research scheduling-simulator scientific-workflows simulation-framework simulation-modeling workflow workflow-management-system workflow-simulator

wrench's People

Contributors

code-factor, erick-orozco-ciprian, frs69wq, gjethwani, henricasanova, james-oeth, jesse-mcdonald, lpottier, mesurajpandey, pfdutot, rafaelfsilva, rileymiyamoto, rsreds, ryantanaka, spenceralbrecht, sukaryo-heilscher, wanyuzha, willkoch


wrench's Issues

BatchService: prediction error with batsched?

It seems that if one asks for a queue wait time prediction multiple times with the same key, then we get a "job already in the system" error. For instance:

Assertion '_jobs.count(job_id) == 0' failed (ERROR)
in file json_workload.cpp, line 62
function: void Workload::add_job_from_json_object(const Value &, const string &, double)
with message: Job 'config_XXXX' already exists in the Workload

If we generate distinct keys, then we don't get that error. So it's as if those prediction jobs are actually inserted into the workload... Is this a WRENCH issue or a BATSCHED issue?
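Until the root cause is found, a workaround along the lines described above is to never reuse a prediction key. A minimal sketch (the helper name is hypothetical, not a WRENCH API) appends a monotonically increasing counter to the user-supplied base key:

```cpp
#include <atomic>
#include <string>

// Hypothetical helper: make each queue-wait-time prediction key unique so
// that batsched never sees the same fake job id twice.
std::string makeUniquePredictionKey(const std::string &base) {
  static std::atomic<unsigned long> counter{0};
  return base + "_" + std::to_string(counter++);
}
```

With this, two predictions for the same configuration still produce distinct batsched job ids, which avoids the assertion failure.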

Needed Development: SimpleStorageService using S4U Storage

Update the implementation of the SimpleStorageService so that it uses the Storage abstraction provided by S4U, whenever available/documented. One issue to pay attention to is the pipelining of network transfers and disk writes (a pure "store-and-forward" approach is really not realistic).

Feature Request: Augment the set of simulation events in the simulation output

What simulation "events" to add:

  • Task start
  • Task failure
  • File copy begins
  • File copy ends
  • File copy failure

One concern is that with our current design we will sprinkle "add timestamp" calls everywhere in the code... is there a better way?

We should also augment the WorkflowTask object to keep track of detailed time info. For starters:

  • task_start_date (already in there)
  • task_computation_start_date
  • task_computation_end_date

Implement a Workflow::getReadyClusters()

If one doesn't use task clusters, it's annoying to get a map of cluster IDs when doing a Workflow::getReadyTasks(). So we should simply have getReadyTasks() return a vector of tasks, and getReadyClusters() a map of cluster IDs. A "ready cluster" is a cluster that contains only ready tasks.
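The proposed semantics can be sketched as follows (using a stand-in Task struct rather than the real wrench::WorkflowTask, so the names here are illustrative only): group tasks by cluster ID, then keep only the clusters in which every task is ready.

```cpp
#include <map>
#include <string>
#include <vector>

// Stand-in for wrench::WorkflowTask (illustrative only)
struct Task {
  std::string cluster_id;
  bool ready;
};

// A "ready cluster" contains ONLY ready tasks; tasks with no cluster id
// would be returned by getReadyTasks() instead and are skipped here.
std::map<std::string, std::vector<Task *>> getReadyClusters(std::vector<Task> &tasks) {
  std::map<std::string, std::vector<Task *>> clusters;
  std::map<std::string, bool> all_ready;
  for (auto &t : tasks) {
    if (t.cluster_id.empty()) continue;  // not in a cluster
    clusters[t.cluster_id].push_back(&t);
    auto it = all_ready.find(t.cluster_id);
    all_ready[t.cluster_id] = (it == all_ready.end() ? true : it->second) && t.ready;
  }
  // Drop clusters that contain at least one non-ready task
  for (auto it = clusters.begin(); it != clusters.end();) {
    if (!all_ready[it->first]) it = clusters.erase(it);
    else ++it;
  }
  return clusters;
}
```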

General Compute Service API refactoring

At the moment, every compute service has two boolean arguments ("support pilot jobs", "support standard jobs"), a double argument (the scratch space size), and a plist (property list). Wouldn't it make more sense to make ALL these arguments part of the plist, so that a compute service would only take a hostname and a list of compute resources as arguments, plus an optional plist?
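The refactoring could look like the sketch below (class and property names are hypothetical, not the actual WRENCH API): the constructor takes only a hostname, compute resources, and an optional property list, with defaults filled in for absent properties.

```cpp
#include <map>
#include <string>
#include <utility>

using PropertyList = std::map<std::string, std::string>;

// Illustrative sketch of the proposed constructor signature
struct ComputeServiceSketch {
  ComputeServiceSketch(const std::string &hostname,
                       const std::map<std::string, int> &compute_resources,
                       PropertyList plist = {})
      : plist(std::move(plist)) {
    // Defaults apply when a property is absent from the plist
    if (this->plist.find("SUPPORTS_STANDARD_JOBS") == this->plist.end())
      this->plist["SUPPORTS_STANDARD_JOBS"] = "true";
    if (this->plist.find("SUPPORTS_PILOT_JOBS") == this->plist.end())
      this->plist["SUPPORTS_PILOT_JOBS"] = "false";
    if (this->plist.find("SCRATCH_SPACE_SIZE") == this->plist.end())
      this->plist["SCRATCH_SPACE_SIZE"] = "0";
  }
  PropertyList plist;
};
```

This keeps the mandatory constructor arguments down to what truly identifies the service, while everything tunable lives in one place.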

Feature Request:"copy and register", "delete and unregister" abilities

It would be good to augment the DataMovementManager and other components (e.g., job executors perhaps) with the option to do a combined "create/copy a file AND add an entry in the FileRegistryService". Similarly, when removing a file from a storage service, it would be good to have a "remove and unregister". The objective is for a WMS developer who wants everything registered to not have to do tons of explicit, separate register/unregister operations.

First step:

  • Modify the StorageService::deleteFile() API to take in a FileRegistryService pointer. If a non-nullptr is passed, then update the file registry service ONLY if the delete operation is successful
  • Modify the DataMovementManager (synchronous and asynchronous) to take in a FileRegistryService pointer. If a non-nullptr is passed, then update the file registry service ONLY if the copy operation is successful.
    • This will require adding a "pending file copies" data structure to the DataMovementManager
    • Algorithm is:
      • For each file copy request keep track of "update file registry service" or "don't update"
      • When receiving a "file copy done or failed" message:
        • If not failed: (i) check whether file registry service should be updated; (ii) if yes, update it.
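The "pending file copies" bookkeeping described in the steps above can be sketched as follows (all names are hypothetical; a real implementation would call into the actual FileRegistryService instead of returning a boolean):

```cpp
#include <map>
#include <string>

// Sketch of the DataMovementManager bookkeeping: for each in-flight copy,
// remember whether the FileRegistryService should be updated on success.
struct DataMovementSketch {
  std::map<std::string, bool> pending_copies;  // copy id -> update registry?

  void initiateCopy(const std::string &copy_id, bool update_registry) {
    pending_copies[copy_id] = update_registry;
  }

  // Called on a "file copy done or failed" message.
  // Returns true iff the registry should be (and notionally is) updated.
  bool onCopyDoneOrFailed(const std::string &copy_id, bool failed) {
    auto it = pending_copies.find(copy_id);
    if (it == pending_copies.end()) return false;  // unknown copy
    bool update = it->second;
    pending_copies.erase(it);
    if (failed) return false;  // registry updated ONLY on success
    return update;             // here: registry->addEntry(file, storage)
  }
};
```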

BatchService: what about RAM?

At the moment, the way in which the BatchService is handling RAM is strange:
[DONE] Updated the constructor to handle heterogeneity

  • IGNORE RAM constraints (and document this)

Scratch space isolation between standard jobs

We need to clarify semantics for scratch space. Here is the proposal:

  • a compute service has a single scratch storage of a given size specified at construction time
  • When a standard job runs on the compute service (NOT within a pilot job), that standard job has its own temporary "directory" in scratch. As a result, the same file could be stored multiple times in scratch, each copy for a different standard job. This requires that a scratch storage service provide a bit more functionality than a normal storage service.
  • When standard jobs run within a pilot job, these standard jobs share a single scratch temporary directory (and if one of them wipes out a file another needs, then too bad: that's the WMS's fault).

So, in a nutshell, we need to extend the StorageService and/or SimpleStorageService API and implementation to include a "temp directories" abstraction, to be defined.
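A minimal sketch of that "temp directories" abstraction, assuming files are keyed by (job ID, file name) so that the same file can be stored once per standard job (all names here are hypothetical):

```cpp
#include <map>
#include <set>
#include <string>

// Sketch of per-job temp directories in scratch: the same file name can
// live in scratch multiple times, once per job.
struct ScratchSketch {
  std::map<std::string, std::set<std::string>> dirs;  // job id -> file names

  void store(const std::string &job_id, const std::string &file) {
    dirs[job_id].insert(file);
  }
  bool lookup(const std::string &job_id, const std::string &file) const {
    auto it = dirs.find(job_id);
    return it != dirs.end() && it->second.count(file) > 0;
  }
  // Implicit cleanup when a standard job (not within a pilot job) ends
  void wipeJob(const std::string &job_id) { dirs.erase(job_id); }
};
```

Pilot jobs would map all of their standard jobs to a single shared key, giving the "shared directory" semantics above.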

Software Engineering: use "const" more :)

In the current code, there is very little use of the const keyword, even though this is really a great feature of C++. As we go forward, adding const here and there is a good thing.

Weird Doxygen problem

The "internal documentation" shows the "...Event" classes outside the wrench namespace, which is not correct. The "developer" documentation, however, shows these classes correctly inside the namespace. Not sure what's happening here....

Needed Development: Integration of BatchService and BatSched (from BATSIM)

Batsched integration milestones:

  • Make sure that the integration works with the updated Batsched protocol (waiting for confirmation from the batsched people that the protocol documentation on github is up to date)

  • Using the wrench or fast_conservative branch of Batsched on GitLab, implement the QUERY/ANSWER feature

  • Modify the current wrench::BatchService API to add a getQueueWaitingTimeEstimate() function. That function will handle all messaging with the batch service. Once that's done, remove the ServiceInformationMessage handling in the Job Manager.

Feature Request: scratch space for compute services?

It would be useful to have a notion of scratch space for each compute service. Motivation: Files can be implicitly deleted from scratch.

ComputeService:

  • The constructor should always have an optional argument: the scratch space size in bytes
    • if specified: create a storage service (not visible to the whole world) that is "attached" to the compute service
  • NO MORE Default Storage Service

StandardJob:

  • "pre file copies":
    • Copies CAN BE to scratch (if there is some), even though scratch is not visible from the outside
  • tasks:
    • If a task is told to read/write a file from a particular storage service, then fine
    • If not, it looks for the file / creates it in scratch (if there is some)
  • "post file copies":
    • Copies CAN BE from scratch (if there is some), even though scratch is not visible from the outside
  • file deletions:
    • Explicit in whatever storage service, fine
    • Implicit in scratch (if there is some), but NOT for a standard job within a PILOT job

==== UGLY IMPLEMENTATION OPTION ====

{File, StorageService*, StorageService*}
{File, StorageService*, ComputeService::SCRATCH}

// In ComputeService: a sentinel pointer value meaning "use the scratch space"
// (a #define cannot carry a scoped name, hence a static constant)
static StorageService *const SCRATCH = (StorageService *)(unsigned long) 666;

StandardJobExecutor:
...
StorageService *src = std::get<1>(copy);
StorageService *dst = std::get<2>(copy);
if (dst == ComputeService::SCRATCH) {
  if (this->compute_service->hasScratch()) {
    dst = this->compute_service->getScratch();
  } else {
    // throw an exception
  }
}

Something to watch out for: using addresses as search keys can cause an ABA address-recycling bug

In many parts of the code we use addresses of objects to search for their presence in lists. This is susceptible to the ABA address-recycling bug. For instance:

  • I allocate a StandardJob, which has address 0xAAAA
  • I start an Alarm to let me know that that job has expired
  • The job completes normally well ahead of the expiration
  • I allocate ANOTHER StandardJob, which ALSO has address 0xAAAA because the heap allocator reuses the same location
  • That other job runs, and at some point I get the "job expired" message from the Alarm for job 0xAAAA

In this way, I am mistaking an "old message that I should ignore" for an "oh no, a job has expired" message.

The way to fix this: create a unique sequence number for each StandardJob (a static variable inside the constructor that gets incremented). Then, before sending the message, the Alarm could, for instance, check that the sequence number of the job at address 0xAAAA has not changed. Or, the message could be sent regardless, and the recipient of the message would then do the check. In essence, the check is: "yes, there is a job at that address you're telling me about, but let me check whether it's really the job you mean".
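The sequence-number fix can be sketched as follows (names hypothetical): each job records a unique id at construction, the Alarm records both the pointer and the sequence number, and the "job expired" message is acted upon only if both still match.

```cpp
// Sketch of the proposed fix for the ABA address-recycling bug
struct Job {
  unsigned long sequence_number;
  Job() {
    static unsigned long next = 0;  // incremented on every construction
    sequence_number = next++;
  }
};

// Check done by the recipient of the "job expired" message: is the job
// currently at this address really the job the Alarm meant?
bool isSameJob(const Job *job_at_address, unsigned long recorded_seq) {
  return job_at_address != nullptr &&
         job_at_address->sequence_number == recorded_seq;
}
```

Even if the heap allocator hands the second job the first job's old address, its sequence number differs, so the stale expiration message is recognized and ignored.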

BatchServiceTesting: FIRSTFIT, BESTFIT

Simplify the tests by relying on the WorkflowTask::getExecutionHost() method instead of reverse-engineering the schedule based on task completion times (just like what's done for ROUNDROBIN).

Feature Request: Memory specifications

I've been looking for a way to specify the amount of main memory per compute node, but haven't found one in here (like in SimGrid). Am I missing something, or is this intentional?

Workflow::getReadyClusters() Weirdness?

For some reason I looked at the code for Workflow::getReadyClusters(). I am a bit puzzled by this method and don't quite understand it (I never actually used the "cluster" feature). One thing that caught my eye first is that it calls setInternalState() and setState(). That seems really against our overall design: state updates are made by the services, the job manager, and by the WMS itself in waitForNextExecutionEvent(). The "get ready tasks" methods should just look at states, not update them. I am pasting the method below.

The last else clause in this method reads as follows:

   } else {
      if (task_map.find(task->getClusterID()) != task_map.end()) {
        if (task->getState() == WorkflowTask::State::NOT_READY) {
          task->setInternalState(WorkflowTask::InternalState::TASK_READY);
          task->setState(WorkflowTask::State::READY);
        }
        task_map[task->getClusterID()].push_back(task);
      }
    }

I have no idea why we need to do anything in that else clause in the first place, let alone why it does what it does... I commented out the entire else clause and all tests and examples run fine (but then, we don't use this method a lot).

@rafaelfsilva I believe you implemented this method? what do you think?

Consistent use of ComputeService::ALL_RAM and ComputeService::ALL_CORES

In class ComputeService we have the convenient constants ALL_RAM and ALL_CORES to specify "on that host use all RAM" and "on that host use all cores". This is used throughout the WRENCH code, and documented, but I just noticed that it's not used everywhere. For instance, the VirtualizedClusterService class is still using the "old way" of zero meaning "all". We should fix this before the release...
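The convention can be illustrated as below (the actual WRENCH constant values may differ; this sketch assumes max-value sentinels): explicit sentinel constants replace the old "zero means all" convention, which is both more readable and unambiguous when zero is a legal value.

```cpp
#include <limits>

// Illustrative sentinel constants (actual WRENCH values may differ)
struct ComputeServiceConstants {
  static constexpr unsigned long ALL_CORES =
      std::numeric_limits<unsigned long>::max();
  static constexpr double ALL_RAM = std::numeric_limits<double>::max();
};

// New style: an explicit sentinel, instead of 0 meaning "all cores"
unsigned long resolveNumCores(unsigned long requested, unsigned long available) {
  return (requested == ComputeServiceConstants::ALL_CORES) ? available
                                                           : requested;
}
```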

Feature Request: Ability to simulate multiple workflows with arrival times

At the moment, WRENCH only simulates one workflow execution. Users (e.g., Eddy Caron) have requested a much more powerful model in which multiple workflows can arrive dynamically throughout the simulation. This requires some software engineering (and likely some thought). Furthermore, there should be the possibility of multiple WMS instances running concurrently, OR a single WMS instance managing multiple arriving workflows.

Before we get there we need:

  1. Take the "shutdown all services" functionality out of the WMS (e.g., create a Terminator service that is given some termination condition by the user)
  2. Make it possible to create a WMS that runs on a constrained set of services
  3. Add a "start time" to a WMS
  4. At this point it should be easy to have concurrent WMS instances
  5. Then think about the WMS that can handle a stream of workflows

Problem with Workflow::loadFromJSON on Mac?

I haven't had time to look into it, but one of our users has written a small simulator, and loadFromJSON works on Linux, but not on Mac. I am attaching the JSON file that causes problems (I had to rename it .json.txt so that GitHub would allow me to attach it).
E1S51u.json.txt

Add timeouts to service API functions

In the design of most services, the API functions that "use" the service are as follows:
A) Check that the service is up
B) Send a message
C) Wait for a reply

It seems that:
A) is missing in some cases [TODO: add it]
B) is sometimes asynchronous, but synchronous is better [TODO: fix it]
C) often has no timeout (and thus may hang if the service has been killed in the meantime, which is a "feature" for a dumb implementation, but should likely be considered a bug) [TODO: add Service::setTimeout() and Service::getTimeout() methods!]
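A sketch of the proposed setTimeout()/getTimeout() API and of the A/B/C pattern above (the reply-waiting is simulated here with a delay parameter; a real implementation would wait on an S4U mailbox with a timeout):

```cpp
// Sketch of the proposed Service timeout API; 0 means "wait forever"
struct ServiceSketch {
  double network_timeout = 0;  // seconds
  void setTimeout(double t) { network_timeout = t; }
  double getTimeout() const { return network_timeout; }
};

// The A/B/C steps: check up, send synchronously, wait with a timeout.
// Returns true iff the interaction succeeds.
bool useService(ServiceSketch &s, bool service_is_up, double reply_delay) {
  if (!service_is_up) return false;  // A) check that the service is up
  // B) send the request message synchronously (elided)
  // C) wait for the reply, but only up to the configured timeout
  if (s.getTimeout() > 0 && reply_delay > s.getTimeout()) return false;
  return true;
}
```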

Batsched compilation optional?

Would it be useful/convenient to make the Batsched integration optional? There are so many dependencies that users who don't need Batsched still have to install many packages. Perhaps we don't care, though. Not a huge deal either way, I guess.

Task states updated before notifications are received

As I am writing WRENCH-based simulators, I am noticing something: task states are updated before notifications are received. Task states are tricky, which is why, a while back, I split the task state into "state" and "internal state". This was because, e.g., when a compute service sets a task state to completed, from the WMS's perspective the task is still pending until a notification is sent. This has made things much easier, but now another, similar issue is coming up. Here is a scenario:

  1. A compute service sends back a "job done" notification to a job manager for task T
  2. The job manager gets the notification and updates (non-internal) task states (i.e., T is now COMPLETED and some of T's children become READY)
  3. The job manager sends the notification to the WMS which will be an event

In the meantime, after 3) above but before the WMS calls waitForAndProcessNextEvent(), the WMS may do something like: "hmmm... what tasks are ready again?" By looking at task states, it will see some of T's children as ready. It may even see T as completed. And then later, it will be told "task T has completed", although it already knew that because it happened to look at task states on its own.

So far, in the simulators I've written, this has made the output look weird (which caused me to wonder "how could T's child be ready when T hasn't completed yet?", because I was only printing a "task completed" message upon receiving an actual event). For instance, my output could have been, for a T1->T2 workflow:

  • Submitting T1
  • T2 is ready
  • Submitting T2
  • T1 has completed!

which appears out-of-order, but is OK.

One question is: is this a bug or a feature?

I am thinking bug, because it seems more coherent to say that "task states cannot change arbitrarily in between job submissions/cancellations and calls to waitForAndProcessNextEvent()".

The fix wouldn't be super straightforward, since right now the logic in the Job Manager is, as mentioned above:

  1. Wait for a Job Completion (or Failure) message
  2. Update task states
  3. Send a message to the WMS (which will be caught by waitForAndProcessNextEvent())

So, now, 2) would have to happen in the waitForAndProcessNextEvent() method, which is awkward...

Anyway, something to discuss/think about. Distributed computing, even in simulation, is never easy, is it?

Desired Development: Decouple Service Creation from Service Start

In the current implementation, the constructor of a service also starts that service (i.e., it creates the S4U actor for it). This leads to a problem. For instance:

  • I create a WMS service
  • I launch the simulation
  • launch() throws, as it should, the exception "You should have at least one compute service"
  • I decide to terminate my program
  • SimGrid complains that there is a running actor (the WMS service)

The alternative is that the constructor of a service does not start the actor; a separate start() call is used instead. This way, launch() can first check that it has all it needs, and then start the services.

This seems like a better approach overall....
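The two-phase pattern can be sketched as follows (names hypothetical; start() stands in for creating the S4U actor): launch() validates the whole setup before any actor exists, so a thrown exception leaves nothing running.

```cpp
#include <stdexcept>
#include <vector>

// Sketch: the constructor only records configuration; start() would
// create the S4U actor.
struct ServiceTwoPhase {
  bool started = false;
  void start() { started = true; }  // actor creation happens here
};

void launch(std::vector<ServiceTwoPhase *> &wms_services,
            std::vector<ServiceTwoPhase *> &compute_services) {
  // Validate BEFORE starting anything, so a throw leaves no running actor
  if (compute_services.empty())
    throw std::runtime_error("You should have at least one compute service");
  for (auto *s : compute_services) s->start();
  for (auto *s : wms_services) s->start();
}
```

In the problematic scenario from the issue, the "at least one compute service" check now fires before the WMS actor exists, so terminating the program no longer triggers the SimGrid "running actor" complaint.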

Multiple namespaces?

It may be a good idea to have a namespace for "user" and a namespace for "developer".
