
dataclay-packaging's People

Contributors

alexbarcelo, dgasull, marcmonfort, pierlauro


dataclay-packaging's Issues

Re-using (misusing) alias bug

Roughly, to reproduce the bug, run the following:

p = Person("Real VIP", 13)
p.make_persistent("vip")      # first object takes the alias
p = Person("Fake", 15)
try:
    p.make_persistent("vip")  # alias already in use: raises
except Exception:
    print("Expected exception, ignoring")
p = Person.get_by_alias("vip")
print(p.name)                 # prints "Fake" instead of "Real VIP"

The exception is raised as expected, but the final print shows the fake object's name instead of Real VIP.

On a marginally related note, would it make sense to add an "overwrite" flag to the make_persistent with alias?
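To make the "overwrite" proposal concrete, here is a minimal sketch of the intended semantics; AliasRegistry, AliasAlreadyInUse and the overwrite parameter are illustrative names, not part of the current dataClay API:

```python
# Hypothetical sketch of "overwrite" semantics for make_persistent with
# an alias. All names here are illustrative, not existing dataClay code.

class AliasAlreadyInUse(Exception):
    pass

class AliasRegistry:
    def __init__(self):
        self._aliases = {}  # alias -> object_id

    def make_persistent(self, object_id, alias, overwrite=False):
        if alias in self._aliases and not overwrite:
            raise AliasAlreadyInUse(alias)
        # With overwrite=True the alias is re-pointed; a failed attempt
        # without the flag leaves the existing alias untouched.
        self._aliases[alias] = object_id

    def get_by_alias(self, alias):
        return self._aliases[alias]

registry = AliasRegistry()
registry.make_persistent("real-vip-id", "vip")
try:
    registry.make_persistent("fake-id", "vip")
except AliasAlreadyInUse:
    pass
assert registry.get_by_alias("vip") == "real-vip-id"  # alias unchanged
registry.make_persistent("fake-id", "vip", overwrite=True)
assert registry.get_by_alias("vip") == "fake-id"      # explicit overwrite
```

With this design, the failed second make_persistent can never silently repoint the alias, which is the behaviour the bug above exhibits.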

Race condition during make persistent of alias and memory pressure

Race condition/bug found:

  1. The client sends an object to be made persistent: make_persistent(alias), CL -> EE
  2. The EE is under memory pressure
  3. The EE receives the make_persistent call, loads the object in memory and registers it in the LM by calling register_object: EE -> LM
  4. The LM receives the register_object call and registers the object before the client does
  5. The client tries to register the object in the LM because it has an alias: register_object, CL -> LM
  6. The LM receives the register_object call and sends an ObjectAlreadyRegistered exception to the client
  7. The client receives the exception and calls add_alias in the LM
  8. The LM should add the alias, but it does not
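One possible fix, sketched below with illustrative names (LogicModule, register_object, add_alias stand in for the real components), is to make the LM's registration path idempotent, so whichever of the EE or the client registers first, the later alias-carrying call still takes effect:

```python
# Hedged sketch of an idempotent LM registration path. Class and method
# names mirror the issue's terminology but are not real dataClay code.

class LogicModule:
    def __init__(self):
        self.objects = {}  # object_id -> set of aliases

    def register_object(self, object_id, alias=None):
        # Idempotent: a duplicate registration is not an error; it just
        # merges the alias instead of raising ObjectAlreadyRegistered.
        aliases = self.objects.setdefault(object_id, set())
        if alias is not None:
            aliases.add(alias)

    def add_alias(self, object_id, alias):
        self.objects.setdefault(object_id, set()).add(alias)

lm = LogicModule()
lm.register_object("obj1")               # EE wins the race (no alias)
lm.register_object("obj1", alias="vip")  # client's late registration
assert "vip" in lm.objects["obj1"]       # alias survives the race
```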

Re-design multiple aliases

Since release 2.0, dataClay no longer allows more than one alias per object. The APIs have not been updated accordingly, resulting in undefined behavior if an application tries to assign two or more aliases to the same object.

My proposal is to re-design the "additional aliases" feature in the same way file systems manage symbolic links: an alias is just a pointer to an objectID.
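The symlink analogy can be sketched in a few lines (names are assumptions for illustration): aliases live in their own table and are nothing but pointers to an objectID, so any number of them can target the same object.

```python
# Minimal sketch of the proposed symlink-style alias design.
# "aliases" plays the role of a symlink table; names are illustrative.

aliases = {}   # alias -> object_id
objects = {"oid-42": {"name": "Real VIP"}}

def add_alias(alias, object_id):
    aliases[alias] = object_id

def resolve(alias):
    # Two-step lookup, exactly like following a symlink to its target.
    return objects[aliases[alias]]

add_alias("vip", "oid-42")
add_alias("important-person", "oid-42")   # second alias, same object
assert resolve("vip") is resolve("important-person")
```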

Unified logging configuration

Design and implement a unique, common logging configuration for javaclay and pyclay. One idea could be to use special configurations or service parameters (--debug) that are translated into language-specific configurations (log4j in Java, the Python logging library, ...).

Tackle the teardown procedure for huge ephemeral datasets

When dataClay is being shut down, it stores all the data into the database.

This is correct for an object store, where data is persistent. However, all our HPC use cases are focused on volatile datasets that are huge. The unnecessary teardown procedure means that:

  • Executions may be half an hour longer than necessary (so batch exploration throughput is much lower)
  • The home quota fills up (so sequential executions have to wait for a human to manually clean the files)

dataClay should retain the feature --as it is an object store-- but we should improve its behaviour for the "ephemeral execution" case --as those are all our current HPC use cases.

Proposal:

  • Flag (or similar mechanism) for the orchestration to indicate a "dirty shutdown". This may be the default for enqueue_compss-triggered scenarios.
  • Alternatively, an "ephemeral HPC" flag which forces the dataClay DataServices to avoid serializing to disk altogether.
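As a rough illustration of the second option, the shutdown path could be gated by a flag; the EPHEMERAL_HPC variable name and the flush_to_database callback below are assumptions, not existing dataClay code:

```python
# Illustrative sketch only: a teardown gated by a hypothetical
# EPHEMERAL_HPC flag, skipping serialization for volatile datasets.

import os

def shutdown(objects_in_memory, flush_to_database):
    if os.environ.get("EPHEMERAL_HPC", "false").lower() == "true":
        # Dirty shutdown: drop volatile data instead of serializing it,
        # avoiding long teardowns and filled home quotas.
        return 0
    flushed = 0
    for obj in objects_in_memory:
        flush_to_database(obj)
        flushed += 1
    return flushed

os.environ["EPHEMERAL_HPC"] = "true"
assert shutdown([object(), object()], flush_to_database=lambda o: None) == 0
```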

COMPSs + dataClay synchronization of traces

While creating unified traces of COMPSs + dataClay using Docker containers, the resulting threads belonging to dataClay services (dsjava, dspython, logicmodule) are not synchronized with the COMPSs master and worker threads. This happens because the Extrae merger process used by COMPSs passes the -no-syn flag, which skips synchronizing traces created on different nodes/clocks. Currently, one workaround is to replace the trace.sh script in the COMPSs Docker container to force it to synchronize the resulting traces while merging them.

This may work on MareNostrum (still pending to test; there the nodes share a single clock).

More information about synchronization:
https://tools.bsc.es/sites/default/files/documentation/html/extrae-3.5.1/merge.html

Modify dataClay contributions structure

A dataClay contribution is something that we offer to users. dataClay contributions should be defined in two different packages:

  • dataclay.contrib.models: this package contains classes that must be registered in dataClay (and therefore execution classes and stubs must be generated), because those classes define object state: objects that can be persisted, moved, replicated...
  • dataclay.contrib.modules: this package contains methods or functionality that must not be registered in dataClay but that registered classes can use (like synchronization)

Then, users can register their own synchronization mechanisms or their own collections in either case.

Also, users could use external libraries by installing them in servers and clients. However, we should modify javaclay to avoid registering external libraries, and pyclay to not register inherited mixins as code.
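The split above could look like the following sketch; SynchronizationMixin and SharedCounter are hypothetical examples, not existing contrib code:

```python
# Hypothetical illustration of the proposed split: a state-carrying
# class (would live in dataclay.contrib.models and be registered)
# versus a behaviour-only mixin (would live in dataclay.contrib.modules
# and never be registered as model code).

# dataclay.contrib.modules -- behaviour only, not registered:
class SynchronizationMixin:
    def notify_update(self, attribute):
        # Placeholder for synchronization logic shared across models.
        return f"sync:{attribute}"

# dataclay.contrib.models -- state-carrying, would be registered:
class SharedCounter(SynchronizationMixin):
    def __init__(self):
        self.value = 0

    def increment(self):
        self.value += 1
        return self.notify_update("value")

c = SharedCounter()
assert c.increment() == "sync:value"
assert c.value == 1
```

This is also the case the last paragraph refers to: pyclay would register SharedCounter's state but should skip the inherited mixin methods as model code.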

Add support for batch object info into ExecutionEnvironments

Add a new RPC in ExecutionEnvironments for retrieving batch information about the objects.

The current use case for this call is to enable a performant way to retrieve extra information to be used by the split, using a single RPC call per ExecutionEnvironment (instead of one RPC per object).

This is useful because the split will need information about a bunch of objects (e.g. whether they are in memory or have been evicted to disk), and it can use the object hint to aggregate all the objects into a single batch call to the ExecutionEnvironment.

The proof-of-concept implementation will allow asking for "is_in_memory", but further metadata may be provided in the future (e.g. replica information, versions, memory tier, placement, whatever).
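A sketch of what the batching could look like, with assumed names (get_objects_status, the hint-keyed environments dict) standing in for the eventual RPC:

```python
# Sketch of a batch metadata RPC: one call per ExecutionEnvironment,
# grouping objects by their hint. All names are assumptions.

from collections import defaultdict

class ExecutionEnvironment:
    def __init__(self, in_memory_ids):
        self._in_memory = set(in_memory_ids)

    def get_objects_status(self, object_ids):
        # Single RPC returning per-object metadata for the whole batch.
        return {oid: {"is_in_memory": oid in self._in_memory}
                for oid in object_ids}

def batch_status(objects_with_hints, environments):
    per_ee = defaultdict(list)
    for oid, ee_hint in objects_with_hints:
        per_ee[ee_hint].append(oid)          # aggregate by object hint
    result = {}
    for ee_hint, oids in per_ee.items():     # one RPC per EE, not per object
        result.update(environments[ee_hint].get_objects_status(oids))
    return result

ees = {"ee1": ExecutionEnvironment({"a"}), "ee2": ExecutionEnvironment(set())}
status = batch_status([("a", "ee1"), ("b", "ee1"), ("c", "ee2")], ees)
assert status["a"]["is_in_memory"] and not status["b"]["is_in_memory"]
```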

Automate merge and PR for dataclay-common

If a developer wants to change dataclay-common, they have to open a PR and, once it is accepted, modify the javaclay/pyclay code (update the submodule). Travis could automate this somehow so developers only need a PR in javaclay/pyclay, not in dataclay-common (like we are doing for packaging).

  • A developer creates a PR in javaclay/pyclay that points to a different dataclay-common submodule reference, e.g. feature/new_calls
  • The PR is accepted
  • Travis realizes that the submodule reference != the last commit of dataclay-common develop --> Travis merges the dataclay-common branch feature/new_calls into develop and updates the submodule reference
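The Travis step could be sketched roughly as follows; the repository URL placeholder, branch names and the GITHUB_TOKEN variable are all assumptions, not an existing setup:

```yaml
# .travis.yml fragment -- illustrative only
after_success:
  # The commit the javaclay/pyclay checkout points its submodule at:
  - SUBMODULE_REF=$(git -C dataclay-common rev-parse HEAD)
  - git clone https://github.com/<org>/dataclay-common.git /tmp/common
  - git -C /tmp/common checkout develop
  - >
    if [ "$(git -C /tmp/common rev-parse HEAD)" != "$SUBMODULE_REF" ]; then
      git -C /tmp/common merge --no-ff "$SUBMODULE_REF" &&
      git -C /tmp/common push
      "https://${GITHUB_TOKEN}@github.com/<org>/dataclay-common.git" develop;
    fi
```

The submodule reference in javaclay/pyclay would then be bumped to the resulting develop commit in a follow-up commit.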

@alexbarcelo what do you think?

REST API

Design and implement a REST API for dataClay (dataclaycmd and services)

Federate model

Design and implement a system that allows users to "federate" a class. One design could be:

  • The user calls "DataClay.federateClass(namespace, classname, destDataClay)" and the class is sent to an external dataClay. If the class is already there, there is no effect

Then, when the user wants to update the class, they can call "DataClay.updateFederatedClass", which works exactly like the model redefinition explained in issue #14
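The intended semantics can be sketched as follows; DataClayInstance and the snake_case function names are placeholders for the real calls mentioned above:

```python
# Illustrative sketch of federateClass / updateFederatedClass semantics.
# Names and the string "class definitions" are stand-ins.

class DataClayInstance:
    def __init__(self):
        self.classes = {}  # (namespace, classname) -> class definition

def federate_class(namespace, classname, definition, dest):
    key = (namespace, classname)
    if key in dest.classes:
        return False               # already federated: no effect
    dest.classes[key] = definition
    return True

def update_federated_class(namespace, classname, definition, dest):
    # Works like a model redefinition: the remote copy is replaced.
    dest.classes[(namespace, classname)] = definition

remote = DataClayInstance()
assert federate_class("demo", "Person", "v1", remote) is True
assert federate_class("demo", "Person", "v1", remote) is False  # no-op
update_federated_class("demo", "Person", "v2", remote)
assert remote.classes[("demo", "Person")] == "v2"
```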

Distributed Metadata system

Design and implement a distributed metadata system to:

  • Improve performance
  • Avoid going to LM to get metadata

Provide the Paraver configuration files in an easily reachable, centralized place

At the moment of writing this issue, there is a paraver folder in each demo in the dataclay-demos repository. The compss_and_dataclay.cfg can be found there, and it is the configuration a user should load in the Paraver application in order to see the tracing.

We may want a single central place for that configuration (e.g. just as COMPSs does: the files/paraver/cfgs folder in their repo) and let users take the configurations from there. Also, those configuration files should be available on MareNostrum, as users may sometimes want to use Paraver from within the MareNostrum login nodes.

Registered model redefinition

Design and implement a way to "redefine" or "replace" an already registered class. This will replace the classes currently deployed in execution environments, and objects belonging to the class will be lost/removed to avoid serialization issues. The call should warn the user about this. A name for the API call could be overwriteModel or replaceModel.

Simplify singularity builds

As remote builds from Dockerhub images preserve environment variables, it is no longer necessary to parse any Dockerfile.

  • Remove [begin|end]ENVruntime from dockerfiles
  • Simplify singularity build

Bootstrap sanity check

At bootstrap time, if mandatory environment variables and/or properties are not set, a proper error message should be returned.

Currently, when important parameters are missing, the exceptions thrown are not easily interpretable.
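One possible shape for the check, as a hedged sketch (the variable names LOGICMODULE_HOST/LOGICMODULE_PORT are examples, not a definitive list of mandatory settings):

```python
# Sketch of a bootstrap sanity check: validate mandatory settings up
# front and fail with a message naming every missing variable, instead
# of an opaque exception later on.

import os

def check_mandatory_env(names, environ=os.environ):
    missing = [n for n in names if not environ.get(n)]
    if missing:
        raise SystemExit(
            "Missing mandatory configuration: " + ", ".join(sorted(missing))
            + ". Set these environment variables and restart.")

ok_env = {"LOGICMODULE_HOST": "lm", "LOGICMODULE_PORT": "11034"}
check_mandatory_env(["LOGICMODULE_HOST", "LOGICMODULE_PORT"], ok_env)  # passes

try:
    check_mandatory_env(["LOGICMODULE_HOST", "LOGICMODULE_PORT"], {})
except SystemExit as e:
    assert "LOGICMODULE_HOST" in str(e)  # error names what is missing
```

Collecting all missing names in one pass means the operator fixes everything in a single iteration rather than discovering the variables one crash at a time.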

Investigate and address certain performance problems for high-volume workloads

We detected some misbehaviour when executing numerical applications without active features --i.e. doing attribute getters and setters with big numerical data structures.

At first we assumed it was a performance penalty at the gRPC or socket/serialization layer, and some preliminary results seemed to confirm that, but after more in-depth tests the results came up inconclusive.

Nightly builds or snapshots

Decide whether to use nightly builds or to run tests on the last available package in Maven, PyPI and Docker. Any votes?

This is associated with epic #6

Implement testing system

Refers to Epic #6

Create a repository that tests the current development packages (Docker, PyPI and Maven) using BDD functional tests.

Tests will be developed using Cucumber and Python behave. Reports will be published on github.io; one tool could be Allure.

Tests will run using the currently published dev packages in different environments and architectures (arm, amd64, ...). For that we will use the Travis build matrix.

Functional tests should be grouped into Travis jobs and may take a maximum of 50 minutes.

The orchestration of dataClay for each test is still under design. We could use bash scripts that execute docker-compose up or any needed docker command; the bash scripts could be reused in the orchestration repository in the future (which currently only uses Singularity).

Java functional tests could be "prepared" using Python behave (deployment, new account, contracts, ...) and, using JPype, we could call each Java-specific step. @alexbarcelo what do you think? This could allow us to reuse more code. Another option is to create two different feature files, one for Java and one for Python (and then use cucumber-jvm for Java and behave for Python), but it is not conceptually correct for a feature to depend on the environment (it is like writing features per OS).

Final structure would be:

.
├── make-persistent
│   ├── docker-compose.yml  (maybe as a docstring in feature)
│   ├── environment.py (include calls to test-orchestration)
│   ├── java
│   │   ├── app
│   │   │   ├── cfgfiles
│   │   │   └── src
│   │   └── model
│   ├── make-persistent.feature
│   ├── python
│   │   ├── app
│   │   │   ├── cfgfiles
│   │   │   └── src
│   │   │       └── hellopeople.py
│   │   └── model
│   │       └── src
│   │           ├── classes.py
│   │           └── __init__.py
│   └── steps
│       └── make-persistent.py
└── test-orchestration
    └── orchestrate.sh
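For concreteness, make-persistent.feature could read roughly like the sketch below; the step wording is illustrative, not an agreed interface:

```gherkin
# Hedged sketch of make-persistent.feature; steps are illustrative.
Feature: Make persistent
  As a dataClay user
  I want to persist objects with an alias
  So that I can retrieve them later by name

  Scenario: Persist and retrieve by alias
    Given a running dataClay with a registered Person model
    When I make a Person "Real VIP" persistent with alias "vip"
    And I get the object by alias "vip"
    Then the retrieved object name is "Real VIP"
```

A single feature like this could then be backed by either Python steps (behave) or Java steps (via JPype or cucumber-jvm), which is exactly the trade-off discussed above.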

Add support for Python 3.6

Integration with COMPSs + dataClay may require supporting Python 3.6 and/or making it the default.
One option could be to install numpy only for the x86_64 architecture.

Discuss what to do with the client image (tag it with the Python version it uses?). The client image is used for demos of COMPSs and dataClay in Docker.

Update 'delete alias'

Since an object can currently have just one alias, the following two things should be changed:

  • The deleteAlias method should not require any argument (both in Java and Python)
  • The documentation should be updated accordingly

Travis ARM release

When dataclay-packaging is merged to master, Travis can deploy Docker images to Dockerhub; check whether Travis can also deploy them for ARM.

This is associated with #6

Simplify logging

Simplify logging so users don't need to configure log4j and set CHECK_LOG4J_ENABLED=true in global.properties in Java, or DEBUG=True in Python. Also, simplify Python logging.

There should be two flags, "--debug" and "--verbose", that can be added in docker-compose and singularity-compose:

dsjava:
   command: "--debug"

When this is enabled, the services will print debug info (gRPC debug info will be printed in verbose mode, even if gRPC considers it debug-level).

Currently, for dataclaycmd, we just need to add "--debug" or "--verbose" to the command itself:

docker run -v $PWD/app/cfgfiles/:/home/dataclayusr/dataclay/cfgfiles/:ro \
	 bscdataclay/client:2.1 --debug WaitForDataClayToBeAlive 10 5

Distributed objects in cycle are not being cleaned by GC

We have two Storage Locations, with two objects A and B, where A references B and B references A. When those objects are no longer accessible, they should be cleaned by the GC, but currently they are not.

When the DS1 GC finds that object A is a candidate to be cleaned, it is marked as 'inaccessible' and all objects referenced from A should receive a -1 on their reference count; the GC notifies DS2 of a -1 reference for B. In the next iteration, the GC realizes that A still has one reference, so it is no longer an 'inaccessible' candidate; A is marked as 'accessible' again, and the GC should notify DS2 of a +1 reference for B.

If, during the -1 notification, DS2 processes candidate B and notifies DS1 of a -1, the objects are properly removed. However, this cannot be guaranteed: if DS2 processes the -1 notification but sends its own -1 later, the objects are never cleaned.

This does not happen if the cycle is not distributed.

Any ideas on how to fix that? @alexbarcelo
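The ordering hazard can be reproduced in a toy model (all names are illustrative, not dataClay code): if a Storage Location acts on a -1 before the compensating +1 arrives, it deletes an object that is still reachable; if the notifications are merged first, neither side ever reaches zero and the cycle leaks.

```python
# Toy reproduction of the unordered ref-count notification race.
# StorageLocation is a stand-in, not a real dataClay class.

class StorageLocation:
    def __init__(self):
        self.refs = {}       # object_id -> external reference count
        self.deleted = set()

    def notify(self, obj_id, delta):
        if obj_id in self.deleted:
            return  # a +1 arriving after deletion has nothing to revive
        self.refs[obj_id] = self.refs.get(obj_id, 0) + delta
        if self.refs[obj_id] <= 0:
            # Naive policy: act on zero immediately.
            self.deleted.add(obj_id)
            del self.refs[obj_id]

ds2 = StorageLocation()
ds2.refs["B"] = 1          # B is referenced once, by A on DS1

# DS1 marks A 'inaccessible' and sends -1 for B, then reverts the
# decision and sends +1 -- but DS2 acts on the -1 first:
ds2.notify("B", -1)        # B hits zero and is collected immediately
ds2.notify("B", +1)        # too late: B was still reachable
assert "B" in ds2.deleted  # premature deletion
```

This suggests the fix needs either ordered/acknowledged notifications or a two-phase "candidate" state in which zero-count objects are quarantined rather than deleted outright.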

"LOCAL" flag for make persistent not working in COMPSs

The "LOCAL" flag uses the session.properties file to determine in which Storage Location the persistent objects created during that session should be stored. However, in a COMPSs environment the session.properties file is propagated and copied to all workers, so all objects created by workers end up stored in the master's SL. This should be fixed with a new design.

New dataclaycmd library

A new dataclaycmd library that does not depend on Java or Python (maybe Go), so users can install and use it directly. The previous dataClay tool required Docker or the dataClay JAR; dataClay 2.0 avoids the user having to install anything, since the tool runs inside a Docker container. However, we should offer a better, friendlier way to do this (and not depend on Docker).
