
datanucleus-core's Introduction

datanucleus-core

DataNucleus core persistence support - the basis for anything in DataNucleus.

This is built using Maven, by executing mvn clean install which installs the built jar in your local Maven repository.

KeyFacts

License : Apache 2 licensed
Issue Tracker : http://github.com/datanucleus/datanucleus-core/issues
Javadocs : 6.0, 5.2, 5.1, 5.0, 4.1, 4.0
Download : Maven Central
Dependencies : See file pom.xml
Support : DataNucleus Support Page


Persistence Process

The primary classes involved in the persistence process are

  • PersistenceNucleusContext - maps across to a PMF/EMF, and provides access to the StoreManager and ExecutionContext(s) (PersistenceNucleusContextImpl).
  • ExecutionContext - maps across to a PM/EM, and handles the transaction (ExecutionContextImpl)
  • StateManager - manages access to a persistent object (StateManagerImpl)
  • StoreManager - manages access to the datastore (see the datastore plugins, e.g. RDBMSStoreManager)
  • MetaDataManager - manages the metadata for the class(es), i.e. how each class is persisted
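
For orientation, the sketch below shows how these internal objects can be reached from the JDO API. This is a hedged example: it assumes the datanucleus-api-jdo plugin's JDOPersistenceManagerFactory/JDOPersistenceManager classes and their getNucleusContext()/getExecutionContext() accessors.

// Hedged sketch - assumes datanucleus-api-jdo is the API in use
PersistenceManagerFactory pmf = JDOHelper.getPersistenceManagerFactory("myUnit");
PersistenceNucleusContext nucCtx = ((JDOPersistenceManagerFactory)pmf).getNucleusContext();
StoreManager storeMgr = nucCtx.getStoreManager();        // access to the datastore plugin
MetaDataManager mmgr = nucCtx.getMetaDataManager();       // metadata for persistable classes

PersistenceManager pm = pmf.getPersistenceManager();
ExecutionContext ec = ((JDOPersistenceManager)pm).getExecutionContext();   // maps to this PM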

Persistence : Retrieval of Objects

MyClass myObj = (MyClass)pm.getObjectById(id);
myObj.getSomeSet().add(newVal);
  • calls wrapper (see org.datanucleus.store.types.wrappers.XXX or org.datanucleus.store.types.wrappers.backed.XXX)

  • if optimistic txns then the operation is queued up until flush/commit

  • otherwise will call backing store for the wrapper (RDBMS) which updates the DB, or will mark the field as dirty (non-RDBMS) and the field is sent to the datastore at the next convenient place.

    Query q = pm.newQuery("SELECT FROM " + MyClass.class.getName()); List results = (List)q.execute();

  • Makes use of QueryManager to create an internal Query object (wrapped by a JDO/JPA Query object). This may be something like org.datanucleus.store.rdbms.query.JDOQLQuery specific to the datastore.

  • The query is compiled generically. This involves converting each component of the query (filter, ordering, grouping, result etc) into Node trees, and then converting that into Expression trees. This is then stored in a QueryCompilation, and can be cached.

  • The query is then converted into a datastore-specific compilation. In the case of RDBMS this will be an RDBMSCompilation, and will be an SQL string (and associated parameter/result lookups).

  • The query is executed in the datastore and/or in-memory. The in-memory evaluator is in datanucleus-core under org.datanucleus.store.query.inmemory. The execution process will return a QueryResult (which is a List).

  • Operations on the QueryResult such as "iterator()" will result in lazy loading of results from the underlying ResultSet (in the case of RDBMS)
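
A short usage sketch of this flow using the standard JDO API (MyClass is a placeholder persistable class); with RDBMS the result objects can be loaded lazily from the ResultSet as they are iterated:

Query q = pm.newQuery("SELECT FROM " + MyClass.class.getName());
List results = (List)q.execute();          // returns a QueryResult (a List)
Iterator iter = results.iterator();        // lazy loading of result objects (RDBMS)
while (iter.hasNext())
{
    MyClass obj = (MyClass)iter.next();
    // ... use the object ...
}
q.closeAll();                              // releases the QueryResult and any underlying resources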

Persistence : Pessimistic Transactions

All persist, remove, field update calls go to the datastore straight away. Flush() doesn't have the same significance here as it does for optimistic, except in that it will queue "update" requests until there are more than say 3 objects waiting. This means that multiple setters can be called on a single object and we get one UPDATE statement.
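
For illustration (hypothetical class and setter names), several setters called on the same object inside a datastore (pessimistic) transaction result in a single UPDATE when the internal queue is flushed:

tx.begin();                              // datastore (pessimistic) transaction
MyClass obj = (MyClass)pm.getObjectById(id);
obj.setName("First");                    // each setter marks the object dirty via EC.makeDirty
obj.setDescription("Second");
obj.setStatus("Third");                  // once enough updates are waiting, EC.flushInternal runs
tx.commit();                             // -> one UPDATE statement covering all three columns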

persist

Calls ExecutionContext.persistObject which calls EC.persistObjectWork.
Creates a StateManager (StateManagerImpl - SM). Adds the object to EC.dirtySMs.
Calls SM.makePersistent which calls SM.internalMakePersistent which will pass the persist through to the datastore plugin.
Calls PersistenceHandler.insertObject, which will do any necessary cascade persist (coming back through EC.persistObjectInternal, EC.indirectDirtySMs).

remove

Calls ExecutionContext.deleteObject, which calls ExecutionContext.deleteObjectWork.
This will add the object to EC.dirtySMs.
Calls SM.deletePersistent.
Calls SM.internalDeletePersistent which will pass the delete through to the datastore plugin.
Calls PersistenceHandler.deleteObject, which will do any necessary cascade delete (coming back through EC.deleteObjectInternal, EC.indirectDirtySMs).

update field

Calls SM.setXXXField which calls SM.updateField and, in turn, EC.makeDirty.
The update is then queued internally until EC.flushInternal is triggered (e.g. 3 changes waiting).

Collection.add

Calls SCO wrapper.add which will add the element locally.
If a backing store is present (RDBMS) then passes it through to the backingStore.add().

Collection.remove/clear

Calls SCO wrapper.remove/clear which will remove the element (or clear the contents) locally.
If a backing store is present (RDBMS) then passes it through to the backingStore.remove()/clear().
If no backing store is present and cascade delete is true then does the cascade delete, via EC.deleteObjectInternal.

Persistence : Optimistic Transactions

All persist, remove, field update calls are queued. Flush() processes all remove/add/updates that have been queued. Call ExecutionContext.getOperationQueue() to see the operations that are queued up waiting to flush.
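
A hedged sketch of inspecting that queue (it assumes datanucleus-api-jdo's JDOPersistenceManager for access to the ExecutionContext):

tx.begin();                                   // optimistic transaction
MyClass obj = (MyClass)pm.getObjectById(id);
obj.getSomeSet().add(newVal);                 // queued (e.g. as a CollectionAddOperation), not yet in the datastore

ExecutionContext ec = ((JDOPersistenceManager)pm).getExecutionContext();
OperationQueue opQueue = ec.getOperationQueue();  // operations waiting for flush (org.datanucleus.flush)

pm.flush();                                   // processes the queued remove/add/updates
tx.commit();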

persist

Calls ExecutionContext.persistObject which calls EC.persistObjectWork.
Creates a StateManager (StateManagerImpl - SM). Adds the object to EC.dirtySMs.
Calls SM.makePersistent. Uses PersistFieldManager to process all reachable objects.

remove

Calls ExecutionContext.deleteObject, which calls ExecutionContext.deleteObjectWork.
Creates a StateManager as required. Adds the object to EC.dirtySMs.
Calls SM.deletePersistent. Uses DeleteFieldManager to process all reachable objects.

update field

Calls SM.setXXXField which calls SM.updateField and, in turn, EC.makeDirty.
The update is then queued internally until EC.flushInternal is triggered.

Collection.add

Calls SCO wrapper.add which will add the element locally.
Adds a queued operation to the queue for addition of this element.

Collection.remove/clear

Calls SCO wrapper.remove/clear which will remove the element (or clear the contents) locally.
Adds a queued operation to the queue for removal of this element.


Flush Process

When a set of mutating operations is required to be flushed (e.g. transaction commit) the FlushProcess for the StoreManager is executed. At the start of the flush process we have a set of primary objects that were directly modified by the user and passed in to calls, as well as a set of secondary objects that were connected to primary objects by relationships and were also modified. A "modification" could mean insert, update or delete.

An RDBMS datastore uses org.datanucleus.flush.FlushOrdered [Javadoc]. Other datastores typically use org.datanucleus.flush.FlushNonReferential [Javadoc].


MetaData Process

The MetaDataManager is responsible for loading and providing access to the metadata for all persistable classes. MetaData can come from Java annotations, XML metadata files, or via the JDO MetaData API.

Each class is represented via an org.datanucleus.metadata.ClassMetaData [Javadoc]. This in turn has a Collection of org.datanucleus.metadata.FieldMetaData [Javadoc] and/or org.datanucleus.metadata.PropertyMetaData [Javadoc] depending on whether the metadata is specified on a field or on a getter/setter method. Fields/properties are numbered alphabetically, with the absolute field number starting at the root class in an inheritance tree and the relative field number starting in the current class.
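
As an example of the last of those sources, metadata can be supplied programmatically through the standard JDO Metadata API (javax.jdo.metadata); the class/field names below are hypothetical, and the MetaDataManager then manages the resulting ClassMetaData:

JDOMetadata jdomd = pmf.newMetadata();
PackageMetadata pkgmd = jdomd.newPackageMetadata("mydomain");
ClassMetadata clsmd = pkgmd.newClassMetadata("Product");
clsmd.setIdentityType(javax.jdo.annotations.IdentityType.DATASTORE);
FieldMetadata fldmd = clsmd.newFieldMetadata("name");
fldmd.setPersistenceModifier(javax.jdo.annotations.PersistenceModifier.PERSISTENT);
pmf.registerMetadata(jdomd);     // MetaDataManager now holds metadata for mydomain.Product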


Query Process

DataNucleus provides a generic query processing engine. It provides for compilation of string-based query languages, and additionally allows in-memory evaluation of those queries. This is very useful when providing support for new datastores that either don't have a native query language (so the only alternative is for DataNucleus to evaluate the queries) or where it will take some time to map the compiled query to the equivalent query in the native language of the datastore.

Query : Input Processing

When a user invokes a query, using the JDO/JPA APIs, they are providing either

  • A single-string query made up of keywords and clauses
  • A query object that has the clauses specified directly

The first step is to convert these two forms into the constituent clauses. It is assumed that a string-based query is of the form

SELECT {resultClause} FROM {fromClause} WHERE {filterClause}
GROUP BY {groupingClause} HAVING {havingClause}
ORDER BY {orderClause}

The two primary supported query languages have helper classes to provide this migration from the single-string query form into the individual clauses. These can be found in org.datanucleus.store.query.JDOQLSingleStringParser [Javadoc] and org.datanucleus.store.query.JPQLSingleStringParser [Javadoc].
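
For example, the single-string JDOQL form below is split by JDOQLSingleStringParser into its constituent clauses, equivalent to setting them individually on the Query (Product and its fields are hypothetical):

// Single-string form - parsed into result/filter/ordering clauses
Query q1 = pm.newQuery("SELECT name, price FROM mydomain.Product WHERE price < :limit ORDER BY price ASCENDING");

// Equivalent query with the clauses specified directly
Query q2 = pm.newQuery(Product.class);
q2.setResult("name, price");
q2.setFilter("price < :limit");
q2.setOrdering("price ascending");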

Query : Compilation

So we have a series of clauses and we want to compile them. So what does this mean? Well, in simple terms, we are going to convert the individual clauses from above into expression tree(s) so that they can be evaluated. The end result of a compilation is a org.datanucleus.store.query.compiler.QueryCompilation [Javadoc].

So if you think about a typical query you may have

SELECT field1, field2 FROM MyClass

This has 2 result expressions - field1 and field2 (each a "PrimaryExpression", meaning a representation of a field). The compilation of a particular clause has 2 stages

  1. Compilation into a Node tree, with operations between the nodes
  2. Compilation of the Node tree into an Expression tree of supported expressions

and compilation is performed by a JavaQueryCompiler, so look at org.datanucleus.store.query.compiler.JDOQLCompiler [Javadoc] and org.datanucleus.store.query.compiler.JPQLCompiler [Javadoc]. These each have a Parser that performs the extraction of the different components of the clauses and generation of the Node tree. Once a Node tree is generated it can then be converted into the compiled Expression tree; this is handled inside the JavaQueryCompiler.

The other part of a query compilation is the org.datanucleus.store.query.compiler.SymbolTable [Javadoc] which is a lookup table (map) of identifiers and their value. So, for example, an input parameter will have a name, so has an entry in the table, and its value is stored there. This is then used during evaluation.
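
For instance, in the query below the implicit parameter "minPrice" gets an entry in the SymbolTable during compilation, and its value is bound there when the query is executed (class/field names hypothetical):

Query q = pm.newQuery("SELECT FROM mydomain.Product WHERE price > :minPrice");
// compilation registers "minPrice" as a parameter symbol in the SymbolTable
List results = (List)q.execute(20.0);   // the value 20.0 is looked up via the SymbolTable during evaluation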

Query : Evaluation In-datastore

Intuitively it is more efficient to evaluate a query within the datastore since it means that fewer result objects need instantiating in order to determine the result. To evaluate a compiled query in the datastore there needs to be a compiler for taking the generic expression compilation and converting it into a native query. Note that you aren't forced to evaluate the whole of the query in the datastore, maybe just the filter clause; this would be done where the datastore's native language only provides a limited amount of query capability. For example with db4o we evaluated the filter and ordering in the datastore, using its SODA query language. The remaining clauses can be evaluated on the resultant objects in-memory (see below). Obviously for a datastore like RDBMS it should be possible to evaluate the whole query in-datastore.

Query : Evaluation In-memory

Evaluation of queries in-memory assumes that we have a series of "candidate" objects. These are either user-input to the query itself, or retrieved from the datastore. We then use the in-memory evaluator org.datanucleus.store.query.inmemory.InMemoryExpressionEvaluator [Javadoc]. This takes in each candidate object one-by-one and evaluates whichever of the query clauses are desired to be evaluated. For example we could just evaluate the filter clause. Evaluation makes use of the values of the fields of the candidate objects (and related objects) and uses the SymbolTable for values of parameters etc. Where a candidate fails a particular clause in the filter then it is excluded from the results.
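
With RDBMS the whole query would normally be evaluated in-datastore, but evaluation can be forced in-memory via a query extension; the extension name below is the documented DataNucleus one, though treat it as an assumption for your particular version:

Query q = pm.newQuery("SELECT FROM " + MyClass.class.getName() + " WHERE value > 25");
q.addExtension("datanucleus.query.evaluateInMemory", "true");   // retrieve candidates, then evaluate
List results = (List)q.execute();                               // filter applied by InMemoryExpressionEvaluator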

Query : Results

There are two primary ways to return results to the user.

  • Instantiate all into memory and return a (java.util.)List. This is the simplest, but obviously can impact on memory footprint.
  • Return a wrapper to a List, and intercept calls so that you can load objects as they are accessed. This is more complex, but has the advantage of not imposing a large footprint on the application.

To make use of the second route, consider extending the class org.datanucleus.store.query.AbstractQueryResult and implement the key methods. Also, for the iterator, you can extend org.datanucleus.store.query.AbstractQueryResultIterator.


Types : Second-Class Objects

When a persistable class is persisted and has a field of a (mutable) second-class type (Collection, Map, Date, etc) then DataNucleus needs to know when the user calls operations on it to change the contents of the object. To do this, at the first reference to the field once enlisted in a transaction, DataNucleus will replace the field value with a proxy wrapper wrapping the real object. This has no effect for the user in that the field is still castable to the same type as they had in that field, but all operations are intercepted.
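
For example (hypothetical class and field names), once the owning object is enlisted in a transaction its mutable fields are transparently swapped for proxy wrappers:

tx.begin();
MyClass obj = (MyClass)pm.getObjectById(id);

Collection items = obj.getItems();    // still castable/usable as a java.util.Collection,
                                      // but at runtime it is a wrapper (e.g. org.datanucleus.store.types.wrappers.backed.HashSet)
items.add(newItem);                   // intercepted : field marked dirty / backing store updated
tx.commit();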

Types : Container fields and caching of Values

By default when a container field is replaced by a second-class object (SCO) wrapper, caching of the values in that field is enabled. This means that once the values are loaded in that field there is no need to make any call to the datastore unless changing the container. This gives a significant speed-up compared to relaying all calls via the datastore. You can disable caching by setting either

  • Globally for the PersistenceManagerFactory - this is controlled by setting the persistence property org.datanucleus.cache.collections. Set it to false to pass through to the datastore.
  • For the specific Collection/Map - add a MetaData <collection> or <map> extension cache setting it to false to pass through to the datastore.

This is implemented in a typical SCO proxy wrapper by using the SCOUtils method useContainerCache() which determines if caching is required, and by having a method load() on all proxy wrapper container classes.

Types : Container fields and Lazy Loading

JDO and JPA provide mechanisms for specifying whether fields are loaded lazily (when required) or whether they are loaded eagerly (when the object is first met). DataNucleus follows these specifications but also allows the user to override the lazy loading for a SCO container. For example if a collection field was marked as being part of the default fetch group it should be loaded eagerly which means that when the owning object is instantiated the collection is loaded up too. If the user overrides the lazy loading for that field in that situation to make it lazy, DataNucleus will instantiate the owning object and instantiate the collection but leave it marked as "to be loaded" and the elements will be loaded up when needed. You can change the lazy loading setting via

  • Globally for the PMF/EMF - this is controlled by setting the persistence property org.datanucleus.cache.collections.lazy. Set it to true to use lazy loading, and set it to false to load the elements when the collection/map is initialised.
  • For the specific Collection/Map - add a MetaData <collection> or <map> extension cache-lazy-loading. Set it to true to use lazy loading, and false to load once at initialisation.
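
A hedged example of setting both options globally when creating the PMF, using the property names quoted above (verify the exact keys against your DataNucleus version):

Properties props = new Properties();
// ... connection/user properties ...
props.setProperty("org.datanucleus.cache.collections", "true");        // cache the contents of SCO containers (set false to pass reads through)
props.setProperty("org.datanucleus.cache.collections.lazy", "true");   // load the elements lazily rather than at initialisation
PersistenceManagerFactory pmf = JDOHelper.getPersistenceManagerFactory(props);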

Types : SCO fields and Queuing operations

When DataNucleus is using an optimistic transaction it attempts to delay all datastore operations until commit is called on the transaction or flush is called on the PersistenceManager/EntityManager. This implies a change to operation of SCO proxy wrappers in that they must queue up all mutating operations (add, clear, remove etc) until such a time as they need to be sent to the datastore. The ExecutionContext has the queue for this purpose.

All code for the queued operations is stored under org.datanucleus.flush.

Types : Simple SCO interceptors

There are actually two sets of SCO wrappers in DataNucleus. The first set provides lazy loading, queueing, etc and has a "backing store" through which operations can be fed to the datastore as they are made (for RDBMS). The second set consists of simple wrappers that intercept operations and mark the field as dirty in the StateManager; this set is for use with all (non-RDBMS) datastores that don't utilise backing stores and just want to know when the field is dirty and hence should be written.

All code for the backed SCO wrappers is stored under org.datanucleus.store.types.wrappers.backed. All code for the simple SCO wrappers is stored under org.datanucleus.store.types.wrappers.


Schema

MultiTenancy

The handling of multi-tenancy is present in each of the store plugins but is controlled from

  • org.datanucleus.metadata.AbstractClassMetaData.getMultitenancyMetaData : returns details of any multi-tenancy discriminator (null if the class does not need it).
  • org.datanucleus.ExecutionContext.getTenantId : returns the tenant id to use for multi-tenancy (for any write operations, and optionally any read operations)
  • org.datanucleus.PersistenceNucleusContext.getTenantReadIds : returns the tenant ids to use in any read operations (overriding the tenant id above when specified).

The metadata of the class defines whether it has a tenancy discriminator (i.e. you have to explicitly add the metadata to get a discriminator).

CDI Integration

DataNucleus allows the use of CDI-injected resources in attribute converter classes (JDO and JPA) as well as JPA lifecycle listeners. The basis for this is the specification of the persistence property datanucleus.cdi.bean.manager. If this is set then when creating an instance of the object with injected resources, we call

CDIHandler.createObjectWithInjectedDependencies(cls);
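
A hedged example of wiring this up when creating the PMF (assumes a CDI container is running and its BeanManager is available to the application):

Map<String, Object> overrides = new HashMap<>();
overrides.put("datanucleus.cdi.bean.manager", beanManager);   // javax.enterprise.inject.spi.BeanManager from the container
PersistenceManagerFactory pmf = JDOHelper.getPersistenceManagerFactory(overrides, "myUnit");
// AttributeConverter / lifecycle listener instances are now created with their injected dependencies satisfied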


datanucleus-core's Issues

Runtime Enhancement : cater for relations when determining persistability of classes

When we have a class A that has a relation to class B, and in the metadata this relation field is not made explicitly persistent, this should be fine (since the other class is going to be enhanced). The problem is that the enhancement of class B happens too late for it to be reported that A.b is persistent and hence enhanced.
test.jar.gz

Workaround is to explicitly set the persistence-modifier to persistent so we know that A.b is persistent

CompleteClassTable : support column names for embedded collection element (map key/value, array element)

CompleteClassTable should potentially be able to provide naming for embedded collection elements (where the element is stored nested in something like a JSON/XML object), or embedded map keys/values, etc.

Currently this is not handled completely. We support nested embedded PC fields, and simply store the ColumnImpl into columns/columnByName (keyed by its column name), and its MemberColumnMappingImpl into mappingByEmbeddedMember (keyed by its embedded member name PATH). The problems with this are that

  1. We cannot cope with storing embedded keys AND values this way, since if we have a field map then the mappingByEmbeddedMember path for a key field field1 would be map.field1, and if the value had a field of the same name (hence also map.field1) then how would we know it was for the value?
  2. There is no concept of hierarchy to represent where we are storing the members as "nested", we just have their embedded member path and have to extract it
  3. We currently have a check whether a column is already used for a "table", and if a nested column name clashes with a column name for a primary field of the main class then we get an exception!

The problem with changing the way these MemberColumnMapping and Column objects are stored is that they are accessed from the store plugins for Cassandra, Excel, HBase, JSON, MongoDB, Neo4j, and ODF. Consequently these plugins would need updating to use the "new" API

CompleteClassTable and MemberColumnMapping do not support collection element or map key/value conversion. Need to add

For all non-RDBMS datastores we make use of the CompleteClassTable structure with its MemberColumnMapping. This handles a converter for the field as a whole, but does not currently support having a converter just for collection elements (when this field is a collection), and similar for map keys/values. We should put this information into the MemberColumnMapping somehow so that plugins can make use of it without having to retrieve the converter

Assorted improvements to DatastoreId and SingleFieldId

Both are for an "identity" that represents a persistable type, with a single "key". There should be much commonality.

Also SingleFieldId classes take in Class in the constructor for no reason now (could be String) - needs changing.

Also need to change ClassConstants.IDENTITY_OID_IMPL to be something like ClassConstants.DATASTORE_ID_IMPL

Problem areas are :

  1. JDO "javax.jdo.identity" classes, that we need to convert back and forward to, uses class as argument in constructor (for no reason where String would do), and this wastes space.
  2. SingleFieldId doesn't have a String constructor that takes the output of toString(), and toString() doesn't include the target class name.
  3. DatastoreId classes allow configuration of the toString() method, but SingleFieldId doesn't. One way around this is to have a separate handler for the string form of the id (like IdentityUtils has a method getPersistableId()) so then we could dispense with DatastoreIdKodoImpl, DatastoreIdXcaliaImpl etc also maybe

Have now removed the majority of unnecessary references to DatastoreId/SingleFieldId and use IdentityUtils instead.

Enhancement contract : cater for serialisation of detachedState when user has overridden writeObject/writeReplace without calling "defaultWriteObject"

The enhancement contract will cater for normal Java serialisation where a user doesn't change the default process, OR where they make use of out.defaultWriteObject(). The dnDetachedState field is serialised/deserialised since not transient.

If the user deviates from this then it will not be serialised, and hence detached state is lost.
We would want to detect the overriding of writeObject/writeReplace and the absence of a call to defaultWriteObject (defaultReadObject) and in that case add on

out.writeObject(dnDetachedState);

as well as

dnDetachedState = (Object[])in.readObject();

Reachability algorithm should transition from P_NEW to TRANSIENT if object is no longer reachable

May values of PC object be deleted by JPOX if the PC object becomes transient?
Hi!

I'm currently investigating a "problem":

Consider this:

public class Qualification implements Serializable
{
Organisation organisation;

[...]

}

public class Organisation
{
String name;

public Organisation(String name)
{
    this.name = name;
}

}

            Qualification qual = new Qualification("ISO 2001 certificate number 123045");
            Organisation org = new Organisation("JPOX Corporation");
            Organisation org2 = new Organisation("JPOX Consulting");

            qual.organisation = org;

            pm.makePersistent(qual);

            qual.organisation = org2;

            tx.commit();

Initially, org becomes PERSISTENT_NEW (by reachability). Then, the pointer pointing to org is replaced, thus, org is not longer reachable by qual. For that reason, org becomes PERSISTENT_NEW_DELETED, and then it becomes TRANSIENT. That's all correct.

What is wrong: org.name now has the value null, and not the value "JPOX Corporation", because JPOX silently deleted all the fields of org.

It explicitly violates the JDO2.0 spec, which tells on page 67 "18. A persistent-new-deleted instance transitions to transient on commit. No changes are made to the values.". However, JPOX currently makes changes to the values, leading to intricate, difficult-to-reproduce data loss problems.

Update lifecycle transitions for datastores that don't support transactions (so commit/rollback)

It would be desirable for the lifecycle transitions to better respect when a datastore doesn't support transactions. For example, if makePersistent is called then this is effectively going to put the object in the datastore and not allow it to be rolled back (i.e. rollback does nothing). There are likely various use-cases to be worked through here. See jdo/general "StateTransitionsTest"

Change MetaData objects so that AbstractClassMetaData/AbstractMemberMetaData have MetaDataManager accessor

We currently have the various metadata objects with differing handling of the MetaDataManager. In FileMetaData it is set on the object, whereas on ClassMetaData etc it is passed in to methods as required.

Since MetaData objects can only be obtained from the MetaDataManager likely the best policy would be for AbstractClassMetaData to have a MetaDataManager reference, set on populate() and have an accessor. Similarly AbstractMemberMetaData can navigate to the class and so have an accessor for MetaDataManager. This will simplify the assorted methods, reducing the need to pass in the MetaDataManager.

Optimistic Transactions / Queued Operations Problems

The current technique used to queue operations in optimistic transactions is flawed: it queues operations per collection and does not honour the order of operations between different collections. If I have an optimistic transaction which does the following:

(given: two objects with collection properties, one object of element type)

objectA.collection.add(element)
objectA.collection.remove(element)
objectB.collection.add(element)
objectB.collection.remove(element)
objectA.collection.add(element)

The outcome after commit depends on the order in which the collections are processed.

A: add(element), remove(element), add(element)
B: add(element), remove(element)

With relationship management, element.owner would now be null; if B is processed first, it would be A. If everything was processed in order, it would be A (i.e. A would be correct / expected).

Originally raised on old DN forum.

Add support for JPQL FROM join to a new "root" with ON condition

JPQL has certain restrictions on joining, allowing joining from the previous alias in a chain. So we can have a join along a relation
SELECT p FROM Person p LEFT OUTER JOIN p.account a

If there is no relation, it may be nice to allow a join to a new "root" like this
SELECT p FROM Person p LEFT OUTER JOIN MailingAddress a ON p.address = a.address

List wrapper SCOs have inefficient initialise method when updating (setXXXField). Create efficient logic for working out changed elements

The structure for changing this is now in place in DN 4.1 and has been done for Sets, Maps. So wrappers for Lists can be modified from 4.1 onwards without needing other plugins changing.

See https://github.com/datanucleus/datanucleus-core/blob/master/src/main/java/org/datanucleus/store/types/wrappers/backed/ArrayList.java#L133 which currently does a clear() of the old list and addAll() of the new list.

Detaching an object graph with a Map and an overridden hashcode() in the map key fails when the key is being detached

Attached project demonstrates that PersistenceManager.detachCopy(Object) fails when the object graph has these properties:

  1. There is a loop
  2. The loop contains a Map key
  3. The Map key has hashcode() overridden

The reason detaching fails is that the instance added to the Map is still being detached (the fields are not set yet).

It is possible that this issue cannot be fixed due to the nature of the object graph, and the requirement of JDO to detach a copy.
DN_FailingDetach.zip

Detaching the same object graph at the end of a transaction (i.e. in-place) works perfectly. The settings for this are:
javax.jdo.option.DetachAllOnCommit=true
javax.jdo.option.RetainValues=true

In-memory query evaluation : support variables

We currently do not consider variables in in-memory evaluation. What we would need to do is change JavaQueryEvaluator.execute to handle the filter and result etc inside a loop, so we have variable values for each candidate within the loop, and then reset them at the end of each candidate in the loop. This will likely mean we need to handle things differently for aggregate results

Simple SCO container wrappers could delay processing of cascade-delete until flush/commit

The "simple" SCO wrappers for Collection/Map fields now handle cascade delete. They perform the delete immediately. While this complies with the JDO spec, it would be nice to register all updates perhaps with RelationshipManager and run that at flush to handle such deletes.

An example, if we have a 1-N bidir and we want to move an element from one owner to another. We remove the element from the old owner, and then add it to the new owner. The only problem is that the container has "dependent-field" specified and the remove() will cause it to be deleted before it can be added to the new owner. The proposed new mode of RelationshipManager will need to check for triggering of cascade-delete and then check if the owner of the element has since been changed and not null, this would then not cause it to be deleted.

Notes for implementation :

  1. This is only aimed at the "simple" SCO wrappers currently. That is for all datastores except RDBMS and GAE. These don't have a backing store so have their own wrappers.
  2. Need to cater for all possible routes for swapping elements. Namely if we have owner class A and element class B, and "a1" has element "b1" which we want to move to "a2". So two possible ways to swap :-

a2.getElements().add(b1);
a1.getElements().remove(b1);

a1.getElements().remove(b1);
a2.getElements().add(b1);

Any mechanism has to support both.

Once this is done then extend to SCO wrappers with backing store.

Update of embedded when using pessimistic txns can result in problem in dirty field handling

Consider

class A
{
    @Embedded
    B b;
}
class B
{
    @Embedded
    C c;
}
class C
{
    String field1;
    String field2;
}

We do the following updates

a.b.c.field1=newVal1;
a.b.c.field2=newVal2;

The first update will call SM.setXXXField. This calls SM.replaceField(pc, field, value, makeDirty) and cascades up to its owner. This then calls ec.makeDirty(a) followed by ec.makeDirty(c), and the second call triggers flushInternal in ExecutionContextImpl (due to reaching the limit of dirty objects). At this point the only object dirty in the ExecutionContextImpl is A. The call to ec.makeDirty(c) then completes marking it as dirty.

The second update then calls ec.makeDirty(a) which again triggers flushInternal. The dirty flag on the "c" hasn't yet been updated so anything in a store plugin relying on that information will fail.

This has likely been the situation since day 1 of supporting embedded objects

We get 2 SQL invoked, like these

UPDATE A SET C_ID=1,C_NAME_1='C3',C_NAME_2='C2',B_ID=1,B_NAME='B1' WHERE ID=1
UPDATE A SET C_ID=1,C_NAME_1='C3',C_NAME_2='C4',B_ID=1,B_NAME='B1' WHERE ID=1

so the first change goes through, followed by the second change.

If the user sets datanucleus.flush.auto.objectLimit to something like 3 then we get a single SQL like this
UPDATE A SET C_ID=1,C_NAME_1='C3',C_NAME_2='C4',B_ID=1,B_NAME='B1' WHERE ID=1

Provide a persistence property to allow a MetadataListener to be registered when a PMF is instantiated, such that it is called before any autostart classes are loaded.

Background to this request is in NUCCORE-1328.

Because DN does not support automatic schema creation for entities annotated with @PersistenceCapable(schema=...), I use a MetadataListener to do this work, "just in time". However, this prevents me from using the autoStart feature; instead I have to call SchemaTool's createSchema programmatic API.

Note : DN does now support schema creation (for RDBMS) as per NUCRDBMS-908

Upgrade logging to use Log4j 2.x

If possible allow support for Log4j v1 and v2 and so it chooses v2 if present in the CLASSPATH, else v1, else j.u.l

This has Log4j2Logger in datanucleus-core, but with the default still Log4j v1.

Remaining work to make Log4j v2 as default

  1. PluginParserTest has some use of Appender to detect messages raised. Needs changing to use v2 DONE
  2. Add log4j v2 config files (to replace log4j.properties)
  3. Update Maven/Eclipse to allow Log4j v2 config file. Maven DONE

Enhancement contract : Embedded PK has incomplete dnCopyKeyFieldsXXX methods

When enhancing a class marked as @Embeddable the class is enhanced, however its jdoCopyKeyFieldsXXX methods are not set (empty body) and so the key fields are not copied.

This is tested by "test.jpa.general" EmbeddedTest. When calling clean(...) this tries to use the SQL
DELETE FROM ... WHERE ... = ...
yet the PK values are not set. This is because AppIDObjectIdFieldConsumer is created and passed to Department.jdoCopyKeyFieldsFromObjectId, which calls AppIDObjectIdFieldConsumer.storeObjectField(...) with the DepartmentPK. This then calls DepartmentPK.jdoCopyKeyFieldsFromObjectId(...) which does nothing!

In-memory query evaluation : support correlated subqueries

See JDO2 TCK tests

CorrelatedSubqueries.testPositive
CorrelatedSubqueriesWithParameters.testPositive

query: SELECT FROM org.apache.jdo.tck.pc.company.Employee WHERE this.weeklyhours > (SELECT AVG(e.weeklyhours) FROM this.department.employees e)
[java] expected: java.util.ArrayList of size 6
[java] [FullTimeEmployee(1, emp1Last, emp1First, born 10/Jun/1970, phone {}, hired 1/Jan/1999, weeklyhours 40.0, $30000.0), FullTimeEmployee(2, emp2Last, emp2First, born 22/Dec/1975, phone {}, hired 1/Jul/2003, weeklyhours 40.0, $20000.0), FullTimeEmployee(4, emp4Last, emp4First, born 6/Sep/1973, phone {}, hired 15/Apr/2001, weeklyhours 40.0, $25000.0), FullTimeEmployee(6, emp6Last, emp6First, born 10/Jun/1969, phone {}, hired 1/Jun/2002, weeklyhours 40.0, $22000.0), FullTimeEmployee(7, emp7Last, emp7First, born 10/Jun/1970, phone {}, hired 1/Jan/2000, weeklyhours 40.0, $40000.0), FullTimeEmployee(10, emp10Last, emp10First, born 5/Sep/1972, phone {}, hired 1/Oct/2002, weeklyhours 40.0, $24000.0)]
[java] got: java.util.ArrayList of size 7
[java] [FullTimeEmployee(1, emp1Last, emp1First, born 10/Jun/1970, phone {}, hired 1/Jan/1999, weeklyhours 40.0, $30000.0), FullTimeEmployee(2, emp2Last, emp2First, born 22/Dec/1975, phone {}, hired 1/Jul/2003, weeklyhours 40.0, $20000.0), FullTimeEmployee(7, emp7Last, emp7First, born 10/Jun/1970, phone {}, hired 1/Jan/2000, weeklyhours 40.0, $40000.0), FullTimeEmployee(10, emp10Last, emp10First, born 5/Sep/1972, phone {}, hired 1/Oct/2002, weeklyhours 40.0, $24000.0), FullTimeEmployee(4, emp4Last, emp4First, born 6/Sep/1973, phone {}, hired 15/Apr/2001, weeklyhours 40.0, $25000.0), PartTimeEmployee(5, emp5Last, emp5First, born 5/Jul/1962, phone {}, hired 1/Nov/2002, weeklyhours 35.0, $18000.0), FullTimeEmployee(6, emp6Last, emp6First, born 10/Jun/1969, phone {}, hired 1/Jun/2002, weeklyhours 40.0, $22000.0)]
[java] at org.apache.jdo.tck.JDO_Test.fail(JDO_Test.java:682)
[java] at org.apache.jdo.tck.query.QueryTest.queryFailed(QueryTest.java:518)
[java] at org.apache.jdo.tck.query.QueryTest.checkQueryResultWithoutOrder(QueryTest.java:548)
[java] at org.apache.jdo.tck.query.QueryTest.execute(QueryTest.java:1293)
[java] at org.apache.jdo.tck.query.QueryTest.executeJDOQuery(QueryTest.java:1161)
[java] at org.apache.jdo.tck.query.jdoql.subqueries.CorrelatedSubqueries.testPositive(CorrelatedSubqueries.java:83)

A problem is that the candidates of the subquery will need retrieval and input before evaluation.

  • If the candidates of the subquery are of a different class to the candidate of the outer query then we need to retrieve them first.
  • If the candidates of the subquery relate to the outer query then this isn't necessary and we can nest the subquery evaluation in the outer query evaluation.

Class loading inside the loadClass (PersistenceCapable) method, should use the classloader of the current class

current:

try
{
    return Class.forName(className);
}
catch (ClassNotFoundException e)
{
    throw new NoClassDefFoundError(e.getMessage());
}

and it should be:

try
{
    return Class.forName(className, false, CurrentClass.getClass().getClassLoader());
}
catch (ClassNotFoundException e)
{
    throw new NoClassDefFoundError(e.getMessage());
}

This change breaks test.jdo.general "org.datanucleus.tests.metadata.AnnotationsPersistentInterfacesTest". Change rolled back

Ability to enhance all classes defined by a persistence.xml file

The DN enhancer allows us to enhance all classes defined by input XML/class files, or all classes defined by a persistence-unit name. Would be nice to allow the input of a persistence.xml file (with potentially multiple persistence-units) and enhance all classes

Composite PK auto-generation (compound identity support)

The org.datanucleus.enhancer.asm.primarykey.PrimaryKeyGenerator class has several TODOs that indicate support for compound identity is missing. This would be nice to have so I'm just opening a bug to request/track the additional feature.

If this isn't liable to coming anytime soon, I'd recommend that an error/warning be spit out during enhancement to mention that these classes aren't supported. As things stand now, this looks like it works up until runtime, when an exception similar to the following is thrown:
java.lang.NoSuchFieldError: thresholdRule
at com.knowledgecc.cxmlchecker.threshold.ThresholdRuleValidator_PK.hashCode(Unknown Source)
at java.util.HashMap.put(HashMap.java:418)
at org.datanucleus.util.ReferenceValueMap.put(ReferenceValueMap.java:145)
at org.datanucleus.cache.SoftRefCache.put(SoftRefCache.java:51)
at org.datanucleus.ObjectManagerImpl.putObjectIntoCache(ObjectManagerImpl.java:3874)
at org.datanucleus.ObjectManagerImpl.replaceObjectId(ObjectManagerImpl.java:3821)
at org.datanucleus.jdo.state.JDOStateManagerImpl.setIdentity(JDOStateManagerImpl.java:941)
at org.datanucleus.jdo.state.JDOStateManagerImpl.initialiseForPersistentNew(JDOStateManagerImpl.java:470)
at org.datanucleus.state.StateManagerFactory.newStateManagerForPersistentNew(StateManagerFactory.java:151)
at org.datanucleus.ObjectManagerImpl.persistObjectInternal(ObjectManagerImpl.java:1427)
at org.datanucleus.ObjectManagerImpl.persistObject(ObjectManagerImpl.java:1241)
at org.datanucleus.jdo.JDOPersistenceManager.jdoMakePersistent(JDOPersistenceManager.java:655)
at org.datanucleus.jdo.JDOPersistenceManager.makePersistent(JDOPersistenceManager.java:680)
at com.knowledgecc.entity.ThresholdRuleValidatorModelTest.saveValidator(ThresholdRuleValidatorModelTest.java:333)
at com.knowledgecc.entity.ThresholdRuleValidatorModelTest.testModelForSavedInstance(ThresholdRuleValidatorModelTest.java:94)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:585)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:76)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184)
at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:46)
at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197)

Support 1-N with subclass-table and compound identity, or trap in metadata processing

I found a small issue with certain metadata files throwing a java.lang.NullPointerException in org.datanucleus.store.rdbms.table.ClassTable#1046, where the datastoreClass variable may be null, failing during the test datastoreClass.getIdMapping().

Please find attached to this issue a patch to apply to give the necessary information in the log and not the plain java.lang.NullPointerException.

DataFederation : Definition of persistence and retrieval of objects from federated datastores

Once the user has defined which datastores are federated for this PMF/EMF and which classes/objects are persisted into which datastore, then we need to produce the logic for handing off persistence and retrieval to the correct StoreManager.

We need to update QueryManager to allow for federation. The StoreManager needs to return the Query object (instead of QueryManager currently), so then FederatedStoreManager can pass a query object that will handle the federation of query results.

Also need to take into account that the secondary datastore name is lowercased during the persistence property read, so need to do case-insensitive search for datastore name

Similarly need to update wrapper classes to get hold of the actual StoreManager for the object, maybe via a method on ObjectProvider which goes via ExecutionContext and if we have FederatedStoreManager then gets the right one - this is now done.

FetchPlan - support field.field, field#element.field, field#key.field, field#value.field syntax

The JDO2 spec defines the ability to specify fetch groups using field specs including
field1.field2
field1.field2.field3
field1#element
field1#element.field3

This is low priority since the same can be achieved by specifying fetch groups for each of the classes concerned.

An example

class A
{
    B b;
}
class B
{
    C c;
}

and we want to have a fetch group that loads up A.b, and B.c. With the above syntax we would add this for class A

<fetch-group name="group1">
    <field name="b"/>
    <field name="b.c"/>
</fetch-group>

however you could easily do
(class A)

<fetch-group name="group1">
    <field name="b"/>
</fetch-group>

(class B)

<fetch-group name="group1">
    <field name="c"/>
</fetch-group>

Work required to complete this :-

  1. Update AbstractPropertyMetaData so that if a field name of "a.b" is specified it doesn't just store the name as "b".
  2. Update FetchGroupMetaData.initialise() so that when encountering a field name specification as above, it finds the class that the field refers to and adds a fetch group of the same name over there (recursive).

This "feature" also, in principle, means that if we have A.B and A.C.B then the "B" could be loaded differently based on the place in the object graph. This would mean a major tear up of FetchPlan handling, and for little usage. That aspect is consequently not of interest

Add option of detaching wrappers of SCO classes, so allowing change tracking whilst detached

The current SCO wrappers don't have a way of marking the particular field of the owning object as dirty when a mutator method is called.
e.g. if we have a Collection field and, whilst detached, the user calls collection.add(...).

The changes needed are :
Enhancer : jdoMakeDirty - if detached (jdoStateManager == null && jdoDetachedState != null) then update the detached state for that field
Wrapper classes : makeDirty - if no ObjectProvider then call pc.jdoMakeDirty(fieldName).

With SVN trunk at 26/05/2009 the user can use "detachAllOnCommit" and the persistence property "datanucleus.detachAsWrapped" as "true" and will detach wrappers for SCOs. This will then keep track of dirty fields whilst detached (hence the attach can be optimised)

Taken from NUCCORE-1025. "Currently we default to detaching the raw java types for collection/map fields. There is a persistence property to allow detach (under some circumstances) with SCO wrappers for Collection/Map fields. The next logical step is to save a dirty flag when detached (i.e. no owner) and allow the re-attach (need a setOwner method on the SCO)". Merged these two issues.

RelationshipManager caters for subset of relation changes and should be extended to add other possibilities

For example, with RDBMS there is an amount of logic in FKSetStore, FKListStore, JoinSetStore, JoinListStore to manage some relation changes. This code should be moved to RelationshipManager so then all datastores would benefit from the relation management code.
SVN trunk now caters for set of a Collection field (replace the collection by a different collection, detecting the add/remove of elements).

JDO spec 15.3 for reference purposes.

The field on the other side of the relationship can be mapped by using the mapped-by attribute identifying the field on the side that defines the mapping. Regardless of which side changes the relationship, flush (whether done as part of commit or explicitly by the user) will modify the datastore to reflect the change and will update the memory model for consistency. There is no further behavior implied by having both sides of the relationship map to the same database column(s). In particular, making a change to one side of the relationship does not imply any runtime behavior by the JDO implementation to change the other side of the relationship in memory prior to flush, and there is no requirement to load fields affected by the change if they are not already loaded. This implies that if the RetainValues flag or DetachAllOnCommit is set to true, and the relationship field is loaded, then the implementation will change the field on the other side so it is visible after transaction completion.

Similarly, if one side is deleted, the other side will be updated to be consistent at flush. During flush, each relationship in which the instance is involved is updated for consistency. These changes are applied to the object model instances. If the relationship is marked as dependent, the related instance is deleted. If the relationship is not marked as dependent, the corresponding field in the related instance is updated to not refer to the instance being deleted:
• If the related field is a collection, then any referencing element is removed.
• If the related field is a map, then any referencing map entry is removed.
• If the related field is a reference, then it is set to null.
If the related instances are not instantiated in memory, there is no requirement to instantiate them. Changes are applied to the second level cache upon commit.

The object model changes are synchronized to the database according to the declared mapping of the relationships to the database. If related instances are to be deleted, and there is a foreign key declared with a delete action of cascade delete, then the jdo implementation need do nothing to cause the delete of the related instance. Similarly, if there is a foreign key declared with a delete action of nullify, then the jdo implementation need do nothing to cause the column of the mapped relationship to be set to null. If there is a foreign key declared to be not nullable, and the requirement is to nullify the related field, then JDODataStoreException is thrown at flush.

Conflicting changes to relationships cause a JDOUserException to be thrown at flush time. Conflicting changes include:
• adding a related instance with a single-valued mapped-by relationship field to more than one one-to-many collection relationship
• setting both sides of a one-to-one relationship such that they do not refer to each other

Provide ability for store plugins to load references of related objects and cache them in the ExecutionContext (to save later fetch)

When a fetch of the DFG is triggered, currently this will instruct the store plugin via XXXPersistenceHandler.fetchObject to fetch all unloaded DFG fields. Clearly, for the majority of non-RDBMS store plugins, we could just pull in all relation fields too (since we only store the "persistableId" of the related object in the "table"). This information could then be cached in the ExecutionContext, and the first time the ObjectProvider is accessed for the value of that field and it isn't loaded we could call the ExecutionContext to see if there is info for that field and materialise the object(s) from the persistable id(s).

ExecutionContext has a mechanism for storing "associated values" of an ObjectProvider, so that can be used for storing these persistableIds

Property validation triggered twice on entity creation

Hi, currently when using Bean Validation with JDO, the validation code gets triggered twice on entity creation (which might impact performance depending on how heavy the validation is). The cause for that issue is this line over here:

https://github.com/datanucleus/datanucleus-core/blob/master/src/main/java/org/datanucleus/state/StateManagerImpl.java#L3293

As pointed out on the thread http://www.datanucleus.org/servlet/forum/viewthread_thread,7936 by one of the DN experts, the validation code gets triggered when the corresponding callback methods on JDOCallbackHandler get called, so that line is totally unnecessary (that's something I verified).

I'll provide a PR in order to fix this minor issue (which I tested using the latest code). If you also need test code for it, I can provide it as well (not so simple to do, though).
PR: #14

This PR will not be applied. Simply deleting lifecycle events from happening is NOT an option.
JPA uses the prePersist callback to have a lifecycle event as per what that does.
I see no test that this is trying to fix.

List.contains() returns inconsistent/incorrect result for a non-persistent argument

Class A has a field List<B> items. Class B has field "double value" and equals overridden to take into account the value of "value" (sorry for tautology :-)). An instance of B is added to A.items and A's instance is persisted. After retrieving A's instance back from the datastore and running contains() on the new instance of B with the same "double value" it returns "false". After iterating A.items it returns true for contains() with the same argument.

Looks like for the first time the cache is not loaded, so the org.datanucleus.store.types.sco.backed.List is used and ElementContainerStore.validateElementForReading() returns false, as the "element" is not persistent and not detached (it was never persisted). Not sure if this should be taken into account as it breaks transparency of JDO - if two objects are equal in Java terms why are they considered different if they differ in an opaque "persistent" state.

The second time already cached java.util.ArrayList collection is used, returning correct result.

Please see the attached testcase.
Verified java.util.List docs just in case:

Returns true if this list contains the specified element.
More formally, returns true if and only if this list contains
at least one element e such that
(o==null?e==null:o.equals(e)).

boolean contains(Object o)

Downgrading since the issue is use of methods only present in instances, so they have to be instantiated to evaluate them, hence you have to cache the collection, i.e. an easy workaround.
NUCCORE-684.zip

One-Many bidir: moving elements by setting collection fails

Moving a collection element fails for One-Many bidir when it is done by setting a collection (containing that element) on a new owner. This is the case both for FKSetStore and JoinSetStore. Reproducible using ManagedRelationshipTest.testOneToManyJoinBidirSetCollectionMoveElement() and testOneToManyFKBidirSetCollectionMoveElement().
