
Comments (34)

DavyLandman commented on June 8, 2024

I've run some JMH benchmarks on ways to avoid this allocation; they are all slower than just allocating A LOT.

I've tried:

  • storing the wrapper inside a ThreadLocal (sketched below)
  • keeping a queue and pushing reusable LookupWrappers onto it
  • implementing a small circular buffer of LookupWrappers bound to the most recent threads

All are slower, so I'm closing this issue until someone proposes something else.
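
For reference, the ThreadLocal variant looked roughly like the sketch below. This is a hedged illustration only: the mutable reset method and the map layout are assumptions, not vallang's actual API.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hedged sketch of the ThreadLocal variant: reuse one mutable wrapper per
// thread instead of allocating a fresh LookupWrapper for every lookup.
final class ThreadLocalLookupExample<T> {

    // Stand-in for vallang's LookupWrapper; reset(..) is an assumed API.
    static final class LookupWrapper {
        private Object target;

        LookupWrapper reset(Object target) {
            this.target = target;
            return this;
        }

        @Override
        public int hashCode() {
            return target.hashCode();
        }

        @Override
        public boolean equals(Object other) {
            // the real wrapper would also unwrap WeakReferenceWrap keys here
            return other != null && other.equals(target);
        }
    }

    private final ThreadLocal<LookupWrapper> cached = ThreadLocal.withInitial(LookupWrapper::new);
    private final Map<Object, T> store = new ConcurrentHashMap<>();

    T lookup(T key) {
        LookupWrapper wrapper = cached.get().reset(key);
        try {
            return store.get(wrapper);
        } finally {
            wrapper.reset(null); // don't pin the key through the thread-local
        }
    }
}

As the benchmarks showed, this extra bookkeeping costs more than simply allocating the short-lived wrappers.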

jurgenvinju commented on June 8, 2024
/**
 * Special wrapper that is used for lookup, so that we don't have to create a weak-reference when we don't need it.
 * It would have been nicer if we could use the normal equals method of the target object, but that won't be able to work together with the WeakReferenceWrap.
 */

Since the IValue.equals() method meanwhile no longer ignores annotations (or anything else), have the reasons for not using WeakReferenceWrap directly disappeared?

DavyLandman commented on June 8, 2024

This is about weak-reference equals, not about IValue equals.

I checked Caffeine; they are doing something similar.

DavyLandman commented on June 8, 2024

It would only work if we extended Type's equals to deal with this unwrapping, since this class exists to avoid memory leaks in the TypeFactory.

jurgenvinju commented on June 8, 2024

This is also a reference: https://github.com/cwi-swat/aterms/blob/master/shared-objects/src/shared/SharedObjectFactory.java

jurgenvinju commented on June 8, 2024

It solves the entire problem in a different way; perhaps good to compare to.

jurgenvinju commented on June 8, 2024

Let's not, then. We're already depending on Caffeine, so that's alright. Which configuration of Caffeine caches are we thinking about?

DavyLandman commented on June 8, 2024

Well, the thing is, if you make a custom map implementation, you can inline the unwrapping of the weak reference and call equals on the correctly unwrapped objects.

This map might be replaced by a Caffeine LoadingCache with weak-reference keys, but I would have to check whether that still only does referential equality, and whether it guarantees an atomic loader action (see the sketch below).
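
For what it's worth, the configuration in question would look roughly like the sketch below. Note the caveat Caffeine itself documents: with weakKeys() the keys are compared by identity (==), which is exactly the behaviour that would need checking before relying on this for an equality-based interner.

import com.github.benmanes.caffeine.cache.Caffeine;
import com.github.benmanes.caffeine.cache.LoadingCache;

// Hedged sketch: a weak-keys LoadingCache whose loader returns the key itself,
// so the cached instance becomes the canonical one.
final class CaffeineWeakKeysExample<T> {
    private final LoadingCache<T, T> cache = Caffeine.newBuilder()
        .weakKeys()            // keys are weakly held and compared by identity
        .build(key -> key);    // the loader runs atomically per key

    T get(T key) {
        return cache.get(key);
    }
}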

DavyLandman commented on June 8, 2024

Actually, a WeakHashMap might be just right, with some locking around it. I'll try something on Monday.

DavyLandman commented on June 8, 2024

Thanks for the discussion @jurgenvinju. After reading how the JDK WeakHashMap works internally, it turned out to be exactly what we were searching for, so I switched to that. When I implemented this class two years back I wrote some quite extensive tests, so I feel confident that it's working as intended.

I'm very curious to see a benchmark of this; I might make a backport for the pre-Java-11 release so we can see the impact of it without the jump to Java 11 muddying the picture.

DavyLandman commented on June 8, 2024

I fear I'll end up implementing a small map after all, since WeakHashMap's get might end up changing the map, as it clears out stale entries.

ben-manes commented on June 8, 2024

At one point I was able to coerce escape analysis to avoid the runtime allocation. It is finicky and can't be relied upon, so I don't know if that still helps.

Avoiding allocations via a ThreadLocal was an approach that I considered for Caffeine, after suggesting it to elastic/apm-agent-java#357 to resolve their problems. It won't be a good idea with Loom (virtual threads) and can be a concern in environments that play games with classloaders.

If you want a custom map then Guava's cache (originally MapMaker), Spring's ConcurrentReferenceHashMap, or Hibernate's ConcurrentReferenceHashMap may be worth considering. All were written in Java 5 when the caching libraries had poor concurrency. They are optimized for reference-based caching, though I helped refocus Guava's towards size-based as best that I could. All three are mature but not maintained, because their sin was an emphasis on soft references as the ideal caching strategy. While this looked great in a microbenchmark, in reality it caused stop-the-world death spirals and many production fires (including Adwords).

Maybe instead a discussion on what this class is used for might inspire an alternative approach that avoids your allocation pains?

ben-manes commented on June 8, 2024

P.S. WeakHashMap is not thread-safe. Unfortunately it is not good enough to perform writes inside a lock while reading outside of it, as your commit does. This can cause visibility problems that might lead to an infinite loop. WeakHashMap is similar to an older version of Java's HashMap, and that race is described nicely in this article.

DavyLandman commented on June 8, 2024

Hi @ben-manes, thanks for checking in. I just committed daf00de, which at least should protect against the cleanup happening during the get lock.

But yeah, WeakHashMap sure seems like an outdated map. HashMap and ConcurrentHashMap have a lot of strategies, such as switching to trees in case of many collisions.

To help with some context: our runtime type system depends on hash-consing/maximal sharing for the type values, so in the TypeFactory we want to always return the same exact instance for the same type. A ConcurrentHashMap would already be good enough, except that it can cause memory leaks: in an IDE setup you might be constantly changing your types, and after a few days/weeks of an IDE being open, our type store is taking up a few GB that nobody else has a reference to anymore. So we want to store the keys (and values) in a WeakReference.
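
For concreteness, the leaking baseline would be roughly the sketch below (hedged; a type parameter stands in for the actual type classes):

import java.util.concurrent.ConcurrentHashMap;

// Minimal hash-consing sketch with a plain ConcurrentHashMap: equal keys
// always resolve to the same canonical instance, but the map holds strong
// references to every type ever created, which is exactly the leak above.
final class StrongHashConsing<T> {
    private final ConcurrentHashMap<T, T> canonical = new ConcurrentHashMap<>();

    T get(T fresh) {
        T existing = canonical.putIfAbsent(fresh, fresh);
        return existing != null ? existing : fresh;
    }
}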

Is that the discussion you were suggesting?

DavyLandman commented on June 8, 2024

> P.S. WeakHashMap is not thread-safe. Unfortunately it is not good enough to perform writes inside a lock while reading outside of it, as your commit does. This can cause visibility problems that might lead to an infinite loop. WeakHashMap is similar to an older version of Java's HashMap, and that race is described nicely in this article.

Damn, I was thinking about that; I should have checked it more carefully. Okay, I'll have to revert this and think of a new scheme on a branch.

Interestingly, our old approach with the HashMap never triggered this infinite loop. But it makes sense.

ben-manes commented on June 8, 2024

It looks like your HashConsingMap is more commonly referred to as an Interner (à la String.intern(s)). Guava's Interners.weak() delegates to MapMaker.weakKeys() with a dummy value (your WeakReference value is not needed).
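
For reference, the off-the-shelf version, if the Guava dependency were acceptable, would be roughly:

import com.google.common.collect.Interner;
import com.google.common.collect.Interners;

// Hedged sketch: Guava's weak interner returns a canonical instance per
// equals()-equal key and holds it only weakly, so unused types stay collectable.
final class GuavaHashConsing<T> {
    private final Interner<T> interner = Interners.newWeakInterner();

    T get(T value) {
        return interner.intern(value);
    }
}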

You depend on object identity rather than equality throughout the runtime? That does make it harder if you cannot safely discard and recreate an equivalent object and have the system work correctly. If you cannot use an off-the-shelf class like Guava's, then a quick fallback might be a two-level cache: a standard size-based cache to capture recency and frequency, which evicts/loads through a victim cache to maintain identity. That might absorb enough allocations of the lookup wrappers that the remainder fall within the tolerance of the young-gen collector.

DavyLandman commented on June 8, 2024

True, it's like interning.

I don't think we want to import Guava for this class.

But your suggestion of the two-level cache sounds good: we use the old one with weak references for the 'older/less used' generation, and use regular keys for the hot map. Thanks for the idea, I think that might be a good trade-off.

DavyLandman commented on June 8, 2024

I've developed a first version of the two-maps approach. I like the fact that for the types used in the recent past we avoid the WeakReference and its lookup, but for new values and old values we will be going through two or three maps.

I'm currently not a fan of how I handled the promotion/demotion logic, as it ends up looping through the collection once every second.

ben-manes commented on June 8, 2024

Typically instead of a periodic poll one would use ReferenceQueue#remove() to block the thread on, e.g. see java.lang.ref.Cleaner. To avoid this thread, Caffeine and Guava check periodically as part of their maintenance routines which are guarded by a tryLock, so like WeakHashMap it is cleaned up in response to activity on the map.
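
A minimal sketch of that ReferenceQueue pattern, with all names illustrative:

import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// The cleaner thread blocks on queue.remove() and evicts entries as their
// referents are collected, instead of periodically scanning the whole map.
final class SelfCleaningWeakSet<T> {
    private final ReferenceQueue<T> queue = new ReferenceQueue<>();
    private final Map<Reference<? extends T>, Boolean> entries = new ConcurrentHashMap<>();

    SelfCleaningWeakSet() {
        Thread cleaner = new Thread(() -> {
            while (true) {
                try {
                    Reference<? extends T> dead = queue.remove(); // blocks until GC clears a referent
                    entries.remove(dead);
                } catch (InterruptedException e) {
                    return; // shut down the cleaner
                }
            }
        }, "weak-set-cleaner");
        cleaner.setDaemon(true);
        cleaner.start();
    }

    void add(T value) {
        entries.put(new WeakReference<>(value, queue), Boolean.TRUE);
    }
}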

I don't think you need to worry about demotion and can use an inclusive second-level cache. If we can assume that the reads from the weak cache are rarer due to the strong cache then the synchronization cost is negligible if simplified to a WeakHashMap. What do you think about something like below?

private final Cache<E, E> strong = Caffeine.newBuilder().maximumSize(10_000).build();
private final Map<E, WeakReference<E>> weak = Collections.synchronizedMap(new WeakHashMap<>());

public E get(E e) {
  return strong.get(e, obj -> {
    // miss in the strong cache: fall back to the weak interner
    var interned = weak.get(obj);
    E value = (interned == null) ? null : interned.get();
    if (value == null) {
      // not interned yet (or already collected): this instance becomes canonical
      weak.put(obj, new WeakReference<>(obj));
      value = obj;
    }
    // whatever is returned here is promoted into the strong cache
    return value;
  });
}

DavyLandman commented on June 8, 2024

> Typically instead of a periodic poll one would use ReferenceQueue#remove() to block the thread on, e.g. see java.lang.ref.Cleaner. To avoid this thread, Caffeine and Guava check periodically as part of their maintenance routines which are guarded by a tryLock, so like WeakHashMap it is cleaned up in response to activity on the map.

True, I didn't want to put the cleanup in the get logic, and also didn't want to create a thread per instance. But I'll reconsider moving it to the get logic.

> I don't think you need to worry about demotion and can use an inclusive second-level cache. If we can assume that the reads from the weak cache are rarer due to the strong cache then the synchronization cost is negligible if simplified to a WeakHashMap. [...]

Interesting idea: so it's always in the weak cache, and gets promoted to the strong cache on a get?

jurgenvinju commented on June 8, 2024

Great discussion! Thanks for this.

Some thoughts:

  • it is 100% likely that most types will be "built" several times during a normal run of a Rascal program, and thus in a weak->strong cache nesting they would all end up in the strong cache pretty quickly;
  • For every version of the Rascal program, as it is compiled/interpreted, the types of the previous version will become dead, but they are already in the strong cache for the previous reason;
  • The number of dead types is expectedly very small; only long runs of an IDE with many different versions of types lead to memory issues.
  • So a clean-up cycle of the cache is not something we'd need to spend time on every minute or every second. Rather, once an hour or so would be good enough IMHO. Consider that every unique type has to be typed in by someone, literally, during this process.
  • In a normal run of a deployed Rascal program, all types are known and declared a priori (at class-load time) and they never die. It is not reasonable to pay a lot of efficiency in this deployed context because of the other, development context.

DavyLandman commented on June 8, 2024

Agreed; we are trying to find a way that has minimal impact on performance, with only a possible increase in memory pressure in the compact scheme that @ben-manes proposed.

> For every version of the Rascal program, as it is compiled/interpreted, the types of the previous version will become dead, but they are already in the strong cache for the previous reason;

The goal is that they should, after a while, be demoted to the weak cache. It's also why I don't want to put a fixed size limit on the cache (although for most cases that is the preferred way to do a cache): we cannot safely estimate a right size limit, and we don't want to constantly be demoting & promoting a set of types, but we also do not want to keep types around longer than needed.

I agree that in a compiled run they are rarely restored after initial class loading. But currently we are still in an interpreter world, and the TypeFactory is hit a lot during execution. So some of the concerns will go away in time, but they will remain active for an interactive REPL or IDE.

jurgenvinju commented on June 8, 2024

> I agree that in a compiled run they are rarely restored after initial class loading. But currently we are still in an interpreter world, and the TypeFactory is hit a lot during execution. So some of the concerns will go away in time, but they will remain active for an interactive REPL or IDE. (@DavyLandman)

That's not what I meant. In a compiled run the types never die. By the way, even now in the compiled code there are still many dynamically created types (for example for instantiating polymorphic functions and data types). However, with a closed world of programs it is highly unlikely that enough of those dynamically created types would die during a program run to motivate a garbage collect. We could probably know all of the types in advance using some kind of abstract interpretation, but that is not worth the effort.

However, between versions of these compiled programs in the IDE, we have dead types as usual. In this way the compiled code will behave as the interpreted code does; especially in a compiled (and dynamically loaded) context against an existing TypeFactory instance, we will always have the same garbage collection issues with compiled code as the current interpreter has.

> I don't think you need to worry about demotion (@ben-manes)

Probably I read this wrong. This does not mean there will not be any demotion from the strong cache to the weak one, but rather that it would not take up too much time to demote older references?

How would demotion work? We expect most types to go into the second level almost immediately, so I am worried about the demotion :-)

DavyLandman commented on June 8, 2024

> > I don't think you need to worry about demotion (@ben-manes)
>
> Probably I read this wrong. This does not mean there will not be any demotion from the strong cache to the weak one, but rather that it would not take up too much time to demote older references?
>
> How would demotion work? We expect most types to go into the second level almost immediately, so I am worried about the demotion :-)

In @ben-manes' approach, demotion works by the strong cache dropping entries once the 10_000-entry limit is reached. In that proposed configuration the oldest entries are dropped (but that can be changed). Since the weak map contains all entries, you can always get the reference back from there (and thus promote them back).

I like the simplicity, but it does mean that you can get a very lock-intensive startup phase where all threads are putting entries into the weak cache at the same time.

Based on these discussions I will update my branch a bit and make it a bit simpler.

DavyLandman commented on June 8, 2024

I've changed the logic a bit (#131), but kept most of it the same. Here are my trade-offs:

  • I've reduced the cost of a cache miss caused by an entry sitting in cold storage: it gets promoted right away, so the next calls will get it back from the hot/strong cache (see the sketch after this list). I've kept the cleanup of these on a separate thread to reduce the cost of that cache miss.
  • The maintenance thread's interval is tied to the demoteAfter period. This means an entry can stay in the hot cache almost twice as long as configured, but the trade-off is that the cleaner runs less often. Alternatively we could run it once every 5 minutes and be happy as well.
  • I've opted not to go for a synchronized version of WeakHashMap, as that would cause a lot of lock contention during cache misses. Especially at start-up we have many cache misses, and in an LSP context they are spread over different threads, so we do not want a global lock there.

The only downside I can see (apart from the 290 lines of code) with the current strategy is that a cache miss on both levels means 2 gets and a put, so 3 hashCode calculations and a bunch of equals calls in case of hash-postfix collisions. So it will be slower at startup, but it gains speed once the hot cache is filled with entries.
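
A rough shape of that lookup path (hedged: the field and method names are assumptions, and the real cold storage in #131 is a custom concurrent structure, not the synchronized WeakHashMap used as a stand-in here):

import java.lang.ref.WeakReference;
import java.util.Collections;
import java.util.Map;
import java.util.WeakHashMap;
import java.util.concurrent.ConcurrentHashMap;

final class TwoLevelHashConsing<T> {
    private final Map<T, T> hotCache = new ConcurrentHashMap<>();   // strong refs
    private final Map<T, WeakReference<T>> coldStorage =
        Collections.synchronizedMap(new WeakHashMap<>());           // weak refs

    T get(T key) {
        T hot = hotCache.get(key);                       // 1st get
        if (hot != null) {
            return hot;
        }
        WeakReference<T> coldRef = coldStorage.get(key); // 2nd get
        T cold = (coldRef == null) ? null : coldRef.get();
        if (cold != null) {
            hotCache.put(cold, cold);                    // promote right away
            return cold;
        }
        hotCache.put(key, key);                          // full miss: 2 gets and a put
        return key;                                      // a separate thread demotes/cleans up later
    }
}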

ben-manes commented on June 8, 2024

The reason I didn't bother with demotion is that the strong cache's reference keeps the entry from being garbage collectable. Therefore we can avoid shuffling it back to the weak cache on eviction, as that work doesn't seem to offer any benefit.

You might consider pulling the custom hashmap code in from Spring (Apache) or Hibernate (LGPL). They are self-contained and could be copied in directly to avoid the dependency. Neither depends on a background thread, and both support weak references without allocating on read.
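
For illustration, interning on top of Spring's ConcurrentReferenceHashMap might look roughly like this (hedged sketch; assumes the class is copied into the project, and WEAK here applies to both keys and values):

import org.springframework.util.ConcurrentReferenceHashMap;
import org.springframework.util.ConcurrentReferenceHashMap.ReferenceType;

// Keys are compared with equals() (not identity) and entries are weakly
// referenced, so this can act as an interner without allocating on read.
final class SpringHashConsing<T> {
    private final ConcurrentReferenceHashMap<T, T> canonical =
        new ConcurrentReferenceHashMap<>(16, ReferenceType.WEAK);

    T get(T fresh) {
        T existing = canonical.putIfAbsent(fresh, fresh);
        return existing != null ? existing : fresh;
    }
}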

DavyLandman commented on June 8, 2024

> The reason I didn't bother with demotion is that the strong cache's reference keeps the entry from being garbage collectable. Therefore we can avoid shuffling it back to the weak cache on eviction, as that work doesn't seem to offer any benefit.

Just to be clear, this is at the cost of more entries in the weak cache. Right?

> You might consider pulling the custom hashmap code in from Spring (Apache) or Hibernate (LGPL). They are self-contained and could be copied in directly to avoid the dependency. Neither depends on a background thread, and both support weak references without allocating on read.

True, I've thought about that, but I'm a bit hesitant, as we ideally want a concurrent version to avoid locks. I'll look into it a bit more, since I really would like to get rid of the allocation during the get.

ben-manes commented on June 8, 2024

> Just to be clear, this is at the cost of more entries in the weak cache. Right?

Yes, but since you are already keeping those entries in memory, the cost is minimal (the allocation to store them into a hashmap).

> I'm a bit hesitant, as we ideally want a concurrent version to avoid locks

Those are concurrent, using a segmented hash map inspired by Java 5's ConcurrentHashMap. The main concern is probably the amount of code to copy over, but they are at least well proven by now. Otherwise you would need a dependency (e.g. on Guava), which is generally okay, but I agree that is not ideal for some projects.

DavyLandman commented on June 8, 2024

Thanks to this discussion, I've simplified the logic in the pull request (#131). It's ready for review now.
