
Comments (12)

non commented on September 25, 2024

Hey, thanks for reporting this issue!

I will look at your use case. But one question: If you use an Iterator[String] instead of a Stream[String] does your test start working?

Scala's Stream is famous for memoizing its results. Which means it's really easy for the stream's items to never be released. I'm not saying that your test has this problem, but I was just curious if it's something you've looked at.
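To make the memoization point concrete, here is a minimal sketch (not taken from the test in this issue). It counts how often elements are computed when a `Stream` held in a `val` is traversed twice. The names `StreamMemoDemo` and `compute` are made up for illustration, and on Scala 2.13+ `Stream` is deprecated in favour of `LazyList`.

```scala
object StreamMemoDemo {
  // Returns how many times `compute` ran after traversing the same prefix twice.
  def evaluationsAfterTwoPasses(): Int = {
    var evaluations = 0

    // Hypothetical element producer; it only records that it was called.
    def compute(i: Int): String = { evaluations += 1; s"item-$i" }

    // The `val` pins the head, so every forced cell stays reachable from it.
    val stream: Stream[String] = Stream.from(1).map(compute)

    stream.take(3).foreach(_ => ()) // forces the first three elements
    stream.take(3).foreach(_ => ()) // second pass reuses the memoized cells

    evaluations
  }
}
```

The second pass does not call `compute` again, because the forced cells are cached. That caching is also why a long-lived reference to the head keeps the whole traversed prefix on the heap.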

from jawn.

non commented on September 25, 2024

Looking at your test code, it doesn't seem like it would hang onto a reference to the stream. I'll investigate. Thanks again for reporting this.


non commented on September 25, 2024

I just added a PR for you: https://github.com/agourlay/jawn-heap/pull/1


non commented on September 25, 2024

On my machine, I don't get an error.

I'm using OSX (Mavericks not Yosemite). Java version is:

java version "1.7.0_04"
Java(TM) SE Runtime Environment (build 1.7.0_04-b21)
Java HotSpot(TM) 64-Bit Server VM (build 23.0-b21, mixed mode)

Using htop I see memory usage rising steadily and then hitting a ceiling (for me, around 14G or so), at which point a Full GC drops memory usage to about 200M. After that, memory stays low, but I seem to hit a (relatively fast) Full GC every 55k object parses or so. (I saw around 2,643,000 objects being parsed.)

Jawn is definitely expected to be able to handle an infinite stream, so I do consider the behavior you're seeing a bug.


non commented on September 25, 2024

Can you tell me a bit more about the platform you're running on, the behavior you're seeing, and what the GC log tells you?


non commented on September 25, 2024

Oh wait, I think I am seeing your behavior. The time between Full GCs is getting shorter. Hmm.

Still interested in your own observations, but I definitely have something to look into.


agourlay commented on September 25, 2024

Wow thank you for the fast feedback!
I will try your PR right away.

I am running Ubuntu 14.10 with:

java version "1.7.0_65"
OpenJDK Runtime Environment (IcedTea 2.5.3) (7u71-2.5.3-0ubuntu1)
OpenJDK 64-Bit Server VM (build 24.65-b04, mixed mode)

I would expect this test to run easily within 2 GB of heap, but every time I get a java.lang.OutOfMemoryError: GC overhead limit exceeded.

If I add the option -XX:-UseGCOverheadLimit it runs forever as the program spends most of its time doing GC.

The test case I gave you is extracted from my current work on https://github.com/agourlay/json-2-csv-stream where I found this problem.

I will now try your PR on my machine and come back with more GC statistics.
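For anyone wanting to collect similar GC statistics from inside the test itself, one possible approach (an assumption, not necessarily what either PR does) is to read the JVM's garbage-collector MXBeans. The object name `GcStats` is hypothetical. `scala.collection.JavaConverters` is used because it exists on the Scala versions of that era; on 2.13+ it still works but is deprecated in favour of `scala.jdk.CollectionConverters`.

```scala
import java.lang.management.ManagementFactory
import scala.collection.JavaConverters._

object GcStats {
  // One entry per collector: (name, collection count, total collection time in ms).
  // The count and time may be -1 if the JVM does not report them.
  def snapshot(): List[(String, Long, Long)] =
    ManagementFactory.getGarbageCollectorMXBeans.asScala.toList.map { gc =>
      (gc.getName, gc.getCollectionCount, gc.getCollectionTime)
    }

  def main(args: Array[String]): Unit =
    snapshot().foreach { case (name, count, millis) =>
      println(s"$name: $count collections, $millis ms total")
    }
}
```

Calling `snapshot()` every N parsed objects and comparing successive results would show whether the time between full collections is shrinking.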


non commented on September 25, 2024

I'm about to submit another PR. I wrote a test involving Iterator[String] instead, and I'm not seeing the same behavior. I think this might have to do with Stream after all (even though you are not holding onto the head).
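As a rough illustration of why the `Iterator[String]` variant behaves differently, here is a small sketch that consumes chunks through an iterator. It is not the test from the PR: `handle` is a hypothetical stand-in for the per-chunk work and is not part of Jawn's API.

```scala
object IteratorDemo {
  // Hypothetical stand-in for parsing a chunk; it just measures its length.
  def handle(chunk: String): Int = chunk.length

  // An Iterator hands out each element once and keeps no reference to it,
  // so a chunk is eligible for collection as soon as it has been handled.
  def chunks(n: Int): Iterator[String] =
    Iterator.range(0, n).map(i => s"""{"id": $i}""")

  // Folds over the chunks without ever materializing the whole sequence.
  def totalLength(n: Int): Long =
    chunks(n).foldLeft(0L)((acc, chunk) => acc + handle(chunk))
}
```

Nothing is cached between elements here, so heap usage should stay roughly flat regardless of `n`.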


non commented on September 25, 2024

OK, here's a summary of what I found: https://github.com/agourlay/jawn-heap/pull/2

Give it a shot and see what you think. I commented out the GC statistics because the newer test is fast enough that they are hard to read.


agourlay commented on September 25, 2024

I have merged and tried your second PR, and it now runs in nearly constant space for me too!

To be honest, I am a bit disappointed by Scala's Stream memoization but at least I learned something new today 👍

Thank you very much for digging into this issue, even though it was not specific to Jawn.


non commented on September 25, 2024

No problem. There's definitely a need for a non-memoizing stream for Scala.


agourlay commented on September 25, 2024

Before changing all my code to Iterator, I will try to get rid of the memoization problem by following this blog post series http://blog.dmitryleskov.com/programming/scala/stream-hygiene-i-avoiding-memory-leaks/ and maybe try EphemeralStream from Scalaz.
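A minimal sketch of the kind of "hygiene" this usually involves, as I understand it: build the stream with a `def` rather than a `val`, and consume it in a tail-recursive loop so that no live reference to the head remains. The names `StreamHygiene`, `numbers` and `sumFirst` are made up for illustration, and whether earlier cells are actually collected depends on the JVM and on how the caller holds the stream.

```scala
import scala.annotation.tailrec

object StreamHygiene {
  // `def` builds a fresh stream on every call, so no field or local `val`
  // keeps the head alive while the stream is traversed.
  def numbers: Stream[Int] = Stream.from(1)

  // The stream is only a parameter of a tail-recursive loop; each iteration
  // rebinds `s` to its tail, so this method holds no reference to earlier cells.
  @tailrec
  def sumFirst(s: Stream[Int], n: Int, acc: Long = 0L): Long =
    if (n == 0 || s.isEmpty) acc
    else sumFirst(s.tail, n - 1, acc + s.head)
}
```

Used as `StreamHygiene.sumFirst(StreamHygiene.numbers, 100)`, the stream is never bound to a name outside the loop.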

Thank you again, I can close this issue now!

