danthony06 commented on September 17, 2024

There should be logs in /usr/local/tomcat/logs, but I'm not sure if they will have information useful for this problem. Was it previously running fast, or is this a new installation?

from bag-database.

ptulpen commented on September 17, 2024

Hello,
Before, it was not this slow.
The only interesting parts of the logs are snippets like:

19-Jan-2022 20:29:41.728 INFO [MessageBroker-4] org.springframework.web.socket.config.WebSocketMessageBrokerStats.lambda$initLoggingTask$0 WebSocketSession[0 current WS(0)-HttpStream(0)-HttpPoll(0), 5 total, 0 closed abnormally (0 connect failure, 0 send limit, 0 transport error)], stompSubProtocol[processed CONNECT(0)-CONNECTED(0)-DISCONNECT(0)], stompBrokerRelay[null], inboundChannel[pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 15], outboundChannel[pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 5], sockJsScheduler[pool size = 8, active threads = 1, queued tasks = 4, completed tasks = 275440]

ptulpen commented on September 17, 2024

One thought that came to my mind:
Could adding indexes or tuning similar options in PostgreSQL help improve performance?

pjreed commented on September 17, 2024

The database should already have quite a few indexes on relevant columns; you could check by connecting to it with a tool like DBeaver. If your tables were missing indexes, though, I would expect that to drastically slow down searching, not scanning new files. In fact, lacking indexes would make inserting new records faster, since the database can simply append them to the table without updating the indexes.

Scanning should mostly be limited by disk read speed, since it has to read in the entire file, and to a lesser degree by CPU speed, since it has to generate a hash to identify the bag file. This could be an issue if you're reading very large bags over a slow network connection, or potentially if you're reading large bag files from slow HDDs, especially if the bag files themselves are unindexed and there's a lot of disk thrashing going on.
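As a rough illustration of why scanning time scales with file size, here is a minimal Java sketch that streams a file through a `MessageDigest`, so every byte has to be read from storage before the hash is known. MD5 is an assumption chosen for the example; the bag-database may use a different hash, and the `StreamingHash` class is hypothetical. Running it against one bag on the same storage gives a baseline read-plus-hash throughput to compare with the scan time.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Sketch: hashing requires streaming the whole file through the digest,
// so elapsed time is bounded below by how fast storage delivers the bytes.
// MD5 is an assumption here; the real scanner's hash may differ.
class StreamingHash {
    static String hashFile(Path path, String algorithm)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance(algorithm);
        byte[] buffer = new byte[1 << 20]; // 1 MiB reads keep per-call overhead low
        try (InputStream in = Files.newInputStream(path)) {
            int n;
            while ((n = in.read(buffer)) != -1) {
                digest.update(buffer, 0, n);
            }
        }
        // Render the digest as lowercase hex.
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws Exception {
        Path path = Paths.get(args[0]); // path to a bag file on the scanned storage
        long start = System.nanoTime();
        String hash = hashFile(path, "MD5");
        double seconds = (System.nanoTime() - start) / 1e9;
        double mbPerSec = Files.size(path) / 1e6 / seconds;
        System.out.printf("%s  %.1f s  %.1f MB/s%n", hash, seconds, mbPerSec);
    }
}
```

If this alone is slow, the bottleneck is likely the storage or network path rather than the database.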

danthony06 commented on September 17, 2024

ptulpen commented on September 17, 2024

Hello,
@pjreed: the indexes look fine, and what you say about indexes sounds reasonable, so it probably has nothing to do with that.
@danthony06: yes, they are mostly split at 4.1 GB.
Is there a limit, or something to optimize?

danthony06 commented on September 17, 2024

@ptulpen There's no limit to my knowledge. I mostly wanted to make sure you weren't uploading 100 GB bag files that might be causing network issues.

pjreed commented on September 17, 2024

How much free RAM does your server have? Is it possible that it's hitting swap space while trying to read the bags?
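On a Linux host, one quick way to answer this is to watch `MemAvailable` and the swap counters in `/proc/meminfo` while a scan is running. The sketch below parses that file; the `MemInfo` class and its methods are hypothetical names for illustration. Inside a Docker container, `/proc/meminfo` typically reports the host's totals rather than the container's limit, so it is best run on the host.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

// Sketch for Linux hosts: parse /proc/meminfo to see how much memory is
// available and whether swap is in use while a scan runs.
class MemInfo {
    // Parses lines like "MemAvailable:   1234567 kB" into a map of kB values.
    static Map<String, Long> parse(String text) {
        Map<String, Long> values = new HashMap<>();
        for (String line : text.split("\n")) {
            int colon = line.indexOf(':');
            if (colon < 0) continue;
            String key = line.substring(0, colon).trim();
            String[] parts = line.substring(colon + 1).trim().split("\\s+");
            try {
                values.put(key, Long.parseLong(parts[0]));
            } catch (NumberFormatException e) {
                // ignore lines whose value is not a plain number
            }
        }
        return values;
    }

    // Swap in use = SwapTotal - SwapFree (both in kB).
    static long swapUsedKb(Map<String, Long> info) {
        return info.getOrDefault("SwapTotal", 0L) - info.getOrDefault("SwapFree", 0L);
    }

    public static void main(String[] args) throws IOException {
        String text = new String(Files.readAllBytes(Paths.get("/proc/meminfo")));
        Map<String, Long> info = parse(text);
        System.out.printf("MemAvailable: %d MiB, swap used: %d MiB%n",
                info.getOrDefault("MemAvailable", 0L) / 1024, swapUsedKb(info) / 1024);
    }
}
```

A swap-used figure that keeps growing during a scan would point toward memory pressure.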

ptulpen commented on September 17, 2024

I have 32 GB RAM (and 8 CPUs), and it is not fully used.
I also increased the Java memory limit with -e CATALINA_OPTS=" -Xmx10g"

ptulpen commented on September 17, 2024

I also see in the scanning process some interesting errors like these:

2022-09-06 14:53:24.289 [pool-2-thread-1] ERROR c.g.s.b.s.f.FilesystemBagStorageImpl - Unexpected error updating bag file:
java.lang.IllegalArgumentException: Chunk [**description**] is not a valid entry
        at com.google.common.base.Preconditions.checkArgument(Preconditions.java:219)
        at com.google.common.base.Splitter$MapSplitter.split(Splitter.java:526)
        at com.github.swrirobotics.bags.BagService.lambda$getMetadata$6(BagService.java:1307)
        at com.github.swrirobotics.bags.reader.BagFile.forMessagesOnTopic(BagFile.java:395)
        at com.github.swrirobotics.bags.BagService.getMetadata(BagService.java:1303)
        at com.github.swrirobotics.bags.BagService.extractTagsFromBagFile(BagService.java:1352)
        at com.github.swrirobotics.bags.BagService.addTagsToBag(BagService.java:1564)
        at com.github.swrirobotics.bags.BagService.insertNewBag(BagService.java:1499)
        at com.github.swrirobotics.bags.BagService.updateBagInDatabase(BagService.java:1761)
        at com.github.swrirobotics.bags.BagService.updateBagFile(BagService.java:1689)
        at com.github.swrirobotics.bags.storage.filesystem.FilesystemBagStorageImpl.lambda$updateBags$3(FilesystemBagStorageImpl.java:159)
        at java.base/java.lang.Iterable.forEach(Iterable.java:75)
        at com.github.swrirobotics.bags.storage.filesystem.FilesystemBagStorageImpl.updateBags(FilesystemBagStorageImpl.java:146)
        at com.github.swrirobotics.bags.storage.filesystem.FilesystemBagStorageImpl$$FastClassBySpringCGLIB$$17031e11.invoke(<generated>)
        at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:218)
        at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.invokeJoinpoint(CglibAopProxy.java:783)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:163)
        at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.proceed(CglibAopProxy.java:753)
        at org.springframework.transaction.interceptor.TransactionInterceptor$1.proceedWithInvocation(TransactionInterceptor.java:123)
        at org.springframework.transaction.interceptor.TransactionAspectSupport.invokeWithinTransaction(TransactionAspectSupport.java:388)
        at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:119)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186)
        at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.proceed(CglibAopProxy.java:753)
        at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:698)
        at com.github.swrirobotics.bags.storage.filesystem.FilesystemBagStorageImpl$$EnhancerBySpringCGLIB$$953167bd.updateBags(<generated>)
        at com.github.swrirobotics.bags.storage.BagScanner.lambda$scanStorage$0(BagScanner.java:371)
        at org.springframework.transaction.support.TransactionTemplate.execute(TransactionTemplate.java:140)
        at com.github.swrirobotics.bags.storage.BagScanner.lambda$scanStorage$2(BagScanner.java:374)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:829)

"description" is part of the folder structure, but it is not even a complete folder name.
The folder names follow the pattern YYYY-MM-DD_description.
So are there any "forbidden" characters or patterns?

pjreed commented on September 17, 2024

That's interesting, that definitely isn't a normal error...

That exception looks like it's being thrown from code that is trying to parse tags in the bag file. If you've configured metadata topics, then it expects every message on those topics to have a string field named data, and each one of those should be a newline-separated set of key:value pairs; for example:

name: John Doe
email: [email protected]
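To make the expected format concrete, here is a stdlib-only Java sketch of a strict parser that fails the same way as the stack trace above: every non-blank line must contain a `key:value` separator, otherwise it throws `IllegalArgumentException` with a `Chunk [...] is not a valid entry` message. The `StrictMetadataParser` class is hypothetical and is not the bag-database's code (which, per the trace, goes through Guava's `Splitter.MapSplitter`); it simply splits each line at the first colon, so it may be more permissive than the real parser.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical strict parser mimicking the failure mode in the stack trace:
// a metadata line without a ':' separator aborts parsing.
class StrictMetadataParser {
    static Map<String, String> parse(String data) {
        Map<String, String> tags = new LinkedHashMap<>();
        for (String line : data.split("\n")) {
            if (line.trim().isEmpty()) continue; // skip blank lines
            int colon = line.indexOf(':');
            if (colon < 0) {
                // A line with no key:value separator cannot become a tag.
                throw new IllegalArgumentException("Chunk [" + line + "] is not a valid entry");
            }
            tags.put(line.substring(0, colon).trim(), line.substring(colon + 1).trim());
        }
        return tags;
    }

    public static void main(String[] args) {
        // prints {name=John Doe, description=mydescription}
        System.out.println(parse("name: John Doe\ndescription: mydescription"));
        try {
            parse("mydescription"); // a bare value, like the bag data shown later in this thread
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // prints Chunk [mydescription] is not a valid entry
        }
    }
}
```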

I suspect you've got a bag file with metadata that is formatted in a way it doesn't expect; do you have an example of anything in your files that might be formatted differently from that?

pjreed commented on September 17, 2024

I've submitted a PR at #196 that will make it handle invalid metadata more gracefully when scanning bag files. I don't know if that will fix the speed issue you're having, but it may fix some other issues people have seen with it failing to recognize certain bag files...
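For illustration only, here is one way "more graceful" handling could look: malformed lines are recorded and skipped, so the remaining tags are still stored instead of the whole bag failing. This is a hypothetical sketch (the `LenientMetadataParser` name and the `skipped` list are made up), not the change in #196.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical lenient parser: malformed lines are recorded and skipped
// instead of aborting the scan of the whole bag file.
class LenientMetadataParser {
    static Map<String, String> parse(String data, List<String> skipped) {
        Map<String, String> tags = new LinkedHashMap<>();
        for (String line : data.split("\n")) {
            if (line.trim().isEmpty()) continue;
            int colon = line.indexOf(':');
            if (colon <= 0) {      // no separator, or nothing before it
                skipped.add(line); // keep a record so it can be logged
                continue;
            }
            tags.put(line.substring(0, colon).trim(), line.substring(colon + 1).trim());
        }
        return tags;
    }

    public static void main(String[] args) {
        List<String> skipped = new ArrayList<>();
        Map<String, String> tags = parse("mydescription\nname: John Doe", skipped);
        System.out.println(tags);    // {name=John Doe}
        System.out.println(skipped); // [mydescription]
    }
}
```

Logging the skipped lines keeps the problem visible without blocking the rest of the scan.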

ptulpen commented on September 17, 2024

Hello,
Yes, you are completely right.
rostopic gave me:
%time,field.data
1234567894591808795,mydescription

Now I rebuilt it to:
%time,field.data
1234567894591808795,description: mydescription

I tested it with a small subset, and it looks much faster.
Now I am rewriting the upload scripts and a "repair" script.
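A repair step along these lines could be sketched as follows, assuming the metadata has been exported with `rostopic echo -p` into the `%time,field.data` CSV shown above and that the values contain no commas or newlines. The `MetadataRepair.repair` helper is hypothetical; the key `description` matches the example above, and writing the corrected messages back into a bag is not shown.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical repair step on `rostopic echo -p` style CSV output:
// rows whose field.data has no "key:" part get "<key>: " prepended.
class MetadataRepair {
    static List<String> repair(List<String> csvLines, String key) {
        List<String> out = new ArrayList<>();
        for (String line : csvLines) {
            int comma = line.indexOf(',');
            // Keep the header ("%time,...") and lines without a data column unchanged.
            if (line.startsWith("%") || comma < 0) {
                out.add(line);
                continue;
            }
            String time = line.substring(0, comma);
            String data = line.substring(comma + 1);
            if (!data.contains(":")) {
                data = key + ": " + data; // turn a bare value into a key:value pair
            }
            out.add(time + "," + data);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> in = Arrays.asList("%time,field.data", "1234567894591808795,mydescription");
        // prints:
        // %time,field.data
        // 1234567894591808795,description: mydescription
        repair(in, "description").forEach(System.out::println);
    }
}
```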

The more graceful metadata scanning also sounds good; issues like that could happen in other scenarios as well.
I tried to test it, but I could not build it. Can you also provide the branch as a container? (This is how I run the current system.)

pjreed commented on September 17, 2024

Sure, I've pushed an image containing my build to ghcr.io/hatchbed/bag-database:v3.5.1-SNAPSHOT. Give that a try and see if it works for you.

danthony06 commented on September 17, 2024

v3.5.1 has been released with this fix.

ptulpen commented on September 17, 2024

The patch regarding the metadata is great.
But a larger set of files showed that it still takes a long time (875 files took a week).
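For scale, assuming "a week" means 7 days and the bags are around the 4.1 GB mentioned earlier in the thread, the effective rate works out to roughly:

```latex
t_{\text{file}} = \frac{7 \times 24 \times 3600\ \text{s}}{875} = \frac{604\,800\ \text{s}}{875} \approx 691\ \text{s} \approx 11.5\ \text{min}

\text{throughput} \approx \frac{4.1 \times 10^{9}\ \text{B}}{691\ \text{s}} \approx 5.9 \times 10^{6}\ \text{B/s} \approx 5.9\ \text{MB/s}
```

That is well below the sequential read speed of typical local disks, so if the bags are on local storage, reading and hashing alone may not account for the whole scan time; over a slow network share, it could.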
What also looks interesting is that when I add new files and start a scan, many files get scanned, and only when the scan is finished do they all appear at once on the website.
Is this intended behaviour?

EDIT: the files also only appear in the database when everything is done (at least according to grepping through a pg_dump).
Maybe this is also connected with #195.
My blind guess would be that there is some kind of locking and caching.
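One possible explanation, offered only as a hypothesis: in the stack trace posted earlier, `FilesystemBagStorageImpl.updateBags` (which iterates over the bags with `forEach`) is called from inside `TransactionTemplate.execute` in `BagScanner`. If a whole batch of bags is written inside one transaction, other connections (the web UI, `pg_dump`) would only see the new rows once that transaction commits. The toy model below uses no real database; `ToyStore` and its methods are hypothetical and only contrast one commit per batch with one commit per file.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model of transaction visibility (no real database involved):
// rows written inside an open transaction are invisible to other readers
// until commit() publishes them.
class ToyStore {
    private final List<String> committed = new ArrayList<>();
    private final List<String> pending = new ArrayList<>();

    void insert(String row) { pending.add(row); }
    void commit() { committed.addAll(pending); pending.clear(); }
    int visibleToOthers() { return committed.size(); }

    // One transaction around the whole batch: nothing is visible until the end.
    static int[] scanSingleTransaction(List<String> bags) {
        ToyStore store = new ToyStore();
        int[] visibleAfterEach = new int[bags.size()];
        for (int i = 0; i < bags.size(); i++) {
            store.insert(bags.get(i));
            visibleAfterEach[i] = store.visibleToOthers();
        }
        store.commit();
        return visibleAfterEach;
    }

    // One transaction per bag: each file becomes visible as soon as it is scanned.
    static int[] scanPerFile(List<String> bags) {
        ToyStore store = new ToyStore();
        int[] visibleAfterEach = new int[bags.size()];
        for (int i = 0; i < bags.size(); i++) {
            store.insert(bags.get(i));
            store.commit();
            visibleAfterEach[i] = store.visibleToOthers();
        }
        return visibleAfterEach;
    }

    public static void main(String[] args) {
        List<String> bags = Arrays.asList("a.bag", "b.bag", "c.bag");
        System.out.println(Arrays.toString(scanSingleTransaction(bags))); // [0, 0, 0]
        System.out.println(Arrays.toString(scanPerFile(bags)));           // [1, 2, 3]
    }
}
```

Per-file commits would make progress visible sooner and lose less work if a scan fails partway, at the cost of more commits.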

ptulpen commented on September 17, 2024

Another thought regarding this: we saw that we have many images and quite big videos inside the bags.
Can we maybe skip the analysis/extraction of those during the scan and focus on the text-based data?
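A filter of this kind could be sketched as a predicate over each topic's message type. This is hypothetical and not an existing bag-database option; the skip list contains the standard ROS image types `sensor_msgs/Image` and `sensor_msgs/CompressedImage`, and video streams recorded with other message types would need to be added to it.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical topic filter: keep only topics whose message type is not
// in a skip list of heavy image/video types.
class TopicFilter {
    static final Set<String> HEAVY_TYPES = new HashSet<>(Arrays.asList(
            "sensor_msgs/Image",
            "sensor_msgs/CompressedImage"));

    static List<String> topicsToProcess(Map<String, String> topicTypes) {
        return topicTypes.entrySet().stream()
                .filter(e -> !HEAVY_TYPES.contains(e.getValue()))
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, String> topics = new LinkedHashMap<>();
        topics.put("/camera/image_raw", "sensor_msgs/Image");
        topics.put("/camera/image_raw/compressed", "sensor_msgs/CompressedImage");
        topics.put("/metadata", "std_msgs/String");
        System.out.println(topicsToProcess(topics)); // [/metadata]
    }
}
```

Per the earlier comment, scanning reads the entire file to compute its hash, so a filter like this could at most save per-message processing, not the read itself.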
