Comments (18)
There should be logs in /usr/local/tomcat/logs, but I'm not sure if they will have information useful for this problem. Was it previously running fast, or is this a new installation?
from bag-database.
Hello,
it was not this slow before.
The only interesting parts in the logs are snippets like:
19-Jan-2022 20:29:41.728 INFO [MessageBroker-4] org.springframework.web.socket.config.WebSocketMessageBrokerStats.lambda$initLoggingTask$0 WebSocketSession[0 current WS(0)-HttpStream(0)-HttpPoll(0), 5 total, 0 closed abnormally (0 connect failure, 0 send limit, 0 transport error)], stompSubProtocol[processed CONNECT(0)-CONNECTED(0)-DISCONNECT(0)], stompBrokerRelay[null], inboundChannel[pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 15], outboundChannel[pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 5], sockJsScheduler[pool size = 8, active threads = 1, queued tasks = 4, completed tasks = 275440]
One thought that came to my mind:
Could setting indexes or similar options in PostgreSQL help to improve the performance?
The database should already have quite a few indexes on relevant columns; you could check using a tool like DBeaver to connect to the database, although if your tables are missing indexes, I would expect that to drastically slow down searching, not scanning new files. In fact, lacking indexes would actually make inserting new records faster since it can simply append them to the table without updating the indexes.
Scanning should mostly be limited by disk read speed, since it has to read in the entire file, and to a lesser degree by CPU speed, since it has to generate a hash to identify the bag file. This could be an issue if you're reading very large bags over a slow network connection, or potentially if you're reading large bag files from slow HDDs, especially if the bag files themselves are unindexed and there's a lot of disk thrashing going on.
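To illustrate why scanning is disk-bound rather than database-bound, here is a minimal sketch of chunked file hashing (this is not the Bag Database's actual code, and the choice of MD5 is an assumption for the example): for a multi-gigabyte bag, nearly all of the runtime is spent in the `read()` calls.

```python
import hashlib

def hash_file(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks; for multi-gigabyte bag files the
    runtime is dominated by disk (or network filesystem) read speed."""
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            md5.update(chunk)
    return md5.hexdigest()
```

On a slow NFS mount or a thrashing HDD, this step alone can take minutes per file, which is consistent with scans of large bag collections taking a long time.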
Hello,
@pjreed: the indexes look fine, and what you say about them sounds reasonable, so the problem is probably not there.
@danthony06: yes, they are mostly cut at 4.1 GB.
Is there a size limit, or anything else to optimize?
@ptulpen There's no limit to my knowledge. I mostly wanted to make sure you weren't uploading 100 GB bag files that might be causing network issues.
How much free RAM does your server have? Is it possible that it's hitting swap space while trying to read the bags?
I have 32 GB of RAM (and 8 CPUs), and it is not fully used.
I also increased the Java memory limit with -e CATALINA_OPTS=" -Xmx10g".
I also see some interesting errors like these during the scanning process:
2022-09-06 14:53:24.289 [pool-2-thread-1] ERROR c.g.s.b.s.f.FilesystemBagStorageImpl - Unexpected error updating bag file:
java.lang.IllegalArgumentException: Chunk [**description**] is not a valid entry
at com.google.common.base.Preconditions.checkArgument(Preconditions.java:219)
at com.google.common.base.Splitter$MapSplitter.split(Splitter.java:526)
at com.github.swrirobotics.bags.BagService.lambda$getMetadata$6(BagService.java:1307)
at com.github.swrirobotics.bags.reader.BagFile.forMessagesOnTopic(BagFile.java:395)
at com.github.swrirobotics.bags.BagService.getMetadata(BagService.java:1303)
at com.github.swrirobotics.bags.BagService.extractTagsFromBagFile(BagService.java:1352)
at com.github.swrirobotics.bags.BagService.addTagsToBag(BagService.java:1564)
at com.github.swrirobotics.bags.BagService.insertNewBag(BagService.java:1499)
at com.github.swrirobotics.bags.BagService.updateBagInDatabase(BagService.java:1761)
at com.github.swrirobotics.bags.BagService.updateBagFile(BagService.java:1689)
at com.github.swrirobotics.bags.storage.filesystem.FilesystemBagStorageImpl.lambda$updateBags$3(FilesystemBagStorageImpl.java:159)
at java.base/java.lang.Iterable.forEach(Iterable.java:75)
at com.github.swrirobotics.bags.storage.filesystem.FilesystemBagStorageImpl.updateBags(FilesystemBagStorageImpl.java:146)
at com.github.swrirobotics.bags.storage.filesystem.FilesystemBagStorageImpl$$FastClassBySpringCGLIB$$17031e11.invoke(<generated>)
at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:218)
at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.invokeJoinpoint(CglibAopProxy.java:783)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:163)
at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.proceed(CglibAopProxy.java:753)
at org.springframework.transaction.interceptor.TransactionInterceptor$1.proceedWithInvocation(TransactionInterceptor.java:123)
at org.springframework.transaction.interceptor.TransactionAspectSupport.invokeWithinTransaction(TransactionAspectSupport.java:388)
at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:119)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186)
at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.proceed(CglibAopProxy.java:753)
at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:698)
at com.github.swrirobotics.bags.storage.filesystem.FilesystemBagStorageImpl$$EnhancerBySpringCGLIB$$953167bd.updateBags(<generated>)
at com.github.swrirobotics.bags.storage.BagScanner.lambda$scanStorage$0(BagScanner.java:371)
at org.springframework.transaction.support.TransactionTemplate.execute(TransactionTemplate.java:140)
at com.github.swrirobotics.bags.storage.BagScanner.lambda$scanStorage$2(BagScanner.java:374)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
The description is part of the folder structure, but it is not even a complete folder name.
The folder names follow the pattern YYYY-MM-DD_description.
So are there any "forbidden" characters or patterns?
That's interesting; that definitely isn't a normal error...
That exception looks like it's being thrown from code that is trying to parse tags in the bag file. If you've configured metadata topics, then it expects every message on those topics to have a string field named data, and each one of those should be a newline-separated set of key:value pairs; for example:
name: John Doe
email: [email protected]
I suspect you've got a bag file with metadata that is formatted in a way it doesn't expect; do you have an example of anything in your files that might be formatted differently from that?
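The expected format can be sketched with a hypothetical parser that mirrors the strict behaviour of Guava's Splitter.MapSplitter used in BagService, which rejects any entry without a key/value separator; that rejection is exactly the `Chunk [...] is not a valid entry` error in the stack trace above (this is an illustration, not the project's actual parsing code):

```python
def parse_metadata(data: str) -> dict:
    """Parse newline-separated 'key: value' pairs, raising on malformed
    entries, similar to a strict map splitter."""
    tags = {}
    for line in data.splitlines():
        if not line.strip():
            continue
        if ":" not in line:
            # The failure mode from the log: a chunk with no key/value
            # separator is not a valid entry.
            raise ValueError(f"Chunk [{line}] is not a valid entry")
        key, value = line.split(":", 1)
        tags[key.strip()] = value.strip()
    return tags
```

A bare string like `mydescription` has no `:` separator, so it would raise exactly this kind of error, while `description: mydescription` parses cleanly.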
I've submitted a PR at #196 that will make it handle invalid metadata more gracefully when scanning bag files. I don't know if that will fix the speed issue you're having, but it may fix some other issues people have seen with it failing to recognize certain bag files...
Hello,
yes, you are completely right.
rostopic gave me:
%time,field.data
1234567894591808795,mydescription
now I rewrite it to:
%time,field.data
1234567894591808795,description: mydescription
I tested it with a small subset, and it looks much faster.
Now I am rewriting the upload scripts and writing a "repair" script.
The more graceful metadata scanning also sounds good; issues like that could happen in other scenarios as well.
I tried to test it as well, but I could not build it. Can you also provide the branch as a container? (This is how I run the current system.)
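The core transformation such a repair script would apply can be sketched as follows (a hypothetical helper; the `description` key comes from the folder-name pattern discussed above, and how the rewritten string gets re-recorded into a bag is left out):

```python
def repair_data(data: str, key: str = "description") -> str:
    """Turn a bare metadata string into a 'key: value' pair so it parses
    as valid metadata; strings that already contain ':' are left as-is."""
    if ":" in data:
        return data
    return f"{key}: {data}"
```

For example, `repair_data("mydescription")` yields `description: mydescription`, matching the rewritten topic data shown above.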
Sure, I've pushed an image containing my build to ghcr.io/hatchbed/bag-database:v3.5.1-SNAPSHOT. Give that a try and see if it works for you.
v3.5.1 has been released with this fix.
The patch regarding the metadata is great.
But a larger set of files showed that it still takes a long time (875 files took a week).
What also looks interesting: when I add new files and start a scan, a lot of files get scanned, but they all appear on the website at once only when the scan is finished.
Is this an intended behaviour?
EDIT: the data also only appears in the database when everything is done (at least according to grepping through a pg_dump).
Maybe this is also connected with: #195
My blind guess would be that there is some kind of locking or caching involved.
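The all-at-once appearance is at least consistent with the scan running inside a single database transaction (the stack trace earlier shows TransactionTemplate.execute wrapping updateBags): rows written inside an uncommitted transaction are invisible to other sessions until commit. A minimal, unrelated sqlite3 sketch of the effect (an in-memory database is used for the illustration, so "other connections" is hypothetical here):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bags (name TEXT)")

# Python's sqlite3 opens an implicit transaction before the first INSERT;
# nothing becomes visible to other connections until commit() runs, so a
# reader polling the database would see all rows appear at once.
for i in range(875):
    conn.execute("INSERT INTO bags VALUES (?)", (f"bag_{i}.bag",))
conn.commit()

print(conn.execute("SELECT COUNT(*) FROM bags").fetchone()[0])  # prints 875
```

Whether the Bag Database actually commits per scan rather than per file would need to be confirmed in its source; this only shows why a single long transaction would produce exactly the observed behaviour.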
Another thought regarding this: we saw that we have many images and quite big videos inside the bags.
Could we maybe skip the analysis/extraction of those during the scan and focus on the text-based data?