Comments (5)
What you're asking for is already happening: Node handles chunking automatically. You can implement a Writable stream any way you like. Remember, the endpoint is not always a file--it could be another service that accepts streams, for instance. A simple example: you could use the passthrough method to pipe a file-upload stream through clamscan, then into zlib to compress it, and then on to Amazon S3.
For example:
const { createGzip } = require('node:zlib');
const NodeClam = require('clamscan');
const AWS = require('aws-sdk');

AWS.config.region = '<your region here>';

const s3Config = {
    params: {
        Bucket: '<your bucket name here>',
    },
};
const s3 = new AWS.S3(s3Config);
const s3Stream = require('s3-upload-stream')(s3);

const gzip = createGzip();
const input = getSomeUploadStream(); // oversimplification

const clamscan = await new NodeClam().init({
    debugMode: true,
    clamdscan: {
        host: 'localhost',
        port: 3310,
        bypassTest: true,
    },
});
const av = clamscan.passthrough();
const output = s3Stream.upload({ Key: '<your object key here>' });

// Do some stream piping
input.pipe(av).pipe(gzip).pipe(output);
// Handle events from passthrough
av
    .on('error', (error) => {
        // Handle errors
    })
    .on('timeout', () => {
        // The scan/stream has timed out
    })
    .on('finish', () => {
        // The stream has been fully read and sent to the scanner
    })
    .on('end', () => {
        console.log('All data has been scanned and sent on to the destination!');
    })
    .on('scan-complete', (result) => {
        console.log('Scan Complete! Result: ', result);
        if (result.isInfected === true) {
            // The stream contains a virus
        } else if (result.isInfected === null) {
            // There was an issue scanning the stream
        } else {
            // The stream is virus-free
        }
    });
output.on('finish', () => {
    // Data has been fully written to the output
    output.destroy();
});

output.on('error', (error) => {
    console.log('Final Output Fail: ', error);
});
from clamscan.
Hi there,
The deal is that the file is split on the client's end into multiple chunks of bytes, which are then uploaded one by one, each in its own POST request.
So each incoming request's ReadStream is not a complete file, but just a chunk of bytes.
Here's a rough description of how it works:
First, I initiate a WriteStream for the output physical file with the flags: 'a' option for append mode; that stream is shared across all the requests.
Clients generate a unique UUID for each individual file so that the server knows which group of requests to pipe into which shared WriteStream.
Then I just pipe all the ReadStreams, in sequence, into said WriteStream:
// end set to false so that the shared WriteStream will not be finalized after a single ReadStream's pipe
readStream.pipe(writeStream, { end: false });
After uploading all the chunks of bytes, the client sends one more request to the server to finalize the stream, at which point I do:
writeStream.end();
The final output physical file properly ends up byte-identical to the pre-split file on the client's end.
If you're wondering why such an elaborate system is even a thing people do: in my case it's to work around the 100 MB POST limit of Cloudflare's Free plan, but it applies to any other proxy with a limited POST size.
It basically lets me immediately have a complete file (fast), instead of having to write multiple "chunk files" which then get re-combined later (slow).
But I can't say the same when I first pipe it through clamscan's PassthroughStream (assuming I also initiate it once per individual file and then share it across all the requests for that file's chunks):
readStream
    .pipe(scanStream, { end: false })
    .pipe(writeStream, { end: false });
For reasons that I can't really understand, the bytes that get piped into the writeStream come out all sorts of funky.
I'm aware that it works as expected when simply dealing with a whole file, though (i.e., something like the example code in your reply).
Yeah, unfortunately, clamscan can't really scan files as partials, since the bad part could be split across chunks and hence be undetectable.
A rudimentary example. If the string 'ABC1234' is a known virus:
Chunk 1: 'foobarAB' (looks clean, let it through)
Chunk 2: 'C1234baz' (looks clean, let it through)
Concatenated output: 'foobarABC1234baz' (the virus made it through--not good!)
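That failure mode is easy to demonstrate with a naive per-chunk scanner (a hypothetical stand-in for a real scanner, purely to illustrate the point):

```javascript
// Hypothetical naive scanner: checks each chunk for a known "virus"
// signature. This only illustrates why per-chunk scanning is unsound.
const SIGNATURE = 'ABC1234';

function chunkLooksClean(chunk) {
    return !chunk.includes(SIGNATURE);
}

const chunks = ['foobarAB', 'C1234baz'];

// Every chunk passes the per-chunk check...
const allChunksPass = chunks.every(chunkLooksClean);

// ...yet the concatenated output contains the signature.
const output = chunks.join('');
const outputInfected = output.includes(SIGNATURE);

console.log(allChunksPass);  // true  -- both chunks look clean
console.log(outputInfected); // true  -- but the joined stream is infected
```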
So, behind the scenes, when using the passthrough method, this package is essentially splitting the Readable stream into 2 readable streams, sending one to the clamav socket/IP service and one to the piped output (for example, a writable file stream). It only sends a given chunk to the piped output once we can confirm clamav has received it. If at any point the clamav service detects a virus in the accumulated packets it has received, it will immediately kill the secondary piped output and emit the 'scan-complete' event with isInfected set to true. You should then delete the partially-written-to file immediately.
In other words, it can detect a virus mid-stream, but only if it has all the chunks preceding it. This all happens in a socket "session": we open a socket connection to clamav, send some commands to let it know we're going to scan stuff, and then send some chunks. After each chunk it acknowledges receipt and then responds with GOOD/BAD. If any are BAD, we stop. If all are GOOD, we keep sending chunks until we have no more. Once all chunks are sent, we close the socket connection to clamav. If that socket connection stays open too long without anything written to it, ClamAV throws a TIMEOUT. Either way, that socket connection is created and stored in a handle/variable when the passthrough method is called. It can't really be stored "globally".
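For reference, the socket session described above uses clamd's INSTREAM command: per the clamd documentation, each chunk goes over the wire as a 4-byte big-endian length prefix followed by the data, and a zero-length chunk terminates the stream. A minimal framing sketch (the wire format only, not clamscan's actual internals):

```javascript
// Frame one chunk for clamd's INSTREAM command:
// 4-byte big-endian length prefix followed by the chunk data.
function frameChunk(chunk) {
    const data = Buffer.from(chunk);
    const header = Buffer.alloc(4);
    header.writeUInt32BE(data.length, 0);
    return Buffer.concat([header, data]);
}

// A zero-length chunk terminates the stream.
const TERMINATOR = frameChunk(Buffer.alloc(0));

// A full payload for two chunks would be the command 'zINSTREAM\0'
// followed by frameChunk(a) + frameChunk(b) + TERMINATOR, written to
// the clamd socket (e.g. localhost:3310).
const framed = frameChunk('hello');
console.log(framed.length);          // 9 (4-byte header + 5 data bytes)
console.log(framed.readUInt32BE(0)); // 5
```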
Long story short, I don't think we can have an infinitely-appendable writable stream to the ClamAV socket in the same way that we can with a file. Infinitely-appendable files are easy, since there aren't any timeout issues with files and they are always written to disk, whereas commands written to a socket session are not.
I hope that makes sense, haha.
Thank you for the thorough explanation
I see, that does make sense
The random idea I wrote in the original post:
"I'd assume clamscan will have to facilitate withholding the passthrough data until they're done, before forwarding them to be scanned by clamav?"
would not really make much sense either, now that I think about it.
By that point, you'd either be wastefully keeping extra temp physical files for the withheld bytes, or simply using more memory if they're withheld in memory.
Perhaps I'll just use the passthrough during finalization of chunked uploads, at the stage where the server needs to move the final file from the working directory to final storage.
They're not written directly to final storage in the first place because some server owners who use network drives learned that such a setup apparently doesn't support the append mode of WriteStream, which sounds like yet another deep rabbit hole of technicalities, so my stop-gap measure has always been to just fs.copyFile, haha.
Anyways, thanks again, and also for the awesome work you've put into this library!
I'm satisfied enough with what I've ended up learning from this issue, so I'll be closing it now
Okay. Yeah, in your use-case, I think that's gonna be the best way forward. Have a good one!