Comments (11)

abeaumont commented on August 15, 2024

enry reads the whole file content to determine the language, so with a 1GB file it's not surprising that it needs that much memory. I guess it could be improved, but OTOH it doesn't make sense to feed enry such big files.

from enry.

vmarkovtsev commented on August 15, 2024

@abeaumont #65 and #67
My 1GB file is supposed to be skipped. But even if it is not skipped, the overall directory size is 1.7 gigs, so how is it possible to consume >4 gigs?!

abeaumont commented on August 15, 2024

I see. I'll check why it's not skipped when it should be.
As for why it may consume 4GB: reading a 1GB file can take much more than 1GB of memory:

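```go
package main

import (
	"io/ioutil"
	"log"
	"os"
)

func main() {
	// Read the whole file named by the first argument into memory.
	if _, err := ioutil.ReadFile(os.Args[1]); err != nil {
		log.Fatal(err)
	}
}
```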
```
> dd if=/dev/zero of=a-giga-of-zeros bs=1GiB count=1
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 0.817687 s, 1.3 GB/s
> /usr/bin/time -v go run file.go a-giga-of-zeros
	Command being timed: "go run file.go a-giga-of-zeros"
	User time (seconds): 1.43
	System time (seconds): 0.96
	Percent of CPU this job got: 120%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 0:01.99
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 4331416
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 0
	Minor (reclaiming a frame) page faults: 123703
	Voluntary context switches: 1682
	Involuntary context switches: 157
	Swaps: 0
	File system inputs: 0
	File system outputs: 0
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 0
```

vmarkovtsev commented on August 15, 2024

Holy crap! I am disappointed in Go now more than usual.

vmarkovtsev commented on August 15, 2024

golang/go#16269

vmarkovtsev commented on August 15, 2024

Funnily enough, exactly the same code uses 3282392 kbytes on my machine (Go 1.8.3).

vmarkovtsev commented on August 15, 2024

```
GODEBUG=gctrace=1 go run file.go a-giga-of-zeros
gc 1 @0.072s 0%: 0.050+0.38+0.027 ms clock, 0.20+0.099/0.24/0.62+0.11 ms cpu, 4->4->0 MB, 5 MB goal, 4 P
gc 2 @0.126s 0%: 0.009+0.42+0.046 ms clock, 0.039+0/0.29/1.0+0.18 ms cpu, 4->4->0 MB, 5 MB goal, 4 P
# command-line-arguments
gc 1 @0.010s 4%: 0.015+1.8+0.084 ms clock, 0.046+0.10/1.7/0.80+0.25 ms cpu, 4->4->3 MB, 5 MB goal, 4 P
# command-line-arguments
gc 1 @0.002s 10%: 0.037+6.1+0.033 ms clock, 0.11+0.093/3.7/3.4+0.099 ms cpu, 4->5->4 MB, 5 MB goal, 4 P
gc 2 @0.016s 9%: 0.009+2.2+0.016 ms clock, 0.037+1.1/1.9/1.6+0.064 ms cpu, 7->8->8 MB, 9 MB goal, 4 P
gc 3 @0.037s 7%: 0.011+5.7+0.035 ms clock, 0.046+0.059/5.4/5.8+0.14 ms cpu, 13->14->13 MB, 16 MB goal, 4 P
gc 4 @0.069s 7%: 0.008+10+0.042 ms clock, 0.034+1.6/10/0.58+0.17 ms cpu, 23->25->23 MB, 27 MB goal, 4 P
gc 1 @0.002s 4%: 0.024+1.9+0.059 ms clock, 0.072+0/0.48/0.11+0.17 ms cpu, 4->4->3 MB, 5 MB goal, 4 P
gc 2 @0.004s 4%: 0.010+2.0+0.12 ms clock, 0.040+0/0.001/1.9+0.49 ms cpu, 7->7->6 MB, 8 MB goal, 4 P
gc 3 @0.009s 4%: 0.005+1.4+0.16 ms clock, 0.021+0/0.022/1.4+0.67 ms cpu, 14->14->12 MB, 15 MB goal, 4 P
gc 4 @0.014s 3%: 0.006+2.2+0.026 ms clock, 0.026+0/0.007/2.2+0.10 ms cpu, 28->28->24 MB, 29 MB goal, 4 P
gc 5 @0.025s 1%: 0.013+10+0.047 ms clock, 0.054+0/0.006/10+0.18 ms cpu, 56->56->48 MB, 57 MB goal, 4 P
gc 6 @0.052s 1%: 0.008+7.7+0.040 ms clock, 0.033+0/0.004/7.7+0.16 ms cpu, 112->112->96 MB, 113 MB goal, 4 P
gc 7 @0.093s 0%: 0.010+16+0.037 ms clock, 0.043+0/0.002/16+0.14 ms cpu, 224->224->192 MB, 225 MB goal, 4 P
gc 8 @0.176s 0%: 0.006+54+0.036 ms clock, 0.024+0/0.022/54+0.14 ms cpu, 448->448->384 MB, 449 MB goal, 4 P
gc 9 @0.364s 0%: 0.010+104+0.033 ms clock, 0.043+0/0.005/104+0.13 ms cpu, 896->896->768 MB, 897 MB goal, 4 P
gc 10 @0.722s 0%: 0.007+201+0.023 ms clock, 0.028+0/0.008/201+0.093 ms cpu, 1792->1792->1536 MB, 1793 MB goal, 4 P
gc 11 @1.427s 0%: 0.010+498+0.028 ms clock, 0.040+0/0.008/498+0.11 ms cpu, 3584->3584->3072 MB, 3585 MB goal, 4 P
```

vmarkovtsev commented on August 15, 2024

@abeaumont Is it possible to test with runtime.GC() followed by debug.FreeOSMemory() before processing each file?

abeaumont commented on August 15, 2024

@vmarkovtsev The asdf extension is not recognized by enry, which is why it needs to read the file.
As for calling memory functions by hand after every file, I don't think it's a good idea. It makes the code more complex for small gains in extreme cases, and it would probably hurt average performance.

vmarkovtsev commented on August 15, 2024

This is something we can easily benchmark using our existing benchmarking toolkit. Besides, I don't see how these two lines make the code more complex. I saw the regexps in tokenize.go, THAT IS COMPLEX :D

vmarkovtsev commented on August 15, 2024

@dennwc Thanks so much! Feels much better now.
