adroll / baker
Baker is a high performance, composable and extendable data-processing pipeline for the big data era
Home Page: https://getbaker.io
License: MIT License
On an AWS EC2 instance of size c5.2xlarge, Baker can read zstandard records from S3, uncompress them, apply a basic filtering logic and compress them back to local files, using ~90% of the capacity of each vCPU (8 in total) and ~3.5GB of RAM. It reads and writes a total of 94 million records in less than 9 minutes, that's 178k records per second.
On a c5.2xlarge instance (48 vCPUs) the same test takes 2 minutes, so that's a speed of 775k records per second.
This excerpt is from https://getbaker.io/docs/performance/. The second instance is mislabeled there: the README correctly states that it is a c5.12xlarge instance.
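As a quick sanity check on the quoted figures, the implied run times and per-vCPU rates can be worked out directly. This is a minimal Go sketch, not part of Baker. The helper names secondsAt and perVCPU are hypothetical, and perVCPU assumes the load is spread evenly across vCPUs, since the docs only report totals:

```go
package main

import "fmt"

// totalRecords is the record count quoted in the performance excerpt.
const totalRecords = 94_000_000

// secondsAt returns how long processing n records takes at rate records/s.
func secondsAt(n, rate float64) float64 { return n / rate }

// perVCPU splits a total throughput evenly across vcpus (an assumption).
func perVCPU(rate float64, vcpus int) float64 { return rate / float64(vcpus) }

func main() {
	small := secondsAt(totalRecords, 178_000) // c5.2xlarge, 8 vCPUs
	large := secondsAt(totalRecords, 775_000) // c5.12xlarge, 48 vCPUs
	fmt.Printf("8 vCPUs:  %.0f s (%.1f min), %.0f records/s per vCPU\n",
		small, small/60, perVCPU(178_000, 8))
	fmt.Printf("48 vCPUs: %.0f s (%.1f min), %.0f records/s per vCPU\n",
		large, large/60, perVCPU(775_000, 48))
}
```

With these inputs, 94 million records take about 528 s (8.8 min) at 178k records/s and about 121 s (2.0 min) at 775k records/s, so both rates agree with the stated run times. Per vCPU, the 48-vCPU instance handles roughly 16k records/s versus about 22k on the 8-vCPU one.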
Installing Baker on Windows 10 with the command go get github.com/AdRoll/baker fails:
go: github.com/AdRoll/baker upgrade => v0.0.0-20201209102217-af7b7c12682b
go: downloading github.com/charmbracelet/glamour v0.2.0
go: downloading github.com/sirupsen/logrus v1.4.2
go: downloading github.com/rasky/toml v0.1.1-0.20160309013025-90bcb678a72a
go: downloading github.com/konsorten/go-windows-terminal-sequences v1.0.2
go: downloading golang.org/x/sys v0.0.0-20200413165638-669c56c373c4
go: downloading github.com/yuin/goldmark v1.2.0
go: downloading github.com/muesli/termenv v0.6.0
go: downloading github.com/muesli/reflow v0.1.0
go: downloading github.com/mattn/go-runewidth v0.0.9
go: downloading github.com/alecthomas/chroma v0.7.3
go: downloading github.com/microcosm-cc/bluemonday v1.0.2
go: downloading github.com/olekukonko/tablewriter v0.0.4
go: downloading github.com/lucasb-eyer/go-colorful v1.0.3
go: downloading github.com/google/goterm v0.0.0-20190703233501-fc88cf888a3f
go: downloading github.com/dlclark/regexp2 v1.2.0
go: downloading github.com/danwakefield/fnmatch v0.0.0-20160403171240-cbb64ac3d964
# github.com/AdRoll/baker
..\..\dvgamerr\go\pkg\mod\github.com\!ad!roll\baker@v0.0.0-20201209102217-af7b7c12682b\help_markdown.go:56:38: not enough arguments in call to syscall.Syscall
..\..\dvgamerr\go\pkg\mod\github.com\!ad!roll\baker@v0.0.0-20201209102217-af7b7c12682b\help_markdown.go:56:39: undefined: syscall.SYS_IOCTL
..\..\dvgamerr\go\pkg\mod\github.com\!ad!roll\baker@v0.0.0-20201209102217-af7b7c12682b\help_markdown.go:58:11: undefined: syscall.TIOCGWINSZ
go version go1.15.6 windows/amd64
Thanks.
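These errors come from help_markdown.go querying the terminal size with an ioctl. On windows/amd64 the syscall package defines neither SYS_IOCTL nor TIOCGWINSZ, and syscall.Syscall takes a different number of arguments there. One portable workaround, sketched here as an assumption rather than Baker's actual fix, is to skip the ioctl entirely. It reads the width from the COLUMNS environment variable when that is exported, and otherwise falls back to a fixed default. The names terminalWidth and parseWidth and the 80-column default are hypothetical:

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// defaultWidth is an assumed fallback; it is not taken from Baker's source.
const defaultWidth = 80

// parseWidth converts a COLUMNS value into a width, falling back to
// defaultWidth when the value is missing, malformed or not positive.
func parseWidth(value string, present bool) int {
	if !present {
		return defaultWidth
	}
	n, err := strconv.Atoi(value)
	if err != nil || n <= 0 {
		return defaultWidth
	}
	return n
}

// terminalWidth is a hypothetical replacement for the ioctl(TIOCGWINSZ) call.
// It uses only os and strconv, so it compiles on every GOOS, including windows.
func terminalWidth() int {
	return parseWidth(os.LookupEnv("COLUMNS"))
}

func main() {
	fmt.Println("terminal width:", terminalWidth())
}
```

A more complete fix would keep the ioctl for Unix builds and move it behind a build constraint, with a separate Windows implementation. That file split cannot be shown in a single runnable block.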
I checked the quick start example and got this error:
# github.com/AdRoll/baker/input/inpututils
..\..\..\go\pkg\mod\github.com\!ad!roll\[email protected]\input\inpututils\fastreader.go:59:3: unknown field 'Setpgid' in struct literal of type syscall.SysProcAttr