twitter / ccommon
Cache Commons
License: Apache License 2.0
In include/cc_array.h on line 112 (sha 05c6e1e) there is the line
return arr->data + arr->size * idx;
arr->data is declared as void *, and pointer arithmetic on void * is not defined by the C standard (GCC accepts it only as an extension, treating the pointed-to size as 1). It is likely that changing it to a char * (and cascading the requirements of that change) is sufficient.
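As a minimal sketch of the fix described above, the arithmetic can be done on a char * so that the byte offset is well defined. The struct below is a simplified stand-in, not the actual definition in cc_array.h: the nalloc field and the function name array_locate are assumptions made for illustration, and the cast is shown in place of changing the declared type of data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the array type (layout is an assumption). */
struct array {
    uint32_t nalloc; /* number of allocated elements (hypothetical field) */
    size_t   size;   /* size of one element, in bytes */
    void     *data;  /* element storage */
};

/* Return a pointer to element idx; the offset is computed on char *. */
static inline void *
array_locate(const struct array *arr, uint32_t idx)
{
    assert(arr != NULL && idx < arr->nalloc);

    /* char * arithmetic advances one byte at a time, which is portable C */
    return (char *)arr->data + arr->size * idx;
}
```

Casting at the point of use keeps the public type of data unchanged; declaring the field as char * instead would remove the cast but touches every caller that assigns to it.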
There are 4 hashtables in Pelikan now, each only very slightly different. In general if we can create a hashtable template with the following components pluggable, we can serve them all. This is likely gonna require more macros, but we do use cc_queue.h anyway.
Additionally, it'd be nice to provide built-in per-entry locking support. This can be achieved by having a bit-mask for all entries, and setting the corresponding bit to true when an entry is being accessed. The granularity of the lock can be adjusted by changing the mapping between bit-mask and hash entry (e.g. 2**n entries mapping to the same bit in mask, n being configurable).
Of course, if only we use a language that has proper templating...
We should use the same style guide for both projects.
Currently, ring_array has a wpos between 0 and cap - 1 (it uses modulo). However, this needs to reflect cap+1 possible states, which are having 0 to cap elements.
If cap is 5, the array may have 0, 1, 2, 3, 4 or 5 elements.
See https://github.com/twitter/ccommon/pull/52/files#diff-24026a6092b5fb12913f83a58ff81b31R76 and https://github.com/twitter/ccommon/pull/52/files#diff-24026a6092b5fb12913f83a58ff81b31R98
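One common way to represent all cap+1 states is to allocate one spare slot, so that rpos == wpos means empty and the buffer is full when advancing wpos would make it equal rpos. The sketch below illustrates that technique only; it is not the layout used in the linked pull request, and struct ring, RING_CAP and the function names are made up for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_CAP 5 /* usable capacity (example value) */

struct ring {
    uint32_t rpos;               /* next slot to read */
    uint32_t wpos;               /* next slot to write */
    int      slot[RING_CAP + 1]; /* spare slot separates full from empty */
};

/* Number of stored elements, always in the range 0..RING_CAP. */
static uint32_t
ring_nelem(const struct ring *r)
{
    assert(r->rpos <= RING_CAP && r->wpos <= RING_CAP);

    return (r->wpos + RING_CAP + 1 - r->rpos) % (RING_CAP + 1);
}

static bool
ring_push(struct ring *r, int v)
{
    if (ring_nelem(r) == RING_CAP) {
        return false; /* full */
    }
    r->slot[r->wpos] = v;
    r->wpos = (r->wpos + 1) % (RING_CAP + 1);
    return true;
}

static bool
ring_pop(struct ring *r, int *v)
{
    if (ring_nelem(r) == 0) {
        return false; /* empty */
    }
    *v = r->slot[r->rpos];
    r->rpos = (r->rpos + 1) % (RING_CAP + 1);
    return true;
}
```

The alternative is to keep cap slots and store an explicit element count; that saves one slot but adds a field that both producer and consumer must update.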
According to the guidelines, we should be using unsigned v:1 instead[1]. At least one was found in include/buffer/cc_buf.h; search for others and replace all of them.
[1] https://github.com/twitter/ccommon/blob/master/docs/c-styleguide.txt#L19-L23
Add a mock timer for tests that rely on timing, such as pipe, tcp, and timewheel; these occasionally fail due to timing issues.
The documentation now explains the guarantees it should cover.
/Users/kyang/ccommon/src/cc_debug.c: In function 'debug_log_flush':
/Users/kyang/ccommon/src/cc_debug.c:106:23: warning: unused parameter 'arg' [-Wunused-parameter]
debug_log_flush(void *arg)
tcp_accept in ccommon didn't seem to be handling exceptions correctly. This type of behavior is hard to debug in production and should be captured in unit tests.
We should allow the option to force contiguous memory for preallocated pools of objects. This may be useful for resource pools where we want memory locality.
This most likely already happens because preallocation calls the allocate function in a tight loop. However, we rely on the memory allocator for this behavior, and depending on the implementation of it we may or may not actually end up with contiguous memory for a preallocated pool.
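A rough sketch of forcing contiguity is to back the whole pool with a single allocation and hand out slices of it. The names below (struct slab_pool and its functions) are hypothetical and do not correspond to the existing pool macros; the sketch also omits returning objects to the pool, which would need a free list on top.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct slab_pool {
    char   *slab;   /* one contiguous allocation backing every object */
    size_t objsize; /* pass sizeof(T) so each slice stays aligned for T */
    size_t nobj;    /* total number of objects in the slab */
    size_t nused;   /* number of objects handed out so far */
};

/* Preallocate nobj objects in one block; returns 0 on success, -1 on OOM. */
static int
slab_pool_create(struct slab_pool *p, size_t objsize, size_t nobj)
{
    assert(p != NULL && objsize > 0 && nobj > 0);

    p->slab = calloc(nobj, objsize);
    if (p->slab == NULL) {
        return -1;
    }
    p->objsize = objsize;
    p->nobj = nobj;
    p->nused = 0;
    return 0;
}

/* Hand out the next unused object, or NULL when the slab is exhausted. */
static void *
slab_pool_borrow(struct slab_pool *p)
{
    if (p->nused == p->nobj) {
        return NULL;
    }
    return p->slab + p->objsize * p->nused++;
}

static void
slab_pool_destroy(struct slab_pool *p)
{
    free(p->slab);
    p->slab = NULL;
}
```

With this layout, locality no longer depends on how the allocator happens to place consecutive small allocations.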
Some names in the bstring module are confusing; we should rename them per @slyphon's suggestion.
Quote from twitter/pelikan#194
Jonathan:
bstring_set_raw might be better named bstring_set_literal
bstring_set_text might be better named bstring_set_char_p or bstring_from_char_ptr
Kevin:
I like bstring_set_literal. I think bstring_set_cstr is a clear name to me, not sure how you guys feel about it though.
Jonathan:
bstring_set_cstr is good. +1
Rust has CStr/CString to refer to a null-terminated string, so it follows that naming convention well.
We have two different directories for docs about the repo: notes and docs. We should merge these and prune what we don't need.
To subtract 16 bytes (metadata overhead for malloc) from the current default size, which is exactly 16KB
In the newer Linux and BSD kernels this will pass the flag to all accepted sockets by inheritance, saving a syscall per connection.
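include/cc_signal.h defines struct signal signals[SIGNAL_MAX]; and is transitively included in multiple files. Every translation unit that includes the header therefore gets its own definition of the same object, which is undefined behavior (and a link error when building with -fno-common, the default since GCC 10). The struct signal signals array should be declared as extern in the header and defined in a single translation unit.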
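The declaration/definition split could be sketched as follows. Both halves are shown in one block so it compiles as a single file; in the repository they would live in cc_signal.h and cc_signal.c respectively. The struct fields, the SIGNAL_MAX value and the signal_get accessor are placeholders, not the real ccommon definitions.

```c
#include <assert.h>
#include <stddef.h>

/* ---- header part (cc_signal.h): declaration only ---- */
#define SIGNAL_MAX 32 /* placeholder value */

struct signal {
    const char *info;          /* hypothetical field */
    void       (*handler)(int); /* hypothetical field */
};

/* extern: tells every includer the array exists, without defining it */
extern struct signal signals[SIGNAL_MAX];

/* ---- source part (cc_signal.c): the one and only definition ---- */
struct signal signals[SIGNAL_MAX];

/* Hypothetical accessor, used here only to exercise the array. */
struct signal *
signal_get(int signo)
{
    assert(signo >= 0 && signo < SIGNAL_MAX);

    return &signals[signo];
}
```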
"- Use of extern should be considered as evil, if it is used in header files
to reference global variables."
Why?
Buffer operations are often error-prone, especially when raw pointers are used and the underlying buffer is moved before the references are discarded/reset. The one place where buffer movement happens is through cc_realloc(); therefore, if we can make the memory address change every time realloc is called, we can detect such problems more quickly (these are nasty bugs to reproduce in production).
Since we wrap realloc already, it should be easy to swap in our own implementation when debugging/testing. E.g. by turning that into a combo of alloc + memcpy, we are guaranteed to get a new address each time.
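A minimal sketch of the alloc + memcpy combination is below. The function name debug_realloc_move is hypothetical, and unlike realloc it takes the old size explicitly, because the standard allocator does not expose the size of an existing block; an actual cc_realloc replacement would need some way to obtain that size.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Debug-only realloc replacement that always moves the block.
 * The new block is allocated while the old one is still live,
 * so the two addresses cannot coincide.
 */
static void *
debug_realloc_move(void *ptr, size_t old_size, size_t new_size)
{
    void *p;

    assert(new_size != 0);

    p = malloc(new_size);
    if (p == NULL) {
        return NULL; /* like realloc, the original block is left intact */
    }

    if (ptr != NULL) {
        /* copy only what fits when shrinking */
        memcpy(p, ptr, old_size < new_size ? old_size : new_size);
        free(ptr);
    }

    return p;
}
```

Any pointer into the old buffer is now dangling after every call, so tools such as AddressSanitizer or Valgrind have a chance to flag stale references on the first resize instead of only when the allocator happens to move the block.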
This will help us prevent problems like twitter/pelikan#98
Fix rust-enabled build
CI should be green.
CI is failing due to Rust-enabled build.
This started showing up in the last week without code change.
I noticed this asymmetry among different modules, and I was wondering if there's a reason for it or if we should unify them. In some cases, the module calls cc_alloc and returns a pointer to the newly created struct[1], and in other cases, it receives a pointer to memory that has not yet been initialized[2]. The latter seems to be more generic, because it can also be used with stack allocations.
[1] https://github.com/twitter/ccommon/blob/master/src/cc_log.c#L76
[2] https://github.com/twitter/ccommon/blob/master/include/cc_bstring.h#L52
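The two styles can be contrasted with a small, made-up type. struct counter and both functions are hypothetical and only illustrate the shape of each interface; plain malloc stands in for cc_alloc.

```c
#include <assert.h>
#include <stdlib.h>

struct counter {
    unsigned long value;
};

/* Style [2]: caller owns the storage (stack, heap, or embedded in a struct). */
static void
counter_init(struct counter *c)
{
    assert(c != NULL);

    c->value = 0;
}

/* Style [1]: the module allocates and returns the object. */
static struct counter *
counter_create(void)
{
    struct counter *c = malloc(sizeof(*c));

    if (c != NULL) {
        counter_init(c); /* create can be layered on top of init */
    }
    return c;
}
```

Since create can be written in terms of init, offering init everywhere and create only where heap allocation is the common case would be one way to unify the modules.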
I'm thinking about slowly converting all the user-defined _t type names into a few alternatives, so we are compliant with the style guide:
_e for enum
_f for floating point numbers, regardless of size
_i for signed integers, regardless of size
_u for unsigned integers, regardless of size
_p for other pointer type
_fn for function pointer types
_st for structs
In general, we may want to discourage applying typedef on primitive types unless there's a good reason (e.g. different type representation on different platforms), types that may change in the future, or type names that are commonly used as alias.
Thoughts?
This also will aim to fix #78
In https://github.com/twitter/ccommon/blob/master/include/cc_log.h#L81 and in https://github.com/twitter/ccommon/blob/master/src/cc_log.c#L168 we declare _log_write. The function is not static, and it seems to provide the public interface for logging. Why not call it log_write instead?
So we can cap the maximum amount of memory used by each socket.
Mostly due to timing delays, should revisit to either relax timing (and reduce jitter's impact) or find another way that is timing independent.
In preparation to port fatcache to Pelikan, we need a non-cryptographic hash function that can generate message digests that are longer than what we use for hashtables (typically 32-bit) and have low collision rate.
After reading this post, and a few other blogposts, I think murmurhash3 is an excellent alternative for the SHA-1 that's currently used in fatcache.
I will copy the C++ implementation (MIT license) into ccommon, modify it, and see if any changes are necessary to make it work with a C compiler. Another C version (provided by qLibc) doesn't seem to yield platform-optimized code the way the canonical version does, and I would need to modify that as well to incorporate the source, so starting from the C++ version makes more sense.
Add a proc_macro for the config structs (eg: ccommon-stream, ccommon-debug, ...)
In #245 we added configuration structs for the ccommon components, however there is a significant amount of boilerplate.
I propose adding a proc_macro for generating code to reduce the boilerplate. It should be possible to define the overall configuration struct, with its default values in one location and let the proc_macro expand that out to a complete struct definition, accessors, and corresponding serde annotations.
https://github.com/twitter/ccommon/blob/master/include/cc_ring_array.h#L40
Move it from the header to the c file?
Current tcp_accept() does not allow customized flags and sets the presumed ones (O_NONBLOCK, TCP_NODELAY) using separate calls.
A better interface would be to allow flags to be passed in, and take advantage of accept4 when possible.
accept4 avoids further syscalls when one wants to accept a connection and immediately apply certain flags to the socket.
It is relatively recent (glibc 2.10 and Linux 2.6.28) and is missing from the current OS X versions, so a compile-time check and a fallback implementation are needed to make it work universally.
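A sketch of the compile-time check with a fallback is shown below. The function name accept_nonblock is hypothetical, and using the presence of the SOCK_NONBLOCK macro as a proxy for accept4 availability is an assumption; a real build would more likely probe for accept4 in the build system. Note that accept4 only takes SOCK_NONBLOCK and SOCK_CLOEXEC, so TCP_NODELAY still needs a setsockopt call in either branch.

```c
#define _GNU_SOURCE /* exposes accept4() on glibc; must precede includes */
#include <assert.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Accept one connection and return it in non-blocking mode, or -1. */
static int
accept_nonblock(int listen_fd)
{
#ifdef SOCK_NONBLOCK
    /* one syscall: accept and set O_NONBLOCK atomically */
    return accept4(listen_fd, NULL, NULL, SOCK_NONBLOCK);
#else
    /* fallback: accept, then two fcntl calls to set the flag */
    int flags;
    int fd = accept(listen_fd, NULL, NULL);

    if (fd < 0) {
        return -1;
    }

    flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0) {
        close(fd);
        return -1;
    }

    return fd;
#endif
}
```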
https://github.com/twitter/ccommon/blob/master/src/cc_rbuf.c#L94
Shouldn't it receive an rbuf** instead?
Investigate a setup where channel type and buffer type are both pluggable, what would the stream setup look like then? Reference: #76
It appears that simply calling timeout_event_create causes a segfault, when the newly created timeout_event is being reset:
$r
Process 72759 launched: './pelikan_slimcache' (x86_64)
load config from ../template/slimcache.conf
Process 72759 stopped
* thread #1: tid = 0x268df5, 0x00000001000192a5 pelikan_slimcache`timeout_event_reset(t=0x0000000100103970) + 37 at cc_wheel.c:61, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x8)
frame #0: 0x00000001000192a5 pelikan_slimcache`timeout_event_reset(t=0x0000000100103970) + 37 at cc_wheel.c:61
58 t->free = false;
59
60 TAILQ_NEXT(t, tqe) = NULL;
-> 61 TAILQ_PREV(t, tevent_tqh, tqe) = NULL;
62 t->cb = NULL;
63 t->data = NULL;
64 t->recur = false;
(lldb) bt
* thread #1: tid = 0x268df5, 0x00000001000192a5 pelikan_slimcache`timeout_event_reset(t=0x0000000100103970) + 37 at cc_wheel.c:61, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x8)
* frame #0: 0x00000001000192a5 pelikan_slimcache`timeout_event_reset(t=0x0000000100103970) + 37 at cc_wheel.c:61
frame #1: 0x000000010001931b pelikan_slimcache`timeout_event_create + 43 at cc_wheel.c:80
frame #2: 0x000000010001e53e pelikan_slimcache`main + 5 at main.c:62
frame #3: 0x000000010001e539 pelikan_slimcache`main(argc=<unavailable>, argv=<unavailable>) + 201
frame #4: 0x00007fff89b425fd libdyld.dylib`start + 1
We will be dropping our paid Travis CI plan at the end of 2021. We do not expect there to be any visible changes to this repo, but wanted to give some notice just in case. We recommend migrating CI jobs to GitHub Actions.
Travis CI provides free testing for open source projects. In addition, Twitter has paid for a small number of additional concurrent builds which were available for open source as well as private repositories. Many Twitter projects have already moved to GitHub Actions for CI, and we have no private repos left using Travis, so we will be discontinuing our plan at the end of 2021.
Since this repo is open source, we do not expect this change to impact Travis CI builds for this project. However, we still recommend most Twitter projects to migrate to GitHub Actions for CI at your convenience.
I was writing a test for tcp_maximize_sndbuf and it was easy on OS X, but its current implementation makes no sense on Linux (Ubuntu).
The implementation does a binary search with these values:
134250496 0
201342976 0
234889216 0
251662336 0
260048896 0
264242176 0
266338816 0
267387136 0
267911296 0
268173376 0
268304416 0
268369936 0
268402696 0
268419076 0
268427266 0
268431361 0
268433409 0
268434433 0
268434945 0
268435201 0
268435329 0
268435393 0
268435425 0
268435441 0
268435449 0
268435453 0
268435455 0
268435456 0
The left column is the attempted value and the right column is the status received. It can be seen that the status is always 0, meaning that setsockopt never fails even if the value used is higher than the maximum value possible. Calling getsockopt at the end returns the value 425984 (at least on my machine), which is lower than any value attempted.
Should we set it to INT_MAX instead of doing the binary search?
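Because the setsockopt status alone does not reveal what was granted, one option is to read the value back after each request. The helper below is a hypothetical sketch (sndbuf_request is not a ccommon function). On Linux the kernel silently caps the request at net.core.wmem_max and doubles the stored value for bookkeeping, so the number read back will generally differ from the number requested.

```c
#include <assert.h>
#include <sys/socket.h>

/*
 * Ask for a send buffer of `size` bytes and return what the kernel
 * reports afterwards, or -1 if either socket call fails.
 */
static int
sndbuf_request(int sd, int size)
{
    int granted;
    socklen_t len = sizeof(granted);

    if (setsockopt(sd, SOL_SOCKET, SO_SNDBUF, &size, sizeof(size)) < 0) {
        return -1;
    }

    /* status 0 above does not mean `size` was honored; read it back */
    if (getsockopt(sd, SOL_SOCKET, SO_SNDBUF, &granted, &len) < 0) {
        return -1;
    }

    return granted;
}
```

With a helper like this, a single request for a very large value followed by the read-back would yield the effective maximum without a search.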
It currently receives an int timeout in milliseconds. We now have a better way to represent timeouts. I can implement this if we agree it is better.