drtmh's People

Contributors

dst2019, roccrtx, windybeing, wxdwfc

drtmh's Issues

Is RTM necessary

Hi there,
I wonder whether the repo can be deployed in a cluster without RTM?

config_template.xml is not available.

Hi,

I am trying your code on our RDMA-enabled cluster. However, I cannot find the config_template.xml file needed to run your sample program. Could you please upload one?

Error building src/core/rworker.cc during make noccocc

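Hello, and thank you for open-sourcing DrTM+H! I ran into a problem while using it.
Background: I followed the README to build DrTM+H. cmake succeeded with the recommended parameters, but the make noccocc step fails: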
/src/core/mworker.cc:170:110: error: no matching function for call to 'rdmaio::UDAdapter::UDAdapter(rdmaio::RdmaCtrl*&, rdmaio::RNicHandler, rdmaio::MemoryAttr, const unsigned int&, int, std::_Bind_helper<false, bool (nocc::oltp::RRpc::*)(char*, int, int), nocc::oltp::RRpc*&, const std::_Placeholder<1>&, const std::_Placeholder<2>&, const std::_Placeholder<3>&>::type)'
std::placeholders::_1, std::placeholders::_2, std::placeholders::_3));

/src/core/rworker.cc:201:108: error: no matching function for call to 'rdmaio::UDAdapter::UDAdapter(rdmaio::RdmaCtrl*&, rdmaio::RNicHandler, rdmaio::MemoryAttr, const unsigned int&, int, std::_Bind_helper<false, bool (nocc::oltp::RRpc::*)(char*, int, int), nocc::oltp::RRpc*&, const std::_Placeholder<1>&, const std::_Placeholder<2>&, const std::_Placeholder<3>&>::type)'
std::placeholders::_1, std::placeholders::_2, std::placeholders::_3));

/src/core/rworker.cc:216:46: error: invalid new-expression of abstract class type 'nocc::Adapteworker_id,queue);
In file included from /home/DRTMH/src/core/rworker.cc:11:0

CMakeFiles/noccocc.dir/build.make:878: recipe for target 'CMakeFiles/noccocc.dir/src/core/rworker.cc.o' failed
make[3]: *** [CMakeFiles/noccocc.dir/src/core/rworker.cc.o] Error 1
CMakeFiles/Makefile2:107: recipe for target 'CMakeFiles/noccocc.dir/all' failed
make[2]: *** [CMakeFiles/noccocc.dir/all] Error 2
CMakeFiles/Makefile2:119: recipe for target 'CMakeFiles/noccocc.dir/rule' failed
make[1]: *** [CMakeFiles/noccocc.dir/rule] Error 2
Makefile:177: recipe for target 'noccocc' failed
make: *** [noccocc] Error 2

I hope you can help with this. Thank you!

Segmentation Fault when running the sample code

Hi,

I am trying to run the sample script, but I hit a segmentation fault. Do you have any suggestions for resolving this issue?

I used the hosts.xml file and config.xml file as mentioned in the README. Here is the command I use:
./run2.py config.xml noccocc "-t 24 -c 10 -r 100" tpcc 3

The output looks like this (I added some extra output of my own):

[START] Input parsing done.
[START] cleaning remaining processes.
ssh -n -f nerv2 "cd /home/chao/git_repos/rocc/scripts/ && rm log"
ssh -n -f nerv2 "cd /home/chao/git_repos/rocc/scripts/ && ./noccocc --bench tpcc --txn-flags 1 --verbose --config config.xml --id 1 -t 24 -c 10 -r 100 -p 3 1>log 2>&1 &"
ssh -n -f nerv3 "cd /home/chao/git_repos/rocc/scripts/ && rm log"
ssh -n -f nerv3 "cd /home/chao/git_repos/rocc/scripts/ && ./noccocc --bench tpcc --txn-flags 1 --verbose --config config.xml --id 2 -t 24 -c 10 -r 100 -p 3 1>log 2>&1 &"
cd /home/chao/git_repos/rocc/scripts/ && ./noccocc --bench tpcc --txn-flags 1 --verbose --config config.xml --id 0 -t 24 -c 10 -r 100 -p 3
NOCC started with program [noccocc]. at 06-09-2018 09:36:48
[tpcc] settings:
new_order_remote_item_pct : 1
uniform_item_dist : 0
micro dist :20
[bench_runner.cc:324] Use TCP port 8888
[bench_runner.cc:346] use scale factor: 72; with total 24 threads.
[view.h:48] Start with 0 backups.
[view.cc:10] total 3 backups to assign
Txn NewOrder, 100
Remote counts: 100
NAIVE: 4[util.cc:164] huge page alloc failed!
[librdma] get device name mlx4_0, idx 0
[librdma] : Device 0 has 1 ports
[bench_runner.cc:153] Total logger area 0.00585938G.
[bench_runner.cc:163] add RDMA store size 4.88281G.
[bench_runner.cc:172] [Mem] RDMA heap size 8.03902G.
[util.cc:164] huge page alloc failed!
[util.cc:164] huge page alloc failed!
[NOCC] Meet a segmentation fault!
stack trace:
./noccocc() [0x4b3bb8]
/lib64/libc.so.6 : ()+0x35270
/lib64/libc.so.6 : ()+0x8981d
./noccocc : MemDB::AddSchema(int, TABLE_CLASS, int, int, int, int, bool)+0x105
./noccocc : nocc::oltp::tpcc::TpccMainRunner::init_store(MemDB&)+0xe0
./noccocc : nocc::oltp::BenchRunner::run()+0x3d4
./noccocc : nocc::oltp::tpcc::TpccTest(int, char*)+0x143
./noccocc : main()+0x589
/lib64/libc.so.6 : __libc_start_main()+0xf5
./noccocc() [0x47813c]

Thanks!

link_connect_qps() retries to link all qps forever

Hi rocc developers,

I am trying to use the rocc framework in my own code to support RDMA-based communication. Basically, I use the RWorker class as the base class in my own code to model thread creation and routine scheduling. My thread class inherits from RWorker, just as bench_workers does in rocc's code. However, I ran into some difficulties using the rocc framework and the librdma library.

As a simple demo, I started one server node and one client node. The server node spawns 4 RWorkers and the client node spawns 3 RWorkers. By the time the client has finished initializing all the 4 RWorkers, some of the server workers are stuck in the rdmaio::RdmaCtrl::link_connect_qps() function: they cannot connect their QPs to the other node, so they retry forever (see the while(1) loop in link_connect_qps()). Essentially, PreConnector::get_send_socket(), which is called by Qp::connect_rc(), always returns a negative socket value, which triggers the next retry. Even though the recv_thread spawned by the librdma library consistently accepts new TCP connection requests, get_send_socket() consistently fails.

I noticed that link_connect_qps() retries every 200ms until all QPs in the cluster are linked. Is this guaranteed to work correctly? In my case, it retries forever. I am wondering if you have any ideas that could help me solve this issue. Thank you!
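
For readers debugging a similar hang, one way to make the failure visible is to bound the retry loop instead of spinning in while(1). The following is only a minimal sketch under assumptions: connect_with_retry_limit and its try_connect callback are hypothetical names, not part of rocc or librdma, and the 200ms delay mentioned in the post would be passed in by the caller.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical helper (not a rocc or librdma API): calls `try_connect`
// until it reports success or `max_attempts` attempts have been made.
// Returns the number of attempts used on success, or -1 after giving up.
inline int connect_with_retry_limit(const std::function<bool()>& try_connect,
                                    int max_attempts,
                                    std::chrono::milliseconds delay) {
  assert(max_attempts > 0);
  for (int attempt = 1; attempt <= max_attempts; ++attempt) {
    if (try_connect()) {
      return attempt;  // connected on this attempt
    }
    if (attempt < max_attempts) {
      std::this_thread::sleep_for(delay);  // back off before the next try
    }
  }
  return -1;  // caller can now log which peer could not be reached
}
```

With a bound in place, a worker that never connects returns -1 instead of hanging, so the caller can report the unreachable peer and stop.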

Issue with rlib: LOG_ERROR not found

Hi,

When I clone the repository and try to build it as described in the README, make stops at some point because it cannot find LOG_ERROR in drtmh/third_party/rlib/ud_adapter.hpp.

One quick workaround seems to be defining it in the .hpp file (#define LOG_ERROR 5, based on the value given in drtmh/src/core/logging.h). It is probably not ideal, but it seems to work.

Thanks.
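
A slightly safer form of the workaround described above is to guard the definition, so that it only takes effect when no earlier header has already defined the macro. This is a sketch, assuming LOG_ERROR is a preprocessor macro and that 5 is the right value, as the post reports from drtmh/src/core/logging.h. If the project defines it some other way (for example as an enum constant), the guard would not detect it.

```cpp
// Fallback definition for builds where LOG_ERROR is not visible.
// The value 5 is taken from the post above, not verified independently.
#ifndef LOG_ERROR
#define LOG_ERROR 5
#endif
```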

ASSERT(cm_->open_thread_local_device(idx) != nullptr) in src/core/rworker.cc

Hi,

Have you seen this assertion before? Just before it fires, there are warning messages saying that port_id 1 on device 1 is not active.

In order to build the project, I used the suggested flags (cmake -DUSE_RDMA=1 -DONE_SIDED_READ=1 -DROCC_RBUF_SIZE_M=13240 -DRDMA_STORE_SIZE=5000 -DRDMA_CACHE=0 -DTX_LOG_STYLE=2).

When I run ./run2.py config.xml noccocc "-t 24 -c 10 -r 100" bank 2 (using the default config.xml, with two hostnames added to hosts.xml), I get the output below.

I have also set use_port_ to 0 in RWorker::choose_rnic_port(), as suggested in #2, since I have 1 NIC per machine. Furthermore, I have made the change described in #4.

I would appreciate any feedback.

Thank you.

Output:

NOCC started with program [noccocc]. at 08-06-2021 11:04:12
[bench_runner.cc:303] Use TCP port 33333
[bench_runner.cc:325] use scale factor: 24; with total 24 threads.
[view.h:48] Start with 0 backups.
[view.cc:10] total 2 backups to assign
[Bank]: check workload 25, 15, 15, 15, 15, 15
[util.cc:167] huge page real size 12.9316G
[rnic.hpp:60] query port_id 1 on device 1 not active.
[bench_runner.cc:135] Total logger area 0.00390625G.
[bench_runner.cc:146] add RDMA store size 4.88281G.
[bench_runner.cc:156] First 4.88867G are left over.
[bench_runner.cc:159] RDMA heap size 8.041G.
[util.cc:167] huge page real size 0.294922G
[util.cc:167] huge page real size 0.294922G
[Bank], total 4800000 accounts loaded
[bank_main.cc:262] check cv balance 46280
[Runner] local db size: 220.746 MB
[Runner] Cache size: 0 MB
[bench_runner.cc:210] backed list num: 0
[bench_listener2.cc:70] try log results to ./results/noccocc_bank_2_24_10_100.log
[rnic.hpp:60] query port_id 1 on device 1 not active.
[rnic.hpp:60] query port_id 1 on device 1 not active.
[rnic.hpp:60] query port_id 1 on device 1 not active.
[rnic.hpp:60] query port_id 1 on device 1 not active.
[rnic.hpp:60] query port_id 1 on device 1 not active.
[rnic.hpp:60] query port_id 1 on device 1 not active.
[rnic.hpp:60] query port_id 1 on device 1 not active.
[rnic.hpp:60] query port_id 1 on device 1 not active.
[rnic.hpp:60] query port_id 1 on device 1 not active.
[rnic.hpp:60] query port_id 1 on device 1 not active.
[rnic.hpp:60] query port_id 1 on device 1 not active.
[rnic.hpp:60] query port_id 1 on device 1 not active.
[rnic.hpp:60] query port_id 1 on device 1 not active.
[rnic.hpp:60] query port_id 1 on device 1 not active.
[rnic.hpp:60] query port_id 1 on device 1 not active.
[rnic.hpp:60] query port_id 1 on device 1 not active.
[rnic.hpp:60] query port_id 1 on device 1 not active.
[rnic.hpp:60] query port_id 1 on device 1 not active.
[rnic.hpp:60] query port_id 1 on device 1 not active.
[rnic.hpp:60] query port_id 1 on device 1 not active.
[rnic.hpp:60] query port_id 1 on device 1 not active.
[rnic.hpp:60] query port_id 1 on device 1 not active.
[rnic.hpp:60] query port_id 1 on device 1 not active.
[rnic.hpp:60] query port_id 1 on device 1 not active.
[rdma_ctrl_impl.hpp:82] wrong dev_id: -1; total 2 found
[rworker.cc:106] Assertion!
[rnic.hpp:60] query port_id 1 on device 1 not active.
[NOCC] Meet an assertion failure!
stack trace:
[rnic.hpp:60] query port_id 1 on device 1 not active.
[rnic.hpp:60] query port_id 1 on device 1 not active.
[rnic.hpp:60] query port_id 1 on device 1 not active.
./noccocc() [0x4c0225]
/lib/x86_64-linux-gnu/libc.so.6 : ()+0x354c0
/lib/x86_64-linux-gnu/libc.so.6 : gsignal()+0x38
/lib/x86_64-linux-gnu/libc.so.6 : abort()+0x16a
./noccocc : nocc::MessageLogger::~MessageLogger()+0x2ee
./noccocc : nocc::oltp::RWorker::init_rdma(char*, unsigned long)+0x452
./noccocc : nocc::oltp::BenchWorker::run()+0x2d1
./noccocc : ndb_thread::pthread_bootstrap(void*)+0xf
/lib/x86_64-linux-gnu/libpthread.so.0 : ()+0x76ba
/lib/x86_64-linux-gnu/libc.so.6 : clone()+0x6d
[ENDING] End benchmarks
[ENDING] send ending messages in SIGINT handler
[ENDING] kill processes
node0 password:
node1 password:
kill try 0
node0 password:
node1 password:
Kill done
[ENDING] kill processes done
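
The log above reports an inactive port on device 1 followed by "wrong dev_id: -1", which suggests no usable device/port pair was selected. As an illustration only, the sketch below picks the first active port from a table instead of hard-coding an index. PortInfo and first_active_port are hypothetical names, not librdma types. In a real setup the table would have to be filled from the RDMA stack (for example with libibverbs' ibv_query_port), which this sketch does not do.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical record of one queried port; not a librdma type.
struct PortInfo {
  int dev_id;
  int port_id;
  bool active;
};

// Returns the first (dev_id, port_id) whose port is active,
// or (-1, -1) if no port in the table is active.
inline std::pair<int, int> first_active_port(const std::vector<PortInfo>& ports) {
  for (const PortInfo& p : ports) {
    assert(p.dev_id >= 0 && p.port_id >= 0);  // table entries must be valid ids
    if (p.active) {
      return {p.dev_id, p.port_id};
    }
  }
  return {-1, -1};
}
```

A caller would check for the (-1, -1) result and fail with a clear message before trying to open a thread-local device.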
