qfs's Issues

File size increases enormously when appending

Recently I ran two tests to check the append functionality of KFS, one from the command line and the other from the Java API; both failed. The tests are below:

Command line:

  1. Created file.txt with the content "abcdefghi"; the size shown was 11 bytes.
  2. Created file1.txt as a copy of file.txt.
  3. In kfsshell, ran the command "append file.txt file1.txt".
  4. Got the message "append status: offset: 402653184"; file.txt's size changed to 0 and file1.txt's changed to 402653184!
    file.txt <rs 1,6+3> rw-rw-r-- jeff jeff 0 Aug 13 14:21
    file1.txt <rs 1,6+3> rw-rw-r-- jeff jeff 402653195 Aug 13 14:21
  5. Running the "qfscat" command produced the following error messages:
    /qfs/tmp/file1.txt: Input/output error 5
    08-13-2013 15:16:43.740 ERROR - (Reader.cc:1637) PW 0,52,3,file1.txt no such chunk -- hole: pos: 67108864 + 0 requested: 720896
    08-13-2013 15:16:43.740 INFO - (RSStriper.cc:1806) PW 0,52,3,file1.txt read failure: req: 0,4194304 status: -1007 stripe: 1 bad: 0 round: 0
    08-13-2013 15:16:43.740 INFO - (RSStriper.cc:1892) PW 0,52,3,file1.txt init recovery: req: 0,4194304 pos: 0 size: 720896 [0,720896)
    08-13-2013 15:16:43.740 ERROR - (Reader.cc:1637) PW 0,52,3,file1.txt no such chunk -- hole: pos: 134217728 + 0 requested: 720896
    08-13-2013 15:16:43.740 INFO - (RSStriper.cc:1806) PW 0,52,3,file1.txt read failure: req: 0,4194304 status: -1007 stripe: 2 bad: 1 round: 0
    08-13-2013 15:16:43.740 ERROR - (Reader.cc:1637) PW 0,52,3,file1.txt no such chunk -- hole: pos: 201326592 + 0 requested: 720896
    08-13-2013 15:16:43.740 INFO - (RSStriper.cc:1806) PW 0,52,3,file1.txt read failure: req: 0,4194304 status: -1007 stripe: 3 bad: 2 round: 0
    08-13-2013 15:16:43.740 ERROR - (Reader.cc:1637) PW 0,52,3,file1.txt no such chunk -- hole: pos: 268435456 + 0 requested: 655360
    08-13-2013 15:16:43.740 INFO - (RSStriper.cc:1806) PW 0,52,3,file1.txt read failure: req: 0,4194304 status: -1007 stripe: 4 bad: 3 round: 0
    08-13-2013 15:16:43.740 ERROR - (Reader.cc:1637) PW 0,52,3,file1.txt no such chunk -- hole: pos: 335544320 + 0 requested: 655360
    08-13-2013 15:16:43.740 INFO - (RSStriper.cc:1806) PW 0,52,3,file1.txt read failure: req: 0,4194304 status: -1007 stripe: 5 bad: 4 round: 0
    08-13-2013 15:16:43.770 ERROR - (Reader.cc:1294) PW 0,52,3,file1.txt short read detected: chunk: 345 version: 1 server: 127.0.0.1 21002 pos: 0 requested: 720896 returned: 11 size: 11
    08-13-2013 15:16:43.770 WARN - (Reader.cc:2196) PW 0,52,3,file1.txt invalid chunk: 345 version: 1 status: -1007 msg: short read detected
    08-13-2013 15:16:43.770 ERROR - (Reader.cc:1468) PW 0,52,3,file1.txt operation failure, seq: 580520357837216307 status: -1007 msg: short read detected op: read: chunkid: 345 version: 1 offset: 0 numBytes: 720896 iotm: 0.013235 skip-disk-chksum current chunk server: 127.0.0.1 21002 chunkserver: all data sent
    Request:
    READ
    Cseq: 580520357837216307
    Version: KFS/1.0
    Client-Protocol-Version: 114
    UserId: 500
    GroupId: 500
    User: jeff
    Max-wait-ms: 30000
    Chunk-handle: 345
    Chunk-version: 1
    Offset: 0
    Num-bytes: 720896
    Skip-Disk-Chksum: 1
    08-13-2013 15:16:43.770 INFO - (Reader.cc:1583) PW 0,52,3,file1.txt scheduling retry: 0 of 30 in 0 sec. op: read: chunkid: 345 version: 1 offset: 0 numBytes: 720896 iotm: 0.013235 skip-disk-chksum
    08-13-2013 15:16:43.778 ERROR - (Reader.cc:1294) PW 0,52,3,file1.txt short read detected: chunk: 345 version: 1 server: 127.0.0.1 21001 pos: 0 requested: 720896 returned: 11 size: 11
    08-13-2013 15:16:43.778 WARN - (Reader.cc:2196) PW 0,52,3,file1.txt invalid chunk: 345 version: 1 status: -1007 msg: short read detected
    08-13-2013 15:16:43.778 INFO - (RSStriper.cc:1806) PW 0,52,3,file1.txt read failure: req: 0,4194304 status: -1007 stripe: 0 bad: 5 round: 0
    08-13-2013 15:16:43.778 INFO - (RSStriper.cc:1976) PW 0,52,3,file1.txt read recovery failed: req: 0,4194304 status: -5 round: 0 bad stripes: 6 turning on read retries
    08-13-2013 15:16:43.778 ERROR - (RSStriper.cc:2016) PW 0,52,3,file1.txt read recovery failed: req: 0,4194304 status: 0 bad stripes: 6 invalid chunks: 6 pending: 0
  6. cpfromqfs failed with the same message, but succeeded with the "-S" argument.

API:

  1. Created file.txt with kfsAccess.kfs_create(String fileName).
  2. Appended to it with kfsAccess.kfs_append(String fileName); it returned:
    08-13-2013 14:29:39.571 INFO - (WriteAppender.cc:1436) PW 0,49,2,file.txt scheduling retry: 1 of 30 in 5 sec. op: allocate: fid: 49 offset: 0
    08-13-2013 14:29:44.576 ERROR - (WriteAppender.cc:1333) PW 0,49,2,file.txt operation failure, seq: 1968023330218062915 status: -22 msg: append is not supported with striped files op: allocate: fid: 49 offset: 0 current chunk server: -1 chunkserver: no data sent
    Request:
    ALLOCATE
    Cseq: 1968023330218062914
    Version: KFS/1.0
    Client-Protocol-Version: 114
    UserId: 500
    GroupId: 500
    User: jeff
    Max-wait-ms: 30000
    Client-host: centos.localdomain
    Pathname: /qfs/tmp/file.txt
    File-handle: 49
    Chunk-offset: 0
    Chunk-append: 1
    Space-reserve: 11
    Max-appenders: 64
  3. Created file.txt with STRIPED_FILE_TYPE_NONE:
     kfsAccess.kfs_create_ex(fileName,
         kfsAccess.DEFAULT_APPEND_REPLICATION,
         false, -1, -1,
         kfsAccess.DEFAULT_NUM_STRIPES,
         kfsAccess.DEFAULT_NUM_RECOVERY_STRIPES,
         kfsAccess.DEFAULT_STRIPE_SIZE,
         1, // STRIPED_FILE_TYPE_NONE
         false,
         0666);
     Now I have a file whose stripe type is 'r', different from the one created from the command line ('rs'):
     file.txt <r 2> rw-rw-r-- jeff jeff 11 Aug 13 14:33
  4. Tried the 2nd step again; it succeeded, but the size changed to 134217728:
     file.txt <r 2> rw-rw-r-- jeff jeff 134217728 Aug 13 14:34
  5. Catting the content returned an error:
    08-13-2013 14:38:09.537 ERROR - (Reader.cc:1294) PW 0,50,3,file.txt short read detected: chunk: 336 version: 1 server: 127.0.0.1 21002 pos: 1048576 requested: 1048576 returned: 0 size: 11
    08-13-2013 14:38:09.537 WARN - (Reader.cc:2196) PW 0,50,3,file.txt invalid chunk: 336 version: 1 status: -1007 msg: short read detected
    08-13-2013 14:38:09.537 ERROR - (Reader.cc:1468) PW 0,50,3,file.txt operation failure, seq: 0 status: -1007 msg: short read detected op: read: chunkid: 336 version: 1 offset: 1048576 numBytes: 1048576 iotm: 0 skip-disk-chksum current chunk server: 127.0.0.1 21002 chunkserver: all data sent
    Request:
    READ
    Cseq: 0
    Version: KFS/1.0
    Client-Protocol-Version: 114
    UserId: 500
    GroupId: 500
    User: jeff
    Chunk-handle: 336
    Chunk-version: 1
    Offset: 1048576
    Num-bytes: 1048576
    Skip-Disk-Chksum: 1
  6. cpfromqfs failed without the "-S" argument.

I have three questions:

  1. Why can a file created in 'rs' (striped) mode from the command line be appended to, while the API fails? Once a file has been created, we normally cannot guarantee it will never be appended to later; so if 'rs' mode works for appending (as the shell command appears to), why do we have to create files in 'r' mode? (See the sketch after this list.)
  2. Why does the size show correctly before the file is appended to, but incorrectly afterwards? I found an explanation saying this is because the stripe size increased, but how can I compute the exact quota consumed? Is there any way to shrink it back to the right number?
  3. The cpfromqfs command has a '-S' argument to skip holes and return the useful content; is there an equivalent for qfscat and the API?
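
For reference, here is the working path in C++ form: create the file non-striped (plain replication) and then append, matching the behavior observed above. This is only a sketch; it assumes KfsClient's Open and AtomicRecordAppend entry points from src/cc/libclient with parameter lists abbreviated to their defaults, and it assumes those defaults yield a non-striped 'r' file.

// Sketch only: create a replicated (non-striped) file and append to it.
// Open()'s defaulted striping parameters are assumed to produce an 'r' file.
#include <fcntl.h>
#include <kfs/KfsClient.h>

int main() {
    KFS::KfsClient* client = KFS::Connect("localhost", 20000);
    if (!client) return 1;
    // Append is reported to work only on non-striped files.
    int fd = client->Open("/qfs/tmp/file.txt", O_CREAT | O_WRONLY | O_APPEND);
    if (fd < 0) return 1;
    const char data[] = "abcdefghi";
    int res = client->AtomicRecordAppend(fd, data, sizeof(data) - 1);
    client->Close(fd);
    return res < 0 ? 1 : 0;
}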

QFS Hibernate - "-f" option - No such file or directory

QFS 1.1.1 - I'm attempting to hibernate a QFS chunk server but don't know what to enter for the "-f" option. I've tried the qfs_chunk.prp and qfs_meta.prp files. Here is what I'm attempting:

./qfshibernate -m metaIP -p 20000 -c chunkIP -d 30000 -s 120 -v -f /etc/qfs_chunk.prp
02-09-2015 23:11:24.869 DEBUG - version: 1.1.1-f94fead4fb99abd3837059205b589264ff43818b-Debug-18B26807 1.1.1-https://github.com/quantcast/qfs.git/master@f94fead4fb99abd3837059205b589264ff43818b
02-09-2015 23:11:24.873 DEBUG - (KfsNetClient.cc:1265) connecting to server: 10.30.0.75 20000 auth: off
02-09-2015 23:11:24.874 DEBUG - (MonClient.cc:164) op completed: retire chunk server: 10.30.0.91 30000 down time: 120 status: -2 status: -2 msg:
02-09-2015 23:11:24.874 ERROR - (qfshibernate_main.cc:123) hibernate failure: No such file or directory 2

Can someone point me in the right direction for the correct syntax?

Thank you.

Z

compile ARM - qcrs/prim.h

Trying to compile on an ARM box:

In file included from /root/qfs/src/cc/qcrs/rs_table.h:33:0,
from /root/qfs/src/cc/qcrs/decode.c:32:
/root/qfs/src/cc/qcrs/prim.h: In function ‘mask’:
/root/qfs/src/cc/qcrs/prim.h:41:5: error: incompatible types when returning type ‘int’ but ‘v16’ was expected
/root/qfs/src/cc/qcrs/prim.h: In function ‘mul2’:
/root/qfs/src/cc/qcrs/prim.h:49:8: error: incompatible types when assigning to type ‘v16’ from type ‘int’

Not sure if this has to do with SSE functions?

I think the CMakeLists.txt files might need to be updated to support ARM?
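
For illustration, a minimal sketch of the kind of mismatch the compiler reports, assuming v16 is a 16-byte GCC vector type as in prim.h (the names and bodies here are illustrative, not the actual prim.h code):

// Illustrative: v16 as a 16-byte GCC vector type, similar to what prim.h uses.
// On x86, functions like mask/mul2 lean on SSE builtins that yield a v16; if
// an ARM build falls through to a plain int expression, GCC reports
// "incompatible types" because int does not implicitly convert to a vector.
#include <cstdint>
#include <cstring>

typedef uint8_t v16 __attribute__((vector_size(16)));

static v16 broadcast(uint8_t x) {
    // Portable form: construct the vector explicitly instead of via intrinsics.
    v16 r = { x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x };
    return r;
}

int main() {
    v16 v = broadcast(3);
    uint8_t out[16];
    std::memcpy(out, &v, sizeof(v));
    return out[0] == 3 ? 0 : 1;
}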

Please advise.

-alex

throughput is only 50 MB/s

network speed: 1000 Mb/s
replication: 3

sample code:

int localfid = open("tests/testlog500m", O_RDONLY);

do {
    readlocallen = read(localfid, dataBuf, writeSizeBytes);
    strm.avail_in = readlocallen;
    if (readlocallen > 0) {
        gettimeofday(&s1, NULL);
        // Write only the bytes actually read; the original passed
        // writeSizeBytes even on a short read.
        res = gKfsClient->Write(fd, dataBuf, readlocallen);
        gettimeofday(&s2, NULL);
        timediff = (s2.tv_sec * 1000000.0 + s2.tv_usec) -
                   (s1.tv_sec * 1000000.0 + s1.tv_usec);
        timeall += timediff;
        // Count only bytes actually written (was incremented unconditionally).
        nwrote += readlocallen;
    }
} while (readlocallen > 0);
close(localfid);
cout << "write of " << nwrote / (1024 * 1024) / timeall * 1000000 << " (MB/s) is done" << endl;

Why is the throughput so low?

(It is my first time using GitHub; sorry.)

recordappend_test fails

It would appear that any call to gKfsClient->AtomicRecordAppend fails.

This seems to fail because when calling gKfsClient->Open, entry.openmode doesn't get set to O_APPEND even when it is specified.
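
For reference, a minimal sketch of the failing sequence as I understand it, assuming the KfsClient calls named above (this is a reproduction outline, not a fix; the path is a placeholder):

// Open with O_APPEND, then record-append; per the report, the cached open
// mode loses O_APPEND, so the AtomicRecordAppend call fails.
#include <fcntl.h>
#include <kfs/KfsClient.h>

int main() {
    KFS::KfsClient* client = KFS::Connect("localhost", 20000);
    if (!client) return 1;
    int fd = client->Open("/test/appendfile", O_CREAT | O_WRONLY | O_APPEND);
    if (fd < 0) return 1;
    const char rec[] = "record\n";
    int res = client->AtomicRecordAppend(fd, rec, sizeof(rec) - 1);  // reportedly fails here
    client->Close(fd);
    return res < 0 ? 1 : 0;
}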

Anyone else run into this issue?

-alex

Compile warning on chunk/utils.cc

Compiling on F18 produces the following warning:
cd /home//rpmbuild/BUILD/qfs-1.0.2/src/cc/tests && /usr/bin/cmake -E cmake_link_script CMakeFiles/dirscan_test.dir/link.txt --verbose=1
/usr/bin/c++ -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m32 -march=i686 -mtune=atom -fasynchronous-unwind-tables -Wall -DBOOST_SP_USE_QUICK_ALLOCATOR -g -Wl,-z,relro CMakeFiles/dirscan_test.dir/dirscan_test_main.o -o dirscan_test -rdynamic ../tools/libqfs_tools.a ../libclient/libqfs_client.a ../qcdio/libqfs_qcdio.a -lpthread ../kfsio/libqfs_io.a ../common/libqfs_common.a ../qcdio/libqfs_qcdio.a -lpthread -lz -lboost_regex-mt -lrt -lcrypto ../qcrs/libqfs_qcrs.a
/home//rpmbuild/BUILD/qfs-1.0.2/src/cc/chunk/utils.cc: In function 'void KFS::die(const string&)':
/home//rpmbuild/BUILD/qfs-1.0.2/src/cc/chunk/utils.cc:60:37: warning: ignoring return value of 'ssize_t write(int, const void*, size_t)', declared with attribute warn_unused_result [-Wunused-result]

Implement C Bindings for the QFS API

There are several cases where having a straight C binding for QFS would ease creating wrappers for use in other languages or projects where a C++ compiler is either inconvenient or unavailable.
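
As a sketch of what such a surface could look like, a thin extern "C" wrapper over the C++ client might export an opaque handle and plain functions. All names below are hypothetical, not an existing QFS API:

// Hypothetical C binding surface over the C++ KfsClient; names illustrative.
#include <kfs/KfsClient.h>

extern "C" {
    typedef struct qfs_handle qfs_handle;  // opaque to C callers

    qfs_handle* qfs_connect(const char* host, int port) {
        return reinterpret_cast<qfs_handle*>(KFS::Connect(host, port));
    }
    int qfs_open(qfs_handle* h, const char* path, int flags) {
        return reinterpret_cast<KFS::KfsClient*>(h)->Open(path, flags);
    }
    int qfs_close(qfs_handle* h, int fd) {
        return reinterpret_cast<KFS::KfsClient*>(h)->Close(fd);
    }
}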

Implemented a caching API to query chunkserver status

GitHub user echenj (https://github.com/echenj) has created an issue on our public GitHub Issues list (https://github.com/quantcast/qfs/issues).

This ticket is a mirror of the first post of the public ticket on GitHub. The purpose of this ticket is to have an owner of the public response. The owner is responsible for researching the answer and responding to the public issue in a timely manner.

#46

Issue Description

The metaserver ping function has been modified to include a caching system. The ping function itself has also been modified to return a Status object instead of modifying the Status passed as a parameter. The cache refresh time is 30 seconds, but the cache will not refresh unless a user asks for data. The added queries pull information from the cache.
The three new queries are:

  • /query/stats/<hostname/ip>
    • Displays all of the attributes for any UpServer instance
  • /query/dead/<hostname/ip>
    • Displays the current status of any server, and also the entire disconnect history
  • /query/dead/count
    • Displays all of the servers that have any recorded disconnects and their disconnect counts

Supporting Multiple Users From Hadoop

I am wondering if there are plans to support multiple users from Hadoop. Currently, it seems that a MapReduce task that opens a QFS file system will assume the uid and gid of the task JVM.

In our case that means all QFS file interactions were occurring as the mapred user. Since we are integrating QFS into an existing cluster that has multiple users owning files in HDFS, making all files accessible (and owned) by mapred was not acceptable.

Our approach was to modify the initialize() method of the QuantcastFileSystem class to check for some Hadoop-side information to try to determine the Hadoop user instead of just the user running the JVM. We first created a check for three config variables for user ID, group ID, and alternate group IDs (qfs.uid, qfs.gid, qfs.gids). If these were found, we called a new QFSImpl constructor with the values. If the config variables were not found, we pulled the user and groups from the Hadoop UserGroupInformation class and passed those down to a second new QFSImpl constructor. In both cases, we make calls down into the C++ layer to set the effective user and group.

The reason we created the two different pathways is that some of our users do not exist as local accounts on all of our Hadoop nodes, so looking up the Hadoop user and group locally to get the uid and gid would fail. Explicitly setting the IDs in the Hadoop config variables gave these users a way to create and read their files on every node.
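
For context, the C++ layer these calls land in exposes an effective-identity override along the following lines. This is only a sketch: the SetEUserAndEGroup signature is assumed from libclient and may differ by version, and the host and IDs are placeholders.

// Sketch: override the client's effective user/group instead of inheriting
// the JVM process uid/gid. Signature assumed; host and IDs are placeholders.
#include <kfs/KfsClient.h>

int main() {
    KFS::KfsClient* client = KFS::Connect("metaserver.example", 20000);
    if (!client) return 1;
    kfsGid_t groups[] = { 1001, 1002 };  // hypothetical alternate group IDs
    client->SetEUserAndEGroup(/*user=*/1000, /*group=*/1001, groups, 2);
    return 0;
}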

What other ways are people handling multiple users and file owners?

Error when running mapreduce on QFS

Hi,

I have an issue when running MapReduce on QFS.
Everything works just fine when using bin/hadoop fs commands.
When I try to run wordcount, it says:

15/06/24 09:48:40 INFO client.RMProxy: Connecting to ResourceManager at localhost/127.0.0.1:18040
15/06/24 09:48:41 INFO input.FileInputFormat: Total input paths to process : 1
15/06/24 09:48:41 INFO mapreduce.JobSubmitter: number of splits:1
15/06/24 09:48:41 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1435157289542_0001
15/06/24 09:48:41 INFO impl.YarnClientImpl: Submitted application application_1435157289542_0001
15/06/24 09:48:41 INFO mapreduce.Job: The url to track the job: http://localhost:8088/proxy/application_1435157289542_0001/
15/06/24 09:48:41 INFO mapreduce.Job: Running job: job_1435157289542_0001
15/06/24 09:48:47 INFO mapreduce.Job: Job job_1435157289542_0001 running in uber mode : false
15/06/24 09:48:47 INFO mapreduce.Job: map 0% reduce 0%
15/06/24 09:48:52 INFO mapreduce.Job: Task Id : attempt_1435157289542_0001_m_000000_0, Status : FAILED
Exception from container-launch: ExitCodeException exitCode=1:
ExitCodeException exitCode=1:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
at org.apache.hadoop.util.Shell.run(Shell.java:455)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:702)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:300)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:81)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

Container exited with a non-zero exit code 1

15/06/24 09:48:57 INFO mapreduce.Job: Task Id : attempt_1435157289542_0001_m_000000_1, Status : FAILED
Exception from container-launch: ExitCodeException exitCode=1:
ExitCodeException exitCode=1:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
at org.apache.hadoop.util.Shell.run(Shell.java:455)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:702)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:300)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:81)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

Container exited with a non-zero exit code 1

15/06/24 09:49:02 INFO mapreduce.Job: Task Id : attempt_1435157289542_0001_m_000000_2, Status : FAILED
Exception from container-launch: ExitCodeException exitCode=1:
ExitCodeException exitCode=1:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
at org.apache.hadoop.util.Shell.run(Shell.java:455)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:702)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:300)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:81)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

Container exited with a non-zero exit code 1

15/06/24 09:49:09 INFO mapreduce.Job: map 100% reduce 100%
15/06/24 09:49:09 INFO mapreduce.Job: Job job_1435157289542_0001 failed with state FAILED due to: Task failed task_1435157289542_0001_m_000000
Job failed as tasks failed. failedMaps:1 failedReduces:0

15/06/24 09:49:09 INFO mapreduce.Job: Counters: 12
Job Counters
Failed map tasks=4
Launched map tasks=4
Other local map tasks=3
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=11945
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=11945
Total vcore-seconds taken by all map tasks=11945
Total megabyte-seconds taken by all map tasks=12231680
Map-Reduce Framework
CPU time spent (ms)=0
Physical memory (bytes) snapshot=0
Virtual memory (bytes) snapshot=0

The container log is:
/staging/fubo/.staging/job_1435157289542_0002/job_1435157289542_0002_1.jhist to qfs://localhost:20000/tmp/hadoop-yarn/staging/history/done_intermediate/fubo/job_1435157289542_0002-1435157856115-fubo-word+mean-1435157882595-0-0-FAILED-default-1435157860419.jhist_tmp
2015-06-24 09:58:02,623 INFO [eventHandlingThread] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Copied to done location: qfs://localhost:20000/tmp/hadoop-yarn/staging/history/done_intermediate/fubo/job_1435157289542_0002-1435157856115-fubo-word+mean-1435157882595-0-0-FAILED-default-1435157860419.jhist_tmp
2015-06-24 09:58:02,623 INFO [eventHandlingThread] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Copying qfs://localhost:20000/tmp/hadoop-yarn/staging/fubo/.staging/job_1435157289542_0002/job_1435157289542_0002_1_conf.xml to qfs://localhost:20000/tmp/hadoop-yarn/staging/history/done_intermediate/fubo/job_1435157289542_0002_conf.xml_tmp
2015-06-24 09:58:02,631 INFO [eventHandlingThread] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Copied to done location: qfs://localhost:20000/tmp/hadoop-yarn/staging/history/done_intermediate/fubo/job_1435157289542_0002_conf.xml_tmp
2015-06-24 09:58:02,632 INFO [eventHandlingThread] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Moved tmp to done: qfs://localhost:20000/tmp/hadoop-yarn/staging/history/done_intermediate/fubo/job_1435157289542_0002.summary_tmp to qfs://localhost:20000/tmp/hadoop-yarn/staging/history/done_intermediate/fubo/job_1435157289542_0002.summary
2015-06-24 09:58:02,632 INFO [eventHandlingThread] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Moved tmp to done: qfs://localhost:20000/tmp/hadoop-yarn/staging/history/done_intermediate/fubo/job_1435157289542_0002_conf.xml_tmp to qfs://localhost:20000/tmp/hadoop-yarn/staging/history/done_intermediate/fubo/job_1435157289542_0002_conf.xml
2015-06-24 09:58:02,633 INFO [eventHandlingThread] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Moved tmp to done: qfs://localhost:20000/tmp/hadoop-yarn/staging/history/done_intermediate/fubo/job_1435157289542_0002-1435157856115-fubo-word+mean-1435157882595-0-0-FAILED-default-1435157860419.jhist_tmp to qfs://localhost:20000/tmp/hadoop-yarn/staging/history/done_intermediate/fubo/job_1435157289542_0002-1435157856115-fubo-word+mean-1435157882595-0-0-FAILED-default-1435157860419.jhist
2015-06-24 09:58:02,633 INFO [Thread-53] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Stopped JobHistoryEventHandler. super.stop()
2015-06-24 09:58:02,636 INFO [Thread-53] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Setting job diagnostics to Task failed task_1435157289542_0002_m_000000
Job failed as tasks failed. failedMaps:1 failedReduces:0

2015-06-24 09:58:02,638 INFO [Thread-53] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: History url is http://localhost:19888/jobhistory/job/job_1435157289542_0002
2015-06-24 09:58:02,645 INFO [Thread-53] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Waiting for application to be successfully unregistered.
2015-06-24 09:58:03,646 INFO [Thread-53] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Final Stats: PendingReds:1 ScheduledMaps:0 ScheduledReds:0 AssignedMaps:0 AssignedReds:0 CompletedMaps:0 CompletedReds:0 ContAlloc:4 ContRel:0 HostLocal:1 RackLocal:0
2015-06-24 09:58:03,648 INFO [Thread-53] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Deleting staging directory qfs://localhost:20000 /tmp/hadoop-yarn/staging/fubo/.staging/job_1435157289542_0002
2015-06-24 09:58:03,650 INFO [Thread-53] org.apache.hadoop.ipc.Server: Stopping server on 55169
2015-06-24 09:58:03,653 INFO [IPC Server listener on 55169] org.apache.hadoop.ipc.Server: Stopping IPC Server listener on 55169
2015-06-24 09:58:03,653 INFO [IPC Server Responder] org.apache.hadoop.ipc.Server: Stopping IPC Server Responder
2015-06-24 09:58:03,653 INFO [TaskHeartbeatHandler PingChecker] org.apache.hadoop.mapreduce.v2.app.TaskHeartbeatHandler: TaskHeartbeatHandler thread interrupted

Can anyone tell me what the problem is? Please tell me if I need to provide more information.

Softlinks in qfs

Implement softlinks in QFS. Once this is done, move this to the multi-master branch.

Error compiling fuse module on Slackware

Hi,

I'm building the latest git version of QFS on Slackware.
It builds fine except for the fuse module, which fails with the following error:

kfs_fuse_main.cc:475:20: error: ‘fork’ was not declared in this scope

I solved the issue by also including unistd.h in kfs_fuse_main.cc.
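
The fix, as applied locally:

// Added near the top of kfs_fuse_main.cc: unistd.h declares fork(), which
// was not being pulled in transitively on this glibc/gcc combination.
#include <unistd.h>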

The machine I'm using has the following configuration

OS: Slackware 14.0 64bit
Compiler: gcc-4.7.1
Libs: glibc-2.15 fuse-2.8.5

Hope that helps, and sorry for my bad English.

Compile fails to find boost_system

Compiling on Fedora 18:
cd /home//rpmbuild/BUILD/qfs-1.0.2/src/cc/devtools && /usr/bin/cmake -E cmake_link_script CMakeFiles/stlset.dir/link.txt --verbose=1
/usr/bin/c++ -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m32 -march=i686 -mtune=atom -fasynchronous-unwind-tables -Wall -DBOOST_SP_USE_QUICK_ALLOCATOR -g -Wl,-z,relro CMakeFiles/stlset.dir/stlset_main.o -o stlset -rdynamic ../tools/libqfs_tools.a ../libclient/libqfs_client.a ../qcdio/libqfs_qcdio.a -lpthread ../kfsio/libqfs_io.a ../common/libqfs_common.a ../qcdio/libqfs_qcdio.a -lpthread -lz -lboost_regex-mt -lrt -lcrypto ../qcrs/libqfs_qcrs.a
CMakeFiles/stlset.dir/stlset_main.o: In function `thread_exception':
/usr/include/boost/thread/exceptions.hpp:49: undefined reference to `boost::system::system_category()'
/usr/include/boost/thread/exceptions.hpp:49: undefined reference to `boost::system::system_category()'
CMakeFiles/stlset.dir/stlset_main.o: In function `__static_initialization_and_destruction_0':
/usr/include/boost/system/error_code.hpp:214: undefined reference to `boost::system::generic_category()'
/usr/include/boost/system/error_code.hpp:215: undefined reference to `boost::system::generic_category()'
/usr/include/boost/system/error_code.hpp:216: undefined reference to `boost::system::system_category()'
collect2: error: ld returned 1 exit status
make[2]: *** [src/cc/devtools/stlset] Error 1
make[2]: Leaving directory `/home//rpmbuild/BUILD/qfs-1.0.2'
make[1]: *** [src/cc/devtools/CMakeFiles/stlset.dir/all] Error 2
make[1]: /usr/bin/cmake -E cmake_progress_report /home//rpmbuild/BUILD/qfs-1.0.2/CMakeFiles 8
*** Waiting for unfinished jobs....
[ 94%] Building CXX object src/cc/chunk/CMakeFiles/chunkserver.dir/utils.o
cd /home//rpmbuild/BUILD/qfs-1.0.2/src/cc/chunk && /usr/bin/c++ -DKFS_OS_NAME_LINUX -D_FILE_OFFSET_BITS=64 -D_LARGEFILE64_SOURCE -D_LARGEFILE_SOURCE -D_LARGE_FILES -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m32 -march=i686 -mtune=atom -fasynchronous-unwind-tables -Wall -DBOOST_SP_USE_QUICK_ALLOCATOR -g -I/home//rpmbuild/BUILD/qfs-1.0.2/src/cc -o CMakeFiles/chunkserver.dir/utils.o -c /home//rpmbuild/BUILD/qfs-1.0.2/src/cc/chunk/utils.cc
Linking CXX executable dirscan_test
cd /home//rpmbuild/BUILD/qfs-1.0.2/src/cc/tests && /usr/bin/cmake -E cmake_link_script CMakeFiles/dirscan_test.dir/link.txt --verbose=1
/usr/bin/c++ -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m32 -march=i686 -mtune=atom -fasynchronous-unwind-tables -Wall -DBOOST_SP_USE_QUICK_ALLOCATOR -g -Wl,-z,relro CMakeFiles/dirscan_test.dir/dirscan_test_main.o -o dirscan_test -rdynamic ../tools/libqfs_tools.a ../libclient/libqfs_client.a ../qcdio/libqfs_qcdio.a -lpthread ../kfsio/libqfs_io.a ../common/libqfs_common.a ../qcdio/libqfs_qcdio.a -lpthread -lz -lboost_regex-mt -lrt -lcrypto ../qcrs/libqfs_qcrs.a
/home//rpmbuild/BUILD/qfs-1.0.2/src/cc/chunk/utils.cc: In function 'void KFS::die(const string&)':
/home//rpmbuild/BUILD/qfs-1.0.2/src/cc/chunk/utils.cc:60:37: warning: ignoring return value of 'ssize_t write(int, const void*, size_t)', declared with attribute warn_unused_result [-Wunused-result]
/usr/bin/cmake -E cmake_progress_report /home//rpmbuild/BUILD/qfs-1.0.2/CMakeFiles
[ 94%] Building CXX object src/cc/chunk/CMakeFiles/chunkserver.dir/DirChecker.o
cd /home//rpmbuild/BUILD/qfs-1.0.2/src/cc/chunk && /usr/bin/c++ -DKFS_OS_NAME_LINUX -D_FILE_OFFSET_BITS=64 -D_LARGEFILE64_SOURCE -D_LARGEFILE_SOURCE -D_LARGE_FILES -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m32 -march=i686 -mtune=atom -fasynchronous-unwind-tables -Wall -DBOOST_SP_USE_QUICK_ALLOCATOR -g -I/home//rpmbuild/BUILD/qfs-1.0.2/src/cc -o CMakeFiles/chunkserver.dir/DirChecker.o -c /home//rpmbuild/BUILD/qfs-1.0.2/src/cc/chunk/DirChecker.cc
make[2]: Leaving directory `/home/<user>/rpmbuild/BUILD/qfs-1.0.2'
/usr/bin/cmake -E cmake_progress_report /home/<user>/rpmbuild/BUILD/qfs-1.0.2/CMakeFiles 11
[ 94%] Built target dirscan_test
/usr/bin/cmake -E cmake_progress_report /home/<user>/rpmbuild/BUILD/qfs-1.0.2/CMakeFiles 9
[ 95%] Building CXX object src/cc/chunk/CMakeFiles/chunkserver.dir/Chunk.o
cd /home/<user>/rpmbuild/BUILD/qfs-1.0.2/src/cc/chunk && /usr/bin/c++ -DKFS_OS_NAME_LINUX -D_FILE_OFFSET_BITS=64 -D_LARGEFILE64_SOURCE -D_LARGEFILE_SOURCE -D_LARGE_FILES -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m32 -march=i686 -mtune=atom -fasynchronous-unwind-tables -Wall -DBOOST_SP_USE_QUICK_ALLOCATOR -g -I/home/<user>/rpmbuild/BUILD/qfs-1.0.2/src/cc -o CMakeFiles/chunkserver.dir/Chunk.o -c /home/<user>/rpmbuild/BUILD/qfs-1.0.2/src/cc/chunk/Chunk.cc
Linking CXX executable chunkserver
cd /home/<user>/rpmbuild/BUILD/qfs-1.0.2/src/cc/chunk && /usr/bin/cmake -E cmake_link_script CMakeFiles/chunkserver.dir/link.txt --verbose=1
/usr/bin/c++ -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m32 -march=i686 -mtune=atom -fasynchronous-unwind-tables -Wall -DBOOST_SP_USE_QUICK_ALLOCATOR -g -Wl,-z,relro CMakeFiles/chunkserver.dir/chunkserver_main.o CMakeFiles/chunkserver.dir/AtomicRecordAppender.o CMakeFiles/chunkserver.dir/BufferManager.o CMakeFiles/chunkserver.dir/ChunkManager.o CMakeFiles/chunkserver.dir/ChunkServer.o CMakeFiles/chunkserver.dir/ClientManager.o CMakeFiles/chunkserver.dir/ClientSM.o CMakeFiles/chunkserver.dir/DiskIo.o CMakeFiles/chunkserver.dir/KfsOps.o CMakeFiles/chunkserver.dir/LeaseClerk.o CMakeFiles/chunkserver.dir/Logger.o CMakeFiles/chunkserver.dir/MetaServerSM.o CMakeFiles/chunkserver.dir/RemoteSyncSM.o CMakeFiles/chunkserver.dir/Replicator.o CMakeFiles/chunkserver.dir/utils.o CMakeFiles/chunkserver.dir/DirChecker.o CMakeFiles/chunkserver.dir/Chunk.o -o chunkserver -rdynamic ../kfsio/libqfs_io.a ../common/libqfs_common.a ../libclient/libqfs_client.a ../qcdio/libqfs_qcdio.a -lpthread -lcrypto -lrt ../kfsio/libqfs_io.a -lz -lrt ../common/libqfs_common.a -lpthread ../qcdio/libqfs_qcdio.a -lboost_regex-mt ../qcrs/libqfs_qcrs.a
../kfsio/libqfs_io.a(NetErrorSimulator.o): In function `__static_initialization_and_destruction_0':
/usr/include/boost/system/error_code.hpp:214: undefined reference to `boost::system::generic_category()'
/usr/include/boost/system/error_code.hpp:215: undefined reference to `boost::system::generic_category()'
/usr/include/boost/system/error_code.hpp:216: undefined reference to `boost::system::system_category()'
collect2: error: ld returned 1 exit status
make[2]: *** [src/cc/chunk/chunkserver] Error 1
make[2]: Leaving directory `/home//rpmbuild/BUILD/qfs-1.0.2'
make[1]: *** [src/cc/chunk/CMakeFiles/chunkserver.dir/all] Error 2
make[1]: Leaving directory `/home//rpmbuild/BUILD/qfs-1.0.2'
make: *** [all] Error 2

Filesystem Atomicity

Qfs doesn't seem to provide filesystem atomicity.

If I'm in qfsshell in my qfs root directory like so (window #1):

qfsshell -s 10.10.0.50 -p 50000

QfsShell>ls -lha
testfile <r 3> rw-r--r-- ubuntu ubuntu 1.2G Feb 12 10:41

QfsShell>cp testfile foo

And in window #2:

QfsShell>cp foo bar

Window #2's cp command will complete first, cutting off the rest of the file.

A little later, Window #1's cp command will complete.
testfile is the same as foo but foo is not the same as bar.
(file listing from qfs-fuse):

%>ls -lha qfs2
-rw-r--r-- 0 ubuntu ubuntu 1.3G Feb 12 10:41 testfile
-rw-r--r-- 0 root root 1.3G Feb 13 14:59 foo
-rw-r--r-- 0 root root 307M Feb 13 14:53 bar

Qfs cannot build with gcc-4.7

I built the QFS source with the command line:
make BOOST_INCLUDEDIR=~/boost_1_55_0/
CMake ran with gcc and g++ version 4.7. It throws the error below.

Scanning dependencies of target kfsMeta
[ 20%] Building CXX object src/cc/meta/CMakeFiles/kfsMeta.dir/AuditLog.o
[ 20%] Building CXX object src/cc/meta/CMakeFiles/kfsMeta.dir/Checkpoint.o
[ 21%] Building CXX object src/cc/meta/CMakeFiles/kfsMeta.dir/ChunkServer.o
[ 21%] Building CXX object src/cc/meta/CMakeFiles/kfsMeta.dir/ChildProcessTracker.o
[ 22%] Building CXX object src/cc/meta/CMakeFiles/kfsMeta.dir/ClientSM.o
[ 22%] Building CXX object src/cc/meta/CMakeFiles/kfsMeta.dir/DiskEntry.o
[ 23%] Building CXX object src/cc/meta/CMakeFiles/kfsMeta.dir/kfsops.o
[ 23%] Building CXX object src/cc/meta/CMakeFiles/kfsMeta.dir/kfstree.o
[ 24%] Building CXX object src/cc/meta/CMakeFiles/kfsMeta.dir/LayoutManager.o
[ 24%] Building CXX object src/cc/meta/CMakeFiles/kfsMeta.dir/Logger.o
[ 25%] Building CXX object src/cc/meta/CMakeFiles/kfsMeta.dir/meta.o
[ 25%] Building CXX object src/cc/meta/CMakeFiles/kfsMeta.dir/MetaRequest.o
/home/rch/workspacecpp/qfs/src/cc/meta/MetaRequest.cc: In member function ‘virtual void KFS::MetaFsck::handle()’:
/home/rch/workspacecpp/qfs/src/cc/meta/MetaRequest.cc:2698:40: warning: ignoring return value of ‘int ftruncate(int, __off_t)’, declared with attribute warn_unused_result [-Wunused-result]
c++: internal compiler error: Killed (program cc1plus)
Please submit a full bug report,
with preprocessed source if appropriate.
See file:///usr/share/doc/gcc-4.7/README.Bugs for instructions.
make[3]: *** [src/cc/meta/CMakeFiles/kfsMeta.dir/MetaRequest.o] Error 4
make[2]: *** [src/cc/meta/CMakeFiles/kfsMeta.dir/all] Error 2
make[1]: *** [all] Error 2
make: *** [release] Error 2

Compiling with gcc version 4.8 gave the same error as above. I'm not sure whether the error comes from the compiler. Do you have any idea what in the source build might cause this?

support building qfs with Docker

GitHub user chanwit (https://github.com/chanwit) has created an issue on our public GitHub Issues list (https://github.com/quantcast/qfs/issues).

This ticket is a mirror of the first post of the public ticket on GitHub. The purpose of this ticket is to have an owner of the public response. The owner is responsible for researching the answer and responding to the public issue in a timely manner.

#61

Issue Description

This PR adds a Dockerfile and a build script to help quickly build QFS with Docker.
To use it, just clone the repository and call "./build.sh". Docker is required for this build process.

The resulting binaries from the default release configuration will be placed under ~/.bin after the build succeeds.

Signed-off-by: Chanwit Kaewkasi [email protected]

qfs-metaserver unable to start up

Due to a metaserver machine failure we had to restart the whole QFS cluster. Unfortunately, the metaserver is unable to check all chunks properly because the chunkservers fail to re-check all chunk directories.
Each chunkserver log contains lots of entries like:

failed to open chunk file: /path_to_qfs_chunk_dir/18636013.11249341.1 :Success 0 out of requests

How can I get my cluster back up and running? I need to access the data stored in it ASAP.

Java build with Hadoop 2.5.1, 2.5.2 test failed (passed with 0.2x, 1.0.4 and 1.1.2)

I am eager to help fix this, if anyone can shed some light.
(It's my first time hacking on QFS.)

Here's the stack trace of the test:

Test set: com.quantcast.qfs.hadoop.TestQuantcastFileSystem

Tests run: 4, Failures: 1, Errors: 2, Skipped: 0, Time elapsed: 1.482 sec <<< FAILURE!
testFiles(com.quantcast.qfs.hadoop.TestQuantcastFileSystem) Time elapsed: 1.435 sec <<< ERROR!
java.lang.NullPointerException
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1010)
at org.apache.hadoop.util.Shell.runCommand(Shell.java:482)
at org.apache.hadoop.util.Shell.run(Shell.java:455)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:702)
at org.apache.hadoop.util.Shell.execCommand(Shell.java:791)
at org.apache.hadoop.util.Shell.execCommand(Shell.java:774)
at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:646)
at org.apache.hadoop.fs.FilterFileSystem.setPermission(FilterFileSystem.java:472)
at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:460)
at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:426)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:906)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:887)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:784)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:773)
at com.quantcast.qfs.hadoop.QFSEmulationImpl.create(QFSEmulationImpl.java:208)
at com.quantcast.qfs.hadoop.QuantcastFileSystem.create(QuantcastFileSystem.java:167)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:906)
at com.quantcast.qfs.hadoop.TestQuantcastFileSystem.testFiles(TestQuantcastFileSystem.java:101)

testFileIO(com.quantcast.qfs.hadoop.TestQuantcastFileSystem) Time elapsed: 0.015 sec <<< ERROR!
java.lang.NullPointerException
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1010)
at org.apache.hadoop.util.Shell.runCommand(Shell.java:482)
at org.apache.hadoop.util.Shell.run(Shell.java:455)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:702)
at org.apache.hadoop.util.Shell.execCommand(Shell.java:791)
at org.apache.hadoop.util.Shell.execCommand(Shell.java:774)
at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:646)
at org.apache.hadoop.fs.FilterFileSystem.setPermission(FilterFileSystem.java:472)
at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:460)
at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:426)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:906)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:887)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:784)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:773)
at com.quantcast.qfs.hadoop.QFSEmulationImpl.create(QFSEmulationImpl.java:208)
at com.quantcast.qfs.hadoop.QuantcastFileSystem.create(QuantcastFileSystem.java:167)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:906)
at com.quantcast.qfs.hadoop.TestQuantcastFileSystem.testFileIO(TestQuantcastFileSystem.java:136)

testDirs(com.quantcast.qfs.hadoop.TestQuantcastFileSystem) Time elapsed: 0 sec <<< FAILURE!
junit.framework.AssertionFailedError
at junit.framework.Assert.fail(Assert.java:47)
at junit.framework.Assert.assertTrue(Assert.java:20)
at junit.framework.Assert.assertFalse(Assert.java:34)
at junit.framework.Assert.assertFalse(Assert.java:41)
at com.quantcast.qfs.hadoop.TestQuantcastFileSystem.testDirs(TestQuantcastFileSystem.java:85)

'short read detected' after AtomicRecordAppend then failure

When doing a Read after issuing an AtomicRecordAppend, I initially get a 'Request: LEASE_ACQUIRE' for approximately 20 seconds, then a 'Request: READ: short read detected', and then it gives up and fails.

When I get the 'Request: READ: short read detected' response and then run the same operation again, the read succeeds.

So basically, if I issue a Read too early after an AtomicRecordAppend, during the LEASE_ACQUIRE stage, it will eventually fail, but subsequent Reads are successful.

Anyone had this issue?

client read/write architecture question

Based on the tests that I have done, when performing a read of a big file from QFS, the client reads each chunk sequentially from the chunkservers.

Would it be possible to read the chunks asynchronously and buffer the results in memory, then recombine to a new read stream? I would think that this would help saturate the client network link when reading and improve performance.
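
To make the idea concrete, here is a purely illustrative sketch of concurrent range reads recombined in order, using a local file and pread as a stand-in for chunkserver fetches; nothing here is QFS client code, and the chunk size and count are placeholders.

// Illustrative: issue fixed-size range reads concurrently, then recombine
// them in order into one output stream. A local file stands in for chunks.
#include <fcntl.h>
#include <unistd.h>
#include <future>
#include <iostream>
#include <string>
#include <vector>

static std::string readRange(int fd, off_t off, size_t len) {
    std::string buf(len, '\0');
    ssize_t n = pread(fd, &buf[0], len, off);  // pread: no shared file offset
    buf.resize(n > 0 ? static_cast<size_t>(n) : 0);
    return buf;
}

int main(int argc, char** argv) {
    if (argc < 2) return 1;
    int fd = open(argv[1], O_RDONLY);
    if (fd < 0) return 1;
    const size_t kChunk = 1 << 20;  // placeholder size; QFS chunks are 64 MB
    std::vector<std::future<std::string>> parts;
    for (int i = 0; i < 4; ++i)  // fetch 4 "chunks" concurrently
        parts.push_back(std::async(std::launch::async, readRange,
                                   fd, static_cast<off_t>(i) * kChunk, kChunk));
    for (auto& p : parts)  // recombine in order
        std::cout << p.get();
    close(fd);
    return 0;
}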

Also, when writing files via the client, each chunk server seems to be 'transmitting' during the 'upload'. What is the purpose of this? Is it writing back to the client or the metaserver?

Thanks,

-alex

Qfs cannot build on Ubuntu 13.0

Hi,
I tried to build QFS on Ubuntu. I tried running 'make' and 'cmake', but both failed while building the Kerberos service.

From the error logs, it looks like KfsKrb5.h is looking for krb5.h. I could not find krb5.h in the repo. Any idea where I could get it? How did you manage to build without krb5.h? Did you disable Kerberos support for server-side auth?

Below is the error:

Linking CXX shared library libqfs_common.so
[ 15%] Built target kfsCommon-shared
[ 17%] Built target kfsrs
[ 26%] Built target kfsMeta
[ 27%] Building CXX object src/cc/krb/CMakeFiles/qfskrb.dir/KrbService.cc.o
In file included from /home/jeffery/code/qfs/src/cc/krb/KrbService.cc:28:0:
/home/sam/code/qfs/src/cc/krb/KfsKrb5.h:36:26: fatal error: krb5/krb5.h: No such file or directory

#include <krb5/krb5.h>
                      ^

compilation terminated.
make[3]: *** [src/cc/krb/CMakeFiles/qfskrb.dir/KrbService.cc.o] Error 1
make[2]: *** [src/cc/krb/CMakeFiles/qfskrb.dir/all] Error 2
make[1]: *** [all] Error 2
make: *** [release] Error 2

Data locality question

We are evaluating QFS to replace HDFS for our Apache Spark stack, but one of our engineers googled and found a paper saying that QFS abandons data locality by design.

I think I can understand this, since 6+3 Reed-Solomon does not give full 100% data locality.
To my understanding, it should still be at least 66% data locality when reading data through an HDFS client.

Correct me if I am wrong. Thank you!

python client issue in ubuntu 12.04 x64

I am receiving the following error when running examples/python:

alex@alex-VirtualBox:/qfs/examples/python$ PYTHONPATH=/qfs/build/release/qfs_python/lib/python LD_LIBRARY_PATH=~/qfs/build/release/lib python qfssample.py qfssample.cfg
Traceback (most recent call last):
File "qfssample.py", line 48, in
import qfs
ImportError: /home/alex/qfs/build/release/lib/libqfs_io.so: undefined symbol: _ZN5boost13match_resultsIN9__gnu_cxx17__normal_iteratorIPKcSsEESaINS_9sub_matchIS5_EEEE12maybe_assignERKS9_

Looks like Python is somehow missing the Boost libs?

Note: I tested this on Centos 6.3 and it works fine.

Please advise.

Dynamic IPs for chunkservers

Hi,

I'm trying to run a QFS cluster inside docker. I have 1 metaserver and 3 chunkservers, and all data is stored in persistent volumes on the docker host. After I restart the cluster, the master restores the data but cannot find the chunkservers, because docker assigned different IPs to them.

Example

  • docker run metaserver, ip: 172.17.0.2, can be accessed at 172.17.0.1:20000 (docker bridge)
  • docker run chunk1, ip: 172.17.0.3, can be accessed at 172.17.0.1:22000 (docker bridge)
  • docker run chunk2, ip: 172.17.0.4, can be accessed at 172.17.0.1:22001 (docker bridge)
  • docker run chunk3, ip: 172.17.0.5, can be accessed at 172.17.0.1:22002 (docker bridge)

Here, all chunkservers have chunkServer.metaServer.hostname set to 172.17.0.1 (the docker bridge), so they connect successfully even if the master has a different IP.

If I restart the docker containers, docker assigns them different IPs and the master thinks these chunkservers are down, even though they are running.

How can I bind a chunkserver to the docker bridge? If I try to set chunkServer.clientIp=172.17.0.1, it fails because it cannot bind to that interface.

KfsClient::GetDataLocation aborts process when called with directory argument

If KfsClient::GetDataLocation is called with a path that is a directory, the call will result in an assertion failure and abort the process. It is expected that this should return a recoverable error.

A minimal example follows:

#include <iostream>
#include <string>
#include <vector>
#include <kfs/KfsClient.h>

using namespace std;

int main() {
    KFS::KfsClient* client = KFS::Connect("localhost", 20000);

    if(client == NULL) {
        cout << "error connecting" <<endl;
        return -1;
    }

    vector< vector<string> > locations;

    // This call will cause an assert and abort the process.
    int res = client->GetDataLocation("/", 0, KFS::CHUNKSIZE, locations);

    if(res < 0) {
        cout << "error getting data locations: " << KFS::ErrorCodeToStr(res) << endl;
    }

    // ... use locations
}

Compile:

c++ -Wall -g minimal-assert-failure.cc -I./build/release/include -L./build/release/lib -o minimal-assert-failure -lqfs_client

Result:

$ DYLD_LIBRARY_PATH=./build/release/lib ./minimal-assert-failure                                                                  
Assertion failed: (mMutex.IsOwned() && valid_fd(fd) && ! mFileTable[fd]->fattr.isDirectory), function LocateChunk, file /Users/sday/c/qfs/src/cc/libclient/KfsClient.cc, line 3602.

Expected result:

The call to GetDataLocation should return -EISDIR rather than aborting the calling process.

chunkServer.rackId is not documented

The Wiki/Configuration Reference should contain a description of the RackId config parameter in the Chunk Server section, but it doesn't. The description of metaServer.rackPrefixes mentions it, and the sample server configuration also contains this parameter.

Go library

Hi,

I'm planning to use QFS in a Go project. Is there any support for a C API that I could then include as a native extension? Calling C++ code directly from Go is not supported.

I can see the protocol is very complex, so a client rewrite in Go looks very hard.

Thank you,
Teodor

qfsc unit tests failing on ubuntu precise32

Running make test-release, we get the following failure:

fail    test_qfs_get_data_locations
** expect count == expected_length: unexpected number of chunk locations: 2 != 4
fail
23 tests

Host information:

$ uname -a
Linux precise32 3.2.0-23-generic-pae #36-Ubuntu SMP Tue Apr 10 22:19:09 UTC 2012 i686 i686 i386 GNU/Linux

The same test does not fail when using the "example setup". It's possible this test relies on having two chunkservers in the test setup.

I'm running a git bisect to track down the change that caused this test to fail on Ubuntu.

File System User Permissions

Hi,

I installed a 6-node QFS cluster with an additional head node running the metaserver. All the chunkservers and the metaserver run under the user 'qfs'. All the chunk directories, as well as the directories for transaction logs and checkpoints on the metaserver, are owned by this user. The 'qfs' user has read and write permissions on those directories.

Now, from a separate client node, I invoke the qfsshell or cptoqfs tools to interact with the QFS cluster. These tools are again run as the user 'qfs'. When I attempt to create a directory in QFS, I get a "permission denied" error. However, when I run these tools as a sudo user on the client node, I am able to create or copy files into QFS.

Is there a notion of a superuser for the entire QFS? How can individual users control the permissions on their files? If there is a notion of users recognized by QFS, how can we create those users?

/karthik

Failed to initialize YarnClient (Hadoop 2.6.0 and QFS 1.1.2)

The error message below was returned when running the Hadoop example test, following the instructions at http://www.abisen.com/configuring-qfs-on-cloudera-hadoop.html

Both the qfsshell and hdfs commands are functional, so the QFS cluster should be set up properly.

It seems someone has encountered a similar issue as well: https://groups.google.com/forum/#!topic/qfs-devel/D3HAoatGAG4

Command to run the hadoop test:

bin/hadoop jar $HADOOP_PREFIX/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar grep  $QFSPARAM /input/hadoop /output 'dfs[a-z.]+'
echo $QFSPARAM
-Dfs.qfs.impl=com.quantcast.qfs.hadoop.QuantcastFileSystem -Dfs.defaultFS=qfs://172.17.1.249:20000 -Dfs.qfs.metaServerHost=172.17.1.249 -Dfs.qfs.metaServerPort=20000 -libjars /lib/java/hadoop-qfs/hadoop-2.6.0-qfs-1.1.2.jar 
$ cat mapred-site.xml
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>

15/02/18 00:56:25 INFO mapreduce.Cluster: Failed to use org.apache.hadoop.mapred.YarnClientProtocolProvider due to error: Error in instantiating YarnClient
java.io.IOException: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.
at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:120)
at org.apache.hadoop.mapreduce.Cluster.(Cluster.java:82)
at org.apache.hadoop.mapreduce.Cluster.(Cluster.java:75)
at org.apache.hadoop.mapreduce.Job$9.run(Job.java:1266)
at org.apache.hadoop.mapreduce.Job$9.run(Job.java:1262)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.hadoop.mapreduce.Job.connect(Job.java:1261)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1290)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1314)
at org.apache.hadoop.examples.Grep.run(Grep.java:77)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.examples.Grep.main(Grep.java:101)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)
at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:74)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)

Failed to build - 'ext/pool_allocator.h' file not found

Hi,
I am getting the following error while trying to build QFS on a Mac:

make all 
test -d build || mkdir build
cd build && \
    { test -d release || mkdir release; } && \
    cd release && \
    cmake -D CMAKE_BUILD_TYPE=RelWithDebInfo ../.. && \
    /Library/Developer/CommandLineTools/usr/bin/make install
-- The C compiler identification is Clang 5.1.0
-- The CXX compiler identification is Clang 5.1.0
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Boost version: 1.55.0
-- Found the following Boost libraries:
--   regex
--   system
-- Boost-includes = /usr/local/include
-- Boost-libs = /usr/local/lib/libboost_regex-mt.dylib;/usr/local/lib/libboost_system-mt.dylib
-- Found JNI: -framework JavaVM  
-- System name: Darwin
-- System processor: i386
-- qcrs: enabling ssse3
-- Performing Test MY_LAXVEC_CONV
-- Performing Test MY_LAXVEC_CONV - Success
-- qcrs: enabling -O3 flag
-- JNI found: building qfs_access
-- Configuring done
-- Generating done
-- Build files have been written to: /Users/yosy/code/research/qfs/build/release
Scanning dependencies of target qcdio
[  0%] Building CXX object src/cc/qcdio/CMakeFiles/qcdio.dir/QCDiskQueue.o
[  1%] Building CXX object src/cc/qcdio/CMakeFiles/qcdio.dir/QCFdPoll.o
[  1%] Building CXX object src/cc/qcdio/CMakeFiles/qcdio.dir/QCIoBufferPool.o
[  2%] Building CXX object src/cc/qcdio/CMakeFiles/qcdio.dir/QCMutex.o
[  2%] Building CXX object src/cc/qcdio/CMakeFiles/qcdio.dir/QCThread.o
[  3%] Building CXX object src/cc/qcdio/CMakeFiles/qcdio.dir/QCUtils.o
Linking CXX static library libqfs_qcdio.a
[  3%] Built target qcdio
Scanning dependencies of target version
[  3%] Built target version
Scanning dependencies of target kfsCommon
[  4%] Building CXX object src/cc/common/CMakeFiles/kfsCommon.dir/BufferedLogWriter.o
In file included from /Users/yosy/code/research/qfs/src/cc/common/BufferedLogWriter.cc:53:
In file included from /Users/yosy/code/research/qfs/src/cc/common/Properties.h:34:
/Users/yosy/code/research/qfs/src/cc/common/StdAllocator.h:41:13: fatal error: 'ext/pool_allocator.h' file not found
#   include <ext/pool_allocator.h>
            ^
1 error generated.
make[3]: *** [src/cc/common/CMakeFiles/kfsCommon.dir/BufferedLogWriter.o] Error 1
make[2]: *** [src/cc/common/CMakeFiles/kfsCommon.dir/all] Error 2
make[1]: *** [all] Error 2
make: *** [release] Error 2

read-only FS?

I'm trying my first QFS cluster with 2 chunk servers (2 chunkdirs each). For some reason FUSE client mounts file system only in ro mode. I can create directory on QFS using qfsshell but command

cat /etc/qfs/MetaServer.prp |  qfsput -s localhost -p 20000 -f MetaServer.prp

create empty file:

Wrote 0 to MetaServer.prp

I don't see any errors in web UI... What could be the problem?

mEGroup not set in SetEUserAndEGroupSelf

There seems to be an issue in KfsClient.cc where the mEUser variable is set twice and the mEGroup variable is not set at all:

    client.mEUser  = globals.mEUser;
    client.mEUser  = globals.mEUser;
    client.mGroups = globals.mGroups;

I would suggest the following fix (qfs/src/cc/libclient/KfsClient.cc, lines 1096 to 1098):

    client.mEUser  = globals.mEUser;
    client.mEGroup = globals.mEGroup;
    client.mGroups = globals.mGroups;

Add directory/file paths to log messages

I was getting log messages like the following:

08-07-2013 13:41:17.249 INFO - (DiskIo.cc:XXXX) fs space available (0 0) error: -2 get fs available No such file or directory 2

I didn't know whether these indicated problems or not, and without information about which directory was missing, I didn't know how to proceed. After debugging, I eventually realized the missing file was the 'evacuate' file. It'd be nice if the path were just output from the start.

I wrote a patch to do this on the logdirectorypath branch at https://github.com/chu11/qfs.git

I now get a log message like this:

08-08-2013 14:34:07.160 INFO - (DiskIo.cc:2141) fs space available (0 0) error: -2 get fs available No such file or directory 2 /scratch/achu/evacuate

The patch I wrote is sort of a hack. I think something better would be to put the IoRetCode, CompletionCode, and my new PathError variable into an encapsulated "Error" class, but I decided such an effort wasn't a good idea without a discussion first.

error while building mstress

When I tried to run ./mstress_install.sh localhost
I got this error. I'm not sure whether there is an order of things to do before running mstress_install, or whether a library is missing.

Scanning dependencies of target mstress_client
[ 96%] Building CXX object benchmarks/mstress/CMakeFiles/mstress_client.dir/mstress_client.cc.o
/home/maism/src/qfs/benchmarks/mstress/mstress_client.cc: In function ‘int CreateDFSPaths(Client*, AutoCleanupKfsClient*)’:
/home/maism/src/qfs/benchmarks/mstress/mstress_client.cc:376:17: error: aggregate ‘std::ostringstream os’ has incomplete type and cannot be defined
/home/maism/src/qfs/benchmarks/mstress/mstress_client.cc: In function ‘int StatDFSPaths(Client*, AutoCleanupKfsClient*)’:
/home/maism/src/qfs/benchmarks/mstress/mstress_client.cc:406:17: error: aggregate ‘std::ostringstream os’ has incomplete type and cannot be defined
/home/maism/src/qfs/benchmarks/mstress/mstress_client.cc: In function ‘int ListDFSPaths(Client*, AutoCleanupKfsClient*)’:
/home/maism/src/qfs/benchmarks/mstress/mstress_client.cc:455:17: error: aggregate ‘std::ostringstream os’ has incomplete type and cannot be defined
/home/maism/src/qfs/benchmarks/mstress/mstress_client.cc: In function ‘int RemoveDFSPaths(Client*, AutoCleanupKfsClient*)’:
/home/maism/src/qfs/benchmarks/mstress/mstress_client.cc:500:17: error: aggregate ‘std::ostringstream os’ has incomplete type and cannot be defined
make[3]: *** [benchmarks/mstress/CMakeFiles/mstress_client.dir/mstress_client.cc.o] Error 1
make[2]: *** [benchmarks/mstress/CMakeFiles/mstress_client.dir/all] Error 2
make[1]: *** [benchmarks/mstress/CMakeFiles/mstress-tarball.dir/rule] Error 2
make: *** [mstress-tarball] Error 2
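
The "incomplete type" errors usually just mean <sstream> is not included where std::ostringstream is used (newer GCC releases no longer pull it in transitively through other iostream headers). A minimal sketch of the likely fix, assuming a missing header rather than a broken toolchain:

    // std::ostringstream is only a complete type once <sstream> is in scope.
    #include <sstream>
    #include <string>

    static std::string MakePath(const std::string& prefix, int i)
    {
        std::ostringstream os;  // compiles now that <sstream> is included
        os << prefix << "/" << i;
        return os.str();
    }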

C client library is not "fork safe"

E.g., in our use case the parent process forks workers and waits for them to finish; once the workers exit, the parent process hangs at exit in

#0  __pthread_cond_destroy (cond=0x4bde7c0) at pthread_cond_destroy.c:77
#1  0x00007f2976fb7b79 in QCCondVar::~QCCondVar() () from /usr/lib/libqfs_qcdio.so
#2  0x00007f2976aa0bfd in KFS::BufferedLogWriter::~BufferedLogWriter() () from /usr/lib/libqfs_common.so
#3  0x00007f29758e7bc9 in __run_exit_handlers (status=0, listp=0x7f2975c535a8 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true) at exit.c:82
#4  0x00007f29758e7c15 in __GI_exit (status=<optimized out>) at exit.c:104
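
A minimal sketch of the failing pattern (client setup elided): the parent's normal exit() runs atexit handlers such as KFS::BufferedLogWriter::~BufferedLogWriter() over mutex/condition-variable state that fork() duplicated mid-use, which can deadlock as in the backtrace above.

    #include <sys/wait.h>
    #include <unistd.h>

    int main()
    {
        // ... initialize the QFS client here; this registers exit handlers ...
        const pid_t pid = fork();
        if (pid == 0) {
            // worker: do I/O through the inherited client state, then
            // _exit() to bypass the duplicated atexit handlers
            _exit(0);
        }
        waitpid(pid, 0, 0);
        return 0;  // parent: exit handlers run here and can hang
    }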

Tests should be moved into a proper testing framework

Currently, QFS tests are (a) few and far between, and (b) stored as independent programs with their own main functions that are run by the qfstest.sh script. This setup is brittle and doesn't encourage adding more unit tests -- it isn't currently as easy to add a unit test as it should be.

Instead, we need an automated test framework with proper unit-testing support (e.g. Google Test). We should move the brittle tests over to the new system as soon as possible.
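
For illustration, a migrated test under Google Test could look like the sketch below (the test name and checked behavior are made up, not an existing QFS test):

    #include <gtest/gtest.h>
    #include <string>

    TEST(PathTest, StripsTrailingSlash) {
        const std::string in("/a/b/");
        EXPECT_EQ(std::string("/a/b"), in.substr(0, in.size() - 1));
    }

    int main(int argc, char** argv) {
        ::testing::InitGoogleTest(&argc, argv);
        return RUN_ALL_TESTS();
    }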

hadoop-test-1.0.4.jar fails with local QFS

Ubuntu 12.04, hadoop 1.0.4, latest QFS, local single test node.

$ hadoop jar /home/cdh/hadoop-1.0.4/hadoop-test-1.0.4.jar mapredtest 100 10000

or any other test fails with

java.io.FileNotFoundException: /user/cdh/mapred.loadtest/genouts: No such file or directory 2
at com.quantcast.qfs.access.KfsAccess.kfs_retToIOException(KfsAccess.java:804)
at com.quantcast.qfs.hadoop.QFSImpl.stat(QFSImpl.java:100)
at com.quantcast.qfs.hadoop.QuantcastFileSystem.listStatus(QuantcastFileSystem.java:122)
at com.quantcast.qfs.hadoop.QuantcastFileSystem.delete(QuantcastFileSystem.java:175)
at org.apache.hadoop.mapred.TestMapRed.launch(TestMapRed.java:550)
at org.apache.hadoop.mapred.TestMapRed.run(TestMapRed.java:825)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
at org.apache.hadoop.mapred.TestMapRed.main(TestMapRed.java:753)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
at org.apache.hadoop.test.AllTestDriver.main(AllTestDriver.java:81)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

$ hadoop fs -ls, -mkdir, -rm, etc. all work fine.

The same tests on the same configuration, but with core-site.xml configured for HDFS, do not fail. I wrapped the body of listStatus in a try-catch; with that change the tests pass and everything works correctly.

Unable to initialize KFS Client (Hadoop 2.6.0 and QFS 1.1.2)

I was trying to run the example from the hadoop distribution on QFS. Tasks are assigned and created on the slave nodes, but it seems the KFS client fails to initialize, which kills the tasks. Please find below the syslog from the failed container.

2015-02-25 07:57:02,433 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Created MRAppMaster for application appattempt_1424850718948_0001_000002
2015-02-25 07:57:03,024 WARN [main] org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2015-02-25 07:57:03,043 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Executing with tokens:
2015-02-25 07:57:03,044 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Kind: YARN_AM_RM_TOKEN, Service: , Ident: (appAttemptId { application_id { id: 1 cluster_timestamp: 1424850718948 } attemptId: 2 } keyId: -1473247521)
2015-02-25 07:57:03,407 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: OutputCommitter set in config null
2015-02-25 07:57:03,410 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: OutputCommitter is org.apache.hadoop.mapred.FileOutputCommitter
2015-02-25 07:57:03,488 INFO [main] org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.mapreduce.jobhistory.EventType for class org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler
2015-02-25 07:57:03,490 INFO [main] org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.mapreduce.v2.app.job.event.JobEventType for class org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher
2015-02-25 07:57:03,491 INFO [main] org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.mapreduce.v2.app.job.event.TaskEventType for class org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskEventDispatcher
2015-02-25 07:57:03,492 INFO [main] org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.mapreduce.v2.app.job.event.TaskAttemptEventType for class org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher
2015-02-25 07:57:03,493 INFO [main] org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventType for class org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler
2015-02-25 07:57:03,502 INFO [main] org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.mapreduce.v2.app.speculate.Speculator$EventType for class org.apache.hadoop.mapreduce.v2.app.MRAppMaster$SpeculatorEventDispatcher
2015-02-25 07:57:03,503 INFO [main] org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.mapreduce.v2.app.rm.ContainerAllocator$EventType for class org.apache.hadoop.mapreduce.v2.app.MRAppMaster$ContainerAllocatorRouter
2015-02-25 07:57:03,504 INFO [main] org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncher$EventType for class org.apache.hadoop.mapreduce.v2.app.MRAppMaster$ContainerLauncherRouter
2015-02-25 07:57:33,599 INFO [main] org.apache.hadoop.service.AbstractService: Service JobHistoryEventHandler failed in state INITED; cause: java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
    at org.apache.hadoop.fs.AbstractFileSystem.newInstance(AbstractFileSystem.java:131)
    at org.apache.hadoop.fs.AbstractFileSystem.createFileSystem(AbstractFileSystem.java:157)
    at org.apache.hadoop.fs.AbstractFileSystem.get(AbstractFileSystem.java:242)
    at org.apache.hadoop.fs.FileContext$2.run(FileContext.java:334)
    at org.apache.hadoop.fs.FileContext$2.run(FileContext.java:331)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
    at org.apache.hadoop.fs.FileContext.getAbstractFileSystem(FileContext.java:331)
    at org.apache.hadoop.fs.FileContext.getFileContext(FileContext.java:448)
    at org.apache.hadoop.fs.FileContext.getFileContext(FileContext.java:470)
    at org.apache.hadoop.mapreduce.v2.jobhistory.JobHistoryUtils.getDefaultFileContext(JobHistoryUtils.java:247)
    at org.apache.hadoop.mapreduce.v2.jobhistory.JobHistoryUtils.ensurePathInDefaultFileSystem(JobHistoryUtils.java:277)
    at org.apache.hadoop.mapreduce.v2.jobhistory.JobHistoryUtils.getConfiguredHistoryStagingDirPrefix(JobHistoryUtils.java:191)
    at org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.serviceInit(JobHistoryEventHandler.java:147)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
    at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceInit(MRAppMaster.java:444)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$4.run(MRAppMaster.java:1499)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1496)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1429)
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
    at org.apache.hadoop.fs.AbstractFileSystem.newInstance(AbstractFileSystem.java:129)
    ... 24 more
Caused by: java.io.IOException: Unable to initialize KFS Client
    at com.quantcast.qfs.access.KfsAccess.<init>(KfsAccess.java:245)
    at com.quantcast.qfs.hadoop.QFSImpl.<init>(QFSImpl.java:43)
    at com.quantcast.qfs.hadoop.Qfs.<init>(Qfs.java:82)
    ... 29 more
2015-02-25 07:57:33,608 INFO [main] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Stopping JobHistoryEventHandler. Size of the outstanding queue size is 0
2015-02-25 07:57:33,608 INFO [main] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Stopped JobHistoryEventHandler. super.stop()
2015-02-25 07:57:33,609 INFO [main] org.apache.hadoop.service.AbstractService: Service org.apache.hadoop.mapreduce.v2.app.MRAppMaster failed in state INITED; cause: java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
    at org.apache.hadoop.fs.AbstractFileSystem.newInstance(AbstractFileSystem.java:131)
    at org.apache.hadoop.fs.AbstractFileSystem.createFileSystem(AbstractFileSystem.java:157)
    at org.apache.hadoop.fs.AbstractFileSystem.get(AbstractFileSystem.java:242)
    at org.apache.hadoop.fs.FileContext$2.run(FileContext.java:334)
    at org.apache.hadoop.fs.FileContext$2.run(FileContext.java:331)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
    at org.apache.hadoop.fs.FileContext.getAbstractFileSystem(FileContext.java:331)
    at org.apache.hadoop.fs.FileContext.getFileContext(FileContext.java:448)
    at org.apache.hadoop.fs.FileContext.getFileContext(FileContext.java:470)
    at org.apache.hadoop.mapreduce.v2.jobhistory.JobHistoryUtils.getDefaultFileContext(JobHistoryUtils.java:247)
    at org.apache.hadoop.mapreduce.v2.jobhistory.JobHistoryUtils.ensurePathInDefaultFileSystem(JobHistoryUtils.java:277)
    at org.apache.hadoop.mapreduce.v2.jobhistory.JobHistoryUtils.getConfiguredHistoryStagingDirPrefix(JobHistoryUtils.java:191)
    at org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.serviceInit(JobHistoryEventHandler.java:147)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
    at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceInit(MRAppMaster.java:444)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$4.run(MRAppMaster.java:1499)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1496)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1429)
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
    at org.apache.hadoop.fs.AbstractFileSystem.newInstance(AbstractFileSystem.java:129)
    ... 24 more
Caused by: java.io.IOException: Unable to initialize KFS Client
    at com.quantcast.qfs.access.KfsAccess.<init>(KfsAccess.java:245)
    at com.quantcast.qfs.hadoop.QFSImpl.<init>(QFSImpl.java:43)
    at com.quantcast.qfs.hadoop.Qfs.<init>(Qfs.java:82)
    ... 29 more
2015-02-25 07:57:33,610 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Skipping cleaning up the staging dir. assuming AM will be retried.
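
The exception is raised in the KfsAccess constructor when the native client cannot be set up; together with the 30-second stall between 07:57:03 and 07:57:33, this looks like the client timing out while trying to reach the metaserver. As a sanity check, a minimal core-site.xml for the QFS plugin is sketched below (property names as documented for the plugin; host and port are placeholders):

    <configuration>
      <property>
        <name>fs.defaultFS</name>
        <value>qfs://metaserver.example.com:20000</value>
      </property>
      <property>
        <name>fs.qfs.impl</name>
        <value>com.quantcast.qfs.hadoop.QuantcastFileSystem</value>
      </property>
      <!-- Hadoop 2 resolves AbstractFileSystem implementations separately -->
      <property>
        <name>fs.AbstractFileSystem.qfs.impl</name>
        <value>com.quantcast.qfs.hadoop.Qfs</value>
      </property>
      <property>
        <name>fs.qfs.metaServerHost</name>
        <value>metaserver.example.com</value>
      </property>
      <property>
        <name>fs.qfs.metaServerPort</name>
        <value>20000</value>
      </property>
    </configuration>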

setting replication level

Please document how to set replication level (with examples).

Command qfs -setrep 2 /mnt/qfs/myfile returns

file:///mnt/qfs/myfile: Invalid argument 22

and leaves me confused, without a clue about what I am doing wrong...
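
The file:// prefix in the error message suggests the tool resolved the path against the local file system rather than QFS, i.e. no metaserver was configured for the command. If so, naming the file system explicitly may be what's missing (host and port are placeholders):

    qfs -fs qfs://metaserver.example.com:20000 -setrep 2 /myfile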

Memory leak + error + performance notice

[examples/cc/qfssample_main.cc:249]: (error) Memory leak: dataBuf
[benchmarks/mstress/mstress_client.cc:100]: (error) Common realloc mistake: 'actualPath_' nulled but not freed upon failure
[src/cc/access/kfs_module_py.cc:221] -> [src/cc/access/kfs_module_py.cc:212]: (error, inconclusive) Possible null pointer dereference: self - otherwise it is redundant to check it against null.
[src/cc/access/kfs_module_py.cc:750]: (performance) Prefer prefix ++/-- operators for non-primitive types.
[src/cc/chunk/ClientSM.cc:262] -> [src/cc/chunk/ClientSM.cc:278]: (error) Possible null pointer dereference: op - otherwise it is redundant to check it against null.
[src/cc/chunk/ClientSM.cc:687]: (performance) Prefer prefix ++/-- operators for non-primitive types.
[src/cc/chunk/ClientSM.cc:763]: (performance) Prefer prefix ++/-- operators for non-primitive types.
[src/cc/chunk/ChunkManager.cc:2232]: (error) Possible null pointer dereference: cih
[src/cc/chunk/KfsOps.cc:1526] -> [src/cc/chunk/KfsOps.cc:1528]: (performance) Variable 'needToForward' is reassigned a value before the old one has been used.
[src/cc/chunk/KfsOps.cc:1762]: (performance) Possible inefficient checking for 'checksums' emptiness.
[src/cc/chunk/KfsOps.cc:2217]: (performance) Possible inefficient checking for 'checksum' emptiness.
[src/cc/chunk/KfsOps.cc:2423]: (performance) Possible inefficient checking for 'checksums' emptiness.
[src/cc/chunk/chunkserver_main.cc:252] -> [src/cc/chunk/chunkserver_main.cc:249]: (error, inconclusive) Possible null pointer dereference: sInstance - otherwise it is redundant to check it against null.
[src/cc/common/Properties.cc:339]: (performance) Prefer prefix ++/-- operators for non-primitive types.
[src/cc/common/Properties.cc:362]: (performance) Prefer prefix ++/-- operators for non-primitive types.
[src/cc/fuse/kfs_fuse_main.cc:362] -> [src/cc/fuse/kfs_fuse_main.cc:360]: (error, inconclusive) Possible null pointer dereference: cp - otherwise it is redundant to check it against null.
[src/cc/emulator/LayoutEmulator.cc:353]: (performance) Prefer prefix ++/-- operators for non-primitive types.
[src/cc/emulator/LayoutEmulator.cc:384]: (performance) Prefer prefix ++/-- operators for non-primitive types.
[src/cc/kfsio/IOBuffer.cc:1011] -> [src/cc/kfsio/IOBuffer.cc:1005]: (error, inconclusive) Possible null pointer dereference: sIOBufferAllocator - otherwise it is redundant to check it against null.
[src/cc/kfsio/checksum.cc:206]: (performance) Prefer prefix ++/-- operators for non-primitive types.
[src/cc/libclient/FileOpener.cc:487] -> [src/cc/libclient/FileOpener.cc:471]: (error, inconclusive) Possible null pointer dereference: mCurOpPtr - otherwise it is redundant to check it against null.
[src/cc/libclient/KfsOps.cc:491]: (performance) Possible inefficient checking for 'checksums' emptiness.
[src/cc/libclient/KfsOps.cc:523]: (performance) Possible inefficient checking for 'checksums' emptiness.
[src/cc/libclient/Path.cc:127]: (performance) Prefer prefix ++/-- operators for non-primitive types.
[src/cc/libclient/WriteAppender.cc:1337] -> [src/cc/libclient/WriteAppender.cc:1316]: (error, inconclusive) Possible null pointer dereference: mCurOpPtr - otherwise it is redundant to check it against null.
[src/cc/libclient/KfsClient.cc:94] -> [src/cc/libclient/KfsClient.cc:96]: (performance) Variable 'verbose' is reassigned a value before the old one has been used.
[src/cc/meta/ChunkPlacement.h:225]: (error) Analysis failed. If the code is valid then please report this failure.
[src/cc/meta/LayoutManager.cc:4539] -> [src/cc/meta/LayoutManager.cc:4536]: (error, inconclusive) Possible null pointer dereference: cs - otherwise it is redundant to check it against null.
[src/cc/qcdio/qcunittest_main.cc:148]: (performance) Function parameter 'inStatus' should be passed by reference.
[src/cc/tests/dirfile_test_main.cc:180] -> [src/cc/tests/dirfile_test_main.cc:181]: (performance) Buffer 'fileName' is being written before its old content has been used.
[src/cc/tests/dirfile_test_main.cc:210] -> [src/cc/tests/dirfile_test_main.cc:211]: (performance) Buffer 'fileName' is being written before its old content has been used.
[src/cc/tests/reader_test_main.cc:94] -> [src/cc/tests/reader_test_main.cc:95]: (performance) Buffer 'fileName' is being written before its old content has been used.
[src/cc/tests/seekwrite_test_main.cc:131] -> [src/cc/tests/seekwrite_test_main.cc:132]: (performance) Buffer 'fileName' is being written before its old content has been used.
[src/cc/tests/truncate_test_main.cc:119] -> [src/cc/tests/truncate_test_main.cc:120]: (performance) Buffer 'fileName' is being written before its old content has been used.
[src/cc/tests/writer_test_main.cc:119] -> [src/cc/tests/writer_test_main.cc:120]: (performance) Buffer 'fileName' is being written before its old content has been used.
[src/cc/tools/kfsmkdirs.cc:42]: (performance) Possible inefficient checking for 'args' emptiness.
[src/cc/tools/kfspwd.cc:43]: (performance) Possible inefficient checking for 'args' emptiness.
[src/cc/tools/kfsls.cc:84]: (performance) Possible inefficient checking for 'args' emptiness.
[src/cc/tools/kfsls.cc:89]: (performance) Possible inefficient checking for 'args' emptiness.
[src/cc/tools/kfsrm.cc:45]: (performance) Possible inefficient checking for 'args' emptiness.
[src/cc/tools/kfsrmdir.cc:42]: (performance) Possible inefficient checking for 'args' emptiness.
[src/cc/tools/qfsping_main.cc:58]: (performance) Possible inefficient checking for 'upServers' emptiness.
[src/cc/tools/qfsping_main.cc:66]: (performance) Possible inefficient checking for 'downServers' emptiness.
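
To illustrate the second finding: assigning realloc's result straight back to the only pointer leaks the old block when realloc fails, because NULL overwrites the last reference to it. A generic sketch of the safe pattern (not the actual mstress_client.cc code):

    #include <cstdlib>

    static char* Grow(char* buf, std::size_t newSize)
    {
        char* const tmp = static_cast<char*>(std::realloc(buf, newSize));
        if (! tmp) {
            std::free(buf);  // without the temporary, this block would be lost
            return 0;
        }
        return tmp;
    }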
