
hadoop-book's Introduction

Hadoop Book Example Code

This repository contains the example code for Hadoop: The Definitive Guide, Fourth Edition by Tom White (O'Reilly, 2014).

Code for the First, Second, and Third Editions is also available.

Note that the chapter names and numbering have changed between editions; see Chapter Numbers By Edition.

Building and Running

To build the code, you will first need to have installed Maven and Java. Then type

% mvn package -DskipTests

This will do a full build and create example JAR files in the top-level directory (e.g. hadoop-examples.jar).

To run the examples from a particular chapter, first install the component needed for the chapter (e.g. Hadoop, Pig, Hive, etc.), then run the command lines shown in the chapter.
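For example, the chapter 2 MaxTemperature program can be run against the sample weather data roughly as follows (this mirrors the command lines shown in chapter 2 and assumes a working Hadoop installation on the PATH):

% export HADOOP_CLASSPATH=hadoop-examples.jar
% hadoop MaxTemperature input/ncdc/sample.txt output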

Sample datasets are provided in the input directory, but the full weather dataset is not contained there due to size restrictions. You can find information about how to obtain the full weather dataset on the book's website at http://www.hadoopbook.com/.

Hadoop Component Versions

This edition of the book works with Hadoop 2. It has not been tested extensively with Hadoop 1, although most of it should work.

For the precise versions of each component that the code has been tested with, see book/pom.xml.

Copyright

Copyright (C) 2014 Tom White

hadoop-book's People

Contributors

tomwhite

hadoop-book's Issues

Error: Project build

I tried to build the source with this command:
% mvn package -DskipTests -Dhadoop.version=1.1.1

And got this error:

/usr/local/maven/hadoop-book-master/common/src/main/java/oldapi/NcdcStationMetadataParser.java:[5,7] error: error while writing NcdcStationMetadataParser: could not create parent directories

Would you please show me how to fix this error? Thank you.

Chapter 11 fails to build

Trace:

$ mvn package -DskipTests -Dhadoop.version=1.1.1
...
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 8.515s
[INFO] Finished at: Tue Aug 27 09:58:41 EDT 2013
[INFO] Final Memory: 19M/81M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:2.3.2:compile (default-compile) on project ch11: Compilation failure
[ERROR] error: error reading /Users/apennebaker/.m2/repository/commons-httpclient/commons-httpclient/3.1/commons-httpclient-3.1.jar; cannot read zip file
[ERROR] -> [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
[ERROR] 
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn <goals> -rf :ch11

System:

$ specs java os
Specs:

specs 0.4
https://github.com/mcandre/specs#readme

mvn --version
Apache Maven 3.0.4 (r1232337; 2012-01-17 03:44:56-0500)
Maven home: /usr/share/maven
Java version: 1.6.0_51, vendor: Apple Inc.
Java home: /System/Library/Java/JavaVirtualMachines/1.6.0.jdk/Contents/Home
Default locale: en_US, platform encoding: MacRoman
OS name: "mac os x", version: "10.8.4", arch: "x86_64", family: "mac"

echo $CLASSPATH


echo $JAVA_HOME
/System/Library/Frameworks/JavaVM.framework/Home

java -version
java version "1.6.0_51"
Java(TM) SE Runtime Environment (build 1.6.0_51-b11-457-11M4509)
Java HotSpot(TM) 64-Bit Server VM (build 20.51-b01-457, mixed mode)

system_profiler SPSoftwareDataType | grep 'System Version'
      System Version: OS X 10.8.4 (12E55)

AWS EC2 Single Hadoop cluster - issue with hdfs

I'm trying to use HDFS. I have set up a single-node Hadoop cluster on EC2. In the NameNode log I see the following error:

2020-04-30 10:22:17,909 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 8 Total time for transactions(ms): 5 Number of transactions batched in Syncs: 121 Number of syncs: 5 SyncTimes(ms): 7
2020-04-30 10:22:17,954 WARN org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to place enough replicas, still in need of 1 to reach 1 (unavailableStorages=[], storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=true) For more information, please enable DEBUG log level on org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy and org.apache.hadoop.net.NetworkTopology
2020-04-30 10:22:17,955 WARN org.apache.hadoop.hdfs.protocol.BlockStoragePolicy: Failed to place enough replicas: expected size is 1 but only 0 storage types can be selected (replication=1, selected=[], unavailable=[DISK], removed=[DISK], policy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]})
2020-04-30 10:22:17,955 WARN org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to place enough replicas, still in need of 1 to reach 1 (unavailableStorages=[DISK], storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=true) All required storage types are unavailable: unavailableStorages=[DISK], storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}
2020-04-30 10:22:17,955 INFO org.apache.hadoop.ipc.Server: IPC Server handler 6 on default port 54310, call Call#6 Retry#0 org.apache.hadoop.hdfs.protocol.ClientProtocol.addBlock from 171.61.106.68:43032
java.io.IOException: File /checkpoint/vgs11/actions/metadata could only be written to 0 of the 1 minReplication nodes. There are 1 datanode(s) running and 1 node(s) are excluded in this operation.
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:2219)
at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:294)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2789)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:892)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:574)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:528)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:999)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:927)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2915)

My DataNode is able to send heartbeats to the NameNode, so that all looks good.

jps shows the expected processes:
13184 SecondaryNameNode
13604 Jps
12772 NameNode
12955 DataNode

I'm using Hadoop 3.2.1.

Can you let me know what's wrong with the configuration?
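A NameNode that excludes its only DataNode usually means the DataNode has no usable storage left or the client cannot reach its data-transfer port; on EC2 the security group often blocks that port. A first check, offered as an assumption rather than a diagnosis, is the DataNode's reported capacity:

# shows live DataNodes with their configured and remaining capacity;
# on Hadoop 3 the data-transfer port defaults to 9866 (50010 on Hadoop 2),
# so that port also needs to be open in the EC2 security group
hdfs dfsadmin -report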

Building and Running the Code

I followed the instructions to build and run the book's project code. I first installed Maven (latest version) and Java (1.7.0_91), then typed the following command:

% mvn package -DskipTests

However, I got the following compilation errors:

[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.1:compile (default-compile) on project common: Fatal error compiling: directory not found: /home/hadoop-book-master/common/target/classes -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR] mvn -rf :common

Looking in the common folder, I could not actually find the target/classes directory. What should I do?

Thanks
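One thing worth trying, assuming the missing target/classes directory is stale build state or a permissions problem in the checkout rather than anything specific to the book code, is a clean rebuild from a location the current user can write to:

# removes each module's target/ directory and rebuilds it from scratch
mvn clean package -DskipTests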

Project build is utterly broken

Attempting to build according to the instructions in the README:

% mvn package -DskipTests -Dhadoop.version=1.0.1

results in:

[ERROR] COMPILATION ERROR : [INFO] ------------------------------------------------------------- [ERROR] /home/dlandis/git/hadoop-book/common/src/main/java/oldapi/MetOfficeRecordParser.java:[4,27] error: package org.apache.hadoop.io does not exist

There also seem to be other issues with the Maven configuration, as evidenced by the issues list. This is frustrating, since I just bought the book (not cheap) only to find the code is a mess.

Expand the list of supported Hadoop versions

Could we adjust the build to be more flexible with respect to Hadoop versions? I'm not sure how users are expected to get Hadoop set up for the book:

  • On Mac, Homebrew installs Hadoop v1.2.1.
  • In Ubuntu, the online tutorials tend to specify Hadoop v1.0.3.
  • In Windows, tutorials tend to use v1.0.0.

None of these match the book code's demand of Hadoop v1.1.1.

/not_sure_if_book_code_is_general_enough_to_work_with_different_minor_versions

https://maven.apache.org/enforcer/enforcer-rules/versionRanges.html
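The build already exposes a hadoop.version property, so a locally installed release can at least be tried by overriding it on the command line; there is no guarantee that every chapter compiles against every release:

# e.g. for the Homebrew-installed version mentioned above
mvn package -DskipTests -Dhadoop.version=1.2.1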

Error: org.apache.avro.Schema.getLogicalType()Lorg/apache/avro/LogicalType;

Running the chapter 13 TextToParquetWithAvro example fails with the error below.

shell:
hadoop jar HadoopTest.jar chapater13.TextToParquetWithAvro -libjars parquet-avro-1.10.0.jar,parquet-hadoop-1.10.0.jar,parquet-common-1.10.0.jar,parquet-column-1.10.0.jar,parquet-format-2.5.0.jar,avro-1.8.2.jar,parquet-encoding-1.10.0.jar,parquet-jackson-1.10.0.jar,avro-ipc-1.8.2.jar,avro-mapred-1.8.2.jar,paranamer-2.8.jar,commons-compress-1.16.1.jar,jackson-core-asl-1.9.13.jar,jackson-mapper-asl-1.9.13.jar,slf4j-api-1.8.0-beta2.jar,xz-1.8.jar,snappy-java-1.1.7.1.jar input/docs/quangle.txt output

result:
18/05/03 18:17:37 INFO client.RMProxy: Connecting to ResourceManager at localhost/127.0.0.1:8032
18/05/03 18:17:39 INFO input.FileInputFormat: Total input files to process : 1
18/05/03 18:17:39 INFO mapreduce.JobSubmitter: number of splits:1
18/05/03 18:17:39 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1525331086462_0020
18/05/03 18:17:40 INFO impl.YarnClientImpl: Submitted application application_1525331086462_0020
18/05/03 18:17:40 INFO mapreduce.Job: The url to track the job: http://blue:8088/proxy/application_1525331086462_0020/
18/05/03 18:17:40 INFO mapreduce.Job: Running job: job_1525331086462_0020
18/05/03 18:17:50 INFO mapreduce.Job: Job job_1525331086462_0020 running in uber mode : false
18/05/03 18:17:50 INFO mapreduce.Job: map 0% reduce 0%
18/05/03 18:17:56 INFO mapreduce.Job: Task Id : attempt_1525331086462_0020_m_000000_0, Status : FAILED
Error: org.apache.avro.Schema.getLogicalType()Lorg/apache/avro/LogicalType;
18/05/03 18:18:03 INFO mapreduce.Job: Task Id : attempt_1525331086462_0020_m_000000_1, Status : FAILED
Error: org.apache.avro.Schema.getLogicalType()Lorg/apache/avro/LogicalType;
18/05/03 18:18:10 INFO mapreduce.Job: Task Id : attempt_1525331086462_0020_m_000000_2, Status : FAILED
Error: org.apache.avro.Schema.getLogicalType()Lorg/apache/avro/LogicalType;
18/05/03 18:18:18 INFO mapreduce.Job: map 100% reduce 0%
18/05/03 18:18:19 INFO mapreduce.Job: Job job_1525331086462_0020 failed with state FAILED due to: Task failed task_1525331086462_0020_m_000000
Job failed as tasks failed. failedMaps:1 failedReduces:0
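A NoSuchMethodError-style failure on org.apache.avro.Schema.getLogicalType inside the map tasks typically means an older Avro on the cluster classpath is shadowing the avro-1.8.2 jar passed via -libjars. One possible workaround, not a confirmed fix, is to ask MapReduce to prefer the user-supplied jars:

# mapreduce.job.user.classpath.first puts the -libjars jars ahead of the
# cluster's own jars on the task classpath
# (jar list shortened here; in practice pass the full -libjars list from above)
hadoop jar HadoopTest.jar chapater13.TextToParquetWithAvro \
  -Dmapreduce.job.user.classpath.first=true \
  -libjars avro-1.8.2.jar,parquet-avro-1.10.0.jar \
  input/docs/quangle.txt output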

Permission denied when trying to download NCDC data from S3

Hi.

I am trying to download the NCDC data from S3, as described at:

http://hadoopbook.com/code.html

Got the following error:

17/04/17 18:34:57 INFO tools.DistCp: Input Options: DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false, ignoreFailures=false, maxMaps=20, sslConfigurationFile='null', copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[s3n://hadoopbook/ncdc/all], targetPath=input/ncdc/all, targetPathExists=false, preserveRawXattrs=false}
17/04/17 18:34:57 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
17/04/17 18:34:57 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
17/04/17 18:35:00 ERROR tools.DistCp: Exception encountered
org.apache.hadoop.security.AccessControlException: Permission denied: s3n://hadoopbook/ncdc/all_$folder$
        at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.processException(Jets3tNativeFileSystemStore.java:449)
        at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.processException(Jets3tNativeFileSystemStore.java:427)
        at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.handleException(Jets3tNativeFileSystemStore.java:411)
        at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.retrieveMetadata(Jets3tNativeFileSystemStore.java:181)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:601)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
        at org.apache.hadoop.fs.s3native.$Proxy10.retrieveMetadata(Unknown Source)
        at org.apache.hadoop.fs.s3native.NativeS3FileSystem.getFileStatus(NativeS3FileSystem.java:483)
        at org.apache.hadoop.fs.Globber.getFileStatus(Globber.java:57)
        at org.apache.hadoop.fs.Globber.glob(Globber.java:252)
        at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1657)
        at org.apache.hadoop.tools.GlobbedCopyListing.doBuildListing(GlobbedCopyListing.java:77)
        at org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:84)
        at org.apache.hadoop.tools.DistCp.createInputFileListing(DistCp.java:382)
        at org.apache.hadoop.tools.DistCp.createAndSubmitJob(DistCp.java:181)
        at org.apache.hadoop.tools.DistCp.execute(DistCp.java:153)
        at org.apache.hadoop.tools.DistCp.run(DistCp.java:126)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.hadoop.tools.DistCp.main(DistCp.java:430)
Caused by: org.jets3t.service.impl.rest.HttpException
        at org.jets3t.service.impl.rest.httpclient.RestStorageService.performRequest(RestStorageService.java:519)
        at org.jets3t.service.impl.rest.httpclient.RestStorageService.performRequest(RestStorageService.java:281)
        at org.jets3t.service.impl.rest.httpclient.RestStorageService.performRestHead(RestStorageService.java:942)
        at org.jets3t.service.impl.rest.httpclient.RestStorageService.getObjectImpl(RestStorageService.java:2148)
        at org.jets3t.service.impl.rest.httpclient.RestStorageService.getObjectDetailsImpl(RestStorageService.java:2075)
        at org.jets3t.service.StorageService.getObjectDetails(StorageService.java:1093)
        at org.jets3t.service.StorageService.getObjectDetails(StorageService.java:548)
        at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.retrieveMetadata(Jets3tNativeFileSystemStore.java:174)
        ... 19 more
hadoop@single-node:~$

Chapter 2: MapReduce: hadoop command doesn't work without class package

On page 25 of the book there is a command to execute the MaxTemperature MapReduce job:

hadoop MaxTemperature input/ncdc/sample.txt output

along with its expected log output. However, in my environment the execution result is different:

ERROR: MaxTemperature is not COMMAND nor fully qualified CLASSNAME.

After a quick look, it turns out that it works only when the class is specified with its package:

hadoop oldapi.MaxTemperature input/ncdc/sample.txt output

Not sure if this is important, but I wanted to highlight it in case somebody stumbles on it as I did.

uname -a
Darwin viacheslavt-mac.local 17.6.0 Darwin Kernel Version 17.6.0: Tue May 8 15:22:16 PDT 2018; root:xnu-4570.61.1~1/RELEASE_X86_64 x86_64
hadoop version
Hadoop 3.0.3

Error-strewn build

I've tried building with

mvn package -DskipTests -Dhadoop.version.1.1.1

on cmd.exe.

I get a slew of errors such as 'package org.apache.hadoop.io does not exist', 'package org.apache.hadoop.conf does not exist', 'package org.apache.hadoop.mapreduce does not exist'. I'm also getting numerous "cannot find symbol" errors relating to JobBuilder.java.

What could be causing this? This is frustrating as I have had numerous errors for the last 2/3 days trying to run this, many of which I have resolved, but new errors still keep cropping up.

The batch file for the script I'm using to run this is as follows:

set HADOOP_HEAPSIZE=500
set HADOOP_HOME=G:\Hadoop\hadoop-0.20.2
set HADOOP_INSTALL=G:\Hadoop\hadoop-0.20.2
set M2_HOME=G:\hadoop-book-3e-draft\apache-maven-3.0.5
set M2=%M2_HOME%\bin
set JAVA_HOME=G:\Java\jdk1.7.0_11
set PATH=%M2%;%JAVA_HOME%\bin;%HADOOP_INSTALL%\bin
mvn package -DskipTests -Dhadoop.version.1.1.1
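One detail worth double-checking, purely from the commands quoted above: the version property is written as -Dhadoop.version.1.1.1 (with a dot) rather than -Dhadoop.version=1.1.1 (with an equals sign), so Maven never actually sets hadoop.version and the POM default is used instead. Whether or not that turns out to be the root cause of the missing-package errors, the intended invocation would be:

# define the hadoop.version property explicitly
mvn package -DskipTests -Dhadoop.version=1.1.1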

Chapter 14 build failing: thrift 0.2.0 not available on Central Maven Repository

Trace:

[INFO]                                                                         
[INFO] ------------------------------------------------------------------------
[INFO] Building Chapter 13: HBase 3.0
[INFO] ------------------------------------------------------------------------
Downloading: http://repo.maven.apache.org/maven2/org/apache/hadoop/hadoop-core/0.20-append-r1056497/hadoop-core-0.20-append-r1056497.pom
[WARNING] The POM for org.apache.hadoop:hadoop-core:jar:0.20-append-r1056497 is missing, no dependency information available
Downloading: http://repo.maven.apache.org/maven2/org/apache/thrift/thrift/0.2.0/thrift-0.2.0.pom
[WARNING] The POM for org.apache.thrift:thrift:jar:0.2.0 is missing, no dependency information available
Downloading: http://repo.maven.apache.org/maven2/org/apache/thrift/thrift/0.2.0/thrift-0.2.0.jar
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO] 
[INFO] Hadoop: The Definitive Guide, Project ............. SUCCESS [0.295s]
[INFO] Common Code ....................................... SUCCESS [1.196s]
[INFO] Chapter 2: MapReduce .............................. SUCCESS [0.122s]
[INFO] Chapter 3: The Hadoop Distributed Filesystem ...... SUCCESS [12.142s]
[INFO] Chapter 4: Hadoop I/O ............................. SUCCESS [1.282s]
[INFO] Chapter 4: Hadoop I/O (Avro) ...................... SUCCESS [3.536s]
[INFO] Chapter 5: Developing a MapReduce Application ..... SUCCESS [24.197s]
[INFO] Chapter 7: MapReduce Types and Formats ............ SUCCESS [0.811s]
[INFO] Chapter 8: MapReduce Features ..................... SUCCESS [0.850s]
[INFO] Chapter 11: Pig ................................... SUCCESS [0.620s]
[INFO] Chapter 12: Hive .................................. SUCCESS [0.239s]
[INFO] Chapter 13: HBase ................................. FAILURE [0.543s]
[INFO] Chapter 14: ZooKeeper ............................. SKIPPED
[INFO] Chapter 15: Sqoop ................................. SKIPPED
[INFO] Chapter 16: Case Studies .......................... SKIPPED
[INFO] Hadoop Examples JAR ............................... SKIPPED
[INFO] Snippet testing ................................... SKIPPED
[INFO] Hadoop: The Definitive Guide, Example Code ........ SKIPPED
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 46.070s
[INFO] Finished at: Fri Jan 10 11:10:34 EST 2014
[INFO] Final Memory: 14M/81M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal on project ch13: Could not resolve dependencies for project com.hadoopbook:ch13:jar:3.0: Failure to find org.apache.thrift:thrift:jar:0.2.0 in https://repository.apache.org/content/repositories/releases/ was cached in the local repository, resolution will not be reattempted until the update interval of apache.releases has elapsed or updates are forced -> [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/DependencyResolutionException
[ERROR] 
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn <goals> -rf :ch13

I went to the Central Maven Repository to look for this package. They don't have Thrift 0.2.0; the oldest version they have is 0.9.0:

http://search.maven.org/#search%7Cga%7C1%7Cg%3A%22org.apache.thrift%22

'hadoop distcp' not working.

When running:

hadoop distcp \
  -Dfs.s3n.awsAccessKeyId='...' \
  -Dfs.s3n.awsSecretAccessKey='...' \
  s3n://hadoopbook/ncdc/all input/ncdc/all

As recommended here, from an EC2 cluster, I get the following error:

2018-01-08 19:31:57,776 INFO tools.DistCp: Input Options: DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false, ignoreFailures=false, overwrite=false, append=false, useDiff=false, useRdiff=false, fromSnapshot=null, toSnapshot=null, skipCRC=false, blocking=true, numListstatusThreads=0, maxMaps=20, mapBandwidth=0.0, copyStrategy='uniformsize', preserveStatus=[BLOCKSIZE], atomicWorkPath=null, logPath=null, sourceFileListing=null, sourcePaths=[s3n://hadoopbook/ncdc/all], targetPath=input/ncdc/all, filtersFile='null', blocksPerChunk=0, copyBufferSize=8192, verboseLog=false}, sourcePaths=[s3n://hadoopbook/ncdc/all], targetPathExists=false, preserveRawXattrsfalse
2018-01-08 19:31:57,904 INFO beanutils.FluentPropertyBeanIntrospector: Error when creating PropertyDescriptor for public final void org.apache.commons.configuration2.AbstractConfiguration.setProperty(java.lang.String,java.lang.Object)! Ignoring this property.
2018-01-08 19:31:57,934 INFO impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2018-01-08 19:31:57,989 INFO impl.MetricsSystemImpl: Scheduled Metric snapshot period at 10 second(s).
2018-01-08 19:31:57,989 INFO impl.MetricsSystemImpl: JobTracker metrics system started
2018-01-08 19:31:58,025 ERROR tools.DistCp: Exception encountered 
org.apache.hadoop.fs.UnsupportedFileSystemException: No FileSystem for scheme "s3n"
	at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:3266)
	at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3286)
	at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:123)
	at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3337)
	at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3305)
	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:476)
	at org.apache.hadoop.fs.Path.getFileSystem(Path.java:361)
	at org.apache.hadoop.tools.GlobbedCopyListing.doBuildListing(GlobbedCopyListing.java:76)
	at org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:86)
	at org.apache.hadoop.tools.DistCp.createInputFileListing(DistCp.java:368)
	at org.apache.hadoop.tools.DistCp.prepareFileListing(DistCp.java:96)
	at org.apache.hadoop.tools.DistCp.createAndSubmitJob(DistCp.java:205)
	at org.apache.hadoop.tools.DistCp.execute(DistCp.java:182)
	at org.apache.hadoop.tools.DistCp.run(DistCp.java:153)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
	at org.apache.hadoop.tools.DistCp.main(DistCp.java:432)

Is there any better documentation on how to do this?
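The "No FileSystem for scheme s3n" error on a recent Hadoop release is expected: the s3n connector was removed in Hadoop 3 in favor of s3a. A sketch of the same copy using s3a, assuming the hadoop-aws module (and its AWS SDK dependency) is on the classpath and that the book's bucket is still readable:

# s3a replaces the removed s3n scheme; credentials can also come from the
# usual AWS credential providers instead of -D properties
hadoop distcp \
  -Dfs.s3a.access.key='...' \
  -Dfs.s3a.secret.key='...' \
  s3a://hadoopbook/ncdc/all input/ncdc/all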

Building and Running the Code 2

I'm getting the following compilation errors trying to build the book's project code.

[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.1:compile (default-compile) on project common: Compilation failure
[ERROR] /home/hadoop-book-master/common/src/main/java/oldapi/NcdcStationMetadata.java:[7,8] error while writing oldapi.NcdcStationMetadata: could not create parent directories
[ERROR] -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR] mvn -rf :common

Please let me know how to fix it.

Thanks

package org.apache.hadoop.record does not exist

Hello,

I was packaging the code with Maven and ran into an issue with chapter 22's code. It reports that package org.apache.hadoop.record does not exist.

I googled this package and found that it has been deprecated in favor of Avro. Avro and the other dependencies are installed by Maven. You can see from the result below that the other compile tasks are successful.

[INFO] 53 errors
[INFO] -------------------------------------------------------------
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] Hadoop: The Definitive Guide, Project .............. SUCCESS [ 0.406 s]
[INFO] Common Code ........................................ SUCCESS [ 2.231 s]
[INFO] Chapter 2: MapReduce ............................... SUCCESS [ 0.194 s]
[INFO] Chapter 3: The Hadoop Distributed Filesystem ....... SUCCESS [ 0.398 s]
[INFO] Chapter 5: Hadoop I/O .............................. SUCCESS [ 0.376 s]
[INFO] Chapter 6: Developing a MapReduce Application ...... SUCCESS [ 0.336 s]
[INFO] Chapter 8: MapReduce Types and Formats ............. SUCCESS [ 0.651 s]
[INFO] Chapter 9: MapReduce Features ...................... SUCCESS [ 0.344 s]
[INFO] Chapter 12: Avro ................................... SUCCESS [ 5.353 s]
[INFO] Chapter 13: Parquet ................................ SUCCESS [ 1.922 s]
[INFO] Chapter 15: Sqoop .................................. SUCCESS [ 0.148 s]
[INFO] Chapter 16: Pig .................................... SUCCESS [ 0.173 s]
[INFO] Chapter 17: Hive ................................... SUCCESS [ 0.426 s]
[INFO] Chapter 18: Crunch ................................. SUCCESS [ 1.371 s]
[INFO] Chapter 19: Spark .................................. SUCCESS [ 13.280 s]
[INFO] Chapter 20: HBase .................................. SUCCESS [ 0.390 s]
[INFO] Chapter 21: ZooKeeper .............................. SUCCESS [ 0.090 s]
[INFO] Chapter 22: Case Studies ........................... FAILURE [ 0.196 s]
[INFO] Hadoop Examples JAR ................................ SKIPPED
[INFO] Snippet testing .................................... SKIPPED
[INFO] Hadoop: The Definitive Guide, Example Code ......... SKIPPED
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 28.732 s
[INFO] Finished at: 2017-07-12T10:56:53+08:00
[INFO] Final Memory: 90M/360M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.1:compile (default-compile) on project ch22-case-studies: Compilation failure: Compilation failure:
[ERROR] /usr/local/hadoop/myclass/hadoop-book/ch22-case-studies/src/main/java/fm/last/hadoop/io/records/TrackStats.java:[4,57] package org.apache.hadoop.record does not exist
[ERROR] /usr/local/hadoop/myclass/hadoop-book/ch22-case-studies/src/main/java/fm/last/hadoop/io/records/TrackStats.java:[5,53] package org.apache.hadoop.record.meta does not exist
[ERROR] /usr/local/hadoop/myclass/hadoop-book/ch22-case-studies/src/main/java/fm/last/hadoop/io/records/TrackStats.java:[6,47] package org.apache.hadoop.record.meta does not exist
[ERROR] /usr/local/hadoop/myclass/hadoop-book/ch22-case-studies/src/main/java/fm/last/hadoop/io/records/TrackStats.java:[35,46] package org.apache.hadoop.record.meta does not exist
[ERROR] /usr/local/hadoop/myclass/hadoop-book/ch22-case-studies/src/main/java/fm/last/hadoop/io/records/TrackStats.java:[38,65] package org.apache.hadoop.record.meta does not exist
[ERROR] /usr/local/hadoop/myclass/hadoop-book/ch22-case-studies/src/main/java/fm/last/hadoop/io/records/TrackStats.java:[100,55] package org.apache.hadoop.record does not exist
[ERROR] /usr/local/hadoop/myclass/hadoop-book/ch22-case-studies/src/main/java/fm/last/hadoop/io/records/TrackStats.java:[110,71] package org.apache.hadoop.record does not exist
[ERROR] /usr/local/hadoop/myclass/hadoop-book/ch22-case-studies/src/main/java/fm/last/hadoop/io/records/TrackStats.java:[120,57] package org.apache.hadoop.record does not exist
[ERROR] /usr/local/hadoop/myclass/hadoop-book/ch22-case-studies/src/main/java/fm/last/hadoop/io/records/TrackStats.java:[218,66] package org.apache.hadoop.record does not exist
[ERROR] /usr/local/hadoop/myclass/hadoop-book/ch22-case-studies/src/main/java/fm/last/hadoop/io/records/TrackStats.java:[9,57] package org.apache.hadoop.record.meta does not exist
[ERROR] /usr/local/hadoop/myclass/hadoop-book/ch22-case-studies/src/main/java/fm/last/hadoop/io/records/TrackStats.java:[10,73] package org.apache.hadoop.record.meta does not exist
[ERROR] /usr/local/hadoop/myclass/hadoop-book/ch22-case-studies/src/main/java/fm/last/hadoop/io/records/TrackStats.java:[11,69] package org.apache.hadoop.record.meta does not exist
[ERROR] /usr/local/hadoop/myclass/hadoop-book/ch22-case-studies/src/main/java/fm/last/hadoop/io/records/TrackStats.java:[12,73] package org.apache.hadoop.record.meta does not exist
[ERROR] /usr/local/hadoop/myclass/hadoop-book/ch22-case-studies/src/main/java/fm/last/hadoop/io/records/TrackStats.java:[13,74] package org.apache.hadoop.record.meta does not exist
[ERROR] /usr/local/hadoop/myclass/hadoop-book/ch22-case-studies/src/main/java/fm/last/hadoop/io/records/TrackStats.java:[14,69] package org.apache.hadoop.record.meta does not exist
[ERROR] /usr/local/hadoop/myclass/hadoop-book/ch22-case-studies/src/main/java/fm/last/hadoop/io/records/TrackStats.java:[53,53] package org.apache.hadoop.record.meta does not exist
[ERROR] /usr/local/hadoop/myclass/hadoop-book/ch22-case-studies/src/main/java/fm/last/hadoop/io/records/TrackStats.java:[56,36] package org.apache.hadoop.record.meta does not exist
[ERROR] /usr/local/hadoop/myclass/hadoop-book/ch22-case-studies/src/main/java/fm/last/hadoop/io/records/TrackStats.java:[57,55] package org.apache.hadoop.record.meta does not exist
[ERROR] /usr/local/hadoop/myclass/hadoop-book/ch22-case-studies/src/main/java/fm/last/hadoop/io/records/TrackStats.java:[60,38] package org.apache.hadoop.record.meta does not exist
[ERROR] /usr/local/hadoop/myclass/hadoop-book/ch22-case-studies/src/main/java/fm/last/hadoop/io/records/TrackStats.java:[146,58] package org.apache.hadoop.record.meta does not exist
[ERROR] /usr/local/hadoop/myclass/hadoop-book/ch22-case-studies/src/main/java/fm/last/hadoop/io/records/TrackStats.java:[146,136] package org.apache.hadoop.record.meta does not exist
[ERROR] /usr/local/hadoop/myclass/hadoop-book/ch22-case-studies/src/main/java/fm/last/hadoop/io/records/TrackStats.java:[147,38] package org.apache.hadoop.record.meta does not exist
[ERROR] /usr/local/hadoop/myclass/hadoop-book/ch22-case-studies/src/main/java/fm/last/hadoop/io/records/TrackStats.java:[226,43] package org.apache.hadoop.record does not exist
[ERROR] /usr/local/hadoop/myclass/hadoop-book/ch22-case-studies/src/main/java/fm/last/hadoop/io/records/TrackStats.java:[227,43] package org.apache.hadoop.record does not exist
[ERROR] /usr/local/hadoop/myclass/hadoop-book/ch22-case-studies/src/main/java/fm/last/hadoop/io/records/TrackStats.java:[231,43] package org.apache.hadoop.record does not exist
[ERROR] /usr/local/hadoop/myclass/hadoop-book/ch22-case-studies/src/main/java/fm/last/hadoop/io/records/TrackStats.java:[232,43] package org.apache.hadoop.record does not exist
[ERROR] /usr/local/hadoop/myclass/hadoop-book/ch22-case-studies/src/main/java/fm/last/hadoop/io/records/TrackStats.java:[236,43] package org.apache.hadoop.record does not exist
[ERROR] /usr/local/hadoop/myclass/hadoop-book/ch22-case-studies/src/main/java/fm/last/hadoop/io/records/TrackStats.java:[237,43] package org.apache.hadoop.record does not exist
[ERROR] /usr/local/hadoop/myclass/hadoop-book/ch22-case-studies/src/main/java/fm/last/hadoop/io/records/TrackStats.java:[241,43] package org.apache.hadoop.record does not exist
[ERROR] /usr/local/hadoop/myclass/hadoop-book/ch22-case-studies/src/main/java/fm/last/hadoop/io/records/TrackStats.java:[242,43] package org.apache.hadoop.record does not exist
[ERROR] /usr/local/hadoop/myclass/hadoop-book/ch22-case-studies/src/main/java/fm/last/hadoop/io/records/TrackStats.java:[246,43] package org.apache.hadoop.record does not exist
[ERROR] /usr/local/hadoop/myclass/hadoop-book/ch22-case-studies/src/main/java/fm/last/hadoop/io/records/TrackStats.java:[247,43] package org.apache.hadoop.record does not exist
[ERROR] /usr/local/hadoop/myclass/hadoop-book/ch22-case-studies/src/main/java/fm/last/hadoop/io/records/TrackStats.java:[260,44] package org.apache.hadoop.record does not exist
[ERROR] /usr/local/hadoop/myclass/hadoop-book/ch22-case-studies/src/main/java/fm/last/hadoop/io/records/TrackStats.java:[261,44] package org.apache.hadoop.record does not exist
[ERROR] /usr/local/hadoop/myclass/hadoop-book/ch22-case-studies/src/main/java/fm/last/hadoop/io/records/TrackStats.java:[265,44] package org.apache.hadoop.record does not exist
[ERROR] /usr/local/hadoop/myclass/hadoop-book/ch22-case-studies/src/main/java/fm/last/hadoop/io/records/TrackStats.java:[266,44] package org.apache.hadoop.record does not exist
[ERROR] /usr/local/hadoop/myclass/hadoop-book/ch22-case-studies/src/main/java/fm/last/hadoop/io/records/TrackStats.java:[270,44] package org.apache.hadoop.record does not exist
[ERROR] /usr/local/hadoop/myclass/hadoop-book/ch22-case-studies/src/main/java/fm/last/hadoop/io/records/TrackStats.java:[271,44] package org.apache.hadoop.record does not exist
[ERROR] /usr/local/hadoop/myclass/hadoop-book/ch22-case-studies/src/main/java/fm/last/hadoop/io/records/TrackStats.java:[275,44] package org.apache.hadoop.record does not exist
[ERROR] /usr/local/hadoop/myclass/hadoop-book/ch22-case-studies/src/main/java/fm/last/hadoop/io/records/TrackStats.java:[276,44] package org.apache.hadoop.record does not exist
[ERROR] /usr/local/hadoop/myclass/hadoop-book/ch22-case-studies/src/main/java/fm/last/hadoop/io/records/TrackStats.java:[280,44] package org.apache.hadoop.record does not exist
[ERROR] /usr/local/hadoop/myclass/hadoop-book/ch22-case-studies/src/main/java/fm/last/hadoop/io/records/TrackStats.java:[281,44] package org.apache.hadoop.record does not exist
[ERROR] /usr/local/hadoop/myclass/hadoop-book/ch22-case-studies/src/main/java/fm/last/hadoop/io/records/TrackStats.java:[285,44] package org.apache.hadoop.record does not exist
[ERROR] /usr/local/hadoop/myclass/hadoop-book/ch22-case-studies/src/main/java/fm/last/hadoop/io/records/TrackStats.java:[286,44] package org.apache.hadoop.record does not exist
[ERROR] /usr/local/hadoop/myclass/hadoop-book/ch22-case-studies/src/main/java/fm/last/hadoop/io/records/TrackStats.java:[290,44] package org.apache.hadoop.record does not exist
[ERROR] /usr/local/hadoop/myclass/hadoop-book/ch22-case-studies/src/main/java/fm/last/hadoop/io/records/TrackStats.java:[291,44] package org.apache.hadoop.record does not exist
[ERROR] /usr/local/hadoop/myclass/hadoop-book/ch22-case-studies/src/main/java/fm/last/hadoop/io/records/TrackStats.java:[295,44] package org.apache.hadoop.record does not exist
[ERROR] /usr/local/hadoop/myclass/hadoop-book/ch22-case-studies/src/main/java/fm/last/hadoop/io/records/TrackStats.java:[296,44] package org.apache.hadoop.record does not exist
[ERROR] /usr/local/hadoop/myclass/hadoop-book/ch22-case-studies/src/main/java/fm/last/hadoop/io/records/TrackStats.java:[300,44] package org.apache.hadoop.record does not exist
[ERROR] /usr/local/hadoop/myclass/hadoop-book/ch22-case-studies/src/main/java/fm/last/hadoop/io/records/TrackStats.java:[301,44] package org.apache.hadoop.record does not exist
[ERROR] /usr/local/hadoop/myclass/hadoop-book/ch22-case-studies/src/main/java/fm/last/hadoop/io/records/TrackStats.java:[305,44] package org.apache.hadoop.record does not exist
[ERROR] /usr/local/hadoop/myclass/hadoop-book/ch22-case-studies/src/main/java/fm/last/hadoop/io/records/TrackStats.java:[306,44] package org.apache.hadoop.record does not exist
[ERROR] /usr/local/hadoop/myclass/hadoop-book/ch22-case-studies/src/main/java/fm/last/hadoop/io/records/TrackStats.java:[321,29] package org.apache.hadoop.record does not exist
[ERROR] -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR] mvn -rf :ch22-case-studies

I also attach my environment variables here.
#JAVA
export JAVA_HOME=/usr/java/jdk1.8.0_131
export JRE_HOME=$JAVA_HOME/jre
export CLASSPATH=.:$CLASSPATH:$JAVA_HOME/lib:$JRE_HOME/lib
export PATH=$PATH:$JAVA_HOME/bin:$JRE_HOME/bin
#HADOOP
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
export HADOOP_CLASSPATH=/usr/local/hadoop/share/hadoop/common/hadoop-common-3.0.0-alpha3.jar:/usr/local/hadoop/myclass
#HIVE
export HIVE_INSTALL=/usr/local/hive
export PATH=$PATH:$HIVE_INSTALL/bin

Could anyone help me with this issue? Thanks
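The org.apache.hadoop.record classes (Hadoop's old record I/O framework) were deprecated in Hadoop 2 and are no longer shipped with the Hadoop 3 line that the environment above points at (hadoop-common-3.0.0-alpha3). One workaround to try, assuming only the ch22 module is affected, is to compile against a Hadoop 2 release that still contains the package:

# build against a Hadoop 2.x release; the other modules built fine either way
mvn package -DskipTests -Dhadoop.version=2.7.3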

error running HBaseTemperatureBulkImporter

Hi Tom and other awesome contributors,

I'm running one of the examples provided in this book: HBaseTemperatureBulkImporter

But I keep running into this issue; stack trace below.

What I have tried:

  1. I made sure I have permissions to this directory
  2. I started my local hadoop fs and created this directory as well.

Exception in thread "main" java.io.IOException: Mkdirs failed to create /user/stevesun/hbase-staging (exists=false, cwd=file:/Users/stevesun/personal_dev/HBaseMapReduceExample)
at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:440)
at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:426)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:906)
at org.apache.hadoop.io.SequenceFile$Writer.<init>(SequenceFile.java:1071)
at org.apache.hadoop.io.SequenceFile$RecordCompressWriter.<init>(SequenceFile.java:1371)
at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:272)
at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:294)
at org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2.writePartitions(HFileOutputFormat2.java:335)
at org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2.configurePartitioner(HFileOutputFormat2.java:596)
at org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2.configureIncrementalLoad(HFileOutputFormat2.java:440)
at org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2.configureIncrementalLoad(HFileOutputFormat2.java:405)
at org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2.configureIncrementalLoad(HFileOutputFormat2.java:367)
at com.fishercoder.hFileIntoHBase.attempt3.HBaseTemperatureBulkImporter.run(HBaseTemperatureBulkImporter.java:117)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)

Any help or pointers are greatly appreciated!
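The cwd=file:/Users/... in the exception suggests the job client is resolving /user/stevesun/hbase-staging against the local filesystem rather than HDFS, which would explain why creating the directory in HDFS didn't help. One thing to verify, as an assumption rather than a confirmed fix, is that the client JVM picks up the cluster configuration so that fs.defaultFS points at HDFS:

# make the cluster's core-site.xml/hdfs-site.xml visible to the client
# (path is illustrative; use your actual Hadoop configuration directory)
export HADOOP_CONF_DIR=/path/to/hadoop/etc/hadoop
# this should now list the directory you created in HDFS, not on the local disk
hadoop fs -ls /user/stevesun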

Can't get the HBASE part of the build to work

Trying to build on Ubuntu 12.04. Got most of the way there by installing everything, including Ant and Ivy manually. It builds the first few JAR files, but ant jar hbase generates the following:

hbase.compile:
[javac] /home/johnelle/hadoop-book/build.xml:126: warning: 'includeantruntime' was not set, defaulting to build.sysclasspath=last; set to false for repeatable builds
[javac] Compiling 5 source files to /home/johnelle/hadoop-book/build/classes
[javac] /home/johnelle/hadoop-book/ch13/src/main/java/HBaseStationCli.java:23: addColumn(byte[],byte[]) in org.apache.hadoop.hbase.client.Get cannot be applied to (byte[])
[javac] get.addColumn(INFO_COLUMNFAMILY);
[javac] ^
[javac] Note: Some input files use or override a deprecated API.
[javac] Note: Recompile with -Xlint:deprecation for details.
[javac] 1 error

Maven cannot resolve dependencies

Hi,

I've just started reading the pre-release of the third edition and wanted to run the example code, but Maven fails with the following error:

[ERROR] Failed to execute goal on project ch05: Could not resolve dependencies for project com.hadoopbook:ch05:jar:3.0: The following artifacts could not be resolved: org.apache.hadoop:hadoop-common:jar:1.0.1, org.apache.hadoop:hadoop-mapreduce-client-common:jar:1.0.1, org.apache.hadoop:hadoop-mapreduce-client-core:jar:1.0.1: Could not find artifact org.apache.hadoop:hadoop-common:jar:1.0.1 in apache.releases (https://repository.apache.org/content/repositories/releases/) -> [Help 1]

I'm not familiar with Maven, so any help would be very much appreciated!

Hadoop Pipes Fails with Authentication Error on Hadoop 0.20.2+737

The C++ example in chapter 2 of the Definitive Guide fails with an authentication error with Hadoop 0.20.2+737. Is there a configuration parameter we can set to permit this example to execute successfully on a Hadoop cluster?

    hadoop pipes -D hadoop.pipes.java.recordreader=true -D hadoop.pipes.java.recordwriter=true -input sample.txt -output output -program bin/max_temperature
    10/10/15 10:48:34 WARN mapred.JobClient: No job jar file set.  User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
    10/10/15 10:48:34 INFO mapred.FileInputFormat: Total input paths to process : 1
    10/10/15 10:48:35 INFO mapred.JobClient: Running job: job_201010121147_0019
    10/10/15 10:48:36 INFO mapred.JobClient:  map 0% reduce 0%
    10/10/15 10:48:52 INFO mapred.JobClient: Task Id : attempt_201010121147_0019_m_000001_0, Status : FAILED
    java.io.IOException
            at org.apache.hadoop.mapred.pipes.OutputHandler.waitForAuthentication(OutputHandler.java:188)
            at org.apache.hadoop.mapred.pipes.Application.waitForAuthentication(Application.java:198)
            at org.apache.hadoop.mapred.pipes.Application.<init>(Application.java:149)
            at org.apache.hadoop.mapred.pipes.PipesMapRunner.run(PipesMapRunner.java:68)
            at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:383)
            at org.apache.hadoop.mapred.MapTask.run(MapTask.java:317)
            at org.apache.hadoop.mapred.Child$4.run(Child.java:217)
            at java.security.AccessController.doPrivileged(Native Method)
            at javax.security.auth.Subject.doAs(Subject.java:396)
            at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1063)
            at org.apache.hadoop.mapred.Child.main(Child.java:211)

    attempt_201010121147_0019_m_000001_0: Server failed to authenticate. Exiting

Running the first example from chapter 2: SCDynamicStore error

I'm getting started with the book and was trying to follow the first example.

  1. Cloned the repo and at the hadoop-book dir ran the following:
     mvn package -DskipTests -Dhadoop.version=1.1.2
     Build was successful.
  2. export HADOOP_CLASSPATH=hadoop-examples.jar
  3. hadoop MaxTemperature input/ncdc/sample.txt output

Got an error: Unable to load realm mapping info from SCDynamicStore. Do you know why? This is on Mac OS X 10.7.5.
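The "Unable to load realm info from SCDynamicStore" message is a long-standing Kerberos/JVM quirk on Mac OS X rather than anything specific to the book code. A commonly cited workaround, offered here as an assumption rather than a verified fix, is to pass empty Kerberos realm settings to the Hadoop JVM:

# blank krb5 settings stop the JVM from querying SCDynamicStore on OS X
export HADOOP_OPTS="-Djava.security.krb5.realm= -Djava.security.krb5.kdc="
hadoop MaxTemperature input/ncdc/sample.txt output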

Chapter 19: Spark build failure

Hi, I have been unable to get Maven to build Chapter 19. I am getting the following message with the build error:

[ERROR] Failed to execute goal net.alchim31.maven:scala-maven-plugin:3.1.6:compile (default) on project ch19-spark: wrap: org.apache.commons.exec.ExecuteException: Process exited with an error: 1 (Exit value: 1) -> [Help 1]

The issue appears to be related to the Scala plugin. I am running macOS 10.14.4, and have Java 8 and 12 installed (OpenJDK).

I have tried removing the scala-reflect install, but that did not fix it:
rm -R /Users/Joel/.m2/repository/org/scala-lang/scala-reflect/2.10.4

Here is the full trace:

[ERROR] error: error while loading package, Missing dependency 'object java.lang.Object in compiler mirror', required by /Users/Joel/.m2/repository/org/scala-lang/scala-library/2.10.4/scala-library-2.10.4.jar(scala/package.class)
[ERROR] error: error while loading package, Missing dependency 'object java.lang.Object in compiler mirror', required by /Users/Joel/.m2/repository/org/scala-lang/scala-library/2.10.4/scala-library-2.10.4.jar(scala/runtime/package.class)
[ERROR] error: scala.reflect.internal.MissingRequirementError: object java.lang.Object in compiler mirror not found.
[ERROR] at scala.reflect.internal.MissingRequirementError$.signal(MissingRequirementError.scala:16)
[ERROR] at scala.reflect.internal.MissingRequirementError$.notFound(MissingRequirementError.scala:17)
[ERROR] at scala.reflect.internal.Mirrors$RootsBase.getModuleOrClass(Mirrors.scala:48)
[ERROR] at scala.reflect.internal.Mirrors$RootsBase.getModuleOrClass(Mirrors.scala:40)
[ERROR] at scala.reflect.internal.Mirrors$RootsBase.getModuleOrClass(Mirrors.scala:40)
[ERROR] at scala.reflect.internal.Mirrors$RootsBase.getModuleOrClass(Mirrors.scala:61)
[ERROR] at scala.reflect.internal.Mirrors$RootsBase.getClassByName(Mirrors.scala:99)
[ERROR] at scala.reflect.internal.Mirrors$RootsBase.getRequiredClass(Mirrors.scala:102)
[ERROR] at scala.reflect.internal.Definitions$DefinitionsClass.ObjectClass$lzycompute(Definitions.scala:264)
[ERROR] at scala.reflect.internal.Definitions$DefinitionsClass.ObjectClass(Definitions.scala:264)
[ERROR] at scala.reflect.internal.Definitions$DefinitionsClass.AnyRefClass$lzycompute(Definitions.scala:263)
[ERROR] at scala.reflect.internal.Definitions$DefinitionsClass.AnyRefClass(Definitions.scala:263)
[ERROR] at scala.reflect.internal.Definitions$DefinitionsClass.specialPolyClass(Definitions.scala:1120)
[ERROR] at scala.reflect.internal.Definitions$DefinitionsClass.RepeatedParamClass$lzycompute(Definitions.scala:407)
[ERROR] at scala.reflect.internal.Definitions$DefinitionsClass.RepeatedParamClass(Definitions.scala:407)
[ERROR] at scala.reflect.internal.Definitions$DefinitionsClass.syntheticCoreClasses$lzycompute(Definitions.scala:1154)
[ERROR] at scala.reflect.internal.Definitions$DefinitionsClass.syntheticCoreClasses(Definitions.scala:1152)
[ERROR] at scala.reflect.internal.Definitions$DefinitionsClass.symbolsNotPresentInBytecode$lzycompute(Definitions.scala:1196)
[ERROR] at scala.reflect.internal.Definitions$DefinitionsClass.symbolsNotPresentInBytecode(Definitions.scala:1196)
[ERROR] at scala.reflect.internal.Definitions$DefinitionsClass.init(Definitions.scala:1261)
[ERROR] at scala.tools.nsc.Global$Run.<init>(Global.scala:1290)
[ERROR] at scala.tools.nsc.Driver.doCompile(Driver.scala:32)
[ERROR] at scala.tools.nsc.Main$.doCompile(Main.scala:79)
[ERROR] at scala.tools.nsc.Driver.process(Driver.scala:54)
[ERROR] at scala.tools.nsc.Driver.main(Driver.scala:67)
[ERROR] at scala.tools.nsc.Main.main(Main.scala)
[ERROR] at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[ERROR] at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
[ERROR] at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[ERROR] at java.base/java.lang.reflect.Method.invoke(Method.java:567)
[ERROR] at scala_maven_executions.MainHelper.runMain(MainHelper.java:164)
[ERROR] at scala_maven_executions.MainWithArgsInFile.main(MainWithArgsInFile.java:26)
[ERROR]
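Scala 2.10 cannot be compiled by JDK 9 or later, and the java.base/jdk.internal frames in the trace show the build ran on the newer JDK (12) rather than Java 8; the "object java.lang.Object in compiler mirror not found" error is the classic symptom. One thing to try, assuming the JDK version really is the cause, is to force the build onto the Java 8 installation:

# macOS-specific way of selecting the Java 8 JDK for this shell
export JAVA_HOME=$(/usr/libexec/java_home -v 1.8)
mvn package -DskipTests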

No lib directory

I pulled down the latest code for the book and discovered there is no lib/ directory. I originally got a copy of the code back on Dec. 2, and at the time I thought this directory existed. I don't recall running Maven to build the examples; if I'm incorrect about that, my apologies.

At any rate, I got the most recent copy of the code and I'm trying to build the examples. I'm using the command you suggested in a previous comment, mvn package -DskipTests -Dhadoop.version=0.23.5, but I'm getting a large number of errors. I can run jobs in this environment, so I'm a little puzzled as to why I cannot build the examples with this version of Hadoop.

Back to the lib/ thing for a second: in December I was able to build the examples in any given chapter with IntelliJ as long as the lib directory and its contents were available, but now that they're not, I'm somewhat stuck.
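The Maven build resolves dependencies from the local repository instead of a checked-in lib/ directory. If a directory of jars is still wanted for an IDE, the standard Maven dependency plugin can materialize one; a sketch, assuming the plugin defaults are acceptable:

# copies each module's dependency jars into its target/dependency directory
mvn package -DskipTests
mvn dependency:copy-dependencies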

mvn eclipse:eclipse failed

Dear Tom,

I have been learning and experimenting with Hadoop through your great book. I tried to build the Maven project using the command "mvn package -DskipTests -Dhadoop.version=1.0.4", and it was SUCCESSFUL. However, when I tried to create Eclipse projects from the Maven projects through the Maven Eclipse plugin, using the command
"mvn eclipse:eclipse -DskipTests -Dhadoop.version=1.0.4", I received a failure message, as below:

[ERROR] Failed to execute goal on project ch04: Could not resolve dependencies for project com.hadoopbook:ch04:jar:3.0: Failure to find com.hadoopbook:ch02:jar:3.0 in https://repository.apache.org/content/repositories/releases/ was cached in the local repository, resolution will not be reattempted until the update interval of apache.releases has elapsed or updates are forced -> [Help 1]

It looks like the "ch02" project was not established so that it could be used by Ch04 project. I am wondering if some additional configuration I need to set up at Eclipse side? Could you please give me some advice on this?

thanks a lot
Licheng
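mvn eclipse:eclipse resolves inter-module dependencies such as ch02 from the local repository rather than from the current reactor, so a usual remedy, offered as a suggestion rather than a confirmed fix, is to install the modules locally first and then generate the Eclipse projects:

# install puts ch02 (and the other modules) into ~/.m2 so eclipse:eclipse can find them
mvn install -DskipTests -Dhadoop.version=1.0.4
mvn eclipse:eclipse -DskipTests -Dhadoop.version=1.0.4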
