Hi, all.
Here’s my question on Stack Overflow:
I’m trying to figure out the HDP-friendly way of adding an HBase sink for Flume.
TIA,
Ali
We are trying to run a Sqoop import from a shell script in Oozie via YARN. We are running HDP 2.3.0. The Sqoop job runs fine from the command prompt, but through Oozie it fails because the job.splitmetainfo file is not found. I'm guessing there is some kind of misconfiguration somewhere, since Sqoop runs fine from the command prompt. Any pointers would be much appreciated.
2015-08-06 05:13:39,851 INFO [Thread-53] org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator: Setting job diagnostics to Job init failed : org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.io.FileNotFoundException: File does not exist: hdfs://xxx/user/root/.staging/job_1438837181977_0008/job.splitmetainfo
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.createSplits(JobImpl.java:1568)
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.transition(JobImpl.java:1432)
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.transition(JobImpl.java:1390)
at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:996)
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:138)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1312)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceStart(MRAppMaster.java:1080)
at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$4.run(MRAppMaster.java:1519)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1515)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1448)
Caused by: java.io.FileNotFoundException: File does not exist: hdfs://ixxx/user/root/.staging/job_1438837181977_0008/job.splitmetainfo
at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1309)
at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1301)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1301)
at org.apache.hadoop.mapreduce.split.SplitMetaInfoReader.readSplitMetaInfo(SplitMetaInfoReader.java:51)
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.createSplits(JobImpl.java:1563)
… 17 more
The root cause was because of the number of CPUs on this machine (80 CPUs). The Ambari webserver hangs when the CPU count is too high – presumably because it is doing some sort of a thread calculation based on the CPU count, and choking itself to death. I disabled 60 of the 80 CPUs, bounced Ambari, and then it started ok.
Unfortunately not yet. Waiting for help…
Varugis,
Do you know what the URL for the VirtualBox sandbox is?
Hi,
is anyone else having issues with the HDP 2.3 sandbox and Thrift? Everything was fine with the same setup on 2.2.
After setting up the latest VM, Thrift eventually crashes with the message below. The load doesn't seem to matter; it seems like things start happening after a certain number of requests.
Any feedback greatly appreciated.
2015-08-06 17:36:56,190 INFO [ConnectionCache_ChoreService_1] client.ConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x14f02cf088b2dec
2015-08-06 17:36:56,193 INFO [ConnectionCache_ChoreService_1] zookeeper.ZooKeeper: Session: 0x14f02cf088b2dec closed
2015-08-06 17:36:56,193 INFO [thrift-worker-0-EventThread] zookeeper.ClientCnxn: EventThread shut down
2015-08-06 17:36:56,566 ERROR [thrift-worker-1] client.AsyncProcess: Failed to get region location
java.io.IOException: hconnection-0xad042b2 closed
at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1146)
at org.apache.hadoop.hbase.client.AsyncProcess.submit(AsyncProcess.java:370)
at org.apache.hadoop.hbase.client.AsyncProcess.submit(AsyncProcess.java:321)
I had the same issue and was able to fix it. I noticed that the ambari-server disk was 100% full; after reclaiming the space and restarting the Ambari server, the PostgreSQL server was back to operational and I was able to log in to the Ambari UI.
Thanks, Venkat
I found the URL… I tried it with wget and still got the same thing.
Hortonworks has a white paper on Embarcadero's ER/Studio site stating that it's possible to model data in Hive using ER/Studio. It doesn't say how to make the connection. Does anyone know which versions of HDP ER/Studio can connect to… any of the local "virtual" versions? If so, how is the connection string constructed?
Thanks.
So we have established:
a) It is not a browser issue and, since there is no proxy, not a MITM issue
b) Did you run wget with --debug and --verbose (I think those are the flags) to see what debugging output was emitted, or at least what the HTTP error was?
c) Do you have any client-side settings that terminate long-running connections?
d) It might be worth trying on a different client, like a Linux box, to see if it is Windows-specific
To my knowledge there are no client settings which are terminating long connections.
Wget threw a 206 Partial Content error, then retried 20 times to reconnect, and failed.
I'm pretty s.o.l. when it comes to another box; I only have a Windows computer. I thought about dual booting, but I'll find some other way to get this sandbox before going that route.
Frankly this seems like a problem that the Hortonworks people need to look into. The only other suggestion I have is to use range headers and then cat the two parts of the file together:
curl --header "Range: bytes=0-x" URL -o part1
curl --header "Range: bytes=x+1-n" URL -o part2
cat part1 part2 > sandbox
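The same split-and-reassemble idea, demonstrated locally with dd standing in for the two curl range requests (the byte offsets here are made up for illustration; the curl calls above would fetch the real halves over HTTP):

```shell
# Create a sample "download", split it into two byte ranges, and reassemble.
printf '0123456789' > original
dd if=original of=part1 bs=1 count=5 2>/dev/null   # bytes 0-4
dd if=original of=part2 bs=1 skip=5  2>/dev/null   # bytes 5-9
cat part1 part2 > reassembled
cmp original reassembled && echo "parts reassemble cleanly"
```

As long as the two ranges are adjacent and non-overlapping, the concatenation is byte-identical to the original.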
BTW, why isn't anyone from Hortonworks active on these threads? Do you even care about people trying to use your product?
Hello Ali,
It's totally possible to install HDP 2.3 on a single node; I usually do it in Docker on CentOS 6 in order to test new stuff.
I also managed the migration from HDP 2.2 to HDP 2.3 on a single node and on a multi-node (Docker) cluster, with Kerberos included.
However, I did a manual upgrade; I didn't use the Ambari upgrade manager.
What kind of problem did you have during this operation?
Ali,
I can confirm that there are no problems installing HDP 2.3 from scratch on CentOS 6.6.
Here http://ihorbobak.com/index.php/2015/05/06/installing-hadoop-using-ambari-server/ you can find my detailed tutorial on how to do this.
It was written for HDP 2.2, but it is fully applicable to HDP 2.3; I did it myself, and everything works fine.
As to upgrades, I cannot comment because I didn’t do them.
Regards,
Ihor.
Hi
I have a small confusion regarding checksum verification. Let's say I have a file abc.txt and I transferred this file to HDFS. How do I ensure data integrity?
I followed the steps below to check that the file was transferred correctly.
On Local File System:
md5sum abc.txt
276fb620d097728ba1983928935d6121 TestFile
On Hadoop Cluster :
hadoop fs -checksum /abc.txt
/abc.txt MD5-of-0MD5-of-512CRC32C 000002000000000000000000911156a9cf0d906c56db7c8141320df0
The two outputs look different to me. Let me know if I am doing anything wrong.
How do I verify that my file was transferred properly into HDFS?
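For what it's worth, `hadoop fs -checksum` reports an MD5-of-MD5s-of-block-CRC32Cs (as the `MD5-of-0MD5-of-512CRC32C` label in the output hints), not a plain file-level MD5, so it will never match a local `md5sum` directly. One common alternative is a round-trip check: copy the file back out of HDFS and compare local digests. A sketch (the hadoop commands assume `hadoop` is on the PATH and the file exists; the comparison logic is shown with a local copy):

```shell
# Round-trip check against HDFS (commented; needs a live cluster):
#   hadoop fs -get /abc.txt abc_from_hdfs.txt
#   md5sum abc.txt abc_from_hdfs.txt   # the two digests should match
# The comparison itself, demonstrated with a local copy:
echo "test data" > abc.txt
cp abc.txt abc_copy.txt
a=$(md5sum abc.txt | awk '{print $1}')
b=$(md5sum abc_copy.txt | awk '{print $1}')
[ "$a" = "$b" ] && echo "checksums match" || echo "checksums MISMATCH"
```

This costs a full read of the file, but it is an end-to-end integrity check that doesn't depend on how HDFS computes its internal checksum.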
Thanks
Hello,
I installed the latest release of HDP (2.3) with Ambari 2.1 on a virtual cluster. The installation completed successfully, but when I start Ambari I have a permanent alert on the Ambari Metrics Service:
Metrics Collector Process
Connection failed: [Errno 111] Connection refused to 0.0.0.0:6188
With HDP 2.2 I didn't have this problem.
Do you have any idea?
Thank you for your help
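A first diagnostic sketch for a "Connection refused" alert like this: check whether anything is actually listening on the Metrics Collector port (6188) on the collector host. The alert's `0.0.0.0` target suggests the collector hostname may not be resolving in the alert configuration, so probe the real host too (the hostname below is a placeholder):

```shell
# Probe a TCP port using bash's /dev/tcp redirection.
check_port() {
  local host=$1 port=$2
  if timeout 2 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
    echo "open"
  else
    echo "closed"
  fi
}
# check_port metrics-collector.example.com 6188   # placeholder hostname
check_port 127.0.0.1 1   # port 1 is almost certainly closed -> "closed"
```

If the port is closed on the collector host itself, the collector process is the problem (check its logs); if it is open there but the alert still fires, the alert is probing the wrong address.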
I notice in the Cloudbreak docs for HDP 2.3 that SoftLayer is not currently listed as a supported provider.
Has anyone got Cloudbreak working with SoftLayer? Or is it a case of having to use the SDK to customize?
Can you please share your hive-site.xml and hiveserver2-site.xml settings for the hive.security.* parameters?
Just before leaving on vacation, I managed to find the solution.
I'm not totally sure, but try this:
Put * as the value of the hadoop.proxyuser.hcat.groups and hadoop.proxyuser.hcat.hosts properties in HDFS's core-site.xml.
Restart all the affected services, and normally that fixes it.
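A sketch of what those core-site.xml entries would look like (the property names are the ones named above; the `*` wildcard is what this post suggests, and you would want to tighten it to specific hosts and groups in production):

```xml
<property>
  <name>hadoop.proxyuser.hcat.groups</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.hcat.hosts</name>
  <value>*</value>
</property>
```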
Using HDP 2.2 Sandbox on the Ambari Dashboard I notice that HDFS Disk Usage circle was red and the usage was 95%. I don’t have all that much data. So I set out to find out what was using it all and I found:
$ sudo -u hdfs hadoop fs -du /
7497031      /app-logs
229265468    /apps
881591       /demo
413489640    /hdp
0            /mapred
0            /mr-history
20616255563  /ranger
0            /system
4599988      /tmp
239285045    /user
Ranger? It looks like it’s all audit files.
$ sudo -u hdfs hadoop fs -du /ranger/audit/
274172       /ranger/audit/hbaseMaster
3027170361   /ranger/audit/hbaseRegional
17578061921  /ranger/audit/hdfs
0            /ranger/audit/knox
0            /ranger/audit/storm
Why so many?