Hi, all.
Here’s my question on Stack Overflow:
I’m trying to figure out the HDP-friendly way of adding an HBase sink for Flume.
TIA,
Ali
We are trying to run a Sqoop import from a shell script in Oozie via YARN. We are running HDP 2.3.0. The Sqoop job runs fine from the command prompt, but through Oozie it fails because the job.splitmetainfo file is not found. I'm guessing there is some kind of misconfiguration somewhere, since Sqoop runs fine from the command prompt. Any pointers would be much appreciated.
2015-08-06 05:13:39,851 INFO [Thread-53] org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator: Setting job diagnostics to Job init failed : org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.io.FileNotFoundException: File does not exist: hdfs://xxx/user/root/.staging/job_1438837181977_0008/job.splitmetainfo
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.createSplits(JobImpl.java:1568)
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.transition(JobImpl.java:1432)
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.transition(JobImpl.java:1390)
at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:996)
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:138)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1312)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceStart(MRAppMaster.java:1080)
at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$4.run(MRAppMaster.java:1519)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1515)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1448)
Caused by: java.io.FileNotFoundException: File does not exist: hdfs://ixxx/user/root/.staging/job_1438837181977_0008/job.splitmetainfo
at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1309)
at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1301)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1301)
at org.apache.hadoop.mapreduce.split.SplitMetaInfoReader.readSplitMetaInfo(SplitMetaInfoReader.java:51)
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.createSplits(JobImpl.java:1563)
… 17 more
The root cause was because of the number of CPUs on this machine (80 CPUs). The Ambari webserver hangs when the CPU count is too high – presumably because it is doing some sort of a thread calculation based on the CPU count, and choking itself to death. I disabled 60 of the 80 CPUs, bounced Ambari, and then it started ok.
Unfortunately not yet. Waiting for help…
Varugis,
Do you know what the URL for the VirtualBox sandbox is?
Hi,
is anyone else having issues with the HDP 2.3 sandbox and Thrift? Everything was fine with the same setup on 2.2.
After setting up the latest VM, Thrift eventually crashes with the message below. The load doesn't seem to matter; it seems like things start happening after a certain number of requests.
Any feedback greatly appreciated.
2015-08-06 17:36:56,190 INFO [ConnectionCache_ChoreService_1] client.ConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x14f02cf088b2dec
2015-08-06 17:36:56,193 INFO [ConnectionCache_ChoreService_1] zookeeper.ZooKeeper: Session: 0x14f02cf088b2dec closed
2015-08-06 17:36:56,193 INFO [thrift-worker-0-EventThread] zookeeper.ClientCnxn: EventThread shut down
2015-08-06 17:36:56,566 ERROR [thrift-worker-1] client.AsyncProcess: Failed to get region location
java.io.IOException: hconnection-0xad042b2 closed
at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1146)
at org.apache.hadoop.hbase.client.AsyncProcess.submit(AsyncProcess.java:370)
at org.apache.hadoop.hbase.client.AsyncProcess.submit(AsyncProcess.java:321)
I had the same issue and was able to fix it. I noticed that the ambari-server disk was 100% full; after reclaiming the space and restarting the Ambari server, the PostgreSQL server was back to operational and I was able to log in to the Ambari UI.
Thanks, Venkat
I found the URL… I tried it with wget and still got the same thing.
Hortonworks has a white paper on Embarcadero's ER/Studio site stating that it's possible to model data in Hive using ER/Studio. It doesn't say how to make the connection. Does anyone know which versions of HDP ER/Studio can connect to… any of the local "virtual" versions? If so, how is the connection string constructed?
Thanks.
So we have established:
a) It is not a browser issue and, since there is no proxy, not a MITM issue
b) Did you run wget with --debug and --verbose (I think those are the flags) to see what debugging output was emitted, or at least what the HTTP error was?
c) Do you have any client-side settings that terminate long-running connections?
d) It might be worth trying on a different client, like a Linux box, to see if it is Windows-specific
To my knowledge there are no client settings which are terminating long connections.
Wget threw a 206 Partial Content error, then retried 20 times to reconnect, and failed.
I'm pretty s.o.l. when it comes to another box; I only have a Windows computer. I thought about dual booting, but I'll find some other way to get this sandbox before going that route.
Frankly this seems like a problem that the Hortonworks people need to look into. The only other suggestion I have is to use range headers and then cat the two parts of the file together:
curl --header "Range: bytes=0-x" URL -o part1
curl --header "Range: bytes=x+1-n" URL -o part2
cat part1 part2 > sandbox
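The same split-and-reassemble idea, demonstrated locally with dd standing in for the two curl range requests (the byte offsets here are made up for illustration; the curl calls above would fetch the real halves over HTTP):

```shell
# Create a sample "download", split it into two byte ranges, and reassemble.
printf '0123456789' > original
dd if=original of=part1 bs=1 count=5 2>/dev/null   # bytes 0-4
dd if=original of=part2 bs=1 skip=5  2>/dev/null   # bytes 5-9
cat part1 part2 > reassembled
cmp original reassembled && echo "parts reassemble cleanly"
```

As long as the two ranges are adjacent and non-overlapping, the concatenation is byte-identical to the original.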
BTW, why isn't anyone from Hortonworks active on these threads? Do you even care about people trying to use your product?
Hello Ali,
It's totally possible to install HDP 2.3 on a single node; I usually do it in Docker on CentOS 6 in order to test new stuff.
I also managed the migration from HDP 2.2 to HDP 2.3 on a single node and on a multi-node (Docker) cluster, with Kerberos included.
However, I did a manual upgrade; I didn't use the Ambari upgrade manager.
What kind of problem did you have during this operation?
Ali,
I can confirm that there are no problems installing HDP 2.3 from scratch on CentOS 6.6.
Here http://ihorbobak.com/index.php/2015/05/06/installing-hadoop-using-ambari-server/ you can find my detailed tutorial on how to do this.
It was written for HDP 2.2, but it is fully applicable to HDP 2.3; I did it myself, and everything works fine.
As to upgrades, I cannot comment because I didn’t do them.
Regards,
Ihor.
Hi
I have a small confusion regarding checksum verification. Let's say I have a file abc.txt and I transferred this file to HDFS. How do I ensure data integrity?
I followed the steps below to check that the file was transferred correctly.
On Local File System:
md5sum abc.txt
276fb620d097728ba1983928935d6121 TestFile
On Hadoop Cluster :
hadoop fs -checksum /abc.txt
/abc.txt MD5-of-0MD5-of-512CRC32C 000002000000000000000000911156a9cf0d906c56db7c8141320df0
The two outputs look different to me. Let me know if I am doing anything wrong.
How do I verify that my file was transferred properly into HDFS?
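For what it's worth, `hadoop fs -checksum` reports an MD5-of-MD5s-of-block-CRC32Cs (as the `MD5-of-0MD5-of-512CRC32C` label in the output hints), not a plain file-level MD5, so it will never match a local `md5sum` directly. One common alternative is a round-trip check: copy the file back out of HDFS and compare local digests. A sketch (the hadoop commands assume `hadoop` is on the PATH and the file exists; the comparison logic is shown with a local copy):

```shell
# Round-trip check against HDFS (commented; needs a live cluster):
#   hadoop fs -get /abc.txt abc_from_hdfs.txt
#   md5sum abc.txt abc_from_hdfs.txt   # the two digests should match
# The comparison itself, demonstrated with a local copy:
echo "test data" > abc.txt
cp abc.txt abc_copy.txt
a=$(md5sum abc.txt | awk '{print $1}')
b=$(md5sum abc_copy.txt | awk '{print $1}')
[ "$a" = "$b" ] && echo "checksums match" || echo "checksums MISMATCH"
```

This costs a full read of the file, but it is an end-to-end integrity check that doesn't depend on how HDFS computes its internal checksum.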
Thanks
Hello,
I installed the latest release of HDP (2.3) with Ambari 2.1 on a virtual cluster. The installation completed successfully, but when I start Ambari I have a permanent alert on the Ambari Metrics Service:
Metrics Collector Process
Connection failed: [Errno 111] Connection refused to 0.0.0.0:6188
With HDP 2.2 I didn't have this problem.
Do you have any idea?
Thank you for your help
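A first diagnostic sketch for a "Connection refused" alert like this: check whether anything is actually listening on the Metrics Collector port (6188) on the collector host. The alert's `0.0.0.0` target suggests the collector hostname may not be resolving in the alert configuration, so probe the real host too (the hostname below is a placeholder):

```shell
# Probe a TCP port using bash's /dev/tcp redirection.
check_port() {
  local host=$1 port=$2
  if timeout 2 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
    echo "open"
  else
    echo "closed"
  fi
}
# check_port metrics-collector.example.com 6188   # placeholder hostname
check_port 127.0.0.1 1   # port 1 is almost certainly closed -> "closed"
```

If the port is closed on the collector host itself, the collector process is the problem (check its logs); if it is open there but the alert still fires, the alert is probing the wrong address.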
I notice in the Cloudbreak docs for HDP 2.3 that SoftLayer is not currently listed as a supported provider.
Has anyone got Cloudbreak working with SoftLayer? Or is it a case of having to use the SDK to customize?
Can you please share your hive-site.xml and hiveserver2-site.xml settings for the hive.security.* parameters?
Just before leaving on vacation, I managed to find the solution.
I'm not totally sure, but try this:
Put * as the value of the hadoop.proxyuser.hcat.groups and hadoop.proxyuser.hcat.hosts properties in HDFS's core-site.xml.
Restart all the affected services, and normally that fixes it.
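A sketch of what those core-site.xml entries would look like (the property names are the ones named above; the `*` wildcard is what this post suggests, and you would want to tighten it to specific hosts and groups in production):

```xml
<property>
  <name>hadoop.proxyuser.hcat.groups</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.hcat.hosts</name>
  <value>*</value>
</property>
```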
Using HDP 2.2 Sandbox on the Ambari Dashboard I notice that HDFS Disk Usage circle was red and the usage was 95%. I don’t have all that much data. So I set out to find out what was using it all and I found:
$ sudo -u hdfs hadoop fs -du /
7497031      /app-logs
229265468    /apps
881591       /demo
413489640    /hdp
0            /mapred
0            /mr-history
20616255563  /ranger
0            /system
4599988      /tmp
239285045    /user
Ranger? It looks like it’s all audit files.
$ sudo -u hdfs hadoop fs -du /ranger/audit/
274172       /ranger/audit/hbaseMaster
3027170361   /ranger/audit/hbaseRegional
17578061921  /ranger/audit/hdfs
0            /ranger/audit/knox
0            /ranger/audit/storm
Why so many?