Hadoop: …be replicated to 0 nodes instead of minReplication (=1). There are 1 datanode(s) running and no node(s) are excluded in this operation

This error comes from the HDFS block replication system: it could not manage to make any copy of a specific block of the file in question. Common reasons for that: only a NameNode instance is running and it is not in safe mode; there are no DataNode instances up and running, or some are dead. … Read more
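
If you hit this error, a quick first step is to check whether the NameNode actually sees any live DataNodes and whether it is stuck in safe mode. The dfsadmin commands below are standard HDFS administration tools:

# Show live/dead DataNodes and per-node capacity as the NameNode sees them
hdfs dfsadmin -report

# Check whether the NameNode is in safe mode; leave it if it is stuck there
hdfs dfsadmin -safemode get
hdfs dfsadmin -safemode leave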

hdfs dfs -mkdir, No such file or directory

This happens because the parent directories do not exist yet. Try hdfs dfs -mkdir -p /user/Hadoop/twitter_data. The -p flag tells HDFS to create every nonexistent directory leading up to the given path as well. As for the question you posed in the comments, simply type into your browser http://<host name of the … Read more
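
As a quick sketch of the difference (assuming /user/Hadoop does not exist yet):

# Without -p this fails, because the parent /user/Hadoop is missing:
hdfs dfs -mkdir /user/Hadoop/twitter_data

# With -p the missing parents are created along the way:
hdfs dfs -mkdir -p /user/Hadoop/twitter_data

# Verify the result:
hdfs dfs -ls /user/Hadoop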

How to find the size of an HDFS file

I also find myself using hadoop fs -dus <path> a great deal. For example, if a directory on HDFS named “/user/frylock/input” contains 100 files and you need the total size of all of those files, you could run: hadoop fs -dus /user/frylock/input and you would get back the total size (in bytes) of all of … Read more
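
Note that on Hadoop 2.x and later, -dus is deprecated in favor of -du -s; a rough modern equivalent, with -h added for human-readable units, would be:

# Total size of everything under the directory, in bytes
hadoop fs -du -s /user/frylock/input

# Same total, but printed in human-readable units (K, M, G)
hadoop fs -du -s -h /user/frylock/input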

Where does HDFS store files locally by default?

You need to look in your hdfs-default.xml configuration file for the dfs.data.dir setting. The default value is ${hadoop.tmp.dir}/dfs/data, and note that ${hadoop.tmp.dir} itself is defined in core-default.xml. The description for this setting is: “Determines where on the local filesystem a DFS data node should store its blocks.” … Read more
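
If you want the blocks somewhere other than the default temp location, a minimal hdfs-site.xml override might look like the sketch below; the /data/1 and /data/2 paths are placeholders, and note that on Hadoop 2.x and later the property is named dfs.datanode.data.dir, while dfs.data.dir is the older 1.x name:

<!-- hdfs-site.xml: comma-separated list of local directories for DataNode block storage -->
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/data/1/dfs/data,/data/2/dfs/data</value>
</property>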