Solution to Assignment 1, Task 3
Name: Nazeem Fazil Halilur Rahman
Student Number: 7083178

// Export the Hadoop classpath to HADOOP_CLASSPATH:
bigdata@bigdata-VirtualBox:~/solution3$ export HADOOP_CLASSPATH=$($HADOOP_HOME/bin/hadoop classpath)

// Echo the content of $HADOOP_CLASSPATH:
bigdata@bigdata-VirtualBox:~/solution3$ echo $HADOOP_CLASSPATH
/home/nfhr867/hadoop-3.3.5//etc/hadoop:/home/nfhr867/hadoop-3.3.5//share/hadoop/common/lib/*:/home/nfhr867/hadoop-3.3.5//share/hadoop/common/*:/home/nfhr867/hadoop-3.3.5//share/hadoop/hdfs:/home/nfhr867/hadoop-3.3.5//share/hadoop/hdfs/lib/*:/home/nfhr867/hadoop-3.3.5//share/hadoop/hdfs/*:/home/nfhr867/hadoop-3.3.5//share/hadoop/mapreduce/*:/home/nfhr867/hadoop-3.3.5//share/hadoop/yarn:/home/nfhr867/hadoop-3.3.5//share/hadoop/yarn/lib/*:/home/nfhr867/hadoop-3.3.5//share/hadoop/yarn/*

// Compile solution3.java with the specified classpath:
bigdata@bigdata-VirtualBox:~/solution3$ javac solution3.java -cp $HADOOP_CLASSPATH

// List the directory to confirm that the class files were generated:
bigdata@bigdata-VirtualBox:~/solution3$ ls
grep.txt solution3.class 'solution3$solution3Mapper.class' solution3.java 'solution3$solution3Reducer.class' solution3.txt

// Generate the jar file:
bigdata@bigdata-VirtualBox:~/solution3$ jar cvf solution3.jar solution3*class
added manifest
adding: solution3$solution3Mapper.class(in = 2026) (out= 958)(deflated 52%)
adding: solution3$solution3Reducer.class(in = 1748) (out= 737)(deflated 57%)
adding: solution3.class(in = 1504) (out= 813)(deflated 45%)

// Upload the grep.txt input file to HDFS:
bigdata@bigdata-VirtualBox:~/solution3$ hdfs dfs -mkdir /solution3Input
bigdata@bigdata-VirtualBox:~/solution3$ hdfs dfs -put grep.txt /solution3Input
bigdata@bigdata-VirtualBox:~/solution3$ hdfs dfs -ls /solution3Input
Found 1 items
-rw-r--r-- 1 nfhr867 supergroup 91 2023-04-28 23:55 /solution3Input/grep.txt

// Run the application:
bigdata@bigdata-VirtualBox:~/solution3$ hadoop jar solution3.jar solution3 /solution3Input/grep.txt /solution3Output
2023-04-28 23:57:18,839 INFO client.DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at /0.0.0.0:8032
2023-04-28 23:57:19,096 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
2023-04-28 23:57:19,109 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/nfhr867/.staging/job_1682352643655_0031
2023-04-28 23:57:19,305 INFO input.FileInputFormat: Total input files to process : 1
2023-04-28 23:57:19,755 INFO mapreduce.JobSubmitter: number of splits:1
2023-04-28 23:57:19,858 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1682352643655_0031
2023-04-28 23:57:19,858 INFO mapreduce.JobSubmitter: Executing with tokens: []
2023-04-28 23:57:20,007 INFO conf.Configuration: resource-types.xml not found
2023-04-28 23:57:20,007 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
2023-04-28 23:57:20,052 INFO impl.YarnClientImpl: Submitted application application_1682352643655_0031
2023-04-28 23:57:20,080 INFO mapreduce.Job: The url to track the job: http://nfhr867-VirtualBox:8088/proxy/application_1682352643655_0031/
2023-04-28 23:57:20,081 INFO mapreduce.Job: Running job: job_1682352643655_0031
2023-04-28 23:57:25,134 INFO mapreduce.Job: Job job_1682352643655_0031 running in uber mode : false
2023-04-28 23:57:25,135 INFO mapreduce.Job: map 0% reduce 0%
2023-04-28 23:57:29,189 INFO mapreduce.Job: map 100% reduce 0%
2023-04-28 23:57:33,210 INFO mapreduce.Job: map 100% reduce 100%
2023-04-28 23:57:33,218 INFO mapreduce.Job: Job job_1682352643655_0031 completed successfully
2023-04-28 23:57:33,287 INFO mapreduce.Job: Counters: 54
    File System Counters
        FILE: Number of bytes read=73
        FILE: Number of bytes written=552927
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=201
        HDFS: Number of bytes written=47
        HDFS: Number of read operations=8
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
        HDFS: Number of bytes read erasure-coded=0
    Job Counters
        Launched map tasks=1
        Launched reduce tasks=1
        Data-local map tasks=1
        Total time spent by all maps in occupied slots (ms)=1623
        Total time spent by all reduces in occupied slots (ms)=1549
        Total time spent by all map tasks (ms)=1623
        Total time spent by all reduce tasks (ms)=1549
        Total vcore-milliseconds taken by all map tasks=1623
        Total vcore-milliseconds taken by all reduce tasks=1549
        Total megabyte-milliseconds taken by all map tasks=1661952
        Total megabyte-milliseconds taken by all reduce tasks=1586176
    Map-Reduce Framework
        Map input records=12
        Map output records=13
        Map output bytes=153
        Map output materialized bytes=73
        Input split bytes=110
        Combine input records=13
        Combine output records=5
        Reduce input groups=5
        Reduce shuffle bytes=73
        Reduce input records=5
        Reduce output records=5
        Spilled Records=10
        Shuffled Maps =1
        Failed Shuffles=0
        Merged Map outputs=1
        GC time elapsed (ms)=73
        CPU time spent (ms)=760
        Physical memory (bytes) snapshot=504819712
        Virtual memory (bytes) snapshot=5109510144
        Total committed heap usage (bytes)=395313152
        Peak Map Physical memory (bytes)=259440640
        Peak Map Virtual memory (bytes)=2551484416
        Peak Reduce Physical memory (bytes)=245379072
        Peak Reduce Virtual memory (bytes)=2558025728
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=91
    File Output Format Counters
        Bytes Written=47

// Output of the job:
bigdata@bigdata-VirtualBox:~/solution3$ hdfs dfs -cat /solution3Output/part-r-00000
XX long	1
long	2
medium	3
short	3
very short	4

// The job ran successfully; its output is shown above.

// Input grep.txt file:
h ha a hellowww helqqq hello1234smallfast haba b bigyellow bluehigh 234efrcwet bklue small
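
// Sketch of the word-length classification that the output above suggests.
// This is NOT the submitted solution3.java and does not use Hadoop: it is a plain-Java
// illustration of "map each word to a category, then sum per category".
// The class name WordLengthSketch and the length thresholds (3, 5, 8, 15) are assumptions,
// chosen only because they reproduce the counts shown above for grep.txt; the real
// boundaries are defined in solution3.java and may differ.

```java
import java.util.Map;
import java.util.TreeMap;

// Plain-Java sketch; hypothetical names and thresholds, not the submitted Hadoop job.
class WordLengthSketch {

    // Map step: assign a word to a length category.
    // The thresholds are assumptions that happen to match the run above.
    static String categorize(String word) {
        int n = word.length();
        if (n <= 3) return "very short";
        if (n <= 5) return "short";
        if (n <= 8) return "medium";
        if (n <= 15) return "long";
        return "XX long";
    }

    // Reduce step: sum the number of words per category.
    // A TreeMap keeps the keys sorted, similar to the sorted reducer output.
    static Map<String, Integer> countCategories(String text) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String word : text.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // skip the empty token produced by blank input
            }
            counts.merge(categorize(word), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // The words of grep.txt as listed above (whitespace-separated).
        String input = "h ha a hellowww helqqq hello1234smallfast haba b "
                + "bigyellow bluehigh 234efrcwet bklue small";
        // Print "category<TAB>count", one category per line.
        countCategories(input).forEach((k, v) -> System.out.println(k + "\t" + v));
    }
}
```

// With these assumed thresholds the sketch counts 1 "XX long", 2 "long", 3 "medium",
// 3 "short" and 4 "very short" words, which agrees with part-r-00000 above.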