Point out the wrong statement.

A. the mapper outputs are sorted and then partitioned per reducer
B. the total number of partitions is the same as the number of reduce tasks for the job
C. the intermediate, sorted outputs are always stored in a simple (key-len, key, value-len, value) format
D. none of the mentioned

The correct answer is: D. none of the mentioned.

All three statements come almost word for word from the Mapper section of the Apache Hadoop MapReduce tutorial, so none of them is wrong. In particular, option C is not the false one: the tutorial states that the intermediate, sorted outputs are always stored in a simple (key-len, key, value-len, value) format, and that applications can control whether, and how, those intermediate outputs are compressed. They are not stored as sequence files.

The following is a brief explanation of each option:

  • A. the mapper outputs are sorted and then partitioned per reducer: This is correct. The framework sorts the mapper outputs by key and splits them into one partition per reducer; a Partitioner (HashPartitioner by default) decides which partition, and therefore which reducer, each key goes to.
  • B. the total number of partitions is the same as the number of reduce tasks for the job: This is correct. Each partition is consumed by exactly one reduce task, so the number of partitions equals the number of reduce tasks configured for the job.
  • C. the intermediate, sorted outputs are always stored in a simple (key-len, key, value-len, value) format: This is correct. This is how the Hadoop documentation describes the intermediate map output; compressing it is optional and configurable.
  • D. none of the mentioned: This is the answer. Since A, B, and C are all true, none of the statements is wrong.
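To make options A and B concrete, here is a minimal Java sketch. It is a simulation, not Hadoop code. It groups map outputs into one partition per reduce task using the same formula as Hadoop's default HashPartitioner, `(hashCode & Integer.MAX_VALUE) % numReduceTasks`, and keeps each partition sorted by key. The class name `PartitionSketch` and its `partition` and `shuffle` helpers are hypothetical. Because real jobs hash Writable keys such as `Text` rather than `java.lang.String`, the partition numbers printed here will not match an actual job.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class PartitionSketch {

    // Hypothetical helper mirroring the HashPartitioner formula:
    // clear the sign bit so negative hash codes still map to a valid index,
    // then take the remainder modulo the number of reduce tasks.
    static int partition(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Hypothetical helper: builds exactly one bucket per reduce task (option B).
    // Each bucket is a TreeMap, so its keys stay in sorted order, and all
    // values for a given key end up grouped in a single bucket (option A).
    static List<TreeMap<String, List<Integer>>> shuffle(
            List<Map.Entry<String, Integer>> mapOutputs, int numReduceTasks) {
        List<TreeMap<String, List<Integer>>> partitions = new ArrayList<>();
        for (int i = 0; i < numReduceTasks; i++) {
            partitions.add(new TreeMap<>());
        }
        for (Map.Entry<String, Integer> kv : mapOutputs) {
            int p = partition(kv.getKey(), numReduceTasks);
            partitions.get(p)
                      .computeIfAbsent(kv.getKey(), k -> new ArrayList<>())
                      .add(kv.getValue());
        }
        return partitions;
    }

    public static void main(String[] args) {
        // Word-count style map outputs: (word, 1) pairs.
        List<Map.Entry<String, Integer>> mapOutputs = List.of(
                Map.entry("pear", 1), Map.entry("apple", 1),
                Map.entry("fig", 1), Map.entry("apple", 1),
                Map.entry("kiwi", 1));
        int numReduceTasks = 3;

        List<TreeMap<String, List<Integer>>> partitions =
                shuffle(mapOutputs, numReduceTasks);

        // The partition count always equals the reduce task count.
        System.out.println("partitions: " + partitions.size());
        for (int i = 0; i < partitions.size(); i++) {
            System.out.println("reducer " + i + " <- " + partitions.get(i));
        }
    }
}
```

Running `main` prints one line per reduce task. Every value for a given key lands in the same partition, and keys within each partition appear in sorted order. A real map task sorts records in an in-memory buffer, spills them to disk, and merges the spills; this sketch collapses all of that into one `TreeMap` per partition.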