Point out the wrong statement.

A. The mapper outputs are sorted and then partitioned per reducer
B. The total number of partitions is the same as the number of reduce tasks for the job
C. The intermediate, sorted outputs are always stored in a simple (key-len, key, value-len, value) format
D. None of the mentioned

The correct answer is: D. None of the mentioned.

All three statements come almost word for word from the Mapper section of the Apache Hadoop MapReduce Tutorial, so none of them is wrong. In particular, the tutorial says that the intermediate, sorted outputs are always stored in a simple (key-len, key, value-len, value) format. It also says that applications control only whether, and how, these outputs are compressed, using the Configuration and a CompressionCodec. The outputs are not stored as sequence files. Hadoop's internal IFile class writes map output as length-prefixed key/value records.

The following is a brief explanation of each option:

  • A. The mapper outputs are sorted and then partitioned per reducer: True. The framework sorts the map output by key and divides it into one partition per reducer. The job's Partitioner (HashPartitioner by default) decides which partition each key goes to.
  • B. The total number of partitions is the same as the number of reduce tasks for the job: True. Each reduce task fetches exactly one partition from every map task. There are therefore as many partitions as reduce tasks, which is the number set with Job.setNumReduceTasks(int).
  • C. The intermediate, sorted outputs are always stored in a simple (key-len, key, value-len, value) format: True. The framework fixes this format. Only the compression of the intermediate outputs is configurable.
  • D. None of the mentioned: Correct. A, B, and C are all true, so none of them is the wrong statement, and D is the answer.
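The Java sketch below illustrates statements A and B without depending on Hadoop. The class PartitionSketch and its methods are hypothetical. getPartition uses the same formula as Hadoop's default HashPartitioner. The sketch first assigns keys to partitions and then sorts each partition. That reaches the end state the statements describe: one key-sorted partition per reduce task.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical, Hadoop-free sketch of how map output keys end up in
// one key-sorted partition per reduce task.
final class PartitionSketch {

    // Same formula as Hadoop's default HashPartitioner:
    // clear the sign bit of the hash, then take it modulo the reducer count.
    static int getPartition(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Returns exactly numReduceTasks partitions; partition i holds the keys
    // that reduce task i would receive, sorted by key.
    static List<List<String>> partitionAndSort(List<String> mapOutputKeys, int numReduceTasks) {
        List<List<String>> partitions = new ArrayList<>();
        for (int i = 0; i < numReduceTasks; i++) {
            partitions.add(new ArrayList<>());
        }
        for (String key : mapOutputKeys) {
            partitions.get(getPartition(key, numReduceTasks)).add(key);
        }
        for (List<String> partition : partitions) {
            Collections.sort(partition); // keys are sorted within each partition
        }
        return partitions;
    }

    public static void main(String[] args) {
        List<String> keys = List.of("the", "cat", "sat", "on", "the", "mat");
        List<List<String>> partitions = partitionAndSort(keys, 3);
        // One line per reduce task; which keys land where depends on String.hashCode().
        for (int i = 0; i < partitions.size(); i++) {
            System.out.println("reducer " + i + ": " + partitions.get(i));
        }
    }
}
```

A key's partition depends only on its hash and the reducer count. Every occurrence of the same key, such as both "the" records here, therefore reaches the same reducer.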
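To make the record layout in statement C concrete, here is a simplified, Hadoop-free Java sketch. It writes records in (key-len, key, value-len, value) order and reads them back. RecordFormatSketch, encode, and decode are hypothetical names. The sketch writes lengths as fixed 4-byte ints and treats keys and values as UTF-8 strings. Hadoop's own intermediate files differ in details such as integer encoding.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Simplified sketch of a stream of (key-len, key, value-len, value) records.
// Assumptions: lengths are fixed 4-byte ints and keys/values are UTF-8 strings.
final class RecordFormatSketch {

    // Serialize already-sorted records, one after another, into a byte array.
    static byte[] encode(List<Map.Entry<String, String>> sortedRecords) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (Map.Entry<String, String> record : sortedRecords) {
                byte[] key = record.getKey().getBytes(StandardCharsets.UTF_8);
                byte[] value = record.getValue().getBytes(StandardCharsets.UTF_8);
                out.writeInt(key.length);   // key-len
                out.write(key);             // key
                out.writeInt(value.length); // value-len
                out.write(value);           // value
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Read records back in the same field order until the bytes run out.
    static List<Map.Entry<String, String>> decode(byte[] data) {
        List<Map.Entry<String, String>> records = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            while (in.available() > 0) {
                byte[] key = new byte[in.readInt()];
                in.readFully(key);
                byte[] value = new byte[in.readInt()];
                in.readFully(value);
                records.add(new SimpleEntry<>(new String(key, StandardCharsets.UTF_8),
                                              new String(value, StandardCharsets.UTF_8)));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return records;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, String>> sorted = List.of(Map.entry("apple", "1"), Map.entry("banana", "2"));
        byte[] data = encode(sorted);
        System.out.println(data.length + " bytes -> " + decode(data)); // 29 bytes -> [apple=1, banana=2]
    }
}
```

Each record costs 8 bytes of length headers plus its key and value bytes. The two sample records therefore take 14 + 15 = 29 bytes.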