BUG OF THE MONTH | MapReduce
V6072 Two similar code fragments were found. Perhaps, this is a typo and ‘localFiles’ variable should be used instead of ‘localArchives’. LocalDistributedCacheManager.java(183), LocalDistributedCacheManager.java(178), LocalDistributedCacheManager.java(176), LocalDistributedCacheManager.java(181)
public synchronized void setup(JobConf conf, JobID jobId) throws IOException { .... // Update the configuration object with localized data. if (!localArchives.isEmpty()) { conf.set(MRJobConfig.CACHE_LOCALARCHIVES, StringUtils .arrayToString(localArchives.toArray(new String[localArchives .size()]))); } if (!localFiles.isEmpty()) { conf.set(MRJobConfig.CACHE_LOCALFILES, StringUtils .arrayToString(localFiles.toArray(new String[localArchives .size()]))); } .... }
In the first block the localArchives variable is used, in the second similar fragment – localFiles. If you study this code with due diligence, rather then quickly run through it, as it often happens while code reviewing, you’ll notice the fragment, where the author forgot to replace the localArchives variable.
This gaffe can lead to the following scenario:
- Suppose, we have localArchives (size = 4) and localFiles (size = 2);
- When creating the array localFiles.toArray(new String[localArchives.size()]), the last 2 elements will be null ([“pathToFile1”, “pathToFile2”, null, null]);
- Then org.apache.hadoop.util.StringUtils.arrayToString will return the string representation of our array, in which the last files names will be presented as “null” (“pathToFile1, pathToFile2, null, null”);
- All this will be passed further and God only knows what kind of checks there are for such cases =).
Please click here to see more bugs from this project.