In Hadoop, if we do not set the number of reducers, how many reducers will be created?
The number of mappers is driven by (total data size) / (input split size);
for example, if the data size is 1 TB and the input split size is 100 MB, the number of mappers will be (1,000 * 1,000) / 100 = 10,000.
What factors does the number of reducers depend on? How many reducers are created for a job?
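The mapper arithmetic in the question can be checked with a small sketch (the sizes are just the example figures from the question, expressed in MB):

```java
public class MapperCount {
    // mappers = total data size / input split size (both in the same unit)
    static long mappers(long dataSizeMb, long splitSizeMb) {
        return dataSizeMb / splitSizeMb;
    }

    public static void main(String[] args) {
        // 1 TB of data with 100 MB splits -> 10,000 map tasks
        System.out.println(mappers(1_000_000L, 100L)); // 10000
    }
}
```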
Best answer
How many reduces? (from the official documentation)
The right number of reduces seems to be 0.95 or 1.75 multiplied by
(no. of nodes) * (no. of maximum containers per node).
With 0.95 all of the reduces can launch immediately and start transferring map outputs as the maps finish. With 1.75 the faster nodes will finish their first round of reduces and launch a second wave of reduces doing a much better job of load balancing.
The number of maps is usually driven by the total size of the inputs, that is, the total number of blocks of the input files.
The number of maps can be nudged with Configuration.set(MRJobConfig.NUM_MAPS, int) (which only provides a hint to the framework) to set it higher. The number of reduces, by contrast, is set explicitly with job.setNumReduceTasks(int); if you never set it, MapReduce falls back on the mapreduce.job.reduces configuration property, whose default is 1.
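The documentation's heuristic can be sketched as plain arithmetic; the cluster figures below (10 nodes, 8 containers per node) are hypothetical, and in a real job the result would be passed to job.setNumReduceTasks(...):

```java
public class ReducerHeuristic {
    // Recommended reducer count per the Hadoop docs:
    // factor (0.95 or 1.75) * (no. of nodes) * (max containers per node)
    static int recommendedReducers(double factor, int nodes, int containersPerNode) {
        return (int) (factor * nodes * containersPerNode);
    }

    public static void main(String[] args) {
        int nodes = 10;             // hypothetical cluster size
        int containersPerNode = 8;  // hypothetical per-node container limit

        // 0.95: all reduces launch at once as the maps finish
        System.out.println(recommendedReducers(0.95, nodes, containersPerNode)); // 76
        // 1.75: a second wave of reduces for better load balancing
        System.out.println(recommendedReducers(1.75, nodes, containersPerNode)); // 140
    }
}
```

With 0.95 the reducer count stays just under the cluster's container capacity, so every reduce can run in a single wave; 1.75 deliberately oversubscribes so faster nodes pick up extra work.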
For more on the default number of reducers in Hadoop, there is a similar question on Stack Overflow: https://stackoverflow.com/questions/55200955/