Vengadeswaran∗, Balasundaram
Computer Systems Science and Engineering, Vol. 34, No. 1, pp. 47-60, 2019, DOI: 10.32604/csse.2019.34.047
Abstract The tremendous growth of data being generated today is making storage and computing a mammoth task. With its distributed processing capability, Hadoop offers an efficient solution for such large data. Hadoop's default data placement strategy places data blocks randomly across the nodes without considering execution parameters, resulting in several lacunae such as increased execution time and query latency. Also, most of the data required for a task's execution may not be locally available, which creates a data-locality problem. Hence, we propose an innovative data placement strategy based on the dependency of data blocks across the …
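The truncated abstract points to placement driven by dependencies (co-access patterns) among blocks rather than random assignment. As a rough, hypothetical illustration only, and not the paper's actual algorithm, the Java sketch below greedily groups blocks with high co-access counts onto the same node; all class, method, and variable names are assumptions introduced here for clarity.

```java
import java.util.*;

/** Illustrative only: greedy co-access grouping of blocks onto nodes. */
public class DependencyAwarePlacement {

    /**
     * Assign blocks to nodes so that blocks frequently accessed together
     * (high co-access count) tend to land on the same node.
     */
    public static Map<String, List<String>> place(
            List<String> blocks,
            Map<String, Map<String, Integer>> coAccess,  // block -> (block -> co-access count)
            List<String> nodes,
            int capacityPerNode) {

        Map<String, List<String>> placement = new LinkedHashMap<>();
        for (String node : nodes) placement.put(node, new ArrayList<>());

        Set<String> unplaced = new LinkedHashSet<>(blocks);
        Iterator<String> nodeIt = nodes.iterator();
        String current = nodeIt.next();

        while (!unplaced.isEmpty()) {
            // Seed the current node with any remaining block, then greedily pull in
            // the blocks most strongly co-accessed with what is already on that node.
            String seed = unplaced.iterator().next();
            placeBlock(placement, current, seed, unplaced);

            while (placement.get(current).size() < capacityPerNode && !unplaced.isEmpty()) {
                String best = null;
                int bestScore = -1;
                for (String candidate : unplaced) {
                    int score = 0;
                    for (String placed : placement.get(current)) {
                        score += coAccess.getOrDefault(placed, Map.of())
                                         .getOrDefault(candidate, 0);
                    }
                    if (score > bestScore) { bestScore = score; best = candidate; }
                }
                placeBlock(placement, current, best, unplaced);
            }
            if (nodeIt.hasNext()) current = nodeIt.next();
        }
        return placement;
    }

    private static void placeBlock(Map<String, List<String>> placement,
                                   String node, String block, Set<String> unplaced) {
        placement.get(node).add(block);
        unplaced.remove(block);
    }

    public static void main(String[] args) {
        List<String> blocks = List.of("b1", "b2", "b3", "b4");
        Map<String, Map<String, Integer>> coAccess = Map.of(
                "b1", Map.of("b2", 5, "b3", 1),
                "b2", Map.of("b1", 5),
                "b3", Map.of("b4", 4),
                "b4", Map.of("b3", 4));
        System.out.println(place(blocks, coAccess, List.of("nodeA", "nodeB"), 2));
        // Prints a grouping with b1+b2 on one node and b3+b4 on the other,
        // in contrast to random placement, which ignores such dependencies.
    }
}
```

The point of the sketch is the contrast with HDFS's default behavior described in the abstract: default placement spreads blocks without regard to which blocks a job reads together, whereas a dependency-aware policy co-locates them to reduce remote reads and improve data locality.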