Search Results (14)
  • Open Access

    ARTICLE

    Performance Improvement through Novel Adaptive Node and Container Aware Scheduler with Resource Availability Control in Hadoop YARN

    J. S. Manjaly, T. Subbulakshmi*

    Computer Systems Science and Engineering, Vol.47, No.3, pp. 3083-3108, 2023, DOI:10.32604/csse.2023.036320 - 09 November 2023

    Abstract The default scheduler of Apache Hadoop demonstrates operational inefficiencies when connecting external sources and processing transformation jobs. This paper proposes a novel scheduler, the Adaptive Node and Container Aware Scheduler (ANACRAC), that enhances the performance of the Hadoop Yet Another Resource Negotiator (YARN) scheduler by aligning cluster resources with the demands of real-world applications. The approach leverages user-provided configurations to apportion nodes, or containers within the nodes, to application thresholds. Additionally, it provides the flexibility to the applications for selecting and…

  • Open Access

    ARTICLE

    Enhanced Best Fit Algorithm for Merging Small Files

    Adnan Ali, Nada Masood Mirza, Mohamad Khairi Ishak*

    Computer Systems Science and Engineering, Vol.46, No.1, pp. 913-928, 2023, DOI:10.32604/csse.2023.036400 - 20 January 2023

    Abstract In the Big Data era, numerous sources and environments generate massive amounts of data. This enormous amount of data necessitates specialized advanced tools and procedures that effectively evaluate the information and anticipate decisions for future changes. Hadoop is used to process this kind of data. It is known to handle vast volumes of data more efficiently than tiny amounts, which results in inefficiency in the framework. This study proposes a novel solution to the problem by applying the Enhanced Best Fit Merging algorithm (EBFM) that merges files depending on predefined parameters (type and size). Implementing…

  • Open Access

    ARTICLE

    New Spam Filtering Method with Hadoop Tuning-Based MapReduce Naïve Bayes

    Keungyeup Ji, Youngmi Kwon*

    Computer Systems Science and Engineering, Vol.45, No.1, pp. 201-214, 2023, DOI:10.32604/csse.2023.031270 - 16 August 2022

    Abstract As the importance of email increases, the amount of malicious email is also increasing, so the need for malicious email filtering is growing. Since it is more economical to combine commodity hardware consisting of a medium server or PC with a virtual environment to use as a single server resource and filter malicious email using machine learning techniques, we used a Hadoop MapReduce framework and Naïve Bayes among machine learning methods for malicious email filtering. Naïve Bayes was selected because it is one of the top machine learning methods (Support Vector Machine (SVM), Naïve Bayes, K-Nearest…

  • Open Access

    ARTICLE

    Twitter Data Analysis Using Hadoop and ‘R’ and Emotional Analysis Using Optimized SVNN

    K. Sailaja Kumar*, H. K. Manoj, D. Evangelin Geetha

    Computer Systems Science and Engineering, Vol.44, No.1, pp. 485-499, 2023, DOI:10.32604/csse.2023.025390 - 01 June 2022

    Abstract Standalone systems cannot handle the giant traffic loads generated by Twitter due to memory constraints. A parallel computational environment provided by Apache Hadoop can distribute and process the data over different destination systems. In this paper, a Hadoop cluster with four nodes integrated with RHadoop, Flume, and Hive is created to analyze the tweets gathered from the Twitter stream. Twitter stream data relevant to an event/topic like IPL-2015, cricket, Royal Challengers Bangalore, Kohli, Modi is collected from May 24 to 30, 2016 using Flume. Hive is used as a data warehouse to store the…

  • Open Access

    ARTICLE

    Hybrid Deep Learning Framework for Privacy Preservation in Geo-Distributed Data Centre

    S. Nithyanantham*, G. Singaravel

    Intelligent Automation & Soft Computing, Vol.32, No.3, pp. 1905-1919, 2022, DOI:10.32604/iasc.2022.022499 - 09 December 2021

    Abstract In recent times, a huge amount of data is being created from different sources and the size of the data generated on the Internet has already surpassed two exabytes. Big Data processing and analysis can be employed in many disciplines which can aid the decision-making process with privacy preservation of users’ private data. To store large quantities of data, Geo-Distributed Data Centres (GDDC) are developed. In recent times, several applications comprising data analytics and machine learning have been designed for GDDC. In this view, this paper presents a hybrid deep learning framework for privacy preservation…

  • Open Access

    ARTICLE

    Research on ABAC Access Control Based on Big Data Platform

    Kun Yang, Xuanxu Jin, Xingyu Zeng*

    Journal of Cyber Security, Vol.3, No.4, pp. 187-199, 2021, DOI:10.32604/jcs.2021.026735 - 09 February 2022

    Abstract In a big data environment, traditional access control lacks an effective and flexible access mechanism. Based on attribute-based access control, this paper proposes the HBMC-ABAC big data access control framework. It solves the problems of difficult authority change, complex management, over-authorization and lack of authorization in a big data environment. At the same time, binary mapping codes are proposed to solve the problem of low efficiency of policy retrieval in traditional ABAC. Experimental analysis shows that the proposed HBMC-ABAC model can meet the needs of the current large and complex big data environment.

  • Open Access

    ARTICLE

    BitmapAligner: Bit-Parallelism String Matching with MapReduce and Hadoop

    Mary Aksa, Junaid Rashid*, Muhammad Wasif Nisar, Toqeer Mahmood, Hyuk-Yoon Kwon, Amir Hussain

    CMC-Computers, Materials & Continua, Vol.68, No.3, pp. 3931-3946, 2021, DOI:10.32604/cmc.2021.016081 - 06 May 2021

    Abstract Advancements in next-generation sequencer (NGS) platforms have improved NGS sequence data production and reduced the cost involved, which has resulted in the production of a large amount of genome data. The downstream analysis of multiple associated sequences has become a bottleneck for the growing genomic data due to storage and space utilization issues in the domain of bioinformatics. The traditional string-matching algorithms are efficient for small-sized data sequences and cannot process large amounts of data for downstream analysis. This study proposes a novel bit-parallelism algorithm called BitmapAligner to overcome the issues faced due to…

  • Open Access

    ARTICLE

    Residential Electricity Classification Method Based On Cloud Computing Platform and Random Forest

    Ming Li, Zhong Fang, Wanwan Cao, Yong Ma*, Shang Wu, Yang Guo, Yu Xue, Romany F. Mansour

    Computer Systems Science and Engineering, Vol.38, No.1, pp. 39-46, 2021, DOI:10.32604/csse.2021.016189 - 01 April 2021

    Abstract With the rapid development and popularization of new-generation technologies such as cloud computing, big data, and artificial intelligence, the construction of smart grids has become more diversified. Accurate quick reading and classification of the electricity consumption of residential users can provide a more in-depth perception of the actual power consumption of residents, which is essential to ensure the normal operation of the power system, energy management and planning. Based on the distributed architecture of cloud computing, this paper designs an improved random forest residential electricity classification method. It uses the unique out-of-bag error of random…

  • Open Access

    ARTICLE

    Design and Implementation of Log Data Analysis Management System Based on Hadoop

    Dunhong Yao*, Yu Chen

    Journal of Information Hiding and Privacy Protection, Vol.2, No.2, pp. 59-65, 2020, DOI:10.32604/jihpp.2020.010223 - 11 November 2020

    Abstract With the rapid development of the Internet, many enterprises have launched their network platforms. When users browse, search, and click the products of these platforms, most platforms keep records of these network behaviors; these records are often heterogeneous and are called log data. Effectively analyzing and managing this heterogeneous log data lets enterprises grasp the behavior characteristics of their platform users in time, make targeted recommendations to users, increase the sales volume of their products, and accelerate their development. Firstly, we follow the process of big…

  • Open Access

    ARTICLE

    An Optimized Resource Scheduling Strategy for Hadoop Speculative Execution Based on Non-cooperative Game Schemes

    Yinghang Jiang, Qi Liu*, Williams Dannah, Dandan Jin, Xiaodong Liu, Mingxu Sun*

    CMC-Computers, Materials & Continua, Vol.62, No.2, pp. 713-729, 2020, DOI:10.32604/cmc.2020.04604

    Abstract Hadoop is a well-known parallel computing system for distributed computing and large-scale data processing. “Straggling” tasks, however, have a serious impact on task allocation and scheduling in a Hadoop system. Speculative Execution (SE) is an efficient method of processing “straggling” tasks by monitoring the real-time running status of tasks and then selectively backing up “stragglers” on another node to increase the chance of completing the entire job early. Present speculative execution strategies face challenges in the misjudgement of “straggling” tasks and the improper selection of backup nodes, which lead to inefficient speculative execution. This paper…

Displaying results 1-10 of 14 (page 1).