Open Access
ARTICLE
A Novel Approach for Deciphering Big Data Value Using Dark Data
Department of Information Systems, College of Computer Sciences and Information Technology, King Faisal University, Al Hasa, 36362, Saudi Arabia
* Corresponding Author: Surbhi Bhatia. Email:
Intelligent Automation & Soft Computing 2022, 33(2), 1261-1271. https://doi.org/10.32604/iasc.2022.023501
Received 10 September 2021; Accepted 08 December 2021; Issue published 08 February 2022
Abstract
The last decade has seen rapid growth in big data, creating a need for more tools that help organizations manage data and make decisions. Business intelligence tools have removed many of the obstacles to data visibility, and numerous data mining technologies play an essential role in that visibility. However, the growth of big data has also increased the volume of ‘dark data’: data that has no predefined structure and is not generated intentionally. In this paper, we show how dark data can be mined for practical purposes and used to gain business insight. The most common type of dark data is the log file generated on a web server. Using log files generated by e-commerce transactions as an example, this paper shows how residual data and data trails can prove valuable when the actual dataset is inaccessible, and explains how residual data can be used for modeling. The work uses a system identification approach based on natural language processing for log file tokenization and feature extraction. The extracted features are then passed to a deep neural network that identifies customers for targeted advertising. The results achieve significant accuracy and show that dark data has the potential to deliver business value. Locating, organizing, and understanding dark data can unlock its relevance, usefulness, and potential monetization, but it is important to act when the benefits of use outweigh the costs of access and analysis.
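As an illustration of the log file tokenization and feature extraction step described in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes the web server writes the NCSA Common Log Format, which the abstract does not specify. The names `LOG_PATTERN`, `tokenize_line`, and `extract_features`, and the choice of per-visitor counts as features, are hypothetical.

```python
import re
from collections import Counter

# Assumption: NCSA Common Log Format. The source does not name a log format.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def tokenize_line(line):
    """Split one log line into named tokens; return None if it does not parse."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


def extract_features(lines):
    """Aggregate simple per-visitor counts (hypothetical features)."""
    features = {}
    for line in lines:
        tokens = tokenize_line(line)
        if tokens is None:
            continue  # skip lines that do not match the assumed format
        counts = features.setdefault(tokens["host"], Counter())
        counts["requests"] += 1
        # Use the first path segment as a coarse "site section" feature.
        segments = [s for s in tokens["path"].split("?")[0].split("/") if s]
        if segments:
            counts["section:" + segments[0]] += 1
        if tokens["status"].startswith("2"):
            counts["ok_responses"] += 1
    return features


sample = [
    '10.0.0.1 - - [10/Oct/2021:13:55:36 +0000] "GET /cart/add?id=7 HTTP/1.1" 200 512',
    '10.0.0.1 - - [10/Oct/2021:13:56:02 +0000] "GET /product/42 HTTP/1.1" 404 -',
    'not a log line',
]
features = extract_features(sample)
```

Count vectors of this kind could then be converted to fixed-length numeric arrays and supplied to a classifier. The network architecture used for identifying customers is not described in the abstract, so it is not sketched here.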
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.