Open Access
ARTICLE
Data Warehouse Design for Big Data in Academia
Department of Computing Sciences, University of Scranton, Scranton, 18510, PA, USA
* Corresponding Author: Alex Rudniy. Email:
Computers, Materials & Continua 2022, 71(1), 979-992. https://doi.org/10.32604/cmc.2022.016676
Received 08 January 2021; Accepted 06 September 2021; Issue published 03 November 2021
Abstract
This paper describes the process of design and construction of a data warehouse (“DW”) for an online learning platform using three prominent technologies, Microsoft SQL Server, MongoDB and Apache Hive. The three systems are evaluated for corpus construction and descriptive analytics. The case also demonstrates the value of evidence-centered design principles for data warehouse design that is sustainable enough to adapt to the demands of handling big data in a variety of contexts. Additionally, the paper addresses maintainability-performance tradeoff, storage considerations and accessibility of big data corpora. In this NSF-sponsored work, the data were processed, transformed, and stored in the three versions of a data warehouse in search for a better performing and more suitable platform. The data warehouse engines—a relational database, a No-SQL database, and a big data technology for parallel computations—were subjected to principled analysis. Design, construction and evaluation of a data warehouse were scrutinized to find improved ways of storing, organizing and extracting information. The work also examines building corpora, performing ad-hoc extractions, and ensuring confidentiality. It was found that Apache Hive demonstrated the best processing time followed by SQL Server and MongoDB. In the aspect of analytical queries, the SQL Server was a top performer followed by MongoDB and Hive. This paper also discusses a novel process for render students anonymity complying with Family Educational Rights and Privacy Act regulations. Five phases for DW design are recommended: 1) Establishing goals at the outset based on Evidence-Centered Design principles; 2) Recognizing the unique demands of student data and use; 3) Adopting a model that integrates cost with technical considerations; 4) Designing a comparative database and 5) Planning for a DW design that is sustainable. Recommendations for future research include attempting DW design in contexts involving larger data sets, more refined operations, and ensuring attention is paid to sustainability of operations.Keywords
Cite This Article
This work is licensed under a Creative Commons Attribution 4.0 International License , which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.