Data Warehouse Design for Big Data in Academia

Alex Rudniy

doi:10.32604/cmc.2022.016676

Open Access

ARTICLE

Alex Rudniy*

Department of Computing Sciences, University of Scranton, Scranton, 18510, PA, USA

* Corresponding Author: Alex Rudniy. Email: email

Computers, Materials & Continua 2022, 71(1), 979-992. https://doi.org/10.32604/cmc.2022.016676

Received 08 January 2021; Accepted 06 September 2021; Issue published 03 November 2021

Abstract

This paper describes the design and construction of a data warehouse (“DW”) for an online learning platform using three prominent technologies: Microsoft SQL Server, MongoDB and Apache Hive. The three systems are evaluated for corpus construction and descriptive analytics. The case also demonstrates the value of evidence-centered design principles for a data warehouse design that is sustainable enough to adapt to the demands of handling big data in a variety of contexts. Additionally, the paper addresses the maintainability-performance tradeoff, storage considerations and the accessibility of big data corpora. In this NSF-sponsored work, the data were processed, transformed, and stored in three versions of a data warehouse in search of a better performing and more suitable platform. The data warehouse engines (a relational database, a NoSQL database, and a big data technology for parallel computations) were subjected to principled analysis. The design, construction and evaluation of the data warehouse were scrutinized to find improved ways of storing, organizing and extracting information. The work also examines building corpora, performing ad-hoc extractions, and ensuring confidentiality. Apache Hive was found to have the best processing time, followed by SQL Server and MongoDB. For analytical queries, SQL Server was the top performer, followed by MongoDB and Hive. This paper also discusses a novel process for rendering students anonymous in compliance with Family Educational Rights and Privacy Act (FERPA) regulations. Five phases for DW design are recommended: 1) Establishing goals at the outset based on Evidence-Centered Design principles; 2) Recognizing the unique demands of student data and use; 3) Adopting a model that integrates cost with technical considerations; 4) Designing a comparative database; and 5) Planning for a DW design that is sustainable. Recommendations for future research include attempting DW design in contexts involving larger data sets and more refined operations, while ensuring that attention is paid to the sustainability of operations.
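The abstract does not say how the anonymization process works. Purely as a generic illustration, and not as the paper's method, one common building block is to replace each student identifier with a keyed hash (HMAC-SHA256) before records are loaded into the warehouse, so that rows belonging to the same student can still be joined and aggregated without exposing the original ID. The secret key, the field names (`student_id`, `name`, `email`) and the helper functions below are hypothetical. On its own this is pseudonymization rather than full de-identification, and its protection depends on keeping the key secret.

```python
import hashlib
import hmac

# Hypothetical secret key; in practice it would be kept outside the warehouse.
SECRET_KEY = b"replace-with-a-securely-stored-key"


def pseudonymize(student_id: str, key: bytes = SECRET_KEY) -> str:
    """Return a stable pseudonym for a student ID using HMAC-SHA256."""
    digest = hmac.new(key, student_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()  # 64 hex characters


def anonymize_record(record: dict, id_field: str = "student_id",
                     drop_fields=("name", "email")) -> dict:
    """Copy a record, drop direct identifiers and replace the ID with its pseudonym."""
    cleaned = {k: v for k, v in record.items() if k not in drop_fields}
    cleaned[id_field] = pseudonymize(str(record[id_field]))
    return cleaned


if __name__ == "__main__":
    rows = [
        {"student_id": "S001", "name": "Ann", "email": "ann@example.edu", "score": 91},
        {"student_id": "S001", "name": "Ann", "email": "ann@example.edu", "score": 78},
    ]
    out = [anonymize_record(r) for r in rows]
    # Both rows map to the same pseudonym, so per-student aggregation still works.
    print(out[0]["student_id"] == out[1]["student_id"])
```

A keyed hash is used instead of a plain hash because student IDs usually come from a small, guessable space, which would let an unkeyed digest be reversed by enumeration.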

Keywords

Big data; data warehouse; MongoDB; Apache Hive; SQL Server

Cite This Article

APA Style

Rudniy, A. (2022). Data Warehouse Design for Big Data in Academia. Computers, Materials & Continua, 71(1), 979–992. https://doi.org/10.32604/cmc.2022.016676

Vancouver Style

Rudniy A. Data Warehouse Design for Big Data in Academia. Comput Mater Contin. 2022;71(1):979–992. https://doi.org/10.32604/cmc.2022.016676

IEEE Style

A. Rudniy, “Data Warehouse Design for Big Data in Academia,” Comput. Mater. Contin., vol. 71, no. 1, pp. 979–992, 2022. https://doi.org/10.32604/cmc.2022.016676

Copyright © 2022 The Author(s). Published by Tech Science Press.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.