A Model for Enhancing Unstructured Big Data Warehouse Execution Time

Citation Data: Big Data and Cognitive Computing, ISSN: 2504-2289, Vol: 8, Issue: 2

Publication Year: 2024

Metrics Details

Citations: 1
- Citation Indexes: 1
Usage: 0
Captures: 52
- Readers: 52
Mentions: 2
- Blog Mentions: 1
- News Mentions: 1
Social Media: 0

Most Recent Blog

BDCC, Vol. 8, Pages 17: A Model for Enhancing Unstructured Big Data Warehouse Execution Time

February 6, 2024
MDPI Publishing

Big Data and Cognitive Computing. doi: 10.3390/bdcc8020017. Authors: Marwa Salah

Most Recent News

Data on Big Data and Cognitive Computing Published by a Researcher at Helwan University (A Model for Enhancing Unstructured Big Data Warehouse Execution Time)

February 26, 2024
Information Technology Daily

2024 FEB 26 (NewsRx) -- By a News Reporter-Staff News Editor at Information Technology Daily -- New research on big data and cognitive computing is

Article Description

Traditional data warehouses (DWs) have played a key role in business intelligence and decision support systems. However, the rapid growth of the data generated by current applications requires new data warehousing systems. In big data settings, existing warehouse systems must be adapted to overcome new issues and limitations. The main drawbacks of traditional Extract–Transform–Load (ETL) are that it cannot process very large volumes of data and that its execution time is very high when the data are unstructured. This paper focuses on a new model consisting of four layers, Extract–Clean–Load–Transform (ECLT), designed for processing unstructured big data, with specific emphasis on text. The model aims to reduce execution time, and this is evaluated experimentally. ECLT is implemented and tested using Spark, a framework used here from Python. Finally, this paper compares the execution time of ECLT with that of other models on two datasets. Experimental results show that for a data size of 1 TB, the execution time of ECLT is 41.8 s. When the data size increases to 1 million articles, the execution time is 119.6 s. These findings demonstrate that ECLT outperforms ETL, ELT, DELT, ELTL, and ELTA in terms of execution time.
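The four-layer ordering named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the paper reports using Spark, whereas the sketch below uses plain Python so it runs without a cluster. The function names (`extract`, `clean`, `load`, `transform`, `run_eclt`), the in-memory list standing in for the warehouse, and the specific cleaning and transformation steps (whitespace normalization, lowercasing, word counts) are all assumptions chosen for illustration; the abstract does not specify what each layer does internally.

```python
import re
import time
from collections import Counter


def extract(sources):
    """Extract: gather raw text records from the (here in-memory) sources."""
    return [text for source in sources for text in source]


def clean(records):
    """Clean: normalize whitespace and case, and drop empty records (assumed steps)."""
    cleaned = []
    for text in records:
        normalized = re.sub(r"\s+", " ", text).strip().lower()
        if normalized:
            cleaned.append(normalized)
    return cleaned


def load(records, warehouse):
    """Load: append cleaned records to the target store (a plain list here)."""
    warehouse.extend(records)
    return warehouse


def transform(warehouse):
    """Transform: derive a view after loading (word counts, as an assumed example)."""
    counts = Counter()
    for text in warehouse:
        counts.update(re.findall(r"[a-z0-9]+", text))
    return counts


def run_eclt(sources):
    """Run the stages in Extract, Clean, Load, Transform order and time the run."""
    start = time.perf_counter()
    warehouse = load(clean(extract(sources)), [])
    view = transform(warehouse)
    elapsed = time.perf_counter() - start  # wall-clock seconds
    return warehouse, view, elapsed


if __name__ == "__main__":
    sample_sources = [["  Big   data\nwarehouse ", ""], ["Unstructured TEXT data"]]
    warehouse, view, elapsed = run_eclt(sample_sources)
    print(warehouse)
    print(view.most_common(3))
    print(f"elapsed: {elapsed:.6f} s")
```

The point of the sketch is only the ordering: cleaning happens before loading, and transformation runs on data already in the store. The timing wrapper mirrors the abstract's focus on execution time, but the figures it prints say nothing about the reported 41.8 s and 119.6 s results.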

Bibliographic Details

DOI: 10.3390/bdcc8020017

URL ID: http://www.scopus.com/inward/record.url?partnerID=HzOxMe3b&scp=85185542841&origin=inward; http://dx.doi.org/10.3390/bdcc8020017; https://www.mdpi.com/2504-2289/8/2/17; https://dx.doi.org/10.3390/bdcc8020017

AUTHOR(S)

Marwa Salah Farhan; Amira Youssef; Laila Abdelhamid

PUBLISHER(S)

MDPI AG

TAG(S)

Business, Management and Accounting; Computer Science
