ICT701 Business Intelligence



Student Id:

Student Name:









Introduction

This report describes the design and implementation of a business intelligence (BI) system for a retail store. The system captures, consolidates, stores and analyzes data in a structured data warehouse and provides an interface to it. It covers sales analysis, customer behaviour and product performance so that strategic decisions rest on evidence.



Business Intelligence (BI) System

In today's retail market, a BI system plays an important role in turning raw data into valuable knowledge. It combines data acquisition, data transformation and reporting to support decision-making. The proposed BI system uses PostgreSQL, Apache Airflow, Google BigQuery and Power BI to provide a cost- and time-effective solution for retail BI.

Elements of the BI System

Data Sources

Transaction data from the various branches of the retail business is stored in PostgreSQL, which acts as the primary storage system. PostgreSQL is also chosen for its ability to handle large data sets, which is crucial for large-scale applications [2]. The data comprises organizational, sales, customer, product and payment records, all originating from different units of the retail company. This information is the foundation of the BI system and is processed in the subsequent steps [1].

ETL process

Apache Airflow orchestrates the ETL process, which extracts data from various sources, transforms it and loads it into the data warehouse [7].

Extract: Apache Airflow extracts transactional data from PostgreSQL, including sales data, product types and customer preferences [3].

Transform: Before analysis, the extracted data is cleaned: invalid records are removed, dates are converted to a standard date format, numeric fields are converted to numeric types and missing values are handled.

Load: The clean, structured data is then loaded into Google BigQuery, which serves as the data warehouse. Apache Airflow makes it straightforward to schedule and monitor these ETL workflows [5].
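The transform step described above can be sketched in plain Python. This is a minimal illustration, not the pipeline's actual code: the field names (invoice_id, date, total, payment), the month/day/year date format and the rule of filling a missing payment method with "Unknown" are all assumptions made for the example.

```python
from datetime import datetime

def clean_rows(rows):
    """Drop invalid rows, parse dates and numbers, and fill missing values."""
    cleaned = []
    for row in rows:
        try:
            invoice_id = row["invoice_id"]
            sale_date = datetime.strptime(row["date"], "%m/%d/%Y").date()
            total = float(row["total"])
        except (KeyError, ValueError, TypeError):
            continue  # invalid or unparseable record: skip it
        if total < 0:
            continue  # negative totals are treated as invalid in this sketch
        cleaned.append({
            "invoice_id": invoice_id,
            "date": sale_date,
            "total": total,
            # Assumed rule: a missing payment method becomes "Unknown".
            "payment": row.get("payment") or "Unknown",
        })
    return cleaned

# Hypothetical raw records, as they might arrive from the extract step.
raw = [
    {"invoice_id": "A1", "date": "01/05/2022", "total": "548.97", "payment": "Cash"},
    {"invoice_id": "A2", "date": "not a date", "total": "80.22", "payment": "Ewallet"},
    {"invoice_id": "A3", "date": "03/08/2022", "total": "340.53", "payment": None},
]
rows = clean_rows(raw)
```

In this sample the second record is dropped because its date cannot be parsed, and the third keeps its sale but receives the placeholder payment method.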

Data Analysis Layer

Power BI is adopted for analysis and data visualization in support of operational and business intelligence. Its integration with Google BigQuery gives users up-to-date data for real-time insights [8].

Descriptive Analytics: Power BI displays and analyzes historical information such as sales and customer preferences [5].

Predictive Analytics: Power BI includes built-in forecasting models that estimate future sales or product demand.

Prescriptive Analytics: Power BI can surface suggestions for improving business outcomes, for example identifying how to promote products to targeted segments [3].
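To make the idea of predictive analytics concrete, the sketch below forecasts the next period with simple exponential smoothing. It is not Power BI's own forecasting model; it is a stand-in technique chosen for illustration, and the smoothing factor alpha and the monthly figures are hypothetical.

```python
def exponential_smoothing_forecast(values, alpha=0.5):
    """Return the one-step-ahead forecast from simple exponential smoothing."""
    if not values:
        raise ValueError("at least one observation is required")
    level = values[0]  # start the smoothed level at the first observation
    for value in values[1:]:
        # Blend the newest observation with the previous smoothed level.
        level = alpha * value + (1 - alpha) * level
    return level

monthly_sales = [100.0, 120.0, 110.0]  # hypothetical monthly totals
forecast = exponential_smoothing_forecast(monthly_sales, alpha=0.5)
```

A larger alpha makes the forecast follow recent months more closely, while a smaller alpha smooths out short-term swings.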

Reporting and Visualization

Power BI's rich visualization tools make it possible to build and tailor reports and dashboards suited to retail managers. These present patterns and trends in a way that lets decision-makers take the necessary actions [6]. Interactive elements such as charts, graphs and key performance indicators (KPIs) help managers make business decisions [9].



Data Warehouse Framework

A data warehouse is the central repository for all data collected from different sources and is optimized for querying and reporting. The objective is an architecture suited to storing, retrieving and using data for decision-making in a retail setting.

Data Modeling

The star schema is a good fit for this retail store dataset. It organizes data into a fact table, which holds transactional data such as sales, payments and invoices, and several dimension tables covering product, customer, time and geographical details. The fact table holds the measures behind the primary KPIs, and the dimension tables hold the detailed contextual information [5].

Fact Table: Stores transactional details such as amount, quantity and invoice number.

Dimension Tables: Hold descriptive attributes such as product and service type, customer type, gender and mode of payment. These dimensions make it possible to slice and group the data in different ways for operational purposes.
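A star schema of this kind can be sketched with Python's built-in sqlite3 module. SQLite is used here only as a local stand-in for BigQuery, and the table names, column names and sample rows are hypothetical, chosen to mirror the fact and dimension tables described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory stand-in for the warehouse
conn.executescript("""
CREATE TABLE dim_product  (product_id  INTEGER PRIMARY KEY, product_line TEXT);
CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, customer_type TEXT, gender TEXT);
CREATE TABLE fact_sales (
    invoice_id  TEXT PRIMARY KEY,
    product_id  INTEGER REFERENCES dim_product(product_id),
    customer_id INTEGER REFERENCES dim_customer(customer_id),
    quantity    INTEGER,
    amount      REAL
);
""")

# Hypothetical sample rows for the two dimensions and the fact table.
conn.executemany("INSERT INTO dim_product VALUES (?, ?)",
                 [(1, "Food and Beverages"), (2, "Health and Beauty")])
conn.executemany("INSERT INTO dim_customer VALUES (?, ?, ?)",
                 [(1, "Member", "Female"), (2, "Normal", "Male")])
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?, ?)",
                 [("A1", 1, 1, 7, 70.0), ("A2", 1, 2, 3, 30.0), ("A3", 2, 1, 2, 50.0)])

# Slice the fact table by a dimension attribute: total amount per product line.
sales_by_line = conn.execute("""
    SELECT p.product_line, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    GROUP BY p.product_line
    ORDER BY p.product_line
""").fetchall()
```

The query joins the fact table to a single dimension, which is the typical access pattern a star schema is meant to keep simple.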

ETL Process (extract, transform and load)

ETL is the process of extracting data from the retail store's various transactional systems, transforming it and loading it into a data warehouse, where it is stored and queried.

Extract: Information from the retailer's branches (for instance, sales details and customer information) is obtained from the transactional databases.

Transform: Time zones, currencies and categories are standardized, and the data is cleansed, normalised and formatted.

Load: The transformed data is then moved to the warehouse, Google BigQuery, which is chosen for its scalability and performance.
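The standardization of time zones, currencies and categories mentioned in the transform step might look like the sketch below. The exchange rates, the category aliases, the choice of AUD as the reporting currency and the record layout are all assumptions for illustration; real values would come from the business.

```python
from datetime import datetime, timezone

# Hypothetical lookup tables; real rates and mappings would come from the business.
RATES_TO_AUD = {"AUD": 1.0, "USD": 1.5}
CATEGORY_ALIASES = {
    "food & beverages": "Food and Beverages",
    "f&b": "Food and Beverages",
}

def standardize(record):
    """Return a copy with a UTC timestamp, an AUD amount and a canonical category."""
    local = datetime.fromisoformat(record["sold_at"])  # ISO 8601 string with a UTC offset
    category = record["category"].strip()
    return {
        "sold_at_utc": local.astimezone(timezone.utc),
        "amount_aud": round(record["amount"] * RATES_TO_AUD[record["currency"]], 2),
        # Fall back to the trimmed input when no alias is defined.
        "category": CATEGORY_ALIASES.get(category.lower(), category),
    }

row = standardize({
    "sold_at": "2022-03-08T10:30:00+10:00",
    "amount": 20.0,
    "currency": "USD",
    "category": " F&B ",
})
```

Storing timestamps in UTC and amounts in one currency means that sales from different branches can be summed and compared directly once they reach the warehouse.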

Data Storage Technologies

Google BigQuery is chosen as the data warehouse technology for its flexibility in handling big data and its efficient query execution. As a cloud-based service, BigQuery supports real-time data processing and offers low-cost, virtually unlimited storage. It works readily with other tools such as Apache Airflow for ETL operations and Power BI for reporting.


Architecture

The overall architecture includes multiple layers.

Data Sources: These hold details of customers visiting the retail branches, sales made within the branches and stock held in the branches, among others.

ETL Layer: Operated by Apache Airflow, which manages the extraction, transformation and loading processes [10].

Data Warehouse Layer: The data in BigQuery is stored in a star schema, which keeps queries simple and efficient.

Analytics and Reporting Layer: Power BI pulls data from BigQuery to create reports and dashboards for business users [8].
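The flow through these layers can be viewed as a chain of dependent tasks, which is how Airflow models a workflow: as a directed acyclic graph (DAG). The sketch below does not use Airflow itself. It uses Python's standard graphlib module to show how an execution order follows from declared dependencies, and the task names are hypothetical.

```python
from graphlib import TopologicalSorter

# Each task maps to the set of tasks that must finish before it starts.
dependencies = {
    "extract_postgres": set(),
    "transform_sales": {"extract_postgres"},
    "load_bigquery": {"transform_sales"},
    "refresh_power_bi": {"load_bigquery"},
}

# A valid execution order that respects every dependency.
run_order = list(TopologicalSorter(dependencies).static_order())
```

Because each task here depends on exactly one predecessor, only one order is possible: extract, transform, load, then refresh the reports.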

Rationale Behind the Data Warehouse and Business Intelligence

Only scalable, well-integrated and high-performing technologies are chosen to implement the BI system and the data warehouse framework. Google BigQuery suits large-scale retail data because it is a serverless service that delivers high performance and fast, real-time queries, which makes it an effective solution for multi-branch data [7]. Apache Airflow simplifies ETL by automating the complex jobs of extracting, transforming and loading data into BigQuery [8]. Power BI is chosen for its simple, intuitive interface and its effective integration with BigQuery, which provides instant, click-through dashboards and reports. Combined, these technologies form a solid, scalable solution for real-time analysis that extends the use of big data in retail business decisions. Adopting a star schema also supports the easy querying and flexibility that fast business intelligence operations require [6].




Visual analytics

Sales trend over time

The sales trend in the figure shows highs and lows across 2022, with spikes in March, July and September and dips in May and November. This volatility identifies the months of high and low sales and so points to the months that may benefit from adjustments or promotional planning to improve performance.

Figure 1 Total Sales Over Time

Total Sales by Product line

Figure 2 Total Sales by Product Line

The bar chart shows total sales by product line. The highest-selling category is “Food and Beverages”, with relatively high sales in “Electronic Accessories” and “Fashion Accessories”. The lowest-selling category is “Health and Beauty”, with moderate sales in “Home and Lifestyle” and “Sports and Travel”. Overall, sales are spread fairly evenly across the product categories.







Customer Rating Distribution

Figure 3 Customer Rating Distribution

The histogram shows customer ratings ranging from 4 to 10, with a slight concentration around ratings of 6 and 7. Bin frequencies range from 80 to 110, showing that customers report varied levels of satisfaction without being strongly dissatisfied. The distribution suggests a fairly even perception of the products and service.
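The binning behind a histogram like this one can be sketched in a few lines. The ratings below are hypothetical and the unit-wide bins are an assumption; the figure itself may use a different bin width.

```python
import math
from collections import Counter

def rating_histogram(ratings):
    """Count ratings in unit-wide bins, so 6.0 <= r < 7.0 falls in bin 6."""
    return Counter(math.floor(r) for r in ratings)

ratings = [4.1, 6.2, 6.9, 7.0, 7.5, 9.9, 10.0]  # hypothetical ratings
bins = rating_histogram(ratings)
```

Each key of the resulting counter is the lower edge of a bin, and each value is the number of ratings that fall into it.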

Sales by Payment Method

Figure 4 Total Sales by Payment Method

Total sales by payment method show that cash and Ewallet were used slightly more than credit cards. Cash has the highest sales, followed by Ewallet, and both are close to $100,000. Credit card sales are the lowest, at almost $100,000, which shows that customers spread their payments across all three methods.


Quantity of Products Sold by Gender

Figure 5 Quantity of Products Sold by Gender

The bar chart indicates that female customers purchased a greater quantity of products than male customers. Gender is shown on the x-axis and quantity of products on the y-axis, so the chart makes the difference between the two genders clear.

Sales by City

The chart shows total sales by city. Ballarat leads at slightly over $100,000, closely followed by Geelong and Melbourne, which are almost equal at just under $100,000. Sales are distributed similarly across the three cities, which suggests comparable market performance and customer engagement in each location.

Figure 6 Total Sales by City








Sales Heatmap by Branch and Product Line

Figure 7 Sales Heatmap by Branch and Product Line

The heatmap shows the distribution of sales across the branches (X, Y and Z) and the product lines, with colour depth indicating the level of sales. Branch Z records high sales in Food and Beverages. Branch Y has a relatively even distribution of sales across the product lines, while Branch X has relatively high sales in the Electronic and Sports categories. The colour gradient spans total sales of approximately 14,000 to 22,000.
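The grid of values behind such a heatmap is a two-way aggregation: sales summed for each combination of branch and product line. A minimal sketch in plain Python follows; the sample records are hypothetical and do not reproduce the figures in the chart.

```python
from collections import defaultdict

def pivot_sales(rows):
    """Sum sales into a nested mapping: branch -> product line -> total."""
    grid = defaultdict(lambda: defaultdict(float))
    for branch, product_line, amount in rows:
        grid[branch][product_line] += amount
    # Convert to plain dicts so missing cells are not created on lookup.
    return {branch: dict(lines) for branch, lines in grid.items()}

rows = [  # hypothetical (branch, product line, amount) records
    ("X", "Sports and Travel", 120.0),
    ("Z", "Food and Beverages", 200.0),
    ("Z", "Food and Beverages", 50.0),
    ("Y", "Health and Beauty", 80.0),
]
grid = pivot_sales(rows)
```

Each cell of the resulting grid corresponds to one coloured square in the heatmap, with the total determining the depth of colour.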

Total Sales by Month

Figure 8 Total Sales by Month

The line chart of total sales by month shows a striking trend in 2022. Sales are highest in January at $89,875, slightly lower in February at $83,234, and rise again in March at $88,603. After this strong start to the year, sales drop sharply in April, which records less than $10,000. Sales remain low for the rest of the year, fluctuating in the $10,000 to $20,000 range without any sign of a rebound.
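The month-to-month movement in the first quarter can be quantified as a percentage change. The short sketch below applies that calculation to the January, February and March totals quoted above; the function name is illustrative.

```python
def month_over_month(totals):
    """Return the percentage change between each pair of consecutive months."""
    return [
        (current - previous) / previous * 100
        for previous, current in zip(totals, totals[1:])
    ]

# January to March 2022 totals as quoted in the text.
q1_sales = [89_875, 83_234, 88_603]
changes = [round(change, 2) for change in month_over_month(q1_sales)]
```

On these figures, sales fell by about 7.39% from January to February and rose by about 6.45% from February to March.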



Conclusion and Recommendation

Recommendation

Enhance Inventory Management During Peak Sales Periods: The sales trends show that the company records its highest sales in March, July and September. To take advantage of these peaks, maintain appropriate inventory levels and ensure that popular products, especially those in Food and Beverages and Electronics, are in stock [11].

Focus Marketing and Promotions in Low Sales Months: Sales are markedly low in May and November. Consider launching targeted marketing campaigns or seasonal offers during these months to lift volumes closer to the level seen in the rest of the year [12].

Expand Digital Payment Options: As cash and Ewallet are the most common modes of payment, encouraging digital payment, particularly Ewallet, through special discounts could help speed up transactions.

Personalized Marketing by City: Ballarat has the highest sales, which suggests expanding operations or promotions in this city. At the same time, check that the other major cities, Geelong and Melbourne, where sales are lower, receive an appropriate share of baseline and promotional activity [11].



Conclusion

Sales were at their highest in January 2022, declined after March and stayed below $20,000 a month for the rest of the year. Branch Z led in sales, and the best-selling product line was “Food and Beverages”. Cash and e-wallets were the payment methods customers used most. Customer ratings clustered around 6 and 7, and female customers purchased more than male customers.




References

[1] A. Al-Okaily, A. P. Teoh, and M. Al-Okaily, "Evaluation of data analytics-oriented business intelligence technology effectiveness: an enterprise-level analysis," Business Process Management Journal, vol. 29, no. 3, pp. 777-800, 2023. [Online]. Available: https://www.emerald.com/insight/content/doi/10.1108/BPMJ-10-2022-0546/full/html.

[2] C. A. Tavera Romero, J. H. Ortiz, O. I. Khalaf, and A. R. Prado, "Business intelligence: business evolution after industry 4.0," Sustainability, vol. 13, no. 18, p. 10026, 2021. [Online]. Available: https://www.mdpi.com/2071-1050/13/18/10026.

[3] A. Simitsis, P. Vassiliadis, and T. Sellis, "Optimizing ETL processes in data warehouses," in 21st International Conference on Data Engineering (ICDE'05), Apr. 2005, pp. 564-575. [Online]. Available: https://www.researchgate.net/publication/4133487_Optimizing_ETL_processes_in_data_warehouses.

[4] S. H. A. El-Sappagh, A. M. A. Hendawi, and A. H. El Bastawissy, "A proposed model for data warehouse ETL processes," Journal of King Saud University-Computer and Information Sciences, vol. 23, no. 2, pp. 91-104, 2011. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S131915781100019X.

[5] A. A. Yulianto, "Extract transform load (ETL) process in distributed database academic data warehouse," APTIKOM Journal on Computer Science and Information Technologies, vol. 4, no. 2, pp. 61-68, 2019. [Online]. Available: http://www.aptikomjournal.com/index.php/CSIT/article/download/36/pdf_1.

[6] M. Z. Iqbal, G. Mustafa, N. Sarwar, S. H. Wajid, J. Nasir, and S. Siddque, "A review of star schema and snowflakes schema," in Intelligent Technologies and Applications: Second International Conference, INTAP 2019, Bahawalpur, Pakistan, November 6–8, 2019, Revised Selected Papers, vol. 2, Springer Singapore, 2020, pp. 129-140. [Online]. Available: https://www.researchgate.net/publication/341264133_A_Review_of_Star_Schema_and_Snowflakes_Schema.

[7] V. N. Dinh and H. A. Nguyen, "Data Process Approach by Traditional and Cloud Services Methodologies," 2022. [Online]. Available: https://www.theseus.fi/bitstream/handle/10024/745789/Dinh_Nguyen.pdf.

[8] M. Mucchetti, BigQuery for Data Warehousing, Springer, 2020. [Online]. Available: https://link.springer.com/book/10.1007/978-1-4842-6186-6.

[9] A. Altdorf, "Operational Work Management with Data & Management Reporting: Utilizing Power BI Reporting and Visualization," 2024. [Online]. Available: https://www.theseus.fi/bitstream/handle/10024/858744/Altdorf_Annika.pdf.

[10] A. Saini, "The role of data warehousing in the infrastructure of E-commerce," IITM Journal of Management and IT, vol. 12, no. 2, pp. 39-47, 2021. [Online]. Available: https://ijaem.net/issue_dcp/The%20Role%20of%20Data%20Warehousing%20In%20the%20Infrastructure%20of%20E%20Commerce.pdf.

[11] P. Sridhar, C. R. Vishnu, and R. Sridharan, "Simulation of inventory management systems in retail stores: A case study," Materials Today: Proceedings, vol. 47, pp. 5130-5134, 2021. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S2214785321039080.

[12] R. McColl, R. Macgilchrist, and S. Rafiq, "Estimating cannibalizing effects of sales promotions: The impact of price cuts and store type," Journal of Retailing and Consumer Services, vol. 53, p. 101982, 2020. [Online]. Available: https://www.sciencedirect.com/science/article/am/pii/S0969698919300608.

