Faculty of Higher Education
Unit Details

| Unit Name | |
| Code | |
| Year | 2024 |
| Trimester number | |

Assessment Details

| Assessment Name | |
| Due Date: dd/mm/yyyy | |
| Group Number | |

Group Details (** only list students who have contributed to this assignment)

| Student ID | First Name | Family Name | Contribution % |

Group Submission Declaration

Integrity Declaration: We have read and understand academic integrity policies and practices and our assessment does not violate these.

| Student ID | First Name | Family Name |

Submission Date:
Table of Contents
Descriptive Statistical Analysis
Commentary on calculated descriptive statistics
Number of bedrooms – Graphical Presentation
Scatter plot depicting relationship between DV and Land Size
Correlation and Regression analysis
Descriptive Statistical Analysis
The ‘Data Analysis’ tool in MS Excel was used to compute summary descriptive statistics for each numeric variable in the dataset. The tables below present the results:
Land Size (sqm)

| Statistic | Value |
|---|---|
| Mean | 1059.77 |
| Standard Error | 238.79 |
| Median | 727.50 |
| Mode | 4000.00 |
| Standard Deviation | 1067.88 |
| Sample Variance | 1140369.54 |
| Kurtosis | 4.88 |
| Coefficient of Variation | 1.01 |
| Skewness | 2.39 |
| Range | 3720.00 |
| Minimum | 280.00 |
| Maximum | 4000.00 |
| Sum | 21195.40 |
| Count | 20 |
Bedrooms

| Statistic | Value |
|---|---|
| Mean | 4.00 |
| Standard Error | 0.18 |
| Median | 4.00 |
| Mode | 4.00 |
| Standard Deviation | 0.79 |
| Sample Variance | 0.63 |
| Kurtosis | 0.81 |
| Coefficient of Variation | 0.20 |
| Skewness | 0.70 |
| Range | 3.00 |
| Minimum | 3.00 |
| Maximum | 6.00 |
| Sum | 80.00 |
| Count | 20 |
Garage Space

| Statistic | Value |
|---|---|
| Mean | 2.55 |
| Standard Error | 0.32 |
| Median | 2.00 |
| Mode | 2.00 |
| Standard Deviation | 1.43 |
| Sample Variance | 2.05 |
| Kurtosis | 4.19 |
| Coefficient of Variation | 0.56 |
| Skewness | 1.98 |
| Range | 6.00 |
| Minimum | 1.00 |
| Maximum | 7.00 |
| Sum | 51.00 |
| Count | 20 |
Nearest secondary school (km)

| Statistic | Value |
|---|---|
| Mean | 3.78 |
| Standard Error | 1.17 |
| Median | 1.56 |
| Mode | - |
| Standard Deviation | 5.23 |
| Sample Variance | 27.37 |
| Kurtosis | 4.63 |
| Coefficient of Variation | 1.38 |
| Skewness | 2.24 |
| Range | 19.48 |
| Minimum | 0.34 |
| Maximum | 19.82 |
| Sum | 75.64 |
| Count | 20 |
Selling price ($)

| Statistic | Value |
|---|---|
| Mean | 1412800.00 |
| Standard Error | 128458.16 |
| Median | 1270000.00 |
| Mode | 1000000.00 |
| Standard Deviation | 574482.34 |
| Sample Variance | 330029957894.74 |
| Kurtosis | 7.50 |
| Coefficient of Variation | 0.41 |
| Skewness | 2.60 |
| Range | 2400000.00 |
| Minimum | 1000000.00 |
| Maximum | 3400000.00 |
| Sum | 28256000.00 |
| Count | 20 |
Commentary on calculated descriptive statistics
The computed descriptive statistics are analysed and evaluated below.
Land Size
The first descriptive statistic to report is the mean, which is 1059.77 square metres. In simple words, the average land size of the sample houses is around 1060 sqm (Holmes Institute, 2024a). Thus, fairly large properties have been included in the sample. The median is 727.50 sqm: if the land sizes of all sample houses are arranged in ascending order, the central value is 727.50 sqm. Because the mean is well above the median, a few very large properties appear to be pulling the average upward. The mode of 4000 sqm indicates that this size, which is also the sample maximum, was reported for more than one house in the sample.
The standard deviation of 1067.88 sqm, variance of 1140369.54 and coefficient of variation of 1.01 show that the data values are widely scattered around the mean; the standard deviation is roughly equal to the mean itself. A preliminary idea of the shape of the distribution can be formed from the kurtosis and skewness values. The kurtosis of 4.88 means that the distribution is too peaked (Smart PLS, 2024). The skewness is 2.39, which exceeds the acceptable value of +2, so the distribution is not normal. Since the value is positive, the distribution has a longer tail on the right-hand side.
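The shape measures discussed above can be reproduced outside Excel. The sketch below implements the bias-adjusted sample skewness and excess kurtosis formulas that correspond to Excel's SKEW and KURT functions, together with the coefficient of variation. The `land` list is hypothetical illustration data, not the report's sample, and the function names are assumptions made for this example.

```python
from statistics import mean, stdev


def sample_skewness(values):
    """Bias-adjusted sample skewness (same formula as Excel's SKEW)."""
    n = len(values)
    m, s = mean(values), stdev(values)
    return n / ((n - 1) * (n - 2)) * sum(((x - m) / s) ** 3 for x in values)


def sample_excess_kurtosis(values):
    """Bias-adjusted sample excess kurtosis (same formula as Excel's KURT)."""
    n = len(values)
    m, s = mean(values), stdev(values)
    fourth = sum(((x - m) / s) ** 4 for x in values)
    scale = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return scale * fourth - correction


def coefficient_of_variation(values):
    """Sample standard deviation expressed as a fraction of the mean."""
    return stdev(values) / mean(values)


# Hypothetical land sizes in sqm (NOT the report's data): one very large
# property produces a long right tail, as described for the real sample.
land = [300, 450, 500, 620, 700, 750, 800, 900, 1200, 4000]

print("skewness:", round(sample_skewness(land), 2))
print("excess kurtosis:", round(sample_excess_kurtosis(land), 2))
print("coefficient of variation:", round(coefficient_of_variation(land), 2))
```

A positive skewness from this function corresponds to the longer right-hand tail described in the commentary.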
Bedrooms
Interpreting the measures of central tendency, the average number of bedrooms in the sample houses is 4. If the bedroom counts are arranged from smallest to largest, the central value is also 4. Interestingly, this is also the number of bedrooms that appeared most frequently in the collected sample (Holmes Institute, 2024a). Thus, the mean, median and mode for the sample properties are all equal.
The standard deviation is 0.79 while the variance is 0.63. This highlights that the bedroom data has low variation, with values located close to the mean. The coefficient of variation of 0.20 supports this, as it shows that the S.D. is only 20% of the mean. In addition, the kurtosis of 0.81 suggests that the distribution is not too peaked. Finally, the skewness is 0.70, which is below the acceptable value of +2 (Smart PLS, 2024). It can, therefore, be stated that the distribution of this variable does not depart markedly from normality.
Garage Space
The median value of 2 denotes that if the number of garage spaces in each property is arranged in ascending order, the middle value is 2. The mode of 2 shows that houses with garage space for two cars were the most frequent in the dataset. Further, the average number of garage spaces is 2.55. The mean is therefore higher than both the median and the mode (Holmes Institute, 2024a), which hints that a small number of properties with many garage spaces pull the average above 2.
Moving to the measures of dispersion, the standard deviation is 1.43 while the sample variance is 2.05. The coefficient of variation is 0.56, which shows that the S.D. is about 56% of the mean (Holmes Institute, 2024b). Thus, the variability in the data is on the lower side compared with land size and distance from school. The kurtosis is 4.19, so the distribution is too peaked. Finally, the skewness is 1.98 (Smart PLS, 2024), which is positive and only marginally below the acceptable value of +2. Taken together with the high kurtosis, this suggests that the distribution departs from normality.
Distance from nearest secondary school
This is one of the most interesting variables in the dataset. The mean of 3.78 shows that the average distance of the sample houses from the nearest secondary school is about 3.8 km. The median is 1.56, which means that when the distances are arranged in ascending order the middle value is only 1.56 km (Holmes Institute, 2024a). The mean is thus more than double the median, which indicates that a few distant properties raise the average. No mode has been found for this variable, which shows that no distance value repeats in the sample.
The S.D. is 5.23 while the variance is as high as 27.37. In addition, the coefficient of variation is 1.38, which means that the S.D. is 138% of the mean (Statistics How to, 2024a). It is apparent that the data contain high variation. The kurtosis of 4.63 shows a highly peaked distribution. The skewness of 2.24 is higher than the acceptable value of +2 (Smart PLS, 2024). The distribution therefore has a long tail on the right-hand side and is non-normal.
Property Price ($)
The median is one of the most informative statistics for the price of the sample houses. The median of $1,270,000 shows that the central value for price is $1,270,000. The mean of $1,412,800 is higher, which suggests that a few expensive properties raise the average. On the other hand, the mode is $1,000,000, which indicates that houses priced at $1,000,000 were the most frequent in the sample under consideration.
Considering the measures of dispersion, the S.D. is 574482.34 while the sample variance is 330029957894.74. The coefficient of variation of 0.41 shows that the S.D. is 41% of the mean. In addition, the kurtosis of 7.50 shows that the peak of the distribution would be too high. Further, the skewness of 2.60 exceeds the limit of +2 and confirms that the distribution is positively skewed, with a long tail towards higher prices (Smart PLS, 2024). Overall, the measures of dispersion show that variability in prices is considerable. In simple words, if the values are plotted on a graph, the data points would be scattered widely around the mean.
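As a quick consistency check on the tables above, the standard error and coefficient of variation can be recomputed from the reported mean, standard deviation and count, using SE = SD / sqrt(n) and CV = SD / mean. The sketch below uses only figures taken from the descriptive statistics tables; the dictionary layout and function name are assumptions made for this example, and small differences are expected because the inputs are rounded to two decimals.

```python
import math

# (mean, standard deviation, count, reported SE, reported CV), copied from
# the descriptive statistics tables in this report.
reported = {
    "Land Size": (1059.77, 1067.88, 20, 238.79, 1.01),
    "Bedrooms": (4.00, 0.79, 20, 0.18, 0.20),
    "Garage Space": (2.55, 1.43, 20, 0.32, 0.56),
    "Nearest secondary school": (3.78, 5.23, 20, 1.17, 1.38),
    "Selling price ($)": (1412800.00, 574482.34, 20, 128458.16, 0.41),
}


def derived(mean, sd, n):
    """Return (standard error, coefficient of variation)."""
    return sd / math.sqrt(n), sd / mean


for name, (mean, sd, n, se_rep, cv_rep) in reported.items():
    se, cv = derived(mean, sd, n)
    print(f"{name}: SE={se:.2f} (reported {se_rep}), "
          f"CV={cv:.2f} (reported {cv_rep})")
```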
Number of bedrooms – Graphical Presentation
A histogram is a suitable choice for showing the distribution of the number of bedrooms in the selected sample. The figure is shown below:
Figure 1 – Histogram for bedroom numbers
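Because the number of bedrooms is a discrete count, the histogram is in effect a frequency table with one bar per value. A minimal sketch of that tabulation follows; the `bedrooms` list is hypothetical illustration data, not the report's raw sample, and `frequency_table` is a name chosen for this example.

```python
from collections import Counter

# Hypothetical bedroom counts (NOT the report's raw data).
bedrooms = [3, 3, 4, 4, 4, 4, 5, 5, 6, 4]


def frequency_table(values):
    """Count how often each distinct value occurs, in ascending order."""
    counts = Counter(values)
    return {value: counts[value] for value in sorted(counts)}


# Print a simple text histogram, one row per bedroom count.
for value, count in frequency_table(bedrooms).items():
    print(f"{value} bedrooms | {'#' * count} ({count})")
```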
Scatter plot depicting relationship between DV and Land Size
Figure 2 - Scatter Plot showing price and land size
The scatter plot in Figure 2 depicts the relationship between land size and the response variable (price). The relationship can be identified by observing the slope of the trend line and the position of the data points. First, the trend line slopes downward, which suggests that the two variables are negatively correlated: in this sample, larger land sizes tend to be associated with lower selling prices. The position of the data points, on the other hand, indicates the strength of the correlation. Several outliers can be identified in the plot and the data points lie far from the trend line. This indicates that the association between the variables is weak.
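The two readings taken from the scatter plot, direction from the slope of the trend line and strength from the scatter around it, correspond to the least-squares slope and the Pearson correlation coefficient. The sketch below computes both from paired observations. The data are hypothetical and are not the report's sample; the function names are assumptions made for this example.

```python
from math import sqrt


def _sums(xs, ys):
    """Return mean of x, mean of y, and the centred sums Sxy, Sxx, Syy."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return mx, my, sxy, sxx, syy


def pearson_r(xs, ys):
    """Pearson correlation: sign gives direction, magnitude gives strength."""
    _, _, sxy, sxx, syy = _sums(xs, ys)
    return sxy / sqrt(sxx * syy)


def trend_line(xs, ys):
    """Least-squares slope and intercept of y on x."""
    mx, my, sxy, sxx, _ = _sums(xs, ys)
    slope = sxy / sxx
    return slope, my - slope * mx


# Hypothetical land sizes (sqm) and prices ($), NOT the report's data.
land_size = [300, 500, 700, 900, 1500]
price = [1_600_000, 1_200_000, 1_500_000, 1_100_000, 1_300_000]

slope, intercept = trend_line(land_size, price)
print("slope:", round(slope, 2))
print("r:", round(pearson_r(land_size, price), 3))
```

The slope and the correlation coefficient always share the same sign, which is why a downward-sloping trend line implies a negative correlation.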
Correlation and Regression analysis
The regression output and its interpretation are presented below.
Regression Statistics

| Statistic | Value |
|---|---|
| Multiple R | 0.268 |
| R Square | 0.072 |
| Adjusted R Square | -0.176 |
| Standard Error | 622872.660 |
| Observations | 20 |
ANOVA

| | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 4 | 4.51014E+11 | 1.12753E+11 | 0.290624 | 0.879497062 |
| Residual | 15 | 5.81956E+12 | 3.8797E+11 | | |
| Total | 19 | 6.27057E+12 | | | |
Coefficients

| | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% |
|---|---|---|---|---|---|---|
| Intercept | 1219440.48 | 788091.14 | 1.55 | 0.14 | -460336.01 | 2899216.97 |
| Land Size | 7.59 | 266.80 | 0.03 | 0.98 | -561.08 | 576.25 |
| Bedrooms | 114871.33 | 193905.87 | 0.59 | 0.56 | -298429.25 | 528171.92 |
| Garage Space | -92106.77 | 148169.37 | -0.62 | 0.54 | -407922.31 | 223708.76 |
| Nearest secondary school | -10390.35 | 43993.85 | -0.24 | 0.82 | -104161.01 | 83380.31 |
The multiple regression equation is presented below:
House Price ($) = 1219440.48 + 7.59 (Land Size) + 114871.33 (Bedrooms) - 92106.77 (Garage Space) - 10390.35 (Nearest Secondary School)
The meaning of each coefficient in the equation above is explained individually as follows. Each slope coefficient is interpreted holding the other explanatory variables constant.
Intercept: In general terms, the intercept shows the value of the response variable when all the explanatory variables are zero. Thus, when all the independent variables are equal to zero, the predicted house price would be $1,219,440.48. This value has little practical meaning here, since no house in the sample has zero land size or zero bedrooms.
Land Size: The coefficient of 7.59 indicates that an increase in land size of 1 square metre is associated with an increase in the house price of $7.59.
Bedrooms: The coefficient is 114871.33, which can be interpreted to state that one additional bedroom is associated with an increase in house price of $114,871.33.
Garage Space: Space for one additional vehicle is associated with a decline in property price of $92,106.77.
Nearest Secondary School: If the distance from the secondary school increases by 1 kilometre, the house price drops by $10,390.35.
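The regression equation can be applied directly as a prediction function. The sketch below uses the coefficients reported in the regression output; the function and parameter names are assumptions made for this example. As a sanity check it evaluates the equation at the sample means from the descriptive statistics tables, since a least-squares line with an intercept passes through the point of means, so the result should be close to the mean selling price of $1,412,800 (up to rounding of the coefficients).

```python
# Coefficients copied from the regression output in this report.
INTERCEPT = 1219440.48
B_LAND_SIZE = 7.59
B_BEDROOMS = 114871.33
B_GARAGE_SPACE = -92106.77
B_SCHOOL_DISTANCE = -10390.35


def predict_price(land_size, bedrooms, garage_space, school_distance):
    """Predicted house price ($) from the fitted multiple regression."""
    return (INTERCEPT
            + B_LAND_SIZE * land_size
            + B_BEDROOMS * bedrooms
            + B_GARAGE_SPACE * garage_space
            + B_SCHOOL_DISTANCE * school_distance)


# Evaluate at the sample means reported in the descriptive statistics.
at_means = predict_price(1059.77, 4.00, 2.55, 3.78)
print(f"prediction at sample means: {at_means:,.2f}")

# Effect of one extra bedroom with everything else held constant.
extra_bedroom = (predict_price(1059.77, 5.00, 2.55, 3.78)
                 - predict_price(1059.77, 4.00, 2.55, 3.78))
print(f"effect of one extra bedroom: {extra_bedroom:,.2f}")
```

Given that none of the coefficients is statistically significant, such predictions should be read as an illustration of how the equation works rather than as reliable price estimates.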
The coefficient of determination is denoted by R-squared. In the above regression model, its value is 0.072. It means that 7.20% of the variation in house prices is explained by the independent variables in the model (Statistics How to, 2024). The adjusted R-squared is negative (-0.176), which indicates that the model has no explanatory power once the number of predictors is taken into account. Thus, the identified relationship is weak.
In multiple regression analysis, the F ratio indicates the overall significance of the model. The F ratio (4, 15) is 0.291. The p-value is 0.879, which is above the stated level of significance of 0.05 or 5% (Holmes Institute, 2024). Thus, taken together, the independent variables do not have a statistically significant association with property prices.
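The R-squared, adjusted R-squared, F ratio and standard error discussed above all follow from the sums of squares in the ANOVA table. The sketch below recomputes them from the reported regression and residual sums of squares, with n = 20 observations and k = 4 predictors. The function name and return layout are assumptions made for this example, and the p-value is not recomputed because that would need the F distribution, which is outside the Python standard library.

```python
from math import sqrt

# Values copied from the ANOVA table in this report.
SS_REGRESSION = 4.51014e11
SS_RESIDUAL = 5.81956e12
N_OBSERVATIONS = 20
N_PREDICTORS = 4


def regression_summary(ss_reg, ss_res, n, k):
    """Derive the headline fit statistics from the ANOVA sums of squares."""
    ss_total = ss_reg + ss_res
    df_residual = n - k - 1
    r_squared = ss_reg / ss_total
    adjusted = 1 - (1 - r_squared) * (n - 1) / df_residual
    f_ratio = (ss_reg / k) / (ss_res / df_residual)
    standard_error = sqrt(ss_res / df_residual)
    return {
        "r_squared": r_squared,
        "adjusted_r_squared": adjusted,
        "f_ratio": f_ratio,
        "standard_error": standard_error,
    }


summary = regression_summary(SS_REGRESSION, SS_RESIDUAL,
                             N_OBSERVATIONS, N_PREDICTORS)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```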
At the 5% significance level, none of the independent variables has a p-value lower than 0.05; the smallest is 0.54 (Garage Space). This is another indication that none of the independent variables is significantly associated with house prices.
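The same conclusion can be reached from the t statistics, each of which is the coefficient divided by its standard error. The sketch below recomputes them from the coefficients table and compares their absolute values with 2.131, the two-tailed 5% critical value of Student's t with 15 degrees of freedom (20 observations minus 4 predictors minus the intercept). The dictionary layout and function name are assumptions made for this example.

```python
# Two-tailed 5% critical value of Student's t with 15 degrees of freedom.
T_CRITICAL = 2.131

# (coefficient, standard error) copied from the coefficients table.
estimates = {
    "Land Size": (7.59, 266.80),
    "Bedrooms": (114871.33, 193905.87),
    "Garage Space": (-92106.77, 148169.37),
    "Nearest secondary school": (-10390.35, 43993.85),
}


def t_statistic(coefficient, standard_error):
    """Test statistic for the null hypothesis that the coefficient is zero."""
    return coefficient / standard_error


significant = []
for name, (coef, se) in estimates.items():
    t = t_statistic(coef, se)
    if abs(t) > T_CRITICAL:
        significant.append(name)
    print(f"{name}: t = {t:.2f}")

print("significant at 5%:", significant if significant else "none")
```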
The correlation output is shown below:
| | Land Size | Bedrooms | Garage Space | Nearest secondary school |
|---|---|---|---|---|
| Land Size | 1.000 | | | |
| Bedrooms | 0.244 | 1.000 | | |
| Garage Space | 0.645 | 0.093 | 1.000 | |
| Nearest secondary school | 0.680 | 0.370 | 0.175 | 1.000 |
The above correlation matrix presents the correlation among the independent variables in the dataset. As can be seen in the table, all the explanatory variables share a positive association with each other. Distance from the nearest secondary school and land size have a correlation coefficient of 0.680, which is the highest in the matrix, followed by garage space and land size at 0.645 (CFI, 2024). It is, therefore, apparent that these two pairs are moderately to strongly correlated: when one increases, the other tends to rise as well.
One check for multicollinearity can be based on the values of the correlation coefficients. Since none in the above table exceeds 0.7, multicollinearity is unlikely to be a serious problem, although the coefficient of 0.680 is close to this threshold.
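The pairwise screening described above can be written as a short check. The sketch below stores the six off-diagonal coefficients from the correlation matrix, flags any pair whose absolute correlation exceeds the 0.7 rule of thumb used in this report, and reports the strongest pair. The names are assumptions made for this example. Pairwise screening is only a first check; a variance inflation factor per predictor would be a more thorough test, but it cannot be computed from the figures reported here without further matrix calculations.

```python
# Off-diagonal coefficients copied from the correlation matrix.
correlations = {
    ("Land Size", "Bedrooms"): 0.244,
    ("Land Size", "Garage Space"): 0.645,
    ("Bedrooms", "Garage Space"): 0.093,
    ("Land Size", "Nearest secondary school"): 0.680,
    ("Bedrooms", "Nearest secondary school"): 0.370,
    ("Garage Space", "Nearest secondary school"): 0.175,
}

THRESHOLD = 0.7  # rule-of-thumb cut-off used in the report


def flagged_pairs(corr, threshold):
    """Pairs whose absolute correlation exceeds the threshold."""
    return [pair for pair, r in corr.items() if abs(r) > threshold]


strongest = max(correlations, key=lambda pair: abs(correlations[pair]))

print("pairs above threshold:", flagged_pairs(correlations, THRESHOLD))
print("strongest pair:", strongest, correlations[strongest])
```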
Summary
The results of the regression analysis help in addressing the research question that has been established. Property prices were found to be positively associated with land size and the number of bedrooms in the property. On the other hand, prices are inversely associated with the distance from the nearest secondary school and the number of garage spaces. None of the identified associations was statistically significant.
References
Corporate Finance Institute 2024, ‘Correlation Matrix’, CFI, viewed 27 September 2024, https://corporatefinanceinstitute.com/resources/excel/correlation-matrix/
Holmes Institute 2024, ‘Lecture 11: Multiple Regression Analysis’.
Holmes Institute 2024a, ‘Lecture 2: Descriptive statistics: Numerical measures vs. Qualitative measures’.
Holmes Institute 2024b, ‘Lecture 9: Parametric vs. Non-parametric Statistics’.
Smart PLS 2024, ‘How to Interpret Excess Kurtosis and Skewness’, Smart PLS, viewed 26 September 2024, https://www.smartpls.com/documentation/functionalities/excess-kurtosis-and-skewness
Statistics How to 2024, ‘Coefficient of Determination (R Squared): Definition, Calculation’, Statistics How to, viewed 26 September 2024, https://www.statisticshowto.com/probability-and-statistics/coefficient-of-determination-r-squared/
Statistics How to 2024a, ‘How to find a coefficient of variation’, Statistics How to, viewed 27 September 2024, https://www.statisticshowto.com/probability-and-statistics/how-to-find-a-coefficient-of-variation/