Abstract
Building a real estate investment model is complex because of the number and nature of the attributes involved. Housing type, housing shape, the local community (religion, political affiliations, class), tax laws, personal and family influence, and the state of the financial market all add to this complexity, which is further intensified by environmental factors, short- and long-term temporal changes, and education levels. A realistic investment model therefore has to incorporate as many of these factors as possible. Previous literature has focused mainly on house price valuation for real estate investment, while the detection of areas suitable for property development has received relatively little attention. In this dissertation, a machine learning model is developed to help identify real estate opportunities, i.e., to detect clusters that may contain potential investors within the middle-income market, thereby guiding the banking institution on whether real estate properties should be developed. Several unsupervised learning algorithms, able to capture non-linear structure and detect groups from the given attributes, are introduced and tested. Results indicate that the final model produces meaningful clusters and that, given suitable data, the “middle-income” clusters can be detected. These clusters can then be used as inputs to the real estate simulation tool to guide the choice of a suitable investment.
The dataset undergoes pre-processing to produce two dispersed, correlated, and complete datasets suitable for developing the unsupervised learning algorithm. Hyper-parameter tuning is then performed to improve on the initial results. The k-Means++ algorithm is selected as the best-performing algorithm and further refined as the final model. It produces meaningful clusters that profile clients, from which the middle-income clients can be extracted. The model is evaluated with a cluster-validity index, which yields a value of 0.1870.
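The abstract names k-Means++ and a cluster-validity index but does not give the implementation, the features used, or which index produced 0.1870. As a minimal sketch under those gaps, the Python below implements k-means++ seeding followed by standard Lloyd iterations (restarted several times, keeping the lowest within-cluster sum of squares), and scores each candidate number of clusters k with the mean silhouette coefficient. The silhouette is an assumed validity index chosen for illustration and may not be the one used in the dissertation; the data are synthetic, hypothetical two-feature points rather than the dissertation's client records; and the helper names (`kmeanspp_init`, `lloyd`, `kmeans`, `silhouette`) are illustrative only.

```python
import math
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeanspp_init(points, k, rng):
    """k-means++ seeding: the first centre is drawn uniformly at random; each
    further centre is drawn with probability proportional to its squared
    distance from the nearest centre chosen so far."""
    centres = [rng.choice(points)]
    while len(centres) < k:
        d2 = [min(sq_dist(p, c) for c in centres) for p in points]
        r = rng.random() * sum(d2)
        acc = 0.0
        for p, w in zip(points, d2):
            acc += w
            if acc >= r:
                centres.append(p)
                break
        else:  # defensive fallback against floating-point round-off
            centres.append(points[-1])
    return centres

def lloyd(points, centres, iters):
    """Standard Lloyd iterations from the given starting centres."""
    k = len(centres)
    labels = None
    for _ in range(iters):
        # Assignment step: each point joins its nearest centre.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centres[j]))
                      for p in points]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: move each centre to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an emptied cluster keeps its old centre
                centres[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centres

def kmeans(points, k, iters=100, seed=0, n_init=10):
    """k-means++ seeding + Lloyd, restarted n_init times; the run with the
    lowest within-cluster sum of squares is kept."""
    best = None
    for run in range(n_init):
        centres = kmeanspp_init(points, k, random.Random(seed + run))
        labels, centres = lloyd(points, centres, iters)
        sse = sum(sq_dist(p, centres[lab]) for p, lab in zip(points, labels))
        if best is None or sse < best[0]:
            best = (sse, labels, centres)
    return best[1], best[2]

def silhouette(points, labels):
    """Mean silhouette coefficient in [-1, 1]; higher means tighter and
    better-separated clusters. Points in singleton clusters score 0."""
    clusters = set(labels)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two non-empty clusters")
    total = 0.0
    for i, p in enumerate(points):
        own = [math.dist(p, q) for j, q in enumerate(points)
               if j != i and labels[j] == labels[i]]
        if not own:
            continue
        a = sum(own) / len(own)  # mean distance within p's own cluster
        b = min(sum(math.dist(p, q) for q, lab in zip(points, labels) if lab == c)
                / labels.count(c)
                for c in clusters if c != labels[i])  # nearest other cluster
        total += (b - a) / max(a, b) if max(a, b) > 0 else 0.0
    return total / len(points)

# Hypothetical data: three synthetic groups of two-feature points standing
# in for pre-processed client attributes (not the dissertation's dataset).
rng = random.Random(42)
data = [(rng.gauss(cx, 0.3), rng.gauss(cy, 0.3))
        for cx, cy in [(0, 0), (3, 3), (0, 4)] for _ in range(30)]

# Simple hyper-parameter sweep over k, scored by the assumed validity index.
scores = {k: silhouette(data, kmeans(data, k)[0]) for k in range(2, 6)}
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

Sweeping k and keeping the value with the highest validity score is one simple form of hyper-parameter tuning; the restarts in `kmeans` mirror the usual practice of running k-means++ several times and keeping the best solution. In practice a library implementation such as scikit-learn's `KMeans(init="k-means++")` together with `sklearn.metrics.silhouette_score` would normally replace these hand-written helpers.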