Cardiff School of Technology
Cardiff Metropolitan University
Proposal for Dissertation Project
CIS7017 Technology Dissertation
Predicting fraudulent transactions using machine learning algorithms
Student Name:
Student Number:
Date:
Abstract
This project investigates how fraudulent transactions can be identified using machine learning algorithms. The proposal considers both supervised and unsupervised fraud models and uses data exploration to define the scope of the work. The aim is to predict fraudulent transactions using machine learning algorithms, and the proposal identifies the tools needed to implement this approach effectively. It also presents a project plan in the form of a Gantt chart, which sets out the schedule for developing the fraud prediction work.
List of figures
CHAPTER 1: INTRODUCTION
Machine learning models are now used in almost every sector of industry, and they provide many algorithm-based techniques. Several of these techniques can be applied to predicting fraudulent transactions. This study analyzes how they can support a well-structured project based on these algorithms. It also includes a project plan that estimates the time required for each task during execution.
It then sets out the methodologies to be used in the data analysis process. After that, it lists the software requirements, which help to identify the system requirements for the project. It also defines a testing plan to help deploy the project effectively in a live environment.
Machine learning models are used to find behavior based on patterns in data. There are many ways to analyze the fraud scenarios that occur during transaction processing. One algorithm currently in use, and considered in the following sections, is logistic regression (De Roux et al. 2018), which is used to classify transactions as fraudulent or non-fraudulent. It has been reported that such techniques have decreased the rate of fraud in the money transfer sector, allowing providers to work more effectively. Machine learning algorithms also support the development of better automated techniques (Gao et al. 2019).
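To illustrate how logistic regression separates fraudulent from non-fraudulent transactions, the following is a minimal sketch in plain Python. The two features (a scaled amount and a night-time flag), the toy data, the learning rate, and the function names are assumptions made for illustration only; they are not taken from the cited studies, and a real project would more likely use an established library.

```python
import math

def sigmoid(z):
    """Map a real-valued score to a probability between 0 and 1 (numerically stable)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict_fraud_probability(weights, bias, x):
    """Probability that transaction x is fraudulent under the fitted model."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

def train_logistic_regression(rows, labels, learning_rate=0.5, epochs=2000):
    """Fit weights and a bias with batch gradient descent on the log-loss."""
    n, n_features = len(rows), len(rows[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in zip(rows, labels):
            error = predict_fraud_probability(weights, bias, x) - y
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias

# Hypothetical toy data: [scaled amount, 1 if made at night else 0]
transactions = [
    [0.10, 0], [0.20, 0], [0.15, 1], [0.30, 0],
    [0.90, 1], [0.80, 1], [0.95, 0], [0.85, 1],
]
is_fraud = [0, 0, 0, 0, 1, 1, 1, 1]

weights, bias = train_logistic_regression(transactions, is_fraud)
predictions = [
    1 if predict_fraud_probability(weights, bias, x) >= 0.5 else 0
    for x in transactions
]
print(predictions)
```

The 0.5 cut-off is itself a design choice: lowering it flags more transactions as fraud at the cost of more false alarms.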
1.3 Aim and objectives
This project aims to provide suitable fraud prediction techniques based on machine learning algorithms.
Objectives:
To find out the best method to predict fraud using machine learning algorithms.
To provide theoretical knowledge of the algorithms by reviewing the reports of different authors.
To provide an effective project execution plan.
To analyze the resource requirements of the algorithms that are going to be used.
To provide a risk analysis plan with a mitigation technique for each identified risk.
To provide appropriate data for the algorithms that are going to be used in the project.
1.4 Project plan
The project plan provides an outline for managing time on the project. It defines the time allotted to each task, which supports the delivery of the report.
The first step defines the problem statement, which states for whom the project is being carried out. This step includes the aims and objectives, which explain the purpose of the report. It also estimates the cost of the project, giving information about the total expected cost (De Roux et al. 2018).
The next step is a literature review on the defined topic, which helps to identify the requirements for initiating the project. The review draws on research relevant to the topic, so that existing knowledge can be used in developing the project (Salve et al. 2018).
Data collection identifies the data relevant to the topic and helps to identify the resources required for the project. It also includes the data analysis process used to find the relevant data.
The next stage is to develop the project by writing the code. The main programming language used in the project is Python, which allows the developer to apply multiple machine learning techniques (De Roux et al. 2018).
Finally, the project is discussed in the form of results and a conclusion that summarizes the report. After this stage comes the final submission of the project (Salve et al. 2018).
1.5 Gantt chart
(Source: Author 2022)
Activities | Duration | Week 1 | Week 2 | Week 3 | Week 4 | Week 5 | Week 6 | Week 7 | Week 8 | Week 9 | Week 10 | Week 11 | Week 12 |
Introduction | 14 days | | | | | | | | | | | | |
Defining the topic | 3 days | | | | | | | | | | | | |
Achieving the aims | 5 days | | | | | | | | | | | | |
Making the objectives for the report | 2 days | | | | | | | | | | | | |
Establishing the project teams | 2 days | | | | | | | | | | | | |
Giving the budget to the report | 3 days | | | | | | | | | | | | |
Literature review | 24 days | | | | | | | | | | | | |
Performing authentication protocols | 12 days | | | | | | | | | | | | |
Finding the relevant article | 6 days | | | | | | | | | | | | |
Article analysis and reviewing | 6 days | | | | | | | | | | | | |
Data collection | 18 days | | | | | | | | | | | | |
Segregation of data | 10 days | | | | | | | | | | | | |
Drafting the data | 8 days | | | | | | | | | | | | |
Software development | 30 days | | | | | | | | | | | | |
Data cleaning | 10 days | | | | | | | | | | | | |
ML model implementation | 12 days | | | | | | | | | | | | |
Analysis and result | 8 days | | | | | | | | | | | | |
Discussion and closure | 5 days | | | | | | | | | | | | |
Submission | 5 days | | | | | | | | | | | | |
1.9 Risk analysis
Table 2: Risk Analysis for the project
(Source: Author 2022)
Item | Issues | Probability (P) | Impact (I) | Total P x I | Mitigation |
1. | Failure of computer | 1 | 4 | 4 | A computer can fail at any time, which would interrupt the smooth flow of research activities. Taking regular backups will help to secure the information. |
2. | Security-related issues | 2 | 3 | 6 | Properly secured software should be installed to help remove threats from the system. |
3. | Using wrong data sets during the project execution process | 3 | 1 | 3 | To choose the right data sets, a clear procedure must be set for the data analysis process. |
4. | Generating wrong probability statistics | 3 | 2 | 6 | An appropriate testing methodology must be applied, which will help to identify the solution effectively. |
5. | Misbehaviour of the application | 4 | 2 | 8 | This can be resolved through proper maintenance of the code structure. |
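The Total P x I column in Table 2 is the product of the probability and impact scores. As a small sketch of that arithmetic, the snippet below recomputes the totals and ranks the risks from highest to lowest score. The `Risk` class and its field names are illustrative assumptions; the issue names and scores are taken from the table.

```python
from dataclasses import dataclass

@dataclass
class Risk:
    issue: str
    probability: int  # P, as scored in Table 2
    impact: int       # I, as scored in Table 2

    @property
    def score(self) -> int:
        """Total risk score, P x I."""
        return self.probability * self.impact

risks = [
    Risk("Failure of computer", 1, 4),
    Risk("Security-related issues", 2, 3),
    Risk("Using wrong data sets", 3, 1),
    Risk("Generating wrong probability statistics", 3, 2),
    Risk("Misbehaviour of the application", 4, 2),
]

# Highest score first; sorted() is stable, so ties keep their table order.
ranked = sorted(risks, key=lambda r: r.score, reverse=True)
for risk in ranked:
    print(f"{risk.score:>2}  {risk.issue}")
```

Ranking by score suggests which mitigation to prioritise first; here the application misbehaviour risk has the highest total.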
CHAPTER 2: LITERATURE REVIEW
2.1 Introduction
Prediction of credit card fraud using machine learning techniques
According to the authors, there has been a large increase in fraud, through which many people are losing their wealth. The frauds can depend on the type of card used by the person. To address this, many algorithms are used to identify the threats. In this paper, the authors carry out the work using various linear and nonlinear statistical modeling methods together with machine learning algorithms (Gao et al. 2019).
Figure 1: Performance based on support vector machines
(Source: Gao et al. 2019)
These studies are all based on credit card transaction models, in which each transaction is a monetary payment. The authors built a supervised fraud model, which is able to identify data belonging to the same category. It characterizes fraudster data using mathematical equations. The fields used to describe the data include the record, card number, merch state, merch description, date, and so on. The models covered include artificial neural networks, support vector machines, random forests, and so on (Patil et al. 2018).
Financial transaction fraud detection based on machine learning
In this review, the authors present different models for explaining the process of detecting fraud in financial transactions. They give a brief account of the project based on machine learning and deep learning. The review includes a literature survey, which gives information about the techniques that are going to be implemented in the project. It also covers supervised learning algorithms that help to carry out the task effectively (Megdad et al. 2022).
The authors define the methodology for carrying out the research. For the data analysis, they describe the data sets used to find the relevant data. The whole study is based on experimental research using two methodologies, which effectively identified the results in a secured environment (Shahbazi and Byun 2022).
CHAPTER 3: METHODOLOGY
3.1 Introduction
The research is based on both qualitative and quantitative methodology, which falls into the category of hybrid methodology. The report includes a section on finding data sets for the project, which makes it possible to complete the task in less time (Pandey and Pandey 2021). The qualitative methodology helps to identify similar research, which supports an understanding of the basic functionality of the project. It has also helped in identifying the experimental results of different studies (Paleyes et al. 2020).
The quantitative methodology, by contrast, provides information about how the algorithms work, as implemented in the Python programming language. It allows users to understand the working structure of the programs (Paleyes et al. 2020). The prediction of fraud will be based on the statistical data generated by running the programs on the defined data sets. The methodological process gives the reader the opportunity to understand how the report is developed. The qualitative methodology gives the user information about the statistical data that is going to be used (Shahbazi and Byun 2022).
4.0 Software Development
4.1 Requirements
Datasets
The data sets provide the transaction information from users on which the fraud detection process is based. This helps in getting the right information from the user. The data is transformed into a tabular format, which reduces the complexity of analyzing it (Patil et al. 2018).
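As a hedged illustration of working with data in tabular form, the sketch below reads a small comma-separated sample into a list of row dictionaries using Python's standard `csv` module. The column names, the placeholder card identifiers, and the three sample rows are invented for this example; an actual project would read the chosen data set from a file instead of an in-memory string.

```python
import csv
import io

# Hypothetical sample data; every value here is made up for illustration.
RAW_CSV = """record,card_number,merchant_state,amount,is_fraud
1,CARD-001,TN,3.62,0
2,CARD-002,MA,31.42,0
3,CARD-003,MD,178.49,1
"""

def load_transactions(handle):
    """Read CSV rows into dictionaries and convert the numeric columns."""
    rows = []
    for row in csv.DictReader(handle):
        row["record"] = int(row["record"])
        row["amount"] = float(row["amount"])
        row["is_fraud"] = int(row["is_fraud"])
        rows.append(row)
    return rows

transactions = load_transactions(io.StringIO(RAW_CSV))
fraud_count = sum(row["is_fraud"] for row in transactions)
print(f"{len(transactions)} rows loaded, {fraud_count} labelled as fraud")
```

Because `load_transactions` accepts any file-like object, the same function would work with `open("some_file.csv", newline="")`, where the file name is again a placeholder.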
Python
The whole programming structure depends on the language used for the coding. Python provides implementations of different algorithms, which simplifies the code structure. It is therefore considered a well-suited programming language for this project (Pandey and Pandey 2021).
4.2 Implementation
Data pre-processing
This stage consists of defined steps that are executed in sequential order. The steps considered here are quality assessment of the data, data cleansing, transformation of the data, and reduction of the selected data. The data cleansing step is used to tidy the data effectively, so that the relevant data can be extracted from the bulk of the data (Saastamoinen et al. 2021).
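The pre-processing steps named above can be sketched as a small sequential pipeline. This is only an illustration under stated assumptions: the column names and sample rows are hypothetical, missing values are represented by `None`, cleansing is taken to mean dropping incomplete and duplicate rows, transformation is taken to mean min-max scaling, and reduction is taken to mean keeping a chosen subset of columns.

```python
def assess_quality(rows, columns):
    """Quality assessment: count missing (None) values per column."""
    return {c: sum(1 for r in rows if r.get(c) is None) for c in columns}

def cleanse(rows, columns):
    """Cleansing: drop rows with missing values and exact duplicates."""
    seen, cleaned = set(), []
    for r in rows:
        key = tuple(r.get(c) for c in columns)
        if None in key or key in seen:
            continue
        seen.add(key)
        cleaned.append(dict(r))
    return cleaned

def transform(rows, column):
    """Transformation: min-max scale one numeric column to [0, 1]."""
    values = [r[column] for r in rows]
    low, high = min(values), max(values)
    span = (high - low) or 1.0  # avoid dividing by zero for a constant column
    return [{**r, column: (r[column] - low) / span} for r in rows]

def reduce_columns(rows, keep):
    """Reduction: keep only the columns selected for modelling."""
    return [{c: r[c] for c in keep} for r in rows]

# Hypothetical raw rows containing one missing value and one duplicate.
raw = [
    {"record": 1, "amount": 10.0, "state": "TN", "is_fraud": 0},
    {"record": 2, "amount": None, "state": "MA", "is_fraud": 0},
    {"record": 3, "amount": 110.0, "state": "MD", "is_fraud": 1},
    {"record": 3, "amount": 110.0, "state": "MD", "is_fraud": 1},
    {"record": 4, "amount": 60.0, "state": "TN", "is_fraud": 0},
]
columns = ["record", "amount", "state", "is_fraud"]

missing = assess_quality(raw, columns)
prepared = reduce_columns(
    transform(cleanse(raw, columns), "amount"), ["amount", "is_fraud"]
)
print(missing)
print(prepared)
```

Running the steps in this order matters: scaling before cleansing would fail on the missing amount.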
Feature selection
For a better implementation of the project, feature selection first reduces the number of input variables and removes much of the noise from the given data. This can be done using automated machine learning techniques (Shahbazi and Byun 2022).
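One simple way to reduce the input variables is a filter method that keeps only the features whose correlation with the fraud label is strong enough. The sketch below is an assumption-laden example, not the method of the cited work: the feature names, the toy values, and the 0.3 threshold are all hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 for a constant column."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return 0.0
    return cov / (spread_x * spread_y)

def select_features(table, label, min_abs_corr=0.3):
    """Keep features whose absolute correlation with the label meets the threshold."""
    target = table[label]
    scores = {
        name: abs(pearson(values, target))
        for name, values in table.items()
        if name != label
    }
    kept = [name for name, score in scores.items() if score >= min_abs_corr]
    return sorted(kept, key=lambda name: scores[name], reverse=True), scores

# Hypothetical column-oriented data set.
table = {
    "amount":      [5.0, 12.0, 8.0, 250.0, 310.0, 7.0],
    "hour":        [10, 14, 9, 3, 2, 16],
    "constant_id": [1, 1, 1, 1, 1, 1],   # carries no information
    "noise":       [3, 1, 2, 3, 1, 2],   # unrelated to the label
    "is_fraud":    [0, 0, 0, 1, 1, 0],
}

selected, scores = select_features(table, "is_fraud")
print(selected)
```

Correlation only captures linear relationships, so this kind of filter is best treated as a first pass before model-based selection.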
ML algorithms implementation
The project requires a sound implementation of ML algorithms, which makes it possible to perform fraud identification more effectively. It can include different algorithms such as linear regression, logistic regression, decision trees, and many others.
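To make the decision tree idea concrete, the following sketch trains a one-level tree (a "decision stump") by searching for the single feature and threshold that best separate fraud from non-fraud, measured by Gini impurity. It is a simplified illustration rather than a full tree learner, and the features (amount, hour of day) and sample values are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels (0.0 means the group is pure)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def split_labels(rows, labels, feature, threshold):
    """Partition the labels by whether the feature value is <= threshold."""
    left = [y for row, y in zip(rows, labels) if row[feature] <= threshold]
    right = [y for row, y in zip(rows, labels) if row[feature] > threshold]
    return left, right

def best_split(rows, labels):
    """Try every feature and midpoint threshold; keep the lowest weighted Gini."""
    best, best_impurity = None, float("inf")
    for feature in range(len(rows[0])):
        values = sorted({row[feature] for row in rows})
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2
            left, right = split_labels(rows, labels, feature, threshold)
            impurity = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if impurity < best_impurity:
                best, best_impurity = (feature, threshold), impurity
    return best, best_impurity

def majority(labels):
    """Most common label in a group; ties are resolved as fraud (1)."""
    return 1 if 2 * sum(labels) >= len(labels) else 0

def train_stump(rows, labels):
    """Fit a one-level tree: (feature, threshold, left label, right label)."""
    (feature, threshold), _ = best_split(rows, labels)
    left, right = split_labels(rows, labels, feature, threshold)
    return feature, threshold, majority(left), majority(right)

def predict(stump, row):
    feature, threshold, left_label, right_label = stump
    return left_label if row[feature] <= threshold else right_label

# Hypothetical rows: [amount, hour of day]
rows = [[5.0, 10], [12.0, 14], [8.0, 9], [250.0, 3], [310.0, 2], [7.0, 16]]
labels = [0, 0, 0, 1, 1, 0]

stump = train_stump(rows, labels)
print(stump, predict(stump, [400.0, 1]))
```

A full decision tree applies the same split search recursively to each side until the groups are pure or a depth limit is reached.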
Analysis and Prediction
A good analysis helps to identify the requirements of the project and defines its code structure, which in turn helps to provide better prediction statistics. If the analysis process is effective, it produces good results from the research (Shahbazi and Byun 2022).
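Prediction statistics for a fraud classifier are usually derived from the confusion matrix. The sketch below computes accuracy, precision, recall, and F1 for a pair of label lists; the example labels and predictions are invented purely to show the calculation and are not results from this project.

```python
def confusion_counts(actual, predicted):
    """Return (true positives, false positives, true negatives, false negatives)."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            tp += 1
        elif a == 0 and p == 1:
            fp += 1
        elif a == 0 and p == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn

def evaluate(actual, predicted):
    """Compute common classification metrics, guarding against division by zero."""
    tp, fp, tn, fn = confusion_counts(actual, predicted)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / len(actual)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical labels (1 = fraud) and model predictions.
actual    = [0, 0, 1, 1, 0, 1, 0, 0, 1, 0]
predicted = [0, 0, 1, 0, 0, 1, 1, 0, 1, 0]

metrics = evaluate(actual, predicted)
print(metrics)
```

Because fraudulent transactions are typically rare, precision and recall tend to be more informative than accuracy alone: a model that never predicts fraud can still score a high accuracy.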
5.0 Results
For a better implementation of the software, it is necessary to provide the requirements that are important for executing the program. The code is implemented in Python, because this helps to identify a suitable methodology for structuring the project effectively. The result of the research is that fraudulent transactions are predicted as intended, which allows the user to obtain the results effectively. All the algorithms work effectively, which makes the results easier to obtain. The overall result of the research is to find and analyze the working process of fraud detection.
6.0 Conclusions and Recommendations
Machine learning algorithms have simplified the structure of different techniques, which has improved the way the algorithms work. The study has achieved its aim of predicting fraudulent transactions using machine learning algorithms, and in doing so has met its objectives. To this end, the author reviewed existing research, which supports an understanding of the functionality defined by different authors.
The report has described the methodology used to define its structure. It also includes a section on software development, which gives information about the tools and techniques used. Finally, it has presented the research results, which give a short description of how the algorithms work.
References
De Roux, D., Perez, B., Moreno, A., Villamil, M.D.P. and Figueroa, C., 2018, July. Tax fraud detection for under-reporting declarations using an unsupervised machine learning approach. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (pp. 215-222). https://dl.acm.org/doi/pdf/10.1145/3219819.3219878
Gao, J., Zhou, Z., Ai, J., Xia, B. and Coggeshall, S., 2019. Predicting credit card transaction fraud using machine learning algorithms. Journal of Intelligent Learning Systems and Applications, 11(3), pp.33-63. DOI: 10.4236/jilsa.2019.113003
Megdad, M.M., Abu-Naser, S.S. and Abu-Nasser, B.S., 2022. Fraudulent Financial Transactions Detection Using Machine Learning. International Journal of Academic Information Systems Research (IJAISR), 6(3). https://philpapers.org/archive/MEGFFT-2.pdf
Paleyes, A., Urma, R.G. and Lawrence, N.D., 2020. Challenges in deploying machine learning: a survey of case studies. arXiv preprint arXiv:2011.09926. https://arxiv.org/pdf/2011.09926.pdf
Pandey, P. and Pandey, M.M., 2021. Research methodology tools and techniques. Bridge Center. http://dspace.vnbrims.org:13000/jspui/bitstream/123456789/4666/1/RESEARCH%20METHODOLOGY%20TOOLS%20AND%20TECHNIQUES.pdf
Patil, S., Nemade, V. and Soni, P.K., 2018. Predictive modelling for credit card fraud detection using data analytics. Procedia computer science, 132, pp.385-395. https://reader.elsevier.com/reader/sd/pii/S1877050918309347
Saastamoinen, S., Jyrälä, M. and Lagemann, H., 2021. Project Plan. https://sal.aalto.fi/files/teaching/ms-e2177/2021/2021-ProjectPlan-UPM-final.pdf
Salve, S.M., Samreen, S.N. and Khatri-Valmik, N., 2018. A Comparative Study of Software Development Life Cycle Models. International Research Journal of Engineering and Technology, 5(02), p.5. https://d1wqtxts1xzle7.cloudfront.net/55998573/IRJET-V5I2154-with-cover-page-v2.pdf
Shahbazi, Z. and Byun, Y.C., 2022. Knowledge Discovery on Cryptocurrency Exchange Rate Prediction Using Machine Learning Pipelines. Sensors, 22(5), p.1740. https://doi.org/10.3390/s22051740