Fraudulent Credit Card Transaction Detection with Multithreaded, CPU/GPU-Accelerated SVM and Decision Tree Classifiers

Tech stack: SnapML, scikit-learn, Matplotlib, Seaborn, Pandas, NumPy.

Dataset used: Credit Card Fraud Detection (Kaggle).

Description: The models are the SnapML library's Decision Tree Classifier and SVM, chosen for fast training thanks to their multithreaded CPU and GPU implementations. The dataset contains real credit card transactions made by European cardholders in September 2013. Visualisation showed the classes to be highly imbalanced: only 0.173% of the transactions are fraudulent. This was handled during training by computing sample weights that give the fraudulent transactions a larger weight, biasing the models to pay more attention to them.
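
The weighting step described above can be sketched roughly as follows. This is a minimal illustration, not the project's code: it uses synthetic data in place of the Kaggle dataset, scikit-learn's `DecisionTreeClassifier` as a stand-in for the SnapML model, and assumes the weights were "balanced" class weights from `compute_sample_weight`. The 1% fraud rate, the feature count, and the tree depth are all illustrative choices.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.utils.class_weight import compute_sample_weight

# Synthetic stand-in for the real data (assumption): ~1% "fraud" rows.
rng = np.random.default_rng(0)
n_samples = 10_000
X = rng.normal(size=(n_samples, 5))
y = (rng.random(n_samples) < 0.01).astype(int)
X[y == 1] += 2.0  # shift the fraud rows so the classes are separable

# "balanced" gives each sample the weight n_samples / (n_classes * class_count),
# so the rare class gets a much larger per-sample weight.
w = compute_sample_weight(class_weight="balanced", y=y)

# scikit-learn tree used here as a stand-in for the SnapML classifier.
clf = DecisionTreeClassifier(max_depth=4, random_state=0)
clf.fit(X, y, sample_weight=w)

print("weight of a fraud sample:  ", w[y == 1][0])
print("weight of a genuine sample:", w[y == 0][0])
```

With balanced weights, the total weight of the fraudulent class equals the total weight of the genuine class, so errors on the minority class cost as much overall as errors on the majority.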

Resulting metrics:

Decision Tree Classifier:
- Training duration = 0.4 seconds
- ROC-AUC score = 0.87

Support Vector Machine:
- Training duration = 11.5 seconds
- ROC-AUC score = 0.89
- Hinge loss = 0.18
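
For reference, metrics of the kind listed above can be computed along these lines. This is a hedged sketch on synthetic data: scikit-learn's `LinearSVC` stands in for the SnapML SVM, and the 70/30 split and all variable names are assumptions made for illustration. It does not reproduce the figures reported here.

```python
import time

import numpy as np
from sklearn.metrics import hinge_loss, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

# Synthetic, imbalanced stand-in data (assumption): ~1% positives.
rng = np.random.default_rng(0)
n_samples = 10_000
X = rng.normal(size=(n_samples, 5))
y = (rng.random(n_samples) < 0.01).astype(int)
X[y == 1] += 2.0

# Stratify so both splits keep the same fraud ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Linear SVM with balanced class weights, timed with a wall clock.
svm = LinearSVC(class_weight="balanced", random_state=0)
start = time.perf_counter()
svm.fit(X_train, y_train)
train_seconds = time.perf_counter() - start

# Signed distances to the separating hyperplane; both metrics consume these.
scores = svm.decision_function(X_test)
auc = roc_auc_score(y_test, scores)
loss = hinge_loss(y_test, scores)

print(f"Training duration = {train_seconds:.2f} seconds")
print(f"ROC-AUC score = {auc:.3f}")
print(f"Hinge loss = {loss:.3f}")
```

ROC-AUC is used rather than plain accuracy because, with so few fraudulent samples, a model that predicts "genuine" for every transaction would already score above 99% accuracy.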

View code on GitHub