Mova Insights Inc. - Integrated Business Analytics for small- and medium-sized businesses

Mova Bank Card Fraud Detection: Description

Adjust the model’s Classification Threshold to better predict fraudulent transactions

Mova Bank Card Fraud Detection is an analysis tool for detecting fraudulent credit card transactions.

Bank Card Fraud is a large and growing problem for card-holders and financial institutions. Bank card managers want to reduce fraud by detecting and blocking fraudulent transactions as quickly as possible.

Traditional fraud detection methods and models often do not account for the characteristics of the data. They often rely on less effective techniques, such as re-sampling or re-weighting the data, and keep the default model parameters. The result can be too many annoying false positives (legitimate transactions flagged as fraud) and missed fraud cases (false negatives).

Effective fraud detection tools:

  • Help prevent financial loss to the card-holder and the bank.
  • Help banks keep card-holders’ data safe.
  • Help prevent future fraudulent transactions.

Run Mova Bank Card Fraud Detection

Select a Classification Threshold value for the model from the slider below, then click the Run Bank Card Fraud Detection button.

The default Classification Threshold is 0.5, which works well for classification of balanced datasets.

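The model estimates a probability of fraud for each transaction record, and the Classification Threshold is the cut-off that turns that probability into a Fraud or Not Fraud label. Lowering the threshold classifies more transactions as fraudulent, which catches more of the rare fraud cases in this imbalanced dataset at the cost of more false alarms.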
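A minimal sketch of how a threshold turns predicted fraud probabilities into labels. The probability values and the `classify` helper are hypothetical, and the sketch assumes that a probability exactly equal to the threshold counts as Fraud.

```python
# Hypothetical fraud probabilities for five transactions (illustrative values only).
probabilities = [0.02, 0.15, 0.40, 0.55, 0.91]

def classify(probs, threshold=0.5):
    """Label each record Fraud (1) or Not Fraud (0); a probability equal to the threshold counts as Fraud."""
    return [1 if p >= threshold else 0 for p in probs]

print(classify(probabilities, 0.5))  # [0, 0, 0, 1, 1]
print(classify(probabilities, 0.1))  # [0, 1, 1, 1, 1]
```

Dropping the threshold from 0.5 to 0.1 flags two more transactions as Fraud without changing the model itself.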

It’s easy. It’s free. No signup or registration.

Fraud Detection Models

This application demonstrates the large effect that varying one key hyperparameter of the model, the Classification Threshold, has on the output of a classification model.

The application uses a Logistic Regression model to classify bank card transactions as either fraudulent or not fraudulent. Logistic Regression is used because it is a simple model that runs fast and produces meaningful results for the demonstration.
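To make the model concrete, here is a minimal sketch of the calculation a fitted Logistic Regression performs for one transaction: a weighted sum of the input variables passed through the logistic (sigmoid) function. The weights, the bias, and the `fraud_probability` name are hypothetical stand-ins for the fitted model's parameters.

```python
import math

def fraud_probability(features, weights, bias):
    """Sigmoid of the linear score bias + sum(w * x): the estimated probability of Fraud."""
    score = bias + sum(w * x for w, x in zip(weights, features))
    # Numerically stable sigmoid: avoids math.exp overflow for very negative scores.
    if score >= 0:
        return 1.0 / (1.0 + math.exp(-score))
    z = math.exp(score)
    return z / (1.0 + z)

# Hypothetical two-variable example (the real model has 29 input variables).
p = fraud_probability([2.0, 0.5], weights=[0.8, -1.2], bias=-3.0)
print(round(p, 4))  # 0.1192
```

A score of exactly 0 gives a probability of 0.5, so the default threshold of 0.5 corresponds to a score cut-off of zero; a lower threshold moves that cut-off below zero.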

Changing the Classification Threshold changes the model's predictions. Most classification models use a default Classification Threshold of 0.5. This default suits perfectly balanced datasets (an equal number of records in each class), but it is usually not optimal for imbalanced datasets, especially when the important class is the under-represented one.

The data analysis and prediction are challenging because the dataset is very imbalanced: only 0.17% of the transactions are fraudulent. A model can show 99.8% accuracy simply by predicting that every transaction is Not Fraud, but it would then miss every one of the all-important Fraud transactions.
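The arithmetic behind this accuracy trap can be checked directly. This sketch assumes the class counts commonly reported for the Kaggle dataset (284,807 transactions, 492 of them fraudulent); those counts are not stated elsewhere in this description.

```python
# Assumed class counts for the Kaggle Credit Card Fraud Detection dataset.
total_transactions = 284_807
fraud_transactions = 492

fraud_rate = fraud_transactions / total_transactions
# A "model" that predicts Not Fraud for every record is right on every legitimate transaction...
accuracy_all_not_fraud = (total_transactions - fraud_transactions) / total_transactions
# ...but flags none of the fraud, so its Recall is zero.
recall_all_not_fraud = 0 / fraud_transactions

print(f"Fraud rate: {fraud_rate:.2%}")              # Fraud rate: 0.17%
print(f"Accuracy:   {accuracy_all_not_fraud:.1%}")  # Accuracy:   99.8%
print(f"Recall:     {recall_all_not_fraud:.0%}")    # Recall:     0%
```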

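Accuracy is the usual metric for balanced datasets. For highly imbalanced data, Precision (the share of transactions flagged as Fraud that are actually fraudulent) and Recall (the share of actual Fraud transactions that the model flags) are more useful, but they tend to move in opposite directions as the threshold changes. The F1 Score, the harmonic mean of Precision and Recall, combines them into a single number and is often the most useful metric for measuring the output of a model on an imbalanced dataset.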
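A minimal sketch of the three metrics for the Fraud class (label 1), computed from true and predicted labels. The toy labels are hypothetical, and the sketch returns 0.0 when a ratio is undefined (no predicted or no actual fraud), a convention chosen here for simplicity.

```python
def precision_recall_f1(y_true, y_pred):
    """Precision, Recall, and F1 Score for the Fraud class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)  # fraud caught
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # fraud missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # Harmonic mean of Precision and Recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical labels: 4 actual frauds; the model flags 5 records, 3 of them correctly.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 1, 1, 1, 1, 1, 0]
print(precision_recall_f1(y_true, y_pred))  # Precision 0.6, Recall 0.75, F1 about 0.667
```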

Insightful Bank Card Fraud Detection Tool

  • Learn how the Classification Threshold you select affects the Precision and Recall of the model’s prediction.
  • Consider changing the Classification Threshold in your classification models at work to get better performance and outcomes.
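One way to explore the threshold's effect is to sweep a few candidate values and compare Precision, Recall, and F1 at each. The sketch below uses hypothetical labels and probabilities (not output from the Mova application) and picks the candidate with the highest F1 Score; the `metrics_at_threshold` helper is likewise hypothetical.

```python
def metrics_at_threshold(y_true, probs, threshold):
    """Precision, Recall, and F1 for the Fraud class when probability >= threshold means Fraud."""
    y_pred = [1 if p >= threshold else 0 for p in probs]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0  # same value as the harmonic mean
    return precision, recall, f1

# Hypothetical data: 8 legitimate transactions (0) and 2 fraudulent ones (1).
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
probs = [0.01, 0.02, 0.05, 0.08, 0.12, 0.18, 0.25, 0.60, 0.38, 0.45]

results = {}
for threshold in [0.1, 0.2, 0.3, 0.4, 0.5]:
    results[threshold] = metrics_at_threshold(y_true, probs, threshold)
    precision, recall, f1 = results[threshold]
    print(f"threshold={threshold:.1f}  precision={precision:.2f}  recall={recall:.2f}  F1={f1:.2f}")

best = max(results, key=lambda t: results[t][2])
print("Best threshold by F1:", best)  # Best threshold by F1: 0.3
```

On this toy data the default threshold of 0.5 catches no fraud at all, while 0.3 gives the best F1. On real data, the threshold should be chosen on a validation set rather than on the records used to report final results.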

Why use Mova Bank Card Fraud Detection?

This application is not a rigorous analysis of the data or a comparison of analysis methods or models. It shows how the common default settings of many data analysis models (such as the default threshold of 0.5) may not produce the best results for certain datasets, and how simple model tuning can easily improve those results.

After tuning the hyperparameters for the data, you can improve the model further with techniques such as feature engineering.

The dataset is from Kaggle Credit Card Fraud Detection. It comprises 28 anonymized variables plus the transaction time and amount. Based on our pre-analysis, the application excludes Transaction Time and scales Transaction Amount, so the model uses 29 input variables. The output (target) variable is Class, where Class 0 is Not Fraud and Class 1 is Fraud.
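A minimal sketch of that preparation step in plain Python. It assumes the column names of the Kaggle file (`Time`, `V1` to `V28`, `Amount`, `Class`) and assumes standardization (subtract the mean, divide by the standard deviation) as the scaling method; this description does not say which scaler the application uses. The `prepare` helper is hypothetical.

```python
import statistics

# Assumed input columns: 28 anonymized variables plus the scaled amount (Time is left out).
FEATURES = [f"V{i}" for i in range(1, 29)] + ["Amount"]

def prepare(rows):
    """Drop Time, standardize Amount, and split dict rows into inputs X and target y."""
    amounts = [row["Amount"] for row in rows]
    mean = statistics.fmean(amounts)
    std = statistics.pstdev(amounts) or 1.0  # guard against a constant Amount column
    X, y = [], []
    for row in rows:
        scaled = dict(row, Amount=(row["Amount"] - mean) / std)
        X.append([scaled[name] for name in FEATURES])  # Time is simply never selected
        y.append(row["Class"])
    return X, y
```

In a real pipeline, compute the mean and standard deviation on the training records only and reuse them for the test records, so that no information leaks from the test set.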

Based on our pre-analysis, the other main Logistic Regression hyperparameters (Solver, Penalty, and C parameter) have only a small effect on model performance for this dataset. Our model therefore does not vary them and keeps their default values, which produced the best relative model performance.

Design your classification models for meaningful performance:

  1. Be aware of imbalanced classes in a dataset. Most real-world datasets are imbalanced to some degree.
  2. Tune the high-level hyperparameters, such as the Classification Threshold, Solver, Penalty, and C parameter.
  3. Use feature engineering to improve model performance and prediction output. Feature engineering requires time and effort; be creative.
  4. Avoid re-sampling the data. Under-sampling can remove important information from the data, and over-sampling can add misleading data.
  5. Avoid adjusting the data weights. This can bias the outcome for or against your goal.
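Item 1 can start with a quick count of the target classes. This sketch (a hypothetical `class_balance` helper applied to toy labels) reports each class's share of the records and the ratio of the largest class to the smallest.

```python
from collections import Counter

def class_balance(labels):
    """Return each class's share of the records and the majority-to-minority count ratio."""
    counts = Counter(labels)
    total = sum(counts.values())
    shares = {cls: n / total for cls, n in counts.items()}
    ratio = max(counts.values()) / min(counts.values())
    return shares, ratio

# Toy labels: 998 Not Fraud (0) and 2 Fraud (1) records.
shares, ratio = class_balance([0] * 998 + [1] * 2)
print(shares)  # {0: 0.998, 1: 0.002}
print(ratio)   # 499.0
```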