Introduction of Fraud detection
In short, Target of fraud detection is to detect fradulent traffic and filtering or block them. What mekes this interesting and difficult are what we are detecting are human, it might be as hard as you can imagine to defense with huamn.
Strategy
Rules
Most of the traffic are produced by pruely machine/device without any human involved, like Ddos. Rules usually can handle these kind of fraudulent traffic because they are too obvious.
The core ideas to design a rule is to fine a dimension and feature, like count of traffic from same IP. What we need is find a proper dimension and valid features for current case.
Models
After we block the huge amount of fraudulent traffic, fraud rings usually will add more human-liked feature into the traffic, or create the behaviour totally manually. What they care is RIO, just like what we cared. If the profit is high enough for the manual operation time, it will worth.
We need more features and signals to detect fraudulent traffic when they are more likely produced by human, that’s why we need statistic algorithm and machine learning models to detect.
- Anomaly deteciton
- Linear regression
- Tree-base models
- Graph-base models
- Deep learning models
- Mixture of above
fraud similation
We can simulate the fraud rings to attack our business, which can improve anti-fraud quality help use to estimate the cost to break anti-fraud productions.
architecture
Online
Online Service usually reponse result within 50ms, errors in online product might block all of users of bussiness, and it some times really happens becuase it is designed to blcok users. So we should make sure we don’t give false positive result to business.
Streaming
We need streaming system to calculate faeture values, these feature will be used in rules and models.
Offline
Some complex algorithm cannot be implemented in online and streaming system, we can keep high-productivity iteration in offline stage. We also can try various different methods without considering efficiency.
In addition, we can detect the hardest fraudulent traffic in offline stage, which is benefitical for online model training.
Risk alert
Risk alert can prevent from false negative samples.
We can reduce the threshold of online rules and models to create risk alert methods, it’s also a great method to apply low-accuracy algorithms.
Monitor service
Monitor the traffic that we judge fraudulent, prevent from false positive.
Montiro and analysis feedback from business, the have more clear view about these traffic.
Summary
Fradu detection is a mixture area that need engineering, statistics, algorithms and security, it’s a very interesting topic if you like it. It will be more challenging influenced by AI because there are less and less difference between huamn and AI.