Identify Social Scams in P2P Payments using Real-Time ML

Any ecosystem that supports the transfer of value between users, from marketplaces to online games, will inevitably attract scammers looking to defraud unsuspecting customers.

A machine learning model driven by real-time risk signals can track the flow of money across touchpoints and identify scams before the money is out the door.

Block suspicious activity using the freshest data.

Bad guys move money to hide

Fraudsters and social scammers hide their activity by quickly moving money through a network of mule accounts.

Add real-time link analysis

Compute real-time risk signals across deposits, transfers, and withdrawals to track the suspicious activity.

Stop the fraud before the money is gone

An accurate risk model at cashout time instantly blocks bad withdrawals while giving a low-friction experience to good customers.

A custom fraud solution your data team can manage

Sumatra gives data scientists the infrastructure and development tools needed to build, deploy, and operate machine learning solutions that stop fraud and abuse in real time.

Here are the steps for shipping a risk model for P2P social scams with Sumatra.

1. Define your risk signals as code

Sumatra provides a concise language, Scowl, for defining powerful, stateful features that are computed in real time. Here are a few examples of signals you can build to identify risky transactions:

Flag suspiciously-linked deposits

event cash_in

card_hashes_per_bin :=
    CountUnique(card_hash by card_bin, acct_id)

dollars_in_same_device_diff_account :=
    Sum(amount by device_id)
    - Sum(amount by device_id, acct_id)

On each cash_in transaction, we can compute risk factors like the number of unique credit cards the account has been attempting to use from the same card BIN. Similarly, we can see how much money this same device is depositing to other accounts.
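To make the semantics of these two signals concrete, here is a minimal plain-Python sketch of the running state they imply. It is only an illustration, not how Sumatra computes them: the `CashInSignals` class and the sample events are hypothetical, and it assumes each aggregate includes the current event.

```python
from collections import defaultdict


class CashInSignals:
    """Running state for the two deposit signals, updated one event at a time."""

    def __init__(self):
        # (card_bin, acct_id) -> set of card hashes seen so far
        self.cards_by_bin_acct = defaultdict(set)
        # device_id -> total dollars deposited from that device
        self.dollars_by_device = defaultdict(float)
        # (device_id, acct_id) -> dollars deposited from that device to that account
        self.dollars_by_device_acct = defaultdict(float)

    def update(self, event):
        """Fold in one cash_in event and return its signal values.

        Assumption: aggregates include the current event.
        """
        bin_key = (event["card_bin"], event["acct_id"])
        dev_key = event["device_id"]
        dev_acct_key = (event["device_id"], event["acct_id"])

        self.cards_by_bin_acct[bin_key].add(event["card_hash"])
        self.dollars_by_device[dev_key] += event["amount"]
        self.dollars_by_device_acct[dev_acct_key] += event["amount"]

        return {
            "card_hashes_per_bin": len(self.cards_by_bin_acct[bin_key]),
            "dollars_in_same_device_diff_account": (
                self.dollars_by_device[dev_key]
                - self.dollars_by_device_acct[dev_acct_key]
            ),
        }


# Hypothetical sample data: one device funding two different accounts.
signals = CashInSignals()
events = [
    {"acct_id": "a1", "device_id": "d1", "card_bin": "411111", "card_hash": "h1", "amount": 50.0},
    {"acct_id": "a1", "device_id": "d1", "card_bin": "411111", "card_hash": "h2", "amount": 25.0},
    {"acct_id": "a2", "device_id": "d1", "card_bin": "411111", "card_hash": "h3", "amount": 100.0},
]
results = [signals.update(e) for e in events]
```

In this sample, the third deposit is the first one where the device has also funded a different account, so it is the only event with a non-zero `dollars_in_same_device_diff_account`.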

Identify transfers between strangers

event transfer

pair_receives :=
    Count(by sender, receiver as receiver, sender)
pair_sends :=
    Count(by sender, receiver exclusive) -- not including this one
is_friend := pair_sends + pair_receives > 0 

On each P2P transfer, we can determine if this sender and receiver have ever sent money to each other in the past. A transfer between two accounts with no prior history is an important risk factor for social scams.
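The pairwise lookup can be sketched in plain Python as follows. This is an illustrative reading of the Scowl above rather than Sumatra's implementation: the `TransferSignals` class is hypothetical, and it assumes the same-direction count excludes the current transfer, as the `exclusive` keyword suggests.

```python
from collections import Counter


class TransferSignals:
    """Tracks how often each ordered (sender, receiver) pair has transacted."""

    def __init__(self):
        self.transfers = Counter()  # (sender, receiver) -> transfers seen so far

    def update(self, sender, receiver):
        """Return is_friend for this transfer, then record the transfer."""
        # Same direction, not counting the current transfer.
        pair_sends = self.transfers[(sender, receiver)]
        # Reverse direction: earlier transfers from the receiver back to the sender.
        pair_receives = self.transfers[(receiver, sender)]
        is_friend = pair_sends + pair_receives > 0
        self.transfers[(sender, receiver)] += 1
        return is_friend


signals = TransferSignals()
first = signals.update("alice", "bob")   # no history in either direction
reply = signals.update("bob", "alice")   # alice has already sent to bob
```

Reading the counts before recording the current transfer is what keeps a first-ever payment from marking the pair as friends.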

Propagate risk forward to cash_out time

event cash_out

transfers_from_stranger :=
    Count<transfer>(by acct_id as receiver
        where not is_friend last 30 days)

risky_deposits :=
    Sum<cash_in>(amount by acct_id
        where card_hashes_per_bin > 2)

Sumatra's native support for cross-event aggregates means that risk signals computed across the customer journey are made available at every decision point, allowing for risk assessments that are comprehensive and based on the freshest context.
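As a rough illustration of what a cross-event aggregate involves, the sketch below keeps state written by `transfer` and `cash_in` events and reads it at `cash_out` time. The `CashOutSignals` class and its method names are hypothetical, and the inclusive handling of the 30-day window boundary is an assumption, not documented Sumatra behavior.

```python
from datetime import datetime, timedelta


class CashOutSignals:
    """State written by transfer and cash_in events, read at cash_out time."""

    def __init__(self):
        self.stranger_transfers = {}   # receiver acct_id -> timestamps of non-friend transfers
        self.risky_deposit_total = {}  # acct_id -> dollars deposited with many cards per BIN

    def on_transfer(self, receiver, is_friend, ts):
        if not is_friend:
            self.stranger_transfers.setdefault(receiver, []).append(ts)

    def on_cash_in(self, acct_id, amount, card_hashes_per_bin):
        if card_hashes_per_bin > 2:
            total = self.risky_deposit_total.get(acct_id, 0.0)
            self.risky_deposit_total[acct_id] = total + amount

    def on_cash_out(self, acct_id, ts):
        # Assumption: the 30-day window is inclusive at both ends.
        cutoff = ts - timedelta(days=30)
        recent = [
            t for t in self.stranger_transfers.get(acct_id, [])
            if cutoff <= t <= ts
        ]
        return {
            "transfers_from_stranger": len(recent),
            "risky_deposits": self.risky_deposit_total.get(acct_id, 0.0),
        }


# Hypothetical history for one account, followed by a withdrawal.
now = datetime(2024, 3, 1)
state = CashOutSignals()
state.on_transfer("mule", is_friend=False, ts=now - timedelta(days=45))  # outside window
state.on_transfer("mule", is_friend=False, ts=now - timedelta(days=2))   # counted
state.on_transfer("mule", is_friend=True, ts=now - timedelta(days=1))    # friend, ignored
state.on_cash_in("mule", amount=200.0, card_hashes_per_bin=3)            # risky deposit
state.on_cash_in("mule", amount=40.0, card_hashes_per_bin=1)             # not risky
features = state.on_cash_out("mule", now)
```

The point of the sketch is that signals evaluated on earlier event types (`is_friend`, `card_hashes_per_bin`) act as filters on what the later `cash_out` event gets to see.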

2. Train your ML model on backfilled features—no reimplementation required

Sumatra includes a distributed offline compute engine (similar to Spark), which can quickly backfill Scowl features over historical data to generate a model training set as a dataframe in Python.

from sumatra import Client

client = Client()
train = client.replay(
df = train.get_features("cash_out")

Note that the Scowl feature definitions are used directly for replay, with no reimplementation required. Backfilled features are point-in-time consistent with what would have been computed online.
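Point-in-time consistency is easiest to see with a toy example. The sketch below is not Sumatra's replay engine; it uses hypothetical helper names and a stand-in feature (the number of earlier events for the same account) to contrast a timestamp-ordered replay with a naive aggregate over the whole history, which leaks future events into early rows.

```python
from collections import Counter


def replay_point_in_time(events):
    """For each event, count only the events that happened before it."""
    seen = Counter()
    rows = []
    for event in sorted(events, key=lambda e: e["ts"]):
        rows.append({
            "ts": event["ts"],
            "acct_id": event["acct_id"],
            "prior_events": seen[event["acct_id"]],
        })
        seen[event["acct_id"]] += 1  # state advances only after the row is emitted
    return rows


def leaky_counts(events):
    """Anti-example: a full-history aggregate lets early rows see the future."""
    totals = Counter(e["acct_id"] for e in events)
    return [
        {"ts": e["ts"], "acct_id": e["acct_id"], "prior_events": totals[e["acct_id"]] - 1}
        for e in sorted(events, key=lambda e: e["ts"])
    ]


# Hypothetical event log, deliberately stored out of order.
history = [
    {"ts": 3, "acct_id": "a1"},
    {"ts": 1, "acct_id": "a1"},
    {"ts": 2, "acct_id": "a1"},
]
rows = replay_point_in_time(history)
leaky = leaky_counts(history)
```

A model trained on the leaky values would see feature distributions that never occur at decision time, which is the failure mode a point-in-time backfill avoids.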

Now let's use the popular scikit-learn package to train a gradient boosted tree model on our dataframe and save it as a PMML model artifact.

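import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.impute import SimpleImputer
from sklearn2pmml import sklearn2pmml
from sklearn2pmml.pipeline import PMMLPipeline

# Impute missing feature values, then fit a gradient boosted tree classifier.
pipeline = PMMLPipeline([
    ("imputer", SimpleImputer(missing_values=np.nan, strategy="mean")),
    ("classifier", GradientBoostingClassifier(n_estimators=30)),
])
# `labels` holds the known fraud outcome for each row of `df` (not shown here).
pipeline.fit(df, labels)
# Export the fitted pipeline as a PMML model artifact.
sklearn2pmml(pipeline, "cash_out_model.xml", with_repr=True)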

3. Deploy model directly in Sumatra

Sumatra not only computes model inputs but also executes machine learning models directly in the same environment, with no need to stand up separate model services.

First, we upload our PMML, which can be done from the same Python client used to build our training set:

client.create_model_from_pmml("cash_out_model", "cash_out_model.xml")

The platform automatically tracks model versions and their input/output schemas.

Finally, we update our Scowl code to perform model inference at cash_out time:

-- cash_out.scowl
score := ModelPredict<cash_out_model>({

And we're live!

When we click "Apply" in Sumatra, we deploy a serverless service that performs our online feature computation and model serving with near-instant freshness, a median latency of around 50 ms, and effortless auto-scaling up to 10,000 TPS.

More fraud recipes

To check out another recipe for reducing fraud and abuse with Sumatra, see: Supercharge Your Stripe Radar with ATO Risk Signals.

Ready to start building these and many more fraud signals?