Anti-Fraud training course
Risks of fraud: measurement models and analysis support tools
Padova, 6-26 May 2015
This event is supported by the European Union Programme Hercule III (2014-2020). This programme is implemented by the European Commission. It was established to promote activities in the field of the protection of the financial interests of the European Union (for more information see http://ec.europa.eu/anti_fraud/about-us/funding/index_en.htm).
Risks of fraud: measurement models and analysis support tools
Padova, 26 May 2015
Mr. Andrea Chiusani, Mr. Luca Marzegalli, Mr. Piero Di Michele, Mr. Marco Ferrara
This document reflects the author’s view and the European Commission is not responsible for the views displayed in the publications and/or in conjunction with the activities for which the grant is used. The information contained in this publication does not necessarily reflect the position or opinion of the European Commission.
Agenda
Introduction to Data Analysis
Rules based analysis and Alerting tools
Models for Data Analysis
Dashboarding and decision process
eDiscovery
Link Analysis
Introduction to Data Analysis
Why Fraud Data Analytics
Purpose of Data Analytics
Data Analytics (DA) is the science of examining data (structured and raw) in order to draw conclusions from that information. Data Analytics is used in large companies and organizations to help management identify the best business decisions.
Explain the cause and effect of phenomena
Identify a specific problem
Anticipate the events that may determine a company's future, and improve future actions
Why Fraud Data Analytics
Anti-fraud control through inspection of 100% of transactions
Automates the analyses performed by forensic auditors and fraud examiners
Retrospective and/or real-time
Independently validates compliance with the company code of conduct
Measures the gap between policies and expectations on one side and what really happens on the other, and reports on control effectiveness
Detects risks as they happen, when they are less costly and less complex to prove, correct and remediate
Detection: error, waste, misuse, abuse, fraud
Categories: waste and error; misuse and abuse; fraud and misconduct
Expected benefits: recovery of value; process improvement; risk reduction
Specify objectives of analysis:
Identify payment anomalies: duplicates, data errors
Behavioural non-compliance: deviation from contracts, incorrect rates
Missed opportunity: missed rebates or discounts, poor working capital management
Identify control breaches: segregation of duties, invalid or inactive users or accounts, unapproved or out-of-policy expenses
Detect indicators: out-of-sequence transaction steps, inconsistencies or mismatches, invalid or falsified master data
Select possible schemes: ghost vendors or employees; price fixing, supplier bias and kick-backs; manipulation of reported results
Detect fraud indicators: employee-supplier links, unusual sequence or timing of transaction steps, large and unusual journals
EY FDA Survey
Between November 2013 and January 2014, our researchers conducted a total of 466 interviews across 11 countries with organizations actively using forensic data analytics (FDA). Respondents were decision-makers responsible for their companies’ anti-fraud and anti-corruption programs.
Respondents by function (Italy / Global):
Internal audit and risk: 33% / 41%
Finance: 40% / 26%
Legal/compliance: 10% / 17%
Business/management: 0% / 8%
Investigations: 3% / 3%
Other: 15% / 6%
Respondents by revenue (Italy / Global):
More than US$5b: 15% / 22%
US$1b – US$5b: 38% / 33%
US$500m – US$1b: 10% / 9%
US$100m – US$500m: 38% / 35%
Above US$1b: 53% / 56%
Below US$1b: 48% / 44%
All interviews were conducted by telephone in the local language. 40 interviews were conducted in Italy. Results are compared with global findings.
Forensic Data Analytics primary benefits
Data Analytics tools used in organizations: Italy and global results (global results also broken down by industry)
Industries: consumer products; financial services; life sciences; manufacturing; mining; oil and gas; technology, communications and entertainment; transportation. In the table, industry columns are headed by their number of respondents.

Tool | Italy (38) | Global (422) | 28 | 85 | 30 | 100 | 47 | 23 | 88 | 21
Spreadsheet tools such as Microsoft Excel | 39% | 65% | 75% | 79% | 77% | 55% | 55% | 57% | 63% | 62%
Database tools such as Microsoft Access or Microsoft SQL Server | 26% | 43% | 39% | 53% | 37% | 44% | 43% | 13% | 42% | 57%
Forensic analytics software (ACL, IDEA) | 11% | 26% | 25% | 21% | 27% | 24% | 36% | 26% | 27% | 24%
Statistical analysis and data mining packages | 11% | 11% | 0% | 11% | 10% | 14% | 15% | 4% | 13% | 14%
Continuous monitoring tools, which may include governance, risk and compliance tools | 24% | 29% | 25% | 26% | 27% | 26% | 36% | 35% | 35% | 19%
Visualization and reporting tools | 8% | 12% | 18% | 16% | 7% | 11% | 13% | 4% | 10% | 10%
Big data technologies | 0% | 2% | 4% | 1% | 0% | 3% | 4% | 0% | 2% | 0%
Text analytics tools or keyword searching | 24% | 26% | 14% | 33% | 37% | 21% | 28% | 22% | 25% | 24%
Social media/web monitoring tools | 16% | 21% | 18% | 25% | 23% | 23% | 21% | 4% | 17% | 24%
Voice searching and analysis | 0% | 2% | 0% | 2% | 0% | 3% | 4% | 0% | 1% | 5%
Data Analytics – Top 5 success factors
Focus on quick wins: prioritize the initial objectives of the project
Communicate: share information on early successes within the company and across the business units, in order to gain internal consensus
Go beyond rule based analytics
Deliveries take time: avoid the last-minute rush
EY FDA Survey – Italian market
In Italy, 51% of respondents consider corruption the main fraud risk they are exposed to
75% of Italian companies use Forensic Data Analytics (FDA) tools to reduce the risk of fraud and corruption
According to 89% of respondents, the benefits of FDA tools derive from «their capability to intercept potentially poor behaviour» that is hard to find by other means
70% of the companies interviewed believe that new FDA technologies applied to large amounts of information (i.e. «Big Data») are gaining a central role in the prevention and detection of suspicious behaviour within the company.
Rules based analysis and Alerting tools
Fraud Detection Strategy – Controls
Legal representative > 70 years old
Approved amount < 20% of amount requested
Final refund < 50% of refund authorized
To implement a structured and effective Fraud Detection Strategy, we need to introduce the concept of “controls”. Controls are based on identifying and classifying wrong or unusual operations in order to detect fraud patterns.
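Each control can be thought of as a yes/no test on a single application. A minimal Python sketch, assuming a hypothetical `Application` record: the slides name the quantities being checked but not a data model, so the field and control names below are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Application:
    # Hypothetical fields; the slides do not define a schema.
    representative_age: int
    requested_amount: float
    approved_amount: float
    authorized_refund: float
    final_refund: float


# Each control returns True when the application shows the anomaly.
CONTROLS = {
    "representative_over_70": lambda a: a.representative_age > 70,
    "approved_below_20pct_of_requested": lambda a: a.approved_amount < 0.20 * a.requested_amount,
    "refund_below_50pct_of_authorized": lambda a: a.final_refund < 0.50 * a.authorized_refund,
}


def evaluate_controls(app: Application) -> dict[str, int]:
    """Return 1 for every control that fires and 0 for the others."""
    return {name: int(check(app)) for name, check in CONTROLS.items()}


# Hypothetical amounts: approved = 16% of requested, final refund = 55% of authorized.
app = Application(representative_age=71, requested_amount=10000,
                  approved_amount=1600, authorized_refund=1600, final_refund=880)
print(evaluate_controls(app))
```

Keeping the controls in a dictionary means a new control is one extra entry, and its outcome can be weighted later without changing the evaluation loop.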
Fraud Detection Strategy – Control values
Check threshold
In this case, we set a threshold value for each control. In detail:
Legal representative > 70 years old
Approved amount < 20% of amount requested
Final refund < 50% of refund authorized
It is important to identify a threshold value that is significant for each control.
Fraud Detection Strategy – Measure
Measure (total: 100)
A specific weight can be assigned to each control so that an overall score can be computed. In this example we assigned:
Legal representative > 70 years old = 30
Approved amount < 20% of amount requested = 60
Final refund < 50% of refund authorized = 10
The weights of all controls sum to 100.
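The overall score used in these slides can be summarized as a normalized weighted sum. The notation is our own assumption, not taken from the slides: $x_i \in \{0, 1\}$ is the outcome of control $i$ (1 = true), $w_i$ is its weight, and $T$ is the alert threshold.

$$
S = \frac{\sum_{i=1}^{n} w_i \, x_i}{\sum_{i=1}^{n} w_i}, \qquad x_i \in \{0, 1\}, \qquad \text{alert if } S > T
$$

With weights 30, 60 and 10 the denominator is 100, so $S$ always lies between 0 and 1 and can be compared directly with a threshold such as $T = 0.5$.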
Fraud Detection Strategy – Alert threshold
Alert threshold on the overall score: 0.5
Weights: legal representative > 70 years old = 30; approved amount < 20% requested = 60; final refund < 50% authorized = 10
A threshold can be set on the overall score to separate the operations that raise an alert from those that do not: if the overall score of an operation is higher than the designed threshold, the system raises an alert.
Fraud Detection Strategy – Example values
Alert threshold: 0.5; weights: 30 (legal representative > 70 years old), 60 (approved amount < 20% requested), 10 (final refund < 50% authorized)
Example 1, application ID (ID domanda) = 16:
Legal representative 71 years old: 1
Approved amount = 16% requested: 1
Final refund = 55% authorized: 0
(1 = True, 0 = False)
Fraud Detection Strategy – Example result
Example 1, application ID (ID domanda) = 16:
Legal representative 71 years old: 1
Approved amount 16% requested: 1
Final refund 55% authorized: 0
Overall score $S = \frac{(1 \times 30) + (1 \times 60) + (0 \times 10)}{100} = 0.9$
Fraud Detection Strategy - Alerting
Example 1, application ID (ID domanda) = 16: control outcomes 1, 1, 0 with weights 30, 60, 10
Overall score 0.9, higher than the alert threshold of 0.5: Alert
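The scoring and alerting steps of Example 1 can be reproduced in a few lines. A minimal sketch: the weights and the 0.5 threshold come from the slides, while the control names are hypothetical labels. Treating a score exactly equal to the threshold as "no alert" (strict `>`) is an assumption, since the slides only say "higher than".

```python
# Weights from the slides (they sum to 100) and the alert threshold.
WEIGHTS = {
    "representative_over_70": 30,
    "approved_below_20pct_of_requested": 60,
    "refund_below_50pct_of_authorized": 10,
}
ALERT_THRESHOLD = 0.5


def overall_score(outcomes: dict[str, int]) -> float:
    """Weighted share of controls that fired: sum(w * x) / sum(w)."""
    total_weight = sum(WEIGHTS.values())
    return sum(WEIGHTS[name] * outcomes[name] for name in WEIGHTS) / total_weight


def raises_alert(outcomes: dict[str, int]) -> bool:
    # Assumption: strictly greater than the threshold triggers the alert.
    return overall_score(outcomes) > ALERT_THRESHOLD


# Example 1 (ID domanda = 16): 71 years old, 16% approved, 55% refunded.
example_1 = {
    "representative_over_70": 1,
    "approved_below_20pct_of_requested": 1,
    "refund_below_50pct_of_authorized": 0,
}
print(overall_score(example_1))  # 0.9
print(raises_alert(example_1))   # True
```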
Excel example: data
Excel example: formulas
Excel example: conditional formatting
Excel example: output and overall alarm
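The spreadsheet applies the same calculation to every row and highlights the rows above the threshold. The sketch below does the same over a small CSV table. The sheet contents and column names are hypothetical; only the weights and the 0.5 threshold come from the slides. In a spreadsheet, the per-row score would typically be a weighted sum of the 0/1 columns (for example with SUMPRODUCT) divided by the total weight.

```python
import csv
import io

# Hypothetical sheet: one row per application, one 0/1 column per control.
SHEET = """id,over_70,approved_lt_20pct,refund_lt_50pct
16,1,1,0
17,0,0,1
18,1,0,1
19,0,1,0
"""
WEIGHTS = {"over_70": 30, "approved_lt_20pct": 60, "refund_lt_50pct": 10}
THRESHOLD = 0.5


def score_sheet(text):
    """Return (id, score, alert) for every row of the sheet."""
    results = []
    for row in csv.DictReader(io.StringIO(text)):
        weighted = sum(w * int(row[col]) for col, w in WEIGHTS.items())
        score = weighted / sum(WEIGHTS.values())
        results.append((row["id"], score, score > THRESHOLD))
    return results


results = score_sheet(SHEET)
for app_id, score, alert in results:
    # Mirrors the conditional formatting: mark the rows that raise an alarm.
    print(app_id, score, "ALERT" if alert else "")
```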
ALERTING
Advanced Graph and Chart
False Positive – The importance of analysis
Figure: all requests (ID Request 1, 2, 3, ...) divided into two groups, Allowed and Fraud.
False Positive – Misuse detection
Misuse detection: define what counts as unacceptable behaviour and flag the requests that match it (the misuse).
Figure: alert raised on ID Request 3, a fraudulent request.
False Positive – Anomaly detection
Anomaly detection: define what counts as acceptable behaviour and, by exclusion, flag everything else.
Figure: alert raised on ID Request 1 and ID Request 2, which are allowed requests.
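The two approaches differ in what is described explicitly. The minimal sketch below contrasts them. The requests, the misuse rule and the "acceptable" amount range are all hypothetical: misuse detection alerts only when a known bad pattern matches, while anomaly detection alerts on anything outside the acceptable profile, which can also flag legitimate requests.

```python
# Hypothetical requests; assume ID 3 is fraudulent and ID 4 is legitimate.
REQUESTS = [
    {"id": 1, "amount": 12000, "applicant_age": 45},
    {"id": 2, "amount": 8000, "applicant_age": 52},
    {"id": 3, "amount": 95000, "applicant_age": 78},
    {"id": 4, "amount": 30000, "applicant_age": 40},
]

# Misuse detection: describe the unacceptable behaviour, alert on a match.
MISUSE_RULES = [
    lambda r: r["applicant_age"] > 70 and r["amount"] > 50000,
]


# Anomaly detection: describe the acceptable behaviour, alert on the rest.
def is_acceptable(r):
    return 1000 <= r["amount"] <= 20000


def misuse_alerts(requests):
    return [r["id"] for r in requests if any(rule(r) for rule in MISUSE_RULES)]


def anomaly_alerts(requests):
    return [r["id"] for r in requests if not is_acceptable(r)]


print(misuse_alerts(REQUESTS))   # [3]
print(anomaly_alerts(REQUESTS))  # [3, 4]: ID 4 is a legitimate request flagged
```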
False Positive and False Negative
Figure: both anomaly detection and misuse detection can produce false positives and false negatives across allowed and fraudulent requests.
False positive: an acceptable behaviour raises an alarm.
False negative: an unacceptable behaviour does not raise an alarm.
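When the real outcome of each request is known, for example after investigation, the alerts can be compared against it to count both error types. A minimal sketch with hypothetical request IDs: sets make each count a simple set operation.

```python
def confusion_counts(alerted, fraudulent, all_ids):
    """Compare the alerts raised with the known outcome of each request."""
    return {
        "true_positive": len(alerted & fraudulent),            # fraud, alarm raised
        "false_positive": len(alerted - fraudulent),           # acceptable, alarm raised
        "false_negative": len(fraudulent - alerted),           # fraud, no alarm
        "true_negative": len(all_ids - alerted - fraudulent),  # acceptable, no alarm
    }


# Hypothetical outcome: request 3 is fraud, alerts were raised on 3 and 4.
all_ids = {1, 2, 3, 4}
fraudulent = {3}
alerted = {3, 4}
counts = confusion_counts(alerted, fraudulent, all_ids)
print(counts)
# {'true_positive': 1, 'false_positive': 1, 'false_negative': 0, 'true_negative': 2}
```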
Logic-based control objects
Objects: legal representative, employee, legal entity, request, supplier
Logic-based control objects: example
Are there legal representatives who submit more than one request?
Is there a supplier that appears in only one request and has never been seen in any other request?
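Both questions can be answered with grouped queries. A minimal sketch using Python's built-in `sqlite3` module. The tables `REQUEST` and `REQUEST_SUPPLIER` and their contents are hypothetical. The queries use `HAVING`, which filters groups after aggregation, to keep only the groups that answer each question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE REQUEST (REQUEST_ID INTEGER PRIMARY KEY, LEGAL_REP_ID TEXT);
CREATE TABLE REQUEST_SUPPLIER (REQUEST_ID INTEGER, SUPPLIER_ID TEXT);
INSERT INTO REQUEST VALUES (101, 'REP_A'), (102, 'REP_A'), (103, 'REP_B'), (104, 'REP_C');
INSERT INTO REQUEST_SUPPLIER VALUES
    (101, 'SUP_X'), (102, 'SUP_X'), (103, 'SUP_X'), (103, 'SUP_Y'), (104, 'SUP_Z');
""")

# Legal representatives who submitted more than one request.
reps = conn.execute("""
    SELECT LEGAL_REP_ID, COUNT(*) AS N_REQUESTS
    FROM REQUEST
    GROUP BY LEGAL_REP_ID
    HAVING COUNT(*) > 1
""").fetchall()

# Suppliers that appear in exactly one request and in no other.
single_use = conn.execute("""
    SELECT SUPPLIER_ID
    FROM REQUEST_SUPPLIER
    GROUP BY SUPPLIER_ID
    HAVING COUNT(DISTINCT REQUEST_ID) = 1
    ORDER BY SUPPLIER_ID
""").fetchall()

print(reps)        # [('REP_A', 2)]
print(single_use)  # [('SUP_Y',), ('SUP_Z',)]
```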
Spreadsheet vs Relational Database
Figure: the same application records, with columns Id Misura (measure ID), Pratica (file), Id Domanda (application ID), Des Stato Domanda (application status), Ente Delegato (delegated office), Cuaa (farm identification code) and Ragione Sociale (company name), shown as a spreadsheet and as a relational database.
The spreadsheet, like Microsoft Excel, is a simple tool that can be used to collect, sort and analyze various data. The spreadsheet is suited to simple, basic methods.
A database is a storage space for content and information (data). Any collection of homogeneous data stored in a structured form can be used to perform complex analyses with multiple variables and conditions. Databases are based on relationships and on link analysis, and the results can be represented as connections.
Relational Databases - Introduction
Relational databases are a category of databases (currently the most common) in which data are stored in tables, originally called relations.
A table is a collection of related data entries and it consists of columns and rows.
A database contains one or more tables, and each table contains rows (records) of data
Table PEOPLE (HEIGHT in cm):
ID | FIRST_NAME | LAST_NAME | CITY | COUNTRY | HEIGHT
1 | Albert | Lucas | London | England | 184
2 | Beatrice | Monroe | New York | USA | 171
3 | Charles | Jones | New York | USA | 176
4 | Diane | Mc Gregor | New York | USA | 165
For instance, the table above contains 4 records, each with 6 fields. To interact with a relational database system, that is, to insert and retrieve data, we have to use a specific language. For this purpose, most databases use SQL (Structured Query Language). SQL provides the syntax to create, retrieve, update or delete a piece of information.
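The four operations (create, retrieve, update, delete) can be tried on the PEOPLE table with Python's built-in `sqlite3` module, which runs SQL against an in-memory database. A minimal sketch: the column types and the update value "Boston" are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Create: define the table and insert the four records.
conn.execute("""CREATE TABLE PEOPLE (
    ID INTEGER PRIMARY KEY, FIRST_NAME TEXT, LAST_NAME TEXT,
    CITY TEXT, COUNTRY TEXT, HEIGHT INTEGER)""")
conn.executemany("INSERT INTO PEOPLE VALUES (?, ?, ?, ?, ?, ?)", [
    (1, "Albert", "Lucas", "London", "England", 184),
    (2, "Beatrice", "Monroe", "New York", "USA", 171),
    (3, "Charles", "Jones", "New York", "USA", 176),
    (4, "Diane", "Mc Gregor", "New York", "USA", 165),
])

# Retrieve: count the records.
print(conn.execute("SELECT COUNT(*) FROM PEOPLE").fetchone()[0])  # 4

# Update: change one field of one record (hypothetical new value).
conn.execute("UPDATE PEOPLE SET CITY = ? WHERE ID = ?", ("Boston", 4))

# Delete: remove one record.
conn.execute("DELETE FROM PEOPLE WHERE ID = ?", (1,))

remaining = conn.execute("SELECT ID, CITY FROM PEOPLE ORDER BY ID").fetchall()
print(remaining)  # [(2, 'New York'), (3, 'New York'), (4, 'Boston')]
```

The `?` placeholders let the database bind the values itself instead of pasting them into the SQL text, which avoids quoting mistakes.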
Why databases are useful
• Databases are specifically designed for dealing with large amounts of data
• They use languages designed specifically for data manipulation
• They provide features for mission-critical aspects such as security, efficiency, reliability, fault tolerance, data consistency, backup, etc.
Moreover, most business applications use a database system, so the use of databases is the key to data analytics. By contrast, when the amount of information grows, using spreadsheets becomes particularly difficult, or even unfeasible.
Relational Databases - Querying
Given the PEOPLE table above, if we want to get the information about the people living in New York and taller than 170 cm, we can write a SQL statement like this:
Select * From PEOPLE Where CITY = 'New York' and HEIGHT > 170
This translates to «get all the rows from table PEOPLE having 'New York' as value for the field CITY and a value greater than 170 for the field HEIGHT», and returns this result:
ID | FIRST_NAME | LAST_NAME | CITY | COUNTRY | HEIGHT
2 | Beatrice | Monroe | New York | USA | 171
3 | Charles | Jones | New York | USA | 176
Each condition uses a comparison operator, such as “equal” (=) or “greater than” (>) in this example. We can use as many conditions as we want inside the WHERE clause, linking them together with the logical operators AND and OR.
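The query can be run as-is with `sqlite3`, together with a variation that uses OR. A minimal sketch: the values are passed as bound parameters rather than written into the SQL string, and the OR condition is an illustrative addition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE PEOPLE (
    ID INTEGER PRIMARY KEY, FIRST_NAME TEXT, LAST_NAME TEXT,
    CITY TEXT, COUNTRY TEXT, HEIGHT INTEGER)""")
conn.executemany("INSERT INTO PEOPLE VALUES (?, ?, ?, ?, ?, ?)", [
    (1, "Albert", "Lucas", "London", "England", 184),
    (2, "Beatrice", "Monroe", "New York", "USA", 171),
    (3, "Charles", "Jones", "New York", "USA", 176),
    (4, "Diane", "Mc Gregor", "New York", "USA", 165),
])

# AND: people living in New York and taller than 170 cm.
tall_ny = conn.execute(
    "SELECT * FROM PEOPLE WHERE CITY = ? AND HEIGHT > ? ORDER BY ID",
    ("New York", 170),
).fetchall()

# OR: people living in London or shorter than 170 cm.
either = conn.execute(
    "SELECT FIRST_NAME FROM PEOPLE WHERE CITY = ? OR HEIGHT < ? ORDER BY ID",
    ("London", 170),
).fetchall()

print(tall_ny)  # Beatrice (171) and Charles (176)
print(either)   # [('Albert',), ('Diane',)]
```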
Relational Databases – Joining tables
One of the key principles of relational databases is to spread the information over multiple tables, in order to organize it efficiently and effectively. For example, we might have a second table, PHONE_BOOK, containing the phone numbers:
PHONE_ID | PERSON_ID | PHONE_TYPE | PHONE_NUMBER
1 | 1 | Mobile_business | 122548215
2 | 1 | Mobile_personal | 445431287
3 | 2 | Home_personal | 245482136
4 | 4 | Mobile_business | 545825315
5 | 4 | Home_personal | 785453568
To associate the owner’s name with each phone number, we can write a SQL statement containing a JOIN instruction that matches PERSON_ID in PHONE_BOOK with ID in PEOPLE:
Select FIRST_NAME, LAST_NAME, PHONE_TYPE, PHONE_NUMBER From PEOPLE as P join PHONE_BOOK as B ON P.ID = B.PERSON_ID
FIRST_NAME | LAST_NAME | PHONE_TYPE | PHONE_NUMBER
Albert | Lucas | Mobile_business | 122548215
Albert | Lucas | Mobile_personal | 445431287
Beatrice | Monroe | Home_personal | 245482136
Diane | Mc Gregor | Mobile_business | 545825315
Diane | Mc Gregor | Home_personal | 785453568
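The join can be reproduced with `sqlite3`. In this sketch the ON clause matches `PHONE_BOOK.PERSON_ID` against the `ID` column of PEOPLE, since PEOPLE has no PERSON_ID column. Phone numbers are stored as text, an assumption that avoids losing leading zeros. Charles has no phone entry, so the inner join leaves him out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PEOPLE (ID INTEGER PRIMARY KEY, FIRST_NAME TEXT, LAST_NAME TEXT)")
conn.execute("""CREATE TABLE PHONE_BOOK (
    PHONE_ID INTEGER PRIMARY KEY, PERSON_ID INTEGER, PHONE_TYPE TEXT, PHONE_NUMBER TEXT)""")
conn.executemany("INSERT INTO PEOPLE VALUES (?, ?, ?)", [
    (1, "Albert", "Lucas"), (2, "Beatrice", "Monroe"),
    (3, "Charles", "Jones"), (4, "Diane", "Mc Gregor"),
])
conn.executemany("INSERT INTO PHONE_BOOK VALUES (?, ?, ?, ?)", [
    (1, 1, "Mobile_business", "122548215"),
    (2, 1, "Mobile_personal", "445431287"),
    (3, 2, "Home_personal", "245482136"),
    (4, 4, "Mobile_business", "545825315"),
    (5, 4, "Home_personal", "785453568"),
])

# Inner join: only people with at least one matching phone row.
rows = conn.execute("""
    SELECT P.FIRST_NAME, P.LAST_NAME, B.PHONE_TYPE, B.PHONE_NUMBER
    FROM PEOPLE AS P
    JOIN PHONE_BOOK AS B ON P.ID = B.PERSON_ID
    ORDER BY B.PHONE_ID
""").fetchall()

for row in rows:
    print(row)
```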
Relational Databases – Types of join
There are several variants when joining two tables:
Inner join: only the rows that match in both tables
Left join: all rows from the left table, and the matched rows from the right table
Right join: all rows from the right table, and the matched rows from the left table
Full outer join: all the rows from both tables (combining right and left join)
Figure: Venn diagrams of inner join, left join, right join and full outer join.
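A left join keeps unmatched rows from the left table and fills the missing right-hand fields with NULL (None in Python). Filtering on that NULL isolates the unmatched rows. A minimal sketch with `sqlite3` on a reduced version of the two tables. The same pattern could, for example, surface payments whose vendor is missing from the vendor master data, but that application is our illustration. SQLite itself supports RIGHT and FULL OUTER JOIN only from version 3.39.0, so the sketch uses LEFT JOIN only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PEOPLE (ID INTEGER PRIMARY KEY, FIRST_NAME TEXT)")
conn.execute("CREATE TABLE PHONE_BOOK (PHONE_ID INTEGER PRIMARY KEY, PERSON_ID INTEGER, PHONE_NUMBER TEXT)")
conn.executemany("INSERT INTO PEOPLE VALUES (?, ?)",
                 [(1, "Albert"), (2, "Beatrice"), (3, "Charles"), (4, "Diane")])
conn.executemany("INSERT INTO PHONE_BOOK VALUES (?, ?, ?)", [
    (1, 1, "122548215"), (2, 1, "445431287"), (3, 2, "245482136"),
    (4, 4, "545825315"), (5, 4, "785453568"),
])

# Left join: every person, with None where no phone row matches.
left = conn.execute("""
    SELECT P.FIRST_NAME, B.PHONE_NUMBER
    FROM PEOPLE AS P
    LEFT JOIN PHONE_BOOK AS B ON P.ID = B.PERSON_ID
    ORDER BY P.ID, B.PHONE_ID
""").fetchall()

# Only the unmatched rows: people with no phone number on file.
no_phone = conn.execute("""
    SELECT P.FIRST_NAME
    FROM PEOPLE AS P
    LEFT JOIN PHONE_BOOK AS B ON P.ID = B.PERSON_ID
    WHERE B.PHONE_ID IS NULL
""").fetchall()

print(left)      # six rows; Charles appears with None
print(no_phone)  # [('Charles',)]
```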
Relational Databases – Aggregate functions
Aggregation is a common operation when dealing with data. Consider the following table, PURCHASES:
PURCHASE_ID | PURCHASE_TYPE | AMOUNT | DATE
1 | cat_1 | 1000 | 2015/02/10
2 | cat_3 | 1500 | 2015/03/16
3 | cat_1 | 700 | 2015/02/22
4 | cat_2 | 1400 | 2015/03/07
5 | cat_3 | 400 | 2015/04/27
6 | cat_3 | 1850 | 2015/05/08
7 | cat_2 | 975 | 2015/05/21
To compute, for example, the total amount, we can use the aggregate function SUM():
Select SUM(AMOUNT) as TOT_AMOUNT From PURCHASES
This translates to «sum all the values of the field AMOUNT», and produces the following result:
TOT_AMOUNT
7825
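SQL provides a large number of aggregate functions, such as MIN, MAX and AVG; a standard deviation function is also commonly available, although its name varies between database systems (for example STDEV in Microsoft SQL Server).
Select SUM(AMOUNT) as TOT_AMOUNT, MIN(AMOUNT) as MIN_AMOUNT, MAX(AMOUNT) as MAX_AMOUNT From PURCHASES
TOT_AMOUNT | MIN_AMOUNT | MAX_AMOUNT
7825 | 400 | 1850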
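The aggregate queries can be checked against the PURCHASES data with `sqlite3`. A minimal sketch: AMOUNT is stored as an integer so that the sum is exact, and AVG is added to show one more aggregate function.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE PURCHASES (
    PURCHASE_ID INTEGER PRIMARY KEY, PURCHASE_TYPE TEXT, AMOUNT INTEGER, DATE TEXT)""")
conn.executemany("INSERT INTO PURCHASES VALUES (?, ?, ?, ?)", [
    (1, "cat_1", 1000, "2015/02/10"),
    (2, "cat_3", 1500, "2015/03/16"),
    (3, "cat_1", 700, "2015/02/22"),
    (4, "cat_2", 1400, "2015/03/07"),
    (5, "cat_3", 400, "2015/04/27"),
    (6, "cat_3", 1850, "2015/05/08"),
    (7, "cat_2", 975, "2015/05/21"),
])

# Several aggregates computed in a single pass over the table.
tot, mn, mx, avg = conn.execute("""
    SELECT SUM(AMOUNT) AS TOT_AMOUNT, MIN(AMOUNT) AS MIN_AMOUNT,
           MAX(AMOUNT) AS MAX_AMOUNT, AVG(AMOUNT) AS AVG_AMOUNT
    FROM PURCHASES
""").fetchone()

print(tot, mn, mx)    # 7825 400 1850
print(round(avg, 2))  # 1117.86
```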
Relational Databases – Grouping
SQL allows one to use the GROUP BY statement in conjunction with the aggregate functions in order to group the result set by one or more columns.
Using the same PURCHASES table, we might want to obtain the total separately for each PURCHASE_TYPE:
Select PURCHASE_TYPE, SUM(AMOUNT) as TOT_AMOUNT From PURCHASES GROUP BY PURCHASE_TYPE
This translates to «group the data by the field PURCHASE_TYPE, then compute the sum of the field AMOUNT for each group». This produces:
PURCHASE_TYPE | TOT_AMOUNT
cat_1 | 1700
cat_2 | 2375
cat_3 | 3750
When using a group by statement, the fields extracted in the select statement can only be aggregate functions or the fields used for grouping.
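The grouped totals can be reproduced with `sqlite3`, together with a second grouping on a derived column: the month taken from the DATE text. The monthly grouping is an illustrative addition, not part of the slides. It repeats the expression in GROUP BY instead of the alias, because not every database accepts aliases there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE PURCHASES (
    PURCHASE_ID INTEGER PRIMARY KEY, PURCHASE_TYPE TEXT, AMOUNT INTEGER, DATE TEXT)""")
conn.executemany("INSERT INTO PURCHASES VALUES (?, ?, ?, ?)", [
    (1, "cat_1", 1000, "2015/02/10"), (2, "cat_3", 1500, "2015/03/16"),
    (3, "cat_1", 700, "2015/02/22"), (4, "cat_2", 1400, "2015/03/07"),
    (5, "cat_3", 400, "2015/04/27"), (6, "cat_3", 1850, "2015/05/08"),
    (7, "cat_2", 975, "2015/05/21"),
])

# Total per purchase type: only the grouping field and aggregates are selected.
totals = conn.execute("""
    SELECT PURCHASE_TYPE, SUM(AMOUNT) AS TOT_AMOUNT
    FROM PURCHASES
    GROUP BY PURCHASE_TYPE
    ORDER BY PURCHASE_TYPE
""").fetchall()
print(totals)  # [('cat_1', 1700), ('cat_2', 2375), ('cat_3', 3750)]

# Total per month: group by the first 7 characters of DATE (yyyy/mm).
monthly = conn.execute("""
    SELECT substr(DATE, 1, 7) AS MONTH, SUM(AMOUNT) AS TOT_AMOUNT
    FROM PURCHASES
    GROUP BY substr(DATE, 1, 7)
    ORDER BY MONTH
""").fetchall()
print(monthly)  # [('2015/02', 1700), ('2015/03', 2900), ('2015/04', 400), ('2015/05', 2825)]
```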
Dashboarding and decision process
Dashboarding Introduction to Tableau
Tableau Software is designed to quickly analyze, visualize and share information. You can work with data, move from simple to complex visualizations and combine them in interactive dashboards.
Dashboarding The Marks Card and Buttons
Tableau applies label, color, shape and size to visualizations through the Marks card.
Dashboarding Example dashboard
Structured and Unstructured Data
Figure: data sources in today’s organizations, split between unstructured data (text, graphics, email, presentations and spreadsheets) and structured data (databases, CRM, transaction systems), with shares of 80% and 20% respectively.
eDiscovery (Electronic Discovery): the process of identifying, managing, preserving, processing, analyzing, reviewing, producing and presenting electronically stored information, usually in the context of an investigation or litigation.
Figure: eDiscovery process.
Foundations: experience, robust process, highly trained personnel
Processing: validation and cleansing, metadata and text extraction, de-duplication
Filtering criteria: custodians, keywords, date range
Review outcome: relevant / not relevant
Controls: quality control, exception reporting, tracking and reporting, project management
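The filtering and de-duplication steps can be illustrated on a handful of documents. A minimal sketch with hypothetical documents and criteria: a document is kept if it belongs to a selected custodian, falls in the date range and contains at least one keyword. Exact duplicates are then dropped by comparing a SHA-256 hash of the text. Real eDiscovery platforms usually hash the native file and may de-duplicate per custodian rather than globally; the sketch simplifies both choices.

```python
import hashlib
from datetime import date

# Hypothetical collected documents.
DOCS = [
    {"custodian": "alice", "date": date(2014, 3, 2), "text": "Invoice for consulting services"},
    {"custodian": "alice", "date": date(2014, 3, 2), "text": "Invoice for consulting services"},
    {"custodian": "bob", "date": date(2013, 11, 20), "text": "Lunch plans"},
    {"custodian": "carol", "date": date(2014, 6, 15), "text": "Kick-back on the supplier contract"},
]


def cull(docs, custodians, keywords, start, end):
    """Filter by custodian, date range and keyword, then drop exact duplicates."""
    seen_hashes = set()
    kept = []
    for doc in docs:
        if doc["custodian"] not in custodians:
            continue
        if not (start <= doc["date"] <= end):
            continue
        text = doc["text"].lower()
        if not any(k.lower() in text for k in keywords):
            continue
        digest = hashlib.sha256(doc["text"].encode("utf-8")).hexdigest()
        if digest in seen_hashes:
            continue  # exact duplicate of a document already kept
        seen_hashes.add(digest)
        kept.append(doc)
    return kept


review_set = cull(DOCS, {"alice", "carol"}, ["invoice", "kick-back"],
                  date(2014, 1, 1), date(2014, 12, 31))
print([d["custodian"] for d in review_set])  # ['alice', 'carol']
```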
Next frontier – Link Analysis
Figure: the outputs of eDiscovery and Fraud Data Analytics both feed Link Analysis.
Link Analysis: analysis of the relationships between the identified findings and reconstruction of their dependencies. Data sources: anomalous transactions, relevant documents.
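One simple way to reconstruct dependencies between findings is to treat each finding as a link between two entities and group the entities that are connected, directly or indirectly. A minimal sketch, assuming hypothetical entities and links: the groups are the connected components of an undirected graph, found with a breadth-first search. In the example an employee and a supplier share an address, similar to the employee-supplier link indicator listed earlier in the deck.

```python
from collections import defaultdict, deque

# Hypothetical links derived from findings: (entity, entity).
LINKS = [
    ("EMP:Rossi", "ADDR:Via Roma 1"),     # employee's home address
    ("SUP:Alfa Srl", "ADDR:Via Roma 1"),  # supplier's registered address
    ("SUP:Alfa Srl", "TXN:9001"),         # anomalous payment to the supplier
    ("EMP:Bianchi", "TXN:9002"),          # another anomalous transaction
]


def build_graph(links):
    """Undirected adjacency sets: each link connects two entities."""
    graph = defaultdict(set)
    for a, b in links:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def linked_groups(graph):
    """Connected components via breadth-first search, in sorted order."""
    seen, groups = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        seen.add(start)
        queue, group = deque([start]), []
        while queue:
            node = queue.popleft()
            group.append(node)
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        groups.append(sorted(group))
    return groups


for group in linked_groups(build_graph(LINKS)):
    print(group)
# ['ADDR:Via Roma 1', 'EMP:Rossi', 'SUP:Alfa Srl', 'TXN:9001']
# ['EMP:Bianchi', 'TXN:9002']
```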