Anti Fraud training course

Anti Fraud training course Risks of fraud: measurement models for and analysis supporting tools Padova, May 6-26th 2015

This event is supported by the European Union Programme Hercule III (2014-2020). This programme is implemented by the European Commission. It was established to promote activities in the field of the protection of the financial interests of the European Union. (For more information see http://ec.europa.eu/anti_fraud/about-us/funding/index_en.htm)

Risks of fraud: measurement models for and analysis supporting tools
Padova, May 26th 2015
Mr. Andrea Chiusani, Mr. Luca Marzegalli, Mr. Piero Di Michele, Mr. Marco Ferrara

This document reflects the author’s view and the European Commission is not responsible for the views displayed in the publications and/or in conjunction with the activities for which the grant is used. The information contained in this publication does not necessarily reflect the position or opinion of the European Commission.

Agenda

• Introduction to Data Analysis
• Rules based analysis and Alerting tools
• Models for Data Analysis
• Dashboarding and decision process
• eDiscovery
• Link Analysis

Introduction to Data Analysis

Why Fraud Data Analytics

Data Analytics Object

Data Analytics (DA) is the science of examining data, both structured and raw, in order to draw conclusions from it. Large companies and organizations use Data Analytics to support management in identifying the best business decisions.

Cause and effect: explain the cause-and-effect relationships behind phenomena

Identify specific problem: pinpoint a particular problem

Improve future actions: anticipate the events that may shape the future of a company

Why Fraud Data Analytics



Anti fraud control through inspection of 100% of transactions



Automating the analysis of forensic auditors and fraud examiners



Retrospective and/or real time



Independently validate compliance with company code of conduct



Measure the gap between policy and expectation and what really happens, and report on control effectiveness



Detect risks as they happen - when they are less costly and less complex to prove, correct and remediate

Detection: Error, Waste, Misuse, Abuse, Fraud

Specify objectives of analysis

Waste and error, Misuse and abuse, Fraud and misconduct

Recovery of value, Process improvement, Risk reduction

• Identify payment anomalies: duplicates; data errors
• Behavioural non-compliance: deviation from contracts; incorrect rates
• Missed opportunity: missed rebates or discounts; poor working capital management
• Identify control breaches: segregation of duties; invalid or inactive users or accounts; unapproved or out-of-policy expenses
• Detect indicators: out-of-sequence transaction steps; inconsistencies or mismatches; invalid or falsified master data
• Select possible schemes: ghost vendors or employees; price fixing, supplier bias and kickbacks; manipulation of reported results
• Detect fraud indicators: employee-supplier links; unusual sequence or timing of transaction steps; large and unusual journals

EY FDA Survey

Between November 2013 and January 2014, our researchers conducted a total of 466 interviews across 11 countries with organizations actively using forensic data analytics (FDA). Respondents were decision-makers responsible for their companies’ anti-fraud and anti-corruption programs.

Function | Italy | Global
Internal audit and risk | 33% | 41%
Finance | 40% | 26%
Legal/compliance | 10% | 17%
Business/management | 0% | 8%
Investigations | 3% | 3%
Other | 15% | 6%

Revenue (US$) | Italy | Global
More than US$5b | 15% | 22%
US$1b – US$5b | 38% | 33%
US$500m – US$1b | 10% | 9%
US$100m – US$500m | 38% | 35%
Above US$1b | 53% | 56%
Below US$1b | 48% | 44%

All interviews were conducted by telephone in the local language. 40 interviews were conducted in Italy. Results are compared with global findings.

Forensic Data Analytics primary benefits

Data Analytics tools used in organization (Italy and global results by sector)

Tool | Italy | Total | Transportation | Consumer products | Financial services | Manufacturing | Life sciences | Mining | Oil and gas | Technology, communications and entertainment
Base (respondents) | 38 | 422 | 28 | 85 | 30 | 100 | 47 | 23 | 88 | 21
Spreadsheet tools such as Microsoft Excel | 39% | 65% | 75% | 79% | 77% | 55% | 55% | 57% | 63% | 62%
Database tools such as Microsoft Access or Microsoft SQL Server | 26% | 43% | 39% | 53% | 37% | 44% | 43% | 13% | 42% | 57%
Forensic analytics software (ACL, IDEA) | 11% | 26% | 25% | 21% | 27% | 24% | 36% | 26% | 27% | 24%
Statistical analysis and data mining packages | 11% | 11% | 0% | 11% | 10% | 14% | 15% | 4% | 13% | 14%
Continuous monitoring tools, which may include governance risk and compliance tools | 24% | 29% | 25% | 26% | 27% | 26% | 36% | 35% | 35% | 19%
Visualization and reporting tools | 8% | 12% | 18% | 16% | 7% | 11% | 13% | 4% | 10% | 10%
Big data technologies | 0% | 2% | 4% | 1% | 0% | 3% | 4% | 0% | 2% | 0%
Text analytics tools or keyword searching | 24% | 26% | 14% | 33% | 37% | 21% | 28% | 22% | 25% | 24%
Social media/web monitoring tools | 16% | 21% | 18% | 25% | 23% | 23% | 21% | 4% | 17% | 24%
Voice searching and analysis | 0% | 2% | 0% | 2% | 0% | 3% | 4% | 0% | 1% | 5%

Data Analytics – Top 5 success factors



Focus on quick win: prioritize initial objectives of the project



Communicate: share information on early successes within the company and among the business units, in order to gain internal consensus



Go beyond rule based analytics



Deliveries take time: avoid the last minute rush

EY FDA Survey – Italian market

In Italy:

• 51% of the people interviewed consider corruption the main fraud risk to which they are exposed
• 75% of Italian companies use Forensic Data Analytics (FDA) tools to reduce the risk of fraud and corruption
• According to 89% of the people surveyed, the benefits of FDA tools derive from «their capability to intercept potentially poor behaviour» that is hard to find by other means
• 70% of the companies interviewed believe that new FDA technologies applied to large amounts of information (i.e. «Big Data») are gaining a central role in the prevention and detection of suspect behaviour within the company

Rules based analysis and Alerting tools

Fraud Detection Strategy – Controls

Legal representative > 70 years old

Approved amount < 20% requested

Final refund < 50% authorized

To implement a structured and effective Fraud Detection Strategy, we need to introduce the concept of “Controls”. Controls are logical checks that identify and classify incorrect or unusual operations in order to detect fraud patterns.

Fraud Detection Strategy – Controls values

Check | Threshold

In this case, we report a value for each control. In detail:

• Legal representative: > 70 years old
• Approved amount: < 20% of amount requested
• Final refund: < 50% of refund authorized

It is important to identify a value that is significant for each control.

Fraud Detection Strategy – Measure

Each control can be assigned a specific weight, which is then used to compute the overall score. In this example we assigned:

• Legal representative > 70 years old = 30
• Approved amount < 20% of amount requested = 60
• Final refund < 50% of refund authorized = 10

The total of all control values is 100.

Fraud Detection Strategy – Alert threshold

Alert threshold: 0.5

Overall scoring weights: Legal representative > 70 years old = 30; Approved amount < 20% requested = 60; Final refund < 50% authorized = 10

An alert threshold can be assigned to the overall score. If the combined score of the triggered controls exceeds this threshold, the system raises an alert.

Fraud Detection Strategy – Example values

Example 1 (Request ID = 16), where 1 = True and 0 = False:

• Legal representative 71 years old: 1
• Approved amount = 16% requested: 1
• Final refund = 55% authorized: 0

Fraud Detection Strategy – Example result

Example 1 (Request ID = 16):

\[ \text{Score} = \frac{(1 \times 30) + (1 \times 60) + (0 \times 10)}{100} = 0.9 \]

Fraud Detection Strategy - Alerting

Example 1 (Request ID = 16): the score of 0.9 is above the alert threshold of 0.5, so the system raises an alert.
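The weighted scoring walked through in these slides can be sketched in a few lines of Python. This is only an illustration: the `Request` fields and the helper names (`evaluate_controls`, `score`, `is_alert`) are assumptions made for the sketch, and treating "above the threshold" as a strict `>` comparison is also an assumption. The weights (30, 60, 10) and the 0.5 threshold are the ones used in the example.

```python
from dataclasses import dataclass

# Weights and threshold taken from the worked example (weights sum to 100).
WEIGHTS = {
    "legal_rep_over_70": 30,
    "approved_below_20pct": 60,
    "refund_below_50pct": 10,
}
ALERT_THRESHOLD = 0.5


@dataclass
class Request:
    # Hypothetical field names, chosen for this sketch only.
    request_id: int
    legal_rep_age: int
    approved_pct_of_requested: float
    refund_pct_of_authorized: float


def evaluate_controls(r: Request) -> dict:
    """Return 1 (true) or 0 (false) for each control."""
    return {
        "legal_rep_over_70": int(r.legal_rep_age > 70),
        "approved_below_20pct": int(r.approved_pct_of_requested < 20),
        "refund_below_50pct": int(r.refund_pct_of_authorized < 50),
    }


def score(r: Request) -> float:
    """Weighted sum of the triggered controls, divided by the total weight."""
    flags = evaluate_controls(r)
    total = sum(flags[name] * weight for name, weight in WEIGHTS.items())
    return total / sum(WEIGHTS.values())


def is_alert(r: Request) -> bool:
    # Assumption: an alert is raised when the score is strictly above the threshold.
    return score(r) > ALERT_THRESHOLD


example = Request(request_id=16, legal_rep_age=71,
                  approved_pct_of_requested=16, refund_pct_of_authorized=55)
print(evaluate_controls(example), score(example), is_alert(example))
```

Keeping the weights in one dictionary makes it easy to tune them later without touching the control logic.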

Excel example: data

Excel example: formulas

Excel example: conditional formatting

Excel example: output and alarm overall

ALERTING

Advanced Graph and Chart

False Positive – The importance of analysis





[Figure: all requests (ID Request 1, 2, 3, ...) divided into Allowed and Fraud]

False Positive – Misuse detection





Define what counts as unacceptable behaviour, then flag the requests that match it (misuse).

[Figure: ID Request 3, Alert, Fraud]

False Positive – Anomaly detection







Define what counts as acceptable behaviour; whatever falls outside it is, by exclusion, flagged.

[Figure: ID Request 1, ID Request 2, Alert, Allowed]

False Positive and False Negative

[Figure: two panels, Anomaly detection and Misuse detection, each showing the Allowed and Fraud regions with the false positive and false negative areas]

False positive: acceptable behaviour raises an alarm
False negative: unacceptable behaviour does not raise an alarm
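To make the two definitions concrete, the following sketch counts false positives and false negatives for a set of requests. The request IDs, the alert flags and the fraud labels are invented for illustration, and `confusion_counts` is a hypothetical helper, not part of the course material.

```python
def confusion_counts(alerts, frauds):
    """Count outcomes given per-request alert flags and true fraud labels."""
    counts = {"true_positive": 0, "false_positive": 0,
              "false_negative": 0, "true_negative": 0}
    for request_id, alerted in alerts.items():
        is_fraud = frauds[request_id]
        if alerted and is_fraud:
            counts["true_positive"] += 1
        elif alerted and not is_fraud:
            counts["false_positive"] += 1   # acceptable behaviour raised an alarm
        elif not alerted and is_fraud:
            counts["false_negative"] += 1   # unacceptable behaviour raised no alarm
        else:
            counts["true_negative"] += 1
    return counts


# Hypothetical data: request ID -> did the system alert / was it really fraud.
alerts = {1: True, 2: False, 3: False, 4: True}
frauds = {1: False, 2: False, 3: True, 4: True}
result = confusion_counts(alerts, frauds)
print(result)
```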

Control object logic based

[Figure: linked control objects: Legal representative, Employee, Legal entity, Request, Supplier]

Control object logic based: example

Legal representative: are there legal representatives who submitted more than one request?

Supplier: is there a supplier that appears in only one request and has never been seen in any other request?
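Both questions amount to counting how often an object appears across requests. A minimal sketch, assuming each request has exactly one legal representative and one supplier (a simplification), with invented identifiers:

```python
from collections import Counter

# Hypothetical records: (request_id, legal_representative, supplier)
requests = [
    (1, "REP_A", "SUP_X"),
    (2, "REP_A", "SUP_Y"),
    (3, "REP_B", "SUP_X"),
    (4, "REP_C", "SUP_Z"),
]

# Count how many requests each representative and each supplier appears in.
rep_counts = Counter(rep for _, rep, _ in requests)
supplier_counts = Counter(sup for _, _, sup in requests)

# Legal representatives who submitted more than one request.
reps_with_multiple_requests = sorted(r for r, n in rep_counts.items() if n > 1)

# Suppliers that appear in exactly one request.
suppliers_seen_once = sorted(s for s, n in supplier_counts.items() if n == 1)

print(reps_with_multiple_requests, suppliers_seen_once)
```

Neither result is proof of fraud on its own; each is a candidate control to feed into the scoring described earlier.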

Spreadsheet vs Relational Database

Spreadsheet

The spreadsheet, like Microsoft Excel, is a simple tool that can be used to collect, order and analyze various data. It is suited to simple analysis, and the results can be represented with basic methods.

vs

Relational Database

A database is a storage space for content and information (data). Any collection of homogeneous data stored in a structured form can be used to perform complex analysis with multiple variables and conditions. Databases are based on relationships and link connections.

Sample data (columns in the source: Id Misura Pratica, Id Domanda, Des Stato Domanda, Ente Delegato, Cuaa, Ragione Sociale):

Measure ID | Request ID | Request status | Delegated body | CUAA | Company name
12100AZ | 2774979 | Not fundable | SPORTELLO UNICO DI ROVIGO | DSRFPP77M03H620A | AZIENDA AGRICOLA VIVAI
12100AZ | 2779497 | Admissible | SPORTELLO UNICO DI VENEZIA | CLLMLE57R09H823U | AZIENDA AGRICOLA CELLA
12100AZ | 2780086 | Admissible | SPORTELLO UNICO DI TREVISO | 04172990261 | SOCIETA' AGRICOLA GIUSTI
12100AZ | 2780637 | Admissible | SPORTELLO UNICO DI PADOVA | TMBNZR64T31E682R | TAMBARA NAZZARENO
12100AZ | 2781800 | Admissible | SPORTELLO UNICO DI VERONA | FRGMSM70C25H783B | FRIGOTTO MASSIMO
12100AZ | 2781803 | Fundable | SPORTELLO UNICO DI PADOVA | BLLNTN80M18L364P | BELLOMI ANTONIO
12100AZ | 2781884 | Admissible | SPORTELLO UNICO DI TREVISO | 04030120267 | IRIS VIGNETI SOC.SEMP
12100AZ | 2782362 | Fundable | SPORTELLO UNICO DI TREVISO | BNTLDA58A23I124R | BONOTTO ALDO

Relational Databases - Introduction

Relational databases are a category of databases (currently the most common) in which data is stored in tables, originally called relations.

A table is a collection of related data entries and it consists of columns and rows.



A database contains one or more tables, and each table contains rows (records) of data

ID | FIRST_NAME | LAST_NAME | CITY | COUNTRY | Height
1 | Albert | Lucas | London | England | 184
2 | Beatrice | Monroe | New York | USA | 171
3 | Charles | Jones | New York | USA | 176
4 | Diane | Mc Gregor | New York | USA | 165

For instance, the table above contains 4 records, each with 6 fields. To interact with a relational database system, that is, to insert and retrieve data, we have to use a specific language. For this purpose, most databases use SQL (Structured Query Language). SQL provides the syntax to create, retrieve, update or delete information.

Why databases are useful

• Databases are specifically designed for dealing with large amounts of data
• They use languages specific to data manipulation
• They provide features for mission-critical aspects such as security, efficiency, reliability, fault tolerance, data consistency, backup, etc.

Most business applications use a database system, and the use of databases is key to data analytics. Moreover, as the amount of information grows, using spreadsheets becomes particularly difficult, or even unfeasible.

Relational Databases - Querying

Given the table above, to get the people living in New York who are taller than 170 cm we can write a SQL statement like this:

    Select * From PEOPLE Where CITY = 'New York' and Height > 170

This translates to «get all the rows from table PEOPLE having 'New York' as value for the field CITY and a value greater than 170 for the field Height», and returns this result:

ID | FIRST_NAME | LAST_NAME | CITY | COUNTRY | Height
2 | Beatrice | Monroe | New York | USA | 171
3 | Charles | Jones | New York | USA | 176

Each condition uses a comparison operator, such as “equal” (=) or “greater than” (>) in this example. We can use as many conditions as we want inside the where clause, linking them together with the logical operators AND and OR.
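The query can be tried without installing a database server. The sketch below uses Python's built-in `sqlite3` module with an in-memory database; the choice of SQLite and the use of `?` placeholders for the filter values are assumptions of this example, not something the slides prescribe.

```python
import sqlite3

# In-memory database holding the PEOPLE table from the slide.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE PEOPLE (ID INTEGER PRIMARY KEY, FIRST_NAME TEXT, "
    "LAST_NAME TEXT, CITY TEXT, COUNTRY TEXT, HEIGHT INTEGER)"
)
conn.executemany(
    "INSERT INTO PEOPLE VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "Albert", "Lucas", "London", "England", 184),
        (2, "Beatrice", "Monroe", "New York", "USA", 171),
        (3, "Charles", "Jones", "New York", "USA", 176),
        (4, "Diane", "Mc Gregor", "New York", "USA", 165),
    ],
)

# Two conditions combined with AND; values are passed as parameters.
rows = conn.execute(
    "SELECT * FROM PEOPLE WHERE CITY = ? AND HEIGHT > ? ORDER BY ID",
    ("New York", 170),
).fetchall()

for row in rows:
    print(row)
```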

Relational Databases – Joining tables

One of the key principles of a database is to have multiple tables, in order to organize the information efficiently and effectively. For example, we might have a second table, PHONE_BOOK, containing the phone numbers:

PHONE_ID | PERSON_ID | PHONE_TYPE | PHONE_NUMBER
1 | 1 | Mobile_business | 122548215
2 | 1 | Mobile_personal | 445431287
3 | 2 | Home_personal | 245482136
4 | 4 | Mobile_business | 545825315
5 | 4 | Home_personal | 785453568

To associate the owner’s name with each phone number we can write a SQL statement containing a JOIN instruction:

    Select FIRST_NAME, LAST_NAME, PHONE_TYPE, PHONE_NUMBER From PEOPLE as P join PHONE_BOOK as B ON P.ID = B.PERSON_ID

FIRST_NAME | LAST_NAME | PHONE_TYPE | PHONE_NUMBER
Albert | Lucas | Mobile_business | 122548215
Albert | Lucas | Mobile_personal | 445431287
Beatrice | Monroe | Home_personal | 245482136
Diane | Mc Gregor | Mobile_business | 545825315
Diane | Mc Gregor | Home_personal | 785453568
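A runnable version of the join, again as a sketch using Python's built-in `sqlite3` module. Only the PEOPLE columns needed for the join are created here, phone numbers are stored as text, and the `ORDER BY` is added just to make the output order predictable; all three are choices of this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PEOPLE (ID INTEGER PRIMARY KEY, FIRST_NAME TEXT, LAST_NAME TEXT)")
conn.execute(
    "CREATE TABLE PHONE_BOOK (PHONE_ID INTEGER PRIMARY KEY, PERSON_ID INTEGER, "
    "PHONE_TYPE TEXT, PHONE_NUMBER TEXT)"
)
conn.executemany("INSERT INTO PEOPLE VALUES (?, ?, ?)", [
    (1, "Albert", "Lucas"), (2, "Beatrice", "Monroe"),
    (3, "Charles", "Jones"), (4, "Diane", "Mc Gregor"),
])
conn.executemany("INSERT INTO PHONE_BOOK VALUES (?, ?, ?, ?)", [
    (1, 1, "Mobile_business", "122548215"),
    (2, 1, "Mobile_personal", "445431287"),
    (3, 2, "Home_personal", "245482136"),
    (4, 4, "Mobile_business", "545825315"),
    (5, 4, "Home_personal", "785453568"),
])

# PEOPLE.ID is matched against PHONE_BOOK.PERSON_ID.
joined = conn.execute(
    "SELECT FIRST_NAME, LAST_NAME, PHONE_TYPE, PHONE_NUMBER "
    "FROM PEOPLE AS P JOIN PHONE_BOOK AS B ON P.ID = B.PERSON_ID "
    "ORDER BY B.PHONE_ID"
).fetchall()

for row in joined:
    print(row)
```

Charles has no entry in PHONE_BOOK, so he does not appear in the result of this (inner) join.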

Relational Databases – Types of join

There is more than one way to join two tables:

• Inner join: rows that match in both tables
• Left join: all rows from the left table, and the matched rows from the right table
• Right join: all rows from the right table, and the matched rows from the left table
• Full outer join: all the rows from both tables (combining right and left join)
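The four variants can be compared on a tiny data set. This sketch uses Python's built-in `sqlite3`; because older SQLite versions lack RIGHT and FULL OUTER JOIN, the right join is emulated by swapping the tables in a LEFT JOIN and the full outer join by a UNION of the two, which is an assumption of this example (a UNION also removes duplicate rows, unlike a true full outer join). The phone entry with PERSON_ID 9 is invented to show an unmatched right-hand row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PEOPLE (ID INTEGER, FIRST_NAME TEXT);
CREATE TABLE PHONE_BOOK (PERSON_ID INTEGER, PHONE_NUMBER TEXT);
INSERT INTO PEOPLE VALUES (1, 'Albert');
INSERT INTO PEOPLE VALUES (3, 'Charles');
INSERT INTO PHONE_BOOK VALUES (1, '122548215');
INSERT INTO PHONE_BOOK VALUES (9, '000000000');  -- hypothetical number with no owner
""")

COLUMNS = "SELECT P.FIRST_NAME, B.PHONE_NUMBER "
LEFT = COLUMNS + "FROM PEOPLE P LEFT JOIN PHONE_BOOK B ON P.ID = B.PERSON_ID"
# Right join emulated by swapping the two tables in a LEFT JOIN.
RIGHT = COLUMNS + "FROM PHONE_BOOK B LEFT JOIN PEOPLE P ON P.ID = B.PERSON_ID"

inner = set(conn.execute(
    COLUMNS + "FROM PEOPLE P JOIN PHONE_BOOK B ON P.ID = B.PERSON_ID"))
left = set(conn.execute(LEFT))
right = set(conn.execute(RIGHT))
# Full outer join emulated as the union of the left and right joins.
full = set(conn.execute(LEFT + " UNION " + RIGHT))

print("inner:", inner)
print("left: ", left)
print("right:", right)
print("full: ", full)
```

Missing values on the unmatched side come back as `None` (SQL NULL).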

Relational Databases – Aggregate functions

Aggregation is a common operation when dealing with data. Consider the following PURCHASES table:

PURCHASE_ID | PURCHASE_TYPE | AMOUNT | DATE
1 | cat_1 | 1000 | 2015/02/10
2 | cat_3 | 1500 | 2015/03/16
3 | cat_1 | 700 | 2015/02/22
4 | cat_2 | 1400 | 2015/03/07
5 | cat_3 | 400 | 2015/04/27
6 | cat_3 | 1850 | 2015/05/08
7 | cat_2 | 975 | 2015/05/21

To compute, for example, the total amount, we can use the aggregate function sum():

    Select SUM(AMOUNT) as TOT_AMOUNT From PURCHASES

This translates to «sum all the values of the field AMOUNT», and produces the following result:

TOT_AMOUNT
7825

SQL provides a large number of aggregate functions, like min, max, avg, stdev, etc.:

    Select SUM(AMOUNT) as TOT_AMOUNT, min(AMOUNT) as MIN_AMOUNT, max(AMOUNT) as MAX_AMOUNT From PURCHASES

TOT_AMOUNT | MIN_AMOUNT | MAX_AMOUNT
7825 | 400 | 1850
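The aggregate query can be checked with Python's built-in `sqlite3` module. This is a sketch under the assumption that SQLite is an acceptable stand-in for the database used in the course; the column aliases mirror the result headers shown on the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE PURCHASES (PURCHASE_ID INTEGER, PURCHASE_TYPE TEXT, '
    'AMOUNT INTEGER, "DATE" TEXT)'
)
conn.executemany("INSERT INTO PURCHASES VALUES (?, ?, ?, ?)", [
    (1, "cat_1", 1000, "2015/02/10"),
    (2, "cat_3", 1500, "2015/03/16"),
    (3, "cat_1", 700, "2015/02/22"),
    (4, "cat_2", 1400, "2015/03/07"),
    (5, "cat_3", 400, "2015/04/27"),
    (6, "cat_3", 1850, "2015/05/08"),
    (7, "cat_2", 975, "2015/05/21"),
])

# One row comes back, with one value per aggregate function.
tot_amount, min_amount, max_amount = conn.execute(
    "SELECT SUM(AMOUNT) AS TOT_AMOUNT, MIN(AMOUNT) AS MIN_AMOUNT, "
    "MAX(AMOUNT) AS MAX_AMOUNT FROM PURCHASES"
).fetchone()

print(tot_amount, min_amount, max_amount)
```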

Relational Databases – Grouping

SQL allows the group by clause to be used in conjunction with aggregate functions in order to group the result set by one or more columns.

Using the same PURCHASES data, we might want to obtain the total for each PURCHASE_TYPE separately:

    Select PURCHASE_TYPE, SUM(AMOUNT) as TOT_AMOUNT From PURCHASES GROUP BY PURCHASE_TYPE

This translates to «group the data by the field PURCHASE_TYPE, then for each group compute the sum of the field AMOUNT». This produces:

PURCHASE_TYPE | TOT_AMOUNT
cat_1 | 1700
cat_2 | 2375
cat_3 | 3750

When using a group by statement, the fields extracted in the select statement can only be aggregate functions or the fields used for grouping.
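To show what grouping does under the hood, the sketch below runs the GROUP BY query with Python's built-in `sqlite3` and then repeats the same computation by hand with a dictionary. The hand-written loop is only an illustration of the idea, not how a database engine actually implements it.

```python
import sqlite3
from collections import defaultdict

purchases = [
    (1, "cat_1", 1000, "2015/02/10"),
    (2, "cat_3", 1500, "2015/03/16"),
    (3, "cat_1", 700, "2015/02/22"),
    (4, "cat_2", 1400, "2015/03/07"),
    (5, "cat_3", 400, "2015/04/27"),
    (6, "cat_3", 1850, "2015/05/08"),
    (7, "cat_2", 975, "2015/05/21"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE PURCHASES (PURCHASE_ID INTEGER, PURCHASE_TYPE TEXT, '
    'AMOUNT INTEGER, "DATE" TEXT)'
)
conn.executemany("INSERT INTO PURCHASES VALUES (?, ?, ?, ?)", purchases)

# Only the grouping field and an aggregate appear in the select list.
sql_totals = conn.execute(
    "SELECT PURCHASE_TYPE, SUM(AMOUNT) AS TOT_AMOUNT FROM PURCHASES "
    "GROUP BY PURCHASE_TYPE ORDER BY PURCHASE_TYPE"
).fetchall()

# The same grouping written by hand: one running total per purchase type.
manual_totals = defaultdict(int)
for _, purchase_type, amount, _ in purchases:
    manual_totals[purchase_type] += amount

print(sql_totals)
print(dict(manual_totals))
```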

Dashboarding and decision process

Dashboarding Introduction to Tableau

Tableau Software is designed to quickly analyze, visualize and share information. You can work with data, move from simple to complex visualizations and combine them in interactive dashboards.

Dashboarding The Marks Card and Buttons

Tableau applies label, color, shape and size to visualizations using the view cards.

Dashboarding Example dashboard

Structured and Unstructured Data

Data sources in today’s organizations

• Unstructured data (80%): text, graphics, email, presentations and spreadsheets
• Structured data (20%): CRM, databases, transaction systems

eDiscovery

Electronic Discovery: the process of identifying, managing, preserving, processing, analyzing, reviewing, producing and presenting electronically stored information, usually in the context of an investigation or litigation.

[Figure: eDiscovery workflow. Foundations: experience, robust process, highly trained personnel. Processing: validation and cleansing, metadata and text extraction, de-duplication. Filtering by custodians, keywords and date range into Relevant and Not Relevant. Supported by quality control, exception reporting, tracking and reporting, and project management.]
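The processing and filtering steps named in the figure (de-duplication, then filtering by custodian, keyword and date range) can be sketched as follows. The documents, the field names and the two helpers are invented for illustration, and de-duplicating on a hash of the text alone is a simplifying assumption; real eDiscovery tools typically hash the whole file and its metadata.

```python
import hashlib
from datetime import date

# Hypothetical document set.
documents = [
    {"custodian": "alice", "sent": date(2015, 2, 10), "text": "Invoice approved, see attached"},
    {"custodian": "alice", "sent": date(2015, 2, 10), "text": "Invoice approved, see attached"},
    {"custodian": "bob", "sent": date(2015, 3, 1), "text": "Lunch on Friday?"},
    {"custodian": "carol", "sent": date(2014, 12, 1), "text": "Invoice for the old contract"},
]


def deduplicate(docs):
    """Keep the first document for each distinct text hash."""
    seen, unique = set(), []
    for doc in docs:
        digest = hashlib.sha256(doc["text"].encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(doc)
    return unique


def is_relevant(doc, custodians, keywords, start, end):
    """Relevant = right custodian, inside the date range, at least one keyword."""
    text = doc["text"].lower()
    return (doc["custodian"] in custodians
            and start <= doc["sent"] <= end
            and any(k.lower() in text for k in keywords))


unique_docs = deduplicate(documents)
relevant = [d for d in unique_docs
            if is_relevant(d, {"alice", "bob"}, ["invoice"],
                           date(2015, 1, 1), date(2015, 12, 31))]
print(len(documents), len(unique_docs), len(relevant))
```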

Next frontier – Link Analysis

[Figure: the outputs of eDiscovery and Fraud Data Analytics feed into Link Analysis]

Link Analysis

• Analysis of the relationships between the identified findings and reconstruction of dependencies
• Data sources: anomalous transactions, relevant documents
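One simple way to picture link analysis is as a graph in which two findings are connected when they mention the same entity. The sketch below is an illustration under that assumption only; the finding names, the entity codes and the `connected` helper are all invented, and dedicated link-analysis tools do considerably more.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical findings: each one lists the entities it mentions.
findings = {
    "transaction_16": {"REP_A", "SUP_X"},
    "email_204": {"SUP_X", "EMP_7"},
    "transaction_31": {"EMP_7"},
    "document_9": {"SUP_Q"},
}

# Link two findings whenever they share at least one entity.
links = defaultdict(set)
for (a, entities_a), (b, entities_b) in combinations(findings.items(), 2):
    if entities_a & entities_b:
        links[a].add(b)
        links[b].add(a)


def connected(start):
    """Findings reachable from `start` through chains of shared entities."""
    seen, stack = {start}, [start]
    while stack:
        for neighbour in links[stack.pop()]:
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return seen


print(sorted(connected("transaction_16")))
print(sorted(connected("document_9")))
```

Here the anomalous transaction 16 and transaction 31 share no entity directly, but become linked through the email that mentions both the supplier and the employee.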