Linear and Logistic Regression

Linear and Logistic Regression (the SQL way)

What is the purpose of this presentation? › To show whether linear or logistic regression is possible with SQL, and to provide an implementation of each › To weigh the arguments for and against an SQL implementation of these regressions › To discuss when it is worth performing numerical calculations on the database side.

Presentation Structure
1. Linear Regression › What is linear regression? › Use cases › Solving for coefficients in SQL (with demo)

2. Logistic Regression › What is logistic regression? › Use cases › Logistic Regression vs Linear Regression comparison › Gradient Descent › Solving for coefficients in C++ (demo)

3. Discussion: is an SQL implementation of the above worth it or not?

What is Simple Linear Regression ? › Simple linear regression is a statistical method that allows us to summarize and study relationships between two continuous (quantitative) variables: › One variable, denoted x, is regarded as the predictor, explanatory, or independent variable. › The other variable, denoted y, is regarded as the response, outcome, or dependent variable.

Scatter Plots of Data with Various Correlation Coefficients › [Figure: scatter plots; axis label Y]

Why do we need Linear Regression? › What dose is required for a particular treatment? › How many days will a full recovery take? › How many cars will be parked here tomorrow? › How many insurance policies will be sold next month?

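Simple Linear Regression › Given the training set $(x_1, y_1), \dots, (x_n, y_n)$ we try to find the "best" coefficients $\beta_0, \beta_1$ which would give a line $\hat{y} = \beta_0 + \beta_1 x$ close to the data › where $\beta_0$ is the intercept and $\beta_1$ is the slope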

Simple linear regression (SLR) › There are multiple ways in which we could obtain the coefficients: – Ordinary least squares (OLS): conceptually the simplest and computationally straightforward (advantageous for SQL) – Generalized least squares (GLS) – Iteratively reweighted least squares (IRWLS) – Percentage least squares (PLS)

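SLR - OLS › For the line $y = \beta_0 + \beta_1 x$, minimize $S(\beta_0, \beta_1) = \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2$ › by setting $\frac{\partial S}{\partial \beta_0} = 0$ and $\frac{\partial S}{\partial \beta_1} = 0$

SLR - OLS › Solving the two equations gives $\beta_1 = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}$ and $\beta_0 = \bar{y} - \beta_1 \bar{x}$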
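A minimal sketch of the closed-form OLS solution for a line y = b0 + b1 * x, written with plain sums. It is in Python rather than SQL, but only a count and four sums are needed, which is the kind of work SQL aggregates such as COUNT and SUM can do. The function name and the toy data are assumptions made for illustration; they are not taken from the slides.

```python
def fit_simple_ols(xs, ys):
    """Return (intercept, slope) of the least-squares line y = b0 + b1 * x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long samples with at least 2 points")
    # A count and four sums: the same quantities an SQL engine could aggregate.
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    denom = n * sum_xx - sum_x ** 2
    if denom == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    slope = (n * sum_xy - sum_x * sum_y) / denom
    intercept = (sum_y - slope * sum_x) / n
    return intercept, slope


# Hypothetical toy data lying exactly on y = 1 + 2x.
intercept, slope = fit_simple_ols([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
```

The slope expression is the usual covariance-over-variance formula multiplied through by n squared, so no per-row subtraction of the means is needed.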

SLR › With only 2 parameters to estimate, computationally it is not a challenge for any DBMS › But as the number of parameters increases, it will slow down by a noticeable constant factor if we solve everything in our "scalar" fashion. › However, we could use some matrix tricks to solve for any number of parameters.

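Multiple Linear Regression with LS › Model: $y = X\beta + \varepsilon$, with $X$ the matrix of predictors, $y$ the response vector and $\beta$ the coefficient vector

› Minimize $\lVert y - X\beta \rVert^2$, which leads to the normal equations $X^T X \beta = X^T y$, i.e. $\beta = (X^T X)^{-1} X^T y$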

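Solving with QR › No need to compute $(X^T X)^{-1}$; bring the problem to a simpler form: $X = QR$, where $Q$ is orthogonal ($Q^T Q = I$) and $R$ is upper triangular

› Substituting into the normal equations $X^T X \beta = X^T y$ gives $R\beta = Q^T y$ › At this point it is trivial to solve for $\beta$ by back substitution, because $R$ is an upper triangular matrix.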
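The QR route can be sketched in a few lines of dependency-free Python. This is an illustration under stated assumptions, not the routine used by R's "lm": it uses modified Gram-Schmidt for the factorization (production code typically uses Householder reflections), it assumes X has full column rank, and the helper names and toy data are made up for the example.

```python
def qr_gram_schmidt(X):
    """Thin QR of an m x n matrix (list of rows) with full column rank.

    Returns Q as a list of n orthonormal columns and R as an n x n upper triangle.
    """
    m, n = len(X), len(X[0])
    cols = [[X[i][j] for i in range(m)] for j in range(n)]
    Q = []
    R = [[0.0] * n for _ in range(n)]
    for j in range(n):
        v = cols[j][:]
        for k in range(j):
            # Modified Gram-Schmidt: project out each earlier direction in turn.
            R[k][j] = sum(q * vi for q, vi in zip(Q[k], v))
            v = [vi - R[k][j] * q for vi, q in zip(v, Q[k])]
        R[j][j] = sum(vi * vi for vi in v) ** 0.5
        if R[j][j] == 0.0:
            raise ValueError("columns of X are linearly dependent")
        Q.append([vi / R[j][j] for vi in v])
    return Q, R


def solve_least_squares(X, y):
    """Solve min ||y - X beta||^2 through R beta = Q^T y."""
    Q, R = qr_gram_schmidt(X)
    n = len(R)
    rhs = [sum(q * yi for q, yi in zip(Q[j], y)) for j in range(n)]  # Q^T y
    beta = [0.0] * n
    for j in reversed(range(n)):  # back substitution, last row first
        known = sum(R[j][k] * beta[k] for k in range(j + 1, n))
        beta[j] = (rhs[j] - known) / R[j][j]
    return beta


# Hypothetical toy data: y = 1 + 2*x1 + 3*x2 exactly; the first column is the intercept.
X = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [1.0, 1.0, 1.0], [1.0, 2.0, 1.0]]
y = [1.0, 3.0, 4.0, 6.0, 8.0]
beta = solve_least_squares(X, y)
```

Even this small sketch needs nested loops over rows and columns, which hints at why the same computation is awkward to express in set-based SQL.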

› QR factorization simplifies the process but it’s still tedious (this is how the “lm” routine is implemented in R, with C and Fortran calls underneath)

Problems with Multiple Linear Regression? › Operations for linear algebra must be implemented: – Matrix/vector multiplication – Matrix inverse/pseudo-inverse – Matrix factorization like SVD, QR, Cholesky, Gauss-Jordan – Too much number crunching for an engine which has a different purpose; it is far away from FORTRAN

› Even Boost’s C++ library for basic linear algebra (uBLAS) does a poor job in comparison to MATLAB.

What is Logistic Regression? › To predict an outcome variable that is categorical from predictor variables that are continuous and/or categorical › Used because having a categorical outcome variable violates the assumption of linearity in normal regression › The only “real” limitation for logistic regression is that the outcome variable must be discrete › Logistic regression deals with this problem by using a logarithmic transformation on the outcome variable, which allows us to model a nonlinear association in a linear way › It expresses the linear regression equation in logarithmic terms (called the logit)

Logistic Regression Use Cases › Google uses it to classify email as spam or not spam › Is a loan good for you or bad? › Will my political candidate win the election? › Will this user buy a monthly Netflix subscription? › Will the prescribed drugs have the desired effect? › Should this bank transaction be classified as fraud?

Why not Linear Regression instead of Logistic Regression? › Maybe we could solve the classification problem mathematically using only Linear Regression? That would spare us a lot of complexity. › We would like to classify our input into 2 categories: either a 0 or a 1 (e.g. 0 → No, 1 → Yes)

Why not Linear Regression instead of Logistic Regression? › Fit a line $h_\theta(x)$ to the real data (the training set) and threshold it: assume 1 (Yes) if $h_\theta(x) \ge 0.5$ and 0 (No) if $h_\theta(x) < 0.5$

[Figure: Malignant? (1 = Yes, 0 = No, threshold at 0.5) against Tumor Size, with the fitted line]

› It doesn’t look that unreasonable until…

[Figure: the same plot after a new data point is added]

› New data comes to our model and ”destroys” it

[Figure: the same plot with the refitted line]

› Where should the separation line be?

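Why not Linear Regression instead of Logistic Regression? › Currently: $h_\theta(x) = \theta^T x$ can be larger than 1 or smaller than 0 › For classification we need $0 \le h_\theta(x) \le 1$ › Logistic Regression: $0 \le h_\theta(x) \le 1$ › Let’s create a new function $g$ which will satisfy our conditions. › wrap it to $h_\theta(x) = g(\theta^T x)$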

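Logistic Regression › $h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}}$, the sigmoid (logistic) function applied to the linear combination $\theta^T x$; its output always lies between 0 and 1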
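A small sketch of the logistic (sigmoid) function and the resulting hypothesis, assuming the standard form 1 / (1 + e^-z) applied to a linear combination of the inputs. The parameter values below are hypothetical and chosen only so that the three inputs fall below, on and above the 0.5 boundary.

```python
import math


def sigmoid(z):
    """Logistic function 1 / (1 + e^-z), written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_probability(theta, x):
    """Sigmoid of the dot product theta . x; x is expected to start with a constant 1."""
    return sigmoid(sum(t * xi for t, xi in zip(theta, x)))


# Hypothetical parameters and inputs, chosen only for illustration.
theta = [-4.0, 2.0]
p_small = predict_probability(theta, [1.0, 1.0])     # linear part is -2
p_boundary = predict_probability(theta, [1.0, 2.0])  # linear part is 0
p_large = predict_probability(theta, [1.0, 3.0])     # linear part is +2
```

Unlike a fitted straight line, the output can never leave the range 0 to 1, however extreme the input becomes.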

Gradient Descent › A technique to find the minimum of a function › In other words: “for which input parameters of the function will I get a minimum output?” › Not the best algorithm, but computationally and conceptually simple

Gradient Descent Intuition › This technique works well for convex functions. › [Figure: convex function with a single landing point]

Gradient Descent Intuition › This technique is not the best for non-convex functions. › [Figure: non-convex function with several potential landing points]

› It can potentially find the global minimum, but it may also stop in a local one


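Gradient Descent › Given some function $J(\theta_0, \theta_1)$ › We want $\min_{\theta_0, \theta_1} J(\theta_0, \theta_1)$ › High level steps: • Start with some $\theta_0, \theta_1$ • Keep changing $\theta_0, \theta_1$ to reduce $J(\theta_0, \theta_1)$ until hopefully we come to the minimum, that is: we converge.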

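Gradient Descent Algorithm › Repeat until convergence { $\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta)$, simultaneously for every $j$ }, where $\alpha$ is the learning rate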
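The repeat-until-convergence loop can be sketched as follows. This is a minimal illustration, assuming a fixed learning rate and a simple stopping rule (the largest parameter change falls below a tolerance); the function names, default values and the convex test function are assumptions made for the example.

```python
def gradient_descent(grad, theta, alpha=0.1, tol=1e-10, max_iters=10_000):
    """Repeat theta := theta - alpha * grad(theta) until the step is tiny."""
    for _ in range(max_iters):
        g = grad(theta)
        # Build the whole new vector first, so every component is updated simultaneously.
        new_theta = [t - alpha * gi for t, gi in zip(theta, g)]
        if max(abs(a - b) for a, b in zip(new_theta, theta)) < tol:
            return new_theta
        theta = new_theta
    return theta


# Hypothetical convex example: J(t0, t1) = (t0 - 3)^2 + (t1 + 1)^2, minimum at (3, -1).
def grad_J(theta):
    return [2.0 * (theta[0] - 3.0), 2.0 * (theta[1] + 1.0)]


theta_min = gradient_descent(grad_J, [0.0, 0.0])
```

Because the test function is convex, the starting point does not matter here; any start ends up near the single minimum.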

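Gradient Descent Intuition › Positive Tangent Case

repeat until convergence { $\theta := \theta - \alpha \frac{d}{d\theta} J(\theta)$ } › Here $\frac{d}{d\theta} J(\theta) \ge 0$, so $\theta$ decreases and moves towards the minimum. › The step adjusts accordingly: it shrinks as the slope flattens, even with a fixed learning rate $\alpha$.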

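Gradient Descent Intuition › Negative Tangent Case

repeat until convergence { $\theta := \theta - \alpha \frac{d}{d\theta} J(\theta)$ } › Here $\frac{d}{d\theta} J(\theta) \le 0$, so $\theta$ increases and moves towards the minimum. › The step adjusts accordingly: it shrinks as the slope flattens, even with a fixed learning rate $\alpha$.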

Some Ugly Cases for Gradient Descent › It could converge slowly, taking micro steps (almost flat surfaces)

› It may not converge at all (large $\alpha$, may overshoot)

[Figure: descent path from the starting point]
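Both ugly cases show up even on the simplest convex function. The sketch below assumes J(theta) = theta^2, whose derivative is 2 * theta, and runs the same update with three hypothetical learning rates.

```python
def run_steps(alpha, theta=1.0, steps=50):
    """Apply theta := theta - alpha * J'(theta) for J(theta) = theta^2."""
    for _ in range(steps):
        theta = theta - alpha * 2.0 * theta
    return theta


tiny = run_steps(alpha=0.001)     # micro steps: barely moves in 50 iterations
good = run_steps(alpha=0.1)       # shrinks steadily towards the minimum at 0
too_large = run_steps(alpha=1.1)  # overshoots by more on every step and diverges
```

Each update multiplies theta by (1 - 2 * alpha), so the iteration converges only while that factor has absolute value below 1, that is for alpha between 0 and 1 on this particular function.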

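Gradient Descent notation › $\theta := \theta - \alpha \nabla J(\theta)$, where $\nabla J(\theta) = \left( \frac{\partial J}{\partial \theta_0}, \dots, \frac{\partial J}{\partial \theta_n} \right)$ › Can be generalized for any dimension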

Rosenbrock function › A special non-convex function used as a performance test problem for optimization algorithms. › Also known as Rosenbrock’s valley or Rosenbrock’s banana function. › $f(x, y) = (a - x)^2 + b\,(y - x^2)^2$

Global min at $(x, y) = (a, a^2)$, where $f = 0$ (commonly $a = 1$, $b = 100$)
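A sketch of plain gradient descent on the Rosenbrock function, assuming the common parameter choice a = 1, b = 100. The starting point, learning rate and step count are hypothetical; the learning rate is deliberately small because the curved valley makes larger steps unstable.

```python
def rosenbrock(x, y, a=1.0, b=100.0):
    """f(x, y) = (a - x)^2 + b * (y - x^2)^2, with its minimum at (a, a^2)."""
    return (a - x) ** 2 + b * (y - x * x) ** 2


def rosenbrock_grad(x, y, a=1.0, b=100.0):
    """Partial derivatives of the Rosenbrock function with respect to x and y."""
    dx = -2.0 * (a - x) - 4.0 * b * x * (y - x * x)
    dy = 2.0 * b * (y - x * x)
    return dx, dy


def descend(x, y, alpha=0.0005, steps=20_000):
    """Fixed-step gradient descent; both coordinates are updated together."""
    for _ in range(steps):
        dx, dy = rosenbrock_grad(x, y)
        x, y = x - alpha * dx, y - alpha * dy
    return x, y


start = (-1.0, 1.0)
end = descend(*start)
```

Comparing rosenbrock(*start) with rosenbrock(*end) shows how far the descent got; progress along the flat valley floor is typically slow, which is what makes this function a useful stress test.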

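Logistic Regression with Gradient Descent › Cost: $J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta(x^{(i)}) + (1 - y^{(i)}) \log\left(1 - h_\theta(x^{(i)})\right) \right]$, with $h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}}$

Repeat until convergence { $\theta_j := \theta_j - \frac{\alpha}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$, simultaneously for every $j$ }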
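Putting the pieces together, here is a minimal sketch of logistic regression fitted by batch gradient descent. It is written in Python rather than the C++ of the original demo, and it assumes the standard mean cross-entropy cost, a fixed learning rate and a fixed iteration count instead of a convergence test. The tumor-size data is made up for illustration and does not come from the slides.

```python
import math


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(X, y, alpha=0.1, iters=5000):
    """Batch gradient descent on the mean cross-entropy cost."""
    m, n = len(X), len(X[0])
    theta = [0.0] * n
    for _ in range(iters):
        # Prediction error h(x) - y for every training row.
        errors = [sigmoid(sum(t * xi for t, xi in zip(theta, row))) - yi
                  for row, yi in zip(X, y)]
        grad = [sum(e * row[j] for e, row in zip(errors, X)) / m for j in range(n)]
        theta = [t - alpha * g for t, g in zip(theta, grad)]  # simultaneous update
    return theta


# Made-up toy data: each row is [1, tumor size]; label 1 means malignant.
X = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0], [1.0, 5.0], [1.0, 6.0]]
y = [0, 0, 1, 0, 1, 1]
theta = fit_logistic(X, y)


def predict(size):
    """Estimated probability that a tumor of the given size is malignant."""
    return sigmoid(theta[0] + theta[1] * size)
```

The labels overlap on purpose: with perfectly separable data the coefficients would keep growing and the loop would never settle.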

Is pure SQL Regression worth it? MAYBE


› Simple linear regression

› Multiple linear regression › Multiple logistic regression › “Numerical” Algorithms

Stack Overflow friendliness