Screen credit scorecard predictors for predictive value
metric_table = screenpredictors(data) returns the output variable, metric_table, a MATLAB® table containing the calculated values for several measures of predictive power for each predictor variable in the data. Use the screenpredictors function as a preprocessing step in the Credit Scorecard Modeling Workflow (Financial Toolbox) to reduce the number of predictor variables before you create the credit scorecard using the creditscorecard function from Financial Toolbox™.
metric_table = screenpredictors(___,Name,Value) specifies options using one or more name-value pair arguments in addition to the input arguments in the previous syntax.
creditscorecard Object
Reduce the number of predictor variables by screening predictors before you create a credit scorecard.
Use the CreditCardData.mat file to load the data (using a dataset from Refaat 2011).
load CreditCardData
Define 'IDVar' and 'ResponseVar'.
idvar = 'CustID';
responsevar = 'status';
Use screenpredictors to calculate the predictor screening metrics. The function returns a table containing the metric values. Each table row corresponds to a predictor from the input table data.
metric_table = screenpredictors(data,'IDVar', idvar,'ResponseVar', responsevar)
metric_table=9×7 table
               InfoValue    AccuracyRatio    AUROC      Entropy    Gini       Chi2PValue    PercentMissing
               _________    _____________    _______    _______    _______    __________    ______________
CustAge        0.18863      0.17095          0.58547    0.88729    0.42626    0.00074524    0
TmWBank        0.15719      0.13612          0.56806    0.89167    0.42864    0.0054591     0
CustIncome     0.15572      0.17758          0.58879    0.891      0.42731    0.0018428     0
TmAtAddress    0.094574     0.010421         0.50521    0.90089    0.43377    0.182         0
UtilRate       0.075086     0.035914         0.51796    0.90405    0.43575    0.45546       0
AMBalance      0.07159      0.087142         0.54357    0.90446    0.43592    0.48528       0
EmpStatus      0.048038     0.10886          0.55443    0.90814    0.4381     0.00037823    0
OtherCC        0.014301     0.044459         0.52223    0.91347    0.44132    0.047616      0
ResStatus      0.0097738    0.05039          0.5252     0.91422    0.44182    0.27875       0
Sort the metric table by the AccuracyRatio column in descending order.
metric_table = sortrows(metric_table,'AccuracyRatio','descend')
metric_table=9×7 table
               InfoValue    AccuracyRatio    AUROC      Entropy    Gini       Chi2PValue    PercentMissing
               _________    _____________    _______    _______    _______    __________    ______________
CustIncome     0.15572      0.17758          0.58879    0.891      0.42731    0.0018428     0
CustAge        0.18863      0.17095          0.58547    0.88729    0.42626    0.00074524    0
TmWBank        0.15719      0.13612          0.56806    0.89167    0.42864    0.0054591     0
EmpStatus      0.048038     0.10886          0.55443    0.90814    0.4381     0.00037823    0
AMBalance      0.07159      0.087142         0.54357    0.90446    0.43592    0.48528       0
ResStatus      0.0097738    0.05039          0.5252     0.91422    0.44182    0.27875       0
OtherCC        0.014301     0.044459         0.52223    0.91347    0.44132    0.047616      0
UtilRate       0.075086     0.035914         0.51796    0.90405    0.43575    0.45546       0
TmAtAddress    0.094574     0.010421         0.50521    0.90089    0.43377    0.182         0
Based on the AccuracyRatio metric, select the top predictors to use when you create the creditscorecard object.
varlist = metric_table.Row(metric_table.AccuracyRatio > 0.09)
varlist = 4x1 cell array
    {'CustIncome'}
    {'CustAge'   }
    {'TmWBank'   }
    {'EmpStatus' }
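The same logical-indexing pattern extends to more than one metric. The following is a minimal sketch rather than part of the original example: it rebuilds two columns of the metric table by hand from the values printed above so that it runs without calling screenpredictors, and the cutoffs (0.09 for AccuracyRatio, 0.05 for Chi2PValue) are illustrative assumptions, not recommended thresholds. The names mt, keep, selected, and varlist2 are hypothetical.

```matlab
% Rebuild part of the metric table from the values printed above
% (hand-copied so that the sketch runs without calling screenpredictors).
names = {'CustAge';'TmWBank';'CustIncome';'TmAtAddress';'UtilRate'; ...
         'AMBalance';'EmpStatus';'OtherCC';'ResStatus'};
AccuracyRatio = [0.17095; 0.13612; 0.17758; 0.010421; 0.035914; ...
                 0.087142; 0.10886; 0.044459; 0.05039];
Chi2PValue = [0.00074524; 0.0054591; 0.0018428; 0.182; 0.45546; ...
              0.48528; 0.00037823; 0.047616; 0.27875];
mt = table(AccuracyRatio, Chi2PValue, 'RowNames', names);

% Keep predictors that pass both (assumed) cutoffs, ranked by AccuracyRatio.
keep = mt.AccuracyRatio > 0.09 & mt.Chi2PValue < 0.05;
selected = sortrows(mt(keep,:), 'AccuracyRatio', 'descend');
varlist2 = selected.Row;
```

With these particular cutoffs the selection matches varlist, because the four predictors with the highest accuracy ratios also have chi-square p-values below 0.05.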
Use creditscorecard to create a creditscorecard object based on only the "screened" predictors.
sc = creditscorecard(data,'IDVar', idvar,'ResponseVar', responsevar, 'PredictorVars', varlist)
sc =
  creditscorecard with properties:
                GoodLabel: 0
              ResponseVar: 'status'
               WeightsVar: ''
                 VarNames: {1x11 cell}
        NumericPredictors: {'CustAge' 'CustIncome' 'TmWBank'}
    CategoricalPredictors: {'EmpStatus'}
           BinMissingData: 0
                    IDVar: 'CustID'
            PredictorVars: {'CustAge' 'EmpStatus' 'CustIncome' 'TmWBank'}
                     Data: [1200x11 table]
data — Data for creditscorecard object
Data for the creditscorecard object, specified as a MATLAB table, where each column of data can be any one of the following data types:
Numeric
Logical
Cell array of character vectors
Character array
Categorical
String
Data Types: table
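As an illustration of the column types listed above, the following sketch builds a small table with one column of each type. Every variable name and value here is hypothetical and chosen only for the example; it is not taken from CreditCardData.

```matlab
% Hypothetical four-row data set; every variable name here is illustrative.
CustID  = (1:4)';                                  % numeric
HasCard = [true; false; true; true];               % logical
ResType = {'Owner'; 'Tenant'; 'Owner'; 'Other'};   % cell array of character vectors
Region  = ['NE'; 'SW'; 'NE'; 'MW'];                % character array, one row per observation
EmpType = categorical({'Employed'; 'Unknown'; 'Employed'; 'Employed'});  % categorical
Channel = ["web"; "branch"; "web"; "phone"];       % string
status  = [0; 1; 0; 1];                            % binary response stored as numeric

data = table(CustID, HasCard, ResType, Region, EmpType, Channel, status);

% List the class of each column to confirm the types.
colClasses = varfun(@class, data, 'OutputFormat', 'cell');
```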
Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.
Example: metric_table = screenpredictors(data,'IDVar','CustID','ResponseVar','status','PredictorVars',{'CustAge','CustIncome'})
'IDVar' — Name of identifier variable
'' (default) | character vector
Name of identifier variable, specified as the comma-separated pair consisting of 'IDVar' and a case-sensitive character vector. The 'IDVar' data can be ordinal numbers or Social Security numbers. By specifying 'IDVar', you can easily omit the identifier variable from the predictor variables.
Data Types: char
'ResponseVar' — Response variable name for “Good” or “Bad” indicator
last column of the data input (default) | character vector
Response variable name for the “Good” or “Bad” indicator, specified as the comma-separated pair consisting of 'ResponseVar' and a case-sensitive character vector. The response variable data must be binary.
If not specified, 'ResponseVar' is set to the last column of the input data by default.
Data Types: char
'PredictorVars' — Names of predictor variables
VarNames excluding {IDVar,ResponseVar} (default) | cell array of character vectors | string array
Names of predictor variables, specified as the comma-separated pair consisting of 'PredictorVars' and a case-sensitive cell array of character vectors or string array. By default, when you create a creditscorecard object, all variables are predictors except for IDVar and ResponseVar. Any name you specify using 'PredictorVars' must differ from the IDVar and ResponseVar names.
Data Types: cell | string
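The defaults described for 'ResponseVar' and 'PredictorVars' can be pictured with plain table operations. This is a sketch of the documented behavior under stated assumptions, not the function's internal code: the table contents are hypothetical, and idVar, responseVar, and predictorVars are illustrative names.

```matlab
% Hypothetical table; the variable names mirror the example data set.
data = table((1:3)', [35; 52; 41], [40000; 62000; 51000], [0; 1; 0], ...
    'VariableNames', {'CustID','CustAge','CustIncome','status'});

idVar = 'CustID';
varNames = data.Properties.VariableNames;

% Default response variable: the last column of the input table.
responseVar = varNames{end};

% Default predictors: every variable except the identifier and the response.
% 'stable' preserves the column order of the table.
predictorVars = setdiff(varNames, {idVar, responseVar}, 'stable');
```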
'WeightsVar' — Name of weights variable
'' (default) | character vector
Name of weights variable, specified as the comma-separated pair consisting of 'WeightsVar' and a case-sensitive character vector to indicate which column name in the data table contains the row weights.
If you do not specify 'WeightsVar' when you create a creditscorecard object, then the function uses unit weights as the observation weights.
Data Types: char
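To see what row weights change, the sketch below tallies goods and bads per bin twice: once with assumed weights and once with unit weights. The data, the bin assignments, and all variable names are hypothetical, and the code only illustrates the idea of weighted frequency counts; it does not reproduce the toolbox implementation.

```matlab
% Hypothetical observations: bin index, response (1 = bad, 0 = good), weight.
binIdx  = [1; 1; 2; 2; 2; 3];
isBad   = [0; 1; 0; 0; 1; 0];
weights = [2; 1; 1; 3; 1; 2];      % assumed row weights
numBins = max(binIdx);

% Weighted counts of goods and bads per bin.
goodW = accumarray(binIdx, weights .* (isBad == 0), [numBins 1]);
badW  = accumarray(binIdx, weights .* (isBad == 1), [numBins 1]);

% Unit weights reduce to plain observation counts.
unitW = ones(size(binIdx));
goodN = accumarray(binIdx, unitW .* (isBad == 0), [numBins 1]);
badN  = accumarray(binIdx, unitW .* (isBad == 1), [numBins 1]);
```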
'NumBins' — Number of (equal frequency) bins for numeric predictors
20 (default) | scalar numeric
Number of (equal frequency) bins for numeric predictors, specified as the comma-separated pair consisting of 'NumBins' and a scalar numeric.
Data Types: double
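Equal-frequency binning places roughly the same number of observations in each bin. The following is a simplified sketch of that idea using ranks; it assumes distinct values and a sample size divisible by the number of bins, and it is not the algorithm the toolbox uses, which must also handle ties and missing values. The data are hypothetical.

```matlab
% Deterministic sample of 100 distinct values (hypothetical data).
x = mod((1:100)' * 37, 101);
numBins = 4;
n = numel(x);

% Rank each observation, then map ranks to bins of equal size.
[~, order] = sort(x);
ranks = zeros(n, 1);
ranks(order) = (1:n)';
binIdx = ceil(ranks * numBins / n);

% Count the observations that fall in each bin.
counts = accumarray(binIdx, 1, [numBins 1]);
```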
'FrequencyShift' — Indicates small shift in frequency tables that contain zero entries
0.5 (default) | scalar numeric between 0 and 1
Small shift in frequency tables that contain zero entries, specified as the comma-separated pair consisting of 'FrequencyShift' and a scalar numeric with a value between 0 and 1.
If the frequency table of a predictor contains any "pure" bins (containing all goods or all bads) after you bin the data using autobinning, then the function adds the 'FrequencyShift' value to all bins in the table. To avoid any perturbation, set 'FrequencyShift' to 0.
Data Types: double
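The reason for the shift is easiest to see with a pure bin: a zero count makes the logarithm in the information value infinite. The sketch below uses a hypothetical frequency table and the usual textbook formula for information value, which is an assumption about the formula and not a statement of the function's exact computation. The names freq, infoValue, and shifted are illustrative.

```matlab
% Hypothetical frequency table: one row per bin, columns are [Good Bad].
% The third bin is "pure" (no bads).
freq = [50 10; 30 5; 20 0];

% Information value from a frequency table, using the usual textbook
% definition (an assumption about the formula).
infoValue = @(f) sum((f(:,1)/sum(f(:,1)) - f(:,2)/sum(f(:,2))) .* ...
    log((f(:,1)/sum(f(:,1))) ./ (f(:,2)/sum(f(:,2)))));

ivRaw = infoValue(freq);              % infinite because of the zero entry

frequencyShift = 0.5;
if any(freq(:) == 0)
    shifted = freq + frequencyShift;  % shift every cell, as described above
else
    shifted = freq;
end
ivShifted = infoValue(shifted);       % finite
```

Setting frequencyShift to 0 in this sketch leaves the table unchanged, so the information value stays infinite, which is the trade-off of avoiding the perturbation.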
metric_table — Calculated values for predictor screening metrics
Calculated values for the predictor screening metrics, returned as a table. Each table row corresponds to a predictor from the input table data. The table columns contain calculated values for the following metrics:
'InfoValue' — Information value. This metric measures the strength of a predictor in the fitting model by determining the deviation between the distributions of "Goods" and "Bads".
'AccuracyRatio' — Accuracy ratio.
'AUROC' — Area under the ROC curve.
'Entropy' — Entropy. This metric measures the level of unpredictability in the bins. You can use the entropy metric to validate a risk model.
'Gini' — Gini. This metric measures the statistical dispersion or inequality within a sample of data.
'Chi2PValue' — Chi-square p-value. This metric is computed from the chi-square metric and is a measure of the statistical difference and independence between groups.
'PercentMissing' — Percentage of missing values in the predictor. This metric is expressed as a decimal fraction rather than as a percentage.
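For intuition about the metrics listed above, the following sketch computes information value, entropy, and Gini from a small binned frequency table using standard textbook definitions, with entropy and Gini averaged over bins by bin size. These definitions are assumptions: the documentation does not state the exact formulas that screenpredictors uses. The frequency table and all variable names are hypothetical, and the sketch assumes there are no pure bins. The last two lines show the relationship AccuracyRatio = 2*AUROC - 1, which agrees with the example output above to rounding.

```matlab
% Hypothetical frequency table: one row per bin, columns are [Good Bad].
freq  = [40 60; 60 40];
goods = freq(:,1);
bads  = freq(:,2);
binN  = goods + bads;            % observations per bin
w     = binN / sum(binN);        % share of observations in each bin
pBad  = bads ./ binN;            % bad rate per bin
pGood = 1 - pBad;

% Information value: divergence between the good and bad distributions.
distGood = goods / sum(goods);
distBad  = bads / sum(bads);
iv = sum((distGood - distBad) .* log(distGood ./ distBad));

% Entropy (base 2) and Gini impurity, averaged over bins by bin size.
% Both expressions assume every bin contains goods and bads.
binEntropy = -(pBad .* log2(pBad) + pGood .* log2(pGood));
entropyVal = sum(w .* binEntropy);
giniVal    = sum(w .* (1 - pBad.^2 - pGood.^2));

% Accuracy ratio implied by an AUROC value (CustAge row of the example).
auroc = 0.58547;
ar = 2*auroc - 1;
```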