Troubleshooting Credit Scorecard Results
This topic shows some credit scorecard results that need troubleshooting. The examples cover the full range of the credit scorecard workflow. For details on the overall process of creating and developing credit scorecards, see Credit Scorecard Modeling Workflow.
Predictor Name Is Unspecified and the Parser Returns an Error
If you attempt to use modifybins, bininfo, or plotbins and omit the predictor's
name, the parser returns an error.
load CreditCardData
sc = creditscorecard(data,'IDVar','CustID','GoodLabel',0);
modifybins(sc,'CutPoints',[20 30 50 65])
Error using creditscorecard/modifybins (line 79)
Expected a string for the parameter name, instead the input type was 'double'.
Solution: Make sure to include the predictor’s
name when using these functions. Use this syntax to specify the
PredictorName when using modifybins.
load CreditCardData
sc = creditscorecard(data,'IDVar','CustID','GoodLabel',0);
modifybins(sc,'CustIncome','CutPoints',[20 30 50 65]);
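The same rule applies to bininfo and plotbins. The following is a minimal sketch, assuming the same CreditCardData setup as above, with the predictor name passed as the second argument to each function. The variable name bi is illustrative only.

```matlab
% Sketch: pass the predictor name as the second input to each function.
load CreditCardData
sc = creditscorecard(data,'IDVar','CustID','GoodLabel',0);

% Set the cut points for CustIncome, naming the predictor explicitly.
sc = modifybins(sc,'CustIncome','CutPoints',[20 30 50 65]);

% Query and plot the bin statistics for the same predictor.
bi = bininfo(sc,'CustIncome');
plotbins(sc,'CustIncome')
```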
Using bininfo or plotbins Before Binning
If you use bininfo or plotbins before binning, the
results might be unusable.
load CreditCardData
sc = creditscorecard(data,'IDVar','CustID','GoodLabel',0);
bininfo(sc,'CustAge')
plotbins(sc,'CustAge')
ans =
Bin Good Bad Odds WOE InfoValue
________ ____ ___ _______ _________ __________
'21' 2 1 2 -0.011271 3.1821e-07
'22' 3 1 3 0.39419 0.00047977
'23' 1 2 0.5 -1.3976 0.0053002
'24' 3 4 0.75 -0.9921 0.0062895
'25' 3 1 3 0.39419 0.00047977
'26' 4 2 2 -0.011271 6.3641e-07
'27' 6 5 1.2 -0.5221 0.0026744
'28' 10 2 5 0.90502 0.0067112
'29' 8 6 1.3333 -0.41674 0.0021465
'30' 9 10 0.9 -0.80978 0.011321
'31' 8 6 1.3333 -0.41674 0.0021465
'32' 13 13 1 -0.70442 0.011663
'33' 9 11 0.81818 -0.90509 0.014934
'34' 14 12 1.1667 -0.55027 0.0070391
'35' 18 10 1.8 -0.11663 0.00032342
'36' 23 14 1.6429 -0.20798 0.0013772
'37' 28 19 1.4737 -0.31665 0.0041132
'38' 24 14 1.7143 -0.16542 0.0008894
'39' 21 14 1.5 -0.29895 0.0027242
'40' 31 12 2.5833 0.24466 0.0020499
'41' 21 18 1.1667 -0.55027 0.010559
'42' 29 9 3.2222 0.46565 0.0062605
'43' 29 23 1.2609 -0.47262 0.010312
'44' 28 16 1.75 -0.1448 0.00078672
'45' 36 16 2.25 0.10651 0.00048246
'46' 33 19 1.7368 -0.15235 0.0010303
'47' 28 6 4.6667 0.83603 0.016516
'48' 32 17 1.8824 -0.071896 0.00021357
'49' 38 10 3.8 0.63058 0.013957
'50' 33 14 2.3571 0.15303 0.00089239
'51' 28 9 3.1111 0.43056 0.0052525
'52' 35 8 4.375 0.77149 0.01808
'53' 14 8 1.75 -0.1448 0.00039336
'54' 27 12 2.25 0.10651 0.00036184
'55' 20 9 2.2222 0.094089 0.00021044
'56' 20 11 1.8182 -0.10658 0.00029856
'57' 16 7 2.2857 0.12226 0.00028035
'58' 11 7 1.5714 -0.25243 0.00099297
'59' 11 6 1.8333 -0.098283 0.00013904
'60' 9 4 2.25 0.10651 0.00012061
'61' 11 2 5.5 1.0003 0.0086637
'62' 8 0 Inf Inf Inf
'63' 7 1 7 1.2415 0.0076953
'64' 10 0 Inf Inf Inf
'65' 4 1 4 0.68188 0.0016791
'66' 6 1 6 1.0873 0.0053857
'67' 2 3 0.66667 -1.1099 0.0056227
'68' 6 1 6 1.0873 0.0053857
'69' 6 0 Inf Inf Inf
'70' 1 0 Inf Inf Inf
'71' 1 0 Inf Inf Inf
'72' 1 0 Inf Inf Inf
'73' 3 0 Inf Inf Inf
'74' 1 0 Inf Inf Inf
'Totals' 803 397 2.0227 NaN Inf
The plot for CustAge is not readable because it has too many
bins. Also, bininfo returns data that have Inf
values for the WOE due to zero observations for either Good or
Bad.
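One way to spot this condition before plotting is to inspect the table that bininfo returns. The following is a rough sketch, assuming the same unbinned scorecard as above. The variable names bi, numBins, and emptyCell are illustrative only.

```matlab
% Sketch: count the bins and flag those with no Good or no Bad observations.
load CreditCardData
sc = creditscorecard(data,'IDVar','CustID','GoodLabel',0);

% Leave out the 'Totals' row so that only actual bins are examined.
bi = bininfo(sc,'CustAge','Totals','off');

numBins = height(bi);
emptyCell = (bi.Good == 0) | (bi.Bad == 0);

fprintf('CustAge has %d bins, %d of them with zero Good or Bad counts.\n', ...
    numBins, nnz(emptyCell));

% List the labels of the affected bins.
disp(bi.Bin(emptyCell))
```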
Solution: Bin the data using autobinning or modifybins before plotting or
inquiring about the bin statistics, to avoid having too many bins or having
NaNs and Infs. For example, you can use the name-value pair argument
'AlgorithmOptions' with the autobinning function to define the
number of bins.
load CreditCardData
sc = creditscorecard(data,'IDVar','CustID','GoodLabel',0);
AlgoOptions = {'NumBins',4};
sc = autobinning(sc,'CustAge','Algorithm','EqualFrequency',...
    'AlgorithmOptions',AlgoOptions);
bininfo(sc,'CustAge','Totals','off')
plotbins(sc,'CustAge')
ans =
Bin Good Bad Odds WOE InfoValue
___________ ____ ___ ______ ________ _________
'[-Inf,39)' 186 133 1.3985 -0.36902 0.03815
'[39,46)' 195 108 1.8056 -0.11355 0.0033158
'[46,52)' 192 75 2.56 0.23559 0.011823
'[52,Inf]' 230 81 2.8395 0.33921 0.02795
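The WOE and InfoValue columns in this table are consistent with the usual weight-of-evidence and information-value definitions. The following sketch recomputes them from the Good and Bad counts, assuming those standard definitions rather than quoting the toolbox implementation. It also shows why a bin with zero Bad observations produces Inf. The variable names are illustrative only.

```matlab
% Good and Bad counts copied from the four-bin CustAge table above.
good = [186; 195; 192; 230];
bad  = [133; 108;  75;  81];

% Share of all Goods and of all Bads that falls in each bin.
pGood = good / sum(good);
pBad  = bad  / sum(bad);

% Weight of evidence and information value per bin
% (assumed standard definitions).
woe = log(pGood ./ pBad);
iv  = (pGood - pBad) .* woe;
disp([woe iv])

% A bin with no Bad observations, such as bin '62' before binning
% (8 Good, 0 Bad, with totals of 803 Good and 397 Bad), has infinite WOE.
woeEmpty = log((8/803) / (0/397));
disp(woeEmpty)
```

The first row reproduces the tabulated WOE of about -0.369 and InfoValue of about 0.038 for the bin '[-Inf,39)'.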
If Categorical Data Is Given as Numeric
Categorical data is often recorded using numeric values, and can be stored in a
numeric array. Although you know that the data should be interpreted as categorical
information, for creditscorecard this predictor looks like a
numeric array.
To show the case where categorical data is given as numeric data, the data for the
variable ResStatus is intentionally converted to numeric
values.
load CreditCardData
data.ResStatus = double(data.ResStatus);
sc = creditscorecard(data,'IDVar','CustID')
sc =
creditscorecard with properties:
GoodLabel: 0
ResponseVar: 'status'
VarNames: {1x11 cell}
NumericPredictors: {1x7 cell}
CategoricalPredictors: {'EmpStatus' 'OtherCC'}
IDVar: 'CustID'
PredictorVars: {1x9 cell}
Note that 'ResStatus' appears as part of the
NumericPredictors property. If you apply automatic binning,
the resulting bin information shows signs that the predictor is being treated
as the wrong type.
sc = autobinning(sc,'ResStatus');
[bi,cg] = bininfo(sc,'ResStatus')
bi =
Bin Good Bad Odds WOE InfoValue
__________ ____ ___ ______ _________ __________
'[-Inf,2)' 365 177 2.0621 0.019329 0.0001682
'[2,Inf]' 438 220 1.9909 -0.015827 0.00013772
'Totals' 803 397 2.0227 NaN 0.00030592
cg =
2
The numeric ranges in the bin labels show that 'ResStatus' is
being treated as a numeric variable. This is also confirmed by the fact that the
optional output from bininfo is a numeric array of cut
points, as opposed to a table with category groupings. Moreover, the output from
predictorinfo confirms that the credit scorecard is treating the data as
numeric.
[T,Stats] = predictorinfo(sc,'ResStatus')
T =
PredictorType LatestBinning
_____________ ______________________
ResStatus 'Numeric' 'Automatic / Monotone'
Stats =
Value
_______
Min 1
Max 3
Mean 1.7017
Std 0.71863
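You can also run this check programmatically. The following sketch, assuming the same numeric conversion of ResStatus as above, tests whether the predictor is listed under NumericPredictors and counts its distinct values. Treating a small number of distinct values as a hint of categorical data is a heuristic assumed here, not a rule from the toolbox, and the variable names are illustrative only.

```matlab
% Sketch: check how creditscorecard classified 'ResStatus'.
load CreditCardData
data.ResStatus = double(data.ResStatus);   % simulate categorical data stored as numbers
sc = creditscorecard(data,'IDVar','CustID');

% True when the predictor is being treated as numeric.
isNumeric = ismember('ResStatus',sc.NumericPredictors);

% A numeric predictor with very few distinct values may really be categorical.
numLevels = numel(unique(data.ResStatus));

fprintf('Treated as numeric: %d, distinct values: %d\n',isNumeric,numLevels);
```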
Solution: For creditscorecard,
'Categorical' means a MATLAB® categorical data type. For more information, see categorical. To
treat 'ResStatus' as categorical, change the
'PredictorType' of the PredictorName
'ResStatus' from 'Numeric' to
'Categorical' using modifypredictor.
sc = modifypredictor(sc,'ResStatus','PredictorType','Categorical')
[T,Stats] = predictorinfo(sc,'ResStatus')
sc =
creditscorecard with properties:
GoodLabel: 0
ResponseVar: 'status'
VarNames: {1x11 cell}
NumericPredictors: {1x6 cell}
CategoricalPredictors: {'ResStatus' 'EmpStatus' 'OtherCC'}
IDVar: 'CustID'
PredictorVars: {1x9 cell}
T =
PredictorType Ordinal LatestBinning
_____________ _______ _______________
ResStatus 'Categorical' false 'Original Data'
Stats =
Count
_____
C1 542
C2 474
C3 184
Note that 'ResStatus' now appears as part of the Categorical
predictors. Also, predictorinfo now describes
'ResStatus' as categorical and displays the category counts.
If you apply autobinning, the categories are now
reordered, as shown by calling bininfo, which also shows the
category labels, as opposed to numeric ranges. The optional output of bininfo is now a category grouping
table.
sc = autobinning(sc,'ResStatus');
[bi,cg] = bininfo(sc,'ResStatus')
bi =
Bin Good Bad Odds WOE InfoValue
________ ____ ___ ______ _________ _________
'C2' 307 167 1.8383 -0.095564 0.0036638
'C1' 365 177 2.0621 0.019329 0.0001682
'C3' 131 53 2.4717 0.20049 0.0059418
'Totals' 803 397 2.0227 NaN 0.0097738
cg =
Category BinNumber
________ _________
'C2' 1
'C1' 2
'C3' 3
NaNs Returned When Scoring a “Test” Dataset
When applying a creditscorecard model to a “test”
dataset using the score function, if an observation in the
“test” dataset has a NaN or
<undefined> value, a NaN total score
is returned for each of these observations. For example, a
creditscorecard object is created using
“training” data.
load CreditCardData
sc = creditscorecard(data,'IDVar','CustID');
sc = autobinning(sc);
sc = fitmodel(sc);
1. Adding CustIncome, Deviance = 1490.8527, Chi2Stat = 32.588614, PValue = 1.1387992e-08
2. Adding TmWBank, Deviance = 1467.1415, Chi2Stat = 23.711203, PValue = 1.1192909e-06
3. Adding AMBalance, Deviance = 1455.5715, Chi2Stat = 11.569967, PValue = 0.00067025601
4. Adding EmpStatus, Deviance = 1447.3451, Chi2Stat = 8.2264038, PValue = 0.0041285257
5. Adding CustAge, Deviance = 1441.994, Chi2Stat = 5.3511754, PValue = 0.020708306
6. Adding ResStatus, Deviance = 1437.8756, Chi2Stat = 4.118404, PValue = 0.042419078
7. Adding OtherCC, Deviance = 1433.707, Chi2Stat = 4.1686018, PValue = 0.041179769
Generalized Linear regression model:
logit(status) ~ 1 + CustAge + ResStatus + EmpStatus + CustIncome + TmWBank + OtherCC + AMBalance
Distribution = Binomial
Estimated Coefficients:
Estimate SE tStat pValue
________ ________ ______ __________
(Intercept) 0.70239 0.064001 10.975 5.0538e-28
CustAge 0.60833 0.24932 2.44 0.014687
ResStatus 1.377 0.65272 2.1097 0.034888
EmpStatus 0.88565 0.293 3.0227 0.0025055
CustIncome 0.70164 0.21844 3.2121 0.0013179
TmWBank 1.1074 0.23271 4.7589 1.9464e-06
OtherCC 1.0883 0.52912 2.0569 0.039696
AMBalance 1.045 0.32214 3.2439 0.0011792
1200 observations, 1192 error degrees of freedom
Dispersion: 1
Chi^2-statistic vs. constant model: 89.7, p-value = 1.4e-16
Suppose that a missing observation (NaN) is added to the data
and then newdata is scored using the score function. By default, the
points and score assigned to the missing value are NaN.
newdata = data(1:10,:);
newdata.CustAge(1) = NaN;
[Scores,Points] = score(sc,newdata)
Scores =
NaN
1.4646
0.7662
1.5779
1.4535
1.8944
-0.0872
0.9207
1.0399
0.8252
Points =
CustAge ResStatus EmpStatus CustIncome TmWBank OtherCC AMBalance
________ _________ _________ __________ _________ ________ _________
NaN -0.031252 -0.076317 0.43693 0.39607 0.15842 -0.017472
0.479 0.12696 0.31449 0.43693 -0.033752 0.15842 -0.017472
0.21445 -0.031252 0.31449 0.081611 0.39607 -0.19168 -0.017472
0.23039 0.12696 0.31449 0.43693 -0.044811 0.15842 0.35551
0.479 0.12696 0.31449 0.43693 -0.044811 0.15842 -0.017472
0.479 0.12696 0.31449 0.43693 0.39607 0.15842 -0.017472
-0.14036 0.12696 -0.076317 -0.10466 -0.033752 0.15842 -0.017472
0.23039 0.37641 0.31449 0.43693 -0.033752 -0.19168 -0.21206
0.23039 -0.031252 -0.076317 0.43693 -0.033752 0.15842 0.35551
0.23039 0.12696 -0.076317 0.43693 -0.033752 0.15842 -0.017472
Also, notice that because the CustAge predictor for the first
observation is NaN, the corresponding Scores
output is NaN also.
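Before scoring, it can help to locate the rows that contain missing values. The following is a minimal sketch, assuming the same newdata table as above and using the base MATLAB function ismissing, which flags NaN values in numeric columns and undefined values in categorical columns. The variable names are illustrative only.

```matlab
% Sketch: find the rows of the test data that contain missing values.
load CreditCardData
newdata = data(1:10,:);
newdata.CustAge(1) = NaN;

% ismissing returns one logical flag per table element.
missingMask = ismissing(newdata);

% Rows with at least one missing value get a NaN score by default.
rowsWithMissing = find(any(missingMask,2));
disp(rowsWithMissing)

% Names of the variables that contain at least one missing value.
disp(newdata.Properties.VariableNames(any(missingMask,1)))
```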
Solution: To resolve this issue, use the
formatpoints function with the
name-value pair argument Missing. When using
Missing, you can assign points to a predictor’s
missing value according to one of three alternative criteria
('ZeroWOE', 'MinPoints', or
'MaxPoints').
For example, use Missing to replace the missing value with
the 'MinPoints' option. The row with the missing data now has a
score corresponding to assigning it the minimum possible points for
CustAge.
sc = formatpoints(sc,'Missing','MinPoints');
[Scores,Points] = score(sc,newdata)
PointsTable = displaypoints(sc);
PointsTable(1:7,:)
Scores =
0.7074
1.4646
0.7662
1.5779
1.4535
1.8944
-0.0872
0.9207
1.0399
0.8252
Points =
CustAge ResStatus EmpStatus CustIncome TmWBank OtherCC AMBalance
________ _________ _________ __________ _________ ________ _________
-0.15894 -0.031252 -0.076317 0.43693 0.39607 0.15842 -0.017472
0.479 0.12696 0.31449 0.43693 -0.033752 0.15842 -0.017472
0.21445 -0.031252 0.31449 0.081611 0.39607 -0.19168 -0.017472
0.23039 0.12696 0.31449 0.43693 -0.044811 0.15842 0.35551
0.479 0.12696 0.31449 0.43693 -0.044811 0.15842 -0.017472
0.479 0.12696 0.31449 0.43693 0.39607 0.15842 -0.017472
-0.14036 0.12696 -0.076317 -0.10466 -0.033752 0.15842 -0.017472
0.23039 0.37641 0.31449 0.43693 -0.033752 -0.19168 -0.21206
0.23039 -0.031252 -0.076317 0.43693 -0.033752 0.15842 0.35551
0.23039 0.12696 -0.076317 0.43693 -0.033752 0.15842 -0.017472
ans =
Predictors Bin Points
__________ ___________ _________
'CustAge' '[-Inf,33)' -0.15894
'CustAge' '[33,37)' -0.14036
'CustAge' '[37,40)' -0.060323
'CustAge' '[40,46)' 0.046408
'CustAge' '[46,48)' 0.21445
'CustAge' '[48,58)' 0.23039
'CustAge' '[58,Inf]' 0.479
Notice that the Scores output has a value for the first
customer record because CustAge now has a value and the score can
be calculated for the first customer record.
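To compare the three criteria side by side, you can score the same data under each setting. The following is a rough sketch, assuming the same data and model as above. It uses the 'Display','off' option of fitmodel to suppress the stepwise output, and the variable names opts, firstScore, and sck are illustrative only.

```matlab
% Sketch: score the same record under each 'Missing' setting.
load CreditCardData
sc = creditscorecard(data,'IDVar','CustID');
sc = autobinning(sc);
sc = fitmodel(sc,'Display','off');   % suppress the stepwise fitting output

newdata = data(1:10,:);
newdata.CustAge(1) = NaN;

opts = {'ZeroWOE','MinPoints','MaxPoints'};
firstScore = zeros(numel(opts),1);
for k = 1:numel(opts)
    % Each setting assigns different points to the missing CustAge value.
    sck = formatpoints(sc,'Missing',opts{k});
    s = score(sck,newdata);
    firstScore(k) = s(1);
    fprintf('%-10s score for first record: %.4f\n',opts{k},firstScore(k));
end
```

Because 'MinPoints' and 'MaxPoints' use the lowest and highest points available for the predictor, the score under 'ZeroWOE' should fall between the other two.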
See Also
creditscorecard | autobinning | bininfo | predictorinfo | modifypredictor | modifybins | bindata | plotbins | fitmodel | displaypoints | formatpoints | score | setmodel | probdefault | validatemodel