[1] Agresti, A. *Categorical Data Analysis*, 2nd Ed.
Hoboken, NJ: John Wiley & Sons, Inc., 2002.

[2] Allwein, E., R. Schapire, and Y. Singer.
“Reducing multiclass to binary: A unifying approach for margin
classifiers.” *Journal of Machine Learning
Research*. Vol. 1, 2000, pp. 113–141.

[3] Alpaydin, E. “Combined 5 x 2 CV F Test for Comparing Supervised
Classification Learning Algorithms.” *Neural Computation*, Vol.
11, No. 8, 1999, pp. 1885–1892.

[4] Blackard, J. A. and D. J. Dean. “Comparative accuracies of artificial
neural networks and discriminant analysis in predicting forest cover types from cartographic
variables.” *Computers and Electronics in Agriculture*. Vol. 24, Issue
3, 1999, pp. 131–151.

[5] Bottou, L., and C.-J. Lin. “Support Vector Machine
Solvers.” *Large Scale Kernel Machines* (L. Bottou, O. Chapelle,
D. DeCoste, and J. Weston, eds.). Cambridge, MA: MIT Press, 2007.

[6] Bouckaert, R. “Choosing Between Two
Learning Algorithms Based on Calibrated Tests.” *International
Conference on Machine Learning*, 2003, pp. 51–58.

[7] Bouckaert, R. and E. Frank. “Evaluating the Replicability of
Significance Tests for Comparing Learning Algorithms.” In *Advances in
Knowledge Discovery and Data Mining, 8th Pacific-Asia Conference*, 2004, pp.
3–12.

[8] Breiman, L. “Bagging Predictors.” *Machine
Learning*, Vol. 24, 1996, pp. 123–140.

[9] Breiman, L. “Random Forests.” *Machine Learning*,
Vol. 45, 2001, pp. 5–32.

[10] Breiman, L. `https://www.stat.berkeley.edu/~breiman/RandomForests/`

[11] Breiman, L., J. H. Friedman, R. A. Olshen, and C. J. Stone.
*Classification and Regression Trees.* Boca Raton, FL: Chapman &
Hall, 1984.

[12] Cristianini, N., and J. Shawe-Taylor. *An Introduction to
Support Vector Machines and Other Kernel-Based Learning Methods*. Cambridge,
UK: Cambridge University Press, 2000.

[13] Dietterich, T. “Approximate statistical tests for comparing
supervised classification learning algorithms.” *Neural
Computation*, Vol. 10, No. 7, 1998, pp. 1895–1923.

[14] Dietterich, T., and G. Bakiri. “Solving
Multiclass Learning Problems Via Error-Correcting Output Codes.” *Journal
of Artificial Intelligence Research*. Vol. 2, 1995, pp.
263–286.

[15] Escalera, S., O. Pujol, and P. Radeva.
“On the decoding process in ternary error-correcting output
codes.” *IEEE Transactions on Pattern Analysis and
Machine Intelligence*. Vol. 32, Issue 1, 2010, pp. 120–134.

[16] Escalera, S., O. Pujol, and P. Radeva.
“Separability of ternary codes for sparse designs of error-correcting
output codes.” *Pattern Recognition Letters*. Vol.
30, Issue 3, 2009, pp. 285–297.

[17] Fan, R.-E., P.-H. Chen, and C.-J. Lin. “Working
set selection using second order information for training support
vector machines.” *Journal of Machine Learning Research*,
Vol. 6, 2005, pp. 1889–1918.

[18] Fagerland, M. W., S. Lydersen, and P. Laake. “The
McNemar Test for Binary Matched-Pairs Data: Mid-p and Asymptotic Are
Better Than Exact Conditional.” *BMC Medical Research
Methodology*. Vol. 13, 2013, pp. 1–8.

[19] Freund, Y. “A more robust boosting
algorithm.” arXiv:0905.2138v1, 2009.

[20] Freund, Y. and R. E. Schapire. “A Decision-Theoretic
Generalization of On-Line Learning and an Application to Boosting.” *Journal of
Computer and System Sciences*, Vol. 55, 1997, pp. 119–139.

[21] Friedman, J. “Greedy function approximation: A gradient
boosting machine.” *Annals of Statistics*, Vol. 29, No. 5, 2001, pp.
1189–1232.

[22] Friedman, J., T. Hastie, and R. Tibshirani. “Additive
logistic regression: A statistical view of boosting.” *Annals of Statistics*,
Vol. 28, No. 2, 2000, pp. 337–407.

[23] Hastie, T., and R. Tibshirani. “Classification
by Pairwise Coupling.” *Annals of Statistics*.
Vol. 26, Issue 2, 1998, pp. 451–471.

[24] Hastie, T., R. Tibshirani, and J. Friedman. *The Elements of
Statistical Learning*, second edition. New York: Springer, 2008.

[25] Ho, C. H. and C. J. Lin. “Large-Scale
Linear Support Vector Regression.” *Journal of Machine
Learning Research*, Vol. 13, 2012, pp. 3323–3348.

[26] Ho, T. K. “The random subspace method for constructing
decision forests.” *IEEE Transactions on Pattern Analysis and Machine
Intelligence*, Vol. 20, No. 8, 1998, pp. 832–844.

[27] Hsieh, C. J., K. W. Chang, C. J. Lin,
S. S. Keerthi, and S. Sundararajan. “A Dual Coordinate Descent
Method for Large-Scale Linear SVM.” *Proceedings
of the 25th International Conference on Machine Learning, ICML ’08*,
2008, pp. 408–415.

[28] Hsu, Chih-Wei, Chih-Chung Chang, and Chih-Jen Lin. *A
Practical Guide to Support Vector Classification*. Available at `https://www.csie.ntu.edu.tw/~cjlin/papers/guide/guide.pdf`.

[29] Hu, Q., X. Che, L. Zhang, and D. Yu. “Feature
Evaluation and Selection Based on Neighborhood Soft Margin.” *Neurocomputing*.
Vol. 73, 2010, pp. 2114–2124.

[30] Kecman, V., T.-M. Huang, and M. Vogt. “Iterative
Single Data Algorithm for Training Kernel Machines from Huge Data
Sets: Theory and Performance.” In *Support Vector
Machines: Theory and Applications*. Edited by Lipo Wang,
255–274. Berlin: Springer-Verlag, 2005.

[31] Kohavi, R. “Scaling Up the Accuracy of Naive-Bayes
Classifiers: a Decision-Tree Hybrid.” *Proceedings
of the Second International Conference on Knowledge Discovery and
Data Mining*, 1996.

[32] Lancaster, H.O. “Significance Tests
in Discrete Distributions.” *JASA*, Vol.
56, Number 294, 1961, pp. 223–234.

[33] Langford, J., L. Li, and T. Zhang. “Sparse
Online Learning Via Truncated Gradient.” *J. Mach.
Learn. Res.*, Vol. 10, 2009, pp. 777–801.

[34] Loh, W.Y. “Regression Trees with
Unbiased Variable Selection and Interaction Detection.” *Statistica
Sinica*, Vol. 12, 2002, pp. 361–386.

[35] Loh, W.Y. and Y.S. Shih. “Split
Selection Methods for Classification Trees.” *Statistica
Sinica*, Vol. 7, 1997, pp. 815–840.

[36] McNemar, Q. “Note on the Sampling
Error of the Difference Between Correlated Proportions or Percentages.” *Psychometrika*,
Vol. 12, Number 2, 1947, pp. 153–157.

[37] Meinshausen, N. “Quantile Regression
Forests.” *Journal of Machine Learning Research*,
Vol. 7, 2006, pp. 983–999.

[38] Mosteller, F. “Some Statistical Problems
in Measuring the Subjective Response to Drugs.” *Biometrics*,
Vol. 8, Number 3, 1952, pp. 220–226.

[39] Nocedal, J. and S. J. Wright. *Numerical
Optimization*, 2nd ed., New York: Springer, 2006.

[40] Schapire, R. E. et al. “Boosting the margin: A new
explanation for the effectiveness of voting methods.” *Annals of Statistics*,
Vol. 26, No. 5, 1998, pp. 1651–1686.

[41] Schapire, R., and Y. Singer. “Improved boosting algorithms
using confidence-rated predictions.” *Machine Learning*, Vol. 37, No. 3, 1999,
pp. 297–336.

[42] Shalev-Shwartz, S., Y. Singer, and N.
Srebro. “Pegasos: Primal Estimated Sub-Gradient Solver for
SVM.” *Proceedings of the 24th International Conference
on Machine Learning, ICML ’07*, 2007, pp. 807–814.

[43] Seiffert, C., T. Khoshgoftaar, J. Van Hulse, and A. Napolitano.
“RUSBoost: Improving classification performance when training data is
skewed.” *19th International Conference on Pattern Recognition*, 2008, pp.
1–4.

[44] Warmuth, M., J. Liao, and G. Rätsch. “Totally corrective
boosting algorithms that maximize the margin.” *Proc. 23rd Int’l. Conf. on
Machine Learning*, ACM, New York, 2006, pp. 1001–1008.

[45] Wu, T. F., C. J. Lin, and R. Weng. “Probability
Estimates for Multi-Class Classification by Pairwise Coupling.” *Journal
of Machine Learning Research*. Vol. 5, 2004, pp. 975–1005.

[46] Wright, S. J., R. D. Nowak, and M. A. T. Figueiredo.
“Sparse Reconstruction by Separable Approximation.” *Trans.
Sig. Proc.*, Vol. 57, No. 7, 2009, pp. 2479–2493.

[47] Xiao, Lin. “Dual Averaging Methods
for Regularized Stochastic Learning and Online Optimization.” *J.
Mach. Learn. Res.*, Vol. 11, 2010, pp. 2543–2596.

[48] Xu, Wei. “Towards Optimal One Pass
Large Scale Learning with Averaged Stochastic Gradient Descent.” *CoRR*,
abs/1107.2490, 2011.

[49] Zadrozny, B. “Reducing Multiclass
to Binary by Coupling Probability Estimates.” *NIPS
2001: Proceedings of Advances in Neural Information Processing Systems
14*, 2001, pp. 1041–1048.

[50] Zadrozny, B., J. Langford, and N. Abe. “Cost-Sensitive Learning
by Cost-Proportionate Example Weighting.” *Third IEEE International
Conference on Data Mining*, 2003, pp. 435–442.

[51] Zhou, Z.-H. and X.-Y. Liu. “On Multi-Class Cost-Sensitive
Learning.” *Computational Intelligence*. Vol. 26, Issue 3,
2010, pp. 232–257.