학술·연구정보가이드: Computer Science 분야 (07): Classifiers; Decision trees; Ensemble pruning

피인용 상위 논문

Do we need hundreds of classifiers to solve real world classification problems?.
Fernández-Delgado, M., Cernadas, E., Barro, S. and 1 more (2014) Journal of Machine Learning Research, 15, pp. 3133-3181.

more... less...

We evaluate 179 classifiers arising from 17 families (discriminant analysis, Bayesian, neural networks, support vector machines, decision trees, rule-based classifi ers, boosting, bagging, stacking, random forests and other ensembles, generalized linear models, nearestneighbors, partial least squares and principal component regression, logistic and multinomial regression, multiple adaptive regression splines and other methods), implemented in Weka, R (with and without the caret package), C and Matlab, including all the relevant classifiers available today. We use 121 data sets, which represent the whole UCI data base (excluding the large-scale problems) and other own real problems, in order to achieve significant conclusions about the classifier behavior, not dependent on the data set collection. The classifiers most likely to be the bests are the random forest (RF) versions, the best of which (implemented in R and accessed via caret) achieves 94.1% of the maximum accuracy overcoming 90% in the 84.3% of the data sets. However, the difference is not statistically significant with the second best, the SVM with Gaussian kernel implemented in C using LibSVM, which achieves 92.3% of the maximum accuracy. A few models are clearly better than the remaining ones: random forest, SVM with Gaussian and polynomial kernels, extreme learning machine with Gaussian kernel, C5.0 and avNNet (a committee of multi-layer perceptrons implemented in R with the caret package). The random forest is clearly the best family of classifiers (3 out of 5 bests classi ers are RF), followed by SVM (4 classifiers in the top-10), neural networks and boosting ensembles (5 and 3 members in the top-20, respectively). © 2014 Manuel Fernández-Delgado, Eva Cernadas, Senén Barro and Dinani Amorim.
A survey of multiple classifier systems as hybrid systems.
Woźniak, M., Graña, M., Corchado, E.
(2014) Information Fusion, 16 (1), pp. 3-17.

more... less...

A current focus of intense research in pattern classification is the combination of several classifier systems, which can be built following either the same or different models and/or datasets building approaches. These systems perform information fusion of classification decisions at different levels overcoming limitations of traditional approaches based on single classifiers. This paper presents an up-to-date survey on multiple classifier system (MCS) from the point of view of Hybrid Intelligent Systems. The article discusses major issues, such as diversity and decision fusion methods, providing a vision of the spectrum of applications that are currently being developed.
LightGBM: A highly efficient gradient boosting decision tree.
Ke, G., Meng, Q., Finley, T. and 5 more (2017) Advances in Neural Information Processing Systems, 2017-, pp. 3147-3155.

more... less...

Gradient Boosting Decision Tree (GBDT) is a popular machine learning algorithm, and has quite a few effective implementations such as XGBoost and pGBRT. Although many engineering optimizations have been adopted in these implementations, the efficiency and scalability are still unsatisfactory when the feature dimension is high and data size is large. A major reason is that for each feature, they need to scan all the data instances to estimate the information gain of all possible split points, which is very time consuming. To tackle this problem, we propose two novel techniques: \emph{Gradient-based One-Side Sampling} (GOSS) and \emph{Exclusive Feature Bundling} (EFB). With GOSS, we exclude a significant proportion of data instances with small gradients, and only use the rest to estimate the information gain. We prove that, since the data instances with larger gradients play a more important role in the computation of information gain, GOSS can obtain quite accurate estimation of the information gain with a much smaller data size. With EFB, we bundle mutually exclusive features (i.e., they rarely take nonzero values simultaneously), to reduce the number of features. We prove that finding the optimal bundling of exclusive features is NP-hard, but a greedy algorithm can achieve quite good approximation ratio (and thus can effectively reduce the number of features without hurting the accuracy of split point determination by much). We call our new GBDT implementation with GOSS and EFB \emph{LightGBM}. Our experiments on multiple public datasets show that, LightGBM speeds up the training process of conventional GBDT by up to over 20 times while achieving almost the same accuracy.
Combining Pattern Classifiers: Methods and Algorithms: Second Edition.
Kuncheva, L.I.
(2014) Combining Pattern Classifiers: Methods and Algorithms: Second Edition, 9781118315231, pp. 1-357.
Social network analysis for decisive the ultimate classification from the ensemble to boost accuracy rates.
Sundarraj, B., Kaliyamurthie, K.P.
(2016) International Journal of Pharmacy and Technology, 8 (3), pp. 17272-17279.

more... less...

Classifier ensembles are used with success to boost accuracy rates of the underlying classification mechanisms. Through the utilization of collective classifications,it becomes doable to attain lower error rates in classification than by employing a single classifier instance. Ensembles area unit most frequently used with collections of call trees or neural networks because of their higher rates of error once used severally. during this paper,we are going to contemplate a novel implementation of a classifier ensemble that utilizes kNN classifiers. every categoryifier is ready-made to police investigation membership in a very specific class employing a best set choice method for variables. This provides the range required to with success implement associate ensemble. associate aggregating mechanism for decisive the ultimate classification from the ensemble is conferred and tested against many documented datasets. © 2016,International Journal of Pharmacy and Technology. All rights reserved.