背景

决策树是一种传统有效的机器学习方法，建立于数据集之上，由节点和边组成。每个节点要么用来做决策（决策节点），要么代表一个结果（叶子节点）。

ID3是Iterative Dichotomiser 3的缩写，该算法在每一步都会迭代（iteratively）地将特征分为（dichotomize）两个或多个组。算法由Ross Quinlan提出。

ID3使用自上而下的贪婪方法来建立一棵决策树。简单地说，「自上而下」意味着我们从顶部开始构建树，而「贪婪」意味着在每个迭代中我们选择当前最好的特征来创建一个节点。
一般来说，ID3只用于Nominal分类问题。

ID3算法

最佳特征

正如上述「背景」所提到的，ID3的核心问题就是如何选择当前最好的特征。在算法中，通常使用信息增益（Information Gain）或增益（Gain）来寻找最佳特征。其中，信息增益表示使用该特征进行划分后，整个数据集的熵减少的程度。具有最高信息增益的特征被选为最佳特征。

数据集的初始熵（Entropy）计算如下：

$Entropy(S) = - \sum p_i\cdot log_2(p_i);\quad i = 1~to~n$

其中， $n$ 表示决策结果的分类数量，对于二分类问题 $n=2$ ， $p_i$ 表示分类结果 $i$ 出现的频率，等价于数据集中分类结果 $i$ 出现的次数占样本总数的比。

信息增益计算如下：

$IG(S, A) = Entropy(S) - \sum (\frac{|S_v|}{|S|}\cdot Entropy(S_v))\\\\ Entropy(S_v)=-\sum p_i\cdot log_2(p_i)$

这里 $\frac{|S_V|}{|S|}$ 表示特征值为 $v$ 的样本占总样本的比例； $Entropy(S_v)$ 指样本子集的熵。

选择信息增益 $IG(S, A)$ 最大的特征创建节点。在实际的算法实现中，不难发现在计算每个特征时， $Entropy(S)$ 的值都是一致的，并不影响我们比较大小。所以也可以将问题转化为找出 $\sum (\frac{|S_v|}{|S|}\cdot Entropy(S_v))$ 最小的特征。

结束标志

算法的另一个问题是什么时候结束一棵树的生长。容易想到，当剩下的样本所属的分类是一致的，就不需要再创建孩子将它们分开，此时他们的分类就是最终结果。此外，还可以设置一个最小叶结点数量，避免过度的学习。此时，通过简单投票判断Majority class，获得分类结果。

算法伪代码表示如下：

function ID3Algorithm(examples, targetAttribute, attributes):
    create a new node
    if all examples belong to the same class:
        return the node labeled with the class
    else if attributes is empty:
        return the node labeled with the majority class
    else:
        calculate the information gain for each attribute
        bestAttribute = attribute with the maximum information gain
        set the node attribute to bestAttribute
        split examples into subsets using bestAttribute
        for each value v of bestAttribute:
            create a new branch below the node
            subsetExamples = subset of examples where example.attribute equals v
            if subsetExamples is empty:
                add a leaf node labeled with the majority class below the current branch
            else:
                newAttributes = attributes - {bestAttribute}
                add a subtree ID3Algorithm(subsetExamples, targetAttribute, newAttributes) below the current branch
    return the node