<?xml version="1.0" encoding="UTF-8"?>
 <!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.0 20120330//EN" "http://jats.nlm.nih.gov/publishing/1.0/JATS-journalpublishing1.dtd"> <article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" article-type="research-article" dtd-version="1.0" xml:lang="en">
  <front>
    <journal-meta>
      <journal-id journal-id-type="publisher-id">JMID</journal-id>
      <journal-title-group>
        <journal-title>Journal of Medical Informatics and Decision Making</journal-title>
      </journal-title-group>
      <issn pub-type="epub">2641-5526</issn>
      <publisher>
        <publisher-name>Open Access Pub</publisher-name>
        <publisher-loc>United States</publisher-loc>
      </publisher>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.14302/issn.2641-5526.jmid-20-3302</article-id>
      <article-id pub-id-type="publisher-id">JMID-20-3302</article-id>
      <article-categories>
        <subj-group>
          <subject>research-article</subject>
        </subj-group>
      </article-categories>
      <title-group>
        <article-title>Study of the ID3 and C4.5 Learning Algorithms</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <name>
            <surname>Fakir</surname>
            <given-names>Youssef</given-names>
          </name>
          <xref ref-type="aff" rid="idm1842936708">1</xref>
          <xref ref-type="aff" rid="idm1842933972">*</xref>
        </contrib>
        <contrib contrib-type="author">
          <name>
            <surname>Azalmad</surname>
            <given-names>M.</given-names>
          </name>
          <xref ref-type="aff" rid="idm1842936708">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <name>
            <surname>Elaychi</surname>
            <given-names>R.</given-names>
          </name>
          <xref ref-type="aff" rid="idm1842936708">1</xref>
        </contrib>
      </contrib-group>
      <aff id="idm1842936708">
        <label>1</label>
        <addr-line>Laboratory of Information Processing and Decision Support, Faculty of Sciences and Technics, Sultane Moulay Slimane University, Beni Mellal, Morocco </addr-line>
      </aff>
      <aff id="idm1842933972">
        <label>*</label>
        <addr-line>Corresponding author</addr-line>
      </aff>
      <contrib-group>
        <contrib contrib-type="editor">
          <name>
            <surname>Erfan</surname>
            <given-names>Babaee</given-names>
          </name>
          <xref ref-type="aff" rid="idm1843059692">1</xref>
        </contrib>
      </contrib-group>
      <aff id="idm1843059692">
        <label>1</label>
        <addr-line>Mazandaran University of Science and Technology, Iran. </addr-line>
      </aff>
      <author-notes>
        <corresp>Youssef Fakir, Laboratory of Information Processing and Decision Support, Faculty of Sciences and Technics, Sultane Moulay Slimane University, Beni Mellal, Morocco. Email: <email>info.dec07@yahoo.fr</email></corresp>
        <fn fn-type="conflict" id="idm1842853124">
          <p>The authors have declared that no competing interests exist.</p>
        </fn>
      </author-notes>
      <pub-date pub-type="epub" iso-8601-date="2020-04-23">
        <day>23</day>
        <month>04</month>
        <year>2020</year>
      </pub-date>
      <volume>1</volume>
      <issue>2</issue>
      <fpage>29</fpage>
      <lpage>43</lpage>
      <history>
        <date date-type="received">
          <day>04</day>
          <month>04</month>
          <year>2020</year>
        </date>
        <date date-type="accepted">
          <day>21</day>
          <month>04</month>
          <year>2020</year>
        </date>
        <date date-type="online">
          <day>23</day>
          <month>04</month>
          <year>2020</year>
        </date>
      </history>
      <permissions>
        <copyright-statement>© </copyright-statement>
        <copyright-year>2020</copyright-year>
        <copyright-holder>Youssef Fakir, et al.</copyright-holder>
        <license xlink:href="http://creativecommons.org/licenses/by/4.0/" xlink:type="simple">
          <license-p>This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.</license-p>
        </license>
      </permissions>
      <self-uri xlink:href="http://openaccesspub.org/jmid/article/1334">This article is available from http://openaccesspub.org/jmid/article/1334</self-uri>
      <abstract>
        <p>Data mining is the process of exploring large data sets to find patterns that support decision-making. One of the techniques used in decision-making is classification. Data classification is a form of data analysis used to extract models describing important data classes. There are many classification algorithms, each of which classifies objects into predefined classes. The decision tree is one such important technique: it builds a tree structure by incrementally breaking the data set down into smaller subsets. Decision trees can be built with popular algorithms such as ID3, C4.5 and CART. The present study considers the ID3 and C4.5 algorithms, which build a decision tree using the “entropy” and “information gain” measures that are the basic components behind the construction of a classifier model.</p>
      </abstract>
      <kwd-group>
        <kwd>Decision tree classifier</kwd>
        <kwd>Entropy</kwd>
        <kwd>information gain</kwd>
        <kwd>ID3 algorithm</kwd>
        <kwd>C4.5 algorithm</kwd>
      </kwd-group>
      <counts>
        <fig-count count="8"/>
        <table-count count="9"/>
        <page-count count="15"/>
      </counts>
    </article-meta>
  </front>
  <body>
    <sec id="idm1842801612" sec-type="intro">
      <title>Introduction</title>
      <p>The decision tree builds a model from a data set called the training data. This training data is simply a set of records (objects), each characterized by a set of attributes and by a special attribute called the class label. If the training data contains a large number of records and little noisy data, the generated model will be well designed and constructed; otherwise, the model will predict future unseen records poorly. Just as the training data is important, so is the test data: a set of records whose class labels are hidden from the model, used to validate it.</p>
      <p>The generated model is used for descriptive or predictive modeling. Descriptive modeling serves to describe the set of attributes that essentially make an object belong to one class or another; the model shows the priority of each attribute, i.e., how strongly it influences the assignment of an object to a specific class. Predictive modeling acts as a function that takes a new object as input and produces the class to be attached to that object. </p>
      <p>In the next section, we will discuss in depth the measures behind the construction of the model; later, we will show how these measures work with the learning algorithms ID3<xref ref-type="bibr" rid="ridm1842508636">1</xref><xref ref-type="bibr" rid="ridm1842577908">2</xref> and C4.5<xref ref-type="bibr" rid="ridm1842583956">3</xref><xref ref-type="bibr" rid="ridm1842373972">4</xref><xref ref-type="bibr" rid="ridm1842372820">5</xref><xref ref-type="bibr" rid="ridm1842368860">6</xref>.</p>
      <sec id="idm1842801972">
        <title>Impurity Measure</title>
        <p>There are several indices that measure the degree of impurity quantitatively. The best-known indices are entropy, the Gini index and the classification error. Entropy, in simple terms, is the measure of disorder in a group of objects <xref ref-type="bibr" rid="ridm1842363308">7</xref>. Consider the example in <xref ref-type="fig" rid="idm1842264068">Figure 1</xref>, with seven triangles and five circles.</p>
        <fig id="idm1842264068">
          <label>Figure 1.</label>
          <caption>
            <title> Example for entropy measure</title>
          </caption>
          <graphic xlink:href="images/image1.jpg" mime-subtype="jpg"/>
        </fig>
        <p>In this example, we cannot be certain which shape dominates overall. This uncertainty can be expressed mathematically by the following expression:</p>
        <fig id="idm1842265364">
          <graphic xlink:href="images/image2.png" mime-subtype="png"/>
        </fig>
        <p>where <italic>n</italic> is the number of classes, and <italic>p</italic><sub><italic>i</italic></sub> is the fraction of objects of class <italic>i</italic> over the total number of objects.</p>
        <p>Taking the previous example, we get:</p>
        <fig id="idm1842261548">
          <graphic xlink:href="images/image3.png" mime-subtype="png"/>
        </fig>
        <p>The entropy value lies between zero and one. Its value for the previous example is very high, which means high disorder; thus, the set of objects is heterogeneous.</p>
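        <p>The entropy computation above can be sketched in a few lines of Python (an illustrative snippet of our own, not part of the original study; the function name <italic>entropy</italic> is an assumption):</p>

```python
from math import log2

def entropy(counts):
    """Entropy of a node given its class counts, e.g. [7, 5] for 7 triangles and 5 circles."""
    n = sum(counts)
    # sum of -p_i * log2(p_i) over the classes present in the node
    return -sum(c / n * log2(c / n) for c in counts if c > 0)

print(round(entropy([7, 5]), 2))  # seven triangles, five circles -> about 0.98 (high disorder)
print(entropy([8, 0]))            # a pure node -> 0.0 (no disorder)
```

        <p>A pure node gives entropy zero, and an evenly mixed two-class node gives the maximum value of one, matching the behavior described in the examples that follow.</p>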
        <p>We will take two more examples to illustrate the relation between disorder and the entropy value. The process is the same: calculate the entropy value for each example.</p>
        <p>In this example (<xref ref-type="fig" rid="idm1842260324">Figure 2</xref>), we take six green triangles and two red circles, and again calculate the entropy value for this case:</p>
        <fig id="idm1842260324">
          <label>Figure 2.</label>
          <caption>
            <title> Example for illustrating the relation between disorder and the entropy</title>
          </caption>
          <graphic xlink:href="images/image4.jpg" mime-subtype="jpg"/>
        </fig>
        <fig id="idm1842258668">
          <graphic xlink:href="images/image5.png" mime-subtype="png"/>
        </fig>
        <p>As we can see, the entropy value becomes smaller because the group of triangles has many more members than the group of circles, which means that the disorder is smaller. </p>
        <p>In this last example (<xref ref-type="fig" rid="idm1842258092">Figure 3</xref>), we take only a set of triangles, as described below:</p>
        <fig id="idm1842258092">
          <label>Figure 3.</label>
          <caption>
            <title> Example for illustrating the relation between disorder and the entropy</title>
          </caption>
          <graphic xlink:href="images/image6.jpg" mime-subtype="jpg"/>
        </fig>
        <fig id="idm1842190820">
          <graphic xlink:href="images/image7.png" mime-subtype="png"/>
        </fig>
        <p>The entropy value indicates that there is no disorder and that the whole set is homogeneous. To conclude: when a set of data contains objects of different classes, i.e., is heterogeneous, the disorder is very high and so is the entropy value; when the set of data is homogeneous, the entropy value is zero.</p>
        <p>There are other measures of impurity <xref ref-type="bibr" rid="ridm1842577908">2</xref>, such as the “Gini index”, which measures the divergence between the distributions of the target attribute’s values, and the “classification error”:</p>
        <fig id="idm1842191252">
          <graphic xlink:href="images/image8.png" mime-subtype="png"/>
        </fig>
        <fig id="idm1842189092">
          <graphic xlink:href="images/image9.png" mime-subtype="png"/>
        </fig>
        <p>So far we have seen what impurity is. The next step is to find out what information gain is and how it relates to impurity.</p>
      </sec>
      <sec id="idm1842775156">
        <title>Information Gain Measure</title>
        <p>Information gain is a measure that makes it possible to discover, among all the attributes characterizing a set of records or objects, the attribute that returns the most information about these records; in other words, the attribute that yields the lowest impurity <xref ref-type="bibr" rid="ridm1842363308">7</xref>.</p>
        <p>Suppose we have a parent node composed of several objects, each with several attributes and a class label. We can partition the parent node into subsets using an attribute. The goal is to discover the attribute that partitions the parent node of objects or records into subsets under the following constraint: each subset must be as pure as possible, meaning that it must have a low impurity value.</p>
        <p>Information gain is mathematically defined as:</p>
        <fig id="idm1842188516">
          <graphic xlink:href="images/image10.png" mime-subtype="png"/>
        </fig>
        <p>where I(·) is the impurity measure of a node, which can be calculated with one of the three measures defined in the impurity section; N is the total number of records or objects in the parent node (parent set); k is the number of attribute values; and N(<italic>v</italic><sub><italic>j</italic></sub>) is the number of records associated with the subset <italic>v</italic><sub><italic>j</italic></sub>. Here is an illustrative example of computing the information gain measure for different attributes on a set of objects. There are twelve objects in total, each representing a person. In general, each person is characterized by a set of attributes such as age, eye color, family status, name, weight, etc.; in our case, however, we will take only two attributes (age and weight) and the class label (gender), which takes only two possible values (male or female). Males are represented by triangles and females by circles, as described in <xref ref-type="fig" rid="idm1842184484">Figure 4</xref>. There are seven males and five females:</p>
        <fig id="idm1842184484">
          <label>Figure 4.</label>
          <caption>
            <title> Example of gender illustrating</title>
          </caption>
          <graphic xlink:href="images/image11.jpg" mime-subtype="jpg"/>
        </fig>
        <p>The impurity for this parent node is equal to 0.979. The impurity is high, which makes sense because the set of objects is heterogeneous. The main objective is to reduce this impurity. The solution consists in dividing the parent node into subsets by each attribute in the list (weight, age), calculating the information gain for each attribute, and then taking the attribute that returns the largest information gain.</p>
        <p>· Dividing the parent set by the “Weight” attribute.</p>
        <p>The weight attribute has three possible values (<xref ref-type="fig" rid="idm1842171292">Figure 5</xref>): small, medium and large. This attribute divides the parent set into three subsets: a set of small people, containing a male and two females; a set of medium people, containing two males and a female; and a last subset of large people, containing four males and a female.</p>
        <fig id="idm1842171292">
          <label>Figure 5.</label>
          <caption>
            <title> Attribute values</title>
          </caption>
          <graphic xlink:href="images/image12.jpg" mime-subtype="jpg"/>
        </fig>
        <sec id="idm1842759996">
          <title>Calculate the Impurity for Each Sub-Set</title>
          <fig id="idm1842167836">
            <graphic xlink:href="images/image13.png" mime-subtype="png"/>
          </fig>
          <fig id="idm1842169132">
            <graphic xlink:href="images/image14.png" mime-subtype="png"/>
          </fig>
          <fig id="idm1842168772">
            <graphic xlink:href="images/image15.png" mime-subtype="png"/>
          </fig>
        </sec>
        <sec id="idm1842757764">
          <title>Calculate the Information Gain</title>
          <fig id="idm1842165892">
            <graphic xlink:href="images/image16.png" mime-subtype="png"/>
          </fig>
          <p>When we divide the parent set by the weight attribute, the information gain gives the value 0.115.</p>
          <p>· Dividing the parent set by the “Age” attribute.</p>
          <p>The age attribute has two possible values, young and adult. The young value gives a subset with three males and two females, and the adult value gives a subset with four males and three females (<xref ref-type="fig" rid="idm1842165100">Figure 6</xref>).</p>
          <fig id="idm1842165100">
            <label>Figure 6.</label>
            <caption>
              <title> Attribute values</title>
            </caption>
            <graphic xlink:href="images/image17.jpg" mime-subtype="jpg"/>
          </fig>
        </sec>
        <sec id="idm1842764532">
          <title>Calculate the Impurity for Each Sub-Set</title>
          <fig id="idm1842179212">
            <graphic xlink:href="images/image18.png" mime-subtype="png"/>
          </fig>
          <fig id="idm1842180940">
            <graphic xlink:href="images/image19.png" mime-subtype="png"/>
          </fig>
        </sec>
        <sec id="idm1842760500">
          <title>Calculate the Information Gain</title>
          <fig id="idm1842178564">
            <graphic xlink:href="images/image20.png" mime-subtype="png"/>
          </fig>
          <p>The information gain provided by the weight attribute is much higher than that of the age attribute, and the subsets produced by the weight attribute are purer than those produced by the age attribute. Therefore, for this initial parent node, we take the weight attribute to divide it into purer subsets.</p>
          <p>(<xref ref-type="table" rid="idm1842177772">Table 1</xref>)</p>
          <table-wrap id="idm1842177772">
            <label>Table 1.</label>
            <caption>
              <title> Information gain</title>
            </caption>
            <table rules="all" frame="box">
              <tbody>
                <tr>
                  <td> </td>
                  <td>Weight</td>
                  <td>Age</td>
                </tr>
                <tr>
                  <td>Information Gain</td>
                  <td>
                    <bold>0.115</bold>
                  </td>
                  <td>0.00025</td>
                </tr>
              </tbody>
            </table>
          </table-wrap>
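          <p>To make the information gain formula concrete, here is a small Python sketch (our own illustration, not from the paper), assuming entropy as the impurity measure I(·). A split that separates the classes perfectly recovers all of the parent's entropy, while a split whose subsets are as mixed as the parent gains nothing:</p>

```python
from math import log2

def entropy(counts):
    """Entropy of a node from its class counts."""
    n = sum(counts)
    return -sum(c / n * log2(c / n) for c in counts if c > 0)

def information_gain(parent_counts, child_counts_list):
    """Gain = I(parent) - sum_j N(v_j)/N * I(v_j)."""
    n = sum(parent_counts)
    remainder = sum(sum(child) / n * entropy(child) for child in child_counts_list)
    return entropy(parent_counts) - remainder

# a split that separates the two classes perfectly recovers the full entropy
print(information_gain([6, 6], [[6, 0], [0, 6]]))  # 1.0
# a split whose subsets are as mixed as the parent gains nothing
print(information_gain([6, 6], [[3, 3], [3, 3]]))  # 0.0
```

          <p>Between these two extremes, the attribute with the larger gain (here, weight rather than age) is the one chosen to split the node.</p>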
          <p>What is the use of this information gain measure in the classification process? Moreover, how can it be used in the decision tree classifier to construct a model?</p>
          <p>These questions will be answered in details in the next section.</p>
        </sec>
      </sec>
      <sec id="idm1842738844">
        <title>The Core of Decision Tree Algorithms</title>
        <p>Decision tree algorithms use an initial training data set of dimension (m×n), where m denotes the number of rows in the training data set and n the number of attributes belonging to a record. Decision tree algorithms run through this search space recursively, a method called tree growing. During each recursive step, the tree-growing process must select an attribute test condition to divide the records into smaller subsets. To implement this step, the algorithm uses the information gain measure that we saw previously: it calculates the information gain for each attribute (this process is called the attribute test condition) and chooses the attribute test condition that maximizes the gain. A child node is then created for each outcome of the attribute test condition, and the records are distributed to the child nodes based on these outcomes. The tree-growing process is repeated recursively for each child node until all the records in a set belong to the same class or all the records have identical attribute values. Either condition is sufficient to stop any decision tree algorithm.</p>
        <p>The core and basis of many existing decision tree algorithms, such as ID3, C4.5 and CART, is Hunt’s algorithm <xref ref-type="bibr" rid="ridm1842341516">11</xref>. In Hunt’s algorithm, a decision tree is grown recursively by partitioning the training records into successively purer subsets. A skeleton of this algorithm, called “TreeGrowth”, is shown below. The input to this algorithm consists of the training records <italic>E </italic>and the list of attributes<italic> F.</italic></p>
        <sec id="idm1842738412">
          <title>TreeGrowth(<italic>E</italic>,<italic>F</italic>)</title>
          <p>1) If stopping_condition (<italic>E</italic>,<italic>F</italic>) = true then</p>
          <p>2)     Leaf_node = createNode()</p>
          <p>3)     Leaf_node = classify()</p>
          <p>4)     return leaf_node</p>
          <p>5) Else</p>
          <p>6)     root = createNode()</p>
          <p>7)     root_test_condition = findBestSplit(<italic>E</italic>,<italic>F</italic>)</p>
          <p>8)     let V = {<italic>v </italic>| <italic>v</italic> is a possible outcome of root_test_condition}</p>
          <p>9) <bold>     for</bold> each <italic>v</italic><inline-graphic xlink:href="images/image21.png" mime-subtype="png"/> V do</p>
          <p>10)          <inline-graphic xlink:href="images/image22.png" mime-subtype="png"/> = {<italic>e</italic> | root_test_condition(<italic>e</italic>) = <italic>v </italic>and <italic>e</italic><inline-graphic xlink:href="images/image23.png" mime-subtype="png"/>}</p>
          <p>11)          Child = TreeGrowth(<italic>E</italic><sub><italic>v</italic></sub>,<italic>F</italic>)</p>
          <p>12)          Add child as descendent of root and label the edge (root <inline-graphic xlink:href="images/image24.png" mime-subtype="png"/> child) as <italic>v</italic></p>
          <p>13)      end for</p>
          <p>14) end if</p>
          <p>15) return root</p>
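          <p>The skeleton above can be turned into a short runnable sketch in Python (our own illustration, with several assumptions: records are dictionaries mapping attribute names to values, entropy is the impurity measure, findBestSplit() maximizes information gain, and classify() returns the majority label):</p>

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Impurity I(.) of a node, from its list of class labels."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def info_gain(records, labels, attr):
    """Information gain obtained by splitting the node on attribute `attr`."""
    n = len(labels)
    groups = {}
    for rec, lab in zip(records, labels):
        groups.setdefault(rec[attr], []).append(lab)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def tree_growth(records, labels, attrs):
    """Hunt's algorithm: returns a leaf label or an internal {'attr', 'children'} node."""
    # stopping_condition(): the node is pure, or no attributes remain to test
    if len(set(labels)) == 1 or not attrs:
        return Counter(labels).most_common(1)[0][0]  # classify(): majority label
    # findBestSplit(): the attribute with maximum information gain
    best = max(attrs, key=lambda a: info_gain(records, labels, a))
    node = {"attr": best, "children": {}}
    remaining = [a for a in attrs if a != best]
    for v in {rec[best] for rec in records}:  # each outcome v of the test condition
        sub = [(r, l) for r, l in zip(records, labels) if r[best] == v]
        sub_recs, sub_labs = [r for r, _ in sub], [l for _, l in sub]
        node["children"][v] = tree_growth(sub_recs, sub_labs, remaining)
    return node

def predict(tree, record):
    """Follow the edges matching the record's attribute values down to a leaf."""
    while isinstance(tree, dict):
        tree = tree["children"][record[tree["attr"]]]
    return tree
```

          <p>Note that, as in line 11 of the pseudocode, each recursive call receives only the subset of records matching one outcome of the test condition.</p>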
        </sec>
        <sec id="idm1842746980">
          <title>The Detail of this Algorithm is Explained Below</title>
          <p>1) The stopping_condition() function returns true if all records have the same class label, or if all the attributes have the same values.</p>
          <p>2) The createNode() function extends the tree structure by adding a new node, either an internal node holding a root_test_cond or a leaf node holding a class label.</p>
          <p>3) The classify() function determines the class label to be assigned to a leaf node.</p>
          <p>4) The findBestSplit() function determines the attribute that produces the maximum information gain. </p>
          <p>This section discussed how a decision tree works and how it constructs a model based on Hunt’s algorithm, and showed the importance of the information gain measure, which is the core of the algorithm: it is the metric that allows the algorithm to learn how to partition the records and build the tree.</p>
          <p>Next, we will discuss the ID3 and C4.5 algorithms, both of which follow Hunt’s algorithm, and explain with examples how they work. We will first start with ID3, explain it, and present its limitations; then we will discuss its evolved version, C4.5, and why C4.5 performs better than ID3.</p>
        </sec>
      </sec>
      <sec id="idm1842743308">
        <title>ID3 Learning Algorithm</title>
        <p>The ID3 algorithm is considered a very simple decision tree-building algorithm. It determines the classification of objects or records by testing the values of their attributes. It builds a decision tree for the given data top-down, starting from a set of records and a set of attributes. At each node of the tree, one attribute is tested, chosen to maximize the information gain measure (and thereby minimize the entropy measure), and the result is used to split the records. This process is applied recursively until the records in a subtree are homogeneous (all the records belong to the same class). These homogeneous records become a leaf node of the decision tree <xref ref-type="bibr" rid="ridm1842362012">8</xref><xref ref-type="bibr" rid="ridm1842359276">9</xref><xref ref-type="bibr" rid="ridm1842343316">10</xref>.</p>
        <p>To illustrate the operation of ID3, consider the learning task represented by the training records of <xref ref-type="table" rid="idm1842112884">Table 2</xref> below. The class label is the attribute “gender”, which can have the values “male” or “female” for different people. The task is to build a model from the initial records described in the table below; this model will later be used to predict the class label “gender” for a new record with an unknown class label, based on a set of attributes (age, weight, length). Which attribute should be tested first in the tree? ID3 determines the information gain for each attribute (age, weight, length) and then selects the one with the highest information gain. </p>
        <table-wrap id="idm1842112884">
          <label>Table 2.</label>
          <caption>
            <title> Initial training data sets </title>
          </caption>
          <table rules="all" frame="box">
            <tbody>
              <tr>
                <td>
                  <bold>Age</bold>
                </td>
                <td>
                  <bold>Weight</bold>
                </td>
                <td>
                  <bold>Length</bold>
                </td>
                <td>
                  <bold>Gender</bold>
                </td>
              </tr>
              <tr>
                <td>Young</td>
                <td>Middle-weight</td>
                <td>Long</td>
                <td>male</td>
              </tr>
              <tr>
                <td>Adult</td>
                <td>Light-weight</td>
                <td>Short</td>
                <td>female</td>
              </tr>
              <tr>
                <td>Young</td>
                <td>Heavy-weight</td>
                <td>Medium</td>
                <td>male</td>
              </tr>
              <tr>
                <td>Adult</td>
                <td>Heavy-weight</td>
                <td>Long</td>
                <td>female</td>
              </tr>
              <tr>
                <td>Adult</td>
                <td>Heavy-weight</td>
                <td>Short</td>
                <td>female</td>
              </tr>
              <tr>
                <td>Young</td>
                <td>Middle-weight</td>
                <td>Medium</td>
                <td>male</td>
              </tr>
              <tr>
                <td>Young</td>
                <td>Middle-weight</td>
                <td>Medium</td>
                <td>female</td>
              </tr>
              <tr>
                <td>Adult</td>
                <td>Light-weight</td>
                <td>Long</td>
                <td>male</td>
              </tr>
              <tr>
                <td>Adult</td>
                <td>Heavy-weight</td>
                <td>Short</td>
                <td>female</td>
              </tr>
              <tr>
                <td>Young</td>
                <td>Middle-weight</td>
                <td>Medium</td>
                <td>female</td>
              </tr>
              <tr>
                <td>Adult</td>
                <td>Light-weight</td>
                <td>Medium</td>
                <td>female</td>
              </tr>
              <tr>
                <td>Adult</td>
                <td>Heavy-weight</td>
                <td>Short</td>
                <td>female</td>
              </tr>
              <tr>
                <td>Adult</td>
                <td>Middle-weight</td>
                <td>Long</td>
                <td>male</td>
              </tr>
              <tr>
                <td>Adult</td>
                <td>Light-weight</td>
                <td>Long</td>
                <td>female</td>
              </tr>
            </tbody>
          </table>
        </table-wrap>
        <p>All the attributes are categorical, which means that all their values are nominal (ID3 does not support continuous values; we will see why later). For the moment, we will calculate the impurity of this initial training data set, then calculate the information gain for every attribute to find the attribute that returns the maximum information gain; that attribute becomes the root node and splits the initial training data set into new subsets.</p>
        <p>· Calculate the impurity for the initial training data set.</p>
        <p>We have five males and nine females, so the entropy is equal to:</p>
        <fig id="idm1842044620">
          <graphic xlink:href="images/image25.png" mime-subtype="png"/>
        </fig>
        <fig id="idm1842042028">
          <graphic xlink:href="images/image26.png" mime-subtype="png"/>
        </fig>
        <p>· Calculate the information gain for the “Age” attribute</p>
        <p>The age attribute has two distinct values, which are young and adult. The first step, before calculating the information gain, is to calculate the impurity for each value.</p>
        <p>Calculate the entropy for the “Young” value of the “Age” attribute:</p>
        <fig id="idm1842041740">
          <graphic xlink:href="images/image27.png" mime-subtype="png"/>
        </fig>
        <fig id="idm1842041164">
          <graphic xlink:href="images/image28.png" mime-subtype="png"/>
        </fig>
        <p>Calculate the entropy for the “adult” value of the “age” attribute:</p>
        <fig id="idm1842039652">
          <graphic xlink:href="images/image29.png" mime-subtype="png"/>
        </fig>
        <fig id="idm1842039868">
          <graphic xlink:href="images/image30.png" mime-subtype="png"/>
        </fig>
        <p>The information gain for the “age” attribute is equal to:</p>
        <fig id="idm1842038356">
          <graphic xlink:href="images/image31.png" mime-subtype="png"/>
        </fig>
        <fig id="idm1842037492">
          <graphic xlink:href="images/image32.png" mime-subtype="png"/>
        </fig>
        <p>Calculate the information gain for the “weight” attribute.</p>
        <fig id="idm1842036844">
          <graphic xlink:href="images/image33.png" mime-subtype="png"/>
        </fig>
        <fig id="idm1842035836">
          <graphic xlink:href="images/image34.png" mime-subtype="png"/>
        </fig>
        <fig id="idm1842035332">
          <graphic xlink:href="images/image35.png" mime-subtype="png"/>
        </fig>
        <fig id="idm1842033460">
          <graphic xlink:href="images/image36.png" mime-subtype="png"/>
        </fig>
        <fig id="idm1842034324">
          <graphic xlink:href="images/image37.png" mime-subtype="png"/>
        </fig>
        <fig id="idm1842033820">
          <graphic xlink:href="images/image38.png" mime-subtype="png"/>
        </fig>
        <fig id="idm1842032020">
          <graphic xlink:href="images/image39.png" mime-subtype="png"/>
        </fig>
        <fig id="idm1842062908">
          <graphic xlink:href="images/image40.png" mime-subtype="png"/>
        </fig>
        <p>Calculate the information gain for the “length” attribute.</p>
        <fig id="idm1842063268">
          <graphic xlink:href="images/image41.png" mime-subtype="png"/>
        </fig>
        <fig id="idm1842061540">
          <graphic xlink:href="images/image42.png" mime-subtype="png"/>
        </fig>
        <fig id="idm1842060028">
          <graphic xlink:href="images/image43.png" mime-subtype="png"/>
        </fig>
        <fig id="idm1842059956">
          <graphic xlink:href="images/image44.png" mime-subtype="png"/>
        </fig>
        <fig id="idm1842058300">
          <graphic xlink:href="images/image45.png" mime-subtype="png"/>
        </fig>
        <fig id="idm1842058012">
          <graphic xlink:href="images/image46.png" mime-subtype="png"/>
        </fig>
        <fig id="idm1842058804">
          <graphic xlink:href="images/image47.png" mime-subtype="png"/>
        </fig>
        <fig id="idm1842057364">
          <graphic xlink:href="images/image48.png" mime-subtype="png"/>
        </fig>
        <p>The <xref ref-type="table" rid="idm1842056716">Table 3</xref> summarizes all the results obtained above.</p>
        <table-wrap id="idm1842056716">
          <label>Table 3.</label>
          <caption>
            <title> Information gain by ID3 </title>
          </caption>
          <table rules="all" frame="box">
            <tbody>
              <tr>
                <td> </td>
                <td>
                  <bold>Age</bold>
                </td>
                <td>
                  <bold>Weight</bold>
                </td>
                <td>
                  <bold>Length</bold>
                </td>
              </tr>
              <tr>
                <td>Information Gain</td>
                <td>0.102</td>
                <td>0.104</td>
                <td>
                  <bold>0.247</bold>
                </td>
              </tr>
            </tbody>
          </table>
        </table-wrap>
        <p>The length attribute returns the maximum information gain, so the ID3 algorithm will split the initial records based on the values of this attribute. Each value becomes an outcome of the root node. The length attribute produces three outcomes: short, medium and long. The split therefore also produces three sub-sets, each containing the records whose “length” value matches the corresponding outcome. The ID3 algorithm creates the first root node (<xref ref-type="fig" rid="idm1841984228">Figure 6</xref> a) and its outcomes as described below: </p>
        <fig id="idm1841984228">
          <label>Figure 6a.</label>
          <caption>
            <title> The root node of the decision tree.</title>
          </caption>
          <graphic xlink:href="images/image49.jpg" mime-subtype="jpg"/>
        </fig>
        <p>In the outcome associated with the “short” value, the class label has the same value for all the records. Therefore, ID3 ends at this child node and creates a leaf node with the value “female”. For the other child nodes, the stopping conditions are not satisfied, so the algorithm repeats the process until it reaches a homogenous sub node. At the end of the process, the ID3 algorithm generates the tree model shown below.</p>
        <p>The root node and the internal nodes are in green, the leaf nodes are in red, and the outcomes are in orange.</p>
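        <p>The recursive splitting procedure described above can be sketched in Python. This is a minimal illustration under the definitions used in this article; the function and variable names are our own, not from the paper:</p>

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def id3(records, labels, attributes):
    """Recursively build a decision tree as nested dicts.

    records: list of dicts mapping attribute name -> categorical value.
    labels:  class label of each record, in the same order.
    """
    # Stopping condition: homogeneous subset -> leaf node.
    if len(set(labels)) == 1:
        return labels[0]
    # Stopping condition: no attributes left -> majority-vote leaf.
    if not attributes:
        return Counter(labels).most_common(1)[0][0]

    def gain(attr):
        # Information gain = entropy of the set minus the weighted
        # entropy of the subsets produced by splitting on attr.
        n = len(records)
        split = {}
        for rec, lab in zip(records, labels):
            split.setdefault(rec[attr], []).append(lab)
        return entropy(labels) - sum(len(s) / n * entropy(s) for s in split.values())

    # Split on the attribute with the highest information gain;
    # one outcome (branch) per distinct value of that attribute.
    best = max(attributes, key=gain)
    tree = {best: {}}
    remaining = [a for a in attributes if a != best]
    for value in {rec[best] for rec in records}:
        sub = [(r, l) for r, l in zip(records, labels) if r[best] == value]
        sub_records, sub_labels = zip(*sub)
        tree[best][value] = id3(list(sub_records), list(sub_labels), remaining)
    return tree
```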
        <p>In general, a decision tree represents a disjunction of conjunctions on the attribute values of records. Each path from the tree root to a leaf node corresponds to a conjunction of attribute tests, and the tree itself corresponds to a disjunction of these conjunctions. For example, the decision tree shown in <xref ref-type="fig" rid="idm1841983868">Figure 7</xref> corresponds to the expression</p>
        <fig id="idm1841983868">
          <label>Figure 7.</label>
          <caption>
            <title> The full decision tree model.</title>
          </caption>
          <graphic xlink:href="images/image50.jpg" mime-subtype="jpg"/>
        </fig>
        <fig id="idm1841982284">
          <graphic xlink:href="images/image51.png" mime-subtype="png"/>
        </fig>
        <fig id="idm1841982068">
          <graphic xlink:href="images/image52.png" mime-subtype="png"/>
        </fig>
        <fig id="idm1841981420">
          <graphic xlink:href="images/image53.png" mime-subtype="png"/>
        </fig>
        <fig id="idm1842013748">
          <graphic xlink:href="images/image54.png" mime-subtype="png"/>
        </fig>
        <fig id="idm1842012092">
          <graphic xlink:href="images/image55.png" mime-subtype="png"/>
        </fig>
        <fig id="idm1842012956">
          <graphic xlink:href="images/image56.png" mime-subtype="png"/>
        </fig>
        <fig id="idm1842011516">
          <graphic xlink:href="images/image57.png" mime-subtype="png"/>
        </fig>
        <p>The above disjunction of conjunctions can be used as a function to predict new records with an unknown class label “gender”. Each disjunct becomes an “if” statement and each conjunct becomes a condition testing the value of an attribute of the new record. We generate five rules from this decision tree model to predict new records, as shown below.</p>
      </sec>
      <sec id="idm1842645996">
        <title>Model_to_Rules(New_Record)</title>
        <p>if new_record(Length) = “Short” then</p>
        <p>return female</p>
        <p>else if new_record(Length) = “Medium” AND (new_record(weight) = “Middle” OR</p>
        <p>new_record(weight) = “Light”) then</p>
        <p>return female</p>
        <p>else if new_record(Length) = “Medium” AND new_record(weight) = “Heavy” then</p>
        <p>return male</p>
        <p>else if new_record(Length) = “Long” AND (new_record(weight) = “Light” OR new_record(weight) = “Heavy”) then</p>
        <p>return female</p>
        <p>else if new_record(Length) = “Long” AND new_record(weight) = “Middle” then</p>
        <p>return male</p>
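        <p>The five rules above translate directly into code. The sketch below assumes the new record is a dictionary with categorical “Length” and “Weight” values; the function name is illustrative:</p>

```python
def model_to_rules(new_record):
    """Predict 'gender' from the five decision-tree rules derived above.

    new_record: dict with keys 'Length' and 'Weight' (categorical values).
    """
    length, weight = new_record["Length"], new_record["Weight"]
    if length == "Short":
        return "female"
    elif length == "Medium" and weight in ("Middle", "Light"):
        return "female"
    elif length == "Medium" and weight == "Heavy":
        return "male"
    elif length == "Long" and weight in ("Light", "Heavy"):
        return "female"
    elif length == "Long" and weight == "Middle":
        return "male"
```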
      </sec>
      <sec id="idm1842642756">
        <title>Limitation of ID3 Algorithm</title>
        <p>   In real-world data, a dataset can contain different types of data, such as Boolean, categorical and continuous data. In this section we work with the previous training dataset, but with a small difference: the weight attribute is defined by continuous values instead of categorical values, in order to see how ID3 reacts to this change and to understand ID3’s limitation with numeric data. The table below (<xref ref-type="table" rid="idm1842006404">Table 4</xref>) shows the training dataset with two categorical attributes (age and length), one continuous attribute (weight) and a class label (gender). </p>
        <table-wrap id="idm1842006404">
          <label>Table 4.</label>
          <caption>
            <title> The initial training data set.</title>
          </caption>
          <table rules="all" frame="box">
            <tbody>
              <tr>
                <th>
                  <bold>Age</bold>
                </th>
                <td>
                  <bold>Weight</bold>
                </td>
                <td>
                  <bold>Length</bold>
                </td>
                <td>
                  <bold>Gender</bold>
                </td>
              </tr>
              <tr>
                <td>Young</td>
                <td>72</td>
                <td>Long</td>
                <td>male</td>
              </tr>
              <tr>
                <td>Adult</td>
                <td>52</td>
                <td>Short</td>
                <td>female</td>
              </tr>
              <tr>
                <td>Young</td>
                <td>92</td>
                <td>Medium</td>
                <td>male</td>
              </tr>
              <tr>
                <td>Adult</td>
                <td>76</td>
                <td>Long</td>
                <td>female</td>
              </tr>
              <tr>
                <td>Adult</td>
                <td>70</td>
                <td>Short</td>
                <td>female</td>
              </tr>
              <tr>
                <td>Young</td>
                <td>67</td>
                <td>Medium</td>
                <td>male</td>
              </tr>
              <tr>
                <td>Young</td>
                <td>60</td>
                <td>Medium</td>
                <td>female</td>
              </tr>
              <tr>
                <td>Adult</td>
                <td>62</td>
                <td>Long</td>
                <td>male</td>
              </tr>
              <tr>
                <td>Adult</td>
                <td>74</td>
                <td>Short</td>
                <td>female</td>
              </tr>
              <tr>
                <td>Young</td>
                <td>58</td>
                <td>Medium</td>
                <td>female</td>
              </tr>
              <tr>
                <td>Adult</td>
                <td>59</td>
                <td>Medium</td>
                <td>female</td>
              </tr>
              <tr>
                <td>Adult</td>
                <td>75</td>
                <td>Short</td>
                <td>female</td>
              </tr>
              <tr>
                <td>Adult</td>
                <td>71</td>
                <td>Long</td>
                <td>male</td>
              </tr>
              <tr>
                <td>Adult</td>
                <td>61</td>
                <td>Long</td>
                <td>female</td>
              </tr>
            </tbody>
          </table>
        </table-wrap>
        <p>The process is the same: the goal is to build a decision tree model, and the ID3 algorithm first needs to calculate the information gain for each attribute. The values for the age and length attributes remain the same, because we did not change the values of these two attributes; the only difference concerns the weight attribute. </p>
        <p>· Information gain for the “Age” attribute is equal to:</p>
        <fig id="idm1841952980">
          <graphic xlink:href="images/image32.png" mime-subtype="png"/>
        </fig>
        <p>· Information gain for the “Length” attribute is equal to:</p>
        <fig id="idm1841951756">
          <graphic xlink:href="images/image48.png" mime-subtype="png"/>
        </fig>
        <p>· Calculate the information gain for the “Weight” attribute:</p>
        <p>Before calculating the information gain, we must first calculate the impurity for each possible value of the “weight” attribute. The weight attribute produces fourteen values, each linked to a single record, so the entropy of each value is zero. Here is an example for the value “72”; its entropy is equal to:</p>
        <fig id="idm1841951324">
          <graphic xlink:href="images/image58.png" mime-subtype="png"/>
        </fig>
        <p>Since the entropy for all the values is zero, the information gain of the “weight” attribute is equal to the entropy of the initial dataset:</p>
        <fig id="idm1841948588">
          <graphic xlink:href="images/image59.png" mime-subtype="png"/>
        </fig>
        <fig id="idm1841949308">
          <graphic xlink:href="images/image60.png" mime-subtype="png"/>
        </fig>
        <p>We can recognize, without calculation, that the “weight” attribute produces the highest information gain, because it generates fourteen pure subsets, each containing only one record belonging to one class. ID3 ends here because there are no more records to split and all the sub-sets are homogenous. The decision tree model generated is shown in <xref ref-type="fig" rid="idm1841945780">Figure 8</xref>.</p>
        <fig id="idm1841945780">
          <label>Figure 8.</label>
          <caption>
            <title> Decision tree model generated by ID3 with numeric attribute.</title>
          </caption>
          <graphic xlink:href="images/image61.jpg" mime-subtype="jpg"/>
        </fig>
        <p>The above training dataset has only one numeric attribute, and the ID3 algorithm fails to generate a model that will generalize to new records. The ID3 algorithm is less effective for numeric attributes; with a training dataset that has more than one numeric attribute, or noisy and missing data, ID3 is a bad choice. A better solution for this type of dataset is the C4.5 algorithm, which we discover in the next section.</p>
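        <p>The collapse described above can be checked numerically. The sketch below uses the class labels from Table 4 (5 male, 9 female) and an entropy helper of our own:</p>

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

# Class labels of the 14 records in Table 4 (5 male, 9 female).
labels = ["male"] * 5 + ["female"] * 9

# Treating every distinct weight as its own branch gives 14 singleton
# subsets, each with entropy 0, so the weighted child entropy is 0 and
# the information gain collapses to the entropy of the whole dataset.
dataset_entropy = entropy(labels)                              # ~0.940
child_entropy = sum((1 / 14) * entropy([l]) for l in labels)   # 0
gain_weight = dataset_entropy - child_entropy
```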
      </sec>
      <sec id="idm1842590268">
        <title>C4.5 Learning Algorithm</title>
        <p>C4.5 is an evolution of ID3, presented by the same author (Quinlan, 1993). The C4.5 algorithm generates a decision tree for a given dataset by recursively splitting the records. C4.5 handles both categorical and numeric attributes. For each categorical attribute, C4.5 calculates the information gain, selects the attribute with the highest value, and uses it to produce as many outcomes as there are distinct values of this attribute <xref ref-type="bibr" rid="ridm1842362012">8</xref><xref ref-type="bibr" rid="ridm1842359276">9</xref><xref ref-type="bibr" rid="ridm1842343316">10</xref>. For each numeric attribute there are two methods to calculate the information gain: the first consists in calculating the gain ratio, and the second consists in calculating the information gain as in ID3, but with some differences. </p>
        <p>As an example, we take the training dataset that we already used in the ID3 limitation section.</p>
        <p>The calculations for categorical attributes do not change: C4.5 uses the same process as ID3 for them. The main difference between ID3 and C4.5 concerns numeric attributes, for which C4.5 offers two methods of handling numeric values.</p>
        <p>· First method</p>
        <p>What is wrong with the “weight” attribute? Simply, it has so many possible values that it separates the training records into very small sub-sets. Because of this, the weight attribute will have the highest information gain relative to the initial training dataset, and the resulting model will be a very poor predictor of the target class label on new records. </p>
        <p>One way to avoid this difficulty is to select decision attributes based on some measure other than information gain. One alternative measure that has been used successfully is the gain ratio. The gain ratio measure penalizes attributes such as the weight attribute by incorporating a term called split information <xref ref-type="bibr" rid="ridm1842341516">11</xref>, which is sensitive to how uniformly the attribute splits the data:</p>
        <fig id="idm1841944340">
          <graphic xlink:href="images/image62.png" mime-subtype="png"/>
        </fig>
        <p>Where c is the total number of splits, and p(<italic>v</italic><sub><italic>j</italic></sub>) is the fraction of the records associated with a split, divided by the total number of records. For example, the weight attribute has fourteen possible values, so the number of splits c is equal to fourteen, and the number of records associated with each possible value is equal to one. Each attribute value has the same number of records, then:</p>
        <fig id="idm1841941028">
          <graphic xlink:href="images/image63.png" mime-subtype="png"/>
        </fig>
        <p>Where N(<italic>v</italic><sub><italic>j</italic></sub>) is the number of records associated with a split (a unique value), and N is the total number of records.</p>
        <fig id="idm1841941676">
          <graphic xlink:href="images/image64.png" mime-subtype="png"/>
        </fig>
        <p>Each attribute value has the same number of records, so the split info will be equal to:</p>
        <fig id="idm1841938436">
          <graphic xlink:href="images/image65.png" mime-subtype="png"/>
        </fig>
        <p>To determine the goodness of a split, we need to use a criterion known as gain ratio. This criterion is defined as follows:</p>
        <fig id="idm1841939372">
          <graphic xlink:href="images/image66.png" mime-subtype="png"/>
        </fig>
        <p>The information gain of the weight attribute is equal to the entropy of the initial dataset, as discussed previously in the fifth section. The gain ratio of the weight attribute is equal to:</p>
        <fig id="idm1841937284">
          <graphic xlink:href="images/image67.png" mime-subtype="png"/>
        </fig>
        <p>This example suggests that if an attribute produces a large number of splits, its split information will also be large, which in turn reduces its gain ratio.</p>
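        <p>The split information and gain ratio computed above can be verified with a short sketch; the information gain value 0.940 (the dataset entropy) is taken from the article, and the helper name is our own:</p>

```python
import math

def split_info(counts):
    """Split information: entropy of the partition sizes themselves."""
    n = sum(counts)
    return -sum((c / n) * math.log2(c / n) for c in counts)

# The weight attribute splits the 14 records into 14 subsets of 1 record.
si = split_info([1] * 14)           # log2(14), about 3.807
info_gain = 0.940                   # gain of weight = dataset entropy
gain_ratio = info_gain / si         # about 0.247: heavily penalized
```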
        <p>The information gain for all the attributes is given in <xref ref-type="table" rid="idm1841937932">Table 5</xref>.</p>
        <table-wrap id="idm1841937932">
          <label>Table 5.</label>
          <caption>
            <title> Information Gain</title>
          </caption>
          <table rules="all" frame="box">
            <tbody>
              <tr>
                <td> </td>
                <td>Age</td>
                <td>Weight</td>
                <td>Length</td>
              </tr>
              <tr>
                <td>Information Gain</td>
                <td>0.102</td>
                <td>0.246</td>
                <td>
                  <bold>0.247</bold>
                </td>
              </tr>
            </tbody>
          </table>
        </table-wrap>
        <p>Using this first technique in C4.5, the “length” attribute still provides greater information gain than the “age” and “weight” attributes.</p>
        <p>· Second method</p>
        <p>This method consists in considering each value of a numeric attribute as a candidate threshold. For each candidate (numeric value), select the set of records less than or equal to this candidate and the set of records greater than it, then calculate the entropy of the two sets and the information gain of the candidate over these two sets of records <xref ref-type="bibr" rid="ridm1842363308">7</xref>. Here is a full example (<xref ref-type="table" rid="idm1842006404">Table 4</xref>) of how this method is applied to the weight attribute. Consider the candidate with the value “61”. This candidate divides the initial training dataset into two sub-sets: the first with the records whose weight is less than or equal to “61”, and the second with the records whose weight is greater than “61”. The calculations of entropy and information gain are presented in <xref ref-type="table" rid="idm1841882668">Table 6</xref>.</p>
        <table-wrap id="idm1841882668">
          <label>Table 6.</label>
          <caption>
            <title> Calculations for each value of weight attribute</title>
          </caption>
          <table rules="all" frame="box">
            <tbody>
              <tr>
                <th colspan="2">
                  <bold> </bold>
                </th>
                <td colspan="2">
                  <bold>classes</bold>
                </td>
                <td>
                  <bold>Entropy</bold>
                </td>
                <td>
                  <bold>Gain</bold>
                </td>
              </tr>
              <tr>
                <td colspan="2"/>
                <td>male</td>
                <td>female</td>
                <td/>
                <td/>
              </tr>
              <tr>
                <td>
                  <bold>52</bold>
                </td>
                <td>&lt;=</td>
                <td>0</td>
                <td>1</td>
                <td>0</td>
                <td>0.0197</td>
              </tr>
              <tr>
                <td/>
                <td>&gt; </td>
                <td>5</td>
                <td>8</td>
                <td>0.991</td>
                <td/>
              </tr>
              <tr>
                <td>
                  <bold>58</bold>
                </td>
                <td>&lt;=</td>
                <td>0</td>
                <td>2</td>
                <td>0</td>
                <td>0.100</td>
              </tr>
              <tr>
                <td/>
                <td>&gt; </td>
                <td>5</td>
                <td>7</td>
                <td>0.979</td>
                <td/>
              </tr>
              <tr>
                <td>
                  <bold>59</bold>
                </td>
                <td>&lt;=</td>
                <td>0</td>
                <td>3</td>
                <td>0</td>
                <td>0.159</td>
              </tr>
              <tr>
                <td/>
                <td>&gt; </td>
                <td>5</td>
                <td>6</td>
                <td>0.994</td>
                <td/>
              </tr>
              <tr>
                <td>
                  <bold>60</bold>
                </td>
                <td>&lt;=</td>
                <td>0</td>
                <td>4</td>
                <td>0</td>
                <td>0.225</td>
              </tr>
              <tr>
                <td/>
                <td>&gt; </td>
                <td>5</td>
                <td>5</td>
                <td>1</td>
                <td/>
              </tr>
              <tr>
                <td>
                  <bold>61</bold>
                </td>
                <td>&lt;=</td>
                <td>0</td>
                <td>5</td>
                <td>0</td>
                <td>
                  <bold>0.302</bold>
                </td>
              </tr>
              <tr>
                <td/>
                <td>&gt; </td>
                <td>5</td>
                <td>4</td>
                <td>0.991</td>
                <td/>
              </tr>
              <tr>
                <td>
                  <bold>62</bold>
                </td>
                <td>&lt;=</td>
                <td>1</td>
                <td>5</td>
                <td>0.650</td>
                <td>0.09</td>
              </tr>
              <tr>
                <td/>
                <td>&gt; </td>
                <td>4</td>
                <td>4</td>
                <td>1</td>
                <td/>
              </tr>
              <tr>
                <td>
                  <bold>67</bold>
                </td>
                <td>&lt;=</td>
                <td>2</td>
                <td>5</td>
                <td>0.863</td>
                <td>0.016</td>
              </tr>
              <tr>
                <td/>
                <td>&gt; </td>
                <td>3</td>
                <td>4</td>
                <td>0.985</td>
                <td/>
              </tr>
              <tr>
                <td>
                  <bold>70</bold>
                </td>
                <td>&lt;=</td>
                <td>2</td>
                <td>6</td>
                <td>0.811</td>
                <td>0.048</td>
              </tr>
              <tr>
                <td/>
                <td>&gt; </td>
                <td>3</td>
                <td>3</td>
                <td>1</td>
                <td/>
              </tr>
              <tr>
                <td>
                  <bold>71</bold>
                </td>
                <td>&lt;=</td>
                <td>3</td>
                <td>6</td>
                <td>0.918</td>
                <td>0.0034</td>
              </tr>
              <tr>
                <td/>
                <td>&gt; </td>
                <td>2</td>
                <td>3</td>
                <td>0.970</td>
                <td/>
              </tr>
              <tr>
                <td>
                  <bold>72</bold>
                </td>
                <td>&lt;=</td>
                <td>4</td>
                <td>6</td>
                <td>0.970</td>
                <td>0.015</td>
              </tr>
              <tr>
                <td/>
                <td>&gt; </td>
                <td>1</td>
                <td>3</td>
                <td>0.811</td>
                <td/>
              </tr>
              <tr>
                <td>
                  <bold>74</bold>
                </td>
                <td>&lt;=</td>
                <td>4</td>
                <td>7</td>
                <td>0.945</td>
                <td>0.00078</td>
              </tr>
              <tr>
                <td/>
                <td>&gt; </td>
                <td>1</td>
                <td>2</td>
                <td>0.918</td>
                <td/>
              </tr>
              <tr>
                <td>
                  <bold>75</bold>
                </td>
                <td>&lt;=</td>
                <td>4</td>
                <td>8</td>
                <td>0.918</td>
                <td>0.0102</td>
              </tr>
              <tr>
                <td/>
                <td>&gt; </td>
                <td>1</td>
                <td>1</td>
                <td>1</td>
                <td/>
              </tr>
              <tr>
                <td>
                  <bold>76</bold>
                </td>
                <td>&lt;=</td>
                <td>4</td>
                <td>9</td>
                <td>0.890</td>
                <td>0.113</td>
              </tr>
              <tr>
                <td/>
                <td>&gt; </td>
                <td>1</td>
                <td>0</td>
                <td>0</td>
                <td/>
              </tr>
              <tr>
                <td>
                  <bold>92</bold>
                </td>
                <td>&lt;=</td>
                <td>5</td>
                <td>9</td>
                <td>0.940</td>
                <td>0</td>
              </tr>
              <tr>
                <td/>
                <td>&gt; </td>
                <td>0</td>
                <td>0</td>
                <td>0</td>
                <td/>
              </tr>
            </tbody>
          </table>
        </table-wrap>
        <fig id="idm1841670500">
          <graphic xlink:href="images/image68.png" mime-subtype="png"/>
        </fig>
        <fig id="idm1841670212">
          <graphic xlink:href="images/image69.png" mime-subtype="png"/>
        </fig>
        <fig id="idm1841668412">
          <graphic xlink:href="images/image70.png" mime-subtype="png"/>
        </fig>
        <fig id="idm1841667836">
          <graphic xlink:href="images/image71.png" mime-subtype="png"/>
        </fig>
        <p>The information gain of the value “61” becomes the information gain of the weight attribute (<xref ref-type="table" rid="idm1841665172">Table 7</xref>).</p>
        <table-wrap id="idm1841665172">
          <label>Table 7.</label>
          <caption>
            <title> Information gain</title>
          </caption>
          <table rules="all" frame="box">
            <tbody>
              <tr>
                <td> </td>
                <td>Age</td>
                <td>Weight</td>
                <td>Length</td>
              </tr>
              <tr>
                <td>Information Gain</td>
                <td>0.102</td>
                <td>
                  <bold>0.302</bold>
                </td>
                <td>0.247</td>
              </tr>
            </tbody>
          </table>
        </table-wrap>
        <p>C4.5 will split the records into two sub-sets, the first subset with records whose weight value is less than or equal to “61”, the second subset with records whose weight value is greater than “61”.</p>
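        <p>The threshold search described in this method can be sketched as follows, using the weight values and genders from Table 4; the function and variable names are illustrative:</p>

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels (0 for an empty set)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

# (weight, gender) pairs from Table 4.
data = [(72, "male"), (52, "female"), (92, "male"), (76, "female"),
        (70, "female"), (67, "male"), (60, "female"), (62, "male"),
        (74, "female"), (58, "female"), (59, "female"), (75, "female"),
        (71, "male"), (61, "female")]

labels = [g for _, g in data]
base = entropy(labels)   # entropy of the initial dataset, ~0.940

def gain_for_threshold(t):
    """Information gain of splitting at weight <= t versus weight > t."""
    left = [g for w, g in data if w <= t]
    right = [g for w, g in data if w > t]
    n = len(data)
    return base - (len(left) / n) * entropy(left) - (len(right) / n) * entropy(right)

# Evaluate every distinct weight value as a candidate threshold and
# keep the one with the highest information gain.
best = max({w for w, _ in data}, key=gain_for_threshold)
```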
        <p>· The first subset  (<xref ref-type="table" rid="idm1841658476">Table 8</xref>) </p>
        <table-wrap id="idm1841658476">
          <label>Table 8.</label>
          <caption>
            <title> Same class label</title>
          </caption>
          <table rules="all" frame="box">
            <tbody>
              <tr>
                <th>
                  <bold>Age</bold>
                </th>
                <td>
                  <bold>Length</bold>
                </td>
                <td>
                  <bold>Gender</bold>
                </td>
              </tr>
              <tr>
                <td>Adult</td>
                <td>Short</td>
                <td>female</td>
              </tr>
              <tr>
                <td>Young</td>
                <td>Medium</td>
                <td>female</td>
              </tr>
              <tr>
                <td>Young</td>
                <td>Medium</td>
                <td>female</td>
              </tr>
              <tr>
                <td>Adult</td>
                <td>Medium</td>
                <td>female</td>
              </tr>
              <tr>
                <td>Adult</td>
                <td>Long</td>
                <td>female</td>
              </tr>
            </tbody>
          </table>
        </table-wrap>
        <p>All of the records in this subset have the same class label, so the subset is homogenous; C4.5 ends at this node and creates a leaf node with the value “female”.</p>
        <p>· The second subset.</p>
        <p>This second subset is heterogeneous (<xref ref-type="table" rid="idm1841712116">Table 9</xref>), so the C4.5 algorithm repeats the process until reaching homogenous subsets.</p>
        <table-wrap id="idm1841712116">
          <label>Table 9.</label>
          <caption>
            <title> Heterogeneous class label</title>
          </caption>
          <table rules="all" frame="box">
            <tbody>
              <tr>
                <th>
                  <bold>Age</bold>
                </th>
                <td>
                  <bold>Length</bold>
                </td>
                <td>
                  <bold>Gender</bold>
                </td>
              </tr>
              <tr>
                <td>Young</td>
                <td>Long</td>
                <td>male</td>
              </tr>
              <tr>
                <td>Young</td>
                <td>Medium</td>
                <td>male</td>
              </tr>
              <tr>
                <td>Adult</td>
                <td>Long</td>
                <td>female</td>
              </tr>
              <tr>
                <td>Adult</td>
                <td>Short</td>
                <td>female</td>
              </tr>
              <tr>
                <td>Young</td>
                <td>Medium</td>
                <td>male</td>
              </tr>
              <tr>
                <td>Adult</td>
                <td>Long</td>
                <td>male</td>
              </tr>
              <tr>
                <td>Adult</td>
                <td>Short</td>
                <td>female</td>
              </tr>
              <tr>
                <td>Adult</td>
                <td>Short</td>
                <td>female</td>
              </tr>
              <tr>
                <td>Adult</td>
                <td>Long</td>
                <td>male</td>
              </tr>
            </tbody>
          </table>
        </table-wrap>
        <p>The C4.5 algorithm has many advantages over the ID3 algorithm <xref ref-type="bibr" rid="ridm1842359276">9</xref>. One of the main advantages is handling both continuous and categorical attributes: for a continuous attribute, as we saw above, C4.5 creates a threshold and then divides the records into those whose attribute value is greater than the threshold and those whose value is less than or equal to it. The other advantages are: </p>
        <p>· Handling training datasets with missing values: real-world datasets are not perfect and may contain noisy or missing values.</p>
        <p>· Pruning trees after creation: C4.5 removes branches that are repeated or unnecessary, replacing them with leaf nodes.</p>
      </sec>
    </sec>
    <sec id="idm1842397836" sec-type="conclusions">
      <title>Conclusion</title>
      <p> In this article, we explained the entropy and information gain measures, discussed their usefulness and importance, and showed how they are used in decision tree algorithms to build a model that is later used for prediction. This article also gives an overview of the ID3 and C4.5 algorithms and explains how they work. In future work we will use these algorithms for a classification project.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ridm1842508636">
        <label>1.</label>
        <mixed-citation xlink:type="simple" publication-type="book">
          <name>
            <surname>Fitrani</surname>
            <given-names>A S</given-names>
          </name>
          <name>
            <surname>Rosid</surname>
            <given-names>M A</given-names>
          </name>
          <name>
            <surname>Findawati</surname>
            <given-names>Y</given-names>
          </name>
          <name>
            <surname>Rahmawati</surname>
            <given-names>Y</given-names>
          </name>
          <name>
            <surname>Anam</surname>
            <given-names>A K</given-names>
          </name>
          <article-title>Implementation of ID3 algorithm classification using web based weka</article-title>
          <date>
            <year>2019</year>
          </date>
          <chapter-title>The 1st International Conference on Engineering and Applied Science, Journal of Physics: Conference Series. IOP Publishing. doi: 10.1088/1742-6596/1381/1/012036</chapter-title>
          <fpage>1381</fpage>
          <lpage>012036</lpage>
        </mixed-citation>
      </ref>
      <ref id="ridm1842577908">
        <label>2.</label>
        <mixed-citation xlink:type="simple" publication-type="book">
          <name>
            <surname>Wang</surname>
            <given-names>Yingying</given-names>
          </name>
          <name>
            <surname>Li</surname>
            <given-names>Yibin</given-names>
          </name>
          <name>
            <surname>Song</surname>
            <given-names>Yong</given-names>
          </name>
          <name>
            <surname>Rong</surname>
            <given-names>Xuewen</given-names>
          </name>
          <name>
            <surname>Zhang</surname>
            <given-names>Shuaishuai</given-names>
          </name>
          <date>
            <year>2017</year>
          </date>
          <chapter-title>Improvement of ID3 Algorithm Based on Simplified Information Entropy and Coordination Degree, Algorithms, doi: 10.3390/a10040124</chapter-title>
          <volume>10</volume>
          <fpage>124</fpage>
          <pub-id pub-id-type="doi">10.3390/a10040124</pub-id>
        </mixed-citation>
      </ref>
      <ref id="ridm1842583956">
        <label>3.</label>
        <mixed-citation xlink:type="simple" publication-type="journal">
          <name>
            <surname>Sharma</surname>
            <given-names>Seema</given-names>
          </name>
          <name>
            <surname>Agrawal</surname>
            <given-names>Jitendra</given-names>
          </name>
          <name>
            <surname>Sharma</surname>
            <given-names>Sanjeev</given-names>
          </name>
          <article-title>Classification through Machine Learning Technique: C4.5 Algorithm based on Various Entropies</article-title>
          <date>
            <year>2013</year>
          </date>
          <source>International Journal of Computer Applications (0975-8887), 82(16), November 2013</source>
        </mixed-citation>
      </ref>
      <ref id="ridm1842373972">
        <label>4.</label>
        <mixed-citation xlink:type="simple" publication-type="book">
          <name>
            <surname>Sudrajat</surname>
            <given-names>R</given-names>
          </name>
          <name>
            <surname>Irianingsih</surname>
            <given-names>I</given-names>
          </name>
          <name>
            <surname>Krisnawan</surname>
            <given-names>D</given-names>
          </name>
          <article-title>Analysis of data mining classification by comparison of C4.5 and ID3 algorithms</article-title>
          <date>
            <year>2017</year>
          </date>
          <chapter-title>IORA, IOP Conf. Series: Materials Science and Engineering, IOP Publishing, doi: 10.1088/1757-899X/166/1/012031</chapter-title>
          <volume>166</volume>
          <fpage>012031</fpage>
        </mixed-citation>
      </ref>
      <ref id="ridm1842372820">
        <label>5.</label>
        <mixed-citation xlink:type="simple" publication-type="book">
          <name>
            <surname>Setiawan</surname>
            <given-names>K</given-names>
          </name>
          <name>
            <surname>Mukiman</surname>
            <given-names>H</given-names>
          </name>
          <name>
            <surname>Hanadwiputra</surname>
            <given-names>S</given-names>
          </name>
          <name>
            <surname>Suwarno</surname>
            <given-names>A</given-names>
          </name>
          <article-title>C4.5 Classification Algorithm Based On Particle Swarm Optimization To Determine The Delay Order Production Pattern</article-title>
          <date>
            <year>2019</year>
          </date>
          <chapter-title>INCITEST 2019, IOP Conf. Series: Materials Science and Engineering, IOP Publishing</chapter-title>
          <pub-id pub-id-type="doi">10.1088/1757-899X/662/2/022085</pub-id>
        </mixed-citation>
      </ref>
      <ref id="ridm1842368860">
        <label>6.</label>
        <mixed-citation xlink:type="simple" publication-type="journal">
          <name>
            <surname>Sinam</surname>
            <given-names>I</given-names>
          </name>
          <name>
            <surname>Lawan</surname>
            <given-names>Abdulwahab</given-names>
          </name>
          <article-title>An Improved C4.5 Model Classification Algorithm Based On Taylor’s Series</article-title>
          <source>Jordanian Journal of Computers and Information Technology (JJCIT), April 2019</source>
          <volume>5</volume>
          <issue>1</issue>
        </mixed-citation>
      </ref>
      <ref id="ridm1842363308">
        <label>7.</label>
        <mixed-citation xlink:type="simple" publication-type="book">
          <name>
            <surname>Mitchell</surname>
            <given-names>Tom M</given-names>
          </name>
          <date>
            <year>1997</year>
          </date>
          <chapter-title>Machine Learning, Chapter 3</chapter-title>
          <fpage>52</fpage>
          <lpage>77</lpage>
        </mixed-citation>
      </ref>
      <ref id="ridm1842362012">
        <label>8.</label>
        <mixed-citation xlink:type="simple" publication-type="book">
          <name>
            <surname>Singh</surname>
            <given-names>Sonia</given-names>
          </name>
          <name>
            <surname>Gupta</surname>
            <given-names>Priyanka</given-names>
          </name>
          <article-title>Comparative Study ID3, CART, and C4.5 Decision Tree Algorithm: a survey</article-title>
          <date>
            <year>2014</year>
          </date>
          <chapter-title>International Journal of Advanced Information Science and Technology, Vol. 27, No. 27, pp. 99</chapter-title>
        </mixed-citation>
      </ref>
      <ref id="ridm1842359276">
        <label>9.</label>
        <mixed-citation xlink:type="simple" publication-type="book">
          <name>
            <surname>Kumar</surname>
            <given-names>Sunil</given-names>
          </name>
          <name>
            <surname>Sharma</surname>
            <given-names>Himani</given-names>
          </name>
          <article-title>A Survey on Decision Tree Algorithms of Classification in Data Mining</article-title>
          <date>
            <year>2016</year>
          </date>
          <chapter-title>International Journal of Science and Research, Vol. 5, Issue 4</chapter-title>
          <fpage>2094</fpage>
          <lpage>2095</lpage>
        </mixed-citation>
      </ref>
      <ref id="ridm1842343316">
        <label>10.</label>
        <mixed-citation xlink:type="simple" publication-type="journal">
          <name>
            <surname>Singh</surname>
            <given-names>Sonia</given-names>
          </name>
          <name>
            <surname>Giri</surname>
            <given-names>Manoj</given-names>
          </name>
          <article-title>Comparative Study Id3, Cart and C4.5 Decision Tree Algorithm: A Survey</article-title>
          <date>
            <year>2014</year>
          </date>
          <source>International Journal of Advanced Information Science and Technology, July</source>
          <volume>3</volume>
          <issue>7</issue>
          <fpage>47</fpage>
          <lpage>52</lpage>
        </mixed-citation>
      </ref>
      <ref id="ridm1842341516">
        <label>11.</label>
        <mixed-citation xlink:type="simple" publication-type="journal">
          <name>
            <surname>Pang</surname>
            <given-names>Ning Tan</given-names>
          </name>
          <name>
            <surname>Steinbac</surname>
            <given-names>Michael</given-names>
          </name>
          <name>
            <surname>Kumar</surname>
            <given-names>Vipin</given-names>
          </name>
          <article-title>Book introduction to data mining”</article-title>
          <date>
            <year>2006</year>
          </date>
          <fpage>145</fpage>
          <lpage>168</lpage>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>
