<?xml version="1.0" encoding="UTF-8"?>
 <!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.0 20120330//EN" "http://jats.nlm.nih.gov/publishing/1.0/JATS-journalpublishing1.dtd"> <article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" article-type="research-article" dtd-version="1.0" xml:lang="en">
  <front>
    <journal-meta>
      <journal-id journal-id-type="publisher-id">JBR</journal-id>
      <journal-title-group>
        <journal-title>Journal of Big Data Research</journal-title>
      </journal-title-group>
      <issn pub-type="epub">2768-0207</issn>
      <publisher>
        <publisher-name>Open Access Pub</publisher-name>
        <publisher-loc>United States</publisher-loc>
      </publisher>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.14302/issn.2768-0207.jbr-23-4478</article-id>
      <article-id pub-id-type="publisher-id">JBR-23-4478</article-id>
      <article-categories>
        <subj-group>
          <subject>research-article</subject>
        </subj-group>
      </article-categories>
      <title-group>
        <article-title>Clustering objects for spatial data mining: a comparative study</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <name>
            <surname>FAKIR</surname>
            <given-names>Youssef</given-names>
          </name>
          <xref ref-type="aff" rid="idm1842235508">1</xref>
          <xref ref-type="aff" rid="idm1842234788">*</xref>
        </contrib>
        <contrib contrib-type="author">
          <name>
            <surname>ELAYACHI</surname>
            <given-names>Rachid</given-names>
          </name>
          <xref ref-type="aff" rid="idm1842235508">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <name>
            <surname>MAHI</surname>
            <given-names>Btissam</given-names>
          </name>
          <xref ref-type="aff" rid="idm1842235508">1</xref>
        </contrib>
      </contrib-group>
      <aff id="idm1842235508">
        <label>1</label>
        <addr-line>Information Processing and Decision Support Laboratory, Department of Computer Science, Faculty of Science and Technology, PO Box 523, Béni Mellal, Morocco</addr-line>
      </aff>
      <aff id="idm1842234788">
        <label>*</label>
        <addr-line>Corresponding Author</addr-line>
      </aff>
      <contrib-group>
        <contrib contrib-type="editor">
          <name>
            <surname>Mo</surname>
            <given-names>Hongwei</given-names>
          </name>
          <xref ref-type="aff" rid="idm1842354228">1</xref>
        </contrib>
      </contrib-group>
      <aff id="idm1842354228">
        <label>1</label>
        <addr-line>Harbin Engineering University, Harbin 150001, China</addr-line>
      </aff>
      <author-notes>
        <corresp>Youssef FAKIR, <addr-line>Laboratory of Information Processing and Decision Support, University Sultan Moulay Slimane</addr-line>, Email: <email>info.dec07@yahoo.fr</email></corresp>
        <fn fn-type="conflict" id="idm1841410316">
          <p>The authors declare that they have no conflict of interest.</p>
        </fn>
      </author-notes>
      <pub-date pub-type="epub" iso-8601-date="2023-03-03">
        <day>03</day>
        <month>03</month>
        <year>2023</year>
      </pub-date>
      <volume>1</volume>
      <issue>3</issue>
      <fpage>1</fpage>
      <lpage>11</lpage>
      <history>
        <date date-type="received">
          <day>13</day>
          <month>02</month>
          <year>2023</year>
        </date>
        <date date-type="accepted">
          <day>16</day>
          <month>02</month>
          <year>2023</year>
        </date>
        <date date-type="online">
          <day>03</day>
          <month>03</month>
          <year>2023</year>
        </date>
      </history>
      <permissions>
        <copyright-statement>© </copyright-statement>
        <copyright-year>2023</copyright-year>
        <copyright-holder>Youssef FAKIR, et al.</copyright-holder>
        <license xlink:href="http://creativecommons.org/licenses/by/4.0/" xlink:type="simple">
          <license-p>This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.</license-p>
        </license>
      </permissions>
      <self-uri xlink:href="http://openaccesspub.org/jbr/article/1918">This article is available from http://openaccesspub.org/jbr/article/1918</self-uri>
      <abstract>
        <p>Spatial data mining (SDM) is the search for important relationships and characteristics that may exist in spatial databases. This paper compares object clustering algorithms for spatial data mining in order to identify the most efficient algorithm. To this end, we compare the k-means, Partitioning Around Medoids (PAM) and Clustering Large Applications based on RANdomized Search (CLARANS) algorithms on the basis of computing time. Experimental results indicate that CLARANS is very efficient and effective.</p>
      </abstract>
      <kwd-group>
        <kwd>clustering</kwd>
        <kwd>K-means</kwd>
        <kwd>PAM</kwd>
        <kwd>Clarans</kwd>
        <kwd>Datamining</kwd>
      </kwd-group>
      <counts>
        <fig-count count="13"/>
        <table-count count="2"/>
        <page-count count="11"/>
      </counts>
    </article-meta>
  </front>
  <body>
    <sec id="idm1842079644" sec-type="intro">
      <title>Introduction</title>
      <p>Spatial data mining is the discovery of interesting characteristics and patterns that may exist in large spatial databases. It can be used in many applications such as seismology, minefield detection and astronomy. Clustering, in spatial data mining, is a useful technique for grouping a set of objects into classes or clusters such that objects within a cluster are highly similar to each other but dissimilar to objects in other clusters. In the last few years, many effective and scalable clustering methods have been developed. These methods can be categorized into partitioning methods, hierarchical methods, density-based methods, grid-based methods, model-based methods and constraint-based methods <xref ref-type="bibr" rid="ridm1842050900">1</xref>. Spatial data mining aims to automate such a knowledge discovery process.</p>
      <p>Spatial data mining has several objectives; it plays an important role in:</p>
      <p>Extracting interesting spatial features and patterns,</p>
      <p>Capturing the intrinsic relationships between spatial and non-spatial data,</p>
      <p>Presenting data regularities concisely and at higher conceptual levels, and helping reorganize spatial databases to accommodate data semantics and achieve better results.</p>
      <p>Data analysis uses cluster analysis, which organizes a set of data elements into groups (or clusters) such that elements in the same group are similar to each other and different from those in other groups <xref ref-type="bibr" rid="ridm1842050900">1</xref><xref ref-type="bibr" rid="ridm1842052628">2</xref><xref ref-type="bibr" rid="ridm1842061420">3</xref>. Different clustering methods have been developed in several research areas such as statistics, pattern recognition, data mining, and spatial analysis.</p>
      <p>Clustering methods can be broadly classified into two groups: partitioning clustering and hierarchical clustering. The first group includes methods, such as K-means and the self-organizing map (SOM) <xref ref-type="bibr" rid="ridm1842152348">4</xref>, that divide a set of elements into a given number of clusters; the assignment of a data element is done according to a measure of proximity or dissimilarity. The second group, on the other hand, organizes data items into a hierarchy with a sequence of nested partitions or groupings <xref ref-type="bibr" rid="ridm1842052628">2</xref>. Commonly used hierarchical clustering methods include Ward’s method <xref ref-type="bibr" rid="ridm1841904412">5</xref>, single-linkage clustering, average-linkage clustering, and complete-linkage clustering <xref ref-type="bibr" rid="ridm1842050900">1</xref><xref ref-type="bibr" rid="ridm1842052628">2</xref><xref ref-type="bibr" rid="ridm1841901100">6</xref>.</p>
      <p>In this paper, we present a comparative analysis of the k-means, Partitioning Around Medoids (PAM) and Clustering Large Applications based on RANdomized Search (CLARANS) algorithms; for that, we use the Flame and Spiral datasets.</p>
      <p>The paper is organized as follows: section 2 introduces clustering algorithms based on partitioning. Section 3 presents the CLARANS algorithm. Experimental results are reported in section 4. Section 5 concludes the paper with a discussion of ongoing work.</p>
      <sec id="idm1842080580">
        <title>Clustering Algorithms Based on Partitioning</title>
        <p>There are two basic types of clustering algorithms <xref ref-type="bibr" rid="ridm1841898580">7</xref>: partitioning and hierarchical algorithms. In this part, we are interested in the first type, which builds a partition of the n objects of a database D into k clusters. For these algorithms, k is an input parameter; that is, some domain knowledge is required, which unfortunately is not available for many applications. A partitioning algorithm starts with an initialization on D and then optimizes an objective function under iterative control.</p>
      </sec>
      <sec id="idm1842078132">
        <title>K-means</title>
        <p>The K-Means algorithm <xref ref-type="bibr" rid="ridm1841886820">8</xref> is one of the simplest algorithms that address the well-known clustering problem. The procedure follows a simple method to classify a given dataset into a certain number of clusters k fixed a priori. The algorithm can be run multiple times to improve the grouping of the data. K-Means is a simple algorithm used in many areas, and it is a good candidate for randomly generated data points. The assignment process is repeated, and the centroids updated, until no point changes clusters or, equivalently, until the centroids remain the same.</p>
        <p>The K-means algorithm is as follows:</p>
        <p>
          <bold>Algorithm: K-means</bold>
        </p>
        <table-wrap id="idm1841442396">
          <table rules="all" frame="box">
            <tbody>
              <tr>
                <td> </td>
                <td>Input</td>
              </tr>
              <tr>
                <td> </td>
                <td> n: total number of clusters </td>
              </tr>
              <tr>
                <td> </td>
                <td> D: Data set</td>
              </tr>
              <tr>
                <td> </td>
                <td>Output: n clusters.</td>
              </tr>
              <tr>
                <td>a</td>
                <td>choose n objects from data set D at random as initial cluster centers; </td>
              </tr>
              <tr>
                <td>b</td>
                <td>do; </td>
              </tr>
              <tr>
                <td>c</td>
                <td>based on the mean value of the objects in each cluster, (re)assign each object to the most similar cluster; </td>
              </tr>
              <tr>
                <td>d</td>
                <td>update the cluster means by calculating the mean value of the objects in each cluster; </td>
              </tr>
              <tr>
                <td>e</td>
                <td>until no change found;</td>
              </tr>
            </tbody>
          </table>
        </table-wrap>
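<p>The steps above can be sketched in Python as follows. This is a minimal illustration on 2-D points, not the implementation used in the experiments; the data, the number of clusters and the random seed are arbitrary assumptions.</p>

```python
import random

def kmeans(points, k, max_iter=100):
    """Minimal k-means on 2-D points: pick k random centers, (re)assign each
    point to its nearest centroid, update centroids as cluster means, and stop
    when no point changes clusters."""
    centroids = random.sample(points, k)              # step a: random initial centers
    assignment = [None] * len(points)
    for _ in range(max_iter):                         # step b: iterate
        new_assignment = [
            min(range(k),
                key=lambda c: (p[0] - centroids[c][0]) ** 2
                              + (p[1] - centroids[c][1]) ** 2)
            for p in points                           # step c: nearest centroid
        ]
        if new_assignment == assignment:              # step e: no change, stop
            break
        assignment = new_assignment
        for c in range(k):                            # step d: recompute means
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = (sum(p[0] for p in members) / len(members),
                                sum(p[1] for p in members) / len(members))
    return assignment, centroids
```

<p>For instance, calling kmeans on two well-separated groups of points with k = 2 assigns each group to its own cluster after a few iterations.</p>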
      </sec>
      <sec id="idm1842062668">
        <title>Partitioning Around Medoids (PAM)</title>
        <p>PAM is similar to the K-means algorithm. Like k-means, PAM divides data sets into groups, but it is based on medoids, whereas k-means is based on centroids. By using medoids we can reduce the dissimilarity of objects within a cluster. In PAM, we first calculate the medoids, then assign each object to the nearest medoid, which forms a cluster.</p>
        <p>The basic idea of the PAM algorithm is to choose an initial representative object (center) for each cluster at random. The remaining objects are assigned to the nearest cluster <xref ref-type="bibr" rid="ridm1841891932">9</xref>, according to their dissimilarity with the representative object. In order to improve the quality of the clustering, the iterative process repeatedly tries to replace representative objects with non-representative ones. A cost function is used to measure whether a non-representative object should replace the current representative object: if so, the replacement is made; if not, the current representative is kept. The resulting classification is then given.</p>
        <p>In the remainder, we use:</p>
        <p>I<sub>m</sub>: the current medoid that is to be replaced,</p>
        <p>I<sub>p</sub>: the new medoid that replaces I<sub>m</sub>,</p>
        <p>I<sub>j</sub>: any other non-medoid object that may or may not need to be moved,</p>
        <p>I<sub>j,2</sub>: the current medoid that is nearest to I<sub>j</sub>.</p>
        <p>Now, to formalize the effect of a swap between I<sub>m</sub> and I<sub>p</sub>, PAM computes the cost C<sub>i</sub> for every nonmedoid object I<sub>j</sub>.</p>
        <p><bold>Case 1. </bold>Suppose I<sub>j</sub> currently belongs to the cluster represented by I<sub>m</sub>. Furthermore, let I<sub>j</sub> be more similar to I<sub>j,2</sub> than to I<sub>p</sub>, i.e., d(I<sub>j</sub>, I<sub>p</sub>) ⩾ d(I<sub>j</sub>, I<sub>j,2</sub>), where I<sub>j,2</sub> is the second most similar medoid to I<sub>j</sub>. Thus, if I<sub>m</sub> is replaced by I<sub>p</sub> as a medoid, I<sub>j</sub> would belong to the cluster represented by I<sub>j,2</sub>. Hence, the cost of the swap, as far as I<sub>j</sub> is concerned, is:</p>
        <p><inline-graphic xlink:href="images/image1.png" mime-subtype="png"/> </p>
        <p>This equation always gives a nonnegative Ci, indicating that there is a nonnegative cost incurred in replacing Im with Ip.</p>
        <p><bold>Case 2. </bold>I<sub>j</sub> currently belongs to the cluster represented by I<sub>m</sub>. But, this time, I<sub>j</sub> is less similar to I<sub>j,2</sub> than to I<sub>p</sub>, i.e., d(I<sub>j</sub>, I<sub>p</sub>) &lt; d(I<sub>j</sub>, I<sub>j,2</sub>). Then, if I<sub>m</sub> is replaced by I<sub>p</sub>, I<sub>j</sub> would belong to the cluster represented by I<sub>p</sub>. Thus, the cost for I<sub>j</sub> is given by:</p>
        <p><inline-graphic xlink:href="images/image2.png" mime-subtype="png"/> </p>
        <p>Unlike in (1), C<sub>i</sub> here can be positive or negative, depending on whether I<sub>j</sub> is more similar to I<sub>m</sub> or to I<sub>p</sub>.</p>
        <p><bold>Case 3. </bold>Suppose that I<sub>j</sub> currently belongs to a cluster other than the one represented by I<sub>m</sub>. Let I<sub>j,2</sub> be the representative object of that cluster. Furthermore, let I<sub>j</sub> be more similar to I<sub>j,2</sub> than to I<sub>p</sub>. Then, even if I<sub>m</sub> is replaced by I<sub>p</sub>, I<sub>j</sub> would stay in the cluster represented by I<sub>j,2</sub>. Thus, the cost is:</p>
        <p><inline-graphic xlink:href="images/image3.png" mime-subtype="png"/> </p>
        <p><bold>Case 4. </bold>I<sub>j</sub> currently belongs to the cluster represented by I<sub>j,2</sub>. But I<sub>j</sub> is less similar to I<sub>j,2</sub> than to I<sub>p</sub>. Then, replacing I<sub>m</sub> with I<sub>p</sub> would cause I<sub>j</sub> to jump from the cluster of I<sub>j,2</sub> to that of I<sub>p</sub>. Thus, the cost is:</p>
        <p><inline-graphic xlink:href="images/image4.png" mime-subtype="png"/> </p>
        <p>and is always negative. Combining the four cases above, the total cost of replacing I<sub>m</sub> with I<sub>p</sub> is given by:</p>
        <p><inline-graphic xlink:href="images/image5.png" mime-subtype="png"/> </p>
        <p>We now present the PAM algorithm:</p>
        <p>
          <bold>Algorithm: PAM</bold>
        </p>
        <table-wrap id="idm1841393252">
          <table rules="all" frame="box">
            <tbody>
              <tr>
                <td>1</td>
                <td>Select k representative objects arbitrarily. </td>
              </tr>
              <tr>
                <td>2</td>
                <td>Compute <italic>DCmp</italic> for all pairs of objects I<sub>m</sub>, I<sub>p</sub>, where I<sub>m</sub> is currently selected and I<sub>p</sub> is not.</td>
              </tr>
              <tr>
                <td>3</td>
                <td>Select the pair I<sub>m</sub>, I<sub>p</sub> that minimizes <italic>DCmp</italic>. If the minimum <italic>DCmp</italic> is negative, replace I<sub>m</sub> with I<sub>p</sub>, and go back to step 2. </td>
              </tr>
              <tr>
                <td>4</td>
                <td>Otherwise, for each nonselected object, find the most similar representative object. Halt</td>
              </tr>
            </tbody>
          </table>
        </table-wrap>
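<p>The swap phase can be sketched in Python as follows. This is a minimal illustration on 2-D points with Euclidean dissimilarity, not the authors' code; the per-object costs of Cases 1 to 4 are computed implicitly here, as the difference in total configuration cost before and after a swap, which sums to the same total cost as equation 5.</p>

```python
from itertools import product

def dist(a, b):
    # Euclidean distance between two (x, y) points
    return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5

def config_cost(points, medoids):
    # total dissimilarity: each object contributes its distance to the nearest medoid
    return sum(min(dist(p, m) for m in medoids) for p in points)

def pam(points, initial_medoids):
    """PAM swap phase: try every (selected, non-selected) pair (Im, Ip), apply
    the swap with the most negative total cost change, and repeat until no
    swap improves (steps 2-4 of the algorithm above)."""
    medoids = list(initial_medoids)
    while True:
        best_delta, best_swap = 0, None
        nonselected = [p for p in points if p not in medoids]
        for i_m, i_p in product(medoids, nonselected):
            candidate = [i_p if m == i_m else m for m in medoids]
            # total swap cost: the sum of the per-object costs of Cases 1-4
            delta = config_cost(points, candidate) - config_cost(points, medoids)
            if delta < best_delta:
                best_delta, best_swap = delta, candidate
        if best_swap is None:      # minimum cost change is nonnegative: halt
            return medoids
        medoids = best_swap        # negative cost change: replace Im with Ip
```

<p>The nested pair enumeration is what makes PAM expensive: every iteration examines k(n − k) candidate swaps, each of which requires a full cost evaluation.</p>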
        <p>Experimental studies show that PAM is efficient for small data sets (e.g., 100 objects in 5 clusters) but is not sufficient for medium and large data sets. For this reason, a complexity analysis of PAM is necessary; this analysis motivates the development of CLARANS.</p>
      </sec>
      <sec id="idm1842015460">
        <title>Clustering algorithms based on randomized search</title>
        <p>In this section we present the CLARANS clustering algorithm. Compared with the clustering methods discussed above, CLARANS is very effective and efficient (see the experimental results). CLARANS is a variant of PAM <xref ref-type="bibr" rid="ridm1841889988">10</xref><xref ref-type="bibr" rid="ridm1841879636">11</xref><xref ref-type="bibr" rid="ridm1841875532">12</xref> that uses the same neighbourhood operation but takes the form of a stochastic first-found hill climber. In each iteration, a medoid object i and a non-medoid object j are selected at random until the clustering produced when their roles are switched is better than the current clustering. The algorithm begins with a random choice of k medoids, and no construction phase is required. Following <xref ref-type="bibr" rid="ridm1841889988">10</xref>, two parameters are used in this algorithm, <italic>numlocal</italic> and <italic>maxneighbour</italic>. <italic>numlocal</italic> indicates how many runs of the local search algorithm are performed; at the end of each run, the algorithm restarts from a randomly selected solution. The second parameter, <italic>maxneighbour</italic>, indicates the maximum number of neighbours the algorithm examines at each step.</p>
        <p>
          <bold>Algorithm: CLARANS</bold>
        </p>
        <p>1. Input <italic>numlocal</italic> and <italic>maxneighbour</italic> as parameters. Initialize i to 1 and mincost to a large number.</p>
        <p>2. Set current to an arbitrary node in G<sub>n,k</sub>.</p>
        <p>3. Set j to 1.</p>
        <p>4. Consider a random neighbour S of current and, based on equation 5, calculate the cost differential of the two nodes.</p>
        <p>5. If S has a lower cost, set current to S and go to step 3.</p>
        <p>6. Otherwise, increment j by 1. If j ⩽ maxneighbour, go to step 4.</p>
        <p>7. Otherwise, when j &gt; maxneighbour, compare the cost of current with mincost. If the former is less than mincost, set mincost to the cost of current and set bestnode to current.</p>
        <p>8. Increment i by 1. If i &gt; numlocal, output bestnode and halt. Otherwise, go to step 2.</p>
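<p>The procedure above can be sketched in Python as follows. This is an illustrative sketch, not the authors' implementation; Euclidean distance over 2-D point tuples and the default parameter values are assumptions.</p>

```python
import random

def dist(a, b):
    # Euclidean distance between two (x, y) points
    return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5

def cost(points, medoids):
    # sum of distances from every point to its nearest medoid
    return sum(min(dist(p, m) for m in medoids) for p in points)

def clarans(points, k, numlocal=2, maxneighbor=15):
    """CLARANS: numlocal restarts of a randomized hill climb; at each step,
    examine up to maxneighbor random single-medoid swaps, moving to the first
    neighbour that lowers the cost, otherwise declare a local minimum."""
    best_node, mincost = None, float("inf")
    for _ in range(numlocal):                       # restart loop
        current = random.sample(points, k)          # random node in G(n,k)
        j = 0
        while j < maxneighbor:
            i_m = random.choice(current)            # medoid to swap out
            i_p = random.choice([p for p in points if p not in current])
            neighbor = [i_p if m == i_m else m for m in current]
            if cost(points, neighbor) < cost(points, current):
                current, j = neighbor, 0            # first-found improvement
            else:
                j += 1                              # failed neighbour
        if cost(points, current) < mincost:         # keep best local minimum
            mincost, best_node = cost(points, current), current
    return best_node
```

<p>Unlike PAM, each step evaluates only a sample of random neighbours instead of all k(n − k) swaps, which is what makes the search fast on larger data sets.</p>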
      </sec>
      <sec id="idm1842014092">
        <title>Experimental results</title>
        <p>In order to evaluate CLARANS in practice, we compare its performance with that of the other clustering techniques on our datasets. The three algorithms were implemented in the Python programming language on a PC with an Intel Core i5 CPU (2.40 GHz), 8 GB of RAM and Windows 10.</p>
      </sec>
      <sec id="idm1842015028">
        <title>Results of K-means</title>
        <p>We start by implementing the k-means algorithm, with which we want to distribute the data into a precise number of clusters. To achieve this task, we chose a dataset of 240 data points. In <xref ref-type="fig" rid="idm1841363876">Figure 1</xref> we load the dataset and choose the number of clusters k = 3. We obtain the execution result after four iterations, as shown in <xref ref-type="fig" rid="idm1841364452">Figure 2</xref>. <xref ref-type="fig" rid="idm1841360420">Figure 3</xref> illustrates the performance measures of K-means, while <xref ref-type="fig" rid="idm1841361212">Figure 4</xref> shows the distribution of the clusters.</p>
        <fig id="idm1841363876">
          <label>Figure 1.</label>
          <caption>
            <title> Data initialization</title>
          </caption>
          <graphic xlink:href="images/image6.jpg" mime-subtype="jpg"/>
        </fig>
        <fig id="idm1841364452">
          <label>Figure 2.</label>
          <caption>
            <title> Result  execution</title>
          </caption>
          <graphic xlink:href="images/image7.jpg" mime-subtype="jpg"/>
        </fig>
        <fig id="idm1841360420">
          <label>Figure 3.</label>
          <caption>
            <title> Performance measures</title>
          </caption>
          <graphic xlink:href="images/image8.jpg" mime-subtype="jpg"/>
        </fig>
        <fig id="idm1841361212">
          <label>Figure 4.</label>
          <caption>
            <title> Plot of k-means algorithm</title>
          </caption>
          <graphic xlink:href="images/image9.jpg" mime-subtype="jpg"/>
        </fig>
      </sec>
      <sec id="idm1842009052">
        <title>Results of PAM</title>
        <p>For the PAM algorithm, we use the same dataset as in the k-means experiment (240 data points). <xref ref-type="fig" rid="idm1841358332">Figure 5</xref> illustrates the data initialization for PAM. To start the algorithm, we fixed the number of clusters at k = 3. In this program we compute a TCMP coefficient at each iteration; while TCMP is negative, the iterations continue. The iterations stop at the first positive TCMP value, and the medoid values are recorded in order to determine the clusters (<xref ref-type="fig" rid="idm1841336508">Figure 6</xref> &amp; <xref ref-type="fig" rid="idm1841337228">Figure 7</xref>). Performance measures are given in <xref ref-type="fig" rid="idm1841334708">Figure 8</xref>, and the distribution of the clusters is illustrated in <xref ref-type="fig" rid="idm1841334060">Figure 9</xref>.</p>
        <fig id="idm1841358332">
          <label>Figure 5.</label>
          <caption>
            <title> Data initialization(PAM)</title>
          </caption>
          <graphic xlink:href="images/image10.jpg" mime-subtype="jpg"/>
        </fig>
        <fig id="idm1841336508">
          <label>Figure 6.</label>
          <caption>
            <title> Iteration medoids</title>
          </caption>
          <graphic xlink:href="images/image11.jpg" mime-subtype="jpg"/>
        </fig>
        <fig id="idm1841337228">
          <label>Figure 7.</label>
          <caption>
            <title> Number of cluster</title>
          </caption>
          <graphic xlink:href="images/image12.jpg" mime-subtype="jpg"/>
        </fig>
        <fig id="idm1841334708">
          <label>Figure 8.</label>
          <caption>
            <title> Performance measure</title>
          </caption>
          <graphic xlink:href="images/image13.jpg" mime-subtype="jpg"/>
        </fig>
        <fig id="idm1841334060">
          <label>Figure 9.</label>
          <caption>
            <title> Plot of PAM algorithm</title>
          </caption>
          <graphic xlink:href="images/image14.jpg" mime-subtype="jpg"/>
        </fig>
      </sec>
      <sec id="idm1841993972">
        <title>Result of CLARANS</title>
        <p>The CLARANS algorithm consists in finding the most representative objects in each cluster, i.e., the objects whose cost (for example, the sum of the distances to the other objects belonging to the same cluster) is minimal. These objects are called medoids. After the medoids are selected, each of the remaining objects is assigned to the cluster whose representative medoid is closest to that object. CLARANS is based on a random search for medoid candidates, a cost calculation and a comparison with the current best local solution.</p>
        <p>This action is repeated until the number of consecutively examined random candidates whose cost exceeds the current lowest local cost reaches <italic>maxneighbour</italic>. The cost of the best local solution is then compared with the best global solution achieved so far, and if it is smaller, the local solution becomes the global one. Whether or not the local solution has become global, a counter is incremented by 1, and the algorithm is run again until the number of such passes reaches the <italic>numlocal</italic> value. Then the current best overall solution is returned. To run this algorithm we set:</p>
        <p>Number of clusters in the data: 3.</p>
        <p>Number of objects (points / polygons) to generate: 240.</p>
        <p>Number of clusters / medoids to search for (by default the value of the numlocal parameter): 3.</p>
        <p>Maximal number of neighbours to examine: 15.</p>
        <p>Points in two-dimensional space, with x and y coordinates, implemented as a class with x and y attributes, were adopted as the data model. The second spatial data model used for the analysis is polygons, also defined in two-dimensional space and implemented as a class whose vertices attribute is a list containing the point objects belonging to the polygon.</p>
        <p>The cost has been implemented:</p>
        <p>for points, as the sum of the distances between all the points in the individual clusters and the medoids of these clusters;</p>
        <p>for polygons, as the sum of the smallest distances between the vertices of all polygons in the individual groups and the medoid polygons of these groups.</p>
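<p>Under the polygon model just described, the cost can be sketched as follows. The class and function names are hypothetical, chosen for illustration, and not taken from the authors' code.</p>

```python
def dist(a, b):
    # Euclidean distance between two (x, y) points
    return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5

class Polygon:
    def __init__(self, vertices):
        self.vertices = vertices  # list of (x, y) point tuples

def polygon_dissimilarity(poly, medoid_poly):
    # smallest distance between any vertex of poly and any vertex of the medoid polygon
    return min(dist(v, w) for v in poly.vertices for w in medoid_poly.vertices)

def cluster_cost(polygons, medoid):
    # cost of one group: sum of smallest vertex distances to the medoid polygon
    return sum(polygon_dissimilarity(p, medoid) for p in polygons)
```

<p>The point model is the special case where each "polygon" has a single vertex, so the same cost reduces to the plain point-to-medoid distance.</p>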
        <p><xref ref-type="fig" rid="idm1841328228">Figure 10</xref> and <xref ref-type="fig" rid="idm1841328660">Figure 11</xref> show the results obtained by the CLARANS algorithm.</p>
        <fig id="idm1841328228">
          <label>Figure 10.</label>
          <caption>
            <title> Execution result</title>
          </caption>
          <graphic xlink:href="images/image15.jpg" mime-subtype="jpg"/>
        </fig>
        <fig id="idm1841328660">
          <label>Figure 11.</label>
          <caption>
            <title> Plot of CLARANS algorithm</title>
          </caption>
          <graphic xlink:href="images/image16.jpg" mime-subtype="jpg"/>
        </fig>
      </sec>
      <sec id="idm1841988788">
        <title>Runtime</title>
        <p>Execution time was measured as the time required to form clusters with each of the clustering algorithms k-means, PAM and CLARANS. The time required by each algorithm was recorded and the results plotted. Computing times (in seconds) are shown in <xref ref-type="table" rid="idm1841341260">Table 1</xref> and <xref ref-type="table" rid="idm1841305564">Table 2</xref>. For this work we chose two spatial datasets, spiral (312 data points) and flame (240 data points). A graphical representation of the results is given in <xref ref-type="fig" rid="idm1841342484">Figure 12</xref> and <xref ref-type="fig" rid="idm1841341332">Figure 13</xref>, charts used to compare the clustering algorithms.</p>
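<p>Such runtime measurements can be produced with a small timing helper like the one below. This is illustrative only; cluster_fn stands for any of the three clustering functions, and the helper name is our own.</p>

```python
import time

def timed(cluster_fn, data, **params):
    """Measure the wall-clock time of one clustering call, returning the
    clustering result together with the elapsed time in seconds."""
    start = time.perf_counter()
    result = cluster_fn(data, **params)
    elapsed = time.perf_counter() - start
    return result, elapsed
```

<p>For example, timed(kmeans_fn, dataset) would yield the clusters and the seconds spent, one such call per table cell.</p>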
        <fig id="idm1841342484">
          <label>Figure 12.</label>
          <caption>
            <title> Time chart of k-means, PAM and CLARANS algorithms to form three clusters of different dataset. </title>
          </caption>
          <graphic xlink:href="images/image17.jpg" mime-subtype="jpg"/>
        </fig>
        <fig id="idm1841341332">
          <label>Figure 13.</label>
          <caption>
            <title> Time chart of k-means, PAM and CLARANS algorithms to form ten clusters of different dataset. </title>
          </caption>
          <graphic xlink:href="images/image18.jpg" mime-subtype="jpg"/>
        </fig>
        <table-wrap id="idm1841341260">
          <label>Table 1.</label>
          <caption>
            <title> Computing time in seconds (k = 3)</title>
          </caption>
          <table rules="all" frame="box">
            <tbody>
              <tr>
                <td>Dataset</td>
                <td>PAM</td>
                <td>Kmeans</td>
                <td>CLARANS</td>
              </tr>
              <tr>
                <td>Flame (240)</td>
                <td>13.8</td>
                <td>0.5</td>
                <td>0.21</td>
              </tr>
              <tr>
                <td>Spiral (312)</td>
                <td>56.4</td>
                <td>1.69</td>
                <td>0.36</td>
              </tr>
            </tbody>
          </table>
        </table-wrap>
        <table-wrap id="idm1841305564">
          <label>Table 2.</label>
          <caption>
            <title> Computing time in seconds (k = 10)</title>
          </caption>
          <table rules="all" frame="box">
            <tbody>
              <tr>
                <td>Dataset</td>
                <td>PAM</td>
                <td>Kmeans</td>
                <td>CLARANS</td>
              </tr>
              <tr>
                <td>Flame (240)</td>
                <td>363.63</td>
                <td>3.34</td>
                <td>0.6</td>
              </tr>
              <tr>
                <td>Spiral (312)</td>
                <td>304.53</td>
                <td>6.11</td>
                <td>0.68</td>
              </tr>
            </tbody>
          </table>
        </table-wrap>
        <p>Comparing the computing times of the clustering algorithms shows that the CLARANS algorithm takes less time than the others; according to the time charts, CLARANS therefore performs best in this study.</p>
        <p>All the algorithms were implemented in the Python programming language. The graphical representation of the experimental results shows that CLARANS has a low execution time compared with the other algorithms, for k = 3 (0.21 s; 0.36 s) and for k = 10 (0.60 s; 0.68 s), even when the value of k and the size of the dataset change, whereas PAM has the longest execution time.</p>
        <p>The execution time of K-means lies between those of the two other algorithms.</p>
        <p>The number of clusters k influences the execution time.</p>
        <p>The size of the dataset also affects the execution time. Overall, CLARANS is the best of the compared clustering algorithms, ahead of PAM and K-means.</p>
      </sec>
    </sec>
    <sec id="idm1841965028" sec-type="conclusions">
      <title>Conclusion</title>
      <p>Clustering is an unsupervised method aimed at grouping similar objects into the same cluster. This work studied clustering algorithms and their effectiveness for spatial data mining. The experimental results indicate that the CLARANS algorithm reduces execution time and gives better clusters when compared with the other algorithms. However, this is not always the case: CLARANS does not always give better clusters. Future research will therefore first focus on developing an algorithm that improves the performance of the segmentation process; further work also lies in this area. One conclusion can be drawn from this experimental work: the CLARANS algorithm is more efficient than the PAM and K-means algorithms.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ridm1842050900">
        <label>1.</label>
        <mixed-citation xlink:type="simple" publication-type="journal">
          <name>
            <surname>Kumar</surname>
            <given-names>Ch. N. Santhosh</given-names>
          </name>
          <article-title>Spatial Data Mining using Cluster Analysis</article-title>
          <source>International Journal of Computer Science &amp; Information Technology (IJCSIT)</source>
          <date>
            <year>2012</year>
          </date>
          <volume>4</volume>
          <issue>4</issue>
        </mixed-citation>
      </ref>
      <ref id="ridm1842052628">
        <label>2.</label>
        <mixed-citation xlink:type="simple" publication-type="journal">
          <name>
            <surname>Aksac</surname>
            <given-names>Alper</given-names>
          </name>
          <name>
            <surname>Ozyer</surname>
            <given-names>Tansel</given-names>
          </name>
          <name>
            <surname>Alhajj</surname>
            <given-names>Reda</given-names>
          </name>
          <article-title>Data on cut-edge for spatial clustering based on proximity graphs</article-title>
          <date>
            <year>2020</year>
          </date>
          <source>Data in Brief</source>
          <volume>28</volume>
        </mixed-citation>
      </ref>
      <ref id="ridm1842061420">
        <label>3.</label>
        <mixed-citation xlink:type="simple" publication-type="journal">
          <name>
            <surname>Liu</surname>
            <given-names>Yaohui</given-names>
          </name>
          <name>
            <surname>Liu</surname>
            <given-names>Dong</given-names>
          </name>
          <name>
            <surname>Yu</surname>
            <given-names>Fang</given-names>
          </name>
          <name>
            <surname>Ma</surname>
            <given-names>Zhengming</given-names>
          </name>
          <article-title>A Double-Density Clustering Method Based on “Nearest to First in” Strategy</article-title>
          <date>
            <year>2020</year>
          </date>
          <source>Symmetry</source>
          <volume>12</volume>
          <issue>5</issue>
          <fpage>747</fpage>
          <pub-id pub-id-type="doi">10.3390/sym12050747</pub-id>
        </mixed-citation>
      </ref>
      <ref id="ridm1842152348">
        <label>4.</label>
        <mixed-citation xlink:type="simple" publication-type="journal">
          <name>
            <surname>Kohonen</surname>
            <given-names>T</given-names>
          </name>
          <article-title>Self-organizing maps</article-title>
          <date>
            <year>2001</year>
          </date>
          <publisher-name>Springer</publisher-name>
          <publisher-loc>Berlin, New York</publisher-loc>
        </mixed-citation>
      </ref>
      <ref id="ridm1841904412">
        <label>5.</label>
        <mixed-citation xlink:type="simple" publication-type="journal">
          <name>
            <surname>Embrechts</surname>
            <given-names>Mark J</given-names>
          </name>
          <name>
            <surname>Gatti</surname>
            <given-names>Christopher J</given-names>
          </name>
          <name>
            <surname>Linton</surname>
            <given-names>Jonathan</given-names>
          </name>
          <name>
            <surname>Roysam</surname>
            <given-names>Badrinath</given-names>
          </name>
          <article-title>Hierarchical Clustering for Large Data Sets</article-title>
          <source>Advances in Intelligent Signal Processing and Data Mining</source>
          <date>
            <year>2013</year>
          </date>
        </mixed-citation>
      </ref>
      <ref id="ridm1841901100">
        <label>6.</label>
        <mixed-citation xlink:type="simple" publication-type="journal">
          <name>
            <surname>Ward</surname>
            <given-names>J H</given-names>
          </name>
          <article-title>Hierarchical grouping to optimize an objective function</article-title>
          <date>
            <year>1963</year>
          </date>
          <source>Journal of the American Statistical Association</source>
          <volume>58</volume>
          <fpage>236</fpage>
          <lpage>244</lpage>
        </mixed-citation>
      </ref>
      <ref id="ridm1841898580">
        <label>7.</label>
        <mixed-citation xlink:type="simple" publication-type="journal">
          <name>
            <surname>Han</surname>
            <given-names>J</given-names>
          </name>
          <name>
            <surname>Kamber</surname>
            <given-names>M</given-names>
          </name>
          <article-title>Data Mining: Concepts and Techniques</article-title>
          <publisher-name>Morgan Kaufmann Publishers</publisher-name>
          <date>
            <year>2001</year>
          </date>
        </mixed-citation>
      </ref>
      <ref id="ridm1841886820">
        <label>8.</label>
        <mixed-citation xlink:type="simple" publication-type="book">
          <name>
            <surname>Fakir</surname>
            <given-names>Youssef</given-names>
          </name>
          <name>
            <surname>Elklil</surname>
            <given-names>Jihane</given-names>
          </name>
          <article-title>Clustering Techniques for Big Data Mining</article-title>
          <date>
            <year>2021</year>
          </date>
          <chapter-title>Lecture Notes in Business Information Processing (LNBIP)</chapter-title>
          <volume>416</volume>
        </mixed-citation>
      </ref>
      <ref id="ridm1841891932">
        <label>9.</label>
        <mixed-citation xlink:type="simple" publication-type="book">
          <name>
            <surname>MacQueen</surname>
            <given-names>J B</given-names>
          </name>
          <article-title>Some methods for classification and analysis of multivariate observations</article-title>
          <date>
            <year>1967</year>
          </date>
          <chapter-title>in Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability</chapter-title>
          <volume>1</volume>
          <fpage>281</fpage>
          <lpage>297</lpage>
        </mixed-citation>
      </ref>
      <ref id="ridm1841889988">
        <label>10.</label>
        <mixed-citation xlink:type="simple" publication-type="book">
          <name>
            <surname>Reynolds</surname>
            <given-names>A P</given-names>
          </name>
          <name>
            <surname>Richards</surname>
            <given-names>G</given-names>
          </name>
          <name>
            <surname>Rayward-Smith</surname>
            <given-names>V J</given-names>
          </name>
          <article-title>The application of K-medoids and PAM to the clustering of rules</article-title>
          <date>
            <year>2004</year>
          </date>
          <chapter-title>in Z. R. Yang, H. Yin, and R. Everson (eds.), Proceedings of the Fifth International Conference on Intelligent Data Engineering and Automated Learning (IDEAL’04)</chapter-title>
          <fpage>173</fpage>
          <lpage>178</lpage>
        </mixed-citation>
      </ref>
      <ref id="ridm1841879636">
        <label>11.</label>
        <mixed-citation xlink:type="simple" publication-type="journal">
          <name>
            <surname>Ng</surname>
            <given-names>R T</given-names>
          </name>
          <name>
            <surname>Han</surname>
            <given-names>J</given-names>
          </name>
          <article-title>CLARANS: A method for clustering objects for spatial data mining</article-title>
          <date>
            <year>2002</year>
          </date>
          <source>IEEE Transactions on Knowledge and Data Engineering</source>
          <volume>14</volume>
          <issue>5</issue>
          <fpage>1003</fpage>
          <lpage>1016</lpage>
        </mixed-citation>
      </ref>
      <ref id="ridm1841875532">
        <label>12.</label>
        <mixed-citation xlink:type="simple" publication-type="journal">
          <name>
            <surname>Ng</surname>
            <given-names>Raymond T</given-names>
          </name>
          <name>
            <surname>Han</surname>
            <given-names>Jiawei</given-names>
          </name>
          <article-title>CLARANS: A Method for Clustering Objects for Spatial Data Mining</article-title>
          <source>IEEE Transactions on Knowledge and Data Engineering</source>
          <date>
            <year>2002</year>
          </date>
          <volume>14</volume>
          <issue>5</issue>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>
