The following dataset contains eight items representing colored points on the x-y plane:
x y color
1 1 red
1 3 green
2 5 blue
3 5 green
4 1 blue
4 4 red
5 3 blue
5 4 green
Using this data as the training set, run the k-nearest-neighbors classification algorithm (k=1) using square distance and choose the color for a new item with x = 3 and y = 3.

Answer:

Solution:

According to the question, the dataset contains eight items representing colored points on the x-y plane.

We have to use this data as the training set for the k-nearest-neighbors classification algorithm to decide the most likely color for a new item with x = 3 and y = 3.

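The distance between points is the ordinary straight-line distance on the x-y plane, called the Euclidean distance. The question asks for squared distance; because the square root is an increasing function, ranking points by Euclidean distance picks the same nearest neighbors as ranking by squared distance.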

We build a table of the distance from (3, 3) to each point, using the formula:

Distance, [tex]$d = \sqrt{(x_2-x_1)^2+(y_2-y_1)^2}$[/tex]

x    y    Color    Distance from point (3, 3)
1    1    Red      [tex]$\sqrt{(3-1)^2+(3-1)^2} \approx 2.83$[/tex]
1    3    Green    [tex]$\sqrt{(3-1)^2+(3-3)^2} = 2$[/tex]
2    5    Blue     [tex]$\sqrt{(3-2)^2+(3-5)^2} \approx 2.24$[/tex]
3    5    Green    [tex]$\sqrt{(3-3)^2+(3-5)^2} = 2$[/tex]
4    1    Blue     [tex]$\sqrt{(3-4)^2+(3-1)^2} \approx 2.24$[/tex]
4    4    Red      [tex]$\sqrt{(3-4)^2+(3-4)^2} \approx 1.41$[/tex]
5    3    Blue     [tex]$\sqrt{(3-5)^2+(3-3)^2} = 2$[/tex]
5    4    Green    [tex]$\sqrt{(3-5)^2+(3-4)^2} \approx 2.24$[/tex]
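The distance column can be checked with a short Python sketch. The names `points`, `query`, and `squared_distance` are illustrative and not part of the original answer; the sketch also keeps the squared distance that the question mentions, next to the Euclidean distance used in the table.

```python
import math

# Training set from the question: (x, y, color)
points = [
    (1, 1, "red"), (1, 3, "green"), (2, 5, "blue"), (3, 5, "green"),
    (4, 1, "blue"), (4, 4, "red"), (5, 3, "blue"), (5, 4, "green"),
]
query = (3, 3)  # the new item to classify


def squared_distance(p, q):
    """Squared Euclidean distance between two (x, y) pairs."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


# One row per training point: x, y, color, squared distance, distance
rows = []
for x, y, color in points:
    d2 = squared_distance((x, y), query)
    rows.append((x, y, color, d2, math.sqrt(d2)))

for x, y, color, d2, d in rows:
    print(f"{x} {y} {color:<5} d^2={d2} d={d:.2f}")
```

Working with the integer squared distances avoids rounding questions entirely, and it does not change which points are nearest.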

Now we sort the colors by distance in ascending order, keeping points at equal distance in table order.

We get [Red, Green, Green, Blue, Blue, Blue, Green, Red].
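As a rough sketch of this sorting step, the colors can be ranked with Python's built-in `sorted`. The helper name `squared_distance` is an assumption made for illustration. Because `sorted` is stable, points at the same distance stay in table order.

```python
# Training set from the question: (x, y, color)
points = [
    (1, 1, "red"), (1, 3, "green"), (2, 5, "blue"), (3, 5, "green"),
    (4, 1, "blue"), (4, 4, "red"), (5, 3, "blue"), (5, 4, "green"),
]
query = (3, 3)


def squared_distance(point):
    """Squared distance from a training point to the query."""
    x, y, _ = point
    return (x - query[0]) ** 2 + (y - query[1]) ** 2


# sorted() is stable, so ties keep their original (table) order
ranked = sorted(points, key=squared_distance)
colors = [color for _, _, color in ranked]
print(colors)
```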

Running the algorithm with k = 1, we pick only the single nearest point, and its color is assigned to the new item. The nearest point is (4, 4), at distance 1.41.

Therefore the color is RED.
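For k = 1 no full sort is needed, since only the single closest point matters. A minimal sketch, assuming the same tuple layout `(x, y, color)` for the training data (the variable names are illustrative):

```python
# Training set from the question: (x, y, color)
points = [
    (1, 1, "red"), (1, 3, "green"), (2, 5, "blue"), (3, 5, "green"),
    (4, 1, "blue"), (4, 4, "red"), (5, 3, "blue"), (5, 4, "green"),
]
query = (3, 3)

# min() with a key returns the training point with the smallest squared distance
nearest = min(
    points,
    key=lambda p: (p[0] - query[0]) ** 2 + (p[1] - query[1]) ** 2,
)
predicted_color = nearest[2]
print(nearest, predicted_color)
```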

If we run the algorithm with k = 4, we pick the [tex]$4 \text{ colors}$[/tex] with the shortest distances, which are [tex]$\text{red, green, green, blue}$[/tex]. Green has the highest frequency among these 4, so the answer is Green.
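Both cases can be covered by one small function that takes k as a parameter and returns the majority color among the k nearest points. This is a sketch under stated assumptions, not a reference implementation: `knn_color` is a hypothetical name, and a tied vote is not handled specially (the first-seen color among the tied ones wins).

```python
from collections import Counter

# Training set from the question: (x, y, color)
points = [
    (1, 1, "red"), (1, 3, "green"), (2, 5, "blue"), (3, 5, "green"),
    (4, 1, "blue"), (4, 4, "red"), (5, 3, "blue"), (5, 4, "green"),
]


def knn_color(points, query, k):
    """Return the most common color among the k points nearest to query."""
    qx, qy = query
    # Rank by squared distance; the stable sort keeps ties in table order
    ranked = sorted(points, key=lambda p: (p[0] - qx) ** 2 + (p[1] - qy) ** 2)
    votes = Counter(color for _, _, color in ranked[:k])
    return votes.most_common(1)[0][0]


print(knn_color(points, (3, 3), k=1))
print(knn_color(points, (3, 3), k=4))
```

With the data above, three points sit at distance 2, so the four nearest points are fully determined and the k = 4 vote does not depend on how ties are ordered.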