artificial intelligence – K Nearest Neighbour Algorithm doubt – Education Career Blog

I am new to Artificial Intelligence. I understand K nearest neighbour algorithm and how to implement it. However, how do you calculate the distance or weight of things that aren’t on a scale?

For example, the distance between ages is easy to calculate, but how do you calculate how near red is to blue? Maybe colours are a bad example, because you could still use their frequency. How about burger vs. pizza vs. fries, for example?

I got a feeling there’s a clever way to do this.

EDIT: Thank you all for the very nice answers. They really helped and I appreciate it. But I still think there must be a way around this.

Can I do it this way? Let’s say I am using my KNN algorithm to predict whether a person will eat at my restaurant, which serves all three of the foods above. Of course there are other factors, but to keep it simple, take the favourite-food field: out of 300 people, 150 love burgers, 100 love pizza, and 50 love fries. Common sense tells me that favourite food affects people’s decision on whether to eat here or not.

So now a person enters burger as their favourite food, and I want to predict whether they will eat at my restaurant. Ignoring other factors, and based on my previous (training) knowledge base, common sense tells me that there is a higher chance the k nearest neighbours are close on this favourite-food field than if the person had entered pizza or fries.

The only problem is that I used probability, and I might be wrong because I don’t know, and probably can’t calculate, the actual distance. I also worry about this field carrying too much or too little weight in my prediction, because its distance probably isn’t on the same scale as the other factors (price, time of day, whether the restaurant is full, etc., which I can easily quantify), but I guess I might be able to get around that with some parameter tuning.
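The probability idea in the question is close to an established technique for categorical features in nearest-neighbour methods, the Value Difference Metric (VDM) of Stanfill and Waltz: two values are close if they predict the class label in a similar way. The sketch below is a minimal illustration, not a definitive implementation; the training rows, the `ate` labels and the helper names `class_conditionals` and `vdm` are hypothetical.

```python
from collections import Counter

# Hypothetical labelled data: (favourite_food, ate_at_restaurant). Made up for illustration.
training = [
    ("burger", 1), ("burger", 1), ("burger", 0),
    ("pizza", 1), ("pizza", 0), ("pizza", 0),
    ("fries", 0), ("fries", 0), ("fries", 1),
]

def class_conditionals(rows):
    """Estimate P(class | value) for every categorical value from counts."""
    value_counts = Counter(value for value, _ in rows)
    joint = Counter(rows)  # counts of (value, class) pairs; missing pairs count as 0
    classes = sorted({label for _, label in rows})
    probs = {
        value: {c: joint[(value, c)] / n for c in classes}
        for value, n in value_counts.items()
    }
    return probs, classes

def vdm(a, b, probs, classes, q=1):
    """Value Difference Metric: sum over classes of |P(c|a) - P(c|b)| ** q."""
    return sum(abs(probs[a][c] - probs[b][c]) ** q for c in classes)

probs, classes = class_conditionals(training)
print(vdm("burger", "pizza", probs, classes))
print(vdm("pizza", "fries", probs, classes))
```

In this toy data pizza and fries end up at distance 0 because they predict the outcome identically, which is exactly what VDM is designed to capture. It needs labelled outcomes per value, so the raw popularity counts from the question (150/100/50) are not enough on their own.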

Oh, everyone posted great answers, but I can only accept one, so I’ll accept the one with the most votes tomorrow. Thank you all once again.

---

Represent all food for which you collect data as a “dimension” (or a column in a table).

Record “likes” for every person on whom you can collect data, and place the results in a table:

```
         | Burger | Pizza | Fries | Burritos | Likes my food
person1  |   1    |   0   |   1   |    1     |      1
person2  |   0    |   0   |   1   |    0     |      0
person3  |   1    |   1   |   0   |    1     |      1
person4  |   0    |   1   |   1   |    1     |      0
```
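The table is a multi-hot encoding: one 0/1 column per food. A minimal sketch of building those rows from each person's list of liked foods, assuming the four columns shown; the names `FOODS`, `encode_likes` and `people` are illustrative only.

```python
FOODS = ["Burger", "Pizza", "Fries", "Burritos"]  # column order of the table

def encode_likes(liked_foods, foods=FOODS):
    """Multi-hot encode: 1 if the person likes the food, else 0, one column per food."""
    liked = set(liked_foods)
    return [1 if food in liked else 0 for food in foods]

# Rebuild the rows of the table above from lists of liked foods.
people = {
    "person1": encode_likes(["Burger", "Fries", "Burritos"]),
    "person2": encode_likes(["Fries"]),
    "person3": encode_likes(["Burger", "Pizza", "Burritos"]),
    "person4": encode_likes(["Pizza", "Fries", "Burritos"]),
}
```

Adding a new food just means adding a column, which is why this representation scales to any number of menu items.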

Now, given a new person and information about some of the foods they like, you can measure their similarity to other people using a simple measure such as the Pearson correlation coefficient or cosine similarity.
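Both measures can be sketched in plain Python on rows of the table; this is a minimal illustration (library versions exist, e.g. in NumPy/SciPy), and the choice to return 0.0 for all-zero or constant rows is an assumption, since both measures are undefined there.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # undefined for an all-zero vector; treat as "no similarity"
    return dot / (norm_a * norm_b)

def pearson(a, b):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    sd_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    sd_b = math.sqrt(sum((y - mean_b) ** 2 for y in b))
    if sd_a == 0 or sd_b == 0:
        return 0.0  # undefined when a row is constant; treat as "no correlation"
    return cov / (sd_a * sd_b)

person1 = [1, 0, 1, 1]
person3 = [1, 1, 0, 1]
print(cosine_similarity(person1, person3))  # 2 / 3
print(pearson(person1, person3))            # -1 / 3
```

The two can disagree: person1 and person3 share two likes (positive cosine), but Pearson centres each row on its mean first, so their mismatched Pizza and Fries columns dominate and the correlation is negative.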

Now you have a way to find the K nearest neighbors and make a decision.
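Putting it together, a minimal KNN sketch that ranks the table's rows by cosine similarity to a new person and takes a majority vote on "Likes my food". The value `k=3` and the name `knn_predict` are arbitrary choices for illustration; an odd `k` avoids tied votes for a yes/no label.

```python
import math
from collections import Counter

# Rows from the table: (Burger, Pizza, Fries, Burritos) -> Likes my food
TRAINING = [
    ([1, 0, 1, 1], 1),
    ([0, 0, 1, 0], 0),
    ([1, 1, 0, 1], 1),
    ([0, 1, 1, 1], 0),
]

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0  # 0.0 if either vector is all zeros

def knn_predict(query, training=TRAINING, k=3):
    """Majority vote among the k training rows most similar to the query."""
    ranked = sorted(training, key=lambda row: cosine_similarity(query, row[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

print(knn_predict([1, 0, 0, 1]))  # likes Burger and Burritos
print(knn_predict([0, 0, 1, 0]))  # likes only Fries
```

Because similarity (higher is closer) is used instead of distance, the rows are sorted in descending order; with a distance you would sort ascending.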

For more advanced information on this, look up “collaborative filtering” (but I’ll warn you, it gets math-y).

---

Well, ‘nearest’ implies that you have some metric by which things can be more or less ‘distant’. Quantifying ‘burger’, ‘pizza’, and ‘fries’ isn’t so much a KNN problem as a matter of fundamental system modeling. If you have a system whose analysis uses ‘burger’, ‘pizza’, and ‘fries’ as terms, the reason the system exists will determine how they’re quantified. For example, if you’re trying to figure out how to get the best taste and the fewest calories for a given amount of money, then ta-da, you know what your metrics are. (Of course, ‘best taste’ is subjective, but that’s another set of issues.)

It’s not up to these terms to have inherent quantifiability and thereby to tell you how to design your system of analysis; it’s up to you to decide what you’re trying to accomplish and design metrics from there.
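Once the goal fixes the metrics, each food becomes a numeric vector and ordinary distances apply. A minimal sketch using the taste/calories/price example: every number below is made up for illustration, and min-max scaling is one common choice (an assumption here) that also addresses the question's worry about fields on different scales.

```python
import math

# Hypothetical per-food metrics; the values are invented, not real nutrition data.
FOODS = {
    "burger": {"calories": 500, "price": 6.0, "taste": 8},
    "pizza":  {"calories": 300, "price": 3.0, "taste": 7},
    "fries":  {"calories": 350, "price": 2.5, "taste": 6},
}
METRICS = ["calories", "price", "taste"]

def min_max_scale(items, metrics):
    """Rescale each metric to [0, 1] so no single unit dominates the distance."""
    scaled = {name: {} for name in items}
    for m in metrics:
        values = [item[m] for item in items.values()]
        lo, hi = min(values), max(values)
        for name, item in items.items():
            scaled[name][m] = (item[m] - lo) / (hi - lo) if hi != lo else 0.0
    return scaled

def euclidean(a, b, metrics):
    """Straight-line distance between two items over the chosen metrics."""
    return math.sqrt(sum((a[m] - b[m]) ** 2 for m in metrics))

scaled = min_max_scale(FOODS, METRICS)
print(euclidean(scaled["burger"], scaled["pizza"], METRICS))
print(euclidean(scaled["pizza"], scaled["fries"], METRICS))
```

Without scaling, calories (hundreds) would swamp price and taste (single digits). Picking different metrics for a different goal would give a different notion of "near".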

---

This is one of the problems of knowledge representation in AI. Subjectivity plays a big part. Would you and I agree, for example, on the “closeness” of a burger, pizza and fries?

You’d probably need a lookup matrix containing the items to be compared. You may be able to reduce this matrix if you can assume transitivity, but I think even that would be uncertain in your example.
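A lookup matrix can be as simple as a dictionary keyed by unordered pairs. In this minimal sketch the distances are hypothetical, hand-assigned values (0 = identical, 1 = maximally different), and `food_distance` is an illustrative name.

```python
# Hypothetical hand-assigned distances between foods (0 = identical, 1 = very different).
PAIR_DISTANCES = {
    frozenset({"burger", "pizza"}): 0.6,
    frozenset({"burger", "fries"}): 0.3,
    frozenset({"pizza", "fries"}): 0.8,
}

def food_distance(a, b):
    """Look up the distance between two foods; identical foods are at distance 0."""
    if a == b:
        return 0.0
    return PAIR_DISTANCES[frozenset({a, b})]  # frozenset key makes the lookup symmetric

print(food_distance("pizza", "burger"))
```

Using `frozenset` keys encodes symmetry, so n items need only n(n-1)/2 entries instead of a full n×n matrix. A missing pair raises `KeyError`, which makes gaps in the matrix obvious.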

The key may be to try and determine the feature that you are trying to compare on. For example, if you were comparing your food items on health, you may be able to get at something more objective.

---

If you look at “Collective Intelligence”, you’ll see that they assign a scale and a value. That’s how Netflix compares movie rankings and the like.

You’ll have to define “nearness” by coming up with that scale and assigning values for each.
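One way to read "define a scale and assign values" is star ratings: each person rates foods on a 1 to 5 scale, and people are compared only on the items both have rated. A minimal sketch with hypothetical ratings; converting Euclidean distance to a similarity with 1 / (1 + d) is one common convention, assumed here rather than taken from any particular source.

```python
import math

# Hypothetical 1-5 star ratings; not everyone rates every food.
ratings = {
    "alice": {"burger": 5, "pizza": 3, "fries": 4},
    "bob":   {"burger": 4, "pizza": 2},
    "carol": {"pizza": 5, "fries": 1},
}

def rating_similarity(a, b):
    """Similarity in (0, 1] from Euclidean distance over the items both people rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0  # nothing in common to compare
    dist = math.sqrt(sum((a[item] - b[item]) ** 2 for item in shared))
    return 1 / (1 + dist)

print(rating_similarity(ratings["alice"], ratings["bob"]))
print(rating_similarity(ratings["bob"], ratings["carol"]))
```

A known weakness of this convention is that two people who share only one rated item can look very similar, so in practice you may want to require a minimum number of shared ratings.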

---

I would actually present pairs of these attribute values to users and ask them to rate their proximity, on a scale ranging from “synonymous” to “very foreign”. If many people do this, you will end up with a widely accepted proximity function for these non-linear (categorical) attribute values.
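Aggregating such survey answers into a distance table can be sketched as follows. The responses, the 0 to 4 scale (0 = synonymous, 4 = very foreign) and the name `build_proximity` are hypothetical; averaging is one choice, and the median would be more robust to outlier raters.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical survey answers: (item_a, item_b, score), 0 = synonymous, 4 = very foreign.
responses = [
    ("burger", "pizza", 2), ("burger", "pizza", 3),
    ("burger", "fries", 1), ("burger", "fries", 0),
    ("pizza", "fries", 3), ("fries", "pizza", 4),
]
SCALE_MAX = 4

def build_proximity(responses, scale_max=SCALE_MAX):
    """Average each pair's scores and normalise them to a distance in [0, 1]."""
    scores = defaultdict(list)
    for a, b, score in responses:
        scores[frozenset({a, b})].append(score)  # order of the pair doesn't matter
    return {pair: mean(vals) / scale_max for pair, vals in scores.items()}

proximity = build_proximity(responses)
print(proximity[frozenset({"burger", "pizza"})])
```

The resulting dictionary can serve directly as the lookup table for a categorical field in a KNN distance.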

---

There is no “best” way to do this. Ultimately, you need to come up with an arbitrary scale.

---

Good answers. You could just make up a metric, or, as malach suggests, ask some people. To really do it right, it sounds like you need Bayesian analysis.