php – Searching keywords (from a matrix) in a string (around 500 chars)

Hey, basically what I am trying to do is automatically assign tags to a user input string. I have 5 tags to assign, and each tag has around 10 keywords. A string can be assigned only one tag. To assign a tag to a string, I need to search it for words matching the keywords of all five tags.

TAGS:     Keywords
Drink:    Beer, whiskey, drinks, drink, pint, peg.....
Fitness:  gym, yoga, massage, exercise......
Apparels: men's shirt, shirt, dress......
Music:    classical, western, sing, salsa.....
Food:     meal, grilled, baked, delicious.......

User String: Take first step to reach your fitness goals, Pay Rs 199 for Aerobics, Yoga, Kick Boxing, Bollywood Dance and more worth Rs 1000 at The very Premium F Chisel Bounce, Koramangala.

Now I need to decide on a tag for the above string, and I need a time-efficient algorithm for this. I don't know how to go about matching keywords against the string, but I do have a thought about deciding the tag: maintain a count for each tag, and increase the respective tag's count whenever one of its keywords is matched. If at any time the count for any tag reaches 5, we can stop and pick that tag, which saves us from searching the whole string.
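The counting idea above could be sketched roughly like this. It is only an illustration under assumptions: the function name `assignTag`, the `$threshold` parameter, the fallback to the highest count, and the choice to split the text into single lower-cased words are all made up for this example, and multi-word keywords such as "men's shirt" would need extra handling.

```php
<?php
// Hypothetical helper, not from the original post.
// Counts keyword hits per tag and stops as soon as one tag reaches $threshold.
function assignTag(string $text, array $tags, int $threshold = 5): ?string
{
    // Build a keyword => tag lookup once, so each word costs one array lookup.
    // Assumption: a keyword belongs to one tag only (a later tag would overwrite it).
    $keywordToTag = [];
    foreach ($tags as $tag => $keywords) {
        foreach ($keywords as $keyword) {
            $keywordToTag[strtolower($keyword)] = $tag;
        }
    }

    $counts = array_fill_keys(array_keys($tags), 0);

    // Split on anything that is not a letter, digit or apostrophe.
    $words = preg_split("/[^a-z0-9']+/", strtolower($text), -1, PREG_SPLIT_NO_EMPTY);

    foreach ($words as $word) {
        if (!isset($keywordToTag[$word])) {
            continue;
        }
        $tag = $keywordToTag[$word];
        if (++$counts[$tag] >= $threshold) {
            return $tag; // early exit: no need to scan the rest of the text
        }
    }

    // No tag reached the threshold: fall back to the highest count, if any.
    arsort($counts);
    $best = array_key_first($counts);
    return ($best !== null && $counts[$best] > 0) ? $best : null;
}

$tags = [
    'drink'   => ['beer', 'whiskey', 'drinks', 'drink', 'pint', 'peg'],
    'fitness' => ['gym', 'yoga', 'massage', 'exercise'],
];

echo assignTag('Pay Rs 199 for Aerobics, Yoga, Kick Boxing and more', $tags), "\n";
```

With only a handful of keywords per tag, the early exit will rarely trigger on a 500-character string, so the fallback to the highest count probably matters more in practice than the threshold itself.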

Please give any advice you have on this. I will be using PHP, just so you know.


Interesting topic! What you are looking for is something similar to latent semantic indexing. There is a question about it here.


If the number of tags and keywords is small, I would save myself writing a complex algorithm and simply do:

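$tags = array(
    'drink' => array('beer', 'whiskey', /* ... */),
    // ... one entry per tag
);
$string = 'Take first step ...';
// substr_count() is case-sensitive, so compare in lower case.
$haystack = strtolower($string);
$bestTag = '';
$bestTagCount = 0;
foreach ($tags as $tag => $keywords) {
    // Total number of keyword occurrences for this tag.
    $count = 0;
    foreach ($keywords as $keyword) {
        $count += substr_count($haystack, $keyword);
    }
    // Keep the tag with the highest count seen so far.
    if ($count > $bestTagCount) {
        $bestTagCount = $count;
        $bestTag = $tag;
    }
}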

The algorithm is pretty obvious, but only suited for a small number of tags/keywords.
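One caveat with plain substring counting is that a keyword is also counted inside longer words: 'drink' is found inside 'drinks', and 'peg' would be found inside a word like 'Winnipeg'. A possible variation, sketched here with a hypothetical helper named `countKeyword` (not part of the answer above), is a case-insensitive regular expression with word boundaries:

```php
<?php
// Hypothetical helper: counts whole-word, case-insensitive occurrences of $keyword.
function countKeyword(string $text, string $keyword): int
{
    // preg_quote() escapes regex metacharacters in the keyword;
    // \b anchors the match at word boundaries; the i flag ignores case.
    $pattern = '/\b' . preg_quote($keyword, '/') . '\b/i';
    return (int) preg_match_all($pattern, $text);
}

echo countKeyword('Yoga and more yoga', 'yoga'), "\n"; // 2
echo countKeyword('Two drinks', 'drink'), "\n";        // 0
```

Swapping this in for substr_count() in the inner loop should also handle multi-word keywords such as "men's shirt", at the cost of one regex scan per keyword.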


If you don't mind using an external API, you could try one such as Zemanta or Open Calais.

To give an example, Zemanta will return the following tags (among other things) for your User String:

Bollywood, Kickboxing, Koramangala, Aerobics, Boxing, Sports, India, Asia

Open Calais will return:

Sports, Hospitality Recreation, Health, Recreation, Human behavior, Kick, Yoga, Chisel
Aerobics, Meditation, Indian philosophy, Combat sports, Aerobic exercise, Exercise
