php – Searching keywords (from a matrix) in a string (around 500 chars) – Education Career Blog

Hey, basically what I am trying to do is automatically assign tags to a user input string. I have 5 tags to assign, and each tag has around 10 keywords. A string can only be assigned one tag. To assign a tag to a string, I need to search it for words matching the keywords of all five tags.
Example:

TAGS:     Keywords
Drink:    Beer, whiskey, drinks, drink, pint, peg.....
Fitness:  gym, yoga, massage, exercise......
Apparels: men's shirt, shirt, dress......
Music:    classical, western, sing, salsa.....
Food:     meal, grilled, baked, delicious.......

User String: Take first step to reach your fitness goals, Pay Rs 199 for Aerobics, Yoga, Kick Boxing, Bollywood Dance and more worth Rs 1000 at The very Premium F Chisel Bounce, Koramangala.


Now I need to decide on a tag for the above string, and I need a time-efficient algorithm for this. I don't know how to go about matching keywords against strings, but I do have a thought about deciding the tag: maintain a count for each tag and increase the respective count whenever a keyword is matched. If at any time the count for any tag reaches 5, we can stop and decide on that tag, which saves us from searching the rest of the string.

Please give any advice you have on this. I will be using PHP, just so you know.
Thanks

,

Interesting topic! What you are looking for is something similar to latent semantic indexing. There is a question about it here.

,

If the number of tags and keywords is small, I would skip writing a complex algorithm and simply do:

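$tags = array(
    'drink' => array('beer', 'whiskey', ...),
    ...
);
$string = 'Take first step ...';
// substr_count() is case-sensitive, so compare in lowercase
// (the keywords above are assumed to be lowercase already).
$haystack = strtolower($string);
$bestTag = '';
$bestTagCount = 0;
foreach ($tags as $tag => $keywords) {
    // Count how often any keyword of this tag occurs in the string.
    $count = 0;
    foreach ($keywords as $keyword) {
        $count += substr_count($haystack, $keyword);
    }
    // Keep the tag with the most hits so far.
    if ($count > $bestTagCount) {
        $bestTagCount = $count;
        $bestTag = $tag;
    }
}
var_dump($bestTag);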

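The algorithm is pretty obvious, but it scans the whole string once per keyword, so it is only suited for a small number of tags/keywords. Note that substr_count() also matches inside longer words, so 'peg' would be counted in 'pegged'.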
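If the keyword lists grow, one possible alternative is to split the string into words once and look each word up in a keyword-to-tag table, which also makes the early stop from the question easy to add. The following is only a rough sketch under some assumptions: the function names buildKeywordIndex and pickTag are made up for illustration, only single-word keywords are matched (a phrase like "men's shirt" would need extra handling), each keyword belongs to one tag, and PHP 7.3+ is assumed for array_key_first().

```php
<?php
// Hypothetical helper: builds a lookup table from lowercase keyword to tag.
// If a keyword appears under several tags, the last tag wins.
function buildKeywordIndex(array $tags): array
{
    $index = [];
    foreach ($tags as $tag => $keywords) {
        foreach ($keywords as $keyword) {
            $index[strtolower($keyword)] = $tag;
        }
    }
    return $index;
}

// Hypothetical helper: returns the tag with the most keyword hits,
// or '' if nothing matched. $stopAt is the early-exit threshold
// suggested in the question (5 there).
function pickTag(string $text, array $index, int $stopAt = 5): string
{
    $counts = [];
    // Split on anything that is not a letter, digit or apostrophe.
    $words = preg_split("/[^a-z0-9']+/", strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    foreach ($words as $word) {
        if (!isset($index[$word])) {
            continue; // not a keyword of any tag
        }
        $tag = $index[$word];
        $counts[$tag] = ($counts[$tag] ?? 0) + 1;
        if ($counts[$tag] >= $stopAt) {
            return $tag; // early exit: this tag reached the threshold
        }
    }
    if (!$counts) {
        return '';
    }
    arsort($counts); // highest count first
    return array_key_first($counts);
}

// Usage with two of the keyword lists from the question.
$tags = [
    'drink'   => ['beer', 'whiskey', 'drinks', 'drink', 'pint', 'peg'],
    'fitness' => ['gym', 'yoga', 'massage', 'exercise'],
];
$index = buildKeywordIndex($tags);
echo pickTag('Pay Rs 199 for Aerobics, Yoga, Kick Boxing', $index), "\n";
```

With this layout each word costs one array lookup, so the running time depends on the length of the string rather than on the number of keywords. When two tags end up with the same count, which one is returned is not defined in this sketch, so a tie-breaking rule would have to be added if that matters.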

,

If you don't mind using an external API, you could try a tagging service such as Zemanta or Open Calais.

To give an example, Zemanta will return the following tags (among other things) for your User String:

Bollywood, Kickboxing, Koramangala, Aerobics, Boxing, Sports, India, Asia

Open Calais will return

Sports, Hospitality Recreation, Health, Recreation, Human behavior, Kick, Yoga, Chisel
Aerobics, Meditation, Indian philosophy, Combat sports, Aerobic exercise, Exercise
