I need to extract parts of a string in a particular format. I have tried
everything from split and substring to Pattern and Matcher, but every
attempt fails one of the requirements.
str = (((abc) shdj (def) iueexs (ghi)) mkek ONE(tree23) bjm (twooo(bug OR bag)) mvnj THR-EE(<*>$##))
The terms wanted are:
"Hard Coded Term1":abc "Hard Coded Term2":def "Hard Coded Term3":ghi ONE:tree23 twooo:bug,bag THR-EE:<*>$##
There should be a provision to hard-code the term names, as in the case of the first three.
Ugh, you first need to specify your requirements properly, preferably in BNF or an equivalent notation. With that out of the way, you can find the hard-coded terms with a regexp like
(^|[ (])\(([^()]*)\) (use the 2nd group), and the other terms with a regexp like
([0-9a-zA-Z_-]+)\(([^()]*)\) (use the 1st group as the name and the 2nd group as the value; you will need to process the 2nd group further to split it on the operands).
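For what it's worth, here is a rough Java sketch of the two-regexp idea. The exact patterns, the class name RegexTermExtractor, the "Hard Coded TermN" numbering and the assumption that operands are separated by " OR " are all my own guesses from the single example, not a confirmed spec.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexTermExtractor {
    // Assumed pattern: a bare "(value)" preceded by start of input, a space or "(".
    private static final Pattern BARE =
            Pattern.compile("(^|[ (])\\(([^()]*)\\)");
    // Assumed pattern: "NAME(value)" where NAME is letters, digits, '-' or '_'.
    private static final Pattern NAMED =
            Pattern.compile("([0-9a-zA-Z_-]+)\\(([^()]*)\\)");

    static List<String> extract(String input) {
        List<String> terms = new ArrayList<>();

        // Pass 1: bare groups get a hard-coded, numbered name.
        Matcher bare = BARE.matcher(input);
        int n = 0;
        while (bare.find()) {
            n++;
            terms.add("\"Hard Coded Term" + n + "\":" + bare.group(2));
        }

        // Pass 2: named groups; split the value on " OR " (assumed operator).
        Matcher named = NAMED.matcher(input);
        while (named.find()) {
            String value = String.join(",", named.group(2).split("\\s+OR\\s+"));
            terms.add(named.group(1) + ":" + value);
        }
        return terms;
    }

    public static void main(String[] args) {
        String str = "(((abc) shdj (def) iueexs (ghi)) mkek ONE(tree23) "
                + "bjm (twooo(bug OR bag)) mvnj THR-EE(<*>$##))";
        System.out.println(String.join(" ", extract(str)));
    }
}
```

Note that this lists all hard-coded terms first and all named terms second, so it only preserves the original order when the input happens to be laid out that way, as in the example.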
You’re in the neighborhood of doing language parsing. Just looking at it, it looks doable with a recursive descent parser, but with that one short example it’s hard to tell for sure.
The tricky part looks to be distinguishing
shdj (def) which should result in a “hard coded term ‘def’” from
ONE(tree23) which should return “ONE:tree23”.
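As a rough illustration of the recursive descent idea, here is a minimal Java sketch. It rests on a grammar I am guessing from the one example: a word immediately followed by "(" names the group, a group with no nested parentheses is a leaf term, any other word is filler, and operands are separated by " OR ". The class name TermParser and the "Hard Coded TermN" numbering are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;

class TermParser {
    private final String src;
    private int pos = 0;
    private int hardCoded = 0;
    private final List<String> terms = new ArrayList<>();

    private TermParser(String src) { this.src = src; }

    static List<String> parse(String src) {
        TermParser p = new TermParser(src);
        p.items();
        return p.terms;
    }

    // items := ( group | NAME group | filler-word )*  until ')' or end of input
    private void items() {
        while (pos < src.length() && src.charAt(pos) != ')') {
            char c = src.charAt(pos);
            if (c == '(') {
                group(null);                    // bare group, e.g. "(def)"
            } else if (Character.isWhitespace(c)) {
                pos++;
            } else {
                String word = word();
                if (pos < src.length() && src.charAt(pos) == '(') {
                    group(word);                // named group, e.g. "ONE(tree23)"
                }
                // otherwise the word is filler (e.g. "shdj") and is ignored
            }
        }
    }

    // group := '(' text ')'   (leaf, emits a term)
    //        | '(' items ')'  (nested, recurses; a name on it is ignored here)
    private void group(String name) {
        pos++;                                  // consume '('
        int i = pos;
        while (i < src.length() && src.charAt(i) != '(' && src.charAt(i) != ')') {
            i++;
        }
        if (i < src.length() && src.charAt(i) == ')') {
            String value = src.substring(pos, i).trim();
            pos = i + 1;                        // consume ')'
            if (name == null) {
                terms.add("\"Hard Coded Term" + (++hardCoded) + "\":" + value);
            } else {
                terms.add(name + ":" + String.join(",", value.split("\\s+OR\\s+")));
            }
        } else {
            items();
            if (pos < src.length()) {
                pos++;                          // consume ')'
            }
        }
    }

    // Reads characters up to the next whitespace or parenthesis.
    private String word() {
        int start = pos;
        while (pos < src.length() && !Character.isWhitespace(src.charAt(pos))
                && src.charAt(pos) != '(' && src.charAt(pos) != ')') {
            pos++;
        }
        return src.substring(start, pos);
    }
}
```

Calling TermParser.parse(str) on the string from the question should yield the six wanted terms in source order. The difference between shdj (def) and ONE(tree23) falls out of whether a word touches the opening parenthesis, which is exactly the kind of rule a real spec would need to confirm.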