I need to extract some data from malformed XML stored in an Oracle database. The XPath expression would look like this: //image/type/text(). A regular expression that would work in a similar fashion is <image>.*?<type>(.+?)<\/type> (with the appropriate flag so that . also matches newlines).

Since Oracle's REGEXP_SUBSTR does not support match groups in any form, I am unsure how to extract a set (with potentially n > 1 members) of match groups from an Oracle CLOB column. Any ideas?


AFAIK you can't extract a set with Oracle's regex functions directly, because REGEXP_SUBSTR returns a single match per call. As a workaround, you can iterate through the string, calling REGEXP_SUBSTR with an increasing occurrence argument and saving each result to a collection (or whatever you need), something like this:

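fOccurrence := 1;  -- occurrences are 1-based; 0 is out of range and raises an error
loop
  -- 'n' lets '.' match newlines, 'i' ignores case ('g' is not a valid Oracle flag)
  fSubstr := REGEXP_SUBSTR(fSourceStr, '<image>.*?<type>(.+?)</type>', 1, fOccurrence, 'in');
  exit when fSubstr is null;
  -- note: fSubstr is the whole match (<image>...</type>), not just the group
  fResultStr := fResultStr || fSubstr;
  fOccurrence := fOccurrence + 1;
end loop;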
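For completeness, here is a minimal self-contained sketch of the same loop that stores each result in a PL/SQL collection instead of concatenating. It assumes Oracle 11g or later, where REGEXP_SUBSTR accepts a sixth subexpression argument: passing 1 returns only the (.+?) group rather than the whole match. On 10g that argument does not exist, so you would have to strip the tags from each match yourself. The sample document, the type name t_string_list and the variable names are made up for illustration.

```sql
-- Sketch only: assumes Oracle 11g+ (6th "subexpr" argument of REGEXP_SUBSTR).
-- The sample XML, t_string_list and the l_* variables are hypothetical.
SET SERVEROUTPUT ON
DECLARE
  TYPE t_string_list IS TABLE OF VARCHAR2(4000);
  l_source     CLOB := '<root><image><type>png</type></image>'
                    || '<image>' || CHR(10) || '<type>jpeg</type></image></root>';
  l_types      t_string_list := t_string_list();
  l_match      VARCHAR2(4000);
  l_occurrence PLS_INTEGER := 1;  -- occurrence numbering starts at 1
BEGIN
  LOOP
    -- 'n': '.' also matches newlines; last argument 1: return only group 1
    l_match := REGEXP_SUBSTR(l_source, '<image>.*?<type>(.+?)</type>',
                             1, l_occurrence, 'n', 1);
    EXIT WHEN l_match IS NULL;
    l_types.EXTEND;                    -- grow the collection by one slot
    l_types(l_types.LAST) := l_match;  -- store the captured group
    l_occurrence := l_occurrence + 1;
  END LOOP;

  -- print each captured value on its own line
  FOR i IN 1 .. l_types.COUNT LOOP
    DBMS_OUTPUT.PUT_LINE(l_types(i));
  END LOOP;
END;
/
```

The final loop only demonstrates the result; replace it with whatever processing you need (inserting into a table, returning the collection from a function, and so on). SET SERVEROUTPUT ON is a SQL*Plus/SQL Developer command and can be dropped elsewhere.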

