GREEDY: complex patterns for pcre matching
Posted
#1
(In Topic #1289)
Regular

For instance, "((\W)(bla)(\W))" should return 4 submatches.
Is that ONLY available in Replace?
Can I pass a callback to Replace to deal with these groups?
Thanks
Posted
Guru

/comp/gb.pcre - Gambas Documentation
/comp/gb.pcre/regexp - Gambas Documentation
/comp/gb.pcre/regexp/replace - Gambas Documentation
/comp/gb.pcre/regexp/submatches - Gambas Documentation
There is also Match and Like (Match is more like PCRE; Like uses Gambas's own syntax).
/lang/match - Gambas Documentation
/lang/like - Gambas Documentation
But it sounds like you want the features RegExp.class offers.
Posted
Regular

I have a string with UTF-8 chars and I want PCRE to identify the non-ASCII chars.
"(([\xc2\xc3][\x80-\xbf]))++" catches "açé ã" grouped together. I want a separated list like:
3 matches:
"ç"
"é"
"ã"
I know it needs the correct level of "greediness", which I don't know.
In the Gambas console it accepts one backslash, [\xc2], but in code I must use two: "\\xc2".
Would you know the exact pattern to match "one non-ASCII char" many times, knowing that the UTF-8 prefix byte is \xc2 or \xc3 and the range of the second byte is [\x80-\xbf]?
I know that I should not be using PCRE on UTF-8, but if it can match the whole "açé ã" … and I know PCRE can match a single \xc3 …
If someone knows the exact pattern, I would be thankful. In other languages, RegExp is simpler. A simple "[\x80-\xff]+" shoul
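For comparison, here is a minimal sketch of the "one non-ASCII char, many times" idea in Python rather than Gambas, since only the regex behavior matters here. It assumes the text is UTF-8 and contains only two-byte sequences with lead byte \xc2 or \xc3, as described above. The name `split_non_ascii` is hypothetical. The pattern has no quantifier at all; the repetition comes from asking the engine for every match, not from `+` inside the pattern.

```python
import re

# One two-byte UTF-8 sequence: lead byte 0xC2 or 0xC3, then one continuation byte.
# There is no quantifier, so each character is reported as its own match.
NON_ASCII = re.compile(rb"[\xc2\xc3][\x80-\xbf]")

def split_non_ascii(text):
    """Return every non-ASCII character of `text` as a separate list item."""
    data = text.encode("utf-8")  # match on the raw bytes, as in the thread
    return [m.decode("utf-8") for m in NON_ASCII.findall(data)]

print(split_non_ascii("açé ã"))  # ['ç', 'é', 'ã']
```

Whether gb.pcre offers a single "find all" call is not something this sketch assumes; it only shows that the ungrouped list comes from repeated matching of a one-character pattern.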
Posted
Regular

BruceSteers said
Check out many of the other commands in RegExp class if you want to do proper regex matching
I have read all of them before.
I am not looking for a "test" match, that's easy.
I want to CAPTURE the matched chars in the result!
I am getting char++ (all together) and I wonder if a better pattern would make RegExp COUNT go up and give SUBMATCHES of the chars individually.
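A short sketch of why the count may not go up, shown in Python because its `re` module follows the same rule as PCRE here: a capture group that sits under a quantifier keeps only the text of its last repetition, so the number of groups is fixed by the pattern, not by how many characters were matched. The function name `repeated_group_captures` is hypothetical, and a plain `+` is used in place of the possessive `++`, which older Python versions do not accept.

```python
import re

# Same shape as the pattern in the thread: a capture group repeated with "+".
GROUPED = re.compile(rb"(([\xc2\xc3][\x80-\xbf]))+")

def repeated_group_captures(text):
    """Return (whole match, content of group 1) for a run of non-ASCII chars."""
    m = GROUPED.match(text.encode("utf-8"))
    whole = m.group(0).decode("utf-8")      # everything the repetition consumed
    last_only = m.group(1).decode("utf-8")  # only the final iteration survives
    return whole, last_only

print(repeated_group_captures("çé"))  # ('çé', 'é')
```

Group 1 ends up holding "é" only; the earlier "ç" is overwritten by the next repetition. Adding more layers of parentheses adds more groups, but each one still reports a single value.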
Posted
Regular

Close to a solution:
Found the solution. I must mix levels of greediness in MANY LAYERS to achieve what I wanted:
Code
"((([\\xa0-\\xc3])\{1,2})+)++"
Posted
Regular

GREEDY: How to capture grouping/individual matches?
I could solve this with a loop, but I imagine finding and counting these chars in a big text is what RegExp was made for. Maybe the approach is RegExp mixed with a loop if needed.
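The "regexp mixed with a loop" approach could look roughly like the following, sketched in Python as an illustration and not as Gambas code. The regex finds one character at a time and the loop does the counting, which stays linear in the size of the text. `count_non_ascii` is a hypothetical name, and the pattern again assumes only two-byte UTF-8 sequences starting with \xc2 or \xc3.

```python
import re
from collections import Counter

# Single-character pattern; the loop below supplies the repetition.
NON_ASCII = re.compile(rb"[\xc2\xc3][\x80-\xbf]")

def count_non_ascii(text):
    """Return (Counter of chars, list of (byte offset, char)) for `text`."""
    data = text.encode("utf-8")
    counts = Counter()
    offsets = []
    # finditer resumes the search right after the end of the previous match
    for m in NON_ASCII.finditer(data):
        ch = m.group(0).decode("utf-8")
        counts[ch] += 1
        offsets.append((m.start(), ch))
    return counts, offsets

counts, offsets = count_non_ascii("açé ã ç")
print(counts["ç"], offsets)  # 2 [(1, 'ç'), (3, 'é'), (6, 'ã'), (9, 'ç')]
```

The offsets are byte positions in the UTF-8 data, not character positions, which matters when slicing the original string afterwards.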
Posted
Regular

Code
"((([\\xa0-\\xc3][\\x80-\\xbf])+?)+?)+"
It gives me (for example):
"çé"
"ç"
"õ"
Ideally the first one would be ungrouped too, but this is very close to what I want.


