GREEDY: complex patterns for pcre matching

#1 (In Topic #1289)
Regular
sergioabreu is in the usergroup ‘Regular’
 Can the pcre RegExp capture groups, or the individual chars matched?

For instance, "((\W)(bla)(\W))" should return 4 submatches.

Is it ONLY available in Replace?

Can I pass a callback to Replace to deal with these groups?

Thanks
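
To make the question concrete, here is a minimal sketch of the behavior being asked for. It uses Python's `re` module, not Gambas, so it says nothing about whether gb.pcre's Replace accepts a callback; the sample string and the `swap_delimiters` helper are made up for illustration.

```python
import re

# The pattern from the question: one outer group wrapping three inner groups.
pattern = re.compile(r"((\W)(bla)(\W))")

sample = "say (bla) now"  # hypothetical input

# A single match exposes every capture group as a submatch.
m = pattern.search(sample)
submatches = m.groups()

def swap_delimiters(match):
    # Callback: rebuild the replacement text from the captured groups.
    return match.group(4) + match.group(3).upper() + match.group(2)

# In Python, sub() accepts a function that receives each match object.
replaced = pattern.sub(swap_delimiters, sample)

print(submatches)
print(replaced)
```

The four submatches are the outer group followed by the three inner ones, in the order their opening parentheses appear.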

#2
Guru
BruceSteers is in the usergroup ‘Guru’
Check out many of the other commands in RegExp class if you want to do proper regex matching
/comp/gb.pcre - Gambas Documentation
/comp/gb.pcre/regexp - Gambas Documentation

/comp/gb.pcre/regexp/replace - Gambas Documentation
/comp/gb.pcre/regexp/submatches - Gambas Documentation


There is also Match and Like (Match is more like pcre, Like is a Gambas-specific syntax)
/lang/match - Gambas Documentation
/lang/like - Gambas Documentation

But sounds like you want the features RegExp.class offers

#3
Regular
sergioabreu is in the usergroup ‘Regular’
Would you help with a specific pattern IN GAMBAS?

  I have a string with UTF-8 chars and I want pcre to identify the non-ASCII chars.

  "(([\xc2\xc3][\x80-\xbf]))++" catches them grouped together ("açé ã"). I want a separated list like:
 
 3 matches:
 "ç"
 "é"
 "ã"
 I know it needs the correct level of "greediness", which I don't know.

  In the Gambas console it accepts one backslash, [\xc2], but in code I must use two backslashes: "\\xc2"

 Would you know the exact pattern to match "one non-ASCII char" many times, knowing that the UTF-8 lead byte here is \xc2 or \xc3 and the range of the second byte is [\x80-\xbf]?

  I know that I should not be using pcre on UTF-8, but if it can match the whole "açé ã" … and I know pcre can match a single \xc3 …

  If someone knows the exact pattern, I would be thankful. In other languages, RegExp is simpler.   A simple "[\x80-\xff]+" shoul

#4
Regular
sergioabreu is in the usergroup ‘Regular’

BruceSteers said

Check out many of the other commands in RegExp class if you want to do proper regex matching
/comp/gb.pcre - Gambas Documentation
/comp/gb.pcre/regexp - Gambas Documentation

/comp/gb.pcre/regexp/replace - Gambas Documentation
/comp/gb.pcre/regexp/submatches - Gambas Documentation


There is also Match and Like (Match is more like pcre, Like is a Gambas-specific syntax)
/lang/match - Gambas Documentation
/lang/like - Gambas Documentation

But sounds like you want the features RegExp.class offers


I have read all of them before.
I am not looking for a "test" match; that's easy.
I want to CAPTURE the chars matched in the result!
I am getting char++ (all together), and I wonder if a better pattern would make the RegExp Count go up and give SubMatches of the chars individually.
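
A likely reason the chars come back "all together": in PCRE-style engines, a capturing group under a quantifier is still one group, and after the match it holds only the text of its last repetition. The sketch below shows this with Python's `re` on UTF-8 bytes. It is an illustration in another language, not gb.pcre code, and it uses a plain `+` instead of the possessive `++`.

```python
import re

# UTF-8 bytes of the sample text: b'a\xc3\xa7\xc3\xa9 \xc3\xa3'
text = "açé ã".encode("utf-8")

# One capturing group, repeated by a quantifier.
repeated = re.compile(rb"([\xc2\xc3][\x80-\xbf])+")
m = repeated.search(text)

whole = m.group(0).decode("utf-8")      # the whole run that matched
last_only = m.group(1).decode("utf-8")  # the group keeps only its last repetition
group_count = repeated.groups           # number of groups is fixed by the pattern

print(whole, last_only, group_count)
```

The number of submatches is fixed by the parentheses in the pattern, not by how many times the quantifier repeats, so adding layers of greediness does not produce one submatch per char.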

#5
Regular
sergioabreu is in the usergroup ‘Regular’

Close to solution:

Found the solution

I had to mix levels of greediness in MANY LAYERS to achieve what I wanted:

Code

"((([\\xa0-\\xc3]){1,2})+)++"

#6
Regular
sergioabreu is in the usergroup ‘Regular’

GREEDY: How capture grouping/individual Matches?

 I could solve this with a loop, but imagine finding and counting these chars in a big text; RegExp was made for that.

Maybe the approach is regexp mixed with a loop if needed.
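
The "regexp mixed with a loop" idea can be sketched as follows: drop the quantifier so that each match is exactly one two-byte sequence, then let the engine step through the text match by match. This is Python's `re`, shown only to illustrate the approach; the equivalent gb.pcre calls are not shown in this thread and are not assumed here.

```python
import re

# UTF-8 bytes of the sample text from the earlier post.
text = "açé ã".encode("utf-8")

# No quantifier: one match = one lead byte (\xc2 or \xc3) plus one continuation byte.
one_char = re.compile(rb"[\xc2\xc3][\x80-\xbf]")

# The loop happens over successive matches, not inside the pattern.
chars = [m.group(0).decode("utf-8") for m in one_char.finditer(text)]
count = len(chars)

print(count, chars)
```

Note that this pattern only covers chars whose UTF-8 lead byte is \xc2 or \xc3 (U+0080 to U+00FF), the range the thread is concerned with.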

#7
Regular
sergioabreu is in the usergroup ‘Regular’
This one is almost there:

Code

"((([\\xa0-\\xc3][\\x80-\\xbf])+?)+?)+"

It gives me (for example):

"çé"
"ç"
"õ"

Ideally the first result would be ungrouped as well, but it is very close to what I want.