GREEDY: complex patterns for pcre matching

Post

Posted July 21st 2024, 8:48 AM

#1 (In Topic #1289)

Regular

Can the pcre RegExp class capture groups, or the individual chars matched?

For instance "((\W)(bla)(\W))" should return 4 submatches.

Is that ONLY available in Replace?

Can I pass a callback to Replace to deal with these groups?

Thanks
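
To illustrate what "4 submatches" means for this pattern, here is a minimal sketch. It uses Python's `re` module, not gb.pcre, purely so that it is runnable; the helper name `submatches` is hypothetical. Groups are numbered by the position of their opening parenthesis, which is also how PCRE numbers them, so the outer group comes first.

```python
import re

# Pattern from the question: one outer group wrapping three inner groups.
PATTERN = re.compile(r"((\W)(bla)(\W))")

def submatches(subject: str) -> list[str]:
    """Return the captured groups of the first match, or [] if there is none."""
    m = PATTERN.search(subject)
    return list(m.groups()) if m else []

# Groups in order: the whole " bla ", the leading non-word char,
# the literal "bla", and the trailing non-word char.
print(submatches("x bla y"))
```

Whether the Gambas RegExp class exposes the same four values through its SubMatches property is something to check in the gb.pcre documentation; the sketch only shows the numbering rule.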

Post

Posted July 21st 2024, 2:28 PM

BruceSteers

Banned

Check out the other commands in the RegExp class if you want to do proper regex matching:
/comp/gb.pcre - Gambas Documentation
/comp/gb.pcre/regexp - Gambas Documentation

/comp/gb.pcre/regexp/replace - Gambas Documentation
/comp/gb.pcre/regexp/submatches - Gambas Documentation

There are also Match and Like (Match is more like pcre; Like uses Gambas's own syntax):
/lang/match - Gambas Documentation
/lang/like - Gambas Documentation

But it sounds like you want the features RegExp.class offers.

Post

Posted July 21st 2024, 6:28 PM

sergioabreu

Regular

Would you help with a specific pattern IN GAMBAS?

I have a string with UTF-8 chars and I want pcre to identify the non-ASCII chars.

"(([\xc2\xc3][\x80-\xbf]))++" matches the chars of "açé ã" grouped together. I want them listed separately, like:

3 matches:
"ç"
"é"
"ã"
I know the pattern needs the correct level of "greediness", which I don't know.

In the Gambas console it accepts one backslash, [\xc2], but in code I must use two: "\\xc2"

Would you know the exact pattern to match "one non-ASCII char" many times, knowing that the UTF-8 prefix byte is \xc2 or \xc3 and the range of the second byte is [\x80-\xbf]?

I know that I should not be using pcre on UTF-8, but if it can match the whole "açé ã" … and I know pcre can match a single \xc3 …

If someone knows the exact pattern, I would be thankful. In other languages, RegExp is simpler.   A simple "[\x80-\xff]+" shoul
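
One way to get each character separately is to drop the outer quantifier, so that a single match covers exactly one lead byte plus one continuation byte, and then collect every successive match. The sketch below shows the idea in Python on raw UTF-8 bytes, not in Gambas. It assumes, as the post does, that only two-byte sequences with lead bytes \xc2 and \xc3 occur; the helper name `non_ascii_chars` is hypothetical.

```python
import re

# One two-byte UTF-8 sequence: lead byte C2 or C3, then one continuation byte.
# There is no outer "+" quantifier, so each match is exactly one character.
TWO_BYTE_CHAR = re.compile(rb"[\xc2\xc3][\x80-\xbf]")

def non_ascii_chars(text: str) -> list[str]:
    """List each two-byte non-ASCII character of text separately, in order."""
    raw = text.encode("utf-8")
    return [m.decode("utf-8") for m in TWO_BYTE_CHAR.findall(raw)]

print(non_ascii_chars("açé ã"))  # ['ç', 'é', 'ã']
```

In an engine without a find-all call, the same result comes from matching repeatedly and restarting the search just after the end of the previous match.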

Post

Posted July 21st 2024, 6:34 PM

sergioabreu

Regular

BruceSteers said

Check out the other commands in the RegExp class if you want to do proper regex matching:
/comp/gb.pcre - Gambas Documentation
/comp/gb.pcre/regexp - Gambas Documentation

/comp/gb.pcre/regexp/replace - Gambas Documentation
/comp/gb.pcre/regexp/submatches - Gambas Documentation

There are also Match and Like (Match is more like pcre; Like uses Gambas's own syntax):
/lang/match - Gambas Documentation
/lang/like - Gambas Documentation

But it sounds like you want the features RegExp.class offers.

I have read all of them before.
I am not looking for a "test" match; that's easy.
I want to CAPTURE the chars matched in the result!
I am getting char++ (all together), and I wonder if a better pattern would make the RegExp COUNT go up and give SUBMATCHES of the chars individually.

Post

Posted July 21st 2024, 6:53 PM

sergioabreu

Regular

Close to solution:

Found the solution

I had to mix levels of greediness in MANY LAYERS to achieve what I wanted:

Code

"((([\\xa0-\\xc3])\{1,2})+)++"

Post

Posted July 21st 2024, 7:04 PM

sergioabreu

Regular

GREEDY: How to capture grouping/individual matches?

I could solve this with a loop, but imagine finding and counting these chars in a big text; RegExp was made for that.

Maybe the approach is RegExp mixed with a loop if needed.

Post

Posted July 21st 2024, 7:23 PM

sergioabreu

Regular

This one is almost there:

Code

"((([\\xa0-\\xc3][\\x80-\\xbf])+?)+?)+"

It gives me (for example):

"çé"
"ç"
"õ"

Ideally the first one would be ungrouped too, but it is very close to what I want.
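
A likely reason nested quantifiers never fully ungroup the run: in PCRE-style engines a capture group that repeats keeps only its last iteration, so one match cannot return every character of a run as its own submatch. The sketch below shows that behaviour in Python's `re` module (not gb.pcre) and contrasts it with iterating over successive single-character matches. The offsets are byte offsets into the UTF-8 data.

```python
import re

raw = "açé ã".encode("utf-8")

# A repeated capture group keeps only its LAST iteration.
grouped = re.search(rb"([\xc2\xc3][\x80-\xbf])+", raw)
whole = grouped.group(0).decode("utf-8")  # the full run of adjacent chars
last = grouped.group(1).decode("utf-8")   # only the last char of that run

# Iterating over successive matches yields every character with its byte offset.
singles = [(m.start(), m.group().decode("utf-8"))
           for m in re.finditer(rb"[\xc2\xc3][\x80-\xbf]", raw)]

print(whole, last, singles)
```

Counting the characters in a large text is then just the length of `singles`, which matches the "regexp mixed with a loop" approach suggested earlier in the thread.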
