UTF-8 BOM crashes gb.form.editor

#1 (In Topic #1489)
Regular
sergioabreu is in the usergroup ‘Regular’
Hello

UTF-8 allows an optional BOM (byte order mark) in the first 3 bytes of a UTF-8 file.

These bytes are \xEF\xBB\xBF, and they act as a "signature" that the document is formally UTF-8.

Although the BOM is optional in UTF-8, some documents have it, so I think Mr Benoit should handle it: if a document contains it, gb.form.editor crashes. The crash message says it can't render the "image". I am not sure, but it seems that Gambas tries to convert the BOM into a visible UTF-8 symbol.

So the solution is to SKIP these 3 bytes if they are present at the beginning of a document.

Code

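' Suppose you have read the file into a variable called data.
' Left() and Mid() count bytes, not UTF-8 characters, so 3 means 3 bytes.
' "=" compares exactly; "==" would be a case-insensitive comparison.
If Left(data, 3) = "\xEF\xBB\xBF" Then
  data = Mid(data, 4) ' Skip the BOM so the data is safe for Gambas
Endif
' From here on, data is "clean"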
This post is kind of a hidden bug report. I am NOT a critic of Gambas at all; quite the opposite: I am an enthusiast and want to help make Gambas better and better.

Regards.

Sergio Abreu - Brazil

#2
Administrator
sholzy is in the usergroup ‘unknown’

Thanks for the heads up! That information might help out someone.

If you think it's a bug, then you need to report it on the Gambas bug tracker. Reporting there will be the only way it can get fixed.

sholzy
Gambas One Site Director

To report bugs in the Gambas IDE:
Official Gambas Bug Tracker

#3
Guru
BruceSteers is in the usergroup ‘Guru’
TextEditor has now been made to handle the useless UTF-8 BOM
TextEditor: Support for BOM character. (0df59ada) · Commits · Gambas / gambas · GitLab

Benoit said this…

Benoit Minisini said

I don't get an error with a file starting with UTF-8 BOM, just an
invisible character at the beginning of the first line.

Note that BOM is a Windows thing created by moronic developers that did
not understand UTF-8. BOM is useless in UTF-8, as there is no byte order
in UTF-8.