UTF8 BOM mark crashes gb.form.editor
Posted
#1
(In Topic #1489)
Regular

UTF-8 allows an optional BOM (byte order mark) in the first 3 bytes of a file.
These bytes are \xEF\xBB\xBF, and they act as a "signature" that the document is formally UTF-8.
Although the BOM is optional in UTF-8, some documents have it, so I think it should be handled by Mr Benoit, because if a document contains it, it will crash gb.form.editor. The crash message says it can't render the "image". I am not sure, but it seems that Gambas tries to convert the BOM into a visible UTF-8 symbol.
So the solution is to SKIP these 3 bytes if they are present at the beginning of a document.
Code
' Suppose you got the data from the file in a variable called data:
If Left(data, 3) = "\xEF\xBB\xBF" Then ' "=" is an exact match; "==" compares case-insensitively
  data = Mid(data, 4) ' Skips the BOM, making the data safe for Gambas
Endif
' From here on the data will be "clean"
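If it helps, the same check could be wrapped in a small function and applied right after loading a file. This is only a sketch: StripUtf8Bom and the file path are names made up for illustration and are not part of Gambas or gb.form.editor. File.Load, Left and Mid are the standard Gambas functions.

```gambas
' Hypothetical helper: returns sData without a leading UTF-8 BOM, if there is one.
Public Function StripUtf8Bom(sData As String) As String

  ' Left() and Mid() work on bytes, so this tests the raw first 3 bytes.
  If Left(sData, 3) = "\xEF\xBB\xBF" Then
    Return Mid(sData, 4)
  Endif

  Return sData

End

Public Sub Main()

  Dim sText As String

  ' The path is a placeholder; point it at a real file.
  sText = StripUtf8Bom(File.Load("/path/to/document.txt"))
  Print sText

End
```

A file without a BOM passes through unchanged, so the function should be safe to call on every file before handing the text to the editor.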
Regards.
Sergio Abreu - Brazil
Posted
Administrator

sergioabreu said
Hello
UTF-8 allows an optional BOM (byte order mark) in the first 3 bytes of a file.
These bytes are \xEF\xBB\xBF, and they act as a "signature" that the document is formally UTF-8.
Although the BOM is optional in UTF-8, some documents have it, so I think it should be handled by Mr Benoit, because if a document contains it, it will crash gb.form.editor. The crash message says it can't render the "image". I am not sure, but it seems that Gambas tries to convert the BOM into a visible UTF-8 symbol.
So the solution is to SKIP these 3 bytes if they are present at the beginning of a document.
This post is kind of a hidden bug report. I am NOT a critic of Gambas at all, totally the opposite: I am an enthusiast of it and wanna help to make Gambas better and better.
Code
' Suppose you got the data from the file in a variable called data:
If Left(data, 3) = "\xEF\xBB\xBF" Then ' "=" is an exact match; "==" compares case-insensitively
  data = Mid(data, 4) ' Skips the BOM, making the data safe for Gambas
Endif
' From here on the data will be "clean"
Regards.
Sergio Abreu - Brazil
Thanks for the heads up! That information might help out someone.
If you think it's a bug, then you need to report it on the Gambas bug tracker. Reporting there will be the only way it can get fixed.
Posted
Guru

TextEditor: Support for BOM character. (0df59ada) · Commits · Gambas / gambas · GitLab
Benoit said this…
Benoit Minisini said
I don't get an error with a file starting with UTF-8 BOM, just an
invisible character at the beginning of the first line.
Note that BOM is a Windows thing created by moronic developers that did
not understand UTF-8. BOM is useless in UTF-8, as there is no byte order
in UTF-8.

