Exposing bytes for what they really are

#1 (In Topic #946)
Cedron is in the usergroup ‘Regular’
Bytes are the smallest addressable unit of storage in most modern computers. They are composed of eight bits, which can be considered two groupings of four. For convenience's sake, binary numbers can be written in shorthand using hexadecimal (base 16) digits. Each grouping of four bits corresponds to one hex digit, 0 to F. These two groupings in a byte are sometimes referred to as nybbles.
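
As a small sketch (the helper names HighNybble, LowNybble and JoinNybbles are hypothetical, made up for this illustration), the two nybbles of a byte can be separated with a shift and a mask, and put back together by multiplying the high one by 16:

```gambas
' Hypothetical helpers: split a byte into its two nybbles and rejoin them.
Private Function HighNybble(argByteValue As Byte) As Byte
  Return Shr(argByteValue, 4)        ' upper four bits, 0 to 15
End

Private Function LowNybble(argByteValue As Byte) As Byte
  Return argByteValue And 15         ' lower four bits, 0 to 15
End

Private Function JoinNybbles(argHigh As Byte, argLow As Byte) As Byte
  Return argHigh * 16 + argLow       ' for nybble values, same as Shl(argHigh, 4) Or argLow
End

Public Sub Main()
  Dim theByteValue As Byte = 109     ' the letter "m", &6D&
  Print Hex(HighNybble(theByteValue)); Hex(LowNybble(theByteValue))   ' prints 6D
  Print JoinNybbles(6, 13)           ' prints 109
End
```

Each hex digit printed is exactly one nybble, which is why hex is such a convenient shorthand for bytes.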

ASCII (American Standard Code for Information Interchange) maps the byte values from 0 to 127 to characters, most of them the common printed ones. The values from 0 to 31 (and 127) are control characters, originally meant to control devices such as teletypes rather than to be printed. The values from 48 to 57 are the decimal digit characters "0" to "9". The values from 65 to 90 are "A" to "Z", and 97 to 122 are "a" to "z".
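
Those ranges are enough to do some character arithmetic directly on the byte values: a digit's value is its code minus 48, and each lowercase letter sits exactly 32 (&20&) above its uppercase partner. Here is a minimal sketch, with hypothetical helper names DigitValue and AsciiUpper, that only handles plain ASCII (no accented letters):

```gambas
' Hypothetical helpers built only on the ASCII ranges listed above.
Private Function DigitValue(argChar As String) As Integer
  Dim theCode As Integer = Asc(argChar)
  If theCode < 48 Or theCode > 57 Then Return -1      ' not "0" to "9"
  Return theCode - 48
End

Private Function AsciiUpper(argChar As String) As String
  Dim theCode As Integer = Asc(argChar)
  ' "a" to "z" (97 to 122) map to "A" to "Z" (65 to 90)
  If theCode >= 97 And theCode <= 122 Then Return Chr(theCode - 32)
  Return argChar                                       ' anything else is left alone
End

Public Sub Main()
  Print DigitValue("7"), AsciiUpper("g"), AsciiUpper("G"), AsciiUpper("!")
End
```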

Gambas makes dealing with byte values fairly simple. The sample program and its output demonstrate some of the concepts and syntax.

Code (gambas)

'=============================================================================
Public Sub Main()

  '---- Sample string of ordinary characters

  Dim theSample As String = "0123 ABCD Gambas Zz"

  For p As Integer = 1 To Len(theSample)
    Dim theByteValue As Byte = Asc(Mid(theSample, p, 1))
    DisplayByteValue(theByteValue)
  Next

  '---- Some ASCII characters

  Print
  For d As Integer = 0 To 5
    Print d, Chr(48 + d), Chr(64 + d), Chr(96 + d)
  Next

  '---- Some special characters

  Print
  Print "   Null aka \\0 = "; Asc("\0")
  Print "   Tab  aka \\t = "; Asc("\t"), Asc(gb.Tab)
  Print "   LF   aka \\n = "; Asc("\n"), Asc(gb.Lf)
  Print "   CR   aka \\r = "; Asc("\r"), Asc(gb.Cr)

End

'=============================================================================
Private Sub DisplayByteValue(argByteValue As Byte)

  Print Chr(argByteValue); "  ";                              ' the character
  Print Right("00000000" & Bin(argByteValue), 8); "   ";      ' binary, zero padded
  Print "&"; Right("00" & Hex(argByteValue), 2); "&   ";      ' hex, in &..& form
  Print Right("   " & Str(argByteValue), 3); "     ";         ' decimal
  Print HexSums(argByteValue);
  Print BinarySums(argByteValue)

End

'=============================================================================
Private Sub HexSums(argByteValue As Byte) As String

  Dim theHighNybble As Byte = Shr(argByteValue, 4)     ' upper four bits
  Dim theLowNybble As Byte = argByteValue And &0F&     ' lower four bits

  Dim r As String = Str(theHighNybble) & " * 16 + " & Str(theLowNybble)

  Return Left(r & "                      ", 16)        ' pad to a fixed column width

End

'=============================================================================
Private Sub BinarySums(argByteValue As Byte) As String

  If argByteValue = 0 Then Return "0"

  Dim theMaskValue As Integer = 128    ' 10000000b, the highest bit of a byte
  Dim theResult As String

  For b As Integer = 7 To 0 Step -1
    If (argByteValue And theMaskValue) > 0 Then
      theResult &= " + " & Str(theMaskValue)
    End If
    theMaskValue = Shr(theMaskValue, 1)   ' move the mask one bit to the right
  Next

  Return Mid(theResult, 4)             ' drop the leading " + "

End
'=============================================================================
Here is the output:

Code

0  00110000   &30&    48     3 * 16 + 0      32 + 16
1  00110001   &31&    49     3 * 16 + 1      32 + 16 + 1
2  00110010   &32&    50     3 * 16 + 2      32 + 16 + 2
3  00110011   &33&    51     3 * 16 + 3      32 + 16 + 2 + 1
   00100000   &20&    32     2 * 16 + 0      32
A  01000001   &41&    65     4 * 16 + 1      64 + 1
B  01000010   &42&    66     4 * 16 + 2      64 + 2
C  01000011   &43&    67     4 * 16 + 3      64 + 2 + 1
D  01000100   &44&    68     4 * 16 + 4      64 + 4
   00100000   &20&    32     2 * 16 + 0      32
G  01000111   &47&    71     4 * 16 + 7      64 + 4 + 2 + 1
a  01100001   &61&    97     6 * 16 + 1      64 + 32 + 1
m  01101101   &6D&   109     6 * 16 + 13     64 + 32 + 8 + 4 + 1
b  01100010   &62&    98     6 * 16 + 2      64 + 32 + 2
a  01100001   &61&    97     6 * 16 + 1      64 + 32 + 1
s  01110011   &73&   115     7 * 16 + 3      64 + 32 + 16 + 2 + 1
   00100000   &20&    32     2 * 16 + 0      32
Z  01011010   &5A&    90     5 * 16 + 10     64 + 16 + 8 + 2
z  01111010   &7A&   122     7 * 16 + 10     64 + 32 + 16 + 8 + 2

0       0       @       `
1       1       A       a
2       2       B       b
3       3       C       c
4       4       D       d
5       5       E       e

   Null aka \0 = 0
   Tab  aka \t = 9      9
   LF   aka \n = 10     10
   CR   aka \r = 13     13


Here is a related post from long ago:

https://forum.gambas.one/viewtopic.php?p=1553

.... and carry a big stick!

#2

You may have wondered why I wrapped my hex values in ampersands rather than use the traditional '&H' prefix.

Consider the following Gambas code:

Code (gambas)

Print Val("&Babe&"), &Babe&

Print Val("&HBabe"), &HBabe

Print Val("&HBabe&"), &HBabe&
  8.  
A = 10, B = 11, and E = 14, so the result should be 11*16^3 + 10*16^2 + 11*16 + 14 = 47806, right?

The experts snicker.

Somewhere, stored in memory, it would look something like this:

Code

          Address  Hex    Binary
          ::::::::
          ######9E ??
          ######9F ??
Varptr--> ######A0 BE  1011 1110
          ######A1 BA  1011 1010
          ######A2 ??
          ######A3 ??
          ######A4 ??
          ######A5 ??
          ::::::::
Think of it as a big byte array where the address is the index.

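This is a little endian representation of a two byte integer value: the least significant byte (BE) sits at the lower address, and the most significant byte (BA) follows it. In Gambas this is the variable type 'Short'. The common Integer type is a four byte version, also stored little endian on the usual x86 and ARM hardware. (BTW, asynchronous serial transmissions are little endian bitwise too: the least significant bit goes out first, and the most significant bit, which is often used as a parity bit, comes last.)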
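
A related point: shifting and masking pick out the bytes of a value by significance, not by address, so the result is the same on any machine. The sketch below (BytesLowFirst is a hypothetical name) lists an Integer's bytes from least to most significant; on a little endian machine that is also the order in which they sit in memory, starting at VarPtr.

```gambas
' Hypothetical helper: the four bytes of an Integer, least significant first.
' Shifting by 8 * k bits and masking with 255 isolates byte number k,
' regardless of how the CPU lays the bytes out in memory.
Private Function BytesLowFirst(argValue As Integer) As Byte[]
  Dim theBytes As New Byte[]
  For k As Integer = 0 To 3
    theBytes.Add(Shr(argValue, 8 * k) And 255)
  Next
  Return theBytes
End

Public Sub Main()
  Dim theBytes As Byte[] = BytesLowFirst(&HBabe)   ' a Short constant, sign extended
  For k As Integer = 0 To theBytes.Count - 1
    Print Hex(theBytes[k], 2); " ";
  Next
  Print                                            ' prints BE BA FF FF
End
```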

In a signed (two's complement) integer variable, the highest-order bit determines the sign: if it is set, the number is negative. The difference between signed and unsigned integers can be understood by looking at a three bit example.

Code

    000    0   0
    001    1   1
    010    2   2
    011    3   3
    100    4  -4
    101    5  -3
    110    6  -2
    111    7  -1
Same bit patterns, different interpretations.  Now, if you want to store the three bit value in an eight bit byte, you would put the three bits in the lowest positions.

Code

    00000XXX
Read as an unsigned byte, those patterns have values from 0 to 7 inclusive. For a signed conversion, the highest-order bit of the three needs to be "sign extended", that is, copied into every bit above it, so the results look like this:

Code

    000000XX  Zero and positive values (0 to 3)
    111111XX  Negative values (-4 to -1)
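
The three bit table and the sign extension rule can be captured in one small function. This is a sketch with a hypothetical name, SignedValue, that reads the low argBits bits of a pattern as a two's complement number: if the top bit of the field is set, the value is the unsigned reading minus 2^argBits. Masking the signed result with 255 then shows the sign-extended byte.

```gambas
' Hypothetical helper: interpret the low argBits bits (1 to 30) of argPattern
' as a two's complement signed number.
Private Function SignedValue(argPattern As Integer, argBits As Integer) As Integer
  Dim theRange As Integer = Shl(1, argBits)            ' 2^argBits, e.g. 8 for three bits
  Dim theUnsigned As Integer = argPattern And (theRange - 1)
  If theUnsigned >= theRange \ 2 Then Return theUnsigned - theRange   ' top bit set
  Return theUnsigned
End

Public Sub Main()
  For thePattern As Integer = 0 To 7
    Dim theSigned As Integer = SignedValue(thePattern, 3)
    ' pattern, unsigned reading, signed reading, sign-extended byte
    Print Bin(thePattern, 3); "  "; thePattern; "  "; theSigned; "  "; Bin(theSigned And 255, 8)
  Next
End
```

For the pattern 101 this prints 5, -3 and 11111101: the three low bits are unchanged and the sign bit has been copied upward.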
Likewise, when Short values are assigned to Integer variables, they are sign extended.

Code

          ######9F ??
Varptr--> ######A0 BE  1011 1110
          ######A1 BA  1011 1010
          ######A2 FF  1111 1111
          ######A3 FF  1111 1111
          ######A4 ??
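
Here is the same sign extension seen from Gambas code, as a sketch: a Short holding the bit pattern BABE keeps its value of -17730 when copied into an Integer, and masking with 65535 (&FFFF&) keeps only the low two bytes, which recovers the unsigned reading. UnsignedShort is a hypothetical helper name.

```gambas
' Hypothetical helper: read a Short's 16 bits back as an unsigned value.
Private Function UnsignedShort(argValue As Short) As Integer
  Dim theInteger As Integer = argValue     ' sign extended: BABE becomes FFFFBABE
  Return theInteger And 65535              ' keep only the low two bytes
End

Public Sub Main()
  Dim theShort As Short = &HBabe           ' four digit &H constant: the pattern BABE
  Dim theInteger As Integer = theShort     ' same value, now with FFFF above it
  Print theShort, theInteger, UnsignedShort(theShort)   ' -17730, -17730, 47806
End
```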
But what does the "Val" function do with a value represented by a string of characters?

Let's check the output of the program:

Code

Priceless!   Priceless!
Priceless!   Priceless!
Priceless!   Priceless!
Okay, I've got a strange computer, yours probably printed numbers.

#3

Just for completeness, here are the printable* ASCII characters, arranged by their byte value codes.

Code

                   0 1 2 3 4 5 6 7 8 9 A B C D E F

00100000 &20&  32    ! " # $ % & ' ( ) * + , - . /
00110000 &30&  48  0 1 2 3 4 5 6 7 8 9 : ; < = > ?
01000000 &40&  64  @ A B C D E F G H I J K L M N O
01010000 &50&  80  P Q R S T U V W X Y Z [ \ ] ^ _
01100000 &60&  96  ` a b c d e f g h i j k l m n o
01110000 &70& 112  p q r s t u v w x y z { | } ~ 
(*) Note: 127 (&7F&) isn't printable. It is the DEL control character, also known as "rub out", which is why the last cell of the table looks blank.

Here is the code that produced it.

Code (gambas)

Public Sub Main()

  Dim theHighNybble, theLowNybble, theHighValue, theByteValue As Integer

  ' Column headings: the low nybble, 0 to F
  Print "                  ";
  For theLowNybble = 0 To 15
    Print " "; Hex(theLowNybble);
  Next
  Print
  Print

  ' One row per high nybble, covering the printable range &20& to &7F&
  For theHighNybble = 2 To 7
    theHighValue = theHighNybble * 16    ' &10& = 00010000b, same as shl by 4

    Print Right("00000000" & Bin(theHighValue), 8); " ";
    Print "&" & Right("00" & Hex(theHighValue), 2); "& ";
    Print Right("        " & Str(theHighValue), 3); " ";

    For theLowNybble = 0 To 15
      theByteValue = theHighValue + theLowNybble
      Print " "; Chr(theByteValue);
    Next

    Print
  Next

End

#4

Here is a demonstration of little endianness, and the answer to the puzzle above.

Priceless.

Code (gambas)

'=============================================================================
Public Sub Main()

  DisplayMemoryOfInteger(&Babe&)    ' &..& form: 47806, no sign extension
  DisplayMemoryOfInteger(&HBabe)    ' &H form: a Short, sign extended to -17730

End

'=============================================================================
Public Sub DisplayMemoryOfInteger(ArgIntegerValue As Integer)

  ' Walk the four bytes of the argument in memory, lowest address first
  Dim theAddress As Pointer = VarPtr(ArgIntegerValue)

  Print

  For m As Integer = 0 To 3
    Dim theByteValue As Byte = Byte@(theAddress)
    Dim theBinary As String = Right("00000000" & Bin(theByteValue), 8)

    Print Hex(theAddress); ": "; Right("00" & Hex(theByteValue), 2);
    Print "  "; Left(theBinary, 4); " "; Right(theBinary, 4)

    Inc theAddress
  Next

End
'=============================================================================

Code

FFFF92797028: BE  1011 1110
FFFF92797029: BA  1011 1010
FFFF9279702A: 00  0000 0000
FFFF9279702B: 00  0000 0000

FFFF92797028: BE  1011 1110
FFFF92797029: BA  1011 1010
FFFF9279702A: FF  1111 1111
FFFF9279702B: FF  1111 1111
Negative sign?  Did anybody see a negative sign?  I always like to treat hex constants as unsigned integers.
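
In that spirit, here is a minimal sketch (HexDigitsToLong is a hypothetical name) that reads a string of hex digits the long way, one digit at a time, accumulating into a Long so that no sign bit ever gets involved. For "Babe" it gives the unsigned reading 11*16^3 + 10*16^2 + 11*16 + 14.

```gambas
' Hypothetical helper: unsigned value of a string of hex digits,
' most significant digit first, accumulated in a Long.
Private Function HexDigitsToLong(argHex As String) As Long
  Dim theResult As Long = 0
  Dim theDigit As Integer
  For p As Integer = 1 To Len(argHex)
    ' Position in the digit list, minus one, is the digit's value (0 to 15)
    theDigit = InStr("0123456789ABCDEF", UCase(Mid(argHex, p, 1))) - 1
    If theDigit < 0 Then Error.Raise("Not a hex digit: " & Mid(argHex, p, 1))
    theResult = theResult * 16 + theDigit
  Next
  Return theResult
End

Public Sub Main()
  Print HexDigitsToLong("Babe"), HexDigitsToLong("FADE")   ' 47806, 64222
End
```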

Code (gambas)

Print -&100&, -&FF&
Those are negative signs, and they work as expected.

Code

-256    -255
Gambas treats Bytes as unsigned.

Code (gambas)

Dim theByte As Byte = -1

Print theByte, Bin(theByte)

Code

255     11111111
Here is the official narrative on the matter:
http://gambaswiki.org/wiki/lang/type/integer

#5

This example should convince you that using the &form&, which brackets the digits like quotation marks, is better practice than the &Hform. Besides, in ordinary code with constant values, it looks better, as in easier to read and understand.

Code (gambas)

Print Hex(15), Val("&" & Hex(15) & "&"), Val("&H" & Hex(15))
Print Hex(250), Val("&" & Hex(250) & "&"), Val("&H" & Hex(250))
Print Hex(4013), Val("&" & Hex(4013) & "&"), Val("&H" & Hex(4013))
Print Hex(64222), Val("&" & Hex(64222) & "&"), Val("&H" & Hex(64222))
Print Hex(1027565), Val("&" & Hex(1027565) & "&"), Val("&H" & Hex(1027565))

Code

F       15      15
FA      250     250
FAD     4013    4013
FADE    64222   -1314
FADED   1027565 1027565
Spot the "Gotcha!" lurking in there?


Here is another illustration of the boundary (or the odometer rollover) for signed integers.

Code (gambas)

Public Sub Main()

  For theValueAsInteger As Integer = -4 To 3
    Dim theByte As Byte = theValueAsInteger     ' Byte is unsigned: -1 becomes 255
    Dim theShort As Short = theValueAsInteger
    Dim theLong As Long = theValueAsInteger

    Print theValueAsInteger,
    Print Bin(theValueAsInteger, 3); "  ";
    Print Bin(theByte, 8); "  ";
    Print Hex(theByte, 2); "  ";
    Print Hex(theByte),
    Print Hex(theShort); " ";
    Print Hex(theValueAsInteger); " ";
    Print Hex(theLong)
  Next

End
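Note the quirky behavior of Hex() on the signed types versus Hex() on a Byte: for negative values, Hex() of a Short, an Integer, or a Long prints all sixteen hex digits, while a Byte, being unsigned, prints at most two.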

Code

-4      100  11111100  FC  FC   FFFFFFFFFFFFFFFC FFFFFFFFFFFFFFFC FFFFFFFFFFFFFFFC
-3      101  11111101  FD  FD   FFFFFFFFFFFFFFFD FFFFFFFFFFFFFFFD FFFFFFFFFFFFFFFD
-2      110  11111110  FE  FE   FFFFFFFFFFFFFFFE FFFFFFFFFFFFFFFE FFFFFFFFFFFFFFFE
-1      111  11111111  FF  FF   FFFFFFFFFFFFFFFF FFFFFFFFFFFFFFFF FFFFFFFFFFFFFFFF
0       000  00000000  00  0    0 0 0
1       001  00000001  01  1    1 1 1
2       010  00000010  02  2    2 2 2
3       011  00000011  03  3    3 3 3
It is just like taking a brand new car with zero miles and driving it in reverse for a mile: with an odometer that allowed rollback, it would read "999999", with as many nines as there are digits.

Or like grouping decimal digits in threes with commas (U.S. style), which effectively makes a base 1000 numbering system, in the same way that each eight bit byte acts as one digit of a base 256 number.
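
The odometer and the bit patterns follow the same rule: the reading is the distance modulo base^wheels. A sketch with a hypothetical function, OdometerReading, shows the six wheel decimal car, an eight bit byte, and a row of the three bit table. It adds the modulus once more before the final Mod, so the result comes out non-negative whichever sign convention Mod uses for negative operands.

```gambas
' Hypothetical helper: what an odometer with argWheels wheels, each counting
' 0 to argBase - 1, shows after argDistance steps (negative means reverse).
Private Function OdometerReading(argDistance As Long, argBase As Integer, argWheels As Integer) As Long
  Dim theModulus As Long = 1
  For w As Integer = 1 To argWheels
    theModulus *= argBase                ' argBase ^ argWheels, kept as an integer
  Next
  Return ((argDistance Mod theModulus) + theModulus) Mod theModulus
End

Public Sub Main()
  Print OdometerReading(-1, 10, 6)   ' 999999, one mile in reverse
  Print OdometerReading(-1, 2, 8)    ' 255, the Byte pattern 11111111
  Print OdometerReading(-4, 2, 3)    ' 4, the pattern 100 in the three bit table
End
```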

In summary:

Code

Number       Function     String of characters

Byte     ---->   Chr        ---->  Character          
Byte     <----   Asc        <----  Character          

Integer  ---->   Str        ---->  Text Decimal Representation
Integer  <----   Val        <----  Text Decimal Representation

Integer  ---->   Hex        ---->  Text Hexadecimal Representation
Integer  <----   Val(& &)   <----  Text Hexadecimal Representation
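
A quick round trip through each row of that table, using values that appear earlier in this thread (a sketch, not an exhaustive test):

```gambas
' Round trips for the three pairs of conversion functions in the summary.
Public Sub Main()
  Print Asc("G"), Chr(71)                          ' Byte <-> character: 71, G
  Print Str(4013), Val("4013")                     ' Integer <-> decimal text
  Print Hex(64222), Val("&" & Hex(64222) & "&")    ' Integer <-> hex text: FADE, 64222
End
```

The hex row uses the &..& form on purpose; as the earlier output shows, Val("&H" & Hex(64222)) gives back -1314 instead.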
These functions, and more, can be found at the language index page of the Gambas Wiki:

http://gambaswiki.org/wiki/lang

That's where I found that Bin and Hex can take a second argument specifying the zero-padded length, so Hex(10, 2) gives "0A" instead of "A". The earlier code in this thread would look better converted, but I'm not going to change it.

The next topics are how floating point numbers can be stored in the same bit patterns and, on the character side, how UTF-8 works. Fixed point formats are just integers with an implied whole/fraction partition. ("Decimal point" doesn't fit, and "binary point" just doesn't seem to apply.)
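
As a sketch of the fixed point idea, here is a hypothetical "4.4" format stored in one Byte: the high nybble is the whole part and the low nybble counts sixteenths, so 3.25 is stored as &34&. The function names are made up, and the conversion assumes the value is a non-negative multiple of 1/16 below 16.

```gambas
' Hypothetical 4.4 fixed point format: whole part in the high nybble,
' sixteenths in the low nybble. Assumes 0 <= value < 16 in steps of 1/16.
Private Function ToFixed44(argValue As Float) As Byte
  Return CInt(argValue * 16)                   ' 3.25 -> 52 = &34&
End

Private Function FromFixed44(argRaw As Byte) As Float
  Return Shr(argRaw, 4) + (argRaw And 15) / 16 ' whole part + fraction
End

Public Sub Main()
  Dim theRaw As Byte = ToFixed44(3.25)
  Print theRaw, Hex(theRaw, 2), FromFixed44(theRaw)   ' 52, 34, 3.25
End
```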

Then how strings and objects are stored.  After that, you'll be ready to write, or at least understand, function calls to shared libraries.  Even write shared libraries of your own.
