A computer stores numeric values in one of two broad forms: fixed point and floating point. Fixed-point values, such as integers, are stored in memory directly in binary. A character is stored the same way, as the binary form of its ASCII code.

For example, the character ‘A’ is stored as 1000001, because the ASCII value of ‘A’ is 65.

Floating-point values are stored in memory according to the IEEE 754 standard. For example, when a program declares

*float a*;

the value of the variable 'a' is stored in memory in IEEE 754 format.
This standard specifies a single-precision and a double-precision format. In C, C++ and Java, the *float* and *double* data types normally correspond to single and double precision, which require 32 bits (4 bytes) and 64 bits (8 bytes) respectively.
Let's have a look at these precision formats.

Single Precision:-

It requires 32 bits of storage, laid out from the most significant bit down as follows:

- Sign – 1 bit
- Exponent – 8 bits
- Mantissa – 23 bits

To store a float value in computer memory, a fixed procedure is followed. Take the float value 3948.125 as an example.

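- Convert 3948 to binary, i.e. 111101101100
- Convert .125 to binary by repeatedly multiplying by 2 and noting the integer part each time:

0.125 x 2 = 0.25, integer part 0

0.25 x 2 = 0.5, integer part 0

0.5 x 2 = 1, integer part 1

Reading the integer parts from top to bottom gives 0.125 = 0.001.

Now 3948.125 = 111101101100.001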

- Normalize the number so that the binary point is placed immediately after the leading 1 (the most significant bit), i.e.

111101101100.001 = 1.11101101100001 x 2^{11}

- For this number s = 0, as the number is positive.

Exponent' = 11 and

Mantissa = 11101101100001

- The bias used for single precision is 127, so

Final exponent = exponent' + 127, i.e.

E = 11 + 127 = 138 = 10001010 in binary.

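- Final value:

0 10001010 11101101100001000000000

That is the sign bit, the 8-bit exponent, and the mantissa padded with zeros to 23 bits (0x4576C200 in hexadecimal). In this format the number 3948.125 will be stored in main memory.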

For double-precision values the following changes apply:

- Total bits required – 64
- Exponent – 11 bits
- Mantissa – 52 bits
- Bias value – 1023

Now, if you want to find the IEEE 754 representation of any floating-point number, the following program can be used.

```c
#include <stdio.h>

/* Print the lowest i bits of n, most significant bit first. */
void binary(unsigned int n, int i)
{
    unsigned int k;
    for (i--; i >= 0; i--)
    {
        k = n >> i;
        if (k & 1)
            printf("1");
        else
            printf("0");
    }
}

/* Overlay a float with bit fields for its three parts.
   The field order assumes a little-endian bit-field layout
   (for example GCC on x86); other compilers may differ. */
typedef union
{
    float f;
    struct
    {
        unsigned int mantissa : 23;
        unsigned int exponent : 8;
        unsigned int sign : 1;
    } field;
} myfloat;

int main(void)
{
    myfloat var;
    printf("Enter any float number: ");
    if (scanf("%f", &var.f) != 1)   /* stop if the input is not a number */
        return 1;
    printf("%d ", var.field.sign);
    binary(var.field.exponent, 8);
    printf(" ");
    binary(var.field.mantissa, 23);
    printf("\n");
    return 0;
}
```

Explanation-

The function binary( ) converts the number ‘n’ into binary format and prints its lowest ‘i’ bits.

In C, structure members can be declared with an explicit size in bits. These are known as *bit fields*. Because ‘*float f*’ and the structure share the same memory inside the union *myfloat*, the structure's members overlay the bits of the float: mantissa uses 23 bits, exponent uses 8 and sign uses 1. The variable ‘*var*’ is of type *myfloat*, so to access the mantissa we can use ‘*var.field.mantissa*’. Here, *field* is the name of the internal structure. In this way the float value’s internal bits can be accessed through *sign*, *exponent* and *mantissa* separately.

Run the program and see the output for the example above!
