Ch. 13 Data Representation

13.1 User-defined data types

Non-composite data types

Does not reference another data type

Built-in examples: integer, real

Enumerated data type

Example of a user defined and non-composite data type

All of the possible values are identified

Pseudocode example:

TYPE
	DECLARE TDirections = (North, East, South, West)
	DECLARE TDays = (Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday)
ENDTYPE

DECLARE Direction1 : TDirections
DECLARE StartDay : TDays
Direction1 <- North
StartDay <- Wednesday

They look like string values but they are not, so they must not be enclosed in quotes

The values are ordinal, they have an implied order of values

That’s why they can be compared, for example:

IF Day > Friday:
	Weekend = TRUE

Pointer data type

The value is a reference to a memory location

TYPE <identifier> = ^<data type>

Composite data types

Refers to any data type in its type definition

Record data type (AS)

Pseudocode example:

TYPE StudentRecord
	DECLARE LastName : STRING
	DECLARE FirstName : STRING
	DECLARE DateOfBirth : DATE
	DECLARE YearGroup : INTEGER
	DECLARE FormGroup : CHAR
ENDTYPE

Set data type

Pseudocode example:

TYPE LetterSet = SET OF CHAR
DEFINE Vowels ('A', 'E', 'I', 'O', 'U') : LetterSet

Class data type

Used for an object in object-oriented programming

13.2 File organisation and access

File organisation

Serial file organisation

Physically stores records of a data in file one after another, in the order they were added to the file

New records are appended to the end of the file

Mostly used for temporary files storing transactions to be made to more permanent files

Sequential file organisation

Physically stores records of data one after another, usually based on the key field of the records as this is a unique identifier

Random file organisation (Direct-access files)

Physically stores records of data in a file in any available position

Position of the file is set using a hashing algorithm on the key field

If the same address is calculated for different field values, a collision occurs

Best way to solve this is to have a linked list accessible from each address

File access

Sequential access

Searches for records one after another from the start of the file until the required record is found

For a serial file, every record needs to be checked until the record is found

For a sequential file, every record needs to be checked until the key field being checked is greater than the current key field

Direct access

Can physically find a record in a file without other records being read

Sequential and random files can be directly accessed

For a sequential file, an index of all key fields is kept and used to look up address of the file location

For a random access file, a hashing algorithm is used on the key field to calculate the address of the file location where a given record is stored

Hashing algorithms

A hashing algorithm is a mathematical formula used to perform a calculation on the key field of the record

A collision can occur if the key fields have the same hash

In such case, we can:
- Use an open hash, where the record is stored in the next free space
- Use a closed hash where an overflow area is set up and the record is stored in the next free space in the overflow area
- Linked list approach (chaining) is NOT in CAIE mark scheme

Hashing algorithms can be used to calculate address from names, for example, by adding up the ASCII values for every character in a name and dividing by the number of locations

13.3 Floating-point numbers, representation and manipulation

Floating-point number representation

Consists of mantissa and exponent

The mantissa dictates the precision of a number, the more bits allocated to the mantissa, the more precise a number can be

The exponent dictates the range of numbers that can be represented

Potential rounding errors and approximations

Some numbers can only be represented as an approximate value

For example, a number such as 5.88 can lead to a rounding error because it is impossible to convert the denary number into an exact binary equivalent

It can be converted by the “multiply by two and record whole number parts” method
.88 × 2 = 1.76 so we will use the 1 value to give 0.1
.76 × 2 = 1.52 so we will use the 1 value to give 0.11
.52 × 2 = 1.04 so we will use the 1 value to give 0.111
.04 × 2 = 0.08 so we will use the 0 value to give 0.1110
.08 × 2 = 0.16 so we will use the 0 value to give 0.11100
.16 × 2 = 0.32 so we will use the 0 value to give 0.111000
.32 × 2 = 0.64 so we will use the 0 value to give 0.1110000
.64 × 2 = 1.28 so we will use the 1 value to give 0.11100001

Normalisation

There can be multiple representations for the same number

Normalisation is used to solve this problem

Positive numbers are shifted until the mantissa starts with 0.1

Negative numbers are shifted until the mantissa starts with 1.0

The accuracy of a number is determined by the number of bits in the mantissa

The range of numbers is determined by the number of bits used in the exponent

Floating-point problems

Overflow error will be produced when a calculation produces a number which exceeds the maximum possible value (the value being too large)

Underflow error will be produced when dividing by a very large number (the value being too small)

Zero error will be produced when a “zero” has to be stored; binary floating-point numbers cannot store the value zero