Ch. 13 Data Representation
13.1 User-defined data types
Non-composite data types
- Does not reference another data type
- Built-in examples: integer, real
Enumerated data type
- Example of a user defined and non-composite data type
- All of the possible values are identified
Pseudocode example:
TYPE
DECLARE TDirections = (North, East, South, West)
DECLARE TDays = (Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday)
ENDTYPE
DECLARE Direction1 : TDirections
DECLARE StartDay : TDays
Direction1 <- North
StartDay <- Wednesday- They look like string values but they are not, so they must not be enclosed in quotes
- The values are ordinal, they have an implied order of values
- That’s why they can be compared, for example:
IF Day > Friday:
Weekend = TRUEPointer data type
- The value is a reference to a memory location
TYPE <identifier> = ^<data type>Composite data types
- Refers to any data type in its type definition
Record data type (AS)
Pseudocode example:
TYPE StudentRecord
DECLARE LastName : STRING
DECLARE FirstName : STRING
DECLARE DateOfBirth : DATE
DECLARE YearGroup : INTEGER
DECLARE FormGroup : CHAR
ENDTYPESet data type
Pseudocode example:
TYPE LetterSet = SET OF CHAR
DEFINE Vowels ('A', 'E', 'I', 'O', 'U') : LetterSetClass data type
- Used for an object in object-oriented programming
13.2 File organisation and access
File organisation
Serial file organisation
- Physically stores records of a data in file one after another, in the order they were added to the file
- New records are appended to the end of the file
- Mostly used for temporary files storing transactions to be made to more permanent files
Sequential file organisation
- Physically stores records of data one after another, usually based on the key field of the records as this is a unique identifier
Random file organisation (Direct-access files)
- Physically stores records of data in a file in any available position
- Position of the file is set using a hashing algorithm on the key field
- If the same address is calculated for different field values, a collision occurs
- Best way to solve this is to have a linked list accessible from each address
File access
Sequential access
- Searches for records one after another from the start of the file until the required record is found
- For a serial file, every record needs to be checked until the record is found
- For a sequential file, every record needs to be checked until the key field being checked is greater than the current key field
Direct access
- Can physically find a record in a file without other records being read
- Sequential and random files can be directly accessed
- For a sequential file, an index of all key fields is kept and used to look up address of the file location
- For a random access file, a hashing algorithm is used on the key field to calculate the address of the file location where a given record is stored
Hashing algorithms
- A hashing algorithm is a mathematical formula used to perform a calculation on the key field of the record
- A collision can occur if the key fields have the same hash
- In such case, we can:
- Use an open hash, where the record is stored in the next free space
- Use a closed hash where an overflow area is set up and the record is stored in the next free space in the overflow area
- Linked list approach (chaining) is NOT in CAIE mark scheme
- Hashing algorithms can be used to calculate address from names, for example, by adding up the ASCII values for every character in a name and dividing by the number of locations
13.3 Floating-point numbers, representation and manipulation
Floating-point number representation
- Consists of mantissa and exponent
- The mantissa dictates the precision of a number, the more bits allocated to the mantissa, the more precise a number can be
- The exponent dictates the range of numbers that can be represented
Potential rounding errors and approximations
- Some numbers can only be represented as an approximate value
- For example, a number such as 5.88 can lead to a rounding error because it is impossible to convert the denary number into an exact binary equivalent
- It can be converted by the “multiply by two and record whole number parts” method
.88 × 2 = 1.76 so we will use the 1 value to give 0.1
.76 × 2 = 1.52 so we will use the 1 value to give 0.11
.52 × 2 = 1.04 so we will use the 1 value to give 0.111
.04 × 2 = 0.08 so we will use the 0 value to give 0.1110
.08 × 2 = 0.16 so we will use the 0 value to give 0.11100
.16 × 2 = 0.32 so we will use the 0 value to give 0.111000
.32 × 2 = 0.64 so we will use the 0 value to give 0.1110000
.64 × 2 = 1.28 so we will use the 1 value to give 0.11100001
Normalisation
- There can be multiple representations for the same number
- Normalisation is used to solve this problem
- Positive numbers are shifted until the mantissa starts with 0.1
- Negative numbers are shifted until the mantissa starts with 1.0
- The accuracy of a number is determined by the number of bits in the mantissa
- The range of numbers is determined by the number of bits used in the exponent
Floating-point problems
- Overflow error will be produced when a calculation produces a number which exceeds the maximum possible value (the value being too large)
- Underflow error will be produced when dividing by a very large number (the value being too small)
- Zero error will be produced when a “zero” has to be stored; binary floating-point numbers cannot store the value zero