Builtin types
Mercury has a small number of built-in types.
For the precise details of any literal syntax described below, please see the reference manual. The standard library documentation is available here.
The int type represents signed integer numbers.
Integer literals are written in decimal, hexadecimal, octal or binary like this:
1234
-1234
0xabcd hexadecimal
0o177 octal
0b11011011 binary
Recent versions of the compiler also allow underscores within integer literals to improve readability.
int values may be tested for equality with the = operator (unification), and compared for inequality with the \= operator.
Other common operations are available in the int standard library module, e.g.
< > =< >=
+ - * /
mod
rem
<< left shift
>> right shift
/\ bitwise and
\/ bitwise or
\ bitwise complement
The less-than-or-equal operator is =<. The <= operator is used as an arrow elsewhere.
/\ and \/ are the bitwise AND and OR operators, being the closest ASCII approximations of the logical conjunction (∧) and disjunction (∨) symbols.
The width and range of int values depend on the compilation target language and platform. When targeting C, int may be 32 or 64 bits wide. When targeting Java or C#, int is 32 bits wide, in line with the int type in those languages.
32-bit integers range from -2^31 to 2^31 - 1, while 64-bit integers range from -2^63 to 2^63 - 1 (assuming two's complement representation).
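The literals and operators above can be tried out with a small program. This is a minimal sketch; the module name int_demo and the variable names are made up for illustration. It uses //, the int module's truncating integer division, and asks the library for the width of int rather than assuming one.

```mercury
:- module int_demo.
:- interface.
:- import_module io.

:- pred main(io::di, io::uo) is det.

:- implementation.
:- import_module int.
:- import_module list.
:- import_module string.

main(!IO) :-
    A = 0xff,               % 255 written in hexadecimal
    B = 0b1010,             % 10 written in binary
    Sum = A + B,
    Quot = A // B,          % integer division, truncates towards zero
    Rem = A rem B,
    Masked = A /\ 0x0f,     % bitwise AND
    Shifted = B << 2,       % left shift
    ( if A =< B then Order = "A =< B" else Order = "A > B" ),
    % Prints: 265 25 5 15 40 A > B
    io.format("%d %d %d %d %d %s\n",
        [i(Sum), i(Quot), i(Rem), i(Masked), i(Shifted), s(Order)], !IO),
    % The width of int depends on the target, so query it at run time.
    io.format("bits_per_int = %d, max_int = %d\n",
        [i(int.bits_per_int), i(int.max_int)], !IO).
```

The second line of output differs between targets, which is a quick way to check which int width a given build uses.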
The float type represents double-precision floating point values (64 bits).
Floating point numbers are written in the conventional manner of programming
languages:
1.414
1414e-3
0.01414E2
Recent versions of the compiler allow underscores within float literals to improve readability.
Common operations are available in the float standard library module. Some higher-level functions are available in the math module (e.g. sin, cos, sqrt).
float values can be tested for equality and inequality using the = and \= operators. Note the usual warnings about comparing floating point numbers for equality: rounding errors mean that two computations which are mathematically equal may produce slightly different values.
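To make the warning about equality concrete, here is a small sketch that compares a computed value both exactly and within a tolerance. The predicate close_enough and the tolerance 1.0e-9 are hypothetical choices for illustration; they are not part of the standard library.

```mercury
:- module float_demo.
:- interface.
:- import_module io.

:- pred main(io::di, io::uo) is det.

:- implementation.
:- import_module float.
:- import_module list.
:- import_module math.
:- import_module string.

    % close_enough(X, Y, Tolerance) succeeds if X and Y differ by at most
    % Tolerance. Hypothetical helper, written for this example only.
:- pred close_enough(float::in, float::in, float::in) is semidet.

close_enough(X, Y, Tolerance) :-
    float.abs(X - Y) =< Tolerance.

main(!IO) :-
    Root = math.sqrt(2.0),
    Square = Root * Root,
    % Exact comparison by unification; rounding may make this fail.
    ( if Square = 2.0 then Exact = "yes" else Exact = "no" ),
    % Comparison within a tolerance chosen for this example.
    ( if close_enough(Square, 2.0, 1.0e-9) then
        Close = "yes"
    else
        Close = "no"
    ),
    io.format("sqrt(2.0) = %f\n", [f(Root)], !IO),
    io.format("exactly 2.0? %s; within 1.0e-9? %s\n",
        [s(Exact), s(Close)], !IO).
```

A suitable tolerance depends on the magnitude of the values involved, so a fixed constant like this one is only a starting point.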
The string type represents a sequence of Unicode characters.
A string literal is written as a sequence of characters between double quotes:
"this is a string"
String literals may contain backslash escapes.
\a stands for "alert" (a beep character),
\b for backspace,
\r for carriage-return,
\f for form-feed,
\t for tab,
\n for newline,
\v for vertical-tab,
\\ for backslash,
\' for single quote,
\" for double quote,
\xXXXX\ for the character code identified by the hexadecimal digits XXXX (note the closing backslash),
\OOO\ for the character code identified by the octal digits OOO (note the closing backslash),
\uXXXX for Unicode character U+XXXX (four hexadecimal digits),
\UXXXXXXXX for Unicode character U+XXXXXXXX (eight hexadecimal digits).
Another way to include a double quote in a string is to use two adjacent double quotes:
"I said ""oops""."
String literals may continue over multiple lines. A backslash followed immediately by a newline is deleted (neither the backslash nor the newline is included in the string).
Basic operations on strings are available in the string library module. For example, there are predicates and functions to:
- append or join strings (++, append, join_list)
- return the length of a string (count_codepoints, count_code_units, length)
- index code points or code units (index, unsafe_index_code_unit)
- skip to the next or previous code point (index_next, prev_index)
- convert between a string and list of code points (to_char_list, from_char_list)
- convert between a string and list of code units (to_code_unit_list, from_code_unit_list)
- search for sub-strings (sub_string_search)
- replace sub-strings (replace, replace_all)
- split strings
- strip whitespace
As usual, string values can be tested for equality or inequality using the operators = and \=.
The internal encoding of a string depends on the target language for interoperability with code written in those languages. When targeting C, strings use UTF-8 encoding and are terminated with a NUL character (so code units are 8-bit integers and null characters are not allowed). When targeting Java or C#, strings are the same as Java or C# strings and use UTF-16 encoding (so code units are 16-bit integers). Usually you would not work with code units directly, so it is not hard to write code that works for either encoding.
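The following sketch exercises a few of the string operations listed above, along with the doubled-quote and \u escape forms. The module name string_demo and the sample strings are made up for illustration; the library names are the ones given in the list.

```mercury
:- module string_demo.
:- interface.
:- import_module io.

:- pred main(io::di, io::uo) is det.

:- implementation.
:- import_module list.
:- import_module string.

main(!IO) :-
    Greeting = "Hello" ++ ", " ++ "world",
    Joined = string.join_list("-", ["a", "b", "c"]),
    Quoted = "I said ""oops"".",        % doubled quotes inside a literal
    Accented = "caf\u00e9",             % \u escape for U+00E9
    Replaced = string.replace_all(Greeting, "l", "L"),
    % Searching can fail, so it is used as a condition.
    ( if string.sub_string_search(Greeting, "world", Index) then
        Found = string.format("starts at code unit %d", [i(Index)])
    else
        Found = "is absent"
    ),
    io.format("%s\n%s\n%s\n%s\n",
        [s(Greeting), s(Joined), s(Quoted), s(Replaced)], !IO),
    io.format("\"world\" %s\n", [s(Found)], !IO),
    % The code point count is the same on every target;
    % the code unit count depends on the internal encoding.
    io.format("%s: %d code points, %d code units\n",
        [s(Accented), i(string.count_codepoints(Accented)),
        i(string.count_code_units(Accented))], !IO).
```

On a target that uses UTF-8 the last line should report more code units than code points, because U+00E9 takes two bytes in that encoding.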
The character type represents a Unicode code point. There is a type alias char in the char standard library module which is more commonly used. The name character exists for historical reasons.
There is no dedicated syntax for character literals. Rather, they are written as a single-character name. You can write characters in one of these ways:
c works for lower case and certain other characters
'H' works for upper case and almost all characters
'''' represents a single quote
'\n' backslash escapes are allowed
In some contexts it will be necessary to wrap an unquoted or quoted character within parentheses. When in doubt, you can treat this as the character literal syntax:
('c')
That is, a character or character escape sequence within single quotes, within parentheses. It always works and is instantly recognisable.
Character values can be tested for equality or inequality using the operators = and \=. Other basic operations are available in the char module.
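As a brief illustration, the sketch below writes a character in the parenthesised, quoted form and applies a few operations from the char module. The module name char_demo is made up for this example.

```mercury
:- module char_demo.
:- interface.
:- import_module io.

:- pred main(io::di, io::uo) is det.

:- implementation.
:- import_module char.
:- import_module list.
:- import_module string.

main(!IO) :-
    Letter = ('h'),                     % the always-safe literal form
    Upper = char.to_upper(Letter),
    Code = char.to_int(Upper),
    ( if char.is_alpha(Upper) then
        Kind = "a letter"
    else
        Kind = "not a letter"
    ),
    % Both sides are bound, so this unification is an equality test.
    ( if Upper = 'H' then Same = "yes" else Same = "no" ),
    % Prints: H has code point 72 and is a letter; equal to 'H'? yes
    io.format("%c has code point %d and is %s; equal to 'H'? %s\n",
        [c(Upper), i(Code), s(Kind), s(Same)], !IO).
```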
A tuple term is a compound data term, written as curly brackets surrounding argument terms separated by commas. A tuple type is written as curly brackets surrounding argument types separated by commas.
{} is the tuple term with zero arguments. It has the type {}.
{1} is a tuple term with one argument of type int. It has the type {int}.
{1, "apple"} is a tuple term with two arguments. It has the type {int, string}.
{{1, "apple"}, Price} is a tuple term with type {{int, string}, float}, if Price has type float.
So we can build up tuples. To get the arguments of a tuple, we can unify it with a term containing variables not yet bound to any values ("free"):
Pair = {First, Second}
or, equivalently:
{First, Second} = Pair
Assume Pair is {1, "apple"}. Unification proceeds like this.
(Note that it happens at run time.)
{1, "apple"} = {First, Second}
To unify two tuple terms, both tuples must have the same number of arguments (the same arity). Actually, Mercury requires both sides of the unification to have the same type so trying to unify tuples with different arities would be a compile-time error. The next step is to unify each of the corresponding arguments:
1 = First,
"apple" = Second
If First and Second are both free then these unifications must succeed, and then we can refer to the arguments of Pair.
What if we don't need some of the arguments? Well, we could unify any unneeded argument with a new variable and simply not refer to it again. However, the Mercury compiler will warn about a variable name that only occurs once in its scope, in case it was unintentional:
In clause for predicate `pair.main'/2:
warning: variable `Second' occurs only once in this scope.
To tell the compiler it was intentional, you can give the variable a name beginning with underscore. The compiler will not warn if such a variable occurs only once in its scope. In fact, the compiler will warn if the variable occurs more than once. Also, each occurrence of the token _ (a single underscore) is treated as a distinct variable so that you can write _ without making up distinct names for "don't-care" variables.
You might have guessed that comparing tuple terms for equality is done through unification.
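The tuple operations described above fit into one short program: construction, deconstruction by unification, don't-care variables, and an equality test. This is a minimal sketch; the module name tuple_demo and the values are made up for illustration.

```mercury
:- module tuple_demo.
:- interface.
:- import_module io.

:- pred main(io::di, io::uo) is det.

:- implementation.
:- import_module list.
:- import_module string.

main(!IO) :-
    Pair = {1, "apple"},
    % First and Second are free here, so this unification binds them.
    Pair = {First, Second},
    % Nest the pair in another tuple of type {{int, string}, float}.
    Priced = {Pair, 0.75},
    % Take only the price; _ is a don't-care variable.
    Priced = {_, Price},
    % Both sides are bound here, so this unification is an equality test.
    ( if Pair = {1, "pear"} then Same = "same" else Same = "different" ),
    % Prints: 1 apple 0.75 different
    io.format("%d %s %.2f %s\n",
        [i(First), s(Second), f(Price), s(Same)], !IO).
```

Renaming the _ to an ordinary name such as Unused and compiling again should reproduce the "occurs only once" warning quoted earlier.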
We have already seen the io type which represents the state of the world. Its real name is io.state (that's a module-qualified name: the type state from the io module). io.state is too long for such a commonly used type so the alias io is preferred.
There is not much you can do with an io.state value except pass it around. main starts with an initial state of the world and must produce a final state of the world. Any predicate that performs I/O, or calls another predicate that performs I/O, must do likewise.
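Passing the state around can be written out by hand or with state variable notation. The sketch below shows both; greet is a hypothetical predicate written for this example. It threads the io state explicitly through each call, while main uses the !IO shorthand, which the compiler expands into the same kind of numbered variables.

```mercury
:- module io_demo.
:- interface.
:- import_module io.

:- pred main(io::di, io::uo) is det.

:- implementation.

    % greet(Name, IO0, IO) writes a greeting for Name.
    % Each call consumes one io state and produces the next.
:- pred greet(string::in, io::di, io::uo) is det.

greet(Name, IO0, IO) :-
    io.write_string("Hello, ", IO0, IO1),
    io.write_string(Name, IO1, IO2),
    io.nl(IO2, IO).

main(!IO) :-
    % !IO stands for a pair of arguments: the current and the next state.
    greet("world", !IO),
    greet("Mercury", !IO).
```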
Mercury does not feature implicit type conversions. Conversions must be done explicitly through predicate or function calls.
Here are some of the possible conversions between the primitive types. In many cases, there are both functions and predicates available which do the same thing. Sometimes there are multiple functions or predicates that do the same thing. We use module-qualified names below so you know which modules are required, but you may prefer to leave off the module qualifiers if it's clear enough.
- int to float
  Flt = float.float(Int)
- float to int
  Int = float.ceiling_to_int(Flt)
  Int = float.floor_to_int(Flt)
  Int = float.round_to_int(Flt)
  Int = float.truncate_to_int(Flt)
- int to string
  Str = string.from_int(Int)
  Str = string.int_to_base_string(Int, Base)
- float to string
  Str = string.from_float(Flt)
- string to int
  string.to_int(Str, Int) % can fail
  string.base_string_to_int(Base, Str, Int) % can fail
- string to float
  string.to_float(Str, Flt) % can fail
- char to int
  Int = char.to_int(Char)
- int to char
  char.from_int(Int, Char) % can fail
- char to string
  Str = string.from_char(Char)
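Conversions that cannot fail are plain function calls. Those marked "can fail" are semidet predicates and are normally called in the condition of an if-then-else. The sketch below shows both styles; the module name convert_demo and the input values are made up for illustration.

```mercury
:- module convert_demo.
:- interface.
:- import_module io.

:- pred main(io::di, io::uo) is det.

:- implementation.
:- import_module char.
:- import_module float.
:- import_module list.
:- import_module string.

main(!IO) :-
    % Conversions that always succeed are ordinary function calls.
    Flt = float.float(7),
    Up = float.ceiling_to_int(2.1),         % 3
    Down = float.floor_to_int(2.7),         % 2
    Str = string.from_int(Up) ++ "/" ++ string.from_int(Down),
    % Parsing can fail, so string.to_int is called as a condition.
    ( if string.to_int("42", N) then
        Good = string.format("parsed %d", [i(N)])
    else
        Good = "rejected"
    ),
    ( if string.to_int("4x2", _) then Bad = "parsed" else Bad = "rejected" ),
    % Not every int is a valid code point, so char.from_int can fail too.
    ( if char.from_int(0x41, C) then
        Letter = string.from_char(C)
    else
        Letter = "?"
    ),
    io.format("%f %s\n", [f(Flt), s(Str)], !IO),
    io.format("\"42\": %s, \"4x2\": %s, 0x41: %s\n",
        [s(Good), s(Bad), s(Letter)], !IO).
```

Handling the failure case explicitly keeps main deterministic, which its "is det" declaration requires.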
- Mercury depends on the axiom "∀x. x=x" (for all X, X = X). On the other hand, the IEEE floating-point standard includes NaN values (not a number) which "compare unordered with everything, including itself". What to do?