Jinwoo Lee edited this page May 4, 2019 · 3 revisions

Built-in types

Mercury has a small number of built-in types.

For the precise details of any literal syntax described below, please see the Mercury reference manual. The modules mentioned on this page are described in the standard library documentation.

int

The int type represents signed integer numbers.

Integer literals are written in decimal, hexadecimal, octal or binary like this:

1234
-1234
0xabcd          hexadecimal
0o177           octal
0b11011011      binary

Recent versions of the compiler also allow underscores within integer literals to improve readability.

int values may be tested for equality with the = operator (unification), and compared for inequality with the \= operator. Other common operations are available in the int standard library module, e.g.

< > =< >=
+ - * /
mod
rem
<<          left shift
>>          right shift
/\          bitwise and
\/          bitwise or
\           bitwise complement

The less-than-or-equal operator is =<. The <= operator is used as an arrow elsewhere.

/\ and \/ are the bitwise AND and OR operators, being the closest ASCII approximations of the logical conjunction (∧) and disjunction (∨) symbols.

The width and range of int values depend on the compilation target language and platform. When targeting C, int may be 32 or 64 bits wide. When targeting Java or C#, int is 32 bits wide, in line with the int type in those languages. 32-bit integers range from -2^31 to 2^31-1 while 64-bit integers range from -2^63 to 2^63-1 (assuming two's complement representation).
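
As a rough illustration of the operators listed above, here is a minimal sketch. The module name int_demo and the variable names are made up for this example; the operations themselves come from the int module.

```mercury
:- module int_demo.
:- interface.
:- import_module io.

:- pred main(io::di, io::uo) is det.

:- implementation.
:- import_module int.
:- import_module list.
:- import_module string.

main(!IO) :-
    A = 0b1100,     % 12
    B = 0b1010,     % 10
    % Bitwise and, bitwise or, left shift: prints "8 14 48".
    io.format("%d %d %d\n", [i(A /\ B), i(A \/ B), i(A << 2)], !IO),
    % mod and rem differ when the operands have different signs:
    % prints "2 -1".
    N = -7,
    io.format("%d %d\n", [i(N mod 3), i(N rem 3)], !IO),
    % Note the spelling of less-than-or-equal: =< and not <=.
    ( if A =< B then
        io.write_string("A =< B\n", !IO)
    else
        io.write_string("A > B\n", !IO)
    ).
```

The list and string modules are imported only because io.format takes a list of values wrapped in i(...), s(...), and so on.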

float

The float type represents double-precision floating point values (64 bits). Floating point numbers are written in the conventional manner of programming languages:

1.414
1414e-3
0.01414E2

Recent versions of the compiler allow underscores within float literals to improve readability.

Common operations are available in the float standard library module. Some higher-level functions are available in the math module (e.g. sin, cos, sqrt).

float values can be tested for equality and inequality using = and \= operators. Note the usual warnings about comparing floating point numbers for equality.
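
One common way to heed those warnings is to compare within a tolerance instead of testing for exact equality. The sketch below assumes a helper named approx_equal, which is hypothetical and not part of the standard library; the tolerance value is likewise arbitrary.

```mercury
:- module float_demo.
:- interface.
:- import_module io.

:- pred main(io::di, io::uo) is det.

:- implementation.
:- import_module float.
:- import_module math.

    % approx_equal(X, Y, Tolerance) succeeds if X and Y differ by at most
    % Tolerance. This is a hypothetical helper written for this example.
    %
:- pred approx_equal(float::in, float::in, float::in) is semidet.

approx_equal(X, Y, Tolerance) :-
    float.abs(X - Y) =< Tolerance.

main(!IO) :-
    Root = math.sqrt(2.0),
    Square = Root * Root,
    % Rounding error means Square need not be exactly 2.0.
    ( if Square = 2.0 then
        io.write_string("exactly equal\n", !IO)
    else if approx_equal(Square, 2.0, 1.0e-9) then
        io.write_string("equal within tolerance\n", !IO)
    else
        io.write_string("different\n", !IO)
    ).
```

An absolute tolerance like this is only appropriate when the magnitude of the values is known; a relative tolerance may suit other cases better.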

string

The string type represents a sequence of Unicode characters.

A string literal is written as a sequence of characters between double quotes:

"this is a string"

String literals may contain backslash escapes.

\a          stands for "alert" (a beep character),
\b          for backspace,
\r          for carriage-return,
\f          for form-feed,
\t          for tab,
\n          for newline,
\v          for vertical-tab.
\\          for backslash
\'          for single quote
\"          for double quote
\xXXXX\     for the character code identified by hexadecimal XXXX
\oOOO\      for the character code identified by octal OOO
\uXXXX      for Unicode character U+XXXX (four hexadecimal digits)
\UXXXXXXXX  for Unicode character U+XXXXXXXX (eight hexadecimal digits)

Another way to include a double quote in a string is to use two adjacent double quotes:

"I said ""oops""."

String literals may continue over multiple lines. A backslash followed immediately by a newline is deleted (neither the backslash nor the newline is included in the string).

Basic operations on strings are available in the string library module. For example, there are predicates and functions to:

  • append or join strings (++, append, join_list)
  • return the length of a string (count_codepoints, count_code_units, length)
  • index code points or code units (index, unsafe_index_code_unit)
  • skip to the next or previous code point (index_next, prev_index)
  • convert between a string and list of code points (to_char_list, from_char_list)
  • convert between a string and list of code units (to_code_unit_list, from_code_unit_list)
  • search for sub-strings (sub_string_search)
  • replace sub-strings (replace, replace_all)
  • split strings
  • strip whitespace

As usual, string values can be tested for equality or inequality using the operators = and \=.

The internal encoding of a string depends on the target language for interoperability with code written in those languages. When targeting C, strings use UTF-8 encoding and are terminated with a NUL character (so code units are 8-bit integers and null characters are not allowed). When targeting Java or C#, strings are the same as Java or C# strings and use UTF-16 encoding (so code units are 16-bit integers). Usually you would not work with code units directly, so it is not hard to write code that works for either encoding.
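
To make a few of the operations above concrete, here is a small sketch. The module name string_demo and the sample strings are invented for this example; the library calls use the names listed above and may be spelled differently in other releases of the library.

```mercury
:- module string_demo.
:- interface.
:- import_module io.

:- pred main(io::di, io::uo) is det.

:- implementation.
:- import_module list.
:- import_module string.

main(!IO) :-
    % Two adjacent double quotes stand for one double quote.
    Quote = "I said ""oops""." ++ "\n",
    io.write_string(Quote, !IO),
    % Join a list of strings with a separator.
    Joined = string.join_list(", ", ["apple", "banana", "cherry"]),
    % Counting code points gives the same answer for UTF-8 and UTF-16.
    io.format("%s has %d code points\n",
        [s(Joined), i(string.count_codepoints(Joined))], !IO),
    % Replace every occurrence of a sub-string.
    Replaced = string.replace_all(Joined, "an", "AN"),
    io.write_string(Replaced ++ "\n", !IO).
```

Because the example counts code points and never touches code units, it should behave the same way whichever target language is used.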

character (char)

The character type represents a Unicode code point. There is a type alias char in the char standard library module which is more commonly used. The name character exists for historical reasons.

There is no dedicated syntax for character literals. Rather, they are written as a single-character name. You can write characters in one of these ways:

c           works for lower case and certain other characters
'H'         works for upper case and almost all characters
''''        represents a single quote
'\n'        backslash escapes are allowed

In some contexts it will be necessary to wrap an unquoted or quoted character within parentheses. When in doubt, you can treat this as the character literal syntax:

('c')

That is, a character or character escape sequence within single quotes, within parentheses. It always works and is instantly recognisable.

Character values can be tested for equality or inequality using the operators = and \=. Other basic operations are available in the char module.
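
The following sketch uses the parenthesised, quoted form throughout. The module name char_demo is made up for this example; char.to_upper, io.write_char and io.nl are assumed to be available as in current versions of the standard library.

```mercury
:- module char_demo.
:- interface.
:- import_module io.

:- pred main(io::di, io::uo) is det.

:- implementation.
:- import_module char.

main(!IO) :-
    Lower = ('c'),
    Upper = char.to_upper(Lower),
    io.write_char(Upper, !IO),
    io.nl(!IO),
    % Characters are compared by unification, like any other value.
    ( if Upper = ('C') then
        io.write_string("converted to upper case\n", !IO)
    else
        io.write_string("unchanged\n", !IO)
    ).
```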

Tuple types

A tuple term is a compound data term, written as curly brackets surrounding argument terms separated by commas. A tuple type is written as curly brackets surrounding argument types separated by commas.

{} is the tuple term with zero arguments. It has the type {}.

{1} is a tuple term with one argument of type int. It has the type {int}.

{1, "apple"} is a tuple term with two arguments. It has the type {int, string}.

{{1, "apple"}, Price} is a tuple term with type {{int, string}, float}, if Price has type float.

So we can build up tuples. To get the arguments of a tuple, we can unify it with a term containing variables not yet bound to any values ("free"):

Pair = {First, Second}

or, equivalently:

{First, Second} = Pair

Assume Pair is {1, "apple"}. Unification proceeds like this. (Note that the unification itself happens at run time; only the type checking happens at compile time.)

{1, "apple"} = {First, Second}

To unify two tuple terms, both tuples must have the same number of arguments (the same arity). Actually, Mercury requires both sides of the unification to have the same type so trying to unify tuples with different arities would be a compile-time error. The next step is to unify each of the corresponding arguments:

1 = First,
"apple" = Second

If First and Second are both free then these unifications must succeed, and then we can refer to the arguments of Pair.

What if we don't need some of the arguments? Well, we could unify any unneeded argument with a new variable and simply not refer to it again. However, the Mercury compiler will warn about a variable name that only occurs once in its scope, in case it was unintentional:

In clause for predicate `pair.main'/2:
  warning: variable `Second' occurs only once in this scope.

To tell the compiler it was intentional, you can give the variable a name beginning with underscore. The compiler will not warn if such a variable occurs only once in its scope. In fact, the compiler will warn if the variable occurs more than once. Also, each occurrence of the token _ (a single underscore) is treated as a distinct variable so that you can write _ without making up distinct names for "don't-care" variables.

You might have guessed that comparing tuple terms for equality is done through unification.
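
Putting the pieces of this section together, a minimal sketch might look like this. The module name tuple_demo and the variable names are invented for the example.

```mercury
:- module tuple_demo.
:- interface.
:- import_module io.

:- pred main(io::di, io::uo) is det.

:- implementation.
:- import_module list.
:- import_module string.

main(!IO) :-
    % Build a nested tuple of type {{int, string}, float}.
    Item = {{1, "apple"}, 0.5},
    % Take both levels apart in one unification. The leading underscore
    % in _Id tells the compiler that this argument is deliberately unused.
    Item = {{_Id, Name}, Price},
    io.format("%s costs %.2f\n", [s(Name), f(Price)], !IO),
    % Comparing two tuples for equality is also a unification.
    ( if Item = {{1, "apple"}, 0.5} then
        io.write_string("same item\n", !IO)
    else
        io.write_string("different item\n", !IO)
    ).
```

Renaming _Id to Id should bring back the "occurs only once" warning shown earlier.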

io.state

We have already seen the io type which represents the state of the world. Its real name is io.state (that's a module-qualified name: the type state from the io module). io.state is too long for such a commonly used type so the alias io is preferred.

There is not much you can do with an io.state value except pass it around. main starts with an initial state of the world and must produce a final state of the world. Any predicate that performs I/O, or intends to perform I/O, must do likewise.
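
The sketch below shows the state of the world being passed around, first with explicitly numbered variables and then with the !IO state variable shorthand. The module name io_demo and the predicate greet are hypothetical, written only for this example.

```mercury
:- module io_demo.
:- interface.
:- import_module io.

:- pred main(io::di, io::uo) is det.

:- implementation.

    % main receives the initial state IO0 and must produce the final
    % state IO. Each I/O call consumes one state and produces the next.
main(IO0, IO) :-
    io.write_string("Hello, ", IO0, IO1),
    greet("world", IO1, IO).

    % greet performs I/O, so it also takes a pair of io arguments.
    % Here the pair is written with the !IO shorthand.
    %
:- pred greet(string::in, io::di, io::uo) is det.

greet(Name, !IO) :-
    io.write_string(Name, !IO),
    io.nl(!IO).
```

Both styles mean the same thing; the state variable form simply saves numbering the intermediate states by hand.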

Type conversions

Mercury does not feature implicit type conversions. Conversions must be done explicitly through predicate or function calls.

Here are some of the possible conversions between the primitive types. In many cases, there are both functions and predicates available which do the same thing. Sometimes there are multiple functions or predicates that do the same thing. We use module-qualified names below so you know which modules are required, but you may prefer to leave off the module qualifiers if it's clear enough.

  • int to float

        Flt = float.float(Int)
  • float to int

        Int = float.ceiling_to_int(Flt)
        Int = float.floor_to_int(Flt)
        Int = float.round_to_int(Flt)
        Int = float.truncate_to_int(Flt)
  • int to string

        Str = string.from_int(Int)
        Str = string.int_to_base_string(Int, Base)
  • float to string

        Str = string.from_float(Flt)
  • string to int

        string.to_int(Str, Int)                     % can fail
        string.base_string_to_int(Base, Str, Int)   % can fail
  • string to float

        string.to_float(Str, Flt)                   % can fail
  • char to int

        Int = char.to_int(Char)
  • int to char

        char.from_int(Int, Char)                    % can fail
  • char to string

        Str = string.from_char(Char)
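
Conversions marked "can fail" are predicates that are normally called in the condition of an if-then-else, so the failure case is handled explicitly. The following sketch assumes a made-up module name conv_demo and arbitrary sample values; the conversion calls are the ones listed above.

```mercury
:- module conv_demo.
:- interface.
:- import_module io.

:- pred main(io::di, io::uo) is det.

:- implementation.
:- import_module char.
:- import_module float.
:- import_module list.
:- import_module string.

main(!IO) :-
    Input = "42",
    % string to int can fail, so it goes in the condition.
    ( if string.to_int(Input, N) then
        % int to float must be requested explicitly.
        Flt = float.float(N) / 8.0,
        io.format("%d / 8 = %s, which rounds to %d\n",
            [i(N), s(string.from_float(Flt)),
            i(float.round_to_int(Flt))], !IO)
    else
        io.write_string("not an integer\n", !IO)
    ),
    % int to char can fail because not every int is a valid code point.
    ( if char.from_int(0x41, Char) then
        io.write_string(string.from_char(Char) ++ "\n", !IO)
    else
        io.write_string("not a valid code point\n", !IO)
    ).
```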

Questions

  • Mercury depends on the axiom "∀x. x=x" (for all X, X = X). On the other hand, the IEEE floating-point standard includes NaN values (not a number) which "compare unordered with everything, including itself". What to do?