diff --git a/.github/workflows/main.yml b/.github/workflows/main.yml index 4456c3c9c..aff208478 100644 --- a/.github/workflows/main.yml +++ b/.github/workflows/main.yml @@ -1,5 +1,7 @@ name: CI -on: [push, pull_request] +on: + pull_request: + merge_group: jobs: test: diff --git a/src/SUMMARY.md b/src/SUMMARY.md index 0cab5a17d..2df5bc711 100644 --- a/src/SUMMARY.md +++ b/src/SUMMARY.md @@ -44,6 +44,7 @@ - [Code generation](attributes/codegen.md) - [Limits](attributes/limits.md) - [Type System](attributes/type_system.md) + - [Debugger](attributes/debugger.md) - [Statements and expressions](statements-and-expressions.md) - [Statements](statements.md) diff --git a/src/attributes.md b/src/attributes.md index 5d619c990..92ce1cd09 100644 --- a/src/attributes.md +++ b/src/attributes.md @@ -271,6 +271,8 @@ The following is an index of all built-in attributes. - Type System - [`non_exhaustive`] — Indicate that a type will have more fields/variants added in future. +- Debugger + - [`debugger_visualizer`] — Embeds a file that specifies debugger output for a type. [Doc comments]: comments.md#doc-comments [ECMA-334]: https://www.ecma-international.org/publications/standards/Ecma-334.htm @@ -291,6 +293,7 @@ The following is an index of all built-in attributes. 
[`cold`]: attributes/codegen.md#the-cold-attribute [`crate_name`]: crates-and-source-files.md#the-crate_name-attribute [`crate_type`]: linkage.md +[`debugger_visualizer`]: attributes/debugger.md#the-debugger_visualizer-attribute [`deny`]: attributes/diagnostics.md#lint-check-attributes [`deprecated`]: attributes/diagnostics.md#the-deprecated-attribute [`derive`]: attributes/derive.md diff --git a/src/attributes/codegen.md b/src/attributes/codegen.md index 91a6b4604..c929f979c 100644 --- a/src/attributes/codegen.md +++ b/src/attributes/codegen.md @@ -89,9 +89,11 @@ Feature | Implicitly Enables | Description `bmi1` | | [BMI1] — Bit Manipulation Instruction Sets `bmi2` | | [BMI2] — Bit Manipulation Instruction Sets 2 `cmpxchg16b`| | [`cmpxchg16b`] - Compares and exchange 16 bytes (128 bits) of data atomically +`f16c` | `avx` | [F16C] — 16-bit floating point conversion instructions `fma` | `avx` | [FMA3] — Three-operand fused multiply-add `fxsr` | | [`fxsave`] and [`fxrstor`] — Save and restore x87 FPU, MMX Technology, and SSE State `lzcnt` | | [`lzcnt`] — Leading zeros count +`movbe` | | [`movbe`] - Move data after swapping bytes `pclmulqdq` | `sse2` | [`pclmulqdq`] — Packed carry-less multiplication quadword `popcnt` | | [`popcnt`] — Count of bits set to 1 `rdrand` | | [`rdrand`] — Read random number @@ -117,10 +119,12 @@ Feature | Implicitly Enables | Description [BMI1]: https://en.wikipedia.org/wiki/Bit_Manipulation_Instruction_Sets [BMI2]: https://en.wikipedia.org/wiki/Bit_Manipulation_Instruction_Sets#BMI2 [`cmpxchg16b`]: https://www.felixcloutier.com/x86/cmpxchg8b:cmpxchg16b +[F16C]: https://en.wikipedia.org/wiki/F16C [FMA3]: https://en.wikipedia.org/wiki/FMA_instruction_set [`fxsave`]: https://www.felixcloutier.com/x86/fxsave [`fxrstor`]: https://www.felixcloutier.com/x86/fxrstor [`lzcnt`]: https://www.felixcloutier.com/x86/lzcnt +[`movbe`]: https://www.felixcloutier.com/x86/movbe [`pclmulqdq`]: https://www.felixcloutier.com/x86/pclmulqdq [`popcnt`]: 
https://www.felixcloutier.com/x86/popcnt [`rdrand`]: https://en.wikipedia.org/wiki/RdRand @@ -357,15 +361,20 @@ trait object whose methods are attributed. ## The `instruction_set` attribute -The *`instruction_set` attribute* may be applied to a function to enable code generation for a specific -instruction set supported by the target architecture. It uses the [_MetaListPath_] syntax and a path -comprised of the architecture and instruction set to specify how to generate the code for -architectures where a single program may utilize multiple instruction sets. +The *`instruction_set` [attribute]* may be applied to a function to control which instruction set the function will be generated for. +This allows mixing more than one instruction set in a single program on CPU architectures that support it. +It uses the [_MetaListPath_] syntax, and a path comprised of the architecture family name and instruction set name. -The following values are available on targets for the `ARMv4` and `ARMv5te` architectures: +[_MetaListPath_]: ../attributes.md#meta-item-attribute-syntax + +It is a compilation error to use the `instruction_set` attribute on a target that does not support it. + +### On ARM -* `arm::a32` - Uses ARM code. -* `arm::t32` - Uses Thumb code. +For the `ARMv4T` and `ARMv5te` architectures, the following are supported: + +* `arm::a32` - Generate the function as A32 "ARM" code. +* `arm::t32` - Generate the function as T32 "Thumb" code. ```rust,ignore @@ -376,4 +385,7 @@ fn foo_arm_code() {} fn bar_thumb_code() {} ``` -[_MetaListPath_]: ../attributes.md#meta-item-attribute-syntax +Using the `instruction_set` attribute has the following effects: + +* If the address of the function is taken as a function pointer, the low bit of the address will be set to 0 (arm) or 1 (thumb) depending on the instruction set. +* Any inline assembly in the function must use the specified instruction set instead of the target default. 
diff --git a/src/attributes/debugger.md b/src/attributes/debugger.md new file mode 100644 index 000000000..6ea80221e --- /dev/null +++ b/src/attributes/debugger.md @@ -0,0 +1,141 @@ +# Debugger attributes + +The following [attributes] are used for enhancing the debugging experience when using third-party debuggers like GDB or WinDbg. + +## The `debugger_visualizer` attribute + +The *`debugger_visualizer` attribute* can be used to embed a debugger visualizer file into the debug information. +This enables an improved debugger experience for displaying values in the debugger. +It uses the [_MetaListNameValueStr_] syntax to specify its inputs, and must be specified as a crate attribute. + +### Using `debugger_visualizer` with Natvis + +Natvis is an XML-based framework for Microsoft debuggers (such as Visual Studio and WinDbg) that uses declarative rules to customize the display of types. +For detailed information on the Natvis format, refer to Microsoft's [Natvis documentation]. + +This attribute only supports embedding Natvis files on `-windows-msvc` targets. 
+ +The path to the Natvis file is specified with the `natvis_file` key, which is a path relative to the crate source file: + + +```rust ignore +#![debugger_visualizer(natvis_file = "Rectangle.natvis")] + +struct FancyRect { + x: f32, + y: f32, + dx: f32, + dy: f32, +} + +fn main() { + let fancy_rect = FancyRect { x: 10.0, y: 10.0, dx: 5.0, dy: 5.0 }; + println!("set breakpoint here"); +} +``` + +and `Rectangle.natvis` contains: + +```xml +<?xml version="1.0" encoding="utf-8"?> +<AutoVisualizer xmlns="http://schemas.microsoft.com/vstudio/debugger/natvis/2010"> + <Type Name="foo::FancyRect"> + <DisplayString>({x},{y}) + ({dx}, {dy})</DisplayString> + <Expand> + <Synthetic Name="LowerLeft"> + <DisplayString>({x}, {y})</DisplayString> + </Synthetic> + <Synthetic Name="UpperLeft"> + <DisplayString>({x}, {y + dy})</DisplayString> + </Synthetic> + <Synthetic Name="UpperRight"> + <DisplayString>({x + dx}, {y + dy})</DisplayString> + </Synthetic> + <Synthetic Name="LowerRight"> + <DisplayString>({x + dx}, {y})</DisplayString> + </Synthetic> + </Expand> + </Type> +</AutoVisualizer> +``` + +When viewed under WinDbg, the `fancy_rect` variable would be shown as follows: + +```text +> Variables: + > fancy_rect: (10.0, 10.0) + (5.0, 5.0) + > LowerLeft: (10.0, 10.0) + > UpperLeft: (10.0, 15.0) + > UpperRight: (15.0, 15.0) + > LowerRight: (15.0, 10.0) +``` + +### Using `debugger_visualizer` with GDB + +GDB supports the use of a structured Python script, called a *pretty printer*, that describes how a type should be visualized in the debugger view. +For detailed information on pretty printers, refer to GDB's [pretty printing documentation]. + +Embedded pretty printers are not automatically loaded when debugging a binary under GDB. +There are two ways to enable auto-loading embedded pretty printers: +1. Launch GDB with extra arguments to explicitly add a directory or binary to the auto-load safe path: `gdb -iex "add-auto-load-safe-path safe-path path/to/binary" path/to/binary` + For more information, see GDB's [auto-loading documentation]. +1. Create a file named `gdbinit` under `$HOME/.config/gdb` (you may need to create the directory if it doesn't already exist). Add the following line to that file: `add-auto-load-safe-path path/to/binary`. + +These scripts are embedded using the `gdb_script_file` key, which is a path relative to the crate source file. 
+ + +```rust ignore +#![debugger_visualizer(gdb_script_file = "printer.py")] + +struct Person { + name: String, + age: i32, +} + +fn main() { + let bob = Person { name: String::from("Bob"), age: 10 }; + println!("set breakpoint here"); +} +``` + +and `printer.py` contains: + +```python +import gdb + +class PersonPrinter: + "Print a Person" + + def __init__(self, val): + self.val = val + self.name = val["name"] + self.age = int(val["age"]) + + def to_string(self): + return "{} is {} years old.".format(self.name, self.age) + +def lookup(val): + lookup_tag = val.type.tag + if lookup_tag is None: + return None + if "foo::Person" == lookup_tag: + return PersonPrinter(val) + + return None + +gdb.current_objfile().pretty_printers.append(lookup) +``` + +When the crate's debug executable is passed into GDB[^rust-gdb], `print bob` will display: + +```text +"Bob" is 10 years old. +``` + +[^rust-gdb]: Note: This assumes you are using the `rust-gdb` script which configures pretty-printers for standard library types like `String`. 
+ +[auto-loading documentation]: https://sourceware.org/gdb/onlinedocs/gdb/Auto_002dloading-safe-path.html +[attributes]: ../attributes.md +[Natvis documentation]: https://docs.microsoft.com/en-us/visualstudio/debugger/create-custom-views-of-native-objects +[pretty printing documentation]: https://sourceware.org/gdb/onlinedocs/gdb/Pretty-Printing.html +[_MetaListNameValueStr_]: ../attributes.md#meta-item-attribute-syntax diff --git a/src/attributes/diagnostics.md b/src/attributes/diagnostics.md index 45f9cc440..506e2848b 100644 --- a/src/attributes/diagnostics.md +++ b/src/attributes/diagnostics.md @@ -49,7 +49,7 @@ check on and off: ```rust #[warn(missing_docs)] -pub mod m2{ +pub mod m2 { #[allow(missing_docs)] pub mod nested { // Missing documentation is ignored here diff --git a/src/behavior-considered-undefined.md b/src/behavior-considered-undefined.md index 986ba80d5..72b5e8ab0 100644 --- a/src/behavior-considered-undefined.md +++ b/src/behavior-considered-undefined.md @@ -42,9 +42,12 @@ code. All this also applies when values of these types are passed in a (nested) field of a compound type, but not behind pointer indirections. -* Mutating immutable data. All data inside a [`const`] item is immutable. Moreover, all - data reached through a shared reference or data owned by an immutable binding - is immutable, unless that data is contained within an [`UnsafeCell`]. +* Mutating immutable bytes. All bytes inside a [`const`] item are immutable. + The bytes owned by an immutable binding are immutable, unless those bytes are part of an [`UnsafeCell`]. + + Moreover, the bytes [pointed to] by a shared reference, including transitively through other references (both shared and mutable) and `Box`es, are immutable; transitivity includes those references stored in fields of compound types. + + A mutation is any write of more than 0 bytes which overlaps with any of the relevant bytes (even if that write does not change the memory contents). 
* Invoking undefined behavior via compiler intrinsics. * Executing code compiled with platform features that the current platform does not support (see [`target_feature`]), *except* if the platform explicitly documents this to be safe. @@ -79,6 +82,11 @@ code. > `rustc_layout_scalar_valid_range_*` attributes. * Incorrect use of inline assembly. For more details, refer to the [rules] to follow when writing code that uses inline assembly. +* **In [const context](const_eval.md#const-context)**: transmuting or otherwise + reinterpreting a pointer (reference, raw pointer, or function pointer) into + some allocated object as a non-pointer type (such as integers). + 'Reinterpreting' refers to loading the pointer value at integer type without a + cast, e.g. by doing raw pointer casts or using a union. **Note:** Uninitialized memory is also implicitly invalid for any type that has a restricted set of valid values. In other words, the only cases in which @@ -91,13 +99,16 @@ reading uninitialized memory is permitted are inside `union`s and in "padding" > vice versa, undefined behavior in Rust can cause adverse affects on code > executed by any FFI calls to other languages. +### Pointed-to bytes + +The span of bytes a pointer or reference "points to" is determined by the pointer value and the size of the pointee type (using `size_of_val`). + ### Dangling pointers [dangling]: #dangling-pointers A reference/pointer is "dangling" if it is null or not all of the bytes it -points to are part of the same live allocation (so in particular they all have to be -part of *some* allocation). The span of bytes it points to is determined by the -pointer value and the size of the pointee type (using `size_of_val`). +[points to] are part of the same live allocation (so in particular they all have to be +part of *some* allocation). 
If the size is 0, then the pointer must either point inside of a live allocation (including pointing just after the last byte of the allocation), or it must be @@ -121,3 +132,5 @@ must never exceed `isize::MAX`. [dereference expression]: expressions/operator-expr.md#the-dereference-operator [place expression context]: expressions.md#place-expressions-and-value-expressions [rules]: inline-assembly.md#rules-for-inline-assembly +[points to]: #pointed-to-bytes +[pointed to]: #pointed-to-bytes diff --git a/src/comments.md b/src/comments.md index ad29c58e5..bf1e7caa1 100644 --- a/src/comments.md +++ b/src/comments.md @@ -42,7 +42,7 @@ Non-doc comments are interpreted as a form of whitespace. ## Doc comments Line doc comments beginning with exactly _three_ slashes (`///`), and block -doc comments (`/** ... */`), both inner doc comments, are interpreted as a +doc comments (`/** ... */`), both outer doc comments, are interpreted as a special syntax for [`doc` attributes]. That is, they are equivalent to writing `#[doc="..."]` around the body of the comment, i.e., `/// Foo` turns into `#[doc="Foo"]` and `/** Bar */` turns into `#[doc="Bar"]`. diff --git a/src/conditional-compilation.md b/src/conditional-compilation.md index 97840e4f6..e724b21e2 100644 --- a/src/conditional-compilation.md +++ b/src/conditional-compilation.md @@ -129,6 +129,7 @@ Example values: * `"dragonfly"` * `"openbsd"` * `"netbsd"` +* `"none"` (typical for embedded targets) ### `target_family` @@ -254,6 +255,12 @@ It is written as `cfg`, `(`, a configuration predicate, and finally `)`. If the predicate is true, the thing is rewritten to not have the `cfg` attribute on it. If the predicate is false, the thing is removed from the source code. +When a crate-level `cfg` has a false predicate, the behavior is slightly +different: any crate attributes preceding the `cfg` are kept, and any crate +attributes following the `cfg` are removed. 
This allows `#![no_std]` and +`#![no_core]` crates to avoid linking `std`/`core` even if a `#![cfg(...)]` has +removed the entire crate. + Some examples on functions: ```rust diff --git a/src/destructors.md b/src/destructors.md index b246e9fb4..00d31face 100644 --- a/src/destructors.md +++ b/src/destructors.md @@ -285,7 +285,7 @@ An *extending pattern* is either * An [identifier pattern] that binds by reference or mutable reference. * A [struct][struct pattern], [tuple][tuple pattern], [tuple struct][tuple struct pattern], or [slice][slice pattern] pattern where at least one of the - direct subpatterns is a extending pattern. + direct subpatterns is an extending pattern. So `ref x`, `V(ref x)` and `[ref x, y]` are all extending patterns, but `x`, `&ref x` and `&(ref x,)` are not. diff --git a/src/expressions.md b/src/expressions.md index b2411cd8e..ad4cc5f54 100644 --- a/src/expressions.md +++ b/src/expressions.md @@ -162,7 +162,7 @@ Explicitly, the assignee expressions are: - Place expressions. - [Underscores][_UnderscoreExpression_]. - [Tuples][_TupleExpression_] of assignee expressions. -- [Slices][_ArrayExpression_] of assingee expressions. +- [Slices][_ArrayExpression_] of assignee expressions. - [Tuple structs][_StructExpression_] of assignee expressions. - [Structs][_StructExpression_] of assignee expressions (with optionally named fields). 
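Note on the assignee-expression list in the hunk above: the forms it names can be sketched in a few destructuring assignments. This is a minimal illustration, not text from the patch; the `Point` struct and the `swap_and_split` function are hypothetical names introduced only for this example.

```rust
// Hypothetical struct, used only to show a struct assignee expression.
struct Point {
    x: i32,
    y: i32,
}

fn swap_and_split() -> (i32, i32, i32, i32, i32) {
    let (mut a, mut b) = (1, 2);
    // Tuple of place expressions: swaps `a` and `b`.
    (a, b) = (b, a);

    let first;
    // Slice of assignee expressions; the underscore discards the second element.
    [first, _] = [10, 20];

    let px;
    let py;
    // Struct of assignee expressions with named fields.
    Point { x: px, y: py } = Point { x: 3, y: 4 };

    (a, b, first, px, py)
}
```

Each left-hand side here is built only from place expressions and underscores, which is what makes it an assignee expression rather than a pattern.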
diff --git a/src/expressions/call-expr.md b/src/expressions/call-expr.md index 577f3f432..7a01e92e1 100644 --- a/src/expressions/call-expr.md +++ b/src/expressions/call-expr.md @@ -48,7 +48,7 @@ trait Pretty { } trait Ugly { - fn print(&self); + fn print(&self); } struct Foo; diff --git a/src/expressions/loop-expr.md b/src/expressions/loop-expr.md index 204207ee0..c8b93ea39 100644 --- a/src/expressions/loop-expr.md +++ b/src/expressions/loop-expr.md @@ -249,8 +249,27 @@ A `break` expression is only permitted in the body of a loop, and has one of the >    [_BlockExpression_] Labelled block expressions are exactly like block expressions, except that they allow using `break` expressions within the block. -Unlike other loops, `break` expressions within a label expression *must* have a label (i.e. the label is not optional). -Unlike other loops, labelled block expressions *must* begin with a label. +Unlike loops, `break` expressions within a labelled block expression *must* have a label (i.e. the label is not optional). +Similarly, labelled block expressions *must* begin with a label. 
+ +```rust +# fn do_thing() {} +# fn condition_not_met() -> bool { true } +# fn do_next_thing() {} +# fn do_last_thing() {} +let result = 'block: { + do_thing(); + if condition_not_met() { + break 'block 1; + } + do_next_thing(); + if condition_not_met() { + break 'block 2; + } + do_last_thing(); + 3 +}; +``` ## `continue` expressions diff --git a/src/expressions/operator-expr.md b/src/expressions/operator-expr.md index 691f801e8..8b6429636 100644 --- a/src/expressions/operator-expr.md +++ b/src/expressions/operator-expr.md @@ -243,8 +243,8 @@ The operands of all of these operators are evaluated in [value expression contex | `+` | Addition | | Addition | `std::ops::Add` | `std::ops::AddAssign` | | `-` | Subtraction | | Subtraction | `std::ops::Sub` | `std::ops::SubAssign` | | `*` | Multiplication | | Multiplication | `std::ops::Mul` | `std::ops::MulAssign` | -| `/` | Division* | | Division | `std::ops::Div` | `std::ops::DivAssign` | -| `%` | Remainder** | | Remainder | `std::ops::Rem` | `std::ops::RemAssign` | +| `/` | Division*† | | Division | `std::ops::Div` | `std::ops::DivAssign` | +| `%` | Remainder**† | | Remainder | `std::ops::Rem` | `std::ops::RemAssign` | | `&` | Bitwise AND | [Logical AND] | | `std::ops::BitAnd` | `std::ops::BitAndAssign` | | | | Bitwise OR | [Logical OR] | | `std::ops::BitOr` | `std::ops::BitOrAssign` | | `^` | Bitwise XOR | [Logical XOR] | | `std::ops::BitXor` | `std::ops::BitXorAssign` | @@ -258,6 +258,8 @@ The operands of all of these operators are evaluated in [value expression contex \*\*\* Arithmetic right shift on signed integer types, logical right shift on unsigned integer types. +† For integer types, division by zero panics. + Here are examples of these operators being used. 
```rust diff --git a/src/expressions/struct-expr.md b/src/expressions/struct-expr.md index 8caeff200..8d9154789 100644 --- a/src/expressions/struct-expr.md +++ b/src/expressions/struct-expr.md @@ -73,7 +73,7 @@ drop(y_ref); ``` Struct expressions with curly braces can't be used directly in a [loop] or [if] expression's head, or in the [scrutinee] of an [if let] or [match] expression. -However, struct expressions can be in used in these situations if they are within another expression, for example inside [parentheses]. +However, struct expressions can be used in these situations if they are within another expression, for example inside [parentheses]. The field names can be decimal integer values to specify indices for constructing tuple structs. This can be used with base structs to fill out the remaining indices not specified: diff --git a/src/inline-assembly.md b/src/inline-assembly.md index 996b157da..26f1acedc 100644 --- a/src/inline-assembly.md +++ b/src/inline-assembly.md @@ -11,12 +11,14 @@ Support for inline assembly is stable on the following architectures: - ARM - AArch64 - RISC-V +- LoongArch The compiler will emit an error if `asm!` is used on an unsupported target. 
## Example ```rust +# #[cfg(target_arch = "x86_64")] { use std::arch::asm; // Multiply x by 6 using shifts and adds @@ -32,6 +34,7 @@ unsafe { ); } assert_eq!(x, 4 * 6); +# } ``` ## Syntax @@ -43,16 +46,15 @@ format_string := STRING_LITERAL / RAW_STRING_LITERAL dir_spec := "in" / "out" / "lateout" / "inout" / "inlateout" reg_spec := <register class> / "\"" <explicit register> "\"" operand_expr := expr / "_" / expr "=>" expr / expr "=>" "_" -reg_operand := dir_spec "(" reg_spec ")" operand_expr -operand := reg_operand +reg_operand := [ident "="] dir_spec "(" reg_spec ")" operand_expr clobber_abi := "clobber_abi(" <abi> *("," <abi>) [","] ")" option := "pure" / "nomem" / "readonly" / "preserves_flags" / "noreturn" / "nostack" / "att_syntax" / "raw" options := "options(" option *("," option) [","] ")" -asm := "asm!(" format_string *("," format_string) *("," [ident "="] operand) *("," clobber_abi) *("," options) [","] ")" -global_asm := "global_asm!(" format_string *("," format_string) *("," [ident "="] operand) *("," options) [","] ")" +operand := reg_operand / clobber_abi / options +asm := "asm!(" format_string *("," format_string) *("," operand) [","] ")" +global_asm := "global_asm!(" format_string *("," format_string) *("," operand) [","] ")" ``` - ## Scope Inline assembly can be used in one of two ways. @@ -74,8 +76,7 @@ An `asm!` invocation may have one or more template string arguments; an `asm!` w The expected usage is for each template string argument to correspond to a line of assembly code. All template string arguments must appear before any other arguments. -As with format strings, named arguments must appear after positional arguments. -Explicit [register operands](#register-operands) must appear at the end of the operand list, after named arguments if any. +As with format strings, positional arguments must appear before named arguments and explicit [register operands](#register-operands). Explicit register operands cannot be used by placeholders in the template string. 
All other named and positional operands must appear at least once in the template string, otherwise a compiler error is generated. @@ -185,6 +186,8 @@ Here is the list of currently supported register classes: | RISC-V | `reg` | `x1`, `x[5-7]`, `x[9-15]`, `x[16-31]` (non-RV32E) | `r` | | RISC-V | `freg` | `f[0-31]` | `f` | | RISC-V | `vreg` | `v[0-31]` | Only clobbers | +| LoongArch | `reg` | `$r1`, `$r[4-20]`, `$r[23,30]` | `r` | +| LoongArch | `freg` | `$f[0-31]` | `f` | > **Notes**: > - On x86 we treat `reg_byte` differently from `reg` because the compiler can allocate `al` and `ah` separately whereas `reg` reserves the whole register. @@ -223,6 +226,8 @@ The availability of supported types for a particular register class may depend o | RISC-V | `freg` | `f` | `f32` | | RISC-V | `freg` | `d` | `f64` | | RISC-V | `vreg` | N/A | Only clobbers | +| LoongArch64 | `reg` | None | `i8`, `i16`, `i32`, `i64`, `f32`, `f64` | +| LoongArch64 | `freg` | None | `f32`, `f64` | > **Note**: For the purposes of the above table pointers, function pointers and `isize`/`usize` are treated as the equivalent integer type (`i16`/`i32`/`i64` depending on the target). 
@@ -284,15 +289,27 @@ Here is the list of all supported register aliases: | RISC-V | `f[10-17]` | `fa[0-7]` | | RISC-V | `f[18-27]` | `fs[2-11]` | | RISC-V | `f[28-31]` | `ft[8-11]` | +| LoongArch | `$r0` | `$zero` | +| LoongArch | `$r1` | `$ra` | +| LoongArch | `$r2` | `$tp` | +| LoongArch | `$r3` | `$sp` | +| LoongArch | `$r[4-11]` | `$a[0-7]` | +| LoongArch | `$r[12-20]` | `$t[0-8]` | +| LoongArch | `$r21` | | +| LoongArch | `$r22` | `$fp`, `$s9` | +| LoongArch | `$r[23-31]` | `$s[0-8]` | +| LoongArch | `$f[0-7]` | `$fa[0-7]` | +| LoongArch | `$f[8-23]` | `$ft[0-15]` | +| LoongArch | `$f[24-31]` | `$fs[0-7]` | Some registers cannot be used for input or output operands: | Architecture | Unsupported register | Reason | | ------------ | -------------------- | ------ | | All | `sp` | The stack pointer must be restored to its original value at the end of an asm code block. | -| All | `bp` (x86), `x29` (AArch64), `x8` (RISC-V) | The frame pointer cannot be used as an input or output. | +| All | `bp` (x86), `x29` (AArch64), `x8` (RISC-V), `$fp` (LoongArch) | The frame pointer cannot be used as an input or output. | | ARM | `r7` or `r11` | On ARM the frame pointer can be either `r7` or `r11` depending on the target. The frame pointer cannot be used as an input or output. | -| All | `si` (x86-32), `bx` (x86-64), `r6` (ARM), `x19` (AArch64), `x9` (RISC-V) | This is used internally by LLVM as a "base pointer" for functions with complex stack frames. | +| All | `si` (x86-32), `bx` (x86-64), `r6` (ARM), `x19` (AArch64), `x9` (RISC-V), `$s8` (LoongArch) | This is used internally by LLVM as a "base pointer" for functions with complex stack frames. | | x86 | `ip` | This is the program counter, not a real register. | | AArch64 | `xzr` | This is a constant zero register which can't be modified. | | AArch64 | `x18` | This is an OS-reserved register on some AArch64 targets. 
| @@ -300,6 +317,9 @@ Some registers cannot be used for input or output operands: | ARM | `r9` | This is an OS-reserved register on some ARM targets. | | RISC-V | `x0` | This is a constant zero register which can't be modified. | | RISC-V | `gp`, `tp` | These registers are reserved and cannot be used as inputs or outputs. | +| LoongArch | `$r0` or `$zero` | This is a constant zero register which can't be modified. | +| LoongArch | `$r2` or `$tp` | This is reserved for TLS. | +| LoongArch | `$r21` | This is reserved by the ABI. | The frame pointer and base pointer registers are reserved for internal use by LLVM. While `asm!` statements cannot explicitly specify the use of reserved registers, in some cases LLVM will allocate one of these reserved registers for `reg` operands. Assembly code making use of reserved registers should be careful since `reg` operands may use the same registers. @@ -346,6 +366,8 @@ The supported modifiers are a subset of LLVM's (and GCC's) [asm template argumen | ARM | `qreg` | `e` / `f` | `d0` / `d1` | `e` / `f` | | RISC-V | `reg` | None | `x1` | None | | RISC-V | `freg` | None | `f0` | None | +| LoongArch | `reg` | None | `$r1` | None | +| LoongArch | `freg` | None | `$f0` | None | > **Notes**: > - on ARM `e` / `f`: this prints the low or high doubleword register name of a NEON quad (128-bit) register. @@ -379,6 +401,7 @@ The following ABIs can be used with `clobber_abi`: | AArch64 | `"C"`, `"system"`, `"efiapi"` | `x[0-17]`, `x18`\*, `x30`, `v[0-31]`, `p[0-15]`, `ffr` | | ARM | `"C"`, `"system"`, `"efiapi"`, `"aapcs"` | `r[0-3]`, `r12`, `r14`, `s[0-15]`, `d[0-7]`, `d[16-31]` | | RISC-V | `"C"`, `"system"`, `"efiapi"` | `x1`, `x[5-7]`, `x[10-17]`, `x[28-31]`, `f[0-7]`, `f[10-17]`, `f[28-31]`, `v[0-31]` | +| LoongArch | `"C"`, `"system"`, `"efiapi"` | `$r1`, `$r[4-20]`, `$f[0-23]` | > Notes: > - On AArch64 `x18` only included in the clobber list if it is not considered as a reserved register on the target. 
@@ -466,6 +489,8 @@ To avoid undefined behavior, these rules must be followed when using function-sc - RISC-V - Floating-point exception flags in `fcsr` (`fflags`). - Vector extension state (`vtype`, `vl`, `vcsr`). + - LoongArch + - Floating-point condition flags in `$fcc[0-7]`. - On x86, the direction flag (DF in `EFLAGS`) is clear on entry to an asm block and must be clear on exit. - Behavior is undefined if the direction flag is set on exiting an asm block. - On x86, the x87 floating-point register stack must remain unchanged unless all of the `st([0-7])` registers have been marked as clobbered with `out("st(0)") _, out("st(1)") _, ...`. @@ -486,6 +511,29 @@ To avoid undefined behavior, these rules must be followed when using function-sc > **Note**: As a general rule, the flags covered by `preserves_flags` are those which are *not* preserved when performing a function call. +### Correctness and Validity + +In addition to all of the previous rules, the string argument to `asm!` must ultimately become— +after all other arguments are evaluated, formatting is performed, and operands are translated— +assembly that is both syntactically correct and semantically valid for the target architecture. +The formatting rules allow the compiler to generate assembly with correct syntax. +Rules concerning operands permit valid translation of Rust operands into and out of `asm!`. +Adherence to these rules is necessary, but not sufficient, for the final expanded assembly to be +both correct and valid. For instance: + +- arguments may be placed in positions which are syntactically incorrect after formatting +- an instruction may be correctly written, but given architecturally invalid operands +- an architecturally unspecified instruction may be assembled into unspecified code +- a set of instructions, each correct and valid, may cause undefined behavior if placed in immediate succession + +As a result, these rules are _non-exhaustive_. 
The compiler is not required to check the +correctness and validity of the initial string nor the final assembly that is generated. +The assembler may check for correctness and validity but is not required to do so. +When using `asm!`, a typographical error may be sufficient to make a program unsound, +and the rules for assembly may include thousands of pages of architectural reference manuals. +Programmers should exercise appropriate care, as invoking this `unsafe` capability comes with +assuming the responsibility of not violating rules of both the compiler and the architecture. + ### Directives Support Inline assembly supports a subset of the directives supported by both GNU AS and LLVM's internal assembler, given as follows. @@ -499,12 +547,9 @@ The following directives are guaranteed to be supported by the assembler: - `.4byte` - `.8byte` - `.align` +- `.alt_entry` - `.ascii` - `.asciz` -- `.alt_entry` -- `.balign` -- `.balignl` -- `.balignw` - `.balign` - `.balignl` - `.balignw` @@ -520,17 +565,17 @@ The following directives are guaranteed to be supported by the assembler: - `.eqv` - `.fill` - `.float` -- `.globl` - `.global` -- `.lcomm` +- `.globl` - `.inst` +- `.lcomm` - `.long` - `.octa` - `.option` -- `.private_extern` - `.p2align` -- `.pushsection` - `.popsection` +- `.private_extern` +- `.pushsection` - `.quad` - `.scl` - `.section` diff --git a/src/items/constant-items.md b/src/items/constant-items.md index bf315932f..85d3e015d 100644 --- a/src/items/constant-items.md +++ b/src/items/constant-items.md @@ -89,6 +89,22 @@ m!(const _: () = ();); // const _: () = (); ``` +## Evaluation + +[Free][free] constants are always [evaluated][const_eval] at compile-time to surface +panics. 
This happens even within an unused function: + +```rust,compile_fail +// Compile-time panic +const PANIC: () = std::unimplemented!(); + +fn unused_generic_function<T>() { + // A failing compile-time assertion + const _: () = assert!(usize::BITS == 0); +} +``` + +[const_eval]: ../const_eval.md [associated constant]: ../items/associated-items.md#associated-constants [constant value]: ../const_eval.md#constant-expressions [free]: ../glossary.md#free-item diff --git a/src/items/external-blocks.md b/src/items/external-blocks.md index ce2190829..982f57ba7 100644 --- a/src/items/external-blocks.md +++ b/src/items/external-blocks.md @@ -90,6 +90,8 @@ There are also some platform-specific ABI strings: `__fastcall` and GCC and clang's `__attribute__((fastcall))` * `extern "vectorcall"` -- The `vectorcall` ABI -- corresponds to MSVC's `__vectorcall` and clang's `__attribute__((vectorcall))` +* `extern "thiscall"` -- The default for C++ member functions on MSVC -- corresponds to MSVC's + `__thiscall` and GCC and clang's `__attribute__((thiscall))` * `extern "efiapi"` -- The ABI used for [UEFI] functions. ## Variadic functions @@ -231,9 +233,38 @@ resolution logic to find that import library. Alternatively, specifying `kind = "raw-dylib"` instructs the compiler to generate an import library during compilation and provide that to the linker instead. -`raw-dylib` is only supported on Windows and not supported on 32-bit x86 -(`target_arch="x86"`). Using it when targeting other platforms or -x86 on Windows will result in a compiler error. +`raw-dylib` is only supported on Windows. Using it when targeting other +platforms will result in a compiler error. + +#### The `import_name_type` key + +On x86 Windows, names of functions are "decorated" (i.e., have a specific prefix +and/or suffix added) to indicate their calling convention. For example, a +`stdcall` calling convention function with the name `fn1` that has no arguments +would be decorated as `_fn1@0`. 
However, the [PE Format] does also permit names +to have no prefix or be undecorated. Additionally, the MSVC and GNU toolchains +use different decorations for the same calling conventions which means, by +default, some Win32 functions cannot be called using the `raw-dylib` link kind +via the GNU toolchain. + +To allow for these differences, when using the `raw-dylib` link kind you may +also specify the `import_name_type` key with one of the following values to +change how functions are named in the generated import library: + +* `decorated`: The function name will be fully-decorated using the MSVC + toolchain format. +* `noprefix`: The function name will be decorated using the MSVC toolchain + format, but skipping the leading `?`, `@`, or optionally `_`. +* `undecorated`: The function name will not be decorated. + +If the `import_name_type` key is not specified, then the function name will be +fully-decorated using the target toolchain's format. + +Variables are never decorated and so the `import_name_type` key has no effect on +how they are named in the generated import library. + +The `import_name_type` key is only supported on x86 Windows. Using it when +targeting other platforms will result in a compiler error. ### The `link_name` attribute @@ -308,3 +339,4 @@ restrictions as [regular function parameters]. [`whole-archive` documentation for rustc]: ../../rustc/command-line-arguments.html#linking-modifiers-whole-archive [`verbatim` documentation for rustc]: ../../rustc/command-line-arguments.html#linking-modifiers-verbatim [`dylib` versus `raw-dylib`]: #dylib-versus-raw-dylib +[PE Format]: https://learn.microsoft.com/windows/win32/debug/pe-format#import-name-type diff --git a/src/items/unions.md b/src/items/unions.md index 0b989ccc1..3c6c83d50 100644 --- a/src/items/unions.md +++ b/src/items/unions.md @@ -61,7 +61,7 @@ non-zero offset (except when [the C representation] is used); in that case the bits starting at the offset of the fields are read. 
It is the programmer's responsibility to make sure that the data is valid at the field's type. Failing to do so results in [undefined behavior]. For example, reading the value `3` -through of a field of the [boolean type] is undefined behavior. Effectively, +from a field of the [boolean type] is undefined behavior. Effectively, writing to and then reading from a union with [the C representation] is analogous to a [`transmute`] from the type used for writing to the type used for reading. diff --git a/src/keywords.md b/src/keywords.md index 67f1089d8..1855a35d0 100644 --- a/src/keywords.md +++ b/src/keywords.md @@ -111,6 +111,7 @@ is possible to declare a variable or method with the name `union`. Beginning in the 2018 edition, `dyn` has been promoted to a strict keyword. > **Lexer**\ +> KW_MACRO_RULES : `macro_rules`\ > KW_UNION : `union`\ > KW_STATICLIFETIME : `'static` > diff --git a/src/macros-by-example.md b/src/macros-by-example.md index cd9dc3402..51aa919fc 100644 --- a/src/macros-by-example.md +++ b/src/macros-by-example.md @@ -166,7 +166,7 @@ The repetition operators are: - `*` — indicates any number of repetitions. - `+` — indicates any number but at least one. -- `?` — indicates an optional fragment with zero or one occurrences. +- `?` — indicates an optional fragment with zero or one occurrence. Since `?` represents at most one occurrence, it cannot be used with a separator. 
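As an illustration of the `?` repetition operator described in the `src/macros-by-example.md` hunk above, here is a minimal sketch. The macro `sum_with_optional` and the two wrapper functions are hypothetical names chosen for this example; they do not appear in the reference text. The optional group `$(, $extra:expr)?` matches zero or one trailing fragment, and no separator token is written between the closing parenthesis and `?`.

```rust
// Hypothetical macro, for illustration only.
macro_rules! sum_with_optional {
    // `$(, $extra:expr)?` matches zero or one `, expr` fragment.
    // The comma is part of the repeated fragment, not a separator.
    ($first:expr $(, $extra:expr)?) => {
        $first $(+ $extra)?
    };
}

// The optional fragment is absent, so this expands to just `1`.
fn one_arg() -> i32 {
    sum_with_optional!(1)
}

// The optional fragment is present, so this expands to `1 + 2`.
fn two_args() -> i32 {
    sum_with_optional!(1, 2)
}
```

A separator only makes sense between two repetitions, and `?` never produces a second one, so a matcher such as `$($extra:expr),?` is rejected by the compiler.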
diff --git a/src/names/namespaces.md b/src/names/namespaces.md index 14811697c..bb4409b73 100644 --- a/src/names/namespaces.md +++ b/src/names/namespaces.md @@ -52,6 +52,7 @@ The following is a list of namespaces, with their corresponding entities: * [Generic lifetime parameters] * Label Namespace * [Loop labels] + * [Block labels] An example of how overlapping names in different namespaces can be used unambiguously: @@ -132,6 +133,7 @@ It is still an error for a [`use` import] to shadow another macro, regardless of [Attribute macros]: ../procedural-macros.md#attribute-macros [attributes]: ../attributes.md [bang-style macros]: ../macros.md +[Block labels]: ../expressions/loop-expr.md#labelled-block-expressions [boolean]: ../types/boolean.md [Built-in attributes]: ../attributes.md#built-in-attributes-index [closure parameters]: ../expressions/closure-expr.md diff --git a/src/paths.md b/src/paths.md index cb6b24aa0..9efbda701 100644 --- a/src/paths.md +++ b/src/paths.md @@ -125,7 +125,7 @@ S::f(); // Calls the inherent impl. >    `::`? _TypePathSegment_ (`::` _TypePathSegment_)\* > > _TypePathSegment_ :\ ->    _PathIdentSegment_ `::`? ([_GenericArgs_] | _TypePathFn_)? +>    _PathIdentSegment_ (`::`? ([_GenericArgs_] | _TypePathFn_))? > > _TypePathFn_ :\ > `(` _TypePathFnInputs_? `)` (`->` [_Type_])? 
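The corrected _TypePathSegment_ rule in the `src/paths.md` hunk above allows a bare identifier, an identifier followed by generic arguments (with or without a leading `::`), or an identifier followed by function-style arguments. The following sketch shows each form in type position; the function names are hypothetical and exist only for this example.

```rust
// `Vec<u8>`: identifier followed directly by generic arguments.
fn len_plain(v: Vec<u8>) -> usize {
    v.len()
}

// `Vec::<u8>`: the same type written with the optional `::`.
fn len_with_colons(v: Vec::<u8>) -> usize {
    v.len()
}

// `Fn(u8) -> u8`: identifier followed by a _TypePathFn_.
fn apply(f: &dyn Fn(u8) -> u8, x: u8) -> u8 {
    f(x)
}
```

With the added parentheses in the grammar, the `::` is only permitted when generic arguments or function-style arguments follow it.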
diff --git a/src/procedural-macros.md b/src/procedural-macros.md index 31f029a63..7d69ab72d 100644 --- a/src/procedural-macros.md +++ b/src/procedural-macros.md @@ -251,7 +251,7 @@ use my_macro::show_streams; #[show_streams] fn invoke1() {} // out: attr: "" -// out: item: "fn invoke1() { }" +// out: item: "fn invoke1() {}" // Example: Attribute with input #[show_streams(bar)] diff --git a/src/trait-bounds.md b/src/trait-bounds.md index 0a6731288..a7cd5a7d9 100644 --- a/src/trait-bounds.md +++ b/src/trait-bounds.md @@ -156,6 +156,79 @@ fn call_on_ref_zero<F>(f: F) where F: for<'a> Fn(&'a i32) { } ``` +## Implied bounds + +Lifetime bounds required for types to be well-formed are sometimes inferred. + +```rust +fn requires_t_outlives_a<'a, T>(x: &'a T) {} +``` +The type parameter `T` is required to outlive `'a` for the type `&'a T` to be well-formed. +This is inferred because the function signature contains the type `&'a T` which is +only valid if `T: 'a` holds. + +Implied bounds are added for all parameters and outputs of functions. Inside of `requires_t_outlives_a` +you can assume `T: 'a` to hold even if you don't explicitly specify this: + +```rust +fn requires_t_outlives_a_not_implied<'a, T: 'a>() {} + +fn requires_t_outlives_a<'a, T>(x: &'a T) { + // This compiles, because `T: 'a` is implied by + // the reference type `&'a T`. + requires_t_outlives_a_not_implied::<'a, T>(); +} +``` + +```rust,compile_fail,E0309 +# fn requires_t_outlives_a_not_implied<'a, T: 'a>() {} +fn not_implied<'a, T>() { + // This errors, because `T: 'a` is not implied by + // the function signature. + requires_t_outlives_a_not_implied::<'a, T>(); +} +``` + +Only lifetime bounds are implied; trait bounds still have to be explicitly added.
+The following example therefore causes an error: + +```rust,compile_fail,E0277 +use std::fmt::Debug; +struct IsDebug<T: Debug>(T); +// error[E0277]: `T` doesn't implement `Debug` +fn doesnt_specify_t_debug<T>(x: IsDebug<T>) {} +``` + +Lifetime bounds are also inferred for type definitions and impl blocks for any type: + +```rust +struct Struct<'a, T> { + // This requires `T: 'a` to be well-formed + // which is inferred by the compiler. + field: &'a T, +} + +enum Enum<'a, T> { + // This requires `T: 'a` to be well-formed, + // which is inferred by the compiler. + // + // Note that `T: 'a` is required even when only + // using `Enum::OtherVariant`. + SomeVariant(&'a T), + OtherVariant, +} + +trait Trait<'a, T: 'a> {} + +// This would error because `T: 'a` is not implied by any type +// in the impl header. +// impl<'a, T> Trait<'a, T> for () {} + +// This compiles as `T: 'a` is implied by the self type `&'a T`. +impl<'a, T> Trait<'a, T> for &'a T {} +``` + + [LIFETIME_OR_LABEL]: tokens.md#lifetimes-and-loop-labels [_GenericParams_]: items/generics.md [_TypePath_]: paths.md#paths-in-types diff --git a/src/type-layout.md b/src/type-layout.md index 191567a42..4c87954f3 100644 --- a/src/type-layout.md +++ b/src/type-layout.md @@ -549,13 +549,28 @@ The `align` modifier can also be applied on an `enum`. When it is, the effect on the `enum`'s alignment is the same as if the `enum` was wrapped in a newtype `struct` with the same `align` modifier. -
- -***Warning:*** Dereferencing an unaligned pointer is [undefined behavior] and -it is possible to [safely create unaligned pointers to `packed` fields][27060]. -Like all ways to create undefined behavior in safe Rust, this is a bug. - -
+> Note: References to unaligned fields are not allowed because creating one is [undefined behavior]. +> When fields are unaligned due to an alignment modifier, consider the following options for using references and dereferences: +> +> ```rust +> #[repr(packed)] +> struct Packed { +> f1: u8, +> f2: u16, +> } +> let mut e = Packed { f1: 1, f2: 2 }; +> // Instead of creating a reference to a field, copy the value to a local variable. +> let x = e.f2; +> // Or in situations like `println!`, which creates a reference, use braces +> // to change it to a copy of the value. +> println!("{}", {e.f2}); +> // Or if you need a pointer, use the unaligned methods for reading and writing +> // instead of dereferencing the pointer directly. +> let ptr: *const u16 = std::ptr::addr_of!(e.f2); +> let value = unsafe { ptr.read_unaligned() }; +> let mut_ptr: *mut u16 = std::ptr::addr_of_mut!(e.f2); +> unsafe { mut_ptr.write_unaligned(3) } +> ``` ### The `transparent` Representation @@ -587,7 +602,6 @@ used with any other representation.
[enumerations]: items/enumerations.md [zero-variant enums]: items/enumerations.md#zero-variant-enums [undefined behavior]: behavior-considered-undefined.md -[27060]: https://github.com/rust-lang/rust/issues/27060 [55149]: https://github.com/rust-lang/rust/issues/55149 [`PhantomData`]: special-types-and-traits.md#phantomdatat [Default]: #the-default-representation diff --git a/src/types/impl-trait.md b/src/types/impl-trait.md index 413f999f8..af900408e 100644 --- a/src/types/impl-trait.md +++ b/src/types/impl-trait.md @@ -31,15 +31,15 @@ The caller must provide a type that satisfies the bounds declared by the anonymo For example, these two forms are almost equivalent: -```rust,ignore +```rust trait Trait {} // generic type parameter -fn foo<T: Trait>(arg: T) { +fn with_generic_type<T: Trait>(arg: T) { } // impl Trait in argument position -fn foo(arg: impl Trait) { +fn with_impl_trait(arg: impl Trait) { } ``` @@ -96,16 +96,24 @@ With `impl Trait`, unlike with a generic type parameter, the function chooses th The function: -```rust,ignore +```rust +# trait Trait {} fn foo<T: Trait>() -> T { + // ... +# panic!() +} ``` allows the caller to determine the return type, `T`, and the function returns that type. The function: -```rust,ignore +```rust +# trait Trait {} +# impl Trait for () {} fn foo() -> impl Trait { + // ... +} ``` doesn't allow the caller to determine the return type. diff --git a/src/types/never.md b/src/types/never.md index e32674272..3fbd2ad5c 100644 --- a/src/types/never.md +++ b/src/types/never.md @@ -7,16 +7,17 @@ The never type `!` is a type with no values, representing the result of computations that never complete. Expressions of type `!` can be coerced into any other type. - -```rust,ignore -let x: ! = panic!(); -// Can be coerced into any type. -let y: u32 = x; +The `!` type can **only** appear in function return types presently, +indicating it is a diverging function that never returns. + +```rust +fn foo() -> !
{ + panic!("This call never returns."); +} ``` -**NB.** The never type was expected to be stabilized in 1.41, but due -to some last minute regressions detected the stabilization was -temporarily reverted. The `!` type can only appear in function return -types presently. See [the tracking -issue](https://github.com/rust-lang/rust/issues/35121) for more -details. +```rust +extern "C" { + pub fn no_return_extern_func() -> !; +} +``` diff --git a/src/types/textual.md b/src/types/textual.md index 7f3899d70..65d563312 100644 --- a/src/types/textual.md +++ b/src/types/textual.md @@ -8,7 +8,7 @@ or 0xE000 to 0x10FFFF range. It is immediate [Undefined Behavior] to create a `char` that falls outside this range. A `[char]` is effectively a UCS-4 / UTF-32 string of length 1. -A value of type `str` is represented the same way as `[u8]`, it is a slice of +A value of type `str` is represented the same way as `[u8]`, a slice of 8-bit unsigned bytes. However, the Rust standard library makes extra assumptions about `str`: methods working on `str` assume and ensure that the data in there is valid UTF-8. Calling a `str` method with a non-UTF-8 buffer can cause diff --git a/src/unsafe-keyword.md b/src/unsafe-keyword.md index 5fa5deea6..a29fc9432 100644 --- a/src/unsafe-keyword.md +++ b/src/unsafe-keyword.md @@ -27,9 +27,9 @@ this can be changed by enabling the [`unsafe_op_in_unsafe_fn`] lint. By putting operations into an unsafe block, the programmer states that they have taken care of satisfying the extra safety conditions of all operations inside that block. Unsafe blocks are the logical dual to unsafe functions: -where unsafe functions define a proof obligation that callers must uphold, unsafe blocks state that all relevant proof obligations have been discharged. +where unsafe functions define a proof obligation that callers must uphold, unsafe blocks state that all relevant proof obligations of functions or operations called inside the block have been discharged. 
There are many ways to discharge proof obligations; -for example, there could be run-time checks or data structure invariants that guarantee that certain properties are definitely true, or the unsafe block could be inside an `unsafe fn` and use its own proof obligations to discharge the proof obligations of its callees. +for example, there could be run-time checks or data structure invariants that guarantee that certain properties are definitely true, or the unsafe block could be inside an `unsafe fn`, in which case the block can use the proof obligations of that function to discharge the proof obligations arising inside the block. Unsafe blocks are used to wrap foreign libraries, make direct use of hardware or implement features not directly present in the language. For example, Rust provides the language features necessary to implement memory-safe concurrency in the language but the implementation of threads and message passing in the standard library uses unsafe blocks.
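The two ways of discharging proof obligations mentioned in the `src/unsafe-keyword.md` hunk above can be sketched as follows. The function names `read_unchecked` and `read_or_zero` are hypothetical and chosen only for this example; the sketch assumes the standard library method `get_unchecked` on slices, whose safety condition is that the index is in bounds.

```rust
/// Reads `slice[index]` without a bounds check.
///
/// # Safety
///
/// The caller must guarantee that `index < slice.len()`.
unsafe fn read_unchecked(slice: &[u32], index: usize) -> u32 {
    // This block sits inside an `unsafe fn`, so it relies on the function's
    // own proof obligation (`index < slice.len()`) to discharge the
    // obligation of `get_unchecked`.
    unsafe { *slice.get_unchecked(index) }
}

/// Safe wrapper: a run-time check discharges the obligation instead.
fn read_or_zero(slice: &[u32], index: usize) -> u32 {
    if index < slice.len() {
        // The bounds check above guarantees the precondition of
        // `read_unchecked`, so this call is sound.
        unsafe { read_unchecked(slice, index) }
    } else {
        0
    }
}
```

In the first function the obligation is passed on to the caller, while in the second it ends at the `if`, which is why `read_or_zero` can be a safe function.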