| 1 | +//! Utilities related to FFI bindings. |
| 2 | +//! |
| 3 | +//! This module provides utilities to handle data across non-Rust |
| 4 | +//! interfaces, like other programming languages and the underlying |
| 5 | +//! operating system. It is mainly of use for FFI (Foreign Function |
| 6 | +//! Interface) bindings and code that needs to exchange C-like strings |
| 7 | +//! with other languages. |
| 8 | +//! |
| 9 | +//! # Overview |
| 10 | +//! |
| 11 | +//! Rust represents owned strings with the [`String`] type, and |
| 12 | +//! borrowed slices of strings with the [`str`] primitive. Both are |
| 13 | +//! always in UTF-8 encoding, and may contain nul bytes in the middle, |
| 14 | +//! i.e., if you look at the bytes that make up the string, there may |
| 15 | +//! be a `\0` among them. Both `String` and `str` store their length |
| 16 | +//! explicitly; there are no nul terminators at the end of strings |
| 17 | +//! like in C. |
| 18 | +//! |
| 19 | +//! C strings are different from Rust strings: |
| 20 | +//! |
| 21 | +//! * **Encodings** - Rust strings are UTF-8, but C strings may use |
| 22 | +//! other encodings. If you are using a string from C, you should |
| 23 | +//! check its encoding explicitly, rather than just assuming that it |
| 24 | +//! is UTF-8 like you can do in Rust. |
| 25 | +//! |
| 26 | +//! * **Character size** - C strings may use `char` or `wchar_t`-sized |
| 27 | +//! characters; please **note** that C's `char` is different from Rust's. |
| 28 | +//! The C standard leaves the actual sizes of those types open to |
| 29 | +//! interpretation, but defines different APIs for strings made up of |
| 30 | +//! each character type. Rust strings are always UTF-8, so different |
| 31 | +//! Unicode characters will be encoded in a variable number of bytes |
| 32 | +//! each. The Rust type [`char`] represents a '[Unicode scalar |
| 33 | +//! value]', which is similar to, but not the same as, a '[Unicode |
| 34 | +//! code point]'. |
| 35 | +//! |
| 36 | +//! * **Nul terminators and implicit string lengths** - Often, C |
| 37 | +//! strings are nul-terminated, i.e., they have a `\0` character at the |
| 38 | +//! end. The length of a string buffer is not stored, but has to be |
| 39 | +//! calculated; to compute the length of a string, C code must |
| 40 | +//! manually call a function like `strlen()` for `char`-based strings, |
| 41 | +//! or `wcslen()` for `wchar_t`-based ones. Those functions return |
| 42 | +//! the number of characters in the string excluding the nul |
| 43 | +//! terminator, so the buffer length is really `len+1` characters. |
| 44 | +//! Rust strings don't have a nul terminator; their length is always |
| 45 | +//! stored and does not need to be calculated. While in Rust |
| 46 | +//! accessing a string's length is an *O*(1) operation (because the |
| 47 | +//! length is stored), in C it is an *O*(*n*) operation because the |
| 48 | +//! length needs to be computed by scanning the string for the nul |
| 49 | +//! terminator. |
| 50 | +//! |
| 51 | +//! * **Internal nul characters** - When C strings have a nul |
| 52 | +//! terminator character, this usually means that they cannot have nul |
| 53 | +//! characters in the middle — a nul character would essentially |
| 54 | +//! truncate the string. Rust strings *can* have nul characters in |
| 55 | +//! the middle, because nul does not have to mark the end of the |
| 56 | +//! string in Rust. |
| 57 | +//! |
| 58 | +//! # Representations of non-Rust strings |
| 59 | +//! |
| 60 | +//! [`CString`] and [`CStr`] are useful when you need to transfer |
| 61 | +//! UTF-8 strings to and from languages with a C ABI, like Python. |
| 62 | +//! |
| 63 | +//! * **From Rust to C:** [`CString`] represents an owned, C-friendly |
| 64 | +//! string: it is nul-terminated, and has no internal nul characters. |
| 65 | +//! Rust code can create a [`CString`] out of a normal string (provided |
| 66 | +//! that the string doesn't have nul characters in the middle), and |
| 67 | +//! then use a variety of methods to obtain a raw <code>\*mut [u8]</code> that can |
| 68 | +//! then be passed as an argument to functions which use the C |
| 69 | +//! conventions for strings. |
| 70 | +//! |
| 71 | +//! * **From C to Rust:** [`CStr`] represents a borrowed C string; it |
| 72 | +//! is what you would use to wrap a raw <code>\*const [u8]</code> that you got from |
| 73 | +//! a C function. A [`CStr`] is guaranteed to be a nul-terminated array |
| 74 | +//! of bytes. Once you have a [`CStr`], you can convert it to a Rust |
| 75 | +//! <code>&[str]</code> if it's valid UTF-8, or lossily convert it by adding |
| 76 | +//! replacement characters. |
| 77 | +//! |
| 78 | +//! [`String`]: crate::string::String |
| 79 | +//! [`CStr`]: core::ffi::CStr |
| 80 | + |
| 81 | +#![unstable(feature = "alloc_ffi", issue = "94079")] |
| 82 | + |
| 83 | +#[cfg(bootstrap)] |
| 84 | +#[unstable(feature = "cstr_internals", issue = "none")] |
| 85 | +pub use self::c_str::CStrExt; |
| 86 | +#[unstable(feature = "alloc_c_string", issue = "94079")] |
| 87 | +pub use self::c_str::FromVecWithNulError; |
| 88 | +#[unstable(feature = "alloc_c_string", issue = "94079")] |
| 89 | +pub use self::c_str::{CString, IntoStringError, NulError}; |
| 90 | + |
| 91 | +mod c_str; |
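
The module docs above describe the Rust-to-C and C-to-Rust directions in prose. A minimal sketch of both follows. It assumes the stable `std::ffi` re-exports of `CString` and `CStr` rather than the unstable `alloc::ffi` path this diff introduces. The helper names (`to_c_string`, `c_len`, `round_trip`, `decode_lossy`) are hypothetical and exist only for illustration. The raw pointer used here is the `*const c_char` returned by `CString::as_ptr`.

```rust
use std::ffi::{CStr, CString, NulError};
use std::os::raw::c_char;

/// Converts a Rust string into an owned, nul-terminated C string.
/// Returns `NulError` if `s` contains a nul byte, since a C string
/// cannot represent one in the middle.
fn to_c_string(s: &str) -> Result<CString, NulError> {
    CString::new(s)
}

/// Hypothetical stand-in for C code that only sees a raw pointer:
/// the length has to be found by scanning for the terminator,
/// much like `strlen()` would.
fn c_len(owned: &CString) -> usize {
    let ptr: *const c_char = owned.as_ptr();
    // SAFETY: `ptr` comes from `owned`, which is alive and nul-terminated.
    let borrowed: &CStr = unsafe { CStr::from_ptr(ptr) };
    // `to_bytes` excludes the trailing nul.
    borrowed.to_bytes().len()
}

/// Sends a Rust string through a raw pointer and back again.
/// Returns `None` if the input has an interior nul byte.
fn round_trip(s: &str) -> Option<String> {
    let owned = to_c_string(s).ok()?;
    let ptr: *const c_char = owned.as_ptr();
    // SAFETY: `owned` outlives `borrowed` within this function.
    let borrowed: &CStr = unsafe { CStr::from_ptr(ptr) };
    // `to_str` succeeds only if the bytes are valid UTF-8.
    let text = borrowed.to_str().ok()?;
    Some(text.to_owned())
}

/// Decodes a nul-terminated byte buffer that may not be valid UTF-8,
/// substituting U+FFFD for invalid sequences.
fn decode_lossy(bytes_with_nul: &[u8]) -> Option<String> {
    let c = CStr::from_bytes_with_nul(bytes_with_nul).ok()?;
    Some(c.to_string_lossy().into_owned())
}
```

For example, `round_trip("hello")` yields `Some("hello".to_string())`, while `to_c_string("he\0llo")` returns an error whose `nul_position()` is 2. The buffer behind a `CString` for `"hello"` is six bytes long (`as_bytes_with_nul().len()`), one more than the length a `strlen()`-style scan reports.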