Copyright notice: original writing takes effort, so please credit the source when reposting.
&str is the most basic string type in Rust. Declaring a variable of type &str is simple:
let s = "hello rust";
The &str type
We can print the type of the variable s defined above:
#![feature(type_name_of_val)]
fn main() {
    let s = "hello rust!";
    println!("{}: {}", std::any::type_name_of_val(&s), s);
}
Compiled with the nightly toolchain on the Rust Playground, the program reports the type of s as &str.
The standard library documentation describes str and &str as follows:
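On a stable toolchain, a similar check can be sketched with std::any::type_name, which needs no feature gate. The helper type_of below is a hypothetical name introduced only for illustration, and the exact text returned by type_name is documented as a best-effort description that may change between compiler versions.

```rust
use std::any::type_name;

// Hypothetical helper: reports the compiler's name for the type behind `_value`.
fn type_of<T: ?Sized>(_value: &T) -> &'static str {
    type_name::<T>()
}

fn main() {
    let s = "hello rust!";
    // Passing `&s` makes T = &str, so the reported name describes `s` itself.
    println!("{}: {}", type_of(&s), s);
}
```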
The str type, also called a 'string slice', is the most primitive string type. It is usually seen in its borrowed form, &str. It is also the type of string literals, &'static str.
String slices are always valid UTF-8.
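Put simply, str is the string slice type and the most primitive string type in Rust, but we usually meet it in its borrowed form (a reference), &str. The most direct example is a string literal, which has the static lifetime 'static.
"Why Rust?" gives another example: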
let seasons = vec!["Spring", "Summer", "Bleakness"];
That is:
This declares seasons to be a value of type Vec<&str>, a vector of references to statically allocated strings.
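So in Rust, a string literal has type &'static str: a reference to a statically allocated string. More generally, a &str is a borrowed view of UTF-8 string data that may live anywhere, for example inside a heap-allocated String.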
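A &str does not have to point into static memory. The following is a minimal sketch contrasting a literal with a slice borrowed from a String; first_word is a hypothetical helper added for illustration.

```rust
// Hypothetical helper: returns the first whitespace-separated word of `text`.
// It accepts any borrowed string data, static or not.
fn first_word(text: &str) -> &str {
    text.split_whitespace().next().unwrap_or("")
}

fn main() {
    // A string literal lives in the program's static data: &'static str.
    let literal: &'static str = "hello rust";

    // A String owns heap memory; `as_str` borrows it as a &str whose
    // lifetime is tied to `owned`, not to 'static.
    let owned: String = String::from("heap allocated");
    let borrowed: &str = owned.as_str();

    println!("{}", first_word(literal));
    println!("{}", first_word(borrowed));
}
```

Functions that only read text usually take &str for this reason: the same signature serves literals, String values, and sub-slices.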
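[T], &[T] and FatPtr
In Rust a slice is written &[T]. It is a view of [T], a contiguous sequence in memory of values of the same type whose length cannot be determined at compile time. In the libcore source quoted below, its in-memory representation is handled through the Repr union. &[T] is a pointer to [T], and at the lowest level that pointer is modeled by a struct called a fat pointer (FatPtr):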
// src/libcore/ptr/mod.rs
#[repr(C)]
pub(crate) union Repr<T> {
    pub(crate) rust: *const [T],
    rust_mut: *mut [T],
    pub(crate) raw: FatPtr<T>,
}
#[repr(C)]
pub(crate) struct FatPtr<T> {
    data: *const T,
    pub(crate) len: usize,
}
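In terms of memory layout, a slice pointer and a FatPtr value occupy the same memory, and FatPtr stores the two things that define a slice:
- data: a pointer to the first element of a run of contiguous, homogeneous data;
- len: the number of elements stored in the contiguous memory that data points to.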
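The two fields can be observed from safe code through the public slice API. This is a minimal sketch, not the library's internals: parts is a hypothetical helper, and the size checks describe how current Rust compilers lay out slice references in practice.

```rust
use std::mem::size_of;

// Hypothetical helper: splits a slice reference into the two values
// a fat pointer carries, the data pointer and the element count.
fn parts(slice: &[u32]) -> (*const u32, usize) {
    (slice.as_ptr(), slice.len())
}

fn main() {
    let array = [10u32, 20, 30, 40];
    let slice: &[u32] = &array[1..3];

    // A slice reference is two machine words wide: pointer plus length.
    assert_eq!(size_of::<&[u32]>(), 2 * size_of::<usize>());
    // A reference to a sized type is a single word (a thin pointer).
    assert_eq!(size_of::<&u32>(), size_of::<usize>());

    let (data, len) = parts(slice);
    println!("data = {:p}, len = {}", data, len);

    // Rebuild the slice from its parts. This is sound here because `data`
    // and `len` came from a live slice whose backing array is still in scope.
    let rebuilt: &[u32] = unsafe { std::slice::from_raw_parts(data, len) };
    assert_eq!(rebuilt, &array[1..3]);
}
```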
Thanks to Rust's type system, the methods and traits that the standard library defines on [T] fully encapsulate the low-level work of interpreting the pointer (work that has to be done in unsafe Rust).
Take the standard library's implementation of len:
// src/libcore/slice/mod.rs
#[lang = "slice"]
#[cfg(not(test))]
impl<T> [T] {
    /// Returns the number of elements in the slice.
    ///
    /// # Examples
    ///
    /// ```
    /// let a = [1, 2, 3];
    /// assert_eq!(a.len(), 3);
    /// ```
    #[stable(feature = "rust1", since = "1.0.0")]
    #[rustc_const_stable(feature = "const_slice_len", since = "1.32.0")]
    #[inline]
    // SAFETY: const sound because we transmute out the length field as a usize (which it must be)
    #[allow(unused_attributes)]
    #[allow_internal_unstable(const_fn_union)]
    pub const fn len(&self) -> usize {
        unsafe { crate::ptr::Repr { rust: self }.raw.len }
    }
The str type
Here is the standard library's implementation of str:
// src/libcore/str/mod.rs
#[lang = "str"]
#[cfg(not(test))]
impl str {
    // ...
    #[stable(feature = "rust1", since = "1.0.0")]
    #[rustc_const_stable(feature = "const_str_len", since = "1.32.0")]
    #[inline]
    pub const fn len(&self) -> usize {
        self.as_bytes().len()
    }
    // ...
    #[stable(feature = "rust1", since = "1.0.0")]
    #[rustc_const_stable(feature = "str_as_bytes", since = "1.32.0")]
    #[inline(always)]
    #[allow(unused_attributes)]
    #[allow_internal_unstable(const_fn_union)]
    pub const fn as_bytes(&self) -> &[u8] {
        #[repr(C)]
        union Slices<'a> {
            str: &'a str,
            slice: &'a [u8],
        }
        // SAFETY: const sound because we transmute two types with the same layout
        unsafe { Slices { str: self }.slice }
    }
    // ...
A &str variable can call len to get the number of bytes in the string. The definition of len shows that it is implemented by calling as_bytes, and as_bytes defines a union type, Slices, declared with a C-compatible memory layout (#[repr(C)]):
#[repr(C)]
union Slices<'a> {
    str: &'a str,
    slice: &'a [u8],
}
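Readers familiar with unions will notice that &str and &[u8] have the same memory layout, so in terms of layout &str is simply the special case of &[T] with T = u8. The len method on str does nothing more than call the len method of &[u8].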
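The relationship can be checked from user code. The sketch below assumes only the public API; byte_and_char_len is a hypothetical helper, and the sample string uses an escaped é (U+00E9) so that the byte count is unambiguous.

```rust
use std::mem::size_of;

// Hypothetical helper: byte length and character count of a string slice.
fn byte_and_char_len(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

fn main() {
    // Both reference types are fat pointers of the same size.
    assert_eq!(size_of::<&str>(), size_of::<&[u8]>());

    // "héllo": U+00E9 takes two bytes in UTF-8.
    let s = "h\u{e9}llo";

    // `len` on str is the length of the underlying byte slice.
    assert_eq!(s.len(), s.as_bytes().len());
    // `as_bytes` reinterprets the same memory; nothing is copied.
    assert_eq!(s.as_ptr(), s.as_bytes().as_ptr());

    let (bytes, chars) = byte_and_char_len(s);
    println!("bytes = {}, chars = {}", bytes, chars);
}
```

Because len counts bytes rather than characters, the two numbers differ as soon as the text contains non-ASCII characters.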
&str vs. &[u8]
String slices are always valid UTF-8.
A string slice is always a valid UTF-8 byte sequence. That invariant is what separates &str from an arbitrary &[u8].
&str -> &[u8]
let s = "hello rust";
let bytes = s.as_bytes();
&[u8] -> &str
// src/libcore/str/mod.rs
#[stable(feature = "rust1", since = "1.0.0")]
pub fn from_utf8(v: &[u8]) -> Result<&str, Utf8Error> {
    run_utf8_validation(v)?;
    // SAFETY: Just ran validation.
    Ok(unsafe { from_utf8_unchecked(v) })
}
#[stable(feature = "str_mut_extras", since = "1.20.0")]
pub fn from_utf8_mut(v: &mut [u8]) -> Result<&mut str, Utf8Error> {
    run_utf8_validation(v)?;
    // SAFETY: Just ran validation.
    Ok(unsafe { from_utf8_unchecked_mut(v) })
}
#[inline]
#[stable(feature = "rust1", since = "1.0.0")]
pub unsafe fn from_utf8_unchecked(v: &[u8]) -> &str {
    &*(v as *const [u8] as *const str)
}
#[inline]
#[stable(feature = "str_mut_extras", since = "1.20.0")]
pub unsafe fn from_utf8_unchecked_mut(v: &mut [u8]) -> &mut str {
    &mut *(v as *mut [u8] as *mut str)
}
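Here run_utf8_validation(v) performs the required validity check on the byte sequence. If the bytes are not valid UTF-8, the function returns an Err holding a Utf8Error. The unchecked variants skip this check, which is why they are unsafe.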
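From the caller's side, the checked conversion looks like the sketch below. decode is a hypothetical wrapper added for illustration; it maps the error to Utf8Error::valid_up_to, the byte offset up to which the input was valid.

```rust
use std::str;

// Hypothetical wrapper: returns the decoded text, or the number of
// leading bytes that were valid UTF-8 before validation failed.
fn decode(bytes: &[u8]) -> Result<&str, usize> {
    str::from_utf8(bytes).map_err(|e| e.valid_up_to())
}

fn main() {
    let valid = [0x68, 0x69]; // the ASCII bytes of "hi"
    let invalid = [0x68, 0x69, 0xff]; // 0xff never occurs in UTF-8

    for input in [&valid[..], &invalid[..]] {
        match decode(input) {
            Ok(text) => println!("valid: {}", text),
            Err(offset) => println!("invalid after {} bytes", offset),
        }
    }
}
```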
One more thing
Consider the following example:
let s = "hello rust";
let len = s.len();
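Here s has type &str, so how does s call the len method, which is defined on str?
len is declared as fn len(&self) on str, so the receiver it expects is exactly a &str and the call s.len() matches. More generally, when a receiver has more layers of reference than a method expects, the compiler dereferences it automatically during method resolution. The standard library expresses this dereferencing for every reference type &T through the Deref trait: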
// src/libcore/ops/deref.rs
#[stable(feature = "rust1", since = "1.0.0")]
impl<T: ?Sized> Deref for &T {
    type Target = T;
    fn deref(&self) -> &T {
        *self
    }
}
// ...
#[stable(feature = "rust1", since = "1.0.0")]
impl<T: ?Sized> Deref for &mut T {
    type Target = T;
    fn deref(&self) -> &T {
        *self
    }
}
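For a type that implements the Deref trait, the compiler inserts as many dereferences as needed, in the appropriate places (method calls and deref coercions), to turn the value's type into T.
The deref function takes &self, an immutable reference: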
#[lang = "deref"]
#[doc(alias = "*")]
#[doc(alias = "&*")]
#[stable(feature = "rust1", since = "1.0.0")]
pub trait Deref {
    /// The resulting type after dereferencing.
    #[stable(feature = "rust1", since = "1.0.0")]
    type Target: ?Sized;
    /// Dereferences the value.
    #[must_use]
    #[stable(feature = "rust1", since = "1.0.0")]
    fn deref(&self) -> &Self::Target;
}
Since deref only ever hands out a shared borrow, dereferencing performed implicitly by the compiler is always safe.
References
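Automatic dereferencing can be seen both with nested references and with a user-defined Deref implementation. This is a minimal sketch: Label and shout are hypothetical names introduced only for illustration.

```rust
use std::ops::Deref;

// Hypothetical wrapper type, used only to show a user-defined Deref.
struct Label(String);

impl Deref for Label {
    type Target = str;

    fn deref(&self) -> &str {
        &self.0
    }
}

// Takes a plain &str; callers rely on deref coercion to get there.
fn shout(text: &str) -> String {
    text.to_uppercase()
}

fn main() {
    let s = "hello rust";

    // Method resolution peels off the extra reference layers.
    let nested: &&&str = &&s;
    println!("{}", nested.len());

    // `len` is found on str through Label's Deref implementation.
    let label = Label(String::from("hello"));
    println!("{}", label.len());

    // Deref coercion turns &Label into &str at the call site.
    println!("{}", shout(&label));
}
```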
- ptr: https://doc.rust-lang.org/std/ptr/index.html
- str: https://doc.rust-lang.org/std/str/index.html
- slice: https://doc.rust-lang.org/std/slice/index.html
- 《Rust編程之道》
- 《Why Rust?》