Building Bridges to C++

Document #:
Date: 2024-11-08
Project: Programming Language C++
Audience:
Reply-to: Sean Baxter

1 Motivation

Companies ship software that contains security vulnerabilities to millions of customers. For C++ products, 70% of those vulnerabilities would be stopped by a memory-safe language. There’s growing pressure to move off memory-unsafe languages and onto safe languages like Rust, Swift, Java and C#. The US Government is calling for safety roadmaps from big vendors outlining how they’ll migrate to memory-safe languages for new code. The deadline for these roadmaps is coming up: January 1, 2026.

What can be done to hasten the migration to safe coding?

I proposed the Safe C++ extension. This overhauls Standard C++ with memory safety capabilities. It implements the same borrow-checking technology as featured in Rust. This is one path for C++ projects to start writing memory-safe code.

A second viable path to safety is through improved Rust interop. The recent study Eliminating Memory Safety Vulnerabilities at the Source demonstrates that old production code contains fewer vulnerabilities than new code. Time has debugged it. Consequently, the best way to reduce vulnerabilities is to put existing code in maintenance mode and write new code in a memory-safe language. This document explores an idea for dramatically reducing interop friction between C++ and Rust. If it’s easy to use C++ code from Rust, developers will be more open to making the transition.

This is a proposal about molding C++ to support all of Rust’s vocabulary types to increase the surface area of interop between the languages.

2 C interoperability

Operating systems expose functionality through C APIs. Standard libraries, for any language, are built atop these system APIs. Interoperability with C is very easy for language developers. There’s no overloading of declarations. There are no templates or generics. Structs have straightforward layout rules that are no challenge to implement.

For the purpose of compilers, C’s ABI is just the parameter-passing convention of the platform your program targets. Unix-like systems follow the ELF object file conventions. Each CPU architecture has an ELF or System V supplement that specifies struct layouts, parameter passing and object file definitions.

Peruse the x86-64 System V ABI for details on processor-specific conventions. C abstracts these concerns from the user. If you code against the C language, your software should compile for many operating systems and hardware architectures.

Languages provide a way to define C-layout structs. Compilers implement the parameter-passing conventions for each target. Voilà. C interoperability.
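
As a minimal sketch of what that looks like from the C++ side (the Point type and point_length function are hypothetical, chosen only for illustration), the whole contract is a standard-layout struct and an unmangled function; a Rust caller would mirror it with #[repr(C)] and an extern "C" declaration, and the platform ABI does the rest.

#include <cmath>

extern "C" {
  // C-shaped surface: no overloading, no templates, a layout every
  // toolchain can reproduce.
  struct Point {
    double x;
    double y;
  };

  // Exported as a plain C symbol; arguments are passed per the platform's
  // parameter-passing convention (e.g. the x86-64 System V ABI).
  double point_length(Point p) {
    return std::sqrt(p.x * p.x + p.y * p.y);
  }
}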

3 C++ interoperability

To call C functions, you don’t need much. To call C++ functions you need all the intelligence of a C++ frontend. There are a lot of factors that contribute to making C++ interoperability a colossal challenge.

C++ is a big knot that can’t be untangled. If each language feature is a hitch or a bend, tugging at one concern just tightens the others.

4 Two classes of interop

Let’s break the interop problem into first- and second-class levels of support.

  1. Coverage - First-class support for language features. By definition, C++ has coverage of all C++ features and Rust has coverage of all Rust features. We can grow the interop surface area by adding Rust feature coverage to C++ and C++ feature coverage to Rust. Existing C++/Rust interop tools rely on coverage, which is why they operate at the C language level of abstraction: both toolchains support C layout and function calls. Coverage is high-quality but it increases the complexity of the toolchain.
  2. Intelligence - Second-class support for language features. Language A needs access to data and semantics of language B. If you want to call a C++ function, you need argument-dependent lookup, overload resolution, argument deduction, substitution and so on. Rather than language A implementing this complex logic, it gets provided across toolchains by language B.

Intelligence is the novel portion of this design. Expose compiler functionality through an API. Using the API, point the compiler at a module or header file to parse it and return a metadata tree of declarations. Submit a query, such as a request for the primary, partial or explicit specialization of a class template, and retrieve a result. This compiler-as-library, which provides intelligence, is a language server. Rust and C++ compilers can access each other’s data and semantics by using one another as language servers.
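
Today’s nearest analogue to that language server is a compiler packaged as a library. The sketch below is not the proposed interface, just the shape: libclang parsing a hypothetical widget.h and walking the resulting tree of declarations.

#include <clang-c/Index.h>
#include <cstdio>

int main() {
  CXIndex index = clang_createIndex(/*excludeDeclarationsFromPCH=*/0,
                                    /*displayDiagnostics=*/1);
  CXTranslationUnit tu = clang_parseTranslationUnit(
      index, "widget.h", /*command_line_args=*/nullptr, 0,
      /*unsaved_files=*/nullptr, 0, CXTranslationUnit_None);
  if (!tu) return 1;

  // Walk every declaration the frontend found. A real interop language
  // server would answer richer queries (layout, specialization, overload
  // resolution), but the compiler-as-library shape is the same.
  clang_visitChildren(
      clang_getTranslationUnitCursor(tu),
      [](CXCursor c, CXCursor, CXClientData) {
        CXString kind = clang_getCursorKindSpelling(clang_getCursorKind(c));
        CXString name = clang_getCursorSpelling(c);
        std::printf("%s: %s\n", clang_getCString(kind), clang_getCString(name));
        clang_disposeString(name);
        clang_disposeString(kind);
        return CXChildVisit_Recurse;
      },
      nullptr);

  clang_disposeTranslationUnit(tu);
  clang_disposeIndex(index);
}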

Coverage goes beyond intelligence in letting you not just use declarations, but define them. We don’t propose adding coverage of C++ templates into Rust (it already has a rich generics system), so you can’t define templates in Rust. But you can use them through the C++ language server.

5 Wide coverage for Rust interop

In the Safe C++ proposal I introduced a new std2 standard library. The containers are designed with borrows, lifetime parameters and relocation semantics to provide rigorous memory safety. But the excellent Eliminating Memory Safety Vulnerabilities at the Source study out of Google made me reconsider this design choice. The study makes a strong case that rather than worrying about rewriting C++ code, the best strategy for improving software quality is to focus on a quick transition to memory-safe languages.

Extending C++ to use Rust’s standard library natively is a direct improvement to interoperability.

5.1 Using Rust from C++

Consider extending language coverage and accessing cross-language intelligence to model a toolchain where Rust declarations can be used directly from C++ without bridge code. Let’s walk through a scenario.

  1. Intelligence: A C++ file imports a Rust module into a C++ namespace. The Rust language server parses the module code and returns metadata of all parsed declarations. Rust has a different layout scheme than C++, and only structs marked with #[repr(C)] are guaranteed compatible with C layout. Since we want to support all Rust types, struct and enum layouts are part of the discovery data provided by the language server.
  2. Coverage: The C++ frontend injects these declarations into the requested namespace, making them available for qualified lookup. Name lookup is natively supported by C++ and doesn’t require use of a language server.
  3. Coverage: C++ code can define functions originally declared on the Rust side. Safe C++ already has a safe-specifier, borrow types and lifetime parameters with outlives-constraints.
  4. Coverage: Lower C++ functions from AST to MIR. Safe C++ implements NLL borrow checking, which guarantees that the exclusivity and lifetime invariants implied by the function declaration are upheld through the definition of the function. This is coverage, because it’s a first-class feature.
  5. Intelligence: Function declarations that are ODR-used require that definitions be emitted to satisfy the linker. C++ transmits ODR usage of Rust-provided functions to the Rust language server. Rust must lower these implementations to LLVM bitcode, which is merged with the C++ bitcode prior to optimization.

The next step is to prioritize the set of features that improve C++’s coverage of Rust.

These are profound extensions to C++. Rust types use relocation semantics rather than C++11 move semantics, so C++ compilers need a new mid-level IR subsystem to perform initialization analysis and drop elaboration. In order to call functions with Rust types, we essentially embed Rust’s object model into the C++ extension.
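
To see why relocation is a different object model rather than a new spelling of C++11 moves, note that in today’s C++ a moved-from object stays alive and is still destroyed, so the compiler never has to know whether a place is initialized. A minimal illustration in standard C++:

#include <string>
#include <utility>
#include <vector>

int main() {
  std::vector<std::string> a = {"bridges", "to", "C++"};

  // C++11 move: a is left in a valid but unspecified (typically empty) state.
  std::vector<std::string> b = std::move(a);

  // a is still a live object and its destructor still runs at end of scope.
  // Under Rust-style relocation, moving out of a would end its lifetime and
  // no destructor would run; that is what initialization analysis and drop
  // elaboration have to track.
  return static_cast<int>(b.size());
}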

5.2 Using C++ from Rust

Let’s go in the other direction and use C++ entities from Rust:

  1. Intelligence: A Rust module imports a C++ header. The C++ language server parses the header’s text and returns metadata of all parsed declarations.
  2. Coverage: The Rust frontend injects the supported declarations into the requested namespace, making them available for qualified lookup.
  3. Intelligence: Rust code can use C++ types and functions. If it wants to specialize a class template or call a member function on a C++ object, it uses the C++ language server to perform specialization or overload resolution. While it’s possible to build C++ semantics directly into a Rust frontend, that is a big lift. A language server provides the same result for your function call without the immense cost in tooling development.
  4. Coverage: After argument deduction and overload resolution choose a best viable candidate, Rust has to make a direct function call. This requires coverage of C++ function call ABI, which is absent in existing interop tools. It has to be able to copy or move its function arguments. Types with non-trivial move constructors, like std::string, will require that Rust support non-trivial relocation.
  5. Intelligence: As with C++, any ODR-used foreign functions (ODR usage is discovered when a function is lowered to IR) must be communicated back through the language server. The C++ compiler generates those definitions in LLVM bitcode. The linker incorporates both the C++ and Rust definitions in the same crate.

What new coverage does Rust need for high-quality C++ interop? This is more modest than the C++ extensions, because there’s a desire to maintain the relative simplicity of Rust. The C++ coverage can be considered an interop extension rather than part of the “core language.”

We won’t be able to define in Rust all functions previously declared in C++, since some function prototypes involve language entities that extended Rust doesn’t have coverage for. Overloading is supported, but templates aren’t. But that should be okay. You can still use C++ types and functions directly from Rust. The C++ language server is responsible for evaluating the semantics around function calls and template specializations.

5.3 One-sided interop without language servers

We can add a lot of value even with one-sided interop. Extend C++ with coverage of all of Rust’s types. This provides an environment to write idiomatic wrappers that provide access to C++ functionality through Rust’s native types and traits. There’s still the impedance of only being able to access C++ assets via these wrappers, but since the wrappers are implemented on the C++ side, they have unfettered access to your legacy C++ code. It’s just C++ wrapper implementations calling into other C++ code.

Contrast this with the current practice of bridging the language divide with unsafe C APIs and then wrapping those in Rust. Lowering everything to a C interface discards overloads, templates, destructors and ownership information; it’s that loss of expressiveness that makes C API bridge interop so frustrating.
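
That current practice looks something like the sketch below (the Logger class is hypothetical): the class is flattened into an opaque handle and free functions before Rust ever sees it, and a Rust wrapper then rebuilds what the C++ type already expressed.

#include <cstdio>
#include <string>

// The C++ class we actually want to use from Rust.
class Logger {
public:
  void log(const std::string& msg) { std::fprintf(stderr, "%s\n", msg.c_str()); }
  // Overloads, default arguments, templates, RAII: none of it survives the
  // C boundary below.
};

// The typical unsafe C bridge: an opaque handle plus free functions taking
// char*. The Rust side re-wraps these in a safe type, duplicating work the
// C++ type already did.
extern "C" {
  Logger* logger_create() { return new Logger{}; }
  void logger_log(Logger* l, const char* msg) { l->log(msg); }
  void logger_destroy(Logger* l) { delete l; }
}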

5.4 Parameter destructors

The C++ Standard doesn’t prescribe parameter-passing conventions. That’s left to the platform ABI. On Unix-like platforms, the Itanium C++ ABI stipulates that callers are responsible for calling destructors on function arguments.

If the type has a non-trivial destructor, the caller calls that destructor after control returns to it (including when the caller throws an exception).

Itanium C++ ABI: Non-Trivial Parameters[https://itanium-cxx-abi.github.io/cxx-abi/abi.html#non-trivial-parameters]
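
A small C++ counterpart to the Rust example below makes the convention concrete (assuming the Itanium ABI; the destructor call site is only visible in the generated code):

#include <cstdio>

struct Owned {
  ~Owned() { std::puts("~Owned"); }
};

void callee(Owned o) {
  // Under the Itanium ABI, no destructor call is emitted here for o.
}

int main() {
  // The caller constructs the argument and, per the rule quoted above,
  // destroys it after callee returns. Rust's convention is the opposite:
  // the callee owns and drops its parameters.
  callee(Owned{});
}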

fn f1(s: String) {
  // s is owned and destroyed by f1.
}

fn f2(s: String) {
  // s is owned by f2.

  // s is relocated to f1. It's no longer owned by f2.
  f1(s);

  // s is not destroyed because it's not an owned place.
}

Rust performs relocation on objects that are non-Copy. Relocating s from f2 into the call of f1 leaves f2’s s parameter uninitialized. Drop instructions for local objects with non-trivial destructors are emitted when a function is lowered to MIR, but a subsequent drop elaboration pass eliminates drops for places that are uninitialized.

In Rust, callees destroy parameter objects. This is necessary since a parameter may be relocated or dropped before the end of its scope. Calling a Rust function with the C++ convention risks a double-free: from C++, the caller would destroy the argument; from Rust, the callee would destroy the parameter.

C++ needs an alternate calling convention to support Rust’s affine type system. The Safe C++ draft discusses [function parameter ownership], proposing a __relocate calling convention that gives ownership of parameters to callees.

5.5 The std::string tragedy

Almost all C++ container types are trivially relocatable without knowing it. Important types like unique_ptr, shared_ptr and vector are trivially relocatable. Their declarations could be marked with a [[trivially_relocatable]] attribute for compatibility with Rust’s relocation semantics.

Unfortunately, the libstdc++ version of std::string is not trivially relocatable. It implements a small-string optimization that maintains a pointer back into its own storage. Move construction and move assignment reset the small-string pointer back to local storage. Trivial relocation would leave a dangling pointer. The idea was to get rid of a branch when calling the std::string::data() member function. But this optimization makes for a pretty wasteful implementation. std::string weighs 32 bytes (std::vector is 24 bytes) but only has a local capacity for strings of 15 characters or fewer.
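
A toy type (not libstdc++’s actual layout) shows the failure mode: when an object holds a pointer into its own storage, copying its bytes to a new location leaves that pointer aimed at the old one.

#include <cstdio>
#include <cstring>

// Toy small-string: data points into the object's own inline buffer.
struct TinySso {
  char* data;
  char buf[16];
  explicit TinySso(const char* s) : data(buf) {
    std::snprintf(buf, sizeof buf, "%s", s);
  }
};

int main() {
  TinySso a("hi");

  // "Trivially relocate" a into fresh storage by copying its bytes.
  alignas(TinySso) unsigned char storage[sizeof(TinySso)];
  std::memcpy(storage, &a, sizeof a);
  auto* b = reinterpret_cast<TinySso*>(storage);

  // b->data still points at a.buf, not b->buf: the relocated object dangles
  // once a's storage goes away. A real move constructor patches the pointer,
  // which is exactly why this std::string design is not trivially relocatable.
  std::printf("%s\n", b->data == b->buf ? "points to own buffer" : "dangles into a");
}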

This is not the first troublesome string. There was a previous libstdc++ std::string that used copy-on-write to deliver cheap string copies. This was no good, because calling the non-const operator[] could force the string to detach, spawning a copy of the data if the string didn’t have exclusive ownership and invalidating outstanding references. (A pity. Why are you even allowed to modify strings like that?)

The move away from the copy-on-write string was one of the few ABI breaks in C++ history. Rust’s avoidance of a stable ABI makes it easier to change library implementations to satisfy new requirements. But C++ has a stable ABI, for better or worse, and you have to play it as it lays. Thanks to std::string and the transitive property of containment, non-trivial relocation is a necessary buy-in for Rust to support move semantics for many C++ types.

5.6 Swift’s coverage tradeoffs

The Swift team has been working for several years improving C++ interop. Their effort also embeds a C++ compiler (which is Clang) into the Swift toolchain. There’s no way to interop with C++ without embedding a C++ frontend.

The question of how much C++ coverage to incorporate in Swift is one that the engineers are wrestling with.

  • Functions and constructors that use r-value reference types are not yet available in Swift.
  • Swift supports calling some C++ function templates.
  • Any function or function template that uses a dependent type in its signature, or a universal reference (T &&) is not available in Swift.
  • Any function template with non-type template parameters is not available in Swift.
  • Variadic function templates are not available in Swift.

Supported C++ APIs

The Swift language remains slightly smaller at the cost of not being able to use a large amount of C++. Without access to move semantics, it’s really not able to use any of it efficiently. Is this tradeoff worth it? I don’t think so. I don’t advocate a maximalist approach to extending Rust with C++ capabilities (although I do favor maximalism in the other direction), but I am convinced that a few strategic extensions to Rust will have enormous payoff for a quality interop experience.

  • Enumerations that have an enumeration case with more than one associated value [are not yet supported]

Swift Enumerations Supported by C++

Swift didn’t extend its embedded C++ compiler with first-class enum types. Therefore, the C++ side can’t use Swift enums with more than one associated value. Enums are a flagship feature for both Rust and Swift. I think it’s worth it to extend the C++ side to fully support them. Safe C++ has first-class choice types with pattern matching. While maintaining these extensions is a burden for C++ tooling engineers, the goal of interop isn’t to make their life easier, it’s to make everyone else’s life easier.

6 Exception handling

C++ exception handling is a major source of friction when dealing with Rust interop. But it doesn’t have to be. Rust is 99% of the way to supporting C++ exceptions. When compiled with -C panic=unwind, which is the default, Rust functions are all potentially throwing. When lowered to MIR and then to LLVM, function calls have a normal edge leading to the next statement and a cleanup edge that catches the exception, calls the destructor for all in-scope objects with non-trivial drops, and resumes unwinding. This is exactly what C++ does.

6.1 How C++ unwinds

(Compiler Explorer)

struct HasDtor {
  int i;
  ~HasDtor() { }
};

// Potentially throwing. (i.e. not noexcept)
void may_throw() { }

int func() {
  HasDtor a { };

  // On the cleanup edge out of may_throw, run a's dtor.
  may_throw();

  return 1;
}
define dso_local noundef i32 @func()() #0 personality ptr @__gxx_personality_v0 !dbg !15 {
  %1 = alloca %struct.HasDtor, align 4
  %2 = alloca ptr, align 8
  %3 = alloca i32, align 4
  call void @llvm.dbg.declare(metadata ptr %1, metadata !20, metadata !DIExpression()), !dbg !28
  %4 = getelementptr inbounds %struct.HasDtor, ptr %1, i32 0, i32 0, !dbg !29
  store i32 1, ptr %4, align 4, !dbg !29
  invoke void @may_throw()()
          to label %5 unwind label %6, !dbg !30

5:
  call void @HasDtor::~HasDtor()(ptr noundef nonnull align 4 dereferenceable(4) %1) #4, !dbg !31
  ret i32 1, !dbg !31

6:
  %7 = landingpad { ptr, i32 }
          cleanup, !dbg !31
  %8 = extractvalue { ptr, i32 } %7, 0, !dbg !31
  store ptr %8, ptr %2, align 8, !dbg !31
  %9 = extractvalue { ptr, i32 } %7, 1, !dbg !31
  store i32 %9, ptr %3, align 4, !dbg !31
  call void @HasDtor::~HasDtor()(ptr noundef nonnull align 4 dereferenceable(4) %1) #4, !dbg !31
  br label %10, !dbg !31

10:
  %11 = load ptr, ptr %2, align 8, !dbg !31
  %12 = load i32, ptr %3, align 4, !dbg !31
  %13 = insertvalue { ptr, i32 } poison, ptr %11, 0, !dbg !31
  %14 = insertvalue { ptr, i32 } %13, i32 %12, 1, !dbg !31
  resume { ptr, i32 } %14, !dbg !31
}

declare i32 @__gxx_personality_v0(...)

In C++, in-scope objects with non-trivial destructors are destroyed by the cleanup block. Here the cleanup block is 6. The landingpad instruction advertises its intent to clean up in-scope objects. The cleanup block copies out a { ptr, i32 } pair, which identifies the exception object, calls HasDtor’s destructor, and resumes on that cached pair. Since the function participates in exception handling it is associated with __gxx_personality_v0, C++’s standard personality function, which abstracts some even lower-level exception-handling APIs.

6.2 How Rust unwinds

struct HasDtor { i: i32 }

impl Drop for HasDtor {
  fn drop(&mut self) { }
}

// Potentially throwing. (i.e. not noexcept)
fn may_throw() { }

fn func() -> i32 {
  let _a = HasDtor { i: 1 };

  // On the cleanup edge out of may_throw, run a's dtor.
  may_throw();

  return 1;
}
define internal i32 @_ZN5throw4func17hd08044f7eb69f50cE() unnamed_addr #1 personality ptr @rust_eh_personality {
start:
  %0 = alloca [16 x i8], align 8
  %_a = alloca [4 x i8], align 4
  store i32 1, ptr %_a, align 4
; invoke throw::may_throw
  invoke void @_ZN5throw9may_throw17hb8b8ce4f5b598848E()
          to label %bb1 unwind label %cleanup

bb3:                                              ; preds = %cleanup
; invoke core::ptr::drop_in_place<throw::HasDtor>
  invoke void @"_ZN4core3ptr35drop_in_place$LT$throw..HasDtor$GT$17hcc21909492c17e73E"(ptr align 4 %_a) #5
          to label %bb4 unwind label %terminate

cleanup:                                          ; preds = %start
  %1 = landingpad { ptr, i32 }
          cleanup
  %2 = extractvalue { ptr, i32 } %1, 0
  %3 = extractvalue { ptr, i32 } %1, 1
  store ptr %2, ptr %0, align 8
  %4 = getelementptr inbounds i8, ptr %0, i64 8
  store i32 %3, ptr %4, align 8
  br label %bb3

bb1:                                              ; preds = %start
; call core::ptr::drop_in_place<throw::HasDtor>
  call void @"_ZN4core3ptr35drop_in_place$LT$throw..HasDtor$GT$17hcc21909492c17e73E"(ptr align 4 %_a)
  ret i32 1

terminate:                                        ; preds = %bb3
  %5 = landingpad { ptr, i32 }
          filter [0 x ptr] zeroinitializer
  %6 = extractvalue { ptr, i32 } %5, 0
  %7 = extractvalue { ptr, i32 } %5, 1
; call core::panicking::panic_in_cleanup
  call void @_ZN4core9panicking16panic_in_cleanup17hb5e4521fe5c4d68fE() #6
  unreachable

bb4:                                              ; preds = %bb3
  %8 = load ptr, ptr %0, align 8
  %9 = getelementptr inbounds i8, ptr %0, i64 8
  %10 = load i32, ptr %9, align 8
  %11 = insertvalue { ptr, i32 } poison, ptr %8, 0
  %12 = insertvalue { ptr, i32 } %11, i32 %10, 1
  resume { ptr, i32 } %12
}

declare i32 @rust_eh_personality(i32, i32, i64, ptr, ptr) unnamed_addr #1

Rust does all the same cleanup as C++. In fact, it does more cleanup, because even its destructors are potentially throwing. C++ destructors are implicitly noexcept. In this Rust example, the cleanup block is called cleanup. The landingpad instruction expresses the cleanup handler and caches the same { ptr, i32 } pair. The cleanup code branches to bb3 which calls HasDtor’s destructor. But that destructor is also potentially throwing. If the destructor throws, it’s non-recoverable, since we’re already on the cleanup path. That cleanup edge jumps to the terminate block which calls core::panicking::panic_in_cleanup. That function prints “panic in a destructor during cleanup” and aborts. The normal path out of the destructor branches to bb4 which resumes stack unwinding.

If you look closely you may notice one salient difference: Rust uses the rust_eh_personality personality function. This is closely modeled on the C++ version: rust_eh_personality_impl.

If Rust’s personality function is actually incompatible with C++ cleanup (I don’t know if it is or not), it can be replaced by __gxx_personality_v0. Additionally, for consistency with C++ exceptions, Rust’s panic objects could be allocated with __cxa_allocate_exception, the same storage that backs C++ exceptions. That’s part of libc++abi.

6.3 RTTI

(Compiler Explorer)

struct S { int i; };

void throw_it() {
  throw S { 10 };
}

int main() {
  try {
    throw_it();

  } catch(S s) {

  } catch(int i) {

  }
}
%struct.S = type { i32 }

$_ZTS1S = comdat any

$_ZTI1S = comdat any

@_ZTVN10__cxxabiv117__class_type_infoE = external global [0 x ptr]
@_ZTS1S = linkonce_odr dso_local constant [3 x i8] c"1S\00", comdat, align 1
@_ZTI1S = linkonce_odr dso_local constant { ptr, ptr } { ptr getelementptr inbounds (ptr, ptr @_ZTVN10__cxxabiv117__class_type_infoE, i64 2), ptr @_ZTS1S }, comdat, align 8
@_ZTIi = external constant ptr

; Function Attrs: mustprogress noinline optnone uwtable
define dso_local void @_Z8throw_itv() #0 {
  %1 = call ptr @__cxa_allocate_exception(i64 4) #4
  %2 = getelementptr inbounds %struct.S, ptr %1, i32 0, i32 0
  store i32 10, ptr %2, align 16
  call void @__cxa_throw(ptr %1, ptr @_ZTI1S, ptr null) #5
  unreachable
}

C++ uses RTTI typeinfo data to identify the type of a thrown exception. The throw-expression passes a pointer to _ZTI1S to __cxa_throw. That’s the RTTI typeinfo structure for class S.

  %8 = landingpad { ptr, i32 }
          catch ptr @_ZTI1S
          catch ptr @_ZTIi
  %9 = extractvalue { ptr, i32 } %8, 0
  store ptr %9, ptr %2, align 8
  %10 = extractvalue { ptr, i32 } %8, 1
  store i32 %10, ptr %3, align 4
  br label %11

The try-statement in main indicates the RTTI typeinfo data for all of its catch-clauses. Rust doesn’t exactly conform to this convention. Does that create interoperability problems? I’m not sure. It is the case that C++ can’t catch panic objects. But this is easy to resolve: emit a C++ RTTI typeinfo struct for the Rust panic type and point __cxa_throw at that. This is a very minor change, if it is necessary at all.

We can unstick one of interop’s most irritating sticking points. C++ exceptions will propagate safely through Rust frames, properly destroying all in-scope objects. As for catching C++ exceptions, coverage could be added to Rust, but since catching is already part of C++, you may as well do it there: write your catch/throw handler on the C++ side. Interop will let you return Result or any other Rust type.
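
A sketch of that pattern, with a hypothetical throwing parse_config function and a hand-rolled result struct standing in for whatever Rust type the interop surface would actually return:

#include <cstdio>
#include <exception>
#include <stdexcept>
#include <string>

// Hypothetical C++ API that reports failure by throwing.
int parse_config(const std::string& text) {
  if (text.empty()) throw std::invalid_argument("empty config");
  return static_cast<int>(text.size());
}

// Stand-in for a Rust-style Result; with the interop described here it
// could be a real Rust enum instead.
struct ParseResult {
  bool ok;
  int value;
  std::string error;
};

// The catch handler lives on the C++ side, so no C++ exception ever reaches
// a Rust frame as anything but an ordinary return value.
ParseResult parse_config_checked(const std::string& text) noexcept {
  try {
    return {true, parse_config(text), {}};
  } catch (const std::exception& e) {
    return {false, 0, e.what()};
  } catch (...) {
    return {false, 0, "unknown C++ exception"};
  }
}

int main() {
  ParseResult r = parse_config_checked("");
  std::printf("ok=%d error=%s\n", r.ok, r.error.c_str());
}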