In a recent post, rachelbythebay showed this example of a very surprising C++ behavior (graciously leaving the explanation as an exercise for the reader):

#include <stdio.h>
 
#include <map>
#include <string>
 
int main() {
  std::map<std::string, std::string> x;
  int i = 0x5001;
  x["foo"] = i;
 
  printf("%s", x["foo"].c_str());
 
  return 0;
}

So, what is happening here? It compiles without any warnings and runs without segfaulting; what’s more, it produces no visible output!

What happened to type safety?! How on earth is this possible?! Is it a compiler bug? No, it turns out that it’s perfectly valid C++ (C++98, to be pedantic!).

What is x["foo"], exactly?

My first reflex was to look at the assembly:

$ clang++ -Wall -Wextra -g strmap.cc
$ otool -tv ./a.out | c++filt

and, oops: wading through C++’s heavily templated symbol names turned out to be an exercise in futility (unlike with plain C).

OK, let’s fire up a debugger, which is far more human-friendly, set a breakpoint right after the mysterious assignment statement, and see what we’ve got:

$ lldb ./a.out
Current executable set to './a.out' (x86_64).
(lldb) b 11
Breakpoint 1: where = a.out`main + 354 at strmap.cc:11, address = 0x0000000100000bc2
(lldb) r
Process 25081 launched: './a.out' (x86_64)
Process 25081 stopped
* thread #1: tid = 0x45b804, 0x0000000100000bc2 a.out`main + 354 at strmap.cc:11, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
    frame #0: 0x0000000100000bc2 a.out`main + 354 at strmap.cc:11
   8      int i = 0x5001;
   9      x["foo"] = i;
   10    
-> 11     printf("%s", x["foo"].c_str());
   12    
   13     return 0;
   14   }
(lldb) p x
(std::__1::map<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::less<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, std::__1::allocator<std::__1::pair<const std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > >) $0 = size=1 {
  [0] = (first = "foo", second = "\x01")
}
(lldb) 

Whoa, the string "\x01" was inserted silently! Is this some kind of integer-to-pointer conversion that creates a C string from the integer? If it is, why is there no segfault? 0x5001 does not look like an address that Unix-like OSes usually hand out. If it is a pointer conversion, let’s try to provoke a segfault by changing 0x5001 to something else, like 0x6001. Still no segfault, and the string is still "\x01".

So it’s not a pointer: it’s the integer value itself that somehow turns into "\x01". Now it’s clear that the int is silently converted to char, leaving us with its least significant byte.

Let’s make the source code a little less obscure:

x["foo"] = 'a';

String constructors?

Now we have a new mystery: how can you put a char into map<string, string>?

The usual suspects in such investigations are constructors, which can kick in silently and turn anything into anything. Is there a std::string constructor that takes a char (business as usual in C++ land, so why not)?

Let’s read the docs on the basic_string constructors. There is no sign of one that takes a bare CharT value as its only argument.

Could it be an initializer_list built from a single char? No: C++98 has no initializer lists, yet clang++ -std=c++98 strmap.cc happily produces the same result.

Another twenty minutes are spent fruitlessly trying to guess what other constructor could be hiding here, until an experiment proves me wrong:

x["foo"] = std::string('a');

spews two screens of compilation errors that can be summarized as “no such constructor”.

Who else was there?

Now the investigation turns to every witness of the crime, one of which is std::map. How does std::map::operator[] work? Reading the docs clears this up a bit:

1) Inserts value_type(key, T()) if the key does not exist. This function is equivalent to return insert(std::make_pair(key, T())).first->second;

  • key_type must meet the requirements of CopyConstructible.
  • mapped_type must meet the requirements of CopyConstructible and DefaultConstructible.

If an insertion is performed, the mapped value is value-initialized (default-constructed for class types, zero-initialized otherwise) and a reference to it is returned.

So that’s how the string "\x01" comes into being: operator[] default-constructs an empty std::string for the missing key "foo", and the content is assigned afterwards through std::string::operator=.

std::string::operator= in its fourth (sic) form is the last piece of the puzzle:

4) Replaces the contents with character ch as if by *this = basic_string(1, ch)

That’s what really happens:

int i = 0x5001;
x["foo"] = std::string();  // in-place creation
x["foo"] = (char)i;        // the string content is replaced with (char)i

Conclusion

We have a nice Rube Goldberg machine here:

  • int is silently converted into char;
  • std::map creates an empty string for a key that does not exist;
  • std::string::operator= is heavily overloaded, so one of its forms takes a char.

The lesson here is that C++ compilers could waste less effort guessing the programmer’s intent and would benefit from Rust-like explicitness and strictness.