C2cpp Notes: Smart Pointers and Concurrency
Published:
17. Smart Pointers
Suicidal Pointer: gracefully cleans itself up when no longer referred
A class with pointer inside and acts like a pointer (op*(),op->())
- Self-delete when goes out of scope
- How making two copies? Who’s deleting it?
- Shared pointer: know each other; deleted only after all gone
- Unique pointer: Only one can hold it; can only be moved
Shared Pointer
Implementation
template <typename T> class SharedPtr {
T* ptr;
int* count;
public:
explicit SharedPtr(T* p = nullptr): ptr{p}, count{new int{1}} {}
~SharedPtr() {
if(--*count == 0) {
delete count; delete ptr;
}
}
T& operator*() const { return *ptr; }
T* operator->() const { return ptr; }
operator void*() const { return ptr; } // for if (sp)
T* get_prt() const { return ptr; } // get() in STL
int get_count() const { return *count; }
- Constructed with a pointer (E.
new string{}) - Manages heap-allocated data and
count - Last one out of scope is responsible for deletion
SharedPtr*(const SharedPtr<T>& sp): ptr(sp.ptr), count(sp.count) {
++*count;
}
SharedPtr<T>& operator=(const SharedPtr<T>& sp) {
if (this != &sp) { // delete, and then construct
if (--*count == 0) {delete count; delete sp;}
ptr = sp.ptr;
count = sp.count;
++*count;
}
return *this;
}
};
- Copy constructor makes a shallow copy of
ptrandcount; increments*count - Copy assignment, need detach first
Example
SharedPtr<string> p1 = {new string{"hello"}};
auto foo = [=] (SharedPtr<string> p) { // take p by value, CC!
cout << p.get_count() << endl;
}
foo(p1);
This will invoke copy constructor, incrementing count After foo() returns, p goes out of scope, and count decrements back
Vector
Let SharedPtr<string> contain a string* inside
auto create_vec1() {
SharedPtr<string> p1 { new string{"hello"} };
SharedPtr<string> p2 { new string{"c"} };
SharedPtr<string> p3 { new string{"students"} };
vector v { p1, p2, p3 };
return v;
}
vector v1 = create_vec1();
- Shared pointers themselves
p1… are stack-allocatedstringobjects are heap allocated
- When creating a vector, will make another copy of the string!
- Should be
vector<SharedPtr<s
- Should be
- Return by value:
vwill get moved/elided- Stack-allocated
p1goes out of scope,countgoes back to 1
- Stack-allocated
When create_vec1() returns, stack pointers destroyed, and v moved/elided Now modify it:
vector v2 = v1; // copy pointers, but still same underlying string
*v2[1] = "c2cpp";
*v2[2] = "hackers";
// print v1 again
for (auto&& p : v1) { cout << *p + " "; } cout << '\n';
- Copy construct
v1tov2countgoes up to 2
- Modify
v2: DereferenceSharedPtras raw pointer and modify it! - Print original
v1: also modified!
Loop Types
for (SharedPtr<string> p : v1) { cout << *p + " "; }
for (auto p : v1) // do something
for (auto& p : v1)
for (const auto& p : v1)
for (auto&& p : v1)
auto&grab each by referenceconstnecessary ifv1is aconstobject!auto&&declares a forwarding reference, can pass both L- and R-values- Most general form in template code
xcan either be an L-value ref or R-value ref, depending on category of*v1.begin()- Useful if iterator return is unknown
- E.
vector<bool>
When could dereferencing an iterator give you an R-value?
vector<bool>, since each bit does not have a full address
Make Shared Pointer
MakeSharedPtr<T>() does the new with any number of arguments to construct T
- Variadic function!
- Forwarding reference! Perfect forward to
new T{}expression
template <typename T, typename... Args>
SharedPtr<T> MakeSharedPtr(Args&&... args) {
return SharedPtr<T>{ new T{forward<Args>(args)...} };
}
auto p1 { MakeSharedPtr<string>("hello") }; // My implementation
auto p2 { make_shared<string>("c") }; // STL
static_assert(is_same_v< decltype(p2), shared_ptr<string> >);
Same type as the STL shared_ptr
std::shared_ptr
Usage
string* p0 { new string(50, 'A') };
shared_ptr<string> p1 { new string(50, 'B') };
shared_ptr<string> p2 { make_shared<string>(50, 'C') };
Memory Footprint
{
string* p0 { new string(50, 'A') }; // leak string and char[]
shared_ptr<string> p1 { new string(50, 'B') };
shared_ptr<string> p2 { new MakeSharedPtr<string>(50, 'C') };
}
MakeSharedPtr has one fewer allocation from Valgrind
- Library notices that
new stringis on the heap, so it combines the control blockcountandstringin a single allocation (placementnew) - Our implementation uses
new string, no such freedom
Custom Deletion
{
SharedPtr<string> p0 { new string[2] {"a", "b"} };
}
Deletion crashes for
string[2]! Needdelete[]
Need to write custom lambda functor for delete
shared_ptr<string> p0 {new string[2] {"a", "b"},
[](string* s) {delete[] s;}
}
Can also do for FILEs
shared_ptr<FILE> sp { fp,
[](FILE* fp) {fclose(fp);}
}
Or, directly put shared_ptr<string[]> as the type for STL
Aliasing
p2 only points to the int portion of p1
shared_ptr<pair<string,int>> p1 { new pair{"hi"s, 5} };
shared_ptr<int> p2 { p1, &p1->second };
Since p2 is constructed on p1, the point to the same control block and share the ref count!
- Managed pointer same as
p1- Managed pointer
deleted at the end!
- Managed pointer
- Stored pointer points to the
intportion ofp1- Can be something random, doesn’t even have to be part of
p1
- Can be something random, doesn’t even have to be part of
Weak Pointer
Useful for back/circular reference without messing up the cleanup
std::weak_ptr points to an existing shared object, but does not participate in reference count
- Same stored pointer and control block as shared pointer
- Another block of weak count
- When shared count goes zero, data will be destroyed, but control block remains
- When both shared count and weak count go zero, destroyed.
Define helper lambda
auto weak_test = [](weak_ptr<string> wp) {
shared_ptr<string> sp = wp.lock();
if (sp) {
assert(sp.use_count() == 2);
cout << *sp << '\n';
} else {
cout << "weak_ptr expired\n";
}
};
- Copy
weak_ptrby value - Call
wp.lock()to get a new shared pointer to underlying object- You can’t directly dereference a weak pointer
- Such shared pointer will participate in use count
- If
wpexpired (shared reference count = 0), getnullptr
weak_ptr<string> wp;
{
auto sp = make_shared<string>("hello");
assert(sp.use_count() == 1);
wp = sp; // weak_ptr does not increase
assert(sp.use_count() == 1); // the shared_ptr's ref count
weak_test(wp); // sp is alive, so is wp
}
weak_test(wp); // sp is gone, so wp is expired
You can attach weak pointer to shared pointer, but when shared pointer goes away, weak pointer expires
- When all shared pointers go
Unique Pointer
Unique ownership of a managed object. Smart deletion when out of scope
- Deletes object when destroyed
- No reference counting
- Need ensure only one
unique_ptrdeletecopy operations, but can be moved- Just holds a pointer, size 8 bytes.
Usage: create an array with 1000 string objects, wrapped within one unique_ptr
unique_ptr<string[]> create_array(size_t n, string s) {
auto up = make_unique<string[]>(n);
for (size_t i = 0; i < n; ++i) {
up[i] = s;
}
return up;
}
size_t n = 1000;
unique_ptr<string[]> up1 = create_array(n, "hello");
for (size_t i = 0; i < n; ++i) { assert(up1[i] == "hello"s); }
Copy and move operations
unique_ptr<string[]> up2;
assert(!up2);
// up2 = up1; // can't copy
up2 = std::move(up1); // can move
// It can also be moved to a shared_ptr
shared_ptr<string[]> sp { std::move(up2) };
- After moved away, original one becomes a
nullptr
Can also convert to a shared pointer, with a custom deleter
template <template T, template Deleter>
shared_ptr(std::unique_ptr<T, Deleter>&& r);
18. Concurrency
Measure with std::chrono::high_resolution_clock
#include <chrono>
auto time0 = high_resolution_clock::now();
auto time1 = high_resolution_clock::now();
auto dt = duration_cast<microseconds>(time1 - time0);
cout << dt.count();
Sequential
struct Args {
uint64_t* sum_ptr; // pointer to shared variable
uint64_t count;
};
void* f1(void* args) {
Args* p = (Args*)args;
uint64_t* sum_ptr = p->sum_ptr;
uint64_t count = p->count;
while (count--) {
++*sum_ptr;
}
return NULL;
}
f1() grabs sum_ptr and increments it count times
sum1() calls f1() twice, each incrementing it 1M times
uint64_t sum1() {
constexpr uint64_t count = 1'000'000;
uint64_t sum = 0;
Args args1 { &sum, count };
f1(&args1);
Args args2 { &sum, count };
f1(&args2);
return sum;
}
- Time: 3 ms
- Result: 2M
C pthread
Shared address space, running simultaneously (3 threads)
uint64_t sum2() {
constexpr uint64_t count = 1'000'000;
uint64_t sum = 0;
pthread_t t1, t2;
Args args1 { &sum, count };
pthread_create(&t1, NULL, &f1, &args1);
Args args2 { &sum, count };
pthread_create(&t2, NULL, &f1, &args2);
pthread_join(t1, NULL);
pthread_join(t2, NULL);
return sum;
}
- Time: 4 ms (slower!, bad cache locality)
Result < 2M
- Load-increment-store may be interleaved. Not atomic.
Cache Locality
Declare sum as an array of two integers. Increment separately
- Time: 4 ms, two
uint64_tnot on same cache line- False sharing for 2 CPUs!
- Need to put on different cache lines -> 1.8 ms
- Result: 2M
Locking
Use pthread_mutex
pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;
void* f2(void* args) {
Args* p = (Args*)args;
uint64_t* sum_ptr = p->sum_ptr;
uint64_t count = p->count;
while (count--) {
pthread_mutex_lock(&mtx);
++*sum_ptr;
pthread_mutex_unlock(&mtx);
}
return NULL;
}
- 147 ms, much slower :(
- Better performance: atomic variables
Cpp thread
-pthreadin MakefileCXXFLAGSandLDFLAGS
struct wallet encapsulate a mutex
struct wallet {
mutex m;
uint64_t money = 0;
operator uint64_t() const { return money; }
wallet& operator++() {
m.lock();
++money;
m.unlock();
return *this;
}
};
void f(wallet& sum, uint64_t count) {
while (count--) {
++sum;
}
}
In main(), create a thread object:
tmeplate<class F, class... Args>
expicit thread(F&& f, Args&&... args);
Parameters:
- Function to run
- Arguments (default by value, need wrap with
ref())
#include <thread>
constexpr uint64_t count = 1'000'000;
wallet sum;
thread t1 { f, ref(sum), count };
thread t2 { f, ref(sum), count };
t1.join();
t2.join();
Scoped Lock and Threads
A
mutexwrapper
- Constructor: call
m.lock() - Destructor:
m.unlock(), as long as scoped - Robust to
throwing!
#include <mutex>
wallet& operator++() {
{
scoped_lock lck{m};
++money;
}
return *this;
}
Can also construct with multiple mutex objects
- Internal deadlock prevention algorithm
jthread call join() at end of scope
{
jthread t1 { f, ref(sum), count };
jthread t2 { f, ref(sum), count };
}
- Main thread needs to call the destructor
Condition Variables
For producer-consumer, locking/unlocking becomes hairy. Condition variables manage them in a controlled way
Example
Main thread: while loop that reads a line
unique_lockused in conjunction with condition variablecondition_variable.notify_one()unblock one waiting thread- Notify while holding the lock
- When input hits EOF, poison
vecwith empty string Both threads have access tovec,mtx, andcv
vector<string> vec;
string str, line;
mutex mtx;
condition_variable cv;
while (getline(cin, line)) {
istringstream iss(line);
unique_lock lck(mtx); // lck's contructor will lock mtx
while (iss >> str) { vec.push_back(str); }
cv.notify_one(); // unblock one waiting thread
} // unique_lock unlocks when out of scope
{
unique_lock lck(mtx);
vec.push_back(""s); // poison value to tell the thread to exit
cv.notify_one();
}
Child thread: pass lambda function, grabbing everything by reference
- Run forever: acquire same lock
.wait()atomically yieldslck, put itself to sleep (blocking)- Block until other thread does
cv.notify() - Need to wrap
mtxwithunique_lock - When you wake up from
notify, wait for parent to release.wait()returns after reacquired lock - Recheck
while, in case other consumer takes it
- Block until other thread does
- Print out each word and clear vector
- Poison value: empty string
thread t {
[&]() {
while (1) {
unique_lock lck(mtx); // lck's contructor will lock mtx
while (vec.empty()) {
cv.wait(lck);
}
for (const auto& x : vec) {
if (x == ""s) // exit thread on poison value
return;
cout << x << ' ';
}
vec.clear();
} // lck's destructor will unlock mtx
}
};
19. Atomics
This section is under construction, on Jae’s end
Cpp Memory Models
Gets slower and stricter, but safer (https://en.cppreference.com/cpp/atomic/memory_order)
enum memory_order {
memory_order_relaxed,
memory_order_consume,
memory_order_acquire,
memory_order_release,
memory_order_acq_rel,
memory_order_seq_cst
};
Atomic Variable
Similar to waller code before
In cpp, you can do an atomic of anything, even a struct
- May use lock. Can check with
static_assert<atomic<T>::is_always_lock_freetruefor built-in types, but typically not for complexstructs- Architecture dependent (E. memory fence)
atomic.cpp: ```cpp #includestruct wallet { std::atomic money{0}; operator uint64_t() const { return money; // return money.load(std::memory_order_relaxed); }
wallet& operator++() { ++money; // money.fetch_add(1, std::memory_order_relaxed); return *this; } }; ```
- If you use atomic variable directly, cpp will default to most strict memory model
seq_cst- Can use more relaxing memory model
- Not too different in
moneyexample
- Much faster than
mutex, but still much slower than native
Memory Ordering
producer increments x, and then y in a loop
ywill never be bigger thanxconsumerchecks that
std::atomic<uint64_t> x{0};
std::atomic<uint64_t> y{0};
void producer() {
for (size_t i = 0; i < 1000'000'000; ++i) {
x.fetch_add(1, std::memory_order_relaxed);
y.fetch_add(1, std::memory_order_relaxed);
}
}
void consumer() {
for (size_t i = 0; i < 1000'000'000; ++i) {
uint64_t yy = y.load(std::memory_order_relaxed);
uint64_t xx = x.load(std::memory_order_relaxed);
assert(xx >= yy);
}
}
- Works on x86. Less aggressive ordering
- On ARM, fails with
relaxed:(
If use y.fetch_add(1, release); and yy = y.load(acquire);, telling compiler to make sure when y.load() is called, all changes before are visible.
