C2cpp Notes: Smart Pointers and Concurrency

16 minute read

Published: June 13, 2026

17. Smart Pointers

Suicidal Pointer: gracefully cleans itself up when no longer referred
A class with pointer inside and acts like a pointer (op*(), op->())
Self-delete when goes out of scope
How making two copies? Who’s deleting it?
Shared pointer: know each other; deleted only after all gone
Unique pointer: Only one can hold it; can only be moved

Shared Pointer

Implementation

template <typename T> class SharedPtr {
	T* ptr;
	int* count;	
public:
	explicit SharedPtr(T* p = nullptr): ptr{p}, count{new int{1}} {}
	~SharedPtr() {
		if(--*count == 0) {
			delete count; delete ptr;
		}
	}

	T& operator*() const { return *ptr; }
	T* operator->() const { return ptr; }
	
	operator void*() const { return ptr; }   // for if (sp)
	T* get_prt() const { return ptr; }       // get() in STL
	int get_count() const { return *count; }

Constructed with a pointer (E. new string{})
Manages heap-allocated data and count
Last one out of scope is responsible for deletion

	SharedPtr*(const SharedPtr<T>& sp): ptr(sp.ptr), count(sp.count) {
		++*count;
	}
	SharedPtr<T>& operator=(const SharedPtr<T>& sp) {
		if (this != &sp) {    // delete, and then construct
			if (--*count == 0) {delete count; delete sp;}
			ptr = sp.ptr;
			count = sp.count;
			++*count;
		}
		return *this;
	}
};

Copy constructor makes a shallow copy of ptr and count; increments *count
Copy assignment, need detach first

Example

SharedPtr<string> p1 = {new string{"hello"}};
auto foo = [=] (SharedPtr<string> p) {  // take p by value, CC!
	cout << p.get_count() << endl;
}
foo(p1);

This will invoke copy constructor, incrementing count After foo() returns, p goes out of scope, and count decrements back

Vector

Let SharedPtr<string> contain a string* inside

auto create_vec1() {
	SharedPtr<string> p1 { new string{"hello"} };
	SharedPtr<string> p2 { new string{"c"} };
	SharedPtr<string> p3 { new string{"students"} };

	vector v { p1, p2, p3 };
	return v;
}

vector v1 = create_vec1();

Shared pointers themselves p1… are stack-allocated
- string objects are heap allocated
When creating a vector, will make another copy of the string!
- Should be vector<SharedPtr<s
Return by value: v will get moved/elided
- Stack-allocated p1 goes out of scope, count goes back to 1

When create_vec1() returns, stack pointers destroyed, and v moved/elided Now modify it:

vector v2 = v1;   // copy pointers, but still same underlying string
*v2[1] = "c2cpp";
*v2[2] = "hackers"; 

// print v1 again
for (auto&& p : v1) { cout << *p + " "; } cout << '\n';

Copy construct v1 to v2
- count goes up to 2
Modify v2: Dereference SharedPtr as raw pointer and modify it!
Print original v1: also modified!

Loop Types

for (SharedPtr<string> p : v1) { cout << *p + " "; }
for (auto p : v1)              // do something
for (auto& p : v1)
for (const auto& p : v1)
for (auto&& p : v1)

auto& grab each by reference
const necessary if v1 is a const object!
auto&& declares a forwarding reference, can pass both L- and R-values
- Most general form in template code
- x can either be an L-value ref or R-value ref, depending on category of *v1.begin()
- Useful if iterator return is unknown
- E. vector<bool>

When could dereferencing an iterator give you an R-value?
vector<bool>, since each bit does not have a full address

Make Shared Pointer

MakeSharedPtr<T>() does the new with any number of arguments to construct T

Variadic function!
Forwarding reference! Perfect forward to new T{} expression

template <typename T, typename... Args>
SharedPtr<T> MakeSharedPtr(Args&&... args) {
	return SharedPtr<T>{ new T{forward<Args>(args)...} };
}

auto p1 { MakeSharedPtr<string>("hello") };    // My implementation
auto p2 { make_shared<string>("c") };          // STL

static_assert(is_same_v< decltype(p2), shared_ptr<string> >);

Same type as the STL shared_ptr

`std::shared_ptr`

Usage

string*            p0 { new string(50, 'A') };
shared_ptr<string> p1 { new string(50, 'B') };
shared_ptr<string> p2 { make_shared<string>(50, 'C') };

Memory Footprint

{
	string* p0 { new string(50, 'A') };           // leak string and char[]
	shared_ptr<string> p1 { new string(50, 'B') };
	shared_ptr<string> p2 { new MakeSharedPtr<string>(50, 'C') };
}

MakeSharedPtr has one fewer allocation from Valgrind

Library notices that new string is on the heap, so it combines the control block count and string in a single allocation (placement new)
Our implementation uses new string, no such freedom

Custom Deletion

{
SharedPtr<string> p0 { new string[2] {"a", "b"} };
}

Deletion crashes for string[2] ! Need delete[]

Need to write custom lambda functor for delete

shared_ptr<string> p0 {new string[2] {"a", "b"}, 
	[](string* s) {delete[] s;}
}

Can also do for FILEs

shared_ptr<FILE> sp { fp,
	[](FILE* fp) {fclose(fp);}
}

Or, directly put shared_ptr<string[]> as the type for STL

Aliasing

p2 only points to the int portion of p1

shared_ptr<pair<string,int>> p1 { new pair{"hi"s, 5} };
shared_ptr<int>              p2 { p1, &p1->second };

Since p2 is constructed on p1, the point to the same control block and share the ref count!

Managed pointer same as p1
- Managed pointer deleted at the end!
Stored pointer points to the int portion of p1
- Can be something random, doesn’t even have to be part of p1

Weak Pointer

Useful for back/circular reference without messing up the cleanup

std::weak_ptr points to an existing shared object, but does not participate in reference count

Same stored pointer and control block as shared pointer
Another block of weak count
- When shared count goes zero, data will be destroyed, but control block remains
- When both shared count and weak count go zero, destroyed.

Define helper lambda

auto weak_test = [](weak_ptr<string> wp) {
	shared_ptr<string> sp = wp.lock();
	if (sp) {
		assert(sp.use_count() == 2);
		cout << *sp << '\n';
	} else {
		cout << "weak_ptr expired\n";
	}
};

Copy weak_ptr by value
Call wp.lock() to get a new shared pointer to underlying object
- You can’t directly dereference a weak pointer
- Such shared pointer will participate in use count
- If wp expired (shared reference count = 0), get nullptr

weak_ptr<string> wp;
{
	auto sp = make_shared<string>("hello");
	assert(sp.use_count() == 1);

	wp = sp;                     // weak_ptr does not increase
	assert(sp.use_count() == 1); // the shared_ptr's ref count

	weak_test(wp); // sp is alive, so is wp
}
weak_test(wp); // sp is gone, so wp is expired

You can attach weak pointer to shared pointer, but when shared pointer goes away, weak pointer expires

When all shared pointers go

Unique Pointer

Unique ownership of a managed object. Smart deletion when out of scope
Deletes object when destroyed
No reference counting
Need ensure only one unique_ptr
delete copy operations, but can be moved
Just holds a pointer, size 8 bytes.

Usage: create an array with 1000 string objects, wrapped within one unique_ptr

unique_ptr<string[]> create_array(size_t n, string s) {
	auto up = make_unique<string[]>(n);
	for (size_t i = 0; i < n; ++i) {
		up[i] = s;
	}
	return up;
}

size_t n = 1000;
unique_ptr<string[]> up1 = create_array(n, "hello");

for (size_t i = 0; i < n; ++i) { assert(up1[i] == "hello"s); }

Copy and move operations

unique_ptr<string[]> up2;
assert(!up2);

// up2 = up1;            // can't copy
up2 = std::move(up1);    // can move

// It can also be moved to a shared_ptr
shared_ptr<string[]> sp { std::move(up2) };

After moved away, original one becomes a nullptr

Can also convert to a shared pointer, with a custom deleter

template <template T, template Deleter>
shared_ptr(std::unique_ptr<T, Deleter>&& r);

18. Concurrency

Measure with std::chrono::high_resolution_clock

#include <chrono>
auto time0 = high_resolution_clock::now();
auto time1 = high_resolution_clock::now();
auto dt = duration_cast<microseconds>(time1 - time0);
cout << dt.count();

Sequential

struct Args {
    uint64_t* sum_ptr;    // pointer to shared variable
    uint64_t count;
};

void* f1(void* args) {
    Args* p = (Args*)args;
    uint64_t* sum_ptr = p->sum_ptr;
    uint64_t count = p->count;

    while (count--) {
        ++*sum_ptr;
    }
    return NULL;
}

f1() grabs sum_ptr and increments it count times

sum1() calls f1() twice, each incrementing it 1M times

uint64_t sum1() {
    constexpr uint64_t count = 1'000'000;
    uint64_t sum = 0;

    Args args1 { &sum, count };
    f1(&args1);

    Args args2 { &sum, count };
    f1(&args2);

    return sum;
}

Time: 3 ms
Result: 2M

C `pthread`

Shared address space, running simultaneously (3 threads)

uint64_t sum2() {
    constexpr uint64_t count = 1'000'000;
    uint64_t sum = 0;
    pthread_t t1, t2;

    Args args1 { &sum, count };
    pthread_create(&t1, NULL, &f1, &args1);
    Args args2 { &sum, count };
    pthread_create(&t2, NULL, &f1, &args2);

    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return sum;
}

Time: 4 ms (slower!, bad cache locality)
Result < 2M
Load-increment-store may be interleaved. Not atomic.

Cache Locality

Declare sum as an array of two integers. Increment separately

Time: 4 ms, two uint64_t not on same cache line
- False sharing for 2 CPUs!
- Need to put on different cache lines -> 1.8 ms
Result: 2M

Locking

Use pthread_mutex

pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;
void* f2(void* args) {
    Args* p = (Args*)args;
    uint64_t* sum_ptr = p->sum_ptr;
    uint64_t count = p->count;

    while (count--) {
        pthread_mutex_lock(&mtx);
        ++*sum_ptr;
        pthread_mutex_unlock(&mtx);
    }
    return NULL;
}

147 ms, much slower :(
Better performance: atomic variables

Cpp `thread`

-pthread in Makefile CXXFLAGS and LDFLAGS

struct wallet encapsulate a mutex

struct wallet {
    mutex m;
    uint64_t money = 0;
    operator uint64_t() const { return money; }

    wallet& operator++() {
        m.lock();
        ++money;
        m.unlock();
        return *this;
    }
};

void f(wallet& sum, uint64_t count) {
    while (count--) {
        ++sum;
    }
}

In main(), create a thread object:

tmeplate<class F, class... Args>
expicit thread(F&& f, Args&&... args);

Parameters:

Function to run
Arguments (default by value, need wrap with ref())

#include <thread>
constexpr uint64_t count = 1'000'000;
wallet sum;

thread t1 { f, ref(sum), count };
thread t2 { f, ref(sum), count };
t1.join();
t2.join();

Scoped Lock and Threads

A mutex wrapper

Constructor: call m.lock()
Destructor: m.unlock(), as long as scoped
Robust to throwing!

#include <mutex>
    wallet& operator++() {
	    {
	        scoped_lock lck{m};
	        ++money;
	    }
	    return *this;
    }

Can also construct with multiple mutex objects

Internal deadlock prevention algorithm

jthread call join() at end of scope

{
	jthread t1 { f, ref(sum), count };
	jthread t2 { f, ref(sum), count };
}

Main thread needs to call the destructor

Condition Variables

For producer-consumer, locking/unlocking becomes hairy. Condition variables manage them in a controlled way

Example

Main thread: while loop that reads a line

unique_lock used in conjunction with condition variable
condition_variable.notify_one() unblock one waiting thread
- Notify while holding the lock
When input hits EOF, poison vec with empty string Both threads have access to vec, mtx, and cv

vector<string> vec;
string str, line;

mutex mtx;
condition_variable cv;

while (getline(cin, line)) {
	istringstream iss(line);

	unique_lock lck(mtx);  // lck's contructor will lock mtx
	while (iss >> str) { vec.push_back(str); }
	cv.notify_one();  // unblock one waiting thread
}   // unique_lock unlocks when out of scope

{
	unique_lock lck(mtx);
	vec.push_back(""s);  // poison value to tell the thread to exit
	cv.notify_one();
}

Child thread: pass lambda function, grabbing everything by reference

Run forever: acquire same lock
.wait() atomically yields lck, put itself to sleep (blocking)
- Block until other thread does cv.notify()
- Need to wrap mtx with unique_lock
- When you wake up from notify, wait for parent to release. wait() returns after reacquired lock
- Recheck while, in case other consumer takes it
Print out each word and clear vector
- Poison value: empty string

thread t {
	[&]() {
		while (1) {
			unique_lock lck(mtx);  // lck's contructor will lock mtx
			while (vec.empty()) {
				cv.wait(lck);
			}

			for (const auto& x : vec) {
				if (x == ""s)    // exit thread on poison value
					return;
				cout << x << ' ';
			}
			vec.clear();
		}    // lck's destructor will unlock mtx
	}
};

19. Atomics

This section is under construction, on Jae’s end

Cpp Memory Models

Gets slower and stricter, but safer (https://en.cppreference.com/cpp/atomic/memory_order)

enum memory_order {
    memory_order_relaxed,
    memory_order_consume,
    memory_order_acquire,
    memory_order_release,
    memory_order_acq_rel,
    memory_order_seq_cst
};

Atomic Variable

Similar to waller code before

In cpp, you can do an atomic of anything, even a struct

May use lock. Can check with static_assert<atomic<T>::is_always_lock_free
- true for built-in types, but typically not for complex structs
- Architecture dependent (E. memory fence) atomic.cpp: ```cpp #include struct wallet { std::atomic money{0}; operator uint64_t() const { return money; // return money.load(std::memory_order_relaxed); }
wallet& operator++() { ++money; // money.fetch_add(1, std::memory_order_relaxed); return *this; } }; ```
If you use atomic variable directly, cpp will default to most strict memory model seq_cst
- Can use more relaxing memory model
- Not too different in money example
Much faster than mutex, but still much slower than native

Memory Ordering

producer increments x, and then y in a loop

y will never be bigger than x
consumer checks that

std::atomic<uint64_t> x{0};
std::atomic<uint64_t> y{0};

void producer() {
    for (size_t i = 0; i < 1000'000'000; ++i) {
        x.fetch_add(1, std::memory_order_relaxed);
        y.fetch_add(1, std::memory_order_relaxed);
    }
}

void consumer() {
    for (size_t i = 0; i < 1000'000'000; ++i) {
        uint64_t yy = y.load(std::memory_order_relaxed);
        uint64_t xx = x.load(std::memory_order_relaxed);
        assert(xx >= yy);
    }
}

Works on x86. Less aggressive ordering
On ARM, fails with relaxed :(

If use y.fetch_add(1, release); and yy = y.load(acquire);, telling compiler to make sure when y.load() is called, all changes before are visible.

Share on

Bluesky Facebook LinkedIn X (formerly Twitter)

Ming Gong

C2cpp Notes: Smart Pointers and Concurrency

17. Smart Pointers

Shared Pointer

Implementation

Vector

Loop Types

Make Shared Pointer

`std::shared_ptr`

Custom Deletion

Aliasing

Weak Pointer

Unique Pointer

18. Concurrency

Sequential

C `pthread`

Cache Locality

Locking

Cpp `thread`

Scoped Lock and Threads

Condition Variables

19. Atomics

Cpp Memory Models

Atomic Variable

Memory Ordering

Share on

You May Also Enjoy

Simon’s Rock: Now & Then

Understanding Hazards

42030青春版–一套刚刚好的乐高

Embedded Project: CircuitSim

Ming Gong

17. Smart Pointers

Shared Pointer

Implementation

Vector

Loop Types

Make Shared Pointer

std::shared_ptr

Custom Deletion

Aliasing

Weak Pointer

Unique Pointer

18. Concurrency

Sequential

C pthread

Cache Locality

Locking

Cpp thread

Scoped Lock and Threads

Condition Variables

19. Atomics

Cpp Memory Models

Atomic Variable

Memory Ordering

Share on

You May Also Enjoy

Simon’s Rock: Now & Then

Understanding Hazards

42030青春版–一套刚刚好的乐高

Embedded Project: CircuitSim

`std::shared_ptr`

C `pthread`

Cpp `thread`