AP Notes.md

74 minute read

Published: July 25, 2025

This file may not be fully adapted for web. Download the raw file here

1. CLI Basics

X=hello
echo $X

SSH

User inputs command
Shell
Command executed

Folder organization

Top level (root) directory: /
Absolute path: path from the root directory (starting with /) UpArrow restores the previous command

Shell shortcuts

Ctrl + A: start of a line
Ctrl + E: end of a line

Loop

for i in 1 2 3 ; do date ; done         # runs date 3 times

Alias

declare alias ll-'ls -alF'

cat .bashrc     # to see all alias

diff

diff a b        # displays difference between two files

Ctrl + R reverse search of commands

Vim shortcuts

gg go to top
G go to bottom
dd delete whole line
Ctrl+F scroll fwd by a screen
Ctrl+B scroll back by a screen
{ scroll back a paragraph
} scroll fwd a paragraph
zz move current cursor to the center
v start highlighting stuff
V highlight by line
Ctrl+v highlight by column
= auto indents the highlighted portion

2. Compilation and Make

gcc hello.c

Compiles and links the file

gcc hello.c -o hello

Compiles, links the file. Named hello, same as a.out.

./hello

executes the hello program Can’t directly do hello like ls, since ls is in the usr/bin/ folder (system path)

Building a C program

How source file -> executable program

gcc myprogram.c && ./a.out      # runs first command, if works, runs the second

No classes in C. All functions are on the top of source file. No nesting of functions.

C program always starts from the SINGLE main function (entry point).

Proper way: separate compilation

For multiple source files, first compile them into multiple intermediate OBJECT files (.o)
Then use linker to link the multiple object files

In scenario above:

gcc -c hello.c      # -c means compilation only

Repeat for other .c files

Now link all the .o files. Give gcc the .o intermediate file

gcc hello.o         # just gcc on hello.o

Can use -o to rename the executable

(In vim, use :e to edit two files at once)

Now make myadd.c. Compile both

gcc hello.o myadd.o -o hello        # link both togther
# warning here: myadd is not defined in hello.c
# declared myadd "implicitly" by calling it

Compilation: translate C program into machine code (sequence of numbers) for the CPU (.o)
- Why is it so much bigger? There is some code we didn’t write printf(). Also some structural code

More options

gcc -c -Wall -g myprogram.c

-Wall enables ALL warnings
-g include some information in the object file for DEBUGGING

Declaration

int add(int x, int y);      // I will define this function later
// Aka function prototype/signature
// You can also put prototype within main()

Before calling the function, you should either define or declare the function.

Create myadd.h to just declare add.

#include "myadd.h"      
// When you encounter this line, replace this line withe the file content

Compiler just copy and paste

<> looks for at system directory
"" looks at cd Code of myadd is actually in myadd.c. myadd.h only has the prototype

Linker links the necessary .o files from standard library.

gcc hello.o myadd.o -lm         # -lm means to look for in the math lib

Prevent linking problem

Modify myadd.c somehow and recompile it to myadd.o

We don’t need to recompile hello.c. Just need to update hello by relinking.

Now change the input property of myadd.c to take 3 numbers

Compilation is OK, since myadd.h unchanged
Linking is also OK. C only matches up the function names, not the parameters.
- Linker is not restricted to C. .o files can be compiled from different languages.
When run, program displays random wrong shit.

Things fail silently!!!

objdump -d myadd.o

This displays content of files in hexadecimal and assembly code

Memory allocation issue

How to solve this issue In myadd.c, include its own header file myadd.c

#include "myadd.h"              // myadd in .h take only 2 params
int add(int x, int y, int z){   // It will complain here
    ...
}

Makefile

make is a program that compiles based on instruction (Makefile)

capital M!

myadd.o: myadd.c myadd.h        # myadd.o: target, others: ingredient
  gcc -c -Wall -g myadd.c     # tab here, can't be spaces
# Empty lines OK, sole spaces not allowed

make        # Runs the gcc command

Make will make myadd.o

If nothing changes, make will look at timestamp and not do anything.

touch myadd.c       # updates timestamp to "simulate" a change in file

cat tricks (-ten)

cat -ten Makefile   # t: changes tab, e: marks end of line, n: line num

In vim, 2yy means yank (copy) 2 lines, and then p to paste it.
. repeats previous action.

Makefile only makes the first target it finds. Needs to specify target if not the first:

make myprogram.o

myprogram: myadd.o myprogram.o
    gcc myadd.o myprogram.o -o myprogram

myadd.o: myadd.c myadd.h
    gcc -c -o myadd.o myadd.c

myprogram.o: ...

When make typed in, make builds the first target: myprogram
Then it looks for the prereq, comparing the time. However, .o files are missing. make will usually give up.
- However, it scans down if prereq are targets themselves, looking for myadd.o as targets.
- make suspends building myprogram, instead, starts building myadd.o first. (recursively)
- Similarly, make builds myprogram.o.
Once built, come back to mission of building myprogram and runs linking.

Running make again

Already up to date

Still, make checks if all sub-ingredients are up to date, since some subfiles may not.
- e.g. If touch myadd.c, make updates myadd.o and myprogram, not myprogram.o.
- e.g. If touch myadd.h, 3 files updated.

Variables

CC = gcc
CFLAGS = -g -Wall

$(CC)   # uses the variable

#If linking recipe not specified, 
#   $(CC) $(LDFLAGS) <dependent-.o-files> $(LDLIBS) -o <executable name>
# so we can just omit the recipe

You can omit the subsequent lines

Auto remove binary files after compilation

.PHONY: clean
clean:
    rm -f *.o myprogram

make clean

.PHONY: all
all: clean myprogram
# Let it first do 'clean', and then REBUILD 'myprogram'
# make clean
# make myprogram

Make is attempting to produce a file clean. File did not get produced, but command executed

aka phony target

Git

Version control system: take a snapshot of current progress, in case of messing up

For HW, we clone the lab assignments, not init.

git add *.c     # can be add or modification
git commit      # actually "does" the add. Allows multiple "units" of change
# git pops a "commit message"

Type on the first line for commit message

git log         # sees commit log
git status      # displays UNTRACKED files

File possibilities:

Untracked: never added to git
tracked, unmodified: files haven’t been modified since last commit
Tracked, modified, unstaged: modified but not git added. Changes in git diff
Tracked, modified, staged: modified, added, not yet committed, can’t be shown on git diff, but can be seen with git diff --staged
After commit, those files will not show up in git status

3. C Basics

Integer types

C doesn’t specify the exact length of the types, but pretty standardized

char 1 byte (8 bits), unsigned 0-225, or -128-127
short 2 bytes, unsigned 0~65536, or -32768-32767
int 4 bytes, unsigned $0\approx 4\times 10^9$
long 4/8 bytes
long long 8 bytes

If unsigned used as prefix, positives only

C99 introduces (u)intN_t to specify the space.

E. int8_t, uint8_t (same as char!, it’s 8 bit, not byte!)

Binary numbers

Signed implementation:

1’s complement
2’s complement (currently used)

MSB is given a negative weight

Properties:

Negative numbers have MSB 1
1111…1 = -1 (largest negative number)
1000…0 = smallest negative number

Conversion:

Filp all bits
Add 1

Hex notation: 0x12 = 12

int x = 0xffffffff;             // x = -1
int y = 0x7fffffff;             // y = 2^31 - 1
int z = 0x80000000;             // z = -2^31
unsigned int a = 0xffffffff;    // a = 2^32 - 1
unsigned int b = -1;            // C will force assignment. b = 2^32 - 1

Integer literals

Declaration

int i = 5;      // Declared, initialized
int j;          // not yet init

i = 100;        // Assignment
i = 'A';        // 'A' (single quote) = 65
j = '\11';      // Ocatal numbers 011=9
j = '\0';       // '\0' = 0, not '0' = 48

Expressions vs statements

Statement

Typically end with semicolon/curly braces

Expression

A piece of a statement that has a type and value

42
x (depends on assignment)
1 + 6 (type int)
3.14f (specifies float type)
a == b (type int, not bool)
printf("hi") (This is a function call, with some evaluation)

Extension: x = 1; is both a statement and an expression.

Bitwise operators

int x = 1;      // x = 0000 .... 0001
int y = 2;      // y = 0000 .... 0010
int z = -1;

int a = x & y;  // a = 0000 .... 0000   AND operation on each bit
int b = x | y;  // b = 0000 .... 0011   OR
int c = x ^ y;  // c = 0000 .... 0000   XOR
int d = ~x;     // d = 1111 .... 1110   NOT
int e = x << 1  // e = 0000 .... 0010   left shift by 1, filing 0s.
int f = z >> 1; // Up to complier to feed a 1 or a 0.
// Typically use bitshift with signed numbers

HW hint

int f(int x)
{
    return  x & (1 << 5);       // picks the value of the 6th LS bit
}

Loops to pick each bit

Logical operators (short circuit)

Similar, but

if (x && f(x))
{
    // code
}

If x=0, f(x) will not be evaluated. This is called short circuit evaluation.

Maybe f(x=0) will crash.

Same for ||

Assignment and `x++`

Typically for C, an expression is not a statement.

3+5; is not allowed (with a ;)
x << 1; does not change x! This is only an expression.
f(x); is both an expression and a statement
x = 1; This statement and expression has type int and value 1.
int x; is a statement (declaration), but not an expression, since no value.
int x = 1; is also not an expression. You can’t use it the same way as (x = 1)
y = x = 2; same as y = (x = 2);, a compound expression, evaluated right to left.
x = x + 1; + has higher precedence than =
x += 1; Exactly the same as above
++x Exactly the same. Expression and statement.

int x = 1;
int y;

y = ++x;        // y = 2. ++ first, and then use x
y = x++;        // y = 2. Use x first, and then ++

x = 2;
y = (x++);      // y = 2! Same thing as above! The act of evaluating (x++) //reserves the output, and outputs 2, before the increment.

(y = x)++;      // ERROR: the value of assignments (y = x) can't be changed!

4. Storage Classes

You can only have one name (variable or function) within a scope

1. Local variables

foo.c:

int x = 100;            // Global static variable
void f(int p) {
    printf("%d", p);    // p is also local
    int x;              // can't access outer x. Local variable name HIDES it.
    x = 0;
    {
        int x;          // A different int x comes alive, separate from previous
        x = 1;
        printf("%d", x) // 1
    }                   // That int x gone
    printf("%d", x);    // 0
}

f(x);

Variables declared inside a function/block are local variables, or automatic variables. They come and go automatically (deleted after block ends). Or called stack variables. Why?

Next time you call the function, local variables are reinitialized.

2. Static variables

Initialized at program startup. Value will stay there. May be changed. Persistent through the program. foo.c

int x = 100;
void f(void);
int main() {
    f(); 
    printf("%d", x);
}

bar.c

void f(void) {
    x = x + 50;
    printf("%d", x);
}

If compile bar.c alone, won’t compile!, since x is undefined.
If add a line int x;, both files will compile, but can’t link, since duplicate variable names! Need to add extern. Tells the compiler to “assume” that you will find (resolve) x at link time.
```
extern int x;
void g(void) {
  x = x + 50;
}
```
extern and initialize: same as init alone
extern, but not initialize: required for accessing global var elsewhere
When you don’t initialize, global vars auto init to 0.

2a. Global (static) variable

Defined outside main(). Always accessible, unless “covered” by a local variable

2b. File static variable

In declaration in foo.c, use static int = 100. static makes extern int x in other files not work. x is for foo.c file only!

Compile is fine, but link time, global variable not found

2c. Block static variable

main.c

//int x = 100;            // Prints 101 - 105
int f(void) {
    // int x = 100;     // If put x = 100 here, will print 101 five times
    static int x = 100;      //PRINTS 101 - 105 again
    x++;
    return x;
}

int main() {
    for (int i = 0; i < 5; i++)
    printf("%d", f());
}

x is block static. You can’t access x anywhere outside f, but x behaves like a global variable (as if it’s defined outside).

x only gets initialized at program startup, before f called. Every time x is called, static int x = 100 is skipped!!!

Process address space

When a program runs, it’s a process (running instance of a program)
Program code is put in memory (some space left initially)

Program thinks they have total memory of, let’s say, 512G, but actually much smaller, because OS gives fake addresses. OS maps each fake address to the real RAM (E. 4G)

Memory block for one process (long, sequential tape, top to bottom):

Stack (local variable)
… (empty)
Heap (malloc)
Static variable
Code
Reserved Initially, code and static variable areas are fixed. Stack and heap can grow.

Stack allocation (an overview)

Allocates local variables

int f(void) {
    int x; 
    int y;
    ...
}
int g(void) {
    int s;
    int t;
    ...
}

int main() {
    int a;
    int b;
    f();
    g();
}

Stack (top-down):

a = 100
b = 200
x (after f returns, x deleted)
…

CPU also has embedded special variables: registers

Stack pointer holds the address of stack’s bottom
- When a variable goes away, stack pointer moves up
- Stack grows and shrinks
- Recursion (if incorrectly written): stack overflow (enforced by OS)

5. Pointer

Pointer variables

foo() {
    int x = 1;
    int y = 2;

    int *p;     // type: int*
    // Can also use int* p;

    // p takes 8 bytes, holds the memory address
    // p can contain a memory address of type INT

    p = &x;         // p = address of x
    // &x: type int*
    // & is a unary operator, returning the ADDRESS of a variable

    y = *p;         // y = gets the value that p points to
    // *p: type int         PEEL OFF a *

    *p = 0;         // changes x to 0
}

In int *p, * declares a type, but in y = *p, * is an operator Why int *p? p is a variable s.t. that if you apply a * on it, you’ll get an int.

Postfix has higher priority than prefix, so (*p)++ differ from *p++.

Pointer types

Pointer variables have distinct types.

int i = 1234;           // 4 bytes
double d = 3.14;        // 8 bytes

int *pi = &i;           // All pointer var are 8 bytes
double *pd = &d;

pi = pd;                // ERROR!
pi = (int *) pd;        // casted pd (double*) as int*. 
// If dereferenced, It will take the first 4 bytes of d, meaningless

Void pointer

Special type: can point to any type

void *pv;
pv = pd;

double x = *pv;             // WRONG! 
double x = *(double*) pv;   // Cast, then deref

A void* can’t be dereferenced! C doesn’t know how many bytes to read or what to interpret it as.

NULL Pointer

Address 0 is special

int *pi = 0 
pi = NULL;          // NULL = 0

Pointer in if block will evaluate to 0 only if pointer is NULL.

int c = 0;
int *p = &c;
int *q = 0;

if (p)
    // runs
if (q)
    // won't run
if (*p)
    // won't run
if (*q)
    // COMPILES, but CRASHES

6. Arrays

int a[4] = {100, 101, 102, 103};    // declares and inits array
int a[] = {100, 101, 102, 103};     // also OK, compiler will deduce it
int a[10] = {100, 101, 102, 103};   // compiler will init others 0
int a[] =                           // NO init: C won't init anything

a[0] = 200;
a[4] = 400;     // ILLEGAL, but may run

`sizeof()`

Not a function. A keyword operator. Can take any expression and type name.

int i;
int a[10];
int x;

x = sizeof(i);      // returns byte-size of i: 4
x = sizeof(int);    // 4
x = sizeof(&a[0]);  // 8        &(a[0]), postfix wins

x = sizeof(a);      // 40!      4 * 10

`p++;`

Addition of pointer and int means the address of the next element the pointer points to. +sizeof E. int, +1 points to 4 elements later. double 8.

Can’t do for void*. Undefined

int a[4] = {100, 101, 102, 103};
int *p = &a[0];

for (int i = 0; i < 4; i++) {
    //printf("%d", *(p+i));
    //print("%d", a[i]);
    printf("%d", *p);
    p++;                    // p += 4 actually

    //same as 
    printf("%d", *p++);     // *(p++)
}

At the end, p points to one PAST the last element of a.

C guarantees it’s fine. You can’t deref it tho.

GUT

Grand Unified Theory of pointers and arrays

if p = &a[0],

a[i] <=> *(p+i)
a <=> &a[0]. Most cases, an array name will be converted to the pointer of the first element.
a[i] <=> *(p+i) <=> *(&a[0] + i) <=> *(a + i)
p[i] <=> (&a[0])[i] <=> a[i].

We can interchange pointers and array!

GUT: u[i] <=> *(u+i). Doesn’t matter if u is pointer or array. Compiler will change u[i] to *(u+i)

Prop: a[5] = *(a+5) = *(5+a) = 5[a] 5[a] compiles and runs normally!

Exceptions

You can do p++, but not a++.
sizeof(a) differ with sizeof(p).

7. Heap allocation

int main() {
    int *p;
    p = f();        // f returns a pointer to array
    printf("%d", p[2]);
}

int *f() {
    int a[4] = {100, 101, 102, 103};
    return a;       // returns a pointer to a[0]
}

Problematic! After f finishes, automatic a deleted (stack goes up).

p becomes a dangling pointer.

How to use array out of the function then: heap allocation: malloc()

argument: how many bytes using.
returns a void * pointer to the first byte of the space you want.
- void * because malloc() doesn’t know the type

int *f() {
    int a[4] = {100, 101, 102, 103};
    int *p = malloc(sizeof(int) * 4);   

    return p;
}

Freeing memory

Heap is not automatically managed. You have to free it manually.

free(p);

Memory leak

Once you disconnect a pointer to heap, the memory is lost valgrind needs to say nothing is wrong!

8. String

String represented as char arrays, ending with 0.

// "abc"
char a[4] = {'a', 'b', 'c', 'd', '\0'};
char a[4] = {97, 98, 99, 100, 0};

char *p = a;          // p = &a[0]
printf("%s", p);      // printf expects p to be type char*, NOT char!!!
// printf follows the pointer, prints the character 'a'
// printf keeps going until it hits 0.
printf("%s", a+1);    // a -> &a[0], a+1 -> &a[1]
// bcd

String Literals

char *p = "abc";
printf("%s", "abc");    
// "abc" is an expression, type: array that turns itself into char *

The type of "abc" is char[4], anonymous array. When "abc" is assigned to *p, it turns itself to a char * pointer to the first value.

Two cases where array not turned into pointer:

sizeof("abc"): 4
- sizeof("abc" + 1) (converted to pointer): 8
Array initializer: char a[4] = "abc"

Where is "abc" stored at?

p is on the stack
"abc" is in a region between static var and code, called read-only data (kind of part of code)
- "abc" is immutable, just like you can’t do 5 = 4

String library functions

strlen() returns string length not including \0

#include <string.h>
int strlen(char *s) {
    int i = 0;
    while (*s++)
        i++;
    return i;
}

strcpy() copies

#include <string.h>
void strcpy(char *t, char *s) {
    while((*t = *s) != 0){
        s++;
        t++;
    }
}

void strcpy(char *t, char*s) {
    while((*t++ = *s++) != 0)
        ;
}

Wrong usage:

char *a = "abc";
char *b;
strcpy(b, a);

b is not initialized, so "abc" is copied to somewhere random, and the program may crash.

Have to declare char b[4];

`argv` array

How to access ./a.out hi world?

int main(int argc, char **argv)

argc is the number of arguments including program name.
**argv is an array of pointers (char *), ending with a 0

Example: print “o” in “world”

printf("%c", argv[2][1]);
printf("%c", *(*(argv + 2) + 1));

int main(int argc, char** argv) {
    argv++;
    while(*argv)
        printf("%s\n", *argv++);    // *(argv++)
}

9. Function Pointers

`const`

Variables that should not be changed. C won’t compile if changed

`const` pointers

int * const p = &x;       // p is a pointer STUCK to an address
p = &y;                   // ERROR

const int *q = &x;        // q is not constant, but *q is
*q = 100;                 // ERROR

// To force changing *q:
*(int *)q = 100;           // cast to regular int *, and then change 

Example:

strcpy(char *t, const char *s);

*s should not be modified.

`qsort`

Sorts any array

typedef unsigned int size_t;    // optional

void qsort( void *base,         // ANY TYPE pointers accepted
            size_t nmemb,       // size_t is unsigned_int/long
            size_t size,        // size of EACH element
            int (*compar) (const void *, const void *)); 

int (*compar) (const void *, const void *) is a pointer to function

var name: compar
var type: int (*) (const void *, const void *)
Function input: two const void *s
Function output: int

compar is in the code section of the memory

compar’s signature needs to exactly match the parameters in qsort.

int compare_int(const void *a, const void *b) {
  int x = *(int *) a;
  int y = *(int *) b;

  if (x < y)      return -1;
  else if (x > y) return 1;
  else            return 0;
  //return x - y;
}

Usage:

int a[5] = {100, 37, -2, 200, 0};
qsort(a, sizeof(a)/sizeof(a[0]), sizeof(int), &compare_int);

&compare_int points to the address. & can be omitted. compare_int() will run the function.

qsort dereferences and calls (*compar)(p, q)

Complex Declarations

C declarations follows usage. Postfix higher precedence.

int a[3];       // array of 3 ints
char *b[3];     // array of 3 (char*)s
int (*f1) (const void *, const void *);
int *f2 (const void *, const void *);       // w/o ()
int (*f3[5]) (const void *, const void *);

Spiral rule:

E. int *p;: p is a variable such that if you * it, it will be an int.
b is an array of 3 pointers, where you * it, it becomes char
f1
- (*): f1 is a pointer
- (...): *f1 is a function.
- int: *f1 is a function that returns int.
f2
- (...): f2 is a function.
- *: f2 returns a pointer to something
- int: Once you deref it, becomes int.
f3
- [5]: f3 is an array of 5 elements.
- *: f3 is an array[5] of pointers
- (...): f3 is an array of pointers to functions
- int: f3 is an array of pointers to functions that return int

(*) is a pointer to something that you call

10. Structures

Similar to Java’s class, but you can’t put functions inside Typically defined outside main(). Usually in the header file.

`struct` layout

Don’t forget the ;!!!

struct Point {
    int x;
    int y;
};

int main() {
    struct Point pt;
    pt.x = 2;       // dot is the member selection operator

    // Alternative initializations:
    struct Point pt = {2, 3};
    struct Point pt = {.x = 2, .y = 3};
}

struct is a new user-defined type.

`struct` size

When a struct Point is declared, on stack, 4 bytes are allocated for x, and 4 for y.

sizeof(pt) will be 4 + 4 = 8.

struct Foo {
    int x;      // 4 bytes
    char *s;    // 8 bytes
};

int i = sizeof(struct Foo);     // i = 16!

First 4 bytes are for int x, but compiler leaves out a 4-byte padding
Then 8 bytes for char *x

C allows compiler to layout the struct in memory. CLAC requires all 8-byte quantities must start at an address whose value is a multiple of 8. 4-bits must be at multiples of 4.

If reverse order by putting char *s first, compiler will still pad, in case of array Foo[].

Pointer to `struct`

Continue from struct point declaration

printf("%p", pt); prints the pointer address.

struct Point *p1;       // pointer to struct
p1 = &pt;

int *p2 = &pt.x;        // same address, but DIFF TYPE

printf("%p", p1 + 1);   // + 8 bytes!
printf("%p", p2 + 1);   // + 4 bytes

// Accessing pt.x
(*p1).x = 5;            // dereference, the select
p1->x = 5;            // same thing

-> means dereference the pointer, then select

struct Point pt[1];     // array of ONE element, same thing 
pt->x = 5;

struct Point a[10];     // a: xyxyxyxy...xy

// Declare on the heap
struct Point *p = malloc(10 * sizeof(struct Point));

By declaring array, pt can be used as a pointer, and we can use ->.

Memory optimization

If a struct variable is passed to a function, you are copying the entire struct.

An entire array cannot be copied. Array will turn itself into a pointer.

int get_num_element(int a[]) {   // As soon as you pass a, a becomes int *
    return sizeof(a) / sizeof(int);     // 8 / 2 = 4
}

int main {
    int a[10];      // here sizeof(a) / sizeof(int) = 10
    int n = get_num_elements(a);
}

Therefore, we usually pass pointers to struct to functions.

You can’t have a struct that contains itself, but you can contain a pointer to it.

Linked list

// Linked list node
struct Node {
    struct Node *next;  // Pointer to the next node
    int val;            // value
};

struct Node *create2Nodes(int x, int y) {
    struct Node *n2 = malloc(sizeof(struct Node));
    if (!n2)    return NULL;    // check if malloc fails
    n2->val = y;
    n2->next = NULL;
    // A complete linked list with a last node

    struct Node *n1 = malloc(sizeof(struct Node));
    if (!n1)    {free(n2); return NULL;}    // have to free n2!!!
    n1->val = x;
    n1->next = n2;
    return n1;
}

int main() {
    struct Node *head;
    head = create2Nodes(100, 200);
    if (head == NULL)   exit(1);

    printf("%d -> %d", head->val, head->next->val);

    free(head->next);
    free(head);     // free tail, and then head
}

A pointer of a node is our notion of linked list.
A NULL pointer is a linked-list with 0 nodes

Stack		Heap
head	`n1` (-> `&n1.next`)
`x`	100
`y`	200
`n2`	->	next	NULL
		val	200
`n1`	->	next	`n2` (-> `&n2.next`)
		val	100

After return, x, y, n2, n1 all gone, but the nodes are on heap. They remain.

free_all_nodes(head);

void free_all_nodes(struct Node *head) {    // recursion
    if (head == NULL)   return;
    free_all_nodes(head->next); 
    free(head);
}

int count_list(struct Node *head) {
    if (head == NULL)   return 0;
    return 1 + count_list(head->next);
}

`struct List`

Real linked lists are more complicated and powerful.

Previous element pointer
Arbitrary data type storage (replace int by void *)
Pointer to tail (easier to manipulate the back)
Count of elements

A struct List can have more metadata

11. Libraries

Macros

#s are preprocessed first and replaced

#include "myadd.h"
#define PI (3.14)         // substitution
#define SQR(x)  (x * x)
// function calls are expensive (jump instruction), so macros are used
// put () everywhere to be safe

`include` Guards

For code between different OS, we may need to do differently.

mylist.h

#ifndef __MYLIST_H__    // if __MYLIST_H__ is not defined,
#define __MYLIST_H__    // define __MYLIST_H__ to exist
//...
#endif                  // closes if

foo.c

#include "mylist.h"
#include "mylist.h"     // accidentally include it again
// Prevents double inclusion

Archive files

Package all .o files into a single archive .a file

ar rcs libnumbers.a power.o prime.o

r: add power.o and prime.o to the archive (or replace them if already there)
c: create archive file
s: create an index to the contents to speed up linking

# link with library objects
gcc myprogram.o prime.o power.o -o myprogram

# link with library archive
gcc myprogram.o libnumbers.a -o myprogram

Linker knows to use object files within libnumbers.a directly.

Using libraries

Headers and archives in non-standard locations

`-I`

Look for header files

gcc -I/some/path -c foo.c

If foo.c #include <some-header.h>, it should also look for this header in /some/path. Optional space or \ after -I

`-L` and `-l`

Looks for archive files, similar to -I

-L tells where to look. Must appear before object files
-l tells what archive file too look (lib<lib-name>.a). Must appear after object files
```
gcc -Lsome/path foo.o bar.o -lbaz
```
Linker will look for libbaz.a in some/path

You can pass multiple -I, -L, -l flags to link multiple locations of header and library files

12. Standard I/O

scanf and printf work on standard input and standard output

Standard channels:

Standard input (connected to keyboard device)
Standard output (typically terminal screen)
Standard error (also output, connected to terminal; E. valgrind output)

Redirection

vim input.txt

./isort < input.txt  
./isort > output.txt
./isort >> output.txt

isort reads from standard input, but < makes shell change standard input from reading keyboard to reading input.txt.
Same for >, changing standard output. Will overwrite original output.txt
- >> appends, not overwrite
- 2> redirects standard error and overwrites; E. valgrind output

valgrind --leak-check=yes ./isort < input >> output 2> err

valgrind --leak-check=yes ./isort >> output.txt 2>&1

Redirect stderr (descriptor 2) to where stdout (descriptor 1) is going

If >> output.txt and 2>&1 switched, won’t work. stderr message will be printed out to stdout, which is the screen.’

Pipes

Pipe: taking one program’s output directly to the input of another program

echo 10 | ./isort

Bash runs echo. | runs two programs together. Bash redirects the standard output of echo and the standard input of .\isort.

echo 10 | ./isort | cat -n

If cat not given an argument (file name), it will behave like echo

program | program
program > file
program < file

Useful functions and examples

cat *.txt | tr ' ' '\n' | sort | uniq > lecture-words

cat *.txt concatenates all .txt files in pwd and outputs to stdout
tr ' ' '\n' reads from stdin, translates every space to new line, and writes in stdout
`sort
uniq: removes duplicate lines. Need to be sorted

w | tail -n +3 | grep "vim" | cut -d " " -f 1 | sed 's/$/@colubmia.edu/' | sort | uniq > vimusers

wc reads a file, counts number of lines, words, and characters
head -n gives the first n lines
tail
- tail -n +3: display from 3rd line onward
grep: matches REGEX
- E. grep "vim.*c": gets all vim users working on .c files
cut: cuts content afterwards
- - -d " " specifies the delimiter " "to use for separating fields.
- -f 1 tells cut to select the first field from the split text (i.e., everything before the first space in each line).
sed: find and replace
- s/: substitute
- $: end of a line in sed.
- @colubmia.edu text you’re appending at the end of each line.
- / delimiters separating the parts of the sed command.

Collect all users using vim and their emails, sort them, and remove duplicated (by uniq). Save the output to vimusers.

echo hello | mutt -s hi mg4264@columbia.edu

Sends “hello” , with subject hi, to receiver “mg4264@columbia.edu”

for i in $(cat vimusers); do echo $i; done

Substitute the result of cat into command line. echoes all lines in vimusers

for i in $(cat vimusers); do mutt -s sorry-for-spam $i < content; done

If you have a separate file content for the body of the email. Subject is “sorry-for-spam”

For pipeline |, you can’t control the order of program run. Pipe facilities have mechanisms to block some programs from running before getting input.

13. File I/O

stdin, stdout, strerr are declared in stdio.h by FILE pointers

extern FILE *stdin;
extern FILE *stdout;
extern FILE *stderr;

Anything we can read/write are files (E. hardware device, network)

In UNIX, a file is a sequency of bytes with a position (file stream)

FILE pointers

ncat.c cats the file and adds line numbering

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
	# checks number of command line arguments
    if (argc != 2) {
        fprintf(stderr, "%s\n", "usage: ncat <file_name>");
        exit(1);
    }

fprintf prints to standard error instead of output

    char *filename = argv[1];
    FILE *fp = fopen(filename, "r");
    # If fopen failes:
    if (fp == NULL) {
        perror(filename);
        exit(1);
    }

fopen() opens a file in "r" (read). It returns a FILE *, a handle for the file.

FILE not a built-in type. It’s typedefed. Differs between OSs (opaque handling)
Other flags: read, write, append, … In OS perspective, the file organization is complex (lots of metadata). Costly to refer by name. fopen() tells OS to get ready, and OS finds all metadata and prepares structure in memory, so that next time you want to read it, it knows where to go.
perror passes filename, and prints the last thing that gets wrong

    char buf[100];
    int lineno = 1;
    while (fgets(buf, sizeof(buf), fp) != NULL) {
	    # prints line number, 4 spaces wide, right justified
        printf("[%4d]", lineno++);

fgets() reads one line (including \n) of file content and put them into buf, adding \0
Once it reaches end of file, or if something fails, it will return NULL
```
  char *fgets(char *buffer, int size, FILE *file);
```
What if the line is longer than 99 characters (buf - \0)? buf will read up to 99, terminating with a \0. The next time fgets() is called, it will continue.
What if I make size of buf small, like 20?
- When fgets() called, it only read 19 characters. The next iteration of fget picks it up, BUT lineno is changed no matter what. BUG!!
- Want to see if the last read character is \n. If not, don’t print the line number.

        if (fputs(buf, stdout) == EOF) {
            perror("can't write to stdout");
            exit(1);
        }
    }

fputs() prints one line, opposite of fgets, writing to a given file

Here in the code, we write in stdout (screen)
stdout and stderr are treated as files.

    if (ferror(fp)) {
        perror(filename);
        exit(1);
    }

fgets() returns NULL in two cases:

If there’s nothing more to read
If something actually fails in the middle. Need to distinguish by calling ferror()(), “Did you terminate it normally?”

	# close the file (free memory and stuff)
    fclose(fp);
    return 0;
}

Open and Close

Open

OS will locate the file, read the metadata first, and prepares some internal structures to keep track of the file. Prepares a file structure (FILE), and returns a pointer to it Get a FILE pointer corresponding to any other file

FILE *fopen(const char *pathname, const char *mode);

FILE *fp = fopen("myfile.txt", "r");

Returns a pointer to FILE, NULL if failed Both arguments are strings Mode argument:

Binary file

Add "b" (E. "wb", "w+b") to specify the file to be opened in binary Without "b" option, Windows translates newline \r\n to \n when reading, and vv when writing

Windows: "hello" will take 7 bytes on disk, with "\r\n", 6 bytes in memory
Linux: 6 bytes However, for non-text files, we don’t want such translation

Close

fclose(fp);

Writing output

int fputc(int c, FILE *file);
int fputs(const char *str, FILE *file);

Writes str to file, not including the '\0', returns EOF (-1) on error

size_t fwrite(const void *p, size_t size, size_t n, FILE *file)

Writes an arbitrary number of bytes (n objects) out to a file stream, including null bytes

p: Pointer to the first element of an array of these items
size: size of each item

n: number of items Returns the number of successfully written objects (<n if error)

fprintf(fp, "hello, world\n");
fputs("hello world\n", stdout);  // No string formatting
fputc('o', stderr)               // single character

Reading input

int fgetc(FILE *file);
char *fgets(char *buffer, int size, FILE *file);
size_t fread(void *p, size_t size, size_t n, FILE *file);

Blocking: OS can’t return immediately something, so it pauses progress execution until it’s ready to unblock

Example: sleep(1): blocks until 1 seconds later If you call fgets() on stdin, program will block until you type something

char buf[1024];
fgets(buf, sizeof(buf), fp):     // reads at most 1023 bytes to buf ('\0')
fread(buf, 1, sizeof(buf), fp);  // reads 1024 bytes

fgets() returns either when there’s a new line or buffer fills up, but it guarantees the buffer is '\0' terminated

Returns NULL on EOF or error (call ferror() after to find error) fread() better for arbitrary binary data
Returns the number of objects successfully (not partially) read
If less than requested, feof() and ferror() will distinguish EOF or fail
EOF
Happens when reach the end of the byte sequence. Cursor is past the end of the stream When reach EOF, fgetc(), fgets(), fread() will unblock immediately (same for error though)
fgetc() returns -1
fgets() returns NULL pointer
fread() returns less than the number of items you asked for

feof() returns 1 at EOF

EOF also happens at stdin, by pressing Ctrl-D

Buffering

printf("hello...");
sleep(3);
printf("world\n");

"hello..." will not be printed after sleep Line buffering: printf() and other FILE output functions buffer bytes give to stdout and wait until reaching \n (terminal printing is expensive)

fprintf(stderr, "hello...");
sleep(3);
fprinf(stderr, "world\n");

stderr is by default, unbuffered, because we don’t want error messages to be delayed.

Files are block buffered, buffed until a fixed-size buffer is filled (typically 4096 bytes), sends to disk until fill, flush, or program termination

Flush

printf("hello...");
fflush(stdout);     // Immediately flushes out "hello..."
sleep(3);
printf("world\n");

To manually turn off all buffering for a FILE pointer, making the second argument NULL

setbuf(stdout, NULL);

File seeking

Change the current position of the underlying stream

int fseek(FILE *file, long offset, int whence)

New position added by offset, can be negative
whence:
- SEEK_SET: offset from beginning
- SEEK_CUR: from current position
- SEEK_END: end-of-file
Returns 0 on success, else error

Formatted I/O

%d: int
%u: unsigned int
%ld: long
%uld: unsigned long
%f: double
%g: double (trailing zeros not printed)
%s: string (char *)
%p: pointer address (void *)
```
int scanf(const char *format, ...);
int fscanf(FILE *stream, const char *format, ...)
int sscanf(const char *input_string, const char *format, ...);
```
Returns the number of items assigned. Variable number of arguments (variadic functions) sscanf() parses from input_string! Example: `
```
input_string = "  \t   0xffffffff \n ;
sscanf(s, "%x", &u)
```
will skip all spaces, look what looks like a number, and terminate reading once it hits a space.

int printf(const char *format, ...);
int fprintf(FILE *stream, const char *format, ...);
int sprintf(char *output_buffer, const char *format, ...);
int snprintf(char *output_buffer, size_t size, const char *format, ...)

All return the number of chars printed (not including '\0'). sprintf() writes to the output_buffer string.

output_buffer is assumed to be large enough, or else will overrun. Dangerous function, stack smashing snprintf() only prints size number of bytes, including '\0',
Typically, n in function name means there’s a size limit of n bytes
It returns the number of chars it would have printed if it is allowed more space
- If this return value not used, there will be a warning
- Use -Wno-format-truncation flag to disable this warning
  cutstr.c
  If just run as it is, nothing will be run

#define CASE1

This runs CASE1 Or, can define on command line

gcc -DCASE2 cut-str.c

strncpy similar to strcpy, but only copies n characters. However, if it goes over the limit, it will not include '\0'.

printf will print over, random crap

Lab 4

mdb: message database

Each message is 40 bytes (if 6 messages, then 240 bytes)

If you have your own database , so

./mdb-add mdb-cs3157   # if you have write permission

./mdb-add-cs3157       # hard coded the database to mess with permissions

Size of struct MdbRed is 40 bytes, first 16 being name, second 24 being msg

mdb-add calls fgets() and strcpy() or snprintf() to copy name, cutting it off at 15 (allow NULL at the end!!). Then copies 23 characters of msg

For lookup,

Opens the file in "rb" mode
Goes into a loop reading each record (fread()) for 40 bytes
1. As it reads, appends the struct (heap allocated) into a linked list (addAfter)
2. Until fread() returns 0 (EOF)
Prompts the keyword to lookup (calls fgets() != EOF):
1. Searches thru the list for all the records
2. Examine the message
3. strstr() tells if a string contains a substring
4. Prints output
5. Until fgets() returns EOF Copy mdb-add, mdb-lookup` in the directory for reference

14. Endianness

Order of bytes are stored in computer memory

Inspecting binary files

int b;
while ((b = fgetc(fp)) != EOF)
	printf("%x\n", b);

Prints each byte from fp to stdout in hexadecimal

endian-demo.c: if block and else block Example input (4-byte int): 0x00010203 (00000000 00000001 00000010 00000011 in bin) (66051 in decimal)

xxd endian-demo.host
00000000: 0302 0100

Left: Byte offset of each row in hex
Mid: file content (2 bytes, 4 hex digits) at a time
Right: ASCII file content (. for non-printable)

CLAC happens to store integer in memory in reverse, in unit of byte +—-+—-+—-+—-+ | 03 | 02 | 01 | 00 | +—-+—-+—-+—-+

This is specific to CLAC, called little Endian machine.
- Before, computers are small, so they started with 8-bit register (8086). For backward compatibility, this reverse representation is better
Big Endian machine stores in normal order, aka network byte order
Typically don’t care. The computer takes care of it!
- Does not affect MSB/LSB, or any program operation

May be problematic:

Between file systems

Internet IP address (packed in 4-byte native integer), which is big Endian!

unsigned int n;               // = 0x00010203
fread(&n, sizeof(n), 1, f);   // in `f`: 0302 0100
char *p = (char *)&n;         // treat number as a char[4]
printf("%.2x", p[0]);         // will print "03"

Conversion

htonl: host to network long (4 bytes)

#include <apra/inet.h>

uint32_t htonl(uint32_t hostlong);
uint16_t htons(uint16_t hostshort);
uint32_t ntohl(uint32_t netlong); 
uint16_t ntohs(uint16_t netshort);

Convert on little Endian host
No change on big Endian host itself.

15. Multiprocessing

(Not multithreading: 2 functions within the same process running at the same time)

Process

Each instance of a running program is a process. There can be many processes of the same program. Example: running vim twice

Each process has its own ID (PID)

getpid(): obtains a process’s PID
getppid(): obtains PID of the parent.
fork()

fork() is a special system call. Needs to ask the underlying OS to do special stuff.

pid_t fork(void);

When fork() is called,

The process is suspended. Kernel takes over.
OS freeze the process’s memory address space (memory, static, heap, stack).
Takes a snapshot and all current status
Clones the status
Let the two processes go independently. Nothing shared.

`fork()` return

Returns the PID (pid_t) of its newly created child. Many children can be created

If parent, p > 0 (returns child PID)
If child, p = 0 (no child, not a valid process ID)
If failed, p < 0 (E. too many already) Then, both parent and child resume from the place where fork() returned. Values may be different
No guarantee of execution order, unless explicitly coordinated (waitpid())

partnet-child.c Two branches run at the same time!! fork() splits program into two, and runs together.

pid_t pid = fork();
// Both parent and child will resume execution HERE

if (pid == 0) {
	// Child
	printf("This is the child, my PID is %d\n", getpid());
} else {
	// Parent
	printf("This is the parent, my child's PID is %d\n", pid);
}
// Both will print Hello
printf("Hello\n");

Exit status and `waitpid()`

A process terminates with an integer exit status code (0 for success, else error)

Return value of main()
Termination of exit(int status)

Reaping child: parent process waits until child terminates, retrieving the child’s exit code

pid_t waitpid(pid_t pid, int *wstatus, int options);

pid: PID of the process to wait for
- -1 to wait for all its children
wstatus: pointer to place to write the terminated child’s exit code
- NULL if not interested
options
- 0: normal. Block until child terminates
- WNOHANG: never blocks.
  - If a child terminates, return its PID
  - If all child(ren) is still running, return 0.
Returns PID of the child process reaped, or -1 on error

Zombie process: a child that has terminated but not yet reaped. OS keeps metadata on it, so parent must reap children Orphan process: parent terminates before child. Adopted by init process and automatically reaped when terminated. Not bad

Executing programs

When fork()ed, the child runs the same program as the parent. Change program through the exec() family of functions: execl(), execlp(), execle(), execv(), execvp(), execvpe()

ap-shell.c Runs shell inside a program.

ps af

-bash
 \_ .a/.out
	 \_ /bin/bash

`execl()`

execl() replaces the current process image with a new process image (program).

Process continues: PID and PPID retained
Memory space completely reset (stack, heap, …)

New program starts to run from its main()

int execl(const char *pathname, const char *arg, ...
       /* (char *)NULL */);

pathname: the name of the program that gets turned into
arg: variadic function, list of strings to form the argv array passed to the main() of the next function
- Terminated by (char *)NULL
```
execl("/bin/echo", "/bin/echo", "hello", (char *)NULL);
printf("This should not be printed");
```
  This imitates /bin/echo hello
If execl() goes well, it will not return. The process is turned into echo, and terminates. The old program is erased.
execv()
execv() takes the argv array directly: ```c int execv(const char *pathname, char **argv);

char *argv[] = { “/bin/echo”, “hello”, NULL };

- `argv`: (actually `*argv[]`): the `argv` array **passed to the `main` of the next function**
	- Need to tokenize the input to take additional params
If `execvp()` used, will automatically loop through common directories (`/bin/`)

```c
while (fgets(buf, sizeof(buf), stdin) != NULL) {
	pid = fork();
	if (pid == 0) {  // child process
		// DOES NOT support additional params
		char *argv[] = { buf, NULL };
		execv(argv[0], argv);
		// Got here means something wrong with execv()
		die("execl failed");
	} else {         // parent process
		if (waitpid(pid, &status, 0) != pid) {
			// If a different child reterns, die
			die("waitpid failed");
		}
	}
}

waitpid() does not return until child terminates.

If child different, die

16. UNIX

Application: ls, vim, gcc, chrome, mdb-lookup-cs3157
Library functions: printf(), malloc()
System calls: write(), fork(), execl()
OS kernel: Linux, windows NT
Hardware: processor, memory (RAM), disk, GPU, keyboard

How shell works

System

ps afj     # prints in a forest, shows parent-child rel

a shows all processes
f shows in forest
j Shows PPID (parent)
x shows system-wide stuff (E. sshd)

In the output, parent often goes first, but can’t exactly control. Unpredictable. Controlled by OS.

When you run a program on bash, bash forks itself, and the child turns itself into the executed program.

How’s the very first process started? Hand-stitched by the OS kernel at startup. Called init. PID: 1

Users and file permissions

-rwxr-xr-x 1 mg4264 student 12345 Oct 31 17:04 mdb-lookup*

Permissions:
- -: regular file
  - d for directory, p for pipes
- rwx owner: read, write, execute
- r-x owning group: can’t overwrite (recompile)
- r-x everyone else: can’t overwrite
mg4264: owner (actually represented as number, seen from id command)
students: owning group (number)
12345: binary size of file size
Timestamp
File name

Octal representation

Binary: 1 for enable, 0 for disable (111101101). Octal: 755

Change permission

Only owner can change!

chmod -x foo      # turns off for myself
chmod go+x foo    # turns on for g and o
chmod a+x foo     # turns on for ALL
chmod 755 foo

chown root foo    # change owner to root

For mdb-cs3157, you don’t have the write permission. If you run some program (vim), permission check still fails. You can’t change it via regular programs

mdb-add-cs3157 has permission rws.
- s: set user id. When you run it, it runs as the owner of the program Gives limited permission to others

Directory permissions

r allows reading content
- ls allowed,
- ls -l (metadata) needs x permission
wx allows modifying content (create, delete, move)
- w alone is meaningless!

x for entering

drwx------  2 jae jae  5 Nov 12 23:50 2bin/      # not accessible
drwxr-xr-x  2 jae jae 13 Nov 12 23:50 bin/       # accessible

Need to give x permission to all parent directories

You may not have r permission to see the contents, but you can physically be there and redirect yourself to the subfolders.

Shell scripts

Text file that can be interpreted by some program ls4.sh

for i in {1..4}
do
	ls
done

/bin/bash ls4.sh

Can’t do ./ls4 on shell. Must do bash ls4, because you don’t have execution permission by the file owner.

To give execution permission:

chmod +x ls4.sh
./ls4.sh

Now ls4.sh has the * label in ll listing.

Using ap-shell.c:

./ap-shell
AP> ./ls4.sh

Fail! execv()(carried by the OS) does not accept ./ls4.sh. How did it work in our bash shell?

bash calls execv().
If failed, bash tries running it directly. Will not work for shells other than bash, that uses a different language

`shebang` directive

At the top line of the shell script, put the full path of the program that can run the rest of the file

#!/bin/bash/
for i in {1..4}
do
	ls
done

OS looks at the first 2 bytes of the file. If begins with #!,

execv() syscall executes the specified process /bin/bash
The original script file ls4.sh is passed to the argv array.
Same as /bin/bash ls4.sh Now any shell can interpret it, including ap-shell

Lab 5

Part 1(a)

$$: PID of the shell running the script
$1: first argument

Part 1(b)

\_ sshd: mg4264 [priv]
	\_ sshd: mg4264@pts/2
		\_ -bash
			\_ ./jaes-nc-1 5201
				# execl of child process. PID printed
		         \_ /bin/bash ./mdb-lookup-server-nc.sh 5201
		            \_ cat mypipe-364820
			        \_ nc -l 5201
				    \_ /bin/sh /home/jae/cs3157-pub/bin/mdb-lookup-cs3157
		                \_ /home/jae/cs3157-pub/bin/mdb-lookup /home/jae/cs3157-pub/bin/mdb-cs3157

Find process tree of all ancestors: Trace thru all PPIDs, such as 1 (init) . 0 is not a valid process. Then you can launch another child process. Printing order is messed up. Do not sleep().

Part 1(c)

Improved version that asks you for a port number

Keeps fork()ing child processes and launching multiple servers
- If permission denied, remove FIFO
- If quit on client side, remove FIFO
- Before printing another prompt, report which process has terminated so far.
  - (Can’t report it right away because blocked by fgets(), then parse into a number).
  - A new process can be simultaneously launched there
  - After child terminates, sit there and waits for parent retrieval.
  - Need to find out whether children have terminated
    - Can’t simply call waitpid() normally. Stuck in case not terminated
    - pid = waitpid( (pid_t) -1, NULL, WNOHANG). Never block. Returns immediately whether the children.
      - -1 is a special number, indicating that ANY child has terminated.
  - Finally, print another prompt ~ 30 lines of code

17. TCP/IP

Internet Protocol Layers

Interconnection between networks Do minimal first, and let the higher layers figure out the next. Layers:

Physical: radio wave, light, wire
Link: sends packets over connection, rather than voltage variations (E. ethernet, wifi)
Network: (global) receive info, and forward it to the next direction (E. Internet Protocol-IP, IPv4, IPv6)
- Routers carry packets between places by IP address.
  - Address should contain certain location information.
  - MAC address: unique to each device. Won’t work for IP, since does not carry any location info.
- IP address (E. 128.59.2.5)
  - Dotted quad notation. Each number is a single byte (0-255).
  - Packets have header in source/destination IP addresses (network order)
  - Running out of addresses! Use Network Address Translation. Router gives out fake public IP addresses to all its devices, and have a mapping between the fake address to the real. (range, private IP addresses: 192.168.x.x)
- If you send a package to China (Chinese IP)
  - Goes through nearest gateway router (Columbia)
  - Gateway sends it west (based on location info)
  - The closer you get to destination, the routers know more
  - Boarder gateway protocol – BGP
- IP protocol doesn’t guarantee anything. If packet dropped in the middle, No recovery -> 4
Transport: manages packet flow pipeline (E. TCP, UDP)
- Using this unreliable IP, build a clean, working pipeline, in order
- Broken into sequential chunks
- Congestion control
- Multiplexing (port number)
Application: Interpret transported data (E. HTTP, SMTP (email), SSH, ncat (does nothing, simply exposes L4 layer))
- Access through sockets API (application programming interface, lab3 is linked list API) IP is the bottle neck. Top/bottom layer needs to take a lot of stuff

TCP/IP Networking

TCP establishes reliable bidirectional channel. (ncat)

Reliable: packets never dropped, corrupted, reordered
Bidirectional: both server and client can send and receive Hosts are identified by
IP address
or domain name clac.cs.columbia.edu (clac within local network). Translated to IP address using Domain Name System (DNS)

Port number

a 2-byte integer (short)

dig clac.cs.columbia.edu
nc 34.145.159.110 10000

nc localhost 10000
nc 127.0.0.1 10000           # localhost IP address

Server listens to a port number

Client connects to the server with server’s IP and port number the server is listening on Convert host name to address:

struct hostent *he;
if ((he = gethostbyname(serverName)) == NULL) 
  die("gethostbyname failed");
char *serverIP = inet_ntoa(*(struct in_addr *)he->h_addr);

`netcat`: TCP/IP tool

Similar to cat, but thru network

Server mode: runs first, then listen for incoming connections

nc -l 20000

ssh is a client program that connects to a server program sshd
- Find a server in clac.cs.columbia.edu and knock on its sshd program

Client mode: specifies the server to knock on, and then make connections.

nc clac.cs.columbia.edu 20000

Flags

-N: close connection upon EOF on stdin
-C: end lines with \r\n, useful for HTTP

Now they have made a TCP connection (Transport connect protocol)

Hides all the complexity and gives a clean pipeline of sending/receiving files Whatever entered on one end will show up on the other end. Done on both sides
netcat reads from stdin, and sends via network
Other side receives, until something comes from the network, spits to `stdout

How is it possible? You are stuck on fgets()! How can it print out simultaneously: fork (or something more advanced)

Turn `mdb-lookup` to network server

Server side:

nc -l 20000 < tmp | ./mdb-lookup-cs3157 > tmp

Sends server input to mdb-lookup Won’t work with normal file!

Pipe blocks until the first program outputs, before the second program reads
- There is no EOF for a pipe. It will just wait
File redirection: no guarantee. If EOF encountered, done

Need a special file that acts like a pipe: named pipe, created by mkfifo (make first-in-first-out, aka pipe) This file will never have content. Size always 0. Just an entry point

mkfifo mypipe

nc -l 20000 < mypipe | ./mdb-lookup-cs3157 > mypipe

cat mypipe | nc -l 20000 | ./mdb-lookup-cs3157 > mypipe    # same thing

stdout of nc piped to mdb-lookup, whose output goes to mypipe, which is fed back into nc

File Descriptors

In UNIX, native file handles are integers called file descriptors.
C file descriptors are wrappers on top. fopen(), fread(), fwrite() eventually call its UNIX counterparts: open(), read(), write() system calls
stdin: 0
stdout: 1
stderr: 2
New files opened fd: likely 3 toupper.c needs to be compiled with STDIO or UNIX defined. Reads stuff and prints capitalized. UNIX
```
ssize_t write(int fd, const void *buf, size_t count);
int open(const char *pathname, int flags);
```

toupper.c

#ifdef UNIX
	int fd_in = 0;
	int fd_out = 1;

	while (read(fd_in, &c, 1) == 1) {
		c = toupper(c);
		write(fd_out, &c, 1);
	}
#endif

Once a TCP connection is made, it will be made to a file descriptor.

Sockets API

Socket: Endpoint for a reliable, bidirectional connection (E. TCP)

Bound to IP address and port number
Something you can plug in, but won’t go anywhere Associated with file descriptors. Can use with I/O system calls read(), write(), close()
```
int socket(int domain, int type, int protocol);
```
AF_INET: IPV4 protocol
SOCK_STREAM: layer-4 technology using: TCP
0: Return: if < 0, failed!

Client

connect() the socket file descriptor to the server address

int fd = socket(...);
connect(fd, ... /* server address */);
// Communicate with the server thru read() and write()
close(fd);

A connect()ed socket will be create later by accept() on the server side

Server

socket() creates a listening socket
bind() socket to a port that should be known to the client
listen() sets up listening socket for incoming connections. Client can connect() at here.
- Clients can queue up, waiting for server to call accept().
accept() incoming connections
- Will block until a client connects
- Returns file descriptor of a new socket (clnt_fd) for each new client
  - servsock is a template to create the new clntsock connection.
- Now there is a virtual pipeline where packets are byte flow.

One side calls close() after communication done

Other side detects it and calls close() as well

int serv_fd = socket(...);
bind(serv_fd, ... /* server address */);
listen(serv_fd, ... /* max pending connections */);
while (1) {
 int clnt_fd = accept(serv_fd, ...);
 // Communicate with client thru read() and write()
 close(clnt_fd);  // terminates TCP connection
}

sockets

Socket address `struct`

connect() and bind() requires specifying server’s address and port using struct sockaddr

// connect requires struct sockaddr. Need to be casted
struct sockaddr {
	sa_family_t sa_family;
	char sa_data[14];
};
struct sockaddr_in {
	sa_family_t sin_family;            // address family: AF_INET
	uint16_t sin_port;                 // port in network order
	struct in_addr sin_addr;           // address
};
struct in_addr {
	uint32_t s_addr;                   // address in network order
}

// destination address
strcut sockaddr_in servaddr;
memset(&servaddr, 0, sizeof(servaddr));     // zero out memory
servaddr.sin_family = AF_INET;              // address family
servaddr.sin_addr.s_addr = inet_addr(ip);   // address in network order
servaddr.sin_port = htons(port);            // port in network order

connect() looks at the first two bytes, and casts pointers to address the contents (polymorphism) Send connection packet to server, wait for acknowledgement.

sock: your end of the TCP connection
IP address of server and port number connected in struct Return 0 if successful. Now socket is an established connection. Good endpoint for established virtual pipeline
<0: failed Very crude way of OOP

Sockets I/O

Once TCP/IP connection established on a socket, can communicate with read(), write()

int write(int fd, const void *buf, size_t len);
int read(int fd, void *buf, size_t len);

int send(int sockfd, const void *buf, size_t len, int flags);
int recv(int sockfd, void *buf, size_t len, int flags);

sockfd: destination
buffer: with characters sent
length: how many bytes to send/receive
flags: 0 means normal behavior (write(sock, buf, len), where sock ana. FILE), blocks until sends all bytes requested send()
Return: number of bytes sent, -1 on error recv() Blocks until it has received at least 1 byte
Return num bytes received
- 0 if connection closed by other party
- -1 in error

Wrapped `FILE` Descriptor

No syscalls equivalent to fgets() or fprintf() :( We can wrap a file descriptor using fdopen() (d!), returns a FILE *

FILE *sock_fp = fdopen(sock_fd, "wb");
fprintf(sock_fp, ...);
fclose(sock_fp);                      // will close() underlying socket fd

May need setbuf() or fflush() to ensure buffered output actually sent through

Not good to use it for both read and write (r+) on the same FILE *. C requires to fseek() every time you switch between read and write Or use dup() to duplicate socket descriptor and create two separate FILE *s

FILE *input = fdopen(socket_descriptor, "r");       // read-only FILE pointer  
FILE *output = fdopen(dup(socket_descriptor), "w"); // write-only FILE pointer  
// ...  
fclose(input); 
fclose(output);

`echo` server example

tcp-echo-client.c and tcp-echo-server.c One a connection is established, 4 couples:

IP address of server
Port number server is listening on
IP address of client (unseen)
Port number where client came from (unnoticed, underneath. OS randomly picks a number for client and sends to server at initialization. Server remembers it so it can reach client)

Client sends server a packet of characters. Sometimes receive all characters, but sometime split. You can’t control message boundaries in TCP If messages are bigger, don’t know how things will be broken up.

Client quits once it receives everything.
Server is perpetual, waiting for new client after current one ends

Client

// ./client <server-ip> <server-port>
const char *ip = argv[1];
unsigned short port = atoi(argv[2]);   // port number is short

// socket(): your end of the connection
// Like a file descriptor, pass to read and write
// open socket has a descriptor, likely #3, after stderr
int sock = socket(AF_INET, SOCK_STREAM, 0);    
connect(sock, (struct sockaddr *)&servaddr, sizeof(servaddr));

char buf[100];
fgets(buf, sizeof(buf), stdin);
size_t len = strlen(buf);

send(sock, buf, len, 0);

int r;
while (len > 0 && (r = recv(sock, buf, sizeof(buf), 0)) > 0) {
	fwrite(buf, 1, r, stdout);   // only printing r chars
	len -= r;
}

close(sock);
return 0;

Server

// ./server <server-port>
unsigned short port = atoi(argv[1]);

int servsock = socket(AF_INET, SOCK_STREAM, 0);

strcut sockaddr_in servaddr;
memset(&servaddr, 0, sizeof(servaddr));
servaddr.sin_family = AF_INET;
servaddr.sin_addr.s_addr = htonl(INADDR_ANY);  // any netowrk interface, 
// INADDR_ANY = 0.0.0.0. If you want WIFI, only, you'll pass specified
servaddr.sin_port = htons(port);               // port # you choose to listen on

// Bind to local address, grab the port. 
// Other program will fail to bind the same port
bind(servsock, (struct sockaddr *) &servaddr, sizeof(servaddr));

// Start accepting client connections. Queue (5) requests up at OS level
// Now clienct can connect()
listen(servsock, 5);

// CLIENT address
struct sockaddr_in clntaddr; 
int r;  char buf[10];
while (1)
{
	socklen_t clntlen = sizeof(clntaddr);

	// accept() gives client side info
	// clntsock is a NEW socket for further communication
	int clntsock = accept(servsock, (struct sockaddr *) &clntaddr, &clntlen);

	// only pass CLIENT side
	// keep receiving and sends back the chunk it reads (until client closed)
	// recv() blocks until at least 1 byte received
	while ((r = recv(clntsock, buf, sizeof(buf), 0)) > 0)
		send(clntsock, buf, r, 0);

	close(clntsock);
}

File transfer program

File transfer program that runs on any TCP communication program Sender:

./sender 127.9.9.1 10000 <filename>

Receiver:

./recver 10000 foo

Files received will be named foo.0, foo.1, …

Not complete. Does not check file name or error

`tcp-sender.c`

(client side)

Boiler plate code, socket(), prepare address struct, connect()

Send file size as 4-byte uint in network order

FILE *fp = fopen(filename, "rb");
struct stat st;
stat(filename, &st);
uint32_t size_net = htonl(st.st_size);
send(sock, &size_net, sizeof(size_net), 0);

stat(filename) gives you info about file by filling in struct stat

Sends actual file content in 4K chunks

 char buf[4096];               // 4K is an optimal for disk read
 unsigned int n; 
 unsigned int total = 0;

 while ((n = fread(buf, 1, sizeof(buf), fp)) > 0) {
     send(sock, buf, n, 0);    // if != n, die
     total += n;               // collect total bytes sent
 }
 // check for ferror()
 fclose(fp);
 fprintf(stderr, "bytes sent: %u\n", total);

Use fread(buf, 1, sizeof(buf), fp). We may need to partial read 4K and know how many bytes read. For send(), the size needs to be n, to prevent writing garbage

Receive and verify file size acknowledgement from server
```
uint32_t ack, ack_net;
int r = recv(sock, &ack_net, sizeof(ack_net), MSG_WAITALL);
ack = ntohl(ack_net);
```
MSG_WAITALL tells recv() to block until full request is satisfied (4-byte ack_net filled). If still problem:
- If r != sizeof(ack_net),
- r < 0, recv() failed
- r == 0, connection closed prematurely
- r > 0, type is not uint32 - If ack != size, something content missing
- Does not account for bit flips

Expects server to close connection

 char x;   // just receive a single byte (0)
 r = recv(sock, &x, 1, 0);
 assert(r == 0);

 close(sock);

recv() returns 0 when server closes

`tcp-recver.c`

(server side)

Boiler plate: socket(), set up struct, bind(), listen()

 FILE *fp;
 unsigned int filesuffix = 0;
 char filename[strlen(filebase) + 100];

 int r;
 char buf[4096];
 uint32_t size, size_net, remaining, limit;
 struct stat st;

 while (1) {
     accept();
     // generates file name and writes
     snprintf(filename, "%s.%u", filebase, filesuffix++);
     fp = fopen(filename, "wb");

filename’s size is dynamic. (stack allocated)

Problem: no error checking. If it fails, program misbehaves. malloc() will return NULL

Receive file size

     r = recv(clntsock, &size_net, sizeof(size_net), MSG_WAITALL);
     // error checking here
     size = ntohl(size_net);
     fprintf(stderr, "size received: %u\n", size);

Receive file content

 remaining = size;    // supposed to receive
 while (remaining > 0) {
     limit = remaining > sizeof(buf) ? sizeof(buf) : remaining  // min
     r = recv(clntsock, buf, limit, 0);
     if (r > 0) {
         remaining -= r;
         fwrite(buf, 1, r, fp);
     }   // error check if r <= 0
 }
 assert(remaining == 0);
 fclose(fp);

Loop to keep recv()ing and write to the disk Make sure to write only the exact number of bytes received. In lab 6 part 2, we wrapped recv() with fgets() and fread()

Send file size back as acknowledgement

 stat(filename, &st);
 size_net = htonl(st.st_size);
 send(clntsock, &size_net, sizeof(size_net), 0);

Close client connection, can accept() next client
```
 close(clntsock);
}
```

18. HTTP

HyperText Transfer Protocol, application level

HTTP clients request resources (E. html) from HTTP servers over a TCP connection

URL anatomy

http://www.clac.cs.columbia.edu:80/index.html

http HTTP protocol
www.clac.cs.columbia.edu Domain name, translated to IP address
80 Port number
/index.html URI Once connection made, speak to HTTP to request /index.html.

URI (Uniform Resource Identifier, /index.html) tells server the resource you want.

The server interprets “/index.html” and decides what resources to serve
Typically, URI appended to some web root directory of server
- E. root is /home/www/web, server will respond with file at /home/www/web/index.html
Static: server file content directly comes from file system
Dynamic web server may query other databases

Browsing HTTP

Browser establishes a TCP connection with the server with the IP address
Browser sends HTTP request to the server for resource
Server responds, over the same TCP connection (fulfill or deny)
Browser parses HTML, makes subsequent requests for other resources, and renders the webpage

Command-line

curl http://example.com/index.html     # prints to stdout
wget http://example.com/index.html      # saves page to disk

curl --http1.0 forces using HTTP/1.0

Client `GET` request

Getting index page from ncat (stored in /var/www/html)

nc clac.cs.columia.edu 80

GET /index.html HTTP/1.0            # request line
Host: clac.cs.columbia.edu:80       # header
									# /r/n blank line

Request line: method (GET), URI (index.html), HTTP version (HTTP/1.0), space separated
Header: field name, colon, field value
New line is \r\n!! Sends a metadata header (browser receives all these), separated by blank line, followed by actual file (HTML, binary, music, video, etc.)

Server response

Status line: HTTP version, status code, optional phrase (space separated)
- HTTP version of server response may be different
Header
\r\n
Beginning of file content (arbitrary binary data)
Server closes TCP connection after file sent

nc -l 8888

HTTP/1.0 200 OK                     # status line
Content-Type: text/html             # header
									# \r\n
<html> <h1>                         # resource content...
Hello, world!
</h1> </html>

Lab 6 part 2

Look for two bytes? Build a web client

E. GNU make manual: https://www.gnu.org/software/make/manual/make.html

wget https://www.gnu.org/software/make/manual/make.html

./http client <host name> <port number> <file path>
./http-client https://www.gnu.org 80 /software/make/manual/make.html

Start from client boiler plate

Connect
Send request file via HTTP/1.0, terminating with /r/n
1. Little more thing
Get response. parse it. Look for 200 OK. If not, report
1. skip header ,skip blank line
Loop:
1. read (fread()) 4K chunk
2. save Must work for all binary files

Lab 7

Part 1

clac:80 files are in /var/www/html folder When GET /index.html HTTP/1.0, we don’t start from root directory. /var/ww/html is configured to be the web root by a configuration file

https://clac.cs.columbia.edu:80/~jae/index.html will look to user specific web root directory, configured to be ~/html if it sees “/~jae”

Challenge: get the permission right. Needs x permission for all intervening directories

Part 2(a)

http-server: Now write server that receives a GET request, parse, read file, and send to browser

Use browser to test webserver ./http-server 8000 /home/jae/html/cs3157 127.0.0.1 9000 ./http-server 5201 ~/html/cs3157 localhost 9201 On browser go to clac.cs.columbia.edu:5201/tng Now connect to http://clac:8000/tng/index.html, it will send GET /tng/index.html HTTP/1.0 Now the page displayed will be served by my own server

Browser receives index.html from first GET, will render it, and then makes another GET request to fetch the image
- When GET file downloaded ended, the server closes the connection (read, save, …, until fread() returns 0)
Each GET gets a connection. The server closes when done, and browser connects again
- 3 separate TCP connections for HTTP/1.0
- Fixed in HTTP/1.1, before sending content, in header, server specifies content length.
- The first time, browser asks for /favicon.ico, the icon in URL bar. Server sends 404 Not Found

Part 2(b)

If send hard-coded URL clac:8000/mdb-lookup, my server would respond with a textbox. May see weird " " 400 Bad Requests, just respond When type something in window, browser will send GET /mdb-lookup/key=AP, mdb-lookup will respond, server will make it pretty, and send it to browser

The source page can be downloaded, with a <form>

On “submit”, browser will compose a GET request with the ?key=
GET sent to my server
In server, if GET without key, send empty form
- If key=something, server will look for it
- Do the mdb-lookup query (empty query will print everything)
- Get the result
- Format it (need to wrap it up with table and stuff)
- Send it back (a) is static. Look for the file. If there, send it to browser.
If the file name is special mdb-lookup, do the special processing. Nothing dynamic on browser. Browser is just static page

Nowadays JS script running on browser

3-tier architecture

Extra. C++

Most C code are valid in C++

Start by writing a simple C program

struct (aka class)

You can omit struct keyword
Can contain other methods
Access rights
- struct are by default public
- class are by default private
Constructor
Destructor, called automatically when variable out of scope (here it’s main())
- E. free allocated memory
Heap allocation: need to cast void * to Pt *
- Did not invoke constructor and destructor
- malloc() grabs this of memory for this object. Has no idea to insert constructor calls.
- New syntax: new (constructor + heap), delete (destructor)

#include <stdio.h>
class Pt {         // you can drop the struct 
public:            // everything below is public
	double x;
	double y;
	Pt() {         // constructor
		x = 1;
		y = 2;
		printf("hi\n");
	}
	~Pt() {        // destructor, called automatically when var goes out of scope
		printf("bye\n");
	}
	void print()   // method
		printf("(%g, %g)\n", x, y);
}；

int main() {
	// Stack allocation
	Pt p;          
	p.print();
	
	// Heap allocation
	//Pt *pp = (Pt *)malloc(sizeof(Pt));
	Pt *pp = new Pt();
	pp->print();
	//free(pp);
	delete(pp);
}

Translating C to C++

class -> struct
Access: check through all calls, make sure legit
Member functions are removed and added as global functions
- rename print() to Pt_print()
- Add a parameter this
- Similar to Lab 3 that takes a pointer to a List
```
void Pt_print(struct Pt *this) {
  printf("(%g, %g)\n", this->x this->y);
}
```
Take constructor/destructor as a global function
- Call it right after variable declaration (allocate room on stack)
More: inheritance, polymorphism

Java

Pointers become references Everything is a pointer

// Heap allocation
Pt pp;
pp = new Pt();
// no need to delete()

// Stack allocation
// CAN'T!!!
// IMPOSSIBLE!!!

Java does not allow object creation on the stack. Therefore, Java virtual machine needs a tiny stack Background garbage collector to delete inactive pointers

Share on

Bluesky Facebook LinkedIn X (formerly Twitter)

Ming Gong

1. CLI Basics

SSH

Folder organization

Shell shortcuts

Vim shortcuts

2. Compilation and Make

Building a C program

Proper way: separate compilation

More options

Declaration

Prevent linking problem

Makefile

Variables

Auto remove binary files after compilation

Git

3. C Basics

Integer types

Binary numbers

Integer literals

Expressions vs statements

Statement

Expression

Bitwise operators

HW hint

Logical operators (short circuit)

Assignment and x++

4. Storage Classes

1. Local variables

2. Static variables

2a. Global (static) variable

2b. File static variable

2c. Block static variable

Process address space

Stack allocation (an overview)

5. Pointer

Pointer variables

Pointer types

Void pointer

NULL Pointer

6. Arrays

sizeof()

p++;

GUT

Exceptions

7. Heap allocation

Freeing memory

Memory leak

8. String

String Literals

String library functions

argv array

9. Function Pointers

const

const pointers

qsort

Complex Declarations

10. Structures

struct layout

struct size

Pointer to struct

Memory optimization

Linked list

struct List

11. Libraries

Macros

include Guards

Archive files

Using libraries

-I

-L and -l

12. Standard I/O

Redirection

Pipes

Useful functions and examples

13. File I/O

FILE pointers

Open and Close

Open

Binary file

Assignment and `x++`

`sizeof()`

`p++;`

`argv` array

`const`

`const` pointers

`qsort`

`struct` layout

`struct` size

Pointer to `struct`

`struct List`

`include` Guards

`-I`

`-L` and `-l`

`EOF`

`cutstr.c`

`fork()`

`fork()` return

Exit status and `waitpid()`

`execl()`

`execv()`

`shebang` directive

`netcat`: TCP/IP tool

Turn `mdb-lookup` to network server

Socket address `struct`

Wrapped `FILE` Descriptor

`echo` server example

`tcp-sender.c`

`tcp-recver.c`

Client `GET` request