#cs144

cs144 checkpoint0

1 Installing the system

It's best to use the image the course provides. I used Kubuntu 22.04 and ran into a pile of version problems later on.

2 Installing dependencies

sudo apt update && sudo apt install git cmake gdb build-essential clang \
clang-tidy clang-format gcc-doc pkg-config glibc-doc tcpdump tshark

tshark is the command-line version of Wireshark.

3 Networking by hand

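3.1 Fetching a web page

Use telnet to talk HTTP by hand.

3.2 Sending an email

I don't have a SUNet ID, so this part didn't work.

3.3 A local byte stream

Used netcat to listen on a local port and telnet to connect to it; whatever is typed in one terminal appears in the other.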

nc -vlp 9090
telnet localhost 9090

4 webget

Writing a network program using an OS stream socket

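Stream socket: looks like a file (descriptor), with a synchronized byte stream appearing at both ends.

Goal: use the TCP implementation provided by the operating system to write a webget program that does what we just did by hand.

4.1 Setting up the repo

Connecting to GitHub gave me some trouble: it requires a token or an SSH key. I went with a token.

4.2 Compiling the starter code

Now I finally understand why they recommend their own image: right away the build failed because the CMake version was too old, so I upgraded CMake. Since this is a 22.04 system, that needs an extra apt repository (Kitware's).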

# Install prerequisites
sudo apt update
sudo apt install software-properties-common wget

# Add the Kitware signing key
wget -O - https://apt.kitware.com/keys/kitware-archive-latest.asc 2>/dev/null | gpg --dearmor - | sudo tee /usr/share/keyrings/kitware-archive-keyring.gpg >/dev/null

# Add the repository
echo 'deb [signed-by=/usr/share/keyrings/kitware-archive-keyring.gpg] https://apt.kitware.com/ubuntu/ jammy main' | sudo tee /etc/apt/sources.list.d/kitware.list >/dev/null

# Update and install CMake
sudo apt update
sudo apt install cmake

Then the build failed because <format> could not be found; it turned out the gcc/g++ version was too old. I first upgraded to 12, which still wasn't enough. It needs 13.

# Add the PPA
sudo add-apt-repository ppa:ubuntu-toolchain-r/test
sudo apt update

# Install GCC 13
sudo apt install gcc-13 g++-13

# Make it the default compiler
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-13 130
sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-13 130

# Check the versions
gcc --version
g++ --version

After that it finally compiled.

cmake -S . -B build
cmake --build build

In hindsight, upgrading to 24.04 LTS would have supported all of this out of the box. I really made work for myself.

4.3 Modern C++

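RAII (Resource Acquisition Is Initialization): for paired operations such as malloc/free or new/delete, acquire the resource in a constructor and release it in the destructor, so the release cannot be skipped (for example, when a function returns early or throws, and the cleanup code in its second half never runs).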
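A minimal sketch of the idea, not taken from the course code. `Handle`, `open_handles` and `halve` are hypothetical names; the "resource" is just a counter so the effect is easy to observe.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical resource: we only count how many handles are currently open.
int open_handles = 0;

class Handle
{
public:
  Handle() { ++open_handles; } // acquire in the constructor
  ~Handle()
  {
    assert( open_handles > 0 );
    --open_handles; // release in the destructor
  }
  // Non-copyable, so one acquire always matches exactly one release.
  Handle( const Handle& ) = delete;
  Handle& operator=( const Handle& ) = delete;
};

// Returns n / 2, returns early for 0, and throws for odd n.
int halve( int n )
{
  Handle h;
  if ( n == 0 ) {
    return 0; // early return: ~Handle still runs
  }
  if ( n % 2 != 0 ) {
    throw std::invalid_argument( "odd input" ); // exception: ~Handle still runs
  }
  return n / 2;
}
```

Whichever way `halve` exits, `open_handles` is back to 0 afterwards, with no explicit cleanup code on any path.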

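The handout lists some style requirements. Surprisingly, new/delete, raw pointers, templates, and virtual functions are all discouraged.

Use git a lot and commit often, ideally with every commit in a working state, which makes debugging easier.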

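4.4 Reading the starter code

minnow wraps the operating system's C interfaces in Modern C++ classes, so from here on we just use those classes. The public interfaces are in these two files: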

  • util/socket.hh
  • util/file_descriptor.hh

Note that a Socket is a kind of FileDescriptor, and a TCPSocket is a kind of Socket.

4.5 Implementing webget

Implement the get_URL function in apps/webget.cc. Its job is simply to fetch a web page over HTTP.

Interfaces needed:

util/socket.hh
	TCPSocket()
	Socket.connect(Address)
	
util/file_descriptor.hh
	void read(string& buffer)
	size_t write(string)
	
util/address.hh
	Address(hostname, service)

This example makes class inheritance, and the benefit of using it, very tangible.
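As a toy model of that hierarchy (these are not the minnow classes; `FakeFileDescriptor`, `FakeSocket` and `FakeTCPSocket` are made-up names, and they only record what was asked of them), the most derived class ends up with both write and connect without declaring either:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Stand-in for the base class: anything that can be written to.
class FakeFileDescriptor
{
public:
  std::size_t write( const std::string& data )
  {
    written_ += data;
    return data.size();
  }
  const std::string& written() const { return written_; }

private:
  std::string written_ {};
};

// A socket is a file descriptor that can also connect to a peer.
class FakeSocket : public FakeFileDescriptor
{
public:
  void connect( const std::string& peer )
  {
    assert( !peer.empty() );
    peer_ = peer;
  }
  const std::string& peer() const { return peer_; }

private:
  std::string peer_ {};
};

// A TCP socket is a socket; it inherits write() and connect() unchanged.
class FakeTCPSocket : public FakeSocket
{};
```

The real classes do actual system calls, but the shape is the same, which is why get_URL can call both connect and write on a TCPSocket.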

Here is the webget implementation:

void get_URL( const string& host, const string& path )
{
  TCPSocket sock;
  sock.connect(Address(host, "http"));
  string req = "GET " + path + " HTTP/1.1\r\n"
                "Host: " + host + "\r\n"
                "Connection: Close\r\n"
                "\r\n";
  sock.write(req);
  // "A single call to read is not enough": keep reading until EOF,
  // which shows up as an empty buffer.
  while ( true ) {
    string response;
    sock.read( response );
    if ( response.empty() ) {
      break;
    }
    cout << response;
  }

  debug( "Function called: get_URL( \"{}\", \"{}\" )", host, path );
}
} // namespace

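Surprisingly, it worked after only a few rounds of debugging. The first error was writing String instead of string; the second was calling response = sock.read(), which is the wrong way to call it (read fills a string passed by reference); the third was initializing response to NULL.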
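The read loop can be tried out without a network. In this sketch `ChunkedSource` is a hypothetical stand-in for the socket that hands out at most a few bytes per call, using the same "fill the string passed in, empty means EOF" convention as in get_URL above:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical stand-in for a socket: returns at most `chunk` bytes per read.
class ChunkedSource
{
public:
  ChunkedSource( std::string data, std::size_t chunk ) : data_( std::move( data ) ), chunk_( chunk )
  {
    assert( chunk_ > 0 );
  }

  // Fills `out` with the next piece; leaves it empty once everything was read.
  void read( std::string& out )
  {
    const std::size_t n = std::min( chunk_, data_.size() - pos_ );
    out.assign( data_, pos_, n );
    pos_ += n;
  }

private:
  std::string data_;
  std::size_t chunk_;
  std::size_t pos_ {};
};

// Same loop shape as get_URL: keep reading until an empty result signals EOF.
std::string read_all( ChunkedSource& src )
{
  std::string all;
  std::string piece;
  while ( true ) {
    src.read( piece );
    if ( piece.empty() ) {
      break;
    }
    all += piece;
  }
  return all;
}
```

A single read only returns the first piece, which is why the loop is needed.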

Build, run, and test:

cmake --build build
./build/apps/webget cs144.keithw.org /hello
cmake --build build --target check_webget

5 ByteStream

An in-memory reliable byte stream

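An in-memory reliable byte stream is essentially a reader-writer structure, except that multithreading, locks and the like are out of scope.

There are three candidate data structures for the buffer:

  • deque<char> is the simplest, but every operation turns into a per-byte for loop over len, so performance would be terrible
  • deque<string> should perform best, but it is also the most complex
    • deque has a clever memory allocation scheme, but you have to handle the chunks pushed by the writer not lining up with the lengths popped by the reader
  • string buffer_ is the middle option: a single string serves as the buffer
    • I don't know exactly how string allocates memory, but performance is decent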
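For comparison, here is a rough sketch of what the deque<string> option could look like. It is not what I ended up using; `ChunkBuffer` and `front_offset_` are names made up for this illustration. The mismatch between pushed chunks and popped lengths is handled by remembering how far into the front chunk the reader already is.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <deque>
#include <string>
#include <string_view>
#include <utility>

class ChunkBuffer
{
public:
  // Store the pushed string as one chunk, without copying its bytes.
  void push( std::string data )
  {
    if ( data.empty() ) {
      return;
    }
    size_ += data.size();
    chunks_.push_back( std::move( data ) );
  }

  // View of the unread part of the front chunk only.
  std::string_view peek() const
  {
    if ( chunks_.empty() ) {
      return {};
    }
    return std::string_view( chunks_.front() ).substr( front_offset_ );
  }

  // Remove up to `len` bytes, possibly spanning several chunks.
  void pop( uint64_t len )
  {
    len = std::min( len, size_ );
    size_ -= len;
    while ( len > 0 ) {
      assert( !chunks_.empty() );
      const uint64_t avail = chunks_.front().size() - front_offset_;
      if ( len < avail ) {
        front_offset_ += len; // stay inside the front chunk
        return;
      }
      len -= avail; // front chunk fully consumed
      chunks_.pop_front();
      front_offset_ = 0;
    }
  }

  uint64_t size() const { return size_; }

private:
  std::deque<std::string> chunks_ {};
  uint64_t front_offset_ {}; // bytes of the front chunk already popped
  uint64_t size_ {};         // total unread bytes across all chunks
};
```

Here peek only exposes one chunk at a time, and pop never moves bytes around, which is where the expected performance advantage over a single string would come from.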

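Preparation

Syntax

  • Private member variables are named var_
  • bool error_ { false }; the trailing {} is a default member initializer
  • const Reader& reader() const; the trailing const means the method does not modify the object (the leading const makes the returned reference read-only)
  • explicit ByteStream( uint64_t capacity ); explicit prevents implicit conversions, so a ByteStream can only be created by calling the constructor explicitly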
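A small self-contained illustration of these four points, using a hypothetical `Counter` class instead of ByteStream:

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

class Counter
{
public:
  // explicit: a bare integer is never silently turned into a Counter.
  explicit Counter( uint64_t limit ) : limit_( limit ) { assert( limit_ > 0 ); }

  void tick()
  {
    if ( count_ < limit_ ) {
      ++count_;
    }
  }

  // Trailing const: these do not modify the object, so they can be
  // called through a const reference.
  uint64_t count() const { return count_; }
  bool full() const { return count_ == limit_; }

private:
  uint64_t limit_;    // private members end in an underscore
  uint64_t count_ {}; // {} initializes the member to 0
};

// Only const member functions are usable here.
uint64_t inspect( const Counter& c )
{
  return c.count();
}

// Because of explicit, there is no implicit conversion from an integer:
static_assert( !std::is_convertible_v<uint64_t, Counter> );
// Counter c = 3;  // would not compile
// inspect( 3 );   // would not compile
```

Without explicit, both commented-out lines would compile and quietly construct a temporary Counter.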

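Big picture

  • reader and writer are two interfaces onto the same ByteStream
  • The state they need to share lives in the protected section of ByteStream

std::string interface:

  • string& str.append( str1.data(), len ) appends the first len chars of str1

  • string& str.erase( pos, len ) removes up to len chars starting at pos

  • bool str.empty()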
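A quick sketch of these three calls in the way the buffer uses them. `append_prefix` and `drop_front` are helper names invented for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Append the first `len` chars of `src` to `buf`; returns how many were added.
std::size_t append_prefix( std::string& buf, const std::string& src, std::size_t len )
{
  // append( ptr, n ) reads exactly n chars, so n must not exceed src.size().
  assert( len <= src.size() );
  buf.append( src.data(), len );
  return len;
}

// Remove up to `len` chars from the front of `buf`.
// erase( pos, len ) clamps len to the chars actually available after pos.
void drop_front( std::string& buf, std::size_t len )
{
  buf.erase( 0, len );
}
```

For example, starting from "abc", appending the first 2 chars of "defgh" gives "abcde", and dropping 3 from the front leaves "de".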

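peek and pop

  • Splitting a read into peek and pop is more flexible, which suits networking code
  • With the string approach, a single peek can expose the whole buffer
    • string_view(buffer_) is only a reference to the data, so no deep copy is made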
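To make the peek/pop split concrete, here is a sketch of a read built on top of the two. `MiniReader` and `read_up_to` are hypothetical; this is only loosely modelled on the idea of the provided read helper, not copied from it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <string_view>
#include <utility>

class MiniReader
{
public:
  explicit MiniReader( std::string initial ) : buffer_( std::move( initial ) ) {}

  // No copy: the view points straight into buffer_.
  std::string_view peek() const { return buffer_; }

  void pop( uint64_t len ) { buffer_.erase( 0, std::min<uint64_t>( len, buffer_.size() ) ); }

private:
  std::string buffer_;
};

// Copy up to max_len bytes into `out`, removing them from the reader.
void read_up_to( MiniReader& reader, uint64_t max_len, std::string& out )
{
  out.clear();
  while ( out.size() < max_len ) {
    std::string_view view = reader.peek();
    if ( view.empty() ) {
      break; // nothing buffered right now
    }
    view = view.substr( 0, max_len - out.size() );
    assert( !view.empty() );
    out += view;               // copy first: pop() invalidates the view
    reader.pop( view.size() );
  }
}
```

The one thing to be careful about is ordering: the string_view is only valid until the next pop, so the bytes have to be copied out before popping.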

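What needs changing

  • The state variables in the protected section of the .hh
  • The methods in the .cc

Notes

  • Every method is simple and the amount of code is small, so there is really no need to be intimidated. If stuck, let Copilot give a hint and then write it yourself.
  • Watch the boundary conditions, and be careful about which variable you are actually modifying

Implementation

byte_stream.hh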

#pragma once

#include <cstdint>
#include <string>
#include <string_view>

class Reader;
class Writer;

class ByteStream
{
public:
  explicit ByteStream( uint64_t capacity );

  // Helper functions (provided) to access the ByteStream's Reader and Writer interfaces
  Reader& reader();
  const Reader& reader() const;
  Writer& writer();
  const Writer& writer() const;

  void set_error() { error_ = true; };       // Signal that the stream suffered an error.
  bool has_error() const { return error_; }; // Has the stream had an error?

protected:
  // Please add any additional state to the ByteStream here, and not to the Writer and Reader interfaces.
  uint64_t capacity_;
  bool error_ { false };
  bool is_closed_ { false };
  // Use a single string as the buffer
  std::string buffer_ {};
  // Reader and Writer must see the same state, so these live in ByteStream
  uint64_t total_bytes_pushed_ {};
  uint64_t total_bytes_popped_ {};
};

class Writer : public ByteStream
{
public:
  void push( std::string data ); // Push data to stream, but only as much as available capacity allows.
  void close();                  // Signal that the stream has reached its ending. Nothing more will be written.

  bool is_closed() const;              // Has the stream been closed?
  uint64_t available_capacity() const; // How many bytes can be pushed to the stream right now?
  uint64_t bytes_pushed() const;       // Total number of bytes cumulatively pushed to the stream
};

class Reader : public ByteStream
{
public:
  std::string_view peek() const; // Peek at the next bytes in the buffer -- ideally as many as possible.
  void pop( uint64_t len );      // Remove `len` bytes from the buffer.

  bool is_finished() const;        // Is the stream finished (closed and fully popped)?
  uint64_t bytes_buffered() const; // Number of bytes currently buffered (pushed and not popped)
  uint64_t bytes_popped() const;   // Total number of bytes cumulatively popped from stream
};

/*
 * read: A (provided) helper function thats peeks and pops up to `max_len` bytes
 * from a ByteStream Reader into a string;
 */
void read( Reader& reader, uint64_t max_len, std::string& out );

byte_stream.cc

#include "byte_stream.hh"
#include "debug.hh"
#include <algorithm> // std::min

using namespace std;

ByteStream::ByteStream( uint64_t capacity ) : capacity_( capacity ) {}

// Push data to stream, but only as much as available capacity allows.
void Writer::push( string data )
{
  if(error_ || is_closed_) {
    return;
  }
  uint64_t can_write = available_capacity();
  uint64_t to_write = min( can_write, static_cast<uint64_t>( data.size() ) );
  buffer_.append( data.data(), to_write );
  total_bytes_pushed_ += to_write;
}

// Signal that the stream has reached its ending. Nothing more will be written.
void Writer::close()
{
  is_closed_ = true;
}

// Has the stream been closed?
bool Writer::is_closed() const
{
  return is_closed_;
}

// How many bytes can be pushed to the stream right now?
uint64_t Writer::available_capacity() const
{
  if( buffer_.size() >= capacity_ ) {
    return 0;
  }
  return capacity_ - buffer_.size();
}

// Total number of bytes cumulatively pushed to the stream
uint64_t Writer::bytes_pushed() const
{
  return total_bytes_pushed_;
}

// Peek at the next bytes in the buffer -- ideally as many as possible.
// It's not required to return a string_view of the *whole* buffer, but
// if the peeked string_view is only one byte at a time, it will probably force
// the caller to do a lot of extra work.
string_view Reader::peek() const
{
  return string_view(buffer_);
}

// Remove `len` bytes from the buffer.
void Reader::pop( uint64_t len )
{
  uint64_t to_pop = min( len, static_cast<uint64_t>(buffer_.size()) );
  buffer_.erase(0, to_pop);
  total_bytes_popped_ += to_pop;
}

// Is the stream finished (closed and fully popped)?
bool Reader::is_finished() const
{
  return is_closed_ && buffer_.empty();
}

// Number of bytes currently buffered (pushed and not popped)
uint64_t Reader::bytes_buffered() const
{
  return buffer_.size();
}

// Total number of bytes cumulatively popped from stream
uint64_t Reader::bytes_popped() const
{
  return total_bytes_popped_;
}

test

cmake --build build --target check0

Debug

I really only hit one bug. The terminal output got lost, so the following is reconstructed from memory:

1. initialize capacity=15
2. Writer.close()
...
5. Writer.available_capacity() -> 0 
	Error: available_capacity() should be 15

It was caused by a wrong boundary condition in my first version:

uint64_t Writer::available_capacity() const
{
  if( error_ || is_closed_ ) {
    return 0;
  }
  return capacity_ - buffer_.size();
}

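At first I thought that once the stream is closed, available_capacity should of course drop to 0. Looking at it now, available_capacity() should depend only on buffer_.size() and capacity_; bringing error_ and is_closed_ into it makes no sense.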
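The failing scenario can be reproduced with a stripped-down model. `MiniStream` is a hypothetical cut-down version of the writer side, written only to show the expected behaviour; the checks in the note after the code mirror the steps I remember from the test.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>

class MiniStream
{
public:
  explicit MiniStream( uint64_t capacity ) : capacity_( capacity ) {}

  // Closing stops further writes; it does not change how much room is left.
  void close() { closed_ = true; }

  void push( const std::string& data )
  {
    if ( closed_ ) {
      return;
    }
    const uint64_t n = std::min<uint64_t>( data.size(), available_capacity() );
    buffer_.append( data.data(), n );
  }

  // Depends only on the capacity and on what is currently buffered.
  uint64_t available_capacity() const
  {
    assert( buffer_.size() <= capacity_ );
    return capacity_ - buffer_.size();
  }

private:
  uint64_t capacity_;
  bool closed_ { false };
  std::string buffer_ {};
};
```

With a capacity of 15, closing an empty stream still leaves available_capacity() at 15, and a push after close is ignored, so the value stays at 15.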

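There was also a funny bug: somewhere around the 7th test case the terminal suddenly started spewing AddressSanitizer:DEADLYSIGNAL. A search showed it was caused by ASLR (which I had learned a little about in an earlier binary lab). The fix was to turn ASLR off, which applies system-wide until the next reboot.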

echo 0 | sudo tee /proc/sys/kernel/randomize_va_space

Results

Performance was surprisingly good: about 10 Gbit/s at a pop length of 4096.

Test project /home/bbz/cs144/minnow/build
      Start  1: compile with bug-checkers
 1/11 Test  #1: compile with bug-checkers ........   Passed    0.28 sec
      Start  2: t_webget
 2/11 Test  #2: t_webget .........................   Passed    2.11 sec
      Start  3: byte_stream_basics
 3/11 Test  #3: byte_stream_basics ...............   Passed    0.02 sec
      Start  4: byte_stream_capacity
 4/11 Test  #4: byte_stream_capacity .............   Passed    0.02 sec
      Start  5: byte_stream_one_write
 5/11 Test  #5: byte_stream_one_write ............   Passed    0.02 sec
      Start  6: byte_stream_two_writes
 6/11 Test  #6: byte_stream_two_writes ...........   Passed    0.02 sec
      Start  7: byte_stream_many_writes
 7/11 Test  #7: byte_stream_many_writes ..........   Passed    0.14 sec
      Start  8: byte_stream_stress_test
 8/11 Test  #8: byte_stream_stress_test ..........   Passed    0.05 sec
      Start 37: no_skip
 9/11 Test #37: no_skip ..........................   Passed    0.01 sec
      Start 38: compile with optimization
10/11 Test #38: compile with optimization ........   Passed   14.25 sec
      Start 39: byte_stream_speed_test
        ByteStream throughput (pop length 4096): 10.39 Gbit/s
        ByteStream throughput (pop length 128):   1.40 Gbit/s
        ByteStream throughput (pop length 32):    0.41 Gbit/s
11/11 Test #39: byte_stream_speed_test ...........   Passed    1.18 sec

100% tests passed, 0 tests failed out of 11

Total Test time (real) =  18.11 sec
Built target check0

And that's checkpoint0 done! I suddenly feel a bit more interested in, and confident about, C++.