CS204: Advanced Computer Networks

Posted on 2020-04-28 | Post modified: 2020-04-28

Words count in article: 146

This is the lecture notes that I took when attending the graduate course CS204: Computer Networks in UC Riverside.

Reliable and Secure Distributed Programming

Posted on 2020-04-26 | Post modified: 2020-05-19

Words count in article: 2.7k

This is the note of the textbook “Introduction to Reliable and Secure Dsitributed Programming” to the graduate course CS247 “The Principles of Distributed Computing” in UCR.

Reference:

CS202: OS Paper Critique

Posted on 2020-04-26 | Post modified: 2020-04-28

Words count in article: 1.6k

Record the critiques that I made for the graduate course CS202: Advanced Opearting System in UCR.

SPIN
Exokernel
Scheduler Activation - Redesign the user-level thread
Lottery Scheduler Algorithm

P2PFUSE-OPENDHT A BUG

Posted on 2020-03-11 | Post modified: 2020-03-11

Words count in article: 570

This is the blog to record a bug that I encountered in the process of developing my P2P File System.

Introduction

By saying P2P file system, I mean that this file system doesn’t store the files storaged in the local disk. It will call the API provided by another P2P platform to fetch the data from it, and display the content returned.

For the file system part, I choose FUSE as the tool to develope a user-level file system. In this case, I don’t need to dive into the low-level detail about the disk or something else. Rather, I just need to implement specific API that FUSE provides, where I fill the data to the given data structure such as buf. And then the system API will call the user-defined API, fetch the required data and form the files.

For the P2P part, I choose the open source project OpenDHT as the backend of the file storage. Though there’re some limitations like the uploading data could only exist 10mins by default, garbled character after the user input string, overall, this has provided adequate functions for our system.

The BUG

Finally, this is the most important part of this blog. Well, actually it is kind of shamed for me to get a bug like this. Because this is a really fundamental-level bug in a high-level project. And the lesson I learnt from this is that, I need to study the fundamental knowledge more and more.

First of all, I decalre the header file which includes many definitions of the functions, class and global variables. And the issues lie at the global variables.

namespace nbfs {
	// hard coded user name, used for the namespace for files
	static std::string user_name = "hongbo/";

	// connection entry
	static dht::DhtRunner node;

	// root file node
	static FileNode* root;
}

Here is the definition of all the global variables. All the operations rely on the entry point provided by them. Well, it is totally fine with the first user_name variable, becase we don’t want it to be modified, and it should be visible to all the functions.

However, for the latter two variables, there’s a huge mistake. I set them to be static. In which case, the variable imported by the source file ending with cpp will all have different entry point of their own version. For example, I initialize the entry point in main.cpp file, however, when I call in the operations.cpp file, it will be set to NULL. I spent a huge time on this, because the core function dht::get() function will be called even though the node has not been initialized. And I just sticked on this point over and over again, failed to get the truth that the variable should not be static.

The solution

The correct way to declare the global variable in cpp is to give the definition in the header file like this

// hard coded user name, used for the namespace for files
static std::string user_name = "hongbo/";

// connection entry
extern dht::DhtRunner node;

// root file node
extern FileNode* root;

// map for file names
extern std::map<std::string, bool> exists;

// log file
extern FILE *logfile;

And this is just the definition, we will need to declare it in the source file where we initialize it. After that, all other files includes the header file could directly call the global variables right now.

Chord: A Peer-to-peer Lookup Protocol

Posted on 2020-01-26 | Post modified: 2020-06-10

Words count in article: 321

Chord: A Scalable Peer-to-peer Lookup Protocol for Internet Applications

System Model

Support the following features:

load balance: spreading keys evenly over the nodes
decentralization: fully distributed; no node is more important than any other
scalability: the cost of a Chord loopup grows as the log of the number of nodes
availability: automatically adjusts its internal tables to reflect newly joined nodes as well as node failures
flexible naming: Chord places no constraints on the structure of the keys it looks up

The Chord Protocol

The consistent hash function assigns each node and key an m-bit identifier using SHA-1 as a base hash function. A node’s identifier is chosen by hashing the node’s IP address, while a key identifier is produced by hashing the key. The identifier length m must be large enough to make the probability of two nodes or keys hashing to the same identifier negligible.

The range of the identifier circle is $2^m$. Key k is assigned to the first node whose identifier is equal to or follows k in the circle.

The lookup algorithm makes use of binary search tech. Every node will store a table with m entries. The $i^{th}$ entry in the table at node n contains the identity of the first node s that succeeds n by at least $2^{i-1}$ on the circle. We denote the $i^{th}$ entry of node n by $n.finger[i]$.

finger

Since from any node, we can reach half of the entire Chord circle, the lookup operation would take O($\log n$) cost.

Impact of node joins on lookups: Sometimes the nodes in the affected region have incorrect successor pointers, or keys may not yet have migrated to newly joined nodes, and the lookup may fail. The higher-layer software using Chord has to notice that the desired data was not found, and has the option of retrying the lookup after a pause.

hashlib.md5 - Understand hexlify and unhexlify

Posted on 2020-01-21 | Post modified: 2020-01-21

Words count in article: 421

When I was implementing the Linux MD5 crypt method for the user password, I encountered this problem. To solve a problem in this low level benefit me a lot. Here is the summary for what I learned from this.

ASCII Encoding

Basically, string in python 3 are encoded in utf-8. This is a encoding method with a variable length for each character. If we are going to operate on the byte or bit level, it is not impossible to do that based on this encoding method. This we need to convert the string to another data structure - bytes.

However, there exists many ways to interpret the original string to a bytes array. And the defualt translation method is utf-8 for the encode() method of data structure str. No matter what encoding methods we are going to use, they are operating on the character level. That is to say, every character will be encoded in different ways based on the request encoding method.

digest & hexdigest

These are the method included in the data structure _Hash, which is the return value of hashlib.md5(bytes). We have two ways to get the printable result from it:

digest()

return the ascii encoding style result: every character in the output has the size of 8 bit, which means that there will be totally 16 characters (maybe includes invisible character).

If we directly print the result to the console, we will get the format like \x..\x..\x... That’s the low level storage method for ASCII encoding method.
hexdigest()

For every ASCII character, extract the two hex character from the original representation. For example, \x65\xa9 will be transformed to 65a9.

exlify & unhexlify

These are the method included in the package binascii, which is aimed to convert binary to ascii character, and vice versa.

We can pass the result from digest() method to hexlify, and we will get the same result with unhexlify.
We can pass the result from hexdigest() method to unhexlify, and we will get the same result with hexlify.

How to learn Python

When coming into contact with a new package or function, it is necessary to find out what exactly it is designed to do. And the official document is a good resource to read about.

For example, only after this experience, I got the idea of what the package binascii is doing.

Reference

Leetcode# [14] Longest Common Prefix

Posted on 2020-01-07 | Post modified: 2020-01-07

Words count in article: 91

One novel solution to the easy problem

class Solution:
    def longestCommonPrefix(self, strs):
        """
        :type strs: List[str]
        :rtype: str
        """
        if not strs: return ""
        if len(strs) == 1: return strs[0]
        
        strs.sort()
        p = ""
        for x, y in zip(strs[0], strs[-1]):
            if x == y: p+=x
            else: break
        return r

It works because the sort function will sort the strings by prefix. So if the last string has the same prefix with the first string, it means that all the string within the range would satisfy the property of common prefix.

Computer Networks: A System Approach

Posted on 2019-12-22 | Post modified: 2020-01-07

Words count in article: 287

Chapter 1 Foundation

At one time, the term network meant the set of serial lines used to attach dumb terminals to mainframe computers. Other important networks include the voice telephone network and the cable TV network used to disseminate video signals.

Chapter 3 Internetworking

Part 1 Switching and Bridging

Concerning about how to forward some packet or frame from source to destination.

Datagrams

It is a connectionless protocol, which means that we can directly send the packet to the switch without establishing the connection. Hosts need to compete with each other to get the slot in the switch so that the packet could be sent out to the destination. In the packet, we need to specify the destination addess, while making use of the forwarding table, to know which port we are going to send the packet to.

Virtual Circuiit Switching

It is a connection-oriented protocol, which means that in order to deliver the packet to the other host, it will first try to establish the connection, then send the packet, and finally close the connection. There’re mainly two kinds of VC, wihch are Permanent VC and Switching VC. The former one is usually set by the network administrator to maintain a long-lived connection, which is kind of like the physical cable connection, while the latter one is just like the definition.

It will number the established link between host and switch, or switch and switch, so that the connection could be remembered. Thus, it don’t need to include the destination address in the packet bit, rather it would only need the VCI [Virtual Circuit Identifier].

The well-known usages of this tech include X.25 network, Frame Relay, Asynchronous Transfer Mode (ATM) and virtual private networks (VPN).

Fully Understand Binary Search

Posted on 2019-12-21 | Post modified: 2020-03-19

Words count in article: 427

Here is a blog particularly for the Binary Search implementation. Though it is a very fundamental algorithm, it stll lays a foundation for a lot of other problems. And it is necessary to dive deep for it.

First let’s talk about the traditional implementation version of the binary search:

bool search(int x[], int n, int k) {
    int l = 0, r = n-1;
    while (l <= r) {
        int m = (l+r)/2;
        if (x[m] == k) return true;
        if (x[m] < k) l = m+1; else r = m-1;
    }
    return false;
}

Observation:

The scope of the initial boundary is [0, n - 1], which is the entire available elements set. The reason why we choose this could be explained by the update strategy for the boundary variables l and r. Since they are updated as mid - 1 or mid + 1, they could be set to any elements inside the entire scope. So we should not make the boundary exceed the valid index set.
The update strategy for the l and r is reasonable when we think about a decision tree. Take left boundary l for example, if the current midpoint mid could not meet the predefined requirement, then all the elements on the leftside of mid, while including the midpoint, should be excluded from the choice scope. And we can do this by assign mid + 1 to l.
According to the update strategy, it is possible that we will reach the situation when l equals to r. Thus, the while clause should be l <= r, which includes the equality situation.
It is always helpful to consider the situation when the binary search hit the end, so that we can easily find the correct boundary and update strategy. For example, in leetcode [35], consider what will happen in the final step where l == r - 1, and there’s no element equal to target. According to the previous updating strategy, l will be set to mid + 1 and the binary search is finished. So the final output of the algorithm will be at the first value that is equal or bigger to the target.
The template could easily support the operations like finding equality in list and find the first element that is not satisfied the mentioned property [in this case, element in l will be the first element that is not satisfied the property].
https://leetcode-cn.com/problems/find-first-and-last-position-of-element-in-sorted-array/solution/er-fen-cha-zhao-suan-fa-xi-jie-xiang-jie-by-labula/

Effective Go

Posted on 2019-07-15 | Post modified: 2019-07-15

Words count in article: 2.6k

本文是关于美团tutor给出的学习资料 — 官方文档”Effective Go”的笔记

参考资料：

秦baibai

So we beat on, boats against the current, borne back ceaselessly into the past.

RSS

GitHub E-Mail Instagram