This is the lecture notes that I took when attending the graduate course CS204: Computer Networks in UC Riverside.
Reliable and Secure Distributed Programming
This is the note of the textbook “Introduction to Reliable and Secure Dsitributed Programming” to the graduate course CS247 “The Principles of Distributed Computing” in UCR.
Reference:
CS202: OS Paper Critique
Record the critiques that I made for the graduate course CS202: Advanced Opearting System in UCR.
- SPIN
- Exokernel
- Scheduler Activation - Redesign the user-level thread
- Lottery Scheduler Algorithm
P2PFUSE-OPENDHT A BUG
This is the blog to record a bug that I encountered in the process of developing my P2P File System.
Introduction
By saying P2P file system, I mean that this file system doesn’t store the files storaged in the local disk. It will call the API provided by another P2P platform to fetch the data from it, and display the content returned.
For the file system part, I choose FUSE as the tool to develope a user-level file system. In this case, I don’t need to dive into the low-level detail about the disk or something else. Rather, I just need to implement specific API that FUSE provides, where I fill the data to the given data structure such as buf
. And then the system API will call the user-defined API, fetch the required data and form the files.
For the P2P part, I choose the open source project OpenDHT as the backend of the file storage. Though there’re some limitations like the uploading data could only exist 10mins by default, garbled character after the user input string, overall, this has provided adequate functions for our system.
The BUG
Finally, this is the most important part of this blog. Well, actually it is kind of shamed for me to get a bug like this. Because this is a really fundamental-level bug in a high-level project. And the lesson I learnt from this is that, I need to study the fundamental knowledge more and more.
First of all, I decalre the header file which includes many definitions of the functions, class and global variables. And the issues lie at the global variables.
1 | namespace nbfs { |
Here is the definition of all the global variables. All the operations rely on the entry point provided by them. Well, it is totally fine with the first user_name
variable, becase we don’t want it to be modified, and it should be visible to all the functions.
However, for the latter two variables, there’s a huge mistake. I set them to be static. In which case, the variable imported by the source file ending with cpp
will all have different entry point of their own version. For example, I initialize the entry point in main.cpp
file, however, when I call in the operations.cpp
file, it will be set to NULL. I spent a huge time on this, because the core function dht::get()
function will be called even though the node has not been initialized. And I just sticked on this point over and over again, failed to get the truth that the variable should not be static.
The solution
The correct way to declare the global variable in cpp is to give the definition in the header file like this
1 | // hard coded user name, used for the namespace for files |
And this is just the definition, we will need to declare it in the source file where we initialize it. After that, all other files includes the header file could directly call the global variables right now.
Chord: A Peer-to-peer Lookup Protocol
Chord: A Scalable Peer-to-peer Lookup Protocol for Internet Applications
System Model
Support the following features:
- load balance: spreading keys evenly over the nodes
- decentralization: fully distributed; no node is more important than any other
- scalability: the cost of a Chord loopup grows as the log of the number of nodes
- availability: automatically adjusts its internal tables to reflect newly joined nodes as well as node failures
- flexible naming: Chord places no constraints on the structure of the keys it looks up
The Chord Protocol
The consistent hash function assigns each node and key an m-bit identifier using SHA-1 as a base hash function. A node’s identifier is chosen by hashing the node’s IP address, while a key identifier is produced by hashing the key. The identifier length m must be large enough to make the probability of two nodes or keys hashing to the same identifier negligible.
The range of the identifier circle is $2^m$. Key k is assigned to the first node whose identifier is equal to or follows k in the circle.
The lookup algorithm makes use of binary search tech. Every node will store a table with m entries. The $i^{th}$ entry in the table at node n contains the identity of the first node s that succeeds n by at least $2^{i-1}$ on the circle. We denote the $i^{th}$ entry of node n by $n.finger[i]$.
Since from any node, we can reach half of the entire Chord circle, the lookup operation would take O($\log n$) cost.
Impact of node joins on lookups: Sometimes the nodes in the affected region have incorrect successor pointers, or keys may not yet have migrated to newly joined nodes, and the lookup may fail. The higher-layer software using Chord has to notice that the desired data was not found, and has the option of retrying the lookup after a pause.
hashlib.md5 - Understand hexlify and unhexlify
When I was implementing the Linux MD5 crypt method for the user password, I encountered this problem. To solve a problem in this low level benefit me a lot. Here is the summary for what I learned from this.
ASCII Encoding
Basically, string in python 3 are encoded in utf-8. This is a encoding method with a variable length for each character. If we are going to operate on the byte or bit level, it is not impossible to do that based on this encoding method. This we need to convert the string to another data structure - bytes.
However, there exists many ways to interpret the original string to a bytes array. And the defualt translation method is utf-8
for the encode()
method of data structure str
. No matter what encoding methods we are going to use, they are operating on the character level. That is to say, every character will be encoded in different ways based on the request encoding method.
digest & hexdigest
These are the method included in the data structure _Hash
, which is the return value of hashlib.md5(bytes)
. We have two ways to get the printable result from it:
digest()
return the ascii encoding style result: every character in the output has the size of 8 bit, which means that there will be totally 16 characters (maybe includes invisible character).
If we directly print the result to the console, we will get the format like
\x..\x..\x..
. That’s the low level storage method for ASCII encoding method.hexdigest()
For every ASCII character, extract the two hex character from the original representation. For example,
\x65\xa9
will be transformed to65a9
.
exlify & unhexlify
These are the method included in the package binascii
, which is aimed to convert binary to ascii character, and vice versa.
- We can pass the result from
digest()
method tohexlify
, and we will get the same result withunhexlify
. - We can pass the result from
hexdigest()
method tounhexlify
, and we will get the same result withhexlify
.
How to learn Python
When coming into contact with a new package or function, it is necessary to find out what exactly it is designed to do. And the official document is a good resource to read about.
For example, only after this experience, I got the idea of what the package binascii
is doing.
Reference
Leetcode# [14] Longest Common Prefix
One novel solution to the easy problem
1 | class Solution: |
It works because the sort function will sort the strings by prefix. So if the last string has the same prefix with the first string, it means that all the string within the range would satisfy the property of common prefix.
Computer Networks: A System Approach
Chapter 1 Foundation
At one time, the term network meant the set of serial lines used to attach dumb terminals to mainframe computers. Other important networks include the voice telephone network and the cable TV network used to disseminate video signals.
Chapter 3 Internetworking
Part 1 Switching and Bridging
Concerning about how to forward some packet or frame from source to destination.
Datagrams
It is a connectionless protocol, which means that we can directly send the packet to the switch without establishing the connection. Hosts need to compete with each other to get the slot in the switch so that the packet could be sent out to the destination. In the packet, we need to specify the destination addess, while making use of the forwarding table, to know which port we are going to send the packet to.
Virtual Circuiit Switching
It is a connection-oriented protocol, which means that in order to deliver the packet to the other host, it will first try to establish the connection, then send the packet, and finally close the connection. There’re mainly two kinds of VC, wihch are Permanent VC and Switching VC. The former one is usually set by the network administrator to maintain a long-lived connection, which is kind of like the physical cable connection, while the latter one is just like the definition.
It will number the established link between host and switch, or switch and switch, so that the connection could be remembered. Thus, it don’t need to include the destination address in the packet bit, rather it would only need the VCI [Virtual Circuit Identifier].
The well-known usages of this tech include X.25 network, Frame Relay, Asynchronous Transfer Mode (ATM) and virtual private networks (VPN).
Fully Understand Binary Search
Here is a blog particularly for the Binary Search implementation. Though it is a very fundamental algorithm, it stll lays a foundation for a lot of other problems. And it is necessary to dive deep for it.
First let’s talk about the traditional implementation version of the binary search:
1 | bool search(int x[], int n, int k) { |
Observation:
- The scope of the initial boundary is [0, n - 1], which is the entire available elements set. The reason why we choose this could be explained by the update strategy for the boundary variables
l
andr
. Since they are updated asmid - 1
ormid + 1
, they could be set to any elements inside the entire scope. So we should not make the boundary exceed the valid index set. - The update strategy for the
l
andr
is reasonable when we think about a decision tree. Take left boundaryl
for example, if the current midpointmid
could not meet the predefined requirement, then all the elements on the leftside ofmid
, while including the midpoint, should be excluded from the choice scope. And we can do this by assignmid + 1
tol
. - According to the update strategy, it is possible that we will reach the situation when
l
equals tor
. Thus, the while clause should bel <= r
, which includes the equality situation. - It is always helpful to consider the situation when the binary search hit the end, so that we can easily find the correct boundary and update strategy. For example, in leetcode [35], consider what will happen in the final step where l == r - 1, and there’s no element equal to target. According to the previous updating strategy,
l
will be set tomid + 1
and the binary search is finished. So the final output of the algorithm will be at the first value that is equal or bigger to the target. - The template could easily support the operations like finding equality in list and find the first element that is not satisfied the mentioned property [in this case, element in
l
will be the first element that is not satisfied the property]. - https://leetcode-cn.com/problems/find-first-and-last-position-of-element-in-sorted-array/solution/er-fen-cha-zhao-suan-fa-xi-jie-xiang-jie-by-labula/