hashlib.md5 - Understand hexlify and unhexlify

I ran into this problem while implementing the Linux MD5 crypt method for user passwords. Solving a problem at this low level taught me a lot. Here is a summary of what I learned.

ASCII Encoding

Basically, a string (str) in Python 3 is a sequence of Unicode characters, and UTF-8 is the default encoding used to turn it into bytes. UTF-8 is a variable-length encoding: a character takes between one and four bytes. If we want to operate at the byte or bit level, we cannot do that on the string directly. Thus we need to convert the string to another data structure: bytes.

However, there are many ways to translate the original string into a bytes array. The default for the encode() method of str is utf-8. Whichever encoding we use, it operates at the character level. That is to say, each character is encoded differently depending on the requested encoding.
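As a small sketch of the point above, the snippet below encodes one string with two different encodings and compares the lengths. The sample string "héllo" and the choice of Latin-1 as the second encoding are only assumptions for illustration.

```python
text = "héllo"  # hypothetical sample string with one non-ASCII character

utf8_bytes = text.encode()             # no argument: the default is UTF-8
latin1_bytes = text.encode("latin-1")  # same characters, different encoding

print(len(text))          # number of characters in the str
print(len(utf8_bytes))    # number of bytes after UTF-8 encoding
print(len(latin1_bytes))  # number of bytes after Latin-1 encoding

# Decoding with the matching encoding gives back the original string.
print(utf8_bytes.decode("utf-8") == text)
```

The character "é" takes two bytes in UTF-8 but only one in Latin-1, so the two bytes objects differ even though they came from the same string.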

digest & hexdigest

These are methods of the _Hash object, which is the return value of hashlib.md5(bytes). There are two ways to get the result out of it:

  • digest()

    returns the raw result as a bytes object: MD5 always produces 128 bits, so there are 16 bytes in total, each 8 bits in size (some of them may not be printable characters).

    If we print the result directly to the console, we get a format like \x..\x..\x... That is how Python displays a byte that has no printable ASCII character: a \x escape followed by the two hex digits of the byte value.

  • hexdigest()

    returns the same result as a string of hex characters: every byte is written as two hex characters, 32 in total. For example, \x65\xa9 will be transformed to 65a9.
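To make the difference concrete, here is a minimal sketch that hashes a sample input and looks at both forms. The input b"hello" is just an assumption for the example.

```python
import hashlib

h = hashlib.md5(b"hello")  # b"hello" is only a sample input

raw = h.digest()           # bytes object holding the raw hash
hex_text = h.hexdigest()   # str holding the same hash as hex characters

print(type(raw), len(raw))            # bytes, 16 of them
print(type(hex_text), len(hex_text))  # str, 32 characters
print(raw)       # shown with \x escapes where a byte is not printable
print(hex_text)

# bytes.hex() produces the same text as hexdigest().
print(raw.hex() == hex_text)
```

Both values carry exactly the same information; hexdigest() is simply the form that is safe to print or store as text.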

hexlify & unhexlify

These are functions in the binascii module, which converts between binary data and ASCII-encoded representations of it.

  • We can pass the result of digest() to hexlify, and we get the same value as hexdigest() (as bytes rather than str).
  • We can pass the result of hexdigest() to unhexlify, and we get the same value as digest().
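The two relations above can be checked with a short sketch. Again, b"hello" is only an assumed sample input.

```python
import binascii
import hashlib

h = hashlib.md5(b"hello")  # sample input, chosen only for illustration

# hexlify: 16 raw bytes -> 32 ASCII hex digits (returned as bytes, not str)
hex_bytes = binascii.hexlify(h.digest())

# unhexlify: 32 hex digits -> the 16 raw bytes again
raw_bytes = binascii.unhexlify(h.hexdigest())

print(hex_bytes.decode("ascii") == h.hexdigest())  # True
print(raw_bytes == h.digest())                     # True

# The two functions are inverses of each other.
print(binascii.unhexlify(hex_bytes) == h.digest())  # True
```

For plain conversions like these, the built-in bytes.hex() and bytes.fromhex() do the same job without importing binascii.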

How to learn Python

When you come into contact with a new package or function, it is necessary to find out exactly what it is designed to do. The official documentation is a good resource for that.

For example, it was only after this experience that I understood what the binascii package does.

Reference

  1. https://www.daniweb.com/programming/software-development/threads/494123/how-can-i-add-add-x
  2. http://www.asciitable.com/
  3. https://200ok.ch/posts/2018-12-09_unhexlify.html