GIỚI THIỆU VỀ HÀM BĂM
Hàm băm thông dụng là một phép biến đổi một dãy "số" hoặc một dã "số và chữ cái" thành một số nguyên (khá bé):
H(k) = n
Biến đổi hàm băm là một đối một một chiều, nghĩa là nếu có k1 =/= k2 thì chắc chắn n1 =/n2 nhưng điều ngược lại không chắc đúng.
A Hash Function is a function that converts a given numeric or alphanumeric key to a small practical integer value. The mapped integer value is used as an index in the hash table. In simple terms, a hash function maps a significant number or string to a small integer that can be used as the index in the hash table.
The pair is of the form (key, value), where for a given key, one can find a value using some kind of a “function” that maps keys to values. The key for a given object can be calculated using a function called a hash function. For example, given an array A, if i is the key, then we can find the value by simply looking up A[i].
Types of Hash functions
There are many hash functions that use numeric or alphanumeric keys. This article focuses on discussing different hash functions:
- Division Method.
- Mid Square Method.
- Folding Method.
Let’s begin discussing these methods in detail.
- Multiplication Method.
1. Division Method:
This is the most simple and easiest method to generate a hash value. The hash function divides the value k by M and then uses the remainder obtained.
Formula:
h(K) = k mod M
Here,
k is the key value, and
M is the size of the hash table.
It is best suited that M is a prime number as that can make sure the keys are more uniformly distributed. The hash function is dependent upon the remainder of a division.
Example:
k = 12345
M = 95
h(12345) = 12345 mod 95
= 90
k = 1276
M = 11
h(1276) = 1276 mod 11
= 0
Pros:
- This method is quite good for any value of M.
Cons:
- The division method is very fast since it requires only a single division operation.
- This method leads to poor performance since consecutive keys map to consecutive hash values in the hash table.
2. Mid Square Method:
- Sometimes extra care should be taken to choose the value of M.
The mid-square method is a very good hashing method. It involves two steps to compute the hash value-
- Square the value of the key k i.e. k2
Formula:
- Extract the middle r digits as the hash value.
h(K) = h(k x k)
Here,
k is the key value.
The value of r can be decided based on the size of the table.
Example:
Suppose the hash table has 100 memory locations. So r = 2 because two digits are required to map the key to the memory location.
k = 60
k x k = 60 x 60
= 3600
h(60) = 60
The hash value obtained is 60
Pros:
- The performance of this method is good as most or all digits of the key value contribute to the result. This is because all digits in the key contribute to generating the middle digits of the squared result.
Cons:
- The result is not dominated by the distribution of the top digit or bottom digit of the original key value.
- The size of the key is one of the limitations of this method, as the key is of big size then its square will double the number of digits.
3. Digit Folding Method:
- Another disadvantage is that there will be collisions but we can try to reduce collisions.
This method involves two steps:
- Divide the key-value k into a number of parts i.e. k1, k2, k3,….,kn, where each part has the same number of digits except for the last part that can have lesser digits than the other parts.
Formula:
- Add the individual parts. The hash value is obtained by ignoring the last carry if any.
k = k1, k2, k3, k4, ….., kn
s = k1+ k2 + k3 + k4 +….+ kn
h(K)= s
Here,
s is obtained by adding the parts of the key k
Example:
k = 12345
k1 = 12, k2 = 34, k3 = 5
s = k1 + k2 + k3
= 12 + 34 + 5
= 51
h(K) = 51
Note:
The number of digits in each part varies depending upon the size of the hash table. Suppose for example the size of the hash table is 100, then each part must have two digits except for the last part which can have a lesser number of digits.
4. Multiplication Method
This method involves the following steps:
- Choose a constant value A such that 0 < A < 1.
- Multiply the key value with A.
- Extract the fractional part of kA.
- Multiply the result of the above step by the size of the hash table i.e. M.
Formula:
- The resulting hash value is obtained by taking the floor of the result obtained in step 4.
h(K) = floor (M (kA mod 1))
Here,
M is the size of the hash table.
k is the key value.
A is a constant value.
Example:
k = 12345
A = 0.357840
M = 100
h(12345) = floor[ 100 (12345*0.357840 mod 1)]
= floor[ 100 (4417.5348 mod 1) ]
= floor[ 100 (0.5348) ]
= floor[ 53.48 ]
= 53
Pros:
The advantage of the multiplication method is that it can work with any value between 0 and 1, although there are some values that tend to give better results than the rest.
Cons:
The multiplication method is generally suitable when the table size is the power of two, then the whole process of computing the index by the key using multiplication hashing is very fast.
MỘT SỐ HÀM BĂM THÔNG DỤNG
Commonly used hash functions
Hash functions are widely used in computer science and cryptography for a variety of purposes, including data integrity, digital signatures, password storage, and more.
There are many types of hash functions, each with its own strengths and weaknesses. Here are a few of the most common types:
1. SHA (Secure Hash Algorithm): SHA is a family of cryptographic hash functions designed by the National Security Agency (NSA) in the United States. The most widely used SHA algorithms are SHA-1, SHA-2, and SHA-3. Here’s a brief overview of each:
- SHA-1: SHA-1 is a 160-bit hash function that was widely used for digital signatures and other applications. However, it is no longer considered secure due to known vulnerabilities.
- SHA-2: SHA-2 is a family of hash functions that includes SHA-224, SHA-256, SHA-384, and SHA-512. These functions produce hash values of 224, 256, 384, and 512 bits, respectively. SHA-2 is widely used in security protocols such as SSL/TLS and is considered secure.
2. CRC (Cyclic Redundancy Check): CRC is a non-cryptographic hash function used primarily for error detection in data transmission. It is fast and efficient but is not suitable for security purposes. The basic idea behind CRC is to append a fixed-length check value, or checksum, to the end of a message. This checksum is calculated based on the contents of the message using a mathematical algorithm, and is then transmitted along with the message.
- SHA-3: SHA-3 is the latest member of the SHA family and was selected as the winner of the NIST hash function competition in 2012. It is designed to be faster and more secure than SHA-2 and produces hash values of 224, 256, 384, and 512 bits.
When the message is received, the receiver can recalculate the checksum using the same algorithm, and compare it with the checksum transmitted with the message. If the two checksums match, the receiver can be reasonably certain that the message was not corrupted during transmission.
The specific algorithm used for CRC depends on the application and the desired level of error detection. Some common CRC algorithms include CRC-16, CRC-32, and CRC-CCITT.
3. MurmurHash: MurmurHash is a fast and efficient non-cryptographic hash function designed for use in hash tables and other data structures. It is not suitable for security purposes as it is vulnerable to collision attacks.
4. BLAKE2: BLAKE2 is a cryptographic hash function designed to be fast and secure. It is an improvement over the popular SHA-3 algorithm and is widely used in applications that require high-speed hashing, such as cryptocurrency mining.
BLAKE2 is available in two versions: BLAKE2b and BLAKE2s. BLAKE2b is optimized for 64-bit platforms and produces hash values of up to 512 bits, while BLAKE2s is optimized for 8- to 32-bit platforms and produces hash values of up to 256 bits.
5. Argon2: Argon2 is a memory-hard password hashing function designed to be resistant to brute-force attacks. It is widely used for password storage and is recommended by the Password Hashing Competition. The main goal of Argon2 is to make it difficult for attackers to crack passwords by using techniques such as brute force attacks or dictionary attacks. It achieves this by using a computationally-intensive algorithm that makes it difficult for attackers to perform large numbers of password guesses in a short amount of time.
Argon2 has several key features that make it a strong choice for password hashing and key derivation:
- Resistance to parallel attacks: Argon2 is designed to be resistant to parallel attacks, meaning that it is difficult for attackers to use multiple processing units, such as GPUs or ASICs, to speed up password cracking.
- Memory-hardness: Argon2 is designed to be memory-hard, meaning that it requires a large amount of memory to compute the hash function. This makes it more difficult for attackers to use specialized hardware to crack passwords.
Resistance to side-channel attacks: Argon2 is designed to be resistant to side-channel attacks, such as timing attacks or power analysis attacks, that could be used to extract information about the password being hashed.
- Customizable: Argon2 is highly customizable, and allows users to adjust parameters such as the memory usage, the number of iterations, and the output length to meet their specific security requirements.
6. MD5 (Message Digest 5): MD5 is a widely-used cryptographic hash function that produces a 128-bit hash value. It is fast and efficient but is no longer recommended for security purposes due to known vulnerabilities. The basic idea behind MD5 is to take an input message of any length, and produce a fixed-length output, known as the hash value or message digest. This hash value is unique to the input message, and is generated using a mathematical algorithm that involves a series of logical operations, such as bitwise operations, modular arithmetic, and logical functions.
MD5 is widely used in a variety of applications, including digital signatures, password storage, and data integrity checks. However, it has been shown to have weaknesses that make it vulnerable to attacks. In particular, it is possible to generate two different messages with the same MD5 hash value, a vulnerability known as a collision attack.
There are many other types of hash functions, each with its own unique features and applications. The choice of hash function depends on the specific requirements of the application, such as speed, security, and memory usage.
Last Updated : 09 Mar, 2023