Python is a powerful and versatile programming language that supports multiple paradigms. One of its features is support for Unicode, the standard for encoding and representing text in different languages and scripts. In this article, we will explore the ord() function in Python in more detail. We will learn how to use it with different characters, how to handle errors, and what some of its applications are. By the end of this article, you will have a better understanding of what ord() is and how to use it effectively.
What is ord() in Python?
Unicode assigns a unique number, called a code point, to every character in every language. For example, the code point for the letter A is 65, and the code point for the euro sign (€) is 8364.
But how can we convert a character to its corresponding code point in Python? This is where the ord function comes in handy. The ord function in Python takes a string of length one (a single character) as an argument and returns its Unicode code point as an integer. For example, ord('A') returns 65, and ord('€') returns 8364. The ord function is the inverse of the chr function, which takes an integer as an argument and returns its corresponding character. For example, chr(65) returns 'A', and chr(8364) returns '€'.
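As a quick check of this inverse relationship, the short sketch below round-trips a few characters through ord() and chr(). The sample characters are arbitrary choices for illustration.

```python
# Round-trip a few sample characters through ord() and chr().
samples = ['A', '€', 'ß']

for ch in samples:
    code_point = ord(ch)        # character -> integer code point
    restored = chr(code_point)  # integer code point -> character
    print(ch, code_point, restored == ch)
```

Each line should end with True, since chr(ord(ch)) returns the original character.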
Usage of the ord() Function in Python
The ord() function in Python is used to obtain the Unicode code point of a given character. It can be used with a wide range of characters, including alphabets, digits, symbols, punctuation marks, emojis, and non-printable characters.
Here's an example showcasing the usage of the ord() function:
# Using ord with alphabets
print(ord('a')) # 97
print(ord('Z')) # 90
# Using ord with digits
print(ord('0')) # 48
print(ord('9')) # 57
# Using ord with symbols
print(ord('$')) # 36
print(ord('@')) # 64
# Using ord with punctuation marks
print(ord('.')) # 46
print(ord(',')) # 44
# Using ord with emojis
print(ord('😊')) # 128522
print(ord('🐶')) # 128054
# Using ord with non-printable characters
print(ord('\n')) # 10
print(ord('\t')) # 9
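Code points are often written in hexadecimal with a U+ prefix, for example U+20AC for the euro sign. The sketch below shows one way to produce that notation from ord(); the helper name code_point_label is hypothetical and used only for illustration.

```python
def code_point_label(ch):
    """Return the conventional U+XXXX label for a single character."""
    # :04X pads to at least four uppercase hexadecimal digits.
    return f'U+{ord(ch):04X}'

for ch in ['a', '€', '😊', '\n']:
    print(repr(ch), ord(ch), code_point_label(ch))
```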
The output of the ord() function does not depend on any encoding scheme. In Python 3, a string is a sequence of Unicode code points, and ord() always returns the code point itself. Encodings such as UTF-8, which can represent any Unicode character using one to four bytes, only come into play when you convert a string to bytes using the encode() method. If you pass a bytes object of length one to ord(), it returns the value of that byte instead of a code point.
Here's an example:
# Code point of the character (independent of any encoding)
print(ord('€')) # 8364
# Byte value of the same character after encoding to ISO-8859-15
print(ord('€'.encode('iso-8859-15'))) # 164
In the above example, the first call returns the Unicode code point of the euro sign, while the second returns the single byte that ISO-8859-15 uses to represent it. Keeping the two apart is useful in various situations, such as character manipulation, encoding and decoding, and working with international text.
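One way to see how a code point relates to its encoded form is to print both side by side. The sketch below is an added illustration, not part of the original examples; it lists the bytes that two encodings produce for the same character.

```python
ch = '€'

print(ord(ch))                          # 8364, the Unicode code point
print(list(ch.encode('utf-8')))         # [226, 130, 172], three bytes
print(list(ch.encode('iso-8859-15')))   # [164], one byte

# Decoding with the matching encoding recovers the original character.
print(ch.encode('utf-8').decode('utf-8') == ch)  # True
```

The code point stays the same; only the byte representation changes with the encoding.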
Types of Errors in the ord() Function
The ord() function itself raises only one kind of error, the TypeError. Two other exceptions often appear in code that combines ord() with encoding or decoding, so they are covered here as well:
TypeError
UnicodeError
UnicodeEncodeError
1. TypeError
The ord() function raises a TypeError exception if the argument passed to it is not a string of length one. (A bytes object of length one is also accepted and yields the byte value.)
# Passing a string of length more than one
print(ord('Hello'))
# TypeError: ord() expected a character, but string of length 5 found
# Passing a non-string object such as an integer
print(ord(65))
# TypeError: ord() expected string of length 1, but int found
To handle these errors, you can use a try-except block and catch the TypeError exception. Here's an example demonstrating error handling with the ord() function:
# Handling errors with the ord function
try:
    print(ord('Hello'))
    # Raises TypeError: ord() expected a character, but string of length 5 found
except TypeError:
    print('Invalid argument for ord function')
try:
    print(ord(65))
    # Raises TypeError: ord() expected string of length 1, but int found
except TypeError:
    print('Invalid argument for ord function')
In the above code, a try-except block is used to catch the TypeError exception. If the exception is raised, the code within the except block will be executed, allowing you to handle the error gracefully. In this case, the message 'Invalid argument for ord function' is printed to indicate that an invalid argument was provided to the ord() function.
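If the same check is needed in several places, the try-except pattern can be wrapped in a small helper. The following is a minimal sketch; the name safe_ord and its default parameter are hypothetical, not part of Python's standard library.

```python
def safe_ord(value, default=None):
    """Return ord(value), or default if value is not a valid argument."""
    try:
        return ord(value)
    except TypeError:
        # Raised for strings whose length is not 1 and for non-string objects.
        return default

print(safe_ord('A'))      # 65
print(safe_ord('Hello'))  # None
print(safe_ord(''))       # None
print(safe_ord(65))       # None
```

Returning a default keeps the calling code simple, at the cost of hiding which kind of invalid input was passed.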
2. UnicodeError
The UnicodeError is the base class for all Unicode-related encoding and decoding errors. The ord() function does not raise it: in Python 3, a string stores full code points, so characters outside the Basic Multilingual Plane (BMP), which covers code points up to 0xFFFF, are handled like any other character. A UnicodeError can still occur in the code around ord() for reasons such as:
Encoding a string that contains a lone surrogate (a code point in the range 0xD800 to 0xDFFF), which UTF-8 does not allow.
Decoding invalid or ill-formed byte sequences that do not adhere to the chosen encoding.
In these situations, the encode() or decode() call raises a UnicodeError subclass with a specific error message describing the issue.
Example:
# Characters above the BMP work with ord()
print(ord('\U0001F984')) # 129412
try:
    # Encoding a lone surrogate raises UnicodeEncodeError, a subclass of UnicodeError
    '\ud800'.encode('utf-8')
except UnicodeError as e:
    print(f'UnicodeError: {e}')
In this example, the ord() function returns 129412 for a Unicode character (\U0001F984) that is outside the Basic Multilingual Plane (BMP) range. The error comes from the encode() call on the lone surrogate, which raises a UnicodeError with the message "surrogates not allowed."
3. UnicodeEncodeError
The UnicodeEncodeError specifically occurs when there is a problem with encoding Unicode characters into a specific character encoding scheme, such as ASCII, UTF-8, or others. It can happen in scenarios like:
Attempting to encode a Unicode character that is not supported by the chosen encoding scheme.
Encoding a character that falls outside the range of characters representable by the chosen encoding.
In the case of the ord() function, the UnicodeEncodeError can occur if you explicitly try to encode a Unicode character using an encoding scheme (e.g., using .encode() method) that cannot handle that character.
Example:
try:
    # encode() fails before ord() is ever called
    print(ord('😊'.encode('ascii')))
    # Raises UnicodeEncodeError: 'ascii' codec can't encode character '\U0001f60a' in position 0: ordinal not in range(128)
except UnicodeEncodeError as e:
    print(f'UnicodeEncodeError: {e}')
In this example, the encode() method attempts to encode the emoji character 😊 using the ASCII encoding scheme before ord() is called. ASCII cannot represent this character, so a UnicodeEncodeError is raised with the message "ordinal not in range(128)."
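Besides catching the exception, there are two common ways to avoid it: check code points with ord() before encoding, or pass an errors argument to encode(). The sketch below illustrates both; the sample text is an arbitrary choice.

```python
text = 'Hi 😊'

# ASCII covers code points 0 to 127, so ord() can flag problem characters up front.
non_ascii = [ch for ch in text if ord(ch) > 127]
print(non_ascii)  # ['😊']

# The errors parameter of str.encode() controls what happens to such characters.
print(text.encode('ascii', errors='replace'))           # b'Hi ?'
print(text.encode('ascii', errors='ignore'))            # b'Hi '
print(text.encode('ascii', errors='backslashreplace'))  # b'Hi \\U0001f60a'
```

Which handler is appropriate depends on whether losing or escaping the character is acceptable for the output.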
Applications of ord() in Python
The ord function can be used for various purposes in Python programming. Some of them are:
1. Converting characters to numbers:
You can utilize the ord() function to convert individual characters into their corresponding numerical values. This enables performing arithmetic operations on characters. For example:
# Converting characters to numbers
a = ord('a') # 97
b = ord('b') # 98
# Performing arithmetic operations on numbers
c = a + b # 195
d = b - a # 1
# Converting numbers back to characters
e = chr(c) # Ã
f = chr(d) # \x01
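A more practical use of this arithmetic is working out a letter's position in the alphabet or stepping to the next letter. The sketch below assumes lowercase ASCII letters only; the helper names alphabet_index and next_letter are hypothetical.

```python
def alphabet_index(letter):
    """Return the 1-based position of a lowercase ASCII letter (a=1 ... z=26)."""
    return ord(letter) - ord('a') + 1

def next_letter(letter):
    """Return the letter after a lowercase ASCII letter, wrapping z to a."""
    offset = (ord(letter) - ord('a') + 1) % 26
    return chr(ord('a') + offset)

print(alphabet_index('a'))  # 1
print(alphabet_index('z'))  # 26
print(next_letter('c'))     # d
print(next_letter('z'))     # a
```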
2. Encrypting or decrypting messages:
You can use the ord function to encrypt or decrypt messages using a simple algorithm such as Caesar cipher or XOR cipher. For example:
# Encrypting a message using Caesar cipher (shift by 3)
message = 'Hello'
encrypted = ''
for char in message:
    # Get the code point of the character
    code = ord(char)
    # Shift the code point by 3
    code = code + 3
    # Convert the code point back to character
    char = chr(code)
    # Append the character to the encrypted message
    encrypted = encrypted + char
print(encrypted) # Khoor
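The loop above shifts every code point by 3 without wrapping, so letters near the end of the alphabet turn into symbols ('z' becomes '}'). A variant that wraps within the alphabet and leaves non-letters untouched could look like the following sketch; the function name caesar_shift is hypothetical, and only ASCII letters are assumed to be shifted.

```python
def caesar_shift(text, shift):
    """Shift ASCII letters by shift positions, wrapping within the alphabet."""
    result = []
    for char in text:
        if 'a' <= char <= 'z':
            base = ord('a')
        elif 'A' <= char <= 'Z':
            base = ord('A')
        else:
            result.append(char)  # leave non-letters unchanged
            continue
        # Work with the 0-25 offset from the base letter, then wrap with modulo.
        result.append(chr(base + (ord(char) - base + shift) % 26))
    return ''.join(result)

encrypted = caesar_shift('Hello, xyz!', 3)
print(encrypted)                    # Khoor, abc!
print(caesar_shift(encrypted, -3))  # Hello, xyz!
```

Decryption is the same operation with the negated shift.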
3. Sorting strings:
You can use the ord function to sort strings based on the Unicode value of their first character. For example:
# Sorting a list of strings by the code point of the first character
names = ['Charlie', 'Eve', 'Alice', 'David', 'Bob']
sorted_names = sorted(names, key=lambda x: ord(x[0]))
print(sorted_names) # ['Alice', 'Bob', 'Charlie', 'David', 'Eve']
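Python's default string comparison already works code point by code point, which is why uppercase letters sort before lowercase ones. The sketch below shows this effect and one common alternative, key=str.casefold, for case-insensitive ordering; the word list is an arbitrary example.

```python
words = ['apple', 'Banana', 'cherry']

# Default ordering compares code points: 'B' (66) comes before 'a' (97).
print(ord('B'), ord('a'))               # 66 97
print(sorted(words))                    # ['Banana', 'apple', 'cherry']

# Case-insensitive ordering using a key function.
print(sorted(words, key=str.casefold))  # ['apple', 'Banana', 'cherry']
```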
Conclusion
In this article, we learned what the ord function in Python is and how to use it with different characters. We also learned how to handle errors with the ord function and looked at some of its applications. The ord function is a useful built-in function that can help us work with Unicode characters and perform various tasks in Python programming.