[Python] How to Encode and Decode Korean (Hangul) Text in Python

Maxima's Lab

Minima 2024. 4. 13. 13:14

Hello! Today we will look at how to encode and decode Korean (Hangul) text in Python.

To encode Hangul text, each character first has to be split into its initial consonant (choseong), medial vowel (jungseong), and final consonant (jongseong).
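As a rough illustration of what splitting a syllable into its three parts means, the sketch below uses the arithmetic layout of the Unicode Hangul Syllables block (U+AC00 to U+D7A3). The helper name `split_syllable` and the constants are assumptions made for this example; they are not part of the code in this post, which relies on `unicodedata.normalize` instead.

```python
import unicodedata

# Constants from the Unicode Hangul syllable layout (names are illustrative).
S_BASE = 0xAC00   # first precomposed syllable
L_BASE = 0x1100   # first initial consonant (choseong)
V_BASE = 0x1161   # first medial vowel (jungseong)
T_BASE = 0x11A7   # one before the first final consonant (index 0 = no final)
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28


def split_syllable(ch):
    """Split one precomposed Hangul syllable into its conjoining jamo."""
    s_index = ord(ch) - S_BASE
    if not 0 <= s_index < L_COUNT * V_COUNT * T_COUNT:
        raise ValueError(f"{ch!r} is not a precomposed Hangul syllable")
    l_index = s_index // (V_COUNT * T_COUNT)
    v_index = (s_index % (V_COUNT * T_COUNT)) // T_COUNT
    t_index = s_index % T_COUNT
    jamos = [chr(L_BASE + l_index), chr(V_BASE + v_index)]
    if t_index:  # 0 means the syllable has no final consonant
        jamos.append(chr(T_BASE + t_index))
    return jamos


parts = split_syllable("\ud55c")  # the syllable 한
print([f"U+{ord(j):04X}" for j in parts])
# The arithmetic split should agree with NFD normalization
print("".join(parts) == unicodedata.normalize("NFD", "\ud55c"))
```

For this syllable the split yields U+1112, U+1161, and U+11AB, which is the same sequence NFD normalization produces.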

First, let's look at the consonants and vowels that can act as an initial, medial, or final jamo.

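def generate_hangul_jamos():
    # Initial consonants (choseong), U+1100 to U+1112: 19 jamo
    choseong = [chr(code) for code in range(0x1100, 0x1113)]
    # Medial vowels (jungseong), U+1161 to U+1175: 21 jamo
    jungseong = [chr(code) for code in range(0x1161, 0x1176)]
    # Final consonants (jongseong), U+11A8 to U+11C2: 27 jamo
    jongseong = [chr(code) for code in range(0x11A8, 0x11C3)]
    return choseong + jungseong + jongseong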

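The function above builds the jamo that can appear as an initial, a medial, and a final, and returns them concatenated into a single list of 67 elements.

The function returns the following list.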

['ᄀ', 'ᄁ', 'ᄂ', 'ᄃ', 'ᄄ', 'ᄅ', 'ᄆ', 'ᄇ', 'ᄈ', 'ᄉ', 'ᄊ', 'ᄋ', 'ᄌ', 'ᄍ', 'ᄎ', 'ᄏ', 'ᄐ', 'ᄑ', 'ᄒ', 'ᅡ', 'ᅢ', 'ᅣ', 'ᅤ', 'ᅥ', 'ᅦ', 'ᅧ', 'ᅨ', 'ᅩ', 'ᅪ', 'ᅫ', 'ᅬ', 'ᅭ', 'ᅮ', 'ᅯ', 'ᅰ', 'ᅱ', 'ᅲ', 'ᅳ', 'ᅴ', 'ᅵ', 'ᆨ', 'ᆩ', 'ᆪ', 'ᆫ', 'ᆬ', 'ᆭ', 'ᆮ', 'ᆯ', 'ᆰ', 'ᆱ', 'ᆲ', 'ᆳ', 'ᆴ', 'ᆵ', 'ᆶ', 'ᆷ', 'ᆸ', 'ᆹ', 'ᆺ', 'ᆻ', 'ᆼ', 'ᆽ', 'ᆾ', 'ᆿ', 'ᇀ', 'ᇁ', 'ᇂ']
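A quick way to sanity-check this list is to count each group and look up the official Unicode character names. The sketch below repeats the three ranges so that it runs on its own; the `groups` dictionary and its labels are only there for display.

```python
import unicodedata

groups = {
    "choseong": [chr(c) for c in range(0x1100, 0x1113)],
    "jungseong": [chr(c) for c in range(0x1161, 0x1176)],
    "jongseong": [chr(c) for c in range(0x11A8, 0x11C3)],
}

# Print the size of each group and the Unicode names of its first and last jamo
for label, jamos in groups.items():
    print(label, len(jamos), unicodedata.name(jamos[0]), "...", unicodedata.name(jamos[-1]))

# The initial and final forms of the same consonant are distinct code points,
# and both differ from the compatibility jamo U+3131 that a keyboard produces.
print(hex(ord(groups["choseong"][0])), hex(ord(groups["jongseong"][0])), hex(ord("\u3131")))
```

This matters in practice: a standalone consonant typed on a keyboard is a compatibility jamo, so it will not be found in a table built from the conjoining ranges above.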

Next, the following function returns a dictionary that maps each element of the list above to its index.

def generate_hangul_jamo_to_index():
    hangul_jamos = generate_hangul_jamos()
    jamo_to_index = {jamo: index for index, jamo in enumerate(hangul_jamos)}
    return jamo_to_index

Running this function gives the following result.

{'ᄀ': 0, 'ᄁ': 1, 'ᄂ': 2, 'ᄃ': 3, 'ᄄ': 4, 'ᄅ': 5, 'ᄆ': 6, 'ᄇ': 7, 'ᄈ': 8, 'ᄉ': 9, 'ᄊ': 10, 'ᄋ': 11, 'ᄌ': 12, 'ᄍ': 13, 'ᄎ': 14, 'ᄏ': 15, 'ᄐ': 16, 'ᄑ': 17, 'ᄒ': 18, 'ᅡ': 19, 'ᅢ': 20, 'ᅣ': 21, 'ᅤ': 22, 'ᅥ': 23, 'ᅦ': 24, 'ᅧ': 25, 'ᅨ': 26, 'ᅩ': 27, 'ᅪ': 28, 'ᅫ': 29, 'ᅬ': 30, 'ᅭ': 31, 'ᅮ': 32, 'ᅯ': 33, 'ᅰ': 34, 'ᅱ': 35, 'ᅲ': 36, 'ᅳ': 37, 'ᅴ': 38, 'ᅵ': 39, 'ᆨ': 40, 'ᆩ': 41, 'ᆪ': 42, 'ᆫ': 43, 'ᆬ': 44, 'ᆭ': 45, 'ᆮ': 46, 'ᆯ': 47, 'ᆰ': 48, 'ᆱ': 49, 'ᆲ': 50, 'ᆳ': 51, 'ᆴ': 52, 'ᆵ': 53, 'ᆶ': 54, 'ᆷ': 55, 'ᆸ': 56, 'ᆹ': 57, 'ᆺ': 58, 'ᆻ': 59, 'ᆼ': 60, 'ᆽ': 61, 'ᆾ': 62, 'ᆿ': 63, 'ᇀ': 64, 'ᇁ': 65, 'ᇂ': 66}
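Because the three groups are concatenated in order, the index alone tells you which role a jamo plays: 0 to 18 are initials, 19 to 39 are medials, and 40 to 66 are finals. The helper `jamo_role` below is a hypothetical addition sketched under that assumption; it is not part of the original code.

```python
def jamo_role(index):
    """Return the role of an index in the 67-entry jamo table (illustrative helper)."""
    if 0 <= index <= 18:
        return "choseong"
    if 19 <= index <= 39:
        return "jungseong"
    if 40 <= index <= 66:
        return "jongseong"
    raise ValueError(f"index {index} is outside the jamo table")


# Indices 11, 19, 43 are an initial, a medial, and a final in the table above
print([jamo_role(i) for i in (11, 19, 43)])
```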

Finally, the following code encodes a given text and returns it as a list of integers.

import unicodedata

jamo_to_index = generate_hangul_jamo_to_index()

def encode_text_to_jamo_indices(text, jamo_to_index):
    # NFD splits each precomposed syllable into its conjoining jamo
    decomposed_text = unicodedata.normalize('NFD', text)
    # Characters that are not in the table (spaces, Latin letters, digits) are skipped
    return [jamo_to_index[jamo] for jamo in decomposed_text if jamo in jamo_to_index]

sample_text = "안녕하세요 내 이름은 홍길동입니다"
encoded_indices = encode_text_to_jamo_indices(sample_text, jamo_to_index)

print("Encoded Indices:", encoded_indices)

The output is as follows.

Encoded Indices: [11, 19, 43, 2, 25, 60, 18, 19, 9, 24, 11, 31, 2, 20, 11, 39, 5, 37, 55, 11, 37, 43, 18, 27, 60, 0, 39, 47, 3, 27, 60, 11, 39, 56, 2, 39, 3, 19]
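The index list has no entry for the two spaces in the sample sentence. If word boundaries matter, one option is to reserve an extra index for them. The sketch below is a variation under that assumption; `build_jamo_table`, `SPACE_INDEX`, and `encode_with_spaces` are hypothetical names that do not appear in the original code.

```python
import unicodedata


def build_jamo_table():
    """Rebuild the 67-entry jamo table so this sketch runs on its own."""
    codes = (
        list(range(0x1100, 0x1113))    # choseong
        + list(range(0x1161, 0x1176))  # jungseong
        + list(range(0x11A8, 0x11C3))  # jongseong
    )
    return {chr(code): index for index, code in enumerate(codes)}


JAMO_TO_INDEX = build_jamo_table()
SPACE_INDEX = len(JAMO_TO_INDEX)  # 67, the first index not used by a jamo


def encode_with_spaces(text):
    """Encode Hangul as jamo indices and keep spaces as SPACE_INDEX."""
    encoded = []
    for ch in unicodedata.normalize("NFD", text):
        if ch in JAMO_TO_INDEX:
            encoded.append(JAMO_TO_INDEX[ch])
        elif ch == " ":
            encoded.append(SPACE_INDEX)
        # any other character is still dropped
    return encoded


print(encode_with_spaces("내 이름"))
```

A decoder for this variant would need to map `SPACE_INDEX` back to a space before composing the jamo.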

The following code decodes the encoded result back into text.

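def decode_jamo(indices, jamo_to_index):
    # Invert the mapping: index -> jamo
    idx2jamo = {idx: jamo for jamo, idx in jamo_to_index.items()}
    # Join the jamo first, then normalize once so NFC can compose them into syllables
    # (normalizing each jamo on its own leaves it unchanged)
    composed = unicodedata.normalize('NFC', ''.join(idx2jamo[idx] for idx in indices))
    return composed

decoded_text = decode_jamo(encoded_indices, jamo_to_index)
print("Decoded Text:", decoded_text)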

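The output is as follows. The spaces from the original sentence are missing because the encoder skips every character that is not in the jamo table.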

Decoded Text: 안녕하세요내이름은홍길동입니다
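One detail worth checking is where the NFC normalization happens. A single conjoining jamo has no composed form, so normalizing jamo one at a time leaves them unchanged; composition into syllables only happens when the whole jamo sequence is normalized together. The sketch below illustrates the difference on a short string; the variable names are illustrative.

```python
import unicodedata

text = unicodedata.normalize("NFC", "홍길동")   # 3 precomposed syllables
jamos = unicodedata.normalize("NFD", text)      # 9 conjoining jamo

# Normalizing each jamo separately: nothing to compose with, so nothing changes.
per_jamo = "".join(unicodedata.normalize("NFC", j) for j in jamos)

# Normalizing the joined sequence: the jamo are composed back into syllables.
whole = unicodedata.normalize("NFC", jamos)

print(len(jamos), len(per_jamo), len(whole))
print(per_jamo == text, whole == text)
```

Both strings usually render identically on screen, so comparing lengths or using `==` is a more reliable check than looking at the printed text.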

That wraps up our look at how to encode and decode Korean (Hangul) text in Python.

License: Attribution, NonCommercial, NoDerivatives
