Korean Character Encodings Guide: Understanding EUC-KR and CP949

Master Korean text processing with our guide on EUC-KR and CP949 encodings, and learn how to convert legacy Korean text to modern UTF-8.

2026-04-13

Handling Korean text (Hangul) requires a clear understanding of the encoding standards used in South Korea. UTF-8 is now the universal standard for modern web and mobile applications, but many legacy systems, older Windows applications, and older databases still rely on EUC-KR and its extension, CP949.

In this guide, we’ll dive into the technical details of Korean character encodings, their relationship to each other, and how to effectively manage conversions for modern development.


1. The Core Standards: EUC-KR and CP949

South Korean digital text has been primarily shaped by two closely related encoding standards.

EUC-KR (The Wansung Standard)

EUC-KR (Extended Unix Code for Korean) is based on the KS X 1001 standard. It is a "Wansung" (pre-composed) encoding system, meaning it encodes each Hangul syllable as a single unit rather than separate characters (Jamo).

  • Pros: It is very efficient for the most common 2,350 Hangul syllables.
  • Cons: It cannot represent all 11,172 possible Hangul syllables, leading to issues with rare characters or names.
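The gap between 2,350 and 11,172 syllables can be checked directly. The sketch below is one way to do it in Python, under the assumption that characters from KS X 1001 keep their EUC-KR bytes in CP949 (both bytes in the range 0xA1 to 0xFE), while the CP949 extension area uses other byte ranges. The helper name `in_ksx1001` is hypothetical, chosen for illustration.

```python
def in_ksx1001(ch: str) -> bool:
    """Return True if `ch` sits in the two-byte KS X 1001 (EUC-KR) area.

    Assumption: in CP949, KS X 1001 characters have both bytes in
    0xA1-0xFE; characters from the CP949 extension do not.
    """
    try:
        raw = ch.encode("cp949")
    except UnicodeEncodeError:
        return False
    return len(raw) == 2 and raw[0] >= 0xA1 and raw[1] >= 0xA1


# All precomposed Hangul syllables occupy U+AC00..U+D7A3 in Unicode.
syllables = [chr(cp) for cp in range(0xAC00, 0xD7A4)]
covered = [s for s in syllables if in_ksx1001(s)]

print(len(syllables), len(covered))

# A common syllable versus a rare one that EUC-KR has no two-byte code for.
print(in_ksx1001("한"), in_ksx1001("뷁"))
```

A check like this can be useful before exporting data to a system that only accepts EUC-KR, since it flags names or words that would not survive the trip.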

CP949 (The Windows Extension)

CP949 (Code Page 949), also known as Unified Hangul Code (UHC), is Microsoft’s proprietary extension of EUC-KR. It is the default ANSI code page on Korean-language Windows and remains extremely common in legacy business software.

  • Why it matters: CP949 solves the main limitation of EUC-KR by supporting all 11,172 possible Hangul syllables while remaining backward compatible with EUC-KR.
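Backward compatibility means that text limited to KS X 1001 characters produces identical bytes under both encodings. A minimal Python sketch, using the standard `euc_kr` and `cp949` codecs and sample strings chosen for illustration:

```python
text = "한글"  # both syllables are part of KS X 1001

euc_bytes = text.encode("euc_kr")
cp949_bytes = text.encode("cp949")

# Same bytes either way, so a CP949 decoder reads EUC-KR data unchanged.
print(euc_bytes.hex(" "), "|", cp949_bytes.hex(" "))
print(euc_bytes == cp949_bytes)
print(euc_bytes.decode("cp949") == text)

# A syllable outside KS X 1001 still gets a two-byte code in CP949.
rare = "뷁"
rare_bytes = rare.encode("cp949")
print(len(rare_bytes), rare_bytes.decode("cp949") == rare)
```

In practice this is why decoding unknown legacy Korean data as CP949 is usually the safer default: it accepts everything EUC-KR does, plus the extension area.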

2. Technical Comparison Table

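| Encoding | Standard   | Type      | Best Use Case               | Unicode Encoding? |
|----------|------------|-----------|-----------------------------|-------------------|
| EUC-KR   | KS X 1001  | Wansung   | Legacy Unix/Linux systems   | No                |
| CP949    | MS Windows | Wansung   | Legacy Windows applications | No                |
| UTF-8    | Unicode    | Universal | All modern Korean software  | Yes               |

Every EUC-KR and CP949 character has a Unicode equivalent, so text in either encoding can be converted to UTF-8 without loss.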

3. Best Practices for Korean Software Development

Transitioning to UTF-8

For any new Korean project, UTF-8 is the only logical choice. It natively supports all Hangul syllables, archaic Hangul characters, and emoji without the limitations of regional encodings.

  • Recommendation: Always use UTF-8 (without BOM) for code files and web content.
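As a rough sketch of what a migration step might look like, the Python function below rewrites one legacy file as UTF-8 without a BOM. The function name `convert_to_utf8`, the file names, and the default of `cp949` as the source encoding are assumptions for illustration; real projects may need per-file encoding choices.

```python
import tempfile
from pathlib import Path


def convert_to_utf8(src: Path, dst: Path, src_encoding: str = "cp949") -> None:
    """Rewrite a legacy Korean text file as UTF-8 without a BOM.

    Decoding is strict, so bytes that are invalid in `src_encoding`
    raise UnicodeDecodeError instead of being silently replaced.
    """
    text = src.read_bytes().decode(src_encoding)
    # Python's "utf-8" codec never emits a BOM (unlike "utf-8-sig").
    dst.write_bytes(text.encode("utf-8"))


# Demo on a throwaway file; working on bytes keeps line endings untouched.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "legacy.txt"
    dst = Path(tmp) / "converted.txt"
    src.write_bytes("안녕하세요\r\n".encode("cp949"))
    convert_to_utf8(src, dst)
    converted = dst.read_bytes()

print(len(converted), converted.startswith(b"\xef\xbb\xbf"))
```

Reading and writing raw bytes rather than opening the files in text mode is a deliberate choice here: it avoids any newline translation, so the only thing that changes is the character encoding.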

Normalization (NFC vs. NFD)

When working with Korean text, it is crucial to handle Unicode normalization correctly.

  • NFC (Canonical Composition): Each Hangul syllable is stored as a single pre-composed code point (e.g., '한' is U+D55C). This is the standard for web, Windows, and Linux.
  • NFD (Canonical Decomposition): Each Hangul syllable is decomposed into conjoining Jamo (e.g., '한' becomes U+1112, U+1161, U+11AB, which render as ㅎ, ㅏ, ㄴ combined). This form is most often encountered in file names coming from macOS file systems.
  • Why it matters: The two forms look identical on screen but are different code point sequences, so a search for "한" in NFC will fail to find "한" in NFD unless your system is "normalization aware."
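The mismatch is easy to reproduce with Python's standard `unicodedata` module. In this sketch, the `contains` helper and the sample file name are hypothetical; the idea is simply to normalize both sides to the same form before comparing.

```python
import unicodedata

nfc = unicodedata.normalize("NFC", "한")   # one precomposed code point
nfd = unicodedata.normalize("NFD", nfc)    # three conjoining Jamo

print([f"U+{ord(c):04X}" for c in nfc])
print([f"U+{ord(c):04X}" for c in nfd])

# The two strings look the same when rendered but compare unequal.
print(nfc == nfd)


def contains(haystack: str, needle: str) -> bool:
    """Substring search that normalizes both sides to NFC first."""
    return unicodedata.normalize("NFC", needle) in unicodedata.normalize("NFC", haystack)


# Simulate a decomposed file name, such as one read from a macOS volume.
filename = unicodedata.normalize("NFD", "한글.txt")
print(nfc in filename, contains(filename, nfc))
```

Normalizing to NFC at system boundaries (file names, form input, imported data) tends to be simpler than making every comparison normalization aware.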

4. FAQ: Frequently Asked Questions

Q: Why do Korean characters appear garbled (mojibake) in my application?

A: This usually happens when an EUC-KR or CP949 file is read as UTF-8, or the reverse. To fix this, you must explicitly decode the original bytes using the correct Korean encoding and then re-encode the text as UTF-8.

Q: What is the difference between EUC-KR and CP949?

A: CP949 is a superset of EUC-KR. It adds the 8,822 Hangul syllables (11,172 minus 2,350) that were missing from the original EUC-KR standard, so every possible modern syllable combination can be represented.

Q: How can I detect if a file is EUC-KR or UTF-8?

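A: You can use byte pattern detection libraries (like chardet) or test the bytes yourself: if a file decodes as strict UTF-8 without errors, it is almost certainly UTF-8, because EUC-KR and CP949 byte sequences rarely form valid UTF-8 by accident. A UTF-8 BOM (Byte Order Mark, bytes EF BB BF) at the start of a file is a strong signal as well, but most UTF-8 files do not have one, and adding a BOM is not recommended.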
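A simple version of this check needs only the standard library. The sketch below is a heuristic, not a guarantee: the function name `guess_korean_encoding` is hypothetical, and it assumes the input is either UTF-8 or a Korean legacy encoding (CP949 is tried as the fallback because it also accepts plain EUC-KR data).

```python
import codecs
from typing import Optional


def guess_korean_encoding(data: bytes) -> Optional[str]:
    """Guess the encoding of Korean text bytes; returns None if unsure."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    # Try strict UTF-8 first: legacy bytes rarely pass by accident.
    for encoding in ("utf-8", "cp949"):
        try:
            data.decode(encoding)
        except UnicodeDecodeError:
            continue
        return encoding
    return None


print(guess_korean_encoding("한글".encode("utf-8")))
print(guess_korean_encoding("한글".encode("cp949")))
print(guess_korean_encoding(codecs.BOM_UTF8 + "한글".encode("utf-8")))
```

Note that pure ASCII input is reported as UTF-8, which is harmless since ASCII bytes are identical in all three encodings. For mixed or very short inputs, a statistical detector may still be the better tool.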


5. Master Korean Text with Tool3M

Don't let legacy Korean encodings slow down your development. Tool3M provides specialized tools to handle Korean text with precision:

  • EUC-KR/CP949 Encoder & Decoder: Repair garbled text and convert legacy Korean files to modern standards.
  • Hangul Normalization Tool: Convert between Hangul NFC and NFD for cross-platform compatibility.
  • Korean Encoding Detector: Instantly identify the encoding of any Korean text snippet or file.
