Question: Why is UTF-8 used in class files and UTF-16 at runtime?

Answers 3
Added at 2016-12-30 10:12
Question

Why is a .class file encoded in UTF-8, while the class at runtime uses UTF-16?

Answers
Answer #1, added at 2016-12-30 10:12

Source code can be in any encoding; you can tell the compiler which encoding to use with the -encoding flag.

The JVM uses UTF-16, and it's specified in the JLS:

The Java programming language represents text in sequences of 16-bit code units, using the UTF-16 encoding.

Answer #2, added at 2016-12-30 10:12

javac encoding:

-encoding encoding Set the source file encoding name, such as EUC-JP and UTF-8. If -encoding is not specified, the platform default converter is used.
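To illustrate why the source encoding matters, here is a minimal sketch that decodes the same two bytes with two different charsets. It is only an analogy for what a compiler does when it reads a source file, not javac's actual code path; the class and method names are made up for this example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: the same bytes yield different text per charset.
class SourceDecodingDemo {
    // Decodes raw bytes into a String using the given charset.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // 0xC3 0xA9 is the UTF-8 encoding of U+00E9 (e with acute accent).
        byte[] bytes = {(byte) 0xC3, (byte) 0xA9};

        String asUtf8 = decode(bytes, StandardCharsets.UTF_8);
        String asLatin1 = decode(bytes, StandardCharsets.ISO_8859_1);

        System.out.println(asUtf8.length());   // 1: one character, U+00E9
        System.out.println(asLatin1.length()); // 2: two characters, U+00C3 U+00A9
    }
}
```

If the compiler guesses the wrong charset, non-ASCII string literals end up garbled in the same way, which is what passing -encoding explicitly avoids.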

JVM encoding:

Every instance of the Java virtual machine has a default charset, which may or may not be one of the standard charsets. The default charset is determined during virtual-machine startup and typically depends upon the locale and charset being used by the underlying operating system.
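A short sketch for inspecting the default charset described above. Note that, as far as I know, this charset only governs conversions between bytes and characters (for example String.getBytes() without arguments); it does not change how String stores text in memory. The class name is made up for this example.

```java
import java.nio.charset.Charset;

// Hypothetical demo class: reports the charset chosen at JVM startup.
class DefaultCharsetDemo {
    // Returns the canonical name of the JVM's default charset.
    static String defaultCharsetName() {
        return Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        // The value depends on the platform and JVM settings,
        // so no expected output is shown here.
        System.out.println("Default charset: " + defaultCharsetName());
    }
}
```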

Answer #3, added at 2016-12-30 11:12

Why .class is UTF-8

For class files whose identifiers and string constants are mostly ASCII, as is typical of code written for a Western audience, UTF-8 is the most compact encoding.
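The size difference can be checked with a small sketch like the one below. It uses standard UTF-8, whereas the class file format uses a slightly modified UTF-8; for plain ASCII names the byte counts are the same. The class and method names are made up for this example.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: compares encoded sizes of the same text.
class EncodingSizeDemo {
    // Number of bytes needed to encode s as UTF-8.
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of bytes needed to encode s as UTF-16 (big-endian, no byte order mark).
    static int utf16Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        String ascii = "java/lang/String";     // typical constant pool entry
        String cjk = "\u65E5\u672C\u8A9E";     // three CJK characters

        System.out.println(utf8Bytes(ascii));  // 16
        System.out.println(utf16Bytes(ascii)); // 32
        System.out.println(utf8Bytes(cjk));    // 9
        System.out.println(utf16Bytes(cjk));   // 6
    }
}
```

The second pair of numbers shows the caveat in the answer: for text that is not mostly ASCII, UTF-8 is no longer the smaller encoding.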

but runtime .class is UTF-16?

At runtime it is quicker to manipulate strings that use a fixed-width encoding (see "Why Java char uses UTF-16?"), so UCS-2 was originally chosen. The later change from UCS-2 to UTF-16 complicates this, because UTF-16 is itself a variable-width encoding: characters outside the Basic Multilingual Plane take two 16-bit code units.
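The variable-width nature of UTF-16 is visible from plain Java: String.length() counts 16-bit code units, not characters. A minimal sketch, with a class name made up for this example:

```java
// Hypothetical demo class: code units versus code points in a String.
class SurrogatePairDemo {
    // Number of 16-bit UTF-16 code units in s.
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points in s.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // "A" followed by U+1F600, which needs a surrogate pair in UTF-16.
        String s = "A" + new String(Character.toChars(0x1F600));

        System.out.println(codeUnits(s));  // 3
        System.out.println(codePoints(s)); // 2
    }
}
```

Indexing with charAt is still constant time, but an index may land in the middle of a surrogate pair, which is the complication the answer refers to.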

As noted in the comments of that question, JEP 254 allows for the runtime representation to change to something more space efficient (e.g., Latin-1).
