
Deeper Understanding of Java Object Structure

2024-09-20 18:38:11

I. Java object structure

After a Java object is instantiated, what does it look like in memory? A Java object (an Object instance) consists of three parts: the object header, the object body, and the alignment bytes, as shown in the figure below:

Figure: Java object structure

1. The three parts of a Java object

(1) Object header

The object header consists of up to three fields. The first, called the Mark Word, stores the object's own runtime data, such as GC flag bits, the hash code, and the lock state.

The second field, the Class Pointer, points to the object's class information (class metadata) in the method area; the virtual machine uses it to determine which class the object is an instance of.

The third field is the Array Length. If the object is a Java array, this field must be present to record the length of the array; if the object is not an array, the field is absent, so it is optional.

(2) Object body

The object body holds the object's instance variables (member fields), including those inherited from parent classes. Each field is aligned to its own size, so 4-byte fields such as int values and compressed references sit on 4-byte boundaries.

(3) Alignment bytes

Alignment bytes, also known as padding, ensure that the memory occupied by a Java object is a multiple of 8 bytes, because HotSpot's memory management requires every object's starting address to be a multiple of 8. When the object header plus the instance data is not a multiple of 8 bytes, padding bytes are appended to restore 8-byte alignment.
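The padding rule can be sketched as a small calculation. This is a simplified model, not how HotSpot actually lays out objects: it assumes a 64-bit VM with compressed class pointers (a 12-byte header made of an 8-byte Mark Word and a 4-byte Class Pointer), 8-byte object alignment, and it ignores field reordering. The class and method names are hypothetical.

```java
/** Simplified model of instance sizing with 8-byte alignment (hypothetical helper, not a JVM API). */
public class PaddingModel {

    static final int OBJECT_ALIGNMENT = 8; // corresponds to "Object alignment: 8 bytes"

    /** Rounds size up to the next multiple of alignment (alignment must be a power of two). */
    static long alignUp(long size, int alignment) {
        return (size + alignment - 1) & -alignment;
    }

    /** Header plus instance data, rounded up to a multiple of 8 bytes. */
    static long instanceSize(long headerBytes, long fieldBytes) {
        return alignUp(headerBytes + fieldBytes, OBJECT_ALIGNMENT);
    }

    /** Number of padding bytes appended under this model. */
    static long paddingBytes(long headerBytes, long fieldBytes) {
        return instanceSize(headerBytes, fieldBytes) - (headerBytes + fieldBytes);
    }

    public static void main(String[] args) {
        // 12-byte header + one 4-byte compressed reference = 16 bytes, no padding
        System.out.println(instanceSize(12, 4) + " bytes, padding " + paddingBytes(12, 4));
        // 12-byte header + two 4-byte compressed references = 20 -> 24 bytes, 4 bytes of padding
        System.out.println(instanceSize(12, 8) + " bytes, padding " + paddingBytes(12, 8));
    }
}
```

Under these assumptions a header plus one reference fills exactly 16 bytes, while a header plus two references (20 bytes) is padded to 24.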

2. Structural information of Mark Word

The Mark Word is the first part of the object header, and much of the important information about Java's built-in locks lives in it. The Mark Word is one machine word long: 32 bits on a 32-bit JVM and 64 bits on a 64-bit JVM. Its length is not affected by the oop (ordinary object pointer) compression option.

Java's built-in locks have four states, from lowest to highest: no lock, biased lock, lightweight lock, and heavyweight lock. Before JDK 1.6, the built-in lock was always a heavyweight lock, which is relatively inefficient. In JDK 1.6, to make acquiring and releasing locks cheaper, the JVM optimized synchronized by introducing biased locks and lightweight locks. Since then, a built-in lock can be in one of the four states (no lock, biased lock, lightweight lock, heavyweight lock), and it is upgraded step by step as contention increases. The process is irreversible: a lock can only be upgraded (from a lower level to a higher one), never downgraded. The figure below shows the structure of the 64-bit Mark Word in each lock state:

Figure: structure of the 64-bit Mark Word

Since mainstream JVMs today are 64-bit, we use the 64-bit Mark Word. Its fields are described below.

(1) lock: the lock state flag, 2 bits. To express as much information as possible with as few bits as possible, a lock flag is used; depending on its value, the whole Mark Word is interpreted differently.

(2) biased_lock: whether biased locking is enabled for the object, 1 bit. A value of 1 means the object has biased locking enabled; 0 means it does not.

Together, the lock and biased_lock bits indicate which lock state the object is in. Their combinations have the following meanings:

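Table: meaning of the biased_lock and lock bit combinations

biased_lock   lock   Lock state
0             01     No lock
1             01     Biased lock
(n/a)         00     Lightweight lock
(n/a)         10     Heavyweight lock
(n/a)         11     GC mark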
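The lock and biased_lock combinations described above can be turned into a small decoder. This is an illustrative sketch, assuming the 64-bit Mark Word layout in the figure (lock in the lowest 2 bits, biased_lock in bit 2); the enum and method names are hypothetical and not part of the JVM or JOL. biased_lock is only meaningful when lock is 01; in the other states those bits belong to a pointer.

```java
/** Maps the biased_lock and lock bits of a 64-bit Mark Word to a lock state (hypothetical helper). */
public class LockStateDecoder {

    enum LockState { NO_LOCK, BIASED, LIGHTWEIGHT, HEAVYWEIGHT, GC_MARKED }

    static LockState decode(long markWord) {
        int lock = (int) (markWord & 0b11);              // lowest 2 bits: lock flag
        int biasedLock = (int) ((markWord >>> 2) & 0b1); // bit 2: biased_lock
        switch (lock) {
            case 0b01:
                return biasedLock == 1 ? LockState.BIASED : LockState.NO_LOCK;
            case 0b00:
                return LockState.LIGHTWEIGHT;
            case 0b10:
                return LockState.HEAVYWEIGHT;
            default: // 0b11
                return LockState.GC_MARKED;
        }
    }

    public static void main(String[] args) {
        System.out.println(decode(0x0000000000000001L)); // low bits 001 -> NO_LOCK
        System.out.println(decode(0x0000000000000005L)); // low bits 101 -> BIASED
    }
}
```

The value 0x0000000000000001 that JOL prints for the freshly created Hello object decodes to the no-lock state, which matches the analysis of the JOL output in section III.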

(3) age: the object's generational age, 4 bits. Each time the object is copied within the Survivor spaces during GC, its age increases by 1; once it reaches the configured threshold, the object is promoted to the old generation. By default the age threshold is 15 for the parallel GC and 6 for the concurrent (CMS) GC. Because age has only 4 bits, its maximum value is 15, which is why the -XX:MaxTenuringThreshold option cannot exceed 15.

(4) identity_hashcode: the object's 31-bit identity hash code. It is computed lazily: only when Object.hashCode() or System.identityHashCode() is called to compute the object's hash code is the result written into the object header. When the object is locked, the value is moved into the Monitor.

(5) thread: 54 bits, the ID of the thread that holds the biased lock.

(6) epoch: 2 bits, the bias timestamp (epoch).

(7) ptr_to_lock_record: 62 bits, a pointer to the lock record in the owning thread's stack frame when the object is in the lightweight lock state.
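For the no-lock state, the individual fields can be pulled out with shifts and masks. The sketch below assumes the commonly documented unlocked layout, from high to low bits: 25 unused bits, the 31-bit identity_hashcode, 1 unused bit, the 4-bit age, biased_lock, and lock. The helper names are hypothetical. It also shows why age tops out at 15: four bits cannot hold a larger value.

```java
/** Splits an unlocked (lock = 01, biased_lock = 0) 64-bit Mark Word into fields (hypothetical helper). */
public class UnlockedMarkWord {

    // Assumed layout, high to low: unused:25 | identity_hashcode:31 | unused:1 | age:4 | biased_lock:1 | lock:2
    static int lock(long mark)         { return (int) (mark & 0x3); }
    static int biasedLock(long mark)   { return (int) ((mark >>> 2) & 0x1); }
    static int age(long mark)          { return (int) ((mark >>> 3) & 0xF); }
    static int identityHash(long mark) { return (int) ((mark >>> 8) & 0x7FFF_FFFFL); }

    /** Builds an unlocked Mark Word from a hash and an age (inverse of the getters above). */
    static long compose(int hash, int age) {
        return ((hash & 0x7FFF_FFFFL) << 8) | ((long) (age & 0xF) << 3) | 0b001;
    }

    public static void main(String[] args) {
        long mark = compose(0x1234ABCD, 3);
        // Prints: hash=1234abcd age=3 biased=0 lock=1
        System.out.printf("hash=%x age=%d biased=%d lock=%d%n",
                identityHash(mark), age(mark), biasedLock(mark), lock(mark));
        // 15 is the largest value a 4-bit age field can hold
        System.out.println(age(compose(0, 15)));
    }
}
```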

II. Using the JOL tool to view the object layout

1. Use of JOL tools

JOL is a jar package; the tools it provides make it easy to parse the in-memory structure of a Java object at runtime. To use it, first add its Maven GAV coordinates:

<!--Java Object Layout -->
<dependency>
    <groupId>org.openjdk.jol</groupId>
    <artifactId>jol-core</artifactId>
    <version>0.17</version>
</dependency>

At the time of writing, the latest version is 0.17. Its output differs considerably from versions before 0.15 (0.15 itself excluded). Many projects still use older versions, but that does not matter for the experiments here.

A few commonly used jol-core methods:

  • ClassLayout.parseInstance(object).toPrintable(): view the internal layout of an object.
  • GraphLayout.parseInstance(object).toPrintable(): view information external to an object, including the objects it references.
  • GraphLayout.parseInstance(object).totalSize(): view the total size of the object graph.
  • VM.current().details(): print information about the current virtual machine.

First create a simple class Hello

public class Hello {
    private Integer a = 1;   
}

Next, write a startup class to test it

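import lombok.extern.slf4j.Slf4j;
import org.openjdk.jol.info.ClassLayout;
import org.openjdk.jol.vm.VM;

/**
 * @author kdyzm
 * @date 2024/9/19
 */
@Slf4j
public class JalTest {

    public static void main(String[] args) {
        // Print information about the current JVM
        log.info(VM.current().details());
        Hello hello = new Hello();
        // Print the in-memory layout of the Hello instance
        log.info("hello obj status:{}", ClassLayout.parseInstance(hello).toPrintable());
    }
}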

Output results:

Figure: JOL output (VM details and the layout of the Hello object)

2. Analysis of results

The code first calls VM.current().details() to get information about the current Java virtual machine:

  • VM mode: 64 bits - indicates that the current VM is a 64-bit VM

  • Compressed references (oops): 3-bit shift - object pointer compression is enabled. On a 64-bit JVM an object pointer normally takes 8 bytes (64 bits); compressed pointers reduce that footprint and improve memory utilization. "3-bit shift" means the pointer is compressed with a 3-bit shift: because objects are 8-byte aligned, the lowest 3 bits of every object address (or of its offset from the heap base) are always 0, so the JVM can store the value shifted right by 3 bits in 32 bits and shift it back left when the reference is used. This lets 4-byte references cover up to 32 GB of heap.

  • Compressed class pointers: 3-bit shift - class pointer compression is enabled and works the same way.

  • Object alignment: 8 bytes - objects are aligned to 8 bytes.
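The 3-bit shift can be illustrated with a toy encoder and decoder. This is a sketch under simplifying assumptions, not HotSpot code: it assumes a hypothetical heap base address, 8-byte object alignment, and offsets measured from that base; the class and method names are made up.

```java
/** Toy model of compressed-oop encoding with a 3-bit shift (hypothetical helper, not JVM code). */
public class CompressedOops {

    static final int SHIFT = 3; // log2(8): objects are 8-byte aligned

    /** Encodes a 64-bit address as a 32-bit "narrow oop" relative to the heap base. */
    static int encode(long address, long heapBase) {
        long offset = address - heapBase;
        if (offset < 0 || offset >= (1L << (32 + SHIFT))) {
            throw new IllegalArgumentException("address outside the 32 GB range");
        }
        if ((offset & ((1L << SHIFT) - 1)) != 0) {
            throw new IllegalArgumentException("address is not 8-byte aligned");
        }
        return (int) (offset >>> SHIFT); // the low 3 bits are always 0, so nothing is lost
    }

    /** Decodes a narrow oop back to a full 64-bit address. */
    static long decode(int narrowOop, long heapBase) {
        return heapBase + (Integer.toUnsignedLong(narrowOop) << SHIFT);
    }

    public static void main(String[] args) {
        long base = 0x8_0000_0000L;          // hypothetical heap base
        long addr = base + 0x7_FFFF_FFF8L;   // last 8-byte slot of a 32 GB range
        int narrow = encode(addr, base);
        // Prints: ffffffff -> ffffffff8
        System.out.println(Integer.toUnsignedString(narrow, 16) + " -> " + Long.toHexString(decode(narrow, base)));
        // 2^32 slots of 8 bytes each
        System.out.println("max heap: " + ((1L << 32) << SHIFT) / (1L << 30) + " GB"); // 32 GB
    }
}
```

Because every valid offset is a multiple of 8, dropping the low 3 bits loses nothing, and 2^32 slots of 8 bytes cover 32 GB, which is why compressed oops are not used by default once the heap grows beyond about 32 GB.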

Figure: the type-size and array-layout part of the VM details output

This part of the output shows how many bytes a value of each type takes (reference, boolean, byte, char, short, int, float, long, double), as well as the element sizes and base offsets of arrays.

Note the array base offset: its value is effectively the size of an array's object header, 16 bytes in the figure above. For an array, the object header also contains the array length, which occupies 4 bytes even when pointer compression is not enabled. With compression on, that gives 8 bytes (Mark Word) + 4 bytes (Class Pointer) + 4 bytes (array length) = 16 bytes.

The next step is to analyze the output of the object structure.

III. Object structure output parsing

Let's review the object structure first

Figure: Java object structure

Now look at the object structure output again:

Figure: object structure output for the Hello instance
  • OFF: Offset in bytes

  • SZ: Size in bytes

  • TYPE DESCRIPTION: a description of the row's type; it is quite descriptive and even shows which part of the object header a row belongs to

  • VALUE: the value as a hexadecimal string; note that each byte (8 bits) takes two hex digits. Before JOL 0.15 the bytes are shown in little-endian order; from version 0.15 onward (0.15 included) they are shown in big-endian order.

1. Mark Word analysis

Figure: Mark Word rows of the JOL output

Because the current virtual machine is 64-bit, the Mark Word occupies 8 bytes (64 bits) of the object header. It is not affected by pointer compression; its size depends only on whether the VM is 32-bit or 64-bit.

The current value is the hexadecimal 0x0000000000000001. Split into bytes for readability, it is 00 00 00 00 00 00 00 01. Now recall the structure of the Mark Word:

Figure: structure of the 64-bit Mark Word

The last byte, 01 in hexadecimal, is 00000001 in binary. Its last three bits are 001: the biased lock flag (biased_lock) is 0 and the lock flag (lock) is 01, which corresponds to the Mark Word layout of the lock-free state.

2. Class Pointer analysis

Figure: Class Pointer row of the JOL output

In a 64-bit VM this field occupies 4 bytes with pointer compression enabled and 8 bytes without it. It points to the object's class information (class metadata) in the method area.

3. Object body analysis

Figure: object body row of the JOL output

The Hello class has a single field, a, of type Integer. In a 64-bit VM it occupies 4 bytes with pointer compression enabled and 8 bytes without it. Note that these bytes hold the reference to the Integer object, not the memory taken by the int value itself.

IV. Changes in object structure under different conditions

1. The hashCode in the Mark Word

In the lock-free state, 31 bits of the Mark Word are reserved for the hashCode, yet in the previous output those bits are all 0. Why?

Figure: structure of the 64-bit Mark Word

There are two conditions that need to be met in order for the hashCode value to be displayed in the mark word:

  1. The target class must not override the hashCode method
  2. The hashCode method must have been called on the target object, so that the hashCode has been generated
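The difference between the two conditions can be observed without JOL. The sketch below (class names are hypothetical) relies only on the documented behavior of System.identityHashCode(Object), which returns the value Object.hashCode() would return, whether or not the class overrides hashCode(). As described above, it is this identity hash, once generated, that HotSpot records in the Mark Word; an overridden hashCode() computes its result from the fields instead.

```java
import java.util.Objects;

/** Contrasts the identity hash code with an overridden hashCode() (class names are illustrative). */
public class IdentityHashDemo {

    static class Plain { }                        // does not override hashCode()

    static class Overriding {                     // overrides hashCode() and equals()
        private final int value = 1;

        @Override
        public int hashCode() {
            return Objects.hash(value);           // 31 * 1 + 1 = 32
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Overriding && ((Overriding) o).value == value;
        }
    }

    public static void main(String[] args) {
        Plain p = new Plain();
        // Without an override, hashCode() is the identity hash and stays stable for the object's lifetime
        System.out.println(p.hashCode() == System.identityHashCode(p)); // true

        Overriding o = new Overriding();
        // The overridden value comes from the fields; the identity hash is a separate value
        System.out.println(o.hashCode());                 // 32
        System.out.println(System.identityHashCode(o));   // JVM-chosen value
    }
}
```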

In the experiment above, the Hello class is simple:

public class Hello {
    private Integer a = 1;   
}

It does not override the hashCode method, yet no hashCode value appeared in the JOL analysis, because hashCode() had never been called to generate it.

Next, change the startup class to call the hashCode method, and print the parsed result again:

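import lombok.extern.slf4j.Slf4j;
import org.openjdk.jol.info.ClassLayout;
import org.openjdk.jol.vm.VM;

/**
 * @author kdyzm
 * @date 2024/9/19
 */
@Slf4j
public class JalTest {

    public static void main(String[] args) {
        log.info(VM.current().details());
        Hello hello = new Hello();
        // Calling hashCode() generates the identity hash, which is then stored in the Mark Word
        hello.hashCode();
        log.info("hello obj status:{}", ClassLayout.parseInstance(hello).toPrintable());
    }
}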

Output:

Figure: JOL output after calling hashCode()

As you can see, the Mark Word already has the hashCode value.

2. Byte alignment

The JOL output shows that 8-byte alignment is used. The object is exactly 16 bytes, a multiple of 8, so no padding was needed. To see padding in action, add another member variable, Integer b = 2, to the Hello class. Since each Integer reference takes 4 bytes here, the object grows to 20 bytes, which is not a multiple of 8, so 4 bytes of padding should be added. Change the Hello class:

public class Hello {
    private Integer a = 1;
    private Integer b = 2;
}

Then run the startup class

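import lombok.extern.slf4j.Slf4j;
import org.openjdk.jol.info.ClassLayout;
import org.openjdk.jol.vm.VM;

/**
 * @author kdyzm
 * @date 2024/9/19
 */
@Slf4j
public class JalTest {

    public static void main(String[] args) {
        log.info(VM.current().details());
        Hello hello = new Hello();
        // Generate the identity hash so that it shows up in the Mark Word
        hello.hashCode();
        log.info("hello obj status:{}", ClassLayout.parseInstance(hello).toPrintable());
    }
}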

Run results:

Figure: JOL output for Hello with two Integer fields

Sure enough, 4 bytes of padding were added to reach 8-byte alignment, and the whole instance is now 24 bytes.

3. Object structure of array types

Array objects differ from normal objects: their object header contains an extra "array length" field that records the length of the array. Change the startup class to look at the structure of an Integer array:

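import lombok.extern.slf4j.Slf4j;
import org.openjdk.jol.info.ClassLayout;
import org.openjdk.jol.vm.VM;

/**
 * @author kdyzm
 * @date 2024/9/19
 */
@Slf4j
public class JalTest {

    public static void main(String[] args) {
        log.info(VM.current().details());
        Integer[] a = new Integer[]{1, 2, 3};
        // Generate the array's identity hash
        a.hashCode();
        log.info("hello obj status:{}", ClassLayout.parseInstance(a).toPrintable());
    }
}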

Output:

Figure: JOL output for the Integer array

Compared with a normal object, the array object (the part highlighted in red) has an additional array length field, followed by the three Integer elements, which together occupy 12 bytes.

Looking more closely, the object header, including the array length, takes 16 bytes in total, the same as the "Array base offsets" value shown earlier. This is because, to reach the actual elements, you have to skip the 16-byte object header from the start of the object; those 16 bytes are the "offset" used when reading every element.

4. Pointer compression

Turn on pointer compression: -XX:+UseCompressedOops

Turn off pointer compression: -XX:-UseCompressedOops

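In IntelliJ IDEA, add the option to VM options in the run configuration, as shown in the figure below:

Figure: VM options in the IntelliJ IDEA run configuration

Note that pointer compression is enabled by default on 64-bit HotSpot JVMs (since JDK 6u23, so in Java 8 and later as well) as long as the maximum heap size is below about 32 GB.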
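To confirm which setting is actually in effect without reading JOL output, the flag values can be queried at runtime. This sketch assumes a 64-bit HotSpot-based JDK: com.sun.management.HotSpotDiagnosticMXBean is a JDK-specific API rather than part of Java SE, and getVMOption throws IllegalArgumentException for a flag the running VM does not know.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import com.sun.management.VMOption;
import java.lang.management.ManagementFactory;

/** Reads HotSpot -XX flags at runtime (HotSpot-specific; class name is illustrative). */
public class CompressedOopsCheck {

    /** Returns the current value of a HotSpot flag, e.g. "true" or "false". */
    static String vmOption(String name) {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        VMOption option = bean.getVMOption(name); // IllegalArgumentException if the flag is unknown
        return option.getValue();
    }

    public static void main(String[] args) {
        System.out.println("UseCompressedOops = " + vmOption("UseCompressedOops"));
        System.out.println("UseCompressedClassPointers = " + vmOption("UseCompressedClassPointers"));
    }
}
```

Running it with -XX:-UseCompressedOops should print false for the first flag.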

Next, look at what the same parsing code prints out with and without pointer compression turned on

Code:

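import lombok.extern.slf4j.Slf4j;
import org.openjdk.jol.info.ClassLayout;
import org.openjdk.jol.vm.VM;

/**
 * @author kdyzm
 * @date 2024/9/19
 */
@Slf4j
public class JalTest {

    public static void main(String[] args) {
        log.info("\n{}", VM.current().details());
        Integer[] a = new Integer[]{1, 2, 3};
        // Generate the array's identity hash
        a.hashCode();
        log.info("hello obj status:\n{}", ClassLayout.parseInstance(a).toPrintable());
    }
}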

The result of parsing with pointer compression turned on:

Figure: JOL output with pointer compression enabled

The result with pointer compression turned off:

Figure: JOL output with pointer compression disabled

Comparing the output without pointer compression against the output with it:

Figure: side-by-side comparison of the two outputs

Note that the Integer[] array holds Integer objects rather than int values; Integer is the wrapper class of the primitive type int. What the array stores is therefore a reference to each Integer object. According to the type-size table in the VM information output, a "ref" takes 8 bytes when compression is off, which is why the elements occupy 3 * 8 = 24 bytes.

As you can see, there are two effects when pointer compression is turned on

  1. The object reference type will change from 8 bytes to 4 bytes
  2. The Class Pointer type in the object header will change from 8 bytes to 4 bytes

It does save space.
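The two effects can be checked with a back-of-the-envelope model for Integer[3]. This is a simplified sketch, not JOL's algorithm: it assumes, as in the run above, that turning off compressed oops also makes the Class Pointer 8 bytes, that elements start at an offset aligned to the element size, and that the total is padded to 8 bytes. It counts only the array itself, not the three Integer objects it references; the names are hypothetical.

```java
/** Simplified size model for a reference array such as Integer[3] (hypothetical helper). */
public class ArraySizeModel {

    static long alignUp(long size, long alignment) {
        return (size + alignment - 1) / alignment * alignment;
    }

    /**
     * @param compressed true for -XX:+UseCompressedOops (4-byte class pointer and references),
     *                   false for -XX:-UseCompressedOops (8-byte class pointer and references)
     */
    static long referenceArraySize(int length, boolean compressed) {
        long markWord = 8;
        long classPointer = compressed ? 4 : 8;
        long arrayLength = 4;                         // the length field is 4 bytes either way
        long refSize = compressed ? 4 : 8;
        // Elements start at an offset aligned to the element size (the "array base offset")
        long baseOffset = alignUp(markWord + classPointer + arrayLength, refSize);
        return alignUp(baseOffset + length * refSize, 8); // pad the total to 8 bytes
    }

    public static void main(String[] args) {
        System.out.println(referenceArraySize(3, true));  // 16 + 3 * 4 = 28 -> 32
        System.out.println(referenceArraySize(3, false)); // 24 + 3 * 8 = 48
    }
}
```

Under these assumptions the array shrinks from 48 to 32 bytes when compression is on; the JOL output remains the authority for the real numbers on a given JVM.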

V. Extended reading

1. Big-endian and little-endian byte order

Big Endian and Little Endian are two different ways of storing data, particularly in terms of the order in which multibyte data types, such as integers, are stored in computer memory.

  • Big Endian: In big-endian order, the high byte of data is stored at the low address and the low byte is stored at the high address. Analogous to the way numbers are written, the high digit is on the left and the low digit is on the right. Therefore, the Most Significant Byte (MSB) of the data is stored at the lowest address.
  • Little Endian: Conversely, in little-endian order, the low byte of data is stored at the low address and the high byte at the high address, the reverse of how we write numbers. Therefore, the Least Significant Byte (LSB) of the data is stored at the lowest address.

These two types of storage can be illustrated with a simple example:

Suppose you want to store the 4-byte integer 0x12345678:

  • In big-endian order, the bytes are stored as 12 34 56 78
  • In little-endian order, the bytes are stored as 78 56 34 12
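The byte orders in this example can be reproduced with java.nio.ByteBuffer, whose order(ByteOrder) method controls how putInt lays out the bytes (a newly allocated buffer is big-endian by default). The helper names are illustrative.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Shows how the same int is laid out in big-endian and little-endian order. */
public class EndianDemo {

    /** Returns the bytes of value as they would be laid out in memory with the given order. */
    static byte[] toBytes(int value, ByteOrder order) {
        return ByteBuffer.allocate(Integer.BYTES).order(order).putInt(value).array();
    }

    /** Formats bytes as space-separated two-digit hex, e.g. "12 34 56 78". */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(hex(toBytes(0x12345678, ByteOrder.BIG_ENDIAN)));    // 12 34 56 78
        System.out.println(hex(toBytes(0x12345678, ByteOrder.LITTLE_ENDIAN))); // 78 56 34 12
    }
}
```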

2. Older versions of JOL

Older versions of JOL (before 0.15) print values in little-endian order. You can verify this by changing the Maven coordinates to version 0.14:

<!--Java Object Layout -->
<dependency>
    <groupId></groupId>
    <artifactId>jol-core</artifactId>
    <version>0.14</version>
</dependency>

Also add the hutool utility library:

<dependency>
    <groupId></groupId>
    <artifactId>hutool-all</artifactId>
    <version>5.8.32</version>
</dependency>

Then modify the Hello class

import cn.hutool.core.util.ByteUtil;
import cn.hutool.core.util.HexUtil;

import java.nio.ByteOrder;

/**
 * @author kdyzm
 * @date 2024/9/19
 */
public class Hello {

    private int a = 1;
    private int b = 2;

    public String hexHash() {
        // Raw hashCode of the object; Java's default byte order is big-endian
        int hashCode = this.hashCode();
        // Convert it to a byte array in little-endian order
        byte[] hashCode_LE = ByteUtil.intToBytes(hashCode, ByteOrder.LITTLE_ENDIAN);
        // Encode the bytes as a hexadecimal string
        return HexUtil.encodeHexStr(hashCode_LE);
    }
}

Startup class:

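import lombok.extern.slf4j.Slf4j;
import org.openjdk.jol.info.ClassLayout;
import org.openjdk.jol.vm.VM;

/**
 * @author kdyzm
 * @date 2024/9/19
 */
@Slf4j
public class JalTest {

    public static void main(String[] args) {
        log.info("\n{}", VM.current().details());
        Hello hello = new Hello();
        // hexHash() calls hashCode(), which also writes the identity hash into the Mark Word
        log.info("hexadecimal hashCode:{}", hello.hexHash());
        log.info("Object structure parsed by JOL:{}", ClassLayout.parseInstance(hello).toPrintable());
    }
}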

Output results:

Figure: output with JOL 0.14

Setting aside the differences between the old and new output formats, we can see that the hashCode computed manually in little-endian order matches the hashCode parsed by JOL. This shows that older versions of JOL (before 0.15) print values in little-endian order. Compare this with our Mark Word figure:

Figure: structure of the 64-bit Mark Word

The figure is drawn in big-endian order, so the first byte printed by the older version, 01, is the last byte of the Mark Word in the figure above.
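Given a Mark Word dumped in little-endian order, as older JOL versions print it, the actual 64-bit value can be reassembled by reading the bytes back through a little-endian ByteBuffer. This is a sketch: the input format (eight space-separated hex bytes) and the sample dumps are hypothetical, not copied from the output above.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Reassembles a Mark Word from bytes listed in little-endian (memory) order. */
public class LittleEndianMarkWord {

    static long parse(String bytesInMemoryOrder) {
        String[] parts = bytesInMemoryOrder.trim().split("\\s+");
        if (parts.length != 8) {
            throw new IllegalArgumentException("expected 8 bytes");
        }
        ByteBuffer buf = ByteBuffer.allocate(8).order(ByteOrder.LITTLE_ENDIAN);
        for (String p : parts) {
            buf.put((byte) Integer.parseInt(p, 16));
        }
        buf.flip();
        return buf.getLong(); // the first byte in the dump becomes the least significant byte
    }

    public static void main(String[] args) {
        // Hypothetical dump of an unlocked object with no hash: first byte 01, rest zero
        long mark = parse("01 00 00 00 00 00 00 00");
        // Prints: 0x0000000000000001, lock bits = 1
        System.out.printf("0x%016x, lock bits = %d%n", mark, mark & 0b11);
    }
}
```

Long.reverseBytes(long) gives the same result if the eight bytes have first been read as a big-endian long.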

Keep the code unchanged and change the JOL version to 0.15:

<!--Java Object Layout -->
<dependency>
    <groupId></groupId>
    <artifactId>jol-core</artifactId>
    <version>0.15</version>
</dependency>

The results of the run are as follows

Figure: output with JOL 0.15

This time the manually computed hashCode bytes and the hashCode bytes parsed by JOL are in reverse order; that is, from version 0.15 onward, JOL prints values in big-endian order.

3. A hutool bug

The code above uses the hutool utility class to calculate the hexadecimal string of the hashCode value, and at first I introduced a dependency that looked like this

<dependency>
    <groupId></groupId>
    <artifactId>hutool-all</artifactId>
    <version>5.7.3</version>
</dependency>

It relies on an important method that converts an int to a byte array: ByteUtil#intToBytes(int, ByteOrder)

The source code looks like this:

Figure: source of intToBytes in hutool 5.7.3

An obvious bug: it checks for the little-endian flag but still returns a big-endian byte array, so, unsurprisingly, my code produced contradictory results. I went to gitee and found that the code on its master branch had changed:

Figure: the same method on the hutool master branch

Indeed, the code on master has been fixed. Searching the commit history turned up this commit:

Figure: the commit that fixed the method

Link to the corresponding commit: /dromara/hutool/commit/d4a7ddac3b30db516aec752562cae3436a4877c0

Figure: the commit page

Others had complained about it too, ha. Upgrading to version 5.8.32 fixed it:

<dependency>
    <groupId></groupId>
    <artifactId>hutool-all</artifactId>
    <version>5.8.32</version>
</dependency>
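The bug comes down to ignoring the requested byte order. For comparison, here is a minimal stand-alone conversion that actually branches on the ByteOrder argument; it is an independent sketch, not hutool's code, and the class name is hypothetical.

```java
import java.nio.ByteOrder;
import java.util.Arrays;

/** Order-aware int-to-bytes conversion (independent sketch, not hutool's implementation). */
public class IntToBytes {

    static byte[] intToBytes(int value, ByteOrder order) {
        // Big-endian: most significant byte first
        byte[] big = {
            (byte) (value >>> 24),
            (byte) (value >>> 16),
            (byte) (value >>> 8),
            (byte) value
        };
        if (order == ByteOrder.BIG_ENDIAN) {
            return big;
        }
        // Little-endian: least significant byte first, i.e. the big-endian array reversed
        return new byte[] { big[3], big[2], big[1], big[0] };
    }

    public static void main(String[] args) {
        // Prints: [120, 86, 52, 18] (0x78 0x56 0x34 0x12)
        System.out.println(Arrays.toString(intToBytes(0x12345678, ByteOrder.LITTLE_ENDIAN)));
    }
}
```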


END.



References

Java High Concurrency Core Programming Volume 2: Multithreading, Locks, JMM, JUC, High Concurrency Design Patterns.

Memory Layout of Java Objects
