1. Synchronization
1.1 Constructing Locks
- Atomic execution: Some parts of an application must execute exclusively (atomically): while they run, other parallel processes must not access or modify the data involved. For example, an account-transfer operation must ensure the balance is not modified concurrently, to avoid inconsistencies.
- The role of locks: Locks protect blocks of data or code, ensuring that only one process can be in the critical section at a given point in time. When a process acquires a lock, other processes must wait until the lock is released, which guarantees safe access to the data.
- Hardware primitives: The hardware must support basic operations (such as atomic operations) to help implement locks. These can be hardware-level instructions (e.g., atomic locking instructions) or other mechanisms that provide the functionality needed to build locks.
- Cache coherence mechanism: The correctness of a lock depends on cache coherence; that is, when a process updates the state of a lock, other processes must eventually observe the update. For example, if a process sets a lock to the "held" state, other processes learn of this state through cache coherence and wait.
1.2 Synchronization
- The simplest hardware primitive is atomic read-modify-write, which greatly simplifies the implementation of synchronization (e.g., locks, barriers).
- Atomic exchange: swap the contents of a register and a memory location. The atomic swap guarantees that no other processor or thread can access that memory location while the operation executes; this makes exclusive access to shared resources possible.
- Test & Set: a special application of atomic exchange that checks the state of a memory lock variable and sets it to 1. The procedure:
  - Load the contents of the memory lock into a register.
  - Set the memory lock to 1.
- The result of a Test & Set tells us whether the location was already held, so it is commonly used to implement locking. A 0 in the lock variable means no one currently holds the lock and no one is in the critical section, so another process may acquire the lock.
Example of a lock implementation:

lock: t&s register, location   ; atomically test-and-set the lock
      bnz register, lock       ; non-zero: the lock is already held by another process, so keep waiting
      CS                       ; critical section: the code that accesses shared resources; only the lock holder may enter
      st location, #0          ; after leaving the critical section, reset the location to 0 to release the lock

Assuming the lock is free (0) at the start, the first thread to execute the test-and-set acquires it and may enter the critical section.
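Below is a minimal C11 sketch of this test-and-set lock (the names `spinlock_t`, `lock`, and `unlock` are illustrative, not from the original); `atomic_exchange` plays the role of the atomic swap primitive:

```c
#include <stdatomic.h>

typedef struct { atomic_int locked; } spinlock_t;   // 0 = free, 1 = held

void lock(spinlock_t *l) {
    // Swap 1 into the lock; if the old value was 1, someone else holds it.
    while (atomic_exchange(&l->locked, 1) != 0)
        ;  // spin until the exchange returns 0 (lock was free)
}

void unlock(spinlock_t *l) {
    atomic_store(&l->locked, 0);   // the "st location, #0" step
}
```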
1.3 Caching Locks
In the traditional spinlock mechanism, if the lock exists in memory, each attempt to acquire the lock results in a large amount of communication traffic on the bus (since each operation requires a memory access). This causes congestion on the bus, making it difficult for other processes to make efficient progress.
Introducing cache locks solves this problem. Specifically:
- Benefits of cache locks: Store locks in the cache rather than only in memory. Every time the lock is updated, the cache coherence protocol ensures that other processors see the lock's new state in a timely manner.
- Exclusive access: Once a process has successfully acquired a lock, it holds the lock in an "exclusive" state, which allows it to update the state of the lock directly in the cache without frequent memory accesses.
- local spin: Instead of accessing memory directly, other processes waiting for locks can perform spin operations on their respective local cached copies. This approach reduces frequent communication traffic on the bus and lowers system overhead.
1.4 Coherence Traffic for a Lock
In multiprocessor systems, locks are often used to control access to shared resources. To reduce unnecessary coherence traffic, lock implementations take measures to minimize communication overhead while waiting for a lock, namely test&test&set.
- Spin locks and write operations: When multiple processes wait for a lock, if each process directly performs a swap (a write), every swap triggers an invalidation that forces the cached copies in other processes to be invalidated, so ownership of the lock's cache line bounces between processes. This generates heavy coherence traffic.
- Read local copies to reduce traffic: To reduce coherence traffic, a process waiting for a lock can spin by reading its locally cached copy. Reads generate no coherence traffic, so multiple processes can spin in their own caches without disturbing the cache state of others.
- Lock release and reacquisition: When the owner releases the lock by setting its value to 0, the copies in the other processes' caches are invalidated. Each waiting process then tries to read the lock's state, takes a cache miss, and refetches the block. If a process reads 0, it attempts the swap (which requires exclusive ownership of the block). The first process to gain exclusive ownership gets the lock; the others continue to spin and wait.
This mechanism reduces unnecessary coherence traffic and makes lock management more efficient while still controlling access to shared resources.
1.5 Test-and-Test-and-Set
lock: test register, location   ; plain read: spin on the cached copy, no bus traffic
      bnz register, lock        ; lock still held: keep testing
      t&s register, location    ; lock looked free: try the atomic test-and-set
      bnz register, lock        ; someone else got it first: go back to testing
      CS                        ; critical section
      st location, #0           ; release the lock
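For comparison, a hedged C11 sketch of test-and-test-and-set (illustrative names again): the plain `atomic_load` spins on the local cached copy, and the bus-visible `atomic_exchange` is attempted only when the lock looks free:

```c
#include <stdatomic.h>

typedef struct { atomic_int locked; } spinlock_t;   // 0 = free, 1 = held

void lock(spinlock_t *l) {
    for (;;) {
        // "test": spin on the locally cached copy; plain reads generate
        // no coherence traffic while the lock stays held.
        while (atomic_load(&l->locked) != 0)
            ;
        // "test&set": only now attempt the bus-visible atomic swap.
        if (atomic_exchange(&l->locked, 1) == 0)
            return;   // acquired
    }
}
```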
1.6 Load-Linked (LL) and Store Conditional (SC)
Load-Linked (LL) and Store-Conditional (SC) are a pair of mechanisms used to implement atomic operations, commonly used in concurrent programming for thread-safe updates. LL/SC provides a more flexible, lower-overhead approach to atomic operations than other locking mechanisms.
- Load-Linked (LL): reads the value at a memory address and records that address in an internal table. The processor can then perform computation without immediately locking or updating the address.
- Store-Conditional (SC): attempts to write a value to the same memory address as the LL, but succeeds only if that address has not been modified by another process or thread since the LL. This mechanism ensures that, in the absence of interference, the LL-SC sequence behaves atomically.
LL reads x into R1 and puts the address of x into a table that monitors whether other processes modify x. After a series of operations on R1, SC attempts to store R1 back to x. If x has not been modified, the SC succeeds; if x has been modified, the SC fails and a failure flag is written into R1. After the SC, the code checks R1: if it indicates success, the operation is complete; if it indicates failure, the sequence is re-executed.
In hardware implementations, a failed SC generates no bus traffic (no extra data transfers are needed), which reduces the communication overhead of the cache coherence protocol. As a result, LL-SC is more efficient than traditional test&test&set, which keeps retrying and generates heavy bus traffic on failure.
To summarize, the LL-SC mechanism allows a high degree of concurrency while reducing hardware and communication overhead, and is therefore commonly used in parallel systems that require efficient atomic operations.
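C11 exposes LL/SC indirectly through `atomic_compare_exchange_weak`, which is allowed to fail spuriously precisely so it can compile to an LL/SC pair on machines such as MIPS, ARM, and RISC-V. A small sketch (the function name `atomic_increment` is illustrative):

```c
#include <stdatomic.h>

// Portable analogue of an LL/SC retry loop: on LL/SC machines,
// compare_exchange_weak typically compiles to LL + SC, and a spurious
// SC failure simply retries the loop.
int atomic_increment(atomic_int *x) {
    int old = atomic_load(x);                         // LL: read x
    while (!atomic_compare_exchange_weak(x, &old, old + 1))
        ;   // SC failed (x changed, or spurious failure); `old` was
            // reloaded by compare_exchange, so just retry
    return old + 1;
}
```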
1.7 Spin Locks with Low Coherence Traffic
lockit: LL   R2, 0(R1)    ;load-linked: load the word at R1+0 into R2;
                          ;this load generates no coherence traffic and invalidates nothing
        BNEZ R2, lockit   ;lock not available: keep spinning here
        DADDUI R2, R0, #1 ;put the value R0+1 (= 1) into R2
        SC   R2, 0(R1)    ;store-conditional succeeds if no one
                          ;updated the lock since the last LL
        BEQZ R2, lockit   ;SC failed (R2 == 0): keep trying
- If there are i processes waiting for the lock, how many bus transactions occur?
  - 1 write transaction: the releaser performs the unlock (stores 0).
  - i read-miss requests: every waiting process tries to read the lock's state, causing i read misses.
  - i responses (or 1): the cache system answers the i read misses; there may be one response per request, or a single response may satisfy all waiters.
  - 1 write transaction: the process that acquires the lock (the acquirer) performs a write to mark the lock as held.
  - 0 failed SCs (ideally): assuming no other process changes the lock state before the SC succeeds, only one SC attempt is needed, so the failed-SC count is zero.
  - i-1 read-miss requests: the remaining i-1 processes keep trying to read the lock's state, causing i-1 more read misses.
  - i-1 responses (or 1): as before, there may be a separate response per read miss, or a single response may satisfy all requests.
- Optimization opportunity: the i and i-1 read-miss requests and their responses can each be reduced to one by a suitable caching mechanism, further cutting the number of bus transactions.
1.8 Further reduction in bandwidth requirements
- Ticket Lock:
  Each arriving process atomically takes a ticket and increments the ticket counter (completed with an LL-SC sequence). After taking its ticket, the process repeatedly checks the now-serving variable to see whether it is its turn to execute. When it finishes, it increments now-serving to admit the next queued process.
  - Explanation: ticket locks achieve fairness by handing out "tickets", so each process enters the critical section in arrival order. Waiters only need to read now-serving, eliminating repeated requests and reducing bandwidth demand (a C sketch follows this list).
- Array-Based Lock:
  Instead of a single now-serving variable, this lock uses a now-serving array; each process waits on its own array element.
  - Properties: fairness, low latency, low bandwidth demand, and good scalability.
  - Explanation: since each process waits on a different variable, contending processes do not disturb one another, reducing pressure on bus bandwidth. The cost is the extra storage needed to maintain the array.
- Queueing Locks:
  A directory controller records the arrival order of requests. When the lock becomes available, the controller passes it to the next process in the queue, so only one process receives invalidation and update notifications.
  - Explanation: processes acquire the lock in arrival order, reducing the communication and bandwidth overhead of repeated requests from multiple processes. Invalidation traffic is also contained, since only one process receives lock-state updates.
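A minimal C11 sketch of the ticket lock described above (the struct layout and names are assumptions for illustration):

```c
#include <stdatomic.h>

typedef struct {
    atomic_uint next_ticket;   // ticket dispenser
    atomic_uint now_serving;   // whose turn it is
} ticket_lock_t;

void lock(ticket_lock_t *l) {
    // Atomically take a ticket (the LL-SC / fetch-and-add step).
    unsigned my = atomic_fetch_add(&l->next_ticket, 1);
    // Spin reading now_serving only: one coherence miss per handoff.
    while (atomic_load(&l->now_serving) != my)
        ;
}

void unlock(ticket_lock_t *l) {
    // Admit the next waiter in arrival (ticket) order.
    atomic_fetch_add(&l->now_serving, 1);
}
```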
2. Memory Consistency Model (MCM)
2.1 Coherence Vs. Consistency
Coherence:
- Cache coherence concerns the rules governing operations on a single memory address.
- Write propagation: multiple cores may access the same memory address. Coherence guarantees that when one core writes a value to that address, the other cores eventually see it and are never permanently stuck with the old value.
- Write serialization: coherence also requires that write operations to the same memory address have a single consistent order (a global order). Every core must observe these writes in the same order.
  - That is, for x, all cores see x <- 5, x <- 7 in the same order.
  - But: for x and y, the writes x <- 7 and y <- 8 are not necessarily seen in the same order.
Consistency (consistency model):
- A consistency model is a broader concept: it defines the rules for the behavior of operations across multiple memory addresses. For example:
  - Sequential consistency is the strictest model: all cores must observe operations executed in the order written in the program.
  - Weaker models allow optimizations, such as out-of-order execution, that let different cores see operations in different orders.
  - That is, under sequential consistency, even for distinct addresses x and y, all cores see x <- 7 and y <- 8 in a consistent order.
- The hardware provides a specific model, and the programmer must work within it to write programs that run correctly.
2.1.1 Example of a program
- Sequential Consistency:
  - Program order is preserved.
  - Each instruction is executed atomically.
2.2 Sequential Consistency
Assumptions:
- Within each program, program order is preserved.
- Each instruction executes atomically.
- Instructions from different threads can be interleaved in any way.
Valid executions (lowercase and uppercase letters denote instructions from two different threads):
- abAcBCDe...
- ABCDEFabGc...
- abcAdBe...
- aAbBcCdDe...
- ...
Sequential Consistency (SC) is a memory consistency model which requires that all processors' reads and writes to shared memory appear to execute in a single global order, and that this order is consistent with each processor's program order.
Programmers often assume that programs have sequential consistency, which makes it easier to reason about program behavior.
However, hardware innovations may disrupt this sequential consistency model.
For example, introducing write buffers, out-of-order execution, or dropped acknowledgement signals (ACKs) in the coherence protocol may make a previously correct program produce unexpected output:
- Write buffers: writes are first placed in a buffer; the delayed write to memory may cause other processors to see the old value instead of the latest one (see the litmus-test sketch after this list).
- Out-of-order execution (OoO): to improve performance, the processor may reorder instructions, so the apparent order of operations differs from the source code.
- Dropped ACKs: in the cache coherence protocol, ACKs confirm that update operations have been acknowledged by other processors. If ACKs are lost or ignored, processors may end up with inconsistent views of memory.
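The classic store-buffering litmus test makes the write-buffer hazard concrete. Under sequential consistency at least one thread must observe the other's store, so `r1 == r2 == 0` is impossible; with write buffers or relaxed ordering it can occur. A sketch using C11 relaxed atomics to permit the reordering (all names are illustrative):

```c
#include <stdatomic.h>
#include <pthread.h>
#include <stdio.h>

atomic_int x, y;          // both start at 0
int r1, r2;

void *t1(void *arg) {
    atomic_store_explicit(&x, 1, memory_order_relaxed);  // store x
    r1 = atomic_load_explicit(&y, memory_order_relaxed); // then load y
    return NULL;
}
void *t2(void *arg) {
    atomic_store_explicit(&y, 1, memory_order_relaxed);  // store y
    r2 = atomic_load_explicit(&x, memory_order_relaxed); // then load x
    return NULL;
}

int main(void) {
    pthread_t a, b;
    pthread_create(&a, NULL, t1, NULL);
    pthread_create(&b, NULL, t2, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    printf("r1=%d r2=%d\n", r1, r2);   // "r1=0 r2=0" would violate SC
}
```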
2.2.1 Consistency Example-1
An out-of-order (OOO) core does not detect the dependence between the instructions operating on A and the instructions operating on B; as a result, these operations may be reordered. This is harmless for a single thread but can cause problems across threads.
Note: the consistency model tells the programmer which reorderings the hardware is allowed to perform.
2.2.2 Consistency Example-2
If the coherence protocol does not require acknowledgement signals (ACKs) for invalidation operations, we cannot be sure that every thread sees the latest value of A.
As in the figure above: because DIR-A's INV signal takes time to reach P3, P3 may execute without having read A=1.
With ACKs, shown above, P2's Req A must wait for P3's ACK to return before it can proceed.
2.2.3 Summary
An execution of a multiprocessor system is sequentially consistent if its result could have been produced by:
- maintaining the execution order of program instructions within each processor; and
- interleaving the memory accesses of different processors in some arbitrary order.
Sequential consistency can be achieved by:
- Maintaining program order: instructions within each processor execute in the order in which they are programmed;
- Write serialization: all processors observe writes to shared variables in the same order;
- Update before read: before a value can be read, all processors must have seen the update to that value.
This model is very intuitive for programmers but extremely inefficient to implement.
Since this method is very slow, the following alternatives are available:
- Hardware optimizations (e.g., validating load operations);
- A relaxed memory consistency model, using memory barriers (fences) to control the order of critical operations.
2.3 Relaxed Consistency Models
We want an intuitive programming model (e.g., sequential consistency), but we also want to achieve high performance.
In some parts of a program we care about data races and the constraints on instruction reordering; in other parts we care less.
So we relax the sequential-consistency constraints in most places, but enforce them strictly in specific parts of the code.
A memory-barrier (fence) instruction is a special instruction requiring that all memory accesses before it complete before execution proceeds, achieving the effect of sequential consistency at that point.
2.3.1 Fences
Definition: the fence instruction is an explicit synchronization mechanism used to force a particular order on memory accesses.
Function:
- A fence requires that all previous memory accesses (reads and writes) complete before subsequent operations may execute.
- It effectively inserts a barrier into the code, ensuring that earlier operations are not reordered past it at either the program or the hardware level (a usage sketch follows).
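As a sketch, a C11 fence placed in the store-buffering example from 2.2 forces the store to complete before the subsequent load, restoring SC-like behavior for this pair of accesses (names are illustrative):

```c
#include <stdatomic.h>

atomic_int x, y;
int r1;

void thread1(void) {
    atomic_store_explicit(&x, 1, memory_order_relaxed);
    // Fence: the store to x must be globally visible before the load
    // of y below may be performed.
    atomic_thread_fence(memory_order_seq_cst);
    r1 = atomic_load_explicit(&y, memory_order_relaxed);
}
```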
3. Transactional Memory
3.1 Transactions
- Problems with the locking mechanism:
  - Complexity: with lock-based synchronization, the programmer must manually manage shared resources and ensure each thread locks and unlocks correctly. This is error-prone, especially in complex multi-threaded programs.
  - Deadlock risk: if threads acquire multiple locks in the wrong order, or a thread forgets to release a lock, the program can stop making progress (deadlock).
  - Performance: locks are blocking; while one thread holds a lock, other threads must wait, which can reduce parallelism.
- Introduction of transactional memory:
  Transactional memory is a hardware or software mechanism that replaces traditional lock-based synchronization and simplifies parallel programming.
  - Core idea: the concurrent parts of the program run as transactions, similar to database transactions.
    - Transaction begin: marks the start of a code region.
    - Speculative execution: all operations inside the transaction are kept in a buffer and not immediately made visible in shared memory.
    - Transaction end: check for conflicts (e.g., two transactions modifying the same data at the same time).
      - If there is no conflict, the transaction's results are committed to shared memory.
      - If a conflict is detected, the transaction is rolled back (aborted) and possibly re-executed.
  - This removes the need for explicit locks, avoiding deadlocks and other synchronization errors.
- Advantages:
  - Better performance: the blocking behavior of locks creates bottlenecks, whereas transactions execute speculatively, greatly improving parallel efficiency.
  - No deadlocks: with no explicit lock operations, deadlocks caused by improper lock management cannot arise.
  - Simpler programming: the programmer simply wraps the key code blocks in transaction begin and transaction end, without worrying about the complexity of locking and unlocking; the hardware/software automatically manages conflict detection and rollback.
3.1.1 Example-1
- Producer-consumer relationship:
  - The producer puts tasks at the end of the work queue (tail); the consumer pulls tasks from the head of the work queue (head).
  - Disadvantage of locks: because updates to head and tail require a lock, the two threads (producer and consumer) cannot operate in parallel.
  - Advantages of transactions:
    - Enqueue and dequeue operations can proceed in parallel.
    - A transaction aborts only when the queue is nearly empty.
3.1.2 Example-2
- As shown in the figure below: if T1 changes only head and T2 changes only tail, both transactions can execute normally.
- As shown in the figure below: if T1 changes both head and tail while T2 changes only tail, then T2 cannot complete; that transaction rolls back (aborts) to its start state and retries.
3.1.3 Transactional Semantics
- Isolation
  While a transaction executes, no other transaction or thread can see its intermediate state or influence its operation. The transaction behaves as if it runs in a separate environment until it completes, at which point it commits its results to the shared system. This makes transactional operations intuitive and reliable for programmers.
- Atomicity
  All operations of a transaction (reads, writes, etc.) either all complete or none complete. This ensures that:
  - If a transaction is interrupted midway (e.g., by a conflict or system problem), no half-finished state is left behind.
  - A transaction either commits all of its operations successfully or rolls back all changes to the state before it began.
- Transaction failure and retry
  If problems occur during execution:
  - Conflict: e.g., another thread simultaneously modifies a shared variable the transaction is operating on.
  - System error: e.g., a resource is unavailable or a consistency requirement is not met.
  The transaction rolls back (aborts) to its start state and attempts to re-execute. This mechanism resolves data conflicts automatically and reduces the programmer's burden.
transaction begin       // the transaction begins
read shared variables   // read shared variables
arithmetic              // perform arithmetic or other logic
write shared variables  // modify shared variables
transaction end         // transaction ends (all operations commit if no conflict is detected)
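For a concrete flavor of this begin/end structure on real hardware, below is a hedged sketch using the Intel RTM intrinsics (`_xbegin`/`_xend`, shipped in Haswell; see 3.3.5). The queue variables and the `fallback_lock`/`fallback_unlock` helpers are assumptions for illustration, and real transactional code always needs a non-transactional fallback path because a hardware transaction may always abort:

```c
#include <immintrin.h>   // Intel RTM intrinsics; compile with -mrtm

// Assumed helpers, not part of RTM. (Production code would also read
// the fallback lock inside the transaction to stay mutually exclusive.)
extern void fallback_lock(void);
extern void fallback_unlock(void);

void enqueue(int *queue, int *tail, int item) {
    unsigned status = _xbegin();          // transaction begin
    if (status == _XBEGIN_STARTED) {
        queue[*tail] = item;              // speculative: buffered in the
        *tail += 1;                       // cache until the transaction ends
        _xend();                          // transaction end: commit
    } else {                              // transaction aborted
        fallback_lock();                  // fall back to an ordinary lock
        queue[*tail] = item;
        *tail += 1;
        fallback_unlock();
    }
}
```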
3.2 Transactional Memory (TM) Implementation
- The cache tracks each transaction's read-set and write-set.
- A transaction's writes become visible to other cores only when the transaction commits.
- If a conflict is detected at commit time, the other transaction may be rolled back (aborted).
3.2.1 Detecting Conflicts - Basic Implementation
Writes can be kept in the cache (not written directly to main memory):
- If the cache block must be evicted, mark it as an overflow (temporarily suspending the transaction).
- On transaction abort, invalidate the written cache lines.
Tracking read-sets and write-sets:
- For each transaction, bits are set in the cache to record its read-set and write-set.
Comparison when another transaction commits:
- Compare the committing transaction's write-set with the current transaction's read-set.
- If they overlap, the current transaction aborts.
Announcing intent at the end of a transaction:
- Broadcast the transaction's write-set.
- Multiple transactions can commit in parallel if their write-sets do not intersect (a conceptual sketch follows).
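A conceptual model (not a description of real hardware) of this bookkeeping: one rd-bit and one wr-bit per cache line, with a commit-time comparison of the committer's write-set against our read- and write-sets. All names and sizes are illustrative:

```c
#include <stdbool.h>

#define NUM_LINES 512   // illustrative cache size, in lines

typedef struct {
    bool rd[NUM_LINES];   // read-set:  lines this transaction has read
    bool wr[NUM_LINES];   // write-set: lines this transaction has written
} tx_state;

// When another transaction commits, it broadcasts its write-set; we
// must abort if that write-set overlaps our read-set (or write-set).
bool must_abort(const tx_state *mine, const bool committed_wr[NUM_LINES]) {
    for (int i = 0; i < NUM_LINES; i++)
        if (committed_wr[i] && (mine->rd[i] || mine->wr[i]))
            return true;
    return false;
}
```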
3.2.2 Summary of Transaction Memory Benefits
Programming simplicity (the advantage of coarse-grained locks):
- Coarse-grained locks: protecting large code regions with a few locks is simple to program, but offers low concurrency.
- Transactional memory: provides simplicity comparable to coarse-grained locking; the programmer just divides code into transaction blocks (begin and end), without worrying about fine-grained synchronization, which reduces programming complexity.
High performance (the advantage of fine-grained locks):
- Fine-grained locks: separate locks on different data or code blocks improve concurrency, but programming them is complex and error-prone.
- Transactional memory: by tracking read- and write-sets and deferring writes (they take effect only at commit), TM approaches the efficiency of fine-grained locking and reduces lock-contention problems.
Avoiding Deadlocks:
- Causes of Deadlocks: Multiple threads hold different locks and wait for each other to release resources, creating a circular wait.
- How TM avoids deadlocks: transactions automatically roll back and retry when a conflict is detected, breaking the wait cycle and avoiding deadlock.
3.3 Design space
Data Versioning:
- Eager version (Eager): based on undo logs
- Lazy version (Lazy): based on write buffer
Conflict Detection (Conflict Detection):
- Optimistic detection: check for conflicts at commit time (optimistically continue during transaction execution)
- Pessimistic detection (Pessimistic detection): check for conflicts with every read/write operation (reduces workload on commits)
3.3.1 "Lazy" implementation
Based on a snooping protocol for small multiprocessors:
- For small-scale multiprocessor systems, transactional memory is implemented on top of a bus-snooping cache coherence protocol.
"Lazy" Versioning and "Lazy" Conflict Detection
- Optimism that the conflict can eventually be resolved.
- Defer conflict issues until the transaction commits to check and resolve conflicts.
Transaction parallel commits are not supported
- Multiple transactions cannot complete the commit operation at the same time and need to be committed one by one in sequence.
When a transaction issues a read request: if the block is not already in the cache, fetch it in read-only mode and set the rd-bit (read bit).
When a transaction issues a write request: if the block is not already in the cache, fetch it in read-only mode, set the wr-bit (write bit), and modify the data in the cache.
If a cache line with the wr-bit set is evicted, the transaction must abort, or some software mechanism must preserve the overflowed data.
When a transaction reaches its end, it must make its writes permanent: writes are committed only in the final stage of the transaction.
A central arbiter coordinates commits (easy to implement in bus-based systems):
- The winning transaction occupies the bus until it has broadcast the addresses of all cache lines it needs to write (this is the transaction's commit).
- There is no need to write back to main memory immediately; it suffices to invalidate the other copies of those cache lines.
When another transaction (which has not yet begun committing) sees a cache line in its read-set get invalidated:
- The transaction realizes it has lost atomicity.
- It aborts (clearing its read and write bits) and restarts execution.
3.3.2 Summary of "Lazy" implementation
"Lazy" version control:
- implementation method: Transactions do not directly modify the global shared data ("master copy") during execution, but instead save the changes in the local cache.
- vantage: Reduce frequent modifications of global data to avoid unnecessary conflicts.
- affect (usually adversely): The commit phase requires a one-time synchronization of local changes to the master replica, resulting in a high commit overhead.
"Lazy" conflict detection:
- implementation method: Conflicts are not actively checked during the execution of a transaction, but only at the end of the transaction when it is committed.
- vantage: Reduce the frequency of conflict detection and optimize the speed of transaction execution.
- affect (usually adversely): If multiple transactions conflict, conflict handling (e.g., rollback) is triggered at commit time, which may affect commit efficiency.
Fast rollback:
- Rollback steps:
  - Clear the transaction bits in the cache.
  - Flush the pipeline to discard affected instructions.
  - Restore the previously saved register checkpoint.
- Advantage: rollback is clean and efficient, keeping the cost of transaction failure low.
Slow commit:
- Reasons:
  - The commit phase must perform conflict detection to ensure local changes do not conflict with other transactions.
  - Coherence updates for the writes (e.g., notifying other processors to invalidate their copies) are deferred to commit time.
- Drawback: execution itself is fast, but the commit phase can become a performance bottleneck.
No deadlock risk:
- Reason: in bus contention, the first transaction to win the bus commits first, breaking any potential circular wait.
- Result: the deadlock problem of traditional locking is avoided.
Starvation:
- Cause: a transaction may fail to commit for a long time because it is repeatedly preempted by others (e.g., it never wins the bus).
- Solution: starvation-avoidance mechanisms are needed, such as priority escalation or round-robin arbitration, to ensure every transaction eventually commits.
3.3.3 "Eager" realization
Writes take effect immediately:
- In an "eager" implementation, writes update memory immediately rather than being committed together at the end of the transaction.
Reads by other transactions return the latest value:
- If another transaction reads the location in the meantime, it gets the newly written value (and memory may be updated to it as well).
Old values are kept for rollback:
- To avoid losing the old data if the transaction aborts due to a conflict, the old value is saved to a log before each write.
- Log storage: the log lives in virtual memory and can be manipulated in the cache, so the performance overhead is small.
This method is called eager versioning.
Because transaction A's writes take effect immediately, a read or write in another transaction B may miss and be redirected to transaction A.
At that point a conflict is detected (since neither transaction has reached its end, this is "eager" conflict detection): two transactions are operating on the same cache line, and at least one of them is writing.
One solution: the requester stalls. Transaction A sends a NACK to transaction B; B waits and retries, hoping A will commit and hand over the up-to-date cache line.
- Result: neither transaction needs to abort.
This can cause deadlock: each transaction waits for the other to complete.
A separate (hardware/software) contention manager is needed to detect these deadlocks and force one of the transactions to abort.
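A minimal sketch of eager versioning with an undo log (all names and the fixed log size are assumptions): writes update memory in place after saving the old value; abort replays the log in reverse, which is why aborts are the expensive path (see 3.3.4):

```c
typedef struct { int *addr; int old_val; } log_entry;

typedef struct {
    log_entry entries[1024];   // the log lives in virtual memory, so it
    int       len;             // is not limited by the cache size
} undo_log;

void tx_write(undo_log *log, int *addr, int new_val) {
    log->entries[log->len++] = (log_entry){ addr, *addr };  // save old value
    *addr = new_val;                                        // eager in-place update
}

void tx_abort(undo_log *log) {        // expensive: walk the log backwards
    for (int i = log->len - 1; i >= 0; i--)
        *log->entries[i].addr = log->entries[i].old_val;
    log->len = 0;
}

void tx_commit(undo_log *log) {       // cheap: just discard the log
    log->len = 0;
}
```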
3.3.4 Summary of "Eager" implementation
Inter-transaction read/write conflicts and stalls:
- In an eager implementation, writes take effect in place. When Tr-B attempts a write while Tr-A has already read the value and does not want its cache line invalidated, Tr-B may have to wait for Tr-A to complete before continuing. This dependency can cause transactions to stall, especially under high concurrency.
Starvation:
- Starvation occurs when a transaction cannot make progress because its resources are constantly taken by other transactions. Here, if new reading transactions (like Tr-A) keep arriving and Tr-B must keep waiting, Tr-B may starve because it can never obtain the resources it needs. Solving this usually requires extra hardware or software mechanisms to guarantee fairness in resource allocation.
Log Storage and Transaction Size:
- In the eager implementation, the transaction's log is stored in virtual memory (and manipulated through the cache). Transaction logs are therefore not limited by the physical cache size, so larger transactions can be handled: virtual memory provides a larger space in which transaction data is not lost to cache overflow.
Cost of commit and abort:
- Commit is relatively cheap: the transaction's data is already in its final storage location, so no extra processing steps are needed.
- Abort, however, is expensive: the previous state and data must be restored from the log to keep the system consistent. Aborts are relatively rare, but they cost significant time and computational resources.
3.3.5 Discussion
- "Eager" optimizes common situations and does not waste energy when conflicts may occur
- Transactional memory implementations require relatively low hardware support
- Multiple commercial implementation examples: Sun Rock, AMD ASF, IBM BG/Q, Intel Haswell
- Transactions are usually short - more than 95% of transactions will be fully adapted for caching
- Transactions are usually successfully committed - less than 10% of transactions are aborted
- 99.9% of transactions do not perform I/O operations
- Applying Amdahl's Law again: optimizing common situations!