A node contains the compute (CPU) and the state (memory, disk). It is either a standalone machine or a running process (a member of the distributed system).
The compute is the processing power that does the actual work: the CPU.
The state is wherever data is held: memory, disk, or a database.
Distributing a system means splitting the compute (CPU), the state (memory, disk), or both (the node) across multiple machines that communicate asynchronously over unreliable communication channels (channels that can fail).
The state is offloaded.
The machines do not hold the state themselves; it persists elsewhere and they access it there. For example, a web application runs on many servers, but they all access the same database or database cluster.
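A minimal sketch of the offloaded-state model, assuming a hypothetical `SharedDatabase` (a plain dict standing in for the real database) and hypothetical `StatelessServer` instances behind a round-robin load balancer. Because no server keeps anything between requests, any server can handle any request.

```python
import itertools


class SharedDatabase:
    """Hypothetical stand-in for the one database (or cluster) all servers use."""

    def __init__(self):
        self.carts = {}  # user -> list of items; the only place state lives


class StatelessServer:
    """A web server that keeps no per-user data between requests."""

    def __init__(self, name, db):
        self.name = name
        self.db = db

    def add_to_cart(self, user, item):
        # Read the current state from the shared database, update it, write it back.
        cart = self.db.carts.get(user, [])
        self.db.carts[user] = cart + [item]
        return self.name  # report which server handled the request


db = SharedDatabase()
servers = [StatelessServer(f"web-{i}", db) for i in range(3)]
load_balancer = itertools.cycle(servers)  # round-robin: each request may hit a different server

handled_by = [next(load_balancer).add_to_cart("alice", item) for item in ["book", "pen", "lamp"]]
print(handled_by)         # ['web-0', 'web-1', 'web-2']
print(db.carts["alice"])  # ['book', 'pen', 'lamp']
```

Three different servers handled the requests, yet the cart is complete, because the state never lived on the servers.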
Every node holds a state.
Each machine holds its own state. For example, suppose we have Node A and Node B:
With this model we trade off availability: if one machine is down, we no longer have the whole state. To mitigate this, the question becomes how much the company is willing to pay for higher availability, for example by replicating each machine as many times as needed.
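A toy illustration of this trade-off, assuming a hypothetical `Cluster` class in which each key is placed on a node chosen by a simple byte-sum checksum (an assumption for determinism, not a real sharding scheme). With one copy per key, losing Node A loses part of the state; with two copies, every key survives.

```python
class Cluster:
    """Toy key-value cluster where each node holds its own share of the state."""

    def __init__(self, node_names, replicas=1):
        self.order = list(node_names)
        self.storage = {name: {} for name in self.order}
        self.down = set()
        self.replicas = replicas

    def owners(self, key):
        # Assumed placement: a byte-sum checksum picks the first node,
        # and copies go on the next `replicas - 1` nodes.
        start = sum(key.encode()) % len(self.order)
        return [self.order[(start + i) % len(self.order)] for i in range(self.replicas)]

    def put(self, key, value):
        # Simplification: writes are only issued while every node is up.
        for name in self.owners(key):
            self.storage[name][key] = value

    def is_available(self, key):
        return any(name not in self.down for name in self.owners(key))

    def get(self, key):
        for name in self.owners(key):
            if name not in self.down:
                return self.storage[name][key]
        raise RuntimeError(f"{key!r} is unavailable: every node holding it is down")


single = Cluster(["A", "B"], replicas=1)
replicated = Cluster(["A", "B"], replicas=2)
for cluster in (single, replicated):
    cluster.put("user:1", "alice")
    cluster.put("user:2", "bob")
    cluster.down.add("A")  # Node A crashes

print(single.is_available("user:1"), single.is_available("user:2"))  # False True
print(replicated.get("user:1"), replicated.get("user:2"))            # alice bob
```

The replicated cluster stays available only because every key is stored twice, which is exactly the extra cost the company has to pay for.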
Suppose we have 6 nodes connected through a network. A network partition happens when a subset of nodes is cut off from the other nodes: each group believes it is the only set of nodes still alive.
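A small sketch of that scenario, assuming the 6 nodes start fully connected and the network then drops every link between nodes {1, 2, 3} and nodes {4, 5, 6}. A breadth-first search from any node shows that it can only see its own side, so both sides look "complete" from the inside.

```python
from collections import deque
from itertools import combinations


def reachable(start, links):
    """Breadth-first search: every node `start` can still reach over the surviving links."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for a, b in links:
            if a == node:
                neighbor = b
            elif b == node:
                neighbor = a
            else:
                continue
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


nodes = range(1, 7)
links = set(combinations(nodes, 2))  # every node connected to every other node
side_1 = {1, 2, 3}
# The network splits: drop every link that crosses between the two sides.
partitioned = {(a, b) for a, b in links if (a in side_1) == (b in side_1)}

print(sorted(reachable(1, partitioned)))  # [1, 2, 3]
print(sorted(reachable(4, partitioned)))  # [4, 5, 6]
```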
In the CAP theorem, C stands for Consistency, A for Availability, and P for Partition tolerance.
Can a system guarantee all three at once? Of course not; the right choice depends on the use case. The key is deciding what to trade off: engineering is the art of making trade-offs and communicating them to business owners.
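The trade-off can be made concrete with a toy replicated value. A minimal sketch, assuming two replicas and a hypothetical `ReplicatedRegister` class with a `mode` switch: during a partition, the "CP" register refuses writes it cannot apply to both copies (losing availability), while the "AP" register accepts them locally and lets the copies diverge (losing consistency). Real systems make this choice in more nuanced ways than this.

```python
class ReplicatedRegister:
    """A single value stored on two replicas that normally stay in sync."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.values = [None, None]  # one copy per replica
        self.partitioned = False

    def write(self, replica, value):
        """Write through `replica`; returns False if the write was refused."""
        if not self.partitioned:
            self.values = [value, value]  # both replicas reachable: update both
            return True
        if self.mode == "CP":
            return False  # cannot reach the other copy: refuse (give up availability)
        self.values[replica] = value  # accept locally (give up consistency)
        return True

    def read(self, replica):
        return self.values[replica]


cp, ap = ReplicatedRegister("CP"), ReplicatedRegister("AP")
for reg in (cp, ap):
    reg.write(0, "v1")
    reg.partitioned = True

print(cp.write(0, "v2"), cp.read(0), cp.read(1))  # False v1 v1
print(ap.write(0, "v2"), ap.read(0), ap.read(1))  # True v2 v1
```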
It is more important to know how to recover from a failure (to be recoverable) than to try to be fault tolerant against every possible failure.
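One common way to make a node recoverable is to record every change in an append-only log and rebuild the state by replaying it after a crash. A minimal sketch, assuming a hypothetical `RecoverableStore` whose log is an in-memory list of JSON lines (a real system would write each entry to disk and flush it before applying it); this is an illustration, not the method the notes prescribe.

```python
import json


class RecoverableStore:
    """Key-value store whose state can be rebuilt by replaying its append-only log."""

    def __init__(self, log=None):
        self.data = {}
        self.log = []
        for line in log or []:  # recovery: replay every entry in order
            self._apply(json.loads(line))
            self.log.append(line)

    def _apply(self, entry):
        if entry["op"] == "set":
            self.data[entry["key"]] = entry["value"]
        elif entry["op"] == "delete":
            self.data.pop(entry["key"], None)

    def _record(self, entry):
        # Assumption: a real system would persist this line durably before applying it.
        self.log.append(json.dumps(entry))
        self._apply(entry)

    def set(self, key, value):
        self._record({"op": "set", "key": key, "value": value})

    def delete(self, key):
        self._record({"op": "delete", "key": key})


store = RecoverableStore()
store.set("balance:alice", 100)
store.set("balance:bob", 50)
store.delete("balance:bob")

# Simulate a crash: the in-memory dict is lost, only the log survives.
recovered = RecoverableStore(list(store.log))
print(recovered.data)  # {'balance:alice': 100}
```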
Suppose we have two armies (Army 1 and Army 2) that need to invade a country, but they must attack together, at the same time, to succeed.
They must communicate to agree on the time to invade. We assume the messenger is trustworthy and that any message that is received is authentic and true; the only risk is that a messenger never arrives.
Army 1 sends a messenger with a proposed time to invade. If Army 2 receives the message and agrees on the time, it sends another messenger back to confirm the invasion time.
The problem: from Army 1's point of view, it sent the messenger and must either assume Army 2 received the message and will attack on time, or wait for Army 2's confirmation. From Army 2's point of view, it confirmed the time and sent a messenger back, but it does not know whether Army 1 received the confirmation. Do we make Army 1 send another messenger to confirm the confirmation? Then we are stuck in a loop: agreeing on the time would require sending and receiving an infinite number of messages. This type of problem is called consensus (common knowledge), and reaching it over this channel would require an infinite number of messages (the assurance problem). There is no solution that guarantees agreement 100% of the time.
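This scenario is commonly known as the Two Generals' Problem. The sketch below checks the key argument, assuming messengers alternate (Army 1 sends message 0, Army 2 replies with message 1, and so on) and that the chain stops at the first captured messenger. For any fixed number of rounds, the army that sent the final messenger observes exactly the same thing whether that messenger arrived or was captured, so one more confirmation only moves the doubt to the other side. The function names are hypothetical.

```python
def carrier(i):
    """Messenger i is sent by Army 1 when i is even and by Army 2 when i is odd."""
    return 1 if i % 2 == 0 else 2


def messages_received(army, delivered):
    """How many messengers `army` received when only the first `delivered` arrived.
    After a capture the chain stops: nobody replies to a message they never got."""
    return sum(1 for i in range(delivered) if carrier(i) != army)


def last_sender_can_tell(rounds):
    """Can the army that sent the final messenger tell whether it arrived?"""
    sender = carrier(rounds - 1)
    all_arrived = messages_received(sender, rounds)        # final messenger arrived
    last_captured = messages_received(sender, rounds - 1)  # final messenger captured
    return all_arrived != last_captured  # different observations would let it tell


for rounds in range(1, 6):
    print(rounds, last_sender_can_tell(rounds))  # False for every round count
```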
This is the same as the previous problem, but now every message is guaranteed to be received. Instead, a message may be corrupted, the messenger may be disloyal, or one of the armies may actively send wrong information. This is a gray failure: the army (or node) is still running, but it sends wrong, corrupted, or compromised messages, or its messenger arrives late.
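This variant is commonly called the Byzantine Generals' Problem. Tolerating a participant that lies needs extra redundancy: with unsigned messages, tolerating f traitors requires at least 3f + 1 participants. A minimal sketch of the one-round oral-messages algorithm (OM(1)) from Lamport, Shostak, and Pease, assuming exactly one commander, three lieutenants, at most one traitor, and an agreed tie-breaker of "retreat"; the names `om1`, `majority`, and `DEFAULT` are hypothetical.

```python
from collections import Counter

DEFAULT = "retreat"  # assumed tie-breaker that every loyal lieutenant agrees on in advance


def majority(votes):
    """Most common vote; ties fall back to DEFAULT so loyal lieutenants break ties the same way."""
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return DEFAULT
    return ranked[0][0]


def om1(orders, traitors=frozenset(), lie=lambda value, sender, receiver: value):
    """One commander, three lieutenants, at most one traitor.

    orders: what the commander tells each lieutenant (a traitorous commander may send different orders).
    traitors: lieutenants that relay false orders, produced by `lie`.
    Returns the decision of every loyal lieutenant.
    """
    decisions = {}
    for me in orders:
        if me in traitors:
            continue
        votes = [orders[me]]  # what the commander told me directly
        for other in orders:
            if other == me:
                continue
            relayed = orders[other]  # what `other` claims the commander said
            if other in traitors:
                relayed = lie(relayed, other, me)
            votes.append(relayed)
        decisions[me] = majority(votes)
    return decisions


# Loyal commander; lieutenant L3 is a traitor that always relays "retreat".
loyal_commander = om1({"L1": "attack", "L2": "attack", "L3": "attack"},
                      traitors={"L3"}, lie=lambda v, s, r: "retreat")
print(loyal_commander)  # {'L1': 'attack', 'L2': 'attack'}

# Traitorous commander sends conflicting orders; all lieutenants are loyal.
traitor_commander = om1({"L1": "attack", "L2": "retreat", "L3": "attack"})
print(traitor_commander)  # {'L1': 'attack', 'L2': 'attack', 'L3': 'attack'}
```

In both runs the loyal lieutenants end up with the same decision, and when the commander is loyal they follow its order.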
In a distributed system, each node has its own state, and all nodes must agree on one state (consensus) even though the communication channels between them are unreliable.
There is a class of algorithms specialized in solving the consensus problem, for example Paxos and Raft.
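To give a flavor of how such algorithms work, here is a heavily simplified sketch of one ingredient of Raft: leader election by majority vote, where each node grants at most one vote per term. It deliberately omits the log up-to-date check, election timeouts, and the actual RPCs, and the class and function names are hypothetical. Because a winner needs a strict majority of the whole cluster, two candidates can never both win the same term, and during a partition only the side holding a majority can elect a leader.

```python
class RaftNode:
    """Only the voting rule from Raft leader election; logs, timeouts and RPCs are omitted."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.current_term = 0
        self.voted_for = None

    def request_vote(self, term, candidate_id):
        if term < self.current_term:
            return False  # stale candidate from an older term
        if term > self.current_term:
            self.current_term = term  # newer term: forget the old vote
            self.voted_for = None
        if self.voted_for in (None, candidate_id):
            self.voted_for = candidate_id  # at most one vote per term
            return True
        return False


def run_election(candidate_id, term, reachable_nodes, cluster_size):
    """A candidate wins only with votes from a strict majority of the whole cluster."""
    votes = sum(node.request_vote(term, candidate_id) for node in reachable_nodes)
    return votes >= cluster_size // 2 + 1


nodes = {i: RaftNode(i) for i in range(5)}
# Two candidates start elections in the same term; each reaches only some nodes.
first = run_election(0, 1, [nodes[0], nodes[1], nodes[2]], cluster_size=5)
second = run_election(3, 1, [nodes[3], nodes[4], nodes[2]], cluster_size=5)
print(first, second)  # True False
```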