Linux Forum Index  »  Linux Development - Applications  »  a question on synchronizing distributed servers...
Page 1 of 1    

a question on synchronizing distributed servers...

lmike...
Posted: Mon Oct 19, 2009 4:34 pm
A "client" performs just two operations on a "Database": it can "add"
data and "get" data.

The "client" and "database" are connected by n (where n >= 2)
"servers". See this chart:

client <-----> server 1 <-----> Database
  ^                                ^
  |                                |
  |----------> server 2 <----------|
  |                                |
  |                                |
  |----------> server 3 <----------|
 ...                              ...


The assumptions:

--The client's add/get messages are randomly dispatched to server 1,
2, or 3. These servers run in parallel.
--Each server forwards each add/get message it receives to the Database,
and forwards the database response (success/failure) back to the client.
--Add and get operations on the Database side are atomic with respect
to each other.
--The servers can communicate with each other.


The question:

--Can I design a solution in which each server maintains its own cache,
so that whenever there is an "add" request from the client, the server
stores the data in its cache as well as in the database, and whenever
there is a "get" request from the client, the server first looks in
its cache and, on a hit, returns the cached data directly to the
client without forwarding the "get" request to the database?

The issue is how to design a synchronization mechanism between those
servers so that the servers' caches are kept in sync.

I think that this might be a traditional parallel programming issue,
but I am not aware of a solution. Does anyone know of one?
BTW, I'd like not to have one centralized "cache server" to cache the
data.

Thank you
Mike
 
David Schwartz...
Posted: Mon Oct 19, 2009 6:05 pm
On Oct 19, 7:34 pm, lmike <lmike3... at (no spam) gmail.com> wrote:

Quote:
The question:

--Can I design a solution in which each server maintains its own cache,
so that whenever there is an "add" request from the client, the server
stores the data in its cache as well as in the database, and whenever
there is a "get" request from the client, the server first looks in
its cache and, on a hit, returns the cached data directly to the
client without forwarding the "get" request to the database?

Yes, you can. But how it will work depends upon your design
requirements.

Quote:
The issue is how to design a synchronization mechanism between those
servers so that the servers' caches are kept in sync.

I think that this might be a traditional parallel programming issue,
but I am not aware of a solution. Does anyone know of one?
BTW, I'd like not to have one centralized "cache server" to cache the
data.

How often is data modified? How well is stale data tolerated? About
how many servers do you plan to have?

Do you expect the same server to use the same object over and over?

If data is rarely modified, then each server can cache as much data as
it wants. When a server modifies data, have it broadcast an
"invalidate" to all other servers.
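A minimal in-memory sketch of the invalidate-on-modify scheme, in Python. It assumes every server holds direct references to its peers and that the invalidate is delivered synchronously; over a real network the message can arrive late, which is the race discussed later in this thread. The names `Database`, `Server`, `modify` and `invalidate` are hypothetical.

```python
class Database:
    """Stand-in for the shared database: a dict whose get/put are atomic."""

    def __init__(self):
        self.rows = {}

    def put(self, key, value):
        self.rows[key] = value

    def get(self, key):
        return self.rows.get(key)


class Server:
    def __init__(self, name, db):
        self.name = name
        self.db = db
        self.cache = {}
        self.peers = []  # the other Server objects, wired up after construction

    def get(self, key):
        # Serve from the local cache on a hit; otherwise read through to the DB.
        if key not in self.cache:
            self.cache[key] = self.db.get(key)
        return self.cache[key]

    def modify(self, key, value):
        # Write the database first, then refresh our own copy and tell every
        # peer to drop its (possibly stale) copy.
        self.db.put(key, value)
        self.cache[key] = value
        for peer in self.peers:
            peer.invalidate(key)

    def invalidate(self, key):
        self.cache.pop(key, None)


db = Database()
servers = [Server(f"server {i}", db) for i in (1, 2, 3)]
for s in servers:
    s.peers = [p for p in servers if p is not s]

db.put("x", "old")
servers[1].get("x")            # server 2 now caches "old"
servers[0].modify("x", "new")  # server 1 modifies and broadcasts the invalidate
print(servers[1].get("x"))     # prints "new": server 2's stale copy was dropped
```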

If data that's, say, up to a minute old is well tolerated, then you
can simply let each cached entry expire after a minute.
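A sketch of the time-limited cache idea, assuming each server keeps its own cache and no invalidate messages are sent at all. `TTLCache` and `cached_get` are hypothetical names; the clock is injectable only so the expiry can be shown without waiting a minute.

```python
import time


class TTLCache:
    """Per-server cache whose entries expire `ttl` seconds after being stored.
    Assumes stored values are never None (None is used to signal a miss)."""

    def __init__(self, ttl, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock   # injectable so the expiry can be demonstrated
        self.entries = {}    # key -> (value, expiry time)

    def put(self, key, value):
        self.entries[key] = (value, self.clock() + self.ttl)

    def get(self, key):
        item = self.entries.get(key)
        if item is None:
            return None
        value, expires = item
        if self.clock() >= expires:
            del self.entries[key]  # too old: force a fresh database read
            return None
        return value


def cached_get(cache, db, key):
    """Read-through lookup: use the cache if the entry is fresh, else the DB."""
    value = cache.get(key)
    if value is None:
        value = db[key]
        cache.put(key, value)
    return value


now = [0.0]                                     # fake clock, in seconds
cache = TTLCache(ttl=60, clock=lambda: now[0])
db = {"x": "old"}

first = cached_get(cache, db, "x")   # "old", now cached until t = 60
db["x"] = "new"                      # changed via another server; no invalidate sent
second = cached_get(cache, db, "x")  # still "old": stale, but under a minute old
now[0] = 61
third = cached_get(cache, db, "x")   # "new": the cached entry expired
```

The trade-off is that staleness is bounded by the TTL rather than eliminated, but the servers never need to talk to each other.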

If there aren't that many servers and the database is really
expensive, it may be better to broadcast a request to all servers and
let any server that has current data serve it out of its cache.
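One way to read the "ask the other servers first" idea, as an in-process sketch: a server that misses locally asks each peer, and only reads the database if no peer has the key. It assumes the peers' caches are already kept current by some other means (for example the invalidate broadcast DS describes). A real server would query the peers over the network, ideally in parallel; here they are plain method calls, and `db_reads` is a hypothetical counter added only to show the saving.

```python
class Server:
    """Hypothetical server whose cache is assumed to be kept current by other
    means (e.g. invalidate broadcasts), so any entry it holds can be served."""

    def __init__(self, db):
        self.db = db          # shared dict standing in for the expensive database
        self.cache = {}
        self.peers = []
        self.db_reads = 0     # how often we had to fall back to the database

    def get(self, key):
        # 1. Our own cache.
        if key in self.cache:
            return self.cache[key]
        # 2. Ask the peers; take the value from the first one that has it.
        for peer in self.peers:
            if key in peer.cache:
                self.cache[key] = peer.cache[key]
                return self.cache[key]
        # 3. Nobody has it: pay for the database read.
        self.db_reads += 1
        self.cache[key] = self.db[key]
        return self.cache[key]


db = {"x": 42}
a, b, c = Server(db), Server(db), Server(db)
for s in (a, b, c):
    s.peers = [p for p in (a, b, c) if p is not s]

a.get("x")  # miss everywhere: one database read
b.get("x")  # served from a's cache, no database read
c.get("x")  # served from a peer's cache, no database read
```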

There are more algorithms for this than you can shake a stick at, and
the correct one depends on how much locality of reference you expect,
how many servers you have, your operation mix (particularly read/write
ratios), and how 'fresh' data needs to be.

DS
 
Jasen Betts...
Posted: Tue Oct 20, 2009 3:59 am
On 2009-10-20, lmike <lmike3000 at (no spam) gmail.com> wrote:

illegible ascii art deleted.

Quote:
The assumptions:

--The client's add/get messages are randomly dispatched to server 1,
2, or 3. These servers run in parallel.
--Each server forwards each add/get message it receives to the Database,
and forwards the database response (success/failure) back to the client.
--Add and get operations on the Database side are atomic with respect
to each other.
--The servers can communicate with each other.

The question:

--Can I design a solution in which each server maintains its own cache,
so that whenever there is an "add" request from the client, the server
stores the data in its cache as well as in the database, and whenever
there is a "get" request from the client, the server first looks in
its cache and, on a hit, returns the cached data directly to the
client without forwarding the "get" request to the database?

If an add request cannot invalidate cached get responses, that could
work quite easily.

If it can, you'll also need to forward the add to the other servers so
they can react appropriately (e.g. drop the cache records that have
been invalidated).
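A sketch of the easy case Jasen describes, assuming records are only ever added and never changed afterwards (`InsertOnlyServer` is a hypothetical name). A cached record can then never go stale, so the servers need no messages at all. The one response an add *can* invalidate is a cached "not found", so the sketch deliberately does not cache misses.

```python
class InsertOnlyServer:
    """Write-through cache for a store whose records are immutable once added."""

    def __init__(self, db):
        self.db = db      # shared dict standing in for the database
        self.cache = {}

    def add(self, key, value):
        if key in self.db:
            raise KeyError(f"{key!r} already exists; records cannot be changed")
        self.db[key] = value
        self.cache[key] = value

    def get(self, key):
        if key in self.cache:
            return self.cache[key]   # always current: records never change
        if key not in self.db:
            return None              # do NOT cache the miss; a later add would invalidate it
        self.cache[key] = self.db[key]
        return self.cache[key]


db = {}
s1, s2 = InsertOnlyServer(db), InsertOnlyServer(db)
missing = s2.get("k")   # None, and nothing is cached
s1.add("k", 42)         # no message to s2 is needed
found = s2.get("k")     # 42, read through from the database
```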
 
lmike...
Posted: Tue Oct 20, 2009 3:05 pm
On Oct 19, 11:05 pm, David Schwartz <dav... at (no spam) webmaster.com> wrote:
Quote:
How often is data modified? How well is stale data tolerated? About
how many servers do you plan to have?


First of all, I am sorry: when I said "add" in my previous message, I
really meant "add" or "modify", so there can actually be three actions
from the client: add, modify, and get.

-The data is modified infrequently, and the amount of data is not
big, so I don't need to worry about traffic, storage space, etc.

-The client is supposed to see the modified data immediately if he/she
does a "get" right after a "modify".

-There will be fewer than 10 (say, 3) servers, each running on a
separate box.


Quote:
Do you expect the same server to use the same object over and over?


I am not quite sure what you meant by "object" here, but the
add/modify/get messages go randomly to those servers. All the servers
always connect to the same client and the same database.

Quote:
If data is rarely modified, then each server can cache as much data as
it wants. When a server modifies data, have it broadcast an
"invalidate" to all other servers.

If data that's, say, up to a minute old is well-tolerated, then you
can cache data for only a minute.

If there aren't that many servers and the database is really
expensive, it may be better to broadcast a request to all servers and
let any server that has current data serve it out of its cache.

There are more algorithms for this than you can shake a stick at, and
the correct one depends on how much locality of reference you expect,
how many servers you have, your operation mix (particularly read/write
ratios), and how 'fresh' data needs to be.

DS

I am thinking that for the add operation, it probably works ok: once
the server gets the response from the database saying that the add is
a success, it will broadcast the data to other servers, for them to
add to their cache.
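lmike's scheme for adds can be sketched as follows, assuming the database add reports success or failure (modelled here by a hypothetical `db_add` that rejects duplicate keys) and that peers are reachable by direct method calls rather than over a network. No cache is touched until the database has accepted the write.

```python
class Server:
    def __init__(self, name, db):
        self.name = name
        self.db = db       # shared dict standing in for the database
        self.cache = {}
        self.peers = []

    def db_add(self, key, value):
        """Hypothetical database call: fails if the key already exists."""
        if key in self.db:
            return False
        self.db[key] = value
        return True

    def add(self, key, value):
        # Update caches only after the database reports success.
        if not self.db_add(key, value):
            return False
        self.cache[key] = value
        for peer in self.peers:
            peer.on_peer_add(key, value)   # push the new record to the others
        return True

    def on_peer_add(self, key, value):
        self.cache[key] = value


db = {}
servers = [Server(f"server {i}", db) for i in (1, 2, 3)]
for s in servers:
    s.peers = [p for p in servers if p is not s]

ok = servers[0].add("k", "v")    # True: every server now caches "v"
dup = servers[1].add("k", "w")   # False: rejected by the database, no cache changes
```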

The problem may be with the modify: when server 1 receives a "modify",
it can broadcast an invalidate message to the other servers. The issue
is that a "get" sent right after the "modify" may race with that
"invalidate" message from server 1. If the "get" reaches server 2
before the "invalidate" from server 1 does, server 2 may find the old
data in its cache and return it to the client. From the client's point
of view the result is then wrong: he/she did a "get" right after a
"modify", but got the old data instead of the newly modified data.

I think that such a race condition will always exist. I now doubt
there is a solution to this problem, given all the requirements as
said above. Am I correct?
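The race can be reproduced deterministically by replaying the messages in different orders. This is a toy model, not a network simulation: the event names are hypothetical, server 2 is assumed to have cached the old value earlier, and the client's get is issued only after it has received the reply to its modify.

```python
def run(order):
    """Replay one ordering of the events; return the value the client's get sees."""
    db = {"x": "old"}
    server2_cache = {"x": "old"}   # server 2 cached the old value earlier
    seen = None
    for event in order:
        if event == "modify at server 1":
            db["x"] = "new"                  # DB updated; client gets its reply
        elif event == "invalidate at server 2":
            server2_cache.pop("x", None)     # server 1's broadcast arrives
        elif event == "get at server 2":
            # Cache hit if the old entry is still there, else read the DB.
            seen = server2_cache.get("x", db["x"])
    return seen


stale = run(["modify at server 1", "get at server 2", "invalidate at server 2"])
fresh = run(["modify at server 1", "invalidate at server 2", "get at server 2"])
print(stale, fresh)  # prints: old new
```

In both orderings the get is sent after the modify completed; only the arrival time of the invalidate differs, and that alone decides whether the client sees stale data.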
 
lmike...
Posted: Tue Oct 20, 2009 3:07 pm
On Oct 20, 4:59 am, Jasen Betts <ja... at (no spam) xnet.co.nz> wrote:
Quote:
If an add request cannot invalidate cached get responses, that could
work quite easily.

If it can, you'll also need to forward the add to the other servers so
they can react appropriately (e.g. drop the cache records that have
been invalidated).

Thanks for the response. I think there is a problem with the
"modify" operation; please see my post to DS.
 
David Schwartz...
Posted: Tue Oct 20, 2009 5:05 pm
On Oct 20, 6:05 pm, lmike <lmike3... at (no spam) gmail.com> wrote:

Quote:
The problem may be with the modify: when server 1 receives a "modify",
it can broadcast an invalidate message to the other servers. The issue
is that a "get" sent right after the "modify" may race with that
"invalidate" message from server 1. If the "get" reaches server 2
before the "invalidate" from server 1 does, server 2 may find the old
data in its cache and return it to the client. From the client's point
of view the result is then wrong: he/she did a "get" right after a
"modify", but got the old data instead of the newly modified data.

If nothing forces the get to arrive after the modify, there is no
reason the get needs to get the modified data.

To use an analogy, suppose I send someone a letter changing
something, and you send them a letter asking for the current value.
Since neither of us did anything special to ensure our letter got
there first, either the old value or the new value is a correct
answer, regardless of which letter actually arrives first.

It is only a violation of a guarantee if someone ensures things happen
in a particular order and you don't perform them in that order. You do
not have to respect an order that "happens to happen" by chance.
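One common way to let a client *ensure* such an ordering is a version token (often called a read-your-writes or session guarantee; it is a swapped-in technique, not something proposed in this thread). The sketch assumes the database keeps a per-key version number and that the client passes back the version it received with its "modify" ACK. All names are hypothetical. A get without a token may still be answered from a stale cache, which, per the argument above, is allowed.

```python
class Database:
    """Shared store keeping (version, value) per key."""

    def __init__(self):
        self.rows = {}

    def modify(self, key, value):
        version = self.rows.get(key, (0, None))[0] + 1
        self.rows[key] = (version, value)
        return version

    def read(self, key):
        return self.rows[key]


class Server:
    def __init__(self, db):
        self.db = db
        self.cache = {}   # key -> (version, value)

    def modify(self, key, value):
        version = self.db.modify(key, value)
        self.cache[key] = (version, value)
        return version    # sent back to the client along with the ACK

    def get(self, key, min_version=0):
        cached = self.cache.get(key)
        if cached is None or cached[0] < min_version:
            # Miss, or the client proves it has already seen a newer write:
            # do not trust the cache, go to the database.
            cached = self.db.read(key)
            self.cache[key] = cached
        return cached[1]


db = Database()
s1, s2 = Server(db), Server(db)
db.modify("x", "old")                     # version 1
s2.get("x")                               # server 2 caches (1, "old")
token = s1.modify("x", "new")             # version 2; the client keeps the token
unordered = s2.get("x")                   # "old": no ordering was requested
ordered = s2.get("x", min_version=token)  # "new": the client enforced the order
```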

DS
 
jack...
Posted: Wed Oct 21, 2009 12:33 pm
lmike wrote:
Quote:
The problem may be with the modify: when server 1 receives a "modify",
it can broadcast an invalidate message to the other servers. The issue
is that a "get" sent right after the "modify" may race with that
"invalidate" message from server 1. If the "get" reaches server 2
before the "invalidate" from server 1 does, server 2 may find the old
data in its cache and return it to the client. From the client's point
of view the result is then wrong: he/she did a "get" right after a
"modify", but got the old data instead of the newly modified data.

I think that such a race condition will always exist. I now doubt
there is a solution to this problem, given all the requirements as
said above. Am I correct?

If data is not modified often, and a small performance penalty on
add/modify is acceptable, you could delay the 'ACK, data modified' from
server 1 back to the client until all other servers have received and
acknowledged the invalidate broadcast back to server 1. The client cannot
send the 'get' request before it has seen the 'ACK, data modified', so it
is guaranteed to get the modified data. This opens a whole new can of
worms when server 1 thinks servers 2 and 3 should be up but server 3 has
gone down for whatever reason, so it needs time-outs, etc.
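Jack's suggestion, sketched with threads standing in for the network: server 1 writes the database, sends the invalidate to every peer, and reports which peers did not acknowledge within a timeout. Only an empty list means it is safe to send 'ACK, data modified' to the client; what to do otherwise (retry, drop the silent server from the pool, fail the request) is left open here. The class, the simulated delays and the 0.2 s timeout are assumptions for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait


class Server:
    def __init__(self, name, db, delay=0.0):
        self.name = name
        self.db = db            # shared dict standing in for the database
        self.cache = {}
        self.peers = []
        self.delay = delay      # simulated network/processing delay, in seconds

    def invalidate(self, key):
        time.sleep(self.delay)
        self.cache.pop(key, None)
        return self.name        # the acknowledgement

    def modify(self, key, value, timeout=0.2):
        """Write the DB, invalidate all peers, and return the names of the
        peers that did not acknowledge in time (empty list: safe to ACK)."""
        self.db[key] = value
        self.cache[key] = value
        pool = ThreadPoolExecutor(max_workers=max(len(self.peers), 1))
        futures = {pool.submit(p.invalidate, key): p for p in self.peers}
        _, not_done = wait(futures, timeout=timeout)
        pool.shutdown(wait=False)   # don't block on a peer that may be down
        return sorted(futures[f].name for f in not_done)


db = {"x": "old"}
s1 = Server("server 1", db)
s2 = Server("server 2", db)
s3 = Server("server 3", db, delay=1.0)   # pretend server 3 is hung
for s in (s1, s2, s3):
    s.peers = [p for p in (s1, s2, s3) if p is not s]
s2.cache["x"] = "old"

missing = s1.modify("x", "new")   # ["server 3"]: do not ACK the client yet
```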

This would fix the case for a single client. When there are 2
independent clients, and client A does a 'modify' while client B does a
'get', the result that client B gets is still undefined: it may be the
old data, or it may be the new data.

J.
 
 
All times are GMT - 5 Hours