DiffSync was created to transfer large quantities of
files across a bandwidth-limited medium.
It is FTP on steroids.
Where I work, we have a huge database of files for our buildtree
(10,200 files that take up 450 MB of storage). For the most part, it is easiest to build with every file at its
most recent version. This would usually mean
scrapping the previous tree and starting from the beginning. Over VPN, this could mean at least 4 hours
just downloading the files.
Actually, I did a little experiment. Using my LAN, I put all the files on one
machine, then FTP’d them across to another machine: 2 hours. I then tried zipping the files up first,
sending them across, and unzipping them: 25 minutes. Wow!
The real problem is the folder-by-folder “get” that FTP must do. Just moving from folder to folder must take up most
of the time.
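The zip-first half of that experiment is easy to reproduce. The following is a minimal sketch, not DiffSync code: the function name `pack_tree` and the choice of Python’s `zipfile` module are assumptions made for illustration.

```python
import os
import zipfile


def pack_tree(root, archive_path):
    """Pack every file under `root` into one zip archive; return the file count.

    `archive_path` should live outside `root`, otherwise the archive
    would be picked up by the walk while it is being written.
    """
    count = 0
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for folder, _subfolders, filenames in os.walk(root):
            for name in filenames:
                full_path = os.path.join(folder, name)
                # Store paths relative to the root so the tree can be
                # recreated anywhere on the receiving side.
                archive.write(full_path, os.path.relpath(full_path, root))
                count += 1
    return count
```

Sending the single archive means one transfer instead of one “get” per folder, which is where the experiment suggests the time goes.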
The more I thought about it, the more I realized this must be
the situation for most developers in a good build framework. If not, then we seriously need to get ours
Currently, the first implementation is under
construction. It is truncating some files
and skipping others, so it’s far from finished. But I thought I should find a good place to start and get some
The basic algorithm is:
- Server adds the appropriate shares.
- Client connects to the server.
- Client queries the server for its repository (the files it wants to share).
- Server sends an XML representation of the files it is sharing, the FileInfoRepository.
- Client diffs this FileInfoRepository against some folder (maybe it already has an older version of some files). This produces a new FileInfoRepository of all files that are newer than the client’s current version, which keeps the client from having to get files that it already has (or has a newer version of).
- Client sends this new FileInfoRepository to the server.
- Server compresses the requested files and sends them back to the client.
- Client unzips the files into the appropriate place.
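The diff step in the list above can be sketched in a few lines. This is only an illustration under stated assumptions: the repositories are modeled as plain dictionaries mapping a relative path to a last-modified timestamp, “newer” is decided by comparing those timestamps, and the name `diff_repositories` is hypothetical. The real FileInfoRepository format is not shown here.

```python
def diff_repositories(server_files, client_files):
    """Return the server entries that the client lacks or holds an older copy of.

    Both arguments map a relative path to a last-modified timestamp
    (an assumed representation, chosen for this sketch).
    """
    needed = {}
    for path, server_mtime in server_files.items():
        client_mtime = client_files.get(path)
        # Request the file if the client has no copy or an older one.
        if client_mtime is None or server_mtime > client_mtime:
            needed[path] = server_mtime
    return needed


server = {"src/main.c": 200, "src/util.c": 150, "README": 100}
client = {"src/main.c": 100, "src/util.c": 150, "README": 120}
print(diff_repositories(server, client))  # {'src/main.c': 200}
```

Only `src/main.c` is requested: `src/util.c` is identical on both sides, and the client’s `README` is newer than the server’s.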
From a higher level:
- A user starts DiffSync; this first instance becomes the server. (Although each client is also a server, the first one must be just the server, as it cannot connect to anyone else.) This is usually a machine that everyone can get to (not behind a firewall, or accessible through a firewall). In my case, this is my
- The user then sets the appropriate shares, if necessary.
- A user can then start instances of DiffSync and connect to the server (these are my work repositories).
- (Not yet implemented) Further, it can initiate a request through an instance. In my case, let’s say I’m on my other home machine (not the server). I can start DiffSync on the 2nd machine and connect to the server. Then I can query the server for a list of its connections. A better analogy: A is connected to B. C connects to A. C queries A for its connections. C can then try to connect to B.
- (Not yet implemented) Upon disconnect, it will try to poll the main server (the first server that it attached to) once every 5 minutes for 24 hours. Not sure how this will affect authentication… authentication until first “get”?
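The planned reconnect behavior (poll every 5 minutes for 24 hours) might look roughly like the sketch below. Since that feature is not implemented yet, everything here is an assumption: the name `poll_until_reconnected`, the `try_connect` callback, and the injectable `sleep` function (which lets the loop be exercised without actually waiting).

```python
import time


def poll_until_reconnected(try_connect, interval_s=5 * 60,
                           limit_s=24 * 60 * 60, sleep=time.sleep):
    """Call `try_connect` once per interval until it succeeds or time runs out.

    Returns the number of attempts made on success, or None if the limit
    passed first. Elapsed time is counted in nominal intervals, not
    measured with a wall clock.
    """
    attempts = 0
    elapsed = 0
    while elapsed < limit_s:
        attempts += 1
        if try_connect():
            return attempts
        sleep(interval_s)
        elapsed += interval_s
    return None
```

With the defaults, a server that never comes back is tried 288 times (24 hours divided into 5-minute intervals) before the client gives up.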
In my case, I can now get all the files I need from work,
basically from anywhere. Using another
DiffSync, I can connect to the “server” (my home machine) and get to any work
It also has some optimizations.
- During the “GetFiles” process, the client starts 3 threads. The idea was that one would be sending data, while one receives data, and one determines the next “chunk” of the FileInfoRepository. The threads are synchronized so that no two are in the same state at the same time.
- Each “chunk” is limited so that the process isn’t “stuck” in one of the above states. This allows the server to begin sending before all the files are compressed. Also, it keeps the client/server from taking up too many resources!
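To make the “chunk” idea concrete, here is a minimal sketch of splitting a file list into bounded chunks. It covers only the chunking, not the three threads, and it assumes the limit is a total byte count per chunk; how DiffSync actually limits a chunk is not specified above. The name `iter_chunks` and the `(path, size)` tuples are hypothetical.

```python
def iter_chunks(files, max_chunk_bytes):
    """Yield lists of (path, size) whose total size stays within the limit.

    A single file larger than the limit is yielded as its own chunk.
    """
    chunk, chunk_bytes = [], 0
    for path, size in files:
        # Close the current chunk if adding this file would exceed the limit.
        if chunk and chunk_bytes + size > max_chunk_bytes:
            yield chunk
            chunk, chunk_bytes = [], 0
        chunk.append((path, size))
        chunk_bytes += size
    if chunk:
        yield chunk
```

Because this is a generator, the first chunk is available as soon as it fills up, so a sender could start on it while later chunks are still being determined.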
- Right now, it’s mostly a public read-only folder! There should be some FTP “rights” added. Include integration with Server’s
- must be encrypted.
- Files should probably be encrypted as well as compressed during transfer.
- Allow servers to share more than one directory (although it will share the entire directory tree). This is a limitation for ease of use… I need one folder, so it uses one
- Interfaces to source management software (version control) should be implemented, to work with all those crappy source management frameworks, like SourceSafe, CVS, etc. Getting a buildtree from one of those
- A WinDiff-like interface, so that you only get the files that you want. After the diff, you get two trees… the new and the old? Maybe be just like WinDiff and tree it out. I dunno… that’s where the suggestions part comes
- File restrictions for the GetLatest: *.c, *.cpp, etc., so you only get the files you are concerned about. Not sure how that would work with the WinDiff stuff.
- More ease-of-use features
  - On first use, it should ask for:
    - Port number of server
    - If client apps can be downloadable. I envision the app accepting HTTP requests and returning the pre-configured app (it will automatically connect to the server from which it
    - What kind of security it will use
- Initiate a connection from another server. Maybe bridge connections? A connects to B and determines B is connected to C. So, A tries to connect to C, but it can’t because both A & C are behind VPN. So B can become the bridge between A & C. Just an idea.
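The file-restriction idea from the list above (*.c, *.cpp, and so on) could be sketched with shell-style wildcard matching. This is an illustration only: the name `filter_by_patterns` is hypothetical, and it assumes paths are relative and use forward slashes.

```python
import fnmatch
import posixpath


def filter_by_patterns(paths, patterns):
    """Keep only the paths whose file name matches at least one pattern."""
    return [
        path for path in paths
        if any(fnmatch.fnmatchcase(posixpath.basename(path), pattern)
               for pattern in patterns)
    ]


paths = ["src/main.c", "src/util.cpp", "docs/readme.txt", "src/main.h"]
print(filter_by_patterns(paths, ["*.c", "*.cpp"]))
# ['src/main.c', 'src/util.cpp']
```

Applying a filter like this to the list of needed files before requesting them would keep unwanted files out of the transfer entirely.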
Things TBD before I put the source on SourceForge:
- Convert VB source to C#
  I started using VB as my UI, and C# as my object code. It really is unnecessary to have two
- Fix sendingConnection/listeningConnection
  Right now I open two connections to send and receive. This was definitely not the optimal method for solving this problem. I will combine these two objects and create a better thread-remoting interface.