sersync: server synchronization and web mirroring

Introduction: http://code.google.com/p/sersync/

sersync is mainly used for server synchronization and web mirroring. It is built on boost 1.41.0, the inotify API, and the rsync command. The most common synchronization solutions today are inotify-tools + rsync and Google's open source project Openduckbill (which depends on inotify-tools); both are written in scripting languages. Compared with these two projects, sersync has the following advantages:

  1. sersync is written in C++ and filters out the temporary files and duplicate file operations produced by the Linux file system (see the appendix; script-based tools do not implement this filtering). Combined with rsync, this saves run time and network resources, so synchronization is faster.
  2. sersync is simple to configure. The bin directory already contains a mostly statically compiled binary that can be used directly with the xml configuration file in the same directory.
  3. Unlike the script-based projects, sersync synchronizes with multiple threads, which helps keep multiple servers in sync in real time, especially when synchronizing large files.
  4. sersync has an error handling mechanism: files that fail to synchronize are put on a failure queue and retried; if they still fail, they are retried every 10 hours until they are synchronized successfully.
  5. sersync has a built-in crontab function. Once it is enabled in the xml configuration file, the whole monitored directory is synchronized at the interval you choose, with no separate crontab setup needed.
  6. sersync provides socket and http plugin extensions for secondary development.

Design Frame

Compile

On Linux you normally do not need to compile; use the executable file and the xml configuration file in the bin directory directly.

src directory: the source files.

include directory: the boost header files (1.41.0).

lib directory: the required static libraries.

bin directory: the generated binary executable file and the xml configuration file.

Running make in the sersync directory builds the binary and places it in the bin directory.

Install

Config Rsync before install

Before use, configure rsync and start the rsync daemon on every server. A typical configuration is:

vi /etc/rsyncd.conf

uid = root
gid = root
max connections = 36000
use chroot = no
log file = /var/log/rsyncd.log
pid file = /var/run/rsyncd.pid
lock file = /var/run/rsyncd.lock

[tongbu]
path = /opt/tongbu
comment = xoyo video files
ignore errors
read only = no
hosts allow = 192.168.8.40/26 192.168.138.94/24
hosts deny = *

For details on the configuration parameters, see the rsyncd.conf documentation or search online.
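To sanity-check which client addresses the hosts allow line covers, the CIDR ranges can be expanded with Python's standard ipaddress module. This is only a rough illustration: rsync's own matching is authoritative, and the helper name is_allowed is hypothetical.

```python
import ipaddress

# The two ranges from the "hosts allow" line in the example rsyncd.conf.
HOSTS_ALLOW = ["192.168.8.40/26", "192.168.138.94/24"]


def is_allowed(client_ip, allow=HOSTS_ALLOW):
    """Return True if client_ip falls inside any of the allowed CIDR ranges."""
    addr = ipaddress.ip_address(client_ip)
    # strict=False accepts a host address with a mask: 192.168.8.40/26
    # denotes the network 192.168.8.0/26 (addresses .0 through .63).
    return any(
        addr in ipaddress.ip_network(cidr, strict=False) for cidr in allow
    )


if __name__ == "__main__":
    for ip in ("192.168.8.42", "192.168.8.100", "192.168.138.7"):
        print(ip, is_allowed(ip))
```

Note that a /26 mask only covers 64 addresses, so a client at 192.168.8.100 would be rejected by the example configuration.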

Then start the rsync daemon on each server that needs to be synchronized:

rsync --daemon

Install sersync

Since most libraries are statically compiled, after modifying the configuration file you can run ./sersync2.1 directly on the server being monitored.

tar zxvf sersync2.1.tar.gz

cd sersync

Before using it, fill in the xml configuration file:

vi confxml.xml

Depending on which plugins and features you use, different parts of the configuration file need to be modified:

synchronization function config

Just modify the content of the sersync tag as follows:

<sersync>
  <localpath watch="/opt/tongbu">
    <remote ip="192.168.8.42" name="tongbu"/>
    <remote ip="192.168.8.39" name="tongbu"/>
  </localpath>
  <crontab start="true" schedule="30"/>
  <plugin name="refreshCDN" start="true"/>
</sersync>

The watch attribute of the localpath tag is the local path to synchronize. Each remote tag gives the ip of a remote host and the name of its rsync module. If the start attribute of the crontab tag is set to true, the schedule attribute sets how often, in minutes, the whole monitored directory is fully synchronized.
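To make the mapping from the sersync tag to rsync targets concrete, here is a minimal sketch that parses the example configuration and lists one rsync invocation per remote tag. The "ip::module" destination is standard rsync daemon syntax; the -az flags and the function names are assumptions for illustration, not the command sersync actually runs.

```python
import xml.etree.ElementTree as ET

CONFIG = """
<sersync>
  <localpath watch="/opt/tongbu">
    <remote ip="192.168.8.42" name="tongbu"/>
    <remote ip="192.168.8.39" name="tongbu"/>
  </localpath>
  <crontab start="true" schedule="30"/>
  <plugin name="refreshCDN" start="true"/>
</sersync>
"""


def rsync_commands(xml_text):
    """Return one rsync argument list per <remote> under <localpath>."""
    root = ET.fromstring(xml_text.strip())
    localpath = root.find("localpath")
    source = localpath.get("watch").rstrip("/") + "/"
    commands = []
    for remote in localpath.findall("remote"):
        # "ip::module" addresses a module exported by an rsync daemon.
        dest = "{}::{}".format(remote.get("ip"), remote.get("name"))
        # ASSUMPTION: -az is a placeholder; sersync's real flags may differ.
        commands.append(["rsync", "-az", source, dest])
    return commands


def full_sync_interval_minutes(xml_text):
    """Return the crontab schedule in minutes, or None if it is disabled."""
    crontab = ET.fromstring(xml_text.strip()).find("crontab")
    if crontab is not None and crontab.get("start") == "true":
        return int(crontab.get("schedule"))
    return None


if __name__ == "__main__":
    for cmd in rsync_commands(CONFIG):
        print(" ".join(cmd))
    print("full sync every", full_sync_interval_minutes(CONFIG), "minutes")
```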

plugin config

As shown in the xml above, the plugin tag calls a plugin during synchronization. The plugin currently in use is the refresh cdn plugin "refreshCDN":

  <plugin name="refreshCDN" start="true"/>

That is, after a file has been synchronized to the remote server, sersync calls the refreshCDN plugin to refresh the cdn interface according to the configured rules (the inotify event is transformed into the china cdn interface protocol). If you do not want to use a plugin, set the start attribute to false. To use another plugin, look at the other plugin tags in the xml and change the name to that of another plugin, such as socket or http. Three plugins are currently supported: refreshCDN, socket, and http. The socket plugin sends inotify events to a specified ip and port; the http plugin sends (POSTs) inotify events to a specified domain name.

refresh CDN plugin config:

This plugin is designed according to the chinaCDN protocol. When an inotify event arrives, it sends the transformed path that needs refreshing to the cdn interface. The refreshCDN plugin needs the following xml configuration:

<plugin name="refreshCDN">
  <localpath watch="/data0/htdocs/cms.xoyo.com/site/">
    <cdninfo domainname="ccms.chinacache.com" port="80" username="yourname" passwd="yourpasswd" />
    <sendurl base="http://pic.xoyo.com/cms" />
    <regexurl regex="false" match="cms.xoyo.com/site([/a-zA-Z0-9]*).xoyo.com/images" />
  </localpath>
</plugin>

The watch attribute of the localpath tag is the directory being monitored; it overrides the monitored directory set in the sersync tag.

The cdninfo tag specifies the domain name, port number, username, and password of the cdn interface.

The sendurl tag is the prefix of the url to refresh.

In the regexurl tag, if the regex attribute is true, the regular expression in the match attribute is matched against the path reported by inotify, and the matched part is used as part of the url sent to the cdn.

For example, if the inotify event is:

/data0/htdocs/cms.xoyo.com/site/jx3.xoyo.com/images/a/123.txt

then after the regular expression match, the path sent to the cdn interface is:

http://pic.xoyo.com/cms/jx3/a/123.txt

If the regex attribute is set to false, the path sent to the cdn interface is:

http://pic.xoyo.com/cms/jx3.xoyo.com/images/a/123.txt

The socket and http plugins are simple to configure: just fill in the corresponding values in the xml.
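The two refresh urls in the regexurl example can be reproduced with a short sketch. It assumes that, when regex is true, the matched portion of the path is replaced by its first capture group, and that otherwise the path relative to the watch directory is appended to the sendurl base. That is an interpretation inferred from the example, not the plugin's confirmed logic, and the function name refresh_url is hypothetical.

```python
import re

WATCH = "/data0/htdocs/cms.xoyo.com/site/"
BASE = "http://pic.xoyo.com/cms"
MATCH = r"cms.xoyo.com/site([/a-zA-Z0-9]*).xoyo.com/images"


def refresh_url(event_path, use_regex, watch=WATCH, base=BASE, match=MATCH):
    """Build the url to refresh for one inotify path (assumed semantics)."""
    if use_regex:
        m = re.search(match, event_path)
        if m:
            # Keep only the captured group plus whatever follows the match.
            return base + m.group(1) + event_path[m.end():]
    # regex disabled (or no match): use the path relative to the watch dir.
    relative = event_path[len(watch):] if event_path.startswith(watch) else event_path
    return base.rstrip("/") + "/" + relative.lstrip("/")


if __name__ == "__main__":
    event = "/data0/htdocs/cms.xoyo.com/site/jx3.xoyo.com/images/a/123.txt"
    print(refresh_url(event, use_regex=True))
    print(refresh_url(event, use_regex=False))
```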

execute

synchronize or synchronize + plugin

Show the help:

./sersync2.1 -h 

Before the synchronization program starts, rsync the whole monitored directory to the remote servers once:

./sersync2.1 -r 

Run in daemon mode, in the background:

./sersync2.1 -d

Specify the configuration file name. If the configuration file is not named confxml.xml, use '-o xxxxx.xml':

./sersync2.1 -o xxxxx.xml

Specify the number of synchronization daemon threads. The default is 10, which suits a current four-core server. To increase or decrease it, use '-n num':

./sersync2.1 -n num

The usual way to run it is:

./sersync2.1 -d -r

run the plugin only

You can skip synchronization and call only a plugin when an inotify event arrives. Run it as follows.

run refresh cdn plugin only:

./sersync2.1 -d -m refreshCDN

run socket plugin only:

The socket plugin sends the file path information produced by inotify to the specified ip and port:

./sersync2.1 -d -m socket

run http plugin only:

The http plugin POSTs the events monitored by inotify to the host at the specified domain name:

./sersync2.1 -d -m http

log file description

While running, sersync produces the file rsync_fail_log.sh. A file that fails to synchronize first goes into a retransmission queue; if it fails again, it is recorded in rsync_fail_log.sh. This script is executed automatically every 10 hours and is cleared after each run. If the refresh cdn plugin is used, an error.log file is also produced, which records the information received from the cdn and the paths that were refreshed.
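The two-stage failure handling can be sketched roughly as follows. The sync_one callback, the function names, and the format of the lines written to the log are all assumptions made for illustration; the real contents of rsync_fail_log.sh are not documented here, and the 10-hour re-execution is left out.

```python
import shlex


def sync_with_retry(paths, sync_one):
    """Try each path once, retry the failures once, return what still fails.

    sync_one(path) is a caller-supplied function returning True on success.
    """
    retry_queue = [p for p in paths if not sync_one(p)]
    # Second attempt: anything failing again is a candidate for the fail log.
    return [p for p in retry_queue if not sync_one(p)]


def write_fail_log(still_failed, dest, log_path="rsync_fail_log.sh"):
    """Append one shell command per failed path so the log can be re-run.

    ASSUMPTION: the line format below is invented for this sketch.
    """
    with open(log_path, "a") as log:
        for path in still_failed:
            log.write(
                "rsync -az {} {}\n".format(shlex.quote(path), shlex.quote(dest))
            )
```

Writing the failures as runnable commands is what would allow a periodic job to simply execute the file and then truncate it.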

appendix

Script-based monitoring is inefficient because it cannot filter out the inotify events we do not need. Even with --exclude regular expressions, a script cannot filter out some of the temporary files and temporary events generated by the file system, so rsync is executed repeatedly. For example, editing a file "test" with vi generates a whole series of inotify events.

In sersync, only one event is added to the queue after filtering, which saves a lot of synchronization time and network resources. For details, see the blog post:

http://hi.baidu.com/johntech/blog/item/e4a31a3db1ee1ce755e723f4.html
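The idea of collapsing a burst of editor events into a single queue entry can be sketched like this. The temporary-file patterns and the sample event list are illustrative assumptions (they are not a captured inotify trace, and sersync's actual filter rules may differ); the function names are hypothetical.

```python
# ASSUMPTION: suffixes commonly left behind by editors; not sersync's real list.
TEMP_SUFFIXES = (".swp", ".swx", "~", ".tmp")


def is_temporary(path):
    """Guess whether a path is an editor temp file rather than real content."""
    name = path.rsplit("/", 1)[-1]
    # All-digit names cover scratch files some editors create to test
    # whether the directory is writable.
    return name.endswith(TEMP_SUFFIXES) or name.isdigit()


def coalesce(events):
    """Reduce (event_name, path) pairs to unique non-temporary paths."""
    pending = {}  # dicts keep insertion order, so first-seen order is kept
    for _event, path in events:
        if not is_temporary(path):
            pending[path] = True
    return list(pending)


if __name__ == "__main__":
    # Illustrative burst for saving a file named "test" in an editor.
    burst = [
        ("CREATE", "/opt/tongbu/.test.swp"),
        ("CREATE", "/opt/tongbu/4913"),
        ("DELETE", "/opt/tongbu/4913"),
        ("MOVED_TO", "/opt/tongbu/test~"),
        ("CREATE", "/opt/tongbu/test"),
        ("MODIFY", "/opt/tongbu/test"),
        ("CLOSE_WRITE", "/opt/tongbu/test"),
        ("DELETE", "/opt/tongbu/test~"),
        ("DELETE", "/opt/tongbu/.test.swp"),
    ]
    print(coalesce(burst))
```

With a filter of this kind, rsync would be invoked once for the edited file instead of once per raw event.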

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

sersync2.5beta1

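1. Fixed a bug in sersync2.4: when synchronizing with a password via --password-file, the -r option (full initial sync of source and destination) did not pass --password-file.

2. Startup messages are now mostly in English.

3. A delete tag can be used to specify whether local delete events are monitored.

4. rsync.fail.log.sh now records the error code returned by rsync, to help future improvements.

5. Removed the restriction that prevented synchronization when the local host ip and the remote ip were the same. Both local and remote can now be set to 127.0.0.1 to synchronize on a single machine, without having to set one of them to localhost.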

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

sersync2.4

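Fixed a major bug: after a file had been deleted once, synchronization stopped working. This bug exists in all earlier versions of sersync; apologies.

Added a debug option; enable it to see which files are being synchronized.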

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

sersync2.3

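Fixed errors that occurred when synchronized file names contained characters that need escaping, such as '$', '(' and ')'.

Added password-based synchronization.

Added file filtering; any number of custom filter rules can be defined.

For example, to filter files such as *.php or dirname1/*, just put the corresponding regular expression in the configuration file.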

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

sersync2.2

Removed the curl library, so compilation now depends almost only on the standard library; the http plugin is temporarily removed.