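Topic: A technical question for everyone: on UNIX, how do you write a C program that, when restarted after an abnormal termination, automatically restores its file read/write pointer (i.e. resumes reading and writing from the point where it was interrupted)? @枫下论坛 The Rolia Forum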

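A technical question for everyone: on UNIX, how do you write a C program that, when restarted after an abnormal termination, automatically restores its file read/write pointer (i.e. resumes reading and writing from the point where it was interrupted)? -albxu(Yukon); 2001-4-11 (#44057@0)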

What kind of abnormal trap? Power-off? Killed by a signal? -marcow(UserMarco); 2001-4-11 (#44105@0)

You may assume it has been killed or has just dumped core. -albxu(Yukon); 2001-4-12 (#44334@0)

Being killed and dumping core are two different stories -marcow(UserMarco); 2001-4-12 {200} (#44371@0)
You can write a signal handler to save the current position of the file
pointer when your process catches a kill signal;

I don't think you can deal with core dump easily. This is a
transaction issue, I think.

I had the same thought as you, but I'm just not sure whether there is a definitive solution. -albxu(Yukon); 2001-4-12 (#44424@0)
Wrong: the SIGKILL signal can never be caught or ignored. The best you can do is to call atexit(), but I doubt it works with the default signal handler -lumlumq(lumlum); 2001-4-12 (#44828@0)

You are right. It should be "a signal" instead of "the KILL signal". -marcow(UserMarco); 2001-4-12 (#44865@0)

A really stupid solution: a lock file? -pasu(InTheSky); 2001-4-11 (#44128@0)

But how could you re-open the same file? It was a quiz question in an interview I had two years ago, asked by a Waterloo guy. I don't have the answer right now. I hope someone has a smart idea. People may be asked the same question in their interviews. -albxu(Yukon); 2001-4-12 (#44330@0)

Which file did you mean? Sorry, I am not good at programming...just blind guess...hehe. -pasu(InTheSky); 2001-4-12 (#44420@0)

Any file that can be read and written. -albxu(Yukon); 2001-4-12 (#44426@0)

?? I mean, what is the "same file" you mentioned? Do you mean you cannot locate the file you once wrote to, or that you cannot write to it any more? -pasu(InTheSky); 2001-4-12 (#44467@0)

It's about resuming reads and writes from the point where the program was last interrupted. The file is always accessible. -albxu(Yukon); 2001-4-12 (#44521@0)

Then why couldn't you re-open the same file? I am really puzzled. -pasu(InTheSky); 2001-4-12 (#44525@0)

Then how do you know where you stopped last time? That is the point. Remember, your program stopped abnormally and your work in the file was half done. -albxu(Yukon); 2001-4-12 (#44539@0)

a huge topic -lumlumq(lumlum); 2001-4-12 {2457} (#44824@0)
There are two different scenarios that you might have to consider.

The first one is that you only want to protect data from abnormal termination of the application process. In this scenario, the abnormal termination is caused by errors that occurred in your process space, such as coding errors or insufficient resources, and the operating system as a whole is still healthy. To solve the problem, you can have a very simple daemon process running in the background; this daemon process forks your application process as its child. The daemon process opens all the relevant fds before fork(). If the child application process terminates, either normally or abnormally, the daemon process will receive a SIGCHLD signal. The daemon process should call waitpid() to reap the dead child process and grab the exit status. If the exit status is 0, the application process has exited successfully and the daemon process can safely exit too. However, if the exit status is not zero, something bad has happened to the child process, and the daemon process should fork a new child process to finish the rest of the job.

The essence here is that the child process and the parent process SHARE the same kernel file pointer: if the child process advances its file pointer, the corresponding file pointer in the parent process advances too, and vice versa. In Unix, the kernel keeps only one open file table entry, which holds the file pointer, for each active open() call. If you fork a child process, the parent and the child share that entry, although each keeps its own copy of the file descriptor in its own process (PCB).
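
A rough sketch of the supervising parent described above, under some simplifying assumptions: it blocks in waitpid() instead of reacting to SIGCHLD, it is not a fully detached daemon, and it caps the number of restarts. The names run_supervised and flaky_job are invented for the example. flaky_job stands in for the application: it consumes one 4-byte record and then "fails" unless it has reached end of file.

```c
#include <errno.h>
#include <fcntl.h>      /* open() flags for callers that create the fd */
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical job: read one 4-byte record from the shared fd, then
 * report failure (non-zero) unless end of file has been reached. */
static int flaky_job(int fd)
{
    char rec[4];
    ssize_t n = read(fd, rec, sizeof rec);
    if (n < 0)
        return 2;
    return n == 0 ? 0 : 1;
}

/* Fork children running job(fd) until one exits with status 0.
 * fd is opened by the parent BEFORE fork(), so every child inherits the
 * same file offset and continues where the previous child stopped.
 * Returns the number of children started, or -1 on error or when
 * max_children is exhausted. */
static int run_supervised(int fd, int (*job)(int), int max_children)
{
    for (int started = 1; started <= max_children; started++) {
        pid_t pid = fork();
        if (pid < 0)
            return -1;
        if (pid == 0)
            _exit(job(fd));            /* child: do the work, then exit */

        int status;
        while (waitpid(pid, &status, 0) < 0) {
            if (errno != EINTR)
                return -1;
        }
        if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
            return started;            /* job finished cleanly */
        /* non-zero exit or killed by a signal: loop and restart */
    }
    return -1;
}
```

With a 12-byte file, three children each consume one record and fail, and the fourth sees end of file and exits with 0. The offset survives each child's death because it lives in the shared kernel entry, not in the child.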

The second scenario is that you want to guard the whole system against possible power failure, system panic, etc. It's not easy to come up with a very effective approach in this case, because you have to write some information out to the hard disk, which is a very slow device. The risk of crashing while you are writing to the disk is significant, so there is no guarantee of an "atomic transaction". A rough protection that can increase the chance of recovery is to write out your current position after (or before, it doesn't matter, both are unsafe) the actual read or write operation. The current position can be the return value of the function lseek(fd, 0, SEEK_CUR). Then, when you restart the program, it can restore the file pointer by calling lseek again with the saved position and SEEK_SET.
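
A small sketch of that checkpointing step, with assumed names (checkpoint_save, checkpoint_restore) and an assumed binary file format (a raw off_t). Writing to a temporary file and renaming it is an addition of this sketch, not something the post prescribes: it keeps the checkpoint file itself from being half-written, but the data file and the checkpoint can still disagree after a crash.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Record the current offset of data_fd in ckpt_path. Returns 0 on success. */
int checkpoint_save(int data_fd, const char *ckpt_path)
{
    char tmp_path[4096];
    off_t pos = lseek(data_fd, 0, SEEK_CUR);
    if (pos == (off_t)-1)
        return -1;
    int n = snprintf(tmp_path, sizeof tmp_path, "%s.tmp", ckpt_path);
    if (n < 0 || (size_t)n >= sizeof tmp_path)
        return -1;

    int fd = open(tmp_path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    int ok = write(fd, &pos, sizeof pos) == (ssize_t)sizeof pos;
    ok = (fsync(fd) == 0) && ok;       /* push the checkpoint to disk */
    ok = (close(fd) == 0) && ok;
    if (!ok)
        return -1;
    return rename(tmp_path, ckpt_path); /* replace the old checkpoint */
}

/* Seek data_fd to the offset stored in ckpt_path.
 * Returns the restored offset, or (off_t)-1 if there is no usable checkpoint. */
off_t checkpoint_restore(int data_fd, const char *ckpt_path)
{
    off_t pos;
    int fd = open(ckpt_path, O_RDONLY);
    if (fd < 0)
        return (off_t)-1;
    ssize_t n = read(fd, &pos, sizeof pos);
    close(fd);
    if (n != (ssize_t)sizeof pos)
        return (off_t)-1;
    return lseek(data_fd, pos, SEEK_SET);
}
```

On restart the program would call checkpoint_restore() first and start from the beginning if it returns (off_t)-1.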

Thank you, Lumlum. This is the kind of answer I've been waiting for. You're really good. -albxu(Yukon); 2001-4-12 (#44831@0)

"lumlum" is very happy to hear that, xixi,,, ^_^ -lumlumq(lumlum); 2001-4-12 (#44854@0)

You two are using the same account. hehe. I get you! -pasu(InTheSky); 2001-4-12 (#44898@0)

lumlum is a real expert. I admire you. But I don't agree that the access cannot be an atomic transaction. -pasu(InTheSky); 2001-4-12 {169} (#44846@0)
In my understanding, an atomic transaction means everything gets done or nothing gets done. The process can just restart at the last logged point. Then it's atomic. Don't you think so?

I guess you are talking about DBMS -lumlumq(lumlum); 2001-4-12 {318} (#44945@0)
Your concept of an atomic transaction is correct. But your atomic transaction is a solution to concurrency control, not to database recovery. The atomic transaction you referred to can guarantee that no two transactions
interleave with each other, thus preventing the "lost update" problem, but it doesn't help a failed system.

In my understanding, I should make the write an atomic operation. -pasu(InTheSky); 2001-4-12 {660} (#44559@0)
So I should use some kind of lock, or rather a log. And I should put it on the disk, because after the process dies nothing is left in memory. So I will write a log whose content is the position I have reached in the working file. Every time before I write something to the working file, I will write the position into the log file. After the working file is closed, the log file should be empty. And every time the program restarts, it will check the log file first. If there is something in the log file, it picks up the last position and resumes writing from the next position.

In fact, what I described is something like the logical log in an RDBMS. Is that OK?
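
One way the position log could look, as a sketch only: the function names and the text format (one decimal offset per line) are assumptions, not anything specified in the post. A torn final line left by a crash in the middle of a log write is not detected here.

```c
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Append one position record to the log and force it out. Returns 0 on success. */
int log_append_position(const char *log_path, off_t pos)
{
    FILE *log = fopen(log_path, "a");
    if (!log)
        return -1;
    int ok = fprintf(log, "%lld\n", (long long)pos) > 0;
    ok = (fflush(log) == 0) && ok;          /* stdio buffer -> kernel */
    ok = (fsync(fileno(log)) == 0) && ok;   /* kernel cache -> disk */
    ok = (fclose(log) == 0) && ok;
    return ok ? 0 : -1;
}

/* Return the last position found in the log, or -1 if the log is
 * missing or empty (meaning the previous run finished cleanly). */
off_t log_last_position(const char *log_path)
{
    FILE *log = fopen(log_path, "r");
    if (!log)
        return (off_t)-1;
    long long pos, last = -1;
    while (fscanf(log, "%lld", &pos) == 1)
        last = pos;
    fclose(log);
    return (off_t)last;
}

/* Empty the log after the working file has been closed normally. */
int log_clear(const char *log_path)
{
    return truncate(log_path, 0);
}
```

At startup the program would call log_last_position(); a value other than -1 means the previous run was interrupted, and that value is the offset to resume from.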

Yes. It works. I didn't think of it in the first place. I went off thinking about the i-node structure to dig out what could help me. -albxu(Yukon); 2001-4-12 (#44733@0)

Haha, looks like being a dabbler has its advantages! Those who know the subject deeply end up diving in headfirst instead, hoho. -pasu(InTheSky); 2001-4-12 (#44753@0)

No, it won't completely work; it can only increase your chance of recovering data. -lumlumq(lumlum); 2001-4-12 {249} (#44835@0)
What if you have written the current position to the log file and then the power goes off? In this case the process hasn't actually done any read or write operation yet, and the outcome is that your log file has recorded a change that never really occurred.

Yes, you are right. Let's write the log after writing to the working file. -pasu(InTheSky); 2001-4-12 (#44900@0)

Then what if the system crashes before you have time to write the log, but after you have successfully done the read or write operation? -lumlumq(lumlum); 2001-4-12 (#44911@0)

Just do it again according to the log. That's how RDBMSs work nowadays. -pasu(InTheSky); 2001-4-13 (#45320@0)
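
"Doing it again" is only safe if repeating the operation gives the same result. A hedged sketch of one way to get that, assuming (beyond what the thread says) that the log stores the data as well as the offset: the record is logged first, and recovery re-applies it with pwrite() at the logged offset, so applying it twice is harmless. The record layout and the names redo_log_write and redo_apply are made up for this example, and the log holds a single record.

```c
#include <fcntl.h>      /* open() flags for callers that create the fds */
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical single-record redo log: where to write, how much, and what. */
struct redo_record {
    off_t  offset;
    size_t length;
    char   data[64];
};

/* Step 1: log the intended write and force the log to disk. */
int redo_log_write(int log_fd, off_t offset, const void *buf, size_t len)
{
    struct redo_record rec;
    if (len > sizeof rec.data)
        return -1;
    memset(&rec, 0, sizeof rec);
    rec.offset = offset;
    rec.length = len;
    memcpy(rec.data, buf, len);
    if (pwrite(log_fd, &rec, sizeof rec, 0) != (ssize_t)sizeof rec)
        return -1;
    return fsync(log_fd);
}

/* Step 2 (also run at restart): apply the logged write to the data file.
 * pwrite() at a fixed offset makes this idempotent. Returns 0 on success,
 * including the case of an empty log. */
int redo_apply(int log_fd, int data_fd)
{
    struct redo_record rec;
    ssize_t n = pread(log_fd, &rec, sizeof rec, 0);
    if (n == 0)
        return 0;                      /* nothing logged, nothing to redo */
    if (n != (ssize_t)sizeof rec || rec.length > sizeof rec.data)
        return -1;
    if (pwrite(data_fd, rec.data, rec.length, rec.offset) != (ssize_t)rec.length)
        return -1;
    return fsync(data_fd);
}
```

If the crash happens after the log write but before the data write, redo_apply() at restart completes the write. If it happens after the data write, redo_apply() just writes the same bytes again.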

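I think you still haven't stated the problem clearly -flying_snow(飞雪浮冰); 2001-4-12 {301} (#44743@0)
Sequential file writes can already continue from the end of the existing file. Generally the system first keeps the content in an in-memory buffer and writes it to disk after some time, or when the file is closed. Of course, if you call int fflush( FILE *stream ) you can force the buffered data to be written out, so your breakpoint in the file is always at the end, and the next time you open the file you can simply carry on writing. In short, if you can describe the situation more clearly, we can prescribe the right remedy. :-)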
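
A short sketch of the append-and-flush approach, with an invented helper name (append_record). One caveat: fflush() only hands the stdio buffer to the kernel; the sketch also calls fsync() to ask the kernel to push the data to the disk.

```c
#include <stdio.h>
#include <unistd.h>

/* Append one line to path and push it toward the disk before returning.
 * Opening in "a" mode means every write lands at the current end of file,
 * so a restarted program continues after whatever was written before.
 * Returns 0 on success, -1 on failure. */
int append_record(const char *path, const char *line)
{
    FILE *fp = fopen(path, "a");
    if (!fp)
        return -1;
    int ok = fputs(line, fp) >= 0;
    ok = (fflush(fp) == 0) && ok;          /* stdio buffer -> kernel */
    ok = (fsync(fileno(fp)) == 0) && ok;   /* kernel cache -> disk */
    ok = (fclose(fp) == 0) && ok;
    return ok ? 0 : -1;
}
```

This only covers pure append workloads; it does not tell a reader how far it got, and it does not help when writes land in the middle of a preallocated file.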

This was not my question. I was asked this question before. It all depends on how you answer it. Your answer may get an excellent score. It's all up to the interviewer. -albxu(Yukon); 2001-4-12 (#44751@0)

Yeah, the guy just wants to find out how well you understand the operating system. -lumlumq(lumlum); 2001-4-12 (#44949@0)

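Not necessarily. A database, for example, usually allocates a block of space first and then writes content inside it. In that case the breakpoint won't be at the end. -pasu(InTheSky); 2001-4-12 (#44757@0)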