12 May 2010

Recover Corrupted Archive Download with cURL

This post actually started in late April...

It never crossed my mind until something bad happened. Several days ago I started downloading a nostalgic "DBZ" game called Bid for Power 5.0 (a mod of the open-source Quake 3). On my 128 Kbps connection it took several days to complete. Worse, it is a single 2.2 GB RAR archive (luckily not an installer).

I once downloaded the CentOS 5 DVD ISO, and because it was corrupted I had to re-download dozens of individual packages and rebuild the ISO image from scratch using cdrtfe (a really painful job). So it wasn't surprising that Bid for Power got corrupted too. At first I blamed Free Download Manager for letting it happen, since it has a rollback feature that should prevent exactly this. Then again, the download ran in multipart mode (2 parts) across 2 different source servers.

The idea is to make the downloader re-fetch only the bad block (generally no more than 100 KB). That's where an open-source program called cURL comes to the rescue.

cURL can download just a byte range of a file, as in "give me only 1 MB from the middle!". We then need to know exactly where that "middle" is; in technical terms, the offset (in bytes) into the file. I assume everyone has an archive manager like 7-Zip. We will test the archive repeatedly, and to see where it fails we need a tool that monitors filesystem read/write activity: Filemon. Once we know the range, we patch the corrupted part of the file, so we also need a hex editor that can open very large files (I recommend the freeware HxD).

Download cURL 7.20 (526 KB, with OpenSSL 1.0, IPv6 and IDN support)

Step by step instructions:

1. Let your sluggish internet connection corrupt your bigass download.:))
2. Fire up Filemon and set its filter as in this screenshot:
3. Click OK to proceed.
4. Back up your corrupted file first, just in case.
5. Open your corrupted archive with 7-Zip and try to extract it somewhere. In my case:

Note:
This file, blender25_24195.7z, is 13,011,400 bytes in size and was compressed in 7-Zip's solid mode. That's why you'll see the progress bar reach 100%: in solid mode the files are compressed together as one continuous stream, so 7-Zip reports the other listed files in that stream as corrupted too. RAR also supports solid mode.

If the archive is not solid, the progress bar should show roughly where (as a percentage) the file is corrupted.
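
Without Filemon, that percentage can at least give a rough starting range for cURL. The hypothetical Python helper below (not part of the original workflow) turns a percentage into an inclusive byte window. It assumes the progress percentage maps roughly linearly onto the position in the archive, which is only approximate, so it widens the guess by an arbitrary 512 KiB on each side. The 30% value is purely illustrative.

```python
def offset_window(percent, archive_size, margin=512 * 1024):
    """Return an inclusive (start, end) byte range around percent% of the file.

    ASSUMPTION: the progress percentage maps roughly linearly onto the
    position in the archive. Because it is only approximate, `margin`
    (an arbitrary 512 KiB here) widens the window on both sides.
    """
    guess = int(archive_size * percent / 100)
    start = max(0, guess - margin)
    end = min(archive_size - 1, guess + margin - 1)  # -1: curl ranges are inclusive
    return start, end


# blender25_24195.7z is 13,011,400 bytes; 30% is only an illustrative value.
start, end = offset_window(30, 13_011_400)
print(f"curl -r {start}-{end}")  # curl -r 3379132-4427707
```

If the patched range doesn't fix the archive, widen the margin or move the window and try again.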

6. Now go back to Filemon, which should have recorded all activity on the .7z file. In my case:

Notice the last offset there: that is the last read 7-Zip made before it failed. It means the bad block lies within the 1,048,576 bytes (1 MB) read starting at offset 3,912,010, i.e. between offsets 3,912,010 and 4,960,585. It's a PITA: the bad block itself is probably less than 100 KB, but we don't know exactly where it is within that 1 MB.
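
The range arithmetic is easy to get wrong by one byte, because HTTP byte ranges (and therefore cURL's -r option) include both the first and the last byte. A small Python sketch; `curl_range` is a hypothetical helper name, and the numbers are the offset and length from the Filemon log in this step. It also prints the hex form that HxD displays.

```python
def curl_range(offset, length):
    """Turn a Filemon read (offset, length) into a curl -r argument.

    HTTP byte ranges, and therefore curl's -r option, include both the
    first and the last byte, so the last byte is offset + length - 1.
    """
    return f"{offset}-{offset + length - 1}"


# Offset and length of the last read in the Filemon log (step 6).
offset, length = 3_912_010, 1_048_576
print(curl_range(offset, length))      # 3912010-4960585
print(f"HxD offset: {offset:X} hex")   # HxD offset: 3BB14A hex
```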

7. Now it's cURL's turn. cURL is a very advanced command-line program that might be confusing at first, but for this demonstration you only need a few of its options. Start by copying curl.exe into the Windows folder (so you can call it from anywhere), then open a command prompt, either from Run > cmd or from Explorer's shell extension. Once you're greeted by the C:\ prompt, type (in my case):

curl -# -r 3912010-4960585 -o bad_block_1.part http://www.something.com/file/blender25_24195.7z

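Replace the URL with your own download URL. Here -r sets the byte range (both the first and the last byte are included), -o names the output file and -# shows a simple progress bar.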
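
For anyone who prefers scripting, here is a rough Python equivalent of that command using only the standard library. `fetch_range` is a hypothetical helper, not part of cURL. Beyond what the command does, it explicitly checks that the server answered 206 Partial Content (a 200 means the Range header was ignored and the whole file is coming) and that exactly end - start + 1 bytes arrived.

```python
import urllib.request


def fetch_range(url, start, end, out_path):
    """Download bytes start..end (inclusive) of url into out_path.

    Rough equivalent of: curl -r start-end -o out_path url
    """
    if start < 0 or end < start:
        raise ValueError("invalid byte range")
    req = urllib.request.Request(url, headers={"Range": f"bytes={start}-{end}"})
    with urllib.request.urlopen(req, timeout=60) as resp:
        if resp.status != 206:
            # 200 OK means the server ignored Range and is sending the whole file.
            raise RuntimeError(f"server ignored Range (HTTP {resp.status})")
        data = resp.read()
    expected = end - start + 1  # inclusive range
    if len(data) != expected:
        raise RuntimeError(f"expected {expected} bytes, got {len(data)}")
    with open(out_path, "wb") as f:
        f.write(data)
    return len(data)
```

Usage, with your own URL in place of the example one: fetch_range("http://www.something.com/file/blender25_24195.7z", 3912010, 4960585, "bad_block_1.part").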

8. cURL will download only the part of the file you requested. When it's done, run HxD and open your corrupted file.
Press Ctrl-G to show the Goto dialog, select dec, paste 3912010 into it and click OK.

Now, in my case, the cursor should be at offset 3912010 (3BB14A in hex).

9. Open your part file in HxD as well:
Press Ctrl-A to select all, then Ctrl-C to copy the bytes. We will overwrite the corrupted region with this fresh chunk.

10. Switch to the first tab (the corrupted file, with the cursor still at offset 3912010) and press Ctrl-B (paste overwrite); the overwritten block will be marked in red. You can now save it (make sure you have closed the 7-Zip window that opened the same file earlier).

11. Now try to extract it again with 7-Zip, as in step 5. If everything is fine, you're done. If it still reports an error, check the Filemon output again for the last offset and repeat the steps above.
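
Steps 8 to 10 boil down to "write these bytes at this offset", which is also easy to script. A minimal Python sketch (`patch_file` is a hypothetical name; HxD remains the tool used in this post): opening the file in "r+b" mode overwrites bytes in place, like HxD's paste-overwrite, without inserting or truncating anything.

```python
import os


def patch_file(target_path, offset, part_path):
    """Overwrite target_path in place, starting at offset, with part_path's bytes.

    Same effect as HxD's Goto (Ctrl-G) followed by paste-overwrite (Ctrl-B).
    Refuses to write past the end, so the file size never changes.
    """
    with open(part_path, "rb") as part:
        chunk = part.read()
    size = os.path.getsize(target_path)
    if offset < 0 or offset + len(chunk) > size:
        raise ValueError("patch would run past the end of the target file")
    with open(target_path, "r+b") as target:  # read/write, no truncation
        target.seek(offset)
        target.write(chunk)
    return len(chunk)


# Usage with the example's file names (back up the archive first, as in step 4):
# patch_file("blender25_24195.7z", 3912010, "bad_block_1.part")
```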


Conclusion: this repetitive task might be tedious, but for me it's much better than waiting days for a full re-download that will (probably) get corrupted again. Oddly, why hasn't any download manager (at least none that I know of) implemented this as a feature?

Other tips:
With cURL you can also recover Firefox's corrupted, or more precisely incomplete, downloads: those *.part files, or any file that "finished" but is actually incomplete. All you need to do is open a command prompt in your download folder and run:

curl -# -C - -o "your-incomplete-file.part" "http://www.downloadsite.com/your-incomplete-file.mp3"
Once it completes, rename the file to *.mp3 (for example). The -C - option tells cURL to work out where to resume from the size of the existing output file.
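
The same idea can be sketched in Python if cURL isn't at hand. These helpers are hypothetical and only mirror the idea behind -C -: take the size of the partial file as the resume offset, ask for an open-ended range (bytes=N-), and append the rest. It assumes the server supports byte ranges, and it refuses to append if the server answers 200 instead of 206.

```python
import os
import urllib.request


def resume_request(url, partial_path):
    """Build a request for the bytes still missing from partial_path."""
    have = os.path.getsize(partial_path) if os.path.exists(partial_path) else 0
    return urllib.request.Request(url, headers={"Range": f"bytes={have}-"})


def resume_download(url, partial_path, chunk_size=64 * 1024):
    """Append the rest of url to partial_path (assumes Range support)."""
    req = resume_request(url, partial_path)
    with urllib.request.urlopen(req, timeout=60) as resp:
        if resp.status != 206:
            # 200 means the whole file is being resent; appending it would
            # duplicate data, so stop instead.
            raise RuntimeError(f"server cannot resume (HTTP {resp.status})")
        with open(partial_path, "ab") as out:
            while True:
                chunk = resp.read(chunk_size)
                if not chunk:
                    break
                out.write(chunk)
```

Usage: resume_download("http://www.downloadsite.com/your-incomplete-file.mp3", "your-incomplete-file.part"), then rename the file as above.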

Other notes:
If cURL stops (or stops too often), add
--retry 10
(10 retries, or however many you like)
If cURL always gets stuck at the same location (percentage), see this post. You may also route the connection through a proxy as follows:
--proxy 127.0.0.1:8080
(toonel's default address)
Filemon's newer successor, Process Monitor (procmon), is somewhat confusing and I don't know whether it shows offsets. So your best bet is to stick with the older Filemon under Windows XP.

1 comment:

  1. hum, I am very thankful for the load of information you have published here; trying to salvage some splitted files and all (what kind of ppl do that nowadays anyway?!) I will have to re-read your post as I have not all my head right now and it is getting late on my local timezone. Just a side note as on Vista/7 the process monitor is just that: ProcessMonitor (to be fetched from the MSFT website anyway), although it differs and not having used the former version of the utility I found it confusing nevertheless.

    Regards,

    Tam

    PS: 128k connection (ouch) I will pray the tech gods to provide a greater connection stream to your remote location. :p
