spearfoot / disk-burnin-and-testing
Shell script for burn-in and testing of new or re-purposed drives
License: Other
I ran your script on an 8 TB disk, but it finished after just a couple of days. I discovered that badblocks had exited immediately for no apparent reason: it simply said that it had finished, and the script continued with the next step.
When I ran badblocks myself I got the message
/dev/sdb is apparently in use by the system; it's not safe to run badblocks!
Apparently I had forgotten to unmount the drive. It would be nice if the script checked whether the device is mounted as soon as you run it and warned you.
Thanks for an awesome project!
And as a result, it skips execution of the badblocks program.
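A pre-flight check along these lines could catch the mounted-device case before badblocks silently bails out. This is only a sketch: the function name and the grep on /proc/mounts are my assumptions, and it is Linux-specific.

```shell
#!/bin/sh
# Sketch of a mount guard (Linux-specific): refuse to proceed if the
# device or any of its partitions appears in /proc/mounts.
is_mounted() {
  # $1 = device path, e.g. /dev/sdb; also matches partitions like /dev/sdb1
  grep -q "^$1" /proc/mounts
}

drive="/dev/sdb"    # example target; substitute your device
if is_mounted "$drive"; then
  echo "Error: $drive (or a partition on it) is mounted; aborting." >&2
else
  echo "$drive is not mounted; safe to test."
fi
```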
WDC Green model: WDC_WD20EARX-00PASB0
I was reading https://www.reddit.com/r/freenas/comments/adgef1/slow_sequential_write_speed_new_8_disk_raidz2/ where they did not get the expected performance due to one drive, and the SMART test did not indicate this. Is there a need to do a performance test as part of your burn-in and testing of new drives?
If there is, I think such a test is within the scope of this script, to identify faulty or degraded drives early.
Which tool could be used, I do not know.
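For what it's worth, a crude sequential-throughput probe could flag a drive that reads much slower than its identical siblings. This is a sketch, not part of the script: the dd invocation and the "compare against siblings" heuristic are my assumptions.

```shell
#!/bin/sh
# Rough sequential-read probe: dd a fixed amount and let dd report the rate.
# A drive reading far slower than identical siblings may be degraded.
seq_read_probe() {
  # $1 = device or file to read, $2 = MiB to read
  dd if="$1" of=/dev/null bs=1M count="$2" 2>&1 | tail -n 1
}

# Read-only example (does not modify the drive):
# seq_read_probe /dev/sdX 1024
```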
Thanks for the tool! In case you'd care to include it in the readme:
Fedora 33 Workstation
WDC_WD140EDFZ-11A0VA0 (RED?)
I'm curious: why is the badblocks test performed four times with four different patterns? Is there an option to run a single pattern, and if so, what would be the best pattern to use, even if it's not as rigorous?
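For reference, badblocks does accept a -t option to supply a test pattern, and a single pass with -t random is a common compromise when the default four passes (0xaa, 0x55, 0xff, 0x00) take too long. A sketch follows; the wrapper function is mine, and note that -w is destructive:

```shell
#!/bin/sh
# Single-pattern destructive write test instead of the default four
# patterns. "-t random" runs one pass with a random pattern.
single_pattern_badblocks() {
  # $1 = device, e.g. /dev/sdX  (WARNING: -w destroys all data on it)
  badblocks -b 4096 -wsv -t random "$1"
}

# single_pattern_badblocks /dev/sdX
```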
This is a great script. It would be great to take some features from nwipe, such as showing the time remaining until completion and a PDF report of what was done and what passed or failed.
Hi. I'm new to FreeNAS and setting up my first NAS. Been slowly working my way through figuring everything out.
I'm running your script as we speak on 1 drive of mine. It just returned this:
+-----------------------------------------------------------------------------
+ Run badblocks test on drive /dev/da0: Sat Jun 20 15:46:58 EDT 2020
+-----------------------------------------------------------------------------
Checking for bad blocks in read-write mode
From block 0 to 1465130645
Testing with pattern 0xaa: set_o_direct: Inappropriate ioctl for device
6.86% done, 31:09 elapsed. (0/0/0 errors)
I'm not sure what "Inappropriate ioctl" means, and this thing still has 9+ hours to go before it's done. Should I be concerned?
If it helps, the disk I'm running it on is a Seagate Enterprise NAS drive. 6 TB. SATA 3.0 (6 Gbps). Model number is ST6000NM0115-1YZ.
I don't think or expect you to help me with my particular device. Just wondering if you might share some insight?
Thanks!
I saw the note about the long testing times, and looked up expected times for badblocks on the disks I'm using (4TB). I found this useful answer on superuser, which mentioned that adjusting the value used for the "-c" flag made a big difference to the speed:
badblocks -svn /dev/sdb
To get to 1%: 1 Hour
To get to 10%: 8 hours 40 minutes
badblocks -svn -b 512 -c 32768 /dev/sda
To get to 1%: 35 Minutes
To get to 10%: 4 hours 10 minutes
badblocks -svn -b 512 -c 65536 /dev/sda
To get to 1%: 16 Minutes
To get to 10%: 2 hours 35 minutes
I naturally wondered if there's a downside to setting a higher "-c" value. Another helpful answer mentioned this:
The -c option corresponds to how many blocks should be checked at once. Batch reading/writing, basically. This option does not affect the integrity of your results, but it does affect the speed at which badblocks runs. badblocks will (optionally) write, then read, buffer, check, repeat for every N blocks as specified by -c. If -c is set too low, this will make your badblocks runs take much longer than ordinary, as queueing and processing a separate IO request incurs overhead, and the disk might also impose additional overhead per-request. If -c is set too high, badblocks might run out of memory. If this happens, badblocks will fail fairly quickly after it starts. Additional considerations here include parallel badblocks runs: if you're running badblocks against multiple partitions on the same disk (bad idea), or against multiple disks over the same IO channel, you'll probably want to tune -c to something sensibly high given the memory available to badblocks so that the parallel runs don't fight for IO bandwidth and can parallelize in a sane way.
I'm currently testing 6x 4TB disks and my memory use is under 300M, so that doesn't seem to be much of an issue. Is there another reason this option isn't used by the script?
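As a rough sanity check on memory: the buffer badblocks allocates is on the order of the -c value times the -b block size (read-write mode keeps a comparison buffer as well, roughly doubling that). This is my back-of-envelope estimate, not something taken from the script:

```shell
#!/bin/sh
# Back-of-envelope buffer size for a badblocks run: -c blocks-at-once
# multiplied by -b block size, expressed in MiB.
buffer_mib() {
  # $1 = -c value (blocks at once), $2 = -b value (block size in bytes)
  echo $(( $1 * $2 / 1024 / 1024 ))
}

buffer_mib 65536 512    # prints 32  (MiB per buffer)
```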
First of all - thank you for publishing this little gem.
I'm new to this NAS game, bought used disks (16x 2 TB), and wanted to know exactly what I've got on my hands.
So I made myself a bootable USB stick with Ubuntu 18.04, ensured that all tools were available, and fetched this script. It ran very fast at first, and I wondered: "large disks may take a long time"... hmm, what constitutes a large disk?
Then I read the entire readme, carefully, and lo and behold, hidden there in the middle: "disable dry run". Shame on me for not RTFM. But bubbling this up to the top would be very helpful for newcomers.
Lastly, I derived a "clever" method of running the tool for many disks (since I have quite a few drives and didn't want to sit and wait for each one to finish):
ls /dev/sd[a-z] | cut -d'/' -f3 | sudo parallel -I{} ./wrapper.sh {}
# wrapper.sh contains this:
#!/bin/bash -xe
./disk-burnin.sh "$1" > "logs/$1.log"
What I'm in doubt about is: is this a good method? Does the parallel running degrade performance or in any way invalidate the test? I know this also tries to test my CD drive on /dev/sdr,
but hey, worst-case it fails :-)
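On the CD-drive point: one way to avoid feeding optical drives and partitions to the wrapper is to let lsblk filter for whole disks. A sketch; the function name is mine, and it assumes util-linux's lsblk:

```shell
#!/bin/sh
# List only whole-disk devices (TYPE "disk"), skipping optical drives
# ("rom") and partitions ("part"), using lsblk from util-linux.
list_whole_disks() {
  lsblk -ndo NAME,TYPE | awk '$2 == "disk" { print $1 }'
}

# list_whole_disks | sudo parallel -I{} ./wrapper.sh {}
```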
From this, I also feel it would be nice if the script accepted a full device path rather than a bare device name; to me it would be more logical to look in /dev/disk/by-path/ to figure out which disks to test.
I would be more than happy to submit a PR with these changes, I just didn't want to do too much without understanding what I'm actually doing.
EDIT, More questions:
It seems the polling logic does not work with smartmontools release 6.6 (dated 2016-05-07 at 11:17:46 UTC), due to a changed output format (this might be Ubuntu 18.04 related). Also, the mentioned version has an option to run the self-test in the foreground; is there a particular reason for not doing this (maybe because the option didn't exist earlier)?
So in summary, the questions are:
I hope this is at least somewhat helpful feedback. :-)
/Nwillems
Pro:
Contra:
I have been doing this for years without issues btw, ref: https://github.com/ypid/scripts/blob/master/badblocks_and_secure_erase
It appears the issue relates to weak logic surrounding SAS models. However, more test cases should be provided to confirm whether this is a protocol difference or a difference in how manufacturers report SMART data.
It appears the script is incorrectly parsing smartctl results, as the script reports the following:
but sudo smartctl --all /dev/sda clearly shows the expected data.
Expected behavior: correctly parse the results of smartctl so the script can function accordingly.
Hi, thank you so much for writing this little script! There were a few issues I had running it on a new 8 TB WD WD80EZAZ-11TDBA0 in a My Book external hard drive enclosure.
First and foremost, running the script without any modification returned "Please specify device type with the -d option." After a bit of Googling, I found a post from 2014 (https://bugs.freedesktop.org/show_bug.cgi?id=79379) that led me to the solution: adding -d sat after every instance of smartctl in the code. It was quick and dirty, but it worked. I don't think this can be implemented directly in the script, because it may cause breakage for others, but I did want to post it somewhere others can find it if they run into the same issue. This looks to be a problem with smartctl not automatically recognizing the connector.
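One way the script could cope without hard-coding -d sat everywhere is to probe the device once and reuse the option. This is a sketch; the function and the fallback order are my assumptions, not the script's actual behavior:

```shell
#!/bin/sh
# Probe a device once: if plain smartctl can't identify it (common with
# USB-SATA bridges), fall back to forcing the SAT pass-through driver.
smartctl_opts_for() {
  # $1 = device path, e.g. /dev/sdX
  if smartctl -i "$1" >/dev/null 2>&1; then
    echo ""
  else
    echo "-d sat"
  fi
}

# opts=$(smartctl_opts_for /dev/sdX)
# smartctl $opts -a /dev/sdX
```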
I also have some suggestions for the readme file. Since root privileges were required when I ran the script, it might be useful to let people know they can run it in a single line on the terminal as sudo bash ./disk-burnin.sh sdX. Secondly, since the user does need to set the Dry_Run variable to 0, it might also be helpful to bold the line "The script is distributed with 'dry runs' enabled, so you will need to edit the Dry_Run variable, setting it to 0, in order to actually perform tests on drives." or even have that echoed whenever a user runs the script. (I'm not an IT guy or programmer by trade, so I know that modifying scripts is something that might trip newcomers up.)
Thanks for your help with writing this script and making it available to others!
There is no check for the availability of smartmontools. When smartmontools isn't installed, running the script gives:
scripts/disk-burnin-and-testing-master/disk-burnin.sh: 263: smartctl: not found
scripts/disk-burnin-and-testing-master/disk-burnin.sh: 264: smartctl: not found
[2020-10-05 09:49:59 UTC] +-----------------------------------------------------------------------------
[2020-10-05 09:49:59 UTC] + Started burn-in
[2020-10-05 09:49:59 UTC] +-----------------------------------------------------------------------------
[2020-10-05 09:49:59 UTC] Host:ubuntu-server
[2020-10-05 09:49:59 UTC] OS Flavor: Linux
[2020-10-05 09:49:59 UTC] Drive: /dev/sdc
[2020-10-05 09:49:59 UTC] Disk Type: non-mechanical
[2020-10-05 09:49:59 UTC] Drive Model:
[2020-10-05 09:49:59 UTC] Serial Number:
[2020-10-05 09:49:59 UTC] Short test duration: minutes
[2020-10-05 09:49:59 UTC] 0 seconds
[2020-10-05 09:49:59 UTC] Extended test duration: minutes
[2020-10-05 09:49:59 UTC] 0 seconds
[2020-10-05 09:49:59 UTC] Log file:/home/rakoczy/diskc/burnin-.log
[2020-10-05 09:49:59 UTC] Bad blocks file:/home/rakoczy/diskc/burnin-.bb
[2020-10-05 09:49:59 UTC] +-----------------------------------------------------------------------------
[2020-10-05 09:49:59 UTC] + Running SMART short test
[2020-10-05 09:49:59 UTC] +-----------------------------------------------------------------------------
scripts/disk-burnin-and-testing-master/disk-burnin.sh: 1: eval: smartctl: not found
[2020-10-05 09:49:59 UTC] SMART short test started, awaiting completion for 0 seconds ...
scripts/disk-burnin-and-testing-master/disk-burnin.sh: 483: eval: smartctl: not found
scripts/disk-burnin-and-testing-master/disk-burnin.sh: 490: eval: smartctl: not found
scripts/disk-burnin-and-testing-master/disk-burnin.sh: 483: eval: smartctl: not found
scripts/disk-burnin-and-testing-master/disk-burnin.sh: 490: eval: smartctl: not found
scripts/disk-burnin-and-testing-master/disk-burnin.sh: 483: eval: smartctl: not found
scripts/disk-burnin-and-testing-master/disk-burnin.sh: 490: eval: smartctl: not found
^C
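A guard like the following at the top of the script would fail fast with one clear message instead of a stream of "smartctl: not found" errors. This is a sketch; the function name is mine:

```shell
#!/bin/sh
# Fail fast if required external tools are missing from PATH.
require_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || {
      echo "Error: required tool '$tool' not found in PATH" >&2
      return 1
    }
  done
}

# require_tools smartctl badblocks || exit 1
```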
Hello, thanks for this script and the write up on your blog.
I'm running your script on a new disk under FreeBSD after having 1 of 3 new disks fail on me. (I'm a total noob, and the last thing I expected (trying to rescue my raid from near death) was a problem with the new disk!)
Anyway, I'm now running a burn in on the RMA'd replacement, but I forgot to execute this first:
sysctl kern.geom.debugflags=0x10
Should I now:
I read somewhere that after you've set this kernel flag you should unset it again later (e.g. by rebooting) to avoid 'problems'... (Note that my pool is currently online, DEGRADED but backed up, as I'm using the FreeNAS box itself to burn in the new disk.)
Sorry for the noob questions, and thanks for any advice,
Dan.
I found a good manual for burn-in testing on reddit:
In steps 3-5, he also makes an additional check with ZFS and f3write.
Would it make sense to add these steps in your script, too?
When trying to run this script on some 18 TB drives, badblocks threw the following error:
badblocks: Value too large for defined data type invalid end block (4394582016): must be 32-bit value
It seems this is most likely because the block size is not large enough for drives of this size, so the block count overflows badblocks' 32-bit limit. Can we get a dynamic block size based on drive size, or another command-line parameter to set this manually if we choose?
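badblocks stores block numbers in 32-bit values, so the block count must stay below 2^32. A sketch of choosing the smallest power-of-two block size that fits (the function is mine; in practice -b 8192 suffices for 18 TB drives):

```shell
#!/bin/sh
# Pick the smallest power-of-two block size that keeps the badblocks
# block count below 2^32 (its 32-bit limit).
pick_block_size() {
  # $1 = device size in bytes (e.g. from: blockdev --getsize64 /dev/sdX)
  bs=1024
  while [ $(( $1 / bs )) -ge 4294967296 ]; do
    bs=$(( bs * 2 ))
  done
  echo "$bs"
}

pick_block_size 18000000000000    # 18 TB -> prints 8192
```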
Hi, I came across this script and it's incredibly helpful, so thanks! One question I have is why you opted to change the block-size flag to -b 8192 rather than, say, doubling the blocks checked at once using -c?
I found that running badblocks with -b 4096 was writing at around 25 MB/s, which would have resulted in my 8 TB drive completing after 16 days. By modifying the call to badblocks to use -b 4096 -c 128 (double the default), I saw write speeds nearly double. I didn't fancy going higher, just to avoid any potential issues with badblocks misreporting anything, but figured there must be a sweet spot somewhere for larger drives?
Thanks.