Complete SSD failure: Dell and HPE release firmware against 40K hour bug

Denial:

Seems like a weird way to obsolete a subset of products. Seems more like it might just be a bug.
The thing is, it's ridiculously specific and also odd, to be a bug. Someone had to write something very specific for that to happen.
Moderator
Sooooo a few of us should take a break from posting in here... And calm down a little too.
Neo Cyrus:

The thing is, it's ridiculously specific and also odd, to be a bug. Someone had to write something very specific for that to happen.
I don't get why that has to be the case. Various parts of the system firmware use the drive time. For example, garbage collection and various other algorithms that optimize the drive all run at various times in the drive's life. Most enterprise drives are designed to fail the entire drive if they detect an issue - so any issue involving these algorithms and the drive time could cause the drive to fail.
Ne1l:

This 'bug' appeared last year too.. when is a bug a feature? https://blocksandfiles.com/2019/11/25/hpe-issues-firmware-fix-to-to-stop-ssd-failure/
I mean yeah if you bothered to read the article it mentions it.
Ugh, it's just an oversight, probably an intern submitting unchecked code.
"I mean yeah if you bothered to read the article it mentions it" I did read it. Last year's does look like a bug, but once bitten, twice shy? Wouldn't they specifically double-check this when they create future FW and order SSDs? Anyway, I'm outta here; mods appear to let HeavyHemi act like his avatar and abuse people, call them liars and clearly contradict himself while doing so: "You should learn how to counter a fact based logical argument || I am not making an argument"
@Aura89 First they had it 'after precisely 32,768 hours of usage'... now after 40,000 h (so stupid/annoying that companies want a 5-year warranty)... (https://www.guru3d.com/news-story/hp-enterprise-ssd-usersplease-check-and-update-firmware-(before-a-kill-switch-kicks-in).html)

One point is that an SSD (SLC/MLC) can last a very long time if it's only modestly hammered with write cycles... A 'bad' example is my Crucial C300 256GB, which has its 10th anniversary this year. It's only a consumer SSD with a 3-year warranty, and it has been running for the last 9.6 years as a Windows 7 system drive with no issues whatsoever... Another 'bad example' is the 850 Pro lineup from Samsung, which has a 10-year warranty. I personally own 4 drives (2 x 512GB, 1TB & 2TB). They tested two of the 256GB models at heise.de: the 'weak' one died after 2.2 PB and the 'good' one after 9.1 petabytes(!!!). Here's the link: https://www.heise.de/newsticker/meldung/SSD-Langzeittest-beendet-Exitus-bei-9-1-Petabyte-3755009.html

And those HP drives are enterprise-level drives... they come with way higher write endurance than consumer drives...

The company wants to sell (hard disks & much more), but when you suddenly have customers that order more than a 2 or 3-year warranty, then such a 'bug' must be 'fixed'... which leads me to the question: why was there such a bug at all? And why didn't they remove the 32,768 h 'bug' completely, but only (!)extend(!) the life cycle to 40,000 h, which is called a 'bug' again (because 5 years are 43,800 h)??? Or do you sell those drives?!? Just open your eyes!

For me personally, the probability of this being a coincidental 'bug' is as high as the probability that big companies like HP or even bigger ones care only for the customer & not for themselves. Just my 2 cents...
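For anyone who wants to check the hour figures being thrown around, here is a quick sketch of the conversion. It assumes a 365-day year (leap days ignored), and the names `HOURS_PER_YEAR` and `hours_to_years` are made up for this illustration.

```python
# Convert the power-on-hour figures from this thread into years.
# Assumption: a year is 365 days; leap days are ignored.
HOURS_PER_YEAR = 365 * 24  # 8,760 h


def hours_to_years(hours):
    """Return the number of (365-day) years that `hours` corresponds to."""
    return hours / HOURS_PER_YEAR


if __name__ == "__main__":
    for label, hours in [
        ("first reported failure point", 32_768),
        ("second reported failure point", 40_000),
        ("5-year warranty", 5 * HOURS_PER_YEAR),
    ]:
        print(f"{label}: {hours:,} h = {hours_to_years(hours):.2f} years")
```

Under that assumption, both failure points land before a 5-year warranty would run out, but only if the drive is powered on around the clock for the whole period.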
DG21:

@Aura89 The company wants to sell (hard disks & much more), but when you suddenly have customers that order more than a 2 or 3-year warranty, then such a 'bug' must be 'fixed'... which leads me to the question: why was there such a bug at all? For me personally, the probability of this being a coincidental 'bug' is as high as the probability that big companies like HP or even bigger ones care only for the customer & not for themselves.
Nicely put... The Crucial C300 & Micron C300 were identical apart from the enterprise price and FW. I've still got a few Microns alive and kicking too. I think I even cross-flashed one, as the initial Micron FW was buggy and we were literally binning hundreds. Maybe useful for anyone with a Micron C300 bought from eBay that's dead on a shelf somewhere: the drive would freeze and not appear in the BIOS. FW 07 fixed it, but you needed to go to FW03 first. If a drive did freeze, you removed the SATA cable or just supplied power for 30 minutes, which triggered an internal reset, and it would come alive again.
Ne1l:

Maybe useful for anyone with a Micron C300 bought from eBay that's dead on a shelf somewhere: the drive would freeze and not appear in the BIOS. FW 07 fixed it, but you needed to go to FW03 first. If a drive did freeze, you removed the SATA cable or just supplied power for 30 minutes, which triggered an internal reset, and it would come alive again.
thanx 4 the hint! 🙂
Aura89:

Not sure how people don't understand that you need proof to validate a claim; you don't need proof to not make one. It's about the oddest argument I've ever heard. I'm not making an argument. I'm not making a claim. I have no proof because there's zero reason to have proof for a lack of a claim. Really can't get more simple than that. You, on the other hand, decided to make a claim with zero proof. That's called a conspiracy theory.

Here, let me put it in a way even the simplest person could understand. If you bring someone to court claiming they stole from you, it's up to YOU to prove it. If you don't have proof, your court claim is lost and you'll likely have to pay their court fees. What do they have to prove? Absolutely nothing. They only have to prove their innocence if you HAVE proof. And if during this court case you told this long story about how they stole from you, but then the lawyer (me in this instance) asked you for proof, you don't get to say "well, prove it didn't happen!". No, that's not how anything works. You don't prove something didn't happen, you prove that it did, and the people claiming it are the ones who have to do it.

The world may be going crazy right now, but left is still left and right is still right, and having to prove a lack of a claim still makes zero sense. ....Unless you want me to claim you're biased @cryohellinc because you and I have worked in the past for a competitor to Dell and HP and the only reason you're making these statements is because you still work for them and are trying to drive business over to the company you work for and that it completely lines up with your personality that I know you have since again, I know you and have worked for you.... Oh, want me to prove the statement I just made? No, you prove that it's not true. Let's all go back to the Salem witch trials, since that apparently is where you wish to be.
First, well, u are claiming it's a bug. Where is the proof that it is a bug? U have the testimony of the accused, which isn't worth a lot.

Second, u are saying it yourself, "if you bring someone to court", so u are entitled to suspect and to ask the justice system for help in proving things (like a judge asking a company for otherwise non-public information). Making fun of people for taking the first step and suspecting something is silly. Are u saying we shouldn't be able to take others to court until we have absolute evidence? Yes, u need something for a judge to even open a case, but HP (and others) don't have a "clean" record in this matter, which is enough to warrant some raised eyebrows and forum posting.

As for reasons to do this and then patch it themselves, there are a couple. Forcing consumers to either apply an update or end up with bricked drives can be useful. For example:
- As already mentioned, a company sells the drives to a "recycler" that resells some of them to other customers who have neither the knowledge nor the means to perform the update. Making recycling harder is profitable for them. (Apple makes more money on services and repairs than...... u know the story)
- A window of opportunity to introduce new things in the firmware. U never know: if a war starts, it could be useful to brick or infect all enemy drives? Or the EULA changes and now they can pull usage data directly from the disk (too far-fetched, but not impossible at all).

Something I never understood is the value for the customer of having a "feature" that bricks a disk after detecting a legitimate failure. If a failure is detected, the drive can report it and let the customer decide what to do with the drive. What's the value in an automatic "implosion"??? I'm not being sarcastic, this is a genuine question.
Denial:

I don't get why that has to be the case. Various parts of the system firmware use the drive time. For example, garbage collection and various other algorithms that optimize the drive all run at various times in the drive's life. Most enterprise drives are designed to fail the entire drive if they detect an issue - so any issue involving these algorithms and the drive time could cause the drive to fail.
Garbage collection and other methods? How the hell could they screw that up to nuke the drive after X amount of hours? This is a genuine question, if you think it's something along the lines of that, how?
Neo Cyrus:

Garbage collection and other methods? How the hell could they screw that up to nuke the drive after X amount of hours? This is a genuine question, if you think it's something along the lines of that, how?
Look, I'm not going to pretend I'm a firmware engineer for an SSD manufacturer, but I did study CE for 5 years at RIT - lots of projects I worked on then and since use a device's power-on time as a trigger for various operations, notably maintenance ones, which is why I mentioned it. In the past, and I'm sure today, manufacturers of enterprise-grade SSDs would fail the entire drive in place if it detected any kind of error in the firmware or with the memory itself. They do this because when you build massive storage arrays, one drive out of hundreds or thousands is basically nothing cost-wise - yet one customer trying to pull error-prone data from a funked SSD creates a shitstorm of problems.

So let's say they stored the time in the wrong variable type, or the logic in the code has some case where, when it crosses a time threshold, one of the maintenance commands fails, or any other scenario where it's simply using time as a trigger and that causes an error - boom, the device is bricked. I don't know if this is what's happening - for all I know some intern or disgruntled engineer wrote shit code or intentionally put code in - or perhaps the CEOs of HP and Dell colluded to crash a subset of their drives at exactly the same time, but not before announcing and creating a patch for it. All I'm saying is that it's more than likely just a bug found in both companies' drives, because realistically Dell/HP are probably buying the drives/firmware from the same provider and that provider just didn't do the due diligence of proper unit testing. I think Hanlon's Razor is appropriate here.
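To make the "wrong variable type" scenario concrete, here is a minimal sketch. It is purely hypothetical: nothing in this thread confirms how the real firmware stores its power-on time, and `stored_power_on_hours` and `maintenance_check` are invented names. The assumption is a counter kept in a signed 16-bit integer, whose largest value is 32,767, plus a sanity check that fails the whole drive when it sees a value it considers impossible.

```python
import ctypes

INT16_MAX = 2**15 - 1  # 32,767: largest value a signed 16-bit integer can hold


def stored_power_on_hours(true_hours):
    """Hypothetical: the value a signed 16-bit counter would hold.

    ctypes.c_int16 does no overflow checking, so values past INT16_MAX
    wrap around to negative numbers, as they would in typical C firmware.
    """
    return ctypes.c_int16(true_hours).value


def maintenance_check(true_hours):
    """Hypothetical sanity check that treats a negative uptime as fatal."""
    hours = stored_power_on_hours(true_hours)
    if hours < 0:
        return "FAIL_DRIVE"  # enterprise-style policy: fail the whole drive
    return "OK"


if __name__ == "__main__":
    for h in (INT16_MAX, INT16_MAX + 1):
        print(h, stored_power_on_hours(h), maintenance_check(h))
```

In this sketch the drive is fine at 32,767 hours and fails one hour later, which would line up with the 32,768-hour figure mentioned earlier in the thread. It would not explain a failure at 40,000 hours, which is not a power of two; that case would need a different cause, such as a faulty threshold comparison.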