The right FUCKING time to get TWO ram sticks damaged

Wispy2891@lemmy.world · edit-2 4 months ago

The right FUCKING time to get TWO ram sticks damaged

ascendings@fedia.io · 4 months ago

If you haven’t yet, I would try disabling the XMPP/DOCP profile to see if that passes a test. This will tell you if the RAM is just dead or if it’s degraded a bit and can’t hit the same speeds as it did before. If it does pass, then re-enable that profile and try downclocking or loosening the timings a bit to see if that’ll work.

Failing that, you could try increasing the voltage slightly (like +0.05V, I wouldn’t go above 1.4V), but I’d be careful on this front so as not to cause any more damage.

Sucks that this happened right now, but IMO it’d be better to sacrifice a slight hit in performance than to buy RAM by itself at these premiums.

Beryl@jlai.lu · 4 months ago

This guy RAMs !

Kairos@lemmy.today · 4 months ago

RAM has Jabber stuff now? /s

shittydwarf@piefed.social · 4 months ago

The universe: Fuck this guy in particular

Willem@kutsuya.dev · 4 months ago

A lot of RAM is under lifetime warranty, so check the manufacturer’s site (usually a serial lookup is enough).

tal@lemmy.today · edit-2 4 months ago

Do they run stably if you downclock the memory in your BIOS? I’d at least try that first if replacing them is going to be a major problem.

Wispy2891@lemmy.world · 4 months ago

No, even tried to run them at 1866…

tal@lemmy.today · edit-2 4 months ago

Ah, fair enough. Long shot, but thought I’d at least mention it on the off chance that maybe it would work and maybe you hadn’t yet tried it. Sorry.

tries to think of anything else that could be done

Are you using Linux? Linux has a patch that was added many years back with the ability to map around damaged regions in memory. I mean, if your memory is completely hosed and you can’t even boot the kernel, then that won’t work, but if you can identify specific areas that fail, you can hand that off to the kernel and it can just avoid them. Obviously decreases usable memory by a certain amount, but…shrugs

I’ve never needed to do it myself, but let me go see if I can find some information. Think it was the “badram” feature.

searches

Okay. You’re running memtest86. It looks like that has the ability to generate the string you need, and you hand that off to GRUB, which hands it off to the kernel.

https://www.memtest86.com/blacklist-ram-badram-badmemorylist.html

MemTest86 Pro (v9 or later) supports automatic generation of BadRAM string patterns from detected errors in the HTML report, that can be used directly in the GRUB2 configuration without needing to manually calculate address/mask values by hand.

To enter the address ranges to blacklist manually, do the following:
1. Edit /etc/default/grub and add the following line:
   GRUB_BADRAM=addr,mask[,addr,mask...]
   where the list of addr,mask pairs specifies the memory ranges to block using address bit matching.
   E.g. GRUB_BADRAM=0x7ddf0000,0xffffc000 will exclude the memory range 0x7DDF0000-0x7DDF4000.
2. Open a terminal and run the following command:
   sudo update-grub
3. Reboot the system.
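
For anyone wondering what an addr,mask pair actually matches: as far as I understand the BadRAM convention, an address counts as bad when its bits agree with addr in every position where mask has a 1. Here is a minimal Python sketch of that rule. The function names are made up for illustration, and the range helper assumes the mask is a contiguous run of high 1 bits, as in the example above.

```python
def is_excluded(address: int, addr: int, mask: int) -> bool:
    """True if `address` matches the addr,mask pair under bit matching."""
    return (address & mask) == (addr & mask)


def excluded_range(addr: int, mask: int) -> tuple[int, int]:
    """Return (start, end_exclusive), assuming a contiguous high-bit mask."""
    size = mask & -mask   # lowest set bit of the mask = size of the block
    start = addr & mask   # drop the "don't care" low bits
    return start, start + size


start, end = excluded_range(0x7DDF0000, 0xFFFFC000)
print(hex(start), hex(end))  # 0x7ddf0000 0x7ddf4000
```

So the example pair blocks a 16 KiB region, with the upper bound 0x7DDF4000 being exclusive.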

If you can’t even boot the system sufficiently to get update-grub to run, then you might need to do a fancier dance (swap drive to another machine or something), but that’s probably a good first thing to try. I’d try booting to “rescue mode” or whatever if your distro has an option like that in GRUB, something that doesn’t start the graphical environment, as it’ll touch less memory.

EDIT: If your distro doesn’t have something like that “rescue mode” set up (all the distros I’ve used do, but that doesn’t mean that all of them do), or if it can’t even bring “rescue mode” up because your memory is too hosed for that, then you probably want to do something like hit “edit kernel parameters” in GRUB and boot while adding “init=/bin/bash” to the end of the kernel command line. That’ll start your system up in a mode where virtually nothing is running: no systemd or other init system, no graphics, no virtual consoles, no anything. Bash running on bare metal Linux kernel. Control-C won’t work because your terminal won’t be in cooked mode, everything will be very super-duper minimal…but you should be able to bring up bash. From there, you’ll want to manually bring your root filesystem, which the kernel will have mounted read-only, as it does during boot, up to read-write, with:

# mount / -o remount,rw

Once that’s done, do your editing of the grub config file in vi or whatever, run the update-grub command.

Then run:

# sync

Because you don’t have an init system running and it’s not gonna flush the disk on shutdown and your normal power-down commands aren’t gonna work because you have no init system to talk to.

Go ahead and manually reboot the system by killing its power, and hopefully that’ll let it boot up with badram mapping around your damaged region of memory.

EDIT2: It occurs to me that someone could make a utility that can run entirely in Linux to do memory testing to the extent possible inside Linux using something like memtester instead of memtest86, generate the badram string and then write it out for GRUB. That’s less bulletproof than memtest86 because memtester can’t touch every bit of memory, but it’s also easier for a user to do than the above stuff, and if you additionally added it to the install media for a distro, it’d make it easier to run Linux on broken hardware without a whole lot of technical knowledge. I guess it’d be pretty niche, though — doubt that there are a lot of systems with damaged memory floating around.

EDIT3: Oh, that’s only the commercial version of memtest86 that will auto-generate the string. Well, if you know how to do a bitmask and you can get a list of affected addresses from memtest86, then you can probably just do it manually. If not, post the list of addresses here and someone can probably do a base address and bitmask that covers the addresses in question for you. Stick the memory back into your computer first, though, since the order of the DIMMs is gonna affect the addresses.
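
If you want to do the bitmask by hand, a rough Python sketch of the idea follows: find every bit position where the failing addresses disagree, then mask off everything from the highest such bit down. The addresses below are made up, badram_pair is a hypothetical helper, and both the 4 KiB minimum (min_bits=12) and the mask width are assumptions on my part. I’m not certain how GRUB treats mask bits above the ones you write out, so double-check the width on your system.

```python
def badram_pair(addresses: list[int], min_bits: int = 12, width: int = 64) -> tuple[int, int]:
    """One addr,mask pair (contiguous mask) that covers every given address."""
    if not addresses:
        raise ValueError("need at least one failing address")
    first = addresses[0]
    differing = 0
    for a in addresses:
        differing |= a ^ first            # collect bits where addresses disagree
    low_bits = max(differing.bit_length(), min_bits)
    mask = ((1 << width) - 1) & ~((1 << low_bits) - 1)
    return first & mask, mask


failing = [0x7DDF0010, 0x7DDF2A48, 0x7DDF3FF8]  # made-up example addresses
addr, mask = badram_pair(failing, width=32)
print(f"GRUB_BADRAM={addr:#x},{mask:#x}")  # GRUB_BADRAM=0x7ddf0000,0xffffc000
```

One pair only works well when the errors are clustered. If the failing addresses are far apart, a single mask ends up blocking a huge region, and it is probably better to group them and emit several addr,mask pairs.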

Wispy2891@lemmy.world · 4 months ago

wow i’m running linux, so it might be perfect

though i’m a bit scared that it will get worse over time. Today i got a freeze that forced me to test the ram with memtest86, but since september i got some random corruption in the btrfs filesystem (luckily always “useless” files like flatpak or docker stuff that i could delete and download again in seconds) and i assumed it was a btrfs bug, not hardware problem

COASTER1921@lemmy.ml · edit-2 4 months ago

If I were in this position I’d strongly consider using 16GB for the next year or two. Especially with an NVME SSD, good swap performance makes the impact of running out of memory much smaller than it used to be.

It’s very strange both sticks failed at the same time, have you tried them in another motherboard?

chellomere@lemmy.world · 4 months ago

You can even make linux run an automatic memtest on boot and reserve the bad areas it finds. This is with the memtest=N kernel parameter, where N is the number of passes. memtest=17 tests all patterns. With this, the kernel will run an automatic test on every boot.

justlemmyin@lemmy.world · edit-2 4 months ago

I had to do this on my busted ddr4 2 weeks ago. Badram didn’t work, but memmap did. I had to do bit flipping to get the translation from BADRAM as explained here.
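
For reference, the translation is roughly the following, going by the kernel’s documented memmap=nn[KMG]$ss[KMG] form (reserve nn bytes starting at ss). This is a sketch in Python, assuming the BADRAM mask is a contiguous run of high 1 bits; badram_to_memmap is a hypothetical helper name, and I believe the kernel accepts hex for both values, but treat that as an assumption.

```python
def badram_to_memmap(addr: int, mask: int) -> str:
    """Turn one BADRAM addr,mask pair into a memmap=size$start parameter."""
    size = mask & -mask                       # lowest set bit = region size
    if size == 0 or (mask + size) & (mask + size - 1):
        raise ValueError("mask is not a contiguous run of high 1 bits")
    start = addr & mask
    return f"memmap={size:#x}${start:#x}"


print(badram_to_memmap(0x7DDF0000, 0xFFFFC000))  # memmap=0x4000$0x7ddf0000
```

From what I’ve read, the $ usually has to be escaped when the parameter goes into /etc/default/grub, otherwise it gets eaten before it reaches the kernel, so check /proc/cmdline after rebooting.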

I think the latest memtest86+ has the option to report in memmap format. But you will need to take a photo of the screen, coz it’s FOSS and not as fancy as PassMark’s memtest.

Edit: Adding badram to grub broke grub for me, and I had to undo the grub config using a live boot rescue thingamajig. Then I went hunting for why.

[object Object]@lemmy.world · 4 months ago

To add to what the above commenter said: afaik Grub allows specifying kernel parameters at boot by pressing some hotkey. You could type in the string from memtest86 if you find what the parameter should be called (or add the memtest parameter instead).

MigratingApe@lemmy.dbzer0.com · 4 months ago

doubt that there are a lot of systems with damaged memory floating around.

Let’s say that you would be surprised if we actually started checking this. I will not disclose my occupation, but there are thousands of pieces of critical telco infrastructure equipment that not only run non-ECC RAM because of cost cutting, but run with actually broken DRAM modules, regularly rebooting at least a few times a day and causing local outages…

Back to the topic at hand - doesn’t it seem strange that only CPU4 finds issues in memtest86? It could be a CPU or even motherboard that got damaged and not the DRAM itself, no?

tal@lemmy.today · 4 months ago

Back to the topic at hand - doesn’t it seem strange that only CPU4 finds issues in memtest86? It could be a CPU or even motherboard that got damaged and not the DRAM itself, no?

I noticed that, but OP said that he ran the thing in three different systems, so I’m assuming that he’s seen the same problems with multiple CPUs. It may be — I don’t know — that memtest86 doesn’t, at least as he’s running it, necessarily try to hit each byte of memory with each CPU, or at least that the order it does so doesn’t have errors from other CPUs visible.

I also wondered if it might be a 13th or 14th gen Intel CPU, the ones that destroyed themselves over time. But (a) it’s a mobile CPU, and only the desktop CPUs had the problem there, and (b) it’s 11th gen.

AlecSadler@lemmy.dbzer0.com · 4 months ago

I have two sticks of RAM worth $850 now that went bad but I was able to successfully RMA them - can you do that?

nialv7@lemmy.world · 4 months ago

On Linux you can mask out bad memory ranges. Don’t know about Windows.

rollerbang@lemmy.world · 4 months ago

That’s neat, I’ll definitely have a look into the topic.

Adulated_Aspersion@lemmy.world · 4 months ago

At least it’s DDR4?

I’m grasping.

diabetic_porcupine@lemmy.world · 4 months ago

Bro I’ve had my ram for yeaaaars. Anytime my computer glitches I just think “yep it’s time” and my wallet sheds a tear

brucethemoose@lemmy.world · edit-2 4 months ago

Does the BIOS support any overclocking/tweaking?

I’m not familiar with Rocket Lake (your CPU generation), but you may be able to bump the voltage or loosen the timings a bit to get it stable. Even without BIOS support, it’s possible you could do this from your operating system, like you can with Ryzen.

db0@lemmy.dbzer0.com · 4 months ago

Nightmare scenario. My condolences

4 months ago

RIP OP’s Kidney

Devolution@lemmy.world · 4 months ago

Might be cheaper to buy a pre built laptop at this point…

ɔiƚoxɘup@infosec.pub · 4 months ago

Could this possibly be caused by a bad connection of the ram contacts?

I’m grasping for ya.

If not… F

Glifted@lemmy.world · 4 months ago

Sounds dumb but check craigslist

3laws@lemmy.world · 4 months ago

I’m eying FB Marketplace lately, not all users are aware of the RAM situation.