Wobbly computer

From: Chris (CHRISSS) 19 Oct 2011 20:39
To: Ixion 31 of 64

I tried that and it got to 40% then said "Windows Resource Protection could not perform the requested operation."

 

The only problem I seem to be having now is with Windows Live Mail. I might just try reinstalling that to see if it fixes it. The turbo memory setting seems to be working fine now too, the computer's been on since yesterday torrenting with no problems :D Hopefully everything's working properly <crosses fingers>

From: Chris (CHRISSS) 21 Oct 2011 21:41
To: ALL 32 of 64

Problems once again. All was fine til I started to convert my CR2s to JPGs to use with the screensaver on my HTPC. I set Irfanview to some batch conversion and things started going wonky again.

 

Took a while the first time but soon after starting the conversion on subsequent attempts the computer BSODed pretty quickly. I had a PFN_LIST_CORRUPT and something_MEMORY_CORRUPT. I've now taken one of the DIMMs out to see if things are stable with just the one. If it is I'll swap over the other one and hope it crashes so I can be reasonably certain it's the RAM.

From: Chris (CHRISSS) 29 Oct 2011 09:40
To: ALL 33 of 64

I did eventually get around to taking one of my DIMMs out and the computer was running perfectly with just the 2GB, even if it seemed ridiculously slow after using it with 4GB. It had been running for days with no crashing so I swapped the two over last night.

 

Everything was fine, watched some stuff from it on the HTPC, played Trackmania for a bit, got worried that nothing was crashing. Just tried to open Computer Management and it BSODed straight away so I'm sure it's a faulty DIMM.

From: ComtronBob 30 Oct 2011 08:36
To: Chris (CHRISSS) 34 of 64

"Just tried to open Computer Management and it BSODed straight away so I'm sure it's a faulty DIMM".

Don't be so sure.  Opening Computer Management doesn't invoke anything special.  Without some type of trap-and-trace diagnostic running it doesn't really tell you anything.  It could simply have been that one additional thread or process overflowed some register or pushed the CPU to a slightly higher temperature, where the real instability is hiding.

I realize I'm somewhat late to the party.  (I've only recently registered to Teh Forum).  But let me make a few suggestions/observations that may come in handy.

First, it helps to know the EXACT make and model of your MoBo.  Does your user profile still accurately list the hardware in question, such as a MoBo in the Asus P5B series?  Is that just a "P5B" (no suffix), or is it something more like "P5BP-E_4L", "P5B Deluxe", "P5B Premium", or similar suffix?

Knowing the full model, socket type, and any revision helps nail down problems that are sometimes unique to that MoBo subset.  (Note that Asus classes MoBos by socket type).  For instance, most (but not all) P5B-series MoBos are socket-775 (LGA775).

Knowing the EXACT CPU model also helps.

As does knowing the graphics card and/or any on-MoBo graphics chipset, which you presently list as the ATI X1950XT.

If you're not sure what you have under the hood, the free CPU-Z utility can answer many questions about both the CPU and memory.  (See the SPD tab, which should also tell you the proper, non-overclocked memory voltage).

The free GPU-Z will give you similar info about your graphics card.

With regard to memory, the first thing I always do is clean the edge-connector fingers with 91% Isopropyl alcohol on a lintless cloth.  This clears up most problems for memory sticks that had been working, but seem to develop problems at some later time.  (For that matter, I would similarly clean all card edge connectors, especially the graphics card, before proceeding).

As to Memtest86, while I like it in general, it doesn't always catch some odd problems.  Quoting myself from this PCmag forum post:

Further, off-brand memory module/stick vendors sometimes use a trick, that while yielding a marginally functional product, will often fail to work whatsoever in a high-performance motherboard: They will buy large surplus lots of specialty RAM chips, not originally intended for use in PCs, and use a cheap FPGA to re-map and/or otherwise emulate a standard RAM product.  Unfortunately, this can skew access timing in a way that is not easily detected.  Interestingly, the old MS memory diagnostic will usually flag this type of module as defective, even when other diagnostics won't.  See this old post of mine for a few details, and the download and users' guide links. 

So you may want to get a second opinion from the MS diagnostic.

This may seem obvious, but have you tried substituting a different power supply?  Or have you measured the various supply voltages, right at the MoBo connectors, with a VOM?

Continued next post...

EDITED: 30 Oct 2011 08:54 by COMTRONBOB
From: ComtronBob 30 Oct 2011 08:41
To: Chris (CHRISSS) 35 of 64

As "View full message" doesn't appear to be working, post continued...

As supplies age, and the hold-up capacitors become weak, certain critical supply rails may sag under load.  GPUs have a considerable demand for surge current just as soon as they are called upon for any complex rendering.
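If you do take meter readings at the connectors, a quick way to judge them is to compare each rail against a tolerance band. The sketch below is only an illustration: it assumes the +/-5% band that the ATX12V design guide gives for the positive rails, and the readings in it are invented, not measurements from this machine. Readings taken under load are the ones that matter, since that is when a tired supply sags.

```python
# Tolerance per positive rail. ASSUMPTION: +/-5%, as in the ATX12V design guide.
RAIL_TOLERANCE = {12.0: 0.05, 5.0: 0.05, 3.3: 0.05}


def check_rail(nominal, measured):
    """Return True if the measured voltage sits inside the tolerance band."""
    tol = RAIL_TOLERANCE[nominal]
    low = nominal * (1 - tol)
    high = nominal * (1 + tol)
    return low <= measured <= high


# Hypothetical meter readings, made up for the example.
readings = {12.0: 11.3, 5.0: 5.08, 3.3: 3.31}

for nominal, measured in readings.items():
    status = "OK" if check_rail(nominal, measured) else "OUT OF SPEC"
    print(f"+{nominal}V rail: measured {measured}V -> {status}")
```

With these invented numbers the +12V rail would be flagged, because 11.3V is below the 11.4V lower limit.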

As to BSODs specifically (revised in Win8 to the FOD — Frown Of Death — see below graphic), while hardware can certainly be at fault, better than 85% of the time it turns out to be a driver issue.  Rather than go through much additional trial-and-error, I'd personally prefer to test, not guess.  To that end, a free diagnostic, called WhoCrashed, may be able to answer the question as to the precise culprit.  Note that on first use WhoCrashed will download either the 32 or 64 bit Debugging Tools for Windows (WinDbg) Package from MS, which it uses to collect and extract data for analysis.
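Tools of this kind work from the crash dump files Windows writes after a blue screen, so it can be worth checking that dumps are actually being saved. A minimal sketch, assuming the default small-dump folder (the Minidump folder under the Windows directory) and that you have permission to read it; the function name is just something made up for this example.

```python
import os
import time


def list_minidumps(folder):
    """Return (filename, modified-time string) pairs, newest first."""
    if not os.path.isdir(folder):
        return []
    dumps = []
    for name in os.listdir(folder):
        if name.lower().endswith(".dmp"):
            mtime = os.path.getmtime(os.path.join(folder, name))
            dumps.append((mtime, name))
    dumps.sort(reverse=True)
    return [
        (name, time.strftime("%d %b %Y %H:%M", time.localtime(mtime)))
        for mtime, name in dumps
    ]


# ASSUMPTION: default dump location; adjust if yours has been changed.
folder = os.path.join(os.environ.get("SystemRoot", r"C:\Windows"), "Minidump")
for name, when in list_minidumps(folder):
    print(when, name)
```

If nothing is listed, either no dumps were written or the folder is somewhere else, in which case a dump analyser will have nothing to go on.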

I assume you already looked into Win32K.sys, and got nowhere?  Sometimes the driver indicated on the BSOD is just where the process hung and not the root cause.  If nothing else, take a look at the STOP message troubleshooting list.

Two of the more common drivers known to be problematic, often causing a blue screen, are TCPIP.sys and/or Intel's netw5v32.sys, which is part of many WiFi driver packages.

There is also the possibility of HAL errors.  The HAL sometimes needs rebuilding after replacing certain critical hardware, like the MoBo, CPU, GPU/graphics card, or upgrading certain drivers, like those for the GPU, network card/on-board network chipset, or the MoBo itself.  Rebuilding the HAL requires, at minimum, a repair reinstall, sometimes called a "refresh" reinstall, which should retain most of your settings.

Two questions:

1. Does the machine behave in Safe Mode?  Safe Mode loads only the compatibility drivers, which doesn't run the video in a stressful fashion.  So that can significantly narrow things down.

2. Have you tried running with a "live" Linux CD?  If it behaves running a live Linux distro, that pretty much rules out hardware.  (You might give either Knoppix, or the KDE-based BackTrack a try).
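The reasoning behind those two questions can be written out as a small lookup. This is a rough sketch only: the wording of each outcome is an assumption, and the "unstable under Linux points at hardware" branch is an inference added here rather than something stated above. Treat the result as a hint about where to look next, not a verdict.

```python
def narrow_down(stable_in_safe_mode, stable_on_live_linux):
    """Map the two test results to a rough suspect list (heuristic only)."""
    if not stable_on_live_linux:
        # ASSUMPTION: crashes under a different OS suggest the hardware itself.
        return "hardware still suspect (RAM, PSU, heat)"
    if stable_in_safe_mode:
        return "hardware mostly ruled out; suspect a driver or startup item"
    return "hardware mostly ruled out; suspect the Windows install itself"


for safe in (True, False):
    for linux in (True, False):
        print(f"Safe Mode stable={safe}, live Linux stable={linux}: "
              f"{narrow_down(safe, linux)}")
```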


MS "feels your pain" with new Win8 Frown Of Death.

...And so should Nvidia (feel our pain ;-)

One of the more typical hardware-related problems that can cause a blue screen error is overheating of the CPU, GPU, memory sticks, and sometimes even the Northbridge.  For this reason, do you have some type of temp' monitor installed?  Something like the free HWMonitor?  (It's from the same folks as CPU-Z).  Click on the "Version History" link for the available downloads.

If you had an Nvidia GPU, I'd point you to this Inquirer article, detailing the chip substrate overheating problem they had, as things typically looked just like your video, prior to crashing altogether.  There are still a lot of those bad GPUs floating around.

I think that about covers it for the moment. <grin>

EDITED: 30 Oct 2011 08:51 by COMTRONBOB
From: Chris (CHRISSS) 30 Oct 2011 10:47
To: ComtronBob 36 of 64

That's a very thorough and helpful post, thanks. The information in my profile hadn't been updated but is now.

 

It's a (mostly) new system with a fresh install of 7 which started crashing soon after it was built. I've been running the computer with only one of the 2GB RAM modules and it's perfectly stable. As soon as I swapped them over it started crashing again. I had 2 BSODs within 5 minutes (one being the memory_corrupt one), swapped them back and stable again.

 

I assume it's the Corsair Value Select RAM I accidentally bought instead of their XMS ones so when I send it back I'll pay the extra £2 for better stuff.

From: ComtronBob 30 Oct 2011 12:17
To: Chris (CHRISSS) 37 of 64

"That's a very thorough and helpful post, thanks".

Glad you found it useful. :-)

"The information in my profile hadn't been updated but is now".

Indeed, I see you've made a considerable step up, going to the Gigabyte GA-Z68XP-UD3 and AMD 6870.

I've not kept up on the details of AMD cards, as you can probably tell. <g>  And I'm still not sufficiently awake to plow through the comparison tables.  But, presuming they exist and are accessible, you may be able to take advantage of unlocking any extra shaders, similar to the way it can be done for the AMD 6950 cards.

This blurb on the 10.12a Hotfix Drivers for the 6950 and 6970 cards may, or may not, also apply to you.  But you should probably have a look-see, just in case.

"As soon as I swapped them over it started crashing again.  I had 2 BSODs within 5 minutes (one being the memory_corrupt one), swapped them back and stable again".

Ah!  That wasn't quite clear from your post #33.

"...when I send it back I'll pay the extra £2 for better stuff".

Sounds like a plan.

Good luck!

From: Chris (CHRISSS) 30 Oct 2011 12:54
To: ComtronBob 38 of 64

Yes indeed, it's a big difference from the old system which wasn't too high spec when it was bought 5 years ago. Having 4GB of RAM certainly makes everything seem smoother, but running with 2GB at the mo isn't nice.

 

The RAM passed all Memtest86+ tests running for 12 hours or so but as soon as Windows was booted things started to become quite unstable. Good thing I have two modules so I can test each separately.

 

Seems the 6870 can only be overclocked, no unlocking of anything to give it special powers. I probably should try doing something with the CPU as that's the main extra the K processors give.

From: ComtronBob 30 Oct 2011 23:35
To: Chris (CHRISSS) 39 of 64

"The RAM passed all Memtest86+ tests running for 12 hours or so but as soon as Windows was booted things started to become quite unstable".

Even though you're likely not saddled with one or more of those "trick" FPGA'd modules, if you're so inclined, you may still care to give the old MS WinDiag memory diagnostic a try.  In spite of it not being as fancy as Memtest86+, it often flags modules as bad that sail right past Memtest86+ as being A-OK.

If you do decide to play around with WinDiag, note that if you hit the *P* (for pause), a new option will appear at the top of the screen: The *M* option (for menu).  You can then hit M -> 2 (advanced) -> 1 (change cache settings) -> 2 (turn off caching for all tests).  This allows you to isolate whether there are any CPU cache related issues, which are usually non-obvious.

There's also a "change the test suite" option that will let you perform some additional/more rigorous tests.

One other thing I like to do while running memory diagnostics is take a hair dryer and deliberately heat up the DIMMs under test.  (Though you have to be careful to not overheat them to excess).  As they heat up, the CL-timings tend to shift, sometimes to an out-of-bounds value.  This will usually show up on-screen for several test loops before you crash. <grin>  This approach is far more likely to find a marginal DIMM than running a diagnostic by itself for days at a time.

Have fun!

From: Chris (CHRISSS) 2 Nov 2011 10:44
To: ALL 40 of 64

I sent Scan a message on Sunday which they finally replied to yesterday. They asked if I'd run memtest and said they won't change the RAM for something else, just replace it. Tried memtest again last night with just the dodgy RAM in and it said there was a problem straight away.

 

Waiting to hear back from Scan from yesterday now.

From: Chris (CHRISSS) 2 Nov 2011 11:16
To: Chris (CHRISSS) 41 of 64
So that message did post. The buttons in beehive seemed to stop doing anything in the android browser.
From: Chris (CHRISSS) 6 Nov 2011 09:21
To: ALL 42 of 64

Oh FFS! I put the other RAM in the computer the night before last because I got fed up with how slow it was running with only 2GB (does Windows do anything special when using 4GB that would make it run like shit if 2GB is taken away? It's running so slow and thrashing the HDD constantly, the old computer wasn't like this with 2GB) but it started crashing straight away so turned it off on the PSU.

 

Tried last night to turn it back on, lights came on on the card reader but the power button did nothing. Turned it on and off but just couldn't get it to boot. Tried this morning and as soon as I turned the PSU on there was a lovely fop sound. I have a feeling it might have died :(

From: ComtronBob 6 Nov 2011 12:54
To: Chris (CHRISSS) 43 of 64

Rather than re-explain the basic diagnostic procedure for a box that appears dead, see this post of mine on the PCmag board.  Let us know what you find.

From: Chris (CHRISSS) 6 Nov 2011 17:25
To: ComtronBob 44 of 64

I'll have a look through that later when I'm at home. Fairly sure the PSU has gone to heaven or maybe just a blown fuse. The PSU has a light on the power switch and that didn't come on after the pop something made.

 

I did have to replace the first PSU from Hiper which died after about two years and it looks like the replacement has lasted about the same time. I'll check it when I go home later.

 

I've been looking at PSUs and it seems I have to go above 500W to get two PCI-E connectors and a P8 connector so I'm thinking either of these:

 

-Thermaltake 575W Toughpower XT Modular PSU
-OCZ Fatal1ty Series 550W Modular PSU

 

The first is a little more expensive but mentions solid state capacitors and has a 5 year warranty instead of 3.

From: CHYRON (DSMITHHFX) 6 Nov 2011 23:35
To: Chris (CHRISSS) 45 of 64
FWIW, I thought my pc was dead about a year ago. I had it overloaded with too many hdd plugged in, either overheating or overloading the PSU. Similar symptoms as you describe (though probably a different cause), until I unplugged it from the wall and plugged it back in. Then it booted right up (the problem returned later, after which I decided to go with just one, big drive).
From: Chris (CHRISSS) 7 Nov 2011 09:18
To: CHYRON (DSMITHHFX) 46 of 64
Could be something like that, I'll have a check if I get the PSU working again. It's definitely not working, not had a chance to look at it yet, could just be the fuseseses. Won't matter if I open it up to have a look now it's out of warranty.
From: ComtronBob 7 Nov 2011 10:16
To: Chris (CHRISSS) 47 of 64

Because I was in a rush to make my previous post I skipped over your question, so I want to back up a little.

"...does Windows do anything special when using 4GB that would make it run like shit if 2GB is taken away?  It's running so slow and thrashing the HDD constantly".

Not particularly.  Obviously, it depends on what's loading at startup.  If what's loaded consumes more than 2GB, the overflow is pushed out to the swapfile/pagefile on the HDD, which would account for all the drive thrashing.

To get a cursory view of what's running (and how much memory is consumed by each item) you can use the Task Manager and/or Resource Monitor.  For how to interpret various indicators see Measuring memory usage in Windows 7.  For a more granular view you can download the free Process Explorer from MS's SysInternals division.  It's like Task Manager on steroids, and will let you identify hidden processes nested by either SvcHost.exe or RunDll32.exe.
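If you'd rather have a sortable list than eyeball Task Manager, the built-in tasklist command can dump the same sort of figures as CSV (run `tasklist /FO CSV > procs.csv` on the Windows box). Here is a minimal sketch of sorting that output by memory use. It assumes the English column headings and a "12,345 K" style memory column, both of which vary with locale, and the sample rows are invented for illustration.

```python
import csv
import io


def parse_tasklist_csv(text):
    """Return (image name, PID, memory in KB) tuples, biggest user first."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        # Keep only the digits of e.g. "12,345 K" (ASSUMED format).
        digits = "".join(ch for ch in row["Mem Usage"] if ch.isdigit())
        rows.append((row["Image Name"], int(row["PID"]), int(digits or 0)))
    return sorted(rows, key=lambda r: r[2], reverse=True)


# Invented sample in the layout `tasklist /FO CSV` produces.
SAMPLE = '''"Image Name","PID","Session Name","Session#","Mem Usage"
"svchost.exe","812","Services","0","12,345 K"
"wlmail.exe","3020","Console","1","98,120 K"
"explorer.exe","1500","Console","1","45,002 K"
'''

for name, pid, kb in parse_tasklist_csv(SAMPLE):
    print(f"{name:<15} {pid:>6} {kb / 1024:8.1f} MB")
```

To use it on real data, read procs.csv into a string and pass that to parse_tasklist_csv instead of SAMPLE. Adding up the totals gives a rough idea of whether the startup load alone is already near 2GB.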

You can use the MSconfig utility to prevent unneeded/unused programs from loading at startup.  (AutoRuns is the free steroids version of MSconfig).  You can do the same for Windows Services; have a look at the Black Viper's site.  Be sure to see his Win7 SP1 Service Configurations page.  By disabling unneeded/unused services and programs you can significantly shave your memory footprint.

You may also care to see RAM, Virtual Memory, Pagefile and all that stuff, which should give you a better idea of their relationship to each other.  Various versions had been previously published by MS as KB2267427 and KB555223, both of which have been removed.


Moving on to your immediate problem, unless you know for sure that the "lovely fop sound" you heard came from the PSU, I would be cautious.  If you can, try substituting another known-good supply before buying a new one.  You want to make sure the supply you had been using didn't take anything else with it for the ride over the cliff.

I'm assuming you've run your requirements by a few of those PSU Calculators?  (Thermaltake has their own calculator).

The PSU Stickies at the Overclockers forum are also a good resource.

The 575 Watt Thermaltake Toughpower XT (P/N TPX-575M) has good specs all around.  It's a similar design to the Corsair TX650, so you may want to look at a few PSUs from Corsair.

I also took a look at the 550 Watt OCZ Fatal1ty Series.

Personally, I would go with the Thermaltake, and not only for the better warranty (which is a good measure of the vendor's confidence level and expected MTTF/MTBF) and the solid dielectric capacitors.  According to Newegg, it's also EPS12V (v2.91) compliant.  Because EPS12V supplies are often employed for high-reliability server use, they use a failsafe design that minimizes the likelihood of follow-on (domino effect) damage to anything connected to them.

From my perspective, unless it's a throwaway box, the PSU is not the place to pinch pennies.

Something else to consider is what I call the "dust bunny margin".  What many used to do is deliberately oversize the supply to obtain additional thermal margin before having to clean out the dust bunnies.  Unfortunately, with many new, high-efficiency models (compared to older, lower wattage units) there is often no increase in heatsink size or fan CFM when going to a higher wattage model.  The result being there's no increase in dust bunny margin before overheating occurs.  So you wind up having to clean it out with greater frequency.

It's worth noting that excess heat can significantly reduce the life of most passive components.  For instance, electrolytic capacitors are typically rated for use at a nominal temperature of 25°C.  For every 10°C above that temperature the rated MTTF (i.e., expected life) will go down by half.  So if a part is rated for 50K hours at 25°C, it's only going to get 25K hours at 35°C.  Just something to keep in mind when premature failures are encountered.
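The halving rule above is easy to put into numbers. A minimal sketch using the figures from the paragraph (50K hours rated at 25°C); the function name is made up for the example, and the rule itself is only a rule of thumb, so treat the output as ballpark.

```python
def expected_life_hours(rated_hours, rated_temp_c, operating_temp_c):
    """Rule of thumb: expected life halves for every 10 C above the rating."""
    # Only temperatures above the rating are considered, as in the text.
    excess = max(0.0, operating_temp_c - rated_temp_c)
    return rated_hours * 0.5 ** (excess / 10.0)


for temp in (25, 35, 45, 55):
    hours = expected_life_hours(50_000, 25, temp)
    print(f"{temp} C: about {hours:,.0f} hours")
```

So by this rule a part running 30°C over its rating is down to an eighth of its rated life, which is why a dust-clogged supply can die years early.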

From: Chris (CHRISSS) 7 Nov 2011 12:29
To: ComtronBob 48 of 64

Unfortunately I don't have a spare PSU with a P8 or any PCI-E connectors. I've had two previous PSUs blow up, one took out the motherboard with it and the other caused no collateral damage. The PSU calculator suggests 412W with some capacitor aging so shouldn't be overloading it.
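For a rough idea of how much headroom that leaves, here is the 412W estimate set against the two supplies being considered. It is simple arithmetic on the rated wattages only; it says nothing about how the load is split across the individual rails.

```python
def load_fraction(estimated_watts, psu_rated_watts):
    """Fraction of the PSU's rated output the estimated load would use."""
    return estimated_watts / psu_rated_watts


ESTIMATE_W = 412  # PSU calculator figure quoted above

for name, rated in (("Thermaltake Toughpower XT", 575), ("OCZ Fatal1ty", 550)):
    frac = load_fraction(ESTIMATE_W, rated)
    print(f"{name} {rated}W: {frac:.0%} of rating, {rated - ESTIMATE_W}W spare")
```

That works out to roughly 72% of rating on the 575W unit and 75% on the 550W one.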

 

Actually I had trouble starting the old computer recently with this PSU so wouldn't be surprised if it had died..

 

I do usually do a bit of optimisation to reduce memory usage as much as I can which I didn't do with the 4GB installed as it was running super smooth anyway so could just be it's using more memory than usual and using the swap file lots.

From: Chris (CHRISSS) 7 Nov 2011 21:41
To: ALL 49 of 64

I've taken the PSU out and opened it up (without electrocuting myself this time (although that was about 10 years ago)) and it looks like the PSU suffered the same fate as the old motherboard, leaky capacitors. Probably why it was playing up now and again. Not sure what went pop though, nothing obvious.

 

So need to send the RAM back (should Scan pay the postage?) and order a new PSU and hopefully I will have a stable computer at last.

From: JonCooper 7 Nov 2011 21:48
To: Chris (CHRISSS) 50 of 64
(should Scan pay the postage?)


was it their fuck-up or yours?