Archive for February, 2012

What hardware to choose when building a GPU based password cracker right now (Q1 2012)?

Posted in Notes to myself, Password cracking on 2012/02/06 by mram

GPU based password cracking has unmet power when brute force cracking. Although brute force cracking is only part of the game (see also my over a year old post on CPU based cracking not being dead here) any modern security testing lab includes GPU password cracking functionality.

The field of GPU hardware is heavily in development. What was top of the line 18 months ago is somewhat reasonable right now. As I’m the process of upgrading the GPU hardware in our security testing lab myself, I just researched several possibilities with the current state of GPU hardware taken into account. This may be different in a few months, but for now (Q1 2012) these are the best picks I could find. And I thought to share them with you.

I narrowed it down to four different options, ranging from a few hundred to 13.000 Euro.

Common decisions for all possible options

Before diving into the different options, let’s discuss a few main decisions that are the same for any way you go.

Power is not really an issue when you can combine power supplies

GPU cards consume a lot of power. Having several GPU cards in your box requires a massive PSU. We are talking 1200+ Watt here when having a few modern cards. High Watt PSUs are expensive, especially when you want  ’80 PLUS’ certified – you want these as these are guaranteed to require only 20% of extra Watt from the power outlet to reach the advertised amount of Watt, these extra 20% are transformed into heat, the byproduct of any PSU. But as you do consume a lot of power you do need a big – and therefore expensive – PSU. Fortunately there are easy solutions to combine several mid range PSUs into the PSU of your requirements. ADD2PSU allows you to daisy-chain even more than two PSUs into one. Lian-Li Dual Power Supply Adapter (availability is hard, not sure if still shipped) allows you to combine two PSUs into one. Both simple solutions for our problem. Of course you can do this yourself with soldering cables. But with these solutions and prices (Eur 20) I wouldn’t start tampering with electrical power myself.

When picking PSUs make sure to take PSU that allow for enough connectors. Preferably a PSU like the Corsair AX1200 that allows for connecting the cords yourself.

CPU, chipset and main memory don’t really make a difference

It is all about the GPU cards. Unless you want to do more on the box you are creating I wouldn’t spent too much Euros on top of the line CPUs, chipsets and memory. Any Intel socket 1366 or even socket 1155 is good enough. If you want to go AMD, socket  AM3 or AM3+  is good enough. Of course you can go to the newest sockets  but it doesn’t provide you with more cracking power. Same goes for MHz, it will not provide you with more cracking power. Memory should be enough to run your OS of choice and some more. Don’t be on the cheap side, no computer runs OK with insufficient RAM, but I still need to find the first cracking program that requires gigabytes of memory, except for rainbowtable (in that case system ram does matter a bit but you should calculate your needs based on the size of tables you are using).

Be smart and don’t pick top of the line here on CPU, socket and main memory. It will save you a considerable amount of money that you can than spend on GPU cards.

One of the commentators (Bitweasil, author of the Cryptohaze Multiforcer crack tooling so definitely somebody who has experience with this) recommended to match system RAM with RAM from GPU. With system RAM begin very cheap nowadays and most GPU cards shipping with about a Gig of RAM, you would probably match it by using a ‘default’ amount of 4-8GB. He also recommends to match the amount of CPU cores with the amount of GPU cards, just in case GPU drivers are not optimized as they should. I guess this makes sense, but also shouldn’t be a problem with most CPU’s nowadays being multi core.

PCIe1x is fast enough

This is an important one when choosing your main board. Many boards are advertised with X amount of PCIe16x slots. But when you look closer in the specs you notice that the 16x speed is shared between slots. So when for example slot 1 and slot 3 are used simultaneously, they are downgraded to both PCIe8x or even lower. If you think “more is better” this really makes it hard to pick a main board with as many as possible PCI16x slots. I’ve got news for you, main boards with 8 slots of true PCIe16x are limited to non existing. But there is also is no need for. If you go gaming (still the largest market for creators of main board with many PCI slots) you want to go SLI with some PCIe16x. In that case the cards mostly communicate via the SLI bridge and not via the PCI bus. But we go password cracking, not gaming. And with password cracking PCIe1x is fast enough.

PCIe works with lanes. The amount of lanes is a factor of two between 1 and 32 and is represented by the number directly after the “PCIe”. PCIe16x means 16 lanes. 16x Seems to be top of the line on most boards. PCI version 2 (which is the most used version for GPU cards and main boards right now) has a speed of “500MB per second per lane”. Now, with games textures and vertices are continuously processed by the cards. These are heavy calculations on big sets of input data that together require significant throughput on the PCIe bus. But with password cracking we are talking simple operations on data. Transfer of ‘data’ as in the list of base input words that are to be hashed + ‘operation’ as the set of calculations to be performed by the cores to calculate the hash on the GPU are transferred over the bus only periodically. No way that the GPU can calculate hashes so fast it requires 500MB of data + operations every second. GPU’s are simply not powerful enough at this moments to achieve 500MB/s.

So, PCIe1x is speedy enough. Suddenly a lot more main boards become available :-)

Memory on GPU card is not a delimiting factor

This continues on the discussion that the throughput of the GPU cards isn’t that big for password cracking compared to gaming. Using a gigabyte of memory on the card is a ridiculous huge amount that no tool will use. Perhaps only when you are using ridiculously large dictionary files. But if you are using dictionaries that are approaching 1 gigabyte you might need to verify the usefulness of the dictionary. Brute force will be faster.

So, save yourself some money and don’t go for the GPU cards with a ridiculous amount of memory. It will not improve your cracking speed. And with most cards shipping nowadays with 1-1.5GB of RAM, my pick would be those, and not the extra expensive with 2GB.

PCI riser cards can come in handy

With also the PCIex1 slots being usable for cracking, the only thing you need to overcome to use all PCIe slots on a main board is the fact that most GPU cards require the physical space of two PCIe slots. Flexible PCI riser cards come in handy here.  If you can find a way to lift the cards and have a big enough box to fit all these double sized GPU cards, you can then interconnect them with the main board via (flexible) PCI riser cards. Many solutions exist. Note that in theory all you need is a PCIe1x connection (the shortest possible connector). Just make sure the card you buy allows for it without sawing holes in the PCI connector (and if you do want to saw in your PCIe equipment here is an excellent tutorial: http://blog.zorinaq.com/?e=42) .

AMD has the more powerful architecture

When buying GPU cards for password cracking you have two different vendors to choose from: NVIDIA and AMD. Which one to pick? Short answer: go AMD, the results are all over the place.

Long answer: go AMD because they have an architectural preference of more cores/ALUs, resulting in more parallel calculations. AMD has more cores at a bit lower speed, where NVIDIA goes less cores but higher speed. For gaming there is not much between them. But AMD’s solution comes in handy for the task of password cracking.  You can read up all kinds of things like AMD’s move from the VLIW to the CGN architecture, NVIDIA’s current FERMI architecture that the Geforce500 architecture is based on, the move to the 28nm process AMD already made and NVIDIA will do with the to be released Geforce600 architecture, but the bottom line is that AMD’s approach is faster for password cracking.

The battle isn’t won, both NVIDIA and AMD have the same goal: continue awesome graphics performance but also enlarge the use for General Purpose computing on GPU. So perhaps NVIDIA’s next move will change things, but for now go AMD.

Pick the AMD HD79XX series

AMD recently released the HD79xx series. My pick right now would be the HD7970 card. It’s performance is top of the line, and the pricing is not ridiculous (check stats at https://en.bitcoin.it/wiki/Mining_hardware_comparison). You can go one series below and go HD6970 or HD6990 (basically 2 HD6970s on one board). But only go that way if you find a nice discount.

In the next few weeks AMD will release the HD7990, which basically will be 2 HD7970s on one board. They did the same trick with the HD6990, and if something teaches us from that release is that availability will be very hard. If you buy one card that may not be a problem, but buy 4 of those at once and you may have an issue. Do note that AMD has an issue where no more than 8 cards are recognized by the system. So when going HD6990 or future HD7990 you can only hold 4 of them (as these cards are double GPUs on one card).  I’m sure NVIDIA has similar issues I just don’t know the exact limit at this moment (it used to be limited to four cards about two years ago).

Linux support for AMD sucks, expect issue or wait for new software versions

AMD has shown not to take Linux as seriously as Windows. The catalyst drivers for Linux are a mess, although they are getting better and better. NVIDIA has been in the same spot a few years back, and they have fixed it. AMD will also fix this, but it will take some time. Right now you can expect that the current release (12.1) has issue detecting the latest HD7970 card. Simply wait for a newer version of go Windows if you want to use this card.

So, with these main topics discussed, let’s dive into the four different options you have. Of course your budget is the main decider for what way you want to go. More budget pays for more power. I’ll start with the cheapest one.

Option 1: add new cards to your existing GPU cracker

Budget estimate: a few hundred Euro

If you already have a GPU box you can simply add or swap cards. As stated above the CPU, memory and chipset will not hold you back. Simply add an HD7970 to your box. Or if you already went NVIDIA find yourself a nice GTX590 or a discounted GTX570.

My experience with combining AMD and NVIDIA cards in one box are pretty bad. You can expect issues at the driver level (does combining NVIDIA and AMD drivers sound like a good idea to you?) and with the password cracking tooling (you are pushing limits and may encounter bugs the creators never looked for).  Good luck with that.

Note that Bitweasil notes that he has success with mixing AMD and NVIDIA on Linux (see his tips in the comments). I have not tried it, but give the driver model of Linux I would not be surprised if it does work. My experience with mixing cards is on Windows 7, which has been far from trouble free.

Option 2: building a new tower model GPU cracker from scratch

Budget estimate: base system 1000 Euro + Euros for a maximum of 4 double sided GPU cards to add

If you don’t already have a GPU box you can simply build your own. The option explained here covers hardware needed for a ‘simple’ tower model PC stacked with GPU cards to the max. Current of the shelve main boards allow for a maximum of 8 PCI cards, which leaves for a maximum of 4 double sided GPU cards.

As explained earlier you can go moderate on CPU, memory and chipset. Challenges here are to find main boards with as much as possible PCI slots but also the right tower model cases to have room for all the GPU cards and PSU’s. Cooling may also be an issue, although any big case allows for plenty fans to be positioned.

Main board options

  • Gigabyte GA-X79-UD3: uses the latest Intel socket 2011, is advertised to handle 4-way-SLI (which in our case is important as it will handle 4 double width GPU cards) and is advertised in NL for around Eur190. Also, as it has 2 PCIe1x slots, if you start using PCI risers you can add even more cards.
  • Gigabyte GA-990FXA-UD7: for AMD cpu’s. Not newest socket but has 6 PCIe slots in 16x size, one in PCIe1x size and a traditional PCI slot. Supposed to handle 4-way-SLI and advertised around Eur190.
  • Gigabyte GA-X79-UD7: basically the same as the Gigabyte X79-UD3 but this one doesn’t have any traditional PCI slots. With Eur300 it’s more expensive and I would only pick this one if you would go with PCI riser cards to fully use the extra slots. Also this main board requires a XL-ATX case (discussed later on).
  • Gigabyte GA-X58A-UD9: uses an older Intel socket but comes with 7 PCIe slots, all in PCIx16 size, but not 16x speed. Can handle 4-way-SLI Advertised around Eur400, but not sure if it is shipped anymore. It needs a XL-ATX case. I would only pick this one if you go PCI riser and choose GPU cards that don’t support a PCIe1x connector.
  • EVGA 270-WS-W555-A2: supports Intel 1366 socket (if you want to go Intel Xeon), has 7 PCIe slots and can cope 4-WAY-SLI. Advertised around $600, which I find expensive. But some prefer the ‘professional’ approach EVGA has on the main boards. Main reason for this one is the brand and if you want to use dual Xeon CPUs. For all cards to be filled you need a case that can hold 9 PCI cards. See below for a list.
  • MSI Big Bang MARSHAL B3: a bit older Intel socket (1155), but has 8 PCIe slots available, all in full size, reasonably priced at Eur340. However, can’t find it at many web shops so availability may be an issue.
  • MSI 890FXA-GD70: recommended by Bitweasil as he has good experience with it. Takes AMD cpu’s and takes 4 double sized GPU cards. I couldn’t find it anymore in NL web shops, but the last price it was know to go for was Eur180, which is pretty good.

Cases options

Main challenge with the case is size. Although not a real standard XL-ATX, Ultra ATX and HPTX are terms to look for. Some of the cases I found:

Cooling

Make sure to spend some effort on cooling. With that many GPU cards and PSUs you will need it. Any big case you buy allows for fans to be added. Make sure to use these.

Water cooling can be an option, but to be honest I don’t have experience with it so can’t advice you on it. I also haven’t looked at the options as our GPU cracking machines are positioned in an air controller lab.

Option 3: building your own scalable supercomputer on a budget

Budget estimate: base system 1000 Euro + Euros for as many GPU cards as you can fit

We will be using the same components here as we did with option 2, except for the case. Budget for the main computer is about the same. But as you can stack more GPU cards you can spend your bigger budget before going to a second box.

Main issue with the previous option is that you will not be using all PCIe slots. With double sided GPU cards you need PCI riser cables to use all slots, and no case allows for 8 double sided GPU cards to be fitted away from the main board. So, what if we go without case? The guys at HighSpeed PC have a product called Top Desk Tech Station. It’s as simple as a case can be.

Now, with the advertised options you have the same space as a normal XL-ATX case. However, they also build custom design. I’ve been in contact with them for an extended version of their HPTX version. It’s fairly easy for them to adjust the design so you can lift the GPU cards and stack 8 double sided cards. I’ve seen the not yet released design and it simply rocks as it has a third level for your cards that use the PCI-risers. You can go even further and use PCI splitters to combine several cards on one PCIe slot (do note AMD’s maximum of 8 cards recognized). The Top Desk Tech Station XL-ATX goes for Eur180.

Pricing for the custom built (which will become a new product as they receive more and more demand) is not detailed at this moment. But the price they were offering me the custom built for is only just a tad more expensive and still is very reasonable.

Now for connecting the cards to the main board you need flexible PCI riser cables. These come in 16x size and in 1x size. Price around 10 to 30 Euro per cable.

Cooling does become an issue, so make sure you attach enough fans to your system. In my situation where the box is in an air conditioned environment these cooling issues are non existing.

If you are worried about warranty find yourself a local computer dealer that will built this system for you and sell it as one. That way they can handle any warranty issues if you encounter them.

Option 4: buying a pre built super GPU computer

Budget estimate: Eur13.000 excluding tax and shipping

The final option you have is to go professional and buy a solution from the guys at Renderstream. My pick would be the VDACTr8-A model. It can hold 8 double size GPU cards. The Renderstream solution is based on the TYAN FT77B7015 barebone with a custom built S7015 main board that has the PCIe slots positioned so it takes 8 double sized GPU cards.

Perhaps you can purchase these components yourself and save some money. I did look into this but had a really hard time finding shops where you can buy the TYAN FT72B7015 and the main board. Eventually I gave up. Also buying the entire solution from one vendor has much added benefit in terms of warranty and service. Be sure to ask them yourselves for a quote, but think about Eur13.000 for the basic VDACTr8-A model with 8x HD7970. That is excluding tax and shipping. Positive note for us Europeans, second half of 2012 they will be opening their warehouse/shipping center in Europe.

update: added a few remarks from Bitweasil’s comments below to be inline with the text. Also added more details on the custom built from the guys at Top Desk.

Advertisements