What hardware to choose when building a GPU based password cracker right now (Q1 2012)?

Posted in Notes to myself, Password cracking on 2012/02/06 by mram

GPU based password cracking has unmatched power when it comes to brute force cracking. Although brute force cracking is only part of the game (see also my post from over a year ago on CPU based cracking not being dead here), any modern security testing lab includes GPU password cracking functionality.

The field of GPU hardware is developing rapidly. What was top of the line 18 months ago is merely reasonable right now. As I’m in the process of upgrading the GPU hardware in our security testing lab myself, I researched several possibilities with the current state of GPU hardware taken into account. This may be different in a few months, but for now (Q1 2012) these are the best picks I could find, and I thought I would share them with you.

I narrowed it down to four different options, ranging from a few hundred to 13.000 Euro.

Common decisions for all possible options

Before diving into the different options, let’s discuss a few main decisions that are the same whichever way you go.

Power is not really an issue when you can combine power supplies

GPU cards consume a lot of power. Having several GPU cards in your box requires a massive PSU. We are talking 1200+ Watt here with a few modern cards. High wattage PSUs are expensive, especially when you want ‘80 PLUS’ certified ones. You do want these, as they are certified to be at least 80% efficient: at most about 20% of the power drawn from the outlet is lost as heat, the byproduct of any PSU. But as you do consume a lot of power you need a big, and therefore expensive, PSU. Fortunately there are easy ways to combine several mid range PSUs into one that meets your requirements. ADD2PSU allows you to daisy-chain two or more PSUs. The Lian-Li Dual Power Supply Adapter (hard to find, not sure if it still ships) allows you to combine two PSUs. Both are simple solutions for our problem. Of course you can do this yourself by soldering cables, but with these solutions and prices (Eur 20) I wouldn’t start tampering with electrical power myself.

When picking PSUs make sure to take ones that offer enough connectors, preferably a modular PSU like the Corsair AX1200 that lets you attach the cables yourself.
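To make the PSU sizing a bit more concrete, here is a minimal sketch of the arithmetic. The wattage per card, the wattage for the rest of the system and the 80% loading headroom are illustrative assumptions of mine, not measured figures; 80% is the minimum efficiency of the basic 80 PLUS level.

```python
import math

def wall_draw_watts(dc_load_watts, efficiency=0.80):
    """Power drawn from the outlet to deliver dc_load_watts to the components."""
    return dc_load_watts / efficiency

def psus_needed(dc_load_watts, psu_rating_watts, headroom=0.8):
    """Number of identical PSUs needed when each one is loaded to at most
    `headroom` of its rating (the headroom value is an assumption)."""
    usable = psu_rating_watts * headroom
    return math.ceil(dc_load_watts / usable)

# Assumed example: 4 cards at 250 W each plus 200 W for the rest of the system.
load = 4 * 250 + 200            # 1200 W delivered to the components
print(wall_draw_watts(load))    # roughly 1500 W from the outlet at 80% efficiency
print(psus_needed(load, 850))   # 2 PSUs of 850 W each
```

Plug in the numbers from the spec sheets of your own cards before ordering anything.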

CPU, chipset and main memory don’t really make a difference

It is all about the GPU cards. Unless you want to do more on the box you are building, I wouldn’t spend too many Euros on top of the line CPUs, chipsets and memory. Any Intel socket 1366 or even socket 1155 is good enough. If you want to go AMD, socket AM3 or AM3+ is good enough. Of course you can go for the newest sockets, but that doesn’t give you more cracking power. The same goes for MHz: it will not give you more cracking power. Memory should be enough to run your OS of choice and some more. Don’t be on the cheap side, no computer runs OK with insufficient RAM, but I have yet to find the first cracking program that requires gigabytes of memory, except for rainbow tables (in that case system RAM does matter a bit, and you should calculate your needs based on the size of the tables you are using).

Be smart and don’t pick top of the line here on CPU, socket and main memory. It will save you a considerable amount of money that you can then spend on GPU cards.

One of the commenters (Bitweasil, author of the Cryptohaze Multiforcer crack tooling, so definitely somebody who has experience with this) recommended matching system RAM with the RAM on the GPUs. With system RAM being very cheap nowadays and most GPU cards shipping with about a gig of RAM, you would probably match it by using a ‘default’ amount of 4-8GB. He also recommends matching the number of CPU cores with the number of GPU cards, just in case the GPU drivers are not as optimized as they should be. I guess this makes sense, but it also shouldn’t be a problem with most CPUs nowadays being multi core.

PCIe1x is fast enough

This is an important one when choosing your main board. Many boards are advertised with X PCIe16x slots. But when you look closer at the specs you notice that the 16x speed is shared between slots. So when, for example, slot 1 and slot 3 are used simultaneously, they are both downgraded to PCIe8x or even lower. If you think “more is better”, this really makes it hard to pick a main board with as many PCIe16x slots as possible. I’ve got news for you: main boards with 8 slots of true PCIe16x are rare to non existent. But there is also no need for them. If you go gaming (still the largest market for makers of main boards with many PCIe slots) you want to go SLI with some PCIe16x slots. In that case the cards mostly communicate via the SLI bridge and not via the PCIe bus. But we go password cracking, not gaming. And for password cracking PCIe1x is fast enough.

PCIe works with lanes. The number of lanes is typically a power of two between 1 and 32 and is represented by the number directly after the “PCIe”. PCIe16x means 16 lanes, and 16x seems to be top of the line on most boards. PCIe version 2 (which is the most used version for GPU cards and main boards right now) has a speed of “500MB per second per lane”. Now, with games, textures and vertices are continuously processed by the cards. These are heavy calculations on big sets of input data that together require significant throughput on the PCIe bus. But with password cracking we are talking simple operations on data. The ‘data’ (the list of base input words that are to be hashed) and the ‘operation’ (the set of calculations the cores perform to calculate the hash on the GPU) are transferred over the bus only periodically. There is no way the GPU can calculate hashes so fast that it requires 500MB of data + operations every second. GPUs are simply not powerful enough at this moment to need 500MB/s.

So, PCIe1x is speedy enough. Suddenly a lot more main boards become available :-)
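For those who want to check this against their own setup, here is a rough back-of-the-envelope sketch. The hash rate, the number of mutations per base word and the word size are made-up example figures, and the model assumes that the GPU expands each uploaded base word into many candidates itself.

```python
PCIE2_BYTES_PER_SEC_PER_LANE = 500 * 10**6  # 500 MB/s per lane, as quoted above

def bus_bytes_per_sec(hashes_per_sec, candidates_per_word, bytes_per_word):
    """Host-to-GPU traffic when every uploaded base word is expanded into
    `candidates_per_word` candidates on the GPU itself."""
    words_per_sec = hashes_per_sec / candidates_per_word
    return words_per_sec * bytes_per_word

def lane_utilisation(bytes_per_sec, lanes=1):
    """Fraction of the available PCIe 2.0 bandwidth that is used."""
    return bytes_per_sec / (PCIE2_BYTES_PER_SEC_PER_LANE * lanes)

# Assumed figures: 2 billion hashes/s, 1000 mutations per base word, 16 bytes per word.
traffic = bus_bytes_per_sec(2e9, 1000, 16)
print(traffic / 1e6)              # MB/s that has to cross the bus
print(lane_utilisation(traffic))  # fraction of a single lane in use
```

With these example numbers a single lane is nowhere near saturated. The estimate stands or falls with the candidates being generated on the GPU, which is exactly what the periodic-transfer argument assumes; a tool that uploads every single candidate from the host would look very different.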

Memory on GPU card is not a limiting factor

This continues the point that the throughput of the GPU cards isn’t that big for password cracking compared to gaming. A gigabyte of memory on the card is a ridiculously huge amount that no tool will use, except perhaps when you are using ridiculously large dictionary files. But if you are using dictionaries that approach 1 gigabyte you might want to verify the usefulness of the dictionary. Brute force will be faster.

So, save yourself some money and don’t go for the GPU cards with a ridiculous amount of memory. It will not improve your cracking speed. And with most cards nowadays shipping with 1-1.5GB of RAM, my pick would be those, and not the extra expensive ones with 2GB.

PCI riser cards can come in handy

With the PCIe1x slots also being usable for cracking, the only thing you need to overcome to use all PCIe slots on a main board is the fact that most GPU cards take up the physical space of two PCIe slots. Flexible PCIe riser cards come in handy here. If you can find a way to lift the cards and have a big enough box to fit all these double width GPU cards, you can connect them to the main board via (flexible) PCIe riser cards. Many solutions exist. Note that in theory all you need is a PCIe1x connection (the shortest possible connector). Just make sure the card you buy allows for it without sawing holes in the PCIe connector (and if you do want to saw in your PCIe equipment, here is an excellent tutorial: http://blog.zorinaq.com/?e=42).

AMD has the more powerful architecture

When buying GPU cards for password cracking you have two different vendors to choose from: NVIDIA and AMD. Which one to pick? Short answer: go AMD, benchmark results showing this are all over the web.

Long answer: go AMD because they have an architectural preference for more cores/ALUs, resulting in more parallel calculations. AMD has more cores at a somewhat lower speed, where NVIDIA goes for fewer cores at a higher speed. For gaming there is not much between them. But AMD’s solution comes in handy for the task of password cracking. You can read up on all kinds of things, like AMD’s move from the VLIW to the GCN architecture, NVIDIA’s current Fermi architecture that the GeForce 500 series is based on, and the move to the 28nm process that AMD already made and NVIDIA will make with the to be released GeForce 600 series, but the bottom line is that AMD’s approach is faster for password cracking.

The battle isn’t won, both NVIDIA and AMD have the same goal: continue awesome graphics performance but also enlarge the use for General Purpose computing on GPU. So perhaps NVIDIA’s next move will change things, but for now go AMD.

Pick the AMD HD79XX series

AMD recently released the HD79xx series. My pick right now would be the HD7970 card. Its performance is top of the line, and the pricing is not ridiculous (check stats at https://en.bitcoin.it/wiki/Mining_hardware_comparison). You can go one series below and go HD6970 or HD6990 (basically 2 HD6970s on one board). But only go that way if you find a nice discount.

In the next few weeks AMD will release the HD7990, which basically will be 2 HD7970s on one board. They did the same trick with the HD6990, and if that release taught us anything it is that availability will be very poor. If you buy one card that may not be a problem, but buy 4 of those at once and you may have an issue. Do note that AMD has an issue where no more than 8 GPUs are recognized by the system. So when going HD6990 or the future HD7990 you can only use 4 of them (as these cards carry two GPUs each). I’m sure NVIDIA has similar issues, I just don’t know the exact limit at this moment (it used to be limited to four cards about two years ago).

Linux support for AMD sucks, expect issues or wait for new software versions

AMD has shown not to take Linux as seriously as Windows. The Catalyst drivers for Linux are a mess, although they are getting better and better. NVIDIA was in the same spot a few years back, and they have fixed it. AMD will also fix this, but it will take some time. Right now you can expect the current release (12.1) to have issues detecting the latest HD7970 card. Simply wait for a newer version or go Windows if you want to use this card.

So, with these main topics discussed, let’s dive into the four different options you have. Of course your budget is the main decider for what way you want to go. More budget pays for more power. I’ll start with the cheapest one.

Option 1: add new cards to your existing GPU cracker

Budget estimate: a few hundred Euro

If you already have a GPU box you can simply add or swap cards. As stated above the CPU, memory and chipset will not hold you back. Simply add an HD7970 to your box. Or if you already went NVIDIA find yourself a nice GTX590 or a discounted GTX570.

My experience with combining AMD and NVIDIA cards in one box is pretty bad. You can expect issues at the driver level (does combining NVIDIA and AMD drivers sound like a good idea to you?) and with the password cracking tooling (you are pushing limits and may encounter bugs the creators never looked for). Good luck with that.

Note that Bitweasil says he has had success with mixing AMD and NVIDIA on Linux (see his tips in the comments). I have not tried it, but given the driver model of Linux I would not be surprised if it does work. My experience with mixing cards is on Windows 7, which has been far from trouble free.

Option 2: building a new tower model GPU cracker from scratch

Budget estimate: base system 1000 Euro + Euros for a maximum of 4 double width GPU cards to add

If you don’t already have a GPU box you can simply build your own. The option explained here covers the hardware needed for a ‘simple’ tower model PC stacked with GPU cards to the max. Current off-the-shelf main boards offer a maximum of 8 PCIe slots, which leaves room for a maximum of 4 double width GPU cards.

As explained earlier you can go moderate on CPU, memory and chipset. The challenges here are to find a main board with as many PCIe slots as possible and the right tower model case with room for all the GPU cards and PSUs. Cooling may also be an issue, although any big case allows for plenty of fans to be positioned.
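The card count is simple arithmetic, but it has two separate limits: the physical slot space and the 8 GPU limit of the AMD driver mentioned earlier. A small sketch (the function name and parameters are my own, purely for illustration):

```python
def max_cards(slots, slots_per_card, gpus_per_card, driver_gpu_limit=8):
    """Number of cards that fit, limited by slot space and by the number of
    GPUs the driver recognizes (8 for AMD according to this post)."""
    by_space = slots // slots_per_card
    by_driver = driver_gpu_limit // gpus_per_card
    return min(by_space, by_driver)

print(max_cards(8, 2, 1))  # 4: double width single GPU cards directly on the board
print(max_cards(8, 1, 2))  # 4: dual GPU cards on risers, capped by the driver limit
print(max_cards(8, 1, 1))  # 8: single GPU cards on risers
```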

Main board options

  • Gigabyte GA-X79-UD3: uses the latest Intel socket 2011, is advertised to handle 4-way-SLI (which in our case is important as it will handle 4 double width GPU cards) and is advertised in NL for around Eur190. Also, as it has 2 PCIe1x slots, if you start using PCI risers you can add even more cards.
  • Gigabyte GA-990FXA-UD7: for AMD CPUs. Not the newest socket but has 6 PCIe slots in 16x size, one in PCIe1x size and a traditional PCI slot. Supposed to handle 4-way-SLI and advertised around Eur190.
  • Gigabyte GA-X79-UD7: basically the same as the Gigabyte X79-UD3 but this one doesn’t have any traditional PCI slots. At Eur300 it’s more expensive and I would only pick this one if you go with PCIe riser cards to fully use the extra slots. Also, this main board requires an XL-ATX case (discussed later on).
  • Gigabyte GA-X58A-UD9: uses an older Intel socket but comes with 7 PCIe slots, all in PCIe16x size, but not 16x speed. Can handle 4-way-SLI. Advertised around Eur400, but not sure if it is still shipped. It needs an XL-ATX case. I would only pick this one if you go PCIe riser and choose GPU cards that don’t support a PCIe1x connector.
  • EVGA 270-WS-W555-A2: supports Intel socket 1366 (if you want to go Intel Xeon), has 7 PCIe slots and can cope with 4-way-SLI. Advertised around $600, which I find expensive. But some prefer the ‘professional’ approach EVGA takes with its main boards. The main reasons to pick this one are the brand and the option to use dual Xeon CPUs. For all slots to be filled you need a case that can hold 9 expansion cards. See below for a list.
  • MSI Big Bang MARSHAL B3: a bit older Intel socket (1155), but has 8 PCIe slots available, all in full size, reasonably priced at Eur340. However, I can’t find it at many web shops so availability may be an issue.
  • MSI 890FXA-GD70: recommended by Bitweasil as he has good experience with it. Takes AMD CPUs and takes 4 double width GPU cards. I couldn’t find it anymore in NL web shops, but the last price it was known to go for was Eur180, which is pretty good.

Cases options

The main challenge with the case is size. Although they are not real standards, XL-ATX, Ultra ATX and HPTX are the terms to look for. Some of the cases I found:

Cooling

Make sure to spend some effort on cooling. With that many GPU cards and PSUs you will need it. Any big case you buy allows for fans to be added. Make sure to use these.

Water cooling can be an option, but to be honest I don’t have experience with it so I can’t advise you on it. I also haven’t looked at the options as our GPU cracking machines are positioned in an air conditioned lab.

Option 3: building your own scalable supercomputer on a budget

Budget estimate: base system 1000 Euro + Euros for as many GPU cards as you can fit

We will be using the same components here as we did with option 2, except for the case. Budget for the main computer is about the same. But as you can stack more GPU cards you can spend your bigger budget before going to a second box.

The main issue with the previous option is that you will not be using all PCIe slots. With double width GPU cards you need PCIe riser cables to use all slots, and no case allows for 8 double width GPU cards to be fitted away from the main board. So, what if we go without a case? The guys at HighSpeed PC have a product called Top Desk Tech Station. It’s as simple as a case can be.

Now, with the advertised options you have the same space as a normal XL-ATX case. However, they also build custom designs. I’ve been in contact with them about an extended version of their HPTX model. It’s fairly easy for them to adjust the design so you can lift the GPU cards and stack 8 double width cards. I’ve seen the not yet released design and it simply rocks, as it has a third level for the cards that use the PCIe risers. You can go even further and use PCIe splitters to combine several cards on one PCIe slot (do note AMD’s maximum of 8 GPUs recognized). The Top Desk Tech Station XL-ATX goes for Eur180.

Pricing for the custom build (which will become a new product as they receive more and more demand) is not final at this moment. But the price they offered me for the custom build is only a tad higher and still very reasonable.

Now for connecting the cards to the main board you need flexible PCI riser cables. These come in 16x size and in 1x size. Price around 10 to 30 Euro per cable.

Cooling does become an issue, so make sure you attach enough fans to your system. In my situation where the box is in an air conditioned environment these cooling issues are non existing.

If you are worried about warranty, find yourself a local computer dealer that will build this system for you and sell it as one. That way they can handle any warranty issues if you encounter them.

Option 4: buying a pre built super GPU computer

Budget estimate: Eur13.000 excluding tax and shipping

The final option you have is to go professional and buy a solution from the guys at Renderstream. My pick would be the VDACTr8-A model. It can hold 8 double size GPU cards. The Renderstream solution is based on the TYAN FT77B7015 barebone with a custom built S7015 main board that has the PCIe slots positioned so it takes 8 double sized GPU cards.

Perhaps you can purchase these components yourself and save some money. I did look into this but had a really hard time finding shops where you can buy the TYAN FT72B7015 and the main board. Eventually I gave up. Also, buying the entire solution from one vendor has a lot of added benefit in terms of warranty and service. Be sure to ask them for a quote yourself, but think about Eur13.000 for the basic VDACTr8-A model with 8x HD7970. That is excluding tax and shipping. A positive note for us Europeans: in the second half of 2012 they will be opening their warehouse/shipping center in Europe.

update: added a few remarks from Bitweasil’s comments below to be inline with the text. Also added more details on the custom built from the guys at Top Desk.

CPU based password cracking is not dead!

Posted in Password cracking, Security on 2010/11/05 by mram

In the old days, password cracking (or password auditing or recovery if you are that old school) was relatively easy. You got the hashes from a system, put them in John The Ripper, waited a while and had results. If you wanted faster cracking you just bought a bigger CPU. In the last few years much has changed. We have seen new ways of password cracking like pre-computation tables and rainbow tables. But one of the major recent shifts is the move to new architectures with massive theoretical power that we can use for brute force password cracking.

In this post I will not be challenging the enormous computational advantages for brute force password cracking that new architectures provide. These new architectures are simply better for specialized tasks.

However, this post is about:

  1. Putting the power of new architectures in perspective (that of a professional penetration tester*, see below for details);
  2. Proving that CPU based password cracking is far from dead;
  3. The introduction to a little hobby project I will discuss in a future post.

New architectures and why they are not usable yet

So let’s start with the new architectures that are already being discussed in relation to password cracking. These architectures are:

  • Cell architecture (e.g. from the PlayStation3);
  • Field-Programmable Gate Array (FPGA) and to some extent even Application Specific Integrated Circuits (ASIC);
  • Cloud Computing;
  • Graphics cards, or Graphics Processing Units (GPU).

I will only cover GPU here and not the Cell, FPGA/ASIC and Cloud architectures. They are proven to be very fast in very specific situations, but are not usable right now as they have too many disadvantages. For Cell, research has been done by Nick Breese, but practical implementations are very limited: only MD5 and WPA that I know of. Others you would have to create yourself. FPGAs and ASICs require a setup per hash type, or reprogramming your setup. They require detailed knowledge of pseudo-hardware design and programming skills for every specific hash type. Therefore they are relatively expensive and only interesting for very targeted attacks. Finally, Cloud Computing sounds cool but is ridiculously expensive for password cracking. It also has an inherent insecurity: you will be sending your client’s data to a service provider, which in itself may require you to change your contract with your own client.

That does not mean that no bloke somewhere in the world has got a setup up and running, or that no very specific setups are actually kicking ass, like the FPGA setup for cracking A5/1. It does mean that these architectures are not ready for wide scale usage at this moment.

If you disagree with the stated disadvantages, please keep on reading as there are disadvantages to GPU based cracking that also apply to the just mentioned architectures.

They say GPU is the new way

There is one architecture that has risen very fast in the last few years: graphics cards, or Graphics Processing Units (GPU), with General-Purpose computation on Graphics Processing Units (GPGPU) standing for the activity of using your graphics card for computations other than graphics. I will not cover the details of GPUs here, many resources exist on the intertubes. What is important for this article is that GPUs have more power for parallelization than CPUs, which happens to be quite useful for brute force password cracking, as this consists of simple calculations that can be programmed in parallel very easily. Simplified, it’s Single Instruction, Single Data (SISD) on CPU versus Single Instruction, Multiple Data (SIMD) on GPUs. SIMD wins in regards to raw power on predefined tasks. Much research exists on the net about this topic, like this.

OK, that’s all very interesting but nothing new, I hear you say. GPUs are fast, new, full of potential and kick butt for password cracking. But let’s take a step back and be critical for a moment.
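To see why brute force is so easy to parallelize, consider this minimal sketch. It is plain Python on the CPU, not GPU code, and the function names and the unsalted MD5 target are just my illustration. Every candidate is derived from a number, so the key space can be cut into chunks that need no communication with each other; here the chunks run one after another, on a GPU each would go to its own group of cores.

```python
import hashlib
import string

CHARSET = string.ascii_lowercase

def candidate(index, length, charset=CHARSET):
    """Map an integer in [0, len(charset)**length) to a unique candidate string."""
    chars = []
    for _ in range(length):
        index, rem = divmod(index, len(charset))
        chars.append(charset[rem])
    return "".join(chars)

def search_chunk(target_hex, start, stop, length):
    """Hash every candidate with an index in [start, stop); return the match or None."""
    for i in range(start, stop):
        word = candidate(i, length)
        if hashlib.md5(word.encode()).hexdigest() == target_hex:
            return word
    return None

def crack(target_hex, length, workers=4):
    """Split the key space into `workers` independent chunks and search each one."""
    total = len(CHARSET) ** length
    step = -(-total // workers)  # ceiling division
    for start in range(0, total, step):
        found = search_chunk(target_hex, start, min(start + step, total), length)
        if found is not None:
            return found
    return None

target = hashlib.md5(b"abc").hexdigest()
print(crack(target, 3))  # abc
```

Since `search_chunk` only needs a start and a stop index, handing the chunks to separate cores, cards or boxes is a matter of plumbing and not of algorithm design.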

Should we give up on CPU based password cracking?

My answer is no, or not yet. I’ve got two reasons for that:

  1. Brute force cracking is only part of the game;
  2. GPU tools have several key disadvantages at this moment.

I will discuss both.

Brute force power is only a part of the job

Up until now I only covered brute force cracking. I would like to point out that brute force cracking should only be considered as a last resort. Fast cracking of password hashes depends on much more:

  1. A decent cracking strategy for the hash type. Hash types differ in ease of cracking. Depending on the hash type and on what you know about the client or the effective password policy, you decide whether to use rainbow tables, dictionaries, brute force and/or educated guesses, and in what order;
  2. A good dictionary, customized to the environment. Dictionary cracking is faster than brute force and is an essential part of cracking. The dictionary should reflect the words people tend to use as the base of their password. The dictionary is then used for cracking on the raw words (e.g. vanessa) and on mutations of the raw words (e.g. Vanessa2003). With a dictionary adjusted to the specific environment you can make a big difference.
  3. Good, stable tools that you can use for the actual cracking. This means support for the hash type and no crashing. If I put in a list of hashes to be cracked during the night, I must be sure that I get some results in the morning. I also need an easy to use interface. In my case I want it to be accessible for my team via a web interface and possibly via a secure email interface.
  4. Raw power for brute force cracking. This is the step where we simply try all possible combinations of the character space to find a password, as apparently the password is that strong.

As you can see, only the final step includes brute force cracking. By the time you get there you have usually already cracked a large set of the hashes. If you have more raw power, you can make a difference on the final step. Only on the final step.
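The dictionary step with mutations can be sketched in a few lines. The rules below (three case variants, a handful of year and digit suffixes) are illustrative assumptions of mine and not the rule set of John The Ripper or any other tool.

```python
def mutations(word, years=range(2000, 2013)):
    """Yield the raw word plus simple case and suffix variants, without duplicates."""
    suffixes = [""] + [str(year) for year in years] + ["1", "123", "!"]
    seen = set()
    for base in (word, word.capitalize(), word.upper()):
        for suffix in suffixes:
            cand = base + suffix
            if cand not in seen:
                seen.add(cand)
                yield cand

candidates = list(mutations("vanessa"))
print("vanessa" in candidates, "Vanessa2003" in candidates)  # True True
print(len(candidates))  # 51
```

Fifty-one guesses per base word is nothing compared to a brute force run over the same lengths, which is the whole point of doing the dictionary step first.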

Disadvantages of GPU tools

I’ve been playing around with a GPU setup for several years now. My setup consists of: Intel i7 920 @ 2,66GHz, 6GB DDR3 @ 1066MHz, 2x ASUS ENGTX295x GPU cards with 1.8GB memory, 1x NVIDIA 9800GT, ASUS P6T7 Supercomputer mother board and a 1500Watt power supply. Now, this is a pretty impressive system and so are the cracking results on this box. It has shown that GPU based password cracking is very fast and an easy replacement for CPU based password cracking on a single box.

But during my testing, this setup has shown several very important disadvantages that keep me and my team from using it. These disadvantages are:

  1. Support of hashes. Many tools exist and most of the tools support the most used hashes, e.g. LM, NTLM, MD5. But there are many more hash types that I need support for (e.g. Kerberos, MD5 Crypt, MS Cached, MySQL, SHA, Oracle, etc.) as they are used in the real world at my clients;
  2. The tools are highly unstable. It truly is a market that has not yet matured. Whizzkids pop up doing some blindingly fast implementations of specific hash types. But the result is that the tools are in a beta or 0.x stage, remain there for a long time, and that the majority of the tools only focus on 1 hash type;
  3. It’s very hard to scale. Clustering or distributed usage is not possible with the current tools so you are stuck with one box. Filling a box with GPU cards requires immense power supplies and a mother board with a ridiculous number of PCIe slots (of which only a few exist). And then you still only have one box, which isn’t very useful if you have a team of people wanting to crack hashes. You could also go Tesla, but the performance on Tesla setups is not that great with the available tools: the whizzkids simply can’t develop for an architecture that they don’t have. Tesla is also not the cheapest way to go;
  4. It’s hard to automate as many tools only support one hash at a time. The interfaces of the tools are all different, with some being solely interactive (non scriptable, darn you Windows GUI apps);
  5. The performance gain of GPUs is on average about 5-30x compared to CPU based cracking. Faster is better, but I find it not _that_ shocking (OK, relatively).

When I pit my old school CPU based John The Ripper setup, with wordlists and an easy to use and stable interface, against the GPU cracking server and tools with these disadvantages, the CPU setup is simply faster most of the time. Only when I’m looking for that truly random 9 character password do GPUs do the trick. But at 10 characters the majority of the hashes are unbreakable for GPUs too. So the sweet spot for GPU is limited at this moment.

No, I haven’t forgotten the commercial tools

There are commercial tools available that do support more hash types, are distributed to some extent and should be stable. For example the guys at Elcomsoft make some cool stuff. I really support these companies in making their business out of password auditing. But their licensing and/or their fee simply doesn’t make it usable for me. Also, the impact of cracked passwords on my clients is bigger when I only use freely available tools. Yes, you may think this shouldn’t count as a valid reason. But the thing is that I can only advise the client to a certain extent. In the end my client is the one that decides to pick up a finding about weak passwords and give it a certain priority for follow up. In my experience the acceptance and priority are much higher when I use freely available tools for the illustrations in the report and/or the demo of the hack.
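The jump from 9 to 10 characters is easy to underestimate, so here is the arithmetic as a small sketch. The character set size (62, upper and lower case letters plus digits) and the hash rate are assumptions for illustration; real rates differ enormously per hash type and per card.

```python
def keyspace(charset_size, length):
    """Number of candidates of exactly `length` characters."""
    return charset_size ** length

def days_to_exhaust(charset_size, length, hashes_per_sec):
    """Worst case time in days to try every candidate of the given length."""
    return keyspace(charset_size, length) / hashes_per_sec / 86400

RATE = 5e9  # assumed hash rate in hashes per second
for length in (8, 9, 10):
    print(length, round(days_to_exhaust(62, length, RATE), 1), "days")
```

Whatever rate you fill in, each extra character multiplies the work by the size of the character set. A 5-30x faster box buys you less than one extra character.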

Conclusion

So, there you have it. I’ve been using GPU based password cracking for some time now. I’ve seen the power. I fully support all the different tools that are out there and I expect them to be fully awesome in the future. I really do.

But at this moment there are too many disadvantages, and the advantages are not that great. More mature GPU tools, support for more hashes and the ability to cluster: that would be great. But until we are there I don’t want to give up CPU based password cracking combined with a good cracking strategy and good dictionaries. It simply suits my needs as a professional pentester better.

For me personally one of the biggest disadvantages of current GPU tools is the interface and the ability to scale to a distributed environment. As I will show you in a future blog post, I’ve got a pretty cool solution for that for CPU based tooling: clustering with a proper interface. I get really cool results with that.

* My background here is that of professional penetration testing. When I’m at a client and have hacked one or several of their systems, I need to pick out the (too) easy passwords immediately and be able to crack the remaining hundreds or thousands of hashes I found in a short amount of time. I don’t necessarily need to crack them all, although that would be convenient. Where a real hacker has lots of time, I need to provide the client with proper insight within a short amount of time. I don’t have a gazillion euro budget to buy all tools available and I will not be sending the hashes of my client to a different service provider. I do work in a team of testers, we share a lab consisting of several systems that can support us in our work, and I do have knowledge of what type of passwords people actually choose. Operating the cracking servers should be fast and easy for us.