Generating effective wordlists

Overview

Cracking passwords is more art than science. Most users are (hopefully) beyond the days of using common passwords found in popular wordlists, but people are still people, and passwords are hard. It is now common to find users whose passwords are built from industry-specific words.

So how do we find the right words?

This is a guide on creating customised wordlists for use in brute-force attacks and password cracking.

What is password cracking

Password cracking is the process of hashing candidate words or combinations and comparing each result to the original hash. If the hashes match, the password is cracked. This means we have to supply both the correct candidate and the correct hash type to get a match. Some hashing algorithms are cheaper for modern systems to compute than others, which affects the number of hashes per second that can be generated.
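
The hash-and-compare loop described above can be sketched in a few lines of shell. This is only an illustration: it uses SHA-256 because sha256sum ships with coreutils, the candidate words and the target password are made up, and a real NTLM audit would use a dedicated cracker such as hashcat.

```shell
# Minimal dictionary-attack sketch (SHA-256 stands in for the real hash type).
workdir=$(mktemp -d)
printf '%s\n' 'password' 'Winter2023' 'Summer2024!' > "$workdir/candidates.txt"

# Hypothetical target: in a real audit this hash comes from the dump.
# It is computed here only so the example is self-contained.
target_hash=$(printf '%s' 'Summer2024!' | sha256sum | cut -d' ' -f1)

cracked=''
while IFS= read -r word; do
    # Hash the candidate the same way the target was hashed (no trailing newline).
    candidate_hash=$(printf '%s' "$word" | sha256sum | cut -d' ' -f1)
    if [ "$candidate_hash" = "$target_hash" ]; then
        cracked=$word
        break
    fi
done < "$workdir/candidates.txt"

echo "cracked: ${cracked:-<not found>}"
rm -rf "$workdir"
```

If the right word is not in the candidate list, or the wrong hash type is used, the loop finishes without a match, which is why both the wordlist and the mode matter.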

For hashes that are more intensive (such as WPA passphrases for wireless), running through a large wordlist is time-consuming and often ineffective. A shorter, more tailored list may therefore be a better approach.

Let’s say we have extracted a client’s NTDS, which stores NTLM hashes that are easy to crack. We are performing a password audit for the client to see how many of their passwords are “easily” cracked.

Before we start, we need to perform some quick steps to prepare for password cracking.

Common Wordlists

There are thousands of wordlists available on the internet for you to grab. Some will be more effective than others if you are looking to crack a specific hash. But don’t just download them all blindly. Some of these are extraordinarily large (nearly 100 GB).

Common ones are:

There are a lot of very specific wordlists available which can be incredibly helpful, such as this one for cracking NetGear-generated WiFi passwords. If you are trying to crack a specific password and know a little about the device, this can go a long way.

I put all my favourite wordlists in a folder, ordered by file size. (I will cover why they start at 2 a bit later)

./Wordlists
- ./Favourites
-- 2.mystery-list.txt
-- 3.ignis-10M.txt
-- 4.rockyou.txt
-- 5.kaonashi.txt
-- 6.realuniq.lst
-- 7.rockyou2021.txt

Custom wordlists

As I mentioned earlier these common wordlists, while still incredibly useful, are not always going to be effective. So we need to get a little creative. We can use some tools or some creative thinking to create targeted wordlists.

As an example, we can take some JSON data, such as this one which is a list of Australian suburbs. Then we can run it through jq to get what we need.

jq -r '.data[].suburb' suburbs.json

Or we can refine it further:

jq -r '.data[] | select(.urban_area=="Albury - West") | .suburb' suburbs.json

You can continue to alter the list as you like, doing things like removing the spaces between each word, as they tend to be uncommon in passwords.
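
As a rough sketch of that clean-up step, the snippet below strips spaces and removes duplicates. The suburb names are a made-up sample standing in for the jq output, and the file names are hypothetical.

```shell
workdir=$(mktemp -d)
# Hypothetical sample standing in for the jq output.
printf '%s\n' 'East Albury' 'Albury' 'North Albury' 'Albury' > "$workdir/suburbs.txt"

# Delete every space, then sort and drop duplicate entries.
tr -d ' ' < "$workdir/suburbs.txt" | LC_ALL=C sort -u > "$workdir/suburbs_clean.txt"

suburb_words=$(cat "$workdir/suburbs_clean.txt")
echo "$suburb_words"
rm -rf "$workdir"
```

Case changes and suffixes are usually better left to the cracking rules, so the list itself can stay small.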

CeWL

Another great way to get relevant data is to scrape the client’s website. We can use CeWL from digininja to do this for us.

cewl -c -m 5 --lowercase -w cewlwordlist.txt https://www.volkis.com.au

This will scrape the Volkis website for all words of 5 characters or more and list them in order of frequency. This is great for finding industry-specific words and key information about the client that people may use for passwords.

We get something that looks like this:

security, 792
volkis, 486
penetration, 395
testing, 264
about, 252
assessment, 162
strategy, 159
engineering, 153
review, 152
social, 152
touch, 151
compliance, 143
vulnerability, 141
cloud, 139
policy, 136
physical, 134
vulnerabilities, 132
executive, 129
terms, 129
education, 127
there, 127
their, 125

We then want to clean it up a bit. After all, there are some useless words in there, such as “about”, “terms”, “there” and “their”. But going through the whole list sucks, so we take the top 200 words from the list (cleaning out the counter as we go) and go from there:

head -n 200 cewlwordlist.txt | sed 's/,\s.*$//' > top_200_cewlwordlist.txt

Then we can manually check 200 words and remove any that are pointless.
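
Part of that manual pass can be automated with a stop list. The sketch below assumes a hypothetical stopwords.txt that you build up over time; the sample words are taken from the CeWL output shown earlier.

```shell
workdir=$(mktemp -d)
# Small sample of the cleaned CeWL output.
printf '%s\n' security volkis penetration about terms there their > "$workdir/top_words.txt"
# Hypothetical stop list; extend it as you spot more filler words.
printf '%s\n' about terms there their > "$workdir/stopwords.txt"

# -v invert match, -x match whole lines only, -F fixed strings, -f patterns from file
grep -vxFf "$workdir/stopwords.txt" "$workdir/top_words.txt" > "$workdir/filtered.txt"

filtered_words=$(cat "$workdir/filtered.txt")
echo "$filtered_words"
rm -rf "$workdir"
```

The -x flag matters here: without it, a stop word like "there" would also remove any longer word that contains it.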

CUPP

The Common User Password Profiler (CUPP) tool from Mebus is a great tool for generating user-specific passwords, but also has a bunch of extra uses.

By launching an interactive session, you can follow the prompts to generate wordlists based off the victim’s information. It is generally a good idea to start with only a little bit of information (first and last name), otherwise the wordlists can get very large, very quickly.

cupp -i

It will also prompt if you would like to add Special Characters, “Random numbers”, and convert characters to “1337 mode”. However, this tool is a little old so it needs a little configuration for this to be useful.

Find your “cupp.cfg” file. If you installed it via the Kali repos it will likely be at /etc/cupp.cfg, but you can find it using either:

locate cupp.cfg
# or
find / -iname "cupp.cfg" 2>/dev/null

Edit the config file to add in later years (I add in up to 2025) and change the l33t mode settings as you see fit. (I like a = @, i = !). Also adjust the “Word length shaping” setting to account for longer words.

Note: If you want to add more characters you will need to edit the tool itself to do the extra substitutions.

Another cool feature of CUPP is that it can take an existing wordlist (like the ones we created earlier) and mangle it for us with our settings.

cupp -w <wordlist>

Username Anarchy

One last tool worth mentioning is Username Anarchy by Urban Adventurer. It’s a simple tool that will take a name and generate a short list of possible usernames. As it is not uncommon to find users’ names in passwords, this can generate some interesting results. It’s also super useful for brute-force attacks.

./username-anarchy Harry Potter > harrypotter.txt

Cleaning our wordlists

If we have information about the password policy, we can use that to our advantage to trim our wordlists down. This reduces the amount of processing and increases our chances of success.

This is the cleanest way I have found to do this. (There are more efficient methods, but they are harder to read.) This one-liner will find all words meeting the following requirements:

  • Uppercase letter
  • Lowercase letter
  • Special character
  • Number
  • Between 8 and 12 characters

cat rockyou.txt |\
 grep '[[:upper:]]' |\
 grep '[[:lower:]]' |\
 grep '[[:digit:]]' |\
 grep '[[:punct:]]' |\
 grep -E '^.{8,12}$' > wordlist.txt
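
For comparison, the same policy can be checked in a single awk pass instead of five chained grep processes. This is a sketch on a small made-up sample rather than rockyou.txt, and it assumes an awk that understands POSIX character classes (gawk and recent mawk do).

```shell
workdir=$(mktemp -d)
# Made-up sample; only two entries satisfy the whole policy.
printf '%s\n' 'password' 'Password1' 'Password1!' 'P@ssw0rd' 'Sh0rt!a' \
    'ThisIsWayTooLong123!' > "$workdir/sample.txt"

# A line is printed only if every condition is true:
# 8 to 12 characters, plus one upper, one lower, one digit and one special.
awk 'length($0) >= 8 && length($0) <= 12 &&
     /[[:upper:]]/ && /[[:lower:]]/ && /[[:digit:]]/ && /[[:punct:]]/' \
    "$workdir/sample.txt" > "$workdir/policy.txt"

policy_words=$(cat "$workdir/policy.txt")
echo "$policy_words"
rm -rf "$workdir"
```

Whether this is actually faster on a large list depends on the system, so it is worth timing both versions before settling on one.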

If you just want to set a minimum character length (12 characters in this case) you can use:

cat rockyou.txt |\
 grep '[[:upper:]]' |\
 grep '[[:lower:]]' |\
 grep '[[:digit:]]' |\
 grep '[[:punct:]]' |\
 grep -E '^.{12,}$' > wordlist.txt

You can get more specific too if need be:

  • Uppercase letter
  • Lowercase letter
  • At least 20 characters
  • At least one of the following special characters: $,#,@
  • Ends in a digit

cat rockyou.txt |\
 sed -r '/[$#@]+/!d' |\
 sed -r '/^.{,19}$/d' |\
 sed -r '/[0-9]$/!d' |\
 grep '[[:upper:]]' |\
 grep '[[:lower:]]' > wordlist.txt

Now that we have our wordlists sorted, we can use them for a brute-force attack with our favourite tool, or for offline password cracking.

Password cracking

So now we get to the fruits of our work: the password cracking itself. Assuming we have our password hash list and our wordlists sorted, we can start cracking.

Let’s pretend we have got Domain Admin on our client’s network and we have extracted the NTDS. We are going to perform a bit of a password audit for them.

Cleaning the hash list

Each of our consultants prefers different tools, and those tools produce slightly different outputs that need to be cleaned up before they can be processed. One such tool is GoSecretsDump, which is fast but can be annoying to clean.

So, first we want to strip machine accounts out. We can tackle these later if we need to.

cat ntds.txt | grep -v "\$:" 

Then we need to strip out anything that is not an NTLM hash:

grep -Ev "aes(128|256)" | grep -v "des-cbc" | grep -v "rc4-hmac"

Next, we need to strip out anything that isn’t a hash at all. (That isn’t a spelling error. Well, it is, but the tool outputs a spelling error… which helps us.)

grep -v "gosecretsdump" | grep -v "Coudln"

Lastly, we may want to remove the password history, for the sake of cleanliness while testing.

grep -Ev "_history[0-9]{1,2}:" 

So the full command looks like this:

cat ntds.txt |\
 grep -Ev "_history[0-9]{1,2}:" |\
 grep -v "\$:" |\
 grep -Ev "aes(128|256)" |\
 grep -v "des-cbc" |\
 grep -v "rc4-hmac"|\
 grep -v "gosecretsdump" |\
 grep -v "Coudln" > ntds_clean.txt
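
Since this is a password audit, the cleaned file is also handy for spotting password reuse before any cracking starts. The sketch below assumes the dump uses the secretsdump-style layout user:rid:lmhash:nthash::: and runs against a made-up sample with placeholder hashes; check the field number against your own tool's output.

```shell
workdir=$(mktemp -d)
# Made-up dump: three users, one history entry, one machine account, one Kerberos key.
cat > "$workdir/ntds.txt" <<'EOF'
corp.local\alice:1104:aad3b435b51404eeaad3b435b51404ee:11111111111111111111111111111111:::
corp.local\bob:1105:aad3b435b51404eeaad3b435b51404ee:11111111111111111111111111111111:::
corp.local\carol:1106:aad3b435b51404eeaad3b435b51404ee:22222222222222222222222222222222:::
corp.local\alice_history0:1104:aad3b435b51404eeaad3b435b51404ee:33333333333333333333333333333333:::
WS01$:1107:aad3b435b51404eeaad3b435b51404ee:44444444444444444444444444444444:::
corp.local\alice:aes256-cts-hmac-sha1-96:5555
EOF

# Same filters as above, condensed: drop history, machine accounts and Kerberos keys.
grep -Ev '_history[0-9]{1,2}:' "$workdir/ntds.txt" |
    grep -v '\$:' |
    grep -Ev 'aes(128|256)|des-cbc|rc4-hmac' > "$workdir/ntds_clean.txt"

# Field 4 is assumed to be the NT hash; count how many accounts share each one.
cut -d: -f4 "$workdir/ntds_clean.txt" | sort | uniq -c | sort -rn > "$workdir/reuse.txt"

clean_count=$(wc -l < "$workdir/ntds_clean.txt" | tr -d ' ')
top_reuse=$(head -n 1 "$workdir/reuse.txt" | awk '{ print $1 ":" $2 }')
echo "accounts kept: $clean_count, most shared hash: $top_reuse"
rm -rf "$workdir"
```

Any hash with a count above one is a password shared between accounts, which is worth reporting even if it never cracks.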

Rules

The next stage is to decide which ruleset to run the wordlist through, as the words in a wordlist are often not enough on their own. Larger rulesets are more effective but more time-consuming, so the trade-off needs to be considered. Hashcat has built-in rules, which can be found in the ./rules folder. Some, such as leetspeak and best64, can be useful, but not always.
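
To get a feel for that trade-off, remember that every rule is applied to every word, so the number of candidates is roughly words multiplied by rules. The helper below is a hypothetical back-of-the-envelope calculator, and the figures fed into it are placeholders, not benchmarks.

```shell
# estimate_seconds WORDS RULES HASHES_PER_SECOND
# Rough run time: (words * rules) candidates divided by the guess rate.
estimate_seconds() {
    echo $(( $1 * $2 / $3 ))
}

# Placeholder figures: 10 million words against a slow hash at 1 million guesses/s.
small_ruleset=$(estimate_seconds 10000000 64 1000000)
large_ruleset=$(estimate_seconds 10000000 50000 1000000)
echo "small ruleset: ${small_ruleset}s, large ruleset: ${large_ruleset}s"
```

Swap in the line counts of your own wordlist and rule file, plus the speed hashcat reports for the hash mode, to get a more realistic estimate.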

You can download additional rulesets from the internet. Some popular ones are Clem9669 and OneRuleToRuleThemAll, or its updated version, OneRuleToRuleThemStill. Or, if you are feeling brave, you can make your own.

Command syntax

This is my go-to command for cracking passwords.

hashcat -m 1000 /path/to/ntds.txt -r ./rules/OneRuleToRuleThemStill.rule /path/to/wordlists/* -O --loopback

Let’s break it down:

  • -m 1000 - Mode 1000 (NTLM). A list of modes can be found here.
  • /path/to/ntds.txt - a text file containing the password hashes.
  • -r ./rules/OneRuleToRuleThemStill.rule - Use this rule to mangle the wordlist
  • /path/to/wordlists/* - Add all wordlists in this folder to the Guess Queue (more on this in a second)
  • -O - Use optimised kernels (faster, but limits the maximum password length)
  • --loopback - Add all discovered passwords to a list, then run through that list at the end re-applying the same ruleset to find new passwords.

When supplying a wildcard for the wordlist, the files will be run in alphabetical order. For this reason I start my list of common wordlists at 2., which leaves me room to add custom wordlists I would like to try first by prefixing them with 0. or 1.

It will continue to work through each wordlist, applying the --loopback function after each one, until the Guess Queue is complete.
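
The numbering trick can be checked without running hashcat, because it is the shell that expands the wildcard into a sorted list of file names. The file names below are hypothetical.

```shell
workdir=$(mktemp -d)
# Create the files out of order on purpose.
touch "$workdir/2.mystery-list.txt" "$workdir/4.rockyou.txt" \
      "$workdir/0.client-cewl.txt" "$workdir/1.suburbs.txt"

# The wildcard expands in sorted order, which becomes the order of the queue.
queue_order=$(cd "$workdir" && printf '%s\n' *)
echo "$queue_order"
rm -rf "$workdir"
```

Note that the sort is character by character, so a file starting with 10. would be queued before one starting with 2. Zero-padding the prefixes (01., 02., and so on) avoids this once the list grows past nine entries.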

We can also stack rules if we need to create different permutations. However, the order of the rules is important and will create different results.

A common pairing is -r ./rules/best64.rule -r ./rules/leetspeak.rule which will do some basic transformations, and then apply the leetspeak rule on top of those permutations. However, doing this in the reverse order gets a different list, which I have found to be less effective.

Another common one is -r ./rules/clem9669_case.rule -r ./rules/clem9669_medium.rule. This works quite well but is a much larger set than the best64/leetspeak combo and therefore takes more time. However, more permutations mean a higher chance of success, so make your judgement accordingly.
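
To see why rule order changes the output, here is a toy illustration. The two shell functions are simplified stand-ins, not hashcat's actual rule files: one capitalises the first character, the other does a two-character leetspeak substitution (a to @, i to !).

```shell
# Stand-in for a "capitalise first letter" rule.
capitalise() { awk '{ print toupper(substr($0, 1, 1)) substr($0, 2) }'; }
# Stand-in for a leetspeak rule: a -> @ and i -> ! (lowercase only).
leet() { tr 'ai' '@!'; }

case_then_leet=$(printf '%s\n' alice | capitalise | leet)
leet_then_case=$(printf '%s\n' alice | leet | capitalise)

echo "case then leet: $case_then_leet"
echo "leet then case: $leet_then_case"
```

In the first order the leading "a" is already uppercase when the substitution runs, so it survives. In the second order it has already become "@", which has no uppercase form. To preview what real stacked rules generate, hashcat's --stdout option prints the candidates instead of cracking with them.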

Conclusion

Hopefully this little(?) guide on wordlists and password cracking will help you to find some new and exciting passwords. I will add to it as I find new tricks.

Good luck.