In general, "they" say having a GOOD 'dictionary' or 'wordlist' (as far as I know, they're the same!) is 'key'. But what makes them GOOD? Most people will then say the bigger is better. However, this isn't always the case... (for the record I don't agree with this - more on this later). This post contains (other than a mass of download links) is pretty pictures and confusing numbers, which shows the break down of statistics regarding 17 wordlists. These wordlists, which the original source(s) can be found online, have been 'analysed', 'cleaned' and then 'sorted' for example:
Collection Name (Original Source) | Lines (Extracted / Compressed Size) | Download | MD5 |
---|---|---|---|
Collection of Wordlist v.2 | 374806023 (3.9GB / 539MB) | Part 1, Part 2, Part 3 | 5510122c3c27c97b2243208ec580cc67 |
HuegelCDC | 53059218 (508MB / 64MB) | Part 1 | 52f42b3088fcb508ddbe4427e8015be6 |
Naxxatoe-Dict-Total-New | 4239459985 (25GB / 1.1GB) | Part 1, Part 2, Part 3 Part 4, Part 5, Part 6 | e52d0651d742a7d8eafdb66283b75e12 |
Purehates Word list | 165824917 (1.7GB / 250MB) | Part 1, Part 2 | c5dd37f2b3993df0b56a0d0eba5fd948 |
theargonlistver1 | 4865840 (52MB / 15MB) | Part 1 | b156e46eab541ee296d1be3206b0918d |
theargonlistver2 | 46428068 (297MB / 32MB) | Part 1 | 41227b1698770ea95e96b15fd9b7fc6a |
theargonlistver2-v2 (word.lst.s.u.john.s.u.200) | 244752784 (2.2GB / 219MB) | Part 1, Part 2 | 36f47a35dd0d995c8703199a09513259 |
WordList Collection | 472603140 (4.9GB / 1.4GB) | Part 1, Part 2, Part 3, Part 4, Part 5, Part 6, Part 7 | a76e7b1d80ae47909b5a0baa4c414194 |
wordlist-final | 8287890 (80MB / 19MB) | Part 1 | db2de90185af33b017b00424aaf85f77 |
wordlists-sorted | 65581967 (687MB / 168MB) | Part 1 | 2537a72f729e660d87b4765621b8c4bc |
wpalist | 37520637 (422MB / 66MB) | Part 1 | 9cb032c0efc41f2b377147bf53745fd5 |
WPA-PSK WORDLIST (40 MB) | 2829412 (32MB / 8.7MB) | Part 1 | de45bf21e85b7175cabb6e41c509a787 |
WPA-PSK WORDLIST 2 (107 MB) | 5062241 (55MB / 15MB) | Part 1 | 684c5552b307b4c9e4f6eed86208c991 |
WPA-PSK WORDLIST 3 Final (13 GB) | 611419293 (6.8GB / 1.4GB) | Part 1, Part 2, Part 3, Part 4, Part 5, Part 6, Part 7 | 58747c6dea104a48016a1fbc97942c14 |
-=Xploitz=- Vol 1 - PASSWORD DVD | 100944487 (906MB / 109MB) | Part 1 | 38eae1054a07cb894ca5587b279e39e4 |
-=Xploitz=- Vol 2 - Master Password Collection | 87565344 (1.1GB / 158MB) | Part 1 | 53f0546151fc2c74c8f19a54f9c17099 |
-=Xploitz Pirates=- Masters Password Collection #1! -- Optimized | 79523622 (937MB / 134MB) | Part 1 | 6dd2c32321161739563d0e428f5362f4 |
17-in-1 | 5341231112 (37GB / 4.5GB) | Part 1 - Part 24 | d1f8abd4cb16d2280efb34998d41f604 |
18-in-1 | 5343814622 (37GB / 4.5GB) | Part 1 - Part 24 | aee6d1a230fdad3b514a02eb07a95226 |
18-in-1 [WPA Edition] | 1130701596 (12.6GB / 2.9GB) | Part 1 - Part 15 | 425D47C549232B62DBB0E71B8394E9D9 |
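If you grab any of these, something along the following lines should verify and unpack a download. This is only a sketch: it assumes the parts are 7-Zip volumes (the same format the 7za command at the end of the cleaning script below produces), and the file names here are made up.

```bash
# Made-up file names - adjust to whatever collection you actually downloaded
md5sum wordlist.7z.001 wordlist.7z.002   # checksum the downloaded parts
7za x wordlist.7z.001                    # p7zip picks up the remaining volumes automatically
wc -l wordlist.lst                       # the line count should match the table above
```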
Table 1 - 'raw' data
Table 2 - Calculated Differences
Table 3 - Summary
Graph 1 - Number of lines in a collection
Graph 2 - Percentage of unique words in a collection
Graph 3 - Number of lines removed during cleaning
Graph 4 - Percentage of content removed
Graph 5 - Percentage of words between 8-63 characters (WPA) *Red means the list is MEANT for WPA*
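For anyone wanting to reproduce these numbers, the values behind the graphs (total lines, unique lines and the 8-63 character WPA range) can be pulled out of any list with standard tools. A rough sketch - wordlist.lst is just a placeholder name, and the cleaning commands further down use pw-inspector for the 8-63 character cut rather than awk:

```bash
# wordlist.lst is a placeholder - point it at any list from the table above
total=$(wc -l < wordlist.lst)                                  # Graph 1: number of lines
unique=$(sort -u wordlist.lst | wc -l)                         # feeds Graph 2: percentage of unique words
wpa=$(awk 'length >= 8 && length <= 63' wordlist.lst | wc -l)  # feeds Graph 5: candidates valid for WPA
echo "total=$total unique=$unique ($((unique * 100 / total))%) wpa=$wpa ($((wpa * 100 / total))%)"
```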
A quick note on why everything gets sorted before removing duplicates: on its own, uniq only removes *adjacent* duplicate lines, whereas sorting first (or using awk) removes every repeat:

Start Value | uniq | sort \| uniq (awk '!x[$0]++') |
---|---|---|
word1,word2,word2,word3 | word1,word2,word3 | word1,word2,word3 |
word1,word2,word2,word3,word1 | word1,word2,word3,word1 | word1,word2,word3 |
word1,word2,word1,word1,word2,word3,word1 | word1,word2,word1,word2,word3,word1 | word1,word2,word3 |
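The table above is easy to reproduce; a quick sketch using the same placeholder values:

```bash
# Same values as the last row of the table above
printf 'word1\nword2\nword1\nword1\nword2\nword3\nword1\n' > demo.lst

uniq demo.lst             # only removes *adjacent* duplicates -> word1 word2 word1 word2 word3 word1
sort demo.lst | uniq      # sorts first, so every duplicate goes -> word1 word2 word3
awk '!x[$0]++' demo.lst   # removes all duplicates while keeping the original order -> word1 word2 word3
```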
For reference, these are the commands each collection went through to be merged and cleaned:

```bash
# Merging
rm -vf CREADME CHANGELOG* readme* README* stage*
echo "Number of files:" `find . -type f | wc -l`
cat * > /tmp/aio-"${PWD##*/}".lst && rm * && mv /tmp/aio-"${PWD##*/}".lst ./ && wc -l aio-"${PWD##*/}".lst
file -k aio-"${PWD##*/}".lst

# Uniq Lines
cat aio-"${PWD##*/}".lst | sort -b -f -i -T "$(pwd)/" | uniq > stage1 && wc -l stage1

# "Clean" Lines (end-of-line characters, NULL bytes, non-printable characters and extra spaces)
tr '\r' '\n' < stage1 > stage2-tmp && rm stage1 && tr '\0' ' ' < stage2-tmp > stage2-tmp1 && rm stage2-tmp && tr -cd '\11\12\15\40-\176' < stage2-tmp1 > stage2-tmp && rm stage2-tmp1
cat stage2-tmp | sed "s/ */ /gI;s/^[ \t]*//;s/[ \t]*$//" | sort -b -f -i -T "$(pwd)/" | uniq > stage2 && rm stage2-* && wc -l stage2

# Remove HTML Tags & entities
htmlTags="a|b|big|blockquote|body|br|center|code|del|div|em|font|h[1-9]|head|hr|html|i|img|ins|item|li|ol|option|p|pre|s|small|span|strong|sub|sup|table|td|th|title|tr|tt|u|ul"
cat stage2 | sed -r "s/<[^>]*>//g;s/^\w.*=\"\w.*\">//;s/^($htmlTags)>//I;s/<\/*($htmlTags)$//I;s/&amp;*/\&/gI;s/&quot;/\"/gI;s/&#39;/'/gI;s/&#039;/'/gI;s/&lt;/</gI;s/&gt;/>/gI" | sort -b -f -i -T "$(pwd)/" | uniq > stage3 && wc -l stage3 && rm stage2

# Remove Email addresses
cat stage3 | sed -r "s/\w.*\@.*\.(ac|ag|as|at|au|be|bg|bill|bm|bs|c|ca|cc|ch|cm|co|com|cs|de|dk|edu|es|fi|fm|fr|gov|gr|hr|hu|ic|ie|il|info|it|jo|jp|kr|lk|lu|lv|me|mil|mu|net|nil|nl|no|nt|org|pk|pl|pt|ru|se|si|tc|tk|to|tv|tw|uk|us|ws|yu):*//gI" | sort -b -f -i -T "$(pwd)/" | uniq > stage4 && wc -l stage4 && rm stage3

# Misc (8-63 character "WPA" version and packaging)
pw-inspector -i aio-"${PWD##*/}".lst -o aio-"${PWD##*/}"-wpa.lst -m 8 -M 63 ; wc -l aio-"${PWD##*/}"-wpa.lst && rm aio-"${PWD##*/}"-wpa.lst
pw-inspector -i stage4 -o stage5 -m 8 -M 63 ; wc -l stage5
7za a -t7z -mx9 -v200m stage4.7z stage4
du -sh *
```
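As a quick sanity check of the "clean" stage above, the tr filters can be tried on a throwaway sample (the sample string below is made up purely for illustration):

```bash
# Hypothetical one-liner to see what the "printable only" filters above actually do
printf 'pass\0word\r\nsécret\n' | tr '\r' '\n' | tr '\0' ' ' | tr -cd '\11\12\15\40-\176'
# Output: "pass word", a blank line, then "scret" (the NULL byte becomes a space and the non-ASCII character is dropped)
```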
AIO + Sort:

```bash
cat * > /tmp/aio-"${PWD##*/}".lst && rm * && mv /tmp/aio-"${PWD##*/}".lst ./

# End Of Line/New Line & "printable" characters only
tr '\r' '\n' < aio-"${PWD##*/}".lst > stage1-tmp && tr '\0' ' ' < stage1-tmp > stage1-tmp1 && tr -cd '\11\12\15\40-\176' < stage1-tmp1 > stage1-tmp && mv stage1-tmp stage1 && rm stage1-*

# Strip spaces, HTML tags & entities
htmlTags="a|b|big|blockquote|body|br|center|code|del|div|em|font|h[1-9]|head|hr|html|i|img|ins|item|li|ol|option|p|pre|s|small|span|strong|sub|sup|table|td|th|title|tr|tt|u|ul"
cat stage1 | sed -r "s/ */ /gI;s/^[ \t]*//;s/[ \t]*$//;s/<[^>]*>//g;s/^\w.*=\"\w.*\">//;s/^($htmlTags)>//I;s/<\/*($htmlTags)$//I;s/&amp;*/\&/gI;s/&quot;/\"/gI;s/&#39;/'/gI;s/&#039;/'/gI;s/&lt;/</gI;s/&gt;/>/gI" > stage2 && rm stage1

sort -b -f -i -T "$(pwd)/" stage2 > stage3 && rm stage2

# Split the list so entries of one or two words get placed at the start of the final list
grep -v " * .* " stage3 > stage3.1
grep " * .* " stage3 > stage3.4
rm stage3

for fileIn in stage3.*; do
   # Find duplicated lines (the input is already sorted, else uniq could miss values, e.g. test1 test2 test3 test2 test4)
   cat "$fileIn" | uniq -c -d > stage3.0
   # Sort by amount of duplicates (9-0), then by the value (A-Z)
   sort -b -f -i -T "$(pwd)/" -k1,1r -k2 stage3.0 > stage3 && rm stage3.0
   # Remove the "formatting" that uniq -c adds (counts and lots of spaces at the start)
   sed 's/^ *//;s/^[0-9]* //' stage3 >> "${PWD##*/}"-clean.lst && rm stage3
   # Then add the values which only appear once at the end (A-Z)
   cat "$fileIn" | uniq -u >> "${PWD##*/}"-clean.lst
   rm "$fileIn"
done

rm -f stage* #aio-"${PWD##*/}".lst
wc -l "${PWD##*/}"-clean.lst
md5sum "${PWD##*/}"-clean.lst
```

If you want to try all of this out for yourself, you can find some more wordlists here: