Content Discovery: Directories, Files and Links Enumeration

Hacktivist-Attacker
May 29, 2024 · 7 min read


Welcome, friends 😃! In this blog, we will learn how to discover the contents of our target.


In the previous post, we exploited our vulnerable target's technologies 💥. If you haven't read the previous post, you can check it out below.

Let's dive into the blog.

Yeah, we will start… Are you ready?!!

What Does "Content" Mean 🤔?

Basically, a domain is built from different kinds of data, and that data is what we call content. A normal domain renders images, videos and other data to give users a better feel and a better experience. As normal users, we only see these kinds of data (images, videos, etc.) when we visit the domain/site.
However, the application also contains content (backup files, config files, etc.) that is hidden from the application's GUI. It sits behind the GUI and is referenced in the source code. These kinds of data are needed for the application to work properly.

For example, the company's logo and some report files appear in the web application, but the log file and other sensitive content are hidden from it. The log file can be used by the application's administrator to track what is going on in the application. Like the log file, this hidden content can be abused by hackers. So, as penetration testers, we want to find any sensitive content and keep it out of the wrong hands. Remember, files are not the only sensitive content; some external links can also lead to data exposure.

Content discovery also gives us a good picture of the application's structure, so we perform it in the recon stage!

Next, What Is Content Discovery 🤔?

Simply put, content discovery is discovering/enumerating all the available content in our target's web application. This content can be files, directories and links.

We can divide the content discovery methodology into two parts:

  1. Crawling → exploring all the content from our target's resources (source code, referenced links, JS, CSS, …).
  2. Brute forcing → forcing the target to return resources, if it has them.

We will look at the techniques one by one. Let's dive into the enumeration phase…

NOTE:

For both of these techniques, most of the tools actively interact with our target. So before you start crawling, check whether the domain's policy allows you to perform active techniques!

Let's start…

1. Crawling:

Crawling is a technique of exploring all the content from our target's resources (source code, referenced links, JS, CSS, …).

Every web application's source code is available right behind it. To view a page's source code, simply right click with your mouse or touchpad and choose the "View page source" option in your browser. The page's source code will open in a new tab. We can analyze the source code to extract files, endpoints and more. However, a single page can easily contain 2,000 lines of code, which becomes 20,000 lines across ten pages. Analyzing them manually is very hard, and it quickly hits our laziness 😒…
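To see what this manual digging looks like in practice, here is a rough sketch using a couple of standard command line tools (example.com is just a placeholder target; adjust the pattern to whatever you are hunting for):

# Fetch a page and pull out anything that looks like a linked resource or script path.
curl -s https://example.com \
  | grep -Eo '(href|src)="[^"]+"' \
  | cut -d'"' -f2 \
  | sort -u

Even this only scrapes a single page by hand.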


But programs (tools) can do the job again and again and never get lazy 😀. Let's look at the tools (crawlers) below.

Tools:

Automation is always ultimate…

1. Hakrawler → Web crawler for gathering URLs and JavaScript file locations.

https://github.com/hakluke/hakrawler.

hakluke's hakrawler from GitHub.

Install:

go install github.com/hakluke/hakrawler@latest
# Tools installed with the go command can only be run from the command line if the "go/bin" path is set in your PATH environment variable.

Usage:

echo https://example.com | hakrawler -subs -d 5

# Usage explanation:

echo <your target> : pipe the target URL into the crawler
-subs : also crawl content from subdomains
-d : depth (how many levels to crawl)

# To learn more about the usage, simply run the help command:
hakrawler -h
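Since hakrawler prints one URL per line, its output pipes nicely into other tools. For example, a small sketch that keeps only the JavaScript files and saves them for later analysis (js-files.txt is just a placeholder name):

# Keep only the JavaScript files hakrawler finds and save them for later review.
echo https://example.com | hakrawler -subs -d 5 | grep -E '\.js(\?|$)' | sort -u > js-files.txt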

2. crawley → Crawls web pages and prints any link it can find.

https://github.com/s0rg/crawley.

s0rg's crawley from GitHub.

Install:

1. Download the .deb file (or another supported package type) for your system:

https://github.com/s0rg/crawley/releases

2. Install the .deb package on your system:

sudo dpkg -i crawley_v1.7.5_.amd64.deb
# Replace crawley_v1.7.5_.amd64.deb with the name of the .deb file you downloaded.

Usage:

crawley -depth -1 -all -workers 4 https://example.com

#Usage Explanation:

-depth : depth (how many levels to crawl; -1 means no depth limit)
-all : enumerate content from all resources (JS, CSS, …)
-workers : number of workers used for crawling

# To learn more about the usage, run crawley's help command:
crawley -h
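As with hakrawler, I usually save crawley's findings to a file and de-duplicate them so the list is easier to review later. A quick sketch (crawled-urls.txt is a placeholder name):

# Save the crawl results and drop duplicate URLs.
crawley -depth -1 -all -workers 4 https://example.com | sort -u > crawled-urls.txt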

Internet Archives:

Internet archives are a time travel mechanism for the internet. They crawl domains regularly, save the data in their databases and give the public access to it. Some of them are non-profit organizations; basically, they give us access to archived resources on the internet.

Some of these archive services are the WaybackMachine, Common Crawl and AlienVault's Open Threat Exchange.

3. WaybackMachine:

source: https://archive.org/

You can search for your target domain in the WaybackMachine and traverse all the archived content from the past to the present.

WaybackMachine returns the archived URLs of example.com

We can search and filter the results, and we can even navigate to a screenshot of the archived page!

We get all these results in the GUI. However, we need the URLs in text format so we can analyze and test them later. The tool waybackurls runs from the command line and returns the results from the WaybackMachine.
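Under the hood, such tools talk to the Wayback Machine's public CDX API, which you can also query directly with curl. A minimal sketch (the parameters shown are from the public CDX API; double-check them against the archive's documentation):

# Ask the CDX API for every archived URL it knows under example.com.
# fl=original keeps only the URL column; collapse=urlkey drops duplicate captures.
curl -s "http://web.archive.org/cdx/search/cdx?url=example.com/*&fl=original&collapse=urlkey" > wayback-urls.txt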

4. Waybackurls → Fetch all the URLs that the Wayback Machine knows about for a domain.

https://github.com/tomnomnom/waybackurls.

tomnomnom's waybackurls from GitHub.

Install:

go install github.com/tomnomnom/waybackurls@latest

Usage:

echo example.com | waybackurls

#Usage Explanation:

echo <target> : pipe the domain to the tool so it can query the WaybackMachine for results

# For more options:
waybackurls -h

The tool will return all the archived pages and links for the given domain.
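Because the output is plain text, it is easy to grep for file types that tend to be interesting. A small sketch (the extension list is just an example; extend it to suit your target):

# Keep only archived URLs whose extensions often indicate sensitive files.
echo example.com | waybackurls | grep -E '\.(bak|old|sql|zip|tar\.gz|env|config)(\?|$)' | sort -u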

5. Gau (getallurls) → Fetch known URLs from AlienVault's Open Threat Exchange, Common Crawl and the Wayback Machine.

https://github.com/lc/gau

lc's gau from GitHub…

Like waybackurls, gau (getallurls) is another awesome tool that combines three archives to crawl our target domain. Gau will explore our target's URLs using the AlienVault Open Threat Exchange, Common Crawl and Wayback Machine crawlers.

Install:

go install github.com/lc/gau/v2/cmd/gau@latest

Usage:

echo example.com | gau 

#Usage Explanation:

echo <target> : pipe the domain to the tool so it can query the archive crawlers for results

# For more options:
gau -h
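Since gau and waybackurls query overlapping sources, I often merge their output and de-duplicate it into one list (all-archived-urls.txt is a placeholder name):

# Run both archive tools and merge their results into a single de-duplicated file.
{ echo example.com | gau; echo example.com | waybackurls; } | sort -u > all-archived-urls.txt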

2. Brute Forcing:

Brute forcing is an awesome technique based on forcing the target to return resources, if it has them. Simply put, it is like guessing the resources of our target domain.
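To make the idea concrete, here is a toy sketch of what the tools below automate (words.txt is a hypothetical wordlist with one candidate path per line; real tools add threading, smart filtering and much more):

# Try each word as a path and report anything that does not come back as 404.
while read -r word; do
  code=$(curl -s -o /dev/null -w '%{http_code}' "https://example.com/$word")
  if [ "$code" != "404" ]; then
    echo "$code https://example.com/$word"
  fi
  sleep 1   # be gentle with the target
done < words.txt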

Tools:

1. Gobuster → A tool written in Go for directory/file and DNS brute forcing.

https://github.com/OJ/gobuster.

OJ's Gobuster from GitHub.

Install:

go install github.com/OJ/gobuster/v3@latest

Usage:

gobuster dir -u https://example.com -w /path/to/wordlist.txt -e --delay 1s -t 10 -o target-dirs.txt

#Usage Explanation:

dir : directory and file enumeration mode
-u : target URL
-w : wordlist (a file that contains a list of candidate names for files and dirs)
-e : print each result as a full URL
--delay : time to wait between requests (1s means 1 second)
-t : number of concurrent threads (here, 10 threads)
-o : save the output to a file

# To list more usage options:
gobuster dir -h
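Gobuster can also append file extensions to each word, which helps when you are hunting for files rather than directories. A sketch (to my knowledge the -x flag takes a comma-separated extension list; confirm with gobuster dir -h):

# Brute force files by also guessing common extensions (the extension list is just an example).
gobuster dir -u https://example.com -w /path/to/wordlist.txt -x php,txt,bak --delay 1s -t 10 -o target-files.txt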

2. Dirsearch → Brute forcer for identifying the contents of a web application.

https://github.com/maurosoria/dirsearch

maurosoria's dirsearch from GitHub.

Install:

# Clone the tool
git clone https://github.com/maurosoria/dirsearch.git --depth 1
cd dirsearch

# Install the tool with Python for the current user
python3 setup.py install --user
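If the setup script gives you trouble, dirsearch is also published on PyPI (at the time of writing), so a plain pip install may be enough:

# Hedged alternative install via pip; check the project README if this does not work.
pip3 install dirsearch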

Usage:

dirsearch -u https://example.com -w /path/to/wordlist.txt --full-url -i 200-599 --delay=1 -t 10 -o target-dirs.txt

#Usage Explanation:

-u : target URL
-w : wordlist (a file that contains a list of candidate names for files and dirs)
--full-url : print each result as a full URL
-i : include only these status codes in the results (here, 200-599)
--delay : time to wait between requests (delay=1 means 1 second)
-t : number of threads (here, 10 threads)
-o : save the output to a file

# To list more usage options:
dirsearch -h
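Like gobuster, dirsearch can guess file extensions and hide noisy status codes. A sketch (to my knowledge -e sets the extensions and -x excludes status codes; verify with dirsearch -h):

# Brute force files with common extensions and hide 404 responses.
dirsearch -u https://example.com -w /path/to/wordlist.txt -e php,bak,txt -x 404 --delay=1 -t 10 -o target-files.txt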

We can tell whether a resource exists from the response code. The tool outputs results with various response codes; for example, a 200 status code means the file or folder exists on the application.

CAUTION ❌:

Be aware of the number of threads you use when brute forcing. Using a higher thread count than the target's policy allows can be considered illegal and can harm our target domain's availability. So always check your target's policy before automating.

Once we have enumerated all the content, we look for any sensitive content available on our target. We can manually analyze the results and verify whether they contain anything sensitive (backup and config files/folders, sensitive links, etc.).
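One simple way to triage everything we collected is to grep the combined URL list for names that often point at sensitive material (a sketch; all-urls.txt is a placeholder for your merged crawler and brute force output, and the keyword list is just an example):

# Flag URLs or paths whose names suggest backups, configs or other sensitive material.
grep -Ei '(backup|\.bak|\.old|config|\.env|dump|\.sql|admin|secret)' all-urls.txt | sort -u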

After the content discovery part, we can move on to the next part of recon, which is JavaScript analysis and exploitation. Thank you for your attention, guys 💫💫. We will meet in the next blog 💖…

Bye Bye dudes....


Hacktivist-Attacker

The person who can help you become the best version of yourself in the world of web penetration testing / bug bounty.