Updated: February 24, 2023.
In this guide, you will learn, step by step, how to use JetOctopus to analyze log files and understand how Googlebot and other bots crawl your website. It is a beginner-friendly guide, ideal for anyone who wants to get started with log file analysis and doesn't have much experience with it yet.
A log file analysis is one of the more advanced technical SEO tasks, and it may seem a bit daunting if you have never done it before.
However, my goal with this guide is to show you that log file analysis can actually be fun and you don’t need to be an absolute pro to do it.
I swear that once you start diving deep into log files, you will fall in love with them.
Ready? Let’s get started.
What you need to get started with log file analysis
To be able to do a log file analysis of your website, you need the following:
- Access to JetOctopus where you have your website added as a project
- Access to your server to be able to download log files
- Depending on how you want to access your logs, you may need an SFTP client like FileZilla to download the log files to your computer and then upload them to JetOctopus.
- Actual log files from your server
❗Make sure to also check my guide on how to audit your website with JetOctopus.
❓Looking to hire someone to audit your website or do SEO? Make sure to check the SEO services I offer such as SEO audits, SEO consulting, SEO mentorship, and monthly SEO services.
👉 Contact me for more information or if you have any questions or learn more about why you want to hire me as your SEO consultant.
Get prepared to analyze log files
Here is all you need to do, step by step. My assumption is that you have just created a JetOctopus account and have no fresh data there yet.
Let’s start by setting up JetOctopus and crawling your website.
1. Open JetOctopus and click + Create Project to add your website. Add all data as required.
2. Click + New Crawl to initiate the crawl and have the freshest data. Configure the crawl settings and integrate your website with GSC (you will need that in the future).
3. Start the crawl and get back when it’s done.
4. Once the crawl is complete, navigate to Log Analyzer > Manage Logs > Integrations and enable Logs integration.
Access & download log files from your server
Now we need to actually feed JetOctopus with log files. All you need to do is access your server and download log files from the desired period.
Depending on your web host, this process may differ.
❗Below I’m showing you how I did it with Cloudways which I honestly think is the best hosting option you can get.
1. When in Cloudways, navigate to Applications and click on your website.
2. Under Application Management, navigate to Access Details.
3. Under APPLICATION CREDENTIALS, you will find the credentials for SFTP or SSH access to your website. Create new credentials if needed.
4. Open an SFTP client like FileZilla (which is what I am using).
5. Enter your credentials, connect to your server and navigate to the logs folder.
6. Download the entire folder to your computer. Review the files downloaded and select which files you actually want to analyze.
Upload log files to JetOctopus
Let’s now upload the downloaded log files to JetOctopus:
1. When in JetOctopus, navigate to Log Analyzer > Manage Logs > Upload files and upload the log files in the Apache and/or Nginx format.
Your downloaded files are likely to be in .gz format. Extract them before uploading.
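If you have many rotated log files, extracting them one by one gets tedious. Here is a minimal Python sketch that decompresses every .gz file in a folder using only the standard library. The folder path and the `extract_gz_files` function name are my own illustrative choices, not anything JetOctopus requires.

```python
import glob
import gzip
import shutil

def extract_gz_files(folder="."):
    """Extract every .gz file in `folder`, writing the uncompressed
    copy next to it (e.g. access.log.1.gz -> access.log.1)."""
    extracted = []
    for gz_path in glob.glob(f"{folder}/*.gz"):
        out_path = gz_path[:-3]  # strip the ".gz" suffix
        with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
        extracted.append(out_path)
    return extracted
```

Point it at the folder you downloaded from your server, then upload the resulting uncompressed files to JetOctopus.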
2. Upload the files and click Import Files.
3. Once the import is complete and JetOctopus has correctly identified the files, you will see something like this. Click on Manage logs to start!
4. I have uploaded the access log for the last four days (June 16-June 19).
Now you have all you need to do a log file analysis.
Analyze your log files with JetOctopus!
And now the fun part you have been waiting for begins!
Let me show you a few things you can analyze here. Of course, what exactly you want to analyze depends on the purpose of your log file analysis and your technical skills.
Here I am showing you basic things you can learn about your website and the bots visiting it.
Get a general overview of bots visiting your website.
Navigate to Overview and select the period you want to analyze.
As you can see in the screenshot above, Googlebot accounts for 77% of bot visits to my website in the last 48 hours.
The default segments in Overview let you analyze all pages visited by bots (All Pages), indexable pages only, pages visited by Googlebot only (Visited by Googlebot), pages visited by bots other than Googlebot (Not Visited by Googlebot), and more.
Note that you can also create your own segments.
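JetOctopus computes these bot shares for you, but it can be a useful sanity check (and a good learning exercise) to tally user agents from a raw access log yourself. Below is a minimal sketch assuming the standard Apache/Nginx combined log format, where the User-Agent is the last quoted field; the `BOT_TOKENS` list is an illustrative sample, not an exhaustive one.

```python
import re
from collections import Counter

# Known crawler substrings to look for in the User-Agent field
# (an illustrative list only).
BOT_TOKENS = ["Googlebot", "bingbot", "YandexBot", "AhrefsBot"]

# In the combined log format, the User-Agent is the last quoted field.
UA_RE = re.compile(r'"([^"]*)"$')

def bot_visit_counts(lines):
    """Count visits per known bot from combined-format log lines."""
    counts = Counter()
    for line in lines:
        m = UA_RE.search(line.strip())
        if not m:
            continue
        ua = m.group(1)
        for token in BOT_TOKENS:
            if token in ua:
                counts[token] += 1
                break
    return counts
```

Feed it the lines of an access log file and you get a per-bot visit count, much like the Overview chart.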
Check interesting stats of the activity of the main bots (Google, Bing, Yandex) on your website.
Go to Impact and select the bot you want to analyze.
You can see data like which parts of your website a given crawler crawls the most (e.g. pages that are internally linked vs. those that are not), how deep the crawler goes, and more.
Check for possible health issues.
Navigate to Health. You will see data like the types of problems encountered, the dynamics of problems (bot and organic visits), and HTTP methods.
As you can see in the screenshot below, the main issue JetOctopus detected for my site is that I have a bunch of non-permanent redirects (302).
Check what other bots are visiting your site.
Navigate to Other Bots. This report is really interesting!
You will see the ratio of good vs fake bots. You can also see the list of non-search bots (e.g. LinkedIn bot, Ahrefs, etc.). It is good to know who is spidering your website.
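The good-vs-fake distinction matters because anyone can put "Googlebot" in a User-Agent string. The standard way to verify a claimed Googlebot is the reverse-then-forward DNS check Google documents: reverse-resolve the IP, confirm the hostname belongs to googlebot.com or google.com, then resolve that hostname back to the same IP. A minimal sketch (the resolver functions are injectable parameters so the logic can be tested without network access):

```python
import socket

def is_real_googlebot(ip,
                      reverse=socket.gethostbyaddr,
                      forward=socket.gethostbyname):
    """Verify a claimed Googlebot IP via reverse + forward DNS:
    the reverse hostname must end in googlebot.com or google.com,
    and must resolve back to the original IP."""
    try:
        host = reverse(ip)[0]
    except OSError:
        return False
    if not host.endswith((".googlebot.com", ".google.com")):
        return False
    try:
        return forward(host) == ip
    except OSError:
        return False
```

JetOctopus does this kind of verification for you, but it is good to understand how the classification works under the hood.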
Analyze Googlebot & look for potential issues
And, of course, since Google is the dominant player, we are most interested in what Googlebot is doing on our websites. Below are a few basic things you can learn about how Googlebot crawls your site.
Check what pages and resources Googlebot is visiting.
Navigate to Bot Dynamics > Google to get an overview of how Googlebot has been visiting your website. Select the time period you want to analyze.
On the first screen, you can see information about the total number of visits, the pages visited, an overview of status codes, and bot load time.
Scroll down to see more information like which type of Googlebot was dominant (Desktop or Mobile), the dynamics of mobile/desktop bots, Googlebot versions, and more.
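If you are curious how the desktop/mobile split can be derived from the raw logs, the two crawler types are distinguishable by their User-Agent strings: the smartphone crawler advertises an Android device with "Mobile" in the string. A tiny illustrative classifier (the function name is my own):

```python
def googlebot_type(user_agent):
    """Classify a User-Agent string as 'mobile' or 'desktop' Googlebot,
    or None if it is not Googlebot at all. The smartphone crawler's
    User-Agent contains the token 'Mobile'; the desktop one does not."""
    if "Googlebot" not in user_agent:
        return None
    return "mobile" if "Mobile" in user_agent else "desktop"
```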
Click on View pages under Crawl Budget in Pages to see exactly what pages Googlebot was visiting.
This will open the Pages in Logs section with a filter for Googlebot applied.
Click on View pages under Crawl Budget in Visits to see all the resources that Googlebot has been crawling and spending crawl budget on.
This will open the Raw Logs data.
Check resources that returned status codes other than 200 when Googlebot visited them.
Under Data Tables > Raw Logs, select Non-200 Status to analyze the status codes other than 200 that Googlebot encountered.
As you can see in the screenshot below, Googlebot came across the 304 (Not Modified) status a lot.
To view only pages with status codes other than 200, go to Pages in Logs > Pages with 4xx, 5xx, or Pages with 304.
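If you ever want to double-check a report like this against the raw files, tallying non-200 status codes yourself takes only a few lines. This sketch assumes the combined log format, where the status code is the first three-digit number right after the quoted request:

```python
import re
from collections import Counter

# In the combined log format the status code follows the quoted
# request, e.g. ... "GET /x HTTP/1.1" 404 153 ...
STATUS_RE = re.compile(r'" (\d{3}) ')

def non_200_summary(lines):
    """Tally every non-200 status code found in combined-format log lines."""
    codes = Counter()
    for line in lines:
        m = STATUS_RE.search(line)
        if m and m.group(1) != "200":
            codes[m.group(1)] += 1
    return codes
```

Combine this with a User-Agent filter like the one above and you have a rough, DIY version of the Non-200 Status report.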
Check if the Google crawl budget is being wasted.
Crawl budget is not an issue for small websites (under tens of thousands of pages), but it can become a huge problem for very large sites (millions of pages).
You can check whether crawl budget is being wasted by analyzing the pages returning 4xx (client error) and 5xx (server error) status codes. The latter may indicate that your server is overloaded, which may cause Google to crawl your website less.
To check if Googlebot is crawling any resources returning errors (4xx status codes), go to Data Tables > Raw Logs > 4xx Status. If you see thousands or tens of thousands of URLs returning 4xx, make sure to investigate more carefully.
To check if Googlebot is encountering resources returning 5xx status codes, navigate to Data Tables > Raw Logs > 5xx Status. Fortunately, I don't have anything there!
Practical tips & insights
To make the most of your log file analysis, I suggest the following:
- Analyze a longer period of time to draw meaningful conclusions. I suggest analyzing at least one month’s worth of data. When analyzing, make sure to switch between different date ranges to spot trends or differences on specific days or in specific weeks.
- Don’t be afraid to play with Raw Logs and use filters to find exactly what you are looking for.
- If your website is not a massive one, I encourage you to review all Raw Logs and all Pages in Logs. It will be a great exercise for you to understand how bots are crawling your website and what they come across.
As you can see, thanks to awesome tools such as JetOctopus Log Analyzer, you don’t need to be a super-advanced tech SEO geek to be able to analyze log files. And you don’t need to do that in Excel, either.
If you like this tutorial, please share it with other SEOs or share it on Twitter. If you haven’t already, follow me on Twitter and subscribe to the SEOSLY YouTube channel. Thank you!
Make sure to check my other similar guides: