2016-09-27

Optimizing a website for your target audience can be tricky. Optimizing a large website for your target audience can be even trickier. Since the last article, DeepCrawl has launched a significant update: a brand new crawler that is a lot faster, along with a bunch of new reports.

Below are 46 new and updated tips and tricks to optimise the search signals for your websites in Google’s organic search.

Spend more time making recommendations and changes

(and less time trawling through data)

1. Crawl MASSIVE Sites

If you have a really large website with millions of pages, you can scan an unlimited number of URLs with the custom setting – so long as you have enough credits in your account!

How To Do It:

From the crawl limits in step 3 of crawl set up, adjust the crawl limit to suit your target domain’s total URLs

Crawl up to 10 million using pre-fabricated options from the dropdown menu

For a custom crawl, select “custom” from the dropdown menu and adjust max URLs and crawl depth to suit your reporting needs



2. Compare Previous Crawls

Built into DeepCrawl is the ability to compare your current crawl with the most recent one. This is useful for tracking changes as they are implemented, and for providing rich data to show your organisation the (hopefully positive) impact of your SEO changes on the site. You’ll also be able to see all of your previous crawls.

How To Do It:

On step 4 of your crawl set up, you can select to compare your new crawl to a previous one



3. Monitor Trends Between Crawls

Tracking changes between crawls gives you powerful data to gauge site trends, get ahead of any emerging issues, and spot potential opportunities. DeepCrawl highlights these for you through the Added, Removed, and Missing reports. These are populated once a project has been re-crawled, and appear in every metric reported upon.

Once a follow-up crawl is finished, the new crawl shows your improved stats in green and potential trouble areas in red.

In addition to calculating the URLs which are relevant to a report, DeepCrawl also calculates the changes in URLs between crawls. If a URL appears in a report where it did not appear in the previous crawl, it will be included in the ‘Added report’. If the URL was included in the previous crawl, and is present in the current crawl, but not in this specific report, then it is reported within the ‘Removed report’. If the URL was in the previous crawl, but was not included in any report in the current crawl, it is included in the ‘Missing report’ (e.g. the URL may have been unlinked since last crawled).
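The logic behind these three reports boils down to simple set operations. Here is a minimal sketch of how the buckets relate – the URL sets are hypothetical examples and the variable names are mine, not DeepCrawl’s internals:

```python
# Minimal sketch of the Added/Removed/Missing logic as set operations.
# The URL sets are hypothetical examples, not DeepCrawl internals.

previous_report = {"/a", "/b", "/c"}   # URLs in this report in the previous crawl
current_report  = {"/c", "/d"}         # URLs in this report in the current crawl
current_crawl   = {"/b", "/c", "/d"}   # every URL found anywhere in the current crawl

added   = current_report - previous_report                    # new to this report
removed = (previous_report - current_report) & current_crawl  # still crawled, left the report
missing = previous_report - current_crawl                     # gone from the crawl entirely

print(sorted(added), sorted(removed), sorted(missing))
# ['/d'] ['/b'] ['/a']
```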



4. Filters, filters, filters!

DeepCrawl reports are highly customisable. With well over 120 filtering options, you can really drill down into the data and isolate exactly what you are looking for in relation to your specific SEO strategy and needs, whether that’s pages with high load times or GA bounce rates, missing social tags, broken JS/CSS, or a low content-to-HTML ratio. You can even save your filters by creating tasks within reports, so the SEO issues that matter most to you will be flagged every time you recrawl, making both the thorns in your side and your progress easy to monitor!

5. Assign Tasks, Ease Workflows

After you’ve gone through a big audit, assigning and communicating all your to-dos can be challenging for any site owner. Using the built-in task manager, a button at the top right of each area of your crawl, you can assign tasks to your team members as you go along, and give each task a priority. This system helps you track actions from your site audit in the Issues area, easing team workflows by showing what is outstanding and what you’ve achieved so far. You can also add deadlines and mark projects as fixed, all from the same screen in the DeepCrawl projects platform. Collaborators receiving tasks do not need a DeepCrawl account themselves, as they’ll have guest access to the specific crawl report shared with them.

6. Share Read-Only Reports

This is one of my favourite options: sharing reports with C-levels and other decision makers, without giving them access to other sensitive projects, is easily done with DeepCrawl. Generate an expiring URL to give them a high-level view of the site crawl as a whole, or kick out a PDF that focuses on a granular section, such as content, indexation or validation. This also doubles as a way to share links externally when prospecting clients, especially if you’ve branded your DeepCrawl reports with your name and company logo.

7. DeepCrawl is now Responsive

With DeepCrawl’s new responsive design, crawl reports look great across devices. You can also set off crawls on the go, which is useful for a quick site audit or pitch. Crawls started from the palm of your hand can be edited while you monitor them in real time, in case you need to alter the speed or levels.

8. Brand your Crawl Reports

Are you a freelancer or an agency? You can white-label DeepCrawl reports with your business information and logo (or a client’s logo), serving data to your team or client formatted as though it came directly from your shop.

How To Do It:

Go to Account Settings

Select from Theme, Header colour, Menu colour, Logo and Custom proxy

Make the report yours!

Optimise your Crawl Budget

9. Crawl Your Site like Googlebot

Crawl your website just like search engine bots do. Getting a comprehensive report of every URL on your website is a mandatory component for regular maintenance. Crawl and compare your website, sitemap and landing pages to identify orphan pages, optimise your sitemaps and prioritise your workload.

10. Optimise your Indexation

DeepCrawl gives you the versatility to get high-level and granular views of indexation across your entire domain. Check if search engines can see your site’s most important pages from the Indexation report, which sits just under the Dashboard on the left hand navigation area. Investigate no-indexed pages to make sure you’re only blocking search engines from URLs when it’s absolutely necessary.

11. Discover Potentially Non-indexable Pages

To stop you wasting crawl budget and to identify wrongly canonicalised content, the Indexation reports show you a list of all no-indexed pages on your site, with details of their meta tags (e.g. nofollow, rel canonical, noindex). Pages with noindex directives in the robots meta tag, robots.txt or an X-Robots-Tag header should be reviewed, as they can’t be indexed by search engines.
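If you want to sanity-check a single URL outside DeepCrawl, both signals are easy to inspect by hand. A minimal sketch using the third-party requests library – the URL is a placeholder:

```python
# Minimal sketch: check one URL for noindex via the X-Robots-Tag header
# and the robots meta tag. The URL below is a placeholder.
import re
import requests

def is_noindexed(url: str) -> bool:
    resp = requests.get(url, timeout=10)
    # 1. HTTP header, e.g. "X-Robots-Tag: noindex, nofollow"
    if "noindex" in resp.headers.get("X-Robots-Tag", "").lower():
        return True
    # 2. Robots meta tag in the HTML (naive regex; assumes name before content)
    meta = re.search(
        r'<meta[^>]+name=["\']robots["\'][^>]+content=["\']([^"\']*)["\']',
        resp.text, re.IGNORECASE)
    return bool(meta and "noindex" in meta.group(1).lower())

print(is_noindexed("https://example.com/some-page"))
```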

12. Discover Disallowed URLs

The Disallowed URLs report, nested under Uncrawled URLs within Indexation, contains all the URLs disallowed by the robots.txt file on the live site, or by a custom robots.txt file in the Advanced Settings. These URLs cannot be crawled by Googlebot, which prevents them from appearing in search results, so review them to ensure that none of your valuable pages are being disallowed. It’s good to know which URLs may not be crawled by search engines.
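For spot checks, Python’s built-in urllib.robotparser applies robots.txt rules much as a crawler would (Google’s matching is slightly richer, e.g. around wildcards, so treat this as an approximation; the URLs are placeholders):

```python
# Minimal sketch: test whether robots.txt disallows URLs for a given
# user agent, using only the standard library. URLs are placeholders.
from urllib.robotparser import RobotFileParser

rp = RobotFileParser("https://example.com/robots.txt")
rp.read()  # fetch and parse the live robots.txt

for url in ["https://example.com/", "https://example.com/private/page"]:
    allowed = rp.can_fetch("Googlebot", url)
    print(f"{url} -> {'crawlable' if allowed else 'disallowed'}")
```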

13. Check Pagination

Pagination is crucial for large websites with lots of products or content, making sure the right pages display for the relevant categories on the site. You’ll find First Pages in a series in the pagination menu. You can also view unlinked paginated pages, which is really useful for hunting down pages where rel=next and rel=prev might be implemented wrongly.
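The most common wrong implementation is a non-reciprocal pair: page 2 declares page 1 as prev, but page 1 never declares page 2 as next. A rough sketch of that reciprocity check, using the third-party requests library and placeholder URLs:

```python
# Rough sketch: verify that a page's rel="next" target links back to it
# with rel="prev". Naive regex parsing; URLs are placeholders.
import re
import requests

def link_href(html: str, rel: str):
    m = re.search(
        rf'<link[^>]+rel=["\']{rel}["\'][^>]+href=["\']([^"\']+)["\']',
        html, re.IGNORECASE)
    return m.group(1) if m else None

page = "https://example.com/category?page=1"
next_url = link_href(requests.get(page, timeout=10).text, "next")
if next_url:
    prev_url = link_href(requests.get(next_url, timeout=10).text, "prev")
    print("reciprocal" if prev_url == page else f"broken: {next_url} points back to {prev_url}")
```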

Understand your Technical Architecture

14. Introducing Unique Internal Links & Unique Broken Links

The Unique Internal Links report shows you the number of instances of every anchor text DeepCrawl finds in your crawl, so you can maximise your internal linking structure and spread your link juice out to rank for more terms! The links in this report can be analysed by anchor text, as well as by the status of the target URL.

DeepCrawl’s Unique Broken Links report shows your site’s links with unique anchor text and target URL where the target URL returns a 4xx or 5xx status. Naturally, these links can result in poor UX and waste crawl budget, so they can be updated to a new target page or removed from the source page. This nifty new report is unique to DeepCrawl!

15. Find Broken Links

Fixing 404 errors reduces the chance of users landing on broken pages and makes life easier for the crawlers, so they can find the most important content on your site more easily. Find 404s in DeepCrawl’s Non-200 Pages report. This gives you a full list of 4xx errors on the site at the time of audit, including their page title, URL, source code and the link on the page found to return a 404.

You’ll also find pages with 5xx errors, unauthorised pages, non-301 redirects, 301 redirects, and uncategorised HTTP response codes, i.e. pages returning a text/html content type with an HTTP response code which isn’t defined by the W3C. These pages won’t be indexed by search engines and their body content will be ignored.

16. Fix Broken Links

Having too many broken links to internal and external resources on your website can lead to a bad user experience, as well as give the impression your website is out of date. Go to the Broken Pages report from the left hand menu to find them. Go fix them.

17. Check Redirects – Including Redirection Loops

Check the status of temporary and permanent redirects on site in the Non-200 Status report, where your redirects are nested. You can download 301 and 302 redirects, or share a project link with team members to start the revision process.

You can also find your Broken Redirects, Max Redirects and Redirect Loops. The Max Redirects report defaults to showing pages that hop more than 4 times; this number can be customised on step 4 of your crawl set up in Report Settings, nested under Report Setup within the Advanced Settings.

The new Redirect Loop report contains URL chains which redirect back to themselves. These chains result in infinite loops, causing errors in web browsers for users and preventing crawling by search engines. In short, fixing them helps both bots and users, and prevents the loss of important authority. Once found, you can update your redirect rules to prevent loops!
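If you want to trace a single suspect URL yourself, the check is to follow redirects hop by hop and stop when a location repeats. A minimal sketch with the third-party requests library (the start URL is a placeholder):

```python
# Minimal sketch: follow redirects manually and flag loops / long chains.
# Redirect following is disabled so each hop stays visible. Placeholder URL.
import requests

def trace_redirects(url: str, max_hops: int = 10):
    seen = []
    while len(seen) < max_hops:
        if url in seen:
            return seen + [url], "LOOP"       # chain redirects back on itself
        seen.append(url)
        resp = requests.get(url, allow_redirects=False, timeout=10)
        if resp.status_code not in (301, 302, 303, 307, 308):
            return seen, resp.status_code     # final destination reached
        location = resp.headers.get("Location")
        if not location:
            return seen, f"{resp.status_code} without Location header"
        url = requests.compat.urljoin(url, location)
    return seen, "MAX_REDIRECTS"              # too many hops

chain, outcome = trace_redirects("https://example.com/old-page")
print(" -> ".join(chain), "|", outcome)
```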

18. Verify Canonical Tags

Incorrect canonical tags can waste crawl budget and cause bots to ignore parts of your site, putting it at risk, as search engines may have trouble correctly identifying your content. View canonicalised pages in your DeepCrawl audit from the Indexation report. The Non-Indexable report gives you the canonical’s location and page details (like hreflang, links in/out, duplicate content, size, fetch time etc).

19. Verify HREFLANG Tags

If your website is available in multiple languages, you need to validate the site’s HREFLANG tags. You can test HREFLANG tags through the validation tab in your universal crawl dashboard.

If you have HREFLANG tags in your sitemaps, be sure to include Web Crawl in your crawl set up, as this includes crawling your XML sitemaps. DeepCrawl reports on all HREFLANG combinations, working, broken and/or unsupported links, as well as pages without HREFLANG tags (a simple reciprocity check is sketched after the steps below).

How To Do It:

The Configuration report gives you an overview of HREFLANG implementation

In the lower left menu, the HREFLANG section breaks down all the aspects of HREFLANG implementation into categorised pages
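The most frequent failure flagged here is a missing return link: page A references page B as an alternate, but B never references A back. A rough reciprocity check using requests and naive regex parsing (the URLs are placeholders; hreflang URLs are assumed absolute, as the spec requires):

```python
# Rough sketch: check that every hreflang alternate links back to the
# source page. Assumes rel/hreflang/href appear in this attribute order.
import re
import requests

HREFLANG = re.compile(
    r'<link[^>]+rel=["\']alternate["\'][^>]+hreflang=["\']([^"\']+)["\']'
    r'[^>]+href=["\']([^"\']+)["\']', re.IGNORECASE)

def alternates(url: str) -> dict:
    return dict(HREFLANG.findall(requests.get(url, timeout=10).text))

source = "https://example.com/en/page"
for lang, target in alternates(source).items():
    if source not in alternates(target).values():
        print(f"missing return link: {target} ({lang}) does not reference {source}")
```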

20. Optimise Image Tags

Using the custom extraction tool, you can extract a list of images across the site that don’t have alt tags, which can help you gain valuable rankings in Google Image Search.

How To Do It:

Create custom extraction rules using Regular Expressions

Hint: Try “/(<img(?!.*?alt=(['"]).*?\2)[^>]*)(>)/” to catch images that have alt tag errors or don’t have alt tags altogether (you can test the pattern locally first, as sketched after these steps)

Paste your code into “Extraction Regex” from the Advanced Settings link on step 4 of your crawl set up

Check your reports from your projects dashboard when the crawl completes. DeepCrawl gives you two reports when using this setting: URLs that matched at least one rule from your entered syntax, and URLs that returned no rule matches
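Before spending a crawl on a custom extraction, it’s worth testing the pattern against sample markup locally. A quick sketch with Python’s re module and the alt-tag pattern above:

```python
# Quick sketch: test the alt-attribute regex on sample HTML snippets
# before using it as a live extraction rule.
import re

pattern = re.compile(r'''(<img(?!.*?alt=(['"]).*?\2)[^>]*)(>)''')

samples = [
    '<img src="a.jpg" alt="a descriptive alt">',  # has alt: should not match
    '<img src="b.jpg">',                          # no alt: should match
    '<img src="c.jpg" alt="">',                   # empty alt still counts as present here
]
for html in samples:
    print(html, "->", "flagged" if pattern.search(html) else "ok")
```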

21. Is Your Site Mobile-Ready?

Since “Mobilegeddon”, SEOs have become keenly aware of the constant growth of mobile usage around the world, where 70% of site traffic comes from a mobile device. To optimise for the hands of the users holding those mobile devices, and for the search engines connecting them to your site, you have to send mobile users content in the best way possible, and fast!

DeepCrawl’s new Mobile report shows you whether pages have any mobile configuration and, if so, whether they are configured responsively or dynamically. The Mobile report also shows you any separate desktop/mobile configurations, mobile app links, and any discouraged viewport types.

22. Migrating to HTTPS?

Google has been calling for “HTTPS everywhere” since 2014, and HTTPS is considered a ranking signal. It goes without saying that sooner or later most sites will have to switch to the secure protocol. By crawling both HTTP and HTTPS, DeepCrawl’s HTTPS report will show you:

HTTP resources on HTTPS pages

Pages with HSTS

HTTP pages with HSTS

HTTP pages

HTTPS pages

Highlighting any HTTP resources on HTTPS pages enables you to make sure your protocols are set up correctly, avoiding issues when search engines and browsers determine whether your site is secure. Equally, your users won’t see a red padlock in the address bar instead of a green one, or get a browser warning telling them to proceed with caution because the site is insecure. Which is not exactly what people want to see when they are about to make a purchase, because when they see it, they probably won’t…
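The underlying check is straightforward: an HTTPS page shouldn’t reference any stylesheet, script or image over plain http://. A rough sketch of that scan with requests and a naive regex (the page URL is a placeholder):

```python
# Rough sketch: flag insecure http:// resources referenced from an HTTPS
# page (mixed content). Naive regex scan; the page URL is a placeholder.
import re
import requests

page = "https://example.com/checkout"
html = requests.get(page, timeout=10).text

# src/href attributes that explicitly load over plain HTTP
insecure = re.findall(r'(?:src|href)=["\'](http://[^"\']+)["\']', html, re.IGNORECASE)
for resource in sorted(set(insecure)):
    print("mixed content:", resource)
```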

23. Find Unlinked Pages driving Traffic

DeepCrawl really gives you a holistic picture of your site’s architecture. You can incorporate up to 5 sources into each crawl. By combining a website crawl with analytics you’ll get a detailed gap analysis: URLs which have generated traffic but aren’t linked (also known as orphan pages) will be highlighted for you.

Pro tip: this gets even more powerful if you add backlinks and URL lists to your crawl too! You can link your Google Analytics account and sync 7 or 30 days of data, or manually upload up to 6 months’ worth of data from GA or any other analytics provider (like Omniture). This way you can work on your linking structure, and optimise pages that would otherwise be missed opportunities.

24. Sitemap Optimisation

You can opt to manually add sitemaps to your crawl and/or let DeepCrawl automatically find them for you. It’s worth noting that if DeepCrawl does not find them, it’s likely that search engines won’t either! By including sitemaps in your website crawl, DeepCrawl identifies the pages that either aren’t linked internally or are missing from sitemaps.

By including analytics in the crawl you can also see which of these pages generate entry visits, revealing gaps in the sitemaps. It also sheds light on your internal linking structure by highlighting where things don’t match up: you’ll see the specific URLs that are found in the sitemaps but aren’t linked, and likewise those that are linked but not found in your sitemaps.

Performing a gap analysis to optimise your sitemaps with DeepCrawl enables you to visualise your site’s structure from multiple angles, and find all of your potential areas and opportunities for improvement. TIP: You can also use the tool as an XML sitemaps generator.
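At its core, this gap analysis is three overlapping URL sets – pages found by the web crawl, pages listed in the sitemaps, and pages receiving visits in analytics – and the interesting reports are the differences between them. A minimal sketch with hypothetical URL sets:

```python
# Minimal sketch of the sitemap/crawl/analytics gap analysis as set
# differences. The example URL sets are hypothetical.
crawled     = {"/", "/shoes", "/shoes/red", "/about"}   # found by the web crawl
in_sitemap  = {"/", "/shoes", "/shoes/blue", "/about"}  # listed in XML sitemaps
has_traffic = {"/", "/shoes/blue", "/old-landing-page"} # entry visits in analytics

print("in sitemap but not linked: ", in_sitemap - crawled)
print("linked but not in sitemap: ", crawled - in_sitemap)
print("orphans driving traffic:   ", has_traffic - crawled)
```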

25. Control Crawl Speed

You can crawl at rates as fast as your hosting environment can handle, though this should be used with caution to avoid accidentally taking down a site. DeepCrawl boasts one of the most nimble audit spiders available for online marketers working with enterprise-level domains.

Whilst the need for speed is real, accuracy is what matters most! That said, you can change crawl speed, measured in URLs crawled per second, when setting up or even during a live crawl. Speeds range from 0 to 50 URLs crawled per second.

Is your Content being found?

26. Identifying Duplicate Content

Duplicate content is an ongoing issue for search engines and users alike, but it can be difficult to hunt down, especially on really, really large sites. These troublesome pages are easily identified using DeepCrawl.

Amongst DeepCrawl’s duplicate reporting features lies the Duplicate Body Content report. Unlike the previous version of DeepCrawl, the new DeepCrawl doesn’t require the user to adjust sensitivity. All duplicate content is flagged, helping you avoid repeated content that can confuse search engines, prevent original sources from ranking, and add no real value for your readership.

Finding Duplicate Titles, Descriptions, Body Content, and URLs with DeepCrawl is an effortless time-saver.

27. Identify Duplicate Pages, Clusters, Primary Duplicates & Introducing True Uniques

Clicking into Duplicate Pages from the dashboard gives you a list of all the duplicates found in your crawl, which you can easily download or share. DeepCrawl now also gives you a list of Duplicate Clusters, so you can look at groups of duplicates to try to find the cause or pattern behind these authority-draining pages.

There is also a new report entitled True Uniques. These pages have no duplicates whatsoever, are the most likely to be indexed, and are naturally very important pages on your site.

Primary Duplicates do have duplicates, as the name implies, but carry the highest internal link weight within each set of duplicated pages. Though signals like traffic and backlinks need to be reviewed to determine the most appropriate primary URL, these pages should be analysed first, as they are the most likely to be indexed.

Duplicate Clusters are pages sharing an identical title and near-identical content with another page found in the same crawl. Duplicate pages often dilute authority signals and social shares, affecting potential performance and reducing crawl efficiency on your site. You can optimise clusters of these pages by removing internal links to their URLs and redirecting duplicate URLs to the primary URL, or by adding canonical tags pointing to the primary URL (a simplified clustering sketch follows the steps below).

How To Do It:

Click “Add Project” from your main dashboard

Under the crawl depth setting tell DeepCrawl to scan your website at all its levels

Once the crawl has finished, review your site’s duplicate pages from the “issues” list on your main dashboard or search for ‘duplicate’ in the left nav search bar
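Conceptually, clustering comes down to grouping pages by a fingerprint of their title plus normalised body content. A simplified sketch – exact-match hashing only, whereas DeepCrawl also catches near-identical content, and the example pages are hypothetical:

```python
# Simplified sketch: group pages into duplicate clusters by hashing the
# title plus whitespace-normalised body text. Exact matches only; the
# example pages are hypothetical.
import hashlib
from collections import defaultdict

pages = {
    "/shoes?sort=price": ("Red Shoes", "Our red shoes are great."),
    "/shoes?sort=name":  ("Red Shoes", "Our red shoes  are great."),
    "/hats":             ("Hats", "A fine selection of hats."),
}

clusters = defaultdict(list)
for url, (title, body) in pages.items():
    fingerprint = hashlib.sha1(f"{title}|{' '.join(body.split())}".encode()).hexdigest()
    clusters[fingerprint].append(url)

for urls in clusters.values():
    if len(urls) > 1:
        print("duplicate cluster:", urls)
```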

28. Sniff Out Troublesome Body Content

Troublesome content impacts UX and causes negative behavioural signals, like bouncing back to the search results. Review your page-level content after a web crawl by checking empty or thin pages and digging into duplication. DeepCrawl gives you a scalpel’s precision in pinpointing problems, right down to individual landing pages, so you can direct your team precisely to the source of the problem.

How To Do It:

Click on the Body Content report in the left hand menu

You can also select individual issues from the main dashboard

Assign tasks using the task manager or share with your team

29. Check for Thin Content

Clean, efficient code leads to fast loading sites – a big advantage in search engines and for users. Search engines tend to avoid serving pages that have thin content and extensive HTML in organic listings. Investigate these pages easily from the Thin Pages area nested within the Content report.

30. Avoid Panda, Manage Thin Content

In a post-Panda world it’s always good to keep an eye on any potentially thin content which can negatively impact your rankings. DeepCrawl has a section dedicated to thin and empty pages in the body content reports.

The Thin Pages report will show all of your pages with less than the minimum content size specified in Advanced Settings > Report Settings (defaulted to 3 KB, though you can customise it). Empty Pages are all your indexable pages with less content than the Content Size setting specified in the same place (default 0.5 KB). A minimal classification sketch follows the steps below.

How To Do It:

Typing “content” into the main menu search will bring up the body content report

Clicking on a report will give you a list of pages you can download or share
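The classification itself is just a size comparison against those two thresholds. A minimal sketch mirroring the defaults quoted above (the example pages and sizes are hypothetical):

```python
# Minimal sketch: classify pages as empty/thin/ok by content size,
# mirroring the default thresholds above. Example data only.
THIN_KB, EMPTY_KB = 3.0, 0.5

pages = {"/full-article": 12.4, "/stub": 1.8, "/placeholder": 0.2}  # content size in KB

for url, size_kb in pages.items():
    label = "empty" if size_kb < EMPTY_KB else "thin" if size_kb < THIN_KB else "ok"
    print(f"{url}: {size_kb} KB -> {label}")
```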

31. Optimise Page Titles & Meta Descriptions

Page titles and meta descriptions are often the first point of contact users have with your site in the search results; well-written, unique titles and descriptions can have a big impact on click-through rates and user experience. Through the Content report, DeepCrawl gives you an accurate count of duplicate, missing and short meta descriptions and titles.

32. Clean Up Page Headers

Cluttered page headers can impair the click through rate if users’ expectations are not being managed well. CTRs can vary by wide margins, which makes it difficult to chart the most effective path to conversion.

If you suspect your page headers are cluttered, run a crawl with Google Analytics data to assess key SEO landing pages and gain deeper insights by combining crawl data with powerful analytics data, including bounce rate, time on page, and load times.

Other nuggets that DeepCrawl gives

33. How Does Your Site Compare to Competitors?

Set up a crawl using the “Stealth Crawl” feature to perform an undercover analysis of a competitor’s site without them ever noticing. Stealth Crawl randomises IPs and user agents, with delays between requests, making it virtually indistinguishable from regular traffic. Analyse their site architecture, see how your site stacks up in comparison, and discover areas for improvement along the way.

How To Do It:

Go to the Advanced Settings in step 4 of your crawl setup and tick Stealth Mode Crawl, nested under the Spider Settings

34. Test Domain Migration

There are always issues with newly migrated websites, which usually manifest as page display errors or the site going down. By checking status codes post-migration in DeepCrawl, you can keep an eye on any unexpected server-side issues as you crawl.

In the Non-200 Pages report you can see the total number of non-200 status codes, including 5xx and 4xx errors that DeepCrawl detected during the platform’s most recent crawl.

35. Test Individual URLs

Getting a granular view over thousands of pages can be difficult, but DeepCrawl makes the process digestible with an elegant Pages Breakdown pie chart on the dashboard that can be filtered and downloaded for your needs. The pie chart (along with all graphs and reports) can be downloaded in the format of your choice, whether CSV/PNG/PDF etc.

View Primary Pages by clicking the link of the same name in your dashboard overview. From here, you can see a detailed breakdown of every unique, indexable URL with up to 200 metrics, including DeepRank (DeepCrawl’s internal ranking metric), clicks in, load time, content-to-HTML ratio, social tags, mobile optimisation (or the lack thereof!), pagination and more.

36. Make Landing Pages Awesome and Improve UX

To help improve conversion and engagement, use DeepCrawl metrics to optimise page-level factors like compelling content and pagination, which are essential parts of your site’s marketing funnel that assist in turning visitors into customers.

The page content reports help you find content that is missing key parts, so you can engage visitors faster, deliver your site’s message more clearly, increase your chances of conversion and exceed user expectations.

37. Optimise your Social Tags

To increase shares on Facebook (Open Graph) and Twitter and get the most out of your content and outreach activities, you need to make sure your Twitter Cards and Open Graph tags are set up and set up correctly.

Within DeepCrawl’s Social Tagging report you will see pages with and without social tags, whether the tags that exist are valid, and OG:URL Canonical Mismatch: pages where the Open Graph URL differs from the canonical URL. These should be identical; otherwise shares and likes might not be aggregated for your chosen URL in your Open Graph data, but spread across your URL variations.
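You can spot-check a single page for this mismatch by pulling out both tags and comparing them. A rough sketch using requests and naive regex extraction (the page URL is a placeholder):

```python
# Rough sketch: compare a page's og:url against its canonical URL.
# Naive regex extraction; the page URL is a placeholder.
import re
import requests

html = requests.get("https://example.com/product", timeout=10).text

canonical = re.search(
    r'<link[^>]+rel=["\']canonical["\'][^>]+href=["\']([^"\']+)["\']', html, re.I)
og_url = re.search(
    r'<meta[^>]+property=["\']og:url["\'][^>]+content=["\']([^"\']+)["\']', html, re.I)

if canonical and og_url and canonical.group(1) != og_url.group(1):
    print(f"mismatch: canonical={canonical.group(1)} og:url={og_url.group(1)}")
```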

Want Customised Crawls?

38. Schedule Crawls

You can schedule crawls in DeepCrawl to run automatically in the future, adjusting their frequency and start time. This feature can also be used to avoid times of heavy server load. Schedules range from every hour to every quarter.

How To Do It:

In step 4 of your crawl set up click on Schedule crawl

39. Run Multiple Crawls at Once

As DeepCrawl is cloud-based, you can crawl multiple sites really quickly (20 at any given time), spanning millions of URLs at once while still being free to use your computer to evaluate other reports. Hence with DeepCrawl you can perform your pitch, SEO audit and competitor analysis at the same time.

40. Improve Your Crawls with (Google) Analytics Data

By authenticating your Google Analytics account with DeepCrawl you can understand the combined SEO and analytics performance of key pages on your site. By overlaying organic traffic and total visits on your individual pages, you can prioritise changes based on page performance.

How To Do It:

On step 2 of your crawl set up, go to Analytics, click add Google account

Enter your Google Analytics username and password to sync your data permissions within DeepCrawl

Click the profile you want to share for the DeepCrawl project

DeepCrawl will sync your last 7 or 30 days of data (your choice), or you can upload up to 6 months’ worth of data as a CSV file, whether from Google Analytics, Omniture or any other provider

41. Upload Backlinks and URLs

Identify your best linked sites by uploading backlinks from Google Search Console, or lists of URLs from other sources to help you track the SEO performance of the most externally linked content on your site.
