SEO Audit: How to Look for Duplicate Content


Written by Jeremy Earle, JD

July 12, 2022

Duplicate content is a major red flag that should be investigated in your SEO audit. Here’s how to do that, and what to look out for.

Issues can range from URL-based concerns to blocks of text reproduced verbatim across the site’s pages.

Additionally, duplicate content on product and category pages is a WordPress-specific issue that should be considered.


A checklist and instructions are provided below.

Get to the Bottom of Your Website’s Duplicate Content Problems Faster

A service developed by Copyscape can help you quickly spot content duplication on your site.

Using this tool, you can easily see which pages have a high match percentage with other pages on your site.

Find out which pages of your website have been republished elsewhere on the internet.

How to Verify

  • Copyscape is regarded as a de facto auditing standard in the SEO community. With the premium service, you can check your entire website for duplicated material. To see if any of your site’s pages have been copied elsewhere on the web, run a search in Copyscape.
  • Check Google’s index to see if any of your site’s material has been plagiarised elsewhere on the internet. Type a portion of the text you’d like to verify into Google’s search bar (in quotation marks, for an exact match). This should make it easier to track down where it’s been taken.
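Before reaching for a paid tool, you can get a rough sense of internal duplication by comparing page texts directly. Below is a minimal sketch using Python’s standard library; the URLs and page texts are hypothetical stand-ins for content you would scrape or export from your own site:

```python
# Sketch: flag near-duplicate pages by comparing their visible text.
from difflib import SequenceMatcher

def similarity(text_a: str, text_b: str) -> float:
    """Return a 0-1 similarity ratio between two page texts."""
    return SequenceMatcher(None, text_a, text_b).ratio()

# Hypothetical page texts keyed by URL path.
pages = {
    "/widgets": "Buy quality widgets online with free shipping.",
    "/widgets-sale": "Buy quality widgets online with free delivery.",
    "/about": "We are a family-run business founded in 1990.",
}

# Compare every pair of pages and report those above a match threshold.
THRESHOLD = 0.8
urls = list(pages)
duplicates = [
    (a, b)
    for i, a in enumerate(urls)
    for b in urls[i + 1:]
    if similarity(pages[a], pages[b]) >= THRESHOLD
]
print(duplicates)
```

The 0.8 threshold is only illustrative; what counts as “too similar” is a judgment call for your site.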

Look for Duplicate Content on URLs

Duplicate content isn’t limited to the visible text on a page.

Checking URLs can expose flaws that confuse Google as it crawls your site, such as multiple URLs leading to the same content.

The following should be checked and investigated:

  • The frequency with which new material is added to the site.
  • Amount of content that is being updated.
  • A look at the page’s history of updates.

How to Verify

The Last Modified column can be found by scrolling to the right in Screaming Frog.

  • Find out how often the site has been updated and how much new content has been added since your last crawl.
  • Analyze page-update trends over time. For example:

You might even crawl your competitors’ sites every month and keep this information on hand to see what they’re doing.

This data can be easily analyzed and updated in an Excel table, making it easy to spot trends in what your competitors are up to.

This is a priceless nugget of knowledge.
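The month-over-month comparison can be sketched in a few lines. The URLs and Last Modified dates below are hypothetical stand-ins for two Screaming Frog crawl exports:

```python
# Sketch: compare two monthly crawl snapshots (URL -> Last Modified)
# to see which pages a competitor added or updated. In practice these
# mappings would come from Screaming Frog's Last Modified column,
# exported to CSV each month.
march = {
    "/home": "2022-01-10",
    "/blog/post-1": "2022-02-01",
}
april = {
    "/home": "2022-01-10",          # unchanged
    "/blog/post-1": "2022-03-15",   # updated
    "/blog/post-2": "2022-03-20",   # new page
}

added = sorted(set(april) - set(march))
updated = sorted(
    url for url in set(march) & set(april) if march[url] != april[url]
)
print("added:", added)
print("updated:", updated)
```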

What to Check

  • Helpful additional content.
  • Syndicated content.

If syndicated material constitutes a significant portion of a website’s content, it’s helpful to know how it is divided up into segments or otherwise syndicated.

With custom filters built on this approach, thin content and related issues are easy to identify.

Keyword Prominence

You can check a keyword’s prominence using custom filters generated with the approach above.

Your H1, H2, and H3 tags should all contain the keyword.

In Screaming Frog, click the H1 tab to see H1 tags and the H2 tab to see H2 tags.

H3 tags can be identified using a custom filter.

What to Check

  • Keyword word order.
  • Grammar and punctuation.
  • Reading level.

You may not want to go through the discomfort of hunting down grammatical and spelling errors during a site audit, but checking content before you publish it will help ensure that your site is a solid performer.

Use the Hemingway App to edit and compose your material if you’re not a professional writer.

It’s a good idea to use it before putting your work online.

Count of Links to Other Sites

The performance of a page can be adversely affected by the sheer volume of external links it contains.

It has long been considered excellent practice by SEOs not to have more than 100 links on a single page.

Reports conflict, even though Google has said the old guidance to keep outbound links under 100 per page no longer applies.

Outbound links are not a ranking factor, according to John Mueller. So what’s the deal?

If you’re unsure what to make of that, previous research can help. One study found the opposite:

“The outcomes are obvious.

A link to an authoritative site can have a positive effect on a website’s search engine rankings.”

Context is critical, because a page’s 100 outbound links could be anything from 100 navigational links to 100 fabricated links.

The goal here is to examine both the quality and amount of those linkages.

If you see an unusual pattern in the number of links, you should look into the quality and quantity of those links further.

Screaming Frog still allows you to perform a bonus check if you so desire, but it isn’t necessary any longer.

How to Verify

To examine a page’s outbound links, select its URL in Screaming Frog’s main window and open the Outlinks tab.

If you’d prefer a quicker method of locating all of your site’s outbound links, you can select Bulk Export > All Outlinks.

Page with the Most Internal Links

Click on the URL in the main Screaming Frog window and select the Inlinks tab to see how many internal links point to that page.

It’s also possible to view all of the site’s internal links by selecting Bulk Export > All Inlinks.

Internal Links to a Page’s Content

To evaluate the quality of internal links pointing to each page on the site, you can use the Excel file produced when you bulk exported all inlinks.
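As a rough sketch of that analysis, you can tally inlinks per destination straight from the export. The (source, destination) pairs below are hypothetical stand-ins for rows from an All Inlinks export:

```python
# Sketch: count internal links pointing at each page, using
# (source, destination) rows like those in a Screaming Frog
# "All Inlinks" bulk export. The pairs are hypothetical.
from collections import Counter

inlinks = [
    ("/home", "/blog/post-1"),
    ("/about", "/blog/post-1"),
    ("/home", "/contact"),
]

# Count how many internal links each destination page receives.
inlink_counts = Counter(dest for _source, dest in inlinks)
print(inlink_counts.most_common())
```

Pages with very few inlinks are good candidates for more internal linking.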

Broken Links

Using an SEO audit to locate and fix broken links can give you a head start on resolving any difficulties that may arise down the road.

How to Verify

When Screaming Frog has finished its site crawl, select HTML from the Filter dropdown menu and sort the pages by status code in descending order.

This will list all of the error pages ahead of the live 200 OK pages.

We are looking for all of the 400 errors, 500 errors, and other page errors in this check.

If URLs returning 400 errors haven’t appeared in the Google index for a while, you can leave them alone and let them drop out of the search results.

However, if they’ve been indexed for some time, you’ll probably want to reroute them.
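Once the crawl is exported, separating error pages by status-code class takes only a few lines. The URLs and codes below are hypothetical:

```python
# Sketch: split crawled URLs into client (4xx) and server (5xx) errors
# using status codes from a crawl export. URLs and codes are hypothetical.
crawl = [
    ("/home", 200),
    ("/old-page", 404),
    ("/api/report", 500),
    ("/contact", 200),
]

client_errors = [url for url, code in crawl if 400 <= code < 500]
server_errors = [url for url, code in crawl if 500 <= code < 600]
print("4xx:", client_errors)
print("5xx:", server_errors)
```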

Affiliate Links

If your audit goal includes finding affiliate links, identifying and removing them from an affiliate-heavy website is a smart path to follow.

How to Verify

Affiliate links tend to have a common referrer or section of their URL that may be identified across a wide variety of websites.

These links can be found by creating a custom filter.

In Excel, conditional formatting lets you flag affiliate links in Screaming Frog bulk exports and identify where they appear.
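The same filtering can be scripted. Here is a sketch that flags likely affiliate links by URL pattern; the patterns and URLs are hypothetical, so adjust them to the networks your site actually uses:

```python
# Sketch: flag likely affiliate links in an outlinks export by matching
# common referral URL patterns. Patterns and URLs are hypothetical.
import re

AFFILIATE_PATTERN = re.compile(r"(affiliate|ref=|aff_id=|/partner/)", re.IGNORECASE)

outlinks = [
    "https://example.com/product?ref=12345",
    "https://example.com/blog/how-to-bake",
    "https://shop.example.net/affiliate/tracker",
]

affiliate_links = [url for url in outlinks if AFFILIATE_PATTERN.search(url)]
print(affiliate_links)
```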

Length of URL

To locate URLs longer than 115 characters, open Screaming Frog’s URL tab and set the Filter dropdown to Over 115 Characters.

This filter lists every URL on your site that exceeds 115 characters.
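The same check is easy to reproduce on any list of URLs; the sample URLs below are hypothetical:

```python
# Sketch: flag URLs longer than 115 characters, mirroring Screaming
# Frog's URL > Filter > Over 115 Characters check. Sample URLs are
# hypothetical.
urls = [
    "https://example.com/blog/short-post",
    "https://example.com/" + "category/" * 12 + "a-very-long-product-name-page",
]

long_urls = [u for u in urls if len(u) > 115]
print(long_urls)
```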

Classification of Web Pages

Screaming Frog’s site structure section, found on the far right of the spider tool, can be used to get a high-level overview of page categories.

How to Verify

The site structure page lets you see the most popular URLs on the site, along with the categories into which they fall. In addition, the response times tab lets you identify page response time issues.

WordPress makes it simple to discover and correct the content duplication issues that may influence search engine results.

Small companies, bloggers, and huge news organisations may benefit from WordPress’s ease of use. There are plugins for just about everything regarding canonical links and other best practices.

Duplicate content is a new problem that has arisen due to the ease with which information and designs may be published.

Duplicate content is a frequent cause of a WordPress website’s failure to rank.

Even though it’s not exactly what we’re used to seeing in SEO as “duplicate content,” it still has to be handled.

The following is a list of the most prevalent forms of WordPress content duplication and how to correct them.

1. Tags

Many WordPress sites struggle with tags. When you tag an article, WordPress generates a new page that collects content it considers pertinent to that tag.

The page will have excerpts or entire articles from various sources. When a tag is the same as a primary page on your core website (assuming it is not a blog), you’ve established a rival to that page on your site.

It’s common for tag pages to be reworked copies of one another, resulting in nearly identical content.

There is a risk that if this occurs, the site’s value might be negatively affected.

It’s all good! It’s a simple repair.

Either remove the tags entirely or add a meta robots noindex, follow directive.

A noindex, follow tag tells search engines not to index the thin page, but still to crawl it and follow its links.

Search engines then understand that the page isn’t as valuable as others, and you’ve shown them the way to your strong, topic-specific articles and pages.
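Concretely, the directive is a standard meta robots tag placed in the head of the tag-page template (many SEO plugins can add it for you):

```html
<!-- Keep the tag page out of the index, but let crawlers follow its links. -->
<meta name="robots" content="noindex, follow">
```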

2. Category Pages

It’s not uncommon for category pages to list as many posts and articles as a tag page does.

As collections of article snippets, they don’t always answer a question or provide an effective solution, so they may not be useful to people searching for answers.

For this reason, their content is often deemed thin.

However, there is one notable exception.

A WordPress website like Search Engine Journal, for example, has categories devoted to channels and to certain specialisations within a channel.

A category like that can be quite helpful to someone looking for information on a certain channel, so unlike a tag, you’ll want to treat it differently.

Add a meta robots index, follow tag, write distinct titles and copy for the category, and, where schema is applicable, add it to introduce the category.

This gives search engines a better idea of which searches the page should appear for and who should see it.

Search engines may then give you credit for the page. If you’re a company, make sure category pages don’t compete with your primary website pages.

3. Competing Topics

The lack of original material is another problem I notice when inspecting WordPress blogs.

Consider the world of food blogging. Even if you’re employing recipe schema and other means of recipe differentiation, what if you weren’t aware of them at the outset?

If you have 20 chocolate chip cookie recipes, many of them likely use similar ingredients and wording, so they end up competing with each other.

Without additional effort, the recipes may not show up, since they compete with one another, even though each is distinct and may serve a different purpose.

Creating cookie categories and subcategories is a good idea here. If you can’t, go back and add modifiers to your titles and copy (e.g., spicy, savoury, chewy, for parties, for large groups).

The next step is to add content describing the end product (it does not necessarily need to be at the top, since you want to get the reader to the recipe swiftly). Make certain that the copy is relevant to the subject and demonstrates why, how, and where the recipe differs from the competition.

Do you need any other examples?

Do you have a Christmas gift guide or an article on a particular theme? Has anything changed since last year? What are some good Mother’s Day crafts? What are some thoughtful Valentine’s Day presents for X.Y.Z.?

These don’t stand out enough to be considered original. Numerous postings can compete for attention.

If you include a year in your title (e.g., 2016, 2017), search engines may disregard your page once it no longer looks current. The tactics outlined above can help here, too.

4. URLs of Search Boxes

Search boxes on WordPress sites may produce URLs; however, this isn’t something I’ve seen very frequently.

These URLs might be indexed if someone connects to them or if search engines can discover and crawl them.

Meta robots noindex, follow, as with the tags, might help, but it’s unlikely to be sufficient.

To get to the bottom of this, you’ll need to identify the URL’s unique identifier. After the primary URL, you’ll often see a “?”.

Now add a disallow for this parameter to your robots.txt file. Implemented properly, this should make thin or duplicate content problems less of an issue.
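For a default WordPress install, where internal search uses the `s` query parameter, the disallow might look like this (the `/search/` path is hypothetical and only needed if your theme uses pretty search URLs):

```
User-agent: *
Disallow: /?s=
Disallow: /search/
```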

5. Plugins

Plugins and other tools that automate jobs and make life easier can also introduce problems such as duplicate or thin content.

Have you checked your site for these?

Examples include plugins that generate printable PDF versions of posts that are also indexable, or quoted excerpt versions, either of which can be detrimental for short posts.

You may also have an RSS feed that publishes whole posts instead of snippets; feeding only headlines or short descriptions is usually safer.


Most of these WordPress problems may be identified and fixed using the tactics above. You should see an increase in your organic search results if you remove all of your duplicate material.
