List of 18 Million Websites
With Social Media Handles Extracted

What Is This Dataset?

This List of 18 Million Websites With Social Media Handles Extracted is a large-scale, structured database of active websites enriched with their publicly available social media profile links. Instead of manually scraping websites to find Instagram, Facebook, LinkedIn, Twitter/X, or YouTube links, this dataset gives you direct access to extracted social handles in a ready-to-use format.

It is built for marketers, researchers, lead generation teams, SaaS founders, and data professionals who need website + social presence intelligence at scale.

What’s Included

• 18,000,000+ verified websites
• Extracted social media profile URLs (when publicly available)
• Instagram profiles
• Facebook pages
• LinkedIn company pages
• Twitter / X profiles
• YouTube channels
• HTML title tags
• Meta descriptions
• Top-Level Domains (.com, .net, .org, .io, country TLDs, etc.)

Each row connects a website directly to its social presence — allowing you to instantly identify businesses that are active on specific platforms.

Key Use Cases

Social Media Lead Generation
Filter businesses by platform presence. For example, find all websites with active Instagram accounts or LinkedIn company pages. Perfect for influencer outreach, SaaS prospecting, and B2B lead generation.

Multi-Channel Outreach
Combine website outreach with direct social messaging. Increase response rates by contacting businesses through multiple channels.

Market Research & Brand Analysis
Analyze which industries are active on specific social platforms. Identify platform adoption trends across niches and regions.

Competitive Intelligence
Discover competitor social profiles at scale and analyze their messaging, positioning, and cross-platform presence.

Agency Prospecting
Perfect for marketing agencies selling social media management, paid ads, SEO, or web development. Filter for businesses with weak or missing social presence and target them directly.

AI & Data Science
Use website-to-social mapping for training machine learning models, brand classification, engagement prediction, and digital footprint analysis.

Who This Is For

• Digital marketers
• Lead generation specialists
• Social media agencies
• SaaS founders
• Data analysts
• Growth teams
• AI & ML engineers
• Researchers

If your strategy depends on identifying businesses with active social media presence, this dataset eliminates weeks of scraping and manual research.

Download & File Formats

The dataset is delivered as a compressed .zip download and is approximately 5.4GB uncompressed. It is provided in two formats:

• MySQL dump for database deployment
• CSV files for Excel, Google Sheets, and analytics tools
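
Once the data is in a database, filtering by platform and keyword comes down to short SQL queries. The sketch below is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs without a server, and the table name `websites`, the column names, and the sample rows are all hypothetical rather than the dataset's documented schema.

```python
import sqlite3

# Hypothetical schema: the table and column names are assumptions for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE websites (
        domain TEXT, instagram TEXT, facebook TEXT,
        title TEXT, description TEXT, tld TEXT
    )"""
)
rows = [
    ("smile-dental.example", "smiledental", None, "Smile Dental", "Family dentist in California", "example"),
    ("fitgym.example", None, "fitgym", "Fit Gym", "Gym in Sacramento, California", "example"),
    ("notes.example", None, "notesblog", "Notes", "A personal blog", "example"),
]
conn.executemany("INSERT INTO websites VALUES (?, ?, ?, ?, ?, ?)", rows)


def with_platform_and_keyword(conn, platform, keyword):
    """Return domains that have a handle on `platform` and mention `keyword`
    in the title or description."""
    # Column names cannot be bound as query parameters, so whitelist them.
    if platform not in {"instagram", "facebook"}:
        raise ValueError(f"unknown platform column: {platform}")
    pattern = f"%{keyword}%"
    query = (
        f"SELECT domain FROM websites "
        f"WHERE {platform} IS NOT NULL AND {platform} <> '' "
        f"AND (title LIKE ? OR description LIKE ?) ORDER BY domain"
    )
    return [r[0] for r in conn.execute(query, (pattern, pattern))]


facebook_california = with_platform_and_keyword(conn, "facebook", "California")
instagram_dental = with_platform_and_keyword(conn, "instagram", "dent")
print(facebook_california)
print(instagram_dental)
```

The same `WHERE ... IS NOT NULL AND ... LIKE` pattern should carry over to MySQL, although on a table of this size a plain `LIKE '%keyword%'` scan can be slow, so a full-text index may be worth considering.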

Training & Support

Full setup guidance is included. You’ll receive step-by-step instructions on:

• Importing into MySQL
• Filtering by platform (e.g., Instagram-only websites)
• Exporting targeted CSV lists
• Segmenting by TLD or keywords

Even if you’ve never handled large datasets before, the instructions make it straightforward.
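
As a rough idea of what filtering by platform, TLD, or keyword can look like on the CSV file, here is a minimal Python sketch that streams rows one at a time instead of loading the whole file into memory. It is not taken from the included training material; the column names (`domain`, `instagram`, `facebook`, `title`, `description`, `tld`), the sample rows, and the file names mentioned in the comments are assumptions.

```python
import csv
import io


def filter_rows(reader, platform=None, tld=None, keyword=None):
    """Yield rows that pass every filter that was supplied (None means no filter)."""
    needle = keyword.lower() if keyword else None
    for row in reader:
        # Platform filter: keep only rows with a non-empty handle in that column.
        if platform and not (row.get(platform) or "").strip():
            continue
        # TLD filter: case-insensitive exact match.
        if tld and (row.get("tld") or "").lower() != tld.lower():
            continue
        # Keyword filter: case-insensitive substring of title or description.
        if needle:
            text = f"{row.get('title') or ''} {row.get('description') or ''}".lower()
            if needle not in text:
                continue
        yield row


# Tiny in-memory sample standing in for the real CSV (hypothetical columns).
SAMPLE = """domain,instagram,facebook,title,description,tld
cafe.example.uk,cafelondon,,Cafe,Coffee shop in London,uk
gym.example.com,,gympage,Gym,Fitness club in Leeds,com
shop.example.uk,,shoppage,London Shop,Gifts and cards,uk
"""

# With the real file: open("websites.csv", newline="", encoding="utf-8")
source = io.StringIO(SAMPLE)
# With the real file: open("london.csv", "w", newline="", encoding="utf-8")
target = io.StringIO()

reader = csv.DictReader(source)
writer = csv.DictWriter(target, fieldnames=reader.fieldnames)
writer.writeheader()

kept = 0
for row in filter_rows(reader, tld="uk", keyword="london"):
    writer.writerow(row)
    kept += 1
print(kept)
```

Because the reader yields one row at a time, memory use stays flat regardless of file size, and the smaller output file can then be opened in a spreadsheet tool.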

In Short

This List of 18 Million Websites With Social Media Handles Extracted is a powerful bridge between the web and social platforms.

It gives you instant visibility into which businesses are active where — enabling smarter targeting, faster prospecting, and deeper digital intelligence.

Dataset Pictures

• CSV file of 18 million websites with social handles
• MySQL table of 18 million websites with social handles
• MySQL table filtered by TLD
• MySQL table filtered by niche
• MySQL table filtered by keywords (California)


Video Demo



Hi everyone, it's Jamie from anysoftwareyouwant.com, and in this video we're going to give you a lightning-quick demo of our huge data set of over 18 million websites, all with their social handles extracted.

So just to give you a brief description of this data set: our crawlers have crawled most of the internet. We checked over 99% of websites online and searched for their social handles, and if a website does have social handles, we include it in this data set. So this is over 18 million websites that have social handles, and we've extracted them and they're in this data set for you. As you can see, it's actually a little bit over 18 million, about 65,000 over, but we like to over-deliver on these data sets because websites sometimes go offline quite quickly, etc.

So let's have a quick browse of this data set. You can see we've got the domain here, and then we've got the Instagram, X, Facebook and LinkedIn handles if we could find them. As I said earlier, we only include websites in this data set where we found one or more social handles. We've also got the title and description from the website that we found the social handles on. That's really useful because it means you can find social accounts in certain niches by searching for keywords in the title or description of the website that we found those handles on. We've also got the TLD, so if you want to, for example, only find socials or websites from Canada or from the UK, you can.

And then finally we've got our categories. How we've placed websites in categories works like this. Sometimes a website can span multiple categories; many blogs do that, for example. So we have a primary category and a secondary category, and we match these websites based on a predefined set of keywords. So if we find more keywords in the title and description to do with fitness than to do with, say, religion, then we'll place it in a primary fitness category and a secondary religion category, for example. We also give a certainty score: we've got a primary match score, which is how certain we are that the website fits in that category. That's worked out from the number of keywords we find in the title or description, the length, etc. That score is useful because you might be doing outreach campaigns where, if you're targeting gyms, for example, and you message a website that's in the finance niche because you didn't use the match score filter, it might hurt your campaign's success rate, you might get messages marked as spam, etc. So you can use that match score to filter down to only websites that we're pretty certain are within a certain niche.

So now we'll give you a few quick demos of how to work with the data set in the format you see here, which is a MySQL database. We provide it as a MySQL dump, and we also provide a comma-separated value (CSV) file. Upon purchase, we also provide full training in video format that shows you how to do more advanced queries and searches on these data sets, and also just how to handle these large data sets on your local machine: for example, how to set up MySQL, how to import the MySQL dump, and how to search these large CSV files with PowerShell. So even if you're not technical, we provide full video training on how to use these massive data sets.

So for our first demo here, we've just put together a simple SQL query. For this example, let's say we're a social media marketing agency and we specialize in helping dental practices on Instagram. Here you can see we're finding all websites where we've managed to extract an Instagram handle, and we're matching the category against the primary precise or secondary category, because when we categorize websites we have a broad and then a precise category. As you can see, we found nearly 43,000 websites that are probably dental websites that have Instagram.

For our next quick demo, we've searched for all websites in this data set that have a Facebook handle, as you can see here, and also have California in the website's title or description. This is really good for finding websites in certain geographical areas that have a presence on social media, and as you can see, because this data set is so large, we found just under 47,000 websites.

Now for our final demo. As we said earlier, we provide this huge data set in a comma-separated value format as well, and you can see here we've used this PowerShell command to filter down to any rows that contain London and then put them into a new file. As you can see, the file is now a bit smaller, so we can open it up in any CSV reader and it will be able to handle the new file. We've got London in the titles here, and if it's not in the title it will be in the description, so obviously if you are a social media agency, this is a really good way to find businesses in your city that are active on social media. And if we go all the way down the data set, you can see that because we've got so many websites in this huge 18 million data set, we found over 86,000 websites that have some form of social presence and are based in London.

So that concludes our demos for now. We hope you can see how powerful these data sets are for outreach, lead generation, big data projects, etc. As I said, full training is provided in video format upon purchase on how to work with these big data sets on your local machine. And as always, thank you for watching.
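
The keyword-based categorization described in the video can be sketched roughly as follows. This is not the vendor's actual algorithm: the keyword lists are invented for illustration, and the score formula (primary-category keyword hits divided by the total word count) is just one assumed way of combining keyword count and text length.

```python
import re
from collections import Counter

# Hypothetical keyword lists; the real predefined keyword set is not given in the source.
CATEGORY_KEYWORDS = {
    "fitness": {"gym", "fitness", "workout", "training"},
    "religion": {"church", "faith", "prayer"},
    "finance": {"loan", "bank", "invest"},
}


def categorize(title, description):
    """Return (primary, secondary, primary_match_score) based on keyword counts."""
    words = re.findall(r"[a-z]+", f"{title} {description}".lower())
    counts = Counter()
    for category, keywords in CATEGORY_KEYWORDS.items():
        hits = sum(1 for word in words if word in keywords)
        if hits:
            counts[category] = hits
    if not counts:
        return None, None, 0.0
    ranked = counts.most_common(2)  # highest keyword counts first
    primary, primary_hits = ranked[0]
    secondary = ranked[1][0] if len(ranked) > 1 else None
    # Assumed scoring: share of all words that are primary-category keywords.
    score = round(primary_hits / len(words), 2)
    return primary, secondary, score


result = categorize(
    "Faith & Fitness Gym",
    "Workout classes and training at our church gym",
)
print(result)
```

With a score like this, an outreach list can be restricted to rows above a chosen threshold, which is the filtering step the video recommends for avoiding off-niche contacts.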



