Open Source Data

What is Open Source Data?

  • Open data is a broad set of databases that you, I, or anyone with an internet connection can access. It comes from outlets around the world and can be anything from public data gathered by government agencies to roundups of economic trends from banks and financial conglomerates and, most importantly here, marketing data. Why is open data important? Because it is knowledge that is publicly available for anyone to use.
  • For marketing, this data can be used for predictive intelligence and forecasting, identifying purchasing trends by demographic group, finding new opportunities for innovation, and so much more. Big data did not create the need for businesses to be immersed in their own data; that need was always there.
  • That’s why we’ve identified the top open data sources that are ready to use for marketing-related purposes.
  • Our Open Data sets are broken down into two segments. The first, which consists of Reference and Suppression data, is compiled by our team and provided to you. The second, after the page divider, consists of open data sources from other domains that provide the information. Put it to good use!

    Reference Tables - Open Data

    1. Phone Geo-Reference (US)

    Lookup reference table that pairs each phone area code with its State & Timezone. Great for use in automations within your CRM or in sheets compiled from Web-join submissions.
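    As a rough illustration of how an area-code lookup like this might be used in an automation, here is a minimal Python sketch. The three sample rows, the field names (`state`, `timezone`), and the function names are assumptions made for the example; the actual table's columns may differ.

```python
from typing import Optional

# Hypothetical sample rows; the real reference table's layout may differ.
AREA_CODE_TABLE = {
    "212": {"state": "NY", "timezone": "America/New_York"},
    "312": {"state": "IL", "timezone": "America/Chicago"},
    "415": {"state": "CA", "timezone": "America/Los_Angeles"},
}


def area_code(phone: str) -> Optional[str]:
    """Return the 3-digit area code of a US phone number, or None."""
    digits = "".join(ch for ch in phone if ch.isdigit())
    # Drop a leading US country code if present.
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    return digits[:3] if len(digits) == 10 else None


def lookup_phone_geo(phone: str, table=AREA_CODE_TABLE) -> Optional[dict]:
    """Return the state/timezone row for a phone number, or None if unknown."""
    code = area_code(phone)
    if code is None:
        return None
    return table.get(code)
```

    Calling `lookup_phone_geo("(212) 555-0100")` returns the New York row, while a number that is too short or has an area code missing from the table returns `None`, so a CRM automation can leave those fields blank.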

    2. Gender Name Lookup Data Set

    A table covering 99.4% of all first names, each linked with a Gender. Each match carries a “confidence match score” from 0 to 1 that indicates how accurate the Name-to-Gender match is.
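    One way to put the confidence score to work is to accept a match only above a chosen threshold. The sketch below assumes a simple name-to-(gender, score) mapping; the sample names, scores, and the 0.9 default threshold are illustrative values, not taken from the data set.

```python
from typing import Optional

# Hypothetical sample rows: first name -> (gender, confidence score from 0 to 1).
GENDER_TABLE = {
    "mary": ("F", 0.99),
    "james": ("M", 0.99),
    "jordan": ("M", 0.62),
}


def gender_for(first_name: str, min_confidence: float = 0.9,
               table=GENDER_TABLE) -> Optional[str]:
    """Return the matched gender only when the score meets the threshold."""
    match = table.get(first_name.strip().lower())
    if match is None:
        return None  # name not in the table
    gender, score = match
    return gender if score >= min_confidence else None
```

    With these sample rows, an ambiguous name such as "Jordan" is left unassigned at the default threshold, which avoids personalizing a message on a weak match.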

    Suppression Data for Marketing Campaigns:

    1. DNC Litigator Data Set

    Open Data list of confirmed and validated phone numbers belonging to litigators in DNC-related (Do Not Call) matters. We classify this as a “Community List” because others can contribute to the dynamic list (all values are validated and confirmed before new records are added).

    You can view more information and access the data source here.
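    A suppression list of phone numbers is typically applied by normalizing both sides to the same format and dropping any lead that matches. The following is a minimal sketch under that assumption; the function names are hypothetical and the numbers in the usage note are made up.

```python
def normalize_phone(phone: str) -> str:
    """Reduce a US phone number to its bare 10 digits where possible."""
    digits = "".join(ch for ch in phone if ch.isdigit())
    # Drop a leading US country code if present.
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    return digits


def suppress_phones(leads, suppression_numbers):
    """Return the leads whose numbers do not appear on the suppression list."""
    blocked = {normalize_phone(n) for n in suppression_numbers}
    return [p for p in leads if normalize_phone(p) not in blocked]
```

    For example, `suppress_phones(["(212) 555-0100", "415-555-0199"], ["+1 212 555 0100"])` keeps only the second lead, because the first matches the suppression entry once both are normalized.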

    2. Distro Email Values

    Before validating email addresses, do you remove the values that you know will be bad? This data set includes the top 70% of all accept-all and disposable email leading names, such as info@, admin@, contact@, etc. You would be surprised how much money you can save by removing these values before verification.
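    The pre-verification step described above could look something like the sketch below, which splits a list into addresses to send for verification and addresses to suppress. Only the three leading names quoted in the text are included; the function name is hypothetical, and the full data set would supply the real prefix list.

```python
# Leading names taken from the examples in the text; the full list is longer.
ROLE_PREFIXES = {"info", "admin", "contact"}


def split_role_addresses(emails, prefixes=ROLE_PREFIXES):
    """Split emails into (keep, suppressed) by the part before the @ sign."""
    keep, suppressed = [], []
    for email in emails:
        local_part = email.strip().lower().partition("@")[0]
        if local_part in prefixes:
            suppressed.append(email)
        else:
            keep.append(email)
    return keep, suppressed
```

    Only the `keep` list would then go to a paid verification service, which is where the savings come from.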

    3. Swear/Curse Words

    To be used as look-up suppression values in your autoresponder emails.
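    A look-up of this kind is usually a whole-word, case-insensitive match against the incoming text. Here is a minimal sketch using Python's standard `re` module; the function names are hypothetical, and the mild placeholder words in the usage note stand in for the actual list.

```python
import re


def build_matcher(words):
    """Compile a case-insensitive, whole-word pattern for the given words."""
    if not words:
        raise ValueError("words must not be empty")
    alternatives = "|".join(re.escape(w) for w in sorted(words))
    return re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)


def contains_suppressed_word(text, matcher):
    """Return True when the text contains any word on the suppression list."""
    return matcher.search(text) is not None
```

    With `matcher = build_matcher({"darn", "heck"})`, the text "Well, DARN it" is flagged, while "Checkers" is not, because the word boundaries prevent matches inside longer words.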

    Business Directory Data

    Yelp Directory

    Tap into the millions of existing business reviews using Yelp’s open datasets to gain a deeper understanding of sentiment toward businesses, as well as any patterns and trends.

    View data set here

    Google Scholar

    In search engine fashion, Google Scholar lets users search for datasets as they would with any other Google search. Find educational, peer-reviewed sources of data on just about any topic!

    View data set here

    Pew Research Center

    Pew is one of the largest open data sources in the U.S. with datasets aggregated through high-quality surveys. Data from surveys are typically released two years after reports are issued. You’ll have to create a free login to access Pew Research Center.

    View data set here

    Open Corporates

    One of the largest open databases of companies in the world, Open Corporates holds hundreds of millions of company records covering essentially any country.

    View data set here

    Graph API

    Curated by Facebook, Graph API is the primary way for apps to read and write to the Facebook social graph. It is essentially a representation of all information on Facebook now and in the past.

    View data set here

    Social Mention

    Acquire real-time data on social sentiment, keyword usage, users, and hashtags using the Social Mention search engine.

    View data set here

    Google Trends

    See what the world is searching for, using Google Trends datasets on the latest search trends. Marketers can use this data to time their campaigns.

    View data set here


    Kaggle

    Under the supervision of Google, Kaggle is an online community of data scientists who publish seemingly random datasets on everything from tracking the frequency of internet memes to “last words of death row inmates.”

    View data set here

    r/datasets (Reddit)

    Reddit is a vast online community, and this particular subreddit is made up of Redditors who find, share, and request interesting datasets. (The “r/” prefix is simply how Reddit names its communities; it does not refer to the R programming language.)

    View data set here

    Google Public Data Explorer

    Many of the sources included on this list are actually consolidated on the Google Public Data Explorer. If you’re not sure where to start pulling data from, this could be a good starting point. There’s also free access to the Google Dataset Search engine.

    View data set here