profileAPI Data Sources

Last Updated: April 24th, 2026

Overview

profileAPI acquires data both directly and through reputable partners. The list below describes the root sources of that data, regardless of whether profileAPI reaches a source directly, through a single provider, or through a chain of providers. The data falls into four broad groups: information that is directly published, information contributed through the use of applications and platforms, information inferred from observable patterns, and information confirmed by the behavior of public-facing systems such as mail infrastructure. profileAPI then transforms, cleans, and merges the data depending on the use case.

This document focuses on root sources only. It does not separately list partner access paths, licensed procurement routes, internal QA steps, validation layers, or resolved entity layers as standalone source categories.

Comprehensive Source List

1. Public web pages

Publicly accessible non-first-party pages on the open internet.

This includes blogs, general web pages, and other indexable public pages that do not fit more specific categories such as company websites, professional profiles, directories, news, or social media. These pages may also expose useful metadata such as timestamps, labels, schema markup, and linked references.
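As an illustration of the schema markup mentioned above, the following minimal sketch pulls JSON-LD blocks out of a page's HTML using only the Python standard library. JSON-LD is just one common markup format, and the class and function names here (JsonLdExtractor, extract_jsonld) are hypothetical; this is not a description of profileAPI's actual pipeline.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects the parsed contents of <script type="application/ld+json"> tags."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        # The type attribute may be missing or valueless, so default to "".
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.items.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                # Malformed markup is skipped rather than guessed at.
                pass


def extract_jsonld(html):
    """Return a list of JSON-LD objects found in the given HTML string."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.items
```

Given a page containing a single Organization block, extract_jsonld returns a one-element list holding that object as a Python dict. Blocks that fail to parse are dropped silently in this sketch; a real system would more likely log them.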

2. Company websites

First-party company-owned web properties.

These are a source for company descriptions, product information, leadership references, office locations, contact details, and general business context. This includes the company website whether accessed live or through older copies of the same source. Company websites may also expose embedded metadata and structured references.

3. Career pages and job postings

Company career pages, recruiting pages, and job-board listings.

These are a source for hiring activity, team buildout, geographic expansion, role demand, and technology clues.

4. Public professional profiles

Public profile pages tied to a person's work identity.

These are a source for name, role, employer, employment history, seniority, and career context. They may also contain structured metadata, timestamps, and linked profile references.

5. Social media profiles

Public social-media and profile-style pages.

These are a source for identity clues, employer references, self-description, links, location hints, and activity context.

6. Resumes and candidate-profile pages

Public CVs, resumes, portfolio bios, and candidate-style profile pages.

These are a source for job history, education, skills, employer history, and identity matching.

7. Portfolio and personal websites

Personal domains, bio pages, author pages, and portfolio sites.

These are a source for identity, work history, contact methods, employer references, and profile links.

8. Company directories and business listings

Structured or semi-structured listings of companies and organizations.

These are a source for company name, website, phone, category, location, and related business metadata.

9. Conference and event pages

Speaker pages, webinar pages, agenda pages, and attendee-facing event materials.

These are a source for professional identity, title, employer, topic area, and recency signals.

10. News, press, and announcement pages

Press releases, funding announcements, executive-move announcements, partnership posts, and launch articles.

These are a source for company events, leadership changes, fundraising, acquisitions, and timing signals.

11. Public records

Government records, lawful public registries, and other formal public-reference datasets.

These are a source for legal entity data, registrations, licenses, and other official record fields.

12. Contact pages and published contact details

Public pages where email addresses, phone numbers, forms, or other contact endpoints are directly published.

These are a direct source for contact discovery and validation. This includes publicly visible business phone numbers appearing across company sites, listings, profiles, signatures, and contact pages.

13. User- or customer-contributed data

Data contributed through the use of an application, platform, or service.

This can include uploads, synced records, confirmed contacts, corrections, and other customer- or user-generated inputs that become part of the dataset. In many cases, it originates from software products that process customer data as part of their normal operation.

14. Domain ownership and web infrastructure signals

Technical website- and domain-level signals used to understand domain ownership, domain-company mapping, related domains, and mail-domain configuration.

These are a source of company and contact context derived from the structure and behavior of public-facing web and mail infrastructure.

15. Email discovery and email-pattern signals

Email candidates and validation signals derived from company domains, observed mailbox conventions, naming patterns, and SMTP-style verification or similar mail-system behavior.

This includes inferred email patterns such as first.last or first initial plus last, knowledge of domain-level mailbox conventions, and mail-system signals that help confirm whether a guessed address appears to exist.
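The pattern inference described above can be sketched roughly as follows. This is a minimal illustration that assumes only the two conventions named in the text (first.last and first initial plus last); the pattern labels and the function names candidate_emails and infer_pattern are hypothetical, and the sketch makes no claim about how profileAPI implements this step. It performs no mail-system verification.

```python
import re
import unicodedata


def _normalize(name):
    """Lowercase a name, strip accents, and drop non-letter characters."""
    decomposed = unicodedata.normalize("NFKD", name)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z]", "", ascii_only.lower())


# Hypothetical labels for the two conventions mentioned in the text.
PATTERNS = {
    "first.last": lambda first, last: f"{first}.{last}",
    "flast": lambda first, last: f"{first[0]}{last}",
}


def candidate_emails(first, last, domain):
    """Build one candidate address per known pattern for a person at a domain."""
    f, l = _normalize(first), _normalize(last)
    if not f or not l:
        return {}
    return {
        label: f"{build(f, l)}@{domain.lower()}"
        for label, build in PATTERNS.items()
    }


def infer_pattern(first, last, email):
    """Return the label of the pattern an observed address follows, or None."""
    local, _, domain = email.lower().partition("@")
    for label, address in candidate_emails(first, last, domain).items():
        if address.split("@")[0] == local:
            return label
    return None
```

For example, an observed address such as alovelace@example.org for a person named Ada Lovelace would be labeled with the first-initial-plus-last pattern, which could then be used to generate candidates for other people at the same domain. Any candidate produced this way remains a guess until a separate signal confirms it.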

Closing framing

Across providers, the root-source landscape is broader than public web pages alone. It includes directly published information, user- and customer-contributed inputs, domain and mail-system signals, and source surfaces that expose identity, company, and contact data. profileAPI's role is to acquire this data directly or through partners, then transform, clean, and merge it based on the use case.