Proxycurl VS PeopleDataLabs - Global Company Profile Dataset

January 29, 2021



Since we launched LinkDB, we have received a barrage of requests for a company profile dataset. We understand that pairing people with companies helps our customers answer questions like:

  • How many employees does a company have?
  • What is the makeup of roles in a company?

It was only natural that we made exhaustively crawling company profiles a priority. But why do the work if you can get the data for free? So let's talk about the elephant in the room.

PeopleDataLabs (PDL) offers a "free company dataset."

It's true: PDL offers a free company dataset, and I was curious:

  • Why is PDL offering this dataset for free?
  • How many companies do they have?
  • What fields does this dataset have?
  • Is this dataset any good?

I put my spy hat on, went over to their website, gave away personal information, including my phone number, and shortly afterwards received an email with links to the dataset dumps.

How many companies does the free company dataset have?

I picked the CSV dump from the email, and a file named free_company_dataset.csv.zip began to download. I unpacked it and ran the following wc command to count how many lines (companies) it contains:

```
$ wc -l free_company_dataset.csv
12258431 free_company_dataset.csv
```

There we have it. PDL's company profile dataset has just over 12.25M company profiles (12,258,431 lines, one of which is the CSV header).
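`wc -l` counts physical lines, so its total includes the header row and would over-count any quoted field that spans several lines. A minimal Python sketch that counts CSV records straight from the downloaded zip, assuming the archive member is named `free_company_dataset.csv` and is UTF-8 encoded (both are assumptions, and `count_records` is a hypothetical helper name):

```python
import csv
import io
import zipfile


def count_records(zip_path, member="free_company_dataset.csv") -> int:
    """Count CSV data rows inside the zip, excluding the header row.

    Unlike `wc -l`, csv.reader treats a quoted field that spans several
    physical lines as part of a single record.
    """
    with zipfile.ZipFile(zip_path) as zf:
        with zf.open(member) as raw:
            # Decode the binary member as text; newline="" lets csv handle line endings.
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            reader = csv.reader(text)
            next(reader, None)  # skip the header row
            return sum(1 for _ in reader)
```

Usage: `count_records("free_company_dataset.csv.zip")`. If no field contains an embedded newline, the result should equal the `wc -l` figure minus one.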

Next, I wanted to find out what fields this dataset has:

```
$ head free_company_dataset.csv
name,domain,year_founded,industry,size_range,locality,country,Professional Social Network_url,current_company_employee_estimate,total_employee_estimate
(le) poisson rouge,lprnyc.com,,entertainment,51-200,,,professionalsocialnetwork.com/company/-le-poisson-rouge,42,224
nearfox.com,nearfox.com,2015,internet,11-50,,,professionalsocialnetwork.com/company/zip-news,4,43
"mullin landscape associates, llc",,2007,construction,1-10,,,professionalsocialnetwork.com/company/mullin-landscape-associates-llc-,20,27
armatile,armatilearchitectural.com,1975,design,1-10,,,professionalsocialnetwork.com/company/armatile-limited,13,23
chameleon venues,,,marketing and advertising,1-10,,,professionalsocialnetwork.com/company/chameleon-venues,2,5
wagner kirkman blaine klomparens & youmans llp,wkblaw.com,1976,law practice,51-200,,,professionalsocialnetwork.com/company/wagner-kirkman-blaine-klomparens-&-youmans-llp,40,139
skilled engineering limited,,,insurance,1-10,,,professionalsocialnetwork.com/company/skilled-engineering-limited,1,31
gillette management llc,,,consumer goods,1-10,,,professionalsocialnetwork.com/company/gillette-management-llc,0,7
choice wood company,choicecompanies.com,1983,architecture & planning,1-10,,,professionalsocialnetwork.com/company/choice-wood-company,9,37
```

The column labels of the CSV file are:

  • name
  • domain
  • year_founded
  • industry
  • size_range
  • locality
  • country
  • Professional Social Network_url
  • current_company_employee_estimate
  • total_employee_estimate
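A minimal Python sketch that checks the column list by parsing a few of the lines from the `head` output above with the standard `csv` module. A real CSV parser matters here because names such as "mullin landscape associates, llc" contain commas inside quotes; `preview` is a hypothetical helper name, and nothing is assumed about the file beyond what `head` shows.

```python
import csv
from itertools import islice

# Header plus two rows copied from the `head` output above.
SAMPLE_LINES = [
    "name,domain,year_founded,industry,size_range,locality,country,"
    "Professional Social Network_url,current_company_employee_estimate,"
    "total_employee_estimate",
    "(le) poisson rouge,lprnyc.com,,entertainment,51-200,,,"
    "professionalsocialnetwork.com/company/-le-poisson-rouge,42,224",
    '"mullin landscape associates, llc",,2007,construction,1-10,,,'
    "professionalsocialnetwork.com/company/mullin-landscape-associates-llc-,20,27",
]


def preview(lines, n=5):
    """Return the column names and up to n data rows as dicts."""
    reader = csv.DictReader(lines)
    rows = list(islice(reader, n))
    return reader.fieldnames, rows


columns, rows = preview(SAMPLE_LINES)
print(len(columns))                   # 10
print(rows[1]["name"])                # mullin landscape associates, llc
print(rows[0]["year_founded"] == "")  # True: empty fields come back as ""
```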

Not bad. It has the most important fields, except a timestamp recording when each record was last updated.

How old is PDL's company profile dataset?

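PDL's dataset carries no update timestamp, so the only way to estimate its age is to sample some company profiles and make statistical inferences.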

I extracted the first 999 companies from the dataset and ran them through a bulk Professional Social Network company scraping script that I open-sourced [here](https://github.com/nubelaco/enrich-Professional Social Network-companies-in-bulk). The script uses Proxycurl's [Professional Social Network Company Profile API endpoint](https://nubela.co/proxycurl/Professional Social Network/company) to scrape and enrich each Professional Social Network company profile URL, provided the URL is still valid.
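The open-sourced script itself is not reproduced here, so the following is only a rough Python sketch of the same workflow, not that script: take the profile URL column from the first 999 rows, look each URL up, and split the results into valid and dead. The endpoint URL, the `url` query parameter, Bearer authentication, and "HTTP 404 means the profile is gone" are all assumptions to check against Proxycurl's documentation; `first_profile_urls`, `http_lookup`, and `split_valid` are hypothetical names.

```python
import csv
import json
import urllib.error
import urllib.parse
import urllib.request
from itertools import islice
from typing import Callable, Iterable, List, Optional, Tuple

URL_COLUMN = "Professional Social Network_url"  # column name from the CSV header

Lookup = Callable[[str], Optional[dict]]


def first_profile_urls(lines: Iterable[str], n: int = 999) -> List[str]:
    """Collect the profile URL column from the first n data rows."""
    reader = csv.DictReader(lines)
    return [row[URL_COLUMN] for row in islice(reader, n)]


def http_lookup(endpoint: str, api_key: str) -> Lookup:
    """Build a lookup backed by a company-profile HTTP endpoint.

    ASSUMPTIONS (check the provider's API docs): the endpoint takes the
    profile URL in a `url` query parameter, expects a Bearer token, and
    answers HTTP 404 when the profile no longer exists.
    """
    def lookup(profile_url: str) -> Optional[dict]:
        # The CSV stores URLs without a scheme, so one is added here.
        query = urllib.parse.urlencode({"url": "https://" + profile_url})
        request = urllib.request.Request(
            f"{endpoint}?{query}",
            headers={"Authorization": f"Bearer {api_key}"},
        )
        try:
            with urllib.request.urlopen(request, timeout=30) as response:
                return json.load(response)
        except urllib.error.HTTPError as err:
            if err.code == 404:
                return None  # treated as a dead profile
            raise
    return lookup


def split_valid(urls: Iterable[str], lookup: Lookup) -> Tuple[List[str], List[str]]:
    """Partition URLs into (valid, dead) by whether the lookup finds a profile."""
    valid: List[str] = []
    dead: List[str] = []
    for url in urls:
        (valid if lookup(url) is not None else dead).append(url)
    return valid, dead
```

Usage: pass the opened CSV file to `first_profile_urls`, then call `split_valid(urls, http_lookup(endpoint, api_key))` with the endpoint and API key taken from Proxycurl's documentation. Injecting the lookup as a plain function keeps the valid/dead counting testable without network access.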

Out of [999 companies](https://docs.google.com/spreadsheets/d/1vsjyQ1OssxQHLdm8g2e0ZccjfRqC72rCQ3_4G5mRBzI/edit?), only [835 companies](https://docs.google.com/spreadsheets/d/11a_JK1zTlS2b2a_ZBhES4f8gr-21vzpf_5XDf4J47Vc/edit?) returned results.

16.4% (164 of the 999 sampled companies) are no longer valid on Professional Social Network. Extrapolating that rate to the whole file, roughly 2,010,382 companies in PDL's free company dataset are dead.
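The extrapolation is simple arithmetic, reproduced in the short Python sketch below. The figure of 2,010,382 applies the rounded 16.4% to the raw `wc -l` count; applying the unrounded 164/999 to data rows only gives a slightly different number. The interval at the end is an addition that assumes the first 999 rows behave like a random sample of the file, which is not established, so treat it as rough.

```python
import math

TOTAL_LINES = 12_258_431   # `wc -l` output, header line included
SAMPLED, FOUND = 999, 835  # rows sent to the API, profiles that resolved

dead = SAMPLED - FOUND     # 164 dead profiles
rate = dead / SAMPLED      # about 0.164
rows = TOTAL_LINES - 1     # data rows only

# 16.4% (as 164/1000) applied to the raw line count.
post_estimate = TOTAL_LINES * 164 // 1000       # 2,010,382

# Unrounded sample rate applied to data rows only.
exact_estimate = round(rows * dead / SAMPLED)   # 2,012,395

# 95% normal-approximation interval for the dead rate, scaled to the file.
# Only meaningful if the sampled rows are representative (an assumption).
se = math.sqrt(rate * (1 - rate) / SAMPLED)
low, high = rate - 1.96 * se, rate + 1.96 * se

print(f"{rate:.1%} dead; about {exact_estimate:,} companies "
      f"(95% CI {low * rows:,.0f} to {high * rows:,.0f})")
```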

With roughly one in six sampled profiles gone, I conclude that this dataset is super old.

Why is PDL offering you an outdated Company Profile Dataset for free?

Because anyone interested in a big dataset is an ideal customer. In exchange for the free download, PDL collects your personal and contact information so it can upsell you later.

Our turn - 17M companies in Proxycurl's LinkDB, our profile database

What about our dataset?

In January, we commissioned a crawl of all public Professional Social Network company profiles. I am happy to share that LinkDB now has more than 17M company profiles. We used [Proxycurl's Professional Social Network Company Profile API endpoint](https://nubela.co/proxycurl/docs#Professional Social Network-company-profile-endpoint) to accomplish this feat.

These company profiles were updated just a few days ago and are up to date at the time of writing. They will stay up to date because we will keep refreshing them.

Fields in Proxycurl's Company Profile Dataset

The following fields represent companies in our dataset:

  • Professional Social Network_internal_id
  • description
  • website
  • industry
  • company_size
  • company_size_on_Professional Social Network
  • HQ
  • company_type
  • founded_year
  • specialties
  • locations
  • name
  • tagline
  • universal_name_id
  • funding_data
  • search_id
  • similar_companies
  • follower_count

Yes, our dataset has a lot more fields.

In summary: Proxycurl VS PeopleDataLabs - Company Profile Dataset

| | Proxycurl Company Profile Dataset | PDL Company Profile Dataset |
|---|---|---|
| Profiles | 17M | 12.25M |
| Last updated | 25th January 2021 | Many years ago |
| Fields | Standard fields + description, headquarters location, company type, specialties, locations, profile picture, similar companies, Professional Social Network follower count | Standard fields |
| Dead profiles | 0% | 16.4% |
| Data updates | Monthly | None |

Proxycurl's Global Company Profile Dataset is available now.

  • Please don't take my word for it. Try it yourself. If you register and log in to Proxycurl, you will get access to LinkDB, our PostgreSQL server, which contains Proxycurl's Global Company Profile Dataset. Run a few queries and sample the data for yourself :)
  • Yes, we do sell a snapshot of our global company dataset. Keen? Email me at hello@nubela.co.