Using Proxycurl's Historic Professional Social Network Employee Count Tool for Investment Research
March 22, 2023
9 min read
What is Proxycurl's Historic Professional Social Network Employee Count Tool, and why is it useful?
Proxycurl's Historic Professional Social Network Employee Count Tool counts employees during historical snapshots and provides a trend line dating as far back as the user wants (with the caveat that it's only accurate as long as there are no major statistical confounding factors, such as the company being in stealth). This might already sound awesome to you, but let's talk about why you might want to use this tool.
First of all, we have to understand that Professional Social Network employee data is a good estimate for total employee data: We're mostly interested in the tech sector in the English-speaking world, where Professional Social Network is the ubiquitous social media platform for sharing employment status. And we won't be looking back more than a few years in time, so Professional Social Network's status as such won't be in question. We'll also apply common sense to the results; if our trend line goes down to 0 and news stories said that the company in question was in stealth but existed, we'll believe the news stories, not our trend line.
Example: HireEZ
Let's now look at an example. Here's a graph of HireEZ's extrapolated employee counts. What can we learn from it?
A couple things stand out:
- HireEZ had a huge period of growth in headcount from about December 2020 - February 2021, continuing even through July 2021.
- HireEZ had another burst of hiring around February 2022.
- The recent tech layoffs have not left them alone, and they've had a downturn in headcount since October 2022.
Without this data, what methods might we use to look for the health of a company? If we were investors in HireEZ, we would have access to investor reports, but we're outsiders. Well then, we can ask, what are investors doing? Let's look at HireEZ's funding rounds. In August 2020, HireEZ (then known as Hiretual) raised a $13 million Series B round, and in February 2022, they raised another $26 million.
These dates don't correlate exactly to our data - it looks like they started hiring a bit after the first funding round that we saw - but it's a good estimate, and it looks like they used these injections of capital to fund hiring.
A quick search doesn't pull up results for "hireez layoffs," but we can make inferences. Is this company doing well now? Of course, this question must be put into context with the entire market going through a downturn right now, but these graphs give us some intuition.
Example: Backblaze
In contrast to the privately-held HireEZ, Backblaze is public - they IPO'd in November 2021. Public companies have an indicator of their health that's unavailable for private companies: their stock prices. But how does this correlate to employee count? And which is the better signal of company health?
Here's a screenshot of Backblaze's stock since their IPO, courtesy of [Google Finance](https://www.google.com/finance/quote/BLZE:NASDAQ?).
And here's their employee count, which has risen pretty steadily, with a very small reduction in the past couple months.
What do you think about their current health? What does this graph tell you that the stock price didn't? Have you learned something new?
I'm in. How can we make graphs like this?
Obviously, we have to scrape Professional Social Network in some way, as Professional Social Network doesn't provide an API. If you're a regular reader of this blog or already use the Proxycurl API, it's great to have you back! If not, you may want to read this post about scraping Professional Social Network for structured data.
In this particular case, though, there's actually another option that we could theoretically use, which you might be familiar with: Professional Social Network's in-house Premium Business Insights. However, if you want to analyze this at scale (which we do!) then it would be a TOS violation to scrape data using this tool, and you'd risk your Professional Social Network account being banned. So we are back to the Proxycurl API, built for developers looking to scrape Professional Social Network.
There is no API endpoint where we can say `requests.get(some_endpoint, {})` and have it return what we want. But that's okay, and it's why the rest of this post will be a lot of fun. We have built a tool that uses three endpoints to perform a calculation that approximates the result. In the rest of this post, I will introduce you to the Proxycurl Historic Employee Count Tool.
Want to run it yourself? It's available as a docker container. To download, run:
docker pull ghcr.io/nubelaco/historic-employee-count-tool:master 
And to execute, run:
docker run -it ghcr.io/nubelaco/historic-employee-count-tool:master PROXYCURL_API_KEY TARGET_COMPANY_LI_URL > employee_count_history.csv 
You can also clone the repository yourself if you want to edit the Python code or you're not comfortable with Docker, and there's some additional documentation available in that repo's README that's not covered here.
How to build a historic Professional Social Network employee count tool
High-level overview of the tool
- Grab the current month's total employee count from Professional Social Network. This is very easy.
- For each of the past N months, grab a snapshot of the number of employees with public profiles on Professional Social Network (since this is all that's available to us). This is a little trickier, but still doable.
- Use steps 1 and 2 to calculate X in the following ratio for each month: previous snapshot : X = current month's snapshot count : current month's total as reported by Professional Social Network.
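To make the proportion concrete, here is a minimal sketch of solving it for X. The function name `estimate_total` and the numbers are illustrative assumptions, not taken from the tool:

```python
def estimate_total(previous_snapshot: int, current_snapshot: int, current_total: int) -> int:
    """Solve previous_snapshot : X = current_snapshot : current_total for X."""
    if current_snapshot == 0:
        # With no public profiles this month there is nothing to scale by.
        return 0
    return int(previous_snapshot * current_total / current_snapshot)


# Hypothetical numbers: 400 public profiles a year ago, 500 public profiles
# now, and a current total of 1000 reported by the platform.
estimated = estimate_total(previous_snapshot=400, current_snapshot=500, current_total=1000)
print(estimated)
```

In other words, each past snapshot is scaled by the same public-to-total ratio that we observe in the current month.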
Proxycurl endpoints we will use
- Employee Listing Endpoint - One of the company endpoints, this lists every employee in a company and gives links to their profile URLs. In the Proxycurl API, a Professional Social Network URL is always the unique identifier of an entity, be it company, person, job, or anything else.
- Employee Count Endpoint - Another one of the company endpoints, this does exactly what it says: It gives us a count of the employees employed by the company. You can get cached information from Proxycurl in the form of a `linkdb_employee_count`, which can cover either past, current, or all employees. For this tool, we're more interested in the `Professional Social Network_employee_count`, which is scraped directly from Professional Social Network and includes private profiles.
- Person Profile Endpoint - This endpoint is optional; it serves as a performance enhancement and cache invalidator. We could, if we wanted, use the first endpoint with the `enrich_` option instead. That endpoint would then enrich our first query with the person endpoint profile results. But for performance, we can batch our queries here & use the async [Proxycurl Python client library](https://pypi.org/project/proxycurl-py/) to run the script a bit faster - and with the `use_` flag, which lets us ensure our data is never more than 29 days out of date.
How to implement the high-level overview
- Query data from these endpoints.
- Create an array of `datetime.date` intervals using `timedelta` and `calendar.monthrange` (honestly, this would probably have been the hardest part of the entire project had I not done something nearly identical several years ago).
- Check when the `experiences.start` and `experiences.end` ranges intersect the month intervals we created in step 2, for every employee whose profile we could fetch.
- Progressively accumulate these intersections into an array of `total_employee_ranges`.
- Calculate, for each month, X in our ratio from the high-level overview:

      # Requires: from typing import List
      @staticmethod
      def get_adjusted_employee_counts(past_employee_counts: List[int], current_employee_count: int) -> List[int]:
          # The next two assignments were cut off in the original listing.
          # Assumption: slot 0 holds the current month's snapshot count.
          current_employee_estimate = past_employee_counts[0]
          adjusted_employee_counts = []
          for item in past_employee_counts:
              if current_employee_estimate == 0:
                  # Avoid dividing by zero when no current profiles were found.
                  adjusted_employee_counts.append(0)
                  continue
              adjusted_employee_counts.append(int(item * current_employee_count / current_employee_estimate))
          return adjusted_employee_counts

- Construct a csv to print to the user.
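Steps 2 through 4 can be sketched roughly as follows. This is not the tool's actual code: the function names, the `MonthRange` alias, and the representation of each experience as a `(start, end)` pair of dates (with `None` meaning "still employed") are all assumptions made for illustration.

```python
import calendar
from datetime import date, timedelta
from typing import List, Optional, Tuple

MonthRange = Tuple[date, date]


def month_ranges(today: date, months_back: int) -> List[MonthRange]:
    """Return (first_day, last_day) pairs, newest first; slot 0 is the current month."""
    ranges = []
    first = today.replace(day=1)
    for _ in range(months_back + 1):
        # calendar.monthrange returns (weekday_of_first_day, days_in_month).
        last = first.replace(day=calendar.monthrange(first.year, first.month)[1])
        ranges.append((first, last))
        # Step back one day from the 1st, then snap to the 1st of that month.
        first = (first - timedelta(days=1)).replace(day=1)
    return ranges


def overlaps(start: date, end: Optional[date], month: MonthRange) -> bool:
    """True if a job running from start to end touches the given month."""
    month_start, month_end = month
    return start <= month_end and (end is None or end >= month_start)


def count_per_month(experiences: List[Tuple[date, Optional[date]]],
                    months: List[MonthRange]) -> List[int]:
    """Count how many experiences intersect each month."""
    return [sum(overlaps(s, e, m) for s, e in experiences) for m in months]


months = month_ranges(date(2023, 3, 22), months_back=2)
experiences = [
    (date(2022, 6, 1), None),               # still employed
    (date(2023, 2, 15), date(2023, 2, 20)),  # short stint in February
    (date(2020, 1, 1), date(2023, 1, 10)),   # left in January
]
print(count_per_month(experiences, months))
```

Note that this sketch counts experiences rather than people, so someone with two overlapping roles at the same company would be counted twice; a real implementation would need to decide how to handle that.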
Gotchas
There are a few gotchas we have to address:
- The proxycurl-py library raises and logs an exception on profiles that 404, but in this particular case we are expecting some profiles not to exist, so we need to catch & silence this particular exception.
- In some cases, users will link to an internationalized URL of the company, for example https://pl.professionalsocialnetwork.com. We will match against only the last portion of the URL (keeping in mind there could be a trailing slash), or:

      @staticmethod
      def identifier(url: str) -> str:
          # Reconstructed line (missing from the original listing): split on slashes.
          arr = url.split('/')
          # A trailing slash leaves an empty last element, so fall back one slot.
          return arr[-2] if arr[-1] == '' else arr[-1]

- Off-by-one: What to do about the current month? We don't want to print it to the user, since it's going to be incomplete. But in our proportion of previous snapshot : X = current month's snapshot count : current month's total as reported by Professional Social Network, the entire RHS (right-hand side) must refer to the CURRENT month. So we do have to include the current month in our data set. The choice we made, therefore, was to include the current month in our months object, but then not print it to the user at the end:

      for i, month in enumerate(self.month_ranges):
          if i == 0:
              # Recall we used the first slot for current information, and not for a full month of data.
              continue
          o.append(f",{adjusted_employee_accounts[i]}")
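For the first gotcha, one possible pattern is to wrap each fetch so that a "not found" error becomes `None` and is filtered out afterwards. The sketch below is generic asyncio code, not the proxycurl-py API: `ProfileNotFound` and `fetch_profile` are hypothetical stand-ins, and silencing the library's own log output would need separate handling.

```python
import asyncio
from typing import Dict, List, Optional


class ProfileNotFound(Exception):
    """Hypothetical stand-in for the error a client raises on a 404."""


async def fetch_profile(url: str) -> Dict[str, str]:
    """Hypothetical fetcher; a real one would call the Person Profile Endpoint."""
    await asyncio.sleep(0)
    if url.endswith("missing"):
        raise ProfileNotFound(url)
    return {"url": url}


async def fetch_or_none(url: str) -> Optional[Dict[str, str]]:
    try:
        return await fetch_profile(url)
    except ProfileNotFound:
        # Expected for some profiles, so skip quietly instead of failing.
        return None


async def fetch_all(urls: List[str]) -> List[Dict[str, str]]:
    # Run all fetches concurrently, then drop the ones that did not exist.
    results = await asyncio.gather(*(fetch_or_none(u) for u in urls))
    return [r for r in results if r is not None]


profiles = asyncio.run(fetch_all([
    "https://example.com/in/a",
    "https://example.com/in/missing",
]))
print(len(profiles))
```

Catching only the one expected exception type keeps genuine failures, such as network errors, visible.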
Do you have to query the entire company to get meaningful results?
Probably not. We tested on Stripe, which Professional Social Network lists as having 8003 employees, and the `linkdb_employee_count` from the Employee Count Endpoint gives 8766 when `employment_status` is set to `all`. If we query only 3000 of these, or a bit over a third, we can get a pretty accurate picture of the trend line, and better yet, our script takes only about 6 minutes to run. Here's a graph with various limits:
In the graph above, the solid blue line shows the trend if we didn't do the extra work involved to use the Person Profile Endpoint at all, and simply used the employee listing endpoint with the `enrich_` option. As you can see, the trend is still mostly visible; however, this method is significantly slower than the async Person Profile Endpoint method, and we don't recommend it.
The orange heavy dashed line represents the most accurate data; this line was generated by using all company data. The next two lines show the query limited to 3000 and 1000 employees, respectively, and finally we showed a query limited to 500. We chose 3000 as a default, but you can use this chart to guide your decision based on the size of the company you're querying and how precise a result you want.
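You might wonder: is there any bias in the ordering of the sample data? The Proxycurl API orders by Professional Social Network ID, so hopefully employees aren't leaving and joining based on alphabetical order, but just in case they are, this sample is being determined randomly:

    # Requires: from random import shuffle; from typing import List
    @staticmethod
    def get_limited_sample_of_urls(past_employee_urls: List[str], limit: int) -> List[str]:
        # A limit of -1 (assumed to mean "no limit") or one that covers everything returns all URLs.
        if limit == -1 or limit >= len(past_employee_urls):
            return past_employee_urls
        # Reconstructed setup (missing from the original listing): shuffle the indices.
        ordering = list(range(len(past_employee_urls)))
        shuffle(ordering)
        ret = []
        for i in range(limit):
            ret.append(past_employee_urls[ordering[i]])
        return ret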
More data is coming
Intrigued? We have more of these. Stay in touch to make sure you don't miss out by [subscribing to our newsletter](https://sendy.nubela.co/subscription?). Or sign up for our API and build this project yourself - if you identify something interesting, give us a shout at hello@nubela.co & maybe we'll feature you in another post.