Creating a Lead Scoring Model With HubSpot (and Validating it With R)


A reliable lead scoring model is a sales Holy Grail.

Lead scoring allows you to know where each lead is within their journey from discovering your service to closing, at scale.

Of course, the perfect lead scoring model doesn’t exist. All we can do as marketers is to create the closest thing to it, thanks to our knowledge of:

  • our target audience,
  • our marketing stack (the content you have out there, your company presence, etc.),
  • our ideal customer, and,
  • the ideal customer journey.

In this post, I will lay out the strategy I’ve used to craft a lead scoring model using HubSpot’s native tool. Since this is an imperfect process, I’ll also show you how I’ve used very basic R to validate the different lead scoring iterations I went through.

Let’s get started.

Scoring leads is personal

First, I must reinforce one important point: my lead scoring system will not fit your business. And vice versa.

Scoring leads is extremely personal. If you scroll back up to the 4-point list above, you’ll know why. Only you know (even if partially) who your customers are, what your company does, who you are targeting, and how you move your customers through a journey.

The fact is, there is no perfect, universal lead scoring model. This is also the reason why I’m showing you how to use R to validate whatever you come up with.

Expected score distribution across all leads

Before we embark on creating the lead scoring model, it’s worth taking a moment to think about what the distribution would look like.

This is purely a mental and theoretical exercise.

Picture your leads and what you believe to be qualifying information about them. My business is B2B, and my website has a lot of top-of-the-funnel (TOFU) content. I therefore expect my leads to be split as follows:

  1. Lots of unqualified leads. I expect that the vast majority of people who have entered the minimum details required on my website are nowhere near qualified enough to be sent to a salesperson. They most likely downloaded a piece of content only to never be seen again. We have loads of those.
  2. A few slightly qualified leads. I expect my website’s TOFU content will impact a few leads and start driving them down the qualification path. They would have downloaded a piece of content and browsed the website a bit.
  3. A moderate number of almost qualified leads. These people would have downloaded one or two pieces of content, browsed the website, and maybe even signed up to our newsletter.
  4. A small number of qualified leads. This number is getting smaller and smaller as they get more qualified. These are the golden tickets for our sales team and, unsurprisingly, are few.

I expect the score distribution across my entire database of leads to look like this:

[Image: lead scoring expected distribution]

My beautiful drawing of the expected score distribution across my contacts. It looked better on paper.

Take a minute to ponder and be honest with yourself. What would your graph look like? This is important for later on when we’ll validate our findings.

First attempt at lead scoring: key actions

For my first attempt, I decided to tackle scoring leads by key actions taken on my site.

Since I run a B2B website, the main key action I want my leads to take is to submit an enquiry form. This allows our sales team to reach out to them.

Secondary actions would be to download content. Usually, by doing this, the leads give us quite a bit of information that allows us to paint a decent picture of their potential.

Finally, I work for a UK company and therefore any lead that does the above while being in the UK is great for us.

So, using HubSpot’s native manual lead scoring tool (Settings -> Marketing -> Lead Scoring), I built the following model:

  • Submit an enquiry: +10
  • Download a piece of content: +10
  • Country is UK: +10

And a few more small tasks for 2 points each. I also added negative scores for any leads outside of the UK, US, or Canada, as well as any leads that use a non-company email address (Hotmail, Gmail, AOL, etc.).

The max possible score using this model is 42.
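
To make the arithmetic concrete, here is a minimal R sketch of the three main rules above. The function and argument names are hypothetical (they are not HubSpot property names), and the negative scores are left out because their exact values aren't listed here.

```r
# Hypothetical sketch of the first model's positive rules.
# Argument names are illustrative, not HubSpot property names.
score_first_model <- function(submitted_enquiry, downloaded_content,
                              country, small_tasks_done = 0) {
  score <- 0
  if (submitted_enquiry)  score <- score + 10  # Submit an enquiry: +10
  if (downloaded_content) score <- score + 10  # Download a piece of content: +10
  if (country == "UK")    score <- score + 10  # Country is UK: +10
  score + 2 * small_tasks_done                 # small tasks: 2 points each
}

score_first_model(TRUE, TRUE, "UK")   # all three main rules met
score_first_model(FALSE, TRUE, "FR")  # only a content download
```

With all three main rules met, the function returns 30; the remaining 12 points of the 42-point maximum come from the 2-point tasks.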

Great. Now what?

HubSpot’s native scoring tool

HubSpot’s native scoring tool is great. It’s easy to use and, when set up properly, can really deliver insights into the health of your lead database.

Once you’ve set up a scoring system, it allows you to check its viability by pushing a test contact through.

[Image: hubspot lead scoring test contact tool]

HubSpot’s native lead scoring test tool allows you to pick a single contact to parse through your model.

You can either enter a contact manually or find one through the list, and get that person’s score back. You can then use this number to find out if the scoring model makes sense.

For instance, if you know that one of your leads is a fantastic opportunity for you, you can enter that email address and see what comes back. Close to max number? Great! Close to zero? There might be a problem.

This is all quite useful but unfortunately very limited. Here’s why.

  1. You can’t see the impact your scoring has had on your entire lead database.
  2. You can’t possibly test all your contacts one by one to make sure you’re on the right track.
  3. Your scoring model is very hard to picture beyond what HubSpot displays.

So, to remedy this and validate our model, we’re going to export all our contacts and plot their lead score distribution. This will give us an overview of the impact our model has had and will tell us whether we’re on track to match our expectations.

Exporting lead score data in HubSpot

Easy peasy.

Go to Contacts -> Lists.

Create a new list (I called mine ‘playing with lead scores’).

Make the list static (you want a fixed point in time).

Under Contact properties, look for HubSpot score. Define it as ‘known’.

This should give you a list of all your contacts and their lead scores.

Important: it takes a bit of time for HubSpot to populate your list. It also takes them a few minutes to apply your scoring model to your entire database. Go make a coffee and take your time. You definitely don’t want to be working on partial data.

Once you’re done, click Actions -> Export. Make sure you have the HubSpot score selected as a column (you don’t need anything else). Export in CSV.

Validating your HubSpot lead scoring model with R

We are going to use R to render a representative graph of the score distribution across all our leads. I’m aware this could also be done in Excel and a myriad of other tools; R is just the one I chose.

Preparing your data

First, we’ll clean up the spreadsheet. Open your CSV file and remove all columns except the HubSpot Score one. Make sure you keep the column name on row 1.

[Image: hubspot lead scores in csv]

Strip out your export from HubSpot to just the scores.

And… that’s it. Fairly easy step.
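
If you would rather not edit the file by hand, the same clean-up can be sketched in R. This assumes the export's header is "HubSpot Score", which read.csv turns into HubSpot.Score; the function name and file names are placeholders of mine.

```r
# Keep only the score column from a HubSpot contact export.
# Assumes the CSV header is "HubSpot Score"; read.csv() converts
# the space to a dot, giving the column name HubSpot.Score.
keep_score_column <- function(path_in, path_out) {
  contacts <- read.csv(path_in, stringsAsFactors = FALSE)
  stopifnot("HubSpot.Score" %in% names(contacts))
  scores_only <- contacts["HubSpot.Score"]  # one-column data frame
  write.csv(scores_only, path_out, row.names = FALSE)
  invisible(scores_only)
}

# Example (placeholder file names):
# keep_score_column("hubspot-export.csv", "hubspot-scores.csv")
```

The function stops with an error if the score column is missing, which is a cheap way to catch an export made without the HubSpot score selected.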

Visualising the score distribution with R

In your preferred R environment (I use RStudio), we’re now going to pull all the scores and arrange them on a distribution graph.

This is the code I used:

# Load the CSV export into my environment (a file picker will open).
hubspot_scores <- read.csv(file.choose(), stringsAsFactors = FALSE)

# Pull the scores out as a vector
scores <- hubspot_scores$HubSpot.Score

# Count how many contacts have each score
counts <- table(scores)

# Plot the number of contacts per score
barplot(counts, ylim = c(0, 1200), col = "#FE4B34",
        main = "HubSpot Lead Score Distribution",
        xlab = "Scores", ylab = "Number of contacts")

# For testing purposes, this allows me to
# pick a specific score (here, 35) and return
# the number of contacts with that score.
count35 <- sum(hubspot_scores$HubSpot.Score == 35, na.rm = TRUE)

Note: I’m only just starting to learn R. If there’s a better way to do it, please drop a comment below — I would love to learn from you.
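
One possible variation on that last step, offered as a sketch rather than the "right" way: wrap the count in small helper functions so you can check any score or threshold. The function names are made up for this example, and the vector at the end is dummy data, not my real scores.

```r
# Number of contacts with exactly this score (missing scores ignored).
count_with_score <- function(scores, value) {
  sum(scores == value, na.rm = TRUE)
}

# Share of contacts at or above a threshold, between 0 and 1.
share_at_or_above <- function(scores, threshold) {
  mean(scores >= threshold, na.rm = TRUE)
}

# Example with a small made-up vector, not real data:
example_scores <- c(0, 0, 0, 2, 10, 10, 35)
count_with_score(example_scores, 10)
share_at_or_above(example_scores, 10)
```

On a real export you would pass hubspot_scores$HubSpot.Score instead of the example vector.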

When I ran this code with the data from my first iteration, this is what I got:

[Image: hubspot lead scoring distribution with R]

Our first lead scoring iteration is not looking consistent.

Referring back to what I expected to see, this clearly isn’t it. The chart is all over the place and very uneven. It doesn’t make sense.

This suggests I haven’t attributed the right scoring structure to my contacts. Let’s try again.

🔥Hot tip🔥

Adjust your Y axis for clearer insights

When dealing with data, even for something as simple as this little exercise, clustering is everything.

So, a simple but 🔥H O T🔥 tip: adjust your Y axis. In the code above, modify the second number of the ylim argument (1200 in my case) in the barplot call, so that it sits just above your tallest bar.

Your data will 100% be different than mine, so please don’t forget to do this.

Oh, and you’ll have to tweak that number again when you iterate on your model.
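
If you'd rather not adjust that number by hand on every iteration, one option is to derive it from the data. This is a sketch, assuming your scores are in a numeric vector; the function name and the 10% headroom are my own choices.

```r
# Plot the score distribution with a Y axis sized to the data.
plot_score_distribution <- function(scores, headroom = 1.1) {
  counts <- table(scores)
  y_max <- ceiling(max(counts) * headroom)  # tallest bar plus some headroom
  barplot(counts, ylim = c(0, y_max), col = "#FE4B34",
          main = "HubSpot Lead Score Distribution",
          xlab = "Scores", ylab = "Number of contacts")
  invisible(y_max)  # returned so you can see which limit was used
}

# Usage: plot_score_distribution(hubspot_scores$HubSpot.Score)
```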

Second lead scoring iteration: progressive approach

My theory behind the clearly poor score distribution we see above is simple. It looks like I tied clear cut scores to specific actions, without any progression at all.

For instance, I attributed 5 points to users who visited more than 5 pages on my site, but no points at all for anything less.

What about the people who visit 2 pages? What if these 2 pages are product pages?

Clearly, there’s a progression there I could fix. Here’s what I’ve done:

  1. Progress score from 1 to 5 points for visiting pages (1-2 pages is 1 point, 3-4 pages is 2 points, etc.).
  2. Progress score from 1 to 5 points for clicking marketing emails (same progression as above).
  3. Progress score from 1 to 5 for downloading documents (same progression as above).
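
The progression can be written as a small R function. This is only a sketch: the list above spells out the first two steps (1-2 is 1 point, 3-4 is 2 points), so continuing in steps of two up to a cap of 5 points is an assumption, as is giving 0 points for no activity. The function name is hypothetical.

```r
# Progressive points for a count of actions (pages viewed, email clicks,
# or downloads). Assumes steps of two capped at 5 points:
# 0 -> 0, 1-2 -> 1, 3-4 -> 2, 5-6 -> 3, 7-8 -> 4, 9+ -> 5.
progressive_points <- function(n_actions, step = 2, max_points = 5) {
  pmin(max_points, ceiling(n_actions / step))
}

progressive_points(c(0, 1, 2, 3, 4, 9, 20))
```

Under this assumption, a contact can earn at most 15 points from these three rules (5 each for pages, email clicks, and downloads).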

I also realised I made a rookie mistake in my first attempt. I tied a key marketing action (filling in the enquiry form) to a bunch of points.

This is silly because my scoring system’s purpose is to show me who in my contact database is getting warmer — not the ones who are already hot and ready to speak to sales.

Once a contact fills it out, the game is already won. It’s not a key action that requires points, it’s the end goal. So I removed this action from my model, and I also removed any contact who has filled in my enquiry form from the contact list.

Right, let’s run this again.

Validating using R (again)

Following the same steps as above, I downloaded my list of contacts and threw my data into my R script.

This is what came out.

[Image: lead scoring with hubspot distribution in R]

This new lead scoring model shows a much more consistent distribution.

Now that’s more like it!

We can clearly see a massive spike on the left of all the unqualified contacts. That’s fine, I expect to have loads of those (and so should you).

Then, we see a clear bell-shaped distribution. Few are barely qualified, some are a little bit qualified, then it goes back all the way down towards super-frickin-warm-lead.

Final thoughts

  • Creating your own lead scoring model is more valuable than relying on a ready-made feature that comes with your CRM. You know what a good lead looks like. Dig into it.
  • Don’t forget to remove your end goal from your scoring. You’re trying to proactively find warm leads to help nudge in the right direction. Once they’re over the line, they shouldn’t be marketing’s focus anymore.
  • You’re never done. I’m not done. This is probably not the ideal scoring system. Talk with your sales team regularly, check the quality of the warm leads your model is giving them, and tweak.
  • Start small. You can really go as complex and granular as you’d like with this, but I suggest you don’t. Start small, extract insights, talk with sales, and enhance.
  • Once you’re ready to go bigger better faster stronger, read Lindsay’s article on lead scoring in HubSpot. She gives great tips on advanced lead scoring like list segmentation, internal reporting, personas, and more.
  • If you work at HubSpot (or know someone who does), I’d be delighted if you added a feature that displays your score distribution from within your score building tool. I’ll even let you use my super advanced code up there, free of charge. Just the kind of guy I am.

Thank you for reading. If you found this article (or any parts of it) useful, it would mean a lot to me if you could share with a friend.



Digital marketer. Company grower. Bearded internet warrior. I grow projects into profitable businesses.
