How to score a startup idea: a framework that forces honest answers.

Scoring a startup idea is useful only if the framework forces you to answer questions you would rather avoid. Most scorecards do the opposite: they are structured to produce an encouraging result. Here is one that is not.

By Farzan Ansari · 8 min read · Strategy

Startup idea scoring: the framework, the common mistakes, and the evidence that separates a defensible answer from a confident one.

Every founder has a way of evaluating startup ideas. Most of those methods share a fatal flaw: they are designed to produce a positive outcome for the idea the founder already wants to pursue. The questions are framed generously. The assumptions are optimistic by default. The result is a structured confirmation of what the founder already believed. The fix is a scoring framework with explicit kill criteria set before the analysis starts.

A scoring framework that is actually useful has to be designed to produce failures. It should be easy for a weak idea to score low and hard for a weak idea to score high. The questions should probe the assumptions most likely to be wrong, not the assumptions most likely to be right.

The framework below scores across five dimensions: market, buyer, economics, competition, and team. Each dimension has three questions. Each question is scored 0 (cannot answer), 1 (weak evidence), 2 (some evidence), or 3 (strong evidence). A total score above 33 out of 45 indicates a strong foundation. A score below 25 indicates at least one dimension needs significantly more work before the idea is ready to build.
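The arithmetic behind the 45-point total can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names and the example scores are hypothetical, and only the dimension names, the three-questions-per-dimension structure, and the 0 to 3 scale come from the framework described above.

```python
from typing import Dict, List

DIMENSIONS = ("market", "buyer", "economics", "competition", "team")
QUESTIONS_PER_DIMENSION = 3
MAX_PER_QUESTION = 3  # 0 = cannot answer, 3 = strong evidence
MAX_TOTAL = len(DIMENSIONS) * QUESTIONS_PER_DIMENSION * MAX_PER_QUESTION  # 45


def dimension_totals(answers: Dict[str, List[int]]) -> Dict[str, int]:
    """Sum the question scores in each dimension, rejecting malformed input."""
    totals = {}
    for dim in DIMENSIONS:
        scores = answers[dim]
        if len(scores) != QUESTIONS_PER_DIMENSION:
            raise ValueError(f"{dim}: expected {QUESTIONS_PER_DIMENSION} scores")
        if any(s not in range(MAX_PER_QUESTION + 1) for s in scores):
            raise ValueError(f"{dim}: each score must be 0, 1, 2, or 3")
        totals[dim] = sum(scores)
    return totals


def total_score(answers: Dict[str, List[int]]) -> int:
    """Total across all five dimensions, out of MAX_TOTAL."""
    return sum(dimension_totals(answers).values())


# Hypothetical scores, for illustration only.
example_answers = {
    "market": [2, 3, 2],
    "buyer": [1, 1, 0],
    "economics": [2, 2, 1],
    "competition": [3, 2, 2],
    "team": [3, 2, 3],
}

per_dimension = dimension_totals(example_answers)
total = total_score(example_answers)
```

Keeping the per-dimension totals alongside the overall number makes it obvious which dimension is dragging the score down, which in this made-up example is the buyer dimension.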

Dimension one: market

Question 1: Can you state the TAM using a bottom-up model built from named data sources, not a research firm headline?

Question 2: Is the segment you are targeting growing at more than 5 percent annually, based on a sourced trend figure from the past 24 months?

Question 3: Is the SAM large enough to support the revenue target in your 5-year plan at a realistic market penetration rate (5 percent or less)?

The market dimension is designed to fail ideas built on top-down market sizing or undifferentiated TAM claims. A founder who cannot build a bottom-up TAM from public data sources should score 0 or 1 on the first question, which immediately signals that the market thesis is not yet validated.
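A bottom-up model and the penetration check from the third market question reduce to simple arithmetic. The sketch below assumes the simplest possible model (number of accounts times annual price); the function names and every figure are hypothetical placeholders, not data from any source.

```python
def bottom_up_market(accounts: int, annual_price: float) -> float:
    """Market size as (number of buying accounts) x (annual price per account)."""
    return accounts * annual_price


def required_penetration(revenue_target: float, sam: float) -> float:
    """Share of the SAM that must be captured to hit the revenue target."""
    if sam <= 0:
        raise ValueError("SAM must be positive")
    return revenue_target / sam


# Hypothetical inputs: replace each with a figure from a named data source.
tam = bottom_up_market(accounts=40_000, annual_price=12_000.0)
sam = bottom_up_market(accounts=8_000, annual_price=12_000.0)

# Year-5 revenue target from the plan (also hypothetical).
penetration = required_penetration(revenue_target=4_000_000.0, sam=sam)

# The market question treats 5 percent or less as a realistic penetration rate.
passes_penetration_check = penetration <= 0.05
```

With these made-up inputs the plan needs roughly 4.2 percent of the SAM, which clears the 5 percent bar; a target of 10 million on the same SAM would not.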

Dimension two: buyer

Question 1: Can you describe your buyer by job title, company size, industry, and geography, and explain precisely why they are the right buyer (rather than an adjacent role)?

Question 2: Does your buyer control or heavily influence the budget for a purchase of the size you intend to charge?

Question 3: Have you spoken to at least five people who match the buyer description, and did more than three of them describe the specific problem you are solving without prompting?

The buyer dimension is where most idea evaluations are weakest. "Enterprise companies" is not a buyer description. "The VP of Revenue Operations at a Series B to D B2B SaaS company with between 50 and 300 employees" is a buyer description. The third question is deliberately structured to require primary research, which most founders skip.

Dimension three: economics

Question 1: Can you state the CAC for your planned go-to-market motion, sourced from benchmarks for comparable companies, not from a bottom-of-funnel assumption?

Question 2: Does your LTV to CAC ratio exceed 3:1 at the price point you intend to charge and a realistic churn rate for your segment?

Question 3: At 24-month customer counts that are achievable with your initial funding, do the unit economics reach a positive contribution margin?

The economics dimension forces a unit economics calculation with specific numbers. It is easy to describe a business that works at scale. The relevant question is whether it works at the customer counts you can reach in 24 months. Many startup models look viable at 500 customers and unviable at 50 customers, which is where you actually start.
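The gap between 50 and 500 customers is easy to see with numbers. The sketch below makes two assumptions that are not from the framework itself: LTV is approximated as monthly margin per customer divided by monthly churn, and a fixed monthly serving cost (support, infrastructure) is counted against contribution. All names and figures are hypothetical.

```python
def ltv(monthly_price: float, monthly_variable_cost: float, monthly_churn: float) -> float:
    """Simplified lifetime value: monthly margin per customer / monthly churn rate."""
    if not 0 < monthly_churn <= 1:
        raise ValueError("monthly_churn must be in (0, 1]")
    return (monthly_price - monthly_variable_cost) / monthly_churn


def monthly_contribution(
    customers: int,
    monthly_price: float,
    monthly_variable_cost: float,
    fixed_serving_cost: float,
) -> float:
    """Margin from all customers minus the fixed cost of serving them."""
    return customers * (monthly_price - monthly_variable_cost) - fixed_serving_cost


# Hypothetical inputs, for illustration only.
PRICE = 500.0           # per customer per month
VARIABLE_COST = 100.0   # per customer per month
CHURN = 0.02            # monthly churn rate
CAC = 6_000.0           # benchmark-sourced acquisition cost
SERVING_COST = 30_000.0 # fixed monthly support and infrastructure

ltv_to_cac = ltv(PRICE, VARIABLE_COST, CHURN) / CAC
passes_ratio = ltv_to_cac > 3.0  # the framework asks for better than 3:1

at_50 = monthly_contribution(50, PRICE, VARIABLE_COST, SERVING_COST)
at_500 = monthly_contribution(500, PRICE, VARIABLE_COST, SERVING_COST)
```

In this example the LTV to CAC ratio passes comfortably, yet contribution is negative at 50 customers and positive at 500, which is the pattern the economics questions are meant to expose.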

Dimension four: competition

Question 1: Can you describe the current solution (not just the named competitors) for your target buyer, including its cost and the failure modes that create dissatisfaction?

Question 2: Is there a reason the gap you are entering has not been filled by an existing player, and is that reason durable?

Question 3: If the largest existing competitor in your space copied your core feature in 90 days, would there be anything left that is defensible?

The competition dimension is designed to surface the scenarios founders prefer not to think about. The third question is the most important: a business that depends on a feature lead that can be closed in 90 days is not a business with durable competitive advantage. The expected answer is something other than the feature itself: distribution, data, customer relationships, or a go-to-market motion the large player cannot profitably replicate.

Dimension five: team

Question 1: Has at least one member of the founding team done the highest-risk execution task in the plan before, in a directly comparable context?

Question 2: For the two most critical hires in the first 12 months, do you have a realistic path to filling those roles within the time your plan requires?

Question 3: Is there a founder on the team who has a strong existing relationship with the buyer you are targeting, or a clear and specific plan to build that network before launch?

The team dimension does not score whether the team is impressive in aggregate. It scores whether the specific execution risks of this specific idea are covered by the team as currently constituted.

How to interpret the score

An idea that scores above 33 out of 45 is ready for investment of time and resources. An idea that scores 25 to 33 has specific gaps that are worth addressing before proceeding. An idea that scores below 25 has at least one fundamental question unanswered, and building before answering it is a high-risk decision.
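The three bands translate directly into a small lookup. This is a sketch under the thresholds stated above; the function name and the band labels are illustrative choices, and a total of exactly 33 is placed in the middle band because only scores above 33 count as ready.

```python
def interpret(total: int) -> str:
    """Map a total out of 45 to the band described in the text."""
    if not 0 <= total <= 45:
        raise ValueError("total must be between 0 and 45")
    if total > 33:
        return "ready for investment of time and resources"
    if total >= 25:
        return "specific gaps worth addressing before proceeding"
    return "at least one fundamental question unanswered"


# Boundary cases, to make the cutoffs explicit.
bands = {score: interpret(score) for score in (24, 25, 33, 34)}
```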

The score is not a judgment about the founder or the long-term potential of the idea. It is a map of what is known and what is not. A score of 20 with a clear plan for how to answer the open questions is more valuable than a score of 35 built on assumptions no one has tested.

A scoring rubric that survives diligence

A second, more compact rubric is built to survive diligence. It also has five dimensions, each with an evidence requirement. Market: bottom-up TAM with at least one primary database source. Wedge: a single buyer persona with a job title, tool they use today, and headcount. Differentiation: a 10× claim with a named falsifier. GTM: a credible CAC and a channel hypothesis with cost evidence. Team: a relevant skill mapped to each of the first three hires. Each dimension scores 0 to 4 based on evidence quality, not feelings.

A 0 means no evidence. A 1 means the founder has an opinion but no data. A 2 means the founder has cited a source but the source is generic. A 3 means the founder has a primary source and a specific number. A 4 means the founder has named the threshold at which the conclusion would change. Maximum score is 20.

The reason this rubric works is that it ranks evidence quality, not founder enthusiasm. A founder who scores 15 with all 3s is in better shape than a founder who scores 17 with four 4s and a 1. The 4s suggest deep work in some areas; the 1 suggests the founder has not done the work in another. The unevenness is the signal.
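One way to make unevenness visible is to report the spread between the strongest and weakest dimension next to the total. The sketch below is an illustration only: the `summarize` helper, the use of max minus min as the spread, and both example profiles are assumptions, while the dimension names and the 0 to 4 scale follow the rubric above.

```python
from typing import Dict, List, Union

RUBRIC_DIMENSIONS = ("market", "wedge", "differentiation", "gtm", "team")
MAX_PER_DIMENSION = 4

Summary = Dict[str, Union[int, List[str]]]


def summarize(scores: Dict[str, int]) -> Summary:
    """Return the total, the max-min spread, and the lowest-scoring dimensions."""
    if set(scores) != set(RUBRIC_DIMENSIONS):
        raise ValueError("scores must cover exactly the five rubric dimensions")
    if any(s not in range(MAX_PER_DIMENSION + 1) for s in scores.values()):
        raise ValueError("each score must be an integer from 0 to 4")
    lowest = min(scores.values())
    return {
        "total": sum(scores.values()),
        "spread": max(scores.values()) - lowest,
        "lowest": [d for d in RUBRIC_DIMENSIONS if scores[d] == lowest],
    }


# Hypothetical profiles: an even one and a higher-scoring but uneven one.
even = summarize({d: 3 for d in RUBRIC_DIMENSIONS})
uneven = summarize(
    {"market": 4, "wedge": 4, "differentiation": 4, "gtm": 1, "team": 4}
)
```

Here the uneven profile has the higher total but a spread of 3, and the summary points straight at the dimension where the work has not been done.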

Three failure modes of scoring frameworks

The first failure is the score-as-decision trap. Founders treat a high score as permission to build, and a low score as permission to give up. Both are wrong. The score is a map of where the work is solid and where it is shaky. The decision is whether to invest the time to firm up the shaky parts.

The second failure is the rubric drift. Founders adjust the rubric until their idea scores well. This is so common it shows up in every category of self-assessment. The fix is to lock the rubric before scoring the first idea and treat any subsequent rubric change as a separate decision with its own justification.

The third failure is the comparison shortcut. "My idea scores 14 and Slack’s scores 18, so I should aim higher." Score comparisons across companies and stages produce noise. The rubric is for tracking the same idea over time, not for ranking your idea against others. a16z’s framework writing is consistent on this point: metrics matter when they drive an action, not when they drive a comparison.

The scoring conversation

The most valuable use of a scoring framework is not the score itself. It is the conversation it forces. A founder who scores their own idea on the rubric above will discover, by the end, which two dimensions are weakest. Those two are the work for the next 30 days. The score gives the founder a structured way to articulate what they do not yet know, which is the actual deliverable.

Two outside readers, scoring the same idea on the same rubric, will produce two different scores. The difference is the diagnostic. Where they agree, the evidence is clear. Where they disagree, the evidence is ambiguous, which means the founder has not communicated it well. Both directions are useful.
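Comparing two readers dimension by dimension is more informative than comparing their totals. The sketch below is a hypothetical helper: the name `compare_readers`, the `tolerance` parameter, and the example scores are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def compare_readers(
    a: Dict[str, int], b: Dict[str, int], tolerance: int = 0
) -> Tuple[List[str], Dict[str, Tuple[int, int]]]:
    """Split dimensions into those where two readers agree and those where they differ.

    Two scores count as agreeing when they differ by no more than `tolerance`.
    """
    if set(a) != set(b):
        raise ValueError("both readers must score the same dimensions")
    agree = [d for d in a if abs(a[d] - b[d]) <= tolerance]
    disagree = {d: (a[d], b[d]) for d in a if abs(a[d] - b[d]) > tolerance}
    return agree, disagree


# Hypothetical scores from two outside readers on the 0 to 4 rubric.
reader_a = {"market": 3, "wedge": 2, "differentiation": 1, "gtm": 3, "team": 2}
reader_b = {"market": 3, "wedge": 2, "differentiation": 3, "gtm": 1, "team": 2}

agree, disagree = compare_readers(reader_a, reader_b)
```

Both readers total 11 in this example, so the totals hide the disagreement entirely; the per-dimension view shows that differentiation and GTM are where the evidence reads ambiguously.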

Verdikt’s methodology bakes this rubric into every verdict, with explicit evidence-tier scoring for each dimension and a named kill criterion for the dimensions that score below a 2. The verdict that comes out the back is not a score; it is a recommendation paired with the conditions under which the recommendation would change. The rubric is the diagnostic. The recommendation is the deliverable.

Frequently asked questions

Should you use a scoring framework to decide whether to pursue a startup idea?

A scoring framework is most useful for identifying which assumptions are weakest, not for making a binary go or no-go decision. The goal is to surface the two or three dimensions where your evidence is thinnest, so you can direct research effort toward them before committing to building. A low score should produce a research plan, not a rejection.

What should you do if a startup idea scores low on the economics dimension?

Rebuild the unit economics model with realistic benchmarks rather than optimistic assumptions. The most common fix is a price point increase (which improves LTV), a go-to-market motion change (which reduces CAC), or a target market shift (which changes the benchmark churn rate). If no combination of reasonable adjustments produces a viable unit economics model, the economics dimension score is telling you something important about the business model.