Startup market research: the framework, the common mistakes, and the evidence that separates a defensible answer from a confident one.
Research report vendors charge between $2,000 and $5,000 for a single industry report. Most of those reports contain aggregate market size figures useful for context and essentially nothing else that a founder actually needs. The data you need for startup decision-making is mostly available for no charge through government databases, industry associations, and primary research.
The distinction between what research reports give you and what you actually need matters. Reports give you category-level metrics: total market size, aggregate growth rates, and demographic distributions across an entire industry. What you need are segment-level metrics: how many buyers of a specific type exist, what they currently spend, and what they complain about. Only the segment-level metrics are useful for the decisions you are making.
Government databases: the most underused research source
The US Census Bureau's Statistics of U.S. Businesses (SUSB) is the most precise source for counting businesses of a specific size in a specific industry. The NAICS code system allows you to filter by industry with granularity down to six-digit codes. The most recent data (typically two years behind the current year) counts establishments by employee size band (1-4, 5-9, 10-19, 20-49, 50-99, 100-249, 250-499, 500+) and state. This data is the foundation of a bottom-up market size model.
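A bottom-up model built on these counts can be sketched in a few lines. This is a minimal illustration, not a prescribed method: the establishment counts, reachable shares, and annual prices are placeholder values invented for the example, and `SizeBand` and `bottom_up_market_size` are hypothetical names. Real counts would come from the Census tables for your NAICS code.

```python
from dataclasses import dataclass


@dataclass
class SizeBand:
    label: str              # employee size band, e.g. "20-49"
    establishments: int     # count from the Census table (placeholder here)
    reachable_share: float  # assumed fraction that could plausibly buy
    annual_price: float     # assumed annual contract value in dollars


def bottom_up_market_size(bands):
    """Sum establishments x reachable share x annual price across bands."""
    return sum(b.establishments * b.reachable_share * b.annual_price for b in bands)


# Placeholder inputs for illustration only; replace with real counts.
bands = [
    SizeBand("20-49", 10_000, 0.10, 3_000.0),
    SizeBand("50-99", 4_000, 0.20, 6_000.0),
    SizeBand("100-249", 1_500, 0.30, 12_000.0),
]

total = bottom_up_market_size(bands)
print(f"Estimated annual market: ${total:,.0f}")
```

Keeping the share and price per band, rather than one blended figure, makes each assumption visible and easy to challenge in an interview or a pitch.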
The Bureau of Labor Statistics Occupational Employment and Wage Statistics program reports the number of people employed in specific occupational titles by industry and geography. If you are building a product for compliance officers, the OEWS will tell you how many compliance officers are employed in the United States, in which industries, and at what salary. Salary data is a proxy for budget authority: a compliance officer earning $130,000 per year at a 200-person company is more likely to have a $15,000 software budget than one earning $65,000.
The SEC's EDGAR database contains financial filings for all publicly traded US companies, including detailed revenue figures, cost of goods sold breakdowns, and operational expenses. If your target market includes publicly traded companies in a specific sector, EDGAR gives you precise financial data that no research report can match for accuracy.
The FTC and industry-specific regulatory agencies publish enforcement actions and compliance guidance documents that reveal how often the regulatory requirements your product addresses actually create problems. If the FTC's enforcement records show, say, 400 actions per year in your sector, that is primary data about the severity of the compliance problem your product solves.
Industry associations: membership data and benchmark reports
Almost every industry has at least one trade association, and most of those associations publish annual benchmark reports and member surveys. The data is often more specific than general research firm reports because it is collected from actual practitioners in the industry.
Examples: the Society for Human Resource Management (SHRM) publishes annual reports on HR technology adoption rates and budget allocation. The Healthcare Financial Management Association (HFMA) publishes data on revenue cycle technology spending at hospitals and health systems. The Equipment Leasing and Finance Association publishes quarterly data on originations and portfolio quality for asset-based lending.
Many of these reports can be downloaded by non-members, or are available in public library databases such as Business Source Complete or IBISWorld (both free through most US public library systems). A library card from the New York Public Library, Chicago Public Library, or most major urban library systems gives you access to IBISWorld industry reports at no cost.
LinkedIn as primary research
LinkedIn is a database of individual professionals and companies that can be queried with the same specificity as a paid market research tool. The combination of job title, company size, industry, geography, and seniority filters in LinkedIn company and people search produces buyer counts accurate enough for a bottom-up SAM estimate.
Beyond counting, LinkedIn is useful for two other research tasks. First, job posting history (via LinkedIn Jobs with date filters) reveals what tools companies are currently using, because postings frequently list "experience with [tool name]" as a requirement. It also reveals what they are hiring for, which points to operational priorities and pain points. Second, people's career histories reveal the typical career path to the buyer role you are targeting, which informs how to reach them in outreach and how they perceive their own career advancement.
Reddit, G2, and Capterra for buyer voice
Reddit communities for specific professional roles are one of the highest-signal sources of buyer voice available. Subreddits like r/sysadmin, r/accounting, r/marketing, r/legal, and dozens of other role-specific communities contain thousands of posts where practitioners describe their actual problems, current tools, and frustrations. A keyword search within a relevant subreddit for terms related to your product's problem area produces genuine, unfiltered primary research.
G2 and Capterra are structured sources of buyer voice. Search for your top two or three competitors, read the 50 most recent reviews for each, and tag each review by theme. The negative theme clusters that appear in 30 percent or more of reviews represent real, persistent market dissatisfaction that your product can address.
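The tagging step can be tallied with a short script once the reviews have been read and tagged by hand. The sketch below assumes each review is recorded as a set of theme tags. The tags and the sample reviews are made up for illustration, and `theme_shares` and `persistent_themes` are hypothetical helper names.

```python
from collections import Counter

THRESHOLD = 0.30  # the 30 percent cutoff described above


def theme_shares(tagged_reviews):
    """Return {theme: fraction of reviews mentioning it}.

    Each review is a set of tags, so a theme counts once per review.
    """
    if not tagged_reviews:
        return {}
    counts = Counter(theme for tags in tagged_reviews for theme in set(tags))
    return {theme: n / len(tagged_reviews) for theme, n in counts.items()}


def persistent_themes(tagged_reviews, threshold=THRESHOLD):
    """Themes appearing in at least `threshold` of reviews, most common first."""
    shares = theme_shares(tagged_reviews)
    hits = [(theme, share) for theme, share in shares.items() if share >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Hypothetical hand-tagged reviews for one competitor.
reviews = [
    {"slow support", "pricing"},
    {"pricing"},
    {"missing integrations"},
    {"pricing", "missing integrations"},
    {"slow support"},
    set(),  # a review with no negative theme still counts in the denominator
    {"onboarding"},
    {"pricing"},
    {"missing integrations"},
    {"slow support"},
]

for theme, share in persistent_themes(reviews):
    print(f"{theme}: {share:.0%}")
```

Reviews with no negative theme stay in the denominator, so the share reflects all reviews read rather than only the unhappy ones.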
Primary research as the final step
Secondary research from all of the above sources tells you what is true in aggregate. Primary research (customer interviews) tells you what is true for the specific buyer you are targeting. Both are necessary. Secondary research structures your hypotheses and helps you design better interview questions. Primary research tests those hypotheses against individual buyers.
Five customer interviews are sufficient to determine whether the most critical hypotheses hold for the specific buyer segment you have targeted. Those five interviews, combined with two weeks of secondary research from the sources above, produce a research foundation comparable to what a paid research firm would deliver for a category report, but focused on the specific segment you are targeting rather than the category in aggregate.
Free data sources, ranked by usefulness for early-stage research
The most useful free source for US-based startups is SEC EDGAR. Every public company files quarterly and annual reports with segment-level revenue, geographic breakdowns, and competitive analysis written by the company’s own counsel. A 30-minute read of one incumbent’s 10-K tells you more about the category than most paid reports. The trick is to read the "Risk Factors" and "Management’s Discussion and Analysis" sections, not the headline numbers.
FRED at the St. Louis Fed holds nearly every US macro and category time-series you would want, from healthcare spending to construction starts to consumer-confidence indices. The interface lets you download any chart as a CSV, and the data is updated as the source agencies publish.
The Bureau of Labor Statistics is the right source for occupational and industry employment data. The Occupational Employment and Wage Statistics tables give you headcounts and median wages by SOC code, which is the cleanest way to size B2B markets where the buyer maps to a job title. The Quarterly Census of Employment and Wages breaks employment down by NAICS code at the county level.
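When the buyer maps to a job title, the headcount and wage columns can be combined into a rough seat-based estimate. The sketch below uses the earlier idea that salary is a proxy for budget authority. The rows, the wage cutoff, and the per-seat price are all placeholder assumptions, and the function names are hypothetical. Real rows would come from the OEWS tables for your SOC code.

```python
from typing import NamedTuple


class OccupationRow(NamedTuple):
    industry: str       # industry label (placeholder values below)
    employment: int     # headcount for the occupation in that industry
    median_wage: float  # annual median wage in dollars


def likely_buyers(rows, min_wage):
    """Keep rows whose median wage meets the assumed budget-authority cutoff."""
    return [r for r in rows if r.median_wage >= min_wage]


def seat_based_market(rows, min_wage, price_per_seat):
    """Return (seats, annual market) for rows that pass the wage cutoff."""
    seats = sum(r.employment for r in likely_buyers(rows, min_wage))
    return seats, seats * price_per_seat


# Placeholder rows for illustration only.
rows = [
    OccupationRow("Industry A", 40_000, 95_000.0),
    OccupationRow("Industry B", 25_000, 70_000.0),
    OccupationRow("Industry C", 10_000, 120_000.0),
]

seats, market = seat_based_market(rows, min_wage=90_000.0, price_per_seat=1_200.0)
print(f"{seats:,} seats, ${market:,.0f} per year")
```

The wage cutoff is a crude filter, so it is worth rerunning the estimate with a few different cutoffs to see how sensitive the total is to that one assumption.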
The US Census and American Community Survey cover consumer and demographic dimensions. For non-US markets, the equivalent sources are Eurostat, UK ONS, Statistics Canada, Australian Bureau of Statistics, and country-specific equivalents.
Free industry-specific sources
For SaaS specifically, OpenView’s Expansion SaaS Benchmarks and ChartMogul’s SaaS benchmark reports are published annually and free. They cover ARR, retention, CAC payback, and growth-rate distributions by stage. Pacific Crest’s SaaS Survey (continued by KeyBanc Capital Markets after it acquired Pacific Crest) covers similar territory and is also free.
For consumer apps, Sensor Tower’s free reports and the State of Mobile report from App Annie (since renamed data.ai) provide download and revenue estimates. For fintech, the CFPB’s Consumer Complaint Database is an underused source for category pain points. For healthcare, CMS data sets cover utilization, payment, and quality metrics.
Where to actually start
Two hours of structured reading on the right four sources is worth more than a $5,000 Gartner subscription. The structure that works: spend 30 minutes on a public incumbent’s most recent 10-K, 30 minutes on the relevant BLS or category-specific dataset to size the buyer population, 30 minutes reading 10 user reviews on G2 or Capterra to surface pain points, and 30 minutes building a comparable-wedge benchmark from publicly listed pricing pages.
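The pricing benchmark in the last step is simple arithmetic once the list prices are written down. The sketch below assumes prices were copied by hand from public pricing pages. The vendor names and numbers are invented for illustration, and `pricing_benchmark` is a hypothetical helper.

```python
from statistics import median

# Hypothetical monthly list prices (USD per seat), one list of tiers per vendor.
listed_prices = {
    "Competitor A": [29.0, 79.0, 199.0],
    "Competitor B": [49.0, 99.0],
    "Competitor C": [19.0, 59.0, 149.0],
}


def pricing_benchmark(prices_by_vendor):
    """Summarize entry-tier and top-tier list prices across vendors."""
    entry = [min(tiers) for tiers in prices_by_vendor.values()]
    top = [max(tiers) for tiers in prices_by_vendor.values()]
    return {
        "entry_low": min(entry),
        "entry_median": median(entry),
        "entry_high": max(entry),
        "top_median": median(top),
    }


benchmark = pricing_benchmark(listed_prices)
for name, value in benchmark.items():
    print(f"{name}: ${value:,.0f}")
```

The median is used instead of the mean so that one unusually expensive enterprise tier does not distort the benchmark.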
The output of that two-hour session is enough to write a credible bottom-up TAM, name the buyer with a job title and headcount, and articulate two pain points the incumbents are not solving. That is a defensible market memo at zero cost. The rest of the work is customer interviews, which are also free. The total cost of decent market research at pre-seed is approximately zero dollars and 20 hours of focused work.
Where paid reports are worth it: only after you have validated the wedge and need to defend specific assumptions in a fundraising context. At that point, a $3,000 IBISWorld report or a CB Insights subscription can fill specific gaps. Before validation, the spend is misallocated. Spend the money on plane tickets to meet customers instead.