Killian Brief
April 30, 2026 · Nightly Run · 6 Bets Shortlisted
Bets shortlisted
6
Avg judge score
70/100
Run cost
$3.81
Bet #1

Trust layer for mid-market legal AI

judge 65/100 · edge 1.5/10 · ai native

Mid-size law and compliance firms — 50 to 500 people — burn 30-45 minutes per query making associates hunt through PDF graveyards for answers that already exist in their own files. Harvey and Casetext won't help them: those tools are trained on public databases, priced at $100-200/seat for BigLaw, and structurally can't ingest a firm's proprietary precedents. Partners don't trust the outputs anyway because citations hallucinate.

There are roughly 8,000 mid-market legal/compliance firms in EU+UK fitting our profile. At 1% capture and €18k blended ACV (€8k build + €1.5k/mo), that's ~€1.4M ARR — small but real, and the underserved segment is genuinely empty between Notion AI and Harvey.

The wedge is narrow but defensible short-term: own-document ingestion plus source-authority-weighted citations plus a senior-annotation feedback loop. Generic RAG players (Glean, Guru) will commoditize retrieval in 18-24 months, so the moat is the trust layer, not the pipes.
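One way to picture "source-authority-weighted citations plus a senior-annotation feedback loop" is as a re-ranking step on top of ordinary retrieval. The sketch below is illustrative only: the role weights, the `Passage` fields, and the smoothing formula are all assumptions, since the brief names the idea but gives no numbers.

```python
from dataclasses import dataclass

# Hypothetical authority weights per author role (not from the brief).
AUTHORITY = {"partner": 1.0, "senior_associate": 0.8, "associate": 0.6, "unknown": 0.4}


@dataclass
class Passage:
    doc: str               # source file name
    section: str           # section the citation points at
    role: str              # role of the document's author
    relevance: float       # retrieval similarity in [0, 1]
    endorsements: int = 0  # senior annotations confirming the passage
    corrections: int = 0   # senior annotations flagging it as wrong


def score(p: Passage) -> float:
    """Combine retrieval relevance, author authority, and senior feedback."""
    authority = AUTHORITY.get(p.role, AUTHORITY["unknown"])
    # Smoothed share of positive annotations; 0.5 when nobody has annotated yet.
    feedback = (p.endorsements + 1) / (p.endorsements + p.corrections + 2)
    return p.relevance * authority * (0.5 + feedback)


def rank(passages):
    """Return passages ordered from most to least trustworthy citation."""
    return sorted(passages, key=score, reverse=True)
```

With these made-up weights, a partner-authored passage outranks a slightly more relevant associate memo until enough senior corrections accumulate against it, which is the behavior the trust layer is meant to demonstrate to a second prospect.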

Why now: GPT-4-class models finally pass the 95% citation-accuracy bar on domain documents; 12 months ago they didn't.

Honest on edge: I have no specific operator advantage here. This is a cold-start enterprise sale into a sector I don't own.

The path: one paid pilot, €2,700 build fee, 14 days to first answers. Kill at day 60 if we don't have 2 signed POs and €2k MRR, or if InfoSec blocks ingestion at 2 of 5 prospects without a self-hosted answer.

Small check, fast read, clean kill. Let's run it.

The detail behind the pitch
Problem
Professional services firms (law, consulting, accounting) with unstructured document repositories spend 30-45 minutes per query manually searching PDFs for client answers, creating labor inefficiency.
Proposed solution
AI research assistant that ingests firm's documents, answers plain-language questions with exact citations, weighted by source authority, and learns from senior annotations.
Target market
Mid-size legal, compliance, and consulting teams (50-500 person firms) with 100+ documents to search; willingness to pay €2,700-15,000+ for build + €1,000-2,000/month maintenance.
First test
Build for 1 law firm or compliance team; measure if system cuts query-resolution time from 30-45 min to <2 min with 95%+ accuracy; target 5+ queries/week adoption.
Kill criteria
<2 paying clients contracted (PO signed, not verbal) AND <€2,000 MRR by day 60 → kill; OR pilot firm logs <5 queries/week for any 2 consecutive weeks within days 14-45 → pivot ingestion model; OR data-security objection raised by 2 of first 5 prospects without a documented resolution path by day 30 → kill or hard-pivot to on-premise/self-hosted architecture before spending further on sales
Competitive landscape
Incumbents: Casetext CoCounsel (Thomson Reuters), Harvey AI, Luminance, iManage RAVN, Relativity aiR, Kira Systems (Litera), Ironclad AI, Lexis+ AI (LexisNexis), Westlaw AI (Thomson Reuters), Notion AI / Guru (generic KB players)
Pricing: $50-$200/seat/mo for SaaS incumbents; Harvey AI ~$100-$200/seat/mo enterprise; Casetext ~$100/seat/mo; Kira Systems ~$1,500-$3,000/mo flat; custom/enterprise deals common above 50 seats
Saturation: medium
Wedge: Own-document ingestion with verifiable, source-authority-weighted citations and a senior-annotation feedback loop directly addresses the trust and proprietary-knowledge gaps that public-database-only incumbents structurally cannot close.
User complaints:
- Hallucinated citations that look plausible but reference non-existent document sections; partners don't trust outputs without manual verification
- No ingestion of firm's own proprietary documents; tools are trained on public legal databases only, missing internal precedents and client memos
- Black-box answers with no source authority weighting; junior and senior sources treated identically
- Steep per-seat SaaS pricing makes firm-wide rollout expensive for mid-size firms (50-500 people)
- Onboarding requires IT/vendor involvement; no self-serve ingestion pipeline for unstructured legacy PDFs
- No annotation/feedback loop; senior partner corrections are discarded, not learned from
- Generic RAG pipelines fail on domain-specific document structures (e.g., legal schedules, annexes, exhibit cross-references)
Notes: The large incumbents (Harvey, Casetext, Lexis+ AI) dominate BigLaw and are priced/positioned for it; the mid-market (50-500 person firms) is meaningfully underserved because per-seat SaaS economics don't justify adoption and self-hosted or build-plus-retainer models are rare. The proposed pricing structure (€2,700-15,000 build + €1,000-2,000/month) maps well to mid-market budget cycles and avoids the per-seat trap. The authority-weighting and annotation loop are genuine differentiators absent from all major incumbents reviewed. Key risk: well-funded generic RAG startups (e.g., Glean, Guru) are moving down-market and could commoditize the retrieval layer within 18-24 months, making the annotation/trust layer the only durable moat.
Skeptic + judge rationale
Death modes:
- The single pilot law firm's IT/compliance department blocks document ingestion due to client confidentiality clauses and GDPR/data-residency concerns, delaying go-live by 8-12 weeks; the founder burns runway waiting for security sign-off and never reaches the 5 queries/week adoption threshold within the 90-day window
- The senior-annotation feedback loop requires a named partner to spend 2-3 hours/week labeling corrections, but partners bill at €400-800/hr and treat annotation as unbillable overhead; adoption stalls at 1-2 queries/week from a single curious associate, never reaching firm-wide use, making the 'learning' differentiator a paper feature that cannot be demonstrated to a second prospect
- The build fee (€2,700-15,000) closes with the managing partner verbally, but contract execution requires sign-off from procurement/finance who reclassify it as a software vendor relationship requiring a 60-90 day legal review, InfoSec questionnaire, and DPA negotiation; the founder reaches day 90 with a signed NDA but no purchase order and zero MRR
Judge rationale (score=65.0): Strong on ARPU (€12-24k/yr/client), recurring revenue, and a real underserved mid-market wedge with differentiated authority-weighting. Loses heavily on human intervention: build-plus-retainer model means Lisandro is on InfoSec calls, DPA negotiations, custom ingestion tuning, and annotation onboarding for every client, antithetical to zero-human thesis. Sales cycles to law/compliance firms are 60-90 days with procurement gates, not 14, and defensibility is thin once Glean-class players move down-market. Operationally it's software plus active service delivery, not pure self-serve SaaS.
Reply "approve #1" on Telegram to ship this bet.
Bet #2

Auditable CSV diffs for compliance sign-off

judge 75/100 · edge 1.5/10 · b2b saas

Every data analyst at a mid-market bank or hospital cleans CSVs the same way: dedupe in OpenRefine, normalize a column, hit save, and pray. When compliance asks 'what changed?' they screenshot Excel or shrug. OpenRefine's undo dies when you close the tab. Trifacta costs $95/seat. Nothing in between produces a row-level, human-readable diff a stakeholder can sign.

Market is tight but real: ~50k analysts in regulated mid-market shops, target ARPU $30/mo, 2% capture = ~$360k ARR floor — fundable only if we land a compliance vertical (finance, clinical ops) where audit artifacts carry legal weight. Wedge is narrow: diff-as-exportable-artifact. Copyable in a quarter if Alteryx notices, so speed matters.

Why now: regulations and audit frameworks (SOC 2, HIPAA data lineage, EU AI Act provenance rules) are pushing audit trails down to the row level for the first time, and mid-market compliance teams are scrambling without budget for Talend.

Why us: honestly, no special edge here. This is a generic SaaS bet I'd run on execution speed, not unfair advantage. Worth flagging.

The path: 14 days, ~$2k. Build diff viewer for dedup/normalize/trim. Recruit 5 analysts AND cold-email 10 compliance officers directly, because the skeptic is right that Reddit-recruited analysts aren't buyers. Kill if <3 of 5 would share the diff with a stakeholder, <$500 MRR by day 45, and zero compliance inbound by day 60.
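For a sense of how small the 14-day build could be, here is a minimal sketch of a diff viewer's core for the three transformations named above. Everything in it is an assumption for illustration: the function names, the choice of lowercasing as the "normalize" step, and the five-column diff layout are not specified anywhere in the brief.

```python
import csv
import io


def clean_with_diff(rows, normalize_cols=()):
    """Trim whitespace, lowercase the given columns, drop exact duplicates.

    Returns (cleaned_rows, diff), where diff records every row-level change
    so a reviewer can see what was edited or removed.
    """
    cleaned, diff, seen = [], [], set()
    for n, row in enumerate(rows, start=1):
        new = {k: v.strip() for k, v in row.items()}       # trim
        for col in normalize_cols:
            new[col] = new[col].lower()                    # normalize
        for col in row:
            if row[col] != new[col]:
                diff.append({"row": n, "op": "edit", "column": col,
                             "before": row[col], "after": new[col]})
        key = tuple(new.items())
        if key in seen:                                    # dedup
            diff.append({"row": n, "op": "remove", "column": "",
                         "before": ",".join(row.values()), "after": ""})
            continue
        seen.add(key)
        cleaned.append(new)
    return cleaned, diff


def diff_to_csv(diff):
    """Serialize the diff as a CSV artifact that can be attached to a ticket."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["row", "op", "column", "before", "after"])
    writer.writeheader()
    writer.writerows(diff)
    return out.getvalue()
```

The exported diff, not the cleaned file, is the thing a compliance officer would sign, so the test is really whether anyone asks for the output of `diff_to_csv`.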

Small check, fast read, clean kill. Let's see if compliance picks up the phone.

The detail behind the pitch
Problem
Data analysts and CSV users can't verify what changed when cleaning datasets, forcing them to trust outputs blindly or manually audit changes.
Proposed solution
CSV cleaner tool that shows before/after diffs for every modification (removals, edits, normalizations) with undo capability.
Target market
Data analysts, scientists, compliance teams at mid-market companies; ~50k potential users willing to pay for data trust
First test
Build diff viewer for 3 common CSV transformations (dedup, normalize, trim). Recruit 5 data analysts via Reddit/Twitter to clean one file, measure time-to-trust vs. baseline.
Kill criteria
<3 of 5 beta testers report they would pay OR share the diff artifact with a stakeholder (not just 'complete with confidence') AND <$500 MRR or <2 paid conversions by day 45 AND 0 inbound compliance-team contacts by day 60 → kill
Competitive landscape
Incumbents: OpenRefine, Trifacta / Alteryx Designer Cloud, Parabola, DataWrangler (archived), Excel Power Query, CleaningTool.io, Talend Data Preparation
Pricing: $0 (OpenRefine, OSS) to $50-$95/seat/mo (Trifacta/Alteryx); Parabola ~$80/mo; Talend mid-market contracts $15k+/yr
Saturation: medium
Wedge: Compliance-grade, shareable row-level diff report after every clean operation, something no current tool exports in a reviewable, stakeholder-friendly format.
User complaints:
- OpenRefine has no native undo history export or shareable audit log; changes are session-local and lost on close
- Trifacta/Alteryx is expensive and overkill for analysts who just want transparent CSV edits
- Excel Power Query shows transformation steps but hides cell-level diffs; you can't see exactly which rows changed
- No tool produces a human-readable, row-level before/after diff that non-technical stakeholders can sign off on
- Compliance teams can't use OpenRefine outputs as audit evidence because there's no immutable change log
- Most tools are pipeline-builders, not diff-viewers; the verification UX is an afterthought
Notes: The OSS incumbent (OpenRefine) solves 70% of the workflow but has a notorious gap in auditable, exportable change history: its 'undo' is session-only and not sharable. Enterprise tools (Trifacta, Talend) add audit logs but at a price and complexity that prices out mid-market analysts. The real wedge is the diff-as-artifact: a versioned, exportable, human-readable record of every change that can be attached to a compliance ticket or shared with a data owner for sign-off. The 50k TAM estimate is plausible but tight; growth likely depends on landing compliance-heavy verticals (finance, healthcare, regulated data ops) where audit trails have legal weight.
Skeptic + judge rationale
Death modes:
- OpenRefine is free and analysts Reddit-recruited for the test already use it; they tolerate the session-local undo limitation because exporting a diff report is a compliance team's problem, not theirs. The actual buyer (compliance officer) never shows up in a Reddit/Twitter recruitment funnel, so 5/5 testers 'complete with confidence' but zero convert to paid because they have no budget authority and no felt pain around audit artifacts
- The diff-as-artifact wedge requires a compliance team to actually demand the artifact from the analyst, but in mid-market companies this workflow doesn't exist yet: compliance teams accept Excel screenshots or nothing, so the tool sits unused between the analyst (who doesn't need the report) and the compliance buyer (who doesn't know to ask for it), creating a two-sided adoption deadlock that kills paid conversion before day 30
- CSV files with >50k rows cause the browser-based or lightweight diff viewer to render slowly or crash, and the first 2-3 Reddit testers post publicly that 'it choked on my real dataset'; the negative social proof in the exact recruitment channel poisons the beta pool, signups stall at under 10, and the founder can't recover perception in the 90-day window
Judge rationale (score=75.0): Wins on shape: pure SaaS, low capex, recurring revenue, minimal human-in-loop once shipped. Loses points on the two-sided adoption deadlock flagged by the skeptic: analyst recruits aren't the buyer (compliance is), which likely pushes real first paying customer past 30 days and dampens ARPU until vertical-specific positioning lands. Defensibility is weak: OpenRefine is free, and the diff-export wedge is copyable in a quarter by any incumbent that notices. Solid but not a layup; total reflects a fundable but adoption-risky bet.
Reply "approve #2" on Telegram to ship this bet.
Bet #3

Google Doc for weekend trip checklists

judge 71/100 · edge 1.5/10 · consumer app

Every friend group has that one organizer drowning in WhatsApp scrollback at 11pm Thursday: 'wait, who said they'd bring the cooler?' Itinerary apps want accounts and credit cards for a 48-hour camping trip. Nobody installs Wanderlog for a beach weekend. So the chaos stays in the chat, and the cooler gets forgotten.

Market is real but modest: ~5M US group trip planners a year, organizer-pays model. At 1% capture × $30/yr (saved-trips upgrade) that's $1.5M ARR — a lifestyle wedge, not a unicorn. I want to be honest about that upfront.

The wedge: a Google-Doc-style link — no signup, click and you're in, your name next to 'tent' turns green when you check it. Wanderlog and Lambus chase itineraries; Trello has no travel context; nobody owns the accountless shared checklist. The real competitor isn't them, it's WhatsApp being 'good enough.'

Why now: link-based collab UX (Figma, Tldraw, Excalidraw) trained users that 'open link, edit, done' is normal. Five years ago people expected signup walls.

Why us: honestly, thin. No operator edge here — this is a generic consumer bet. Don't fund it for the founder; fund it for the cheap test.

The path: 14 days, ~$2k, ship to 10 real trip groups. Kill if <3 groups get majority-member engagement, zero upgrades by day 45, and <30% return for a second trip. Reversible, fast, no ego.
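The kill rule is a conjunction of three measurements, so it helps to write it down before the test starts. A rough sketch follows; the `Group` fields are hypothetical names for what the pilot would need to log, and "majority" is read as more than 50% of members opening the link and updating at least one task.

```python
from dataclasses import dataclass


@dataclass
class Group:
    members: int       # people invited to the trip
    engaged: int       # members who opened the link and updated a task by day 14
    upgraded: bool     # bought a paid feature by day 45
    second_trip: bool  # created a second trip within 60 days


def should_kill(groups):
    """True only when all three signals fail at once, as the pitch frames it."""
    majority_groups = sum(1 for g in groups if g.engaged / g.members > 0.5)
    upgrades = sum(g.upgraded for g in groups)
    return_rate = sum(g.second_trip for g in groups) / len(groups)
    return majority_groups < 3 and upgrades == 0 and return_rate < 0.30
```

Because the three conditions are joined with "and", a single paid upgrade is enough to keep the bet alive under this reading, which is worth agreeing on before day 45 rather than after.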

Small bet on whether 'good enough' has a crack in it. Worth two weeks to find out.

The detail behind the pitch
Problem
Friends planning group trips lose coordination across fragmented chat/notes/screenshots; no single shared truth for tasks, owners, and status.
Proposed solution
Lightweight shared trip checklist (no accounts needed) where one link syncs task assignments, completion status, and ownership in real-time.
Target market
Friend groups 4-8 people organizing weekend trips, ~5M annual US trip planners; monetize via freemium (paid for multiple trips/integrations)
First test
Deploy link-based checklist to 10 actual trip planning groups. Measure: adoption rate, task completion visibility, willingness to pay for multi-trip feature.
Kill criteria
<3 of 10 test groups have >50% of members open the shared link AND update at least one task (proving multi-user adoption, not solo use) by day 14; AND zero paid feature upgrades (multi-trip or integrations) across all 10 groups by day 45; AND >70% of groups create no second trip within 60 days → kill or full pivot
Competitive landscape
Incumbents: Wanderlog, TripIt, Google Trips (discontinued), Notion (DIY), Trello (DIY), Splitwise (finance-adjacent), Airbnb Trips, TravelJoy, Lambus
Pricing: $0 free tier; $4-$10/user/mo for premium (Wanderlog ~$5/mo, TripIt Pro ~$49/yr)
Saturation: low
Wedge: Zero-friction, no-account shared link (like a Google Doc for trip checklists) that assigns task ownership and syncs live, purpose-built for the 48-hour coordination window before a weekend trip.
User complaints:
- Account/signup walls kill adoption; non-organizers won't install another app
- Tools are itinerary-focused (flights, hotels) not task/ownership-focused
- Group members revert to WhatsApp/iMessage even after organizer sets up a tool
- No real-time shared checklist with named ownership; tasks get lost in chat
- Too feature-heavy for a 2-day camping trip; overwhelming onboarding
- Splitwise handles money but nothing handles 'who's bringing the cooler'
Notes: Incumbents cluster in two camps: full itinerary planners (Wanderlog, TripIt, Lambus) that require accounts and focus on logistics/bookings, and generic task tools (Trello, Notion) with no travel context or frictionless sharing. No direct competitor owns the 'shared accountless trip checklist' niche. The real moat risk is that WhatsApp/iMessage group chats are 'good enough' for many; the wedge only lands if the shareable link UX is truly zero-click for joiners. Freemium model is viable; conversion hook should be saving/reusing checklists across multiple trips rather than per-seat pricing, given the ephemeral nature of the use case.
Skeptic + judge rationale
Death modes:
- WhatsApp link-sharing kills the loop: organizer shares the checklist link in the group chat, but non-organizers complete tasks inside WhatsApp ('I'll bring the cooler' reply) rather than clicking through to update the checklist; measurable as <40% of invited members ever opening the link, making the 'shared truth' value prop collapse into a solo to-do list the organizer manages alone
- One-trip utility with zero return rate: the ephemeral 48-hour use case means each friend group is a single-use customer. After the camping trip ends, the link dies psychologically even if the product survives; no individual has enough solo trips to hit the 'multiple trips' paywall naturally, so freemium conversion requires a behavioral habit that never forms; measurable as >80% of groups never returning to create a second trip within 60 days
- The organizer is the only paying-capable user but won't pay $4-10/mo for a tool their friends use once every 3 months: the person with the problem (coordinator chaos) is the only one who experiences the pain acutely enough to pay, but the per-trip value is ~$0 perceived because WhatsApp 'worked fine last time'; measurable as zero paid upgrades across 10 test groups despite stated willingness-to-pay in surveys
Judge rationale (score=71.0): Wins on capital (pure web app, no accounts = cheap to build/host) and human intervention (self-serve link sharing, zero ops). Loses heavily on ARPU and recurring revenue: ephemeral one-trip use case means freemium conversion is structurally weak, since the organizer pays alone for a tool used quarterly, and skeptic's 'one-trip utility, no return' death mode is the dominant risk. Defensibility is near-zero; any incumbent could ship a no-account link in a sprint. Market is real but monetization shape (per-trip, not per-seat-recurring) means even success looks like a low-LTV consumer app; borderline bet, leans toward kill unless a B2B angle (event planners, small tour ops) emerges.
Reply "approve #3" on Telegram to ship this bet.
Bet #4

Etsy-native wholesale contracts + tracker

judge 70/100 · edge 1.5/10 · info product

Etsy sellers who land their first wholesale buyer hit a wall: they're suddenly negotiating MAP pricing, net-30 terms, and commission splits with nothing but a Google Doc and a prayer. The 1,000+ template sellers on Etsy charge $3-$15 for generic PDFs with no legal review, no Etsy-specific licensing clauses, no way to track bulk tiers. It's the moment a hobbyist becomes a business — and the tooling is garbage.

The honest market math: 1,000-3,000 sellers actively seeking wholesale, maybe 300 paying users at steady state. At $89 one-time template + 20% upsell to a $19/mo dashboard, that's roughly $40-60k ARR ceiling on Etsy alone. Tight. The expansion path — Shopify, Faire, Amazon Handmade craft sellers — pushes addressable to 20-30k, but I'd be lying if I called that proven.
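The revenue mix here is part one-time and part recurring, so it is worth separating the two. A back-of-envelope sketch using only the figures in the paragraph above; the function name is hypothetical, and treating the 300 buyers as a single annual cohort is an assumption.

```python
def first_year_revenue(buyers=300, template_price=89,
                       upsell_rate=0.20, dashboard_monthly=19):
    """Split revenue into one-time template sales and recurring dashboard fees."""
    template = buyers * template_price             # one-time, not recurring
    subscribers = round(buyers * upsell_rate)      # buyers who take the dashboard
    dashboard = subscribers * dashboard_monthly * 12
    return template, dashboard, template + dashboard


template, dashboard, total = first_year_revenue()
print(template, dashboard, total)  # 26700 13680 40380
```

On these inputs only about a third of the total is truly recurring. The template portion repeats each year only if a fresh cohort of roughly the same size keeps arriving, so the low end of the $40-60k range is the safer number to plan around.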

The wedge is the bundle: a lawyer-reviewed, Etsy-specific agreement (MAP, platform fees, licensing) that no one on Etsy offers, with a tracker stapled on. Static template sellers can't add software; Jotform and DocuSign don't know what Etsy is.

I won't pretend you have an operator edge here — you don't. This isn't manufacturing or aviation. It's a cheap probe.

The path: $0 to draft the template plus ~$400 for a paralegal review, list on Etsy and Gumroad, post in r/Etsysellers and r/EtsyWholesale within 14 days. Kill if <5 paid transactions in 30 days or stated WTP <$20. SaaS only unlocks if template sells.

Small bet, fast read, template-first. Let's run the probe.

The detail behind the pitch
Problem
Etsy seller wants to wholesale custom stickers to another retailer but has no contract template or framework to enforce bulk discounts, credit, or commissions.
Proposed solution
Pre-built wholesale agreement template and/or light SaaS dashboard to track bulk orders, enforce commissions, and manage licensing terms for Etsy sellers.
Target market
Etsy sellers offering wholesale (estimated 1000-3000 annually seeking this); willing to pay $50-150 per contract or $15-30/month for SaaS.
First test
Create basic wholesale contract template; post in r/Etsysellers asking if they'd use it or pay for a version; measure downloads/interest in 7 days.
Kill criteria
<5 unique paying transactions (template OR SaaS trial) within 30 days of launch AND average stated WTP in user interviews below $20 one-time → kill the SaaS track immediately and pivot to pure template licensing; OR if SaaS onboarding completion rate (OAuth connect + first order tagged) is below 20% of signups by day 45 → kill the SaaS dashboard entirely
Competitive landscape
Incumbents: Etsy digital template sellers (ProsperousPrintables, WavesofchangeSup, REAL TEMPLATE IDEAS), template.net, Business-in-a-Box, Jotform PDF Templates, DocuSign / HelloSign (generic contract signing)
Pricing: $3-$15 one-time on Etsy for static templates; template.net ~$8-$40/mo subscription; Jotform $34-$99/mo (broad form platform); no dedicated SaaS for Etsy wholesale tracking found
Saturation: medium
Wedge: Bundle a legally-reviewed, Etsy-specific wholesale agreement template with a lightweight SaaS dashboard that auto-tracks bulk order tiers and commission payouts, something none of the 1,000+ static template sellers on Etsy currently offer.
User complaints:
- Generic templates are not Etsy-specific (no MAP pricing, no Etsy licensing clauses, no platform fee considerations)
- Static PDFs/Docs require manual customization with no enforcement or tracking layer
- No tooling to track bulk order tiers, commissions owed, or payment milestones; seller must DIY in spreadsheets
- Templates from Etsy sellers have no legal review signal, creating trust concerns for buyers
- No integration with Etsy orders API to auto-populate or reconcile wholesale vs. retail volume
Notes: The template side is genuinely crowded on Etsy (1,000+ listings) but commoditized and race-to-the-bottom priced ($3-$15). The SaaS layer for Etsy-native wholesale management has zero identifiable incumbents, representing the true wedge. Market size is tight (est. 1,000-3,000 TAM) so SaaS economics are risky at $15-30/mo unless expanded to all handmade/craft sellers on Shopify, Faire, or Amazon Handmade. The highest-leverage bet is a template + onboarding funnel that upsells the SaaS, not SaaS-first.
Skeptic + judge rationale
Death modes:
- TAM is functionally ~300 paying users: of the 1,000-3,000 Etsy sellers 'seeking wholesale,' fewer than 10% are actively mid-negotiation at any given time and willing to pay above the $3-15 Etsy floor they've been conditioned to expect, resulting in a ceiling of ~$4,500 total addressable MRR at $15/mo that cannot support a viable SaaS business before runway expires
- Reddit validation test produces 200+ upvotes and 50 free downloads but zero paid conversions, because r/Etsysellers users share free resources virally and vocal 'I'd pay for this' commenters never open the Stripe checkout link; founder mistakes engagement for demand and burns 60 days building SaaS before discovering the actual WTP is $0-5 one-time
- Etsy API wholesale tracking wedge collapses on first technical contact: Etsy's API does not expose a 'wholesale vs. retail' order distinction, bulk order tiers require manual tagging by the seller, and the promised auto-reconciliation requires OAuth per-seller authentication that 80%+ of non-technical Etsy sellers abandon mid-setup, leaving the SaaS dashboard as a glorified spreadsheet with a $20/mo price tag nobody pays
Judge rationale (score=70.0): Wins on capital (template costs ~$0 to draft), fast time to first sale, and low ongoing human intervention if kept template-first. Loses heavily on market size: skeptic's ~300 paying user ceiling is credible and caps ARPU economics at $15-30/mo. Defensibility is weak (1,000+ Etsy template sellers race-to-bottom) and the SaaS wedge depends on Etsy API capabilities that may not exist. Viable as a low-effort template play, marginal as a SaaS bet.
Reply "approve #4" on Telegram to ship this bet.
Bet #5

Risk-gating layer for the AI PR flood

judge 72/100 · edge 1.5/10 · infra tooling

Open-source maintainers are drowning. A solo dev with a 400-star repo now wakes up to 15 AI-generated PRs (half of them plausible-looking slop from someone's Cursor agent), and there's no tool that says 'review these three, auto-close those eight.' Maintainers are quitting projects over this right now.

The market is honestly narrow: maybe 3–5k mid-size OSS projects hit by serious AI-PR volume today, growing fast. At $15/mo and 5% capture that's 150–250 paying projects, or ~$27–45K ARR at one seat each; the ~$200K ARR figure only holds at roughly four to seven paid seats per project. Either way it's not venture-scale unless we ride the wave into team/enterprise tiers ($50–100/seat for companies whose internal monorepos face the same flood). That's the actual prize.
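The ARR estimate is sensitive to how many paid seats each project buys, which the pitch does not state. A quick sketch of the arithmetic, assuming one seat per project unless told otherwise; the function names are made up for illustration.

```python
def paying_projects(projects, capture=0.05):
    """Projects that convert to paid at the given capture rate."""
    return round(projects * capture)


def arr(projects, capture=0.05, price_per_month=15, seats=1):
    """Annual recurring revenue in dollars."""
    return paying_projects(projects, capture) * seats * price_per_month * 12


def seats_needed(target_arr, projects, capture=0.05, price_per_month=15):
    """Paid seats per project required to reach a target ARR."""
    return target_arr / arr(projects, capture, price_per_month)


print(arr(3000), arr(5000))                    # 27000 45000
print(round(seats_needed(200_000, 5000), 1))   # 4.4
print(round(seats_needed(200_000, 3000), 1))   # 7.4
```

At one seat per project the range is $27K to $45K. Getting to ~$200K needs between four and seven paid seats per project, which looks more like the team tier than the solo-maintainer tier.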

The wedge is real but thin. CodeRabbit reviews content, Mergify routes PRs — nobody combines AI-detection + contributor trust + machine-readable policy into one risk score. Mergify could ship this in a sprint; that's the death scenario. Our only durable moat is becoming the CODEOWNERS-style policy standard before they wake up.
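To make "one risk score" concrete, here is a minimal sketch of how the three signals could be combined and gated by a policy. All of it is hypothetical: the weights, the thresholds, the `fast-track` action, and the assumption that some upstream detector supplies `ai_likelihood`. In a real product the `POLICY` dict would presumably live in a machine-readable file in the repo.

```python
from dataclasses import dataclass

# Hypothetical per-repo policy; values are placeholders, not recommendations.
POLICY = {
    "weights": {"ai_likelihood": 0.5, "low_trust": 0.3, "diff_size": 0.2},
    "auto_close_at": 0.8,
    "review_at": 0.4,
    "large_diff_lines": 500,
}


@dataclass
class PullRequest:
    ai_likelihood: float  # assumed detector output in [0, 1]
    merged_prs: int       # contributor's previously merged PRs in this repo
    lines_changed: int


def risk_score(pr, policy=POLICY):
    """Weighted sum of AI likelihood, lack of contributor history, and diff size."""
    w = policy["weights"]
    low_trust = 1 / (1 + pr.merged_prs)  # 1.0 for first-timers, decays with history
    diff_size = min(pr.lines_changed / policy["large_diff_lines"], 1.0)
    return (w["ai_likelihood"] * pr.ai_likelihood
            + w["low_trust"] * low_trust
            + w["diff_size"] * diff_size)


def gate(pr, policy=POLICY):
    """Map a score to an action using the policy's thresholds."""
    score = risk_score(pr, policy)
    if score >= policy["auto_close_at"]:
        return "auto-close"
    if score >= policy["review_at"]:
        return "review"
    return "fast-track"
```

The scoring itself is trivial to copy. Whatever moat exists would sit in the policy format and the accumulated contributor history, which is consistent with the CODEOWNERS-style framing above.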

Why now: AI PR volume crossed the maintainer pain threshold in the last 6 months. Before mid-2024 this product had no users.

Why us: weak. No operator edge here — this is a pure dev-tools bet judged on speed and distribution, not on your unfair advantages.

The path: free GitHub Action, 3–5 pilot repos with verified 10+ AI PRs/week, 14 days, ~$2K in capital. Kill if <60% call the score useful or zero willingness-to-pay.

Small, reversible, honest. Let's run the 14 days and see if the pain is real.

The detail behind the pitch
Problem
Open-source and smaller projects struggle to handle the incoming flood of high-volume AI-generated PRs without maintainer burnout, lacking tools to assess risk and set machine-readable policies before human review.
Proposed solution
Build a GitHub Actions app that auto-scores PR risk (based on diff patterns, maintainer policies, contributor trust history) and gates/routes PRs to maintainers using machine-readable project policies.
Target market
Open-source maintainers of mid-size projects (100–1K stars) receiving 10+ AI-generated PRs/week; GitHub marketplace + self-hosted licensing.
First test
Create a free GitHub Action, install it in 3–5 active open-source repos, collect feedback on risk-scoring accuracy and policy expressiveness over 14 days via surveys.
Kill criteria
<3 repos with verified 10+ AI PRs/week installed AND active after 14 days, OR risk-score rated 'useful' by <60% of surveyed maintainers on a binary useful/not-useful question, OR zero maintainers answer 'yes' to 'Would you pay $15/mo for this?' by day 14, OR no organic install (outside founder outreach) within 30 days → kill
Competitive landscape
Incumbents: CodeRabbit, Graphite Automations, Reviewpad, PullApprove, Mergify, DangerJS, Aviator
Pricing: $12–$19/seat/mo (CodeRabbit pro); Mergify free tier + $8–$18/seat/mo; Reviewpad free OSS tier; Graphite ~$18/seat/mo
Saturation: medium
Wedge: None of the incumbents combine AI-generation detection, contributor trust history, and machine-readable policy enforcement into a single risk-gating layer; they either review code content (CodeRabbit) or automate routing rules (Mergify) but never both with AI-PR awareness.
User complaints:
- Existing tools focus on code quality review, not on risk-scoring or gating AI-generated vs human PRs specifically
- Mergify and PullApprove automate routing but require complex YAML config with no AI-aware heuristics
- CodeRabbit reviews content but does not enforce machine-readable project policies or contributor trust scoring
- Maintainers report no tool distinguishes AI-generated PRs from human PRs or adjusts review burden accordingly
- DangerJS requires per-repo scripting overhead that small OSS teams lack bandwidth to maintain
- No incumbent surfaces a composite 'risk score' combining diff complexity, contributor history, and AI-generation likelihood
Notes: The space is moderately crowded at the layer of automated PR routing and AI code review, but the specific intersection of AI-PR flood detection + risk scoring + policy gating is genuinely unoccupied. The strongest competitive threat is Mergify expanding its automation rules with AI signals, or CodeRabbit adding a policy/gating layer; both are well-funded and have OSS distribution. The defensible moat is the contributor trust graph and the machine-readable policy spec (a CODEOWNERS-like standard), which would create switching costs if adopted. GitHub itself is the silent incumbent risk: it could ship AI PR filtering natively into the platform.
Skeptic + judge rationale
Death modes:
- AI-generation detection accuracy is too low (<70% precision/recall on real repos) because LLM-authored PRs are increasingly indistinguishable from human PRs, causing maintainers to lose trust in the risk score after seeing 3–5 false positives in the first week, and abandoning the tool by day 10, before any willingness-to-pay signal can be captured
- The 3–5 pilot repos are recruited from the founder's personal network and already have low AI-PR volume (<2/week), meaning the core pain point never manifests during the 14-day test, surveys return 'nice to have' rather than 'urgent need' responses, and no urgency-driven conversion signal emerges to justify continued development
- Mergify ships an 'AI PR detection' beta flag inside their existing automation rules engine within 60 days (they have the GitHub webhook infrastructure, YAML policy engine, and OSS distribution already), collapsing the wedge and making the standalone product redundant before the first paying customer is closed; maintainers already using Mergify see no reason to add a second tool
Judge rationale (score=72.0): Wins on capital (pure GitHub Action, near-zero infra cost), low ops complexity (Vercel/serverless-tier), and clean SaaS recurring model with minimal human-in-loop once shipped. Loses heavily on ARPU ($15/mo/seat against OSS maintainers who notoriously don't pay), market size (mid-size OSS maintainers receiving 10+ AI PRs/week is a narrow slice, low hundreds-to-low-thousands of real buyers), and defensibility (Mergify/CodeRabbit can replicate the wedge in a sprint, GitHub itself is a platform risk). Days-to-revenue is realistic at 60-90d given OSS sales cycles and the kill criteria's own paid-conversion bar by day 14 is optimistic. Decent bet on shape, weak on economics; would need a clear path to team/enterprise tiers to justify.
Source: hn:ask_hn
Reply "approve #5" on Telegram to ship this bet.

★ Killian's Wildcard

Off-Brief, Off-Hand

Tonight's instinct bet — synthesized from training, not pulled from sources. Same calibration, different lane.
The Wildcard

Flat-fee returns SaaS for sub-$5M Shopify brands

judge 65/100 · edge 3.0/10

Loop Returns just priced out the entire mid-market. A Shopify apparel brand doing $2M/yr now pays $340/mo plus $0.20 per return, and apparel return rates run 20-30%, so a Black Friday spike means a $1,200 surprise bill. Returnly got shut down by Affirm in 2023. These merchants are bleeding 15-25% of paid ad ROAS to returns and have no tool built for them.

BuiltWith counts ~80,000 Shopify stores in the $500k-$5M apparel band. Capture 1.5% (1,200 merchants) at $99/mo flat, no per-return fees = $1.4M ARR. Real, not hand-wavy.

The wedge: $99 flat + AI policy layer (store-credit bonus, exchange-first prompts, restocking config) that nobody under $200/mo offers. Rich Returns and Return Prime are pure workflow — no intelligence. Loop is too expensive. AfterShip's free tier caps at 3 returns/mo.

Why now: Returnly's 2023 shutdown displaced thousands of merchants, Loop hiked prices the same year, and GPT-4 makes the policy-optimization layer cheap to build solo.

Why us: honestly, thin. My COGS/manufacturing background gives me returns-economics intuition, but this is a SaaS distribution play. I'm not pretending otherwise.

The path: $200 to join 5 DTC Slacks, offer a free returns audit, deliver a 1-page rec doc, ask for $99/mo. Kill if <5 audit requests in 14 days, <3 paid by day 30, or any churn before day 45 citing workflow limits.

The risk I'll name: Rich Returns ships a GPT tier in 60 days and the wedge compresses. So we move fast or we don't move. Let's find out in two weeks.

The detail behind the pitch
Problem
Shopify merchants doing $500k-$5M/yr in apparel/accessories lose 15-25% of paid ad ROAS to returns, but Loop Returns and Returnly are priced for $10M+ GMV brands ($400-2000/mo plus per-return fees) — sub-$5M brands eat the cost or use clunky free apps.
Proposed solution
A no-app, email+Stripe-link returns workflow with AI-drafted policy optimization (offer store credit with 10% bonus, restocking fees, exchange-first prompts) priced at flat $99/mo with no per-return fees.
Target market
~80,000 Shopify stores in the $500k-$5M GMV apparel band per BuiltWith + Shopify's public segment data; DTC brands burned by Loop's pricing creep.
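As a rough check on the sizing, the sketch below reproduces the pitch's ~$1.4M ARR figure from its own inputs (80,000 stores, 1.5% capture, $99/mo flat). The function name `flat_fee_arr` is illustrative, and the model assumes no churn, discounts, or per-return revenue.

```python
def flat_fee_arr(stores: int, capture_rate: float, monthly_price: float) -> tuple[int, float]:
    """Return (paying merchants, annual recurring revenue) for a flat monthly fee."""
    # Round to whole merchants so float noise in the capture rate does not leak through.
    customers = round(stores * capture_rate)
    # Flat fee only: 12 monthly payments per merchant, no per-return component.
    arr = customers * monthly_price * 12
    return customers, arr


customers, arr = flat_fee_arr(stores=80_000, capture_rate=0.015, monthly_price=99.0)
print(f"{customers} merchants -> ${arr:,.0f} ARR")
```

With these inputs the result is 1,200 merchants and $1,425,600 a year, which the pitch rounds to $1.4M.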
First test
Post in 5 DTC Slack/Discord communities (DTC Newsletter's community, 2PM, Operators) offering a free 'returns audit': ask for their last 90 days of return data, deliver a 1-page recommendations doc, then ask for $99/mo to implement. Spend: $0-200 for community access. Run cold LinkedIn outreach to 200 DTC ops managers with the same offer.
Kill criteria
<5 audit requests in 14 days OR <3 paid conversions ($297 MRR) by day 30 OR any paying customer churns before day 45 citing workflow limitations → kill; secondary trigger: if 3+ audits complete but 0 convert to paid within 7 days of doc delivery, kill the audit-to-paid motion immediately and reassess pricing or offer before day 21
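The kill criteria can be read as a small decision rule. The sketch below is one possible encoding, not the brief's actual tooling: the `TestSnapshot` fields and the `kill_signals` function are hypothetical names, and it assumes each threshold is only checked once its deadline has been reached.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TestSnapshot:
    day: int                           # days since the test started
    audit_requests: int                # audit requests received in the first 14 days
    paid_customers: int                # paid conversions so far (counted through day 30)
    audits_past_7d: int                # audits whose doc was delivered 7+ days ago
    workflow_churn_day: Optional[int]  # day a payer churned citing workflow limits, if any


def kill_signals(s: TestSnapshot) -> List[str]:
    """Return every kill or reassess trigger that has fired for this snapshot."""
    signals = []
    if s.day >= 14 and s.audit_requests < 5:
        signals.append("kill: <5 audit requests in 14 days")
    if s.day >= 30 and s.paid_customers < 3:
        signals.append("kill: <3 paid conversions by day 30")
    if s.workflow_churn_day is not None and s.workflow_churn_day < 45:
        signals.append("kill: churn before day 45 citing workflow limits")
    # Secondary trigger: stop the audit-to-paid motion and revisit pricing or offer.
    if s.audits_past_7d >= 3 and s.paid_customers == 0:
        signals.append("reassess: 3+ audits done, 0 paid within 7 days of delivery")
    return signals


print(kill_signals(TestSnapshot(day=14, audit_requests=4, paid_customers=0,
                                audits_past_7d=0, workflow_churn_day=None)))
```

An empty list means no trigger has fired yet. Returning all fired triggers, rather than stopping at the first, keeps the secondary "reassess" signal visible alongside a hard kill.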
Competitive landscape
Incumbents: Loop Returns, Returnly (acquired by Affirm, then shut down 2023), AfterShip Returns, Happy Returns (UPS), Return Prime, Narvar Return, ShipBob Returns, Rich Returns, Clicksit / ZigZag Returns
Pricing:
- Loop: $155-$340/mo base + $0.10-$0.25/return overage (enterprise tiers $750-2000/mo)
- AfterShip Returns: free tier (3 returns/mo) then $23-$239/mo
- Return Prime: free–$9.99/mo lite, $49-$199/mo full
- Rich Returns: $14-$49/mo
- Happy Returns: carrier/volume-negotiated, mid-market focus
Saturation: medium
Wedge: Flat $99/mo with zero per-return fees + AI policy optimization (store-credit bonus, exchange-first prompts) targets the 80k Shopify apparel merchants priced out of Loop and underserved by generic low-cost alternatives that have no intelligence layer.
User complaints:
- Loop's per-return fees stack unpredictably on top of base subscription, causing bill shock for seasonal spikes
- Loop raised prices in 2023 with little notice; smaller merchants forced off the platform or onto watered-down plans
- AfterShip free tier capped at 3 returns/month, useless for real volumes; paid tiers lack exchange-first logic
- Return Prime and Rich Returns lack AI-driven policy nudges (store credit bonus, restocking fee config); pure workflow tools only
- No app on Shopify store = friction concern for merchants, but email+link flow is actually how many small brands already handle returns ad hoc
- Returnly shut down post-Affirm acquisition (2023), displacing its merchant base and creating an opening
- Most tools require Shopify app install + onboarding; merchants report setup complexity as a top complaint in app store reviews
- Per-return fees make cost unpredictable and punish high-return-rate categories like apparel
Notes: The $99 flat-fee positioning is genuinely differentiated from Loop/Returnly on price structure, and no incumbent in the sub-$200/mo tier offers AI-drafted policy optimization; that combo is the real wedge. However, AfterShip and Return Prime are encroaching downmarket and could replicate features quickly; the moat is thin unless the AI policy layer demonstrably lifts ROAS/LTV metrics that merchants can see in a dashboard. The 'no-app' email+Stripe-link flow is clever for instant setup but may limit retention and expansion revenue versus embedded app competitors. Biggest risk: Loop's 2024 pricing restructure or AfterShip adding an AI tier could compress the window within 12-18 months.
Skeptic + judge rationale
Death modes:
- Rich Returns ($14-49/mo) or Return Prime ($49/mo) ships a GPT-powered 'policy optimizer' feature within 60 days of seeing community posts. Merchants choose the embedded Shopify app with AI over a no-app $99 email workflow, and the wedge collapses before the founder reaches 10 paying customers.
- Audit converts to insight but not to payment: merchants take the free 1-page recommendations doc, implement the store-credit bonus and exchange-first prompts manually in their existing free AfterShip or email setup, and ghost the $99/mo ask. Conversion rate from audit to paid stays below 5% because the 'aha moment' is the PDF, not the ongoing tool.
- No-app email+Stripe-link workflow fails the returns volume stress test: a merchant with 80+ returns/month finds the email threading becomes unmanageable and escalates a single botched return publicly in the Operators Slack where the founder just ran outreach, poisoning the community channel that was the entire acquisition strategy.
Judge rationale (score=65.0):
Wins on capital ($0-200 to test), SaaS recurring revenue at ~$1.2k ARPU, and a real 80k-merchant market with a credible pricing wedge. Loses heavily on human intervention: the audit-to-paid motion requires the founder doing data analysis + custom recommendation docs per prospect, and the email+Stripe-link workflow likely needs human-in-loop ops for return disputes/edge cases. This is service delivery dressed as SaaS. Defensibility is thin (Rich Returns or AfterShip can ship a GPT policy layer in 60 days), and operational complexity rises fast once a merchant hits 80+ returns/month on email threading. Decent bet on paper, but violates the zero-human thesis until the audit-to-onboard flow is fully automated.
Reply "approve wildcard" on Telegram to ship.