
Idea Generation
Generating, refining, and evaluating research hypotheses. Systems span direct LLM prompting, retrieval-augmented and knowledge-graph generation, multi-agent collaboration, and learned quality signals. The central challenge: LLMs can produce ideas that appear novel and well-motivated, yet often struggle to generate ones that remain feasible, distinctive, and impactful after execution.












Paper2Social
Posts crafted from the survey across X, LinkedIn, Reddit, and Mastodon — each tuned to its platform's tone, length, and audience.
A fully automated AI system can now generate a research paper for as little as $15. But under pressure, every frontier LLM still fabricates results. The capability-vs-integrity tension is real. Our survey of 200+ papers maps the boundary. #AIforScience #LLM
Three findings every research lab should know in 2026: 1. AI handles the mechanical — but research-level code plateaus near 37% success. 2. 95.8% of rejected papers are misclassified as acceptable by LLM reviewers. 3. The most successful auto-research systems converge on a 3-layer architecture: exploration, execution, verification. Read the full survey →
[D] We surveyed 200+ AI auto-research papers. Here's what works, what doesn't, and what to do about it. Tools are now generating papers in 2.3 hours at $15. But the failure modes are getting harder to see — not easier. Ideation looks novel until execution; LLM reviewers are systematically lenient; cost decouples from quality past a modest budget. Full breakdown of the capability boundary, by stage. AMA in comments.
2/8 The ideation-execution gap is real: LLM ideas score 5.38 on novelty → drop to 3.41 after a human implements them. Brilliant on paper, brittle in practice. We see the same shape across stages.
Up to 17.5% of CS papers already carry detectable AI modification. The community needs to shift from detection (a losing race) to declaration. We propose stage-by-stage disclosure norms in the playbook. #AIethics #scicomm
Excited to release the Practitioner's Playbook — a stage-by-stage guide on what to delegate to AI and what to keep under human ownership. For each of the 8 research stages: ✅ delegate, ⚠️ retain, ❌ key risk. Built from controlled experiments and 250+ papers. Free + open. @worldbench