- AI models struggle in investment banking because critical workflow data (IPO execution, M&A, complex spreadsheets) is scarce, proprietary, and highly specialized.
- Models frequently hallucinate and compound errors across multi-step, numerically intensive tasks, with studies showing sub-50% accuracy on basic finance problems.
- Most enterprise and financial AI pilots fail to progress beyond proof-of-concept or deliver P&L impact, and early deployments have already produced billions in losses from flawed outputs and compliance issues.
- Effective use of AI in banking requires domain-specific data, tightly scoped use cases, strong human oversight, and rigorous governance rather than attempts at full automation.
“Why AI Models Suck at Investment Banking” main points:
The article argues that while AI models are impressively capable at broad tasks (poetry, text generation, code), they falter in investment banking largely because training data for specialized workflows is sparse and mostly proprietary: there is inadequate data covering IPO execution, M&A diligence, spreadsheet work, and similar tasks [1]. AI's tendency to hallucinate and to accumulate errors across sequential steps undermines its suitability for high-stakes financial work. Real-world ROI remains weak: Forbes cites an MIT study finding that 95% of generative AI pilots don't move past proof-of-concept [1].
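The compounding-error point can be made concrete with a back-of-the-envelope model (an illustration added here, not from the article): if each step of a workflow succeeds independently with probability p, a chain of n steps succeeds with probability p^n, so even high per-step accuracy erodes quickly over long workflows.

```python
# Illustrative sketch only: assumes each step succeeds independently
# with probability p, so a chain of n steps succeeds with p**n.
# Real banking workflows are not independent steps, but the geometric
# decay captures why errors compound in multi-step AI tasks.

def chain_success(p: float, n: int) -> float:
    """Probability that all n sequential steps succeed."""
    return p ** n

for p in (0.99, 0.95, 0.90):
    for n in (10, 30):
        print(f"per-step accuracy {p:.2f}, {n:2d} steps "
              f"-> chain success {chain_success(p, n):.2f}")
```

Under these simplified assumptions, a model with 95% per-step accuracy completes a 30-step workflow correctly only about one time in five, which is why the article's argument centers on multi-step, numerically intensive tasks rather than single-shot ones.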
Corroborating and contrasting evidence:
Multiple sources back these claims. A study evaluating 22 models on basic finance tasks finds average accuracy below 50%, underscoring a substantial performance gap even in entry-level banking functions [2]. A separate Forbes analysis notes that 95% of financial AI pilots fail to generate P&L impact, even amid hype about transformative potential; yet some firms, particularly those where senior decision-makers apply AI with the right context and integration, do outperform [3].
EY’s survey of large companies finds that early AI deployment has produced roughly $4.4 billion in combined losses, driven by compliance failures, flawed outputs, and sustainability disruptions rather than job losses or legal penalties [4].
Strategic implications:
— Investment banks and deal firms should temper expectations: AI is not ready to replace junior bankers, but it can amplify human expertise. Overpromising autonomy may erode trust internally and with clients.
— Focus should shift to acquiring domain-specific data, building feedback loops, and integrating human judgment where context matters. Tasks requiring precision should be handled by hybrid models (AI plus human oversight) rather than full automation.
— Risk management and governance are essential: accuracy, auditability, bias, explainability, and regulatory compliance pose rising costs and institutional risk.
— Select pilots carefully: projects with clear ROI and high repeatability (e.g. document processing, memo generation, formatting, initial screening) may succeed; complex, multi-party workflows (e.g. fundraisings, complicated M&A) are less forgiving.
Open questions:
— What models or architectures can better handle small, private, and sequential datasets characteristic of investment banking workflows?
— How can compounding errors in multi-step AI tasks be rigorously measured and reduced?
— What regulatory frameworks will emerge to govern AI failures, responsibility, and disclosure in high-stakes financial decisions?
— How can institutions source or create enough high-quality data without breaching confidentiality or incurring prohibitive costs?
— What training or organizational change is needed to bridge the intuition gap between technologists building AI and bankers using it?
Altogether, while the promise of AI in investment banking is real, current limitations are too material to ignore; strategic investment in infrastructure, data, and governance is required to turn potential into consistent performance.
Supporting Notes
- Forbes reports that major tech companies will spend about $400 billion on AI infrastructure in 2025; yet return on investment in working use cases has remained weak, especially in investment banking workflows [1].
- MIT study cited by Forbes found 95% of generative AI pilots do not move beyond proof-of-concept, failing to generate measurable impact [1].
- Vals AI study evaluated 22 models on basic finance tasks; models averaged below 50% accuracy, with some large models scoring <10% in tasks like analyzing SEC filings [2].
- The same MIT study, revisited in a separate Forbes analysis, confirms the high failure rate of enterprise AI, attributed primarily to misaligned expectations and inadequate domain integration [3].
- EY survey of 975 large companies found ~$4.4 billion in losses tied to AI deployment, primarily from compliance issues, flawed outputs, bias, and sustainability disruptions—not legal penalties [4].
- On numerical-reasoning weaknesses: Levy’s University of Chicago study shows that LLM accuracy drops sharply when adding large numbers or when input data is subtly manipulated, evidence of reliance on pattern matching rather than true analytic reasoning [2].
- The analysis emphasizes that current AI models have little public training data for real-world workflows like corporate finance documentation, M&A, and IPOs, where data is proprietary and nuanced [1].
Sources
- [1] www.forbes.com (Forbes) — October 31, 2025
- [2] www.washingtonpost.com (The Washington Post) — April 22, 2025
- [3] www.forbes.com (Forbes) — September 17, 2025
- [4] www.reuters.com (Reuters) — October 8, 2025