Ramit Sawhney (Netaji Subhas Institute of Technology), Arshiya Aggarwal (Delhi Technological University), Piyush Khanna (Delhi Technological University), Puneet Mathur (University of Maryland College Park), Taru Jain (GGSIPU), and Rajiv Ratn Shah (IIIT Delhi)
Stock volatility measures the degree of deviation from expected returns and thus estimates risk, which is crucial for investment decision making. Volatility forecasting is complex given the stochastic nature of market microstructure, where noisy data across various modalities must be used to make temporally dependent forecasts. Transcripts of companies' earnings calls are well studied for risk modeling, as they offer unique investment insight into stock performance. Anecdotal evidence suggests that the vocal cues of company CEOs can be indicative of stock performance. However, the recently developing body of work on analyzing earnings calls treats stocks as independent of each other, ignoring the rich relations between them. To this end, we introduce the first neural model that employs cross inter-modal attention for deep verbal-vocal coherence and accounts for stock interdependence through multi-layer network embeddings. We show that our approach outperforms state-of-the-art methods by augmenting speech features with correlations from the text and stock-network modalities. Lastly, we analyze the components and financial implications of our method through an ablation study and a case study.
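The abstract does not detail the attention mechanism, but the general idea of cross-modal attention between two modalities (e.g. audio frames attending over text-token embeddings) can be sketched as follows. This is an illustrative sketch only, not the paper's architecture; all names, dimensions, and the use of plain dot-product attention are assumptions for exposition.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(queries, keys_values):
    """Attend from one modality (queries, e.g. audio-frame embeddings)
    over another (keys_values, e.g. text-token embeddings).

    Both inputs are (sequence_length, dim) arrays sharing the same dim.
    Returns one context vector per query, i.e. the audio sequence
    re-expressed as weighted combinations of text features.
    """
    d = queries.shape[-1]
    scores = queries @ keys_values.T / np.sqrt(d)  # (n_audio, n_text)
    weights = softmax(scores, axis=-1)             # rows sum to 1
    return weights @ keys_values                   # (n_audio, dim)

# Toy example: 5 audio-frame embeddings attending over 7 text tokens.
rng = np.random.default_rng(0)
audio = rng.standard_normal((5, 16))
text = rng.standard_normal((7, 16))
fused = cross_modal_attention(audio, text)
print(fused.shape)  # (5, 16)
```

In a real multimodal model the queries, keys, and values would each pass through learned linear projections before the dot product; this sketch omits them to keep the attention pattern itself visible.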