# Financial time series forecasting with AI

## Journey in Financial Forecasting: From Failed Models to a New Paradigm

I embarked on a project to forecast financial time series, specifically predicting stock price movements. My goal was to classify the next day's price movement into one of three categories: **decline, flat, or rise**. Here's a chronicle of my journey through various models, frustrating roadblocks, and an eventual, unexpected conclusion.

#### Phase 1: The Initial Approach - A Classification Task

My first strategy was to build a robust classification model. I chose XGBoost as a strong baseline and a Transformer model for its prowess in handling sequential data.

**My toolkit included:**

* **Stock Data:** Sourced using `pykrx`.
* **News Data:** Gathered with `gnews` and processed for sentiment analysis using the `'tabularisai/multilingual-sentiment-analysis'` model.
* **Technical Indicators:** A suite of features generated with the `ta` library.
* **Loss Function:** To combat the natural imbalance between "rise," "decline," and "flat" days, I used `class_weights` within the `CrossEntropyLoss` function.

**The Initial Results:**

* **XGBoost Accuracy:** 43.5%
* **Transformer Accuracy:** 41.7%

These results were disheartening. With a random chance baseline of 33.3%, my models were performing only marginally better. My primary suspect was the classic **class imbalance problem**, which I thought I had addressed.

#### Phase 2: The Grind - A Barrage of Failed Attempts

Convinced I could solve this, I systematically tried a wide array of techniques to improve performance. Each attempt, however, ended in failure.

* **Data Augmentation:** Neither adding noise nor using SMOTE (Synthetic Minority Over-sampling Technique) yielded any improvement.
* **Model & Training Adjustments:**
    * Changing the classification threshold.
    * Varying the number of technical features (both more and less).
    * Switching the model architecture from a Transformer to an LSTM.
    * Replacing `CrossEntropyLoss` with `FocalLoss` to better handle hard-to-classify examples.
    * Implementing `EarlyStopping` to prevent overfitting.
    * Increasing the `dropout` value for regularization.
    * Tuning the learning rate (`LR`) and applying gradient clipping.

Nothing worked. The performance remained stubbornly low.

#### Phase 3: The Pivot - New Data & New Problems

Frustrated with classification, I wondered if I was framing the problem incorrectly.

**Attempt 1: Switch to Regression**

I changed the task from classifying direction to regressing the future price. The result? An RMSE of 0.0212. Impressively... it was even worse. This confirmed that predicting the exact price is significantly harder, so I rolled back to classification.

**Attempt 2: Add Macroeconomic Data**

Perhaps my model was missing the bigger picture. I integrated macroeconomic data from FRED:

* KOSPI Index
* KRW/USD Exchange Rate
* 10-Year Treasury Constant Maturity Rate

The result? Still no significant improvement. The "WHY????" echoed in my mind.

#### Phase 4: Exploring the State-of-the-Art (SOTA)

I decided it was time to see what cutting-edge research models could do. I looked into methods like 'Time-R1' and 'TiRex'.

I started with TiRex. Since TiRex doesn't work well in a non-CUDA Windows environment, I fired up a Google Colab notebook with a GPU. I fed it my data for a **zero-shot** prediction—meaning the model had *zero* training on my specific dataset. The results were shocking.
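For context, the zero-shot run itself takes only a few lines. Here is a minimal sketch, assuming the `load_model` / `forecast` interface from the TiRex repository (the exact signatures and return format may vary by version, and the 0.2% "flat" threshold is my own placeholder, not a tuned value):

```python
import torch
from tirex import load_model  # assumes the package from the NX-AI TiRex repository

# Stand-in for a real closing-price series (in my case, daily closes pulled via pykrx).
close_prices = torch.randn(600).cumsum(dim=0) + 100.0

# TiRex expects a batch of context windows: shape (batch, sequence_length).
context = close_prices[-512:].unsqueeze(0)

# Zero-shot forecast: the pretrained model has never seen this series.
# In the version I used, forecast() returned quantile forecasts plus a point forecast.
model = load_model("NX-AI/TiRex")
quantiles, point_forecast = model.forecast(context=context, prediction_length=1)

# Map the one-step-ahead forecast onto my three classes with a small "flat" dead zone.
predicted_return = float((point_forecast[0, 0] - context[0, -1]) / context[0, -1])
label = "rise" if predicted_return > 0.002 else "decline" if predicted_return < -0.002 else "flat"
print(label)
```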
TiRex's zero-shot performance was substantially better than my meticulously trained custom models. This was a major turning point. The idea of painstakingly building my own dataset and fine-tuning a model like Qwen2 4B (similar to the Fin-R1 approach) suddenly seemed inefficient.

#### The Final Revelation: Why Reinvent the Wheel?

As I contemplated the immense effort of fine-tuning, a simpler question emerged: "Why don't I just use an API?"

My research led me to a fascinating article by Kakao Bank, "[ChatGPT and Stock Price](https://tech.kakaobank.com/posts/2403-chatgpt-and-stock-price/)," which demonstrated the impressive capability of large language models (LLMs) like GPT-4 in financial contexts.

**Conclusion:**

My journey through the weeds of custom model building led me to a powerful conclusion. The challenge of stock market forecasting isn't just a numerical time-series problem; it's deeply intertwined with understanding the narrative, sentiment, and complex interplay of news and macroeconomic events. This is precisely where massive, pre-trained LLMs excel.

Instead of building a specialized model from scratch, the more effective and efficient path forward may be to leverage the emergent reasoning capabilities of models like GPT-4 via an API. My next steps will be to explore this new and promising direction.
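As a first, purely illustrative sketch of what that exploration might look like (the prompt wording, the `gpt-4o` model name, and the placeholder data below are assumptions, not a validated setup):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Context for the LLM to reason over: recent prices plus news headlines.
# Both lists are placeholders here; in practice they would come from pykrx and gnews.
recent_closes = [71800, 72200, 71900, 72500, 73100]
headlines = [
    "Chipmaker beats quarterly earnings estimates",
    "Central bank hints at holding rates steady",
]

prompt = (
    "You are a cautious financial analyst.\n"
    f"Recent daily closing prices: {recent_closes}\n"
    f"Today's headlines: {headlines}\n"
    "Classify the most likely next-day move as exactly one word: decline, flat, or rise."
)

response = client.chat.completions.create(
    model="gpt-4o",  # any capable chat model; this particular name is an assumption
    messages=[{"role": "user", "content": prompt}],
    temperature=0,
)

print(response.choices[0].message.content.strip())
```

The appeal of this setup is that a single call can take price history, news headlines, and macro context together in one prompt, which is exactly where my custom numerical models struggled.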