The Data Science Salon New York 2024 provided a remarkable platform for professionals to connect and exchange insights on the evolving landscape of AI and Machine Learning. The event featured in-depth discussions on Generative AI, Large Language Models (LLMs) in Finance, and a range of other AI tools for finance.
Attendees explored the latest advancements, shared best practices, and examined new tools and techniques that are shaping the industry.
The discourse on Generative AI was vibrant and thought-provoking. Innovations like GPT-4, LLaMA 3, and Sora's text-to-video technology highlight AI's potential to drive a new industrial revolution. However, this technological leap brings challenges such as ethical issues, security risks, biases, and fears of 'superintelligence.' Against this backdrop, seasoned industry scientists convened at the New York Data Science Salon to address these concerns and chart a path forward.
Sumedha Rai, a data scientist from Acorns Grow, delved into the critical issue of bias in AI, particularly in the fintech industry. Rai outlined various sources of bias, including data, algorithmic, and cognitive biases, explaining their impact on AI models and society. She highlighted real-world examples, such as Amazon's biased hiring model and racial disparities in loan approvals, emphasizing the profound consequences of biased AI on marginalized groups.
Rai stressed the need for proactive measures to mitigate bias, the importance of diverse, interdisciplinary teams in AI development, and the regular evaluation of AI models to ensure fairness and accuracy.
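To make the "regular evaluation" point concrete, here is a minimal sketch (not from Rai's talk) of two common group-fairness checks, computed with plain NumPy on toy loan-decision data. The acceptable size of a gap is context- and policy-dependent.

```python
import numpy as np

def demographic_parity_gap(y_pred, group):
    """Difference in positive-prediction (approval) rates between two groups.
    A gap near 0 suggests the model treats the groups similarly on this axis."""
    return abs(y_pred[group == 0].mean() - y_pred[group == 1].mean())

def equal_opportunity_gap(y_true, y_pred, group):
    """Difference in true-positive rates (recall) between the two groups."""
    tpr = lambda g: y_pred[(group == g) & (y_true == 1)].mean()
    return abs(tpr(0) - tpr(1))

# Toy data: 8 loan applicants from two demographic groups
y_true = np.array([1, 0, 1, 1, 0, 1, 0, 1])   # actually creditworthy?
y_pred = np.array([1, 0, 1, 0, 0, 1, 1, 1])   # model's approval decision
group  = np.array([0, 0, 0, 0, 1, 1, 1, 1])   # group membership

print(demographic_parity_gap(y_pred, group))          # 0.25
print(equal_opportunity_gap(y_true, y_pred, group))   # ~0.33
```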
Claudia Pereira Johnson from Nubank explored how fintech companies can leverage data to create personalized customer experiences. Nubank, one of the world's largest digital banks, utilizes financial and app engagement data to provide customized shopping recommendations within their banking app.
Claudia emphasized the importance of data cleaning and feature creation to build accurate customer profiles, which can then be used for targeted marketing and product recommendations. By integrating credit card transactions, app engagement, and demographic data, Nubank offers tailored financial solutions that improve customer satisfaction and loyalty.
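As an illustration of the kind of feature creation Claudia described, here is a hedged pandas sketch that turns raw transactions and app events into a per-user profile. The table and column names are hypothetical, not Nubank's actual schema.

```python
import pandas as pd

# Hypothetical raw tables: card transactions and in-app engagement events
transactions = pd.DataFrame({
    "user_id": [1, 1, 2, 2, 2],
    "category": ["grocery", "travel", "grocery", "dining", "dining"],
    "amount": [82.5, 410.0, 35.0, 22.0, 18.5],
})
app_events = pd.DataFrame({
    "user_id": [1, 1, 2],
    "screen": ["rewards", "shopping", "shopping"],
})

# Spend profile: each user's share of spend per category
spend = transactions.pivot_table(index="user_id", columns="category",
                                 values="amount", aggfunc="sum", fill_value=0)
spend = spend.div(spend.sum(axis=1), axis=0).add_prefix("spend_share_")

# Engagement profile: how often each user visits each screen
engagement = (app_events.groupby(["user_id", "screen"]).size()
              .unstack(fill_value=0).add_prefix("visits_"))

# Joined customer profile, ready for recommendation models
profile = spend.join(engagement, how="outer").fillna(0)
print(profile)
```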
Oren Netzer from Data Heroes introduced the concept of coresets, a data sampling technique originating from computational geometry. Coresets create a weighted subset of the original dataset, retaining its statistical properties and outliers, making them highly effective for machine learning tasks. Netzer highlighted the advantages of coresets, such as reducing complexity, speeding up data processing, and improving model performance. By using coresets, companies can handle large datasets more efficiently, balance quality metrics more effectively, and perform frequent model updates with minimal computational costs.
Here are two main takeaways from the coresets technique:
- A coreset preserves the statistical properties of the full dataset, including its outliers, in a much smaller weighted subset, cutting processing time and complexity.
- Because training on a coreset is cheap, models can be retrained and updated frequently at minimal computational cost.
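For intuition, here is a minimal sketch of one published coreset construction, the "lightweight coreset" of Bachem et al. (2018), which samples points by a simple sensitivity score and attaches inverse-probability weights. It is a generic illustration, not DataHeroes' proprietary implementation.

```python
import numpy as np
from sklearn.cluster import KMeans

def lightweight_coreset(X, m, random_state=0):
    """Sample m points with probability mixing a uniform term and a
    distance-to-mean sensitivity term, and return importance weights
    so weighted statistics stay (approximately) unbiased.
    Follows the lightweight-coreset idea of Bachem et al. (2018)."""
    rng = np.random.default_rng(random_state)
    d = np.linalg.norm(X - X.mean(axis=0), axis=1) ** 2
    q = 0.5 / len(X) + 0.5 * d / d.sum()     # sampling distribution
    idx = rng.choice(len(X), size=m, replace=True, p=q)
    weights = 1.0 / (m * q[idx])             # inverse-probability weights
    return X[idx], weights

X = np.random.default_rng(1).normal(size=(100_000, 8))
X_core, w = lightweight_coreset(X, m=2_000)

# Train on the small weighted subset instead of all 100k rows
km = KMeans(n_clusters=5, n_init=10).fit(X_core, sample_weight=w)
```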
Antonio Ponte, the Global Deposit Product Manager at Citi, discussed the transformative potential of large language models (LLMs) and generative AI for finance. He highlighted their disruptive impact across industries, particularly in banking, predicting a significant boost in market capitalization. Antonio outlined various integration strategies for generative AI, ranging from leveraging vendor-provided features to deploying open-source models on private infrastructures.
He emphasized the development process of LLM solutions, detailing three key phases: exploration and learning, prototype development, and production deployment. Antonio underscored the need to prioritize use cases, establish robust infrastructure, and ensure stringent governance to mitigate risks such as data sensitivity and model hallucinations.
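On the data-sensitivity risk specifically, a common guardrail is to redact obvious PII before a prompt ever leaves the firm's perimeter. The sketch below is a hypothetical, regex-based illustration; a production system would rely on a vetted PII-detection service rather than hand-rolled patterns.

```python
import re

# Illustrative patterns only; real deployments need far broader coverage
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "SSN":   re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "CARD":  re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}

def redact(text: str) -> str:
    """Mask matches of each PII pattern with a labeled placeholder."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

prompt = "Customer jane.doe@example.com (SSN 123-45-6789) disputes a charge."
print(redact(prompt))
# Customer [EMAIL] (SSN [SSN]) disputes a charge.
```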
"Generative AI is not just transforming tasks but entire industries, making knowledge workers more productive and redefining the scope of their roles. We're on the brink of a new era where AI will become as indispensable as GPS in our daily lives," Antonio notes.
Jayeeta Putatunda, a senior data scientist at Fitch Ratings, explored the transformative potential of large language models (LLMs) across various industries. She emphasized the importance of robust evaluation frameworks to ensure that LLMs are deployed effectively and without reputational risk. Her talk covered industry trends, evaluation metrics, and specific methodologies for assessing LLM performance, including traditional word-based metrics and advanced embedding-based evaluations.
Jayeeta highlighted the challenges of deploying LLM-focused products into production and the necessity of rigorous evaluation to build trust and reliability in AI solutions. She underscored the advancements in LLMs, from foundational models to sophisticated, application-specific tools that address diverse business needs.
"Without proper evaluation, you can't ensure your product serves business use cases effectively and builds the necessary trust," Jayeeta states, emphasizing the critical role of evaluation in the successful deployment of LLMs. By adopting a structured approach to evaluation and leveraging both traditional and advanced metrics, businesses can ensure that their LLM solutions are robust, reliable, and effective in real-world applications.
Harry Mendell, a Federal Reserve Data Architect, discussed the challenges of implementing AI in banking supervision, emphasizing the difficulties in transitioning AI prototypes to production due to governance and security issues. The Federal Reserve is transitioning to cloud-based architectures, using a data mesh to ensure data quality, automatic ETL processes, and comprehensive data cataloging. This system manages governance profiles, making data searchable and accessible for analysis while maintaining security protocols.
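As a rough illustration of what a catalog record carrying its own governance profile might look like, here is a hypothetical sketch; the field names and access logic are invented, not the Federal Reserve's system.

```python
from dataclasses import dataclass, field

@dataclass
class CatalogEntry:
    """One dataset's entry in a data-mesh catalog: discoverable metadata
    plus the governance profile that controls who may query it."""
    name: str
    owner_domain: str            # domain team that owns the data product
    schema: dict                 # column -> type, used for quality checks
    classification: str          # e.g. "public", "internal", "restricted"
    allowed_roles: list = field(default_factory=list)
    tags: list = field(default_factory=list)

    def accessible_by(self, role: str) -> bool:
        return self.classification == "public" or role in self.allowed_roles

entry = CatalogEntry(
    name="bank_exam_findings",
    owner_domain="supervision",
    schema={"exam_id": "string", "finding": "string", "severity": "int"},
    classification="restricted",
    allowed_roles=["examiner", "data_steward"],
    tags=["supervision", "quarterly"],
)
print(entry.accessible_by("examiner"))  # True
print(entry.accessible_by("analyst"))   # False
```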
Industry experts discussed the complexities of AI deployment in financial services. Sassoon Kosian from New York Life explained the collaboration required to build and deploy predictive models, while S&P Global's Andrey Pakhomov detailed managing fixed income pricing with machine learning.
Capital One's engineering leader Vincent David highlighted the importance of ensuring that models run reliably in production, emphasizing the need for high performance, low latency, and robust infrastructure, along with constant monitoring and adjustment to handle the high transaction volumes typical of financial services. Acorns Grow's data scientist Sumedha Rai shared insights into managing fraud detection models and natural language processing efforts, underscoring the need for comprehensive monitoring and evaluation to maintain model accuracy and effectiveness.
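One concrete flavor of the monitoring both speakers described is drift detection on model scores. Below is a minimal sketch of the population stability index (PSI), a metric widely used in financial services; the 0.2 alert threshold mentioned in the comment is a common rule of thumb, not a figure from the panel.

```python
import numpy as np

def population_stability_index(expected, actual, bins=10):
    """PSI between a model's training-time score distribution and the
    scores it produces in production. A common rule of thumb treats
    PSI > 0.2 as a sign the input population has shifted."""
    edges = np.quantile(expected, np.linspace(0, 1, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf        # catch out-of-range scores
    e = np.histogram(expected, edges)[0] / len(expected)
    a = np.histogram(actual, edges)[0] / len(actual)
    e, a = np.clip(e, 1e-6, None), np.clip(a, 1e-6, None)
    return float(np.sum((a - e) * np.log(a / e)))

rng = np.random.default_rng(0)
train_scores = rng.beta(2, 5, 50_000)     # score distribution at training time
live_scores = rng.beta(2.6, 5, 50_000)    # slightly shifted production scores
print(f"PSI: {population_stability_index(train_scores, live_scores):.3f}")
```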
The panelists also discussed the concept of a proof of concept (POC) in AI projects, describing it as an essential step to validate models before full-scale deployment. They highlighted challenges such as cost, scalability, and compliance with regulatory requirements. Emphasizing the importance of interdisciplinary teams, they noted that successful AI deployment requires collaboration across engineering, data science, product management, and business stakeholders.
Iro Tasitsiomi from T. Rowe Price explored the complexities and misconceptions surrounding generative AI, highlighting its non-deterministic nature compared to traditional AI's deterministic approach. A key issue is the confusion caused by inconsistent definitions of AI from major bodies such as the EU and the National Institute of Standards and Technology, which leads to incorrect problem-solving approaches and misguided regulatory questions. She also emphasized focusing on the efficiency of current AI models rather than continually expanding their size, which may not be necessary and could lead to data exhaustion.
"My goal today is to convince you to think of shape as the fundamental organizing principle of data analysis and that topology is the right way to think about shape." Zack Golkhou, Director of AI and Data Science at J.P. Morgan, discusses the importance of understanding data shape through topology for accurate analysis and modeling. Topology, focusing on local connections between data points rather than absolute distances, offers a robust framework for analyzing complex data.
These topological tools have practical applications in finance, such as fraud detection, identifying market regimes, and understanding cash flow dynamics under varying market conditions. The integration of quantum computing with topological data analysis was also highlighted for handling higher-dimensional data and uncovering deeper patterns, enhancing the ability to model and understand complex systems.
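For a feel of topological data analysis in practice, here is a small sketch using the open-source ripser library (an assumption for illustration, not necessarily J.P. Morgan's tooling) to detect the single persistent loop in a noisy circle via persistent homology.

```python
import numpy as np
from ripser import ripser  # pip install ripser

# Sample points from a noisy circle: a shape with exactly one loop
rng = np.random.default_rng(0)
theta = rng.uniform(0, 2 * np.pi, 200)
points = np.c_[np.cos(theta), np.sin(theta)] + rng.normal(0, 0.05, (200, 2))

# Persistent homology: track connected components (H0) and loops (H1)
# as balls grow around each point, recording when features appear and die
diagrams = ripser(points)["dgms"]
h1 = diagrams[1]                       # birth/death pairs for loops
persistence = h1[:, 1] - h1[:, 0]
print(f"most persistent loop lives for {persistence.max():.2f}")
# One long-lived H1 feature confirms the data's circular shape,
# regardless of how the circle is stretched or rotated.
```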
Varun Nakra from Deutsche Bank covered credit risk modeling, distinguishing between interpretable and explainable machine learning. Interpretable models allow stakeholders to understand and trust decisions, while explainable models provide post-hoc explanations for black-box decisions. Varun advocated combining off-the-shelf machine learning algorithms with constraints like additivity, sparsity, and monotonicity to develop interpretable models without losing accuracy.
"The conclusion is that we can creatively construct a system that uses off-the-shelf ML models subjected to domain-specific constraints and perform constraint optimization to develop interpretable models as per our requirements without losing model accuracy" he said. This approach aims to reduce development time and effort while ensuring reliable, transparent results.
Prasad from S&P Global explained the various credit risk models used to assess the creditworthiness of corporates and financial institutions. He discussed integrating qualitative data through a Sentimental Model Overlay for timely credit evaluations, as well as the planned incorporation of ESG factors driven by regulatory and investor demands, and emphasized the practical applications of these models for banks, insurers, and asset managers.
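As a purely hypothetical sketch of how a sentiment overlay might adjust a quantitative credit assessment (the notch scale, threshold, and logic are invented, not S&P Global's methodology):

```python
# Lowercase notches to signal this is a toy scale, not an agency rating
NOTCHES = ["aaa", "aa", "a", "bbb", "bb", "b", "ccc"]

def overlay(model_notch: str, sentiment: float) -> str:
    """sentiment in [-1, 1]; strongly negative news flow nudges the
    model-implied notch one step toward higher risk before analyst review."""
    i = NOTCHES.index(model_notch)
    if sentiment < -0.5:
        i = min(i + 1, len(NOTCHES) - 1)   # one notch riskier
    elif sentiment > 0.5:
        i = max(i - 1, 0)                  # one notch safer
    return NOTCHES[i]

print(overlay("bbb", -0.7))  # bb: negative news flags earlier deterioration
```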
This year brings impressive GenAI developments, and we're eagerly watching to see what the future holds! Catch all the sessions from Data Science Salon NYC on demand, and save the date: the salon returns to NYC on May 15, 2025.