Are you tired of endlessly searching through old messages? Let’s use deep learning to fix that! In this deep dive, we show how you can quickly turn your Slack channels into an accurate, usable knowledge base with the aid of state-of-the-art open-source tools like refinery, Qdrant and a little bit of Python. Along the way, you’ll learn something about modern large language models (LLMs), semantic search and modern text classification techniques.
Dr. Benedikt Mangold
The interesting part about working on a research hypothesis: you typically don’t know the answer yet. Otherwise, no research would be required, and you could start using the results right away. This fact is the basis for many internal discussions on how to do data science in agile teams, where we have often been asked to commit to results within a given time budget. This talk shares experiences from one year of working on data science research in an agile environment at GfK.
The objective of the Smartnet.ai project at Orange is the reduction of time between malfunction in the telecommunication system and recognition of its root cause. Although this project is for a Telco, its aim is universal for all industries dealing with constant service delivery. The Orange team put into production a scalable system on Google Cloud Platform. Its Deep Neural Network module allows for anomaly detection and data engineering for identifying the root cause. The system works in a streaming mode with data quality and drift monitoring modules.
Air pollution in cities has to stay below certain limits for health reasons. This talk shows an approach to forecasting air pollution for the city of Berlin for the next few days using XGBoost. The talk will focus on the model and the challenges we faced during the modeling process. We will also take a look at the tech stack we use (e.g., ClickHouse, Kubernetes, Docker, Redash, FastAPI).
Dr. Juan Orduz
This session presents a Bayesian approach to modeling cohort-level retention rates and revenue over time. We at Wolt use Bayesian additive regression trees to model the retention component, which we couple with a linear model for the revenue component. This method is flexible enough to allow adding covariates to both model components. The Bayesian model allows us to quantify the uncertainty in the estimation, understand the effect of the covariates and forecast future revenue and retention rates. The source code is available on GitHub, and for the presentation we will use synthetic data.
Dr. Michele Dallachiesa
This talk discusses the challenges faced by the UK Local Planning Authorities (LPAs) in managing written representations from the community for their Local Plans. The volume of information can be overwhelming, making it difficult for LPAs and the Planning Inspectorate to review and assess plans. The talk presents a case study that explores the potential of Large Language Models (LLMs) to streamline the analysis of representations, with significantly reduced processing time and improved accuracy.
70% of AI projects fail. Why? Could it be due to incomprehensible solutions looking for problems? Why not leverage the 5 Ps of marketing framework to address this challenge and learn how to market data & AI solutions? This keynote will show how this concept is being applied at a pharma company to market data products, addressing the product, price, place, promotion, and people … all of those elements essential for successful use of data in any organization.
Predictive modelers love building lots of models and then selecting the best one to deliver to the stakeholder. Popular metrics for model assessment and selection include R^2 and classification accuracy. Most often, however, the business doesn’t care about these metrics, and analysts may fail to deploy the models that are best for the business. In this talk, a methodology for finding business-centric metrics will be explored so that the effectiveness of the models is significantly improved.
Dr. Majid Mortazavi
Production processes are often prone to failures and defects due to heavy manual work, which leads to increased production costs and delays. Here, I demonstrate an end-to-end automatic visual inspection system based on state-of-the-art object detection algorithms. It verifies the correct assembly of through-hole technology components on printed circuit boards and gives live feedback in real time to avoid slippage of defective products. I also share invaluable lessons learned for building a successful end-to-end product.
Dr. Torge Schmidt
Personalization is a crucial principle in content moderation and information selection for many websites, especially in e-commerce where showing relevant products to customers is key to providing a valuable service. We will provide a practical guide to build a cloud-based near real-time recommender service. Attendees will come away with a comprehensive understanding of the business considerations and technical challenges involved in building a production-ready recommender system on AWS.
Dialysis machines and other MedTech equipment need to operate continuously to guarantee that patients receive proper treatment. We developed a predictive maintenance solution for the dialysis machines of B. Braun. I will discuss how we went from proof of concept to global rollout in a big data cloud environment, what the challenges were from a feature engineering and modelling perspective and how we utilized a broad arsenal of techniques from supervised and unsupervised machine learning.
Text classification is a well-known use case. But how do you approach this classification task if the text is in multiple different languages? In this session, Katharina presents different options for solving multi-language classification problems, such as language embeddings as well as different translation models and APIs, and discusses the challenges of using them in practice. Finally, she presents a new approach that works especially well on short texts and is now used to classify banking transactions at ING.
Prof. Dr. Peter Gentsch
In this Keynote Session we delve into the powerful impact of AI technologies like ChatGPT on businesses. Understand how they are rewriting the rules, instigating innovation and causing industrial disruption. We’ll unpack the applications of generative AI, from overhauling customer support to personalizing marketing and driving data-informed decision-making. Join us as we demystify the transformative power of AI and its potential to redefine your business trajectory.
Prof. Dr. Wil van der Aalst
The keynote introduces Object-Centric Process Mining (OCPM), which can be seen as a major breakthrough in process mining. Traditional approaches to process modeling and process analysis tend to focus on one type of object, and each event refers to precisely one such object. OCPM takes a more holistic and comprehensive approach to process analysis and improvement by considering multiple object types and events that involve any number of objects. The keynote presents the basic concepts and the need for object-centric process mining techniques using many examples. OCPM is rapidly being adopted in commercial systems, showing its practical relevance. Process mining experts expect that OCPM will become the “normal” way of doing process mining and will therefore be of value to any organization that wants to improve its processes.
The talk presents how ING has increased its productivity in delivering propensity-to-buy models for direct marketing campaigns by implementing an AutoML pipeline that utilizes the existing infrastructure. The implementation of clear standards and a highly automated model development process with state-of-the-art features (SHAP, Bayesian parameter tuning, etc.) has decreased the time needed from development to deployment to less than one month, while model updates only need a few hours.
Dr. Martin Dlask
Making accurate recommendations is a challenge, since predictions often lack novelty, are repetitive or become irrelevant to the users. For a large mobile game, it is crucial to enhance the relevance of displayed content. We will present the main challenges and their solutions when building a recommender system for millions of players. We discuss the iterative approach to model development, the strategy of testing competing models, and methods that can reduce the bias of the predictions.
The availability of high-quality video content representations is a key success factor for a variety of our business-critical use cases. Traditional approaches of using content metadata are error prone, difficult to maintain and usually don’t capture the necessary details to distinguish one video from another. In this talk we’d like to show how we at ProSiebenSat1 generate video embeddings using open-source approaches. We will also dive into our tech stack and different ways of evaluating video embeddings.
Dr. Julian Wagner
On-demand shuttle services offer an innovative and sustainable transportation solution, but routing requires near real-time decision-making under various constraints. We present a reinforcement learning approach that learns to fulfill transportation requests in reduced waiting time by integrating predictive demand into positioning of the shuttles. The method was used by the Deutsche Bahn to route autonomous on-demand shuttles in a German town and demonstrates the potential to reduce travel time and traffic and to improve accessibility for impaired passengers.
Dr. Frank Eichinger
DATEV eG processes 15 million payslips of German employees every month. We present a privacy-preserving way to train a machine-learning model on this sensitive data and a product to assess whether the salary of current or prospective employees is in line with the market. Besides data engineering and machine learning, this talk focusses on anonymisation, namely differential privacy in combination with privacy amplification, and highlights what we learned as an organisation about building data-driven products.
Dr. Rob Pasternak
Large language models have taken the world by storm, but what if you’d like to use one with your internal text data? LLMs aren’t trained on your data, and finetuning a bespoke LLM is daunting. However, retrieval-augmented generation (RAG) combines LLMs with document search to create powerful generative NLP systems for your text data. In this talk we will discuss how to build effective RAG pipelines, as well as how to address potential concerns like privacy and hallucination effects.
Dr. Christoph Best
Targeting the right audience is central to successful online marketing. However, traditional audience targeting (e.g. age and gender) is severely limited. Modern audience targeting incorporates and predicts real user behavior, and uses machine learning to create audiences that are truly relevant for online advertising. I’ll discuss the data sources and machine-learning methods to construct relevant audiences for online advertisers, and the tools to explore and visualize such audiences.
Dr. Alexander Khachikyan
At PAYBACK, a global loyalty program, a crucial aspect of upskilling initiatives within the ML/DL domain is the empowerment of citizen data scientists: subject matter experts who are not typically rooted in conventional data science. Recognizing their unique position to bridge the gap between data analytics and real-world business cases, PAYBACK is leveraging AutoML tools to enhance their understanding of the ML lifecycle. Here, we present the approach adopted for training citizen data scientists.
Dr. Alexey Fofonov
Industrial paper manufacturing is a highly technological process that requires continuous effort to prevent interruptions. Early detection and proper identification of problems are crucial for timely servicing, optimal personnel allocation, and avoiding breakdowns. Mondi Group and d-fine developed a smart condition monitoring solution that uses the existing technical infrastructure as a maintenance assistant.
Dr. Tim Tolkmitt
In an increasingly competitive Subscription-Video-on-Demand (SVoD) market, customer loyalty is key. RTL+ uses the Customer Lifetime Value project to track KPIs and visualize survival curves for different customer cohorts based on usage and marketing channels, culminating in a model to predict remaining customer lifetime. The project has already aided RTL+ in making strategic budget decisions.
Despite the high degree of automation in industrial control systems, human operators continue to play a critical role in ensuring uptime, quality, and safety. We present five different ways in which AI technologies can be applied and used together in a workflow to support operators in preventing and resolving issues with ease, speed, and confidence. This showcases how AI is paving the way for remote and autonomous operations that mitigate against human-induced error, protecting lives and assets.
The dependence of renewable energy on fluctuating weather conditions may prevent a fast energy transition. Since 2013, Edison has developed custom algorithms for wind and solar forecasting, moving from proprietary tools to open-source libraries and from on-premises architecture to MLOps cloud infrastructure. This case study is aimed at data scientists and MLOps engineers who need to manage a large number of models. You will learn how ML/AI can boost renewable energy penetration in the market.
Data Mesh has grown beyond theory and buzzword. The initial concept, as introduced by Zhamak Dehghani in 2019, is described as a socio-technological approach. It has implications for how data is organized and worked with. This keynote will reflect on a Data Mesh implementation journey, on pitfalls and how they can be avoided. In addition to the organizational transformation, we will describe in particular the foundational data platform development, because it’s one of the key success factors.