Meet Neelay Shah, Machine Learning Engineer on our Data Processing team!

Neelay and the Data Processing team develop pipelines to process data at scale for use cases such as training foundation models. They ensure that data and models can be processed in a distributed fashion, efficiently spreading computational work across multiple machines to allow for scaling when large data sets are involved. They also partner closely with the Data Science team, who use the infrastructure developed by the Machine Learning Engineers to run experiments for client projects.

Ryan Sargent
April 29, 2025

What are your main responsibilities? What does a typical week look like for you?

Currently I am working on a team in the backend department that is responsible for data processing. The team's primary goal is to take raw data, which we get from partners or clients, and transform it into formats that can be used to train machine learning models.

Up until the end of last year, I was on the Data Science Engineering team. The main responsibility of that team is to develop pipelines to perform inference with machine learning models - that is, running trained models to make accurate predictions on new data quickly and at scale. Towards the end of the year, I was part of a task force assembled for the Atlas foundation model project. As part of this, we developed pipelines to process data at large scale and to run experiments for developing foundation models. Once the project wrapped, I transferred to my current team to continue working with my team lead.

On a typical day, I spend most of my time programming. There are a number of considerations to balance while doing so: the code needs to be fast and robust while remaining cost effective, so we run benchmarking experiments to make sure it meets our requirements.

What is a recent challenge that your team overcame that you’re particularly proud of?

One of the things I worked on last year was optimizing the previous iteration of our foundation model (RudolfV) so that we could continue using our inference workflows without increasing runtime - and, since longer runtimes directly translate into higher pipeline costs, without increasing costs either.

This was challenging because the foundation model was much larger than all the other models we had been using before, and therefore much slower as well. Because of its size, I think it started out around 2-3x slower than the regular models. This meant that if an experiment normally finished in four hours, it would now take eight. And that was obviously something we were not happy with.

To solve this, we went down the technical route and explored existing open source tools for speeding up model inference. We ended up introducing a new technical component, NVIDIA's TensorRT. Using it, we optimized the foundation model until it ran just as fast as the non-foundation models used to, which meant we could adopt the foundation model without any significant increase in cost.
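At its core, this kind of before/after comparison comes down to timing the same workload with both model variants. The sketch below is purely illustrative - the "models" are trivial stand-in functions, not the actual pipeline or TensorRT integration - but it shows the shape of a minimal timing harness:

```python
import time

def benchmark(fn, batches, warmup=2, runs=10):
    """Return mean seconds per batch for fn applied to each batch."""
    for _ in range(warmup):              # warm-up passes (caches, JIT) are discarded
        for batch in batches:
            fn(batch)
    start = time.perf_counter()
    for _ in range(runs):
        for batch in batches:
            fn(batch)
    return (time.perf_counter() - start) / (runs * len(batches))

# Hypothetical stand-ins for a baseline and an optimized model:
baseline_model = lambda batch: [x * 2.0 for x in batch]
optimized_model = lambda batch: [x * 2.0 for x in batch]

batches = [[float(i) for i in range(256)] for _ in range(4)]
t_base = benchmark(baseline_model, batches)
t_opt = benchmark(optimized_model, batches)
print(f"speedup: {t_base / t_opt:.2f}x")
```

In a real setting the two functions would wrap the original and the TensorRT-compiled model, and the batches would be actual input tensors.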

What is an impactful skill that you developed since joining the team that you did not expect?

When I joined as a junior engineer, I expected to be developing a lot of pipelines and mostly doing technical work. But one thing I was excited to learn through last year's model optimization work was how to think about business costs.

One of the biggest aims of optimizing a model is to ensure that runtime doesn't increase. Increased runtime leads to higher operational costs, delayed decision making, limited scalability, and ultimately a worse experience for our clients. Finding the balance between model performance and computational efficiency is therefore also an exercise in improving the client experience.

While on the Data Science Engineering team, I was involved in benchmarking work to see how much running the foundation model would cost compared to the non-foundation models we had previously developed. I learned how to use cloud cost estimation tools and how to conduct small benchmarking experiments to estimate the runtime costs of models. It has been very impactful to see how these estimates help us make key decisions about which compute infrastructure to use when running our models, so that we can optimize runtime costs.
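The back-of-the-envelope version of such an estimate is simply measured runtime multiplied by an instance's hourly price. The numbers below (30 seconds per item, 10,000 items, a $3.50/hour instance) are made-up placeholders, not real cloud rates or actual pipeline figures:

```python
def estimated_cost(seconds_per_item, num_items, instances, hourly_rate):
    """Rough runtime cost: total compute hours times the hourly instance price.
    Assumes work is spread evenly across instances."""
    total_hours = seconds_per_item * num_items / 3600
    wall_clock_hours = total_hours / instances      # more instances -> shorter wall clock
    return wall_clock_hours * instances * hourly_rate  # ...but the same total cost

# Hypothetical numbers: 30 s per item, 10,000 items, $3.50/hour per instance.
print(f"${estimated_cost(30, 10_000, instances=8, hourly_rate=3.50):.2f}")  # → $291.67
```

A useful property this makes visible: adding instances shortens wall-clock time but (ignoring overheads) leaves total cost unchanged, so the per-item runtime is what optimization has to attack.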

What is your favorite part about working at Aignostics?

First and foremost, it's the feeling I have of working towards a good cause. I think there are plenty of jobs all around the world where you can do interesting technical work, but this feeling I have at the end of the day knowing that what I worked on will contribute towards a good cause, towards this overarching goal of providing better cancer detection solutions, is the biggest reason I like this job.

Of course in addition, I think we have a very good company culture. We are very diverse, we are very international, and I love that. I get to learn from people around me every day - people from all different walks of life, with more experience, people from different cultures, and I think that's amazing.

How would you describe your team in just three emojis? 

📈🧠🌐

As a team, we are always looking to train machine learning models with strong predictive performance, and to keep improving and driving better results. Machine learning also draws inspiration from the human brain. And one of the key design concepts of our pipelines, not just in the Data Processing team but in other teams across the company, is distributed computing: when you have lots of data and want to scale any sort of inference or training, you need multiple machines working at the same time. So any workflow we run has to be distributable across interconnected machines.
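The split/scatter/gather pattern behind this can be sketched on a single machine. In this hypothetical example the "workers" are threads standing in for the separate machines a real cluster framework would use, and the per-chunk function is a trivial placeholder for actual processing work:

```python
from concurrent.futures import ThreadPoolExecutor

def process_chunk(chunk):
    """Stand-in for real per-chunk work (e.g. preprocessing or inference)."""
    return sum(x * x for x in chunk)

def distributed_map(data, num_workers=4, chunk_size=1000):
    """Split the data into chunks and process them in parallel workers.
    Multi-machine frameworks generalise this same pattern across a cluster."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        return list(pool.map(process_chunk, chunks))

results = distributed_map(list(range(10_000)))
print(sum(results) == sum(x * x for x in range(10_000)))  # parallel result matches serial
```

The key requirement the paragraph above describes is exactly what makes this work: the computation must decompose into independent chunks whose partial results can be recombined.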
