Fast Data Science Ltd
Apr 06, 2023

Finding molecules and proteins in scientific literature

4-6 months
Germany, Ingelheim am Rhein
Service categories
Service Lines
Big Data
Domain focus
Advertising & Marketing
Programming language
Big Data
Data Mining
Data Science
Marketing Analytics
Text Analytics


The client needed to parse scientific literature and identify occurrences of molecules and proteins. The client maintains a database of molecules that researchers around the world order for experimentation, and wanted to match these molecules to the ones mentioned in the scientific literature.
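
To make the matching task concrete, here is a minimal sketch of a dictionary-based lookup in Python. It is an illustration only, not the approach used in the project: the catalogue contents, the molecule IDs and the function names are all hypothetical, and real molecule names vary far more than a short synonym list can capture.

```python
import re

# Hypothetical catalogue: molecule ID -> known names and synonyms.
# These entries are illustrative and do not come from the client's database.
CATALOGUE = {
    "MOL-001": ["aspirin", "acetylsalicylic acid"],
    "MOL-002": ["caffeine", "1,3,7-trimethylxanthine"],
}


def normalise(text):
    """Lowercase the text and collapse runs of whitespace into single spaces."""
    return re.sub(r"\s+", " ", text.lower()).strip()


def build_patterns(catalogue):
    """Compile one regular expression per molecule covering all its names."""
    patterns = {}
    for mol_id, names in catalogue.items():
        # Try longer names first so a long synonym is preferred over a short one.
        alternatives = sorted(
            (re.escape(normalise(name)) for name in names), key=len, reverse=True
        )
        # The lookarounds stop a name matching inside a longer word or
        # hyphenated term, e.g. "aspirin" inside "aspirin-like".
        patterns[mol_id] = re.compile(
            r"(?<![\w-])(?:" + "|".join(alternatives) + r")(?![\w-])"
        )
    return patterns


def find_mentions(text, patterns):
    """Return {molecule_id: [matched names]} for every molecule found in text."""
    clean = normalise(text)
    hits = {}
    for mol_id, pattern in patterns.items():
        found = pattern.findall(clean)
        if found:
            hits[mol_id] = found
    return hits
```

Calling `find_mentions(abstract, build_patterns(CATALOGUE))` on the text of an abstract returns the catalogue IDs whose names appear in it. Exact lookup of this kind misses abbreviations, misspellings and unseen name variants, which is one reason a model trained on annotated examples can be the better fit.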


We trained a machine learning model on annotated examples so that it could flag incoming publications that mention molecules of interest to our client. This automated a task that would traditionally require a team of human experts to manually review every publication for potential candidates. The result is a more efficient and effective way to identify drugs that build on the client's research.
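
The general idea of learning from annotated examples can be sketched with a small text classifier. The sketch below is a multinomial naive Bayes classifier written with the Python standard library only; it is a stand-in chosen for brevity, not the model used in the project, and the class name, labels and training sentences are invented for illustration.

```python
import math
import re
from collections import Counter


def tokenise(text):
    """Split text into lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing (illustrative only)."""

    def fit(self, texts, labels):
        self.word_counts = {}               # label -> Counter of token counts
        self.doc_counts = Counter(labels)   # label -> number of training texts
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenise(text)
            self.word_counts.setdefault(label, Counter()).update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        tokens = [tok for tok in tokenise(text) if tok in self.vocab]
        scores = {}
        for label, counts in self.word_counts.items():
            # Log prior plus smoothed log likelihood of each known token.
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(counts.values()) + len(self.vocab)
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)
            scores[label] = score
        return max(scores, key=scores.get)


# Toy annotated examples, invented for this sketch.
TRAIN_TEXTS = [
    "The compound inhibited kinase activity in vitro",
    "Cells were treated with the inhibitor at 10 uM",
    "We thank the funding agency for support",
    "The conference will be held in June",
]
TRAIN_LABELS = ["relevant", "relevant", "irrelevant", "irrelevant"]

model = NaiveBayes().fit(TRAIN_TEXTS, TRAIN_LABELS)
```

With this toy data, `model.predict("Tumour cells were treated with the kinase inhibitor")` returns `"relevant"`. A production system would need far more annotated text and would typically label individual mentions rather than whole sentences.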


Our client was able to get a clear view of how their molecules were being used, and which research results were stemming from molecules whose formulae had been publicised. Researchers who had used our client's molecules without attribution could be contacted to ensure that credit was given to our client for providing the molecules.