Dr. Baishali Dutta

Data Scientist and Ex-CERN Physicist


Hey guys!!! I am a physicist by education but a data scientist by passion.

You won't believe me, but I always wanted to work on the Large Hadron Collider (LHC) at CERN, though I never thought that dream would come true. Working there had a significant influence on my passion for data science. I was involved in end-to-end data science, from data acquisition to applying machine learning algorithms and iterating until I found a model suitable to work with. Six long years of research work at CERN taught me to think in terms of data.

I am also very passionate about writing clean code and applying design patterns in the software I work on. This has always inspired me to give my best to any software I am involved in.

Beyond data science, I love travelling, and whenever I can, I document my journeys on my YouTube channel.

Lastly, and this may sound a bit unusual, I am a passionate mythology fan, and in my leisure time I love reading about mythologies.

Data Science Experience

Below are the data science projects I have been involved in since my master's degree:

  • Keyword Extraction (NLP): Developing a solution that extracts the most important keywords and phrases from large-scale reviews, giving a quick and easy overview of customers’ feedback on a product or product category without reading the reviews one by one. It helps companies analyse their product and brand status and see where they excel or fall short compared with their competitors. The technologies involved are the KeyBERT model with BERT embeddings, spaCy, SQL, AWS and Docker.

  • COVID-19 Impact on Beauty Industry: Stock analysis of several consumer beauty companies and the impact of the COVID-19 pandemic on their stock data. The analysis primarily considers the stocks of M.A.C, Estée Lauder Companies, ULTA Beauty Inc., E.L.F Beauty Inc. and Revlon Inc. ‣ GitHub

  • EPAR Sentiment Analysis (NLP): Compared different Natural Language Processing (NLP) algorithms for classifying the feedback on clinical efficacy in European Public Assessment Reports (EPARs) published by the European Medicines Agency (EMA) ‣ GitHub

  • Pneumonia Disease Detection: A machine learning model that detects pneumonia by inspecting chest X-ray images using a Convolutional Neural Network (CNN), TensorFlow and Keras ‣ GitHub ‣ Try it out

  • Comments Toxicity Detection (NLP): A machine learning model that detects the toxicity of comments from social networks using Natural Language Processing (NLP), a Bidirectional Long Short-Term Memory (LSTM) Recurrent Neural Network (RNN), TensorFlow and Keras ‣ GitHub ‣ Try it out

  • Dark Matter Search (Postdoctoral Research at CERN): Searched for Dark Matter in hundreds of petabytes of data collected at the biggest CERN detector, using machine learning algorithms, primarily boosted decision trees and neural networks, for feature extraction and hypothesis testing to interpret the data ‣ Publication

  • Doctoral Thesis at CERN: Performed a precision measurement using the massive CERN dataset. Applied advanced statistical methods such as likelihood profiling to carefully select the signal of interest ‣ Publication

  • CERN Authorship Qualification Project: Successfully classified CERN detector signals as electrons with an efficiency of 95-99.8% by employing statistical and machine learning methods such as pattern recognition, clustering, multivariate likelihood-based methods and boosted decision trees ‣ Publication

  • Master's Thesis on Fermilab, USA Project: Studied the expected performance and potential of a future Fermilab experiment using simulations. Used data science techniques, primarily chi-square minimization, to determine the sensitivity of the experiment ‣ Project Report

  • Summer Internship at Bhabha Atomic Research Centre: Performed data acquisition and cleaning on data collected by the scintillator detector from various sources. Applied statistical regression to interpret and improve the detector performance ‣ Project Report


If you would like to contact me, here is how you can: