100 AI/ML Engineer Interview Questions Part B

Nov 20

Part B — Technical AI/ML Questions (70) | MalikFarooq.com Style

Meet ALI - Your AI/ML Interview Guide

Background: ALI is a Computer Science student at IIT Delhi with a strong passion for AI/ML. He completed his internship at TCS AI Lab where he worked on cutting-edge machine learning projects. His key project involves Stock Price Prediction using LSTM networks, and his favorite algorithm is Random Forest due to its interpretability and robust performance. ALI will guide you through each question with practical examples from his academic and industry experience.

ML Fundamentals

Question 1

What is the difference between supervised and unsupervised learning?

Explanation

Supervised learning uses labeled data to train models that can make predictions on new data. In ALI's stock price prediction project at TCS AI Lab, for example, he used historical price data (features) with known future prices (labels) to train his LSTM model. Unsupervised learning finds patterns in data without labels, such as clustering customer segments or finding hidden topics in documents.

Memory Trick

Supervised = Teacher present, Unsupervised = Self-discovery

How ALI Answers

"During my internship at TCS AI Lab, I worked on supervised learning for stock prediction, where historical prices were the features and the known later prices were the labels. But when I analyzed trading patterns without knowing the outcomes, that was unsupervised learning - like using clustering to find similar trading behaviors."

Question 2

Explain bias-variance tradeoff with an example.

Explanation

Bias is error from oversimplifying the model (underfitting), while variance is error from being too sensitive to training data (overfitting). ALI's Random Forest algorithm balances this beautifully - individual trees have high variance, but averaging reduces variance while maintaining low bias. In his stock prediction project, a simple linear model had high bias, while a complex neural network had high variance.

Memory Trick

Bias = Bullseye missed consistently, Variance = Arrows scattered around

How ALI Answers

"At IIT Delhi, I learned this through my Random Forest projects. When I used just one decision tree, predictions varied wildly (high variance). When I used linear regression for stock prices, it consistently missed the mark (high bias). Random Forest gave me the sweet spot by averaging multiple trees."

Question 3

What is cross-validation and why is it important?

Explanation

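Cross-validation splits data into multiple folds, training on some and testing on others, then averages the results. It does not prevent overfitting by itself, but it exposes overfitting and gives a more robust performance estimate than a single train/test split. For time series, the folds should respect time order (train on earlier periods, test on later ones) so that no future information leaks into training. ALI used 5-fold cross-validation in his stock prediction project at TCS to ensure his LSTM model wasn't just memorizing specific time periods but learning actual market patterns.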
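
For time-ordered data such as stock prices, one common variant is walk-forward validation, where every test block comes strictly after the data used for training. The sketch below is a minimal illustration of that idea; the function name `walk_forward_splits` and the parameters `n_folds` and `min_train` are illustrative assumptions, not something taken from ALI's project.

```python
def walk_forward_splits(n_samples, n_folds, min_train):
    """Yield (train_indices, test_indices) pairs in time order.

    Each test block starts right after the end of its training window,
    so the model is never evaluated on data older than what it trained on.
    """
    test_size = (n_samples - min_train) // n_folds
    for fold in range(n_folds):
        train_end = min_train + fold * test_size
        test_end = train_end + test_size
        yield list(range(train_end)), list(range(train_end, test_end))


# Example: 10 time steps, 3 folds, at least 4 steps of training data.
splits = list(walk_forward_splits(n_samples=10, n_folds=3, min_train=4))
for train_idx, test_idx in splits:
    print("train:", train_idx, "test:", test_idx)
```

The training window grows with each fold, which mimics retraining a model as more history becomes available.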

Memory Trick

Cross-validation = Multiple dress rehearsals before the main show

How ALI Answers

"In my TCS internship, I initially trained my LSTM on 2019 data and tested on 2020. It failed during COVID! Cross-validation taught me to test across multiple time periods, making my model more robust for real-world deployment."

Question 4

Explain overfitting and how to prevent it.

Explanation

Overfitting occurs when a model learns training data too well, including noise, leading to poor generalization. Prevention methods include regularization (L1/L2), dropout, and early stopping, with cross-validation used to detect the problem. ALI encountered this when his LSTM memorized specific stock patterns from 2019 but failed on 2020 data during his TCS project.

Memory Trick

Overfitting = Student who memorizes answers but can't solve new problems

How ALI Answers

"My first LSTM model at TCS had 99% training accuracy but 60% validation accuracy - classic overfitting! I fixed it using dropout layers, early stopping, and regularization. My IIT professors always said: 'A model that's too good on training data is usually too bad on real data.'"

Question 5

What is regularization and its types?

Explanation

Regularization adds a penalty term to the loss function to discourage overly complex models and reduce overfitting. L1 (Lasso) adds the sum of absolute weights, promoting sparsity. L2 (Ridge) adds the sum of squared weights, shrinking coefficients toward zero. ALI used L1 regularization for feature selection, and L2 weight regularization together with dropout in his LSTM networks at TCS AI Lab.
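
The two penalties can be written out in a few lines. This is a minimal sketch in plain Python; the helper names and the strength parameter `lam` are assumptions made for illustration. The `soft_threshold` function is the standard shrinkage step associated with an L1 penalty, and it shows why L1 can push small weights to exactly zero.

```python
import math


def l1_penalty(weights, lam):
    """Lasso-style penalty: lam times the sum of absolute weights."""
    return lam * sum(abs(w) for w in weights)


def l2_penalty(weights, lam):
    """Ridge-style penalty: lam times the sum of squared weights."""
    return lam * sum(w * w for w in weights)


def soft_threshold(w, t):
    """Shrink w toward zero by t; weights smaller than t become exactly 0."""
    return math.copysign(max(abs(w) - t, 0.0), w)


weights = [0.5, -2.0, 0.0]
print(l1_penalty(weights, lam=0.1))   # penalty grows linearly with |w|
print(l2_penalty(weights, lam=0.1))   # penalty grows quadratically with w
print([soft_threshold(w, 0.1) for w in [0.05, -2.0]])
```

The penalty value is added to the data loss during training, so larger `lam` means a stronger pull toward small (L2) or sparse (L1) weights.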

Memory Trick

L1 = Lasso selects features, L2 = Ridge reduces weights

How ALI Answers

"During my stock prediction project, I had 50+ features initially. L1 regularization helped me identify the 15 most important ones, while L2 regularization in my LSTM kept the weights from growing too large. It's like having a coach who tells you to focus on key skills rather than trying everything."

Algorithms

Question 6

Why do you prefer Random Forest? Explain its working.

Explanation

Random Forest creates multiple decision trees using bootstrap sampling and random feature selection, then averages their predictions (or takes a majority vote for classification). It reduces overfitting through ensemble learning and provides feature importance scores. ALI loves it because its feature importances make it easy to inspect, it needs little preprocessing, and it performed excellently in his TCS projects for risk assessment alongside his LSTM model.

Memory Trick

Random Forest = Wisdom of crowds + Random sampling

How ALI Answers

"Random Forest is my go-to algorithm because it combines simplicity with power. At TCS, while my LSTM predicted stock prices, Random Forest helped identify which features mattered most. It's like having multiple experts give opinions and taking the average - usually more reliable than any single expert."

Question 7

Compare SVM with Random Forest.

Explanation

SVM finds optimal hyperplane for classification/regression using kernel trick, while Random Forest uses ensemble of decision trees. SVM works well with high dimensions but needs feature scaling. Random Forest handles mixed data types and provides feature importance. ALI used SVM for text classification in his IIT projects but prefers Random Forest for structured financial data.

Memory Trick

SVM = Single optimal boundary, Random Forest = Multiple simple boundaries

How ALI Answers

"In my IIT coursework, I used SVM for sentiment analysis of financial news, but for my TCS stock prediction features, Random Forest was better. SVM needed careful preprocessing and parameter tuning, while Random Forest worked out-of-the-box with mixed numerical and categorical features."

Question 8

Explain gradient boosting and its variants.

Explanation

Gradient boosting builds models sequentially, each correcting errors of previous ones. Variants include XGBoost (optimized implementation), LightGBM (leaf-wise growth), and CatBoost (handles categorical features). ALI used XGBoost alongside his Random Forest for ensemble predictions in his TCS stock prediction project.

Memory Trick

Gradient Boosting = Learning from mistakes sequentially

How ALI Answers

"At TCS, I combined Random Forest with XGBoost for stock predictions. While Random Forest gave stable predictions, XGBoost fine-tuned the errors. It's like having a student (Random Forest) and a tutor (XGBoost) who corrects the student's mistakes iteratively."

Question 9

What is K-means clustering and its limitations?

Explanation

K-means groups data into K clusters by minimizing within-cluster sum of squares. Limitations include: need to specify K, assumes spherical clusters, sensitive to initialization and outliers. ALI used K-means during his IIT projects to segment trading strategies but found it struggled with non-spherical patterns in his TCS financial data.
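
The assign-then-update loop behind K-means is short enough to write out. The sketch below works on one-dimensional points to stay readable; the function name `kmeans_1d`, the fixed iteration count, and the hand-picked starting centroids are all assumptions for illustration. Because the result depends on the starting centroids, it also hints at the initialization sensitivity mentioned above.

```python
def kmeans_1d(points, centroids, n_iters=10):
    """Run Lloyd's algorithm on 1-D points from the given starting centroids."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(n_iters):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: (p - centroids[i]) ** 2)
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters


points = [1.0, 1.2, 0.8, 9.0, 9.5, 8.5]
centroids, clusters = kmeans_1d(points, centroids=[0.0, 5.0])
print(centroids)
print(clusters)
```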

Memory Trick

K-means = K circles drawn around similar points

How ALI Answers

"I used K-means in my IIT project to group similar stocks, but it assumed all groups were circular. Real financial data has complex shapes, so I later used DBSCAN for density-based clustering in my TCS internship to better capture irregular trading patterns."

Question 10

Explain the concept of ensemble methods.

Explanation

Ensemble methods combine multiple models to create a stronger predictor. Types include bagging (Random Forest), boosting (XGBoost), and stacking (meta-learning). ALI's TCS project used ensemble of LSTM, Random Forest, and XGBoost, with a meta-learner combining their predictions for better stock price forecasting.

Memory Trick

Ensemble = Orchestra of algorithms playing in harmony

How ALI Answers

"My TCS project taught me that no single algorithm is perfect. I combined my LSTM (for time patterns), Random Forest (for feature interactions), and XGBoost (for error correction), with a simple meta-learner stacked on top of their predictions. Like having multiple experts - each good at different aspects of the problem."

Deep Learning

Question 11

What is backpropagation and how does it work?

Explanation

Backpropagation calculates gradients by propagating error backwards through the network using chain rule. It updates weights to minimize loss function. ALI implemented this from scratch during his IIT coursework and used it in his LSTM stock prediction model at TCS, where gradients flowed back through time steps to learn temporal patterns.
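
The chain rule can be checked by hand on a tiny network. The sketch below assumes a one-input model `y = w2 * tanh(w1 * x)` with squared-error loss (a toy setup chosen for illustration, not ALI's LSTM) and compares the backpropagated gradients with a finite-difference estimate.

```python
import math


def forward(x, w1, w2):
    """Forward pass: hidden activation h and output y."""
    h = math.tanh(w1 * x)
    return h, w2 * h


def loss(x, target, w1, w2):
    _, y = forward(x, w1, w2)
    return (y - target) ** 2


def backward(x, target, w1, w2):
    """Backward pass: apply the chain rule from the loss back to each weight."""
    h, y = forward(x, w1, w2)
    dloss_dy = 2.0 * (y - target)      # d(loss)/dy
    grad_w2 = dloss_dy * h             # y = w2 * h
    dloss_dh = dloss_dy * w2           # error passed back to the hidden unit
    dh_dz = 1.0 - h * h                # derivative of tanh
    grad_w1 = dloss_dh * dh_dz * x     # z = w1 * x
    return grad_w1, grad_w2


def numeric_grad(f, value, eps=1e-6):
    """Central finite difference, used only to sanity-check the gradients."""
    return (f(value + eps) - f(value - eps)) / (2 * eps)


x, target, w1, w2 = 0.5, 1.0, 0.8, -1.2
grad_w1, grad_w2 = backward(x, target, w1, w2)
check_w1 = numeric_grad(lambda w: loss(x, target, w, w2), w1)
check_w2 = numeric_grad(lambda w: loss(x, target, w1, w), w2)
print(grad_w1, check_w1)
print(grad_w2, check_w2)
```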

Memory Trick

Backpropagation = Error flowing backwards like water finding its source

How ALI Answers

"In my LSTM project at TCS, backpropagation was crucial for learning stock patterns. When the model predicted wrong prices, the error traveled backwards through all time steps, adjusting weights. It's like learning from mistakes - the error tells each layer exactly how much it contributed to the wrong answer."

Question 12

What are activation functions and why are they needed?

Explanation

Activation functions introduce non-linearity, enabling networks to learn complex patterns. Common ones: ReLU (cheap to compute, mitigates vanishing gradients), Sigmoid (outputs 0-1), Tanh (outputs -1 to 1). ALI used ReLU in the hidden layers of his model and sigmoid for the final probability that the stock price would rise at TCS.
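
A short sketch of the three functions, plus a check of why they are needed: two stacked linear layers without an activation reduce to a single linear layer. The weights used here are arbitrary illustrative values.

```python
import math


def relu(x):
    return max(0.0, x)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def tanh(x):
    return math.tanh(x)


def two_linear_layers(x, w1, b1, w2, b2):
    """Two layers with no activation in between."""
    return w2 * (w1 * x + b1) + b2


def collapsed_layer(x, w1, b1, w2, b2):
    """The single linear layer that computes exactly the same function."""
    return (w2 * w1) * x + (w2 * b1 + b2)


for value in (-2.0, 0.0, 2.0):
    print(value, relu(value), sigmoid(value), tanh(value))

stacked = two_linear_layers(3.0, w1=2.0, b1=1.0, w2=-0.5, b2=4.0)
single = collapsed_layer(3.0, w1=2.0, b1=1.0, w2=-0.5, b2=4.0)
print(stacked, single)
```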

Memory Trick

Activation functions = Adding curves to straight lines

How ALI Answers

"Without activation functions, my network would collapse into a plain linear model! I used ReLU for the dense hidden layers because it's fast and helps against vanishing gradients, while the LSTM itself uses sigmoid for its gates and tanh for the candidate cell state. It's like adding decision-making ability to each neuron - not just passing information, but processing it."

Question 13

Explain vanishing and exploding gradients problem.

Explanation

Vanishing gradients occur when gradients become too small in deep networks, preventing early layers from learning. Exploding gradients happen when gradients become too large. Solutions include gradient clipping, better initialization, and architectures like LSTM. ALI faced this in his stock prediction LSTM and solved it using gradient clipping and proper initialization.
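
Gradient clipping by norm, one of the fixes listed above, rescales the whole gradient vector when it gets too long while keeping its direction. A minimal sketch, with the function name and `max_norm` threshold chosen for illustration:

```python
import math


def clip_by_global_norm(grads, max_norm):
    """Rescale grads so their L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm:
        return list(grads)          # already small enough, leave unchanged
    scale = max_norm / norm
    return [g * scale for g in grads]


clipped = clip_by_global_norm([3.0, 4.0], max_norm=1.0)
untouched = clip_by_global_norm([0.3, 0.4], max_norm=1.0)
print(clipped, untouched)
```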

Memory Trick

Vanishing = Whisper getting fainter, Exploding = Shout getting louder

How ALI Answers

"My first LSTM at TCS had exploding gradients - losses jumped wildly! I fixed it with gradient clipping. Later, I faced vanishing gradients in deeper networks. LSTM's gating mechanism naturally helps with vanishing gradients, which is why it works well for long sequences in stock data."

Question 14

What is dropout and how does it prevent overfitting?

Explanation

Dropout randomly sets some neurons to zero during training, forcing the network to not rely on specific neurons. This creates an ensemble effect and prevents overfitting. ALI used 0.3 dropout in his LSTM layers at TCS to prevent the model from memorizing specific stock price patterns and improve generalization.
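
The sketch below shows the commonly used "inverted" form of dropout, which is an assumption about the variant: kept activations are scaled up during training so that nothing needs to change at inference time. The `rng` argument is a seeded `random.Random` instance, used only to make the example reproducible.

```python
import random


def dropout(activations, rate, training, rng):
    """Zero each activation with probability `rate` during training.

    Survivors are divided by the keep probability so the expected value of
    each activation stays the same. At inference the input passes through.
    """
    if not training or rate == 0.0:
        return list(activations)
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


rng = random.Random(0)
train_out = dropout([1.0] * 1000, rate=0.3, training=True, rng=rng)
eval_out = dropout([1.0, 2.0], rate=0.3, training=False, rng=rng)
print(sum(train_out) / len(train_out))  # stays close to 1.0 on average
print(eval_out)
```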

Memory Trick

Dropout = Randomly removing team members to make others stronger

How ALI Answers

"In my TCS LSTM, dropout was like training with some neurons blindfolded. It forced the network to learn robust patterns rather than memorizing specific sequences. I used 30% dropout between LSTM layers, which significantly improved performance on unseen stock data."

Question 15

Compare different optimizers (SGD, Adam, RMSprop).

Explanation

Plain SGD applies a single global learning rate to all parameters, Adam combines momentum with per-parameter adaptive learning rates, and RMSprop adapts the learning rate using a moving average of recent squared gradients. Adam is a common default because of its adaptive nature. ALI experimented with all three in his LSTM project, finding Adam worked best for stock prediction due to its ability to handle sparse gradients in financial data.

Memory Trick

SGD = Steady pace, Adam = Adaptive smart runner, RMSprop = Recent memory-based

How ALI Answers

"I tested all optimizers in my TCS LSTM project. SGD was too slow, RMSprop was better but Adam won - it adapted learning rates for each parameter. For stock data with different feature scales (prices vs volumes), Adam's parameter-wise adaptation was crucial for convergence."

CNNs

Question 16

Explain how CNNs work and their key components.

Explanation

CNNs use convolution layers to detect local patterns, pooling for dimensionality reduction, and fully connected layers for classification. Key components: filters/kernels, feature maps, pooling layers. ALI used CNNs during his IIT computer vision course and experimented with CNN-LSTM hybrid for analyzing stock chart patterns at TCS.

Memory Trick

CNN = Sliding window detectors finding patterns locally

How ALI Answers

"At IIT, I learned CNNs for image recognition, but at TCS I got creative - I converted stock price data into chart images and used CNNs to detect patterns like head-and-shoulders or triangles. Combined with my LSTM for time series, it created a powerful hybrid model."

Question 17

What are different types of pooling and their effects?

Explanation

Max pooling takes maximum value (preserves strong features), Average pooling takes average (smooths features), Global pooling reduces entire feature map to single value. ALI used max pooling in his CNN experiments for stock chart analysis as it preserved the most significant price movements while reducing dimensionality.
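
The difference between the pooling types is easy to see on a one-dimensional signal. This sketch uses non-overlapping windows (stride equal to the window size), which is an assumption made to keep the example short.

```python
def pool_1d(values, window, mode="max"):
    """Apply non-overlapping max or average pooling to a 1-D list."""
    pooled = []
    for start in range(0, len(values) - window + 1, window):
        chunk = values[start:start + window]
        pooled.append(max(chunk) if mode == "max" else sum(chunk) / window)
    return pooled


signal = [1.0, 9.0, 2.0, 2.0, 3.0, 5.0]
max_pooled = pool_1d(signal, window=2, mode="max")
avg_pooled = pool_1d(signal, window=2, mode="avg")
global_max = max(signal)  # global max pooling: one value for the whole map
print(max_pooled, avg_pooled, global_max)
```

The spike of 9.0 survives max pooling unchanged but is diluted to 5.0 by average pooling.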

Memory Trick

Max = Keep the strongest, Average = Smooth everything, Global = One value per map

How ALI Answers

"In my CNN experiments at TCS, I used max pooling because stock charts have sharp peaks and valleys that represent important price movements. Max pooling preserved these critical features while reducing computation, unlike average pooling which would smooth out important signals."

Question 18

Explain transfer learning in CNNs.

Explanation

Transfer learning uses pre-trained models as starting points, leveraging learned features from large datasets. Approaches include feature extraction (freeze the pre-trained layers and train only a new head) and fine-tuning (unfreeze and retrain some or all layers). ALI used transfer learning from ImageNet-trained models when experimenting with chart pattern recognition in his TCS project, adapting them for financial data visualization.

Memory Trick

Transfer learning = Standing on giants' shoulders

How ALI Answers

"Rather than training a CNN from scratch for stock chart patterns, I used a pre-trained ResNet from ImageNet and fine-tuned it. The low-level edge detectors were already perfect for chart lines, I just needed to adapt the higher layers for financial patterns. Saved weeks of training time!"

Question 19

What is the purpose of padding in CNNs?

Explanation

Padding adds pixels around input borders to control output size and preserve edge information. Valid padding uses no padding, Same padding maintains input size. ALI used same padding in his CNN experiments to ensure that edge patterns in stock charts (like breakout points at chart edges) weren't lost during convolution operations.
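
The effect of padding on output size follows the standard formula out = floor((n + 2p - k) / s) + 1 for input size n, kernel size k, padding p, and stride s. The helper names below are illustrative; `same_padding` covers only the stride-1, odd-kernel case.

```python
def conv_output_size(n, kernel, padding=0, stride=1):
    """Output length of a convolution along one dimension."""
    return (n + 2 * padding - kernel) // stride + 1


def same_padding(kernel):
    """Padding that keeps the size unchanged for stride 1 and an odd kernel."""
    if kernel % 2 == 0:
        raise ValueError("this shortcut only holds for odd kernel sizes")
    return (kernel - 1) // 2


valid_size = conv_output_size(32, kernel=3)                       # no padding
same_size = conv_output_size(32, kernel=3, padding=same_padding(3))
print(valid_size, same_size)
```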

Memory Trick

Padding = Adding frame around picture to preserve edges

How ALI Answers

"In stock charts, important patterns often occur at the edges - like breakouts at the end of the time period. Without padding, my CNN would lose these edge features during convolution. Same padding ensured every part of the chart got equal attention during feature extraction."

Question 20

Compare different CNN architectures (LeNet, AlexNet, VGG, ResNet).

Explanation

LeNet (simple, small), AlexNet (popularized ReLU and dropout in deep CNNs), VGG (deeper with 3x3 filters), ResNet (skip connections mitigate vanishing gradients). ALI studied these architectures at IIT and found ResNet most suitable for his deep CNN experiments due to its ability to train very deep networks without vanishing gradient problems.

Memory Trick

LeNet→AlexNet→VGG→ResNet = Simple→Deeper→Uniform→Skip connections

How ALI Answers

"At IIT, we progressed through these architectures chronologically. For my TCS project, I chose ResNet because stock pattern recognition needed deep networks, and ResNet's skip connections prevented vanishing gradients. It was like having shortcuts in a tall building - information could flow easily to any floor."

RNN/LSTM/GRU

Question 21

Why do we need LSTM over vanilla RNN?

Explanation

Vanilla RNNs suffer from vanishing gradients, limiting their ability to learn long-term dependencies. LSTM solves this with gating mechanisms (forget, input, output gates) and cell state. ALI's stock prediction project at TCS required learning patterns spanning weeks/months, which vanilla RNNs couldn't handle but LSTM mastered through its memory mechanism.

Memory Trick

RNN = Short-term memory, LSTM = Long-term memory with gates

How ALI Answers

"My TCS project needed to learn from quarterly earnings patterns affecting stock prices months later. Vanilla RNN forgot this long-term information, but LSTM's cell state acted like a conveyor belt carrying important information across many time steps, enabling it to connect distant events."

Question 22

Explain the three gates in LSTM and their functions.

Explanation

Forget gate decides what to remove from cell state, Input gate determines what new information to store, Output gate controls what parts of cell state to output. ALI's LSTM learned to forget irrelevant market noise, remember important price trends, and output relevant predictions for his TCS stock prediction model.
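
One LSTM time step can be written out with scalars to show where each gate acts. This sketch assumes the standard LSTM formulation (sigmoid gates, tanh candidate); the parameter layout, a dict of `(input_weight, recurrent_weight, bias)` tuples, is a hypothetical choice made for readability.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, params):
    """One scalar LSTM step; returns the new hidden state and cell state."""
    def pre_activation(name):
        w, u, b = params[name]
        return w * x + u * h_prev + b

    forget = sigmoid(pre_activation("forget"))        # what to erase
    input_gate = sigmoid(pre_activation("input"))     # how much to write
    output = sigmoid(pre_activation("output"))        # how much to reveal
    candidate = math.tanh(pre_activation("candidate"))  # proposed new content

    c = forget * c_prev + input_gate * candidate
    h = output * math.tanh(c)
    return h, c


zero = {name: (0.0, 0.0, 0.0)
        for name in ("forget", "input", "output", "candidate")}
h_new, c_new = lstm_step(x=1.0, h_prev=0.0, c_prev=2.0, params=zero)
print(h_new, c_new)

# A large forget bias and a very negative input bias make the cell state
# carry over almost unchanged, which is how long-term memory is preserved.
remember = dict(zero, forget=(0.0, 0.0, 20.0), input=(0.0, 0.0, -20.0))
_, c_kept = lstm_step(x=1.0, h_prev=0.0, c_prev=2.0, params=remember)
print(c_kept)
```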

Memory Trick

Forget = Eraser, Input = Pen, Output = Display screen

How ALI Answers

"In my stock LSTM, the forget gate learned to ignore weekend price gaps, the input gate focused on volume spikes during earnings, and the output gate determined when to make confident predictions. It's like having three smart assistants managing what to remember, learn, and share."

Question 23

What is the difference between LSTM and GRU?

Explanation

GRU combines forget and input gates into update gate, has reset gate instead of output gate, and no separate cell state. It's simpler and faster than LSTM but may be less expressive. ALI compared both in his TCS project - LSTM performed slightly better for complex stock patterns, but GRU was faster for real-time predictions.

Memory Trick

GRU = LSTM's simpler cousin with 2 gates instead of 3

How ALI Answers

"I tested both architectures at TCS. For my main stock prediction model, LSTM's three gates provided better control and slightly higher accuracy. But for real-time trading alerts, GRU's speed advantage made it more practical. It's a classic accuracy vs speed tradeoff."

Question 24

Explain bidirectional RNNs and when to use them.

Explanation

Bidirectional RNNs process sequences in both forward and backward directions, capturing context from both past and future. Useful for tasks where future context matters, like NLP. ALI experimented with bidirectional LSTMs for stock prediction but found them less useful since future stock prices shouldn't influence past predictions in real trading scenarios.

Memory Trick

Bidirectional = Reading the story forwards and backwards simultaneously

How ALI Answers

"I tried bidirectional LSTM for stock analysis, but realized it was cheating - using future prices to predict past ones! However, it was perfect for my IIT NLP project analyzing financial news sentiment, where understanding the complete sentence context improved classification accuracy significantly."

Question 25

What is sequence-to-sequence learning?

Explanation

Sequence-to-sequence (Seq2Seq) models use encoder-decoder architecture where encoder processes input sequence into fixed representation, decoder generates output sequence. Common in machine translation, text summarization. ALI explored Seq2Seq for generating trading signals from stock sequences during his TCS internship, creating a model that translated price patterns into buy/sell recommendations.

Memory Trick

Seq2Seq = Translator converting one language sequence to another

How ALI Answers

"At TCS, I built a Seq2Seq model that 'translated' 30-day stock price sequences into 5-day future price predictions. The encoder LSTM compressed price patterns into a context vector, and the decoder LSTM generated future predictions step by step, like translating English to French but with numbers!"

Transformers/LLMs

Question 26

What is the attention mechanism and why is it important?

Explanation

Attention mechanism allows models to focus on relevant parts of input sequence when generating each output. It calculates attention weights showing which input positions are most relevant for current prediction. ALI studied attention during his IIT advanced ML course and experimented with adding attention to his LSTM for better stock prediction by focusing on relevant historical periods.
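
Attention weights come from a softmax over similarity scores between a query and each input position. The sketch below uses scaled dot-product scoring, which is one common choice and an assumption here; attention added to an LSTM often uses an additive score instead. All vectors are toy values.

```python
import math


def softmax(scores):
    """Numerically stable softmax."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Return attention weights and the weighted sum of the values."""
    dim = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
              for key in keys]
    weights = softmax(scores)
    context = [sum(w * value[j] for w, value in zip(weights, values))
               for j in range(len(values[0]))]
    return weights, context


query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
values = [[10.0], [20.0], [30.0]]
weights, context = attend(query, keys, values)
print(weights)   # the key most similar to the query gets the largest weight
print(context)
```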

Memory Trick

Attention = Spotlight highlighting relevant information

How ALI Answers

"I added attention to my TCS LSTM to see which historical periods it focused on for predictions. Surprisingly, it paid most attention to earnings announcement periods and market crash days, even if they were months ago. It's like having a smart assistant who knows exactly which past events matter for today's decision."

Question 27

How do Transformers work and why are they better than RNNs?

Explanation

Transformers use self-attention to process all positions simultaneously, enabling parallelization. The original design is an encoder-decoder architecture with multi-head attention and positional encoding. They often outperform RNNs because training parallelizes across positions, long-range dependencies are connected directly through attention, and gradients do not have to pass through many sequential time steps. ALI explored Transformers for financial text analysis during his advanced coursework at IIT.

Memory Trick

Transformers = Parallel processing with global attention

How ALI Answers

"While my LSTM processed stock data sequentially, Transformers could look at all time points simultaneously. In my IIT project analyzing financial reports, Transformers could connect distant sentences instantly, whereas RNNs might forget the beginning by the time they reached the end. It's like reading the entire document at once vs word by word."

Question 28

What are the key components of a Transformer model?

Explanation

Key components: Multi-head attention (parallel attention mechanisms), positional encoding (sequence position info), feed-forward networks, layer normalization, and residual connections. ALI studied these components during his IIT coursework and implemented a simplified Transformer for time series forecasting in his research project.

Memory Trick

Transformer = Multi-head attention + Position info + Feed-forward + Normalization

How ALI Answers

"In my IIT research project, I implemented each component step by step. Multi-head attention was like having multiple perspectives on the same data, positional encoding told the model about sequence order, and residual connections prevented vanishing gradients in deep networks. Each piece serves a specific purpose in the architecture."

Question 29

Explain the difference between BERT and GPT architectures.

Explanation

BERT uses encoder-only architecture for bidirectional context, trained with masked language modeling. GPT uses decoder-only architecture for autoregressive generation, trained to predict next token. BERT excels at understanding tasks, GPT at generation tasks. ALI used BERT for financial sentiment analysis during his IIT projects due to its bidirectional understanding capability.

Memory Trick

BERT = Bidirectional understanding, GPT = Generative prediction

How ALI Answers

"For analyzing financial news sentiment at IIT, I chose BERT because it could understand context from both directions - crucial for financial language where 'not bad' means good! GPT would be better if I wanted to generate financial reports, but for understanding existing text, BERT's bidirectional nature was perfect."

Question 30

What is fine-tuning in the context of LLMs?

Explanation

Fine-tuning adapts pre-trained LLMs to specific tasks by training on domain-specific data with lower learning rates. It leverages learned representations while adapting to new domains. ALI fine-tuned BERT on financial news during his IIT project, taking advantage of general language understanding and adapting it for financial sentiment classification.

Memory Trick

Fine-tuning = Teaching a smart student a new subject

How ALI Answers

"Instead of training BERT from scratch for financial sentiment, I fine-tuned a pre-trained model on financial news data. It already knew English grammar and semantics, I just taught it finance-specific language patterns. Like teaching a literature expert to understand technical jargon - much faster than starting from zero."

Embeddings

Question 31

What are word embeddings and how do they work?

Explanation

Word embeddings represent words as dense vectors in continuous space where similar words are closer together. Methods include Word2Vec, GloVe, and contextual embeddings like BERT. ALI used pre-trained word embeddings in his financial news analysis project at IIT, where words like "profit" and "earnings" were mapped to nearby vectors.

Memory Trick

Embeddings = GPS coordinates for words in meaning space

How ALI Answers

"In my IIT sentiment analysis project, word embeddings helped my model understand that 'revenue growth' and 'profit increase' are similar concepts, even though the words are different. It's like having a map where related financial terms cluster together in the same neighborhood."

Question 32

Explain Word2Vec (Skip-gram vs CBOW).

Explanation

Skip-gram predicts context words from target word, works well with rare words. CBOW predicts target word from context, faster and works well with frequent words. Both use shallow neural networks to learn word representations. ALI experimented with both during his IIT NLP coursework, finding Skip-gram better for financial terminology due to rare technical terms.
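
The two objectives differ only in how training examples are built from the same sliding window. The sketch below generates those examples for a toy sentence; it covers data preparation only and does not train any embeddings. The function names are illustrative.

```python
def context_indices(i, n_tokens, window):
    """Indices within `window` positions of i, excluding i itself."""
    return [j for j in range(max(0, i - window), min(n_tokens, i + window + 1))
            if j != i]


def skipgram_pairs(tokens, window):
    """Skip-gram: one (target, context_word) pair per neighbor."""
    return [(tokens[i], tokens[j])
            for i in range(len(tokens))
            for j in context_indices(i, len(tokens), window)]


def cbow_examples(tokens, window):
    """CBOW: one (context_words, target) example per position."""
    return [([tokens[j] for j in context_indices(i, len(tokens), window)],
             tokens[i])
            for i in range(len(tokens))]


tokens = ["ebitda", "rose", "sharply"]
sg = skipgram_pairs(tokens, window=1)
cb = cbow_examples(tokens, window=1)
print(sg)
print(cb)
```

Skip-gram produces a separate training pair for every neighbor of a word, so even a rare word contributes several updates each time it appears.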

Memory Trick

Skip-gram = One word predicts neighbors, CBOW = Neighbors predict center word

How ALI Answers

"For my financial text analysis at IIT, Skip-gram worked better because financial documents have many rare technical terms like 'amortization' or 'EBITDA'. Skip-gram learned good representations for these rare words by focusing on their context, while CBOW struggled with infrequent financial terminology."

Question 33

What are contextual embeddings and how do they differ from static embeddings?

Explanation

Static embeddings (Word2Vec, GloVe) assign fixed vectors to words regardless of context. Contextual embeddings (ELMo, BERT) generate different vectors based on surrounding context. ALI discovered this difference when analyzing financial news where "bank" could mean financial institution or river bank - contextual embeddings captured this distinction better.

Memory Trick

Static = Fixed home address, Contextual = Current location based on surroundings

How ALI Answers

"In financial texts, the word 'bear' could refer to a bearish market or an actual bear in a nature article. Static embeddings gave the same vector regardless, but BERT's contextual embeddings understood the difference based on surrounding words like 'market' vs 'forest'. Context matters hugely in finance!"

Question 34

How do you handle out-of-vocabulary (OOV) words?

Explanation

Strategies include: UNK tokens for rare words, subword tokenization (BPE, WordPiece), character-level models, and FastText (uses subword information). ALI encountered this with company-specific jargon in financial reports and used subword tokenization to handle new terms during his TCS project.
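
Subword tokenization can be illustrated with a greedy longest-match-first split in the style of WordPiece. The vocabulary below is a toy assumption (real vocabularies are learned from a corpus), and the `##` prefix marks a piece that continues a word.

```python
def wordpiece_tokenize(word, vocab, unk="[UNK]"):
    """Split a word into the longest vocabulary pieces, left to right."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        piece = None
        while end > start:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate   # continuation piece
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return [unk]                       # no piece fits: whole word is unknown
        pieces.append(piece)
        start = end
    return pieces


toy_vocab = {"crypto", "##currency", "bank", "##ing"}
print(wordpiece_tokenize("cryptocurrency", toy_vocab))
print(wordpiece_tokenize("banking", toy_vocab))
print(wordpiece_tokenize("xyz", toy_vocab))
```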

Memory Trick

OOV = Break unknown words into known pieces

How ALI Answers

"At TCS, I encountered many company-specific financial terms not in standard vocabularies. Using WordPiece tokenization, my model could break 'cryptocurrency' into 'crypto' + 'currency' and still understand the meaning, even if it had never seen the full word before. Like solving a puzzle using familiar pieces."

Question 35

What is the curse of dimensionality in embeddings?

Explanation

Curse of dimensionality refers to problems in high-dimensional spaces where data becomes sparse and distance metrics become less meaningful. In embeddings, very high dimensions can lead to overfitting and computational issues. ALI experimented with different embedding dimensions in his projects, finding 300D embeddings optimal for his financial text analysis tasks.

Memory Trick

High dimensions = Everything becomes equally distant and sparse

How ALI Answers

"In my IIT experiments, I tried 1000D embeddings thinking bigger is better, but performance dropped! High dimensions made every word seem equally distant from others. I found 300D embeddings hit the sweet spot - enough capacity to capture meaning without the curse of dimensionality affecting similarity calculations."

Feature Engineering

Question 36

What is feature scaling and when is it needed?

Explanation

Feature scaling normalizes features to similar ranges. Min-Max scaling scales to [0,1], Standardization scales to mean=0, std=1. Needed for distance-based algorithms, gradient descent optimization. ALI scaled stock prices (thousands) and volumes (millions) in his TCS project so that LSTM could learn effectively without one feature dominating others.
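
Both scaling methods are one-line formulas. The sketch below uses only the standard library; the sample prices and volumes are made-up numbers. Standardization here divides by the population standard deviation, which is also the convention used by scikit-learn's StandardScaler.

```python
import statistics


def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]   # constant feature: nothing to scale
    return [(v - low) / (high - low) for v in values]


def standardize(values):
    """Shift to mean 0 and scale to standard deviation 1."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


prices = [1000.0, 1500.0, 2000.0]
volumes = [1_000_000.0, 3_000_000.0, 2_000_000.0]
scaled_prices = min_max_scale(prices)
std_volumes = standardize(volumes)
print(scaled_prices, std_volumes)
```

After scaling, both features live on comparable ranges even though the raw values differ by three orders of magnitude.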

Memory Trick

Feature scaling = Putting all players on equal footing

How ALI Answers

"In my TCS stock prediction, prices were in thousands while volumes were in millions. Without scaling, my LSTM focused only on volume changes and ignored price patterns. StandardScaler made both features equally important, dramatically improving prediction accuracy."

Question 37

Explain different techniques for handling categorical variables.

Explanation

Techniques include: One-hot encoding (binary columns), Label encoding (ordinal numbers), Target encoding (mean of target), Embedding layers for high cardinality. ALI used one-hot encoding for stock sectors and embedding layers for company symbols in his TCS project, as company symbols had too many categories for one-hot encoding.
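
One-hot encoding turns each category into its own binary column. A minimal sketch, assuming the columns are ordered alphabetically by category name (an arbitrary but reproducible choice); the sector names are example values.

```python
def one_hot(values):
    """Return the column order and one binary row per input value."""
    categories = sorted(set(values))
    column = {category: i for i, category in enumerate(categories)}
    rows = []
    for value in values:
        row = [0] * len(categories)
        row[column[value]] = 1
        rows.append(row)
    return categories, rows


sectors = ["tech", "energy", "tech", "finance"]
categories, encoded = one_hot(sectors)
print(categories)
print(encoded)
```

The number of columns equals the number of distinct categories, which is why this approach becomes impractical for something like 500+ company symbols.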

Memory Trick

One-hot = Binary flags, Target encoding = Average outcome per category

How ALI Answers

"For stock sectors (10 categories), I used one-hot encoding in my Random Forest. But for individual company symbols (500+ companies), one-hot would create too many columns, so I used embedding layers in my LSTM to learn dense representations. It's like having a compact ID card instead of a huge checklist."

Question 38

What is feature selection and its different methods?

Explanation

Feature selection chooses most relevant features. Methods: Filter methods (correlation, chi-square), Wrapper methods (RFE, forward/backward selection), Embedded methods (Lasso, Random Forest importance). ALI used Random Forest feature importance in his TCS project to identify the most predictive technical indicators from 50+ candidates.

Memory Trick

Filter = Statistical tests, Wrapper = Try combinations, Embedded = Built-in selection

How ALI Answers

"I started with 50+ technical indicators for stock prediction. Random Forest feature importance (embedded method) showed that moving averages and RSI were most predictive. This reduced my features to 15 without losing accuracy, making my LSTM train faster and preventing overfitting."

Question 39

How do you create features from time series data?

Explanation

Time series feature engineering includes: Lag features, Rolling statistics (mean, std), Time-based features (day of week, month), Technical indicators (RSI, MACD), Fourier transforms for seasonality. ALI created extensive time-based features for his stock prediction, including rolling volatility, momentum indicators, and calendar effects.
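
Two of the listed feature types, lag features and rolling statistics, can be sketched in a few lines of plain Python. The closing prices are hypothetical, and `lag` and `rolling_mean` are illustrative helper names rather than any library's API.

```python
def lag(series, k):
    """Value from k steps earlier; None where no history exists yet."""
    if k <= 0:
        return list(series)
    return [None] * k + series[:-k]

def rolling_mean(series, window):
    """Mean of the current value and the window-1 before it; None until the window fills."""
    out = []
    for i in range(len(series)):
        if i + 1 < window:
            out.append(None)
        else:
            chunk = series[i + 1 - window : i + 1]
            out.append(sum(chunk) / window)
    return out

# Hypothetical daily closing prices.
closes = [100.0, 102.0, 101.0, 105.0, 107.0]

lag_1 = lag(closes, 1)          # yesterday's close
ma_3 = rolling_mean(closes, 3)  # 3-day moving average
print(lag_1)
print(ma_3)
```

The rolling mean here includes the current day's value. When the current day's close is the prediction target, the rolling feature should itself be lagged by one step so that it only uses past information.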

Memory Trick

Time features = Past values + Rolling stats + Calendar effects + Technical indicators

How ALI Answers

"My TCS LSTM used raw prices, but I also engineered features like 20-day moving average, weekly volatility, RSI, and 'Monday effect' indicator. These helped capture market microstructure that raw prices alone couldn't reveal. The model learned both from sequence patterns and engineered domain knowledge."

Question 40

What is feature interaction and how do you handle it?

Explanation

Feature interaction occurs when the effect of one feature depends on another's value. Methods to capture: Polynomial features, Product features, Tree-based methods (naturally capture interactions), Neural networks. ALI found that stock volume and price movements had strong interactions - high volume + price increase was more significant than either alone.

Memory Trick

Feature interaction = 1 + 1 = 3 (synergistic effects)

How ALI Answers

"In stock analysis, high volume alone doesn't mean much, neither does small price change. But high volume WITH significant price movement indicates strong market sentiment. My Random Forest automatically captured this interaction, while I manually created volume×price_change features for my linear models at TCS."

Data Cleaning

Question 41

How do you handle missing data?

Explanation

Missing data strategies: Deletion (listwise/pairwise), Imputation (mean/median/mode, KNN, iterative), Model-based (Random Forest, MICE). ALI handled missing stock prices using forward-fill (carry last observation) and missing volume data with median imputation during weekends and holidays in his TCS project.
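
The two imputation strategies mentioned for prices and volumes might look like the following in plain Python. This is a sketch on invented values, with `None` standing in for a missing observation.

```python
from statistics import median

def forward_fill(values):
    """Replace None with the most recent observed value (leading gaps stay None)."""
    out, last = [], None
    for v in values:
        if v is not None:
            last = v
        out.append(last)
    return out

def median_impute(values):
    """Replace None with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]

# Hypothetical series with gaps.
prices = [100.0, None, None, 103.0, None]
volumes = [1200, None, 900, None, 1500]

filled_prices = forward_fill(prices)
filled_volumes = median_impute(volumes)
print(filled_prices)
print(filled_volumes)
```

For time series, the median should be computed from the training period only, otherwise future observations influence how past gaps are filled.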

Memory Trick

Missing data = Delete, Fill with average, or Predict what's missing

How ALI Answers

"Stock markets are closed weekends, creating 'missing' data. I used forward-fill for prices (last price carries forward) but median imputation for volume (zero volume would skew the model). For random missing earnings data, I used Random Forest to predict missing values based on similar companies."

Question 42

What are outliers and how do you detect them?

Explanation

Outliers are data points significantly different from others. Detection methods: Statistical (Z-score, IQR), Distance-based (KNN), Isolation Forest, Local Outlier Factor. ALI used IQR method to detect price anomalies but kept them as they often represented important market events like earnings surprises or news announcements.
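
A minimal sketch of the IQR rule, assuming the common 1.5 × IQR fences and Python's default quartile method in `statistics.quantiles`. The returns are made up, and the function flags outliers instead of removing them, in line with keeping informative market events.

```python
from statistics import quantiles

def iqr_outliers(values, k=1.5):
    """Return one boolean per value: True if it lies outside the IQR fences."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v < low or v > high for v in values]

# Hypothetical daily returns in percent, with one extreme move.
daily_returns = [0.2, -0.1, 0.3, 0.1, -0.2, 0.0, 0.4, -0.3, 9.5]

is_outlier = iqr_outliers(daily_returns)
print(is_outlier)
```

The boolean column can be fed to a model as a regime indicator rather than being used as a deletion mask.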

Memory Trick

Outliers = Data points that don't fit the crowd

How ALI Answers

"In stock data, I found many 'outliers' - huge price jumps during earnings or crashes during COVID. Instead of removing them (they're informative!), I created a separate 'volatility regime' feature. My LSTM learned to adapt its predictions based on whether the market was in normal or high-volatility periods."

Question 43

How do you handle duplicate data?

Explanation

Duplicate detection involves identifying exact or near-duplicate records. Strategies include: removing exact duplicates, fuzzy matching for near-duplicates, and keeping duplicates if they represent valid repeated events. ALI encountered duplicate stock price entries due to data feed issues at TCS and developed automated deduplication pipelines using pandas drop_duplicates and custom fuzzy matching.
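
Exact-duplicate removal on a chosen key can be sketched without pandas as below; with a DataFrame, `drop_duplicates(subset=[...])` plays the same role. The tick records and the key columns are illustrative assumptions.

```python
def drop_duplicate_ticks(ticks, keys=("timestamp", "symbol", "price")):
    """Keep the first record for each key combination, preserving order."""
    seen = set()
    unique = []
    for tick in ticks:
        key = tuple(tick[k] for k in keys)
        if key not in seen:
            seen.add(key)
            unique.append(tick)
    return unique

# Hypothetical feed where the first tick was delivered twice.
ticks = [
    {"timestamp": "2020-03-02T09:15:00", "symbol": "TCS", "price": 2000.0},
    {"timestamp": "2020-03-02T09:15:00", "symbol": "TCS", "price": 2000.0},
    {"timestamp": "2020-03-02T09:15:01", "symbol": "TCS", "price": 2000.0},
    {"timestamp": "2020-03-02T09:15:00", "symbol": "INFY", "price": 750.0},
]

clean = drop_duplicate_ticks(ticks)
print(len(ticks), "->", len(clean))
```

Because the timestamp is part of the key, the same price at a later time is kept as a valid repeated event.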

Memory Trick

Duplicates = Same story told twice, usually keep only one

How ALI Answers

"At TCS, our data feeds sometimes sent the same price tick multiple times. I used pandas drop_duplicates() on timestamp+symbol+price combinations. But for corporate actions like stock splits, the same price might appear legitimately, so I learned to check context before removing 'duplicates'."

Question 44

What is data leakage and how do you prevent it?

Explanation

Data leakage occurs when information that would not be available at prediction time influences training. Types include target leakage (features derived from the target) and temporal leakage (future data in training). ALI initially included 'next day return' as a feature for predicting today's direction - classic target leakage he caught during model validation at TCS.

Memory Trick

Data leakage = Using tomorrow's newspaper to predict today's stock price

How ALI Answers

"I accidentally included a 'future_volatility' feature in my stock prediction model and got 95% accuracy - too good to be true! I learned to strictly separate training data by time and carefully check that all features use only past information. My IIT professors taught me: 'If it's too good to be true, check for leakage first.'"

Question 45

How do you handle inconsistent data formats?

Explanation

Data inconsistency includes different date formats, currency units, text casing, and encoding issues. Solutions involve standardization, regular expressions, and ETL pipelines. ALI dealt with stock data from multiple exchanges with different timestamp formats, currency denominations, and symbol naming conventions during his TCS internship.

Memory Trick

Inconsistent formats = Speaking different dialects of the same language

How ALI Answers

"At TCS, I worked with data from NSE (Indian format) and NYSE (US format). Dates came as 'DD-MM-YYYY' vs 'MM/DD/YYYY', and prices in INR vs USD. I built preprocessing pipelines using pandas to standardize everything to UTC timestamps and USD values before feeding into my LSTM model."

Evaluation Metrics

Question 46

Explain precision, recall, and F1-score with examples.

Explanation

Precision = TP/(TP+FP) - of predicted positives, how many are correct. Recall = TP/(TP+FN) - of actual positives, how many are caught. F1-score = harmonic mean of precision and recall. ALI used these metrics for his stock direction prediction model at TCS, where high precision meant fewer false buy signals, and high recall meant catching most profitable opportunities.
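
The three formulas translate directly into code. The sketch below uses invented buy (1) / no-buy (0) labels and guards against division by zero when there are no predicted or actual positives.

```python
def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall, and F1 for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Hypothetical signals: 4 real opportunities, 3 predicted buys, 2 of them correct.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(round(precision, 3), round(recall, 3), round(f1, 3))
```

Here TP = 2, FP = 1, FN = 2, so precision is 2/3, recall is 1/2, and F1 is their harmonic mean, 4/7.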

Memory Trick

Precision = Accuracy of positive predictions, Recall = Completeness of detection

How ALI Answers

"For my TCS buy/sell signal model: High precision meant when I predicted 'buy', the stock usually went up (few false alarms). High recall meant I caught most profitable opportunities (didn't miss good trades). F1-score balanced both - crucial since missing profits and taking losses are equally costly."

Question 47

When would you use ROC-AUC vs Precision-Recall AUC?

Explanation

ROC-AUC measures TPR vs FPR across thresholds and works well for balanced datasets. PR-AUC is better for imbalanced datasets because it focuses on positive class performance. ALI used ROC-AUC for balanced bull/bear market classification but switched to PR-AUC for rare event detection like market crashes, where positive cases were only 5% of data.
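
The gap between the two metrics on rare events can be seen on a small synthetic example. The sketch below computes ROC-AUC as the fraction of positive/negative pairs ranked correctly, and average precision as a common summary of the PR curve. The labels (5 positives in 100) and the scores are invented for illustration.

```python
def roc_auc(y_true, scores):
    """Probability that a random positive is scored above a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def average_precision(y_true, scores):
    """Mean of the precision values measured at the rank of each positive."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0

# Hypothetical crash detector: 5 crash days among 100, 10 confident false alarms.
y_true = [1] * 5 + [0] * 95
scores = [0.9, 0.6, 0.55, 0.5, 0.3] + [0.7] * 10 + [0.1] * 85

auc = roc_auc(y_true, scores)
ap = average_precision(y_true, scores)
baseline_accuracy = sum(1 for t in y_true if t == 0) / len(y_true)  # always "no crash"
print(round(auc, 3), round(ap, 3), baseline_accuracy)
```

On this data ROC-AUC stays above 0.9 because most negatives are ranked low, while average precision drops to roughly 0.4 because the ten false alarms sit above four of the five real crashes.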

Memory Trick

ROC-AUC = Balanced datasets, PR-AUC = Imbalanced/rare events

How ALI Answers

"For predicting normal bull vs bear markets (roughly 50-50 split), ROC-AUC worked great. But for detecting market crashes (rare events, <5% of time), a model that always predicted 'no crash' was 95% accurate yet useless, and even ROC-AUC looked optimistic because the huge number of true negatives kept the false positive rate low. PR-AUC better reflected the model's ability to actually catch crashes when they happened."

Question 48

What are regression evaluation metrics?

Explanation

Key regression metrics: MAE (Mean Absolute Error), RMSE (Root Mean Square Error), R² (coefficient of determination), MAPE (Mean Absolute Percentage Error). ALI used RMSE for his LSTM stock price prediction as it penalizes large errors heavily, which is important when big prediction errors could mean significant financial losses.
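
The sketch below computes the four metrics and compares two hypothetical forecasts that have the same MAE but very different RMSE, which is the property that makes RMSE attractive when large errors are costly. All prices are invented.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return MAE, RMSE, R^2, and MAPE (in percent) for two equal-length lists."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1 - ss_res / ss_tot          # assumes y_true is not constant
    mape = 100 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n  # assumes no zero targets
    return {"mae": mae, "rmse": rmse, "r2": r2, "mape": mape}

actual = [100.0, 110.0, 120.0, 130.0]
steady_forecast = [105.0, 115.0, 125.0, 135.0]   # always off by 5
spiky_forecast = [100.0, 110.0, 120.0, 150.0]    # perfect three times, off by 20 once

steady = regression_metrics(actual, steady_forecast)
spiky = regression_metrics(actual, spiky_forecast)
print(steady)
print(spiky)
```

Both forecasts have an MAE of 5, but the single large miss doubles the RMSE from 5 to 10 and pulls R² from 0.8 down to 0.2.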

Memory Trick

MAE = Average error, RMSE = Penalizes big errors, R² = Explained variance

How ALI Answers

"For my TCS stock price LSTM, I primarily used RMSE because predicting $100 when actual is $50 is much worse than being off by $1 consistently. RMSE's quadratic penalty matched real trading - large errors cause disproportionate losses. I also tracked R² to ensure my model explained most price variance."

Question 49

How do you evaluate time series models?

Explanation

Time series evaluation requires temporal split (no random shuffling), walk-forward validation, and domain-specific metrics like directional accuracy, Sharpe ratio for trading. ALI used walk-forward validation for his LSTM, training on rolling windows and testing on future periods to simulate real trading conditions at TCS.
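
Walk-forward validation with an expanding training window can be sketched as an index generator. The function name and parameters below are illustrative; scikit-learn's `TimeSeriesSplit` offers a ready-made alternative.

```python
def walk_forward_splits(n_samples, initial_train, test_size):
    """Yield (train_indices, test_indices); each test block follows its training data in time."""
    start = initial_train
    while start + test_size <= n_samples:
        train = list(range(0, start))
        test = list(range(start, start + test_size))
        yield train, test
        start += test_size  # the tested block joins the next training window

# Hypothetical series of 10 time-ordered observations.
splits = list(walk_forward_splits(n_samples=10, initial_train=4, test_size=2))
for train, test in splits:
    print("train up to", train[-1], "test on", test)
```

Every training index is strictly earlier than every test index in the same split, so no future information reaches the model during evaluation.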

Memory Trick

Time series = Train on past, test on future, never mix time periods

How ALI Answers

"For my TCS LSTM, I couldn't use random train-test split (that would be time travel!). I used walk-forward validation - train on 2018-2019, test on 2020 Q1, then retrain including 2020 Q1 and test on Q2. This mimicked real trading where you continuously update models with new data."

Question 50

What is the difference between Type I and Type II errors?

Explanation

Type I error (False Positive) = rejecting a true null hypothesis, saying there's an effect when there isn't. Type II error (False Negative) = failing to reject a false null hypothesis, missing a real effect. ALI applied this to trading: Type I = buying when shouldn't (false buy signal), Type II = not buying when should (missing profit opportunity).

Memory Trick

Type I = False alarm, Type II = Missed detection

How ALI Answers

"In my TCS trading model, Type I error meant buying a stock that then dropped (false buy signal - lost money). Type II error meant not buying a stock that then rose (missed opportunity - lost potential profit). I tuned my model's threshold based on which error was costlier in different market conditions."

MLOps

Question 51

What is MLOps and why is it important?

Explanation

MLOps combines ML, DevOps, and Data Engineering to automate and monitor ML model deployment, maintenance, and retraining. It includes version control, CI/CD pipelines, monitoring, and governance. ALI learned the importance of MLOps at TCS when his manually deployed LSTM model broke in production due to data drift, leading him to implement automated monitoring and retraining pipelines.

Memory Trick

MLOps = DevOps for Machine Learning lifecycle

How ALI Answers

"My TCS LSTM worked great in testing but failed in production when market conditions changed. MLOps taught me to monitor model performance, detect data drift, and automatically retrain when accuracy drops. It's like having a health monitoring system for your ML models - preventing failures before they happen."

Question 52

Explain model versioning and experiment tracking.

Explanation

Model versioning tracks different model versions with their code, data, and hyperparameters. Experiment tracking logs metrics, parameters, and artifacts for reproducibility. Tools include MLflow, Weights & Biases, DVC. ALI used MLflow during his TCS internship to track hundreds of LSTM experiments with different architectures and hyperparameters.

Memory Trick

Versioning = Git for models, Experiment tracking = Lab notebook for ML

How ALI Answers

"At TCS, I ran 200+ LSTM experiments with different hyperparameters. Without MLflow tracking, I'd lose track of which combination worked best. MLflow automatically logged my learning rates, dropout values, and validation RMSE, letting me easily reproduce the best model months later for production deployment."

Question 53

What is model drift and how do you detect it?

Explanation

Model drift occurs when model performance degrades over time. Data drift = input distribution changes, Concept drift = relationship between inputs and outputs changes. Detection methods include statistical tests, performance monitoring, and distribution comparisons. ALI's TCS model experienced drift during COVID when market behaviors completely changed.
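
One possible distribution comparison is the KL divergence between binned live data and binned training data. The sketch below is a minimal version under stated assumptions: the bin edges, the smoothing constant, and the return values are all hypothetical, and a real alert threshold would need to be tuned on historical data.

```python
import math

def histogram(values, edges, eps=1e-6):
    """Share of values per [edges[i], edges[i+1]) bin, smoothed so no bin is exactly zero.
    Values outside the outer edges are ignored."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            if edges[i] <= v < edges[i + 1]:
                counts[i] += 1
                break
    total = sum(counts) + eps * len(counts)
    return [(c + eps) / total for c in counts]

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same bins."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

edges = [-10.0, -2.0, -1.0, 0.0, 1.0, 2.0, 10.0]
# Hypothetical daily returns (%): calm training period vs. turbulent live period.
train_returns = [-0.5, 0.3, 0.8, -0.2, 0.1, -0.9, 0.4, 0.6, -0.3, 0.2]
live_returns = [-4.0, 3.5, -2.5, 5.0, -6.0, 2.2, -3.1, 4.4, 0.1, -0.2]

train_hist = histogram(train_returns, edges)
drift_score = kl_divergence(histogram(live_returns, edges), train_hist)
no_drift = kl_divergence(train_hist, train_hist)
print(drift_score, no_drift)
```

KL divergence is asymmetric and sensitive to the smoothing constant when bins are empty, so symmetric alternatives or statistical tests are often monitored alongside it.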

Memory Trick

Model drift = World changes, model becomes outdated

How ALI Answers

"My TCS LSTM trained on pre-COVID data failed miserably in March 2020. I implemented drift detection using KL-divergence to compare current vs training data distributions, and performance monitoring that triggered alerts when accuracy dropped below 80%. Now I automatically retrain when drift is detected."

Question 54

What is A/B testing for ML models?

Explanation

A/B testing compares model performance by routing traffic to different model versions and measuring business metrics. It ensures new models actually improve real-world outcomes, not just validation metrics. ALI implemented A/B testing at TCS to compare his new LSTM against the existing Random Forest model, gradually increasing traffic to the LSTM as it proved superior.
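
Traffic routing is often done with a deterministic hash so that the same user always sees the same model. The following is a minimal sketch of that idea, not a description of the TCS setup; the user IDs, variant names, and the 10% treatment share are illustrative assumptions.

```python
import hashlib

def assign_variant(user_id, treatment_share=0.10):
    """Deterministically map a user to 'lstm' (treatment) or 'random_forest' (control)."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) / 0x100000000  # roughly uniform in [0, 1)
    return "lstm" if bucket < treatment_share else "random_forest"

# Hypothetical user IDs.
assignments = [assign_variant(f"trader-{i}") for i in range(10_000)]
lstm_share = assignments.count("lstm") / len(assignments)
print(round(lstm_share, 3))
```

Raising `treatment_share` step by step gives a gradual rollout, and users already in the treatment group stay there because their bucket value does not change.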

Memory Trick

A/B testing = Real-world model comparison with actual users

How ALI Answers

"My LSTM had better validation accuracy than the existing Random Forest, but I needed to prove it worked in practice. I set up A/B testing with 10% traffic to LSTM, 90% to Random Forest, measuring actual trading profits. Once LSTM showed 15% better returns over a month, I gradually increased its traffic to 100%."

Question 55

Explain CI/CD for machine learning pipelines.

Explanation

CI/CD for ML automates testing, validation, and deployment of models. Includes data validation, model testing, performance checks, and gradual rollouts. Unlike traditional software, ML pipelines must validate data quality, model performance, and handle model artifacts. ALI implemented GitLab CI/CD at TCS to automatically retrain and deploy his LSTM when new data arrived.

Memory Trick

ML CI/CD = Automated pipeline from data to deployed model

How ALI Answers

"At TCS, my CI/CD pipeline triggered daily: validate new stock data → retrain LSTM if performance dropped → run validation tests → deploy to staging → A/B test → gradual production rollout. What used to take me 2 days manually now happens automatically overnight, with safety checks at every step."

Deployment

Question 56

What are different ways to deploy ML models?

Explanation

Deployment options include: REST APIs (Flask, FastAPI), Batch processing, Real-time streaming, Edge deployment, Model serving platforms (MLflow, Seldon). ALI deployed his TCS LSTM as both a REST API for real-time predictions and batch processing for daily portfolio optimization.

Memory Trick

Deployment = Real-time API, Batch jobs, Streaming, or Edge devices

How ALI Answers

"At TCS, I deployed my LSTM in two ways: FastAPI for real-time stock predictions (traders needed instant results) and Apache Airflow for daily batch processing (portfolio rebalancing overnight). Real-time for urgent decisions, batch for heavy computations - different needs, different deployment strategies."

Question 57

Explain containerization for ML models (Docker).

Explanation

Docker containerization packages models with their dependencies, ensuring consistency across environments. Benefits include reproducibility, scalability, and isolation. ALI used Docker to package his LSTM model with specific TensorFlow versions and Python libraries, ensuring it ran identically on his laptop, TCS servers, and cloud platforms.

Memory Trick

Docker = Shipping container for code - works everywhere identically

How ALI Answers

"My LSTM worked perfectly on my laptop but crashed on TCS servers due to different TensorFlow versions. Docker solved this - I packaged everything (model, dependencies, environment) into a container. Now it runs identically anywhere, from my IIT lab to production servers to cloud platforms."

Question 58

What are the challenges in real-time ML inference?

Explanation

Real-time challenges include: Latency requirements, Throughput scaling, Model size optimization, Feature store integration, and Fallback mechanisms. ALI faced latency issues with his TCS LSTM in live trading - he optimized using model quantization, caching, and implemented fallback to simpler models when LSTM was too slow.

Memory Trick

Real-time = Fast, Scalable, Reliable - getting even two at once is hard enough!

How ALI Answers

"Live trading needed predictions in <100ms, but my LSTM took 200ms. I optimized using TensorRT quantization (reduced precision), Redis caching for recent predictions, and a fallback Random Forest for when speed mattered more than accuracy. It's about balancing speed, accuracy, and reliability."

Question 59

How do you handle model serving at scale?

Explanation

Scaling strategies include: Horizontal scaling (multiple replicas), Load balancing, Auto-scaling, Model optimization (quantization, pruning), and Caching. ALI used Kubernetes at TCS to auto-scale his LSTM service based on trading volume - more instances during market hours, fewer during off-hours.

Memory Trick

Scale = Multiple copies + Load balancer + Auto-scaling + Optimization

How ALI Answers

"During market open, my TCS LSTM got 1000+ requests/second; during nights, maybe 10/hour. Kubernetes auto-scaling spun up 20 pod replicas during peak hours and scaled down to 2 during off-hours. Load balancer distributed requests evenly, and Redis cached frequent predictions for instant responses."

Question 60

What is model monitoring in production?

Explanation

Model monitoring tracks performance, data quality, and system health in production. Metrics include accuracy, latency, throughput, error rates, and data drift. ALI implemented comprehensive monitoring at TCS using Grafana dashboards to track his LSTM's prediction accuracy, response times, and alert when performance degraded.

Memory Trick

Monitoring = Health checkup for deployed models

How ALI Answers

"My TCS monitoring dashboard showed real-time metrics: LSTM accuracy (updated hourly), prediction latency (should be <100ms), error rates, and data drift indicators. When accuracy dropped below 75% or latency spiked above 200ms, I got Slack alerts to investigate and potentially trigger model retraining."

Cloud Basics

Question 61

Compare major cloud platforms for ML (AWS, GCP, Azure).

Explanation

AWS offers SageMaker, comprehensive services. GCP has strong AI/ML integration with BigQuery, Vertex AI. Azure provides Azure ML Studio, good enterprise integration. ALI used AWS SageMaker during his TCS project for easy LSTM training and deployment, appreciating its notebook environment and automatic scaling capabilities.

Memory Trick

AWS = Comprehensive, GCP = AI-focused, Azure = Enterprise-friendly

How ALI Answers

"At TCS, we used AWS SageMaker for its simplicity - I could train my LSTM on powerful GPUs without managing infrastructure. For my IIT projects with large datasets, I preferred GCP's BigQuery integration. Each platform has strengths: AWS for variety, GCP for AI tools, Azure for Microsoft ecosystem integration."

Question 62

What are the benefits of cloud-based ML?

Explanation

Benefits include: Scalable compute, Managed services, Cost efficiency (pay-per-use), Global accessibility, and Built-in MLOps tools. ALI moved from local training to AWS when his LSTM training time went from 2 days on his laptop to 2 hours on cloud GPUs, while only paying for actual usage time.

Memory Trick

Cloud ML = Infinite compute + Managed services + Pay per use

How ALI Answers

"Training my LSTM locally took 48 hours on my IIT laptop. On an AWS p3.2xlarge GPU instance, it finished in 2 hours for just $6. Plus, I got managed Jupyter notebooks, automatic model versioning, and easy deployment - services that would take weeks to set up myself. Cloud democratizes access to powerful ML infrastructure."

Question 63

Explain serverless computing for ML workloads.

Explanation

Serverless computing runs code without managing servers, scaling automatically based on demand. For ML: AWS Lambda, Google Cloud Functions, Azure Functions for inference; serverless training with services like SageMaker Processing. ALI used AWS Lambda for lightweight stock prediction API calls, automatically scaling from 0 to thousands of concurrent requests.

Memory Trick

Serverless = Code runs automatically, scales instantly, pay per execution

How ALI Answers

"My TCS stock prediction API using AWS Lambda scaled from 0 to 500 requests instantly during market volatility, then back to 0 during weekends. I only paid for actual prediction requests - perfect for unpredictable trading patterns. No server management, automatic scaling, cost-effective for sporadic workloads."

Question 64

What is cloud storage for ML data?

Explanation

Cloud storage options: Object storage (S3, GCS) for raw data, Data lakes for structured/unstructured data, Data warehouses (BigQuery, Redshift) for analytics, Feature stores for ML features. ALI stored raw stock data in S3, processed features in BigQuery, and used SageMaker Feature Store for his LSTM training pipeline at TCS.

Memory Trick

Cloud storage = Raw data lakes + Processed warehouses + Feature stores

How ALI Answers

"My TCS data pipeline: Raw market data → S3 (cheap storage) → BigQuery (fast processing) → SageMaker Feature Store (ML-ready features) → LSTM training. Each storage type optimized for its purpose: S3 for durability, BigQuery for analytics, Feature Store for consistent ML features across training and serving."

Question 65

How do you ensure security and privacy in cloud ML?

Explanation

Security measures include: Encryption at rest/transit, IAM policies, VPC/network isolation, Data anonymization, and Compliance frameworks (GDPR, HIPAA). ALI implemented strict IAM policies at TCS, ensuring only authorized personnel could access sensitive financial data, with all model training done in private VPCs.

Memory Trick

Cloud security = Encrypt + Access control + Network isolation + Compliance

How ALI Answers

"At TCS, financial data security was paramount. We encrypted all S3 data, used IAM roles (not root access), trained models in private VPCs isolated from internet, and anonymized customer data. Regular security audits ensured compliance. You can't just focus on model accuracy - data protection is equally critical in real ML projects."

Real-World AI Scenarios

Question 66

How would you build a recommendation system?

Explanation

Approaches include: Collaborative filtering (user-user, item-item), Content-based filtering, Matrix factorization, Deep learning (neural collaborative filtering), and Hybrid systems. ALI designed a stock recommendation system during his TCS project, combining collaborative filtering (similar investor portfolios) with content-based features (company fundamentals).
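
Item-item collaborative filtering can be illustrated with cosine similarity over a small holdings matrix. This is a toy sketch: the investor names and portfolios are invented, and `recommend` is a hypothetical helper, not the system described above.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def recommend(holdings, investor, top_k=1):
    """Rank stocks the investor does not hold by similarity to the stocks they do hold."""
    stocks = sorted({s for owned in holdings.values() for s in owned})
    investors = sorted(holdings)
    # One vector per stock: 1 if that investor holds it, else 0.
    vectors = {s: [1 if s in holdings[i] else 0 for i in investors] for s in stocks}
    owned = holdings[investor]
    scores = {
        s: sum(cosine(vectors[s], vectors[o]) for o in owned)
        for s in stocks
        if s not in owned
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

# Hypothetical portfolios.
holdings = {
    "asha": {"INFY", "TCS", "WIPRO"},
    "ben": {"INFY", "TCS"},
    "chen": {"HDFCBANK", "ICICIBANK"},
    "dev": {"INFY"},
}

suggestions = recommend(holdings, "dev")
print(suggestions)
```

A stock or investor with no holdings history gets no useful similarity scores, which is the cold-start problem that content-based features are meant to cover.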

Memory Trick

Recommendations = People like you + Items like this + Deep patterns

How ALI Answers

"For TCS's stock recommendation system, I used collaborative filtering to find investors with similar portfolios, then recommended stocks they owned. I combined this with content-based filtering using company fundamentals (P/E ratio, sector) and neural collaborative filtering to capture complex patterns. I handled the cold-start problem with a Random Forest trained on company features."

Question 67

How would you approach fraud detection?

Explanation

Fraud detection involves: Anomaly detection, Supervised learning on labeled fraud cases, Real-time scoring, Feature engineering (transaction patterns), and Ensemble methods. ALI studied financial fraud patterns during his IIT coursework, using isolation forests for anomaly detection and Random Forest for classification with engineered time-based features.

Memory Trick

Fraud detection = Anomaly detection + Pattern recognition + Real-time alerts

How ALI Answers

"For my IIT fraud detection project, I combined multiple approaches: Isolation Forest for unknown fraud patterns, Random Forest trained on labeled cases, and engineered features like 'transactions per hour' and 'deviation from user's normal spending'. Real-time scoring with 99.5% precision was crucial - false positives block legitimate transactions."

Question 68

How would you build a chatbot?

Explanation

Chatbot architecture includes: NLU (intent classification, entity extraction), Dialogue management, Response generation, and Integration layers. Modern approaches use transformers, pre-trained models like GPT, and conversational AI platforms. ALI built a financial query chatbot during his IIT project using BERT for intent classification and template-based responses.

Memory Trick

Chatbot = Understand intent + Manage context + Generate response

How ALI Answers

"My IIT financial chatbot used BERT to classify user intents (stock price query, portfolio advice, market news), spaCy for entity extraction (company names, dates), and a rule-based dialogue manager. For responses, I used templates for structured queries and fine-tuned GPT-2 for explanatory answers about market concepts."

Question 69

How do you handle multi-modal AI systems?

Explanation

Multi-modal AI combines different data types (text, images, audio, time series). Approaches include Early fusion (combine raw features), Late fusion (combine predictions), and Joint learning (shared representations). ALI experimented with combining stock price data (time series) and financial news sentiment (text) using attention mechanisms to weight different modalities.
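
Late fusion with modality weights can be reduced to a weighted average of per-model outputs. In the sketch below the weights come from a softmax over two gate logits; in a real attention layer those logits would be learned from the inputs, whereas here they are fixed, hypothetical numbers, as are the model probabilities.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def late_fusion(price_prob, news_prob, gate_logits):
    """Weighted average of two model outputs; weights are a softmax over the gate logits."""
    w_price, w_news = softmax(gate_logits)
    return w_price * price_prob + w_news * news_prob

# Hypothetical 'price goes up' probabilities from a price model and a news model.
normal_day = late_fusion(price_prob=0.70, news_prob=0.40, gate_logits=[2.0, 0.0])
earnings_day = late_fusion(price_prob=0.70, news_prob=0.40, gate_logits=[0.0, 2.0])
print(round(normal_day, 3), round(earnings_day, 3))
```

With the gate favouring price data the fused output stays close to the price model, and with the gate favouring news it moves toward the news model.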

Memory Trick

Multi-modal = Combine different senses like humans do

How ALI Answers

"My advanced TCS project combined stock price LSTM with news sentiment BERT. I used late fusion - LSTM processed numerical data, BERT handled news text, then a neural network combined their outputs with attention weights. During earnings season, news sentiment got higher weights; during normal times, price patterns dominated."

Question 70

What are the ethical considerations in AI/ML?

Explanation

Ethical considerations include: Bias and fairness, Transparency and explainability, Privacy protection, Accountability, and Social impact. ALI studied algorithmic bias at IIT and ensured his TCS trading models didn't discriminate against smaller companies or specific sectors, implementing LIME for model explainability to build trader trust.

Memory Trick

AI Ethics = Fair + Transparent + Private + Accountable + Beneficial

How ALI Answers

"At TCS, I discovered my model was biased against small-cap stocks due to limited training data. I used SMOTE for data balancing and LIME to explain predictions to traders. We also implemented differential privacy for sensitive client data and regular bias audits. As my IIT professor said: 'With great ML power comes great responsibility.'"
