How to ace the Facebook data scientist interview
The final stage of the interview is a hour series of interviews at either the Menlo Park, Seattle, or the New York Facebook campus. There are 4 different minute interviews that each cover a different case study. There’s also a minute lunch break to discuss the role with a current data scientist.
You have the entire minute interview to answer each question:
- 1 SQL technical question
- 1 product interpretation question
- 1 quantitative analysis question
- 1 applied data question
The SQL technical question will be similar in format to the technical screening questions; you’ll receive a data set and be asked to solve problems using SQL. However, this SQL question tends to be more difficult and has a longer solution than those in the technical screening.
The product interpretation question asks you to measure product performance with details like target KPIs and how to implement A/B testing. You might be asked just to walk through this, or you may have to create a high-level plan of the implementation.
An example problem for this would be:
“How would you measure the performance of X new feature?”
The quantitative analysis question is a basic statistics problem that tests whether you understand the basics of statistical data analysis. Many candidates find this to be the easiest part of the interview, as it is simply a baseline check that you haven’t forgotten the fundamentals.
Example questions for this are things like:
- What is Bayes’ theorem and when would you use it?
- What is hypothesis testing?
- What is a p-value and how do you interpret it in context?
- List assumptions about data in the context of linear regression.
- How would you explain the application of probability to your product manager?
The applied data question asks you to consider a solution at a high level. You’ll outline your process, list any assumptions you have, describe possible shortcomings and how you’ve prepared for them, and explain how you reached your conclusions. The interviewer will ask follow-up questions during the process to see how deeply you’re thinking about this solution.
Questions for this section are intentionally broad, such as:
- Do people interact more or less on Facebook with their siblings?
- How would you measure interaction?
- How would you determine if people are siblings?
- How could Facebook use this information?
- How does activity vary depending on the season? What region/s are you looking at? How would you weight activity, is a comment worth more than a like?
- What factors would you use to distinguish users?
- How could Facebook use this information?
Between two of these interviews, you’ll get a casual minute interview with a current data scientist to ask them about their day-to-day responsibilities, challenges, and anything else you’re curious about.
This is essentially a behavioral interview to see if you’ve got the right mindset and excitement to fit the company. Ask them insightful questions that show you’re thinking about the job, and turn your charm up to 11!
Some good questions to ask are:
- What was the most difficult project of your career and how did you solve it?
- What are the unique benefits of being a Facebook data scientist?
- What tips do you wish you had when you started working at Facebook?
- What is your favorite Facebook feature to work on and why do you like it?
The Facebook Data Science, Product Analytics Interview
A theme of the onsite product interviews is that they are all flavors of the above framework. This interview was with a senior technical person, who asked more direct questions about modeling and interpretation. The question you are likely to get will be something that requires a sophisticated model to solve:
- How could we reduce the number of fake accounts on Facebook, and measure the impact of a fake account on our user base?
- What content should we show on the Instagram Explore Page?
Use the above definitions again. These definitions will once again help you make sure you are addressing the problem head-on and in a way that is impactful for the business. Translating data science into business value is at the core of this interview.
Get buy-in from the interviewer along the way. Unlike in my first screen, this interviewer was more reserved about giving feedback on my ideas. I stopped my train of thought multiple times to check in with them, and made sure that they were on board with my assumptions.
Understand the advantages and limitations of the model you choose. Choose a modeling approach whose implementation you feel comfortable explaining. Many of my follow-up questions were about how a model would handle certain scenarios and edge cases associated with my experiment.
From what I’ve read online, this question almost always comes in the form of:
- How would you assess the value of [Facebook Product]?
This interview is with a Product Person, so the requirements and tips are slightly different.
Research & Understand Facebook’s Products. This is something that my recruiter harped on as well, and I’m glad I spent some time familiarizing myself with how their products currently run. This question requires you to understand the business value of a random Facebook feature, and there are a lot. I recommend mentally grouping certain features to experiences (Facebook feed to sharing thoughts/feelings, Groups for finding a community, Instagram for sharing experiences, etc.)
Be prepared to defend your decisions, or pivot. This applies to all interviews, but particularly this one. The interviewer will likely push back on your idea just slightly, in order for you to reconsider. Understanding whether they are pushing back because they need more information or because you are going down the wrong path is a tricky EQ problem, but in my opinion it is the purpose of the interview. I like to brainstorm 2–3 potential solutions and talk out the pros/cons of each, then select the best one. Then in response to any pushback, I can either defend my selection by comparing it against the other options, or realize that an additional piece of information changes which option is optimal.
Talk out your thought process. I found a lot of my success in this interview came from talking out my assumptions and reading how the interviewer was responding. In my experience, as a Product Person this interviewer is a lot more expressive and receptive to guiding your train of thought in the right direction.
This is your classic SQL/Python coding experience. There are plenty of resources out there already. Focus on knowing group-bys, joins, and sorting. Questions were along the lines of:
- Join these two tables so there is a distinct row for each value in column X
- Given a table of user session data, find the average length of each session
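As a rough sketch of the session-length question, the snippet below runs a GROUP BY query through Python's built-in `sqlite3` module. The `sessions` table, its column names, and the sample rows are hypothetical, since the actual interview schema is not given here, and "average length" is read as the average duration per user.

```python
import sqlite3

# Hypothetical schema: one row per session, start/end stored as epoch seconds.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sessions ("
    "user_id INTEGER, session_id INTEGER, start_ts INTEGER, end_ts INTEGER)"
)
conn.executemany(
    "INSERT INTO sessions VALUES (?, ?, ?, ?)",
    [
        (1, 10, 0, 60),      # user 1, 60-second session
        (1, 11, 100, 220),   # user 1, 120-second session
        (2, 12, 50, 80),     # user 2, 30-second session
    ],
)

# Average session duration per user: aggregate the per-row difference.
avg_by_user = conn.execute(
    """
    SELECT user_id, AVG(end_ts - start_ts) AS avg_session_seconds
    FROM sessions
    GROUP BY user_id
    ORDER BY user_id
    """
).fetchall()

print(avg_by_user)
```

In an interview it is worth asking whether the average should be per user, per day, or across all sessions, since each reading changes the GROUP BY clause.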
Do practice problems. I always find the best way to prepare for a code interview is to code. Whether it’s doing HackerRank problems, or taking a course with problems, build a habit of doing DS problems!
Talk before you Write. I always like to pseudocode out my solution before I implement it, to ensure I capture the intended outcome of the question. Sometimes (like in this interview!) there will be a mistake in my logic that will come out in the pseudocode, so you can catch it early. Once I get the pseudocode down, I have a short conversation with the interviewer to ensure I am addressing the right question. Talking out your thought process is always a good idea!
The final interview focused on Statistics. To be honest, this interview was the hardest because (1) it was the last one of the day so I was fatigued and (2) it was a big shift from the product questions asked in previous interviews. I couldn’t find any problem examples online, but the question I got was something similar to:
- Given a medical test that produces a 1% False Positive rate, and a population True Positive Rate of 5%, what is the likelihood of a True Positive given by the test?
Reacquaint yourself with Basic Probability Laws. Despite being a Data Scientist for several years, remembering the definition of conditional probability took me a second (embarrassing, I know). Revisiting these and being able to talk about them in detail and at a business level will help you move through this interview with ease. Bayes’ Rule always comes in handy.
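Bayes' rule states that P(A | B) = P(B | A) * P(A) / P(B). A minimal sketch of how it applies to a question like the one above follows. It assumes the 5% figure is the prevalence of the condition in the population and that the test catches every real case (sensitivity of 100%); the question as recalled does not state either, so both are assumptions made for illustration.

```python
def posterior_positive(prevalence, sensitivity, false_positive_rate):
    """Return P(condition | positive test) using Bayes' rule."""
    # P(positive and condition)
    true_pos = sensitivity * prevalence
    # P(positive and no condition)
    false_pos = false_positive_rate * (1.0 - prevalence)
    # P(condition | positive) = P(positive and condition) / P(positive)
    return true_pos / (true_pos + false_pos)


# Assumed inputs: 5% prevalence, 100% sensitivity (hypothetical), 1% false positives.
p = posterior_positive(prevalence=0.05, sensitivity=1.0, false_positive_rate=0.01)
print(round(p, 3))  # 0.84
```

Under these assumptions roughly 84% of positive results are true positives. Lowering the assumed sensitivity or the prevalence lowers that share, which is a useful talking point with the interviewer.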
Reacquaint yourself with Confusion Matrices. If F1 Score and True Positive Rate don’t immediately jump out at you as talking points for this question, it would be a great exercise to brush up on Confusion Matrix terms and translate them into non-statistical examples. I believe my expertise in this regard is what saved my earlier fumble in the interview.
Complete Guide: Facebook Data Science Interview Questions
Facebook Data Science is Unique
Working as a data scientist at a big company is a dream come true for many. However, before pursuing this role, it's important to know how it works. You need to understand that Facebook sees data science a bit differently, even compared to companies like Google or Amazon.
Core areas of work
To start, it might be helpful to define the 4 core areas Facebook data scientists work in. This will frame everything in your interview process:
- Use quantitative tools to uncover opportunities, set team goals, and work with cross-functional partners to guide the product roadmap.
- Explore, analyze, and aggregate large data sets to provide actionable information, and create intuitive visualizations to convey those results to a broad audience.
- Design informative experiments considering statistical significance, sources of bias, target populations, and potential for positive results.
- Collaborate with engineers on logging, product health monitoring, and experiment design/analysis.
If you're interested
Start reaching out to your network. 80% of hiring at Facebook is through sourcing or referrals, so if you want to get in, this is your best bet.
Leverage your immediate network and second degree connections to score a referral. It doesn't matter who refers you for the role. In fact, there is a whole team at Facebook dedicated to helping referrals, and the process there is fairly fast-paced: some candidates get a recruiter call within a week.
Once you get that recruiter phone call, what happens next?
The Interview Process: Early Stages
Step 1: Recruiter Phone Screen
- No Python or SQL questions here - this phone interview will mainly focus on your background, with a few behavioral questions peppered in.
- Essentially, the recruiter is just looking to assess your communication skills and scope of your previous work.
- Most candidates make it through.
Step 2: Video Interview with a Facebook Data Scientist
It's 45 minutes long and primarily consists of technical interview questions. In this interview, they're looking for two things:
- Creativity and skill when solving business problems.
- How you articulate/communicate your solutions.
How you engage with the problem, including your thought process, structure, and communication style, is even more important than your ability to solve the problem.
Preparing for early stage interviews
Facebook recommends spending some time understanding what they consider a "Facebook product":
"Spend some time engaging with Facebook Products less as a user and more as someone who is tasked with improving or developing these products. The “What We Build” tab on this link outlines what we consider a “Facebook Product” (Ads, Mobile, Timeline, News Feed, Messaging, etc.). Note: It isn’t a complete list"
In addition to this research, you should always look into interview questions - you can try Candor Community for recent data.
Characteristics Facebook looks for
In this first data science interview, Facebook looks for 4 factors to move you forward to the next round:
1. Structure: How good you are at taking a large problem or open ended question and framing it in the right context.
2. Action: Does your review lead to specific action items for the team? This should show what working with you is like.
3. Analytical Understanding: Can you translate between numbers and words (i.e. use data to prove to your interviewer that product “X” should be built)?
4. Hypothesis Driven: Can you identify reasonable hypotheses and apply basic logic to support those hypotheses?
The Interview Process: The Final Round!
Typically, candidates for the data science role will be invited for an onsite interview during the final round. This could take place in Menlo Park, Seattle, or New York. During COVID, however, you will likely do this interview via video.
In this final round, you'll still face some standard data science interview questions. In addition, you'll be tested on your product knowledge, especially if you're on the product analyst track.
Product Interpretation Questions
This will involve a case study focused on making sense of user behavior using data, analytics and metrics. Questions in this section are generally broad, like:
How would you evaluate Facebook Groups?
How would you assess engagement on Instagram Stories?
When answering these questions,
- You'll be expected to show scope: Can you put your data scientist hat aside for a moment and see the perspective of the PM and the rest of the team?
- You will likely be asked follow-ups: Commonly around adding new features, creating metrics, and testing hypotheses by designing appropriate tests.
The data science job at Facebook, unlike at other tech companies, has a broad scope. Many job seekers assume this interview is just about jamming on some SQL/Python, but it's not. The case study will really test your ability to think about the business and answer questions about how a feature or metric can affect the user experience. You will also get related questions around trade-offs and how they may affect revenue or engagement.
Applied Data/Analytics Interview
This is the part of the data scientist interview you probably expected.
Technical questions will focus on solving a problem, using a dataset Facebook will provide during the interview. This will often include data pertaining to Facebook products, so it really helps to understand those ahead of the interview. Specifically, think through how the products came to be, what decisions were made along the way, and what metrics matter.
Occasionally, you will be asked a question like:
Here's X card transaction dataset, how will you use machine learning to design a fraud detection algorithm?
To solve them, you will be expected to
- Select the right data and draw inferences from it to make informed decisions
- Build metrics
- Circle back to how your decisions might impact the core product
Data scientists at Facebook are always expected to approach problems with broad scope and keen product sense.
Don't forget to consider details like A/B testing and technical tradeoffs - don't just focus on the data analysis. Here are some questions Facebook recommends focusing your mind on while you structure a solution:
- Why do you think they made certain decisions about how it works?
- What could you do to improve the product?
- What kind of metrics would you want to consider when solving for questions around a product's health, growth, or engagement?
- How would you measure the success of different parts of the product?
- What metrics would you assess when trying to solve business problems related to our product?
- How would you tell if a product is performing well or not?
Facebook Data Scientist
The role of a Facebook Data Scientist
The data scientist role at Facebook combines strong analytical and technical skills with sharp product sense. Compared to other big tech companies, the data scientist role here is a broad one and involves setting team goals, finding opportunities in the data to shift product focus, modeling predictions, and setting a culture of rigorous experimental testing. The role focuses more on deploying your data science understanding and quantitative skills to optimize Facebook's products and adding business value to them. Here's the role in a little more detail.
- Applying your knowledge and skills in quantitative analysis, data mining, and the presentation of data to get a clear idea of how Facebook’s users interact with both their consumer and business products.
- Using quantitative tools to uncover opportunities, set team goals, and work with cross-functional partners to guide the product development/improvement roadmap.
- Informing, influencing, supporting, and executing their product decisions and product launches
- Forecasting and setting the product team’s goals
- Exploring, analyzing and aggregating large data sets to provide actionable information, and creating intuitive visualizations to convey those results to a broad audience.
- Designing informative experiments considering statistical significance, sources of bias, target populations, and potential for positive results.
- Collaborating with engineers on logging, product health monitoring, and experiment design/analysis
- Working in Hadoop and Hive primarily, sometimes MySQL, Oracle, and Vertica.
Skills/Qualifications required/preferred for Facebook data scientist
- 2+ years of experience doing quantitative analysis within a large-scale company or fast-paced environment
- Experience in SQL or other programming languages
- Development experience in any scripting language (PHP, Python, Perl, etc.)
- Experience communicating the results of analyses with product and leadership teams to influence the strategy of the product
- Knowledge of statistics (e.g. hypothesis testing, regressions)
- Experience manipulating data sets through statistical software (ex. R, SAS) or other methods
The interview process for the Facebook data scientist role consists of 2 stages:
- Initial Screening round (45 minutes)
- Onsite interview
The initial interview will be a 45-minute video conference with a Facebook data scientist. The interview will include these sections:
- Analytical: 10 – 20 minutes.
- Technical: 10 – 20 minutes.
- Q&A: 5 minutes.
Initial Screening round
The initial screening round lasts 45 minutes. It has three main sections:
The analytical section takes about 10 – 20 minutes of your initial screening interview time. This part of your initial screen is designed to help the interviewer assess your product sense. The questions asked will allow you to show the interviewer your approach to solving business questions and problems, as well as how creative and articulate you are at thinking through these problems while solving them. Do keep in mind that it’s not about arriving at the perfect or correct answer, but rather about your approach to the problem.
To prepare, spend some time engaging with Facebook Products less as a user and more as someone who is tasked with improving or developing these products (such as Ads, Mobile, Timeline, News Feed, Messaging, etc.).
Like the analytical section, the technical section lasts for 10 – 20 minutes. During the technical section of the interview, the interviewer will be assessing your ability to translate a high-level question into an execution strategy and explain how the result is relevant and what aspects may still be lacking.
What the interviewer will assess
- Language-neutral skills in coding/data manipulation
- Working with grouping and aggregate functions.
- Utilizing different types of joins (left, inner, outer, etc.) including when and how to use a self-join.
- Appending multiple data sources (union in SQL, concat in Pandas, bind_rows in R).
- Filtering data by multiple, complex conditions.
- De-duplicating, sorting, handling missing/incomplete data.
- Assessing Efficiency. The interviewer may ask you to think of more efficient ideas or to explain why you’re making certain efficiency/simplicity tradeoffs.
- You may work in whatever dialect you like, but you’ll be able to answer all questions with ANSI-standard functions (think PostgreSQL). If you use a dialect-specific syntax, you may need to explain it to your interviewer.
- Try to maintain consistency in capitalization/indentation style for better readability.
- Given the heavy focus on data manipulation, most people choose to use libraries, such as Pandas / NumPy in Python or dplyr in R. It’s possible to solve the questions in pure Python / R (or any Turing-complete language), but doing so will likely be much slower and more difficult.
- The interview will either be on a whiteboard or in a plain text environment, so there’ll be no access to function autocomplete or help documentation.
- A few small mistakes in syntax won’t automatically disqualify you, but pseudocode or a general explanation isn’t acceptable. You must know the function names, input arguments, etc., to implement the core skills listed above.
- Think out loud.
Narrate your approach to the problem/question asked as you go through the problem so that the interviewer has insight into your thought process.
- Deconstruct problems.
Follow the modular thinking approach to big ambiguous problems, breaking them into smaller groups, and combining the groups for a solution.
Course-correct mid-answer if your interviewer signals that you’re heading in the wrong direction.
Ask clarifying questions during the interview.
- Prepare an answer to the cliched "Why Facebook?" question.
Facebook interviewers like to see people who know about the company's environment, projects, challenges, etc.
If time permits you may pop in a few questions yourself, say about Facebook and analytics.
Always remember to keep your Product Owner hat on. Think like a member of the product team that built the product/feature for both parts of your interview. The interviewer must feel that you’re thinking about these questions:
- Why do you think they made certain decisions about how it works?
- What could you do to improve the product?
- What kind of metrics would you want to consider when solving questions around a product’s health, growth, or engagement?
- How would you measure the success of different parts of the product?
- What metrics would you assess when trying to solve business problems related to our products?
- How would you tell if a product is performing well or not?
- How would you set up an experiment to evaluate any new products?
The Q&A round consists of a few general questions that the recruiter might put to you, or vice versa. This is a short session of roughly 5 minutes.
Given the following:
- An attendance log for every student in a school district with: attendance_events: date | student_id | attendance
- A summary table with demographics for each student in the district: all_students : student_id | school_id | grade_level | date_of_birth | hometown
Using this data, how would you answer the following?
- What percent of students attend school on their birthday?
- Which grade level had the largest drop in attendance between yesterday and today?
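One possible way to approach both questions is sketched below with Python's built-in `sqlite3` module. The table and column names come from the prompt above, but everything else is an assumption made for illustration: `attendance` is treated as 1 (present) or 0 (absent), dates are ISO strings, "today" and "yesterday" are passed in as parameters, and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE attendance_events (date TEXT, student_id INTEGER, attendance INTEGER)")
conn.execute(
    "CREATE TABLE all_students (student_id INTEGER, school_id INTEGER, "
    "grade_level INTEGER, date_of_birth TEXT, hometown TEXT)"
)
conn.executemany(
    "INSERT INTO all_students VALUES (?, ?, ?, ?, ?)",
    [
        (1, 100, 3, "2010-05-02", "A"),
        (2, 100, 3, "2010-05-01", "B"),
        (3, 100, 4, "2009-05-02", "C"),
        (4, 100, 4, "2009-09-09", "D"),
    ],
)
conn.executemany(
    "INSERT INTO attendance_events VALUES (?, ?, ?)",
    [
        ("2024-05-01", 1, 1), ("2024-05-01", 2, 1), ("2024-05-01", 3, 1), ("2024-05-01", 4, 1),
        ("2024-05-02", 1, 1), ("2024-05-02", 2, 0), ("2024-05-02", 3, 0), ("2024-05-02", 4, 0),
    ],
)

# Q1: share of birthday school days on which the student was present.
# A row counts as a birthday when month and day match the date of birth.
pct_on_birthday = conn.execute(
    """
    SELECT 100.0 * SUM(a.attendance) / COUNT(*)
    FROM attendance_events a
    JOIN all_students s ON s.student_id = a.student_id
    WHERE strftime('%m-%d', a.date) = strftime('%m-%d', s.date_of_birth)
    """
).fetchone()[0]

# Q2: grade level with the largest drop in attendance rate from yesterday to today.
# AVG ignores the NULLs produced by CASE, so each AVG covers one day only.
worst_grade = conn.execute(
    """
    SELECT s.grade_level,
           AVG(CASE WHEN a.date = :yesterday THEN a.attendance END)
         - AVG(CASE WHEN a.date = :today THEN a.attendance END) AS rate_drop
    FROM attendance_events a
    JOIN all_students s ON s.student_id = a.student_id
    WHERE a.date IN (:yesterday, :today)
    GROUP BY s.grade_level
    ORDER BY rate_drop DESC
    LIMIT 1
    """,
    {"yesterday": "2024-05-01", "today": "2024-05-02"},
).fetchone()

print(pct_on_birthday, worst_grade)
```

Edge cases worth raising with the interviewer include birthdays that fall on non-school days, February 29 birthdays, and whether "drop" should mean a change in rate (as here) or in raw head count.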
The initial screen is followed by an onsite round. The onsite interview will test more deeply the concepts and skills tested in the initial screener. Also, questions from another focus area, quantitative analysis, will be asked. Throughout your discussions during the day, the interviewers will be judging you on your ability to tell a compelling story with data, make data-driven decisions, and impact change through product development and optimization. Many of the questions throughout the interviews will be in the context of Facebook's products. Therefore it would be worth preparing for the different metrics that you’d use to measure the success of different Facebook Products for the interview loop.
Structure of the onsite interview:
The onsite interview will include the following minute sections:
- Analysis Case: Product Interpretation.
- Analysis Case: Applied Data.
- Quantitative Analysis.
- Technical Analysis.
Analysis case 1: Product Interpretation
The product case study is focused on understanding user behaviour through data and metrics. Product interpretation (PI) is all about translating user behaviour into product ideas and insights using data and metrics.
What the interviewer is looking to assess:
- Understanding of hypotheses for launching new features: “How can I improve a product?”
- Ability to consider and quantify tradeoffs of a feature in terms of metrics.
- Ability to design experiments to test these hypotheses.
- How you interpret the results of experiments.
- How you communicate decision-making via metrics.
How would you evaluate YouTube’s video recommendations?
How would you make Facebook's News Feed more relevant to a particular age group?
How would you measure the performance of Facebook Ads?
Analysis case 2: Applied data
The applied data interview primarily tests the technical side of your problem-solving approach using data. To perform well in this section, it's helpful to engage with each of Facebook's core products, trying to reverse-engineer in your mind how these products came to be and what metrics, testing, hypotheses, and experimentation were used.
Applied data questions will require you to:
- Consider what data sets are best suited to answer a product question.
- Draw inferences from a data set.
- Combine multiple signals into a data-informed statement.
- Map analytical insights back to product impact.
Do people interact more or less on Facebook with their siblings?
How would you measure social interaction?
How does activity vary depending on the season? What region/regions are you considering?
How would you weigh a user's Facebook activity? Does a comment carry more value than a like?
What factors would you use to distinguish users?
How could this information be of use to Facebook?
This part of the interview focuses on basic questions designed to evaluate your quantitative reasoning and applied statistics skills. It would be very helpful to brush up on the core stats concepts that you might use to solve business problems.
It has 2 parts:
- Quantitative reasoning
- Applied Statistics
Key concepts/skills tested:
- Knowledge of key mathematical concepts such as probability
- Statistical knowledge
- How these concepts relate to Facebook products
- What do you think the distribution of time spent per day on Facebook looks like?
- What metrics would you use to describe that distribution?
This is hands-on skill testing. Here the focus is on the practical application of statistical concepts.
Key skills tested:
- Whether you can apply statistical concepts to real-world problems
- Whether you can draw meaningful and logical inferences from data using core statistical concepts.
Questions will most likely cover the following concepts/areas:
- Estimation and logical reasoning in the context of a real-world product.
- Elements of descriptive statistics (mean/expected value, median, mode, percentiles, etc.).
- Common distributions, such as binomial or normal distributions.
- The profile of real-world data.
- Law of Large Numbers, Central Limit Theorem, Linear Regression.
- Conditional probabilities, including Bayes’ Theorem.
- How do you apply A / B testing?
- Do you know how to interpret experiment results?
- Do you understand common distributions?
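To make the A/B testing and experiment interpretation points concrete, here is a minimal sketch of a two-sided, two-proportion z-test using only the standard library. The conversion counts are invented for illustration, and the z-test is just one common choice for comparing two rates, not necessarily what an interviewer expects.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates B - A."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both groups share one rate.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n_a + 1.0 / n_b))
    z = (p_b - p_a) / se
    # Two-sided tail probability of a standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# Hypothetical experiment: control converts 100/1000, treatment converts 130/1000.
z, p_value = two_proportion_z_test(100, 1000, 130, 1000)
print(round(z, 2), round(p_value, 3))
```

A small p-value only says the observed difference would be unlikely if the two rates were equal. Whether the lift is large enough to matter for the product is a separate judgment.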
The technical analysis section of the onsite loop is similar to the technical section of the initial screen. Here, the interviewer assesses your ability to brainstorm with the given data and to analyze open-ended product-related problems with code.
- Ability to structure and articulate a solution based on data while solving an open-ended problem.
- Using your coding skills to reach an executable solution based on a well-defined approach.
- Identifying and addressing edge cases.
- Adapting or improvising code based on new information and/or constraints.
- Given the timestamps of logins, how many people on Facebook were active all seven days of a week on a mobile phone?
- How do you determine what product on Facebook was used most by the non-employee users for the last quarter?
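The seven-day activity question above can be sketched as a GROUP BY with a HAVING filter on distinct login days. The `logins` table, its columns, the `platform` values, and the chosen week are all hypothetical, and the query runs through Python's built-in `sqlite3` module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one row per login, timestamp stored as 'YYYY-MM-DD HH:MM:SS'.
conn.execute("CREATE TABLE logins (user_id INTEGER, login_ts TEXT, platform TEXT)")

rows = []
for day in range(1, 8):
    ts = f"2024-01-{day:02d} 09:00:00"
    rows.append((1, ts, "mobile"))                                # mobile every day
    rows.append((2, ts, "mobile" if day < 7 else "desktop"))      # misses mobile on day 7
rows.append((3, "2024-01-01 08:00:00", "mobile"))                 # two logins, same day
rows.append((3, "2024-01-01 20:00:00", "mobile"))
conn.executemany("INSERT INTO logins VALUES (?, ?, ?)", rows)

# Count users with a mobile login on 7 distinct calendar days in the week.
active_all_week = conn.execute(
    """
    SELECT COUNT(*)
    FROM (
        SELECT user_id
        FROM logins
        WHERE platform = 'mobile'
          AND login_ts >= :week_start
          AND login_ts < :week_end
        GROUP BY user_id
        HAVING COUNT(DISTINCT date(login_ts)) = 7
    ) AS weekly_actives
    """,
    {"week_start": "2024-01-01", "week_end": "2024-01-08"},
).fetchone()[0]

print(active_all_week)  # 1
```

Counting distinct days rather than rows matters here: a user who logs in twice on one day should not be credited with two active days. Time zones are another edge case worth mentioning.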
Tips for the technical analysis interview:
- Most questions are designed/based on SQL, so proficiency in SQL will be an added advantage.
- Interviewers will ask for whiteboard solutions, so make sure you practice a few problems on a whiteboard before your interview.
- While minor syntax errors in coding may not be penalized, you must be able to articulate your logic and approach in the code to the interviewer.
Salary ranges for Facebook Data Scientists
The salary of a Facebook Data Scientist varies widely. It usually starts at around , USD (total compensation) for a starting level role, and goes up to , USD (total compensation) for senior level roles. The median salary hovers around the , USD mark (total compensation) for someone with about 1 years of experience. The breakdown of this is about , USD in base salary, about 65, USD in stock, and a 15, USD bonus component.
Frequently Asked Questions
How many rounds are there in the Facebook Data Science interview?
There are two rounds, namely the Initial Screening Round and the Onsite Round.
Does the Initial Screening round have any subsections?
Yes, the Initial Screening round has 3 subsections, namely Analytical Section, Technical Section and a QnA round.
How long does the onsite round last?
The onsite round is the lengthiest of all the interview rounds lasting for about two hours.
What is the structure of the onsite round?
The onsite round has four minute sections, namely Analysis Case: Product Interpretation, Analysis Case: Applied Data, Quantitative Analysis, and Technical Analysis.
The Facebook interview consists of multiple technical and business case questions, heavily focused on applying technical knowledge to business case scenarios. Facebook data scientists are expected to work cross-functionally and explore, analyze, and aggregate large data sets to provide actionable information.
Facebook Interview Questions
Facebook interview questions generally fall into four main categories:
- Product and business sense
- Technical data analysis (SQL, pandas)
- Statistics and probability
- Modeling knowledge and applying data
The technical screen will generally consist of one product question and one data analysis question. Be sure to prepare for both in order to move on to the onsite.
Case Study Interview Questions
Q1: Facebook composer, the posting tool, drops from 3% posts per user last month to % posts per user today. How would you investigate what happened?
The question states the drop is from 3% a month ago to % today. The first thing we have to do is clarify the context around the problem before jumping to conclusions about metrics. Is today a weekday and the day one month ago a weekend, so users are posting less? Is there a special event or seasonality? Is this an ongoing downward trend or a one-time drop?
The second part is understanding the metric itself. What drove the decrease: was it the number of users that increased or the number of posts that decreased? The interviewer will likely ask you to jump into one or both of the metrics to discuss what could have caused the decrease.
Q2: A Facebook Groups product manager decides to add threading to comments on group posts. Comments per user increase by 10% but posts go down 2%. Why would that be? What metrics would prove your hypotheses?
Threading restructures the flow of comments so that, instead of responding to the post, users can now respond to individual comments beneath the post.
What effect might this have on a push notification ecosystem?
Q3: Facebook is rolling out a new feature called "Mentions" which is an app specifically for celebrities on Facebook to connect with their fans. How would you measure the health of the Mentions app?
We can start by breaking down some structure on what the interviewer is looking for. Whenever we're given these open-ended product questions, it makes sense to think about structuring the questions with well-defined objectives so we're not switching between different answers.
1. Did you begin by stating what the goals of the feature are before jumping into defining metrics? What is the point of the Mentions feature?
2. Are your answers structured or do you tend to talk about random points?
3. Are the metrics definitions specific or are they generalized in an example like “I would find out if people used Mentions frequently”?
Q4: How can Facebook figure out when users falsify their attended schools?
Q5: If 70% of Facebook users on iOS use Instagram, but only 35% of Facebook users on Android use Instagram, how would you investigate the discrepancy?
Q6: Facebook Newsfeed engagement is down by 10%. How would you find out why?
For more practice and guidance, run through our Product Sense course.
SQL Interview Questions
Q1: Write a SQL query to create a histogram of the number of comments per user in the month of January. Assume bin buckets with class intervals of one.
Since a histogram is just a display of how many users fall into each frequency bucket, all we really need to do is get the total count of comments in the month of January for each user, and then group by that count.
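The two grouping steps can be sketched as follows, run through Python's built-in sqlite3 module so the query can be tried locally. This is only a sketch: the `comments` table, its `user_id` and `created_at` columns, the sample rows, and the year 2020 are assumptions made for illustration, since the original table definitions are not shown here.

```python
import sqlite3

# Hypothetical schema and rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE comments (user_id INTEGER, created_at TEXT);
INSERT INTO comments VALUES
    (1, '2020-01-03'), (1, '2020-01-15'),
    (2, '2020-01-09'),
    (3, '2020-01-20'), (3, '2020-02-02');
""")

HISTOGRAM_SQL = """
WITH per_user AS (
    -- Step 1: count each user's comments in January.
    SELECT user_id, COUNT(*) AS comment_count
    FROM comments
    WHERE created_at >= '2020-01-01' AND created_at < '2020-02-01'
    GROUP BY user_id
)
-- Step 2: count how many users share each comment count.
SELECT comment_count, COUNT(*) AS frequency
FROM per_user
GROUP BY comment_count
ORDER BY comment_count
"""

histogram = conn.execute(HISTOGRAM_SQL).fetchall()
for comment_count, frequency in histogram:
    print(comment_count, frequency)
```

With only a comments table, users who wrote no comments in January do not appear at all; producing a zero bucket would need a join against a table of all users.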
Q2: In the table below, column `action` represents either ('post_enter', 'post_submit', 'post_canceled') for when a user starts a post (enter), ends up canceling it (cancel), or ends up posting it (submit).
Write a query to get the post success rate for each day in the month of January.
Let's see if we can clearly define the metrics we want to calculate before just jumping into the problem. We want the post success rate for each day in the month of January.
To get that metric, we can assume post success rate can be defined as:
(total posts created) / (total posts entered)
Additionally, since the success rate must be broken down by day, we have to assume that a post that is entered is also completed (submitted or canceled) on the same day.
Now that we have these requirements, it's time to calculate our metrics. We know we have to GROUP BY the date to get each day's posting success rate. We also have to break down how we can compute our two metrics of total posts entered and total posts actually created.
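One way to compute both counts in a single pass is conditional aggregation, sketched below through Python's sqlite3 module. The `events` table name, its `user_id` and `created_at` columns, the sample rows, and the year 2020 are assumptions for illustration; only the `action` values come from the question.

```python
import sqlite3

# Hypothetical schema and rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (user_id INTEGER, created_at TEXT, action TEXT);
INSERT INTO events VALUES
    (1, '2020-01-01 09:00:00', 'post_enter'),
    (1, '2020-01-01 09:05:00', 'post_submit'),
    (2, '2020-01-01 10:00:00', 'post_enter'),
    (2, '2020-01-01 10:02:00', 'post_canceled'),
    (1, '2020-01-02 11:00:00', 'post_enter'),
    (1, '2020-01-02 11:03:00', 'post_submit');
""")

SUCCESS_RATE_SQL = """
SELECT DATE(created_at) AS day,
       -- posts created divided by posts entered, per day
       1.0 * SUM(CASE WHEN action = 'post_submit' THEN 1 ELSE 0 END)
           / SUM(CASE WHEN action = 'post_enter' THEN 1 ELSE 0 END) AS success_rate
FROM events
WHERE created_at >= '2020-01-01' AND created_at < '2020-02-01'
GROUP BY DATE(created_at)
ORDER BY day
"""

rows = conn.execute(SUCCESS_RATE_SQL).fetchall()
for day, success_rate in rows:
    print(day, success_rate)
```

This relies on the same-day assumption stated above: a post entered just before midnight and submitted after it would be split across two days.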
Q3: We want to build a naive recommender and we're given two tables: one called `friends`, with user_id and friend_id columns representing each user's friends, and another called `page_likes`, with a user_id and a page_id representing the page each user liked.
Write a SQL query to create a metric to recommend pages for each user based on their friends' liked pages.
We can start by visualizing what kind of output we want from the query. Given that we have to create a metric for each user to recommend pages, we know we want something with a user_id and a page_id along with some sort of recommendation score.
How can we easily represent the scores of each user_id and page_id combo? One naive method would be to create a score by summing up the total likes by friends on each page that the user hasn't currently liked. The max value on our metric would be the most recommendable page.
The first thing we have to do then is to write a query to associate users with their friends' liked pages. We can do that easily with an initial join between the two tables.
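A sketch of the full naive score, again through Python's sqlite3 module. The table and column names follow the question; the sample rows are made up, and the LEFT JOIN used to exclude pages the user already likes is one possible approach rather than the only one.

```python
import sqlite3

# Sample rows are hypothetical; table and column names follow the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE friends (user_id INTEGER, friend_id INTEGER);
CREATE TABLE page_likes (user_id INTEGER, page_id INTEGER);
INSERT INTO friends VALUES (1, 2), (1, 3), (2, 1), (3, 1);
INSERT INTO page_likes VALUES (1, 10), (2, 10), (2, 20), (3, 20), (3, 30);
""")

RECOMMEND_SQL = """
SELECT f.user_id, friend_likes.page_id, COUNT(*) AS score
FROM friends AS f
-- Initial join: every page liked by each of the user's friends.
JOIN page_likes AS friend_likes
    ON friend_likes.user_id = f.friend_id
-- Anti-join: drop pages the user has already liked.
LEFT JOIN page_likes AS own_likes
    ON own_likes.user_id = f.user_id
   AND own_likes.page_id = friend_likes.page_id
WHERE own_likes.page_id IS NULL
GROUP BY f.user_id, friend_likes.page_id
ORDER BY f.user_id, score DESC, friend_likes.page_id
"""

recommendations = conn.execute(RECOMMEND_SQL).fetchall()
for user_id, page_id, score in recommendations:
    print(user_id, page_id, score)
```

The score is simply the number of friends who like the page, so it assumes the `friends` table holds no duplicate rows.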
Check out our SQL course for more help.
Statistics and Probability Interview Questions
Q1: What do you think the distribution of time spent per day on Facebook looks like? What metrics would you use to describe that distribution?
Having the vocabulary to describe a distribution is an important skill as a data scientist when it comes to communicating ideas to your peers. There are four important concepts, with supporting vocabulary, that you can use to structure your answer to a question like this. These are:
- Center (mean, median, mode)
- Spread (standard deviation, interquartile range, range)
- Shape (skewness, kurtosis, unimodal or bimodal)
- Outliers (Do they exist?)
In terms of the distribution of time spent per day on Facebook (FB), one can imagine there may be two groups of people on Facebook:
- People who scroll quickly through their feed and don’t spend too much time on FB.
- People who spend a large amount of their social media time on FB.
Based on this, what kind of claims could we make about the distribution of time spent on FB?
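To make the vocabulary concrete, here is a small sketch that computes center, spread, and a rough shape check with Python's standard `statistics` module. The sample of minutes per day is entirely made up for illustration and is not real Facebook data.

```python
import statistics

# Hypothetical minutes spent per day for twelve users (illustrative only).
minutes = [2, 3, 5, 5, 8, 10, 12, 15, 20, 45, 90, 180]

q1, _, q3 = statistics.quantiles(minutes, n=4)

summary = {
    "mean": statistics.mean(minutes),      # center
    "median": statistics.median(minutes),  # center, robust to outliers
    "stdev": statistics.stdev(minutes),    # spread
    "iqr": q3 - q1,                        # spread, robust to outliers
    "range": max(minutes) - min(minutes),  # spread
}
# A mean well above the median hints at a long right tail (right skew).
summary["right_skewed"] = summary["mean"] > summary["median"]

for name, value in summary.items():
    print(name, value)
```

In this made-up sample a few heavy users pull the mean far above the median, which is the kind of claim the question is inviting about the real distribution.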
Q2: We use people to rate ads. There are two types of raters, random and independent from our point of view:
- 80% of raters are careful and they rate an ad as good (60% chance) or bad (40% chance).
- 20% of raters are lazy and they rate every ad as good (% chance).
Suppose we have raters each rating one ad independently. What's the expected number of good ads?
Keep in mind that in order for the rater to rate an ad, the rater must first be selected. So the event that the rater is selected happens first, then the rating happens. How would you represent this fact arithmetically using basic properties of probability?
Hint: If we only have one rater, we don't need to test that rater's personality more than once.
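The select-then-rate structure is the law of total probability, sketched below. The function name and parameters are hypothetical, the number of raters is left as an argument, and the lazy raters' probability of 1.0 is an assumption drawn from the statement that they rate every ad as good.

```python
def expected_good_ratings(num_raters, p_careful=0.8,
                          p_good_if_careful=0.6, p_good_if_lazy=1.0):
    """Expected number of 'good' ratings from independent raters.

    p_good_if_lazy=1.0 is an assumption: lazy raters rate every ad as good.
    """
    # P(good) = P(careful) * P(good | careful) + P(lazy) * P(good | lazy)
    p_good = p_careful * p_good_if_careful + (1 - p_careful) * p_good_if_lazy
    # By linearity of expectation, the expected count is n * P(good).
    return num_raters * p_good


print(expected_good_ratings(1))
```

For a single rater the result is the probability of a good rating; multiplying by the number of raters gives the expected count.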
Q3: Three zebras are chilling in the desert when a lion suddenly attacks. Each zebra is sitting on a corner of an equilateral triangle. Each zebra randomly picks a direction and runs only along the outline of the triangle toward one of the other two corners.
What is the probability that none of the zebras collide?
There are two scenarios in which the zebras do not collide: if they all move clockwise or if they all move counterclockwise.
How do we calculate the probability that an individual zebra chooses to move clockwise or counterclockwise? How can we use this individual probability to calculate the probability that all zebras choose to move in the same direction?
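A brute-force check of that reasoning, assuming each zebra picks clockwise or counterclockwise independently with equal probability. The function name is hypothetical.

```python
from fractions import Fraction
from itertools import product


def no_collision_probability(num_zebras=3):
    """Enumerate every combination of directions and count the safe ones."""
    directions = ("clockwise", "counterclockwise")
    outcomes = list(product(directions, repeat=num_zebras))
    # No collision only when every zebra runs in the same direction.
    safe = [outcome for outcome in outcomes if len(set(outcome)) == 1]
    return Fraction(len(safe), len(outcomes))


print(no_collision_probability())
```

The enumeration agrees with the direct calculation: two safe outcomes out of 2 × 2 × 2 = 8 equally likely ones.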
Machine Learning Interview Questions
Q1: How would you test whether having more friends now increases the probability that a Facebook member is still an active user after 6 months?
Since we are interested in whether or not someone will still be an active user after 6 months, we can test this assumption by first looking at the existing data. One way to do so is to put users into buckets determined by their friend count six months ago and then look at their activity over the following six months.
If we set a metric to define "active user", such as whether they logged in X number of times, posted once, etc., we can then compute the averages of these metrics across the buckets to determine whether having more friends is associated with higher engagement.
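The bucketing idea can be sketched in plain Python as below. The bucket boundaries, the sample users, and the boolean "active" flag are all hypothetical choices for illustration; a real analysis would define "active" from login or posting data.

```python
from collections import defaultdict


def bucket_label(friend_count):
    """Assign a user to a friend-count bucket (boundaries are arbitrary)."""
    if friend_count < 50:
        return "0-49"
    if friend_count < 200:
        return "50-199"
    return "200+"


def active_rate_by_bucket(users):
    """users: iterable of (friend_count_six_months_ago, is_active_now)."""
    totals = defaultdict(int)
    active = defaultdict(int)
    for friend_count, is_active in users:
        label = bucket_label(friend_count)
        totals[label] += 1
        active[label] += int(is_active)
    return {label: active[label] / totals[label] for label in totals}


# Made-up sample: (friend count six months ago, still active today).
users = [(10, False), (30, True), (80, True), (120, False),
         (150, True), (300, True), (450, True)]
rates = active_rate_by_bucket(users)
print(rates)
```

Comparing the rate across buckets shows whether activity rises with friend count in the existing data.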
Q2: We're given different posts such as your friend's baby pictures, Buzzfeed Tasty videos, and birthday posts and have to decide how to rank them.
How would you optimize the ratio of public versus private content? How would you build a model, what features would you use, and what metrics would you track?
Q3: You've been asked to generate a machine learning model that can map the nicknames of people using Facebook. How do you go about designing this model?
Q4: A product manager has asked you to develop a method to match users to their siblings on Facebook. How would you evaluate a method or algorithm to match users with their siblings? What metrics might you use?
See our Machine Learning course for more in-depth explanations.
Facebook System Design Interview Questions
Q1: How would you build the recommendation algorithm for type-ahead search for Netflix?
Let's think about a simple use case to start out with. Say that we type in the word "hello" for the beginning of a movie.
If we typed in h-e-l-l-o, then a suitable suggestion might be a movie like "Hello Sunshine" or a Spanish movie named "Hola".
Let's now move on to an MVP within the scope. We can begin to think of the solution in the form of a prefix table.
A prefix table maps a prefix (the input string typed so far) to an output string, returning one suggestion at a time to start with. For an MVP, we could take an input string and return a suggestion string, then add fuzzy matching and context matching.
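A minimal prefix-table sketch, assuming exact, case-insensitive prefix matching only. The function names and the titles "Hellboy" and "Help" are hypothetical additions, and fuzzy or context matching (which would be needed to surface "Hola" for "hello") is deliberately left out.

```python
from collections import defaultdict


def build_prefix_table(titles, max_suggestions=3):
    """Map every lowercase prefix of each title to a few candidate titles."""
    table = defaultdict(list)
    for title in sorted(titles):
        normalized = title.lower()
        for end in range(1, len(normalized) + 1):
            bucket = table[normalized[:end]]
            if len(bucket) < max_suggestions:
                bucket.append(title)
    return table


def suggest(table, typed):
    """Look up the typed string; return an empty list when nothing matches."""
    return table.get(typed.lower(), [])


titles = ["Hello Sunshine", "Hola", "Hellboy", "Help"]
table = build_prefix_table(titles)
print(suggest(table, "hello"))
print(suggest(table, "hel"))
```

Here suggestions are ordered alphabetically, which is a placeholder for whatever ranking the recommendation step eventually supplies.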
But now how do we recommend a certain movie?
Run through our System Design course for more problems.
Coding Interview Questions
Q1: There are two lists of dictionaries representing friendship beginnings and endings: friends_added and friends_removed. Each dictionary contains the user_ids and created_at time of the friendship beginning/ending.
Write a function to generate an output which lists the pairs of friends with their corresponding timestamps of the friendship beginning and then the timestamp of the friendship ending.
Note that you are only looking for friendships that have an end date. Because of this, every friendship that will be in our final output is contained within the friends_removed list. So if you start by iterating through the friends_removed list, you will already have the id pair and the end date of each entry in the final output. You just need to find the corresponding start date for each end date.
The friends_added and friends_removed lists are already sorted by date. Because of this, you can be sure that as long as you iterate from the top through both, you will find the correct pairings of dates, since each end date can only have one corresponding start date appearing before it in time.
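One way to implement that pairing is sketched below. It assumes each dictionary has `user_ids` and `created_at` keys as described in the question; the function name, the `start_date` and `end_date` output keys, and the sample data are hypothetical. It uses a queue of start dates per pair rather than two synchronized pointers, which relies on the same sorted-by-date property.

```python
from collections import defaultdict, deque


def friendship_timeline(friends_added, friends_removed):
    """Pair each friendship ending with its earliest unmatched beginning."""
    # Queue the start dates per pair; input order is assumed to be by date.
    starts = defaultdict(deque)
    for added in friends_added:
        pair = tuple(sorted(added["user_ids"]))
        starts[pair].append(added["created_at"])

    timeline = []
    for removed in friends_removed:
        pair = tuple(sorted(removed["user_ids"]))
        if starts[pair]:  # skip removals with no recorded beginning
            timeline.append({
                "user_ids": list(pair),
                "start_date": starts[pair].popleft(),
                "end_date": removed["created_at"],
            })
    return timeline


# Hypothetical sample data.
friends_added = [
    {"user_ids": [1, 2], "created_at": "2020-01-01"},
    {"user_ids": [3, 2], "created_at": "2020-01-02"},
    {"user_ids": [2, 1], "created_at": "2020-02-02"},
]
friends_removed = [
    {"user_ids": [2, 1], "created_at": "2020-01-03"},
    {"user_ids": [2, 3], "created_at": "2020-01-05"},
]
timeline = friendship_timeline(friends_added, friends_removed)
print(timeline)
```

Sorting the ids inside each pair means that [1, 2] and [2, 1] are treated as the same friendship, and the second beginning for that pair stays unmatched because it has no end date.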
Q2: In data science, there exists the concept of stemming, which is the heuristic of chopping off the end of a word to clean and bucket it into an easier feature set.
Given a dictionary consisting of many roots and a sentence, stem all the words in the sentence with the root forming it. If a word has many roots that can form it, replace it with the root with the shortest length.
At first it simply looks like we can just loop through each word, check whether a root exists in the word and, if so, replace the word with the root. But since we are technically stemming the words, we have to make sure that the root matches the word at its prefix rather than anywhere within the word.
We're given a dictionary of roots with a sentence string. Given we have to check each word, try creating a function that takes a word and returns the existing word if it doesn't match a root, or return the root itself.
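A short sketch of that helper and the sentence-level wrapper. The function names and the sample roots and sentence are hypothetical, and the roots are treated as a plain list of strings.

```python
def stem_word(word, roots):
    """Return the shortest root that is a prefix of word, else the word."""
    matches = [root for root in roots if word.startswith(root)]
    return min(matches, key=len) if matches else word


def stem_sentence(sentence, roots):
    """Stem every whitespace-separated word in the sentence."""
    return " ".join(stem_word(word, roots) for word in sentence.split())


roots = ["cat", "bat", "rat"]
sentence = "the cattle was rattled by the battery"
stemmed = stem_sentence(sentence, roots)
print(stemmed)
```

Using `str.startswith` enforces the prefix requirement, and `min(..., key=len)` applies the shortest-root rule when several roots match.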
Q3: You're given a dataframe of students:
Write a function to select only the rows where the student's favorite color is green or red and their grade is above 90.
This question requires us to filter a dataframe by two conditions: first, the grade of the student, and second, their favorite color.
Let's start with filtering by grade since it's a bit simpler than filtering by strings. We can filter rows in pandas by setting our dataframe equal to itself with the filter in place, in this case a condition that keeps only grades above 90.
If we were to look at our dataframe after applying that filter, we'd see that every student with a grade lower than 90 no longer appears in our data frame.
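Both conditions can be combined with boolean masks, as in the sketch below. The column names `grade` and `favorite_color`, the function name, and the sample rows are assumptions, since the original dataframe is not shown; the threshold of 90 comes from the walkthrough above.

```python
import pandas as pd


def grades_colors(students: pd.DataFrame) -> pd.DataFrame:
    """Keep students with a grade above 90 whose favorite color is green or red."""
    high_grade = students["grade"] > 90
    liked_color = students["favorite_color"].isin(["green", "red"])
    # Combine the two boolean masks element-wise with &.
    return students[high_grade & liked_color]


# Hypothetical sample data.
students = pd.DataFrame({
    "name": ["Tim", "Zoe", "Ana", "Lee"],
    "grade": [95, 88, 92, 99],
    "favorite_color": ["red", "green", "blue", "green"],
})
selected = grades_colors(students)
print(selected)
```

Building each mask separately keeps the conditions readable, and `isin` avoids chaining several equality checks for the color filter.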
Looking for more data science questions? Check out our LinkedIn Data Science interview guide or our Python interview questions resource.