Data Analytics Interview Questions

1. What is SQL and why is it important for data analytics?

Answer:

SQL (Structured Query Language) is a programming language used to manage and manipulate relational databases. It is important for data analytics because it allows analysts to extract, manipulate, and analyze data stored in relational databases.

2.How do you retrieve data from a database?

Answer:

You can retrieve data using the SELECT statement.

Example:
SELECT * FROM employees;

3. What is a JOIN in SQL and what are its types?

Answer:

A JOIN is used to combine rows from two or more tables based on a related column. Types include INNER JOIN, LEFT JOIN, RIGHT JOIN, and FULL OUTER JOIN.

Example:
SELECT a.name, b.salary FROM employees a INNER JOIN salaries b ON a.id = b.employee_id;

4. Explain the difference between WHERE and HAVING clauses

Answer:

WHERE is used to filter records before any groupings are made, while HAVING is used to filter records after grouping.

Example:
SELECT department, COUNT(*) FROM employees WHERE age > 30 GROUP BY department HAVING COUNT(*) > 10;

5. How do you find duplicate records in a table?

Answer:

You can find duplicates using the GROUP BY and HAVING clauses.

Example:
SELECT name, COUNT(*) FROM employees GROUP BY name HAVING COUNT(*) > 1;

6. What is a subquery?

Answer:

A subquery is a query within another query.

Example:
SELECT name FROM employees WHERE id IN (SELECT employee_id FROM salaries WHERE salary > 50000);

7. How do you update data in a table?

Answer:

You can update data using the UPDATE statement.

Example:
UPDATE employees SET salary = 60000 WHERE id = 1;

8. What is the difference between DELETE and TRUNCATE?

Answer:

DELETE removes rows one by one and can have a WHERE clause, TRUNCATE removes all rows without logging individual row deletions.

Example:
DELETE FROM employees WHERE id = 1; vs TRUNCATE TABLE employees;

9. How do you create a new table?

Answer:

You can create a table using the CREATE TABLE statement.

Example:
CREATE TABLE employees (id INT, name VARCHAR(100), age INT);

10. What are indexes and why are they important?

Answer:

Indexes are used to speed up the retrieval of rows by creating a data structure that allows quick lookup of values.

Example:
CREATE INDEX idx_name ON employees(name);

11. Explain the difference between UNION and UNION ALL.

Answer:

UNION removes duplicate records, UNION ALL includes duplicates.

Example:
SELECT name FROM employees1 UNION SELECT name FROM employees2;

12. How do you handle NULL values in SQL?

Answer:

You can use IS NULL, IS NOT NULL, COALESCE, or IFNULL functions.

Example:
SELECT COALESCE(name, 'Unknown') FROM employees;

13. What is a primary key?

Answer:

A primary key is a unique identifier for a record in a table.

Example:
ALTER TABLE employees ADD PRIMARY KEY (id);

14. How do you perform a case-insensitive search in SQL?

Answer:

Use the LOWER or UPPER functions.

Example:
SELECT * FROM employees WHERE LOWER(name) = 'john';

15. What is the difference between CHAR and VARCHAR?

Answer:

CHAR is fixed-length, VARCHAR is variable-length.

Example:
CREATE TABLE employees (name CHAR(50)); vs CREATE TABLE employees (name VARCHAR(50));

16. How do you retrieve the top N records from a table?

Answer:

Use the LIMIT clause.

Example:
SELECT * FROM employees ORDER BY salary DESC LIMIT 10;

17. What is a stored procedure?

Answer:

A stored procedure is a set of SQL statements that can be stored and executed on the database server.

Example:
CREATE PROCEDURE GetEmployees() BEGIN SELECT * FROM employees; END;

18. Explain the concept of normalization.

Answer:

Normalization is the process of organizing data to reduce redundancy and improve data integrity.

Example:
Dividing a single table into multiple tables to eliminate duplicate data.

19. How do you combine rows from different tables without using a JOIN?

Answer:

You can use a subquery.

Example:
SELECT a.name, (SELECT salary FROM salaries WHERE employee_id = a.id) FROM employees a;

20. What is a foreign key?

Answer:

A foreign key is a field in one table that uniquely identifies a row of another table.

Example:
ALTER TABLE salaries ADD FOREIGN KEY (employee_id) REFERENCES employees(id);

21. What is Power BI and how is it used in data analytics?

Answer:

Power BI is a business analytics tool by Microsoft that provides interactive visualizations and business intelligence capabilities with an interface simple enough for end users to create their own reports and dashboards.

22. What are the key components of Power BI?

Answer:

Power BI Desktop, Power BI Service, Power BI Mobile, Power BI Gateway, Power BI Report Server, and Power BI Embedded.

23. How do you connect to data sources in Power BI?

Answer:

Use the Get Data feature in Power BI Desktop to connect to various data sources like SQL Server, Excel, web, etc.

24. What is DAX and why is it important in Power BI?

Answer:

DAX (Data Analysis Expressions) is a formula language used in Power BI, Power Pivot, and Analysis Services for creating custom calculations and aggregations.

Example: 
SUM(Sales[Amount])

25. How do you create a calculated column in Power BI?

Answer:

Go to the Modeling tab, select New Column, and enter your DAX formula.

Example: 
SUM(Sales[Amount])

26. What is a measure in Power BI?

Answer:

A measure is a calculation used in aggregations, often created using DAX.

Example: 
SUM(Sales[Amount])

27. How do you create a report in Power BI?

Answer:

Use Power BI Desktop to connect to data, transform it, and use visualization tools to create charts, graphs, and other visual elements.

28.What are Power BI dashboards?

Answer:

Dashboards are single-page, often interactive, visual representations of data, created from reports to provide at-a-glance insights.

Answer:

Publish reports to the Power BI Service and share them with others using sharing links, workspaces, or embedding them in websites or applications.

30. What is the role of Power Query in Power BI?

Answer:

Power Query is used for data ingestion and transformation, allowing users to extract, transform, and load (ETL) data from various sources.

31. How do you handle data refresh in Power BI?

Answer:

Set up a data refresh schedule in the Power BI Service to update datasets automatically from the connected data sources.

32. What are slicers in Power BI?

Answer:

Slicers are visual tools that allow users to filter data in reports and dashboards interactively.

Example:
Adding a date slicer to filter sales data by date.

33. How do you create relationships between tables in Power BI?

Answer:

Use the Model view to drag and drop fields to create relationships, or use the Manage Relationships feature.

34. What is Power BI Q&A?

Answer:

Power BI Q&A allows users to ask questions about their data in natural language and get answers in the form of visualizations.

35. How do you use bookmarks in Power BI?

Answer:

Bookmarks capture the current state of a report page, including filters and visuals, allowing users to save and navigate to specific views.

36. What are Power BI themes?

Answer:

Themes are predefined sets of colors and formatting that can be applied to reports to ensure consistency and improve visual appeal.

37. How do you optimize performance in Power BI reports?

Answer:

Use techniques such as reducing data load, optimizing DAX queries, using aggregations, and indexing data sources.

38. What is the difference between Power BI Pro and Power BI Premium?

Answer:

Power BI Pro is a per-user license that allows for sharing and collaboration, while Power BI Premium offers dedicated resources and enhanced performance for larger-scale deployments

39. How do you use R and Python in Power BI?

Answer:

Use the R and Python visuals in Power BI Desktop to run scripts and create custom visualizations.

40. What is the purpose of the Power BI Service?

Answer:

The Power BI Service is a cloud-based platform for sharing, collaborating on, and managing Power BI reports and dashboards.

41. What is Python and why is it used in data analytics?

Answer:

Python is a high-level programming language known for its simplicity and readability. It is used in data analytics for its powerful libraries and tools for data manipulation, analysis, and visualization.

42. How do you read a CSV file in Python?

Answer:

Use the pandas library.

Example: 
import pandas as pd; df = pd.read_csv('file.csv')

43. What are some popular Python libraries for data analysis?

Answer:

pandas, NumPy, matplotlib, seaborn, and scikit-learn.

44. How do you handle missing values in a dataset using Python?

Answer:

Use pandas functions like dropna() and fillna().

 Example:
 df.dropna() or df.fillna(0)

45. How do you merge two dataframes in Python?

Answer:

Use the merge function in pandas.

 Example:
 pd.merge(df1, df2, on='key')

46. What is a lambda function in Python?

Answer:

A lambda function is an anonymous function defined with the lambda keyword.

 Example:
 lambda x: x + 1

47. How do you visualize data in Python?

Answer:

Use libraries like matplotlib and seaborn.

Example:
import matplotlib.pyplot as plt; plt.plot(data)

48. What is the purpose of the groupby function in pandas?

Answer:

The groupby function is used to group data by one or more columns and apply aggregate functions.

Example:
df.groupby('column').sum()

49. How do you perform linear regression in Python?

Answer:

Use the LinearRegression class from scikit-learn.

Example:
from sklearn.linear_model import LinearRegression; model = LinearRegression(); model.fit(X, y)

50. What is the difference between a list and a dictionary in Python?

Answer:

A list is an ordered collection of items, while a dictionary is an unordered collection of key-value pairs.

Example:
list_example = [1, 2, 3] vs dict_example = {'key': 'value'}

51. What is the difference between Series and DataFrame in pandas?

Answer:

A Series is a one-dimensional labeled array capable of holding any data type, while a DataFrame is a two-dimensional labeled data structure with columns of potentially different data types.

Example:
import pandas as pd
series = pd.Series([1, 2, 3])
dataframe = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

52. How do you filter rows in a DataFrame based on a condition?

Answer:

Use the loc method or boolean indexing.

Example:
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
filtered_df = df[df['A'] > 1]

53. How do you apply a function to each element of a DataFrame column?

Answer:

Use the apply method.

Example:
df['C'] = df['A'].apply(lambda x: x * 2)

54. What is the purpose of the pivot_table function in pandas?

Answer:

The pivot_table function is used to create a spreadsheet-style pivot table as a DataFrame.

Example:
pivot_df = df.pivot_table(values='B', index='A', aggfunc='sum')

55. How do you handle datetime data in pandas?

Answer:

Use the pd.to_datetime function to convert a column to datetime.

Example:
df['date'] = pd.to_datetime(df['date_column'])

56. How do you concatenate two DataFrames vertically?

Answer:

Use the concat function with axis=0.

Example:
df1 = pd.DataFrame({'A': [1, 2]})
df2 = pd.DataFrame({'A': [3, 4]})
concatenated_df = pd.concat([df1, df2], axis=0)

57. What is the purpose of the pivot function in pandas?

Answer:

The pivot function is used to reshape data where you need a new column for each unique value in a specified column.

Example:
pivot_df = df.pivot(index='date', columns='item', values='value')

58. How do you calculate the correlation matrix of a DataFrame?

Answer:

Use the corr method.

Example:
correlation_matrix = df.corr()

59. How do you drop duplicate rows in a DataFrame?

Answer:

Use the drop_duplicates method.

Example:
df = df.drop_duplicates()

60. How do you handle missing data in a DataFrame?

Answer:

Use the fillna or dropna methods.

Example:
df = df.fillna(0)  # Fill missing values with 0
df = df.dropna()   # Drop rows with missing values

61. What is a lambda function and how is it used in data analytics?

Answer:

A lambda function is an anonymous function defined with the lambda keyword. It is used for short, throwaway functions.

Example:
df['C'] = df['A'].apply(lambda x: x * 2)

62. How do you read an Excel file into a pandas DataFrame?

Answer:

Use the pd.read_excel function.

Example:
df = pd.read_excel('file.xlsx', sheet_name='Sheet1')

63. How do you save a DataFrame to a CSV file?

Answer:

Use the to_csv method.

Example:
df.to_csv('file.csv', index=False)

64. What is the purpose of the groupby function in pandas?

Answer:

The groupby function is used to group data by one or more columns and apply aggregate functions.

Example:
grouped_df = df.groupby('column').sum()

65. How do you plot data from a DataFrame?

Answer:

Use the plot method from pandas or libraries like matplotlib and seaborn.

Example:
import matplotlib.pyplot as plt
df['A'].plot(kind='bar')
plt.show()

66. How do you calculate the moving average of a column in a DataFrame?

Answer:

Use the rolling method.

Example:
df['moving_avg'] = df['A'].rolling(window=3).mean()

67. What is the purpose of the merge function in pandas?

Answer:

The merge function is used to combine two DataFrames based on a key column.

Example:
merged_df = pd.merge(df1, df2, on='key')

68. How do you handle outliers in a dataset?

Answer:

Use statistical methods to identify and handle outliers, such as Z-scores or the IQR method.

Example:
from scipy import stats
df = df[(np.abs(stats.zscore(df['A'])) < 3)]

69. What is the purpose of the describe method in pandas?

Answer:

The describe method provides summary statistics of the DataFrame.

Example:
summary_stats = df.describe()

70. How do you reshape a DataFrame using the melt function?

Answer:

The melt function unpivots a DataFrame from wide to long format.

Example:
melted_df = pd.melt(df, id_vars=['A'], value_vars=['B', 'C'])

71. What is the purpose of the iloc method in pandas?

Answer:

The iloc method is used for integer-location based indexing for selection by position.

Example:
subset = df.iloc[0:5, 0:2]

72. How do you concatenate two DataFrames horizontally?

Answer:

Use the concat function with axis=1.

Example:
      df1 = pd.DataFrame({'A': [1, 2]})
      df2 = pd.DataFrame({'B': [3, 4]})
      concatenated_df = pd.concat([df1, df2], axis=1)

73. What is the purpose of the astype method in pandas?

Answer:

The astype method is used to cast a pandas object to a specified data type.

Example:
df['A'] = df['A'].astype(float)

74. How do you create dummy variables for categorical data in pandas?

Answer:

Use the get_dummies function.

Example:
dummies = pd.get_dummies(df['category_column'])

75. How do you remove a column from a DataFrame?

Answer:

Use the drop method.

Example:
df = df.drop(columns=['column_to_drop'])