Data Analytics Interview Questions
Answer:
SQL (Structured Query Language) is the standard language for querying and managing relational databases. It matters for data analytics because analysts use it to extract, filter, join, and aggregate data stored in those databases.
Answer:
You can retrieve data using the SELECT statement.
Example: SELECT * FROM employees;
Answer:
A JOIN is used to combine rows from two or more tables based on a related column. Types include INNER JOIN, LEFT JOIN, RIGHT JOIN, and FULL OUTER JOIN.
Example: SELECT a.name, b.salary FROM employees a INNER JOIN salaries b ON a.id = b.employee_id;
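To make the difference between join types concrete, here is a minimal sketch that runs the query above through Python's built-in sqlite3 module. The sample rows are made up for illustration, and the LEFT JOIN variant is added only for comparison.

```python
import sqlite3

# In-memory database with hypothetical sample data (not from the original text).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE salaries (employee_id INTEGER, salary INTEGER);
    INSERT INTO employees VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cara');
    INSERT INTO salaries VALUES (1, 50000), (2, 60000);
""")

# INNER JOIN keeps only employees that have a matching salary row.
inner_rows = conn.execute("""
    SELECT a.name, b.salary
    FROM employees a
    INNER JOIN salaries b ON a.id = b.employee_id
    ORDER BY a.id
""").fetchall()

# LEFT JOIN keeps every employee; missing salaries come back as NULL (None).
left_rows = conn.execute("""
    SELECT a.name, b.salary
    FROM employees a
    LEFT JOIN salaries b ON a.id = b.employee_id
    ORDER BY a.id
""").fetchall()
conn.close()

print(inner_rows)
print(left_rows)
```

Cara has no salary row, so she is dropped by the INNER JOIN and kept with a NULL salary by the LEFT JOIN.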
Answer:
WHERE is used to filter records before any groupings are made, while HAVING is used to filter records after grouping.
Example: SELECT department, COUNT(*) FROM employees WHERE age > 30 GROUP BY department HAVING COUNT(*) > 10;
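The order of filtering can be checked with a small sketch using Python's built-in sqlite3 module. The rows are hypothetical, and the HAVING threshold is lowered to 1 so the tiny sample produces a result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, department TEXT, age INTEGER)")
# Hypothetical sample data for illustration only.
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [
        ("Ann", "Sales", 35),
        ("Bob", "Sales", 42),
        ("Cara", "Sales", 25),
        ("Dev", "IT", 31),
        ("Eli", "IT", 28),
    ],
)

# WHERE drops individual rows first (age <= 30),
# then GROUP BY forms the groups,
# then HAVING drops whole groups (count <= 1).
rows = conn.execute("""
    SELECT department, COUNT(*) AS n
    FROM employees
    WHERE age > 30
    GROUP BY department
    HAVING COUNT(*) > 1
    ORDER BY department
""").fetchall()
conn.close()

print(rows)
```

After the WHERE step, Sales has two rows and IT has one, so only Sales survives the HAVING step.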
Answer:
You can find duplicates using the GROUP BY and HAVING clauses.
Example: SELECT name, COUNT(*) FROM employees GROUP BY name HAVING COUNT(*) > 1;
Answer:
A subquery is a query within another query.
Example: SELECT name FROM employees WHERE id IN (SELECT employee_id FROM salaries WHERE salary > 50000);
Answer:
You can update data using the UPDATE statement.
Example: UPDATE employees SET salary = 60000 WHERE id = 1;
Answer:
DELETE removes rows one at a time, logs each deletion, and can take a WHERE clause. TRUNCATE removes all rows at once without logging individual row deletions and does not accept a WHERE clause.
Example: DELETE FROM employees WHERE id = 1; vs TRUNCATE TABLE employees;
Answer:
You can create a table using the CREATE TABLE statement.
Example: CREATE TABLE employees (id INT, name VARCHAR(100), age INT);
Answer:
Indexes are used to speed up the retrieval of rows by creating a data structure that allows quick lookup of values.
Example: CREATE INDEX idx_name ON employees(name);
Answer:
UNION removes duplicate rows from the combined result, while UNION ALL keeps them.
Example: SELECT name FROM employees1 UNION SELECT name FROM employees2;
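A minimal sketch comparing the two operators side by side, using Python's built-in sqlite3 module. The table contents are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data: 'Bob' appears in both tables.
conn.executescript("""
    CREATE TABLE employees1 (name TEXT);
    CREATE TABLE employees2 (name TEXT);
    INSERT INTO employees1 VALUES ('Ann'), ('Bob');
    INSERT INTO employees2 VALUES ('Bob'), ('Cara');
""")

# UNION returns each distinct name once.
union_rows = conn.execute(
    "SELECT name FROM employees1 UNION SELECT name FROM employees2 ORDER BY name"
).fetchall()

# UNION ALL returns every row from both inputs, duplicates included.
union_all_rows = conn.execute(
    "SELECT name FROM employees1 UNION ALL SELECT name FROM employees2 ORDER BY name"
).fetchall()
conn.close()

print(union_rows)
print(union_all_rows)
```

Because UNION ALL skips the de-duplication step, it is usually the cheaper choice when duplicates are impossible or acceptable.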
Answer:
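You can use the IS NULL and IS NOT NULL predicates, or functions such as COALESCE and IFNULL. COALESCE is standard SQL; IFNULL is available in MySQL and SQLite, while SQL Server uses ISNULL.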
Example: SELECT COALESCE(name, 'Unknown') FROM employees;
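The sketch below, which uses Python's built-in sqlite3 module and made-up rows, shows COALESCE and IS NULL in action, and also why comparing with = NULL does not work.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER, name TEXT)")
# Hypothetical sample data: employee 2 has no name (NULL).
conn.executemany("INSERT INTO employees VALUES (?, ?)", [(1, "Ann"), (2, None)])

# COALESCE substitutes a default for NULL values.
coalesced = conn.execute(
    "SELECT COALESCE(name, 'Unknown') FROM employees ORDER BY id"
).fetchall()

# IS NULL is the correct test for missing values.
missing_ids = conn.execute("SELECT id FROM employees WHERE name IS NULL").fetchall()

# '= NULL' never evaluates to true, so it matches no rows.
equals_null = conn.execute("SELECT id FROM employees WHERE name = NULL").fetchall()
conn.close()

print(coalesced, missing_ids, equals_null)
```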
Answer:
A primary key is a column (or set of columns) that uniquely identifies each record in a table. Its values must be unique and cannot be NULL.
Example: ALTER TABLE employees ADD PRIMARY KEY (id);
Answer:
Use the LOWER or UPPER functions.
Example: SELECT * FROM employees WHERE LOWER(name) = 'john';
Answer:
CHAR is fixed-length, VARCHAR is variable-length.
Example: CREATE TABLE employees (name CHAR(50)); vs CREATE TABLE employees (name VARCHAR(50));
Answer:
Use the LIMIT clause (supported by MySQL, PostgreSQL, and SQLite; SQL Server uses TOP instead).
Example: SELECT * FROM employees ORDER BY salary DESC LIMIT 10;
Answer:
A stored procedure is a set of SQL statements that can be stored and executed on the database server.
Example: CREATE PROCEDURE GetEmployees() BEGIN SELECT * FROM employees; END;
Answer:
Normalization is the process of organizing data to reduce redundancy and improve data integrity.
Example: Dividing a single table into multiple tables to eliminate duplicate data.
Answer:
You can use a subquery.
Example: SELECT a.name, (SELECT salary FROM salaries WHERE employee_id = a.id) FROM employees a;
Answer:
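A foreign key is a field in one table that references the primary key (or another unique key) of another table, so each value points to exactly one row there. The foreign key values themselves do not have to be unique.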
Example: ALTER TABLE salaries ADD FOREIGN KEY (employee_id) REFERENCES employees(id);
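What a foreign key enforces can be seen in a small sketch with Python's built-in sqlite3 module. Two assumptions apply: SQLite only checks foreign keys after PRAGMA foreign_keys = ON, and it declares the constraint inside CREATE TABLE rather than through ALTER TABLE. The rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite leaves foreign key enforcement off unless this pragma is set.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE salaries (
        employee_id INTEGER REFERENCES employees(id),
        salary INTEGER
    );
    INSERT INTO employees VALUES (1, 'Ann');
""")

# A salary row for an existing employee is accepted.
conn.execute("INSERT INTO salaries VALUES (1, 50000)")

# A salary row for a non-existent employee (id 99) violates the constraint.
try:
    conn.execute("INSERT INTO salaries VALUES (99, 40000)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

salary_count = conn.execute("SELECT COUNT(*) FROM salaries").fetchone()[0]
conn.close()

print(orphan_rejected, salary_count)
```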
Answer:
Power BI is a business analytics tool by Microsoft that provides interactive visualizations and business intelligence capabilities with an interface simple enough for end users to create their own reports and dashboards.
Answer:
Power BI Desktop, Power BI Service, Power BI Mobile, Power BI Gateway, Power BI Report Server, and Power BI Embedded.
Answer:
Use the Get Data feature in Power BI Desktop to connect to various data sources like SQL Server, Excel, web, etc.
Answer:
DAX (Data Analysis Expressions) is a formula language used in Power BI, Power Pivot, and Analysis Services for creating custom calculations and aggregations.
Example: SUM(Sales[Amount])
Answer:
Go to the Modeling tab, select New Column, and enter your DAX formula.
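Example: Amount With Tax = Sales[Amount] * 1.1 (the column name and 10% rate are illustrative; a calculated column is evaluated row by row, so a bare SUM(Sales[Amount]) would repeat the grand total on every row)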
Answer:
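A measure is a calculation, usually an aggregation written in DAX, that is evaluated at query time against whatever filters are currently applied to the report.
Example: Total Sales = SUM(Sales[Amount])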
Answer:
Use Power BI Desktop to connect to data, transform it, and use visualization tools to create charts, graphs, and other visual elements.
Answer:
Dashboards are single-page, often interactive, visual representations of data, created from reports to provide at-a-glance insights.
Answer:
Publish reports to the Power BI Service and share them with others using sharing links, workspaces, or embedding them in websites or applications.
Answer:
Power Query is used for data ingestion and transformation, allowing users to extract, transform, and load (ETL) data from various sources.
Answer:
Set up a data refresh schedule in the Power BI Service to update datasets automatically from the connected data sources.
Answer:
Slicers are visual tools that allow users to filter data in reports and dashboards interactively.
Example: Adding a date slicer to filter sales data by date.
Answer:
Use the Model view to drag and drop fields to create relationships, or use the Manage Relationships feature.
Answer:
Power BI Q&A allows users to ask questions about their data in natural language and get answers in the form of visualizations.
Answer:
Bookmarks capture the current state of a report page, including filters and visuals, allowing users to save and navigate to specific views.
Answer:
Themes are predefined sets of colors and formatting that can be applied to reports to ensure consistency and improve visual appeal.
Answer:
Use techniques such as reducing data load, optimizing DAX queries, using aggregations, and indexing data sources.
Answer:
Power BI Pro is a per-user license that allows for sharing and collaboration, while Power BI Premium offers dedicated resources and enhanced performance for larger-scale deployments.
Answer:
Use the R and Python visuals in Power BI Desktop to run scripts and create custom visualizations.
Answer:
The Power BI Service is a cloud-based platform for sharing, collaborating on, and managing Power BI reports and dashboards.
Answer:
Python is a high-level programming language known for its simplicity and readability. It is used in data analytics for its powerful libraries and tools for data manipulation, analysis, and visualization.
Answer:
Use the pandas library.
Example: import pandas as pd; df = pd.read_csv('file.csv')
Answer:
pandas, NumPy, matplotlib, seaborn, and scikit-learn.
Answer:
Use pandas functions like dropna() and fillna().
Example: df.dropna() or df.fillna(0)
Answer:
Use the merge function in pandas.
Example: pd.merge(df1, df2, on='key')
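By default, pd.merge performs an inner join. The following sketch, with hypothetical DataFrames, shows how the how parameter changes which rows survive.

```python
import pandas as pd

# Hypothetical sample data: key 3 exists only in df1, key 4 only in df2.
df1 = pd.DataFrame({"key": [1, 2, 3], "name": ["Ann", "Bob", "Cara"]})
df2 = pd.DataFrame({"key": [1, 2, 4], "salary": [50000, 60000, 70000]})

# Default how="inner": only keys present in both frames.
inner = pd.merge(df1, df2, on="key")

# how="left": every row of df1, with NaN where df2 has no match.
left = pd.merge(df1, df2, on="key", how="left")

print(inner)
print(left)
```

The other accepted values for how include "right" and "outer", which mirror the SQL join types of the same names.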
Answer:
A lambda function is an anonymous function defined with the lambda keyword.
Example: lambda x: x + 1
Answer:
Use libraries like matplotlib and seaborn.
Example: import matplotlib.pyplot as plt; plt.plot(data)
Answer:
The groupby function is used to group data by one or more columns and apply aggregate functions.
Example: df.groupby('column').sum()
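As a minimal sketch with made-up data, grouping usually pairs with selecting the column to aggregate. The agg method applies several aggregate functions at once.

```python
import pandas as pd

# Hypothetical sample data for illustration only.
df = pd.DataFrame({
    "department": ["Sales", "Sales", "IT"],
    "salary": [50000, 60000, 70000],
})

# One aggregate per group: total salary by department.
totals = df.groupby("department")["salary"].sum()

# Several aggregates per group in a single call.
summary = df.groupby("department")["salary"].agg(["mean", "count"])

print(totals)
print(summary)
```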
Answer:
Use the LinearRegression class from scikit-learn.
Example: from sklearn.linear_model import LinearRegression; model = LinearRegression(); model.fit(X, y)
Answer:
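A list is an ordered collection of items accessed by position, while a dictionary is a collection of key-value pairs accessed by key. Since Python 3.7, dictionaries preserve insertion order, but they are still not indexed by position.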
Example: list_example = [1, 2, 3] vs dict_example = {'key': 'value'}
Answer:
A Series is a one-dimensional labeled array capable of holding any data type, while a DataFrame is a two-dimensional labeled data structure with columns of potentially different data types.
Example: import pandas as pd; series = pd.Series([1, 2, 3]); dataframe = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
Answer:
Use the loc method or boolean indexing.
Example: df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}); filtered_df = df[df['A'] > 1]
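The loc form is handy when combining several conditions or selecting columns at the same time. This is a minimal sketch that reuses the small example frame; the second condition is added for illustration.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

# Combine conditions with & (and) or | (or); each condition needs parentheses.
# The second argument to loc selects which columns to return.
filtered = df.loc[(df["A"] > 1) & (df["B"] < 6), ["A", "B"]]

print(filtered)
```

Python's and/or keywords do not work element-wise on pandas objects, which is why the bitwise operators are used here.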
Answer:
Use the apply method.
Example: df['C'] = df['A'].apply(lambda x: x * 2)
Answer:
The pivot_table function is used to create a spreadsheet-style pivot table as a DataFrame.
Example: pivot_df = df.pivot_table(values='B', index='A', aggfunc='sum')
Answer:
Use the pd.to_datetime function to convert a column to datetime.
Example: df['date'] = pd.to_datetime(df['date_column'])
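Real data often contains values that cannot be parsed. The sketch below, with made-up strings, uses errors="coerce" to turn them into NaT instead of raising, and then reads a date component through the dt accessor.

```python
import pandas as pd

# Hypothetical sample data: the last value is not a valid date.
df = pd.DataFrame({"date_column": ["2024-01-15", "2024-02-20", "not a date"]})

# errors="coerce" converts unparseable values to NaT rather than raising.
df["date"] = pd.to_datetime(df["date_column"], errors="coerce")

# Once the column is datetime, the .dt accessor exposes its components.
df["month"] = df["date"].dt.month

print(df)
```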
Answer:
Use the concat function with axis=0.
Example: df1 = pd.DataFrame({'A': [1, 2]}); df2 = pd.DataFrame({'A': [3, 4]}); concatenated_df = pd.concat([df1, df2], axis=0)
Answer:
The pivot function is used to reshape data where you need a new column for each unique value in a specified column.
Example: pivot_df = df.pivot(index='date', columns='item', values='value')
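One practical difference from pivot_table is that pivot cannot handle duplicate index/column pairs. A minimal sketch with made-up data:

```python
import pandas as pd

# Hypothetical sample data: ('2024-01-02', 'apple') appears twice.
df = pd.DataFrame({
    "date": ["2024-01-01", "2024-01-01", "2024-01-02", "2024-01-02"],
    "item": ["apple", "pear", "apple", "apple"],
    "value": [10, 20, 30, 5],
})

# pivot only reshapes, so duplicate (index, column) pairs raise a ValueError.
try:
    df.pivot(index="date", columns="item", values="value")
    pivot_failed = False
except ValueError:
    pivot_failed = True

# pivot_table aggregates the duplicates instead (here by summing them).
table = df.pivot_table(index="date", columns="item", values="value", aggfunc="sum")

print(pivot_failed)
print(table)
```

Combinations that never occur in the data, such as pear on 2024-01-02, appear as NaN in the result.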
Answer:
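Use the corr method. In recent pandas versions, pass numeric_only=True if the DataFrame also contains non-numeric columns.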
Example: correlation_matrix = df.corr()
Answer:
Use the drop_duplicates method.
Example: df = df.drop_duplicates()
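By default, drop_duplicates compares entire rows. The following sketch, using made-up data, shows how subset and keep narrow the comparison, and how duplicated can flag the rows before anything is removed.

```python
import pandas as pd

# Hypothetical sample data: 'Ann' appears twice with different ages.
df = pd.DataFrame({"name": ["Ann", "Bob", "Ann"], "age": [30, 40, 31]})

# Flag every row whose name occurs more than once (keep=False marks all copies).
dup_mask = df.duplicated(subset=["name"], keep=False)

# Keep only the first row for each name.
deduped = df.drop_duplicates(subset=["name"], keep="first")

print(dup_mask.tolist())
print(deduped)
```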
Answer:
Use the fillna or dropna methods.
Example: df = df.fillna(0) (fill missing values with 0) or df = df.dropna() (drop rows with missing values)
Answer:
A lambda function is an anonymous function defined with the lambda keyword. It is used for short, throwaway functions.
Example: df['C'] = df['A'].apply(lambda x: x * 2)
Answer:
Use the pd.read_excel function.
Example: df = pd.read_excel('file.xlsx', sheet_name='Sheet1')
Answer:
Use the to_csv method.
Example: df.to_csv('file.csv', index=False)
Answer:
The groupby function is used to group data by one or more columns and apply aggregate functions.
Example: grouped_df = df.groupby('column').sum()
Answer:
Use the plot method from pandas or libraries like matplotlib and seaborn.
Example: import matplotlib.pyplot as plt; df['A'].plot(kind='bar'); plt.show()
Answer:
Use the rolling method.
Example: df['moving_avg'] = df['A'].rolling(window=3).mean()
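The first window - 1 rows of a rolling mean are NaN because the window is not yet full. A minimal sketch with made-up values shows this, along with the min_periods option that relaxes it.

```python
import pandas as pd

# Hypothetical sample data for illustration only.
df = pd.DataFrame({"A": [1, 2, 3, 4, 5]})

# Full 3-row window required: the first two results are NaN.
df["moving_avg"] = df["A"].rolling(window=3).mean()

# min_periods=1 averages over however many rows are available so far.
df["moving_avg_partial"] = df["A"].rolling(window=3, min_periods=1).mean()

print(df)
```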
Answer:
The merge function is used to combine two DataFrames based on a key column.
Example: merged_df = pd.merge(df1, df2, on='key')
Answer:
Use statistical methods to identify and handle outliers, such as Z-scores or the IQR method.
Example: import numpy as np; from scipy import stats; df = df[np.abs(stats.zscore(df['A'])) < 3]
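The IQR method mentioned above can be sketched with pandas alone. The data is made up, and the 1.5 multiplier is the conventional choice rather than a fixed rule.

```python
import pandas as pd

# Hypothetical sample data: 100 is an obvious outlier.
df = pd.DataFrame({"A": [10, 12, 11, 13, 12, 100]})

# Interquartile range: distance between the 25th and 75th percentiles.
q1 = df["A"].quantile(0.25)
q3 = df["A"].quantile(0.75)
iqr = q3 - q1

# Values more than 1.5 * IQR outside the quartiles are treated as outliers.
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr
filtered = df[df["A"].between(lower, upper)]

print(lower, upper)
print(filtered)
```

Unlike the Z-score approach, the IQR bounds are based on quartiles, so a single extreme value does not inflate them much.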
Answer:
The describe method provides summary statistics of the DataFrame.
Example: summary_stats = df.describe()
Answer:
The melt function unpivots a DataFrame from wide to long format.
Example: melted_df = pd.melt(df, id_vars=['A'], value_vars=['B', 'C'])
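To show what the long format looks like, here is a minimal sketch that runs the same call on a small made-up frame.

```python
import pandas as pd

# Hypothetical wide-format data: one row per 'A', one column per measurement.
df = pd.DataFrame({"A": ["x", "y"], "B": [1, 2], "C": [3, 4]})

# Each (row, value column) pair becomes its own row in the result.
melted = pd.melt(df, id_vars=["A"], value_vars=["B", "C"])

print(melted)
```

The result has one row per original cell, with the former column names in a column called variable and the cell contents in a column called value.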
Answer:
The iloc method is used for integer-location based indexing for selection by position.
Example: subset = df.iloc[0:5, 0:2]
Answer:
Use the concat function with axis=1.
Example: df1 = pd.DataFrame({'A': [1, 2]}); df2 = pd.DataFrame({'B': [3, 4]}); concatenated_df = pd.concat([df1, df2], axis=1)
Answer:
The astype method is used to cast a pandas object to a specified data type.
Example: df['A'] = df['A'].astype(float)
Answer:
Use the get_dummies function.
Example: dummies = pd.get_dummies(df['category_column'])
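A minimal sketch of the full round trip, with made-up categories: encode the column, then attach the indicator columns back to the original frame. The prefix and dtype arguments are optional and chosen here for readability.

```python
import pandas as pd

# Hypothetical sample data for illustration only.
df = pd.DataFrame({"category_column": ["red", "blue", "red"]})

# One indicator column per category; dtype=int gives 0/1 instead of booleans.
dummies = pd.get_dummies(df["category_column"], prefix="color", dtype=int)

# Attach the indicator columns alongside the original data.
encoded = pd.concat([df, dummies], axis=1)

print(encoded)
```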
Answer:
Use the drop method.
Example: df = df.drop(columns=['column_to_drop'])