EDA: Exploratory Data Analysis
Variance is data
Understand the Data: Start by getting to know your dataset. Look at its size, shape, and data types, and understand what each column represents.
Build a business understanding of the data as well, so you know what each column means in context.
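A first look at a dataset can be sketched as below. This is a minimal example that assumes pandas; the toy DataFrame and its column names (age, city, income) are made up for illustration and stand in for your own data.

```python
import pandas as pd

# Hypothetical toy dataset; in practice you would load your own,
# for example with pd.read_csv on your file.
df = pd.DataFrame({
    "age": [25, 32, 47, 51],
    "city": ["Pune", "Delhi", "Pune", "Mumbai"],
    "income": [42000.0, 58000.0, 61000.0, 75000.0],
})

n_rows, n_cols = df.shape   # size and shape of the dataset
column_types = df.dtypes    # data type of each column
preview = df.head(3)        # first rows, to see what each column holds

print(f"{n_rows} rows x {n_cols} columns")
print(column_types)
print(preview)
```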
Handle Missing Values: Check for missing values in your dataset and decide how to handle them. You might choose to impute missing values, remove rows or columns with missing values, or leave them as-is depending on the context.
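The options above might look like this in pandas. It is only a sketch: the data is invented, and median imputation is just one possible choice, not a recommendation for every dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one gap in each column
df = pd.DataFrame({
    "age": [25, np.nan, 47, 51],
    "income": [42000.0, 58000.0, np.nan, 75000.0],
})

# Check: number of missing values per column
missing_counts = df.isna().sum()

# Option 1: impute each numeric column with its own median
imputed = df.fillna(df.median(numeric_only=True))

# Option 2: remove every row that has at least one missing value
dropped = df.dropna()

print(missing_counts)
print(imputed)
print(dropped)
```

Dropping rows is simple but here it discards half of the toy dataset, which is why the choice depends on context.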
Statistical Summary: Compute summary statistics for numerical columns such as mean, median, standard deviation, minimum, maximum, etc. This gives you an overview of the distribution and variability of your data.
This is very important: check the variance of each column and the covariance between columns.
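A small sketch of the summary statistics and the variance and covariance check, assuming pandas. The columns are hypothetical, and the "constant" column is included on purpose to show what a zero-variance column looks like: it carries no information.

```python
import pandas as pd

# Hypothetical numeric data; "constant" never changes
df = pd.DataFrame({
    "height_cm": [150.0, 160.0, 170.0, 180.0],
    "weight_kg": [50.0, 60.0, 70.0, 80.0],
    "constant": [1.0, 1.0, 1.0, 1.0],
})

summary = df.describe()   # count, mean, std, min, quartiles, max
medians = df.median()     # median of each column
variances = df.var()      # sample variance of each column (ddof=1)
covariances = df.cov()    # pairwise sample covariance matrix

# Columns whose variance is exactly zero
zero_variance = variances[variances == 0].index.tolist()

print(summary)
print(variances)
print(covariances)
print("Zero-variance columns:", zero_variance)
```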
Visualize the Data Distribution: Create visualizations such as histograms, box plots, or density plots to understand the distribution of numerical variables. For categorical variables, use bar plots to visualize the frequency of each category.
This is helpful for an in-depth understanding of the data.
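The plots mentioned above could be drawn roughly as follows. This sketch assumes matplotlib and pandas are installed; the data and column names are invented, and the bin count of 5 is an arbitrary choice.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data: one numerical and one categorical column
df = pd.DataFrame({
    "income": [42, 58, 61, 75, 48, 52, 66, 120],
    "city": ["Pune", "Delhi", "Pune", "Mumbai",
             "Delhi", "Pune", "Mumbai", "Pune"],
})

fig, axes = plt.subplots(1, 3, figsize=(12, 3))

# Numerical variable: histogram and box plot
axes[0].hist(df["income"], bins=5)
axes[0].set_title("Histogram of income")
axes[1].boxplot(df["income"])
axes[1].set_title("Box plot of income")

# Categorical variable: bar plot of category frequencies
city_counts = df["city"].value_counts()
axes[2].bar(city_counts.index.tolist(), city_counts.values)
axes[2].set_title("Frequency of city")

fig.tight_layout()
# In a notebook the figure is displayed automatically;
# in a script, save it with fig.savefig and a file name of your choice.
```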
Explore Relationships: Analyze relationships between variables using scatter plots, correlation matrices, or pair plots. This helps you understand how variables are related to each other and identify potential patterns or trends.
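A correlation matrix is often the quickest way to start. The sketch below assumes pandas and uses invented, perfectly linear toy data so the result is easy to read; real data will rarely give correlations of exactly 1 or -1.

```python
import pandas as pd

# Hypothetical data: exam_score rises with hours_studied,
# hours_gaming falls as hours_studied rises
df = pd.DataFrame({
    "hours_studied": [1.0, 2.0, 3.0, 4.0, 5.0],
    "exam_score": [52.0, 54.0, 56.0, 58.0, 60.0],
    "hours_gaming": [5.0, 4.0, 3.0, 2.0, 1.0],
})

# Pearson correlation between every pair of numeric columns
corr = df.corr()

# Correlation of every other column with exam_score, strongest first
with_score = corr["exam_score"].drop("exam_score")
ranked = with_score.reindex(with_score.abs().sort_values(ascending=False).index)

print(corr)
print(ranked)
```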
Detect Outliers: Identify outliers or anomalies in your data. Outliers can significantly impact your analysis and may require special treatment. Visualizations such as box plots or scatter plots can help identify outliers.
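One common numerical companion to the box plot is the 1.5 x IQR rule. The sketch below assumes pandas; the values and the name response_ms are made up, and the 1.5 multiplier is a convention rather than a fixed law.

```python
import pandas as pd

# Hypothetical measurements with one suspicious value
values = pd.Series([10, 12, 11, 13, 12, 11, 95], name="response_ms")

q1 = values.quantile(0.25)   # first quartile
q3 = values.quantile(0.75)   # third quartile
iqr = q3 - q1                # interquartile range

# Fences: points beyond them are flagged as potential outliers
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

outliers = values[(values < lower) | (values > upper)]

print(f"Fences: [{lower}, {upper}]")
print(outliers)
```

Flagging a point is not the same as removing it; whether it is an error or a genuine extreme value still needs a judgment call.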
Feature Engineering: Create new features or transform existing ones to better represent the underlying data. This could involve creating dummy variables for categorical variables, scaling numerical variables, or extracting new features from existing ones.
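The three kinds of transformation named above can be sketched like this, assuming pandas. The columns (city, income, household_size) and the derived feature income_per_person are hypothetical examples.

```python
import pandas as pd

# Hypothetical raw data
df = pd.DataFrame({
    "city": ["Pune", "Delhi", "Pune"],
    "income": [40000.0, 50000.0, 60000.0],
    "household_size": [2, 4, 5],
})

# 1. Dummy variables for a categorical column
encoded = pd.get_dummies(df, columns=["city"], prefix="city")

# 2. Scale a numerical column to zero mean and unit standard deviation
encoded["income_scaled"] = (
    (df["income"] - df["income"].mean()) / df["income"].std()
)

# 3. Extract a new feature from existing ones
encoded["income_per_person"] = df["income"] / df["household_size"]

print(encoded)
```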
Explore Time Series Data (if applicable): If your data includes time series information, analyze trends, seasonality, and other patterns over time. Use time series plots, autocorrelation plots, and decomposition techniques to understand the temporal nature of your data.
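A rough sketch of trend and seasonality checks using only pandas and NumPy. The daily sales series is synthetic (a linear trend plus a repeating 7-day pattern), and the centered rolling mean is a crude stand-in for a proper decomposition such as seasonal_decompose in statsmodels.

```python
import numpy as np
import pandas as pd

# Synthetic daily series: linear trend plus a weekly pattern (8 weeks)
dates = pd.date_range("2024-01-01", periods=56, freq="D")
t = np.arange(56)
weekly = np.tile([0, 1, 2, 3, 2, 1, 0], 8)
sales = pd.Series(100 + 0.5 * t + 5 * weekly, index=dates, name="sales")

# Trend estimate: a centered 7-day rolling mean averages out
# exactly one weekly cycle
trend = sales.rolling(window=7, center=True).mean()

# Remove the trend; what remains is the seasonal pattern
detrended = (sales - trend).dropna()

# Autocorrelation: high at the seasonal lag (7 days), lower elsewhere
acf_lag7 = detrended.autocorr(lag=7)
acf_lag3 = detrended.autocorr(lag=3)

print(f"Autocorrelation at lag 7: {acf_lag7:.3f}")
print(f"Autocorrelation at lag 3: {acf_lag3:.3f}")
```

A strong autocorrelation at lag 7 and a weak or negative one at other lags is the signature of a weekly cycle.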
Segmentation Analysis: Explore how different subgroups within your dataset behave differently. This could involve segmenting your data based on certain criteria and analyzing each segment separately.
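Segmenting by one criterion and comparing the segments can be sketched with a pandas groupby. The customer data and the column names (plan, monthly_spend, churned) are invented for illustration.

```python
import pandas as pd

# Hypothetical customer data
df = pd.DataFrame({
    "plan": ["free", "free", "paid", "paid", "paid"],
    "monthly_spend": [0.0, 5.0, 30.0, 40.0, 50.0],
    "churned": [1, 1, 0, 1, 0],
})

# Segment by plan and summarize each segment separately
segments = df.groupby("plan").agg(
    customers=("monthly_spend", "size"),
    mean_spend=("monthly_spend", "mean"),
    churn_rate=("churned", "mean"),
)

print(segments)
```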
DS_Portfolio/Logistic_Regression_ML1.ipynb at main · gauravry/DS_Portfolio (github.com)
https://colab.research.google.com/drive/1WkDmKOEUueBnwMnjvTC6dWhnbZ1OsiCS?usp=sharing
https://colab.research.google.com/drive/1NvAAqPd7NCRLFmuabq0wNJ3ERGD1GumU?usp=sharing