Reference Answer
Data wrangling is closely related to data preprocessing and is also known as data munging. It is the process of cleaning, transforming, and organizing raw, messy, or unstructured data into a usable format. The main goal of data wrangling is to improve the quality and structure of the dataset so that it can be used for analysis, model building, and other data-driven tasks.
Data wrangling can be a complicated and time-consuming process, but it is critical for businesses that want to make data-driven decisions. By taking the time to wrangle their data, businesses can obtain significant insights about their products, services, and bottom line.
Some of the most common tasks involved in data wrangling are as follows:
- Data Cleaning: Identify and fix or remove errors, inconsistencies, duplicates, and missing values in the dataset.
- Data Transformation: Change the structure, format, or values of the data to match the requirements of the analysis. This may include scaling and normalization or encoding categorical values.
- Data Integration: Combine two or more datasets when the data is scattered across multiple sources and a consolidated analysis is needed.
- Data Restructuring: Reorganize the data to make it more suitable for analysis. Here the data is reshaped into a different format, or new variables are created by aggregating features at different levels.
- Data Enrichment: Add relevant information to the data. This may be external data or a new feature derived by combining or aggregating two or more existing features.
- Quality Assurance: Verify that the data meets defined quality standards and is fit for analysis.
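The tasks above can be illustrated with a minimal pandas sketch. Everything in it is assumed for illustration only: the `orders` and `customers` tables, their column names, the `wrangle` helper, and the choice of min-max scaling and one-hot encoding as the transformation step. A real pipeline would choose its steps based on the data at hand.

```python
import pandas as pd


def wrangle(orders: pd.DataFrame, customers: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper that walks through each wrangling task once."""
    # Data cleaning: normalize inconsistent labels, then drop duplicate rows
    # and rows where the amount is missing.
    clean = orders.copy()
    clean["category"] = clean["category"].str.strip().str.lower()
    clean = clean.drop_duplicates().dropna(subset=["amount"])

    # Data integration: attach customer attributes from a second source.
    merged = clean.merge(customers, on="customer_id", how="left")

    # Data restructuring: aggregate from one row per order to one row per customer.
    per_customer = merged.groupby(["customer_id", "region"], as_index=False).agg(
        total_amount=("amount", "sum"),
        n_orders=("order_id", "count"),
    )

    # Data enrichment: derive a new feature from two existing ones.
    per_customer["avg_order_value"] = (
        per_customer["total_amount"] / per_customer["n_orders"]
    )

    # Data transformation: min-max scale a numeric column ...
    lo = per_customer["total_amount"].min()
    hi = per_customer["total_amount"].max()
    if hi > lo:
        per_customer["total_scaled"] = (per_customer["total_amount"] - lo) / (hi - lo)
    else:
        per_customer["total_scaled"] = 0.0
    # ... and one-hot encode a categorical column.
    result = pd.get_dummies(per_customer, columns=["region"], dtype=int)

    # Quality assurance: simple checks before the data is handed to analysis.
    assert result["customer_id"].is_unique, "expected one row per customer"
    assert result.notna().all().all(), "unexpected missing values"
    return result


# Made-up sample data containing a duplicate row, a missing amount,
# and inconsistently formatted category labels.
orders = pd.DataFrame(
    {
        "order_id": [1, 2, 2, 3, 4],
        "customer_id": [10, 10, 10, 20, 30],
        "category": [" Books", "toys", "toys", "BOOKS ", "toys"],
        "amount": [20.0, 30.0, 30.0, None, 80.0],
    }
)
customers = pd.DataFrame(
    {"customer_id": [10, 20, 30], "region": ["north", "south", "south"]}
)

result = wrangle(orders, customers)
print(result)
```

In this sketch, rows with a missing amount are dropped rather than imputed. That is only one possible cleaning policy, and filling missing values with a mean, median, or domain-specific default may be more appropriate depending on the analysis.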