Open Source Crowdsourcing (开源众包)
Posted 764 days ago
Bounty held in escrow
Using Python, submit an .ipynb notebook (with the necessary comments) and the data source, if any (as a .csv or .xlsx file). The notebook must run correctly in Colab. The detailed requirements are listed below (reference learning materials are in the attachment).

To build the prediction model you can use the sklearn library, which contains a range of useful algorithms (import LinearRegression, SVR, RandomForestRegressor, GradientBoostingRegressor and others, and import the metrics for evaluating the models). You may also use any other Python libraries you like.

1. Load a DataFrame for a prediction problem, or generate a random DataFrame for a regression problem with your own settings (use the sklearn.datasets.make_regression() function).
2. Perform data manipulation.
2.1. Print rows 0 to 10 and columns 0 to 5 of the dataset.
2.2. Analyze the DataFrame. Remove some columns of your choice (use the drop() function).
2.3. Check the data types of the columns.
3. Visualize the data: create a simple line chart.
4. Preprocess the data (see workshop #3).
4.1. Remove the columns with more than 20% missing values, or fill in any missing data.
4.2. Remove unnecessary or duplicated features (use df.duplicated()). Justify the decision to remove the features.
4.3. Convert categorical data to numeric values.
4.4. Drop features with a correlation above 90%.
4.5. Separate the dataset into feature columns and the target column.
4.6. Create the training and testing data.
4.7. Normalize the data.
5. Fit a prediction model of your choice with your own settings. Visualize the results in any way you like (for example a pie plot, scatter plot, histogram or bar plot).
6. Estimate the model performance using the mean absolute error (MAE), root mean squared error (RMSE), relative absolute error (RAE), relative squared error (RSE) and the coefficient of determination, often called R2.
7. Visualize the model performance.
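The data steps (1, 2 and 4) could be sketched with pandas and sklearn as shown below. This is a minimal sketch under stated assumptions, not a reference solution.

- **Hypothetical inputs:** the dataset sizes, the extra columns (`city`, `sparse`, `x0_copy`, `x1_scaled`) and the injected gaps are made up. They exist only so that every preprocessing step has something to act on.
- **Missing data:** gaps are filled with the column median.
- **Categorical data:** it is one-hot encoded with `pd.get_dummies`.
- **Correlation rule:** the 90% threshold is applied to the absolute pairwise correlation between features, with the target excluded.
- **Duplicates:** `df.duplicated()` compares rows, so the frame is transposed to find duplicated features.

Other choices at each step are equally valid.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Step 1: random regression data (sizes and noise level are arbitrary choices).
X_raw, y_raw = make_regression(n_samples=200, n_features=6, noise=10.0, random_state=42)
df = pd.DataFrame(X_raw, columns=[f"x{i}" for i in range(6)])
df["target"] = y_raw

# Hypothetical extra columns so that each preprocessing step has something to do.
rng = np.random.default_rng(0)
df["city"] = rng.choice(["A", "B", "C"], size=len(df))           # categorical feature
df["sparse"] = np.where(rng.random(len(df)) < 0.5, np.nan, 1.0)  # roughly 50% missing
df["x0_copy"] = df["x0"]                                         # duplicated feature
df["x1_scaled"] = 2 * df["x1"] + rng.normal(0, 0.01, len(df))    # almost perfectly correlated with x1
df.loc[rng.choice(len(df), size=5, replace=False), "x2"] = np.nan  # a few gaps to fill

# Steps 2.1-2.3: inspect, drop a column, check the dtypes.
print(df.iloc[0:10, 0:5])     # rows 0-9, columns 0-4
df = df.drop(columns=["x5"])  # arbitrary example of drop()
print(df.dtypes)

# Step 4.1: drop columns with more than 20% missing values, fill the rest with the median.
missing_share = df.isna().mean()
df = df.drop(columns=missing_share[missing_share > 0.20].index)
df = df.fillna(df.median(numeric_only=True))

# Step 4.2: duplicated() on df finds duplicate rows; on df.T it finds duplicate columns.
df = df.drop_duplicates()
df = df.loc[:, ~df.T.duplicated().to_numpy()]

# Step 4.3: one-hot encode the categorical column.
df = pd.get_dummies(df, columns=["city"], drop_first=True, dtype=float)

# Step 4.4: from every feature pair with |correlation| > 0.9, drop the later column.
corr = df.drop(columns=["target"]).corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
to_drop = [col for col in upper.columns if (upper[col] > 0.90).any()]
print("Dropped for high correlation:", to_drop)
df = df.drop(columns=to_drop)

# Steps 4.5-4.6: separate features and target, then split into train and test sets.
X = df.drop(columns=["target"])
y = df["target"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Step 4.7: standardize, fitting the scaler on the training set only to avoid leakage.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
```

The printed `to_drop` list and the removal of `sparse` and `x0_copy` give concrete material for the justification that step 4.2 asks for.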
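For the metrics in step 6, one common set of definitions is given below. The symbols are:

- $y_i$: the true target value
- $\hat{y}_i$: the prediction
- $\bar{y}$: the mean of the true targets
- $n$: the number of test samples

Some texts use "RSE" for the residual standard error instead. The relative squared error used here satisfies $R^2 = 1 - \mathrm{RSE}$.

```latex
\begin{aligned}
\mathrm{MAE}  &= \frac{1}{n}\sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert \\
\mathrm{RMSE} &= \sqrt{\frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2} \\
\mathrm{RAE}  &= \frac{\sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert}{\sum_{i=1}^{n} \lvert y_i - \bar{y} \rvert} \\
\mathrm{RSE}  &= \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} \\
R^2           &= 1 - \mathrm{RSE}
\end{aligned}
```

RAE and RSE compare the model's errors with those of a baseline that always predicts $\bar{y}$. Values below 1 mean the model beats that baseline.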
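Model fitting and evaluation (steps 5 and 6) could look roughly like the sketch below.

- **Data:** a synthetic `make_regression` dataset.
- **Models:** two example models, `LinearRegression` and `RandomForestRegressor`, with arbitrary settings.
- **sklearn metrics:** `mean_absolute_error`, `mean_squared_error` and `r2_score` are used directly, and RMSE is the square root of MSE.
- **Hand-computed metrics:** sklearn.metrics has no built-in RAE or RSE function, so those two are computed with NumPy.
- **Helper:** the name `regression_report` is hypothetical.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


def regression_report(y_true, y_pred):
    """Hypothetical helper: return MAE, RMSE, RAE, RSE and R2 for one set of predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    errors = y_true - y_pred
    deviations = y_true - y_true.mean()  # errors of a "always predict the mean" baseline
    return {
        "MAE": mean_absolute_error(y_true, y_pred),
        "RMSE": np.sqrt(mean_squared_error(y_true, y_pred)),
        "RAE": np.abs(errors).sum() / np.abs(deviations).sum(),
        "RSE": (errors ** 2).sum() / (deviations ** 2).sum(),
        "R2": r2_score(y_true, y_pred),
    }


# Synthetic data and an 80/20 split (settings are arbitrary).
X, y = make_regression(n_samples=300, n_features=8, noise=15.0, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)
scaler = StandardScaler().fit(X_train)  # fit on the training data only
X_train_s, X_test_s = scaler.transform(X_train), scaler.transform(X_test)

models = {
    "LinearRegression": LinearRegression(),
    "RandomForest": RandomForestRegressor(n_estimators=200, random_state=1),
}
results = {}
for name, model in models.items():
    model.fit(X_train_s, y_train)
    results[name] = regression_report(y_test, model.predict(X_test_s))
    print(name, {k: round(float(v), 3) for k, v in results[name].items()})
```

`results` is a plain dictionary of dictionaries. `pd.DataFrame(results)` therefore turns it into a metrics-by-model table, which can be plotted as a grouped bar chart for step 7.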