pandas custom functions

2024/7/7 9:57:51
  1. sort_values and reset_index

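    import pandas as pd
    # Assumes titanic_survival is the Titanic DataFrame already loaded with pd.read_csv
    # Sort by Age, oldest first; rows with a missing Age are placed last
    new_titanic_survival = titanic_survival.sort_values("Age", ascending=False)
    print(new_titanic_survival[0:10])
    # drop=True renumbers the rows from 0 and discards the old index instead of keeping it as a column
    titanic_reindexed = new_titanic_survival.reset_index(drop=True)
    print(titanic_reindexed.iloc[0:10])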
    

    Output: (screenshot not reproduced)
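A minimal sketch on a made-up four-row DataFrame (not the real Titanic data) showing what the two calls do to the index: `sort_values` keeps each row's original label, and `reset_index(drop=True)` renumbers the rows from 0 without adding the old labels as a column.

```python
import pandas as pd

# Made-up toy data, not the real Titanic file
df = pd.DataFrame({"Name": ["A", "B", "C", "D"], "Age": [22.0, None, 38.0, 4.0]})

# Sort by Age, largest first; NaN goes last by default (na_position="last")
sorted_df = df.sort_values("Age", ascending=False)
print(sorted_df.index.tolist())  # the original row labels travel with the rows

# drop=True throws the old index away instead of inserting it as an "index" column
reindexed = sorted_df.reset_index(drop=True)
print(reindexed)
```

Without `drop=True`, `reset_index` keeps the old labels in a new column named `index`, which is rarely wanted after a sort.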

  2. Custom functions

    # This function returns the hundredth item from a series
    def hundredth_row(column):
        # Extract the hundredth item (position 99, since iloc is zero-based)
        hundredth_item = column.iloc[99]
        return hundredth_item
    
    # apply() calls hundredth_row once per column and returns one value per column
    hundredth_row_values = titanic_survival.apply(hundredth_row)
    print(hundredth_row_values)
    

    Output: (screenshot not reproduced)
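`DataFrame.apply` passes each column to the function as a Series and collects the return values into a new Series indexed by column name. A small sketch, assuming a made-up three-row DataFrame and a hypothetical `nth_row` helper (a parameterized version of `hundredth_row`, since the toy data has far fewer than 100 rows); extra keyword arguments given to `apply` are forwarded to the function.

```python
import pandas as pd

def nth_row(column, n=1):
    # Hypothetical helper: return the item at zero-based position n of a column
    return column.iloc[n]

# Made-up toy data
df = pd.DataFrame({"Pclass": [3, 1, 3], "Age": [22.0, 38.0, 26.0]})

# axis=0 (the default): nth_row receives one column at a time; n=1 is forwarded
second_row = df.apply(nth_row, n=1)
print(second_row)
```

Here the result matches `df.iloc[1]`; the point of `apply` is that any per-column function can be plugged in the same way.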

  3. Counting missing values in each column

    # Count the missing (NaN) values in a column
    def null_count(column):
        column_null = pd.isnull(column)
        null = column[column_null]
        return len(null)
    
    column_null_count = titanic_survival.apply(null_count)
    print(column_null_count)
    

    Output: (screenshot not reproduced)
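The apply-based count can be cross-checked against the built-in `isnull().sum()`, which works because `True` counts as 1 when summed. A sketch on made-up toy data:

```python
import numpy as np
import pandas as pd

# Made-up toy data with some missing values
df = pd.DataFrame({
    "Age": [22.0, np.nan, np.nan],
    "Cabin": [np.nan, "C85", np.nan],
    "Pclass": [3, 1, 3],
})

def null_count(column):
    # Boolean mask -> keep only the missing entries -> count them
    return len(column[pd.isnull(column)])

by_apply = df.apply(null_count)
# Vectorized equivalent: one boolean per cell, summed per column
by_builtin = df.isnull().sum()
print(by_apply)
print(by_builtin)
```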

  4. Exercise: applying a function to each row

    # By passing in the axis=1 argument, we can use the DataFrame.apply() method to iterate over rows instead of columns.
    def which_class(row):
        pclass = row['Pclass']
        if pd.isnull(pclass):
            return "Unknown"
        elif pclass == 1:
            return "First Class"
        elif pclass == 2:
            return "Second Class"
        elif pclass == 3:
            return "Third Class"
    
    classes = titanic_survival.apply(which_class, axis=1)
    print(classes)
    

    Output: (screenshot not reproduced)
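The row-wise `apply` above calls a Python function once per row. When the label depends on a single column, `Series.map` with a dictionary is a common vectorized alternative: values missing from the dictionary (including NaN) become NaN, which `fillna` then turns into "Unknown". A sketch on made-up data; `class_names` is a hypothetical name.

```python
import numpy as np
import pandas as pd

# Made-up toy data; the missing Pclass makes the column float (1.0, 3.0, NaN, 2.0)
df = pd.DataFrame({"Pclass": [1, 3, np.nan, 2]})

def which_class(row):
    # Same logic as the exercise: row is a Series holding one passenger's fields
    pclass = row["Pclass"]
    if pd.isnull(pclass):
        return "Unknown"
    elif pclass == 1:
        return "First Class"
    elif pclass == 2:
        return "Second Class"
    elif pclass == 3:
        return "Third Class"

# axis=1: the function is called once per row
row_wise = df.apply(which_class, axis=1)

# Vectorized alternative on one column; unmatched values (here NaN) map to NaN
class_names = {1: "First Class", 2: "Second Class", 3: "Third Class"}
mapped = df["Pclass"].map(class_names).fillna("Unknown")

print(row_wise.tolist())
print(mapped.tolist())
```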

  5. Discretizing a continuous value (age groups)

    def is_minor(row):
        # A missing Age (NaN) compares as False, so it is counted as not a minor
        if row["Age"] < 18:
            return True
        else:
            return False
    
    minors = titanic_survival.apply(is_minor, axis=1)
    # print(minors)
    
    def generate_age_label(row):
        age = row["Age"]
        if pd.isnull(age):
            return "unknown"
        elif age < 18:
            return "minor"
        else:
            return "adult"
    
    age_labels = titanic_survival.apply(generate_age_label, axis=1)
    print(age_labels)
    

    Output: (screenshot not reproduced)
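The same age bucketing can be sketched with `pd.cut`, which assigns each value to a bin. The bin edges below are an assumption chosen to reproduce the `age < 18` rule, and the ages are toy values. The comparison `ages < 18` also shows why `is_minor` treats a missing age as not a minor.

```python
import numpy as np
import pandas as pd

# Made-up ages, including a missing value and the boundary value 18
ages = pd.Series([4.0, 22.0, np.nan, 18.0])

# Elementwise comparison: NaN < 18 is False, just like in is_minor
minors = ages < 18

# right=False makes the bins left-closed: [0, 18) -> "minor", [18, inf) -> "adult"
labels = pd.cut(ages, bins=[0, 18, np.inf], right=False, labels=["minor", "adult"])
# NaN falls into no bin, so add an explicit "unknown" category and fill it in
labels = labels.cat.add_categories(["unknown"]).fillna("unknown")

print(minors.tolist())
print(labels.tolist())
```

`right=False` matters at the boundary: with the default right-closed bins, an age of exactly 18 would land in the "minor" bin.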

  6. Adding a column

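    titanic_survival['age_labels'] = age_labels
    # pivot_table averages "Survived" (aggfunc defaults to mean), giving the survival rate of each age group
    age_group_survival = titanic_survival.pivot_table(index="age_labels", values="Survived")
    print(age_group_survival)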
    

    Output: (screenshot not reproduced)
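Because `pivot_table` aggregates with the mean by default and `Survived` holds 0 or 1, each cell is the fraction of passengers in that group who survived. A sketch on made-up rows, checked against the equivalent `groupby`:

```python
import pandas as pd

# Made-up toy rows, not real Titanic records
df = pd.DataFrame({
    "age_labels": ["minor", "adult", "adult", "minor", "unknown"],
    "Survived": [1, 0, 1, 1, 0],
})

# Default aggfunc="mean": one row per label, sorted by label
by_pivot = df.pivot_table(index="age_labels", values="Survived")
# Same numbers via groupby; returns a Series instead of a DataFrame
by_groupby = df.groupby("age_labels")["Survived"].mean()

print(by_pivot)
print(by_groupby)
```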


http://www.niftyadmin.cn/n/4714810.html
