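🍦 Files are the abstraction an operating system gives users and applications for working with data on disk. No language can avoid file handling, and Python is no exception.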

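1 Text

Text
- Plain-text formats such as .txt, .csv and .log can be handled with the standard library.
- .txt and .log files are opened with open() and read with read(), readline() or readlines().
- .csv files are parsed and written with the built-in csv module.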

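1-1 Opening files

Opening files
- open() is the key function for file handling; its two most-used parameters are the file name and the mode.
- Access modes
  - x: create the file; raises FileExistsError if it already exists.
  - r: the default; open for reading; raises FileNotFoundError if the file does not exist.
  - w: open for writing; creates the file if it does not exist and truncates it if it does.
  - a: open for appending; creates the file if it does not exist.
- Content modes
  - t: the default; text mode.
  - b: binary mode, for example for images.
- To open a file for reading, the file name alone is enough.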
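
The mode letters can be tried out side by side. The following is a minimal sketch, not part of the original article: it assumes a throwaway temporary directory and a hypothetical file name demo.txt, and it uses the with statement so each file is closed automatically.

```python
import os
import tempfile

# Work in a throwaway directory so the sketch does not touch real files.
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "demo.txt")   # hypothetical file name

# "x" creates the file and fails if it already exists.
with open(path, "x", encoding="utf-8") as f:
    f.write("first line\n")

try:
    open(path, "x")
    created_twice = True
except FileExistsError:
    created_twice = False

# "a" appends to the end; "w" would truncate the file first.
with open(path, "a", encoding="utf-8") as f:
    f.write("second line\n")

# "r" and "t" are the defaults, so open(path) means open(path, "rt").
with open(path, encoding="utf-8") as f:
    text = f.read()

# "b" returns raw bytes instead of decoded text.
with open(path, "rb") as f:
    raw = f.read()

print(text)
print(type(raw).__name__)
```

Passing encoding="utf-8" explicitly is a choice made for this sketch; without it, text mode falls back to a platform-dependent default.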

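1-2 Reading files

Reading files
- read()
  - open() returns a file object whose read() method returns the file's contents.
  - If the file is in another location, pass its path to open(); read() also accepts the number of characters to return.
- readline()
  - Returns one line of the file per call, so calling it twice returns the first two lines.
  - Iterating over the file object reads the whole file line by line; closing the file when done is good practice.
  - Because of buffering, changes written to a file may not show up until the file is closed.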

(1) read()

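x = open("file/txt_file.txt", "r")                         # The file must exist, otherwise open() raises an error
print(x.read())                                            # Read the whole file
x.close()                                                  # Close the file when done

y = open("file/txt_file.txt", "r")                         # Relative path
print(y.read(3))                                           # Return the first 3 characters
y.close()

# f = open("D:\\txt_file.txt", "r")                        # Absolute path
# print(f.read())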

(2) readline()

x = open("file/txt_file.txt", "r")
print(x.readline())                                       # Read the first line
print(x.readline())                                       # A second call reads the second line

y = open("file/txt_file.txt", "r")
for i in y:                                               # Loop over the file line by line
    print(i)

x.close()                                                 # Close the files when done
y.close()
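
Closing files by hand is easy to forget. As a hedged alternative to the explicit close() calls, the sketch below uses a with block, which closes the file on exit. The sample file and its temporary location are assumptions made so the example is self-contained.

```python
import os
import tempfile

# Create a small sample file so the sketch is self-contained.
path = os.path.join(tempfile.mkdtemp(), "txt_file.txt")  # hypothetical location
with open(path, "w", encoding="utf-8") as f:
    f.write("line 1\nline 2\nline 3\n")

# The with block closes the file automatically, even if an exception is raised.
with open(path, "r", encoding="utf-8") as f:
    first = f.readline()                       # One line, including its trailing "\n"
    rest = [line.rstrip("\n") for line in f]   # Iteration continues after that line

print(first, end="")
print(rest)
print(f.closed)
```

Each line read from a text file keeps its newline, which is why print(i) in a plain loop shows blank lines between rows; stripping it or passing end="" avoids that.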

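1-2 Writing files

Writing files
- To write to an existing file, pass a mode to open().
  - a: append the content to the end of the file.
  - w: overwrite any existing content.
- To create a new file, call open() with one of the following modes.
  - x: create the file; raises an error if it already exists.
  - w: open for writing; creates the file if it does not exist.
  - a: open for appending; creates the file if it does not exist.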

x = open("file/text_file.txt", "a")                       # Open the file and append to it; created if it does not exist
x.write("Now the file has more content!")
x.close()
x = open("file/text_file.txt", "r")
print(x.read())

y = open("file/text_file.txt", "w")                       # Open the file and overwrite its content; created if it does not exist
y.write("Woops! I have deleted the content!")
y.close()
y = open("file/text_file.txt", "r")
print(y.read())
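
Two details of write() are easy to miss: it returns the number of characters written, and neither write() nor writelines() adds line breaks. A minimal sketch, assuming a temporary file rather than the article's file/text_file.txt:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "text_file.txt")  # hypothetical location

with open(path, "w", encoding="utf-8") as f:
    f.write("old content\n")

# "w" truncates: the previous content is gone after this block.
with open(path, "w", encoding="utf-8") as f:
    written = f.write("header\n")            # write() returns the character count
    f.writelines(["row 1\n", "row 2\n"])     # writelines() adds no newlines itself

# "a" keeps what is there and adds to the end.
with open(path, "a", encoding="utf-8") as f:
    f.write("row 3\n")

with open(path, encoding="utf-8") as f:
    lines = f.read().splitlines()

print(written)
print(lines)
```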

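1-3 Deleting files

Deleting files
- The os module provides os.remove() for deleting a file and os.rmdir() for deleting a folder, which must be empty.
- Import the module, then call os.remove() to delete the file.
- To avoid an error, check that the file exists before trying to delete it.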

import os

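os.rmdir("test")                                          # Delete a folder; it must exist and be empty
os.remove("file/test_txt.txt")                            # Delete a file; raises FileNotFoundError if it is missing

x = open("file/test_txt.txt", "a")                        # Open for appending; creates the file if needed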
x.write("Now the file has more content!")
x.close()

if os.path.exists("file/test_txt.txt"):
    os.remove("file/test_txt.txt")                        # The file exists, so delete it
else:
    print("The file does not exist.")
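
Checking with os.path.exists() first is one option; another is to attempt the removal and catch the error. The sketch below shows that approach and also what happens with a non-empty folder. It is an illustration only: the paths live in a temporary directory, and shutil.rmtree() is a standard-library call that the original text does not use.

```python
import os
import shutil
import tempfile

base = tempfile.mkdtemp()                      # Throwaway working directory
file_path = os.path.join(base, "test_txt.txt")
folder = os.path.join(base, "test")            # Hypothetical folder name

# Removing a missing file raises FileNotFoundError; catching it avoids the
# gap between an os.path.exists() check and the actual removal.
try:
    os.remove(file_path)
    removed = True
except FileNotFoundError:
    removed = False

# os.rmdir() refuses non-empty folders with an OSError.
os.makedirs(folder)
with open(os.path.join(folder, "note.txt"), "w", encoding="utf-8") as f:
    f.write("temporary\n")

try:
    os.rmdir(folder)
    rmdir_worked = True
except OSError:
    rmdir_worked = False

# shutil.rmtree() deletes the folder and everything in it, so use it with care.
shutil.rmtree(folder)
print(removed, rmdir_worked, os.path.exists(folder))
```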

2 CSV

import csv


def write_csv(file_path, data):
    with open(file_path, "w", newline="") as file:
        writer = csv.writer(file)                         # Write the rows
        writer.writerows(data)


def read_csv(file_path):
    with open(file_path, "r", newline="") as file:        # newline="" is what the csv module expects
        reader = csv.reader(file)                         # Read the rows
        for row in reader:
            print(row)


def delete_rows(file_path, rows_to_delete):
    with open(file_path, "r") as file:                    # Load every row, then rewrite the file without the deleted ones
        reader = csv.reader(file)
        rows = list(reader)
    with open(file_path, "w", newline="") as file:
        writer = csv.writer(file)
        for i, row in enumerate(rows):
            if i+1 not in rows_to_delete:                 # Row numbers start at 1
                writer.writerow(row)


if __name__ == "__main__":
    data = [
        ["Lily", "28", "F"],
        ["John", "25", "M"], ["Lucy", "30", "F"],
        ["Ross", "20", "F"], ["Yves", "26", "M"]
    ]
    file_name = r"file/csv_file.csv"
    write_csv(file_name, data)
    read_csv(file_name)
    print("--------------------")
    delete_rows(file_name, [2, 4])                        # Delete rows 2 and 4; CSV has no direct way to delete a column
    read_csv(file_name)
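
CSV has no built-in column delete, but a column can still be dropped by rewriting every row without it. The helper below, delete_column, is a hypothetical addition in the same style as delete_rows, written against a temporary file.

```python
import csv
import os
import tempfile


def delete_column(file_path, column_index):
    """Rewrite the CSV file without the given 0-based column (hypothetical helper)."""
    with open(file_path, "r", newline="", encoding="utf-8") as file:
        rows = list(csv.reader(file))
    with open(file_path, "w", newline="", encoding="utf-8") as file:
        writer = csv.writer(file)
        for row in rows:
            # Keep every cell except the one at column_index.
            writer.writerow([cell for i, cell in enumerate(row) if i != column_index])


path = os.path.join(tempfile.mkdtemp(), "csv_file.csv")
with open(path, "w", newline="", encoding="utf-8") as file:
    csv.writer(file).writerows([["Lily", "28", "F"], ["John", "25", "M"]])

delete_column(path, 1)                          # Drop the age column

with open(path, "r", newline="", encoding="utf-8") as file:
    result = list(csv.reader(file))
print(result)
```

Like delete_rows, this loads the whole file into memory first, which is fine for small files but not a good fit for very large ones.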

3 XML

import xml.etree.ElementTree as ET


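def write_xml(file_path, data):                           # Write the data
    root = ET.Element("data")
    for row_id, row in enumerate(data, start=1):
        child = ET.SubElement(root, "row", id=str(row_id))  # 1-based id attribute, used by delete_xml() to find rows
        for column in row:
            sub_element = ET.SubElement(child, "column")
            sub_element.text = column
    tree = ET.ElementTree(root)
    tree.write(file_path, encoding="utf-8", xml_declaration=True)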


def read_xml(file_path):                                  # Read the data
    tree = ET.parse(file_path)
    root = tree.getroot()                                 # Get the root element
    for row in root.iter("row"):
        for column in row.iter("column"):
            print(column.text)


def delete_xml(file_path, row_id=None, column_id=None):
    tree = ET.parse(file_path)                            # Parse the XML file
    root = tree.getroot()                                 # Get the root element

    if row_id is not None:                                # Delete the given row
        for child in root.findall("row"):
            if child.attrib.get("id") == str(row_id):     # Match the row by its id attribute
                root.remove(child)                        # Remove the row from the tree

    if column_id is not None:                             # Delete the given column (1-based) from every row
        for row in root.findall("row"):
            columns = row.findall("column")
            if len(columns) >= column_id:                 # Skip rows that are too short
                row.remove(columns[column_id - 1])        # Remove the column from the tree

    tree.write(file_path)                                 # Write the modified tree back to the XML file


if __name__ == "__main__":
    data = [
        ["Lily", "AAAA", "1111"],
        ["John", "BBBB", "2222"], ["Lucy", "DDDD", "4444"],
        ["Ross", "CCCC", "3333"], ["Yves", "EEEE", "5555"]
    ]
    file_name = r"file/xml_file.xml"
    write_xml(file_name, data)
    read_xml(file_name)
    print("----")
    delete_xml(file_name, row_id=2, column_id=3)          # Delete row 2 and column 3
    read_xml(file_name)
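
Rows can also be addressed by position rather than by an id attribute. The sketch below builds the same data/row/column layout in memory and removes the second row; delete_row_by_position and rows_as_lists are hypothetical helper names, not functions from the article.

```python
import xml.etree.ElementTree as ET

# Build the same <data><row><column> layout in memory.
root = ET.Element("data")
for values in [["Lily", "AAAA"], ["John", "BBBB"], ["Lucy", "DDDD"]]:
    row = ET.SubElement(root, "row")
    for value in values:
        ET.SubElement(row, "column").text = value


def delete_row_by_position(root, position):
    """Remove the row at a 1-based position (hypothetical helper)."""
    rows = root.findall("row")        # findall returns a list, so removal is safe
    if 1 <= position <= len(rows):
        root.remove(rows[position - 1])


def rows_as_lists(root):
    """Return every row as a list of its column texts."""
    return [[col.text for col in row.findall("column")] for row in root.findall("row")]


delete_row_by_position(root, 2)
print(rows_as_lists(root))
print(ET.tostring(root, encoding="unicode"))
```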

4 JSON

import json


def write_json(file_path, data):                          # Write the data
    with open(file_path, "w") as file:
        json.dump(data, file)


def read_json(file_path):                                 # Read the data
    with open(file_path, "r") as file:
        data = json.load(file)
        return data


def delete_json(file_path):
    with open(file_path, "r") as file:
        data = json.load(file)                            # Read the JSON data
    for item in data:                                     # Remove the "Age" key from records where it equals "30"
        if item.get("Age") == "30":
            del item["Age"]
    with open(file_path, "w") as file:                    # Write the modified data back to the file
        json.dump(data, file)


if __name__ == "__main__":
    data = [
        {"Name": "John", "Age": "25", "Sex": "M"},
        {"Name": "Lucy", "Age": "30", "Sex": "F"},
        {"Name": "Ross", "Age": "20", "Sex": "F"},
        {"Name": "Yves", "Age": "26", "Sex": "M"}
    ]
    file_name = r"file/json_file.json"
    write_json(file_name, data)
    print(read_json(file_name))
    delete_json(file_name)
    print(read_json(file_name))
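
By default json.dump() escapes non-ASCII characters as \uXXXX sequences and writes everything on one line. The sketch below shows the ensure_ascii and indent options, which can make the file easier to read; the record and the temporary path are made-up sample values.

```python
import json
import os
import tempfile

data = [{"Name": "Zoë", "City": "München"}]     # Hypothetical record with non-ASCII text
path = os.path.join(tempfile.mkdtemp(), "json_file.json")

# ensure_ascii=False writes the characters as-is instead of \uXXXX escapes,
# and indent=2 spreads the output over several lines.
with open(path, "w", encoding="utf-8") as file:
    json.dump(data, file, ensure_ascii=False, indent=2)

with open(path, "r", encoding="utf-8") as file:
    text = file.read()

loaded = json.loads(text)
print(text)
print(loaded == data)
```

When ensure_ascii=False is used, opening the file with an explicit encoding such as UTF-8 matters, because the platform default may not be able to represent every character.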

5 Excel

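Excel
- xlwt writes data to Excel files. It supports only the .xls format, not .xlsx. Install it with pip install xlwt.
- xlrd reads data from Excel files. Versions 2.0 and later read only .xls; to read .xlsx with xlrd, install the older release with pip install xlrd==1.2.0.
  - Python 3.9 removed the getiterator() method from xml.etree.ElementTree, so xlrd 1.2.0 fails on Python 3.9+ when reading .xlsx files.
  - The error message looks like: AttributeError: 'ElementTree' object has no attribute 'getiterator'.
  - A workaround is to open \Lib\site-packages\xlrd\xlsx.py under the Python directory and change both getiterator() calls to iter().
- Common cell data types: empty, string (text), number, date, boolean, error, blank.
- The three main Excel objects: Cell, Sheet and Workbook.
  - style_compression=0: whether duplicate styles are compressed; 0 turns compression off.
  - cell_overwrite_ok=True: whether a cell may be written more than once; the default is False.
  - worksheet.write(row, column, value): writes value to the cell at the given 0-based row and column.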

import xlwt
import xlrd

wbook = xlwt.Workbook(encoding="utf-8", style_compression=0)
sheet = wbook.add_sheet("tabname", cell_overwrite_ok=True)
sheet.write(0, 0, "xlwt")                                 # Write "xlwt" to row 1, column 1
wbook.save("file/excel_file.xls")                         # Save to file/excel_file.xls under the current directory

data = xlrd.open_workbook("file/excel_file.xls")          # Read the data back from excel_file.xls
print(data)

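5-1 Writing styles

Writing styles
- In xlwt, cell styles are set mainly through the XFStyle class.
- font: the font, a Font instance.
- pattern: the fill, a Pattern instance.
- borders: the borders, a Borders instance.
- alignment: the alignment, an Alignment instance.
- protection: the protection settings, a Protection instance.
- num_format_str: the number format, a str.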

(1) Font

import xlwt

wbook = xlwt.Workbook(encoding="utf-8")
sheet = wbook.add_sheet("font", cell_overwrite_ok=True)

style = xlwt.XFStyle()                                    # Initialize the style
fonts = xlwt.Font()                                       # Create a font for the style

fonts.name = "Arial"                                      # Font name
fonts.height = 200                                        # Units of 1/20 point: 200 / 20 = 10 pt
fonts.bold = True                                         # Bold
fonts.underline = True                                    # Underline
fonts.struck_out = True                                   # Strikethrough
fonts.italic = True                                       # Italic
fonts.colour_index = 4                                    # Font color (palette index)

style.font = fonts                                        # Attach the font to the style
sheet.write(0, 0, "no style")                             # Add data to the sheet
sheet.write(1, 0, "font", style)

wbook.save("file/excel_file.xls")                         # Save to file/excel_file.xls under the current directory

(2) Background color

import xlwt

wbook = xlwt.Workbook(encoding="utf-8")
sheet = wbook.add_sheet("pattern", cell_overwrite_ok=True)

style = xlwt.XFStyle()                                    # Initialize the style
pattern = xlwt.Pattern()                                  # Create a fill pattern for the style

pattern.pattern = xlwt.Pattern.SOLID_PATTERN              # Use a solid fill
pattern.pattern_fore_colour = 3                           # Palette index; each value is a different background color

style.pattern = pattern                                   # Attach the pattern to the style
sheet.write(0, 0, "no style")                             # Add data to the sheet
sheet.write(1, 0, "pattern", style)

wbook.save("file/excel_file.xls")                         # Save to file/excel_file.xls under the current directory

(3) Borders

import xlwt

wbook = xlwt.Workbook(encoding="utf-8")
sheet = wbook.add_sheet("borders", cell_overwrite_ok=True)

style = xlwt.XFStyle()                                    # Initialize the style
borders = xlwt.Borders()                                  # Create border settings for the style

borders.left = xlwt.Borders.THIN                          # Thin line on all four sides
borders.right = xlwt.Borders.THIN
borders.top = xlwt.Borders.THIN
borders.bottom = xlwt.Borders.THIN

style.borders = borders                                   # Attach the borders to the style
sheet.write(0, 0, "no style")                             # Add data to the sheet
sheet.write(1, 0, "borders", style)

wbook.save("file/excel_file.xls")                         # Save to file/excel_file.xls under the current directory

(4) Alignment

import xlwt

wbook = xlwt.Workbook(encoding="utf-8")
sheet = wbook.add_sheet("alignment", cell_overwrite_ok=True)

style = xlwt.XFStyle()                                    # Initialize the style
alignment = xlwt.Alignment()                              # Create alignment settings for the style

alignment.vert = 0x01                                     # Vertical: 0x00 top, 0x01 center, 0x02 bottom
alignment.horz = 0x03                                     # Horizontal: 0x01 left, 0x02 center, 0x03 right
alignment.wrap = 1                                        # Wrap text automatically

style.alignment = alignment                               # Attach the alignment to the style
sheet.write(0, 0, "no style")                             # Add data to the sheet
sheet.write(1, 0, "alignment\nalignment\nalignment", style)

wbook.save("file/excel_file.xls")                         # Save to file/excel_file.xls under the current directory

(5) Cell format

import xlwt
from datetime import datetime

wbook = xlwt.Workbook(encoding="utf-8")
sheet = wbook.add_sheet("format", cell_overwrite_ok=True)

data = "2022-02-02"
style = xlwt.XFStyle()                                    # Initialize the style
num_format_str = r"YYYY\/MM\/DD"                          # Raw string keeps the backslashes; formats are listed in ..\site-packages\xlwt\Style.py
style.num_format_str = num_format_str

sheet.write(0, 0, data)                                   # Add data to the sheet
sheet.write(1, 0, datetime.strptime(data, "%Y-%m-%d").date(), style)

wbook.save("file/excel_file.xls")                         # Save to file/excel_file.xls under the current directory

(6) Column width and row height

import xlwt

wbook = xlwt.Workbook(encoding="utf-8")
sheet = wbook.add_sheet("col", cell_overwrite_ok=True)

sheet.write(0, 0, "no style")                             # Add data to the sheet
sheet.write(0, 1, "col")
sheet.write(1, 0, "col")
sheet.col(1).width = 256*30                               # Width of the second column (index 1), about 30 characters
sheet.row(1).set_style(xlwt.easyxf("font: height 720"))   # 720 / 20 = 36 pt font; the row is sized to fit it, which gave 44.4 pt here

wbook.save("file/excel_file.xls")                         # Save to file/excel_file.xls under the current directory
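
The numbers used for font height and column width follow from xlwt's units: font heights are given in twentieths of a point, and column widths in 1/256 of a character width. The two helpers below are hypothetical names introduced only to make the arithmetic explicit; they do not call xlwt.

```python
def points_to_font_height(points):
    """Font height in xlwt's 1/20-point units (hypothetical helper)."""
    return int(points * 20)


def chars_to_col_width(chars):
    """Column width in xlwt's 1/256-character units (hypothetical helper)."""
    return 256 * chars


print(points_to_font_height(10))    # 200, the value assigned to fonts.height
print(points_to_font_height(36))    # 720, the value in "font: height 720"
print(chars_to_col_width(30))       # 7680, the same as 256*30 for col(1).width
```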

(7) Merging rows and columns

import xlwt

wbook = xlwt.Workbook(encoding="utf-8")
sheet = wbook.add_sheet("merge", cell_overwrite_ok=True)

sheet.write(0, 0, "no style")                              # Add data to the sheet
sheet.write_merge(0, 3, 1, 4, "merge")                     # Merge rows 1-4 and columns 2-5 into one cell

wbook.save("file/excel_file.xls")                          # Save to file/excel_file.xls under the current directory

5-2 Reading data

# file/excel_file.xlsx                                     # Leave this line out of the workbook and rename the sheet tab to "info"
Name  Age Language  Registration date
Sims  30  C         2020-12-20
Peck  29  C#        2020-12-21
Lane  28  C++       2020-12-22
Fred  27  Java      2020-12-23
Miya  26  Python    2020-12-24

(1) Getting a sheet

import xlrd

data = xlrd.open_workbook("file/excel_file.xlsx")         # Read the workbook

sheet1 = data.sheets()[0]                                 # Get the first sheet by index
sheet2 = data.sheet_by_index(0)                           # Get the first sheet by index
sheet3 = data.sheet_by_name("info")                       # Get a sheet by name; each of these lookups fails if the sheet does not exist
sheetNames = data.sheet_names()                           # Names of all sheets in the file
print(sheet1)
print(sheet2)
print(sheet3)
print("All sheet page names: " + str(sheetNames))

(2) Working with rows

import xlrd

data = xlrd.open_workbook("file/excel_file.xlsx")         # Read the workbook
sheet = data.sheet_by_index(0)                            # Get the first sheet

nrows = sheet.nrows                                       # Number of valid rows in the sheet
print("Number of valid nrows: " + str(nrows))

cells = sheet.row_len(0)                                  # Number of valid cells in the first row
print("Number of valid cells: " + str(cells))

firstLine = sheet.row_values(0)                           # Values of the first row
print(firstLine)

for r in range(nrows):                                    # Values of every row
    print(sheet.row_values(r))

print(sheet.row_slice(0))                                 # List of all Cell objects in the first row

(3) Working with columns

import xlrd

data = xlrd.open_workbook("file/excel_file.xlsx")         # Read the workbook
sheet = data.sheet_by_index(0)                            # Get the first sheet

ncols = sheet.ncols                                       # Number of valid columns in the sheet
print("Number of valid ncols: " + str(ncols))
print(sheet.col(0, 0, 2))                                 # Cell objects from rows 1-2 of the first column
print(sheet.col_slice(0, 0, 2))                           # Cell objects from rows 1-2 of the first column
print(sheet.col_values(0, 0, 2))                          # Cell values from rows 1-2 of the first column

(4) Getting a cell

import xlrd

data = xlrd.open_workbook("file/excel_file.xlsx")          # Read the workbook
sheet = data.sheet_by_index(0)                             # Get the first sheet

print(sheet.cell(0, 0))                                    # Cell object at row 1, column 1
print("Data: " + str(sheet.cell_value(0, 0)))              # Cell value at row 1, column 1
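
If the "Registration date" column is stored as real Excel dates rather than text, xlrd returns each value as a floating-point serial number. In the default 1900 date system, serials for dates from March 1900 onward count days from 1899-12-30. xlrd has its own converter, xlrd.xldate_as_tuple(); the sketch below only illustrates the arithmetic in plain Python, and both function names are hypothetical.

```python
from datetime import date, datetime, timedelta

EXCEL_EPOCH = datetime(1899, 12, 30)   # Day zero of the 1900 date system


def serial_to_datetime(serial):
    """Convert an Excel serial number (days, fraction = time of day) to datetime."""
    return EXCEL_EPOCH + timedelta(days=serial)


def date_to_serial(day):
    """Convert a date back to its whole-number Excel serial."""
    return (datetime(day.year, day.month, day.day) - EXCEL_EPOCH).days


print(serial_to_datetime(43831))              # 2020-01-01 00:00:00
print(date_to_serial(date(2020, 12, 20)))     # 44185
print(serial_to_datetime(44185.5))            # 2020-12-20 12:00:00
```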

6 SQLite

import sqlite3


def connect_to_database(file_path):                       # Connect to the database; the file is created if missing
    conn = sqlite3.connect(file_path)
    return conn


def create_table(conn):                                   # Create the table
    cursor = conn.cursor()
    cursor.execute("""CREATE TABLE IF NOT EXISTS employees
                      (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)""")
    conn.commit()
    cursor.close()


def insert_data(conn, name, age):                         # Insert a row
    cursor = conn.cursor()
    cursor.execute("INSERT INTO employees (name, age) VALUES (?, ?)", (name, age))
    conn.commit()
    cursor.close()


def select_data(conn):                                    # Query the rows
    cursor = conn.cursor()
    cursor.execute("SELECT * FROM employees")
    rows = cursor.fetchall()
    for row in rows:
        print(row)
    cursor.close()


def update_data(conn, name, age):                         # Update a row
    cursor = conn.cursor()
    cursor.execute("UPDATE employees SET age = ? WHERE name = ?", (age, name))
    conn.commit()
    cursor.close()


def delete_data(conn, name):                              # Delete a row
    cursor = conn.cursor()
    cursor.execute("DELETE FROM employees WHERE name = ?", (name,))
    conn.commit()
    cursor.close()


def close_connection(conn):                               # Close the database connection
    conn.close()


if __name__ == "__main__":
    file_name = r"file/sqlite3_file.db"
    conn = connect_to_database(file_name)
    create_table(conn)
    insert_data(conn, "John", 25)
    insert_data(conn, "Jane", 30)
    select_data(conn)
    print("---------------")
    update_data(conn, "John", 35)
    delete_data(conn, "Jane")
    select_data(conn)
    close_connection(conn)
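
The helpers above call conn.commit() after every statement. As a hedged alternative, a sqlite3 connection can be used as a context manager, which commits when the block succeeds and rolls back when it raises. The sketch uses an in-memory database and a simulated failure, both of which are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")     # In-memory database, nothing is written to disk
conn.execute(
    "CREATE TABLE IF NOT EXISTS employees "
    "(id INTEGER PRIMARY KEY, name TEXT, age INTEGER)"
)

# "with conn" commits on success and rolls back if the block raises.
with conn:
    conn.executemany(
        "INSERT INTO employees (name, age) VALUES (?, ?)",
        [("John", 25), ("Jane", 30)],
    )

try:
    with conn:
        conn.execute("UPDATE employees SET age = ? WHERE name = ?", (35, "John"))
        raise RuntimeError("simulated failure")   # Forces a rollback of the update
except RuntimeError:
    pass

rows = conn.execute("SELECT name, age FROM employees ORDER BY id").fetchall()
print(rows)
conn.close()
```

Note that the with block manages the transaction only; it does not close the connection, so conn.close() is still needed.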

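7 Compressed files

Compressed files
- ZIP archives can be created and extracted with Python's built-in zipfile module.
- Other formats such as RAR and 7z need third-party libraries; TAR.GZ is also covered by the built-in tarfile module.
  - py7zr: pip install py7zr, for 7z compression and extraction.
  - rarfile: pip install rarfile, for reading and extracting RAR archives.
  - pyunpack: pip install pyunpack, for extracting TAR.GZ, TAR.BZ2, TAR, ZIP, 7Z, ISO and other formats.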

import os
import zipfile


def zip_file(file_path, zip_file):                        # Compress a single file
    with zipfile.ZipFile(zip_file, "w", zipfile.ZIP_DEFLATED) as zipf:
        zipf.write(file_path, arcname=os.path.basename(file_path))


def zip_files(file_path, zip_file):                       # Compress several files
    with zipfile.ZipFile(zip_file, "w", zipfile.ZIP_DEFLATED) as zipf:
        for file in file_path:
            zipf.write(file, arcname=os.path.basename(file))


def zip_folder(zip_file, folder_path):                    # Compress a folder, flattening its structure
    with zipfile.ZipFile(zip_file, "w", zipfile.ZIP_DEFLATED) as zipf:
        for root, dirs, files in os.walk(folder_path):
            for file in files:                            # arcname=file puts every file at the archive root
                zipf.write(os.path.join(root, file), arcname=file)


def zip_folders(zip_file, folder_path):                   # Compress a folder and keep its structure
    with zipfile.ZipFile(zip_file, "w", zipfile.ZIP_DEFLATED) as zipf:
        for root, dirs, files in os.walk(folder_path):
            for file in files:
                zipf.write(
                    os.path.join(root, file),
                    os.path.relpath(                      # Key to keeping the folder structure: build a relative path
                        os.path.join(root, file), os.path.join(folder_path, "..")
                    )                                     # from the parent of folder_path to os.path.join(root, file)
                )


def zip_folders_ext(zip_file, folder_path, extensions):
    with zipfile.ZipFile(zip_file, "w", zipfile.ZIP_DEFLATED) as zipf:
        for root, dirs, files in os.walk(folder_path):
            for file in files:                            # Compress only files with the given extensions, keeping the folder structure
                if file.endswith(tuple(extensions)):
                    zipf.write(
                        os.path.join(root, file), os.path.relpath(
                            os.path.join(root, file), os.path.join(folder_path, "..")
                        )
                    )


def unzip_file(unzip_file, folder_path):                  # Extract an archive; the folder structure is kept by default
    with zipfile.ZipFile(unzip_file, "r") as zipf:
        zipf.extractall(folder_path)


if __name__ == "__main__":
    os.makedirs("zip", exist_ok=True)                     # The zip folder must exist before archives are written to it
    zip_file(r"./file/json_file.json", r"zip/zip_file.zip")
    zip_files([
        r"./file/json_file.json", r"./file/sqlite3_file.db", r"./file/csv_file.csv"
    ], r"zip/zip_files.zip")
    zip_folder(                                           # Compress every file under the file folder in the current directory
        r"zip/zip_folder.zip", r"./file"                  # Arguments: archive path, then source folder
    )
    zip_folders(r"zip/zip_folders.zip", r"./file")
    zip_folders_ext(r"zip/zip_folders_ext.zip", r"./file", [".csv"])  # Pass a list: tuple(".csv") would split the string into single characters
    unzip_file(r"zip/zip_file.zip", "./zip")
    unzip_file(r"zip/zip_folders.zip", "./zip")
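
TAR.GZ archives do not need a third-party package, because the standard library's tarfile module reads and writes them. The following is a minimal sketch, assuming a temporary folder with one nested file; the folder and file names are made up for the example.

```python
import os
import tarfile
import tempfile

base = tempfile.mkdtemp()
src = os.path.join(base, "file")               # Hypothetical folder to archive
os.makedirs(os.path.join(src, "sub"))
with open(os.path.join(src, "sub", "a.txt"), "w", encoding="utf-8") as f:
    f.write("hello\n")

archive = os.path.join(base, "file.tar.gz")

# "w:gz" writes a gzip-compressed tar; arcname sets the top-level name in the archive.
with tarfile.open(archive, "w:gz") as tar:
    tar.add(src, arcname="file")               # Folders are added recursively

with tarfile.open(archive, "r:gz") as tar:
    names = sorted(tar.getnames())

print(names)
```

Unlike the zipfile helpers above, tar.add() walks the folder itself, so no os.walk() loop is needed to keep the folder structure.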

Python File Handling

https://stitch-top.github.io/2021/02/03/python/python03-python-wen-jian-chu-li/

Author

Dr.626

Published

February 3, 2021, 22:23:57