一奇唤、pandas讀取csv文件
import pandas as pd
f = open("C:\\Users\\matrix\\lz_business_advisor_dashboard_key_metrics_source.csv", encoding = 'utf-8')
pd.read_csv(f)
參考
【1】詳解pandas的read_csv方法:https://blog.csdn.net/weixin_37706204/article/details/120827141
二咬最、pandas讀取excel文件
import pandas as pd
data = pd.read_excel("C:\\Users\\matrix\\Business Advisor - Dashboard - Key Metrics_2022-08-26_2022-08-26(1).xls", sheet_name = 0, header = 5, index_col = 0, nrows = 3)
data
其中,
sheet_name:第幾個(gè)sheet頁,是從0開始的
header:第幾行是header
index_col :其實(shí)列
nrows:取多少行
三踪宠、pandas在指定列添加一列
import pandas as pd
df = pd.read_excel("C:\\Users\\matrix\\Business Advisor - Dashboard - Key Metrics_2022-08-26_2022-08-26(1).xls", sheet_name = 0, header = 5, nrows = 3)
df.insert(loc = 0, column = "date_of_data", value = "2022-09-01")
df
四冲泥、pandas寫入excel文件
import pandas as pd
df = pd.read_excel(r"C:\Users\matrix\Business Advisor - Dashboard - Key Metrics_2022-08-26_2022-08-26(1).xls", sheet_name = 0, header = 5, nrows = 3)
df.insert(loc = 0, column = "date_of_data", value = "2022-09-01")
df.to_excel(r"C:\Users\matrix\test.xlsx")
五、pandas寫入數(shù)據(jù)庫
import pandas as pd
import pymysql
from sqlalchemy import create_engine
df = pd.read_excel(r"C:\Users\matrix\Business Advisor - Dashboard - Key Metrics_2022-08-26_2022-08-26(1).xls", sheet_name = 0, header = 5, nrows = 3)
df.insert(loc = 0, column = "date_of_data", value = "2022-09-01")
# create conn
conn = create_engine('mysql+pymysql://root:123456@localhost:3306/test', encoding = 'utf8')
# write db
insert_rows = pd.io.sql.to_sql(df, "test_table", conn, if_exists = 'replace')
print(insert_rows)
# df.to_excel(r"C:\Users\matrix\test.xlsx")
參考:pandas 寫入mysql數(shù)據(jù)庫.to_sql方法詳解
六怀伦、pandas處理表頭
import pandas as pd
import pymysql
from sqlalchemy import create_engine
df = pd.read_excel(r"C:\Users\matrix\Business Advisor - Dashboard - Key Metrics_2022-08-26_2022-08-26(1).xls", sheet_name = 0, header = 5, nrows = 3)
df.insert(loc = 0, column = "date_of_data", value = "2022-09-01")
df = pd.DataFrame(df, columns = ['date_of_data','Date'])
df.columns = (['date_of_data', 'date'])
df
七脆烟、pandas正則
replace方法不支持正則,需要用sub方法
demo: 將所有表頭除了數(shù)字房待、字母和下劃線以外的字符全部替換為下劃線邢羔,并且全部字符小寫
import pandas as pd
import re
df = pd.read_excel(r"C:\Users\matrix\Business Advisor - Dashboard - Key Metrics_2022-08-26_2022-08-26(1).xls", sheet_name = 0, header = 5, nrows = 3)
df.insert(loc = 0, column = "date_of_data", value = "2022-09-01")
# df = pd.DataFrame(df, columns = ['date_of_data','Date'])
field_list = []
# regex1 = re.compile(r'[^A-Za-z0-9]')
# regex2 = re.compile(r'_+')
for field in df.columns:
# field = field.replace(' ', '')
# field = regex1.sub('_', field)
# transform except 'A-Za-z0-9' char to '_'
field = re.sub(r'[^A-Za-z0-9]', r'_', field)
# transform multiple '_' char to '_'
field = re.sub(r'_+', r'_', field)
# lower all char
field = field.lower()
# field = regex2.sub('#', field)
field_list.append(field)
df.columns = tuple(field_list)
df
八、pandas將所有類改為str類型
import pandas as pd
df = pd.read_excel(r"C:\Users\matrix\Business Advisor - Dashboard - Key Metrics_2022-08-26_2022-08-26(1).xls", sheet_name = 0, header = 5, nrows = 3)
df[:] = df[:].astype(str)
df.dtypes
九桑孩、pandas寫入數(shù)據(jù)庫列類型
insert_rows = pd.io.sql.to_sql(df, "test_table", conn, if_exists = 'replace', index=False,
dtype={col_name: VARCHAR(500) for col_name in df}
)
import pandas as pd
import pymysql
import sqlalchemy
from sqlalchemy import create_engine
import re
from sqlalchemy.types import VARCHAR
df = pd.read_excel(r"C:\Users\matrix\Business Advisor - Dashboard - Key Metrics_2022-08-26_2022-08-26(1).xls", sheet_name = 0, header = 5, nrows = 3)
df.insert(loc = 0, column = "date_of_data", value = "2022-09-01")
# df = pd.DataFrame(df, columns = ['date_of_data','Date'])
field_list = []
# regex1 = re.compile(r'[^A-Za-z0-9]')
# regex2 = re.compile(r'_+')
for field in df.columns:
# field = field.replace(' ', '')
# field = regex1.sub('_', field)
# transform except 'A-Za-z0-9' char to '_'
field = re.sub(r'[^A-Za-z0-9]', r'_', field)
# transform multiple '_' char to '_'
field = re.sub(r'_+', r'_', field)
# lower all char
field = field.lower()
# field = regex2.sub('#', field)
field_list.append(field)
df.columns = tuple(field_list)
# transform all columns to str type, str map to db text
# df[:] = df[:].astype(str)
# df.dtypes
# df[field_list]
# create conn
conn_string = 'mysql+pymysql://root:123456@localhost:3306/test'
conn = create_engine(conn_string, encoding = 'utf8')
# write db
insert_rows = pd.io.sql.to_sql(df, "test_table", conn, if_exists = 'replace', index=False,
dtype={col_name: VARCHAR(500) for col_name in df}
)
print(insert_rows)
參考:
[1] Pandas to_sql將列類型從varchar更改為text
[2] pandas to_sql all columns as nvarchar (可行方案)