Before starting, make sure the following software and third-party libraries are installed and working:
- Oracle Database 11g, client + server
- Python 3.7.7
  Note: if your Python is 64-bit, the Oracle Database client must also be 64-bit.
- cx_Oracle 7.3.0
- SQLAlchemy 1.3.16
  Note: connecting to Oracle requires cx_Oracle, MySQL requires pymysql, and Hadoop (Hive/HDFS) requires PyHive. Whatever the database, SQLAlchemy's query syntax is almost identical, so the code is highly reusable. Raw SQL, by contrast, differs between dialects: the same statement may stop working when you switch databases.
- pandas 1.0.3
  Note: if pandas installs successfully but import pandas fails (on Windows 10), copy concrt140.dll and msvcp140.dll into the Python installation directory.
pandas is a data-analysis library that thoroughly outclasses Excel; I uninstalled Office once I had learned Python.
I am fussy about clean installs, so my Python and VSCode are both portable versions.
The Oracle client is portable as well; the server cannot be, so it runs on a home server accessed over the LAN.
Preparing the demo database
To drive the examples, first create the tables and data.
Open a new cmd terminal and log in to Oracle Database as the administrator:
> sqlplus sys/123456 as sysdba
Create a new user
>>> create user abc identified by 123;
Grant DBA privileges
>>> grant connect,resource,dba to abc;
Log in as the new user (with the password set above)
> sqlplus abc/123
Create the first table
>>> CREATE TABLE Websites
(
id integer PRIMARY KEY,
name varchar(255),
url varchar(255),
alexa integer,
country varchar(255)
);
Insert rows into the first table
>>> INSERT INTO Websites (id, name, url, alexa, country)
VALUES ('1','Google','https://www.google.cm/','1','USA');
>>> INSERT INTO Websites (id, name, url, alexa, country)
VALUES ('2','淘寶','https://www.taobao.com/','13','CN');
>>> INSERT INTO Websites (id, name, url, alexa, country)
VALUES ('3','菜鳥教程','http://www.runoob.com/','4689','CN');
>>> INSERT INTO Websites (id, name, url, alexa, country)
VALUES ('4','微博','http://weibo.com/','20','CN');
>>> INSERT INTO Websites (id, name, url, alexa, country)
VALUES ('5','Facebook','https://www.facebook.com/','3','USA');
>>> INSERT INTO Websites (id, name, url, alexa, country)
VALUES ('6','百度','https://www.baidu.com/','4','CN');
>>> INSERT INTO Websites (id, name, url, alexa, country)
VALUES ('7','stackoverflow','http://stackoverflow.com/','0','IND');
View the data
>>> select * from Websites;
+----+---------------+---------------------------+-------+---------+
| id | name | url | alexa | country |
+----+---------------+---------------------------+-------+---------+
| 1 | Google | https://www.google.cm/ | 1 | USA |
| 2 | 淘寶 | https://www.taobao.com/ | 13 | CN |
| 3 | 菜鳥教程 | http://www.runoob.com/ | 4689 | CN |
| 4 | 微博 | http://weibo.com/ | 20 | CN |
| 5 | Facebook | https://www.facebook.com/ | 3 | USA |
| 6 | 百度 | https://www.baidu.com/ | 4 | CN |
| 7 | stackoverflow | http://stackoverflow.com/ | 0 | IND |
+----+---------------+---------------------------+-------+---------+
Create the second table
>>> CREATE TABLE access_log
(
aid integer PRIMARY KEY,
site_id integer,
count integer,
col_date date,
CONSTRAINT fk_site_id FOREIGN KEY (site_id) REFERENCES Websites(id)
);
Insert rows
>>> INSERT INTO access_log (aid,site_id,count,col_date)
VALUES ('1','1','45',to_date('2016-05-10','yyyy-mm-dd'));
>>> INSERT INTO access_log (aid,site_id,count,col_date)
VALUES ('2','3','100',to_date('2016-05-13','yyyy-mm-dd'));
>>> INSERT INTO access_log (aid,site_id,count,col_date)
VALUES ('3','1','230',to_date('2016-05-14','yyyy-mm-dd'));
>>> INSERT INTO access_log (aid,site_id,count,col_date)
VALUES ('4','2','10',to_date('2016-05-14','yyyy-mm-dd'));
>>> INSERT INTO access_log (aid,site_id,count,col_date)
VALUES ('5','5','205',to_date('2016-05-14','yyyy-mm-dd'));
>>> INSERT INTO access_log (aid,site_id,count,col_date)
VALUES ('6','4','13',to_date('2016-05-15','yyyy-mm-dd'));
>>> INSERT INTO access_log (aid,site_id,count,col_date)
VALUES ('7','3','220',to_date('2016-05-15','yyyy-mm-dd'));
>>> INSERT INTO access_log (aid,site_id,count,col_date)
VALUES ('8','5','545',to_date('2016-05-16','yyyy-mm-dd'));
>>> INSERT INTO access_log (aid,site_id,count,col_date)
VALUES ('9','3','201',to_date('2016-05-17','yyyy-mm-dd'));
View the data
>>> select * from access_log;
+-----+---------+-------+------------+
| aid | site_id | count | col_date |
+-----+---------+-------+------------+
| 1 | 1 | 45 | 2016-05-10 |
| 2 | 3 | 100 | 2016-05-13 |
| 3 | 1 | 230 | 2016-05-14 |
| 4 | 2 | 10 | 2016-05-14 |
| 5 | 5 | 205 | 2016-05-14 |
| 6 | 4 | 13 | 2016-05-15 |
| 7 | 3 | 220 | 2016-05-15 |
| 8 | 5 | 545 | 2016-05-16 |
| 9 | 3 | 201 | 2016-05-17 |
+-----+---------+-------+------------+
Create the third table
>>> CREATE TABLE apps
(
id integer PRIMARY KEY,
app_name varchar(255),
url varchar(255),
country varchar(255)
);
Insert rows
>>> INSERT INTO apps (id,app_name,url,country)
VALUES ('1','QQ APP','http://im.qq.com/','CN');
>>> INSERT INTO apps (id,app_name,url,country)
VALUES ('2','微博 APP','http://weibo.com/','CN');
>>> INSERT INTO apps (id,app_name,url,country)
VALUES ('3','淘寶 APP','https://www.taobao.com/','CN');
View the table
>>> select * from apps;
+----+------------+-------------------------+---------+
| id | app_name | url | country |
+----+------------+-------------------------+---------+
| 1 | QQ APP | http://im.qq.com/ | CN |
| 2 | 微博 APP | http://weibo.com/ | CN |
| 3 | 淘寶 APP | https://www.taobao.com/ | CN |
+----+------------+-------------------------+---------+
That completes the table setup.
Connecting to the database with the SQLAlchemy ORM
Open an interactive Python session and start writing the program:
>>> import os
>>> os.environ["PATH"] += "D:\\oracle"#添加環(huán)境變量,是Oracle數(shù)據(jù)庫客戶端綠色版實(shí)例zip文件的解壓目錄炕吸,注意這里不是sys.path伐憾,“sys.path”是Python在import時(shí)搜索包的路徑。
>>> os.environ["NLS_LANG"] = "SIMPLIFIED CHINESE_CHINA.UTF8"#解決Oracle編碼問題
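As a side note on the PATH tweak above, here is a minimal, runnable sketch (the `D:\oracle` directory is the article's assumed unzip location) of prepending the client directory with the platform's separator so existing entries stay intact:

```python
import os

# Hypothetical unzip directory of the portable Oracle client (from the text).
client_dir = r"D:\oracle"

# Prepend with os.pathsep (";" on Windows, ":" elsewhere).  A bare
# os.environ["PATH"] += r"D:\oracle" would glue the new entry onto the
# previous one, because no separator is inserted.
os.environ["PATH"] = client_dir + os.pathsep + os.environ.get("PATH", "")

print(os.environ["PATH"].startswith(client_dir + os.pathsep))  # True
```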
>>> from sqlalchemy import create_engine
>>> from sqlalchemy import MetaData, Table
>>> from sqlalchemy.orm import sessionmaker
>>> import pandas as pd
>>> from sqlalchemy import func
Create the database connection (this only creates the engine; nothing is bound to a MetaData yet)
Note: this shows an Oracle connection; the URL format for other databases is essentially the same, see the SQLAlchemy docs.
To log in as sysdba, use 'oracle://abc:123@127.0.0.1:1521/orcl?mode=SYSDBA' instead.
>>> engine = create_engine('oracle://abc:123@127.0.0.1:1521/orcl', encoding='utf8', convert_unicode=True, echo = False)
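The portability point is easy to see in the URL: only the connection string changes between backends. A minimal sketch (the Oracle and MySQL URLs are illustrative placeholders; the SQLite line actually runs, in memory, with no server):

```python
from sqlalchemy import create_engine, text

# Only the URL differs between backends; the query code stays the same.
# engine = create_engine('oracle://abc:123@127.0.0.1:1521/orcl')   # Oracle, as above
# engine = create_engine('mysql+pymysql://user:pw@host:3306/db')   # MySQL via pymysql
engine = create_engine('sqlite://')  # in-memory SQLite, runs anywhere

with engine.connect() as conn:
    value = conn.execute(text('SELECT 1 + 1')).scalar()
print(value)  # 2
```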
Prepare to reflect the existing tables by creating a MetaData object (binding the connection to the MetaData)
>>> metadata = MetaData(bind=engine)
Reflect the whole database (this is the first real round-trip to the database)
>>> metadata.reflect()
>>> print(metadata.tables.keys())  # list all the table names
dict_keys(['apps', 'access_log', 'websites'])
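Here is a runnable sketch of the same reflection flow against in-memory SQLite (one toy table stands in for the Oracle schema). It passes the engine to reflect() directly, which also works on newer SQLAlchemy versions where bound MetaData was removed:

```python
from sqlalchemy import MetaData, create_engine, text

engine = create_engine('sqlite://')
with engine.begin() as conn:
    conn.execute(text('CREATE TABLE websites (id INTEGER PRIMARY KEY, name TEXT)'))

# Reflect every table the connection can see into the MetaData object.
metadata = MetaData()
metadata.reflect(bind=engine)
print(metadata.tables.keys())  # dict_keys(['websites'])
```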
Reflecting single tables
>>> # reflect table one
>>> web_t = Table('websites', metadata, autoload=True, autoload_with=engine)
>>> web = web_t.alias('web')  # give it an alias
>>> # reflect table two
>>> acc_t = Table('access_log', metadata, autoload=True, autoload_with=engine)
>>> acc = acc_t.alias('acc')  # give it an alias
>>> # reflect table three
>>> app_t = Table('apps', metadata, autoload=True, autoload_with=engine)
>>> app = app_t.alias('app')  # give it an alias
>>> # list all the column names (in Python 3 the reflected keys are already str, so no eval/repr conversion is needed)
>>> web_col_name = list(web.columns.keys())
>>> acc_col_name = list(acc.columns.keys())
>>> app_col_name = list(app.columns.keys())
>>> print(web_col_name,acc_col_name,app_col_name)
['id', 'name', 'url', 'alexa', 'country'] ['aid', 'site_id', 'count', 'col_date'] ['id', 'app_name', 'url', 'country']
Aliasing with aliased() works the same way; the drawback is one extra import:
from sqlalchemy.orm import aliased
web = aliased(web_t,name='web')
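The single-table reflection plus alias step can be sketched end to end on SQLite. Note `autoload_with=engine` alone is enough on SQLAlchemy 1.4+/2.0, where the old `autoload=True` flag was removed:

```python
from sqlalchemy import MetaData, Table, create_engine, text

engine = create_engine('sqlite://')
with engine.begin() as conn:
    conn.execute(text('CREATE TABLE websites (id INTEGER PRIMARY KEY, name TEXT, url TEXT)'))

# Reflect one table and give it an alias, as in the article.
web_t = Table('websites', MetaData(), autoload_with=engine)
web = web_t.alias('web')
print(list(web.columns.keys()))  # ['id', 'name', 'url']
```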
On cross-schema queries:
Suppose the ORCL instance also holds two other users, 'def' and 'ghi'.
'def' owns table 'A' and 'ghi' owns table 'B', and user 'abc' has SELECT privileges on both tables.
All it takes is:
from sqlalchemy.ext.automap import automap_base
autobase = automap_base()
sc = ['def','ghi']
for i in sc:
autobase.metadata.reflect(engine, schema=i)
autobase.prepare()
autobase.metadata.bind = engine
A_t = Table('def.A', autobase.metadata, autoload=True, autoload_with=engine)
B_t = Table('ghi.B', autobase.metadata, autoload=True, autoload_with=engine)
autobase.metadata.reflect(views=True)  # views=True pulls in views as well
web_t = Table('websites', autobase.metadata, autoload=True, autoload_with=engine)
Now a single query can use all three tables web_t, A_t and B_t. Don't forget the aliases, or you may hit an "identifier is too long" error.
============ Basic Queries ============
ORM queries go through a global session instance, which submits the statements to the database.
>>> Session = sessionmaker(bind=engine)
>>> session = Session()
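The session workflow above can be reproduced self-contained against in-memory SQLite, so it runs without an Oracle server (two sample rows stand in for the article's data):

```python
from sqlalchemy import Column, Integer, MetaData, String, Table, create_engine
from sqlalchemy.orm import sessionmaker

engine = create_engine('sqlite://')
metadata = MetaData()
websites = Table(
    'websites', metadata,
    Column('id', Integer, primary_key=True),
    Column('name', String(255)),
    Column('country', String(255)),
)
metadata.create_all(engine)
with engine.begin() as conn:
    conn.execute(websites.insert(), [
        {'id': 1, 'name': 'Google', 'country': 'USA'},
        {'id': 2, 'name': 'taobao', 'country': 'CN'},
    ])

# Bind the factory to the engine so every session knows where to execute.
Session = sessionmaker(bind=engine)
session = Session()
web = websites.alias('web')
rows = session.query(web).all()
print(rows)  # [(1, 'Google', 'USA'), (2, 'taobao', 'CN')]
```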
Aliases
Table aliases: the reflection section already showed how to alias a table; translated to SQL it looks like this:
>>> SQL = '''SELECT w.name FROM websites w'''
>>> pd.read_sql(SQL,engine)
name
0 Google
1 淘寶
2 菜鳥教程
3 微博
4 Facebook
5 百度
6 stackoverflow
Next, column aliases.
Raw SQL, giving columns aliases
>>> SQL = "SELECT name AS n, country AS c FROM Websites"
>>> pd.read_sql(SQL,engine)
n c
0 Google USA
1 淘寶 CN
2 菜鳥教程 CN
3 微博 CN
4 Facebook USA
5 百度 CN
6 stackoverflow IND
ORM, giving columns aliases
>>> ret = session.query(web.c.name.label('n'),web.c.country.label('c'))
>>> pd.DataFrame(ret.all())
n c
0 Google USA
1 淘寶 CN
2 菜鳥教程 CN
3 微博 CN
4 Facebook USA
5 百度 CN
6 stackoverflow IND
Querying a whole table
The SELECT statement
Raw SQL
>>> SQL = 'SELECT * FROM Websites'
>>> pd.read_sql(SQL,engine)
id name url alexa country
0 1 Google https://www.google.cm/ 1 USA
1 2 淘寶 https://www.taobao.com/ 13 CN
2 3 菜鳥教程 http://www.runoob.com/ 4689 CN
3 4 微博 http://weibo.com/ 20 CN
4 5 Facebook https://www.facebook.com/ 3 USA
5 6 百度 https://www.baidu.com/ 4 CN
6 7 stackoverflow http://stackoverflow.com/ 0 IND
ORM
>>> ret = session.query(web)
>>> pd.DataFrame(ret.all())
id name url alexa country
0 1 Google https://www.google.cm/ 1 USA
1 2 淘寶 https://www.taobao.com/ 13 CN
2 3 菜鳥教程 http://www.runoob.com/ 4689 CN
3 4 微博 http://weibo.com/ 20 CN
4 5 Facebook https://www.facebook.com/ 3 USA
5 6 百度 https://www.baidu.com/ 4 CN
6 7 stackoverflow http://stackoverflow.com/ 0 IND
>>> pd.read_sql("access_log",engine)
aid site_id count col_date
0 1 1 45 2016-05-10
1 2 3 100 2016-05-13
2 3 1 230 2016-05-14
3 4 2 10 2016-05-14
4 5 5 205 2016-05-14
5 6 4 13 2016-05-15
6 7 3 220 2016-05-15
7 8 5 545 2016-05-16
8 9 3 201 2016-05-17
Querying part of a table
ORM
>>> ret.first()
(1, 'Google', 'https://www.google.cm/', 1, 'USA')
>>> ret = session.query(web).slice(1,3)
>>> pd.DataFrame(ret.all())
id name url alexa country
0 2 淘寶 https://www.taobao.com/ 13 CN
1 3 菜鳥教程 http://www.runoob.com/ 4689 CN
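The first()/slice() pattern can be sketched on SQLite. An order_by is added here so the "first" and "page" results are deterministic:

```python
from sqlalchemy import Column, Integer, MetaData, String, Table, create_engine
from sqlalchemy.orm import sessionmaker

engine = create_engine('sqlite://')
metadata = MetaData()
websites = Table('websites', metadata,
                 Column('id', Integer, primary_key=True),
                 Column('name', String(255)))
metadata.create_all(engine)
with engine.begin() as conn:
    conn.execute(websites.insert(), [
        {'id': 1, 'name': 'Google'},
        {'id': 2, 'name': 'taobao'},
        {'id': 3, 'name': 'runoob'},
    ])
session = sessionmaker(bind=engine)()
web = websites.alias('web')

ret = session.query(web).order_by(web.c.id)
print(ret.first())            # (1, 'Google') -- LIMIT 1 under the hood
page = ret.slice(1, 3).all()  # rows 2..3: OFFSET 1, LIMIT 2
print(page)                   # [(2, 'taobao'), (3, 'runoob')]
```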
Conditional queries
The WHERE clause
Raw SQL
>>> SQL = "SELECT url,alexa FROM Websites WHERE name='淘寶'"
>>> pd.read_sql(SQL,engine)
url alexa
0 https://www.taobao.com/ 13
ORM: access a column as alias.c.column_name
>>> ret = session.query(*[web.c.url,web.c.alexa]).filter(web.c.name == '淘寶')
>>> pd.DataFrame(ret.all())
url alexa
0 https://www.taobao.com/ 13
Raw SQL
>>> SQL = "SELECT url,alexa FROM Websites WHERE id>2 AND country = 'CN'"
>>> pd.read_sql(SQL,engine)
url alexa
0 http://www.runoob.com/ 4689
1 http://weibo.com/ 20
2 https://www.baidu.com/ 4
ORM
>>> ret = session.query(web.c.url,web.c.alexa).filter(web.c.id > 2, web.c.country == "CN")
>>> pd.DataFrame(ret.all())
url alexa
0 http://www.runoob.com/ 4689
1 http://weibo.com/ 20
2 https://www.baidu.com/ 4
Multi-condition queries
First import a few necessary functions
>>> from sqlalchemy import or_, and_, case
AND OR
Raw SQL
>>> SQL = "SELECT * FROM Websites WHERE id<=2 or id>2 AND country = 'CN'"
>>> pd.read_sql(SQL,engine)
id name url alexa country
0 1 Google https://www.google.cm/ 1 USA
1 2 淘寶 https://www.taobao.com/ 13 CN
2 3 菜鳥教程 http://www.runoob.com/ 4689 CN
3 4 微博 http://weibo.com/ 20 CN
4 6 百度 https://www.baidu.com/ 4 CN
ORM
>>> ret = session.query(web).filter(or_(web.c.id<=2, and_(web.c.id>2, web.c.country == "CN")))
>>> pd.DataFrame(ret.all())
id name url alexa country
0 1 Google https://www.google.cm/ 1 USA
1 2 淘寶 https://www.taobao.com/ 13 CN
2 3 菜鳥教程 http://www.runoob.com/ 4689 CN
3 4 微博 http://weibo.com/ 20 CN
4 6 百度 https://www.baidu.com/ 4 CN
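The or_/and_ combination runs unchanged on SQLite. Note that in both the raw SQL and the ORM form, AND binds tighter than OR; or_/and_ make the nesting explicit:

```python
from sqlalchemy import (Column, Integer, MetaData, String, Table, and_,
                        create_engine, or_)
from sqlalchemy.orm import sessionmaker

engine = create_engine('sqlite://')
metadata = MetaData()
websites = Table('websites', metadata,
                 Column('id', Integer, primary_key=True),
                 Column('country', String(255)))
metadata.create_all(engine)
with engine.begin() as conn:
    conn.execute(websites.insert(), [
        {'id': 1, 'country': 'USA'},
        {'id': 2, 'country': 'CN'},
        {'id': 3, 'country': 'CN'},
        {'id': 4, 'country': 'IND'},
    ])
session = sessionmaker(bind=engine)()
web = websites.alias('web')

# id <= 2, OR (id > 2 AND country = 'CN')
ret = session.query(web.c.id).filter(
    or_(web.c.id <= 2, and_(web.c.id > 2, web.c.country == 'CN')))
print(sorted(r[0] for r in ret.all()))  # [1, 2, 3]
```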
LIKE/NOT LIKE and the % wildcard
Raw SQL
>>> SQL = "SELECT * FROM Websites WHERE url NOT LIKE '%.com%'"
>>> pd.read_sql(SQL,engine)
id name url alexa country
0 1 Google https://www.google.cm/ 1 USA
ORM: ~ negates a condition
>>> ret = session.query(web).filter(~web.c.url.like("%.com%"))
>>> pd.DataFrame(ret.all())
id name url alexa country
0 1 Google https://www.google.cm/ 1 USA
>>> ret = session.query(web).filter(~web.c.url.ilike("%.com%"))  # ilike is case-insensitive
>>> pd.DataFrame(ret.all())
id name url alexa country
0 1 Google https://www.google.cm/ 1 USA
IN/NOT IN
Raw SQL (for NOT IN, just prepend NOT)
>>> SQL = "SELECT * FROM Websites WHERE name IN ('Google','菜鳥教程')"
>>> pd.read_sql(SQL,engine)
id name url alexa country
0 1 Google https://www.google.cm/ 1 USA
1 3 菜鳥教程 http://www.runoob.com/ 4689 CN
ORM (for NOT IN, just prepend ~)
>>> ret = session.query(web).filter(web.c.name.in_(['Google','菜鳥教程']))
>>> pd.DataFrame(ret.all())
id name url alexa country
0 1 Google https://www.google.cm/ 1 USA
1 3 菜鳥教程 http://www.runoob.com/ 4689 CN
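The `~` negation works uniformly across operators; a runnable sketch covering both NOT LIKE and NOT IN on SQLite:

```python
from sqlalchemy import Column, Integer, MetaData, String, Table, create_engine
from sqlalchemy.orm import sessionmaker

engine = create_engine('sqlite://')
metadata = MetaData()
websites = Table('websites', metadata,
                 Column('id', Integer, primary_key=True),
                 Column('name', String(255)),
                 Column('url', String(255)))
metadata.create_all(engine)
with engine.begin() as conn:
    conn.execute(websites.insert(), [
        {'id': 1, 'name': 'Google', 'url': 'https://www.google.cm/'},
        {'id': 2, 'name': 'taobao', 'url': 'https://www.taobao.com/'},
    ])
session = sessionmaker(bind=engine)()
web = websites.alias('web')

# "~" turns LIKE into NOT LIKE and in_ into NOT IN.
not_com = session.query(web.c.name).filter(~web.c.url.like('%.com%')).all()
not_in = session.query(web.c.name).filter(~web.c.name.in_(['Google'])).all()
print(not_com)  # [('Google',)]
print(not_in)   # [('taobao',)]
```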
IS NULL/IS NOT NULL
Raw SQL
>>> SQL = "SELECT * FROM Websites WHERE name IS NULL"
>>> pd.read_sql(SQL,engine)
Empty DataFrame
Columns: [id, name, url, alexa, country]
Index: []
ORM
>>> ret = session.query(web).filter(web.c.name.is_(None))
>>> pd.DataFrame(ret.all())
Empty DataFrame
Columns: []
Index: []
>>> ret = session.query(web).filter(web.c.name == None)  # or like this
>>> pd.DataFrame(ret.all())
Empty DataFrame
Columns: []
Index: []
Raw SQL
>>> SQL = "SELECT * FROM Websites WHERE name IS NOT NULL"
>>> pd.read_sql(SQL,engine)
id name url alexa country
0 1 Google https://www.google.cm/ 1 USA
1 2 淘寶 https://www.taobao.com/ 13 CN
2 3 菜鳥教程 http://www.runoob.com/ 4689 CN
3 4 微博 http://weibo.com/ 20 CN
4 5 Facebook https://www.facebook.com/ 3 USA
5 6 百度 https://www.baidu.com/ 4 CN
6 7 stackoverflow http://stackoverflow.com/ 0 IND
ORM
>>> ret = session.query(web).filter(web.c.name.isnot(None))
>>> pd.DataFrame(ret.all())
id name url alexa country
0 1 Google https://www.google.cm/ 1 USA
1 2 淘寶 https://www.taobao.com/ 13 CN
2 3 菜鳥教程 http://www.runoob.com/ 4689 CN
3 4 微博 http://weibo.com/ 20 CN
4 5 Facebook https://www.facebook.com/ 3 USA
5 6 百度 https://www.baidu.com/ 4 CN
6 7 stackoverflow http://stackoverflow.com/ 0 IND
>>> ret = session.query(web).filter(web.c.name != None)  # or like this
>>> pd.DataFrame(ret.all())
id name url alexa country
0 1 Google https://www.google.cm/ 1 USA
1 2 淘寶 https://www.taobao.com/ 13 CN
2 3 菜鳥教程 http://www.runoob.com/ 4689 CN
3 4 微博 http://weibo.com/ 20 CN
4 5 Facebook https://www.facebook.com/ 3 USA
5 6 百度 https://www.baidu.com/ 4 CN
6 7 stackoverflow http://stackoverflow.com/ 0 IND
BETWEEN
Raw SQL
>>> SQL = "SELECT * FROM Websites WHERE alexa BETWEEN 1 AND 20"
>>> pd.read_sql(SQL,engine)
id name url alexa country
0 1 Google https://www.google.cm/ 1 USA
1 2 淘寶 https://www.taobao.com/ 13 CN
2 4 微博 http://weibo.com/ 20 CN
3 5 Facebook https://www.facebook.com/ 3 USA
4 6 百度 https://www.baidu.com/ 4 CN
ORM
>>> ret = session.query(web).filter(web.c.alexa.between(1, 20))
>>> pd.DataFrame(ret.all())
id name url alexa country
0 1 Google https://www.google.cm/ 1 USA
1 2 淘寶 https://www.taobao.com/ 13 CN
2 4 微博 http://weibo.com/ 20 CN
3 5 Facebook https://www.facebook.com/ 3 USA
4 6 百度 https://www.baidu.com/ 4 CN
The CASE WHEN statement
Raw SQL
>>> SQL = '''SELECT name,CASE
... WHEN alexa BETWEEN 1 AND 10 THEN '1 to 10'
... WHEN alexa BETWEEN 11 AND 100 THEN '11 to 100'
... ELSE 'morethen 100'
... END AS class
... FROM Websites'''
>>> pd.read_sql(SQL,engine)
name class
0 Google 1 to 10
1 淘寶 11 to 100
2 菜鳥教程 morethen 100
3 微博 11 to 100
4 Facebook 1 to 10
5 百度 1 to 10
6 stackoverflow morethen 100
ORM
>>> ret = session.query(web.c.name,case([(web.c.alexa.between(1,10),'1 to 10'),(web.c.alexa.between(11,100),'11 to 100')],else_='morethen 100').label('class'))
>>> pd.DataFrame(ret.all())
name class
0 Google 1 to 10
1 淘寶 11 to 100
2 菜鳥教程 morethen 100
3 微博 11 to 100
4 Facebook 1 to 10
5 百度 1 to 10
6 stackoverflow morethen 100
Data-shaping queries
Deduplication: SELECT DISTINCT
>>> from sqlalchemy import distinct
Raw SQL
>>> SQL = "SELECT DISTINCT country FROM Websites"
>>> pd.read_sql(SQL,engine)
country
0 USA
1 IND
2 CN
ORM
>>> ret = session.query(web.c.country).distinct()
>>> pd.DataFrame(ret.all())
country
0 USA
1 IND
2 CN
Or:
>>> ret = session.query(distinct(web.c.country).label('country'))
>>> pd.DataFrame(ret.all())
country
0 USA
1 IND
2 CN
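Both spellings above compile to the same SELECT DISTINCT; a runnable sketch on SQLite:

```python
from sqlalchemy import (Column, Integer, MetaData, String, Table,
                        create_engine, distinct)
from sqlalchemy.orm import sessionmaker

engine = create_engine('sqlite://')
metadata = MetaData()
websites = Table('websites', metadata,
                 Column('id', Integer, primary_key=True),
                 Column('country', String(255)))
metadata.create_all(engine)
with engine.begin() as conn:
    conn.execute(websites.insert(), [
        {'id': 1, 'country': 'USA'},
        {'id': 2, 'country': 'CN'},
        {'id': 3, 'country': 'CN'},
    ])
session = sessionmaker(bind=engine)()
web = websites.alias('web')

# Query-level .distinct() and the column-level distinct() wrapper are equivalent here.
a = session.query(web.c.country).distinct().all()
b = session.query(distinct(web.c.country).label('country')).all()
print(sorted(r[0] for r in a))  # ['CN', 'USA']
```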
Sorting: ORDER BY
Raw SQL
>>> SQL = "SELECT * FROM Websites ORDER BY alexa ASC"  # ASC, ascending
>>> SQL = "SELECT * FROM Websites ORDER BY alexa DESC"  # DESC, descending
>>> pd.read_sql(SQL,engine)
id name url alexa country
0 3 菜鳥教程 http://www.runoob.com/ 4689 CN
1 4 微博 http://weibo.com/ 20 CN
2 2 淘寶 https://www.taobao.com/ 13 CN
3 6 百度 https://www.baidu.com/ 4 CN
4 5 Facebook https://www.facebook.com/ 3 USA
5 1 Google https://www.google.cm/ 1 USA
6 7 stackoverflow http://stackoverflow.com/ 0 IND
ORM
>>> ret = session.query(web).order_by(web.c.alexa.asc())  # ascending
>>> ret = session.query(web).order_by(web.c.alexa.desc())  # descending
>>> pd.DataFrame(ret.all())
id name url alexa country
0 3 菜鳥教程 http://www.runoob.com/ 4689 CN
1 4 微博 http://weibo.com/ 20 CN
2 2 淘寶 https://www.taobao.com/ 13 CN
3 6 百度 https://www.baidu.com/ 4 CN
4 5 Facebook https://www.facebook.com/ 3 USA
5 1 Google https://www.google.cm/ 1 USA
6 7 stackoverflow http://stackoverflow.com/ 0 IND
Joins: JOIN ... ON
Raw SQL INNER JOIN: find every matching pair of rows with Websites.id = access_log.site_id
>>> SQL = """SELECT * FROM Websites JOIN access_log ON Websites.id=access_log.site_id"""
>>> pd.read_sql(SQL,engine)
id name url alexa country aid site_id count col_date
0 1 Google https://www.google.cm/ 1 USA 1 1 45 2016-05-10
1 1 Google https://www.google.cm/ 1 USA 3 1 230 2016-05-14
2 2 淘寶 https://www.taobao.com/ 13 CN 4 2 10 2016-05-14
3 3 菜鳥教程 http://www.runoob.com/ 4689 CN 2 3 100 2016-05-13
4 3 菜鳥教程 http://www.runoob.com/ 4689 CN 9 3 201 2016-05-17
5 3 菜鳥教程 http://www.runoob.com/ 4689 CN 7 3 220 2016-05-15
6 4 微博 http://weibo.com/ 20 CN 6 4 13 2016-05-15
7 5 Facebook https://www.facebook.com/ 3 USA 8 5 545 2016-05-16
8 5 Facebook https://www.facebook.com/ 3 USA 5 5 205 2016-05-14
>>> SQL = """SELECT * FROM Websites INNER JOIN access_log ON Websites.id=access_log.site_id"""
>>> pd.read_sql(SQL,engine)
id name url alexa country aid site_id count col_date
0 1 Google https://www.google.cm/ 1 USA 1 1 45 2016-05-10
1 1 Google https://www.google.cm/ 1 USA 3 1 230 2016-05-14
2 2 淘寶 https://www.taobao.com/ 13 CN 4 2 10 2016-05-14
3 3 菜鳥教程 http://www.runoob.com/ 4689 CN 2 3 100 2016-05-13
4 3 菜鳥教程 http://www.runoob.com/ 4689 CN 9 3 201 2016-05-17
5 3 菜鳥教程 http://www.runoob.com/ 4689 CN 7 3 220 2016-05-15
6 4 微博 http://weibo.com/ 20 CN 6 4 13 2016-05-15
7 5 Facebook https://www.facebook.com/ 3 USA 8 5 545 2016-05-16
8 5 Facebook https://www.facebook.com/ 3 USA 5 5 205 2016-05-14
ORM INNER JOIN
>>> ret = session.query(web, acc).filter(web.c.id == acc.c.site_id)  # an inner join expressed through filter
>>> pd.DataFrame(ret.all())
id name url alexa country aid site_id count col_date
0 1 Google https://www.google.cm/ 1 USA 1 1 45 2016-05-10
1 1 Google https://www.google.cm/ 1 USA 3 1 230 2016-05-14
2 2 淘寶 https://www.taobao.com/ 13 CN 4 2 10 2016-05-14
3 3 菜鳥教程 http://www.runoob.com/ 4689 CN 2 3 100 2016-05-13
4 3 菜鳥教程 http://www.runoob.com/ 4689 CN 9 3 201 2016-05-17
5 3 菜鳥教程 http://www.runoob.com/ 4689 CN 7 3 220 2016-05-15
6 4 微博 http://weibo.com/ 20 CN 6 4 13 2016-05-15
7 5 Facebook https://www.facebook.com/ 3 USA 8 5 545 2016-05-16
8 5 Facebook https://www.facebook.com/ 3 USA 5 5 205 2016-05-14
>>> ret = session.query(web, acc).join(acc, web.c.id == acc.c.site_id)  # an inner join with an ad-hoc ON clause
>>> pd.DataFrame(ret.all())
id name url alexa country aid site_id count col_date
0 1 Google https://www.google.cm/ 1 USA 1 1 45 2016-05-10
1 1 Google https://www.google.cm/ 1 USA 3 1 230 2016-05-14
2 2 淘寶 https://www.taobao.com/ 13 CN 4 2 10 2016-05-14
3 3 菜鳥教程 http://www.runoob.com/ 4689 CN 2 3 100 2016-05-13
4 3 菜鳥教程 http://www.runoob.com/ 4689 CN 9 3 201 2016-05-17
5 3 菜鳥教程 http://www.runoob.com/ 4689 CN 7 3 220 2016-05-15
6 4 微博 http://weibo.com/ 20 CN 6 4 13 2016-05-15
7 5 Facebook https://www.facebook.com/ 3 USA 8 5 545 2016-05-16
8 5 Facebook https://www.facebook.com/ 3 USA 5 5 205 2016-05-14
>>> ret = session.query(web).join(acc)  # an inner join inferred from the foreign key, selecting the left table
>>> pd.DataFrame(ret.all())
id name url alexa country
0 1 Google https://www.google.cm/ 1 USA
1 1 Google https://www.google.cm/ 1 USA
2 2 淘寶 https://www.taobao.com/ 13 CN
3 3 菜鳥教程 http://www.runoob.com/ 4689 CN
4 3 菜鳥教程 http://www.runoob.com/ 4689 CN
5 3 菜鳥教程 http://www.runoob.com/ 4689 CN
6 4 微博 http://weibo.com/ 20 CN
7 5 Facebook https://www.facebook.com/ 3 USA
8 5 Facebook https://www.facebook.com/ 3 USA
>>> ret = session.query(acc).join(web)  # an inner join inferred from the foreign key, selecting the right table
>>> pd.DataFrame(ret.all())
aid site_id count col_date
0 1 1 45 2016-05-10
1 2 3 100 2016-05-13
2 3 1 230 2016-05-14
3 4 2 10 2016-05-14
4 5 5 205 2016-05-14
5 6 4 13 2016-05-15
6 7 3 220 2016-05-15
7 8 5 545 2016-05-16
8 9 3 201 2016-05-17
Raw SQL LEFT JOIN: left-table rows with no match in the right table are shown as well
>>> SQL = """SELECT * FROM Websites LEFT JOIN access_log ON Websites.id=access_log.site_id"""
>>> pd.read_sql(SQL,engine)
id name url alexa country aid site_id count col_date
0 1 Google https://www.google.cm/ 1 USA 1.0 1.0 45.0 2016-05-10
1 1 Google https://www.google.cm/ 1 USA 3.0 1.0 230.0 2016-05-14
2 2 淘寶 https://www.taobao.com/ 13 CN 4.0 2.0 10.0 2016-05-14
3 3 菜鳥教程 http://www.runoob.com/ 4689 CN 2.0 3.0 100.0 2016-05-13
4 3 菜鳥教程 http://www.runoob.com/ 4689 CN 9.0 3.0 201.0 2016-05-17
5 3 菜鳥教程 http://www.runoob.com/ 4689 CN 7.0 3.0 220.0 2016-05-15
6 4 微博 http://weibo.com/ 20 CN 6.0 4.0 13.0 2016-05-15
7 5 Facebook https://www.facebook.com/ 3 USA 8.0 5.0 545.0 2016-05-16
8 5 Facebook https://www.facebook.com/ 3 USA 5.0 5.0 205.0 2016-05-14
9 6 百度 https://www.baidu.com/ 4 CN NaN NaN NaN NaT
10 7 stackoverflow http://stackoverflow.com/ 0 IND NaN NaN NaN NaT
ORM: passing isouter=True is the equivalent of LEFT JOIN
>>> ret = session.query(web,acc).join(acc, isouter=True)  # a left join inferred from the foreign key
>>> pd.DataFrame(ret.all())
id name url alexa country aid site_id count col_date
0 1 Google https://www.google.cm/ 1 USA 1.0 1.0 45.0 2016-05-10
1 1 Google https://www.google.cm/ 1 USA 3.0 1.0 230.0 2016-05-14
2 2 淘寶 https://www.taobao.com/ 13 CN 4.0 2.0 10.0 2016-05-14
3 3 菜鳥教程 http://www.runoob.com/ 4689 CN 2.0 3.0 100.0 2016-05-13
4 3 菜鳥教程 http://www.runoob.com/ 4689 CN 9.0 3.0 201.0 2016-05-17
5 3 菜鳥教程 http://www.runoob.com/ 4689 CN 7.0 3.0 220.0 2016-05-15
6 4 微博 http://weibo.com/ 20 CN 6.0 4.0 13.0 2016-05-15
7 5 Facebook https://www.facebook.com/ 3 USA 8.0 5.0 545.0 2016-05-16
8 5 Facebook https://www.facebook.com/ 3 USA 5.0 5.0 205.0 2016-05-14
9 6 百度 https://www.baidu.com/ 4 CN NaN NaN NaN NaT
10 7 stackoverflow http://stackoverflow.com/ 0 IND NaN NaN NaN NaT
>>> ret = session.query(web,acc).join(acc,full=True)  # a full join inferred from the foreign key
>>> pd.DataFrame(ret.all())
id name url alexa country aid site_id count col_date
0 1 Google https://www.google.cm/ 1 USA 1.0 1.0 45.0 2016-05-10
1 3 菜鳥教程 http://www.runoob.com/ 4689 CN 2.0 3.0 100.0 2016-05-13
2 1 Google https://www.google.cm/ 1 USA 3.0 1.0 230.0 2016-05-14
3 2 淘寶 https://www.taobao.com/ 13 CN 4.0 2.0 10.0 2016-05-14
4 5 Facebook https://www.facebook.com/ 3 USA 5.0 5.0 205.0 2016-05-14
5 4 微博 http://weibo.com/ 20 CN 6.0 4.0 13.0 2016-05-15
6 3 菜鳥教程 http://www.runoob.com/ 4689 CN 7.0 3.0 220.0 2016-05-15
7 5 Facebook https://www.facebook.com/ 3 USA 8.0 5.0 545.0 2016-05-16
8 3 菜鳥教程 http://www.runoob.com/ 4689 CN 9.0 3.0 201.0 2016-05-17
9 6 百度 https://www.baidu.com/ 4 CN NaN NaN NaN NaT
10 7 stackoverflow http://stackoverflow.com/ 0 IND NaN NaN NaN NaT
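The join variants above can be sketched end to end on SQLite, using an explicit ON clause plus isouter for the outer version ('nohits' is a made-up row with no log entries, so it only survives the outer join):

```python
from sqlalchemy import Column, Integer, MetaData, String, Table, create_engine
from sqlalchemy.orm import sessionmaker

engine = create_engine('sqlite://')
metadata = MetaData()
websites = Table('websites', metadata,
                 Column('id', Integer, primary_key=True),
                 Column('name', String(255)))
access_log = Table('access_log', metadata,
                   Column('aid', Integer, primary_key=True),
                   Column('site_id', Integer),
                   Column('count', Integer))
metadata.create_all(engine)
with engine.begin() as conn:
    conn.execute(websites.insert(), [
        {'id': 1, 'name': 'Google'}, {'id': 2, 'name': 'taobao'},
        {'id': 3, 'name': 'nohits'},
    ])
    conn.execute(access_log.insert(), [
        {'aid': 1, 'site_id': 1, 'count': 45},
        {'aid': 2, 'site_id': 1, 'count': 230},
        {'aid': 3, 'site_id': 2, 'count': 10},
    ])
session = sessionmaker(bind=engine)()
web, acc = websites.alias('web'), access_log.alias('acc')

# Explicit ON clause; isouter=True keeps left-table rows without a match.
inner = session.query(web.c.name, acc.c.count).join(
    acc, web.c.id == acc.c.site_id).all()
outer = session.query(web.c.name, acc.c.count).join(
    acc, web.c.id == acc.c.site_id, isouter=True).all()
print(len(inner), len(outer))  # 3 4
```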
Combining results: UNION
Raw SQL UNION selects all distinct country values from the "Websites" and "apps" tables (duplicates removed):
>>> SQL = '''SELECT country FROM Websites
... UNION
... SELECT country FROM apps
... ORDER BY country'''
>>> pd.read_sql(SQL,engine)
country
0 CN
1 IND
2 USA
ORM
>>> ret = session.query(web.c.country).union(session.query(app.c.country))
>>> pd.DataFrame(ret.all())
country
0 CN
1 IND
2 USA
Raw SQL UNION ALL selects every country value from the "Websites" and "apps" tables (duplicates included):
>>> SQL = '''SELECT country FROM Websites
... UNION ALL
... SELECT country FROM apps
... ORDER BY country'''
>>> pd.read_sql(SQL,engine)
country
0 CN
1 CN
2 CN
3 CN
4 CN
5 CN
6 CN
7 IND
8 USA
9 USA
ORM
>>> ret = session.query(web.c.country).union_all(session.query(app.c.country))
>>> pd.DataFrame(ret.all())
country
0 USA
1 CN
2 CN
3 CN
4 USA
5 CN
6 IND
7 CN
8 CN
9 CN
>>> # cte() is the equivalent of Oracle's WITH AS; unlike the approach above it materializes a temporary result set, which avoids repeating the same subquery and improves performance.
>>> sub = session.query(web.c.country).union_all(session.query(app.c.country)).cte("temp_table")
>>> pd.DataFrame(session.query(sub).all())
web_country
0 USA
1 CN
2 CN
3 CN
4 USA
5 CN
6 IND
7 CN
8 CN
9 CN
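The UNION vs UNION ALL contrast runs unchanged on SQLite; a minimal sketch with one duplicate country across the two tables:

```python
from sqlalchemy import Column, Integer, MetaData, String, Table, create_engine
from sqlalchemy.orm import sessionmaker

engine = create_engine('sqlite://')
metadata = MetaData()
websites = Table('websites', metadata,
                 Column('id', Integer, primary_key=True),
                 Column('country', String(255)))
apps = Table('apps', metadata,
             Column('id', Integer, primary_key=True),
             Column('country', String(255)))
metadata.create_all(engine)
with engine.begin() as conn:
    conn.execute(websites.insert(), [{'id': 1, 'country': 'USA'},
                                     {'id': 2, 'country': 'CN'}])
    conn.execute(apps.insert(), [{'id': 1, 'country': 'CN'}])
session = sessionmaker(bind=engine)()
web, app = websites.alias('web'), apps.alias('app')

# UNION removes the duplicate 'CN'; UNION ALL keeps it.
dedup = session.query(web.c.country).union(session.query(app.c.country)).all()
everything = session.query(web.c.country).union_all(session.query(app.c.country)).all()
print(len(dedup), len(everything))  # 2 3
```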
Concatenation: CONCAT()
Raw SQL
>>> SQL = "SELECT CONCAT(id, name) FROM Websites"
>>> pd.read_sql(SQL,engine)
CONCAT(ID,NAME)
0 1Google
1 2淘寶
2 3菜鳥教程
3 4微博
4 5Facebook
5 6百度
6 7stackoverflow
ORM
>>> ret = session.query(func.concat(web.c.id, web.c.name).label('id,name'))
>>> pd.DataFrame(ret.all())
id,name
0 1Google
1 2淘寶
2 3菜鳥教程
3 4微博
4 5Facebook
5 6百度
6 7stackoverflow
Raw SQL
>>> SQL = "SELECT CONCAT(CONCAT(id, name), url) FROM Websites"
>>> pd.read_sql(SQL,engine)
CONCAT(CONCAT(ID,NAME),URL)
0 1Googlehttps://www.google.cm/
1 2淘寶https://www.taobao.com/
2 3菜鳥教程http://www.runoob.com/
3 4微博http://weibo.com/
4 5Facebookhttps://www.facebook.com/
5 6百度https://www.baidu.com/
6 7stackoverflowhttp://stackoverflow.com/
ORM
>>> ret = session.query(func.concat(func.concat(web.c.id, web.c.name),web.c.url).label('id,name,url'))
>>> pd.DataFrame(ret.all())
id,name,url
0 1Googlehttps://www.google.cm/
1 2淘寶https://www.taobao.com/
2 3菜鳥教程http://www.runoob.com/
3 4微博http://weibo.com/
4 5Facebookhttps://www.facebook.com/
5 6百度https://www.baidu.com/
6 7stackoverflowhttp://stackoverflow.com/
Advanced queries
Aggregate functions
Average: AVG()
Raw SQL: select the "site_id" and "count" of rows whose count is above the average:
>>> SQL = "SELECT site_id, count FROM access_log WHERE count > (SELECT AVG(count) FROM access_log)"
>>> pd.read_sql(SQL,engine)
site_id count
0 1 230
1 5 205
2 3 220
3 5 545
4 3 201
ORM
>>> ret = session.query(acc.c.site_id,acc.c.count).filter(acc.c.count>session.query(func.avg(acc.c.count)))
>>> pd.DataFrame(ret.all())
site_id count
0 1 230
1 5 205
2 3 220
3 5 545
4 3 201
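The same above-average filter can be written with an explicit scalar subquery. Note this uses the `scalar_subquery()` spelling introduced in SQLAlchemy 1.4 (on 1.3, the article's style of passing the Query directly, or `.as_scalar()`, applies instead):

```python
from sqlalchemy import (Column, Integer, MetaData, Table, create_engine,
                        func, select)
from sqlalchemy.orm import sessionmaker

engine = create_engine('sqlite://')
metadata = MetaData()
access_log = Table('access_log', metadata,
                   Column('aid', Integer, primary_key=True),
                   Column('site_id', Integer),
                   Column('count', Integer))
metadata.create_all(engine)
with engine.begin() as conn:
    conn.execute(access_log.insert(), [
        {'aid': 1, 'site_id': 1, 'count': 45},
        {'aid': 2, 'site_id': 1, 'count': 230},
        {'aid': 3, 'site_id': 2, 'count': 10},
    ])
session = sessionmaker(bind=engine)()
acc = access_log.alias('acc')

# The average is 95, so only the 230 row qualifies.
avg_count = select(func.avg(acc.c.count)).scalar_subquery()
busy = session.query(acc.c.site_id, acc.c.count).filter(acc.c.count > avg_count).all()
print(busy)  # [(1, 230)]
```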
Counting: COUNT()
Raw SQL
>>> SQL = "SELECT count(*) FROM access_log"
>>> pd.read_sql(SQL,engine)
COUNT(*)
0 9
ORM
>>> session.query(acc).count()  # a shortcut
9
>>> session.query(func.count('*')).select_from(acc).scalar()  # the standard way
9
>>> session.query(func.count(acc.c.aid)).scalar()  # skip select_from() by counting the primary key acc.c.aid
9
Raw SQL: count the number of distinct site_id values in "access_log":
>>> SQL = "SELECT COUNT(DISTINCT site_id) AS nums FROM access_log"
>>> pd.read_sql(SQL,engine)
nums
0 5
ORM
>>> session.query(distinct(acc.c.site_id)).count()  # a shortcut
5
>>> ret = session.query(func.count(distinct(acc.c.site_id)).label('nums'))
>>> pd.DataFrame(ret.all())
nums
0 5
>>> # and, while we are here, a subquery
>>> retSub = session.query(acc.c.site_id).distinct().subquery('t')
>>> ret = session.query(func.count(retSub.c.site_id).label('nums'))
>>> pd.DataFrame(ret.all())
nums
0 5
Maximum: MAX()
Raw SQL
>>> SQL = "SELECT MAX(alexa) AS max_alexa FROM Websites"
>>> pd.read_sql(SQL,engine)
max_alexa
0 4689
ORM
>>> ret = session.query(func.max(web.c.alexa).label('max_alexa'))
>>> pd.DataFrame(ret.all())
max_alexa
0 4689
Minimum: MIN()
Raw SQL
>>> SQL = "SELECT MIN(alexa) AS min_alexa FROM Websites"
>>> pd.read_sql(SQL,engine)
min_alexa
0 0
ORM
>>> ret = session.query(func.min(web.c.alexa).label('min_alexa'))
>>> pd.DataFrame(ret.all())
min_alexa
0 0
Sum: SUM()
Raw SQL
>>> SQL = "SELECT SUM(count) AS nums FROM access_log"
>>> pd.read_sql(SQL,engine)
nums
0 1569
ORM
>>> ret = session.query(func.sum(acc.c.count).label("nums"))
>>> pd.DataFrame(ret.all())
nums
0 1569
Formatting
Raw SQL
>>> SQL = "SELECT aid, site_id, count, to_char(col_date,'yyyymm') as ym FROM access_log"  # to convert a string to a date, use to_date instead
>>> pd.read_sql(SQL,engine)
aid site_id count ym
0 1 1 45 201605
1 2 3 100 201605
2 3 1 230 201605
3 4 2 10 201605
4 5 5 205 201605
5 6 4 13 201605
6 7 3 220 201605
7 8 5 545 201605
8 9 3 201 201605
ORM
>>> ret = session.query(acc.c.aid, acc.c.site_id, acc.c.count,
... func.to_char(acc.c.col_date,'yyyymm').label("ym"))
>>> pd.DataFrame(ret.all())
aid site_id count ym
0 1 1 45 201605
1 2 3 100 201605
2 3 1 230 201605
3 4 2 10 201605
4 5 5 205 201605
5 6 4 13 201605
6 7 3 220 201605
7 8 5 545 201605
8 9 3 201 201605
Grouping and aggregation: GROUP BY ... HAVING
Raw SQL
>>> SQL = """SELECT site_id, to_char(col_date,'yyyymm') as ym,SUM(count) as sc
FROM access_log GROUP BY site_id, to_char(col_date,'yyyymm')"""
>>> pd.read_sql(SQL,engine)
site_id ym sc
0 1 201605 275
1 2 201605 10
2 3 201605 521
3 5 201605 750
4 4 201605 13
ORM
>>> cols = [acc.c.site_id,func.to_char(acc.c.col_date,'yyyymm').label('ym')]
>>> ret = session.query(*cols,func.sum(acc.c.count).label('sc')).group_by(*cols)
>>> pd.DataFrame(ret.all())
site_id ym sc
0 1 201605 275
1 2 201605 10
2 3 201605 521
3 5 201605 750
4 4 201605 13
Note: do not write it in the following form:
#this is wrong -- the same labeled expression must not be spelled out a second time inside group_by; build the column list once and reuse it, as above
session.query(acc.c.site_id,func.to_char(acc.c.col_date,'yyyymm').label('ym'),
func.sum(acc.c.count).label('sc')).group_by(acc.c.site_id,
func.to_char(acc.c.col_date,'yyyymm').label('ym'))
Raw SQL
>>> SQL = """SELECT Websites.name, SUM(access_log.count) AS nums FROM Websites
... INNER JOIN access_log
... ON Websites.id=access_log.site_id
... WHERE Websites.alexa < 200
... GROUP BY Websites.name
... HAVING SUM(access_log.count) > 200"""
>>> pd.read_sql(SQL,engine)
name nums
0 Google 275
1 Facebook 750
ORM
>>> # join on the implicit foreign key
>>> ret = session.query(web.c.name,func.sum(acc.c.count).label("nums")).\
... join(acc).filter(web.c.alexa < 200).\
... group_by(web.c.name).having(func.sum(acc.c.count) > 200)
>>> pd.DataFrame(ret.all())
name nums
0 Google 275
1 Facebook 750
>>> # with an explicit join condition
>>> ret = session.query(web.c.name,func.sum(acc.c.count).label("nums")).\
... join(acc).filter(web.c.id==acc.c.site_id,web.c.alexa < 200).\
... group_by(web.c.name).having(func.sum(acc.c.count) > 200)
>>> pd.DataFrame(ret.all())
name nums
0 Google 275
1 Facebook 750
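The join + GROUP BY + HAVING pipeline can be sketched self-contained on SQLite (Google's logs sum to 275 and pass the HAVING filter; taobao's 10 does not):

```python
from sqlalchemy import (Column, Integer, MetaData, String, Table,
                        create_engine, func)
from sqlalchemy.orm import sessionmaker

engine = create_engine('sqlite://')
metadata = MetaData()
websites = Table('websites', metadata,
                 Column('id', Integer, primary_key=True),
                 Column('name', String(255)))
access_log = Table('access_log', metadata,
                   Column('aid', Integer, primary_key=True),
                   Column('site_id', Integer),
                   Column('count', Integer))
metadata.create_all(engine)
with engine.begin() as conn:
    conn.execute(websites.insert(), [{'id': 1, 'name': 'Google'},
                                     {'id': 2, 'name': 'taobao'}])
    conn.execute(access_log.insert(), [
        {'aid': 1, 'site_id': 1, 'count': 45},
        {'aid': 2, 'site_id': 1, 'count': 230},
        {'aid': 3, 'site_id': 2, 'count': 10},
    ])
session = sessionmaker(bind=engine)()
web, acc = websites.alias('web'), access_log.alias('acc')

# Sum log counts per site, keep only sites whose total exceeds 100.
sums = (session.query(web.c.name, func.sum(acc.c.count).label('nums'))
        .join(acc, web.c.id == acc.c.site_id)
        .group_by(web.c.name)
        .having(func.sum(acc.c.count) > 100)
        .all())
print(sums)  # [('Google', 275)]
```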
EXISTS/NOT EXISTS
Raw SQL EXISTS
>>> SQL = "SELECT Websites.name, Websites.url FROM Websites \
... WHERE EXISTS (SELECT count FROM access_log WHERE Websites.id = access_log.site_id AND count > 200)"
>>> pd.read_sql(SQL,engine)
name url
0 Google https://www.google.cm/
1 菜鳥教程 http://www.runoob.com/
2 Facebook https://www.facebook.com/
ORM
>>> ret = session.query().filter(web.c.id==acc.c.site_id,acc.c.count>200).exists()
>>> ret = session.query(web.c.name,web.c.url).filter(ret)
>>> pd.DataFrame(ret.all())
name url
0 Google https://www.google.cm/
1 菜鳥教程 http://www.runoob.com/
2 Facebook https://www.facebook.com/
>>> # another approach, closer to the raw SQL
>>> from sqlalchemy.sql import exists
>>> ret = exists([acc.c.count]).where(and_(web.c.id==acc.c.site_id,acc.c.count>200))
>>> ret = session.query(web.c.name,web.c.url).filter(ret)
>>> pd.DataFrame(ret.all())
name url
0 Google https://www.google.cm/
1 菜鳥教程 http://www.runoob.com/
2 Facebook https://www.facebook.com/
Raw SQL NOT EXISTS
>>> SQL = "SELECT Websites.name, Websites.url FROM Websites \
... WHERE NOT EXISTS (SELECT count FROM access_log WHERE Websites.id = access_log.site_id AND count > 200)"
>>> pd.read_sql(SQL,engine)
name url
0 淘寶 https://www.taobao.com/
1 微博 http://weibo.com/
2 百度 https://www.baidu.com/
3 stackoverflow http://stackoverflow.com/
ORM
>>> ret = session.query().filter(web.c.id==acc.c.site_id,acc.c.count>200).exists()
>>> ret = session.query(web.c.name,web.c.url).filter(~ret)
>>> pd.DataFrame(ret.all())
name url
0 淘寶 https://www.taobao.com/
1 微博 http://weibo.com/
2 百度 https://www.baidu.com/
3 stackoverflow http://stackoverflow.com/
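To close, the EXISTS/NOT EXISTS pair in one runnable sketch. This uses the `exists()` construct (which works across SQLAlchemy versions) and reuses the correlated condition plain and negated:

```python
from sqlalchemy import (Column, Integer, MetaData, String, Table, and_,
                        create_engine, exists)
from sqlalchemy.orm import sessionmaker

engine = create_engine('sqlite://')
metadata = MetaData()
websites = Table('websites', metadata,
                 Column('id', Integer, primary_key=True),
                 Column('name', String(255)))
access_log = Table('access_log', metadata,
                   Column('aid', Integer, primary_key=True),
                   Column('site_id', Integer),
                   Column('count', Integer))
metadata.create_all(engine)
with engine.begin() as conn:
    conn.execute(websites.insert(), [{'id': 1, 'name': 'Google'},
                                     {'id': 2, 'name': 'taobao'}])
    conn.execute(access_log.insert(), [
        {'aid': 1, 'site_id': 1, 'count': 230},
        {'aid': 2, 'site_id': 2, 'count': 10},
    ])
session = sessionmaker(bind=engine)()
web, acc = websites.alias('web'), access_log.alias('acc')

# Build the correlated EXISTS once, then use it plain and negated with "~".
hit = exists().where(and_(web.c.id == acc.c.site_id, acc.c.count > 200))
with_hits = session.query(web.c.name).filter(hit).all()
without_hits = session.query(web.c.name).filter(~hit).all()
print(with_hits, without_hits)  # [('Google',)] [('taobao',)]
```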