大概流程
一段對SQL執(zhí)行完整的一套代碼攻询。分為四個步驟:
總結來說Calcite有以下主要功能:
- SQL 解析
- SQL 校驗
- 查詢優(yōu)化
- SQL 生成器
-
數(shù)據(jù)連接
Calcite 解析SQl的步驟:
Calcite 解析步驟
如上圖中所述,一般來說Calcite解析SQL有以下幾步:
- Parser. 此步中Calcite通過Java CC將SQL解析成未經(jīng)校驗的AST
- Validate. 該步驟主要作用是校證Parser步驟中的AST是否合法,如驗證SQL scheme州弟、字段钧栖、函數(shù)等是否存在; SQL語句是否合法等. 此步完成之后就生成了RelNode樹(關于RelNode樹, 請參考下文)
- Optimize. 該步驟主要的作用優(yōu)化RelNode樹, 并將其轉化成物理執(zhí)行計劃。主要涉及SQL規(guī)則優(yōu)化如:基于規(guī)則優(yōu)化(RBO)及基于代價(CBO)優(yōu)化; Optimze 這一步原則上來說是可選的, 通過Validate后的RelNode樹已經(jīng)可以直接轉化物理執(zhí)行計劃婆翔,但現(xiàn)代的SQL解析器基本上都包括有這一步拯杠,目的是優(yōu)化SQL執(zhí)行計劃。此步得到的結果為物理執(zhí)行計劃啃奴。
- Execute潭陪,即執(zhí)行階段。此階段主要做的是:將物理執(zhí)行計劃轉化成可在特定的平臺執(zhí)行的程序。如Hive與Flink都在在此階段將物理執(zhí)行計劃CodeGen生成相應的可執(zhí)行代碼依溯。
Calcite相關組件
Calcite主要有以下概念:
- Catalog: 主要定義SQL語義相關的元數(shù)據(jù)與命名空間老厌。
- SQL parser: 主要是把SQL轉化成AST.
- SQL validator: 通過Catalog來校證AST.
- Query optimizer: 將AST轉化成物理執(zhí)行計劃、優(yōu)化物理執(zhí)行計劃.
- SQL generator: 反向將物理執(zhí)行計劃轉化成SQL語句.
Convert query to SqlNode
public class Test {
public static void main(String[] args) throws SqlParseException {
// Convert query to SqlNode
String sql = "select price from transactions";
//構建config黎炉,有默認的構造也有帶參數(shù)的config 返回ConfigImpl 有默認的實現(xiàn)
Config config = SqlParser.configBuilder().build();
//根據(jù)config構建sql解析器 reader是SourceStringReader,String流
SqlParser parser = SqlParser.create(sql, config);
//構建parse tree
SqlNode node = parser.parseQuery();
}
}
上述代碼流程:
調(diào)用SqlParser將SQL語句生成SQL Tree枝秤。這部分是Java CC基于Parser.jj文件模板來實現(xiàn)的,輸出為SqlNode的Tree慷嗜。
詳細分解
/** Implementation of
* {@link Config}.
* Called by builder; all values are in private final fields. */
private static class ConfigImpl implements Config {
private final int identifierMaxLength; //最大長度標識
private final boolean caseSensitive;
//有效SQL兼容模式的枚舉
private final SqlConformance conformance;
private final Casing quotedCasing; //Casing是枚舉類型,大小寫,未改變
private final Casing unquotedCasing;
private final Quoting quoting; //引用
private final SqlParserImplFactory parserFactory; //
//私有構造函數(shù)
private ConfigImpl(int identifierMaxLength, Casing quotedCasing,
Casing unquotedCasing, Quoting quoting, boolean caseSensitive,
SqlConformance conformance, SqlParserImplFactory parserFactory) {
this.identifierMaxLength = identifierMaxLength;
this.caseSensitive = caseSensitive;
this.conformance = Objects.requireNonNull(conformance);
this.quotedCasing = Objects.requireNonNull(quotedCasing);
this.unquotedCasing = Objects.requireNonNull(unquotedCasing);
this.quoting = Objects.requireNonNull(quoting);
this.parserFactory = Objects.requireNonNull(parserFactory);
}
追溯Config,有默認的ConfigImpl實現(xiàn),ConfigImpl是SqlParser的內(nèi)部類,通過builder來構建淀弹。ConfigBuilder里邊都有默認的Config配置。
/** Builder for a {@link Config}. */
public static class ConfigBuilder {
private Casing quotedCasing = Lex.ORACLE.quotedCasing;
private Casing unquotedCasing = Lex.ORACLE.unquotedCasing;
private Quoting quoting = Lex.ORACLE.quoting;
private int identifierMaxLength = DEFAULT_IDENTIFIER_MAX_LENGTH;
private boolean caseSensitive = Lex.ORACLE.caseSensitive;
private SqlConformance conformance = SqlConformanceEnum.DEFAULT;
private SqlParserImplFactory parserFactory = SqlParserImpl.FACTORY;
在這里講一下SqlParserImplFactory
public interface SqlParserImplFactory {
/**
* Get the underlying parser implementation.
*
* @return {@link SqlAbstractParserImpl} object.
*/
SqlAbstractParserImpl getParser(Reader stream);
}
SqlAbstractParserImpl的代理庆械,Parse的代理薇溃,這里可以自定義Parse的實現(xiàn)
接下來是生成SQLParser
public static SqlParser create(Reader reader, Config config) {
SqlAbstractParserImpl parser =
//config里邊得到Factory
config.parserFactory().getParser(reader);
return new SqlParser(parser, config);
}
SqlParserImplFactory.java
/**
* Get the underlying parser implementation.
*
* @return {@link SqlAbstractParserImpl} object.
*/
SqlAbstractParserImpl getParser(Reader stream);
這里會返回SqlAbstractParserImpl具體的實現(xiàn)類。
然后
//解析select語句
public SqlNode parseQuery() throws SqlParseException {
return parser.parseSqlStmtEof();
//略過
}
構建parse tree SqlNode即為生成的AST的根節(jié)點缭乘,代表了整個抽象語法樹沐序。
我們來看看Parser.jj的具體的解析過程源碼:
在SqlParserImpl.java。讀Parser.jj stream的形式讀取忿峻。SQL解析器,由JavaCC從Parser.jj生成辕羽。
/**
* Parses an SQL statement followed by the end-of-file symbol.
* 解析SQL語句逛尚,后跟文件結束符號。
*/
final public SqlNode SqlStmtEof() throws ParseException {
SqlNode stmt;
stmt = SqlStmt();
jj_consume_token(0);
{if (true) return stmt;}
throw new Error("Missing return statement in function");
}
接著來看:
SqlStmt();這里會有具體的解析過程,讀者可以自行閱讀相應源碼刁愿。代碼最終會生成SqlNode
/**
* Parses an SQL statement.
* 解析SQL語句绰寞。
*/
final public SqlNode SqlStmt() throws ParseException {
SqlNode stmt;
if (jj_2_34(2)) {
stmt = SqlSetOption(Span.of(), null);
} else if (jj_2_35(2)) {
stmt = SqlAlter();
} else if (jj_2_36(2)) {
stmt = OrderedQueryOrExpr(ExprContext.ACCEPT_QUERY);
} else if (jj_2_37(2)) {
stmt = SqlExplain();
} else if (jj_2_38(2)) {
stmt = SqlDescribe();
} else if (jj_2_39(2)) {
stmt = SqlInsert();
} else if (jj_2_40(2)) {
stmt = SqlDelete();
} else if (jj_2_41(2)) {
stmt = SqlUpdate();
} else if (jj_2_42(2)) {
stmt = SqlMerge();
} else if (jj_2_43(2)) {
stmt = SqlProcedureCall();
} else {
jj_consume_token(-1);
throw new ParseException();
}
{if (true) return stmt;}
throw new Error("Missing return statement in function");
}
Convert SqlNode to RelNode
//VolcanoPlanner會根據(jù)動態(tài)算法優(yōu)化查詢 cost factory
VolcanoPlanner planner = new VolcanoPlanner();
//行表達式代理 一些常見的字符值會被緩存
RexBuilder rexBuilder = createRexBuilder();
//優(yōu)化查詢期間的環(huán)境 RelOptPlanner會把相關的表達式轉換成語義上的表達式
RelOptCluster cluster = RelOptCluster.create(planner, rexBuilder);
//創(chuàng)建converter
SqlToRelConverter converter = new SqlToRelConverter(...);
//通常root為tree的根節(jié)點
RelRoot root = converter.convertQuery(node, false, true);
上述代碼流程
SqlToRelConverter將SQL Tree轉化為Calcite中的RelNode。雖然兩種Node都是類似于Tree的形式铣口,但是表示的含義不同滤钱。SqlNode有很多種,既包括MIN脑题、MAX這種表達式型的件缸,也包括SELECT、JOIN這種關系型的叔遂,轉化過程中他炊,將這兩種分離成RelNode關系型和RexNode表達式型。
詳細分解
VolcanoPlanner.java
/**
* Creates a {@code VolcanoPlanner} with a given cost factory.
*/
public VolcanoPlanner(RelOptCostFactory costFactory, //
Context externalContext) {
super(costFactory == null ? VolcanoCost.FACTORY : costFactory, //
externalContext);
this.zeroCost = this.costFactory.makeZeroCost();
}
RexBuilder.java
行表達式代理 一些常見的字符值會被緩存 (NULL, TRUE, FALSE, 0, 1, '') are cached
public RexBuilder(RelDataTypeFactory typeFactory) {
this.typeFactory = typeFactory;
this.booleanTrue =
makeLiteral(
Boolean.TRUE,
typeFactory.createSqlType(SqlTypeName.BOOLEAN),
SqlTypeName.BOOLEAN);
this.booleanFalse =
makeLiteral(
Boolean.FALSE,
typeFactory.createSqlType(SqlTypeName.BOOLEAN),
SqlTypeName.BOOLEAN);
this.charEmpty =
makeLiteral(
new NlsString("", null, null),
typeFactory.createSqlType(SqlTypeName.CHAR, 0),
SqlTypeName.CHAR);
this.constantNull =
makeLiteral(
null,
typeFactory.createSqlType(SqlTypeName.NULL),
SqlTypeName.NULL);
}
基本都是對于一些常量的初始化已艰。沒有多說的痊末。
RelOptCluster.java
優(yōu)化查詢期間的環(huán)境 RelOptPlanner會把相關的表達式轉換成語義上的表達式
RelOptCluster(RelOptPlanner planner, RelDataTypeFactory typeFactory,
RexBuilder rexBuilder, AtomicInteger nextCorrel,
Map<String, RelNode> mapCorrelToRel) {
this.nextCorrel = nextCorrel;
this.mapCorrelToRel = mapCorrelToRel;
this.planner = Objects.requireNonNull(planner);
this.typeFactory = Objects.requireNonNull(typeFactory);
this.rexBuilder = rexBuilder;
this.originalExpression = rexBuilder.makeLiteral("?");
// set up a default rel metadata provider,
// giving the planner first crack at everything
//元數(shù)據(jù) 它提供了標準邏輯代數(shù)的一般公式和推導規(guī)則
setMetadataProvider(DefaultRelMetadataProvider.INSTANCE);
//特質
this.emptyTraitSet = planner.emptyTraitSet();
assert emptyTraitSet.size() == planner.getRelTraitDefs().size();
}
DefaultRelMetadataProvider構造函數(shù)提供很多規(guī)則,可以自己去看哩掺。
protected DefaultRelMetadataProvider() {
super(
ImmutableList.of(
RelMdPercentageOriginalRows.SOURCE,
RelMdColumnOrigins.SOURCE,
RelMdExpressionLineage.SOURCE,
RelMdTableReferences.SOURCE,
RelMdNodeTypes.SOURCE,
RelMdRowCount.SOURCE,
RelMdMaxRowCount.SOURCE,
RelMdMinRowCount.SOURCE,
RelMdUniqueKeys.SOURCE,
RelMdColumnUniqueness.SOURCE,
RelMdPopulationSize.SOURCE,
RelMdSize.SOURCE,
RelMdParallelism.SOURCE,
RelMdDistribution.SOURCE,
RelMdMemory.SOURCE,
RelMdDistinctRowCount.SOURCE,
RelMdSelectivity.SOURCE,
RelMdExplainVisibility.SOURCE,
RelMdPredicates.SOURCE,
RelMdAllPredicates.SOURCE,
RelMdCollation.SOURCE));
}
回到RelOptCluster凿叠,繼續(xù)往下走
SqlToRelConverter.java
創(chuàng)建converter 完成SqlNode 到 RelNode的轉換
@Deprecated // to be removed before 2.0
public SqlToRelConverter(
RelOptTable.ViewExpander viewExpander,
SqlValidator validator, //SQL校驗器
Prepare.CatalogReader catalogReader,
RelOptPlanner planner,
RexBuilder rexBuilder,
SqlRexConvertletTable convertletTable) {
this(viewExpander, validator, catalogReader,
RelOptCluster.create(planner, rexBuilder), convertletTable,
Config.DEFAULT);
}
接下來完成SqlNode 到RelNode轉換
converter.convertQuery(node, false, true);
這個過程中,有校驗的方法,我們一起來看看
public SqlNode validate(SqlNode topNode) {
SqlValidatorScope scope = new EmptyScope(this);
scope = new CatalogScope(scope, ImmutableList.of("CATALOG"));
final SqlNode topNode2 = validateScopedExpression(topNode, scope);
final RelDataType type = getValidatedNodeType(topNode2);
Util.discard(type);
return topNode2;
}
看看怎么具體做validate:
先來看看Scope:
上圖
圖上,有很多個***Scope盒件。
convertQuery()方法中有
RelNode result = convertQueryRecursive(query, top, null).rel;
得到RelRoot(RelNode樹的根節(jié)點)
protected RelRoot convertQueryRecursive(SqlNode query, boolean top,
RelDataType targetRowType) {
final SqlKind kind = query.getKind();
switch (kind) {
case SELECT:
return RelRoot.of(convertSelect((SqlSelect) query, top), kind);
case INSERT:
return RelRoot.of(convertInsert((SqlInsert) query), kind);
case DELETE:
return RelRoot.of(convertDelete((SqlDelete) query), kind);
case UPDATE:
return RelRoot.of(convertUpdate((SqlUpdate) query), kind);
case MERGE:
return RelRoot.of(convertMerge((SqlMerge) query), kind);
case UNION:
case INTERSECT:
case EXCEPT:
return RelRoot.of(convertSetOp((SqlCall) query), kind);
case WITH:
return convertWith((SqlWith) query, top);
case VALUES:
return RelRoot.of(convertValues((SqlCall) query, targetRowType), kind);
default:
throw new AssertionError("not a query: " + query);
}
}
我們來看:RelRoot.of(convertSelect((SqlSelect) query, top), kind);
/**
* Converts a SELECT statement's parse tree into a relational expression.
*/
public RelNode convertSelect(SqlSelect select, boolean top) {
final SqlValidatorScope selectScope = validator.getWhereScope(select);
final Blackboard bb = createBlackboard(selectScope, null, top);
convertSelectImpl(bb, select);
return bb.root;
}
中間有一些方法沒有看懂蹬碧,稍后補上。
來看:
RelNode optimized = planner.findBestExp();
public RelNode findBestExp() {
ensureRootConverters();
registerMaterializations();
int cumulativeTicks = 0;
for (VolcanoPlannerPhase phase : VolcanoPlannerPhase.values()) {
setInitialImportance();
RelOptCost targetCost = costFactory.makeHugeCost();
int tick = 0;
int firstFiniteTick = -1;
int splitCount = 0;
int giveUpTick = Integer.MAX_VALUE;
while (true) {
++tick;
++cumulativeTicks;
if (root.bestCost.isLe(targetCost)) {
if (firstFiniteTick < 0) {
firstFiniteTick = cumulativeTicks;
clearImportanceBoost();
}
if (ambitious) {
// Choose a slightly more ambitious target cost, and
// try again. If it took us 1000 iterations to find our
// first finite plan, give ourselves another 100
// iterations to reduce the cost by 10%.
targetCost = root.bestCost.multiplyBy(0.9);
++splitCount;
if (impatient) {
if (firstFiniteTick < 10) {
// It's possible pre-processing can create
// an implementable plan -- give us some time
// to actually optimize it.
giveUpTick = cumulativeTicks + 25;
} else {
giveUpTick =
cumulativeTicks
+ Math.max(firstFiniteTick / 10, 25);
}
}
} else {
break;
}
} else if (cumulativeTicks > giveUpTick) {
// We haven't made progress recently. Take the current best.
break;
} else if (root.bestCost.isInfinite() && ((tick % 10) == 0)) {
injectImportanceBoost();
}
LOGGER.debug("PLANNER = {}; TICK = {}/{}; PHASE = {}; COST = {}",
this, cumulativeTicks, tick, phase.toString(), root.bestCost);
VolcanoRuleMatch match = ruleQueue.popMatch(phase);
if (match == null) {
break;
}
assert match.getRule().matches(match);
match.onMatch();
// The root may have been merged with another
// subset. Find the new root subset.
root = canonize(root);
}
ruleQueue.phaseCompleted(phase);
}
if (LOGGER.isTraceEnabled()) {
StringWriter sw = new StringWriter();
final PrintWriter pw = new PrintWriter(sw);
dump(pw);
pw.flush();
LOGGER.trace(sw.toString());
}
RelNode cheapest = root.buildCheapestPlan(this);
if (LOGGER.isDebugEnabled()) {
LOGGER.debug(
"Cheapest plan:\n{}", RelOptUtil.toString(cheapest, SqlExplainLevel.ALL_ATTRIBUTES));
LOGGER.debug("Provenance:\n{}", provenance(cheapest));
}
return cheapest;
}
細節(jié)稍后補上履恩。
來看:
Optimize RelNode
RelNode optimized = planner.findBestExp();