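When we build a full-text search index in Elasticsearch (ES), we define a mapping, write the search logic against that mapping, and then tune it. For article-like data from any business team, we can fix the shared attributes (title, subtitle, content, author, and so on) as common mapping fields, and basic search then works almost immediately. The data itself, however, varies from team to team. The same title-like field is called title by one team and head by another; some teams want the searchable title to be the first-level title + second-level title + tags. This complicates data import: every business team needs its own hand-written transformation and processing logic, which consumes a lot of engineering time and makes every release and update painful.
So the goal is a way to simplify this process: make data import easier and reduce the manual work.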
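Making data import easier is the ultimate business requirement. Here I focus on one part of it, writing transformation and processing logic for each business team, and describe my approach.
{
  "code" : 200,
  "msg" : "success",
  "result" : {
    "data" : [
      {
        "id": 2,
        "type" : "cat",
        "title_one": "beautiful animal",
        "title_two": "pets",
        "content" : "i have a cat"
      }
    ]
  }
}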
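When users search by title, they want both "title_one" and "title_two" to be searched. So we need a facility that operates on the user's data, combines "title_one" and "title_two", and produces a new field holding the combined value.
This logic used to be written in code. Code is the most flexible option and can satisfy any requirement. To let developers keep that flexibility, I decided to keep supporting them with "code", in the style of a rule engine: developers write rules, and the rule engine performs the actual operations.
This breaks down into: rule parsing + rule execution.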
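To make the requirement concrete, here is a minimal sketch of the hand-written transformation that each business team used to need. It assumes the record has already been parsed into a Map; the class name TitleMerger, the target field name "title", and the space separator are all illustrative choices, not part of the original system.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: the names and the separator are assumptions for illustration.
final class TitleMerger {
    // Returns a copy of the record with a new field that joins the given source fields.
    // Missing or null source fields are skipped; the input record is not modified.
    static Map<String, Object> withMergedField(Map<String, Object> record, String target,
                                               String separator, String... sources) {
        StringBuilder merged = new StringBuilder();
        for (String source : sources) {
            Object value = record.get(source);
            if (value == null) {
                continue;
            }
            if (merged.length() > 0) {
                merged.append(separator);
            }
            merged.append(value);
        }
        Map<String, Object> result = new LinkedHashMap<>(record);
        result.put(target, merged.toString());
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("id", 2);
        doc.put("title_one", "beautiful animal");
        doc.put("title_two", "pets");
        System.out.println(withMergedField(doc, "title", " ", "title_one", "title_two"));
    }
}
```

Every new field name or combination rule means another method like this plus a release, which is the cost the rule-based approach is meant to remove.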
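● File reader: configuration file readers, and program analysis tools such as method-call analyzers. Examples: Java's class file loader, aroma.
● Generator: collects information from internal data structures and produces output. Examples: object-relational mapping tools, serializers, source code generators, web page generators.
● Translator: = reader + generator. Examples: code instrumentation tools, assemblers, and compilers.
● Interpreter: reads a file, decodes it, and executes the instructions. Examples: calculators, the Python implementation.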
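Implementing a C-style logic language would be too hard for me, so I borrowed from Lisp and adopted a simple syntactic structure: the S-expression. I won't go into Lisp here; I haven't learned it well either.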
concat(max(list(2, 1, $user_data.clicknumber)), "something")
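Assuming $user_data.clicknumber is 100, this expression evaluates to 100something, a JSON string.
This is the simplest example. JSON also has objects, lists, and other data structures, and all of them must be supported without loss.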
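The split into rule parsing and rule execution can be sketched without any parser generator. The code below is a hand-written recursive-descent parser, not the ANTLR-based implementation described in this post. It supports only list, max, concat, integer literals, string literals, and $variables, and it assumes (purely for illustration) that variables are looked up by name in a flat Map. The class name MiniExpr is hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical sketch, not the ANTLR-based implementation from this post.
final class MiniExpr {
    // A compiled expression: evaluates against one row of variable values.
    interface Expr extends Function<Map<String, Object>, Object> {}

    private final String src;
    private int pos = 0;

    private MiniExpr(String src) { this.src = src; }

    // Parses once; the returned Expr can then be applied to many rows.
    static Expr compile(String text) {
        MiniExpr parser = new MiniExpr(text);
        Expr expr = parser.parseExpr();
        parser.skipSpaces();
        if (parser.pos != text.length()) throw new IllegalArgumentException("trailing input at " + parser.pos);
        return expr;
    }

    private Expr parseExpr() {
        skipSpaces();
        if (pos >= src.length()) throw new IllegalArgumentException("unexpected end of input");
        char c = src.charAt(pos);
        if (c == '"') { // string literal; escapes are not handled in this sketch
            int end = src.indexOf('"', pos + 1);
            if (end < 0) throw new IllegalArgumentException("unterminated string");
            String value = src.substring(pos + 1, end);
            pos = end + 1;
            return row -> value;
        }
        if (c == '-' || Character.isDigit(c)) { // integer literal
            int start = pos++;
            while (pos < src.length() && Character.isDigit(src.charAt(pos))) pos++;
            long value = Long.parseLong(src.substring(start, pos));
            return row -> value;
        }
        if (c == '$') { // variable: the name after '$' is looked up in the row
            String name = readName().substring(1);
            return row -> row.get(name);
        }
        String function = readName(); // otherwise: function(arg, arg, ...), at least one arg
        expect('(');
        List<Expr> args = new ArrayList<>();
        args.add(parseExpr());
        for (skipSpaces(); pos < src.length() && src.charAt(pos) == ','; skipSpaces()) {
            pos++;
            args.add(parseExpr());
        }
        expect(')');
        return call(function, args);
    }

    private static Expr call(String function, List<Expr> args) {
        switch (function) {
            case "list":
                return row -> args.stream().map(a -> a.apply(row)).collect(Collectors.toList());
            case "max": // expects one argument that evaluates to a non-empty list of numbers
                return row -> ((List<?>) args.get(0).apply(row)).stream()
                        .mapToLong(n -> ((Number) n).longValue()).max().getAsLong();
            case "concat":
                return row -> args.stream().map(a -> String.valueOf(a.apply(row))).collect(Collectors.joining());
            default:
                throw new IllegalArgumentException("unknown function: " + function);
        }
    }

    private void skipSpaces() {
        while (pos < src.length() && Character.isWhitespace(src.charAt(pos))) pos++;
    }

    private String readName() { // letters, digits, '_', '.', '$'
        int start = pos;
        while (pos < src.length() && (Character.isLetterOrDigit(src.charAt(pos)) || "_.$".indexOf(src.charAt(pos)) >= 0)) pos++;
        if (start == pos) throw new IllegalArgumentException("unexpected character at " + pos);
        return src.substring(start, pos);
    }

    private void expect(char expected) {
        skipSpaces();
        if (pos >= src.length() || src.charAt(pos) != expected) throw new IllegalArgumentException("expected '" + expected + "' at " + pos);
        pos++;
    }

    public static void main(String[] args) {
        Map<String, Object> row = new HashMap<>();
        row.put("user_data.clicknumber", 100);
        Expr expr = compile("concat(max(list(2, 1, $user_data.clicknumber)), \"something\")");
        System.out.println(expr.apply(row)); // prints 100something
    }
}
```

The point of the design is that compile runs once per rule, while the returned function runs once per record, so parsing cost is not paid on every document.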
grammar CalcRefactorV1;
start : expr; // start rule, EOF if needed
// no type checking
expr : list_expr
    | not_list_expr
    | compare_expr
    | condition
    | '(' expr ')'
    ;
list_expr: LIST '(' not_list_expr (',' not_list_expr)* ')' # ListCons
    | FLATTEN '(' list_expr (',' list_expr)* ')' # ListFlatten
    | LIST_VARIABLE # ListVar
//  | SUB '(' list_expr ')' # ListSub
    ;
not_list_expr : NOT_LIST_VARIABLE # NotListVar
    | CONCAT '(' not_list_expr (',' not_list_expr)* ')' # NotListConcat // one or more
//  | CONCAT '(' list_expr ')'
    | SIZE '(' list_expr ')' # ListSize
    | JOIN '(' not_list_expr ',' list_expr ')' # ListJoin
    | SUM '(' list_expr ')' # ListSum
    | SUM '(' not_list_expr (',' not_list_expr)* ')' # NotListSum
    | NUM # Num
    | STRING # Str
    ;
//compare_expr : COMPARE '(' expr ',' expr ')' # Compare ;
compare_expr : COMPARE '(' not_list_expr ',' not_list_expr ')' # Compare ;
condition : CONDITION '(' compare_expr ',' expr ',' expr ')' # Condi ;
//condition : CONDITION '(' compare_expr ',' not_list_expr ',' not_list_expr ')' # Condi ;
// simple literal values
//value: NUM
CONCAT: 'concat';
FLATTEN: 'flatten';
//SUB: 'sub';
SUM: 'sum';
LIST: 'list';
SIZE: 'size';
JOIN: 'join';
CONDITION: 'condition';
COMPARE: 'compare';
// list variable: $.[*] $.[*].xxx $.xx.yy12.[*] $.[*].xxx
// \$(\.[a-zA-Z_][a-zA-Z0-9_]*)*\.\[\*\]((\.[a-zA-Z_][a-zA-Z0-9_]*)|(\.\[\*\]))* ........ || could everything from here on be dropped? For performance's sake, expressions this complex should not be written anyway
//LIST_VARIABLE: '$'('.'(ALPHA_)ALPHA__DIGIT*)*('.[*]'|('.'((ALPHA_)ALPHA__DIGIT*)?'['INT':'INT']'))(('.'(ALPHA_)ALPHA__DIGIT*)|'.[*]'|('.'((ALPHA_)ALPHA__DIGIT*)?'['INT':'INT']'))* ;
LIST_VARIABLE: '$'('.'ALPHA__DIGIT+)*('.[*]'|('.'(ALPHA__DIGIT+)?'['INT':'INT']'))(('.'ALPHA__DIGIT+)|'.[*]'|('.'(ALPHA__DIGIT+)?'['INT':'INT']'))* ;
// $.lll.
NUM: '-'?(DIGIT+ | DIGIT+'.'DIGIT+ | '.'DIGIT+);
STRING: '"'(ESC|.)*?'"';
WS: [ \t\n\r]+ -> skip;
DIGIT: [0-9];
INT: '0'|'-'?[1-9]DIGIT*;
ALPHA: [a-zA-Z];
ESC: '\\"' | '\\\\';
// $.lll.
In addition, I added a processing step to the pom so that the ANTLR code is generated from the grammar before every compile.
package .................functions;
import ...........GsonJsonPathUtils;
import com.google.gson.JsonElement;
import com.google.gson.JsonPrimitive;
import org.antlr.v4.runtime.CharStreams;
import org.antlr.v4.runtime.CodePointCharStream;
import org.antlr.v4.runtime.CommonTokenStream;
import org.antlr.v4.runtime.tree.ParseTree;
import java.util.LinkedList;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;
import .....................gen.*;
/**
 * Visitor that compiles a user-defined expression (sum, size, etc.) into a function that computes its result.
 * todo: optimize
 */
public class CalculationVisitor extends CalcRefactorV1BaseVisitor<Function<String, JsonElement>> {

    /**
     * Entry point exposed to the upper layers.
     *
     * @param expr the user-defined expression string
     * @return a function that evaluates the expression against a JSON string
     */
    public static Function<String, JsonElement> parseExpr(String expr) {
        CodePointCharStream input = CharStreams.fromString(expr);
        CalcRefactorV1Lexer calcLexer = new CalcRefactorV1Lexer(input);
        CommonTokenStream commonTokenStream = new CommonTokenStream(calcLexer);
        CalcRefactorV1Parser calcParser = new CalcRefactorV1Parser(commonTokenStream);
        ParseTree tree = calcParser.expr();
        CalculationVisitor visitor = new CalculationVisitor();
        return visitor.visit(tree);
    }

    /**
     * @param ctx the parse tree
     * @return func
     */
    @Override
    public Function<String, JsonElement> visitListCons(CalcRefactorV1Parser.ListConsContext ctx) {
        List<CalcRefactorV1Parser.Not_list_exprContext> exprs = ctx.not_list_expr();
        List<Function<String, JsonElement>> argsFuncs = exprs.stream().map(this::visit).collect(Collectors.toList());
        return new ListCons(argsFuncs);
    }

    /**
     * @param ctx the parse tree
     * @return func
     */
    @Override
    public Function<String, JsonElement> visitListFlatten(CalcRefactorV1Parser.ListFlattenContext ctx) {
        List<CalcRefactorV1Parser.List_exprContext> listExprContexts = ctx.list_expr();
        List<Function<String, JsonElement>> argsFuncs = listExprContexts.stream().map(this::visit).collect(
                Collectors.toList());
        return new Flatten(argsFuncs);
    }

    /**
     * @param ctx the parse tree
     * @return func
     */
    @Override
    public Function<String, JsonElement> visitListVar(CalcRefactorV1Parser.ListVarContext ctx) {
        String variable = ctx.getText();
        return s -> GsonJsonPathUtils.read(s, variable);
    }

    /**
     * @param ctx the parse tree
     * @return func
     */
    @Override
    public Function<String, JsonElement> visitNotListVar(CalcRefactorV1Parser.NotListVarContext ctx) {
        String variable = ctx.getText();
        return s -> GsonJsonPathUtils.read(s, variable);
    }

    /**
     * @param ctx the parse tree
     * @return func
     */
    @Override
    public Function<String, JsonElement> visitNotListConcat(CalcRefactorV1Parser.NotListConcatContext ctx) {
        List<CalcRefactorV1Parser.Not_list_exprContext> notListExprContexts = ctx.not_list_expr();
        List<Function<String, JsonElement>> argsFuncs = notListExprContexts.stream().map(this::visit).collect(
                Collectors.toList());
        return new NotListConcat(argsFuncs);
    }

    /**
     * @param ctx the parse tree
     * @return func
     */
    @Override
    public Function<String, JsonElement> visitListSize(CalcRefactorV1Parser.ListSizeContext ctx) {
        CalcRefactorV1Parser.List_exprContext listExprContext = ctx.list_expr();
        Function<String, JsonElement> argsFunc = visit(listExprContext);
        return new Size(argsFunc);
    }

    /**
     * @param ctx the parse tree
     * @return func
     */
    @Override
    public Function<String, JsonElement> visitListJoin(CalcRefactorV1Parser.ListJoinContext ctx) {
        CalcRefactorV1Parser.Not_list_exprContext notListExprContext = ctx.not_list_expr();
        CalcRefactorV1Parser.List_exprContext listExprContext = ctx.list_expr();
        Function<String, JsonElement> separatorFunc = visit(notListExprContext);
        Function<String, JsonElement> arrayFunc = visit(listExprContext);
        return new Join(separatorFunc, arrayFunc);
    }

    /**
     * @param ctx the parse tree
     * @return func
     */
    @Override
    public Function<String, JsonElement> visitListSum(CalcRefactorV1Parser.ListSumContext ctx) {
        Function<String, JsonElement> functions = visit(ctx.list_expr());
        return new ListSum(functions);
    }

    /**
     * @param ctx the parse tree
     * @return func
     */
    @Override
    public Function<String, JsonElement> visitNotListSum(CalcRefactorV1Parser.NotListSumContext ctx) {
        List<CalcRefactorV1Parser.Not_list_exprContext> notListExprContexts = ctx.not_list_expr();
        List<Function<String, JsonElement>> ans = new LinkedList<>();
        for (CalcRefactorV1Parser.Not_list_exprContext notListExprContext : notListExprContexts) {
            Function<String, JsonElement> visit = visit(notListExprContext);
            ans.add(visit); // collect each compiled argument; without this the sum has no operands
        }
        return new NotListSum(ans);
    }

    /**
     * @param ctx the parse tree
     * @return func
     */
    @Override
    public Function<String, JsonElement> visitNum(CalcRefactorV1Parser.NumContext ctx) {
        // NUM also matches decimals such as 1.5, which Integer.parseInt would reject.
        java.math.BigDecimal value = new java.math.BigDecimal(ctx.NUM().getText());
        return s -> new JsonPrimitive(value);
    }

    /**
     * @param ctx the parse tree
     * @return func
     */
    @Override
    public Function<String, JsonElement> visitStr(CalcRefactorV1Parser.StrContext ctx) {
        // Strip the surrounding double quotes; escape sequences are kept as written.
        String text = ctx.getText();
        String value = text.substring(1, text.length() - 1);
        return s -> new JsonPrimitive(value);
    }
}
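The visitor returns objects such as ListCons, Join, and Size, whose source is not shown in this post. As a rough idea of what one of them could look like, here is a sketch of a join operation written as a Function. It is an assumption, not the post's code: the name JoinSketch is hypothetical, and it uses plain Java values (Object, List) in place of Gson's JsonElement so that it runs with the standard library alone.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical stand-in for a Join function object; plain Java values replace JsonElement.
final class JoinSketch implements Function<String, Object> {
    private final Function<String, Object> separatorFunc;
    private final Function<String, Object> arrayFunc;

    JoinSketch(Function<String, Object> separatorFunc, Function<String, Object> arrayFunc) {
        this.separatorFunc = separatorFunc;
        this.arrayFunc = arrayFunc;
    }

    // Evaluates both sub-expressions against the same input, then joins the list items.
    @Override
    public Object apply(String json) {
        Object separator = separatorFunc.apply(json);
        Object items = arrayFunc.apply(json);
        if (!(items instanceof List)) {
            throw new IllegalArgumentException("join expects a list, got: " + items);
        }
        return ((List<?>) items).stream()
                .map(item -> String.valueOf(item))
                .collect(Collectors.joining(String.valueOf(separator)));
    }

    public static void main(String[] args) {
        // The two lambdas stand in for compiled sub-expressions.
        JoinSketch join = new JoinSketch(json -> ",", json -> Arrays.asList("a", "b", 1));
        System.out.println(join.apply("{}")); // prints a,b,1
    }
}
```

Because every node is itself a Function, the visitor only wires nodes together; no parsing happens when the resulting function is applied to a record.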
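The antlr4 project includes many g4 grammar files, all contributed by the community.
Below is the grammar file for JSON. After it is some code I wrote to parse JSON and check whether it is well-formed.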
/** Taken from "The Definitive ANTLR 4 Reference" by Terence Parr */
// Derived from http://json.org
grammar JSON;
json
   : value EOF
   ;
obj
   : '{' pair (',' pair)* '}'
//   | '{' '}'
   ;
pair
   : STRING ':' value
   ;
arr
   : '[' value (',' value)* ']'
//   | '[' ']'
   ;
value
   : STRING
   | NUMBER
   | obj
   | arr
   | 'true'
   | 'false'
   | 'null'
   ;
STRING
   : '"' (ESC | SAFECODEPOINT)* '"'
   ;
fragment ESC
   : '\\' (["\\/bfnrt] | UNICODE)
   ;
fragment UNICODE
   : 'u' HEX HEX HEX HEX
   ;
fragment HEX
   : [0-9a-fA-F]
   ;
fragment SAFECODEPOINT
   : ~ ["\\\u0000-\u001F]
   ;
NUMBER
   : '-'? INT ('.' [0-9] +)? EXP?
   ;
fragment INT
   : '0' | [1-9] [0-9]*
   ;
// no leading zeros
fragment EXP
   : [Ee] [+\-]? INT
   ;
// \- since - means "range" inside [...]
WS
   : [ \t\n\r] + -> skip
   ;
import .......gen.*;
import org.antlr.v4.runtime.CharStreams;
import org.antlr.v4.runtime.CodePointCharStream;
import org.antlr.v4.runtime.CommonTokenStream;
import org.antlr.v4.runtime.tree.ParseTree;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.stream.Collectors;
public class JsonStructureVisitor extends JSONBaseVisitor<DataStructure> {

    public static DataStructure parseJson(String json) {
        CodePointCharStream input = CharStreams.fromString(json);
        JSONLexer lexer = new JSONLexer(input);
        CommonTokenStream commonTokenStream = new CommonTokenStream(lexer);
        JSONParser parser = new JSONParser(commonTokenStream);
        ParseTree tree = parser.json();
        JsonStructureVisitor visitor = new JsonStructureVisitor();
        return visitor.visit(tree);
    }

    @Override
    public DataStructure visitJson(JSONParser.JsonContext ctx) {
        if (Objects.equals(ctx.value().getText(), "null")) {
            return null;
        }
        return visit(ctx.value());
    }

    @Override
    public DataStructure visitObj(JSONParser.ObjContext ctx) {
        ObjectStructure obj = new ObjectStructure();
        for (JSONParser.PairContext pairContext : ctx.pair()) {
            DataStructure child = visit(pairContext.value());
            if (child != null) {
                // Assumed reconstruction (the body of this branch is missing from the listing):
                // record the key, with its quotes stripped, together with the child's structure.
                String key = pairContext.STRING().getText();
                obj.addField(key.substring(1, key.length() - 1));
                obj.addFieldType(child);
            }
        }
        return obj;
    }

    @Override
    public DataStructure visitPair(JSONParser.PairContext ctx) {
        return super.visitPair(ctx);
    }

    @Override
    public DataStructure visitArr(JSONParser.ArrContext ctx) {
        ListStructure listStructure = new ListStructure();
        // Group the element structures by their serialized form; more than one group means mixed types.
        Map<String, List<DataStructure>> collect = ctx.value().stream().map(this::visit).collect(
                Collectors.groupingBy(x -> GsonInstance.getGson().toJson(x)));
        if (collect.size() > 1) {
            throw new RuntimeException(String.format("list: %s may contain only one element type", ctx.getText()));
        }
        // Assumed reconstruction: use the single element structure as the list's item type.
        collect.values().stream().findFirst().ifPresent(items -> listStructure.setItemType(items.get(0)));
        return listStructure;
    }

    @Override
    public DataStructure visitValue(JSONParser.ValueContext ctx) {
        if (ctx.STRING() != null) {
            return new PrimitiveStructure("string");
        }
        if (ctx.NUMBER() != null) {
            return new PrimitiveStructure("number");
        }
        if (Objects.equals(ctx.getText(), "true") || Objects.equals(ctx.getText(), "false")) {
            return new PrimitiveStructure("boolean");
        }
        // obj and arr are handled by their own visit methods; 'null' yields null.
        return super.visitValue(ctx);
    }
}
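The same structure inference can be illustrated on JSON that has already been parsed into nested Map and List values, without ANTLR. The sketch below is only an illustration under that assumption: the class name StructureSketch and the textual signature format (for example list<string>) are made up here. It keeps the two rules visible in the visitor: null fields are skipped, and a list may contain only one element type.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.StringJoiner;

// Hypothetical sketch: describes the shape of already-parsed JSON as a string signature.
final class StructureSketch {
    static String describe(Object value) {
        if (value instanceof String) return "string";
        if (value instanceof Number) return "number";
        if (value instanceof Boolean) return "boolean";
        if (value instanceof List) {
            // Collect the distinct element signatures; a list must be homogeneous.
            Set<String> itemTypes = new LinkedHashSet<>();
            for (Object item : (List<?>) value) {
                itemTypes.add(describe(item));
            }
            if (itemTypes.size() > 1) {
                throw new IllegalArgumentException("list may contain only one element type: " + itemTypes);
            }
            return "list<" + (itemTypes.isEmpty() ? "unknown" : itemTypes.iterator().next()) + ">";
        }
        if (value instanceof Map) {
            StringJoiner fields = new StringJoiner(",", "object{", "}");
            for (Map.Entry<?, ?> entry : ((Map<?, ?>) value).entrySet()) {
                if (entry.getValue() == null) {
                    continue; // null fields carry no type information and are skipped
                }
                fields.add(entry.getKey() + ":" + describe(entry.getValue()));
            }
            return fields.toString();
        }
        throw new IllegalArgumentException("unsupported value: " + value);
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("id", 2);
        doc.put("tags", Arrays.asList("a", "b"));
        System.out.println(describe(doc)); // prints object{id:number,tags:list<string>}
    }
}
```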
We need to parse, inspect, and manipulate JSON data; for this I use Gson + jsonpath.
import com.jayway.jsonpath.Configuration;
import com.jayway.jsonpath.JsonPath;
import com.jayway.jsonpath.spi.json.GsonJsonProvider;
import com.jayway.jsonpath.spi.mapper.GsonMappingProvider;
import java.util.Map;
import java.util.Objects;
public class GsonJsonPathUtils {
    private static final Configuration config;

    static {
        Configuration.ConfigurationBuilder builder = Configuration.builder();
        builder.jsonProvider(new GsonJsonProvider());
        builder.mappingProvider(new GsonMappingProvider());
        config = builder.build();
    }

    /**
     * todo: unfinished method. Given a data structure, verify whether a path expression is valid.
     * Only object fields are walked so far.
     *
     * @param ds the data structure
     * @param jsonPathMap the json path map
     * @param path the json path
     */
    public static void constructAllJsonPath(DataStructure ds, Map<String, JsonPath> jsonPathMap, String path) {
        if (Objects.equals(ds.type(), DataStructure.object)) {
            ObjectStructure obj = (ObjectStructure) ds;
            for (int i = 0; i < obj.getFields().size(); i++) {
                String fieldName = obj.getField(i);
                String newP = path + "." + fieldName;
                // Compile the child's own path, not the parent's.
                jsonPathMap.put(newP, JsonPath.compile(newP));
                constructAllJsonPath(obj.getFieldsType(i), jsonPathMap, newP);
            }
        }
    }

    /**
     * @param jsObj Gson JSON object
     * @param jsPath compiled json path
     * @param <T> type of result, jsObj, jsArray, jsPrimitive
     * @return the value read from the JSON
     */
    public static <T> T read(Object jsObj, JsonPath jsPath) {
        return JsonPath.using(config).parse(jsObj).read(jsPath);
    }

    /**
     * @param jsObj Gson JSON object
     * @param jsPathStr json path that has not been compiled
     * @param <T> type of result, jsObj, jsArray, jsPrimitive
     * @return the value read from the JSON
     */
    public static <T> T read(Object jsObj, String jsPathStr) {
        return JsonPath.using(config).parse(jsObj).read(jsPathStr);
    }

    /**
     * @param str raw JSON string
     * @param jsPath compiled json path
     * @param <T> type of result, jsObj, jsArray, jsPrimitive
     * @return the value read from the JSON
     */
    public static <T> T read(String str, JsonPath jsPath) {
        return JsonPath.using(config).parse(str).read(jsPath);
    }

    /**
     * @param str raw JSON string
     * @param jsPathStr json path that has not been compiled
     * @param <T> type of result, jsObj, jsArray, jsPrimitive
     * @return the value read from the JSON
     */
    public static <T> T read(String str, String jsPathStr) {
        return JsonPath.using(config).parse(str).read(jsPathStr);
    }

    /**
     * @param str raw JSON string
     * @param jsPathStr json path that has not been compiled
     * @return the length of the element at this path; it must be an array, otherwise an exception is thrown
     */
    public static int length(String str, String jsPathStr) {
        return JsonPath.using(config).parse(str).read(jsPathStr + ".length()");
    }

    /**
     * @param jsObj Gson JSON object
     * @param jsPathStr json path that has not been compiled
     * @return the length of the element at this path; it must be an array, otherwise an exception is thrown
     */
    public static int length(Object jsObj, String jsPathStr) {
        return JsonPath.using(config).parse(jsObj).read(jsPathStr + ".length()");
    }
}
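The unfinished constructAllJsonPath method walks only object fields. As a sketch of where it could go, the code below enumerates every path of an already-parsed JSON value, including list elements. It is an assumption rather than the post's code: the class name PathSketch is hypothetical, it works on nested Map and List values instead of DataStructure, it assumes lists are homogeneous (only the first element is inspected), and it writes list steps as .[*] to follow the notation used in this post's grammar comments.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: lists every path found in nested Map/List data.
final class PathSketch {
    static List<String> allPaths(Object value) {
        List<String> out = new ArrayList<>();
        collect(value, "$", out);
        return out;
    }

    private static void collect(Object value, String path, List<String> out) {
        if (value instanceof Map) {
            for (Map.Entry<?, ?> entry : ((Map<?, ?>) value).entrySet()) {
                String child = path + "." + entry.getKey();
                out.add(child);
                collect(entry.getValue(), child, out);
            }
        } else if (value instanceof List) {
            List<?> items = (List<?>) value;
            if (!items.isEmpty()) {
                String child = path + ".[*]";
                out.add(child);
                collect(items.get(0), child, out); // assumes all elements share one structure
            }
        }
        // Primitives and nulls have no children, so nothing is added for them.
    }

    public static void main(String[] args) {
        Map<String, Object> item = new LinkedHashMap<>();
        item.put("id", 2);
        item.put("title_one", "beautiful animal");
        Map<String, Object> result = new LinkedHashMap<>();
        result.put("data", Arrays.asList(item));
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("code", 200);
        doc.put("result", result);
        allPaths(doc).forEach(System.out::println);
    }
}
```

A set of known paths like this is one way to reject a rule that references a field the data does not have before any record is processed.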
public class GsonInstance {
    public static final GsonBuilder builder;
    public static final Gson gson;
    static {
        builder = new GsonBuilder();
        builder.registerTypeAdapter(DataStructure.class, new GsonDataStructureSerde());
        gson = builder.create();
    }
    public static Gson getGson() { return gson; }
}
Custom serialization and deserialization.
This class can be used to represent the structure of JSON: object, list, primitive.
import com.alibaba.fastjson.annotation.JSONType;
//@JSONType(seeAlso = {ListStructure.class, ObjectStructure.class, PrimitiveStructure.class})
// NOTICE: if a structure references itself, equals and toString will cause a StackOverflowError or OOM
public abstract class DataStructure {
    public static final String list = "list";
    public static final String primitive = "primitive";
    public static final String object = "object";

    private final String type;

    public DataStructure(String type) {
        this.type = type;
    }

    public String type() {
        return this.type;
    }

    public boolean isPrimitive() {
        return this.type.equals(primitive);
    }

    public boolean isList() {
        return this.type.equals(list);
    }

    public boolean isObject() {
        return this.type.equals(object);
    }

    public abstract boolean selfValidate();
}
import com.alibaba.fastjson.annotation.JSONType;
import java.util.Objects;
//@JSONType(typeName = "list")
public class ListStructure extends DataStructure {
    public DataStructure itemType;

    public ListStructure() {
        super(DataStructure.list);
    }

    public void setItemType(DataStructure itemType) {
        this.itemType = itemType;
    }

    // NOTICE: if a structure references itself, equals and toString will cause a StackOverflowError or OOM
    @Override
    public String toString() {
        return "ListStructure{" +
                "itemType=" + itemType +
                '}';
    }

    @Override
    public boolean equals(Object obj) {
        if (obj == null) {
            return false;
        }
        if (obj instanceof ListStructure) {
            ListStructure other = (ListStructure) obj;
            return Objects.equals(this.itemType, other.itemType);
        }
        return false;
    }

    @Override
    public boolean selfValidate() {
        return itemType != null && itemType.selfValidate();
    }
}
import com.alibaba.fastjson.annotation.JSONType;
import lombok.Getter;
import java.util.*;
import java.util.function.Consumer;
//@JSONType(typeName = "object")
public class ObjectStructure extends DataStructure implements Iterable<AbstractMap.SimpleImmutableEntry<String, DataStructure>> {
    // fields.get(i) is the name of the i-th field and fieldsType.get(i) is its structure.
    @Getter
    public List<String> fields = new ArrayList<>();
    @Getter
    public List<DataStructure> fieldsType = new ArrayList<>();

    public ObjectStructure() {
        super(DataStructure.object);
    }

    public void addField(String key) {
        this.fields.add(key);
    }

    public void addFieldType(DataStructure type) {
        this.fieldsType.add(type);
    }

    // NOTICE: if a structure references itself, equals and toString will cause a StackOverflowError or OOM
    @Override
    public String toString() {
        return "ObjectStructure{" +
                "fields=" + fields +
                ", fieldsType=" + fieldsType +
                '}';
    }

    @Override
    public boolean equals(Object obj) {
        if (obj == null) {
            return false;
        }
        if (obj instanceof ObjectStructure) {
            ObjectStructure other = (ObjectStructure) obj;
            boolean b1 = this.fields.size() == other.fields.size();
            boolean b2 = this.fieldsType.size() == other.fieldsType.size();
            if (!(b1 && b2)) {
                return false;
            }
            // Field order does not matter: each field is looked up by name on both sides.
            for (int i = 0; i < this.fields.size(); i++) {
                String field = this.fields.get(i);
                Optional<DataStructure> thisFieldType = this.getFieldType(field);
                Optional<DataStructure> otherFieldType = other.getFieldType(field);
                Boolean isEqual = thisFieldType.flatMap(t -> otherFieldType.map(t::equals)).orElse(false);
                if (!isEqual) {
                    return false;
                }
            }
            return true;
        }
        return false;
    }

    public Optional<DataStructure> getFieldType(String fieldName) {
        int i = fields.indexOf(fieldName);
        return (i >= 0) ? Optional.of(fieldsType.get(i)) : Optional.empty();
    }

    public String getField(int i) {
        return this.fields.get(i);
    }

    public DataStructure getFieldsType(int i) {
        return this.fieldsType.get(i);
    }

    @Override
    public Iterator<AbstractMap.SimpleImmutableEntry<String, DataStructure>> iterator() {
        return new Iterator<AbstractMap.SimpleImmutableEntry<String, DataStructure>>() {
            private int i = 0;

            @Override
            public boolean hasNext() {
                return i < fields.size();
            }

            @Override
            public AbstractMap.SimpleImmutableEntry<String, DataStructure> next() {
                i += 1;
                return new AbstractMap.SimpleImmutableEntry<>(fields.get(i - 1), fieldsType.get(i - 1));
            }
        };
    }

    @Override
    public boolean selfValidate() {
        return fields.size() == fieldsType.size() && fieldsType.stream().allMatch(DataStructure::selfValidate);
    }
}
import com.alibaba.fastjson.annotation.JSONType;
import io.swagger.v3.oas.models.media.Schema;
import java.util.*;
//@JSONType(typeName = "primitive")
public class PrimitiveStructure extends DataStructure {
    private String pType = "string"; // number, string, boolean // todo date
    private String format = ""; // byte, short, integer, long, unsigned_long, double, float // todo unknown, floating point + integer
    public static final List<String> pTypes = Arrays.asList("string", "number", "boolean");
    public static final List<String> swaggerTypes = Collections.unmodifiableList(Arrays.asList("string", "number", "boolean", "integer"));

    public static PrimitiveStructure swaggerSchema2DataStructureConverter(Schema<?> schema) {
        assert PrimitiveStructure.swaggerTypes.contains(schema.getType()); // every structure that is not an object or array is treated as atomic/primitive
        String type;
        String format = "";
        if (Objects.equals(schema.getType(), "string")) {
            type = "string";
        } else if (Objects.equals(schema.getType(), "number")) {
            type = "number";
            format = schema.getFormat() == null ? "" : schema.getFormat();
        } else if (Objects.equals(schema.getType(), "integer")) {
            type = "number";
            // No format or int32 maps to "integer"; any other format (int64) maps to "long".
            format = schema.getFormat() != null ? (Objects.equals(schema.getFormat(), "int32") ? "integer" : "long") : "integer";
        } else if (Objects.equals(schema.getType(), "boolean")) {
            type = "boolean";
        } else {
            throw new RuntimeException("unknown swagger type: " + schema.getType());
        }
        PrimitiveStructure primitiveStructure = new PrimitiveStructure(type);
        primitiveStructure.setFormat(format); // otherwise the format computed above is discarded
        return primitiveStructure;
    }

    // Swagger documents define data types as follows:
    // https://swagger.io/docs/specification/data-models/data-types/
    // type     format   description
    // number   –        Any numbers.
    // number   float    Floating-point numbers.
    // number   double   Floating-point numbers with double precision.
    // integer  –        Integer numbers.
    // integer  int32    Signed 32-bit integers (commonly used integer type).
    // integer  int64    Signed 64-bit integers (long type).

    public PrimitiveStructure(String pType) {
        super(DataStructure.primitive);
        if (!pTypes.contains(pType)) {
            throw new RuntimeException("unknown pType: " + pType);
        }
        this.pType = pType;
    }

    // NOTICE: if a structure references itself, equals and toString will cause a StackOverflowError or OOM
    @Override
    public String toString() {
        return "PrimitiveStructure{" + "pType='" + pType + '\'' + ", format='" + format + '\'' + '}';
    }

    @Override
    public boolean selfValidate() {
        boolean p = Objects.equals(pType, "string") || Objects.equals(pType, "number") || Objects.equals(pType, "boolean");
        boolean f = format != null;
        return p && f;
    }

    public void setFormat(String format) {
        this.format = format;
    }

    public String getPType() {
        return this.pType;
    }

    public String getFormat() {
        return this.format;
    }

    // Two primitives are equal when their pType matches; format is ignored.
    @Override
    public boolean equals(Object obj) {
        if (obj == null) {
            return false;
        }
        if (obj instanceof PrimitiveStructure) {
            PrimitiveStructure other = (PrimitiveStructure) obj;
            return Objects.equals(this.pType, other.pType);
        }
        return false;
    }
}