延云YDB视图表-全库资源汇聚搜索高效解决方案

延云YDB视图表-全库资源汇聚搜索高效解决方案

()         背景介绍

业务有成百上千张表,每张表有上百个列。想要根据关键字搜索含有该关键字的记录怎么办? 每张表都写一个sql么?每个列都要拼到where条件里去搜索么?no 其实对于资源汇聚类型的搜索可以一条sql就搞定,又兼容原先的表结构,业务其他部分依然可以单表查询。

视图表解决了物理表存在的如下问题;

问题一:
1.每张表都要维护自己的BUFFER,弄不好就OOM,导入性能实在太差.
2.都要单独开个线程去维护这张表,浪费资源.
3.生成的索引都很小,但是文件数量太多,频繁合并索引导致namenode非常繁忙,检索的时候文件太细,对磁盘的IOPS也是一个考验.

4.字段类型经常写错,而物理表无法更改数据类型,视图表则可以任意更改.

问题二:

要根据关键字搜索含有该关键字的所有表的对应记录

1.采用物理表意味着每张表都写一个sql,目前一共拼了NSQL  请求次数太多,对并发与响应时间影响较大

2.资源汇聚类型的搜索,通过视图表内部对应的物理表可以一条sql就搞定

3.视图表又兼容原先的表结构,对业务其他非资源汇聚类型的搜索没有影响。

 

Ybb视图表示建立在物理表的基础上的。一张物理表上面可以放很多视图表,这些表统一管理,统一心跳,统一一个索引,对外部用户来说对视图表的查询和对物理表的查询,在使用上没有什么区别。

视图表并不真实存储数据,如下图所示,视图表通过映射将数据存储在其对应的物理表中.

 

(创建视图表

1.      基本用法

创建YDB视图表的方法,与创建物理表区别不大

创建YDB视图表要使用create ydbview 语法(物理表的语法为create ydbtable)

例子如下

create ydbview  ydbview_001(

phonenum ydblike,

usernick string ,

ydb_sex string ,

ydb_province string ,

ydb_grade string,

ydb_age string,

ydb_blood string,

ydb_zhiye string  ,

ydb_earn string,

ydb_prefer string,

ydb_consume string ,

ydb_day string ,

amtdouble tdouble,

amtlong tlong physical@amtlong_y_tlong_id,

content ydblike

)

;

2.      为视图表指定不同的物理表

默认视图表背后对应的存储物理表为ydb_physical_table,如果想根据业务将视图表拆分到不同的物理表中,可以显示在创建视图表的时候指定物理表.

目前通过 physical_example ydb_physical_table, 在创建表的时候 指定视图表对应的物理表physical_example 为物理表(物理表physical_example会自动创建)

示例如下:

create ydbview  ydbview_001(

physical_example ydb_physical_table,

phonenum ydblike,

usernick string ,

ydb_sex string ,

ydb_province string ,

ydb_grade string,

ydb_age string,

ydb_blood string,

ydb_zhiye string  ,

ydb_earn string,

ydb_prefer string,

ydb_consume string ,

ydb_day string ,

amtdouble tdouble,

amtlong tlong physical@amtlong_y_tlong_id,

content ydblike

)

;

3.       设置字段汇聚方式

因设计视图表的初衷是为了进行汇聚搜索,默认视图表中的每个字段除了单独存储外,还会在物理表的ydbview_combine进行汇聚,如果我们不需要进行这种汇聚,可以在创建视图表的时候进行标记.

下面示例中 ydb_viewstore_config ydbconfig

viewstore@all viewstore@alllike, 表示默认对所有字段都进行汇聚, 该列除了1,本身单独存储外,2还会在物理表的汇聚列进行存储,3 并且在汇聚列里会创建索引,用于ydbview_like like '@@AUTH_ACCOUNT@15840480528' 的搜索

viewstore@combineonly  viewstore@combinelikeonly 表示只汇聚,而不进行单独的物理存储,

viewstore@fieldonly表示只在对应的字段里进行物理存储但不会进行汇聚

viewstore@alltext          表示该列除了    1,本身单独存储外,2还会在物理表的汇聚列进行存储,3 但在汇聚列里不会创建索引
##viewstore@combinetextonly  当前列不存储,但发往汇聚列存储
##viewstore@combinelikeonly  当前列不存储,但发往汇聚列存储,并在汇聚列进行索引,供搜索使用

另外也可以针对没一列进行个性化的配置

 

 

create ydbview  ydbview_example(

physical_example ydb_physical_table,

ydb_viewstore_config ydb_viewstore_config viewstore@all,

phonenum ydblike,

usernick string viewstore@all,

ydb_sex string viewstore@combineonly,

ydb_province string viewstore@fieldonly,

ydb_grade string,

ydb_age string,

ydb_blood string,

ydb_zhiye string ,

ydb_earn string,

ydb_prefer string,

ydb_consume string ,

ydb_day string physical@ydbday_y_string_id,

amtdouble tdouble,

amtlong tlong,

content ydblike

)

;

 

4.      设置视图表与物理表的字段映射关系

默认视图表与物理表额映射关系是根据字段的名字通过hash算法进行一对一映射的,这样针对视图表的查询本身不会有任何的影响,但是在物理表中,这个字段是不可见的.但是有的时候,我们也想在物理表中除了汇聚列可见外,有些特殊的列如时间列,也是可见的,这样在汇聚搜索的时候,除了可以进行汇聚搜索,有可以按照时间进行排序.

 

针对这种场景可以显示的通过physical参数来指定当前视图表在其物理表中的映射名字,physical@amtlong_y_tlong_id,表示将当前字段映射到 amtlong_y_tlong_id 的字段里,这个字段本质上是一个动态列(物理表里无法预先知道视图表的字段名,故只能采用动态列),amtlong是一个名字y_tlong_id是映射的数据类型,这里的数据类型大家一定要采用y_开头的类型,目前不支持简写.

create ydbview  ydbview_example(

physical_example ydb_physical_table,

ydb_viewstore_config ydb_viewstore_config viewstore@all,

phonenum ydblike,

usernick string viewstore@all,

ydb_sex string viewstore@combineonly,

ydb_province string viewstore@fieldonly,

ydb_grade string,

ydb_age string,

ydb_blood string,

ydb_zhiye string ,

ydb_earn string,

ydb_prefer string,

ydb_consume string ,

ydb_day string physical@ydb_day_y_string_id,

amtdouble tdouble,

amtlong tlong,

content ydblike

)

;

()         向视图表中导入数据

向视图表中导入数据的方法与物理表中导入数据的方法完全一样.

示例如下

 

./bin/ydbimport.sh -ov -t ydbview_001 -p 3000w -tp txt  -f /data/example/ydb -sp , -fl phonenum,usernick,ydb_sex,ydb_province,ydb_grade,ydb_age,ydb_blood,ydb_zhiye,ydb_earn,ydb_prefer,ydb_consume,ydb_day,amtdouble,amtlong,content

 

--也可以像下面这样在JDBC中执行

ydbimport ["-ov" ,"-t" ,"ydbview_001" ,"-p" ,"3000w" ,"-tp" ,"txt" ,"-f" ,"/data/example/ydb" ,"-sp" ,"," ,"-fl" ,"phonenum,usernick,ydb_sex,ydb_province,ydb_grade,ydb_age,ydb_blood,ydb_zhiye,ydb_earn,ydb_prefer,ydb_consume,ydb_day,amtdouble,amtlong,content"]

 

()         视图表的查询

 

----count(*)计数

select count(*) from ydbview_003 where ydbpartion = '3000w' limit 10 ;

----数据预览

select * from ydbview_003 where ydbpartion = '3000w' limit 10 ;

----全文检索

select content,usernick from ydbview_003 where ydbpartion = '3000w' and content='王老吉' limit 10 ;

----高性能like匹配(注意:该字段类型为ydblike类型,所以能保证匹配顺序)

select content,usernick from ydbview_003 where ydbpartion = '3000w' and content like '170*王老吉' limit 10 ;

select phonenum,usernick from ydbview_003 where ydbpartion = '3000w' and phonenum like '^170' limit 10 ;

select phonenum,usernick from ydbview_003 where ydbpartion = '3000w' and phonenum like '436^' limit 10 ;

select phonenum,usernick from ydbview_003 where ydbpartion = '3000w' and phonenum like '1*0*7*88' limit 10 ;

select phonenum,usernick from ydbview_003 where ydbpartion = '3000w' and phonenum like '^170*88^' limit 10 ;

 

----高性能not like匹配(注意:该字段类型为ydblike类型,所以能保证匹配顺序)

select content,usernick from ydbview_003 where ydbpartion = '3000w' and content not like '170*王老吉' and content='王老吉' limit 10 ;

 

----普通string 类型的like匹配(注意:改字段类型为string类型,本质是暴力扫描,如果该列重复值很低,那么性能并不好)

select ydb_province,count(*) from ydbview_003 where ydbpartion = '3000w' and ydb_province like '**' group by ydb_province limit 10 ;

----普通string 类型的not like匹配(注意:改字段类型为string类型,本质是暴力扫描,如果该列重复值很低,那么性能并不好)

select ydb_province,count(*) from ydbview_003 where ydbpartion = '3000w' and ydb_province not like '**' and ydb_province like '**' group by ydb_province limit 10 ;

----多个条件组合过滤

select ydb_sex,ydb_grade,ydb_age,ydb_blood,amtlong from ydbview_003 where ydbpartion = '3000w' and ydb_sex='' and ydb_grade='本科' and (ydb_age='2030' or ydb_blood='O') and  (amtlong like '([3000 TO 4000] )') limit 10 ;

----sum求和

select sum(amtdouble) from ydbview_003 where ydbpartion = '3000w' ;

----avg求平均数

select avg(amtdouble) as avgamt from ydbview_003 where ydbpartion = '3000w' ;

----更复杂点的统计

select count(*),count(amtdouble),avg(amtdouble),sum(amtdouble),min(amtdouble),max(amtdouble),min(ydb_province),max(ydb_province) from ydbview_003 where ydbpartion = '3000w' ;

 

----单列group by

select ydb_sex,count(*),count(amtdouble),sum(amtdouble) from ydbview_003 where ydbpartion = '3000w' group by ydb_sex limit 10 ;

 

----多列group by

select ydb_sex,ydb_province,count(*) as cnt,count(amtdouble),sum(amtdouble) from ydbview_003 where ydbpartion = '3000w' group by ydb_sex,ydb_province order by cnt desc limit 10 ;

 

----top N 排序 单列

select ydb_sex, phonenum,amtlong,amtdouble from ydbview_003 where ydbpartion='3000w'  order by amtdouble desc  limit 10 ;

 

----top N 排序 多列

select ydb_sex, phonenum,amtlong,amtdouble from ydbview_003 where ydbpartion='3000w'  order by amtdouble desc ,amtlong limit 10 ;

 

 

 

--查看YDB

show tables ;

@hiveonly show tables;

show tables detail;

--查看表结构

desc ydbview_003;

@hiveonly desc ydbview_003 ;

--查看表的分区

show partitions ydbview_003 ;

@hiveonly show partitions ydbview_003;

 

--按条件删除

delete from ydbview_003 where ydbpartion='3000w' and ydb_sex='' and ydb_blood='A'

--将对应分区数据清零

truncate view ydbview_003 partions 3000w ;

truncate view ydbview_003 partions 3000w1,3000w2

--清理所有分区

truncate view ydbview_003  ;

 

--清理所有分区数据并删除视图表表结构

drop view ydbview_003 ;

 

()         对应物理表(汇聚表)的常见使用方法

1. --预览数据

(这些字段只能支持返回,但不能用于检索,本质上是从汇聚字段里拆解开返回的)

select ydbview_name,

phonenum, usernick, ydb_sex, ydb_province, ydb_grade, ydb_age, ydb_blood, ydb_zhiye, ydb_earn, ydb_prefer, ydb_consume, ydb_day, amtdouble, amtlong, content

from ydb_physical_table  where ydbpartion = '3000w'  limit 10;

2. --查看每张视图表的数据量

select count(*) from ydb_physical_table where ydbpartion = '3000w'  limit 10

select ydbview_name,count(*) from ydb_physical_table where ydbpartion = '3000w'  group by ydbview_name limit 10

3. 全库与全字段汇聚搜索

select phonenum,ydb_sex,ydbview_like from ydb_physical_table where ydbpartion='3000w' and ydbview_like like '20160311' limit 20

4. 对具体某一个表的汇聚搜索

select ydbview_name,phonenum,ydb_sex,ydbview_like from ydb_physical_table where ydbpartion='3000w' and ydbview_like like '许桂英and ydbview_name='ydbview_003' limit 20

5. 对具体某一个列的汇聚搜索

select ydbview_name,phonenum,ydb_sex,ydbview_like from ydb_physical_table where ydbpartion='3000w' and ydbview_like like '@@phonenum@13297145763'  limit 20

6. --按照视图表显示设置的 对应的物理表的amtlong_y_tlong_id 字段排序

select ydbview_name,amtlong_y_tlong_id,ydbview_like from ydb_physical_table  where ydbpartion = '3000w' and ydbview_like like '@@phonenum@13297145763' order by amtlong_y_tlong_id desc  limit 10;