7-Apache Hive SQL DQL、参数配置与函数

hadoop离线day07-Apache Hive SQL DQL、参数配置与函数

今日课程学习目标

#掌握HiveQL DQL查询语句
	基础select查询语法和MySQL一样 
	侧重记忆不一样的（见下面大纲）
#掌握HiveSQL join查询
	重点是内关联、左关联
#学会Hive shell命令行与参数配置
#理解函数分类标准(UDF、UDTF、UDAF)
	根据函数输入和输出的行数
		一进一出
		一进多出
		多进一出
#掌握常用函数的使用
	见多识广，多用多会，不用就忘

今日课程内容大纲

#1、PyCharm|IDEA Hive插件
	支持HQL语法，作为客户端访问HS2
#2、HQL DQL 数据查询语言
	cluster by
	distribute by+sort by
	order by
	union
	CTE
	Hive的join语法
#3、hive的shell命令行和参数配置方式
#4、Hive的函数
	函数分类：内置函数、用户定义函数
	UDF、UDTF、UDAF

知识点01：Hive SQL--DQL--基础查询语句及其注意事项

1.1、select语法树

SELECT [ALL | DISTINCT] select_expr, select_expr, ...
FROM table_reference
JOIN table_other ON expr
[WHERE where_condition]
[GROUP BY col_list [HAVING condition]]
[CLUSTER BY col_list
| [DISTRIBUTE BY col_list] [SORT BY| ORDER BY col_list]
]
[LIMIT number]

不管是写select语句还是看select语句，==from关键字==及后面的表是最重要；
其后面的表可能是一张真实物理存在的表，也可能是虚拟的表（查询的结果视图view）
select基础语法

⭐⭐SQL的重点⭐⭐

where语句中为什么不能使用聚合函数？
- 因为聚合函数要使用它的前提是结果集已经确定。而where子句还处于“确定”结果集的过程中，因而不能使用聚合函数。
Having和where的区别？
- having在分组后对数据进行过滤，能使用聚合函数，where在分组前对数据过滤，不能使用聚合函数
- HAVING子句可以让我们筛选分组后的各组数据,并且可以在Having中使用聚合函数，因为此时where，group by已经执行结束，结果集已经确定。
group by语法中的查询字段的限制？为什么要求是分组字段或者是聚合函数应用的字段
- 一个分组中非分组字段和聚合函数字段会有多个值，但在查询后每组只占一行，所以冲突无法显示
- 省流：避免出现一个字段多个值的歧义
梳理执行顺序？
- 顺序：from〉where〉group（含聚合）>having>order〉select；

知识点02：Hive SQL--DQL--高阶查询--cluster、distribute、sort

CLUSTER BY 分桶查询

为了探究底层细节，建议使用beeline客户端练习方面查看查询过程日志。

语法
```
select * from student;  --普通查询
select * from student cluster by num; --分桶查询 根据学生编号进行分桶查询
```
Q：分为几个部分？分的规则是什么？
- 分为几个部分取决于reducetask个数
- 分的规则和分桶表的规则一样 hashfunc(字段) % reducetask个数

reducetask个数是如何确定的？ reducetask个数就决定了最终数据分为几桶。

--如果用户没有设置,不指定reduce task个数。则hive根据表输入数据量自己评估
--日志显示：Number of reduce tasks not specified. Estimated from input data size: 1
select * from student cluster by num;

--手动设置reduce task个数
--日志显示：Number of reduce tasks not specified. Defaulting to jobconf value of: 2
set mapreduce.job.reduces =2;
select * from student cluster by num;
----分桶查询的结果真的根据reduce tasks个数分为了两个部分，并且每个部分中还根据了字段进行了排序。

--总结：cluster by xx  分且排序的功能
	  分为几个部分 取决于reducetask个数
	  排序只能是正序 用户无法改变
	   
--需求：把student表数据根据num分为两个部分，每个部分中根据年龄age倒序排序。	
set mapreduce.job.reduces =2;
select  * from student cluster by num order by age desc;
select  * from student cluster by num sort by age desc;
--FAILED: SemanticException 1:50 Cannot have both CLUSTER BY and SORT BY clauses

DISTRIBUTE BY+SORT BY

功能：相当于把cluster by的功能一分为二。

distribute by只负责分;
sort by只负责分之后的每个部分排序。
并且分和排序的字段可以不一样。

--当后面分和排序的字段是同一个字段 加起来就相等于cluster by
CLUSTER BY(分且排序) = DISTRIBUTE BY（分）+SORT BY（排序） 

--下面两个功能一样的
select  * from student cluster by num;
select  * from student distribute by num sort by num;

--最终实现
select  * from student distribute by num sort by age desc;

ORDER BY

--首先我们设置一下reducetask个数，随便设置
--根据之前的探讨，貌似用户设置几个，结果就是几个，但是实际情况如何呢？
set mapreduce.job.reduces =2;
select  * from student order by age desc;

--执行中日志显示
Number of reduce tasks determined at compile time: 1 --不是设置了为2吗 

--原因：order by是全局排序。全局排序意味着数据只能输出在一个文件中。因此也只能有一个reducetask.
--在order by出现的情况下，不管用户设置几个reducetask,在编译执行期间都会变为一个，满足全局。

order by 和sort by
- order by负责全局排序意味着整个mr作业只有一个reducetask 不管用户设置几个编译期间hive都会把它设置为1。
- sort by负责分完之后局部排序。

7-Apache Hive SQL DQL、参数配置与函数_image.png

知识点03：Hive SQL--DQL--高阶查询--Union联合查询

注意合并细节

完整版select语法树

[WITH CommonTableExpression (, CommonTableExpression)*]
SELECT [ALL | DISTINCT] select_expr, select_expr, ...
  FROM table_reference
  [WHERE where_condition]
  [GROUP BY col_list]
  [ORDER BY col_list]
  [CLUSTER BY col_list
    | [DISTRIBUTE BY col_list] [SORT BY col_list]
  ]
 [LIMIT [offset,] rows];

union联合查询

UNION用于将来自多个SELECT语句的结果合并为一个结果集。

--语法规则
select_statement UNION [ DISTINCT|ALL ] select_statement UNION [ALL | DISTINCT] select_statement ...;

--使用DISTINCT关键字与使用UNION默认值效果一样，都会删除重复行。
select num,name from student_local
UNION
select num,name from student_hdfs;
--和上面一样
select num,name from student_local
UNION DISTINCT
select num,name from student_hdfs;

--使用ALL关键字会保留重复行。
select num,name from student_local
UNION ALL
select num,name from student_hdfs limit 2;

--如果要将ORDER BY，SORT BY，CLUSTER BY，DISTRIBUTE BY或LIMIT应用于单个SELECT
--请将子句放在括住SELECT的括号内
SELECT num,name FROM (select num,name from student_local LIMIT 2)  subq1
UNION
SELECT num,name FROM (select num,name from student_hdfs LIMIT 3) subq2;

--如果要将ORDER BY，SORT BY，CLUSTER BY，DISTRIBUTE BY或LIMIT子句应用于整个UNION结果
--请将ORDER BY，SORT BY，CLUSTER BY，DISTRIBUTE BY或LIMIT放在最后一个之后。
select num,name from student_local
UNION
select num,name from student_hdfs
order by num desc;

知识点04：Hive SQL--DQL--高阶查询--CTE表达式

Important

通用表表达式（CTE）是一个临时结果集，该结果集是从WITH子句中指定的简单查询
派生而来的，该查询紧接在SELECT或INSERT关键字之前。

通俗解释：sql开始前定义一个SQL片断，该SQL片断可以被后续整个SQL语句所用到，并且可以多次使用。

--select语句中的CTE
with q1 as (select num,name,age from student where num = 95002)
select *
from q1;

-- from风格
with q1 as (select num,name,age from student where num = 95002)
from q1
select *;

-- chaining CTEs 链式
with q1 as ( select * from student where num = 95002),
     q2 as ( select num,name,age from q1)
select * from (select num from q2) a;

-- union
with q1 as (select * from student where num = 95002),
     q2 as (select * from student where num = 95004)
select * from q1 union all select * from q2;

-- ctas  ⭐⭐⭐⭐⭐
-- creat table as select 创建一张表来自于后面的查询语句  表的字段个数 名字 顺序和数据行数都取决于查询
-- create table t_ctas as select num,name from student limit 2;

create table s2 as
with q1 as ( select * from student where num = 95002)
select * from q1;

-- view
create view v1 as
with q1 as ( select * from student where num = 95002)
select * from q1;

select * from v1;

CTAS语法
7-Apache Hive SQL DQL、参数配置与函数_image-1.png

知识点05：Hive SQL--DQL--Join语法及其使用（侧重inner、left）

数据库《三大范式设计》

--为什么在SQL类语言中有join语法出现？
基于sql业务的实际使用，针对不同的业务创建不同的表，数据保存在不同的表中。
有的业务需求基于多份数据共同组合查询才能返回，基于多张表进行查询，所以有了join关联查询。

语法树

join_table:
    table_reference [INNER] JOIN table_factor [join_condition]
  | table_reference {LEFT|RIGHT|FULL} [OUTER] JOIN table_reference join_condition
  | table_reference LEFT SEMI JOIN table_reference join_condition
  | table_reference CROSS JOIN table_reference [join_condition] (as of Hive 0.10)
 
join_condition:
    ON expression

具体6种join方式，重点掌握 inner 和left join（左连接以左表为准，右表有满足的则显示，没有则返回null）。
全外连接：把A和B去掉重复
- 在功能上：等价于对这两个数据集合分别进行左外连接和右外连接，然后再使用消去重复行的操作将上述两个结果集合并为一个结果集。
左半开连接：
- 执行内连接后只显示左表的记录
cross join：慎用2w×5w=10亿

--Join语法练习 建表
drop table if exists employee_address;
drop table if exists employee_connection;
drop table if exists employee;

--table1: 员工表
CREATE TABLE employee(
   id int,
   name string,
   deg string,
   salary int,
   dept string
 ) row format delimited
fields terminated by ',';

--table2:员工家庭住址信息表
CREATE TABLE employee_address (
    id int,
    hno string,
    street string,
    city string
) row format delimited
fields terminated by ',';

--table3:员工联系方式信息表
CREATE TABLE employee_connection (
    id int,
    phno string,
    email string
) row format delimited
fields terminated by ',';

--加载数据到表中
load data local inpath '/root/hivedata/employee.txt' into table employee;
load data local inpath '/root/hivedata/employee_address.txt' into table employee_address;
load data local inpath '/root/hivedata/employee_connection.txt' into table employee_connection;

select * from employee;
+--------------+----------------+---------------+------------------+----------------+
| employee.id  | employee.name  | employee.deg  | employee.salary  | employee.dept  |
+--------------+----------------+---------------+------------------+----------------+
| 1201         | gopal          | manager       | 50000            | TP             |
| 1202         | manisha        | cto           | 50000            | TP             |
| 1203         | khalil         | dev           | 30000            | AC             |
| 1204         | prasanth       | dev           | 30000            | AC             |
| 1206         | kranthi        | admin         | 20000            | TP             |
| 1201         | gopal          | manager       | 50000            | TP             |
| 1202         | manisha        | cto           | 50000            | TP             |
| 1203         | khalil         | dev           | 30000            | AC             |
| 1204         | prasanth       | dev           | 30000            | AC             |
| 1206         | kranthi        | admin         | 20000            | TP             |
+--------------+----------------+---------------+------------------+----------------+

select * from employee_address;


select * from employee_connection;

--1、内连接  inner join == join
  返回左右两边同时满足条件的数据
  
select e.*,e_a.*
from employee e inner join employee_address e_a
on e.id =e_a.id;

--等价于 inner join
select e.*,e_a.*
from employee e join employee_address e_a
on e.id =e_a.id;

--等价于 隐式连接表示法
select e.*,e_a.*
from employee e , employee_address e_a
where e.id =e_a.id;

  
--2、左连接  left join  ==  left OUTER join
  左表为准，左表全部显示，右表与之关联 满足条件的返回，不满足条件显示null
  
select e.*,e_conn.*
from employee e left join employee_connection e_conn
on e.id =e_conn.id;

--等价于 left outer join 左外连接
select e.id,e.*,e_conn.*
from employee e left outer join  employee_connection e_conn
on e.id =e_conn.id;  
  
--3、右连接  right join  ==  right OUTER join 右外连接
  右表为准，右表全部显示，左表与之关联 满足条件的返回，不满足条件显示null
 
select e.id,e.*,e_conn.*
from employee e right join employee_connection e_conn
on e.id =e_conn.id;

--等价于 right outer join
select e.id,e.*,e_conn.*
from employee e right outer join employee_connection e_conn
on e.id =e_conn.id;

--4、外连接 全外连接 full join == full outer join
FULL OUTER JOIN 关键字只要左表（table1）和右表（table2）其中一个表中存在匹配，则返回行.
FULL OUTER JOIN 关键字结合了 LEFT JOIN 和 RIGHT JOIN 的结果。

select e.*,e_a.*
from employee e full outer join employee_address e_a
on e.id =e_a.id;
--等价于
select e.*,e_a.*
from employee e full join employee_address e_a
on e.id =e_a.id;

--5、左半连接 left semi join

select *
from employee e left semi join employee_address e_addr
on e.id =e_addr.id;

--相当于 inner join,但是只返回左表全部数据， 只不过效率高一些
select e.*
from employee e inner join employee_address e_addr
on e.id =e_addr.id;

--6、交叉连接cross join
将会返回被连接的两个表的笛卡尔积，返回结果的行数等于两个表行数的乘积。对于大表来说，cross join慎用。
在SQL标准中定义的cross join就是无条件的inner join。返回两个表的笛卡尔积,无需指
定关联键。
在HiveSQL语法中，cross join 后面可以跟where子句进行过滤，或者on条件过滤。

知识点06：Hive SQL--DQL--Join语法注意事项

优化方式在最后一天调优课程中学习；
允许使用复杂的联接表达式，支持非等值连接；
同一查询中可以连接2个以上的表；
如果每个表在联接子句中使用相同的列，则Hive将多个表上的联接转换为单个MR作业
join时的最后一个表会通过reducer流式传输，并在其中缓冲之前的其他表，因此，将大表放置在最后有助于减少reducer阶段缓存数据所需要的内存
在join的时候，可以通过语法STREAMTABLE提示指定要流式传输的表。如果省略STREAMTABLE提示，则Hive将流式传输最右边的表。

知识点07：Hive SQL--第一代客户端的功能

批处理：一次连接，一次交互，执行结束断开连接
交互式处理：保持持续连接，一直交互

注意：如果说hive的shell客户端指的是第一代客户端bin/hive

而第二代客户端bin/beeline属于JDBC客户端不是shell。

bin/hive

功能1：作为第一代客户端连接访问metastore服务，使用Hive。交互式方式

功能2：启动hive服务

/export/server/apache-hive-3.1.2-bin/bin/hive --service metastore 
/export/server/apache-hive-3.1.2-bin/bin/hive --service hiveserver2

功能3：批处理执行Hive SQL 命令后加-e后跟命令，-f后跟脚本文件

#-e 执行后面的sql语句
/export/server/apache-hive-3.1.2-bin/bin/hive  -e 'select * from itheima.student'

#-f 执行后面的sql文件
vim hive.sql
select * from itheima.student limit 2

/export/server/apache-hive-3.1.2-bin/bin/hive  -f hive.sql

#sql文件不一定是.sql 要保证文件中是正确的HQL语法。

#-f调用sql文件执行的方式 是企业中hive生产环境主流的调用方式。

知识点08：Hive SQL--参数配置方式与优先级范围

有哪些参数可以配置？

https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties

配置方式有哪些？注意配置方式影响范围影响时间是怎样？

方式1：配置文件 con/hive-site.xml

影响的是基于这个安装包的任何使用方式。

方式2：配置参数 --hiveconf

/export/server/apache-hive-3.1.2-bin/bin/hive --service metastore  

/export/server/apache-hive-3.1.2-bin/bin/hive --service hiveserver2  --hiveconf hive.root.logger=DEBUG,console

#影响的是session会话级别的

方式3：set命令

session会话级别的 设置完之后将会对后面的sql执行生效。
session结束 set设置的参数将失效。

也是推荐搭建使用的设置参数方式。  谁需要 谁设置 谁生效

总结
- 从方式1到方式3 影响的范围是越来越小的。
- 从方式1到方式3 优先级越来越高。set命令设置的会覆盖其他的。
- Hive作为的基于Hadoop的数仓，也会把Hadoop 的相关配置解析加载进来。

知识点09：Hive SQL--内置常见运算符

官方链接：https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF
查看运算符和函数的帮助手册

可以使用课程资料中中文版参考手册

--显示所有的函数和运算符
show functions;
--查看运算符或者函数的使用说明
describe function +;
--使用extended 可以查看更加详细的使用说明
describe function extended +;

具体分类
- 关系运算符
- 算术运算符
- 逻辑运算符

--1、创建表dual
create table dual(id string);
--2、加载一个文件dual.txt到dual表中
--dual.txt只有一行内容：内容为一个空格
load data local inpath '/root/hivedata/dual.txt' into table dual;
--3、在select查询语句中使用dual表完成运算符、函数功能测试
select 1+1 from dual;

select 1+1;

----------------Hive中关系运算符--------------------------
--is null空值判断
select 1 from dual where 'itcast' is null;

--is not null 非空值判断
select 1 from dual where 'itcast' is not null;

--like比较： _表示任意单个字符 %表示任意数量字符
- 在mysql中叫模糊查询
--否定比较： NOT A like B
select 1 from dual where 'itcast' like 'it_';
select 1 from dual where 'itcast' like 'it%';
select 1 from dual where  'itcast' not like 'hadoo_';
select 1 from dual where  not 'itcast' like 'hadoo_';

- rlike：确定字符串是否匹配正则表达式，是REGEXP_LIKE()的同义词。
select 1 from dual where 'itcast' rlike '^i.*t

-----

#### 知识点10：Hive SQL--函数及分类标准（UDF、UDAF、UDTF）

内置的函数（==build in func==）

> 所谓的内置指的是hive开发好，可以直接上手使用的；

- 内置函数往往根据函数的应用功能类型来分类
- 日期函数、数字函数、字符串函数、集合函数、条件函数....



用户定义函数（==user-defined function==）

> 用户编程实现函数的逻辑在hive中使用。

- UDF根据函数==输入行数和输出行数==进行分类

- UDF 、UDAF、UDTF![7-Apache Hive SQL DQL、参数配置与函数_image-7.png](/img/user/czc%E7%9F%A5%E8%AF%86%E5%BA%93/%E6%9D%82%E4%B8%83%E6%9D%82%E5%85%AB/9-%E9%99%84%E4%BB%B6/%E9%99%84%E4%BB%B6/7-Apache%20Hive%20SQL%20DQL%E3%80%81%E5%8F%82%E6%95%B0%E9%85%8D%E7%BD%AE%E4%B8%8E%E5%87%BD%E6%95%B0_image-7.png)

  ```shell
  #1、UDF（User-Defined-Function）普通函数 一进一出  输入一行数据输出一行数据
  
  0: jdbc:hive2://node1:10000> select split("allen woon hadoop"," ");
  +----------------------------+--+
  |            _c0             |
  +----------------------------+--+
  | ["allen","woon","hadoop"]  |
  +----------------------------+--+
  
  #2、UDAF（User-Defined Aggregation Function）聚合函数，多进一出 输入多行输出一行
  
  count sum max  min  avg
  
  #3、UDTF（User-Defined Table-Generating Functions）表生成函数 一进多出 输入一行输出多行
  
  explode 、parse_url_tuple

UDF分类标准的扩大化

本来，udf/udtf/udaf3个标准是针对用户自定义函数分类的；
但是，现在可以将这个分类标准扩大到hive中所有的函数，包括内置函数和自定义函数；
不要被UD这两个字母所影响。 Built-in Aggregate Functions (UDAF).

函数相关的常用帮助命令

--显示所有的函数和运算符
show functions;
--查看运算符或者函数的使用说明
describe function +;
desc function 
--使用extended 可以查看更加详细的使用说明
describe function extended count;

知识点11：Hive SQL--常用内置函数梳理--字符串函数


desc function extended concat_ws;
-- 注意：查看一个函数的使用方法究竞看什么？
-- 看方法的签名列表concat_ws（separator，[string|array（string)]+)
什么是签名列表：（函数或者方法需要几个参数，什么类型的参数，参数的顺序）


-- StringFunctions字符串函数--
select concat("angela","baby");
-- 带分隔符字符串连接函数：concat_ws（separator，[string|array(string)】+)
select concat_ws（'.','www',array('itcast', 'cn'));


--字符串截取函数：substr(str, pos[, len]) 或者  substring(str, pos[, len])
select substr("angelababy",-2); --pos是从1开始的索引，如果为负数则倒着数
select substr("angelababy",2,2);


正则表达式替换函数：regexp_replace(str, regexp, rep)
select regexp_replace('100-200', '(\\d+)', 'num'); --正则分组
                                  所有数字替换成'num'：'num-num'

--正则表达式解析函数：regexp_extract(str, regexp[, idx]) 提取正则匹配到的指定组内容
select regexp_extract('100-200', '(\\d+)-(\\d+)', 2);


URL解析函数：parse_url 注意要想一次解析出多个 可以使用parse_url_tuple这个UDTF函数
select parse_url('http://www.itcast.cn/path/p1.php?query=1', 'HOST'); 
select parse_url('http://www.itcast.cn/path/p1.php?query=1', 'PATH');


分割字符串函数: split(str, regex)： regex表示支持正则
select split('apache hive', '\\s+'); --匹配一个或者多个空白符
                    **记住**：'\s'表示：匹配任何空白字符，包括空格、制表符、换页符等等

--json解析函数：get_json_object(json_txt, path)
--$表示json对象
select get_json_object('[{"website":"www.itcast.cn","name":"allenwoon"}, {"website":"cloud.itcast.com","name":"carbondata 中文文档"}]', '$.[1].website');

知识点12：Hive SQL--内置函数梳理--时间日期、数值

Date Functions 日期函数

日期和时间戳数字之间的转换

unix_timestamp 日期转unix时间戳

from_unixtime unix时间戳转日期

date_add

date_sub

datediff

--获取当前日期: current_date
select current_date();
--获取当前时间戳: current_timestamp
--同一查询中对current_timestamp的所有调用均返回相同的值。
select current_timestamp();
--获取当前UNIX时间戳函数: unix_timestamp
select unix_timestamp();
--日期转UNIX时间戳函数: unix_timestamp

select unix_timestamp("2011-12-07 13:01:03");
👆要求日期格式必须是标准的日期格式：data 年-月-日 时:分:秒

--指定格式日期转UNIX时间戳函数: unix_timestamp
select unix_timestamp('20111207 13:01:03','yyyyMMdd HH:mm:ss');
👆自定义时间格式需要手动设置

- UNIX时间戳转日期函数: from_unixtime
select from_unixtime(1620723323);
select from_unixtime(0, 'yyyy-MM-dd HH:mm:ss'); ：'1970-01-01 00:00:00'


--日期比较函数: datediff  日期格式要求'yyyy-MM-dd HH:mm:ss' or 'yyyy-MM-dd'
select datediff('2012-12-08','2012-05-09');
--日期增加函数: date_add
select date_add('2012-02-28',10);
--日期减少函数: date_sub
select date_sub('2012-01-1',10);

Mathematical Functions 数学函数

round 取整

rand 取随机值

--取整函数: round  返回double类型的整数值部分 （遵循四舍五入）
select round(3.1415926);
--指定精度取整函数: round(double a, int d) 返回指定精度d的double类型
select round(3.1415926,4);
--向下取整函数: floor
select floor(3.1415926);
select floor(-3.1415926);：-4
--向上取整函数: ceil
select ceil(3.1415926);
select ceil(-3.1415926);：-3
--取随机数函数: rand 每次执行都不一样 返回一个0到1范围内的随机数
select rand();
--指定种子取随机数函数: rand(int seed) 得到一个稳定的随机数序列
select rand(5);

知识点13：Hive SQL--内置函数梳理--条件转换、集合、加密

Conditional Functions 条件函数

都重要。尤其是case when

--if条件判断: if(boolean testCondition, T valueTrue, T valueFalseOrNull)
select if(1=2,100,200);
select if(sex ='男','M','W') from student limit 3; 
👆如果是男，返回m，否则w

--空判断函数: isnull( a )
select isnull("allen");
select isnull(null);

--非空判断函数: isnotnull ( a )
select isnotnull("allen");
select isnotnull(null);

-空值转换函数: nvl(T value, T default_value)
把可能存在的null值转为0，因为任何值+null则一定返回null
⭐将NULL值替换为指定的默认值，若第一个参数非NULL则返回自身，否则返回第二个参数。⭐
select nvl("allen","itcast");
select nvl(null,"itcast");

--非空查找函数: COALESCE(T v1, T v2, ...)
-返回参数中的第一个非空值；如果所有值都为NULL，那么返回NULL
select COALESCE(null,11,22,33);
select COALESCE(null,null,null,33);
select COALESCE(null,null,null);

- 条件转换函数: CASE a WHEN b THEN c [WHEN d THEN e]* [ELSE f] END
简单CASE表达式，将表达式与每个WHEN后的条件匹配，返回第一个匹配的结果，否则返回ELSE值（无ELSE时默认返回NULL）。结果：mary
select case 100 when 50 then 'tom' when 100 then 'mary' else 'tim' end;
对100判断，如果是50则返回tom，如果是100则返回mary，否则返回tim，这里显然返回mary
select case sex when '男' then 'male' else 'female' end from student limit 3;

Type Conversion Functions 类型转换函数

前置知识：Hive中支持类型的隐式转换有限制自动转换不保证成功就显示null

cast显示类型转换函数

--任意数据类型之间转换:cast
- cast属于用户的显示类型转换，hive底层还会进行隐式类型转换，可能失败
select cast(12.14 as bigint);
select cast(12.14 as string);
select cast("hello" as int);
+-------+
|  _c0  |
+-------+
| NULL  |
+-------+

Data Masking Functions 数据脱敏函数

mask脱敏掩码处理

数据脱敏：让敏感数据不敏感 13455667788 --->134****7788

--mask
--将查询回的数据，大写字母转换为X，小写字母转换为x，数字转换为n。
select mask("abc123DEF"); 输出：xxxnnnXXX
select mask("abc123DEF",'-','.','^'); 自定义替换的字母 输出：...^^^---
👆'-','.','^'表示大写字母转换为-，小写字母转换为.，数字转换为^^^


--mask_first_n(string str[, int n]
--对前n个进行脱敏替换
select mask_first_n("abc123DEF",4);

--mask_last_n(string str[, int n])
select mask_last_n("abc123DEF",4);

--mask_show_first_n(string str[, int n])
--除了前n个字符，其余进行掩码处理
select mask_show_first_n("abc123DEF",4);

--mask_show_last_n(string str[, int n])
select mask_show_last_n("abc123DEF",4);

--mask_hash(string|char|varchar str)
--返回字符串的hash编码。
select mask_hash("abc123DEF");

Misc. Functions 其他杂项函数、加密函数

--如果你要调用的java方法所在的jar包不是hive自带的 可以使用add jar添加进来
--hive调用java方法: java_method(class, method[, arg1[, arg2..]])
select java_method("java.lang.Math","max",11,22);

--反射函数: reflect(class, method[, arg1[, arg2..]])
select reflect("java.lang.Math","max",11,22);

--取哈希值函数:hash
select hash("allen");

--current_user()、logged_in_user()、current_database()、version()

--SHA-1加密: sha1(string/binary)
select sha1("allen");

--SHA-2家族算法加密：sha2(string/binary, int)  (SHA-224, SHA-256, SHA-384, SHA-512)
select sha2("allen",224);
select sha2("allen",512);

--crc32加密:
select crc32("allen");

--MD5加密: md5(string/binary)
select md5("allen");

;
select 1 from dual where '123456' rlike '^\d+

知识点10：Hive SQL--函数及分类标准（UDF、UDAF、UDTF）

内置的函数（build in func）

所谓的内置指的是hive开发好，可以直接上手使用的；

内置函数往往根据函数的应用功能类型来分类
日期函数、数字函数、字符串函数、集合函数、条件函数....

用户定义函数（user-defined function）

用户编程实现函数的逻辑在hive中使用。

UDF根据函数输入行数和输出行数进行分类
UDF 、UDAF、UDTF

{{CODE_BLOCK_22}}

UDF分类标准的扩大化

本来，udf/udtf/udaf3个标准是针对用户自定义函数分类的；
但是，现在可以将这个分类标准扩大到hive中所有的函数，包括内置函数和自定义函数；
不要被UD这两个字母所影响。 Built-in Aggregate Functions (UDAF).

函数相关的常用帮助命令

知识点11：Hive SQL--常用内置函数梳理--字符串函数

知识点12：Hive SQL--内置函数梳理--时间日期、数值

Date Functions 日期函数

日期和时间戳数字之间的转换

unix_timestamp 日期转unix时间戳

from_unixtime unix时间戳转日期

date_add

date_sub

datediff

Mathematical Functions 数学函数

round 取整

rand 取随机值

知识点13：Hive SQL--内置函数梳理--条件转换、集合、加密

Conditional Functions 条件函数

都重要。尤其是case when

Type Conversion Functions 类型转换函数

前置知识：Hive中支持类型的隐式转换有限制自动转换不保证成功就显示null
cast显示类型转换函数

{{CODE_BLOCK_28}}

Data Masking Functions 数据脱敏函数

mask脱敏掩码处理

数据脱敏：让敏感数据不敏感 13455667788 --->134****7788

Misc. Functions 其他杂项函数、加密函数

; --判断是否全为数字
select 1 from dual where '123456aa' rlike '^\d+

知识点10：Hive SQL--函数及分类标准（UDF、UDAF、UDTF）

内置的函数（build in func）

所谓的内置指的是hive开发好，可以直接上手使用的；

内置函数往往根据函数的应用功能类型来分类
日期函数、数字函数、字符串函数、集合函数、条件函数....

用户定义函数（user-defined function）

用户编程实现函数的逻辑在hive中使用。

UDF根据函数输入行数和输出行数进行分类
UDF 、UDAF、UDTF

{{CODE_BLOCK_22}}

UDF分类标准的扩大化

本来，udf/udtf/udaf3个标准是针对用户自定义函数分类的；
但是，现在可以将这个分类标准扩大到hive中所有的函数，包括内置函数和自定义函数；
不要被UD这两个字母所影响。 Built-in Aggregate Functions (UDAF).

函数相关的常用帮助命令

知识点11：Hive SQL--常用内置函数梳理--字符串函数

知识点12：Hive SQL--内置函数梳理--时间日期、数值

Date Functions 日期函数

日期和时间戳数字之间的转换

unix_timestamp 日期转unix时间戳

from_unixtime unix时间戳转日期

date_add

date_sub

datediff

Mathematical Functions 数学函数

round 取整

rand 取随机值

知识点13：Hive SQL--内置函数梳理--条件转换、集合、加密

Conditional Functions 条件函数

都重要。尤其是case when

Type Conversion Functions 类型转换函数

前置知识：Hive中支持类型的隐式转换有限制自动转换不保证成功就显示null
cast显示类型转换函数

{{CODE_BLOCK_28}}

Data Masking Functions 数据脱敏函数

mask脱敏掩码处理

数据脱敏：让敏感数据不敏感 13455667788 --->134****7788

Misc. Functions 其他杂项函数、加密函数

;

--regexp：功能与rlike相同用于判断字符串是否匹配正则表达式
select 1 from dual where 'itcast' regexp '^i.*t

知识点10：Hive SQL--函数及分类标准（UDF、UDAF、UDTF）

内置的函数（build in func）

所谓的内置指的是hive开发好，可以直接上手使用的；

内置函数往往根据函数的应用功能类型来分类
日期函数、数字函数、字符串函数、集合函数、条件函数....

用户定义函数（user-defined function）

用户编程实现函数的逻辑在hive中使用。

UDF根据函数输入行数和输出行数进行分类
UDF 、UDAF、UDTF

{{CODE_BLOCK_22}}

UDF分类标准的扩大化

本来，udf/udtf/udaf3个标准是针对用户自定义函数分类的；
但是，现在可以将这个分类标准扩大到hive中所有的函数，包括内置函数和自定义函数；
不要被UD这两个字母所影响。 Built-in Aggregate Functions (UDAF).

函数相关的常用帮助命令

知识点11：Hive SQL--常用内置函数梳理--字符串函数

知识点12：Hive SQL--内置函数梳理--时间日期、数值

Date Functions 日期函数

日期和时间戳数字之间的转换

unix_timestamp 日期转unix时间戳

from_unixtime unix时间戳转日期

date_add

date_sub

datediff

Mathematical Functions 数学函数

round 取整

rand 取随机值

知识点13：Hive SQL--内置函数梳理--条件转换、集合、加密

Conditional Functions 条件函数

都重要。尤其是case when

Type Conversion Functions 类型转换函数

前置知识：Hive中支持类型的隐式转换有限制自动转换不保证成功就显示null
cast显示类型转换函数

{{CODE_BLOCK_28}}

Data Masking Functions 数据脱敏函数

mask脱敏掩码处理

数据脱敏：让敏感数据不敏感 13455667788 --->134****7788

Misc. Functions 其他杂项函数、加密函数

;

-------------------Hive中算术运算符---------------------------------
--取整操作: div 给出将A除以B所得的整数部分。例如17 div 3得出5。
select 17 div 3;

--取余操作: % 也叫做取模mod A除以B所得的余数部分
select 17 % 3;

位与操作: & A和B按位进行与操作的结果。与表示两个都为1则结果为1
select 4 & 8 from dual; --4转换二进制：0100 8转换二进制：1000
select 6 & 4 from dual; --4转换二进制：0100 6转换二进制：0110
位或操作: | A和B按位进行或操作的结果或表示有一个为1则结果为1
select 4 | 8 from dual;
select 6 | 4 from dual;
位异或操作: ^ A和B按位进行异或操作的结果异或表示两者的值不同,则结果为1
select 4 ^ 8 from dual;
select 6 ^ 4 from dual;

--3、Hive逻辑运算符
--与操作: A AND B 如果A和B均为TRUE，则为TRUE，否则为FALSE。如果A或B为NULL，则为NULL。
select 1 from dual where 3>1 and 2>1;
--或操作: A OR B 如果A或B或两者均为TRUE，则为TRUE，否则为FALSE。
select 1 from dual where 3>1 or 2!=2;
--非操作: NOT A 、!A 如果A为FALSE，则为TRUE；如果A为NULL，则为NULL。否则为FALSE。
select 1 from dual where not 2>1;
select 1 from dual where !2=1;

--在:A IN (val1, val2, ...) 如果A等于任何值，则为TRUE。
select 1 from dual where 11 in(11,22,33);
--不在:A NOT IN (val1, val2, ...) 如果A不等于任何值，则为TRUE
select 1 from dual where 11 not in(22,33,44);


-----

#### 知识点10：Hive SQL--函数及分类标准（UDF、UDAF、UDTF）

内置的函数（==build in func==）

> 所谓的内置指的是hive开发好，可以直接上手使用的；

- 内置函数往往根据函数的应用功能类型来分类
- 日期函数、数字函数、字符串函数、集合函数、条件函数....



用户定义函数（==user-defined function==）

> 用户编程实现函数的逻辑在hive中使用。

- UDF根据函数==输入行数和输出行数==进行分类

- UDF 、UDAF、UDTF![7-Apache Hive SQL DQL、参数配置与函数_image-7.png](/img/user/czc%E7%9F%A5%E8%AF%86%E5%BA%93/%E6%9D%82%E4%B8%83%E6%9D%82%E5%85%AB/9-%E9%99%84%E4%BB%B6/%E9%99%84%E4%BB%B6/7-Apache%20Hive%20SQL%20DQL%E3%80%81%E5%8F%82%E6%95%B0%E9%85%8D%E7%BD%AE%E4%B8%8E%E5%87%BD%E6%95%B0_image-7.png)

  {{CODE_BLOCK_22}}



==**UDF分类标准的扩大化**==

- 本来，udf/udtf/udaf3个标准是针对用户自定义函数分类的；
- 但是，现在可以将这个分类标准扩大到==hive中所有的函数，包括内置函数和自定义函数==；
- 不要被UD这两个字母所影响。  Built-in Aggregate Functions (UDAF).



函数相关的常用帮助命令

{{CODE_BLOCK_23}}

-------

#### 知识点11：Hive SQL--常用内置函数梳理--字符串函数

{{CODE_BLOCK_24}}

-----

#### 知识点12：Hive SQL--内置函数梳理--时间日期、数值

Date Functions 日期函数

> 日期和时间戳数字之间的转换 
>
> ==unix_timestamp==  日期转unix时间戳
>
> ==from_unixtime==  unix时间戳转日期
>
> ==date_add== 
>
> ==date_sub==  
>
> ==datediff==

{{CODE_BLOCK_25}}

---

Mathematical Functions 数学函数

> ==round== 取整
>
> ==rand== 取随机值

{{CODE_BLOCK_26}}

-----

#### 知识点13：Hive SQL--内置函数梳理--条件转换、集合、加密

Conditional Functions 条件函数

> ==都重要==。尤其是case when

{{CODE_BLOCK_27}}

-----

Type Conversion Functions 类型转换函数

- 前置知识：Hive中支持类型的隐式转换  有限制  自动转换  不保证成功  就显示null

- ==cast显示类型转换函数==

  {{CODE_BLOCK_28}}

----

Data Masking Functions 数据脱敏函数  

> mask脱敏 掩码处理
>
> 数据脱敏：==让敏感数据不敏感==   13455667788 --->134****7788

{{CODE_BLOCK_29}}

-----

Misc. Functions 其他杂项函数、加密函数

{{CODE_BLOCK_30}}