
# PII[^1] Leakage: Identifying PII Data in Logs with CodeQL

shopizer is an open-source e-commerce system written in Java (shopizer's GitHub).

All of the code for this experiment will be uploaded to https://github.com/SummerSec/learning-codeql


## Source–Sensitive Fields

The known sensitive fields are `email`, `phone`, and `creditCard`.

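Use CodeQL's `Field` class to get each field's name, then fuzzy-match the name with `matches`. Note that `matches` takes a LIKE-style pattern, not a regular expression: `%` matches any sequence of characters and `_` matches a single character.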

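```ql
/**
 *@name SimplePIIField
 */

import java

// Fields declared in the project's own source whose name
// contains "email" or "phone", or starts with "creditCard"
from Field f
where 
    (f.getName().matches("%email%") or
    f.getName().matches("%phone%") or
    f.getName().matches("creditCard%")) and
    f.fromSource()
select f
```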
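The patterns in the query above are easy to misread as regular expressions. To see exactly which names they accept, the Java sketch below re-implements the documented semantics of QL's `matches` (`%` is any sequence, `_` is one character, `\` escapes, every other character must match exactly). `QlLike` is a hypothetical helper written only for this illustration, not part of CodeQL, and the field names in `main` are made up.

```java
import java.util.regex.Pattern;

// Hypothetical helper (not part of CodeQL) that follows the documented
// semantics of QL's string.matches(pattern):
//   %  matches any sequence of characters, including the empty one
//   _  matches exactly one character
//   \  escapes the next character
// All other characters must match exactly, so matching is case-sensitive.
class QlLike {
    static boolean matches(String s, String pattern) {
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < pattern.length(); i++) {
            char c = pattern.charAt(i);
            if (c == '\\' && i + 1 < pattern.length()) {
                // Escaped character: match it literally.
                i++;
                regex.append(Pattern.quote(String.valueOf(pattern.charAt(i))));
            } else if (c == '%') {
                regex.append(".*");
            } else if (c == '_') {
                regex.append('.');
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        // DOTALL lets % and _ match line terminators too.
        return Pattern.compile(regex.toString(), Pattern.DOTALL).matcher(s).matches();
    }

    public static void main(String[] args) {
        // Made-up field names, checked against the three patterns of SimplePIIField.
        String[] names = {"email", "customerEmail", "emailAddress", "phone", "creditCardNumber", "userCreditCard"};
        String[] patterns = {"%email%", "%phone%", "creditCard%"};
        for (String name : names) {
            boolean hit = false;
            for (String p : patterns) {
                hit |= matches(name, p);
            }
            System.out.println(name + " -> " + hit);
        }
    }
}
```

Under these semantics `customerEmail` and `userCreditCard` are not matched: matching is case-sensitive, and `creditCard%` only accepts names that start with `creditCard`. If such names matter, the query would need extra patterns such as `%Email%`, or a lower-cased comparison like `f.getName().toLowerCase().matches("%email%")`.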


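The same condition written as a class (the body of the class's constructor is its characteristic predicate):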

```ql
/**
 *@name SimplePIIClass
 */

import java

class SenInfoField extends Field{
    SenInfoField(){
        (this.getName().matches("%email%") or
        this.getName().matches("%phone%") or
        this.getName().matches("creditCard%")) and
        this.fromSource()
    }
}

from SenInfoField sif
select sif 
```
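A QL class such as `SenInfoField` is simply the set of values that satisfy its characteristic predicate. As a rough runtime analogy only (CodeQL evaluates this over an extracted database of the source code, not through reflection), the sketch below applies the same three name conditions to the declared fields of a hypothetical `Customer` class. `fromSource()` has no counterpart here, and `SenInfoFieldAnalogy` is an illustrative name.

```java
import java.lang.reflect.Field;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical class, not taken from shopizer.
class Customer {
    private String email;
    private String phone;
    private String creditCardNumber;
    private String nickName;
    private int age;
}

class SenInfoFieldAnalogy {
    // Mirrors the characteristic predicate: a field is "in the class"
    // exactly when one of the three name conditions holds.
    static boolean isSenInfoField(Field f) {
        String n = f.getName();
        return n.contains("email")           // %email%
            || n.contains("phone")           // %phone%
            || n.startsWith("creditCard");   // creditCard%
    }

    // Rough equivalent of `from SenInfoField sif select sif` for one class.
    static Set<String> senInfoFieldNames(Class<?> c) {
        // TreeSet keeps the output sorted; getDeclaredFields() order is unspecified.
        Set<String> result = new TreeSet<>();
        for (Field f : c.getDeclaredFields()) {
            if (isSenInfoField(f)) {
                result.add(f.getName());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(senInfoFieldNames(Customer.class)); // [creditCardNumber, email, phone]
    }
}
```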



## Sink–Logging Calls

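shopizer uses the slf4j logging framework. CodeQL models this framework's logging methods with `LoggerFormatMethod` (in `semmle.code.java.StringFormat`), which is documented as: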

```ql
/**
 * A format method using the `org.slf4j.Logger` format string syntax. That is,
 * the placeholder string is `"{}"`.
 */
```
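The `{}` syntax is why the logger's arguments are treated as sinks: each argument is substituted into the message text that ends up in the log. The sketch below is a simplified, hypothetical `BraceFormat.format`, not slf4j's own `MessageFormatter`, which additionally handles escaped placeholders and a trailing exception argument.

```java
// Simplified, hypothetical "{}" substitution (not slf4j's implementation).
// Placeholders are filled left to right with the string form of each argument.
class BraceFormat {
    static String format(String pattern, Object... args) {
        StringBuilder out = new StringBuilder();
        int from = 0;
        int argIndex = 0;
        while (argIndex < args.length) {
            int at = pattern.indexOf("{}", from);
            if (at < 0) {
                break; // in this sketch, surplus arguments are ignored
            }
            out.append(pattern, from, at);
            out.append(String.valueOf(args[argIndex++]));
            from = at + 2;
        }
        // Copy the rest of the pattern, including any unfilled "{}".
        out.append(pattern.substring(from));
        return out.toString();
    }

    public static void main(String[] args) {
        // Whatever is passed as an argument, PII included, becomes part of the log line.
        System.out.println(format("User {} registered with phone {}", "alice", "555-0100"));
        // User alice registered with phone 555-0100
    }
}
```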

Query slf4j calls:

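```ql
/**
 *@name Logger slf4j logging method call query
 */

import java
import semmle.code.java.StringFormat

// Every call to a method that CodeQL models as an slf4j logging method
from LoggerFormatMethod lfm
// Alternative: select the arguments of those calls instead
// select lfm.getAReference().getAnArgument()
select lfm.getAReference()
```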



## Taint Data-Flow Tracking

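The sources are reads of the PII fields (names matching `email`, `phone`, or `creditCard`), and the sinks are the arguments of slf4j logging calls.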

```ql
/**
 *@name PIIQuery
 *@kind problem
 */

import java
import semmle.code.java.dataflow.TaintTracking
import semmle.code.java.StringFormat

// Fields whose name suggests PII
class SenInfoField extends Field{
    SenInfoField(){
        (this.getName().matches("%email%") or
        this.getName().matches("%phone%") or
        this.getName().matches("creditCard%")) and
        this.fromSource()
    }
}

class MySenInfoTaintConfig extends TaintTracking::Configuration{
    MySenInfoTaintConfig(){
        this = "MySenInfoTaintConfig"
    }

    // Source: any read of a PII field
    override predicate isSource(DataFlow::Node source){
        source.asExpr() = any(SenInfoField sif).getAnAccess()
    }
    // Sink: any argument of an slf4j logging call
    override predicate isSink(DataFlow::Node sink){
        sink.asExpr() = any(LoggerFormatMethod lfm).getAReference().getAnArgument()
    }
}

from MySenInfoTaintConfig config, DataFlow::Node source, DataFlow::Node sink, SenInfoField f
where 
    config.hasFlow(source, sink) and
    // source.asExpr() = f.getAnAccess()
    f.getAnAccess() = source.asExpr()
select sink, "PII data from field $@ is written to log here", f, f.getName()
// select sink, "PII data from field $@ is written to log here", source, source.asExpr().toString()
```


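In the where clause, the commented-out condition `source.asExpr() = f.getAnAccess()` and the active condition `f.getAnAccess() = source.asExpr()` have the same effect: in CodeQL, `=` tests whether the two sides are equal, so the order of the operands does not matter.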

Alternatively, it can be written like this:

```ql
from MySenInfoTaintConfig config, DataFlow::Node source, DataFlow::Node sink
where 
    config.hasFlow(source, sink) 
select sink, "PII data from field $@ is written to log here", source, source.asExpr().toString()
```


PS: For the `$@` placeholder, see Defining the results of a query.
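Conceptually, `config.hasFlow(source, sink)` holds when the sink node can be reached from the source node by following data-flow and taint steps. The toy Java model below makes that idea concrete with a hand-written edge list and a breadth-first search. `ToyTaint` and the node names are hypothetical; real CodeQL builds the (interprocedural) flow graph from the program itself.

```java
import java.util.*;

// Toy model of config.hasFlow(source, sink): the sink must be reachable from
// the source by following flow/taint steps. The node names are hypothetical
// and stand for expressions in code such as
//   String msg = "contact: " + this.email;
//   logger.info("{}", msg);
class ToyTaint {
    private final Map<String, List<String>> edges = new HashMap<>();

    void addStep(String from, String to) {
        edges.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
    }

    // Breadth-first search starting at the source node.
    boolean hasFlow(String source, String sink) {
        Deque<String> queue = new ArrayDeque<>();
        Set<String> seen = new HashSet<>();
        queue.add(source);
        seen.add(source);
        while (!queue.isEmpty()) {
            String n = queue.poll();
            if (n.equals(sink)) {
                return true;
            }
            for (String next : edges.getOrDefault(n, List.of())) {
                if (seen.add(next)) {
                    queue.add(next);
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        ToyTaint g = new ToyTaint();
        g.addStep("this.email", "\"contact: \" + this.email"); // taint step: string concatenation
        g.addStep("\"contact: \" + this.email", "msg");          // value step: assignment
        g.addStep("msg", "logger.info argument");                // msg is passed to the logger
        System.out.println(g.hasFlow("this.email", "logger.info argument")); // true
        System.out.println(g.hasFlow("this.phone", "logger.info argument")); // false
    }
}
```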


## Displaying the Full Path

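To display paths, change `@kind problem` to `@kind path-problem` and add `import DataFlow::PathGraph`. The query then uses `DataFlow::PathNode` with `hasFlowPath` instead of `DataFlow::Node` with `hasFlow`, and its `select` clause starts with the element, the source path node, and the sink path node, followed by the message: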

```ql
/**
 *@name PIIQueryPath
 *@kind path-problem
 *@description Taint path
 */

import java
import semmle.code.java.StringFormat
import semmle.code.java.dataflow.TaintTracking
import DataFlow::PathGraph

// Fields whose name suggests PII
class SenInfoField extends Field{
    SenInfoField(){
        (this.getName().matches("%email%") or
        this.getName().matches("%phone%") or
        this.getName().matches("creditCard%")) and
        this.fromSource()
    }
}

class MySenInfoTaintConfig extends TaintTracking::Configuration{
    MySenInfoTaintConfig(){
        this = "MySenInfoTaintConfig"
    }

    // Source: any read of a PII field
    override predicate isSource(DataFlow::Node source){
        source.asExpr() = any(SenInfoField sif).getAnAccess()
    }
    // Sink: any argument of an slf4j logging call
    override predicate isSink(DataFlow::Node sink){
        sink.asExpr() = any(LoggerFormatMethod lfm).getAReference().getAnArgument()
    }
}

// path-problem select: element, source, sink, message, then the $@ element/text pair
from MySenInfoTaintConfig config, DataFlow::PathNode source, DataFlow::PathNode sink
where config.hasFlowPath(source, sink)
select sink, source, sink, "PII data from field $@ is written to log here", source, source.getNode().toString()
```
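What `path-problem` adds is the sequence of steps between source and sink. Extending the same kind of toy graph model, the sketch below keeps a parent pointer for every visited node so that a source-to-sink path can be rebuilt, roughly like the steps CodeQL shows through `DataFlow::PathNode`s. `ToyPath` and the node names are hypothetical, and unlike CodeQL this sketch reports only one shortest path.

```java
import java.util.*;

// Toy model of path reconstruction: BFS that records, for each visited node,
// the node it was reached from, then walks those pointers back from the sink.
class ToyPath {
    private final Map<String, List<String>> edges = new HashMap<>();

    void addStep(String from, String to) {
        edges.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
    }

    // Returns one shortest source-to-sink path, or an empty list if there is none.
    List<String> findPath(String source, String sink) {
        Map<String, String> parent = new HashMap<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(source);
        parent.put(source, null); // the source has no predecessor
        while (!queue.isEmpty()) {
            String n = queue.poll();
            if (n.equals(sink)) {
                LinkedList<String> path = new LinkedList<>();
                for (String cur = sink; cur != null; cur = parent.get(cur)) {
                    path.addFirst(cur);
                }
                return path;
            }
            for (String next : edges.getOrDefault(n, List.of())) {
                if (!parent.containsKey(next)) {
                    parent.put(next, n);
                    queue.add(next);
                }
            }
        }
        return List.of();
    }

    public static void main(String[] args) {
        ToyPath g = new ToyPath();
        g.addStep("this.email", "\"contact: \" + this.email");
        g.addStep("\"contact: \" + this.email", "msg");
        g.addStep("msg", "logger.info argument");
        System.out.println(String.join(" -> ", g.findPath("this.email", "logger.info argument")));
    }
}
```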



### Sanitization

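Looking through the path results, we can see that `creditCard` has already been processed by a `mask%` method; "mask" here means the value is masked (obscured). To reduce false positives, we want to exclude paths that pass through such a method, which requires overriding the `isSanitizer` predicate. In taint tracking, a sanitizer is a node at which tainted data is considered harmless, so flow does not continue through it.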
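For illustration, a mask-style method typically returns a copy of the value with most characters hidden. The method below is a hypothetical `maskCreditCard`, not the shopizer `mask%` method that appears in the results; it keeps only the last four characters and masks short inputs entirely, which is why logging its result is considered harmless.

```java
// Hypothetical mask-style method (not shopizer's): replaces every character
// except the last four with '*'. Inputs of four characters or fewer are
// masked completely so that nothing of a short value leaks.
class MaskExample {
    static String maskCreditCard(String value) {
        if (value == null) {
            return null;
        }
        int keep = value.length() > 4 ? 4 : 0;
        int hidden = value.length() - keep;
        return "*".repeat(hidden) + value.substring(hidden);
    }

    public static void main(String[] args) {
        // Prints 12 asterisks followed by 1111.
        System.out.println(maskCreditCard("4111111111111111"));
    }
}
```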


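Override the `isSanitizer` predicate. Here it is enough to treat every argument passed to a method whose name matches `mask%` as a sanitizer.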

```ql
/**
 *@name PIIQuerySanitizerPath
 *@kind path-problem
 *@description Exclude some invalid results
 */

import java
import semmle.code.java.StringFormat
import semmle.code.java.dataflow.TaintTracking
import DataFlow::PathGraph

// Fields whose name suggests PII
class SenInfoField extends Field{
    SenInfoField(){
        (this.getName().matches("%email%") or
        this.getName().matches("%phone%") or
        this.getName().matches("creditCard%")) and
        this.fromSource()
    }
}

class MySenInfoTaintConfig extends TaintTracking::Configuration{
    MySenInfoTaintConfig(){
        this = "MySenInfoTaintConfig"
    }

    // Source: any read of a PII field
    override predicate isSource(DataFlow::Node source){
        source.asExpr() = any(SenInfoField sif).getAnAccess()
    }
    // Sink: any argument of an slf4j logging call
    override predicate isSink(DataFlow::Node sink){
        sink.asExpr() = any(LoggerFormatMethod lfm).getAReference().getAnArgument()
    }
    // Sanitizer: any argument passed to a method whose name starts with "mask"
    override predicate isSanitizer(DataFlow::Node sanitizer){
        sanitizer.asExpr() = any(
            Method m| m.getName().matches("mask%")
        ).getAReference().getAnArgument()
    }
}

from MySenInfoTaintConfig config, DataFlow::PathNode source, DataFlow::PathNode sink, SenInfoField f
where 
    config.hasFlowPath(source, sink) and
    source.getNode().asExpr() = f.getAnAccess()
select sink, source, sink, "PII data from field $@ is written to log here", f, f.getName()
```
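In terms of the toy graph model, `isSanitizer` removes nodes from the graph: any path that would pass through a sanitizer node is dropped, while other paths to the same sink are still reported. The sketch below marks the (hypothetical) argument node of a `mask(...)` call as a sanitizer; `ToySanitizedTaint` and all node names are illustrative only.

```java
import java.util.*;

// Toy model of isSanitizer: sanitizer nodes are never entered during the
// search, so every source-to-sink path through them is cut off.
class ToySanitizedTaint {
    private final Map<String, List<String>> edges = new HashMap<>();
    private final Set<String> sanitizers = new HashSet<>();

    void addStep(String from, String to) {
        edges.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
    }

    // Plays the role of isSanitizer: taint can never pass through this node.
    void addSanitizer(String node) {
        sanitizers.add(node);
    }

    boolean hasFlow(String source, String sink) {
        if (sanitizers.contains(source)) {
            return false;
        }
        Deque<String> queue = new ArrayDeque<>();
        Set<String> seen = new HashSet<>();
        queue.add(source);
        seen.add(source);
        while (!queue.isEmpty()) {
            String n = queue.poll();
            if (n.equals(sink)) {
                return true;
            }
            for (String next : edges.getOrDefault(n, List.of())) {
                // Skip sanitizer nodes so that no path continues through them.
                if (!sanitizers.contains(next) && seen.add(next)) {
                    queue.add(next);
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        ToySanitizedTaint g = new ToySanitizedTaint();
        // The credit card value only reaches the log after going through mask(...).
        g.addStep("cc = this.creditCard", "mask(cc) argument");
        g.addStep("mask(cc) argument", "mask(cc) result");
        g.addStep("mask(cc) result", "logger.info argument");
        // The email value is logged directly.
        g.addStep("this.email", "logger.info argument");
        g.addSanitizer("mask(cc) argument");
        System.out.println(g.hasFlow("cc = this.creditCard", "logger.info argument")); // false
        System.out.println(g.hasFlow("this.email", "logger.info argument"));           // true
    }
}
```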



## Summary

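Identify the sources and sinks -> write the taint-tracking data-flow configuration -> refine the query to show the full paths -> add sanitizers.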


## References

https://youtu.be/hHaOxbyqy44