
tx-analysis's Introduction

TxAnalysis

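Purpose

Retrieve Ethereum contract information, crawl contract source code, and analyze the contracts and their transaction data.

Data Acquisition

1. Following the contract dataset construction method of the reference paper (ICSE'20), we query the Ethereum public dataset on Google BigQuery.

(With the BigQuery sandbox, queries can be run directly without creating a billing account linked to a credit card.)

We retrieve contract addresses and count the number of calls to each contract. To reduce the number of contracts to analyze, we select only contracts that have been called more than 8000 times, using the following query: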

SELECT contracts.address, COUNT(1) AS tx_count
FROM `bigquery-public-data.crypto_ethereum.contracts` AS contracts
JOIN `bigquery-public-data.crypto_ethereum.transactions` AS transactions ON (transactions.to_address = contracts.address)
GROUP BY contracts.address HAVING COUNT(1) > 8000
ORDER BY tx_count DESC

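To narrow the results further to ERC20 or ERC721 contracts, add one of the following filter conditions (as a WHERE clause placed before GROUP BY):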

WHERE contracts.is_erc20 = true

WHERE contracts.is_erc721 = true
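The query variants above can be assembled programmatically. The following is a minimal sketch, not the repository's own code: the helper names `build_query` and `output_name` are hypothetical, and the only thing taken from the text is the SQL itself and the CSV naming rule. It shows where the optional WHERE clause has to sit relative to GROUP BY.

```python
# Sketch only: build_query and output_name are illustrative names,
# not functions from the repository.
BASE_QUERY = """\
SELECT contracts.address, COUNT(1) AS tx_count
FROM `bigquery-public-data.crypto_ethereum.contracts` AS contracts
JOIN `bigquery-public-data.crypto_ethereum.transactions` AS transactions ON (transactions.to_address = contracts.address)
{where}GROUP BY contracts.address HAVING COUNT(1) > {min_calls}
ORDER BY tx_count DESC"""

# Filter conditions named in the text; None means "all contracts".
FILTERS = {
    None: "",
    "erc20": "WHERE contracts.is_erc20 = true\n",
    "erc721": "WHERE contracts.is_erc721 = true\n",
}


def build_query(min_calls=8000, token_filter=None):
    """Return the SQL text, with the WHERE clause inserted before GROUP BY."""
    if token_filter not in FILTERS:
        raise ValueError(f"unknown filter: {token_filter!r}")
    return BASE_QUERY.format(where=FILTERS[token_filter], min_calls=int(min_calls))


def output_name(min_calls=8000, token_filter=None):
    """Return the CSV path following the 'call count-filter condition.csv' rule."""
    return f"raw_data/gt{min_calls}-{token_filter or 'all'}.csv"


if __name__ == "__main__":
    print(build_query(token_filter="erc20"))
    print(output_name(token_filter="erc20"))
```

The resulting string can be pasted into the BigQuery console or passed to a client library of your choice.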

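The retrieved data is in raw_data/gt8000-all.csv, raw_data/gt8000-erc20.csv, and raw_data/gt8000-erc721.csv (naming rule: call count-filter condition.csv).

2. Contract source code and information are fetched from Etherscan using the py-etherscan-api package; the script is at raw_data/script/get_contracts.py.

The crawled results are in raw_data/jsons/. The JSON data format is as follows: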

// File: contract_address.json
{
    "SourceCode": "",
    "ABI": "",
    "ContractName": "",
    "CompilerVersion": "",
    "OptimizationUsed": "",
    "Runs": "",
    "ConstructorArguments": "",
    "EVMVersion": "",
    "Library": "",
    "LicenseType": "",
    "Proxy": "",
    "Implementation": "",
    "SwarmSource": ""
}
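A record in this shape can be checked and read back with the standard library alone. The sketch below assumes each file holds a single JSON object with the fields listed above and is named after the contract address; `missing_fields` and `extract_source` are hypothetical helper names and are not taken from raw_data/script/.

```python
import json
from pathlib import Path

# Field names copied from the record format shown above.
EXPECTED_FIELDS = (
    "SourceCode", "ABI", "ContractName", "CompilerVersion",
    "OptimizationUsed", "Runs", "ConstructorArguments", "EVMVersion",
    "Library", "LicenseType", "Proxy", "Implementation", "SwarmSource",
)


def missing_fields(record):
    """Return the expected field names that are absent from a record."""
    return [name for name in EXPECTED_FIELDS if name not in record]


def extract_source(json_path, codes_dir):
    """Write the SourceCode field of <address>.json to <codes_dir>/<address>.sol."""
    json_path = Path(json_path)
    record = json.loads(json_path.read_text(encoding="utf-8"))
    missing = missing_fields(record)
    if missing:
        raise ValueError(f"{json_path.name} is missing fields: {missing}")
    out_path = Path(codes_dir) / f"{json_path.stem}.sol"
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(record["SourceCode"], encoding="utf-8")
    return out_path
```

Checking for missing fields first makes a truncated or failed download surface as an error rather than as an empty .sol file.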

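Data Processing

  1. Extract the SourceCode field from the JSON data. The script is at raw_data/script/extract.py; the results are saved under raw_data/codes/ and named <contract address>.sol.

  2. Mark the contracts by type (erc20 or erc721). The script is at raw_data/script/mark.py; the result is saved as raw_data/marked.xlsx.

  3. Rename <contract address>.sol to <call count rank (descending)>_<call count>_<contract address>.sol. The script is at raw_data/script/rename.py; the processed contracts are placed in contracts/.

  4. Some contracts are written in Vyper. The contracts under contracts/ are filtered with the raw_data/script/fix.sh script, and invalid_contract is corrected manually. There are 86 Vyper contracts in total, of which numbers 55 and 70 were obtained by decompilation.

  5. Some contracts consist of multiple files and were not parsed completely. They are handled with the raw_data/script/reparser.py script; 1746 contracts are affected.

  6. Some contract source code contains character errors, such as broken newlines. These are handled with the raw_data/script/filter.sh script.

  7. Classify the source code by code similarity, using the raw_data/script/group.sh script.

  8. Move the classified contracts into separate folders, using the raw_data/script/group.py script.

  9. Write the classification information into contract_info, using the raw_data/script/write_group.py script.

  10. Fetch the contract type labels from Etherscan, using the raw_data/script/get_info.py script.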
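Two of the steps above, the rank-based renaming (step 3) and the similarity grouping (step 7), can be illustrated with a short sketch. This is not the logic of rename.py or group.sh, which is not shown here. It assumes ranks start at 1, and it swaps in Python's `difflib.SequenceMatcher` with a greedy first-match strategy as a stand-in similarity measure. The function names and the 0.9 threshold are illustrative assumptions.

```python
import difflib


def ranked_name(rank, tx_count, address):
    """Build '<rank>_<call count>_<address>.sol' (assumes 1-based ranks)."""
    return f"{rank}_{tx_count}_{address}.sol"


def rank_contracts(tx_counts):
    """Map each address to its new file name, ordered by call count descending.

    tx_counts: dict of contract address -> number of calls.
    """
    ordered = sorted(tx_counts.items(), key=lambda item: item[1], reverse=True)
    return {
        address: ranked_name(rank, count, address)
        for rank, (address, count) in enumerate(ordered, start=1)
    }


def group_by_similarity(sources, threshold=0.9):
    """Greedily group sources whose text is similar to a group's first member.

    sources: dict of name -> source text.
    threshold: assumed cut-off on SequenceMatcher.ratio(), not a value from the text.
    Returns a list of groups, each a list of names.
    """
    groups = []
    for name, code in sources.items():
        for group in groups:
            representative = sources[group[0]]
            ratio = difflib.SequenceMatcher(None, representative, code).ratio()
            if ratio >= threshold:
                group.append(name)
                break
        else:
            # No existing group was similar enough, so start a new one.
            groups.append([name])
    return groups
```

For example, `rank_contracts({"0xaaa": 9000, "0xbbb": 12000})` maps `0xbbb` to `1_12000_0xbbb.sol`. Comparing every file against each group's representative is quadratic in the worst case, so for a dataset of this size a token-based or hash-based similarity measure may be more practical.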

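Analysis Results

The classification and functional descriptions of the TOP100 contracts are in contract_info.xlsx.

The grouping and classification of the TOP1000 contracts are in group_info.xlsx.

The full analysis results are available at https://github.com/JolyonJian/contracts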

tx-analysis's People

Contributors

jolyonjian
