insiderecaptcha's Introduction

Summary

A few days ago, Google introduced a new version of ReCaptcha that theoretically allows most users to complete it just by ticking a checkbox. If Google doesn't deem the user human, the old version with distorted text appears. Although I used a normal Firefox version, I still had to fill in the text captcha after clicking, so it didn't really work for me. My curiosity led me to look at the JavaScript to find out how all this really works...

What happens on the wire

First, the browser makes the following requests:

  • https://www.google.com/recaptcha/api.js, whose function is mainly to load the next one...
  • https://www.gstatic.com/recaptcha/api2/r20141202135649/recaptcha__en.js, which contains common code.
  • https://apis.google.com/_/scs/apps-static/_/js/ (followed by a bunch of more or less cryptic parameters) which contains other common JavaScript code.

The browser then makes a request to https://www.google.com/recaptcha/api2/anchor, whose response contains the interesting part: a call to a function named recaptcha.anchor.Main.init, which takes two base64-encoded parameters.

The first parameter points to a JavaScript file: https://www.google.com/js/bg/6yg-ggdQgQAg8SAADJkAjc-JMNnOnYuIGgH_iBV7uf8.js. The second one contains double-base64-encoded binary data.
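Unwrapping the second parameter can be sketched as follows. This is a minimal sketch, not code from the repository: the helper names are hypothetical, and it assumes the standard base64 alphabet with possibly missing padding (the real parameter may use the URL-safe variant instead).

```python
import base64


def b64decode_padded(data: bytes) -> bytes:
    """Decode base64, restoring '=' padding if it was stripped (assumption)."""
    data += b"=" * (-len(data) % 4)
    return base64.b64decode(data)


def double_b64decode(param: str) -> bytes:
    """Undo two layers of base64 and return the raw binary payload."""
    inner = b64decode_padded(param.encode("ascii"))  # still base64 text
    return b64decode_padded(inner)                   # raw bytes


if __name__ == "__main__":
    # Build a sample the same way, since no real parameter is reproduced here.
    sample = base64.b64encode(base64.b64encode(b"\x00\x01bytecode")).decode()
    print(double_b64decode(sample))
```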

It turned out this new ReCaptcha system is heavily obfuscated, as Google implemented a whole VM in JavaScript with a specific bytecode language.

The first parameter is the bytecode interpreter. After trimming the (function(){eval(' and ')})() wrapper and passing the result to JSBeautifier, I finally dove into this mass of minified code.
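The trimming step can be done by hand, or with a few lines of Python. The sketch below is only an illustration with a hypothetical helper name. It removes the wrapper text quoted above and does not undo any JavaScript string escapes inside the eval argument, which may still need handling before beautifying.

```python
PREFIX = "(function(){eval('"
SUFFIX = "')})()"


def strip_eval_wrapper(source: str) -> str:
    """Return the code inside the eval wrapper, or the input unchanged."""
    source = source.strip()
    if source.startswith(PREFIX) and source.endswith(SUFFIX):
        return source[len(PREFIX):-len(SUFFIX)]
    return source


if __name__ == "__main__":
    print(strip_eval_wrapper("(function(){eval('var a=1;')})()"))
```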

The analysis

The interpreter has two entry points: the M function, which is executed when ReCaptcha is loaded, and M.prototype.ha, which is executed when you click the checkbox and returns the information destined for Google's servers.

I first discovered that the bytecode is encrypted using the XTEA algorithm, used as a stream cipher. Each 8-byte block is XORed with a keystream block (so the encryption and decryption functions are the same). The keystream block is the XTEA encryption of a 64-bit input whose first 32-bit word is read from the bytecode file and whose second 32-bit word is the position in the bytecode file divided by 8. The key is [0, 0, 0, 0] by default.
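The scheme just described can be sketched in Python. This is a minimal sketch under stated assumptions, not the code from decomp.py: the function names are hypothetical, the first word is passed in as a `nonce` argument rather than parsed from the file, and the little-endian packing and the standard 32 XTEA cycles are guesses that would need checking against the interpreter.

```python
import struct

MASK = 0xFFFFFFFF
DELTA = 0x9E3779B9  # standard XTEA constant


def xtea_encrypt_block(v0, v1, key, rounds=32):
    """Standard XTEA encryption of one 64-bit block given as two 32-bit words."""
    total = 0
    for _ in range(rounds):
        v0 = (v0 + ((((v1 << 4) ^ (v1 >> 5)) + v1) ^ (total + key[total & 3]))) & MASK
        total = (total + DELTA) & MASK
        v1 = (v1 + ((((v0 << 4) ^ (v0 >> 5)) + v0) ^ (total + key[(total >> 11) & 3]))) & MASK
    return v0, v1


def xtea_keystream_crypt(data, nonce, key=(0, 0, 0, 0)):
    """XOR each 8-byte block with XTEA(nonce, block_index); same call decrypts."""
    out = bytearray()
    for offset in range(0, len(data), 8):
        k0, k1 = xtea_encrypt_block(nonce, offset // 8, key)
        keystream = struct.pack("<II", k0, k1)  # byte order is an assumption
        block = data[offset:offset + 8]
        out.extend(b ^ k for b, k in zip(block, keystream))
    return bytes(out)


if __name__ == "__main__":
    plain = b"example bytecode bytes"
    enc = xtea_keystream_crypt(plain, nonce=0x12345678)
    assert xtea_keystream_crypt(enc, nonce=0x12345678) == plain
```

Because the cipher is only used to produce a keystream, no XTEA decryption routine is needed: applying the same function twice with the same nonce and key restores the input.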

By default... because that would have been too simple: it turns out the bytecode has direct access to the JavaScript variables of its own interpreter, and changes its own decryption key and even its own opcode numbers at many points.

Even niftier, at one point the bytecode key is generated by directly hashing JavaScript code from the interpreter (Function.toString() rocks, doesn't it?), at others from the output of browser-specific functions and CSS rules, or from the hostname of the calling domain (www.google.com)...

After about 2 days of work, I produced a working disassembler and then a decompiler for the ReCaptcha bytecode. You can try it from this GitHub repository. However, it still has some hardcoded key values, so for now it will only work on the bytecode sample contained in the enc file.

Just run ./decomp.py to give it a try; it will output pseudo-JavaScript. xhr1 and xhr2 are byte arrays that contain the data later sent to Google's servers.
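To illustrate why self-modifying opcode numbers get in the way of static disassembly, here is a toy stack machine in Python. It is entirely hypothetical and has nothing to do with the real ReCaptcha instruction set: the opcode names, numbers and the `remap` instruction are invented for the example.

```python
def run(program):
    """Interpret a toy bytecode whose opcode table can be rewritten at runtime."""
    # Opcode number -> operation name. The table is part of the VM state,
    # so the program itself can change what each number means.
    table = {0: "push", 1: "add", 2: "remap", 3: "halt"}
    stack = []
    pc = 0
    while pc < len(program):
        op = table[program[pc]]
        if op == "push":
            stack.append(program[pc + 1])
            pc += 2
        elif op == "add":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
            pc += 1
        elif op == "remap":
            # Swap the meaning of two opcode numbers.
            x, y = program[pc + 1], program[pc + 2]
            table[x], table[y] = table[y], table[x]
            pc += 3
        elif op == "halt":
            break
    return stack


if __name__ == "__main__":
    # push 2, push 3, swap opcodes 0 and 1, then "0" now means add.
    print(run([0, 2, 0, 3, 2, 0, 1, 0, 3]))
```

In this toy program, the byte 0 means push before the remap and add after it, so a disassembler has to track the table state as it walks the bytecode instead of using a fixed opcode map.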

Gathered information

Google servers will receive and process, at least, the following information:

  • Plug-ins
  • User-agent
  • Screen resolution
  • Execution time, timezone
  • Number of click/keyboard/touch actions in the <iframe> of the captcha
  • It tests the behavior of many browser-specific functions and CSS rules
  • It checks the rendering of canvas elements
  • Likely cookies server-side (it's executed on the www.google.com domain)
  • And likely other stuff...

You can look at the decompiled bytecode for more details.

This information, along with numeric values hardcoded in the bytecode (forcing a potential bot to read all of it), is sent to the https://www.google.com/recaptcha/api2/frame page. Look at the M.prototype.Q function to see how the encoding is done. Some of the information (the part I call xhr2 in the decompiler, which is retrieved from the this.c[this.g] variable; xhr1 is in this.c[this.d]) is also encrypted with XTEA.

What's next...

We could:

  • Gather statistics about when the checkbox captcha suffices and when it doesn't.
  • Programmatically bypass the captcha by interpreting the bytecode.
  • Programmatically bypass the captcha by simply running a rendering engine and automating mouse movements. But that would be slightly less fun.

Cheers and good reversing!
