Symbolic Analysis, Google MusicLM & DetectGPT - CS News #4
Welcome to CS News #4!
The CS newsletter helps you keep track of the latest news in technology domains like AI, security, software development, and blockchain/P2P, and discover interesting new projects and techniques!
💡 Project Highlight: angr, a symbolic analysis framework
angr is an open-source binary analysis platform for Python. It can analyse a binary statically (i.e. without executing it) as well as dynamically (during actual execution), and can be used to perform symbolic analysis of a program.
Symbolic analysis is a multi-step process that can be used to find inputs to a program that satisfy certain conditions (e.g. an input that bypasses some of the program's logic checks). In this process angr can be used to:
disassemble the binary to recover its basic blocks and conditional branches
represent the conditional branches as boolean logic
use z3, an SMT solver and theorem prover made by Microsoft Research, to check whether the different constraints are satisfiable for some input values (a standalone sketch of this step follows the list)
compute an input that satisfies the chosen conditional branches, so that the part of the program you are after finally gets executed
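To give a feel for the constraint-solving step, here is a minimal standalone sketch using z3's Python bindings (the z3-solver package) rather than angr itself; the variable x and the branch condition are made up for illustration, and angr actually drives z3 through its claripy layer:
from z3 import BitVec, Solver, sat

# a symbolic 32-bit input, standing in for a value read from the user
x = BitVec("x", 32)

# constraints extracted from a hypothetical branch: if (x * 2 + 3 == 13 && x > 0)
s = Solver()
s.add(x * 2 + 3 == 13, x > 0)

# ask z3 whether some concrete input can satisfy both conditions at once
if s.check() == sat:
    print(s.model()[x])  # prints 5, an input that takes the branch
angr automates exactly this loop for the branch conditions it recovers from the disassembled binary.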
Here is an example of how you can use angr, in only a few lines, to solve a defcamp CTF challenge which only prints “Nice!” when a certain input is given to the challenge binary:
>>> import angr
# load the challenge binary
>>> project = angr.Project("defcamp_quals_2015_r100", auto_load_libs=False)
>>> simgr = project.factory.simulation_manager()
# tell angr to find a path that reaches the part of the binary outputting the "Nice!" string
>>> simgr.explore(find=lambda state: b"Nice!" in state.posix.dumps(1))
# print the first solution found
>>> print(simgr.found[0].posix.dumps(0))
Code_Talkers
$ ./defcamp_quals_2015_r100
Enter the password: Code_Talkers
Nice!
🎶 MusicLM
Google Research introduced MusicLM, a model that generates high-fidelity music from text descriptions and outperforms previous systems both in audio quality and in adherence to the text description. MusicLM can be conditioned on both text and a melody: it can transform whistled and hummed melodies according to the style described in a text caption.
Audio examples: “bella ciao” rendered as a generated a cappella chorus and as a generated electronic synth lead.
Furthermore, Google Research publicly released MusicCaps, a dataset of 5.5k music-text pairs with rich text descriptions written by human experts. However, as reported by TechCrunch, Google will not release access to the model, as it fears copyright strikes.
👉 Discover MusicLM
🧐 DetectGPT
ChatGPT has democratised Large Language Models (LLMs) and shown their usage and potential. With that potential come harmful applications (bots, spamming, phishing, etc.), which raises a question: is it possible to distinguish human-written text from auto-generated text?
To address this issue, researchers from Stanford University published DetectGPT, a new curvature-based criterion for judging whether a passage of text was generated by a given LLM. The method builds on the observation that text sampled from an LLM tends to occupy regions of negative curvature of the model's log-probability function. Without requiring a separate classifier, a dataset of real or generated passages, or explicit watermarking of generated text, it outperforms existing zero-shot methods for detecting model samples.
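As a rough illustration of the idea, here is a minimal sketch of the perturbation-discrepancy score using Hugging Face transformers with GPT-2 as the scored model; the perturb argument is a hypothetical placeholder for the mask-and-refill step (the paper uses T5 for it), so this is a sketch of the criterion, not the authors' implementation:
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def avg_log_prob(text: str) -> float:
    # average per-token log-probability of `text` under the scored model
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**enc, labels=enc["input_ids"])
    return -out.loss.item()  # the loss is the mean negative log-likelihood

def detectgpt_score(text: str, perturb, n: int = 20) -> float:
    # perturbation discrepancy: log p(x) minus the mean log p over slightly
    # rewritten variants of x; a large positive value suggests x was sampled
    # from the model (x sits near a local maximum of its log-probability)
    original = avg_log_prob(text)
    perturbed = [avg_log_prob(perturb(text)) for _ in range(n)]
    return original - sum(perturbed) / len(perturbed)
A passage is then flagged as model-generated when this score is sufficiently large.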
Interesting stuff
Part 2 in a series of posts covering different topical areas of research at Google
Understanding Finetuning for Factual Knowledge Extraction from Language Models
PushWorld: A benchmark for manipulation planning with tools and movable obstacles
Apple is reportedly working on a way to make AR apps that’s as simple as talking to Siri