Can AI Speed up your Code?
AI “software engineers” have been gaining a lot of attention recently. But do they actually work?
Unfortunately, it’s difficult to tell. The existing products aren’t widely available to use and test. The most widely used benchmark, SWE-Bench Lite, consists of real-world issues and patches from GitHub. However, the patches in SWE-Bench Lite change only 10 lines of code per patch! This rises to 37 lines of code for the full SWE-Bench, but neither is reflective of complex software engineering tasks.
To understand the limits of AI for software development, we’ve been exploring how well AI handles real-world coding tasks. In this blog post, we’ll describe how we used AI to convert part of a popular Python library to C++ for speed improvements.
What we’ve found is that AI is quite good at writing large volumes of code, but it struggles with precision. Unfortunately (or fortunately?), AI isn’t here to replace humans yet. Read on to see why! And connect with us if you’re interested in fast Python or understanding AI and code.
Python vs C++
Python is one of the most widely used languages - and for good reason. It’s extremely flexible, which allows for fast development. However, it’s also extremely slow. C++ is nearly the opposite. It allows fine-grained control of memory and the final compiled code. Unfortunately, it’s considered difficult to use.
Python has “bindings” that let developers write C++ code and call it from Python. This can be far more efficient than pure Python. Many popular libraries, including PyTorch and NumPy, use this interface for speedups that can potentially reach thousands of times. Beyond numerical computation, Astral has been building fast Python linting and package management in Rust, with up to 100x performance improvements!
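Real bindings need a compiler toolchain, so they are hard to show in a few lines. As a rough stand-in, the sketch below compares a hand-written Python loop with the built-in sum, which CPython implements in C. The helper name python_sum and the input size are assumptions chosen for illustration, and the measured gap will vary by machine and Python version.

```python
import timeit


def python_sum(values):
    """Sum a sequence with an explicit Python-level loop."""
    total = 0
    for v in values:
        total += v
    return total


values = list(range(100_000))

# Both approaches compute the same result; only the execution path differs.
assert python_sum(values) == sum(values)

# Time 20 repetitions of each approach on the same input.
loop_time = timeit.timeit(lambda: python_sum(values), number=20)
builtin_time = timeit.timeit(lambda: sum(values), number=20)
print(f"Python loop: {loop_time:.4f}s, built-in sum: {builtin_time:.4f}s")
```

The same idea, moving a hot loop out of the interpreter and into compiled code, is what a C++ binding does for library-specific logic.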
As organizations shift from needing to move fast to needing high performance, they often face a tough choice: continue using a flexible language like Python or switch to a higher performance language? And can AI help with this process?
Converting urllib
We decided to explore this question by converting parts of urllib to C++. urllib heavily relies on string processing and regular expressions, which can be substantially more efficient in C++ than Python. This sounds like a great target for conversion! We decided to try to convert urlparse.
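For readers who have not used it, here is a small sketch of what urlparse does from the caller's side. The URL is a made-up example; the attribute names are the ones exposed by the standard library's result object.

```python
from urllib.parse import urlparse

# A made-up URL used purely for illustration.
result = urlparse("https://example.com:8080/docs/index.html?lang=en#intro")

# The result is a named tuple, so the fields are available by name.
print(result.scheme)    # https
print(result.netloc)    # example.com:8080
print(result.path)      # /docs/index.html
print(result.query)     # lang=en
print(result.fragment)  # intro

# Convenience attributes derived from netloc.
print(result.hostname)  # example.com
print(result.port)      # 8080
```

Any C++ replacement has to return the same fields for the same inputs, which is what makes the conversion a precision problem rather than just a code-volume problem.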
The code for urlparse is available here. We first tried to convert the entire file, but none of the models we tried were willing or able to do so, with results like this:
Since this approach didn’t work, we then split the file into parts (largely by function or declarations) and started to convert them one at a time. Some of the parts were simple, such as the constant variable declarations. Even some simple functions were easy to convert, such as the following function:
Other functions were more challenging. For example, the function _checknetloc uses unicodedata.normalize, which doesn’t have a default implementation in the C++ standard library. The AI systems we tried were confused by these kinds of issues.
Importantly, if we blindly trusted the AI, we would have had incorrect code for non-ASCII URLs! Our findings echo other findings around AI-assisted coding.
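To see why non-ASCII URLs matter here, the sketch below shows how Unicode NFKC normalization can turn a single harmless-looking character into text containing a URL delimiter. The helper has_hidden_delimiter is a hypothetical, simplified check written in the spirit of _checknetloc; it is not a copy of the standard library code.

```python
import unicodedata


def has_hidden_delimiter(netloc):
    """Hypothetical helper: True if NFKC normalization introduces a URL delimiter."""
    if netloc.isascii():
        return False  # ASCII text is unchanged by NFKC normalization
    # Ignore delimiters that are already present in the original text.
    stripped = netloc
    for c in "@:#?":
        stripped = stripped.replace(c, "")
    normalized = unicodedata.normalize("NFKC", stripped)
    if normalized == stripped:
        return False
    return any(c in normalized for c in "/?#@:")


# U+2100 (ACCOUNT OF) expands to the three characters 'a/c' under NFKC.
print(unicodedata.normalize("NFKC", "\u2100"))

print(has_hidden_delimiter("example.com"))          # plain ASCII host
print(has_hidden_delimiter("b\u00fccher.example"))  # non-ASCII, but no delimiter appears
print(has_hidden_delimiter("exam\u2100ple.com"))    # normalization introduces '/'
```

A C++ port that skips this step, or approximates it with byte-level comparisons, would accept hosts that the Python version rejects.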
One of the most confusing parts of the conversion was the method to “coerce” the arguments into the subclasses of the namedtuples. One part of the conversion was:
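Seen from the Python side, the behavior that this coercion step has to preserve looks roughly like the following. This is a sketch using only the public urllib.parse API with a made-up URL; it illustrates the observable behavior, not the converted C++ code.

```python
from urllib.parse import urlparse, ParseResult, ParseResultBytes

# str in, str-based result out.
text_result = urlparse("http://example.com/path?q=1")
# bytes in, bytes-based result out.
bytes_result = urlparse(b"http://example.com/path?q=1")

print(type(text_result).__name__, text_result.netloc)
print(type(bytes_result).__name__, bytes_result.netloc)

# Mixing str and bytes arguments in one call is rejected.
mixed_error = None
try:
    urlparse(b"http://example.com/", scheme="http")
except TypeError as exc:
    mixed_error = exc
print("mixed arguments:", mixed_error)
```

The two result types share field names but carry different data types, so a C++ implementation behind the binding has to track which kind of input it received and build the matching result.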
Performance
After converting the code, we wanted to see if we succeeded in speeding up urlparse. We created a short benchmark of parsing long URLs. On a laptop, the standard urlparse parses 1 million URLs in 2.6 seconds, while our optimized implementation (when interfacing with Python) takes 1.3 seconds! That is a 2x improvement in throughput on this benchmark.
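A benchmark of this kind can be sketched as follows. The URL shape, the count of 100,000, and the helper names make_urls and time_urlparse are assumptions for illustration and are not the benchmark used for the numbers above. The URLs are kept distinct because urllib.parse caches recent results, so parsing one URL repeatedly would mostly measure the cache.

```python
import time
from urllib.parse import urlparse


def make_urls(n):
    """Build n distinct, fairly long URLs (hypothetical shape, not the post's data)."""
    return [
        f"https://sub{i}.example.com:8443/a/very/long/path/segment/{i}/index.html"
        f"?user={i}&session={'x' * 40}&page={i % 100}#section-{i}"
        for i in range(n)
    ]


def time_urlparse(urls):
    """Return (seconds elapsed, number of URLs parsed) for one pass over urls."""
    start = time.perf_counter()
    parsed = 0
    for url in urls:
        urlparse(url)
        parsed += 1
    return time.perf_counter() - start, parsed


urls = make_urls(100_000)
elapsed, parsed = time_urlparse(urls)
print(f"parsed {parsed} URLs in {elapsed:.3f}s ({parsed / elapsed:,.0f} URLs/s)")
```

To compare against a compiled implementation, the same loop would call the bound function instead of urlparse, on the same list of URLs.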
Conclusions
As we’ve seen, we can successfully convert simple Python to C++ with AI assistance. Unfortunately, the existing tools we have access to aren’t capable of doing the conversion automatically.
Nonetheless, we think this approach has some promise. Reach out if you’re interested in fast Python or interested in AI-assisted coding!