I’m porting Jonas Wagner’s excellent smartcrop.js, which analyzes an image to recommend a good crop, from JavaScript to Java.
When porting code, I always try to understand it along the way so that if (when) there are bugs, I’ll have an idea of where the bug might be. I used ChatGPT 4 to untangle a particularly terse bit of code, and was — frankly — shocked at how well it did.
Among many other heuristics, smartcrop applies the Rule of Thirds to find a good crop of an image quickly and reliably. It implements this rule with the following code:
```javascript
// Gets value in the range of [0, 1] where 0 is the center of the pictures
// returns weight of rule of thirds [0, 1]
function thirds(x) {
  x = (((x - 1 / 3 + 1.0) % 2.0) * 0.5 - 0.5) * 16;
  return Math.max(1.0 - x * x, 0.0);
}
```
Now, I’m no savant, but I have a degree in CS and have been around the metaphorical block a few times, so I like to think I’m smarter than the average bear at reading code and math. That said, I found this code completely inscrutable, and how (or even whether) it implements a heuristic for the rule of thirds escaped me entirely.
When I’m faced with dense numerical code that I can’t grok, I usually break out a Jupyter notebook or Wolfram Alpha and plug and chug until I understand what’s going on. It works, but it takes time and effort, and I’m just porting a library.
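For comparison, the plug-and-chug route doesn't strictly need a notebook. Here is a minimal sketch of it in Java: ThirdsProbe is a hypothetical class name, the function is transliterated directly from the JavaScript above (using float, like the Java port, so values can differ from the double-precision JavaScript in the last few bits), and the 1/24 sampling step is an arbitrary choice.

```java
// Hypothetical probe class: a direct transliteration of smartcrop.js's thirds()
// plus a loop that samples it, i.e. "plug and chug" without a notebook.
final class ThirdsProbe {
    // Same arithmetic as the JavaScript original, in single precision.
    static float thirds(float x) {
        x = (((x - 1.0f / 3.0f + 1.0f) % 2.0f) * 0.5f - 0.5f) * 16.0f;
        return Math.max(1.0f - x * x, 0.0f);
    }

    public static void main(String[] args) {
        // Sample the documented input range [0, 1] in steps of 1/24.
        for (int i = 0; i <= 24; i++) {
            float x = i / 24.0f;
            System.out.printf("x = %.4f  thirds(x) = %.4f%n", x, thirds(x));
        }
    }
}
```

Running it prints a single narrow hump centered on x = 1/3, with zeros at the center (x = 0) and at the edge (x = 1), which is already a strong hint about what the function is for.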
So this time, I asked my good buddy, ChatGPT, for some help reverse engineering the code.
I had the following (slightly abridged) conversation about the above code with ChatGPT:
ANDY
This Java method is supposed to provide a heuristic value for the rule of thirds when cropping images. Can you help me unpack if it works, and how?
```java
private float thirds(float x) {
    x = (((x - 1.0f / 3.0f + 1.0f) % 2.0f) * 0.5f - 0.5f) * 16.0f;
    return Math.max(1.0f - x * x, 0.0f);
}
```
CHATGPT
The rule of thirds is a principle in photography and visual arts that suggests that an image should be divided into nine equal parts by two equally spaced horizontal lines and two equally spaced vertical lines. Important compositional elements should be placed along these lines or at their intersections to create more tension, energy, and interest in the composition compared to simply centering the subject.
The thirds function you've provided seems to be a heuristic designed to quantify how well a particular point (given by its x coordinate) aligns with the rule of thirds. The function manipulates the x coordinate in a way that likely aims to give higher values (closer to 1) when x is near the thirds (either 1/3 or 2/3 of the way across the frame), and lower values (closer to 0) as x moves away from these points. Let's break down the function step-by-step:
x - 1.0f / 3.0f + 1.0f: This shifts the x value so that one of the thirds (specifically, the 1/3 mark) is moved to the origin (0 point), and then adds 1 to ensure the value is positive.
((... % 2.0f) * 0.5f - 0.5f): The modulo operation with 2 ensures that the result wraps around every 2 units, creating a repeating pattern. This is scaled down by 0.5 and then shifted by -0.5 to center the peak of the pattern at 0. The pattern repeats every 2 units, so there will be peaks at positions corresponding to the rule of thirds (1/3 and 2/3) due to the initial shift and wrap-around effect.
* 16.0f: This scales the pattern up, increasing the sensitivity of the function around the thirds. The choice of 16 is likely arbitrary and could be adjusted based on how sharply the function should penalize deviations from the rule of thirds.
Math.max(1.0f - x * x, 0.0f): Finally, the function squares the scaled value (increasing the penalty for being away from the thirds), subtracts it from 1 (inverting the curve so that values near the thirds are higher), and ensures that the result is not negative (using Math.max with 0).
In essence, this function is designed to produce a bell-shaped curve that peaks at points corresponding to the rule of thirds, and the curve's width and height are manipulated through the scaling and shifting operations. The closer an element is to a third, the higher the heuristic value returned by the function, with the value decreasing quadratically as the element moves away from the third.
This heuristic could be used in image cropping algorithms to score potential crops based on how well they align with the rule of thirds, favoring compositions where important elements are near these thirds lines.
ANDY
This is an outstanding description. Can you rewrite the above method so that the intent is clear? Using this as commenting throughout the function would be useful.
CHATGPT
Certainly! I'll rewrite the method with more descriptive variable names and include comments to explain each step of the process. This should make the intent and functionality of the method clearer:
```java
/**
 * Calculates a heuristic value for alignment with the rule of
 * thirds. This function evaluates how closely a given point (x)
 * aligns with the rule of thirds, returning a higher value for
 * points closer to the thirds lines (1/3 and 2/3) and lower values
 * for points further away. The function uses a bell-shaped curve
 * that peaks at the thirds lines and decreases quadratically with
 * distance from these points.
 *
 * @param x The normalized position of a point within a frame, where
 *     0 <= x <= 1.
 * @return A heuristic value indicating the alignment with the rule
 *     of thirds, where 1 is perfectly aligned, and 0 indicates no
 *     alignment.
 */
private float evaluateRuleOfThirds(float x) {
    // Shift x so that 1/3 aligns with 0 and wrap around every 2
    // units to create a repeating pattern
    float shiftedAndWrappedX = (((x - 1.0f / 3.0f + 1.0f) % 2.0f) * 0.5f - 0.5f);
    // Scale the pattern to increase sensitivity around the thirds
    float scaledX = shiftedAndWrappedX * 16.0f;
    // Apply a quadratic function to create a bell-shaped curve that
    // peaks at the thirds and ensure the result is non-negative
    float heuristicValue = Math.max(1.0f - scaledX * scaledX, 0.0f);
    return heuristicValue;
}
```
ANDY
In the above method, can you break the expression for shiftedAndWrappedX into multiple steps, each with their own comments? The Javadoc and everything else already looks great.
CHATGPT
Sure, I'll break down the expression for shiftedAndWrappedX into multiple steps with individual comments to further clarify each part of the process:
```java
/**
 * Calculates a heuristic value for alignment with the rule of
 * thirds. This function evaluates how closely a given point (x)
 * aligns with the rule of thirds, returning a higher value for
 * points closer to the thirds lines (1/3 and 2/3) and lower values
 * for points further away. The function uses a bell-shaped curve
 * that peaks at the thirds lines and decreases quadratically with
 * distance from these points.
 *
 * @param x The normalized position of a point within a frame, where
 *     0 <= x <= 1.
 * @return A heuristic value indicating the alignment with the rule
 *     of thirds, where 1 is perfectly aligned, and 0 indicates no
 *     alignment.
 */
private float evaluateRuleOfThirds(float x) {
    // Step 1: Shift 'x' so that the 1/3 mark aligns with 0
    float shiftedX = x - 1.0f / 3.0f;
    // Step 2: Add 1 to ensure the value is positive (necessary for
    // the modulo operation)
    float positiveShiftedX = shiftedX + 1.0f;
    // Step 3: Apply modulo 2 to create a repeating pattern every 2
    // units
    float wrappedX = positiveShiftedX % 2.0f;
    // Step 4: Scale and shift the wrapped value to center the peak
    // of the bell curve at 0. The '* 0.5f' scales the pattern down,
    // and the '- 0.5f' shifts the peak to 0
    float shiftedAndWrappedX = (wrappedX * 0.5f) - 0.5f;
    // Step 5: Scale the pattern to increase sensitivity around the
    // thirds
    float scaledX = shiftedAndWrappedX * 16.0f;
    // Apply a quadratic function to create a bell-shaped curve that
    // peaks at the thirds and ensure the result is non-negative
    float heuristicValue = Math.max(1.0f - scaledX * scaledX, 0.0f);
    return heuristicValue;
}
```
That was surprisingly helpful. It picked up the quadratic nature of the algorithm, identified that the 16.0f term is an arbitrary weight that controls the width of the bell curve, and explained both clearly and concisely. And it did so in less time than I would have spent mucking about in Jupyter.
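Rewriting the score with the width as an explicit parameter shows what that constant does. In the minimal sketch below, ThirdsWidth, thirdsScore, peak and halfWidth are hypothetical names, not smartcrop.js API. In the original, the distance from 1/3 is multiplied by 0.5 and then by 16, i.e. by 8 overall, so the score reaches zero once x is more than 1/8 away from 1/3. Strictly speaking, the "bell" is an inverted parabola clipped at zero rather than a true bell curve.

```java
// Hypothetical generalization of thirds(): an inverted parabola, clipped at 0,
// that equals 1 at `peak` and reaches 0 at peak +/- halfWidth.
final class ThirdsWidth {
    static float thirdsScore(float x, float peak, float halfWidth) {
        float d = (x - peak) / halfWidth; // distance from the peak, in half-widths
        return Math.max(1.0f - d * d, 0.0f);
    }

    // The original expression, for comparison.
    static float thirds(float x) {
        x = (((x - 1.0f / 3.0f + 1.0f) % 2.0f) * 0.5f - 0.5f) * 16.0f;
        return Math.max(1.0f - x * x, 0.0f);
    }

    public static void main(String[] args) {
        // 0.5f * 16.0f = 8, i.e. a half-width of 1/8 around the 1/3 peak.
        float peak = 1.0f / 3.0f;
        float halfWidth = 1.0f / (0.5f * 16.0f);
        float maxDiff = 0.0f;
        for (int i = 0; i <= 1000; i++) {
            float x = i / 1000.0f;
            maxDiff = Math.max(maxDiff, Math.abs(thirds(x) - thirdsScore(x, peak, halfWidth)));
        }
        System.out.println("max difference over [0, 1]: " + maxDiff);
    }
}
```

On a fine grid the two agree to within float rounding, which is a quick way to convince yourself that 16.0f (together with the 0.5f before it) is just a width knob: 0.5 * 16 = 8 = 1 / halfWidth.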
Hopefully the reader agrees that the resulting code is clearer:
```java
/**
 * Calculates a heuristic value for alignment with the rule of thirds.
 * This function evaluates how closely a given point (x) aligns with the
 * rule of thirds, returning a higher value for points closer to the
 * thirds lines (1/3 and 2/3) and lower values for points further away.
 * The function uses a bell-shaped curve that peaks at the thirds lines
 * and decreases quadratically with distance from these points.
 *
 * @param x The normalized position of a point within a frame, where
 *     0 <= x <= 1.
 * @return A heuristic value indicating the alignment with the rule of
 *     thirds, where 1 is perfectly aligned, and 0 indicates no alignment.
 */
private float evaluateRuleOfThirds(float x) {
    // Step 1: Shift 'x' so that the 1/3 mark aligns with 0
    float shiftedX = x - 1.0f / 3.0f;
    // Step 2: Add 1 to ensure the value is positive (necessary for the
    // modulo operation)
    float positiveShiftedX = shiftedX + 1.0f;
    // Step 3: Apply modulo 2 to create a repeating pattern every 2 units
    float wrappedX = positiveShiftedX % 2.0f;
    // Step 4: Scale and shift the wrapped value to center the peak of the
    // bell curve at 0. The '* 0.5f' scales the pattern down, and the
    // '- 0.5f' shifts the peak to 0
    float shiftedAndWrappedX = (wrappedX * 0.5f) - 0.5f;
    // Step 5: Scale the pattern to increase sensitivity around the thirds
    float scaledX = shiftedAndWrappedX * 16.0f;
    // Step 6: Apply a quadratic function to create a bell-shaped curve
    // that peaks at the thirds and ensure the result is non-negative
    float heuristicValue = Math.max(1.0f - scaledX * scaledX, 0.0f);
    return heuristicValue;
}
```
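A refactor, human- or LLM-authored, shouldn't change behavior, and that's cheap to check. A minimal sketch, assuming the original one-liner and ChatGPT's rewrite are pasted side by side as static methods in a hypothetical RewriteCheck class (comments trimmed for brevity), compares the raw float bits on a fine grid over [0, 1]. Since the rewrite performs the same float operations in the same order, the two should agree exactly, not just approximately.

```java
// Hypothetical harness: the original one-liner and ChatGPT's rewrite, side by side.
final class RewriteCheck {
    // Original port (same arithmetic as smartcrop.js's thirds()).
    static float thirds(float x) {
        x = (((x - 1.0f / 3.0f + 1.0f) % 2.0f) * 0.5f - 0.5f) * 16.0f;
        return Math.max(1.0f - x * x, 0.0f);
    }

    // ChatGPT's step-by-step rewrite, comments trimmed.
    static float evaluateRuleOfThirds(float x) {
        float shiftedX = x - 1.0f / 3.0f;
        float positiveShiftedX = shiftedX + 1.0f;
        float wrappedX = positiveShiftedX % 2.0f;
        float shiftedAndWrappedX = (wrappedX * 0.5f) - 0.5f;
        float scaledX = shiftedAndWrappedX * 16.0f;
        return Math.max(1.0f - scaledX * scaledX, 0.0f);
    }

    // Counts grid points in [0, 1] where the two versions differ bit-for-bit.
    static int countMismatches(int steps) {
        int mismatches = 0;
        for (int i = 0; i <= steps; i++) {
            float x = (float) i / steps;
            if (Float.floatToIntBits(thirds(x)) != Float.floatToIntBits(evaluateRuleOfThirds(x))) {
                mismatches++;
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        System.out.println("mismatches: " + countMismatches(100_000));
    }
}
```

A check like this says nothing about whether the heuristic is a good one, only that the rewrite preserved it.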
So how did ChatGPT do?
In a code review setting, I would have bounced the original code on sight for being too hard to read and understand. ChatGPT's rewrite, on the other hand, would have been a pleasure to review with a (human) author: it's well-commented, with meaningful variable names and a clear explanation of what's (supposed to be) going on. (Note that this is different from saying the code would have passed review and is ready for production!)
The original code is hard to understand. Without dunking on the design of the (again, excellent) smartcrop.js library, this function's contract is confusing. (It's supposed to be called with 2 * abs(x / width - 0.5) for the horizontal coordinate and the analogous expression for the vertical one. That folds the frame around its center, so both thirds lines, at 1/3 and 2/3 of the width, arrive as an input of 1/3. It's an odd contract, and the implementation also has some extraneous math that's not required.) ChatGPT faithfully broke out the code exactly as written and attempted to explain it, doing a rather good job all things considered, but did not attempt to address the strange design. (Of course, if you expect an LLM to fix your software design, that's on you, I suppose.)
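To make that contract concrete, here's a minimal sketch. The names ThirdsContract, toThirdsInput and moduloNeverWraps are hypothetical, not smartcrop.js API; toThirdsInput applies the 2 * abs(x / width - 0.5) mapping from the text. As for the extraneous math: for inputs in [0, 1], x - 1/3 + 1 stays between 2/3 and 5/3, so the % 2.0 never wraps, and the + 1.0 and - 0.5 cancel each other out, leaving 8 * (x - 1/3) as the value that gets squared.

```java
// Hypothetical helpers illustrating the input contract of thirds().
final class ThirdsContract {
    // Original port, unchanged.
    static float thirds(float x) {
        x = (((x - 1.0f / 3.0f + 1.0f) % 2.0f) * 0.5f - 0.5f) * 16.0f;
        return Math.max(1.0f - x * x, 0.0f);
    }

    // Maps a position along one axis to thirds()'s input: 0 at the center,
    // 1 at either edge. Both thirds lines (size/3 and 2*size/3) land on 1/3.
    static float toThirdsInput(float pos, float size) {
        return 2.0f * Math.abs(pos / size - 0.5f);
    }

    // True if "% 2.0f" leaves the shifted value unchanged (i.e. the modulo
    // never wraps) at every grid point in [0, 1].
    static boolean moduloNeverWraps(int steps) {
        for (int i = 0; i <= steps; i++) {
            float shifted = (float) i / steps - 1.0f / 3.0f + 1.0f;
            if (shifted % 2.0f != shifted) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        float width = 300.0f;
        // Left edge, left third, center, right third, right edge.
        float[] positions = {0.0f, width / 3.0f, width / 2.0f, 2.0f * width / 3.0f, width};
        for (float pos : positions) {
            float x = toThirdsInput(pos, width);
            System.out.printf("pos = %5.1f  input = %.4f  thirds = %.4f%n", pos, x, thirds(x));
        }
        System.out.println("thirds(2/3) = " + thirds(2.0f / 3.0f));
        System.out.println("modulo never wraps on [0, 1]: " + moduloNeverWraps(100_000));
    }
}
```

Note that thirds(2/3) itself is 0: the function has a single peak, and it's the caller's folding that makes the 2/3 line score as well as the 1/3 line.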
In the final analysis, this is at best a messy code sample. But, of course, that's why I used ChatGPT to analyze it to begin with. So whether this was a meaningful example to discuss due to its complexity and errors or merely a distraction is, as always, up to you, dear reader.
Going in, I believed that ChatGPT and similar tools have a role in modern software development workflows. Working through this example increased my confidence in that belief. ChatGPT isn't in danger of taking anyone's job (at least, not yet), but it's already useful.
For reference, I used my bog-standard $20/mo plan with OpenAI, which grants me access to ChatGPT 4. It’s possible one could have done this with a free plan, but I haven’t tried it. Also, I’m not sure how ChatGPT 4 stacks up against ChatGPT 3.x in these scenarios.