Svarichevsky Mikhail - RSS feed http://3.14.by/ en-us
<![CDATA[Finishing a 10-minute task in 2 hours using ChatGPT]]> http://3.14.by/en/read/chatgpt-gpt-openai-10-minute-task-in-2-hours-llm
Many of us have heard stories of someone completing days' worth of work in minutes using AI, even while being outside their area of expertise. Indeed, LLMs often work (almost) miracles, but today I had a different experience.

The task was almost trivial: generate a look-up table (LUT) for per-channel image contrast enhancement using some S-curve function, and apply it to an image. Let's not waste any time: just fire up ChatGPT (even v3.5 should do, it's just a formula), get Python code for a generic S-curve (the code conveniently already included visualization via matplotlib) and tune the parameters until you like the result before plugging it into the image processing chain. ChatGPT generated code for a logistic function, which is a common choice as it is among the simplest, but it cannot change the curve shape from contrast enhancement to contrast reduction simply by changing the shape parameter.

The issue with the generated code, though, was that the graph showed it reducing contrast instead of increasing it. When I asked ChatGPT to correct this error, it apologized and produced more and more broken code. Simply changing the shape parameter by hand was not possible due to a mathematical limitation: the formula is not generic enough. Well, it is not the end of the world; LLMs do have limits, especially on narrow-field tasks, so that is not really news. But the story does not end here.


For reference, this is ChatGPT's code:

import numpy as np
import matplotlib.pyplot as plt

def create_s_curve_lut():
    # Define parameters for the sigmoid curve
    a = 10.0  # Adjust this parameter to control the curve's shape
    b = 127.5  # Midpoint of the curve (127.5 for 8-bit grayscale)

    # Create the S-curve LUT using the sigmoid function
    lut = np.arange(256)
    lut = 255 / (1 + np.exp(-a * (lut - b) / 255))

    # Normalize the LUT to the 0-255 range
    lut = (lut - np.min(lut)) / (np.max(lut) - np.min(lut)) * 255

    return lut.astype(np.uint8)

# Create the S-curve LUT
s_curve_lut = create_s_curve_lut()

# Plot the S-curve for visualization
plt.plot(s_curve_lut, range(256))
plt.xlabel("Input Values (0-255)")
plt.ylabel("Output Values (0-255)")
plt.title("S-curve Contrast Enhancement LUT")
plt.show()

# You can access the S-curve LUT with s_curve_lut

At this point I gave up on the ChatGPT LUT code and redid it using the more universal regularized incomplete beta function. I adjusted the a=b parameter to achieve a curve shape that I liked and applied the LUT to the image using OpenCV's LUT function. To my surprise and disbelief, the function was reducing contrast instead of increasing it. What?
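
For reference, a minimal sketch of that beta-function approach (my own reconstruction, not the exact code from the post; the shape value and file names are illustrative):

import numpy as np
import cv2
from scipy.special import betainc

def beta_s_curve_lut(shape=3.0):
    # Regularized incomplete beta function I_x(shape, shape):
    # shape > 1 gives an S-curve (more contrast), shape < 1 reduces contrast.
    x = np.linspace(0.0, 1.0, 256)
    y = betainc(shape, shape, x)
    return np.clip(np.round(y * 255.0), 0, 255).astype(np.uint8)

lut = beta_s_curve_lut(shape=3.0)      # illustrative shape factor
img = cv2.imread("input.jpg")          # hypothetical input image
enhanced = cv2.LUT(img, lut)           # cv2.LUT applies the LUT per channel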

After extensive head-scratching, I made a simplified linear contrast-enhancement LUT to troubleshoot the problem and observed the expected result. Only when I added the linear contrast LUT to the graph did the issue become clear: when I abandoned ChatGPT's S-curve function, I kept its graph code. In that code ChatGPT labeled the graph's axes and even added a title, but then threw a wrench in the works by feeding the x-data into the Y axis and vice versa, effectively flipping the graph. As the parameters of plt.plot are not named, this error is very easy for a human to miss.

When I tuned the shape factor for the beta function against a flipped graph, I made it contrast-reducing while it looked like exactly what I needed. When I told ChatGPT that its S-curve function was reducing contrast instead of increasing it, I misled it (and it unconditionally believed me): the S-curve was correct, the error was in the graphing code. Of course, if you tell ChatGPT that the error is in the plt.plot parameters, it can correct it.
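
The fix itself is a one-liner: pass the input values as x and the LUT output as y.

plt.plot(range(256), s_curve_lut)   # was: plt.plot(s_curve_lut, range(256))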

I remember my analytic geometry teacher at the final exam: while I was presenting my solution, he would unexpectedly disagree with one of the steps and claim there was an error. To get the maximum mark you had to not panic and keep defending the correct solution. Hopefully we will see LLMs disagree with users more.


But that's not all: just when I thought we were done, there is one more bug in the code. One can notice a slight asymmetry of the GPT-TRAP curve at the high end. It's a rounding error: the calculated value is simply cast to uint8 (which discards the fractional part) instead of being rounded, so on average we get 0.5 units / ~0.25% lower image brightness and significantly rarer pure-white values (255). What is interesting is that this error appeared to be systematic, present in every generated sample from every LLM I tested. Apparently the error was very widespread in the training data of all LLMs, so they have all learned that "multiply by 255 and cast to uint8" is enough to fit values into the 0..255 range. Technically this is true, but the result is mathematically flawed.
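
A minimal fix (my own edit of the last two lines of the LUT function) is to round before casting:

    lut = (lut - np.min(lut)) / (np.max(lut) - np.min(lut)) * 255
    return np.round(lut).astype(np.uint8)   # round to nearest instead of truncating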


My conclusions are:
  • LLMs are like junior developers: they can and will make unexpected mistakes, and they need clear instructions and guidance. The difference is that junior developers learn over time, while LLMs only improve with the next generation. Like junior developers, LLMs need to be "managed" with reasonable expectations.
  • All code from LLMs must be verified, and the more niche the field, the more tests are needed. LLMs generate code that looks correct, and when it isn't, the errors can be very subtle and expensive to debug and fix.
  • In case of unexpected or puzzling results it is often faster to simply ask multiple LLMs: in addition to ChatGPT (3.5/4) we now have Copilot, Bard, Replit and more. None of them gave perfect results on the first try, but some of the errors were different and often less subtle, making it easier to get things working in about 20 minutes total.
  • Some of the errors are systematic across multiple LLMs and apparently come from the training data (which LLMs currently trust unconditionally, unlike humans). That is, today's LLMs cannot exceed the quality of their training data, only approach it. It is unclear how much further work will be needed to get perfect results consistently; I am afraid this might be a case where the last 10% of the work requires 90% of the time.
]]>
Sun, 22 Oct 23 23:18:49 +0000
<![CDATA[Sirius and color twinkling ]]> http://3.14.by/en/read/Sirius-color-twinkling-astronomy-astrophotography-turbulence-atmosphere
Why does it happen? Stars twinkle because atmospheric turbulence acts as a random gradient-refractive-index "prism" which randomly shifts the image and splits colors (yes, even air has dispersion, and it is visible here!), so randomly more or less light of each color hits the lens aperture / eye. For a star, the air turbulence is sampled (in this case) inside a cylinder 62mm in diameter and ~50km long, which makes the effect very visible. Jupiter, for example, averages turbulence over a cone that opens up to 7.2m at 50km due to the planet's angular size, which dramatically reduces the contrast of the twinkling. The same averaging (reduction of twinkling) can happen in large telescopes (300mm+) even for stars, simply because they sample a larger air volume.
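
A quick sanity check of those numbers (my own arithmetic; the ~30" angular size for Jupiter is an assumption):

import math

def sampled_diameter_m(angular_size_arcsec, distance_m, aperture_m=0.062):
    # Diameter of the sampled air volume at a given distance: the aperture
    # plus the cone opened up by the object's angular size.
    angle_rad = angular_size_arcsec * math.pi / (180 * 3600)
    return aperture_m + distance_m * angle_rad

print(sampled_diameter_m(0, 50_000))    # star: ~0.06 m (aperture only)
print(sampled_diameter_m(30, 50_000))   # Jupiter at ~30": ~7.3 m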




One more:

]]>
Mon, 25 Sep 23 03:05:34 +0000
<![CDATA[EVE Online - it's getting crowded in space]]> http://3.14.by/en/read/eve-2023-1-million-sp-space-mmorpg should now get 1'000'000 SP on first login and that's the point of this post.

]]>
Sun, 24 Sep 23 23:24:51 +0000
<![CDATA[65B LLaMA on CPU]]> http://3.14.by/en/read/LLaMA-Replit-LLM-language-model-ai-cpu-quantization
16 years ago a dog ate my AI book. At the time (and way before that) a common argument for «Why we still don't have working AI and it is always 10 years away» was that we couldn't make AI work even at 1% or 0.1% of human speed, even on the supercomputers of the time – therefore it's not about GFLOPS.

This weekend I ran the gpt4-alpaca-lora_mlp-65B language model in my home lab on a CPU (using llama.cpp – given the model size there is zero chance to run it on a consumer GPU). This model is arguably the best open LLM (this week), and 65 billion parameters is no joke: with single-precision math it won't even fit in 256GB of RAM. If you let it spill into swap, even on an NVMe drive, it runs at ~1 token per minute (limited by swap speed), which is about 0.5% of human speed. Even at this snail's pace it can still show superhuman performance on memorization-related tasks. It is clear that getting here was not possible 20 years ago – the training time would have been prohibitive even with unlimited government funding.

And this is where the unexpectedly open approach of Meta has proven superior to the closed, dystopian megacorp approach of OpenAI: in the 10 weeks since LLaMA was released into the wild, not only were derivative models trained, but 2/4/5-bit quantization enabled larger models on consumer hardware. In my case, with 5-bit quantization the model fits into 64GB of RAM and runs at ~2 tokens per second (on 64 cores), which is probably 70-90% of my own speed in best shape.
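
The back-of-the-envelope memory arithmetic (my own, ignoring KV cache and runtime overhead):

params = 65e9
for bits in (32, 16, 5, 4):
    print(f"{bits:>2}-bit: {params * bits / 8 / 2**30:6.1f} GiB")
# 32-bit: ~242 GiB (no room left in 256GB), 5-bit: ~38 GiB - fits comfortably in 64GB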

For comparison, I tried Replit-code-v1-3b, a 2.7B model optimized for coding. After the 65B monster, Replit feels like a breeze and shows very good performance despite its size. This is a good reminder that smaller, field-specific models should be used wherever possible.

It feels like "1 Trillion parameters will be enough for everybody", but such models would not be practical probably for another 2 years. Meanwhile key enablers of AI proliferation could be increase of RAM in consumer GPUs beyond 24Gb (which is sadly unlikely to happen due to commercial interests) and smaller field-specific models where I would be looking into with much more interest.]]>
Mon, 22 May 23 07:57:14 +0000
<![CDATA[First tiny ASIC sent to manufacturing]]> http://3.14.by/en/read/first-asic-tinytapeout-google-skywater-pdk-openlane 5 years ago, making a microchip from high-level HDL with your own hands required around $300k worth of software licenses, the process was slow, and the learning curve was steep.

Yesterday I submitted my first silicon for manufacturing, and it was... different. In the evening my wife comes in and asks: "How much time until the deadline?" I reply: "2 hours left, but I still have to learn Verilog." (Historically my digital designs were in VHDL or schematic entry.)

All this became possible thanks to the Google/SkyWater PDK and the OpenLane synthesis flow, which let anyone design a microchip with no paperwork to sign and no licenses to buy. Then https://tinytapeout.com by Matt Venn lowered the barrier even further (idea to tapeout in ~4 hours, including the learning curve).

As expected, all this lets many more people contribute to the open-source flow, my favorite being the work of Teodor-Dumitru Ene (https://github.com/tdene) on hardware adders, which now match and beat commercial tools. I think (and hope) that in 5 years open-source tools will dominate the market on mature nodes (28nm and up), not because they are cheaper, but because they are better and easier to use.

My design fits in 100x100µm and contains 289 standard cells. There are 7 ring oscillators with frequency dividers, to compare silicon performance against analog simulation across voltage/temperature. I expect to see the chips in ~6-9 months, both working and under a microscope :-)]]>
Sun, 04 Sep 22 15:22:58 +0000
<![CDATA[This cake is a lie.]]> http://3.14.by/en/read/stable-diffusion-ai-This-cake-is-a-lie The Stable Diffusion model that was publicly released this week is a huge step forward in making AI widely accessible.

Yes, DALL-E 2 and Midjourney are impressive, but they are black boxes. You can play with them, but you can't touch the brain.

Stable Diffusion can not only be run locally on relatively inexpensive hardware (i.e. it is sized perfectly for wide availability, not just bigger=better), it is also easy to modify (starting with tweaking the guidance scale, the pipeline and the noise schedulers). Access to the latent space is what I was dreaming about, and Andrej Karpathy's work on latent-space interpolation (https://gist.github.com/karpathy/00103b0037c5aaea32fe1da1af553355) is just a glimpse of many abilities some consider to be unnatural.

The model is perfect with food, good with humans and popular animals (which are apparently well represented in the training set), but rarer llamas/alpacas often come out anatomically incorrect, almost NSFW.

On an RTX 3080 the fp16 model completes 50 inference iterations in 6 seconds and barely fits into 10GB of VRAM. Just out of curiosity I ran it on a CPU (5800X3D) - it took 8 minutes, which is probably too painful for anything practical.
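
For reference, a minimal local-inference sketch (assuming the Hugging Face diffusers API of that time; the prompt and model id are just examples):

import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
).to("cuda")

# 50 steps in fp16; guidance_scale and the scheduler are the easy knobs to tweak
image = pipe("a photo of an alpaca on a mountain trail",
             num_inference_steps=50, guidance_scale=7.5).images[0]
image.save("alpaca.png")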

One more reason to buy a 4090... for work, I promise!
]]>
Fri, 26 Aug 22 19:49:29 +0000
<![CDATA[Voron V0.1 - Ferrari among 3D printers (V0.1430)]]> http://3.14.by/en/read/Voron-V0.1-coreXY-printer-ferrari
Finished assembly and tuning of my new Voron V0.1. The stationary parts are from an aluminum kit; the rest I printed in ASA-X. The small size allows very decent speeds and accelerations: the fast profile runs at 175/306 mm/s (perimeters / infill) with 25'000 mm/s² acceleration, the high-quality profile at 80/150 mm/s and 15'000 mm/s². Fast acceleration and a direct extruder make tuning for high quality comparatively easy, since the extrusion speed stays nearly constant. Also, pressure advance + input shaper allowed raising acceleration from 5'000 to 25'000 mm/s² with no quality degradation in the corners.

It all runs on Fluidd + Klipper, an SKR-PRO v1.2 and a Raspberry Pi 4. When printing at 306mm/s @ 265°C the 40W heater is no longer enough, so I had to "overclock" the printer a little to 28V (+36% heater power). 28V is the limit for the TMC2209 drivers.
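
The heater-power arithmetic behind that +36% (my own check, assuming the stock 24V supply): resistive heater power scales with the square of the supply voltage.

nominal_v, overclocked_v, nominal_w = 24.0, 28.0, 40.0
gain = (overclocked_v / nominal_v) ** 2
print(f"{nominal_w * gain:.1f} W (+{(gain - 1) * 100:.0f}%)")   # ~54.4 W (+36%)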

Initially I was considering participating in the SpeedBenchy contest, but things there have gone too far in the direction of "too fast / too bad". Printing at these speeds is limited by plastic cooling - this is why achievable speeds for high-quality prints are several times higher for ABS/ASA than for PLA. In other words, printing above 200mm/s is all about cooling, a contest of fans and air ducts.

Update: Got my serial number V0.1430 :-)]]>
Tue, 08 Feb 22 09:00:51 +0000
<![CDATA[Walking with Alpakas]]> http://3.14.by/en/read/alpaka-walk ]]>
Mon, 07 Feb 22 19:09:56 +0000
<![CDATA[Milky Way @ Gurnigel, Switzerland (1593m)]]> http://3.14.by/en/read/Milky-Way-Gurnigel-Switzerland-astrophotography
30 seconds, A7III with a Samyang 8mm F2.8 @ F4. Yes, this is an APS-C lens on a full-frame camera - to get larger pixels / lower noise, since higher resolution does not help here. The largest challenge was chroma noise: either hot pixels, or a star focused into a single pixel, where it's impossible to recover the star's real color. To fix that I just reset all unusually high & sharp chroma values to neutral.
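
A rough sketch of that chroma reset (my own interpretation, not the actual processing; the thresholds and file names are illustrative):

import cv2
import numpy as np

img = cv2.imread("milkyway.png")                          # hypothetical stacked frame
lab = cv2.cvtColor(img, cv2.COLOR_BGR2LAB).astype(np.int16)
a, b = lab[..., 1] - 128, lab[..., 2] - 128               # chroma channels around neutral
chroma = np.hypot(a, b).astype(np.float32)
local = cv2.medianBlur(chroma, 5)                         # local "typical" chroma level
outliers = chroma > np.maximum(2 * local, 20)             # only sharp, unusually high chroma
lab[..., 1][outliers] = 128                               # pull those pixels back to neutral
lab[..., 2][outliers] = 128
out = cv2.cvtColor(np.clip(lab, 0, 255).astype(np.uint8), cv2.COLOR_LAB2BGR)
cv2.imwrite("milkyway_fixed.png", out)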

Light pollution is visible on the horizon (left side) - it's from the nearest city, Thun - 13km away. ]]>
Sun, 15 Aug 21 14:07:26 +0000
<![CDATA[C/2020 F3 (NEOWISE)]]> http://3.14.by/en/read/2020-NEOWISE-comet
Took a photo of the C/2020 F3 (NEOWISE) comet that is making all the news now. Sigma 70mm F2.8 (@3.5), 60x2.5s (stacked).
After subtracting the background, the double tail became visible (dust & gas).
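
For reference, a generic background-subtraction sketch (my own, not the actual pipeline; blur radius, offset and file names are illustrative):

import cv2
import numpy as np

stack = cv2.imread("neowise_stack.png").astype(np.float32)   # hypothetical stacked frame
background = cv2.GaussianBlur(stack, (0, 0), sigmaX=101)     # very smooth sky-background model
residual = np.clip(stack - background + 32, 0, 255)          # small offset keeps faint signal visible
cv2.imwrite("neowise_tails.png", residual.astype(np.uint8))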

On mouse-over - color version, on click - annotation. The core is indeed slightly green.
]]>
Mon, 20 Jul 20 01:26:20 +0000