Svarichevsky Mikhail - RSS feed Svarichevsky Mikhail - RSS feed en-us Tue, 10 Jun 2006 04:00:00 GMT Mon, 15 Apr 24 10:13:53 +0000 120 10 <![CDATA[Sony F828 and infrared photography]]>
Here - electric cooktop emits a lot of infrared; black glass is transparent to infrared and shows its internals:

Camera itself. With 15x15mm neodymium magnet - switch to infrared works as expected. Smaller 10x5mm magnets were too weak.
I've got 680nm+, 760nm+ and 950nm+ filters. So far 760nm one is most practical: shows world significantly different and lets decent amount of light through. At 950nm+ sensitivity of sensor drops too far, so it's only for tripod.

Next looking at my sweater in infrared:

Dyes have no spectral features in 760nm+ infrared, surprisingly even black dye. Only in 680nm+ features are barely visible.

Fri, 19 Jan 24 21:19:26 +0000
<![CDATA[Lichee Console 4A - RISC-V mini laptop : Review, benchmarks and early issues]]> small laptops and phones - but for some reason they fell out of favor of manufacturers ("bigger is more better"). Now if one wanted to get tiny laptop - one of the few opportunities would have been to fight for old Sony UMPC's on ebay which are somewhat expensive even today. Recently Raspberry Pi/CM4-based tiny laptops started to appear - especially clockwork products are neat, but they are not foldable like a laptop. When in summer of 2023 Sipeed announced Lichee Console 4A based on RISC-V SoC - I preordered it immediately and in early January I finally received it. Results of my testing, currently uncovered issues are below.

Brief specs and internals

First of all, Lichee Console is tiny, 185 x 140 x 19 mm, 656g. Build is solid and high-quality, using mostly aluminum. Keyboard has typical laptop key travel and to my feel comparable to Lenovo's but it is of course quite a bit smaller. I cannot type on it blindly (yet), but it is possible. The only inconvenient part of keyboard (in context of Linux) is compressed keys [,][.][/], which are often used in console. Trackpoint mouse is ok for someone who had it all these years on Lenovo laptops.

Lichee Console 4A runs on T-Head (Alibaba) TH1520 quad-core RISC-V SoC (4x C910 cores). While TH1520 can clock up to 2.0/2.5Ghz, in Lichee Console it is tamed down to 1.5Ghz max, likely to help with thermal dissipation. Maximum configuration is quite serious for such a tiny thing: maximum of 16Gb DDR4 RAM (I got this version) and maximum of 128Gb eMMC. There is a slot for 42mm SATA M.2 SSD, but it is connected via ASM1153 USB3.0->SATA adapter. More on that later.

I would prefer working on M.2 SSD as you can keep the data if something else fails (eMMC is soldered on the board and will be expensive to recover). 42mm SATA SSD's are not very popular, and the best I was able to find was Transcend MTS400 256Gb (it is still in transit). There are many such SSD's from Chinese brands though.

Display has resolution of 1280x800 and looks to be IPS. No color shift at high angles. There is a webcam on the left side of the monitor - it is average quality full HD 30p (requires good lighting), landscape orientation. It is possible to connect external display via mini-HDMI (cable included). It worked fine on a FullHD monitor, but was unstable/non-working on 4k.

Battery is 2S 3000mAh. Charging can be done through USB-C (maximum 5V 2.2A, does not trigger 9/12V) or via 12V jack (which I personally will not use). 12V power brick came with Chinese/US plug, so if you want to use it - you will need an adapter. Jack external diameter is 3.45mm (central positive), if you would want to find a 12V PD trigger adapter for it (something like this but double check voltage). More on battery life below.

On the software side - Debian 12 with Xfce is preinstalled, built for 64-bit RISC-V. WiFi or Ethernet connection was straightforward, and Chrome-based browser was able to play YouTube video with no issues. apt update fetches packages from Chinese server.

Unboxing video below (no comments):

Build quality:
There was only 1 build quality issue on my sample: apparently aluminum bottom part of case was squeezing keyboard, or something was pressing and bending keyboard outwards slightly (~1mm), which was catching screen bottom when opening and making unhealthy snapping sound. After reassembly & pressing on keyboard in the middle - the issue was resolved.

Disassembly/assembly is relatively complicated due to tight fit of aluminum bottom cover, and I do not recommend to disassemble the unit unless absolutely necessary.

I do not like metal clips holding lithium battery in place. After enough vibration and abuse, or some swelling of the battery, it is not unthinkable that they might bite into the battery and destroy the unit. Even if 0.1% of units burn down due to this potential issue - this will be very sad. Plastic bracket for the battery or glue in place (unliked by many and hard to service) are well tested by the industry and safe. What makes it hard to do well is flex cables under the battery. On my unit I added kapton tape under and over the metal bracket to ensure it does not wiggle over the battery and has a harder time biting into the battery.

Unlike most laptops, Lichee Console uses 2 PCB's in addition to SOC module, and this will bite us later. IO board has microsd card slot, USB and analog audio.

SoC module is removable. Heat is transferred via silicon pad to a heatpipe glued to aluminum back cover.

Benchmarks & tests

CPU & Power
Metric | TH1520 @1.5Ghz | Raspberry Pi 4 | Raspberry Pi 5
Idle power | 7.68 / 6W (with/without screen) | 1.93W | 2.42W
CoreMark 1 core | 6900 | 7938 | 17725
Power 1 core | 8.376W (with screen) | 2.70W | 4.47W
CoreMark 4 cores | 25689 | 31532 | 69860
Power 4 cores | 9.408W (with screen) | 4.85W | 7.35W

Here performance is slightly behind Raspberry Pi 4 due to clock speed being reduced from 2.0 to 1.5Ghz. Personally I find performance of Raspberry Pi 4 perfectly acceptable for console work, and I am satisfied with performance of TH1520 for my use. I have included Raspberry Pi 5 for comparison as it's already 2024, and later this year we'll (hopefully) see competing products using CM5.

What I don't like though is high static power consumption of Lichee Console. At idle system goes down to 300Mhz, and even with 3 cores manually parked - it still consumes ~6W (without screen). This static power consumption makes Lichee Console quite warm even at idle. Also, this gives us just ~2.5 hours of battery life without any heavy load. As USB charging is limited to 5V/2.2A - Lichee Console will charge extremely slowly when powered on (~3 hours to full charge when switched off and ~10 hours when switched on). Surely, 12V 3.45mm barrel charging is much faster.
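
The battery-life and charging figures above can be sanity-checked with simple arithmetic. This is only a minimal sketch: it assumes a nominal 3.7V per Li-ion cell (not stated above) and ignores converter losses and charge taper, so the results are optimistic bounds rather than predictions.

```python
# Rough battery arithmetic for a 2S 3000mAh pack.
# Assumption (not from the article): 3.7V nominal voltage per cell.

def battery_energy_wh(cells_in_series, capacity_ah, nominal_cell_v=3.7):
    """Nominal pack energy in watt-hours."""
    return cells_in_series * nominal_cell_v * capacity_ah

def runtime_hours(energy_wh, load_w):
    """Ideal runtime at a constant load, ignoring conversion losses."""
    return energy_wh / load_w

def charge_hours(energy_wh, charger_w, load_w=0.0):
    """Ideal charge time; a running system eats into the charger budget."""
    net_w = charger_w - load_w
    if net_w <= 0:
        return float("inf")  # charger cannot keep up with the load
    return energy_wh / net_w

pack_wh = battery_energy_wh(2, 3.0)
usb_w = 5.0 * 2.2  # USB-C charging limit quoted above

print(f"pack energy: {pack_wh:.1f} Wh")
print(f"idle runtime, screen on (7.68W): {runtime_hours(pack_wh, 7.68):.1f} h")
print(f"charge time, switched off: {charge_hours(pack_wh, usb_w):.1f} h")
print(f"charge time, idle with screen on: {charge_hours(pack_wh, usb_w, 7.68):.1f} h")
```

Real charging is slower than these ideal numbers because of conversion losses and the constant-voltage phase at the end of the charge, which is consistent with the longer times measured above.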

Dynamic power consumption of C910 cores is rated at 200µW/MHz/core, which gives us 300mW of dynamic power for 1 core at 1.5Ghz, and 1.2W for a 4-core load. Measurements confirm these numbers, so the only issue is high static power consumption. On the ratio of performance to dynamic power it is perfectly competitive with Raspberry Pi 4; it is only the static part that hurts it.
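
The rated figure can be turned into watts and compared with the table in a few lines. This is a rough sketch: the measured values are the with-screen totals from the table above, and the per-core increment also includes regulator losses and non-core activity, so only approximate agreement should be expected.

```python
def dynamic_power_w(uw_per_mhz, freq_mhz, cores=1):
    """Dynamic power in watts from a uW/MHz/core rating."""
    return uw_per_mhz * freq_mhz * cores / 1_000_000

one_core_w = dynamic_power_w(200, 1500)
four_cores_w = dynamic_power_w(200, 1500, cores=4)

# Measured totals from the benchmark table (with screen).
measured_1_core_w = 8.376
measured_4_cores_w = 9.408

# Going from 1 to 4 loaded cores adds 3 cores' worth of dynamic power.
per_extra_core_w = (measured_4_cores_w - measured_1_core_w) / 3

print(f"rated: {one_core_w:.2f} W for 1 core, {four_cores_w:.2f} W for 4 cores")
print(f"measured increment per additional core: {per_extra_core_w:.3f} W")
```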

To investigate high static power consumption I made thermal photo at idle:

Here we see that approximately half of the power is dissipated by Via VL817 - USB 3.0 hub IC located right under SoC module. Less but still significant power is dissipated by ASM1153 USB->SATA adapter, despite no SATA drives connected. This is quite disappointing. If no software fix is found to disable unused interfaces, I am personally considering de-soldering these IC's or disconnecting them from power. 5-6 vs 2.8 hours of battery life is more important for my use.

This high idle power consumption is probably why cooling fan is always on (thankfully it is quiet), even when I put Lichee Console inside the fridge :-)

WiFi & Ethernet
WiFi module is connected via SDIO. Practical speed via iperf3 is 122/115 Mbit/sec. "Not great, not terrible" - but good enough for regular use.
Wired Ethernet does 925/925 Mbit/sec without jumbo packets which is nearly as good as it gets. SoC has 2 Ethernet ports, only 1 is accessible on Lichee Console.

Disk performance
Random 4k: Writes 8102 IOPS, 31.6MiB/s. Reads 2502 IOPS, 9.77MiB/s
Random 1MB: Writes 202MB/s, Reads 130MB/s

Random access is slower than modern fast microsd cards, but sequential is acceptable (for eMMC).

Testing fast MicroSD cards (Samsung Pro Ultimate, Sandisk Extreme Pro) which can negotiate fastest possible speed (up to 200Mb/s) - uncovered that they are unstable and operations fail with io errors. This is likely caused by extremely long signal path : from SoC, then to flex connector, then folded flex cable, then path across IO board. Old/slow MicroSD cards work reliably but at snail speed. Hopefully maximum interface speed for MicroSD can be reduced in software without affecting eMMC speed.

Currently missing/broken features (mostly software):
1) Bluetooth was failing to pair devices out of the box using GUI tools.
2) No sleep function. You have to switch off / boot up every time you open Lichee Console.
3) Not sure if there is sensor detecting closed screen. Right now when closed it just continues working with the screen on.
4) Adjustment of screen brightness does not work (it is always at max brightness, or off). Update: "apt install pkexec" fixes adjustment via gui. Keyboard bindings still need to be done.
5) Suboptimal power management leading to high static power consumption: Is it possible to disable VL817/ASM1153? Is SoC supply voltage scaling correctly at idle?

I will update the article here as software is improved.


My overall experience with Lichee Console is positive and I like it. It should be noted that at the moment it is more of a product for tinkering and not something that you can immediately use as-is for work with no changes. Substantial improvement will be required on software to fully utilize hardware capabilities (but this often happens with Linux on mobile platforms). Hardware has some flaws, they are unpleasant but not fatal (microsd stability at high speed, high idle power consumption). I am concerned about battery safety, and hopefully this is something that Sipeed can improve.

12nm TH1520 SoC offers competitive dynamic power consumption and sufficient performance, but lacks in IO (for desktop), which forced Sipeed to add additional interface ICs which happened to consume too much static power.

I hope that current rapid pace of RISC-V infrastructure development will continue and in the next few years we'll see more RISC-V SoC's, this time with at least a few lanes of PCI-E - and we'll get even more exciting Linux-capable RISC-V devices. Update: Milk-V Oasis is a glimpse of this future, expected later in 2024. Looking forward to testing it.

PS. If you like microchips and their internals - you might like my blog about boiling microchips in acid :]]>
Tue, 16 Jan 24 06:34:52 +0000
<![CDATA[Ronald Reagan and Raspberry Pi]]> told a joke:

You know there’s a ten year delay in the Soviet Union of the delivery of an automobile, and only one out of seven families in the Soviet Union own automobiles. There’s a ten year wait. And you go through quite a process when you’re ready to buy, and then you put up the money in advance.

And this happened to a fella, and this is their story, that they tell, this joke, that this man, he laid down his money, and then the fella that was in charge, said to him, ‘Okay, come back in ten years and get your car.’ And he said, ‘Morning or afternoon?’ and the fella behind the counter said, ‘Well, ten years from now, what difference does it make?’ and he said, ‘Well, the plumber’s coming in the morning.'

On 28th of September preorders for Raspberry Pi 5 were opened. I did not preorder it immediately, but slept on it and placed my preorder at 6am the next day. I surely did pay 100% in advance. What I did not know at the time is that every ~6 hours of delay was postponing delivery by ~1 month. So while first preorders were delivered in early November (unless you are a celebrity), mine was fulfilled only in early January. Still, it is better than what was happening with Raspberry Pi 4 at the peak of silicon shortage where one easily had to wait 6 months. Those who really needed it surely could have paid scalpers 200% price (not sure why manufacturers hesitate to do it). Hopefully, queues for electronics will get shorter over time, not longer (although with current Taiwan situation there could be surprises).

Now, having 2 precious Pi's in my hands I can feel the privilege. The hype is partially justified, my coremark benchmarks confirm 2.2x performance boost at 1.5x power consumption and PCI-E is real. There is still quite a lot of room for further improvement until Raspberry Pi reaches 100W peak power consumption :-)

Fri, 12 Jan 24 11:59:08 +0000
<![CDATA[Finishing 10 minute task in 2 hours using ChatGPT]]> Many of us have heard stories where one was able to complete days worth of work in minutes using AI, even being outside of one's area of expertise. Indeed, often LLM's do (almost) miracles, but today I had a different experience.

The task was almost trivial: generate look-up table (LUT) for per-channel image contrast enhancement using some S-curve function, and apply it to an image. Let's not waste any time: just fire up ChatGPT (even v3.5 should do, it's just a formula), get Python code for generic S-curve (code conveniently already had visualization through matplotlib) and tune parameters until you like it before plugging it into image processing chain. ChatGPT generated code for logistic function, which is a common choice as it is among simplest, but it cannot change curve shape from contrast enhancement to reduction simply by changing shape parameter.

The issue with generated code though was that graph was showing that it is reducing contrast instead of increasing it. When I asked ChatGPT to correct this error - it apologized and produced more and more broken code. Simply manually changing shape parameter was not possible due to math limitation - formula is not generic enough. Well, it is not the end of the world, LLM's do have limits especially on narrow-field tasks, so it's not really news. But the story does not end here.

For reference, this is ChatGPT's code:

import numpy as np
import matplotlib.pyplot as plt

def create_s_curve_lut():
    # Define parameters for the sigmoid curve
    a = 10.0  # Adjust this parameter to control the curve's shape
    b = 127.5  # Midpoint of the curve (127.5 for 8-bit grayscale)

    # Create the S-curve LUT using the sigmoid function
    lut = np.arange(256)
    lut = 255 / (1 + np.exp(-a * (lut - b) / 255))

    # Normalize the LUT to the 0-255 range
    lut = (lut - np.min(lut)) / (np.max(lut) - np.min(lut)) * 255

    return lut.astype(np.uint8)

# Create the S-curve LUT
s_curve_lut = create_s_curve_lut()

# Plot the S-curve for visualization
plt.plot(s_curve_lut, range(256))
plt.xlabel("Input Values (0-255)")
plt.ylabel("Output Values (0-255)")
plt.title("S-curve Contrast Enhancement LUT")

# You can access the S-curve LUT with s_curve_lut

At this point I gave up on ChatGPT LUT code and redid it using more universal regularized incomplete beta function. I adjusted a=b parameter to achieve curve shape that I like and applied LUT to image using OpenCV's LUT function. To my surprise and disbelief function was reducing contrast instead of increasing it. What?
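
For illustration, a LUT based on the regularized incomplete beta function I_x(a, a) can be sketched in pure Python. This is not the code used for the article: the function name, the `shape` parameter and the simple midpoint integration are assumptions made here to keep the example dependency-free (SciPy's `scipy.special.betainc(a, b, x)` would be the usual library route). With shape > 1 the curve enhances contrast, shape = 1 is the identity, and shape < 1 reduces contrast, which is the flexibility the logistic formula lacks.

```python
def beta_s_curve_lut(shape, steps_per_level=64):
    """256-entry LUT from I_x(shape, shape), integrated numerically.

    Hypothetical helper: accumulates t^(shape-1) * (1-t)^(shape-1) with the
    midpoint rule and normalizes by the full integral, so no gamma function
    is needed. Accuracy is only approximate for shape < 1, where the
    integrand is singular at both ends.
    """
    if shape <= 0:
        raise ValueError("shape must be positive")
    n = 255 * steps_per_level
    h = 1.0 / n
    cumulative = [0.0]  # running integral at each of the 256 input levels
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * h  # midpoint sample, never exactly 0 or 1
        total += (t * (1.0 - t)) ** (shape - 1.0)
        if (k + 1) % steps_per_level == 0:
            cumulative.append(total)
    # Round (not truncate) when mapping back to 0..255.
    return [int(round(255.0 * c / total)) for c in cumulative]

enhance = beta_s_curve_lut(2.0)   # steeper midtones
identity = beta_s_curve_lut(1.0)  # straight line
reduce_ = beta_s_curve_lut(0.5)   # flatter midtones

print(enhance[64], identity[64], reduce_[64])
```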

After extensive head-scratching, to troubleshoot the problem I made a simplified linear contrast enhancement LUT and observed expected result. Only when I added linear contrast LUT to the graph issue became clear: When I abandoned ChatGPT's S-curve function, I kept graph code. In this code ChatGPT marked graph's axis labels and even added title. But then it threw a wrench by feeding x-data into Y axis and vice versa, effectively flipping the graph. As parameters of plt.plot are not named, it is very easy to miss this error for a human.

When I tuned shape factor for beta function with a flipped graph - I made it contrast-reducing, which on the flipped graph looked like what I needed. When I told ChatGPT that its S-curve function is reducing contrast instead of increasing it - I misled it (and it unconditionally believed me), as S-curve was correct but error was in graph piece. Surely, if you tell ChatGPT that error is in plt.plot parameters - it can correct it.
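
One way to avoid depending on a plot at all is to check the LUT numerically: a contrast-enhancing curve has a slope above 1 in the midtones. The sketch below rebuilds the logistic LUT from the listing above in pure Python (with rounding instead of a plain cast) and measures that slope; `midtone_gain` and its default range are hypothetical names chosen for this example.

```python
import math

def logistic_lut(a=10.0, b=127.5):
    """Same logistic formula as the listing above, without NumPy."""
    raw = [255.0 / (1.0 + math.exp(-a * (x - b) / 255.0)) for x in range(256)]
    lo, hi = raw[0], raw[-1]
    return [int(round((v - lo) / (hi - lo) * 255.0)) for v in raw]

def midtone_gain(lut, lo=96, hi=160):
    """Average slope of the LUT between two input levels.

    > 1 means midtone contrast is increased, < 1 means it is reduced.
    """
    return (lut[hi] - lut[lo]) / (hi - lo)

lut = logistic_lut()
identity = list(range(256))

print("logistic gain:", midtone_gain(lut))
print("identity gain:", midtone_gain(identity))

# For the plot itself, input belongs on the x axis and output on the y axis.
# plt.plot takes x first, then y, so the unflipped call would be:
#   plt.plot(range(256), s_curve_lut)
```

A gain above 1 for the logistic LUT agrees with the point made above: the S-curve itself was enhancing contrast, and only the graph was flipped.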

I remember my teacher of analytic geometry at the final exam: when I was proving my solution - he could unexpectedly disagree with one of the steps and claim that there is an error. To get maximum mark one had to not panic and continue defending the correct solution. Hopefully we will see LLM's disagree with users more.

But that's not all: just when I thought we were done - there is one more bug in the code. One can notice slight asymmetry of GPT-TRAP curve at high end. It's a rounding error - calculated value is simply cast to uint8 (which discards fractional part) instead of rounding, so on average we are getting 0.5 unit / ~0.25% lower brightness of the image and significantly rarer full white values (255). What is interesting is that this error appeared to be systematic and present in all generated samples from all LLM's I've tested. I.e. apparently error was very widespread in training data of all LLM's, so they all have learned that "multiply by 255 and cast to uint8" is enough to fit values to 0..255 range. Technically this is true, but result is mathematically flawed.
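
The effect of truncation versus rounding is easy to reproduce without any image. A minimal sketch on a synthetic ramp (the helper names and the 10001-sample ramp are choices made for this example, not part of the original code):

```python
def truncate_to_8bit(values):
    """Mimics a plain cast to uint8 for values already within 0..255."""
    return [int(v) for v in values]

def round_to_8bit(values):
    """Round half up, then clamp to 0..255."""
    return [min(255, max(0, int(v + 0.5))) for v in values]

# Finely sampled ramp covering 0.0 .. 255.0
ramp = [i * 255 / 10000 for i in range(10001)]

truncated = truncate_to_8bit(ramp)
rounded = round_to_8bit(ramp)

mean_loss = (sum(rounded) - sum(truncated)) / len(ramp)
print("average brightness lost by truncation:", mean_loss)
print("samples reaching 255 (truncated):", truncated.count(255))
print("samples reaching 255 (rounded):", rounded.count(255))

# With NumPy, an equivalent fix would be along the lines of
#   np.clip(np.rint(values), 0, 255).astype(np.uint8)
```

With truncation only the sample that is exactly 255.0 becomes full white, while rounding maps everything from 254.5 upwards to 255.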

My conclusions are:
  • LLM's are like junior developers - they can and will make unexpected mistakes, they need clear instructions and guidance. The difference though is that junior developers will learn over time and LLM's will get better only in next generation. Like junior developers - LLM's need to be "managed" with reasonable expectations.
  • All code from LLM's must be verified, the more niche field - the more tests. LLM's generate code that looks correct, and when it's not - errors can be very subtle and expensive to debug/fix.
  • In case of unexpected or puzzling results it is often faster to simply ask multiple LLM's : now in addition to ChatGPT (3.5/4) we have Copilot, Bard, Replit and more. None of these gave perfect results from the first time, but some errors were different and often less subtle / easier to get it working in 20 minutes total.
  • Some of the errors are systematic for multiple LLM's, which apparently come from training data (as LLM's currently unconditionally trust training data, unlike humans). I.e. currently LLM's cannot exceed the quality level of their training data, but can only approach it. It is unclear how much further work on LLM's will be needed to get perfect result consistently, I am afraid it might be the case where last 10% of the work requires 90% of time.
Sun, 22 Oct 23 23:18:49 +0000
<![CDATA[Sirius and color twinkling ]]>
Why does it happen? Stars twinkle due to turbulence of the atmosphere acting as a random gradient refractive index "prism" (which is randomly shifting image & splitting colors - yes, even air has dispersion and it's visible here!) - so more/less light of different colors randomly hits lens aperture / eye. For stars air turbulence is sampled (in this case) in cylinder 62mm in diameter and ~50km in length, which makes effect very visible. Jupiter for example will average turbulence over a cone which opens up to 7.2m at 50km due to angular size of the planet, which will dramatically reduce contrast of twinkling due to averaging. Same averaging (reduction of twinkling) could happen for large telescopes (300mm+) even for stars, simply due to averaging across larger air volume.
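
The geometry behind the cylinder and the cone is a one-line small-angle calculation. In the sketch below the 30 arcsecond angular diameter for Jupiter is an illustrative assumption (the text does not say which value was used; Jupiter's apparent diameter varies roughly between 30 and 50 arcseconds), so the result only needs to land near the figure quoted above.

```python
import math

ARCSEC_PER_RADIAN = 180 * 3600 / math.pi

def sampled_diameter_m(aperture_m, path_m, angular_diameter_arcsec=0.0):
    """Width of the air column sampled at the far end of the path.

    A point source gives a cylinder as wide as the aperture; an extended
    source widens it by path length times angular size (small-angle
    approximation).
    """
    return aperture_m + path_m * angular_diameter_arcsec / ARCSEC_PER_RADIAN

star_m = sampled_diameter_m(0.062, 50_000)            # point source
jupiter_m = sampled_diameter_m(0.062, 50_000, 30.0)   # assumed 30 arcsec

print(f"star: {star_m * 1000:.0f} mm, Jupiter: {jupiter_m:.1f} m")
```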

One more:

Mon, 25 Sep 23 03:05:34 +0000
<![CDATA[EVE Online - it's getting crowded in space]]> should now get 1'000'000 SP on first login and that's the point of this post.

Sun, 24 Sep 23 23:24:51 +0000
<![CDATA[65B LLaMA on CPU]]>
16 years ago dog ate my AI book. At the time (and way before that) common argument on «Why we still don't have AI working and it is always 10 years away» was that we can't make AI work even at 1% or 0.1% human speed, even on supercomputers of the time – therefore it's not about GFLOPS.

This weekend I ran gpt4-alpaca-lora_mlp-65B language model in home lab on CPU (using llama.cpp, due to model size – there is 0 chance to run it on a consumer GPU). This model is arguably the best open LLM (this week), and 65 billion parameters is no joke: with single precision math it won't fit in 256Gb of RAM. If you let it spill into swap, even on NVMe drive – it will run at ~1 token per minute (limited by swap speed), which is about 0.5% of human speed. Even at this snail pace it can still show superhuman performance in memorization-related tasks. It is clear that it was not possible to get there 20 years ago – training time would have been prohibitive even with unlimited government funding.

And this is where unexpected open approach of Meta proved to be superior to closed, dystopian megacorp approach of OpenAI: In 10 weeks since LLaMA was released into the wild, not only derivative models were trained but 2-4-5 bit quantization enabled larger models on consumer hardware. In my case with 5bit quantization - model fits into 64Gb of RAM and runs at ~2 tokens per second (on 64 cores), which is probably 70-90% of my human speed in best shape.
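
The memory figures follow directly from parameter count times bits per weight. A rough sketch, counting weights only: real quantized formats store extra per-block scaling data and the runtime needs additional buffers, so actual usage is somewhat higher than these numbers.

```python
def weights_size_gb(params_billion, bits_per_weight):
    """Approximate size of the weights alone, in GB (10^9 bytes)."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

for label, bits in [("fp32", 32), ("fp16", 16), ("5-bit", 5), ("4-bit", 4)]:
    print(f"65B @ {label}: {weights_size_gb(65, bits):.1f} GB")
```

At single precision the weights alone exceed 256GB, while at 5 bits per weight they come to roughly 41GB, which leaves headroom within 64GB of RAM.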

For comparison, I tried Replit-code-v1.3b 2.7B model optimized for coding. After 65B monster – Replit feels like a breeze and shows very good performance despite its size. This is a good reminder that field-specific, smaller models should always be used where possible.

It feels like "1 Trillion parameters will be enough for everybody", but such models would probably not be practical for another 2 years. Meanwhile key enablers of AI proliferation could be increase of RAM in consumer GPUs beyond 24Gb (which is sadly unlikely to happen due to commercial interests) and smaller field-specific models, which I would be looking into with much more interest.]]>
Mon, 22 May 23 07:57:14 +0000
<![CDATA[First tiny ASIC sent to manufacturing]]> 5 years ago making microchip from high-level HDL with your own hands required around 300k$ worth of software licenses, process was slow and learning curve steep.

Yesterday I submitted my first silicon for manufacturing and it was... different. In the evening wife comes and asks "How much time until deadline?". I reply: "2 hours left, but I still have to learn Verilog." (historically my digital designs were in VHDL or schematic).

All this became possible thanks to Google Skywater PDK and openlane synthesis flow - which allowed anyone to design a microchip with no paperwork to sign and licenses to buy. Then by Matt Venn lowered the barrier even further (idea to tapeout in ~4 hours, including learning curve).

As expected, this all allows much more people to contribute to open source flow, with my favorite being work of Teodor-Dumitru Ene on hardware adders which now match and beat commercial tools. I think (and hope) that in 5 years opensource tools will dominate the market on mature nodes (28nm and up), not because they are cheaper, but because they are better and easier to use.

My design fits in 100x100µm and contains 289 standard cells. There are 7 ring oscillators with frequency dividers to compare silicon performance to analog simulation across voltage/temperature. I expect to see chips in ~6-9 months, both working and under microscope :-)]]>
Sun, 04 Sep 22 15:22:58 +0000
<![CDATA[This cake is a lie.]]> Stable Diffusion model that was publicly released this week is a huge step forward in making AI widely accessible.

Yes, DALL-E 2 and Midjourney are impressive, but they are a blackbox. You can play with it, but can't touch the brain.

Stable Diffusion not only can be run locally on relatively inexpensive hardware (i.e. sized perfectly for wide availability, not just bigger=better), it is also easy to modify (starting from tweaking guidance scale, pipeline and noise schedulers). Access to latent space is what I was dreaming about, and Andrej Karpathy's work on latent space interpolation is just the glimpse into many abilities some consider to be unnatural.

Model is perfect with food, good with humans/popular animals (which are apparently well represented in the training set), but rarer Llamas/Alpacas often give you anatomically incorrect results which are almost NSFW.

On RTX3080 fp16 model completes 50 inference iterations in 6 seconds, and barely fits into 10Gb of VRAM. Just out of curiosity I ran it on CPU (5800X3D) - it took 8 minutes, which is probably too painful for anything practical.

One more reason to buy 4090... for work, I promise!
Fri, 26 Aug 22 19:49:29 +0000
<![CDATA[Voron V0.1 - Ferrari among 3D printers (V0.1430)]]>
Finished assembly and tuning of my new Voron V0.1. Stationary parts are from aluminum kit, rest I printed in ASA-X. Small size allows to reach very decent speeds and accelerations: fast profile 175/306 mm/s (perimeters / infill) with acceleration of 25'000 mm/s². For high quality - 80/150 mm/s, 15'000 mm/s². Fast acceleration and direct extruder make parameters tuning for high quality comparatively easy as extrusion speed is nearly constant. Also, pressure advance + input shaper allowed to increase acceleration from 5'000 to 25'000 mm/s² with no quality degradation on the corners.

It all works on Fluidd+Klipper, SKR-PRO v1.2 + Raspberry Pi 4. When printing 306mm/s @265°C - 40W heater is no longer enough, so I had to overclock printer a little to 28V (+36% heater power). 28V is a limit for TMC2209.
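
The +36% figure follows from resistive heater power scaling with the square of the supply voltage. A small sketch, assuming the 40W rating refers to a nominal 24V supply (the original voltage is not stated above) and that heater resistance stays constant:

```python
def heater_power_w(rated_w, rated_v, supply_v):
    """Power of a fixed-resistance heater at a different supply voltage.

    P = V^2 / R, so power scales with (supply_v / rated_v) squared.
    """
    return rated_w * (supply_v / rated_v) ** 2

# Assumption: the 40W heater is rated at 24V.
boosted_w = heater_power_w(40.0, 24.0, 28.0)
gain_percent = (boosted_w / 40.0 - 1.0) * 100.0

print(f"{boosted_w:.1f} W at 28V (+{gain_percent:.0f}%)")
```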

Initially I was considering participating in SpeedBenchy contest - but things there went too far in the direction of "too fast / too bad". Printing at these speeds is limited by plastic cooling - this is why achievable speeds for high quality prints for ABS/ASA are several times higher than PLA. I.e. printing above 200mm/s is all about cooling, and is a contest of fans and air-ducts.

Update: Got my serial number V0.1430 :-)]]>
Tue, 08 Feb 22 09:00:51 +0000