Svarichevsky Mikhail - RSS feed http://3.14.by/ Svarichevsky Mikhail - RSS feed en-us Tue, 10 Jun 2006 04:00:00 GMT Sun, 01 Oct 23 09:57:19 +0000 3@14.by 120 10 <![CDATA[Sirius and color twinkling ]]> http://3.14.by/en/read/Sirius-color-twinkling-astronomy-astrophotography-turbulence-atmosphere
Why does it happen? Stars twinkle because atmospheric turbulence acts as a random gradient-index "prism" that randomly shifts the image and splits colors (yes, even air has dispersion, and it's visible here!), so more or less light of different colors randomly hits the lens aperture or the eye. For stars, air turbulence is sampled (in this case) in a cylinder 62mm in diameter and ~50km in length, which makes the effect very visible. Jupiter, for example, averages turbulence over a cone that opens up to 7.2m at 50km due to the angular size of the planet, which dramatically reduces the contrast of twinkling. The same averaging (reduction of twinkling) can happen for large telescopes (300mm+) even for stars, simply because they average across a larger air volume.
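The geometry above can be checked with a few lines of Python. This is only a rough sketch: the ~30 arcsecond angular diameter for Jupiter is my assumption (it varies with distance from Earth), a star is treated as a perfect point source, and the function name is made up for illustration.

```python
import math

ARCSEC_TO_RAD = math.pi / (180 * 3600)

def sampled_diameter_m(aperture_m, angular_size_arcsec, distance_m):
    """Diameter of the air column sampled at a given distance.

    The column starts at the aperture and widens with the angular
    size of the object (a point source gives a plain cylinder).
    """
    widening = distance_m * math.tan(angular_size_arcsec * ARCSEC_TO_RAD)
    return aperture_m + widening

# Star as a point source: the column stays as wide as the 62mm aperture.
star = sampled_diameter_m(0.062, 0.0, 50_000)

# Jupiter with an ASSUMED angular diameter of ~30 arcseconds.
jupiter = sampled_diameter_m(0.062, 30.0, 50_000)

print(f"star:    {star:.3f} m")
print(f"jupiter: {jupiter:.2f} m")
```

With these assumptions the Jupiter cone comes out a bit over 7m wide at 50km, in the same ballpark as the 7.2m figure above and roughly a hundred times wider than the column sampled for a star.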


]]>
Mon, 25 Sep 23 03:05:34 +0000
<![CDATA[EVE Online - it's getting crowded in space]]> http://3.14.by/en/read/eve-2023-1-million-sp-space-mmorpg should now get 1'000'000 SP on first login and that's the point of this post.

]]>
Sun, 24 Sep 23 23:24:51 +0000
<![CDATA[65B LLaMA on CPU]]> http://3.14.by/en/read/LLaMA-Replit-LLM-language-model-ai-cpu-quantization
16 years ago a dog ate my AI book. At the time (and way before that) the common argument for «Why we still don't have AI working and it is always 10 years away» was that we can't make AI work even at 1% or 0.1% of human speed, even on the supercomputers of the time – therefore it's not about GFLOPS.

This weekend I ran the gpt4-alpaca-lora_mlp-65B language model in my home lab on CPU (using llama.cpp; due to the model size there is zero chance of running it on a consumer GPU). This model is arguably the best open LLM (this week), and 65 billion parameters is no joke: with single-precision math it won't fit in 256GB of RAM. If you let it spill into swap, even on an NVMe drive, it will run at ~1 token per minute (limited by swap speed), which is about 0.5% of human speed. Even at this snail's pace it can still show superhuman performance in memorization-related tasks. It is clear that it was not possible to get there 20 years ago: training time would have been prohibitive even with unlimited government funding.

And this is where the unexpected open approach of Meta proved to be superior to the closed, dystopian megacorp approach of OpenAI: in the 10 weeks since LLaMA was released into the wild, not only were derivative models trained, but 2-, 4- and 5-bit quantization also enabled larger models to run on consumer hardware. In my case, with 5-bit quantization the model fits into 64GB of RAM and runs at ~2 tokens per second (on 64 cores), which is probably 70-90% of my human speed in best shape.
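The memory numbers are easy to estimate. The sketch below counts only the weights (parameters times bits per weight); the helper name is hypothetical, and real quantized formats also store per-block scales, so actual files are somewhat larger than this idealized figure.

```python
def weights_size_gb(n_params, bits_per_weight):
    """Idealized size of the model weights in decimal gigabytes."""
    return n_params * bits_per_weight / 8 / 1e9

N_PARAMS = 65e9  # 65B parameters

for label, bits in [("fp32", 32), ("fp16", 16), ("5-bit", 5), ("4-bit", 4)]:
    print(f"{label:>6}: {weights_size_gb(N_PARAMS, bits):6.1f} GB")
```

Single precision needs about 260 GB for the weights alone, before any working memory, while the idealized 5-bit figure of about 41 GB leaves room for format overhead and context within 64GB.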

For comparison, I tried Replit-code-v1.3b, a 2.7B model optimized for coding. After the 65B monster, Replit feels like a breeze and shows very good performance despite its size. This is a good reminder that field-specific, smaller models should always be used where possible.

It feels like "1 trillion parameters will be enough for everybody", but such models will probably not be practical for another 2 years. Meanwhile, the key enablers of AI proliferation could be more RAM in consumer GPUs, beyond 24GB (which is sadly unlikely to happen due to commercial interests), and smaller field-specific models, which I will be looking into with much more interest.]]>
Mon, 22 May 23 07:57:14 +0000
<![CDATA[First tiny ASIC sent to manufacturing]]> http://3.14.by/en/read/first-asic-tinytapeout-google-skywater-pdk-openlane 5 years ago, making a microchip from high-level HDL with your own hands required around $300k worth of software licenses, the process was slow and the learning curve steep.

Yesterday I submitted my first silicon for manufacturing and it was... different. In the evening my wife comes and asks "How much time until the deadline?". I reply: "2 hours left, but I still have to learn Verilog." (historically my digital designs were in VHDL or schematic).

All this became possible thanks to the Google Skywater PDK and the OpenLane synthesis flow, which allow anyone to design a microchip with no paperwork to sign and no licenses to buy. Then https://tinytapeout.com by Matt Venn lowered the barrier even further (idea to tapeout in ~4 hours, including learning curve).

As expected, all this allows many more people to contribute to the open source flow, my favorite being the work of Teodor-Dumitru Ene (https://github.com/tdene) on hardware adders, which now match and beat commercial tools. I think (and hope) that in 5 years open source tools will dominate the market on mature nodes (28nm and up), not because they are cheaper, but because they are better and easier to use.

My design fits in 100x100µm and contains 289 standard cells. There are 7 ring oscillators with frequency dividers to compare silicon performance to analog simulation across voltage/temperature. I expect to see chips in ~6-9 months, both working and under microscope :-)]]>
Sun, 04 Sep 22 15:22:58 +0000
<![CDATA[This cake is a lie.]]> http://3.14.by/en/read/stable-diffusion-ai-This-cake-is-a-lie Stable Diffusion model that was publicly released this week is a huge step forward in making AI widely accessible.

Yes, DALL-E 2 and Midjourney are impressive, but they are black boxes. You can play with them, but you can't touch the brain.

Stable Diffusion not only can be run locally on relatively inexpensive hardware (i.e. it is sized perfectly for wide availability, not just bigger=better), it is also easy to modify (starting with tweaking the guidance scale, pipeline and noise schedulers). Access to latent space is what I was dreaming about, and Andrej Karpathy's work on latent space interpolation (https://gist.github.com/karpathy/00103b0037c5aaea32fe1da1af553355) is just a glimpse into many abilities some consider to be unnatural.
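For a feel of what latent space interpolation means in practice, here is a minimal sketch of spherical linear interpolation (slerp), a common way to walk between two latent noise vectors. It is written in plain Python on lists for illustration and is not a copy of the code in the gist; a real pipeline would apply the same formula to the initial latent tensors and decode each intermediate point into an image.

```python
import math

def slerp(t, v0, v1, dot_threshold=0.9995):
    """Spherical interpolation between vectors v0 and v1, for t in [0, 1]."""
    dot = sum(a * b for a, b in zip(v0, v1))
    norm0 = math.sqrt(sum(a * a for a in v0))
    norm1 = math.sqrt(sum(b * b for b in v1))
    cos_omega = dot / (norm0 * norm1)

    # Nearly parallel vectors: plain linear interpolation is stable enough.
    if abs(cos_omega) > dot_threshold:
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]

    omega = math.acos(cos_omega)  # angle between the two vectors
    sin_omega = math.sin(omega)
    w0 = math.sin((1 - t) * omega) / sin_omega
    w1 = math.sin(t * omega) / sin_omega
    return [w0 * a + w1 * b for a, b in zip(v0, v1)]

# Halfway between two orthogonal unit vectors stays on the unit circle.
midpoint = slerp(0.5, [1.0, 0.0], [0.0, 1.0])
print(midpoint)
```

Unlike plain linear interpolation, the intermediate points keep roughly the same norm as the endpoints, which matters because the model expects latents that look like Gaussian noise of a typical magnitude.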

The model is perfect with food and good with humans and popular animals (which are apparently well represented in the training set), but rarer llamas/alpacas often give you anatomically incorrect results which are almost NSFW.

On an RTX3080 the fp16 model completes 50 inference iterations in 6 seconds and barely fits into 10GB of VRAM. Just out of curiosity I ran it on CPU (5800X3D): it took 8 minutes, which is probably too painful for anything practical.

One more reason to buy 4090... for work, I promise!
]]>
Fri, 26 Aug 22 19:49:29 +0000
<![CDATA[Voron V0.1 - Ferrari among 3D printers (V0.1430)]]> http://3.14.by/en/read/Voron-V0.1-coreXY-printer-ferrari
Finished assembly and tuning of my new Voron V0.1. Stationary parts are from an aluminum kit, the rest I printed in ASA-X. Its small size allows it to reach very decent speeds and accelerations: the fast profile is 175/306 mm/s (perimeters / infill) with an acceleration of 25'000 mm/s². For high quality: 80/150 mm/s, 15'000 mm/s². Fast acceleration and a direct extruder make parameter tuning for high quality comparatively easy, as extrusion speed is nearly constant. Also, pressure advance + input shaper made it possible to increase acceleration from 5'000 to 25'000 mm/s² with no quality degradation on the corners.
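To see why high acceleration keeps extrusion speed nearly constant, it helps to estimate how far the toolhead travels while ramping up to cruise speed. The sketch below uses the basic constant-acceleration formula d = v²/(2a) and ignores whatever corner-speed handling the firmware does; the function name is made up for illustration.

```python
def ramp_distance_mm(speed_mm_s, accel_mm_s2):
    """Distance covered while accelerating from rest to the given speed."""
    return speed_mm_s ** 2 / (2 * accel_mm_s2)

for accel in (5_000, 15_000, 25_000):
    d = ramp_distance_mm(306, accel)
    print(f"{accel:>6} mm/s²: {d:5.2f} mm to reach 306 mm/s")
```

At 25'000 mm/s² the ramp to 306 mm/s takes under 2mm, versus more than 9mm at 5'000 mm/s², so a much larger share of each move is printed at full speed.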

It all works on Fluidd+Klipper, SKR-PRO v1.2 + Raspberry Pi 4. When printing at 306mm/s @265°C, the 40W heater is no longer enough, so I had to overclock the printer a little to 28V (+36% heater power). 28V is the limit for TMC2209.
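The +36% figure follows from the heater being (approximately) a fixed resistance, so power scales with the square of the voltage. The sketch assumes the 40W rating is given at a nominal 24V supply, which is my assumption and not stated above, and ignores the change of resistance with temperature.

```python
def heater_power_w(rated_power_w, rated_voltage_v, supply_voltage_v):
    """Power of a fixed-resistance heater at a different supply voltage."""
    return rated_power_w * (supply_voltage_v / rated_voltage_v) ** 2

# ASSUMPTION: the 40W heater is rated at a nominal 24V.
power = heater_power_w(40, 24, 28)
gain_percent = (power / 40 - 1) * 100
print(f"{power:.1f} W ({gain_percent:+.0f}%)")
```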

Initially I was considering participating in the SpeedBenchy contest, but things there went too far in the direction of "too fast / too bad". Printing at these speeds is limited by plastic cooling, which is why achievable speeds for high quality prints in ABS/ASA are several times higher than in PLA. I.e. printing above 200mm/s is all about cooling, and is a contest of fans and air ducts.

Update: Got my serial number V0.1430 :-)]]>
Tue, 08 Feb 22 09:00:51 +0000
<![CDATA[Walking with Alpakas]]> http://3.14.by/en/read/alpaka-walk ]]> Mon, 07 Feb 22 19:09:56 +0000 <![CDATA[Milky Way @ Gurnigel, Switzerland (1593m)]]> http://3.14.by/en/read/Milky-Way-Gurnigel-Switzerland-astrophotography
30 seconds, A7III with Samyang 8mm F2.8 @ F4. Yes, this is an APS-C lens on a full frame camera, used to get larger pixels / lower noise, as higher resolution does not help here. The largest challenge was chroma noise: either hot pixels, or a star focused into a single pixel, which makes it impossible to recover the real color of the star. To fix that I just reset all unusually high & sharp chroma values to neutral.
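One possible way to express "reset unusually high & sharp chroma values to neutral" in code is sketched below. It is only an illustration, not the processing actually used for this photo: the chroma channel is assumed to be a 2D list of floats centered on 0.0 (neutral), and both thresholds are made-up values that would need tuning per image.

```python
from statistics import median

def neutralize_chroma_spikes(chroma, abs_threshold=0.3, ratio=4.0):
    """Reset isolated chroma spikes to neutral (0.0).

    A pixel counts as a spike when its chroma magnitude is high in
    absolute terms AND much higher than the median of its neighbors.
    """
    height, width = len(chroma), len(chroma[0])
    result = [row[:] for row in chroma]
    for y in range(height):
        for x in range(width):
            neighbors = [
                abs(chroma[ny][nx])
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
                if (ny, nx) != (y, x)
            ]
            if not neighbors:
                continue  # 1x1 image: nothing to compare against
            value = abs(chroma[y][x])
            if value > abs_threshold and value > ratio * median(neighbors):
                result[y][x] = 0.0
    return result

# A single saturated pixel in an otherwise almost neutral patch.
patch = [[0.02, 0.02, 0.02],
         [0.02, 0.80, 0.02],
         [0.02, 0.02, 0.02]]
cleaned = neutralize_chroma_spikes(patch)
print(cleaned[1][1])
```

Luminance is left untouched, so the star keeps its brightness and only loses the false color.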

Light pollution is visible on the horizon (left side) - it's from the nearest city, Thun - 13km away. ]]>
Sun, 15 Aug 21 14:07:26 +0000
<![CDATA[C/2020 F3 (NEOWISE)]]> http://3.14.by/en/read/2020-NEOWISE-comet
Made a photo of the C/2020 F3 NEOWISE comet, which is making all the news now. Sigma 70mm F2.8 (@3.5), 60x2.5s (stacked).
After subtracting the background, the double tail became visible (dust & gas).

On mouse over - color, on click - annotation. Core is indeed slightly green
]]>
Mon, 20 Jul 20 01:26:20 +0000
<![CDATA[Tesla coil x2]]> http://3.14.by/en/read/double-tesla


]]>
Tue, 25 Feb 20 05:29:16 +0000