the results are highly disappointing. i am almost certainly going to
seriously revisit scanline-based triangle rasterization as a result of
this.
first off, this change introduces some annoying floating point
differences that are throwing some things off slightly (most noticeable
in texture sampling). i found this by manually comparing barycentric
coordinates for a small triangle both with and without simd: there were
small discrepancies. i assume this is the cause, but i am at a loss as
to how this has crept in and why it is so noticeable when the
differences are so small. ugh.
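
one possible reason a tiny coordinate difference can still be visible, sketched under assumptions: if texture sampling is nearest-neighbour and truncates the scaled coordinate (not confirmed by this commit), then a difference of a single f32 step right at a texel boundary selects a different texel. `texel_index` and `next_below` are hypothetical helpers for illustration, not functions from the library.

```rust
/// Hypothetical nearest-neighbour lookup: maps a normalized coordinate
/// `u` in 0.0..=1.0 to a texel column by truncating `u * texture_width`.
/// Assumes `texture_width > 0`.
fn texel_index(u: f32, texture_width: u32) -> u32 {
    let scaled = u * texture_width as f32;
    // `as u32` truncates toward zero; clamp so that u == 1.0 stays in range
    (scaled as u32).min(texture_width - 1)
}

/// Returns the largest f32 that is smaller than `value`.
/// Only valid for positive, finite, non-zero inputs.
fn next_below(value: f32) -> f32 {
    f32::from_bits(value.to_bits() - 1)
}
```

with a 16 texel wide texture, `texel_index(0.5, 16)` and `texel_index(next_below(0.5), 16)` land on neighbouring texels even though the two coordinates differ by less than 1e-7.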
second, the performance improvement is .... negligible. seriously
negligible. ONLY the most simple triangle rendering sees some small
gains (like solid color, no per-pixel interpolation). to be honest,
i kind of expected that i would see no gains in the more advanced
triangle rendering (like multicolor+blended, textured+blended and also
textured+multicolor+blended) because in those cases the per-pixel
operations really seem to dwarf the cost of all of the pixel/coordinate
stepping that is happening outside the pixel_fn code. the "fix" here
is likely to use avx / avx512 simd and to write some pixel_fn code
that can calculate multiple pixels simultaneously. this is complicated
though, because there are still unanswered questions about how this
could be worked into the existing pixel_fn "architecture" and how to
handle cases where only some of the 4 pixels in each step are to be
rendered. maybe easy enough, but i suspect this ends up adding
more comparisons and branches in the inner loop ... which may just
continue to kill any possible performance gains. the other option is
to use threads somehow, although if threads are on the table, i'd
rather just do parallel row rendering ... that would seem to be
significantly simpler and more efficient?
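
to make the "only some of the 4 pixels" problem concrete, here is a scalar sketch of one 4-pixel step with a coverage mask. everything in it is an assumption for illustration: `write_masked`, the `[bool; 4]` coverage array and the flat `u32` row are hypothetical and are not the existing pixel_fn interface.

```rust
const LANES: usize = 4;

/// Writes up to 4 pixels starting at `start_x` in one row. A lane is only
/// written when its coverage flag is set and it falls inside the row.
fn write_masked(row: &mut [u32], start_x: usize, colors: [u32; LANES], coverage: [bool; LANES]) {
    for lane in 0..LANES {
        let x = start_x + lane;
        // this per-lane check is exactly the kind of extra comparison that
        // ends up in the inner loop
        if coverage[lane] && x < row.len() {
            row[x] = colors[lane];
        }
    }
}
```

a real simd version would presumably replace the per-lane `if` with a masked select or store, but the coverage mask still has to be computed for every step.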
however, i do not want to require avx / avx512 support right now.
probably avx wouldn't be unreasonable, but i don't believe avx512 is
ubiquitous enough yet. i'd ideally want any code i write to be runnable
on any x86 machine in the past 10-15 years. this is also what makes
even avx questionable in my mind.
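
one way to avoid hard-requiring avx would be runtime detection with a fallback path. a minimal sketch using the standard `is_x86_feature_detected!` macro follows; `widest_simd_level` is a hypothetical helper, and whether dispatching like this fits the library is an open question.

```rust
/// Reports the widest instruction set (of the few checked here) that the
/// current CPU supports, falling back to "scalar" on non-x86 targets.
fn widest_simd_level() -> &'static str {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if is_x86_feature_detected!("avx2") {
            return "avx2";
        }
        if is_x86_feature_detected!("avx") {
            return "avx";
        }
        if is_x86_feature_detected!("sse2") {
            return "sse2";
        }
    }
    "scalar"
}
```

the result could be checked once at startup to select a rendering path, so older machines would still run the non-avx code.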
threading ... i am unsure of. frankly, half the point of this library
was to keep things simple in an "old school" kind of way, and threading
seems the opposite of that to me.
ugh ugh ugh. i fucking hate this.
two primary goals with this change:
1. test triangle dimensions of varying widths, in preparation for
upcoming simd per_pixel_triangle_2d changes. we want to ensure that
scenarios where `width < 4` and `width % 4 != 0` continue to work
fine after simd changes are done
2. test triangle rendering at various rotations, because i remember
while testing fill rule changes, sometimes issues would only be
visible at certain rotations. so this is mostly just useful for
blended rendering methods, but still, is probably generally useful
too ...
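
a rough sketch of how such test cases could be generated, one triangle per width and rotation. the helper names, the triangle shape and the 160,120 centre point are all assumptions for illustration and are not taken from the actual tests.

```rust
/// Rotates `point` around `center` by `radians`.
fn rotate_point(point: (f32, f32), center: (f32, f32), radians: f32) -> (f32, f32) {
    let (sin, cos) = radians.sin_cos();
    let (dx, dy) = (point.0 - center.0, point.1 - center.1);
    (center.0 + dx * cos - dy * sin, center.1 + dx * sin + dy * cos)
}

/// Builds one triangle for every (width, rotation) combination, with the
/// rotations spread evenly over a full turn.
fn triangle_cases(widths: &[f32], angle_steps: u32) -> Vec<[(f32, f32); 3]> {
    let center = (160.0, 120.0); // arbitrary point to rotate around
    let mut cases = Vec::new();
    for &width in widths {
        let half = width / 2.0;
        let base = [
            (center.0 - half, center.1 + half),
            (center.0 + half, center.1 + half),
            (center.0, center.1 - half),
        ];
        for step in 0..angle_steps {
            let radians = step as f32 / angle_steps as f32 * std::f32::consts::TAU;
            cases.push(base.map(|p| rotate_point(p, center, radians)));
        }
    }
    cases
}
```

picking widths such as 1, 2, 3, 5 and 8 covers both `width < 4` and `width % 4 != 0` as well as an exact multiple of 4.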
i did not expect this to be problematic so soon, based on what i'd
read in various places about the stability of nightly (especially so
for the most common platforms like x86_64, etc)
as i write this commit message, all nightly versions past this date
fail to build ggdt, failing with an "illegal instruction" error.
sigh. this makes me question whether trying to use simd right now is
actually worth it ...
i really did not want to use nightly, but i think it is worth it for
portable_simd which is still nightly-only (and i'm guessing will be
for a long while still). i was not particularly enthusiastic about any
of the alternative libraries that exist for stable rust simd, as it
always seemed like something small was missing from what i knew i would
want longer term.
the color type aliases have been a long time coming. it might make
sense to re-work the various color functions into some traits that
these color type aliases implement.
added the color function variants that accept and/or return 4 color
component arrays to aid code that is doing more complex color
manipulation, where it just makes more sense to always be working with
4 color component arrays instead of always converting to/from packed
and unpacked formats
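
for illustration, a sketch of what array-based variants can look like. the names `pack_argb`, `unpack_argb` and `darken` are hypothetical, and the 0xAARRGGBB component order is an assumption rather than something stated in this commit.

```rust
/// Packs [a, r, g, b] components into a single 0xAARRGGBB value.
fn pack_argb(components: [u8; 4]) -> u32 {
    u32::from_be_bytes(components)
}

/// Unpacks a 0xAARRGGBB value into [a, r, g, b] components.
fn unpack_argb(color: u32) -> [u8; 4] {
    color.to_be_bytes()
}

/// Example of a manipulation that stays in component form throughout:
/// halves r, g and b while leaving alpha untouched.
fn darken(components: [u8; 4]) -> [u8; 4] {
    let [a, r, g, b] = components;
    [a, r / 2, g / 2, b / 2]
}
```

several operations like `darken` can be chained on the array form, with a single `pack_argb` call at the end instead of a pack/unpack pair around each step.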
this is based off what i saw some other libraries doing. i'm not sure
what is the "best practice" to be honest, but i definitely like this.
especially the prelude is now much simpler, which is nice.
since i cannot think of any way to verify these tests in an automated
way, these are all currently set up as "ignored" tests which need to be
run manually, either directly via the IDE (e.g. CLion, which i use) or
via an alternative test runner on the CLI (such as nextest). they
cannot all be run together via `cargo test` (even when running with
only a single thread). `cargo test` can only run them successfully
one test per invocation.
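
for reference, the general shape of an ignored test is sketched below. `render_placeholder` and the test name are made up for illustration; only the `#[ignore]` attribute and the `--ignored` flag are standard.

```rust
/// Stand-in for whatever a visual test would actually render.
fn render_placeholder(width: usize, height: usize) -> Vec<u32> {
    (0..width * height).map(|i| i as u32).collect()
}

#[cfg(test)]
mod visual_tests {
    use super::*;

    #[test]
    #[ignore = "output has to be inspected by eye, run explicitly"]
    fn placeholder_looks_right() {
        let pixels = render_placeholder(16, 16);
        assert_eq!(pixels.len(), 256);
        // a real visual test would display `pixels` in a window here
    }
}
```

a single test like this can be run with `cargo test placeholder_looks_right -- --ignored`, which matches the one test per invocation limitation described above.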
this has very limited use, outside of some tests i am writing right now,
but maybe it will be useful down the road.
(i suppose mostly, i just wanted to try this out for myself ...)
mostly working, but the imgui ui part is only a placeholder.
also this example project is kind of overly complicated, but i really
wanted this to be something that somewhat resembles a real tool that
i'd want to build.
this has no immediate use, but i think it will be useful in the future
for writing generic constraints requiring 32-bit color support while
not forcing use of any specific SystemResources implementation
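
a sketch of the kind of generic constraint this could enable. all of it is an assumption: the `SystemResources` trait below is a tiny stand-in (the real trait is not shown here), and `Color32Resources`, `FakeResources` and `clear_to_opaque_black` are hypothetical names.

```rust
/// Stand-in for the library's resources trait, reduced to what the
/// example needs.
trait SystemResources {
    type Pixel: Copy;
    fn clear(&mut self, pixel: Self::Pixel);
    fn first_pixel(&self) -> Self::Pixel;
}

/// Hypothetical marker trait: any resources type with packed 32-bit pixels.
trait Color32Resources: SystemResources<Pixel = u32> {}
impl<T: SystemResources<Pixel = u32>> Color32Resources for T {}

/// Fake implementation, used only to exercise the constraint.
struct FakeResources {
    framebuffer: Vec<u32>,
}

impl SystemResources for FakeResources {
    type Pixel = u32;
    fn clear(&mut self, pixel: u32) {
        self.framebuffer.fill(pixel);
    }
    fn first_pixel(&self) -> u32 {
        self.framebuffer[0]
    }
}

/// Generic code can require 32-bit color without naming a concrete type.
fn clear_to_opaque_black<R: Color32Resources>(resources: &mut R) {
    resources.clear(0xff00_0000);
}
```

`clear_to_opaque_black` accepts any implementation whose pixel type is `u32`, which is the "not forcing any specific implementation" part.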
since the TextInput event cannot implement Copy, necessitating the
removal of the Copy trait from KeyboardEvent, it feels wrong to me to
have the other event enums implement Copy, because it is inconsistent.
so while this is less convenient potentially, it feels consistent.
probably not a big deal in the future one way or the other.