the results are highly disappointing. i am almost certainly going to seriously revisit scanline-based triangle rasterization as a result of this.

first off, this change introduces some annoying floating-point inaccuracies that throw some things off slightly (most noticeable in texture sampling). i noticed small discrepancies when manually comparing barycentric coordinates for a small triangle both with and without simd. i assume this is the cause, but i am at a loss as to how it crept in and why it is so noticeable given that the differences are very small. ugh.

second, the performance improvement is ... negligible. seriously negligible. ONLY the simplest triangle rendering sees some small gains (like solid color, no per-pixel interpolation). to be honest, i kind of expected to see no gains in the more advanced triangle rendering (like multicolor+blended, textured+blended and also textured+multicolor+blended), because in those cases the per-pixel operations really seem to dwarf the cost of all the pixel/coordinate stepping that happens outside the pixel_fn code.

the "fix" here is likely one of two things. the first is to use avx / avx512 simd and try to write some pixel_fn code that can calculate multiple pixels simultaneously. this is complicated though, because there are still unanswered questions about how this could be worked into the existing pixel_fn "architecture" and how to handle cases where only some of the 4 pixels in each step are to be rendered. maybe easy enough, but i suspect this ends up adding more comparisons and branches in the inner loop ... which may just continue to kill any possible performance gains.

the other option is to use threads somehow, although if threads are on the table, i'd rather just do parallel row rendering ... that would seem to be significantly simpler and more efficient?

however, i do not want to require avx / avx512 support right now. avx probably wouldn't be unreasonable, but i don't believe avx512 is ubiquitous enough yet. ideally i'd want any code i write to be runnable on any x86 machine from the past 10-15 years. this is also what makes even avx questionable in my mind. threading ... i am unsure of. frankly, half the point of this library was to keep things simple in an "old school" kind of way, and threading seems the opposite of that to me. ugh ugh ugh. i fucking hate this.
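One way to think about the "only some of the 4 pixels are to be rendered" question is a per-lane coverage mask combined with a branchless select, so the inner loop does not need a comparison-and-jump per pixel. The sketch below is only an illustration under assumptions: `edge`, `coverage4` and `masked_store` are hypothetical names, not part of ggdt or its pixel_fn code, plain 4-element arrays stand in for simd lanes, and pixel centers are assumed to sit at +0.5.

```rust
/// Edge function: positive when `p` lies on the inner side of edge a -> b
/// for the vertex winding used in this sketch.
fn edge(a: (f32, f32), b: (f32, f32), p: (f32, f32)) -> f32 {
    (b.0 - a.0) * (p.1 - a.1) - (b.1 - a.1) * (p.0 - a.0)
}

/// Hypothetical 4-wide step: tests pixels x..x+4 on row y against the
/// triangle `v` and returns one coverage flag per lane.
fn coverage4(v: [(f32, f32); 3], x: i32, y: i32) -> [bool; 4] {
    let mut mask = [false; 4];
    for lane in 0..4 {
        // Sample at the pixel center (an assumption of this sketch).
        let p = ((x + lane as i32) as f32 + 0.5, y as f32 + 0.5);
        let w0 = edge(v[1], v[2], p);
        let w1 = edge(v[2], v[0], p);
        let w2 = edge(v[0], v[1], p);
        mask[lane] = w0 >= 0.0 && w1 >= 0.0 && w2 >= 0.0;
    }
    mask
}

/// Writes `src` into `dest` only for covered lanes. The bool is widened to
/// an all-ones or all-zeros word, so the select is pure bit arithmetic.
fn masked_store(dest: &mut [u32; 4], src: [u32; 4], mask: [bool; 4]) {
    for lane in 0..4 {
        let m = (mask[lane] as u32).wrapping_neg(); // 0xFFFF_FFFF or 0
        dest[lane] = (src[lane] & m) | (dest[lane] & !m);
    }
}
```

Calling `coverage4` once per 4-pixel step and passing its result to `masked_store` keeps the per-pixel colour computation unconditional for all four lanes. Whether that wasted work on uncovered lanes costs less than the branches it avoids is exactly the open question, and would need measuring.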
| Name |
|---|
| .github/workflows |
| examples |
| ggdt |
| ggdt_imgui |
| .gitignore |
| Cargo.toml |
| README.md |
| rust-toolchain.toml |
| rustfmt.toml |
ggdt: Gered's Game Dev Tools
This is a purely for-fun project of mine. It's a personal set of retro-like/inspired game development tools for use in my own projects.
It started with a focus on DOS "VGA mode 13h"-style limitations, but it will not be limited to just that in the future and will be expanded as I need it to do other things. It should be noted that in this project I will do a lot of (poor) reinventing of the wheel ... because it's fun. Stringing together existing libraries all the time is dull after a while.
I'm not an expert in Rust and am probably still doing a great many things unidiomatically. But I'm learning, and that is at least half the point of this project in the first place.
See the /examples directory for some demo apps. These will be added to over time.