Well, the short version is that you do it by saving tris wherever you can, which I suppose is obvious and not much help.
I've done some more testing since then. It appears that for the sorts of models we are interested in, running on the RT3 game engine, the basic tri count (or poly count, if you prefer to call it that) is a near-enough proxy for rendering load and frame rate. So I don't worry about the verts any more and just keep an eye on total tris.
I used the Pennsy H3 as a test case for several things. It comes in at 1,500 tris, and just to see what skin quality was possible I used a single image for both the locomotive and the tender. I figured it was good practice, since later and larger locomotives will have to be done at a similar resolution even if you have the luxury of separate loco and tender images.
I've found that anything up to around 1,500 tris is not a problem. Once it gets up to around 2,000 it will sometimes cause a noticeable hit to frame rate. These counts are totals for the locomotive and tender combined, including all components.
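As a rough illustration, the rule of thumb above can be written as a quick budget check. This is only a sketch: the two thresholds come from the testing described here, but treating everything between them as "borderline" is my own assumption, and the component names you pass in are whatever you choose to call the parts of your model.

```python
# Rough thresholds from the RT3 testing above (locomotive + tender combined).
COMFORTABLE_LIMIT = 1500  # "anything up to around 1,500 tris is not a problem"
NOTICEABLE_HIT = 2000     # "around 2,000 ... a noticeable hit to frame rate sometimes"


def total_tris(components: dict[str, int]) -> int:
    """Sum the tri counts of every component of the locomotive and tender."""
    return sum(components.values())


def budget_status(tris: int) -> str:
    """Classify a combined tri count against the rough thresholds.

    The "borderline" band between the two thresholds is an assumption,
    not a measured result.
    """
    if tris <= COMFORTABLE_LIMIT:
        return "ok"
    if tris < NOTICEABLE_HIT:
        return "borderline"
    return "likely frame-rate hit"
```

For example, the H3's 1,500 tris lands in the "ok" band, while a model at 2,000 or more would be flagged.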
But obviously less is always going to be better for performance. Using alpha instead of mesh is a good idea where it is possible. For example, the Pennsy H3 has a boiler made from an 18-sided cylinder. That's enough sides to look smooth with the livery it carries. For something like an LNER A1, where the livery has much more distinct lining on the boiler, I'd use 24 sides to stop the lining looking blocky. I decide on this by checking the model out from a range of angles and distances and seeing what makes sense.
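To put numbers on the choice of boiler sides: each side of an N-sided cylinder is a flat quad, and a quad costs two tris, so the barrel wall costs 2N tris. The sketch below assumes a single straight section with no end caps; a boiler built from several sections pays this cost once per section.

```python
def cylinder_wall_tris(sides: int) -> int:
    """Tris in the side wall of one open N-sided cylinder section (no end caps).

    Each side is a flat rectangular quad, and a quad needs two triangles.
    """
    if sides < 3:
        raise ValueError("a cylinder needs at least 3 sides")
    return 2 * sides


for sides in (18, 24):
    print(sides, "sides ->", cylinder_wall_tris(sides), "tris")
```

So going from 18 sides (36 tris) to 24 sides (48 tris) costs 12 extra tris per straight section, which is why it is only worth doing where the lining actually needs it.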
So if the front of the smokebox were done with 18 sides it would obviously rack up another 16 tris* (if flat) or 18 tris (if peaked in the middle). But if the front of the smokebox is done as a basic square with alpha it will only be two tris, and as long as it's done to a large enough scale it will look fine in the game.
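The 16 and 18 figures follow from how an N-sided cap can be split into triangles. A flat, convex cap can be fanned out from one corner vertex, giving N - 2 tris; a peaked cap adds a vertex in the middle and needs one tri per edge, giving N. A minimal sketch (the function names are just for illustration):

```python
def fan_triangles(sides: int) -> list[tuple[int, int, int]]:
    """Vertex-index triples for a fan triangulation of a convex N-sided polygon.

    Every triangle shares vertex 0, so there are N - 2 of them.
    """
    return [(0, i, i + 1) for i in range(1, sides - 1)]


def flat_cap_tris(sides: int) -> int:
    """Minimum tris for a flat N-sided cap."""
    return sides - 2


def peaked_cap_tris(sides: int) -> int:
    """Tris for a cap with an extra vertex in the middle: one per perimeter edge."""
    return sides


ALPHA_SQUARE_TRIS = 2  # a textured square with alpha is always two tris

print(len(fan_triangles(18)), flat_cap_tris(18), peaked_cap_tris(18), ALPHA_SQUARE_TRIS)
```

For an 18-sided smokebox front that is 16 or 18 tris of mesh, against 2 for the alpha square.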
You can sometimes use a single tri instead of a square too. For the Pennsy H3 I used a single tri for every wheel layer. Each locomotive wheel has four layers. Since the tender wheels are not quite as noticeable in-game, I only used three layers for those. Adding them all up, the locomotive has 10 wheels at four tris each and the tender has 8 wheels at three tris each, so the total is 10x4 + 8x3 = 64 tris. If I used a square instead of a triangle this would double the number of tris, so compared to that I saved 64 tris just in the wheels. I thought this was worth experimenting with because once you start getting into Garratts and Mallets the number of tris in the wheels starts adding up rapidly.
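The wheel arithmetic generalises easily, which helps when estimating an articulated locomotive with many more wheels. This sketch assumes, as on the H3, that every wheel layer is one flat billboard costing either one tri or a two-tri square:

```python
def wheel_tris(wheels: int, layers_per_wheel: int, tris_per_layer: int) -> int:
    """Total tris for a set of wheels where each layer is one flat textured shape."""
    return wheels * layers_per_wheel * tris_per_layer


TRI, SQUARE = 1, 2  # tris per layer: single triangle vs two-triangle square

# Pennsy H3 figures from the text: 10 loco wheels x 4 layers, 8 tender wheels x 3 layers.
with_tris = wheel_tris(10, 4, TRI) + wheel_tris(8, 3, TRI)
with_squares = wheel_tris(10, 4, SQUARE) + wheel_tris(8, 3, SQUARE)
print(with_tris, with_squares, with_squares - with_tris)
```

This prints 64, 128 and 64: the saving always equals the triangle total, because squares exactly double it.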
The only catch here is that a single large triangle uses up more space on the texture for a given resolution. A square is more compact for a given amount of pixellation/blockiness/edge jaggies/whatever you want to call it. The in-game geometry isn't a problem, because an equilateral triangle will fit the wheel perfectly if you set its radius (ie: centre to verts) to exactly double the wheel radius, which is the same as the wheel diameter.
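The geometry behind both points can be checked directly. The circle inscribed in an equilateral triangle has half the triangle's centre-to-vertex radius, so to just enclose a wheel of radius r the triangle needs a radius of 2r, i.e. the wheel diameter. The sketch below compares texture footprints at equal pixel density; whether the wasted corners of the triangle's bounding box are actually lost depends on how you pack the skin, so treat the "bbox" figure as a worst case.

```python
import math


def triangle_circumradius_for_wheel(wheel_radius: float) -> float:
    """Centre-to-vertex radius of the smallest equilateral triangle enclosing the wheel.

    The incircle radius of an equilateral triangle is half its circumradius, so R = 2r.
    """
    return 2.0 * wheel_radius


def texture_footprints(wheel_radius: float) -> dict[str, float]:
    """Areas (in wheel-radius units squared) of the shapes that carry one wheel image."""
    r = wheel_radius
    R = triangle_circumradius_for_wheel(r)
    side = R * math.sqrt(3)   # side length of the equilateral triangle
    height = 1.5 * R          # apex to base
    return {
        "square": (2 * r) ** 2,                     # square that just encloses the wheel
        "triangle": math.sqrt(3) / 4 * side ** 2,   # area of the triangle itself
        "triangle_bbox": side * height,             # axis-aligned box around the triangle
    }


fp = texture_footprints(1.0)
for name, area in fp.items():
    print(f"{name}: {area:.3f} ({area / fp['square']:.2f}x square)")
```

The triangle itself takes about 1.3 times the area of the square, and its bounding box about 2.6 times, which is the texture cost you pay for saving one tri per layer.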
Oh, and by the way, drivetrains are a totally known quantity now. Setting up pistons, conrods and coupling bars is simple and reliable. It's even possible to stagger the two sides of the drivetrain by 180 degrees and have them work perfectly, but you can't quarter them, because a 90-degree stagger will break the piston animation.
*The minimum number of tris for any flat polygon without holes is N-2, where N is the number of sides around the perimeter.