paranoidmike

Office add-ins: VSTO or Shared? Why should I care?

Published on 2007-11-07 by paranoidmike | 6 Comments

I’ve noticed an interesting pattern in the VSTO Forum on MSDN. Cindy Meister (the queen of VSTO & Word programming) will very often ask a poster whether they’re developing a Shared Add-in or a VSTO Add-in. Nearly all the responses to such questions indicate that the poster has no idea, and I can’t blame them. It seems that even if someone’s used the VSTO solution templates to create a new project targeted at Office applications, they could still end up inadvertently creating a Shared (or COM) add-in.

I finally got frustrated enough at trying to judge whether a question I’d ask would be “worthy” of the VSTO Forum, or would just be punted back to me to find an appropriate newsgroup — so I decided to figure out what’s the point of this once and for all.

I quickly discovered that Cindy has gotten this question in the past, and has summarized some of the differences here:

http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=1851118&SiteID=1

However, that still doesn’t answer the question of “Why should it matter whether my VSTO project actually turns out to be merely a Shared Add-in?”

Does it change the way that you’d use certain .NET classes — or does it limit the kinds of classes that can be used by a Shared Add-in?
Does it create incompatibilities on the deployed client — or does it require a different deployment approach for a Shared Add-in?
Do certain data types, methods for creating Event Handlers, or Office PIA-encapsulated APIs not work in a Shared Add-in that would work in a VSTO Add-in?

How can I tell whether the project I’m developing is a VSTO Add-in or a Shared Add-in?

Is there some set of Imports/Using statements that only work in one or the other (e.g. Imports Microsoft.Office.Core vs. Imports Microsoft.Office.Interop.Word)?
Do certain Visual Studio Templates lead to either Shared Add-ins or VSTO Add-ins (e.g. Visual C# > Office > 2003 Add-ins > Word Add-in vs. Visual C# > Office > 2007 Add-ins > Word Add-in vs Visual C# > Office > Word Document)? Or is it only relevant if the template used is Other Project Types > Extensibility > Shared Add-in?
Once I’ve created the project/solution, and if I don’t remember 100%, is there some filename, configuration setting, unique Event or other property of the project that would definitively indicate that it’s VSTO vs. Shared (e.g. ThisAddIn.vb, ThisAddIn_WindowActivate())?

For me, the specific problem I’ve been wrestling with, and which I’m beginning to suspect has a different solution depending on “Shared” or “VSTO” (not that any of the sample code you’ll find on the ‘net makes this clear or obvious), is exactly how a custom Event Handler should be created for CommandBarButtons in a managed add-in for Word 2003.

I’ve personally seen both code like this (C#):

uiButton.Click += new Microsoft.Office.Core._CommandBarButtonEvents_ClickEventHandler(uiButton_Click);

and like this (VB):

    Private WithEvents uiButton As Microsoft.Office.Core.CommandBarButton

It’s entirely UNclear whether these are exactly equivalent, and if not, under what environmental conditions/constraints one is better than the other. It’s also completely beyond me whether this would be considered part of a VSTO add-in or a Shared add-in. Finally, I wouldn’t have a clue what I should do differently were I to know which type of add-in this was – would it affect which of these two handler-setup approaches I used? What types of objects I should use? Something else?
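For what it’s worth, here is a minimal sketch of how the two styles relate at the language level. As far as I can tell, a WithEvents field plus a Handles clause is VB shorthand for the same delegate subscription that C#’s += (or VB’s AddHandler) performs explicitly. FakeButton is a hypothetical stand-in for Microsoft.Office.Core.CommandBarButton (its Click signature mirrors the real one), so the sketch compiles without any Office reference; it says nothing about the VSTO vs. Shared question itself.

```vbnet
Imports System

' FakeButton is a hypothetical stand-in for Microsoft.Office.Core.CommandBarButton.
Public Class FakeButton
    Public Event Click(ByVal Ctrl As FakeButton, ByRef CancelDefault As Boolean)

    ' Simulates the user clicking the button.
    Public Sub Press()
        Dim cancel As Boolean = False
        RaiseEvent Click(Me, cancel)
    End Sub
End Class

Public Class HandlerDemo
    ' Style 1 (declarative): the compiler attaches uiDeclarative_Click whenever
    ' this field is assigned, because of the Handles clause further down.
    Private WithEvents uiDeclarative As FakeButton

    ' Style 2 (explicit): a plain field, wired by hand with AddHandler,
    ' which is the VB counterpart of the C# "+=" line.
    Private uiExplicit As FakeButton

    Public ClickCount As Integer = 0

    Public Sub New()
        uiDeclarative = New FakeButton()
        uiExplicit = New FakeButton()
        AddHandler uiExplicit.Click, AddressOf uiExplicit_Click
    End Sub

    Private Sub uiDeclarative_Click(ByVal Ctrl As FakeButton, ByRef CancelDefault As Boolean) Handles uiDeclarative.Click
        ClickCount += 1
    End Sub

    Private Sub uiExplicit_Click(ByVal Ctrl As FakeButton, ByRef CancelDefault As Boolean)
        ClickCount += 1
    End Sub

    ' Presses both buttons and reports how many handlers ran.
    Public Function PressBoth() As Integer
        uiDeclarative.Press()
        uiExplicit.Press()
        Return ClickCount
    End Function
End Class
```

Either way, the button reference lives in a class-level field here. With the real CommandBarButton that matters: a button held only in a local variable can be garbage-collected, after which its Click handler quietly stops firing.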

What if Google Reader had a sucky API…?

Published on 2007-11-04 by paranoidmike | 5 Comments

I’ve been thinking more about the idea of building a VSTO add-in for Outlook that could synchronize between Attensa for Outlook and Google Reader (I wrote previously about the idea here).

I’ve had a heckuva time trying to figure out how to build Web Services client code into a Windows client application. There’s a ton of books out there documenting how to build the server side of Web Services; even the books and articles that talk about building Web Services client code are really talking about consuming Web Services from another server application. I finally figured out a query or two that gave me some leads, but it wasn’t exactly a slam dunk.

I’ve also had more trouble than I expected figuring out what the story is with Google Reader’s API. Back in 2005 there was an effort by Niall Kennedy to reverse-engineer the then-current Reader API; at the time, the Reader team even commented a couple of times on the imminent release of the API. Ever since then, there’ve been various inquiries but no further details to let folks know where they stood. [Well, at least from the SOAP API standpoint.]

I’ve been getting more and more suspicious that Google no longer intends to document and release a SOAP-based API for Google Reader. I finally became convinced of this when I stumbled on a not-terribly-related article by Scoble, where he caught us up on Google’s inauspicious switch from their original SOAP Search API to an AJAX Search API.

My take: despite some of the comments on Scoble’s article that imply he’s just being paranoid, I happen to think there’s merit to the theory that Google has no interest in developing APIs that don’t further their advertising revenue. It’s not that I particularly dislike Google (I’m a big fan of many of their services), but after two years’ delay from when the Reader team was about to release their SOAP API anyway, it’s pretty hard to imagine that the great disappearing act is not in some way deliberate.

The fact that they developed and released the Reader AJAX API in less time demonstrates that they’re capable of getting a Reader API out in that timeframe, and the distinct lack of mention of any further API development (either on the Reader group discussion boards or on the Reader team’s blog) pretty much seals the deal for me.

So what’s an in-their-spare-time developer supposed to do at this point?

Google Reader: toss it…

From what I see of the Reader AJAX API documentation, it’s not meant as a full-fidelity API for manipulating the contents of a user’s Reader configuration, article read/unread status or even the list of blogs to which they’re subscribed. I’m not terribly interested in placing an eight-article list of links (+ Google ads) in a VSTO add-in to Outlook. Nor am I all that interested in trying to use an undocumented and unsupported (not to mention still-in-flux) SOAP API — it’s enough trouble for me to do something once, let alone every week or month when they decide to rip out something else. It sure doesn’t help to hear someone who’s been keeping an eye on this space say things like Google is embarking on the “creation of de jure and de facto standards which are pitched as being great for customers but seem suspiciously like attempts at Fire & Motion.“

Attensa Online reader: no more

I probably wouldn’t even be going down this road if Attensa had kept up their own online web-based feed server. However, despite multiple positive reviews and different URLs that used to point to it, it’s gone the way of the dodo bird. Heck, I’ve even lost access to Attensa’s own corporate feed server (to which they gave me an account last year after all the bug reports I’d sent them).

MyYahoo: no published RSS APIs

Many news reports repeat the “fact” that MyYahoo is one of the most popular online RSS readers on the ‘net. I dug into its interface, documentation, and searched around Yahoo’s site for a while, but it was so obtuse it was hard to actually find any references to terms like RSS or OPML. I’ve come to the conclusion that there’s either (a) no official API for MyYahoo feed reading, or (b) it’s just not something Yahoo has invested any effort into. However, Niall Kennedy has documented some API characteristics, and I stumbled upon Yahoo’s general developer documentation (among which they mention that most APIs use the REST approach). The Yahoo SDK provides C# & VB.NET samples for Yahoo’s various search properties but nothing more.

NewsGator Online: in a word, wow

According to a trip report from ETech 2006, the Newsgator API “uses standard protocols (SOAP) which can be consumed from any platform” and it “supports granular operations [add feed, mark item read, rename feed, etc]”. The Newsgator Online API has documentation and even C# sample code here. Newsgator also provides a web-based feed reader that’s optimized for iPhone (which is my ultimate online target) and a Windows Media Center-integrated web reader (which’d be a bonus for me with Vista Ultimate controlling my TV universe at home). Hey, they even provide a Desktop client that integrates with the IE7/Windows RSS Store (though it’s unclear whether it syncs with NewsGator Online). Plus there’s an Outlook-integrated client that has some of the capabilities of the Attensa Outlook client (though not the free price tag).

Rojo.com: unpublished API (or less?)

This one was mentioned on Wikipedia’s article for Google Reader. I wasn’t able to find anything on Rojo’s site about an API, but I did see a few sporadic mentions on the ‘net (e.g. Yuan.CC’s archives through 2005, a Rojo forum suggestion). Unfortunately, as the discussion seems to center around “undocumented” APIs, I’m even less interested in this than in Google’s limited interface.

Bloglines: “it sucks a lot“

According to the ETech 2006 trip report, “The Bloglines Sync API didn’t work for synchronization since it is mainly a read-only API for fetching unread items from Bloglines. However it is complicated by the fact that fetching unread items from Bloglines marks them as read even if you don’t read them in retrieving applications.”

NewsAlloy: cool but not there yet

Currently in Beta, this free web-based feed reader implements the “Blogroll API” (whatever that means), and provides both a rich web-based and a mobile interface. Unfortunately, the developer has been struggling off and on to keep this project going, but it’s a really cool looking project.

Feedlounge: cancelled

Read here.

Gritwire: no idea

Gritwire Labs (where I’d expect mention of APIs to show up) doesn’t mention anything about APIs. The author once mentioned “It relies on an XML API to receive and provide data to the backend”, but there’s little if anything else that would help me understand what this “API” is used for.

Conclusion #1: Build a Generic Sync add-in

I’d started my solution as a Google-Attensa app (heck, I’d named it GoogleReaderSyncToAttensa). Now, after looking into this, and not seeing any one clear “winner” among the online web-based feed readers, I have decided that the only responsible thing to do is to build this add-in on the assumption that others’ll want to add their own web-based readers to the codebase. Thus, all the direct button-to-codebehind and code-to-Attensa calls should be abstracted as much as possible to not tie the code to any one web-based reader. The idea is to make it as *easy* as possible for someone to add their own assembly/class that encapsulates the necessary site-specific functionality, and (ideally) not have to rewrite any of the existing code, other than to call into the appropriate assembly/class.
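To make that abstraction a bit more concrete, here is a rough sketch of the shape I have in mind. Every name in it (IFeedReaderService, FeedItemState, InMemoryReader, SyncEngine) is hypothetical; nothing here comes from Attensa, Google or NewsGator, and the in-memory class only stands in for a real site-specific assembly.

```vbnet
Imports System
Imports System.Collections.Generic

' Hypothetical value type: one article and whether it has been read.
Public Class FeedItemState
    Public ItemUrl As String
    Public IsRead As Boolean

    Public Sub New(ByVal url As String, ByVal read As Boolean)
        ItemUrl = url
        IsRead = read
    End Sub
End Class

' Hypothetical contract that each reader (online or local) would implement.
Public Interface IFeedReaderService
    Function GetItemStates() As IList(Of FeedItemState)
    Sub MarkRead(ByVal itemUrl As String)
End Interface

' Stand-in implementation so the sketch runs without any web service.
Public Class InMemoryReader
    Implements IFeedReaderService

    Private ReadOnly _items As New Dictionary(Of String, Boolean)

    Public Sub AddItem(ByVal itemUrl As String, ByVal isRead As Boolean)
        _items(itemUrl) = isRead
    End Sub

    Public Function GetItemStates() As IList(Of FeedItemState) Implements IFeedReaderService.GetItemStates
        Dim result As New List(Of FeedItemState)
        For Each pair As KeyValuePair(Of String, Boolean) In _items
            result.Add(New FeedItemState(pair.Key, pair.Value))
        Next
        Return result
    End Function

    Public Sub MarkRead(ByVal itemUrl As String) Implements IFeedReaderService.MarkRead
        If _items.ContainsKey(itemUrl) Then _items(itemUrl) = True
    End Sub
End Class

Public Class SyncEngine
    ' Copies "read" flags from source to target for items both sides know about.
    ' Returns the number of items newly marked as read on the target.
    Public Shared Function PushReadState(ByVal source As IFeedReaderService, ByVal target As IFeedReaderService) As Integer
        Dim targetRead As New Dictionary(Of String, Boolean)
        For Each t As FeedItemState In target.GetItemStates()
            targetRead(t.ItemUrl) = t.IsRead
        Next

        Dim changed As Integer = 0
        For Each s As FeedItemState In source.GetItemStates()
            If s.IsRead AndAlso targetRead.ContainsKey(s.ItemUrl) AndAlso Not targetRead(s.ItemUrl) Then
                target.MarkRead(s.ItemUrl)
                changed += 1
            End If
        Next
        Return changed
    End Function
End Class
```

The point of the design is that SyncEngine only ever sees the interface, so adding another web-based reader should mean writing one new class rather than touching the button handlers.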

Heck, if I was feeling really generous, I’d even consider abstracting the Attensa interface so that Microsoft’s IE7 Feeds interface could be used as the local client, or even the inevitable Google Apps offline reader. However, I’m getting waaayyy ahead of myself with this kind of thinking. I’m going to have a hard enough time abstracting the remote site interfaces, let alone try to wire the classes for “plug and play” with various local feed reading apps.

Conclusion #2: I Should Just Cough Up the $30 for Newsgator Inbox

At my core, I’m just a lazy guy. If I can find a pre-built application to do most of what I want to do, then I’ll suffer with the pain of that rather than go off and “build” something myself. That was the core lesson I learned from Comp Sci 1A03 (first-year programming) in undergrad, and my computing bias has been solidified that way ever since.

After looking at the breadth and depth of offerings from Newsgator (SOAP API, REST API, Outlook client, Windows Feed Store integration, iPhone web client, Windows Media Center client, and broad support for synchronization among all the feed readers) it feels like a tough choice for me: spend a few months developing a synchronization framework between Attensa and one or more online feed readers, or spend the $30 on Newsgator Inbox and postpone development indefinitely.

I guess the big question is, am I more interested in learning how to code Outlook VSTO and Web Services (SOAP and/or REST), or am I really just interested in getting on with my RSS reading and work on something else? I do have plenty of other development projects I can continue doing, and it’s not like I have all this spare time to devote to multiple concurrent development projects. On the other hand, this is an interesting space for development, and (for those folks not paying $$ to Newsgator yet) there’s something to be said for laying the groundwork for a generic offline-online RSS synchronization framework.

I’m going to have to sleep on this for a while — this is not an easy choice.

What does it really mean to Prevent Buffer Overruns in Managed Code, Michael Howard?

Published on 2007-10-31 (updated 2016-11-13) by paranoidmike

One of the reasons I’m spending so much of my free time writing code (and neglecting my wife and dogs, much to their chagrin and my isolation) is that I’m trying to personalize the lessons of developing code, and developing secure code, that I preach as part of my day-to-day job.

I’ve been seeing a lot of references to “don’t trust user input”, and I’ve been trying to figure out what I’m supposed to do in managed code. What I’m really after are some code samples or some prescriptive guidelines.

Of all the resources I know of on the subject, I suspect the best guidance I’ll find is in the book 19 Deadly Sins Of Software Security: Programming Flaws and How To Fix Them (Howard, LeBlanc, Viega). I flipped through this a couple of months ago and while it seemed heavily weighted towards unmanaged code (C and C++), I seem to remember a reasonable amount of mention of managed code as well.

When I dug into the table of contents, there wasn’t any one chapter entitled “don’t trust user input”. Instead there’s titles like “Sin 1: Buffer Overruns“, “Sin 2: Format String Problems“, “Sin 3: Integer Overflows“, “Sin 4: SQL Injection“, “Sin 5: Command Injection” and “Sin 14: Improper File Access“. [I believe these are all the sins that relate to trusting user input, but I’m sure that’s hardly all the ways that trusted user input can be harmful to your code’s health!]

Sin 1: Buffer Overruns

So it looks like this is the most significant of all the Sins to consider when developing managed code. Not only does it encapsulate the kind of thinking that should be applied to the other Sins, but it’s also the most prevalent issue to expect in managed code, and it applies to all types of managed code applications.

While I’ve understood for years what a buffer overrun means in general, I’ve never paid too much attention to thinking through exactly how to implement protections against buffer overruns. What’s worse is, the guidance for managed code developers in this book isn’t exactly crystal-clear (at least, not to a relative novice like me):

C# enables you to perform without a net by declaring unsafe sections; however, while it provides easier interoperability with the underlying operating system and libraries written in C/C++, you can make the same mistakes you can in C/C++. If you primarily program in higher-level languages, the main action item for you is to continue to validate data passed to external libraries, or you may act as the conduit to their flaws.

So what does this mean to the managed code developer? Am I reading this right, that we should only have to worry about calls to unmanaged code, and that all managed code functions are perfectly fine as-is? Or is this also trying to say that any calls between assemblies, whether managed-managed code or managed-unmanaged code, should be equally guarded so that all passed buffers are checked?

Let’s assume for the moment that it’s the former, and that only when we’re calling into an unmanaged code (PInvoke) function do we need to worry about protecting against buffer overruns. Should we assume that every single PInvoke needs to be protected against buffer overruns, no matter what? Or should we focus instead on following external user inputs, tracing them through our code, and only put guard code in place at one or more of those chained calls, when that external input will actually intersect with a PInvoke function?

Put another way, does this advice mean we should focus on the “back end” (protecting every PInvoke), or should we focus on the “front end” (tracing external input to any PInvoke)?

I have no real appreciation for this space, and I can imagine good reasons for taking either approach. However, I also don’t relish the thought of either approach. I’d hate to have to try to trace every external input all the way through the twisty paths that it’ll often take — what a nightmare for a large codebase (what a grueling code review that’d be)! On the other hand, it seems really inefficient to have to wrap every PInvoke in some form of guard code (or worse, wrap every call to the PInvoke – thus duplicating the extra code over and over, and still leaving yourself open to overlooking one or more critical calls).

And hey — if every PInvoke should always be wrapped in anti-overrun guard code, then shouldn’t the Microsoft employee who runs PInvoke.net be aware of that, and be ensuring that such guard code is included in every PInvoke signature that’s documented on that site? Based on this reasoning, I’d have to believe that it’s not practical — or not even theoretically effective — to try to protect against buffer overruns in the PInvoke signatures.
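One way to picture the “back end” option without duplicating guard code at every call site is a single managed wrapper that owns both the buffer and the length it reports to the native function. The sketch below is only an illustration under my own assumptions: it uses the Win32 GetTempPath function (Windows only), and the class name SafeNativeMethods and the starting capacity of 260 are my choices, not anything prescribed by the book or by PInvoke.net.

```vbnet
Imports System
Imports System.ComponentModel
Imports System.Runtime.InteropServices
Imports System.Text

Public NotInheritable Class SafeNativeMethods
    Private Sub New()
    End Sub

    ' Raw import: kept Private so no caller can pass a length that
    ' disagrees with the real size of the buffer.
    <DllImport("kernel32.dll", CharSet:=CharSet.Auto, SetLastError:=True)> _
    Private Shared Function GetTempPath(ByVal nBufferLength As UInteger, ByVal lpBuffer As StringBuilder) As UInteger
    End Function

    ' The only public entry point: allocates the buffer, reports its true
    ' capacity, and checks the result before handing anything back.
    Public Shared Function GetTempFolder() As String
        Dim buffer As New StringBuilder(260)
        Dim result As UInteger = GetTempPath(CUInt(buffer.Capacity), buffer)

        If result > CUInt(buffer.Capacity) Then
            ' Too small: the API returns the size it needs, so retry once.
            buffer = New StringBuilder(CInt(result))
            result = GetTempPath(CUInt(buffer.Capacity), buffer)
        End If

        If result = 0UI Then
            Throw New Win32Exception(Marshal.GetLastWin32Error())
        End If
        If result > CUInt(buffer.Capacity) Then
            Throw New InvalidOperationException("Temp path did not fit in the buffer.")
        End If

        Return buffer.ToString()
    End Function
End Class
```

Seen this way, the guard lives next to one signature rather than around every call to it, which might be why a bare signature listing doesn’t (and perhaps can’t) carry it.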

Quick Analysis of the Rest of the “User Input” Sins

Sin 2: Format String Problems

It sounds like the only significant effect of this Sin on managed code is when reading in input from external files. The recommended “guard code” is to try to be sure you’re reading in the file you want (and not some path– or filename–spoofed substitute).

Sin 3: Integer Overflows

It sounds like the only time this is a problem in managed code is when performing calculations inside unmanaged code. If I’m reading this right, the recommended “guard code” would check that the integer values passed into the unmanaged code call are within the range that call can handle (not merely that they’re integers).
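A minimal sketch of what such a guard might look like, assuming the concern is a count or length that gets narrowed or multiplied before being handed to a native call. The module and function names are hypothetical. (As I understand it, VB.NET checks integer arithmetic for overflow by default while C# does not, so being explicit seems the safer habit in either language.)

```vbnet
Imports System

Public Module LengthGuard
    ' Hypothetical helper: converts an untrusted 64-bit count into the 32-bit
    ' length a native call expects, rejecting anything outside 0..maxAllowed.
    Public Function ToNativeLength(ByVal untrusted As Long, ByVal maxAllowed As Integer) As Integer
        If untrusted < 0 OrElse untrusted > maxAllowed Then
            Throw New ArgumentOutOfRangeException("untrusted")
        End If
        Return CInt(untrusted)
    End Function

    ' Hypothetical helper: multiplies in 64 bits first, so the range check
    ' happens before the result is squeezed back into an Integer.
    Public Function ByteCount(ByVal itemCount As Integer, ByVal itemSize As Integer) As Integer
        Dim total As Long = CLng(itemCount) * CLng(itemSize)
        If total < 0 OrElse total > Integer.MaxValue Then
            Throw New OverflowException("Byte count does not fit in 32 bits.")
        End If
        Return CInt(total)
    End Function
End Module
```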

Sin 4: SQL Injection

I’m not touching any SQL databases or data access libraries, so this is irrelevant to my current investigations. If it’s relevant for you, go read everything you can on the subject — it’s a doozy.

Sin 5: Command Injection

No .NET languages are mentioned in this chapter, but I would imagine that anytime a “shell execute” type command is instantiated, this vulnerability could be present. In such cases, I would follow the same advice they give: “You can either validate everything you’re going to ship off to the external process, or you can just validate the parts that are input from untrusted sources. Either one is fine, as long as you’re thorough about it.”
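Here is a rough sketch of the “validate the parts that come from untrusted sources” option, assuming the untrusted part is a bare file name being handed to an external program. The allow-list pattern, the choice of notepad.exe and the module name SafeLaunch are all my own illustration, not the book’s.

```vbnet
Imports System
Imports System.Diagnostics
Imports System.Text.RegularExpressions

Public Module SafeLaunch
    ' Allow-list: letters, digits, dot, underscore and hyphen only.
    ' \z (rather than $) so a trailing newline cannot sneak through.
    Private ReadOnly SafeFileName As New Regex("^[A-Za-z0-9._-]+\z")

    Public Function IsSafeFileName(ByVal candidate As String) As Boolean
        Return candidate IsNot Nothing AndAlso SafeFileName.IsMatch(candidate)
    End Function

    ' Builds (but does not start) the process description, refusing any
    ' argument that fails the allow-list.
    Public Function BuildStartInfo(ByVal untrustedFileName As String) As ProcessStartInfo
        If Not IsSafeFileName(untrustedFileName) Then
            Throw New ArgumentException("Rejected file name.", "untrustedFileName")
        End If

        Dim info As New ProcessStartInfo("notepad.exe", untrustedFileName)
        info.UseShellExecute = False ' launch the program directly, not through the shell
        Return info
    End Function
End Module
```

An allow-list like this is deliberately stricter than most real file names need; the idea is to start from “reject everything” and loosen only what is provably required.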

Sin 14: Improper File Access

It sounds like there’s no easy “rules” to implement as guard code for this class of flaw, but rather to be hyper-vigilant anytime managed code calls System.IO.File or StreamReader methods.
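One check that does seem reasonably mechanical is confirming that a user-supplied file name still resolves inside the folder you meant it to live in, before it ever reaches System.IO.File or a StreamReader. A minimal sketch, with the module and function names being hypothetical and the case-insensitive comparison assuming a Windows file system:

```vbnet
Imports System
Imports System.IO

Public Module PathGuard
    ' Resolves untrustedName against baseDir and throws if the canonical
    ' result lands anywhere outside baseDir (e.g. via "..\" or a rooted path).
    Public Function ResolveInside(ByVal baseDir As String, ByVal untrustedName As String) As String
        Dim root As String = Path.GetFullPath(baseDir)
        If Not root.EndsWith(Path.DirectorySeparatorChar.ToString()) Then
            root &= Path.DirectorySeparatorChar
        End If

        ' GetFullPath collapses "." and ".." segments into a canonical path.
        Dim full As String = Path.GetFullPath(Path.Combine(root, untrustedName))

        If Not full.StartsWith(root, StringComparison.OrdinalIgnoreCase) Then
            Throw New UnauthorizedAccessException("Path escapes the base directory.")
        End If
        Return full
    End Function
End Module
```

This doesn’t cover everything the chapter worries about (links and race conditions, for instance), so it’s a starting point rather than a complete answer.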

Note to self: review these VSTO articles

Published on 2007-10-30 by paranoidmike | 1 Comment

[aside: I have to remember to review these articles for any tricks that’ll help me troubleshoot/improve the VBA-to-VSTO conversion I’m doing for Word2MediaWiki++…]

Migrating a VBA Solution to a Visual Studio Tools for Office Add-In

Migrating Word VBA Solutions to Visual Studio Tools for Office

Convert VBA Code to Visual Basic When Migrating to Visual Studio 2005 Tools for Office

John R. Durant’s Consolidated List of Word 2003 Developer Resources

…and as a catch-all:

VSTO Forum: Non-VSTO Question/Issue Resources

Just one of the many reasons why Vista pisses me off…

Published on 2007-10-27 (updated 2016-11-13) by paranoidmike | 3 Comments

I’ve spent the better part of three nights a week, for at least a month, trying to figure out how to reinstall my Linksys WUSB54G USB Network Adapter. I’d bought this nice little device a little while ago, and I was foolish (!?!) enough to think that I could disconnect it and plug it into any old USB port on my Vista PC, and have it work again. [After this many years of working with USB devices in this manner, what was I thinking ?!?]

Instead, I found out when I plugged it back in that its attempts to “reinstall the driver” (during creation of the “new” device; oops, I guess plugging it into a different USB port was NOT to Vista’s liking) were being stymied by one of the most impenetrable errors I’ve ever encountered: ERROR_DUPLICATE_SERVICE_NAME. Oh sure, you’d think this’d be an easy one to resolve, eh? Sure – just try to find the duplicated name anywhere in the Services hive of the Registry. Nothing had “Linksys” in the name, and simply deleting anything with “Linksys” or “WUSB54G” in any of the settings, values or data didn’t cut it. Vista still bitched about the duplicate name.

The error has plenty of references online (e.g. peruse here or here), but no one seemed to have any decent solutions on resolving this for any of the Linksys network devices that were at all similar to the one I have. Plenty of speculation, just no good results.

Yes, I tried KB 823771, I’ve tried crawling through the SETUPAPI.LOG file, and I’ve tried a number of other brick walls to bang my head against. The closest I got with the SETUPAPI.LOG was to look for references to “xxxxx” (can’t recall what that said exactly anymore), as in:

#E279 Add Service: Failed to create service “xxxxxx”. Error 1078: The name is already in use as either a service name or a service display name.
#E033 Error 1078: The name is already in use as either a service name or a service display name.
#E275 Error while installing services. Error 1078: The name is already in use as either a service name or a service display name.
#E122 Device install failed. Error 1078: The name is already in use as either a service name or a service display name.
#E154 Class installer failed. Error 1078: The name is already in use as either a service name or a service display name.
#I060 Set selected driver.

Aside: Why I Hate Vista

I’m having a bitch of a time trying to get Vista to preserve a network connection through its Sleep & Resume states. I know that part of it is the fact that the networking hardware vendors haven’t written solid, stable drivers for Vista, but considering how widespread this issue is (even to this day — what, almost a year since release?), it’s really making me more frustrated with Vista [or perhaps it’s really I’m just pissed off at myself for having bought into the hype around Vista, when all it’s been for me since bringing it home has been needless hardware replacement and constant crashes, freezes, and troubleshooting].

This is the third network device I’ve purchased for my Vista box, and the third one that has had driver issues. The first one just didn’t have a Vista driver, and the claimed “should be compatible” XP driver just gave Vista too many bluescreens. The second one had a Vista driver and really good reviews on newegg.com, but the device would lose its driver as soon as Vista went to Sleep (and then resumed), and wouldn’t reload until I rebooted the box. I’m not kidding — I spent a month trying to get that one to work like it should’ve.

I’ve been a Windows bigot for most of my adult life, and I even spent six years working for Microsoft, every day spent trying to make sure that Windows would work reliably and securely for my customers. If *I* have this much trouble with Vista, my sympathies to those of you who’ve been trying to get by on just being a *part*-time Windows geek. [And my sarcasm should be apparent, as I am firmly of the belief that *no* one should have to learn the ins-and-outs of a computer, just to be able to operate it. If you *want* to geek out, by all means c’mon aboard. But if you have *other* interests, then the device should be your servant — not the other freakin’ way around.]

Resolution (?)

What did I finally do that did (or seems to have done) the trick?

I finally went through the Registry and deleted any key that in any way shape or form referred to “USB\VID_13B1”. The HARDWAREID for the Linksys WUSB54G USB Network Adapter is USB\VID_13B1&PID_000D (or some derivative thereof), and while this was never mentioned as the source of the error in any of the logs I crawled through, it finally seemed to me to be the most likely commonality among all the “duplicate names” that must’ve been detected by Vista during the attempted install of the device. I only found a few such entries, but obviously they were the underlying showstopper for re-introduction of this wireless device into my setup.

Grrr…

Porting Word2MediaWikiPlus to VB.NET: Part 14 (Mysteries Abound)

Published on 2007-10-27 by paranoidmike | 1 Comment

[Previous articles in this series: Prologue, Part 1, Part 2, Part 3, Part 4, Part 5, Part 6, Part 7, Part 8, Part 9, Part 10, Part 11 (The Return), Part 12 (Initialization continued), Part 13 (VBA Oddities).]

Mysterious Character: Vertical Tab (VT) — Do These Still Show Up in Word Documents?

In working through the code in MediaWikiConvert_Lists(), I ran across a block of code that purports to “replace manual page breaks”, and is using the Chr(11) construct to do so. I must’ve been feeling extra-curious ’cause I went digging into what this means, and the harder I looked, the more puzzled I became.

According to ASCIITables.com, the character represented by decimal “11” is the so-called “vertical tab”. I’ve never heard of this before (but then, there’s a whole host of ASCII & Unicode characters I’ve never paid attention to before), so I had to check with a half-dozen other references on the ‘net before I was sufficiently convinced that this wasn’t some “off-by-one” problem where the VBA coders were intending to look for Chr(10) (aka “line feed”) or Chr(12) (aka “form feed”).

On the assumption that we’re really and truly looking for “vertical tab”, I had to do some deep digging to figure out what this might actually represent in a Word document. There’s the obligatory Wikipedia entry, which only said that “The vertical tab is &#x0B; but is not allowed in SGML (including HTML) or XML 1.0.” Then I found this amusing reference to one of the Perl RFCs, which quotes Russ Allbery to say “The last time I used a vertical tab intentionally and for some productive purpose was about 1984.” [Sometimes these quotes get better with age…]

OK, so if the vertical tab is so undesirable and irrelevant, what could our VBA predecessors be thinking? What is the intended purpose of looking for an ASCII character that is so unappreciated?
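One hint, for what it’s worth: as far as I know, character code 11 is how Word represents a manual line break (Shift+Enter) in a document’s text, while a manual page break shows up as Chr(12). If that’s right, the VBA comment about “page breaks” may simply be mislabeled, and the code is really hunting for manual line breaks. A minimal sketch of what the equivalent VB.NET replacement might look like, where the function name and the choice of `<br />` as the wiki-side substitute are my assumptions and not taken from the macro:

```vbnet
Imports Microsoft.VisualBasic

Public Module LineBreakSketch
    ' Hypothetical helper: swaps each vertical tab (Chr(11), exposed by VB as
    ' vbVerticalTab) for an HTML-style line break.
    Public Function ReplaceManualLineBreaks(ByVal text As String) As String
        If text Is Nothing Then Return Nothing
        Return text.Replace(vbVerticalTab, "<br />")
    End Function
End Module
```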

Mysterious Code Fragment: “If 1 = 2” – WTF?

I started to notice these odd little appendages growing out of some of the newer code in the VBA macro. At first I figured there must be some special property of VBA that makes “If 1=2” a valid statement under some circumstances, and I just had to ferret out what that was.

Instead, the more I looked at it, the more puzzled I became. What the hell could this possibly mean? Under what circumstances would *any* logical programming language ever treat “If 1 = 2” as anything but a comparison of two absolute numbers, that will ALWAYS evaluate to False?

Eventually I had to find out what greater minds than mine thought about this, and so off to Google I went. As you might expect, there’s not much direct evidence of any programming practices that include adding this “If 1 = 2” statement. In fact, though it appears in the odd piece of code here and there, it’s surprisingly infrequent. However, I finally ran across what I take to be the best lesson on what this really means (even if I had to unearth it through the infamous “Google cache”):

>>>Anyone know how to comment out a whole section in VBA rather than just
>>>line by line with a " ' "?
>>
>>If the code is acceptable (won’t break because some control doesn’t
>>exist, etc), I sometimes to
>> If 1 = 2 then
>> ….existing code
>> End If
>>
>>The code will never fire until the day 1 = 2.
>>
> Thanks, think Id prefer the first option. The second option might
> confuse any programmers that try and read my code.

Now that’s the understatement of the year.

So as far as I’m concerned, I’m going to go back and comment out any and all instances where I find this statement, as it tells me the original programmer didn’t want this code to fire, and was thinking of coming back to it someday after their last check-in.
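For the VB.NET port there’s a third option besides “If 1 = 2” and line-by-line comments: conditional compilation. A small sketch of the difference (the module name is hypothetical):

```vbnet
Public Module DeadCodeSketch
    Public Function Run() As Integer
        Dim calls As Integer = 0

        ' Runtime version: still compiled and type-checked, but never executed.
        If 1 = 2 Then
            calls += 1
        End If

#If False Then
        ' Compile-time version: the compiler skips this block entirely,
        ' so it can even refer to things that no longer exist.
        calls += 1
#End If

        Return calls
    End Function
End Module
```

The trade-off, as I see it: the “If 1 = 2” block keeps getting compiled, so it breaks loudly when something it uses is renamed, whereas the #If False block (like a comment) rots silently.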

Mysterious Approach: Localization via Macro? No way.

There are a few routines that attempt to implement localization at runtime. While this makes sense for VBA, this makes little if any sense for the use of VB.NET. Any English-only strings can be substituted in the corresponding Resources file that will accompany this code.

Thus, the MW_LanguageTexts() routine will be skipped, since it had little if any effect anyway.

Mysterious Exception: “add-in could not be found or could not be loaded”

I’ve been struggling for a few days to try to actually run this add-in, and after finding out why, I can say with confidence that there was no good troubleshooting guide for this.

Here’s the setup:

I could Build the add-in just fine — no build-time errors, only two compiler warnings (about unused variables).
However, when I tried to either (a) Debug the project from within Visual Studio, or (b) add the add-in manually to Word, I was completely stymied.
When I started the Debug sequence (F5) from Visual Studio, it would launch Word 2003, which created all its default menus and toolbars, and then threw this error dialog:
The details of this exception read:

Could not create an instance of startup object Word2MediaWiki__.ThisAddIn in assembly Word2MediaWikiPlusPlus, Version=1.0.0.0, Culture=neutral, PublicKeyToken=1a75eafd9e81be84.

************** Exception Text **************
Microsoft.VisualStudio.Tools.Applications.Runtime.CannotCreateStartupObjectException: Could not create an instance of startup object Word2MediaWiki__.ThisAddIn in assembly Word2MediaWikiPlusPlus, Version=1.0.0.0, Culture=neutral, PublicKeyToken=1a75eafd9e81be84. ---> System.Reflection.TargetInvocationException: Exception has been thrown by the target of an invocation. ---> System.NullReferenceException: Object reference not set to an instance of an object.
   at Word2MediaWiki__.Word2MediaWikiPlusPlus.Convert..ctor() in C:\VS2005 Projects\Word2MediaWiki++\Word2MediaWiki++\Convert.vb:line 44
   at Word2MediaWiki__.ThisAddIn..ctor(IRuntimeServiceProvider RuntimeCallback) in C:\VS2005 Projects\Word2MediaWiki++\Word2MediaWiki++\ThisAddIn.vb:line 29
   --- End of inner exception stack trace ---

If I tried to load the add-in from within Word (using the Tools > COM Add-ins… menu — which you can add with these instructions), Word would only tell me:

Load Behavior: Not loaded. A runtime error occurred during the loading of the COM Add-in.

I won’t even bore you with the details of all the stuff I tried to do to debug this issue. It turned out that I was instantiating my Application object too early in the code (at least, the way I’d constructed it).

Broken Code

ThisAddIn.vb (relevant chunk)

Imports Office = Microsoft.Office.Core
Imports Word2MediaWiki__.Word2MediaWikiPlusPlus.Convert

Public Class ThisAddIn

#Region " Variables "

    Private W2MWPPBar As Office.CommandBar
    WithEvents uiConvert As Office.CommandBarButton
    WithEvents uiUpload As Office.CommandBarButton
    WithEvents uiConfig As Office.CommandBarButton

    Dim DocumentConversion As Word2MediaWikiPlusPlus.Convert = New Word2MediaWikiPlusPlus.Convert ' Line 29

#End Region

Convert.vb (relevant chunk)

Imports Word = Microsoft.Office.Interop.Word

Namespace Word2MediaWikiPlusPlus

Public Class Convert

#Region "Variables"

        Dim App As Word.Application = Globals.ThisAddIn.Application 'PROBLEM - Line 44
        Dim Doc As Word.Document = App.ActiveDocument 'PROBLEM

#End Region

#Region "Public Subs"
        Public Sub InitializeActiveDocument()

            If Doc Is Nothing Then
                Exit Sub
            End If
…

        End Sub

#End Region


Fixed Code

Convert.vb (relevant chunk)

Imports Word = Microsoft.Office.Interop.Word

Namespace Word2MediaWikiPlusPlus

Public Class Convert

#Region "Variables"

        Dim App As Word.Application 'FIXED 
        Dim Doc As Word.Document 'FIXED 

#End Region

#Region "Public Subs"
        Public Sub InitializeActiveDocument()

            App = Globals.ThisAddIn.Application 'NEW
            Doc = App.ActiveDocument 'NEW

            If Doc Is Nothing Then
                Exit Sub
            End If
…
        End Sub

#End Region

What I Think Went Wrong

As far as I understand it, when the ThisAddIn class tries to create a new instance of the Convert class as a DocumentConversion object, the ThisAddIn object hasn’t finished being instantiated yet. The reference in the Convert class to Globals.ThisAddIn.Application therefore can’t be resolved (how can you get the ThisAddIn.Application object if its parent object, ThisAddIn, doesn’t exist yet?), and that causes the NullReferenceException at the heart of the problem.

By pulling out that instantiation code from the App variable declaration, and delaying it instead to one of the Convert class’s Subs, there was no need for the managed code to “chase its tail” — trying to resolve an object reference back through the calling code, which hadn’t been instantiated yet.
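Here is a minimal sketch of the same trap outside of VSTO, as a plain console program. Everything in it is a hypothetical stand-in: Globals, AddIn, EagerConvert and LazyConvert are made-up names, a plain String plays the part of Word.Application, and the converter is created in the constructor body rather than in a field initializer (VB.NET runs field initializers as part of the constructor, so the timing is the same).

```vbnet
Imports System

' Hypothetical stand-in for the Globals class that VSTO generates.
Public Class Globals
    Public Shared ThisAddIn As AddIn
End Class

' Mirrors the broken Convert class: the field initializer dereferences
' Globals.ThisAddIn as soon as the object is constructed.
Public Class EagerConvert
    Private App As String = Globals.ThisAddIn.Application
End Class

' Mirrors the fixed Convert class: the lookup waits for an explicit call.
Public Class LazyConvert
    Private App As String

    Public Function InitializeActiveDocument() As String
        App = Globals.ThisAddIn.Application
        Return App
    End Function
End Class

Public Class AddIn
    Public Application As String = "Word"
    Public Converter As Object

    Public Sub New(ByVal eager As Boolean)
        ' This runs before the caller has stored the new AddIn in Globals.
        If eager Then
            Converter = New EagerConvert()
        Else
            Converter = New LazyConvert()
        End If
    End Sub
End Class

Module InitOrderDemo
    Sub Main()
        Try
            Globals.ThisAddIn = New AddIn(True)
        Catch ex As NullReferenceException
            Console.WriteLine("Eager: " & ex.GetType().Name)
        End Try

        Globals.ThisAddIn = New AddIn(False)
        Dim lazy As LazyConvert = DirectCast(Globals.ThisAddIn.Converter, LazyConvert)
        Console.WriteLine("Lazy: " & lazy.InitializeActiveDocument())
    End Sub
End Module
```

The eager version fails because the assignment to Globals.ThisAddIn only happens after the constructor returns; the lazy version works because nothing touches Globals until the add-in object already exists.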

Y’know, I’m sure I read somewhere over the last year that combining the declaration with the instantiation of a variable is bound to lead to subtle debugging issues, but man. Losing three days to this? What a disaster.

Lesson for the day: It never pays to take shortcuts.

Another VSTO app idea? Man, I can’t keep up!

Published on 2007-10-25 by paranoidmike (2 Comments)

I’m an avid user of Attensa for Outlook, a free Outlook add-in for aggregating RSS feeds as folders of “messages” in Outlook. I like it because it (a) allows me to search my feeds quickly via Windows Desktop Search, and (b) lets me read my feeds whether I’m connected to the ‘net or not.

However, there isn’t currently a free way to read my feeds via a web browser (e.g. from my new iPhone – hee hee!). Well, I should say I can read my feeds via Google Reader, but my read/unread status doesn’t get sync’ed from Attensa to Google or back. That means if I bravely skim through a bunch of articles in one place, I’ll likely have to wade through them (or get distracted by them) again in the other.

I had a brainwave today (stand back, that could be contagious) about how to add functionality to be able to sync back & forth, and I think I’ve just dreamt up yet another coding project for myself:

http://supportbeta.attensa.com/thread/1081?tstart=0

I have a pretty reasonable idea how to write managed C# or VB.NET that can integrate with Office via the Visual Studio Tools for Office model. I’m not unfamiliar with web services, or with the basics of a .NET-based HTTP client [having just wasted a weekend authoring a very rudimentary web site parser]. I am bright enough to imagine that the Attensa add-in exposes a more abstract approach to addressing feeds & articles than just crawling the raw PST file, enumerating folders and addressing message objects directly.

Now what I’d need to know is: is there an Attensa SDK and/or API which I could leverage in an Outlook application add-in using VSTO? Would there be any advantage to using that abstraction layer, as opposed to just enumerating the PST folders and messages directly? If the Attensa team only exposed an unmanaged API, would I be creating a performance nightmare to code through that (with all the P/Invoke’ing that is required) rather than just take my chances with the native Outlook object model?

I can even imagine that the Attensa client might provide me a way of finding the translation between “articles from feed ‘x’” and “messages in folder ‘y’” that relied on Attensa’s internal database, and then I could grind through the Outlook folders themselves. That’d be a damn sight easier than trying to match up (a) feeds from the Google Reader API (article, wiki) to the folders as they’re named in the PST file, and (b) articles from the Google Reader API to the messages stored in the PST file. It’d sure help if there was an indexed search capability in (a) the Google Reader API and (b) the Outlook PST object model.

Oh, it’s fun to imagine all the ways I could make my life easier…after six months of hard dev work to get there. Madman I am.

Porting Word2MediaWikiPlus to VB.NET: Part 13 (VBA Oddities)

Published on 2007-10-21 by paranoidmike (2 Comments)

[Previous articles in this series: Prologue, Part 1, Part 2, Part 3, Part 4, Part 5, Part 6, Part 7, Part 8, Part 9, Part 10, Part 11 (The Return), Part 12 (Initialization continued).]

How to convert the VBA String() Function?

There’s a more-complicated-than-it-probably-needs-to-be subroutine in the Word2MediaWikiPlus codebase, called MW_SurroundHeader(), that seems to only be there to clean up and reformat text in a Word document that has one of the Headings styles. It uses a function from VBA called simply String(), which is one of the first cases of a VBA function for which I cannot find an equivalent in VB.NET.

It turns out I found what I needed in an oreilly.com article. After running into a few brick walls looking for a reference to this in MSDN, I started a more intelligent search. I kept coming back to references to the String Data Type, so I next looked at the “Strings in Visual Basic” topic that was referenced by “For more information on string manipulation…”. From there the next most logical leap was to “Building Strings in Visual Basic”, which led to “How to: Create Strings Using a StringBuilder in Visual Basic”.

Once there, I figured that since this was so helpful to me, I’d like to save someone the trouble next time so I added a little of that “Community Content” sauce that I myself appreciate so much.
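For anyone landing here with the same question, this is a small sketch of three ways to get the effect of VBA’s String(n, "x") in VB.NET: the String constructor, the StrDup function from Microsoft.VisualBasic, and StringBuilder.Append with a repeat count. The SurroundHeader() helper and its parameters are hypothetical, made up for illustration; I’m not claiming this is how MW_SurroundHeader() uses String() internally.

```vbnet
Imports System
Imports System.Text
Imports Microsoft.VisualBasic

Module StringRepeatDemo

    ' Hypothetical helper: wraps a title in MediaWiki heading markers,
    ' e.g. level 2 gives "== Title ==".
    Function SurroundHeader(ByVal title As String, ByVal level As Integer) As String
        ' VBA would write String(level, "=") here.
        Dim marks As New String("="c, level)
        Return marks & " " & title & " " & marks
    End Function

    Sub Main()
        ' 1. The String constructor that repeats a single character.
        Console.WriteLine(New String("="c, 3))            ' ===

        ' 2. StrDup, the closest named relative of VBA's String().
        Console.WriteLine(StrDup(3, "="c))                ' ===

        ' 3. StringBuilder.Append(Char, Integer), handy inside a larger build-up.
        Dim builder As New StringBuilder()
        builder.Append("="c, 3)
        Console.WriteLine(builder.ToString())             ' ===

        Console.WriteLine(SurroundHeader("Overview", 2))  ' == Overview ==
    End Sub
End Module
```

For a one-off repeat the constructor is probably the least ceremony; the StringBuilder route only earns its keep when the repeated run is one piece of a longer string being assembled.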

Converting the Selection Object from VBA?

The MW_FontFormat() subroutine also uses a no-longer-supported VBA-ism, the Selection object. This isn’t all that well documented online either — or at least, I wasn’t able to find anything useful online to help figure out how to translate this into VB.NET. The best I could find was a mention that the Range object in VB shares some common methods & properties with the Selection object in VBA.

However, I happened to have a copy of an old book called the Microsoft Office XP Developer’s Guide, which was surprisingly results-oriented for an MSPress book. Pages 176-177 actually discuss “The Selection Object vs. the Range Object”, in which I am told that the Range object is actually superior to the Selection object, and should always be favoured wherever possible.

I’m not feeling up to the subtleties of Selection vs. Range right now, so I’ll leave this for another time.

Converting the Font Colour to HTML-compatible values?

This is another interesting puzzler… It seems that MediaWikiConvert_FontColors() calls RGB2HTML(), which calls OleConvertColor(), which calls OleTranslateColor(), which is a p/invoke to OLEAUT32.DLL. [Man, this is starting to read like a book of the Old Testament…]

I have a really strong gut instinct that there’s a managed code equivalent to this that will make the intended conversion in one step, and I intend to find it. There’s no good reason at this point to (a) have this many calls going on the stack, just to get access to a “simple” math function, or (b) to preserve an unmanaged call just because it’s been used all the way up to now.

I can think of at least three ways to try to find the managed class I’m after: search on OleTranslateColor, search on “RGB & HTML”, or start browsing books on managed web development.

According to this “Format Color for HTML” article, the call to OleTranslateColor is only necessary in cases where you’re using “system color constants” or “palette indices”. Since we’re getting very predictable input here that doesn’t appear to be using either of these two alternatives, right away we should be able to eliminate the unmanaged code.

That is, if I’m reading this right, then I should just be able to remove OleConvertColor() from the initial call in RGB2HTML() and leave the first line of code as

nRGBHex = Right("000000" & Hex(rgbColor), 6)

However, upon double-checking, it seems that other code blocks in the VBA macro are passing in some of the Word.WdColor enumeration constants — which I assume are equivalent to “system color constants”.

Rather than have the RGB2HTML() routine always thunk down to unmanaged code, it’d be smarter if we checked whether the color value of interest is a member of the Word.WdColor enumeration. But do the routines that generate the input parameter to RGB2HTML() generate either Long or WdColor values? Or alternatively, would the code implicitly convert from WdColor to Long as the RGB2HTML() routine initialized? I didn’t notice any overloaded instances of RGB2HTML() that took the input parameter as a WdColor value, so I have to assume that no matter what goes on outside this routine, all operations inside RGB2HTML() will only operate on colors of type Long.

If that assumption is correct, then we should be able to safely ignore the possibility that the input parameter may start out as a WdColor datatype, and that means we can safely eliminate the OleConvertColor() and OleTranslateColor() routines. [For the moment, having already had to dig them back up once, I’ll just comment them out and leave myself a note to delete them once I’ve had time to test these colour conversions and confirm this assumption is true.]
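As a sketch of what the all-managed version could look like, assuming the input really is a plain RGB value: a VBA Long is a 32-bit value (an Integer in VB.NET) that keeps red in the lowest byte, so the bytes need to be reordered to get the RRGGBB order HTML expects. Rgb2Html() below is a hypothetical replacement, not the original routine, and the high-byte check is my own guard: a non-zero high byte marks something other than a plain RGB value (system colour constants and palette indices set it), so the sketch refuses those rather than guessing.

```vbnet
Imports System

Module ColorDemo

    ' Hypothetical managed replacement for RGB2HTML().
    ' Assumes rgbColor is a plain RGB value laid out as &H00BBGGRR.
    Function Rgb2Html(ByVal rgbColor As Integer) As String
        ' Anything with bits set in the high byte is not a plain RGB value.
        If (rgbColor And &HFF000000) <> 0 Then
            Throw New ArgumentException("Not a plain RGB value: " & rgbColor)
        End If

        Dim red As Integer = rgbColor And &HFF
        Dim green As Integer = (rgbColor >> 8) And &HFF
        Dim blue As Integer = (rgbColor >> 16) And &HFF

        ' HTML wants the components in red, green, blue order.
        Return String.Format("#{0:X2}{1:X2}{2:X2}", red, green, blue)
    End Function

    Sub Main()
        Console.WriteLine(Rgb2Html(255))        ' #FF0000 (red is the low byte)
        Console.WriteLine(Rgb2Html(&HFF0000))   ' #0000FF (blue is the third byte)
    End Sub
End Module
```

Throwing on the unexpected inputs, instead of silently converting them, should make it obvious during testing whether any caller really does pass in something other than a plain RGB value.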

Colours in VBA vs. Colours in .NET

A more interesting question, however, is whether we’re losing colour fidelity in the conversions being performed here. According to VSTO For Mere Mortals, Chapter 4, “In VBA, colors are of type Long, and there are eight constants that can be used… In Visual Studio 2005, colors are of type Color, and there are more than 100 choices”.

Is it possible that the calls being used to derive the colours from the Active document are limited to the VBA colour constants, and that I should be looking to switch to other calls that return the .NET Color constants? I’ll just add this as another Task to the CodePlex project list, and deal with it later — it seems to me like this is hardly the biggest problem facing this Addin at the moment.

Belly laugh of the day: Hobo Power

Published on 2007-10-15 by paranoidmike

Enjoy – I’m still laughing involuntarily…

Smell- Hobo Power

Porting Word2MediaWikiPlus to VB.NET: Part 12 (initialization continued…)

Published on 2007-10-14 by paranoidmike

[Previous articles in this series: Prologue, Part 1, Part 2, Part 3, Part 4, Part 5, Part 6, Part 7, Part 8, Part 9, Part 10, Part 11.]

MW_Initialize()

Much of this function seems to repeat the actions taken in Word2MediaWikiPlus(), so it’s a bit weird to see it done here as well (since this function is called explicitly by the other). While some of it can be immediately discarded, other bits have to be examined more closely – mostly because they’re poorly documented (at least at the point from which they’re being called):

- Again we have an ImagePath enumeration and/or creation
  - There’s an interesting new function I haven’t seen before: IIf(x,y,z)
    - VBA For Dummies tells me it does “test for ‘x’; if true, do ‘y’; if false, do ‘z’”. Fairly tidy little function there.
  - Looking deeper into what’s going on here, the macro is assigning the ImagePath setting to a folder named “wiki” under the user’s My Pictures folder
  - This doesn’t make a lot of sense for a folder of temporary files that are deleted at the end of the session (or before the beginning of the next)
  - Therefore I’m going to make two changes:
    - this folder will be created as a subfolder of the user’s %TEMP% location
    - this folder will not only be emptied at the beginning of a session, but (as a good citizen of the computer) it will also empty its contents once it has completed a conversion
- Again we have an EditorPath enumeration
  - it appears that the only path being set is the Microsoft Photo Editor (which we’ve previously confirmed is no longer available)
  - Is there any way to actually perform the image manipulation to which so much code has been devoted?
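The two ImagePath changes above can be sketched roughly like this. The function names are hypothetical, the “wiki” folder name is carried over from the macro, and for simplicity the sketch removes the whole folder on cleanup rather than just emptying it.

```vbnet
Imports System
Imports System.IO

Module ImagePathDemo

    ' Hypothetical helper: returns a fresh, empty "wiki" folder under %TEMP%.
    Function PrepareImageFolder() As String
        Dim folder As String = Path.Combine(Path.GetTempPath(), "wiki")

        ' Start of session: throw away anything left over from last time.
        If Directory.Exists(folder) Then
            Directory.Delete(folder, True)
        End If
        Directory.CreateDirectory(folder)
        Return folder
    End Function

    ' Hypothetical helper: the good-citizen cleanup once a conversion completes.
    Sub CleanUpImageFolder(ByVal folder As String)
        If Directory.Exists(folder) Then
            Directory.Delete(folder, True)
        End If
    End Sub

    Sub Main()
        Dim folder As String = PrepareImageFolder()
        File.WriteAllText(Path.Combine(folder, "scratch.txt"), "temporary")
        Console.WriteLine(Directory.GetFiles(folder).Length)   ' 1

        CleanUpImageFolder(folder)
        Console.WriteLine(Directory.Exists(folder))            ' False
    End Sub
End Module
```

One thing worth knowing before leaning on IIf() in the port: it is an ordinary function, so both the ‘y’ and ‘z’ arguments are evaluated no matter how the test comes out. A plain If...Else block, as used above, avoids any surprises from that.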

The more I look at this image extraction code, the more complicated it gets. At this point I’ve pretty much determined that, for all the effort it’ll cost to implement these image features, it’s just not worth the trouble in v1. I’ll continue to add TODO: comments to the VSTO add-in to show where the image code will eventually go, but I’m not going to do any further work to understand the image code until the rest of the Add-in is working.

Finally, there are the control characters that are being assigned (^l, ^m, ^p, ^s). They’re not documented in the code, and I’m having a hard time finding any documentation that discusses the use of these control characters. It doesn’t help that Google and MSDN Search don’t seem to allow you to search on “^p”; it seems they treat this as either “p” or “<p”.

I believe I could treat these as global constants in the Convert class, but what isn’t clear is whether these control characters are:

- special substitutions in Word, and will get converted to the native Word paragraph/new line/blank/page break code (in which case I should just use the native VSTO/VBA enumerations), or
- treated by Word as ASCII text and sent to the Wiki server, which converts them to HTML when displaying the resulting article (in which case I should probably make sure there isn’t a better way to represent these in MediaWiki format).

Aha! After trying over & over, I finally came up with a search in Microsoft’s Knowledge Base that gave me an article talking about the “^p” (which it calls a “paragraph mark”):

WD2000: Text Converted to One-Row Table (Paragraph Marks Ignored)

These appear to be ancient character sequences (as early as Word 1.0), so I’m going to first try using the native Word enumerations for these character strings wherever possible. If I have to go back to using these character sequences, then I’ll drop them back in to the Convert class as Constants.
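As a starting point for that swap, here is a sketch of the mapping as I understand it: these are the special codes from Word’s Find and Replace dialog, and each one stands for a single character in the document text. The function name is hypothetical, and the mapping is worth verifying against a real document before relying on it.

```vbnet
Imports System
Imports Microsoft.VisualBasic

Module FindCodeDemo

    ' Hypothetical helper: translates a Find/Replace code into the
    ' character it is understood to represent in the document text.
    Function FindCodeToChar(ByVal code As String) As Char
        Select Case code
            Case "^p" : Return ControlChars.Cr           ' paragraph mark
            Case "^l" : Return ControlChars.VerticalTab  ' manual line break
            Case "^m" : Return ControlChars.FormFeed     ' manual page break
            Case "^s" : Return ChrW(160)                 ' nonbreaking space
            Case Else
                Throw New ArgumentException("Unknown code: " & code)
        End Select
    End Function

    Sub Main()
        For Each code As String In New String() {"^p", "^l", "^m", "^s"}
            Console.WriteLine(code & " -> character code " & AscW(FindCodeToChar(code)))
        Next
    End Sub
End Module
```

If the macro only ever feeds these codes into Find and Replace operations, they may be able to stay exactly as they are; the mapping matters mostly where the code builds or inspects document text directly.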

Aside: today I stumbled on an invaluable reference: the Microsoft Word Visual Basic Reference online. This implies it’s an authoritative reference for all VBA available in Microsoft Word. Should prove useful.

MediaWikiConvert_Prepare

From what I can tell by a single read through this routine’s code, this all appears to affect the ActiveDocument. That means all this code can go into the InitializeActiveDocument() subroutine (which I’ve conveniently already defined).

MW_SetOptions_2003() is just caching the Application.Options.SmartParaSelection value and then restoring it once conversion is complete. This can be handled as with the other cached settings.

I don’t understand this code fragment at all:

    'Now, if we might have some problems, if we are in a table
    pg.Range.Select
    If Selection.Information(wdWithInTable) Then Selection.SplitTable

If a variable like convertPageHeaders was always False (as I can’t find anything that sets it True), then why would such a huge block of code be hidden inside this code block?
```
If GetReg("convertPageHeaders")... End If
```
It’s just hard to guess what the programmer’s intentions were with a never (rarely?) called piece of code.

Then there’s a lot of boring code conversion, where I’m just giving methods and variables more meaningful names, adding appropriate prefixes to all the Word enums being used, and just commenting the crap out of things where I don’t have a clue how to fix some weird or cryptic code routines.

Reference to a non-shared member requires an object reference

The most interesting thing I’ve had to research so far was the problem I created for myself by implementing the code into two classes (so far). I finally got around to calling the Convert class’ public methods in the ThisAddin class’ uiConvert_Click() handler. As the naive little programmer that I am, I of course first tried to just set the Imports statement at the top of the ThisAddin class, and then call the public methods “naked” like so:

        InitializeActiveDocument()

        InitializeConversion()

        PerformConversion()

Of course that didn’t work, but I didn’t know why at the time. Instead, I scratched my head for quite a while over how to handle the compiler error “Error 232: Reference to a non-shared member requires an object reference”.

I’ve run up against this before, and I’m pretty sure I was lured at the time down the path to hell: I started adding Shared declarations all over the place. It’s really tempting — when the IDE implies you should try an easy fix like this, it’s hard to know why this should be bad. “Didn’t the IDE’s developers know what they were doing?” “Why would they lead morons like me astray?”

Unfortunately, this is akin to tugging at that first loose strand of a nice wool sweater: pretty soon I’d added so many additional Shared declarations that I’m sure the code was wide open to all sorts of future, stealthy issues I have no idea about.

This time around, once I saw that one Shared begat yet another implied request to add another Shared declaration, I stopped and did some further digging around. While I wasn’t able to find any articles or MSDN docs that really spelled it out for me, I think I figured out a worthy approach on my own. [This forum thread was as good as any.]

I’ve published the following as Community Content to the “Error 232” page on MSDN.

Avoid adding the Shared keyword

While this error message tempts the inexperienced programmer with the “easy” solution of just adding the Shared keyword to the requested Method, I advise strongly against it. Unfortunately there’s little documentation or advice out there aimed at the programmers like me who don’t really understand the problems they’ve created, nor the trade-offs in the possible solutions being (cryptically) recommended. Hopefully this’ll help out other folks like myself avoid the really nasty mistake I’ve already made a few times.

The trouble with adding the Shared keyword to a second Class’ Method is that it rarely stops there. Once you’ve shared a method, whether Public, Private or otherwise, many of that method’s members will also need adjustments. At least in my experience, the first Shared keyword will work as well as cutting off the Hydra’s head: it usually leads to one or more instances of the error “Error 227: Cannot refer to an instance member of a class from within a shared method or shared member initializer without an explicit instance of the class.” The first time I tried to kill this Hydra, I had tried to rewrite a bunch of code, and ended up with a rat’s nest of Shared keywords scattered everywhere.

A Better Approach than Adding the Shared Keyword

As the advice on this page (cryptically) recommends, try creating an instance of the class. The big fear that initially scared me off was that I’d end up either (a) unknowingly creating and destroying tons of unnecessary instances of that Class as objects, or (b) not understanding when the object I’d created fell out of scope (and would creep up on me with unpredictable garbage collection-derived errors).

What I did to alleviate this issue was to declare a “class-level” variable in the calling class of the type of the class being called, and then use that variable as the root of all subsequent uses of the called class’ methods.

This example should illustrate:

' Note: Imports statements belong at the top of the file, outside any class.
' Importing BusinessLogic doesn't help with Error 232, and may not be necessary at all.

Public Class BusinessLogic   ' This is the "called" class
    Public Sub PerformAction()
        Action()
    End Sub
    Private Sub Action()
        ' ...
    End Sub
End Class

Public Class UserInterface   ' This is the "calling" class
    WithEvents uiButton As Microsoft.Office.Core.CommandBarButton
    Dim documentLogic As New BusinessLogic ' class-level variable
    Private Sub uiButton_Click(ByVal Ctrl As Microsoft.Office.Core.CommandBarButton, ByRef CancelDefault As Boolean) Handles uiButton.Click
        ' PerformAction()  ' Uncommenting this line causes Error 232
        documentLogic.PerformAction() ' This call is OK
    End Sub
    ' ...
End Class

Y’know, sometimes I’m just documenting this stuff for myself, since I know that in a few weeks’ time I’ll have completely forgotten the solution and the logic behind it. The rest of you happen to be benefiting from my lack of memory, and I wish I could say I was being completely selfless, but I’m getting too old to be lying to folks I never even met. 🙂
