Suggestion of Hash Table of Parameters

Correlation

  • FIX Parsing (Felipe)
      ◦ Suggestion of a hash table of parameters (see the sketch after this list)
      ◦ Service may go through a parser to the API or directly call the API
  • Matches, Strategies & Rulesets (Matt)
      ◦ A match is a single correlation point with a variable number of parameters; it links messages together
      ◦ A strategy’s strength is the sum of its matches; ??? (def)
      ◦ Ruleset … ??? (def)
      ◦ Current issues: efficiency, matching without provided rules, substring matching
  • Demo (Brad)
      ◦ AssociationStore keeps track of matches after correlation
      ◦ CorrelationEngine is mutable: able to dynamically recognize more sets of messages
      ◦ Each message is given a unique identifier
      ◦ Matches on the trivial case of a static message ID (messages arrive unordered and are not totally identical)
      ◦ Composite patterns run on totally abstract strategies
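
A minimal sketch of how the suggested hash table of parameters and a match might fit together. The names (ParsedMessage, Match, correlate, the ClOrdID field) are hypothetical and only illustrate linking messages through shared parameter values; this is not the actual implementation.

    import java.util.*;

    // Sketch: a parsed message is a hash table of parameters, and a match links
    // two messages that agree on the value of one parameter.
    public class MatchSketch {

        // A parsed FIX/XML message reduced to a hash table of parameters (name -> value).
        static final class ParsedMessage {
            final String id;                       // unique identifier assigned to each message
            final Map<String, String> parameters;  // e.g. "ClOrdID" -> "A123"

            ParsedMessage(String id, Map<String, String> parameters) {
                this.id = id;
                this.parameters = parameters;
            }
        }

        // A match is a single correlation point: two messages linked by a shared parameter.
        static final class Match {
            final String parameter;
            final ParsedMessage left, right;

            Match(String parameter, ParsedMessage left, ParsedMessage right) {
                this.parameter = parameter;
                this.left = left;
                this.right = right;
            }
        }

        // Correlate on one parameter: messages with equal values for it are matched.
        static List<Match> correlate(String parameter, List<ParsedMessage> messages) {
            Map<String, ParsedMessage> seen = new HashMap<>();
            List<Match> matches = new ArrayList<>();
            for (ParsedMessage m : messages) {
                String value = m.parameters.get(parameter);
                if (value == null) continue;                  // message does not carry this parameter
                ParsedMessage earlier = seen.putIfAbsent(value, m);
                if (earlier != null) matches.add(new Match(parameter, earlier, m));
            }
            return matches;
        }

        public static void main(String[] args) {
            ParsedMessage a = new ParsedMessage("msg-1", Map.of("ClOrdID", "A123", "Side", "BUY"));
            ParsedMessage b = new ParsedMessage("msg-2", Map.of("ClOrdID", "A123", "ExecType", "FILL"));
            for (Match match : correlate("ClOrdID", List.of(a, b))) {
                System.out.println(match.left.id + " <-> " + match.right.id + " on " + match.parameter);
            }
        }
    }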

Architecture (Chelsea)

  • Brief description of data flow (controller started → data client registered → message router and correlation engine created → messages from the data client can be dynamically routed through the system to the proper correlation engine); see the sketch below
  • More for the class’s sake, as the stubs only contain print statements, but it shows that the system builds and that the data flow is sensible
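
A minimal sketch of that data flow with print-statement stubs, in the spirit of the current build; the class names (DataClient, MessageRouter, CorrelationEngine) are illustrative, not the actual module names.

    // Illustrative stubs only: controller started -> data client registered ->
    // router and engine created -> messages routed to the proper correlation engine.
    public class DataFlowSketch {

        static final class CorrelationEngine {
            void correlate(String message) { System.out.println("correlating: " + message); }
        }

        static final class MessageRouter {
            private final CorrelationEngine engine;
            MessageRouter(CorrelationEngine engine) { this.engine = engine; }
            void route(String message) {
                System.out.println("routing: " + message);
                engine.correlate(message);   // hand off to the proper correlation engine
            }
        }

        static final class DataClient {
            private MessageRouter router;
            void register(MessageRouter router) {
                this.router = router;
                System.out.println("data client registered");
            }
            void receive(String rawMessage) { router.route(rawMessage); }
        }

        public static void main(String[] args) {
            System.out.println("controller started");
            DataClient client = new DataClient();
            MessageRouter router = new MessageRouter(new CorrelationEngine());
            client.register(router);
            client.receive("8=FIX.4.2|35=D|11=A123");   // message flows through the system
        }
    }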

Licensing

  • We have boilerplate license text (Dave: put on customer site)
  • JPM lawyers are not yet in a position to look into it
  • Need an invention disclosure or NDA about prior IP; an open BSD-style license should cover it and still allow future modifications to remain private
  • Need to be careful with using Mule, but we have a packaging strategy to avoid conflicts

Messaging (Dave)

  • Needed to create a .jar and XML file to get Mule running; .bat file starts the server
  • Can send the IP address, newline, and message; no exception handling right now
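
A rough sketch of the kind of send described here (IP address, a newline, then the message) over a plain TCP socket. The host, port, and payload are placeholders, and, matching the current state, there is no exception handling beyond letting IOException propagate.

    import java.io.IOException;
    import java.io.OutputStream;
    import java.net.InetAddress;
    import java.net.Socket;
    import java.nio.charset.StandardCharsets;

    // Hypothetical client: writes "<ip>\n<message>" to the Mule-hosted TCP endpoint.
    public class SendSketch {
        public static void main(String[] args) throws IOException {
            String host = "localhost";   // placeholder for the Mule server host
            int port = 5555;             // placeholder port
            String message = "8=FIX.4.2|35=D|11=A123";

            try (Socket socket = new Socket(host, port)) {
                OutputStream out = socket.getOutputStream();
                String payload = InetAddress.getLocalHost().getHostAddress() + "\n" + message;
                out.write(payload.getBytes(StandardCharsets.UTF_8));
                out.flush();
            }
        }
    }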

Questions

Jacy: Are particular correlation tasks tied to a given engine?

  • Brad: different sets can be correlated as desired, flexibly
  • Matt: could theoretically parallelize correlation engine across multiple machines and tie it back together at the view end

Felipe: Further description of multiple-version requirement?

  • Jacy: There is a great need for this service (though the request came from a particular team); there are many systems that need this service, but deployment limitations will not allow all of them to work off the same instance of Atropos. The different instances won’t need to correlate against one another, but they will need to coexist.

Felipe: How abstract should the correlator be?

  • Jacy: On the right track in terms of abstraction and allowing many strategies. Hard to imagine something more generic, just keep the implementation separate from the data semantics. Should be able to handle something other than financial traffic.

Matt: What about substrings?

  • Jacy: The parser should handle substrings and describe them as fields (don’t waste the correlator’s time).
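
A small illustration of pushing substring handling into the parser: the raw line is split into named fields up front, so the correlator only ever sees the hash table of parameters. The "key=value|..." log format and field names are made up for the example.

    import java.util.LinkedHashMap;
    import java.util.Map;

    // Hypothetical parser step: extract substrings from a raw log line and expose them
    // as fields, so the correlator never does substring matching itself.
    public class SubstringFieldSketch {
        static Map<String, String> parse(String rawLine) {
            Map<String, String> fields = new LinkedHashMap<>();
            for (String pair : rawLine.split("\\|")) {
                int eq = pair.indexOf('=');
                if (eq > 0) fields.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
            return fields;
        }

        public static void main(String[] args) {
            System.out.println(parse("time=09:30:01|ClOrdID=A123|text=NEW ORDER"));
        }
    }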

Felipe: Where can the log parser live?

  • Jacy: It’s reasonable to expect the ability to put the parser on the JPM client, but have a backup in case it’s too intrusive. On that note, keep it lightweight to avoid intrusion. Otherwise, have the ability to send across raw strings and translate them on another machine. Again, don’t interfere!
  • Matt: We want to keep the location of each module as abstract as possible.

Dave: How real time does it need to be?

  • Jacy: As real as possible (with the understanding that it will be asynchronous). The only acceptable delay should be the time taken to actually read the files (obviously no control over JPM systems’ read-time). JPM has a “backbone” system that Atropos could buy time on and get some automatic horizontal scale. The main point is that larger inputs shouldn’t affect speed; should at least have a proof of concept that the design handles scaling. Should take advantage of both horizontal and vertical scaling.

Matt: As messages are collected or the view is updated, it seems beneficial to aggregate them.

  • Jacy: Anything that helps performance is acceptable, though time-boxing is not preferable. A more organic approach to batching that uses worker queues would be better and take advantage of multithreading (when a certain state is reached or an I/O operation is about to happen, etc.). Would need to wrap up the batching and see if it’s finished before the queue is updated again (maybe using System.yield). If more things are batched while waiting for I/O operations, they should also be sent with that I/O rather than waiting for the I/O to complete—this scales logarithmically rather than linearly and avoids blocking past the needs of the I/O system. Need to do some testing to figure out where the breaking point of diminishing returns falls.
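
One way to read the "organic batching" idea as code: a worker thread drains whatever has accumulated on a queue and sends it as a single batch, so messages arriving while a write is pending simply ride along with the next write instead of blocking. The queue type, the println standing in for the I/O operation, and the producer loop are all illustrative.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;

    // Illustrative worker-queue batching: drain everything currently queued and write it
    // in one I/O operation; later arrivals join the next batch.
    public class BatchingSketch {
        private static final BlockingQueue<String> queue = new LinkedBlockingQueue<>();

        public static void main(String[] args) throws InterruptedException {
            Thread worker = new Thread(() -> {
                try {
                    while (!Thread.currentThread().isInterrupted()) {
                        List<String> batch = new ArrayList<>();
                        batch.add(queue.take());   // block until at least one message exists
                        queue.drainTo(batch);      // grab whatever else has piled up
                        System.out.println("writing batch of " + batch.size() + ": " + batch);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            worker.start();

            for (int i = 0; i < 10; i++) queue.put("msg-" + i);   // producer side
            Thread.sleep(200);                                    // let the worker flush
            worker.interrupt();
        }
    }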

Matt: Are there any foreseeable events of interest that the view client should take into account?

  • Jacy: Not really. Leave it extensible, but don’t worry about implementation. It’s more likely that we publish the event without a UI so others can figure out the view—plan to be able to send an alert without handling it.

Matt: Description of log files’ system architecture (FIX and XML).

  • Jacy: The FIX engine is market-facing, and the XML messages are client-facing. Message travel is two-way across two machines: XML → FIX → market → FIX → XML

Brad: Can we correlate messages that go to the market and come back?

  • Jacy: Not sure; this would be nice but would involve market latency and some obfuscation. Banks try to hide their activities to fool one another, so it’s an interesting case to figure out how to simultaneously make trades across several markets (factoring in latency accurate to the millisecond). External latency is definitely of interest, so don’t rule it out—the difference being that only the first response is interesting. Don’t be distracted by this use case.
  • Matt: If abstracted properly, we may be able to offer this.
  • Brad: We’d need to test the complexity.

Other

  • Feel free to send questions to Jacy (not as interested in status updates)
  • Mondays and Wednesdays are equally good in general
  • Timeline status: admin API not up, but other parts are there
  • Within the next two weeks or so (February 20 @ 1 pm); want a nice view client; bringing a user from the Operate team