Mirroring/intercepting SunPower Monitoring Traffic?


  • robillard
    replied
    Originally posted by astroboy
    the sunpower monitoring outage a couple of weekends ago was interesting - the supervisor seemed to keep retransmitting a whole load of data points for some number of hours, and then it gave up. when the monitoring came back on line it eventually transmitted everything that was pending, but it seems to have taken many hours of successful "live" transmissions for it to decide to replay the failed messages.
    Indeed! Since my script _is_ stateful, and I serve up my own page with a graph, my stuff kept going despite the sunpower outage, just based on the data that the supervisor kept trying to send. Combine that with the finer resolution of the differential data (there's some heavy rounding going on on the sunpower site!), and I'm quite happy with the results!

    Originally posted by astroboy
    my script is once again stateful; without checking the latest data point already submitted to PVOutputs, i pretty quickly overran the API request limit while the supervisor was sending the same stuff over and over again. also the lack of packet reassembly in Net::PcapUtils kind of sucks; with the big bursts of data the packets are fragmented. for whatever reason, under normal circumstances, if the supervisor sends a long packet, the 130 messages come early enough in the packet that they are not split by a fragmentation boundary. however when the supervisor was backed up there were lots of 130 messages in big, fragmented packets and so i missed some of them. Net::Pcap is kind of a pain in the butt so i'll probably leave well enough alone, it seems to work well enough.
    I have two scripts: the first simply handles the Pcap stuff and streams just the supervisor->sunpower traffic, extracted as text, to stdout, and the second acts on the data. The second script spawns the first, and since capturing requires promiscuous mode on the network interface, which requires sudo, this has the added benefit that only the Pcap script needs to run under sudo. The decoupling means the first script is unlikely to overflow the Pcap kernel buffer (I've made that big enough anyway that it would be very unlikely to happen!), and the second script only needs to deal with a stream of text lines, which is easy.
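    The split looks roughly like this (sketched in Python rather than my actual Perl; the capture command name is made up):

    ```python
    import subprocess

    def stream_capture(cmd):
        """Spawn the (privileged) capture process and yield its stdout lines.

        In the setup described above, cmd would be something like
        ["sudo", "./pcap_extract.pl"] (hypothetical name); only that child
        runs under sudo, and this side just consumes lines of text.
        """
        proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True)
        try:
            for line in proc.stdout:
                yield line.rstrip("\n")
        finally:
            proc.stdout.close()
            proc.wait()
    ```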

    Also, turns out my net (and thus consumption) numbers were right after all (and track closely to the sunpower numbers, albeit without the massive rounding errors of the sunpower system); there was a flaw in the logic that summarizes each day's cumulative numbers. So the data was correct, just my processing sucked... I can fix that, I'm sure...

    Thanks for the response, and for all your help working through this (and the revelation that the wonky 102 messages really are most likely simply a checksum), much obliged for the help!



  • DanKegel
    replied
    Originally posted by astroboy
    anyone who can build either one of those things probably also has the technical chops to just write the script from scratch...
    And yet it would be easy for you to check the script in to github.... maybe someone else would contribute polish to address those other issues. Give it a try, and be sure to credit that other project!



  • astroboy
    replied
    Originally posted by DanKegel
    Has anyone written something like https://github.com/jbuehl/solaredge but for sunpower? (hint, hint )
    the section of my script that submits to PVO was ripped from that script!

    i think the problem is that most people won't be able to make use of such a script. in order to run it, you either have to build some kind of ethernet bridge out of a raspberry pi, or you have to have a home-made router which runs linux and can execute arbitrary scripts. probably even a D-link or similar running DDWRT or tomato would be a tall order, since as written my script relies on a whole lot of perl modules that have to be installed from CPAN. not sure how easy it is to install/compile all the perl support on one of those machines.

    given all that, anyone who can build either one of those things probably also has the technical chops to just write the script from scratch...



  • DanKegel
    replied
    Has anyone written something like https://github.com/jbuehl/solaredge but for sunpower? (hint, hint )



  • astroboy
    replied
    Originally posted by robillard

    astroboy, which number are you using for net? I'm using this first entry from the 140 messages for net, since that appears to be a lifetime net...
    my system does not have power monitoring - instead i am using an EAGLE, uploading that to wattvision, and then PVOutputs pulls that from WV. so basically i ignore every message except for 130... sorry i can't be of much help there.

    the sunpower monitoring outage a couple of weekends ago was interesting - the supervisor seemed to keep retransmitting a whole load of data points for some number of hours, and then it gave up. when the monitoring came back on line it eventually transmitted everything that was pending, but it seems to have taken many hours of successful "live" transmissions for it to decide to replay the failed messages.

    my script is once again stateful; without checking the latest data point already submitted to PVOutputs, i pretty quickly overran the API request limit while the supervisor was sending the same stuff over and over again. also the lack of packet reassembly in Net::PcapUtils kind of sucks; with the big bursts of data the packets are fragmented. for whatever reason, under normal circumstances, if the supervisor sends a long packet, the 130 messages come early enough in the packet that they are not split by a fragmentation boundary. however when the supervisor was backed up there were lots of 130 messages in big, fragmented packets and so i missed some of them. Net::Pcap is kind of a pain in the butt so i'll probably leave well enough alone, it seems to work well enough.
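    the statefulness amounts to something like this (python sketch, not my actual perl; the timestamp format is just whatever comes out of the capture):

    ```python
    class SubmitGate:
        """remember the last data point submitted, so replayed 130 messages
        don't get re-posted and burn through the PVOutput API request limit."""

        def __init__(self, last_ts=None):
            # last_ts can be seeded from the latest point already on PVOutput
            self.last_ts = last_ts

        def accept(self, msg_ts):
            if self.last_ts is not None and msg_ts <= self.last_ts:
                return False  # retransmitted or stale message: skip it
            self.last_ts = msg_ts
            return True
    ```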



  • robillard
    replied
    So I've been running with this sniffer for quite a while now, and while my production numbers are spot on, my consumption numbers are off. They tend to be lower (sometimes significantly) than the SunPower website's numbers. Some of this I can see, due to the SunPower monitoring system's rounding, but not to the extent that I'm seeing...

    astroboy, which number are you using for net? I'm using this first entry from the 140 messages for net, since that appears to be a lifetime net...



  • robillard
    replied
    Originally posted by astroboy
    i ran all day today using the "difference in total lifetime energy method" and PVOutputs shows 6W more than the old sunpower monitoring site. based on observations, i think the monitoring website is truncating the 3 decimal digits to 2 instead of rounding them, so the two values are probably exactly the same. going to declare victory.
    Yes, I'm seeing similar results. I think the sunpower stuff is probably rounding inaccurately (or, to be cynical, maybe optimistically, to inflate production numbers?)...

    Originally posted by astroboy
    also i looked at the return traffic from the monitoring server, and what i'm seeing is a 4-digit message number coming back (1002) with a UTC timestamp. this timestamp does not always match what goes to the server, so it's probably not used as a protocol level "ack". the MD5sum (or whatever) in message 102 does not come back either. interestingly in the return packet (which is an http response) they set a cookie with a session ID, but it's set to expire 0 or 1 seconds after the date in the http header. i suppose that all that's happening at a protocol level here is http: an HTTP request (containing the inverter data) is made, which then expects an HTTP response. if the supervisor/inverter doesn't get the response, i guess it just makes the request again. so there does not appear to be any other "custom" protocol at work here.
    Interesting, I have been ignoring traffic coming back to the supervisor, and have been blissfully unaware that there was any...


    Originally posted by astroboy
    also, i saw the supervisor repeat some 130 messages today. as long as there are not multiple outstanding messages this works out OK - the duplicate message causes my script to write PVOutput with the same data again, since the difference in the inverter total lifetime energy is 0. if multiple 130s are repeated out-of-order then my calculations will get screwed up.
    Yeah, I saw this recently as well: a whole burst/flurry of past traffic being re-posted in one big lump to the monitoring server. I don't know what's causing this, though, since it would appear that the server is ACKing all the data. Perhaps something higher-level in the application protocol between the two is triggering this? Since my state-machine is also paying attention to the timestamp, this is not a problem for me, as I will simply ignore any messages with timestamps less than my expected next timestamp. I will have to deal properly with "holes" in the stream at some point...

    Originally posted by astroboy
    i guess what happens when the supervisor has been out of contact with the monitoring website for a long time is still an open question, and one day i'll have to test that to make sure the script handles that corner case. will be interesting to see if multiple 130/131s go out with one 102 protecting them in a single HTTP request.
    I think there are two things to deal with:
    - when the server is down, but the supervisor still wants to send data
    - when the power is out locally, and my supervisor is not sending data to the server

    I'll have to observe the system over time, and figure out how to react to these...



  • astroboy
    replied
    should have expected this but longer status messages are fragmented by TCP which makes capturing the traffic properly a little more difficult. i dunno if it's path MTU stuff or what but the fragmentation threshold is quite low, like 600 bytes.
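    the fix would be to buffer each connection's payload bytes and only act on complete messages, roughly like this (python sketch; the newline framing is just illustrative, the real payload is an HTTP body):

    ```python
    class StreamReassembler:
        """accumulate TCP payload bytes and emit only complete messages, so a
        130 record split across a segment boundary isn't silently dropped."""

        def __init__(self):
            self.buf = b""

        def feed(self, payload):
            """add one captured segment's payload; return finished messages."""
            self.buf += payload
            *complete, self.buf = self.buf.split(b"\n")
            return complete
    ```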



  • astroboy
    replied
    it gets better - i just realized that PVOutput will accept a lifetime cumulative energy value for v1 if you pass "1" as argument c1. it resets the cumulative daily value when it receives the first data point on a given day. so pretty much all you have to do is fish out the lifetime energy, convert to watt-hours, and post it. repeated posts of the same date/v1 pair should still be OK, i hope. we'll find out tomorrow...
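    the post itself boils down to building a request like this against the addstatus endpoint (python sketch; the API key / system id headers that the service also requires are omitted here):

    ```python
    from urllib.parse import urlencode

    PVOUTPUT_ADDSTATUS = "https://pvoutput.org/service/r2/addstatus.jsp"

    def lifetime_post_params(date, time, lifetime_kwh):
        """build the addstatus query for a cumulative lifetime energy value:
        c1=1 marks v1 as lifetime watt-hours, and PVOutput derives each
        day's figure from the first data point it receives that day."""
        return urlencode({
            "d": date,                         # YYYYMMDD
            "t": time,                         # HH:MM
            "v1": round(lifetime_kwh * 1000),  # kwh -> watt-hours
            "c1": 1,                           # flag v1 as cumulative
        })
    ```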



  • astroboy
    replied
    i ran all day today using the "difference in total lifetime energy method" and PVOutputs shows 6W more than the old sunpower monitoring site. based on observations, i think the monitoring website is truncating the 3 decimal digits to 2 instead of rounding them, so the two values are probably exactly the same. going to declare victory.
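    the truncation-vs-rounding difference is easy to see (python sketch):

    ```python
    def truncate2(x):
        """drop, rather than round, everything past two decimals, which is
        what the monitoring website appears to do to the 3-decimal kwh values."""
        return int(x * 100) / 100

    # a 0.119 kwh interval shows as 0.11 if truncated but 0.12 if rounded,
    # enough to explain a few-watt-hour discrepancy over a day
    ```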

    also i looked at the return traffic from the monitoring server, and what i'm seeing is a 4-digit message number coming back (1002) with a UTC timestamp. this timestamp does not always match what goes to the server, so it's probably not used as a protocol level "ack". the MD5sum (or whatever) in message 102 does not come back either. interestingly in the return packet (which is an http response) they set a cookie with a session ID, but it's set to expire 0 or 1 seconds after the date in the http header. i suppose that all that's happening at a protocol level here is http: an HTTP request (containing the inverter data) is made, which then expects an HTTP response. if the supervisor/inverter doesn't get the response, i guess it just makes the request again. so there does not appear to be any other "custom" protocol at work here.

    also, i saw the supervisor repeat some 130 messages today. as long as there are not multiple outstanding messages this works out OK - the duplicate message causes my script to write PVOutput with the same data again, since the difference in the inverter total lifetime energy is 0. if multiple 130s are repeated out-of-order then my calculations will get screwed up.

    i guess what happens when the supervisor has been out of contact with the monitoring website for a long time is still an open question, and one day i'll have to test that to make sure the script handles that corner case. will be interesting to see if multiple 130/131s go out with one 102 protecting them in a single HTTP request.





  • astroboy
    replied
    alright, glad we got this sorted out... my script does not do any local logging, but i think what i'll do is just have it compute the very first interval the wrong way with the AC power (which at this point i'm assuming is probably peak), then switch to using the lifetime consumption. the error should be pretty tiny.

    i guess your 3 vs 4 decimal digit difference could be down to different firmware revisions in the inverter or supervisor? anyway here's hoping we continue to have 1Wh resolution even after 9999kwh.



  • robillard
    replied
    Originally posted by astroboy
    i was looking at the packets a little more while responding to you and realized that 130.1 seems to contain total lifetime energy expressed in kwh. because my system has not been up for that long, this number currently reads out as 1549.341 kwh - there are 3 decimal digits. looking at some of these fields, it seems that they might have a fixed number of digits and the decimal point just floats around, decreasing the accuracy as the number gets bigger. what's your lifetime production in 130.1 and does it still have 3 decimal digits?
    Weird, two months ago in a capture I did (that I still have lying around), my 130.1 number was 4.4 digits. I just (mid last month) upgraded my solar install with more panels and a (new) larger inverter, so the 130.1 number restarted, and now I have 3.3 digits! Don't know why I lost that extra trailing digit of precision... (If it had been floating-decimal precision, then I should be getting 3.5!) We shall see how the number varies over time...

    Originally posted by astroboy
    if the 3 decimal places holds even as you get to 10,000+ kwh then i think you can just derive the last interval's production from the difference between the prior lifetime and the current lifetime production...
    Yes, I have this in my notes from a few months ago when last I looked at this:

    TrueIntervalNet == "Net Meter Total Lifetime Energy(kWh)"[t0] - "Net Meter Total Lifetime Energy(kWh)"[t-1]
    ProdInterval == "Inverter Cumulative Energy Produced(kWh)"[t0] - "Inverter Cumulative Energy Produced(kWh)"[t-1]

    But the "Inverter Cumulative Energy Produced" value appears to be synthetic, and it never occurred to me (don't know why) to just use the "Inverter Total Lifetime Energy(kWh)" value instead... And sure enough, that works! If I look at the (t0)-(t-5) values for 130.1 and compare them against the interval production values that I get from the SunPower monitoring site, they are on track (albeit the SP supervisor or website is rounding). Here are some examples:

    "Inverter Total Lifetime Energy(kWh)" (130.1) numbers from packet trace:
    09:30: 2129.7991
    09:35: 2129.9181
    09:40: 2130.0261
    09:45: 2130.1481

    (t0)-(t-5) delta (kWh):
    09:35: 0.1190
    09:40: 0.1080
    09:45: 0.1220

    "System Interval Energy Produced(kWh)" from SunPower monitoring:
    09:30: 0.09
    09:35: 0.12
    09:40: 0.11
    09:45: 0.12

    Cool beans, man, that works!

    For the consumption monitoring, it's only slightly trickier: the "Net Meter Total Lifetime Energy(kWh)" value is net, i.e. production minus consumption, so it's pretty easy math to get the consumption value based on that.
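    The arithmetic works out to a few lines (Python sketch; the net-meter lifetime values below are invented for illustration, only the 130.1 production numbers come from the trace above):

    ```python
    def interval_deltas(prod_life, net_life):
        """Interval figures from two consecutive lifetime readings (kWh):
        production is the 130.1 delta, net is the net-meter delta, and
        since net = production - consumption, consumption = production - net."""
        production = prod_life[1] - prod_life[0]
        net = net_life[1] - net_life[0]
        return production, production - net

    # 09:30 -> 09:35 production from the trace; net lifetime values made up:
    prod, cons = interval_deltas((2129.7991, 2129.9181), (1000.000, 1000.080))
    ```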

    Cool, I'll set that up and run with it for a bit, and see if the numbers agree over time.

    It is unfortunate that I will have to prime the system with a single record of values before I can start producing meaningful results, but once I've done that and persisted the data, I never need to do it again, because I can use the previously-persisted data when I start back up (assuming no discontinuities)...

    Cool, thanks for talking this through with me, man, much obliged! I owe you a beverage...



  • astroboy
    replied
    yeah the 128 bits triggered the thought of md5sum in my mind. i did not notice the URL difference - that is interesting. my guess is that 100 is just a heartbeat since, in a system with no consumption monitoring, the inverter stops reporting 130/131 when the sun goes down. having said that, the 120 messages (which appear to be supervisor status) do continue 24 hours a day, making message 100 somewhat redundant, but since it goes to a different URL i suppose its purpose is different.
    Last edited by astroboy; 03-05-2016, 08:05 PM.



  • astroboy
    replied
    i was looking at the packets a little more while responding to you and realized that 130.1 seems to contain total lifetime energy expressed in kwh. because my system has not been up for that long, this number currently reads out as 1549.341 kwh - there are 3 decimal digits. looking at some of these fields, it seems that they might have a fixed number of digits and the decimal point just floats around, decreasing the accuracy as the number gets bigger. what's your lifetime production in 130.1 and does it still have 3 decimal digits?

    if the 3 decimal places holds even as you get to 10,000+ kwh then i think you can just derive the current interval's production from the difference between the prior lifetime and the current lifetime production...

    also i don't have message 140 so i assume that contains consumption data?
    Last edited by astroboy; 03-05-2016, 08:23 PM.



  • robillard
    replied
    Originally posted by robillard
    What I see is that every 2 minutes, the SP supervisor sends a message to the server that contains only 100 and 102 sub-messages (for lack of a better term), then every 5 minutes it sends the full one. Those packets get properly ACKed by the server. In those packets, the 100 sub-message just contains identifying info (the serial number of my supervisor) and the date+time, then the 102 sub-message contains this 128-bit value. And this 128-bit value is unique to the message in question, i.e. it is not a repeat of a previous one. Given that the only variable data in the 100 sub-message is the date+time, if the 102 was merely a hash/md5, then what's the point of the entire message? Surely not a ping, since every 5 minutes we get a full message with all the system data...
    On the other hand (after thinking about your theory for a bit more), it is rather significant that the size of the base64 data is exactly 128 bits, the size of an md5 fingerprint...

    Furthermore, on looking closer at the capture logs, I note that the "every 2 minutes" messages (that only contain 100 and 102 sub-messages) are actually sent to a different URL (/Command/SMS2DataCollector.aspx) than the full-data messages (/Data/SMS2DataCollector.aspx)...

    So maybe it is simply a "hello" ping, and the actual data that the SP site shows is somehow derivable from the data being supplied. If so, I guess I just need to figure out how the "Interval Energy Produced (kWh)" value is being computed...
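    The size check is easy to confirm (Python sketch; the sample digest here is synthetic, not a captured 102 value):

    ```python
    import base64
    import hashlib

    def looks_like_md5(b64_value):
        """True if a base64 field decodes to exactly 16 bytes (128 bits),
        the size of an MD5 digest -- consistent with the checksum theory."""
        return len(base64.b64decode(b64_value)) == hashlib.md5().digest_size

    # a 16-byte digest always base64-encodes to 24 characters (with padding)
    sample = base64.b64encode(hashlib.md5(b"example payload").digest()).decode()
    ```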

