Contained in master

Bug 562676: Push efficiency - report missing objects only.

Fix with tests.

Changeset 8de34e3d1f19

Parent d0a63d31eac3

committed by Jelmer Vernooij

authored by Artem Tikhomirov

Changes to 3 files. Showing diff from parent d0a63d31eac3.

 # object_store.py -- Object store for git objects
-# Copyright (C) 2008-2009 Jelmer Vernooij <jelmer@samba.org>
+# Copyright (C) 2008-2012 Jelmer Vernooij <jelmer@samba.org>
+# and others
 #
 # This program is free software; you can redistribute it and/or
 # modify it under the terms of the GNU General Public License
 # as published by the Free Software Foundation; either version 2
 # or (at your option) a later version of the License.
 #
 # This program is distributed in the hope that it will be useful,
 # but WITHOUT ANY WARRANTY; without even the implied warranty of
 # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
 # GNU General Public License for more details.
 #
 # You should have received a copy of the GNU General Public License
 # along with this program; if not, write to the Free Software
 # Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
 # MA 02110-1301, USA.


 """Git object store interfaces and implementation."""


 import errno
 import itertools
 import os
 import stat
 import tempfile

 from dulwich.diff_tree import (
     tree_changes,
     walk_trees,
     )
 from dulwich.errors import (
     NotTreeError,
     )
 from dulwich.file import GitFile
 from dulwich.objects import (
     Commit,
     ShaFile,
     Tag,
     Tree,
     ZERO_SHA,
     hex_to_sha,
     sha_to_hex,
     hex_to_filename,
     S_ISGITLINK,
     object_class,
     )
 from dulwich.pack import (
     Pack,
     PackData,
     iter_sha1,
     write_pack_header,
     write_pack_index_v2,
     write_pack_object,
     write_pack_objects,
     compute_file_sha,
     PackIndexer,
     PackStreamCopier,
     )

 INFODIR = 'info'
 PACKDIR = 'pack'


 class BaseObjectStore(object):
     """Object store interface."""

     def determine_wants_all(self, refs):
         return [sha for (ref, sha) in refs.iteritems()
                 if not sha in self and not ref.endswith("^{}") and
                 not sha == ZERO_SHA]

     def iter_shas(self, shas):
         """Iterate over the objects for the specified shas.

:param shas: Iterable object with SHAs   :return: Object iterator   """   return ObjectStoreIterator(self, shas)     def contains_loose(self, sha):   """Check if a particular object is present by SHA1 and is loose."""   raise NotImplementedError(self.contains_loose)     def contains_packed(self, sha):   """Check if a particular object is present by SHA1 and is packed."""   raise NotImplementedError(self.contains_packed)     def __contains__(self, sha):   """Check if a particular object is present by SHA1.     This method makes no distinction between loose and packed objects.   """   return self.contains_packed(sha) or self.contains_loose(sha)     @property   def packs(self):   """Iterable of pack objects."""   raise NotImplementedError     def get_raw(self, name):   """Obtain the raw text for an object.     :param name: sha for the object.   :return: tuple with numeric type and object contents.   """   raise NotImplementedError(self.get_raw)     def __getitem__(self, sha):   """Obtain an object by SHA1."""   type_num, uncomp = self.get_raw(sha)   return ShaFile.from_raw_string(type_num, uncomp)     def __iter__(self):   """Iterate over the SHAs that are present in this store."""   raise NotImplementedError(self.__iter__)     def add_object(self, obj):   """Add a single object to this object store.     """   raise NotImplementedError(self.add_object)     def add_objects(self, objects):   """Add a set of objects to this object store.     :param objects: Iterable over a list of objects.   
"""   raise NotImplementedError(self.add_objects)     def tree_changes(self, source, target, want_unchanged=False):   """Find the differences between the contents of two trees     :param source: SHA1 of the source tree   :param target: SHA1 of the target tree   :param want_unchanged: Whether unchanged files should be reported   :return: Iterator over tuples with   (oldpath, newpath), (oldmode, newmode), (oldsha, newsha)   """   for change in tree_changes(self, source, target,   want_unchanged=want_unchanged):   yield ((change.old.path, change.new.path),   (change.old.mode, change.new.mode),   (change.old.sha, change.new.sha))     def iter_tree_contents(self, tree_id, include_trees=False):   """Iterate the contents of a tree and all subtrees.     Iteration is depth-first pre-order, as in e.g. os.walk.     :param tree_id: SHA1 of the tree.   :param include_trees: If True, include tree objects in the iteration.   :return: Iterator over TreeEntry namedtuples for all the objects in a   tree.   """   for entry, _ in walk_trees(self, tree_id, None):   if not stat.S_ISDIR(entry.mode) or include_trees:   yield entry     def find_missing_objects(self, haves, wants, progress=None,   get_tagged=None):   """Find the missing objects required for a set of revisions.     :param haves: Iterable over SHAs already in common.   :param wants: Iterable over SHAs of objects to fetch.   :param progress: Simple progress function that will be called with   updated progress strings.   :param get_tagged: Function that returns a dict of pointed-to sha -> tag   sha for including tags.   :return: Iterator over (sha, path) pairs.   """   finder = MissingObjectFinder(self, haves, wants, progress, get_tagged)   return iter(finder.next, None)     def find_common_revisions(self, graphwalker):   """Find which revisions this store has in common using graphwalker.     :param graphwalker: A graphwalker object.   
         :return: List of SHAs that are in common
         """
         haves = []
         sha = graphwalker.next()
         while sha:
             if sha in self:
                 haves.append(sha)
                 graphwalker.ack(sha)
             sha = graphwalker.next()
         return haves

     def get_graph_walker(self, heads):
         """Obtain a graph walker for this object store.

         :param heads: Local heads to start search with
         :return: GraphWalker object
         """
         return ObjectStoreGraphWalker(heads, lambda sha: self[sha].parents)

     def generate_pack_contents(self, have, want, progress=None):
         """Iterate over the contents of a pack file.

         :param have: List of SHA1s of objects that should not be sent
         :param want: List of SHA1s of objects that should be sent
         :param progress: Optional progress reporting method
         """
         return self.iter_shas(self.find_missing_objects(have, want, progress))

     def peel_sha(self, sha):
         """Peel all tags from a SHA.

         :param sha: The object SHA to peel.
         :return: The fully-peeled SHA1 of a tag object, after peeling all
             intermediate tags; if the original ref does not point to a tag, this
             will equal the original SHA1.
         """
         obj = self[sha]
         obj_class = object_class(obj.type_name)
         while obj_class is Tag:
             obj_class, sha = obj.object
             obj = self[sha]
         return obj

+    def _collect_ancestors(self, heads, common = set()):
+        """Collect all ancestors of heads up to (excluding) those in common
+        :param heads: commits to start from
+        :param common: commits to end at, or empty set to walk repository completely
+        :return: a tuple (A, B) where A - all commits reachable
+            from heads but not present in common, B - common (shared) elements
+            that are directly reachable from heads
+        """
+        bases = set()
+        commits = set()
+        queue = []
+        queue.extend(heads)
+        while queue:
+            e = queue.pop(0)
+            if e in common:
+                bases.add(e)
+            elif e not in commits:
+                commits.add(e)
+                cmt = self[e]
+                queue.extend(cmt.parents)
+        return (commits, bases)
+

 class PackBasedObjectStore(BaseObjectStore):

     def __init__(self):
         self._pack_cache = None

     @property
     def alternates(self):
         return []

     def contains_packed(self, sha):
         """Check if a particular object is present by SHA1 and is packed.

         This does not check alternates.
         """
         for pack in self.packs:
             if sha in pack:
                 return True
         return False

     def __contains__(self, sha):
         """Check if a particular object is present by SHA1.

         This method makes no distinction between loose and packed objects.
         """
         if self.contains_packed(sha) or self.contains_loose(sha):
             return True
         for alternate in self.alternates:
             if sha in alternate:
                 return True
         return False

     def _load_packs(self):
         raise NotImplementedError(self._load_packs)

     def _pack_cache_stale(self):
         """Check whether the pack cache is stale."""
         raise NotImplementedError(self._pack_cache_stale)

     def _add_known_pack(self, pack):
         """Add a newly appeared pack to the cache by path.

"""   if self._pack_cache is not None:   self._pack_cache.append(pack)     @property   def packs(self):   """List with pack objects."""   if self._pack_cache is None or self._pack_cache_stale():   self._pack_cache = self._load_packs()   return self._pack_cache     def _iter_alternate_objects(self):   """Iterate over the SHAs of all the objects in alternate stores."""   for alternate in self.alternates:   for alternate_object in alternate:   yield alternate_object     def _iter_loose_objects(self):   """Iterate over the SHAs of all loose objects."""   raise NotImplementedError(self._iter_loose_objects)     def _get_loose_object(self, sha):   raise NotImplementedError(self._get_loose_object)     def _remove_loose_object(self, sha):   raise NotImplementedError(self._remove_loose_object)     def pack_loose_objects(self):   """Pack loose objects.     :return: Number of objects packed   """   objects = set()   for sha in self._iter_loose_objects():   objects.add((self._get_loose_object(sha), None))   self.add_objects(list(objects))   for obj, path in objects:   self._remove_loose_object(obj.id)   return len(objects)     def __iter__(self):   """Iterate over the SHAs that are present in this store."""   iterables = self.packs + [self._iter_loose_objects()] + [self._iter_alternate_objects()]   return itertools.chain(*iterables)     def contains_loose(self, sha):   """Check if a particular object is present by SHA1 and is loose.     This does not check alternates.   """   return self._get_loose_object(sha) is not None     def get_raw(self, name):   """Obtain the raw text for an object.     :param name: sha for the object.   :return: tuple with numeric type and object contents.   
"""   if len(name) == 40:   sha = hex_to_sha(name)   hexsha = name   elif len(name) == 20:   sha = name   hexsha = None   else:   raise AssertionError("Invalid object name %r" % name)   for pack in self.packs:   try:   return pack.get_raw(sha)   except KeyError:   pass   if hexsha is None:   hexsha = sha_to_hex(name)   ret = self._get_loose_object(hexsha)   if ret is not None:   return ret.type_num, ret.as_raw_string()   for alternate in self.alternates:   try:   return alternate.get_raw(hexsha)   except KeyError:   pass   raise KeyError(hexsha)     def add_objects(self, objects):   """Add a set of objects to this object store.     :param objects: Iterable over objects, should support __len__.   :return: Pack object of the objects written.   """   if len(objects) == 0:   # Don't bother writing an empty pack file   return   f, commit = self.add_pack()   write_pack_objects(f, objects)   return commit()      class DiskObjectStore(PackBasedObjectStore):   """Git-style object store that exists on disk."""     def __init__(self, path):   """Open an object store.     :param path: Path of the object store.   """   super(DiskObjectStore, self).__init__()   self.path = path   self.pack_dir = os.path.join(self.path, PACKDIR)   self._pack_cache_time = 0   self._alternates = None     @property   def alternates(self):   if self._alternates is not None:   return self._alternates   self._alternates = []   for path in self._read_alternate_paths():   self._alternates.append(DiskObjectStore(path))   return self._alternates     def _read_alternate_paths(self):   try:   f = GitFile(os.path.join(self.path, "info", "alternates"),   'rb')   except (OSError, IOError), e:   if e.errno == errno.ENOENT:   return []   raise   ret = []   try:   for l in f.readlines():   l = l.rstrip("\n")   if l[0] == "#":   continue   if not os.path.isabs(l):   continue   ret.append(l)   return ret   finally:   f.close()     def add_alternate_path(self, path):   """Add an alternate path to this object store.   
"""   try:   os.mkdir(os.path.join(self.path, "info"))   except OSError, e:   if e.errno != errno.EEXIST:   raise   alternates_path = os.path.join(self.path, "info/alternates")   f = GitFile(alternates_path, 'wb')   try:   try:   orig_f = open(alternates_path, 'rb')   except (OSError, IOError), e:   if e.errno != errno.ENOENT:   raise   else:   try:   f.write(orig_f.read())   finally:   orig_f.close()   f.write("%s\n" % path)   finally:   f.close()   self.alternates.append(DiskObjectStore(path))     def _load_packs(self):   pack_files = []   try:   self._pack_cache_time = os.stat(self.pack_dir).st_mtime   pack_dir_contents = os.listdir(self.pack_dir)   for name in pack_dir_contents:   # TODO: verify that idx exists first   if name.startswith("pack-") and name.endswith(".pack"):   filename = os.path.join(self.pack_dir, name)   pack_files.append((os.stat(filename).st_mtime, filename))   except OSError, e:   if e.errno == errno.ENOENT:   return []   raise   pack_files.sort(reverse=True)   suffix_len = len(".pack")   return [Pack(f[:-suffix_len]) for _, f in pack_files]     def _pack_cache_stale(self):   try:   return os.stat(self.pack_dir).st_mtime > self._pack_cache_time   except OSError, e:   if e.errno == errno.ENOENT:   return True   raise     def _get_shafile_path(self, sha):   # Check from object dir   return hex_to_filename(self.path, sha)     def _iter_loose_objects(self):   for base in os.listdir(self.path):   if len(base) != 2:   continue   for rest in os.listdir(os.path.join(self.path, base)):   yield base+rest     def _get_loose_object(self, sha):   path = self._get_shafile_path(sha)   try:   return ShaFile.from_path(path)   except (OSError, IOError), e:   if e.errno == errno.ENOENT:   return None   raise     def _remove_loose_object(self, sha):   os.remove(self._get_shafile_path(sha))     def _complete_thin_pack(self, f, path, copier, indexer):   """Move a specific file containing a pack into the pack directory.     
:note: The file should be on the same file system as the   packs directory.     :param f: Open file object for the pack.   :param path: Path to the pack file.   :param copier: A PackStreamCopier to use for writing pack data.   :param indexer: A PackIndexer for indexing the pack.   """   entries = list(indexer)     # Update the header with the new number of objects.   f.seek(0)   write_pack_header(f, len(entries) + len(indexer.ext_refs()))     # Must flush before reading (http://bugs.python.org/issue3207)   f.flush()     # Rescan the rest of the pack, computing the SHA with the new header.   new_sha = compute_file_sha(f, end_ofs=-20)     # Must reposition before writing (http://bugs.python.org/issue3207)   f.seek(0, os.SEEK_CUR)     # Complete the pack.   for ext_sha in indexer.ext_refs():   assert len(ext_sha) == 20   type_num, data = self.get_raw(ext_sha)   offset = f.tell()   crc32 = write_pack_object(f, type_num, data, sha=new_sha)   entries.append((ext_sha, offset, crc32))   pack_sha = new_sha.digest()   f.write(pack_sha)   f.close()     # Move the pack in.   entries.sort()   pack_base_name = os.path.join(   self.pack_dir, 'pack-' + iter_sha1(e[0] for e in entries))   os.rename(path, pack_base_name + '.pack')     # Write the index.   index_file = GitFile(pack_base_name + '.idx', 'wb')   try:   write_pack_index_v2(index_file, entries, pack_sha)   index_file.close()   finally:   index_file.abort()     # Add the pack to the store and return it.   final_pack = Pack(pack_base_name)   final_pack.check_length_and_checksum()   self._add_known_pack(final_pack)   return final_pack     def add_thin_pack(self, read_all, read_some):   """Add a new thin pack to this object store.     Thin packs are packs that contain deltas with parents that exist outside   the pack. They should never be placed in the object store directly, and   always indexed and completed as they are copied.     :param read_all: Read function that blocks until the number of requested   bytes are read.   
:param read_some: Read function that returns at least one byte, but may   not return the number of bytes requested.   :return: A Pack object pointing at the now-completed thin pack in the   objects/pack directory.   """   fd, path = tempfile.mkstemp(dir=self.path, prefix='tmp_pack_')   f = os.fdopen(fd, 'w+b')     try:   indexer = PackIndexer(f, resolve_ext_ref=self.get_raw)   copier = PackStreamCopier(read_all, read_some, f,   delta_iter=indexer)   copier.verify()   return self._complete_thin_pack(f, path, copier, indexer)   finally:   f.close()     def move_in_pack(self, path):   """Move a specific file containing a pack into the pack directory.     :note: The file should be on the same file system as the   packs directory.     :param path: Path to the pack file.   """   p = PackData(path)   entries = p.sorted_entries()   basename = os.path.join(self.pack_dir,   "pack-%s" % iter_sha1(entry[0] for entry in entries))   f = GitFile(basename+".idx", "wb")   try:   write_pack_index_v2(f, entries, p.get_stored_checksum())   finally:   f.close()   p.close()   os.rename(path, basename + ".pack")   final_pack = Pack(basename)   self._add_known_pack(final_pack)   return final_pack     def add_pack(self):   """Add a new pack to this object store.     :return: Fileobject to write to and a commit function to   call when the pack is finished.   """   fd, path = tempfile.mkstemp(dir=self.pack_dir, suffix=".pack")   f = os.fdopen(fd, 'wb')   def commit():   os.fsync(fd)   f.close()   if os.path.getsize(path) > 0:   return self.move_in_pack(path)   else:   os.remove(path)   return None   return f, commit     def add_object(self, obj):   """Add a single object to this object store.     
:param obj: Object to add   """   dir = os.path.join(self.path, obj.id[:2])   try:   os.mkdir(dir)   except OSError, e:   if e.errno != errno.EEXIST:   raise   path = os.path.join(dir, obj.id[2:])   if os.path.exists(path):   return # Already there, no need to write again   f = GitFile(path, 'wb')   try:   f.write(obj.as_legacy_object())   finally:   f.close()     @classmethod   def init(cls, path):   try:   os.mkdir(path)   except OSError, e:   if e.errno != errno.EEXIST:   raise   os.mkdir(os.path.join(path, "info"))   os.mkdir(os.path.join(path, PACKDIR))   return cls(path)      class MemoryObjectStore(BaseObjectStore):   """Object store that keeps all objects in memory."""     def __init__(self):   super(MemoryObjectStore, self).__init__()   self._data = {}     def _to_hexsha(self, sha):   if len(sha) == 40:   return sha   elif len(sha) == 20:   return sha_to_hex(sha)   else:   raise ValueError("Invalid sha %r" % (sha,))     def contains_loose(self, sha):   """Check if a particular object is present by SHA1 and is loose."""   return self._to_hexsha(sha) in self._data     def contains_packed(self, sha):   """Check if a particular object is present by SHA1 and is packed."""   return False     def __iter__(self):   """Iterate over the SHAs that are present in this store."""   return self._data.iterkeys()     @property   def packs(self):   """List with pack objects."""   return []     def get_raw(self, name):   """Obtain the raw text for an object.     :param name: sha for the object.   :return: tuple with numeric type and object contents.   """   obj = self[self._to_hexsha(name)]   return obj.type_num, obj.as_raw_string()     def __getitem__(self, name):   return self._data[self._to_hexsha(name)]     def __delitem__(self, name):   """Delete an object from this store, for testing only."""   del self._data[self._to_hexsha(name)]     def add_object(self, obj):   """Add a single object to this object store.     
"""   self._data[obj.id] = obj     def add_objects(self, objects):   """Add a set of objects to this object store.     :param objects: Iterable over a list of objects.   """   for obj, path in objects:   self._data[obj.id] = obj      class ObjectImporter(object):   """Interface for importing objects."""     def __init__(self, count):   """Create a new ObjectImporter.     :param count: Number of objects that's going to be imported.   """   self.count = count     def add_object(self, object):   """Add an object."""   raise NotImplementedError(self.add_object)     def finish(self, object):   """Finish the import and write objects to disk."""   raise NotImplementedError(self.finish)      class ObjectIterator(object):   """Interface for iterating over objects."""     def iterobjects(self):   raise NotImplementedError(self.iterobjects)      class ObjectStoreIterator(ObjectIterator):   """ObjectIterator that works on top of an ObjectStore."""     def __init__(self, store, sha_iter):   """Create a new ObjectIterator.     :param store: Object store to retrieve from   :param sha_iter: Iterator over (sha, path) tuples   """   self.store = store   self.sha_iter = sha_iter   self._shas = []     def __iter__(self):   """Yield tuple with next object and path."""   for sha, path in self.itershas():   yield self.store[sha], path     def iterobjects(self):   """Iterate over just the objects."""   for o, path in self:   yield o     def itershas(self):   """Iterate over the SHAs."""   for sha in self._shas:   yield sha   for sha in self.sha_iter:   self._shas.append(sha)   yield sha     def __contains__(self, needle):   """Check if an object is present.     :note: This checks if the object is present in   the underlying object store, not if it would   be yielded by the iterator.     :param needle: SHA1 of the object to check for   """   return needle in self.store     def __getitem__(self, key):   """Find an object by SHA1.     
         :note: This retrieves the object from the underlying
             object store. It will also succeed if the object would
             not be returned by the iterator.
         """
         return self.store[key]

     def __len__(self):
         """Return the number of objects."""
         return len(list(self.itershas()))


 def tree_lookup_path(lookup_obj, root_sha, path):
     """Look up an object in a Git tree.

     :param lookup_obj: Callback for retrieving object by SHA1
     :param root_sha: SHA1 of the root tree
     :param path: Path to lookup
     :return: A tuple of (mode, SHA) of the resulting path.
     """
     tree = lookup_obj(root_sha)
     if not isinstance(tree, Tree):
         raise NotTreeError(root_sha)
     return tree.lookup_path(lookup_obj, path)

+def _collect_filetree_revs(obj_store, tree_sha, kset):
+    """Collect SHA1s of files and directories for specified tree
+    (identified by SHA1)
+    :param obj_store: Object store to get objects by SHA from
+    :param tree_sha: tree reference to walk
+    :param kset: set to fill with references to files and directories
+    """
+    filetree = obj_store[tree_sha]
+    for name,mode,sha in filetree.iteritems():
+        if not S_ISGITLINK(mode) and sha not in kset:
+            kset.add(sha)
+            if stat.S_ISDIR(mode):
+                _collect_filetree_revs(obj_store, sha, kset)
+
+def _split_commits_and_tags(obj_store, lst, ignore_unknown = False):
+    """Split lst into two lists, one with commit SHA1s, another with
+    tag SHA1s. Commits referenced by tags are included into commits
+    list as well. Only SHA1s known in this repository will get
+    through, and unless ignore_unknown argument is True, KeyError
+    is thrown for SHA1 missing in the repository
+    :param obj_store: Object store to get objects by SHA1 from
+    :param lst: Collection of commit and tag SHAs
+    :param ignore_unknown: True to skip SHA1 missing in the
+        repository silently.
+    :return: A tuple of (commits, tags) SHA1s
+    """
+    commits = set()
+    tags = set()
+    for e in lst:
+        try:
+            o = obj_store[e]
+        except KeyError:
+            if ignore_unknown:
+                pass
+            else:
+                raise
+        else:
+            if isinstance(o, Commit):
+                commits.add(e)
+            elif isinstance(o, Tag):
+                tags.add(e)
+                commits.add(o.object[1])
+            else:
+                raise KeyError('Not a commit or a tag: %s' % e)
+    return (commits, tags)
+


 class MissingObjectFinder(object):
     """Find the objects missing from another object store.

     :param object_store: Object store containing at least all objects to be
         sent
     :param haves: SHA1s of commits not to send (already present in target)
     :param wants: SHA1s of commits to send
     :param progress: Optional function to report progress to.
     :param get_tagged: Function that returns a dict of pointed-to sha -> tag
         sha for including tags.
     :param tagged: dict of pointed-to sha -> tag sha for including tags
     """

     def __init__(self, object_store, haves, wants, progress=None,
                  get_tagged=None):
-        haves = set(haves)
-        self.sha_done = haves
-        self.objects_to_send = set([(w, None, False) for w in wants
-                                    if w not in haves])
         self.object_store = object_store
+        # process Commits and Tags differently
+        # Note, while haves may list commits/tags not available locally,
+        # and such SHAs would get filtered out by _split_commits_and_tags,
+        # wants shall list only known SHAs, and otherwise
+        # _split_commits_and_tags fails with KeyError
+        have_commits, have_tags = \
+                _split_commits_and_tags(object_store, haves, True)
+        want_commits, want_tags = \
+                _split_commits_and_tags(object_store, wants, False)
+        # all_ancestors is a set of commits that shall not be sent
+        # (complete repository up to 'haves')
+        all_ancestors = object_store._collect_ancestors(have_commits)[0]
+        # all_missing - complete set of commits between haves and wants
+        # common - commits from all_ancestors we hit into while
+        # traversing parent hierarchy of wants
+        missing_commits, common_commits = \
+            object_store._collect_ancestors(want_commits, all_ancestors)
+        self.sha_done = set()
+        # Now, fill sha_done with commits and revisions of
+        # files and directories known to be both locally
+        # and on target. Thus these commits and files
+        # won't get selected for fetch
+        for h in common_commits:
+            self.sha_done.add(h)
+            cmt = object_store[h]
+            _collect_filetree_revs(object_store, cmt.tree, self.sha_done)
+        # record tags we have as visited, too
+        for t in have_tags:
+            self.sha_done.add(t)
+
+        missing_tags = want_tags.difference(have_tags)
+        # in fact, what we 'want' is commits and tags
+        # we've found missing
+        wants = missing_commits.union(missing_tags)
+
+        self.objects_to_send = set([(w, None, False) for w in wants])
+
         if progress is None:
             self.progress = lambda x: None
         else:
             self.progress = progress
         self._tagged = get_tagged and get_tagged() or {}

     def add_todo(self, entries):
         self.objects_to_send.update([e for e in entries
                                      if not e[0] in self.sha_done])
-
-    def parse_tree(self, tree):
-        self.add_todo([(sha, name, not stat.S_ISDIR(mode))
-                       for name, mode, sha in tree.iteritems()
-                       if not S_ISGITLINK(mode)])
-
-    def parse_commit(self, commit):
-        self.add_todo([(commit.tree, "", False)])
-        self.add_todo([(p, None, False) for p in commit.parents])
-
-    def parse_tag(self, tag):
-        self.add_todo([(tag.object[1], None, False)])

     def next(self):
         while True:
             if not self.objects_to_send:
                 return None
             (sha, name, leaf) = self.objects_to_send.pop()
             if sha not in self.sha_done:
                 break
         if not leaf:
             o = self.object_store[sha]
             if isinstance(o, Commit):
-                self.parse_commit(o)
+                self.add_todo([(o.tree, "", False)])
             elif isinstance(o, Tree):
-                self.parse_tree(o)
+                self.add_todo([(s, n, not stat.S_ISDIR(m))
+                               for n, m, s in o.iteritems()
+                               if not S_ISGITLINK(m)])
             elif isinstance(o, Tag):
-                self.parse_tag(o)
+                self.add_todo([(o.object[1], None, False)])
         if sha in self._tagged:
             self.add_todo([(self._tagged[sha], None, True)])
         self.sha_done.add(sha)
         self.progress("counting objects: %d\r" % len(self.sha_done))
         return (sha, name)


 class ObjectStoreGraphWalker(object):
     """Graph walker that finds what commits are missing from an object store.

     :ivar heads: Revisions without descendants in the local repo
     :ivar get_parents: Function to retrieve parents in the local repo
     """

     def __init__(self, local_heads, get_parents):
         """Create a new instance.

         :param local_heads: Heads to start search with
         :param get_parents: Function for finding the parents of a SHA1.
         """
         self.heads = set(local_heads)
         self.get_parents = get_parents
         self.parents = {}

     def ack(self, sha):
         """Ack that a revision and its ancestors are present in the source."""
         ancestors = set([sha])

         # stop if we run out of heads to remove
         while self.heads:
             for a in ancestors:
                 if a in self.heads:
                     self.heads.remove(a)

             # collect all ancestors
             new_ancestors = set()
             for a in ancestors:
                 ps = self.parents.get(a)
                 if ps is not None:
                     new_ancestors.update(ps)
                 self.parents[a] = None

             # no more ancestors; stop
             if not new_ancestors:
                 break

             ancestors = new_ancestors

     def next(self):
         """Iterate over ancestors of heads in the target."""
         if self.heads:
             ret = self.heads.pop()
             ps = self.get_parents(ret)
             self.parents[ret] = ps
             self.heads.update([p for p in ps if not p in self.parents])
             return ret
         return None
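Reviewer's illustration (not part of the changeset): the commit-level step in the new `__init__` can be sketched on a toy graph. The `collect_ancestors` helper below is a hypothetical stand-in for `object_store._collect_ancestors`, whose body is not shown in this hunk. Its behavior is assumed from how the result is used above (first element: commits reached, second element: commits from the `common` set that the walk ran into). The graph mirrors the merge/fork fixture in the new tests, written in Python 3.

```python
from collections import deque

# Toy commit graph (commit -> parents), mirroring the merge/fork test fixture.
PARENTS = {1: [], 2: [1], 3: [2], 4: [2], 5: [3], 6: [3, 4], 7: [6]}


def collect_ancestors(parents, heads, common=frozenset()):
    """Hypothetical stand-in for object_store._collect_ancestors (assumed behavior).

    Walks parent links from heads. Commits found in `common` stop the walk
    and are returned as bases; everything else reached is returned as commits.
    """
    commits, bases = set(), set()
    queue = deque(heads)
    while queue:
        c = queue.popleft()
        if c in common:
            bases.add(c)
        elif c not in commits:
            commits.add(c)
            queue.extend(parents[c])
    return commits, bases


def split_missing_and_common(parents, haves, wants):
    # Everything reachable from haves is already on the target.
    all_ancestors, _ = collect_ancestors(parents, haves)
    # Walk from wants, stopping at anything the target already has.
    return collect_ancestors(parents, wants, all_ancestors)


missing, common = split_missing_and_common(PARENTS, haves=[4], wants=[7])
print(sorted(missing), sorted(common))
```

With haves `[4]` and wants `[7]`, commit 5 never enters the walk, which is the property `test_have4_want7` checks.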
 
@@ -121,6 +121,7 @@
         'lru_cache',
         'objects',
         'object_store',
+        'missing_obj_finder',
         'pack',
         'patch',
         'protocol',
dulwich/tests/test_missing_obj_finder.py
@@ -1,0 +1,196 @@
+# test_missing_obj_finder.py -- tests for MissingObjectFinder
+# Copyright (C) 2012 syntevo GmbH
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License
+# as published by the Free Software Foundation; version 2
+# or (at your option) any later version of the License.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write to the Free Software
+# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
+# MA  02110-1301, USA.
+
+from dulwich.errors import (
+    MissingCommitError,
+    )
+from dulwich.object_store import (
+    MemoryObjectStore,
+    )
+from dulwich.objects import (
+    Commit,
+    Blob,
+    )
+from dulwich.tests import TestCase
+from utils import (
+    F,
+    make_object,
+    build_commit_graph,
+    )
+
+class MissingObjectFinderTest(TestCase):
+
+    def setUp(self):
+        super(MissingObjectFinderTest, self).setUp()
+        self.store = MemoryObjectStore()
+        self.commits = []
+
+    def __getitem__(self, n):
+        # rename for brevity
+        return self.commits[n-1]
+
+    def cmt(self, n):
+        return self[n]
+
+    def assertMissingMatch(self, haves, wants, expected):
+        for sha,path in self.store.find_missing_objects(haves, wants):
+            self.assertTrue(sha in expected, "FAILURE: (%s,%s) erroneously reported as missing" % (sha,path))
+            expected.remove(sha)
+
+        self.assertFalse(len(expected) > 0, "FAILURE: some objects are not reported as missing: %s" % (expected))
+
+class MOFLinearRepoTest(MissingObjectFinderTest):
+    def setUp(self):
+        super(MOFLinearRepoTest, self).setUp()
+        f1_1 = make_object(Blob, data='f1') # present in 1, removed in 3
+        f2_1 = make_object(Blob, data='f2') # present in all revisions, changed in 2 and 3
+        f2_2 = make_object(Blob, data='f2-changed')
+        f2_3 = make_object(Blob, data='f2-changed-again')
+        f3_2 = make_object(Blob, data='f3') # added in 2, left unmodified in 3
+
+        commit_spec = [[1], [2,1], [3,2]]
+        trees = {1: [('f1', f1_1), ('f2',f2_1)],
+                 2: [('f1',f1_1), ('f2', f2_2), ('f3', f3_2)],
+                 3: [('f2', f2_3), ('f3', f3_2)] }
+        # commit 1: f1 and f2
+        # commit 2: f3 added, f2 changed. Missing shall report commit id and a tree referenced by commit
+        # commit 3: f1 removed, f2 changed. Commit sha and root tree sha shall be reported as modified
+        self.commits = build_commit_graph(self.store, commit_spec, trees)
+        self.missing_1_2 = [self.cmt(2).id, self.cmt(2).tree, f2_2.id, f3_2.id]
+        self.missing_2_3 = [self.cmt(3).id, self.cmt(3).tree, f2_3.id]
+        self.missing_1_3 = [
+            self.cmt(2).id, self.cmt(3).id,
+            self.cmt(2).tree, self.cmt(3).tree,
+            f2_2.id, f3_2.id, f2_3.id]
+
+
+    def test_1_to_2(self):
+        self.assertMissingMatch([self.cmt(1).id], [self.cmt(2).id], self.missing_1_2)
+
+    def test_2_to_3(self):
+        self.assertMissingMatch([self.cmt(2).id], [self.cmt(3).id], self.missing_2_3)
+
+    def test_1_to_3(self):
+        self.assertMissingMatch([self.cmt(1).id], [self.cmt(3).id], self.missing_1_3)
+
+    def test_bogus_haves_failure(self):
+        """Ensure non-existent SHA in haves are not tolerated"""
+        bogus_sha = self.cmt(2).id[::-1]
+        haves = [self.cmt(1).id, bogus_sha]
+        wants = [self.cmt(3).id]
+        self.assertRaises(KeyError, self.store.find_missing_objects, self.store, haves, wants)
+
+    def test_bogus_wants_failure(self):
+        """Ensure non-existent SHA in wants are not tolerated"""
+        bogus_sha = self.cmt(2).id[::-1]
+        haves = [self.cmt(1).id]
+        wants = [self.cmt(3).id, bogus_sha]
+        self.assertRaises(KeyError, self.store.find_missing_objects, self.store, haves, wants)
+
+    def test_no_changes(self):
+        self.assertMissingMatch([self.cmt(3).id], [self.cmt(3).id], [])
+
+class MOFMergeForkRepoTest(MissingObjectFinderTest):
+    """ 1 --- 2 --- 4 --- 6 --- 7
+               \        /
+                3  ---
+                 \
+                  5
+    """
+
+    def setUp(self):
+        super(MOFMergeForkRepoTest, self).setUp()
+        f1_1 = make_object(Blob, data='f1')
+        f1_2 = make_object(Blob, data='f1-2')
+        f1_4 = make_object(Blob, data='f1-4')
+        f1_7 = make_object(Blob, data='f1-2') # same data as in rev 2
+        f2_1 = make_object(Blob, data='f2')
+        f2_3 = make_object(Blob, data='f2-3')
+        f3_3 = make_object(Blob, data='f3')
+        f3_5 = make_object(Blob, data='f3-5')
+        commit_spec = [[1], [2,1], [3,2], [4,2], [5,3], [6,3,4], [7,6]]
+        trees = {1: [('f1', f1_1), ('f2',f2_1)],
+                 2: [('f1',f1_2), ('f2', f2_1)], # f1 changed
+                 3: [('f1',f1_2), ('f2', f2_3), ('f3', f3_3)], # f3 added, f2 changed
+                 4: [('f1',f1_4), ('f2',f2_1)], # f1 changed
+                 5: [('f1',f1_2), ('f3', f3_5)], # f2 removed, f3 changed
+                 6: [('f1',f1_4), ('f2',f2_3), ('f3', f3_3)], # merged 3 and 4
+                 7: [('f1',f1_7), ('f2',f2_3)]} # f1 changed to match rev2. f3 removed
+        self.commits = build_commit_graph(self.store, commit_spec, trees)
+
+        self.f1_2_id = f1_2.id
+        self.f1_4_id = f1_4.id
+        self.f1_7_id = f1_7.id
+        self.f2_3_id = f2_3.id
+        self.f3_3_id = f3_3.id
+
+        self.assertEquals(f1_2.id, f1_7.id, "[sanity]")
+
+    def test_have6_want7(self):
+        """
+        have 6, want 7. Ideally, shall not report f1_7 as it's the same as f1_2,
+        however, to do so, MissingObjectFinder shall not record trees of common commits
+        only, but also all parent trees and tree items, which is an overkill
+        (i.e. in sha_done it records f1_4 as known, and doesn't record f1_2 was known
+        prior to that, hence can't detect f1_7 is in fact f1_2 and shall not be reported)
+        """
+        self.assertMissingMatch([self.cmt(6).id], [self.cmt(7).id], [self.cmt(7).id, self.cmt(7).tree, self.f1_7_id])
+
+    def test_have4_want7(self):
+        """
+        have 4, want 7. Shall not include rev5 as it is not in the tree between 4 and 7
+        (well, it is, but its SHA's are irrelevant for 4..7 commit hierarchy)
+        """
+        self.assertMissingMatch([self.cmt(4).id], [self.cmt(7).id], [
+            self.cmt(7).id, self.cmt(6).id, self.cmt(3).id,
+            self.cmt(7).tree, self.cmt(6).tree, self.cmt(3).tree,
+            self.f2_3_id, self.f3_3_id])
+
+    def test_have1_want6(self):
+        """
+        have 1, want 6. Shall not include rev5
+        """
+        self.assertMissingMatch([self.cmt(1).id], [self.cmt(6).id], [
+            self.cmt(6).id, self.cmt(4).id, self.cmt(3).id, self.cmt(2).id,
+            self.cmt(6).tree, self.cmt(4).tree, self.cmt(3).tree, self.cmt(2).tree,
+            self.f1_2_id, self.f1_4_id, self.f2_3_id, self.f3_3_id])
+
+    def test_have3_want6(self):
+        """
+        have 3, want 7. Shall not report rev2 and its tree, because
+        haves(3) means has parents, i.e. rev2, too
+        BUT shall report any changes descending rev2 (excluding rev3)
+        Shall NOT report f1_7 as it's techically == f1_2
+        """
+        self.assertMissingMatch([self.cmt(3).id], [self.cmt(7).id], [
+            self.cmt(7).id, self.cmt(6).id, self.cmt(4).id,
+            self.cmt(7).tree, self.cmt(6).tree, self.cmt(4).tree,
+            self.f1_4_id])
+
+    def test_have5_want7(self):
+        """
+        have 5, want 7. Common parent is rev2, hence children of rev2 from
+        a descent line other than rev5 shall be reported
+        """
+        # expects f1_4 from rev6. f3_5 is known in rev5;
+        # f1_7 shall be the same as f1_2 (known, too)
+        self.assertMissingMatch([self.cmt(5).id], [self.cmt(7).id], [
+            self.cmt(7).id, self.cmt(6).id, self.cmt(4).id,
+            self.cmt(7).tree, self.cmt(6).tree, self.cmt(4).tree,
+            self.f1_4_id])
+
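Reviewer's illustration (not part of the changeset): the trade-off described in the `test_have6_want7` docstring can be reproduced with a toy model. Everything below is hypothetical and simplified: blobs are plain strings, tree objects themselves are ignored, and `ancestors` only approximates the ancestor walk the finder relies on. It is a sketch of the idea that only the trees of the common commits seed the "already known" set, written in Python 3.

```python
# Toy model: commit -> parents, and commit -> {path: blob id}.
PARENTS = {1: [], 2: [1], 3: [2], 4: [2], 6: [3, 4], 7: [6]}
TREES = {
    1: {"f1": "f1", "f2": "f2"},
    2: {"f1": "f1-2", "f2": "f2"},
    3: {"f1": "f1-2", "f2": "f2-3", "f3": "f3"},
    4: {"f1": "f1-4", "f2": "f2"},
    6: {"f1": "f1-4", "f2": "f2-3", "f3": "f3"},
    7: {"f1": "f1-2", "f2": "f2-3"},  # f1 is back to the rev 2 content
}


def ancestors(heads, stop=frozenset()):
    """Return (commits reached, commits from `stop` that the walk hit)."""
    seen, hit, todo = set(), set(), list(heads)
    while todo:
        c = todo.pop()
        if c in stop:
            hit.add(c)
        elif c not in seen:
            seen.add(c)
            todo.extend(PARENTS[c])
    return seen, hit


def missing_blobs(haves, wants):
    known, _ = ancestors(haves)
    missing, common = ancestors(wants, known)
    # Only blobs in the trees of the *common* commits count as known.
    done = set()
    for c in common:
        done.update(TREES[c].values())
    out = set()
    for c in missing:
        out.update(b for b in TREES[c].values() if b not in done)
    return out


print(missing_blobs([6], [7]))
```

For haves `[6]` and wants `[7]` the only common commit is 6, so `"f1-2"` is reported although the target already holds it through commit 2. Seeding the known set from every ancestor tree would avoid this at the cost of walking all of history, which is the overkill the docstring mentions.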